Digital Image Processing 2023
The authors' video lectures on specific topics, available through YouTube, make
it easy for readers to put the learned theory into practice. The quizzes and
review questions added at the end of each chapter fully prepare readers for
further study.
Graduate students, postgraduate students, researchers, and anyone
interested in image processing, computer vision, machine learning,
and related domains will find this book an excellent starting point for information
and an able ally.
Digital Image Processing
Reasonable efforts have been made to publish reliable data and information, but the author and pub-
lisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
DOI: 10.1201/9781003217428
Typeset in Palatino
by Deanta Global Publishing Services, Chennai, India
Learning Objectives
After reading this chapter, the reader should have a clear understanding
about:
1.1 Introduction
There is a general saying: “A picture is worth a thousand words”. In this book
we are going to learn all about images. Image processing is one of
the evergreen fields of computer science and engineering. It keeps evolving,
and researchers are consistently working on developing image processing
techniques that provide more features and better accuracy with increased
speed. A special mention goes to the high-speed processing engines that are
available and affordable nowadays. Also, image storage has become much
cheaper. The following discussion should enable the reader to understand
what an image is, what can be understood from an image, how much infor-
mation can be retrieved from an image, and what sort of applications can be
developed from the available information. Stay tuned! Fun awaits you!
1.2 What Is an Image?
Before delving deeper, it is important to understand the fundamentals.
Being strong in the fundamentals will help you throughout this learning
journey.
Disclaimer: We have titled the book “digital image processing”. Hence, we
shall deal with digital images.
According to dictionary definitions, an image, be it digital or still, is only
a binary representation of some visual information. The visual information
can be a simple drawing, photograph, recorded graphs, organization logos,
or anything of this sort. All these images have something in common. If they
are digital images, they all can be stored and saved for future use electroni-
cally on any storage device. Figure 1.1 presents a sample image with infor-
mation inside it. This image is a traffic sign, which gives information about
the signals and signage for drivers. This is a digital image and can be stored
in any digital storage medium. To be more precise, this image was shot with
a digital camera.
FIGURE 1.1
The first image – image with information – digital image.
FIGURE 1.2
The “image” processing.
the system would interpret and understand the content to let further actions
happen. The algorithms developed play a major role in understanding the
content with higher accuracy.
Image processing helps users understand the content and context from
any image. Appropriate processing techniques are to be chosen to get the
best results.
1.4 What Is a Pixel?
One should understand that an image is nothing but an array or a matrix of
multiple pixels properly arranged in columns and rows. Now, it is also good
to understand what a pixel is. The word pixel originates from “picture ele-
ment”. A pixel is the smallest unit in a digital image. Multiple pixels arranged
in rows and columns form an image. An image is fully composed of pixels.
Figure 1.3 has a picture on the left-hand side (LHS) and the right-hand side
(RHS); the pixels from a particular region of the image can be seen. This will
help you understand that multiple pixels are arranged appropriately to get
FIGURE 1.3
Pixels.
a meaningful digital image. The LHS is a complete image, whereas the RHS
represents a part of the same but in the form of pixels. The boxes highlighted
on the RHS image represent the individual pixels. Many such individual
pixels together make up the real image. To be concise, pixels are the smallest
units that form an image. Many more technical details on pixels are
presented in Chapter 2.
1.5 Types of Images
The next very important area of discussion is the types of images. They are
as follows:
1. Binary image.
As one could have guessed, binary is all about 0s and 1s. A binary image
will contain only two colors: white and black. Black is represented by
0 and white is represented by 1. Each pixel in this type of image will
have either a value of 0 or 1, representing black or white, respectively
(Figure 1.4). In the binary image, each pixel needs only 1 bit of storage
space. Be it white or black, what we need is just 1 bit to store that pixel.
This is an important aspect to be remembered and this will help in
distinguishing the binary image from the black-and-white image.
2. Black-and-white image.
Most beginners are confused about what a binary image is and
what a black-and-white image is. There is a very fundamental differ-
ence that differentiates the two. When it comes to black-and-white
images, each pixel needs 8 bits of storage space. Each of these pixels
can have a 0 or 1. Again, 0 represents black and 1 represents white.
Multiple 0s and 1s are in an image, but the storage requirement for
the pixels is much higher. This gives smoothness and enriched qual-
ity to the image (Figure 1.5).
FIGURE 1.4
A binary image.
FIGURE 1.5
A typical black-and-white image.
3. Grayscale image.
The next type of image to be discussed is the grayscale image. It
is a special image that has a range of shades from black to white, i.e.,
the shades lie between white and black. People often regard this as
having no color, only shades of white and black. The most commonly
used format is the 8-bit format, which accommodates 256 shades of gray.
FIGURE 1.6
Grayscale shading pattern.
FIGURE 1.7
A sample grayscale image.
FIGURE 1.8
Sample input image.
FIGURE 1.9
RGB composition.
FIGURE 1.10
Image processing and agriculture.
FIGURE 1.11
Image processing and automobiles.
FIGURE 1.12
Image processing and industry.
FIGURE 1.13
Image processing and medicine.
FIGURE 1.14
Image processing and defense.
FIGURE 1.15
Welcome screen of Anaconda site.
FIGURE 1.16
Anaconda installers.
FIGURE 1.17
Python version.
The installation can be started once the download is complete. Once the
extraction is done from the executable files, the screen in Figure 1.18 should
be visible.
Agree to the license requirements (Figure 1.19), then proceed to the next
step.
FIGURE 1.18
Anaconda installation screen.
FIGURE 1.19
License agreement.
FIGURE 1.20
Installation options.
FIGURE 1.21
Installation directory.
FIGURE 1.22
Advanced installation options.
FIGURE 1.23
The progress.
FIGURE 1.24
Anaconda DataSpell IDE.
FIGURE 1.25
Anaconda installation complete.
FIGURE 1.26
Verification of installation through the command prompt.
FIGURE 1.27
Command to install OpenCV.
FIGURE 1.28
Confirmation of successful OpenCV installation.
1. Linear algebra
2. Probability and statistics
3. Signals and systems
4. Differential equations
5. Digital electronics
6. Programming skills (or the logic)
Do not be intimidated by the list! We will make sure the learning is imparted
in a practical way.
It is hoped that this chapter refreshed your must-know fundamentals
of image processing. It’s time to move on to learning the basic concepts of
image formation, characteristics of image operations, and image types in the
next chapter.
1.9 Quiz
1. Every image has some information inside. True or false?
2. All images must be even sized. True or false?
3. Image processing is all about understanding the _____ and _____ of
an image.
4. Image processing techniques are meant to only understand the
image. True or false?
1.9.1 Answers
1. True
2. False
3. Context and content
4. False. It can be used to enhance the image.
5. Pixel
6. Pixels.
7. 0s and 1s (representing black and white)
8. One bit.
9. Black and white (0s and 1s).
10. 8 bits
11. Range of colors between black and white
12. Red, green, and blue
1.10 Review Questions
1. Define image.
2. Define image processing.
3. Why process an image?
4. What is a pixel and how is it important for an image?
5. What are the types of images you know?
6. Are the black-and-white and grayscale images the same in terms of
the content? Explain.
7. How is a binary image different from a black-and-white image?
1.10.1 Answers
Further Reading
Abràmoff, M.D., Magalhães, P.J. and Ram, S.J., 2004. Image processing with ImageJ.
Biophotonics International, 11(7), pp. 36–42.
Jain, A.K., 1989. Fundamentals of digital image processing. Englewood Cliffs, NJ: Prentice
Hall.
Petrou, M. and Petrou, C., 2010. Image processing: The fundamentals. John Wiley & Sons.
Russ, J.C., 2016. The image processing handbook. CRC Press.
Sonka, M., Hlavac, V. and Boyle, R., 2014. Image processing, analysis, and machine vision.
Cengage Learning.
Weeks, A.R., 1996. Fundamentals of electronic image processing (pp. 316–414). Bellingham:
SPIE Optical Engineering Press.
Young, I.T., Gerbrands, J.J. and Van Vliet, L.J., 1998. Fundamentals of image processing
(Vol. 841). Delft: Delft University of Technology.
2
Image Processing Fundamentals
Learning Objectives
After reading this chapter, the reader should have a clear understanding
about:
• Image formation
• Concept of bits per pixel
• Brightness, contrast, and intensity
• Pixel resolution and pixel density
• Color models
• Characteristics of image operations
• Types of images
• Steps in digital image processing
• Elements of digital image processing
2.1 Introduction
The previous chapter only hinted at pixels and the types of images at a very
elementary level, which may not be sufficient for an aspiring image processing
enthusiast. However, this chapter will enhance understanding of the
fundamentals of image processing. This chapter provides
clear-cut information about pixels, including more technical details. Also,
this chapter dives deeper with some implementation examples for the con-
cepts being dealt with. This chapter is a mix of theoretical and practical
understanding of the concepts. Again, it is very important for the reader
to have installed the software packages detailed in Chapter 1. This chapter
concludes with analyzing color models followed by the complete analysis of
the steps involved in digital image processing.
FIGURE 2.1
Fundamental principle of reflection.
the camera) to an image plane. (An image plane is the surface where the
image is rendered.)
Image formation can be described as occurring in three phases. The first
phase is the scene getting illuminated (lighted) by a source. In Figure 2.2
the light source is the sun. In the second phase, the scene that is illuminated
reflects the radiations to the camera in focus (Figure 2.3). The third phase
happens through the sensors in the camera, which can sense the radiations
(Figure 2.4) completing the whole process.
FIGURE 2.2
Image formation: Phase 1.
FIGURE 2.3
Image formation: Phase 2.
FIGURE 2.4
Image formation sequence.
FIGURE 2.5
Bayer mosaic.
The output from this phase, i.e., from the Bayer mosaic, is analog in nature
and needs conversion, as analog signals can be neither processed nor stored
digitally. Hence, the conversion from an analog to a digital signal is mandatory.
This is represented diagrammatically in Figure 2.7.
The continuous data has to be converted to the digital form with the fol-
lowing two steps:
FIGURE 2.6
Image formation with Bayer filters.
FIGURE 2.7
Analog to digital conversion.
FIGURE 2.8
Quantization and sampling.
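To make Figure 2.8 concrete, here is a minimal MATLAB sketch of the two steps, sampling and quantization, on a grayscale image. The file name 'scene.png', the sampling step of 4, and the choice of 8 gray levels are illustrative assumptions, not values taken from the book.
img = rgb2gray(imread('scene.png'));        % 'scene.png' is a placeholder name
% Sampling: keep every 4th pixel in each direction (coarser spatial grid)
sampled = img(1:4:end, 1:4:end);
% Quantization: map 256 gray levels down to 8 levels
levels = 8;
step = 256/levels;
quantized = uint8(floor(double(img)/step)*step);
figure, imshow(sampled), title('Sampled (every 4th pixel)');
figure, imshow(quantized), title('Quantized to 8 gray levels');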
TABLE 2.1
Bits Per Pixel versus Number of Colors
Bits per Pixel (bpp) Number of Colors
1 bpp 2 colors
2 bpp 4 colors
3 bpp 8 colors
4 bpp 16 colors
5 bpp 32 colors
6 bpp 64 colors
7 bpp 128 colors
8 bpp 256 colors
10 bpp 1024 colors
16 bpp 65,536 colors
24 bpp 16,777,216 colors (16.7 million colors)
32 bpp 4,294,967,296 colors (about 4.3 billion colors)
Now, let's try substituting the values for bpp. For bpp = 1, the formula gives
2^1 = 2. Hence, it is 2 colors. Similarly, substituting a higher number, bpp = 8,
gives 2^8 = 256 colors per pixel.
But, the point to remember is that all of the colors are none other than a
variant (shade) of R, G, and B. This is how all colors are derived.
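A quick MATLAB check of the relationship behind Table 2.1 (number of colors = 2 raised to the power of bpp) is sketched below; the list of bit depths simply mirrors the table rows.
bpp = [1 2 3 4 5 6 7 8 10 16 24 32];
num_colors = 2.^bpp;               % colors representable with each bit depth
disp([bpp' num_colors']);          % reproduces the values of Table 2.1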
The next topic to be discussed is intensity.
FIGURE 2.9
Image intensity. Note: All the boxes in the matrix represent the transition that happened from
black to white.
FIGURE 2.10
Image brightness.
FIGURE 2.11
Image contrast.
FIGURE 2.12
Resolution.
FIGURE 2.13
Pixel intensity.
TABLE 2.2
Pixel Density versus Pixels per Square Inch
Pixel Density (Pixels per Inch, ppi) Pixels per Square Inch
1 ppi 1
2 ppi 4 (double ppi = quadruple pixel count)
4 ppi 16
8 ppi 64
FIGURE 2.14
The color model fundamentals.
FIGURE 2.15
RGB from CMY.
FIGURE 2.16
Hue scale.
The hue scale is presented in Figure 2.16 and it ranges from 0 to 360 degrees
(also see Figure 2.17).
The next model to be discussed is the YUV model.
FIGURE 2.17
The HSV triangle.
FIGURE 2.18
RGB to YUV conversion.
The standard formula to derive the YUV from RGB is presented next:
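The formula itself is not reproduced at this point in the text. One widely used definition, taken from the ITU-R BT.601 convention, is given below; the exact constants assumed by the book may differ slightly.
Y = 0.299 R + 0.587 G + 0.114 B
U = 0.492 (B - Y)
V = 0.877 (R - Y)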
Having said this, the conversion from the RGB to the YUV happens in the
way as represented in Figure 2.18.
Is this done? No, we have more information to pay attention to.
There is a concept called chroma subsampling. What is chroma sub-
sampling? Chroma subsampling is a process that is connected to the
lessening of color resolution of video signals. The reason behind this is
very straightforward: to save the bandwidth. Chroma is also called color
component information. One can lessen or reduce this by sampling and
thereby comes the term chroma subsampling. It happens by sampling
at a lower rate than the brightness. As we know, brightness is all about
luminance.
But, when the color information is reduced, won’t it be detected by human
eyes? The answer is interesting. Human eyes are more sensitive to brightness
variation than to color variations and hence there is no problem!
Four different chroma sampling approaches are followed. They are 4:4:4,
4:2:2, 4:1:1, and 4:2:0. Let us understand these through the simple diagram-
matic representation in Figure 2.19, which represents the color conversion for
a 2 × 2 matrix, followed by the chroma subsampling models in Figures 2.20,
2.21, 2.22, and 2.23.
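As a small illustration of these schemes, the following MATLAB sketch converts an RGB image to YUV (using the BT.601 weights given earlier, which are an assumption) and then applies 4:2:0 subsampling. The file name 'scene.png' is a placeholder.
rgb = im2double(imread('scene.png'));
R = rgb(:,:,1); G = rgb(:,:,2); B = rgb(:,:,3);
Y = 0.299*R + 0.587*G + 0.114*B;   % luminance, kept at full resolution
U = 0.492*(B - Y);                 % chroma component
V = 0.877*(R - Y);                 % chroma component
% 4:2:0 keeps one U and one V sample per 2 x 2 block of pixels,
% i.e., a quarter of the chroma samples, while Y is untouched.
U420 = U(1:2:end, 1:2:end);
V420 = V(1:2:end, 1:2:end);
fprintf('Y: %d x %d, U and V after 4:2:0: %d x %d\n', ...
    size(Y,1), size(Y,2), size(U420,1), size(U420,2));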
Variants for the YUV model are also available and they are:
1. YCbCr
2. YPbPr
FIGURE 2.19
The RGB to YUV conversion process for a 2 × 2 matrix.
FIGURE 2.20
Chroma subsampling.
FIGURE 2.21
Chroma subsampling.
FIGURE 2.22
Chroma subsampling.
FIGURE 2.23
Chroma subsampling.
They are scaled versions of the YUV model, and hence, readers can pay less
attention to them. The subsequent topic for discussion is characteristics of
image operations.
2.7.1 Types of Operations
The actions one can carry out with an image to transform the input to an
output are defined as the types of operations. There are three fundamental
categories under these types of operations:
1. Point operation
2. Local operation
3. Global operation
First, let’s understand the point operation (Figure 2.24). Here, the output value
at a specific coordinate is dependent only on the input value at the same coor-
dinate and nothing else. An example would be apt here. See Figure 2.25.
The second operation to be understood is local operation. This is diagram-
matically presented in Figure 2.26. In this approach, not just a pixel but the
neighbors are considered and they are also in action. The output intensity
level at a pixel not only depends on the corresponding pixel at the same
coordinate but also on the neighboring pixels (Figure 2.27).
The next topic in queue is the global operation. What is it? Let’s unfold the
mystery! Here, the output value at a specific point is dependent on all the
values in the input image. This is diagrammatically presented in Figure 2.28.
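A minimal MATLAB sketch of the three categories is given below; the brightness offset, the 3 × 3 averaging mask, and histogram equalization are chosen only as representative examples, and the file name 'scene.png' is a placeholder.
img = im2double(rgb2gray(imread('scene.png')));
% Point operation: output at (x, y) depends only on the input at (x, y)
point_out = min(img + 0.2, 1);               % simple brightness increase
% Local operation: output at (x, y) depends on a neighborhood of (x, y)
local_out = conv2(img, ones(3)/9, 'same');   % 3 x 3 averaging filter
% Global operation: output at (x, y) depends on all input pixels
global_out = histeq(img);                    % uses the whole image histogram
figure, imshow(point_out), title('Point operation');
figure, imshow(local_out), title('Local operation');
figure, imshow(global_out), title('Global operation');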
The next subject to be discussed is neighborhoods.
FIGURE 2.24
Point operation.
FIGURE 2.25
A simple example of point operation.
FIGURE 2.26
Local operation.
FIGURE 2.27
Local operation with an implementation example.
2.7.2 Types of Neighborhoods
As the name suggests, this concerns the neighboring pixels. Based on the
neighboring pixels, the value of the considered pixel is altered. There are
two types of neighborhoods supported in modern-day image processing.
They are:
• Rectangular sampling
• Hexagonal sampling
FIGURE 2.28
Global operation.
FIGURE 2.29
Neighborhoods.
Pros
• Very versatile, and can support multiple colors and options such
as CMY and RGB.
• Quality is not compromised and it ensures perfect and complete
quality.
Cons
• Its very large size is a concern.
Pros
• Small size and reduced storage needs.
• Default image type in digital cameras, which enables more
photos to be stored.
• Apt for the websites and digital documents, as the image loads
faster.
• Compatible with most operating systems and is widely accepted.
In fact, this image type is indeed very popular.
Cons
• The discarded data is a huge concern. This could affect the
content and quality.
• May create room for false observations because of artifacts.
• Transparency is hurt when this kind of image is used.
Pros
• Reduced size for file storage.
• Quality retainment.
• Suitable for images where multiple color shades are not required.
Cons
• Very limited color options.
Pros
• Improved color range support compared to GIF.
• Increased transparency.
• Smaller file size than GIF.
Cons:
• File size still not smaller than JPEG.
2.8.5 RAW Format
People refer to a “raw” format for a number of reasons. Raw image content
is not processed and can’t be used directly. Raw images are the format of
the images immediately after creation, i.e., when you click a photo, before
processing, it would be a raw image. Also, since it is not processed, a raw
image cannot be printed. A RAW file is an uncompressed format, and since
there has been no processing, the file size is very large.
There are many types of RAW formats available on the market. They
include CR2 and CRW (both created by Canon), NEF (created by Nikon
Cameras), and PEF (created by Pentax Digital Cameras).
The next topic of discussion is the fundamental steps in digital image
processing.
1. Image acquisition.
The first and foremost step is to acquire the image. To
make it a bit more technical, this stage is where the image is made
available to the system for further processing.
FIGURE 2.30
Steps in digital image processing.
FIGURE 2.31
Components in digital image processing.
From the above two components, one can easily identify two sub-
components. One is the sensor that senses the energy being radiated
by the object that we want to capture (i.e., object of focus). The second
one is the digitizer that converts the sensed image in the analog form
to digital form, which would be apt for further processing.
2. Components involved in the processing stage.
A computer or workstation with appropriate software
installed is the requirement. The current trends are toward smaller
computing engines, which start from Raspberry Pi and go up to the
Intel NUC and Intel AI Vision kit, which are very specific and rich in
features, making them suitable for image processing applications.
3. Components for storage.
This is one of the most important areas of concern when it
comes to image processing. Mass storage is often required in image
processing applications. Based on the application being developed,
one can choose any of the following storage options:
a. In cases where faster processing is required, i.e., real time, one can
store the images in the memory available on the processor itself
or at the next level in memory cards, which are in closer proximity
to the processor. But the memory available on the processor is
normally not large, and hence this option should be chosen only
where real speed is required. However, the images should later be
stored in a secondary storage medium for future usage.
b. Online storage options are on the rise these days. One can even
store images in the cloud, and processing can be done there as
well. But cost factors are to be considered in this option; one may
have to pay for this storage. Also, if images are sensitive and have
confidential content, remote storage may not be preferred.
c. Storing the images after processing for future reference is called
archiving. Images stored after processing fall under this category, and
archiving can be carried out through secondary storage devices like drives.
In this chapter we have revisited the concepts behind image formation, types
of images, and quantization and sampling. In the next chapter, let’s focus our
attention toward the noise in an image, the types of image noise, and also the
possible remedial measures to subdue them.
2.11 Quiz
1. Human eyes are the inspiration for the camera’s invention. True or
false?
2. What is the basic principle behind capturing any image?
3. The _____ mosaic has the combination of RGB.
4. In the Bayer mosaic, R is ___%, G is ___%, and B is ___ %.
5. The reason for having green color sensors with 50% weightage in the
Bayer mosaic is _____.
6. Bpp stands for _____.
7. Brightness is _____.
8. _____ is the difference between the minimum and the maximum
pixel intensities in an image.
9. _____ can be defined as a system that makes use of the three pri-
mary colors (RGB) to produce a vast range of colors.
10. _____ normally refers to the size of the display in terms of the pixels.
11. _____ is the finest example of an additive color model, where red,
green, and blue are added together to produce a broad range of other
colors.
2.11.1 Answers
1. True
2. Reflection
3. Bayer
4. 25, 50, and 25
5. To replicate the human eye’s sensitivity to green
6. Bits per pixel
7. Relative
8. Contrast
9. Color model
10. Resolution
11. RGB color model
12. Pixel density
13. Hue, saturation, and value
14. True
15. False
2.12 Review Questions
1. Describe the fundamental principle behind image formation with
an appropriate figure.
2. Draw the sequence involved in image formation, technically.
3. What is the Bayer mosaic and how does it work?
4. Define bits per pixel.
5. Define brightness.
6. Define contrast.
7. Define resolution.
8. What is pixel density?
9. Differentiate resolution and pixel density.
2.12.1 Answers
1. When a light ray falls on any object, the basic tendency is to reflect it
back. We should realize that the eyes have a lens as well, just like a
camera, and this is the fundamental principle.
2.
3. The incoming light has RGB components. The R filter will filter only
the red component, and the G filter and B filter do the same with
their respective components. The combination of the three together
forms the image. One should remember that G is 50% of the total
content, whereas the two others contain 25% each.
14. All monitors and the television displays use this concept!
15. HSV:
• Hue – The color. It is signified as a point in a 360-degree color
circle.
• Saturation – This is directly connected to the intensity of the
color (gray range in the color space). It is normally represented in
terms of percentage. The range is from 0% to 100%, where 100%
signifies intense color.
• Value – This can also be called brightness and just like saturation
it is represented as a percentage. The range is from 0% to 100%,
where 0% represents black and 100% represents the brightest.
16. Luminance refers to the brightness and chrominance refers to the
color.
17. Chroma subsampling is a process that is connected to the lessening
of color resolution of video signals to save bandwidth. Chroma is
also called color component information. One can lessen or reduce
this by sampling and hence the term chroma subsampling. It hap-
pens by sampling at a lower rate than the brightness. As we know,
brightness is all about luminance.
18.
Further Reading
Adelson, E.H., Anderson, C.H., Bergen, J.R., Burt, P.J. and Ogden, J.M., 1984. Pyramid
methods in image processing. RCA Engineer, 29(6), pp. 33–41.
Baxes, G.A., 1984. Digital image processing: A practical primer. Cascade Press.
Ibraheem, N.A., Hasan, M.M., Khan, R.Z. and Mishra, P.K., 2012. Understanding
color models: A review. ARPN Journal of Science and Technology, 2(3), pp. 265–275.
Miano, J., 1999. Compressed image file formats: Jpeg, png, gif, xbm, bmp. Addison-Wesley
Professional.
Plataniotis, K.N. and Venetsanopoulos, A.N., 2013. Color image processing and
applications. Springer Science & Business Media.
Russ, J.C., 2016. The image processing handbook. CRC Press.
Solomon, C. and Breckon, T., 2011. Fundamentals of digital image processing: A practical
approach with examples in Matlab. John Wiley & Sons.
Walmsley, S.R., Lapstun, P. and Silverbrook Research Pty Ltd, 2006. Method and
apparatus for Bayer mosaic image conversion. U.S. Patent 7,061,650.
Walmsley, S.R., Lapstun, P. and Silverbrook Research Pty Ltd, 2006. Method for Bayer
mosaic image conversion. U.S. Patent 7,123,382.
Webb, J.A., 1992. Steps toward architecture-independent image processing. Computer,
25(2), pp. 21–31.
3
Image Noise: A Clear Understanding
Learning Objectives
After reading this chapter, the reader should have a clear understanding
about:
• Noise in an image
• Types of image noise
• How noise is created in an image?
• Remedial measures possible for noise
3.1 Introduction
It is always important to understand a definition first before diving deep into
a subject. This chapter is intended to create complete awareness about image
noise, its types, and related concepts.
Image noise is defined in the literature as “a variation or deviation of bright-
ness or color information in images”. Image noise is often referred to as digi-
tal noise. The source of image noise is often camera sensors and associated
internal electronic components. These components in cameras introduce
anomalies or imperfections in the image that certainly degrade the quality
of the image. These imperfections are referred to as image noise. Figure 3.1
is an image with no noise and Figure 3.2 has noise introduced to the image.
One can see the significant impact of noise on the image.
Generally, noise appears in an image because of any one or many of the
following reasons:
• Insufficient lighting
• Environmental conditions
• Sensor temperature
• Transmission channels
• Dust factors
The factors that influence image noise are depicted in Figure 3.3.
We will first cover the types of noise an image can be affected with. A
simple diagrammatic representation is presented in Figure 3.4.
The first type of noise to be discussed will be photoelectronic noise.
FIGURE 3.1
Image without noise.
FIGURE 3.2
Image with noise.
FIGURE 3.3
Factors influencing image noise.
FIGURE 3.4
Types of image noise.
3.2 Photoelectronic Noise
Photoelectronic noise includes photon noise and thermal noise. The follow-
ing sections will elaborate on both.
FIGURE 3.5
Photon noise.
3.2.2 Thermal Noise
Thermal noise is one of the most frequently appearing noises in any elec-
tronic circuit and the camera is no different. Thermal noise is also called
Johnson–Nyquist noise. Thermal noise is produced by random motion of the
charged carriers (i.e., electrons) in any conducting medium. One can observe
thermal noise in almost all electrical/electronic circuits. Thermal noise is
also known as white noise. It is referred to as white noise as it impacts all
the frequency components of the signal equally. Thermal noise naturally
increases with temperature. Try this out in MATLAB with the code snippet
in Figure 3.6. The image has to be fed in as input in the code.
The original image without noise (input image) is presented in Figure 3.1,
compared to the impact of this noise presented in Figure 3.7.
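Since the snippet of Figure 3.6 appears only as a figure, a minimal equivalent sketch is given below. Zero-mean Gaussian noise from the Image Processing Toolbox function imnoise is used here as a stand-in for the thermal (white) noise described above; the variance of 0.01 and the file name are assumptions, not the book's values.
img = imread('InputImage.png');              % placeholder file name
noisy = imnoise(img, 'gaussian', 0, 0.01);   % zero mean, variance 0.01
figure, imshow(img), title('Original image');
figure, imshow(noisy), title('Image with Gaussian (thermal-like) noise');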
FIGURE 3.6
MATLAB code snippet.
FIGURE 3.7
Impact of thermal noise
image will have many photons and therefore little photon noise. A dim or dull
image has more photon noise as it may not have many photons in action. This
kind of noise cannot be known or corrected in advance. Also, it is to be clearly
observed that photon noise is not related to the equipment or electronics asso-
ciated with it. It is totally dependent on the number of photons. To conclude,
the larger the number of photons collected, the lesser the noise.
Thermal noise can be reduced with careful reduction of temperature of
operation. Also, thermal noise gets reduced with a reduction of the resistor
values in the circuit.
3.3 Impulse Noise
The next type of noise one should know is impulse noise. One type of
impulse noise is salt-and-pepper noise.
Impulse noise is one of the noises that has always received a lot of atten-
tion from researchers. It is often regarded as a very important source for
3.3.1 Salt-and-Pepper Noise
Salt-and-pepper noise is peculiar. In this type of noise, the images have
dark pixels in the bright regions and bright pixels in the dark regions. The
main source and origin of this kind of noise is through analog-to-digital
converter errors. As cited earlier, bit transmission errors also cause salt-
and-pepper noise. The resulting impact is the image has a lot of black and
white spots. The noisy pixel in this noise type would have either a salt
value or pepper value. The salt value has a gray level of 255 (brightest)
and the pepper value has a gray level of 0 (darkest). The MATLAB code
for introducing salt-and-pepper noise is presented in Figure 3.8. The input
image is presented as in Figure 3.1. Figure 3.9 is the result after introducing
salt-and-pepper noise.
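A minimal equivalent of the Figure 3.8 snippet is sketched below, again using imnoise; the noise density of 0.05 and the file name are assumed values.
img = imread('InputImage.png');               % placeholder file name
noisy = imnoise(img, 'salt & pepper', 0.05);  % about 5% of pixels affected
figure, imshow(noisy), title('Salt-and-pepper noise');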
FIGURE 3.8
MATLAB code snippet for introducing salt-and-pepper noise.
FIGURE 3.9
Salt-and-pepper noise.
3.4 Structured Noise
Structured noise can be periodic stationary or periodic nonstationary in
nature. Or it can be aperiodic.
With periodic nonstationary noise, the noise parameters, including
amplitude, frequency, and phase, are varied across the image. This is mostly
caused by interference between the electronic components or electrical
components.
For periodic stationary noise, the noise parameters – amplitude, frequency
and phase – are fixed, unlike with nonstationary noise. Interference between
the components causes this noise, as with nonstationary noise.
When an image is affected by periodic noise, it appears like a repeating
pattern added on top of the original image. When someone wants to analyze
in the frequency domain, it appears like discrete spikes. Notch filters are
used to minimize the impact of the periodic noise.
If the noise is aperiodic, the pattern in the image is not repetitive in nature.
The code for inducing structured noise is presented in Figure 3.10, and the
results obtained upon running the code are also presented in Figure 3.11 and
Figure 3.12.
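Since the Figure 3.10 code is shown only as a figure, the sketch below adds a sinusoidal ripple to an image in the same spirit; the ripple frequency, its amplitude, and the file name are assumed values.
img = im2double(rgb2gray(imread('InputImage.png')));   % placeholder file name
[rows, cols] = size(img);
[x, y] = meshgrid(1:cols, 1:rows);
ripple = 0.2*sin(2*pi*x/16);        % periodic pattern: stripes every 16 pixels
noisy = img + ripple;
figure, imshow(ripple, []), title('Ripples to be injected as noise');
figure, imshow(noisy, []), title('Image with structured noise');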
In this chapter we have focused on noise in an image, the types of noise,
and the possible ways to overcome them. Let’s proceed to the next chapter
to further understand the importance of edge detection in image processing,
various edge detection operators, and also the pros and cons of various edge
detection techniques.
FIGURE 3.10
MATLAB code snippet for inducing structured noise.
FIGURE 3.11
Ripples to be injected in the original image as noise.
FIGURE 3.12
The structured noise effect
3.5 Quiz
1. _____ is a variation or deviation of brightness or color information in
an image.
2. The source of the image noise is mostly from _____.
3. Which of the following are sources of noise?
a. Insufficient lighting
b. Environmental conditions
c. Sensor temperature
d. All the above
e. None of the above
4. Photon noise is also called _____.
5. _____ noise is created with the uncertainty associated with the mea-
surement of light.
6. Photon noise is fully dependent on _____.
7. _____ is produced by random motion of the charged carriers (called
electrons) in any conducting medium.
8. Image impulse noise mainly arises due to the _____.
3.5.1 Answers
1. Noise
2. Camera sensors and electronic components
3. d. All the above
4. Shot noise or Poisson noise
5. Photon noise
6. Number of photons falling on the charge-coupled device (CCD) of
the camera.
7. Thermal noise.
8. Missed transmission of signals.
3.6 Review Questions
1. Define noise with respect to images.
2. What are the major types of noise identified with respect to images?
3.6.1 Answers
Further Reading
Boncelet, C., 2009. Image noise models. In The essential guide to image processing (pp.
143–167). Academic Press.
Mastin, G.A., 1985. Adaptive filters for digital image noise smoothing: An evaluation.
Computer Vision, Graphics, and Image Processing, 31(1), pp. 103–121.
Narendra, P.M., 1981. A separable median filter for image noise smoothing. IEEE
Transactions on Pattern Analysis & Machine Intelligence, 1, pp. 20–29.
Peters, R.A., 1995. A new algorithm for image noise reduction using mathematical
morphology. IEEE Transactions on Image Processing, 4(5), pp. 554–568.
Rank, K., Lendl, M. and Unbehauen, R., 1999. Estimation of image noise variance. IEE
Proceedings – Vision, Image and Signal Processing, 146(2), pp. 80–84.
Toh, K.K.V., Ibrahim, H. and Mahyuddin, M.N., 2008. Salt-and-pepper noise detec-
tion and reduction using fuzzy switching median filter. IEEE Transactions on
Consumer Electronics, 54(4), pp. 1956–1961.
4
Edge Detection: From a Clear Perspective
Learning Objectives
After reading this chapter, the reader should have a clear understanding
about:
4.1 Introduction
What is edge detection? This question has to be answered before going
deeper into the topic.
Edges are defined as “sudden and significant changes in the intensity” of
an image. These changes happen at the boundaries of objects in an
image. In Figure 4.1, the mug is seen in the input image, and once the edges
are detected, one can find out the exact layout or boundary of the object. The
edges are detected based on the significant change in intensity between the
FIGURE 4.1
What are edges?
objects in the image. To be precise, the mug has a different intensity from the
gray background, and this difference is the key idea used to identify the edges. So, if there
are many objects in an image, edges are the easiest way to identify all of them.
1. One can understand the shape of objects in the image only when the
edges are detected. So, ideally to understand an object and its shape,
it becomes inevitable for someone to detect the edges.
2. There are many technical issues and challenges mapped to the seg-
mentation, registration, and object identification techniques. Edges
prove to be efficient with these techniques at fundamental levels.
a. Step edges/discontinuity
b. Line edges/discontinuity
c. Ramp edges/discontinuity
d. Roof edges/discontinuity
We shall understand all these one after another with appropriate diagram-
matic support.
FIGURE 4.2
Step edge.
FIGURE 4.3
Line edge.
c. Step edges are noted as ramp edges when the intensity changes are
not immediate but occur over a finite distance gradually for a longer
duration/distance (Figure 4.4).
d. Line edges are noted as roof edges when the intensity changes are
not immediate but occur over a finite distance gradually for a longer
duration/distance (Figure 4.5).
FIGURE 4.4
Ramp edge.
FIGURE 4.5
Roof edge.
FIGURE 4.6
Types of edges.
1. Image smoothing
2. Edge points detection
3. Edge localization
FIGURE 4.7
Steps in edge detection.
FIGURE 4.8
Image smoothing.
FIGURE 4.9
Edge points detection.
FIGURE 4.10
Edge localization.
Now, it is the time to understand all the available and frequently used edge
detection algorithms. We will start with the Sobel operator.
4.5 Sobel Operator
Using the Sobel edge detector, the image is processed in the X and Y direc-
tions. This results in the formation of a new image, which is the
sum of the X and Y edges of the image. This approach works through calcu-
lation of the gradient of the image intensity at every pixel within the image.
Let's get into the math. The Sobel filter has two kernels (3 × 3 matrices). One
of them corresponds to the X (horizontal) direction and the other is used for the Y
(vertical) direction. These two kernels should be convoluted with the origi-
nal image under process, through which the edge points are calculated
with ease. An example is presented below. The kernel values shown are fixed
for the Sobel filter and cannot be altered.
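For reference, the standard 3 × 3 Sobel kernels (the same values appear later in this chapter as KGx and KGy in the Canny code listing) are:
−1 0 1
−2 0 2
−1 0 1
Sobel – X
1 2 1
0 0 0
−1 −2 −1
Sobel – Y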
The Gaussian filter plays a vital role in the entire process. The fundamental
idea behind the Gaussian filter is the center having more weight than the
rest. The general approach for detecting the edges is with the first-order or
second-order derivatives as shown in Figure 4.11.
FIGURE 4.11
Sobel-X and Sobel-Y.
time. Then, the shift happens and the move has to be made toward the right,
i.e., column by column until the last column. A similar process is followed for
the row shifting from top to bottom. Remember, for columns it is left to right
movement, whereas for rows it is top to bottom movement with the Gx and
Gy. This is mathematically explained next with the aid of Figure 4.12.
The matrix in Figure 4.12a is a 5 × 4 image that is being convoluted with the
Gx (Sobel-X) operator. In the considered 5 × 4 matrix, we take only the first 3
× 3 matrix, in which the center value (I) is computed.
Upon the convolution, the resultant matrix in Figure 4.12b should be obtained.
FIGURE 4.12
(a) In the given 5 × 4 image being convoluted with the Gx (Sobel-X) operator, we take only the
first 3 × 3 matrix and the center value (I) is computed. (b) Upon convolution, this resultant
matrix Gx shall be obtained. (c) The resultant matrix Gy upon convolution.
The Sobel-X values are convoluted with the original image matrix values.
Hence, the resultant matrix values would be as follows:
G = √(Gx² + Gy²)
The G will be compared against the threshold, with which one can
determine whether the point in question is an edge.
The implementation of Sobel edge detection in MATLAB is presented
next with the aid of Figures 4.13, 4.14, and 4.15.
FIGURE 4.13
Input image.
FIGURE 4.14
The midway results – gradient magnitude – of the implementation of Sobel edge detection in
MATLAB.
FIGURE 4.15
The final detected edges of the implementation of Sobel edge detection in MATLAB.
The MATLAB code for Sobel edge detection is presented in the following:
A=imread('InputImage.png');
B=rgb2gray(A);
figure,imshow(B)
C=double(B);
for i=1:size(C,1)-2
for j=1:size(C,2)-2
%Sobel mask for x-direction:
Gx=((2*C(i+2,j+1)+C(i+2,j)+C(i+2,j+2))-(2*C(i,j+1)+C(i,j)+C(i,j+2)));
%Sobel mask for y-direction:
Gy=((2*C(i+1,j+2)+C(i,j+2)+C(i+2,j+2))-(2*C(i+1,j)+C(i,j)+C(i+2,j)));
%The gradient of the image
B(i,j)=sqrt(Gx.^2+Gy.^2);
end
end
figure,imshow(B); title('Sobel Gradient');
Thresh=100;
B=max(B,Thresh);
B(B==round(Thresh))=0;
B=uint8(B);
figure,imshow(~B);title('Edge detected Image');
1 0 −1
1 0 −1
1 0 −1
Prewitt – X
−1 −1 −1
0 0 0
1 1 1
Prewitt – Y
The implementation and code for the Prewitt x and y operators are as
follows:
A=imread('InputImage.png');
B=rgb2gray(A);
figure,imshow(B)
C=double(B);
for i=1:size(C,1)-2
for j=1:size(C,2)-2
%Prewitt mask for x-direction:
Gx=((1*C(i+2,j+1)+C(i+2,j)+C(i+2,j+2))-(1*C(i,j+1)+C(i,j)+C(i,j+2)));
%Prewitt mask for y-direction:
Gy=((1*C(i+1,j+2)+C(i,j+2)+C(i+2,j+2))-(1*C(i+1,j)+C(i,j)+C(i+2,j)));
%The gradient of the image
%B(i,j)=abs(Gx)+abs(Gy);
B(i,j)=sqrt(Gx.^2+Gy.^2);
end
end
figure,imshow(B); title('Prewitt Gradient');
Thresh=100;
B=max(B,Thresh);
B(B==round(Thresh))=0;
B=uint8(B);
figure,imshow(~B);title('Edge detected Image');
FIGURE 4.16
Input image.
FIGURE 4.17
The Prewitt gradient.
FIGURE 4.18
Detected edges.
• North
• North west
• West
• South west
• South
• South east
• East
• North east
The masks are framed as was discussed earlier for the Sobel operator.
Figure 4.19 shows the mask matrices for the directions.
The results obtained are presented in Figures 4.20 and 4.21.
FIGURE 4.19
Robinson mask.
FIGURE 4.20
Input image.
FIGURE 4.21
Output image.
clear all;
clc;
close all;
bw4=imread('IMage.png');
bw4=rgb2gray(bw4);
figure(1)
imshow(bw4)
title('Input Image')
t=1200 ;
bw5=double(bw4);
[m,n]=size(bw5);
g=zeros(m,n);
for i=2:m-1
for j=2:n-1
d1 =(5*bw5(i-1,j-1)+5*bw5(i-1,j)+5*bw5(i-1,j+1)-3*bw5(i,j-1)-3*bw5(i,j+1)-3*bw5(i+1,j-1)-3*bw5(i+1,j)-3*bw5(i+1,j+1))^2;
d2 =((-3)*bw5(i-1,j-1)+5*bw5(i-1,j)+5*bw5(i-1,j+1)-3*bw5(i,j-1)+5*bw5(i,j+1)-3*bw5(i+1,j-1)-3*bw5(i+1,j)-3*bw5(i+1,j+1))^2;
FIGURE 4.22
Krisch mask structure.
d3 =((-3)*bw5(i-1,j-1)-3*bw5(i-1,j)+5*bw5(i-1,j+1)-3*bw5(i,j-1)+5*bw5(i,j+1)-3*bw5(i+1,j-1)-3*bw5(i+1,j)+5*bw5(i+1,j+1))^2;
d4 =((-3)*bw5(i-1,j-1)-3*bw5(i-1,j)-3*bw5(i-1,j+1)-3*bw5(i,j-1)+5*bw5(i,j+1)-3*bw5(i+1,j-1)+5*bw5(i+1,j)+5*bw5(i+1,j+1))^2;
d5 =((-3)*bw5(i-1,j-1)-3*bw5(i-1,j)-3*bw5(i-1,j+1)-3*bw5(i,j-1)-3*bw5(i,j+1)+5*bw5(i+1,j-1)+5*bw5(i+1,j)+5*bw5(i+1,j+1))^2;
d6 =((-3)*bw5(i-1,j-1)-3*bw5(i-1,j)-3*bw5(i-1,j+1)+5*bw5(i,j-1)-3*bw5(i,j+1)+5*bw5(i+1,j-1)+5*bw5(i+1,j)-3*bw5(i+1,j+1))^2;
d7 =(5*bw5(i-1,j-1)-3*bw5(i-1,j)-3*bw5(i-1,j+1)+5*bw5(i,j-1)-3*bw5(i,j+1)+5*bw5(i+1,j-1)-3*bw5(i+1,j)-3*bw5(i+1,j+1))^2;
d8 =(5*bw5(i-1,j-1)+5*bw5(i-1,j)-3*bw5(i-1,j+1)+5*bw5(i,j-1)-3*bw5(i,j+1)-3*bw5(i+1,j-1)-3*bw5(i+1,j)-3*bw5(i+1,j+1))^2;
g(i,j)=round(sqrt(d1+d2+d3+d4+d5+d6+d7+d8));
end
end
for i=1:m
for j=1:n
if g(i,j)>t
bw5(i,j)=255;
else
bw5(i,j)=0;
end
end
end
figure(2)
imshow(bw5)
title('kirsch Edge Detection')
The results obtained are presented in Figure 4.23, which represents the
input image, and in Figure 4.24, which represents the resultant output.
One can follow the same code with change in the mask values to get the
Robinson mask implemented.
The next edge detector to be discussed is the Canny edge detector.
FIGURE 4.23
Input image.
FIGURE 4.24
Resultant output image.
1. Conversion to grayscale.
Let us take a sample image and proceed with the conversion. We have con-
verted the input RGB image (Figure 4.25) to a grayscale image (Figure 4.26).
2. Gaussian blur.
The Gaussian blur is an operator that helps in removing the noise in the
input image. This noise-removed image enables further processing to be
smooth and flawless. The sigma value has to be appropriately set for better
results (Figure 4.27).
FIGURE 4.25
Input image.
FIGURE 4.26
Grayscale converted image result.
Let's go back to the basics. The Sobel filter is to be used in this process. Let's
recall what an edge is all about: a sudden intensity change is the edge, i.e., a
significant change in pixel intensity marks an edge.
Next, the Sobel operator has to be applied over the input image, and
the steps and sequences remain the same as per the process explained
for Sobel edge detection. The resultant Sobel-operated image is presented
in Figure 4.28 and is referred to as the gradient magnitude of the image.
FIGURE 4.27
Gaussian blur operated image.
FIGURE 4.28
The Sobel gradient.
G = √(Gx² + Gy²)
The G will be compared against the threshold, with which one can deter-
mine whether the point in question is an edge or not.
The formula for finding the edge direction is Theta = arctan(Gy/Gx).
We have the edge direction already available to us. The subsequent step is
to relate the identified edge direction to a direction that can be sketched in
the image, i.e., ideally, it is a prediction of how the movement of edges could
happen.
An example is always handy and we have taken a 3 × 3 matrix as a refer-
ence. It’s all about the colors and Figure 4.29 is to be visualized as a 3 × 3
matrix for the scenario being discussed.
The possible directions of movement are presented in Figure 4.30. The cen-
ter cell is the region of interest for us. It is important to understand this point.
There can be only four possible directions for any pixel. They are:
• 0 degrees
• 45 degrees
FIGURE 4.29
A 3 × 3 matrix example.
• 90 degrees
• 135 degrees
Hence, it forces us into a situation where the edge has to be definitely ori-
ented toward one of these four directions. This is a kind of approximation.
For example, if the orientation angle is observed to be 5 degrees, it is taken as
0 degrees. Similarly, if it is 43 degrees, it should be rounded off to 45 degrees.
For ease of understanding we have drawn a semicircle with color shading
that represents 180 degrees (Figure 4.31). (But, the actual scenario is for 360
degrees.)
With Figure 4.31 as reference, the following rules are stated:
1. Any edge that comes under the yellow range is set to 0 degrees
(which means from 0 to 22.5 degrees and 157.5 to 180 degrees are set
to 0 degrees).
2. Any edge that comes under the green range is all set to 45 degrees
(which means 22.5 degrees to 67.5 degrees is set as 45 degrees).
3. Any edge coming under the blue range is all set to 90 degrees (which
means 67.5 degrees to 112.5 degrees is set as 90 degrees).
4. Any edge coming under the red range is all set to 135 degrees (which
means 112.5 degrees to 157.5 degrees is set as 135 degrees).
After this process, the direction of the edges is mapped to any of the four
directions mentioned earlier. The input image now should look like the one
FIGURE 4.30
The possible directions of movement.
FIGURE 4.31
A semicircle is drawn with color shading that represents 180 degrees. (But, the actual scenario
is for 360 degrees.)
FIGURE 4.32
The color map.
FIGURE 4.33
Results before double thresholding.
As one can see from the previous stage's results, non-maximum suppression
has not provided excellent results. There is still some noise. The image
even raises the suspicion that some of the edges shown may not really be
there and that some edges could be missed in the process. Hence, there has to be a
process to address this challenge. The process to be followed is thresholding.
We need to go with double thresholding and in this process we have to set
two thresholds: one high and another low. Assume a high threshold value
as 0.8. Any pixel with a value above 0.8 is to be seen as a stronger edge. The
lower threshold can be 0.2. In that case, any pixel below this value is not an
edge at all and hence set them all to 0.
Now comes the next question: What about the values in between, from 0.2
to 0.8? They may or may not be edges. They are referred to as weak edges.
There has to be a process to determine which of the weak edges are actual
edges so as not to miss them.
6. Edge tracking.
Weak edges that are connected to strong edges are kept; all the remaining
weak edges can be removed and that is it. The process
is complete. Once this process is done, we should get the output image in
Figure 4.34.
FIGURE 4.34
Final result with edges detected.
The MATLAB code for the Canny edge detection is presented below:
clear all;
clc;
%Input image
img = imread ('IMage.png');
%Show input image
figure, imshow(img);
img = rgb2gray(img);
figure, imshow(img);
img = double (img);
%Value for Thresholding
T_Low = 0.075;
T_High = 0.175;
%Gaussian Filter Coefficient
B = [2, 4, 5, 4, 2; 4, 9, 12, 9, 4; 5, 12, 15, 12, 5; 4, 9, 12, 9, 4; 2, 4, 5, 4, 2];
B = 1/159.* B;
%Convolution of image by Gaussian Coefficient
A=conv2(img, B, 'same');
A=uint8(A)
figure,imshow(A);
%Filter for horizontal and vertical direction
KGx = [-1, 0, 1; -2, 0, 2; -1, 0, 1];
KGy = [1, 2, 1; 0, 0, 0; -1, -2, -1];
%Convolution by image by horizontal and vertical filter
Filtered_X = conv2(A, KGx, 'same');
Filtered_Y = conv2(A, KGy, 'same');
%Calculate directions/orientations
arah = atan2 (Filtered_Y, Filtered_X);
arah = arah*180/pi;
pan=size(A,1);
leb=size(A,2);
%Adjustment for negative directions, making all directions positive
for i=1:pan
for j=1:leb
if (arah(i,j)<0)
arah(i,j)=360+arah(i,j);
end;
end;
end;
arah2=zeros(pan, leb);
%Adjusting directions to nearest 0, 45, 90, or 135 degree
for i = 1 : pan
for j = 1 : leb
if ((arah(i, j) >= 0 ) && (arah(i, j) < 22.5) || (arah(i, j) >= 157.5) && (arah(i, j) < 202.5) || (arah(i, j) >= 337.5) && (arah(i, j) <= 360))
arah2(i, j) = 0;
elseif ((arah(i, j) >= 22.5) && (arah(i, j) < 67.5) || (arah(i, j) >= 202.5) && (arah(i, j) < 247.5))
arah2(i, j) = 45;
elseif ((arah(i, j) >= 67.5 && arah(i, j) < 112.5) || (arah(i, j) >= 247.5 && arah(i, j) < 292.5))
arah2(i, j) = 90;
elseif ((arah(i, j) >= 112.5 && arah(i, j) < 157.5) || (arah(i, j) >= 292.5 && arah(i, j) < 337.5))
arah2(i, j) = 135;
end;
end;
end;
figure, imagesc(arah2); colorbar;
%Calculate magnitude
magnitude = (Filtered_X.^2) + (Filtered_Y.^2);
magnitude2 = sqrt(magnitude);
BW = zeros (pan, leb);
%Non-Maximum Suppression
for i=2:pan-1
for j=2:leb-1
if (arah2(i,j)==0)
BW(i,j) = (magnitude2(i,j) == max([magnitude2(i,j), magnitude2(i,j+1), magnitude2(i,j-1)]));
elseif (arah2(i,j)==45)
BW(i,j) = (magnitude2(i,j) == max([magnitude2(i,j), magnitude2(i+1,j-1), magnitude2(i-1,j+1)]));
elseif (arah2(i,j)==90)
BW(i,j) = (magnitude2(i,j) == max([magnitude2(i,j), magnitude2(i+1,j), magnitude2(i-1,j)]));
elseif (arah2(i,j)==135)
BW(i,j) = (magnitude2(i,j) == max([magnitude2(i,j), magnitude2(i+1,j+1), magnitude2(i-1,j-1)]));
end;
end;
end;
BW = BW.*magnitude2;
figure, imshow(BW);
%Hysteresis Thresholding
T_Low = T_Low * max(max(BW));
FIGURE 4.35
Zero crossing.
0 –1 0
–1 4 –1
0 –1 0
–1 –1 –1
–1 8 –1
–1 –1 –1
The process remains the same as with the other edge detectors. With
the above two kernels, the operations can be carried out.
The MATLAB code and the sample results for this edge detector are pre-
sented next:
clear
clear all;
a=imread('IMage.png');
a=rgb2gray(a);
[r c]=size(a)
a=im2double(a);
%filter=[0 -1 0;-1 4 -1; 0 -1 0];
filter=[-1 -1 -1;-1 8 -1; -1 -1 -1];
result=a;
for i=2:r-1
for j=2:c-1
sum=0;
row=0;
col=1;
for k=i-1:i+1
row=row+1;
col=1;
for l=j-1:j+1
sum = sum+a(k,l)*filter(row,col);
col=col+1;
end
end
result(i,j)=sum;
end
end
figure, imshow(result);
Figures 4.36 and 4.37 reveal the edges for the input images through usage
of the two aforementioned kernels.
There are, however, some disadvantages to the Laplacian approach:
1. Two-pixel-thick edges are produced (refer to Figures 4.36 and 4.37).
2. It is highly sensitive to noise.
Table 4.1 compares the various edge detection techniques, and lists the pros
and cons of each technique. We have learned the importance of edge detection
in image processing and related topics in this chapter. Now it's time to explore
the frequency domain and the significant role played by different filters on a
sample image, along with their comparison. Let's move on to the next chapter.
FIGURE 4.36
Edge detection – Kernel 1.
FIGURE 4.37
Edge detection – Kernel 2.
TABLE 4.1
Comparison of Edge Detection Techniques
Edge Detector Crisp Note Parameter Advantages Disadvantages
4.11 Review Questions
1. Define edges.
2. What is the actual need to detect edges?
3. What are the four types of edges discussed in this chapter?
4. Define step edge.
5. Define ramp edge.
6. What are three important steps in the edge detection process?
7. What is image smoothing?
8. How is the Sobel operator different from the Prewitt?
9. What is a Robinson compass mask?
10. How is a Robinson mask different from the Krisch edge detector?
11. What are the major disadvantages of the Laplacian edge detector?
4.11.1 Answers
8. The Prewitt edge detector is similar to the Sobel edge detector but
with a minor change. The Prewitt operator gives values that are
symmetric around the center, whereas the Sobel operator gives more
weight to the points that lie closer to (x, y).
9. The reason behind the name is simple. In this approach, we take one
mask and rotate it in all possible eight directions, and hence
it is regarded as a compass mask. The compass directions consid-
ered are north, north west, west, south west, south, south east, east,
and north east.
10. The Krisch edge detector is again used for finding edges, like the Robinson
edge detector. Every feature remains the same other than the ability
to change the mask as per the requirements in the Krisch edge
detector. This makes it more flexible compared to the Robinson mask.
11. The major disadvantages of the Laplacian edge detector are (a) two-
pixel-thick edges are produced in this method and (b) it is very
highly sensitive to noise.
Further Reading
Canny, J., 1986. A computational approach to edge detection. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 6, pp. 679–698.
Gupta, S. and Mazumdar, S.G., 2013. Sobel edge detection algorithm. International
Journal of Computer Science and Management Research, 2(2), pp. 1578–1583.
Jin-Yu, Z., Yan, C. and Xian-Xiang, H., 2009, April. Edge detection of images based
on improved Sobel operator and genetic algorithms. In 2009 International
Conference on Image Analysis and Signal Processing (pp. 31–35). IEEE.
Lindeberg, T., 1998. Edge detection and ridge detection with automatic scale selec-
tion. International Journal of Computer Vision, 30(2), pp. 117–156.
Marr, D. and Hildreth, E., 1980. Theory of edge detection. Proceedings of the Royal
Society of London. Series B. Biological Sciences, 207(1167), pp. 187–217.
Torre, V. and Poggio, T.A., 1986. On edge detection. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 2, pp. 147–163.
Vijayarani, S. and Vinupriya, M., 2013. Performance analysis of Canny and Sobel
edge detection algorithms in image mining. International Journal of Innovative
Research in Computer and Communication Engineering, 1(8), pp. 1760–1767.
Vincent, O.R. and Folorunso, O., 2009, June. A descriptive algorithm for Sobel image
edge detection. In Proceedings of Informing Science & IT Education Conference
(InSITE) (Vol. 40, pp. 97–107). Informing Science Institute.
Ziou, D. and Tabbone, S., 1998. Edge detection techniques-an overview. Pattern
Recognition and Image Analysis C/C of Raspoznavaniye Obrazov I Analiz Izobrazhenii,
8, pp. 537–559.
5
Frequency Domain Processing
Learning Objectives
After reading this chapter, the reader should have a clear understanding
about:
5.1 Introduction
In science or in control systems or in the image processing field, the term
frequency domain is very important and is frequently employed. It means
“analysis of the signals or functions” with respect to the frequency instead of
“time”. When the frequency domain is discussed, the next question could be:
What is the difference between spatial and frequency domains? The answer
is simple. In the spatial domain, the value of the pixels in an image normally
changes with respect to the scene. It is “dealing with the image as it is”. In
the spatial domain, it means working on the intensity values of the image
pixels. But when it comes to the frequency domain, it is about working on the transform coefficients of the image, instead of directly manipulating the pixels as in the spatial domain. To make the difference between the spatial domain and
the frequency domain clearer, refer to Figure 5.1.
DOI: 10.1201/9781003217428-5 107
The frequency domain can be explained with the help of the following points:
1. Here, the first step is all about conversion of an image from the spa-
tial domain to frequency domain. Normally, one would use fast
Fourier transform toward conversion of the spatial domain content
to the frequency domain. (This is represented as “direct transforma-
tion” in Figure 5.1.)
2. Also, it is common practice to use low-pass filters for smoothing and
high-pass filters for sharpening images (“frequency filter” in Figure 5.1).
3. After the processing is carried out up to this stage, the results avail-
able in hand are not ideal to be displayed as the output image. Hence,
the inverse transformation is carried out to ensure the output image
is accurately displayed.
Fourier transform is the vital component in the entire process. This frequency domain analysis is normally carried out to understand how the signal energy gets distributed over a range of frequencies.
FIGURE 5.1
Spatial versus frequency domain.
Step 1: Use the fast Fourier transform to take the grayscale image into the frequency domain. (Here is where the input image gets transformed into its frequency representation.) At the end of this step, one should have the spectrum available for further processing.
Step 2: The spectrum available might not be an easier choice to operate
with the filters. Also, it is not an ideal choice for humans to visualize.
Hence, there arises the need of further processing. Shifting the zero
frequency component to the center of the spectrum is the step to be
carried out.
Step 3: Apply the corresponding filters, such as low pass or high pass,
based on the requirements to select/filter the frequencies.
Step 4: It is time for decentralization. One should get things back on
track through this decentralization process.
Step 5: Apply the inverse fast Fourier transform, which enables
conversion of the spectrum to a grayscale image, referred to as an
output image. (Frequency to spatial transformation happens in the
last step.)
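As a minimal sketch of these five steps (assuming NumPy and OpenCV are available; the file names, the cut-off radius D0 = 30, and the use of an ideal low-pass mask in Step 3 are all illustrative placeholders, not a prescribed implementation):

import cv2
import numpy as np

# Step 1: read a grayscale image and move it to the frequency domain
image = cv2.imread('test1.png', cv2.IMREAD_GRAYSCALE).astype(np.float64)
spectrum = np.fft.fft2(image)

# Step 2: shift the zero-frequency component to the center of the spectrum
centered = np.fft.fftshift(spectrum)

# Step 3: apply a frequency filter (here, an ideal low-pass mask of radius D0)
rows, cols = image.shape
u = np.arange(rows) - rows / 2
v = np.arange(cols) - cols / 2
V, U = np.meshgrid(v, u)
D = np.sqrt(U**2 + V**2)
D0 = 30
H = (D <= D0).astype(np.float64)
filtered = centered * H

# Step 4: decentralize, i.e., undo the shift
decentralized = np.fft.ifftshift(filtered)

# Step 5: inverse FFT back to the spatial domain and save the output image
output = np.real(np.fft.ifft2(decentralized))
cv2.imwrite('output.png', np.clip(output, 0, 255).astype(np.uint8))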
FIGURE 5.2
Frequency domain process.
$$H(u, v) = \begin{cases} 1 & D(u, v) \le D_0 \\ 0 & D(u, v) > D_0 \end{cases}$$
where D0 is a positive constant and the filter retains all the frequency compo-
nents within the radius D0. All the frequency components outside the radius
of the circle will be eliminated. In most cases, it will be the high-frequency
components eliminated. Within the circle, the retention happens without
attenuation. D0 is referred to as the cut-off frequency.
D(u, v) is the Euclidean distance from any point (u, v) to the origin of the
frequency plane. Refer to Figure 5.4 to understand the impact of the D0 value
variations. Note that if the value of D0 is very minimal, there is a risk of los-
ing the core information and the image would be over blurred or smoothed.
Hence, it is important to choose the apt D0 value.
Figure 5.5 shows the earlier Figure 5.2 included with the ideal low-pass
filter. The MATLAB code for the ideal low-pass filter is given next, and its
corresponding input and output images are illustrated in Figure 5.6.
FIGURE 5.3
Low-pass filter results.
FIGURE 5.4
Ideal low-pass filter.
FIGURE 5.5
Ideal low-pass filter.
FIGURE 5.6
Ideal low-pass filter.
input_image = imread('test1.png');
[row, col] = size(input_image);
% Fourier transform of the input image
FT_img = fft2(double(input_image));
% Cut-off Frequency
D0 = 20;
% Filter Design
% Set the variables range
u_row = 0:(row-1);
% indices for meshgrid to transform the domain
index = find(u_row>row/2);
u_row(index) = u_row(index)-row;
% Set the variables range
v_col = 0:(col-1);
% indices for meshgrid to transform the domain
idy = find(v_col>col/2);
v_col(idy) = v_col(idy)-col;
% 2-D frequency grid and Euclidean distance from the origin
[V, U] = meshgrid(v_col, u_row);
D = sqrt(U.^2 + V.^2);
% Filtering mask: 1 inside the radius D0, 0 outside
H = double(D <= D0);
% Apply the filter in the frequency domain (convolution in the spatial domain)
G = H.*FT_img;
% Inverse Fourier transform to obtain the output image
output_image = real(ifft2(double(G)));
figure, imshow(input_image), title('Input image');
figure, imshow(output_image, []), title('Ideal low-pass filtered image');
$$H(u, v) = \frac{1}{1 + \left[ D(u, v)/D_0 \right]^{2n}}$$
FIGURE 5.7
Butterworth low-pass filter.
FIGURE 5.8
Butterworth low-pass filter.
input_image = imread('test1.png');
[row, col] = size(input_image);
% Fourier transform of the input image
FT_img = fft2(double(input_image));
% Order of the Butterworth filter
n = 2;
% Cut-off Frequency
D0 = 20;
% Filter Design
% Set the variables range
u_row = 0:(row-1);
% indices for meshgrid to transform the domain
index = find(u_row>row/2);
u_row(index) = u_row(index)-row;
% Set the variables range
v_col = 0:(col-1);
% indices for meshgrid to transform the domain
idy = find(v_col>col/2);
v_col(idy) = v_col(idy)-col;
% 2-D frequency grid and Euclidean distance from the origin
[V, U] = meshgrid(v_col, u_row);
D = sqrt(U.^2 + V.^2);
% Butterworth low-pass filtering mask
H = 1./(1 + (D./D0).^(2*n));
% Apply the filter in the frequency domain
G = H.*FT_img;
% Inverse Fourier transform to obtain the output image
output_image = real(ifft2(double(G)));
figure, imshow(input_image), title('Input image');
figure, imshow(output_image, []), title('Butterworth low-pass filtered image');
FIGURE 5.9
Gaussian low-pass filter.
The transfer function for the Gaussian low-pass filter is presented as follows:
$$H(u, v) = e^{-D^2(u, v)/2D_0^2}$$
where D0 specifies the cut-off frequency as prompted by the user and D (u, v)
is the Euclidean distance from any point (u, v) to the origin of the frequency
plane. It is important to know the distance of every element of the transfer
function to the origin (0,0).
Now let us apply σ in place of D0. If σ is assigned, then H(u, v) becomes
$$H(u, v) = e^{-D^2(u, v)/2\sigma^2}$$
input_image=double(imread('test1.png'));
FFT_img1=fft2(input_image);
figure(1),imshow(uint8(input_image))
figure(2),imagesc(log(1+abs(FFT_img1)))
[height,width]=size(input_image);
for x=1:1:height
for y=1:1:width
f(x,y)=input_image(x,y)*(-1)^(x+y);
end
end
F=fft2(f);
figure(3),imagesc(log(1+abs(F)))
D0=35;
n=2;
for u=1:1:height
for v=1:1:width
D(u,v)=sqrt((u-height/2)^2+(v-width/2)^2);
H(u,v)=exp(-((D(u,v)*D(u,v))/(2*D0*D0)));
end
end
figure(4), imshow(H)
figure(5), mesh(H)
G=F.*H;
g=abs(ifft2(G));
figure(6), imshow(uint8(g))
FIGURE 5.10
Gaussian low-pass filter.
TABLE 5.1
Comparison of Low-Pass Filters (Mathematical Representations)
Ideal low-pass filter: H(u, v) = 1 if D(u, v) ≤ D0; 0 if D(u, v) > D0
Butterworth low-pass filter: H(u, v) = 1 / (1 + [D(u, v)/D0]^(2n))
Gaussian low-pass filter: H(u, v) = e^(−D²(u, v)/(2D0²))
The transfer function of the ideal high-pass filter is given by
$$H(u, v) = \begin{cases} 0 & D(u, v) \le D_0 \\ 1 & D(u, v) > D_0 \end{cases}$$
where D0 is the positive constant. The filter retains all the frequency compo-
nents outside the radius D0. All the frequency components inside the radius
of the circle are suppressed. In most cases it will be all the low-frequency
components getting suppressed.
The next important term to know is the cut-off frequency. D0 is the transi-
tion point between H(u, v) = 1 and H(u, v) = 0, and hence it becomes the cut-
off frequency.
FIGURE 5.11
Sharpened image.
Also from the equation, D(u, v) is the Euclidean distance from any point (u,
v) to the origin of the frequency plane.
Refer to Figure 5.12 to understand how the ideal high-pass filter works.
The MATLAB code for the ideal high-pass filter is given next, and its cor-
responding input and output images are illustrated in Figure 5.13.
FIGURE 5.12
Ideal high-pass filter process.
FIGURE 5.13
Ideal high-pass filter.
input_image = imread('test1.png');
[row, col] = size(input_image);
% Fourier transform of the input image
FT_img = fft2(double(input_image));
% Cut-off frequency
D0 = 10;
% Filter Design
% Set the variables range
u_row = 0:(row-1);
% indices for meshgrid to transform the domain
index = find(u_row>row/2);
u_row(index) = u_row(index)-row;
% Set the variables range
v_col = 0:(col-1);
% indices for meshgrid to transform the domain
idy = find(v_col>col/2);
v_col(idy) = v_col(idy)-col;
% 2-D frequency grid and Euclidean distance from the origin
[V, U] = meshgrid(v_col, u_row);
D = sqrt(U.^2 + V.^2);
% Filtering mask: 0 inside the radius D0, 1 outside
H = double(D > D0);
% Apply the filter in the frequency domain
G = H.*FT_img;
% Inverse Fourier transform to obtain the output image
output_image = real(ifft2(double(G)));
figure, imshow(input_image), title('Input image');
figure, imshow(output_image, []), title('Ideal high-pass filtered image');
$$H(u, v) = \frac{1}{1 + \left[ D_0/D(u, v) \right]^{2n}}$$
where D0 is termed the cut-off frequency and is the transition point between 1 and 0 (H(u, v) = 1 and H(u, v) = 0); n defines the order; and D(u, v) is the Euclidean distance from any point (u, v) to the origin of the frequency plane.
FIGURE 5.14
Butterworth high-pass filter process.
FIGURE 5.15
Butterworth high-pass filter.
The BHPF works based on the D0 value. It passes all the frequencies above
D0, whereas it suppresses all the frequencies below D0. A Butterworth high-
pass filter keeps frequencies outside radius D0 and discards values inside.
See Figure 5.14 for a diagram of the BHPF process.
The MATLAB code for the Butterworth high-pass filter is given next, and
its corresponding input and output images are illustrated in Figure 5.15.
input_image = imread('test1.png');
[row, col] = size(input_image);
% Fourier transform of the input image
FT_img = fft2(double(input_image));
% Order of n
n = 2;
% Cut-off Frequency
D0 = 10;
% Filter Design
% Set the variables range
u_row = 0:(row-1);
% indices for meshgrid to transform the domain
index = find(u_row>row/2);
u_row(index) = u_row(index)-row;
% Set the variables range
v_col = 0:(col-1);
% indices for meshgrid to transform the domain
idy = find(v_col>col/2);
v_col(idy) = v_col(idy)-col;
% 2-D frequency grid
[V, U] = meshgrid(v_col, u_row);
% Euclidean distance
D = sqrt(U.^2 + V.^2);
% Butterworth high-pass filtering mask
H = 1./(1 + (D0./D).^(2*n));
% Apply the filter in the frequency domain
G = H.*FT_img;
% Inverse Fourier transform to obtain the output image
output_image = real(ifft2(double(G)));
figure, imshow(input_image), title('Input image');
figure, imshow(output_image, []), title('Butterworth high-pass filtered image');
The transfer function for the Gaussian high-pass filter is
$$H(u, v) = 1 - e^{-D^2(u, v)/2D_0^2}$$
where D0 specifies the cut-off frequency as prompted by the user and D(u, v) is the Euclidean distance from any point (u, v) to the origin of the frequency plane.
Refer to Figure 5.16 to understand the Gaussian high-pass filter. The
MATLAB code for the Gaussian high-pass filter is given next, and its cor-
responding input and output images and transfer function are illustrated in
Figure 5.17.
FIGURE 5.16
The Gaussian high-pass filter process.
FIGURE 5.17
Gaussian high-pass filter.
input_image=double(imread('test1.png'));
FT_img=fft2(input_image);
figure(1),imshow(uint8(input_image))
figure(2),imagesc(log(1+abs(FT_img)))
[height,width]=size(input_image);
for x=1:1:height
for y=1:1:width
f(x,y)=input_image(x,y)*(-1)^(x+y);
end
end
F=fft2(f);
figure(3),imagesc(log(1+abs(F)))
D0=35;
n=2;
for u=1:1:height
for v=1:1:width
D(u,v)=sqrt((u-height/2)^2+(v-width/2)^2);
H(u,v)=1-(exp(-((D(u,v)*D(u,v))/(2*D0*D0))));
end
end
figure(4), imshow(H)
figure(5), mesh(H)
G=F.*H;
g=abs(ifft2(G));
figure(6), imshow(uint8(g))
See Table 5.2 for a comparison of high-pass filters, and Table 5.3 for a comparison of low-pass filters versus high-pass filters.
The significance of the frequency domain has been explored in this chapter. The next chapter explores image segmentation, its associated algorithms, and the processes of dilation and erosion.
TABLE 5.2
Comparison of High-Pass Filters (Mathematical Representations)
Ideal high-pass filter: H(u, v) = 0 if D(u, v) ≤ D0; 1 if D(u, v) > D0
Butterworth high-pass filter: H(u, v) = 1 / (1 + [D0/D(u, v)]^(2n))
Gaussian high-pass filter: H(u, v) = 1 − e^(−D²(u, v)/(2D0²))
TABLE 5.3
Low-Pass Filters versus High-Pass Filters
Low-Pass Filter: The low-pass filter is mainly meant for smoothing an image. It accepts/allows only the low-frequency components, diminishing the high-frequency ones. It allows the frequency components that are below the cut-off frequency to pass through it.
High-Pass Filter: A high-pass filter is meant for sharpening an image. It accepts/allows only the high-frequency components, diminishing the low-frequency ones. It allows the frequencies above the cut-off frequency to pass through it.
5.5 Quiz
1. Which of the following is not a filter type?
a. Ideal low-pass filter
b. Ideal high-pass filter
c. Butterworth low-pass filter
d. Gaussian medium-pass filter
2. Which of the following provides no ringing effect?
a. Gaussian low-pass filter
b. Ideal low-pass filter
c. Butterworth low-pass filter
d. None of the above
3. Which of the following are mainly meant for smoothing an image?
a. Low-pass filters
b. High-pass filters
c. All of the above
d. None of the above
4. High-pass filters allow only low-frequency components, diminish-
ing the high-frequency ones. True or false?
5. _____ accept/allow only high-frequency components, diminishing
the low-frequency ones.
6. Which of the following statements is false?
a. Low-pass filters allow frequency components that are below the
cut-off frequency to pass through.
5.5.1 Answers
1. d
2. a
3. a
4. False
5. High-pass filters
6. c
5.6 Review Questions
1. What is a low-pass filter all about?
2. What are the types of low-pass filters mentioned in this chapter?
3. Explain the transfer function of the ideal low-pass filter.
4. How is the ideal low-pass filter different from the Butterworth low-
pass filter?
5. Which of the low-pass filters is the best to fight ringing effect?
6. How does a high-pass filter work?
7. How many types of high-pass filters are generally used?
8. What are the major differences between low-pass and high-pass
filters?
5.6.1 Answers
3. The transfer function of the ideal low-pass filter is
$$H(u, v) = \begin{cases} 1 & D(u, v) \le D_0 \\ 0 & D(u, v) > D_0 \end{cases}$$
From this equation, D0 is the positive constant. The filter retains
all the frequency components within the radius D0. All the frequency
components outside the radius of the circle are eliminated. In most cases, it is the high-frequency components that are eliminated. Within the circle, the retention happens without attenuation. D0 is also referred to as the cut-off frequency.
Also, D(u, v) is the Euclidean distance from any point (u, v) to
the origin of the frequency plane. Note that if the value of D0 is very
minimal, there is a risk of losing the core information, and it would
be over blurred or smoothed. Hence, it is important to choose the apt
D0 value.
4. The ideal low-pass filter retains all the frequency components within
the radius D0. All the frequency components outside the radius of the
circle are eliminated. The Butterworth low-pass filter passes all the frequencies below D0, whereas it cuts off all the frequencies above D0. The transition is smoother with the Butterworth filter.
5. Gaussian low-pass filter.
6. It accepts/allows only the high-frequency components, diminishing
the low-frequency ones.
7. There are three types: ideal high-pass filter, Butterworth high-pass
filter, and Gaussian high-pass filter.
8. See Table 5.3.
Further Reading
Burt, P.J., 1981. Fast filter transform for image processing. Computer Graphics and Image
Processing, 16(1), pp. 20–51.
Chen, C.F., Zhu, C.R. and Song, H.Q., 2007. Image enhancement based on Butterworth
low-pass filter [J]. Modern Electronics Technique, 30(24), pp. 163–168.
Dogra, A. and Bhalla, P., 2014. Image sharpening by gaussian and Butterworth high-
pass filter. Biomedical and Pharmacology Journal, 7(2), pp. 707–713.
Govind, D., Ginley, B., Lutnick, B., Tomaszewski, J.E. and Sarder, P., 2018. Glomerular
detection and segmentation from multimodal microscopy images using a
Butterworth band-pass filter. In Medical imaging 2018: Digital pathology (Vol.
10581, p. 1058114). International Society for Optics and Photonics.
Khorsheed, O.K., 2014. Produce low-pass and high-pass image filter in java.
International Journal of Advances in Engineering & Technology, 7(3), p. 712.
Toet, A., 1989. Image fusion by a ratio of low-pass pyramid. Pattern Recognition Letters,
9(4), pp. 245–253.
Zhang, Z. and Zhao, G., 2011, July. Butterworth filter and Sobel edge detection to
image. In 2011 International Conference on Multimedia Technology (pp. 254–256).
IEEE.
6
Image Segmentation: A Clear Analysis
and Understanding
Learning Objectives
After reading this chapter, the reader should have a clear understanding
about:
• Image segmentation
• Why segmentation is important?
• Image segmentation algorithms
• Thresholding-based image segmentation
• Segmentation algorithms based on edge information
• Segmentation algorithms based on region information
• Segmentation algorithms based on the clustering approach
• Morphological segmentation
• Texture-based segmentation
6.1 Introduction
Let us first understand what segmentation is all about. Let us start with
an example. In Figure 6.1 we have only one object: a motorcycle. It is very
straightforward for the algorithms to predict the content in the given image.
But, there comes a challenge when we have a car and motorcycle together
in a single image, as presented in Figure 6.2. Here in this case, we need our
algorithm to clearly identify the location of the objects and to go ahead with
object detection of the car and motorcycle from the image. Now comes the
segmentation of the picture. Before even going ahead with the classification,
one would need to clearly understand what the image consists of, i.e., what is
DOI: 10.1201/9781003217428-6 131
FIGURE 6.1
One object.
FIGURE 6.2
Two objects.
the content of the image, and this can be achieved through image segmenta-
tion (Figure 6.3).
It is time to further define image segmentation in an “image process-
ing way”. Image segmentation is a process or a technique of partitioning a
given image into multiple subgroups based on common properties, such as
intensity or texture. The process groups together pixels of similar proper-
ties, thereby identifying the image’s objects. Through this approach of divid-
ing into multiple subgroups, analyzing the image becomes much easier and
effective, while also reducing the complexity to a greater extent.
FIGURE 6.3
Object detection.
FIGURE 6.4
After segmentation.
6.2 Types of Segmentation
There are many types of segmentation techniques available and one can
choose the best fit for the process. The segmentation types discussed here
are thresholding, histogram-based, region-based, edge-based, clustering-
based, morphological transforms, and texture-based.
First we will briefly introduce each technique, and the remaining sections
of the chapter will go into depth for each technique.
a. Thresholding method.
Segmentation algorithms based on a thresholding approach are
suitable for images where there is a distinct difference between object
and background. The goal of thresholding-based segmentation algo-
rithms is to divide an image into two distinct regions – object and
background – directly based on intensity values and/or properties
of these values. Thresholding is viewed as one of the simplest tech-
niques for image segmentation. There are three types of threshold-
ing-based image segmentation followed. They are:
• Global thresholding
• Variable thresholding
• Multiple thresholding
b. Histogram-based segmentation.
A histogram of an image is a plot between intensity levels (i.e.,
gray levels) along the x-axis and the number (frequency) of pixels at
each gray level along the y-axis. A good threshold can be found from
the histogram if the histogram peak is tall, narrow, symmetric, and
separated by deep valleys, and based on this the foreground and the
background can be separated.
c. Region-based segmentation.
The region-based segmentation method segments the image into
various regions of similar characteristics. A region is a group of
connected pixels with similar properties. There are two techniques
that fall under this segmentation and they are region growing, and
region split and merge.
d. Edge-based segmentation.
Edges are defined as “sudden and significant changes in the inten-
sity” of an image. These changes happen between the boundaries of
an object in an image. The edges are detected based on the signifi-
cant change in intensity between the objects in the image. So, in an
image, if there are many objects, edges are the easiest way to identify
all of them. If the segmentation is carried out based on the edges, we
can classify it as edge-based segmentation. Based on the discontinu-
ity or dissimilarities, edge-based segmentation will be carried out. It
is unlike region-based segmentation, which is completely based on
similarity.
e. Clustering-based segmentation.
Clustering-based techniques segment the image into clusters hav-
ing pixels with similar characteristics. There are many types of clus-
tering techniques available, but we will learn of the ones that have a
proven success rate in image processing.
f. Morphological transforms-based segmentation.
Morphology is the study of shapes. Morphological image pro-
cessing refers to a set of image processing operations that process
images based on shapes and not on pixel intensities. If the segmentation is carried out based on the shapes, we refer to it as morphological transforms-based segmentation.
g. Texture-based segmentation approaches.
Segmentation based on texture characteristics consists of divid-
ing an image into different regions based on similarity in texture
features. Texture is defined as a repeated pattern of information or
arrangement of the structure with regular intervals. The texture of
images refers to the appearance, structure, and arrangement of the
parts of an object within the image.
6.3 Thresholding Method
As discussed in brief, segmentation algorithms based on the thresholding
approach are suitable for images where there is a distinct difference between
the object and background. The goal of thresholding-based segmentation
algorithms is to divide an image into two distinct regions – object and back-
ground – directly based on intensity values and/or properties of these val-
ues. This is viewed as one of the simplest techniques for image segmentation.
Let us understand things in detail.
There is an important component called threshold involved, and based on
the threshold T, one could go ahead with the following types of threshold-
ing-based segmentation algorithms:
$$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T \\ 0 & \text{if } f(x, y) \le T \end{cases} \quad (6.1)$$
where g(x,y) represents the output image (segmented image), f(x,y) repre-
sents the input image, and T represents the threshold.
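As a small illustrative sketch of Equation 6.1 (assuming NumPy is available; the tiny synthetic array and the threshold T = 100 are placeholders):

import numpy as np

def threshold_segment(f, T):
    # g(x, y) = 1 where f(x, y) > T, otherwise 0 (Equation 6.1)
    return (f > T).astype(np.uint8)

# A tiny synthetic image with a bright object against a dark background
f = np.array([[ 10,  20,  15],
              [200, 210,  12],
              [205, 215,  18]])
print(threshold_segment(f, T=100))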
related to objects and the background. The basic iterative algorithm for find-
ing the global threshold T is as follows:
$$g(x, y) = \begin{cases} G_1 & \text{if } f(x, y) > T \\ G_2 & \text{if } f(x, y) \le T \end{cases} \quad (6.2)$$
3. Compute m1 and m2:
m1 = average of intensity values of pixels within G1.
m2 = average of intensity values of pixels within G2.
4. Compute a new threshold value: T = (m1 + m2)/2.
5. Repeat step 2 to step 5 until the difference between T in successive
iterations is less than the specified ∆T.
This algorithm is an iterative scheme for finding the optimal global thresh-
old, and parameter ∆T is used as a stopping criterion, i.e., when the differ-
ence in threshold between two successive iterations becomes less than ∆T
(say, ∆T = 0.05), the algorithm stops iterating and it is said to have converged
to the optimum solution, i.e., the optimum global threshold.
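The iterative scheme described above can be sketched as follows (a hedged illustration, assuming a NumPy grayscale array; starting from the mean intensity and using ΔT = 0.05 are choices made here, not requirements):

import numpy as np

def iterative_global_threshold(image, delta_T=0.05):
    # Step 1: start with an initial estimate, e.g., the mean intensity
    T = image.mean()
    while True:
        # Step 2: split the pixels into groups G1 and G2 using the current T
        G1 = image[image > T]
        G2 = image[image <= T]
        # Step 3: compute the mean intensity m1 and m2 of each group
        m1 = G1.mean() if G1.size else 0.0
        m2 = G2.mean() if G2.size else 0.0
        # Step 4: compute a new threshold value
        T_new = 0.5 * (m1 + m2)
        # Step 5: stop once successive thresholds differ by less than delta_T
        if abs(T_new - T) < delta_T:
            return T_new
        T = T_new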
Assumptions:
Algorithm
The number of pixels at intensity level i is denoted by ni, and the total number of pixels in an image is given by
$$N = \sum_{i=0}^{L-1} n_i$$
The histogram of the image is normalized and this normalized histogram is
considered as probability distribution, given by Equation 6.3:
$$p_i = \frac{n_i}{N} \quad (6.3)$$
where pi ≥ 0 and
$$\sum_{i=0}^{L-1} p_i = 1$$
The probability of class C1 occurring, for a given threshold k, is
$$P(C_1) = P_1(k) = \sum_{i=0}^{k} p_i \quad (6.4)$$
and, similarly, the probability of class C2 occurring is
$$P(C_2) = P_2(k) = \sum_{i=k+1}^{L-1} p_i = 1 - P_1(k) \quad (6.5)$$
The mean intensity value of the pixels within class C1 is
$$m_1(k) = \sum_{i=0}^{k} i\, P(i|C_1) \quad (6.6)$$
By Bayes' formula,
$$P(i|C_1) = \frac{P(C_1|i)\, P(i)}{P(C_1)} \quad (6.7)$$
where P(i|C1) is the probability of intensity value i, given that i comes from class C1; P(C1|i) is the probability of C1 given i; and P(i) is the probability of the ith value, which is the ith component of the histogram, pi. Here, P(C1|i) = 1, since we are dealing only with values of i from class C1. Substituting P(i) = pi and P(C1) = P1(k) from Equation 6.4, Equation 6.7 becomes
$$P(i|C_1) = \frac{p_i}{P_1(k)} \quad (6.8)$$
so that
$$m_1(k) = \frac{1}{P_1(k)} \sum_{i=0}^{k} i\, p_i \quad (6.9)$$
Similarly, the mean intensity values of pixels within class C 2 are given by
$$m_2(k) = \sum_{i=k+1}^{L-1} i\, P(i|C_2) \quad (6.10)$$
$$m_2(k) = \frac{1}{P_2(k)} \sum_{i=k+1}^{L-1} i\, p_i \quad (6.11)$$
The cumulative mean intensity up to level k is
$$m(k) = \sum_{i=0}^{k} i\, p_i \quad (6.12)$$
and the global mean intensity of the entire image is
$$m_G = \sum_{i=0}^{L-1} i\, p_i \quad (6.13)$$
The relationships in Equations 6.14 and 6.15 hold for P1, P2 , m1 and m 2 .
$$P_1 m_1 + P_2 m_2 = m_G \quad (6.14)$$
$$P_1 + P_2 = 1 \quad (6.15)$$
The global variance of the intensity values is
$$\sigma_G^2 = \sum_{i=0}^{L-1} (i - m_G)^2\, p_i \quad (6.17)$$
and the between-class variance is
$$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2 \quad (6.18)$$
which simplifies to
$$\sigma_B^2 = \frac{(m_G P_1 - m)^2}{P_1 (1 - P_1)} \quad (6.20)$$
Reintroducing k, we have
$$\eta(k) = \frac{\sigma_B^2(k)}{\sigma_G^2} \quad (6.21)$$
$$\sigma_B^2(k) = \frac{\left[ m_G P_1(k) - m(k) \right]^2}{P_1(k)\left[ 1 - P_1(k) \right]} \quad (6.22)$$
The optimum global threshold value k* is the one that maximizes Equation 6.23:
$$\sigma_B^2(k^*) = \max_{0 \le k \le L-1} \sigma_B^2(k) \quad (6.23)$$
To find k*, evaluate Equation 6.23 for all integer values of k (subject to the condition 0 < P1(k) < 1) and select the value of k that yields the maximum σB²(k). If σB²(k) is maximum for more than one value of k, average the various values of k for which σB²(k) is maximum.
Step 6: Segment the input image using the optimum global threshold k* .
$$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > k^* \\ 0 & \text{if } f(x, y) \le k^* \end{cases} \quad (6.24)$$
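A compact sketch of this procedure (assuming NumPy and an 8-bit grayscale array; this follows Equations 6.3 to 6.24 in spirit but is not the book's own listing, and the 256-bin histogram is an assumption for 8-bit images):

import numpy as np

def otsu_threshold(image):
    # Normalized histogram p_i (Equation 6.3)
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()
    levels = np.arange(256)
    mG = (levels * p).sum()                      # global mean (Equation 6.13)
    best_k, best_var = 0, -1.0
    for k in range(256):
        P1 = p[:k + 1].sum()                     # class probability (Equation 6.4)
        if P1 <= 0 or P1 >= 1:
            continue
        m = (levels[:k + 1] * p[:k + 1]).sum()   # cumulative mean (Equation 6.12)
        # Between-class variance (Equation 6.22)
        var_B = (mG * P1 - m) ** 2 / (P1 * (1 - P1))
        if var_B > best_var:
            best_k, best_var = k, var_B
    return best_k

# Segment with the optimum global threshold k* (Equation 6.24)
# g = (image > otsu_threshold(image)).astype(np.uint8)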
Figure 6.5a shows the input image, which is an optical microscope image of
polymersome cells. These are the cells that are artificially engineered using
polymers. These cells are invisible to the human immune system and can
be used for targeted drug delivery. Figure 6.5b shows the histogram of the
input image. The goal is to segment the molecules from the background.
Figure 6.5c shows the segmentation results obtained by using the basic itera-
tive algorithm for global threshold. Since the histogram has no distinct val-
leys and since there is no distinct intensity difference between the object
and background, this algorithm failed to produce the desired segmentation
result. Figure 6.5d shows the segmentation result obtained by using Otsu’s
global threshold method. The results obtained by using Otsu's method are superior to the results obtained by using the iterative algorithm for global
threshold. The threshold computed by using the basic iterative algorithm
was 169, whereas the threshold computed by using Otsu’s method was 182,
which is closer to the lighter areas defining the cells in the input image. The
separability measure obtained was η ∗ = 0.467.
FIGURE 6.5
Global thresholding.
$$g(x, y) = \begin{cases} a & \text{if } T_1 < f(x, y) \le T_2 \\ b & \text{if } f(x, y) > T_2 \\ c & \text{if } f(x, y) \le T_1 \end{cases} \quad (6.25)$$
Here g ( x,y ) denotes the segmented image, a denotes intensities of object 1, b
denotes intensities of object 2, and c denotes the intensity of the background.
T1 denotes the threshold for object 1 and T2 denotes the threshold for object 2.
6.4 Histogram-Based Segmentation
There are many techniques or methods followed for choosing the threshold
value. One could make it a really simple option by choosing the threshold
value manually. Or one can use a thresholding algorithm to compute what
FIGURE 6.6
Single threshold and multiple threshold based on histogram.
FIGURE 6.7
Segmentation based on variable thresholding.
rectangles. The histogram for those regions is found next, and then the seg-
mentation based on the local thresholding method is applied to arrive at the
final segmented image.
6.5 Region-Based Segmentation
Region-based segmentation methods segment the image into various regions
of similar characteristics. A region is a group of connected pixels with simi-
lar properties. There are two techniques that fall under this segmentation: (1)
region growing and (2) region split and merge.
We will introduce the region-growing technique first.
6.5.1 Region-Growing Method
The goal of the region-growing method is to group the pixels or subregions
into larger regions based on predefined criteria. This is considered as the
simplest of the techniques and it starts with the seed points. From the seed
points, the growth happens. The growth is governed by appending to each
seed point those of nearby or neighboring pixels that share similar proper-
ties. The similarities could be in terms of the texture, color, or shape. This
region-growing approach is based on similarities, whereas the edge-based
techniques are totally dependent on the dissimilarities. Region-growing
methods are preferred over edge-based techniques when it comes to a noisy
image, as edges are difficult to detect.
This approach is dependent on the examination of the neighboring pixels. They are then merged with the seed, and growth occurs. This is an iterative process and will keep growing until strong edges, i.e., discontinuities, are identified.
Algorithm
The growth occurs after the comparison, i.e., 1 is compared with all its
neighbors. The regions are identified as follows:
This is an iterative process and there can be more than one seed.
FIGURE 6.8
Region-splitting technique.
(i.e., the maximum value of a pixel, Zmax, in a region minus the minimum value of a pixel, Zmin, in the same region should not exceed the threshold T = 3).
$$f(x, y) = \begin{bmatrix} 5 & 6 & 6 & 6 & 4 & 7 & 6 & 6 \\ 6 & 7 & 6 & 7 & 5 & 5 & 4 & 7 \\ 6 & 6 & 4 & 4 & 3 & 2 & 5 & 6 \\ 5 & 4 & 5 & 4 & 2 & 3 & 4 & 6 \\ 0 & 3 & 2 & 3 & 3 & 2 & 4 & 7 \\ 0 & 0 & 0 & 0 & 2 & 2 & 5 & 6 \\ 1 & 1 & 0 & 1 & 0 & 3 & 4 & 4 \\ 1 & 0 & 1 & 0 & 2 & 3 & 5 & 4 \end{bmatrix}$$
After splitting into four quadrants, it will look like
$$R_1 = \begin{bmatrix} 5 & 6 & 6 & 6 \\ 6 & 7 & 6 & 7 \\ 6 & 6 & 4 & 4 \\ 5 & 4 & 5 & 4 \end{bmatrix}$$
Here, Zmax = 7, Zmin = 4, and T = 3. Since 7 − 4 ≤ 3, the entire region R1 satisfies the condition and hence does not require further splitting.
Let us proceed with the second region, R2:
$$R_2 = \begin{bmatrix} 4 & 7 & 6 & 6 \\ 5 & 5 & 4 & 7 \\ 3 & 2 & 5 & 6 \\ 2 & 3 & 4 & 6 \end{bmatrix}$$
Here, Zmax = 7 and Zmin = 2. Since 7 − 2 = 5 > 3, region R2 does not satisfy the condition and therefore requires further splitting.
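A small sketch of the splitting test used in this example (assuming NumPy; the 8 × 8 array and the predicate Zmax − Zmin ≤ T with T = 3 are taken from the text above):

import numpy as np

def needs_split(region, T=3):
    # A region is homogeneous when Zmax - Zmin <= T; otherwise it must be split
    return region.max() - region.min() > T

f = np.array([[5, 6, 6, 6, 4, 7, 6, 6],
              [6, 7, 6, 7, 5, 5, 4, 7],
              [6, 6, 4, 4, 3, 2, 5, 6],
              [5, 4, 5, 4, 2, 3, 4, 6],
              [0, 3, 2, 3, 3, 2, 4, 7],
              [0, 0, 0, 0, 2, 2, 5, 6],
              [1, 1, 0, 1, 0, 3, 4, 4],
              [1, 0, 1, 0, 2, 3, 5, 4]])

R1 = f[0:4, 0:4]          # top-left quadrant
R2 = f[0:4, 4:8]          # top-right quadrant
print(needs_split(R1))    # False: 7 - 4 <= 3, no further split
print(needs_split(R2))    # True:  7 - 2 >  3, split again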
6.6 Edge-Based Segmentation
Edge detection is a very important topic for discussion and edge-based seg-
mentation is considered very effective too. It is a process through which the
edges are located leading to identification of the objects in the image. If the
edges are correctly identified, one can retrieve significant information. Let
us first understand something fundamental about edge detection. It is about
detecting the edges in the image, which is based on the discontinuity in the
color or texture or saturation. Fundamentally, it is discontinuity in the fea-
tures. To increase the accuracy and to improve the results, more processing
becomes inevitable. Concatenating the edges to form the edge chains is the
most followed solution that helps in identifying borders in the image.
See Chapter 4 for more details about edge detection.
6.7 Clustering-Based Segmentation
Clustering helps in dividing the complete data to multiple clusters. Simply,
it is a method or technique to group data into clusters. The objects inside a
cluster should/must have high similarity. A detailed understanding of the
topic can be found in Chapter 7, Section 7.5.
TABLE 6.1
Basic Set Operations
Set Operation Result
Union A∪B, which contains all the elements in set A and set B.
Intersection A∩B, which contains the elements that are present in both set A and set B.
Complement Ac, which contains the complement of the elements in set A, i.e., 1s in set A become 0s in Ac and 0s in set A become 1s in Ac.
Reflection Reflection of set B: B̂ = {w | w = −b, for all b ∈ B}
Difference A − B = A ∩ Bc
Translation Translation of set A by z: (A)z = {w | w = a + z, for all a ∈ A}
FIGURE 6.9
Basic set operations.
• Dilation is all about increasing the size of the object, as it is the addition of pixels, whereas erosion decreases the size of the object.
• Dilation fills the holes and disconnected areas, whereas erosion removes the smallest of the anomalies.
• Dilation is “a XOR b”, whereas erosion is a dual of dilation.
FIGURE 6.10
Basic logical operations between two sets A and B. (a) Complement of A. (b) Logical AND operation between two sets A and B. (c) Logical OR operation between two sets A and B. (d) XOR operation between sets A and B. (e) Logical AND operation between the complement of A and set B.
FIGURE 6.11
From left, the sample input image, the dilated image, and the eroded image.
6.8.1.1 Erosion Example
Assume the following input image and structuring element:
In the first step, the center of the structuring element (i.e., its reference 1) is kept as the reference mark, and the structuring element is picked up and moved over the input image. The result is 0 wherever the 1s do not match. Here, most of the cells get 0s, as the condition of the 1s in the structuring element matching the 1s in the input image could not be met. Hence, those positions are 0.
One can understand that all the 1s in structuring element get a match
when hovered over the input image while the operation is carried out in the
prescribed region. Since it is a match, it is 1.
Erosion, essentially, reduces the number of 1s in the input matrix, and the final eroded result reflects this reduction.
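As a hedged illustration of dilation and erosion with OpenCV (the small binary matrix and the 3 × 3 structuring element of ones are placeholders, not the exact example above):

import cv2
import numpy as np

# A small binary image (1 = object, 0 = background)
image = np.array([[0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0]], dtype=np.uint8)

# 3 x 3 structuring element of ones
kernel = np.ones((3, 3), dtype=np.uint8)

dilated = cv2.dilate(image, kernel)   # grows the object and fills small holes
eroded = cv2.erode(image, kernel)     # shrinks the object and removes small anomalies

print(dilated)
print(eroded)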
FIGURE 6.12
Opening operation.
FIGURE 6.13
Closing operation.
6.8.3 Hit-or-Miss Transform
A simple approach, the hit-or-miss transform can be deployed when a par-
ticular pattern in the foreground and background is searched for. If there is
a match, it is a hit. If not, it is a miss. The following examples further explain
the concept.
Let us assume the structuring and input image (in the form of a matrix) as
shown next:
0 0 0
X 1 X
1 1 1
Structuring element
The structuring element has 1s and 0s, and there are some X cells as well.
They are referred to as “don’t cares” and they don’t have be considered. The
input to be worked out with the structuring element is presented next as case 1.
0 0 0
1 1 0
1 1 1
Input Case 1
For the hit scenario, all the considered 1s in the structuring element should
match with the 1s in the input image. All the considered 0s should also match
with the 0s in the input image. Don’t cares can be ignored. So, having said
that, all the 1s in the following structuring element and input go hand in
hand, and hence this is a hit; people also call it “true”:
The next scenario has a different input for the structuring element:
Here, as one can see, it is a miss. The 1s highlighted in the input and struc-
turing element match. However, the highlighted 0s in the structuring ele-
ment do not match with the input elements, and hence is declared a miss.
This is also termed “false”.
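OpenCV provides a hit-or-miss mode through morphologyEx; a minimal sketch is given below (assuming OpenCV 3.4 or later). In the kernel, 1 means the pixel must be foreground, −1 means it must be background, and 0 marks a "don't care" cell, so the kernel below encodes the structuring element shown above; the small input matrix is a placeholder.

import cv2
import numpy as np

# Structuring element: top row must be background (-1), bottom row foreground (1),
# and the two 0 cells are the "don't care" (X) positions
kernel = np.array([[-1, -1, -1],
                   [ 0,  1,  0],
                   [ 1,  1,  1]], dtype="int")

# A small binary input image (0 or 255, as expected by OpenCV)
image = 255 * np.array([[0, 0, 0, 0, 0],
                        [0, 0, 0, 0, 0],
                        [0, 1, 1, 0, 0],
                        [0, 1, 1, 1, 0],
                        [0, 0, 0, 0, 0]], dtype=np.uint8)

# Output pixels set to 255 mark the positions where the pattern is a "hit"
hit_miss = cv2.morphologyEx(image, cv2.MORPH_HITMISS, kernel)
print(hit_miss // 255)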
The concepts of segmentation have been dealt with in this chapter. Now,
we can move ahead to the next chapter to learn the difference between
regression and classification, discuss a few classification algorithms, and
explore clustering.
6.9 Review Questions
1. Define image segmentation.
2. Why can segmentation often be challenging?
3. List the types of segmentation techniques discussed in this chapter.
4. Which segmentation technique is suitable for images with a distinct
difference between the object and background?
5. Define histogram of an image.
6. How do you find a good threshold from a given histogram?
7. When is multiple thresholding employed?
8. Define edges.
6.9.1 Answers
Further Reading
Ashburner, J. and Friston, K.J., 2005. Unified segmentation. Neuroimage, 26(3), pp.
839–851.
Haralick, R.M. and Shapiro, L.G., 1985. Image segmentation techniques. Computer
Vision, Graphics, and Image Processing, 29(1), pp. 100–132.
Pal, N.R. and Pal, S.K., 1993. A review on image segmentation techniques. Pattern
Recognition, 26(9), pp. 1277–1294.
Tautz, D., 2004. Segmentation. Developmental Cell, 7(3), pp. 301–312.
Wind, Y., 1978. Issues and advances in segmentation research. Journal of Marketing
Research, 15(3), pp. 317–337.
Yanowitz, S.D. and Bruckstein, A.M., 1989. A new method for image segmentation.
Computer Vision, Graphics, and Image Processing, 46(1), pp. 82–95.
7
Classification: A Must-Know Concept
Learning Objectives
After reading this chapter, the reader should have a clear understanding
about:
• Support vector machines (SVMs)
• Terms used in SVMs
• How SVMs work?
• k-Nearest neighbors (k-NN)
• Clustering
• k-Means clustering
7.1 Introduction
The first question normally arises this way: What is the difference between
regression and classification? Regression and classification both fall under
the heading of supervised learning algorithms. Both have extensive usage in
machine learning (ML) and both use a labeled data set. Then, where are they
different? The problems that they solve are different.
Regression predicts continuous values, for example, salaries, grades,
and ages. Classification classifies things, such as into male/female, pass/
fail, false/true, spam/legitimate. Classification divides the given data set
into classes based on the parameters considered. An example will be very
helpful.
Let’s take Gmail. Gmail classifies email as legit or spam. The model is
trained with millions of emails and has many parameters to consider.
Whenever a new email pops up, the classifications considered include inbox,
DOI: 10.1201/9781003217428-7 163
spam, promotions, and updates. If the email is spam, it goes to spam folder.
If it is legit, it goes to the inbox.
There are many famous and frequently used classification algorithms.
They include:
It is good to understand and learn all of them, but that would be out of the
scope for this book. So, we handpicked support vector machine and k-near-
est neighbor for discussion.
7.2.1 Hyperplane
A hyperplane is a plane that separates (i.e., enables grouping) objects that
belong to different classes. This line helps in classifying the data points, e.g.,
the stars and triangles in Figure 7.1.
The dimension of a hyperplane is a variable too. Figure 7.1 has two features
and hence one straight line is sufficient. If there are three features, there has
to be a two-dimensional plane.
FIGURE 7.1
SVM – The complete picture.
7.2.2 Support Vectors
Refer again to Figure 7.1. The red stars and green triangles are the support
vectors. These points are very close to the hyperplane. As vectors, these data
points affect the position and placement of the hyperplane.
7.2.3 Margin
The margin is a gap. If the margin between two classes is large, then it is a
good margin; otherwise it is considered bad. In simple terms, the margin
is the gap between the closest points of two lines. Using Figure 7.1, one can
understand that the margin can be calculated as the perpendicular distance
from the line to the support vectors (red stars and green triangles).
FIGURE 7.2
Hyperplane selection.
Let’s go through the process step by step with the example of Figure 7.2.
The stars and triangles have been grouped together. Now generate hyper-
planes. Three such planes are generated in our example: brown, blue, and
red. The brown and blue have failed miserably to classify. This reflects a high
error rate. However, red is very apt and it does the separation properly. So,
what do we do? Choose the best line. The best drawn line is presented as a
black line on the right side of the Figure 7.2.
That is it. One can now understand that all the red stars and green triangles are grouped appropriately based on the hyperplane.
The complete implementation and a quick lecture on the SVM can be
found at https://youtu.be/Qd9Aj_EMfk0.
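As a hedged sketch of training an SVM classifier (assuming scikit-learn is installed; the toy two-feature data set below is made up purely for illustration):

from sklearn import svm

# Two features per sample; class 0 could be the "stars" and class 1 the "triangles"
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],   # class 0
     [6.0, 7.0], [6.5, 6.8], [7.0, 7.5]]   # class 1
y = [0, 0, 0, 1, 1, 1]

# A linear kernel searches for the separating hyperplane with the largest margin
clf = svm.SVC(kernel='linear')
clf.fit(X, y)

print(clf.support_vectors_)       # the points closest to the hyperplane
print(clf.predict([[2.5, 2.5]]))  # classify a new sample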
• It is referred to as nonparametric.
• It is also said to be a lazy learner algorithm.
See Figure 7.3 for the start of an example. The problem statement is presented
pictorially in Figure 7.4. The new data entry has to be classified as a red star
or a green triangle.
FIGURE 7.3
Assumed scenario.
Case 1: If k is chosen as 1, then the task becomes easier and it is the simplest
option. The input data gets classified as Class A. See Figure 7.5.
Case 2: Let us choose k = 3. First, calculate the Euclidean distance between
the data points. The Euclidean distance is the distance between two points.
(This can be done through other methods too. Python has built-in functions
to help programmers.) See Figure 7.6 to understand how k-NN works with
the k value chosen as 3.
Case 3: Figure 7.7 shows the scenario when k is set as 7.
By now, readers should understand the way k-NN works. But take note that
keeping low k values should be avoided as the prediction could go wrong.
FIGURE 7.4
The assumed scenario to be classified.
FIGURE 7.5
Case 1: k = 1.
FIGURE 7.6
Case 2: k = 3.
FIGURE 7.7
Case 3: k = 7.
• It is simple.
• The more data, the better the classification.
Disadvantages include:
You can find a brief lecture about k-NN from the authors at https://youtu.be/nVgZbVUmh50.
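A minimal k-NN sketch with scikit-learn is given below (the data points and the choice k = 3 are placeholders; Euclidean distance is the library's default metric):

from sklearn.neighbors import KNeighborsClassifier

# Class A ("red stars") and Class B ("green triangles") as toy two-feature points
X = [[1, 1], [1, 2], [2, 1],   # Class A
     [6, 6], [6, 7], [7, 6]]   # Class B
y = ['A', 'A', 'A', 'B', 'B', 'B']

# k = 3: a new point takes the majority class among its three nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[2, 2]]))   # expected to be classified as 'A'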
The next topic for discussion is clustering, a very important and interest-
ing area to learn about.
7.5.1 k-Means Clustering
In clustering, the metric being employed is similarity! It is the metric that
measures the relationship between the objects. Why do we need clustering?
Simple, it gives you an exploratory view of the data. One could get better idea
about the data with clustering.
k-Means is also known as a “centroid-based clustering”. According to the
dictionary, a centroid is the center of mass of a geometric object of uniform
density. The same applies to machine learning; it is the data point at the cen-
ter of a cluster. The centroid need not be a member of the data set considered
(though it can be).
This clustering approach is iterative in nature, meaning the algorithm
keeps working until the target is achieved. Let’s take the example data set
in Table 7.1. The challenge is to group the eight objects in the data set as two
clusters. All the objects have the X, Y, and Z coordinates clearly available.
How do we select the k value? K is the number of clusters. Here, it is 2. So, let
us set the k value as 2.
Initially, we have to take any two centroids. Why go with two centroids?
Since the k value is 2, the number of centroids chosen is also 2. Once chosen,
FIGURE 7.8
An example of clustering.
TABLE 7.1
Data Set Considered for Clustering
Object X Y Z
O1 1 4 1
O2 1 2 2
O3 1 4 2
O4 2 1 2
O5 1 1 1
O6 2 4 2
O7 1 1 2
O8 2 1 1
the data points are tagged to any of the clusters based on the distance. It is
time to start the computation.
• First centroid = O2. This will be cluster 1. (O2 = First centroid = 1, 2, 2).
• Second centroid = O6. This will be cluster 2. (O6 = Second centroid
= 2, 4, 2).
Can we choose any other object as the centroid? This is a very common ques-
tion and, yes, any object can become a centroid.
How do we measure distance? There is a formula to the rescue:
D = |x1 − x2| + |y1 − y2| + |z1 − z2|
where D is the distance between the two objects. People call it the Manhattan distance.
Remember, any object has X, Y, Z coordinates as per the data set! So,
the task is simple. It is time to reconstruct the table and one has to use the
distance between each object and the centroids chosen.
Like O1, O2, and O3, the rest of the calculations to find the distance from
C1 and C2 are to be computed. Refer to Table 7.2 for a clearer understanding.
The next step is to go ahead with the clustering. This is based on the dis-
tance, whichever is shorter. Say C1 is shorter than C2 for an object, then the
object falls to C1. Hence, the clustering should look like Table 7.3.
For a clear understanding, the following color guidelines were followed in
Table 7.3. Cluster 1 is represented by green color and cluster 2 is represented
by red. Also refer to Table 7.4.
In the next round, the next iteration has to be done. Hence, the new clusters
will be as in Table 7.5.
So, we can stop here. No updates in the centroids or changes in the clus-
ter grouping have been observed. Hence, this is the correct clustering. This
is how k-means clustering works. For more information, watch the lecture
TABLE 7.2
Distance from C1 and C2
Object X Y Z Distance from C1 (1, 2, 2) Distance from C2 (2, 4, 2)
O1 1 4 1 D = |1–1| + |4–2| + |2–1| = 3 D = |2–1| + |4–4| + |2–1| = 2
O2 1 2 2 D = |1–1| + |2–2| + |2–2|= 0 D = |2–1| + |4–2| + |2–2|= 3
O3 1 4 2 D = |1–1| + |4–2| + |2–2| = 2 D = |2–1| + |4–4| + |2–2| = 1
O4 2 1 2 2 3
O5 1 1 1 2 5
O6 2 4 2 3 0
O7 1 1 2 1 4
O8 2 1 1 3 4
TABLE 7.3
Clustering
TABLE. 7.4
Clusters 1 and 2
TABLE 7.5
Reiterated Results
TABLE 7.6
New Clusters
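A hedged sketch of the same grouping with k = 2 is shown below (assuming scikit-learn; note that its KMeans uses Euclidean rather than Manhattan distance, so this approximates, rather than reproduces, the hand-worked example above):

from sklearn.cluster import KMeans
import numpy as np

# The eight objects O1 to O8 from Table 7.1 with their X, Y, Z coordinates
data = np.array([[1, 4, 1], [1, 2, 2], [1, 4, 2], [2, 1, 2],
                 [1, 1, 1], [2, 4, 2], [1, 1, 2], [2, 1, 1]])

# k = 2 clusters, as in the worked example
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(data)

print(labels)                   # cluster index assigned to each object
print(kmeans.cluster_centers_)  # the final centroids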
7.6 Quiz
1. Define machine learning.
2. Define deep learning.
3. Where will someone employ machine learning or deep learning?
4. How is regression useful?
5. What is linear regression?
6. How is linear regression different from logistic regression?
7. Differentiate clustering and classification.
8. Clearly explain how a SVM works.
9. Explain the way k-means clustering functions.
Further Reading
Burkov, A., 2019. The hundred-page machine learning book (Vol. 1). Canada: Andriy
Burkov.
Goldberg, D.E. and Holland, J.H., 1988. Genetic algorithms and machine learning.
Goodfellow, I., Bengio, Y. and Courville, A., 2016. Machine learning basics. Deep
Learning, 1, pp. 98–164.
Jordan, M.I. and Mitchell, T.M., 2015. Machine learning: Trends, perspectives, and
prospects. Science, 349(6245), pp. 255–260.
Mohri, M., Rostamizadeh, A. and Talwalkar, A., 2018. Foundations of machine learning.
MIT Press.
Sammut, C. and Webb, G.I., eds., 2011. Encyclopedia of machine learning. Springer
Science & Business Media.
Shalev-Shwartz, S. and Ben-David, S., 2014. Understanding machine learning: From
theory to algorithms. Cambridge University Press.
Williams, D. and Hill, J., 2005. U.S. Patent Application No. 10/939,288.
Zhang, X.-D., 2020. Machine learning. In A matrix algebra approach to artificial intelligence
(pp. 223–440). Singapore: Springer.
8
Playing with OpenCV and Python
8.1 Introduction
You have been exposed to image processing concepts in an extensive man-
ner throughout this book and we hope it has been an enjoyable learning
experience. For the cherry on top, this chapter shows how to play around
with OpenCV and Python. An interesting set of simple programs to help
enhance understanding is presented. Also, YouTube links to video lectures
by the authors are provided. We request readers try out these programs practically on their respective machines to get a feel for the implementation and to wholly learn the concepts.
DOI: 10.1201/9781003217428-8 177
FIGURE 8.1
The Ubuntu terminal.
FIGURE 8.2
The sudo apt update step.
complete the installation of the OpenCV and will be done in a couple of min-
utes. Figure 8.3 is the screenshot presented.
Are we done with the installation? Yes, but it is always best to verify if
things have been accomplished correctly. This validation is the next step.
The easiest way is to check the version of OpenCV with the command high-
lighted in Figure 8.4. It will reveal the version of OpenCV installed and only
if the installation is correct would you get this result.
FIGURE 8.3
OpenCV installation.
8.3 Image Resizing
Image resizing is all about playing with the pixel counts. It helps in getting a
clear result and reducing the computational time.
Let’s see a simple code snippet along with results. The function “cv2.resi
ze(image, (200, 250))” is where the height and width are specified as part of
the function. “Image” is the name of the image file. The complete code is
presented in Figure 8.5. The image size is doubled and halved as well. The
results are presented in Figure 8.6.
One can also have a look at the video at https://youtu.be/unp5tHxS3Ak to
understand the program better.
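A hedged version of the resizing snippet along the lines of Figure 8.5 (the file names are placeholders; cv2.resize expects the target size as (width, height)):

import cv2

image = cv2.imread('test1.png')

# Resize to an explicit width x height
resized = cv2.resize(image, (200, 250))

# Double and halve the size using scale factors instead of absolute dimensions
doubled = cv2.resize(image, None, fx=2.0, fy=2.0)
halved = cv2.resize(image, None, fx=0.5, fy=0.5)

cv2.imwrite('resized.png', resized)
cv2.imwrite('doubled.png', doubled)
cv2.imwrite('halved.png', halved)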
FIGURE 8.4
The OpenCV installation validation.
FIGURE 8.5
Image resizing.
8.4 Image Blurring
Blurring helps make the image less clear and the task can be accomplished
with filters. Choosing the filters is key for success. But why do we want to
blur an image? Simple. It will help in removing the noise. Three types of
blurring are generally followed, and all these filters are helpful in removing
the noise. The three filters are:
FIGURE 8.6
Image resizing results.
FIGURE 8.7
Image blurring with OpenCV.
The three types of blurring can all be implemented with the code in
Figure 8.7 and the subsequent results are presented in Figure 8.8. Also
one can refer to the video at https://youtu.be/ljfOBTAtRyc for a clearer
understanding.
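A hedged sketch along the lines of Figure 8.7, showing three commonly used blurring filters in OpenCV, which may or may not be the exact three referred to above (averaging, Gaussian, and median; the file names and kernel sizes are placeholders):

import cv2

image = cv2.imread('test1.png')

# Averaging (mean) blur with a 5 x 5 kernel
averaged = cv2.blur(image, (5, 5))

# Gaussian blur with a 5 x 5 kernel (sigma derived from the kernel size)
gaussian = cv2.GaussianBlur(image, (5, 5), 0)

# Median blur over a 5 x 5 neighborhood; effective against salt-and-pepper noise
median = cv2.medianBlur(image, 5)

cv2.imwrite('averaged.png', averaged)
cv2.imwrite('gaussian.png', gaussian)
cv2.imwrite('median.png', median)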
FIGURE 8.8
Image blurring results.
8.5 Image Borders
Drawing borders around an image is simple and interesting. With the
method “cv2.copyMakeBorder” one can easily do this. We need to specify
the arguments that correspond the number of pixels from the edge to consti-
tute the borders. The code clearly specifies all these and one can have a look
at the video at https://youtu.be/JGYdI5uVHi4 to understand things easier.
The code is presented in Figure 8.9 followed by the results in Figure 8.10.
FIGURE 8.9
Image bordering with OpenCV.
FIGURE 8.10
Image bordering.
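A hedged sketch along the lines of Figure 8.9 (the file names, border widths, and border color are placeholders):

import cv2

image = cv2.imread('test1.png')

# Add a 10-pixel constant (blue, in BGR order) border on all four sides
bordered = cv2.copyMakeBorder(image, 10, 10, 10, 10,
                              cv2.BORDER_CONSTANT, value=(255, 0, 0))

# A replicated border simply repeats the outermost rows and columns of pixels
replicated = cv2.copyMakeBorder(image, 10, 10, 10, 10, cv2.BORDER_REPLICATE)

cv2.imwrite('bordered.png', bordered)
cv2.imwrite('replicated.png', replicated)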
FIGURE 8.11
Image conversion with OpenCV.
FIGURE 8.12
The input image and grayscale image.
have chosen the Canny edge detector. Results are presented in Figure 8.14.
One can view https://youtu.be/YRmiVumcy4I to understand the process in
detail.
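A hedged sketch along the lines of Figure 8.13 (the file names and the two hysteresis thresholds are placeholders):

import cv2

# Canny works on a single-channel image, so read it in grayscale
image = cv2.imread('test1.png', cv2.IMREAD_GRAYSCALE)

# Optional smoothing reduces noise before edge detection
blurred = cv2.GaussianBlur(image, (5, 5), 0)

# Lower and upper hysteresis thresholds
edges = cv2.Canny(blurred, 100, 200)

cv2.imwrite('edges.png', edges)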
All the aforementioned exercises are easy enough for anyone to try, and we
strongly recommend all readers try them. Similar exercises are available by
referring to the playlist created by the authors at https://youtube.com/playlist?list=PL3uLubnzL2Tn3RVU5CywZC17aRGExyZzo.
FIGURE 8.13
Edge detection with the Canny edge detector.
FIGURE 8.14
Input image and detected edges.
FIGURE 8.15
Counting objects.
in Figures 8.18 and 8.19. The training and validation data set for both Fire
and No Fire are also shown in Figure 8.18. The complete code as well as the
results are presented in Figure 8.20.
Readers, we have come to the end of the book. This chapter was about
simple programs that can aid you in enhancing your level of understanding
about image processing. It is hoped the video lectures on YouTube will do
the same. We sincerely hope you had an enjoyable as well as an enthralling
learning experience.
FIGURE 8.16
Counting objects.
FIGURE 8.17
Counting objects.
FIGURE 8.18
Data set for Fire and No Fire.
FIGURE 8.19
Training and validation data set for Fire and No Fire.
FIGURE 8.20
Code and results for forest fire detection.
Index
C
Canny operator, 91
Charge-coupled device (CCD), 26, 27
Characteristics of Image Operations, 44
Classification, 163
Clustering, 170
Clustering-based segmentation, 135, 150
CMY color model, 34
Color image, 6, 7, 21
Color Models, 34
Comparison of edge detection operators, 103
Counting Objects with OpenCV, 185

H
Hexagonal sampling, 46
High-Pass Filters/Sharpening Filters, 119
Histogram-based segmentation, 134, 142
Hit-or-Miss Transform, 159
How SVMs Work?, 165
HSV color model, 34, 36, 37
Hyperplane, 164

M
MATLAB, 11
Morphological transforms-based segmentation, 135, 150

O
OpenCV, 11, 12
OpenCV Installation, 177

P
Photoelectronic Noise, 63
Photon Noise, 63
Pixel, 3, 4, 6, 23
Pixel Resolution and Pixel Density, 31–33
PNG (Portable Network Graphic), 49
Point operation, 44
Predicting Forest Fire with OpenCV, 185

S
Salt-and-Pepper Noise, 66
Sampling, 27, 29
Segmentation algorithm based on a global threshold, 136
Segmentation algorithm based on a variable threshold, 138
Segmentation algorithm based on multiple thresholds, 142
Selection of Global Threshold Using Otsu Method, 137
Sobel operator, 79
Spatial versus frequency domain, 108
Step edges/discontinuity, 75
Steps in Digital Image Processing, 49
Storage, 1, 2, 4, 6, 21, 29
Structured Noise, 67
Subtractive Color, 35
Support Vector Machine (SVM), 164
Support Vectors, 165

T
Texture-based segmentation approaches, 135
Thermal Noise, 64
Thresholding method, 134
TIFF (Tag Image File Format), 47
Types of edges, 73
Types of image noise, 63
Types of Neighbourhoods, 45
Types of Segmentation, 134

W
What edges are?, 73, 74
Why detect edges?, 73, 74

Y
YUV color model, 34