Digital Image Processing Using Python
A comprehensive guide to the fundamentals of digital image processing
www.bpbonline.com
First Edition 2025
ISBN: 978-93-65898-910
All Rights Reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored, and executed in a computer system but cannot be reproduced by means of publication, photocopy, recording, or any electronic or mechanical means.
All trademarks referred to in the book are acknowledged as properties of their respective owners but
BPB Publications cannot guarantee the accuracy of this information.
www.bpbonline.com
Dedicated to
My family
and
The only truth I know – Krishna
About the Author
Dr. Ravi Shanker completed his PhD from the Atal Bihari Vajpayee Indian
Institute of Information Technology and Management (ABV-IIITM),
Gwalior. He has gained research experience through a DST-SERB-
sponsored project at institutes of national importance. As a researcher, he
has developed various computer-assisted diagnostic (CAD) systems for
classifying brain MRI images.
Dr. Shanker has published articles in SCIE journals, authored book
chapters, and presented papers at various international and Scopus-indexed
conferences. He also serves as a reviewer for multiple SCIE and Scopus-
indexed journals, including The Journal of Supercomputing, Concurrency
and Computation: Practice & Experience, IEEE Access, and the
International Journal of Imaging Systems and Technology.
Currently, he works as an Assistant Professor at the Indian Institute of
Information Technology, Ranchi.
Shekhar is a Senior Data Scientist based in Hamburg, Germany, with 15
years of expertise in AI and machine learning. He earned his Master's in
Data Science with distinction, conducting pioneering research in computer
vision. His work has been featured in prominent deep-learning publications,
establishing him as an industry thought leader.
His technical expertise spans AWS, Google Cloud, Azure, and IBM Cloud,
where he excels at implementing enterprise-scale AI solutions. In his
current role, he leads innovation in generative AI, focusing on large
language model fine-tuning, RAG systems, and AI agent orchestration. He
also specializes in integrating these technologies into enterprise systems to
develop production-ready applications that deliver tangible business value.
As a technical leader, he builds and mentors high-performing data science
teams while implementing MLOps best practices. His leadership style
combines technical depth with business acumen, enabling organizations to
successfully navigate AI transformation.
Outside of work, Shekhar is an avid CrossFit enthusiast, cyclist, and
marathon runner. His comprehensive understanding of AI theory and
practice, particularly in generative AI, makes him a sought-after technical
reviewer and industry expert.
Acknowledgement
https://rebrand.ly/90b15f
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors, if any, that may have occurred during the publishing processes involved. To help us maintain the quality and reach out to any readers who might be having difficulties due to any unforeseen errors, please write to us at:
[email protected]
Your support, suggestions, and feedback are highly appreciated by the BPB Publications’ Family.
Did you know that BPB offers eBook versions of every book published, with PDF and ePub files
available? You can upgrade to the eBook version at www.bpbonline.com and as a print book
customer, you are entitled to a discount on the eBook copy. Get in touch with us at :
[email protected] for more details.
At www.bpbonline.com, you can also read a collection of free technical articles, sign up for a
range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.
Piracy
If you come across any illegal copies of our works in any form on the internet, we would be
grateful if you would provide us with the location address or website name. Please contact us at
[email protected] with a link to the material.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site
that you purchased it from? Potential readers can then see and use your unbiased opinion to make
purchase decisions. We at BPB can understand what you think about our products, and our
authors can see your feedback on their book. Thank you!
For more information about BPB, please visit www.bpbonline.com.
Table of Contents
9. Binary Morphology
9.1 Introduction
Structure
Objectives
9.2 Erosion
9.2.1 Illustration of erosion
9.2.2 Mathematics behind erosion
9.2.3 Application of erosion
9.2.4 Python code for erosion
9.3 Dilation
9.3.1 Illustration of dilation
9.3.2 Mathematics behind dilation
9.3.3 Python code for dilation
9.4 Erosion dilation duality
9.5 Opening and closing
9.5.1 Illustration of opening and closing
9.5.2 Mathematical formalization of opening and closing
9.5.3 Application of opening and closing
9.5.4 Python code for opening and closing
9.6 Hit and miss transform
9.6.1 Mathematics behind hit and miss transform
9.6.2 Python code for hit and miss transform
9.7 Boundary extraction
9.7.1 Mathematics behind boundary extraction
9.7.2 Subscripted vs. linear indices
9.7.3 Python code for boundary extraction
9.8 Hole filling
9.8.1 Defining a hole
9.8.2 Hole filling algorithm
9.8.3 Python code for hole filling
9.9 Region filling
9.10 Connected component analysis
9.10.1 Two pass method
9.10.1.1 Pass 1 of connected component analysis
9.10.1.2 Pass 2 of connected component analysis
9.11 Connected component analysis using skimage library
9.12 Convex hull
9.12.1 Graham scan procedure for finding convex hull
9.12.2 Code for understanding convex hull computation
9.12.3 Convex hull of objects in binary images
9.12.4 Python’s inbuilt method for finding convex hull
9.13 Thinning
9.13.1 Illustration of thinning
9.13.2 Mathematics behind thinning
9.13.3 Python code for thinning
9.14 Thickening
9.15 Skeletons
9.15.1 Illustration of skeletonizing
9.15.2 Mathematical formalization of skeletonizing
9.15.3 Python code for skeletonizing
Conclusion
Points to remember
Exercises
Index
CHAPTER 1
Introduction to Digital Images
1.1 Introduction
Let us begin our journey of digital image processing by understanding some
fundamentals about images. In this chapter, we intend to understand what
image data is. We will learn its representation inside computers and the
meaning of processing an image. A distinction between the types of images
will be made. We will explore the relationship between pixels of an image
i.e., neighborhood and connectivity. Various sections of this chapter will
enable the reader to develop a basic understanding of grayscale and red,
green, and blue (RGB) images and hence will equip the reader to play with
actual image data in the next chapter by using Python programming
language and related libraries. The concepts developed in this chapter will be
implemented in the next chapter to provide hands-on exercise and hence a
richer learning experience.
But before beginning, one must understand the utility of learning digital image processing. Out of all the senses available to human beings, vision is often associated with trust and carries a lot of information, hence the famous saying — a picture is worth a thousand words. Computers try to imitate the sense of vision by using cameras and the associated algorithms for processing the captured images. That is where the subject of digital image processing becomes important. It would be helpful for us if a computer could read the number plates of thousands of vehicles on a toll road instead of a human manually reading them. Obviously, the intelligence behind interpreting the image for identifying the number plate and extracting numbers from the image is to be developed by us. Well, that is the subject matter of this book. There are many more such examples which demand knowledge of digital image processing.
Structure
The chapter discusses the following topics:
• Digital images
• Grayscale and RGB images
• Basic image conventions and indexing
• Image formats
Objectives
After reading this chapter, the reader will be able to understand the popular types of images like RGB and grayscale. Another objective of this chapter is to introduce the basic element of an image, i.e., the pixel. Every image is composed of pixels, and every pixel has its identifiers. The reader will learn the conventions associated with the numbering of pixels as they are used by popular computational packages like Python. At the end of the chapter, image formats and their usage are discussed, which will help the reader distinguish between popular image storage formats like bitmap and Joint Photographic Experts Group (JPEG).
Figure 1.1: Grayscale image on the left and zoomed version of inset on the right
In a format called the 8-bit grayscale image, every pixel is represented by eight bits inside a computer. The smallest number is (00000000)2 = (0)10 and the largest number is (11111111)2 = (255)10. So, there are intensities from zero to 255 in this format. We may also have a 16-bit grayscale format, and there, the white intensity will be represented by 65535 because that is the largest number that can be represented by 16 bits. The convention of zero for black and 255 for white is also arbitrary. One may choose to represent black by 255 and white by 0 too. But in this book, we will stick to the former as it is widely used. In general, we may have an n-bit grayscale image.
Another important point that we need to understand at this stage is that between black and white, theoretically, there are infinite shades (intensities) of gray. But in any given format, we have finite shades of gray. For example, in an 8-bit grayscale format, we have intensities from zero to 255, i.e., a total of 256 intensities. Despite this quantization of infinite intensities to finite levels, visually, there is no difference between the actual scene and the image. This is because the human eye is insensitive to such small changes in intensity between two adjacent shades of gray.
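As a quick check of these ranges, the largest intensity in an n-bit format is 2^n − 1. Once Python is set up (see Chapter 2), the following minimal sketch verifies this (the loop values are only illustrative); NumPy's integer type information reports the same limits:

import numpy as np

# Largest intensity representable by an n-bit unsigned format is 2**n - 1
for n in (1, 8, 16):
    print(n, "-bit grayscale: intensities 0 to", 2**n - 1)

print(np.iinfo(np.uint8).max)    # 255   -> white in an 8-bit grayscale image
print(np.iinfo(np.uint16).max)   # 65535 -> white in a 16-bit grayscale image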
[Table from the original listing sample colors with their RGB triplets; for example, [255, 128, 0] is orange.]
Figure 1.4: R, G, and B frames in order from left to right for Figure 1.3
Note: The first row of blocks has no green or blue component as those values are zero in RGB
representation. It only has a red color in various proportions. The same applies to rows two
and three for green and blue colors. The last row, however, contains five blocks having random
colors. Now for this image, the corresponding red, green, and blue frames are shown in Figure
1.4. Evidently, the red frame does not look red at all. The same is true for green and blue
frames. Then what do these frames represent?
Let us understand this by looking only at the red frame in Figure 1.4. Its second and third rows appear completely dark. This is expected, as those two rows have zero values for the red component. Further, the first row is visible. Blocks in the first row of the red frame have a decreasing order of brightness. This is also expected; from Figure 1.3, one can note that the red color component for all five blocks in row one has decreasing values: 255, 200, 150, 100, and 50. The last row is present in every frame. This can also be understood, as none of the block colors in the last row has a zero value for the R, G, or B component. So, the conclusion is that every frame of an RGB image is like a grayscale image. The intensity of every pixel in any of the R, G, or B frames represents how strongly that color is present for that pixel. White in the R frame will mean full presence of red, and zero means absence. Any value between these will mean red is present but not to the full extent.
Let us now see a natural image in Figure 1.5 and its corresponding R, G, and B frames in Figure 1.6:
Figure 1.6: R, G, and B components of the image shown in Figure 1.5
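Once Python and the related libraries are set up (Chapter 2), you can verify for yourself that each frame of an RGB image behaves like a grayscale image by separating and displaying the frames. A minimal sketch follows (the file name img_rgb.bmp is only a placeholder); note that OpenCV stores the frames in B, G, R order, so the indices below are reversed accordingly:

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('img_rgb.bmp', 1)   # placeholder file name, read in color (BGR order)
blue, green, red = img[:, :, 0], img[:, :, 1], img[:, :, 2]   # each frame is a 2D array

fig, ax = plt.subplots(1, 3)
for a, frame, name in zip(ax, (red, green, blue), ('R frame', 'G frame', 'B frame')):
    a.imshow(frame, cmap='gray', vmin=0, vmax=255)   # each frame displays as a grayscale image
    a.set_title(name)
    a.axis('off')
plt.show()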
1.3 Basic image conventions and indexing
We need to understand the basic conventions related to images. This will
help us in programming too. Refer to Figure 1.7 for the discussion that
follows.
Note: From now onwards, we will be discussing grayscale images only. The generalization of
the presented concepts to higher dimensional images is straightforward.
The first fact to note in Figure 1.7 is that the axis of X matches the conventional x-axis, but the axis of Y is in the opposite direction to the conventional y-axis (i.e., it points vertically downwards). The axis of X increases as we increase the column count, and the axis of Y increases as we increase the row count. This convention is followed by most programming languages and packages like Python, MATLAB, GNU Octave, and Scilab:
This pixel index is of two types: subscripted index and linear index. In subscripted indexing, a pixel is accessed by specifying its row number i and column number j in the format (i,j). This is a very convenient way of addressing or indexing a pixel, as it is visually intuitive. But since every array, be it 1D, 2D, 3D, or in general ND, is stored as a 1D array inside the memory of the computer, accessing pixels by their subscripted indices puts the additional computational load of converting those to linear indices. That is where we come to linear indices. These are the indices of an equivalent 1D array formed from a 2D (or in general ND) array by placing the columns of the 2D image one over the other in order. In Figure 1.7, please see that for every pixel the subscripted indices are shown in small brackets () and linear indices are shown in []. So, assuming the name of the image is I, a pixel may be accessed in the following two alternative ways of indexing:
I(i,j) or I(k), where k = j×r + i (r being the number of rows)
If subscripted indices are known, it is easy to calculate the linear indices as
shown in the formula given in Figure 1.7. The reverse is also trivial. Also,
see what shape and size mean in the context of images in Figure 1.7.
Note: We have started counting rows and columns from zero. This is not necessary, but this is the convention Python follows, so we will stick to it. Other programming languages, like MATLAB and GNU Octave, start numbering from one. Although we are going to learn Python programming only in the subsequent chapters, just for reference, the Python code for subscripted-to-linear index conversion and back is given in Section 9.7.2. You may wish to revisit this section after getting started with Python.
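For a quick preview of what that conversion looks like, here is a minimal NumPy sketch, assuming the column-stacking convention k = j×r + i of Figure 1.7:

import numpy as np

r, c = 4, 3                 # rows and columns of the image
i, j = 2, 1                 # subscripted index (row, column), counting from zero

k = j * r + i               # linear index when columns are stacked one over the other
print(k)                    # 6

# NumPy provides the same conversions; order='F' stacks columns (Fortran order)
print(np.ravel_multi_index((i, j), (r, c), order='F'))   # 6
print(np.unravel_index(6, (r, c), order='F'))            # (2, 1)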
Conclusion
In this chapter, an introduction to the fundamentals of digital image processing was given. By now, the reader has learned how grayscale and colored images differ. Readers will also appreciate that an image is a rectangular grid structure whose basic element is the pixel. Every pixel can be uniquely identified by indexing, which can be subscripted or linear. The various image formats for saving image data in the context of storage and transmission were also discussed.
In the next chapter, we will learn the basics of Python programming
language, so that subsequently we can modify the images the way we want
through methods and algorithms.
Points to remember
• The basic element of a digital image is a pixel.
• The image is a rectangular grid of pixels.
• A pixel is uniquely identified by its location in an image.
• Grayscale images are two-dimensional arrays.
• Each pixel in an 8-bit grayscale image can have integer values in the
range zero to 255.
• Colored images are three dimensional arrays.
• Each pixel in the 24-bit colored image is represented by a tuple of three
numbers where individual numbers can have integer values in the range
zero to 255
• Pixels can be linearly indexed or indexed using subscripted indices.
• The Y-axis of the cartesian coordinate and image are in opposite
directions.
• A bitmap is a compression-free image storage format, but it takes the most space to store.
Exercises
1. Identify which color is gray out of the following: (256,45,36), (0,0,255),
(10,10,10). [Hint: See Section 1.2.2]
2. Use any image viewer (like MS Paint on the Windows operating
system) and change the type of an image from .bmp to .jpg and save.
Notice the change in file size of the two images and explain the reason
for this size increment/decrement. [Hint: See Section 1.4]
3. For an image having size 500x100, find the linear indices corresponding
to pixel coordinates (50,50). [Hint: See Section 1.3]
CHAPTER 2
Python Fundamentals and Related Libraries
2.1 Introduction
In order to play with images and perform processing tasks, we will need to learn some programming that is friendly to images. Out of the available programming languages, we prefer Python on account of its open-source, free nature and easy syntax. Practically speaking, it is also lightweight on computers, even if they are old. Its cross-platform nature also makes it the programming language of choice; you can run it on Windows OS, macOS, Linux, or Android. A second but important reason is the availability of Python libraries for various tasks, including contemporary ones like machine learning, deep learning, and artificial intelligence. There are plenty of libraries available for Python. We will learn some of them in this chapter in the context of image processing and many more in the subsequent chapters. Python also lets you integrate various systems easily.
At the time of writing this book, the Python version is 3.x, but concepts developed here could be used on any
version (past and most probably future too) with minimal to no changes in the code. Also, remember that this
chapter is not an exhaustive coverage of Python. In this chapter, we intend to learn enough programming in
Python to start exploring the world of image processing just like a newborn that spends the first few years
learning to speak and then starts to comprehend ideas.
Structure
The chapter discusses the following topics:
• Installing Python software
• General procedure for installing libraries in Python
• Basic language elements of Python
• NumPy library
• Matplotlib library
• OpenCV library
• Pandas library
Objectives
After reading this chapter, the reader will be able to write programs in the Python programming language. Since no background in Python is assumed, this chapter serves as a self-learning module from scratch. The reader will be able to install Python, import images from their computers (hard drives) into Python, and display them. The reader will also be able to display several pictures and plots in the same figure window with annotations. Another objective is to learn the Python libraries relevant to image processing. The user will also be able to define custom packages in Python after reading this chapter.
Note: Inside a block (whether it is a block of if, elif, or else), one may have another conditional statement if desired. It is possible to nest conditional statements as long as we use proper indentation.
--------------------------------------------------------- 2
Value returned by lambda ... 1
Value returned by lambda ... 4
Completed Successfully ...
Line numbers 10 to 13 of Code 2.6 illustrate the definition of a function. A function needs to be defined before its usage. A function is also meant to serve a repetitive task, like a loop. The difference can be understood by extending our old example, the evaluation of answer scripts by the professor. Assuming that a professor is teaching a course in more than one university, he/she has to go to the respective exam center of each college to check answer scripts. The evaluation standard in this case (the function) is again the same, but the services are needed at different geographical places (in our case, different places in the code); that is why functions are useful. Apart from this, they are more computationally expensive as compared to loops in the sense that, while executing a main program, if a function call is made, the computer must shift the control from the current position to the position of that function, execute it, and then return to the main program. During this journey, the important data of the main program is kept on the stack (a last in, first out list) and retrieved when the control is returned. That is why functions are more expensive than loops.
Coming back to defining a function in lines 10 to 13 of Code 2.6, the def keyword is used for defining a function, followed by the name of the function, which is a single continuous string. Then, in small brackets, the input arguments of the function need to be defined. In our case, the function name is my_func and it takes two arguments, a and b, as input. This line must be terminated by a colon. Then comes the function body; it has the same rules of indentation as if-else and loops. Notice the last line of the function body. The keyword used there is return. If you have read the last paragraph, this should not be a mysterious word by now. After the return keyword, you may notice two variables, c and d. These are the variables that are computed in the function definition and returned to the caller. Note that a function may take any number of input arguments (of any type, like int, float, list, etc.) and may return any number of variables (of any type) in Python. Now, let us have a look at what has happened inside the function definition in this specific case. It is easy to see from line numbers 11 and 12 of Code 2.6 that c and d are simply the addition and subtraction, respectively, of a and b.
Now, let us see line number 16 of Code 2.6. Here, we called the function that we defined. The input parameters are 2 and 3, and the returned results are collected in variables m and n, respectively. Check that in the output when c and d are printed.
In line number 18 of Code 2.6, the function is called again (you can use a function any number of times, anywhere you want in the code). The point of difference from the first function call is that the second input argument, instead of being a number, is sum(my_func(5,6)). Here, my_func(5,6) returns two numbers, which the sum() function adds up to give a single number, and that acts as the second argument, which is perfectly valid.
A special class of functions that have only one output and more than one input and can be defined in only one
line (one expression) of code are called lambda functions or simply lambdas. In line number 26 of Code 2.6,
a lambda function is defined as x=lambda a,b,c : a+b-c. Here, a, b, and c are inputs (there can be as many as
desired). The expression that is evaluated to produce a single output x is a+b-c. This function is used as
x(2,3,4) to give an output that may be collected in some other variable or printed as shown in the next line.
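Code 2.6 is not reproduced in full here; the following is a minimal sketch of its function and lambda portions as described above (line numbers and the remaining lines of the listing are omitted):

# Defining a function: two inputs, two outputs
def my_func(a, b):
    c = a + b               # first returned value
    d = a - b               # second returned value
    return c, d

# First call: inputs 2 and 3, results collected in m and n
m, n = my_func(2, 3)
print('Value of c ...', m)          # 5
print('Value of d ...', n)          # -1

# Second call: the second argument is itself built from another call;
# sum(my_func(5, 6)) adds the two returned values 11 and -1 to give 10
m, n = my_func(2, sum(my_func(5, 6)))

# A lambda: several inputs, one expression, one output
x = lambda a, b, c: a + b - c
print('Value returned by lambda ...', x(2, 3, 4))   # 1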
Figure 2.12: Images used for import and display. (a) The Gwalior Fort Image (b) River Ganges Image
Once installed, OpenCV can be utilized by entering the statement import cv2, as shown in line number 4 of Code 2.14. Since its name is already short, i.e., cv2, we will not assign a nickname to it. Line number 7, which reads input_image1=cv2.imread('img1.bmp',1), shows us the way to import an image into Python. Here, input_image1 will be an array of unsigned 8-bit integers. Since it is a colored image, there will be three frames, and hence, input_image1 is a 3D array. The cv2.imread function is used to read images. The first argument is the name of the image with its file extension. The second argument tells whether we want to import it in colored or grayscale mode: 1 stands for color and 0 for grayscale. In line number 8, we have imported another image in grayscale mode (although it was colored originally).
01- #======================================================================
02- # PURPOSE : Import/Displaying Images in OPENCV's Default way
03- #======================================================================
04- import cv2
05-
06- # Read the image
07- input_image1=cv2.imread('img1.bmp',1)
08- input_image2=cv2.imread('img2.bmp',0)
09- # Second argument in above command is 1 for reading image as colored
10- # and 0 for reading it as grayscale
11-
12- # Display the image
13- cv2.imshow('First Image',input_image1)
14- cv2.waitKey(2000) # Put time in milliseconds here
15- cv2.destroyAllWindows()
16-
17- cv2.imshow('Second Image',input_image2)
18- cv2.waitKey(0) # For 0, python waits for keypress
19- cv2.destroyAllWindows()
20-
21- # Print shape of image
22- a1=input_image1.shape;
23- a2=input_image2.shape;
24- print('The shape of image is 1 ... ',a1,'\nThe shape of image is 2 ... ',a2)
25-
26- print("Completed Successfully ...")
Code 2.14: Basic image import and display using OpenCV’s default way
Output of Code 2.14 (on Python shell):
The shape of image is 1 ... (384, 512, 3)
The shape of image is 2 ... (353, 500)
Completed Successfully ...
Line numbers 13 to 19 of Code 2.14 show two ways of displaying images in sequence. The first image will be displayed for 2000 milliseconds (i.e., two seconds), and then it will be closed automatically. Next, the second image will be displayed; do not be surprised to see it in grayscale mode, as it was imported in that mode to begin with. It will be displayed till the user presses any key (with the image window in focus). Once a key is pressed, the shapes of the images will be printed on the Python shell. It can be noted from the output on the shell that the first image (which was imported in colored mode) has 384 rows, 512 columns, and a height of three corresponding to the red, green, and blue frames. The second image only has 353 rows and 500 columns; the height parameter is absent as it is only a 2D image, as discussed earlier in Sections 1.2.1 and 1.2.2.
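For comparison, the same image can also be displayed with Matplotlib (see Sections 2.7.1 and 2.7.2). A minimal sketch is given below; the one point to keep in mind is that OpenCV reads images in B, G, R channel order, so the channels are reordered before display:

import cv2
import matplotlib.pyplot as plt

img_bgr = cv2.imread('img1.bmp', 1)                   # OpenCV reads in BGR order
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)    # reorder channels for Matplotlib

plt.imshow(img_rgb)       # Matplotlib expects RGB
plt.title('First Image (Matplotlib)')
plt.axis('off')
plt.show()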
Conclusion
In this chapter, the reader must have developed familiarity with the Python programming language. The process of installation and the usage of basic language elements of Python for a beginner to average-level programmer were illustrated. After reading this chapter, tasks like importing, displaying, and saving images through the Python programming language should feel easy.
In the next chapter, some low to medium level image processing tasks are introduced for the reader to start their journey of manipulating digital images.
Points to remember
• Python is an open source and free programming language.
• The syntax of Python is mostly similar to MATLAB.
• One can use Python shell for testing individual commands or use files to write multiple lines of code.
• There are multiple libraries/packages available in Python. Multiple packages may contain functions with the same name.
• NumPy library is used for array based computations.
• Matplotlib library is used for making plots.
• OpenCV library is used for image manipulation.
• Pandas library is used for dealing with spreadsheets (Excel files).
Exercises
1. Import a colored image from your hard disk to Python (IDLE) and display it on the screen. (Hint: see
Section 2.7)
2. Examine the array created in Python after importing the image in question one for its shape and intensity
values. (Hint: see Section 2.5)
3. Illustrate the differences between imshow function of OpenCV and Matplotlib library. [Hint: see Section
2.7.1 and 2.7.2]
4. After importing the numpy library (as np), try the following command on Python shell – ‘help(np.pi)’ and
read and understand the result.
5. Write a Python program to add the contents of columns A and B in sheet one of the Excel file named
Data.xlsx. Columns one and two contain numbers only. Use only the first ten rows. Put the result in
column two of the same Excel file but in sheet two. To begin with, create an Excel sheet named Data.xlsx
initialized by some random data. (Hint: see Section 2.8)
CHAPTER 3
Playing with Digital Images
3.1 Introduction
After going through Chapter 1 and Chapter 2, the readers must have become familiar with image basics as well as the fundamentals of Python programming (relevant to digital image processing). In this chapter, we will take the first steps to manipulate images. We will learn the basic relationships between pixels, what a histogram is (a plot of intensity versus the corresponding number of pixels in a given image) and how to make it, and some transformations on image intensities and pixel locations. Although most of the book is dedicated to grayscale images, we will also touch on the topic of conversion from one color framework to another.
Structure
This chapter will discuss the following topics:
• Playing with pixel and patches
• Neighborhood of a pixel
• Histogram processing
• Basic transformation on images
• Color models
Objectives
After reading this chapter, the reader will understand some very fundamental operations on images, like manipulating pixels and their neighborhoods, drawing the histogram of a given image to extract information about the frequency of individual intensities, and basic intensity and spatial transformations on images. The reader will also learn how color is represented inside computers.
where X is the initial pixel coordinate, A is the linear map matrix, and B is the translation matrix.
Before discussing the affine transformation in totality, we will discuss the effect of linear mapping on a uniform grid in 2D. In Table 3.3, we will consider the effect shown below:
Equation 3.4:
X' = AX
Notice that, as compared to the equation of affine transformation, the translation matrix B = 0 here. For A = [[a, b], [c, d]], some special matrices for very specific values of a, b, c, and d are shown in Table 3.3 (a small code sketch for experimenting with these maps follows the table):
Table 3.3: Some special linear maps for specific values of a, b, c, and d (the before/after grid illustrations for each row appear in the original table)

S. No. | Linear map name | Transformation matrix
1. | Identity transformation | [[1, 0], [0, 1]]
2. | Scaling in X-direction by a factor of 2 | [[2, 0], [0, 1]]
3. | Rotation by angle θ in counter-clockwise direction (θ = π/6 in the current case) | [[cos θ, −sin θ], [sin θ, cos θ]]
4. | Mirroring about X axis | [[1, 0], [0, −1]]
5. | Horizontal shear mapping by factor m | [[1, m], [0, 1]]
6. | Horizontal squeeze mapping (by factor k) | [[k, 0], [0, 1/k]]
7. | Projection onto Y axis | [[0, 0], [0, 1]]
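To experiment with the entries of Table 3.3, a 2×2 matrix can be applied directly to a set of grid points. The following is a minimal sketch (the grid size and the shear factor m = 0.5 are only illustrative):

import numpy as np
import matplotlib.pyplot as plt

# A uniform grid of 2D points, stacked as columns of a 2 x N array
xs, ys = np.meshgrid(np.arange(0, 5), np.arange(0, 5))
X = np.vstack([xs.ravel(), ys.ravel()])

# Horizontal shear mapping by factor m (entry 5 of Table 3.3)
m = 0.5
A = np.float32([[1, m],
                [0, 1]])

X_dash = A @ X          # X' = AX, applied to every grid point at once

fig, ax = plt.subplots(1, 2)
ax[0].scatter(X[0], X[1])
ax[0].set_title('Before')
ax[1].scatter(X_dash[0], X_dash[1])
ax[1].set_title('After horizontal shear')
plt.show()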
Apart from the linear mappings depicted in Table 3.3, there is an additional global movement of the set of grid points as directed by the translation matrix B. The affine transformation can also be written in the following compact format, as shown in the following equation:
Equation 3.7:

[x']   [a  b  e] [x]
[y'] = [c  d  f] [y]
[1 ]   [0  0  1] [1]

The advantage of the above form is that we now have to deal with only one matrix; earlier, there were two – one for the linear map and another for the translation. Note that the above equation has 6 degrees of freedom as there are 6 free parameters a, b, c, d, e, and f – they can, in general, have any arbitrary value.
The matrix for squeeze mapping by k = 1.5, together with a translation in the X-direction by 40 and in the Y-direction by 50 (the values used in Code 3.13), is:

[1.5    0    40]
[ 0   1/1.5  50]
[ 0     0     1]
Let us try to apply the affine transformation (as dictated by the above matrix) to an actual
image. Then, we will discuss some important points about affine transformation. The code
for that is Code 3.13 with its output shown in Figure 3.15. In line number 22, affine
transformation matrix (ATM) for the above specified case of squeeze mapping is created.
In line number 28, the affine transform is applied by using the warp function. However, instead of passing ATM as the second argument, we pass its inverse (this is a requirement of the function) as np.linalg.inv(ATM); linalg refers to NumPy's linear algebra sub-library, and inv means inverse. In the output shown in Figure 3.15, there are two versions of the transformed output.
The first is cropped, and the second is uncropped. The size of the cropped version is the
same as the input image, but it loses some image portion that goes outside the input image
dimensions. The second version, which is an uncropped version, is created in line number
30. Notice that the command is exactly the same as line number 28, with only one additional
argument output_shape=(rows/1.5+50,cols*1.5). This forces an output shape on the image,
and hence, we can see the chopped-off portions of the processed image as well. We only
have to pass a tuple in the format (rows desired, columns desired) that we wish to see in
output.
01- #======================================================================
02- # PURPOSE : Learning Affine Transformation on images
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- from skimage.transform import warp # For warping the image by transform matrix
06- import my_package.my_functions as mf # This is a user defined package and ...
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- #--------------------------------------------------------------------------
10- # Importing the image and displaying
11- #--------------------------------------------------------------------------
12- input_image=cv2.imread('img1.bmp',0)
13- rows, cols = input_image.shape
14-
15- fig,ax=plt.subplots(1,3)
16- fig.show()
17- mf.my_imshow(input_image,"(a) Input Grayscale Image",ax[0])
18-
19- #--------------------------------------------------------------------------
20- # Creating Affine Transform Matrix (ATM)
21- #--------------------------------------------------------------------------
22- ATM=np.float32([[1.5,0,40],[0,1/1.5,50],[0,0,1]])
23-
24- #--------------------------------------------------------------------------
25- # Warping the image according to the matrix selected and displaying
26- #--------------------------------------------------------------------------
27-
28- affine_transformed_image=mf.norm_uint8(warp(input_image,np.linalg.inv(ATM)))
29- mf.my_imshow(affine_transformed_image ,"(b) Affine image (cropped)",ax[1])
30- affine_transformed_image=mf.norm_uint8(warp(input_image,np.linalg.inv(ATM),output_shape=(np.int16(rows/1.5+50),np.int16(cols*1.5))))
31- mf.my_imshow(affine_transformed_image ,"(c) Affine image (un-cropped)",ax[2])
32-
33- plt.show()
34- print("Completed Successfully ...")
Code 3.13: Code for affine transformation on image
The output of the code is shown below:
Figure 3.15: Output of Code 3.13 (a) Original grayscale image (b) Squeeze mapping by a factor of 1.5 – cropped output, cropping is done to match the input image size (c) Squeeze mapping by a factor of 1.5 – uncropped output
While interpreting the output in Figure 3.15, remember the convention of the X and Y axes
on the image described in Section 1.3 i.e., the top left corner of the image is point (0,0), the
top edge of the image is X axis (positive rightwards), and the left vertical edge is Y axis
(positive downwards). For squeeze mapping by k=2, one may note in part (c) of Figure 3.15
that the image has been stretched by a factor of 2 in the X direction and shrunken by a factor
of 2 (or stretched by ½) in the direction of Y. Further, in the X direction, there is a shift by
40, and in the Y direction, there is a shift by 50.
You may now try all the transforms discussed in this section by replacing the ATM in line number 22 of Code 3.13 (and, if needed, the output_shape in line number 30). Now, having understood the mathematical background of affine
transforms and their application, let us understand some important characteristics of affine
transform. Refer to part (a) of Figure 3.16, wherein the original grayscale image with a
geometrical shape is shown. In parts (b), (c), and (d), shear mapping, rotation, and squeeze
mapping are applied successively.
where R = [[a, b], [c, d]], T = [[e], [f]], and E = [g h]. From the previous section, we know the effect of R (which is a rotation, or a linear transformation in general) and T (which is a translation). For an affine transformation, E = [0 0] and i = 1; also, z = z' = 1.
n×n transforms an n dimensional vector (point) to another n dimensional vector. For
example, in Equation 3.8, a 3D point (x,y,z) is transformed to (x’,y’,z’). However, in the
previous section, we used this matrix to translate a 2D point to another 2D point such that
z=z’=1. Actually, (x,y,1) and (x’,y’,1) are still 3D points but those 3D points are restricted so
that they only lie in plane z=1, and hence we treat them as 2D points. We did this because we
wanted to incorporate translation together with linear transformation, making the overall
operation non-linear, called affine transform. In 3D, this transformation is linear, but for a 2D
point, this becomes non-linear.
Now, we want to investigate what happens when g, h, and i can take any arbitrary values.
However, remember that we are still talking about 2D points. To understand the situation practically, we will define a new coordinate system called the homogeneous coordinate system, as depicted in Figure 3.17. The 3D coordinate (x, y, z) is called the homogeneous coordinate, and we are paying special attention to the plane z = 1. That plane is the plane where we would like to watch the phenomenon of transforming a set of points (or shapes in general) to another set of points (i.e., the initial and final positions of the points or shapes will remain in the plane itself). Let us assign a new coordinate system of 2D cartesian coordinates to this plane (z = 1), where a general point is represented by (x, y) on that plane.
Also, understand that every point lying on that line L will map to the same point on the
plane. That is, through the above procedure, points in 3D are projected onto points in 2D.
However, remember that the two coordinates are different.
Now, let us refer to Figure 3.18. Notice that the four planes shown there by the names P1, P2, P3, and P4 are actually the same planes when seen from the 3D origin, because their projections (considering the origin as the projection center) on the plane z = 1 are the same. That is, P2 (which lies in the plane z = 1) is the projection of P1 as well as of P3 and P4.
Figure 3.18: Equivalent planes in 2D coordinates
So, planes P1, P3, and P4, which are randomly oriented in 3D, have the same image in the 2D plane z = 1. This is called the equivalence of planes during projection.
Now, let us come back to defining our objective of doing a projective transformation. A 2D projective transformation (i.e., in the plane z = 1) is defined as the transformation of one quadrilateral to another quadrilateral. This is an informal definition of projective transformation. In general, the transformation defined in Equation 3.8 will transform a 3D point to a 3D point (or a 3D shape to a 3D shape). However, if we apply some constraint on the structure of the matrix/equation, we will be able to map a 3D shape to a 3D shape so that the initial and final shapes lie in the plane z = 1.
The constraint that we apply on Equation 3.8 is that the coordinates of the initial and final points will be written in homogeneous coordinates. Hence, Equation 3.8 can be rewritten, as shown below:
Equation 3.9:

[x']   [a  b  e] [x]
[y'] = [c  d  f] [y]
[z']   [g  h  i] [1]

All we have done in Equation 3.9 is that, on the RHS, the coordinates of the source/input points are written with z = 1 because that shape already lies on the plane z = 1. Through the 3x3 matrix transformation, that shape will be transformed to an arbitrary location in 3D. This is represented by (x', y', z'). Its projection on the plane (of 2D cartesian coordinates with z = 1) is (x'/z', y'/z', 1), and the corresponding 2D cartesian coordinate is (x', y').
Since the introduction of homogeneous coordinates, any point in 3D will in general be normalized such that its z coordinate is 1. In the RHS of Equation 3.9, whether we write (x, y, 1) or (kx, ky, k) for any non-zero k, both are equivalent. So, Equation 3.9 is unique up to a scale factor. From the 3x3 transformation matrix, if we take out a common factor, say i, every other element will be divided by i. Hence, the last element of the matrix will be 1 (although this can be done with any element, conventionally it is done with the last one). So, Equation 3.9 can be rewritten as:
Equation 3.10:

[x']   [a  b  e] [x]
[y'] = [c  d  f] [y]
[z']   [g  h  1] [1]
This is our final equation for projective transformation, which has 8 degrees of freedom (as
in a 3x3 matrix, there are 8 values to choose). Remember, the source/input and target/output
coordinates are written in homogeneous coordinates, and the equation is valid up to a scale
factor only because of the reasons discussed earlier.
We are already aware of what a, b, c, d, e, and f will do in the transformation matrix. Now, through Figure 3.19, let us see what g and h are capable of doing. For this, the transformation matrix that we are using is [[1, 0, 0], [0, 1, 0], [g, h, 1]]. One may note the change of values of g and h in the various sub-plots of Figure 3.19. This process is called elation. Notice that due to this, parallel lines will not remain parallel, and the ratios of distances will not be preserved either.
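A minimal sketch of applying such a matrix to an image, in the same style as Code 3.13 (the particular values of g and h below are only illustrative, not the ones used for Figure 3.19), is shown below; as with the affine case, warp expects the inverse of the transformation matrix:

import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import warp

input_image = cv2.imread('img1.bmp', 0)      # same grayscale test image as in Code 3.13

# Identity for a, b, c, d, e, f, with small non-zero g and h (the elation parameters)
g, h = 0.0005, 0.001                          # illustrative values only
PTM = np.float32([[1, 0, 0],
                  [0, 1, 0],
                  [g, h, 1]])

projective_image = warp(input_image, np.linalg.inv(PTM))

fig, ax = plt.subplots(1, 2)
ax[0].imshow(input_image, cmap='gray')
ax[0].set_title('Input')
ax[1].imshow(projective_image, cmap='gray')
ax[1].set_title('Projective transform')
plt.show()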
The reverse relation is trivial to find out. The CMY color model is mostly used by the
printing industry.
Equations 3.13 through 3.18, giving the remaining color model conversion formulas, appear in the original text.
Conclusion
In this chapter, the reader must have developed a beginner-level understanding of dealing with the smallest element of an image, that is, a pixel. Histogram processing was introduced to extract some useful information from images. Through histogram equalization, a histogram with an arbitrary distribution can be converted to a uniform distribution. Histogram matching was also introduced to make the histogram of a given image similar to that of a target image. Some basic global transformations were introduced, and affine and projective transformations were covered in detail. For color image processing, basic color models were also introduced, illustrating their relation to the RGB color model to facilitate conversion.
Points to remember
• The neighborhood of a pixel can be of many types, like the 4-neighborhood and the 8-neighborhood. It can be closed (meaning the central pixel is included) or open.
• The histogram of an image is a plot of intensities on the X-axis and their frequencies on
the Y-axis.
• Histogram equalization means modifying the histogram of the current image so that it
becomes uniform. This is achieved by modifying pixel values in the original image by
the prescribed procedure.
• Histogram matching/specification means making a histogram of one image like the other.
• The logarithmic transform on images is a global, non-linear transform that brightens the darker intensities.
• Affine transformation is linear mapping plus translation.
• Projective transformation is a generalization of affine transformation in the sense that
affine uses 6 degrees of freedom, but projective uses 8.
• The RGB color model is used by display devices like computer screens, the CMY color model is used by printers, and the HSI model is used by color picker programs.
Exercises
1. Using Python, import two images captured from the same camera and from the exact
same scene. The first image should be taken in daylight and the second at night. Form
their histograms and compare them. [Hint: Refer Code 3.2]
2. For the images imported in question 1, equalize both the images and then compare their
histograms and visual quality.
3. For the image shown in part (a) of Figure 3.11, choose a suitable gamma transformation
by choosing the value of γ so that the results match with part (b) of Figure 3.11 due to
the log transformation.
4. Scan a document using your mobile camera and straighten the image so formed by
using projective transformation. [Hint: See Section 3.5.2.2]
5. Use a color picker program and note the HSI value for the chosen color. Calculate the
corresponding RGB values and verify with the RGB values displayed in the color
picker. [Hint: In the Microsoft Windows Paint program, the color picker shows both
values].
CHAPTER 4
Spatial Domain Processing
4.1 Introduction
In this chapter, we will explore the methods/systems that are applied to digital images for processing them and obtaining the desired output in the spatial domain. For example, one may want to find all vertical lines (if any) in a given image. One may want to highlight/illuminate the areas that are darker in the image, leaving other areas untouched, or one may want to sharpen the image for better visibility — countless such applications can be listed. For these specific applications, there should be specific systems that get the job done. An important question at this stage is how many such systems can be designed. Even if we have designed systems for all such applications, a new application will pop up the next moment.
We will begin by understanding signals (image in our case) and systems in
1D and, from there, generalize the understanding of 2D image processing
systems. Although the introduction to signals and systems in this chapter is
not mathematically rigorous, it is still necessary and intuitively appealing for
understanding.
Structure
In this chapter, we will cover the following topics:
• Signals in one dimension
• Systems in one dimension
• Graphical illustration of one-dimensional convolution
• One dimensional filter design intuition
• Concept of two-dimensional filtering of images
• Two-dimensional filtering of images using Python
• Smoothening low pass filters in spatial domain
• Sharpening filters
• Convolution vs. correlation
Objectives
After reading this chapter, the reader will be able to understand the concepts
of spatial domain filtering for one and two-dimensional signals. The reader
will be able to apply concepts so developed to digital images for enhancing
them using the well-established methods in spatial domain processing. The
reader will be able to understand various smoothening and sharpening filters (just like a water filter that separates dirt from water) and the method of creating such filters. Finally, the operation of convolution for achieving spatial domain filtering through various filters (also called kernels) will also be presented.
Equations 4.3:

T(A·x1[n] + B·x2[n] + C·x3[n]) = A·T(x1[n]) + B·T(x2[n]) + C·T(x3[n])

where A, B, and C are scaling factors. The utility of having such system characteristics will become clear soon.
4.3.2 Time invariant systems
A system whose characteristics do not change with time is called a time
invariant system. That is, if you give one input to the system today and get
some output (today), and then you give the same input to it tomorrow, you
will get the exact same output that you got today. Obviously, we would like
our system to behave in this way but there are some applications where the
other category is also useful and that category is called time variant
systems. Mathematically, if a system is represented by y[n] = T(x[n]), where T is the transformation (or system), x[n] is the input to the system, and y[n] is the output, then the following property should hold for it to be time invariant:
Equation 4.4:
y[n−a] = T(x[n−a])
Where a is the delay in time. For negative values of a, instead of a delay, we will have an advance in time.
In the above equation, h[n] can, in general, be infinite in extent. It will then be called an infinite impulse response (IIR). However, if it is finite in nature, it is called a finite impulse response (FIR).
The result of the application of this filter on a signal is shown in Figure 4.5:
The above kind of filters give high weightage to the element processed and
its immediate neighbors. The weightage decreases as we move away from
the element being processed. There could be many such weighting structures
like Gaussian weighting (a symmetric bell-shaped curve) — all with similar
results.
Where f'(x) is the digital derivative corresponding to a data point f(x), f(x+h) is the data point h points forward, and f(x−h) is the data point h points back. Now, there are two things to note: first, for digital/discrete data, we assume h = 1 (it might take any value, but for images, two consecutive pixels differ by 1 — this justifies our choice). Second, like the two-point forward/backward/central difference, there exist n-point forward/backward/central difference equations as well. However, for discussion purposes, we will use only 2-point formulas with h = 1. So, the preceding equations reduce to the following form:
Two-point forward difference (Equations 4.10):

f'(x) = f(x+1) − f(x)
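As a quick numeric check (a minimal sketch; the sample signal is arbitrary), the two-point forward difference with h = 1 is simply the difference of consecutive samples:

import numpy as np

f = np.float32([2, 4, 7, 7, 3])   # an arbitrary 1D discrete signal
f_dash = np.diff(f)               # forward difference: f(x+1) - f(x)
print(f_dash)                     # [ 2.  3.  0. -4.]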
The graphical illustration for the same is shown in the following figure:
Figure 4.8: Illustrating the 2-dimensional convolution process
Equation 4.15 can be compactly written, as:
Equation 4.16:
For a general pixel (x,y), Equation 4.16 will take the general form and
become Equation 4.14. For different filter sizes, the limits of summation will
change accordingly. It is conventional to create square-sized filters with a
total number of rows (and hence columns) as an odd number. Otherwise, it
will be difficult to determine the center of the filter.
Referring to Figure 4.8, the value of every pixel in the output image is
calculated in the same way. However, the calculation is done in order. We
start from the topmost left pixel, process it, and then move on to the next
pixel in that row. Once we are done with processing one row, we go to the
next row and repeat the same process until all the rows (and hence pixels)
are processed.
One issue that arises while processing is with the pixels located at the boundary. For example, in part (e) of Figure 4.8, when the
center of the flipped filter is overlapped with I(0,0) (shown in a dark gray
shade), some portion of the filter window hangs outside the image, and there
is no intensity value corresponding to those filter coefficients. We assume all
such intensity values to be zero. It is a good idea to append a frame of zeros
of appropriate size around the image beforehand, as shown in part (e) of
Figure 4.8 – this is called zero padding. If the filter (and hence flipped filter)
is of size 5x5, the frame of zeros that needs to be appended will be 2 pixels
thick. For a N×N filter size, the frame of zeros should be (N-1)/2 pixel thick
(assuming N to be an odd number). In the next section, we will explore the
effect of applying different filters to the images.
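A minimal sketch of this pixel-by-pixel procedure with zero padding (written for clarity rather than speed; the inbuilt functions used later in the chapter are preferred in practice) is given below:

import numpy as np

def convolve2d_naive(I, W):
    """2D convolution of image I with an odd-sized square filter W, using zero padding."""
    N = W.shape[0]                      # assume square N x N filter, N odd
    p = (N - 1) // 2                    # thickness of the frame of zeros
    Wf = np.flipud(np.fliplr(W))        # flip the filter for convolution
    Ip = np.pad(I, p, mode='constant')  # append the frame of zeros around the image
    out = np.zeros_like(I, dtype=np.float64)
    for y in range(I.shape[0]):                 # row by row ...
        for x in range(I.shape[1]):             # ... pixel by pixel
            region = Ip[y:y + N, x:x + N]       # image patch under the filter window
            out[y, x] = np.sum(region * Wf)     # sum of element-wise products
    return out

# Example: 3x3 averaging filter on a small random image
I = np.random.randint(0, 256, (6, 8)).astype(np.float64)
W = np.ones((3, 3)) / 9.0
print(convolve2d_naive(I, W).shape)     # (6, 8) -> same shape as the input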
Figure 4.10: Illustrating three types (shape) of output in the convolution process
The output of Code 4.2 is given as follows:
Shape of input image ... (66, 81)
Shape of convolution filter ... (5, 5)
Shape of SAME CONVOLUTION ... (66, 81)
Shape of VALID CONVOLUTION ... (62, 77)
Shape of FULL CONVOLUTION ... (70, 85)
Completed Successfully ...
Keep in mind that these shapes will change according to the filter size used.
The size of the padding is determined by half of the filter size. Usually,
filters are of odd no. × odd no. shape, so by half filter size, we mean (odd-
1)/2, as shown in the following equation:
Equation 4.17:
half filter size = (filter size-1)/2
For full convolution, the output shape will be more than the original image
by 2 x half filter size. For valid convolution, the output shape will be less
than the original image by 2 x half filter size. Also remember that in Python, we need to specify the same, valid, or full option, and zero padding will be taken care of internally accordingly. We do not have to manually zero pad the image.
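Code 4.2 itself is not reproduced here, but a minimal sketch of the three options using scipy.signal.convolve2d shows how the output shapes follow these rules (the image here is random; only its shape matters):

import numpy as np
from scipy.signal import convolve2d

I = np.random.rand(66, 81)      # an image of the size seen in Code 4.2's output
W = np.ones((5, 5)) / 25.0      # 5x5 filter, so half filter size = (5 - 1)/2 = 2

for mode in ('same', 'valid', 'full'):
    print(mode, convolve2d(I, W, mode=mode).shape)
# same  (66, 81)  -> matches the input shape
# valid (62, 77)  -> smaller by 2 x half filter size
# full  (70, 85)  -> larger by 2 x half filter size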
From this point onward, unless otherwise stated, we will use the SAME
convolution so that the shape of the output image matches the shape of the
input image.
Figure 4.15: Sufficiently zoomed-out Excel sheet for visualizing the circular filter with size m=111
Above is a snapshot of the Excel sheet. One may find it in the code folder:
Figure 4.17: Result of application of (a) box filter (b) equivalent circular filter on test image of Figure
4.16.
Parts (a) and (b) of Figure 4.17 show the result after convolution with the equivalent box and circular filters, respectively. Notice that in part (a), which is due to the box filter, apart from smoothening, there is a distortion of the circular shape (the final dark region appears to be square, and the effect of smoothening is not uniform — it is different in the horizontal and vertical directions versus other directions). However, in part (b), the distortion is absent — this is what is desired. This happens because a box filter is biased in the horizontal and vertical directions due to its very shape, whereas the circular filter is isotropic (i.e., it behaves the same irrespective of the direction). This is the reason why circular filters are preferred over box filters, and any other shape for that matter.
In the next section, we will discover that smoothening can be further
improved if the filter coefficients are wisely chosen.
From Equation 4.21, it can be clearly seen that there are two exponential
functions for the x and y directions, respectively (they are drawn in Figure
4.22). The advantage of this separability is that 2D convolution can be
replaced by two 1D convolutions, which reduces the computational
complexity of convolution to a great extent. Although the topic of
computational complexity of algorithms is out of the scope of this book, this
fact is worth noting about the Gaussian kernel. Not all kernels can be
separated in this way.
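A minimal sketch of this separability using SciPy (the image and σ are arbitrary) is shown below; the 2D Gaussian smoothing and the two successive 1D Gaussian smoothings agree to within floating point error:

import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_filter1d

I = np.random.rand(100, 120)
sigma = 3

# Direct 2D Gaussian smoothing
smoothed_2d = gaussian_filter(I, sigma)

# The same result from two 1D convolutions: first along the rows axis, then along the columns axis
smoothed_sep = gaussian_filter1d(gaussian_filter1d(I, sigma, axis=0), sigma, axis=1)

print(np.allclose(smoothed_2d, smoothed_sep))   # True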
In general, Gaussian functions are important because a lot of natural
phenomena follow Gaussian distribution. For example, if we plot the
histogram of marks obtained by students in a class, it will probably be
Gaussian distributed. The histogram of the height of all the people in your
area of living (colony/zone) is Gaussian distributed. Noise in a lot of systems
also follows Gaussian distribution.
Code 4.5 implements the inbuilt Gaussian filter for smoothening. The gaussian_filter function comes from the scipy.ndimage library. It takes the input image as the first argument and the standard deviation (sigma) of the Gaussian function as the second (see line number 15 in Code 4.5). There are more arguments for modifying other default settings; one can always use the help function on the Python shell.
01- #======================================================================
02- # PURPOSE : Learning use of Inbuilt Gaussian filter
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import scipy.ndimage as sci
06- import my_package.my_functions as mf # This is a user defined package and ...
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- input_image=cv2.imread('img1.bmp',0)
10- fig,ax=plt.subplots(1,2)
11- fig.show()
12- mf.my_imshow(input_image,'Input Grayscale Image',ax[0])
13- input_image=np.float32(input_image)
14-
15- filtered_image=sci.gaussian_filter(input_image,5)
16- mf.my_imshow(mf.norm_uint8(filtered_image),"Filtered image (INBUILT GAUSSIAN FILTER)",ax[1])
17-
18- plt.show()
19- print("Completed Successfully ...")
Code 4.5: Code for using inbuilt Gaussian filter
The output of Code 4.5 will be similar to the results shown in Figure 4.12,
Figure 4.17, and Figure 4.19. That is why it is not shown separately.
Figure 4.23: Illustration of difference between original image and its blurred version
In parts (a) and (b) of Figure 4.23, the original grayscale image and its blurred version (generated by Gaussian smoothening with σ=5) are shown, respectively. Part (c) shows the difference between the two. Note that most portions in part (c) correspond to edges in the original image. To understand this further, in part (d), we have plotted only those pixels from part (c) that have intensities greater than 10% of the maximum value in (c). It can be clearly understood that the difference between the original image and the blurred version contains only high-frequency components (i.e., edge information).
4.9.1 Unsharp masking and high boost filtering
If the information of the image in part (c) of Figure 4.23 is added back to the
original image, we will get a sharpened image, as shown in part (b) of
Figure 4.24. This process is called unsharp masking.
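A minimal sketch of unsharp masking along these lines is shown below (σ and the boost factor k are illustrative; k = 1 gives plain unsharp masking, while k > 1 gives high boost filtering):

import cv2
import numpy as np
from scipy.ndimage import gaussian_filter

I = np.float32(cv2.imread('img1.bmp', 0))       # grayscale input, as float

blurred = gaussian_filter(I, 5)                 # blurred version (sigma = 5, as in Figure 4.23)
mask = I - blurred                              # mostly edge (high-frequency) information

k = 1.0                                         # k = 1 -> unsharp masking, k > 1 -> high boost
sharpened = I + k * mask

# Clip back to the valid 8-bit range before display or saving
sharpened = np.clip(sharpened, 0, 255).astype(np.uint8)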
Note that the magnitude of the gradient tells you how strong the edge is at
that point (the higher the value, the stronger the edge). The direction of the
gradient tells us about the direction in which this edge is (in radians or
degrees). The direction vector is perpendicular to the direction of the edge at
that pixel. The result of the application of the Prewitt kernel is shown in
Figure 4.25. In part (a) of the figure, a test image is shown, which we will
use to understand the behavior of the Prewitt operator. Part (b) shows the
result of the application of the horizontal Prewitt kernel Px onto the image.
Now, we need to note a few things. Firstly, if we look at the edges of the rectangle, only the vertical edges are shown in the response in part (b). The reason is that, due to the placement of 0’s, 1’s, and -1’s in the horizontal kernel Px, it will only respond to changes in the horizontal direction in the original image. However, if one moves along a horizontal edge in the original image, there is no change in intensity; hence, the gradient along that direction is 0.
Secondly, the vertical edges found in part (b) are different in color. The first
one is bright; however, the second one is darker. This can also be understood
from the placement of 0’s, 1’s, and -1’s in the horizontal filter Px. Note that
while convolution is performed, filter Px will be flipped and we obtain flipped(Px) = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]. Try to visualize the process of convolution. For a
given pixel in the original image, we will put the center of this filter on that
pixel, find element by element product of overlapping elements, and then
add all those products. For a given Px, the response will be highest when the
pixel under processing has a black region (with all zeros) to its left and a
white region (with all 255’s) to its right. This is why the first vertical edge in
part (b) of Figure 4.25 is bright while the second vertical edge to the right
side is dark, as the response of the kernel on those points will be the least.
Try to correlate this with Figure 4.6, where we have studied the same thing
for a 1D derivative — a change from a low to a high value in signal results
in a positive peak in derivative, and for a high to low change in signal level,
a negative peak was observed in the derivative.
Similarly, other edges can be interpreted. Also, note that for the hypotenuse of the triangle, the response is non-zero in both part (b) and part (c) of the figure because it has changes in both the horizontal and vertical directions.
The same arguments apply to part (c) of Figure 4.25 but for kernel Py. In
part (d) of the image, we find the overall gradient, as illustrated earlier in
this section. Also, we are only interested in the magnitude of the response.
One can clearly note that in part (d), all the edges are correctly marked.
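Code 4.7, given shortly, computes only the gradient magnitude. As a small illustrative sketch, the gradient direction discussed above can be obtained with np.arctan2; the tiny synthetic test array below is an assumption made only to keep the snippet self-contained.

# Sketch: gradient magnitude and direction with the Prewitt kernels
import numpy as np
import scipy.signal as sci

Px = np.float32([[1, 0, -1], [1, 0, -1], [1, 0, -1]])   # horizontal Prewitt kernel
Py = np.float32([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])   # vertical Prewitt kernel

test = np.zeros((7, 7), dtype=np.float32)               # dark left half, bright right half
test[:, 4:] = 255.0                                     # one vertical edge

G_x = sci.convolve(test, Px, 'same')                    # x-derivative
G_y = sci.convolve(test, Py, 'same')                    # y-derivative
G_mag = np.sqrt(G_x**2 + G_y**2)                        # gradient magnitude
G_dir = np.degrees(np.arctan2(G_y, G_x))                # gradient direction in degrees

print(G_mag[3, 3], G_dir[3, 3])                         # strong response next to the vertical edge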
Figure 4.25: Finding Image Gradient (magnitude only) by using Prewitt filter
The code for generating the response of the Prewitt filter (and also of the Sobel and Roberts filters, which are discussed in the next section) is given in Code 4.7. To run it for a particular filter, select the desired kernels on lines 27 and 28 and update the title string on line 48 accordingly; as printed, the listing selects the Roberts kernels.
01- #==============================================================================
02- # PURPOSE : Illustration of Prewitt, Sobel & Roberts kernel on Images
03- #==============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import scipy.signal as sci
08- import my_package.my_functions as mf # This is a user defined package and ...
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- input_image=np.float32(cv2.imread('img13.bmp',0))
12- #------------------------------------------------------------------------------
13- # CREATING PREWITT, SOBEL or ROBERTS KERNELS
14- #------------------------------------------------------------------------------
15- my_filter1_x_Prewitt=np.float32([[1,0,-1],[1,0,-1],[1,0,-1]]) # Prewitt x-direction
16- my_filter1_y_Prewitt=np.float32([[1,1,1],[0,0,0],[-1,-1,-1]]) # Prewitt y-direction
17-
18- my_filter1_x_Sobel=np.float32([[1,0,-1],[2,0,-2],[1,0,-1]]) # Sobel x-direction
19- my_filter1_y_Sobel=np.float32([[1,2,1],[0,0,0],[-1,-2,-1]]) # Sobel y-direction
20-
21- my_filter1_x_Roberts=np.float32([[0,-1],[1,0]]) # Roberts x-direction
22- my_filter1_y_Roberts=np.float32([[-1,0],[0,1]]) # Roberts y-direction
23-
24- #------------------------------------------------------------------------------
25- # SELECTING PREWITT or SOBEL or ROBERTS KERNELS HERE
26- #------------------------------------------------------------------------------
27- filter1_x=my_filter1_x_Roberts
28- filter1_y=my_filter1_y_Roberts
29-
30- #------------------------------------------------------------------------------
31- # 2D CONVOLUTION
32- #------------------------------------------------------------------------------
33- G_mag_x=sci.convolve(input_image,filter1_x,'same') # x-derivative
34- G_mag_y=sci.convolve(input_image,filter1_y,'same') # y-derivative
35- G_mag=(G_mag_x**2+G_mag_y**2)**(1/2) # Gradient Magnitude
36-
37- #------------------------------------------------------------------------------
38- # PLOTTING
39- #------------------------------------------------------------------------------
40- fig1,ax1=plt.subplots(2,2)
41- fig1.show()
42- mf.my_imshow(mf.norm_uint8(input_image),"(a) Input Image",ax1[0,0])
43- mf.my_imshow(mf.norm_uint8(G_mag_x),"(b) x-derivative",ax1[0,1])
44- mf.my_imshow(mf.norm_uint8(G_mag_y),"(c) y-derivative",ax1[1,0])
45- mf.my_imshow(mf.norm_uint8(G_mag),"(d) Gradient Magnitude",ax1[1,1])
46-
47- # Way of adding super title to figure having subplots
48- fig1.suptitle("Roberts Kernel Results", fontsize=15)
49- # Change the string in above line according to selected kernel
50-
51- plt.show()
52- print("Completed Successfully ...")
Code 4.7: Image gradient by the Prewitt, Sobel, and Roberts methods
Having understood the Prewitt operator/kernel — which was derived from
two-point central difference formula, we may synthesize the following
popular kernels and use them in place of horizontal and vertical Prewitt
kernel with the gradient illustrated above. There are many such kernels; we
discuss some other popular ones in the next two sections.
Figure 4.26: Finding Image Gradient (magnitude only) by using Sobel filter
Note that for this test image shown in part (a) of Figure 4.26, the results
seem to be similar to Prewitt kernel’s results — and that is expected.
Equations 4.28:
or
Equation 4.30:

∇²f(x, y) = f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4 f(x, y)
Keeping in mind the above equation, it is not difficult to see the Laplacian kernel structure as [[0, 1, 0], [1, -4, 1], [0, 1, 0]]. Since this filter is symmetrical in both x and y directions, there is no need to have two separate filters in the x and y directions.
Let us now see the response of this filter over our test image in Figure 4.30:
Figure 4.30: Application of Laplacian kernel to test image
An interesting point to note here is the result in part (b) of Figure 4.30: unlike the Prewitt or Sobel operators, where we had either a bright or a dark response for the left and right vertical edges of the rectangle, respectively, here we have both a bright and a dark response at each edge. You may compare this with
Figure 4.7 where the same second derivative is applied to a 1D signal.
There, we pointed out that we would have zero crossing at the edge, and
around that zero crossing, we would have two peaks in opposite directions.
The sign of the first peak will be decided by whether the change in signal is from low to high (positive first peak) or high to low (negative first peak). The other
peak after the zero crossing will have an opposite sign. The same
phenomenon is observed in 2D here. Instead of having a single edge
response, we have a double edge response for the Laplacian kernel.
Because of the double edge problem, Laplacian is not used for sharpening
but rather used for edge detection because, at the edge, we have a zero
crossing. Again, one might argue that in the case of the first derivatives, we
had peaks (maxima or minima) at those locations, so our purpose is served
there, but the argument is that maxima or minima could be local or global –
so they are relative. However, zero crossing is a zero crossing whether the
edge is strong or weaker in contrast. Secondly, it is easy to design a zero-
crossing detector as compared to a peak detector. This justifies the usage of
the Laplacian kernel.
Like the kernel used above for the second derivative, kernels are also
derived from other second order equations. Some of them are:
. The latter two are primarily used as point detectors;
we recommend trying them out.
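As a rough sketch of what was said above about zero crossings, the Laplacian kernel with the −4 centre can be applied with the same convolution routine used in Code 4.7, and a simple sign-change test (only one of several possible ways, used here for illustration) can then mark the zero-crossing pixels. The image name is assumed to be the test image of Code 4.7.

# Sketch: Laplacian response and a simple zero-crossing test
import cv2
import numpy as np
import scipy.signal as sci

img = np.float32(cv2.imread('img13.bmp', 0))               # test image of Code 4.7 (assumption)
lap_kernel = np.float32([[0, 1, 0], [1, -4, 1], [0, 1, 0]])

lap = sci.convolve(img, lap_kernel, 'same')                 # double-edge Laplacian response

# mark pixels where the response changes sign between horizontal or vertical neighbours
zero_cross = np.zeros(lap.shape, dtype=np.uint8)
zero_cross[:, :-1][np.signbit(lap[:, :-1]) != np.signbit(lap[:, 1:])] = 255
zero_cross[:-1, :][np.signbit(lap[:-1, :]) != np.signbit(lap[1:, :])] = 255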
Figure 4.32: Similarity measurement using correlation and equivalent process for convolution
From here, we learned that if one desires to find the location of a match between two signals, one should use correlation and not convolution. Mathematically, both operations are the same except for the flipping of the kernel. However, if the filter/signal used is symmetric in nature, then convolution and correlation become identical, since flipping a symmetric filter and not flipping it give the same filter.
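A short one-dimensional sketch of this point is given below: with a non-symmetric template, correlation peaks where the pattern actually sits inside the signal, while convolution, which flips the template first, peaks at a shifted position. The numbers used are arbitrary.

# Sketch: locating a pattern with correlation vs. convolution
import numpy as np
import scipy.signal as sci

template = np.float32([1, 2, 3])                 # non-symmetric pattern
signal = np.zeros(10, dtype=np.float32)
signal[5:8] = template                           # embed the pattern at indices 5..7

corr = sci.correlate(signal, template, 'same')   # similarity measurement
conv = sci.convolve(signal, template, 'same')    # same maths, but the template is flipped

print(np.argmax(corr))                           # 6: the centre of the embedded pattern
print(np.argmax(conv))                           # a shifted index, not the match location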
Conclusion
Spatial domain filtering is an operation that relies on the mathematical operation of convolution, which is carried out using various kernels. Kernels for operations such as sharpening and smoothening were presented in this chapter. Apart from the dominant role played by the filter coefficients, the shape of the filter (i.e., the number of rows and columns in rectangular filters or the radius in circular filters) also plays an important role in deciding the results. The chapter concluded by pointing out the differences between convolution and correlation, as the two are mathematically similar but can yield very different results.
Points to remember
• For applying the operation of convolution, the system should be linear
and time invariant.
• The response of a derivative filter shoots up at those points in the signal
where there is a significant difference between consecutive samples.
• Circular shaped filters do not have propagation artifacts. A box filter can
be made circular by selecting the weights (coefficients) of the filter
properly.
• The kernel of a two-dimensional Gaussian filter is separable into its x
and y components. Not all filters have this property of separability.
• From the application point of view, the only difference between convolution
and correlation is the flipping of the filter before it is applied. Both
operations are the same in the case of a symmetric filter.
Exercises
1. Explain the difference between the working of first and second order
derivative filters.
2. How is the following filter expected to behave dominantly? Averaging
or sharpening?
5 4 4 3 3
4 5 5 4 4
3 4 5 6 5
3 4 5 3 4
2 3 3 4 4
CHAPTER 5
Frequency Domain Image
Processing
5.1 Introduction
In this chapter, we will be introduced to the concept of frequency. To be able
to have different perspectives about a given mathematical function, we have
domains in which we can see that signal. These domains offer different
perspectives about that same signal. The most common domain is the time
domain (or independent variable vs. value in general) where we see the
values of signal with respect to time, as a graph. For images, we do not have
time domain, we have space domain as the independent variable for image
are 2D pixel coordinates in space. So, time/space domain is one such domain
which is naturally known to us. In general, for a mathematical signal, values
of signal plotted with respect to its independent variable is one domain. Can
the same signal be seen in some other domain too? — this is the question we
are trying to answer in this chapter. In this chapter, we will explore the idea
of transforming the signals from one domain to another to obtain more
information from the signal.
The new domain that we are going to explore in this chapter is called
frequency domain. We are going to study frequency domain for digital
signals in detail.
Structure
This chapter discusses the following topics:
• One dimensional analog signal in time and frequency domain
• One dimensional discrete time signal in time and frequency domain
• Two – dimensional Fourier transform
• Filtering of images in frequency domain
Objectives
After reading this chapter, the reader will be able to understand what
frequency domain is and how to bring a signal in one, two, or multiple
dimensions to frequency domain. This chapter presents the theory of
transformations from scratch, so no background is assumed. The reader will
also be able to appreciate the physical interpretation of frequency, especially
in the case of images. Based on the understanding so developed, the reader
will be able to perform frequency domain filtering using various filters
available in the literature.
To bring the signal back to the time domain from the frequency domain,
refer to the following equation:
Equation 5.2:

x(t) = Σk ck ej2πkF0t  (sum over k = 0, ±1, ±2, …)
Above is also called continuous time Fourier series (CTFS). Let us take a
moment to understand what these equations mean. x(t) is the periodic signal
with a period Tp, which we wish to transform into the frequency domain, with fundamental frequency F0 = 1/Tp.
Let us first talk about Equation 5.2. Recall that in section 5.2.1, we said that
any signal of practical interest can be represented as a summation of
weighted sinusoids (formally weighted linear combination of sinusoids).
This is also true with exponential signals with imaginary exponent as well.
Any signal of practical interest can be represented as a weighted linear
combination of exponentials with an imaginary exponent. This is what
Equation 5.2 represents.
Like sinusoids and exponentials, there are other functions too which can do
the same, but we use sinusoids when we are dealing with real signals and
exponentials when we deal with complex input signals. In fact,
ejθ=cos(θ)+jsin(θ). So, representing x(t) by a weighted linear combination of
sinusoidal or exponentials is equivalent. However, we prefer to use
exponentials as we assume that some of our input signals will be complex in
nature. These sinusoidal signals or exponentials are called fundamental basis
functions of Fourier series/transform.
We now try to understand why other fundamental bases are not used. The answer comes from signal theory. Sinusoids and exponentials are well behaved with linear time invariant (LTI) systems. To elaborate, if we apply a signal of a given frequency at the input of an LTI system, we get a signal of the same frequency at the output, but with a different amplitude and phase as dictated by the LTI system. This property is useful in signal processing. Here is how it helps: any signal of practical interest can be thought of as a weighted linear combination of exponentials, so if this signal is passed through an LTI system, the same linear combination appears at the output, with changes possibly in the weights and phases only. This greatly simplifies calculations in the field of signal processing.
For our case, coming back to Equation 5.2, ej2πkF0t is the fundamental basis
function. It is a family of exponentials for k=0,±1,±2,… Finally, ck is the
complex weight associated with the kth exponential. ck’s are calculated using
Equation 5.1. A plot of kF0 vs. ck is the frequency domain representation that
we were talking about in the previous section. Since we have changed the
fundamental basis function from sinusoid to exponential, it is necessary that
we see the plot of a few signals in the time and frequency domain to have a
better grasp. We will do that once we discuss Fourier transform in the next
section.
To bring the signal back from frequency to time, we will use the following
equation:
Equation 5.4:
Figure 5.7: Discrete time periodic cosine signal with different frequencies
We state this here without proof that when the discrete frequency f can be
written in the form of p/q such that p and q are relatively prime, i.e., the
fraction p/q cannot be further simplified, then the signal is periodic with a
fundamental period as q samples. Also, the unit of frequency in a discrete
world is not cycles/seconds but cycles/samples. This is because we do not
have time in a discrete world but the index of time. So, things are taken on a
per-sample basis instead of a per-second basis. Keeping this in mind, let us
now try to interpret parts (a), (b), and (c) of Figure 5.7, in which discrete
cosine signal with frequencies 1/10, 2/10, and 3/10 are shown respectively.
Their corresponding fundamental periods are 10 samples, 5 samples, and 10
samples. This is because 1/10 and 3/10 cannot be further simplified.
However, 2/10 is 1/5, and hence, for that signal, the fundamental period is 5
samples only. One may check that the signals repeat after 10, 5, and 10
samples, respectively.
Above is also called the DTFS. Here also, the summation range is finite, meaning any discrete sequence periodic with period N can be represented as a weighted linear combination of N exponential signals (fundamental basis). In the continuous time domain, there could be infinitely many components, but here only N suffice. The fundamental frequency of the exponential fundamental basis is 1/N cycles/sample.
In the above equation, we have used w in place of f simply because this form
is popular. The relation between w and f is w=2πf and, hence, dw=2πdf.
The inverse transform is given in the following equation:
Equation 5.10:

x(n) = (1/2π) ∫ X(w) ejwn dw  (integral over any interval of length 2π)

This transform pair is also called the DTFT. One of the problems with DTFT is that its
frequency domain (i.e., frequency axis) is continuous, as observed in
Equation 5.10. It is problematic because computers cannot store infinite
values (any two points on a continuous frequency axis will have infinite
points between them). This is why DTFT is not suitable for use with
computers. Hence, the frequency axis is so discretized that no information is
lost, and we come to something called DFT, which is the topic of the next
sub-section. This discretization is such that we have the same number of
samples in the time domain and frequency domain (usually). However, in
general, the samples in the time and frequency domain (on x-scale) might be
different too.
With k=0,1,2,3,4 … (N-1). In the discrete time domain, for DFT, we have
discrete time axis indexed by time instants n and similarly, discrete
frequency axis indexed by k. Both the time axis (in the time domain) and
frequency axis (in the frequency domain) are discrete. As we know that t=nT
holds for sampling a continuous time signal to discrete time signal through
uniform sampling, n=5 will not mean that the time is 5. It simply means
time t=5T, where T is the spacing between two samples in the time domain.
The frequency axis (in the frequency domain), as discussed earlier, is periodic with period 1 if f is used as the frequency variable. However, since we are working
with w (when we studied DTFT in the previous section), the period will be
2π (as w=2πf). Any period of length 2π will do. So, we take it from 0 to 2π.
In DFT, this period is divided into N equal parts. These parts in 1 period are
indexed by k. DFT can be regarded as a sampled version (in the frequency
domain) of DTFT.
Keeping in mind the above discussion and Equation 5.11, and the fact that in
earlier sections, we decided to use ej2πfn as the fundamental basis, here for DFT, f = k/N (or equivalently w = 2πk/N). Hence, the fundamental basis in the RHS of
the Equations 5.11 and 5.12 can be understood. There are exactly N points in
the signal in the time domain and N points in the frequency domain for DFT.
To bring a signal back from frequency to the time domain, we use the
following equation [called inverse discrete Fourier transform (IDFT)]:
Equation 5.12:

x(n) = (1/N) Σk X(k) ej2πkn/N, for n = 0, 1, 2, …, (N−1), where the sum runs over k = 0, 1, 2, …, (N−1)
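As a quick numerical check of the DFT/IDFT pair (a small sketch using scipy.fft, the module used in this chapter's listings), N samples in the time domain give N samples in the frequency domain, and the inverse transform recovers the original sequence.

# Sketch: DFT/IDFT round trip with scipy.fft
import numpy as np
import scipy.fft as sfft

N = 16
n = np.arange(N)
x = np.cos(2 * np.pi * (2 / N) * n)        # discrete cosine with f = k/N = 2/16

X = sfft.fft(x)                            # DFT: N complex samples, indexed by k
x_back = sfft.ifft(X)                      # IDFT brings the signal back

print(np.allclose(x, x_back.real))         # True (up to numerical precision)
print(np.argsort(np.abs(X))[-2:])          # dominant bins at k = 2 and k = N-2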
Let us take a look at the second case where we have fixed the frequency but
rotate the image by some degrees in time domain. Refer to Figure 5.13 for
understanding such a case. The frequency f (and not fx) is fixed as 0.05. The
difference between f and fx can be understood from the fact that f = fx when the image is not rotated at all [as it was shown in part (a) of Figure 5.12]. However, in parts (a), (b), and (c) of Figure 5.13, the image has been rotated by 30°, 60°, and 135° with respect to the x-axis. In general, fx and fy have
non-zero values but their resultant is f=0.05, which is oriented in the
direction specified in parts (a), (b) and (c). The equivalent change in
frequency domain can be noted as a rotation of the two dots by the same angle (they were earlier on the horizontal line for rotation angle = 0°). Let us now investigate what rotation means in general. To understand this, let
us talk about parts (a) and (b) of Figure 5.13. The remaining parts of the
figure will follow. In part (a), the sinusoidal pattern makes an angle of 30°
with the horizontal. Due to this, if you note in frequency domain, the
distance between the projections of 2 dots on fx axis is reduced. However, as
compared to the zero rotation case (where fy=0), here fy≠0. It has some finite
value, but the resultant f is still 0.05 in the direction of 30°. This can be
correlated with time domain too. Now, the variation along y axis is not
purely 0 if one travels along any fixed vertical line. The variation along any
horizontal line is reduced as compared to the zero rotation case because of
the tilt in the pattern.
Figure 5.13: Changing rotation in space domain vs. its effect in frequency domain
One thing becomes clear here – in Figure 5.13 parts (a), (c) and (e) or
equivalently (b), (d) and (f) have the same resultant frequency of f=0.05. If
we trace the locus of all such points in frequency domain that have same
resultant frequency f, we get the result shown in Figure 5.14. Note that f=0.3
is chosen so that the ellipse in frequency domain has big enough size for
good visibility purposes.
Figure 5.14: Concentric circles image and space frequency representation
Let us take a moment to grasp what just happened. Part (a) of Figure 5.14
was created by adding all images with resultant f=0.3 for all possible
rotations from 0° to 180° (this figure can be generated using Code 5.6, but note that the code also incorporates the modification for an equal number of samples on both frequency axes, which is discussed in the coming paragraphs). The
equivalent frequency domain representation has an ellipse which is the locus
of all points having resultant f=0.3. In the time domain, you may now note
that in any direction (if you see from center), there is a sinusoidal variation
with resultant f=0.3. Let us see what happens to this ellipse when we lower f
to 0.1. See Figure 5.15. The lowering of frequency is reflected in the low
frequency 2D sinusoid in time domain, and the size of ellipse has become
smaller in frequency domain. Note that the time domain wave is not
perfectly circular because it has been constructed by adding rotated versions
which had some plain background apart from the rotated sinusoidal pattern
too (as shown in Figure 5.13):
Figure 5.15: Concentric circles image and space frequency representation
The crux is that in the frequency domain, an ellipse centered around origin
represents a single resultant frequency. If any portion of ellipse is missing, it
means that equivalently in time domain, that variation is absent in those
directions where the ellipse is missing.
Now, let us address one issue here. If you look at the frequency domain
representation in any of the figures from Figure 5.10 to Figure 5.15, the
frequency samples on the x and y axis (i.e., values of fx and fy) are not equal
in number although they have the same range -1/2 to 1/2. This seems unjust.
Due to this reason, in part (b) of Figure 5.15, we get an ellipse, otherwise, if
there are equal number of samples on both axis (i.e., equal number of rows
and columns on the frequency axes), we will get a circle. It is the nature of the FFT algorithm that the output has the same size as the input image (array). Hence, to have an equal number of samples on both frequency axes in the frequency domain, we need to make the number of rows and columns equal in the space (or time) domain. Therefore, we do zero padding to
columns if the total number of columns is less than the number of rows, or vice versa. This is done in Code 5.6, and the corresponding result is shown in Figure 5.16. Recall that we discussed the issue of frequency axis sampling for obtaining the DFT (FFT) from the DTFT at the end of SubSection 5.3.2.2. Note lines 39 to 44 in Code 5.6, where the syntax for specifying the number of samples on both frequency axes is shown. As we will note later, this will greatly simplify our 2D filter design, as filters with circular cutoff frequencies are relatively easy to design compared to the corresponding elliptical filters. Due to this, we will follow the convention of axes with an equal number of samples in the frequency domain unless otherwise stated.
01- #======================================================================
02- # PURPOSE : Having equal no. of samples on both axis in freq. domain
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- from scipy import ndimage
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- #---------------------------------------------------------------------------
12- # Creating a test image in spatial domain and displaying
13- #---------------------------------------------------------------------------
14- r=70
15- c=100
16- input_image=np.float32(np.zeros((r,c)))
17- input_image2=np.float32(np.zeros((r,c)))
18-
19- f=.3 # Set Discrete Frequency here
20- n=np.linspace(0,c-1,c)
21- one_row=np.sin(2*np.pi*f*n)
22- for i in range(0,r,1):
23- input_image[i,:]=one_row
24- rot_angle=30 # Set Rotation angle in degrees here
25-
26- for rot_angle in np.arange(0,180,1):
27- input_image2 = input_image2+ndimage.rotate(input_image,rot_angle,reshape=False)
28-
29- input_image =input_image2
30-
31- fig1,ax1=plt.subplots(1,2)
32- fig1.show()
33- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial Domain f=0.3',ax1[0])
34- ax1[0].axis('on')
35-
36- #---------------------------------------------------------------------------
37- # IMAGE IN FREQUENCY DOMAIN
38- #---------------------------------------------------------------------------
39- # No. of frequency samples on both axis in freq. domain
40- freq_points=np.max([r,c])
41- # NOTICE THE WAY THE FOLLOWING COMMAND IS USED
42- # [freq_points,freq_points] argument correspond to total samples
43- # on fx and fy axis respectively.
44- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
45- mag_image=np.abs(sfft.fftshift(fft_input_image))
46- mf.my_imshow(mf.norm_uint8(mag_image),"(b) Image in Frequency Domain",ax1[1])
47- ax1[1].axis('on')
48-
49- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
50- x_positions=np.linspace(0,freq_points-1,5);
51- x_labels=x_positions/np.max(x_positions)-0.5
52- ax1[1].set_xticks(x_positions, x_labels)
53-
54- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
55- y_positions=np.linspace(0,freq_points-1,5);
56- y_labels=y_positions/np.max(y_positions)-0.5
57- ax1[1].set_yticks(y_positions, y_labels)
58-
59- plt.show()
60- print("Completed Successfully ...")
Code 5.6: Setting equal number of samples on both frequency axis fx and fy
The output is shown below:
Figure 5.16: Figure 5.14 reproduced with equal number of samples on both frequency axis in part (b)
Figure 5.17: Ideal low pass filter design and filtering in frequency domain
To understand what an ideal filter is, see part (e) of the figure. It shows the
frequency domain representation (magnitude plot only) of the system. It is
such that it is 1 in the range -0.2 to 0.2, and zero everywhere else (in the
range -0.5 to 0.5, after which it is periodic). This value 0.2 is called cutoff
frequency, denoted by fc=0.2. If this filter were to be element by element
multiplied with the frequency response of the signal [which is shown in part
(d) of the figure], this will abruptly truncate the frequency response of the
system beyond the range [-0.2, 0.2]. This abrupt/instantaneous truncation gives it the name ideal filter. A lowpass filter allows only low frequencies (as decided by the cutoff frequency) to appear in the output.
Now, let us understand the complete situation depicted in Figure 5.17. Parts
(a) and (b) of the figure show two different sinusoids with frequencies 1/3
and 1/20, respectively. Also, note that their amplitudes are also different
(although it is not necessary). Part (c) is our input signal made by adding the
signals of part (a) and part (b) in the time domain. Part (d) shows the
magnitude spectrum of part (c). As already discussed, note the impulses
corresponding to the frequencies in the input signals with their magnitudes.
As our system, we are using an ideal lowpass filter whose magnitude response is shown in part (e) of the figure. On multiplying the system in part (e) element by element with the signal in part (d) (both in the frequency domain), the higher frequency component is chopped off (as it is greater than fc=0.2). The result is called the filtered signal. In part (f), the filtered signal is brought back to the time
domain and plotted. It matches the low frequency component as shown in
part (b) of the figure.
The code for this is given in Code 5.7. In line number 64, if instead of
np.zeros, np.ones is used, and in line number 68, if the RHS of the equation
is 0 instead of 1 as of now, we will get an ideal high pass filtering scenario.
Its results are depicted in Figure 5.18. By comparing this with Figure 5.17,
your understanding of low pass and high pass filters will be validated.
In the same way, a band pass filter (a filter that allows only a certain band of frequencies) and a band reject filter (a filter that disallows a certain band of frequencies) can be constructed as desired. In an ideal filter, the pass band (where frequencies are allowed) has a magnitude response of 1 and the stop band (where frequencies are disallowed) has a magnitude response of 0. There is a sharp transition from 0 to 1 or 1 to 0 in the frequency domain.
01- #======================================================================
02- # PURPOSE : Frequency domain IDEAL LOW PASS filtering in one-dimension
03- #======================================================================
04- import cv2, matplotlib.pyplot as plt, numpy as np
05- import scipy.fft as sfft
06- import my_package.my_functions as mf # This is a user defined package
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- #-----------------------------------------------------------------
10- # Constructing cosine signal (discrete time) with two frequency components
11- #-----------------------------------------------------------------
12- f1=1/3 # Discrete frequency (In cycles/sample for component 1)
13- f2=1/20 # Discrete frequency (In cycles/sample for component 2)
14- L=81 # Total no. of samples in the signal
15- n=np.arange(0,L,1) # Index of time
16- sig1=3*np.sin(2*np.pi*f1*n)
17- sig2=2*np.sin(2*np.pi*f2*n)
18- sig=sig1+sig2 # Discrete time signal with two frequencies
19-
20- #-----------------------------------------------------------------
21- # Plotting 1st component of the signal in time domain
22- #-----------------------------------------------------------------
23- fig1,ax1=plt.subplots(3,2)
24- fig1.show()
25- ax1[0,0].stem(n,np.real(sig1)) # Component 1 of the signal
26- ax1[0,0].grid()
27- ax1[0,0].set_title("(a) Component 1")
28- ax1[0,0].set_xlabel("n",fontsize=12)
29- ax1[0,0].set_ylabel("Amplitude",fontsize=12)
30-
31- #-----------------------------------------------------------------
32- # Plotting 2nd component of the signal in time domain
33- #-----------------------------------------------------------------
34- ax1[0,1].stem(n,np.real(sig2)) # Component 2 of the signal
35- ax1[0,1].grid()
36- ax1[0,1].set_title("(b) Component 2")
37- ax1[0,1].set_xlabel("n",fontsize=12)
38- ax1[0,1].set_ylabel("Amplitude",fontsize=12)
39-
40- #-----------------------------------------------------------------
41- # Plotting the signal in time domain (signal=component1+component2)
42- #-----------------------------------------------------------------
43- ax1[1,0].stem(n,np.real(sig))
44- ax1[1,0].grid()
45- ax1[1,0].set_title("(c) Input Signal = Component 1 + Component 2")
46- ax1[1,0].set_xlabel("n",fontsize=12)
47- ax1[1,0].set_ylabel("Amplitude",fontsize=12)
48-
49- #-----------------------------------------------------------------
50- # Transforming signal to frequency domain
51- #-----------------------------------------------------------------
52- freq_axis=np.linspace(-(L-1)/2,(L-1)/2,L)/L
53- fft_sig=sfft.fftshift(sfft.fft(sig/L))
54- ax1[1,1].stem(freq_axis,np.abs(fft_sig))
55- ax1[1,1].grid()
56- ax1[1,1].set_title("(d) Magnitude Plot (Signal)",color='k')
57- ax1[1,1].set_xlabel("f",fontsize=12)
58- ax1[1,1].set_ylabel("Magnitude",fontsize=12)
59-
60- #-----------------------------------------------------------------
61- # Lowpass filter design in frequency domain
62- #-----------------------------------------------------------------
63- # Initialisation with all zeros
64- freq_filter=np.zeros(np.size(freq_axis))
65- # SET CUTOFF FREQUENCY HERE
66- fc=.2
67- # LPF Design
68- freq_filter[np.asarray(np.where((freq_axis>-fc) & (freq_axis<fc)))]=1
69- ax1[2,0].stem(freq_axis,freq_filter)
70- ax1[2,0].grid()
71- ax1[2,0].set_title("(e) IDEAL LPF in frequency domain",color='k')
72- ax1[2,0].set_xlabel("f",fontsize=12)
73- ax1[2,0].set_ylabel("Magnitude",fontsize=12)
74-
75- #-----------------------------------------------------------------
76- # Transforming the filtered signal back to time domain
77- #-----------------------------------------------------------------
78- filtered_signal_time=L*fft_sig*freq_filter # L is normalisation factor
79- ifft_sig=sfft.ifft(sfft.ifftshift(filtered_signal_time))
80- ax1[2,1].stem(n,np.real(ifft_sig))
81- ax1[2,1].grid()
82- ax1[2,1].set_title("(f) Filtered Signal (Time domain)")
83- ax1[2,1].set_xlabel("n",fontsize=12)
84- ax1[2,1].set_ylabel("Amplitude",fontsize=12)
85-
86- plt.show()
87- print("Completed Successfully ...")
Code 5.7: Frequency domain low pass filtering illustration
The output of the code is shown in figure below:
Figure 5.18: Ideal high pass filter illustration
Figure 5.19: Conventions used for time to frequency and vice versa conversion
From this point onward, when we use the frequency domain, we mean the
normalized frequency domain, but we drop the word normalized — it will be
understood by default. Also, while displaying the image transformed back to the space domain, we will display it with the padded part removed. Further, the labels and annotations on the x and y axes in both domains will be dropped.
01- #======================================================================
02- # PURPOSE : Understanding padding while taking FFT-IFFT
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #---------------------------------------------------------------------------
11- # Importing and displaying image in space domain
12- #---------------------------------------------------------------------------
13- input_image=np.float32(cv2.imread('img1.bmp',0))
14- r,c=np.shape(input_image)
15- fig1,ax1=plt.subplots(2,2)
16- fig1.show()
17- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial
Domain',ax1[0,0])
18- ax1[0,0].axis('on')
19-
20- ax1[0,0].set_xlabel('x (or c) axis->')
21- ax1[0,0].set_ylabel('<- y (or r) axis')
22-
23- #---------------------------------------------------------------------------
24- # IMAGE IN FREQUENCY DOMAIN
25- #---------------------------------------------------------------------------
26- fft_input_image=sfft.fft2(input_image)
27- mag_image=np.abs(sfft.fftshift(fft_input_image))
28- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Frequency
Domain",ax1[0,1])
29- ax1[0,1].axis('on')
30-
31- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
32- x_positions=np.linspace(0,c,5);
33- x_labels=x_positions/np.max(x_positions)-0.5
34- ax1[0,1].set_xticks(x_positions, x_labels)
35-
36- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
37- y_positions=np.linspace(0,r-1,5);
38- y_labels=y_positions/np.max(y_positions)-0.5
39- ax1[0,1].set_yticks(y_positions, y_labels)
40-
41- ax1[0,1].set_xlabel('fx axis ->')
42- ax1[0,1].set_ylabel('fy axis ->')
43-
44- #---------------------------------------------------------------------------
45- # IMAGE IN FREQUENCY DOMAIN (With Equal samples for fx and
fy)
46- #---------------------------------------------------------------------------
47- # No. of frequency samples on both axis in freq. domain
48- freq_points=np.max([r,c])
49- # NOTICE THE WAY THE FOLLOWING COMMAND IS USED
50- # [freq_points,freq_points] argument correspond to total samples
51- # on fx and fy axis respectively.
52- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
53- mag_image=np.abs(sfft.fftshift(fft_input_image))
54- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(c)
NORMALISED Frequency Domain",ax1[1,0])
55- ax1[1,0].axis('on')
56-
57- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
58- x_positions=np.linspace(0,freq_points-1,5);
59- x_labels=x_positions/np.max(x_positions)-0.5
60- ax1[1,0].set_xticks(x_positions, x_labels)
61-
62- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
63- y_positions=np.linspace(0,freq_points-1,5);
64- y_labels=y_positions/np.max(y_positions)-0.5
65- ax1[1,0].set_yticks(y_positions, y_labels)
66-
67- ax1[1,0].set_xlabel('fx axis ->')
68- ax1[1,0].set_ylabel('fy axis ->')
69-
70- #---------------------------------------------------------------------------
71- # Image Transformed back in time domain
72- #---------------------------------------------------------------------------
73- image_back_in_time=sfft.ifft2(fft_input_image)
74- mf.my_imshow(mf.norm_uint8(image_back_in_time),'(d) PADDED
Spatial Domain',ax1[1,1])
75- ax1[1,1].axis('on')
76-
77- ax1[1,1].set_xlabel('x (or c) axis->')
78- ax1[1,1].set_ylabel('<- y (or r) axis')
79-
80- plt.show()
81- print("Completed Successfully ...")
Code 5.8: Conventions while transforming an image from space to frequency domain and vice versa
where fr is the radial frequency such that fr² = fx² + fy². Also, note that the
function H(fr) is magnitude response in the frequency domain. This becomes
a radial Gaussian function in the frequency domain with a maximum value
of 1. Apart from this we also need to define the cutoff frequency fc. But the
problem is that the Gaussian function does not have a sharp cutoff point.
One widely used convention is half power point. Let us understand half
power point.
In the frequency domain, we see the signal from the perspective of its
composition from basic frequencies. If a signal has one of its frequency components at fr with magnitude |H(fr)|, the power contributed by that component is |H(fr)|²; this is a result from signal processing theory which
we are stating here without proof. Since we are talking about monotonically
decreasing Gaussian low pass filter, as the frequency increases, the
magnitude decreases and hence the power. One may select the half power
point as the cutoff frequency of the filter. This means that when we create the Gaussian lowpass filter using Equation 5.17, we choose σ such that at fr = fc (fc being the cutoff frequency), the magnitude |H(fr)| corresponds to half power. The maximum power is at zero frequency and equals 1² = 1 (as the magnitude is 1 there). We now look for a frequency (i.e., a value of fr) such that |H(fr)|² becomes 1/2, i.e., |H(fr)| = 1/√2. We will call this fr the cutoff frequency and denote it by fc.
So, if we rewrite Equation 5.17 in the following form, we get:
Equation 5.18:

σc = sqrt( fc² / (2 ln √2) ) = fc / √(ln 2)

The preceding equation gives us the value of σ for which fr equals the cutoff frequency fc. We call that cutoff value of σ as σc. Note that fc is not an abrupt cutoff frequency as in an ideal filter. It is the frequency at which the magnitude becomes 1/√2, which corresponds to half the power of the maximum magnitude of the overall signal.
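The following small check (a sketch, using the same expression for σ as line 41 of Code 5.10) confirms that this choice of σ indeed places the half power point at fr = fc.

# Sketch: verifying the half power point of the Gaussian lowpass filter
import numpy as np

fc = 0.1
sigma_c = np.sqrt((fc**2) / (2 * np.log(np.sqrt(2))))   # same expression as in Code 5.10, line 41
H_at_fc = np.exp(-(fc**2) / (2 * sigma_c**2))           # Gaussian magnitude at fr = fc

print(H_at_fc, 1 / np.sqrt(2))                          # both ~ 0.7071
print(H_at_fc**2)                                       # ~ 0.5, i.e., half power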
Gaussian low pass filtering is illustrated in Figure 5.22. Compare it with
Figure 5.21 and notice differences in filter structure in part (c) of both
figures and most importantly, the absence of fringing in parts (e) and (f) of
the latter. Notice that smoothening is present though. This is the advantage
of using Gaussian filters in frequency domain.
Figure 5.22: Frequency domain low pass filtering of images with Gaussian filter having fc=0.1
cycles/sample
The code for generating results in Figure 5.22 is shown here. For the most
part, it is same as Code 5.9. The only difference is in Gaussian low pass
filter creation with a given cutoff frequency. We recommend playing with
the cutoff frequency and noticing the results.
01- #======================================================================
02- # PURPOSE : Frequency domain Gaussian Low pass filtering of images
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #---------------------------------------------------------------------------
11- # Importing and displaying image in space domain
12- #---------------------------------------------------------------------------
13- input_image=np.float32(cv2.imread('img9.bmp',0))
14- r,c=np.shape(input_image)
15- fig1,ax1=plt.subplots(2,3)
16- fig1.show()
17- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial
Domain',ax1[0,0])
18-
19- #---------------------------------------------------------------------------
20- # Image in normalised frequency domain (Magnitude Plot)
21- #---------------------------------------------------------------------------
22- freq_points=np.max([r,c])
23- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
24- mag_image=np.abs(sfft.fftshift(fft_input_image))
25- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Frequency
Domain (Magnitude)",ax1[0,1])
26-
27- #---------------------------------------------------------------------------
28- # Designing GAUSSIAN LOW PASS Filter
29- #---------------------------------------------------------------------------
30- # Initialising an image with shape equal to fft_input_image with all zeros
31- freq_domain_filter=np.zeros((freq_points,freq_points))
32- fc=.1 # SET CUTOFF FREQUENCY OF FILTER HERE
33-
34- # Creating 2D freq. grid for Gaussian filter generation in freq. domain
35- f_positions_norm=np.linspace(0,freq_points,freq_points)
36- fx=f_positions_norm/np.max(f_positions_norm)-0.5
37- fy=fx # Because we are taking circularly symmetry
38- fxx,fyy=np.meshgrid(fx,fy) # 2 arrays of fx & fy coordinates in 2D
39-
40- # Sigma of Gaussian dictated by cutoff frequency fc
41- sigma1=np.sqrt((fc**2)/(2*np.log(np.sqrt(2))))
42- # 2D Gaussian creation
43- sigma_x=sigma1
44- sigma_y=sigma1
45- Gauss_function=np.exp(-((fxx**2)/(2*(sigma_x**2))+(fyy**2)/(2*
(sigma_y**2))))
46- freq_domain_filter=Gauss_function
47- mf.my_imshow(mf.norm_uint8(freq_domain_filter),"(c) Frequency
Domain Filter",ax1[0,2])
48-
49- #---------------------------------------------------------------------------
50- # Filtering in frequency domain
51- #---------------------------------------------------------------------------
52- freq_filtered_image=sfft.fftshift(fft_input_image)*freq_domain_filter
53-
mf.my_imshow(mf.norm_uint8(np.log(1+np.abs(freq_filtered_image))),"(d)
Frequency Domain Filtering",ax1[1,0])
54-
55- #---------------------------------------------------------------------------
56- # Image Transformed back in time domain
57- #---------------------------------------------------------------------------
58- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
59- mf.my_imshow(mf.norm_uint8(image_back_in_time),'(e) PADDED
Spatial Domain',ax1[1,1])
60-
61- #---------------------------------------------------------------------------
62- # Image Transformed back in time domain (displayed without
padding)
63- #---------------------------------------------------------------------------
64- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
65- mf.my_imshow(mf.norm_uint8(image_back_in_time[0:r,0:c]),'(f)
Spatial Domain',ax1[1,2])
66-
67- plt.show()
68- print("Completed Successfully ...")
Code 5.10: Low pass filtering in frequency domain using Gaussian (monotonic) filter
At this point, it is also important to note that, conventionally, the decibel (dB) scale is used for specifying cutoff frequencies. A quantity X representing magnitude or amplitude becomes 20 log10X in decibels (an example relevant to our case is the magnitude response), whereas a quantity representing power becomes 10 log10X on the decibel scale. Note that when X = 1/√2, the value of X in dB is 20 log10(1/√2) ≈ -3. That is why the half power frequency is also called the -3 dB frequency or 3 dB decay frequency.
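A one-line check of this statement is shown below; the magnitude 1/√2, read on the amplitude (20 log10) scale, is about -3 dB, and the same value follows from the power (10 log10) convention.

# Sketch: the half power point is approximately -3 dB
import numpy as np

amplitude = 1 / np.sqrt(2)
print(20 * np.log10(amplitude))        # ~ -3.01 dB (amplitude uses 20*log10)
print(10 * np.log10(amplitude**2))     # same value when expressed via power (10*log10)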
Gaussian filter does what we need. There are many such filter structures that
are monotonic and will give us similar results. We are both satisfied and
dissatisfied with the results. We are satisfied because smoothing is achieved
without fringing. However, we are dissatisfied because the Gaussian filter's
magnitude response does not match the ideal filter's response. To see what
this means, look at Figure 5.23:
Figure 5.23: Magnitude response of a typical non-ideal lowpass filter in the frequency domain
It shows the two-dimensional magnitude response of a typical lowpass
frequency domain filter superimposed with a corresponding one-dimensional
equivalent. The one-dimensional equivalent is drawn by taking the profile on
the horizontal line in the center of the image. Also, note that the axis has two
different meanings corresponding to one and two-dimensional signals. The
current axis marks correspond to two-dimensional signals on both axes. For
a one-dimensional signal, the marks on the x-axis remain the same, but on
the y-axis, the markings should be read from 0 to 1.
Let us discuss the one-dimensional case first. Clearly, the filter structure is
not an ideal one. In an ideal lowpass filter, there are two regions – one where
the magnitude response is exactly 1 in low frequency region (called
passband) and the second where the magnitude response is exactly 0 in high
frequency region (called stop band). In the current structure, however, the
magnitude response remains one (or nearly one in some portion), and then it
decays in some finite region to nearly zero value. In the practical filter, there
is a third region called the transition region, where the response is neither
one nor zero, but it decays from one to zero.
Similar observations can be made about the two-dimensional filter structure.
Compare part (c) of Figure 5.21 with Figure 5.23. One can see that the ideal
filter in the frequency plane has a magnitude response of 1 in a circular
region. Outside this region, it is 0. However, in a practical filter, this
transition from 1 to 0 is gradual rather than abrupt, as noted in Figure 5.23.
Hence, passband, transition band, and stop band can be clearly imagined.
Refer to Figure 5.24 where these regions are marked. The boundaries of the
pass band, transition band, and stop band are determined through specific
criteria based on the desired performance of the filter. These criteria involve
factors like cutoff frequencies, filter order, and desired attenuation levels.
However, this topic is beyond the scope of this book, and readers interested
in a deeper understanding are encouraged to refer to textbooks on signal
processing.
Figure 5.24: Passband, transition band, and stop band in a practical lowpass filter
Having understood the above, let us now look at parts (a), (b), and (c) of
Figure 5.25. We will talk about parts (d), (e), and (f) shortly. Parts (a) and (b) are familiar; we have already worked with them in the previous sections: the ideal lowpass filter and the Gaussian filter. Also, note that they have the
same cutoff frequency. Despite this, their magnitude responses differ
significantly due to the presence of a wide transition band in the Gaussian
lowpass filter in part (b). In part (c), another filter called the Butterworth
filter, which we shall explore, is shown. It also has the same cutoff
frequency as the ideal lowpass and Gaussian filters of parts (a) and (b). The magnitude response of this filter is closer in appearance to that of the ideal filter than the Gaussian filter's response is. This has the narrow
transition band that we were looking for. A narrow transition band results in
lower undesired frequencies in the output, including those in the transition
and stopband, compared to a wider transition band. In the next section, we
will study the Butterworth filter (which is not a monotonic filter) in detail.
However, before that, remember – we need smoothness in magnitude
response to avoid fringing, and we need a narrow transition band to greatly
attenuate undesired frequencies – if not completely remove them. Both are
not achievable simultaneously. So, there will be a tradeoff between the two.
Note that the LHS of Equation 5.20, |H(fr)|² = 1/(1 + (fr/fc)^(2n)), is a magnitude squared response and not a magnitude response alone. fc, as usual, is the half power cutoff (-3 dB cutoff) frequency. This fact can be verified by putting fr = fc in Equation 5.20. This will yield |H(fr)|² = 1/2, which corresponds to half power, or in other words, |H(fr)| = 1/√2. n here controls the width of the transition band. It is called the order of the filter.
The higher the order, the closer we approximate the ideal filter (at the cost of
more fringing). The lower the order, the more the smoothening (at the cost of
a wide transition band and hence undesired frequencies in output).
To see how the Butterworth filter changes its transition width with respect to
the order of the filter, see parts (d), (e), and (f) of Figure 5.25. All of them
are Butterworth filters with a cutoff frequency fc=0.1 cycles/sample but
order n=1, 2, and 50 respectively. For n=50, the response closely matches
with ideal filter’s magnitude response as in part (a) of the figure. For n=1
[as in part (d)], the transition band is extremely wide.
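To see this transition-band behaviour in one dimension, a small sketch is given below; it evaluates the same Butterworth expression used by the Butter_LPF function of Code 5.11 for the three orders compared in Figure 5.25 (the plotting details are only illustrative).

# Sketch: 1D Butterworth magnitude response for different orders
import numpy as np
import matplotlib.pyplot as plt

fr = np.linspace(0, 0.5, 500)              # radial frequency axis (cycles/sample)
fc = 0.1                                   # -3 dB cutoff frequency

for n in [1, 2, 50]:                       # the orders compared in Figure 5.25
    H = np.sqrt(1 / (1 + (fr / fc) ** (2 * n)))
    plt.plot(fr, H, label=f'n = {n}')

plt.axvline(fc, linestyle='--')            # cutoff marker
plt.xlabel('fr (cycles/sample)')
plt.ylabel('|H(fr)|')
plt.legend()
plt.show()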
Let us now see an illustration comparing the results of the application of
ideal, Gaussian, and Butterworth low pass filter on the input image of part
(a) of Figure 5.22. Refer to Figure 5.26 for seeing the results. Note that all
the filters have the same cutoff frequency, as shown in parts (a), (b), and (c).
Their one-dimensional equivalents are also shown superimposed on them.
The results of the application of the ideal filter in part (d) show fringing. In
part (e), fringing is absent, but as one can note carefully, some high-
frequency components are present (sharp boundaries). This is because the
Gaussian filter’s magnitude response is non-zero in the transition band (over
a wide range of frequencies). This problem is removed in part (f), which
shows the result corresponding to the Butterworth filter as the transition
band is narrow.
Figure 5.25: Magnitude response of different filters with same cutoff frequency
However, due to closeness to the ideal magnitude response, some fringing
appears. Still, the result of the Butterworth filter is better than the ideal filter
(in terms of fringing) and Gaussian (in terms of suppressing the frequency
components after cutoff frequency).
Figure 5.26: Frequency domain low pass filtering with fc=0.1 cycles/sample on Figure 5.22 with
different filters
In Figure 5.27, we have shown the result of the application of Butterworth
lowpass filters with a cutoff frequency of 0.1 cycles/sample but with
different orders of 1, 5, and 10. From this figure, one can note that fringing
increases as we increase the order of the filter because the magnitude
response approaches ideal filter characteristics, and hence, the transition
from 1 to 0 becomes sharper, which leads to the phenomenon of fringing.
The second important point to note here is that smoothening is best when the
order is more because the passband completely passes all frequencies in it,
and the stopband almost completely stops the frequencies in it. At order n=1,
even after smoothening, the edge information can be seen.
There exists a tradeoff between fringing and frequency passing/stopping –
one must choose the Butterworth filter order accordingly.
Figure 5.27: Result of applying Butterworth filter with fc=0.1 cycles/sample on Figure 5.22 with
different orders
The code for testing ideal, Gaussian, and Butterworth filters is written in
Code 5.11. Note that this is a general code for creating low pass, high pass,
band stop, and band pass filters. Low pass filtering has already been
discussed. The remaining filters will be discussed in the coming sections.
The code is implemented such that the user can choose the ideal, Gaussian, or Butterworth variant for filter creation in each of the above categories.
So, this code is a general code that will be used in the next sections, and all
the figures in this and the coming sections are generated by modifying this
code only. The code is lengthy, but if you have followed it so far, it will not
be daunting.
001- #======================================================================
002- # PURPOSE : General Code for frequency domain filtering of images
003- # This code implements - Lowpass, Highpass, Band stop and Bandpass filtering
004- # This code will do the above for Ideal, Gaussian and Butterworth filters
005- #======================================================================
006- import cv2,matplotlib.pyplot as plt
007- import numpy as np
008- import scipy.fft as sfft
009- import sys
010- import my_package.my_functions as mf # This is a user defined
package
011- # one may find the details related to its contents and usage in section 2.7.3
012-
013- #---------------------------------------------------------------------------
014- # FUNCTION : Designing IDEAL LOW PASS Filter
015- #---------------------------------------------------------------------------
016- def Ideal_LPF(freq_points,fc):
017- # Initialising an image with shape equal to fft_input_image with all
zeros
018- freq_domain_filter=np.zeros((freq_points,freq_points))
019- # Creating the LPF in following loop
020- for i in np.arange(0,freq_points,1):
021- for j in np.arange(0,freq_points,1):
022- if np.sqrt((i-freq_points/2)**2+(j-freq_points/2)**2)
<fc*freq_points:
023- freq_domain_filter[i,j]=1
024- return(freq_domain_filter)
025-
026- #---------------------------------------------------------------------------
027- # FUNCTION : Designing GAUSSIAN LOW PASS Filter
028- #---------------------------------------------------------------------------
029- def Gauss_LPF(freq_points,fc):
030- freq_domain_filter=np.zeros((freq_points,freq_points))
031- # Creating 2D freq. grid for Gaussian filter generation in freq.
domain
032- f_positions_norm=np.linspace(0,freq_points,freq_points)
033- fx=f_positions_norm/np.max(f_positions_norm)-0.5
034- fy=fx # Because we are taking circularly symmetry
035- fxx,fyy=np.meshgrid(fx,fy) # 2 arrays of fx & fy coordinates in 2D
036- # Sigma of Gaussian dictated by cutoff frequency fc (half power
freq.)
037- sigma_c=np.sqrt((fc**2)/(2*np.log(np.sqrt(2))))
038- # 2D Gaussian creation
039- freq_domain_filter=np.exp(-((fxx**2+fyy**2)/(2*(sigma_c**2))))
040- return(freq_domain_filter)
041-
042- #---------------------------------------------------------------------------
043- # FUNCTION : Designing BUTTERWORTH LOW PASS Filter
044- #---------------------------------------------------------------------------
045- def Butter_LPF(freq_points,fc,n):
046- freq_domain_filter=np.zeros((freq_points,freq_points))
047- # Creating 2D freq. grid for Butterworth filter generation in freq.
domain
048- f_positions_norm=np.linspace(0,freq_points,freq_points)
049- fx=f_positions_norm/np.max(f_positions_norm)-0.5
050- fy=fx # Because we are taking circularly symmetry
051- fxx,fyy=np.meshgrid(fx,fy) # 2 arrays of fx & fy coordinates in 2D
052- # 2D Butterworth creation (fc is already -3db frequency)
053- freq_domain_filter=np.sqrt(1/(1+((np.sqrt(fxx**2+fyy**2))/(fc))**
(2*n)))
054- return(freq_domain_filter)
055-
056- #---------------------------------------------------------------------------
057- # Importing and displaying image in space domain
058- #---------------------------------------------------------------------------
059- input_image=np.float32(cv2.imread('img18.bmp',0))
060- r,c=np.shape(input_image)
061- fig1,ax1=plt.subplots(2,3)
062- fig1.show()
063- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial
Domain',ax1[0,0])
064-
065- #---------------------------------------------------------------------------
066- # Image in normalised frequency domain (Magnitude Plot)
067- #---------------------------------------------------------------------------
068- freq_points=np.max([r,c])
069- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
070- mag_image=np.abs(sfft.fftshift(fft_input_image))
071- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Frequency
Domain (Magnitude)",ax1[0,1])
072-
073- # Select the nature of filter (Low pass Highpass, Band Stop, Band Pass)
...
074- # ... and the type (Ideal, Gaussian, Butterworth) accoding to below ...
075- # the match case ladder below ...
076- filter_type=12
077-
078- fc1=.15 # First cutoff (the only cutoff for Lowpass or Highpass filter)
079- fc2=.25 # Second cutoff (for Band stop and band pass filter)
080-
081- n=10 # Order of butterworth filter (if used)
082-
083- #---------------------------------------------------------------------------
084- # Designing Required Filter
085- #---------------------------------------------------------------------------
086- match filter_type:
087- case 1:
088- Str='IDEAL Lowpass Filter'
089- freq_domain_filter=Ideal_LPF(freq_points,fc1)
090- case 2:
091- Str='Gaussian Lowpass Filter'
092- freq_domain_filter=Gauss_LPF(freq_points,fc1)
093- case 3:
094- Str='Butterworth Lowpass Filter'
095- freq_domain_filter=Butter_LPF(freq_points,fc1,n)
096- case 4:
097- Str='IDEAL Highpass Filter'
098- freq_domain_filter=1-Ideal_LPF(freq_points,fc1)
099- case 5:
100- Str='GAUSSIAN Highpass Filter'
101- freq_domain_filter=1-Gauss_LPF(freq_points,fc1)
102- case 6:
103- Str='Butterworth Highpass Filter'
104- freq_domain_filter=1-Butter_LPF(freq_points,fc1,n)
105- case 7:
106- Str='IDEAL Band Stop Filter'
107- LPF=Ideal_LPF(freq_points,fc1)
108- HPF=1-Ideal_LPF(freq_points,fc2)
109- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
110- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
111- case 8:
112- Str='GAUSSIAN Band Stop Filter'
113- LPF=Gauss_LPF(freq_points,fc1)
114- HPF=1-Gauss_LPF(freq_points,fc2)
115- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
116- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
117- case 9:
118- Str='Butterworth Band Stop Filter'
119- LPF=Butter_LPF(freq_points,fc1,n)
120- HPF=1-Butter_LPF(freq_points,fc2,n)
121- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
122- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
123- case 10:
124- Str='IDEAL Bandpass Filter'
125- LPF=Ideal_LPF(freq_points,fc1)
126- HPF=1-Ideal_LPF(freq_points,fc2)
127- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
128- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
129- freq_domain_filter=1-(HPF+LPF)
130- case 11:
131- Str='GAUSSIAN Bandpass Filter'
132- LPF=Gauss_LPF(freq_points,fc1)
133- HPF=1-Gauss_LPF(freq_points,fc2)
134- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
135- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
136- freq_domain_filter=1-(HPF+LPF)
137- case 12:
138- Str='Butterworth Bandpass Filter'
139- LPF=Butter_LPF(freq_points,fc1,n)
140- HPF=1-Butter_LPF(freq_points,fc2,n)
141- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
142- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
143- freq_domain_filter=1-(HPF+LPF)
144- case _:
145- print('select valid input')
146- sys.exit() # To exit the system
147-
148- #---------------------------------------------------------------------------
149- # Plotting the filter response
150- #---------------------------------------------------------------------------
151- mf.my_imshow(mf.norm_uint8(freq_domain_filter),"(c) Frequency Domain Filter",ax1[0,2])
152- ax1[0,2].plot(freq_points-c*freq_domain_filter[np.int16(freq_points/2),:])
153- ax1[0,2].axis('on')
154-
155- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
156- x_positions=np.linspace(0,freq_points-1,5);
157- x_labels=x_positions/np.max(x_positions)-0.5
158- ax1[0,2].set_xticks(x_positions, x_labels)
159-
160- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
161- y_positions=np.linspace(0,freq_points-1,5);
162- y_labels=y_positions/np.max(y_positions)-0.5
163- ax1[0,2].set_yticks(y_positions, y_labels)
164-
165- ax1[0,2].set_xlabel('fx & f axis ->')
166- ax1[0,2].set_ylabel('fy & amp [0 to 1] axis ->')
167-
168- fig1.suptitle(Str)
169-
170- #---------------------------------------------------------------------------
171- # Filtering in frequency domain
172- #---------------------------------------------------------------------------
173- freq_filtered_image=sfft.fftshift(fft_input_image)*freq_domain_filter
174- mf.my_imshow(mf.norm_uint8(np.log(1+np.abs(freq_filtered_image))),"(d) Frequency Domain Filtering",ax1[1,0])
175-
176- #---------------------------------------------------------------------------
177- # Image Transformed back in time domain
178- #---------------------------------------------------------------------------
179- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
180- mf.my_imshow(mf.norm_uint8(image_back_in_time),'(e) PADDED Spatial Domain',ax1[1,1])
181-
182- #---------------------------------------------------------------------------
183- # Image Transformed back in time domain (displayed without padding)
184- #---------------------------------------------------------------------------
185- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
186- mf.my_imshow(mf.norm_uint8(image_back_in_time[0:r,0:c]),'(f) Spatial Domain',ax1[1,2])
187-
188- plt.show()
189- print("Completed Successfully ...")
Code 5.11: General code for various filters
From line number 86 to 146, a match case ladder is shown. It is used to select one of the possible cases according to the match variable, which in our case is filter_type. Here, it is used to create the chosen filter type. The string Str assigned in each case lets us know which filter will be created. At the beginning of the program, there are three user defined functions for the creation of the low pass ideal filter, low pass Gaussian filter, and low pass Butterworth filter. Soon, we shall see that high pass, band pass, and band stop filters can be created from the lowpass filter prototypes. In line number 78, we select the cutoff frequency for low pass and high pass filters. In band pass and band stop filters, there will be two frequencies corresponding to the band edges. The second frequency is set in line number 79 (we will discuss this in great detail soon). Additionally, for Butterworth filters, we will have a filter order parameter as well, which we set in line number 81. For example, if we want to create a Butterworth filter of order 5 with a cutoff frequency of 0.1 cycles/sample, we may set fc1 in line number 78 equal to 0.1, n in line number 81 to 5, and filter_type in line number 76 to 3 to get the output shown in Figure 5.27.
Before concluding the discussion on lowpass filters, it is important to note
the phase response, which we have not addressed yet. Linear phase is
essential in image processing applications, and we will explain why later in
this book. However, all the filters that we have constructed above and the
ones that we are going to construct in the next few sections have linear
phases. In fact, one result from signal processing states that symmetric finite
impulse response filters have linear phase, which means that if our filters are
so designed that their existence in the space/time domain is bounded by
some lower and upper limit on space/time (i.e., finite number of samples in
our case) and if their magnitude response possesses symmetry about the
origin, then the phase response of the filter is guaranteed to be linear. This is
the case with all the filters that we have designed so far and those that we are
going to design in the coming sections. The result of applying the code is shown in the following figure:
Figure 5.28: Result of Code 5.11
Also, keep in mind that ideal, Gaussian, and Butterworth filters are not the
only filters available – there are many more, but they will give similar
results. The crux is that if we have understood how to deal with these,
exploring new filters by ourselves will be trivial.
Figure 5.29: Response of Butterworth high pass filter with cutoff frequency = 0.3 cycles/sample and
order 3
In Section 4.9.1, unsharp masking and high boost filtering were discussed.
The same strategy can be applied here by using the results developed in this
section instead of obtaining the high-frequency filtered content from the
process of spatial filtering by a suitable kernel.
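As an illustration of that idea, here is a minimal sketch (not one of the book's listings) of high boost filtering in the frequency domain. It assumes that the variables and the Gauss_LPF function from Code 5.11 are already in scope; the boost factor k and the cutoff 0.15 are arbitrary assumed values:

# Hedged sketch: high boost filtering using a frequency domain high pass filter.
# Assumes input_image, r, c, freq_points, fft_input_image and Gauss_LPF() from Code 5.11.
k = 1.5                                                    # boost factor (k > 1); assumed value
HPF = 1 - Gauss_LPF(freq_points, 0.15)                     # high pass filter from the lowpass prototype
high_freq = sfft.fftshift(fft_input_image) * HPF           # high frequency content of the image
boosted = sfft.fftshift(fft_input_image) + k * high_freq   # original spectrum plus k times the high frequencies
high_boost_image = np.real(sfft.ifft2(sfft.ifftshift(boosted)))[0:r, 0:c]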
Figure 5.30: Construction of a band stop filter from sum of lowpass and high pass filters
The same may be noted in part (b) of the figure. Apart from the central
bright spot (which corresponds to the average brightness level of the entire
image), there are two dominant spikes in the magnitude plot. In part (c) of the figure, a band stop filter to remove these spikes is constructed [fc1=0.15, fc2=0.25] with the filter order n=10. One may note that, because of this filtering, the output is practically free from the periodic noise initially present in the original image.
Figure 5.31: Band stop filtering for an input image with periodic noise
It is important to note that if the difference fc2-fc1 is very small, the band stop filter is given a special name: the notch filter. It is used when the portion of the frequency domain to be removed is localized in a very narrow region. The output in Figure 5.31 is generated by setting
filter_type=9 in Code 5.11 and setting the parameters, as shown in Figure
5.31. Note that in the match case ladder, for filter_type=9, after creating the
band stop filter, normalization is also done to bring the values of magnitude
in the range 0 to 1.
Figure 5.32: Butterworth band pass filtering on an image corrupted with periodic noise
At this stage, it is recommended to play with Code 5.11 and try all the possible combinations. In fact, while designing a band stop or band pass filter, one may even use a Butterworth lowpass filter together with a high pass filter constructed from a Gaussian or ideal low pass filter (a combination not present in Code 5.11). Any combination will do. The selection of a specific filter type depends on the usage.
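In equation form, the band construction used in the match case ladder can be summarised as follows (a sketch in terms of a lowpass prototype H_LP; the rescaling to the range [0, 1] corresponds to the np.min/np.max normalisation in Code 5.11):

$H_{BS}(f_x,f_y) \propto H_{LP,f_{c1}}(f_x,f_y) + \bigl(1 - H_{LP,f_{c2}}(f_x,f_y)\bigr), \quad f_{c1} < f_{c2}, \qquad H_{BP}(f_x,f_y) = 1 - H_{BS}(f_x,f_y)$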
Conclusion
The frequency domain presents alternate insights into the structure of a signal or system. In the spatial domain, one has direct control over the actual pixel values. In the frequency domain, one can control (filter out) structures such as sharp regions, smooth regions, etc. In this chapter, methods to transform a signal from one domain to the other were presented, together with an illustration of filtering operations equivalent to the spatial domain kernel based methods introduced in the previous chapter. In the next chapter, we will study non-linear filtering and concerns regarding phase during filtering.
Points to remember
• The Fourier family of transforms has many members like CTFS, CTFT, DTFS, DTFT, DFT, etc. One needs to understand which to use when, according to the input signal type.
• Analog frequency is measured in cycles/second, but discrete frequency is measured in cycles/sample.
• Low frequencies mean slow variations in the signal. In an image, a low frequency region may correspond to a clear sky or the wall of a building painted in a single color.
• High frequency regions contain sharp edges, corner points, and other fine structures.
• One analog frequency has infinitely many equivalent discrete
frequencies called aliases.
• When an image is brought into the frequency domain, one must take an
equal number of samples on the X and Y axis in the frequency domain
so that one frequency lies completely on a circle instead of an ellipse.
• The order of the Butterworth filter decides its closeness to the ideal filter
shape. The higher the order, the closer it is to the ideal shape.
• A lowpass filter prototype can be easily converted to a high pass, band
pass, or band stop filter by using simple equations.
Exercises
1. Take an indoor image (like an image of a study room, etc.). Apply a
high pass Butterworth filter to it and comment on the results so
obtained.
2. On the same image taken in question 1, try changing the filter order for
the Butterworth filter and compare the results obtained.
3. Sharpen an image by using a high pass filter created in the frequency
domain.
4. Explain the context of median filtering as a non-linear filter and discuss
its applications.
5. Take a picture in the dark and perform contrast and illumination
enhancement on it.
OceanofPDF.com
CHAPTER 6
Non-linear Image Processing and
the Issue of Phase
6.1 Introduction
Convolution, as elaborated in the previous chapters, is quite useful for
accomplishing linear filtering using space/time domain kernels (for filtering
in space/time domain) and frequency domain filters (for frequency domain
filtering). It is a linear operation. In this chapter, we will understand that
filtering can be non-linear too. We will also explore non-linear filtering in
this chapter and touch upon the issue of conversion of data from a
continuous to a discrete/digital world and its side effects called aliasing. We
could have done this in one of the previous chapters, but the context that we
have built by now will help us strengthen our understanding better.
Structure
This chapter discusses the following topics:
• Median filtering and salt and pepper noise removal
• Sampling theorem
• Homomorphic filtering
• Phase and images
• Selective filtering of images
Objectives
The primary objective of this chapter is to understand the non-linear
methods for image processing. These operations do not use the convolution
operation, as convolution is linear in nature. The reader will also understand
the sampling process used to convert the continuous scene to an image in
digital form. You will also understand the contribution of phase in image
processing as, until now, only the magnitude part of the spectrum is used for
filter design (in the linear case).
Figure 6.1: Gaussian filtering vs. median filtering for salt and pepper noise removal
One may note that the kind of noise visible in Figure 6.1 (a) is like spilling
ground salt and pepper on the image. Noisy pixels are either perfectly white
(salt) or black (pepper). A natural way to get rid of this kind of noise is to
smoothen the image. That is what we do in part (b) of Figure 6.1. We
filtered the image using a Gaussian kernel in the spatial domain. The noise is not removed; it is only smeared out (the image looks jittery). Also, the blurring due to the weighted averaging nature of the Gaussian filter makes us unhappy. We want the noise to be removed and do not want blurring to be
there. Ideally, we are expecting the results shown in Figure 6.1 (c); that is,
salt and pepper noise is removed, and there is no blurring. Tree branches and
the sky are now visible. That is what median filtering does as seen in the
following figure:
Figure 6.2: Illustration of noise removal by Gaussian filtering and median filtering
In order to understand how median filtering works, let us first take a one-
dimensional equivalent example, as shown in Figure 6.2. Let us talk about
the noise-corrupted signal in part (a) of the figure first. The signal shape is
such that its first half is one full period of sine wave followed by the second
part, which is a bipolar pulse (+1 and -1 values). The first half of the signal
(one wave of sinusoid) is corrupted by salt and pepper noise. The salt
component has a value of +3, and pepper has a value of -3. The signal shape
is chosen on purpose. From the first half of the signal, we will learn how
noise removal is done, and from the second half of the signal, we will learn
the smoothening caused by the filter/kernel (which is undesirable).
When we smoothen a signal by using a kernel in the spatial domain, we do so through the process of convolution. Recall from Section 4.5.1 that for the pixel under
processing, the processed value (the result of convolution for that sample) is
essentially the weighted average of its neighborhood. We also know that
averages are highly affected by extreme values in the total samples being
averaged. This justifies the shape of the first half of the signal in Figure 6.2
(b) where the Gaussian filtering is done. Since averages are affected by
extreme values, in the first half of the signal, we have a jittery (still noisy)
shape. The second part of the signal did not possess noise to begin with. Due
to weighted averaging caused by the Gaussian filter, notice that the edges of
the bipolar pulse in the second half of the signal are smoothened out which is
an undesirable effect.
In Figure 6.2 (c), we do median filtering. Instead of taking the weighted
averages around the neighborhood of the pixel under processing (linear
operation), we take the median (nonlinear operation). Median, as we know,
is the middle value in the set of ordered data. So, extreme values will not
affect its calculation. This is why, in Part (c) of the figure, the signal is
noise-free with no jittery behavior. For this reason, no additional
smoothening is introduced in the latter part of the signal; this is what we
desire.
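The following minimal sketch (not one of the book's listings; the sample values are hypothetical) reproduces this behaviour on a tiny one-dimensional signal:

import numpy as np
import scipy.ndimage as sci

# A small 1D signal with one salt (+3) and one pepper (-3) sample
sig = np.array([0.0, 0.5, 3.0, 0.8, 0.5, -3.0, 0.2, 0.0])
mean_filtered = sci.uniform_filter(sig, size=3)    # averaging: the extreme values leak into their neighbours
median_filtered = sci.median_filter(sig, size=3)   # median: the extreme values are simply discarded
print(mean_filtered)
print(median_filtered)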
The one-dimensional understanding that we have developed can be directly
generalized to two-dimensional signals (i.e., images) as shown in Figure 6.1.
The results can be easily interpreted now. The results in Figure 6.2 are
generated using Code 6.1. The code is straightforward to understand. Also,
note the inbuilt way of applying the Gaussian filter in the spatial domain in
Line 24:
01- #==========================================================================
02- # PURPOSE : Learning Denoising by MEDIAN Filter vs Gaussian Filter
03- # (Salt and Pepper Noise)
04- #==========================================================================
05- import cv2
06- import matplotlib.pyplot as plt
07- import numpy as np
08- import scipy.ndimage as sci
09- import my_package.my_functions as mf # This is a user defined package
10- # one may find the details related to its contents and usage in section 2.7.3
11-
12- #--------------------------------------------------------------------------
13- # Importing and displaying the image
14- #--------------------------------------------------------------------------
15- a=cv2.imread('img21.bmp',0)
16- fig,ax=plt.subplots(1,3)
17- fig.show()
18- mf.my_imshow(a,'(a) Input Grayscale Image',ax[0])
19- a=np.float32(a)
20-
21- #--------------------------------------------------------------------------
22- # Gaussian filtering (Linear) the image in space domain and displaying
23- #--------------------------------------------------------------------------
24- filtered_image=sci.gaussian_filter(a,1) # Second argument is sigma of Gaussian
25- mf.my_imshow(mf.norm_uint8(filtered_image),"(b) Gaussian Filtered Image",ax[1])
26-
27- #--------------------------------------------------------------------------
28- # Median filtering (Non-Linear) the image in space domain and displaying
29- #--------------------------------------------------------------------------
30- filtered_image2=sci.median_filter(a,3) # Second argument is neighborhood size
31- mf.my_imshow(mf.norm_uint8(filtered_image2),"(c) Median Filtered Image",ax[2])
32-
33- plt.show()
34- print("Completed Successfully ...")
Code 6.1: Salt and Pepper noise removal by Gaussian versus median filtering
What is the reason for the presence of salt and pepper noise in the first place? How can it be simulated in experiments? These questions will be answered in subsequent chapters of this book.
6.3.1 Aliasing
To understand how the pyramids could be seen as the Taj Mahal or vice
versa, refer to Figure 6.3. There are two continuous time signals in it. Both
are sine signals, one with frequency F1 Hz and the other with frequency F2 Hz. To bring these signals inside computers, or in general digital signal processors, they must be digitized (time discretized and amplitude discretized). Here, we will focus on time discretization. Let us take samples
of both signals with the same sampling frequency Fs=1 Hz. This means that
one sample will be taken every second for both signals. The sampled signals
for the corresponding continuous-time signals are also shown in the figure.
One may note that samples of both signals coincide! This is where our
problem begins. In the continuous time world, there are two different signals
(i.e., with two different frequencies), but in the discrete time world, they
become the same. That is where the pyramids become the Taj Mahal or vice
versa.
So, before looking for a solution to our problem, let us first try to find out
the frequency of the sampled signal formed in both cases. Recall that for a
continuous time world, we are using capital letters and for a discrete time
world we are using small letters to represent frequency.
Continuous world and discrete world frequencies F and f are related through
the sampling frequency Fs as per the following equation:
Equation 6.1: $f = \dfrac{F}{F_s}$
So, the discrete frequency of the sampled signal 1 is
cycles/sample. Similarly, for the second discrete time signal (which comes
from the second continuous time signal after sampling), discrete frequency is
cycles/sample. Let us check whether f1 = f2. Evidently not, because the two values differ. We now need to understand how the sampled signal can then be the same in both cases.
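To verify this numerically, here is a small sketch. The frequencies F1 = 0.25 Hz and F2 = 1.25 Hz are purely hypothetical values chosen so that F2 = F1 + Fs; they are not necessarily the frequencies used in the book's figure:

import numpy as np

Fs = 1.0                  # sampling frequency (Hz): one sample every second
n = np.arange(0, 10)      # sample indices
F1, F2 = 0.25, 1.25       # assumed frequencies; F2 = F1 + Fs, so they are aliases
s1 = np.sin(2*np.pi*F1*n/Fs)
s2 = np.sin(2*np.pi*F2*n/Fs)
print(np.allclose(s1, s2))   # True: both sampled signals coincide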
Refer to the following figure for a better understanding:
log(m(x,y))=log(i(x,y))+log(r(x,y))
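The equation above is the basis of homomorphic filtering: since illumination i(x,y) varies slowly (low frequency) and reflectance r(x,y) varies quickly (high frequency), taking the logarithm turns their product into a sum that can be filtered selectively. A minimal sketch of this idea is given below (not one of the book's listings; the image name and the parameters gamma_L, gamma_H, and fc are assumptions):

import numpy as np
import cv2
import scipy.fft as sfft

img = np.float64(cv2.imread('img2.bmp', 0)) + 1.0    # +1 avoids log(0); image name assumed
rows, cols = img.shape
F = sfft.fftshift(sfft.fft2(np.log(img)))            # spectrum of log(m) = log(i) + log(r)

# Emphasis filter: gain gamma_L (<1) for low frequencies (illumination),
# gamma_H (>1) for high frequencies (reflectance)
gamma_L, gamma_H, fc = 0.5, 1.5, 0.1                 # assumed parameters
fy, fx = np.meshgrid(np.linspace(-0.5, 0.5, rows), np.linspace(-0.5, 0.5, cols), indexing='ij')
H = gamma_L + (gamma_H - gamma_L)*(1 - np.exp(-(fx**2 + fy**2)/(2*fc**2)))

filtered = np.real(sfft.ifft2(sfft.ifftshift(F*H)))
result = np.exp(filtered) - 1.0                      # undo the log transform

Attenuating the low frequencies suppresses uneven illumination, while amplifying the high frequencies enhances contrast; this is what the next chapter exploits.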
Conclusion
This chapter introduces the concept of sampling, its side effect aliasing, and
the method of avoiding it. Through homomorphic filtering, one gets familiar
with illumination and contrast enhancement. It is also observed that most
structural information is present in the phase part of the total spectrum,
hence it plays an important role in images. Selective filtering is also
introduced to create interactive apps and programs.
In the next chapter, we will try to address the issue of noise and degradation in images. Although it is impossible to remove noise completely once it has been added to an image, it can be minimized. That is what we are going to explore.
Points to remember
• Median filtering is non-linear and is best suited for the removal of salt
and pepper noise.
• Due to improper sampling, the phenomenon of aliasing may occur, in which higher-frequency components overlap with lower-frequency components, causing visual inconsistencies in the spatial domain. To avoid this, the sampling rate should be selected as per the Nyquist criterion.
• Phase carries the most important structural information of the image.
• Sampling of continuous data to discrete time data is completely reversible if samples are taken at or above the Nyquist sampling rate. In that case, sampling is a lossless process.
• Quantization however is lossy.
Exercises
1. At the sampling frequency of , find 5 aliases of frequency and
corresponding discrete frequencies. Comment on the discrete
frequencies so obtained. [Hint: Section 6.3.1].
2. Explain the advantages of homomorphic filtering.
3. Suggest a way of hiding information in the images by using the fact
that most of the information in images is contained in the phase
spectrum instead of the magnitude spectrum.
4. Plot the frequency domain representation of a good quality image of a
fingerprint and comment on the shape of it. Do you find it similar to the
frequency domain plot of other images?
5. As noted in Figure 6.6, while taking projective transformation, aliasing
might occur. Suggest a method to avoid this situation.
OceanofPDF.com
CHAPTER 7
Noise and Image Restoration
7.1 Introduction
An image may become degraded because of various reasons. It may happen
due to the various stages at which it is stored and processed. These stages
could be capturing, transmitting, and storage. During capturing, we capture
the 3D world into a 2D picture by taking its projection. This projection is not
a perfect 3D to 2D projection because, between the object and the camera
(capturing sensor), there may be electromagnetic interference, fog, or any
other kind of disturbance due to which the intended object is not captured
perfectly. During transmission, the channel may be non-linear, which may
introduce phase distortion. Additionally, it may introduce noise. During
storage, to save space, compression may introduce degradation in the image.
Images that are hard printed onto posters, books, etc., may degrade because
of time. Reasons might include weather, wear-and-tear, etc. All of this leads
to changes in the appearance of images – some of which can be undone, and
the remaining cannot be. In this chapter, we will try to learn techniques that
are used to restore a noisy image to the extent possible.
At this point, we want to clarify the difference between enhancement and
restoration. You may enhance your selfie by applying various beautification
filters available in your mobile apps, but you will restore the scanned version
of an old picture of you that your parents took during your childhood. So,
enhancement is a subjective process, but restoration is objective—and this
chapter deals with the restoration of noisy images only.
We will only discuss methods of reversing the effect of noise and not
degradation. However, most of the time, noise and degradation are
simultaneously present in images. There exists a tradeoff between both
during restoration. Restoring a noisy image will mean blurring the image in
general, and restoring a degraded image will mean high-pass filtering
(sharpening) in some sense. In an image where both noise and degradation
are present, blurring and sharpening must be applied simultaneously to undo
both simultaneously.
Structure
This chapter discusses the following topics:
• Noise and degradation model
• Restoration in presence of noise only
• Detection of noise in images
• Measurement of noise in images using PSNR
• Classical methods of noise removal
• Adaptive filtering
Objectives
After reading this chapter, you will be able to identify the noise distribution
that corrupts a given image. The reader will also be able to quantify and
select a suitable method to minimize noise and, hence, restore it.
Here, x is the intensity value in the range 0-255 for a grayscale image, p(x) is
the probability for a given intensity x. Since p(x) is probability, its value lies
in the interval [0,1] and ∫p(x)dx=1 because it is the probability density
function. One may check this fact by running the following command on
Python shell after running Code 7.1 – np.sum(normalised_hist).
Conventionally, instead of using σ, σ² is used as the parameter of the Gaussian distribution. σ² is called the variance.
The primary causes of Gaussian noise in images arise at the acquisition stage, where low illumination, sensor temperature, and internal circuitry introduce this noise. The Gaussian distribution occurs very frequently in various aspects of nature.
where x is the intensity value in the range 0-255 for a grayscale image, p(x) is the probability for a given intensity x, and s is the scale parameter (or mode, i.e., the most frequently occurring value). Although the above equation appears like a Gaussian with 0 mean and without the √2π in the denominator (scaling factor), note that in the denominator of the scaling factor, s² is outside the square root sign, whereas in the Gaussian, σ² was inside the square root sign. This causes differences in the shapes of both curves. One may note
the shape of Rayleigh distribution for different values of s shown in Figure
7.4:
Figure 7.4: Rayleigh distribution for different values of scale parameter
s is called the scale parameter. The higher the value of scale, the lower the
peak and the greater the thickness of the distribution. Also, the peak shifts
rightwards for a higher scale parameter value. Note that, unlike a Gaussian
distribution, the curves are not symmetric about the vertical axis passing
through their peaks.
Figure 7.5: Rayleigh noise (mode=30) in images
Figure 7.5 shows an image with Rayleigh noise in part (d). This figure is
generated by using Code 7.1 by setting select_noise_distribution=2, due to
which np.random.rayleigh is called. One may play with the scale (mode)
parameter and note the difference in results.
where x is the intensity value in the range 0-255 for a grayscale image, p(x) is the probability for a given intensity x, k is the shape parameter, θ is the scale parameter, and Γ is the Gamma function. There are two parameters: shape (k) and scale (θ).
Figure 7.6 shows the effect of adding synthesized Gamma noise onto the
image:
where x is the intensity value in the range 0-255 for a grayscale image, p(x) is the probability for a given intensity x, and β is the scale parameter. Alternatively, β = 1/λ, where λ is called the rate parameter. Figure 7.7 depicts
the effect of adding exponential noise to the input image:
Figure 7.7: Exponential noise (scale=50) corrupted image
One may try to play with the scale parameter in Code 7.1, line number 36
(set select_noise_distribution=4 in line number 17 to use exponential
noise). Increasing the scale increases the spread.
where x is the intensity value in the range 0-255 for a grayscale image, p(x) is the probability for a given intensity x, and (a,b) is the interval over which the density remains constant; a is the lower limit, and b is the upper limit. Figure 7.8 shows an image corrupted by uniform noise.
Figure 7.8: Uniform noise (lower limit = 50 and higher limit =150) corrupted image
One may try to play with the low and high limit parameters in Code 7.1, line numbers 40 and 41 (set select_noise_distribution=5 in line number 17 to use uniform noise).
where x is the intensity value (in the range [0,255] for a grayscale image or in the range [0, 2^k - 1] for a k-bit image), p(x) is the probability for a given intensity x, Ps is the probability of a pixel being salt, and Pp is the probability of the pixel being pepper.
As explained earlier, salt and pepper noise is called so because it manifests as the brightest and darkest intensity spots in the image. An example is shown in Figure 7.9. As the original image is an 8-bit grayscale image, salt noise will correspond to an intensity of 2^8 - 1 = 255, and pepper noise will correspond to 0. Both salt and pepper are mixed in equal amounts. However,
one may have only salt or only pepper noise or a mixture of them in any
other proportion too.
Figure 7.9 shows one such example where, to the original image, we have
added salt and pepper noise (in equal proportion) such that 10% of the total image pixels are noise pixels. To reproduce these results, one may use Code 7.2. This code is similar to Code 7.1 in the sense that it will also help you add different noise types to a given image. However, in Code 7.1, we used the numpy library to add random noise; here in Code 7.2, we use the scikit-image library for the same purpose. You may install the scikit-image library using
the procedure illustrated in Section 2.3 by using the pip install scikit-image
command.
Figure 7.9: Salt and pepper noise addition to an 8-bit grayscale image
Also, note that the random_noise function in this package expects the image
to be normalized in the range of [0,1]. Further, it adds noise to the image and
returns the noise added image itself. In the earlier case (Code 7.1, which used the random number generators from the numpy library), the input was not expected to be in the range of [0,1] intensity values, nor was the noise added image returned – those functions return only the noise, which we must manually add to the image.
01- #==========================================================================
02- # PURPOSE : Adding noise of various distributions to the image [PART 2]
03- #==========================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- # (install the package below using 'pip install scikit-image'
06- # as per the procedure illustrated in section 2.3)
07- import scipy.ndimage as sp
08- from skimage.util import random_noise
09- import my_package.my_functions as mf # This is a user defined package and ...
10- # one may find the details related to its contents and usage in section 2.7.3
11-
12- #----------------------------------------------------------------------
13- # Importing and normalising the image to the range [0,1]
14- #----------------------------------------------------------------------
15- input_img=cv2.imread('img2.bmp',0)
16- img=np.float64(input_img)/255
17- # Pixel values above are normalised to range 0 to 1
18- # This is the requirement of 'random_noise' function imported above
19-
20- #----------------------------------------------------------------------
21- # Choosing noise to be added and adding noise
22- #----------------------------------------------------------------------
23- select_noise_distribution=1 # (CHOOSE NOISE DISTRIBUTION HERE)
24- # note that zero mean noise distribution is assumed wherever applicable
25- match(select_noise_distribution):
26- case 1:
27- method_name='Salt & Pepper'
28- noisy_img = random_noise(img, mode='s&p',amount=.1)
29- case 2:
30- method_name='Gaussian'
31- noisy_img = random_noise(img, mode='gaussian',mean=0,var=.05,clip=True)
32- case 3:
33- method_name='Poisson'
34- noisy_img = random_noise(mf.norm_uint8(255*img), mode='poisson',clip=True)
35- # The Poisson distribution is only defined for positive integers. To apply
36- # this noise type, the number of unique values in the image is found and
37- # the next round power of two is used to scale up the floating-point result,
38- # after which it is scaled back down to the floating-point image range.
39- case 4:
40- method_name='Speckle'
41- noisy_img = random_noise(img, mode='speckle',mean=0,var=.1,clip=True)
42-
43- #----------------------------------------------------------------------
44- # Displaying
45- #----------------------------------------------------------------------
46- fig,ax=plt.subplots(1,2)
47- fig.show()
48- mf.my_imshow(input_img,'(a) Grayscale Image',ax[0])
49- mf.my_imshow(mf.norm_uint8(noisy_img),'(b) Noisy image ('+str(method_name)+')',ax[1])
50-
51- plt.show()
52- print("Completed Successfully ...")
Code 7.2: Code to add various kind of noises - PART 2
In line number 28 of the code, three arguments have been passed to the
function random_noise. The first is the image onto which noise is to be
added. This image should have pixel values in the range [0,1]. The second
argument is the mode, which tells us which noise type is to be used, and the third argument is the amount (specific to salt and pepper noise), which specifies what fraction of the total pixels should be affected by noise in the output image. The output of this function is again in the range [0,1] if clip=True is passed as an argument. Otherwise, it may
have a range a little larger than that depending upon the addition of noise to
high and low-intensity values in the image. Note that this function adds
noise to the input image itself. No separate addition of the output of this
function to the original image is needed.
Using Code 7.2, one may also add Gaussian, Poisson, and Speckle noise –
learning their distributions is left as an exercise for the reader. However, by
using the code, one can simulate and get the noisy image for further
processing.
The following table shows typical scenarios where the noise distributions discussed above appear:
S. No.   Noise distribution   Typical use case
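The PSNR expression referred to below is the standard one; it is stated here for reference (assuming an 8-bit image with peak value 255, and denoting the reference image by x and the noisy image by y):

$MSE = \dfrac{1}{r\,c}\sum_{i=1}^{r}\sum_{j=1}^{c}\bigl(x(i,j)-y(i,j)\bigr)^{2}, \qquad PSNR = 10\,\log_{10}\!\left(\dfrac{255^{2}}{MSE}\right)$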
where r and c represent the number of rows and columns in the image, respectively. PSNR is measured in dB. The higher the PSNR, the lower the noise in the image. The same can be confirmed from Figure 7.12 for Rayleigh noise with increasing parameter (mode) value. Increasing the mode means more and more intensity values are affected by noise.
Code 7.3 illustrates the procedure to calculate the PSNR between any two images of the same shape.
01-
#=====================================================
=================
02- # PURPOSE : Calculating PSNR between two images of same shape
03-
#=====================================================
=================
04- import cv2
05-
06- img1=cv2.imread('img2.bmp',0)
07- img2=cv2.imread('img24.bmp',0)
08-
09- psnr = cv2.PSNR(img1,img2)
10- print("PSNR is : "+str(psnr)+' dB')
11-
12- print("Completed Successfully ...")
Code 7.3: PSNR calculation between two images (of same shape)
The results are shown below:
Figure 7.12: Addition of Rayleigh distributed noise with different parameter (mode) values to the test
image
7.6 Classical methods of noise removal
Assuming that only noise is present in the given image (and no other
disturbance), classical methods to remove noise include spatial domain
filtering. Here, arithmetic mean filtering, geometric mean filtering, harmonic
mean filtering, and contra-harmonic mean filtering methods are discussed.
The equations for these are given as follows:
Equation 7.9 (arithmetic mean filter): $\hat{f}(i,j)=\dfrac{1}{mn}\sum_{(s,t)\in nbd} g(s,t)$
Equation 7.10 (geometric mean filter): $\hat{f}(i,j)=\Bigl[\prod_{(s,t)\in nbd} g(s,t)\Bigr]^{\frac{1}{mn}}$
Equation 7.11 (harmonic mean filter): $\hat{f}(i,j)=\dfrac{mn}{\sum_{(s,t)\in nbd} \frac{1}{g(s,t)}}$
Equation 7.12 (contra-harmonic mean filter): $\hat{f}(i,j)=\dfrac{\sum_{(s,t)\in nbd} g(s,t)^{Q+1}}{\sum_{(s,t)\in nbd} g(s,t)^{Q}}$
Here g denotes the noisy input image and $\hat{f}$ the restored output.
Equations 7.9 to 7.12 are similar in form. nbd represents the neighborhood of a pixel of some predefined size; the neighborhood of the pixel under consideration has size m×n. For Q=0, the contra-harmonic filter becomes an arithmetic mean filter, and for Q=-1, it becomes a harmonic mean filter.
All the above filters try to remove the noise by using its statistical properties
from a given neighborhood. However, because of different forms, some
filters work better than others in different cases and for different noise
distributions. For example, an arithmetic mean filter removes noise but
introduces blurring. The geometric mean filter does the same thing, but the
blurring is less. The harmonic mean filter works well with salt noise but fails
with pepper noise. It does well with other types of noise though.
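Code 7.4 is not reproduced at this point; as an illustration only, a minimal sketch of one of these filters, the contra-harmonic mean, could look as follows (the neighbourhood size and the value of Q used in the usage comment are assumptions):

import numpy as np
import scipy.ndimage as sci

def contraharmonic_mean(noisy_img, m, n, Q):
    # Contra-harmonic mean: sum(g^(Q+1)) / sum(g^Q) over an m x n neighbourhood
    g = np.float64(noisy_img) + 1e-6                    # small offset avoids division issues for negative Q
    num = sci.uniform_filter(g**(Q+1), size=(m, n))     # local mean of g^(Q+1)
    den = sci.uniform_filter(g**Q, size=(m, n))         # local mean of g^Q
    return num/den                                      # the common 1/(m*n) factor cancels

# Q>0 attenuates pepper noise, Q<0 attenuates salt noise, Q=0 gives the arithmetic mean
# restored = contraharmonic_mean(noisy_img, 3, 3, Q=1.5)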
The result of applying these equations is shown in Figure 7.13 with the code
for generating the results in Code 7.4:
where σn² is the noise variance. This parameter is usually unknown for a noisy image if the noise is due to unknown sources, but it can be easily estimated by the procedure illustrated in Section 7.4. σL² and μL are the variance and mean of the local neighborhood of shape m×n considered. Refer to the following figure for a better understanding:
Figure 7.14: Adaptive filtering vs. arithmetic mean filtering
μL tells us the average intensity of the local neighborhood, and σL² is related to the contrast of the neighborhood. If σn²=0, then the filter should return the value of the pixel as it is. If σL² is high relative to σn², a value close to noise_img(i,j) should be returned. Such a region is associated with high-frequency content like edges, sharp boundaries, corners, etc., and must be retained as is. If the two variances are equal, an arithmetic mean is desired as
output. This situation occurs when the neighborhood of the current pixel has
the same properties as the overall image. Hence, averaging is a good idea to
remove the noise.
The result of applying Equation 7.13 is shown in Figure 7.14. The results
can be obtained by a code like Code 7.4 with the necessary modifications as
dictated by Equation 7.13. One can observe that local adaptive filtering
achieves better results.
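As a minimal sketch of the adaptive rule described above (this is not the book's modified Code 7.4; the neighbourhood size and the pre-estimated noise variance are assumptions supplied by the caller), one could write:

import numpy as np
import scipy.ndimage as sci

def adaptive_noise_filter(noisy_img, m, n, noise_var):
    g = np.float64(noisy_img)
    local_mean = sci.uniform_filter(g, size=(m, n))
    local_var = sci.uniform_filter(g**2, size=(m, n)) - local_mean**2
    ratio = noise_var/np.maximum(local_var, noise_var + 1e-12)   # never allow the ratio to exceed 1
    return g - ratio*(g - local_mean)

# In edge regions local_var >> noise_var, so the ratio is small and the pixel is kept almost as is;
# in flat regions local_var is close to noise_var, so the ratio is 1 and the local mean is returned.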
Conclusion
Noise is an unwanted phenomenon in images which cannot be completely
removed. It can, however, be minimized. In this chapter, some methods of
identifying noise distribution were presented. Also, using PSNR, one can
quantify the noise content in the image. Depending on the type of noise
added, one could select a noise removal (minimization) method to remove
the given noise. This chapter enabled the user to take an informed decision
on which method to choose. In the next chapter, multiresolution image
processing is introduced where the topic of noise minimization is further
taken up as an application.
Points to remember
• Noise, once added, cannot be completely removed. It can only be
minimized using various methods.
• Noise can have different distributions depending on how it is generated.
• Salt and pepper noise requires median filtering, which is non-linear in
nature.
• PSNR does not always correlate with the perceived visual quality of the image.
• Local noise removal methods like adaptive noise removal work better
than global methods.
• For Q=0, a contra-harmonic filter becomes an arithmetic mean filter, and for Q=-1, it becomes a harmonic mean filter.
Exercises
1. How is noise different from degradation?
2. Can non-linear filtering be achieved by the process of convolution?
Give reasons to support your answer.
3. If Gaussian noise has a finite non-zero mean, what will be its effect on
the appearance of the image?
4. If PSNR is low for an image, what can you conclude about the
appearance of that image?
5. Import an image using Python, add Gamma distributed noise, and
denoise the image so formed. Calculate the PSNR in both cases, i.e.,
noisy and noise minimized case.
OceanofPDF.com
CHAPTER 8
Wavelet Transform and Multi-
resolution Analysis
8.1 Introduction
This chapter is an informal introduction to the theory of wavelet transforms
and multi resolution analysis. We will try to understand what resolution is
and how traveling from one resolution to another, some information
becomes important or unimportant. We will then point out limitations of the
Fourier family of transforms, and from there, we will try to understand
wavelet transforms. Finally, we will try to put wavelet transform to use by
discussing an application on noise removal.
Structure
The following topics are covered in this chapter:
• Resolution and its impact on frequency content
• Time vs. frequency resolution
• Loss of location information in frequency domain
• Short time Fourier transform
• Concept of scale and scalogram
• Continuous wavelet transform
• Discrete wavelet transform
• Multi resolution analysis using wavelets
• Noise removal using multi resolution analysis
Objectives
After reading this chapter, the reader will be able to appreciate the
importance of resolution (or scale) in noise calculations. Wavelets are
introduced to break an image at multiple scales and do processing
individually. This will enable the reader to understand the importance of the
information contained in the image at a given resolution.
Figure 8.2: Time vs. frequency resolution (every row is a Fourier transform pair)
Since the sine frequency is the same in all three rows, one would expect the
same Fourier transform; however, that is not the case. In the first row, we see
an ideal representation. When the signal is truncated in time in the second
row, the peaks in the frequency domain remain at the same place, but two
changes are observed. First, some higher frequency components also appear
in frequency domain – this is due to abrupt truncation in time domain.
Secondly, the amplitude of peaks in the frequency domain is reduced from
0.5 to some smaller value. This is due to the principle of conservation of
energy. The energy of higher frequency components that have started to
appear now because of abrupt truncation comes from the main peak’s
energy.
In the third row, these two effects are prolonged. Additionally, one may note
that instead of having a pinpointed frequency spike in the frequency domain,
the base has thickened for the same reasons discussed above.
Having made these observations, it is important to note this interpretation –
whenever we pinpoint in frequency, time resolution becomes poor and vice
versa. Look at the first row of the figure again; by pinpointing in frequency,
we mean the presence of two pinpointed spikes in a double-sided magnitude
spectrum. By poor time resolution, we mean that the entire time duration is
selected (the signal is not truncated or pinpointed). As against this, look at
the third row; there, we pinpoint in time (i.e., a very small portion of sine
signal is selected as compared to the first row) – due to this, the frequency
spikes have broadened – i.e., frequency resolution has become poor. So,
there exists a tradeoff between time and frequency resolution – one cannot
have both good at the same time.
Having understood this important concept, let us now see another important
topic on the limitation of what Fourier family of transforms cannot do in the
coming section.
Figure 8.5: STFT for small window size. In part (b), darker regions have higher amplitudes
Part (a) of Figure 8.5 is a non-stationary signal comprised of different
frequencies – 40, 30, 20, and 10 hertz in various parts (in order). Part (b) of
the figure shows STFT of the signal in part (a). Note that for a 1D signal,
STFT will be 2D. The way we take STFT is simple – take a window (as shown in part (a) of the figure). The output is computed at every time instant. This can be noted by the fact that the X axis of both parts is time. For a
given point in time, to calculate the output, place that window on the signal
such that it is centered at that point. Now, whatever portion is inside the
window, take its FFT, and that becomes the output for that point. Since the
output will also be a 1D signal (having both magnitude and phase), we plot
the magnitude only as a column at corresponding time in the output. Note
that we are plotting a 2-sided spectrum. That is why there are two horizontal
spikes for the first part of signal in the output at -40 and 40. Also note that in
part (b) of the figure, the darker the region, the higher is the magnitude. The
same explanation is valid for the other three regions as well. Now, looking at
part (b), both questions that we began with can be answered simultaneously.
Time axis in STFT is the answer of where and frequency axis in STFT is the
answer of what frequencies are present.
Mathematically, continuous time STFT is calculated using the following
equation:
Equation 8.1: $STFT\{x(t)\}(\tau,\omega)=\displaystyle\int_{-\infty}^{\infty} x(t)\,W(t-\tau)\,e^{-j\omega t}\,dt$
where τ and ω are, respectively, the time and frequency parameters of the STFT plot, and W() is the window function used. The discrete time STFT is represented below:
Equation 8.2: $STFT\{x[n]\}(m,\omega)=\displaystyle\sum_{n=-\infty}^{\infty} x[n]\,W[n-m]\,e^{-j\omega n}$
To answer the question of what window size should be used, let us look at
Figure 8.6. It has the same results as of Figure 8.5 but with larger window
size. Note the effect this change has on the STFT. Two changes are evident –
the first is the thinning of the horizontal spikes. The reason for this is the
duality between time and frequency domain. Roughly stated, if a signal is
thick in the time domain, it will be thin in frequency domain and vice versa.
Second, at the transition point where one frequency truncates and other
frequency begins in time, STFT shows a sharp transition in Figure 8.5 but
not in Figure 8.6. It appears that two frequencies are present at the transition
time – but that is not the truth. This is because when a thick window is
placed at the transition boundary in the time domain, it will have some part
of both signals. However, for a thin window, this effect will be minimized – though never completely absent. This can be seen in part (b) of Figure 8.5, as there is a small
region of overlap. However, in part (b) of Figure 8.6, this effect is
prolonged:
Figure 8.6: STFT for larger window size. In part (b), darker regions have higher amplitudes
The same can be noted from Figure 8.7, which has a 3D view of STFT for
the previous two cases discussed.
Figure 8.7: Three-dimensional view of STFT for small (Figure 8.5) vs. larger (Figure 8.6) window
size
To generate outputs like the ones discussed above, use the following code:
01- #==========================================================================
02- # PURPOSE : Calculating and plotting STFT of a signal
03- #==========================================================================
04- import cv2, matplotlib.pyplot as plt, numpy as np
05- import scipy.fft
06- import scipy.signal
07- from mpl_toolkits.mplot3d import Axes3D
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- #-----------------------------------------------------------------
12- # Setting parameters for signal construction
13- #-----------------------------------------------------------------
14- Fs=100 # Sampling Frequency
15- T=1/Fs # Sampling interval
16- L=1001 # Total no. of samples in signal
17- n=np.linspace(0,L-1,L) # Index of time
18- t=n*T # Time axis
19-
20- #-----------------------------------------------------------------
21- # Constructing a non-stationary signal in discrete time
22- #-----------------------------------------------------------------
23- cut1=np.int16(np.floor(L/4))
24- sig1=4*np.sin(2*np.pi*(40)*t[0:cut1])
25- sig2=3*np.sin(2*np.pi*(30)*t[cut1:2*cut1])
26- sig3=2*np.sin(2*np.pi*(20)*t[2*cut1:3*cut1])
27- sig4=1*np.sin(2*np.pi*(10)*t[3*cut1::])
28- sig=np.hstack((sig1,sig2,sig3,sig4)) # Non stationary signal
29-
30- #-----------------------------------------------------------------
31- # Plotting the non-stationary signal in time domain
32- #-----------------------------------------------------------------
33- fig,ax=plt.subplots(2,1)
34- fig.show()
35- ax[0].plot(t,sig,color='k')
36- ax[0].grid()
37- ax[0].set_title("(a) Input Signal",fontsize=12)
38- ax[0].set_xlabel("t",fontsize=12)
39- ax[0].set_ylabel("Amplitude",fontsize=12)
40-
41- #-----------------------------------------------------------------
42- # Plotting the magnitude spectrum of non-stationary signal (FFT)
43- #-----------------------------------------------------------------
44- freq_axis2=np.linspace(-(L-1)/2,(L-1)/2,L)/L
45- fft_sig=scipy.fft.fftshift(scipy.fft.fft(sig/L))
46- ax[1].plot(freq_axis2*Fs,np.abs(fft_sig),color='k')
47- ax[1].grid()
48- ax[1].set_title("(b) 2 Sided Magnitude Plot (FFT)",fontsize=12,color='k')
49- ax[1].set_xlabel("F",fontsize=12)
50- ax[1].set_ylabel("Magnitude",fontsize=12)
51-
52- #-----------------------------------------------------------------
53- # Plotting the magnitude spectrum of non-stationary signal (STFT)
54- #-----------------------------------------------------------------
55- fig2,ax2=plt.subplots()
56- f, t, Zxx = scipy.signal.stft(sig,Fs,window=np.ones(30),nperseg=30)
57- amp=np.max(np.abs(Zxx))
58- ax2.pcolormesh(t, f, np.abs(Zxx),cmap='gray_r')
59- ax2.set_title('STFT (Single Sided Spectrum)')
60- ax2.set_ylabel('Frequency [Hz]')
61- ax2.set_xlabel('Time [sec]')
62- ax2.grid()
63-
64- #-----------------------------------------------------------------
65- # Magnitude spectrum of non-stationary signal (STFT in 3D)
66- #-----------------------------------------------------------------
67- fig3d = plt.figure()
68- ax3d = fig3d.add_subplot(111, projection='3d')
69- T, F = np.meshgrid(t, f)
70- ax3d.plot_surface(T, F, 2*np.abs(Zxx), cmap='gray_r', edgecolors='k')
71- # In above, multiplication by 2 is due to single sided spectrum
72- ax3d.set_title('3D STFT Magnitude (SINGLE SIDED)')
73- ax3d.set_xlabel('Time [sec]')
74- ax3d.set_ylabel('Frequency [Hz]')
75- ax3d.set_zlabel('Magnitude')
76-
77- plt.show()
78- print("Completed Successfully ...")
Code 8.1: Inbuilt method for calculating and plotting STFT in Python
The output of the preceding code is shown in the following three figures:
Figure 8.8: Output 1 of Code 8.1 - non-stationary signal and its double sided FFT (magnitude plot)
The following figure shows the STFT view:
Figure 8.9: Output 2 of Code 8.1 - STFT Top view (single sided magnitude spectrum)
Refer to the following figure to understand STFT magnitude:
Figure 8.10: Output 3 of - 3D view of STFT (Single sided magnitude spectrum)
Most of Code 8.1 is self-explanatory. Line no. 56, which is f, t, Zxx = scipy.signal.stft(sig,Fs,window=np.ones(30),nperseg=30), calculates the STFT. There are three outputs: frequency, time, and a 2D matrix of coefficients (the STFT). The inputs are the signal, the sampling frequency, and then a window. In our example, it is a rectangular window, but it could be any window one pleases. If no window is specified, the function defaults to the Hann window with the total number of samples specified by the fourth argument nperseg. A rectangular window is not the best choice because, as already discussed, it causes abrupt truncation, which leads to fringes in the frequency domain. There are other shapes too, with their own advantages and disadvantages, but a detailed study of window shapes is out of the scope of this book. We recommend playing with line no. 56 to see the effect of changing the window size and type, as in the small example below.
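For instance, one might replace the rectangular window with a Hann window of the same length (a small sketch assuming sig and Fs from Code 8.1; any window supported by scipy.signal.get_window can be used):

import scipy.signal

win = scipy.signal.get_window('hann', 30)        # Hann window instead of np.ones(30)
f, t, Zxx = scipy.signal.stft(sig, Fs, window=win, nperseg=30)   # nperseg must match the window length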
Having seen the effect of window size in STFT, in the next section, we
introduce the concept of scale which will help us determine the window size
to be used.
8.6 Concept of scale and scalogram
To understand what we mean by scale, look at Figure 8.11. In each row of
the figure, the first column has the original signal, which has two different
frequency components – high and low, localized in time at different
(consecutive) locations. Also shown in the same column is a window
function (dark in color). The second column for each row shows the output
of convolution between the original signal and window function in column
no. 1. The third column is simply the two-sided magnitude spectrum of the
second column.
If the window size is small (for convolution and not for FFT as in STFT), we
call it low scale and if it is large, we call it high scale. Column no. 1 has low
to high scale windows from top to bottom.
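The equation referred to in the next paragraph is not reproduced above; it is the standard orthogonality relation for sinusoids, which (for integer frequencies m and n, integrated over one full period) reads:

$\displaystyle\int_{0}^{2\pi} \sin(mx)\,\sin(nx)\,dx = \begin{cases} 0, & |m| \neq |n| \\ \pi, & |m| = |n| \neq 0 \end{cases}$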
The preceding equation is a litmus test for finding the frequency of interest.
Assume (in a thought experiment) we have a signal sin(mx) for which we do
not know the value of m. We have a device with which we can find out the integral above (i.e., we have full control over the value of n). To know the correct value of m, we will substitute different values of n. For almost all the values, we get a 0 answer, but if |m|=|n|, the answer is non-zero. This means we have identified the frequency. Now, look at the following equation of the continuous wavelet transform, which works on the same principle:
Equation 8.4: $CWT_f(s,t)=\displaystyle\int_{-\infty}^{\infty} f(x)\,\frac{1}{\sqrt{s}}\,\psi^{*}\!\left(\frac{x-t}{s}\right)dx$
In the above equation, f(x) is the function that we are testing for the presence or absence of a certain localized frequency (vibration) represented by ψ(x), the mother wavelet, and ψ((x−t)/s)/√s is its translated and scaled version. An important point to note is that by using the ψ function, we do not just look for
presence (or absence) of the frequency represented by it but also where it is
in the tested signal. This where is accounted for by t, i.e., the translation parameter. We also talked about the window size being proportional to scale in the earlier section; the s parameter here takes care of that. Like the Fourier
transform, wavelet transform is also an integral transform that finds
frequencies in the signal but additionally also discloses the location of that
frequency in the signal. To see how a wavelet looks like, let us see an
example in Figure 8.14 which shows one of the many available wavelets:
where j is the scale parameter and k is the shift parameter, and the mother wavelet has the impulse response (in continuous time) shown below:
Equation 8.7:
sampled at 1, 2^j, 2^{2j}, …, 2^N.
The factor of 2^j comes into the picture because the scale can now change only in quantized levels (not all continuous scales are possible), and reducing the scale by one level is equivalent to downsampling the signal by a factor of 2; for scale j this gives the term 2^j. Note that in Equation 8.6, f(x) is continuous, but k makes it discrete.
Figure 8.16: Illustration of single level decomposition using DWT and re-composition using IDWT
Note that the signal in the above figure is discrete time, but it is plotted using
plot function for understanding purposes only. Part (a) in the figure shows a
signal, which is a sinusoid corrupted by noise. Noise has mostly high
frequency components by its very nature. Part (b) shows the approximation
coefficients (cA) for signal in part (a). One may note that the level of noise
seems reduced in it as it is filtered out – but not completely as noise has
some low frequency component too. Part (c) of the figure shows high
frequency content of part (a) i.e., detail coefficients (cD), which is mostly
noise. One more thing to note is that in both parts (b) and (c), the length of the signal is halved, as there is a down sampler too in the process, as shown in Figure 8.11. If we take the IDWT using cA and cD as inputs, we get back the original signal exactly, as shown in part (d) of the figure. The code for doing this is
given as follows:
01- #==========================================================================
02- # PURPOSE : Understanding DWT and IDWT
03- #==========================================================================
04- import pywt
05- import numpy as np
06- import matplotlib.pyplot as plt
07-
08- #-----------------------------------------------------------------------
09- # Creating Data and performing DWT
10- #-----------------------------------------------------------------------
11- fm=20
12- Fs=10*fm
13- L=1000
14- T=1/Fs
15- t=np.arange(0,L,1)*T
16- y0 = np.sin(2*np.pi*(fm/30)*t) # Low frequency component
17- noise = np.random.normal(0,.1,L) # Mostly high frequency component
18- y=y0+noise # Original signal
19- (cA, cD) = pywt.dwt(y,'sym2',mode='per') # mode='per' for half length output
20-
21- #-----------------------------------------------------------------------
22- # Plotting DWT
23- #-----------------------------------------------------------------------
24- fig,ax=plt.subplots(4,1)
25- fig.show()
26-
27- ax[0].plot(t,y,'k')
28- ax[0].grid()
29- ax[0].set_title("(a) Original Signal (with noise)")
30-
31- ax[1].plot(t[0:np.int32(len(t)/2)],cA,'k')
32- ax[1].grid()
33- ax[1].set_title("(b) Approximation Coef. (Through DWT)")
34-
35- ax[2].plot(t[0:np.int32(len(t)/2)],cD,'k')
36- ax[2].grid()
37- ax[2].set_title("(c) Detail Coef. (Through DWT)")
38-
39- #-----------------------------------------------------------------------
40- # Performing IDWT
41- #-----------------------------------------------------------------------
42- # A = pywt.idwt(cA, None, 'sym2',mode='per')
43- # D = pywt.idwt(None, cD, 'sym2',mode='per')
44- # recovered_signal=A + D
45-
46- recovered_signal = pywt.idwt(cA, cD, 'sym2',mode='per')
47-
48- #-----------------------------------------------------------------------
49- # Plotting IDWT
50- #-----------------------------------------------------------------------
51- ax[3].plot(t,recovered_signal,'k')
52- ax[3].grid()
53- ax[3].set_title("(d) Recovered Signal (Through IDWT)")
54-
55- plt.show()
56- print("Completed Successfully ... ")
Code 8.3: Code illustration DWT and IDWT for single level decomposition
Most of the code is self-explanatory. Line no. 19 performs a single level
decomposition using DWT. Similarly, line no. 46 performs the IDWT. It takes
the approximation and detail coefficients, along with the same wavelet and mode
parameters used for the DWT command, and regenerates the original signal
exactly. Lines no. 42 to 44 are commented out because they are an alternate way of
doing the same thing. Instead of using cA and cD simultaneously, one may
generate the approximation signal and the detail signal in the time domain and
then add them in the time domain to get back the original signal without
any loss.
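The claim that nothing is lost can be verified in a couple of lines; the following sketch (illustrative, not one of the book's numbered listings) checks that the two reconstruction routes agree and that both recover the original signal:

import numpy as np
import pywt

y = np.random.normal(0, 1, 1000)                  # any test signal of even length
cA, cD = pywt.dwt(y, 'sym2', mode='per')

A = pywt.idwt(cA, None, 'sym2', mode='per')       # approximation signal in the time domain
D = pywt.idwt(None, cD, 'sym2', mode='per')       # detail signal in the time domain

print(np.allclose(A + D, pywt.idwt(cA, cD, 'sym2', mode='per')))   # True
print(np.allclose(A + D, y))                                       # True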
This also motivates us to manipulate the approximation and detail
coefficients separately and combine them using IDWT to get the modified
signal we desire. In the above code, if cA is further decomposed into 2nd, 3rd and
higher levels, we can exercise even finer control over the signal.
This is precisely what we do in the next section by demonstrating noise
removal/minimization as an example.
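For such multi-level decompositions, PyWavelets also offers wavedec and waverec, which handle all levels in one call; a minimal sketch (signal and level chosen only for illustration):

import numpy as np
import pywt

# A hypothetical noisy test signal of length 1024
t = np.linspace(0, 1, 1024)
y = np.sin(2 * np.pi * 5 * t) + np.random.normal(0, 0.1, 1024)

# Five-level decomposition: returns [cA5, cD5, cD4, cD3, cD2, cD1]
coeffs = pywt.wavedec(y, 'sym2', mode='per', level=5)

# The cD arrays can be thresholded here before rebuilding the signal
y_rec = pywt.waverec(coeffs, 'sym2', mode='per')
print(np.allclose(y, y_rec))    # True when the coefficients are left untouched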
Figure 8.17: Comparison of wavelet and MRA based denoising to traditional denoising
Part (c) of the figure shows the result obtained using wavelet and Multi
Resolution Analysis (MRA) based denoising and part (d) shows the result
of traditional averaging based denoising. Which result is better? At first
glance, it appears that part (d) is better than part (c) as it is smoother;
however, this is not the case. It is smoother but not better. This is because if
you see the original signal, between 0.10 and 0.15 on X scale, there is a rect
type structure. This means there is a step change at the beginning and end of
this structure. Now, look at the denoised versions in part (c) and part (d). It
is evident that part (c), i.e., wavelet and MRA based denoised output
preserves it better. Hence, it is better. While removing noise, the signal
should retain its original properties. Similar observations can be made about
the portion of the signal where the sinusoidal structure ends.
Now, let us see how the denoising is achieved using wavelet and MRA.
Refer to Figure 8.18. It shows the result of five levels of decomposition
using wavelet transform for the signal of Figure 8.17 (b). The column on the
left and right in this figure display the approximation and detail coefficients
respectively.
Detail coefficients, which correspond to the high frequency content, have
mostly noise. However, as we know, there were some high frequency
transitions in the original signal too, which are also present there. If we
ignore the detail coefficients completely, that information will be lost, and
high frequency regions will be converted to low frequency regions as it
happens in traditional filtering by using averaging filters of various kinds. We
need to keep the important details in the detail coefficients. The local spikes that one
may note in the detail coefficients correspond to important high frequency
detail in the original signal – one may visually observe this too. So, instead of
removing the detail coefficients completely, it is a good idea to remove only
those detail coefficients, at every level, whose magnitude is less than (say) 90% of the
peak magnitude at that level. Such a strategy is called hard thresholding. That is
what is being done in the above results.
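Hard (and soft) thresholding need not be coded by hand; PyWavelets ships a helper for it. A minimal sketch with made-up coefficient values:

import numpy as np
import pywt

cD = np.array([0.05, -0.8, 0.1, 1.0, -0.02, 0.6])     # hypothetical detail coefficients

thr = 0.9 * np.max(np.abs(cD))                        # 90% of the peak magnitude
cD_hard = pywt.threshold(cD, thr, mode='hard')        # small coefficients set to zero, rest kept as they are
cD_soft = pywt.threshold(cD, thr, mode='soft')        # small ones zeroed, survivors shrunk towards zero

print(cD_hard)    # only the 1.0 entry survives, unchanged
print(cD_soft)    # the 1.0 entry survives but is reduced by the threshold value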
Figure 8.18: 5 level decomposition of the noisy signal of Figure 8.17 (b)
There are many other strategies to select the important regions in the detail
coefficients which the reader may explore. The code for generating the
results of Figure 8.17 and Figure 8.18 is given as follows:
01- #============================================================================
02- # PURPOSE : Denoising using wavelet based MRA
03- #============================================================================
04- import pywt
05- import numpy as np
06- import scipy.ndimage as sci
07- import matplotlib.pyplot as plt
08-
09- #----------------------------------------------------------------------
10- # Creating Data and performing DWT (At multiple resolutions - MRA)
11- #----------------------------------------------------------------------
12- fm=200
13- Fs=100*fm
14- L=2**12 # Keep this in powers of 2 for ease of plotting later
15- T=1/Fs
16- t=np.arange(0,L,1)*T
17-
18- y0 = np.zeros(L)
19- y0[np.int32(len(t)/7):np.int32(len(t)/5)+500]=1
20- y0=np.sin(2*np.pi*(fm/4)*t)*y0
21- y0[np.int32(len(t)/2):np.int32(len(t)/2)+1000]=1
22-
23- noise = np.random.normal(0,.1,L)
24- y=y0+noise # Input signal with noise
25-
26- #----------------------------------------------------------------------
27- # Decomposition by using DWT - MRA
28- #----------------------------------------------------------------------
29- levels_of_decomposition=5
30- fig1,ax1=plt.subplots(levels_of_decomposition,2)
31- fig1.show()
32-
33- cA_list=[]
34- cD_list=[]
35- y2=y.copy()
36- for i in np.arange(0,levels_of_decomposition,1):
37- (cA, cD) = pywt.dwt(y2,'sym2',mode='per')
38- cA_list.append(cA)
39- cD_list.append(cD)
40- ax1[i,0].plot(t[0:np.int32(len(y2)/2)],cA,'k')
41- ax1[i,0].grid()
42- str1="cA at level "+str(i+1)
43- ax1[i,0].set_title(str1)
44-
45- ax1[i,1].plot(t[0:np.int32(len(y2)/2)],cD,'k')
46- ax1[i,1].grid()
47- str2="cD at level "+str(i+1)
48- ax1[i,1].set_title(str2)
49-
50- y2=cA
51- #----------------------------------------------------------------------------
52- # IDWT MRA Based Reconstruction by Hard Thresholding
53- #----------------------------------------------------------------------------
54- recovered_signal=cA
55- for i in np.arange(levels_of_decomposition-1,-1,-1):
56- A = pywt.idwt(recovered_signal, None, 'sym2',mode='per')
57- thresh_array=np.zeros(len(cD_list[i])) # 0/1 mask for hard thresholding
58- thresh_array[np.abs(cD_list[i])>.9*np.max(np.abs(cD_list[i]))]=1
59- D = pywt.idwt(None, cD_list[i]*thresh_array, 'sym2',mode='per')
60- recovered_signal=A + D
61-
62- #----------------------------------------------------------------------
63- # Traditional Noise removal (for comparison)
64- #----------------------------------------------------------------------
65- y3=y.copy()
66- filter1=np.ones(15*levels_of_decomposition)
67- filter1=filter1/np.sum(filter1)
68- recovered_signal2=sci.correlate(y3,filter1)
69-
70- #----------------------------------------------------------------------
71- # Plotting Logic
72- #----------------------------------------------------------------------
73- fig2,ax2=plt.subplots(4,1)
74- fig2.show()
75-
76- ax2[0].plot(t,y0,'k')
77- ax2[0].grid()
78- ax2[0].set_title("(a) Original Signal")
79-
80- ax2[1].plot(t,y,'k')
81- ax2[1].grid()
82- ax2[1].set_title("(b) Original Signal + Noise")
83-
84- ax2[2].plot(t,recovered_signal,color='k')
85- ax2[2].grid()
86- ax2[2].set_title("(c) Recovered Signal (Wavelet MRA Denoising)")
87-
88- ax2[3].plot(t,recovered_signal2,'k')
89- ax2[3].grid()
90- ax2[3].set_title("(d) Recovered Signal (Traditional Filtering)")
91-
92- plt.show()
93- print("Completed Successfully ... ")
Code 8.4: Denoising of 1D signal using wavelet and MRA and comparison of the results with
traditional denoising
If you have followed it so far, the code should be easy to understand.
Conclusion
In this chapter, wavelet based multi-resolution analysis was introduced to
emphasize the importance of information at different scales (or resolutions).
The results of removing noise independent of scale versus with scale were
compared. It was found that, generally, noise is present at higher
resolutions (or lower scales), and hence, the image should first be decomposed
into various scales before noise removal. The results of de-noising obtained
with MRA were better than those obtained with conventional filtering. In the next
chapter, we introduce binary morphology which is a very important topic in
object identification and detection.
Points to remember
• Not every information in an image is important at every resolution.
• Before down sampling the image, high frequency content should be
removed.
• Higher resolution means a lower scale and vice versa.
• At lower resolutions (i.e., at higher scales), it is only the high frequency
components that are not preserved.
• Low frequency contents remain practically stable at all resolutions.
• The better an object in the image is localized in time, the poorer its
frequency resolution will be, and vice versa.
• Fourier transform is primarily meant for stationary signals.
• The width of the window affects the results of STFT.
• Approximation coefficients of DWT correspond to low frequency
regions and detail coefficients correspond to high frequencies.
• Noise is usually more in detail coefficients and can be removed by hard
or soft thresholding.
Exercises
1. What is the difference between waterfall plot and wavelet plot?
2. Explain the effect of aliasing during down sampling of data.
3. What problem does STFT solve in comparison to Fourier Transform?
4. For a one-dimensional signal, what do the approximation and detail
coefficients represent?
5. How does wavelet and MRA based denoising preserve the detail in an
image?
CHAPTER 9
Binary Morphology
9.1 Introduction
Full moon night is full moon night because the moon becomes (appears to
be) circular that day. On other days, its boundary is not (does not appear to
be) circular. You can easily identify your friend in a photograph that has
many other people who you do not know. If you see an image of a broken
object, your brain can easily figure out and mentally reconstruct the broken
part.
An image is just another piece of data. You try to identify either a sub-data
of your interest or some desired properties in that image. Your brain can do it
trivially, but when computers try to do that, we call it morphology – the
study of the form of things. Modern image processing is all about that.
Machines try to read the number plates of vehicles, identify individuals from
their facial images or by scanning thumbprints, perform surgery without human hands,
and so on. It is all about extracting information from image data. Data is not the
new gold - rather, it is ore from which gold is extracted. Here, the
information extracted is gold.
This chapter is a small step towards extracting information from images.
Small step because we will do it on binary images first, and then in the
subsequent chapters, we will do that on grayscale and colored images with
advanced methods.
Structure
This chapter covers the following topics:
• Erosion
• Dilation
• Duality between erosion and dilation
• Opening and closing
• Hit and miss transform
• Boundary extraction
• Hole filling
• Region filling
• Connected component analysis
• Connected component analysis using skimage library
• Convex hull
• Thinning
• Thickening
• Skeletons
Objectives
The objective of this chapter is to introduce the reader to the processing of
binary images. This processing is important because it helps us extract useful
structural information from images. These structures may be skeletons or a
number of connected components, etc. Although binary images are seldom
used on their own, they often appear at intermediate processing stages of many
algorithms, and hence, it is important to understand how they are processed.
9.2 Erosion
Soil erosion is an example of naturally occurring erosion. Soil decays layer
by layer – but uniformly everywhere. In this section, we will note a similar
phenomenon about objects in binary images. We will also set the foundation
for some of the upcoming sections in terms of general procedure to be used
in binary morphology, as well as some mathematical formalization to be
used frequently in this chapter.
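Here is a minimal, illustrative preview of erosion using OpenCV (the toy image and box structuring element below are made up for this sketch):

import cv2
import numpy as np

# A 5x5 white square on a black background
I = np.zeros((9, 9), dtype=np.uint8)
I[2:7, 2:7] = 255

SE = np.ones((3, 3), dtype=np.uint8)    # 3x3 box structuring element

eroded = cv2.erode(I, SE)               # peels one outer layer off the square
print(np.count_nonzero(I), np.count_nonzero(eroded))   # 25 9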
9.3 Dilation
Dilation is a dual operation of erosion (more on this a little later). In this
section, we will study how to dilate a binary object. We will also see a
working example through illustration and code.
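Analogously, a minimal dilation sketch on the same kind of toy image (again, only illustrative):

import cv2
import numpy as np

I = np.zeros((9, 9), dtype=np.uint8)
I[2:7, 2:7] = 255                       # the same 5x5 white square
SE = np.ones((3, 3), dtype=np.uint8)

dilated = cv2.dilate(I, SE)             # adds one layer of white pixels around the square
print(np.count_nonzero(I), np.count_nonzero(dilated))   # 25 49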
Equation 9.6:
The above equations are a mathematical way of saying that erosion and
dilation operations are duals of each other with respect to set
complementation and reflection. If the structuring element is symmetric (i.e.,
its reflection equals SE itself), which is usually the case in practice (as we will see
shortly), the above equations become Equations 9.7 and 9.8, respectively:
Equation 9.7:
And
Equation 9.8:
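In standard set notation, writing ⊖ for erosion, ⊕ for dilation and (·)^c for set complementation, these two relations for a symmetric structuring element are usually stated as:

(F ⊖ SE)^c = F^c ⊕ SE        and        (F ⊕ SE)^c = F^c ⊖ SE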
Let us begin by interpreting Equations 9.7 and 9.8. Equation 9.7 states that if
you erode F by SE and, in the eroded image, look at its background
(indicated by the complement (·)^c), the same result can alternatively be found by
dilating the background of the original image by SE. (Dilating the background, i.e.,
the logic 0 pixels, is possible simply by complementing the input image – replacing
black pixels by white and white by black – and then proceeding as usual.)
Equation 9.8 says the same for dilation. In both the equations, it is
mandatory that the structuring element be symmetric. If it is not, we can use
Equations 9.5 and 9.6 where the RHS has a reflected version of the
structuring element.
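To make this duality concrete, the following sketch (illustrative code, not one of the book's numbered listings) checks Equation 9.7 numerically on a random binary image with a symmetric 3x3 structuring element:

import cv2
import numpy as np

np.random.seed(0)
F = (np.random.rand(64, 64) > 0.5).astype(np.uint8) * 255   # random binary image
SE = np.ones((3, 3), dtype=np.uint8)                        # symmetric structuring element

lhs = cv2.bitwise_not(cv2.erode(F, SE))        # complement of the eroded image
rhs = cv2.dilate(cv2.bitwise_not(F), SE)       # dilation of the complemented image
print(np.array_equal(lhs, rhs))                # True

The printed result is True because complementing a minimum (erosion) over a neighbourhood is the same as taking a maximum (dilation) over the complemented values.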
Figure 9.4: Comparative illustration of opening and closing with erosion and dilation
Equation 9.10:
Also, opening and closing operations are duals of each other with respect to
the set complementation and reflection. This is represented in the following
equations respectively:
Equation 9.11:
Equation 9.12:
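In the same notation as before, with ∘ denoting opening, • denoting closing and ŜE the reflection of SE, this duality is usually stated as:

(F ∘ SE)^c = F^c • ŜE        and        (F • SE)^c = F^c ∘ ŜE

In OpenCV, both operations are available through morphologyEx; a minimal sketch (the toy image below is made up for illustration):

import cv2
import numpy as np

F = np.zeros((20, 20), np.uint8)
F[5:15, 5:15] = 255              # a 10x10 white square
F[2, 2] = 255                    # an isolated white speck (removed by opening)
F[9, 9] = 0                      # a one-pixel hole inside the square (filled by closing)

SE = np.ones((3, 3), np.uint8)
opened = cv2.morphologyEx(F, cv2.MORPH_OPEN, SE)    # erosion followed by dilation
closed = cv2.morphologyEx(F, cv2.MORPH_CLOSE, SE)   # dilation followed by erosion
print(opened[2, 2], closed[9, 9])                   # 0 255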
9.5.3 Application of opening and closing
To understand the potential application areas of opening and closing, we will
refer to Figure 9.5. Let us first understand the various foreground elements
shown in part (a) of this figure. There are five major groups of foreground
elements that we will consider in the input image; the first is a pentagon. The
second is a composite of squares and triangles with connections between
them. The third one is the cluster of four white circles placed horizontally.
The fourth one is the vertical rectangle with circular holes of different radii.
Fifth is a horizontal rectangle with three gulfs of different thicknesses
(shown at the bottom right of the figure). Each of these is designed to
understand specific attributes of opening and closing:
Figure 9.7: Hit and miss transform — broken as intersection of two erosions
Part (a) of the figure is the actual binary input image and part (b) is its
complimented form, as we can easily see the background as foreground for
the second erosion in Equation 9.13. Parts (c) and (d) show the structuring
elements used for individual erosions. Parts (e) and (f) show the results of
the eroding input image in part (a) and the complimented image in part (b)
by SE1 and SE2, respectively. The result of the intersection of these two
eroded versions is shown in part (g), which is the final hit and miss
transformed image. Note that Figure 9.6 and Figure 9.7 have the same input
(hence, the same output image). However, in Figure 9.6, we use a single
filter instead of two (let us call it SE1,2). For constructing SE1,2 from SE1 and
SE2, the foreground of SE1,2 is the foreground of SE1 and the background of
SE1,2 is the foreground of SE2. Remember that these two sets of pixels never
intersect. The remaining pixels are do not care pixels in SE1,2.
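As a quick, hedged illustration (not the book's own listing), OpenCV can also compute the hit and miss transform in a single call through MORPH_HITMISS; in its kernel, 1 means the pixel must be foreground, -1 means it must be background, and 0 means do not care:

import cv2
import numpy as np

I = np.array([[0, 0, 0, 0, 0],
              [0, 255, 255, 0, 0],
              [0, 255, 0, 0, 0],
              [0, 0, 0, 255, 0],
              [0, 0, 0, 0, 0]], dtype=np.uint8)

# Kernel: 1 = must be foreground, -1 = must be background, 0 = do not care
kernel = np.array([[0, -1, 0],
                   [-1, 1, -1],
                   [0, -1, 0]], dtype=int)          # detects isolated foreground pixels

hitmiss = cv2.morphologyEx(I, cv2.MORPH_HITMISS, kernel)
print(np.argwhere(hitmiss == 255))                  # [[3 3]] : only the isolated pixel matches

Here the kernel encodes the foreground of SE1 (the 1 entries) and the foreground of SE2 (the -1 entries) in a single array, in the same spirit as the combined SE1,2 discussed above.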
Figure 9.9: Holes in a binary image (text is not a part of the figure)
It is important to note that holes are always fully bounded by a boundary. That
boundary may be thick (as in the first object in the figure — holes H1 and
H4) or thin (as in the last object in the figure — hole H3). It may be thicker
at some points and thinner at others. Our objective in hole filling is to design an
algorithm for filling the hole when the worst type of boundary is present. Let
us now explore the worst type of boundary and the method of filling it.
The above equation has a form like Equation 9.15, but instead of I^c, it uses I.
Also, the structuring element used is a 3x3 structure with all ones.
Additionally, to start with, we need an initial seed pixel inside every foreground
object of interest – and choosing these seeds is not an automated process. To
automate the process, we are going to study
connected component analysis in the next section.
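A minimal sketch of this iterative grow-and-mask procedure (the toy image and the seed pixel below are made up for illustration):

import cv2
import numpy as np

# A toy binary image with two separate white objects
I = np.zeros((9, 9), np.uint8)
I[1:4, 1:4] = 255
I[5:8, 5:8] = 255

SE = np.ones((3, 3), np.uint8)   # 3x3 structuring element of all ones
X = np.zeros_like(I)
X[2, 2] = 255                    # a seed pixel known to lie inside the first object

while True:
    X_next = cv2.bitwise_and(cv2.dilate(X, SE), I)   # dilate the seed region, then mask with I
    if np.array_equal(X_next, X):                    # stop when the region no longer grows
        break
    X = X_next

print(np.count_nonzero(X))       # 9 : only the first 3x3 object is recovered

With I^c in place of I (and a seed chosen inside a hole), the same loop fills holes instead of extracting a connected component.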
Figure 9.15: Pass 1 and 2 of connected component analysis for finding connected components
The result in Figure 9.15 has been generated using the following code:
001- #======================================================================
002- # PURPOSE : Learning Connected Component Analysis
003- #======================================================================
004- import cv2
005- import matplotlib.pyplot as plt
006- import numpy as np
007- import my_package.my_functions as mf # This is a user defined package
008- # one may find the details related to its contents and usage in section 2.7.3
009-
010- #--------------------------------------------------------------------------
011- # Defining Custom function to annotate the plots (not necessary once we
012- # understand the concept)
014-
015- # Function for plotting grid lines over pixels of image and pixel numbering
016- def plot_pixel_grid_on_image(req_size,ax,img,pass0):
017- req_size_x=req_size[1]+1
018- req_size_y=req_size[0]+1
019-
020- #------------------ For grid lines on image -------------------------------
021- for i in np.arange(0,req_size_x,1):
022- ax.plot(i*np.ones(req_size_y)-.5,np.arange(0,req_size_y,1)-.5,color='.5')
023- for i in np.arange(0,req_size_y,1):
024- ax.plot(np.arange(0,req_size_x,1)-.5,i*np.ones(req_size_x)-.5,color='.5')
025- # In the above, color can be set as grayscale value between 0 to 1 also
026-
027- #------------------ For pixel numbering -----------------------------------
028- for i in np.arange(0,req_size_x-1,1):
029- for j in np.arange(0,req_size_y-1,1):
030- if img[j,i]==0:
031- # White text on black background
032- ax.text(i-.25,j+.25,str(pass0[j,i]),color='1',fontsize=8)
033- else:
034- # Black text on white (or any non-zero gray) background
035- ax.text(i-.25,j+.25,str(pass0[j,i]),color='0',fontsize=8)
036-
037- #--------------------------------------------------------------------------
038- # Creating a binary image for understanding the concept (This could be
039- # replaced by a binary image instead when working with real images)
040- #--------------------------------------------------------------------------
041- I=np.uint8(255*np.array([\
042- [0,0,0,0,0,0,0,0,0,0,1,1,1], \
043- [0,0,1,0,0,0,0,0,1,0,0,1,1], \
044- [0,1,1,0,1,0,1,1,0,0,0,0,0], \
045- [0,0,0,1,0,0,0,1,0,1,1,1,0], \
046- [0,0,0,0,0,0,0,0,0,1,0,1,0], \
047- [0,1,0,1,0,0,1,0,0,1,0,1,0], \
048- [0,0,1,0,0,1,1,1,0,0,1,0,0], \
049- [0,1,0,1,0,0,1,0,0,0,0,0,1], \
050- [0,0,0,0,0,0,0,0,0,0,1,0,1], \
051- [0,0,0,0,0,0,0,0,0,0,1,1,1]] ))
052-
053- # Creating zero padded image
054- I2=cv2.copyMakeBorder(I,1,1,1,1,cv2.BORDER_CONSTANT,value=0)
055- r,c=np.shape(I)
056-
057- #--------------------------------------------------------------------
058- # PASS 1
059- #--------------------------------------------------------------------
060- next_label=1 # variable for creation of new labels
061- equ_list=[] # Equivalent labels list
062-
063- pass1=np.zeros((r+2,c+2))
064- for i in np.arange(0,r+2,1):
065- for j in np.arange(0,c+2,1):
066- if I2[i,j]==255:
067- arr=np.sort(np.array([pass1[i-1,j-1],pass1[i-1,j],\
068- pass1[i-1,j+1],pass1[i,j-1]]),0)
069- if (np.sum(arr)==0): # If all processed pixels are background
070- pass1[i,j]=next_label
071- next_label=next_label+1
072- else: # At least one already-processed neighbour pixel has a label
073- arr=np.delete(arr,np.where(arr==0))
074- pass1[i,j]=arr[0]
075- if (len(np.unique(arr))!=1):
076- equ_list.append(arr.tolist())
077-
078- pass1=np.int16(pass1)
079- fig,ax=plt.subplots(1,2)
080- mf.my_imshow(I2,'(a) PASS 1',ax[0])
081- plot_pixel_grid_on_image(np.shape(I2),ax[0],I2,pass1)
082-
083- #--------------------------------------------------------------------
084- # Equivalent labels list management logic
085- #--------------------------------------------------------------------
086- len_list=len(equ_list)
087- dummy_arr=np.zeros((len_list,2))
088- for i in np.arange((len_list-1),-1,-1):
089- dummy_arr[i,0]=i
090- dummy_arr[i,1]=equ_list[i][0]
091- dummy_arr=dummy_arr[dummy_arr[:,1].argsort()]
092-
093- #--------------------------------------------------------------------
094- # PASS 2
095- #--------------------------------------------------------------------
096- pass2=pass1.copy()
097- for i in np.arange(len_list-1,-1,-1):
098- len_sub_list=len(equ_list[i])
099- for j in np.arange(1,len_sub_list,1):
100- pass2[np.where(pass2==equ_list[i][j])]=equ_list[i][0]
101-
102- mf.my_imshow(I2,'(b) PASS 2',ax[1])
103- plot_pixel_grid_on_image(np.shape(I2),ax[1],I2,pass2)
104-
105- plt.show()
106- print("Completed Successfully ...")
Code 9.8: Connected component analysis
A few important points to note about the code are — first, in line number 16,
the last argument is changed from the previous codes in this chapter as we
intend to display the label number and not the pixel linear indices.
Accordingly, line numbers 32 and 35 have changed. The code only looks
large in length because we want to plot the grid and display the label
numbers over the image. Also, we have used an image created inside the
code — we could simply import any binary image for doing the same thing.
The code is otherwise compact and can be easily understood from the
explanation above.
Figure 9.17: Convex and non-convex objects together with convex hull
The rubber band in our experiment is called convex hull. Note that all the
interior angles of a convex hull (whether it is drawn for convex or non-
convex object) are less than 180 degrees. The objects may not always be in
polygonal shape. Then, it will be difficult to compute the internal angles.
That is where the convexity of the surface from the outer side (or concavity
from inner side) is guaranteed for a convex hull.
An alternate definition of convex objects (and not convex hull) is that if a
line segment is drawn between any two points inside a convex object, it
always remains inside the object — this is true for any pair of points
(pixels) in the object.
S. No.   p       q   r       Pass?
1        pivot   1   2       Pass
2        1       2   3       Pass
3        2       3   4       Fail
4        1       2   4       Pass
5        2       4   5       Pass
6        4       5   6       Fail
7        2       4   6       Fail
8        1       2   6       Pass
9        2       6   7       Pass
10       6       7   8       Pass
11       7       8   9       Fail
12       6       7   9       Fail
13       2       6   9       Pass
14       6       9   pivot   Pass
If the value of β, which is the difference between the slopes, in the above
equation is 0, then the points are collinear. If it is positive, then there is a
right turn (clockwise rotation) and if it is negative, the turn is left
(counterclockwise). Instead of using the above equation, the following
modified form can also be used:
Equation 9.18:
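µ = (q_y − p_y)·(r_x − q_x) − (q_x − p_x)·(r_y − q_y)

where p, q and r are the three points under test; this is exactly the quantity computed in line number 22 of Code 9.10,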
with the same interpretation for µ as for β. We will use this in Code 9.10. Do
not be intimidated by the number of lines used in code. The code is simple if
you have followed it so far:
01- #============================================================================
02- # PURPOSE : Finding Convex Hull using Graham Scan Algorithm
03- #============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import skimage as ski
08- import math
09- import my_package.my_functions as mf # This is a user defined package
10- # one may find the details related to its contents and usage in section 2.7.3
11-
12- # Function to calculate the convex hull using Grahm Scan Procedure
13- def conv_hull_graham_scan(points_list): # Input - Unsorted List of 2D points
14- n = len(points_list)
15-
16- # Nested-Function to find the orientation indicator of triplet
17- # of points (p, q, r). Returns:
18- # 0: Collinear points_list
19- # 1: Clockwise points_list
20- # 2: Counterclockwise points_list
21- def find_orient_indicator(p, q, r):
22- val = (q[1] - p[1]) * (r[0] - q[0]) - (q[0] - p[0]) * (r[1] - q[1])
23- if val == 0:
24- return 0
25- return 1 if val > 0 else 2
26-
27- # Find the point with the lowest y-coordinate (and leftmost if tie appears)
28- btm_left_pt = min(points_list, key=lambda point: (point[1], point[0]))
29-
30- # Sort the points_list based on polar angle with respect to the btm_left_pt point
31- sorted_points_list = sorted(points_list, key=lambda point:\
32- (math.atan2(point[1] - btm_left_pt[1], point[0] - btm_left_pt[0]), point))
33-
34- # Initialize the convex hull with the first three points_list
35- hull = [sorted_points_list[0], sorted_points_list[1], sorted_points_list[2]]
36- hull_index=[0,1,2]
37-
38- # Iterate over the sorted points_list to build the convex hull
39- print('p','q','r','Pass?')
40- for i in range(3, n):
41- print(hull_index[-3],hull_index[-2], hull_index[-1],1)
42- while len(hull) > 1 and find_orient_indicator(hull[-2], hull[-1], sorted_points_list[i]) != 2:
43- hull.pop()
44- pop_ele=hull_index.pop()
45- print(hull_index[-1],pop_ele,i,0)
46- hull.append(sorted_points_list[i])
47- hull_index.append(i)
48- print(hull_index[-3],hull_index[-2], hull_index[-1],1)
49- print(hull_index[-2], hull_index[-1],0,1)
50-
51- return hull,sorted_points_list
52-
53- # Creating Random points in 2D space (points should be more than 2)
54- # Although fractional coordinates can also be taken, for illustration,
55- # we take integer coordinates.
56- #points_list = np.random.randint(1,10,(10,2)).tolist()
57- points_list = np.array([(7,9),(5,8),(2,7),(4,6),(2,6),(6,6),(4,5),(8,5),(1,4),(6,4)]).tolist()
58-
59- # Calculate the convex hull
60- convex_hull_pts_list,sorted_points_list = conv_hull_graham_scan(points_list)
61- sorted_points_list=np.array(sorted_points_list)
62-
63- #--------------------------------------------------------------------------
64- # Plotting Logic
65- #--------------------------------------------------------------------------
66- # Plot the convex hull and points_list
67- fig,ax=plt.subplots(1,2)
68- fig.show()
69- points_array=np.array(points_list)
70- convex_hull_pts_array=np.array(convex_hull_pts_list)
71- ax[1].plot(points_array[:,0],points_array[:,1],'k.',markersize=10,label="2D Points")
72-
73- # Append the initial point at the last of the array as well for plotting
74- # so that the convex hull is closed curve
75- convex_hull_pts_array2=np.vstack((convex_hull_pts_array,convex_hull_pts_array[0]))
76- ax[1].plot(convex_hull_pts_array[0,0],convex_hull_pts_array[0,1],'ko',markersize=10,label="Pivot point")
77- r,c=np.shape(points_array)
78-
79- for i in np.arange(1,r,1):
80- ax[1].plot([sorted_points_list[0,0],sorted_points_list[i,0]],\
81- [sorted_points_list[0,1],sorted_points_list[i,1]],'--',markersize=10,color=[.7,.7,.7])
82- ax[1].text(sorted_points_list[i,0],sorted_points_list[i,1],'('+str(i)+')')
83- ax[1].plot(convex_hull_pts_array2[:,0],convex_hull_pts_array2[:,1],'-',color='black',label="Convex Hull")
84- ax[0].plot(points_array[:,0],points_array[:,1],'k.',markersize=10,label="2D Points")
85- ax[0].grid()
86- ax[0].set_title("(a) Set of points",fontsize=15)
87- ax[0].set_xlabel("The X axis ->")
88- ax[0].set_ylabel("The Y axis ->")
89- ax[0].axis("equal")
90- ax[0].legend()
91- ax[1].grid()
92- ax[1].set_title("(b) Convex Hull of set of points",fontsize=15)
93- ax[1].set_xlabel("The X axis ->")
94- ax[1].set_ylabel("The Y axis ->")
95- ax[1].axis("equal")
96- ax[1].legend()
97-
98- plt.show()
99- print("Completed Successfully ...")
Code 9.10: Grahm scan algorithm for finding convex hull
From line numbers 15 to 51, the function for the Graham scan procedure is
written. We will come to it a little later. Let us first discuss from line number
56 onwards. Line number 56 (commented out here) would create a list of random
points, while line number 57 uses a fixed set of points so that the exact graphs of
Figure 9.18 are reproduced; to experiment with random points, uncomment line 56
and comment line 57. In line number 60, the convex hull is calculated
through the function conv_hull_graham_scan, and the returned variables
convex_hull_pts_list and sorted_points_list contain the convex hull and the sorted
points list as discussed, respectively. The rest is plotting logic.
The function conv_hull_graham_scan has a sub function
find_orient_indicator, which finds whether we have to move left or not, as
per the discussion at the beginning of this sub section. In line number 28,
which is btm_left_pt = min(points_list, key=lambda point: (point[1],
point[0])), we find out the point with the lowest Y coordinate, and if there is
a tie between two or many points, we select the point with the lowest X
coordinate. Let us understand the min function in some detail through the
following code example:
1- a=[1.5,.5,-3,2]
2- b=[x**2 for x in a]
3- print(min(a))
4- print(b)
5- print(min(a,key=lambda x:x**2))
Code 9.11: Understanding min function in python
The output of the code is as follows:
-3
[2.25, 0.25, 9, 4]
0.5
First, we should note that we are learning about the min function of Python
that does not belong to numpy or any library. In Code 9.11, line number 1
defines a list of some numbers in variable a. In line number 2, we create a
list b such that every element of b is a squared element of a. In line number
3, the min function is used, and the value of the minimum element of list a,
which is -3, is printed. Now begins the interesting part. In line number 4, list
b is printed, and then in line number 5, we again apply the min function to
list a to find the minimum element, but this time according to some key. This
key is a lambda function (it can be a function in general) that returns x
containing squares of all elements in list a. Minimum element is found from
x and not a. Whatever is the index of that minimum value in x, at that same
index in a, the value is returned as an answer of this min function. From the
output, the minimum amongst squared elements is 0.25. Its index is 1, and
a(1) =0.5.
Coming back to line number 28 of Code 9.10, btm_left_pt is the point with
a minimum Y coordinate, and if there is a tie, it selects the minimum X
coordinate. This is ensured by key=lambda point: (point[1], point[0]).
Similarly, in line number 31, sorted_points_list is created as per the angles
made with respect to the X axis, as discussed earlier, by using the sorted
function and key as the angle. In line number 35, we initialize hull, i.e., a
convex hull, as a stack prefilled with the first three sorted points. In line
number 36, we initialize the corresponding indices, too. This is for the
purposes of printing the table like Table 9.1. From line number 39 to 51, the
convex hull is created in the hull variable, and the table is printed on python
shell output as follows (for the example of Figure 9.18):
p q r Pass?
0 1 2 1
1 2 3 1
2 3 4 0
1 2 4 1
2 4 5 1
4 5 6 0
2 4 6 0
1 2 6 1
2 6 7 1
6 7 8 1
7 8 9 0
6 7 9 0
2 6 9 1
6 9 0 1
Completed Successfully ...
In the above output, 1 represents pass, and 0 represents fail in the last
column. A 0 in columns of p, q, and r represents the pivot (starting and
ending) point of the convex hull.
9.13 Thinning
Thinning operation is an application of the hit and miss transform. It is a
specific type of skeletonization – which is a general category (to be
discussed in coming sections). By now, we know that erosion and opening
operations reduce the object in size. If erosion is applied enough times, the
foreground completely vanishes. However, the opening can only be applied
once as applying it a second time onwards does not change the result. There
is something in between these two extremes — thinning. Even if an infinite
number of iterations are allowed, it will not erode the object completely. It
preserves the extent and connectivity of foreground elements. Let us
understand it through some illustrations.
or alternatively
Equation 9.20:
with the symbols having their usual meanings as defined earlier. The structuring
element may have any desired shape, but usually, at each successive
iteration, we do not use the same structuring element. Rather, we use a set of
structuring elements and repeat them once the set has been exhausted. This
is done to uniformly thin the object edges from all sides. An example of a set
of structuring elements is shown in Figure 9.23. The order of structuring
elements applied at each iteration also matters.
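For a quick experiment (independent of the structuring-element sequence of Figure 9.23), the skimage library provides a ready-made thinning routine; a minimal sketch with a made-up test image:

import numpy as np
from skimage.morphology import thin

# A thick horizontal bar; thinning reduces it to a one-pixel-wide connected line
I = np.zeros((11, 21), dtype=bool)
I[3:8, 2:19] = True

thinned = thin(I)
print(int(I.sum()), int(thinned.sum()))   # far fewer pixels remain, but the bar stays connected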
9.14 Thickening
Thickening is the morphological dual of thinning and is defined by the
following equation:
Equation 9.21:
The structuring elements remain the same in form as for thinning, but the
foreground pixels are replaced by background pixels and vice versa. Do not care
pixels remain do not care — they are unaffected.
Thickening is usually applied as thinning of the background of the input
image: first complement the input image, thin it, and then complement the
result obtained after this background (now the foreground of the complemented
image) thinning. After the final image is obtained by this procedure, there may
be some disconnected points or small sets of pixels that need post processing
for their removal.
9.15 Skeletons
Skeletonizing is a general operation of finding the skeleton of a given object
in a binary image. This can be done in many ways. One such way was
thinning, which was covered in Section 9.13. There can be many more ways.
We discuss one such method in this section, followed by a different
implementation in Python to see different perspectives.
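Before the method itself, here is a hedged one-line view of the end result using skimage's built-in routine (which may use a different algorithm than the one described next; the test image is made up):

import numpy as np
from skimage.morphology import skeletonize

# A filled rectangle; its skeleton is (roughly) its medial axis
I = np.zeros((30, 60), dtype=bool)
I[5:25, 5:55] = True

S = skeletonize(I)
print(int(S.sum()) < int(I.sum()), bool(S.any()))   # True True : far fewer pixels, yet a non-empty skeleton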
The above equation generates parts (g), (h), and (i) of Figure 9.24 from pairs
in parts (a) and (d), parts (b) and (e), and parts (c) and (f). This is shown in
the following equation:
Equation 9.24:
The above equation generates part (p) of the figure from parts (m), (n), and
(o), which were generated by dilating parts (j), (k), and (l), respectively.
Conclusion
In this chapter, the processing algorithms of binary images were introduced.
They include erosion, dilation, opening, closing, hit and miss transform, etc.
Based on these basic operations, one can form higher operations like
boundary extraction, hole filling, connected component analysis, thinning,
thickening, and skeletonizing, etc.
Points to remember
• Erosion tends to remove the outer layer of the binary object if a box filter
is used.
• Dilation does the opposite of erosion.
• Opening and closing are like erosion and dilation with the difference of
how they treat protrusions.
• Finding the convex hull of a two-dimensional object can be compared to
wrapping a rubber band around it.
• Thinning is one kind of skeletonizing.
• There may be multiple definitions of skeletonizing.
Exercises
1. Take a grayscale image and import it using Python. Binarize it by using
some threshold value. Find the number of connected components in the
binary image.
2. Write a Python code that automatically fills all holes of a given size or
less in a binary image.
3. Read the algorithm behind skeletonizing in Python using the help
command and compare it with the one presented in this chapter. List out
its advantages and disadvantages.
4. Take an image of a fingerprint and try to remove its noise by using
binary morphology.
Index
Symbols
2D DFT, configuring 209, 210
2D DFT Domain, frequency 210-217
2D Discrete Fourier Transform (2D DFT) 209
2D With Python, filtering 155-159
A
Adaptive Filtering 300, 301
B
Boundary Extraction 360
Boundary Extraction Code, implementing 362
Boundary Extraction Mathematics, operations 360
C
code files 22
Color Model 135
Color Model, types
Cyan-Magenta-Yellow (CMY) 136
Hue-Saturation-Intensity 137, 138
RGB 135, 136
Connected Component 368
Connected Component, pass 368
Continuous Time, ways
aperiodic 194, 195
fourier, transforms 195-197
periodic 193, 194
Convex Hull 375, 376
Convex Hull, architecture 376-378
Convex Hull Code, implementing 379-384
Convex Hull Image, optimizing 384
Convolution/Correlation, comparing 183-186
D
DataFrame 78
Degradation Model 278, 279
derivative-based image, section
Prewitt Kernel 174-176
Roberts Kernel 178
Sobel Kernel 177, 178
Digital Image 2
Digital Image, processing 3
Dilation 342
Dilation, illustration 343
Dilation Mathematics, operations 343, 344
Dilation Python, code 344, 347
Discrete Time Fourier Series (DTFS) 203
Discrete Time Signals 197, 198
Discrete Time Signals, optimizing 198-202
DTFS, ways
aperiodic, signals 204
fourier, transform 205-209
periodic, signals 203, 204
E
Erosion/Dilation, interpreting 347
F
Fast Fourier transform (FFT) 205
filter design, elements
derivative filters 150-152
second order derivative 152, 153
weight, average 149, 150
Filtering, domains
1D Frequency 219-223
2D Ideal 227-230
band pass 249, 250
band stop 248, 249
Butterworth 237-247
frequency 223, 224
Gaussian Lowpass 231-237
high pass 247, 248
Fourier Transform, utilizing 308-314
Frequency Domain, visualizing 306-308
G
Grayscale Image 4
H
help 26
Histogram 91
Histogram, architecture 104
Histogram Equalization, concepts
digital data, limitations 103
dimensional data, visualizing 99-101
mathematical, pre-requisite 99
Histogram, implementing 104-106
Histogram Matching 107-109
Histogram Mathematical Data, optimizing 104
Histogram, process
Equalization 97-99
Grayscale Image, preventing 92, 93
information, obtaining 94-96
Hit/Miss Code, implementing 357-359
Hit/Miss Mathematic, operations 356, 357
Hit/Miss Transform 354-356
Hole Filling 363
Hole Filling, algorithm 364, 365
Hole Filling, architecture 364
Hole Filling Code, implementing 365-367
Homomorphic Filtering 262
Homomorphic Filtering, components
contrast, improving 267, 268
Illumination 262, 263
smoothening 263-267
I
IDLE’s editor 21
Image Conventions, indexing 8-10
Image, filtering 272-275
Image, formats 10
Intensity Transformation 109
Intensity Transformation, key points
Logarithmic Transformation 111-114
negatives 110
Power Law, correction 114-117
L
linear index 9
list 29
M
Mac/Linux, steps 19, 20
math library 25
Matplotlib 66
Matplotlib, points
simple graphs, plotting 66-69
subplots, using 69-71
Median Filtering 254-257
N
Neighborhood 90, 91
noise 279-283
noise, configuring 295-300
noise image, detecting 292, 293
noise removal, ways
1D Signal 324-328
Wavelets 329-332
noise, types
Erlang 286, 287
Exponential 287
Gaussian 283, 284
Rayleigh 285, 286
Salt/Pepper 288-291
Uniform 288
np.array() 53
NumPy Library 40
NumPy Library, points
Array Operations 57-59
One Dimensional Array, optimizing 40-47
Sub-Arrays, preventing 61-65
Three Dimensional Array, utilizing 54-56
Two Dimensional Array, optimizing 47-53
NumPy Library, steps 16-18
O
One Dimension 142, 143
One Dimension, aspects
2D Version, filtering 153-155
filter design, intuition 148
Graphical Illustrations 146, 147
One Dimension, elements
analog signals 190-192
Periodic/Aperiodic, signals 193
One Dimension System 143
One Dimension System, types
Linear Systems 144
Linear Time Invariants (LTI) 145
Open/Close 348
Open/Close, applications 350, 351
Open/Close Code, optimizing 351
Open/Close, illustration 348
Open/Close Mathematical, operations 349
OpenCV 71
OpenCV, concepts
image, importing 72, 73
Matplotlib Image, displaying 73-75
Python Package, importing 75-77
P
Pandas 78
Pandas Library, optimizing 78-83
Peak Signal-to-Noise Ratio (PSNR) 293, 294
Phase Spectrum 268
Phase Spectrum, configuring 268-270
Phase Spectrum Images, swapping 270, 271
Phase Spectrum Phase, filtering 271, 272
pixel address 9
pixel index 9
Pixel/Patches, configuring 86-90
pixels 2
Python, elements
conditional, statements 31-33
Hello World, optimizing 20, 21
IDLE Editor 21, 22
Input/Output, optimizing 22-26
Lambda Function, utilizing 37-39
List Data, structure 26-30
Loops 34-37
Tuple Data, structure 30, 31
variables, printing 21
Python Library, steps 18
Python shell 20
Python Software 14
Python Software, steps 14-16
R
Region Filling 367
Resolution/Frequency, utilizing 304, 305
RGB Image 5, 6
RGB Image, interpretation 7, 8
S
Sampling Theorem 257
Sampling Theorem, ways
1D, generalization 261
Aliasing 257-261
scale/scalogram, concepts 314-316
Sharpening Filters 171
Sharpening Filters, aspects
2D Derivative 182, 183
derivative-based image 174
unsharp, masking 171, 172
Skeletonizing 389
Skeletonizing Code, implementing 392, 393
Skeletonizing, illustration 389-391
Skeletonizing Mathematical, operations 391
Skimage Library 373-375
Smoothening Filters 159, 160
Smoothening Filters, types
Averaging 160, 161
Circular 161-165
Gaussian 166-169
Weight 165, 166
Soil Erosion 336
Soil Erosion, application 339
Soil Erosion, illustration 336-338
Soil Erosion Mathematic, operations 338
Soil Erosion Python, code 339, 340
Spatial Transformation 117
Spatial Transformation, aspects
Affine 117-124
Projective 126-134
subscripted index 9
Subscripted/Linear Indices, comparing 360-362
T
Thickening 389
Thinning 386
Thinning Code, implementing 388, 389
Thinning, illustration 386, 387
Thinning Mathematics, operations 387, 388
Time/Frequency, resolution 305, 306
U
uint8 86
V
variable 21
W
Wavelet Transform 316-320
Wavelet Transform, analyzing 321-324