Digital Image
Processing Using
Python
A comprehensive guide to the
fundamentals
of digital image processing

Dr. Manish Kashyap

www.bpbonline.com
First Edition 2025

Copyright © BPB Publications, India

ISBN: 978-93-65898-910

All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system but cannot be reproduced by means of publication, photocopy, recording, or by any electronic or mechanical means.

LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY


The information contained in this book is true and correct to the best of the author’s and publisher’s knowledge. The author has made every effort to ensure the accuracy of these publications, but the publisher cannot be held responsible for any loss or damage arising from any information in this book.

All trademarks referred to in the book are acknowledged as properties of their respective owners but
BPB Publications cannot guarantee the accuracy of this information.

www.bpbonline.com

Dedicated to

My family
and
The only truth I know – Krishna

About the Author

Dr. Manish Kashyap is an Assistant Professor at Maulana Azad National


Institute of Technology Bhopal (NIT Bhopal). He received his Ph.D. in
image processing in 2017 from the Indian Institute of Information
Technology and Management Gwalior (IIIT Gwalior). Dr. Kashyap
completed his M.Tech and B.Tech in Electronics and Communication
Engineering from Jaypee Institute of Information Technology, NOIDA in
2013 and 2009, respectively. He has authored a good number of research papers in Science Citation Index (SCI)- and SCOPUS-indexed journals and
conferences. His primary objective is to make technical subjects more
accessible and understandable for students and readers through his lectures,
research papers, articles, and books.
About the Reviewers

Dr. Ravi Shanker completed his PhD from the Atal Bihari Vajpayee Indian
Institute of Information Technology and Management (ABV-IIITM),
Gwalior. He has gained research experience through a DST-SERB-
sponsored project at institutes of national importance. As a researcher, he
has developed various computer-assisted diagnostic (CAD) systems for
classifying brain MRI images.
Dr. Shanker has published articles in SCIE journals, authored book
chapters, and presented papers at various international and Scopus-indexed
conferences. He also serves as a reviewer for multiple SCIE and Scopus-
indexed journals, including The Journal of Supercomputing, Concurrency
and Computation: Practice & Experience, IEEE Access, and the
International Journal of Imaging Systems and Technology.
Currently, he works as an Assistant Professor at the Indian Institute of
Information Technology, Ranchi.
Shekhar is a Senior Data Scientist based in Hamburg, Germany, with 15
years of expertise in AI and machine learning. He earned his Master's in
Data Science with distinction, conducting pioneering research in computer
vision. His work has been featured in prominent deep-learning publications,
establishing him as an industry thought leader.
His technical expertise spans AWS, Google Cloud, Azure, and IBM Cloud,
where he excels at implementing enterprise-scale AI solutions. In his
current role, he leads innovation in generative AI, focusing on large
language model fine-tuning, RAG systems, and AI agent orchestration. He
also specializes in integrating these technologies into enterprise systems to
develop production-ready applications that deliver tangible business value.
As a technical leader, he builds and mentors high-performing data science
teams while implementing MLOps best practices. His leadership style
combines technical depth with business acumen, enabling organizations to
successfully navigate AI transformation.
Outside of work, Shekhar is an avid CrossFit enthusiast, cyclist, and
marathon runner. His comprehensive understanding of AI theory and
practice, particularly in generative AI, makes him a sought-after technical
reviewer and industry expert.
Acknowledgement

I want to express my deepest gratitude to my family and friends for their


unwavering support and encouragement throughout this book's writing,
especially my mother.
I am also grateful to BPB Publications for their guidance and expertise in
bringing this book to fruition. It was a long journey of revising this book,
with valuable participation and collaboration of reviewers, technical
experts, and editors.
I would like to acknowledge the invaluable contributions of my colleagues
and co-workers throughout my years in academia, who have taught me so
much and provided valuable feedback on my work.
Finally, I would like to thank all the readers for their interest in my book
and their support in making it a reality. Your encouragement has been truly
invaluable.
Preface

In the rapidly evolving field of digital image processing, Python has


emerged as a powerful and accessible tool for tackling complex problems
and bringing innovative solutions to life. This book, Digital Image
Processing Using Python, is designed to provide a comprehensive and
practical introduction to this dynamic field. Whether you are an
undergraduate student, a master’s level scholar, or a research professional,
this book offers valuable insights and hands-on experience in image
processing.
The primary goal of this book is to bridge the gap between theoretical
concepts and practical applications. Each chapter is meticulously crafted to
explain fundamental image processing techniques while integrating real-
world Python code examples. These examples are intended to reinforce
your understanding and provide you with the skills to implement these
techniques in various scenarios. By working through these examples, you
will understand how to manipulate and analyze digital images effectively.
As you progress through this book, you will not only learn about the core
principles of image processing but also how to apply them using Python’s
powerful libraries. The practical approach adopted here is aimed at making
complex concepts more accessible and relatable. We hope this book will
serve as a valuable resource for those looking to advance their knowledge
and skills in image processing, and inspire you to explore further into this
exciting field. The book covers the following chapters:
Chapter 1: Introduction to Digital Images - This chapter covers the foundational concepts of digital images, detailing their structure, representation, and the role of pixels in image data. It explains the essential principles that define digital images, providing the thorough understanding necessary for advanced image processing techniques. This introduction is crucial for grasping the complexities of image manipulation and sets the groundwork for subsequent chapters on practical applications and Python implementations.
Chapter 2: Python Fundamentals and Related Libraries - This chapter
covers the core Python programming concepts and libraries essential for
digital image processing. It also covers fundamental Python syntax, data
structures, and introduces key libraries such as NumPy, Matplotlib, and
OpenCV, which are pivotal for manipulating and analyzing images. By
establishing a solid understanding of these tools, this chapter provides the
technical foundation required for effectively implementing image
processing techniques in later sections of the book.
Chapter 3: Playing with Digital Images - This chapter covers practical
techniques for interacting with digital images, focusing on manipulating
pixels and patches, analyzing pixel neighborhoods, and performing
histogram processing. It explores basic transformations applied to images
and introduces various color models essential for understanding image
representation. This chapter aims to equip readers with hands-on experience
in fundamental image processing tasks, setting the stage for more advanced
manipulations and analyses.
Chapter 4: Spatial Domain Processing - This chapter covers the
principles of spatial domain image processing, beginning with one-
dimensional signals and systems. It includes graphical illustrations of one-
dimensional convolution and provides an intuitive understanding of filter
design. The chapter then extends to two-dimensional filtering, detailing
methods for applying filters using Python, and covers both smoothing low-
pass filters and sharpening filters. Additionally, it differentiates between
convolution and correlation, offering a comprehensive overview of spatial
domain techniques essential for effective image manipulation.
Chapter 5: Frequency Domain Image Processing - This chapter covers
the principles of frequency domain image processing, beginning with the
analysis of one-dimensional analog and discrete time signals in both time
and frequency domains. It extends to two-dimensional Fourier transforms,
detailing their application to image processing. The chapter also covers
filtering techniques in the frequency domain, providing a thorough
understanding of how frequency-based methods can enhance and
manipulate images effectively.
Chapter 6: Non-linear Image Processing and the Issue of Phase - This
chapter covers non-linear image processing techniques and the concept of
phase in image analysis. It covers median filtering in the spatial domain for
removing salt-and-pepper noise and addresses the conversion of continuous
data to discrete formats, including the sampling theorem. Additionally, the
chapter explores homomorphic filtering, the role of phase in image
processing, and selective filtering methods, providing a comprehensive
understanding of advanced techniques for image enhancement and noise
reduction.
Chapter 7: Noise and Image Restoration - This chapter covers noise and
degradation in images, starting with an overview of noise and degradation
models. It covers techniques for image restoration in the presence of noise,
including methods for detecting and measuring noise using PSNR. The
chapter also explores classical noise removal methods and adaptive filtering
techniques, offering a thorough understanding of strategies to improve
image quality and restore clarity amidst noise.
Chapter 8: Wavelet Transform and Multi-resolution Analysis - This
chapter covers wavelet transforms and multi-resolution analysis, beginning
with the impact of resolution on frequency content and the trade-offs
between time and frequency resolution. It addresses the loss of location
information in the frequency domain and introduces the short-time Fourier
transform. The chapter covers key concepts such as scale, scalogram, and
both continuous and discrete wavelet transforms. Additionally, it explores
multi-resolution analysis techniques using wavelets and their applications
for noise removal, providing a comprehensive understanding of advanced
methods for analyzing and processing image data.
Chapter 9: Binary Morphology - This chapter covers the essential
concepts and techniques of binary morphology, focusing on operations such
as erosion and dilation, and their duality. It explores advanced
morphological processes including opening and closing, hit-and-miss
transform, and boundary extraction. The chapter also addresses hole and
region filling, connected component analysis, and its implementation using
the image library. Additional topics include convex hull, thinning,
thickening, and skeletonization, providing a comprehensive guide to
manipulating and analyzing binary images through morphological
techniques.
Code Bundle and Coloured Images
Please follow the link to download the
Code Bundle and the Coloured Images of the book:

https://rebrand.ly/90b15f

The code bundle for the book is also hosted on GitHub at


https://github.com/bpbpublications/Digital-Image-Processing-Using-Python. In case there’s an update to the code, it will be updated on the existing GitHub repository.
We have code bundles from our rich catalogue of books and videos available at https://github.com/bpbpublications. Check them out!

Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect on and improve upon human errors, if any, that may have occurred during the publishing processes involved. To let us maintain the quality and help us reach out to any readers who might be having difficulties due to any unforeseen errors, please write to us at:
[email protected]
Your support, suggestions, and feedback are highly appreciated by the BPB Publications’ Family.

Did you know that BPB offers eBook versions of every book published, with PDF and ePub files
available? You can upgrade to the eBook version at www.bpbonline.com and as a print book
customer, you are entitled to a discount on the eBook copy. Get in touch with us at :
[email protected] for more details.
At www.bpbonline.com, you can also read a collection of free technical articles, sign up for a
range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.

Piracy
If you come across any illegal copies of our works in any form on the internet, we would be
grateful if you would provide us with the location address or website name. Please contact us at
[email protected] with a link to the material.

If you are interested in becoming an author


If there is a topic that you have expertise in, and you are interested in either writing or
contributing to a book, please visit www.bpbonline.com. We have worked with thousands of
developers and tech professionals, just like you, to help them share their insights with the global
tech community. You can make a general application, apply for a specific hot topic that we are
recruiting an author for, or submit your own idea.

Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site
that you purchased it from? Potential readers can then see and use your unbiased opinion to make
purchase decisions. We at BPB can understand what you think about our products, and our
authors can see your feedback on their book. Thank you!
For more information about BPB, please visit www.bpbonline.com.

Join our book’s Discord space


Join the book’s Discord Workspace for the latest updates, offers, tech happenings around the world, new releases, and sessions with the authors:
https://discord.bpbonline.com

Table of Contents

1. Introduction to Digital Images


1.1 Introduction
Structure
Objectives
1.1.1 Digital images
1.1.2 Digital image processing
1.2 Grayscale and RGB images
1.2.1 Grayscale image as a two dimensional array of data
1.2.2 RGB colored image
1.2.3 Interpretation of color frames in an RGB image
1.3 Basic image conventions and indexing
1.4 Image formats
Conclusion
Points to remember
Exercises

2. Python Fundamentals and Related Libraries


2.1 Introduction
Structure
Objectives
2.2 Installing Python software
2.3 General procedure for installing libraries in Python
2.4 Basic language elements of Python
2.4.1 Hello world program
2.4.2 Defining and printing variables
2.4.3 Using IDLE editor in Python
2.4.4 Defining variables, basic math, typecasting, and input/output
2.4.5 List data structure in Python
2.4.6 Tuple data structure in Python
2.4.7 Conditional statements in Python
2.4.8 Loops in Python
2.4.9 Functions and lambdas in Python
2.5 NumPy library
2.5.1 Defining and dealing with one dimensional array
2.5.2 Defining and dealing with two dimensional array
2.5.3 Defining and dealing with three dimensional arrays
2.5.4 Operations on arrays
2.5.5 Important points to remember about arrays
2.6 Matplotlib library
2.6.1 Plotting simple graphs
2.6.2 Making use of subplots
2.7 OpenCV library
2.7.1 Importing and displaying the images in default OpenCV way
2.7.2 Displaying the images using matplotlib’s default way
2.7.3 Creating packages in Python
2.8 Pandas library
Conclusion
Points to remember
Exercises

3. Playing with Digital Images


3.1 Introduction
Structure
Objectives
3.2 Playing with pixel and patches
3.3 Neighborhood of a pixel
3.4 Histogram processing
3.4.1 Histogram of a grayscale image
3.4.2 Information obtained from histogram
3.4.3 Histogram equalization
3.4.3.1 Mathematical pre-requisite for understanding histogram
equalization
3.4.3.2 Histogram equalization on one dimensional data
3.4.3.3 Limitations of histogram equalization in digital data
3.4.4 Histogram matching
3.4.4.1 Defining histogram
3.4.4.2 Mathematical background
3.4.4.3 Implementation details
3.4.4.4 Understanding histogram matching
3.5 Basic transformation on images
3.5.1 Intensity transformations
3.5.1.1 Image negatives
3.5.1.2 Logarithmic transformation
3.5.1.3 Power Law Transformation
3.5.2 Spatial transformations
3.5.2.1 Affine transformation
3.5.2.2 Projective transformation
3.6 Color models
3.6.1 RGB color model
3.6.2 Cyan-Magenta-Yellow color model
3.6.3 Hue-Saturation-Intensity color model
Conclusion
Points to remember
Exercises
4. Spatial Domain Processing
4.1 Introduction
Structure
Objectives
4.2 Signals in one dimension
4.3 Systems in one dimension
4.3.1 Linear systems
4.3.3 Linear time invariant systems in one dimension
4.4 Graphical illustration of one-dimensional convolution
4.5 One dimensional filter design intuition
4.5.1 Averaging filters in one dimension
4.5.2 First order derivative filters
4.5.3 Second order derivative filters
4.6 Concept of two-dimensional filtering of images
4.7 Two-dimensional filtering of images in Python
4.8 Smoothening filters
4.8.1 Averaging filters
4.8.2 Circular filters
4.8.3 Weighted filters
4.8.4 Gaussian filter
4.9 Sharpening filters
4.9.1 Unsharp masking and high boost filtering
4.9.2 First order derivative-based image sharpening
4.9.2.1 Prewitt Kernel
4.9.2.2 Sobel kernel
4.9.2.3 Roberts kernel
4.9.2.4 Sharpening by Prewitt, Sobel, and Roberts kernels
4.9.3 Second order derivative-based image enhancement
4.10 Convolution vs. correlation
Conclusion
Points to remember
Exercises

5. Frequency Domain Image Processing


5.1 Introduction
Structure
Objectives
5.2 One dimensional analog signal in time and frequency domain
5.2.1 Frequency domain for analog signals
5.2.2 Fourier family of transforms
5.2.2.1 Continuous time Fourier series for periodic signals
5.2.2.2 Continuous time Fourier transform for aperiodic signals
5.2.2.3 Fourier transforms for periodic signal
5.3 One dimensional discrete time signal in time and frequency
domain
5.3.1 Discrete time complex exponential with imaginary exponent
5.3.2 Fourier family of transforms for discrete time case
5.3.2.1 Discrete time Fourier series for periodic signals
5.3.2.2 Discrete time Fourier transform for aperiodic signals
5.3.2.3 Discrete Fourier transform
5.4 Two-dimensional Fourier transform
5.4.1 2D discrete Fourier transform and inverse discrete Fourier
transform
5.4.2 Image in frequency domain
5.5 Filtering of images in frequency domain
5.5.1 1D frequency domain filter design
5.5.2 Conventions used in frequency domain filtering
5.5.3 Two-dimensional ideal filtering
5.5.4 Gaussian lowpass filtering
5.5.5 Butterworth low pass filtering
5.5.6 High pass filtering of images in frequency domain
5.5.7 Band stop filtering and notch filters
5.5.8 Band pass filtering of images
Conclusion
Points to remember
Exercises

6. Non-linear Image Processing and the Issue of Phase


6.1 Introduction
Structure
Objectives
6.2 Median filtering and salt and pepper noise removal
6.3 Sampling theorem
6.3.1 Aliasing
6.3.2 Sampling theorem in 1D
6.4 Homomorphic filtering
6.4.1 Illumination reflectance model of image formation
6.4.2 Improving illumination in images
6.4.3 Improving the contrast in images
6.5 Phase and images
6.5.1 Phase spectrum of images
6.5.2 Swapping of the phase of two images
6.5.3 Non-linear phase filters and their effect
6.6 Selective filtering of images
Conclusion
Points to remember
Exercises

7. Noise and Image Restoration


7.1 Introduction
Structure
Objectives
7.2 Noise and degradation model
7.3 Restoration in presence of noise only
7.3.1 Defining noise
7.3.2 Gaussian noise
7.3.3 Rayleigh noise
7.3.4 Erlang noise
7.3.5 Exponential noise
7.3.6 Uniform noise
7.3.7 Salt and pepper noise
7.4 Detection of noise in images
7.5 Measurement of noise in images using PSNR
7.6 Classical methods of noise removal
7.7 Adaptive filtering
Conclusion
Points to remember
Exercises

8. Wavelet Transform and Multi-resolution Analysis


8.1 Introduction
Structure
Objectives
8.2 Resolution and its impact on frequency content
8.3 Time vs. frequency resolution
8.4 Loss of location information in frequency domain
8.5 Short time Fourier transform
8.6 Concept of scale and scalogram
8.7 Continuous wavelet transform
8.8 Discrete wavelet transform
8.9 Multi resolution analysis using wavelets
8.10 Noise removal using multi resolution analysis
8.10.1 Noise removal using MRA and wavelets for one-dimensional
signal
8.10.2 Noise removal using MRA and wavelets for images
Conclusion
Points to remember
Exercises

9. Binary Morphology
9.1 Introduction
Structure
Objectives
9.2 Erosion
9.2.1 Illustration of erosion
9.2.2 Mathematics behind erosion
9.2.3 Application of erosion
9.2.4 Python code for erosion
9.3 Dilation
9.3.1 Illustration of dilation
9.3.2 Mathematics behind dilation
9.3.3 Python code for dilation
9.4 Erosion dilation duality
9.5 Opening and closing
9.5.1 Illustration of opening and closing
9.5.2 Mathematical formalization of opening and closing
9.5.3 Application of opening and closing
9.5.4 Python code for opening and closing
9.6 Hit and miss transform
9.6.1 Mathematics behind hit and miss transform
9.6.2 Python code for hit and miss transform
9.7 Boundary extraction
9.7.1 Mathematics behind boundary extraction
9.7.2 Subscripted vs. linear indices
9.7.3 Python code for boundary extraction
9.8 Hole filling
9.8.1 Defining a hole
9.8.2 Hole filling algorithm
9.8.3 Python code for hole filling
9.9 Region filling
9.10 Connected component analysis
9.10.1 Two pass method
9.10.1.1 Pass 1 of connected component analysis
9.10.1.2 Pass 2 of connected component analysis
9.11 Connected component analysis using skimage library
9.12 Convex hull
9.12.1 Graham scan procedure for finding convex hull
9.12.2 Code for understanding convex hull computation
9.12.3 Convex hull of objects in binary images
9.12.4 Python’s inbuilt method for finding convex hull
9.13 Thinning
9.13.1 Illustration of thinning
9.13.2 Mathematics behind thinning
9.13.3 Python code for thinning
9.14 Thickening
9.15 Skeletons
9.15.1 Illustration of skeletonizing
9.15.2 Mathematical formalization of skeletonizing
9.15.3 Python code for skeletonizing
Conclusion
Points to remember
Exercises

Index
CHAPTER 1
Introduction to Digital Images

1.1 Introduction
Let us begin our journey of digital image processing by understanding some
fundamentals about images. In this chapter, we intend to understand what
image data is. We will learn its representation inside computers and the
meaning of processing an image. A distinction between the types of images
will be made. We will explore the relationship between pixels of an image
i.e., neighborhood and connectivity. Various sections of this chapter will
enable the reader to develop a basic understanding of grayscale and red,
green, and blue (RGB) images and hence will equip the reader to play with
actual image data in the next chapter by using Python programming
language and related libraries. The concepts developed in this chapter will be
implemented in the next chapter to provide hands-on exercise and hence a
richer learning experience.
But before beginning, one must understand the utility of learning digital image processing. Of all the senses available to human beings, vision is often associated with trust and carries a great deal of information, hence the famous saying: a picture is worth a thousand words. Computers try to imitate the sense of vision by using cameras and associated algorithms for processing the captured images. That is where the subject of digital image processing becomes important. It would be helpful, for example, if a computer could read the number plates of thousands of vehicles on a toll road instead of a human reading them manually. Obviously, the intelligence behind interpreting the image, identifying the number plate, and extracting the numbers from it is to be developed by us. That is the subject matter of this book. There are many more such examples that demand knowledge of digital image processing.

Structure
The chapter discusses the following topics:
• Digital images
• Grayscale and RGB images
• Basic image conventions and indexing
• Image formats

Objectives
After reading this chapter, the reader will be able to understand the popular types of images like RGB and grayscale. Another objective of this chapter is to introduce the basic element of an image, i.e., the pixel. Every image is composed of pixels, and every pixel has its identifiers. The reader will learn the conventions associated with the numbering of pixels as they are used by popular computational packages like Python. At the end of the chapter, image formats and their usage are discussed, which will help the reader distinguish between popular image storage formats like bitmap and Joint Photographic Experts Group (JPEG).

1.1.1 Digital images


In today’s world, capturing digital images from our smartphones is a trivial task. Although it is not the only way of capturing digital images, it is a good starting point for understanding the concept. All that is needed is a scene (three dimensional in the above case), a sensor, usually a Complementary Metal Oxide Semiconductor (CMOS) sensor in current smartphones, and a storage device to store the data from the sensor. This sensor is a rectangular grid of smaller elements called pixels. The sensor converts the electromagnetic energy of light hitting every pixel to a voltage or charge for storage. The more the intensity, the more the charge, and vice versa. That is how a grid of intensities is captured. This way, a grayscale image is captured. Technically, the image will have all shades of gray, i.e., from black to white, including mixtures of black and white in the desired proportion. If we intend to capture a colored image, we need three separate sensors for the red, green, and blue colors; this will give us an image that could, theoretically, depict any color. We will further elaborate on how three primary colors can be used to represent any given color in Section 1.2.2.
The type of sensor used for acquiring images is a topic that is out of the scope of this book; readers may find suitable texts on digital imaging. Digital imaging deals with acquiring a digital image from various sensors, whereas digital image processing deals primarily with the processing that comes after the image is acquired.

1.1.2 Digital image processing


Digital images are often captured by sensors and are raw in the sense that they may or may not faithfully represent the actual scene as required by various end users. So, one of the aims of digital image processing is to make the captured image a faithful representation of the scene from which it was captured. As an example, consider the image captured by a smartphone’s camera while you continuously move your hand. The captured image will have some motion blur or a jittery appearance. By using a set of tools and techniques, we should be able to remove that motion blur. Motion blur is not the only reason why we may need image processing post-capture. Reasons may include, but are not limited to, noise, poor color representation by sensors, a non-linear relationship between the intensities depicted in the scene and the intensities captured, etc.
The second reason why you may be required to process images is that you may be looking for some special features in the image. An example of this could be a CT scan image used by doctors, where the doctor might be looking for some abnormality in the bone tissue to diagnose a fracture. Instead of doing this manually, we may apply some image processing techniques that will mark the fracture region, assist doctors, and save time. As we progress through this book, we will come across many more reasons why image processing is required, such as feature extraction, classification, segmentation, enhancement, registration, character recognition, facial recognition, biometric identification, image text-to-speech conversion, etc.
Digital image processing may then be defined as the application of a set of tools and techniques to raw or even processed images, either to make them visually palatable to the end user (enhancement, restoration, noise removal, etc.) or to extract the desired information from them (segmentation, classification, registration, labeling, etc.).
One important point that needs to be addressed here is the usage of artificial intelligence and machine learning in the field of digital image processing. These days, many programming languages like Python, MATLAB, etc., provide ready-made artificial intelligence and machine learning based tools to do pattern recognition tasks on images. This is helpful, but the objective of this book is to introduce the fundamentals of digital image processing without depending on the huge data that is used for training these algorithms. Therefore, in this book we introduce fundamentals only. Artificial intelligence and machine learning are helpful only when the input-output relationship between data and its intended usage is known through a lot of training examples. But that is not the topic of this book. Here, we introduce fundamentals using which processing can be done even if only a single image is available.

1.2 Grayscale and RGB images


There are many types of digital images depending on the scene, sensor, and
end-user (viewer). In this section, we will study grayscale images and RGB
colored images. Throughout this book, most of the experiments are
performed on grayscale images only. The methods developed, however, can
be applied to other types of images too with some context dependent
modification.

1.2.1 Grayscale image as a two dimensional array of data


Shown in Figure 1.1 is a typical grayscale image together with an enlarged version of the image patch shown in the box (inset) to the right. A few points are worth noting here. First, if we sufficiently zoom in on the image, we will realize that images are composed of small rectangular blocks called pixels. Each pixel has its own intensity, which remains the same everywhere within that pixel. We may find pitch black or pure white intensities in various pixels, together with some pixels having a mixture of black and white in different proportions.
Mathematically speaking, when this image is brought inside a computer for processing (a process that will be explained in the upcoming chapters), it is treated as an array of numbers: an array, because it has a rectangular grid type structure, and of numbers, because this grid is a grid of intensities, and they can be represented by numbers. Conventionally, pure black intensity (or simply black) is represented by zero, pure white intensity (or simply white) is represented by 255, and other gray values by numbers between zero and 255. Let us now discuss why white has a value of 255 and not 2000 or maybe -4523.89. The key here is the representation format of numbers, which is described next. Refer to the following figure for a better understanding:

Figure 1.1: Grayscale image on the left and zoomed version of inset on the right
In a format called the 8-bit grayscale image, every pixel is represented by eight bits inside a computer. The smallest number is (00000000)2 = (0)10 and the largest number is (11111111)2 = (255)10. So, there are intensities from zero to 255 in this format. We may also have a 16-bit grayscale format, and there, the white intensity will be represented by 65535 because that is the largest number that can be represented by 16 bits. The convention of zero for black and 255 for white is also arbitrary; one may choose to represent black by 255 and white by 0 instead. But in this book, we will stick to the former as it is widely used. In general, we may have an n-bit grayscale image.
Another important point that we need to understand at this stage is that between black and white there are, theoretically, infinite shades (intensities) of gray. But in any given format, we have a finite number of shades of gray. For example, in an 8-bit grayscale format, we have intensities from zero to 255, i.e., a total of 256 intensities. Despite this quantization of infinite intensities to finite levels, visually, there is no difference between the actual scene and the image. This is because the human eye cannot distinguish between two sufficiently close shades of gray.
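As a small preview of what is coming in the next chapter, the following sketch (a minimal example assuming the NumPy and Matplotlib libraries, which are introduced in Chapter 2) builds a tiny 8-bit grayscale image as a two dimensional array of numbers and displays it. The pixel values used here are arbitrary and only serve to illustrate the idea.

import numpy as np
import matplotlib.pyplot as plt

# An 8-bit grayscale image is simply a 2D array of unsigned 8-bit integers,
# where 0 represents pure black and 255 represents pure white.
img = np.array([[0,   64, 128, 255],
                [255, 128,  64,   0],
                [0,    0, 255, 255],
                [128, 128, 128, 128]], dtype=np.uint8)

print(img.shape)              # (4, 4): four rows and four columns
print(img.min(), img.max())   # intensities stay within 0 to 255 for uint8

# Display the array as an image; cmap='gray' maps 0 to black and 255 to white.
plt.imshow(img, cmap='gray', vmin=0, vmax=255)
plt.title('A 4x4 8-bit grayscale image')
plt.show()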

1.2.2 RGB colored image


Before talking about colored images, let us introduce color itself. Although color frameworks are dealt with in Section 3.6, it is worth noting that, just as in grayscale images we could synthesize any gray intensity by mixing pure white and pure black in the right proportion, any color can be synthesized by mixing three primary colors (say red, green, and blue) in the correct proportions. At this point, it is also worth mentioning that, like RGB, there are other primary color sets, such as cyan, magenta, and yellow (CMY), capable of emulating all colors when mixed in correct proportions. The RGB color model is used for additive color mixing (as on screens), while CMYK (where K stands for black) is used for subtractive color mixing (as in printing). We will discuss the color frameworks in Section 3.6.
Now, let us understand what is meant by a pixel color inside a digital computer. To understand this, take RGB as the primary color space. If you want to synthesize pure red in the RGB framework, you need to mix 255 portions of red (because that is what a pure color means), zero portions of green, and zero portions of blue. This is represented as [R,G,B] = [255,0,0]. Now, refer to Table 1.1 for an illustration of color synthesis by mixing different proportions of the primary colors (RGB). You may try playing with RGB values to generate colors in the Paint program offered by Windows OS, the Pinta program (and many more alternatives) offered by Linux OS, and Paintbrush/MacPaint in MacOS.
From Table 1.1, one can appreciate that we are now able to synthesize any color one may think of, just by mixing three primary colors in the desired proportions. Still, infinite colors cannot be created. In the next paragraph, we discuss the reason for this and, in doing so, come to appreciate the human visual system (HVS).
S. No. | [R, G, B] values  | Color name
1      | [255, 0, 0]       | Pure red
2      | [0, 255, 0]       | Pure green
3      | [0, 0, 255]       | Pure blue
4      | [255, 255, 0]     | Pure yellow
5      | [255, 128, 0]     | Orange
6      | [255, 129, 5]     | Random shade of orange
7      | [128, 0, 0]       | Deep red
8      | [50, 20, 80]      | Some shade of violet
9      | [150, 150, 150]   | Some shade of gray

Table 1.1: Illustration of color synthesis


Let us consider how many possible combinations we can make from the triplet [R, G, B] when R, G, and B can each have values from zero to 255, independent of each other. The answer is 256×256×256 = 16777216. Well, that is a huge number, but still, it is not infinite. Earlier, in the case of grayscale images, we discussed that the human eye cannot distinguish between very close shades of gray, and that is how the quantization from infinite to finite levels is possible. Owing to the same reason, by using a limited number of colors, a visually good representation of an actual colored scene can be made.
You may look at two very close colors in Table 1.1 at serial numbers 5 and 6 for two shades of orange. Visibly, they are almost the same; however, their RGB values are different. Refer to the following figure for a better understanding:
Figure 1.2: A 3D array data structure for storing an RGB image
In a format called a 24-bit bitmap, each pixel has three components for its red (8-bit), green (8-bit), and blue (8-bit) colors. So, the data structure that may contain an RGB image is a three dimensional array of size r×c×3, where r is the height (or number of rows), c is the width (or number of columns), and the last dimension, 3, corresponds to the three color components of every pixel (R, G and B). Such an array is shown in Figure 1.2. It can be interpreted as three two-dimensional arrays stacked one over the other. Remember, every pixel has three components here, and every component (for a single pixel) occupies eight bits in memory. So, the total size of the image is r×c×3×8 bits, i.e., r×c×24 bits, and hence the name 24-bit bitmap.
Note: This is not the only format for storage; size may vary according to the number of bits used to represent individual pixels in an image. But we will use this format frequently because the bitmap format is a lossless format. (We will talk more about this when we discuss various image storage formats in Section 1.4.)
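As a small illustration (again, a minimal sketch assuming NumPy, which is introduced in Chapter 2), the code below builds a tiny 2×2 RGB image as an r×c×3 array and confirms the 24-bits-per-pixel count. The colors assigned to the pixels are taken from Table 1.1.

import numpy as np

# An RGB image is an r x c x 3 array: one 8-bit value per pixel per color.
r, c = 2, 2
img = np.zeros((r, c, 3), dtype=np.uint8)

img[0, 0] = [255,   0,   0]   # pure red (see Table 1.1)
img[0, 1] = [  0, 255,   0]   # pure green
img[1, 0] = [255, 255,   0]   # pure yellow (red + green)
img[1, 1] = [150, 150, 150]   # equal R, G, and B give a shade of gray

print(img.shape)              # (2, 2, 3)
print(img.size * 8, 'bits')   # 2 x 2 x 3 x 8 = 96 bits, i.e., 24 bits per pixel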

1.2.3 Interpretation of color frames in an RGB image


Now that we know an RGB image has three frames, does the red frame look red in color? Similarly, do the green and blue frames look green and blue? To find out the answer, let us look at an RGB image as shown in Figure 1.3. The figure has 20 blocks of different colors. The color value of each block is shown in the figure below:

Figure 1.3: RGB Image together with color values


Refer to the following figure to understand the order of the frames:

Figure 1.4: R, G, and B frames in order from left to right for Figure 1.3
Note: The first row of blocks has no green or blue component as those values are zero in RGB
representation. It only has a red color in various proportions. The same applies to rows two
and three for green and blue colors. The last row, however, contains five blocks having random
colors. Now for this image, the corresponding red, green, and blue frames are shown in Figure
1.4. Evidently, the red frame does not look red at all. The same is true for green and blue
frames. Then what do these frames represent?
Let us understand this by looking only at the red frame in Figure 1.4. Its second and third rows are absent (dark). This is expected, as those two rows have zero values for the red component. Further, the first row is present. Blocks in the first row of the red frame have decreasing brightness. This is also expected: from Figure 1.3, one can note that the red color component for all five blocks in row one has decreasing values, namely 255, 200, 150, 100, and 50. The last row is present in every frame. This can also be understood, as none of the block colors have zero values for the R, G, or B components. So, the conclusion is that every frame of an RGB image is like a grayscale image. The intensity of every pixel in any of the R, G, or B frames represents how strongly that color is present at that pixel. White in the R frame means a full presence of red, and zero (black) means its absence. Any value in between means red is present, but not to the full extent.
Let us see a natural image and its R, G, and B components in Figure 1.5 and
the corresponding RGB frames in Figure 1.6:

Figure 1.5: A natural (outdoor) image


Take your time to interpret the frame intensities and convince yourself that
what we have discussed above is validated:

Figure 1.6: R,G and B components of the image shown in Figure 1.5
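If you would like to reproduce something like Figure 1.6 yourself, the following minimal sketch (assuming the OpenCV and Matplotlib libraries introduced in Chapter 2) separates a color image into its R, G, and B frames and displays each one as a grayscale image. The file name outdoor.bmp is only a placeholder; use any color image available on your disk. Note that OpenCV reads images in B, G, R order, so the channels are reordered first.

import cv2
import matplotlib.pyplot as plt

# Load a color image (the file name is only an example; use any image you have).
# cv2.imread returns the channels in B, G, R order, not R, G, B.
img_bgr = cv2.imread('outdoor.bmp')
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

names = ['R frame', 'G frame', 'B frame']
for k in range(3):
    # Each frame is a 2D array and therefore displays like a grayscale image:
    # bright pixels mean a strong presence of that primary color.
    plt.subplot(1, 3, k + 1)
    plt.imshow(img_rgb[:, :, k], cmap='gray', vmin=0, vmax=255)
    plt.title(names[k])
    plt.axis('off')
plt.show()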
1.3 Basic image conventions and indexing
We need to understand the basic conventions related to images. This will
help us in programming too. Refer to Figure 1.7 for the discussion that
follows.
Note: From now onwards, we will be discussing grayscale images only. The generalization of
the presented concepts to higher dimensional images is straightforward.

The first fact to note in Figure 1.7 is that the axis of X matches the conventional x-axis, but the axis of Y is in the opposite direction to the conventional y-axis (i.e., it points vertically downwards). The axis of X increases as we increase the column count, and the axis of Y increases as we increase the row count. This convention is followed by most programming languages, like Python, MATLAB, GNU Octave, and Scilab:

Figure 1.7: Image pixel numbering convention (i=row, j=column)


Every pixel in an image can be accessed by providing its pixel address /
pixel index.
Note: This is not the address of the memory location where that pixel is stored. So, to avoid
confusion, we will use the term pixel index from now onwards.

This pixel index is of two types: subscripted index and linear index. In subscripted indexing, a pixel is accessed by specifying its row number i and column number j in the format (i,j). This is a very convenient way of addressing or indexing a pixel, as it is visually intuitive. But since every array, be it 1D, 2D, 3D, or in general ND, is stored as a 1D array inside the memory of the computer, accessing pixels by their subscripted indices puts an additional computational load of converting those to linear indices. That is where we come to linear indices. These are the indices of an equivalent 1D array formed from a 2D (or in general ND) array by placing the columns of the 2D image one over the other in order. In Figure 1.7, note that for every pixel the subscripted indices are shown in small brackets () and the linear indices are shown in []. So, assuming the name of the image is I, a pixel may be accessed in the following two alternative ways of indexing:
I(i,j) or I(k) where k = j×r + i
If the subscripted indices are known, it is easy to calculate the linear index, as shown in the formula given in Figure 1.7. The reverse is also trivial. Also, see what shape and size mean in the context of images in Figure 1.7.
Note: We have started counting rows and columns from zero. This is not necessary, but this is the convention Python follows, so we will stick to it. Other programming languages, like MATLAB and GNU Octave, start numbering from one. Although we are going to learn Python programming in the subsequent chapters, just for reference, the Python code for converting subscripted indices to linear indices and back is given in Section 9.7.2. You may revisit this section after getting started with Python to have a relook here.
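For readers who want a quick preview of the indexing arithmetic, the following minimal sketch (assuming NumPy, introduced in Chapter 2) converts between subscripted and linear indices for a small image. The column-stacking convention described above corresponds to NumPy's order='F' (column-major) option; NumPy's default in-memory layout is row-major, so the order argument matters here.

import numpy as np

r, c = 4, 3                      # a small image with 4 rows and 3 columns
i, j = 2, 1                      # subscripted index: row 2, column 1

# Column-stacking convention from Figure 1.7: k = j*r + i
k = j * r + i
print(k)                                                 # 6

# NumPy can do the same conversion; order='F' means column-major stacking.
print(np.ravel_multi_index((i, j), (r, c), order='F'))   # 6

# And back from the linear index to the subscripted index:
print(np.unravel_index(k, (r, c), order='F'))            # (2, 1)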

1.4 Image formats


In order to be able to store the image data, various formats are used. Some of
the popular formats are listed in Table 1.2 with their usage:

S. No. | Image format                      | File extension | Usage
1      | Bitmap                            | .bmp           | Lossless image storage format.
2      | JPEG                              | .jpg/.jpeg     | Lossy data compression format for digitally generated images, preserving the visual image quality to a good extent. The trade-off between image quality and storage size is user defined. JPEG does not support transparency or animation.
3      | Portable Network Graphics (PNG)   | .png           | A raster graphics file format that supports lossless data compression. Used to transmit high quality images over the internet (network). It does not support non-RGB color frameworks.
4      | Graphics Interchange Format (GIF) | .gif           | Lossless image data compression format designed for portability between applications and operating systems. It supports 8 bits per pixel as against 24 bits per pixel in colored images. It supports animation.
5      | Tag Image File Format (TIFF)      | .tiff          | A format for storing raster graphics images. Initially developed so that scanner machine vendors could agree on a common format.

Table 1.2: Some popular image formats


Now, let us agree upon a storage format for the images that we will process in subsequent chapters. We will always store images in .bmp format because, as mentioned in Table 1.2, .bmp is a popular lossless image storage format, and we do not want the results of processing images to be compromised on account of compression by other formats.
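As a quick check of the difference between a lossless and a lossy format (and a head start on Exercise 2 at the end of this chapter), the following minimal sketch, assuming the OpenCV library from Chapter 2 and any image file on your disk (sample.bmp is only a placeholder), saves the same image once as .bmp and once as .jpg and compares the resulting file sizes.

import cv2
import os

# Read any image and save it once losslessly (.bmp) and once lossily (.jpg).
img = cv2.imread('sample.bmp')

cv2.imwrite('copy.bmp', img)
cv2.imwrite('copy.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 90])

# The .bmp copy stores every pixel as-is (roughly r x c x 3 bytes plus a small
# header), while the .jpg copy is compressed and is usually much smaller.
print('BMP size:', os.path.getsize('copy.bmp'), 'bytes')
print('JPG size:', os.path.getsize('copy.jpg'), 'bytes')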

Conclusion
In this chapter, an introduction to the fundamentals of digital image processing was given. By now, the reader has learned how grayscale and colored images differ. The reader will also appreciate that an image is a rectangular grid structure whose basic element is the pixel. Every pixel can be uniquely identified by indexing, which could be subscripted or linear. The various image formats for saving image data in the context of storage and transmission were also discussed.
In the next chapter, we will learn the basics of the Python programming language, so that subsequently we can modify images the way we want through methods and algorithms.

Points to remember
• The basic element of a digital image is a pixel.
• An image is a rectangular grid of pixels.
• A pixel is uniquely identified by its location in an image.
• Grayscale images are two-dimensional arrays.
• Each pixel in an 8-bit grayscale image can have integer values in the range zero to 255.
• Colored images are three-dimensional arrays.
• Each pixel in a 24-bit colored image is represented by a tuple of three numbers, where the individual numbers can have integer values in the range zero to 255.
• Pixels can be linearly indexed or indexed using subscripted indices.
• The Y-axis of the Cartesian coordinate system and that of an image point in opposite directions.
• A bitmap is a compression-free image storage format, but it takes the largest space to store.

Exercises
1. Identify which color is gray out of the following: (256,45,36), (0,0,255),
(10,10,10). [Hint: See Section 1.2.2]
2. Use any image viewer (like MS Paint on the Windows operating
system) and change the type of an image from .bmp to .jpg and save.
Notice the change in file size of the two images and explain the reason
for this size increment/decrement. [Hint: See Section 1.4]
3. For an image having size 500x100, find the linear indices corresponding
to pixel coordinates (50,50). [Hint: See Section 1.3]

Join our book’s Discord space


Join the book's Discord Workspace for the latest updates, offers, tech happenings around the world, new releases, and sessions with the authors:
https://discord.bpbonline.com

CHAPTER 2
Python Fundamentals and Related Libraries

2.1 Introduction
In order to play with images and perform processing tasks, we will need to learn some programming in a language that is friendly to images. Out of the available programming languages, we prefer Python on account of its open-source, free nature and easy syntax. Practically speaking, it is also lightweight on computers, even old ones. Its cross-platform nature also makes it the programming language of choice: you can run it on Windows OS, Mac OS, Linux, or Android. A second but important reason is the availability of Python libraries for various tasks, including contemporary ones such as machine learning, deep learning, and artificial intelligence. There are plenty of libraries available for Python. We will learn some of them in this chapter in the context of image processing and many more in the subsequent chapters. Python also lets you integrate various systems easily.
At the time of writing this book, the Python version is 3.x, but the concepts developed here can be used on any version (past and most probably future too) with minimal to no changes in the code. Also, remember that this chapter is not an exhaustive coverage of Python. In this chapter, we intend to learn enough programming in Python to start exploring the world of image processing, just like a newborn that spends the first few years learning to speak and then starts to comprehend ideas.

Structure
The chapter discusses the following topics:
• Installing Python software
• General procedure for installing libraries in Python
• Basic language elements of Python
• NumPy library
• Matplotlib library
• OpenCV library
• Pandas library

Objectives
After reading this chapter, the reader will be able to write programs in the Python programming language. Since no background in Python is assumed, this chapter serves as a self-learning module from scratch. The reader will be able to install Python, import images from their computer (hard drive) into Python, and display them. The reader will also be able to display several pictures and plots in the same figure window with annotations. Another objective is to learn the Python libraries relevant to image processing. The reader will also be able to define custom packages in Python after reading this chapter.

2.2 Installing Python software


Let us get the installation done first; it is an easy process. Remember, the installation process may vary slightly from version to version, but this section will give you an idea of the installation and the important points to note while installing.
In order to download the software, use Python’s official website, www.python.org, and refer to the following steps:
1. Select a version suitable for your operating system from the download tab. In this book, we use Python 3.10.4 (64-bit version); however, the code developed will work on most later versions with little to no change.
2. Here, the installation for Windows machines is illustrated.
3. After downloading, double click the file and you will be presented with an installation prompt, as shown in
Figure 2.1.
4. Make sure to check both the options Install launcher for all users (recommended) and Add Python 3.10
to PATH as indicated below:

Figure 2.1: Python installation prompt/screen one


5. Then, click on the Customize installation option. You will see another prompt as shown in Figure 2.2. Make sure to check all boxes as
indicated.

Figure 2.2: Python installation prompt/screen two


6. Another prompt will then appear as indicated in Figure 2.3.
Note: In the Customize installation location option, there will be a default path selected. You may change it to another if you want. As indicated in the figure, we have changed it to C:\Python310. This path is short and easy to type, so change it as per your convenience. We will need to type this path every time we install a new library. Also, remember that you may have multiple versions of Python installed on your system in different paths. This may be required as some libraries are specific to certain versions of Python before they are updated to match the newer versions.

Figure 2.3: Python installation prompt/screen three


7. Click the Install button after writing the desired path, and the software will be installed in a few moments. Although, at this stage, we are done with the installation of Python on our machines, we should also learn how to install libraries for Python.

2.3 General procedure for installing libraries in Python


Here, the process for installing the NumPy library, which is a library for dealing with array objects, is illustrated. The same procedure will be applicable for installing all the other libraries too. (We will learn all the required libraries as the situation demands in the book.) So, to be able to install the NumPy library in Windows OS, refer to the following steps:
1. Open the command prompt by typing cmd in the search bar, right-clicking, and running it in administrator mode, as shown in Figure 2.4:

Figure 2.4: Opening the command prompt in administrative mode


The following figure showcases some important commands:

Figure 2.5: Commands for installing numpy library


2. Once the command prompt is open, it usually opens in the default directory, so you will notice the following text typed there: C:\WINDOWS\system32>. If this is not the case for you, to come out of all the internal folders to C:\, type cd\ as shown in Figure 2.5.
3. Now, remember we installed Python in the C:\Python310 folder. So, to navigate to that folder, type cd Python310 as shown in Figure 2.5.
4. Once we are in that folder, type pip install numpy to install the NumPy package (pip is the default way of installing packages/libraries in Python).
5. After this, NumPy will be successfully installed (remember, an active internet connection is required while installing Python packages as they will be downloaded after the command is given). Similarly, when needed, other packages can be installed too.
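To confirm that the installation worked, the following lines can be typed once Python is running (for example, in the IDLE shell described next). This is only a quick sanity check; the exact version number printed will depend on what pip downloaded.

import numpy as np

# If this runs without an ImportError, NumPy was installed correctly.
print(np.__version__)            # prints the installed NumPy version
print(np.array([1, 2, 3]) * 2)   # [2 4 6], a quick check that arrays work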
After the installation of Python and the required libraries is done, you can start writing code in Python with the default editor IDLE by following the steps given below:
1. Look for it in the search bar in Windows and open it to enjoy coding in the next section. Note that there are two ways to write code: by using the command-line option in IDLE or by using a dedicated code file.
2. To start, we will follow the first method, i.e., using the command-line Python interface. For that, open IDLE from the Windows app menu (it must have been installed with Python automatically).
3. A window as shown in Figure 2.6 will appear. It is called a Python shell. We can write commands line by line and see the results of execution here. The (>>>) sign is where we begin to write our first command.

Figure 2.6: Python shell


At times, one might work with different Python projects that require different versions of the same library. For this reason, Python has something called virtual environments. This simply means that one can install two or more versions of the same library, but in different environments (technically, folders). The installation that we have done till now is a global installation. But if the situation listed above arises, one may create virtual environments and get the job done. Below, we detail the procedure for doing so. However, understanding the rest of this sub-section is not required, because we will always use one version of each library throughout the book. So, readers may skip this part if desired. It will, however, be helpful if one works on different projects requiring different versions of a library. One may create as many environments as desired. At any time, one will work with only one environment, which is to be activated first. On activation, the globally installed libraries will not work, but the local ones will. On deactivation, the global ones will start to work again. It is also possible to install different sets of libraries in different environments; some projects might require a few libraries, while others might require a large number of libraries.
One may revisit this section again if required. The procedure illustrated below is applicable to Windows OS.
Similar steps may be followed for Mac or Linux:
• Creation of project folder: Open the command prompt on windows and go to the folder where your
project files are stored. Assume that we are working in D:\My_Project (The command prompt will show
this followed by > i.e., D:\My_Project>)
• Creation of virtual environment: To create an environment named env1 (this name could be anything) as
a sub folder, type the following command on command prompt — Python -m venv env1 (venv stands for
creation of virtual environment). You may browse the project folder and see that a subfolder by name of
env1 is created. Alternatively type dir on command prompt and see.
• Activation of virtual environment: Once the environment is created, it needs to be activated. To activate,
use the following command on the command prompt — env1\Scripts\activate.bat. You will note that the
command prompt's next line will look like — (env1) D:\My_Project>. This means that the environment
env1 is activated. You may also note that inside the env1\Scripts folder there are scripts for activation and
deactivation; with the above command we have called the activation script.
• Installation of libraries/packages inside virtual environment: Inside
D:\My_Project\env1\Lib\site-packages one may see the default packages/libraries installed. You may
see only pip (one or two folders) that comes by default. To install a new package, now use the pip install
numpy command on the command prompt. Once installed (with an active internet connection), you may see that
relevant NumPy package folders are created inside D:\My_Project\env1\Lib\site-packages.
Alternatively, this can be checked using the command pip list on the command prompt.
• Running the main project code in the activated environment: Once installed, we may use the package.
Suppose that we have a Python code file named ABC.py kept at location D:\My_Project; one may
simply run it by typing python ABC.py on the command prompt, and it will show the output of that file.
This file must not use any library other than NumPy, as up to now we have only installed the numpy library
(although we have not written any code as of now, this is just to understand the procedure of creating
environments; one may revisit this section later if required). If the file ABC.py uses more libraries, the
relevant libraries can be installed using the pip command as illustrated above while the environment is
active.
• Storing the project dependencies of a virtual environment for portability: Suppose one is happy
with the project they have created and now wants to put the code online, give it to someone else so
that they may use it, or collaborate with other members on improving the project or
adding functionality to it. In any such case, the libraries (and their versions) that the main author of the project used
to run the project on their computer are called the project dependencies. The other user who receives
this project will have to first install all those libraries (with the correct version numbers) on their computer.
Other versions of the libraries might also work — but not always. To automate this process, one may
use the following command on the command prompt — pip freeze > requirements.txt. After successful
execution, a file named requirements.txt will be created in the main project folder. If one sees its contents,
it is a list of the installed packages along with their exact version numbers. Now, when the code is transferred to
others by any method, usually the environment folder is not given with it, but the requirements.txt file
is. This is because the environment folder is huge in size — so it is deleted from the
My_Project folder before transferring. The receiver will create a similar environment on their device. The
receiving person need not manually install all the libraries listed in the supplied file. They will simply run the
following command on the command prompt after creating their environment in the My_Project folder — pip
install -r requirements.txt. The -r flag in the above command stands for installing from a requirements file.
• Deactivating the environment: When one is done working with a project, or wants to switch to another
project with different dependencies, the following command must be used on the command prompt to come
out of the current environment — deactivate. On execution of this command, the command prompt will
show — D:\My_Project> instead of (env1) D:\My_Project>.
Remember that the above procedure is for the Windows operating system. One can always find a similar
procedure for Mac/Linux in Python's official documentation. Also, Python needs to be installed on one's
system before creating any virtual environments. The text above illustrates the default way of creating virtual
environments, which comes bundled with Python right out of the box. There are, however, many
alternative ways that one may explore on their own, such as Conda, pipenv, virtualenv, etc.
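For quick reference, the entire workflow on the Windows command prompt can be sketched as follows. This is only a minimal sketch; the folder D:\My_Project and the environment name env1 are the example names used above:
rem switch to the D: drive and go to the project folder
D:
cd \My_Project
rem create and activate the virtual environment
python -m venv env1
env1\Scripts\activate.bat
rem install packages inside it and record the dependencies
pip install numpy
pip freeze > requirements.txt
rem leave the environment when done
deactivate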

2.4 Basic language elements of Python


In this section, we will learn the basic language elements of Python which will include defining variables,
doing basic math operations, advanced math operations, taking input from the user, lists, tuples, conditional
statements, loops, functions, and related concepts. Note that Python is a case sensitive language.

2.4.1 Hello world program


In order to start, let us print Hello world on the screen. For this, we need to type the command print(“Hello
world”) on Python shell after the (>>>) mark. Try that and see the result, as shown in Figure 2.7:

Figure 2.7: Hello World program


Notice in Figure 2.7 that we have printed Hello World twice using two different commands. The intention is to make
it clear that both single and double quotes are valid for enclosing text.
In the above program, we have also learned how to use Python shell; type the command and see the result
instantly.

2.4.2 Defining and printing variables


In order to define a variable a having a value of 2, type the following in the Python shell: >>> a=2.
This will define a variable a of integer type having the value 2 in it. To know whether it is an integer or not, type
another command, >>> type(a), and see the result: <class 'int'> is printed on the shell. To print
the value on the screen, try >>> print("The value of a is ... ", a). You will see that The value of a is ... 2 is
printed on the screen. In the same way, we may define floats as >>> b=2.4 and character variables as >>>
c='Hello world!'.
Also, note that once you have initialized a variable as having an integer value, it can be changed to a float or
character variable by simply assigning another value to it. What this means is that the following two commands
will work in sequence with no problems:
• >>> a=2
• >>> a=’Hello’
The latest value and type will be final.
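As a short recap, a minimal shell session tying these commands together might look like the following (the values are only examples):
>>> a=2
>>> type(a)
<class 'int'>
>>> print("The value of a is ...", a)
The value of a is ... 2
>>> a='Hello'
>>> type(a)
<class 'str'>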

2.4.3 Using IDLE editor in Python


Now that we understand how to use the Python shell, let us move on to learning how to use the Python editor.
To write code, any editor can be used as long as you know and follow all the rules of the language. There are
editors available that will flag errors or warn you if you make syntactical mistakes. So, one may choose any
good editor available, but in this book we will stick to the one that comes with the default Python installation (that
we have already done in Section 2.2), i.e., IDLE's editor. It can be accessed by opening IDLE and going to
File > New File. Once you complete writing the code, it can be run by going to Run > Run Module (or by
alternatively pressing the F5 function key on your computer) from the menu options of the code file itself.
From now on, we will be discussing codes written in code files for illustrating various concepts. An important
aspect of any programming language is the styling of the code — some conventions and best
practices (like indentation of code lines) that make the code universally readable by every user. Although this is
not mandatory, it is a good idea to write code by following the universally used style guidelines. For this,
one may follow the Python style guidelines in the official documentation at https://peps.python.org/pep-0008/.
This page is updated on a regular basis with the best practices of the Python community. Readers are
requested to visit the website and get acquainted with the same.

2.4.4 Defining variables, basic math, typecasting, and input/output


Now, let us see how to use a code file. We will learn some important concepts from it. Refer to Code 2.1, for
viewing how codes are written in the code files. We will discuss every line of code step by step. For easy
referencing, code lines are numbered. The corresponding output of running the code is shown after it.
01- #======================================================================
02- # PURPOSE : Getting Started with Python
03- #======================================================================
04- import math
05- #---------------------------------------------------------------------
06- # Defining Variables
07- #---------------------------------------------------------------------
08- a='Manish' # Character or string
09- print("a is ...",a)
10- a=2 # Integer
11- print("a is ...",a)
12- b=3.2 # Float
13- print("b is ...",b)
14- #---------------------------------------------------------------------
15- # Basic Mathematics
16- #---------------------------------------------------------------------
17- print('....................................................... 1')
18- c=a+b # Addition
19- print("c is ...",c)
20- d=a-b # Subtraction
21- print("d is ...",d)
22- e=a*b # Multiplication
23- print("e is ...",e)
24- f=a/b # Division
25- print("f is ...",f)
26- g=a**b # Power
27- print("g is ...",g)
28- #---------------------------------------------------------------------
29- # Advanced Math
30- #---------------------------------------------------------------------
31- print('....................................................... 2')
32- a=90
33- print("Sine of a is ... ",math.sin(a))
34- a=(math.pi)/2
35- print("Sine of a is ... ",math.sin(a))
36- a=8
37- print("Log of a is ... ",math.log(a,8))
38- print("Log of a is ... ",math.log(a,2))
39- #---------------------------------------------------------------------
40- # Typecasting
41- #---------------------------------------------------------------------
42- print('....................................................... 3')
43- a=2.9
44- print("a is ...",a)
45- a=int(a)
46- print("a is ...",a)
47- a=float(a)
48- print("a is ...",a)
49- #---------------------------------------------------------------------
50- # Taking Input From User
51- #---------------------------------------------------------------------
52- print('....................................................... 4')
53- a=input("Enter a string ... ") # 'a' is by default a string
54- print("a is ...",a)
55-
56- print('....................................................... 5')
57- a=int(input("Enter an integer ... ")) # Notice typecasting in this
58- print("a is ...",a)
59-
60- print('....................................................... 6')
61- a=float(input("Enter an float ... ")) # Notice typecasting in this
62- print("a is ...",a)
63-
64- print("Completed Successfully ... ")
Code 2.1: Getting started with code files in Python
Output of Code 2.1:
a is ... Manish
a is ... 2
b is ... 3.2
............................. 1
c is ... 5.2
d is ... -1.2000000000000002
e is ... 6.4
f is ... 0.625
g is ... 9.18958683997628
............................. 2
Sine of a is ... 0.8939966636005579
Sine of a is ... 1.0
Log of a is ... 1.0
Log of a is ... 3.0
............................. 3
a is ... 2.9
a is ... 2
a is ... 2.0
............................. 4
Enter a string ... My First Input String
a is ... My First Input String
............................. 5
Enter an integer ... 3
a is ... 3
....................................................... 6
Enter an float ... 5.3
a is ... 5.3
Completed Successfully ...
Let us discuss this step by step.
Every code is saved in a file on a computer using the file extension .py in Python. The filename can be
anything. To be able to run a code in Python using IDLE, refer to the following points for a better
understanding:
1. Open IDLE, and then in the FILE menu, click NEW if you are writing a fresh code or OPEN (if the code is
already written in some file) and provide the path to wherever your file is stored and open it.
2. This will open a new window (which we call the editor window). Here you can see and write/edit your
code.
3. After you are done editing/writing the code, you should go to the RUN menu and select the RUN
MODULE option (or alternatively press F5). This will run your code, and you can see the corresponding
output in the python shell.
4. For Code 2.1, when you run it on the Python shell, it will ask you to enter a string, an integer, and a float
(in that order), now you may enter the corresponding values you want. In this case, we have used values —
My First input string, 3 and 5.3.
You may check that from the output shown. So, let us finally begin to discuss Code 2.1 line by line.
Line numbers 1,2 and 3 are comment lines. They are not a part of actual executable code, but as a standard
practice, it is a good idea to keep some comments that explain the purpose of a few lines or some important
context based information here. Comments can be written anywhere in the code. They start with a # symbol. As
a convention, we will use the first few lines in the code file to describe the purpose of the code.
Line number 4 is for importing the math library. Somewhere in the code, we will use functions available in the
math library. This library already comes bundled with Python. We do not need to install it. Notice that the
keyword import is highlighted in a different color scheme. You may note the color scheme for variables,
functions, etc. in the entire code file as well. This is called syntax highlighting. Even if you are reading a black-and-white
version of this book, you will still notice syntax highlighting in the actual code that you type in Python.
It is not necessary to type the full code into a code file; you may write only the sections of the code that you feel
are worth checking and practicing and, for the rest, trust the explanation.
From line numbers 5 to 13, we have simply defined a character/string, an integer, and a float, and printed
them onto the screen. This is already known to us because we did the same on the Python shell. On the shell, you
may see that those values are printed (refer to the output pasted just after the code).
From line numbers 14 to 27, some basic math operations are performed which do not require the math library
that we have imported earlier. In line number 17, you may notice a print statement simply printing ……1. This
is a convention that we adopt throughout this book to keep track of the output while it is printed onto the shell. For
example, in the code output shown, you may see the output of the basic mathematics section after the printed line
……1. We call this an output marker. Please note that for multiplication we use the asterisk (*) operator and
for power/exponent operations, i.e. x^y (x raised to the power y), we use x**y (i.e. a double asterisk).
From line numbers 28 to 38, we do some advanced math operations. Here, the functions/methods present in the
math library are used. For example, to be able to find the sin of a variable a, we use the syntax math.sin(a).
There are a few facts to note, refer to the following points:
• First, every function that is defined inside a library is referenced with the syntax LIBRARY.FUNCTION,
as in math.sin(). Also note that sin may be defined in other libraries too, so when we
use sin() from another library, it should be used with that library's name.
• Second, notice the usage of math.sin() inside the print function. This is allowed in Python. You may
alternatively write x=math.sin(a) and then print it by using print(x) too. But the way we have written
it in line number 33 rescues us from creating an unnecessary variable x.
• Third, note that apart from functions, constants are also defined in that library. In line number 34 we
have used π from the math library as math.pi.
To see a list of all functions present in the library and their internal details, type the following command on the
Python shell — import math, and then in the next line, type help(math). The output will reveal all the
functions, variables, constants, etc. inside the math library. If you do not want all the functions and only want to
know about the log function, type help(math.log) on the shell. This will reveal all the details of that function alone.
We recommend trying all this. Once this curiosity is addressed, confirm the output for the advanced math
section after output marker two in the output shown.
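For instance, a short shell session along these lines might look like the following (the exact help text depends on your Python version):
>>> import math
>>> help(math.log)     # displays the documentation of the log function alone
>>> math.log(8, 2)
3.0
>>> math.pi
3.141592653589793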
From line number 39 to 48, we have discussed the issue of conversion of one variable type to another,
popularly called typecasting. Notice how variable a, which was earlier defined as a float by assigning it a
floating-point value, is converted to int by using a=int(a) command in line number 45. Similarly, one can also
appreciate line number 47 by seeing its output in the output section.
From line numbers 49 to 64, a method for taking input from users is illustrated. Notice the difference in the
syntax of line numbers 53, 57, and 61. By default, whatever is inputted from the user onto the screen is treated
as a string. However, if one wishes to take an integer or a float, appropriate typecasting must be done as
demonstrated in line numbers 57 and 61. Also note that if you enter the wrong data type while inputting, you
will most likely get an error message. We recommend making an incorrect input to see how and what error
appears.
Finally, in line number 64, which happens to be the last line of the code, as a convention we always use the
print("Completed Successfully ... ") statement. It tells us that the code has executed successfully, and if some
steps of the code take time to execute, it also keeps us aware of whether the code is still running or has
finished.
Be clear with every step of this code before proceeding to the next section. If required, you may experiment
with the code multiple times.

2.4.5 List data structure in Python


There are various data structures in Python. A data structure, as the name suggests, is a container for holding data.
You need a water bottle to hold water, while you need a carry bag to hold grocery items. In the same way,
depending on the kind of data, a suitable data structure can be selected. So, we begin by studying the list data
structure in Code 2.2. The output, as a convention that we follow in this book, is listed immediately after that
code. Read the code, most of the lines are self-explanatory. However, we will discuss key lines and touch on
other important ones to get an overall understanding.
01- #======================================================================
02- # # PURPOSE : Understanding LIST in Python
03- #======================================================================
04-
05- print("\n--------------------------------------------------------- 1 ")
06- a=[1,2.0,"Rajesh",[1,2,'Manish'],'8123']
07- print("The list is ... \n",a)
08-
09- n=-1
10- print("Element ",n,"of the list is ....\n",a[n]) # access any element
11-
12- b=a[1:4] # slicing from element 1 to 3 (4 excluded)
13- print("Sliced array is ... ",b)
14-
15- a[2]=300 # changing the element of list
16- print("List after change of element is ... \n",a)
17-
18- a[2:4]=[100,200] # changing the slice
19- print("List after change of slice is ... \n",a)
20-
21- e=a[4][0] # Accessing element of element
22- print("The required element is ... \n",e)
23-
24- #---------------------------------------------------------------------
25- # Operations on Lists (frequently used ones only)
26- #---------------------------------------------------------------------
27-
28- print("\n--------------------------------------------------------- 2 ")
29-
30- a=[1,2.0,"Rajesh",[1,2,'Manish'],'hi']
31- print("The list is ... \n",a)
32-
33- a.append(2) # Append something at the end (can append only one item)
34- print("The list after appending is ... \n",a)
35-
36- index=2
37- element=500
38- a.insert(index,element) # insert an element at a location (index)
39- print("The list after inserting an element is ... \n",a)
40-
41- b=[1,2,5]
42- c=[4,5,3]
43- c.extend(b) # appending another list at the end called extending
44- print("List c after extending is ... \n",c)
45-
46- print("\n--------------------------------------------------------- 3 ")
47- r=c.index(5) # returns the index of element (only first occurrence)
48- # (If the value is not there, it will throw an error)
49- print("The index of element in c is ... \n",r)
50-
51- c.remove(5) # removes the first occurrence of the element
52- # (If the value is not there, it will throw an error)
53- print("List after removal of the element is ... \n",c)
54-
55- c.sort(reverse=0) # 0 for ascending order, 1 for descending order
56- print("The list after Sorting is ... \n",c)
57-
58- a=['Hello',1,2,[1,2,3]]
59- a.reverse()
60- print("The list after reversing is ... \n",a)
61-
62- print("The List has total no. of elements = ... \n",len(a))
63-
64- a=[1,2,3]
65- b=[4,5,6,7]
66- print("List Concatenation ... \n",a+b)
67- print("List Concatenation (with repetition) ... \n",a+2*b)
68-
69- print("Completed Successfully ... ")
Code 2.2: List data structure in Python
Output of Code 2.2:
--------------------------------------------------------- 1
The list is ...
[1, 2.0, 'Rajesh', [1, 2, 'Manish'], '8123']
Element -1 of the list is ....
8123
Sliced array is ... [2.0, 'Rajesh', [1, 2, 'Manish']]
List after change of element is ...
[1, 2.0, 300, [1, 2, 'Manish'], '8123']
List after change of slice is ...
[1, 2.0, 100, 200, '8123']
The required element is ...
8
--------------------------------------------------------- 2
The list is ...
[1, 2.0, 'Rajesh', [1, 2, 'Manish'], 'hi']
The list after appending is ...
[1, 2.0, 'Rajesh', [1, 2, 'Manish'], 'hi', 2]
The list after inserting an element is ...
[1, 2.0, 500, 'Rajesh', [1, 2, 'Manish'], 'hi', 2]
List c after extending is ...
[4, 5, 3, 1, 2, 5]
--------------------------------------------------------- 3
The index of element in c is ...
1
List after removal of the element is ...
[4, 3, 1, 2, 5]
The list after Sorting is ...
[1, 2, 3, 4, 5]
The list after reversing is ...
[[1, 2, 3], 2, 1, 'Hello']
The List has total no. of elements = ...
4
List Concatenation ...
[1, 2, 3, 4, 5, 6, 7]
List Concatenation (with repetition) ...
[1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7]
Completed Successfully ...
In order to see what a list is and how it is defined, refer to line number 6 of Code 2.2. A list is a data structure
that may contain other datatypes (like int, float, another list, etc.) in the form of comma-separated values
enclosed in square brackets. Repetition of values is allowed, and the elements of the list may be modified later.
Elements may be deleted; new elements may be added. It is the most versatile data structure available. We will
use the list for image processing too. Line number 6 illustrates the creation of a list. It is a list of an int, a float, a list,
and strings (8123 is treated as a string, since it is written inside single quotes).
Elements in the list are numbered. Remember, in Python, numbering starts from zero if you are counting in the
forward direction. So, the first element has an index zero, the second element has an index one, and so on.
However, counting starts from -1, if you start to count from the last element and in the backward direction. Line
numbers 9 and 10 illustrate how to access the element of a list. Since the value of the index is -1, this means
that the last element (or the first element counted from a backward direction) will get printed. Confirm this in
the output too.
Line number 12 illustrates slicing. It means taking a portion of the list and storing it if required. This is done by
b=a[1:4]. So, elements from index 1 to 3 will be stored in the new list b. Remember, in Python, whenever
we write 1:4, it means we are referring to indices 1, 2, and 3; index 4 is excluded. Verify this in the output
yourself.
Line numbers 15 and 18 tell us how to change a single element in the original list or a slice of the list,
respectively. In line number 21, the command is e=a[4][0]. This means that in list a, we first access the element
at index four (i.e., the string '8123'), and since we know beforehand that it is a string (or in general any data
structure that has multiple elements), we then access its element at index zero. It may
be seen in the output before marker number two that only 8 gets printed.
From line number 24 onwards, operations on the list are illustrated. The code is self-explanatory, as it contains
comments too. At this stage, we recommend trying all these commands on the Python shell side by side.

2.4.6 Tuple data structure in Python


A tuple is a data structure similar to a list. The only point of difference between a list and a tuple is that a tuple is
immutable (i.e., the elements of a tuple cannot be changed once defined). Have a look at Code 2.3, which
demonstrates the creation and usage of tuples:
01- #======================================================================
02- # PURPOSE : Understanding TUPLE in Python
03- #======================================================================
04-
05- print("\n--------------------------------------------------------- 1 ")
06- a=("Arjun",1,2.22,[1,2,3])
07- print("Our tuple is ... \n",a)
08-
09- # a[2]=200 -This command will give an error because tuples are immutable
10- a[3][1]=5 # But interestingly, this works!!!
11- print("New value of the tuple is ... \n",a)
12-
13- print("Completed Successfully ... ")
Code 2.3: Tuple data structure in Python
Output of Code 2.3:
--------------------------------------------------------- 1
Our tuple is ...
('Arjun', 1, 2.22, [1, 2, 3])
New value of the tuple is ...
('Arjun', 1, 2.22, [1, 5, 3])
Completed Successfully ...
In Code 2.3, line number 6, it can be seen that tuples can be created using the same syntax that we used for lists,
with the difference that the comma-separated list of elements is enclosed in parentheses, i.e. (). Although the
elements of a tuple cannot be changed, as demonstrated in line number 10, if an element of a tuple is a list, the
contents of that list can still be modified. If this were not possible, it would violate the fundamental property of a list
of being mutable (i.e., elements of a list can be changed).
Since tuples are immutable, the operations that modify a list (such as append, insert, or remove) are not available
for them. We can only access the elements of tuples to read them. Like lists and tuples, there are other data
structures too, like sets and dictionaries. But since we will seldom be using them for image processing, interested
readers can learn about them from any good book on Python.
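Purely as a glimpse (and not required for the rest of the book), a set and a dictionary can be created as follows:
s={1,2,3,2}          # a set: duplicate values are stored only once
print(s)             # {1, 2, 3}
d={1:100,'a':200}    # a dictionary: key-value pairs
print(d['a'])        # values are accessed by key, prints 200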

2.4.7 Conditional statements in Python


We are now going to discuss conditional statements in Python. You use conditional statements when one of
several choices is made depending on some condition. For example, if the temperature is high this afternoon, one can wear a
t-shirt, or if it is low, a coat can be worn. To address such situations in code, we have the if statement, the
if-else statement, and the if-elif-else statement in Python. Code 2.4 shows an illustration of the concepts, followed
by the output:
01- #======================================================================
02- # PURPOSE : Understanding Conditional Statements in Python
03- #======================================================================
04-
05- #---------------------------------------------------------------------
06- # If statement
07- #---------------------------------------------------------------------
08-
09- print("\n--------------------------------------------------------- 1 ")
10- a=-9
11- if a<0:
12- print("The number is less than 0")
13- print("Its negative ... ")
14-
15- #---------------------------------------------------------------------
16- # If-Else Ladder
17- #---------------------------------------------------------------------
18-
19- print("\n--------------------------------------------------------- 2 ")
20- a=2
21- if a<0:
22- print("The number is less than 0")
23- print("Its negative ... ")
24- else:
25- print("The number is greater than or equal to 0")
26- print("It's positive or 0")
27-
28- #---------------------------------------------------------------------
29- # If-elif-else Ladder
30- #---------------------------------------------------------------------
31-
32- print("\n--------------------------------------------------------- 3 ")
33- a=0
34- if a<0:
35- print("The number is less than 0")
36- print("Its negative ... ")
37- elif a==0:
38- print("The number is equal to 0")
39- print("It's 0")
40- else:
41- print("The number is greater than 0")
42- print("It's positive ")
43-
44- print("Completed Successfully ... ")
Code 2.4: Conditional statements in Python
Output of Code 2.4:
--------------------------------------------------------- 1
The number is less than 0
Its negative ...
--------------------------------------------------------- 2
The number is greater than or equal to 0
It's positive or 0
--------------------------------------------------------- 3
The number is equal to 0
It's 0
Completed Successfully ...
Line numbers 11 to 13 in Code 2.4 illustrate how to implement an if statement. The keyword for implementing
the if statement is if itself, followed by a condition and then a colon. Whenever this condition is
evaluated as true, the statements in the if block are executed. Now, let us learn what an if block is. The if block
begins from the next line. Observe that the next line is not aligned to the beginning of the if statement. It starts
with one tab of separation. Similarly, the second line after the if statement has the same structure. These two
lines belong to the if block. The moment we write anything aligned to the beginning of the if line again, it is not a
part of the if block. Remember, in the Python programming language there is no end statement for conditional
statements, loops, or even functions. Indentation decides it all.
Suppose that just after the if statement, in the next line, we forget to give the space equal to one tab. That line will
not be treated as a part of the if block; in fact, Python will raise an IndentationError because an if statement must be
followed by an indented block. Try experimenting with the code to see how
this works. Try changing the value of a to a positive number and see whether the two lines in the if block are printed
or not. As you might have already guessed, they will not be printed in the output for positive values of a.
Line numbers 21 to 26 in Code 2.4 illustrate how to implement the if-else block. Although it is simple and
self-explanatory, it is worth mentioning that else is not followed by any condition. It works for all the cases for
which the if block does not work. That is, if the condition of the if statement is evaluated as false, the else block
is automatically executed.
Line numbers 34 to 42 in Code 2.4 illustrate the construction of the if-elif-else ladder. Sometimes we have
non-binary choices in life. For example, if you pass an exam with an A+ grade, your father might promise to
buy you a bike, or if you get an A grade, he will take you on a tour of India. If none of the above happens, he
might buy you a gaming computer. Such situations are handled by the if-elif-else statement. Notice the usage in
line numbers 34 to 42. Try experimenting with values of a and verify whether the appropriate blocks are executed or not.
One important point about the if-elif-else statement is that it can have as many elif branches as needed. This means you can have
many choices for the result of a given event.
Note: In programming languages like C/C++, the if-else if-else ladder is used. However, notice how we spell else if in Python. It is called elif
and not elseif or else if.

Note: Inside a block (whether it is a block of if, elif, or else), one may have another conditional statement if desired. It is possible to nest
conditional statements as long as we use proper indentation, as sketched below.
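A minimal sketch of such a nested conditional (the values are chosen arbitrarily for illustration) is:
a=6
if a>0:
    if a%2==0:
        print("a is a positive even number")
    else:
        print("a is a positive odd number")
else:
    print("a is zero or negative")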

2.4.8 Loops in Python


Many times, one is required to do a task repetitively. For example, a professor evaluates the answer sheets of
100 students. Here, the evaluation process is the same for all 100 students, so this is a repetitive task.
The evaluation is to be performed 100 times in the same way. Remember, depending on the quality of the answers
written, the marks of students might vary, but for everyone, the evaluation happens by the same standards.
We represent such situations by loops in programming. In Python, we get for loops and while loops. They are
illustrated in Code 2.5, followed by the output:
01- #======================================================================
02- # PURPOSE : Understanding Loops in Python
03- #======================================================================
04-
05- #---------------------------------------------------------------------
06- # FOR Loop
07- #---------------------------------------------------------------------
08-
09- print("\n--------------------------------------------------------- 1 ")
10- a=[1,"HI",(1,2,3),[3,4],{1,2,3},{1:100,'a':200}]
11- for i in a:
12- print(i)
13-
14- #---------------------------------------------------------------------
15- # FOR Loop (using range)
16- #---------------------------------------------------------------------
17- print("\n--------------------------------------------------------- 2 ")
18- for i in range(0,12,3): # In range, upper limit (12 here) is excluded
19- print(i)
20-
21- #---------------------------------------------------------------------
22- # FOR Loop (break statement)
23- #---------------------------------------------------------------------
24-
25- print("\n--------------------------------------------------------- 3 ")
26- a=[2,4,2,1,5,6,7,0]
27- for i in a:
28- if i==1:
29- break;
30- print(i)
31-
32- #---------------------------------------------------------------------
33- # FOR Loop (continue statement)
34- #---------------------------------------------------------------------
35-
36- print("\n--------------------------------------------------------- 4 ")
37- a=[2,4,2,1,5,6,7,0]
38- for i in a:
39- if i==1:
40- continue;
41- print(i)
42-
43- #---------------------------------------------------------------------
44- # WHILE Loop
45- #---------------------------------------------------------------------
46-
47- print("\n--------------------------------------------------------- 4 ")
48- a=1;
49- while a<5:
50- print(a)
51- a=a+1
52-
53- print("Completed Successfully ... ")
Code 2.5: Loops in Python
Output of Code 2.5:
--------------------------------------------------------- 1
1
HI
(1, 2, 3)
[3, 4]
{1, 2, 3}
{1: 100, 'a': 200}
--------------------------------------------------------- 2
0
3
6
9
--------------------------------------------------------- 3
2
4
2
--------------------------------------------------------- 4
2
4
2
5
6
7
0
--------------------------------------------------------- 4
1
2
3
4
Completed Successfully ...
In the case of the evaluation example, with every answer sheet that is checked, the number of answer sheets
checked increases by one. So, if that problem were to be solved using a loop, we would need an iteration
variable whose value should change from 1,2,3,4 … 100 (regular sequence). Technically speaking, a Python
loop can iterate over a regular or irregular sequence. An example of an irregular sequence could be a list in
Python. See line number 10 of Code 2.5 where we have defined the following list: a=[1,"HI",(1,2,3),[3,4],
{1,2,3},{1:100,'a':200}]. We want to run a loop taking this as the iteration variable. This means that the loop
will run for the first time in a=1 (first element of the list). For the second time, a= “HI” and so on.
In line numbers 11 and 12 of Code 2.5, a for loop is defined. Notice the syntax (the line ends in a colon, like if-
else). A loop needs an iteration variable, which here is i, and it iterates over the list a. Note that there can be as many lines in
the for block as desired (in the current example there is one). Notice the indentation of the for block too. Similar to
the if-else statement, one-tab separation is mandatory for all lines that belong to the for loop. Once the one-tab
separation is broken, all lines (including that one) after that are not part of the for loop. In line number 12, the
value of i, which is the iteration variable, is printed. The variable i picks up values from a with every iteration.
You may note in the output section that the elements of the list are printed after output marker one. Also, keep in
mind that there can be loops inside loops with proper indentation. We will make use of such nested loops in the
upcoming chapters.
Let us now talk about the loop of line numbers 18 and 19 of Code 2.5. It is the same as we have discussed
earlier with one difference. The iteration variable picks up values from the range() function. This is a way of
creating regular iteration variables. In the current example, range(0,12,3) creates values from 0 to 12 (but 12
excluded) with a gap of 3, that is - 0,3,6,9. Verify this in the output after marker two.
There are situations in which we would like to stop executing the loop if some condition is satisfied. For
example, if we find the number 1 in the list, we would like to stop the execution of the loop. This is done by
using the break statement and is demonstrated in line number 25 to 30 of Code 2.5. Verify this in the output
after output marker three. Breaking does not mean stopping the code. It simply means the termination of the
loop and then whatever code lines are written after that will be executed normally.
Note: In line 28, if i==1 means if i is equal to 1. == is not an assignment operator. It is an equality checker.
When the condition is true, the if block gets executed, otherwise it does not.
Another situation could be that we would like to skip the execution of the loop for certain iterations. This is
done using the continue statement. Refer to line numbers 36 to 41 of Code 2.5. In this example, we skip the
iteration where the variable has a value of one. Confirm this from the output after output marker four.
In line numbers 48 to 51 of Code 2.5, an illustration of the while loop is given. It is similar to the for loop
and is the less frequently used of the two.
At this point, since we are now through with the concepts of lists and loops, it is important to learn list
comprehension in Python. List comprehension helps us create a list from an existing list with a shorter syntax. To
understand this, we take an example of creating one list from another, first using a loop and then using a list
comprehension. Let us take a list as follows: a=[5,2,3,1,4]. We want to create a list from this list such that every
element in the new list is 2 plus the original one, but only when the original element is greater than 3. This means we expect
the new list to be [7,6]. This can be accomplished using a for loop as follows:
a=[5,2,3,1,4]
b=[]
for i in a:
    if i>3:
        b.append(i+2)
But the same can be done in one line using a list comprehension as follows:
b=[i+2 for i in a if i>3]
with the exact same output. The example above is simple enough to illustrate the general form of list
comprehensions.
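As one more illustrative sketch (not part of any code listing in this chapter), a list comprehension may also iterate over range() with a condition:
squares_of_even=[i*i for i in range(10) if i%2==0]
print(squares_of_even)   # prints [0, 4, 16, 36, 64]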

2.4.9 Functions and lambdas in Python


In this section, we will learn about functions and one-line functions called lambdas. Lambdas are a very popular
way of defining small functions in many modern programming languages.
01- #======================================================================
02- # PURPOSE : Understanding Functions in Python
03- #======================================================================
04-
05- #---------------------------------------------------------------------
06- # functions
07- #---------------------------------------------------------------------
08-
09- print("\n--------------------------------------------------------- 1 ")
10- def my_func(a,b):
11- c=a+b
12- d=a-b
13- return c,d
14-
15- print("Function demonstration ... ")
16- m,n=my_func(2,3)
17- print("The returned values are ... ",m,"&",n)
18- m,n=my_func(2,sum(my_func(5,6)))
19- print("The returned values are... ",m,"&",n)
20-
21- #---------------------------------------------------------------------
22- # Lambdas
23- #---------------------------------------------------------------------
24-
25- print("\n--------------------------------------------------------- 2 ")
26- x=lambda a,b,c : a+b-c
27- print("Value returned by lambda ... ",x(2,3,4))
28- print("Value returned by lambda ... ",x(7,3,6))
29-
30- print("Completed Successfully ... ")
Code 2.6: Functions and lambda functions in Python
Output of Code 2.6:
--------------------------------------------------------- 1
Function demonstration ...
The returned values are ... 5 & -1
The returned values are... 12 & -8

--------------------------------------------------------- 2
Value returned by lambda ... 1
Value returned by lambda ... 4
Completed Successfully ...
Line numbers 10 to 13 of Code 2.6 illustrate the definition of a function. A function needs to be defined before
its usage. A function is also meant to serve a repetitive task, like a loop. The difference can be understood by
extending our old example, the evaluation of answer scripts by the professor. Assuming that a professor is
teaching a course at more than one university, he/she has to go to the respective exam center of each college to
check answer scripts. The evaluation standard in this case (the function) is again the same, but the service is
needed at different geographical places (in our case, different places in the code); that is why functions are useful.
Apart from this, they are more computationally expensive as compared to loops in the sense that, while
executing the main program, if a function call is made, the computer must shift control from the current
position to the position of that function, execute it, and then return to the main program. During this journey, the
important data of the main program is kept on the stack (a last-in-first-out list) and retrieved when control is
returned. That is why functions are more expensive than loops.
Coming back to defining a function in line numbers 10 to 13 of Code 2.6, the def keyword is used for defining a
function, followed by the name of the function, which is a single continuous string. Then, in parentheses, the
input arguments of the function need to be defined. In our case, the function name is my_func and it takes two
arguments, a and b, as input. This line must be terminated by a colon. Then comes the function body; it has the
same rules of indentation as if-else and loops. Notice the last line of the function body. The keyword used there
is return. If you have read the last paragraph, this should not be a mysterious word by now. After the return
keyword, you may notice two variables, c and d. These are the variables that are computed in the function
definition and returned to the caller. Note that in Python you may use any number of input arguments (of any
type like int, float, list, etc.) and a function may return any number of variables (of any type). Now, let us
have a look at what has happened inside the function definition in this specific case. It is easy to see from line
numbers 11 and 12 of Code 2.6 that c and d are simply the addition and subtraction, respectively, of a and b.
Now, let us see line number 16 of Code 2.6. Here, we called the function that we defined. The input parameters
are 2 and 3 which after processing will be returned to variables m and n respectively. Check that in the output
when c and d are printed.
In line number 18 of Code 2.6, the function is called again (you can use that function any number of times,
anywhere you want in the code). The point of difference from the first function call to note here is that the
second input argument, instead of being a number, is sum(my_func(5,6)). Here, my_func(5,6) returns two
numbers, which the sum() function adds up to give a single number, and that acts as the second argument;
this is perfectly valid.
A special class of functions that have only one output, take one or more inputs, and can be defined in only one
line (one expression) of code are called lambda functions, or simply lambdas. In line number 26 of Code 2.6,
a lambda function is defined as x=lambda a,b,c : a+b-c. Here, a, b, and c are the inputs (there can be as many as
desired). The expression that is evaluated to produce the single output is a+b-c, and the resulting function is stored
in x. This function is used as x(2,3,4) to give an output that may be collected in some other variable or printed, as shown in the next line.
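As an additional illustration (not part of Code 2.6), lambdas are often passed as small throwaway functions to other functions, for example as the key argument of the built-in sorted():
names=['Manish','Arjun','Rajesh']
print(sorted(names, key=lambda s: len(s)))   # sorts by string length, prints ['Arjun', 'Manish', 'Rajesh']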

2.5 NumPy library


One of the definitions of a physical quantity is something that can be quantified. Whatever you can measure in
numbers is a physical quantity. Temperature, weight, speed, etc. are all measurable, hence physical quantities.
Emotions, relations, etc., on the other hand, cannot be quantified, hence they are non-physical. Science deals
with physical quantities only, and as discussed earlier, all of them can be quantified (brought into the form of
numbers). Usually, an array is a data structure suitable for storing such data. Arrays differ from lists in the sense
that a list may contain anything as elements (elements of the same or different datatypes are allowed). Arrays,
however, are data structures that store elements of the same datatype.
If something can be represented by numbers, it can be simulated inside computers because that is what
computers are best at, number crunching. So, to be able to deal with real-world objects (physical quantities), we
need a good manager of numbers inside computers. One such powerful manager is the numpy library.
Technically speaking, NumPy is a powerful n-dimensional array creation and manipulation library. It has a
great variety of numerical computation tools (that we are going to see in this section). NumPy supports a wide
range of modern hardware. It is faster than lists because most of its core is written in C and C++, giving us very
fast computation. Also, unlike lists, NumPy arrays are stored in a contiguous block of memory and hence
manipulation is very efficient. This is called locality of reference in computer science.
Since numpy library deals with numerical computation, its name comes from the Num of numerical and Py of
Python, that is why it is called Numerical Python (NumPy). So, in this section, let us see numpy at work. We
will learn important NumPy concepts here which are generic to every field. We will give special attention to
one-, two-, and three-dimensional arrays, which are sufficient for image processing (although NumPy can deal with
n-dimensional arrays in general). But you will also get an idea of how to deal with higher-dimensional objects.

2.5.1 Defining and dealing with one dimensional array


Let us explore Code 2.7 for understanding the creation and manipulation of NumPy arrays. There is a lot to
learn from this code so, pay good attention. The output of this code (as a convention in this book) is shown
immediately after it.
001- #======================================================================
002- # PURPOSE : Dealing with one dimensional array
003- #======================================================================
004-
005- import numpy as np; # importing the python array object 'numpy' by name 'np'.
006-
007- #-----------------------------------------------------------------------
008- # Creating 1D array
009- #-----------------------------------------------------------------------
010- print("---------------------------- 1")
011- a=np.array([2,8,63,-9,3.2])
012- print(a)
013- print("The shape of array",a,"is",a.shape)
014-
015- #-----------------------------------------------------------------------
016- # Accessing 2nd element (Remember that in Python, indexing starts from 0)
017- #-----------------------------------------------------------------------
018- print("---------------------------- 2")
019- b=a[1]
020- print(b)
021-
022- #-----------------------------------------------------------------------
023- # Changing the 2nd element i.e (element at index 1)
024- #-----------------------------------------------------------------------
025- print("---------------------------- 3")
026- a[1]=800
027- print(a)
028-
029- #-----------------------------------------------------------------------
030- # Calculating the length of 'a'
031- #-----------------------------------------------------------------------
032- print("---------------------------- 4")
033- c=len(a)
034- print("Length of a is ... ",c)
035-
036- #-----------------------------------------------------------------------
037- # Accessing the last element of array
038- #-----------------------------------------------------------------------
039- print("---------------------------- 5")
040- # (Notice that indexing starts from -1 in reverse direction.
041- print("The last element is ... ", a[-1])
042- print('The second last element is ...',a[-2])
043-
044- #-----------------------------------------------------------------------
045- # Creating array from one number to another with a GAP using arange
046- #-----------------------------------------------------------------------
047- print("---------------------------- 6")
048- # (Remember the syntax is (START,END+GAP,GAP)
049- d=np.arange(2,10,1) # numbers from 2 to 9 (NOT 10) with a gap of 1.
050- print(d)
051- d=np.arange(2,10,3) # numbers from 2 to 9 (NOT 10) with a gap of 3.
052- print(d)
053- d=np.arange(-2.3,1.3,.5)
054- print(d)
055-
056- #-----------------------------------------------------------------------
057- # Creating numbers from START to END using linspace
058- #-----------------------------------------------------------------------
059- print("---------------------------- 7")
060- d=np.linspace(1,5,10) # end limit i.e. 5 is included.
061- print(d)
062- d=np.linspace(-2.3,2.2,5)
063- print(d)
064-
065- #-----------------------------------------------------------------------
066- # Accessing elements of array in arbitrary order
067- #-----------------------------------------------------------------------
068- print("---------------------------- 8")
069- a=np.array([2,8,63,-9,3.2,0 ,2 ,4 ,8 ,6 ,4 ,7 ,9,6]);
070- print(a)
071- b=a[[0,-1,2,2,-7]]
072- print(b)
073- print(a[np.arange(2,10,2)])
074-
075- #-----------------------------------------------------------------------
076- # Concatenating two arrays
077- #-----------------------------------------------------------------------
078- print("---------------------------- 9")
079- a=np.array([1,2,3,4,5])
080- print(a)
081- b=np.array([2.3,-9.4])
082- print(b)
083- c=np.hstack((a,b)); # Horizontal concatenation (double brackets are necessary)
084- # vertical is not possible here because of dimension mismatch
085- # (otherwise use np.vstack)
086- print(c)
087-
088- #-----------------------------------------------------------------------
089- # Deleting specific elements from array
090- #-----------------------------------------------------------------------
091- print("---------------------------- 10")
092- a=np.array([1,2,3,4,5,6,7,8,9,10,11])
093- print(a)
094- a=np.delete(a,[8,2,4]) # supply indices of elements to be deleted
095- print(a)
096-
097- #-----------------------------------------------------------------------
098- # Obtaining the dimensionality of array
099- #-----------------------------------------------------------------------
100- print("---------------------------- 11")
101- a=np.array([1,2,3,4,5,6,7,8,9,10,11])
102- print(a)
103- print(a.ndim)
104-
105- #-----------------------------------------------------------------------
106- # Obtaining type
107- #-----------------------------------------------------------------------
108- print("---------------------------- 12")
109- a=np.array([1,2,3,4,5,6,7,8,9,10,11.9])
110- print(a)
111- print(a.dtype)
112- print(type(a))
113-
114- #-----------------------------------------------------------------------
115- # Accessing elements of 1D array in sequence
116- #-----------------------------------------------------------------------
117- print("---------------------------- 13")
118- a=np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17])
119- print(a)
120- # Elements from index 0 to index 6 (excluded) with a gap of 3
121- print(a[0:6:3])
122-
123- print("Completed Successfully")
Code 2.7: Dealing with one dimensional NumPy arrays in Python
Output for Code 2.7:
---------------------------- 1
[ 2. 8. 63. -9. 3.2]
The shape of array [ 2. 8. 63. -9. 3.2] is (5,)
---------------------------- 2
8.0
---------------------------- 3
[ 2. 800. 63. -9. 3.2]
---------------------------- 4
Length of a is ... 5
---------------------------- 5
The last element is ... 3.2
The second last element is ... -9.0
---------------------------- 6
[2 3 4 5 6 7 8 9]
[2 5 8]
[-2.3 -1.8 -1.3 -0.8 -0.3 0.2 0.7 1.2]
---------------------------- 7
[1. 1.44444444 1.88888889 2.33333333 2.77777778 3.22222222
3.66666667 4.11111111 4.55555556 5. ]
[-2.3 -1.175 -0.05 1.075 2.2 ]
---------------------------- 8
[ 2. 8. 63. -9. 3.2 0. 2. 4. 8. 6. 4. 7. 9. 6. ]
[ 2. 6. 63. 63. 4.]
[63. 3.2 2. 8. ]
---------------------------- 9
[1 2 3 4 5]
[ 2.3 -9.4]
[ 1. 2. 3. 4. 5. 2.3 -9.4]
---------------------------- 10
[ 1 2 3 4 5 6 7 8 9 10 11]
[ 1 2 4 6 7 8 10 11]
---------------------------- 11
[ 1 2 3 4 5 6 7 8 9 10 11]
1
---------------------------- 12
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.9]
float64
<class 'numpy.ndarray'>
---------------------------- 13
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17]
[0 3]
Completed Successfully
Before we begin to explain Code 2.7, note that it requires the numpy package to be installed. The procedure is
already illustrated in Section 2.3. Line number 5 of Code 2.7 (import numpy as np;) imports this
package. Although import numpy would have also worked, we have nicknamed numpy as np by using the
keyword as (you can use any nickname you want, but np is conventionally used for numpy). This reduces our
typing effort as every time we call a function from numpy library, we have to type
numpy.SOMEFUNCTION. So, instead, we will be typing np.SOMEFUNCTION. Note that a semicolon
after completion of the statement is optional.
In line number 11 of Code 2.7 which is a=np.array([2,8,63,-9,3.2]), a method to create a one-dimensional
array is illustrated. It is trivial to understand that the array that is formed will be stored in the variable a, written
on the left-hand side of the statement. Now, to be able to create an array, we use the NumPy function array(),
and inside it, we pass a list [2,8,63,-9,3.2] as an argument. This creates a NumPy array a. Note that here the list
should have numbers only. If it mixes a char/string with numbers, a numeric array will not be created; NumPy will
instead convert all the elements to a common (string) type, which defeats the purpose of a numeric array. Also note
that instead of passing a list of numbers to the function array(), we could
have passed a tuple of numbers too. Line number 12 simply prints the array and line number 13 prints the shape
of the array by using a shape syntax. Now, first, what is shape? Assume that we have a two dimensional array
with five rows and six columns, its shape will be (5,6). Similarly, here we have a one dimensional array (so a
row or column will not make sense), its shape is simply the number of elements in it. In the present case, there
are five, which can be verified from output.
In order to know the shape, we may use two types of syntax: a.shape and np.shape(a). To understand the
availability of these two syntaxes, we must use some concepts from object-oriented programming. Let
us take an example: we are human beings, but we are different realizations of human beings. Although we both
are human beings, one might be tall, one might be short, one might be 20 years of age, and one might be 35. So,
human being, in this case, is a class and we are two different realizations or objects of that class. None of
us has four legs, because that is not the case with human beings. In a similar way, the NumPy array (ndarray) is a
class, and the variable a is one of its realizations. You may create another variable b of the same class which may
store a different array and hence will have different properties (like shape). So, to know a property of the
object of a class (shape in this case), we use the first syntax mentioned above. It can also be obtained by the second,
alternate syntax. Try both to see if you get the same results. Remember that in the first syntax, you do not
put parentheses after shape, because it is an attribute of the array object and not a function call. If you are through
with this, the rest of the code is easy to follow.
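A minimal sketch showing that the two syntaxes give the same result is:
import numpy as np
a=np.array([2,8,63,-9,3.2])
print(a.shape)       # attribute syntax, prints (5,)
print(np.shape(a))   # function syntax, prints (5,)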
Line number 19 of Code 2.7, which is b=a[1], illustrates how to access the second element (or the element
with index one; remember that numbering in Python starts with zero) of the array so formed. It is important to
note the usage of square brackets for putting the index. Similarly, line number 26, which is a[1]=800, does the
reverse, that is, it changes the value of the element with index one (the second element) of the array and replaces
it with 800. Confirm the result in the output section. Line number 33 is c=len(a), which gives us a way to
calculate the length of the array. Note that for NumPy arrays, len() gives the size of the first dimension, so it
represents the total number of elements only for one-dimensional arrays. Also, the len function does not belong to
the numpy library; it is a function of the Python language itself. The len function can
also be used to calculate the length of lists, tuples, etc.
Line numbers 41 and 42 of Code 2.7 print the last element and the second element from the end of the array by
counting indices backward. We already know that backward counting starts from -1 at the last element.
Line number 51 of Code 2.7 (and similarly lines 49 and 53), which is d=np.arange(2,10,3), illustrates the creation
of an array with a default sequence of numbers. It uses the arange() function of the numpy library, which takes
three arguments: the starting number, the stopping number (excluded from the result), and the gap (step). For
example, d=np.arange(2,10,3) creates a one-dimensional array that starts at two and goes up towards 10 (10
excluded) with a gap of 3. This means d=[2,5,8].
At this point, it is important to recall the range function from Python, which we studied in Section 2.4.8. It
produces the same sequence of numbers, but only as an iteration variable: range does not store the numbers it
generates in memory, it just supplies the next value each time the loop runs. np.arange, on the other hand,
creates an array and stores it in memory. Since range does not consume storage for the whole sequence, it is
more efficient than np.arange when all we need is an iterator.
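The difference can be felt with a small sketch (illustrative, not part of Code 2.7):

import numpy as np

r = range(2, 10, 3)        # lazy: values are produced one at a time while iterating
d = np.arange(2, 10, 3)    # eager: the whole array [2 5 8] is stored in memory

for i in r:                # range is the natural choice for loop iteration
    print(i)               # prints 2, then 5, then 8
print(d)                   # prints [2 5 8]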
For the creation of default arrays, a similar function called linspace appears in line numbers 60 and 62, which
read d=np.linspace(1,5,10). This is a little different from np.arange. The first difference is that the end limit (in
this case five) is included in the result. The second is that the last input argument (10 in this case) does not
represent a gap; it represents the total number of elements to be created between 1 and 5, both ends included,
spaced equally (so the interval from 1 to 5 is divided into nine equal steps to produce ten points). In short,
np.arange() gives you control over the gap, whereas np.linspace() gives you control over the total number of
elements in the created array.
Note: There are two more important functions (not a part of Code 2.7) that are used to create default arrays initialized with all zeros and
all ones respectively; they are np.zeros() and np.ones(). They only take one input argument as a number (for one dimensional array
creation). For example, np.zeros(5) will create [0,0,0,0,0].
Returning to Code 2.7, there are situations when we would like to retrieve several elements of an array at once.
This is demonstrated in line number 69, which creates an array a=np.array([2,8,63,-9,3.2,0,2,4,8,6,4,7,9,6]), and
line number 71, which accesses its elements in an arbitrary order as b=a[[0,-1,2,2,-7]]. Notice that in place of
passing a single index to a[], we can pass a list of indices (or another np.array, as demonstrated in line number
73, but not a tuple). The order of the indices, and the number of times an index is repeated, do not matter. Verify
this in the output of Code 2.7 after output marker eight.
Line numbers 79 to 86 illustrate how arrays can be concatenated. The lines are well commented and self-
explanatory. In line number 94, the deletion of specific elements from an array is illustrated; it reads
a=np.delete(a,[8,2,4]). All we have to do is provide, as the first argument, the array from which we want
elements to be deleted, and as the second argument, a list of indices of the elements to delete.
As illustrated in line number 103, to know the array's dimensionality, one may use the syntax a.ndim or,
alternatively, np.ndim(a). Since our array is one dimensional here, the answer will be one. Verify this after
output marker 11 in the results. As given in line numbers 111 and 112, the datatype of the individual elements
of the array and the type of the array object itself can be seen using a.dtype and type(a) respectively. From the
outputs, you will be able to note that every element is, by default, a float of 64 bits.
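These operations can also be tried in isolation with a small sketch (illustrative values, separate from Code 2.7):

import numpy as np

a = np.array([2, 8, 63, -9, 3.2, 0, 2, 4, 8, 6, 4, 7, 9, 6])
b = a[[0, -1, 2, 2, -7]]          # fancy indexing: repeated and negative indices are allowed
a = np.delete(a, [8, 2, 4])       # delete the elements at indices 8, 2 and 4
print(b)                          # [ 2.  6. 63. 63.  4.]
print(a.ndim, a.dtype, type(a))   # 1 float64 <class 'numpy.ndarray'>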

2.5.2 Defining and dealing with two dimensional arrays


The creation of a two-dimensional (or, for that matter, an n-dimensional) array is not very different from that of
one-dimensional arrays. Code 2.8 therefore revisits, for two-dimensional arrays, the same concepts that we
developed for one-dimensional arrays, with the output given just after it. Although most of the code is self-
explanatory, we will discuss the important points that need attention.
001- #============================================================================
002- # PURPOSE : Dealing with two dimensional arrays
003- #============================================================================
004- import numpy as np
005- #----------------------------------------------------------------------------
006- # Creating a 2D array
007- #----------------------------------------------------------------------------
008-
009- print("---------------------------- 1")
010- a=np.array([(2,5,9,8),(-8,-6,3,0),(7,6,2,7)]) # Array of tuples
011- b=np.array([[2,5,9,8],[-8,-6,3,0],[7,6,2,7]]) # Array of Lists
012- c=np.array([[2,5,9,8],[-8,-6,3,0],(7,6,2,7)]) # Array of mixture
013- print(a,"\n")
014- print(b,"\n")
015- print(c,"\n")
016-
017- #----------------------------------------------------------------------------
018- # Accessing element at index (2,1)
019- #----------------------------------------------------------------------------
020- print("---------------------------- 2")
021- b=a[2,1]
022- print("element at index (2,1) is ... ",b,"\n")
023-
024- #----------------------------------------------------------------------------
025- # Changing the element at position (2,1)
026- #----------------------------------------------------------------------------
027- print("---------------------------- 3")
028- a[2,1]=800
029- print("Changed array is ...",a)
030-
031- #----------------------------------------------------------------------------
032- # Calculating total number of elements (size) and dimensions (Shape) of 'a'
033- #----------------------------------------------------------------------------
034- print("---------------------------- 4")
035- c=np.size(a)
036- print("The size of array is ... ",c,"\nCalculated by another formula ...",a.size)
037- d=np.shape(a)
038- print("The shape of array a is ",d,"\nCalculated by another formula ...",a.shape)
039-
040- #----------------------------------------------------------------------------
041- # Creating array of all zeros/Ones of predefined size (say 3x4)
042- #----------------------------------------------------------------------------
043- print("---------------------------- 5")
044- d=np.zeros((3,4)) # OR d=np.zeros([3,4])
045- print(d,"\n")
046- d=np.ones((3,4))
047- print(d,"\n")
048-
049- #----------------------------------------------------------------------------
050- # Accessing elements of array in arbitrary order/Replacement
051- #----------------------------------------------------------------------------
052- print("---------------------------- 6")
053- a=np.linspace(1,12,12).reshape(3,4)
054- print('array a is : ',a,"\n")
055- # Access elements with index (0,0) & (0,2) & (1,2) & (2,1)
056- b=a[[0,0,1,2],[0,2,2,1]]
057- print(b,"\n")
058- a[np.array([0,0,1,2]),np.array([0,2,2,1])]=100 # For replacing elements by single number
059- print(a,"\n")
060- a[np.array([0,0,1,2]),np.array([0,2,2,1])]=np.array([300,400,500,600])
061- # for replacing by different numbers
062- print(a,"\n")
063-
064- #----------------------------------------------------------------------------
065- # Concatenating two arrays
066- #----------------------------------------------------------------------------
067- print("---------------------------- 7")
068- a=np.array([(1,2),(5,6)])
069- print(a,"\n")
070- b=np.array([(2.3),(-9.4)])
071- print(b,"\n")
072- c=np.vstack((a,b)); # Vertical concatenation
073- # horizontal is not possible here because of dimension mismatch
074- # (otherwise use np.hstack)
075- print(c,"\n")
076-
077- #----------------------------------------------------------------------------
078- # Deleting specific row/column from array
079- #----------------------------------------------------------------------------
080- print("---------------------------- 8")
081- a=np.array([(2,5,9,8),(-8,-6,3,0),(7,6,22,7)]);
082- print(a,"\n")
083- a=np.delete(a,2,1) # Last argument is for direction.
084- # Direction 0=rows, direction 1=columns.
085- # middle argument is the index of element along that direction
086- # here 2nd (starting from 0) column will be deleted
087- print(a,"\n")
088-
089- #-----------------------------------------------------------------------
090- # Obtaining the dimensionality of array
091- #-----------------------------------------------------------------------
092- print("---------------------------- 9")
093- a=np.array([(2,5,9,8),(-8,-6,3,0),(7,6,22,7)]);
094- print(a)
095- print(a.ndim)
096-
097- #-----------------------------------------------------------------------
098- # Obtaining type
099- #-----------------------------------------------------------------------
100- print("---------------------------- 10")
101- a=np.array([(2,5,9,8),(-8,-6,3,0),(7,6,22,7)]);
102- print(a)
103- print(a.dtype)
104- print(type(a))
105-
106- #-----------------------------------------------------------------------
107- # Acessing elements of 2D array in sequence
108- #-----------------------------------------------------------------------
109- print("---------------------------- 11")
110- a=np.array([(2,5,9,8,3,4,5),(-8,-6,3,0,4,5,6),
(7,6,22,7,5,6,7),(6,7,8,100,200,300,400)]);
111- print(a)
112- # Elements from alternate rows and columns
113- print(a[0::2,0::2])
114- # Elements from alternate rows and columns with gap of 3
115- #but index range (0 to 4, 5 excluded)
116- print(a[0::2,0:5:3])
117-
118- print("Completed Successfully")
Code 2.8: Defining and dealing with two dimensional arrays
Output of Code 2.8:
---------------------------- 1
[[ 2 5 9 8]
[-8 -6 3 0]
[ 7 6 2 7]]
[[ 2 5 9 8]
[-8 -6 3 0]
[ 7 6 2 7]]
[[ 2 5 9 8]
[-8 -6 3 0]
[ 7 6 2 7]]
---------------------------- 2
element at index (2,1) is ... 6
---------------------------- 3
Changed array is ... [[ 2 5 9 8]
[ -8 -6 3 0]
[ 7 800 2 7]]
---------------------------- 4
The size of array is ... 12
Calculated by another formula ... 12
The shape of array a is (3, 4)
Calculated by another formula ... (3, 4)
---------------------------- 5
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]
---------------------------- 6
array a is : [[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 10. 11. 12.]]
[ 1. 3. 7. 10.]
[[100. 2. 100. 4.]
[ 5. 6. 100. 8.]
[ 9. 100. 11. 12.]]
[[300. 2. 400. 4.]
[ 5. 6. 500. 8.]
[ 9. 600. 11. 12.]]
---------------------------- 7
[[1 2]
[5 6]]
[ 2.3 -9.4]
[[ 1. 2. ]
[ 5. 6. ]
[ 2.3 -9.4]]
---------------------------- 8
[[ 2 5 9 8]
[-8 -6 3 0]
[ 7 6 22 7]]
[[ 2 5 8]
[-8 -6 0]
[ 7 6 7]]
---------------------------- 9
[[ 2 5 9 8]
[-8 -6 3 0]
[ 7 6 22 7]]
2
---------------------------- 10
[[ 2 5 9 8]
[-8 -6 3 0]
[ 7 6 22 7]]
int32
<class 'numpy.ndarray'>
---------------------------- 11
[[ 2 5 9 8 3 4 5]
[ -8 -6 3 0 4 5 6]
[ 7 6 22 7 5 6 7]
[ 6 7 8 100 200 300 400]]
[[ 2 9 3 5]
[ 7 22 5 7]]
[[2 8]
[7 7]]
Completed Successfully
Line numbers 10, 11, and 12 in Code 2.8 illustrate how a two-dimensional array is created. Note that these are
three alternate syntaxes for creating an array with the same elements. In line number 10, which is
a=np.array([(2,5,9,8),(-8,-6,3,0),(7,6,2,7)]), one may note that np.array() is again the function used to create the
array. Inside this function we pass a list, as indicated by the square brackets []. This list is not a list of simple
numbers; instead, it is a list of tuples (or a list of lists as in line number 11, or a mixture as in line number 12).
Line number 53 of Code 2.8, which is a=np.linspace(1,12,12).reshape(3,4), illustrates a way of creating a default
two-dimensional array. First, np.linspace(1,12,12) creates the array [1,2,3,4,5,6,7,8,9,10,11,12], and then
reshape(3,4) converts it to a two-dimensional array with three rows and four columns. The linspace and reshape
commands can also be applied in two separate steps, but that would require creating one intermediate variable.
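For instance, the same result can also be obtained in two separate steps (a minimal sketch of the equivalent approach):

import numpy as np

tmp = np.linspace(1, 12, 12)   # intermediate one-dimensional array [1. 2. ... 12.]
a = tmp.reshape(3, 4)          # reshaped into 3 rows and 4 columns
print(a)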
In line number 56 of Code 2.8, notice the syntax for accessing several elements of the array at once. It reads
b=a[[0,0,1,2],[0,2,2,1]]. In the first list we pass the row indices only, and in the second list we provide the
corresponding column indices. The rest of the code is self-explanatory; we encourage you to try every line on
the shell or in a code file.
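As a small check of this paired-index idea (a standalone sketch using the same values):

import numpy as np

a = np.linspace(1, 12, 12).reshape(3, 4)
b = a[[0, 0, 1, 2], [0, 2, 2, 1]]   # picks a[0,0], a[0,2], a[1,2] and a[2,1]
print(b)                            # [ 1.  3.  7. 10.]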

2.5.3 Defining and dealing with three dimensional arrays


Dealing with three-dimensional arrays (or arrays with more than two dimensions) can stretch the imagination,
as it is difficult to print a 3D object on a 2D screen or paper. Code 2.9 shows only the creation and display of a
3D array; all the other concepts developed in the previous two sections apply to such arrays as well. Here, we
focus on understanding the structure of 3D arrays while creating and displaying them. The array shown in
Figure 2.8 is the 3D array that we intend to create and display. Notice that it has two slices or frames (dark or
front, which is fully visible, and light or back, which is partially visible). The dark one is frame number zero and
the light one is frame number one. The full contents of frames zero and one are the 2D arrays a and b created in
lines 9 and 10 of Code 2.9, respectively.

Figure 2.8: 3D array that we want to create in numpy


Hence, the array that we create has three rows, four columns, and a height (depth) of two. Now, there are two
ways to look at this 3D array. The first, which is more relevant to image processing, treats the 3D array as a
stack of multiple 2D arrays. The second is more mathematical. Code 2.9 shows both ways of creation and
display, so let us take the first one first. Line numbers 9 to 14 show how to create a 3D array by treating it as
slices of multiple 2D arrays (in our case only two) kept one over the other. First, we create two 2D arrays named
a and b in line numbers 9 and 10. Then, in line number 11, we initialize a 3×4×2 3D array filled with all ones
(you may do it with zeros too). Then, in line numbers 13 and 14, we replace frame 0 with a (they have the same
size) and frame 1 with b. The colon (:) in place of the row and column indices means all rows and all columns.
Using the same syntax in line numbers 19 and 20, we print the array frame by frame, since we cannot print a
3D array onto a 2D screen or paper. Check its output after output marker one in the output section. This is also
the way we will deal with images.
In the second method, a 3D array is treated strictly as a 3D array. Notice the syntax for creating the same array
of Figure 2.8 in line number 33 of Code 2.9; there the whole 3D array is written at once (not in parts/slices). A
3D array has three dimensions: the first is named the row, the second the column, and the third the height. This
order is the default convention in the numpy library and in many other languages too. Instead of calling them
row, column, and height, we may call them dimensions 1, 2, 3, and so on for higher-dimensional cases. When an
n-dimensional array is written at once, the nesting goes from the first (index zero) dimension on the outside to
the last (highest) dimension on the inside, so the innermost lists hold the values along the height. Let us
understand this with the example of Figure 2.8. To take care of the height dimension, we first make three lists
of lists: the first is [[1,10],[2,20],[3,30],[4,40]], the second is [[5,50],[6,60],[7,70],[8,80]], and the third is
[[9,90],[10,100],[11,110],[12,120]]. In each of these, every innermost pair is the height dimension at one row
and column position: [1,10] is the height dimension at the first row and first column, and similarly [10,100] is
the height dimension at the third (index 2) row and second (index 1) column. Now, we make a list of these three
lists and obtain what is written in line number 33 of Code 2.9. From line numbers 34 to 39, we print the array so
created, first slice by slice and then directly as a 3D array.
But we will hardly make use of the second way of printing; we will stick to the first one for images.
01- #======================================================================
02- # PURPOSE : Dealing with Higher dimensional arrays
03- #======================================================================
04- import numpy as np
05-
06- #----------------------------------------------------------------------------
07- # Creating a 3D array (By treating it as slice of multiple 2D arrays)
08- #----------------------------------------------------------------------------
09- a=np.array([(1,2,3,4),(5,6,7,8),(9,10,11,12)])
10- b=np.array([(10,20,30,40),(50,60,70,80),(90,100,110,120)])
11- d=np.ones((3,4,2))
12-
13- d[:,:,0]=a;
14- d[:,:,1]=b;
15- #----------------------------------------------------------------------------
16- # Printing 3D array (slice by slice)
17- #----------------------------------------------------------------------------
18- print("---------------------------- 1")
19- print(d[:,:,0])
20- print(d[:,:,1])
21-
22- #-----------------------------------------------------------------------
23- # Accessing the elements of array
24- #-----------------------------------------------------------------------
25- print("---------------------------- 2")
26- e=d[1,2,1]
27- print("Element at index (1,2,1) is ...",e)
28-
29- #-----------------------------------------------------------------------
30- # Creating a 3D array (By direct method)
31- #-----------------------------------------------------------------------
32- print("---------------------------- 3")
33- f=np.array([[[1,10],[2,20],[3,30],[4,40]],[[5,50],
[6,60],[7,70],[8,80]],[[9,90],[10,100],[11,110],[12,120]]])
34- print("Printing by slice by clice method ...")
35- print(f[:,:,0])
36- print(f[:,:,1])
37- print("Printig Directly ...")
38- print(f)
39- print("Completed Successfully")
Code 2.9: Creating and dealing with 3 dimensional arrays
Output of Code 2.9:
---------------------------- 1
[[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 10. 11. 12.]]
[[ 10. 20. 30. 40.]
[ 50. 60. 70. 80.]
[ 90. 100. 110. 120.]]
---------------------------- 2
Element at index (1,2,1) is ... 70.0
---------------------------- 3
Printing by slice by slice method ...
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
[[ 10 20 30 40]
[ 50 60 70 80]
[ 90 100 110 120]]
Printing Directly ...
[[[ 1 10]
[ 2 20]
[ 3 30]
[ 4 40]]
[[ 5 50]
[ 6 60]
[ 7 70]
[ 8 80]]
[[ 9 90]
[ 10 100]
[ 11 110]
[ 12 120]]]
Completed Successfully

2.5.4 Operations on arrays


Now, we look at some important operations on and between arrays. Refer to Code 2.10 and its output. In lines
10 and 11, we create two numpy arrays to operate on. Line number 11 needs special attention because it shows
how to create an array initialized with random integers in a given range: here we create a 3×4 array filled with
random integers between two and 15 (15 excluded). The code will give a different output on your computer
every time you run it, so we do not know in advance exactly which numbers this array will contain.
From line numbers 20 to 31, we list some element-by-element operations between two arrays of the same
shape. Remember, the shape in this case is 3×4, and the size, which is the total number of elements in the array,
is 12. Element-by-element means that the operation is performed between corresponding elements of the two
arrays. For example, line number 24, which reads c=a*b, performs multiplication element by element. This is
not the mathematical (matrix) multiplication of two arrays; in fact, in the current case the dimensions are
inconsistent for matrix multiplication. Here, element a[i,j] of array a is multiplied with the corresponding
element b[i,j] of array b. Verify this in the results section. Also, note the operators for the remainder (%) and
power (**) in line numbers 28 and 30 respectively.
For the mathematical multiplication of two arrays, notice the usage of the np.matmul command in line number
39 (mat for matrix and mul for multiplication). Line number 47 of Code 2.10 illustrates the way of obtaining a
binary array of the same shape (and of course size) as a given array by testing a given condition. Notice, in the
output after output marker four, that the result is an array with binary (logical, i.e., True/False) elements. If,
however, we desire to retain only those elements that satisfy the given condition and set the rest to zero, we
need the syntax of line number 48. One important use case of searching in an array is finding the indices of the
elements that satisfy a given condition. This is illustrated in line number 56 using the np.where() function. Note
that it returns the result in the form of a tuple. To display the indices as an array, we use the np.asarray()
function as demonstrated in line number 58 (the row indices and the corresponding column indices are grouped
into separate arrays and returned together; notice this after output marker number five).
01- #======================================================================
02- # PURPOSE : Array Operations
03- #======================================================================
04- import numpy as np
05-
06- #---------------------------------------------------------------------
07- # Creating some 2D arrays for use
08- #---------------------------------------------------------------------
09- print("---------------------------- 1")
10- a=np.array([(2,5,9,8),(-8,-6,3,0),(7,6,2,7)]);
11- b=np.random.randint(2,15,(3,4)); # (low,high,size).
12- #creates random integers in low to high range for specified size
13- print('array a is ...',a,"\n")
14- print('array b is ...',b,"\n")
15-
16- #---------------------------------------------------------------------
17- # Element by element operations
18- #---------------------------------------------------------------------
19- print("---------------------------- 2")
20- c=a+b;
21- print('addition of a and b is ... \n',c,"\n")
22- c=a-b;
23- print('subtraction of a and b is ... \n',c,"\n")
24- c=a*b;
25- print('element by element multiplication of a and b is ... \n',c,"\n")
26- c=a/b;
27- print('element by element division of a and b is ... \n',c,"\n")
28- c=a%b;
29- print('element by element remainder of a and b is ... \n',c,"\n")
30- c=a**b;
31- print('element by element power of a and b is ... \n',c,"\n")
32-
33- #---------------------------------------------------------------------
34- # Mathematical operations
35- #---------------------------------------------------------------------
36- print("---------------------------- 3")
37- a=np.array([(1,2),(9,3)])
38- b=np.array([[2],[3]])
39- c=np.matmul(a,b) # Mathematical multiplication
40- print('Mathematical multiplication of a and b is ... \n',c,"\n")
41-
42- #---------------------------------------------------------------------
43- # Identifying Elements based on special conditions
44- #---------------------------------------------------------------------
45- print("---------------------------- 4")
46- a=np.array([[1,4,6],[2,1,4],[0,4,5]])
47- print("The required array is ...\n",a>3,"\n")
48- print("The corresponding elements in array are ... \n",a*(a>3),"\n")
49-
50- #---------------------------------------------------------------------
51- # WHERE command (Gives the indices of the required condition)
52- #---------------------------------------------------------------------
53- print("---------------------------- 5")
54- a=np.random.randint(3,20,(3,5)) # random array of size 3x5 with elements lying between 3,20 is created
55- print(a,"\n")
56- b=np.where((a<10)|(a>14)) # this is returned as tuple.
57- # Brackets around individual conditions above are absolutely necessary
58- b1=np.asarray(b) # tuple to numpy array conversion (abbreviation: as array)
59- print(b,"\n") # output as tuple
60- print(b1,"\n") # output as numpy array
61- print('And those values are ...',a[b],"\n") # here b1 won't work
62-
63- print("Completed Successfully ... ")
Code 2.10: Operations on arrays
Output of Code 2.10:
---------------------------- 1
array a is ... [[ 2 5 9 8]
[-8 -6 3 0]
[ 7 6 2 7]]
array b is ... [[ 6 10 14 11]
[10 8 11 3]
[ 8 6 12 7]]
---------------------------- 2
addition of a and b is ...
[[ 8 15 23 19]
[ 2 2 14 3]
[15 12 14 14]]
subtraction of a and b is ...
[[ -4 -5 -5 -3]
[-18 -14 -8 -3]
[ -1 0 -10 0]]
element by element multiplication of a and b is ...
[[ 12 50 126 88]
[-80 -48 33 0]
[ 56 36 24 49]]
element by element division of a and b is ...
[[ 0.33333333 0.5 0.64285714 0.72727273]
[-0.8 -0.75 0.27272727 0. ]
[ 0.875 1. 0.16666667 1. ]]
element by element remainder of a and b is ...
[[2 5 9 8]
[2 2 3 0]
[7 0 2 0]]
element by element power of a and b is ...
[[ 64 9765625 1796636465 0]
[1073741824 1679616 177147 0]
[ 5764801 46656 4096 823543]]
---------------------------- 3
Mathematical multiplication of a and b is ...
[[ 8]
[27]]
---------------------------- 4
The required array is ...
[[False True True]
[False False True]
[False True True]]
The corresponding elements in array are ...
[[0 4 6]
[0 0 4]
[0 4 5]]
---------------------------- 5
[[ 3 3 4 17 6]
[ 4 4 8 15 11]
[18 9 12 18 4]]
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], dtype=int64), array([0, 1, 2, 3, 4, 0, 1, 2, 3, 0, 1, 3, 4], dtype=int64))
[[0 0 0 0 0 1 1 1 1 2 2 2 2]
[0 1 2 3 4 0 1 2 3 0 1 3 4]]
And those values are ... [ 3 3 4 17 6 4 4 8 15 18 9 18 4]
Completed Successfully ...

2.5.5 Important points to remember about arrays


Now, we will talk about creating sub-arrays from a given array, or sub-lists from a given list. There are two
ways to do this, depending on the need: the first is to create a view, and the second is to create a copy. Let us
understand the difference between the two by referring to Code 2.11:
001- #======================================================================
002- # PURPOSE : Array Operations (Important Points)
003- #======================================================================
004- import numpy as np
005-
006- #----------------------------------------------------------------------
007- # VIEW (np.array & list)
008- #----------------------------------------------------------------------
009- print("\n---------------------------- 1 ")
010- a=np.array([1,2,5,7,2,3])
011- print("a is \n",a)
012- b=a
013- b[3]=3000 # This will reflect the change in both a & b (Hence it's a view)
014- print("a is \n",a)
015- print('b is \n',b)
016-
017- #---------------------------------------------------------------------
018- # COPY (np.array & List)
019- #---------------------------------------------------------------------
020- print("\n---------------------------- 2 ")
021- a=np.array([1,2,5,7,2,3])
022- print("\n\n a is \n",a)
023- b=a.copy()
024- b[3]=3000 # This will NOT reflect the change in both a (Hence it's a copy)
025- print("a is \n",a)
026- print('b is \n',b)
027-
028- #---------------------------------------------------------------------
029- # np.array to list
030- #---------------------------------------------------------------------
031- print("\n---------------------------- 3 ")
032- a=np.array([1,2,5,7,2,3])
033- print("a is \n",a)
034- b=a.tolist()
035- b[3]=3000 # This will NOT reflect the change in both a (Hence it's a copy)
036- print("a is \n",a,"\n data type of a is ",type(a))
037- print('b is \n',b,"\n data type of b is ",type(b))
038-
039- #---------------------------------------------------------------------
040- # list to np.array
041- #---------------------------------------------------------------------
042- print("\n---------------------------- 4 ")
043- a=[1,2,5,7,2,3]
044- print("a is \n",a)
045- b=np.array(a)
046- b[3]=3000 # This will NOT reflect the change in both a (Hence it's a copy)
047- print("a is \n",a,"\n data type of a is ",type(a))
048- print('b is \n',b,"\n data type of b is ",type(b))
049-
050- #---------------------------------------------------------------------
051- # SLICED VIEW (np.array vs List)
052- #---------------------------------------------------------------------
053- print("\n---------------------------- 5 ")
054- a=[1,2,5,7,2,3]
055- print("a is \n",a)
056- b=a
057- b[3]=3000 # This will reflect the change in both a & b (Hence it's a view)
058- print("a is \n",a)
059- print('b is \n',b)
060-
061- c=b[0:3]
062- c[2]=2000 # Sliced array is copy for a list but for np.array, its view (IMP)
063-
064- print("b is \n",b)
065- print('c is \n',c)
066-
067-
068- #----- but for np.array, story is different ------------
069-
070- print("\n---------------------------- 6 ")
071- a=np.array([1,2,5,7,2,3])
072- print("a is \n",a)
073- b=a
074- b[3]=3000 # This will reflect the change in both a & b (Hence it's a view)
075- print("a is \n",a)
076- print('b is \n',b)
077-
078- c=b[0:3]
079- c[2]=2000 # Sliced array is copy for a list but for np.array, its view (IMP)
080-
081- print("b is \n",b)
082- print('c is \n',c)
083-
084- print("Completed Successfully ... ")
Code 2.11: View and copy
Output of Code 2.11:
---------------------------- 1
a is
[1 2 5 7 2 3]
a is
[ 1 2 5 3000 2 3]
b is
[ 1 2 5 3000 2 3]
---------------------------- 2
a is
[1 2 5 7 2 3]
a is
[1 2 5 7 2 3]
b is
[ 1 2 5 3000 2 3]
---------------------------- 3
a is
[1 2 5 7 2 3]
a is
[1 2 5 7 2 3]
data type of a is <class 'numpy.ndarray'>
b is
[1, 2, 5, 3000, 2, 3]
data type of b is <class 'list'>
---------------------------- 4
a is
[1, 2, 5, 7, 2, 3]
a is
[1, 2, 5, 7, 2, 3]
data type of a is <class 'list'>
b is
[ 1 2 5 3000 2 3]
data type of b is <class 'numpy.ndarray'>
---------------------------- 5
a is
[1, 2, 5, 7, 2, 3]
a is
[1, 2, 5, 3000, 2, 3]
b is
[1, 2, 5, 3000, 2, 3]
b is
[1, 2, 5, 3000, 2, 3]
c is
[1, 2, 2000]
---------------------------- 6
a is
[1 2 5 7 2 3]
a is
[ 1 2 5 3000 2 3]
b is
[ 1 2 5 3000 2 3]
b is
[ 1 2 2000 3000 2 3]
c is
[ 1 2 2000]
Completed Successfully ...
Refer to line numbers 10 to 15 of Code 2.11. In line number 10, an array a is created. In line number 12, we
write b=a. This line simply means that the same array a will also be called by the name b; it does not mean that
another variable b is created whose value equals a. This can be verified by the next line, b[3]=3000: in the
following two lines, the values of a and b are printed, and the change is reflected in both. We say b is a view of
a. Whether you change something through a or through b, the change will be visible through both names. Lines
21 to 26 demonstrate the creation of a copy. Notice the syntax b=a.copy() in line number 23. This creates
another variable b with the same values as a, and a change in one of them will not be reflected in the other. Both
view and copy have their uses: a view does not create another array in memory, whereas a copy does. Also note
that although we have demonstrated this for an array, the same concept applies to lists too.
Line numbers 30 to 48 of Code 2.11 illustrate the functions for converting a list to an array and vice versa. Lines
53 to 84 illustrate the concept of view and copy again, but for sliced arrays and lists. Earlier, in lines 10 to 26,
we did not take a slice; the full array/list was used to create the view/copy. Read line numbers 53 to 84 and
convince yourself, by examining the corresponding output, that a sliced array is a copy for a list but a view for a
numpy array.
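One quick way to convince yourself of this is np.shares_memory, which is not used in Code 2.11 but is handy for such checks (a minimal sketch):

import numpy as np

a = np.array([1, 2, 5, 7, 2, 3])
c = a[0:3]                        # slicing a numpy array gives a view
print(np.shares_memory(a, c))     # True: c shares a's memory, so changing c changes a

lst = [1, 2, 5, 7, 2, 3]
sub = lst[0:3]                    # slicing a list gives a new (copied) list
sub[2] = 2000
print(lst)                        # unchanged: [1, 2, 5, 7, 2, 3]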

2.6 Matplotlib library


As they say on matplotlib.org, Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python. With it, we will be able to display data in the form of nice plots.
Remember, to be able to use the matplotlib library, install it using the method illustrated in Section 2.3 with the
command: pip install matplotlib.

2.6.1 Plotting simple graphs


To begin learning how the matplotlib library works, refer to Code 2.12. Remember that at this stage we only
make simple, useful plots; matplotlib has much more to offer.
01- #======================================================================
02- # PURPOSE : Learning simple plots (USING MATPLOTLIB)
03- #======================================================================
04- import matplotlib.pyplot as plt
05- import numpy as np
06-
07- #----------------------------------------------------------------------
08- # Making Simple plots
09- #----------------------------------------------------------------------
10-
11- Fs=1000
12- T=1/Fs
13- L=1001 # Keep this odd for ease of Mathematics
14- t=np.linspace(0,L-1,L)*T
15-
16- f1=5
17- sig1=np.sin(2*np.pi*f1*t)
18- f2=2
19- sig2=np.sin(2*np.pi*f2*t)
20- sig3=t;
21-
22- fig1,ax1=plt.subplots()
23- fig1.show()
24- plt.plot(t,sig1,label='sine curve',color='g',marker='.')
25- plt.plot(t,sig2,label='cosine curve',color='k')
26- plt.plot(t,sig3,label='Line',linestyle='dashed',linewidth=3)
27- plt.grid()
28- plt.title("Input Signal",fontsize=20) # ALTERNATIVE ax1.set_title("Input Signal")
29- plt.xlabel("The Time Axis",fontsize=15) # ALTERNATIVE ax1.set_xlabel("The Time Axis")
30- ax1.set_ylabel("The Amplitude axis",fontsize=15) # ALTERNATIVE plt.ylabel("The Amplitude
axis",fontsize=15)
31- plt.legend()
32-
33- sig4=np.array([2,3,5,6,7,8,5,3,4,5])
34- sig5=np.array([9,8,7,8.8,4.5,2.09,3,1,2,3])
35- x=np.linspace(0,9,10)
36- fig2,ax2=plt.subplots()
37- fig2.show()
38- ax2.stem(x,sig4,label='signal 4')
39- plt.stem(x,sig5,'r-.','g>',label='signal 5')
40- plt.grid()
41- plt.title("Input Signal",fontsize=20)
42- plt.xlabel("The Time Axis",fontsize=15)
43- ax2.set_ylabel("The Amplitude axis",fontsize=15)
44- plt.legend()
45-
46- plt.show()
47- print("Completed Successfully")
Code 2.12: Exploring Matplotlib
Output of Code 2.12:
Figure 2.9: Continuous plot
And the second output is mentioned below:

Figure 2.10: Discrete plot


Before going through Code 2.12 in detail, let us look at Figure 2.9, which shows the actual output generated by
the first half of this code. Notice that the plot contains three curves (two sine curves and one line) in different
colors and line styles (dashed, continuous, thin, thick, etc.). The figure has a title and labels for the x and y axes.
It also has a legend naming the three curves. The grid in the background is switched on for better readability.
Now, through the code, we will try to understand how to generate this.
Line number four of Code 2.12 which is import matplotlib.pyplot as plt, imports matplotlib.pyplot by the
nickname plt. Pyplot is for plotting graphs in Python. Note that the numpy library is also imported because it
will also be used in the code. Line number 14, which is t=np.linspace(0,L-1,L)*T, creates the time axis for the
curves to be plotted. From the nature of the linspace command, it creates L numbers from zero to L-1, i.e.,
[0,1,2,3, … (L-1)]; multiplying by T then converts this to [0,T,2T,3T … (L-1)T]. T can be chosen arbitrarily
(usually small); physically, T is the spacing between two samples. In engineering, the inverse of T is called the
sampling frequency Fs, as defined in line number 11. A value of Fs=1000 means there will be 1000 samples in
one second on the time axis.
In line number 17 which is sig1=np.sin(2*np.pi*f1*t), signal 1 is created. It has the format Sin(2πft) where f is
the frequency of the signal (set in line number 16). Notice that t is the time samples that we have just created.
Hence, sig1 will be an array. Similarly, signal 2 and signal 3 are created in line numbers 19 and 20. Notice that
signal 3 is a ramp signal. In line number 27 which is plt.grid(), we made the background grid in the plot for
better readability of values on the y axis.
Line numbers 22 to 31 plot the actual figure. Line 22, which is fig1,ax1=plt.subplots(), creates an empty figure
(which is not displayed yet). This command has two output arguments, fig1 and ax1. These are handles to the
figure; they are just ways of referring to the figure that we intend to plot. We will talk about them in detail in
the next code, where we study creating subplots. Line number 23, i.e., fig1.show(), displays the figure (which is
still empty). The plt.plot command in lines 24, 25, and 26 plots the three functions along with properties like
color, line style, etc. For example, in line number 26, which is
plt.plot(t,sig3,label='Line',linestyle='dashed',linewidth=3), it would have been sufficient to write plt.plot(t,sig3),
but that would plot sig3 vs. t in the default color and thickness. All the other arguments are optional and are
used for styling the graph: here we plot sig3 in dashed format with linewidth=3, and we label it 'Line' since it is
not a sine signal. Try playing with these to see the change in the plot. Lines 28, 29, and 30 put the title and the
labels on the x-axis and y-axis, with self-explanatory syntax. Notice that an alternate syntax for the same is also
given, which will become clearer when we study sub-plotting soon. Then, refer to line number 31, which is
plt.legend(). It puts all the labels of the curves in the form of a legend on the graph. Remember, with plt.plot()
we may define a label for each curve, but the summary showing all the labels will not appear until the
plt.legend() command is issued.
Although the signals that we have plotted are discrete, the plot looks continuous because the sample spacing is
small and because plt.plot draws the curve by joining the amplitudes of neighbouring samples. If, however, we
want discrete plots, we must use the plt.stem function instead of plt.plot, as demonstrated in the remaining part
of the code for the second figure. There, sig4 and sig5, which are discrete signals, are plotted with different
properties.
Note: Line number 46, which is plt.show(), is not required in some integrated development environments (IDEs), but in others the plots
will not be displayed on screen if it is omitted.

2.6.2 Making use of subplots


In the previous section, we saw how to draw simple graphs. In this section, we will make subplots. To see what
a subplot is, look at Figure 2.11. In that figure (which is called the plot), there are six subplots (called axes). For
illustration, only two of the six axes have been used to draw graphs; four are left empty. The two graphs are the
same ones we used in the previous section. The following code demonstrates the procedure:
01- #======================================================================
02- # PURPOSE : Learning simple plots (USING MATPLOTLIB)
03- #======================================================================
04- import matplotlib.pyplot as plt, numpy as np
05-
06- #-----------------------------------------------------------------
07- # Making Subplots
08- #-----------------------------------------------------------------
09-
10- Fs=1000
11- T=1/Fs
12- L=1001 # Keep this odd for ease of Mathematics
13- t=np.linspace(0,L-1,L)*T
14-
15- f1=5
16- sig1=np.sin(2*np.pi*f1*t)
17- f2=2
18- sig2=np.sin(2*np.pi*f2*t)
19- sig3=t;
20-
21- fig1,ax1=plt.subplots(2,3)
22- fig1.show()
23- ax1[0,0].plot(t,sig1,label='sine curve',color='g',marker='.')
24- ax1[0,0].plot(t,sig2,label='cosine curve',color='k')
25- ax1[0,0].plot(t,sig3,label='Line',linestyle='dashed',linewidth=3)
26- ax1[0,0].grid()
27- ax1[0,0].set_title("Input Signal",fontsize=20)
28- ax1[0,0].set_xlabel("The Time Axis",fontsize=15)
29- ax1[0,0].set_ylabel("The Amplitude axis",fontsize=15)
30- ax1[0,0].legend()
31-
32- sig4=np.array([2,3,5,6,7,8,5,3,4,5])
33- sig5=np.array([9,8,7,8.8,4.5,2.09,3,1,2,3])
34- x=np.linspace(0,9,10)
35- ax1[1,1].stem(x,sig4,label='signal 4')
36- ax1[1,1].stem(x,sig5,'r-.','g>',label='signal 5')
37- ax1[1,1].grid()
38- ax1[1,1].set_title("Input Signal",fontsize=20)
39- ax1[1,1].set_xlabel("The Time Axis",fontsize=15)
40- ax1[1,1].set_ylabel("The Amplitude axis",fontsize=15)
41- ax1[1,1].legend()
42-
43- plt.show()
44- print("Completed Successfully")
Code 2.13: Creating subplots
Output for Code 2.13 is demonstrated in Figure 2.11:

Figure 2.11: Subplots in Python


Refer to Code 2.13 for the code. Most of it is the same as Code 2.12, so we will highlight only the points of
difference here. Notice line number 21, which is fig1,ax1=plt.subplots(2,3); the input arguments are no longer
empty. They specify the number of rows and columns of axes that we want in the final plot (figure). Now, let us
understand the meaning of the outputs fig1 and ax1 (remember these are just names, and hence can be anything
you desire). In a program there might be multiple figures (plots), so to refer to one particular figure (plot) we
use the corresponding figure handle. Similarly, one figure (plot) may contain multiple subplots (axes), and we
refer to them using the corresponding axes variable (in the current case ax1). In line numbers 23 to 30, instead
of using plt.plot (which would also work), we have used ax1[0,0].plot() to make the code clearer (this is an
alternate syntax). We used [0,0] because we wanted to draw on the first axes of the figure (plot). The rest of the
code is self-explanatory.
Note: If you are stuck, the help() function on the shell comes in handy.

2.7 OpenCV library


OpenCV is an open source computer vision library. We will make frequent use of it for importing, displaying,
manipulating, and saving images. Before use, install it using the method illustrated in Section 2.3 with the pip
install opencv-python command. Also, keep the code files and the images used by the code in the same folder
on your computer; otherwise, you will need to provide the full path to the images every time.

2.7.1 Importing and displaying the images in default OpenCV way


In order to illustrate the basic operations, we will use the images as shown in Figure 2.12:

Figure 2.12: Images used for import and display. (a) The Gwalior Fort Image (b) River Ganges Image
Once installed, OpenCV can be used by entering the statement import cv2, as shown in line number 4 of Code
2.14. Since its name is already short, i.e., cv2, we will not assign a nickname to it. Line number 7, which reads
input_image1=cv2.imread('img1.bmp',1), shows the way to import an image in Python. Here, input_image1
will be an array of 8-bit unsigned integers. Since it is a colored image, there will be three frames, and hence
input_image1 is a 3D array. The cv2.imread function is used to read images: the first argument is the name of
the image file with its extension, and the second argument tells whether we want to import it in colored or
grayscale mode, 1 standing for color and 0 for grayscale. In line number 8, we import another image in
grayscale mode (although it was colored originally).
01- #======================================================================
02- # PURPOSE : Import/Displaying Images in OPENCV's Default way
03- #======================================================================
04- import cv2
05-
06- # Read the image
07- input_image1=cv2.imread('img1.bmp',1)
08- input_image2=cv2.imread('img2.bmp',0)
09- # Second argument in above command is 1 for reading image as colored
10- # and 0 for reading it as grayscale
11-
12- # Display the image
13- cv2.imshow('First Image',input_image1)
14- cv2.waitKey(2000) # Put time in milliseconds here
15- cv2.destroyAllWindows()
16-
17- cv2.imshow('Second Image',input_image2)
18- cv2.waitKey(0) # For 0, python waits for keypress
19- cv2.destroyAllWindows()
20-
21- # Print shape of image
22- a1=input_image1.shape;
23- a2=input_image2.shape;
24- print('The shape of image is 1 ... ',a1,'\nThe shape of image is 2 ... ',a2)
25-
26- print("Completed Successfully ...")
Code 2.14: Basic image import and display using OpenCV’s default way
Output of Code 2.14 (on Python shell):
The shape of image is 1 ... (384, 512, 3)
The shape of image is 2 ... (353, 500)
Completed Successfully ...
Line numbers 13 to 19 of Code 2.14 show two ways of displaying images in sequence. The first image will be
displayed for 2000 milliseconds (i.e., two seconds) and then closed automatically. Next, the second image will
be displayed; do not be surprised to see it in grayscale mode, as it was imported in that mode to begin with. It
stays on screen until the user presses a key. Once a key is pressed, the shapes of the images are printed on the
Python shell. It can be noted from the output that the first image (which was imported in colored mode) has 384
rows, 512 columns, and a height of three corresponding to the three color frames (blue, green, and red, in
OpenCV's order). The second image has only 353 rows and 500 columns; the height parameter is absent as it is
only a 2D image, as discussed earlier in Sections 1.2.1 and 1.2.2.

2.7.2 Displaying the images using matplotlib’s default way


Although OpenCV's default way of displaying images works fine, we may need to display multiple images
simultaneously, or perhaps in a single figure (plot) that has multiple axes (subplots). So, in this section, we
explore matplotlib's way of displaying images with these requirements in mind. Refer to Code 2.15. The images
are still imported using the OpenCV package, as in the previous code; we only change the display part here.
Notice that in line numbers 13 and 14, we create an empty matplotlib figure with two axes, one for the colored
image and the other for the grayscale image (although, in theory, one could display graphs there too). In line
number 16, which reads plt.subplot(1,2,1), we decide where to put the first image: the first two arguments are
the axes grid size, which is 1×2 (already fixed), and the third argument is 1 because we want to plot in the first
axes. In line number 17, we convert the color format from BGR to RGB, since BGR is OpenCV's default order
but RGB is required for display. Keep in mind that, by default, the first frame of an image read with OpenCV is
B, then G, then R; it is not RGB, it is BGR. In line number 18, which reads plt.axis('off'), we switch the x and y
axis markings off, because by default matplotlib draws axis ticks and labels; try removing this line from the
code and see the result. On images, such markings are not required at all. Then, in line number 19, we put the
title of the image (but the syntax is a little longer). The same process is repeated in the code to display the
second image.
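As a quick illustration of the channel order (a minimal sketch, assuming the file img1.bmp is available in the working folder; reversing the last axis is equivalent to the cv2.cvtColor call used in the code):

import cv2

img_bgr = cv2.imread('img1.bmp', 1)   # OpenCV loads the frames in B, G, R order
blue = img_bgr[:, :, 0]               # frame 0 is therefore the blue channel
img_rgb = img_bgr[:, :, ::-1]         # reversing the channel axis gives R, G, B
# equivalent: img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)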
The output can be seen in Figure 2.13. Here, the figure will not disappear after some time or after a keypress.
We can open multiple such figures showing different images and plots simultaneously.
01- #======================================================================
02- # PURPOSE : Display images the matplotlib way
03- #======================================================================
04- import cv2, matplotlib.pyplot as plt,numpy as np
05-
06- # Read the image in OpenCV way
07- input_image1=cv2.imread('img1.bmp',1)
08- input_image2=cv2.imread('img2.bmp',0)
09- # Second argument in above command is 1 for reading image as colored
10- # and 0 for reading it as grayscale
11-
12- # Displaying images with Matplotlib
13- fig1,ax=plt.subplots(1,2)
14- fig1.show()
15-
16- plt.subplot(1,2,1)
17- plt.imshow(cv2.cvtColor(input_image1, cv2.COLOR_BGR2RGB))
18- plt.axis('off')
19- plt.gca().set_title("Gwalior Fort (Colored Image)")
20-
21- plt.subplot(1,2,2)
22- plt.imshow(cv2.cvtColor(input_image2, cv2.COLOR_BGR2RGB))
23- plt.axis("off")
24- plt.gca().set_title('River Ganga (Grayscale image)')
25-
26- cv2.imwrite("result1.bmp",input_image1)
27- cv2.imwrite("result2.bmp",input_image2)
28-
29- plt.show()
30- print("Completed Successfully ...")
Code 2.15: Displaying images in matplotlib's default way
Output of Code 2.15 can be seen in the following Figure 2.13:
Figure 2.13: Matplotlib's way of displaying images
So, we achieved what we wanted and overcame the difficulties present in OpenCV's display method. The cost
we paid is extra syntax: the BGR to RGB conversion, switching the axis off every time we display an image,
and the longer syntax for putting the title. In the next section, we present a custom-made package that, when
imported in Python, removes all these inconveniences so that we can display an image the intended way in just
one line of code. Along the way, we will also learn how to create packages in Python.

2.7.3 Creating packages in Python


In this section, let us learn, in generic terms, how to create a package (like numpy and matplotlib) and use it.
We will create a package for displaying images such that, unlike OpenCV, we do not have to specify how long
an image should stay on screen and, unlike matplotlib, we do not have to write extra syntax every time for
converting BGR to RGB, for putting the title, or for switching the axis markings off. We want a package that we
can simply import like any other package and whose functions we can call; one of those functions will be our
desired image display function.
From now on, let us make it a standard practice to keep all code files and the relevant data (images, video,
audio, Excel files, etc.) in the same folder. Although, in theory, we may keep them anywhere on the computer
or even on a network, we would then have to provide their path every time we use them. So, in whatever
directory (folder on Windows) you are working in, create a new folder for the package and give it a suitable
name like my_package. Inside that folder, create one file named __init__.py (a regular Python file named
__init__). You do not have to write anything inside this file; it is there just to initialize the package. Now, create
the file where you will write your custom functions. In the current case, we create a Python file
my_functions.py. So, the my_package folder now has two files: one empty, and another where we are going to
write our functions.
Before looking at what is written inside the my_functions file, assume that it is already created and let us use it
to display some images in a code; we will see its contents afterwards. So, refer to Code 2.16. The same images
used in the earlier codes are displayed again, and the output will resemble Figure 2.13. The point of difference
is the usage of our custom-created package and the ease of use it brings.
01- #======================================================================
02- # PURPOSE : Defining custom package and use it in future
03- #======================================================================
04-
05- import cv2, matplotlib.pyplot as plt,numpy as np
06- import my_package.my_functions as mf # This is a user defined package
07-
08- # Read the image
09- input_image1=cv2.imread('img1.bmp',1)
10- input_image2=cv2.imread('img2.bmp',0)
11- # Second argument in above command is 1 for reading image as colored
12- # and 0 for reading it as grayscale
13-
14- # Displaying images with Matplotlib
15- fig1,ax1=plt.subplots(1,2)
16- fig1.show()
17-
18- mf.my_imshow(input_image1,"The Gwalior Fort",ax1[0])
19- mf.my_imshow(input_image2,"The Ganges",ax1[1])
20- cv2.imwrite("result1.bmp",input_image1)
21- cv2.imwrite("result2.bmp",input_image2)
22-
23- plt.show()
24- print("Completed Successfully ...")
Code 2.16: Displaying images using a custom package named – my_package
Notice line number 6 in Code 2.16. Here, we import the package my_package that we created, and the file
my_functions inside it, using the syntax import my_package.my_functions as mf; note that it is nicknamed mf.
In line numbers 18 and 19, the two images are displayed using just one line each: there is no display time to
specify, no BGR to RGB conversion, no axis markings to switch off, and no additional command for the title.
All of this is handled by the contents of the my_functions file through a function called my_imshow. It takes
three arguments: the array to be displayed as an image, its title, and the axes on which it is to be displayed.
From now onwards, we will frequently use this method to display images in this book; you may use the other
alternatives too.
Code 2.17 lists the contents of the my_functions file. The file has two functions: my_imshow, which we have
just used in Code 2.16, and norm_uint8, which we might use in the future with due notification.
01- #======================================================================
02- # Purpose : creating custom functions for usage
03- #======================================================================
04-
05- import matplotlib.pyplot as plt
06- import cv2
07- import numpy as np
08-
09- # Function my_imshow is used to display image using matplotlib method and is
10- # usually used with subplots
11-
12- def my_imshow(input_image,str='',ax='Nothing Passed'):
13- if ax=='Nothing Passed':
14- fig1,ax1=plt.subplots()
15- fig1.show()
16- if input_image.ndim==3:
17- plt.imshow(cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB))
18- else:
19- plt.imshow(cv2.cvtColor(input_image, cv2.COLOR_GRAY2RGB))
20- plt.axis("off")
21- plt.gca().set_title(str)
22- return fig1,ax1
23- else:
24- if input_image.ndim==3:
25- ax.imshow(cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB))
26- else:
27- ax.imshow(cv2.cvtColor(input_image, cv2.COLOR_GRAY2RGB))
28- ax.axis("off")
29- ax.set_title(str)
30-
31- # Function norm_uint8 defined below,
32- # Normalises and convert to uint8 an input_image and return the same, usually
33- # used while displaying a grayscale image which has been converted to float32
34- # during processing
35-
36- def norm_uint8(input_image):
37- input_image=input_image-np.min(input_image)
38- input_image=255*input_image/np.max(input_image)
39- input_image=np.uint8(input_image)
40- return input_image
Code 2.17: Contents of my_functions.py file
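As an example of how norm_uint8 might be used later (a hypothetical snippet, not one of the book's codes), suppose some processing produced a float32 result that must be brought back to the 0 to 255 range before display:

import numpy as np
import my_package.my_functions as mf               # the custom package created above

processed = np.float32([[0.2, -1.5], [3.7, 0.0]])  # hypothetical float32 result of some processing
displayable = mf.norm_uint8(processed)             # rescaled to 0..255 and converted to uint8
print(displayable.dtype, displayable.max())        # uint8 255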

2.8 Pandas library


Pandas is one of the most powerful and flexible open source data analysis and manipulation tools available. In
this section, we will see it in the context of manipulating Excel files. Refer to Section 2.3 for installing
packages; follow the procedure illustrated there, using pip install pandas to install the Pandas library and pip
install xlsxwriter to install a library that handles writing/saving Excel sheets. With this, we are ready to go. If
running code written with the above two libraries shows an error about a missing dependency such as openpyxl,
install it with pip install openpyxl (or whichever dependency is named). Not everyone will get this error; as
versions of Python and its libraries change, the dependencies of packages may also change. A dependency is
simply another package that an installed package needs for its execution. Most of the time dependencies are
installed automatically, but if not, an error message naming the missing dependency is shown while running the
code; check it for your case (if any) and install it.
The central data structure in the pandas library is called a DataFrame. It is a two-dimensional, shape-mutable
(which means the shape of the table can be changed as the code requires: new rows or columns may be added
and old ones deleted), and potentially heterogeneous (meaning the table can store different datatypes at the same
time, unlike, say, an integer array in C that stores integers only) tabular data structure. It can be thought of as a
spreadsheet or an SQL table, where data is organized in rows and columns. For example, in line number 15 of
the code below, the variable s1 will be a DataFrame. This can be checked by running type(s1) in the Python
shell; the output will be <class 'pandas.core.frame.DataFrame'>.
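For a first feel of what a DataFrame looks like, here is a tiny sketch with made-up data (independent of the Excel file used below):

import pandas as pd

df = pd.DataFrame({'Name': ['Manish', 'Rajesh'],       # a column of strings
                   'Marks in Physics': [55, 38]})      # a column of integers
print(df)
print(type(df))   # <class 'pandas.core.frame.DataFrame'>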
In order to understand the usage of the Pandas library, refer to Figure 2.14, which is an image of the Excel file
that we are going to use. The Excel file has the name Marks Entry.xlsx. It has two sheets, named Hi and Hello.
The contents of sheet number 0, i.e., Hi, are displayed in the figure, and sheet number 1, named Hello, is blank.
We will use this file to understand the pandas library. Now, refer to Code 2.18 for further discussion.
Figure 2.14: An Excel file named Marks Entry.xlsx
To understand Code 2.18, refer to the comments written alongside the commands. Notice, in particular, the use
of iloc, as its indexing behaviour is a little different from that of the numpy library.
01- #======================================================================
02- # PURPOSE : Learning How to deal with EXCEL files using PANDAS
03- #======================================================================
04- import pandas as pd
05- import xlsxwriter
06-
07- # Creating File Handle to EXCEL File
08- file1=pd.ExcelFile("Marks Entry.xlsx")
09-
10- # Printing all sheet names available
11- print('............................................... 1')
12- print(file1.sheet_names)
13-
14- # Accessing Sheet 1 (NOTICE the argument as sheet name 'Hi' for that below)
15- s1=file1.parse('Hi') # Alternatively 0 can be passed instead of sheet name
16- # as it's the first sheet and numbering begins from 0 in Python
17-
18- # Accessing Sheet 2 (NOTICE the argument 1 for that below)
19- s2=file1.parse(1)
20-
21- # Display all column names of sheet s1
22- print('............................................... 2')
23- print(s1.columns)
24-
25- # Access one column from s1
26- print('............................................... 3')
27- print(s1['Name'])
28-
29- # Access multiple columns from s1
30- print('............................................... 4')
31- print(s1[['Name',"S.No.",'Name']])
32-
33- # Access any one row from s1
34- print('............................................... 5')
35- print(s1.iloc[2]) # Counting starts from 0 as usual but header is not counted !
36-
37- # Display first n rows (header counted)
38- print('............................................... 6')
39- print(s1.head(3))
40-
41- # Access arbitrary rows from s1
42- print('............................................... 7')
43- print(s1.iloc[[2,4,0]]) # Counting starts from 0 as usual but header is not counted !
44-
45- # Access a cell from s1
46- print('............................................... 8')
47- print(s1.iloc[0,1])
48-
49- # write sheets (dataframes) to excel file
50- data2excel=pd.ExcelWriter('output1.xlsx',engine='xlsxwriter')
51- s1.to_excel(data2excel,sheet_name='result1')
52- data2excel.save()
53-
54- # Sorting by ONLY ONE Column
55- print('............................................... 9')
56- var1=s1['Name'].sort_values(ascending=True)
57- print(var1)
58-
59- data2excel=pd.ExcelWriter('output2.xlsx',engine='xlsxwriter')
60- var1.to_excel(data2excel,sheet_name='result1')
61- data2excel.save()
62-
63- # Sorting by ENTIRE WORKSHEET as per ONE Column
64- print('............................................... 10')
65- var1=s1.sort_values('Name',ascending=True)
66- print(var1)
67-
68- data2excel=pd.ExcelWriter('output3.xlsx',engine='xlsxwriter')
69- var1.to_excel(data2excel,sheet_name='result1')
70- data2excel.save()
71-
72- # Filter data i.e. columns as per a condition
73- print('............................................... 11')
74- print(s1[s1['Marks in Physics']<40])
75-
76- # Know the datatypes of each column
77- print('............................................... 12')
78- print(s1.dtypes)
79-
80- print("\nCompleted Successfully ...")
Code 2.18: Dealing with Excel files using pandas library
Output of Code 2.18:
............................................... 1
['Hi', 'Hello']
............................................... 2
Index(['S.No.', 'Name', 'Marks in Physics'], dtype='object')
............................................... 3
0 Manish
1 Rajesh
2 Rahul
3 Ram
4 Shyam
5 Sita
6 Gita
7 Hari
8 Vishnu
9 Sam
10 Ravan
Name: Name, dtype: object
............................................... 4
Name S.No. Name
0 Manish 1 Manish
1 Rajesh 2 Rajesh
2 Rahul 3 Rahul
3 Ram 4 Ram
4 Shyam 5 Shyam
5 Sita 6 Sita
6 Gita 7 Gita
7 Hari 8 Hari
8 Vishnu 9 Vishnu
9 Sam 10 Sam
10 Ravan 11 Ravan
............................................... 5
S.No. 3
Name Rahul
Marks in Physics 52
Name: 2, dtype: object
............................................... 6
S.No. Name Marks in Physics
0 1 Manish 50
1 2 Rajesh 51
2 3 Rahul 52
............................................... 7
S.No. Name Marks in Physics
2 3 Rahul 52
4 5 Shyam 56
0 1 Manish 50
............................................... 8
Manish
............................................... 9
6 Gita
7 Hari
0 Manish
2 Rahul
1 Rajesh
3 Ram
10 Ravan
9 Sam
4 Shyam
5 Sita
8 Vishnu
Name: Name, dtype: object
............................................... 10
S.No. Name Marks in Physics
6 7 Gita 90
7 8 Hari 12
0 1 Manish 50
2 3 Rahul 52
1 2 Rajesh 51
3 4 Ram 34
10 11 Ravan 34
9 10 Sam 56
4 5 Shyam 56
5 6 Sita 78
8 9 Vishnu 34
............................................... 11
S.No. Name Marks in Physics
3 4 Ram 34
7 8 Hari 12
8 9 Vishnu 34
10 11 Ravan 34
............................................... 12
S.No. int64
Name object
Marks in Physics int64
dtype: object
Completed Successfully ...
Many more libraries/packages are used in Python; we will mention some of these when their usage is
encountered in the coming chapters. With this introduction and a little practice on your side, we are now
ready to dive into the world of digital image processing with hands-on experience in the upcoming chapters.

Conclusion
In this chapter, the reader developed a familiarity with the Python programming language. The process of
installation and the usage of basic language elements of Python for a beginner to intermediate level programmer
were illustrated. After reading this chapter, tasks like importing, displaying, and saving images through the
Python programming language should have become easy.
In the next chapter, some low to medium level image processing tasks are introduced for the reader to start their
journey of manipulating digital images.

Points to remember
• Python is an open source and free programming language.
• The syntax of Python is mostly similar to MATLAB.
• One can use Python shell for testing individual commands or use files to write multiple lines of code.
• There are multiple libraries/packages available in Python. Different packages may contain functions with the same name.
• NumPy library is used for array based computations.
• Matplotlib library is used for making plots.
• OpenCV library is used for image manipulation.
• Pandas library is used for dealing with spreadsheets (excel files).

Exercises
1. Import a colored image from your hard disk to Python (IDLE) and display it on the screen. (Hint: see
Section 2.7)
2. Examine the array created in Python after importing the image in question one for its shape and intensity
values. (Hint: see Section 2.5)
3. Illustrate the differences between the imshow functions of the OpenCV and Matplotlib libraries. (Hint: see Section 2.7.1 and 2.7.2)
4. After importing the numpy library (as np), try the following command on Python shell – ‘help(np.pi)’ and
read and understand the result.
5. Write a Python program to add the contents of columns A and B in sheet one of the Excel file named
Data.xlsx. Columns one and two contain numbers only. Use only the first ten rows. Put the result in
column two of the same Excel file but in sheet two. To begin with, create an Excel sheet named Data.xlsx
initialized by some random data. (Hint: see Section 2.8)
CHAPTER 3
Playing with Digital Images

3.1 Introduction
After going through Chapter 1 and Chapter 2, the readers must have become familiar with
image basics as well as fundamentals of Python programming (relevant to digital image
processing). In this chapter, we will take the first steps to manipulate images. We will learn
the basic relationships between pixels, what a histogram is (a plot of intensity values versus the
corresponding number of pixels in a given image) and how to make one, and some
transformations on image intensities and pixel locations. Although most of the book is
dedicated to grayscale images, we will also touch on the topic of conversion from one color
framework to another.

Structure
This chapter will discuss the following topics:
• Playing with pixel and patches
• Neighborhood of a pixel
• Histogram processing
• Basic transformation on images
• Color models

Objectives
After reading this chapter, the reader will understand some very fundamental operations with
the images, like manipulating pixels and their neighborhood, drawing a histogram of a given
image to extract information about the frequency of individual intensities in the original
image, basic intensity, and spatial transformation on images. The reader will also learn how
color is represented inside computers.

3.2 Playing with pixel and patches


Let us begin by learning how to play with pixel intensities. For this purpose, refer to Code
3.1 and its output (on the shell as well as Figure 3.1). There is a lot to learn from this
example, so we will take things one by one.
First, this code uses the package that we created in Section 2.7.3. In this example, as seen in
line number 11, the image that we use is the same as part (a) of Figure 2.12, i.e., The
Gwalior Fort image. While line number 12 should display this image as an output-colored
image, the results do not contain this to save space. Line number 15 shows an alternate
syntax for converting a colored BGR image to a grayscale image. Although, as shown in line
number 8 of Code 2.14, it could have been alternatively imported as a grayscale image in the
first place. From line number 21 to 27, the shape, size, and data type of the images are
displayed. Here, it is important to note the data type. It is printed as uint8 in the output.
uint8 means an unsigned integer of 8 bits. A pixel in a grayscale image is 8 bits. Every bit is
supposed to contain numbers from 0 to 255. No fractions, no negative numbers – that is why
a data type unsigned integer of 8 bits (uint8) is suitable. We will discuss this more in the
upcoming lines of the code.
In line number 30, variable b, a view of the NumPy array grayscale_image is extracted and
displayed in part (b) of Figure 3.1. Remember that the view is tied to the array from which it
is created – any changes in the original array or view will be simultaneously reflected in
both. In the present case, row no. 50 to 150, and column no. 50 to 150 are selected for the
creation of the view. In line number 34, all rows and all columns of view b are made as 0.
Needless to say, this change will be reflected in the original grayscale_image. This can be
verified by displaying the image and line number 35 [also notice the output in part (c) of
Figure 3.1]. In the output, one can notice a black patch at locations [50:150,50:150].
However, one may manually put any values in place of 0 to all pixels or individual pixels.
In line number 38, the grayscale image is again restored. Now, we will create a copy in line
number 40 and make all the elements in the copy equal to zero. This should not affect the
grayscale image, as copy creates another variable in memory that is not tied to the original
image (numpy array). This can be verified from part (e) of Figure 3.1.
Line number 44 to 46 illustrate how to zero-pad images with some constant value (0 in this
case). Let us understand what is zero padding first. Refer to part (f) of Figure 3.1. In this
figure, notice that the original grayscale image now has borders of black color on its top,
bottom, left, and right. This is what we mean by zero padding, and this is required in some
cases to be able to match up the size of the current image to another image without changing
its content. This is also required when we apply filters to images (more on this later). The
syntax for this is shown in line number 45 which reads
grayscale_image2=cv2.copyMakeBorder(grayscale_image,0,100,200,300,cv2.BORDER
_CONSTANT,value=0). The first argument is the image on which zero padding is required.
Second, third, fourth and fifth arguments are, respectively, the number of pixels (thickness)
required on the top, bottom, left, and right sides. The last argument, which reads
cv2.BORDER_CONSTANT, value=0, is for providing a fill value in those padded border
regions. There are many options for that. BORDER_CONSTANT, value=0 fills the borders
with all zeros (i.e., constant value), but one may explore other options by typing
help(cv2.copyMakeBorder) on Python shell. We will use constant fill unless otherwise
required.
01- #======================================================================
02- # PURPOSE : Playing with pixel and some good practices to remember
03- #======================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import my_package.my_functions as mf # This is a user defined package and ...
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- # Loading the input image as colored image
11- input_image=cv2.imread('img1.bmp')
12- mf.my_imshow(input_image,"Input Image (Colored)")
13-
14- # Converting the colored image to grayscale
15- grayscale_image=cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)
16- fig1,ax1=plt.subplots(3,2)
17- fig1.show()
18- mf.my_imshow(grayscale_image,'(a). Grayscale Image',ax1[0,0])
19-
20- # Calculating the shape, size, dtype parameters for both images
21- print("Shape parameters for colored image are ...",input_image.shape)
22- print("Size parameter for colored image is ...",input_image.size)
23- print("Data Type for colored image is ...",input_image.dtype)
24-
25- print("Shape parameters of grayscale image are ...",grayscale_image.shape)
26- print("Size parameter of grayscale image is ...",grayscale_image.size)
27- print("Data Type of grayscale image is ...",grayscale_image.dtype)
28-
29- # Accessing a patch of grayscale image
30- b=grayscale_image[50:150,50:150]; # Remember this is a view NOT a copy
31- mf.my_imshow(b,'(b). Grayscale Image\'s patch',ax1[0,1])
32-
33- # Demo - Changing values in view changes original data
34- b[:,:]=0
35- mf.my_imshow(grayscale_image,'(c). After changing values in view',ax1[1,0])
36-
37- # Demo - Creating a copy
38- grayscale_image=cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)
39- mf.my_imshow(grayscale_image,'(d). Grayscale Image (Converted again)',ax1[1,1])
40- b=grayscale_image[50:150,50:150].copy(); # Way to copy a patch into another array
41- b[:,:]=0
42- mf.my_imshow(grayscale_image,'(e). After changing values in copied array',ax1[2,0])
43-
44- # Zero Padding
45- grayscale_image2=cv2.copyMakeBorder(grayscale_image,0,100,200,300,cv2.BORDER_CONSTANT,value=0)
46- mf.my_imshow(grayscale_image2,"(f). Border Padded Image with zeros",ax1[2,1])
47-
48- # Conversion to float32 or float64 for manipulation
49- img=grayscale_image.astype(np.float64)
50- print("\nData type of the image is now ...",img.dtype)
51- img=grayscale_image.astype(np.uint8)
52-
53- # dtype conversion issues
54- print("\nData type conversion issues - ")
55- print("This is 5 in uint8 format",np.uint8(5))
56- print("This is 5.8 in uint8 format",np.uint8(5.8))
57- print("This is [-3, -2, -1, 0, 1, 2, 254, 255, 256, 257, 258] in uint8 format\n",np.uint8([-3,
-2, -1, 0, 1, 2, 254, 255, 256, 257, 258]))
58-
59- x=np.uint8([250])
60- y=np.uint8([7])
61- print("\nSum in NUMPY way is ...",x+y) # Modulo 256 addition
62- print("Sum in OPEN CV way is ...",cv2.add(x,y)) # Saturation
63-
64- plt.show()
65- print("Completed Successfully ...")
Code 3.1: Playing with image pixels
The output of Code 3.1 (on Python shell and Figure 3.1) is given as follows:
Shape parameters for colored image are ... (384, 512, 3)
Size parameter for colored image is ... 589824
Data Type for colored image is ... uint8
Shape parameters of grayscale image are ... (384, 512)
Size parameter of grayscale image is ... 196608
Data Type of grayscale image is ... uint8

Data type of the image is now ... float64

Data type conversion issues -


This is 5 in uint8 format 5
This is 5.8 in uint8 format 5
This is [-3, -2, -1, 0, 1, 2, 254, 255, 256, 257, 258] in uint8 format
[253 254 255 0 1 2 254 255 0 1 2]

Sum in NUMPY way is ... [1]


Sum in OPEN CV way is ... [[255]]
Completed Successfully ...
Line number 49 of Code 3.1 illustrates the conversion of data type from uint8 to float64
(which means a float of 64 bits). This is necessary to facilitate intermediate calculations on
images. Once an image is imported into the computer to be manipulated, mathematical operations
will be applied to it to obtain the desired results. The result of a mathematical operation can be
anything — a negative number, a fraction, or even a real number outside the range of 0 to 255. The
uint8 format is unsuitable for such values and, hence, this kind of
conversion is necessary. Once calculations are over, the image is normalized (usually) and
converted back to uint8 format. The conversion to uint8 format is shown in line number 51.
The step of normalization before this is not done in this code. We will elaborate on it as and
when needed.
The point to note here is that, during conversion from float64 or, for that matter, any format
to uint8, the conversion of numbers outside the range of 0 to 255 should be kept in mind.
This is illustrated in line number 54 to 57. Notice the output of line number 55 and 56. 5 in
uint8 format is 5 but 5.8 in uint8 format is also 5 as uint8 must be an integer. So, the
fractional part is simply removed (not even rounded off to the nearest integer). Notice the
output of line number 57. It illustrates all the possible cases that need our attention.
Convince yourself from the output that integers in the range of 0 to 255 remain as they are in
uint8 format, but numbers outside this range are replaced by their modulo 256 versions.
Now, this is what happens inside the numpy library for the uint8 format. OpenCV works in a
different way as compared to other libraries, such as NumPy, for doing the exact same task.
Figure 3.1: Output of Code 3.1
In line number 59 and 60, two uint8 numbers in the form of an array of NumPy are created.
Line number 61 and 62 demonstrate the addition of these two numbers by NumPy and
OpenCV methods. In the output, note that NumPy performs addition and results in modulo
256 of the regular addition. OpenCV, on the other hand, saturates the result of addition to
255 if it is greater than 255.
This suggests the requirement of normalization. Once the image is manipulated in float64
form, before converting it back to uint8 form, all the numbers must be scaled to the range of
0 to 255 as without that, undesired results may be obtained.
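To make this concrete, here is a minimal sketch (the float64 array named result is made up for illustration) that contrasts a direct cast to uint8 with the normalize-then-cast approach used by the norm_uint8 helper of Section 2.7.3:
import numpy as np

# Hypothetical float64 result of some intermediate processing (values outside 0 to 255)
result = np.array([[-20.0, 100.5], [300.0, 255.0]])

direct = np.uint8(result)      # out-of-range values wrap around / become meaningless

# Normalize to the range 0 to 255 first, then cast (same idea as norm_uint8 in my_functions.py)
scaled = result - result.min()
scaled = 255 * scaled / scaled.max()
proper = np.uint8(scaled)

print(direct)
print(proper)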

3.3 Neighborhood of a pixel


Neighborhoods play an important role in life in general. An individual may be defined by the
kind of people he/she lives with, works with, and is friends with. As we shall see in the
coming sections, neighborhoods also play an important role for a pixel in an image. At this
point, we need to understand the notion of neighborhood. Refer to Figure 3.2. We are
looking at the pixels with indices (4,2), (4,6), and (4,10) (marked in a dark shade). We will define
the 4-neighborhood, the diagonal 4-neighborhood, and the 8-neighborhood for these three pixels,
respectively.
Figure 3.2: Types of neighborhoods of a pixel
Let us first consider pixel with indices (4,2). The four other pixels, i.e., (4,1), (3,2), (4,3), and
(5,2), which are marked in a light gray shade, are the 4 neighbors of pixel (4,2). We define
neighborhood in the following way – Open 4-neighborhood of pixel (4,2) is the set
containing pixels with indices {(4,1), (3,2), (4,3), (5,2)} i.e., open 4-neighborhood does not
contain the pixel for which neighborhood is calculated. Similarly, the closed 4-neighborhood
of the pixel (4,2) is the set with pixel coordinates {(4,2), (4,1), (3,2), (4,3), (5,2)} i.e., closed
4-neighborhood contains the pixel for which the neighborhood is calculated. 4-neighborhood
of a pixel is represented by N4(p) where p is the pixel under consideration. This is
understood as a closed neighborhood by default in this book until otherwise stated.
One can also determine the 4-neighborhood by considering diagonal 4-neighbors. To see an
illustration of a diagonal 4-neighborhood, see the pixel with coordinates (4,6) in Figure 3.2.
This neighborhood can also be closed or open. It is represented by ND(p), where p is the
pixel under consideration. This is also understood as a closed neighborhood by default in this
book until otherwise stated.
Similarly, see pixel (4,10) in Figure 3.2 where the 8-neighbors are marked in a light gray
shade. 8-neighborhood is represented by N8(p), and it can also be closed or open (default
closed).
We will discuss more about neighborhoods, connectivity, adjacency, and, in general, the
contribution of set theory to image processing in the chapter on morphology, as it will be
relevant there. For now, this suffices.
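As a quick hands-on check of these definitions, here is a minimal sketch (the function below is our own helper, not a library routine) that lists the open 4-, diagonal 4-, and 8-neighbors of a pixel, discarding coordinates that fall outside the image:
def neighbors(p, shape, kind='N4'):
    # Offsets for the open neighborhoods: N4 (edge neighbors), ND (diagonal), N8 (both)
    r, c = p
    n4 = [(r, c - 1), (r - 1, c), (r, c + 1), (r + 1, c)]
    nd = [(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)]
    pts = {'N4': n4, 'ND': nd, 'N8': n4 + nd}[kind]
    # Keep only neighbors lying inside the image boundary
    return [(i, j) for (i, j) in pts if 0 <= i < shape[0] and 0 <= j < shape[1]]

print(neighbors((4, 2), (10, 12), 'N4'))    # [(4, 1), (3, 2), (4, 3), (5, 2)]
print(neighbors((4, 6), (10, 12), 'ND'))
print(neighbors((4, 10), (10, 12), 'N8'))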

3.4 Histogram processing


Histograms play an important role in displaying the images, as well as describing a region in
the image. They provide an automatic and easy way to enhance images as well. Let us begin
by exploring what a histogram is.
3.4.1 Histogram of a grayscale image
Often, it is required to know the distribution of colors (intensities) in an image. Refer to part
(a) and (b) of Figure 3.3 for such a plot. Part (a) of Figure 3.3 is the grayscale version of
Figure 1.5. Note that it is not the R, G, or B component (it is the average of RGB
components). Part (b) of Figure 3.3 shows the distribution of intensities (from 0 to 255) in
the image. For a given grayscale intensity on the X axis, we plot on the Y axis the total number of pixels in
the image with that value. After plotting it for all the possible values, i.e., 0 to 255, we get
the curve shown in part (b) of Figure 3.3. Now, since we have a total of 256 values on the X
axis, this is a 256-bin histogram. Also, the Y axis's maximum value will indirectly depend on the
image size, so it will be different for every image. Hence, this is called an un-normalized
histogram. However, we can normalize the Y axis by dividing the scale by the total number of
pixels. That is how we get part (c) of Figure 3.3, which is a 256-bin normalized histogram. A
normalized histogram tells us the probability that a randomly selected pixel in the given image
has a particular grayscale value. Whenever we say histogram, we mean the normalized
histogram by default.

Figure 3.3: Understanding histogram


Now, let us appreciate the significance of the X and Y axis data. Since the image we are
considering in part (a) of Figure 3.3 is an 8-bit grayscale image, we have a total of 256 (2^8)
intensity values on the X axis. However, an n-bin histogram can also be made. The only
condition is that n cannot be greater than 256 in the case of an 8-bit grayscale image. Such a
plot for the 16-bin histogram is shown in Figure 3.4. Note that instead of using plot, we have
used the stem command. This is because if we use the stem command for Figure 3.3, the
plotted graph will not be properly visible for readability (you may try doing that and see for
yourself).
Let us understand what bins are. 256 bins mean that infinite grayscale shades have been
quantized to only 256 gray levels for the reasons discussed in Section 1.2.1. However, we
may use a lower number of grayscale levels to generate a 16-bin, or for that matter, an n-bin
histogram for the same image. A 16 bin histogram means we will have 16 intensity values
equally spaced between 0 and 255 on the X scale (interval size in this case is 256/16 = 16).
This means bin 1 will hold the total number of pixels whose intensities fall between 0 and 15,
bin 2 the total number of pixels whose intensities fall between 16 and 31, and so on.
Let us see the difference between 256 bin and 16 bin histograms. Obviously, a 256 bin
histogram has more information specific to grayscale levels, but it is more computationally
expensive to compute as compared to a 16 bin histogram. The total number of bins in the
histogram thus depends on how fine or coarse a resolution is needed on the X scale.
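The following minimal sketch (using a tiny made-up array of intensities rather than a real image) shows how the same data collapses from 256 possible gray levels into 16 bins; with np.histogram, intensities 0 to 15 fall in the first bin, 16 to 31 in the second, and so on:
import numpy as np

# A tiny made-up 'image' of uint8 intensities
pixels = np.array([0, 3, 15, 16, 17, 200, 201, 255], dtype=np.uint8)

counts_256, _ = np.histogram(pixels, bins=256, range=(0, 256))    # one bin per gray level
counts_16, edges = np.histogram(pixels, bins=16, range=(0, 256))  # bins of width 16

print(counts_16)   # [3 2 0 0 0 0 0 0 0 0 0 0 2 0 0 1]
print(edges)       # bin boundaries: 0, 16, 32, ..., 256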

Figure 3.4: 16 bin histogram illustration


Mathematically speaking, if we take 0 as black intensity and 255 as white intensity and
assume infinite grayscale levels between them (i.e., all fractions included) and assume every
pixel’s intensity as a random variable that can take any value between 0 to 255, then the
continuous histogram will be exactly equal to probability density function (PDF).
However, since we take only finite gray levels on the X axis, the histogram approximates
PDF. The greater the number of gray levels on the X axis, the better the histogram
approximates the PDF of intensities in the image. The following figure shows the three different
frames of a colored image and the corresponding histograms:
Figure 3.5: Histograms of B, G, and R frames of image in Figure 1.5

3.4.2 Information obtained from histogram


Another point to note is that the histogram alone tells us a lot about the image contrast in
general. For example, in Figure 3.5, we have plotted a 16 bin histogram for the B, G, and R
frames of the image shown in Figure 1.5. Here, it may be noted that the histogram of the red
frame has nearly 0 values for intensities between 0 and 50. This means the corresponding
image will not have darker regions, and the same can be confirmed from the red frame
image, i.e., part (c) of Figure 3.5. It is not a good contrast image though, as its intensity
values are not uniformly distributed over the entire range 0-255. You may also note that the 3
histograms of B, G, and R frames in Figure 3.5 are all different – and this is expected.
Convince yourself that two entirely different images of different subjects may have the same
histograms, and two images of the same subject taken in different illumination conditions
may have different histograms.
The code for the generation of the histogram as shown in Figure 3.4 is given in Code 3.2 (it
applies to Figure 3.3 as well, but with a minor modification of changing stem to plot in line
numbers 23 and 28 and setting a number of bins to 256 in line number 18). Most of the code
is self-explanatory. Only line numbers 18, 19, and 20 need an explanation. In line number 18,
the number of bins is chosen. This number for an n-bit grayscale image can be anything up to 2^n.
Here, for an 8-bit grayscale image, it is chosen as 16 (which is less than 256). In
line number 19, the X axis points are created by keeping the required scaling in mind. In line
number 20, the actual histogram is calculated by using the cv2.calcHist command. The first
parameter is the name of the grayscale/colored image in square braces. The second
parameter is the frame number in the case of a BGR image. If the image is already grayscale, this
parameter will always be [0]. The third parameter is the mask. Let us understand what a mask is.
If you require the histogram to be computed only for some specific regions of the image, you may
create a binary mask and supply it in place of this argument (None means the entire image is used) –
in our case, we do not require it; a small illustration of a mask is given right after Code 3.2.
Then, you have to supply the number of bins and the range for histogram calculation. Notice that
everything here is in square braces. Similarly, one may explore Code 3.3 for the generation of the
output shown in Figure 3.5.
01- #======================================================================
02- # PURPOSE : Learning Histogram
03- #======================================================================
04-
05- import cv2,matplotlib.pyplot as plt, numpy as np
06- import my_package.my_functions as mf # This is a user defined package and ...
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- #--------------------------------------------------------------------------
10- # Image Histogram
11- #--------------------------------------------------------------------------
12- a=cv2.imread('img3.bmp',0)
13- r,c=np.shape(a)
14- fig,ax=plt.subplots(3,1)
15- fig.show()
16- mf.my_imshow(a,'(a) Grayscale Image',ax[0])
17-
18- no_of_bins=16 #This defines the total no. of points on X axis
19- X_axis=255*np.arange(0,no_of_bins,1)/(no_of_bins-1)
20- hist_values=cv2.calcHist([a],[0],None,[no_of_bins],[0,256]) # Corresponding values on Y axis
21-
22- plt.subplot(3,1,2)
23- plt.stem(X_axis,hist_values)
24- plt.grid()
25- plt.title('(b) '+str(no_of_bins)+' bin UN-NORMALIZED histogram')
26-
27- plt.subplot(3,1,3)
28- plt.stem(X_axis,hist_values/(r*c))
29- plt.grid()
30- plt.title('(c) '+str(no_of_bins)+' bin NORMALIZED histogram')
31-
32- plt.show()
33- print("Completed Successfully ...")
Code 3.2: Generating histogram of grayscale image
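As mentioned above, the third argument of cv2.calcHist is a mask that restricts the computation to selected pixels. The following minimal sketch (reusing the same image file as Code 3.2; the patch coordinates are arbitrary) computes a 16 bin histogram only over a rectangular region:
import cv2, numpy as np

a = cv2.imread('img3.bmp', 0)                  # grayscale image, as in Code 3.2
mask = np.zeros(a.shape, dtype=np.uint8)       # mask must be uint8 and of the same size
mask[50:150, 50:150] = 255                     # non-zero pixels mark the region of interest

hist_full = cv2.calcHist([a], [0], None, [16], [0, 256])   # whole image
hist_roi = cv2.calcHist([a], [0], mask, [16], [0, 256])    # only the masked patch
print(hist_full.ravel())
print(hist_roi.ravel())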
The code for making interpretation of the histograms so generated is mentioned below:
01- #======================================================================
02- # PURPOSE : Learning Histogram Interpretation
03- #======================================================================
04-
05- import cv2,matplotlib.pyplot as plt, numpy as np
06- import my_package.my_functions as mf # This is a user defined package and ...
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- #--------------------------------------------------------------------------
10- # Image Histogram
11- #--------------------------------------------------------------------------
12- a=cv2.imread('img3.bmp',1)
13- r,c,h=np.shape(a)
14- mf.my_imshow(a,'Input BGR Image')
15-
16- no_of_bins=16
17- X_axis=255*np.arange(0,no_of_bins,1)/(no_of_bins-1)
18- histB = cv2.calcHist([a],[0],None,[no_of_bins],[0,256])/(r*c)
19- histG = cv2.calcHist([a],[1],None,[no_of_bins],[0,256])/(r*c)
20- histR = cv2.calcHist([a],[2],None,[no_of_bins],[0,256])/(r*c)
21-
22- fig4,ax4=plt.subplots(2,3)
23- fig4.show()
24-
25- mf.my_imshow(a[:,:,0],'(a) Blue Frame',ax4[0,0])
26- mf.my_imshow(a[:,:,1],'(b) Green Frame',ax4[0,1])
27- mf.my_imshow(a[:,:,2],'(c) Red Frame',ax4[0,2])
28-
29- plt.subplot(2,3,4)
30- plt.stem(X_axis,histB)
31- plt.grid()
32- plt.title('(d) Blue Frame '+str(no_of_bins)+' bin histogram')
33-
34- plt.subplot(2,3,5)
35- plt.stem(X_axis,histG)
36- plt.grid()
37- plt.title('(e) Green Frame '+str(no_of_bins)+' bin histogram')
38-
39- plt.subplot(2,3,6)
40- plt.stem(X_axis,histR)
41- plt.grid()
42- plt.title('(f) Red Frame '+str(no_of_bins)+' bin histogram')
43-
44- plt.show()
45- print("Completed Successfully ...")
Code 3.3: Histogram interpretation

3.4.3 Histogram equalization


For understanding histogram equalization, refer to Figure 3.6. Part (b) of Figure 3.6 is a
processed version of part (a), and one would agree that part (b) shows more details than part
(a). Parts (a) and (b) have the same information but it is just that in part (a), it is not properly
displayed to suit human vision. Look at the histogram of part (a) in part (c). The histogram
reveals that most of the image intensities are centered around the black portion. This is also
evident in the image itself – most of its portions are dark. On the other hand, part (b), which
is in turn derived from part (a), has a histogram that almost uniformly spans all the gray
levels depicted in part (d). The process of input image modification by which an arbitrary
shaped histogram for a given input digital image is converted to an almost uniform
histogram for processed output image is called histogram equalization. Remember that
visual perceptual information is preserved in such a process. What we mean by this is the
content of the image remains the same but the lighting is adjusted to suit human vision. In
simpler terms, the process of conversion of arbitrarily shaped histogram to almost uniform
histogram is histogram equalization.
Figure 3.6: Histogram equalization
In general, the contrast of the image improves after histogram equalization. We will justify
the word almost uniform later. Before understanding what works behind histogram
equalization, let us see how it is implemented in Python in Code 3.4. Most of the code is
similar to Code 3.2, with line number 24, which reads hist_equ_image = cv2.equalizeHist(a),
introduced for histogram equalization. Notice that the syntax is simple and self-explanatory.
The output is already shown in Figure 3.6.
01- #======================================================================
02- # PURPOSE : Learning Histogram Equalization
03- #======================================================================
04-
05- import cv2,matplotlib.pyplot as plt, numpy as np
06- import my_package.my_functions as mf # This is a user defined package and ...
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- a=cv2.imread('img4.bmp',0)
10- r,c=np.shape(a)
11- fig,ax=plt.subplots(2,2)
12- fig.show()
13- mf.my_imshow(a,'(a) Grayscale Image',ax[0,0])
14-
15- no_of_bins=30
16- X_axis=255*np.arange(0,no_of_bins,1)/(no_of_bins-1)
17- hist_values=cv2.calcHist([a],[0],None,[no_of_bins],[0,256])/(r*c)
18-
19- plt.subplot(2,2,3)
20- plt.stem(X_axis,hist_values)
21- plt.grid()
22- plt.title('(c) '+str(no_of_bins)+' bin histogram of (a)')
23-
24- hist_equ_image = cv2.equalizeHist(a)
25- mf.my_imshow(hist_equ_image,'(b) Histogram Equalised Image',ax[0,1])
26- hist_values2=cv2.calcHist([hist_equ_image],[0],None,[no_of_bins],[0,256])/(r*c)
27-
28- plt.subplot(2,2,4)
29- plt.stem(X_axis,hist_values2)
30- plt.grid()
31- plt.title('(d) '+str(no_of_bins)+' bin histogram of (b)')
32-
33- plt.show()
34- print("Completed Successfully ...")
Code 3.4: Histogram equalization
Having understood the purpose and procedure, you need to understand the mechanism of
histogram equalization. We will do it in three steps:
• First, we will state the mathematical prerequisite from continuous random variables to
understand the mechanism.
• Then we will take a numerical example for discrete/digital case.
• In the end, we will list some limitations of applying the concept from continuous world
to digital world, in this case attributing the reasons to very specific events in such
conversion.

3.4.3.1 Mathematical pre-requisite for understanding histogram equalization


The following theorem called probability integral transform plays a very important role in
histogram equalization as well as histogram matching (a topic that we will study in Section
3.4.4). It states that:
If a random variable X has a continuous distribution with cumulative distribution function
(CDF) FX, then the random variable Y defined as FX (X) has a standard uniform distribution
on the interval [0, 1].
For a moment, let us assume we are dealing with continuous images instead of discrete ones.
That is, now we have a continuous range of colors between 0 and 255. Also, the pixel
locations are continuous. In such a case, if we plot a histogram, it will be the PDF of the
intensity values (i.e., continuous random variable X). The probability integral transform
theorem says that there is a guaranteed way of transforming this image into another image so
that the PDF (histogram) of the resulting image is uniformly distributed, provided the
transformation function used for such a conversion is CDF of the input random variable X.
However, the histogram is only an approximation to the PDF because it is computed from
digital/discrete data, whereas the theorem is valid for continuous random variables; hence, there
will be some deviations in the results. Still, we can expect the histogram of an output
(digital) image to have a good contrast (i.e., a histogram with approximately the same
number of pixels for every grayscale value). We will talk about the deviations from ideality
introduced due to the application of the theorem from a continuous random variable to a
discrete random variable but let us start with a working example in the next section.

3.4.3.2 Histogram equalization on one dimensional data


To understand how histogram equalization works for digital images, refer to the example
shown in Figure 3.7 and Table 3.1. Here, we are considering a hypothetical image with total
number of pixels = 69. Total number of grayscale levels available = 17, hence grayscale
values from 0 to 16. The un-normalized histogram of such an image is shown in part (a) of
Figure 3.7. Notice that gray levels after 11 are all 0. The image histogram does not span the
entire grayscale values, and hence, it is not uniformly distributed.
The X-axis of the histogram (i.e., gray levels) is shown in column 1 of Table 3.1. Each entry
in column 2 of this table is the corresponding number of pixels in the image with a grayscale
value. Hence, columns 1 and 2 have complete un-normalized histogram information. Column
no. 3 in the table is the normalized histogram (obtained by dividing every entry in column 2
by total number of pixels, i.e., 69 in this case). In a continuous time domain, column 3 is
called PDF. If we sum up column 2, we get the total number of pixels in the image. Summing
up column no. 3 gives us 1 because it is a PDF of input data (abbreviated as IP PDF).
Next, in column no. 4, we calculate CDF from PDF. To do this in a continuous domain, we
would need to integrate column no. 3, i.e., input PDF when integrated gives input CDF. In a
discrete domain, to calculate any entry in column no. 4, sum up all the entries in column no.
3 till the row where you desire to calculate CDF value. This CDF will act as the
transformation function as discussed in the previous section. Column no. 5 is
CDF×Max.intensity value, which in this case is CDF × 16. In the last column, these values
are rounded off to the nearest integer.
From the last column and column no. 2, we calculate the histogram (un-normalized) for the
transformed data (image). In the first row of the last column, the grayscale value is 2, and the
corresponding value in column 2 is 7. This means that in the output histogram,
corresponding to grayscale value 2, 7 will be the total number of pixels. Similarly, the values
for all the other grayscale levels are calculated. There may be multiple entries for some
grayscale levels in the last column. For example, 15 occurs 2 times with corresponding
values in column 2 as 4 and 3. So, in the output histogram, grayscale value 15 will have 4+3,
i.e., 7 pixels in total. In the same manner, all the other values are calculated, and a histogram,
as plotted in part (b) of Figure 3.7, is obtained. So, even without seeing the input image, just
by seeing its histogram, the equalized histogram, i.e., the histogram of the output image, can
be calculated:

Figure 3.7: Process behind histogram equalization


If the input image is available, then by seeing the first and last columns of Table 3.1, the
output image can be obtained. The procedure is simple. Simply replace every pixel’s
grayscale values in the input image with the mapping done in the first and last columns.
The histogram we have obtained in part (b) of Figure 3.7 has utilized almost all the
grayscale values. There is no region that is left completely blank. However, in part (a) of
Figure 3.7, the portion beyond grayscale value 11 was blank. Hence, this transformation is
called histogram equalization and helps to improve the contrast of the image.
IP gray level | IP un-normalized histogram | IP PDF | IP CDF | OP gray level (not rounded off) | OP gray level
 0 | 7 | 0.10 | 0.10 |  1.62 |  2
 1 | 8 | 0.12 | 0.22 |  3.48 |  3
 2 | 9 | 0.13 | 0.35 |  5.57 |  6
 3 | 8 | 0.12 | 0.46 |  7.42 |  7
 4 | 8 | 0.12 | 0.58 |  9.28 |  9
 5 | 8 | 0.12 | 0.70 | 11.13 | 11
 6 | 6 | 0.09 | 0.78 | 12.52 | 13
 7 | 5 | 0.07 | 0.86 | 13.68 | 14
 8 | 4 | 0.06 | 0.91 | 14.61 | 15
 9 | 3 | 0.04 | 0.96 | 15.30 | 15
10 | 2 | 0.03 | 0.99 | 15.77 | 16
11 | 1 | 0.01 | 1.00 | 16.00 | 16
12 | 0 | 0.00 | 1.00 | 16.00 | 16
13 | 0 | 0.00 | 1.00 | 16.00 | 16
14 | 0 | 0.00 | 1.00 | 16.00 | 16
15 | 0 | 0.00 | 1.00 | 16.00 | 16
16 | 0 | 0.00 | 1.00 | 16.00 | 16

Table 3.1 : Calculation for histogram equalization


The example illustrated in Figure 3.7 has been generated by using Code 3.5. The data used
for the input histogram is shown in line number 9. You may change the data and play with
the code to see if this transformation works. Also, note that this code is not limited to images
only. It is useful for any kind of data for which a histogram is known. Line number 48 to 52
helps in saving the table of the form, as shown in part (c) of Figure 3.7 to excel file named
output1.xlsx.
01- #======================================================================
02- # PURPOSE : Understanding Mechanism of Histogram Equalization
03- #======================================================================
04- import numpy as np
05- import matplotlib.pyplot as plt
06- import pandas as pd
07-
08- # Input Un-normalized histogram here
09- occurrence_input=np.array([7,8,9,8,8,8,6,5,4,3,2,1,0,0,0,0,0])
10-
11- total_gray_level=len(occurrence_input)
12- total_pixel=np.sum(occurrence_input) # total data points (if the histogram is not from image)
13- max_gray_level=total_gray_level-1 # this is because 0 is also counted as gray level
14- gray_levels_X_axis=np.arange(0,total_gray_level,1) # X-axis of histogram
15-
16- PDF_input=occurrence_input/total_pixel # Normalised Histogram of input image
17-
18- fig,ax=plt.subplots(2,1)
19- fig.show()
20- ax[0].stem(gray_levels_X_axis,occurrence_input)
21- ax[0].grid('On')
22- ax[0].set_title('(a) Un-normalized input histogram (arbitrary distribution)',fontsize=15)
23- ax[0].set_xlabel('Grayscale values')
24- ax[0].set_ylabel('No. of pixel')
25-
26- CDF_input=np.cumsum(PDF_input) # Transformation function derived from input itself
27- OP_unnormalized_histogram=max_gray_level*CDF_input
28- Hist_eq_level=np.round(OP_unnormalized_histogram)
29-
30- occurrence_output=np.zeros((total_gray_level)) # initialising output un-normalized histogram
31- for i in gray_levels_X_axis:
32-     occurrence_output[i]=np.sum(np.array(occurrence_input[np.where(Hist_eq_level==i)]))
33- PDF_output=occurrence_output/total_pixel # Normalized Histogram of output image
34- ax[1].stem(gray_levels_X_axis,occurrence_output)
35- ax[1].grid('On')
36- ax[1].set_title('(b) Un-Normalized output histogram (uniform distribution expected)',fontsize=15)
37- ax[1].set_xlabel('Grayscale values')
38- ax[1].set_ylabel('No. of pixel')
39-
40- # For printing data onto shell
41- summary=np.hstack([gray_levels_X_axis,occurrence_input,PDF_input,CDF_input,OP_unnormalized_histogram,Hist_eq_level])
42- summary=summary.reshape(6,total_gray_level).T
43-
44- np.set_printoptions(suppress=True) # for printing integers without decimal point
45- np.set_printoptions(precision=2) # for making decimal precision =2 while printing
46- print(summary)
47-
48- df = pd.DataFrame(summary, columns = ['IP gray level','IP Un-normalized histogram','IP PDF','IP CDF','OP Gray Level (not rounded off)','OP Gray Level'])
49- print(df)
50- data2excel=pd.ExcelWriter('output1.xlsx',engine='xlsxwriter')
51- df.to_excel(data2excel,sheet_name='histogram equalization',index=False)
52- data2excel.save()
53-
54- plt.show()
55- print("Completed Successfully ...")
Code 3.5: Code for histogram equalization (not the inbuilt method of OpenCV)

3.4.3.3 Limitations of histogram equalization in digital data


Ideally, the histogram of the output image computed by the histogram equalization process
should have uniform distribution - meaning every grayscale value occurs in the image in
equal amounts. Hence, the PDF should be a flat line in the range of allowed intensity levels
(it is undefined everywhere else anyway). However, this does not happen in practice because
as discussed in Section 3.4.3.1, such a guarantee is only made for continuous variables and
not discrete variables. Also, the n-histogram is an approximation to the actual PDF. Hence,
the calculation of CDF is also not ideal. Owing to these reasons, in digital domains, we make
approximate calculations, and the results obtained are not ideal. That is, flat histogram is not
obtained. We may expect utilization of almost all grayscale values and the spread of
histogram after applying this process.

3.4.4 Histogram matching


Histogram equalization, as we now know, is a method where an image having any arbitrary
PDF is transformed into another image (preserving visual perceptual information) with
uniform PDF. In this section, let us explore a method using which an image with an arbitrary PDF may be
transformed to another image (preserving visual perceptual information) with a pre-specified
PDF (not necessarily uniform). Let us also emphasize the need for histogram matching. One may
think of many applications at this stage – we may want to unify the contrast of multiple images,
or we may want to compress images by restricting their intensity values to a certain range, etc.

3.4.4.1 Defining histogram matching


Histogram matching (also called histogram specification) is a method by which an image
with an arbitrary histogram can be transformed into another image (preserving visual
perceptual information) with a given pre-defined histogram (not necessarily uniform). This
method is called histogram specification because we are specifying a histogram that we want
our final image to have. The reason for calling this histogram matching will become clear in
Section 3.4.4.2. Also, keep in mind that the above is achieved only approximately as we are
dealing with digital images and not continuous ones. So, the output image will be such that
its histogram will try to match the pre-specified histogram as closely as possible.

3.4.4.2 Mathematical background


As stated in Section 3.4.3.1, probability integral transform guarantees a uniform PDF for any
given input PDF if the transformation used is the CDF of the input PDF. The reverse also
holds true, i.e., if Y has a uniform distribution on [0, 1], and if X has a cumulative distribution
function FX, then the random variable FX⁻¹(Y) (i.e., the inverse of the CDF FX applied to Y) has the same distribution as X.
Forward and backward transform gives us a way to achieve histogram specification. First,
we convert a given histogram to uniform, and then we convert the uniform histogram so
obtained to any desired histogram. We have effectively converted an input histogram to a
given output histogram. Here, for the purpose of this histogram specification, the two
intermediate uniform histograms are matched, hence the name histogram matching.
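As a compact restatement (a standard way of writing the rule, given here for convenience): if F is the CDF of the source intensities and G is the CDF of the specified target histogram, each source level r is mapped to s = G⁻¹(F(r)), where the inverse is taken as the smallest level s with G(s) ≥ F(r). A minimal sketch of the discrete version is given below; its bookkeeping differs slightly from the tabular procedure of Section 3.4.4.4, but the idea is the same, and the 17-level histograms are the ones used in the tables there:
import numpy as np

# Un-normalized source and target histograms over 17 gray levels (0..16), as in Tables 3.1 and 3.2
source_hist = np.array([7, 8, 9, 8, 8, 8, 6, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0], float)
target_hist = np.array([0, 0, 0, 0, 0, 6, 9, 13, 13, 13, 9, 6, 0, 0, 0, 0, 0], float)

F = np.cumsum(source_hist) / source_hist.sum()   # source CDF
G = np.cumsum(target_hist) / target_hist.sum()   # target CDF

# For every source level r, pick the smallest target level s with G(s) >= F(r)
mapping = np.searchsorted(G, F)
print(mapping)   # mapping[r] is the output gray level assigned to input level r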

3.4.4.3 Implementation details


Before having a look at how the matching process happens, let us get acquainted with how it
is implemented in Python. Refer to Figure 3.8 and Figure 3.9 which are generated by Code
3.6. Part (a) of Figure 3.8 is the input image with the corresponding histogram in part (a) of
Figure 3.9:

Figure 3.8: Illustration of histogram specification/equalization


Similarly, the histograms of the other parts of the above figure are shown in the
corresponding parts of figure below:
Figure 3.9: Histograms of corresponding images in Figure 3.8
We want this histogram to become like the histogram of part (c) of Figure 3.8, i.e., part (c)
of Figure 3.9. The final image after matching is shown in part (d) of Figure 3.8 and the
matched histogram in part (d) of Figure 3.9. For comparison with a histogram equalized
image, the equalized image is shown in part (b) of Figure 3.8 with its histogram in part (b)
of Figure 3.9. Notice that the histogram of part (d) of Figure 3.9 matches approximately
with the histogram of part (c) of Figure 3.9. The input image is dark, and hence, its
histogram is centered towards low values. An equalized image has intensities in almost all
ranges. The matched image has mid-level intensities as dictated by the reference histogram.
The code for implementing this is shown in Code 3.6. Histogram matching is done using
the match_histograms function in line number 22 from the scikit-image library (skimage),
imported under the nickname sk in line number 6. To be able to use this library, one must install it by using
the procedure illustrated in Section 2.3 and by using the command pip install scikit-image.
01- #======================================================================
02- # PURPOSE : Histogram Matching / Specification
03- #======================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import skimage.exposure as sk # use 'pip install scikit-image' for using this
07- import my_package.my_functions as mf # This is a user defined package and ...
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- image=cv2.imread('img5.bmp',0) # Input Grayscale Image
11- r1,c1=image.shape
12- no_of_bins=256
13- hist_image = cv2.calcHist([image],[0],None,[no_of_bins],[0,256])/(r1*c1)
14-
15- hist_equ_image = cv2.equalizeHist(image) # Histogram Equalised Image
16- hist_equ=cv2.calcHist([hist_equ_image],[0],None,[no_of_bins],[0,256])/(r1*c1)
17-
18- reference=cv2.imread('img6.bmp',0) # Reference image
19- r2,c2=reference.shape
20- hist_reference = cv2.calcHist([reference],[0],None,[no_of_bins],[0,256])/(r2*c2)
21-
22- matched = sk.match_histograms(image,reference) # Histogram matched image
23- matched=mf.norm_uint8(matched)
24- r3,c3=matched.shape
25- hist_matched = cv2.calcHist([matched],[0],None,[no_of_bins],[0,256])/(r3*c3)
26-
27- fig,ax=plt.subplots(2,2)
28- fig.show()
29-
30- mf.my_imshow(image,'(a) Input Grayscale Image',ax[0,0])
31- mf.my_imshow(hist_equ_image,'(b) Histogram Equalized Image',ax[0,1])
32- mf.my_imshow(reference,'(c) Reference Grayscale Image',ax[1,0])
33- mf.my_imshow(matched,'(d) Matched Grayscale Image',ax[1,1])
34-
35- fig2,ax2=plt.subplots(2,2)
36- fig2.show()
37-
38- ax2[0,0].plot(hist_image)
39- ax2[0,0].set_title("(a) Histogram of input image")
40- ax2[0,0].grid(1)
41-
42- ax2[0,1].plot(hist_equ)
43- ax2[0,1].set_title("(b) Histogram of Equalized image")
44- ax2[0,1].grid(1)
45-
46- ax2[1,0].plot(hist_reference)
47- ax2[1,0].set_title("(c) Histogram of reference image")
48- ax2[1,0].grid(1)
49-
50-
51- ax2[1,1].plot(hist_matched)
52- ax2[1,1].set_title("(d) Histogram of matched image")
53- ax2[1,1].grid(1)
54-
55- plt.show()
56- print("Completed Successfully ...")
Code 3.6: Code for histogram matching using scikit library in Python

3.4.4.4 Understanding histogram matching


Refer to Table 3.2 for understanding the process of histogram matching. The table has 15
columns, which are shown as numbered. The table is divided into 3 major parts: (a) for
histogram equalization of the target, (b) for histogram equalization of the source, and (c) for
displaying the matched values. Part (a) and (b) of Table 3.2 individually, are the same as
Table 3.1 in format. However, part (b) has one extra column at the end. It is calculated as
mentioned next. It is already known that pixel values in column 12 of Table 3.2 will replace
column 7. Hence, for the new pixel values in column 7, the new frequency (total number of
pixels having that intensity) is calculated. For example, in the first row of part (b) of Table
3.2, 2 will replace 0. Since there were 7 pixels with 0 intensity earlier, the equalized image
intensity 2 will have 7 pixels. Similarly, it can also be seen from part (b) of Table 3.2 that
intensity 16 in the equalized image will replace intensities 10, 11, 12, 13, 14, 15 and 16. So,
the total number of pixels for the new intensity 16 will be the sum of frequencies of older
intensities 10 to 16, i.e., (2+1+0+0+0+0+0) which is 3.
(a) Histogram equalization of target: columns 1-6 | (b) Histogram equalization of source: columns 7-13 | (c) Matched: columns 14-15

Column headings:
1: Target IP gray level   2: Target frequency   3: PDF   4: CDF   5: CDF x 16   6: OP gray level
7: Source IP gray level   8: Source frequency   9: PDF   10: CDF   11: CDF x 16   12: OP gray level   13: New frequency (no. of pixels)
14: IP gray level   15: Matched histogram frequency

  1 |  2 |  3  |  4  |  5   |  6 ||  7 |  8 |  9  | 10  |  11  | 12 | 13 || 14 | 15
  0 |  0 | 0.0 | 0.0 |  0.0 |  0 ||  0 |  7 | 0.1 | 0.1 |  1.6 |  2 |  7 ||  0 |  0
  1 |  0 | 0.0 | 0.0 |  0.0 |  0 ||  1 |  8 | 0.1 | 0.2 |  3.5 |  3 |  8 ||  1 |  0
  2 |  0 | 0.0 | 0.0 |  0.0 |  0 ||  2 |  9 | 0.1 | 0.3 |  5.6 |  6 |  9 ||  2 |  0
  3 |  0 | 0.0 | 0.0 |  0.0 |  0 ||  3 |  8 | 0.1 | 0.5 |  7.4 |  7 |  8 ||  3 |  0
  4 |  0 | 0.0 | 0.0 |  0.0 |  0 ||  4 |  8 | 0.1 | 0.6 |  9.3 |  9 |  8 ||  4 |  0
  5 |  6 | 0.1 | 0.1 |  1.4 |  1 ||  5 |  8 | 0.1 | 0.7 | 11.1 | 11 |  8 ||  5 |  0
  6 |  9 | 0.1 | 0.2 |  3.5 |  3 ||  6 |  6 | 0.1 | 0.8 | 12.5 | 13 |  6 ||  6 |  8
  7 | 13 | 0.2 | 0.4 |  6.5 |  6 ||  7 |  5 | 0.1 | 0.9 | 13.7 | 14 |  5 ||  7 |  9
  8 | 13 | 0.2 | 0.6 |  9.5 | 10 ||  8 |  4 | 0.1 | 0.9 | 14.6 | 15 |  7 ||  8 |  8
  9 | 13 | 0.2 | 0.8 | 12.5 | 13 ||  9 |  3 | 0.0 | 1.0 | 15.3 | 15 |  7 ||  9 |  6
 10 |  9 | 0.1 | 0.9 | 14.6 | 15 || 10 |  2 | 0.0 | 1.0 | 15.8 | 16 |  3 || 10 |  7
 11 |  6 | 0.1 | 1.0 | 16.0 | 16 || 11 |  1 | 0.0 | 1.0 | 16.0 | 16 |  3 || 11 |  3
 12 |  0 | 0.0 | 1.0 | 16.0 | 16 || 12 |  0 | 0.0 | 1.0 | 16.0 | 16 |  3 || 12 |  3
 13 |  0 | 0.0 | 1.0 | 16.0 | 16 || 13 |  0 | 0.0 | 1.0 | 16.0 | 16 |  3 || 13 |  3
 14 |  0 | 0.0 | 1.0 | 16.0 | 16 || 14 |  0 | 0.0 | 1.0 | 16.0 | 16 |  3 || 14 |  3
 15 |  0 | 0.0 | 1.0 | 16.0 | 16 || 15 |  0 | 0.0 | 1.0 | 16.0 | 16 |  3 || 15 |  3
 16 |  0 | 0.0 | 1.0 | 16.0 | 16 || 16 |  0 | 0.0 | 1.0 | 16.0 | 16 |  3 || 16 |  3

(In column 13, source gray levels 8 and 9 share the merged value 7, since both map to OP level 15; source gray levels 10 to 16 share the merged value 3, since they all map to OP level 16.)

Table 3.2 : Illustrative example of histogram matching


Having understood the table structure, let us now understand the problem of histogram
matching. The problem is like this: an image has intensities from 0 to 16 (a total of 17
intensities). Its histogram is shown in columns 7 and 8 of part (b) of Table 3.2. It is desired to
convert this histogram so that its shape matches the histogram represented by columns 1 and
2 of part (a) of Table 3.2. Hence, the nomenclature source and target histogram.
Matching happens between the equalized versions of the histograms. This is why part (a)
and part (b) of Table 3.2 are computed (like Table 3.1). To match an intensity, say 6 in
column 1 (shown highlighted), its corresponding equalized intensity is found, which is 3 in
the present case, as highlighted in column 6. Now, the closest match of this equalized value
from the target histogram is found to the values in column 12 (shown highlighted). By
closest match, we mean the closest value that is not greater than the current number. That
number happens to be 3. Corresponding to this number in column 12,
column 13 is seen for the new frequency. This pair (6,8) forms one entry in the final
histogram, as shown highlighted in columns 14 and 15 in part (c) of the table. This way, an
entire table is formed. For numbers not finding any match from part (a) to part (b) of the
table, the final entry of the frequency in column number 15 is 0. An example is intensities 0 to
4, whose equalized value in column 6 is 0. However, 0 does not exist in column 12. This is why,
in the final matched histogram in columns 14 and 15, entries corresponding to intensities 0 to
4 are 0.

3.5 Basic transformation on images


An image has two variables – the intensity of the pixel (from 0 to 255 for grayscale) and the
location of the pixel (in terms of x-y coordinates). So, an image can be manipulated either by
playing with the intensities of pixels (by keeping their location fixed) or by changing their
location (but keeping their intensities fixed). Another way is to change both of them
simultaneously. In this section, we will explore the first two ways, as the third one can then
be trivially done by applying the methods in succession (in whatever order desired) as many
times as needed. Although intensity transformation is also a part of spatial domain
processing (a topic of Chapter 4), we will study some basic techniques here.

3.5.1 Intensity transformations


Intensity transformations help improve the visual quality of the entire image or parts of an
image. That way, we have two types of intensity transformations: global and local. In global
transform, the same rule is applied to every pixel of the image. In local transform, one rule is
applied in one region of the image, and the others are left untouched (or maybe a different
rule is applied there). Here, we will discuss some of the useful global intensity
transformations.

3.5.1.1 Image negatives


Negatives of images, although not very helpful today, have historical value because they were
used in the printing of images by projecting them onto photographic film. They do not
enhance the image in any way. They just invert the intensities. If we talk about grayscale
images, 0 becomes 255, and 255 becomes zero. Any grayscale intensity value (say x) in the
input image is mapped to 255-x in the output image. Refer to Figure 3.10 for an illustration
of image negatives:

Figure 3.10: Image and its negative


The corresponding code is given below:
01- #======================================================================
02- # PURPOSE : Learning Image Negative (GLOBAL TRANSFORMATION)
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import my_package.my_functions as mf # This is a user defined package and ...
06- # one may find the details related to its contents and usage in section 2.7.3
07-
08- #--------------------------------------------------------------------------
09- # Image Negatives (Global Transform)
10- #--------------------------------------------------------------------------
11- input_image=cv2.imread('img1.bmp',0)
12- fig,ax=plt.subplots(1,2)
13- fig.show()
14- mf.my_imshow(input_image,'(a) Input Image',ax[0])
15-
16- negative_image=255-input_image;
17- mf.my_imshow(negative_image,"(b) Negative of the image",ax[1])
18-
19- plt.show()
20- print("Completed Successfully ...")
Code 3.7: Forming image negative

3.5.1.2 Logarithmic transformation


Let us refer to Figure 3.11 for understanding what logarithmic transformation does. The
input image has most of the intensities centered around the black color in its histogram
(Refer to part (c) of Figure 3.6 for its histogram). Now, the log-transformed image is such
that the narrow range of low intensities is mapped to a wider range of relatively high
intensities. This results in the expansion of the range of lower intensities. The expansion is
nonlinear, and the reverse is true for higher intensities. Their wider range of high intensities
is compressed to a narrower range of intensities. This is evident from the log-transformed
image.
For understanding the non-linearity in the expansion of the range of low intensities and the
compression of the range of high intensities, compare the log-transformed image in Figure 3.11
with part (b) of Figure 3.6, which is a histogram equalized version of the input image. One
can easily see that high intensities (near white) appear saturated due to equalization.
However, in a log transformed image, this saturation is absent. Low intensities are well
enhanced in log-transformed images as compared to histogram equalized images.
Figure 3.11: Log transformation
Log transformation is used in the above context. One application of log transforms is
displaying the 2D Fourier Transform of images. The 2D transform, as we shall see in the
upcoming chapters, is almost black except for some white pixels in the center. If it is
displayed without applying log transform, the information in dark regions will not be easy to
read from a visual perspective.
Let us now see the procedure for applying log transformation. Assuming that the input pixel
intensity is represented by r and output pixel intensity by s, the log transformation is defined
in the equation below:
Equation 3.1:
s=c log(1+r)
Where c is the amplitude factor. Since log transformation is global in nature, this
transformation is applied to every pixel of the image. To see the effect of this transformation
pictorially, refer to Figure 3.12:
Figure 3.12: Intensity mapping in log transformation
The input intensities are plotted on the x-scale, and the output intensities on the y-scale.
Notice the range of input intensities 0-50; in output, they become 0-175 approximately. This
is a stretching of low intensities. Similarly, note input intensities 200-250. They are mapped
to 245-255 approximately. This is the intensity range compression for high intensities.
Figure 3.12 is generated using Code 3.9.
Figure 3.11 is generated using Code 3.8, in which line numbers 18 and 19 need some explanation.
As stated earlier, when an image is imported into Python, it is in uint8 format.
01- #======================================================================
02- # PURPOSE : Learning Log Transformation on images
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import my_package.my_functions as mf # This is a user defined package and ...
06- # one may find the details related to its contents and usage in section 2.7.3
07-
08- #--------------------------------------------------------------------------
09- # Log Transformation (Global Transform)
10- #--------------------------------------------------------------------------
11- input_image=cv2.imread('img4.bmp',0)
12-
13- fig,ax=plt.subplots(1,2)
14- fig.show()
15- mf.my_imshow(input_image,'(a) Input Image',ax[0])
16-
17- amp_factor=1;
18- log_trans_image=amp_factor*np.log(1+np.float32(input_image))
19- log_trans_image=mf.norm_uint8(log_trans_image)
20- mf.my_imshow(log_trans_image,"(b) Log Transformed Image",ax[1])
21-
22- plt.show()
23- print("Completed Successfully ...")
Code 3.8: Code for log transformation on images
For doing calculations that may involve fractional values, we need to convert the image to float
(32 or 64-bit, according to the need). Line number 18 does exactly that through the syntax
np.float32(input_image) before applying the log transformation. In line number 19, the processed
image, which is in float32 format, is converted back to uint8 format after normalization. This
is achieved by using the norm_uint8 function from the custom package designed in Section
2.7.3 (a look into that section is recommended). The code for generating logarithmic
mapping is given below:
01- #======================================================================
02- # PURPOSE : LOG transformation curve
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import my_package.my_functions as mf # This is a user defined package and ...
06- # one may find the details related to its contents and usage in section 2.7.3
07-
08- #--------------------------------------------------------------------------
09- # Log Transformation (Global Transform)
10- #--------------------------------------------------------------------------
11- input_intensities=np.arange(0,256,1) # Input Grayscale Range
12- amp_factor=1;
13- log_trans_intensities=amp_factor*np.log(1+np.float32(input_intensities))
14- log_trans_intensities=mf.norm_uint8(log_trans_intensities)
15- plt.plot(input_intensities,log_trans_intensities)
16- plt.grid()
17- plt.title("Log Transformation")
18- plt.xlabel("Input Intensities")
19- plt.ylabel("Output Intensities")
20-
21- plt.show()
22- print("Completed Successfully ...")
Code 3.9: Code for generating logarithmic mapping in grayscale range
Inverse log transformation will do the reverse of what has been discussed above. That is, it
will compress the range of lower intensities and expand the range of higher intensities.
Hence, it is suitable for images that are too bright.
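A minimal sketch of the inverse log transformation is given below. It is not one of the numbered codes of this book; it simply replaces np.log in Code 3.8 with np.exp (equivalently, s = c(e^r - 1)) and reuses the norm_uint8 helper from the custom package of Section 2.7.3. The file name img4.bmp is borrowed from Code 3.8 purely for illustration.
import cv2, matplotlib.pyplot as plt, numpy as np
import my_package.my_functions as mf   # user defined package (see Section 2.7.3)

input_image = cv2.imread('img4.bmp', 0)         # grayscale read, uint8
r = np.float32(input_image) / 255.0              # scale to [0, 1] so exp() stays well behaved
inv_log_image = np.exp(r) - 1.0                  # inverse of s = log(1 + r)
inv_log_image = mf.norm_uint8(inv_log_image)     # normalize and convert back to uint8

fig, ax = plt.subplots(1, 2)
mf.my_imshow(input_image, '(a) Input Image', ax[0])
mf.my_imshow(inv_log_image, '(b) Inverse Log Transformed Image', ax[1])
plt.show()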

3.5.1.3 Power law transformation


Older television devices used to have a Gamma correction calibration button. It was present
because whenever a sensor (like a camera) captures intensities/colors, the mapping of
intensities before and after capture is not linear. To correct that, a power law correction is
applied, as illustrated in the following equation:
Equation 3.2:
s = c r^γ
Where s is output intensity and r is input intensity. To display the intensities correctly, power
law correction (or Gamma correction) is required. The exponent in the above equation is
Gamma – hence the name γ-transform or γ correction. c is the amplitude factor and is taken
as 1 in our calculations (since it is only a scaling factor, it does not matter much, as we use
the norm_uint8 function later).
Before looking at the actual application of Gamma transformation, let us see the input-output
intensity mapping as depicted in Figure 3.13. It does bear some similarity with the log
transform for some values of Gamma (say γ=0.1 to 0.3), but even then, it is more severe.
That is, the expansion/compression range is larger. Furthermore, the mapping changes
accordingly for different Gamma values. This is required as different capturing devices may
have different kinds of power law relations between their input and captured intensities. By
changing γ in the power law equation, one may tune the output to get visually correct results.
The plot shown in Figure 3.13 is generated by using Code 3.10. Apart from its main purpose,
notice line number 19 and 20 and related logic for plotting text with the curves:
01- #======================================================================
02- # PURPOSE : GAMMA transformation curve
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import my_package.my_functions as mf # This is a user defined package and ...
06- # one may find the details related to its contents and usage in section 2.7.3
07-
08- #--------------------------------------------------------------------------
09- # GAMMA Transformation (Global Transform)
10- #--------------------------------------------------------------------------
11- input_intensities=np.arange(0,256,1) # Input Grayscale Range
12- G=np.array([.05,.10,.20,.30,.40,.50,.70,1,1.5,2.5,5,10])
13- amp_factor=1;
14- j=10
15- for i in G:
16- Gamma_trans_intensities=amp_factor*(np.float32(input_intensities))**i
17- Gamma_trans_intensities=mf.norm_uint8(Gamma_trans_intensities)
18- plt.plot(input_intensities,Gamma_trans_intensities,'b')
19- t=plt.text(input_intensities[j],Gamma_trans_intensities[j],i)
20- t.set_bbox(dict(facecolor='pink', alpha=1, edgecolor='red'))
21- j=j+20
22-
23- plt.grid()
24- plt.title("GAMMA Transformation")
25- plt.xlabel("Input Intensities")
26- plt.ylabel("Output Intensities")
27- plt.legend()
28-
29- plt.show()
30- print("Completed Successfully ...")
Code 3.10: Code for generation of input/output mapping of power law transformation

Figure 3.13: Intensity mapping in Gamma correction


To see the effect of applying Gamma transformation to images, refer to Figure 3.14. You
must try Gamma transformation with different values of Gamma by using Code 3.11.
Compare the results with log transformation and histogram equalization.
01- #======================================================================
02- # PURPOSE : GAMMA transformation
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import my_package.my_functions as mf # This is a user defined package and ...
06- # one may find the details related to its contents and usage in section 2.7.3
07-
08- #--------------------------------------------------------------------------
09- # GAMMA Transformation (Global Transform)
10- #--------------------------------------------------------------------------
11- input_image=cv2.imread('img4.bmp',0)
12-
13- fig,ax=plt.subplots(1,2)
14- fig.show()
15- mf.my_imshow(input_image,"(a) Input Grayscale Image",ax[0])
16-
17- amp_factor=1;
18-
19- G=.3
20- Gamma_trans_image=mf.norm_uint8(amp_factor*(np.float32(input_image))**(G))
21- mf.my_imshow(Gamma_trans_image,"(b) Gamma Transformed Image with Y = "+str(G),ax[1])
22-
23- plt.show()
24- print("Completed Successfully ...")
Code 3.11: Code for applying Gamma transformation on grayscale image
The output of the code is shown in figure below:
Figure 3.14: Gamma (power law) transformation

3.5.2 Spatial transformations


Spatial transforms are different from spatial-domain processing (a topic of Chapter 4). In a
spatial transformation, the location of every pixel in an image is changed according to
some rule. In this section, we will discuss some important spatial transformations.

3.5.2.1 Affine transformation


Affine transformation is a combination of translating every pixel’s coordinate of an image
together with linear mapping. It can be mathematically represented below:
Equation 3.3:
X' =AX+B

Where X' = [x'; y'] is the transformed pixel coordinate after the transformation, X = [x; y] is the
initial pixel coordinate, A = [[a, b], [c, d]] is the linear map matrix, and B = [e; f] is the translation
matrix.
Before discussing the affine transformation in totality, we will discuss the effect of linear
mapping on a uniform grid in 2D. In Table 3.3, we will consider the effect shown below:
Equation 3.4:
X'=AX
Notice that as compared to the equation of affine transformation, the translation matrix B=0
here. For A = [[a, b], [c, d]], some special matrices for very specific values of a, b, c, and d are
shown in Table 3.3:
S. No. | Linear map name | Transformation matrix
1. | Identity transformation | [[1, 0], [0, 1]]
2. | Scaling in X-direction by a factor of 2 | [[2, 0], [0, 1]]
3. | Rotation by angle θ in counter-clockwise direction (θ = π/6 in the current case) | [[cos θ, −sin θ], [sin θ, cos θ]]
4. | Mirroring about X axis | [[−1, 0], [0, 1]]
5. | Horizontal shear mapping by factor m | [[1, m], [0, 1]]
6. | Horizontal squeeze mapping by factor k | [[k, 0], [0, 1/k]]
7. | Projection onto Y axis | [[0, 0], [0, 1]]
(The column showing the grid of points before and after each transformation is pictorial; those plots can be regenerated with Code 3.12.)
Table 3.3 : Illustration of linear mapping


We consider the application of linear transformation equation, i.e., Equation 3.4, which can
be written in expanded form, as shown below:
Equation 3.5:
[x'; y'] = [[a, b], [c, d]] [x; y], i.e., x' = ax + by and y' = cx + dy
For the following cases:


1. Identity
2. Scaling
3. Rotation
4. Mirroring
5. Shear mapping
6. Squeeze mapping
7. Projection onto one of the axis
For different values of a, b, c, and d, these cases will be realized in corresponding rows, as
shown in Table 3.3. The corresponding coding can be seen in Code 3.12. However, note that
any 2x2 matrix used for transformation will be called linear transformation matrix (LTM)
in general.
We will discuss Code 3.12 and Table 3.3 in parallel. Notice the usage of np.float32 in line
number 11 and 13 for creating x and y coordinates of the uniform grid in 2D, as marked by
thick circles in any figure of Table 3.3. Instead of np.float32, we could have used np.array;
however, since no fractional numbers are used while defining the array, np.array would create an
integer array, and fractional values produced by the linear transformation would then be truncated
when stored. This is what we do not want; that is why we
use np.float32. Now, there are 14 linear transformation matrices, as defined in Code 3.12,
for different purposes in 7 broad categories. These categories are not exhaustive. Parameters
a, b, c, and d in general can have any value. The transformation matrix will still be called a
linear transformation in general. However, the 7 categories that we present in Table 3.3 are
used frequently.
Identity transformation does nothing to the pixel location. Scaling, on the other hand, can be
applied to both the X and/or Y axes. In Table 3.3, only scaling in the X-direction is shown. In
Code 3.12, LTM for scaling in X and/or Y directions can be found in line number 22 to 24.
Similarly, other linear transforms and their variants can be found in Code 3.12 and Table 3.3.
One may note the pixel locations before and after the transformation and correlate the
transformation matrix used with the result.
01- #===========================================================================
02- # PURPOSE : Understanding Linear Mapping
03- #===========================================================================
04- import matplotlib.pyplot as plt
05- import numpy as np
06-
07- #---------------------------------------------------------------------------
08- # Defining points on X-Yplane
09- #---------------------------------------------------------------------------
10- # Constructing a 1D array of all row coordinates
11- org_x=np.float32([0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3])
12- # Constructing a 1D array of corresponding column coordinates
13- org_y=np.float32([0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3])
14-
15- #---------------------------------------------------------------------------
16- # Creating 2D linear transformation matrices (LTM) for use
17- #---------------------------------------------------------------------------
18- #1. Identity Transformation
19- LTM1=np.array([[1,0],[0,1]])
20-
21- #2. Scaling
22- LTM2=np.array([[2,0],[0,1]]) # Scaling in X-direction by a factor of 2
23- LTM3=np.array([[1,0],[0,3]]) # Scaling in Y-direction by a factor of 3
24- LTM4=np.array([[2,0],[0,3]]) # Scaling in X & Y-direction by a factor of 2 & 3 resp.
25-
26- #3. Rotating by angle theta in ACW direction
27- theta=np.pi/6 # (30 degrees)
28- LTM5=np.array([[np.cos(theta),-np.sin(theta)],[np.sin(theta),np.cos(theta)]])
29-
30- #4. Mirroring
31- LTM6=np.array([[-1,0],[0,1]]) # Mirror about X axis
32- LTM7=np.array([[1,0],[0,-1]]) # Mirror about Y axis
33- LTM8=np.array([[-1,0],[0,-1]]) # Mirror about X & Y axis
34-
35- #5. Shear mapping
36- m=2
37- LTM9=np.array([[1,m],[0,1]]) # Horizontal shear mapping by factor 'm'
38- n=3
39- LTM10=np.array([[1,0],[n,1]]) # Vertical shear mapping by factor 'n'
40- LTM11=np.array([[1,m],[n,1]]) # Horizontal & Vertical shear mapping by factor 'm' & 'n' resp.
41-
42- #6. Squeeze mapping by factor 'k'
43- k=2
44- LTM12=np.array([[k,0],[0,1/k]])
45-
46- #7. Projection onto 'y' axis
47- m=2
48- LTM13=np.array([[0,0],[0,1]]) # Projection onto 'y' axis
49- LTM14=np.array([[1,0],[0,0]]) # Projection onto 'x' axis
50-
51- #---------------------------------------------------------------------------
52- # SELECTING ONE LTM FROM THE ABOVE LTM's
53- #---------------------------------------------------------------------------
54- LTM=LTM5
55-
56- #---------------------------------------------------------------------------
57- # Transforming every 2D point by using linear transformation matrix
58- #---------------------------------------------------------------------------
59- trans_x=org_x*0
60- trans_y=org_x*0
61- for i in np.arange(0,len(org_y),1):
62- trans_x[i],trans_y[i]=np.matmul(LTM,[org_y[i],org_x[i]])
63-
64- #---------------------------------------------------------------------------
65- # Plotting Everything
66- #---------------------------------------------------------------------------
67- fig,ax=plt.subplots()
68- fig.show()
69- ax.plot(org_y,org_x,'co',label="Original Points",markersize=12)
70- ax.grid()
71- ax.axis('equal')
72- ax.set_xlabel("X axis -->")
73- ax.set_ylabel("Y axis -->")
74- ax.plot(trans_x,trans_y,'k>',label="Transformed Points")
75- ax.legend()
76-
77- plt.show()
78- print("Completed Successfully ...")
Code 3.12: Linear mapping code
Affine transformation (Equation 3.3) has translation in addition to linear mapping, i.e., it can
be described in expanded form, as shown in the following equation:
Equation 3.6:
[x'; y'] = [[a, b], [c, d]] [x; y] + [e; f]
Apart from the linear mappings depicted in Table 3.3, there is an additional global movement
of the set of grid points as directed by [e; f]. The preceding equation can also be written in the
following format, as shown in the following equation:
Equation 3.7:
[x'; y'; 1] = [[a, b, e], [c, d, f], [0, 0, 1]] [x; y; 1]
The advantage of the above form is that we now have to deal with only one matrix; earlier,
there were two: one for the linear map and another for the translation. Note that the above equation
has 6 degrees of freedom as there are 6 free parameters a, b, c, d, e, and f – they can, in
general, have any arbitrary value.
The matrix for squeeze mapping by k = 1.5 together with a translation in the X-direction by 40 and
in the Y-direction by 50 is:
[[1.5, 0, 40], [0, 1/1.5, 50], [0, 0, 1]]
Let us try to apply the affine transformation (as dictated by the above matrix) to an actual
image. Then, we will discuss some important points about affine transformation. The code
for that is Code 3.13 with its output shown in Figure 3.15. In line number 22, affine
transformation matrix (ATM) for the above specified case of squeeze mapping is created.
In line number 28, affine transform is applied by using the warp function. However, instead
of passing ATM as the second argument, we pass its inverse (this is the requirement of the
function) as - np.linalg.inv(ATM). linalg means linear algebra sub library and inv means
inverse. In the output shown in Figure 3.15, there are two versions of transformed output.
The first is cropped, and the second is uncropped. The size of the cropped version is the
same as the input image, but it loses some image portion that goes outside the input image
dimensions. The second version, which is an uncropped version, is created in line number
30. Notice that the command is exactly the same as line number 28, with only one additional
argument output_shape=(rows/1.5+50,cols*1.5). This forces an output shape on the image,
and hence, we can see the chopped-off portions of the processed image as well. We only
have to pass a tuple in the format (rows desired, columns desired) that we wish to see in
output.
01- #======================================================================
02- # PURPOSE : Learning Affine Transformation on images
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- from skimage.transform import warp # For warping the image by transform matrix
06- import my_package.my_functions as mf # This is a user defined package and ...
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- #--------------------------------------------------------------------------
10- # Importing the image and displaying
11- #--------------------------------------------------------------------------
12- input_image=cv2.imread('img1.bmp',0)
13- rows, cols = input_image.shape
14-
15- fig,ax=plt.subplots(1,3)
16- fig.show()
17- mf.my_imshow(input_image,"(a) Input Grayscale Image",ax[0])
18-
19- #--------------------------------------------------------------------------
20- # Creating Affine Transform Matrix (ATM)
21- #--------------------------------------------------------------------------
22- ATM=np.float32([[1.5,0,40],[0,1/1.5,50],[0,0,1]])
23-
24- #--------------------------------------------------------------------------
25- # Warping the image according to the matrix selected and displaying
26- #--------------------------------------------------------------------------
27-
28- affine_transformed_image=mf.norm_uint8(warp(input_image,np.linalg.inv(ATM)))
29- mf.my_imshow(affine_transformed_image ,"(b) Affine image (cropped)",ax[1])
30- affine_transformed_image=mf.norm_uint8(warp(input_image,np.linalg.inv(ATM),output_shape=(np.int16(rows/1.5+50),np.int16(cols*1.5))))
31- mf.my_imshow(affine_transformed_image ,"(c) Affine image (un-cropped)",ax[2])
32-
33- plt.show()
34- print("Completed Successfully ...")
Code 3.13: Code for affine transformation on image
The output of the code is shown below:

Figure 3.15: Output of Code 3.13: (a) Original grayscale image (b) Squeeze mapping by a factor of 1.5 – cropped output,
cropping is done to match the input image size (c) Squeeze mapping by a factor of 1.5 – uncropped output
While interpreting the output in Figure 3.15, remember the convention of the X and Y axes
on the image described in Section 1.3 i.e., the top left corner of the image is point (0,0), the
top edge of the image is X axis (positive rightwards), and the left vertical edge is Y axis
(positive downwards). For squeeze mapping by k = 1.5 (the value used in Code 3.13), one may note
in part (c) of Figure 3.15 that the image has been stretched by a factor of 1.5 in the X direction and
shrunk by a factor of 1.5 (or stretched by 1/1.5) in the direction of Y. Further, in the X direction,
there is a shift by 40, and in the Y direction, there is a shift by 50.
You may now try all the transforms discussed in this section by replacing the ATM defined in line
number 22 of Code 3.13. Now, having understood the mathematical background of affine
transforms and their application, let us understand some important characteristics of affine
transform. Refer to part (a) of Figure 3.16, wherein the original grayscale image with a
geometrical shape is shown. In parts (b), (c), and (d), shear mapping, rotation, and squeeze
mapping are applied successively.

Figure 3.16: Properties of affine transform


From Figure 3.16, one can easily observe the following facts, which are in general
applicable to any affine transformation:
• Origin does not necessarily map to origin
• Lines map to lines
• Parallel lines remain parallel
• The ratio of distances is preserved

3.5.2.2 Projective transformation


Projective transformation can be thought of as a generalization of affine transformation in
the sense that it uses 8 degrees of freedom (as we shall see soon) with the following
characteristics:
• Origin does not necessarily map to origin
• Lines map to lines
• Parallel lines do not remain parallel in general
• The ratio of distances is not preserved
The last two characteristics are different when compared to affine transformation. Before
looking at the actual transformation, we will build up some intuition for understanding
projective transformation.
Let us begin by looking at some useful insights offered by the following equation:
Equation 3.8:
[x'; y'; z'] = [[a, b, c], [d, e, f], [g, h, i]] [x; y; z]
The transformation matrix can be further written compactly in block form as [[R, T], [E, i]],
where R = [[a, b], [d, e]], T = [c; f], and E = [g, h]. From the previous section, we know the effect of R (which
is a rotation, or a linear transformation in general) and T (which is a translation). For affine
transformation, E = [0, 0] and i = 1, and also z = z' = 1. Mathematically speaking, a matrix of order
n×n transforms an n dimensional vector (point) to another n dimensional vector. For
example, in Equation 3.8, a 3D point (x,y,z) is transformed to (x’,y’,z’). However, in the
previous section, we used this matrix to translate a 2D point to another 2D point such that
z=z’=1. Actually, (x,y,1) and (x’,y’,1) are still 3D points but those 3D points are restricted so
that they only lie in plane z=1, and hence we treat them as 2D points. We did this because we
wanted to incorporate translation together with linear transformation, making the overall
operation non-linear, called affine transform. In 3D, this transformation is linear, but for a 2D
point, this becomes non-linear.
Now, we want to investigate what happens when g, h, and i can take any arbitrary values.
However, remember that we are still talking about 2D points. To understand the situation
practically, we will define a new coordinate system called homogeneous coordinate
system, as depicted in Figure 3.17. The 3D coordinate (x̃, ỹ, z̃) is called the homogeneous
coordinate, and we are paying special attention to the plane z̃ = 1. That plane is the plane
where we would like to watch the phenomenon of transforming one set of points (or shapes in
general) to another set of points (i.e., the initial and final positions of the points or shapes will
remain in the plane itself). Let us assign a new coordinate system of 2D cartesian coordinates
to this plane (z̃ = 1), where a general point on the plane is represented by (x, y).

Figure 3.17: Homogeneous coordinate system


In such a setting, if we consider the origin of 3D homogeneous coordinate as the point of
projection, then every point in 3D space will have a corresponding image on the plane z̃ = 1.
Since the origin of the homogeneous coordinate system is the point of projection, a line (say L,
as shown in Figure 3.17) joining the origin and any general point (x̃, ỹ, z̃) in 3D will intersect the
plane at some point. In the 2D cartesian coordinates of the plane, let us call that point of intersection
(x, y). Through a little geometrical exercise, one can establish that x = x̃/z̃ and y = ỹ/z̃. So, we
say that the homogeneous coordinates of a general 3D point are (x̃, ỹ, z̃), its normalized 3D
homogeneous coordinate is (x̃/z̃, ỹ/z̃, 1), or equivalently, its 2D cartesian coordinate is
(x̃/z̃, ỹ/z̃) = (x, y).
Also, understand that every point lying on that line L will map to the same point on the
plane. That is, through the above procedure, points in 3D are projected onto points in 2D.
However, remember that the two coordinates are different.
Now, let us refer to Figure 3.18. Notice that the four planes shown there, named P1,
P2, P3, and P4, appear as the same plane when seen from the 3D origin because their
projection (considering the origin as the projection center) on the plane z̃ = 1 is the same. That is, P2
(which lies in the plane z̃ = 1) is the projection of P1 as well as of P3 and P4.
Figure 3.18: Equivalent planes in 2D coordinates
So, planes P1, P3, and P4, which are randomly oriented in 3D, have the same image in the 2D
plane z̃ = 1. This is called equivalence of planes during projection.
Now, let us come back to defining our objective of doing projective transformation. A 2D
projective transformation (i.e., in the plane z̃ = 1) is defined as the transformation of one
quadrilateral to another quadrilateral. This is an informal definition of projective
transformation. In general, the transformation defined in Equation 3.8 will transform a 3D
point to 3D point (or a 3D shape to a 3D shape). However, if we apply some constraint on
the structure of the matrix/equation, we will be able to map a 3D shape to a 3D shape such that the
initial and final shapes lie in the plane z̃ = 1.
The constraint that we apply on Equation 3.8 is that the coordinates of initial and final point
will be written in homogeneous coordinates. Hence, Equation 3.8 can be rewritten, as shown
below:
Equation 3.9:
[x̃'; ỹ'; z̃'] = [[a, b, c], [d, e, f], [g, h, i]] [x; y; 1]
All we have done in Equation 3.9 is that, on the RHS, the coordinates of the source/input points are
written with z = 1 because that shape already lies on the plane z̃ = 1. Through the 3x3 matrix
transformation, that shape will be transformed to an arbitrary location in 3D. This is represented by
(x̃', ỹ', z̃'). Its projection on the plane (of 2D cartesian coordinates, with z̃ = 1) is
(x̃'/z̃', ỹ'/z̃', 1), and the 2D cartesian coordinate is (x', y') = (x̃'/z̃', ỹ'/z̃').
With the introduction of homogeneous coordinates, any point in 3D will, in general, be
normalized such that its z coordinate becomes 1. In the RHS of Equation 3.9, whether we write
(x, y, 1) or (kx, ky, k) for any non-zero k, both are equivalent. So, Equation 3.9 is unique only up to a scale
factor. From the 3x3 transformation matrix, if we take out a common factor, say i, every other element will be divided
by i. Hence, the last element of the matrix becomes 1 (this can be done with any
element, but conventionally it is done with the last one). So, Equation 3.9 can be rewritten as:
Equation 3.10:
[x̃'; ỹ'; z̃'] = [[a, b, c], [d, e, f], [g, h, 1]] [x; y; 1]
This is our final equation for projective transformation, which has 8 degrees of freedom (as
in a 3x3 matrix, there are 8 values to choose). Remember, the source/input and target/output
coordinates are written in homogeneous coordinates, and the equation is valid up to a scale
factor only because of the reasons discussed earlier.
We are already aware of what a, b, c, d, e, and f will do in transformation matrix. Now,
through Figure 3.19, let us see what g and h are capable of doing. For this, the
transformation matrix that we are using is [[1, 0, 0], [0, 1, 0], [g, h, 1]]. One may note the change of values of g
and h in various sub-plots of Figure 3.19. This process is called elation. Notice that due to
this, parallel lines will not remain parallel, and the ratios of distances will not be preserved as
well.

Figure 3.19: Elation process in projective transformation


This effect, when combined with linear transformation and translation (represented by non-
zero values of a, b, c, d, e, and f) is called projective transformation. Now, in Figure 3.20, let
us see what a typical projective transformation looks like:

Figure 3.20: Typical projective transformation


We want to translate all points in the source quadrilateral to the target quadrilateral. Both the
quadrilaterals lie in the plane z̃ = 1. Now, this can be done through proper selection of the
transformation matrix and using Equation 3.10. However, note that the RHS of this equation,
when calculated, will give the coordinates in 3D. This is represented by intermediate
quadrilateral in Figure 3.20. This quadrilateral is not in the plane z̃ = 1, but its projection is.
The projection can be easily obtained by converting the coordinates of every point to
homogeneous coordinates (normalized).
Now, let us see what practical problems can be solved through projective transformation
through Figure 3.21 and Figure 3.22. In Figure 3.21, we intend to map the quadrilateral
shown from source to target. In this case, observe that the content inside marked
quadrilaterals lies on a plane itself (to a very good degree). So, after transformation, one-to-
one correspondence between these planes can be established.
Figure 3.21: Projective transform applied to planar object
However, in Figure 3.22, notice that the content shown inside the quadrilaterals differs between the two images.
For example, the two pillars behind are not fully visible in the first image, but they are
visible in the second. Since the content changes in 3D, the object inside the quadrilaterals is
not a planar object. In such situations, the projective transformation will work only
approximately (when it is forcibly applied).

Figure 3.22: Projective transformation applied to non-planar object


The code for doing projective transformation is given in Code 3.14, together with the sample
outputs corresponding to the intended correct usage, a viewpoint change application, and a
visually incorrect (though mathematically correct) usage, shown in Figure 3.23, Figure 3.24, and
Figure 3.25, respectively. Before discussing the output images, we will discuss the code
itself:
01- #======================================================================
02- # PURPOSE : Learning Projective Transformation on images (INTERACTIVE)
03- #======================================================================
04- import numpy as np
05- import cv2
06- import matplotlib.pyplot as plt
07- from skimage import transform as tf
08- from skimage.transform import warp # For warping the image by transform matrix
09- import my_package.my_functions as mf # This is a user defined package and ...
10- # one may find the details related to its contents and usage in section 2.7.3
11-
12- #--------------------------------------------------------------------------
13- # Importing the image
14- #--------------------------------------------------------------------------
15- input_image=cv2.imread('img8.bmp',0)
16- rows, cols = input_image.shape
17- fig,ax=plt.subplots(1,2)
18- fig.show()
19- mf.my_imshow(input_image,"(a) Input Grayscale Image",ax[0])
20- ax[0].axis("on")
21-
22- #--------------------------------------------------------------------------
23- # Taking the input of source points from user
24- #--------------------------------------------------------------------------
25- src=np.asarray(plt.ginput(1)) # Src Pt 1 (I/p from user)
26- ax[0].plot(src[0,0],src[0,1],'r.',markersize=10)
27- src=np.vstack((src,np.asarray(plt.ginput(1)))) # Src Pt 2 (I/p from user)
28- ax[0].plot(src[1,0],src[1,1],'r.',markersize=10)
29- src=np.vstack((src,np.asarray(plt.ginput(1)))) # Src Pt 3 (I/p from user)
30- ax[0].plot(src[2,0],src[2,1],'r.',markersize=10)
31- src=np.vstack((src,np.asarray(plt.ginput(1)))) # Src Pt 4 (I/p from user)
32- ax[0].plot(src[3,0],src[3,1],'r.',markersize=10)
33-
34- #--------------------------------------------------------------------------
35- # Fixed destination points
36- #--------------------------------------------------------------------------
37- dst=np.float32([ [0,0], # Corresponding Target Point 1
38- [0,rows], # Corresponding Target Point 2
39- [cols,rows], # Corresponding Target Point 3
40- [cols,0] ]) # Corresponding Target Point 4
41-
42- #--------------------------------------------------------------------------
43- # Deducing the transformation matrix from corresponding points
44- #--------------------------------------------------------------------------
45- trans_matrix = tf.estimate_transform('projective', src, dst)
46-
47- #--------------------------------------------------------------------------
48- # Applying projective transformation
49- #--------------------------------------------------------------------------
50- projective_transformed_image=mf.norm_uint8(warp(input_image,np.linalg.inv(trans_matrix),output_shape=(np.int16(rows*1),np.int16(cols*1))))
51- mf.my_imshow(projective_transformed_image,"(b) Projective Transformed Image",ax[1])
52-
53- #--------------------------------------------------------------------------
54- # Plotting Logic
55- #--------------------------------------------------------------------------
56- ax[1].axis("on")
57- ax[1].plot(dst[:,0],dst[:,1],'r.',markersize=20)
58-
59- plt.show()
60- print("Completed Successfully ... ")
Code 3.14: Interactive code for doing projective transformation
The output of the above code is shown in the three figures below for three different cases.
The following figure shows the output with the correct intended usage:

Figure 3.23: Correct intended usage of projective transform


The output of projective transform for viewpoint change is shown below. Notice that only
the input points were differently placed as compared to the above case.

Figure 3.24: Use of projective transform for viewpoint change


The following is the result of forceful application of projective transformation to an image.
This corresponds to visually incorrect usage.
Figure 3.25: Forceful application of projective transform to non-planar object
This code is like Code 3.13 for doing affine transform on images. However, now the user can
choose the source quadrilateral by clicking over the image and putting the four vertex points
on the source image. The order of points clicked should be anticlockwise, starting from the
top left point of the quadrilateral. One may notice the syntax of taking interactive input from
the user in line number 25, 27 and 31 of Code 3.14. Also note that in the source and
destination/target array, the points should be placed in matching/corresponding positions
according to the desired transformation. In line number 45, the command for estimation of
the matrix is shown. Since the projective transform has 8 degrees of freedom, we require 4
point correspondences (pairs of x, y coordinates) to estimate the matrix; the affine transform has
only 6 degrees of freedom (g = h = 0 in the matrix), so only 3 point correspondences are
needed. The same code can be used to do affine transformation as
well. Note also that we have fixed the destination/target quadrilateral in the code. This is,
however, not compulsory. One may choose any quad-to-quad transformation as they please.
In Figure 3.25, the source quadrilateral does not contain a planar object, and the projective
transformation is forcibly applied. The result is hence neither visually appealing nor consistent
with our everyday experience.
Also, note that if the set of 4 points does not form a convex quadrilateral (like the one shown
in Figure 3.26), the code will still apply the projective transform, but the result is of little practical use.

Figure 3.26: A non-convex quadrilateral

3.6 Color models


A three-dimensional cartesian coordinate system is known to us. It is a way of representing a
point in space by using three coordinates x, y, and z. Similarly, we need three primary colors
to represent any color: red, green, and blue. We get the desired colors by mixing them in a
proper proportion. Remember that here we are not talking about how humans perceive color
– that is a complex physiological phenomenon and not yet fully understood. We are talking
about how colors are represented inside computers. A color model is, hence, a coordinate
system for colors. In this section, we will study some classical color models out of the many
available.

3.6.1 RGB color model


Refer to part (a) of Figure 3.27 to understand the RGB color model. The three axes are for
colors red, green, and blue. Each axis starts from 0 and ends at 255. Note that fractional
values between 0 and 255 are also allowed, giving a continuous color system; in a discrete system, only
some values are allowed. 0 means the absence of a color, and 255 means the full
presence of color. So, a triplet (52,63,250) represents a specific color, which contains 52 on
the red axis, 63 on the green axis, and 250 on the blue axis. Hence, black color will be
represented by (0,0,0) and white by (255,255,255).
Also, note that the color cube is not hollow. It is filled with infinite colors inside. To
understand that it is a solid cube and not hollow, we can see part (b) of Figure 3.27. In this
part, a discrete color cube is shown, as it can be easily seen that the cube is not hollow:

Figure 3.27: RGB color model


Keeping the above color model in mind, an RGB colored image is an image that has three
frames stacked one over the other, as demonstrated in Section 1.2.2, Figure 1.2. Also, the
default channel order Python (through OpenCV) uses is BGR instead of RGB, as noted in Section 2.7. However, we
have overcome this difficulty with the custom package designed in Section 2.7.3 (Code
2.17).
The RGB color model is mostly used by display devices like TV monitors, computer
screens, etc. It is simple to understand and implement.
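As a small illustrative sketch (not from the book's code base), the triplet (52, 63, 250) mentioned above can be visualized by building a solid color patch directly in NumPy and displaying it with Matplotlib, which expects RGB channel order.
import numpy as np, matplotlib.pyplot as plt

patch = np.zeros((100, 100, 3), dtype=np.uint8)   # 100x100 RGB image
patch[:, :, 0] = 52    # red channel
patch[:, :, 1] = 63    # green channel
patch[:, :, 2] = 250   # blue channel

plt.imshow(patch)
plt.title("RGB = (52, 63, 250)")
plt.axis('off')
plt.show()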

3.6.2 Cyan-Magenta-Yellow color model


Refer to Figure 3.28 for understanding the CMY color model in comparison to the RGB
color model. The primary colors in the RGB model were red, green, and blue. However, in
the CMY model, they are cyan, magenta, and yellow. It is interesting to note that RGB and
CMY are complementary color models. Pure colors red, green, and blue mix to produce
white, i.e., [255, 255, 255] in the RGB color model. In the CMY model, pure colors cyan,
magenta, and yellow mix to produce black, i.e., [0,0,0].

Figure 3.28: RGB vs. CMY framework


This can be mathematically understood in the following equation for any value of the triplet
(R, G, B):
Equation 3.11:
[C; M; Y] = [255; 255; 255] − [R; G; B]
The reverse relation is trivial to find out. The CMY color model is mostly used by the
printing industry.
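A minimal sketch of Equation 3.11 in NumPy is shown below; the file name img1.bmp is borrowed from earlier codes and is only an assumption here. OpenCV reads images in BGR order, so the channels are first reordered to RGB before the subtraction.
import cv2, numpy as np

bgr = cv2.imread('img1.bmp')                  # color read in BGR order
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)    # reorder to RGB
cmy = 255 - rgb                               # Equation 3.11: C, M, Y = 255 - (R, G, B)
rgb_back = 255 - cmy                          # the reverse relation is the same subtraction
print(np.array_equal(rgb, rgb_back))          # True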

3.6.3 Hue-Saturation-Intensity color model


A point in 3 dimensional cartesian coordinates can be uniquely defined by a triplet (x,y,z).
However, this is not the only way of defining a point uniquely. Refer to Figure 3.29. First,
without looking at the context (labels), note that any point in three-dimensional space can be
defined by the intersection of the three structures shown in part (a), (b), and (c) of the figure.
In part (a) of the figure, a doorlike structure tied to a hinge (the diagonal line) is shown. This
doorlike structure is free to rotate around this hinge through 360 degrees. In part (b) of the figure,
a cone formed from a triangle (instead of a circle) is shown. The axis of this cone is the same as
the hinge of the door. The triangular cone can have any radius. Similarly, in part
(c) of the figure, a disk perpendicular to the hinge is shown. This disc can slide along the
hinge but must remain perpendicular to it.
Figure 3.29: HSI color model
An interesting point to note is that for a given position of the door in part (a), a given radius
of the triangular cone in part (b), and for a given position of the disc on the hinge in part (c),
the three structures intersect in a unique point (if they were put in the same 3D box shown).
Hence, it is also a coordinate system.
Now, let us apply context to it. In part (a) of Figure 3.29, each position (angle) of the door
corresponds to a unique color, called hue. In part (b) of the figure, the radius of the cone
corresponds to the amount of white mixed into the pure color. This is called saturation; the lower
the radius, the larger the portion of white that is mixed in. In part (c) of the figure, the position
of the disk tells us about the illumination, called intensity. Also note that all of this is shown within
the same kind of RGB cube.
Now, one can characterize any point in three-dimensional color space by its hue, saturation,
and intensity. This is called the Hue-Saturation-Intensity (HSI) color model. This model is
primarily used by artists in color pickers in various drawing and writing applications. The
conversion from RGB to HSI color model and vice versa is given in the following equations:
Assuming R, G, and B are normalized to the range [0, 1] and H is expressed in degrees:
Equation 3.12:
θ = cos⁻¹ { [(R − G) + (R − B)] / [ 2 √((R − G)² + (R − B)(G − B)) ] }
Equation 3.13:
H = θ if B ≤ G, and H = 360° − θ if B > G
Equation 3.14:
S = 1 − 3 min(R, G, B) / (R + G + B)
Equation 3.15:
I = (R + G + B) / 3
The inverse relations, one for each 120° sector of hue, are given in the following equations:
Equation 3.16 (0° ≤ H < 120°):
B = I(1 − S), R = I[1 + S cos H / cos(60° − H)], G = 3I − (R + B)
Equation 3.17 (120° ≤ H < 240°, using H − 120° in place of H):
R = I(1 − S), G = I[1 + S cos H / cos(60° − H)], B = 3I − (R + G)
Equation 3.18 (240° ≤ H ≤ 360°, using H − 240° in place of H):
G = I(1 − S), B = I[1 + S cos H / cos(60° − H)], R = 3I − (G + B)
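A compact sketch of the forward (RGB to HSI) conversion is given below for a single triplet; it is only an illustration of Equations 3.12 to 3.15 and not part of the book's custom package. R, G, and B are assumed to be normalized to [0, 1], and H is returned in degrees.
import numpy as np

def rgb_to_hsi(R, G, B):
    eps = 1e-8                                                      # guards against division by zero
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))    # Equation 3.12
    H = theta if B <= G else 360.0 - theta                          # Equation 3.13
    S = 1.0 - 3.0 * min(R, G, B) / (R + G + B + eps)                # Equation 3.14
    I = (R + G + B) / 3.0                                           # Equation 3.15
    return H, S, I

print(rgb_to_hsi(52 / 255, 63 / 255, 250 / 255))                    # the triplet used in Section 3.6.1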

Conclusion
In this chapter, the reader must have developed a beginner-level understanding of dealing
with the smallest element of an image, that is, the pixel. Histogram processing was introduced
to extract some useful information from images. Through histogram equalization, a
histogram with an arbitrary distribution can be converted to a uniform distribution. Histogram matching was
also introduced to make the histogram of a given image similar to that of a target image. Some basic
global transformations were introduced, and affine and projective transformations were
covered in detail. For color image processing, basic color models were also introduced,
illustrating their relation to the RGB color model to facilitate conversion.

Points to remember
• The neighborhood of a pixel can be of many types like 4, 8. This can be closed (meaning
the central pixel is included) or open.
• The histogram of an image is a plot of intensities on the X-axis and their frequencies on
the Y-axis.
• Histogram equalization means modifying the histogram of the current image so that it
becomes uniform. This is achieved by modifying pixel values in the original image by
the prescribed procedure.
• Histogram matching/specification means making a histogram of one image like the other.
• Logarithmic transform on images is a global, non-linear transform that brightens the
darker intensities.
• Affine transformation is linear mapping plus translation.
• Projective transformation is a generalization of affine transformation in the sense that
affine uses 6 degrees of freedom, but projective uses 8.
• The RGB color model is used by display devices like computer screens, the CMY color
model is used by printers, and the HSI model is used by color picker programs.

Exercises
1. Using Python, import two images captured from the same camera and from the exact
same scene. The first image should be taken in daylight and the second at night. Form
their histograms and compare them. [Hint: Refer Code 3.2]
2. For the images imported in question 1, equalize both the images and then compare their
histograms and visual quality.
3. For the image shown in part (a) of Figure 3.11, choose a suitable gamma transformation
by choosing the value of γ so that the results match with part (b) of Figure 3.11 due to
the log transformation.
4. Scan a document using your mobile camera and straighten the image so formed by
using projective transformation. [Hint: See Section 3.5.2.2]
5. Use a color picker program and note the HSI value for the chosen color. Calculate the
corresponding RGB values and verify with the RGB values displayed in the color
picker. [Hint: In the Microsoft Windows Paint program, the color picker shows both
values].

Join our book’s Discord space


Join the book's Discord Workspace for Latest updates, Offers, Tech happenings around the
world, New Release and Sessions with the Authors:
https://discord.bpbonline.com

CHAPTER 4
Spatial Domain Processing

4.1 Introduction
In this chapter, we will explore the methods/systems that are applied to
digital images for processing them and obtaining the desired output in the
spatial domain. For example, one may want to find all vertical lines (if any)
in the given image. One may want to highlight/illuminate the areas that are
darker in the image, leaving other areas untouched or one may want to
sharpen the image for better visibility — countless such applications can be
listed. For these specific applications, there should be specific systems that
should get the job done. An important question at this stage is the number of
such systems that can be designed. Even if we designed systems for all
such applications, a new application would pop up the next moment.
We will begin by understanding signals (image in our case) and systems in
1D and, from there, generalize the understanding of 2D image processing
systems. Although the introduction to signals and systems in this chapter is
not mathematically rigorous, it is still necessary and intuitively appealing for
understanding.

Structure
In this chapter, we will cover the following topics:
• Signals in one dimension
• Systems in one dimension
• Graphical illustration of one-dimensional convolution
• One dimensional filter design intuition
• Concept of two-dimensional filtering of images
• Two-dimensional filtering of images using Python
• Smoothening low pass filters in spatial domain
• Sharpening filters
• Convolution vs. correlation

Objectives
After reading this chapter, the reader will be able to understand the concepts
of spatial domain filtering for one and two-dimensional signals. The reader
will be able to apply concepts so developed to digital images for enhancing
them using the well-established methods in spatial domain processing. The
reader will be able to understand various smoothening and sharpening filters
(just like water filter that separates dirt from water) and the method of
creation of such filters. Finally, the operation of convolution for achieving
the spatial domain filtering through various filters (also called as kernels)
will also be presented.

4.2 Signals in one dimension


For image processing, we are only interested in discrete and digital signals,
as shown in Figure 4.1. Both signals are defined at discrete points in time
(here, the independent variable is time; in the case of images, it is space) and
undefined everywhere else. Digital signals can, however, take only
predefined values in amplitude, while discrete signals can take any value of
amplitude they wish.
Figure 4.1: Discrete vs. digital signal
Digital images by their name are digital, i.e., their amplitude values are
quantized (as we have seen for uint8 grayscale image, every pixel can have
values in the set {0,1,2,3…255}). Note that the quantized levels are not
necessarily integers; in general, they can be anything, but usually, integers
are taken. The point is if the maximum and minimum amplitude of the signal
is fixed, a discrete signal can take any continuous value in that range — that
is how infinite values are possible. However, digital signals can take only
finite (predefined) values in that interval. So, the topics discussed in this
section will be applicable to discrete signals in general. We have already
discussed that the image in uint8 format is to be converted to float32 or
float64 for processing, and after processing, it is converted back to
uint8 format for display and storage.
The signal shown in Figure 4.2 is called a discrete impulse signal. It is a
building block of every other discrete signal possible in 1D. Any arbitrary
signal in the discrete domain, i.e., x[n], can be constructed from a sum of scaled
and shifted unit impulse signals. Mathematically, this is represented in the
following equation:
Equation 4.1:
x[n] = Σ_{k=−∞}^{∞} x[k] δ[n − k]
The graphical representation is shown in the figure below:

Figure 4.2: Discrete impulse signal


Equation 4.1 is called sifting property (and not shifting property) in
electrical, electronics, and communication engineering.
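A tiny numerical check of the sifting property is sketched below (illustrative values only, not from the book): an arbitrary discrete signal is rebuilt as a sum of scaled and shifted unit impulses.
import numpy as np

x = np.array([4.0, 8.0, 2.0, 7.0])                 # arbitrary discrete signal
n = np.arange(len(x))

def delta(m):                                      # discrete unit impulse delta[m]
    return np.where(m == 0, 1.0, 0.0)

# Equation 4.1: x[n] = sum over k of x[k] * delta[n - k]
x_rebuilt = sum(x[k] * delta(n - k) for k in range(len(x)))
print(np.allclose(x, x_rebuilt))                   # True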

4.3 Systems in one dimension


System means anything (hardware or algorithm) that takes some data as
input and processes/manipulates it according to some rule (which is the
fingerprint or characteristic of the system) and gives output. There can be
systems of various types, but we are primarily interested in the following
features:
• Linearity
• Time invariance
Why we need these properties will become clear very soon, but first, let us
understand what they mean.
4.3.1 Linear systems
Systems for which a scaled input produces a scaled output with the same scaling
factor are called linear systems. Also, if two inputs are independently applied
to the system and their outputs are noted, and then the sum of the two inputs is
applied to the system and the output is found to be the sum of the two
independent outputs, the system is called a linear system.
Mathematically, if a system is represented by y[n]=T(x[n]), where T is the
transformation (or system), x[n] is the input to the system and y[n] the
output. Then, the following two properties, represented in the following
equations, should hold simultaneously for it to be linear:
Equation 4.2:
T(C x[n]) = C T(x[n]) = C y[n]
Equation 4.3:
T(A x1[n] + B x2[n]) = A T(x1[n]) + B T(x2[n]) = A y1[n] + B y2[n]
Where A, B, and C are scaling factors. The utility of having such system
characteristics will become clear soon.
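The two properties can be checked numerically for any candidate system. The sketch below (illustrative, not from the book) takes a system realized by convolution with a fixed impulse response, a construction formally introduced in Sections 4.3.3 and 4.4, and verifies that T(A x1 + B x2) equals A T(x1) + B T(x2).
import numpy as np

h = np.array([5.0, 6.0, 3.0, 7.0, 1.0])            # impulse response (same values as Section 4.4)
T = lambda x: np.convolve(x, h, 'same')             # the system under test

x1 = np.array([4.0, 8.0, 2.0, 7.0, 6.0, 2.0, 3.0])
x2 = np.array([1.0, 0.0, 2.0, 5.0, 1.0, 3.0, 2.0])
A, B = 2.0, -3.0

print(np.allclose(T(A * x1 + B * x2), A * T(x1) + B * T(x2)))   # True -> linear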
4.3.2 Time invariant systems
A system whose characteristics do not change with time is called a time
invariant system. That is, if you give one input to the system today and get
some output (today), and then you give the same input to it tomorrow, you
will get the exact same output that you got today. Obviously, we would like
our system to behave in this way, but there are some applications where the
other category is also useful; that category is called time variant systems.
Mathematically, if a system is represented by y[n] = T(x[n]), where T
is the transformation (or system), x[n] is the input to the system and y[n]
the output, then the following property should hold for it to
be time invariant:
Equation 4.4:
y[n − a] = T(x[n − a])
Where a is the delay in time. For negative values of a, instead of a delay we
have an advance in time.

4.3.3 Linear time invariant systems in one dimension


The systems that follow Equation 4.2, 4.3, and 4.4 are called linear time
invariant (LTI) systems. LTI systems are interesting due to the property
discussed in the next paragraph.
The discrete impulse signal shown in Figure 4.2 is a special signal. Let us assume
that its output (after it passes through our LTI system) is known. Consequently,
the output of discrete impulse is also known yesterday, tomorrow, a month
later, and so on because the LTI system, by definition, is time invariant.
Technically, this means that the output of shifted impulse (forward or
backward) signal is known if the output of the impulse is known. Now, here
is the interesting part — earlier in Section 4.2, through Equation 4.1, we said
that any arbitrary signal can be constructed from a weighted/scaled sum of
impulses. Since the system we are using is LTI, due to linearity and time
invariance, the output for the entire signal becomes known. This is the power of
an LTI system and its impulse response, and that is what makes us hugely interested in
them.
We will discuss the time variant systems and the advantages they bring in the
coming chapters. For now, the LTI system and its powerful impulse response
have led us to write the following equations:
Equation 4.5:
y[n] = Σ_{k=−∞}^{∞} x[k] h[n − k]
Where x[n] is the input signal, y[n] is the output signal, and h[n] is the impulse
response of the LTI system. Note that we do not need to look into how the
LTI system is constructed. As discussed earlier, it is the
impulse response h[n] that is its fingerprint, and hence the output for any input
can be determined from it using Equation 4.5. It is also called the convolution
sum equation. Convolution (represented by the symbol *) between two signals
x[n] and h[n] is written as x[n] * h[n], as shown below:
Equation 4.6:
x[n] * h[n] = Σ_{k=−∞}^{∞} x[k] h[n − k]
In the above equation, h[n] can, in general, be infinite in extent. It will then
be called infinite impulse response (IIR). However, if it is finite in nature,
it is called finite impulse response (FIR).

4.4 Graphical illustration of one-dimensional convolution


Figure 4.3 illustrates the process of convolution in 1D. It will be useful for
us to understand the convolution (also called filtering operation) from an
intuitive viewpoint. We have a 1D finite duration signal {4,8,2,7,6,2,3} and
the filter/mask/kernel (i.e., the impulse response of the LTI system as
discussed in an earlier section) as {5,6,3,7,1}.
Figure 4.3: Illustration of convolution in 1 dimension
Convolving the signal with the filter is equivalent to passing the signal
through an LTI system whose impulse response (or fingerprint) is the given
filter. Note that the signal has seven elements; hence, we will also expect the
output to have seven elements, though the filter has five elements. Also, note
that we are assuming that the filter will have an odd number of elements for
mathematical convenience and ease of understanding (although this is not
necessary). Since the output has seven unknown elements, we must calculate
them by the process of convolution in seven steps, as indicated in Figure 4.3.
In the process of convolution, a flipped version of the filter is used (one can
get the intuition of why a flipped filter is used by looking at Equation 4.6).
In Step 1, as shown in the figure, the center element of the flipped filter (i.e.,
an element with index 2 in the flipped filter) is graphically kept over the
element to be calculated (i.e., an element with index 0 in the signal). Due to
this, some portions of the flipped filter and signal overlap (shown in gray
shade). Note that in Step 1, the first two elements of the flipped filter do not
have any corresponding elements in the signal, and hence, they are assumed
to overlap with 0. This is called zero padding. Now, the elements in the
signal that overlaps with the flipped filter are correspondingly multiplied,
and the sum of all such multiplications so obtained becomes the value of the
output element with index 0.
After this, we proceed to Step 2, where we try to calculate the value of the
output element with index 1. In this case, the same procedure is repeated.
The only difference is that the flipped filter is shifted by one position to the
right from its previous position. Hence, till Step 7, all the values for the
output in the filter are calculated. This is the process of convolution/filtering
of 1D signal.
Convolution can be implemented by using the np.convolve command shown
in line number 9 of Code 4.1. The third argument 'same' simply makes the
output size equal to the number of elements in the first argument (in this
case, my_signal).
01- #======================================================================
02- # PURPOSE : Learning Convolution
03- #======================================================================
04- import matplotlib.pyplot as plt
05- import numpy as np
06-
07- my_signal=np.array([4,8,2,7,6,2,3]) # SIGNAL
08- filter1=np.array([5,6,3,7,1]) # CONVOLUTION FILTER
09- result_conv1=np.convolve(my_signal,filter1,'same') # CONVOLUTION RESULT
10-
11- # PLOTTING LOGIC
12- fig,ax=plt.subplots()
13- fig.show()
14- ax.plot(my_signal,'k',label='Original Signal',linewidth=2)
15- ax.plot(np.arange(0,len(my_signal),1),my_signal,'k.',markersize=10)
16- ax.plot(result_conv1,'--b',label='Convolution result')
17- ax.plot(np.arange(0,len(result_conv1),1),result_conv1,'b.',markersize=10)
18- ax.grid()
19- ax.legend()
20- ax.set_xticks(np.arange(0,len(my_signal),1))
21- ax.set_yticks(np.arange(0,result_conv1.max(),10))
22- ax.set_title('Illustration of Convolution',fontsize=15)
23- ax.set_xlabel('Time axis')
24- ax.set_ylabel('Amplitude axis')
25-
26- plt.show()
27- print("Completed Successfully ...")
Code 4.1: Code for convolution/filtering of 1D signal
The result of running the above code is shown in Figure 4.4:
Figure 4.4: Space/time domain filtering using convolution

4.5 One dimensional filter design intuition


Having understood the filtering operation, our objective in this section is to
know the filter structure (i.e., values of the elements of the filter) that will be
able to achieve certain tasks.
One thing is clear from the previous discussion — the value of any element
in the resulting array (convolved result) can be calculated as a weighted
summation of the neighboring elements (individually weighted by
corresponding filter coefficients).
Let us deepen our understanding by studying some examples.

4.5.1 Averaging filters in one dimension


Let us consider the following filter structure: (1/5)×[1,1,1,1,1] or
equivalently [1/5, 1/5, 1/5, 1/5, 1/5]. Clearly, in this case, all the coefficients
are the same, i.e., 1/5. When we convolve any signal with this filter, we
simply get the average of the neighborhood, with neighborhood size 5.
Although this example illustrates the usage of an averaging filter of size 5, in
general, an averaging filter of size n (with total number of elements n) has the
following structure:
(1/n)×[1, 1, ..., 1]
or, equivalently,
[1/n, 1/n, ..., 1/n]
The result of the application of this filter on a signal is shown in Figure 4.5:

Figure 4.5: Result of applying an averaging filter on 1D signal


Let us make some important observations about the averaging filter of
Figure 4.5. First, as the average is always between the maximum and
minimum value of the data, the convolved result never goes higher than the
maximum amplitude of the signal, nor does it go below the minimum value.
Second, one may note that the sharp edges in the original signal (for
example, at t=3, t=6, and t=21) have become smooth in the convolved
result. This is why an average filter is also called a smoothening filter.
Third, decreasing the filter size will reduce the effect of smoothening, and
increasing the filter size will increase the smoothening.
Averaging can also be achieved by weighted filter structures, where the filter
coefficients are not all equal. Such filters give high weightage to the element
being processed and its immediate neighbors, and the weightage decreases as we
move away from the element being processed. There could be many such weighting
structures, like Gaussian weighting (a symmetric bell-shaped curve), all with
similar results.
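As a small illustration (a sketch, not one of the book's listings; the triangular weights used here are just one assumed example of such a scheme), the following compares a plain size-5 averaging filter with a weighted averaging filter whose coefficients decrease away from the center:

import numpy as np

sig = np.array([4, 8, 2, 7, 6, 2, 3], dtype=float)

box_filter = np.ones(5) / 5                       # all weights equal to 1/5
tri_weights = np.array([1, 2, 3, 2, 1], dtype=float)
tri_filter = tri_weights / tri_weights.sum()      # weights sum to 1, peak at the center

print(np.convolve(sig, box_filter, 'same'))       # plain averaging
print(np.convolve(sig, tri_filter, 'same'))       # center-weighted averaging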

4.5.2 First order derivative filters


The purpose of a derivative is to find changes in data (wherever it occurs).
Digital derivatives (also called digital differences) can be found in many
ways. Some of them are listed in the following equations:
Two-point forward difference (Equation 4.7):
f'(x) = [f(x+h) - f(x)] / h
Two-point backward difference (Equation 4.8):
f'(x) = [f(x) - f(x-h)] / h
Two-point central difference (Equation 4.9):
f'(x) = [f(x+h) - f(x-h)] / (2h)
Where f'(x) is the digital derivative corresponding to a data point f(x), f(x+h)
is the data point h points ahead, and f(x-h) is the data point h points back.
Now, there are two things to note: first, for digital/discrete data, we assume
h=1 (although it might take any value, for images two consecutive pixels differ
by 1, which justifies our choice). Second, like the two-point
forward/backward/central differences, there exist n-point
forward/backward/central difference equations as well. However, for
discussion purposes, we will use only 2-point formulas with h=1. So, the
preceding equations reduce to the form represented in the following equations:
Two-point forward difference (Equation 4.10):
f'(x) = f(x+1) - f(x)
Two-point backward difference (Equation 4.11):
f'(x) = f(x) - f(x-1)
Two-point central difference (Equation 4.12):
f'(x) = [f(x+1) - f(x-1)] / 2
The corresponding filter coefficients for the forward, backward, and central
derivatives are respectively: [1,-1], [1,-1], and [1/2, 0, -1/2]. These filters are
obtained by putting the coefficients of [ ...f(x+3), f(x+2), f(x+1), f(x), f(x-1),
f(x-2), f(x-3)... ] from Equations 4.10, 4.11, and 4.12 respectively. (The reverse
order in the above general formulation will be understood when we study
correlation in the coming sections.)
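The following short sketch (an illustration under the conventions above, not one of the book's listings) computes the forward and central differences of a small made-up signal; np.diff gives the two-point forward difference directly, while the central difference is obtained by convolution with [1/2, 0, -1/2]:

import numpy as np

sig = np.array([0, 1, 2, 3, 4, 5, 5, 5, 0, 0, 0], dtype=float)

# Forward difference f'(x) = f(x+1) - f(x); output is one sample shorter
d_forward = np.diff(sig)

# Central difference f'(x) = (f(x+1) - f(x-1)) / 2 via convolution
central_filter = np.array([0.5, 0, -0.5])
d_central = np.convolve(sig, central_filter, 'same')

print(d_forward)
print(d_central)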
The result of application of all these filters on a signal is shown in Figure
4.6. Part (a) of this figure shows the original signal with different regions
marked:

Figure 4.6: Response of various 1st derivatives to a given signal


From time t=2 to 7, we have a gradual build-up (alternatively, a build-down is
also possible) of data; we call it a ramp. From t=7 to 10, we have a
constant region. From t=10 to 11, there is a step change (a sudden change,
whether from low to high or the reverse, is called a step change). Similarly,
other portions of the signal can be classified.
Now, let us talk about part (b) of Figure 4.6. It shows the response of the
first forward difference (derivative) applied to the original signal. This is
done by convolving the original signal with the filter [1,-1]. Intuitively, to
calculate the output at any point, we may just calculate the difference between
the value at the point one step ahead and the value at the current point. Notice
that at the downward-going step edge in the original signal (t=10 to 11), the
response of the filter is also negative. The reverse is true at t=14 to 15, where
the step change is positive. From t=2 to 7, a ramp exists in the original signal,
and hence the response of the forward derivative is constant (= 1), as the value
of the ramp increases by 1 at every point. Constant regions (e.g., t=7 to 10 and
t=11 to 14) have zero response in the forward derivative as there is no change of
value between consecutive samples.
The backward derivative, as shown in part (c) of Figure 4.6, also gives a similar
output but inverted. The central derivative in part (d) of Figure 4.6 gives a
response that is smoother as compared to the above two, as its filter
structure contains three coefficients: [0.5, 0, -0.5].
If nothing is specified, by default, a 2-point forward derivative will be
assumed in this book, unless otherwise stated.

4.5.3 Second order derivative filters


Like first order derivative, second order derivative can be defined in many
ways. It can be thought of as a derivative of derivative. For example,
velocity is the first derivative of displacement, and acceleration is the
second. However, acceleration is the first derivative of velocity, too. We take
the definition of second order derivative, to be represented in the following
equation:
Equation 4.13:
f'' (x)=f(x-1)-2f(x)+f(x+1)
with the filter as: [1,-2,1] and the result of the application of this filter on a
signal is shown in Figure 4.7.There are many important points to note, such
as:
Firstly, look at the point t=10 to 11 in the original signal. There is a step
change (high to low). At the corresponding location in the second order
derivative response, we see a zero crossing. The same is true for t=14 to 15,
where there is a step change (low to high). However, for the first derivative's
response, as in the previous section, there were peaks of -ve and +ve
amplitudes, respectively.
Secondly, at the start of the ramp (t=2), there is a small positive peak, and at
the stopping point of the ramp (t=7), there is a small -ve peak. This
characteristic is also different from the response of the first order derivative
in the previous section, where there was a constant value throughout the ramp;
here, we get two peaks, at the onset and the ending. We will elaborate more on
this when we apply the 2D equivalents of these filters on images in the coming
sections.
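A quick way to reproduce this behavior on a toy signal is the sketch below (an illustration only, not one of the book's listings; the signal values are made up to contain a ramp, constant regions, and step changes):

import numpy as np

sig = np.array([0, 0, 1, 2, 3, 4, 5, 5, 5, 0, 0, 0, 5, 5], dtype=float)

second_derivative_filter = np.array([1, -2, 1])   # f''(x) = f(x-1) - 2f(x) + f(x+1)
d2 = np.convolve(sig, second_derivative_filter, 'same')
print(d2)   # zero crossings at step edges, twin peaks at the onset and end of the ramp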

Figure 4.7: Response of second order derivative to a signal


Having understood the concept of filtering (convolution) in 1D, we will
generalize these concepts to 2D and explore the results on images in the
coming sections.

4.6 Concept of two-dimensional filtering of images


Images are 2D signals, and hence, a 2D version of the convolution equation is
needed. This is given below:
Equation 4.14:
O(x,y) = Σm Σn I(m,n) · h(x-m, y-n)
Where I(x,y) is the input image to the LTI system, h(x,y) is the 2D impulse
response, and O(x,y) is the output of the system. Also, note that the impulse
response h(x,y) is of finite extent in this case; the summation on the RHS is
over a finite range.
We will try to understand the above equation (i.e., the process of 2D
convolution) by referring to Figure 4.8. Part (a) of the figure shows the input
image I(x,y). Part (b) shows the final image after processing. Part (c) shows
the filter (i.e., the impulse response of the LTI system, finite in this case).
Part (d) shows a flipped filter, which is obtained from the original filter by
rotating it 180°. The flipped filter is used to produce the output in part (b)
from the input in part (a). Suppose we want to calculate the value of intensity
O(2,4) in the output image for pixel coordinate (2,4); that pixel is highlighted
in dark gray. To obtain the value of O(2,4), we place the flipped filter on the
input image such that I(2,4) is perfectly aligned with h(0,0). The overlapping
region is highlighted in part (a) of the figure. We then calculate the element-
by-element product for the corresponding overlapping elements. Finally, we
add all such products (a total of nine products in this case).
Mathematically, this is represented below:
Equation 4.15:
O(2,4) = Σm Σn I(2+m, 4+n) · h(-m, -n), where m and n each run over -1, 0, 1
The graphical illustration for the same is shown in the following figure:
Figure 4.8: Illustrating the 2-dimensional convolution process
Equation 4.15 can be compactly written as:
Equation 4.16:
O(2,4) = Σm Σn h(m,n) · I(2-m, 4-n), with m, n = -1, 0, 1
For a general pixel (x,y), Equation 4.16 takes the general form and becomes
Equation 4.14. For different filter sizes, the limits of summation will
change accordingly. It is conventional to create square filters with an odd
number of rows (and hence columns); otherwise, it is difficult to determine the
center of the filter.
Referring to Figure 4.8, the value of every pixel in the output image is
calculated in the same way. However, the calculation is done in order. We
start from the topmost left pixel, process it, and then move on to the next
pixel in that row. Once we are done with processing one row, we go to the
next row and repeat the same process until all the rows (and hence pixels)
are processed.
One issue that arises while processing concerns the pixels
located at the boundary. For example, in part (e) of Figure 4.8, when the
center of the flipped filter is overlapped with I(0,0) (shown in a dark gray
shade), some portion of the filter window hangs outside the image, and there
is no intensity value corresponding to those filter coefficients. We assume all
such intensity values to be zero. It is a good idea to append a frame of zeros
of appropriate size around the image beforehand, as shown in part (e) of
Figure 4.8 – this is called zero padding. If the filter (and hence flipped filter)
is of size 5x5, the frame of zeros that needs to be appended will be 2 pixels
thick. For an N×N filter (assuming N to be an odd number), the frame of zeros
should be (N-1)/2 pixels thick. In the next section, we will explore the
effect of applying different filters to the images.
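To make Equation 4.14 and the zero padding concrete, here is a minimal, deliberately naive sketch of 'same' 2D convolution (an illustration only; conv2d_same is a name chosen here, and the programs later in this chapter use SciPy instead):

import numpy as np

def conv2d_same(image, kernel):
    # Flip the kernel by 180 degrees, zero pad the image, and take a
    # sliding weighted sum, row by row and column by column
    kh, kw = kernel.shape                           # assumes odd x odd kernel
    flipped = kernel[::-1, ::-1]                    # 180 degree rotation
    ph, pw = (kh - 1) // 2, (kw - 1) // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * flipped)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9
print(conv2d_same(img, k))    # matches scipy.signal.convolve(img, k, 'same')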

4.7 Two-dimensional filtering of images in Python


By now, we know that filtering of images means convolution of the image
with a filter. Here, the image is signal, and filter is the impulse response of
the system. In Python, using the NumPy library (as in Code 4.1), one
dimensional convolution can be done. To do higher dimensional
convolution, we use the SciPy library (we will illustrate this soon). Before
seeing the actual code at work, let us see how many forms (shapes) the
output can be obtained for a given input image (irrespective of the filter
coefficients). Refer to Figure 4.9:
Figure 4.9: Illustrating three kinds of outputs in the convolution process
In Figure 4.9, various rectangular regions are shown. First, the original
image is shown as a white rectangle (shape 16x18). The indexes of rows and
columns are also written alongside. Then, five filter positions are shown
(filter size is 5x5). In filter position 1, the last filter coefficient (bottom right
coefficient) of the filter overlaps with the first pixel of the image. In the
second filter position, the center coefficient of the filter overlaps with the
first pixel of the image. In filter position 3, the filter is completely inside the
image and just touches its top and left boundaries. These three positions are
treated as starting points of convolution in three different types (shapes) of
outputs. Filter position 1 corresponds to full convolution, filter position 2 to
same convolution, and filter position 3 to valid convolution. Since the filters
at positions 1 and 2 partially occlude each other in the drawing, filter
positions 4 and 5 show filters 1 and 2, respectively, at the opposite corner of
the image; they are included only for clarity.
If filter position 1 is the starting position of convolution, then there is a
requirement of 4-pixel padding of zeros around the image, and the shape of the
output image will be the original image shape plus a 2-pixel thick boundary on
every side. This is also shown in the figure. The reason for this output shape
is that the central coefficient of the filter is used to generate the result,
and that coefficient is 2 positions away from the boundary of the filter itself.
The padded values are, by default, assumed to be zero. The output so generated
is called full convolution. The shape of the full convolved result in this case
is (16+2+2)×(18+2+2) = 20x22.
If filter position 2 is used as the starting position of convolution, then the
output will have a shape equal to that of the original image (shape 16x18).
However, padding of 2-pixel thick boundary (initialized with zeros) will be
needed. This is the most used mode of convolution where the output and
input size are the same.
If filter position 3 is used as the initial position for starting the convolution,
then the shape of the output image will be less than the original image by a
2-pixel thick boundary. This mode is the only mode where no padding is
required. All the calculated values of convolution are calculated purely from
the image signal itself. This is called valid output because, in the other two
cases, the values at the boundary are calculated by assuming zero padding,
which is the data we attach to the signal forcibly. It is not a part of the signal
to begin with. The shape of the valid convolved result in this case is 12x14.
Having understood the above, let us see the code for convolution in the
above three ways in Code 4.2 with the result shown in Figure 4.10, and the
content of the shell output after that. Note line numbers 21, 22, and 23 for the
usage of convolution in the three ways, and convince yourself of the output
shapes given in the shell output after Figure 4.10. Also, note that in line
numbers 15 and 16, an averaging filter is created. This averaging filter, as
discussed in the previous section, will smoothen the image. This is clear from
Figure 4.10 in all the forms of output (same, valid, and full).
Note that in part (b) and (d) of Figure 4.10, the effect of zero padding is
clearly visible as the dark boundary. It is thicker in part (d), where full
convolution is taken as it has a thick zero padding boundary. The boundary
in the output is dark because of zero padding. The calculated averages for
individual pixels in output during convolution tend to be on the darker side.
However, in part (c) of the figure, this dark boundary is absent because no
padding is required in VALID convolution. Due to this, the shapes of all
three outputs are different, as indicated in the shell output after Figure 4.10.
01- #==============================================================================
02- # PURPOSE : Illustration of 2D convolution
03- #==============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import scipy.signal as sci
08- import my_package.my_functions as mf # This is a user defined package and ...
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- input_image=np.float32(cv2.imread('img9.bmp',0))
12- #------------------------------------------------------------------------------
13- # 2D FILTER DESIGN
14- #------------------------------------------------------------------------------
15- n=5;
16- my_filter=np.ones([n,n])/(n**2) # Averaging filter (shape 5x5)
17-
18- #------------------------------------------------------------------------------
19- # 2D CONVOLUTION in 3 DIFFERENT WAYS
20- #------------------------------------------------------------------------------
21- conv_result_same=sci.convolve(input_image,my_filter,'same')
22- conv_result_valid=sci.convolve(input_image,my_filter,'valid')
23- conv_result_full=sci.convolve(input_image,my_filter,'full')
24-
25- #------------------------------------------------------------------------------
26- # PLOTTING
27- #------------------------------------------------------------------------------
28- fig1,ax1=plt.subplots(2,2)
29- fig1.show()
30- mf.my_imshow(mf.norm_uint8(input_image),"(a) Input Image",ax1[0,0])
31- mf.my_imshow(mf.norm_uint8(conv_result_same),"(b) Convolution (SAME)",ax1[0,1])
32- mf.my_imshow(mf.norm_uint8(conv_result_valid),"(c) Convolution (VALID)",ax1[1,0])
33- mf.my_imshow(mf.norm_uint8(conv_result_full),"(d) Convolution (FULL)",ax1[1,1])
34-
35- #------------------------------------------------------------------------------
36- # PRINTING SHAPE OF INPUT IMAGE, FILTER & VARIOUS OUTPUTS
37- #------------------------------------------------------------------------------
38- print("Shape of input image ... ",np.shape(input_image))
39- print("Shape of convolution filter ... ",np.shape(my_filter))
40- print("Shape of SAME CONVOLUTION ...
",np.shape(conv_result_same))
41- print("Shape of VALID CONVOLUTION ...
",np.shape(conv_result_valid))
42- print("Shape of FULL CONVOLUTION ...
",np.shape(conv_result_full))
43-
44- plt.show()
45- print("Completed Successfully ...")
Code 4.2: 2D convolution on images
The output of the above code is shown in the following figure:

Figure 4.10: Illustrating three types (shape) of output in the convolution process
The output of Code 4.2 is given as follows:
Shape of input image ... (66, 81)
Shape of convolution filter ... (5, 5)
Shape of SAME CONVOLUTION ... (66, 81)
Shape of VALID CONVOLUTION ... (62, 77)
Shape of FULL CONVOLUTION ... (70, 85)
Completed Successfully ...
Keep in mind that these shapes will change according to the filter size used.
The size of the padding is determined by half of the filter size. Usually,
filters are of odd no. × odd no. shape, so by half filter size, we mean (odd-
1)/2, as shown in the following equation:
Equation 4.17:
half filter size = (filter size-1)/2
For full convolution, the output shape will be more than the original image
by 2 x half filter size. For valid convolution, the output shape will be less
than the original image by 2 x half filter size. Also remember that in Python,
we need to specify the 'same', 'valid', or 'full' option, and zero padding will
be taken care of internally. We do not have to manually zero pad the image.
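The three output shapes can also be worked out without running any convolution; the small helper below (a sketch with an assumed function name, not part of the book's listings) reproduces the shapes printed in the shell output above:

def conv_output_shape(image_shape, filter_shape, mode):
    # Expected output shape for 'full', 'same' and 'valid' convolution,
    # assuming the filter is no larger than the image
    ih, iw = image_shape
    fh, fw = filter_shape
    if mode == 'full':
        return (ih + fh - 1, iw + fw - 1)
    if mode == 'valid':
        return (ih - fh + 1, iw - fw + 1)
    return (ih, iw)   # 'same'

for m in ('same', 'valid', 'full'):
    print(m, conv_output_shape((66, 81), (5, 5), m))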
From this point onward, unless otherwise stated, we will use the SAME
convolution so that the shape of the output image matches the shape of the
input image.

4.8 Smoothening filters


As discussed in the previous section, it is clear that smoothening filters will
try to smoothen out edges in images. Edges are the places in the image
where there is a change in intensity (possibly a step change). Other places in
the image could also have changes in intensity. Wherever the change is high,
we call it high frequency region. However, if we work in flat regions of
intensity like sky, etc., the changes in intensity values are nil to low. Such
regions are called low frequency regions. (We will understand this when we
study the frequency domain processing in Chapter 5). In this section, we will
try to study various kinds of averaging filters and the effect that they have on
the output.

4.8.1 Averaging filters


Let us refer to Figure 4.11 to understand what box filters mean. In part (a) of
the figure, a box filter with constant coefficients (all equal to 1/25) is shown.
The filter size is 5×5 (total 25 elements). Hence, every element is 1/25.
Here, note that it is compulsory for every coefficient in a box filter to have
the same value. Similarly, in part (b) of the figure, a box filter with size
10×10 is shown. In general, designing a box filter of size n×n is now trivial.
Figure 4.11: Examples of box filters (a) Box filter of size 5×5 (b) Box filter of size 10×10
Now, you can see the result of the application of such box filters in Figure
4.12. Part (a) has the original input image. Part (b), (c), and (d) have box
filtered images with filter sizes 5x5, 10x10, and 20x20, respectively.
Two things are worth noting here. First, if we increase the size of the box
filter, the smoothening increases. Second, because of the increase in size, more
boundary values get averaged with padded zeros; hence, the thickness of the dark
boundary increases as well.
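A sketch of how such results can be produced (assuming input_image has already been loaded as a float32 grayscale array, as in Code 4.2) is given below; only the filter size changes between the three outputs:

import numpy as np
import scipy.signal as sci

for n in (5, 10, 20):
    box = np.ones((n, n)) / (n ** 2)              # every coefficient is 1/n^2
    smoothed = sci.convolve(input_image, box, 'same')
    print(n, smoothed.shape)                      # same shape, increasing blur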
In general, every coefficient of a box filter of shape n×n must be 1/n². This
is because if we sum up all the elements of that box filter, we will get 1. This
implies that the filter energy is 1. This further implies that the filter is not
incorporating any additional energy (positive or negative) into the image; the
values after processing will still remain in the range [0 to 255]. If, however,
we change that, we get results out of this range. However, since we use the
mf.norm_uint8 function (as in line numbers 30 to 33 of Code 4.2), the
values will be brought back to the range [0 to 255] for display purposes.
Figure 4.12: Application of different box filters on image
Refer to the following figure to see the problem of saturated grayscale
values:

Figure 4.13: Problem of saturated grayscale values


If we do not use that function, the values will saturate. Some regions will then
have saturated color with pixel values beyond the range of [0 to 255]; this is
what we mean when we say that energy has been imparted from the filter to the
image (which is undesirable). You can check this by not using that function and
by changing the coefficients of the box filter to something other than 1/n² for
a filter of shape n×n. An example of a 5x5 box filter with every coefficient as
1/10 is shown in Figure 4.13. Here, instead of using mf.norm_uint8, np.uint8 is
used while displaying, and we know how it behaves from Code 3.1 and the related
discussion.
Remember that every box filter is an averaging filter, but not every averaging
filter is a box filter. This will become clear in the coming sections.

4.8.2 Circular filters


Before discussing the importance and usage of circular filters, let us see how
they look. We will also see equivalent box filters to compare the box and
circular filters fairly. Refer to Figure 4.14. Part (a) of this figure shows a
circular filter. By circular, we mean that the region where ones appear in the
filter is approximately circular. The row and column indices are also
written alongside for better readability. Note that the circular filter is still
stored as a rectangular (square) array; however, the region where ones exist is
approximately circular. During the convolution process, the zeros will have no
effect. Also, note that both filters of part (a) and (b) of Figure 4.14
are not normalized. This is done for illustrative purposes only. However, the
actual filters will have every element divided by the sum of all the elements
in the corresponding filters.

Figure 4.14: Circular vs. box filters of equivalent size


In part (b) of Figure 4.14, a box filter is shown. This filter has the same
number of ones (approximately) as in the circular filter. We call both the
filters equivalent as they take an average over regions of (approximately) equal
area, as shaded (remember that the zeros in the circular filter have no effect
on the convolution). The code for generating the above two filters is shown in
Code 4.3. This code illustrates the process of such creation and saves the so-
created filters in an Excel file in two different sheets. Figure 4.14 is
generated by using this code. One may try different filter shapes. Notice
that for larger filter shapes, the region covered by ones in the circular filter
will approximate a circle more closely. An example of the Excel sheet
(sufficiently zoomed out) with m=111 as the shape of the circular filter is
shown in Figure 4.15, where one can see the rectangular (square) array that
holds the circular filter. We will use this filter and its equivalent box filter
in our next program to understand the preferred usage of circular filters over
box filters.
01- #======================================================================
02- # PURPOSE : Creating and visualizing Box & Circular filters
03- #======================================================================
04- import pandas as pd
05- import xlsxwriter
06- import numpy as np
07- import scipy.signal as sci
08- #------------------------------------------------------------------------------
09- # BOX & CIRCULAR FILTER DESIGN
10- #------------------------------------------------------------------------------
11- m=111 # shape of circular filter is (mxm) (keep this odd x odd)
12- circular_filter=np.zeros((m,m)) # circular filter initialised with zeros
13-
14- # Double loop to create circular filter
15- for i in np.arange(0,m,1):
16- for j in np.arange(0,m,1):
17- # this next if condition helps in creating the circular filter
18- # it does so by making all the coef's = 1
19- # within a predefined distance from the central
20- # point of filter
21- if np.sqrt((i-(m-1)/2)**2+(j-(m-1)/2)**2)<(m-1)/2:
22- circular_filter[i,j]=1
23- # upto this point the circular filter is UN-normalised
24- s1=pd.DataFrame(circular_filter) # Converting array to dataframe
25- #for writing into excel file
26-
27- # in the next line we calculate the shape of square filter
28- # which has same no. of ones as present in circular filter
29- # where 'n' is one side of square
30- n=np.int16(np.sqrt(np.sum(circular_filter)))
31- box_filter=np.ones([n,n]) # BOX FILTER of shape nxn (UN-normalised)
32- s2=pd.DataFrame(box_filter)# Converting array to dataframe
33- #for writing into excel file
34-
35- # Normalising the circular filter & box_filter
36- circular_filter=circular_filter/np.sum(circular_filter)
37- box_filter=box_filter/(n**2)
38-
39- #------------------------------------------------------------------------------
40- # Saving to excel sheet
41- #------------------------------------------------------------------------------
42- # write sheets (dataframes) to excel file
43- data2excel=pd.ExcelWriter('Box and Circular Filter.xlsx',engine='xlsxwriter')
44- s1.to_excel(data2excel,sheet_name='Circular_filter')
45- s2.to_excel(data2excel,sheet_name='Box_filter')
46- data2excel.save()
47- print("\nCompleted Successfully ...")
Code 4.3: Creating circular and equivalent box filter and saving data in excel sheet
Now, having understood the construction of circular and box filters and the
equivalence between them, let us move on to understand the usage of circular
filters. Figure 4.16 shows a test pattern over which we will run the circular
filter of Figure 4.15:

Figure 4.15: Sufficiently zoomed out excel sheet for visualizing Circular filter with size m=111
Above is a snapshot of the excel sheet. One may see this in the code folder:

Figure 4.16: Test image (shape – 312x416)


The results of applying the two filters are shown below:

Figure 4.17: Result of application of (a) box filter (b) equivalent circular filter on test image of Figure
4.16.
Part (a) and (b) of Figure 4.17 show the result after convolution with the
equivalent box and circular filters, respectively. Notice that in part (a),
which is due to the box filter, apart from smoothening, there is a distortion of
the circular shape: the final dark region appears to be square, and the effect
of smoothening is not uniform (it is different along the horizontal and vertical
directions versus other directions). However, in part (b), the distortion is
absent, which is what we desire. This happens because a box filter is biased in
the horizontal and vertical directions due to its very shape, whereas the
circular filter is isotropic (i.e., it behaves the same irrespective of the
direction). This is the reason why circular filters are preferred over box
filters and any other shape, for that matter.
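A possible sketch of that comparison (assuming circular_filter and box_filter have been created and normalized as in Code 4.3; 'test_pattern.bmp' is only a placeholder file name for the image of Figure 4.16) is given below:

import cv2
import numpy as np
import scipy.signal as sci

test_image = np.float32(cv2.imread('test_pattern.bmp', 0))

box_result = sci.convolve(test_image, box_filter, 'same')        # direction-dependent blur
circ_result = sci.convolve(test_image, circular_filter, 'same')  # isotropic blur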
In the next section, we will discover that smoothening can be further
improved if the filter coefficients are wisely chosen.

4.8.3 Weighted filters


To understand weighted filters, refer to Figure 4.18 (which is the test image)
and Figure 4.19 (which shows weighted filters and the effect of running
them on the test image of Figure 4.18):
Figure 4.18: An alphabet test image for the filters of Figure 4.19 (a) and (b)
The figure shown above is the test image that we are going to use, and the
figure shown below shows the filters applied on it:

Figure 4.19: Illustration of weighted vs. box filter’s output


Part (a) and (b) of Figure 4.19 show equivalent weighted and box filters in
unnormalized form. The unnormalized weights are mentioned in part (a) and (b),
and the pixels in the filter are shaded in grayscale accordingly: the brightest
pixel is white, and the grayscale intensity then decreases according to the
weight value. Also, note that the two filters have different shapes (weighted:
5x5, box: 7x7). Every element of these filters is to be divided by the sum of
all the elements of the corresponding filter before performing convolution. The
equivalence between the two filters is in the sense that, in unnormalized form,
the sum of all elements of both filters is the same (if not exactly, then
approximately; exactly is the ideal situation). The reason for calling the
filter in part (a) weighted is that now every element is different. Further, the
weighting scheme is monotonically decreasing, which means that the center
element (the element under processing) is given the highest weight, and then the
weights keep reducing in a circularly symmetric pattern in 2D. This way, while
convolution is done, more weightage is given to the nearby elements (pixels) of
the image.
The results of applying these two filters can be seen in part (c) and part (d)
of Figure 4.19. It can be clearly noticed that the blurring due to the box
filter is much harsher, the obvious reason being that it weighs every pixel in
the neighborhood equally. Secondly, one may also notice the boundary pixels. We
know that they will be dark because of zero padding; however, the result due to
the weighted filter is less affected by it.
Note that while defining equivalence, we have considered equating the
unnormalized sum of all elements in both filters. This definition is not
standardized and depends on the application. Someone might take the shape of the
filter as the normalizing factor. In our case, the shape of the weighted filter
is 5x5, but the box filter is 7x7; hence, we have not normalized them with
respect to shape. Also, note that the weighting scheme suggested here is not
unique. There can be many such weighting schemes; all that is required is a
weighting function that is circularly symmetric and monotonically decreasing. We
recommend trying different weighting schemes and noting the difference in
results.
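One possible way of building such a circularly symmetric, monotonically decreasing weighting (this particular scheme is only an assumed example, not the exact filter of Figure 4.19) is sketched below:

import numpy as np

def radial_weighted_filter(m):
    # Weight = (maximum distance) - (distance from the center), normalized to sum to 1
    c = (m - 1) / 2
    i, j = np.meshgrid(np.arange(m), np.arange(m), indexing='ij')
    dist = np.sqrt((i - c) ** 2 + (j - c) ** 2)
    weights = dist.max() - dist
    return weights / weights.sum()

print(radial_weighted_filter(5))   # highest weight at the center, decreasing outwards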

4.8.4 Gaussian filter


Gaussian filters are also weighted (monotonic) filters but they have special
importance in signal processing in general. Before discussing their
importance, let us see its graph for a 1D case in Figure 4.20. Mathematically,
it can be written in the form of an equation, as shown below:
Equation 4.18:
f(x) = a · exp( -(x-b)² / (2c²) )
where a is the amplitude of the Gaussian function (highest value of the
function), b is the location of the center of the curve (the Gaussian curve is a
bell-shaped curve, as shown in Figure 4.20), and c indicates the thickness
(spread) of the curve. For two different sets of these values, two different
Gaussian curves are shown in the figure. A more commonly used expression for the
Gaussian function (as used in statistics) is shown below:
Equation 4.19:
f(x) = (1 / (σ√(2π))) · exp( -(x-µ)² / (2σ²) )
It is easy to see that a = 1/(σ√(2π)), b = µ, and c = σ. The term µ is called
the mean, and the term σ² is the variance here (as called in statistics).

Figure 4.20: Illustration of Gaussian function


The expression for 2D Gaussian function is represented in the following
equation:
Equation 4.20:
f(x,y) = A · exp( -[ (x-x0)²/(2σx²) + (y-y0)²/(2σy²) ] )
where A is the amplitude, x0 and y0 are the means in the respective directions,
and σx and σy are the standard deviations in the respective directions. A 2D
plot of the Gaussian curve for values A=1, x0 = y0 = 0, and σx = σy = 5 is shown
in Figure 4.21. The
plot shown is normalized with respect to intensity values by norm_uint8
function. Notice that the intensity is maximum at the center and reduces as
we go away from the center and diminishes to 0 gracefully. To generate a
Gaussian filter, as shown in Figure 4.21, one may use Code 4.4. The code is
a straightforward implementation of Equation 4.20. The new function
introduced there is meshgrid function, which creates a mesh of 2D indices,
as explained in a short example in code comments.
01- #======================================================================
02- # PURPOSE : Creating a 2D Gaussian filter/function
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import my_package.my_functions as mf # This is a user defined package and ...
06- # one may find the details related to its contents and usage in section 2.7.3
07-
08- x=np.arange(-10,11,.01) # Array of x-coordinates
09- # e.g. x=[1,2,3,4]
10-
11- y=np.arange(-10,11,.01) # Array of y-coordinates
12- # e.g. y=[1,2,3]
13-
14- xx,yy=np.meshgrid(x,y) # 2 arrays of x & y coordinates in 2D
15- # e.g. xx=[1,2,3,4
16- # 1,2,3,4 ==> Array of ALL x-coordinates
17- # 1,2,3,4]
18-
19- # e.g. yy=[1,1,1,1
20- # 2,2,2,2 ==> Array of corresponding y-coordinates
21- # 3,3,3,3]
22-
23- A=1 # Amplitude of 2D Gaussian Function
24- sigma_x=5 # Standard deviation along x-direction
25- sigma_y=5 # Standard deviation along y-direction
26- x0=0 # Mean along x-direction
27- y0=0 # Mean along y-direction
28-
29- # Creating the Gaussian function as per 2D equation
30- Gauss_function=A*np.exp(-(((xx-x0)**2)/(2*(sigma_x**2))+((yy-y0)**2)/(2*(sigma_y**2))))
31- mf.my_imshow(mf.norm_uint8(Gauss_function),"Typical 2D Gaussian Function")
32-
33- plt.show()
34- print("Completed Successfully ...")
Code 4.4: Creating 2D Gaussian function/filter (Result of execution in Figure 4.21)
The output of the above code is shown in next figure:
Figure 4.21: 2D Gaussian Function (illustration of general shape) generated by Code 4.4
It is not necessary that the Gaussian filter should have the same spread along
the x and y directions. As shown in Figure 4.22, it can have different spreads
in the two directions. Hence, it is easy to see that σx and σy control the
spread of the Gaussian function in the respective directions.
Another important point to note about Gaussian kernel (filter) is that it is
separable. By separability, we mean that a filter in 2D can be written as a
product of two simple 1D filters. In Figure 4.22, the two 1D filters are also
drawn along the x and y axis with dashed white lines:
Figure 4.22: Illustration of different variance in both directions in a Gaussian filter and separability
of Gaussian kernel
The separability of the 2D Gaussian kernel can also be noted by rewriting
Equation 4.20, as shown below:
Equation 4.21:
f(x,y) = A · exp( -(x-x0)²/(2σx²) ) · exp( -(y-y0)²/(2σy²) )
From Equation 4.21, it can be clearly seen that there are two exponential
functions for the x and y directions, respectively (they are drawn in Figure
4.22). The advantage of this separability is that 2D convolution can be
replaced by two 1D convolutions, which reduces the computational
complexity of convolution to a great extent. Although the topic of
computational complexity of algorithms is out of the scope of this book, this
fact is worth noting about the Gaussian kernel. Not all kernels can be
separated in this way.
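The separability claim can be checked numerically with the short sketch below (an illustration only, using a random test array and an assumed sigma of 5); the two 1D passes give the same result as the single 2D convolution:

import numpy as np
import scipy.signal as sci

rng = np.random.default_rng(0)
img = rng.random((64, 64))

x = np.arange(-10, 11)
g1d = np.exp(-(x ** 2) / (2 * 5 ** 2))
g1d = g1d / g1d.sum()                      # normalized 1D Gaussian, sigma = 5
g2d = np.outer(g1d, g1d)                   # separable 2D kernel as an outer product

full_2d = sci.convolve(img, g2d, 'same')
two_1d = sci.convolve(sci.convolve(img, g1d[None, :], 'same'), g1d[:, None], 'same')
print(np.allclose(full_2d, two_1d))        # True (up to floating point error)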
In general, Gaussian functions are important because a lot of natural
phenomena follow Gaussian distribution. For example, if we plot the
histogram of marks obtained by students in a class, it will probably be
Gaussian distributed. The histogram of the height of all the people in your
area of living (colony/zone) is Gaussian distributed. Noise in a lot of systems
also follows Gaussian distribution.
Code 4.5 implements the inbuilt Gaussian filter for smoothening. The
gaussian_filter function comes from the scipy.ndimage library. It takes the
input image as the first argument and the standard deviation (sigma) of the
Gaussian function as the second (see line number 15 in Code 4.5). For modifying
other default arguments, one can always use the help function on the Python
shell.
01- #======================================================================
02- # PURPOSE : Learning use of Inbuilt Gaussian filter
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import scipy.ndimage as sci
06- import my_package.my_functions as mf # This is a user defined package and ...
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- input_image=cv2.imread('img1.bmp',0)
10- fig,ax=plt.subplots(1,2)
11- fig.show()
12- mf.my_imshow(input_image,'Input Grayscale Image',ax[0])
13- input_image=np.float32(input_image)
14-
15- filtered_image=sci.gaussian_filter(input_image,5)
16- mf.my_imshow(mf.norm_uint8(filtered_image),"Filtered image (INBUILT GAUSSIAN FILTER)",ax[1])
17-
18- plt.show()
19- print("Completed Successfully ...")
Code 4.5: Code for using inbuilt Gaussian filter
The output of Code 4.5 will be similar to the results shown in Figure 4.12,
Figure 4.17, and Figure 4.19. That is why it is not shown separately.

4.9 Sharpening filters


It is intuitive to think of sharpening as the opposite of smoothening, and in a
sense, that is true. To understand sharpening, let us look at what happens if
we subtract a blurred version of an image from the original image in Figure
4.23:

Figure 4.23: Illustration of difference between original image and its blurred version
In part (a) and (b) of Figure 4.23, original grayscale image and its blurred
version (generated by Gaussian smoothening with σ=5) are shown
respectively. Part (c) shows the difference between the two. Note that most
portions in part (c) correspond to edges in the original image. To understand
this further, in part (d), we have only plotted those pixels from part (c) that
have intensities greater than 10% of the maximum value in (c). It can be
clearly understood that the difference between the original image and the
blurred version contains only high-frequency components (i.e., edge
information).
4.9.1 Unsharp masking and high boost filtering
If the information of the image in part (c) of Figure 4.23 is added back to the
original image, we will get a sharpened image, as shown in part (b) of
Figure 4.24. This process is called unsharp masking.

Figure 4.24: Illustration of image sharpening


In place of adding the difference image to the original, if we multiply the
difference image by k and then add it to the original, this process is called
high boost filtering. Part (a) of Figure 4.24 is the original image. Part (b) is
high boost filtering with k=1 (which is also called unsharp masking). Part
(c) and (d) are also results of high boost filtering with k= 3 & 5 respectively.
In general, k can have any real value greater than 0. For k>1, the strength of
sharpening increases. This can be noted in part (c) and (d) of Figure 4.24.
The code to generate Figure 4.23 and Figure 4.24 is given in Code 4.6.
Mathematically speaking, if f(x,y) is the original grayscale image and g(x,y)
is the blurred version of f(x,y), then high boost filtered image h(x,y) for any
k>0 can be written as shown below:
Equation 4.22:
h(x,y)=f(x,y)+k[f(x,y)-g(x,y)]
where for k=1, the process is called unsharp masking. For any other k, it is
called high boost filtering.
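Equation 4.22 can be wrapped into a compact helper, as in the sketch below (an illustration only; the function name, the use of scipy.ndimage.gaussian_filter for the blur, and the final clipping to [0, 255] for display are choices made here, not taken from Code 4.6):

import numpy as np
import scipy.ndimage as sci

def high_boost(f, k=1.0, sigma=5):
    # h = f + k (f - g), where g is a Gaussian-blurred copy of f
    # k = 1 gives unsharp masking; k > 1 gives high boost filtering
    f = np.float32(f)
    g = sci.gaussian_filter(f, sigma)
    h = f + k * (f - g)
    return np.clip(h, 0, 255).astype(np.uint8)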
01- #======================================================================
02- # PURPOSE : Illustration of Unsharp Masking & High boost filtering
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import scipy.ndimage as sci
06- import my_package.my_functions as mf # This is a user defined package and ...
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- # ---------------------------------------------------------------------
10- # IMPORTING IMAGE & DISPLAYING
11- # ---------------------------------------------------------------------
12- input_image=cv2.imread('img3.bmp',0)
13- fig,ax=plt.subplots(2,2)
14- fig.show()
15- mf.my_imshow(input_image,'(a) Grayscale Image',ax[0,0])
16- input_image=np.float32(input_image)
17-
18- # ---------------------------------------------------------------------
19- # GAUSSIAN FILTERING
20- # ---------------------------------------------------------------------
21- filtered_image=sci.gaussian_filter(input_image,5)
22- mf.my_imshow(mf.norm_uint8(filtered_image),"(b) Gaussian Smoothened Image (sigma = 5)",ax[0,1])
23- difference_image=input_image-filtered_image
24-
25- # ---------------------------------------------------------------------
26- # DIFFERENCE IMAGE
27- # ---------------------------------------------------------------------
28- mf.my_imshow(mf.norm_uint8(difference_image),"(c) Difference Image",ax[1,0])
29- th_difference=255*(difference_image>.1*np.max(difference_image))
30- mf.my_imshow(np.uint8(th_difference),"(d) Thresholded difference (For better visualization)",ax[1,1])
31-
32- # ---------------------------------------------------------------------
33- # UNSHARP MASKING & HIGH BOOST FILTERING
34- # ---------------------------------------------------------------------
35- fig2,ax2=plt.subplots(2,2)
36- mf.my_imshow(np.uint8(input_image),"(a) Original",ax2[0,0])
37- mf.my_imshow(np.uint8(input_image+difference_image),"(b) Unsharp Masking (k=1)",ax2[0,1])
38- mf.my_imshow(np.uint8(input_image+3*difference_image),"(c) High boost filtering (k=3)",ax2[1,0])
39- mf.my_imshow(np.uint8(input_image+5*difference_image),"(d) High boost filtering (k=5)",ax2[1,1])
40- fig.show()
41-
42- plt.show()
43- print("Completed Successfully ...")
Code 4.6: Unsharp Masking and High boost Filtering

4.9.2 First order derivative-based image sharpening


In this section and the next, we will discuss derivative-based image
sharpening. This section will focus on the first order derivative-based image
sharpening. The case for one-dimensional derivatives was discussed in
Section 4.5.2. The conclusions drawn there will be applicable in 2D as well.
However, we would like to see that for images in this section.
4.9.2.1 Prewitt Kernel
To begin with, Equation 4.12 for two-point central difference is written
again, as shown below:
Equation 4.23:
f'(x) = [f(x+1) - f(x-1)] / 2
We already know that this is used to calculate the derivative of a 1D signal in
the x-direction. The filter/kernel for this is [1/2, 0, -1/2] or (1/2)×[1,0,-1],
or simply [1,0,-1], as we are not interested in the weighting factor 1/2. If we
try to create the 2D version of this filter of size 3×3 for the x-direction
alone, it will be Px = [[1,0,-1],[1,0,-1],[1,0,-1]], and for the y-direction
alone, it will be Py = [[1,1,1],[0,0,0],[-1,-1,-1]]. These two filter kernels
are called Prewitt operators in the x and y directions, respectively.
An image is a 2D entity that may have changes in any direction (other than x
or y). So, the above two operators may be combined by using the gradient
operation to yield changes in any arbitrary direction (by using the resultant).
The gradient G(x,y) at a pixel (x,y) is defined as follows: if we convolve our
original grayscale image with Px and Py respectively, we get the gradient image
Gx(x,y) for the x-direction and Gy(x,y) for the y-direction. Hence, the overall
gradient is the vector G(x,y) = [Gx(x,y), Gy(x,y)]. From this, we can calculate
the magnitude of the gradient M(x,y) and the direction of the gradient D(x,y),
as shown below, respectively:
Equation 4.24:
M(x,y) = [Gx(x,y)² + Gy(x,y)²]^(1/2)
Equation 4.25:
D(x,y) = tan⁻¹( Gy(x,y) / Gx(x,y) )
Note that the magnitude of the gradient tells you how strong the edge is at
that point (the higher the value, the stronger the edge). The direction of the
gradient tells us about the direction in which this edge is (in radians or
degrees). The direction vector is perpendicular to the direction of the edge at
that pixel. The result of the application of the Prewitt kernel is shown in
Figure 4.25. In part (a) of the figure, a test image is shown, which we will
use to understand the behavior of the Prewitt operator. Part (b) shows the
result of the application of the horizontal Prewitt kernel Px onto the image.
Now, we need to note a few things. Firstly, if we look at the edges of the
rectangle, only the vertical edges show up in the response in part (b). The
reason is that, due to the placement of 0's, 1's, and -1's in the horizontal
kernel Px, it only responds to changes in the horizontal direction of the
original image. However, if one moves along a horizontal edge in the original
image, there is no change in its intensity; hence, the gradient along that
direction is 0.
Secondly, the vertical edges found in part (b) are different in color. The first
one is bright; however, the second one is darker. This can also be understood
from the placement of 0’s, 1’s, and -1’s in the horizontal filter Px. Note that
while convolution is performed, filter Px will be flipped, and we obtain
flipped(Px) = [[-1,0,1],[-1,0,1],[-1,0,1]]. Try to visualize the process of
convolution. For a
given pixel in the original image, we will put the center of this filter on that
pixel, find element by element product of overlapping elements, and then
add all those products. For a given Px, the response will be highest when the
pixel under processing has a black region (with all zeros) to its left and a
white region (with all 255’s) to its right. This is why the first vertical edge in
part (b) of Figure 4.25 is bright while the second vertical edge to the right
side is dark, as the response of the kernel on those points will be the least.
Try to correlate this with Figure 4.6, where we have studied the same thing
for a 1D derivative — a change from a low to a high value in signal results
in a positive peak in derivative, and for a high to low change in signal level,
a negative peak was observed in the derivative.
Similarly, other edges can be interpreted. Also, note that for the hypotenuse
of the triangle, the response is non-zero in both part (b) and part (c) of the
figure because it has changes in both the horizontal and vertical directions.
The same arguments apply to part (c) of Figure 4.25 but for kernel Py. In
part (d) of the image, we find the overall gradient, as illustrated earlier in
this section. Also, we are only interested in the magnitude of the response.
One can clearly note that in part (d), all the edges are correctly marked.
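In code, the gradient magnitude and direction of Equations 4.24 and 4.25 can be obtained as in the minimal sketch below (assuming input_image is a float32 grayscale array, as in Code 4.7; np.arctan2 is used here because it handles Gx = 0 gracefully, unlike a direct division):

import numpy as np
import scipy.signal as sci

Px = np.float32([[1, 0, -1], [1, 0, -1], [1, 0, -1]])   # Prewitt, x-direction
Py = np.float32([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])   # Prewitt, y-direction

Gx = sci.convolve(input_image, Px, 'same')
Gy = sci.convolve(input_image, Py, 'same')
M = np.sqrt(Gx ** 2 + Gy ** 2)     # gradient magnitude, Equation 4.24
D = np.arctan2(Gy, Gx)             # gradient direction in radians, Equation 4.25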
Figure 4.25: Finding Image Gradient (magnitude only) by using Prewitt filter
The code for generating the response of the Prewitt filter (and also the Sobel
and Roberts filters, which are discussed in the next sections) is given in Code
4.7. By default, it is set to work for the Prewitt filter. To run it for the
other filters, changes in line numbers 27, 28, and 48 are to be made.
01- #==============================================================================
02- # PURPOSE : Illustration of Prewitt, Sobel & Roberts kernel on Images
03- #==============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import scipy.signal as sci
08- import my_package.my_functions as mf # This is a user defined package and ...
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- input_image=np.float32(cv2.imread('img13.bmp',0))
12- #------------------------------------------------------------------------------
13- # CREATING PREWITT, SOBEL or ROBERTS KERNELS
14- #------------------------------------------------------------------------------
15- my_filter1_x_Prewitt=np.float32([[1,0,-1],[1,0,-1],[1,0,-1]]) # Prewitt x-direction
16- my_filter1_y_Prewitt=np.float32([[1,1,1],[0,0,0],[-1,-1,-1]]) # Prewitt y-direction
17-
18- my_filter1_x_Sobel=np.float32([[1,0,-1],[2,0,-2],[1,0,-1]]) # Sobel x-direction
19- my_filter1_y_Sobel=np.float32([[1,2,1],[0,0,0],[-1,-2,-1]]) # Sobel y-direction
20-
21- my_filter1_x_Roberts=np.float32([[0,-1],[1,0]]) # Roberts x-direction
22- my_filter1_y_Roberts=np.float32([[-1,0],[0,1]]) # Roberts y-direction
23-
24- #------------------------------------------------------------------------------
25- # SELECTING PREWITT or SOBEL or ROBERTS KERNELS HERE
26- #------------------------------------------------------------------------------
27- filter1_x=my_filter1_x_Roberts
28- filter1_y=my_filter1_y_Roberts
29-
30- #------------------------------------------------------------------------------
31- # 2D CONVOLUTION
32- #------------------------------------------------------------------------------
33- G_mag_x=sci.convolve(input_image,filter1_x,'same') # x-derivative
34- G_mag_y=sci.convolve(input_image,filter1_y,'same') # y-derivative
35- G_mag=(G_mag_x**2+G_mag_y**2)**(1/2) # Gradient Magnitude
36-
37- #------------------------------------------------------------------------------
38- # PLOTTING
39- #------------------------------------------------------------------------------
40- fig1,ax1=plt.subplots(2,2)
41- fig1.show()
42- mf.my_imshow(mf.norm_uint8(input_image),"(a) Input Image",ax1[0,0])
43- mf.my_imshow(mf.norm_uint8(G_mag_x),"(b) x-derivative",ax1[0,1])
44- mf.my_imshow(mf.norm_uint8(G_mag_y),"(c) y-derivative",ax1[1,0])
45- mf.my_imshow(mf.norm_uint8(G_mag),"(d) Gradient Magnitude",ax1[1,1])
46-
47- # Way of adding super title to figure having subplots
48- fig1.suptitle("Roberts Kernel Results", fontsize=15)
49- # Change the string in above line according to selected kernel
50-
51- plt.show()
52- print("Completed Successfully ...")
Code 4.7: Image Gradient by Prewitt, Sobel, and Robert's method
Having understood the Prewitt operator/kernel, which was derived from the
two-point central difference formula, we may synthesize other popular kernels
and use them in place of the horizontal and vertical Prewitt kernels in the
gradient computation illustrated above. There are many such kernels; we discuss
some other popular ones in the next two sections.

4.9.2.2 Sobel kernel


The Sobel kernel is like the Prewitt kernel, with the only point of difference
being that it gives more weightage to the row/column in which the central
pixel lies during convolution. The filter is Sx = [[1,0,-1],[2,0,-2],[1,0,-1]]
for the x-direction and Sy = [[1,2,1],[0,0,0],[-1,-2,-1]] for the y-direction.
Note that for the x-direction, it gives more weightage to the central row, and
for the y-direction, to the central column. The results of the application of
these kernels are shown in Figure 4.26:

Figure 4.26: Finding Image Gradient (magnitude only) by using Sobel filter
Note that for this test image shown in part (a) of Figure 4.26, the results
seem to be similar to Prewitt kernel’s results — and that is expected.

4.9.2.3 Roberts kernel


The Roberts kernel for the x-direction is Rx = [[0,-1],[1,0]], and for the
y-direction it is Ry = [[-1,0],[0,1]]. Notice that these are 2×2 in size. The
result of the application of these filters on the test image is shown in Figure
4.27. Note that due to the 1's on the diagonals in the structure of the Roberts
filters, the individual horizontal and vertical responses are different as
compared to Sobel and Prewitt. However, the gradient (magnitude) comes out to be
the same in this case, too, and that is what we are interested in for sharpening
purposes.
In Code 4.7, we have manually created the kernels. OpenCV library has
these kernels available as functions. So, alternate codes can also be written
doing the same things. It is left as an exercise for the reader to explore other
possibilities.
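As one such possibility, the sketch below uses OpenCV's built-in Sobel derivatives (note that OpenCV internally uses correlation and its own border handling, so the raw numbers may differ slightly from the scipy-based Code 4.7, although the gradient magnitude image looks the same):

import cv2
import numpy as np

img = np.float32(cv2.imread('img13.bmp', 0))       # same test image as Code 4.7

gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)     # derivative along x
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)     # derivative along y
g_mag = np.sqrt(gx ** 2 + gy ** 2)                 # gradient magnitude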
Figure 4.27: Finding image gradient (magnitude only) by using Robert’s filter

4.9.2.4 Sharpening by Prewitt, Sobel, and Roberts kernels


As discussed in Section 4.9.1 and Code 4.6, sharpening is a straightforward
operation: to the original image, we add high frequency information (the
difference between the original image and its blurred version). Here, in place
of that difference, we will use the gradient magnitude. However, the thing to
note is that the gradient magnitude obtained through Prewitt, Sobel, Roberts, or
any other method is generally not confined to the 0-255 grayscale range. So,
before adding, we must make some arrangements. To illustrate these concepts,
refer to Code 4.8 and its output in Figure 4.28 and Figure 4.29.
01- #==============================================================================
02- # PURPOSE : Illustration of Image Enhancement (Sharpening) using Sobel Kernel
03- #==============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import scipy.signal as sci
08- import my_package.my_functions as mf # This is a user defined package and ...
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- input_image=np.float32(cv2.imread('img12.bmp',0))
12- #------------------------------------------------------------------------------
13- # CREATING SOBEL KERNEL
14- #------------------------------------------------------------------------------
15- filter1_x=np.float32([[1,0,-1],[2,0,-2],[1,0,-1]]) # Sobel x-direction
16- filter1_y=np.float32([[1,2,1],[0,0,0],[-1,-2,-1]]) # Sobel y-direction
17-
18- #------------------------------------------------------------------------------
19- # 2D CONVOLUTION
20- #------------------------------------------------------------------------------
21- G_mag_x=sci.convolve(input_image,filter1_x,'same') # x-derivative
22- G_mag_y=sci.convolve(input_image,filter1_y,'same') # y-derivative
23- G_mag=(G_mag_x**2+G_mag_y**2)**(1/2) # Gradient Magnitude
24-
25- #------------------------------------------------------------------------------
26- # PLOTTING
27- #------------------------------------------------------------------------------
28- fig1,ax1=plt.subplots(2,2)
29- fig1.show()
30- mf.my_imshow(mf.norm_uint8(input_image),"(a) Input Image",ax1[0,0])
31- mf.my_imshow(mf.norm_uint8(G_mag_x),"(b) x-derivative",ax1[0,1])
32- mf.my_imshow(mf.norm_uint8(G_mag_y),"(c) y-derivative",ax1[1,0])
33- mf.my_imshow(mf.norm_uint8(G_mag),"(d) Gradient Magnitude",ax1[1,1])
34- fig1.suptitle("Sobel Kernel Results", fontsize=15)
35- fig2,ax2=plt.subplots(1,3)
36- fig2.show()
37- mf.my_imshow(mf.norm_uint8(input_image),"(a) Input Image",ax2[0])
38- sharpened_image=input_image+np.float32(mf.norm_uint8(G_mag))
39- mf.my_imshow(mf.norm_uint8(sharpened_image),"(b) Sharpened Output (histogram shifted)",ax2[1])
40-
41- #------------------------------------------------------------------------------
42- # SHARPENING & PLOTTING
43- #------------------------------------------------------------------------------
44- G_mag=np.float32(mf.norm_uint8(G_mag)) # Bring Grad Mag in 0-255 range
45- # and bringing back to float 32 format
46-
47- # Finding locations where gradient magnitude is significant
48- # The threshold used is median of gradient magnitude itself
49- indx=np.where(np.float32(mf.norm_uint8(G_mag)>np.median(G_mag)))
50- input_image[indx]=input_image[indx]+G_mag[indx]
51-
52- # Saturating the values above 255 after addition to 255
53- input_image[input_image>255]=255
54-
55- sharpened_image=input_image
56- mf.my_imshow(mf.norm_uint8(sharpened_image),"(c) Sharpened Output",ax2[2])
57-
58- plt.show()
59- print("Completed Successfully ...")
Code 4.8: Image sharpening illustration
Most of the code is like Code 4.7. We have used the Sobel operator to find
the gradient magnitude, the result of which can be seen in Figure 4.28.
In line number 38, we have simply added the original image to gradient
magnitude and plotted it in line number 39. The result can be seen in part (b)
of Figure 4.29. Notice that due to the addition of two images, the values are
averaged out for every pixel. Due to this, the overall brightness seems to be
reduced, i.e., the histogram has shifted to the left side. Although sharpening
is there, we want to preserve the brightness too. That is why we do what is
illustrated next.
In line number 44, we first bring the gradient magnitude to the range 0-255
(originally, it was well outside this range; you can always check this using the
Python shell). Then, we bring it back to float32 format as we are not yet done
with the complete process of sharpening, since we still have to add this to the
original image. In line number 49, we find the indices of those pixels in the
gradient image whose values are higher than the median value, and at those
pixels in the original image, we add the gradient magnitude in line number 50;
the rest of the pixels are left untouched. Due to this addition, the value of
some pixels may go beyond 255. So, in line number 53, we saturate those values
to 255 (strictly speaking, this was also needed in Code 4.7, but there were not
many pixels affected by it, so we left it there). The result of this can be seen
in part (c) of Figure 4.29. The image is sharpened as compared to the original
image in part (a) of the figure:
Figure 4.28: Sobel operator on natural image
The sharpened output is shown in figure next:
Figure 4.29: Illustration of image sharpening by using Sobel operator
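Before moving on, note that the boosting and saturation steps of Code 4.8 can also be written more compactly with NumPy. The following is a minimal sketch (not one of the book's numbered codes), using dummy arrays in place of the input image and the normalized gradient magnitude:

import numpy as np

# Dummy stand-ins (assumed float32, already in the 0-255 range) for the
# input_image and normalized G_mag used in Code 4.8
rng = np.random.default_rng(0)
input_image = rng.uniform(0, 255, (64, 64)).astype(np.float32)
G_mag = rng.uniform(0, 255, (64, 64)).astype(np.float32)

mask = G_mag > np.median(G_mag)            # pixels with significant gradient
sharpened = input_image.copy()
sharpened[mask] += G_mag[mask]             # boost only those pixels
sharpened = np.clip(sharpened, 0, 255)     # saturate to 0-255 in a single call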
At this point, it is important to note that apart from Prewitt, Sobel, and
Roberts, there are many more kernels available. All of them have been
derived from some kind of digital derivative method (equation) — one must
try to explore and compare the result amongst many available kernels. Also,
derivatives are not only used for sharpening; they are used for enhancing the
image — sometimes through addition to the original (which we call
sharpening), sometimes through other ways like element-by-element
multiplication, or otherwise. We will explore in coming chapters that apart
from gradient magnitude, the angle also plays a vital role in a lot of
applications.
4.9.3 Second order derivative-based image enhancement
The second derivative for a 1D signal was given in Equation 4.13, which is
re-written as:
Equation 4.26:
f'' (x)=f(x-1)-2f(x)+f(x+1)
For the 2D case, we have two such partial second derivatives: one along x (with y held constant) and one along y (with x held constant). Both are written as follows:
Equation 4.27:
∂²f/∂x² = f(x+1,y) − 2f(x,y) + f(x−1,y)
Equation 4.28:
∂²f/∂y² = f(x,y+1) − 2f(x,y) + f(x,y−1)
Laplacian (which is the second derivative) is defined, for a 2D function f(x,y), as represented below:
Equation 4.29:
∇²f = ∂²f/∂x² + ∂²f/∂y²
or
Equation 4.30:
∇²f(x,y) = f(x+1,y) + f(x−1,y) + f(x,y+1) + f(x,y−1) − 4f(x,y)
Keeping in mind the above equation, it is not difficult to see the Laplacian kernel structure as:
0  1  0
1 -4  1
0  1  0
Since this filter is symmetrical in both x and y directions, there is no need to have two separate filters in x and y directions.
Let us now see the response of this filter over our test image in Figure 4.30:
Figure 4.30: Application of Laplacian kernel to test image
An interesting point to note is the result in part (b) of Figure 4.30. Unlike the Prewitt or Sobel operators, where we had either a bright or a dark response at the left and right vertical edges of the rectangle, respectively, here we have both a bright and a dark response at each edge. You may compare this with Figure 4.7, where the same second derivative is applied to a 1D signal.
There, we pointed out that we would have zero crossing at the edge, and
around that zero crossing, we would have two peaks in opposite directions.
The sign of the first peak will be decided by whether the change in signal is
from high to low (positive peak) or low to high (negative peak). The other
peak after the zero crossing will have an opposite sign. The same
phenomenon is observed in 2D here. Instead of having a single edge
response, we have a double edge response for the Laplacian kernel.
Because of the double edge problem, Laplacian is not used for sharpening
but rather used for edge detection because, at the edge, we have a zero
crossing. Again, one might argue that in the case of the first derivatives we had peaks (maxima or minima) at those locations, so the purpose is served there too; but maxima and minima can be local or global – they are relative. A zero crossing, however, is a zero crossing whether the edge is strong or weak in contrast. Secondly, it is easier to design a zero-crossing detector than a peak detector. This justifies the usage of the Laplacian kernel for edge detection.
Like the kernel used above for the second derivative, kernels are also derived from other second order equations, for example, variants that include the diagonal neighbours as well. Some of these variants are primarily used as point detectors; we recommend trying them out.
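As a hedged illustration (a minimal sketch, not the book's code for Figure 4.30), the 4-neighbour Laplacian kernel from Equation 4.30 can be applied exactly like the Sobel kernels earlier; the file name 'img12.bmp' is reused here only as an assumption:

import cv2, numpy as np
import scipy.ndimage as sci

img = np.float32(cv2.imread('img12.bmp', 0))      # assumed grayscale test image
lap_kernel = np.float32([[0, 1, 0],
                         [1, -4, 1],
                         [0, 1, 0]])               # kernel from Equation 4.30
lap_response = sci.convolve(img, lap_kernel)       # second-derivative response
# Edges show up as zero crossings flanked by a bright and a dark band,
# which is the double edge response discussed above.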
4.10 Convolution vs. correlation
For mathematicians, convolution and correlation are two operations between
two functions. From their perspective, it is easy to say that convolution is the
same as correlation except for flipping the kernel.
For engineers, however, functions are not of interest in themselves; we use them to model the real world. Our main concern is signals and systems, and both can be modeled using functions. Convolution is an operation between a
signal and a system. It is visualized as a system taking an input (called
signal) and producing an output. However, correlation is visualized as an
operation between two signals where we try to find the similarity between
both. For us as engineers, convolution and correlation are different
operations due to their context. For convolution, we will talk about impulse
response, but impulse response does not make any sense for correlation.
To further understand the difference between convolution and correlation, let
us see Figure 4.31:
Figure 4.31: Impulse response for convolution and equivalent process for correlation
We now look at this figure from the perspective of convolution only. Part (a) is the input signal (shown here as an image). It is a 2D unit impulse function. Part (b) is the filter/kernel applied; in our case, this is the impulse response of the system (as discussed in Section 4.3.3 for a 1D system). Part
(c) of Figure 4.31 shows the response (convolution without flipping the
kernel, also called correlation) of the system in part (b) to the input signal in
part (a). Part (d) shows the actual convolution result between the signal and
the system. Now, notice the result in part (d) first. If the input of the system
is an impulse, the output will be the system function itself at the point where
the impulse exists. This is clearly shown in part (d), where we get the filter
copied at the point where there is an impulse in the input image (signal).
However, in part (c), one may note that the response is flipped (rotated by
180 degrees). From here, we learn that if an impulse is put as an input to the
system, we get the system’s impulse response as an output for convolution
while in correlation it is flipped. The code for generating Figure 4.31 is
given in Code 4.9:
01- #==========================================================================
02- # PURPOSE : Understanding Convolution & Correlation
03- # (Impulse response perspective)
04- #==========================================================================
05- import my_package.my_functions as mf # This is a user defined package
06- # one may find the details related to its contents and usage in section 2.7.3
07- import cv2,matplotlib.pyplot as plt, numpy as np
08- import scipy.ndimage as sci
09-
10- a=np.zeros([11,11])
11- a[5,5]=1
12- a=np.float32(a) # Signal Created
13- my_filter=np.array([[1,2,3],[4,5,6],[7,8,9]]) # System Created
14-
15- filtered_image_corr=sci.correlate(a,my_filter) # Performing Correlation
16- filtered_image_conv=sci.convolve(a,my_filter) # Performing Convolution
17-
18- #--------------------------------------------------------------------------
19- # Plotting
20- #--------------------------------------------------------------------------
21- fig1,ax1=plt.subplots(2,2)
22- fig1.show()
23- mf.my_imshow(a,'Input Image',ax1[0,0])
24- mf.my_imshow(mf.norm_uint8(my_filter),'Filter Applied',ax1[0,1])
25- mf.my_imshow(mf.norm_uint8(filtered_image_corr),'Response of CORR',ax1[1,0])
26- mf.my_imshow(mf.norm_uint8(filtered_image_conv),'Response of CONV',ax1[1,1])
27-
28- plt.show()
29- print("Completed Successfully ...")
Code 4.9: Convolution vs. correlation: Impulse response perspective
Now, let us refer to Figure 4.32. We will look at this figure only from the
perspective of correlation. Here, our objective is to find out the shape shown
in part (b) of the figure inside the image in part (a). Note that the signal in
part (b) exists in the signal in part (a) at the position marked by a star sign in
the signal in part (a). At that point in the final response, we should get the
brightest pixel. This is what correlation does: it finds the similarity between two signals. The response will be highest (brightest) at the point where the shapes match perfectly.
Keeping this in mind, let us now try to analyze the results shown in part (c)
and (d) of the figure. In part (c), the result of the actual correlation is shown.
One can observe that the response is brightest at the same location where the
center of shape is in the signal in part (a) (as marked by star), which is
expected from correlation. However, in part (d), where we show the result of
correlation by flipping the kernel (equivalent to convolution), the location of
the brightest spot in the response does not overlap with the location as
marked by star — it is one location below that (in y direction). The actual
location where maxima should be detected and the location where maxima is
detected are different, which means the result is not as per the correlation.
So, this is not the actual correlation.
Figure 4.32: Similarity measurement using correlation and equivalent process for convolution
From here, we learn that if one desires to find the location of a match between two signals, one should use correlation and not convolution, even though mathematically both operations seem to be the same except for the flipping of the kernel. However, if the filter used is symmetric in nature, then convolution and correlation become identical, because flipping a symmetric filter leaves it unchanged.
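A quick numeric check of this last point (a minimal sketch, assuming a symmetric 3x3 averaging kernel; not one of the book's numbered codes):

import numpy as np
import scipy.ndimage as sci

rng = np.random.default_rng(1)
img = rng.uniform(0, 255, (32, 32)).astype(np.float32)
box = np.ones((3, 3), dtype=np.float32) / 9.0   # symmetric: a 180-degree flip changes nothing
print(np.allclose(sci.convolve(img, box), sci.correlate(img, box)))   # True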
Conclusion
Spatial domain filtering relies on the mathematical operation of convolution, which is carried out using various kernels. Kernels for sharpening, smoothening, and other operations were presented in this chapter. Apart from the dominant role played by the filter coefficients, the shape of the filter (i.e., the number of rows and columns in rectangular filters, or the radius in circular filters) also plays an important role in deciding the result. The chapter concluded by pointing out the differences between the operations of convolution and correlation, as the two are mathematically similar but can yield very different results.
Points to remember
• For applying the operation of convolution, the system should be linear
and time invariant.
• The response of a derivative filter spikes at those points in the signal where there is a significant difference between consecutive samples.
• Circular-shaped filters do not have propagation artifacts. A box filter can be made effectively circular by selecting the weights (coefficients) of the filter properly.
• The kernel of a two-dimensional Gaussian filter is separable into its x and y components. Not all filters have this property of separability (see the sketch after this list).
• From the application point of view, the only difference between convolution and correlation is the flipping of the filter before it is applied. Both operations give the same result in the case of a symmetric filter.
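The separability point above can be verified with a short sketch (assumptions: a 7-tap Gaussian with sigma = 1.5 and a random test image; this is not one of the book's numbered codes):

import numpy as np
import scipy.ndimage as sci

x = np.arange(-3, 4, dtype=np.float32)
g1d = np.exp(-(x ** 2) / (2 * 1.5 ** 2))
g1d /= g1d.sum()                                   # 1D Gaussian weights
g2d = np.outer(g1d, g1d)                           # separable 2D Gaussian kernel

rng = np.random.default_rng(2)
img = rng.uniform(0, 255, (64, 64)).astype(np.float32)

out_2d = sci.convolve(img, g2d)                    # one 2D convolution
out_sep = sci.convolve(sci.convolve(img, g1d[np.newaxis, :]), g1d[:, np.newaxis])  # row pass, then column pass
print(np.allclose(out_2d, out_sep, atol=1e-3))     # True: same output, fewer multiplications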
Exercises
1. Explain the difference between the working of first and second order
derivative filters.
2. How is the following filter expected to behave dominantly? Averaging
or sharpening?
5 4 4 3 3
4 5 5 4 4
3 4 5 6 5
3 4 5 3 4
2 3 3 4 4
[Hint: Notice the neighboring coefficients – if they are very different, the filter will dominantly behave as high pass, otherwise as low pass.]
3. Import an image using Python and find the gradient magnitude using the Sobel operator.
4. Take a filter which is symmetric and convolve and correlate it with a
given image. Compare the results of the two and comment.
Join our book’s Discord space
Join the book's Discord Workspace for Latest updates, Offers, Tech happenings around the world, New Release and Sessions with the Authors: https://discord.bpbonline.com
CHAPTER 5
Frequency Domain Image Processing
5.1 Introduction
In this chapter, we will be introduced to the concept of frequency. To get different perspectives on a given mathematical function, we can view that signal in different domains, each offering a different perspective on the same signal. The most common domain is the time domain (or, in general, independent variable vs. value), where we see the values of the signal plotted against time as a graph. For images, we do not have a time domain; we have a space domain, because the independent variables of an image are the 2D pixel coordinates in space. So, the time/space domain is one domain that is naturally known to us. In general, for a mathematical signal, the values of the signal plotted against its independent variable constitute one domain. Can the same signal be seen in some other domain too? That is the question we try to answer in this chapter, where we explore the idea of transforming signals from one domain to another to obtain more information from them.
The new domain that we are going to explore in this chapter is called
frequency domain. We are going to study frequency domain for digital
signals in detail.
Structure
This chapter discusses the following topics:
• One dimensional analog signal in time and frequency domain
• One dimensional discrete time signal in time and frequency domain
• Two-dimensional Fourier transform
• Filtering of images in frequency domain
Objectives
After reading this chapter, the reader will be able to understand what
frequency domain is and how to bring a signal in one, two, or multiple
dimensions to frequency domain. This chapter presents the theory of
transformations from scratch, so no background is assumed. The reader will
also be able to appreciate the physical interpretation of frequency, especially
in the case of images. Based on the understanding so developed, the reader
will be able to perform frequency domain filtering using various filters
available in the literature.
5.2 One dimensional analog signal in time and frequency domain
In this section, we will not develop the theory of analog frequency domain
rigorously but heuristically. We will get to know the frequency domain for
analog signals, and referring to that, we will develop the theory for digital
signals in the next section.
The frequency of going to college or the workplace is daily, which is higher than the frequency of getting a haircut (probably monthly). So, frequency can be understood as the number of times a particular event happens in a given duration of time.
5.2.1 Frequency domain for analog signals
Figure 5.1 shows two analog signals in part (a) and part (b). The signal in
part (a) completes 3 cycles in 0.1 seconds or, alternatively, 30 cycles per
second. The signal in part (b) completes 5 cycles in 0.1 seconds or,
alternatively, 50 cycles per second. That is why we say that the first signal
has a frequency of 30 hertz and the second has 50 hertz. Hertz is a unit in
which we measure cycles per second. Needless to say, signal 1 has a lower frequency, and signal 2 a higher one.
In parts (c) and (d), the same signals 1 and 2 are shown in the frequency
domain. For signal 1 in the time domain, it only has a frequency of 30 hertz.
That is why in the frequency domain [notice the scale of x-axis in part (c)],
there is a single spike at frequency 30 hertz with amplitude 1, as the original
signal had amplitude 1. Similar arguments apply to the high frequency signal
2 and its corresponding frequency domain in part (d) of the figure. Also,
keep in mind that while a signal is seen in the frequency domain, the signal
itself is not changed. Our perspective of viewing that signal has changed.
In Figure 5.2, we have constructed signal 3 [as shown in part (c)] by adding
signal 1 and signal 2 in part (a) and (b) respectively. Signal 1 has a
frequency of 30 hertz and an amplitude of 1. Signal 2 has a frequency of 50
hertz and an amplitude of 2. Signal 3, hence, has both frequencies 30 and 50
hertz. However, it is difficult to read that from the time domain graph [as
shown in part (c)]. In part (d), the frequency domain representation of signal
3 is shown, and from there, one can clearly read what frequencies are present
in the signal with what magnitude. This is the power of viewing the signal in
the frequency domain. The information that was not easily readable in the
time domain could be readily read in the frequency domain.
Figure 5.1: Time vs. frequency domain
In the preceding example, signal 3 comprised only two frequency
components. Theoretically speaking, any practical analog signal may
comprise any number of frequency components (including infinite frequency
components).
Figure 5.2: Signal with multiple frequencies in time vs. frequency domain
For an illustration, let us see Figure 5.3. Part (a) of this figure has a
rectilinear function, and it is impossible to know from the time domain what
frequencies are present in the signal with what magnitude. However, when
transformed into the frequency domain [as shown in part (b) of the figure],
this information is readily readable.
Figure 5.3: Rectilinear function in time and frequency domain
Let us take a pause here to interpret Figure 5.3. There are a few important
points to note:
• The function shown in the time domain is the same as the frequency
domain. Its shape appears different in both domains, yet it is the same.
• The rectilinear function [as seen in the frequency domain in part (b) of
the figure] has infinite frequency components with different magnitudes
(the graph is shown in finite range only).
• The frequency domain representation can be interpreted as any practical
signal that can be constructed from weighted sinusoids mixed (summed
up) in different proportions.
• One crucial point to note is that the range of available frequencies in the
analog domain is from 0 to ∞. Secondly, all the available frequencies are
distinct. This means that frequency 5 is different from frequency 2, and
so on. These facts seem trivial in the analog domain, but as we shall see
in the discrete/digital domain, this is not the case.
By practical signals, we mean signals with finite energy, a limited number of
maxima and minima, and a limited number of discontinuities. However, the
important question is how to transform the signal from time to frequency
domain. We will answer this question in the next section.
5.2.2 Fourier family of transforms
Continuous time periodic and aperiodic signals have different ways of
transforming the signal from time to frequency domain and vice versa. For
periodic signals, we have the Fourier series, and for aperiodic signals, we
have the Fourier transform. Assuming that the signal to be transformed is
x(t), following conditions, called Dirichlet's conditions, should be met by
the signal to be transformed:
• x(t) should have a finite number of discontinuities (in a period if it is
periodic).
• x(t) should contain a finite number of maxima and minima (in a period if
it is periodic).
• x(t) must be absolutely integrable (in a period if it is periodic).
Any signal that does not follow the preceding conditions is not of any
practical interest. Remember that these conditions are sufficient conditions
for the existence of a Fourier series or transform. There are some signals
which do not follow the above conditions but still have a Fourier series or
transform. However, if the above conditions are followed, the signal will
have a Fourier series/transform representation.
5.2.2.1 Continuous time Fourier series for periodic signals
A periodic signal x(t) can be brought into the frequency domain by using the
following equation:
Equation 5.1:
ck = (1/Tp) ∫Tp x(t) e−j2πkF0t dt,  for k = 0, ±1, ±2, …
To bring the signal back to the time domain from the frequency domain,
refer to the following equation:
Equation 5.2:
x(t) = Σk=−∞..+∞ ck ej2πkF0t
Above is also called continuous time Fourier series (CTFS). Let us take a
moment to understand what these equations mean. x(t) is the periodic signal
with a period Tp, which we wish to transform into the frequency domain, with fundamental frequency F0 = 1/Tp.
Let us first talk about Equation 5.2. Recall that in section 5.2.1, we said that
any signal of practical interest can be represented as a summation of
weighted sinusoids (formally weighted linear combination of sinusoids).
This is also true with exponential signals with imaginary exponent as well.
Any signal of practical interest can be represented as a weighted linear
combination of exponentials with an imaginary exponent. This is what
Equation 5.2 represents.
Like sinusoids and exponentials, there are other functions too which can do
the same, but we use sinusoids when we are dealing with real signals and
exponentials when we deal with complex input signals. In fact,
ejθ=cos(θ)+jsin(θ). So, representing x(t) by a weighted linear combination of
sinusoidal or exponentials is equivalent. However, we prefer to use
exponentials as we assume that some of our input signals will be complex in
nature. These sinusoidal signals or exponentials are called fundamental basis
functions of Fourier series/transform.
We now try to understand why other fundamental bases are not used. The answer comes from signal theory. Sinusoids and exponentials are well behaved with linear time invariant (LTI) systems. To elaborate – if we apply a signal of a given frequency at the input of an LTI system, we get a signal with the same frequency at the output, but with a different amplitude and phase as dictated by the LTI system. This property is useful in signal processing: any signal of practical interest can be thought of as a weighted linear combination of exponentials, so if this signal is passed through an LTI system, the same linear combination appears at the output, with possible changes only in the weights and phases of the individual components. This greatly simplifies calculations in the field of signal processing.
For our case, coming back to Equation 5.2, ej2πkF0t is the fundamental basis
function. It is a family of exponentials for k=0,±1,±2,… Finally, ck is the
complex weight associated with the kth exponential. ck’s are calculated using
Equation 5.1. A plot of kF0 vs. ck is the frequency domain representation that
we were talking about in the previous section. Since we have changed the
fundamental basis function from sinusoid to exponential, it is necessary that
we see the plot of a few signals in the time and frequency domain to have a
better grasp. We will do that once we discuss Fourier transform in the next
section.
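As a quick worked example (not part of the original discussion), consider x(t) = cos(2πF0t), which is periodic with period Tp = 1/F0. Writing cos(2πF0t) = (1/2)ej2πF0t + (1/2)e−j2πF0t and comparing with Equation 5.2 shows that c1 = c−1 = 1/2 while every other ck is zero; evaluating Equation 5.1 gives exactly the same coefficients. This is consistent with the two spikes of height 0.5 per real sinusoid seen in the double sided plots later in this chapter.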
5.2.2.2 Continuous time Fourier transform for aperiodic signals
Fourier transform is used mainly for aperiodic signals. To bring a signal x(t)
from time to frequency domain (which satisfies Dirichlet's conditions as
described in the previous section), we use:
Equation 5.3:
X(F) = ∫−∞..+∞ x(t) e−j2πFt dt
To bring the signal back from frequency to time, we will use the following
equation:
Equation 5.4:
x(t) = ∫−∞..+∞ X(F) ej2πFt dF
Above is also called the continuous time Fourier transform (CTFT). The only difference between the series and the transform is that in the case of the series, the frequency axis was discrete. That is, ej2πkF0t for k=0,±1,±2,… gave consecutive exponentials separated by F0 in frequency, but in the case of the transform, the frequency axis is continuous. A plot of F vs. X(F) is called a frequency domain plot.
It is beyond the scope of this book to prove the above formulas rigorously.
5.2.2.3 Fourier transforms for periodic signal
In this section, we will see that whether the signal is periodic or aperiodic, only the Fourier transform will be used to bring it to the frequency domain; a separate Fourier series representation will not be necessary. We state here without proof that if we know the Fourier series coefficients ck, we just multiply them by 2π (placing them at the corresponding frequencies kF0) and treat the frequency axis as a continuous one; that will be the Fourier transform of the periodic signal.
By now, we have a single method of transforming a signal (whether periodic
or aperiodic) that satisfies Dirichlet’s conditions into the frequency domain,
and we call that Fourier transform.
Let us now see a few time vs. frequency domain plots taking fundamental
basis functions as exponentials in Figure 5.4. In part (a) of the figure, a real
signal (i.e., a signal with zero imaginary part) sin(2π30t) +2sin (2π50t) is
shown. Notice that it is a linear combination of two sine signals with
frequencies 30 and 50 with weights as 1 and 2 (corresponding amplitudes).
Since we have selected fundamental basis as exponential signals, we may
write:
Equation 5.5:
sin(2π30t) + 2 sin(2π50t) = (ej2π30t − e−j2π30t)/(2j) + 2(ej2π50t − e−j2π50t)/(2j)
Rearranging, we get the following equation:
Equation 5.6:
sin(2π30t) + 2 sin(2π50t) = (1/2j)ej2π30t − (1/2j)e−j2π30t + (1/j)ej2π50t − (1/j)e−j2π50t
Now, from Equation 5.6, we see that corresponding to a frequency 30 for sin, we have 2 frequencies +30 and -30 in its complex representation.
same is true for all the other frequencies as well. This is because of the
conversion of sinusoid to exponential. That is why there will be negative
frequencies in our frequency domain plot. Practically, frequency cannot be negative; here, by negative frequency, we mean the corresponding negatively rotating phasor in the exponential representation. In layman's terms, positive frequency corresponds to a phasor circling in the anti-clockwise direction, and negative frequency to a phasor circling in the clockwise direction at the same speed.
Also, when we plot the frequency domain representation, we only talk about
the magnitudes of these individual exponential terms and not the angle (or
phase). This is what we have been doing in earlier plots in previous sections
too. Keeping all these points in mind, it is now easy to interpret the plot in
part (b) of Figure 5.4.
For a frequency of 30, there are two spikes at -30 and 30. Since the
amplitude for this component is 1, it is distributed as 0.5 and 0.5 between
-30 and 30. For frequency 50, spikes at -50 and 50 can be seen with
individual amplitudes 1, which sum up to 2, which is the amplitude of this
component in the original signal. This kind of plot is called double sided
Fourier transform as we plot frequencies on both sides of zero. This is
conventionally very popular in the signal processing domain.
Figure 5.4: Double sided Fourier transform of a signal
Also, remember that since the signal is real, the distribution of half
magnitudes between positive and corresponding negative frequencies exists.
Had it been complex, it would no longer remain equal in general. This is
shown in Figure 5.5 for a signal sin (2π30t)+j2cos (2π30t). Parts (a) and (b)
of the figure show the real and complex parts of the signal (notice their
amplitudes). Part (c) of the figure shows the double sided Fourier transform.
Notice the magnitude asymmetry between the negative and positive
components of the same frequency (-30 and 30 in this case).
Figure 5.5: Double sided Fourier transform of complex signal
To better understand the magnitude asymmetry in the double sided Fourier transform in part (c) of Figure 5.5, the following decomposition will help:
sin(2π30t) + j2cos(2π30t) = (j/2) ej2π30t + (3j/2) e−j2π30t
so the magnitudes are 0.5 at F = +30 hertz and 1.5 at F = −30 hertz.
Also, remember that, in general, the output of the Fourier transform is complex in nature. Note that until now, in this chapter, we have not done any
coding to plot the above graphs for the analog domain. This information is
presented to help you understand the analog frequency domain.
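That said, the decomposition written above for sin(2π30t) + j2cos(2π30t) can still be checked numerically with a few lines of NumPy; the following is a minimal sketch (not one of the book's numbered codes):

import numpy as np

t = np.linspace(0, 0.2, 2001)
lhs = np.sin(2 * np.pi * 30 * t) + 2j * np.cos(2 * np.pi * 30 * t)
rhs = 0.5j * np.exp(2j * np.pi * 30 * t) + 1.5j * np.exp(-2j * np.pi * 30 * t)
print(np.allclose(lhs, rhs))            # True
print(abs(0.5j), abs(1.5j))             # 0.5 and 1.5: the two unequal spike heights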
5.3 One dimensional discrete time signal in time and frequency domain
The concepts that we will develop in this section will apply to discrete time
signals. Keep in mind the nature of the image signal while we make
transitions to the frequency domain for digital signals. First, it is digital, i.e.,
its independent variable (pixel coordinates) and dependent variable (intensity
values) are all discrete in nature. Also, the image is finite in extent. That is,
we have a finite number of rows and columns. The intensity values are also
bounded, i.e., integers between 0 to 255 for a uint8 image. So, an image
readily satisfies Dirichlet’s conditions as illustrated earlier.
In this section, we will talk about 1D signals only. Those 1D signals will be
finite in extent as well as bounded in amplitude (both maximum and
minimum). This means we are essentially talking about finite length and
amplitude bounded discrete one-dimensional sequence as our signal.
5.3.1 Discrete time complex exponential with imaginary exponent
Let us look at the complex exponential ej2πfn where j=√(-1), and
n=0,±1,±2,±3,… This will act as the fundamental basis signal for bringing a
discrete-time signal from time to the frequency domain (more about this in
coming sections). ej2πfn is a complex exponential with an imaginary exponent (in general, we may have complex exponentials of the form e(a+jb)n; in the present case, a=0 and b=2πf).
Let us compare ej2πFt, i.e., the fundamental basis for CTFT to ej2πfn. Note
that continuous frequencies are denoted by capital letter F and discrete
frequency by small letter f. Also, in place of time (independent variable) t,
we have n, which is not time but index of time. Let us see what index of time
is. In discrete world, for example, the signal is not defined at all times
(independent variable); rather it is defined at some predefined (usually
equally spaced) time instants. For example, if we measure the temperature of
a cooling body at an interval of 5 seconds, then t=nT where n=0,1,2,3,…,
T=5 seconds. t is the continuous time, n is the time in a discrete world, and it
is called an index of time. This process of recording signals at regular
intervals of time is called uniform sampling. So, t and n behave differently.
An important property of ej2πfn is that it is periodic in f. The addition of any
integer to f will give us the same complex exponential back, i.e.,
ej2π(f+m)n=ej2πfn, where m=0,±1,±2,… Proving this is simple,
e(j2π(f+m)n)=ej2πfn ej2πmn=ej2πfn as ej2πmn=1 because m and n are integers, the
product will also be an integer, and ej2πx=1 if x is an integer. This means that the unique range of f is [0, 1] (or alternatively [-1/2, 1/2]). Note that any interval in f of size 1 will do. The physical meaning of this derivation is
that we only have a limited number of frequencies available in the discrete
domain (i.e., unique frequencies exist only in unit intervals). This was not
the case in a continuous world. There, we had infinite frequencies (all
distinct) available.
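Before the figure-based verification, a one-line numeric check of this periodicity (a minimal sketch, not one of the book's numbered codes):

import numpy as np

n = np.arange(0, 40)                    # integer time indices
f = 1 / 10
e1 = np.exp(2j * np.pi * f * n)
e2 = np.exp(2j * np.pi * (f + 3) * n)   # frequency shifted by an integer
print(np.allclose(e1, e2))              # True: the same complex exponential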
Let us try to verify this in Figure 5.6. In this figure, we have not plotted
ej2πfn. Rather, we plot only the real part of ej2πfn which is cos(2πfn). This is
because one cannot plot complex signals in one graph. If you wish to see its
imaginary part’s plot, you may do that in a separate figure. However, it will
look similar as it will be a plot of sin(2πfn). Just by seeing the graph of either
the real or imaginary part of ej2πfn, the concept will be understood. Part (a)
of Figure 5.6 shows a signal whose frequency is f1. Part (b) shows the
signal whose frequency is shifted by an integer as compared to the signal in
part (a). Part (c) shows the signal whose frequency is increased by a non-
integer value as compared to part (a).
Note that, as we discussed earlier, signals in parts (a) and (b) have the same
appearance despite different frequencies (separated by integer values). We
expected this because we have already proved this. Similarly, the signal in
part (c) is bound to be different as compared to the signal in part (a). The
code for generating Figure 5.6 is shown in Code 5.1. You must try to play
with the frequency values at relevant places to validate the proof that we did
above. This is the first and very big difference between discrete time and
analog world.
Figure 5.6: Periodicity of frequency in discrete time signals/functions
The code for generating the output shown in the figure is given below:
01- #==========================================================================
02- # PURPOSE : Finite Frequency range in discrete domain
03- #==========================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import scipy.ndimage as sci
06- import my_package.my_functions as mf # This is a user defined package
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- # Designing the signal in discrete time domain
10- f1=1/10 # Frequency is usually specified as p/q
11- n=np.arange(0,40,1) # Discrete time (independent variable) axis
12- comp_exp=np.exp(2j*np.pi*f1*n) # Complex exponential with imaginary exponent
13-
14- fig1,ax1=plt.subplots(3,1)
15- fig1.show()
16- ax1[0].stem(n,np.real(comp_exp))
17- ax1[0].set_title('Original Signal (frequency f1)')
18- ax1[0].grid()
19-
20- f2=f1+9 # Frequency shifted by integer
21- comp_exp=np.exp(2j*np.pi*f2*n)
22- ax1[1].stem(n,np.real(comp_exp))
23- ax1[1].set_title('Signal whose frequency is shifted by addition of integer (f1 + integer)')
24- ax1[1].grid()
25-
26- f3=f1+2.3 # Frequency shifted by non-integer
27- comp_exp=np.exp(2j*np.pi*f3*n)
28- ax1[2].stem(n,np.real(comp_exp))
29- ax1[2].set_title('Signal whose frequency is shifted by addition of non-integer (f1 + non_integer)')
30- ax1[2].grid()
31-
32- fig1.suptitle('Illustration of periodicity of frequency',fontsize=15)
33-
34- plt.show()
35- print("Completed Successfully ...")
Code 5.1: Code for illustration of limited frequency range availability in discrete time
functions/signals
One must keep in mind that although the real or imaginary part of ej2πfn is
cos(2πfn) or sin(2πfn), there is no guarantee that this cos or sin function’s
graph will be periodic with respect to n. At this stage it is important to note
that discrete cos and sin functions are not periodic always. There are very
specific circumstances under which they will be periodic. Let us now
explore this through Figure 5.7 and Figure 5.8, which shows discrete time
periodic cosine and aperiodic cosine signals, respectively.
Figure 5.7: Discrete time periodic cosine signal with different frequencies
We state this here without proof that when the discrete frequency f can be
written in the form of p/q such that p and q are relatively prime, i.e., the
fraction p/q cannot be further simplified, then the signal is periodic with a
fundamental period as q samples. Also, the unit of frequency in a discrete
world is not cycles/seconds but cycles/samples. This is because we do not
have time in a discrete world but the index of time. So, things are taken on a
per-sample basis instead of a per-second basis. Keeping this in mind, let us
now try to interpret parts (a), (b), and (c) of Figure 5.7, in which discrete
cosine signal with frequencies 1/10, 2/10, and 3/10 are shown respectively.
Their corresponding fundamental periods are 10 samples, 5 samples, and 10
samples. This is because 1/10 and 3/10 cannot be further simplified.
However, 2/10 is 1/5, and hence, for that signal, the fundamental period is 5
samples only. One may check that the signals repeat after 10, 5, and 10
samples, respectively.
Figure 5.8: Discrete time aperiodic cosine signal
However, if the discrete frequency f cannot be represented as a ratio of two integers, i.e., it is irrational, then the signal is aperiodic. This is
best understood by the example of a cosine signal with f=π shown in Figure
5.8. Note that the overall shape appears to repeat, but there is no single
period that repeats itself exactly.
From the above discussion, the following important points can be noted:
• The range of f over which ej2πfn is unique is any unit interval. Conventionally, [0,1] or [-1/2,1/2] is chosen to make calculations easy. One consequence of this is that if ej2πfn is used as the fundamental basis function of the discrete version of the Fourier transform, the frequency domain plot will be periodic with period unity.
• In discrete time, cos(2πfn) or sin(2πfn) is not always periodic. It is periodic if and only if f can be expressed in the form p/q with p and q relatively prime; in that case, q is the fundamental period.
• The unit of frequency in the discrete domain is cycles per sample.
Figure 5.7 and Figure 5.8 can be generated by using Code 5.2, which is
given as follows:
01- #==========================================================================
02- # PURPOSE : Understanding periodicity of discrete cosine signals
03- #==========================================================================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import scipy.ndimage as sci
06- import my_package.my_functions as mf # This is a user defined package
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- f1=1/10
10- n=np.arange(0,40,1)
11- sig1=np.cos(2*np.pi*f1*n)
12-
13- fig1,ax1=plt.subplots(3,1)
14- fig1.show()
15- ax1[0].stem(n,np.real(sig1))
16- ax1[0].set_title('(a) COS Signal with frequency f1 = 1/10')
17- ax1[0].grid()
18-
19- f2=2/10
20- sig2=np.exp(2j*np.pi*f2*n)
21- ax1[1].stem(n,np.real(sig2))
22- ax1[1].set_title('(b) COS Signal with frequency f2 = 2/10')
23- ax1[1].grid()
24-
25- f3=3/10
26- sig3=np.exp(2j*np.pi*f3*n)
27- ax1[2].stem(n,np.real(sig3))
28- ax1[2].set_title('(c) COS Signal with frequency f3 = 3/10')
29- ax1[2].grid()
30-
31- fig2,ax2=plt.subplots()
32- fig2.show()
33- f4=np.pi
34- sig4=np.cos(2*np.pi*f4*n)
35- ax2.stem(n,sig4)
36- ax2.set_title('COS Signal with frequency f4 = pi')
37- ax2.grid()
38-
39- plt.show()
40- print("Completed Successfully ...")
Code 5.2: Understanding periodicity of discrete time sinusoids
5.3.2 Fourier family of transforms for discrete time case
In this section, first, we list out discrete time Fourier series (DTFS) and
discrete time Fourier transform (DTFT) for the sake of having equivalent
ways in the discrete domain when compared to continuous domain.
However, as we shall realize, both will be of no use to us. We will then
discover a new transform called the discrete Fourier transform (DFT) and
will use that throughout this book to transform images from time to
frequency domain.
5.3.2.1 Discrete time Fourier series for periodic signals
A discrete time periodic signal x[n] can be brought into the frequency
domain by using the following equation:
Equation 5.7:
ck = (1/N) Σn=0..N−1 x[n] e−j2πkn/N
Where the period of x[n] is N and k = 0, 1, 2, … (N-1). Unlike the continuous time case, we do not have k running over all integers from -∞ to ∞. Rather, we only have N total ck's. A plot of k/N vs. ck is called the frequency domain plot. Note
that to represent the discrete sequence in the time domain, we use x[n] and
not x(n).
To bring back the signal from frequency domain to time, we use the
following equation:
Equation 5.8:
x[n] = Σk=0..N−1 ck ej2πkn/N
Above is also called as DTFS. Here also, the summation range is finite,
meaning any discrete sequence periodic with period N can be represented as
a weighted linear combination of N exponential signals (fundamental basis).
In the continuous time domain, there could be infinite components, but here
only N suffice. The fundamental frequency of the exponential fundamental basis is 1/N (in cycles per sample).
5.3.2.2 Discrete time Fourier transform for aperiodic signals
A discrete time aperiodic signal x[n] can be brought into the frequency
domain by using the equation shown below:
Equation 5.9:
X(w) = Σn=−∞..+∞ x[n] e−jwn
In the above equation, we have used w in place of f simply because this form
is popular. The relation between w and f is w=2πf and, hence, dw=2πdf.
The inverse transform is given in the following equation:
Equation 5.10:
x[n] = (1/2π) ∫2π X(w) ejwn dw
Above is also called as DTFT. One of the problems with DTFT is that its
frequency domain (i.e., frequency axis) is continuous, as observed in
Equation 5.10. It is problematic because computers cannot store infinite
values (any two points on a continuous frequency axis will have infinite
points between them). This is why DTFT is not suitable for use with
computers. Hence, the frequency axis is so discretized that no information is
lost, and we come to something called DFT, which is the topic of the next
sub-section. This discretization is such that we have the same number of
samples in the time domain and frequency domain (usually). However, in
general, the samples in the time and frequency domain (on x-scale) might be
different too.
5.3.2.3 Discrete Fourier transform
Any discrete time sequence x[n] having a finite number of elements N (i.e., a
finite duration sequence), can be brought from the time domain (more
correctly, index of time, i.e., n), to frequency domain (more correctly, index
of frequency, i.e., k) by using the following equation, which we call the
discrete Fourier transform:
Equation 5.11:
X[k] = Σn=0..N−1 x[n] e−j2πkn/N
With k=0,1,2,3,4 … (N-1). In the discrete time domain, for DFT, we have
discrete time axis indexed by time instants n and similarly, discrete
frequency axis indexed by k. Both the time axis (in the time domain) and
frequency axis (in the frequency domain) are discrete. As we know that t=nT
holds for sampling a continuous time signal to discrete time signal through
uniform sampling, n=5 will not mean that the time is 5. It simply means
time t=5T, where T is the spacing between two samples in the time domain.
The frequency axis (in the frequency domain), as discussed earlier, is periodic with period 1 if f is used as the frequency variable. However, since we are working with w (as when we studied DTFT in the previous section), the period will be
2π (as w=2πf). Any period of length 2π will do. So, we take it from 0 to 2π.
In DFT, this period is divided into N equal parts. These parts in 1 period are
indexed by k. DFT can be regarded as a sampled version (in the frequency
domain) of DTFT.
Keeping in mind the above discussion and Equation 5.11, and the fact that in
earlier sections, we decided to use ej2πfn as fundamental basis, here for DFT,
f = k/N (or equivalently w = 2πk/N). Hence, the fundamental basis in the RHS of
the Equations 5.11 and 5.12 can be understood. There are exactly N points in
the signal in the time domain and N points in the frequency domain for DFT.
To bring a signal back from frequency to the time domain, we use the
following equation [called inverse discrete Fourier transform (IDFT)]:
Equation 5.12:
x[n] = (1/N) Σk=0..N−1 X[k] ej2πkn/N
Fast Fourier transform (FFT) is an algorithm for implementing DFT in a time efficient manner. Similarly, inverse fast Fourier transform (IFFT) is
a method of implementing IDFT in a time efficient manner. Equations 5.11
and 5.12 are computationally complex to implement, and that is where FFT
and IFFT help. Both DFT and FFT (IDFT and IFFT) give the exact same
result. We are not going into the details of how FFT is implemented; rather,
we will use FFT as a tool to bring the signal from time to frequency domain.
To do the reverse, we will use IFFT. To see this at work, let us have a look at
Figure 5.9 and Code 5.3.
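Before that, as a quick sanity check (a minimal sketch, not one of the book's numbered codes), a direct implementation of Equation 5.11 produces the same numbers as the library FFT:

import numpy as np
import scipy.fft as sfft

x = np.sin(2 * np.pi * (1 / 10) * np.arange(41))   # same test signal as Code 5.3
N = len(x)
n = np.arange(N)
k = n.reshape(-1, 1)
X_dft = np.exp(-2j * np.pi * k * n / N) @ x        # Equation 5.11 as a matrix-vector product
print(np.allclose(X_dft, sfft.fft(x)))             # True: FFT is just a fast DFT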
Shown in part (a) of the figure is a discrete time sinusoid with frequency
f=1/10 cycles/sample. The code for transforming this signal is shown in line
numbers 42 and 43 of Code 5.3. In line number 42, first the frequency axis is
created. Remember that since the spectrum will be periodic with a period
unity for f (or alternatively 2π for w), the frequency axis is created and
displayed for that range only in part (b) of the figure. In line number 44 of
the code, only the magnitude of the transformed signal is plotted using the
command ax2.stem(freq_axis,np.abs(fft_sig1)). The result is shown in part
(b) of the figure. Note that conventionally, we see the spectrum over f = [-1/2, 1/2], which is why there is a need to shift the view. This is done by
using the fftshift function in line number 54. Also, note that the frequency
axis is created accordingly in line number 53. The plot obtained in part (c)
of Figure 5.9 is the result. One can clearly see that for f=±1/10, there are
spikes (impulses) present with magnitude as explained earlier.
Figure 5.9: Usage of FFT and FFTSHIFT
01- #==========================================================================
02- # PURPOSE : 1D time-frequency representation of signal using FFT
03- #==========================================================================
04- import cv2, matplotlib.pyplot as plt, numpy as np
05- import scipy.fft as sfft
06- import my_package.my_functions as mf # This is a user defined package
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- #-----------------------------------------------------------------
10- # Constructing a cosine signal in discrete time
11- #-----------------------------------------------------------------
12- f1=1/10 # Discrete frequency (In cycles/sample)
13- L=41 # Total no. of samples in the signal (keep this odd)
14- n=np.arange(0,L,1) # Index of time
15- sig1=np.sin(2*np.pi*f1*n) # Discrete time signal
16-
17- #-----------------------------------------------------------------
18- # Creating figure with empty axis
19- #-----------------------------------------------------------------
20- fig=plt.figure()
21- ax1=fig.add_subplot(3,1,1)
22- # fig.add_subplot(m,n,i) assumes that the figure has a
23- # grid structure of mxn and we are creating ith axis in that
24- ax2=fig.add_subplot(3,2,3)
25- ax3=fig.add_subplot(3,2,4)
26- ax4=fig.add_subplot(3,2,5)
27- ax5=fig.add_subplot(3,2,6)
28- fig.show()
29-
30- #-----------------------------------------------------------------
31- # Plotting the signal
32- #-----------------------------------------------------------------
33- ax1.stem(n,sig1)
34- ax1.grid()
35- ax1.set_title("(a) Input Signal",fontsize=12)
36- ax1.set_xlabel("n",fontsize=12)
37- ax1.set_ylabel("Amplitude",fontsize=12)
38-
39- #-----------------------------------------------------------------
40- # Transforming Signal to Frequency Domain and plotting Magnitude plot
41- #-----------------------------------------------------------------
42- freq_axis=np.linspace(0,L-1,L)/L
43- fft_sig1=sfft.fft(sig1/L)
44- ax2.stem(freq_axis,np.abs(fft_sig1))
45- ax2.grid()
46- ax2.set_title("(b) Magnitude Plot (Without FFTSHIFT)",fontsize=12,color='k')
47- ax2.set_xlabel("f",fontsize=12)
48- ax2.set_ylabel("Magnitude",fontsize=12)
49-
50- #-----------------------------------------------------------------
51- # Using FFT-SHIFT to correctly display the magnitude plot
52- #-----------------------------------------------------------------
53- freq_axis2=np.linspace(-(L-1)/2,(L-1)/2,L)/L
54- fft_sig1=sfft.fftshift(sfft.fft(sig1/L))
55- ax3.stem(freq_axis2,np.abs(fft_sig1))
56- ax3.grid()
57- ax3.set_title("(c) Magnitude Plot (With FFTSHIFT)",fontsize=12,color='k')
58- ax3.set_xlabel("f",fontsize=12)
59- ax3.set_ylabel("Magnitude",fontsize=12)
60-
61- #-----------------------------------------------------------------
62- # Plotting Phase plot
63- #-----------------------------------------------------------------
64- ax4.stem(freq_axis2,np.angle(fft_sig1)/np.pi)
65- ax4.grid()
66- ax4.set_title("(d) Phase plot (Truncation neglected)",fontsize=12,color='k')
67- ax4.set_xlabel("f",fontsize=12)
68- ax4.set_ylabel("Phase (x Pi)",fontsize=12)
69-
70- #-----------------------------------------------------------------
71- # Plotting Corrected Phase plot
72- #-----------------------------------------------------------------
73- max_mag=np.max(np.abs(fft_sig1))
74- fft_sig1[np.abs(fft_sig1)<0.90*max_mag]=0
75- ax5.stem(freq_axis2,np.angle(fft_sig1)/np.pi)
76- ax5.grid()
77- ax5.set_title("(e) Corrected Phase Plot (Truncation considered)",fontsize=12)
78- ax5.set_xlabel("f",fontsize=12)
79- ax5.set_ylabel("Phase (x Pi)",fontsize=12)
80-
81- fig.suptitle("Time-Frequency representation of discrete time signal",fontsize=15)
82- plt.show()
83- print("Completed Successfully ...")
Code 5.3: Code for transforming a discrete time signal to frequency domain
Another important point to note in part (c) of Figure 5.9 is that ideally, two
spikes at ±1/10 should exist, and the rest of everything should be zero.
However, here, we note that at other frequencies, some small magnitude is
present. The reason is that the signal we used is a truncated version of the sin
signal (otherwise, the sine signal is infinite on a time scale). Due to this
abrupt truncation, other frequencies also appear in the magnitude response.
Ideally, they should be neglected.
Part (d) of the figure shows the raw phase response, which is calculated in
line number 64 of the code by the function np.angle. While plotting, we
divide the response by π, and so the vertical axis should be read by
multiplying it with π. We are calling this response raw because it is not the
phase response of discrete time sin signal but a truncated version of it.
Because of that, as we noted in the magnitude response, there were some
non-zero magnitude values corresponding to which we now have some finite
phase. To overcome this problem, in line number 73 and 74 of the Code 5.3,
the complex FFT is made zero at the places where its magnitude response is
less than 90% of the maximum value of the magnitude response itself. After
that, the phase response is plotted in part (e) of the figure. This is the
expected phase response. At this point, we are not elaborating on phase
response as we will talk about it in great detail in later sections – it is
included here for the sake of completeness. Also, note the import of FFT
from the SciPy library in line number 5 of the code. The discussion for IDFT
(IFFT) is due – we shall take it when we study frequency domain filtering of
1D signals and images.
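Although IFFT is taken up later, a one-line round trip already shows that it inverts FFT exactly (a minimal sketch, using the same signal as Code 5.3; not one of the book's numbered codes):

import numpy as np
import scipy.fft as sfft

sig = np.sin(2 * np.pi * (1 / 10) * np.arange(41))
recovered = sfft.ifft(sfft.fft(sig))               # forward, then inverse transform
print(np.allclose(sig, recovered))                 # True: the time domain signal is recovered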
5.4 Two-dimensional Fourier transform
We must have a two-dimensional version of DFT and IDFT for bringing
images from time to frequency domain and vice versa. We are going to see
that in this section. We will apply the two-dimensional Fourier transform to
images. In the process we will see how an image looks in frequency domain.
An interesting observation there will be that the frequency domain representation of an image does not carry any visually or physically intuitive information at first glance.
5.4.1 2D discrete Fourier transform and inverse discrete Fourier transform
The following equation is used to bring a two-dimensional function (in our
case, an image signal) from time to frequency domain (2D DFT):
Equation 5.13:
X[k,l] = Σm=0..M−1 Σn=0..N−1 x[m,n] e−j2π(km/M + ln/N)
(with M and N being the number of rows and columns of the image)
and the corresponding 2D IDFT is represented below:
Equation 5.14:
x[m,n] = (1/MN) Σk=0..M−1 Σl=0..N−1 X[k,l] ej2π(km/M + ln/N)
with the symbols having the usual meaning as discussed in the previous
sections.
Our concern, however, is not to design these forward and inverse transforms
by ourselves, but to use these functions from some library in Python to bring
an image into the frequency domain and then manipulate the frequency
content so that we may process the image in the desired way. That is what
we do in the next section by using FFT2 and IFFT2 functions.
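Before moving to images, note that Equation 5.13 separates into one-dimensional DFTs along the two axes; the following minimal sketch (not one of the book's numbered codes) verifies that transforming the rows and then the columns reproduces fft2:

import numpy as np
import scipy.fft as sfft

rng = np.random.default_rng(3)
img = rng.uniform(0, 255, (70, 100)).astype(np.float32)

full_2d = sfft.fft2(img)
row_then_col = sfft.fft(sfft.fft(img, axis=1), axis=0)   # 1D FFT of each row, then of each column
print(np.allclose(full_2d, row_then_col))                # True: the 2D DFT is separable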
5.4.2 Image in frequency domain
Let us begin by seeing how an image looks in the frequency domain. The
following code will help us do that:
01- #==========================================================================
02- # PURPOSE : Displaying an image in spatial and frequency domains
03- #==========================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #---------------------------------------------------------------------------
11- # IMAGE IN SPATIAL DOMAIN
12- #---------------------------------------------------------------------------
13- input_image=np.float32(cv2.imread('img1.bmp',0))
14- fig1,ax1=plt.subplots(1,2)
15- fig1.show()
16- mf.my_imshow(mf.norm_uint8(input_image),'(a) Image in Spatial Domain',ax1[0])
17-
18- #---------------------------------------------------------------------------
19- # IMAGE IN FREQUENCY DOMAIN
20- #---------------------------------------------------------------------------
21- fft_input_image=sfft.fft2(input_image)
22- mag_image=np.abs(sfft.fftshift(fft_input_image))
23- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Image in Frequency Domain",ax1[1])
24-
25- plt.show()
26- print("Completed Successfully ...")
Code 5.4: Transforming an image into the frequency domain
Notice the use of FFT2 in line number 21. The result of executing Code 5.4
is shown in Figure 5.10. In the frequency domain, the image does not look
appealing to the human eye – we do not know what is in it. However, it is
the same image as we see it in the space domain. The frequency domain is
just another perspective. Let us now discuss what information we can see in
the frequency domain and how to interpret the image. Note that we have
only plotted the magnitude response in the frequency domain and not the
phase response. Also, note that the magnitude response of the image in the
frequency domain is plotted by applying log transformation for better
visibility in this case – sometimes, it will not be plotted by using log
transformation if the visibility is good anyway.
Figure 5.10: Image in space vs. frequency domain
This seems too much to take. Let us now understand the frequency domain
image by taking a test image in the time domain as a reference. See Figure
5.11. Before discussing this figure, also note that this is generated by Code
5.5. We will use this code (possibly multiple times) with minor
modifications in plotting subsequent figures and by changing the discrete
frequency and rotation of the pattern in the time domain, in line number 17
and 22, respectively.
Figure 5.11: Test image in time and frequency domain with fx = 0.05 and fy = 0
Coming back to Figure 5.11, part (a) shows a pattern of a sinusoidal wave in
the x-direction with discrete frequency fx=0.05. It has zero frequency in the
y-direction, i.e., fy=0. The total number of rows and columns are 70 and 100,
respectively (both in the time and frequency domain – that is what FFT will
do; the number of samples will remain the same in the time and frequency
domain). In part (b) of the image, which is the magnitude response only, the
axis of x corresponds to fx and similarly, the axis of y corresponds to fy.
Recall that the discrete frequency f is periodic with period 1, and hence the range shown for frequency in part (b) of the figure is from -1/2 to 1/2 for both fx and fy. However, since the number of rows and columns is different (r≠c), in the frequency domain there are more discrete frequency samples along the columns (i.e., the x axis) than along the rows. Also, recall that DFT (and hence FFT) has a discrete frequency axis instead of a continuous frequency axis.
Now, it is not difficult to correlate the space domain image to the frequency
domain image in Figure 5.11. Let us try to read part (b) of this figure. There
are two dominant dots at fx=0.05 and fx=-0.05. This means that in the space
domain, there should be sinusoidal variation along the x-direction with
frequency f=0.05 – and we know that is true. Next, if we notice the y-axis in
the frequency domain, both the white dots are present at fy=0. This means
that if one travels along any vertical line in the space domain, no change will
be noticed – and we know that is also true.
01- #==========================================================================
02- # PURPOSE : Understanding 2D frequency domain through test image
03- #==========================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- from scipy import ndimage
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- #---------------------------------------------------------------------------
12- # Creating a test image in spatial domain and displaying
13- #---------------------------------------------------------------------------
14- r=70
15- c=100
16- input_image=np.float32(np.zeros((r,c)))
17- f=.05 # Set Discrete Frequency here
18- n=np.linspace(0,c-1,c)
19- one_row=np.sin(2*np.pi*f*n)
20- for i in range(0,r,1):
21- input_image[i,:]=one_row
22- rot_angle=0 # Set Rotation angle in degrees here
23- input_image = ndimage.rotate(input_image,rot_angle,reshape=False)
24-
25- fig1,ax1=plt.subplots(1,2)
26- fig1.show()
27- mf.my_imshow(mf.norm_uint8(input_image),'(a) Image in Spatial Domain',ax1[0])
28- ax1[0].axis('on')
29-
30- #---------------------------------------------------------------------------
31- # IMAGE IN FREQUENCY DOMAIN
32- #---------------------------------------------------------------------------
33- fft_input_image=sfft.fft2(input_image)
34- mag_image=np.abs(sfft.fftshift(fft_input_image))
35- mf.my_imshow(mf.norm_uint8(mag_image),"(b) Image in Frequency Domain",ax1[1])
36- ax1[1].axis('on')
37-
38- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
39- x_positions=np.linspace(0,c-1,5);
40- x_labels=x_positions/np.max(x_positions)-0.5
41- ax1[1].set_xticks(x_positions, x_labels)
42-
43- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
44- y_positions=np.linspace(0,r-1,5);
45- y_labels=y_positions/np.max(y_positions)-0.5
46- ax1[1].set_yticks(y_positions, y_labels)
47-
48- plt.show()
49- print("Completed Successfully ...")
Code 5.5: Code for generating test image of Figure 5.11 in time and frequency domain
Now, let us try to vary the pattern in part (a) of Figure 5.11 and notice the
changes in part (b). We will do this in two parts. First, we will try to change
the frequency of the original time domain (Figure 5.12) and second, we will
rotate the pattern in the time domain for a given frequency (Figure 5.13).
Let us take the first case in Figure 5.12. In comparison to the image in part
(a) in the time domain, the image in part (c) has a lower frequency.
Consequently, the magnitude plot in parts (b) and (d) respectively reflect this
result. The two dots in the frequency domain are near 0 (the dc frequency) in
part (d) as compared to part (b). The reverse is true when we compare the
image in part (a) with part (e) or their frequency domain representations in
part (b) vs. part (f). The frequency in part (e) or equivalently (f) is 0.45
cycles/sample, which is near the highest possible frequency of 0.5
cycles/sample. Beyond 0.5 cycles/sample, the spectrum will repeat itself. We
recommend playing with Code 5.5 at this stage to strengthen your
understanding of the time vs. frequency domain and reproduce the results in
Figure 5.12:
Figure 5.12: Changing fx in the space domain vs. its effect in the frequency domain

Let us take a look at the second case, where we fix the frequency but rotate
the image by some angle in the time domain. Refer to Figure 5.13 for such a
case. The frequency f (and not fx) is fixed at 0.05. The difference between f
and fx can be understood from the fact that f = fx when the image is not
rotated at all [as was shown in part (a) of Figure 5.12]. However, in parts (a),
(c), and (e) of Figure 5.13, the image has been rotated by 30°, 60°, and 135°
with respect to the x-axis. In general, fx and fy then have non-zero values, but
their resultant is f=0.05, oriented in the direction shown in those parts. The
equivalent change in the frequency domain can be noted as a rotation of the
two dots by the same angle (they were earlier on a horizontal line for rotation
angle 0). Let us now investigate what rotation means in general. To understand
this, let us talk about parts (a) and (b) of Figure 5.13; the remaining parts of
the figure will follow. In part (a), the sinusoidal pattern makes an angle of 30°
with the horizontal. Due to this, if you look at the frequency domain, the
distance between the projections of the two dots on the fx axis is reduced.
However, as compared to the zero rotation case (where fy=0), here fy≠0. It has
some finite value, but the resultant f is still 0.05, in the direction of 30°. This
can be correlated with the time domain too. Now, the variation along the y axis
is no longer zero if one travels along any fixed vertical line, and the variation
along any horizontal line is reduced as compared to the zero rotation case
because of the tilt in the pattern.
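To make the relation between f, fx, and fy concrete, the dot positions after rotating the pattern by an angle θ can be predicted as fx = f·cos θ and fy = f·sin θ, so that √(fx² + fy²) stays equal to f. The small sketch below (not part of the book's code) tabulates these values for the angles used in Figure 5.13.
# A minimal sketch: predicted dot positions after rotating the f = 0.05 pattern
import numpy as np

f = 0.05
for theta_deg in (0, 30, 60, 135):
    theta = np.deg2rad(theta_deg)
    fx, fy = f * np.cos(theta), f * np.sin(theta)          # components of the resultant frequency
    print(theta_deg, round(fx, 4), round(fy, 4), round(np.hypot(fx, fy), 2))  # resultant stays 0.05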

Figure 5.13: Changing rotation in space domain vs. its effect in frequency domain
One thing becomes clear here – in Figure 5.13, parts (a), (c), and (e), or
equivalently (b), (d), and (f), have the same resultant frequency f=0.05. If
we trace the locus of all such points in the frequency domain that have the same
resultant frequency f, we get the result shown in Figure 5.14. Note that f=0.3
is chosen there so that the ellipse in the frequency domain is big enough for
good visibility.
Figure 5.14: Concentric circles image and space frequency representation
Let us take a moment to grasp what just happened. Part (a) of Figure 5.14
was created by adding all images with resultant f=0.3 for all possible
rotations from 0° to 180° (this figure can be generated using Code 5.6; that
code also incorporates the modification for an equal number of samples on
both frequency axes, which is discussed in the coming paragraphs). The
equivalent frequency domain representation has an ellipse, which is the locus
of all points having resultant f=0.3. In the time domain, you may now note
that in any direction (looking outward from the center), there is a sinusoidal
variation with resultant f=0.3. Let us see what happens to this ellipse when we
lower f to 0.1; see Figure 5.15. The lowering of frequency is reflected in the
low frequency 2D sinusoid in the time domain, and the ellipse has become
smaller in the frequency domain. Note that the time domain wave is not
perfectly circular because it has been constructed by adding rotated versions,
each of which had some plain background apart from the rotated sinusoidal
pattern (as shown in Figure 5.13):
Figure 5.15: Concentric circles image and space frequency representation
The crux is that in the frequency domain, an ellipse centered around origin
represents a single resultant frequency. If any portion of ellipse is missing, it
means that equivalently in time domain, that variation is absent in those
directions where the ellipse is missing.
Now, let us address one issue. If you look at the frequency domain
representation in any of the figures from Figure 5.10 to Figure 5.15, the
frequency samples on the x and y axes (i.e., the values of fx and fy) are not
equal in number, although they span the same range -1/2 to 1/2. This is
inconvenient: because of it, in part (b) of Figure 5.15 we get an ellipse,
whereas with an equal number of samples on both axes (i.e., an equal number of
rows and columns on the frequency axes) we would get a circle. By default,
the FFT algorithm produces an output of the same size as the input image
(array). Hence, to get an equal number of samples on both frequency axes in
the frequency domain, we need to make the number of rows and columns equal
in the space (or time) domain. Therefore, we zero pad the columns if the total
number of columns is less than the number of rows, or vice versa. This can be
done simply by using Code 5.6, with the corresponding result in Figure 5.16.
Recall that we discussed the issue of frequency axis sampling for obtaining the
DFT (FFT) from the DTFT at the end of SubSection 5.3.2.2. Note line numbers
39 to 44 in Code 5.6, where the syntax for specifying the number of samples on
both frequency axes is shown. As we will note later, this will greatly simplify
our 2D filter design, since filters with circular cutoff boundaries are relatively
easy to design compared to the corresponding elliptical ones. Due to this, we
will follow the convention of axes with an equal number of samples in the
frequency domain unless otherwise stated.
01- #======================================================================
02- # PURPOSE : Having equal no. of samples on both axis in freq. domain
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- from scipy import ndimage
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- #---------------------------------------------------------------------------
12- # Creating a test image in spatial domain and displaying
13- #---------------------------------------------------------------------------
14- r=70
15- c=100
16- input_image=np.float32(np.zeros((r,c)))
17- input_image2=np.float32(np.zeros((r,c)))
18-
19- f=.3 # Set Discrete Frequency here
20- n=np.linspace(0,c-1,c)
21- one_row=np.sin(2*np.pi*f*n)
22- for i in range(0,r,1):
23-     input_image[i,:]=one_row
24- rot_angle=30 # Set Rotation angle in degrees here
25-
26- for rot_angle in np.arange(0,180,1):
27-     input_image2 = input_image2+ndimage.rotate(input_image,rot_angle,reshape=False)
28-
29- input_image =input_image2
30-
31- fig1,ax1=plt.subplots(1,2)
32- fig1.show()
33- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial Domain f=0.3',ax1[0])
34- ax1[0].axis('on')
35-
36- #---------------------------------------------------------------------------
37- # IMAGE IN FREQUENCY DOMAIN
38- #---------------------------------------------------------------------------
39- # No. of frequency samples on both axis in freq. domain
40- freq_points=np.max([r,c])
41- # NOTICE THE WAY THE FOLLOWING COMMAND IS USED
42- # [freq_points,freq_points] argument correspond to total samples
43- # on fx and fy axis respectively.
44- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
45- mag_image=np.abs(sfft.fftshift(fft_input_image))
46- mf.my_imshow(mf.norm_uint8(mag_image),"(b) Image in Frequency Domain",ax1[1])
47- ax1[1].axis('on')
48-
49- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
50- x_positions=np.linspace(0,freq_points-1,5);
51- x_labels=x_positions/np.max(x_positions)-0.5
52- ax1[1].set_xticks(x_positions, x_labels)
53-
54- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
55- y_positions=np.linspace(0,freq_points-1,5);
56- y_labels=y_positions/np.max(y_positions)-0.5
57- ax1[1].set_yticks(y_positions, y_labels)
58-
59- plt.show()
60- print("Completed Successfully ...")
Code 5.6: Setting equal number of samples on both frequency axis fx and fy
The output is shown below:

Figure 5.16: Figure 5.14 reproduced with equal number of samples on both frequency axis in part (b)

5.5 Filtering of images in frequency domain


Here, we need to emphasize that the operation of filtering in the space/time
domain is convolution between the input signal and the impulse response of the
system (as elaborated in Sections 4.4 and 4.5). The equivalent operation in the
frequency domain is multiplication of the frequency response of the signal
with the frequency response of the system (i.e., of its impulse response). In the
space/time domain, the impulse response of the system was called the
space/time domain filter. In the frequency domain, the frequency response
(magnitude as well as phase) of an impulse response is called the filter. Also,
keep in mind that we are working with linear time (shift) invariant systems
only, so everything said here applies to those systems only. In this section, we
will focus on designing frequency domain filters and discuss the related
concepts. Note that the subject of filter design is too large to be covered
rigorously in this section; however, we will do all that is needed for images,
using a combination of mathematical rigor and intuition. Before we design
filters for images (2D data), let us try our hands at 1D signals and their
filtering in the frequency domain in the next sub section.
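As a quick reminder of this equivalence, the following sketch (not from the book; it assumes nothing beyond NumPy and SciPy) checks on a short 1D signal that circular convolution with an impulse response in the time domain gives the same result as multiplying the two DFTs in the frequency domain.
# A minimal sketch: filtering as convolution (time) vs. multiplication (frequency)
import numpy as np
import scipy.fft as sfft

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])      # input signal
h = np.array([0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0])    # impulse response, padded to len(x)
N = len(x)

y_freq = np.real(sfft.ifft(sfft.fft(x) * sfft.fft(h)))      # multiply frequency responses
y_time = np.array([sum(x[m] * h[(k - m) % N] for m in range(N)) for k in range(N)])  # circular convolution

print(np.allclose(y_freq, y_time))                          # True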

5.5.1 1D frequency domain filter design


To understand the process of frequency domain filter design, let us take an
example of an ideal low pass filter design, as illustrated in Figure 5.17:

Figure 5.17: Ideal low pass filter design and filtering in frequency domain
To understand what an ideal filter is, see part (e) of the figure. It shows the
frequency domain representation (magnitude plot only) of the system. It is 1 in
the range -0.2 to 0.2, and zero everywhere else (in the range -0.5 to 0.5, after
which it is periodic). This value 0.2 is called the cutoff frequency, denoted by
fc=0.2. If this filter were multiplied element by element with the frequency
response of the signal [which is shown in part (d) of the figure], it would
abruptly truncate the frequency response of the signal beyond the range
[-0.2, 0.2]. This abrupt/instantaneous truncation gives it the name ideal filter.
A lowpass filter allows only low frequencies (decided by the cutoff frequency)
to appear in the output.
Now, let us understand the complete situation depicted in Figure 5.17. Parts
(a) and (b) of the figure show two different sinusoids with frequencies 1/3
and 1/20, respectively. Also, note that their amplitudes are also different
(although it is not necessary). Part (c) is our input signal made by adding the
signals of part (a) and part (b) in the time domain. Part (d) shows the
magnitude spectrum of part (c). As already discussed, note the impulses
corresponding to the frequencies in the input signals with their magnitudes.
As our system, which is a lowpass ideal filter, we are using the magnitude
response as shown in part (e) of the figure. Multiplying the system in part
(e) element by element of the signal in part (d) (both in frequency domain),
the higher frequency will be chopped off (as it is greater than fc=0.2). This is
called filtered signal. In part (f), the filtered signal is brought back to time
domain and plotted. It matches the low frequency component as shown in
part (b) of the figure.
The code for this is given in Code 5.7. If np.ones is used instead of np.zeros in
line number 64, and if the RHS of the assignment in line number 68 is 0
instead of 1, we get an ideal high pass filtering scenario. Its results are
depicted in Figure 5.18. By comparing this with Figure 5.17, your
understanding of low pass and high pass filters will be validated.
In the same way, a band pass filter (a filter that allows only a certain band of
frequencies) or a band reject filter (a filter that disallows a certain band of
frequencies) can be constructed as desired. In an ideal filter, the pass band
(where frequencies are allowed) has a magnitude response of 1 and the stop
band (where frequencies are disallowed) has a magnitude response of 0, with a
sharp transition from 0 to 1 or 1 to 0 in the frequency domain.
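For concreteness, the high pass modification described above would look roughly like the following sketch (the frequency axis is rebuilt here exactly as in Code 5.7 so that the two changed lines can be seen in isolation):
# A minimal sketch of the ideal HIGH pass variant of lines 64 and 68 of Code 5.7
import numpy as np

L = 81
freq_axis = np.linspace(-(L - 1) / 2, (L - 1) / 2, L) / L    # same frequency axis as Code 5.7
fc = .2                                                      # cutoff frequency

freq_filter = np.ones(np.size(freq_axis))                    # line 64: np.ones instead of np.zeros
freq_filter[np.asarray(np.where((freq_axis > -fc) & (freq_axis < fc)))] = 0   # line 68: RHS 0 instead of 1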
01- #======================================================================
02- # PURPOSE : Frequency domain IDEAL LOW PASS filtering in one-dimension
03- #======================================================================
04- import cv2, matplotlib.pyplot as plt, numpy as np
05- import scipy.fft as sfft
06- import my_package.my_functions as mf # This is a user defined package
07- # one may find the details related to its contents and usage in section 2.7.3
08-
09- #-----------------------------------------------------------------
10- # Constructing cosine signal (discrete time) with two frequency components
11- #-----------------------------------------------------------------
12- f1=1/3 # Discrete frequency (In cycles/sample for component 1)
13- f2=1/20 # Discrete frequency (In cycles/sample for component 2)
14- L=81 # Total no. of samples in the signal
15- n=np.arange(0,L,1) # Index of time
16- sig1=3*np.sin(2*np.pi*f1*n)
17- sig2=2*np.sin(2*np.pi*f2*n)
18- sig=sig1+sig2 # Discrete time signal with two frequencies
19-
20- #-----------------------------------------------------------------
21- # Plotting 1st component of the signal in time domain
22- #-----------------------------------------------------------------
23- fig1,ax1=plt.subplots(3,2)
24- fig1.show()
25- ax1[0,0].stem(n,np.real(sig1)) # Component 1 of the signal
26- ax1[0,0].grid()
27- ax1[0,0].set_title("(a) Component 1")
28- ax1[0,0].set_xlabel("n",fontsize=12)
29- ax1[0,0].set_ylabel("Amplitude",fontsize=12)
30-
31- #-----------------------------------------------------------------
32- # Plotting 2nd component of the signal in time domain
33- #-----------------------------------------------------------------
34- ax1[0,1].stem(n,np.real(sig2)) # Component 2 of the signal
35- ax1[0,1].grid()
36- ax1[0,1].set_title("(b) Component 2")
37- ax1[0,1].set_xlabel("n",fontsize=12)
38- ax1[0,1].set_ylabel("Amplitude",fontsize=12)
39-
40- #-----------------------------------------------------------------
41- # Plotting the signal in time domain (signal=component1+component2)
42- #-----------------------------------------------------------------
43- ax1[1,0].stem(n,np.real(sig))
44- ax1[1,0].grid()
45- ax1[1,0].set_title("(c) Input Signal = Component 1 + Component 2")
46- ax1[1,0].set_xlabel("n",fontsize=12)
47- ax1[1,0].set_ylabel("Amplitude",fontsize=12)
48-
49- #-----------------------------------------------------------------
50- # Transforming signal to frequency domain
51- #-----------------------------------------------------------------
52- freq_axis=np.linspace(-(L-1)/2,(L-1)/2,L)/L
53- fft_sig=sfft.fftshift(sfft.fft(sig/L))
54- ax1[1,1].stem(freq_axis,np.abs(fft_sig))
55- ax1[1,1].grid()
56- ax1[1,1].set_title("(d) Magnitude Plot (Signal)",color='k')
57- ax1[1,1].set_xlabel("f",fontsize=12)
58- ax1[1,1].set_ylabel("Magnitude",fontsize=12)
59-
60- #-----------------------------------------------------------------
61- # Lowpass filter design in frequency domain
62- #-----------------------------------------------------------------
63- # Initialisation with all zeros
64- freq_filter=np.zeros(np.size(freq_axis))
65- # SET CUTOFF FREQUENCY HERE
66- fc=.2
67- # LPF Design
68- freq_filter[np.asarray(np.where((freq_axis>-fc) & (freq_axis<fc)))]=1
69- ax1[2,0].stem(freq_axis,freq_filter)
70- ax1[2,0].grid()
71- ax1[2,0].set_title("(e) IDEAL LPF in frequency domain",color='k')
72- ax1[2,0].set_xlabel("f",fontsize=12)
73- ax1[2,0].set_ylabel("Magnitude",fontsize=12)
74-
75- #-----------------------------------------------------------------
76- # Transforming the filtered signal back to time domain
77- #-----------------------------------------------------------------
78- filtered_signal_time=L*fft_sig*freq_filter # L is normalisation factor
79- ifft_sig=sfft.ifft(sfft.ifftshift(filtered_signal_time))
80- ax1[2,1].stem(n,np.real(ifft_sig))
81- ax1[2,1].grid()
82- ax1[2,1].set_title("(f) Filtered Signal (Time domain)")
83- ax1[2,1].set_xlabel("n",fontsize=12)
84- ax1[2,1].set_ylabel("Amplitude",fontsize=12)
85-
86- plt.show()
87- print("Completed Successfully ...")
Code 5.7: Frequency domain low pass filtering illustration
The output of the code with the high pass modification described above is shown in the figure below:
Figure 5.18: Ideal high pass filter illustration

5.5.2 Conventions used in frequency domain filtering


Smoothening was discussed in the previous chapter in the space/time domain. In
this section, the same thing will be discussed in the frequency domain.
Before we design the actual ideal lowpass filter in 2D, let us understand the
process of converting an image from the space/time domain to the frequency
domain and bringing it back from the frequency to the space/time domain. These
conventions will be followed throughout the book.
Look at Figure 5.19. Part (a) of the image shows the actual grayscale image,
with an unequal number of rows and columns in general. Due to this, the
magnitude plot of part (b) also has an unequal number of samples for fx and
fy, though they both have the same range [-0.5, 0.5]. This will create difficulty
in filter design, as a single frequency will be represented by an ellipse, as
discussed earlier. To circumvent this problem, we normalize the frequency
domain, as shown in part (c) of the figure. Here, we have an equal number
of samples for fx and fy. However, if we bring this normalized representation
from the frequency domain back to the space domain, we see what is shown in
part (d) of the figure: a padded image. This is because, while taking the FFT
(see line numbers 44 to 54 of Code 5.8), we choose the same number of samples
for both axes in the frequency domain, equal to the maximum of the total
number of rows and columns. This will allow us to create filters with
circular boundaries.

Figure 5.19: Conventions used for time to frequency and vice versa conversion
From this point onward, when we use the frequency domain, we mean the
normalized frequency domain, but we drop the word normalized — it will be
understood by default. Also, while displaying the image transformed back to the
space domain, we will display it with the padded part removed. Further, the
labels and annotations on the x and y axes in both domains will be dropped.
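To summarize the convention, the round trip can be sketched as follows (a minimal example, not one of the book's listings; the image is a random stand-in and r, c play the same role as in Code 5.8): pad to a square frequency grid when going forward, and crop back to r rows and c columns after the inverse transform.
# A minimal sketch of the space -> (normalized) frequency -> space convention
import numpy as np
import scipy.fft as sfft

r, c = 70, 100                                    # original image size (example values)
image = np.random.rand(r, c).astype(np.float32)   # stand-in for the grayscale image

N = max(r, c)                                     # equal number of samples on both frequency axes
F = sfft.fft2(image, [N, N])                      # zero padding happens implicitly here

padded_back = np.real(sfft.ifft2(F))              # padded spatial image of size N x N
restored = padded_back[0:r, 0:c]                  # drop the padded part before display
print(np.allclose(restored, image, atol=1e-5))    # True up to numerical precision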
01- #======================================================================
02- # PURPOSE : Understanding padding while taking FFT-IFFT
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #---------------------------------------------------------------------------
11- # Importing and displaying image in space domain
12- #---------------------------------------------------------------------------
13- input_image=np.float32(cv2.imread('img1.bmp',0))
14- r,c=np.shape(input_image)
15- fig1,ax1=plt.subplots(2,2)
16- fig1.show()
17- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial Domain',ax1[0,0])
18- ax1[0,0].axis('on')
19-
20- ax1[0,0].set_xlabel('x (or c) axis->')
21- ax1[0,0].set_ylabel('<- y (or r) axis')
22-
23- #---------------------------------------------------------------------------
24- # IMAGE IN FREQUENCY DOMAIN
25- #---------------------------------------------------------------------------
26- fft_input_image=sfft.fft2(input_image)
27- mag_image=np.abs(sfft.fftshift(fft_input_image))
28- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Frequency Domain",ax1[0,1])
29- ax1[0,1].axis('on')
30-
31- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
32- x_positions=np.linspace(0,c,5);
33- x_labels=x_positions/np.max(x_positions)-0.5
34- ax1[0,1].set_xticks(x_positions, x_labels)
35-
36- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
37- y_positions=np.linspace(0,r-1,5);
38- y_labels=y_positions/np.max(y_positions)-0.5
39- ax1[0,1].set_yticks(y_positions, y_labels)
40-
41- ax1[0,1].set_xlabel('fx axis ->')
42- ax1[0,1].set_ylabel('fy axis ->')
43-
44- #---------------------------------------------------------------------------
45- # IMAGE IN FREQUENCY DOMAIN (With Equal samples for fx and fy)
46- #---------------------------------------------------------------------------
47- # No. of frequency samples on both axis in freq. domain
48- freq_points=np.max([r,c])
49- # NOTICE THE WAY THE FOLLOWING COMMAND IS USED
50- # [freq_points,freq_points] argument correspond to total samples
51- # on fx and fy axis respectively.
52- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
53- mag_image=np.abs(sfft.fftshift(fft_input_image))
54- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(c) NORMALISED Frequency Domain",ax1[1,0])
55- ax1[1,0].axis('on')
56-
57- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
58- x_positions=np.linspace(0,freq_points-1,5);
59- x_labels=x_positions/np.max(x_positions)-0.5
60- ax1[1,0].set_xticks(x_positions, x_labels)
61-
62- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
63- y_positions=np.linspace(0,freq_points-1,5);
64- y_labels=y_positions/np.max(y_positions)-0.5
65- ax1[1,0].set_yticks(y_positions, y_labels)
66-
67- ax1[1,0].set_xlabel('fx axis ->')
68- ax1[1,0].set_ylabel('fy axis ->')
69-
70- #---------------------------------------------------------------------------
71- # Image Transformed back in time domain
72- #---------------------------------------------------------------------------
73- image_back_in_time=sfft.ifft2(fft_input_image)
74- mf.my_imshow(mf.norm_uint8(image_back_in_time),'(d) PADDED Spatial Domain',ax1[1,1])
75- ax1[1,1].axis('on')
76-
77- ax1[1,1].set_xlabel('x (or c) axis->')
78- ax1[1,1].set_ylabel('<- y (or r) axis')
79-
80- plt.show()
81- print("Completed Successfully ...")
Code 5.8: Conventions while transforming an image from space to frequency domain and vice versa

5.5.3 Two-dimensional ideal filtering


Now, let us discuss the actual frequency domain low pass filter design and the
filtering itself by referring to Figure 5.20 and Code 5.9:
Figure 5.20: Frequency domain low pass filtering of images with fc=0.2 cycles/sample
Parts (a) and (b) of Figure 5.20 do not need elaboration (hopefully, if you
have understood the discussion so far). In part (c), an ideal filter with a
cutoff frequency fc=0.2 cycles/sample is shown. Notice that it is a low pass
filter as its passband is centered around the center of the image – which
corresponds to 0 frequency. The boundary of the passband is fc=0.2
cycles/sample. We will see the reason for arriving at this conclusion when
we discuss Code 5.9. Part (d) shows the actual filtering process in the
frequency domain, where the frequency response (not magnitude response
alone) is multiplied by the filter in the frequency domain. Part (e) of the
figure shows the result of bringing the frequency domain filtered image in
part (d) back to the space domain. Notice that this time, a padded region
appears (which is expected as the original image has a different number of
rows and columns). The important thing is that, like the other parts in the
filtered image, the padded region is also blurred. We have removed (clipped)
the padded region and displayed the filtered image alone in part (f) of the
figure. Note that the padded region is also affected because the frequency
domain representation in part (d) is normalized; hence, the padding was already
present to begin with (see line number 23 of Code 5.9).
01- #======================================================================
02- # PURPOSE : Frequency domain Low pass filtering of images
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #---------------------------------------------------------------------------
11- # Importing and displaying image in space domain
12- #---------------------------------------------------------------------------
13- input_image=np.float32(cv2.imread('img9.bmp',0))
14- r,c=np.shape(input_image)
15- fig1,ax1=plt.subplots(2,3)
16- fig1.show()
17- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial Domain',ax1[0,0])
18-
19- #---------------------------------------------------------------------------
20- # Image in normalised frequency domain (Magnitude Plot)
21- #---------------------------------------------------------------------------
22- freq_points=np.max([r,c])
23- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
24- mag_image=np.abs(sfft.fftshift(fft_input_image))
25- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Frequency Domain (Magnitude)",ax1[0,1])
26-
27- #---------------------------------------------------------------------------
28- # Designing IDEAL LOW PASS Filter
29- #---------------------------------------------------------------------------
30- # Initialising an image with shape equal to fft_input_image with all zeros
31- freq_domain_filter=np.zeros((freq_points,freq_points))
32- fc=.2 # SET CUTOFF FREQUENCY OF FILTER HERE
33-
34- # Creating the LPF in following loop
35- for i in np.arange(0,freq_points,1):
36-     for j in np.arange(0,freq_points,1):
37-         if np.sqrt((i-freq_points/2)**2+(j-freq_points/2)**2)<fc*freq_points:
38-             freq_domain_filter[i,j]=1
39- mf.my_imshow(mf.norm_uint8(freq_domain_filter),"(c) Frequency Domain Filter",ax1[0,2])
40-
41- #---------------------------------------------------------------------------
42- # Filtering in frequency domain
43- #---------------------------------------------------------------------------
44- freq_filtered_image=sfft.fftshift(fft_input_image)*freq_domain_filter
45- mf.my_imshow(mf.norm_uint8(np.log(1+np.abs(freq_filtered_image))),"(d) Frequency Domain Filtering",ax1[1,0])
46-
47- #---------------------------------------------------------------------------
48- # Image Transformed back in time domain
49- #---------------------------------------------------------------------------
50- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
51- mf.my_imshow(mf.norm_uint8(image_back_in_time),'(e) PADDED Spatial Domain',ax1[1,1])
52-
53- #---------------------------------------------------------------------------
54- # Image Transformed back in time domain (displayed without padding)
55- #---------------------------------------------------------------------------
56- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
57- mf.my_imshow(mf.norm_uint8(image_back_in_time[0:r,0:c]),'(f) Spatial Domain',ax1[1,2])
58-
59- plt.show()
60- print("Completed Successfully ...")
Code 5.9: Code for frequency domain low pass filtering of images
Although Code 5.9 is largely self-explanatory, we will discuss the filter creation
part from line number 27 to line number 39. In line number 31, we first
initialize the frequency domain filter with all zeros. In the next line, the
cutoff frequency fc is set. Note that this cutoff frequency is radial in nature:
all the points (fx, fy) for which fc = √(fx² + fy²) are the boundary (cutoff)
points. Since we are plotting normalized frequency domain plots, the locus of
all such points will be a circle of radius fc.
Now, look at the structure of the loop from line number 35 to 38. Line
numbers 35 and 36 iterate over all rows and columns of the filter that we
initialized with all zeros. To design a low pass filter, we need to create the
passband as a circular region around the center with a radius equal to the
desired cutoff frequency fc. That is what we check for in line number 37:
np.sqrt((i-freq_points/2)**2+(j-freq_points/2)**2) is the radial distance of the
point (i,j) from the center of the image, and we need it to be less than
fc*freq_points. The multiplier freq_points is present because adjacent frequency
samples are one pixel apart, whereas on the normalized frequency axis they are
1/freq_points apart; a radius of fc in normalized frequency therefore
corresponds to fc*freq_points pixels. If the condition in line number 37 is
satisfied, we make that pixel a passband pixel (i.e., set it to 1). One very
important point is line number 44, where the filtering operation is applied to
the entire FFT of the signal and not to the magnitude response alone – this will
be true for all the filtering operations that we are going to do in this chapter.
As mentioned earlier, phase will be covered later in this book as it requires
separate attention.
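As an aside, the same circular passband built by the loop in lines 35 to 38 can also be generated without explicit Python loops by working on a coordinate grid. The following sketch (not from the book) produces the same mask:
# A minimal vectorized sketch of the ideal LPF mask of Code 5.9
import numpy as np

freq_points, fc = 100, 0.2
i, j = np.meshgrid(np.arange(freq_points), np.arange(freq_points), indexing='ij')
radial_dist = np.sqrt((i - freq_points / 2) ** 2 + (j - freq_points / 2) ** 2)   # distance from center, in pixels
freq_domain_filter = (radial_dist < fc * freq_points).astype(np.float32)         # 1 inside the cutoff circle, 0 outside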
Now, let us note a few important points by comparing Figure 5.20 and
Figure 5.21. They are sample runs of Code 5.9 for fc=0.2 and fc=0.1
cycles/sample, respectively:
• Smoothening: Due to low pass filtering in the frequency domain,
blurring occurs. This is visible in the results of both the figures.
• Fringing: With ideal lowpass filters, near high frequency (edge-like)
regions of the original image, the filtered image will show fringing
(wave-like structures near edges). This is clearly visible in the result in
Figure 5.20. The phenomenon is also present in Figure 5.21, but there it is
so severe that fringing appears almost everywhere. Fringing is caused by
the abrupt truncation of high frequency components. To get rid of this
unwelcome artifact, non-ideal filters with suitable characteristics must be
used.
• Images are lowpass signals: Most of the information in an image is
present in its low frequency content. This is evident from Figure 5.21,
where the cutoff frequency is very low, but the dominant structure can
still be seen effectively in the filtered image. Try playing with the cutoff
frequency in the code; for cutoff frequency values approaching 0.5, the
filtered image shows negligible blurring or smoothening.
Figure 5.21: Frequency domain low pass filtering of images with an ideal filter having fc=0.1
cycles/sample

5.5.4 Gaussian lowpass filtering


Fringing (also called ringing) is a problem when it comes to ideal lowpass
filtering. It is present because of the abrupt discontinuity that is present in
the very structure of the ideal filter (i.e., at the cutoff frequency). We can,
however, create filters that do not abruptly truncate the frequencies beyond
the cutoff frequency. Rather, they do it gradually. Instead of having an ideal
filter, we might have monotonic filters in the frequency domain too. One of
the most common monotonic filters used is the Gaussian filter. Recall that
we have used it in time domain filtering in Equation 4.20, which is rewritten
as:
Equation 5.15:
h(x, y) = A exp( -[ (x - x0)² / (2σx²) + (y - y0)² / (2σy²) ] )
To create a filter like this in the frequency domain, we need to make the
following changes: replace x by fx and y by fy. We set A=1 because we
want the maximum value in the passband to be 1. Also, σx² = σy² = σ²,
x0 = fx0 = 0, and y0 = fy0 = 0, because we want a circularly symmetric filter
centered at (0,0) in frequency. After making these changes and simplifying, we
get the following equation:
Equation 5.16:
H(fx, fy) = exp( -(fx² + fy²) / (2σ²) )
which may be rewritten, as shown below:


Equation 5.17:
H(fr) = exp( -fr² / (2σ²) )
where fr is the radial frequency such that fr² = fx² + fy². Also, note that the
function H(fr) is the magnitude response in the frequency domain. It is a radial
Gaussian function in the frequency domain with a maximum value of 1. Apart
from this, we also need to define the cutoff frequency fc. The problem is that
the Gaussian function does not have a sharp cutoff point. One widely used
convention is the half power point. Let us understand the half power point.
In the frequency domain, we see the signal from the perspective of its
composition from basic frequencies. If a signal has a frequency component fr
with magnitude |H(fr)|, the power contributed by that component is |H(fr)|² –
this is a result from signal processing theory which we state here without
proof. Since we are talking about a monotonically decreasing Gaussian low
pass filter, as the frequency increases, the magnitude decreases and hence so
does the power. One may select the half power point as the cutoff frequency of
the filter. This means that when we create a Gaussian lowpass filter using
Equation 5.17, we choose σ such that at fr = fc (fc being the cutoff frequency),
the magnitude corresponds to half power. The maximum power is at zero
frequency, where it is 1² = 1 (as the magnitude is 1 there). We therefore look
for the frequency (i.e., the value of fr) at which |H(fr)|² becomes 1/2, i.e.,
|H(fr)| = 1/√2. We call this fr the cutoff frequency and denote it by fc.
So, if we rewrite Equation 5.17 in the following form (solved for σ), we get:
Equation 5.18:
σ² = fr² / (2 ln(1 / |H(fr)|))
and substituting fr = fc and |H(fc)| = 1/√2, we get:
Equation 5.19:
σc = √( fc² / (2 ln √2) ) = fc / √(ln 2)
The preceding equation gives us the value of σ for which the half power point
falls exactly at the cutoff frequency fc. We call that cutoff value of σ as σc.
Note that fc is not an abrupt cutoff frequency as in an ideal filter; it is the
frequency at which the magnitude becomes 1/√2, which corresponds to half of
the maximum power of the filter's response.
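A quick numerical check of this choice (a sketch, not part of the book's code) confirms that with σc taken from Equation 5.19, the Gaussian response of Equation 5.17 indeed drops to 1/√2 ≈ 0.707 at fr = fc:
# A minimal check: at fr = fc the Gaussian magnitude equals 1/sqrt(2) (half power)
import numpy as np

fc = 0.1
sigma_c = np.sqrt(fc ** 2 / (2 * np.log(np.sqrt(2))))   # Equation 5.19 (same expression as Code 5.10)
H_at_fc = np.exp(-fc ** 2 / (2 * sigma_c ** 2))         # Equation 5.17 evaluated at fr = fc
print(H_at_fc, 1 / np.sqrt(2))                          # both print approximately 0.7071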
Gaussian low pass filtering is illustrated in Figure 5.22. Compare it with
Figure 5.21 and notice the differences in filter structure in part (c) of both
figures and, most importantly, the absence of fringing in parts (e) and (f) of
Figure 5.22. Notice that smoothening is still present. This is the advantage of
using Gaussian filters in the frequency domain.

Figure 5.22: Frequency domain low pass filtering of images with Gaussian filter having fc=0.1
cycles/sample
The code for generating the results in Figure 5.22 is shown here. For the most
part, it is the same as Code 5.9; the only difference is the creation of the
Gaussian low pass filter with a given cutoff frequency. We recommend playing
with the cutoff frequency and noticing the results.
01- #======================================================================
02- # PURPOSE : Frequency domain Gaussian Low pass filtering of images
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #---------------------------------------------------------------------------
11- # Importing and displaying image in space domain
12- #---------------------------------------------------------------------------
13- input_image=np.float32(cv2.imread('img9.bmp',0))
14- r,c=np.shape(input_image)
15- fig1,ax1=plt.subplots(2,3)
16- fig1.show()
17- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial Domain',ax1[0,0])
18-
19- #---------------------------------------------------------------------------
20- # Image in normalised frequency domain (Magnitude Plot)
21- #---------------------------------------------------------------------------
22- freq_points=np.max([r,c])
23- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
24- mag_image=np.abs(sfft.fftshift(fft_input_image))
25- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Frequency Domain (Magnitude)",ax1[0,1])
26-
27- #---------------------------------------------------------------------------
28- # Designing GAUSSIAN LOW PASS Filter
29- #---------------------------------------------------------------------------
30- # Initialising an image with shape equal to fft_input_image with all zeros
31- freq_domain_filter=np.zeros((freq_points,freq_points))
32- fc=.1 # SET CUTOFF FREQUENCY OF FILTER HERE
33-
34- # Creating 2D freq. grid for Gaussian filter generation in freq. domain
35- f_positions_norm=np.linspace(0,freq_points,freq_points)
36- fx=f_positions_norm/np.max(f_positions_norm)-0.5
37- fy=fx # Because we are taking circularly symmetry
38- fxx,fyy=np.meshgrid(fx,fy) # 2 arrays of fx & fy coordinates in 2D
39-
40- # Sigma of Gaussian dictated by cutoff frequency fc
41- sigma1=np.sqrt((fc**2)/(2*np.log(np.sqrt(2))))
42- # 2D Gaussian creation
43- sigma_x=sigma1
44- sigma_y=sigma1
45- Gauss_function=np.exp(-((fxx**2)/(2*(sigma_x**2))+(fyy**2)/(2*(sigma_y**2))))
46- freq_domain_filter=Gauss_function
47- mf.my_imshow(mf.norm_uint8(freq_domain_filter),"(c) Frequency Domain Filter",ax1[0,2])
48-
49- #---------------------------------------------------------------------------
50- # Filtering in frequency domain
51- #---------------------------------------------------------------------------
52- freq_filtered_image=sfft.fftshift(fft_input_image)*freq_domain_filter
53- mf.my_imshow(mf.norm_uint8(np.log(1+np.abs(freq_filtered_image))),"(d) Frequency Domain Filtering",ax1[1,0])
54-
55- #---------------------------------------------------------------------------
56- # Image Transformed back in time domain
57- #---------------------------------------------------------------------------
58- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
59- mf.my_imshow(mf.norm_uint8(image_back_in_time),'(e) PADDED Spatial Domain',ax1[1,1])
60-
61- #---------------------------------------------------------------------------
62- # Image Transformed back in time domain (displayed without padding)
63- #---------------------------------------------------------------------------
64- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
65- mf.my_imshow(mf.norm_uint8(image_back_in_time[0:r,0:c]),'(f) Spatial Domain',ax1[1,2])
66-
67- plt.show()
68- print("Completed Successfully ...")
Code 5.10: Low pass filtering in frequency domain using Gaussian (monotonic) filter
At this point, it is also important to note that, conventionally, the decibel (dB)
scale is used for specifying cutoff frequencies. A quantity X representing
magnitude or amplitude becomes 20 log10 X in decibels – an example relevant
to our case is the magnitude response. Quantities representing power, on the
other hand, become 10 log10 X on the decibel scale. Note that when X = 1/√2,
its value in dB is 20 log10(1/√2) ≈ -3. That is why the half power frequency is
also called the -3 dB frequency or 3 dB decay frequency.
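This convention is easy to verify numerically (a tiny sketch, not part of the book's listings):
# A minimal check of the -3 dB convention at the half power point
import numpy as np

print(20 * np.log10(1 / np.sqrt(2)), 10 * np.log10(0.5))   # both approximately -3.01 dB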
Gaussian filter does what we need. There are many such filter structures that
are monotonic and will give us similar results. We are both satisfied and
dissatisfied with the results. We are satisfied because smoothing is achieved
without fringing. However, we are dissatisfied because the Gaussian filter's
magnitude response does not match the ideal filter's response. To see what
this means, look at Figure 5.23:
Figure 5.23: Magnitude response of a typical non-ideal lowpass filter in the frequency domain
It shows the two-dimensional magnitude response of a typical lowpass
frequency domain filter superimposed with a corresponding one-dimensional
equivalent. The one-dimensional equivalent is drawn by taking the profile on
the horizontal line in the center of the image. Also, note that the axis has two
different meanings corresponding to one and two-dimensional signals. The
current axis marks correspond to two-dimensional signals on both axes. For
a one-dimensional signal, the marks on the x-axis remain the same, but on
the y-axis, the markings should be read from 0 to 1.
Let us discuss the one-dimensional case first. Clearly, the filter structure is
not an ideal one. In an ideal lowpass filter, there are two regions – one where
the magnitude response is exactly 1 in low frequency region (called
passband) and the second where the magnitude response is exactly 0 in high
frequency region (called stop band). In the current structure, however, the
magnitude response remains one (or nearly one in some portion), and then it
decays in some finite region to nearly zero value. In the practical filter, there
is a third region called the transition region, where the response is neither
one nor zero, but it decays from one to zero.
Similar observations can be made about the two-dimensional filter structure.
Compare part (c) of Figure 5.21 with Figure 5.23. One can see that the ideal
filter in the frequency plane has a magnitude response of 1 in a circular
region. Outside this region, it is 0. However, in a practical filter, this
transition from 1 to 0 is gradual rather than abrupt, as noted in Figure 5.23.
Hence, passband, transition band, and stop band can be clearly imagined.
Refer to Figure 5.24 where these regions are marked. The boundaries of the
pass band, transition band, and stop band are determined through specific
criteria based on the desired performance of the filter. These criteria involve
factors like cutoff frequencies, filter order, and desired attenuation levels.
However, this topic is beyond the scope of this book, and readers interested
in a deeper understanding are encouraged to refer to textbooks on signal
processing.

Figure 5.24: Passband, transition band, and stop band in a practical lowpass filter
Having understood the above, let us now look at parts (a), (b), and (c) of
Figure 5.25. We will talk about parts (d), (e), and (f) shortly. Parts (a) and
(b) are familiar; we have already worked with them in the previous sections –
they are the ideal lowpass filter and the Gaussian lowpass filter, and they have
the same cutoff frequency. Despite this, their magnitude responses differ
significantly due to the wide transition band of the Gaussian lowpass filter in
part (b). In part (c), another filter, called the Butterworth filter, is shown; we
shall explore it next. It also has the same cutoff frequency as the ideal and
Gaussian lowpass filters of parts (a) and (b). Its magnitude response is closer
in appearance to the ideal filter than the Gaussian filter's is, and it has the
narrow transition band we were looking for. A narrow transition band lets
fewer undesired frequencies (those in the transition band and stopband) into
the output than a wider one does. In the next section, we will study the
Butterworth filter in detail; unlike the Gaussian filter, it lets us control the
width of the transition band through its order. Before that, remember – we
need smoothness in the magnitude response to avoid fringing, and we need a
narrow transition band to strongly attenuate undesired frequencies, if not
completely remove them. Both are not achievable simultaneously, so there will
be a tradeoff between the two.

5.5.5 Butterworth low pass filtering


Mathematically, a Butterworth filter is described by the following magnitude
squared:
Equation 5.20:
|H(fr)|² = 1 / (1 + (fr / fc)^(2n))
Note that the LHS is the magnitude squared response and not the magnitude
response alone. fc, as usual, is the half power (-3 dB) cutoff frequency. This
fact can be verified by putting fr = fc in Equation 5.20, which yields
|H(fc)|² = 1/2, corresponding to half power, or in other words, |H(fc)| = 1/√2.
The parameter n controls the width of the transition band and is called the
order of the filter. The higher the order, the closer we approximate the ideal
filter (at the cost of more fringing). The lower the order, the more the
smoothening of the response (at the cost of a wide transition band and hence
undesired frequencies in the output).
To see how the Butterworth filter changes its transition width with respect to
the order of the filter, see parts (d), (e), and (f) of Figure 5.25. All of them
are Butterworth filters with a cutoff frequency fc=0.1 cycles/sample but with
orders n=1, 2, and 50, respectively. For n=50, the response closely matches the
ideal filter's magnitude response, as in part (a) of the figure. For n=1 [as in
part (d)], the transition band is extremely wide.
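The following sketch (not from the book) evaluates Equation 5.20 along the radial frequency axis for a few orders; the value at fr = fc is always 1/√2, while the stopband value shrinks (i.e., the transition sharpens) as n grows.
# A minimal sketch: Butterworth magnitude of Equation 5.20 for different orders
import numpy as np

fc = 0.1
fr = np.linspace(0, 0.5, 501)                         # radial frequency axis
for n in (1, 2, 50):
    H = np.sqrt(1 / (1 + (fr / fc) ** (2 * n)))       # magnitude response
    at_fc = np.sqrt(1 / (1 + 1.0 ** (2 * n)))         # value exactly at fr = fc: always 1/sqrt(2)
    print(n, round(at_fc, 4), H[-1])                  # stopband edge value shrinks as n grows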
Let us now see an illustration comparing the results of applying the ideal,
Gaussian, and Butterworth low pass filters to the input image of part (a) of
Figure 5.22. Refer to Figure 5.26 for the results. Note that all
the filters have the same cutoff frequency, as shown in parts (a), (b), and (c).
Their one-dimensional equivalents are also shown superimposed on them.
The results of the application of the ideal filter in part (d) show fringing. In
part (e), fringing is absent, but as one can note carefully, some high-
frequency components are present (sharp boundaries). This is because the
Gaussian filter’s magnitude response is non-zero in the transition band (over
a wide range of frequencies). This problem is removed in part (f), which
shows the result corresponding to the Butterworth filter as the transition
band is narrow.

Figure 5.25: Magnitude response of different filters with same cutoff frequency
However, due to closeness to the ideal magnitude response, some fringing
appears. Still, the result of the Butterworth filter is better than the ideal filter
(in terms of fringing) and Gaussian (in terms of suppressing the frequency
components after cutoff frequency).

Figure 5.26: Frequency domain low pass filtering with fc=0.1 cycles/sample on Figure 5.22 with
different filters
In Figure 5.27, we have shown the result of the application of Butterworth
lowpass filters with a cutoff frequency of 0.1 cycles/sample but with
different orders of 1, 5, and 10. From this figure, one can note that fringing
increases as we increase the order of the filter because the magnitude
response approaches ideal filter characteristics, and hence, the transition
from 1 to 0 becomes sharper, which leads to the phenomenon of fringing.
The second important point to note here is that smoothening is best when the
order is more because the passband completely passes all frequencies in it,
and the stopband almost completely stops the frequencies in it. At order n=1,
even after smoothening, the edge information can be seen.
There exists a tradeoff between fringing and frequency passing/stopping –
one must choose the Butterworth filter order accordingly.
Figure 5.27: Result of applying Butterworth filter with fc=0.1 cycles/sample on Figure 5.22 with
different orders
The code for testing ideal, Gaussian, and Butterworth filters is given in
Code 5.11. Note that this is a general code for creating low pass, high pass,
band stop, and band pass filters. Low pass filtering has already been discussed;
the remaining filters will be discussed in the coming sections. The code lets
the user choose the ideal, Gaussian, or Butterworth variant for filter creation
in each of the above categories. So, this is a general code that will be used in
the next sections, and all the figures in this and the coming sections are
generated by modifying this code only. The code is lengthy, but if you have
followed it so far, it will not be daunting.
001- #======================================================================
002- # PURPOSE : General Code for frequency domain filtering of images
003- # This code implements - Lowpass, Highpass, Band stop and Bandpass filtering
004- # This code will do the above for Ideal, Gaussian and Butterworth filters
005- #======================================================================
006- import cv2,matplotlib.pyplot as plt
007- import numpy as np
008- import scipy.fft as sfft
009- import sys
010- import my_package.my_functions as mf # This is a user defined package
011- # one may find the details related to its contents and usage in section 2.7.3
012-
013- #---------------------------------------------------------------------------
014- # FUNCTION : Designing IDEAL LOW PASS Filter
015- #---------------------------------------------------------------------------
016- def Ideal_LPF(freq_points,fc):
017-     # Initialising an image with shape equal to fft_input_image with all zeros
018-     freq_domain_filter=np.zeros((freq_points,freq_points))
019-     # Creating the LPF in following loop
020-     for i in np.arange(0,freq_points,1):
021-         for j in np.arange(0,freq_points,1):
022-             if np.sqrt((i-freq_points/2)**2+(j-freq_points/2)**2)<fc*freq_points:
023-                 freq_domain_filter[i,j]=1
024-     return(freq_domain_filter)
025-
026- #---------------------------------------------------------------------------
027- # FUNCTION : Designing GAUSSIAN LOW PASS Filter
028- #---------------------------------------------------------------------------
029- def Gauss_LPF(freq_points,fc):
030-     freq_domain_filter=np.zeros((freq_points,freq_points))
031-     # Creating 2D freq. grid for Gaussian filter generation in freq. domain
032-     f_positions_norm=np.linspace(0,freq_points,freq_points)
033-     fx=f_positions_norm/np.max(f_positions_norm)-0.5
034-     fy=fx # Because we are taking circularly symmetry
035-     fxx,fyy=np.meshgrid(fx,fy) # 2 arrays of fx & fy coordinates in 2D
036-     # Sigma of Gaussian dictated by cutoff frequency fc (half power freq.)
037-     sigma_c=np.sqrt((fc**2)/(2*np.log(np.sqrt(2))))
038-     # 2D Gaussian creation
039-     freq_domain_filter=np.exp(-((fxx**2+fyy**2)/(2*(sigma_c**2))))
040-     return(freq_domain_filter)
041-
042- #---------------------------------------------------------------------------
043- # FUNCTION : Designing BUTTERWORTH LOW PASS Filter
044- #---------------------------------------------------------------------------
045- def Butter_LPF(freq_points,fc,n):
046-     freq_domain_filter=np.zeros((freq_points,freq_points))
047-     # Creating 2D freq. grid for Butterworth filter generation in freq. domain
048-     f_positions_norm=np.linspace(0,freq_points,freq_points)
049-     fx=f_positions_norm/np.max(f_positions_norm)-0.5
050-     fy=fx # Because we are taking circularly symmetry
051-     fxx,fyy=np.meshgrid(fx,fy) # 2 arrays of fx & fy coordinates in 2D
052-     # 2D Butterworth creation (fc is already -3db frequency)
053-     freq_domain_filter=np.sqrt(1/(1+((np.sqrt(fxx**2+fyy**2))/(fc))**(2*n)))
054-     return(freq_domain_filter)
055-
056- #---------------------------------------------------------------------------
057- # Importing and displaying image in space domain
058- #---------------------------------------------------------------------------
059- input_image=np.float32(cv2.imread('img18.bmp',0))
060- r,c=np.shape(input_image)
061- fig1,ax1=plt.subplots(2,3)
062- fig1.show()
063- mf.my_imshow(mf.norm_uint8(input_image),'(a) Spatial Domain',ax1[0,0])
064-
065- #---------------------------------------------------------------------------
066- # Image in normalised frequency domain (Magnitude Plot)
067- #---------------------------------------------------------------------------
068- freq_points=np.max([r,c])
069- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
070- mag_image=np.abs(sfft.fftshift(fft_input_image))
071- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Frequency Domain (Magnitude)",ax1[0,1])
072-
073- # Select the nature of filter (Low pass, Highpass, Band Stop, Band Pass) ...
074- # ... and the type (Ideal, Gaussian, Butterworth) according to ...
075- # ... the match case ladder below ...
076- filter_type=12
077-
078- fc1=.15 # First cutoff (the only cutoff for Lowpass or Highpass filter)
079- fc2=.25 # Second cutoff (for Band stop and band pass filter)
080-
081- n=10 # Order of butterworth filter (if used)
082-
083- #---------------------------------------------------------------------------
084- # Designing Required Filter
085- #---------------------------------------------------------------------------
086- match filter_type:
087- case 1:
088- Str='IDEAL Lowpass Filter'
089- freq_domain_filter=Ideal_LPF(freq_points,fc1)
090- case 2:
091- Str='Gaussian Lowpass Filter'
092- freq_domain_filter=Gauss_LPF(freq_points,fc1)
093- case 3:
094- Str='Butterworth Lowpass Filter'
095- freq_domain_filter=Butter_LPF(freq_points,fc1,n)
096- case 4:
097- Str='IDEAL Highpass Filter'
098- freq_domain_filter=1-Ideal_LPF(freq_points,fc1)
099- case 5:
100- Str='GAUSSIAN Highpass Filter'
101- freq_domain_filter=1-Gauss_LPF(freq_points,fc1)
102- case 6:
103- Str='Butterworth Highpass Filter'
104- freq_domain_filter=1-Butter_LPF(freq_points,fc1,n)
105- case 7:
106- Str='IDEAL Band Stop Filter'
107- LPF=Ideal_LPF(freq_points,fc1)
108- HPF=1-Ideal_LPF(freq_points,fc2)
109- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
110- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
111- case 8:
112- Str='GAUSSIAN Band Stop Filter'
113- LPF=Gauss_LPF(freq_points,fc1)
114- HPF=1-Gauss_LPF(freq_points,fc2)
115- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
116- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
117- case 9:
118- Str='Butterworth Band Stop Filter'
119- LPF=Butter_LPF(freq_points,fc1,n)
120- HPF=1-Butter_LPF(freq_points,fc2,n)
121- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
122- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
123- case 10:
124- Str='IDEAL Bandpass Filter'
125- LPF=Ideal_LPF(freq_points,fc1)
126- HPF=1-Ideal_LPF(freq_points,fc2)
127- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
128- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
129- freq_domain_filter=1-(HPF+LPF)
130- case 11:
131- Str='GAUSSIAN Bandpass Filter'
132- LPF=Gauss_LPF(freq_points,fc1)
133- HPF=1-Gauss_LPF(freq_points,fc2)
134- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
135- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
136- freq_domain_filter=1-(HPF+LPF)
137- case 12:
138- Str='Butterworth Bandpass Filter'
139- LPF=Butter_LPF(freq_points,fc1,n)
140- HPF=1-Butter_LPF(freq_points,fc2,n)
141- freq_domain_filter=(HPF+LPF)-np.min(HPF+LPF)
142- freq_domain_filter=(freq_domain_filter)/np.max(freq_domain_filter)
143- freq_domain_filter=1-(HPF+LPF)
144- case _:
145- print('select valid input')
146- sys.exit() # To exit the system
147-
148- #---------------------------------------------------------------------------
149- # Plotting the filter response
150- #---------------------------------------------------------------------------
151- mf.my_imshow(mf.norm_uint8(freq_domain_filter),"(c) Frequency Domain Filter",ax1[0,2])
152- ax1[0,2].plot(freq_points-c*freq_domain_filter[np.int16(freq_points/2),:])
153- ax1[0,2].axis('on')
154-
155- # Setting the x-ticks as per frequency (f) in range [-.5 to .5]
156- x_positions=np.linspace(0,freq_points-1,5);
157- x_labels=x_positions/np.max(x_positions)-0.5
158- ax1[0,2].set_xticks(x_positions, x_labels)
159-
160- # Setting the y-ticks as per frequency (f) in range [-.5 to .5]
161- y_positions=np.linspace(0,freq_points-1,5);
162- y_labels=y_positions/np.max(y_positions)-0.5
163- ax1[0,2].set_yticks(y_positions, y_labels)
164-
165- ax1[0,2].set_xlabel('fx & f axis ->')
166- ax1[0,2].set_ylabel('fy & amp [0 to 1] axis ->')
167-
168- fig1.suptitle(Str)
169-
170- #---------------------------------------------------------------------------
171- # Filtering in frequency domain
172- #---------------------------------------------------------------------------
173- freq_filtered_image=sfft.fftshift(fft_input_image)*freq_domain_filter
174- mf.my_imshow(mf.norm_uint8(np.log(1+np.abs(freq_filtered_image))),"(d) Frequency Domain Filtering",ax1[1,0])
175-
176- #---------------------------------------------------------------------------
177- # Image Transformed back in time domain
178- #---------------------------------------------------------------------------
179- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
180- mf.my_imshow(mf.norm_uint8(image_back_in_time),'(e) PADDED Spatial Domain',ax1[1,1])
181-
182- #---------------------------------------------------------------------------
183- # Image Transformed back in time domain (displayed without padding)
184- #---------------------------------------------------------------------------
185- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
186- mf.my_imshow(mf.norm_uint8(image_back_in_time[0:r,0:c]),'(f) Spatial Domain',ax1[1,2])
187-
188- plt.show()
189- print("Completed Successfully ...")
Code 5.11: General code for various filters
From line number 86 to 146, a match-case ladder is shown. It selects one of
the possible cases according to the matched variable, which in our case is
filter_type, and creates the chosen filter. The string Str set in each case tells
us which filter will be created. At the beginning of the program, there are
three user defined functions for the creation of the low pass ideal filter, the
low pass Gaussian filter, and the low pass Butterworth filter. Soon, we shall
see that high pass, band pass, and band stop filters can be created from these
lowpass filter prototypes. In line number 78, we select the cutoff frequency for
low pass and high pass filters. For band pass and band stop filters, there are
two frequencies corresponding to the band edges; the second frequency is set
in line number 79 (we will discuss this in detail soon). Additionally, for
Butterworth filters, we have a filter order parameter, which we set in line
number 81. For example, if we want to create a Butterworth lowpass filter of
order 5 with a cutoff frequency of 0.1 cycles/sample, we may set fc1 in line
number 78 to 0.1, n in line number 81 to 5, and filter_type in line number 76
to 3, to get output like that shown in Figure 5.27.
Before concluding the discussion on lowpass filters, it is important to note
the phase response, which we have not addressed yet. Linear phase is
essential in image processing applications, and we will explain why later in
this book. However, all the filters that we have constructed above and the
ones that we are going to construct in the next few sections have linear
phases. In fact, one result from signal processing states that symmetric finite
impulse response filters have linear phase, which means that if our filters are
so designed that their existence in the space/time domain is bounded by
some lower and upper limit on space/time (i.e., finite number of samples in
our case) and if their magnitude response possesses symmetry about the
origin, then the phase response of the filter is guaranteed to be linear. This is
the case with all the filters that we have designed so far and those that we are
going to design in the coming sections. The result of applying the code is shown in the following figure:
Figure 5.28: Result of Code 5.11
Also, keep in mind that ideal, Gaussian, and Butterworth filters are not the
only filters available – there are many more, but they will give similar
results. The crux is that if we have understood how to deal with these,
exploring new filters by ourselves will be trivial.

5.5.6 High pass filtering of images in frequency domain


One of the easiest ways of creating a high pass filter is to subtract the low
pass filter response from unity (if the low pass filter has a magnitude
spectrum bounded by 0 and 1). This is what we will do in this section. Also,
note that from the discussion on low pass filters, we know that Butterworth
filters can be tuned to ideal magnitude response by increasing the order and
can be made to have a wide transition band like the Gaussian filter if we
decrease the order. So, in this section, we will present the results for
Butterworth filters only. However, Code 5.11 is designed to accommodate all
kinds as discussed earlier.
In Figure 5.29, the response of a Butterworth high pass filter with order 3 and cutoff frequency 0.2 cycles/sample is shown. It is not surprising to note that the output contains the edge information (i.e., the high-frequency content of the image).
The response is generated from Code 5.11 by setting filter_type in line
number 76 to 6, which selects a Butterworth high pass filter from the match-
case ladder, and fc1 to 0.2 and n to 3 in line numbers 78 and 81 respectively.
Also, note that since it is a Butterworth filter of order 3, the problem of
fringing is not visible. One can try the same thing for an ideal high pass filter
by setting the filter_type to 4 and notice that there will be fringing. From
line number 102 to 104 in Code 5.11, one can note that from a low pass filter
prototype, a high pass filter is created simply by subtracting it from 1:
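The same idea can be seen in isolation in the following minimal sketch. The helper below mirrors the Butterworth low pass prototype (Butter_LPF) used throughout this chapter's codes; the grid size 256, cutoff 0.2, and order 3 are only illustrative values:

import numpy as np

def butter_lpf(freq_points, fc, n):
    # Butterworth low pass prototype on a normalized [-0.5, 0.5] frequency grid
    f = np.linspace(-0.5, 0.5, freq_points)
    fxx, fyy = np.meshgrid(f, f)
    return np.sqrt(1.0 / (1.0 + (np.sqrt(fxx**2 + fyy**2) / fc)**(2 * n)))

lpf = butter_lpf(256, 0.2, 3)   # low pass prototype, magnitude values in [0, 1]
hpf = 1 - lpf                   # high pass filter with the same cutoff frequency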

Figure 5.29: Response of Butterworth high pass filter with cutoff frequency = 0.2 cycles/sample and order 3
In Section 4.9.1, unsharp masking and high boost filtering were discussed.
The same strategy can be applied here by using the results developed in this
section instead of obtaining the high-frequency filtered content from the
process of spatial filtering by a suitable kernel.

5.5.7 Band stop filtering and notch filters


A band stop filter stops frequencies in a given range [fc1, fc2] (note that fc1 < fc2). From the point of view of its construction, it can be made by summing a low pass filter with cutoff frequency fc1 and a high pass filter with cutoff frequency fc2, as shown in Figure 5.30.
We know from the previous section that a high pass filter can be constructed from a low pass filter. This means that a band stop filter can be easily constructed from two different low pass filters with cutoff frequencies fc1 and fc2, as sketched below.
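Assuming the butter_lpf helper from the sketch in the previous section, a band stop filter with band edges fc1 and fc2 can be assembled as follows; the numeric values mirror the ones used for Figure 5.31 and are only illustrative:

lpf = butter_lpf(256, 0.15, 10)          # passes frequencies below fc1 = 0.15
hpf = 1 - butter_lpf(256, 0.25, 10)      # passes frequencies above fc2 = 0.25
band_stop = lpf + hpf                    # stops the band [fc1, fc2]
band_stop = band_stop - band_stop.min()  # normalize the magnitude to [0, 1]
band_stop = band_stop / band_stop.max()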
Having understood the construction and purpose of band stop filters, let us see them at work in Figure 5.31. In the input image in part (a) [courtesy - https://commons.wikimedia.org/wiki/File:Rosa_Gold_Glow_2_with_interference.jpg under license - https://en.wikipedia.org/wiki/GNU_Free_Documentation_License ], one may note that periodic noise is present. Periodic noise appears in images due to the characteristic noise generated by capturing devices. One of its characteristics is that it is very well localized in the frequency domain.

Figure 5.30: Construction of a band stop filter from the sum of low pass and high pass filters
The same may be noted in part (b) of the figure. Apart from the central bright spot (which corresponds to the average brightness level of the entire image), there are two dominant spikes in the magnitude plot. In part (c) of the figure, a band stop filter is constructed to remove these spikes, with band edges [fc1 = 0.15, fc2 = 0.25] and filter order n = 10. One may note that, because of this filtering, the output is practically free from the periodic noise initially present in the original image.
Figure 5.31: Band stop filtering for an input image with periodic noise
It is important to note that if the difference fc2 - fc1 is very small, the band stop filter is given a special name: notch filter. It is used when the portion of the frequency domain to be removed is localized in a very narrow region. The output in Figure 5.31 is generated by setting filter_type=9 in Code 5.11, with the parameters as shown in the figure. Note that in the match-case ladder, for filter_type=9, after creating the band stop filter, normalization is also done to bring the magnitude values into the range 0 to 1.

5.5.8 Band pass filtering of images


A band pass filter passes frequencies in a given range [fc1, fc2] (note that fc1 < fc2). From the point of view of its construction, it can be made by subtracting a properly tuned and normalized band stop filter from 1; as easy as that. Let us see a band pass filter at work in Figure 5.32. The filter created in this figure has the same parameters as the band stop filter of the previous section, except filter_type=12. For the same input image corrupted by periodic noise, the output in this case will contain only the periodic noise. This is expected: a band stop filter eliminates the range of frequencies where the noise is present, and since a band pass filter does just the opposite, we get only noise in the output. One more point to note is that the extracted band contains the two noise impulses (as seen in the frequency domain) but also a small portion of the image content, which therefore also appears in the output image.
Also, note that in Code 5.11, in match case 12, normalization is used so that the band pass filter values lie in the range 0 to 1, ensuring there is no amplification/de-amplification of the filtered image. In this case as well, if the difference fc2 - fc1 is very small, we get a notch-pass filter.

Figure 5.32: Butterworth band pass filtering on an image corrupted with periodic noise
At this stage, it is recommended to play with Code 5.11 and try all the possible combinations. In fact, while designing a band stop or band pass filter, one may use a Butterworth lowpass filter together with a high pass filter constructed from a Gaussian or ideal low pass prototype (a combination not present in Code 5.11). Any combination will do; the selection of a specific filter type depends on the usage. One such mixed design is sketched below.
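The sketch combines a Butterworth low pass prototype with a high pass filter derived from a Gaussian low pass prototype. The gaussian_lpf helper here is our own construction (the Gauss_LPF function referenced in Code 5.11 may be parametrized differently), and all numeric values are illustrative:

def gaussian_lpf(freq_points, fc):
    # Gaussian low pass prototype on a normalized [-0.5, 0.5] frequency grid
    f = np.linspace(-0.5, 0.5, freq_points)
    fxx, fyy = np.meshgrid(f, f)
    return np.exp(-(fxx**2 + fyy**2) / (2 * fc**2))

band_stop_mixed = butter_lpf(256, 0.15, 10) + (1 - gaussian_lpf(256, 0.25))
band_stop_mixed = band_stop_mixed - band_stop_mixed.min()   # normalize to [0, 1]
band_stop_mixed = band_stop_mixed / band_stop_mixed.max()
band_pass_mixed = 1 - band_stop_mixed                       # corresponding band pass filter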

Conclusion
The frequency domain presents alternate insights into the signal or system’s structure. In the spatial domain, one has direct control over the actual pixel values. In the frequency domain, one can control (filter in or out) structures such as sharp regions, smooth regions, etc. In this chapter, methods to transform a signal from one domain to the other were presented, together with an illustration of filtering operations equivalent to the spatial domain kernel-based methods introduced in the previous chapter. In the next chapter, we will study non-linear filtering and concerns regarding phase during filtering.

Points to remember
• The Fourier family of transforms has many members like CTFS, CTFT, DTFS, DTFT, DFT, etc. One needs to understand which to use when, according to the input signal type.
• Analog frequency is measured in cycles/second, but discrete frequency is measured in cycles/sample.
• Low frequencies mean low variations in the signal. In an image, a low frequency region may correspond to a clear sky or the wall of a building painted in a single color.
• High frequency regions contain sharp edges, corner points, and a lot more structure.
• At a given sampling frequency, one analog frequency has infinitely many equivalent analog frequencies, called aliases, which all map to the same discrete frequency.
• When an image is brought into the frequency domain, one must take an
equal number of samples on the X and Y axis in the frequency domain
so that one frequency lies completely on a circle instead of an ellipse.
• The order of the Butterworth filter decides its closeness to the ideal filter
shape. The higher the order, the closer it is to the ideal shape.
• A lowpass filter prototype can be easily converted to a high pass, band
pass, or band stop filter by using simple equations.

Exercises
1. Take an indoor image (like an image of a study room, etc.). Apply a
high pass Butterworth filter to it and comment on the results so
obtained.
2. On the same image taken in question 1, try changing the filter order for
the Butterworth filter and compare the results obtained.
3. Sharpen an image by using a high pass filter created in the frequency
domain.
4. Explain the context of median filtering as a non-linear filter and discuss
its applications.
5. Take a picture in the dark and perform contrast and illumination
enhancement on it.

Join our book’s Discord space


Join the book's Discord Workspace for Latest updates, Offers, Tech
happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com

CHAPTER 6
Non-linear Image Processing and
the Issue of Phase

6.1 Introduction
Convolution, as elaborated in the previous chapters, is quite useful for accomplishing linear filtering using space/time domain kernels (for filtering in the space/time domain) and frequency domain filters (for frequency domain filtering). It is a linear operation. In this chapter, we will see that filtering can be non-linear too, and we will explore such non-linear filtering. We will also touch upon the issue of converting data from the continuous to the discrete/digital world and its side effect, called aliasing. We could have done this in one of the previous chapters, but the context that we have built by now will help us strengthen our understanding.

Structure
This chapter discusses the following topics:
• Median filtering and salt and pepper noise removal
• Sampling theorem
• Homomorphic filtering
• Phase and images
• Selective filtering of images
Objectives
The primary objective of this chapter is to understand the non-linear
methods for image processing. These operations do not use the convolution
operation, as convolution is linear in nature. The reader will also understand
the sampling process used to convert the continuous scene to an image in
digital form. You will also understand the contribution of phase in image
processing as, until now, only the magnitude part of the spectrum has been used for filter design (in the linear case).

6.2 Median filtering and salt and pepper noise removal


Noise is a topic we will discuss in subsequent chapters, but to explain
median filtering (a type of nonlinear filtering in the spatial domain), we will
take an example of a noisy image, as shown in Figure 6.1 (a). It is the image
of tree branches (barely visible due to noise) and the sky.

Figure 6.1: Gaussian filtering vs. median filtering for salt and pepper noise removal
One may note that the kind of noise visible in Figure 6.1 (a) is like spilling
ground salt and pepper on the image. Noisy pixels are either perfectly white
(salt) or black (pepper). A natural way to get rid of this kind of noise is to
smoothen the image. That is what we do in part (b) of Figure 6.1. We
filtered the image using a Gaussian kernel in the spatial domain. The noise is
not removed but gets worse (the image has become jittery). Also, blurring
due to the weighted averaging nature of the Gaussian filter makes us
unhappy. We want the noise to be removed and do not want blurring to be
there. Ideally, we are expecting the results shown in Figure 6.1 (c); that is,
salt and pepper noise is removed, and there is no blurring. Tree branches and
the sky are now visible. That is what median filtering does as seen in the
following figure:

Figure 6.2: Illustration of noise removal by Gaussian filtering and median filtering
In order to understand how median filtering works, let us first take a one-
dimensional equivalent example, as shown in Figure 6.2. Let us talk about
the noise-corrupted signal in part (a) of the figure first. The signal shape is
such that its first half is one full period of sine wave followed by the second
part, which is a bipolar pulse (+1 and -1 values). The first half of the signal
(one wave of sinusoid) is corrupted by salt and pepper noise. The salt
component has a value of +3, and pepper has a value of -3. The signal shape
is chosen on purpose. From the first half of the signal, we will learn how
noise removal is done, and from the second half of the signal, we will learn
the smoothening caused by the filter/kernel (which is undesirable).
When we smoothen a signal by using a kernel in the spatial domain, we do so through the process of convolution. Recall from Section 4.5.1 that for the pixel under processing, the processed value (the result of convolution for that sample) is essentially the weighted average of its neighborhood. We also know that averages are highly affected by extreme values among the samples being averaged. This justifies the shape of the first half of the signal in Figure 6.2 (b), where Gaussian filtering is done: since averages are affected by extreme values, the first half of the signal retains a jittery (still noisy) shape. The second part of the signal did not possess noise to begin with. Due to the weighted averaging caused by the Gaussian filter, notice that the edges of the bipolar pulse in the second half of the signal are smoothened out, which is an undesirable effect.
In Figure 6.2 (c), we do median filtering. Instead of taking the weighted
averages around the neighborhood of the pixel under processing (linear
operation), we take the median (nonlinear operation). Median, as we know,
is the middle value in the set of ordered data. So, extreme values will not
affect its calculation. This is why, in Part (c) of the figure, the signal is
noise-free with no jittery behavior. For this reason, no additional
smoothening is introduced in the latter part of the signal; this is what we
desire.
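A tiny numeric check makes the difference between the two operations concrete (the window values below are made up for illustration):

import numpy as np

window = np.array([12, 10, 255, 11, 13], dtype=np.float32)  # one "salt" pixel (255)
print(np.mean(window))    # 60.2 -> the average is dragged far away by the outlier
print(np.median(window))  # 12.0 -> the median simply ignores the extreme value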
The one-dimensional understanding that we have developed can be directly
generalized to two-dimensional signals (i.e., images) as shown in Figure 6.1.
The results can be easily interpreted now. The results in Figure 6.2 are
generated using Code 6.1. The code is straightforward to understand. Also,
note the inbuilt way of applying the Gaussian filter in the spatial domain in
Line 24:
01- #======================================================================
02- # PURPOSE : Learning Denoising by MEDIAN Filter vs Gaussian Filter
03- # (Salt and Pepper Noise)
04- #======================================================================
05- import cv2
06- import matplotlib.pyplot as plt
07- import numpy as np
08- import scipy.ndimage as sci
09- import my_package.my_functions as mf # This is a user defined package
10- # one may find the details related to its contents and usage in section 2.7.3
11-
12- #--------------------------------------------------------------------------
13- # Importing and displaying the image
14- #--------------------------------------------------------------------------
15- a=cv2.imread('img21.bmp',0)
16- fig,ax=plt.subplots(1,3)
17- fig.show()
18- mf.my_imshow(a,'(a) Input Grayscale Image',ax[0])
19- a=np.float32(a)
20-
21- #--------------------------------------------------------------------------
22- # Gaussian filtering (Linear) the image in space domain and displaying
23- #--------------------------------------------------------------------------
24- filtered_image=sci.gaussian_filter(a,1) # Second argument is sigma of Gaussian
25- mf.my_imshow(mf.norm_uint8(filtered_image),"(b) Gaussian Filtered Image",ax[1])
26-
27- #--------------------------------------------------------------------------
28- # Median filtering (Non-Linear) the image in space domain and displaying
29- #--------------------------------------------------------------------------
30- filtered_image2=sci.median_filter(a,3) # Second argument is neighborhood size
31- mf.my_imshow(mf.norm_uint8(filtered_image2),"(c) Median Filtered Image",ax[2])
32-
33- plt.show()
34- print("Completed Successfully ...")
Code 6.1: Salt and Pepper noise removal by Gaussian versus median filtering
What is the reason for the presence of salt and pepper noise? How can it be simulated in experiments? These questions will be answered in subsequent chapters of this book.

6.3 Sampling theorem


Continuous and discrete worlds behave in different ways. The way one element looks in the continuous world might be poles apart from how it looks in the discrete world. Does it mean that the great pyramids of Egypt in the continuous world could appear as the Taj Mahal of India in the discrete world? The answer is, maybe yes! Here is the point: the key connection between the continuous world and the discrete world is the sampling frequency. If it is not chosen correctly, the result could be like turning the pyramids into the Taj Mahal. Let us explore this in the next sub-section. Note that this section, that is, Section 6.3 and its sub-sections, is not specific to non-linear processing; it applies in general to any continuous-to-discrete conversion.

6.3.1 Aliasing
To understand how the pyramids could be seen as the Taj Mahal or vice versa, refer to Figure 6.3. There are two continuous time signals in it. Both are sine signals, one with frequency F1 and the other with frequency F2 (in Hz). To bring these signals inside computers, or in general digital signal processors, they must be digitized (time discretized and amplitude discretized). Here, we will focus on time discretization. Let us take samples of both signals with the same sampling frequency Fs = 1 Hz. This means that one sample will be taken every second for both signals. The sampled signals for the corresponding continuous-time signals are also shown in the figure. One may note that the samples of both signals coincide! This is where our problem begins. In the continuous time world, there are two different signals (i.e., with two different frequencies), but in the discrete time world, they become the same. That is where the pyramids become the Taj Mahal or vice versa.
So, before looking for a solution to our problem, let us first try to find out the frequency of the sampled signal formed in both cases. Recall that for the continuous time world, we use capital letters, and for the discrete time world, we use small letters to represent frequency.
Continuous world and discrete world frequencies F and f are related through the sampling frequency Fs as per the following equation:
Equation 6.1:
f = F/Fs
So, the discrete frequency of the sampled signal 1 is f1 = F1/Fs cycles/sample. Similarly, for the second discrete time signal (which comes from the second continuous time signal after sampling), the discrete frequency is f2 = F2/Fs cycles/sample. Is f1 = f2? Evidently not as numbers, because F1 ≠ F2. We now need to understand how the sampled signal can nonetheless be the same in both cases.
Refer to the following figure for a better understanding:

Figure 6.3: Alias signals in discrete time


Recall from Section 5.3.1 (Figure 5.6) that in the discrete-time domain, a frequency of f is the same as f ± integer. Keeping this in mind and knowing that f2 = f1 + 1 (since F2 = F1 + Fs with Fs = 1 Hz), the two signals from the continuous-time world will appear as the same signal in the discrete-time world. We call F2 an alias of F1 at the given sampling frequency of Fs = 1 Hz, or vice versa. In other words, F1 and F2 are aliases of each other at the given sampling frequency Fs. Both correspond to the same effective discrete frequency f in the discrete time world.
Having understood what aliases are, let us now understand the aliasing
phenomenon. Refer to Figure 6.4. The left column subplots that are, Parts
(a), (c), and (e) have continuous time signals with frequencies
Hetrz and signal in Part (e) has two frequencies F1
and F2. Their corresponding discrete time versions are presented in the right
column of the figure as Parts (b), (d), and (f). All the continuous time signals
shown are sampled at the sampling frequency Fs=2 Hetrz. So, the discrete
time frequencies in parts (b), (d), and (f) are f1= cycles/sample,
cycles/sample (so, Hetrz is an alias of, hetrz at
Fs=2 hetrz) and signal in Part (f) has both f1 and f2 combined. Now, here is
the interesting thing to note: F2>F1 but f2<f1. If we notice part (e) which is a
continuous-time signal comprised of frequencies F1 and F2 with F2> F1, the
corresponding discrete time version in part (f) is a signal with frequencies f1
and f2 with f2< f1. So, we may say that the higher frequency in a continuous
world has been mapped to a lower frequency in a discrete time world at the
given sampling frequency. This is what we call aliasing and because of
aliasing, the shape of the signals is significantly different in continuous time
and discrete time domain as shown in the following figure:

Figure 6.4: Aliasing at a sampling frequency of 2 hertz


Having understood aliasing in one-dimension, let us see how it appears in
two dimensions. Refer to Figure 6.5 to understand the image that we are
going to use to illustrate aliasing in two dimensions. Part (a) of this figure
illustrates the highest frequency image that is possible to display. It has a
chessboard pattern, and every square is one pixel thick. That is the highest
frequency that can be depicted along horizontal and vertical directions. Since
the chessboard pattern is only a pixel thick, in part (b) of the figure we have zoomed into a portion of part (a), shown by the region enclosed by dots in part (a). This figure is generated by using Code 3.14. The input image is
custom generated. In part (b) of the figure, one may note the chessboard
pattern.
Using the code for projective transformation and this input figure, it is not
hard to demonstrate the aliasing phenomenon. This is illustrated in Figure 6.6. Part (a) shows the same high-frequency image. It also
shows the 4 dots marked for applying the projective transformation as per
Code 3.14 (recall that the order of points clicked, while the code is run,
should be anti-clockwise starting from the top left point of the quadrilateral).
In Part (b), the image is projective transformed. For the most part, it is as
expected, but at the bottom part, it has some patterns that cannot be
explained. This is what two-dimensional aliasing is. These patterns are low-
frequency patterns. Next, let us see how these patterns are generated.
Let us first note the line between the first two dots in the images (the top two
points). Due to projective transformation, this line is stretched (as can be
seen in Part (b) of the figure). Earlier, a few pixels represented the line, and
now, in part (b), more pixels represent the same line. Refer to the following
figure to understand aliasing in two dimensions:
Figure 6.5: Test image for understanding aliasing in two dimensions. (a) High frequency image (b)
Zoomed version of high frequency image obtained by using projective transformation
Refer to the following figure for a better understanding:

Figure 6.6: Two-dimensional aliasing


Now, let us talk about the bottom two points in both images. In Part (a), the connecting line between the two points had enough pixels to represent the high-frequency pattern. Any reduction in the number of pixels leads to the pattern not being captured perfectly by the image; that is what aliasing is. Due to the projective transformation, the corresponding points in part (b) have fewer pixels available to capture that high-frequency pattern. Equivalently, the sampling frequency is not high enough. This leads to the generation of patterns (the corresponding low-frequency aliases) that are not present in the original image.
Note: Aliasing can be present in any input image in general because when we capture images
from the real world, there are infinite frequencies present in the real scene. Digital images can
never capture infinite frequencies. So, every digital image is aliased. However, there are
techniques to reduce aliasing that will be discussed in a subsequent part of the book and the
next subsection addresses this problem by applying constraints on sampling frequency.

6.3.2 Sampling theorem in 1D


Let us try to understand the sampling theorem from a one-dimensional
perspective. Generalizations to higher dimensions are straightforward.
We are currently concerned with a band limited signal. A band limited signal
means that the frequency content of that signal is limited to a finite band of
frequencies, that is, there is a lower and higher cutoff frequency beyond
which frequencies are absent in the original continuous time/space scene.
Without losing generality, let us consider that the band of frequencies in
which the signal exists is [-Fm,Fm] in the continuous time domain. Fm Here
stands for the maximum frequency contained in the signal. Now, beyond this
range, there are no frequencies in the signal so there is no question of
aliasing from the outside band frequencies. Our objective here is to remove
any possibility of aliasing amongst the frequency content in the range [-
Fm,Fm ] when the signal is discretized in time/space at the sampling
frequency Fs.
Hence, our problem statement is: the frequency -Fm in the interval [-Fm, Fm] should not be an alias of Fm. If that is guaranteed, there is no chance that other frequencies in the given range will become aliases of each other. The discrete range after continuous to discrete conversion will be [-fm, fm] with fm = Fm/Fs. In the discrete world, fm is equivalent to fm ± integer. Equivalently, in the continuous world, at a given sampling rate of Fs, Fm is equivalent to Fm ± integer × Fs. Let us keep the value of the integer as 1, because all the other combinations will be at still higher frequencies and will be equivalent. Out of these, the frequency -Fm in the continuous world can only be aliased by Fm - Fs (and not Fm + Fs, as this will go to the higher frequency range). So, we want Fm - Fs < -Fm, that is, Fs > 2Fm. Let us put it formally:
Sampling theorem: A continuous time/space signal can be represented by
its samples and can be recovered back when the sampling frequency is
greater than twice the highest frequency component of that continuous
time/space signal.
By ensuring the above, there will be a one-to-one relation between
continuous time/space frequency and the discrete time/space frequency at
the given sampling rate. Although we are not talking about the back
conversion of signal in a discrete to continuous world, the sampling theorem
guarantees exact reconstruction too.
In nature, there are infinite frequencies, but humans can only see a small
range of frequencies called the visible range. This forms the basis of
considering image signals as band limited, and capturing devices (cameras) are made accordingly.
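A small numerical experiment (not part of the book's codes) shows the theorem being violated. A 7 Hz cosine sampled at Fs = 10 Hz, which is below its Nyquist rate of 14 Hz, produces exactly the same samples as a 3 Hz cosine; a cosine is used here so that the sign flip of the aliased frequency does not matter:

import numpy as np

Fs = 10.0                               # sampling frequency in Hz
n = np.arange(16)                       # sample indices
x1 = np.cos(2 * np.pi * 7.0 * n / Fs)   # 7 Hz cosine, sampled below its Nyquist rate
x2 = np.cos(2 * np.pi * 3.0 * n / Fs)   # 3 Hz cosine: its alias at Fs = 10 Hz
print(np.allclose(x1, x2))              # True -> indistinguishable after sampling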

6.4 Homomorphic filtering


Like median filtering, homomorphic filtering is yet another technique for
non-linear filtering of images. Median filtering was done in the space
domain, but homomorphic filtering will be done in the frequency domain.
The objective of homomorphic filtering is to improve the illumination and
enhance the contrast in images at the same time. Often, the images we take are non-uniformly illuminated. They may also have poor contrast. This mode of filtering tries to address both these issues by first applying a non-linear transformation to the image (log transformation) and then transforming the image into the frequency domain, where linear filtering techniques can be applied to process the image. Do not worry if this sounds like too much. We will discuss it in great detail. Let us build some
prerequisites first in the next sub-section.

6.4.1 Illumination reflectance model of image formation


An image m(x,y) is made up of two multiplicative components: Illumination
i(x,y) and Reflectance r(x,y). Refer to the following equation for clarity:
Equation 6.2:
m(x,y)=i(x,y)r(x,y)
Once the image is captured, there is no way to separate the illumination and reflectance components from each other, but we would like to do so. We may sometimes want to strengthen the illumination, in which case we require the i(x,y) component, or we may require r(x,y) to strengthen the overall contrast. Usually, in images, illumination is a slowly varying phenomenon, and hence that component occupies the low-frequency range. Reflectance is the property of how a body reflects light; it might change drastically from object to object (that is, at edges), and hence it is concentrated towards the high-frequency region. Reflectance is thus responsible for contrast. Can we simply apply a lowpass filter to separate illumination and a high-pass filter to separate out reflectance? No, we cannot, as i(x,y) and r(x,y) are not added in Equation 6.2; instead, they are multiplied. So, a natural solution is to take a logarithm on both sides, as shown in the following equation:
Equation 6.3:

log(m(x,y))=log(i(x,y))+log(r(x,y))

The above equation will enable us to separate the illumination and


reflectance components in a frequency domain, but it also introduces non-
linearity. It is not a bad thing to have as long as we know how to deal with it.
Let us now take the Fourier transformation of the image, as shown below:
Equation 6.4:
F{log(m(x,y))}=F{log(i(x,y))}+F{log(r(x,y))}
Remember that the Fourier transform is a linear operator, and hence, in the above equation, it is possible to distribute it over the right-hand side. It is not possible to do so over the RHS of Equation 6.2 because, for the Fourier operator, distributivity holds over addition but not over multiplication. Equation 6.4 can be rewritten as the following equation:
Equation 6.5:
M(u,v)=I(u,v)+R(u,v)
With M(u,v)=F{log(m(x,y))}, I(u,v)=F{log(i(x,y))} and
R(u,v)=F{log(r(x,y))}. u,v are frequency domain indices.
Also, note that the Equation 6.5 is a remarkable achievement. It is so
because it lets us separate illumination and reflectance components
(logarithms of those) just by knowing the frequency that separates these two
in the frequency domain. We will use this to our advantage in the next two
sub-sections for illumination and contrast enhancement.

6.4.2 Improving illumination in images


To enhance the illumination component in images so that uniform
illumination can be achieved in the entire image, Equation 6.5 opens a way.
All that is needed is to apply a lowpass filter H(u,v) to M(u,v), and the
illumination will be automatically enhanced. This is shown in Figure 6.7 (b)
for an input image in part (a). Notice that the image in part (b) has enhanced
illumination (uniform) but due to the lowpass nature of the filter, there is
some blurring as well. Although the illumination is enhanced, the contrast
becomes poor due to smoothening:

Figure 6.7: Homomorphic filtering


The code for generating Figure 6.7 is shown in Code 6.2. It is used both for
illumination and contrast enhancement in the next section. Part (c) of Figure
6.7 shows contrast enhancement (which will be elaborated on in the next
subsection). So, let us understand the code to gain more insights about both
the processes:
01- #======================================================================
02- # PURPOSE : Learning Homomorphic Filtering
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #---------------------------------------------------------------------------
11- # FUNCTION : Designing BUTTERWORTH LOW PASS Filter
12- #---------------------------------------------------------------------------
13- def Butter_LPF(freq_points,fc,n):
14- freq_domain_filter=np.zeros((freq_points,freq_points))
15- # Creating 2D freq. grid for Butterworth filter generation in freq. domain
16- f_positions_norm=np.linspace(0,freq_points,freq_points)
17- fx=f_positions_norm/np.max(f_positions_norm)-0.5
18- fy=fx # Because we are taking circularly symmetry
19- fxx,fyy=np.meshgrid(fx,fy) # 2 arrays of fx & fy coordinates in 2D
20- # 2D Butterworth creation (fc is already -3db frequency)
21- freq_domain_filter=np.sqrt(1/(1+((np.sqrt(fxx**2+fyy**2))/(fc))**(2*n)))
22- return(freq_domain_filter)
23-
24- #---------------------------------------------------------------------------
25- # Importing and displaying image in space domain
26- #---------------------------------------------------------------------------
27- input_image=np.float32(cv2.imread('img5.bmp',0))
28- r,c=np.shape(input_image)
29- fig1,ax1=plt.subplots(3,1)
30- fig1.show()
31- mf.my_imshow(mf.norm_uint8(input_image),'(a) Input Image',ax1[0])
32- input_image=np.log(1+input_image)
33- # Although we need the logarithm of the input image but instead, in the above line,
34- # we use (1+input_image) because log(0) is -Inf and since uint8 images have range
35- # from 0-255, we change it to 1-256 for intermediate computations, later
36- # mf.norm_uint8 function will bring them to 0-255 range after processing
37-
38- #---------------------------------------------------------------------------
39- # Bring the log image into frequency domain
40- #---------------------------------------------------------------------------
41- freq_points=np.max([r,c])
42- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
43-
44- #---------------------------------------------------------------------------
45- # Designing LPF for illumination enhancement & Filtering
46- #---------------------------------------------------------------------------
47- fc1=.05 # Cutoff Frequency of LPF
48- n1=10 # Order of Butterworth LPF
49- freq_domain_filter_LPF=.01*Butter_LPF(freq_points,fc1,n1)
50- # Butterworth filter gain (.01) in above line is explained in text
51- freq_filtered_image_LPF=sfft.fftshift(fft_input_image)*freq_domain_filter_LPF
52-
53- #---------------------------------------------------------------------------
54- # Designing HPF for contrast enhancement & Filtering
55- #---------------------------------------------------------------------------
56- fc2=.0000001 # Cutoff Frequency of HPF
57- n2=10 # Order of Butterworth HPF
58- freq_domain_filter_HPF=1-Butter_LPF(freq_points,fc2,n2)
59- freq_filtered_image_HPF=sfft.fftshift(fft_input_image)*freq_domain_filter_HPF
60-
61- #---------------------------------------------------------------------------
62- # Image Transformed back in time domain (displayed without zero padding)
63- #---------------------------------------------------------------------------
64- image_back_in_time_LPF=np.exp(sfft.ifft2(sfft.ifftshift(freq_filtered_image_LPF)))
65- # exponential function in above line is used to remove the effect of logarithm
66- mf.my_imshow(mf.norm_uint8(image_back_in_time_LPF[0:r,0:c]),'(b) Illumination enhanced',ax1[1])
67-
68- image_back_in_time_HPF=np.exp(sfft.ifft2(sfft.ifftshift(freq_filtered_image_HPF)))
69- # exponential function in above line is used to remove the effect of logarithm
70- mf.my_imshow(mf.norm_uint8(np.log(1+image_back_in_time_HPF[0:r,0:c])),'(c) Contrast enhanced',ax1[2])
71-
72- plt.show()
73- print("Completed Successfully ...")
Code 6.2: Homomorphic filtering code
In lines 13 to 22 of Code 6.2, a Butterworth lowpass filter function is defined for later usage. We have defined only the Butterworth filter because a high-order Butterworth filter is close in response to an ideal filter but, due to its smoothness, does not suffer from the fringing problem as much as an ideal filter does.
In Line 32, we have brought the input image into the logarithm domain (a non-linear domain), as per our discussion of Equation 6.3. Note that we have added 1 to the image because the range of intensity values is 0-255 but log(0) is not defined. So, to avoid this problem, we have changed the range from 0-255 to 1-256. This is not a problem, as we can later undo it by using our norm_uint8 function. We take the Fourier transform of the log-transformed image in Line 42 as per Equation 6.4 or, alternatively, Equation 6.5.
From Lines 47 to 51, we have designed a Butterworth lowpass filter and
filtered our image in the frequency domain. Note that this is linear filtering,
but it is done in a non-linear domain as the image has already been log-
transformed. Now, one may note that the Butterworth lowpass filter, as designed in line 49, has a gain of 0.01 and not 1. To understand the reason, have a look at the following results:
If a = [1, 10, 100, 1000],
then exp(1×log(a)) = [1, 10, 100, 1000], Gain = 1
and exp(10×log(a)) = [1, 10^10, 10^20, 10^30], Gain = 10
but exp(0.01×log(a)) ≈ [1, 1.02, 1.05, 1.07], Gain = 0.01
In the above results, we have taken a range of numbers from 1 to 1000. In the next three lines, we take the log of these numbers, multiply them by a gain, and then undo the log by taking the exponential. Keeping these results in mind, and noting that we want the magnitude response of the input image’s FFT to become nearly uniform in the lowpass band after the image is filtered, a gain of 0.01 or any lower number is a good idea. That is why, in line 49 of the code, we multiply the created lowpass Butterworth filter by a gain of 0.01.
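One can quickly verify the flattening effect of a small gain with a couple of lines of NumPy (a standalone check, independent of Code 6.2):

import numpy as np

a = np.array([1.0, 10.0, 100.0, 1000.0])
print(np.exp(1.0 * np.log(a)))     # [   1.   10.  100. 1000.] -> unchanged
print(np.exp(0.01 * np.log(a)))    # [1.    1.023 1.047 1.072] -> almost flat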
Then in Lines 64 to 66, the frequency domain filtered image is brought back
to the time domain, and the effect of the logarithm is reversed by taking
exponential (line 64). Due to this, we get the result as shown in Figure 6.7
(b). This is one way of doing homomorphic filtering for illumination
enhancement but as discussed earlier, though illumination enhancement is
done, due to the lowpass filter, blurring appears in the image that reduces the
contrast/details of the image. The lower the cutoff frequency, the higher the
blurring. We do not know what cutoff to select, hence it is found by the
visual quality of the output. To circumvent this problem, we will do contrast
enhancement in the next sub-section and will observe that illumination
enhancement will be achieved as a byproduct. Also, the remaining lines of
code will be relevant to the next sub-section.

6.4.3 Improving the contrast in images


Contrast, as discussed in earlier sub-sections, is linked to the reflectance
component of the image. All we need to do then is, instead of applying a
lowpass filter, apply a high pass filter to the frequency domain
representation of the log-transformed image (as in Equation 6.4 or Equation
6.5). The cutoff of this high-pass filter is supposed to be as close to zero
frequency as possible.
It is already clear that the high pass operation is carried out to enhance contrast. Let us now see the effect of removing the low-frequency components from the frequency domain representation of the log-transformed image; is this not the removal of the illumination component? The thing to understand is that removing the illumination component is effectively equivalent to strengthening it, that is, to making the illumination uniform. Let us see why. Imagine we have an image that has non-uniform illumination to begin with. To enhance its illumination, in the previous sub-section, we applied a lowpass filter with a gain much smaller than 1 to the frequency domain representation of the log-transformed image. This was done in order to make the gain approximately 1 (or approximately constant) in that region. When we apply a high pass filter to strengthen contrast, the gain in the low pass region automatically becomes constant (definitely not 1). So, it has the same effect of making the illumination uniform everywhere (whether it is 1 or some other value is irrelevant).
Since the frequencies centered around the near-zero frequency region are dominantly responsible for the overall image brightness, we want to make the cutoff very low so as to include this region in the stop band of the high pass filter.
The generation of high-pass filters in Code 6.2 is in Lines 56 to 59.
Note: In Line 56, the cutoff frequency is chosen to be extremely low. The rest of the lines of
code are the same as for the earlier lowpass filter. The result of this can be seen in Figure 6.7
(c). This image has great illumination (comparable to part (b)), and the contrast is also good.
Blurring is also absent. So, it is a win-win situation. Illumination is now uniform, the contrast is
good, and there is no blurring.

This is why, when we say homomorphic filtering, we mean the application


of a high pass filter (Butterworth or any other) with a cutoff frequency close
to zero frequency to the frequency domain representation of the log-
transformed image. It leads to simultaneous illumination and contrast
enhancement.
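Putting the whole recipe in one place, the following condensed sketch restates what Code 6.2 does for contrast enhancement. It reuses the names input_image, freq_points, r, c, and Butter_LPF defined in that code and is not meant to replace the full listing:

# Homomorphic filtering in a nutshell: log -> FFT -> high pass (cutoff near 0) -> IFFT -> exp
log_img  = np.log(1 + input_image)
F        = sfft.fftshift(sfft.fft2(log_img, [freq_points, freq_points]))
H        = 1 - Butter_LPF(freq_points, 1e-7, 10)           # high pass, cutoff close to zero
filtered = np.exp(np.real(sfft.ifft2(sfft.ifftshift(F * H))))[0:r, 0:c]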

6.5 Phase and images


Until now, in this book, we have not plotted the phase spectrum of any image. Usually, for signals of a general kind, phase is an extremely important part of the spectrum; most signal processing books dedicate a significant portion to non-linear phase filter design. So, why have we not given due credit to the phase spectrum? The reason is that for processing images, we need systems with linear phase only. We will try to figure this out in this section.

6.5.1 Phase spectrum of images


Before we elaborate on the phase and its game, let us first try to see how to
plot the phase spectrum of an image. Refer to Figure 6.8. The phase plot can
be easily noticed in part (c) of the figure. Unlike the magnitude plot (part (b)
of the figure), which has a central bright spot, the phase plot possesses no
visibly distinguishable information in this case. Most natural images will
have a phase plot that looks similar. Although they are boring to look at, they
carry significant information, as will be demonstrated in the next sub-
section. Before that, let us see how the plot itself is produced.
Code 6.3 shows how to plot the phase plot. Notice line 29, where the angle is calculated using the np.angle command. It is as simple as that. Also, note line 35 in the code, which is a code manifestation of the two-dimensional version of the following equation:
Equation 6.6:
H(w)=|H(w)|e^(jϕ(w))
Where |H(w)| is the magnitude response and ϕ(w) is the phase response of the frequency domain function H(w). In the code, the RHS of Equation 6.6 is written as mag_image*np.exp(np.sqrt(-1+0j)*phase_image) in line 35.
Refer to the following figure for a better understanding:

Figure 6.8: Magnitude and phase spectrum of an image


01- #===========================================================================
02- # PURPOSE : Learning how to plot phase spectrum of images
03- #===========================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section
2.7.3
09-
10- #---------------------------------------------------------------------------
11- # Importing and displaying image in space domain
12- #---------------------------------------------------------------------------
13- input_image=np.float32(cv2.imread('img1.bmp',0))
14- r,c=np.shape(input_image)
15- fig1,ax1=plt.subplots(2,2)
16- fig1.show()
17- mf.my_imshow(mf.norm_uint8(input_image),'(a) Input image',ax1[0,0])
18-
19- #---------------------------------------------------------------------------
20- # Image in normalized frequency domain
21- #---------------------------------------------------------------------------
22- # Magnitude Plot
23- freq_points=np.max([r,c])
24- fft_input_image=sfft.fft2(input_image,[freq_points,freq_points])
25- mag_image=np.abs(sfft.fftshift(fft_input_image))
26- mf.my_imshow(mf.norm_uint8(np.log(1+mag_image)),"(b) Magnitude Plot",ax1[0,1])
27-
28- # Phase plot
29- phase_image=np.angle(sfft.fftshift(fft_input_image))
30- mf.my_imshow(mf.norm_uint8(phase_image),"(c) Phase Plot",ax1[1,0])
31-
32- #---------------------------------------------------------------------------
33- # Image Transformed back in time domain
34- #---------------------------------------------------------------------------
35- image_back_in_time=sfft.ifft2(sfft.ifftshift(mag_image*np.exp(np.sqrt(-1+0j)*phase_image)))
36- mf.my_imshow(mf.norm_uint8(image_back_in_time[0:r,0:c]),'(d) Recovered Image',ax1[1,1])
37-
38- plt.show()
39- print("Completed Successfully ...")
Code 6.3: Code to plot the complete spectrum of an image: magnitude and phase plot
Also note that, like magnitude plots, we plot phase plots in normalized form,
as discussed in Section 5.5.2.

6.5.2 Swapping of the phase of two images


In this section, we experiment to demonstrate that the phase contains most of
the structural information of the image. The experiment is simple: we take
two images and calculate their magnitude and phase responses. We then
reconstruct the original images (as in the previous sub-section) from the
magnitude and phase responses but with a twist. We swap the phases. Refer
to Figure 6.9 for the results. Notice that just by swapping the phases, it
seems like the first image [image in Part (a)] is similar to the image of part
(d) and the remaining two are visually similar too.
This is strange. We have dedicated our time to designing filters by looking at
magnitude responses only, but it seems that phase can change the game
entirely. That is true for any signal in general.
Figure 6.9: Phase swapping experiment
The results of Figure 6.9 can be reproduced by modifying Code 6.3.
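A minimal sketch of that modification is given below. The file names are placeholders, and the two grayscale images are assumed to have the same size:

import cv2
import numpy as np
import scipy.fft as sfft

a = np.float32(cv2.imread('img1.bmp', 0))   # placeholder file names; any two
b = np.float32(cv2.imread('img2.bmp', 0))   # same-sized grayscale images will do
A, B = sfft.fft2(a), sfft.fft2(b)

# Rebuild each image from its own magnitude but the OTHER image's phase
a_with_b_phase = np.real(sfft.ifft2(np.abs(A) * np.exp(1j * np.angle(B))))
b_with_a_phase = np.real(sfft.ifft2(np.abs(B) * np.exp(1j * np.angle(A))))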

6.5.3 Non-linear phase filters and their effect


Having understood that the phase has an important role to play in images (signals), it is time to see how images look when we filter them with filters (systems) having a non-linear phase. Refer to the following figure for a better understanding:
Figure 6.10: Image processed through an all-pass filter with a non-linear phase
See Figure 6.10, where an input image in part (a) of the figure (signal) is
processed by a system whose magnitude response is shown in part (b)
(system). It is clear from the magnitude response (of the system and not the
signal) that the filter is an all-pass filter. That is, it allows all frequencies to
pass through, but if one notices the phase response in part (c), the phase
neither decays nor grows linearly. At some radius, it is high then it turns low,
then high and low again, and so on. If an image passes through such a
system, there is no doubt that all frequencies will pass. No frequency will be
suppressed or enhanced, and all will be treated equally, but some frequencies
will dislocate from their original spatial positions. That is why the result
looks jittery.
An equivalent example with an audio signal: when it is filtered by an all-pass filter with a non-linear phase, all the audio frequencies present in the input will also be present in the output, but some frequencies will travel faster than others as they pass through the system. This is similar to jumbling the order of words spoken in a sentence. For an image, it manifests as a jumbling of spatial order and hence a jittery appearance. However, if the phase is linear (whether it decays, grows, or remains constant), there will be no jumbling up of the spatial order of frequencies. That is why we only use systems with linear phase when it comes to images.
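One rough way to reproduce the kind of result shown in Figure 6.10 is sketched below: an all-pass filter with unit magnitude everywhere and a phase that oscillates with the radial frequency. The file name and the particular phase function are assumptions made for illustration, not the book's exact choices:

import cv2
import numpy as np
import scipy.fft as sfft

img = np.float32(cv2.imread('img1.bmp', 0))      # placeholder file name
r, c = img.shape
F = sfft.fftshift(sfft.fft2(img))

# All-pass filter: |H| = 1 everywhere, phase oscillating with radial frequency
fy, fx = np.meshgrid(np.linspace(-0.5, 0.5, r), np.linspace(-0.5, 0.5, c), indexing='ij')
radius = np.sqrt(fx**2 + fy**2)
all_pass = np.exp(1j * 40 * np.cos(20 * np.pi * radius))

jittery = np.real(sfft.ifft2(sfft.ifftshift(F * all_pass)))   # spatially jumbled output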

6.6 Selective filtering of images


In today’s world, technology facilitates interactive designs. Be it mouse-
based or touchscreen-based, one can now interactively design filters. For
this, one should know what frequencies should be suppressed in frequency
domain representation. Then, there will be no need to design a rule/formula-
based filter structure. The example shown in Figure 6.11 illustrates this:
Figure 6.11: Selective filter design illustration
Figure 6.11 (a) and (c) show the FFT pair of an input image (first column), while parts (b) and (d) show the FFT pair of the output/processed image. In part (d) of the figure, one can simply click on the frequency that needs to be eliminated. At the clicked place, a patch of 1/10 the size of the original image, corresponding to a Butterworth high pass filter, is inserted (multiplied element by element), which eliminates the effect of that frequency in the output image. One may notice that in the frequency domain (part (c) of the figure) there are three dominant points. One is at the center, which represents the low-frequency components and hence need not be suppressed. The other two are probably periodic noises in the original image. We click on those two points in part (d), which suppresses them. This can be noticed from the two dark patches introduced. Also, the processed image in part (b) is now free of any periodic noise. The result of Figure 6.11 can be compared with Figure 5.31. One may note that, due to the compulsion of designing a Butterworth filter in the latter, all the frequency points on the periphery of the suppressed region are removed along with the two noise points. This, of course, removes the periodic noise but also causes some smoothening due to the removal of other essential high-frequency content. This problem is eliminated in Figure 6.11, where the user can interactively remove the undesired frequencies. Note that it is not necessary to use a Butterworth filter patch; one may use any. The result shown in Figure 6.11 can be generated by Code 6.4. If you have followed this far, the following code is self-explanatory:
01- #======================================================================
02- # PURPOSE : Interactive filter design
03- #======================================================================
04- import cv2,matplotlib.pyplot as plt
05- import numpy as np
06- import scipy.fft as sfft
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #----------------------------------------------------------------------
11- # Butterworth Lowpass filter design
12- #----------------------------------------------------------------------
13- def Butter_LPF(freq_points,fc,n):
14- FREQfilter=np.zeros((freq_points,freq_points))
15- # Creating 2D freq. grid for Butterworth filter generation in freq. domain
16- f_positions_norm=np.linspace(0,freq_points,freq_points)
17- fx=f_positions_norm/np.max(f_positions_norm)-0.5
18- fy=fx # Because we are taking circularly symmetry
19- fxx,fyy=np.meshgrid(fx,fy) # 2 arrays of fx & fy coordinates in 2D
20- # 2D Butterworth creation (fc is already -3db frequency)
21- FREQfilter=np.sqrt(1/(1+((np.sqrt(fxx**2+fyy**2))/(fc))**(2*n)))
22- return(FREQfilter)
23-
24- #----------------------------------------------------------------------
25- # Importing and displaying image in space domain
26- #----------------------------------------------------------------------
27- input_image=np.float32(cv2.imread('img18.bmp',0))
28- r,c=np.shape(input_image)
29- fig1,ax1=plt.subplots(2,2)
30- fig1.show()
31- # Input Image (spatial domain)
32- mf.my_imshow(mf.norm_uint8(input_image),'(a) Input',ax1[0,0])
33- # Output Image (spatial domain- default initialisation)
34- mf.my_imshow(mf.norm_uint8(input_image),'(b) Output',ax1[0,1])
35-
36- #----------------------------------------------------------------------
37- # Designing Patch of Butterworth HPF for Interactive click
38- #----------------------------------------------------------------------
39- freq_points=np.max([r,c])
40- PatchBy2=np.int16(freq_points/10)
41- patch=2*PatchBy2
42- patch_filter=1-Butter_LPF(patch,.2,5)
43-
44- #----------------------------------------------------------------------
45- # Image in normalised frequency domain (Magnitude Plot)
46- #----------------------------------------------------------------------
47- FFTimg=sfft.fft2(input_image,[freq_points,freq_points])
48- MAGimg=np.abs(sfft.fftshift(FFTimg))
49- # Input Image (frequency domain)
50- mf.my_imshow(mf.norm_uint8(np.log(1+MAGimg[PatchBy2:(freq_points-PatchBy2),PatchBy2:(freq_points-PatchBy2)])),"(c) Input (Mag)",ax1[1,0])
51- # Output Image (frequency domain- default initialisation)
52- mf.my_imshow(mf.norm_uint8(np.log(1+MAGimg[PatchBy2:(freq_points-PatchBy2),PatchBy2:(freq_points-PatchBy2)])),"(d) Output (Mag)",ax1[1,1])
53-
54-
55- #----------------------------------------------------------------------
56- # Designing interactive filter
57- #----------------------------------------------------------------------
58- # Frequency domain filter
59- FREQfilter=np.ones((freq_points,freq_points))
60-
61- # Infinite Loop for creating interactive filter
62- # and editing the response in realtime
63- while(True):
64- # Getting the coordinates of mouse
65- src=np.int32(np.asarray(plt.ginput(1)))
66- # Calculating top left (row) and top left (column) corner coordinates
67- TLC_c=np.int16(src[0,0]-patch/2)+PatchBy2 # Top left corner
(column)
68- TLC_r=np.int16(src[0,1]-patch/2)+PatchBy2 # Top left corner (row)
69-
70- # Replacing the patch in default initialised frequency domain filter by
71- # patch created above according to the interactive mouse coordinates
72- FREQfilter[TLC_r:(TLC_r+patch),TLC_c:(TLC_c+patch)]=FREQfilter[TLC_r:(TLC_r+patch),TLC_c:(TLC_c+patch)]*patch_filter
73- MAGimg[TLC_r:(TLC_r+patch),TLC_c:(TLC_c+patch)]=MAGimg[TLC_r:(TLC_r+patch),TLC_c:(TLC_c+patch)]*patch_filter
74- mf.my_imshow(mf.norm_uint8(np.log(1+MAGimg[PatchBy2:(freq_points-PatchBy2),PatchBy2:(freq_points-PatchBy2)])),"(d) Output (Mag)",ax1[1,1])
75-
76- #------------------------------------------------------------------
77- # Filtering in frequency domain
78- #------------------------------------------------------------------
79- freq_filtered_image=sfft.fftshift(FFTimg)*FREQfilter
80-
81- #------------------------------------------------------------------
82- # Image Transformed back in time domain (displayed without padding)
83- #------------------------------------------------------------------
84- image_back_in_time=sfft.ifft2(sfft.ifftshift(freq_filtered_image))
85- mf.my_imshow(mf.norm_uint8(image_back_in_time[0:r,0:c]),'(b) Output',ax1[0,1])
86-
87- plt.show()
88- print("Completed Successfully ...")
Code 6.4: Selective filtering
It is recommended that the reader play with the above code. If one clicks on
the central bright point for any image (in the frequency domain, part (d) of the figure), the resulting filter behaves like a high pass filter with a cutoff close to zero frequency.

Conclusion
This chapter introduces the concept of sampling, its side effect aliasing, and
the method of avoiding it. Through homomorphic filtering, one gets familiar
with illumination and contrast enhancement. It is also observed that most
structural information is present in the phase part of the total spectrum,
hence it plays an important role in images. Selective filtering is also
introduced to create interactive apps and programs.
In the next chapter, we will try to address the issue of noise and degradation in images. Although it is impossible to remove noise completely once it has been added to an image, it can be minimized. That is what we are going to explore.

Points to remember
• Median filtering is non-linear and is best suited for the removal of salt
and pepper noise.
• Due to improper sampling, the phenomenon of aliasing may occur, in which higher-frequency components overlap with lower-frequency components, causing visual inconsistencies in the spatial domain. To avoid this, the sampling rate should be chosen according to the Nyquist criterion.
• Phase carries the most important structural information of the image.
• Sampling of continuous data to discrete-time data is completely reversible if samples are taken at or above the Nyquist rate. In that case, sampling is a lossless process.
• Quantization however is lossy.

Exercises
1. At the sampling frequency of , find 5 aliases of frequency and
corresponding discrete frequencies. Comment on the discrete
frequencies so obtained. [Hint: Section 6.3.1].
2. Explain the advantages of homomorphic filtering.
3. Suggest a way of hiding information in the images by using the fact
that most of the information in images is contained in the phase
spectrum instead of the magnitude spectrum.
4. Plot the frequency domain representation of a good quality image of a
fingerprint and comment on the shape of it. Do you find it similar to the
frequency domain plot of other images?
5. As noted in Figure 6.6, while taking projective transformation, aliasing
might occur. Suggest a method to avoid this situation.
OceanofPDF.com
CHAPTER 7
Noise and Image Restoration

7.1 Introduction
An image may become degraded for various reasons. Degradation may happen at the various stages at which an image is handled: capturing, transmission, and storage. During capturing, we capture
the 3D world into a 2D picture by taking its projection. This projection is not
a perfect 3D to 2D projection because, between the object and the camera
(capturing sensor), there may be electromagnetic interference, fog, or any
other kind of disturbance due to which the intended object is not captured
perfectly. During transmission, the channel may be non-linear, which may
introduce phase distortion. Additionally, it may introduce noise. During
storage, to save space, compression may introduce degradation in the image.
Images that are hard printed onto posters, books, etc., may degrade because
of time. Reasons might include weather, wear-and-tear, etc. All of this leads
to changes in the appearance of images – some of which can be undone, and
the remaining cannot be. In this chapter, we will try to learn techniques that
are used to restore a noisy image to the extent possible.
At this point, we want to clarify the difference between enhancement and
restoration. You may enhance your selfie by applying various beautification
filters available in your mobile apps, but you will restore the scanned version
of an old picture of you that your parents took during your childhood. So,
enhancement is a subjective process, but restoration is objective—and this
chapter deals with the restoration of noisy images only.
We will only discuss methods of reversing the effect of noise and not
degradation. However, most of the time, noise and degradation are
simultaneously present in images. There exists a tradeoff between both
during restoration. Restoring a noisy image will mean blurring the image in
general, and restoring a degraded image will mean high-pass filtering
(sharpening) in some sense. In an image where both noise and degradation
are present, blurring and sharpening must be applied together, in a balanced way, to undo both.

Structure
This chapter discusses the following topics:
• Noise and degradation model
• Restoration in presence of noise only
• Detection of noise in images
• Measurement of noise in images using PSNR
• Classical methods of noise removal
• Adaptive filtering

Objectives
After reading this chapter, you will be able to identify the noise distribution
that corrupts a given image. The reader will also be able to quantify and
select a suitable method to minimize noise and, hence, restore it.

7.2 Noise and degradation model


It is impossible to model degradation and noise exactly using any mathematical model, and once an image is corrupted by them, their effect cannot be eliminated completely. However, it is possible to approximate the model of degradation and
noise so that we come closer to the real one that has distorted the original
image.
Figure 7.1: Block diagram of the degradation restoration process
Refer to Figure 7.1 for an understanding of the general degradation-
restoration process. In this figure, an input image i(x,y) free from any noise
or degradation gets corrupted by a degradation function df(x,y). We model
the degradation as a linear shift invariant (LSI) process. Although this will
not be the case in general, LSI models can give a good approximation to the
most naturally occurring degradation phenomenon. Since the degradation is
linear, and it is coming from a system (sensor, transmission channel, or
storage device), the output will be a degraded image di(x,y)=i(x,y)*df(x,y),
i.e., the convolution between signal (image) and system (degradation filter)
in space domain. In the next stage, noise is added to the degraded image, and
we call the result degraded-noisy image dni(x,y)=i(x,y)*df(x,y)+n(x,y). Here,
it is important to note that noise is not always additive; it can also be attached to the signal in non-additive ways (for example, multiplicatively). Nevertheless, for simplicity, we model noise as additive only. The detailed noise distributions will be
discussed in the coming sections. Then, the resulting image has degradation
as well as noise. In this chapter, we will learn several techniques to minimize
the effect of both. We will design restoration filter rf(x,y) in the coming
sections, which when convolved with the degraded-noisy image gives the
restored image ri(x,y)=dni(x,y)*rf(x,y). Equivalently in frequency domain,
multiplication between the corrupted image (signal) and the frequency
domain equivalent of restoration filter (system) will do.
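To make the block diagram concrete, the following minimal sketch (a simulation only, not a restoration method) forms dni(x,y) from a grayscale image, assuming a simple 5x5 averaging kernel as the degradation function df(x,y) and zero-mean additive Gaussian noise as n(x,y):
import cv2, numpy as np
from scipy.signal import convolve2d

i_xy = np.float64(cv2.imread('img2.bmp', 0))     # clean input image i(x,y)
df_xy = np.ones((5, 5)) / 25.0                   # assumed LSI degradation (simple blur) df(x,y)

# Degradation stage: convolution of signal (image) with system (degradation filter)
di_xy = convolve2d(i_xy, df_xy, mode='same', boundary='symm')

# Additive noise stage: zero-mean Gaussian noise n(x,y)
n_xy = np.random.normal(0, 15, i_xy.shape)
dni_xy = di_xy + n_xy                            # degraded-noisy image dni(x,y)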
7.3 Restoration in presence of noise only
In this section, we will understand noise in images, how to measure it, and
some classical methods that attempt to minimize it. Complete removal of
noise is impossible; hence, it can only be minimized. In the next subsection,
we will begin by defining noise in images.

7.3.1 Defining noise


Any disturbance mixed in the signal is called noise.
Let us see an example in Figure 7.2. Part (a) of the figure shows the original
image, which is free from noise. Part (b) shows the noise that is to be added to the image in part (a). Since it is zero-mean Gaussian noise and images must be displayed in the grayscale range (0-255), an additional bias of 30 is applied while displaying, purely for better visibility. Had the bias not been there, all the noise values below zero would not be visible. Part (c) shows
the probability density function (histogram) of the noise to be added, which
clearly appears to be Gaussian noise with zero mean. It is generated from
Gaussian distribution only, which will become clear from Code 7.1, which is
used to generate it. The shape is jittery because we have a limited number of
pixels in the original image. Had it been an image with an infinite number of
pixels, the shape would be perfectly Gaussian. Part (d) shows the noisy
image. Notice the granularity this noisy image possesses in comparison to
the original one.
In practice, we will have images captured from the real world that are
corrupted by noise. Their noise distributions will not be known beforehand.
So, our objective will be to know what distribution it is and how to deal with
it. To understand that we are first going to add a known noise to images and
develop the basics from there.
Code 7.1 illustrates the Python code for adding noise from various
distributions to the original image like Gaussian, Rayleigh, Erlang (Gamma),
exponential, and uniform distribution. Let us now understand the code and
discuss these distributions and their parameters.
In line number 17 of the code, we select the noise distribution from which
the noise is to be generated. From line number 19 to 42, the match case
ladder will create the selected noise by making selections according to the
choice made in line number 17. The chosen np.random function (normal, rayleigh, gamma, exponential, or uniform) takes the parameters of the selected distribution, with the last argument [r,c] specifying the number of rows and columns of the synthesized noise image syn_noise. However, syn_noise might contain a few values outside the grayscale range, so for display purposes it is clipped/trimmed back to the grayscale range in lines 47 to 49. The noisy image is then created in line number 78 by adding the noise to the image (as we are only concerned with additive noise for now).

Figure 7.2: Addition of Gaussian noise (mean=0, sigma=60) to grayscale image


Also, note that while plotting the histogram in line number 68, the sample at zero is not considered because clipping/trimming the values into the grayscale range inflates its count beyond the true value. The rest of the code is self-explanatory and is well commented to guide you through.
01-
#=====================================================
=================
02- # PURPOSE : Adding noise of various distributions to the image [PART
1]
03-
#=====================================================
=================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- import my_package.my_functions as mf # This is a user defined package
and ...
06- # one may find the details related to its contents and usage in section
2.7.3
07-
08- #----------------------------------------------------------------------
09- # Importing image
10- #----------------------------------------------------------------------
11- img=np.float32(cv2.imread('img2.bmp',0))
12- r,c=np.shape(img)
13-
14- #----------------------------------------------------------------------
15- # Choosing noise to be added
16- #----------------------------------------------------------------------
17- select_noise_distribution=4 # (CHOOSE NOISE DISTRIBUTION
HERE)
18- # note that zero mean noise distribution is assumed wherever applicable
19- match(select_noise_distribution):
20- case 1:
21- method_name='Gaussian Noise'
22- mu=0 # Mean of the Gaussian (keep it 0)
23- sigma=60 # standard deviation of the Gaussian
24- syn_noise=np.random.normal(mu, sigma, [r,c])
25- case 2:
26- method_name='Rayleigh'
27- mode=50 # Most occouring value in Rayleigh distribution
28- syn_noise=np.random.rayleigh(mode, [r,c])
29- case 3:
30- method_name='Erlang (Gamma)'
31- shape=5 # Controls shape of Gamma distribution (must be +ve)
32- scale=15 # Controls spread of distribution
33- syn_noise=np.random.gamma(shape, scale, [r,c])
34- case 4:
35- method_name='Exponential'
36- scale=50
37- syn_noise=np.random.exponential(scale, [r,c])
38- case 5:
39- method_name='Uniform'
40- low_lim=50
41- high_lim=150
42- syn_noise=np.random.uniform(low_lim, high_lim, [r,c])
43- syn_noise2=syn_noise.copy() # Storing a copy of noise for later use
44- #----------------------------------------------------------------------
45- # Trimming the noise pixel values outside 0-255 range for display only
46- #----------------------------------------------------------------------
47- syn_noise[syn_noise>255]=255 # To clip noise values above 255
48- syn_noise[syn_noise<0]=0 # To clip noise values below zero
49- syn_noise=np.uint8(syn_noise) # not mf.norm_uint8 because that
will
50- # stretch max value to 255
51- fig,ax=plt.subplots(2,2)
52- fig.show()
53- mf.my_imshow(mf.norm_uint8(img),'(a) Grayscale Image',ax[0,0])
54- mf.my_imshow(syn_noise+30,'(b) '+method_name+' (bias of
30)',ax[0,1])
55- # syn_noise+30 in above line because otherwise noise will be
56- # added to black image and hence will not be clearly visible
57- # this is for display purposes only
58-
59- #----------------------------------------------------------------------
60- # Plotting normalized histogram of noise distribution
61- #----------------------------------------------------------------------
62- no_of_bins=256 # This defines the total no. of points on X axis of
histogram
63- X_axis=255*np.arange(0,no_of_bins,1)/(no_of_bins-1)
64- # Corresponding values on Y axis
65- hist_values=cv2.calcHist([syn_noise],[0],None,[no_of_bins],[0,256])
66-
67- normalised_hist=hist_values/(r*c)
68- ax[1,0].plot(X_axis[1:255],normalised_hist[1:255])
69- # above is a probability density function
70- ax[1,0].grid()
71- ax[1,0].set_title('(c) '+method_name+' distribution')
72- ax[1,0].set_xlabel('Grayscale Values')
73- ax[1,0].set_ylabel('probability')
74-
75- #----------------------------------------------------------------------
76- # Adding noise to the image and displaying
77- #----------------------------------------------------------------------
78- noisy_image=np.float32(img)+np.float32(syn_noise2) # adding un-
trimmed version of noise
79- noisy_image[noisy_image>255]=255
80- noisy_image[noisy_image<0]=0
81- mf.my_imshow(mf.norm_uint8(noisy_image),'(d) Noise added to
image',ax[1,1])
82-
83- plt.show()
84- print("Completed Successfully ...")
Code 7.1: Code to add various kind of noises - PART 1
Now, let us talk about the individual distributions and their parameters in the
order they appear in the code.

7.3.2 Gaussian noise


We will first see a Gaussian distribution (for different parameters) in Figure
7.3. This distribution is also called normal distribution. Remember that we
are now talking about a distribution rather than a function.

Figure 7.3: Gaussian distribution for four different sets of parameters


There are two parameters: mean (µ) and standard deviation (σ). It can be
noted that curves with the same mean have the same center position (i.e., the position of the peak of the curve on the x-axis). However, their thickness may
differ. Secondly, curves with the same variance have the same thickness
(irrespective of their x-position of peak of curve). Mean tells the position of
the center of the curve and variance – its thickness. Since the area bound by
the curve and the x-axis is constant, the wider the curve, the shorter (in
height) the peak and vice versa.
At this point, it is extremely important to understand the difference between
a function and distribution. A function (one-dimensional for now) is a listing
of values corresponding to the independent variable in order of independent
variable. Once defined, it has a definite shape. On the other hand,
distribution is the summary of given data (one-dimensional for now) which
tells us the number of times a particular value has been present in that data.
Let us take an example to understand the difference. A function is listed in
tabular form as follows:
Year  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Grade | 3 | 3 | 3 | 2 | 3 | 1 | 3 | 2 | 2 | 3  | 3  | 3

Table 7.1: Data in the form of a function


The preceding table lists the 12 classes that a student passes to clear his
school studies. Also, assuming that there are only 3 grades, 1, 2, and 3,
which he can score, his/her grades are also shown. This is a function where
independent variable is year and dependent variable is grade. Now, look at
the following table:
Grade     | 1 | 2 | 3
Frequency | 1 | 3 | 8

Table 7.2: Data in the form of a frequency table


The preceding table tells you how many times (in his/her school studies) the
student has received grade 1, 2, or 3. This is called distribution. Note that in
distribution, we do not have any information about when (in which year) the
student received those grades.
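As a quick sketch of this difference in code (with the grades of Table 7.1 typed in by hand), the frequency table of Table 7.2 can be recovered from the function using numpy:
import numpy as np

grades = np.array([3, 3, 3, 2, 3, 1, 3, 2, 2, 3, 3, 3])   # function: grade listed year by year (Table 7.1)
values, counts = np.unique(grades, return_counts=True)     # distribution: how often each grade occurs
print(dict(zip(values.tolist(), counts.tolist())))          # {1: 1, 2: 3, 3: 8}, i.e., Table 7.2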
Having understood the above, we will return to Gaussian distribution as
shown in Figure 7.3 (let us specifically take one out of four shown – let us
say mean =100 and standard deviation = 10); we must understand that this
is a plot for distribution. If we talk about the image and the process of noise
addition, the selected noise curve in the figure tells us that most pixels will
have a noise value of 100, as that is the mean or peak of the curve. The
number of pixels having a noise value of 200 is negligible (as can be read
from the curve). However, which particular pixels will carry a noise value of 100 is not fixed in space.
Also, note that we are interested in 0 mean Gaussian noise only as we do not
want to change the average (DC) value of the image. The Gaussian
probability density function is mathematically represented below:
Equation 7.1:
p(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
Here, x is the intensity value in the range 0-255 for a grayscale image, p(x) is
the probability for a given intensity x. Since p(x) is probability, its value lies
in the interval [0,1] and ∫p(x)dx=1 because it is the probability density
function. One may check this fact by running the following command on
Python shell after running Code 7.1 – np.sum(normalised_hist).
Conventionally, instead of using σ, σ2 is used as a parameter for Gaussian
distribution. σ2 is called variance.
The primary causes of Gaussian noise in images lie at the acquisition stage, where low illumination, sensor temperature, and internal circuitry contribute to this noise. The Gaussian distribution occurs very frequently in various aspects of nature.
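A small sketch that simply plots Equation 7.1 for a few hand-picked (mean, sigma) pairs, in the spirit of Figure 7.3, is given below (the parameter values are illustrative, not necessarily those used in the figure):
import numpy as np, matplotlib.pyplot as plt

x = np.arange(0, 256)
for mu, sigma in [(100, 10), (100, 30), (180, 10), (180, 30)]:   # illustrative (mean, sigma) pairs
    p = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    plt.plot(x, p, label='mean=' + str(mu) + ', sigma=' + str(sigma))
plt.xlabel('Grayscale Values')
plt.ylabel('probability')
plt.legend(); plt.grid(); plt.show()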

7.3.3 Rayleigh noise


The probability density function of Rayleigh distribution is given below:
Equation 7.2:
p(x) = (x / s²) · exp(−x² / (2s²)) for x ≥ 0, and p(x) = 0 for x < 0
where x is the intensity value in the range 0-255 for a grayscale image, p(x) is the probability for a given intensity x, and s is the scale parameter (or mode, i.e., the most frequently occurring value). Although the above
equation appears like a Gaussian with 0 mean and absence of √2π in the
denominator (scaling factor), note that in the denominator of scaling factor,
s2 is outside the square root sign, but in Gaussian, σ2 was inside the square
root sign. This causes differences in the shapes of both curves. One may note
the shape of Rayleigh distribution for different values of s shown in Figure
7.4:
Figure 7.4: Rayleigh distribution for different values of scale parameter
s is called the scale parameter. The higher the value of scale, the lower the
peak and the greater the thickness of the distribution. Also, the peak shifts
rightwards for a higher scale parameter value. Note that, unlike a Gaussian
distribution, the curves are not symmetric about the vertical axis passing
through their peaks.
Figure 7.5: Rayleigh noise (mode=30) in images
Figure 7.5 shows an image with Rayleigh noise in part (d). This figure is
generated by using Code 7.1 by setting select_noise_distribution=2, due to
which np.random.rayleigh is called. One may play with the scale (mode)
parameter and note the difference in results.

7.3.4 Erlang noise


The probability density function of Erlang (also called Gamma) distribution
is given below:
Equation 7.3:
p(x) = x^(k−1) · exp(−x/θ) / (Γ(k) · θ^k),  x ≥ 0
where x is the intensity value in the range 0-255 for a grayscale image, and p(x) is the probability for a given intensity x. There are two parameters: shape (k) and scale (θ); Γ is the Gamma function.
Figure 7.6 shows the effect of adding synthesized Gamma noise onto the
image:

Figure 7.6: Gamma noise (shape=5, scale=15) corrupted image


One may try to play with the shape and scale parameter in Code 7.1, line
number 31 and 32 respectively (set select_noise_distribution=3 in line
number 17 to use Gamma noise). Increasing the shape shifts the position of
peak on x-scale and increasing the scale increases the width of lobe.

7.3.5 Exponential noise


The probability density function of exponential distribution is given below:
Equation 7.4:
p(x) = (1/β) · exp(−x/β),  x ≥ 0
where x is the intensity value in the range 0-255 for a grayscale image, and p(x) is the probability for a given intensity x. β is the scale parameter. Alternatively, the density can be written as p(x) = λ·exp(−λx) with λ = 1/β, where λ is called the rate parameter. Figure 7.7 depicts
the effect of adding exponential noise to the input image:
Figure 7.7: Exponential noise (scale=50) corrupted image
One may try to play with the scale parameter in Code 7.1, line number 36
(set select_noise_distribution=4 in line number 17 to use exponential
noise). Increasing the scale increases the spread.

7.3.6 Uniform noise


The probability density function of uniform distribution is given as follows:
Equation 7.5:
p(x) = 1/(b − a) for a ≤ x ≤ b, and p(x) = 0 otherwise
where x is the intensity value in the range 0-255 for a grayscale image, and p(x) is the probability for a given intensity x. (a, b) is the interval over which the density remains constant; a is the lower limit, and b is the upper limit. Figure 7.8
shows an image corrupted by uniform noise.
Figure 7.8: Uniform noise (lower limit = 50 and higher limit =150) corrupted image
One may try to play with the low and high limit parameters in Code 7.1, line numbers 40 and 41 (set select_noise_distribution=5 in line number 17 to use uniform noise).

7.3.7 Salt and pepper noise


In Section 6.2, we have informally introduced salt and pepper noise and the
method of removal of this noise by median filtering in spatial domain. The
following equation defines the salt and pepper distribution as:
Equation 7.6:
p(x) = Ps for x = 2^k − 1 (salt), p(x) = Pp for x = 0 (pepper), and p(x) = 1 − Ps − Pp otherwise
where x is the intensity value (in the range [0, 255] for a grayscale image or, more generally, [0, 2^k − 1] for a k-bit image), and p(x) is the probability for a given intensity x. Ps is the probability of a pixel being salt, and Pp is the probability of the pixel being pepper.
As explained earlier, salt and pepper noise are called so because it manifests
as the brightest intensity and lowest intensity spots in the image. An example
is shown in Figure 7.9. As the original image is an 8-bit grayscale image,
salt noise will correspond to an intensity of 2^8 − 1 = 255, and pepper noise will
correspond to 0. Both salt and pepper are mixed in equal amounts. However,
one may have only salt or only pepper noise or a mixture of them in any
other proportion too.
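For reference, this kind of noise can also be synthesized by hand before turning to the library routine of Code 7.2; the sketch below assumes equal salt and pepper probabilities and an overall noise fraction of 10%:
import cv2, numpy as np

img = cv2.imread('img2.bmp', 0)                  # 8-bit grayscale image
amount = 0.10                                    # 10% of all pixels become noise pixels
noisy = img.copy()
mask = np.random.rand(*img.shape)                # uniform random numbers in [0, 1)
noisy[mask < amount / 2] = 0                     # pepper: lowest intensity
noisy[mask > 1 - amount / 2] = 255               # salt: brightest intensity (2^8 - 1)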
Figure 7.9 shows one such example where, to the original image, we have
added salt and pepper noise (in equal proportion) such that 10% of the total
image pixels are noise pixel. To reproduce these results, one may use Code
7.2. This code is similar to Code 7.1 in the sense that it will also help you
add different noise types to a given image. However, in Code 7.1, we have
used numpy library to add random noise; here in Code 7.2, we have used
scikit-image library for the same. You may install scikit-image library using
the procedure illustrated in Section 2.3 by using the pip install scikit-image
command.

Figure 7.9: Salt and pepper noise addition to a 8-bit grayscale image
Also, note that the random_noise function in this package expects the image
to be normalized in the range of [0,1]. Further, it adds noise to the image and
returns the noise-added image itself. In the earlier case (Code 7.1, which used the np.random functions from the numpy library), the input was not expected to be in the [0,1] intensity range, nor was the noise-added image returned – those functions return only the noise, which we must manually add to the image.
01-
#=====================================================
=================
02- # PURPOSE : Adding noise of various distributions to the image [PART
2]
03-
#=====================================================
=================
04- import cv2,matplotlib.pyplot as plt, numpy as np
05- # (install the package below using 'pip install scikit-image'
06- # as per the procedure illustrated in section 2.3)
07- import scipy.ndimage as sp
08- from skimage.util import random_noise
09- import my_package.my_functions as mf # This is a user defined package
and ...
10- # one may find the details related to its contents and usage in section
2.7.3
11-
12- #----------------------------------------------------------------------
13- # Importing and normalising the image to the range [0,1]
14- #----------------------------------------------------------------------
15- input_img=cv2.imread('img2.bmp',0)
16- img=np.float64(input_img)/255
17- # Pixel values above are normalised to range 0 to 1
18- # This is the requirement of 'random_noise' function imported above
19-
20- #----------------------------------------------------------------------
21- # Choosing noise to be added and adding noise
22- #----------------------------------------------------------------------
23- select_noise_distribution=1 # (CHOOSE NOISE DISTRIBUTION
HERE)
24- # note that zero mean noise distribution is assumed wherever applicable
25- match(select_noise_distribution):
26- case 1:
27- method_name='Salt & Pepper'
28- noisy_img = random_noise(img, mode='s&p',amount=.1)
29- case 2:
30- method_name='Gaussian'
31- noisy_img = random_noise(img,
mode='gaussian',mean=0,var=.05,clip=True)
32- case 3:
33- method_name='Poisson'
34- noisy_img = random_noise(mf.norm_uint8(255*img),
mode='poisson',clip=True)
35- # The Poisson distribution is only defined for positive integers. To
apply
36- # this noise type, the number of unique values in the image is found
and
37- # the next round power of two is used to scale up the floating-point
result,
38- # after which it is scaled back down to the floating-point image range.
39- case 4:
40- method_name='Speckle'
41- noisy_img = random_noise(img,
mode='speckle',mean=0,var=.1,clip=True)
42-
43- #----------------------------------------------------------------------
44- # Displaying
45- #----------------------------------------------------------------------
46- fig,ax=plt.subplots(1,2)
47- fig.show()
48- mf.my_imshow(input_img,'(a) Grayscale Image',ax[0])
49- mf.my_imshow(mf.norm_uint8(noisy_img),'(b) Noisy image
('+str(method_name)+')',ax[1])
50-
51- plt.show()
52- print("Completed Successfully ...")
Code 7.2: Code to add various kind of noises - PART 2
In line number 28 of the code, three arguments have been passed to the
function random_noise. The first is the image onto which noise is to be
added. This image should have pixel values in the range [0,1]. The second
argument is the mode, which tells us which method is to be used, and the
third argument is the amount (specifically for salt and pepper noise), which
tells what percentage of total pixels should be affected by noise in the output
image. The output of this function is again in the range [0,1] if clip=True is
used as one of the arguments of the function in the end. Otherwise, it may
have a range a little larger than that depending upon the addition of noise to
high and low-intensity values in the image. Note that this function adds
noise to the input image itself. No separate addition of the output of this
function to the original image is needed.
Using Code 7.2, one may also add Gaussian, Poisson, and Speckle noise –
learning their distributions is left as an exercise for the reader. However, by
using the code, one can simulate and get the noisy image for further
processing.
The following table shows typical scenarios where the noise distributions
discussed above appear:
S. No. | Noise distribution | Typical use case
1 | Gaussian noise | Found in low-light or high-ISO images where the sensor amplifies both signal and noise. Often manifests as random grain or speckles in the image.
2 | Rayleigh noise | Common in images captured in low-light conditions or in radar imaging. Appears as an uneven noise distribution, particularly in images with a lot of uniform regions.
3 | Erlang noise | Rarely encountered in common photographic scenarios, but can appear in specialized fields like communication systems and some modeling of optical or satellite imagery.
4 | Exponential noise | Common in medical imaging, especially in scenarios like MRI or PET scans. It can also appear in low-light photography or sensor calibration.
5 | Uniform noise | Typically seen in low-quality sensor data; it appears as grainy, evenly distributed speckles across an image. Can occur in image compression artifacts or synthetic noise generation.
6 | Salt and pepper | Frequently found in digital images that have been corrupted due to transmission errors or sensor malfunctions. It appears as random white and black pixels scattered throughout the image.

Table 7.3: Noise distributions and typical scenarios

7.4 Detection of noise in images


If the noise is produced due to known causes like camera/sensor, wiring,
etc., one can design a test image like the one shown in Figure 7.10. It has
patches of constant gray values. When noise is added to such an image
(either artificially or due to other causes) while capturing, it can be estimated
by seeing the histogram. To understand the procedure, refer to Figure 7.11.
Part (a) of the figure shows a noisy image. Part (b) shows the histogram of
noise-free image of Figure 7.10. Note that there are only four intensity
values: 50, 100, 150, and 200. The patches of grayscale value 50 and 100
have square shapes with equal size. The remaining two patches are
rectangular with unequal sizes. This explains the frequency in histogram in
part (b) of the figure:

Figure 7.10: Test image for noise detection


Part (c) of Figure 7.11 shows the probability density function of noise that is
added to the original image. Usually, this is unknown. So, to estimate it, we
see the histogram of the noisy image shown in part (d) of the figure. One
may note that, as compared to the histogram of the noise-free image, which
had four sharp spikes, we get the Rayleigh distribution shape centered at
those spike positions. This clearly lets the user identify what distribution is
present.

Figure 7.11: Illustration of estimating noise distribution and its parameters


The heights of the four lobes are proportional to the heights of the original spikes (i.e., to the area that the corresponding gray-level patch occupies in the image). However, the width (on the x-axis) is the same for all four and equals the width of the original noise distribution in part (c) of the figure. This helps us estimate the mode
parameter for Rayleigh noise distribution. However, we are not detailing the
procedure for that here as there are many distributions for which we have
many parameters. Our prime interest is in eliminating this noise from the
image and most of the procedures that we will discuss are independent of
noise distributions. These methods depend on the extent up to which the
images are corrupted by noise.
Remember that in natural images, if one is interested in detecting noise
distribution by the above method, one must find patches of reasonably
uniform intensity. A histogram of those patches will give us the information
needed. Obtaining patches with uniform intensity is not always possible, so
the above method will only work in some cases. Due to this, in typical
scenarios, an image of constant intensity background is taken, and ambient
noise distribution is estimated.
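A sketch of this procedure is given below; the file name and the patch coordinates are hypothetical and must be chosen from a visually flat region of the image at hand:
import cv2, numpy as np, matplotlib.pyplot as plt

noisy = cv2.imread('noisy_test_image.bmp', 0)          # hypothetical noisy image
patch = noisy[50:150, 50:150].copy()                   # hypothetical patch of nearly uniform intensity
hist = cv2.calcHist([patch], [0], None, [256], [0, 256]).ravel() / patch.size
plt.plot(np.arange(256), hist)                         # the shape of the lobe reveals the noise distribution
plt.xlabel('Grayscale Values'); plt.ylabel('probability')
plt.grid(); plt.show()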

7.5 Measurement of noise in images using PSNR


Noise in images can be quantified only if a noise-free reference image is available. One may then ask what the use of such a measurement is when the noise-free original is already at hand. The answer is: for the testing and development of algorithms/methods that will try to knock out the noise from images whose noise-free versions are not available. Peak signal-to-noise ratio (PSNR) is a popular way of quantifying noise in signal processing. It is defined as the ratio of the maximum possible power of an image to the power of the corrupting noise. For two-dimensional data (an image), it is represented mathematically as follows:
Equation 7.7:
PSNR = 10 · log10( (L − 1)² / MSE ) = 20 · log10( (L − 1) / RMSE )
where L is the number of maximum possible intensity levels in an image.


For a grayscale image of 8 bits, it is 256. MSE is mean squared error, and
RMSE is root mean squared error. MSE can be defined as shown below:
Equation 7.8:
MSE = (1/(r·c)) · Σ_i Σ_j [ img1(i, j) − img2(i, j) ]²
where r and c represent the number of rows and columns in the image, respectively, and img1 and img2 are the two images being compared. PSNR is measured in dB. The higher the PSNR, the lower the noise in the image. The same can be confirmed from Figure 7.12 for Rayleigh noise with
increasing parameter (mode) value. Increasing mode means more and more
intensity values are affected by noise.
Code 7.3 illustrates the procedure to calculate the PSNR value between any two images of the same shape.
01-
#=====================================================
=================
02- # PURPOSE : Calculating PSNR between two images of same shape
03-
#=====================================================
=================
04- import cv2
05-
06- img1=cv2.imread('img2.bmp',0)
07- img2=cv2.imread('img24.bmp',0)
08-
09- psnr = cv2.PSNR(img1,img2)
10- print("PSNR is : "+str(psnr)+' dB')
11-
12- print("Completed Successfully ...")
Code 7.3: PSNR calculation between two images (of same shape)
The results are shown below:

Figure 7.12: Addition of Rayleigh distributed noise with different parameter (mode) values to the test
image
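The same number can also be computed directly from Equations 7.7 and 7.8 with numpy; the small sketch below is useful for cross-checking cv2.PSNR:
import cv2, numpy as np

img1 = np.float64(cv2.imread('img2.bmp', 0))
img2 = np.float64(cv2.imread('img24.bmp', 0))

mse = np.mean((img1 - img2) ** 2)                # Equation 7.8
psnr = 10 * np.log10(255 ** 2 / mse)             # Equation 7.7 with L = 256 levels, so L - 1 = 255
print("PSNR is : " + str(psnr) + ' dB')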
7.6 Classical methods of noise removal
Assuming that only noise is present in the given image (and no other
disturbance), classical methods to remove noise include spatial domain
filtering. Here, arithmetic mean filtering, geometric mean filtering, harmonic
mean filtering, and contra-harmonic mean filtering methods are discussed.
The equations for these are given as follows:
Equation 7.9 (arithmetic mean filter):
output_img(x, y) = (1 / (m·n)) · Σ_{(i,j) ∈ nbd} noisy_img(i, j)
Equation 7.10 (geometric mean filter):
output_img(x, y) = [ Π_{(i,j) ∈ nbd} noisy_img(i, j) ]^(1/(m·n))
Equation 7.11 (harmonic mean filter):
output_img(x, y) = (m·n) / Σ_{(i,j) ∈ nbd} ( 1 / noisy_img(i, j) )
Equation 7.12 (contraharmonic mean filter):
output_img(x, y) = Σ_{(i,j) ∈ nbd} noisy_img(i, j)^(Q+1) / Σ_{(i,j) ∈ nbd} noisy_img(i, j)^Q
Equations 7.9 to 7.12 are similar in form. nbd represents the neighborhood of the pixel under consideration, of some predefined size m×n. For Q = 0, the contra-harmonic filter becomes an arithmetic mean filter, and for Q = −1, it becomes a harmonic mean filter.
All the above filters try to remove the noise by using its statistical properties
from a given neighborhood. However, because of different forms, some
filters work better than others in different cases and for different noise
distributions. For example, an arithmetic mean filter removes noise but
introduces blurring. The geometric mean filter does the same thing, but the
blurring is less. The harmonic mean filter works well with salt noise but fails
with pepper noise. It does well with other types of noise though.
The result of applying these equations is shown in Figure 7.13 with the code
for generating the results in Code 7.4:

Figure 7.13: Classical noise removal methods


The code for the same is given below:
001-
#=====================================================
=================
002- # PURPOSE : Learning Classical Noise Removal Methods
003-
#=====================================================
=================
004- import cv2,matplotlib.pyplot as plt, numpy as np
005- from skimage.util import random_noise
006- import my_package.my_functions as mf # This is a user defined
package and ...
007- # one may find the details related to its contents and usage in section
2.7.3
008-
009- #----------------------------------------------------------------------
010- # Importing and normalizing the image to the range [0,1]
011- #----------------------------------------------------------------------
012- input_img=np.float64(cv2.imread('img20.bmp',0))
013- img=input_img/255
014- # Pixel values above are normalized to range 0 to 1
015- # This is the requirement of 'random_noise' function imported above
016-
017- #----------------------------------------------------------------------
018- # Choosing noise to be added and adding noise
019- #----------------------------------------------------------------------
020- method_name='Gaussian'
021- noisy_img = random_noise(img,
mode='Gaussian',mean=0,var=.01,clip=False)
022- # clip=False in above line because if it were true, the noise will be
023- # a mixture of Gaussian and Salt-and-Pepper Noise
024- noisy_img=mf.norm_uint8(noisy_img)
025- noisy_img[np.where(noisy_img==0)]=1
026- noisy_img=np.float64(noisy_img)
027-
028- #----------------------------------------------------------------------
029- # Choosing the neighborhood size for processing
030- #----------------------------------------------------------------------
031- filter_size=3 # Keep this odd
032- half_filter_size=(filter_size-1)/2
033-
034- #----------------------------------------------------------------------
035- # Arithmetic Mean Filtering
036- #----------------------------------------------------------------------
037- output_img_Arithmetic=np.float64(np.zeros_like(input_img))
038- # Above is an array of same size as input_img filled with all zeros
039- r,c=np.shape(output_img_Arithmetic)
040- for i in np.arange(half_filter_size,r-half_filter_size,1):
041- for j in np.arange(half_filter_size,c-half_filter_size,1):
042- output_img_Arithmetic[np.int16(i),np.int16(j)]=np.average(\
043- noisy_img[np.int16(i-half_filter_size):\
044- np.int16(i+half_filter_size+1),\
045- np.int16(j-half_filter_size):\
046- np.int16(j+half_filter_size+1)])
047-
048- #----------------------------------------------------------------------
049- # Geometric Mean Filtering
050- #----------------------------------------------------------------------
051- output_img_Geometric=np.float64(np.zeros_like(input_img)) # array
of same size as input_img
052- # filled with all zeros
053- r,c=np.shape(output_img_Geometric)
054- for i in np.arange(half_filter_size,r-half_filter_size,1):
055- for j in np.arange(half_filter_size,c-half_filter_size,1):
056- output_img_Geometric[np.int16(i),np.int16(j)]=np.prod(\
057- noisy_img[np.int16(i-half_filter_size):\
058- np.int16(i+half_filter_size+1),\
059- np.int16(j-half_filter_size):\
060- np.int16(j+half_filter_size+1)])
061-
output_img_Geometric=np.power(output_img_Geometric,1/(filter_size**2))
062-
063- #----------------------------------------------------------------------
064- # Harmonic Mean Filtering
065- #----------------------------------------------------------------------
066- output_img_Harmonic=np.float32(np.zeros_like(input_img)) # array of
same size as input_img
067- # filled with all zeros
068- r,c=np.shape(output_img_Harmonic)
069- for i in np.arange(half_filter_size,r-half_filter_size,1):
070- for j in np.arange(half_filter_size,c-half_filter_size,1):
071- output_img_Harmonic[np.int16(i),np.int16(j)]=np.sum(\
072- 1/noisy_img[np.int16(i-half_filter_size):\
073- np.int16(i+half_filter_size+1),\
074- np.int16(j-half_filter_size):\
075- np.int16(j+half_filter_size+1)])
076- output_img_Harmonic[np.where(output_img_Harmonic==0)]=1
077- output_img_Harmonic=(filter_size**2)/output_img_Harmonic
078-
079- #----------------------------------------------------------------------
080- # Contraharmonic Mean Filtering
081- #----------------------------------------------------------------------
082- Q=5 # Order of contraharmonic filter
083- output_img_Contraharmonic=np.zeros_like(input_img) # array of same
size as input_img
084- # filled with all zeros
085- r,c=np.shape(output_img_Contraharmonic)
086- noisy_img=np.float32(noisy_img)
087- for i in np.arange(half_filter_size,r-half_filter_size,1):
088- for j in np.arange(half_filter_size,c-half_filter_size,1):
089- output_img_Contraharmonic[np.int16(i),np.int16(j)]=np.sum(\
090- np.power(noisy_img[np.int16(i-half_filter_size):\
091- np.int16(i+half_filter_size+1),\
092- np.int16(j-half_filter_size):\
093- np.int16(j+half_filter_size+1)],Q+1))/\
094- np.sum(\
095- np.power(noisy_img[np.int16(i-half_filter_size):\
096- np.int16(i+half_filter_size+1),\
097- np.int16(j-half_filter_size):\
098- np.int16(j+half_filter_size+1)],Q))
099- #----------------------------------------------------------------------
100- # Displaying the results
101- #----------------------------------------------------------------------
102- fig,ax=plt.subplots(2,3)
103- fig.show()
104- mf.my_imshow(mf.norm_uint8(input_img),'(a) Grayscale
image',ax[0,0])
105- mf.my_imshow(mf.norm_uint8(noisy_img),'(b) Noisy image
('+str(method_name)+')',ax[0,1])
106- mf.my_imshow(mf.norm_uint8(output_img_Arithmetic),'(c)
Arithmetic mean filtered',ax[0,2])
107- mf.my_imshow(mf.norm_uint8(output_img_Geometric),'(d) Geometric
mean filtered',ax[1,0])
108- mf.my_imshow(mf.norm_uint8(output_img_Harmonic),'(e) Harmonic
mean filtered',ax[1,1])
109- mf.my_imshow(mf.norm_uint8(output_img_Contraharmonic),'(f)
Contraharmonic mean filtered',ax[1,2])
110-
111- plt.show()
112- print("Completed Successfully ...")
Code 7.4: Classical noise removal methods coding
It is worth noting about Code 7.4 that the arithmetic mean filter code is equivalent to the box filtering introduced earlier in Section 4.8.1; that earlier implementation was much more computationally efficient. Similarly, the geometric mean filter can be thought of as an arithmetic mean filter applied to the logarithm of the image instead of the original image, followed by exponentiation of the result. This too enables a more computationally efficient implementation.
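A sketch of this more efficient formulation is given below: take the box filter (arithmetic mean) of the logarithm of the image and exponentiate the result. The input file stands in for the noisy image of Code 7.4, and zero pixels are shifted to 1 to avoid log(0), as done there:
import cv2, numpy as np

noisy = np.float64(cv2.imread('img20.bmp', 0))        # stand-in for the noisy image of Code 7.4
noisy[noisy == 0] = 1                                 # avoid log(0)
geometric = np.exp(cv2.blur(np.log(noisy), (3, 3)))   # mean of the log, then exponentiate back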
7.7 Adaptive filtering
The noise removal methods studied so far are global in nature: every pixel was treated in the same way by the algorithm. However, depending on the
statistics of the region, this should change so that better results can be
obtained. To elaborate the previous statement, a region with almost constant
intensity should be treated differently as compared to region with dominant
texture when it comes to noise removal – so the filter should accordingly
change its form for these different regions. This is called adaptive filtering.
One form of adaptive local noise reduction filter is shown below:
Equation 7.13:
output_img(i, j) = noisy_img(i, j) − (σn² / σL²) · [ noisy_img(i, j) − μL ]
where σn² is the noise variance. This parameter is usually unknown for a noisy image if the noise is due to unknown sources, but it can be estimated by the procedure illustrated in Section 7.4. σL² and μL are the variance and mean of the local neighborhood of shape m×n under consideration. Refer to the following figure for a better understanding:
Figure 7.14: Adaptive filtering vs. arithmetic mean filtering

μL tells us the average intensity of the local neighborhood, and σL² is related to the contrast of the neighborhood. If σn² = 0, the filter should return the value of the pixel as it is. If σL² is high relative to σn², a value close to noisy_img(i,j) should be returned; such a region is associated with high-frequency content like edges, sharp boundaries, and corners, and must be retained as is. If the two variances are equal, an arithmetic mean is desired as the output. This situation occurs when the neighborhood of the current pixel has the same properties as the overall image, and hence averaging is a good way to remove the noise.
The result of applying Equation 7.13 is shown in Figure 7.14. The results
can be obtained by a code like Code 7.4 with the necessary modifications as
dictated by Equation 7.13. One can observe that local adaptive filtering
achieves better results.
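A vectorized sketch of Equation 7.13 is given below. It uses box filters to obtain the local mean and variance; the noise variance sigma_n2 is an assumed value (in practice it would be estimated as in Section 7.4), and the ratio is clipped at 1, a common safeguard for neighborhoods where the local variance falls below the noise variance:
import cv2, numpy as np

noisy = np.float64(cv2.imread('img20.bmp', 0))   # stand-in for the noisy image
m, n = 7, 7                                      # neighborhood size (m x n)
sigma_n2 = 225.0                                 # assumed (or estimated) noise variance

mu_L = cv2.blur(noisy, (n, m))                                 # local mean
sigma_L2 = cv2.blur(noisy ** 2, (n, m)) - mu_L ** 2            # local variance
ratio = np.minimum(sigma_n2 / np.maximum(sigma_L2, 1e-6), 1)   # clip the ratio at 1
restored = noisy - ratio * (noisy - mu_L)                      # Equation 7.13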

Conclusion
Noise is an unwanted phenomenon in images which cannot be completely
removed. It can, however, be minimized. In this chapter, some methods of
identifying noise distribution were presented. Also, using PSNR, one can
quantify the noise content in the image. Depending on the type of noise
added, one could select a noise removal (minimization) method to remove
the given noise. This chapter enabled the user to take an informed decision
on which method to choose. In the next chapter, multiresolution image
processing is introduced where the topic of noise minimization is further
taken up as an application.

Points to remember
• Noise, once added, cannot be completely removed. It can only be
minimized using various methods.
• Noise can have different distributions depending on how it is generated.
• Salt and pepper noise requires median filtering, which is non-linear in
nature.
• PSNR does not always correlate with the perceived visual quality of an image.
• Local noise removal methods like adaptive noise removal work better
than global methods.
• For Q = 0, a contra-harmonic filter becomes an arithmetic mean filter, and for Q = −1, it becomes a harmonic mean filter.

Exercises
1. How is noise different from degradation?
2. Can non-linear filtering be achieved by the process of convolution?
Give reasons to support your answer.
3. If Gaussian noise has a finite non-zero mean, what will be its effect on
the appearance of the image?
4. If PSNR is low for an image, what can you conclude about the
appearance of that image?
5. Import an image using Python, add Gamma distributed noise, and
denoise the image so formed. Calculate the PSNR in both cases, i.e.,
noisy and noise minimized case.

Join our book’s Discord space


Join the book's Discord Workspace for Latest updates, Offers, Tech
happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com

OceanofPDF.com
CHAPTER 8
Wavelet Transform and Multi-
resolution Analysis

8.1 Introduction
This chapter is an informal introduction to the theory of wavelet transforms
and multi resolution analysis. We will try to understand what resolution is
and how, in traveling from one resolution to another, some information becomes more or less important. We will then point out the limitations of the
Fourier family of transforms, and from there, we will try to understand
wavelet transforms. Finally, we will try to put wavelet transform to use by
discussing an application on noise removal.

Structure
The following topics are covered in this chapter:
• Resolution and its impact on frequency content
• Time vs. frequency resolution
• Loss of location information in frequency domain
• Short time Fourier transform
• Concept of scale and scalogram
• Continuous wavelet transform
• Discrete wavelet transform
• Multi resolution analysis using wavelets
• Noise removal using multi resolution analysis

Objectives
After reading this chapter, the reader will be able to appreciate the
importance of resolution (or scale) in noise calculations. Wavelets are
introduced to break an image at multiple scales and do processing
individually. This will enable the reader to understand the importance of the
information contained in the image at a given resolution.

8.2 Resolution and its impact on frequency content


Consider the following scenario. There is an image, and two experiments are
performed on it. In the first case, we down sample it by taking every 3rd
pixel (this number 3 is arbitrary). In the second case, the same thing is done,
but just before down sampling, the image is lowpass filtered. In which case
does one expect the result to be better? The obvious answer seems to be the first one, because information is lost only through down sampling, whereas in the second case information is lost through low-pass filtering as well as down sampling. This obvious answer, however, turns out to be wrong. To understand why, see Figure 8.1:
Figure 8.1: Relation between resolution and frequency
In part (a) of this figure, the original grayscale image is shown. It is down
sampled by a factor of 3 in part (b). Note that although the images in part
(a) and (b) appear to be of same size, they are not. For display purposes, they
have been shown to be of equal size by stretching. One may note their
shapes written alongside (in the title of respective sub-figures). Part (c)
shows the result of low pass filtering the image in part (a). In part (d), the
image in part (c) is down sampled by the same size as it was for part (b). So,
experiment 1 is conversion of part (a) to (b) by simple down sampling, and
experiment 2 is the conversion of (a) to (c) and then to (d) – i.e., low pass
filtering followed by down sampling.
Now, compare the result images in part (b) and (d). Image in part (d) is
visually better – this is against the logic we posed in answering the question
above! The reason lies in resolution and the capacity of resolution to depict
certain frequencies. Let us first understand what resolution is. For a given
area to be photographed, if the number of pixels is larger, we are at a higher resolution. For example, in all the parts of Figure 8.1, the same area has been photographed, but parts (a) and (c) have larger shapes (a greater number of rows and columns, i.e., a greater number of pixels per unit area). Hence, they are at a higher resolution as compared to parts (b) and (d). A higher resolution has
a capacity to capture finer details (high frequency regions or regions where
there are edges). However, a lower resolution cannot have them. In fact,
whenever we move from higher to lower resolution, we must remove the
high frequency content of the image before down sampling – simply because
lower resolution cannot depict it properly. That is what has been done to obtain part (d): the image is first low-pass filtered (giving part (c)) and then down sampled. This is precisely the reason for its better appearance.
From the above discussion, it becomes clear that there exists a direct relation
between resolution and frequency. The higher the resolution, the higher the frequency content it can depict (low-pass content, of course, can be depicted too).
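The two experiments described above can be reproduced with a few lines; in this sketch a Gaussian blur stands in for a generic low-pass filter, and the factor 3 is the same arbitrary choice as before:
import cv2, numpy as np

img = cv2.imread('img2.bmp', 0)

# Experiment 1: plain down sampling by keeping every 3rd pixel
down_plain = img[::3, ::3].copy()

# Experiment 2: low-pass filter first, then the same down sampling
blurred = cv2.GaussianBlur(img, (7, 7), 0)
down_filtered = blurred[::3, ::3].copy()

cv2.imwrite('down_plain.png', down_plain)        # compare the two saved results visually
cv2.imwrite('down_filtered.png', down_filtered)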

8.3 Time vs. frequency resolution


Figure 8.2 shows three Fourier transform pairs. The first row has a sine
signal for the entire duration, the second row has the same sine frequency
but for a shorter duration of time, and the third row has the same sine signal but for the shortest duration of time:

Figure 8.2: Time vs. frequency resolution (every row is a Fourier transform pair)
Since the sine frequency is same in all the three rows, one would expect the
same Fourier transform; however, that is not the case. In the first row, we see
an ideal representation. When the signal is truncated in time in the second
row, the peaks in the frequency domain remain at the same place, but two
changes are observed. First, some higher frequency components also appear
in frequency domain – this is due to abrupt truncation in time domain.
Secondly, the amplitude of peaks in the frequency domain is reduced from
0.5 to some smaller value. This is due to the principle of conservation of
energy. The energy of higher frequency components that have started to
appear now because of abrupt truncation comes from the main peak’s
energy.
In the third row, these two effects are more pronounced. Additionally, one may note
that instead of having a pinpointed frequency spike in the frequency domain,
the base has thickened for the same reasons discussed above.
Having made these observations, it is important to note this interpretation –
whenever we pinpoint in frequency, time resolution becomes poor and vice
versa. Look at the first row of the figure again; by pinpointing in frequency,
we mean the presence of two pinpointed spikes in a double-sided magnitude
spectrum. By poor time resolution, we mean that the entire time duration is
selected (the signal is not truncated or pinpointed). As against this, look at
the third row; there, we pinpoint in time (i.e., a very small portion of sine
signal is selected as compared to the first row) – due to this, the frequency
spikes have broadened – i.e., frequency resolution has become poor. So,
there exists a tradeoff between time and frequency resolution – one cannot
have both good at the same time.
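The effect can be reproduced with the short sketch below, which mirrors the three rows of Figure 8.2: the same 10 Hz sine is kept for the full duration and then for progressively shorter durations (the exact fractions used here are arbitrary):
import numpy as np, matplotlib.pyplot as plt
import scipy.fft as sfft

Fs, L = 100, 1000                                # sampling frequency and number of samples
t = np.arange(L) / Fs
f_axis = np.linspace(-Fs / 2, Fs / 2, L, endpoint=False)

fig, ax = plt.subplots(3, 2)
for row, keep in enumerate([1.0, 0.3, 0.05]):    # fraction of the duration over which the sine is kept
    sig = np.sin(2 * np.pi * 10 * t)
    sig[int(keep * L):] = 0                      # truncate the sine in time
    ax[row, 0].plot(t, sig)                                              # time domain
    ax[row, 1].plot(f_axis, np.abs(sfft.fftshift(sfft.fft(sig) / L)))    # 2-sided magnitude spectrum
plt.show()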
Having understood this important concept, let us now see another important
topic on the limitation of what Fourier family of transforms cannot do in the
coming section.

8.4 Loss of location information in frequency domain


Look at Figure 8.3. It simply shows the Fourier transform pair of a signal.
However, in the left column of the figure, the constituent parts of the signal
in the time domain are shown. sig1 has the lowest frequency and sig4 has the
highest:
Figure 8.3: Magnitude spectrum of a stationary signal
The signal shown in time domain in Figure 8.3 is stationary in nature. This
means that signal properties like mean, variance etc., will remain constant
with respect to time. This can also be seen directly from the fact that the
signal is periodic. If only the magnitude spectrum is shown to us, we will only be able to tell that four frequency components are present. We cannot answer this question – where are those frequency components present in the
time domain (their localization)? As observed from the figure in the current
case, every frequency component is present everywhere. To contrast this,
look at Figure 8.4. Here, the constituent signals (shown on left) have the
same frequencies but are limited in time:
Figure 8.4: Magnitude spectrum of a non- stationary signal
Due to this, the signal in time domain looks different. Also, its statistical
properties like mean, variance, etc., change over time due to this obvious
difference in construction as compared to Figure 8.3. However, it is
important to note that the magnitude spectrum shown in Figure 8.4 has the
same form as in the previous case – it has peaks at four frequencies, which
are its constituents. The other frequencies also have non-zero but very small
values because there is an abrupt truncation from one frequency to another at
a point in time in the non-stationary signal, as can be seen from the figure
(the reason for this was explained in the previous section). The highest magnitude in the spectrum is not exactly 1, but it is approximately the same. In
effect, apart from a scale factor, the magnitude spectrum remains the same
for both stationary and non-stationary signals. This is expected as we have
already stated that the information of localization (i.e., where the frequency
component is present in the time domain) is lost in frequency domain.
However, we would like to have a situation where we can ask the two
questions simultaneously – What frequencies are there in my signal? and
where are they present in the signal in time domain? We want to do so
because while filtering out the signal, if it is known where the frequency
component that we want to filter out is located, we could apply a local
operation instead of a global one. To answer such questions simultaneously, we need to learn about the short time Fourier transform and the wavelet transform. In the next section, we informally introduce the short time Fourier transform (STFT).

8.5 Short time Fourier transform


To solve the problem of loss of information of localization in frequency
domain, STFT was introduced. The idea here is to answer the two questions
simultaneously – What frequencies are present in a given signal, and where
are they present? This is achieved by taking the Fourier transform (FFT for
discrete time case) of the signal in parts instead as a whole. To understand
this procedure, refer to Figure 8.5:

Figure 8.5: STFT for small window size. In part (b), darker regions have higher amplitudes
Part (a) of Figure 8.5 is a non-stationary signal comprised of different
frequencies – 40, 30, 20, and 10 hertz in various parts (in order). Part (b) of
the figure shows STFT of the signal in part (a). Note that for a 1D signal,
STFT will be 2D. The way we take STFT is simple – Take a window (as
shown in part (a) of the figure). The output will be computed every time.
This can be noted by the fact that the X axis of both parts is time. For a
given point in time, to calculate the output, place that window on the signal
such that it is centered at that point. Now, whatever portion is inside the
window, take its FFT, and that becomes the output for that point. Since the
output will also be a 1D signal (having both magnitude and phase), we plot
the magnitude only as a column at corresponding time in the output. Note
that we are plotting a 2-sided spectrum. That is why there are two horizontal
spikes for the first part of signal in the output at -40 and 40. Also note that in
part (b) of the figure, the darker the region, the higher is the magnitude. The
same explanation is valid for the other three regions as well. Now, looking at
part (b), both questions that we began with can be answered simultaneously.
Time axis in STFT is the answer of where and frequency axis in STFT is the
answer of what frequencies are present.
Mathematically, continuous time STFT is calculated using the following
equation:
Equation 8.1:
STFT(τ, ω) = ∫ x(t) · W(t − τ) · e^(−jωt) dt
where τ and ω are, respectively, the time and frequency parameters of the STFT plot, and W(·) is the window function used. The discrete time STFT is represented
below:
Equation 8.2:
STFT(m, ω) = Σ_n x[n] · W[n − m] · e^(−jωn)
To answer the question of what window size should be used, let us look at
Figure 8.6. It shows the same results as Figure 8.5 but with a larger window size. Note the effect this change has on the STFT. Two changes are evident –
the first is the thinning of the horizontal spikes. The reason for this is the
duality between time and frequency domain. Roughly stated, if a signal is
thick in the time domain, it will be thin in frequency domain and vice versa.
Second, at the transition point where one frequency truncates and other
frequency begins in time, STFT shows a sharp transition in Figure 8.5 but
not in Figure 8.6. It appears that two frequencies are present at the transition
time – but that is not the truth. This is because when a thick window is
placed at the transition boundary in the time domain, it will have some part
of both signals. However, for a thin window, this effect will be minimized –
but never void. This can be seen in part (b) of Figure 8.5, as there is a small
region of overlap. However, in part (b) of Figure 8.6, this effect is
prolonged:

Figure 8.6: STFT for larger window size. In part (b), darker regions have higher amplitudes
The same can be noted from Figure 8.7, which has a 3D view of STFT for
the previous two cases discussed.
Figure 8.7: Three-dimensional view of STFT for small (Figure 8.5) vs. larger (Figure 8.6) window
size
To generate outputs like the ones discussed above, use the following code:
01- #======================================================================
02- # PURPOSE : Calculating and plotting STFT of a signal
03- #======================================================================
04- import cv2, matplotlib.pyplot as plt, numpy as np
05- import scipy.fft
06- import scipy.signal
07- from mpl_toolkits.mplot3d import Axes3D
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- #-----------------------------------------------------------------
12- # Setting parameters for signal construction
13- #-----------------------------------------------------------------
14- Fs=100 # Sampling Frequency
15- T=1/Fs # Sampling interval
16- L=1001 # Total no. of samples in signal
17- n=np.linspace(0,L-1,L) # Index of time
18- t=n*T # Time axis
19-
20- #-----------------------------------------------------------------
21- # Constructing a non-stationary signal in discrete time
22- #-----------------------------------------------------------------
23- cut1=np.int16(np.floor(L/4))
24- sig1=4*np.sin(2*np.pi*(40)*t[0:cut1])
25- sig2=3*np.sin(2*np.pi*(30)*t[cut1:2*cut1])
26- sig3=2*np.sin(2*np.pi*(20)*t[2*cut1:3*cut1])
27- sig4=1*np.sin(2*np.pi*(10)*t[3*cut1::])
28- sig=np.hstack((sig1,sig2,sig3,sig4)) # Non stationary signal
29-
30- #-----------------------------------------------------------------
31- # Plotting the non-stationary signal in time domain
32- #-----------------------------------------------------------------
33- fig,ax=plt.subplots(2,1)
34- fig.show()
35- ax[0].plot(t,sig,color='k')
36- ax[0].grid()
37- ax[0].set_title("(a) Input Signal",fontsize=12)
38- ax[0].set_xlabel("t",fontsize=12)
39- ax[0].set_ylabel("Amplitude",fontsize=12)
40-
41- #-----------------------------------------------------------------
42- # Plotting the magnitude spectrum of non-stationary signal (FFT)
43- #-----------------------------------------------------------------
44- freq_axis2=np.linspace(-(L-1)/2,(L-1)/2,L)/L
45- fft_sig=scipy.fft.fftshift(scipy.fft.fft(sig/L))
46- ax[1].plot(freq_axis2*Fs,np.abs(fft_sig),color='k')
47- ax[1].grid()
48- ax[1].set_title("(b) 2 Sided Magnitude Plot (FFT)",fontsize=12,color='k')
49- ax[1].set_xlabel("F",fontsize=12)
50- ax[1].set_ylabel("Magnitude",fontsize=12)
51-
52- #-----------------------------------------------------------------
53- # Plotting the magnitude spectrum of non-stationary signal (STFT)
54- #-----------------------------------------------------------------
55- fig2,ax2=plt.subplots()
56- f, t, Zxx = scipy.signal.stft(sig,Fs,window=np.ones(30),nperseg=30)
57- amp=np.max(np.abs(Zxx))
58- ax2.pcolormesh(t, f, np.abs(Zxx),cmap='gray_r')
59- ax2.set_title('STFT (Single Sided Spectrum)')
60- ax2.set_ylabel('Frequency [Hz]')
61- ax2.set_xlabel('Time [sec]')
62- ax2.grid()
63-
64- #-----------------------------------------------------------------
65- # Magnitude spectrum of non-stationary signal (STFT in 3D)
66- #-----------------------------------------------------------------
67- fig3d = plt.figure()
68- ax3d = fig3d.add_subplot(111, projection='3d')
69- T, F = np.meshgrid(t, f)
70- ax3d.plot_surface(T, F, 2*np.abs(Zxx), cmap='gray_r', edgecolors='k')
71- # In above, multiplication by 2 is due to single sided spectrum
72- ax3d.set_title('3D STFT Magnitude (SINGLE SIDED)')
73- ax3d.set_xlabel('Time [sec]')
74- ax3d.set_ylabel('Frequency [Hz]')
75- ax3d.set_zlabel('Magnitude')
76-
77- plt.show()
78- print("Completed Successfully ...")
Code 8.1: Inbuilt method for calculating and plotting STFT in Python
The output of the preceding code is shown in the following three figures:
Figure 8.8: Output 1 of Code 8.1 - non-stationary signal and its double sided FFT (magnitude plot)
The following figure shows the STFT view:

Figure 8.9: Output 2 of Code 8.1 - STFT Top view (single sided magnitude spectrum)
Refer to the following figure to understand STFT magnitude:
Figure 8.10: Output 3 of Code 8.1 - 3D view of STFT (single sided magnitude spectrum)
Most of Code 8.1 is self-explanatory. Line no. 56, which is f, t, Zxx =
scipy.signal.stft(sig,Fs,window=np.ones(30),nperseg=30), calculates the STFT.
There are three outputs: frequency, time, and a 2D matrix of coefficients
(the STFT). The inputs are the signal, the sampling frequency, and then a window. In our
example, a rectangular window was used, but it could be any window
you please. If no window is specified, it defaults to the Hann window, with the
total number of samples per segment specified by the fourth argument nperseg. Note
that a rectangular window is usually not the preferred choice because, as already
discussed, abrupt truncation in the time domain causes ringing (fringes) in the
frequency domain. There are other window shapes too, each with its own
advantages and disadvantages, but a detailed study of window shapes is out
of the scope of this book. We recommend playing with line no. 56
to see the effect of changing the window size and type.
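As a quick, hedged sketch of that experiment (it assumes the signal sig and the sampling frequency Fs created in Code 8.1 are already in scope), the window type and length can be varied as follows:

import numpy as np
import scipy.signal

# Rectangular window of 30 samples, as in Code 8.1
f1, t1, Z1 = scipy.signal.stft(sig, Fs, window=np.ones(30), nperseg=30)
# A smooth (Hann) window of the same length reduces the truncation fringes
f2, t2, Z2 = scipy.signal.stft(sig, Fs, window='hann', nperseg=30)
# A longer window gives thinner frequency spikes but blurrier time localization
f3, t3, Z3 = scipy.signal.stft(sig, Fs, window='hann', nperseg=120)

Plotting np.abs(Z2) and np.abs(Z3) with pcolormesh, exactly as in Code 8.1, makes the trade-off between time and frequency resolution visible.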
Having seen the effect of window size in STFT, in the next section, we
introduce the concept of scale which will help us determine the window size
to be used.
8.6 Concept of scale and scalogram
To understand what we mean by scale, look at Figure 8.11. In each row of
the figure, the first column has the original signal, which has two different
frequency components – high and low, localized in time at different
(consecutive) locations. Also shown in the same column is a window
function (dark in color). The second column for each row shows the output
of convolution between the original signal and window function in column
no. 1. The third column is simply the two-sided magnitude spectrum of the
second column.
If the window size is small (for convolution and not for FFT as in STFT), we
call it low scale and if it is large, we call it high scale. Column no. 1 has low
to high scale windows from top to bottom.

Figure 8.11: Illustration for understanding the concept of scale


From the first row, we can make the following observations: since the window
width is small, the scale is low. The result of convolution is practically the same as the
original signal – i.e., columns 1 and 2 contain the same signal. In column no. 3,
the magnitude spectrum also reflects the same fact. This means that at low
scales, both high and low frequency components are retained in the
output. Remember that this was true for high resolution data (as per our
discussion in Section 8.2). Scale and frequency appear to be inversely related
from the above discussion. We will explore more about this in the coming
discussion.
In the second row of Figure 8.11, the scale is increased by some amount (the
window becomes wider compared to the first row). The result suppresses the
high frequency component, as noted from columns 2 and 3 of row 2. This
indicates that as we increase the scale (or, equivalently, lower the resolution),
the high frequency component starts to vanish, but the low frequency
component remains robust (negligible change).
From row no. 3, we can see that when the scale is sufficiently high (or the
resolution is sufficiently low), high frequencies vanish completely, but low
frequencies still remain practically unaffected. Clearly, scale and resolution
show an inverse dependence.
The second column of Figure 8.11 can be called a discrete scalogram (though
not formally). Note that in Figure 8.11, we had three rows in which the scale was
increased in steps. It would be great if we could increase the scale
gradually and stack column 2's results one over the other, viewed from the
top. We would then get the plot shown in part (b) of Figure 8.12 for the
signal of part (a) in the same figure. It is a two-dimensional representation (a
2D array) of the original signal, providing the scale and the signal content at
each scale.
The signal in part (a) has four different frequencies in four different regions.
In the scalogram of part (b), the topmost row represents the lowest scale
(smallest window width). So, the full signal is present as it is in the
output. Hence, the first row is the signal itself seen from the top, with white
representing hills and black representing valleys.
Figure 8.12: Scalogram
As we increase the scale (coming down the rows in the scalogram), the high
frequency components start to disappear, and the low frequency components
remain robust (i.e., change very little). This kind of representation is called a
scalogram or waterfall plot (as it looks like one). Also note that instead of
plotting time on the X-axis, we have plotted the index of time (as the
signal is discrete time), also called translation in wavelet terminology. Even
for continuous signals, time is referred to as translation in wavelet
terminology.
We are now in a position to answer the question about the window size being
small or large. We will choose the window size proportional to (as dictated by)
the scale. If this is understood, it will be easier to understand the continuous wavelet
transform in the next section. Readers are encouraged to reproduce the results
presented in Figure 8.12 using Python code; a minimal sketch is given below.
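One possible (informal) way to do this, not taken from the book's listings, is to convolve the signal with box windows of increasing width and stack the magnitudes row-wise; the signal parameters below simply mimic Figure 8.12:

import numpy as np
import matplotlib.pyplot as plt

Fs = 100; T = 1 / Fs; L = 1001
t = np.arange(L) * T
cut = L // 4
sig = np.hstack((np.sin(2*np.pi*40*t[:cut]),
                 np.sin(2*np.pi*20*t[cut:2*cut]),
                 np.sin(2*np.pi*10*t[2*cut:3*cut]),
                 np.sin(2*np.pi*5*t[3*cut:])))

rows = []
for width in np.arange(1, 40):            # small scale (top) to large scale (bottom)
    box = np.ones(width) / width          # box window whose width grows with the scale
    rows.append(np.convolve(sig, box, mode='same'))
scalogram = np.abs(np.array(rows))        # one row per scale, viewed from the top

plt.imshow(scalogram, aspect='auto', cmap='gray_r')
plt.xlabel("Translation (index of time)"); plt.ylabel("Scale (window width)")
plt.title("Informal box-window scalogram")
plt.show()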

8.7 Continuous wavelet transform


To understand the wavelet transform, we refer to Figure 8.13 and compare it
with the scalogram of Figure 8.12. First, note that both figures have the
same input signal. In Figure 8.12, the X axis was the index of time; in
Figure 8.13, it is time. However, that should not matter if the context is
understood (it could be time for the continuous case and index of time for the
discrete case). The difference between the scalogram and the wavelet transform is
that in the scalogram, all frequencies appear in the output at lower scales, whereas in the
wavelet transform, only high frequencies are seen at lower scales. This is
because at lower scales (higher resolution), we want to pay attention to high
frequencies only, as they are present (well localized) there only, and if we
increase the scale (lower the resolution), they will vanish. Low frequencies
can be dealt with at higher scales, as they are robust to an increase in scale.

Figure 8.13: Continuous wavelet transform illustration


To convert a scalogram to a wavelet transform, all we need to do at higher
resolutions (lower scales) is filter out (remove) the low frequency components; at
lower resolutions (higher scales), the high frequency components do not need
explicit handling, as they are automatically removed (they cannot survive there).
Instead of using a box window, we will use something called a wavelet, and that will do the job. So,
before looking at the structure of wavelets, their properties, and the
mathematics behind them, let us have some useful insights in place. Look at
the following equation for non-negative integers m and n:
Equation 8.3:

$$\int_{-\pi}^{\pi}\sin(mx)\,\sin(nx)\,dx=\begin{cases}0, & |m|\neq|n|\\ \pi, & |m|=|n|\neq 0\end{cases}$$
The preceding equation is a litmus test for finding the frequency of interest.
Assume (in a thought experiment) that we have a signal sin(mx) for which we do
not know the value of m. We have a device with which we can evaluate the
integral above (i.e., we have full control over the value of n). To find the
correct value of m, we substitute different values of n. For almost all
values, we get 0 as the answer, but if |m| = |n|, the answer is non-zero. This
means we have identified the frequency. Now, look at the following equation
of the continuous time Fourier transform, which is reproduced here for
convenience:
Equation 8.4:

$$X(F)=\int_{-\infty}^{\infty}x(t)\,e^{-j2\pi Ft}\,dt$$

Here, instead of sin(nx), we have full control over $e^{-j2\pi Ft}$ (again a
combination of sin and cos, by Euler's formula). We try to identify
whether a frequency F exists in x(t) or not. This is the intuition behind the
working of the Fourier transform. If F is present in x(t), X(F) will be non-zero;
otherwise, it will be 0.
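A small numerical check of this litmus test (a sketch, using a simple Riemann sum rather than a symbolic integral) can be written as:

import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
m = 3                                   # the 'unknown' frequency of the test signal sin(mx)
for n in (1, 2, 3, 4, 5):
    val = np.sum(np.sin(m * x) * np.sin(n * x)) * dx   # approximate integral over [-pi, pi]
    print(f"n = {n}: integral ~ {val:.4f}")
# Only n = m (= 3) gives a non-zero value (close to pi); every other n gives ~0.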
Now, like sin(nx) or $e^{-j2\pi Ft}$, there are many other terms that can be used to
do the same. One of them happens to be the wavelet. Unlike the former two, it is
finite in extent; sines and complex exponentials are infinite in extent. Let us define
the continuous wavelet transform mathematically in the equation given below,
and then we will talk about it.
Equation 8.5:

$$W(s,t)=\frac{1}{\sqrt{s}}\int_{-\infty}^{\infty}f(x)\,\psi^{*}\!\left(\frac{x-t}{s}\right)dx$$

In the above equation, f(x) is the function that we are testing for the presence
or absence of a certain localized frequency (vibration) represented by ψ(x) –
the mother wavelet – and $\psi\!\left(\frac{x-t}{s}\right)$ is its translated and scaled version. An
important point to note is that by using the ψ function, we do not just look for the
presence (or absence) of the frequency represented by it, but also where it is
in the tested signal. This where is accounted for by t, i.e., the translation
parameter. We also talked about the window size being proportional to scale in the
earlier section; the s parameter here takes care of that. Like the Fourier
transform, the wavelet transform is also an integral transform that finds
frequencies in the signal, but it additionally discloses the location of those
frequencies in the signal. To see what a wavelet looks like, let us see an
example in Figure 8.14, which shows one of the many available wavelets:

Figure 8.14: Morlet wavelet


As can be seen from the figure, wavelets are short vibrations of finite
duration in time (or space). Evidently, because of its wavy structure, a wavelet can find
similar vibrations in the signal. Also, due to its finite duration, it can locate
where it has found that vibration in the signal.
Now, understanding Figure 8.13 is easy. If a suitable wavelet is used at a
given resolution, only vibrations at the corresponding scale will be detected. The
code used to generate Figure 8.13 is given as follows:
01- #======================================================================
02- # PURPOSE : Understanding Continuous Wavelet Transform (CWT)
03- #======================================================================
04- import pywt # Install using – pip install pywavelets
05- import numpy as np
06- import matplotlib.pyplot as plt
07-
08- # Creating Signal and performing CWT
09- Fs=100
10- T=1/Fs
11- L=1001 # Keep this odd for ease of Mathematics
12- n=np.linspace(0,L-1,L)
13- t=n*T
14-
15- cut1=np.int32(np.floor(L/4))
16- sig1=np.sin(2*np.pi*(40)*t[0:cut1])
17- sig2=np.sin(2*np.pi*(20)*t[cut1:2*cut1])
18- sig3=np.sin(2*np.pi*(10)*t[2*cut1:3*cut1])
19- sig4=np.sin(2*np.pi*(5)*t[3*cut1::])
20- y=np.hstack((sig1,sig2,sig3,sig4))
21- scale1=np.arange(1,20,.1)
22- coef, freqs=pywt.cwt(y,scale1,'morl',sampling_period=T)
23-
24- # Plotting Logic
25- fig,ax=plt.subplots(2,1)
26- fig.show()
27- ax[0].plot(t,y,'k')
28- ax[0].grid()
29- ax[0].set_title("Original Signal",fontsize=15)
30- ax[0].set_xlabel("Time")
31- ax[0].set_ylabel("Amplitude")
32- ax[0].set_xlim((0,np.max(t)))
33-
34- ax[1].grid()
35- ax[1].matshow(np.log(1+abs(coef)),cmap='gray')
36- ax[1].set_xlabel("TRANSLATION")
37- ax[1].set_ylabel("SCALE")
38- ax[1].set_title("CWT (Top View)")
39- ax[1].set_aspect('auto')
40- ax[1].set_yticks(np.arange(0, len(scale1), step=20), labels=np.int32(scale1[0:len(scale1):20]))
41- ax[1].set_xticks(np.arange(0, L, step=100), labels=np.int32(t[0:len(t):100]))
42-
43- plt.show()
44- print("Completed Successfully ... ")
Code 8.2: Continuous wavelet transform
You may need to install the PyWavelets package by running pip install
pywavelets, using the process illustrated in Section 2.3. Line no. 22 is
important, which is coef,
freqs=pywt.cwt(y,scale1,'morl',sampling_period=T). It calculates the
coefficients of the wavelet transform. The wavelet used here is the Morlet wavelet.
Although numerous options are available, the selection of the mother
wavelet depends on the application and is out of the scope of this book.
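To experiment further, the PyWavelets package itself can list the wavelets usable with pywt.cwt and return sampled values of a mother wavelet; a minimal sketch (plotting something like Figure 8.14) is:

import pywt
import matplotlib.pyplot as plt

print(pywt.wavelist(kind='continuous'))               # names of wavelets available for CWT
psi, x = pywt.ContinuousWavelet('morl').wavefun(10)   # sampled Morlet mother wavelet
plt.plot(x, psi, 'k')
plt.title("Morlet mother wavelet")
plt.grid()
plt.show()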

8.8 Discrete wavelet transform


Having developed the intuition for the concepts of scale, translation, and the
mother wavelet for the continuous wavelet transform, it is not hard to
understand the following equation, which describes the discrete wavelet
transform (obtained from Equation 8.5 by restricting the scale and translation
to s = 2^j and t = k·2^j):
Equation 8.6:

$$W(j,k)=\frac{1}{\sqrt{2^{j}}}\int_{-\infty}^{\infty}f(x)\,\psi\!\left(\frac{x-k\,2^{j}}{2^{j}}\right)dx$$

where j is the scale parameter and k is the shift parameter, with the mother
wavelet having the impulse response (in continuous time) shown below:
Equation 8.7:

sampled at 1, 2^j, 2^{2j}, …, 2^N.
The factor of 2^j comes into the picture because reducing the scale of the signal
can now be done only in quantized levels (not all continuous scales are possible);
going from one level to the next is equivalent to down sampling the signal by a
factor of 2 – that is why the term 2^j appears. Note that in Equation 8.6, f(x) is
continuous, but k makes the transform discrete.

8.9 Multi resolution analysis using wavelets


The representation in the previous section, which parallels its
corresponding continuous time counterpart, is seldom used directly for
the DWT. The following cascaded filter bank approach is popular when it
comes to the discrete wavelet transform. Refer to Figure 8.15:

Figure 8.15: Cascade of filters in discrete wavelet transform


In Figure 8.15, x[n] is a discrete input signal. g[n] and h[n] are lowpass and
highpass filters, respectively. h[n] is the discrete equivalent of h(x) as given
in Equation 8.7, and g[n] is the quadrature mirror filter of h[n]. A quadrature
mirror filter is a filter whose magnitude response is a mirror image of that of
another filter (about one quarter of the sampling frequency). The power sum of
such a high and low pass filter pair is equal to 1. The filters h[n] and g[n] are
said to be complementary in nature.
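To get a feel for such a complementary filter pair, the analysis filters of a wavelet can be inspected directly in PyWavelets; the following is a small sketch (the exact coefficient values and sign conventions depend on the wavelet chosen):

import numpy as np
import pywt

w = pywt.Wavelet('db2')
g = np.array(w.dec_lo)                 # lowpass (approximation) analysis filter g[n]
h = np.array(w.dec_hi)                 # highpass (detail) analysis filter h[n]
print("g[n]:", g)
print("h[n]:", h)
print("Energy of g:", np.sum(g**2))    # ~1 for this orthonormal wavelet
print("Energy of h:", np.sum(h**2))    # ~1
# Up to an overall sign convention, h is an alternating-sign, time-reversed copy of g:
print("Mirror of g:", ((-1.0) ** np.arange(len(g))) * g[::-1])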
In the context of the above figure, x[n], when passed through h[n] and g[n],
yields highpass and lowpass signals, which are then down sampled by a
factor of 2 (as per the discussion in Section 8.2). This is called one level of
decomposition, and we get level 1 coefficients as a result. The output of the low
pass filter is called the approximation coefficient (because of its low pass
nature), and the output of the high pass filter is called the detail coefficient because
it captures edge-like information, which corresponds to details in the data.
Due to down sampling, both signals have half the number of samples
compared to x[n].
The same procedure is applied to the output of the lowpass filter to get the second
level coefficients, as shown in the figure. We do not decompose the output
of the high pass filter further. The reason is simple: as pointed out in the CWT discussion
in Section 8.7, only the low frequency contents are retained as we go to higher
and higher scales. However, we keep a record of this high passed content at each
level (scale) so that we can recover the original signal from any level of decomposition.
By multi-resolution analysis, we mean processing the approximation and
detail coefficients of the scale (level) we desire. The approximation and detail
coefficients are represented by cA and cD, respectively. Let us now see all this at
work in Figure 8.16:
work in Figure 8.16:

Figure 8.16: Illustration of single level decomposition using DWT and re-composition using IDWT
Note that the signal in the above figure is discrete time, but it is plotted using
the plot function for ease of understanding only. Part (a) of the figure shows a
signal which is a sinusoid corrupted by noise. Noise, by its very nature, has mostly high
frequency components. Part (b) shows the approximation
coefficients (cA) for the signal in part (a). One may note that the level of noise
seems reduced in it, as the noise is filtered out – but not completely, since noise has
some low frequency content too. Part (c) of the figure shows the high
frequency content of part (a), i.e., the detail coefficients (cD), which are mostly
noise. One more thing to note is that in both parts (b) and (c), the length of the
signal is halved because there is a down sampler in the process, as shown in
Figure 8.15. If we take the IDWT using cA and cD as inputs, we get back the original
signal exactly, as shown in part (d) of the figure. The code for doing this is
given as follows:
01- #======================================================================
02- # PURPOSE : Understanding DWT and IDWT
03- #======================================================================
04- import pywt
05- import numpy as np
06- import matplotlib.pyplot as plt
07-
08- #-----------------------------------------------------------------------
09- # Creating Data and performing DWT
10- #-----------------------------------------------------------------------
11- fm=20
12- Fs=10*fm
13- L=1000
14- T=1/Fs
15- t=np.arange(0,L,1)*T
16- y0 = np.sin(2*np.pi*(fm/30)*t) # Low frequency component
17- noise = np.random.normal(0,.1,L) # Mostly high frequency component
18- y=y0+noise # Original signal
19- (cA, cD) = pywt.dwt(y,'sym2',mode='per') # mode='per' for half length output
20-
21- #-----------------------------------------------------------------------
22- # Plotting DWT
23- #-----------------------------------------------------------------------
24- fig,ax=plt.subplots(4,1)
25- fig.show()
26-
27- ax[0].plot(t,y,'k')
28- ax[0].grid()
29- ax[0].set_title("(a) Original Signal (with noise)")
30-
31- ax[1].plot(t[0:np.int32(len(t)/2)],cA,'k')
32- ax[1].grid()
33- ax[1].set_title("(b) Approximation Coef. (Through DWT)")
34-
35- ax[2].plot(t[0:np.int32(len(t)/2)],cD,'k')
36- ax[2].grid()
37- ax[2].set_title("(c) Detail Coef. (Through DWT)")
38-
39- #-----------------------------------------------------------------------
40- # Performing IDWT
41- #-----------------------------------------------------------------------
42- # A = pywt.idwt(cA, None, 'sym2',mode='per')
43- # D = pywt.idwt(None, cD, 'sym2',mode='per')
44- # recovered_signal=A + D
45-
46- recovered_signal = pywt.idwt(cA, cD, 'sym2',mode='per')
47-
48- #-----------------------------------------------------------------------
49- # Plotting IDWT
50- #-----------------------------------------------------------------------
51- ax[3].plot(t,recovered_signal,'k')
52- ax[3].grid()
53- ax[3].set_title("(d) Recovered Signal (Through IDWT)")
54-
55- plt.show()
56- print("Completed Successfully ... ")
Code 8.3: Code illustration DWT and IDWT for single level decomposition
Most of the code is self-explanatory. Line no. 19 performs a single level
decomposition using the DWT. Similarly, line no. 46 performs the IDWT. It takes
the approximation and detail coefficients as input, along with the same
parameters as the DWT command, and regenerates the original signal
exactly. Lines no. 42 to 44 are commented out, as they show an alternate way of
doing the same thing. Instead of using cA and cD simultaneously, one may
generate the approximation signal and the detail signal in the time domain
separately and later add them in the time domain to get back the original signal
without any loss.
This also motivates us to manipulate the approximation and detail
coefficients separately and combine them using the IDWT to get the modified
signal, as desired. In the above code, if cA is further decomposed into 2nd, 3rd, and
higher levels, the control that we can exercise over the signal increases.
This is precisely what we do in the next section, demonstrating noise
removal/minimization as an example.

8.10 Noise removal using multi resolution analysis


In this section, we will understand the usefulness of wavelet based noise
removal using an example in one dimension. Then, we will apply denoising
using wavelets on images and compare the results with traditional denoising
on images.

8.10.1 Noise removal using MRA and wavelets for one-dimensional signal
Refer to Figure 8.17 for understanding the advantages of wavelet based
denoising. Part (a) of the figure shows a one-dimensional signal. Part (b)
shows the same signal with noise added to it. We will try to denoise this
signal.

Figure 8.17: Comparison of wavelet and MRA based denoising to traditional denoising
Part (c) of the figure shows the result obtained using wavelet and Multi
Resolution Analysis (MRA) based denoising and part (d) shows the result
of traditional averaging based denoising. Which result is better? At first
glance, it appears that part (d) is better than part (c) as it is smoother;
however, this is not the case. It is smoother but not better. This is because, if
you look at the original signal, between 0.10 and 0.15 on the X scale, there is a rect-type
structure. This means there is a step change at the beginning and end of
this structure. Now, look at the denoised versions in part (c) and part (d). It
is evident that part (c), i.e., wavelet and MRA based denoised output
preserves it better. Hence, it is better. While removing noise, the signal
should retain its original properties. Similar observations can be made about
the portion of the signal where the sinusoidal structure ends.
Now, let us see how the denoising is achieved using wavelets and MRA.
Refer to Figure 8.18. It shows the result of five levels of decomposition
using the wavelet transform for the signal of Figure 8.17 (b). The left and right
columns in this figure display the approximation and detail coefficients,
respectively.
The detail coefficients, which correspond to the high frequency content, contain
mostly noise. However, as we know, there were some high frequency
transitions in the original signal too, which are also present there. If we
ignore the detail coefficients completely, that information will be lost, and
high frequency regions will be converted to low frequency regions, as
happens in traditional filtering using averaging filters of various kinds. We
need to keep the important details in the detail coefficients. The local spikes that one
may note in the detail coefficients correspond to important high frequency
detail in the original signal – one may visually observe this too. So, instead of
completely removing the detail coefficients, it is a good idea to remove only
those detail coefficients, at every level, whose magnitude is less than (say) 90% of the
peak value at that level. Such a strategy is called hard thresholding. That is
what is being done in the above results.

Figure 8.18: 5 level decomposition of the noisy signal of Figure 8.17 (b)
There are many other strategies to select the important regions in the detail
coefficients which the reader may explore. The code for generating the
results of Figure 8.17 and Figure 8.18 is given as follows:
01- #======================================================================
02- # PURPOSE : Denoising using wavelet based MRA
03- #======================================================================
04- import pywt
05- import numpy as np
06- import scipy.ndimage as sci
07- import matplotlib.pyplot as plt
08-
09- #----------------------------------------------------------------------
10- # Creating Data and performing DWT (At multiple resolutions - MRA)
11- #----------------------------------------------------------------------
12- fm=200
13- Fs=100*fm
14- L=2**12 # Keep this in powers of 2 for ease of plotting later
15- T=1/Fs
16- t=np.arange(0,L,1)*T
17-
18- y0 = np.zeros(L)
19- y0[np.int32(len(t)/7):np.int32(len(t)/5)+500]=1
20- y0=np.sin(2*np.pi*(fm/4)*t)*y0
21- y0[np.int32(len(t)/2):np.int32(len(t)/2)+1000]=1
22-
23- noise = np.random.normal(0,.1,L)
24- y=y0+noise # Input signal with noise
25-
26- #----------------------------------------------------------------------
27- # Decomposition by using DWT - MRA
28- #----------------------------------------------------------------------
29- levels_of_decomposition=5
30- fig1,ax1=plt.subplots(levels_of_decomposition,2)
31- fig1.show()
32-
33- cA_list=[]
34- cD_list=[]
35- y2=y.copy()
36- for i in np.arange(0,levels_of_decomposition,1):
37- (cA, cD) = pywt.dwt(y2,'sym2',mode='per')
38- cA_list.append(cA)
39- cD_list.append(cD)
40- ax1[i,0].plot(t[0:np.int32(len(y2)/2)],cA,'k')
41- ax1[i,0].grid()
42- str1="cA at level "+str(i+1)
43- ax1[i,0].set_title(str1)
44-
45- ax1[i,1].plot(t[0:np.int32(len(y2)/2)],cD,'k')
46- ax1[i,1].grid()
47- str2="cD at level "+str(i+1)
48- ax1[i,1].set_title(str2)
49-
50- y2=cA
51- #----------------------------------------------------------------------------
52- # IDWT MRA Based Reconstruction by Hard Thresholding
53- #----------------------------------------------------------------------------
54- recovered_signal=cA
55- for i in np.arange(levels_of_decomposition-1,-1,-1):
56- A = pywt.idwt(recovered_signal, None, 'sym2',mode='per')
57- thresh_array=np.zeros_like(cD_list[i]) # mask: 1 = keep coefficient, 0 = discard
58- thresh_array[np.abs(cD_list[i])>.9*np.max(np.abs(cD_list[i]))]=1
59- D = pywt.idwt(None, cD_list[i]*thresh_array, 'sym2',mode='per')
60- recovered_signal=A + D
61-
62- #----------------------------------------------------------------------
63- # Traditional Noise removal (for comparison)
64- #----------------------------------------------------------------------
65- y3=y.copy()
66- filter1=np.ones(15*levels_of_decomposition)
67- filter1=filter1/np.sum(filter1)
68- recovered_signal2=sci.correlate(y3,filter1)
69-
70- #----------------------------------------------------------------------
71- # Plotting Logic
72- #----------------------------------------------------------------------
73- fig2,ax2=plt.subplots(4,1)
74- fig2.show()
75-
76- ax2[0].plot(t,y0,'k')
77- ax2[0].grid()
78- ax2[0].set_title("(a) Original Signal")
79-
80- ax2[1].plot(t,y,'k')
81- ax2[1].grid()
82- ax2[1].set_title("(b) Original Signal + Noise")
83-
84- ax2[2].plot(t,recovered_signal,color='k')
85- ax2[2].grid()
86- ax2[2].set_title("(c) Recovered Signal (Wavelet MRA Denoising)")
87-
88- ax2[3].plot(t,recovered_signal2,'k')
89- ax2[3].grid()
90- ax2[3].set_title("(d) Recovered Signal (Traditional Filtering)")
91-
92- plt.show()
93- print("Completed Successfully ... ")
Code 8.4: Denoising of 1D signal using wavelet and MRA and comparison of the results with
traditional denoising
If you have followed it so far, the code should be easy to understand.
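As a side note, the manual loops of Code 8.4 can also be expressed with PyWavelets' multi-level helpers. The following is a hedged sketch (it assumes the noisy signal y from Code 8.4 is in scope and reuses the same 90% peak threshold idea):

import numpy as np
import pywt

# Multi-level decomposition: coeffs = [cA5, cD5, cD4, cD3, cD2, cD1]
coeffs = pywt.wavedec(y, 'sym2', mode='per', level=5)

# Hard-threshold the detail coefficients at every level
for i in range(1, len(coeffs)):
    thr = 0.9 * np.max(np.abs(coeffs[i]))
    coeffs[i] = pywt.threshold(coeffs[i], thr, mode='hard')

recovered = pywt.waverec(coeffs, 'sym2', mode='per')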

8.10.2 Noise removal using MRA and wavelets for images


Using the concepts developed in the previous section, images can also be
denoised. In this section, we will discuss the same. There are a few points to
consider when we move from one-dimensional to two-dimensional data.
Let us begin by looking at Figure 8.19. Part (a) of this
figure shows the original image. Part (b) shows Gaussian noise added to the
image. Part (c) shows the result of denoising using wavelets and MRA. Part
(d) shows averaging filter based denoising. The denoising is better in part
(c), as it tends to preserve the details of the original image; in part
(d), the image is blurred. We noted and identified the reasons for this in
the previous section for a one-dimensional signal. The image denoised by
wavelets and MRA needs some enhancement to make it visually better,
but the noise is better removed.
Let us see Code 8.5, which produces the result shown in Figure 8.19. We will
discuss some important lines of code, with the related concepts, here. Line no. 29,
which is coeffs = pywt.wavedec2(img, 'db2', level=2), is used to
decompose the image using the wavelet transform into multiple levels (scales).
It takes as input the image to be decomposed, the wavelet to be used, and the
number of levels used for decomposition. It returns a list (in our case coeffs)
in the format [cAn,(cHn,cVn,cDn),...,(cH1,cV1,cD1)]. Note
that there are 'n' levels of decomposition. For the nth level, the first element of
the list is the approximation coefficient array. The second element of the
list, which is the tuple (cHn,cVn,cDn), has the horizontal, vertical, and diagonal detail
coefficients.
Figure 8.19: Wavelet and MRA based denoising vs. traditional denoising illustration
Recall that in the one-dimensional case, there was only one detail coefficient array, but
here there are three. Similarly, for the other levels (<n), only the tuples
containing these three detail coefficient arrays are present in the list. This is because, by
combining the approximation coefficients of the previous (coarser) level with these
three detail coefficient arrays, the approximation coefficients for the current level can be
regenerated.
01- #======================================================================
02- # PURPOSE : Denoising of Images by Wavelet based MRA
03- #======================================================================
04- import cv2
05- import numpy as np
06- import matplotlib.pyplot as plt
07- from skimage.util import random_noise
08- import pywt
09- import scipy.ndimage as sci
10- import my_package.my_functions as mf # This is a user defined package
11-
12- #-----------------------------------------------------------------------
13- # Importing Image and Displaying
14- #-----------------------------------------------------------------------
15- img0 = np.float64(cv2.imread('img1.bmp',0))
16- fig,ax=plt.subplots(2,2)
17- fig.show()
18- mf.my_imshow(mf.norm_uint8(img0),"(a) Original Image",ax[0,0])
19-
20- #-----------------------------------------------------------------------
21- # Creating and Adding Gaussian Noise to Image
22- #-----------------------------------------------------------------------
23- img = random_noise(img0/255, mode='gaussian',mean=0,var=.05)
24- mf.my_imshow(mf.norm_uint8(img),"(b) Noisy Image",ax[0,1])
25-
26- #-----------------------------------------------------------------------
27- # Performing Decomposition (Wavelet & MRA)
28- #-----------------------------------------------------------------------
29- coeffs = pywt.wavedec2(img, 'db2', level=2)
30- # coeffs are in format of list [cAn, (cHn,cVn,cDn), ... , (cH1,cV1,cD1)]
31- # Approximation coeff are only given for the last (nth) decomposition level
32-
33- #-----------------------------------------------------------------------
34- # Performing Denoising (Hard Thresholding)
35- #-----------------------------------------------------------------------
36- len_list=len(coeffs)
37- hard_thresh=70/100 # Hard hard_thresholding
38- for i in np.arange(1,len_list,1):
39- cH = coeffs[i][0].copy()
40- cV = coeffs[i][1].copy()
41- cD = coeffs[i][2].copy()
42-
43- cH[np.where(cH<hard_thresh*np.max(cH))] = 0
44- cV[np.where(cV<hard_thresh*np.max(cV))] = 0
45- cD[np.where(cD<hard_thresh*np.max(cD))] = 0
46- coeffs[i] = (cH,cV,cD)
47-
48- #-----------------------------------------------------------------------
49- # Performing Reconstruction (Wavelet & MRA)
50- #-----------------------------------------------------------------------
51- reconstructed_image=pywt.waverec2(coeffs, 'db2')
52- mf.my_imshow(mf.norm_uint8(reconstructed_image),"(c) Denoised by - Wavelet & MRA",ax[1,0])
53-
54- #-----------------------------------------------------------------------
55- # Conventional Denoising by filtering
56- #-----------------------------------------------------------------------
57- r,c=np.shape(img)
58- n=np.int32(.05*r)
59- filter1=np.ones((n,n))
60- filter1=filter1/np.sum(filter1)
61- reconstructed_image2=sci.correlate(img,filter1)
62- mf.my_imshow(mf.norm_uint8(reconstructed_image2),"(d) Denoised by - Averaging",ax[1,1])
63-
64- plt.show()
65- print("Completed Successfully ...")
Code 8.5: Denoising based on wavelet and MRA vs. averaging filter
From line no. 36 to 46, the logic for hard thresholding is written. The three
coefficient arrays (cHn,cVn,cDn) at every level contain high frequency content, which is
mostly noise. So, we keep only the dominant peaks and remove the other parts
below a given threshold. This is hard thresholding. Hard thresholding sets
wavelet coefficients to zero if they are below a certain threshold, while
leaving larger coefficients unchanged. It is simple to implement but can
introduce sharp discontinuities or artifacts in the signal. In contrast, soft
thresholding also sets smaller coefficients to zero but reduces the larger
coefficients by the threshold amount, resulting in smoother transitions. Soft
thresholding typically produces fewer artifacts and better preserves the
signal's structure, making it preferred for denoising applications. However,
soft thresholding is computationally more complex than hard thresholding.
The reader is encouraged to explore the soft thresholding option.
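As a starting point for that exploration, pywt.threshold supports both modes; a hedged sketch (assuming the arrays cH, cV, cD and the fraction hard_thresh from Code 8.5 are in scope) replacing lines 43 to 45 could look like this:

import numpy as np
import pywt

# Soft thresholding: small coefficients go to zero, large ones shrink by the threshold
cH = pywt.threshold(cH, hard_thresh * np.max(np.abs(cH)), mode='soft')
cV = pywt.threshold(cV, hard_thresh * np.max(np.abs(cV)), mode='soft')
cD = pywt.threshold(cD, hard_thresh * np.max(np.abs(cD)), mode='soft')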
Line no. 51, which is reconstructed_image=pywt.waverec2(coeffs, 'db2'),
reconstructs the image using the coeffs list and the wavelet. There is no
need to provide the number of levels here, as it is automatically known
to the waverec2 function from the length of the coeffs list. Remember to use the
same wavelet for decomposition and reconstruction. Also note that if we skip
the hard thresholding part, the processes of decomposition and reconstruction
nullify each other, and the same image is produced as output. The rest of
the code is self-explanatory.
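This can be verified with a quick sketch (assuming the noisy image img from Code 8.5 is in scope); the reconstruction is sliced because waverec2 may pad odd-sized dimensions by one pixel:

import numpy as np
import pywt

coeffs_rt = pywt.wavedec2(img, 'db2', level=2)       # decompose, no thresholding
img_rt = pywt.waverec2(coeffs_rt, 'db2')             # reconstruct
print(np.allclose(img, img_rt[:img.shape[0], :img.shape[1]]))   # expected: True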

Conclusion
In this chapter, wavelet based multi-resolution analysis was introduced to
emphasize the importance of information at different scales (or resolutions).
The results of removing noise independent of scale vs. with scale were
shown. It was found that, in images, noise is generally present at higher
resolutions (or lower scales), and hence the image should first be decomposed
into various scales before noise removal. The denoising results obtained
with MRA were better than those obtained with conventional filtering. In the next
chapter, we introduce binary morphology, which is a very important topic in
object identification and detection.

Points to remember
• Not every information in an image is important at every resolution.
• Before down sampling the image, high frequency content should be
removed.
• Higher resolution means a lower scale and vice versa.
• At lower resolutions (i.e., at higher scales), high frequency components
are not preserved.
• Low frequency contents remain practically stable at all resolutions.
• The better an object in the image is localized in time, the poorer its
frequency resolution will be, and vice versa.
• Fourier transform is primarily meant for stationary signals.
• The width of the window affects the results of STFT.
• Approximation coefficients of DWT correspond to low frequency
regions and detail coefficients correspond to high frequencies.
• Noise is usually concentrated in the detail coefficients and can be removed by hard
or soft thresholding.

Exercises
1. What is the difference between waterfall plot and wavelet plot?
2. Explain the effect of aliasing during down sampling of data.
3. What problem does STFT solve in comparison to Fourier Transform?
4. For a one-dimensional signal, what does approximation and detail
coefficient represent?
5. How does an image denoised due to MRA using wavelet preserve the
detail in the image?

Join our book’s Discord space


Join the book's Discord Workspace for Latest updates, Offers, Tech
happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com

CHAPTER 9
Binary Morphology

9.1 Introduction
Full moon night is full moon night because the moon becomes (appears to
be) circular that day. On other days, its boundary is not (does not appear to
be) circular. You can easily identify your friend in a photograph that has
many other people who you do not know. If you see an image of a broken
object, your brain can easily figure out and mentally reconstruct the broken
part.
An image is just another piece of data. You try to identify either a subset of the
data that interests you or some desired properties in that image. Your brain can do it
trivially, but when computers try to do it, we call it morphology – the
study of the form of things. Modern image processing is all about that.
Machines try to read the number plates of vehicles, identify individuals from
their facial images or by scanning thumbprints, perform surgeries without human hands,
etc. It is all about extracting information from image data. Data is not the
new gold – rather, it is the ore from which gold is extracted. Here, the
information extracted is the gold.
This chapter is a small step towards extracting information from images.
Small step because we will do it on binary images first, and then in the
subsequent chapters, we will do that on grayscale and colored images with
advanced methods.
Structure
This chapter covers the following topics:
• Erosion
• Dilation
• Duality between erosion and dilation
• Opening and closing
• Hit and miss transform
• Boundary extraction
• Hole filling
• Region filling
• Connected component analysis
• Connected component analysis using skimage library
• Convex hull
• Thinning
• Thickening
• Skeletons

Objectives
The objective of this chapter is to introduce the reader to the processing of
binary images. This processing is important because it helps us extract useful
structural information from images. These structures may be skeletons or a
number of connected components, etc. Although binary images are seldom
used, they could appear at intermediate processing stages of some
algorithms, and hence, it is important to understand how they are processed.

9.2 Erosion
Soil erosion is an example of naturally occurring erosion. Soil decays layer
by layer – but uniformly everywhere. In this section, we will note a similar
phenomenon about objects in binary images. We will also set the foundation
for some of the upcoming sections in terms of general procedure to be used
in binary morphology, as well as some mathematical formalization to be
used frequently in this chapter.

9.2.1 Illustration of erosion


To understand erosion with an illustration, look at Figure 9.1. Part (a) of this
figure shows the input binary image. It has only two grayscale colors: black
(0) and white (255). However, they could be anything – for example,
grayscale values 4 and 200. The only requirement is that there should be two possible
values. Conventionally, they are chosen as 0 and 255 for black (logic 0) and
white (logic 1), respectively. Further, black values represent the background
in the image, and white values represent the foreground. Hence, in the
current image, there is only one white object:

Figure 9.1: Erosion operation


The numbers on each pixel indicate the corresponding linear indices (to
recall what linear indexing is, refer to Section 1.3), not the intensity value
of that pixel. Part (b) of the figure shows the Structuring Element,
represented by SE. To calculate the result image shown in part (c) of the
figure, we will place the center of this element, i.e., pixel number 4 of the
structuring element, on each pixel of the input image in part (a), and by
following certain rules, we will obtain the value of the corresponding pixel in the result
image. Remember that the dimensions of a pixel are the same in all parts –
although the structuring element is shown zoomed in for clarity.
Now, let us reveal the rules for erosion. Remember that we process
every pixel in the image. While processing a particular pixel (which means
placing the central pixel, i.e., pixel number 4 of the structuring element, on
the input image's pixel under processing), its value is declared as 255 (or
foreground, or logic 1) in the result image if and only if all the foreground
pixels of the structuring element overlap with foreground pixels of the object
(foreground) in the input image. Otherwise, the resulting pixel in the result
image gets a logic 0 (black) value.
Let us take an example of pixel number 36 in the input image, as shown in
part (a) of Figure 9.1. When the structuring element overlaps with it (i.e.,
SE’s pixel 4 coincides with pixel 36 of the input image), consequently, pixels
(24), (25), 26, (35), 36, 37, 46, 47, 48 in the input image coincide with pixels
0, 1, (2), 3, 4, 5, (6), (7), 8 of the structuring element, respectively. Note that
in the previous line, we have shown the pixels belonging to the background
in black color. Here, the rule is that the foreground pixel in SE should
coincide with the foreground pixel in the input image. However, on
examining the first pixel of SE, i.e., pixel 0, it coincides with pixel (24) in
the input image, which is the background pixel. This violates the rule, and
hence the value of pixel 36 (which is the pixel under processing) becomes
logic 0 (or grayscale value 0) in the resulting image. We do not need to
check other pixel combinations since the first combination violated the rule.
However, if you check pixel 1 and 3 of SE, they also violate the rule. Simply
stated, the foreground of the SE does not remain contained in the foreground
of input image when center of SE (i.e., pixel 4 coincides with pixel 36 of
input image).
If you try the same exercise for pixel number 48 in the input image, you will
find that the foreground of the SE is contained within the foreground of the input
image, and hence the output value for pixel 48 in the result image in part (c) of
the figure is logic 1 (grayscale value 255).
An important observation is that, as compared to input image’s foreground,
some pixels which have violated the rule of erosion have been removed or
eroded from the result image and hence the name erosion. The object
(foreground) also appears to be smaller in the resulting image.
Remember that we have chosen the central pixel of SE as the origin/anchor.
However, this is not necessary; it could be anywhere in the set of foreground
pixels of SE.

9.2.2 Mathematics behind erosion


The language of morphological operations is set theory. Let us familiarize
ourselves with some notations which will be used throughout this chapter.
Remember that we are interested in 2 dimensional sets of integer space as
images are 2 dimensional. They have integer coordinates.
Let us represent all the foreground pixels (and not the background pixels) of the
structuring element by SE. The translation of all pixels in SE, denoted by (SE)_z, is
defined as:
Equation 9.1:

$$(SE)_z=\{\,c \mid c=n+z,\ n\in SE\,\}$$

where z = (x', y') is the translated coordinate of the origin/anchor of SE
(which we are assuming to be the central pixel; it could be any chosen
foreground pixel but, conventionally, it is the center), and n = (x, y) is any
foreground pixel location in the un-translated SE. Further, if
we denote by F all the foreground pixels (and not the background pixels) in
the input image, then the erosion operation (represented by ⊖) can be
defined as shown below:
Equation 9.2:

$$F\ominus SE=\{\,z \mid (SE)_z\subseteq F\,\}$$

Having understood the working of erosion in the previous section, it is not
hard to read Equation 9.2 as: the foreground eroded by the structuring element
is the set of pixels z such that the structuring element translated by z is fully
contained in the foreground. Remember that by foreground we mean the
foreground of the input image, by SE we mean its foreground pixels only,
and that z = (x', y') is the translated coordinate of the origin of SE. The above
definition assumes that we are putting the results in an empty image (all
pixel values are logic 0 to begin with) of the same size as the input image.
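To connect the definition with code, the following is a small NumPy sketch of Equation 9.2 written directly as a containment test (it assumes binary 0/1 arrays, zero padding at the borders, and the anchor at the centre of SE; it is meant for understanding, not as a replacement for cv2.erode):

import numpy as np

def erode_by_definition(F, SE):
    # F and SE are 0/1 arrays; an output pixel is 1 iff the translated SE foreground
    # is fully contained in the foreground of F (Equation 9.2)
    r, c = F.shape
    m, n = SE.shape
    pr, pc = m // 2, n // 2
    padded = np.pad(F, ((pr, pr), (pc, pc)))          # background (0) outside the image
    out = np.zeros_like(F)
    for i in range(r):
        for j in range(c):
            region = padded[i:i + m, j:j + n]
            out[i, j] = 1 if np.all(region[SE == 1] == 1) else 0
    return out

# Hypothetical usage with the arrays of Code 9.1 (which store 0/255, hence the //255):
# eroded = 255 * erode_by_definition(Bimg1 // 255, SE // 255)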

9.2.3 Application of erosion


To understand potential applications of erosion, refer to Figure 9.2. The
structuring element is a 3x3 image with all ones.
Figure 9.2: Potential use case of erosion
Comparing the input and result images, we note that structures smaller than
the structuring element vanish. For example, see the objects containing
pixels 57 and 117 in the input image; they have vanished. The objects equal
to or larger in size than the structuring element remain but their sizes are
reduced due to erosion. In another example, see objects containing pixels 33
and 93 in the input image. Further, an object containing pixel 55 in the input
image is split into two parts in the output image. Only that portion of the
object that could fully contain the structuring element remains (that too
reduced in size due to erosion). So, all applications where smaller objects,
which may correspond to noise, need to be deleted can benefit from
erosion.
The form of the structuring element is decided based on the application.
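For instance, OpenCV provides a helper to build a few commonly used shapes; the sizes below are arbitrary examples:

import cv2

se_rect    = cv2.getStructuringElement(cv2.MORPH_RECT,    (5, 5))
se_ellipse = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
se_cross   = cv2.getStructuringElement(cv2.MORPH_CROSS,   (5, 5))
print(se_ellipse)   # a 0/1 uint8 array that can be passed directly to cv2.erode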

9.2.4 Python code for erosion


The results shown in Figure 9.1 are generated by Code 9.1. In line numbers
15 to 35, a custom function is defined to plot grid lines on the image (to clearly
show the pixel regions) and to number the pixels. This is for annotation
purposes only and can be skipped when you just have to apply erosion. In line
numbers 41 to 52, we have created a binary image instead of importing one so
that we can later play with it by changing the foreground as desired. Instead of
this, one may import an image, convert it to binary, and achieve the same thing.
Line number 62, which reads result_image = cv2.erode(Bimg1,SE,iterations =
1,anchor=(-1,-1)), performs the erosion. Note that the third argument is kept 1
here, as we want to erode the image only once. However, if one wants to
progressively erode the image, one may set as many iterations as desired. The
fourth argument sets the anchor of the SE. We want the center of the SE to be the
anchor; hence, its coordinates with respect to the top left corner of the SE will be
(1,1). If the shape of the SE were 5x5, the coordinate of the center would be (2,2).
Accordingly, we can select the value of the anchor point. However, for the special
case when, irrespective of the shape of the SE, we want the center of the SE to be
the anchor, we can use (-1,-1) as the anchor coordinates; this always selects the
center. In the current case, it is the center of a 3x3 array, and hence the coordinate
(-1,-1) (or, alternatively, (1,1)) will work fine. The remaining code is self-explanatory.
01- #======================================================================
02- # PURPOSE : Learning EROSION operation
03- #======================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #--------------------------------------------------------------------------
11- # Defining Custom function to annotate the plots (not necessary once we
12- # understand the concept)
13- #--------------------------------------------------------------------------
14-
15- # Function for plotting grid lines over pixels of image and pixel numbering
16- def plot_pixel_grid_on_image(req_size,ax,img):
17- req_size_x=req_size[1]+1
18- req_size_y=req_size[0]+1
19-
20- #------------------ For grid lines on image -------------------------------
21- for i in np.arange(0,req_size_x,1):
22- ax.plot(i*np.ones(req_size_y)-.5,np.arange(0,req_size_y,1)-.5,color='.5')
23- for i in np.arange(0,req_size_y,1):
24- ax.plot(np.arange(0,req_size_x,1)-.5,i*np.ones(req_size_x)-.5,color='.5')
25- # In the above, color can be set as grayscale value between 0 to 1 also
26-
27- #------------------ For pixel numbering -----------------------------------
28- for i in np.arange(0,req_size_x-1,1):
29- for j in np.arange(0,req_size_y-1,1):
30- if img[j,i]==0:
31- # White text on black background
ax.text(i-.25,j+.25,str(i+(req_size_y-2)*i+j),color='1',fontsize=8)
33- else:
34- # Black text on white (or any non-zero gray) background
ax.text(i-.25,j+.25,str(i+(req_size_y-2)*i+j),color='0',fontsize=8)
36-
37- #--------------------------------------------------------------------------
38- # Creating a binary image for understanding the concept (This could be
39- # replaced by a binary image instead when working with real images)
40- #--------------------------------------------------------------------------
41- Bimg1=np.uint8(255*np.array([\
42- [0,0,0,0,0,0,0,0,0,0,0], \
43- [0,0,0,0,0,0,0,0,0,0,0], \
44- [0,0,0,0,0,0,0,0,0,0,0], \
45- [0,0,0,1,1,1,1,1,0,0,0], \
46- [0,0,1,1,1,1,1,1,1,0,0], \
47- [0,0,1,1,1,1,1,1,1,0,0], \
48- [0,0,1,1,1,1,1,1,1,0,0], \
49- [0,0,0,1,1,1,1,1,0,0,0], \
50- [0,0,0,0,0,0,0,0,0,0,0], \
51- [0,0,0,0,0,0,0,0,0,0,0], \
52- [0,0,0,0,0,0,0,0,0,0,0]] ))
53-
54- #--------------------------------------------------------------------------
55- # Creating structuring elements (SE) for performing morphological
operations
56- #--------------------------------------------------------------------------
57- SE=np.uint8(255*np.array([\
58- [1,1,0], \
59- [1,1,0], \
60- [0,1,1], ]))
61-
62- result_image = cv2.erode(Bimg1,SE,iterations = 1,anchor=(-1,-1))
63-
64- #--------------------------------------------------------------------------
65- # Plotting Logic
66- #--------------------------------------------------------------------------
67- fig = plt.figure()
68- ax1= fig.add_subplot(1,3,1)
69- ax2= fig.add_subplot(3,7,11)
70- ax3= fig.add_subplot(1,3,3)
71-
72- mf.my_imshow(Bimg1,'(a) Binary Image',ax1)
73- plot_pixel_grid_on_image(np.shape(Bimg1),ax1,Bimg1)
74-
75- mf.my_imshow(SE,'(b) Structuring Element',ax2)
76- plot_pixel_grid_on_image(np.shape(SE),ax2,SE)
77-
78- mf.my_imshow(result_image,'(c) '+'Erosion',ax3)
79- plot_pixel_grid_on_image(np.shape(result_image),ax3,result_image)
80-
81- plt.show()
82- print("Completed Successfully ...")
Code 9.1: Code for doing morphological erosion operation on binary image

9.3 Dilation
Dilation is a dual operation of erosion (more on this a little later). In this
section, we will study how to dilate a binary object. We will also see a
working example through illustration and code.

9.3.1 Illustration of dilation


Dilation is illustrated in Figure 9.3. To create the result image (shown in
part (d) of this figure) from the input image, the structuring element is first
reflected about its origin, and then that reflected structuring element is used
to perform dilation following a certain rule. Before discussing that rule, let us
look more closely at the origin and at the reflection of a 2D array. We have
assumed the origin to be the central pixel, but it could be anywhere – it can even
have fractional (x, y) coordinates. Reflecting a 2D element in this context
means flipping about the x axis followed by flipping about the y axis (or in the
reverse order). Interestingly, this is equivalent to rotating the entire array once
by 180 degrees in the anticlockwise (counterclockwise) direction about the
origin/anchor. This gives us the reflected structuring element.
Once the reflected SE is formed, it is used to create the output image. We
repeat the process for every pixel in the output image. The rule is simple: if the
foreground of the translated version of the reflected SE intersects with the
foreground of the input image, then the index of the input image pixel where the
anchor (in our case, the center) of the SE is placed is noted. At that index in the
output image, we set the logic value 1 (assuming that the output image is of the
same size as the input image and is pre-initialized with all logic 0 values).
One may check this in the output image (dilated image) in part (d) of Figure
9.3:
Figure 9.3: Dilation operation

9.3.2 Mathematics behind dilation


Let us define the reflection of the structuring element, denoted by $\widehat{SE}$, as
shown below:
Equation 9.3:

$$\widehat{SE}=\{\,c \mid c=-n,\ n\in SE\,\}$$

Using the above definition, we define dilation (represented by ⊕) in Equation 9.4
as follows:
Equation 9.4:

$$F\oplus SE=\{\,z \mid (\widehat{SE})_z\cap F\neq\varnothing\,\}$$

with the symbols having the same meaning as discussed in the earlier
section. Equation 9.4 simply reads as: the foreground F of the input image
dilated by SE is the set of all pixels with coordinate z (z being the translation of
the anchor of SE; in our case, the anchor is the center) such that the reflected
and translated version of SE has at least one foreground pixel intersecting
(overlapping) with the foreground F of the input image. Dilation finds
obvious applications in thickening the objects in a binary image.

9.3.3 Python code for dilation


Figure 9.3 is generated by using the following code:
01- #======================================================================
02- # PURPOSE : Learning DILATION operation
03- #======================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section 2.7.3
09-
10- #--------------------------------------------------------------------------
11- # Defining Custom function to annotate the plots (not necessary once we
12- # understand the concept)
13- #--------------------------------------------------------------------------
14-
15- # Function for plotting grid lines over pixels of image and pixel numbering
16- def plot_pixel_grid_on_image(req_size,ax,img):
17- req_size_x=req_size[1]+1
18- req_size_y=req_size[0]+1
19-
20- #------------------ For grid lines on image -------------------------------
21- for i in np.arange(0,req_size_x,1):
22- ax.plot(i*np.ones(req_size_y)-.5,np.arange(0,req_size_y,1)-.5,color='.5')
23- for i in np.arange(0,req_size_y,1):
24- ax.plot(np.arange(0,req_size_x,1)-.5,i*np.ones(req_size_x)-.5,color='.5')
25- # In the above, color can be set as grayscale value between 0 to 1 also
26-
27- #------------------ For pixel numbering -----------------------------------
28- for i in np.arange(0,req_size_x-1,1):
29- for j in np.arange(0,req_size_y-1,1):
30- if img[j,i]==0:
31- # White text on black background
32- ax.text(i-.25,j+.25,str(i+(req_size_y-
2)*i+j),color='1',fontsize=8)
33- else:
34- # Black text on white (or any non-zero gray) background
35- ax.text(i-.25,j+.25,str(i+(req_size_y-
2)*i+j),color='0',fontsize=8)
36-
37- #--------------------------------------------------------------------------
38- # Creating a binary image for understanding the concept (This could be
39- # replaced by a binary image instead when working with real images)
40- #--------------------------------------------------------------------------
41- Bimg1=np.uint8(255*np.array([\
42- [0,0,0,0,0,0,0,0,0,0,0], \
43- [0,0,0,0,0,0,0,0,0,0,0], \
44- [0,0,0,0,0,0,0,0,0,0,0], \
45- [0,0,0,1,1,1,1,1,0,0,0], \
46- [0,0,1,1,1,1,1,1,1,0,0], \
47- [0,0,1,1,1,1,1,1,1,0,0], \
48- [0,0,1,1,1,1,1,1,1,0,0], \
49- [0,0,0,1,1,1,1,1,0,0,0], \
50- [0,0,0,0,0,0,0,0,0,0,0], \
51- [0,0,0,0,0,0,0,0,0,0,0], \
52- [0,0,0,0,0,0,0,0,0,0,0]] ))
53-
54- #--------------------------------------------------------------------------
55- # Creating structuring elements (SE) for performing morphological
operations
56- #--------------------------------------------------------------------------
57- SE=np.uint8(255*np.array([\
58- [1,1,0], \
59- [1,1,0], \
60- [0,1,1], ]))
61-
62- result_image = cv2.dilate(Bimg1,SE,iterations = 1,anchor=(-1,-1))
63-
64- #--------------------------------------------------------------------------
65- # Plotting Logic
66- #--------------------------------------------------------------------------
67- fig = plt.figure()
68- ax1= fig.add_subplot(1,3,1)
69- ax2= fig.add_subplot(2,5,3)
70- ax3= fig.add_subplot(2,5,8)
71- ax4= fig.add_subplot(1,3,3)
72-
73- mf.my_imshow(Bimg1,'(a) Binary Image',ax1)
74- plot_pixel_grid_on_image(np.shape(Bimg1),ax1,Bimg1)
75-
76- mf.my_imshow(SE,'(b) SE',ax2)
77- plot_pixel_grid_on_image(np.shape(SE),ax2,SE)
78-
79- # Creating the reflected SE just for display
80- SE_rot=cv2.rotate(SE,cv2.ROTATE_90_COUNTERCLOCKWISE)
81- SE_rot=cv2.rotate(SE_rot,cv2.ROTATE_90_COUNTERCLOCKWISE)
82-
83- mf.my_imshow(SE_rot,'(c) Reflected SE (180 deg. ACW)',ax3)
84- plot_pixel_grid_on_image(np.shape(SE),ax3,SE_rot)
85-
86- mf.my_imshow(result_image,'(d) '+'Dilation',ax4)
87- plot_pixel_grid_on_image(np.shape(result_image),ax4,result_image)
88-
89- plt.show()
90- print("Completed Successfully ...")
Code 9.2: Code for morphological dilation
Most of the code is like Code 9.1, with a change at line number 62, where we use dilation instead of erosion. Another addition to the code is line numbers 79 to 84, where the structuring element's reflected version is created just for display purposes. Remember that we rotated the SE two times by 90 degrees in the anticlockwise direction in line numbers 80 and 81 (the second argument tells us that), since there is no such flag as cv2.ROTATE_180_COUNTERCLOCKWISE; for a 180-degree rotation, the direction makes no difference anyway. Another point to note is that the SE passed as an argument in line number 62 is automatically reflected before dilation happens internally; there is no need to pass the reflected version of the SE as an argument in line number 62, as it is done internally.
9.4 Erosion-dilation duality
Without proof, we state that the following equations hold true:
Equation 9.5:
(F ⊖ SE)ᶜ = Fᶜ ⊕ ŜE
Equation 9.6:
(F ⊕ SE)ᶜ = Fᶜ ⊖ ŜE
The above equations are a mathematical way of saying that the erosion and dilation operations are duals of each other with respect to set complementation and reflection. If the structuring element is symmetric (i.e., ŜE = SE), which is usually the case in practice (as we will see shortly), the above equations become Equations 9.7 and 9.8, respectively:
Equation 9.7:
(F ⊖ SE)ᶜ = Fᶜ ⊕ SE
And
Equation 9.8:
(F ⊕ SE)ᶜ = Fᶜ ⊖ SE
Let us begin by interpreting Equations 9.7 and 9.8. Equation 9.7 states that if you erode F by SE and then look at the background of the eroded image (indicated by the complement, i.e., the superscript c), the same result can alternatively be found by dilating the background of the original image by SE. (Dilating the background, which is logic 0, is possible simply by complementing the input image, that is, replacing black pixels by white and white by black, and then proceeding.)
Equation 9.8 says the same thing for dilation. In both equations, it is mandatory that the structuring element be symmetric. If it is not, we can use Equations 9.5 and 9.6, where the RHS has a reflected version of the structuring element.
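The duality is easy to check numerically. The following is a small sketch (not one of the numbered code listings of this chapter) that verifies Equations 9.7 and 9.8 with OpenCV on a randomly generated binary image; the image size, the random seed, and the 3x3 all-ones SE are arbitrary choices:

import cv2
import numpy as np

# Made-up binary test image (0/255) and a symmetric 3x3 structuring element
rng = np.random.default_rng(0)
F = np.uint8(255 * (rng.random((64, 64)) > 0.5))
SE = np.uint8(np.ones((3, 3)))
Fc = np.uint8(255) - F                      # complement of the binary image

# Equation 9.7: the complement of the erosion equals the dilation of the complement
print(np.array_equal(np.uint8(255) - cv2.erode(F, SE), cv2.dilate(Fc, SE)))

# Equation 9.8: the complement of the dilation equals the erosion of the complement
print(np.array_equal(np.uint8(255) - cv2.dilate(F, SE), cv2.erode(Fc, SE)))

Both comparisons should print True with OpenCV's default border handling, since the symmetric SE makes the reflection in Equations 9.5 and 9.6 irrelevant.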
9.5 Opening and closing
You can understand opening and closing well if the concepts of erosion and dilation are understood well. This is because opening is erosion followed by dilation, and closing is dilation followed by erosion. Some coding-based examples follow for better understanding.
9.5.1 Illustration of opening and closing
Refer to Figure 9.4 for a comparative illustration of opening and closing
with erosion and dilation for better understanding.
The following points are important to understand opening and closing in
Figure 9.4:
• In part (a) of the figure, the foreground (white pixels) is chosen on
purpose to understand the concepts.
• In part (d) of the figure, the structuring element is asymmetric with
anchor (-1,-1), i.e. pixel 4.
• Erosion is shown in part (b) of the figure. One can visually confirm that pixel 33 is the only pixel at which, if we place the anchor pixel of the structuring element, the structuring element fits completely inside the foreground of the input image. That is why it is the only highlighted pixel.
• In part (e) of the figure, opening is shown. The output of opening can be easily understood from erosion: it is erosion followed by dilation of the eroded image. Erosion marks as high only those anchor positions in the output image at which the structuring element fits in the foreground, but opening marks the anchor position as well as all the positions covered by the reflected structuring element placed there (whether they overlap with foreground or background). For example, pixel 33 is the anchor's position, so it is marked. Now, if the reflected (180 degrees rotated) version of the structuring element is kept on this pixel, it will additionally highlight pixels 23, 24, 32, 34, and 43, as shown. This is because of dilation. Note that pixel 24 was a part of the background in the input image, but it is still highlighted because of the reflected structuring element. Had the structuring element been symmetric, we could have said that the reflected structuring element (which, in that case, is the structuring element itself) would always overlap with the foreground of the input image.
• Part (c) of the figure shows dilation.
• Part (f) of the figure shows closing. It can be understood as dilation
followed by erosion of the dilated image.
Figure 9.4: Comparative illustration of opening and closing with erosion and dilation
9.5.2 Mathematical formalization of opening and closing
Formally, we may define opening (represented by the unfilled circle ∘) and closing (represented by the filled circle •) by Equations 9.9 and 9.10, respectively:
Equation 9.9:
F ∘ SE = (F ⊖ SE) ⊕ SE
Equation 9.10:
F • SE = (F ⊕ SE) ⊖ SE
Also, the opening and closing operations are duals of each other with respect to set complementation and reflection. This is represented in the following equations, respectively:
Equation 9.11:
(F • SE)ᶜ = Fᶜ ∘ ŜE
Equation 9.12:
(F ∘ SE)ᶜ = Fᶜ • ŜE
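Equations 9.9 and 9.10 are easy to verify numerically. The following is a minimal sketch (not one of the numbered code listings of this chapter) that compares a manual erosion-then-dilation and dilation-then-erosion against cv2.morphologyEx on a made-up random binary image with a symmetric SE; both comparisons are expected to print True:

import cv2
import numpy as np

# Made-up binary test image (0/255) and a symmetric structuring element
rng = np.random.default_rng(1)
F = np.uint8(255 * (rng.random((64, 64)) > 0.5))
SE = np.uint8(np.ones((3, 3)))

# Equation 9.9: opening is erosion followed by dilation
opened_manual = cv2.dilate(cv2.erode(F, SE), SE)
print(np.array_equal(opened_manual, cv2.morphologyEx(F, cv2.MORPH_OPEN, SE)))

# Equation 9.10: closing is dilation followed by erosion
closed_manual = cv2.erode(cv2.dilate(F, SE), SE)
print(np.array_equal(closed_manual, cv2.morphologyEx(F, cv2.MORPH_CLOSE, SE)))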
9.5.3 Application of opening and closing
To understand the potential application areas of opening and closing, we will
refer to Figure 9.5. Let us first understand the various foreground elements
shown in part (a) of this figure. There are five major groups of foreground
elements that we will consider in the input image; the first is a pentagon. The
second is a composite of squares and triangles with connections between
them. The third one is the cluster of four white circles placed horizontally.
The fourth one is the vertical rectangle with circular holes of different radii.
Fifth is a horizontal rectangle with three gulfs of different thicknesses
(shown at the bottom right of the figure). Each of these is designed to
understand specific attributes of opening and closing:
Figure 9.5: Potential applications for opening and closing
The structuring element is shown in part (c) of the figure and is drawn to scale. Remember that in the current case, it is symmetric.
Talking about the pentagon and the results of opening and closing in parts (b) and (d) of the figure, it becomes clear that the opening operation introduces smoothening of corner points. This is one of the characteristics of opening. Closing also possesses this characteristic: it smoothens the boundary too, though in a slightly different manner, as we will see shortly.
Coming to the second composite part formed by a combination of square,
triangle, and connections between them, let us note that the connections are
of two types — one is a straight line connection where there is no break;
however, in the bottom curved connection, there are various disconnections
in between. Also, note that both connections have widths lower than the
structuring elements. Near the top right corner of the square, there is a
protruding element. As shown in part (b) of the figure, opening removes all
the connections (of widths smaller than the structuring element). This is
expected because for opening, the first step is erosion, and because of that, if
the structuring element cannot be contained in the foreground, that portion of
the foreground is removed. Similar reasoning applies to protrusion too.
However, in the case of closing, as shown in part (d) of the figure, those
connections are retained. Two more important points to note in this context
are — firstly, the curved connection, which had many broken parts earlier, is
now completely connected. Secondly, the portions of connections attached to
the square and circle (at the points of connection) are now smoother after
closing. So, closing also introduces smoothening in its own way.
Talking about the cluster of four circles (placed horizontally), it can be noted that opening allows only those circles that are bigger than the structuring element to survive in the output. On the contrary, closing allows all of them. This argument is not shape
specific. Had there been structures other than circles, the principle is that if
they are smaller than the structuring element, then opening will remove
them, but closing will allow all. This attribute of opening comes in handy in
binary images when there are small patches of noise that we intend to
remove.
Fourthly, note the vertical rectangle with four holes in it in the input image. Closing [in part (d)] has the effect of filling all the holes in it that are smaller than the structuring element. Lastly, note that the horizontal rectangle with three gulfs of different widths is also treated properly by the closing operation. Gulfs whose widths are smaller than the structuring element are almost filled, and the larger one is smoothened out and becomes thinner. Thus, closing can be used for filling gulfs as well.
In summary, we can say that opening can be used for smoothening
boundaries of objects. It breaks narrow connections and eliminates
protruding elements (smaller in size than the structuring element). Closing
connects narrow gaps, fills gulfs, holes, and gaps in boundaries/contours. It
also smoothens the boundaries.
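This summarized behavior is simple to reproduce in code. The following small sketch (a synthetic example of ours, not one of the numbered code listings) shows opening removing isolated specks and a subsequent closing filling a tiny hole; the shapes, sizes, noise density, and SE size are all made up:

import cv2
import numpy as np

# Synthetic binary image: one large square, isolated white specks, and a tiny hole
img = np.zeros((100, 100), dtype=np.uint8)
cv2.rectangle(img, (20, 20), (80, 80), 255, thickness=-1)   # large foreground object
img[50, 50] = 0                                             # a one-pixel hole inside it
rng = np.random.default_rng(2)
salt = rng.random((100, 100)) > 0.99                        # sparse random specks
salt[15:86, 15:86] = False                                  # keep specks away from the square
img[salt] = 255

SE = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, SE)      # removes specks smaller than SE
closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, SE)  # fills the small hole

outside = np.ones_like(img, dtype=bool)
outside[20:81, 20:81] = False                            # region where only specks can live
print("speck pixels before opening:", int(np.count_nonzero(img[outside])))
print("speck pixels after opening :", int(np.count_nonzero(opened[outside])))
print("hole filled by closing     :", closed[50, 50] == 255)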
9.5.4 Python code for opening and closing
The code for generating the results of Figure 9.4 is shown below. Figure 9.5
can also be generated by the same code with minor modifications like taking
a custom-made input image [as depicted in part (a) of the figure] instead of
designing a binary image (Bimg1) in the code itself, not plotting the grids
and numbers on the pixels, etc. We recommend trying it out.
Line numbers 59 and 60 are the new lines as compared to the previous code.
They achieve the purpose of opening and closing. Note that they do not have any iterations parameter because whether you apply opening or closing to a binary image once or n times, the result remains the same.
01-
#=====================================================
=================
02- # PURPOSE : Learning Morphological Opening & Closing
03-
#=====================================================
=================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section
2.7.3
09-
10- #--------------------------------------------------------------------------
11- # Defining Custom function to annotate the plots (not necessary once we
12- # understand the concept)
13- #--------------------------------------------------------------------------
14-
15- # Function for plotting grid lines over pixels of image and pixel
numbering
16- def plot_pixel_grid_on_image(req_size,ax,img):
17- req_size_x=req_size[1]+1
18- req_size_y=req_size[0]+1
19-
20- #------------------ For grid lines on image -------------------------------
21- for i in np.arange(0,req_size_x,1):
22-
ax.plot(i*np.ones(req_size_y)-.5,np.arange(0,req_size_y,1)-.5,color='.5')
23- for i in np.arange(0,req_size_y,1):
24-
ax.plot(np.arange(0,req_size_x,1)-.5,i*np.ones(req_size_x)-.5,color='.5')
25- # In the above, color can be set as grayscale value between 0 to 1
also
26-
27- #------------------ For pixel numbering -----------------------------------
28- for i in np.arange(0,req_size_x-1,1):
29- for j in np.arange(0,req_size_y-1,1):
30- if img[j,i]==0:
31- ax.text(i-.25,j+.25,str(i+(req_size_y-
2)*i+j),color='1',fontsize=8)
32- else:
33- ax.text(i-.25,j+.25,str(i+(req_size_y-
2)*i+j),color='0',fontsize=8)
34- #--------------------------------------------------------------------------
35- # Creating some binary images for understanding the concepts
36- #--------------------------------------------------------------------------
37- Bimg1=np.uint8(255*np.array([\
38- [0,0,0,0,0,0,0,0,0,0,0,0], \
39- [0,0,0,0,0,0,0,0,0,0,0,0], \
40- [0,0,0,1,1,0,0,0,1,0,0,0], \
41- [0,0,1,1,1,1,1,1,1,1,0,0], \
42- [0,0,0,1,1,0,0,0,1,0,0,0], \
43- [0,0,0,1,0,0,0,0,0,0,0,0], \
44- [0,0,0,1,1,0,0,1,0,0,0,0], \
45- [0,0,0,0,0,1,1,0,0,0,0,0], \
46- [0,0,0,0,0,0,0,0,0,0,0,0], \
47- [0,0,0,0,0,0,0,0,0,0,0,0]] ))
48-
49- #--------------------------------------------------------------------------
50- # Creating structuring element (SE) for performing morphological
operations
51- #--------------------------------------------------------------------------
52- SE=np.uint8(255*np.array([\
53- [0,1,1], \
54- [1,1,1], \
55- [0,1,0], ]))
56-
57- result_image_E = cv2.erode(Bimg1,SE,iterations = 1,anchor=(-1,-1))
58- result_image_D = cv2.dilate(Bimg1,SE,iterations = 1,anchor=(-1,-1))
59- result_image_O =
cv2.morphologyEx(Bimg1,cv2.MORPH_OPEN,SE,anchor=(-1,-1))
60- result_image_C =
cv2.morphologyEx(Bimg1,cv2.MORPH_CLOSE,SE,anchor=(-1,-1))
61-
62- #--------------------------------------------------------------------------
63- # Plotting Logic
64- #--------------------------------------------------------------------------
65- fig,ax=plt.subplots(2,3)
66-
67- mf.my_imshow(Bimg1,'(a) Binary Image',ax[0,0])
68- plot_pixel_grid_on_image(np.shape(Bimg1),ax[0,0],Bimg1)
69-
70- mf.my_imshow(result_image_E,'(b) '+'Erosion',ax[0,1])
71-
plot_pixel_grid_on_image(np.shape(result_image_E),ax[0,1],result_image_
E)
72-
73- mf.my_imshow(result_image_D,'(c) '+'Dilation',ax[0,2])
74-
plot_pixel_grid_on_image(np.shape(result_image_E),ax[0,2],result_image_
D)
75-
76- mf.my_imshow(SE,'(d) Structuring Element',ax[1,0])
77- plot_pixel_grid_on_image(np.shape(SE),ax[1,0],SE)
78-
79- mf.my_imshow(result_image_O,'(e) '+'Opening',ax[1,1])
80-
plot_pixel_grid_on_image(np.shape(result_image_O),ax[1,1],result_image_
O)
81-
82- mf.my_imshow(result_image_C,'(f) '+'Closing',ax[1,2])
83-
plot_pixel_grid_on_image(np.shape(result_image_O),ax[1,2],result_image_
C)
84-
85- plt.show()
86- print("Completed Successfully ...")
Code 9.3: Comparative illustration of opening and closing with erosion and dilation
9.6 Hit and miss transform
Refer to Figure 9.6 for understanding the concept of hit and miss transform.
Let us see the structuring element in part (b) first. It now has three regions:
foreground (white), background (black), and do not care (intermediate
grayscale value). The anchor is set to (-1,-1), which means that it has to be
the center (pixel 12) of the structuring element [note that (2,2) will also work
as SE is 5x5 in shape].
Let us see what the hit and miss transform offers. For a given placement of a
structuring element on the input image, the pixel corresponding to the anchor
pixel of the structuring element in the output image will be made logic 1 (or
white in color) if the following two conditions are met simultaneously. First,
the foreground of the structuring element is completely contained in the
foreground of the input image, and second, the background of the structuring
element also matches the background of the input image. We do not have to
do anything for the do not care conditions.
Figure 9.6: Hit and miss transform
Note that if only the first condition were required to be satisfied, it would be called erosion. However, when the second condition is satisfied simultaneously, it becomes the hit and miss transform. If the foreground of the structuring element is contained in the foreground of the image, it is a hit. Similarly, if the background of the structuring element also matches the background of the input image, it is a hit on the background, which we call a miss; hence the name hit and miss transform.
Now, let us look at the input image in part (a) of Figure 9.6. It has three
objects. Let us talk about them one by one. When the anchor pixel of SE,
i.e., pixel 12 coincides with pixel 24 of the input image, the T shape
(foreground) coincides by 100%. We have found a hit. Simultaneously, the
background pixels of SE (black ones) also match with the background of the
input image. We just found a miss. About the do not care pixels in SE (pixel
3 and 8) we do not care what they overlap with. Since hit and miss are found
simultaneously, the conditions of hit and miss transform are fulfilled. Hence,
pixel 24 is highlighted in the output image in part (c) of the figure.
Now, talking about the second object in the input image (center object), if
we place the anchor of SE coincident with pixel 59 in the input image, we
find that the foreground of SE is contained in the foreground of the second
object — hence it is a hit. However, the background of SE does not coincide
100% with the background of the input image. Pixel 7 of SE and pixel 52 of
the input image do not overlap in color — hence, we have not found a miss.
Thus, pixel 59 in the output image is not highlighted.
The case of the third object in the input image (the right one) is like the first object. If we place the anchor of SE coincident with pixel 94 of the input image, we find a hit as well as a miss, and hence pixel 94 is made logic 1 in the output. The do not care pixels of SE (i.e., 3 and 8) overlap with pixels 81 and 88 of the input image; pixels 81 and 88 have different colors, but we do not care.
9.6.1 Mathematics behind hit and miss transform
Mathematically, the hit and miss transform is represented by:
Equation 9.13:
F ⊛ SE1,2 = (F ⊖ SE1) ∩ (Fᶜ ⊖ SE2)
Let us understand it part by part. Starting from the RHS, we see that it is an intersection of two erosions. The first erosion is of the foreground of the input image by SE1. The second erosion is of the background (i.e., Fᶜ) by SE2. So, we have two structuring elements. The first bracket finds a hit and the second finds a miss. The common portion (intersection) is the hit and miss. However, in the illustration given in Figure 9.6, we had only one SE. Let us call it SE1,2. It is made up of SE1 and SE2. So, the LHS is the hit and miss of F by SE1,2. Now, let us talk about the construction of SE1,2 by combining SE1 and SE2, using another illustration shown in Figure 9.7:
Figure 9.7: Hit and miss transform — broken as intersection of two erosions
Part (a) of the figure is the actual binary input image and part (b) is its complemented form, so that the background can be treated as the foreground for the second erosion in Equation 9.13. Parts (c) and (d) show the structuring elements used for the individual erosions. Parts (e) and (f) show the results of eroding the input image in part (a) by SE1 and the complemented image in part (b) by SE2, respectively. The result of the intersection of these two eroded versions is shown in part (g), which is the final hit and miss transformed image. Note that Figure 9.6 and Figure 9.7 have the same input (hence, the same output image). However, in Figure 9.6, we use a single structuring element instead of two (let us call it SE1,2). For constructing SE1,2 from SE1 and SE2, the foreground of SE1,2 is the foreground of SE1 and the background of SE1,2 is the foreground of SE2. Remember that these two sets of pixels never intersect. The remaining pixels are do not care pixels in SE1,2.
9.6.2 Python code for hit and miss transform
The results shown in Figure 9.6 are generated by the following code:
01-
#=====================================================
=================
02- # PURPOSE : Learning Hit and Miss Transform
03-
#=====================================================
=================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section
2.7.3
09-
10- #--------------------------------------------------------------------------
11- # Defining Custom function to annotate the plots (not necessary once we
12- # understand the concept)
13- #--------------------------------------------------------------------------
14-
15- # Function for plotting grid lines over pixels of image and pixel
numbering
16- def plot_pixel_grid_on_image(req_size,ax,img):
17- req_size_x=req_size[1]+1
18- req_size_y=req_size[0]+1
19-
20- #------------------ For grid lines on image -------------------------------
21- for i in np.arange(0,req_size_x,1):
22-
ax.plot(i*np.ones(req_size_y)-.5,np.arange(0,req_size_y,1)-.5,color='.5')
23- for i in np.arange(0,req_size_y,1):
24-
ax.plot(np.arange(0,req_size_x,1)-.5,i*np.ones(req_size_x)-.5,color='.5')
25- # In the above, color can be set as grayscale value between 0 to 1
also
26-
27- #------------------ For pixel numbering -----------------------------------
28- for i in np.arange(0,req_size_x-1,1):
29- for j in np.arange(0,req_size_y-1,1):
30- if img[j,i]==0:
31- ax.text(i-.25,j+.25,str(i+(req_size_y-
2)*i+j),color='1',fontsize=8)
32- else:
33- ax.text(i-.25,j+.25,str(i+(req_size_y-
2)*i+j),color='0',fontsize=8)
34- #--------------------------------------------------------------------------
35- # Creating a binary image for understanding the concept
36- #--------------------------------------------------------------------------
37- Bimg2=np.uint8(255*np.array([\
38- [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], \
39- [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], \
40- [0,0,1,1,1,0,0,1,1,1,0,0,1,1,1,0,0], \
41- [0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0], \
42- [0,0,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0], \
43- [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], \
44- [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]] ))
45-
46- #--------------------------------------------------------------------------
47- # Creating structuring element
48- #--------------------------------------------------------------------------
49- SE_HITMISS=np.array([\
50- [-1,-1,-1,-1,-1], \
51- [-1, 1, 1, 1,-1], \
52- [-1,-1, 1,-1,-1], \
53- [ 0, 0, 1,-1,-1], \
54- [-1,-1,-1,-1,-1], ])
55- # In above SE, 1 is for foreground
56- # -1 is for background
57- # 0 is for dont care
58-
59-
result_image=cv2.morphologyEx(Bimg2,cv2.MORPH_HITMISS,SE_HIT
MISS,anchor=(-1,-1))
60-
61- #--------------------------------------------------------------------------
62- # Plotting Logic
63- #--------------------------------------------------------------------------
64-
65- fig,ax= plt.subplots(2,1)
66-
67- mf.my_imshow(Bimg2,'(a) Binary Image',ax[0])
68- plot_pixel_grid_on_image(np.shape(Bimg2),ax[0],Bimg2)
69-
70- mf.my_imshow(result_image,'(c) Hit and Miss Transform',ax[1])
71- plot_pixel_grid_on_image(np.shape(result_image),ax[1],result_image)
72-
73- ax1= fig.add_subplot(1,4,4)
74- mf.my_imshow(np.uint8(255*(SE_HITMISS>0)+100*
(SE_HITMISS==0)),\
75- '(b) Structuring Element',ax1)
76- plot_pixel_grid_on_image(np.shape(SE_HITMISS),ax1,\
77- np.uint8(255*(SE_HITMISS>0)+100*(SE_HITMISS==0)))
78-
79- plt.show()
80- print("Completed Successfully ...")
Code 9.4: Hit and miss transform
The lines that differ from the previous codes are line numbers 47 to 59. Note that for using the command in line number 59, we need a combined structuring element instead of two separate structuring elements. Thus, 1 represents the foreground, -1 represents the background, and 0 represents the do not care pixels.
The results in Figure 9.7 are not difficult to generate if one knows how to erode images and take the logical AND of NumPy arrays. We recommend you try this out.
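As a starting point for that exercise, here is a small sketch (our assumption being that OpenCV implements cv2.MORPH_HITMISS exactly as the intersection of the two erosions of Equation 9.13, with its default border handling). It splits the combined structuring element of Code 9.4 into SE1 and SE2 and compares the manual result against the built-in transform on a small made-up image containing one T-shaped object:

import cv2
import numpy as np

# Combined structuring element of Code 9.4 (1: foreground, -1: background, 0: do not care)
SE_HITMISS = np.array([[-1,-1,-1,-1,-1],
                       [-1, 1, 1, 1,-1],
                       [-1,-1, 1,-1,-1],
                       [ 0, 0, 1,-1,-1],
                       [-1,-1,-1,-1,-1]], dtype=np.int32)

SE1 = np.uint8(SE_HITMISS == 1)    # hits must land on the image foreground
SE2 = np.uint8(SE_HITMISS == -1)   # misses must land on the image background

# A small made-up binary image containing one T-shaped object
Bimg = np.uint8(255 * np.array([[0,0,0,0,0,0,0],
                                [0,0,0,0,0,0,0],
                                [0,0,1,1,1,0,0],
                                [0,0,0,1,0,0,0],
                                [0,0,0,1,0,0,0],
                                [0,0,0,0,0,0,0],
                                [0,0,0,0,0,0,0]]))

# Equation 9.13: erode the foreground by SE1, erode the background by SE2, then intersect
hit = cv2.erode(Bimg, SE1)
miss = cv2.erode(cv2.bitwise_not(Bimg), SE2)
manual = cv2.bitwise_and(hit, miss)

builtin = cv2.morphologyEx(Bimg, cv2.MORPH_HITMISS, SE_HITMISS)
print(np.array_equal(manual, builtin))   # expected: True
print(np.argwhere(manual == 255))        # the single matching anchor position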
9.7 Boundary extraction
Boundary extraction in binary images is a simple task. We will explore that in this section. As a byproduct of the coding, we will also learn the Python code for converting subscripted indices to linear indices in images and vice versa, as that will be used in our actual Python program.
9.7.1 Mathematics behind boundary extraction
A simple illustration of boundary extraction is shown in Figure 9.8:
Figure 9.8: Boundary extraction
As you might have already guessed, the boundary of the foreground object
can be extracted by simply subtracting the eroded version of it from itself.
That is a layman’s definition; by subtraction, we mathematically mean set
difference precisely. So mathematically, if we represent the set of boundary pixels by β(F), it can be found as shown below:
Equation 9.14:
β(F) = F − (F ⊖ SE)
where the − sign represents set difference. Usually, a symmetric structuring
element is used to do so. The thicker you want the boundary to be, the bigger
the shape of the structuring element you should choose. The code for it will
be simple enough, but we will need some Python basics discussed in the next
sub-section before presenting the actual code.
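As a quick preview (independent of the index-based implementation that follows in Section 9.7.3), for 0/255 binary images the set difference of Equation 9.14 can be sketched directly with OpenCV's saturating subtraction; the file name below is the one used later in Code 9.6:

import cv2
import numpy as np

# 'img25.bmp' is the binary image used later in Code 9.6; any 0/255 image will do
F = cv2.imread('img25.bmp', 0)
SE = np.uint8(np.ones((3, 3)))            # use a larger SE for a thicker boundary

# Equation 9.14 for 0/255 images: the boundary is F minus the erosion of F
boundary = cv2.subtract(F, cv2.erode(F, SE))
cv2.imwrite('boundary.png', boundary)

This works because the eroded image is always a subset of F, so pixel-wise subtraction and set difference coincide for binary images.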
9.7.2 Subscripted vs. linear indices
Recall from Section 1.3 that there are two ways to represent a pixel’s
location in an image. First, by specifying the row and column number of that
pixel as ordered pair (r,c), which is called subscripted indexing, and
second, by specifying the count (starting from 0) counting pixels in order of
column after column — that is called linear indexing. The following code
shows the conversion from one form to another form of indexing:
01-
#=====================================================
=================
02- # PURPOSE : Convert subscripted indices to linear and vice versa
03-
#=====================================================
=================
04- import numpy as np
05-
06- # Create a 2D array and print it
07- input_array=np.array([[1,2,3,4,5],[4,5,6,7,8],[3,4,5,6,7]])
08- print(input_array)
09-
10- # Find all subscipted indices in the array that follow a given condition
11- sub_indx=np.where(input_array>5)
12- print(sub_indx)
13-
14- # Convert those subscripted indices to linear
15-
lin_indx=np.ravel_multi_index(sub_indx,np.shape(input_array),order='F')
16- # order in the above line tells us whether we want to count
17- # column wise ('F' - Fortran order) or row wise ('C' - C lang. order)
18- # The result returned by ravel_multi_index is not sorted.
19- print(lin_indx)
20-
21- # Convert linear indices so formed back to subscripted indices
22-
recovered_sub_indx=np.unravel_index(lin_indx,np.shape(input_array),order
='F')
23- print(recovered_sub_indx)
24-
25- print("Completed Successfully ...")
Code 9.5: Conversion from subscripted to linear indices and vice versa
The output of the code is as follows:
[[1 2 3 4 5]
[4 5 6 7 8]
[3 4 5 6 7]]
(array([1, 1, 1, 2, 2], dtype=int64), array([2, 3, 4, 3, 4], dtype=int64))
[ 7 10 13 11 14]
(array([1, 1, 1, 2, 2], dtype=int64), array([2, 3, 4, 3, 4], dtype=int64))
Completed Successfully ...
In line number 7, a two-dimensional array is created. In line number 11, all
the subscripted indices of the positions where the input array has a number
greater than 5 are recorded. In line number 15, by using
np.ravel_multi_index function, subscripted indices are converted to linear
form. This function takes three arguments: first, the array having subscripted
indices; second, the shape of the original input array; and third, the order. By order, we mean whether we want to count column-wise or row-wise. As discussed in Section 1.3, we will follow the column-wise order. For that, we need to use the Fortran order (the order in which the Fortran language counts), represented by 'F'. However, if you require the row-wise counting order, then you should use 'C', which stands for the C programming language order. In line
number 22, np.unravel_index function recovers the original subscripted
indices from the linear indices. See the output above and consider
experimenting with the code, and you will be ready to go.
9.7.3 Python code for boundary extraction
The simple (in the light of Code 9.5) self-explanatory code for boundary
extraction is presented in Code 9.6. The results of Figure 9.8 are generated
using this code.
01-
#=====================================================
=================
02- # PURPOSE : Learning Binary Boundary Extraction Algorithm
03-
#=====================================================
=================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section
2.7.3
09-
10- # Reading the image from disk
11- input_image=np.float32(cv2.imread('img25.bmp',0))
12-
13- # Creating a symmetric structuring element
14- SE=np.uint8(np.ones((3,3)))
15-
16- # Eroding the input image with the structuring element
17- eroded_image=cv2.erode(input_image,SE,iterations = 1)
18-
19- # Finding subscripted indices of foreground in input image
20- ip_f=np.where(input_image>0)
21-
22- # Finding subscripted indices of foreground in eroded image
23- e_f=np.where(eroded_image>0)
24-
25- # Converting subscripted indices to linear (for foreground of input
image)
26- # Also the data type is changed to SET as we need to perform set
difference later
27-
ip_f_lin=set(np.ravel_multi_index(ip_f,np.shape(input_image),order='F'))
28-
29- # Converting subscripted indices to linear (for foreground of eroded
image)
30- # Also the data type is changed to SET as we need to perform set
difference later
31-
e_f_linear=set(np.ravel_multi_index(e_f,np.shape(input_image),order='F'))
32-
33- # Applying the set difference (making the result LIST first and then INT
array)
34- bounday_pixel_lin=np.int32(list(ip_f_lin.difference(e_f_linear)))
35-
36- # Getting back the subscripted indices
37-
bounday_pixel_cord=np.unravel_index(bounday_pixel_lin,np.shape(input_i
mage),order='F')
38-
39- # Creating empty result image of same shape as input
40- result_image=np.zeros(np.shape(input_image))
41-
42- # Putting the boundary pixels as WHITE
43- result_image[bounday_pixel_cord]=255
44-
45- # Display logic
46- fig1,ax1=plt.subplots(1,2)
47- mf.my_imshow(mf.norm_uint8(input_image),'(a) Input Binary
Image',ax1[0])
48- mf.my_imshow(mf.norm_uint8(result_image),'(b) Boundary
Image',ax1[1])
49-
50- plt.show()
51- print("Completed Successfully ...")
Code 9.6: Boundary extraction algorithm code
9.8 Hole filling
Let us first define a hole, and then we will discuss how to fill it. As we will note, the hole filling algorithm is nothing but constrained dilation. In effect, this algorithm is simply an application of the principle of dilation under some constraints.
9.8.1 Defining a hole
Refer to Figure 9.9 for an illustration of holes. A binary image has
background (black in color) and foreground objects (single or multiple in number, white in color). Sometimes, the objects may contain a trapped portion
of the background, as shown in the figure. In the figure, we have four holes
labeled as H1, H2, H3, and H4:
Figure 9.9: Holes in a binary image (text is not a part of the figure)
It is important to note that holes are always fully enclosed by a boundary. That boundary may be thick (as for holes H1 and H4 in the first object of the figure) or thin (as for hole H3 in the last object). It may also be thicker at some points and thinner at others. Our objective in hole filling is to design an algorithm that fills the hole even when the worst type of boundary is present. Let us now explore what that worst type of boundary is and the method of filling such holes.
9.8.2 Hole filling algorithm
If we have an image where there is only one white pixel and the rest is the
background, and we start dilating that one white pixel by some structuring
element with infinite allowable iterations, all the background will be
converted to the foreground (in some finite iterations because the image has
finite dimensions). The principle behind hole filling is simple — specify a
point inside the hole, then dilate it conditionally — i.e., the dilated region
should never go out of the boundary of the hole. That way, the hole will be
filled.
We need to address three points: First, the form of the structuring element.
Second, the worst boundary for which we should design our algorithm.
Third, the condition that should be imposed on dilation to ensure it never
crosses the boundary of the hole. The answer is in the following equation:
Equation 9.15:
D_k = (D_(k−1) ⊕ SE) ∩ Iᶜ,   for k = 1, 2, 3, …
where I is the image and D_k is the dilated region at iteration k, with D_0 representing a single pixel coordinate that we supply at the beginning of the
algorithm inside the hole. The form of the structuring element is shown in
Figure 9.10:
Figure 9.10: Structuring element for hole filling
The worst boundary is a single pixel thick boundary (8-connected), and the
check we need to apply after every iteration of dilation is taking the
intersection of the dilated region with the complement of the image. This
will ensure that the dilation of the specified pixel inside the hole is bound by
the boundary of the hole. The algorithm stops when D_k and D_(k−1) become the same.
We will see various illustrations to understand the three answers given
above. However, first, let us implement Equation 9.15 and see the results in
the next section.
9.8.3 Python code for hole filling
As per Equation 9.15, following is an implementation of the hole filling
algorithm. The code is interactive. Refer to Figure 9.11 for the result.
01-
#=====================================================
=================
02- # PURPOSE : Learning Hole Filling Algorithm
03-
#=====================================================
=================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import my_package.my_functions as mf # This is a user defined package
08- # one may find the details related to its contents and usage in section
2.7.3
09-
10- I=cv2.imread('img26.bmp',0) # INPUT IMAGE
11- r,c=np.shape(I)
12- SE=np.uint8(np.array([[0,1,0],\
13- [1,1,1],\
14- [0,1,0]])) # SE FOR HOLE FILLING
15-
16- # Take input from user (inside the hole to be filled)
17- fig,ax=plt.subplots(1,2)
18- mf.my_imshow(I,'(a) Input Binary Image',ax[0])
19- mf.my_imshow(I,'(b) Input Binary Image',ax[1])
20- src=np.int16(np.asarray(plt.ginput(1))) # Src Pt (I/p from user)
21- ax[0].plot(src[0,0],src[0,1],'r.',markersize=10)
22-
23- # INTERMEDIATE DILATION IMAGE AT kth ITERATION
24- D=np.uint8(np.zeros((r,c)))
25- D[src[0,1],src[0,0]]=255 # Mark the pixel as inputed by user
26-
27- # INTERMEDIATE DILATION IMAGE AT (k-1)th ITERATION
28- D_prev=np.zeros((r,c)) # D(k-1)
29- I_comp=np.uint8(255*(I==0))
30-
31- while 1:
32- D=np.uint8(cv2.dilate(D,SE,iterations = 1))
33- D=np.uint8(1*np.logical_and(D,I_comp))
34- if np.sum(1*np.logical_xor(D_prev,D))==0:
35- break # break if D(k)=D(k-1)
36- D_prev=np.uint8(D.copy())
37-
38- hole_filled_image=np.uint8(1*np.logical_or(D,I))
39- mf.my_imshow(np.uint8(255*hole_filled_image),'(b) Hole filled',ax[1])
40-
41- plt.show()
42- print("Completed Successfully ...")
Code 9.7: Hole filling algorithm (interactive)
And here is the output of the code:
Figure 9.11: Hole filling algorithm
In part (a) of the figure, the user needs to select a pixel inside any of the holes (the selected pixel is marked by a dot). In part (b) of the figure, the corresponding hole is filled.
Now, to answer the three questions highlighted in the previous section, let us
look at Figure 9.12:
Figure 9.12: Illustration of hole filling algorithm (with number grid)
The initial pixel specified by the user is pixel 55, as can be seen in part (a) of
the figure. Now, the hole will start to fill because of dilation. Imagine a
situation when pixel 43 is getting processed. Due to the structuring element,
pixels 33, 42, 43, 44, and 53 should become high, but because 33 and 42 are
boundary pixels, they will be logic 0 in Iᶜ. Due to this, the intersection of the dilated set and Iᶜ will not have pixels 33 and 42, and hence, the final dilated
region will exclude these two pixels. This is the normal course of the
algorithm. Suppose that instead of using the structuring element of Figure
9.10, we use an all ones (3x3) structuring element. For the same pixel under
processing, i.e., pixel 43, due to dilation, pixels 32, 33, 34, 42, 43, 44, 52, 53,
and 54 will be highlighted. However, pixel 32 falls outside the boundary, and this pixel will not be removed by the intersection with Iᶜ. This is why the
current choice of the structuring element is justified, and the worst boundary
is a single pixel (8 connected) boundary. Had it been 3 or 4 pixels thick, the
problem of crossing the boundary would never arise. We should design our
algorithm for the worst possible case. Hence, the structuring element of
Figure 9.10 is a proper choice.
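To see the above argument in code, here is a small sketch (a made-up example of ours) that runs the conditional dilation of Equation 9.15 from a seed inside a diamond whose boundary is single pixel thick and only 8-connected. With the cross-shaped SE of Figure 9.10, only the hole is filled; with a 3x3 all-ones SE, the fill leaks through the diagonal boundary into the outer background:

import cv2
import numpy as np

# Made-up image: a diamond whose boundary is one pixel thick and only 8-connected
I = np.uint8(255 * np.array([[0,0,0,0,0,0,0,0,0],
                             [0,0,0,0,1,0,0,0,0],
                             [0,0,0,1,0,1,0,0,0],
                             [0,0,1,0,0,0,1,0,0],
                             [0,0,0,1,0,1,0,0,0],
                             [0,0,0,0,1,0,0,0,0],
                             [0,0,0,0,0,0,0,0,0]]))

def fill_from(I, seed, SE):
    # Conditional dilation of Equation 9.15, started from one seed pixel inside a hole
    D = np.zeros_like(I)
    D[seed] = 255
    I_comp = np.uint8(255 * (I == 0))
    while True:
        D_new = cv2.dilate(D, SE)
        D_new = np.uint8(255 * np.logical_and(D_new, I_comp))
        if np.array_equal(D_new, D):
            return D
        D = D_new

cross = np.uint8([[0,1,0], [1,1,1], [0,1,0]])   # the SE of Figure 9.10
box = np.uint8(np.ones((3, 3)))                 # a 3x3 all-ones SE

seed = (3, 4)                                   # a background pixel inside the diamond
print("pixels filled with cross SE:", np.count_nonzero(fill_from(I, seed, cross)))  # only the hole
print("pixels filled with box SE  :", np.count_nonzero(fill_from(I, seed, box)))    # leaks outside

The first count covers just the interior of the diamond, while the second also includes the entire outer background, which is exactly the leakage described above.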
9.9 Region filling
In the previous section, we have already set the foundation for region filling.
Holes are one kind of region in the binary image. By regions, we
conventionally mean the foreground object/objects in the binary image. If
we want to identify all the foreground objects, we need to use the following
equation:
Equation 9.16:
D_k = (D_(k−1) ⊕ SE) ∩ I,   for k = 1, 2, 3, …
The above equation has the same form as Equation 9.15, but instead of Iᶜ, it uses I. Also, the structuring element used is a 3x3 structure with all ones. Additionally, we need one initial point inside each foreground object of interest to start with, which is not an automated process. To automate the process, we are going to study connected component analysis in the next section.
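Before moving on, here is a minimal, non-interactive sketch of Equation 9.16 (essentially Code 9.7 with the complement removed); the file name and the seed coordinates are assumptions and must point to a binary image and to a pixel inside the object of interest:

import cv2
import numpy as np

I = cv2.imread('img26.bmp', 0)          # any binary (0/255) image; file name is an assumption
seed_row, seed_col = 100, 120           # assumed to lie inside the object of interest

SE = np.uint8(np.ones((3, 3)))          # 3x3 all-ones SE, as stated for Equation 9.16

D = np.zeros_like(I)
D[seed_row, seed_col] = 255
while True:
    D_new = cv2.dilate(D, SE)
    D_new = np.uint8(255 * np.logical_and(D_new, I))   # constrain growth to the foreground
    if np.array_equal(D_new, D):
        break
    D = D_new

# D now contains the connected foreground object that contains the seed
print("object size in pixels:", np.count_nonzero(D))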
9.10 Connected component analysis
Connected component analysis is more than just finding the connected parts
of the foreground object in the binary image. For a given foreground object,
it also tells us how many pixels there are, which pixels are boundary pixels, the centroid, and many other desired quantities. In this section, we will study just one way of finding connected components; there are many others, each with its own advantages, such as lower computational complexity.
9.10.1 Two pass method
The primary objective of this method is to find all the connected components
in the image. Secondly, properties like the number of pixels per object, their
coordinates, centroids, etc., can be trivially derived from the so found
connected components. The output of this method looks like the one shown
in Figure 9.13:
Figure 9.13: Typical output of connected component analysis
One may note that there are 7 connected objects (with 8-connectivity). On
each pixel of a given object, a unique number is written (note that this
number is not the linear index pixel coordinate this time). Ideally, one would
expect numbers from 1 to 7 for 7 objects, but they are not from that set.
Although it is trivial to map these numbers to the range 1 to 7, that is just relabeling. The crux is that every connected component is identified by a unique identifier. Let us now discuss the method by which we arrived at this figure from the input image. It happens in two passes. By passes, we mean full traversals of the image in some order. Our order is conventional: pixels are processed from left to right, top to bottom. Let us see what happens in pass 1.
9.10.1.1 Pass 1 of connected component analysis
While traversing the entire image pixel by pixel in the order specified in the
previous section, in pass 1, we process every foreground pixel. For
background pixels, nothing is to be done. Since a binary image may have
foreground pixels at the boundary, we zero pad the entire image by a 1-pixel
thick border in our implementation. We also make a pass1 image of the same
size as the padded input image and store the result of pass 1 in it. This image
will contain temporary labels of the objects, which will turn into unique
labels in pass 2. Now, let us discuss what happens at every pixel during
processing. We need to keep Figure 9.14 in mind: while calculating the label in the pass1 image for pixel I(i,j), the pixels highlighted in gray color are
already processed. A variable called next_label is initialized to 1.
Figure 9.14: Processing details of pass 1 at a pixel
To calculate the label for I(i,j), we do the following:
• If the already processed pixels are all background, we assign a new label to the current pixel by setting pass1(i,j)=next_label and then incrementing the value of next_label.
• If the already processed pixels in the pass1 image have the same label
value for all, we copy this value to the result, i.e., pass1(i,j).
• If the already processed pixels in the pass1 image have different labels,
we find the minimum of all and assign this value to pass1(i,j). Also, all
these different labels are equivalent. Hence, we keep a record of these in
a list, and in pass 2, we will make all of them the same (usually, the
smallest among them will replace all).
9.10.1.2 Pass 2 of connected component analysis
Once pass 1 is over, in pass 2, we again traverse the image in the
conventional order and replace all the equivalent labels with a single label
(the minimum one), which is our result. Have a look at Figure 9.15 to see
the results of the two passes. Pass 2 image is the result. From this image,
other properties like pixels per object, boundary pixels, area, perimeter,
centroid, etc., can be easily derived (which we are not covering here as it is a
trivial exercise).
Figure 9.15: Pass 1 and 2 of connected component analysis for finding connected components
The result in Figure 9.15 has been generated using the following code:
001-
#=====================================================
=================
002- # PURPOSE : Learning Connected Component Analysis
003-
#=====================================================
=================
004- import cv2
005- import matplotlib.pyplot as plt
006- import numpy as np
007- import my_package.my_functions as mf # This is a user defined
package
008- # one may find the details related to its contents and usage in section
2.7.3
009-
010- #--------------------------------------------------------------------------
011- # Defining Custom function to annotate the plots (not necessary once
we
012- # understand the concept)
013- #--------------------------------------------------------------------------
014-
015- # Function for plotting grid lines over pixels of image and pixel
numbering
016- def plot_pixel_grid_on_image(req_size,ax,img,pass0):
017- req_size_x=req_size[1]+1
018- req_size_y=req_size[0]+1
019-
020- #------------------ For grid lines on image -------------------------------
021- for i in np.arange(0,req_size_x,1):
022-
ax.plot(i*np.ones(req_size_y)-.5,np.arange(0,req_size_y,1)-.5,color='.5')
023- for i in np.arange(0,req_size_y,1):
024-
ax.plot(np.arange(0,req_size_x,1)-.5,i*np.ones(req_size_x)-.5,color='.5')
025- # In the above, color can be set as grayscale value between 0 to 1
also
026-
027- #------------------ For pixel numbering -----------------------------------
028- for i in np.arange(0,req_size_x-1,1):
029- for j in np.arange(0,req_size_y-1,1):
030- if img[j,i]==0:
031- # White text on black background
032- ax.text(i-.25,j+.25,str(pass0[j,i]),color='1',fontsize=8)
033- else:
034- # Black text on white (or any non-zero gray) background
035- ax.text(i-.25,j+.25,str(pass0[j,i]),color='0',fontsize=8)
036-
037- #--------------------------------------------------------------------------
038- # Creating a binary image for understanding the concept (This could be
039- # replaced by a binary image instead when working with real images)
040- #--------------------------------------------------------------------------
041- I=np.uint8(255*np.array([\
042- [0,0,0,0,0,0,0,0,0,0,1,1,1], \
043- [0,0,1,0,0,0,0,0,1,0,0,1,1], \
044- [0,1,1,0,1,0,1,1,0,0,0,0,0], \
045- [0,0,0,1,0,0,0,1,0,1,1,1,0], \
046- [0,0,0,0,0,0,0,0,0,1,0,1,0], \
047- [0,1,0,1,0,0,1,0,0,1,0,1,0], \
048- [0,0,1,0,0,1,1,1,0,0,1,0,0], \
049- [0,1,0,1,0,0,1,0,0,0,0,0,1], \
050- [0,0,0,0,0,0,0,0,0,0,1,0,1], \
051- [0,0,0,0,0,0,0,0,0,0,1,1,1]] ))
052-
053- # Creating zero padded image
054-
I2=cv2.copyMakeBorder(I,1,1,1,1,cv2.BORDER_CONSTANT,value=0)
055- r,c=np.shape(I)
056-
057- #--------------------------------------------------------------------
058- # PASS 1
059- #--------------------------------------------------------------------
060- next_label=1 # variable for creation of new labels
061- equ_list=[] # Equivalent labels list
062-
063- pass1=np.zeros((r+2,c+2))
064- for i in np.arange(0,r+2,1):
065- for j in np.arange(0,c+2,1):
066- if I2[i,j]==255:
067- arr=np.sort(np.array([pass1[i-1,j-1],pass1[i-1,j],\
068- pass1[i-1,j+1],pass1[i,j-1]]),0)
069- if (np.sum(arr)==0): # If all processed pixels are background
070- pass1[i,j]=next_label
071- next_label=next_label+1
072- else: # If at least one processed pixel already has a label
073- arr=np.delete(arr,np.where(arr==0))
074- pass1[i,j]=arr[0]
075- if (len(np.unique(arr))!=1):
076- equ_list.append(arr.tolist())
077-
078- pass1=np.int16(pass1)
079- fig,ax=plt.subplots(1,2)
080- mf.my_imshow(I2,'(a) PASS 1',ax[0])
081- plot_pixel_grid_on_image(np.shape(I2),ax[0],I2,pass1)
082-
083- #--------------------------------------------------------------------
084- # Equivalent labels list management logic
085- #--------------------------------------------------------------------
086- len_list=len(equ_list)
087- dummy_arr=np.zeros((len_list,2))
088- for i in np.arange((len_list-1),-1,-1):
089- dummy_arr[i,0]=i
090- dummy_arr[i,1]=equ_list[i][0]
091- dummy_arr=dummy_arr[dummy_arr[:,1].argsort()]
092-
093- #--------------------------------------------------------------------
094- # PASS 2
095- #--------------------------------------------------------------------
096- pass2=pass1.copy()
097- for i in np.arange(len_list-1,-1,-1):
098- len_sub_list=len(equ_list[i])
099- for j in np.arange(1,len_sub_list,1):
100- pass2[np.where(pass2==equ_list[i][j])]=equ_list[i][0]
101-
102- mf.my_imshow(I2,'(b) PASS 2',ax[1])
103- plot_pixel_grid_on_image(np.shape(I2),ax[1],I2,pass2)
104-
105- plt.show()
106- print("Completed Successfully ...")
Code 9.8: Connected component analysis
A few important points to note about the code are — first, in line number 16,
the last argument is changed from the previous codes in this chapter as we
intend to display the label number and not the pixel linear indices.
Accordingly, line numbers 32 and 35 have changed. The code only looks
large in length because we want to plot the grid and display the label
numbers over the image. Also, we have used an image created inside the
code — we could simply import any binary image for doing the same thing.
The code is otherwise compact and can be easily understood from the
explanation above.
9.11 Connected component analysis using skimage library
In this section, we use the inbuilt methods of the skimage library to find connected components, learn what properties we can find for them, and see how to extract those properties. The following Code 9.9 will help us do that:
01-
#=====================================================
======================
02- # PURPOSE : Learning Connected Component Analysis using inbuilt
functions
03-
#=====================================================
======================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import skimage as ski
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section
2.7.3
10-
11- # Import and Display the image
12- input_image=cv2.imread('img30.bmp',0)
13- fig1,ax1=plt.subplots(1,2)
14- mf.my_imshow(input_image,'(a) Input Binary Image',ax1[0])
15-
16- # Find all labels, print unique labels and display as image
17- label_img = ski.measure.label(input_image, background=0)
18- print("Unique labels generated are ... ",np.unique(label_img))
19- mf.my_imshow(mf.norm_uint8(label_img),'(b) Labelled Image',ax1[1])
20-
21- # Finding properties of objects in image
22- comp_props=ski.measure.regionprops(label_img)
23-
24- # Printing some of all the available properties of connected components
25- for i in np.arange(0,np.max(label_img),1):
26- print("Area, Centroid and Eccentricity of object with label
",str(i+1),"are ...")
27-
print(comp_props[i].area,comp_props[i].centroid,comp_props[i].eccentricity
)
28-
29- plt.show()
30- print("Completed Successfully ...")
Code 9.9: Connected component analysis using skimage library
Line number 17, which is label_img = ski.measure.label(input_image, background=0), uses the skimage library to calculate the label image. The first argument is the image on which we want to compute the labels. The second argument is the grayscale value of the background pixels; by default, it is 0, but if a binary image is formed using some other convention, we may set this value here. No labels are calculated for the background by this command. The output (labeled image) is shown in Figure 9.16. In part (b) of the figure, labels are represented by colors (grayscale values). One may write a code like Code 9.8 where numbers are used as labels instead of colors.
Figure 9.16: Output of connected component analysis using skimage library
Line number 22, which is comp_props=ski.measure.regionprops(label_img), computes the various properties available for the connected components. Line number 27 prints some of them. To see all the available properties, one may run help(ski.measure.regionprops) on the Python shell; there are numerous such properties, but we have only printed three of them: area, centroid, and eccentricity. The syntax for accessing the desired properties is shown in line number 27. The output of the program (as seen on the shell) is as follows:
Unique labels generated are ... [0 1 2 3]
Area, Centroid and Eccentricity of object with label 1 are ...
14627 (81.00840910644698, 139.46550899022355) 0.8311768368274545
Area, Centroid and Eccentricity of object with label 2 are ...
5220 (172.29367816091954, 341.0147509578544) 0.4802359825171935
Area, Centroid and Eccentricity of object with label 3 are ...
15135 (238.25041295011562, 172.5660389824909) 0.6978435742000586
By area, we mean the total number of pixels in that connected component.
Centroid is returned as a set of coordinates. Eccentricity is the eccentricity of
the ellipse that has the same second moments as the region. It is the ratio of
the focal distance (distance between focal points) over the major axis length.
The value is in the interval [0, 1). When it is 0, the ellipse becomes a circle.
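Such properties are typically used to filter or select objects. The following small sketch (our own example, using a made-up image rather than img30.bmp, with an arbitrary area threshold) keeps only the connected components whose area exceeds that threshold:

import numpy as np
import skimage as ski

# A tiny made-up binary image: one large blob and one small blob
img = np.zeros((40, 40), dtype=np.uint8)
img[5:25, 5:25] = 255      # 400-pixel object
img[30:33, 30:33] = 255    # 9-pixel object

label_img = ski.measure.label(img, background=0)
props = ski.measure.regionprops(label_img)

# Keep only objects whose area exceeds an arbitrary threshold
min_area = 50
filtered = np.zeros_like(img)
for p in props:
    if p.area > min_area:
        filtered[label_img == p.label] = 255   # p.label is the label value of the object

print([int(p.area) for p in props])            # expected: [400, 9]
print(np.count_nonzero(filtered) == 400)       # only the large object survives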
9.12 Convex hull
Figure 9.17 will help us understand the concept of convex objects and
convex hulls. Parts (a) and (b) of the figure show two different objects (in
gray color). We call the first one convex and the second non-convex.
Informally, let us discuss the way of deciding this for any given shape.
Simply put a rubber band around the boundary of the objects. If the rubber
band touches every part of the boundary, the object is convex, otherwise it is
non-convex. This can be seen in parts (c) and (d) of the figure. From part
(c), keeping in mind the above test, we can call the object in part (a) as
convex, since the boundary of object in part (a) is completely touched by the
rubber band [shown in black color in part (c)]. However, as we note from
part (d), some portion of the boundary of the object and the rubber band
does not overlap — and this is why the object in part (b) is non-convex.
Figure 9.17: Convex and non-convex objects together with convex hull
The rubber band in our experiment is called the convex hull. Note that all the interior angles of a convex hull (whether it is drawn for a convex or a non-convex object) are less than 180 degrees. The objects may not always be polygonal in shape; in that case, it is difficult to compute interior angles. What is guaranteed for a convex hull, however, is the convexity of its surface seen from the outside (or its concavity seen from the inside).
An alternate definition of a convex object (and not of the convex hull) is that if a line segment is drawn between any two points inside a convex object, it always remains inside the object; this is true for every pair of points (pixels) in the object.
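Before developing the Graham scan in the next subsection, it is worth noting that OpenCV already ships a convex hull routine. The following sketch (our own example using cv2.convexHull; it is not the method developed below) computes the hull of a made-up non-convex, L-shaped object and shows that the hull area exceeds the object area:

import cv2
import numpy as np

# A made-up non-convex (L-shaped) binary object
img = np.zeros((120, 120), dtype=np.uint8)
img[20:100, 20:50] = 255
img[70:100, 20:100] = 255

# Take the object's external contour and compute the convex hull of its points
contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x signature
hull = cv2.convexHull(contours[0])

# Draw the hull (the "rubber band") on top of the object for comparison
vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
cv2.polylines(vis, [hull], isClosed=True, color=(0, 0, 255), thickness=1)
cv2.imwrite('convex_hull_demo.png', vis)

# For a convex object the hull area roughly equals the object area; here it is clearly larger
print("object area:", int(np.count_nonzero(img)), " hull area:", int(cv2.contourArea(hull)))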
9.12.1 Graham scan procedure for finding convex hull
Before we try to find out the convex hull of objects in binary images, we will
study how a convex hull is calculated for a set of points in the two-dimensional Euclidean space. We will then carry forward the concepts
developed in this section to binary morphology for finding the convex hull.
Have a look at Figure 9.18. In part (a) of this figure, 10 points in two-
dimensional space are shown. Our objective is to enclose these 10 points
with a rubber band. Imagine these points as nails fitted on a wall
(background). We want to put a rubber band around them such that all the
nails are inside it. Technically, this is called finding a convex hull.
Figure 9.18: Convex hull for a set of points in two-dimensional space
In part (b) of the figure, the desired rubber band is shown with a dark
boundary. Let us now see how to achieve it. Although there are many
methods for finding a convex hull with their own pros and cons (despite the
same result), we will try to understand the Graham scan algorithm for its
simplicity of understanding and implementation:
1. Select a pivot point: The pivot point is the point with the lowest Y
coordinate. If there is a tie, then it is the leftmost point with the lowest Y
coordinate. In our case, as noted from part (b) of the figure, point (1,4)
is the pivot point (although there was a point (6,4) also with the same Y
coordinate, but as per the rule, we took the leftmost one).
2. Number all the points: In our case, 9 points remain. To number them,
we assume a horizontal line passing through the pivot point. Calculate
the angle made by a line joining the point under consideration and the
pivot point with respect to this horizontal line. The points are then
numbered in ascending order of angles. Part (b) of Figure 9.18 shows
the numbers assigned to the rest of the points calculated in the above
manner. For convenience, dotted lines to every point are already shown.
Note that by now, we have not calculated the dark boundary. That is our
final aim.
3. Select-reject: Now, select or reject the numbered points for being
included in the convex hull. For this, we will traverse all the points
starting in order of their numbers as assigned in Step 2. Let us
understand this using the example in part (b) of the figure. Since Step 3 will
happen in stages, we have summarized the stages of traversal in Table
9.1. Keep an eye on this table as we proceed with the explanation.
Stage no.   p       q    r       q pass/fail
1           pivot   1    2       Pass
2           1       2    3       Pass
3           2       3    4       Fail
4           1       2    4       Pass
5           2       4    5       Pass
6           4       5    6       Fail
7           2       4    6       Fail
8           1       2    6       Pass
9           2       6    7       Pass
10          6       7    8       Pass
11          7       8    9       Fail
12          6       7    9       Fail
13          2       6    9       Pass
14          6       9    pivot   Pass
Table 9.1: Stages of Graham scan algorithm
At every stage of traversing the points, we will maintain three variables p, q, and r. Remember
that at every stage, we test point q for whether it should be included in the convex hull or not. To begin with,
we take p=pivot point, q = point (1), and r =point (2). So, point (1) is under test. Also, remember
that the pivot point will always be a part of the convex hull — so there is no testing for it. Now,
we have to test point q. Travel from point p to q on a straight line segment [see part (b) of
Figure 9.18]. On reaching the end of the line segment, look at the deviation in the direction
required to travel to point r — i.e., whether we go left or right. If you go left — that is allowed,
and point q passes the test. However, if you travel right, point q will fail the test. After passing or
failing, there are separate procedures for updating values of p, q, and r for the next stage (we
will see shortly).
Currently, as per the values of p, q, and r, we start traveling from pivot point to point (1). As we
reach the end, we note that we will have to take a left to reach point (2) — point (1) has passed
the test. We record these entries in the first row of Table 9.1. The reason point (1) is highlighted
in the column of q with gray color will be addressed when we compute the results. Also, note
that by now, the thick line of part (b) of the figure, which stands for the convex hull is not drawn
— it will be drawn in the end when we have the final convex hull. We are traveling on
imaginary lines as of now. Now that q has passed the test, we will update p, q, and r as follows –
p(new) = q (old), q(new) = r(old), and r(new) =next point in order. So, now p=1, q=2, and r =3.
Again, apply for the same test and see for yourself that q passes the test. Make an entry in the
second row of the table and form the new values of p, q, and r, as shown in the next row of the
table.
At this stage (row 3 of the table), you will find that q (i.e., point 3) fails the test. We all pass and
fail in life multiple times. It is just that the update procedure will change. p and q for stage 4 will
be the same as p and q of stage 2 as that is the stage where 3 (i.e., failed point) last appeared in
the column of r (in pass row). For the current stage, r will increment, i.e., r = point 4. Keep
repeating the procedure, and it will be easy for you to create the entire table.
4. Computing the result: The points included in the convex hull are not necessarily all the points
that have passed — many points have passed and failed multiple times. To compute the result, start
from the bottommost row of Table 9.1 and move upwards along the column of q. Note all those
points whose latest result is a pass (previous results do not matter). You
will see that those points are 9, 6, 2, 1, and the pivot (which is always included). Those
points, joined in that order, form the convex hull.

9.12.2 Code for understanding convex hull computation


Before presenting the actual code, let us understand some fundamentals that
will help us understand the code better. Have a look at Figure 9.19. There
are three points in the Euclidean space — p, q, and r. While traveling from
point p to q on a straight line segment and then turning to travel towards
point r, we need to take a left turn. It could also have been a right turn if
point r were on the other side. We need to represent this left/right turn
mathematically.

Figure 9.19: Illustration of left-right turn


This can be easily done by finding out the difference in slopes of the two
line segments, as given in the following equation, with the interpretation that follows:
Equation 9.17:

β = (q_y − p_y)/(q_x − p_x) − (r_y − q_y)/(r_x − q_x)

where (p_x, p_y), (q_x, q_y), and (r_x, r_y) are the coordinates of points p, q, and r, respectively.
If the value of β, which is the difference between the slopes, in the above
equation is 0, then the points are collinear. If it is positive, then there is a
right turn (clockwise rotation), and if it is negative, the turn is left
(counterclockwise). Instead of using the above equation, the following
modified form, obtained by clearing the denominators (which also avoids
division by zero for vertical segments), can be used:
Equation 9.18:

µ = (q_y − p_y)(r_x − q_x) − (q_x − p_x)(r_y − q_y)

with the same interpretation for µ as for β. We will use this in Code 9.10. Do
not be intimidated by the number of lines in the code. The code is simple if
you have followed it so far:
01- #==============================================================================
02- # PURPOSE : Finding Convex Hull using Graham Scan Algorithm
03- #==============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import skimage as ski
08- import math
09- import my_package.my_functions as mf # This is a user defined package
10- # one may find the details related to its contents and usage in section 2.7.3
11-
12- # Function to calculate the convex hull using Graham Scan Procedure
13- def conv_hull_graham_scan(points_list): # Input - Unsorted List of 2D points
14-     n = len(points_list)
15-
16-     # Nested-Function to find the orientation indicator of triplet
17-     # of points (p, q, r). Returns:
18-     #   0: Collinear points_list
19-     #   1: Clockwise points_list
20-     #   2: Counterclockwise points_list
21-     def find_orient_indicator(p, q, r):
22-         val = (q[1] - p[1]) * (r[0] - q[0]) - (q[0] - p[0]) * (r[1] - q[1])
23-         if val == 0:
24-             return 0
25-         return 1 if val > 0 else 2
26-
27-     # Find the point with the lowest y-coordinate (and leftmost if tie appears)
28-     btm_left_pt = min(points_list, key=lambda point: (point[1], point[0]))
29-
30-     # Sort the points_list based on polar angle with respect to the btm_left_pt point
31-     sorted_points_list = sorted(points_list, key=lambda point:\
32-         (math.atan2(point[1] - btm_left_pt[1], point[0] - btm_left_pt[0]), point))
33-
34-     # Initialize the convex hull with the first three points_list
35-     hull = [sorted_points_list[0], sorted_points_list[1], sorted_points_list[2]]
36-     hull_index = [0, 1, 2]
37-
38-     # Iterate over the sorted points_list to build the convex hull
39-     print('p', 'q', 'r', 'Pass?')
40-     for i in range(3, n):
41-         print(hull_index[-3], hull_index[-2], hull_index[-1], 1)
42-         while len(hull) > 1 and find_orient_indicator(hull[-2], hull[-1], sorted_points_list[i]) != 2:
43-             hull.pop()
44-             pop_ele = hull_index.pop()
45-             print(hull_index[-1], pop_ele, i, 0)
46-         hull.append(sorted_points_list[i])
47-         hull_index.append(i)
48-     print(hull_index[-3], hull_index[-2], hull_index[-1], 1)
49-     print(hull_index[-2], hull_index[-1], 0, 1)
50-
51-     return hull, sorted_points_list
52-
53- # Creating Random points in 2D space (points should be more than 2)
54- # Although fractional coordinates can also be taken, for illustration,
55- # we take integer coordinates.
56- #points_list = np.random.randint(1,10,(10,2)).tolist()
57- points_list = np.array([(7,9),(5,8),(2,7),(4,6),(2,6),(6,6),(4,5),(8,5),(1,4),(6,4)]).tolist()
58-
59- # Calculate the convex hull
60- convex_hull_pts_list, sorted_points_list = conv_hull_graham_scan(points_list)
61- sorted_points_list = np.array(sorted_points_list)
62-
63- #--------------------------------------------------------------------------
64- # Plotting Logic
65- #--------------------------------------------------------------------------
66- # Plot the convex hull and points_list
67- fig, ax = plt.subplots(1, 2)
68- fig.show()
69- points_array = np.array(points_list)
70- convex_hull_pts_array = np.array(convex_hull_pts_list)
71- ax[1].plot(points_array[:,0], points_array[:,1], 'k.', markersize=10, label="2D Points")
72-
73- # Append the initial point at the last of the array as well for plotting
74- # so that the convex hull is a closed curve
75- convex_hull_pts_array2 = np.vstack((convex_hull_pts_array, convex_hull_pts_array[0]))
76- ax[1].plot(convex_hull_pts_array[0,0], convex_hull_pts_array[0,1], 'ko', markersize=10, label="Pivot point")
77- r, c = np.shape(points_array)
78-
79- for i in np.arange(1, r, 1):
80-     ax[1].plot([sorted_points_list[0,0], sorted_points_list[i,0]],\
81-         [sorted_points_list[0,1], sorted_points_list[i,1]], '--', markersize=10, color=[.7,.7,.7])
82-     ax[1].text(sorted_points_list[i,0], sorted_points_list[i,1], '('+str(i)+')')
83- ax[1].plot(convex_hull_pts_array2[:,0], convex_hull_pts_array2[:,1], '-', color='black', label="Convex Hull")
84- ax[0].plot(points_array[:,0], points_array[:,1], 'k.', markersize=10, label="2D Points")
85- ax[0].grid()
86- ax[0].set_title("(a) Set of points", fontsize=15)
87- ax[0].set_xlabel("The X axis ->")
88- ax[0].set_ylabel("The Y axis ->")
89- ax[0].axis("equal")
90- ax[0].legend()
91- ax[1].grid()
92- ax[1].set_title("(b) Convex Hull of set of points", fontsize=15)
93- ax[1].set_xlabel("The X axis ->")
94- ax[1].set_ylabel("The Y axis ->")
95- ax[1].axis("equal")
96- ax[1].legend()
97-
98- plt.show()
99- print("Completed Successfully ...")
Code 9.10: Graham scan algorithm for finding convex hull
From line numbers 13 to 51, the function for the Graham scan procedure is
written. We will come to it a little later. Let us first discuss from line number
56 onwards. Line number 56 (commented out here) creates a list of random
points; line number 57 uses the fixed set of points that reproduces the exact
graphs of Figure 9.18. You may swap the comment between these two lines to
experiment with random points. In line number 60, the convex hull is calculated
through the function conv_hull_graham_scan, and the returned variables
convex_hull_pts_list and sorted_points_list contain the convex hull and the
sorted points list as discussed, respectively. The rest is plotting logic.
The function conv_hull_graham_scan has a nested function
find_orient_indicator, which finds whether we have to turn left or not, as
per the discussion at the beginning of this subsection. In line number 28,
which is btm_left_pt = min(points_list, key=lambda point: (point[1],
point[0])), we find the point with the lowest Y coordinate, and if there is
a tie between two or more points, we select the one with the lowest X
coordinate. Let us understand the min function in some detail through the
following code example:
1- a=[1.5,.5,-3,2]
2- b=[x**2 for x in a]
3- print(min(a))
4- print(b)
5- print(min(a,key=lambda x:x**2))
Code 9.11: Understanding the min function in Python
The output of the code is as follows:
-3
[2.25, 0.25, 9, 4]
0.5
First, we should note that we are learning about Python's built-in min function,
which does not belong to numpy or any other library. In Code 9.11, line number 1
defines a list of some numbers in variable a. In line number 2, we create a
list b such that every element of b is the square of the corresponding element of a. In line number
3, the min function is used, and the value of the minimum element of list a,
which is -3, is printed. Now begins the interesting part. In line number 4, list
b is printed, and then in line number 5, we again apply the min function to
list a to find the minimum element, but this time according to some key. This
key is a lambda function (it can be any function in general) that maps every
element x of the list to its square x**2. The minimum is found among these
squared values and not among the elements of a themselves. Whatever the index
of that minimum value is, the element of a at the same index is returned as the
answer of this min function. From the output, the minimum amongst the squared
elements is 0.25. Its index is 1, and a[1] = 0.5.
Coming back to line number 28 of Code 9.10, btm_left_pt is the point with
the minimum Y coordinate, and if there is a tie, it selects the minimum X
coordinate. This is ensured by key=lambda point: (point[1], point[0]).
Similarly, in line number 31, sorted_points_list is created as per the angles
made with respect to the X axis, as discussed earlier, by using the sorted
function with the angle as the key. In line number 35, we initialize hull, i.e., the
convex hull, as a stack prefilled with the first three sorted points. In line
number 36, we initialize the corresponding indices too; this is for the
purpose of printing a table like Table 9.1. From line number 39 to 51, the
convex hull is created in the hull variable, and the table is printed on the Python
shell output as follows (for the example of Figure 9.18):
p q r Pass?
0 1 2 1
1 2 3 1
2 3 4 0
1 2 4 1
2 4 5 1
4 5 6 0
2 4 6 0
1 2 6 1
2 6 7 1
6 7 8 1
7 8 9 0
6 7 9 0
2 6 9 1
6 9 0 1
Completed Successfully ...
In the above output, 1 represents pass, and 0 represents fail in the last
column. A 0 in columns of p, q, and r represents the pivot (starting and
ending) point of the convex hull.

9.12.3 Convex hull of objects in binary images


The method of finding a convex hull discussed in the previous subsection
can be extended to binary images too. The (row, column) coordinates of the
foreground pixels in a binary image should be used instead of the X-Y
coordinates used earlier. To find the convex hull of a foreground object, we
do not need the coordinates of every pixel in the object. Instead, the
coordinates of the boundary pixels suffice — and those can be found easily by
the boundary extraction algorithm discussed in Section 9.7. One must try this
out to get results like Figure 9.20 (a sketch is given after the figure):

Figure 9.20: Graham scan procedure applied on an image


Once we obtain the result shown in part (c) of Figure 9.20, we may fill the
inside region of the hull by the region filling/hole filling algorithm, as desired.

9.12.4 Python’s inbuilt method for finding convex hull


Having understood one of the methods of finding a convex hull, here is the
inbuilt way of finding the convex hull of a binary image in Python:
01- #==============================================================================
02- # PURPOSE : Finding Convex Hull using Python's Inbuilt Method.
03- #==============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import skimage as ski
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- # Import and Display the image
12- input_image = cv2.imread('img28.bmp', 0)
13- fig1, ax1 = plt.subplots(1, 3)
14- mf.my_imshow(input_image, '(a) Input Binary Image', ax1[0])
15-
16- # Find Convex Hull and Display
17- convex_hull_image = 255*ski.morphology.convex_hull_image(input_image)
18- mf.my_imshow(np.uint8(convex_hull_image), '(b) Convex Hull', ax1[1])
19-
20- # Embed the original set in convex hull and display
21- convex_hull_image[np.where(input_image > 0)] = 128
22- mf.my_imshow(np.uint8(convex_hull_image), '(c) Convex Hull (with Foreground)', ax1[2])
23-
24- plt.show()
25- print("Completed Successfully ...")
Code 9.12: Python's inbuilt method for finding convex hull
Line number 17 in the code has it all. The output is shown as follows:
Figure 9.21: Output of Code 9.12 for finding the convex hull of object in binary image

9.13 Thinning
The thinning operation is an application of the hit and miss transform. It is
specific type of skeletonization – which is a general category (to be
discussed in coming sections). By now, we know that erosion and opening
operations reduce the object in size. If erosion is applied enough times, the
foreground completely vanishes. However, the opening can only be applied
once as applying it a second time onwards does not change the result. There
is something in between these two extremes — thinning. Even if an infinite
number of iterations are allowed, it will not erode the object completely. It
preserves the extent and connectivity of foreground elements. Let us
understand it through some illustrations.

9.13.1 Illustration of thinning


Refer to Figure 9.22. Part (a) of this figure shows the input image, which has a
deliberately strange foreground shape. Note that while only one foreground
object is shown, there could be multiple objects as well. The point to note
about the foreground shape is that it is all connected (very thick in some places
and connected by a one-pixel-thick line in others), and it has various parts
bulging out here and there.
Part (b) of the figure shows the results of the thinning operation with infinite
iterations allowed. However, once the result of thinning becomes
the same as in the previous iteration, the operation is stopped (we will soon
define the operation mathematically). Note that the result also has a single
object only (all connected — no breaks). Remember that (from Figure 9.5)
opening may introduce breaks. Thinning maintains connectivity. Another
important point to note is that the result in part (b) looks like some kind of
bone structure (skeleton) of the foreground (if it is considered an organism).
Notice that there is a bone for almost every big bulge or protrusion. This is
what we mean by extent preservation. We may call the result in part (b)
some kind of skeleton of the input object, but be cautious because,
technically, skeletonization is a generalized term and can be achieved in
several ways. We will study it in the coming sections.

Figure 9.22: Thinning operation


Part (c) of Figure 9.22 is a partially thinned output of part (a), with the
number of iterations specified by the user. In part (b), on the other hand,
an infinite number of iterations is theoretically allowed, but the result
stabilizes (does not change with further thinning) in a finite number of
iterations.

9.13.2 Mathematics behind thinning


The thinning operation can be defined by the following two equivalent
equations:
Equation 9.19:

A ⊗ B = A − (A ⊛ B)

or alternatively,
Equation 9.20:

A ⊗ B = A ∩ (A ⊛ B)^c

with the symbols having their usual meaning as defined earlier (A ⊗ B denotes
A thinned by the structuring element B, ⊛ denotes the hit and miss transform,
and the superscript c denotes the set complement). The structuring
element may have any desired shape, but usually, at each successive
iteration, we do not use the same structuring element. Rather, we use a set of
structuring elements and repeat them once the set has been exhausted. This
is done to uniformly thin the object edges from all sides. An example of a set
of structuring elements is shown in Figure 9.23. The order of structuring
elements applied at each iteration also matters.

Figure 9.23: Structuring element array for thinning


The skeleton formed by thinning lies along the medial (central) position
of the object (foreground).

9.13.3 Python code for thinning


One may use Equation 9.19 or 9.20 to implement thinning from scratch (we
recommend trying it out; a sketch was given above) or use Code 9.13, which
uses Python's inbuilt function for thinning:
01- #==============================================================================
02- # PURPOSE : Learning Morphological Thinning
03- #==============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- import skimage as ski
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- # Import and Display the image
12- input_image = cv2.imread('img31.bmp', 0)
13- fig1, ax1 = plt.subplots(1, 3)
14- mf.my_imshow(input_image, '(a) Input Binary Image', ax1[0])
15-
16- # Calculate Thinned Image and Display
17- thin_image = 255*ski.morphology.thin(input_image)
18- # thin_image = 255*ski.morphology.thin(input_image, max_num_iter=np.Inf)
19- mf.my_imshow(np.uint8(thin_image), '(b) Thinned Image', ax1[1])
20-
21- # Find Partially Thinned Image and Display
22- P_thin_image = 255*ski.morphology.thin(input_image, max_num_iter=15)
23- mf.my_imshow(np.uint8(P_thin_image), '(c) Partially Thinned Image', ax1[2])
24-
25- plt.show()
26- print("Completed Successfully ...")
Code 9.13: Code for a morphological thinning operation
The code is self-explanatory if you have followed it so far. Line number 17
does the job of thinning using the thin function from the skimage.morphology
package. This is the default usage: it assumes infinite iterations and chooses
the structuring element array by itself. Line number 18 (commented out) would
do the same job as line number 17, but here the maximum number of iterations is
explicitly passed as a parameter; we pass numpy's infinity. One may also pass
any finite number if desired. This is done in line number 22, where only 15
iterations are allowed. The result of this code is already shown in Figure 9.22.

9.14 Thickening
Thickening is the morphological dual of thinning and is defined by the
following equation:
Equation 9.21:

A ⊙ B = A ∪ (A ⊛ B)

The structuring elements remain the same in form as for thinning, but the
foreground entries are replaced by background entries (and vice versa). The
do not care entries, being do not care, remain unaffected.
Thickening is usually implemented as thinning of the background of the input
image: first complement the input image, thin it, and then complement the
result obtained after thinning the background (which is now the foreground of
the complemented image). After the final image is obtained by this procedure,
there may be some disconnected points or small sets of pixels that need
post-processing for their removal. A minimal sketch of this
complement-thin-complement procedure follows.

9.15 Skeletons
Skeletonizing is a general operation of finding the skeleton of a given object
in a binary image. This can be done in many ways. One such way was
thinning, which was covered in Section 9.13. There can be many more ways.
We discuss one such method in this section, followed by a different
implementation in Python to see different perspectives.

9.15.1 Illustration of skeletonizing


One method of finding the skeleton of a foreground object in a given binary
image is illustrated in Figure 9.24. There are many parts to this figure. Let us
see parts (p), (q), and (r) first. Part (p) displays the original image. Part (q)
shows the structuring element. Part (r) shows the image obtained by
processing the image in part (p), i.e., the original image, with the structuring
element in part (q) through a sequence of operations. This sequence of
operations, called skeletonizing (one possible way of doing it), will be
discussed soon. The reverse of this process, i.e., recovering the original image
from the skeleton in part (r) using the structuring element in part (q), is
another sequence of operations that will also be discussed soon. In this way,
the image in part (p) can be represented by fewer pixels, as shown in part (r),
and can be recovered whenever desired. This is why part (r) is called the
skeleton of the image in part (p). Also, note that the skeleton pixels in part (r)
lie in the middle of the foreground object.

Figure 9.24: One process of skeletonizing


One could raise the objection that the skeleton is not fully connected. It has
many disconnected parts; however, that is not a problem. One may define a
skeleton to be discrete or fully connected. In the present case, we allow it to
be discrete. Now, let us look at the process of forward and backward
conversion.
Parts (a), (b), and (c) of Figure 9.24 show the original image, the result of
eroding the original image, and the result of eroding the eroded image once
more, respectively. Note the nomenclature in the titles of parts (b) and (c).
E(a) in part (b) of the figure means one erosion of the image in part (a).
Similarly, in part (c) of the figure, the title E(b) means erosion of the image
in part (b). Erosion can be thought of as the process of removing the
one-pixel thick outer layer of the object. We have performed this process
twice. This could be continued until we have an empty image, but for the
purpose of illustration, we stop at 2 levels in part (c).
Parts (d), (e), and (f) show the result of opening (denoted by O( ) in the title)
the images in parts (a), (b), and (c), respectively. Remember that opening is
known to remove thin protrusions or connections. Now, intuitively, the thin
protrusions or connections that the opening removes at every erosion stage
must be part of the skeleton. We can obtain these removed protrusions or
connections by taking the set difference of the images in parts (a) and (d), parts
(b) and (e), and parts (c) and (f), respectively. This is shown in parts (g), (h),
and (i), respectively. In parts (j), (k), and (l), successive unions of parts (g),
(h), and (i) are shown. The final union, shown in part (l), is called the skeleton
of the foreground object shown in part (a) of the figure. The claim here is
that part (l) of the figure can regenerate the image in part (a) by some kind of
reverse process, and that is why we call the image in part (l) a skeleton of
part (a) with respect to the given structuring element.
The reverse process is simple and is shown in the remaining part of the
figure. Simply take the dilation [represented by D( ) in the titles] of the
successive unions found in parts (j), (k), and (l). This is shown in parts (m),
(n), and (o) of the figure, respectively. Part (p) shows the union of the images in
parts (m), (n), and (o), and the result matches the original image in part (a).
One important point to note here is that the images in parts (l) and (r) differ
in the grayscale values of their pixels. Part (l) displays a binary image, but the
image in part (r) is not binary. Part (r) is the union of parts (g), (h), and (i),
with the pixels coming from parts (g), (h), and (i) marked in different gray
shades. This is because, during reconstruction, we need to take the dilation of
part (g), then the dilation of the union of parts (g) and (h) (which happens to
be the image in part (k)), and then the dilation of the union of parts (g), (h),
and (i) (which is part (l)), and finally take the union of all three separate
dilations. The three sets must therefore remain distinguishable, and this is
why the skeleton in part (r) is not stored as a binary image.

9.15.2 Mathematical formalization of skeletonizing


Although there are different approaches to skeletonizing, keeping in mind
the material presented in the previous section, we can write the following
equation to describe the process:
Equation 9.22:

S(A) = ⋃_{k=0}^{K} S_k(A)

The above equation generates part (l) of Figure 9.24 from parts (g), (h), and
(i). The individual subsets S_k(A) are given by the following equation:
Equation 9.23:

S_k(A) = (A ⊖ kB) − [(A ⊖ kB) ∘ B]

The above equation generates parts (g), (h), and (i) of Figure 9.24 from the pairs
in parts (a) and (d), parts (b) and (e), and parts (c) and (f). Here, (A ⊖ kB)
denotes k successive erosions of A by B, and K is the last erosion step before A
vanishes. This is shown in the following equation:
Equation 9.24:

A ⊖ kB = (((A ⊖ B) ⊖ B) ⊖ …) ⊖ B   (k successive erosions),   K = max{ k | A ⊖ kB ≠ ∅ }

For the reverse process, we have the following equation:
Equation 9.25:

A = ⋃_{k=0}^{K} ( S_k(A) ⊕ kB )

The above equation generates part (p) of the figure from parts (m), (n), and
(o), which were generated by dilating parts (j), (k), and (l), respectively. Here,
S_k(A) ⊕ kB denotes k successive dilations of S_k(A) by B.

9.15.3 Python code for skeletonizing


One could use the methods presented in the previous sections (a sketch was
given above) to generate the results of Figure 9.24. In this section, we present
the result of skeletonizing as available in the skimage.morphology library.
This produces a fully connected skeleton, and the method used there is
different from the one presented in the previous section. The method can be
explored through the help command on the shell — help(skeletonize) — after
skimage.morphology is imported.
01- #==============================================================================
02- # PURPOSE : Finding Skeletonized Image
03- #==============================================================================
04- import cv2
05- import matplotlib.pyplot as plt
06- import numpy as np
07- from skimage.morphology import skeletonize
08- import my_package.my_functions as mf # This is a user defined package
09- # one may find the details related to its contents and usage in section 2.7.3
10-
11- # Import and display the image
12- input_image = cv2.imread('img31.bmp', 0)
13- fig1, ax1 = plt.subplots(1, 2)
14- mf.my_imshow(input_image, '(a) Input Binary Image', ax1[0])
15-
16- # Find Skeletonized Image and Display
17- skeletonized_image = 255*skeletonize(input_image > 0)
18- # 'input_image>0' is used to binarize the input image if not already.
19- mf.my_imshow(np.uint8(skeletonized_image), '(b) Skeletonized Image', ax1[1])
20-
21- plt.show()
22- print("Completed Successfully ...")
Code 9.14: Skeletonization in the skimage library
The result of the above code is shown in Figure 9.25:
Figure 9.25: Skeletonizing the input image
Skeletonizing is used in character recognition for a given language when the
binary image of handwritten text is available.

Conclusion
In this chapter, processing algorithms for binary images were introduced.
They include erosion, dilation, opening, closing, the hit and miss transform, etc.
Based on these basic operations, one can build higher-level operations such as
boundary extraction, hole filling, connected component analysis, thinning,
thickening, and skeletonizing.

Points to remember
• Erosion tends to remove the outer layer of a binary object if a box-shaped
structuring element is used.
• Dilation does the opposite of erosion.
• Opening and closing are like erosion and dilation, respectively, with the
difference lying in how they treat protrusions.
• Finding the convex hull of a two-dimensional object can be compared to
wrapping a rubber band around it.
• Thinning is one kind of skeletonizing.
• There may be multiple definitions of skeletonizing.
Exercises
1. Take a grayscale image and import it using Python. Binarize it by using
some threshold value. Find the number of connected components in the
binary image.
2. Write a Python code that automatically fills all holes of a given size or
less in a binary image.
3. Read the algorithm behind skeletonizing in Python using the help
command and compare it with the one presented in this chapter. List out
its advantages and disadvantages.
4. Take an image of a fingerprint and try to remove its noise by using
binary morphology.

Join our book’s Discord space


Join the book's Discord Workspace for Latest updates, Offers, Tech
happenings around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com

Index
Symbols
2D DFT, configuring 209, 210
2D DFT Domain, frequency 210-217
2D Discrete Fourier Transform (2D DFT) 209
2D With Python, filtering 155-159

A
Adaptive Filtering 300, 301

B
Boundary Extraction 360
Boundary Extraction Code, implementing 362
Boundary Extraction Mathematics, operations 360

C
code files 22
Color Model 135
Color Model, types
Cyan-Magenta-Yellow (CMY) 136
Hue-Saturation-Intensity 137, 138
RCB 135, 136
Connected Component 368
Connected Component, pass 368
Continuous Time, ways
aperiodic 194, 195
fourier, transforms 195-197
periodic 193, 194
Convex Hull 375, 376
Convex Hull, architecture 376-378
Convex Hull Code, implementing 379-384
Convex Hull Image, optimizing 384
Convolution/Correlation, comparing 183-186

D
DataFrame 78
Degradation Model 278, 279
derivative-based image, section
Prewitt Kernel 174-176
Roberts Kernel 178
Sobel Kernel 177, 178
Digital Image 2
Digital Image, processing 3
Dilation 342
Dilation, illustration 343
Dilation Mathematics, operations 343, 344
Dilation Python, code 344, 347
Discrete Time Fourier Series (DTFS) 203
Discrete Time Signals 197, 198
Discrete Time Signals, optimizing 198-202
DTFS, ways
aperiodic, signals 204
fourier, transform 205-209
periodic, signals 203, 204

E
Erosion/Dilation, interpreting 347

F
Fast Fourier transform (FFT) 205
filter design, elements
derivative filters 150-152
second order, velocity 152, 153
weight, average 149, 150
Filtering, domains
1D Frequency 219-223
2D Ideal 227-230
band pass 249, 250
band stop 248, 249
Butterworth 237-247
frequency 223, 224
Gaussian Lowpass 231-237
high pass 247, 248
Fourier Transform, utilizing 308-314
Frequency Domain, visualizing 306-308

G
Grayscale Image 4

H
help 26
Histogram 91
Histogram, architecture 104
Histogram Equalization, concepts
digital data, limitations 103
dimensional data, visualizing 99-101
mathematical, pre-requisite 99
Histogram, implementing 104-106
Histogram Matching 107-109
Histogram Mathematical Data, optimizing 104
Histogram, process
Equalization 97-99
Grayscale Image, preventing 92, 93
information, obtaining 94-96
Hit/Miss Code, implementing 357-359
Hit/Miss Mathematic, operations 356, 357
Hit/Miss Transform 354-356
Hole Filling 363
Hole Filling, algorithm 364, 365
Hole Filling, architecture 364
Hole Filling Code, implementing 365-367
Homomorphic Filtering 262
Homomorphic Filtering, components
contrast, improving 267, 268
Illumination 262, 263
smoothening 263-267

I
IDLE’s editor 21
Image Conventions, indexing 8-10
Image, filtering 272-275
Image, formats 10
Intensity Transformation 109
Intensity Transformation, key points
Logarithmic Transformation 111-114
negatives 110
Power Law, correction 114-117

L
linear index 9
list 29

M
Mac/Linux, steps 19, 20
math library 25
Matplotlib 66
Matplotlib, points
simple graphs, plotting 66-69
subplots, using 69-71
Median Filtering 254-257
N
Neighborhood 90, 91
noise 279-283
noise, configuring 295-300
noise image, detecting 292, 293
noise removal, ways
1D Signal 324-328
Wavelets 329-332
noise, types
Erlang 286, 287
Exponential 287
Gaussian 283, 284
Rayleigh 285, 286
Salt/Pepper 288-291
Uniform 288
np.array() 53
NumPy Library 40
NumPy Library, points
Array Operations 57-59
One Dimensional Array, optimizing 40-47
Sub-Arrays, preventing 61-65
Three Dimensional Array, utilizing 54-56
Two Dimensional Array, optimizing 47-53
NumPy Library, steps 16-18

O
One Dimension 142, 143
One Dimension, aspects
2D Version, filtering 153-155
filter design, intuition 148
Graphical Illistrates 146, 147
One Dimension, elements
analog signals 190-192
Periodic/Aperiodic, signals 193
One Dimension System 143
One Dimension System, types
Linear Systems 144
Linear Time Invariants (LTI) 145
Open/Close 348
Open/Close, applications 350, 351
Open/Close Code, optimizing 351
Open/Close, illustration 348
Open/Close Mathematical, operations 349
OpenCV 71
OpenCV, concepts
image, importing 72, 73
Matplotlib Image, displaying 73-75
Python Package, importing 75-77

P
Pandas 78
Pandas Library, optimizing 78-83
Peak Signal-to-Noise Ratio (PSNR) 293, 294
Phase Spectrum 268
Phase Spectrum, configuring 268-270
Phase Spectrum Images, swapping 270, 271
Phase Spectrum Phase, filtering 271, 272
pixel address 9
pixel index 9
Pixel/Patches, configuring 86-90
pixels 2
Python, elements
conditional, statements 31-33
Hello World, optimizing 20, 21
IDLE Editor 21, 22
Input/Output, optimizing 22-26
Lambda Function, utilizing 37-39
List Data, structure 26-30
Loops 34-37
Tuple Data, structure 30, 31
variables, printing 21
Python Library, steps 18
Python shell 20
Python Software 14
Python Software, steps 14-16

R
Region Filling 367
Resolution/Frequency, utilizing 304, 305
RGB Image 5, 6
RGB Image, interpretation 7, 8

S
Sampling Theorem 257
Sampling Theorem, ways
1D, generalization 261
Aliasing 257-261
scale/scalogram, concepts 314-316
Sharpening Filters 171
Sharpening Filters, aspects
2D Derivative 182, 183
derivative-based image 174
unsharp, masking 171, 172
Skeletonizing 389
Skeletonizing Code, implementing 392, 393
Skeletonizing, illustration 389-391
Skeletonizing Mathematical, operations 391
Skimage Library 373-375
Smoothening Filters 159, 160
Smoothening Filters, types
Averaging 160, 161
Circular 161-165
Gaussian 166-169
Weight 165, 166
Soil Erosion 336
Soil Erosion, application 339
Soil Erosion, illustration 336-338
Soil Erosion Mathematic, operations 338
Soil Erosion Python, code 339, 340
Spatial Transformation 117
Spatial Transformation, aspects
Affine 117-124
Projective 126-134
subscripted index 9
Subscripted/Linear Indices, comparing 360-362

T
Thickening 389
Thinning 386
Thinning Code, implementing 388, 389
Thinning, illustration 386, 387
Thinning Mathematics, operations 387, 388
Time/Frequency, resolution 305, 306

U
uint8 86

V
variable 21

W
Wavelet Transform 316-320
Wavelet Transform, analyzing 321-324
