Industrial Vision Systems with Raspberry Pi
Build and Design Vision Products Using Python and OpenCV
K. Mohaideen Abdul Kadhar
G. Anand
Maker Innovations Series
Jump start your path to discovery with the Apress Maker Innovations
series! From the basics of electricity and components through to the most
advanced options in robotics and Machine Learning, you’ll forge a path to
building ingenious hardware and controlling it with cutting-edge software.
All while gaining new skills and experience with common toolsets you can
take to new projects or even into a whole new career.
The Apress Maker Innovations series offers project-based learning,
while keeping theory and best practices front and center. So you get
hands-on experience while also learning the terms of the trade and how
entrepreneurs, inventors, and engineers think through creating and
executing hardware projects. You can learn to design circuits, program AI,
create IoT systems for your home or even city, and so much more!
Whether you’re a beginning hobbyist or a seasoned entrepreneur
working out of your basement or garage, you’ll scale up your skillset to
become a hardware design and engineering pro, often using low-cost
and open source hardware and software such as the Raspberry Pi, Arduino,
PIC microcontrollers, and the Robot Operating System (ROS). Programmers and
software engineers have great opportunities to learn, too, as many projects
and control environments are based in popular languages and operating
systems, such as Python and Linux.
If you want to build a robot, set up a smart home, tackle assembling a
weather-ready meteorology system, or create a brand-new circuit using
breadboards and circuit design software, this series has all that and more!
Written by creative and seasoned Makers, every book in the series tackles
both tested and leading-edge approaches and technologies for bringing
your visions and projects to life.
About the Authors
Dr. K. Mohaideen Abdul Kadhar completed
his undergraduate degree in Electronics and
Communication Engineering in 2005 and his
M.Tech with a specialization in Control and
Instrumentation in 2009. In 2015, he obtained
a Ph.D. in Control System Design using
evolutionary algorithms. He has more than 16
years of experience in teaching and research.
His areas of interest include computer vision,
image processing, data science, optimization
techniques, controller design, Python programming, working with
Raspberry Pi boards and edge AI. He is currently developing customized
industrial vision systems for various industrial requirements. He has been
a consultant for several industries in developing machine vision systems
for industrial applications and a master trainer, and he has delivered
workshops on computer vision, image processing, optimization techniques,
control systems, data science, and Python programming.
About the Technical Reviewer
Farzin Asadi received his B.Sc. in Electronics
Engineering, his M.Sc. in Control Engineering,
and his Ph.D. in Mechatronics Engineering.
Currently, he is with the Department of
Electrical and Electronics Engineering at the
Maltepe University, Istanbul, Turkey. Dr. Asadi
has published over 40 international papers
and 25 books. He is on the editorial board of
seven scientific journals as well. His research
interests include switching converters, control
theory, robust control of power electronics
converters, and robotics.
Acknowledgments
I am deeply grateful to my parents and family for their unwavering support
and encouragement throughout this journey. Their belief in me has been
my strength.
I would like to express my heartfelt gratitude to my wife Mrs.
M. Jashima Parveen for her endless patience, incredible support, and
understanding throughout this journey. Her unwavering belief in me, even
when I doubted myself, has been my pillar of strength. Her patience, love,
and encouragement have been instrumental in making this book a reality.
I am truly blessed to have her by my side.
I would like to express my sincere thanks to the principal and
management of Sri Eshwar College of Engineering for their continuous
support and encouragement. In particular, I would like to thank Principal
Dr. Sudha Mohanram for her visionary leadership and unwavering support
in developing the advanced research center "Computational Imaging and
Machine Vision." Her continuous support, guidance, encouragement, and
belief in my strength have been instrumental in making this book a reality.
Her dedication to fostering a culture of innovation and excellence has been
truly inspiring.
I would like to extend my heartfelt thanks to my coauthor, G. Anand,
for his invaluable contributions to this book. His expertise, dedication, and
insights have enriched the content and quality of this work. Collaborating
with him has been a rewarding experience, and I am grateful for his
partnership throughout this journey.
My sincere thanks to the Apress team, especially Shobana Srinivasan,
James Markham, and Jessica Vakili, whose patience and support helped
shape the quality of the contents of this book.
—G. Anand
Introduction
In recent years, there has been a significant demand for vision systems in
industries due to their noninvasive nature in inspecting product quality.
Industrial vision systems can be developed for various applications such
as identifying surface defects and faults, verifying component orientation
and measurements, and more. Modern image processing techniques like
image segmentation, thresholding, contours, feature extraction, object
detection, image analysis, and frame analysis can be utilized to create real-
time vision systems for quality checks in industries.
This book demonstrates the implementation of industrial vision
systems in real time using Raspberry Pi. It is designed to be helpful for
graduate students, research scholars, and entry-level vision engineers.
The book begins with an introductory chapter that provides readers with
an overview of such systems. The second chapter introduces readers to
Raspberry Pi hardware, guiding them through OS installation and camera
setup with practical demonstrations in Python software. Chapter 3 helps
readers learn and practice essential Python libraries for handling images
and videos.
Chapter 4 covers the challenges involved in developing standard
industrial vision systems, including the choice of cameras, their
placement, and required lighting conditions. Chapter 5 provides a
detailed overview of fundamental image processing techniques, starting
with basic concepts like image properties and types, and progressing to
advanced image operations like filtering, morphological operations, and
thresholding.
CHAPTER 1
Introduction to Industrial Vision Systems
Modern manufacturing relies on industrial vision systems, which give
machines the ability to see and understand their surroundings with
previously unheard-of efficiency and precision. These cutting-edge
technologies revolutionize quality control and automation processes by
converting raw data into insights that can be put into practice. A thorough
introduction to industrial vision systems, this book can be considered as a
comprehensive guide for students, scholars, and professionals just starting
out in the field of computer vision. This book provides both theoretical
insights and practical implementation, as it carefully examines the diverse
components and cutting-edge techniques essential to vision systems.
Readers will get practical experience creating reliable vision systems for
industrial use, with an emphasis on using the flexible Raspberry Pi board
and Python programming. By the end of this journey, readers will possess
the knowledge and capabilities necessary to design and develop efficient
vision systems, paving the way for novel ideas and breakthroughs in the
area of industrial automation.
The images in the human vision system are carried by nerve impulses
from the retina of the eye to the brain where they are processed, whereas
the images in the computer vision system are carried by electric impulses.
The most important difference in terms of the performance of the two
systems is that the human vision system is trained by a lifetime of context
that enables humans to perceive a lot of information regarding a scene,
such as the types of objects present, the distance at which they are present,
if they are moving or stationary, the speed of a moving object, etc., in a
split second. On the other hand, the computer vision system developed for
a particular application needs to be trained only with respect to the context
of that application that enables the system to perform particular tasks
relatively quickly compared to human vision.
Over the years, computer vision has found a lot of applications across
various domains. For instance, a lot of imaging techniques are currently
used in the field of medicine to diagnose as well as gather information on
a number of ailments. Computer vision has naturally found its way into
biometric recognition where a number of imaging techniques, ranging from
simple thumbprints to face images, are used to verify the authenticity of
persons. Another area where computer vision has been gaining prominence
is the military where a number of sensors including imaging sensors are
used to sweep over a particular area and learn a lot of information that
can be helpful in making timely strategic decisions. One more interesting
application of computer vision system is the development of autonomous
vehicles, ranging from simple cars or trucks to advanced technologies like
UAVs and rovers for space exploration that learn from the images of the
scene in front of them to make real-time split-second decisions that can
be used to perform maneuvering or even more complicated tasks. The
industrial sector is another domain that has been deeply impacted by computer
vision techniques in recent times, with the goal of driving efficiency and
precision. By utilizing computer vision, industrial vision systems may
teaching purposes, this board gained more popularity than expected and
later went through a series of upgrades to cater to a wide range of domains
such as computer vision, robotics, localized cloud, etc. It supports a
number of operating systems depending on the nature of application for
which the board is to be used. The primary OS provided by the Raspberry
Pi Foundation is the Raspberry Pi OS, previously named Raspbian OS,
which is a Unix-like operating system based on Debian Linux distribution.
It comes with a default integrated development environment (IDE) for
Python which can be utilized for implementing our vision system using
the OpenCV package. The Raspberry Pi board provides a dedicated
interface to connect with the camera, and Raspberry Pi camera modules
are also available that are compatible with all the Raspberry Pi models.
The Pi board also provides USB ports to which USB webcams can be
interfaced.
The manufacturing industries usually have a production line followed
by an assembly line. The production line in a manufacturing industry is
basically a line of machines and workers along which a product moves
while it is being produced. For example, a production line may involve
processes like molding, packaging, painting, dyeing, etc. The parts or
components coming out of various production lines are usually passed
to an assembly line where they are assembled or configured in a series of
steps to get a final product.
Quality control is an essential part of a production line as it helps to
meet customer expectations and ensure satisfaction, maintain a strong
brand reputation to stay competitive, reduce cost by minimizing defects,
and comply with standards and regulations. Most industries employ inline
quality control wherein multiple inspection points are incorporated across
the production line. These points often employ people to manually inspect
the quality of the product with respect to various standards or specifications,
identify any noncompliant or defective product, and isolate them from
the production line or the assembly line. But humans are often prone to
a number of shortcomings such as biased views, error-prone judgments,
[Figure: components of a typical industrial vision system, showing the camera, the illumination, the control system, and a robot or PLC]
Summary
This chapter has provided a peek into the world of industrial vision systems
where we discussed the following topics:
• A gentle introduction to computer vision systems
touching upon its history, relevancy with human
vision, and its applications
CHAPTER 2
Getting Started with Raspberry Pi
Raspberry Pi is a series of small credit-card-sized single-board
computers developed by the Raspberry Pi Foundation in London.
These minicomputers are portable and inexpensive and provide
interoperability with other hardware such as a monitor, keyboard, and
mouse, thereby effectively turning them into a low-cost PC. Originally
developed for educational purposes, these boards became more popular
as a multifunctional device and went on to find applications across various
domains such as robotics, network management, weather monitoring, etc.
It also turned into a favorite gadget for people with computer or electronics
background to develop hobby projects. This chapter starts with a brief
introduction about the components of the Raspberry Pi hardware to give
the readers a feel of its capabilities. Then, it delves into the installation
of the Raspberry Pi OS along with the OpenCV library followed by the
configuration of the board for remote access. The chapter ends with a
demonstration of interfacing a camera to the Pi board that would enable us
to capture images or videos in real time.
1. www.raspberrypi.com/documentation/computers/raspberry-pi.html
The first Raspberry Pi board was released in the year 2012. Since then,
the Raspberry Pi series has gone through a lot of changes. Each stage of
evolution came with something new that was embraced at once by the Pi
community. Table 2-1 shows the changes in features across the different
models of Raspberry Pi released over the years.
Installing OS
Now that we have a basic understanding of the hardware features
supported by the different versions of Raspberry Pi boards, it’s time to see
how to interface the board with I/O devices, install the operating system,
and use the board as a PC. We’ll need a computer monitor, keyboard and
mouse, the Raspberry Pi power supply, and an SD card (a minimum of
8GB microSD card is recommended).
A software called “Raspberry Pi Imager” is recommended by the
Raspberry Pi community to install an operating system to your SD card.
Another SD card-based installer named NOOBS (New Out Of the Box
Software) was previously used widely, but the Raspberry Pi community no
longer recommends or supports using NOOBS. Use the following steps to
install the Raspberry Pi OS in the SD card.
provided that all those devices as well as our Pi board have Bluetooth
support. After connecting all the I/O devices, we can go ahead and power
up the board.
When the Raspberry Pi boots up for the first time, it will take us to a
configuration wizard that will allow us to set up our Raspberry Pi. The
wizard will first allow us to configure international settings and our time
zone information, following which we will be prompted to create a user
account where we can choose our username and password. Note that
some old versions of the OS may require us to go with a default username
of “pi.” After creating our user account, we can configure our screen (like
reducing the size of the desktop) and our wireless network. Once the
wireless network is configured, the Raspberry Pi board will have access
to the Internet. Sometimes, we will be prompted to update our OS to the
latest version once we are connected to the Internet. In this case, we can
go ahead with the update following which we will be prompted to reboot
the Raspberry Pi. Now the board will boot into the Raspberry Pi OS which
will have an easy-to-navigate interface like other operating systems such as
Windows and Linux.
This will download the entire zip file to our Raspberry Pi. After this
we need to download the OpenCV contrib zip file which has an extended
library of extra modules that is important for our entire OpenCV build to
work. This archive can also be downloaded using the
following terminal command:
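Assuming OpenCV version 4.0.0, which is the version used in the build steps later in this chapter, the command would look something like this:

wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.0.0.zip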
We should now unzip both the zip files using the terminal command
unzip opencv.zip and unzip opencv_contrib.zip, respectively. Both the
folders are now unzipped and ready to use. The next step will be to install
the Python library called numpy which provides support for working with
multidimensional arrays and matrices. Since images are represented as an
array of numbers (we will discuss more about this in Chapter 5), OpenCV
requires numpy library support to perform most of the image processing
tasks. This library can be downloaded using the terminal command pip
install numpy. Note that the pip tool itself is already installed along
with our Python package.
We already have two unzipped folders or directories named
“opencv-4.0.0” and “opencv_contrib-4.0.0” in our home directory. In order
to compile the OpenCV library, we need to create a new directory called
“build” inside the “opencv-4.0.0” directory which can be done using the
following terminal commands.
cd ~/opencv-4.0.0
mkdir build
cd build
Now that we have navigated into the “build” folder in our terminal
window, the next important step will be to run CMake for OpenCV. This
is where we can configure how OpenCV has to be compiled. Type the
following lines in the terminal window:
21
Chapter 2 Getting Started with Raspberry Pi
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib-4.0.0/modules \
      -D ENABLE_NEON=ON \
      -D ENABLE_VFPV3=ON \
      -D BUILD_TESTS=OFF \
      -D WITH_TBB=OFF \
      -D INSTALL_PYTHON_EXAMPLES=OFF \
      -D BUILD_EXAMPLES=OFF ..
IP address. We can look for the hostname of our Raspberry Pi (default
hostname is “raspberrypi” as mentioned earlier) in this list and get its IP
address.
There are other ways of getting the IP address of the Pi board, which
the readers are free to explore on their own. Once we have the IP address,
we can access our Raspberry Pi using either the SSH protocol or the
VNC. Let us discuss both these options in detail.
RealVNC in the device we will be using to take control and then sign in to
the VNC viewer using the same RealVNC account credentials and then
click to connect to our Raspberry Pi.
To complete the direct or cloud connection, we must authenticate to
VNC Server. If we are trying to establish a connection from the compatible
VNC Viewer app from RealVNC, we need to enter the username and
password that we normally use to log in to our user account on the
Raspberry Pi. But, if we’re connecting from a non-RealVNC viewer app,
we’ll need to downgrade VNC Server’s authentication scheme, specify
a password that is unique to VNC server, and then use that instead. To
accomplish this, we need to open the VNC Server dialog on our Raspberry
Pi, select Menu ➤ Options ➤ Security, and choose “VNC password” from
the “Authentication” dropdown.
To connect the camera module, gently pull up on the CSI port’s plastic
clip and insert the flex cable from the camera module into the port. Ensure
that the connectors at the bottom of the cable are facing the contacts in the
port while inserting. Next, boot the Raspberry Pi and go to the “Raspberry
Pi Configuration” in the “Preferences” menu. Select the “Interfaces” tab,
enable the camera, and then reboot the Raspberry Pi.
There are two ways to control the camera module. The first method is
to use the command line. Open the terminal window and run the following
command to take a still picture: raspistill -o Desktop/img.jpg. This will
open a camera preview for 5 seconds, then take a picture and save it in the
path provided by us in the command (in our case, the picture is stored with
the filename “img.jpg” in the desktop). In a similar way, we can record a
video using the terminal command raspivid -o Desktop/vid.h264.
The first two lines of the code are used to import the modules
necessary to run the code. Here, we have imported the PiCamera module
and sleep module. The third line of the code uses the PiCamera module
to create a camera object that can be used to preview or capture images.
In the fourth line of code, we start a preview of the camera. As the camera
is activated to sense the light levels, it needs some time to adjust to the
lighting conditions. So we use the sleep module in the fifth line of code to
delay the execution of the code for 5 seconds after starting the preview.
Then, the capture function from the camera object is used to capture an
image and store it in the desktop. The last line of the code is used to stop
the preview. If our preview is upside down, we can add the line
camera.rotation = 180 right after the creation of the camera object (the
third line in our code). Also, we can use the start_recording() and stop_recording()
function instead of the capture() function to record videos instead of
capturing still images.
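Putting this description together, a minimal sketch of the script (assuming the legacy picamera library and a Desktop save path) would be:

from picamera import PiCamera
from time import sleep

camera = PiCamera()                            # create the camera object
camera.start_preview()                         # start a live preview
sleep(5)                                       # let the sensor adjust to the lighting
camera.capture('/home/pi/Desktop/image.jpg')   # capture a still image to the desktop
camera.stop_preview()                          # stop the preview
# camera.rotation = 180 can be added right after creating the camera
# object if the preview appears upside down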
To control the webcam with Python, we can use the OpenCV library.
The following lines of code are used to create an infinite loop to capture
the video and show it on our screen until we press a specified character key
from the keyboard that will break the loop and stop the video.
import cv2
webcam = cv2.VideoCapture(0)
while True:
    ret, img = webcam.read()
    cv2.imshow('video', img)
    key = cv2.waitKey(1)
    if key == ord("x"):
        break
webcam.release()
cv2.destroyAllWindows()
The first line is used to import the OpenCV library modules. The
VideoCapture(0) function is used to create an object named “webcam”
to return the video from the first webcam on your Raspberry Pi. The
“while True:” condition is an infinite loop to create an endless display of
video on our display. The read() function is used to capture a frame from
the “webcam” object, and imshow() function from the OpenCV module
is used to display the image. The waitKey() function from the OpenCV
module is used to wait for 1ms to detect any keystroke from the user. The
ord() function is used to convert the character "x" to its corresponding
character code, which is compared with the value returned by waitKey();
when "x" is pressed, the loop breaks and the video stops.
Summary
In this chapter, we have discussed the following topics:
• The Raspberry Pi hardware and the features of the different board models
• Installing the Raspberry Pi OS on an SD card and setting up the board as a PC
• Compiling and installing the OpenCV library along with NumPy
• Accessing the Raspberry Pi remotely using SSH and VNC
• Interfacing the Raspberry Pi camera module and a USB webcam to capture images and videos
In the next chapter, we will discuss more about using OpenCV to read
and write images, as well as videos, and display them using a visualization
library in Python called Matplotlib.
CHAPTER 3
Python Libraries for Image Processing
In this chapter, we will get introduced to Python programming, which
would set us up for the different concepts we will be learning throughout
the book using Python. We will start with the different Python-based IDEs
available for Raspberry Pi. Then, we will move on to work with a couple
of Python libraries, NumPy and PIL, that are widely used
for image processing tasks. Next, we will see how to visualize
(display) images using the Matplotlib library. Finally, we will see how to
capture and save images as well as videos using the OpenCV library.
Thonny
This is a default IDE that comes with Python 3.10 installed along with
the Raspberry Pi OS. We can open it by navigating to the “Programming”
drop-down menu under the main menu (Raspberry Pi Logo) and clicking
“Thonny Python IDE.” The IDE has a very simple interface, as shown in
Figure 3-1, with two parts: the code editor is where we can type our code
and the shell is where we can get inputs to our program, display outputs
and errors, and execute simple lines of code. There are additional tabs that
can be added to the default interface, and they are available in the “View”
menu. For example, if we check the “Variables” option in the “View” menu,
this will display the “Variables” tab to the right side of the interface where
all the variables we create in our program and their respective values are
displayed. Similarly, the readers are free to explore the other windows on
their own.
Geany
Geany is an open source lightweight IDE that is supported in many
different operating systems such as Windows, Linux, macOS, Raspberry
Pi OS, etc. This IDE is also installed by default along with the Raspberry Pi
OS. This IDE supports multiple programming as well as markup languages
such as C, C++, C#, Java, PHP, HTML, Python, Ruby, etc. The default
interface of the Geany IDE is shown in Figure 3-2. As seen in the figure, the
IDE opens with an empty untitled editor tab where we can type our code.
The status tab shows the timestamp for every activity that has happened
from the time we open Geany IDE. The Symbols tab will list all the symbols
(variables) that we create in the program. Once we type the program in the
editor, we can save it using the .py extension and execute the program by
clicking the “Execute” option in the “Build” drop-down menu or by pressing
the F5 key on our keyboard. The output will be displayed in a separate
command window.
IDLE
IDLE is a default IDE installed along with the Python package download
from its official page. But in case of Raspberry Pi OS, Thonny is the default
IDE for Python that comes with the OS. Since most Python programmers
are accustomed to using IDLE in other OS environments, it would be
comfortable for them to stick to the same IDE in Raspberry Pi OS as well.
To install IDLE in Raspberry Pi OS, enter the following command in the
Terminal window: sudo apt-get install idle3. The IDLE interface is shown in
Figure 3-3. As we can see, the IDE opens to a shell window whose function
is the same as the shell in Thonny IDE. To open an editor, go to “File” and
click “New File.” We can then type our complete code in the editor window,
save the file using .py extension, and then execute the same.
All the code and demonstrations in this book are done using the
default “IDLE” IDE. The readers are free to select an IDE of their choice as
the choice of IDE will not affect the illustrated codes in the book. With the
selection of a desired IDE, let us now proceed to understand the basic data
types supported in Python.
Numeric data types: Integer, Float, Complex
Sequence data types: List, Tuple, String
There is not much to say about the numeric data types as most of
us are familiar with them. The advantage in Python is that the data type
is detected by default as soon as we assign a value to a variable. This is
evident from the simple example shown as follows. The type() command
displays the data type of the variable given to it. In Table 3-1, the output of
the print statements is shown in the right column. Arithmetic operators
and assignment operators can be used to perform mathematical
operations with respect to these data types.
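For instance, a few assignments and the types Python detects for them might look like the following (the values are arbitrary; outputs are shown as comments):

x = 10
print(type(x))     # <class 'int'>
y = 10.5
print(type(y))     # <class 'float'>
z = 3 + 4j
print(type(z))     # <class 'complex'>
print(x + y)       # 20.5  (arithmetic operators work across numeric types)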
The Boolean data type consists of just two values: true and false.
This data type is mostly associated with conditional statements where
conditional operators and logical operators applied on other data types
result in a Boolean data type based on which a subset of code is executed.
The set data type is an unordered collection of elements (of other data
types) separated by commas written inside curly brackets. A unique
feature of set is that it does not allow duplicate values. Set operations such
as union, intersection, difference, and complement can be performed
on these sets. Dictionaries are used to store data points as key-value
pairs separated by commas written inside curly brackets. Each value in a
dictionary can be accessed only with the help of its key. A simple example
of these three data types is shown in Table 3-2.
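A small example along the same lines (outputs shown as comments):

flag = (5 > 3)
print(flag, type(flag))            # True <class 'bool'>
s = {1, 2, 2, 3}
print(s)                           # {1, 2, 3}  (duplicates are removed)
print(s.union({3, 4}))             # {1, 2, 3, 4}
d = {"name": "bolt", "length_mm": 30}
print(d["name"])                   # bolt  (values are accessed by their key)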
predefined methods are available for tuples and strings as well which the
readers can explore. Table 3-3 shows some simple examples for these
sequence data types.
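A few representative operations on these sequence types (outputs shown as comments):

fruits = ["apple", "banana", "cherry"]
fruits.append("mango")             # lists are mutable and can grow
print(fruits[0], len(fruits))      # apple 4
t = (1, 2, 3)
print(t[1:])                       # (2, 3)  (tuples are immutable but can be sliced)
name = "Raspberry Pi"
print(name.upper())                # RASPBERRY PI
print(name[0:9])                   # Raspberry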
The for loop is used for iterating over a sequence (such as a list, tuple,
string, set, or dictionary). The for loop can execute a set of statements,
once for each item in the sequence. For example, the for loop in Table 3-5
performs the sum of numbers 1 to 5.
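A sketch of that for loop (output shown as a comment):

total = 0
for i in range(1, 6):    # iterate over the numbers 1 to 5
    total = total + i
print(total)             # 15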
The while loop, on the other hand, can execute a set of statements
repeatedly as long as a condition is true. The sum of numbers 1 to 5 can
also be performed using a while loop using the code in Table 3-6.
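The equivalent while loop might look like this:

total = 0
i = 1
while i <= 5:            # repeat as long as the condition is true
    total = total + i
    i = i + 1
print(total)             # 15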
NumPy
NumPy is a library used specifically for working with multidimensional
arrays and matrices. Created by an American data scientist and
businessman Travis Oliphant in the year 2005, NumPy offers a collection
of functions that can be used to create as well as perform mathematical
operations on arrays. The NumPy package can be installed in our
Raspberry Pi OS using the terminal command sudo apt-get install python3-
numpy. It is essential that we get comfortable with NumPy arrays, as the
OpenCV library that we will be using to build our vision systems utilize the
NumPy arrays to store and operate on the image data. This is not a surprise
as images are stored in memory as a matrix of numbers also known as
pixel values. So, let’s dive in and get a feel of the different aspects of the
NumPy library.
The following code creates 0D, 1D, and 2D arrays and prints the number of dimensions (ndim), the shape, and the number of elements (size) of each.

Code:
import numpy as np
# 0D array
a = np.array(42)
print(a)
print("Number of dimensions:", a.ndim)
print("Shape of a:", a.shape)
print("Number of elements in a:", a.size)
# 1D array
b = np.array([1, 2, 3])
print(b)
print("Number of dimensions:", b.ndim)
print("Shape of b:", b.shape)
print("Number of elements in b:", b.size)
# 2D array
c = np.array([[1, 2, 3], [4, 5, 6]])
print(c)
print("Number of dimensions:", c.ndim)
print("Shape of c:", c.shape)
print("Number of elements in c:", c.size)

Output:
42
Number of dimensions: 0
Shape of a: ()
Number of elements in a: 1
[1 2 3]
Number of dimensions: 1
Shape of b: (3,)
Number of elements in b: 3
[[1 2 3]
 [4 5 6]]
Number of dimensions: 2
Shape of c: (2, 3)
Number of elements in c: 6
The indexing in NumPy arrays is similar to that of lists except the fact
that there may be more than one index value depending upon the number
of dimensions in the array. For example, consider a 3x3 matrix stored in
a variable “a”. a[2,2] denotes the value of the array a at the intersection of
the 3rd row and 3rd column (since the index starts from 0). The above
indexing is illustrated in the following table.
         column 0   column 1   column 2
row 0        1          2          3
row 1        4          5          6
row 2        7          8          9
As we can see, the number 9 at the intersection of the 3rd row, 3rd
column has a row index of 2 and a column index of 2. We can also slice an
array using the format “start:end+1” with respect to any dimension. For
instance, a[0:2,:] indicates that we take the values from all the columns
in rows 1 and 2 alone. If we don’t provide a value for “start,” then the start
value is assumed as “0” by default (so we can also write :2 instead of 0:2),
and if we don’t provide a value for “end+1,” then the values up to the last
row or column are taken. The following table would clearly illustrate this
example.
         column 0   column 1   column 2
row 0        1          2          3      (selected by a[0:2,:])
row 1        4          5          6      (selected by a[0:2,:])
row 2        7          8          9
Let us consider another example to get the hang of how the indexing
works. This time we would type the code to perform the operation as
illustrated in Table 3-9.
Code:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(a)

Output:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
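The indexing and slicing discussed above can then be applied to this array; for example:

print(a[2, 2])      # 9  (3rd row, 3rd column)
print(a[0:2, :])    # [[1 2 3]
                    #  [4 5 6]]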
Some commonly used NumPy functions include the following:

sum(): Returns the sum of array elements over the specified axis
add(): Returns the element-wise addition of two arrays
subtract(): Returns the element-wise subtraction of two arrays
sqrt(): Returns the element-wise square root of an array
multiply(): Returns the element-wise multiplication of two arrays
matmul(): Returns the matrix multiplication of two arrays
copy(): Creates a copy of the original array; changes made to one will not affect the other
view(): Creates a view of the original array; changes made to one will affect the other
reshape(): Changes the shape of the original array
concatenate(): Merges two arrays along a specified axis
array_split(): Splits an array into multiple parts
where(): Searches an array for a certain value and returns the indexes that get a match
sort(): Sorts the array along a specified axis
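A few of these functions in action (outputs shown as comments):

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
print(np.sum(a))                 # 21
b = a.reshape(2, 3)              # change the shape to 2 rows x 3 columns
print(b)                         # [[1 2 3]
                                 #  [4 5 6]]
print(np.where(a > 3))           # (array([3, 4, 5]),)  indexes of the matching values
print(np.concatenate((a, a)))    # [1 2 3 4 5 6 1 2 3 4 5 6]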
Later on, a new library called Pillow was built on top of PIL, which provides
support for Python 3 and can run on all operating systems. A wide variety
of image file formats are supported by Pillow such as jpeg, png, bmp, gif,
tif, etc. We can perform various operations on the images such as filtering,
blurring, resizing, cropping, adding a watermark, etc. This library can be
installed in our Raspberry Pi OS using the terminal command sudo apt-get
install python3-pil (or with pip install pillow).
To read and display an image, we can use the open() and show()
functions in the Pillow library, respectively. The filename of the image
along with the path name should be specified inside the open() function
which would then read the image and create an object with the name that
we provide. Then, the object name must be given to the show() function to
display the image. We can then use instance attributes to get more details
about the image such as format, size, and mode. The format attribute tells
us the type of the image such as jpeg, png, bmp, etc. The size attribute
gives us a 2-tuple which contains the width and height of the image in
pixels. The mode attribute gives the names of the bands in the image like
“L” for grayscale images, “RGB” for true color images, etc. The code given
in Table 3-11 illustrates all these functions and attributes.
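A sketch of what that code looks like, assuming the Lenna image is stored at the path used for the OpenCV examples later in this chapter (outputs shown as comments):

from PIL import Image

a = Image.open("C:/Users/User/Pictures/Lenna.png")   # read the image
a.show()                                              # display it in the default viewer
print(a.format)   # PNG
print(a.size)     # (512, 512)
print(a.mode)     # RGB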
We can see that the image is a PNG file. The size of the image is (512, 512),
which implies that it is 512 pixels wide and 512 pixels high, and it belongs
to the category of RGB images. Pillow library provides a number of
functions to perform various image processing tasks. We will see a few of
them in action, and the rest of them are left to the readers to explore.
The crop() function can be used to get a sub-rectangle region from the
given image. The boundary of the sub-rectangle region is specified by a
4-tuple with coordinates specified in the template: “crop(left, upper, right,
lower)”. In other words, we specify the column and row coordinates of the
pixel in the top-left corner of the sub-rectangle followed by the column and
row coordinates of the pixel in the bottom-right corner. This operation is
illustrated in Table 3-12.
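For instance, cropping a 200 x 200 region from the top-left of the Lenna image opened above might look like this (the coordinates here are illustrative):

b = a.crop((0, 0, 200, 200))   # (left, upper, right, lower)
b.show()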
Pillow offers a few functions that can be used for performing geometric
transformations. For instance, the resize() function, illustrated in
Table 3-13, can be used to change the size of the image where we need to
specify the number of rows and columns required inside the function.
51
Chapter 3 Python Libraries for Image Processing
b=a.resize((150,150))
b.show()
b=a.rotate(45)
b.show()
b=a.transpose(Image.FLIP_LEFT_RIGHT)
b.show()
# colour transform
b=a.convert("L")
b.show()
import cv2
a = cv2.imread("C:/Users/User/Pictures/Lenna.png")
cv2.imshow("Lenna_image", a)
cv2.waitKey(0)   # keep the display window open until a key is pressed
cv2.imwrite("C:/Users/User/Documents/Lenna_new.jpg", a)
cv2.destroyAllWindows()
The variable “a” created by reading the “Lenna” image using the
OpenCV package is of the data type “ndarray.” Therefore, the individual
pixels can be accessed by indexing, and the different functions available
with the numpy package can be applied to this variable. For instance, the
shape attribute of the variable “a” holding the “Lenna” image will
result in the tuple (512, 512, 3), which indicates that the variable is a 3D
array with 3 different matrices of size 512 x 512 corresponding to the three
color channels blue, green, and red.
This brings us to another stark difference between the image type in
reading with PIL and OpenCV package. The color images of “png,” “jpeg,” and
most other related file extensions are basically composed of these three color
channels blue(B), green(G), and red(R). The PIL package uses the RGB channel
ordering, whereas the OpenCV package uses the BGR channel ordering. The
imshow() function in the OpenCV package expects this BGR ordering and
therefore displays the colors correctly. But if we use the imshow function in the Matplotlib
package, which expects RGB ordering, the image will be displayed with its blue and red channels swapped.
Since the image is made of three color channels, each pixel in the image
is composed of three values corresponding to the three colors. Technically,
this implies that the color of each pixel is obtained by a mixture of different
shades (intensities) of the three basic colors blue, green, and red. As the
variable containing the image in OpenCV implementation is a numpy array,
the pixel values can be accessed using the normal array indices that we
discussed earlier in this chapter. For instance, the first pixel in the “Lenna”
image (matrix) stored in the variable “a” can be obtained using the index
a[0,0,:], which means that we access the values at the intersection of the first
row and first column from all three color channels.
Another difference is that the size function in PIL and OpenCV provide
different values. In PIL, the size function for the variable “a” containing
the “Lenna” image produces the tuple (512, 512) which is basically
the dimension of the 2D image. The third dimension, number of color
channels, is not displayed at the output, whereas the shape function from
numpy can be used to display it in case of OpenCV. The size function in
OpenCV provides the total number of pixels in the image which is nothing
but the product of the three values in the tuple obtained from the shape
function, that is, 512 x 512 x 3 = 786432 pixels. All these properties we
discussed above are illustrated using the code in Table 3-17 and the figure
following the code. Figure 3-8 shows the difference caused by the channel
ordering in these two different packages.
Code:
import cv2
import matplotlib.pyplot as plt
from PIL import Image
# PIL implementation
a = Image.open("C:/Users/User/Pictures/Lenna.png")
print(type(a))
print(a.size)
plt.subplot(211)
plt.imshow(a)
plt.title('Lenna Image read with PIL package')
# OpenCV implementation
b = cv2.imread("C:/Users/User/Pictures/Lenna.png")
print(type(b))
print(b.shape)
print(b.size)
plt.subplot(212)
plt.imshow(b)
plt.title('Lenna image read with OpenCV package')
plt.show()

Output:
<class 'PIL.PngImagePlugin.PngImageFile'>
(512, 512)
<class 'numpy.ndarray'>
(512, 512, 3)
786432
Figure 3-8. Lenna Image from PIL and OpenCV displayed using
Matplotlib
displays the image read using OpenCV package in BGR channel ordering.
Therefore, it is necessary to convert the BGR image to RGB image in order
to display the color image properly. This can be achieved by using the
OpenCV function cv2.cvtColor, which takes the variable containing the
input image and a conversion flag as parameters.
The conversion from a BGR image to an RGB image can be obtained by
providing the flag cv2.COLOR_BGR2RGB as a parameter to the
cvtColor function. Similarly, the flag for converting the BGR image to
a 2D grayscale image is cv2.COLOR_BGR2GRAY. The conversion to a binary
image, on the other hand, is done using a threshold operation with the
cv2.threshold function. This function takes four parameters: the variable
containing the grayscale input image, the threshold value, the maximum
value (which is 255 in case of grayscale image), and the thresholding
method. The two commonly used thresholding methods are binary
thresholding and OTSU thresholding. We will discuss these two types in
Chapter 5 of this book, whereas we will illustrate the binary thresholding
with the cv2.THRESH_BINARY flag in the following code. Figure 3-9
shows the different image conversions discussed above. It can be observed
that the color map function “gray” has to be provided along with the image
variable in the imshow function for displaying single-channel images like
grayscale and binary images. The readers are encouraged to explore the
other image conversions from the OpenCV documentation.
import cv2
import matplotlib.pyplot as plt
a=cv2.imread("C:/Users/Lenovo/Pictures/Lenna.png")
plt.subplot(221)
plt.imshow(a)
plt.title('Original BGR image')
b=cv2.cvtColor(a, cv2.COLOR_BGR2RGB)
plt.subplot(222)
plt.imshow(b)
plt.title('RGB image')
c=cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
plt.subplot(223)
plt.imshow(c,cmap='gray')
plt.title('Grayscale image')
d=cv2.threshold(c,127,255,cv2.THRESH_BINARY)[1]
plt.subplot(224)
plt.imshow(d,cmap='gray')
plt.title('Binary image')
plt.show()
import cv2
import matplotlib.pyplot as plt
# read the input Lenna image
img=cv2.imread("/content/Lenna.png",0)  # read the image as grayscale
# perform image operations using opencv
cropped_image = img[0:320, 0:320]
resized_image = cv2.resize(img, (256, 256))
rotated_image = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
clipped_image = cv2.flip(img, 1)
blurred_image = cv2.blur(img, (3,3))
# display the images
plt.subplot(231)
plt.imshow(img, cmap='gray')
plt.title('Original Image')
plt.subplot(232)
plt.imshow(cropped_image,cmap='gray')
plt.title('Cropped Image')
plt.subplot(233)
plt.imshow(resized_image, cmap='gray')
plt.title('Resized Image')
plt.subplot(234)
plt.imshow(rotated_image,cmap='gray')
plt.title('Rotated Image')
plt.subplot(235)
plt.imshow(clipped_image,cmap='gray')
plt.title('Clipped Image')
plt.subplot(236)
plt.imshow(blurred_image,cmap='gray')
plt.title('Blurred Image')
plt.show()
The next step will be to capture the frames using the read() method
of the object created for video capture and then save the frames to the
output video using the write() method of the object created for writing the
captured video. These two operations would be carried out in an infinite
loop which can be terminated by a specified keystroke as we discussed
in the previous chapter. Once the loop is terminated, we can destroy all
windows and release both the Capture and Writer objects. We can also use
the imshow() function to display the frames that we are recording. The
following code summarizes the entire process behind capturing and saving
a video using OpenCV.
import cv2
vid = cv2.VideoCapture(0)
filename = "C:/Users/User/Pictures/out.mp4"
codec = cv2.VideoWriter_fourcc('H','2','6','4')
framerate = 30
resolution = (640,480)
vid_out = cv2.VideoWriter(filename, codec, framerate, resolution)
while(True):
    ret, frame = vid.read()
    vid_out.write(frame)
    cv2.imshow('frame', frame)
    key = cv2.waitKey(1)
    if key == ord("x"):
        break
vid.release()
vid_out.release()
cv2.destroyAllWindows()
Video Transformations
In most applications, the video frames need to be transformed before
analyzing them for information. The image operations discussed in the
previous section can be applied to the frames of the video captured in
real time. For instance, if the color information is not important, then the
frames can be converted to a grayscale image prior to further processing.
The following code illustrates three different operations applied to the video
frames in real time. Initially, the captured frames are converted to grayscale
images. Next, the grayscale images are cropped to a smaller size. The cropping
operation is normally performed to capture the area of interest. In this
illustration, the height and width of a frame are used to create the indices for
cropping the image array. Note that the double forward slash (//) operator is
used for division which indicates integer division since the array indices must
be integers. Finally, a blurring operation is used to smoothen the frames.
import cv2
vid = cv2.VideoCapture(0)
framerate = 30
resolution = (640,480)
while(True):
    ret, frame = vid.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    h = gray.shape[0]
    w = gray.shape[1]
    gaussianblur = cv2.GaussianBlur(gray[h//2:3*h//2, w//2:3*w//2], (5, 5), 0)
    cv2.imshow('blur', gaussianblur)
    key = cv2.waitKey(1)
    if key == ord("x"):
        break
vid.release()
cv2.destroyAllWindows()
Summary
The following are the topics we covered in this chapter:
• Python IDEs available for the Raspberry Pi OS (Thonny, Geany, and IDLE)
• Basic Python data types, conditional statements, and loops
• Creating and manipulating multidimensional arrays with the NumPy library
• Reading, displaying, and transforming images with the PIL (Pillow) library
• Displaying images with Matplotlib and reading, writing, and converting images with OpenCV
• Capturing, saving, and transforming videos in real time with OpenCV
CHAPTER 4
Challenges in Machine Vision System
Implementing machine vision systems can be challenging due to various
factors. One of the challenges is the need for accurate calibration and
alignment of cameras and sensors to ensure precise image capture.
Additionally, the complexity of processing large amounts of visual data
in real time requires powerful computational resources and efficient
algorithms. Furthermore, integrating machine vision systems into existing
manufacturing industry infrastructure may require significant investment
in hardware and software upgrades. The challenges with implementing
machine vision systems can be categorized as technical, environmental,
integration, performance, and human factors. In this chapter, we’ll look
more closely at these challenges.
Technical Challenges
One of the major technical challenges is the complexity of hardware and
software integration. Integrating machine vision systems into existing
infrastructure in the industry often involves connecting cameras and
The Object Detail Size (ODS) is 0.2mm (approximately), and the Object Size (OS) is
55mm (this can be set by the user, or the user can choose the length based
on a trial-and-error method). The required camera resolution can be calculated using the
following formula (4.1):

Resolution = (OS / ODS) × 2    (4.1)

where
OS – Object Size
ODS – Object Detail Size
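Assuming this common rule of allotting two pixels to the smallest detail to be resolved, the example values above give the following minimum resolution along the measured dimension:

Resolution = (55 / 0.2) × 2 = 550 pixels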
and controlled lighting systems can help mitigate the challenges posed
by variability in lighting conditions and enhance the performance of
machine vision applications. Figure 4-2 shows the effect of varying lighting
conditions on an object in a real-time environment.
Integration Challenges
Integration requires compatibility with existing systems and equipment,
such as conveyors or robotics, so that the machine vision system can be
seamlessly incorporated into the production line. Integration also involves
ensuring that the machine
vision system can effectively communicate and share data with other
components of the production process, such as quality control databases
or inventory management systems. Additionally, integration challenges
may involve training and educating personnel on how to effectively
use and interpret the data provided by the machine vision system. For
example, in a manufacturing plant, a machine vision system can be
integrated into the assembly line to inspect and verify the accuracy of
components being assembled. The system can capture images of each
component and compare them against a database of reference images
to detect any defects or variations. The machine vision system can then
communicate this information to the quality control database, triggering
alerts or initiating corrective actions if necessary. Training and educating
personnel on how to interpret the data provided by the machine vision
system is crucial for effective decision-making.
Incorporating a machine vision system seamlessly into the overall
workflow requires careful planning and coordination with various
departments. This may involve conducting thorough assessments
of existing processes and identifying areas where the system can
be integrated without disrupting productivity. Additionally, regular
maintenance and calibration of the machine vision system is essential
to ensure accurate and reliable results over time. Furthermore, training
employees on how to effectively use the machine vision system is
necessary to maximize its potential. This includes educating them on
how to interpret and analyze the data generated by the system, as well as
troubleshooting any issues that may arise. By investing time and resources
into the proper implementation and maintenance of a machine vision
Performance Challenges
The following performance problems of machine vision systems are
critical for satisfying the user or customer:
• Real-time processing and response require the
identification of potential limitations and obstacles
that may arise when implementing the machine vision
system in terms of performance, such as handling
large volumes of data or dealing with complex product
variations.
system can greatly enhance the efficiency and accuracy of defect analysis,
but it is essential to acknowledge and address its limitations and work
toward continuous improvement.
Summary
This chapter provided information about the challenges in implementing
machine vision systems, covering technical, environmental, integration,
performance, and human factors, along with the regular maintenance and
calibration that should be performed to ensure continued accuracy and
reliability.
CHAPTER 5
Image Processing Using OpenCV
Image processing techniques with varying levels of
complexity are used in applications like medical imaging,
multimedia processing, computer vision, etc. OpenCV provides a huge collection
collection of algorithms and functions that can help us to do a wide range
of image processing operations such as filtering, edge detection, object
detection, segmentation, and more. In this chapter, we will begin by
exploring the various properties and types of images along with the various
noises that could affect the images. Then, we will go on to explore various
processing techniques like image enhancement methods, image filtering
techniques for edge detection, morphological operations, thresholding
techniques, blob detection, and contour detection.
Image Acquisition
Image acquisition is the first and foremost part of any vision system. It is
the process of capturing images using hardware such as cameras or other
imaging sensors. Several factors decide the quality of the captured images
like the type of device used for acquisition, the resolution of the acquired
images, the lighting conditions, etc. In case of digital photography, the
acquisition process begins the moment the light enters the camera lens
and hits the image sensor, which in turn captures the light and converts it
into a digital signal that can be processed by the camera’s software.
There are two main categories of digital image sensors: the charge-
coupled device (CCD) sensors and the complementary metal oxide
semiconductor (CMOS) sensors. The quality of the acquired image is
affected by the type of sensor used in our camera. For instance, the CCD
sensors are more sensitive to light, and hence, they can produce high-
quality images, but they consume more power and are more expensive. On
the other hand, CMOS sensors are much cheaper and consume less power
compared to CCD sensors, but they produce low-quality images with
more noise.
The resolution of an image is also a dominant factor in influencing
the quality. High-resolution images will contain more pixels that results
in a sharper and more detailed image. However, the increased number
of pixels would require more storage space as well as processing power.
Another important factor affecting the quality of an image is the lighting
conditions prevalent at the time of image acquisition. Poor lighting
conditions can result in underexposed, overexposed, or blurry images that
may lead to more complexity during further processing. Therefore, there
needs to be some control with respect to the lighting conditions while
capturing images. This can be achieved by either adjusting the camera
settings or using external lighting sources.
The vision systems used in industries are often subjected to adverse
environments. Hence, it is essential to build a robust and reliable system
that can perform convincingly in these conditions. By optimizing the
abovementioned factors, it is possible to capture high-quality images
which in turn can provide accurate results.
Resolution
Resolution refers to the number of pixels that the image is made up of.
In simpler terms, resolution describes the level of detail in an image. For
instance, higher the resolution, the sharper and more detailed the images
will appear. In case of digital images, image resolution is often measured as
a pixel count. More generally, image resolution is described in PPI (pixels
per inch) or DPI (dots per inch) which represents how many pixels are
displayed per inch of an image. For example, a 300 ppi image contains
300 pixels along every inch of the image, that is, 90,000 pixels in every
square inch. A higher resolution implies that there are more pixels per
inch (PPI), which results in a high-quality image with much more detail.
But the increase in pixels would
also increase the storage requirements. In other words, high-resolution
images would require more storage space, and it may take longer to load.
Aspect Ratio
Aspect ratio is the relationship between the height and width of an image.
It essentially determines the shape of an image and can affect how the
image is displayed or printed. The aspect ratio is usually written with
two numbers separated by a colon. When we say that the aspect ratio of
an image is w:h, then the image is “w” units wide and “h” units high. For
example, a square image has an aspect ratio of 1:1 as the width and height
are equal. Other common aspect ratios are 4:3 (standard television) and
16:9 (widescreen television).
Color Depth
Color depth refers to the maximum number of colors that could be
represented in an image. It is also called bit depth since it represents
the number of bits that define the color of each pixel. For instance, a
pixel with a depth of 1 bit can have only two values: black and white. The greater the bit
depth, the more colors an image can contain, and the more accurate
the color representation is. For example, an 8-bit image can contain up
to 2^8 = 256 colors, and a 24-bit image can contain 2^24 = 16,777,216 colors
(approximately 16 million). We would discuss about these different images
in the upcoming sections. Despite the number of colors an image contains,
the image displayed on to a screen is limited by the number of colors
supported by the monitor. For example, an 8-bit monitor can support 256
colors per channel.
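The relationship between bit depth and the number of representable values can be checked with a short calculation; a minimal sketch assuming an 8-bit grayscale image loaded with OpenCV (the path is a placeholder):

import cv2

# read the image as 8-bit grayscale; the path is a placeholder
img = cv2.imread("C:/Users/user/Pictures/Nut.jpg", 0)

bits_per_pixel = img.itemsize * 8              # 8 bits for a uint8 image
print("Bit depth:", bits_per_pixel)
print("Possible values per pixel:", 2 ** bits_per_pixel)   # 256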
Image Size
The size of an image, also called its dimensions or pixel dimensions, is
given in pixels in the format “width x height”. In other words, it gives the
number of pixels used to represent the image along the horizontal and
vertical direction in a display. For instance, the Lenna image we used in
the previous chapters has a dimension of 512 x 512 which implies that
the image is represented using 512 pixels both horizontally and vertically.
The image size can also be determined by multiplying both height and
width (in inches) by the dpi. For example, the pixel dimensions of a 4 x
6-inch photograph scanned at 300 dpi is 1200 x 1800 (4x300=1200 and
6x300=1800).
Noise
Image noise generally refers to the random variations in the brightness or
color information in digital images that are not part of the original scene or
object photographed. There are a number of factors that cause this noise,
such as the quality of the camera sensor, the camera settings used, the
amount of available light, and so on. Depending on the causes and
characteristics, several different types of noise can occur in an image.
This noise can be removed or minimized by using filtering techniques,
which we will discuss later in this chapter. Some of the common types of
noise are discussed as follows.
Gaussian Noise
This is the most common type of noise that occurs in most images during
acquisition. The major cause of this type of noise is the variations in the
level of illumination. It is called Gaussian noise because it has a probability
density function that follows the Gaussian (normal) distribution.
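Gaussian noise can be simulated by adding values drawn from a normal distribution to each pixel; a minimal sketch, where the mean and standard deviation are arbitrary illustration values and the path is a placeholder:

import cv2
import numpy as np

# read the image as grayscale; the path is a placeholder
img = cv2.imread("C:/Users/user/Pictures/Nut.jpg", 0)

# draw zero-mean Gaussian noise with a standard deviation of 20
noise = np.random.normal(0, 20, img.shape)

# add the noise and clip the result back to the valid 8-bit range
noisy = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)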
Speckle Noise
Speckle noise is a granular noise texture that degrades the quality of
an image by appearing as a grainy pattern in the image. This type of
noise mostly occurs in synthetic aperture radar (SAR) images, medical
ultrasound images, holographic images, etc. Speckle noise can be
generated artificially by multiplying the pixels of the image with random
noise values. Like the Gaussian noise, the speckle noise is also statistically
independent of the signal, but the difference is that the speckle noise is
multiplicative whereas Gaussian noise is additive. Figure 5-3 illustrates the
effect of speckle noise on the nut image.
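Following the multiplicative idea described above, a minimal sketch of adding synthetic speckle noise (the noise standard deviation is an arbitrary illustration value):

import cv2
import numpy as np

img = cv2.imread("C:/Users/user/Pictures/Nut.jpg", 0).astype(np.float64)

# multiplicative noise: each pixel is scaled by (1 + n), with n ~ N(0, 0.1)
n = np.random.normal(0, 0.1, img.shape)
speckled = np.clip(img * (1 + n), 0, 255).astype(np.uint8)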
Types of Images
Based on the bit depth, that is, the color information in an image and the
number of bits used to represent a pixel, the image can be categorized as
follows.
Binary Images
These are images that take only two possible pixel values. A pixel value
of 0 denotes black color, and a pixel value of 1 denotes white color. These
images are also called black-and-white images or 1-bit images, since a
single bit is sufficient to represent the pixel values as they can take only
either of the two values 0 and 1. We will use the same nut image to
illustrate the different types of images in this section. Figure 5-4 shows the
black and white version of the nut image. The matrix to the right shows
the pixel values corresponding to the rectangular area highlighted by the
red box in the image. We can see that the region of pixels with value 0 is
dark (black) and the region of pixels with value 1 is bright (white). The
transition between the black and white regions provides a sense of edge
between the two regions.
Grayscale Images
Grayscale images are those images that are composed exclusively of
shades of gray. These images are also called monochrome images or
8-bit images, as each pixel in this image is represented using 8 bits. This
implies that there are 2^8 = 256 possible values for a pixel. The values range
from 0 to 255 where a pixel with value 0 is a black pixel and a pixel with
value 255 is a white pixel. The values in between represent the different
levels of intensities in moving from the dark region toward the bright
region providing different shades of gray. Figure 5-5 shows the grayscale
version of the nut image and the matrix of pixel values corresponding to
the rectangular area highlighted by the red box in the image. We can see
that the pixel values corresponding to the dark region of the rectangle are
less than the values corresponding to the bright region.
Color Images
Color images are basically composed of three bands of monochrome
(grayscale) image data where each band of data corresponds to a different
color. In other words, the color image stores the gray-level information
in each spectral band (color band). Each pixel in the image will be
represented by three values corresponding to each color band, and each
band typically uses 8 bits, giving 24 bits per pixel.
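To see the three bands of a color image, the channels can be separated with OpenCV; a minimal sketch (note that cv2.imread returns the bands in B, G, R order, and the path is a placeholder):

import cv2

# read the image in color; the path is a placeholder
img = cv2.imread("C:/Users/user/Pictures/Nut.jpg")

# split the image into its three spectral bands
b, g, r = cv2.split(img)
print(img.shape)   # (rows, columns, 3)
print(b.shape)     # each band is a single-channel grayscale array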
import cv2
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
# Pillow library methods
img=Image.open("C:/Users/user/Pictures/Nut.jpg")
imgn = img.resize((128,128),Image.NEAREST)
imgbl = img.resize((128,128),Image.BILINEAR)
imgbc = img.resize((128,128),Image.BICUBIC)
imgl= img.resize((128,128),Image.LANCZOS)
plt.subplot(2,4,1)
plt.imshow(imgn)
plt.title('Nearest Neighbor_PIL')
plt.subplot(2,4,2)
plt.imshow(imgbl)
plt.title('bilinear_PIL')
plt.subplot(2,4,3)
plt.imshow(imgbc)
plt.title('bicubic_PIL')
plt.subplot(2,4,4)
plt.imshow(imgl)
plt.title('Lanczos_PIL')
# OpenCV library methods
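# A minimal sketch of the equivalent resizing with cv2.resize; the
# interpolation flags below are standard OpenCV options, and the subplot
# layout is an assumption for illustration.
img_cv = cv2.imread("C:/Users/user/Pictures/Nut.jpg")
cvn  = cv2.resize(img_cv, (128,128), interpolation=cv2.INTER_NEAREST)
cvbl = cv2.resize(img_cv, (128,128), interpolation=cv2.INTER_LINEAR)
cvbc = cv2.resize(img_cv, (128,128), interpolation=cv2.INTER_CUBIC)
cvl  = cv2.resize(img_cv, (128,128), interpolation=cv2.INTER_LANCZOS4)
plt.subplot(2,4,5)
plt.imshow(cv2.cvtColor(cvn, cv2.COLOR_BGR2RGB))
plt.title('Nearest Neighbor_CV')
plt.subplot(2,4,6)
plt.imshow(cv2.cvtColor(cvbl, cv2.COLOR_BGR2RGB))
plt.title('bilinear_CV')
plt.subplot(2,4,7)
plt.imshow(cv2.cvtColor(cvbc, cv2.COLOR_BGR2RGB))
plt.title('bicubic_CV')
plt.subplot(2,4,8)
plt.imshow(cv2.cvtColor(cvl, cv2.COLOR_BGR2RGB))
plt.title('Lanczos_CV')
plt.show()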
Image Enhancement
The objective of image enhancement is to process an image so that
the result is much more suitable than the original image for a specific
application. It is a technique basically used to improve the visual quality
of the images. It is a crucial step for vision systems as the quality of the
image can have a significant impact on the outcome. Enhancement can
be considered as an umbrella term that includes a number of operations
such as removing blur, eliminating noise, increasing contrast, etc.
These enhancement operations can be done either in spatial domain
or frequency domain. The spatial domain techniques operate directly
on the pixels, whereas in frequency domain techniques, the frequency
information of the image is extracted using transformation techniques like
Fourier transform, and the enhancement operations are then done in the
frequency domain. We will focus only on the spatial domain operations.
The operation in spatial domain can be represented mathematically
as g(x,y)=T[f(x,y)] where g is the output image, f is the input image, and
T is an operator on f defined over a neighborhood of (x,y). Based on this
neighborhood of pixels over which the operations are performed, the
spatial domain techniques can be further classified into two types, namely,
point operations and spatial operations. In point operations, the operator
is applied over a neighborhood of “1×1” which implies a single pixel. In
other words, the selected operator is applied independently over each
and every pixel in the image. Therefore, we can rewrite the mathematical
transformation function as s=T(r), where T is a transformation technique
that maps a pixel value r to a pixel value s. In contrast, spatial
operations are applied over a neighborhood of multiple pixels, say n×n.
Each pixel in the output image is obtained by applying an operator over
a neighborhood of n×n pixels in the input image. Filtering, which we will
cover later in this chapter, is one of the most common examples of such
spatial operations.
Image Negatives
Assume that the given image has intensity levels in the range [0,L-1]. The
negative of the image is obtained by using the transformation function
s=L-1-r. For instance, if the input image is a grayscale image, then it will
have intensity levels in the range [0,255]. Therefore, the negative of the
grayscale image can be obtained by the function s=255-r. This function
basically reverses the intensity levels of an image that makes it appear
like a photographic negative. This function can be very useful especially
in the field of medical image processing. The following Python code
illustrates the negative operation on the nut image, and the resulting image
is displayed together with the input image in Figure 5-8 to observe the
difference caused by the operation.
import cv2
import matplotlib.pyplot as plt
r=cv2.imread("C:/Users/user/Pictures/Nut.jpg",0)
s=255-r
plt.subplot(1,2,1)
plt.imshow(r,cmap='gray')
plt.title('Original Image')
plt.subplot(1,2,2)
plt.imshow(s,cmap='gray')
plt.title('Negative Image')
plt.show()
Log Transformation
For an image with intensity levels in the range [0,L-1], the log
transformation of the image can be obtained by the transformation
function s=clog(1+r). The log transformation is used to compress the
dynamic range of an image with large variations in pixel values by
expanding the range of low-intensity values while compressing the range
of high-intensity values. The converse is true for the case of inverse log
transformation. This transformation can be used when the dynamic range
of an image exceeds the capability of a display device, thereby making it
difficult for the display to faithfully reproduce the wide range of values. For
the purpose of illustration, we will use the same nut image to see how this
transformation affects the image (with c=1). We can see from Figure 5-9
that the dark regions have been enhanced implying an increase in the
range of dark pixels.
import cv2
import matplotlib.pyplot as plt
import numpy as np
r=cv2.imread("C:/Users/user/Pictures/Nut.jpg",0)
s=np.log(1+r.astype(np.float64)) # c=1; convert to float to avoid uint8 overflow at 255
plt.subplot(1,2,1)
plt.imshow(r,cmap='gray')
plt.title('Original Image')
plt.subplot(1,2,2)
plt.imshow(s,cmap='gray')
plt.title('Image after log transformation')
plt.show()
import cv2
import numpy as np
import matplotlib.pyplot as plt
r = cv2.imread("C:/Users/user/Pictures/Nut.jpg",0)
gamma_corrected_1 = r**0.1 # gamma=0.1
gamma_corrected_2 = r**1.2 # gamma=1.2
gamma_corrected_3 = r**2.4 # gamma=2.4
plt.subplot(2,2,1)
plt.imshow(r,cmap='gray')
plt.title('Original Image')
plt.subplot(2,2,2)
plt.imshow(gamma_corrected_1,cmap='gray')
plt.title('Gamma = 0.1')
plt.subplot(2,2,3)
plt.imshow(gamma_corrected_2,cmap='gray')
plt.title('Gamma = 1.2')
plt.subplot(2,2,4)
plt.imshow(gamma_corrected_3,cmap='gray')
plt.title('Gamma = 2.4')
plt.show()
Contrast Stretching
Contrast stretching belongs to a family of functions called piecewise-linear
transformation functions, which are linear only over pieces of the intensity
range rather than over the entire range. Contrast refers to the difference
in luminance or color that makes an object distinguishable from other
objects in a frame. The contrast of an image can be represented
mathematically as

contrast = (Imax − Imin) / (Imax + Imin)
where Imax is the maximum possible intensity level and Imin is the minimum
possible intensity level of the image. For example, a grayscale image has
the maximum intensity value of Imax=255 and a minimum intensity value
of Imin=0.
Contrast stretching process can be used to expand the range of
intensity levels in an image in such a way that it covers the entire possible
range of the camera or display. The mapping between the intensity levels
of the input grayscale image and output image obtained from the contrast
stretching process is shown in Figure 5-11 where the dotted line denotes
the identity mapping for which the output image is equal to the input
image and the solid line denotes the mapping for contrast stretching.
Since the three segments of this mapping have different slopes, we will have three different straight-line equations. The slope of the
lines can be calculated using the slope triangle method. The mapping can
be modified by changing the parameters (r1, s1) and (r2, s2). An important
thing to note here is that when r1=s1=0 and r2=s2=255, then the function
becomes equal to the straight dotted line. The following code illustrates
this process with (r1, s1) = (75, 25) and (r2, s2) = (160, 220), and the resulting
image is shown in Figure 5-12.
import cv2
import matplotlib.pyplot as plt
import numpy as np
r=cv2.imread("C:/Users/user/Pictures/Nut.jpg",0)
m=r.shape[0]
n=r.shape[1]
# enter the parameters
r1=75
s1= 25
r2=160
s2=220
# implement the straight lines
s=np.empty([m,n]) # initialize empty array for output image
for i in range(m):
    for j in range(n):
        if (0<=r[i,j] and r[i,j]<=r1):
            s[i,j]=(s1/r1)*r[i,j]
        elif (r1<r[i,j] and r[i,j]<=r2):
            s[i,j]=((s2-s1)/(r2-r1))*(r[i,j]-r1)+s1
        else:
            s[i,j]=((255-s2)/(255-r2))*(r[i,j]-r2)+s2
plt.subplot(1,2,1)
plt.imshow(r,cmap='gray')
plt.title('Original Image')
plt.subplot(1,2,2)
plt.imshow(s,cmap='gray')
plt.title('Image after contrast stretching')
plt.show()
When the kernel is centered on a pixel, the values of the pixels covered
by the kernel and the values of the kernel are combined in some way to
produce a new value for the center pixel. Basically, each value of the kernel
is multiplied by the corresponding pixel value in the image over which
it is placed on, and then all the multiplication results are then added to
obtain the new value for the center pixel. The kernel is slid over the image
to modify each pixel in the image. This combined operation of shift,
multiply, and add is termed convolution, and since it is performed over a
2D image plane, it is also called 2D convolution. Let us look at some of the
commonly used smoothing and edge detection filters in this section.
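The shift–multiply–add operation described above can be carried out directly in OpenCV with cv2.filter2D, which convolves an image with an arbitrary kernel; a minimal sketch using a simple 3x3 averaging kernel (the image path is a placeholder):

import cv2
import numpy as np

img = cv2.imread("C:/Users/user/Pictures/Nut.jpg", 0)

# a 3x3 kernel whose values sum to 1 (simple averaging kernel)
kernel = np.ones((3, 3), np.float32) / 9

# ddepth=-1 keeps the output in the same depth as the input image
filtered = cv2.filter2D(img, -1, kernel)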
Mean Filter
A mean filter is generally used to reduce noise in an image by averaging the
pixel values in a small neighborhood surrounding each pixel in the image.
The mean filter is implemented using a kernel that is a square matrix with
an odd number of rows and columns, such as 3x3, 5x5, or 7x7. When the center pixel of
the kernel is positioned over the pixel being processed, the values of all
pixels covered by the kernel are added, and the resulting sum is divided
by the total number of pixels in the kernel to obtain the average. This
average value is then used to replace the center pixel in the image covered
by the kernel. The kernel is then slid over the image to perform the same
operation over each and every pixel in the image. The effect of the mean
filter is to smoothen the image which helps to reduce the high-frequency
noise present in the image. Since the edges of the image correspond to
high-frequency information, the edges can be blurred which in turn
reduces the sharpness of the image.
This can be a drawback in applications that require edge preservation.
One way to work around this blurring effect of the mean filter is to use a
weighted mean filter, in which the kernel weights are chosen based on a
predefined distribution such as the Gaussian distribution. The weighted
mean filter assigns higher weight to the center of the kernel and lower
weights as we move away from the center. This preserves the edges and
reduces the blurring effect. The effect of mean filter and weighted mean
filter on the grayscale nut image is illustrated in the following code. We
can clearly observe the blurring of edges in the output of mean filter in
Figure 5-13 which is then mitigated by the weighted Gaussian mean filter.
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("C:/Users/user/Pictures/Nut.jpg",0)
new_img1 = cv2.blur(img,(7,7)) # (7,7) is the kernel size
new_img2 = cv2.GaussianBlur(img,(7,7),0)
plt.subplot(131)
plt.imshow(img,cmap='gray')
plt.title('Original Image')
plt.subplot(132)
plt.imshow(new_img1,cmap='gray')
plt.title('Output of mean filter')
plt.subplot(133)
plt.imshow(new_img2,cmap='gray')
plt.title('Output of weighted mean filter')
plt.show()
Median Filter
The idea behind median filter is pretty straightforward. When the kernel
is placed over the image, the pixels covered by the kernel are sorted in
ascending order, and the center pixel is replaced by the median value of
the sorted list of pixels. The median filter can be quite handy in removing
salt and pepper noise. We know that the intensity value of pepper noise
is close to zero, whereas the intensity value of salt noise is close to 255.
So, these noise values are at the two extremes of the intensity spectrum,
and hence, they are naturally removed when we replace each pixel by the
median value of the neighborhood covered by the kernel. This filtering
process is illustrated in the following code using the nut image. We can
clearly see the effectiveness of the median filter in improving the quality of
the image affected by the salt and pepper noise in Figure 5-14.
import cv2
import numpy as np
import matplotlib.pyplot as plt
img = cv2.imread('C:/Users/user/Pictures/Nut.jpg',0)
# add salt and pepper noise
img_sn = img.copy()
prob=0.3
probs = np.random.random(img_sn.shape[:2])
img_sn[probs<(prob/2)] = 0
img_sn[probs>1-(prob/2)] = 255
# apply median filter
med = cv2.medianBlur(img_sn,3) # kernel size (3,3)
plt.subplot(131)
plt.imshow(img,cmap='gray')
plt.title('original image')
plt.subplot(132)
plt.imshow(img_sn,cmap='gray')
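# A hedged sketch of how the display presumably continues: title the noisy
# image, then show the median-filtered result.
plt.title('Noisy image')
plt.subplot(133)
plt.imshow(med,cmap='gray')
plt.title('Median filtered image')
plt.show()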
Sobel Filter
Sobel filter is a popular edge detection algorithm that works by measuring
the gradient of image intensity at each pixel within an image. It finds the
edges by looking for abrupt intensity changes at each pixel and determining
how likely it is that the pixel belongs to an edge. It also determines
the direction in which the edge is likely to be oriented. The Sobel filter uses
two 3x3 kernels, one each for the horizontal and vertical direction. The
kernels are given as
Gx (horizontal):  -1  0  1        Gy (vertical):  -1 -2 -1
                  -2  0  2                         0  0  0
                  -1  0  1                         1  2  1
To detect edges with the Sobel filter, the two kernels are convolved with
the image to obtain the horizontal gradient Gx and the vertical gradient Gy
at each pixel. The gradient magnitude and direction are then computed as

G = sqrt(Gx^2 + Gy^2)
Θ = arctan(Gy / Gx)
Canny Filter
Canny edge detector is a popular multistep algorithm used to detect the
edges in a given digital image. Developed by John Canny in 1986,
this algorithm has found widespread use in computer vision and image
processing applications.
import cv2
import matplotlib.pyplot as plt
import numpy as np
img=cv2.imread("C:/Users/user/Downloads/Gear.jpg")
# convert to grayscale image
grayimg=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Smoothen the image
blurred=cv2.GaussianBlur(grayimg,(3,3),0)
# Apply Sobel Filter
# calculate x and y gradient
sobelx=cv2.Sobel(blurred,cv2.CV_64F,1,0,ksize=3)
sobely=cv2.Sobel(blurred,cv2.CV_64F,0,1,ksize=3)
# Calculate gradient magnitude and direction
grad_mag=np.sqrt(sobelx**2+sobely**2)
grad_dir=np.arctan2(sobely,sobelx) # use arctan2 to get the correct quadrant
# Apply threshold to obtain binary image
threshold=120
s_edgeimg=np.uint8(grad_mag>threshold)*255
# Apply the Canny edge detector
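# A hedged sketch of completing the comparison with cv2.Canny; the two
# hysteresis threshold values below are assumptions for illustration.
c_edgeimg = cv2.Canny(blurred, 100, 200)
plt.subplot(1,3,1)
plt.imshow(grayimg, cmap='gray')
plt.title('Original Image')
plt.subplot(1,3,2)
plt.imshow(s_edgeimg, cmap='gray')
plt.title('Sobel edges')
plt.subplot(1,3,3)
plt.imshow(c_edgeimg, cmap='gray')
plt.title('Canny edges')
plt.show()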
In the binary matrix that defines a structuring element, the pattern of 1’s denotes the shape of the structuring element, and
one of its pixels is usually treated as the origin of the structuring element.
A common practice is to have odd dimensional matrix as the structuring
element where the center of the matrix is usually considered as the origin.
There are two basic morphological techniques in image processing
named erosion and dilation. Other morphological operations like opening
and closing are done using a combination of the erosion and dilation
process. Both erosion and dilation follow a process similar to convolution,
where a small structuring element is slid over the image in a row-wise
manner so that the center pixel is positioned at all possible locations in
the image. At each position, it is compared with the pixels it covers, and
based on how the pixels of the structuring element match the pixels of
the image, we can have three different outcomes: the structuring element
fits the neighborhood (all of its 1-pixels coincide with image pixels of
value 1), it hits the neighborhood (at least one of its 1-pixels coincides
with an image pixel of value 1), or it misses the neighborhood entirely.
Erosion
To understand the process of erosion, consider the binary image block and
structuring element shown in Figure 5-16. Here the dark pixels of the
image have a value of “0”, and the bright pixels have a value of “1”. The
structuring element of 1’s is then used to traverse the image and find the
pixel values where the element fits the neighborhood. The center pixel of
the image block is maintained as “1” in the event of a fit; otherwise, it is
changed to “0”. The only place where a fit occurs is shown with a red
bounding box, and the resulting output image is shown to the right. The
erosion operation is denoted by the symbol ⊖.
It can be seen that the erosion with a small structuring element tends
to shrink the structures in an image by stripping away a layer of pixels from
the inner as well as outer boundaries of regions. The erosion process can
be used to split or disassemble joined objects and to remove small
protrusions from an object. This is illustrated in the following code, and the
resulting image is given in Figure 5-17.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# read image and convert it to binary image
img = cv2.imread('C:/Users/user/Pictures/erosion.jpg', 0)
(thresh, bwImage) = cv2.threshold(img, 127, 255, cv2.
THRESH_BINARY)
# define the structuring element
se = np.ones((3, 3), np.uint8)
# apply erosion
img_erosion = cv2.erode(bwImage, se, iterations=1)
Dilation
Dilation, on the other hand, works by changing the pixel value of image
covered by the center pixel of the structuring element to “1” if there is at
least a single pixel value that is matched between the structuring element
and the image neighborhood covered by it. In other words, we change the
center pixel to “1” if the structuring element hits the image neighborhood.
Let us consider the same image block and structuring element illustration
as before. The places where the structuring element hits the image
neighborhood are indicated by the red bounding box, and the resulting
image is shown to the right in Figure 5-18. The dilation operation is
denoted by the symbol ⨁.
It can be seen that dilation with a small structuring element tends
to expand the structures in an image by adding a layer of pixels to the
inner as well as outer boundaries of regions. The dilation process can be
used to repair breaks or damage in the image and to fill in
intrusions. This is illustrated in the following code, and the
resulting image is given in Figure 5-19.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# read image and convert it to binary image
img = cv2.imread('C:/Users/user/Pictures/chapter5_dilation.
jpg', 0)
(thresh, bwImage) = cv2.threshold(img, 127, 255, cv2.
THRESH_BINARY)
# define the structuring element
se = np.ones((5,5), np.uint8)
# apply dilation
img_dilation = cv2.dilate(bwImage, se, iterations=1)
# display the images
plt.subplot(121)
plt.imshow(bwImage,cmap='gray')
plt.title('Original image')
plt.subplot(122)
plt.imshow(img_dilation,cmap='gray')
plt.title('Dilated image')
plt.show()
Binary Thresholding
This is the simplest form of thresholding where the pixel values of a
grayscale image are compared to a threshold T. The pixels with value less
than T are converted to black pixels with value “0”, and those with value
greater than T are converted to white pixels with value “1”. In essence, the
grayscale image is converted to a binary image. The application of binary
threshold with three different values 63, 127, and 191 to our nut image is
illustrated in the following code. Since 63 is closer to the black pixel value
“0” and distant from the white pixel “255” in the grayscale image, a larger
range of pixels is converted to white pixels in the resultant binary image
which results in an image dominated by white intensity. The second
threshold of 127 is in the midway between the black pixel “0” and the white
pixel “255” that gives a better-balanced binary image. The third threshold
of 191 is very close to the white pixel “255” resulting in a large range of
pixels to be turned black in the binary image. The resulting images for
these three thresholds are shown in Figure 5-21.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# read image
img = cv2.imread('C:/Users/user/Pictures/Nut.jpg', 0)
# apply binary thresholding
(thresh1, bwImage1) = cv2.threshold(img, 63, 255, cv2.
THRESH_BINARY)
(thresh2, bwImage2) = cv2.threshold(img, 127, 255, cv2.
THRESH_BINARY)
(thresh3, bwImage3) = cv2.threshold(img, 191, 255, cv2.
THRESH_BINARY)
# display the images
plt.subplot(221)
plt.imshow(img,cmap='gray')
plt.title('original image')
plt.subplot(222)
plt.imshow(bwImage1,cmap='gray')
plt.title('Threshold value=63')
plt.subplot(223)
plt.imshow(bwImage2,cmap='gray')
plt.title('Threshold value=127')
plt.subplot(224)
plt.imshow(bwImage3,cmap='gray')
plt.title('Threshold value=191')
plt.show()
import cv2
import numpy as np
import matplotlib.pyplot as plt
# read image
img = cv2.imread('C:/Users/user/Pictures/Nut.jpg', 0)
# apply binary inverse thresholding
(thresh1, bwImage1) = cv2.threshold(img, 63, 255, cv2.THRESH_
BINARY_INV)
(thresh2, bwImage2) = cv2.threshold(img, 127, 255, cv2.THRESH_
BINARY_INV)
(thresh3, bwImage3) = cv2.threshold(img, 191, 255, cv2.THRESH_
BINARY_INV)
# display the images
plt.subplot(221)
plt.imshow(img,cmap='gray')
plt.title('original image')
plt.subplot(222)
plt.imshow(bwImage1,cmap='gray')
plt.title('Threshold value=63')
plt.subplot(223)
plt.imshow(bwImage2,cmap='gray')
plt.title('Threshold value=127')
plt.subplot(224)
plt.imshow(bwImage3,cmap='gray')
plt.title('Threshold value=191')
plt.show()
Otsu’s Thresholding
In both the thresholding methods that we have discussed, the threshold
for segmentation is selected manually. There are other methods where an
optimal threshold could be selected automatically by implementing a fixed
procedure. One common method for automated thresholding is Otsu’s
method, named after Nobuyuki Otsu, the person behind the development
of this algorithm. Otsu’s thresholding is a simple method used to generate
a threshold value that splits the images into two classes, the foreground
and the background, by minimizing the intra-class variance. This method
is well suited for bimodal images whose histogram shows two clear peaks,
each representing different intensity range. So, if the intensity range of the
foreground and background is clearly segregated from each other, then
Otsu’s threshold would produce a binary image with clear segmentation
between the foreground and background. For our illustration, let us go
with the same nut image. We will plot the histogram to check if there are
two clear peaks. The result of binary thresholding with the middle value of
127 alongside the result of Otsu’s thresholding shown in Figure 5-23 helps
us to compare the two techniques.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# read image
img = cv2.imread('C:/Users/user/Pictures/Nut.jpg', 0)
# calculate histogram
hist = cv2.calcHist([img],[0],None,[256],[0,256])
# binary thresholding
(thresh1, bwImage1) = cv2.threshold(img, 127, 255, cv2.
THRESH_BINARY)
# Otsu thresholding
(otsu_thresh, bwImage2) = cv2.threshold(
img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU,
)
# display the images
plt.subplot(221)
plt.imshow(img,cmap='gray')
plt.title('original image')
plt.subplot(222)
plt.plot(hist)
plt.title('Histogram of original image')
plt.subplot(223)
plt.imshow(bwImage1,cmap='gray')
plt.title('Binary thresholding with threshold value=127')
plt.subplot(224)
plt.imshow(bwImage2,cmap='gray')
plt.title('Otsu thresholding with threshold value = %d' % otsu_thresh)
plt.show()
We can see that the histogram of the nut image does not have two
distinct peaks since there is no clear segregation between the foreground
and background intensities. Around the top-right corner, the bright
portions of the nut image are similar in intensity to the background at that
area. Similarly, we can observe that the dark pixel intensities of the image
to the left side are similar to the shadow created in the background. It can
also be observed from the code segment for Otsu’s thresholding that the
initial value of the threshold is set to 0 and that Otsu’s method computes
the optimal threshold automatically; the computed value of 121 is indicated
in the title of the fourth image in the plot. From the histogram, it is
evident that one cluster of pixels has a distribution with a clear peak,
whereas the rest of the pixels are distributed widely with no clear peak.
Otsu’s method tries to find the optimal value between these two clusters
and settles on that threshold.
The readers can try to apply this method to an image with higher contrast
between the foreground and background to see a much better result.
Adaptive Thresholding
In all the thresholding techniques discussed so far, a single threshold
value is used for all the pixel values. Such a threshold is often termed
a global threshold. Alternatively, we can apply different threshold values
for different parts of an image based on the local values of the pixels in
each neighborhood. These threshold values are called local thresholds,
and the techniques used to apply them are termed adaptive
thresholding techniques. These kinds of techniques are suited for images
with uneven lighting conditions.
In adaptive thresholding, the threshold is calculated from either
the arithmetic mean or the Gaussian-weighted mean of the pixel intensities in each region.
All the pixel values contribute equally to the arithmetic
mean, whereas in the Gaussian-weighted mean, the maximum weight is
given to the center pixel and the weights decrease as we move farther from
the center pixel. The process of adaptive thresholding using arithmetic
mean and Gaussian mean is illustrated here using the nut image, and the
resulting images are shown in Figure 5-24.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# read image
img = cv2.imread('C:/Users/user/Pictures/Nut.jpg', 0)
# adaptive thresholding using arithmetic mean
bwImage1 = cv2.adaptiveThreshold(
img,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,11,4)
# adaptive thresholding using Gaussian mean
bwImage2 = cv2.adaptiveThreshold( img,255,cv2.ADAPTIVE_
THRESH_GAUSSIAN_C,cv2.THRESH_BINARY,11,4)
# display the images
plt.subplot(131)
plt.imshow(img,cmap='gray')
plt.title('original image')
plt.subplot(132)
plt.imshow(bwImage1,cmap='gray')
plt.title('Adaptive thresholding with arithmetic mean')
plt.subplot(133)
plt.imshow(bwImage2,cmap='gray')
plt.title('Adaptive thresholding with Gaussian mean')
plt.show()
In the preceding code, the number 255 denotes the maximum value
that is applied to pixel values exceeding the threshold. The number 11
indicates the size of the neighborhood area that is used to calculate the
threshold for each pixel. The number 4 at the end is a constant value that is
subtracted from the mean or Gaussian mean.
import cv2
import numpy as np;
import matplotlib.pyplot as plt
# Read image
im=cv2.imread(
"C:/Users/user/Pictures/Coins.png",cv2.IMREAD_GRAYSCALE)
plt.imshow(im,cmap='gray')
plt.title('Original image')
plt.show()
inverted_img = cv2.bitwise_not(im)
# Set up the detector with default parameters.
detector = cv2.SimpleBlobDetector_create()
# Detect blobs.
keypoints = detector.detect(inverted_img)
print(len(keypoints))
# Draw detected blobs as red circles.
blobs = cv2.drawKeypoints(
im, keypoints, np.array([]), (0,0,255), cv2.DRAW_MATCHES_FLAGS_
DRAW_RICH_KEYPOINTS)
# Show keypoints
cv2.imshow("Detected Blobs", blobs)
cv2.waitKey(0)
import cv2
import numpy as np
import matplotlib.pyplot as plt
img = cv2.imread('C:/Users/user/Pictures/rice.jpeg')
img_grey = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#convert the grayscale image to binary image
thresh = 130
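# A hedged sketch of one way this example could continue: threshold the
# grayscale image and count the rice grains as external contours.
ret, img_bin = cv2.threshold(img_grey, thresh, 255, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(img_bin, cv2.RETR_EXTERNAL,
                                        cv2.CHAIN_APPROX_SIMPLE)
print("Number of objects found:", len(contours))
plt.imshow(img_bin, cmap='gray')
plt.title('Binary image')
plt.show()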
Summary
This brings us to the end of this chapter. We have discussed in detail
a wide variety of image processing techniques that are particularly relevant
to industrial vision applications. Let us recall the techniques covered in
this chapter:
• Image acquisition and the properties that affect image quality: resolution, aspect ratio, color depth, image size, and noise
• Types of images based on bit depth: binary, grayscale, and color images
• Image resizing with different interpolation methods
• Image enhancement in the spatial domain: image negatives, log transformation, gamma correction, and contrast stretching
• Filtering with mean, weighted (Gaussian) mean, median, Sobel, and Canny filters
• Morphological operations: erosion and dilation
• Thresholding techniques: binary, binary inverse, Otsu’s, and adaptive thresholding
• Blob detection and object counting
CHAPTER 6
Graphical User
Interface with OpenCV
and tkinter
In this chapter, we will discuss the design of a basic graphical user interface
(GUI) using a number of graphical components for an industrial vision
system. There are multiple libraries available in Python for designing a
GUI, and we will use the tkinter library in this chapter. We will start with
the discussion of individual GUI components provided by tkinter along
with a simple demonstration for each one of them and finally build a
complete user interface for a simple vision system.
GUIs at a Glance
A GUI is a digital interface between the users and the underlying computer
program behind an application that lets them interact with the program
with the help of graphical elements such as icons, buttons, windows,
sliders, menus, etc. A GUI makes life simple for users without knowledge
of the technical details by providing a user-friendly way to interact
with complex systems or software applications. Prior to GUIs, users
interacted with computer programs by typing text commands in an interface
called a character user interface (CUI). Unlike a CUI, a GUI provides visual
elements to represent the various functions and actions associated with the
programs. Ranging from operating systems to browsers and multimedia
applications, GUIs have become the standard interface for interaction.
Following are the key components of a GUI:
Tkinter
Tkinter is one of the most commonly used Python libraries for developing
GUIs. It is the standard Python interface to the Tk GUI toolkit, where Tk is
a cross-platform widget toolkit providing a library of GUI widgets. The
name tkinter comes from the phrase “Tk interface,” and it is an open
source library released under the Python license. tkinter comes bundled
with the standard Python distribution, making it easily available to developers.
The GUI code developed using tkinter can work on multiple platforms like
Windows, MacOS, and Linux.
Tkinter offers a number of widgets that provide different ways for the
users to interact with an application. Discussing all the widgets is beyond
the scope of this book. We will only discuss some of the common widgets
that are often used in vision systems as illustrated in Table 6-1. We will first
learn to build simple GUIs with each of these widgets one by one. Later on,
in the chapter, we will build a comprehensive GUI with multiple widgets
for an industrial vision system.
Table 6-1. Commonly used tkinter widgets

Label: Displays static text or images that users can view but not interact with
Button: Enables users to initiate an action by clicking it
Entry: A single-line text field where users can type in strings involving text and numbers
Radiobutton: Lets users choose one among several choices
Checkbutton: Allows users to turn an action on or off
messagebox: Displays a non-editable, multi-line text message to the user
Toplevel: Provides users with a separate window
filedialog: Used to get information from users, such as typed text or selected files to open, inform them of some events, confirm an action, and more
Canvas: A widget intended for drawing pictures and placing graphics, text, and other widgets
Scale: A graphical slider that allows users to select values from a scale
Label
Let us start with a small program that creates a simple window with a label
widget that displays a welcome message.
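A minimal sketch of such a program is shown below; the window title text is a placeholder, and the wildcard import style is an assumption.

from tkinter import *

# create the main application window
root = Tk()
root.title("Sample Window")

# create a label widget with a static welcome message and push it
# into the main window
mylabel = Label(root, text="Hello World!")
mylabel.pack()

# keep the window open until the user closes the application
root.mainloop()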
In the preceding code, the function Tk() creates the main window in
which the required graphic widgets will be added. The title() function is
used to provide a title to the window. All the widgets of the application
window will be managed using the variable name created for the window
which is root in our case. This is evident from the next line of code where
we create a label widget using the root variable. We just provide the
commonly used “Hello World!” string to be displayed in the Label widget
so that we don’t get a blank window. Now that the Label widget is created,
the pack() function in tkinter is used to push it into the main window. The
mainloop() function keeps the main window running in a loop so that it
stays open until the user chooses to close the application. This GUI
code can be executed in any Python IDE of our choice. For instance, the
code can be typed in a new editor window in the IDLE IDE, as shown in
Figure 6-1, and then executed by clicking “Run Module” in the
“Run” menu at the top or by pressing the F5 key on the keyboard.
This will create the sample window as shown in the figure.
Button
Now that we know how to display labels, let us try to use a button widget
to generate a label. A button widget needs two things: a text prompt on
top of the button indicating its functionality and an action that needs to be
carried out upon pressing the button. In our case, the action of generating
a label can be created first with a user-defined function as illustrated in
the following code. Once the action is defined, we can use the Button()
function to create the button. Within the button function, the text option
can be used to provide the text prompt, and the command option can be
used to set the function call for the label generation function we created.
Recall that we need to use the pack() function to push the button into the
main window. In addition, we can customize the button by adding horizontal
and vertical padding around its text using the padx and pady options, which
effectively control its size. We can
also set the foreground color (color of the text over the button) and the
background color (color of the button) using the fg and bg options. Readers
can explore the other configuration options provided by the Button
function from the official documentation. The window generated by the
code is illustrated in Figure 6-3.
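A sketch of the kind of code being described here; the prompt text, colors, and padding values are chosen purely for illustration:

from tkinter import *

root = Tk()
root.title("Button Demo")

# action carried out when the button is pressed: generate a label
def myclick():
    mylabel = Label(root, text="Button was clicked!")
    mylabel.pack()

# create the button with a text prompt, an action, padding, and colors
mybutton = Button(root, text="Click Me", command=myclick,
                  padx=20, pady=10, fg="white", bg="blue")
mybutton.pack()

root.mainloop()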
Entry
In both of the previous examples, we programmed the text to be displayed
in the label. In this section, we will use the entry widget to get the text from
the user and use a button widget to push the text to the label widget. In
the code illustrated below, we initially create the entry widget using the
Entry() function. It can be seen that the function allows us to customize
the widget by providing options like width, borderwidth, etc. The function
provides an insert method with the syntax insert(index, value), where
index is the position at which the value is inserted and value is the text to
be inserted in the entry field. Note that we have provided a default string
“Enter text” to be displayed as default. The index 0 enables us to replace
this default string with the string that we type since we are inserting in
the same position as the default string. Next, we create action function
for the pushbutton wherein we create a label widget. Unlike the previous
examples, here we use the get() function to get the text from the entry
widget rather than giving our own text. And finally, we use the button
function to initiate the action. The resulting window is shown in Figure 6-4.
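A minimal sketch along these lines; the default prompt string and button text are placeholders:

from tkinter import *

root = Tk()
root.title("Entry Demo")

# single-line text field with a default prompt string
myentry = Entry(root, width=30, borderwidth=3)
myentry.insert(0, "Enter text")
myentry.pack()

# action for the button: read the typed text and push it into a label
def show_text():
    mylabel = Label(root, text=myentry.get())
    mylabel.pack()

mybutton = Button(root, text="Display text", command=show_text)
mybutton.pack()

root.mainloop()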
Radiobutton
In this section, we will use radio buttons to push integer values into the
label widget. As illustrated in the following code, we first define a variable
r using the IntVar() function to hold integer data which is then retrieved
using the get() function while defining the action for the radiobutton
widgets. Next, we define the action for radiobutton widgets with a user-
defined function where we use the config() function to update the label
widget. Following this, we create the radio buttons using the Radiobutton()
function. This function takes in the following inputs: the text to be
displayed for the radio button, the control variable to keep track of the
user’s choices, the value to be assigned to the control variable, and the
action to be initiated upon clicking the radio button. Note that we use a
lambda function that gets the value assigned to the control variable
r and passes it to the radio button action function we defined earlier.
Finally, we create a label widget with empty text and push it to the window.
When we click a radio button, the value assigned to the control variable
corresponding to that button is displayed in the label widget as illustrated
in Figure 6-5.
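A minimal sketch of this arrangement; the option names and values are illustration placeholders:

from tkinter import *

root = Tk()
root.title("Radiobutton Demo")

# control variable that holds the value of the selected radio button
r = IntVar()

# action: update the label with the value assigned to the clicked button
def clicked(value):
    mylabel.config(text=value)

Radiobutton(root, text="Option 1", variable=r, value=1,
            command=lambda: clicked(r.get())).pack()
Radiobutton(root, text="Option 2", variable=r, value=2,
            command=lambda: clicked(r.get())).pack()

# label that starts out empty and is updated by the radio buttons
mylabel = Label(root, text="")
mylabel.pack()

root.mainloop()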
Checkbutton
Unlike the radio buttons that allow us to choose one among multiple
options, checkbuttons allow us to select multiple options at once. In the
code illustrated below, we use BooleanVar() variables with the
checkbuttons, whose values are automatically updated to True when the
corresponding buttons are selected and to False when they are deselected.
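A minimal sketch along these lines, with the option names chosen purely for illustration:

from tkinter import *

root = Tk()
root.title("Checkbutton Demo")

# one Boolean control variable per check button
var1 = BooleanVar()
var2 = BooleanVar()

# action: report which options are currently selected
def show_selection():
    mylabel.config(text=f"Option 1: {var1.get()}, Option 2: {var2.get()}")

Checkbutton(root, text="Option 1", variable=var1,
            command=show_selection).pack()
Checkbutton(root, text="Option 2", variable=var2,
            command=show_selection).pack()

mylabel = Label(root, text="")
mylabel.pack()

root.mainloop()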
Messagebox
Oftentimes, we find the need to provide some warnings or convey some
information regarding the application to the user. Message boxes provide
a simple way of achieving this using a popup box that is triggered at the
behest of an action initiated by the user. In the following simple code, we
use a button action to trigger the messagebox widget. Figure 6-7 illustrates
the result of the action.
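A minimal sketch of a button-triggered message box; the title and message strings are placeholders:

from tkinter import *
from tkinter import messagebox

root = Tk()
root.title("Messagebox Demo")

# action: pop up an information box when the button is pressed
def warn():
    messagebox.showinfo("Information", "This is a sample message")

Button(root, text="Show message", command=warn).pack()

root.mainloop()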
Toplevel
The toplevel widget can be used in an application when there is a need
for an extra window to represent some extra information to the user
or to provide a separate interface for a subset of the application. These
additional windows can be directly managed by using their own variable
name and do not need to be associated with a parent window. The simple
code given here illustrates the creation of an additional window with its
own label. The resulting windows are shown in Figure 6-8.
from tkinter import *
# create the main window
root = Tk()
root.title('Main window')
# create additional window
top = Toplevel()
top.title('Sub window')
# create labels for both windows
mylabel1 = Label(root, text = 'This is the main window').pack()
mylabel2 = Label(top, text='This is the sub window').pack()
mainloop()
In the following code, we use a button widget to open the dialog widget
for selecting an image and a Canvas widget for drawing the image. We
will specify the width and height of the canvas according to the size of the
image to be displayed. In our case, we use the same size as the image, that
is, 512 x 512. Under the function defined for the button action, we use the
askopenfilename function of the filedialog module to browse for an image.
The askopenfilename function takes three parameters as input in our code:
the initial directory in which the dialog will open, a title to be displayed
on the dialog window, and the file types to be shown in it. Note that we have
provided two file types in the function: “*.png” indicating PNG images and
“*.*” indicating all other file types. These two types will therefore be
available in the file-type drop-
down menu in the dialog window as illustrated in Figure 6-9 (a). We can
select an image available in the initial directory itself or navigate to other
directories for selecting the image. The result of this askopenfilename
would be the filename of the image along with its path.
To enable the file to be displayed in the canvas, we need to use the
PhotoImage function in tkinter library to read the file. But the problem is
that this PhotoImage function supports only GIF and PGM/PPM formats,
whereas the commonly used image formats that we will be using are JPEG
and PNG. A way around this problem is to use the alternative PhotoImage
function provided by the PIL library in its ImageTk module. We initially read
the image using the imread() function of the OpenCV library. The reason
for this is that the image is read as a NumPy array when using OpenCV,
and this allows us to perform a number of numerical operations on the
image as we discussed in Chapter 5. We then use the fromarray() function
in the PIL library to convert the image into a PIL Image file format and
then use the PhotoImage function in the same library to convert the image
to image objects that could be displayed in the Canvas widget.
Next, we can use the create_image function to draw the image on
the canvas. The first two numbers inside the function indicate the (x, y)
coordinates used to position the image. We can vary the coordinates to
shift the position of the image either horizontally or vertically. The anchor
value determines the position within the image that will be aligned with
the coordinates. For instance, we have specified “nw” as the anchor which
implies that the northwest anchor point of the image (top leftmost corner)
will be placed at the coordinates (0, 0) on the canvas. After configuring the
canvas, the image opened using the PhotoImage function is assigned to the
image attribute of the Canvas widget. The resulting image displayed on the
canvas widget is shown in Figure 6-9 (b). For all the GUI illustrations in this
chapter, from this point onward, we use the image of a tipped saw blade.
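Putting these pieces together, a minimal sketch of the browse-and-display window described above; the initial directory, dialog title, and button text are placeholders:

from tkinter import *
from tkinter import filedialog
from PIL import Image, ImageTk
import cv2

root = Tk()
root.title("Image Display")

# canvas sized to match the 512 x 512 image being displayed
canvas = Canvas(root, width=512, height=512)
canvas.pack()

# button action: browse for an image and draw it on the canvas
def click():
    filename = filedialog.askopenfilename(
        initialdir="C:/Users/user/Pictures",       # placeholder directory
        title="Select an image",
        filetypes=[("PNG files", "*.png"), ("All files", "*.*")])
    if filename:
        # read with OpenCV (NumPy array) and convert BGR to RGB for display
        img_cv = cv2.cvtColor(cv2.imread(filename), cv2.COLOR_BGR2RGB)
        photo = ImageTk.PhotoImage(Image.fromarray(img_cv))
        # anchor "nw" places the image's top-left corner at (0, 0)
        canvas.create_image(0, 0, image=photo, anchor="nw")
        canvas.image = photo   # keep a reference so it is not garbage collected

Button(root, text="Browse Image", command=click).pack()

root.mainloop()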
Scale
Sliders are often used in vision-based GUIs to make real-time adjustment
to certain parameters of an image. The scale widget allows us to create a
slider in our GUI that covers a scale of values. In the following code, we
use a button to browse and select an image and a slider to zoom in or
out of the image. Unlike our previous example, the canvas widget
here will have to be updated by both the button action and the slider
action. Therefore, we create separate functions for loading and displaying
the image, which were previously implemented in the same function, so
that the display function can be called for both the button action and the
slider action.
Let’s go over the functions one at a time. The load() function initiates
the action for the button widget, same as the click() function in the
previous case, where we browse and select an image using a dialog
window. The difference here is that once we get the filename from the
selected image and read the file using the imread() function, we call the
display() function to push the image to the canvas widget. We have made
a little tweak to this display() function as well. Since the display has to be
updated with the corresponding zoomed image every time the slider is
moved, we initially check if the canvas has an image displayed already and
ensure that the image is removed before updating the new image.
Before displaying the image, we call the update() function to apply
the zoom settings captured from the slider to the original image. In this
function, we first get the zoom scale from the scale widget using the get()
function and normalize the value. Next, we determine the new width and
height by multiplying this normalized value with the width and height
of the original image. Now that we have the new dimensions, we can use
the resize() function to scale the original image accordingly. We need to
provide the type of interpolation to be used for scaling the image, and in
this example, we have used the linear interpolation. Finally, we use the
from tkinter import *
from tkinter import filedialog
from PIL import Image, ImageTk
import cv2
# action for Button widget
def load():
    filename = filedialog.askopenfilename(
        filetypes=[("Image files", "*.png *.jpg *")])
    if filename:
        global displayed_img, img_cv, img
        img_cv = cv2.cvtColor(cv2.imread(filename), cv2.COLOR_BGR2RGB)
        display()
# function to update the canvas widget
def display():
    global displayed_img, img_cv, img
    if displayed_img:
        canvas.delete(displayed_img)
        displayed_img = None
img = ImageTk.PhotoImage(Image.fromarray(resized_img))
except AttributeError:
print("")
# action for Scale widget
def zoom_image(value):
    update()
    display()
# create the root window
root = Tk()
root.title("Image Zoom App")
# create a canvas to display the image
canvas = Canvas(root, width=512, height=512)
canvas.pack()
# create a button to load the image
load_button = Button(root, text="Load Image", command=load)
load_button.pack()
# create a scale widget to zoom the image
zoom_scale = Scale(root, from_=10, to=200, orient="horizontal",
label="Zoom", command=zoom_image)
zoom_scale.set(100) # Initial zoom level (100%)
zoom_scale.pack()
# initialize the global variables
img_cv = None
displayed_img = None
img = None
root.mainloop()
The first row consists of a label widget displaying the name of the
organization for which the GUI is developed. Here, a bogus corporation
name comprised of the initials of the authors is used for the purpose of
illustration. This widget is extended to cover both columns by specifying
the value for columnspan and sticky option in the grid() function. The
sticky option “ew” indicates that the label must stretch from the left (west)
to the right (east) side of its cavity defined using the columnspan option.
Also the borderwidth and relief options are used in the Label() function to
create a visual boundary effect to the label widget.
In the second row, the version of the GUI is displayed toward the
end of the row in the second column. The positioning of the widget at
the rightmost corner of the second column is achieved by setting the
sticky option in the grid() function as “e”. We have simply provided a
bogus version number 1.0 for the purpose of illustration. The title of
the application is positioned at the center of the third row using the
columnspan and sticky option similar to the first label. Again, we have
simply provided the title “Vision System” for the purpose of illustration.
In the fourth row, we have used two frame widgets which we have
not discussed so far in this chapter. The frame widget is simply used to
group multiple widgets together. In the first frame, two canvas widgets are
grouped together. The first canvas is used to display live video feed from
the USB camera. We have already discussed how to capture the video
and read the frames in Chapter 2. The only thing we have done here is to
display the time stamp over the video using the datetime.now() from the
datetime library to get the current time and the putText() function in the
OpenCV library to draw the time onto the frames of the video. Timestamps
can be handy for managers on an assembly line, as saving the timestamp of
detection for each product allows them to perform some analysis later.
The second canvas is used to perform edge detection over the frames of
the video and display the edge detected video. The Canny edge detector
discussed in Chapter 5 is used here.
The second frame consists of a button widget and a set of labels and
entry widgets grouped together. The button widget is used to browse
and select a reference image which can then be used to compare with
the product image captured in the assembly line with the help of image
processing algorithms to identify defective products. The first and third
entry widgets are used to display the number of accepted and rejected
pieces in the assembly line. These widgets get updated each time a
product is detected by our vision system. The second entry widget is used
to provide a delay time which is set according to the time delay between
the arrivals of two successive products in the assembly line. Finally, an exit
button is provided in the fifth row which is used to close the root window.
The action for the button is defined using the close_window() function
as illustrated in the code. Ample comments are provided to explain each
and every part of the code illustrated as follows. As this chapter is mainly
focused on GUI design, the illustration is focused mainly on building the
GUI and does not include the background process involved in identifying
the defective product by comparing the captured product image with the
reference image. This will be discussed in the subsequent chapters.
# Create labels
company_label = Label(root, text="KMAKGA Corporation Ltd",
borderwidth=5, relief="ridge",
font=("Arial Black", 24),fg="red",
bg="dark blue")
company_label.grid(row=0,column=0,columnspan=4,padx=30,pady=0,
sticky="ew")
version_label = Label(root, text="Version 1.0",bg="blue")
version_label.grid(row=1,column=2,padx=0,pady=0,sticky="e")
vision_label = Label(root, text="Vision System",
font=("Helvetica", 16,"bold"), fg="red", bg="blue")
vision_label.grid(row=2,column=0,columnspan=3,padx=0,pady=10,
sticky="ew")
frame.grid(row=3,column=0,padx=10,pady=0)
# Create canvases
video_canvas1 = Canvas(frame, borderwidth=2, width=400,
height=400, bg="blue")
video_canvas1.grid(row=3,column=0,padx=5,pady=5)
video_canvas2 = Canvas(frame, borderwidth=2, width=400,
height=400, bg="blue")
video_canvas2.grid(row=3,column=1,padx=5,pady=5)
video_canvas1.create_text(80, 10, text="Video Feed",
font=("Helvetica", 10), anchor="ne")
video_canvas2.create_text(110, 10, text="Edge detection",
font=("Helvetica", 10), anchor="ne")
dt = str(datetime.datetime.now())
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
frame_rgb = cv2.putText(frame_rgb, dt,
(10,50),font,1,(255,0,0))
# Display original video in the first canvas
orig_photo = ImageTk.PhotoImage(image=Image.
fromarray(frame_rgb))
video_canvas1.create_image(0, 30, image=orig_photo,
anchor=NW)
video_canvas1.photo = orig_photo
video_canvas1.after(10, update_video_canvas)
# Release the webcam and close OpenCV when the window is closed
cap.release()
cv2.destroyAllWindows()
Summary
We have now laid a solid foundation for the design of GUIs for vision
systems by discussing the following topics in this chapter, along with a
sample demonstration for each one:
• GUIs at a glance and the tkinter library
• Basic widgets: Label, Button, Entry, Radiobutton, and Checkbutton
• The messagebox, Toplevel, filedialog, Canvas, and Scale widgets
• Displaying and zooming images on a canvas with OpenCV and PIL
• Building a complete GUI for a simple industrial vision system
CHAPTER 7
Feature Detection
and Matching
We explored how to combine OpenCV and Tkinter to make graphical user
interfaces (GUIs) for image processing applications in the last chapter.
Let us now explore how we can extract useful features from a given image.
Feature detection allows us to identify distinguishing points or patterns in
an image that serve as the basis of applications like object recognition,
image segmentation, pattern matching, etc. In this chapter, we will start
with a basic understanding of what image features are. Then we move on
to discuss different methods for detecting what we call keypoints in an
image. We will then discuss a special algorithm that will allow us to detect
corner points in an image. Finally, we will see how to detect the shapes of
objects in an image.
Image Features
Image features are the unique components or patterns in an image that
provide information about the visual characteristics of the image. These
features help to break down the visual information of an image and are a
crucial aspect of computer vision systems as the features learned from the
image help to perform tasks like object recognition, image segmentation,
and classification. The features can be categorized into two types: global
and local features. The global features represent the overall characteristics
of the whole image, whereas the local features represent information about
specific regions or objects in the image.
For instance, the color histogram of a digital image gives a
representation of the color distribution in the image, and global statistics
such as skewness and kurtosis help to describe the pixel intensity
distribution of the image. These features are represented by a single vector
which provides a holistic understanding of the given image. On the other
hand, features like points, edges, corners, etc., are unique to particular
patches in the image and are distinct from their immediate neighborhood.
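As a quick illustration of a global feature, a color histogram can be computed with OpenCV as sketched below; the file name is an assumption and the number of bins is arbitrary.

import cv2

# Compute a global color histogram for an image (file name is illustrative)
img = cv2.imread('sample.jpg')
hist = []
for channel in range(3):  # B, G, R planes
    h = cv2.calcHist([img], [channel], None, [32], [0, 256])
    hist.extend(h.flatten())
# 'hist' is a single 96-element vector describing the whole image
print(len(hist))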
The extraction of these local features requires two steps: detecting
points or regions of interest that are invariant to scale, illumination
and rotation, and converting the detected features into a numerical
representation. These points of interest are also called keypoints, whereas
the numerical features that provide a description of these keypoints are
called descriptors.
In this chapter, we will be discussing some of the common techniques
that are used widely to extract these features from the images.
SIFT Features
In computer vision, scale-invariant feature transform (SIFT) is an
algorithm for keypoint detection and description that can be employed to
extract unique local features from an image. As the name of the algorithm
implies, the extracted features are robust to the changes in the image
caused by the changes in its scale or illumination levels as well as rotation
of the image. Numerous applications such as object detection, image
matching, and image stitching make extensive use of the SIFT features. For
instance, in an object detection task, the keypoints of an object in an image
can be extracted to provide a description of the object which can then be
used to detect that object in other images.
The detectAndCompute() method in the SIFT object can be used to detect
the keypoints in the image and their descriptors. Finally, the drawKeypoints()
function in the OpenCV library can be used to mark the keypoints over
the original image. Figure 7-1 shows the keypoints marked in red over the
gear image, and it can be observed that they are mostly distributed over
the edges of the gear teeth that form the important features of the gear. In
the production line, the alignment of the gear teeth needs to be verified
before the gears are integrated into the actual products, and these
descriptors could serve the purpose as they are invariant to scale and rotation.
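A minimal sketch of this workflow is given below. The file name is an assumption, and cv2.SIFT_create() is available in recent OpenCV releases (older builds exposed it as cv2.xfeatures2d.SIFT_create()).

import cv2

# Read the gear image in grayscale (file name is an assumption)
img = cv2.imread('Gear.jpg', cv2.IMREAD_GRAYSCALE)
# Create a SIFT object and detect keypoints and descriptors
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Mark the keypoints over the original image and save the result
img_kp = cv2.drawKeypoints(img, keypoints, None, color=(255, 0, 0))
cv2.imwrite('Gear_sift.jpg', img_kp)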
SURF Features
SURF, which stands for Speeded-Up Robust Features, is essentially an
accelerated version of SIFT and can detect keypoints that are invariant
to scale, illumination, and rotation. Keypoint detection and description
using SURF comprises four steps:
1. Scale-Space Extrema Detection: In order to identify
features at various sizes, SURF operates on an image
at many scales. It locates local extrema in the scale
space using a method similar to the difference of
Gaussians (DoG) used in SIFT. To find areas of
interest, the image is convolved using Gaussian
filters at various scales, followed by the subtraction
of successive blurred images.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the gear image
img = cv2.imread('Gear.jpg', cv2.IMREAD_GRAYSCALE)
# Create a SURF object
surf = cv2.xfeatures2d.SURF_create(10000)
# Detect key points and compute descriptors
keypoints, descriptors = surf.detectAndCompute(img, None)
# Convert grayscale image to BGR for colored keypoints
img1_color = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
# Draw keypoints with increased thickness
for kp in keypoints:
    x, y = int(kp.pt[0]), int(kp.pt[1])
    radius = int(kp.size / 4)
    # Use thickness of 2 for the circle
    img1 = cv2.circle(img1_color, (x, y), radius, (255, 0, 0),
                      thickness=2)
# Display the image
plt.subplot(121)
plt.imshow(img, cmap='gray')
plt.title('Original Image')
plt.subplot(122)
plt.imshow(img1)
plt.title('Image with keypoints')
plt.show()
FAST Features
FAST stands for Features from Accelerated Segment Test. This corner
detector algorithm, published in 2006, is computationally efficient and
hence faster than many other detectors, thereby justifying its acronym.
Therefore, this detector is well suited for real-time video processing
applications that require high-speed computations with limited resources.
The steps involved in determining the interest points using FAST
features are as follows:
The following code illustrates the extraction of keypoints from the gear
image using FAST features. The FastFeatureDetector_create() function
in OpenCV library can be used to create a FAST object, and the detect()
method can be used to get the keypoints. Here, we use the circle() function
to draw the keypoints as well as the 16 pixels on the circle surrounding the
keypoint. As all these little circles are closely packed together, they appear
like circular patches on the image, and we can see from Figure 7-3 that
most of these patches are distributed across the edges of the gear.
import cv2
import matplotlib.pyplot as plt
# Read the image
image = cv2.imread('C:/Users/user/Pictures/gear.jpg', 0)
# display the original image
plt.subplot(121)
plt.imshow(image,cmap='gray')
plt.title('Original image')
# Define the FAST detector
fast = cv2.FastFeatureDetector_create()
# Find the interest points
keypoints = fast.detect(image, None)
# Draw the interest points and the 16 pixels on the circle
for keypoint in keypoints:
    circle = cv2.circle(image, (int(keypoint.pt[0]),
                                int(keypoint.pt[1])), 3, (255, 0, 0), 2)
    for i in range(-7, 8):
        for j in range(-7, 8):
            if (i**2 + j**2) <= 49:
                cv2.circle(image, (int(keypoint.pt[0]) + i,
                                   int(keypoint.pt[1]) + j), 1,
                           (255, 0, 0), 1)
# Display the image with keypoints
plt.subplot(122)
plt.imshow(image,cmap='gray')
plt.title('Image with keypoints')
plt.show()
BRIEF Features
While using feature descriptors for applications like object recognition,
there is a need for faster and memory-efficient matching which in turn
would require short descriptors. One way to achieve this is to reduce the
descriptors by applying a dimensionality reduction algorithm (like LDA or
PCA) to the original descriptors and then converting the descriptor vector
into binary strings with fewer bits. But this would still require us to first
compute the full descriptors before applying dimensionality reduction.
The BRIEF descriptors overcome this by computing binary strings directly
from image patches surrounding the keypoints instead of computing the
descriptors.
BRIEF, which stands for Binary Robust Independent Elementary
Features, is a feature descriptor that works in tandem with keypoint
detection algorithms like SURF, FAST, or Harris. As mentioned above,
BRIEF uses binary strings as efficient keypoint descriptors, making it
ideal for real-time applications. The following are the steps involved in
determining the BRIEF features.
import cv2
import numpy as np
# Load the input gear image
img = cv2.imread('C:/Users/user/Pictures/gear.jpg', 0)
# Initialize a FAST detector and detect keypoints
f = cv2.FastFeatureDetector.create()
kp= f.detect(img)
# Initialize an extractor and extract BRIEF descriptors for the keypoints
b = cv2.xfeatures2d.BriefDescriptorExtractor.create()
keypoints, descriptors = b.compute(img, kp)
# Print the characteristics of the descriptors
print("The data type of the descriptors variable is:", type(descriptors))
print("\nDescriptor size:", len(descriptors))
# Print the first descriptor
print("\nThe first descriptor is:")
print(descriptors[0])
# Print the descriptor as binary
print("\nThe first descriptor in binary:")
print(' '.join([np.binary_repr(num, 8) for num in descriptors[0]]))
Output:
The data type of the descriptors variable is: <class 'numpy.ndarray'>
ORB Features
ORB was developed as an efficient alternative to SIFT and SURF features
in terms of computation as well as matching performance. Another key
aspect is the fact that both SIFT and SURF features were patented at the
time of development of ORB, thereby making ORB a free alternative to
them. Currently, the patent on SIFT has expired, making it freely
available, whereas SURF is still patented, requiring the purchase of a
license if it is to be used for commercial purposes.
ORB stands for Oriented FAST and Rotated BRIEF. As the name
implies, it is a combination of the FAST keypoints and BRIEF descriptors
with certain modifications to overcome their shortcomings. The following
are the steps involved in computing ORB features:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Read the input image
img = cv2.imread('C:/Users/user/Pictures/gear.jpg',0)
# Display the original image
plt.subplot(121)
plt.imshow(img,cmap='gray')
plt.title('Original Image')
# Detect and compute the keypoints and descriptors
orb = cv2.ORB_create() # Create an ORB object
kp, des = orb.detectAndCompute(img, None)
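The detected ORB keypoints could then be drawn and displayed along the lines of the earlier listings; the sketch below is an assumption of that remaining step rather than the exact original continuation.

# Draw the ORB keypoints over the image and display the result
img_kp = cv2.drawKeypoints(img, kp, None, color=(255, 0, 0), flags=0)
plt.subplot(122)
plt.imshow(img_kp, cmap='gray')
plt.title('Image with ORB keypoints')
plt.show()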
Corner Detection
The terms keypoints, corners, and features are used interchangeably in
literature, but there is a subtle distinction between a corner and a keypoint.
A corner can be considered as the intersection of two edges or boundaries
thereby making it an important feature for vision tasks such as object
recognition and tracking. In the context of image processing, corners can
be considered as those points in an image where the intensity changes
significantly in more than one direction.
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
# Read the grid image
img = cv.imread('C:/Users/user/Pictures/grid.jpg')
# Display the original image
plt.subplot(121)
plt.imshow(img)
plt.title('Grid image')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)  # Convert the image to grayscale
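The grayscale image can then be passed to the Harris corner detector. The following is a minimal sketch of that step; the block size, aperture size, Harris parameter k, and the 1% response threshold are illustrative assumptions.

# Apply the Harris corner detector (parameter values are assumptions)
gray = np.float32(gray)
dst = cv.cornerHarris(gray, 2, 3, 0.04)
# Mark pixels whose corner response exceeds 1% of the maximum response
img[dst > 0.01 * dst.max()] = [255, 0, 0]
plt.subplot(122)
plt.imshow(img)
plt.title('Detected corner points')
plt.show()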
The Hough transform works by mapping points from the image space into
a different parameter space. First let us consider the case of detecting
lines in images, where the parameter space is composed of two parameters:
the slope of the line and its y-intercept. The following are the
steps involved in detecting lines using the Hough transform:
1. Detect Edges: The first step normally involves
detecting edges in the image using standard
techniques like the Canny edge detector.
2. Accumulator Array: For each edge point, multiple
lines may pass through it leading to multiple slopes
(m) and intercept (b). The accumulator array is
a collection of these parameters. In other words,
each cell in the accumulator array might represent
a particular m and b. Since vertical lines will have
infinite slope, these parameters will be converted
to polar form using the expression r = xcosθ + ysinθ
where (x,y) denotes the point for which we are
computing the parameters, r is the perpendicular
distance from the origin to the line, and θ is the
angle between the x-axis and the line.
3. Voting: With respect to each edge point in the image,
the corresponding cells in the accumulator array are
incremented, the process being termed as voting.
The idea behind this procedure is that if multiple
edge points lie on the same line, then their votes
will accumulate at the corresponding entries in the
accumulator array.
4. Thresholding: The peaks in the accumulator array
after the voting process represent potential lines
in the image. An additional step of thresholding
is applied to select significant peaks thereby
avoiding noise.
The following code illustrates the detection of lines from the same
grid image that we used in our last section with the help of Hough
transform. It can be seen that we are using a Canny edge detector with
two thresholds to detect the edge points in the image. Then, we pass these
points to the HoughLinesP() function (the probabilistic Hough transform)
in the OpenCV library to detect the lines with which some of the edge
points are associated.
The number 1 given to the function is the resolution of the parameter r
in pixels which implies that we are using a resolution of 1 pixel. Similarly,
the value π/180 given to the function is the resolution of parameter θ in
radians which implies that we are using a resolution of 1 degree as it is
equivalent to π/180 in radians. Then, we are providing a threshold of 100
which is the minimum number of votes required to be considered as a line.
The parameter minLineLength of 100 implies that line segments shorter
than that are rejected. Similarly, the parameter maxLineGap of 10 is the
maximum allowed gap between points on the same line to link them. The
resulting image is shown in Figure 7-6 which shows that every line in the
image is detected accurately using the Hough transform.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the grid image
image = cv2.imread('C:/Users/user/Pictures/grid.jpg')
# Display the original image
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.subplot(121)
plt.imshow(image)
plt.title('Original image')
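The edge detection and line detection steps described above might look like the following sketch; the Canny threshold values are assumptions, while the Hough parameters follow the values discussed in the text.

# Detect edges with the Canny detector (threshold values are assumptions)
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, 50, 150)
# Detect line segments with the probabilistic Hough transform
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=100, maxLineGap=10)
# Draw the detected lines over the image
if lines is not None:
    for line in lines:
        x1, y1, x2, y2 = line[0]
        cv2.line(image, (x1, y1), (x2, y2), (255, 0, 0), 3)
plt.subplot(122)
plt.imshow(image)
plt.title('Detected lines')
plt.show()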
The following code illustrates the detection of circles in a gear image
using the Hough circle transform. The parameters provided as input to the
HoughCircles() function in the OpenCV library are the grayscale image,
the detection method (HOUGH_GRADIENT), dp (the inverse ratio of the
accumulator resolution to the image resolution), minDist (the minimum
distance between the centers of detected circles), param1 (the higher
threshold passed to the Canny edge detector), param2 (the accumulator
threshold for the circle centers), and minRadius and maxRadius (the limits
on the radius of the circles to be detected).
Finally, the circles are drawn over the image with the help of the circle()
function in the OpenCV library, where we provide the original image, the
coordinates (x, y, r) of each circle, the desired color specified as (R, G, B)
values, and the desired thickness. The HoughCircles() function returns an
array containing the center coordinates and radius (x, y, r) of each
detected circle.
import cv2
import matplotlib.pyplot as plt
# Read the gear image
image = cv2.imread('C:/Users/user/Pictures/Gear.jpg')
# plot the original image
plt.subplot(121)
plt.imshow(image)
plt.title('Gear image')
# Convert the image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect circles
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1,
minDist=20, param1=120, param2=50, minRadius=5, maxRadius=100)
# Draw the detected circles
for (x, y, r) in circles[0]:
    cv2.circle(image, (int(x), int(y)), int(r), (255, 0, 0), 10)
The contours are drawn using the drawContours() function, where we
provide the image, the contour, the contour index, the desired (R, G, B)
color, and the thickness of the boundary. From Figure 7-8, we
can see that three levels of contours are detected: the overall boundary of
the frame, the boundary of the outer gear teeth, and the boundary of the
inner circle in the gear.
import cv2
import matplotlib.pyplot as plt
# Load the gear image
img = cv2.imread('C:/Users/user/Pictures/gear.jpg')
# Display the original image
plt.subplot(121)
plt.imshow(img)
plt.title('Original gear image')
# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply thresholding to binarize the image
thresh = cv2.threshold(gray, 0, 255,
                       cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Find contours in the image
contours = cv2.findContours(thresh, cv2.RETR_TREE,
                            cv2.CHAIN_APPROX_SIMPLE)[0]
# Draw contours on the image
for contour in contours:
    cv2.drawContours(img, [contour], -1, (255, 0, 0), 10)
# Show the image with contours
plt.subplot(122)
plt.imshow(img)
plt.title('Gear image with contours')
plt.show()
Summary
In this chapter, we have navigated through a diverse array of techniques
used for deriving features of interest from a given image which is essential
for interpreting images in computer vision. Following are the key concepts
that we discussed in this chapter:
• Keypoint detectors and descriptors such as SIFT, SURF, FAST,
BRIEF, and ORB for identifying and describing distinctive points in an image
• Harris detector for effectively detecting corner points
in an image, which are usually found at the intersection of
two edges or boundaries
• Hough transform for detecting lines and circles in a
given image precisely
• Contours to detect other random shapes or boundaries
of objects in an image
In the next chapter, we will discuss another set of crucial techniques
for detecting and extracting objects in an image called the segmentation
techniques.
CHAPTER 8
Image Segmentation
An important aspect of vision systems is to differentiate between objects in
a scene. The first step toward this goal is to segregate the various objects in a
given image. This is accomplished by a process called segmentation wherein
all pixels corresponding to various regions (objects) are grouped together
thereby segregating the given image into its constituent objects. In the field
of computer vision, image segmentation is an essential and fundamental
process that forms the basis for many applications, including autonomous
vehicles, medical imaging, object recognition, and more. Building on the
knowledge gained in earlier chapters, this chapter explores the complex field
of segmentation with a focus on advanced approaches and procedures.
We looked at thresholding methods as a foundation for image
segmentation in Chapter 5 that utilized intensity thresholds to help discern
between various regions in a given image. Although thresholding works
well in some situations, it might not be enough for complicated images
with a lot of noise, varying illumination, or complex structures. This
chapter will broaden our comprehension of segmentation techniques and
explore more sophisticated approaches in addition to thresholding.
Pixels with values above or below a threshold are divided into two groups
by the threshold value; these groups often represent distinct objects or the
background. Thresholding techniques can be broadly classified into global
and local methods. We have already discussed some of these techniques in
Chapter 5. In this section, we will recall how to segment a given image
using a global thresholding technique as well as a local thresholding technique.
For the global thresholding technique, we will take the Otsu
thresholding technique that we discussed earlier in Chapter 5. The
following code illustrates the segmentation of a given image using Otsu’s
thresholding. Let us take the image of a U-bolt for this demonstration.
The first step is to read the image and convert it into a grayscale image.
Then we will apply a Gaussian filter using the GaussianBlur() function
with a filter size of 7 x 7 which will help to get rid of any high-frequency
noises present in the image. The threshold() function is then used to
apply thresholding to the filtered image. The attribute THRESH_OTSU
is used to determine the optimal threshold for the image starting with
the initial value of 0 provided in the function, and the attribute
THRESH_BINARY is then used to apply binary thresholding on the image
using the determined threshold. It can be seen that the U-bolt object is
clearly segmented from the background as shown in Figure 8-1. It can
also be observed that the shadow caused by the lighting is also segmented
as part of the object which reiterates the importance of uniform lighting
requirements. The determined optimal threshold value is displayed in the
title of the segmented image in the figure and is equal to 146.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the image in grayscale
image = cv2.imread('C:/Users/User/Pictures/New folder/U-Bolt.jpg',
                   cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(image, (7, 7), 0)
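The remaining steps described above (applying Otsu's threshold and displaying the original and segmented images) might look like the following sketch; the exact plot titles are assumptions.

# Apply Otsu's thresholding to the blurred image
ret, thresh = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Display the original and segmented images
plt.subplot(121)
plt.imshow(image, cmap='gray')
plt.title('Original Image')
plt.subplot(122)
plt.imshow(thresh, cmap='gray')
plt.title('Segmented Image (Otsu threshold = {})'.format(int(ret)))
plt.show()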
Having segmented the image with a global threshold, let us now apply
adaptive thresholding to the same image. This method
considers multiple small regions in the given image and computes a
threshold using the set of pixels in those regions. The following code
illustrates the process of applying adaptive thresholding to the same U-bolt
image. The adaptiveThreshold() function in the OpenCV library is used
to accomplish this task. The ADAPTIVE_THRESH_GAUSSIAN_C
attribute indicates that the threshold value will be a Gaussian-weighted
sum of the neighborhood values in the selected region minus a constant
“c”, the value of which is specified as 3 in the function. The number
17 in the function indicates that the threshold is computed over 17 x 17
neighborhoods in the given image. The segmented image illustrated in
Figure 8-2 shows that the segmentation attained is much better than with
the Otsu thresholding method, which was greatly affected by the shadows
in the image.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the image in grayscale
image = cv2.imread('C:/Users/User/Pictures/New folder/U-Bolt.jpg',
                   cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(image, (7, 7), 0)
# Apply adaptive thresholding
thresh = cv2.adaptiveThreshold(blurred, 255,
                               cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 17, 3)
# Display the original and binary images
plt.subplot(121)
plt.imshow(image, cmap='gray')
plt.title('Original Image')
plt.subplot(122)
plt.imshow(thresh, cmap='gray')
plt.title('Segmented Image(Adaptive Thresholding)')
plt.show()
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the image
image = cv2.imread('C:/Users/User/Pictures/New folder/20230917_113209.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (3, 3), 0)
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the image
img = cv2.imread('C:/Users/User/Pictures/New folder/20230917_113209.jpg')
imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply thresholding to create a binary image
There are many more technical details associated with the K-means
algorithm which are beyond the scope of this book. So we will move
forward to discuss how this algorithm can help us in segmenting the
objects in an image. In our context of image segmentation, the data
points correspond to the pixels in an image, and the K-means algorithm
is used to group similar pixels together. The following code illustrates
the segmentation of an image using K-means clustering.
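The listing below picks up at the clustering step. The preceding setup (loading the image and reshaping its pixels into a list of data points) might look like the following sketch; the file name and the value of k are assumptions.

import cv2
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Load the image and convert it to RGB (file name is an assumption)
img = cv2.imread('sample.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Reshape the image into a list of pixels of shape (rows x cols, 3)
pix = img.reshape(-1, 3)
# Number of clusters (an assumption for illustration)
k = 3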
kmeans = KMeans(n_clusters=k)
kmeans.fit(pix)
# Get the labels and centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Assign each pixel to its corresponding cluster centroid
seg_image = centroids[labels].reshape(img.shape)
# Display the original and segmented image
plt.subplot(121)
plt.imshow(img)
plt.title('Original Image')
plt.subplot(122)
plt.imshow(seg_image.astype(np.uint8))
plt.title('Image segmented using Kmeans clustering')
plt.show()
Summary
Segmenting objects from an image could help us further in applications
like object detection and tracking, medical imaging, face recognition, etc.
In this chapter, we have explored different techniques to segment an object
in a given image as listed here:
CHAPTER 9
Optical Character Recognition
Optical character recognition, shortly called OCR, is a technology used
to detect text characters from scanned text documents or digital images
captured by a camera and convert them into editable data. To put it in
different terms, the OCR technology extracts machine-encoded text from
the text characters recognized within scanned documents or images. OCR
technology began to emerge in the early twentieth century. The primary
purpose of early OCR systems was to identify printed text in documents
that were typewritten or typeset. Later on, techniques like pattern
matching and template matching were employed to create commercial
character recognition systems. With the rise in popularity of machine
learning methods as well as the development of advanced computing
capabilities, these techniques gained prominence in implementing
highly accurate OCR systems. Deep learning methods have improved
OCR performance tremendously in recent years, making it possible to
recognize intricate handwritten text.
OCR has been widely adopted in multiple vision system applications
like verifying information printed on packaging labels, reading and
logging number plates of vehicles, automatic inventory management
using serial numbers printed on products, extracting data from invoices,
bills, or inspection reports for further analysis, etc. By incorporating
OCR techniques for these operations, accuracy can be greatly improved as
the possibility of human error is removed, and productivity increases as
the time spent on these tasks is minimized. As with every technology,
there are some limitations with
OCR systems as well. For instance, the quality of the OCR output is highly
dependent on the quality of the input images. Therefore, poor-quality
inputs can degrade the performance of the system. Also, OCR systems are
still facing a lot of challenges with respect to handwritten text recognition
owing to the variety of handwriting styles.
Despite the challenges, OCR has become an integral part of industrial
automation across the world, especially in handling of diverse document
formats. In this chapter, we will start with the fundamentals of OCR
application exploring the various stages of the OCR pipeline. Then we
will explore some of the leading OCR libraries and APIs that are readily
available to be used for real-time applications.
Image Preprocessing
This stage involves preparing the input image for character recognition
by improving its quality and transforming it to a form suitable for further
processing. The key techniques involved in this stage are discussed as
follows.
Noise Reduction:
As discussed in Chapter 5, images captured in real time are often
susceptible to noise, which reduces the quality of the image and makes it
difficult to process further. Therefore, this noise needs to be smoothed out
first, using suitable filters like Gaussian blurring or median filtering, to
enhance the image quality. Recall from our earlier
discussions that the suitable kernel size has to be selected while applying
these filters to get better results.
i. Contrast Enhancement
A lot of photos, industrial images, and scanned
documents have low contrast, which makes
the text and background appear washed out or
blended together. Because of this, OCR algorithms
have trouble telling them apart, which might
result in mistakes. We can simplify the process
of distinguishing individual characters from the
background by increasing the contrast. By making
the segmentation process simpler, each character
may be isolated by the OCR software to ensure
accurate recognition. Popular techniques like
contrast stretching, histogram equalization, adaptive
histogram equalization, etc., can be used to enhance
the contrast of the image. We can guarantee that
our OCR system receives the cleanest and most
readable input by using the appropriate contrast
enhancement technique.
ii. Binarization
As we are interested only in recognizing the
characters present in the given image, the color
information becomes irrelevant, and hence, it
would be better to convert the image into black and
white. This would serve two purposes: reducing the
computational complexity as the 3-plane RGB image
is converted to a single plane binary image with just
two values (0 for black and 1 for white) and clearly
distinguishing the text from the background. Recall
from our earlier discussions that binarizing a color
image will involve two steps: convert the image into a
grayscale image, and then apply thresholding (global
or local) to convert it into a black and white image.
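A minimal sketch of these two binarization steps is shown below; the file names are assumptions, and Otsu's method is used here as the global thresholding option.

import cv2

# Load the document image (file name is an assumption)
img = cv2.imread('document.jpg')
# Step 1: convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Step 2: global (Otsu) thresholding to obtain a black and white image
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite('document_bw.jpg', bw)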
Character Segmentation
Once the image is prepared for further processing, the next stage in the
pipeline is to segment the individual characters in the image. We will
discuss some of the prominent techniques for character segmentation in
this section.
Feature Extraction
Once we have segmented the characters, the next step would be to
represent the characters in a suitable way for recognition algorithms. We
will discuss some of the prominent feature extraction techniques here:
i. Statistical Features
Character Recognition
The characters in a given image can be recognized and extracted with the
help of the features derived from the image. One of the earlier techniques
commonly used in traditional OCR systems was template matching.
Some traditional machine learning techniques were also commonly used
for recognizing characters. We will discuss both these techniques briefly to
understand the underlying principle.
i. Template Matching
The k-nearest neighbors (kNN) classification algorithm relies on the idea
that data points that share
comparable features are probably members of the same class. This
approach is easy to use and understand, which makes it appropriate for a
variety of classification problems.
The following code illustrates the implementation of handwritten digit
recognition using kNN. Let’s discuss the code one step at a time:
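The listing below picks up at the creation of the classifier. The earlier steps (loading the MNIST digits, flattening the images, and scaling the pixel values) might look like the following sketch; the choice of scaler is an assumption.

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the MNIST digit images and labels
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Flatten each 28x28 image into a 784-element vector
x_train_flat = x_train.reshape(-1, 28 * 28)
x_test_flat = x_test.reshape(-1, 28 * 28)
# Scale the pixel values (scaler choice is an assumption)
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train_flat)
x_test_scaled = scaler.transform(x_test_flat)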
knn = KNeighborsClassifier(n_neighbors=5)
# Train the model and make predictions
knn.fit(x_train_scaled, y_train)
y_pred = knn.predict(x_test_scaled)
# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {}%".format(accuracy*100))
# Sample prediction
new_image = x_test_scaled[0].reshape(1, 28 * 28)
prediction = knn.predict(new_image)
plt.imshow(x_test[0].reshape(28,28),cmap='gray')
plt.title('Test Image')
plt.text(10, 3, 'Label: {}'.format(prediction[0]), fontsize=12,
bbox=dict(facecolor='white', alpha=0.9))
plt.show()
Figure 9-2. Test image with the label detected by the kNN model
Implementing OCR using the Google® Cloud Vision API requires two
separate procedures. The first is to configure the Vision API and create
an access key to utilize the API, and the second is to download the
Google® Cloud Vision library for Python to implement the text detection
algorithm. Let us start with the steps involved in enabling and configuring
the Vision API using the Google® Cloud Console.1
1 https://console.cloud.google.com/
Figure 9-9. Downloading the key required for availing Vision API services
We can now move on to the development of the code for text detection.
In this illustration, we will use the saw blade image shown in
Figure 9-10. Our goal is to detect the text printed on the label of the
saw blade.
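The listing below starts from reading the image bytes. The preceding setup (importing the client library, pointing it at the downloaded service-account key, and opening the image file) might look like the following sketch; the key file name and image path are assumptions.

import os
from google.cloud import vision

# Point the client library at the service-account key downloaded earlier
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'vision-api-key.json'
client = vision.ImageAnnotatorClient()

# Open the saw blade image in binary mode (path is an assumption)
f = open('saw_blade.jpg', 'rb')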
byteImage = f.read()
# create an image object make request to Vision API
image = vision.Image(content=byteImage)
response = client.text_detection(image=image)
# loop over the results
ocr = ''
for text in response.text_annotations[1::]:
    ocr += text.description + ' '
print("Detected Text: \n", ocr)
Detected Text:
110mm 4³1 , 3 " 20 to 16 mm N.MAX 14000RPM POWFRIEE PPT FOWER
HOOLD POWERTEX 30T POWER TOOLS QUALITY BLIM GUT TDKWA10430
Tungsten Carbide Tipped Saw Blade SKU NO.182100 Unplug saw
before mounting blade . Do not operate without saw blade querd
. Always wear eye protection.Pailure to head all warnings could
resultin serious bodily injury .
Before that, users must create an Azure subscription and then assign a
Resource group to that subscription. Since these processes are outside the
scope of this book, we will begin with the creation of the Computer Vision
API as illustrated in the following steps.
2 https://portal.azure.com/
Figure 9-15. Keys and Endpoint for the Computer Vision service
The following code illustrates the process of extracting the text from
the saw blade image using computer vision API. As always, we start by
importing all the necessary libraries for implementing the OCR system. We
then get the endpoint URL and the API key that we saved to the notepad
earlier, and then create a client using these credentials. Once the client is
set up, we go to read the input image data from a file-like object in memory
using the function io.BytesIO. The read_in_stream() method of the Azure
Computer Vision client object takes the image data as a binary stream,
performs an asynchronous analysis operation, and then returns a response
object which contains the location of the operation in the Azure service.
The Operation-Location parameter (a URL) is then extracted from the
response header and can be used to track the status of the operation and
retrieve its results. Then, we use the split() function to extract the operation
ID from the Operation-Location URL string.
The get_read_result() method of the Azure Computer Vision client is
used to retrieve the results of the asynchronous image analysis. The result.
status provides us the status of the operation. Initially the status is in not
started or running mode, and we use this block of code in a while loop to
wait until the status turns to succeeded. The sleep() method in the time
library is used to pause the execution of the while loop for 10 seconds so
that the status is checked every 10 seconds. The code breaks out of the loop
once the status is changed to succeeded. Then we iterate over the analyze_
result.read_results property of the ComputerVisionResult object. Each
item in the read_results list consists of text content corresponding to the
lines of text in the image. Next, we iterate over the result.lines property to
read the text contents of each line in the image which are then appended
to the text variable which we created as an empty string before the for
loop. Once all the lines are extracted, the text variable can be printed;
the result is shown following the code. It can be seen that, similar to
the Google Cloud Vision output, there are a few mistakes in reading the text
owing to the poor quality of the image.
import io
import time
import requests
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
API_KEY = 'fea22b23dbf848b3b7562f583948c41a'
ENDPOINT = 'https://ocr-example.cognitiveservices.azure.com/'
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
# load the input image
filename = "C:/Users/Lenovo/Downloads/20240202_082924.png"
with io.open(filename, "rb") as f:
    image = io.BytesIO(f.read())
# perform analysis operation and get the operation id
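# The remaining steps described above (submitting the image, polling until
# the analysis succeeds, and collecting the recognized lines) might look
# like the following sketch. It follows the standard Azure Read API flow
# and is not necessarily the exact original listing.
response = client.read_in_stream(image, raw=True)
operation_location = response.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]
# Poll until the asynchronous analysis has finished
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ['notStarted', 'running']:
        break
    time.sleep(10)
# Collect the recognized text line by line
text = ''
if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            text += line.text + ' '
print("Text description:")
print(text)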
Text description:
POWERTEE PPT SLIM GUT FOWER FOOLS POWERTEX 30T QUALITY POWER
TOOLS 110mm 4318' 3 " 20 to 16 mm N.MAX 14000RPM TOKWA10430
Tungsten Carbide Tipped Saw Blade SKU NO.182100 Unplug saw
before mounting blade. Do not operate without saw blade querd.
Always wear eye protection. Pallure to head all warnings could
result In serious bodily Injury.
EasyOCR
EasyOCR is a user-friendly and versatile optical character recognition
(OCR) package for the Python programming language. The library offers
pre-trained text detection and identification models with support for
several languages, and an emphasis on speed and efficiency in word
recognition from images. EasyOCR consists of three primary parts: feature
extraction, sequence labeling, and decoding. Feature extraction uses deep
learning models like ResNet and VGG to extract meaningful features from
the input image. The next phase, sequence labeling, makes use of Long
Short-Term Memory (LSTM) networks to interpret the sequential context
of the retrieved features. Lastly, the Connectionist Temporal Classification
(CTC) method is used in the decoding phase to decode and transcribe
the labeled sequences into the actual recognized text. As a detailed
explanation of these techniques is beyond the scope of the book, readers
are encouraged to explore them on their own.
The EasyOCR library can be installed using the command pip install
easyocr in the Windows command prompt. Once the library is installed,
the code illustrated as follows can be utilized for extracting the text from
a given image. In this illustration, we are using a partial image of the top
view of a Raspberry Pi board. Once we read the image, we use the imshow()
method from the OpenCV library to display the original image. Next, we create
a reader object of the Reader class provided in the easyocr module. The
argument en passed to the Reader class specifies that the OCR should be
performed for English language, and the argument gpu=False indicates
that the OCR should be performed on the CPU and not the GPU. The
system will take some time to create this reader object. The gpu argument
can be set to True if we have a CUDA-capable GPU, and this would speed
up the process.
The readtext method in the reader object can then be used to extract
the text from the given image. The output of this method will be a list of
pairs where each pair consists of the bounding box coordinates for each
text detected in the image and the corresponding text extracted. We then
use a for loop to iterate over the detected pairs. In each iteration, we
create a rectangle around the detected text with the help of the bounding
box coordinates using the cv2.rectangle() function and then display the
extracted text about the bounding box using the cv2.putText() function.
The original as well as the text-detected images are illustrated in Figure 9-16.
It can be noticed that this library does a fairly decent job of extracting the
text from the image.
import easyocr
import cv2
img = cv2.imread('C:/Users/Lenovo/Documents/Piboard1.jpg')
cv2.imshow("Original Image",img)
cv2.waitKey(0)
# create a reader object and extract text from the image
reader = easyocr.Reader(['en'],gpu=False)
results = reader.readtext(img,paragraph=True)
# Create bounding box and display extracted text over the image
for (bbox, text) in results:
    (tl, tr, br, bl) = bbox
    tl = (int(tl[0]), int(tl[1]))
    tr = (int(tr[0]), int(tr[1]))
    br = (int(br[0]), int(br[1]))
    bl = (int(bl[0]), int(bl[1]))
    cv2.rectangle(img, tl, br, (0, 0, 255), 2)
    cv2.putText(img, text, (tl[0], tl[1]-5),
                cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
cv2.imshow("Text Detected Image", img)
cv2.waitKey(0)
Keras-OCR
Based on TensorFlow and Keras, keras-ocr is an open source Python
framework that provides an extensive toolkit for text extraction from
images. It is based on the combination of two major deep learning
models: a text detection model (CRAFT) and a text recognition model (CRNN).
import keras_ocr
import cv2
import matplotlib.pyplot as plt
img = cv2.imread('C://Users/user/Documents/Piboard1.jpg')
pipeline = keras_ocr.pipeline.Pipeline()
predictions = pipeline.recognize([img])
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 20))
ax1.imshow(img)
ax1.set_title("Original Image")
keras_ocr.tools.drawAnnotations(image=img,
predictions=predictions[0],
ax=ax2)
ax2.set_title("Annotated Image")
Summary
Optical character recognition is an important component of industrial
vision systems that can be used to automatically check text engraved on,
or printed on stickers pasted on, the surface of manufactured equipment
and components. It can also be used to automate text entries into systems
that were otherwise made manually in various scenarios. In this chapter,
we discussed the following aspects of OCR systems:
CHAPTER 10
Machine Learning Techniques for Vision Application
In traditional programming, we will manually write the code based
on a well-defined algorithm that takes the input data and provides the
desired output data. Machine learning (ML), on the other hand, learns the
algorithm from the data without being explicitly programmed. The data
can be of any format, but in the domain of computer vision, it primarily
refers to images. ML algorithms for vision systems analyze the image data
and learn to identify patterns relevant to the task at hand. Over time, the
algorithm gains experience and improves its ability to perform the task. The
term task here can refer to a wide range of computer vision applications
like image classification, object tracking, image segmentation, OCR, etc.
This ultimately leads to intelligent systems that are able to learn and adapt
by enabling machines to make data-driven predictions and automate
complicated decisions. The groundwork for using machine learning with
image data in your Raspberry Pi-based vision system will be laid out in
this chapter. As with all the traditional image processing systems that we have
discussed so far, ML-based image processing systems also require
certain preprocessing steps that enable the system to learn faster. We
will start by discussing these techniques briefly and then begin to explore
some of the traditional ML models for image data. We will then delve into
deep learning techniques that are better suited to handle image data than
the traditional ML techniques.
Image Preprocessing
Just like the way we prepare the ingredients for cooking in accordance with
the dish that we are going to make, we will preprocess images to convert
them to a suitable format before feeding them to the machine learning
models for computer vision tasks. This plays a crucial part in enhancing
the performance of the learning model in a number of ways:
These preprocessing methods help us to get our image data ready for
the best possible performance from our machine learning model. This
lays the groundwork for our Raspberry Pi-based vision system to analyze
images accurately and train the model effectively.
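A minimal sketch of typical preprocessing before training is shown below; the file name and the target size of 64 x 64 pixels are assumptions.

import cv2
import numpy as np

# Load an example image in grayscale (file name is an assumption)
img = cv2.imread('part.jpg', cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (64, 64))          # bring all images to a common size
img = img.astype(np.float32) / 255.0     # normalize pixel values to 0-1
sample = img.reshape(1, 64, 64, 1)       # add batch and channel dimensions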
1. Supervised Learning
2. Unsupervised Learning
3. Semi-supervised Learning:
4. Reinforcement Learning:
The input layer is the actual data that is passed to the network. In case
of images, each node in the input layer corresponds to a pixel value in the
image. Suppose that a digital image to be processed by the neural network
is of size 30 x 30 pixels, the image will be reshaped to a single vector
consisting of 30x30 = 900 pixels, and hence, the input layer will consist of
900 nodes each holding a pixel value.
These values are then passed on to the hidden layers which learn
and extract higher-level features from the input data. These layers are
called hidden because the computations they perform are not directly
observable. The neurons in the hidden layer compute a weighted sum of
the inputs they receive from the previous layers or directly from the input
layer (in the case of first hidden layer) and then pass the results through
an activation function. A single node in a hidden layer is illustrated in
Figure 10-2. The value of Z is the weighted sum of the inputs given by
Z = w1x1 + w2x2 + … + wnxn where (x1, x2, … xn) are the inputs to the node
received from the previous hidden layer or the input layer and (w1, w2, …
wn) are the model weights which the model learns iteratively. In addition
to the input nodes, neural networks will have an additional bias node
which is a constant value added to the above result. This bias is used to
offset the result of each node and helps in shifting the activation function
toward the positive or negative side. The function f(.) is the activation
function which introduces nonlinearities into the network thereby
allowing it to learn complex mapping between inputs and outputs. Some
of the common activation functions include rectified linear unit (ReLu),
Sigmoid, Softmax, and tanh. Readers are encouraged to explore the math
behind these functions and their respective applications.
Figure 10-2. A single node in a hidden layer: the inputs x1 … xn are weighted by w1 … wn, summed into Z, and passed through the activation function f to produce f(Z)
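The computation performed by such a node can be illustrated numerically with a few lines of numpy; the input values, weights, and bias below are arbitrary, and ReLU is used as the activation function.

import numpy as np

# Arbitrary inputs, weights, and bias for a single hidden-layer node
x = np.array([0.5, 0.2, 0.8])        # inputs x1, x2, x3
w = np.array([0.4, -0.6, 0.9])       # weights w1, w2, w3
b = 0.1                              # bias

z = np.dot(w, x) + b                 # weighted sum Z
f_z = max(0.0, z)                    # ReLU activation f(Z)
print(z, f_z)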
The convolution operation slides the kernel over the image, computing the
sum of the products of the filter coefficients and the image pixels covered
by the kernel, and replacing the center pixel in the region covered by the
kernel with the result. This operation is the same as the filtering operation
we discussed earlier in Chapter 5. Figure 10-4 illustrates the convolution of
a random grid of data with a random kernel.
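The operation can be sketched in a few lines of numpy as shown below; the grid and kernel values are arbitrary, and the kernel is slid without flipping, as is done in CNNs.

import numpy as np

# Arbitrary 4x4 grid and 3x3 kernel
grid = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]], dtype=float)

# Slide the kernel over the grid and sum the element-wise products
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = np.sum(grid[i:i+3, j:j+3] * kernel)
print(out)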
import cv2
import numpy as np
from keras.datasets import mnist
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.models import Sequential
from keras.utils import to_categorical
from matplotlib import pyplot as plt
# load dataset
(trainX, trainy), (testX, testy) = mnist.load_data()
# reshape dataset to have a single channel
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))
# normalize to range 0-1
trainX = trainX / 255.0
testX = testX / 255.0
# one hot encoding
trainY = to_categorical(trainy)
testY = to_categorical(testy)
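The lines below continue the model definition from the first pooling layer onward. A minimal sketch of the preceding layers might look as follows; the number of filters and the kernel size are assumptions rather than the exact values of the original listing.

# A sketch of the model definition assumed to precede the lines below;
# the number of filters and the kernel size are assumptions.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu',
                 input_shape=(28, 28, 1)))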
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(100, activation='relu' ))
model.add(Dense(10, activation='softmax'))
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
print(model.summary())
history = model.fit(trainX, trainY, batch_size=32, epochs=10,
validation_split=0.2)
plt.subplot(121)
plt.plot(history.history['accuracy'])
plt.title('Model accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.subplot(122)
plt.plot(history.history['loss'])
plt.title('Model loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
Now that we have trained the model and visualized its learning
performance on the training dataset, the next step is to evaluate the
model’s performance on the test dataset. We use the evaluate() method
to do this which takes the test data as well as the corresponding labels as
input and provides the test accuracy as well as the loss as the output. We can
use a print statement to view this accuracy, and we get a value of 0.9853
which is equivalent to 98.53%. Therefore the model performance on the
unseen test data, which it has not encountered during the training phases,
is also close to the training performance indicating that the model adapts
well to new data. To test the model prediction, we take the first image in
the test data set using the index testX[0], reshape it to the size (1,28,28,1)
suitable for the model, and then use the predict() function to generate
the prediction for the image. This function will generate an array of
probabilities where each element represents the probability that the image
corresponding to a particular digit class (0–9). The argmax() function can
then be used on the prediction output to determine the index of the largest
probability among the array, and this index corresponds to the output
class of the digit. For example, in our case, the first image in the test dataset
corresponds to digit 7, and the value at index 7 of the prediction array will
be the maximum (close to 1), indicating that the detected number is 7. Readers
can verify this by printing the prediction array. Finally, we can display
the test image with predicted digit as its title as illustrated in Figure 10-8.
The model can be saved as a ‘.h5’ model using the save() method from the
tensorflow library. This will save the model architecture, learned weights,
and optimizer configuration into a single file.
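The evaluation, prediction, and saving steps described above might look like the following sketch; it continues the earlier listing and uses the file name my_model.h5 referenced later in the chapter.

# Evaluate the model on the test set
loss, acc = model.evaluate(testX, testY, verbose=0)
print('Test accuracy:', acc)
# Predict the digit for the first test image
sample = testX[0].reshape(1, 28, 28, 1)
prediction = model.predict(sample)
digit = np.argmax(prediction)
plt.imshow(testX[0].reshape(28, 28), cmap='gray')
plt.title('Predicted digit: {}'.format(digit))
plt.show()
# Save the trained model to a single file
model.save('my_model.h5')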
Before developing the code for the use case, let us first understand
the challenges involved in dealing with images of real-world documents
with handwritten digits. The foremost issue with real documents is the
variability with respect to a number of factors like writing style, stroke
thickness, slant, etc. The MNIST dataset with clean, standardized digit
images will definitely not be able to capture this variability. MNIST may
not generalize well to images with noise or distortion caused by various
factors like imperfect writing conditions, scanning errors, etc. Addressing
these issues require sophisticated models trained on large and diverse
datasets representative of these variations and complexities.
Setting aside all the challenges for now, let us see how well our trained
model detects the handwritten digits at different locations in the front
page of an answer script. We start by importing the necessary library and
loading the trained model using the load_model() method. Next we load
the image followed by the coordinates for each of the boxes in the format
[x1, y1, x2, y2], where (x1, y1) denotes the top-left corner and (x2, y2)
the bottom-right corner of each box. To visualize
how these coordinates correspond to the boxes of interest, we create a
mask image of the same size as the original image, where the pixels inside
the boxes are nonzero (displayed as white) and all other pixels are zero.
The original image as well as the mask image are illustrated in Figure 10-9.
import cv2
import numpy as np
import matplotlib.pyplot as plt
from keras.models import load_model
# Load the model
model = load_model('my_model.h5')
# load the image and mask coordinates
img = cv2.imread('Scripts\script_new.jpeg')
mask_coordinates = [[221,227,382,283], [223,287,384,343],
[224,343,382,400], [224,398,382,455],
[224,458,382,515], [224,515,382,570],
[222,570,382,626], [224,627,384,683],
[224,684,382,746], [224,748,382,807],
[224,803,382,862], [224,862,382,921],
[790,226,952,285], [952,229,1113,285],
[1117,229,1278,285], [788,343,952,401],
[951,341,1114,399], [1113,342,1279,399],
[789,400,952,454], [952,400,1115,454],
[1115,400,1279,456], [788,514,952,570],
[954,512,1113,568], [1116,512,1278,567],
[788,570,952,628], [953,568,1117,626],
[1113,569,1281,626], [788,683,952,746],
[955,683,1113,748], [1114,685,1281,746],
[789,748,952,803], [953,749,1116,804],
[1114,750,1279,807], [789,861,951,919],
[952,863,1114,921], [1115,862,1278,919]]
# Create mask image
new_array = np.zeros_like(img)
for i in mask_coordinates:
    new_array[i[1]:i[3], i[0]:i[2]] = 1
image_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
plt.figure()
plt.subplot(211)
plt.imshow(image_rgb)
plt.title('Original Image')
plt.subplot(212)
plt.imshow(255*new_array)
plt.title('Mask Image')
plt.show()
From the image, it can be seen that there are two sections in the
answer script. The first section under “Part A” is composed of 12 questions,
and the second section under “Part B” is composed of 4 questions (with
subsections), with corresponding marks entered for both sections. Now,
we have to extract the regions specified by the mask coordinates from the
original image, preprocess them to make them compatible to the model,
and then predict the digits in the image regions using the trained model.
The following code illustrates these operations. First, we start by
initializing two figures fig1 and fig2 with subplots using matplotlib library
to display the original and preprocessed images of the digits. To obtain
these images, we iterate through the list of mask coordinates and extract
specific regions of the script image. From Figure 10-9, we can also note
that a number of boxes in the “Part B” section are empty. To drop these
empty boxes from further execution, we use a conditional statement to
allow only those boxes where the variance of pixel values, determined
using var() method in the numpy library, is greater than 200. If the region
contains a digit, the image is displayed in a corresponding subplot of the
first figure fig1 and then preprocessed by converting it to grayscale,
applying a binary threshold, and resizing it to 28x28 pixels to match the
input size of the trained machine learning model. The preprocessed image
is displayed in a corresponding subplot in the second figure fig2 and then
normalized to a range of 0-1, reshaped to a 4D tensor (batch size, height,
width, channels) and passed to the model.predict() method to obtain the
digit prediction. The predicted digit is appended to the out list and the
loop continues to the next digit. Finally, the extracted and preprocessed
images of the digits are displayed using plt.show() method as shown in
Figures 10-10 and 10-11, respectively.
out=[]
fig1, ax1 = plt.subplots(4, 5)
fig2, ax2 = plt.subplots(4, 5)
ax1_flat = ax1.flatten()
ax2_flat = ax2.flatten()
m=0
for i in mask_coordinates:
    img_i = image_rgb[i[1]+5:i[3]-5, i[0]+5:i[2]-5]
    if np.var(img_i) > 200:
        ax1_flat[m].imshow(img_i)
        # preprocess the image
        gray = cv2.cvtColor(img_i[:, 50:100], cv2.COLOR_RGB2GRAY)
        ret, thresh = cv2.threshold(gray, 190, 255,
                                    cv2.THRESH_BINARY_INV)
        resized = cv2.resize(thresh, (28, 28))
        ax2_flat[m].imshow(resized, cmap='gray')
        # Normalize the image
        normalized_image = resized / 255.0
        # Reshape the image to be compatible with the model
        reshaped_image = normalized_image.reshape(1, 28, 28, 1)
        # Predict the digit using the model
        prediction = model.predict(reshaped_image)
        # Get the index of the highest confidence digit
        predicted_digit_index = np.argmax(prediction)
        out.append(predicted_digit_index)
        m += 1
fig1.suptitle('Digit images extracted from their boxes')
fig2.suptitle('Preprocessed digit images')
plt.show()
All the predicted digits are now available in the list “out.” Rather than
printing the list directly, we iterate over it and print each digit alongside
its corresponding question number, as sketched below. This makes it easy to
verify the results against the original image. It can be seen that the marks
for questions 4, 5, and 11, and for the second part of questions 13 and 16,
are incorrectly detected by the model. Even so, the model trained on the
MNIST dataset is able to correctly predict 15 of the 20 digits extracted from
an external document that it was never trained on. As mentioned at the start
of this section, this performance can be improved further by training more
complex models than this simple CNN on larger and more diverse datasets.
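One way to do this is the minimal sketch below. The question labels are an
assumption about the layout of the answer script (12 questions in Part A and
two sub-parts each for questions 13 through 16 in Part B); they should be
adjusted to match the order of the non-empty boxes in mask_coordinates.
# Hypothetical question labels matching the order of the non-empty boxes
labels = ['Q' + str(n) for n in range(1, 13)]
labels += ['Q' + str(n) + p for n in range(13, 17) for p in ('a', 'b')]
# Pair each predicted digit with its question label and print it
for label, digit in zip(labels, out):
    print(label + ':', digit)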
Summary
In this chapter, we have explored the fundamental machine learning
techniques used widely in image processing applications. The chapter is
organized as follows:
• The basic preprocessing steps involved in developing
machine learning applications
• The basic categories of traditional machine learning
techniques
• The detailed process behind artificial neural networks
(ANN) techniques that mimic the behavior of the
human brain
• The detailed process behind convolutional neural
networks (CNN) and the additional capabilities that
enable them to learn patterns from images
• Handwritten digit recognition on MNIST dataset images using CNN
• Application of the trained handwritten digit model to a real student answer script to extract the marks for section-wise questions
CHAPTER 11
Industrial Vision System Applications
We have covered a lot of ground with respect to vision systems, starting
from basic image processing techniques to more complex algorithms that
can be used to develop advanced systems capable of performing crucial
tasks. By deploying such vision systems in industrial settings, a number of
tasks ranging from quality control to safety monitoring can be performed in
real time, which helps avoid human errors. We will
take our learnings from all the chapters and develop vision systems for
four different industrial applications in this chapter (Case Studies 11.1
through 11.4). We will further make the systems more user-friendly by
incorporating them into easily understandable user interfaces.
The Pi camera interface can be enabled with the raspi-config tool, and the
camera detection can then be verified from the command line:
sudo raspi-config
vcgencmd get_camera
import cv2

video_capture = cv2.VideoCapture('video.mp4')
while True:
    ret, frame = video_capture.read()
    if not ret:  # stop when no more frames are available
        break
    # Your image processing code here
    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
cv2.destroyAllWindows()
This final image with the details of spacing as well as the status of the
thread quality (‘Good!’ or ‘Bad!’) is returned by the function as illustrated
in the following code.
import cv2
import numpy as np
from picamera import PiCamera
from picamera.array import PiRGBArray

def detect_thread_quality(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Apply Gaussian blur to reduce noise
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    # Apply thresholding to get binary image
    _, thresh = cv2.threshold(blur, 200, 255, cv2.THRESH_BINARY)
    # Find contours
    contours, _ = cv2.findContours(thresh, cv2.RETR_TREE,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Filter out contours that are too large or too small
    valid_contours = []
    for contour in contours:
        area = cv2.contourArea(contour)
        # ... the area-based filtering, thread center computation, and
        # spacing checks that set status to 'Good!' or 'Bad!' continue here ...
    status = 'Bad!'
    # Draw circles around the threads
    for center in centers:
        cv2.circle(image, center, 5, (0, 0, 255), -1)
    return status, image
In the next stage, we are going to use the function to detect the thread
quality of a bolt in a video feed and display the result in a GUI. For this
purpose, we reuse the GUI that we developed in Chapter 6, with some
refinements suited to our use case, as illustrated here. First, we import the
PiCamera class from the picamera library, which allows us to capture images
using the Pi camera, and the PiRGBArray class, which allows us to read the
captured frames as raw numpy arrays. The GUI consists of two frames. The
first frame contains two video canvasses: the first canvas displays the
frames captured with the Pi camera, and the second canvas displays the
processed frames returned by our function, marked with the thread centers
and the text containing the thread spacing details. The second frame, placed
alongside the first, consists of a label widget with the text Thread Quality
and an entry widget to display the thread quality status received from our
function. The resulting output as displayed in the GUI is shown in Figure 11-2.
from tkinter import *
from picamera import PiCamera
from picamera.array import PiRGBArray

root = Tk()
root.title("Industrial Vision App")
root.geometry("1600x1000")
root.configure(bg="blue")

# Function to close the window
def close_window():
    root.destroy()

# Create labels
company_label = Label(root, text="KMAKGA Corporation Ltd",
                      borderwidth=5, relief="ridge",
                      font=("Arial Black", 24), fg="red",
                      bg="dark blue")
company_label.grid(row=0, column=0, columnspan=4, padx=30,
                   pady=0, sticky="ew")
version_label = Label(root, text="Version 1.0", bg="blue")
version_label.grid(row=1, column=2, padx=0, pady=0, sticky="e")
vision_label = Label(root, text="Vision System",
                     font=("Helvetica", 16, "bold"), fg="red", bg="blue")
vision_label.grid(row=2, column=0, columnspan=3, padx=0,
                  pady=10, sticky="ew")

# Create a frame to hold the canvases
frame = Frame(root, width=800, height=800, relief="ridge",
              borderwidth=1, bg="blue")
frame.grid(row=3, column=0, padx=10, pady=0)

# Create canvases
video_canvas1 = Canvas(frame, borderwidth=2, width=500,
                       height=400, bg="blue")
video_canvas1.grid(row=3, column=0, padx=5, pady=5)
video_canvas2 = Canvas(frame, borderwidth=2, width=500,
                       height=400, bg="blue")
video_canvas2.grid(row=3, column=1, padx=5, pady=5)
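The remainder of the GUI wires the Pi camera feed to the
detect_thread_quality() function. The following is a minimal sketch of that
logic; the widget names (status_frame, status_entry), grid positions, and
capture settings are representative assumptions rather than the exact listing.
import cv2
from PIL import Image, ImageTk

# Second frame: a label and an entry widget to show the thread quality status
status_frame = Frame(root, bg="blue")
status_label = Label(status_frame, text="Thread Quality",
                     font=("Helvetica", 14), fg="black", bg="blue")
status_entry = Entry(status_frame, width=20, font=("Helvetica", 14))
status_label.grid(row=0, column=0, padx=5, pady=5)
status_entry.grid(row=0, column=1, padx=5, pady=5)
status_frame.grid(row=3, column=1, padx=10, pady=0)
# Open the Pi camera and prepare a raw array buffer for the frames
camera = PiCamera()
cap = PiRGBArray(camera)
def update_video_canvas():
    # Capture one frame as a raw BGR numpy array
    camera.capture(cap, format='bgr', use_video_port=True)
    frame = cap.array
    # Display the captured frame in the first canvas
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    orig_photo = ImageTk.PhotoImage(image=Image.fromarray(rgb))
    video_canvas1.create_image(0, 30, image=orig_photo, anchor=NW)
    video_canvas1.photo = orig_photo
    # Run the thread quality check and display the processed frame
    status, processed = detect_thread_quality(frame)
    proc_rgb = cv2.cvtColor(processed, cv2.COLOR_BGR2RGB)
    proc_photo = ImageTk.PhotoImage(image=Image.fromarray(proc_rgb))
    video_canvas2.create_image(0, 30, image=proc_photo, anchor=NW)
    video_canvas2.photo = proc_photo
    # Show the 'Good!'/'Bad!' status in the entry widget
    status_entry.delete(0, END)
    status_entry.insert(END, status)
    cap.truncate(0)
    video_canvas1.after(10, update_video_canvas)
update_video_canvas()
root.mainloop()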
import cv2
import numpy as np

def component_id(frame, template_ok, template_not_ok):
    w_ok, h_ok = template_ok.shape[::-1]
    w_not_ok, h_not_ok = template_not_ok.shape[::-1]
    # Convert frame to grayscale
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
The function will return the frame with each object in it surrounded
by a bounding box and the corresponding label. The next step is to call
the function from inside the GUI code and apply it to the frames captured
by the webcam, as illustrated in the following. The GUI consists of a single
frame with two canvasses for displaying the frames from the original video
and the output frames produced by the above function. The template
images, shown in Figure 11-3, are read using the imread() function.
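The full body of component_id() is not shown above; the following minimal
sketch illustrates the template-matching step it relies on. The helper name
match_and_label() and the 0.8 matching threshold are assumptions for
illustration.
import cv2
import numpy as np

def match_and_label(gray_frame, frame, template, label, w, h, thresh=0.8):
    # Slide the grayscale template over the frame and keep strong matches
    res = cv2.matchTemplate(gray_frame, template, cv2.TM_CCOEFF_NORMED)
    loc = np.where(res >= thresh)
    for pt in zip(*loc[::-1]):  # (x, y) locations above the threshold
        cv2.rectangle(frame, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (pt[0], pt[1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame
Inside component_id(), this step would be applied once with template_ok
(labelled, say, "OK") and once with template_not_ok (labelled "NOT OK")
before the annotated frame is returned.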
import cv2

def thread_count(img, thres):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    inv_img = cv2.bitwise_not(img_gray)  # invert b&w colors
    res, thresh_img = cv2.threshold(inv_img, thres, 255, cv2.THRESH_BINARY_INV)
    thresh_img = 255 - thresh_img
    contours, hierarchy = cv2.findContours(thresh_img, cv2.RETR_TREE,
                                           cv2.CHAIN_APPROX_SIMPLE)
    area = []
    for i in range(len(contours)):
        area.append(cv2.contourArea(contours[i]))
    count = len([num for num in area if num > 4])
    return count, thresh_img
We can now call this function from inside our GUI code, illustrated as
follows, to apply it to real-time frames captured from a video. As in our
previous thread quality detection system, we have two video canvasses
enclosed in a frame, one for displaying the captured video frames and the
other for displaying the inverted frames after thresholding with the
thread_count() function. Another frame consists of two labels: one with the
text Thread count, followed by a text widget for displaying the count, and
another for displaying whether a fault is detected in the bolt image. We pass
the captured frames to our thread_count() function, which produces two
outputs: the thread count, stored in the variable sumt, and the inverted
version of the thresholded image, stored in the variable thresh_img. In our
example, a threshold of 202 is provided to this function, but the value will
vary depending upon the nature of the image. The original video frames are
displayed in the first canvas, and the inverted frames are displayed in the
second canvas.
A conditional statement is then used on the sumt variable to determine
whether there is any fault in the bolt. We are using a threshold of 20, which
implies that the input bolt image should have at least 20 threads; any fewer
and the component is treated as faulty. Therefore, we pass the string “No
fault is detected” to the text widget for the “True” condition and the string
“Fault detected” for the “False” condition. To make the messages easy to
distinguish, the result of the “True” condition is displayed in green and
that of the “False” condition in red. We also produce three beep sounds when
a faulty component is detected, using the Beep() method in the winsound
library. The frequency and duration of the sound can be configured through
the two input parameters of the Beep() method. The two use cases are shown
in Figures 11-5 and 11-6, respectively.
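The per-frame logic described above can be sketched as follows. The widget
names count_text and fault_label, the omitted canvas handling, and the Beep()
parameters are assumptions; only the thresholds 202 and 20 come from the
description above.
import winsound  # Windows-only module used for the alert beeps

def update_fault_status(frame):
    # Count the threads and get the inverted thresholded image
    sumt, thresh_img = thread_count(frame, 202)
    # Show the count in the text widget
    count_text.delete("1.0", END)
    count_text.insert(END, str(sumt))
    if sumt >= 20:
        # At least 20 threads: component is fine
        fault_label.config(text="No fault is detected", fg="green")
    else:
        # Fewer than 20 threads: flag the component and sound three beeps
        fault_label.config(text="Fault detected", fg="red")
        for _ in range(3):
            winsound.Beep(1000, 300)  # frequency in Hz, duration in ms
    return thresh_img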
The function returns the labeled image, which is the input image with the
bounding boxes and text labels marked on it, and the detected text string,
represented by the variable t. Inside the function, a reader object is
created using the Reader() method from the easyocr library, and the
readtext() method of the reader object is then used to detect the text in
the image. The output of this method, represented by the variable results,
consists of a number of bounding boxes corresponding to the different text
objects in the image along with the corresponding text strings. We iterate
over this output to read the text strings and mark them, along with their
bounding boxes, on the same input image. The additional component we have
added here is the variable t, which is initialized as an empty string before
the loop. With each iteration, the corresponding text from the results
variable is concatenated to this variable.
import easyocr
import cv2

def ocr_det(img):
    # create a reader object and extract text from image
    reader = easyocr.Reader(['en'], gpu=False)
    results = reader.readtext(img, paragraph=True)
    # Create bounding box and display extracted text over the image
    t = ''
    for (bbox, text) in results:
        (tl, tr, br, bl) = bbox
        t += text + '\n'
        tl = (int(tl[0]), int(tl[1]))
        tr = (int(tr[0]), int(tr[1]))
        br = (int(br[0]), int(br[1]))
        bl = (int(bl[0]), int(bl[1]))
        cv2.rectangle(img, tl, br, (0, 0, 255), 2)
        # Label the detected text above its bounding box
        # (font settings here are representative values)
        cv2.putText(img, text, (tl[0], tl[1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 0, 0), 2)
    return img, t
Now that we have written the code for the OCR system with the easyocr
library as a user-defined function, the next step is to redesign the GUI to
accommodate this use case. The following code illustrates our new GUI for the
OCR system. There are two frames in the GUI. The first frame consists of two
video canvasses, one for displaying the original video feed and the other for
displaying the image after text extraction, marked with bounding boxes around
the text areas and the corresponding text labels. The second frame consists
of a label widget displaying the title for the text output and a text widget
to display the actual text extracted from the image. To call the function
ocr_det that we created earlier, we need to import it using the filename
under which the code was saved. Since the code in our case is saved with the
same filename as the function, we import the function as: from ocr_det import
ocr_det. Once all the required libraries are imported and all the widgets are
created, we define a function to read frames from the video capture object
and apply the ocr_det function to extract the text from the image. As
discussed earlier, this function returns a labeled image with bounding boxes
and labels around the text areas, and a string containing all the extracted
text. The original video frames are displayed in video_canvas1, the frames
produced by the function with the bounding boxes and labels are displayed in
video_canvas2, and the text string is displayed in the text widget, as
illustrated in Figure 11-7.
# (the root window, title labels, and the frame holding video_canvas1 are
# created as in the previous GUIs)
video_canvas1.grid(row=3, column=0, padx=5, pady=5)
video_canvas2 = Canvas(frame, borderwidth=2, width=500,
                       height=400, bg="blue")
video_canvas2.grid(row=3, column=1, padx=5, pady=5)
video_canvas1.create_text(80, 10, text="Video Feed",
                          font=("Helvetica", 10), anchor="ne")
video_canvas2.create_text(110, 10, text="Text detection",
                          font=("Helvetica", 10), anchor="ne")
# Create a frame to display the extracted text
entry_frame = Frame(root, width=200, height=400, bg="blue")
entry = Text(entry_frame, borderwidth=2, relief="ridge",
             height=10, width=40)
label = Label(entry_frame, text='Detected Text',
              font=("Helvetica", 18), fg="black", bg="blue")
label.grid(row=0, column=0, padx=5, pady=25, sticky="n")
entry.grid(row=1, column=0, padx=5, pady=25, sticky="w")
entry_frame.grid(row=3, column=1, padx=5, pady=5)
# create a close button
close_button = Button(root, width=15, height=2, borderwidth=2,
                      relief="ridge", text="Exit", command=close_window)
close_button.grid(row=4, column=0, columnspan=2, padx=0,
                  pady=50, sticky="n")
# Function to update the video canvas with webcam feed
def update_video_canvas():
    for frame in camera.capture_continuous(cap, format='bgr',
                                           use_video_port=True):
        # frame.array holds the captured image as a numpy array
        # Display original video in the first canvas
        orig_photo = ImageTk.PhotoImage(image=Image.fromarray(frame.array))
        video_canvas1.create_image(0, 30, image=orig_photo, anchor=NW)
        video_canvas1.photo = orig_photo
        # Text detection using easyocr
        img, t = ocr_det(frame.array)
        text_photo = ImageTk.PhotoImage(image=Image.fromarray(img))
        video_canvas2.create_image(0, 30, image=text_photo, anchor=NW)
        video_canvas2.photo = text_photo
        # Update the detected text to the text entry widget
        entry.delete("1.0", END)
        entry.insert(END, t)
        video_canvas1.after(10, update_video_canvas)
        cap.truncate(0)
# Open the Pi cam
camera = PiCamera()
cap = PiRGBArray(camera)
# Call the update_video_canvas function to start displaying the video feed
update_video_canvas()
# Start the tkinter main loop
root.mainloop()
Summary
Congratulations, you have come to the end of the book! In this chapter, we
have developed vision systems for four different industrial applications:
• Thread quality detection from nut/bolt images
• Component identification using template matching
• Thread counting and fault detection from bolt images
• Text detection from images using OCR with the easyocr library
It is to be noted that all the systems developed in this chapter were tested
in a simple room environment. Implementing the same systems in a real
industrial environment will come with its own set of challenges.
Nevertheless, you’re encouraged to take the learnings from this book and
develop more such real-time vision systems.