Object Detection using Python & OpenCV
Clipped from: https://circuitdigest.com/tutorial/object-detection-using-python-opencv
We started with learning the basics of OpenCV, then did some basic image processing and manipulation on images, followed by image segmentation and many other operations using OpenCV and the Python language. Here, in this section, we will perform some simple object detection techniques using template matching. We will find an object in an image and then describe its features. Features are common attributes of an image such as corners, edges, etc. We will also take a look at some common and popular object detection algorithms such as SIFT, SURF, FAST, BRIEF & ORB.
As explained in the previous tutorials, OpenCV is an Open Source Computer Vision library which has C++, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. So it can be easily installed on a Raspberry Pi with a Python and Linux environment. And a Raspberry Pi with OpenCV and an attached camera can be used to create many real-time image processing applications like face detection, face lock, object tracking, car number plate detection, home security systems, etc.
Object detection and recognition form the most important use cases for computer vision; they are used to do powerful things such as:
• Labelling scenes
• Robot Navigation
• Self-driving cars
• Body recognition (Microsoft Kinect)
• Disease and cancer detection
• Facial recognition
• Handwriting recognition
• Identifying objects in satellite images
Object Detection vs. Recognition
Object recognition is the second level of object detection, in which the computer is able to recognize an object among multiple objects in an image and may be able to identify it.
Now, we will use some image processing functions to find an object in an image.
Finding an Object from an Image
Here we will use template matching to find a character/object in an image, using OpenCV's cv2.matchTemplate() function.
import cv2
import numpy as np

# Load the input image and convert it to grayscale
image = cv2.imread('WaldoBeach.jpg')
cv2.imshow('people', image)
cv2.waitKey(0)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Load the template image (the flag 0 loads it in grayscale)
template = cv2.imread('waldo.jpg', 0)

# Result of template matching the object over the image
result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

# Create the bounding box
top_left = max_loc
# Extend the bounding rectangle 50 pixels right and down from the match location
bottom_right = (top_left[0] + 50, top_left[1] + 50)
cv2.rectangle(image, top_left, bottom_right, (0, 255, 0), 5)

cv2.imshow('object found', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
In cv2.matchTemplate(gray, template, cv2.TM_CCOEFF), we pass in the grayscale image to search and the template to find, and then apply the template matching method; here cv2.TM_CCOEFF is used.
The function returns an array of match scores, which is stored in result; this is the output of the template matching procedure.
We then use cv2.minMaxLoc(result), which gives the coordinates of the location where the object was best matched in the image. Once we have those coordinates we draw a rectangle over it, stretching the dimensions of the box a little so the object fits easily inside.
There are a variety of methods to perform template matching, and in this case we are using cv2.TM_CCOEFF, which stands for correlation coefficient.
cv2.matchTemplate takes a “sliding window” of the object and slides it over the image from left to right and top to bottom, one pixel at a time. At each location it computes the correlation coefficient to determine how “good” or “bad” the match is.
Regions with sufficiently high correlation can be considered matches; from there, all we need is a call to cv2.minMaxLoc to find where the best match is.
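To make “sufficiently high correlation” concrete, here is a small sketch (an addition to this tutorial, not part of the original) that uses the normalized method cv2.TM_CCOEFF_NORMED, whose scores fall between -1 and 1, and thresholds the result array so that every strong match is boxed rather than only the single best one. The 0.8 threshold and the file names are assumptions to tune per image.

import cv2
import numpy as np

# Same setup as above; the file names are placeholders
image = cv2.imread('WaldoBeach.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
template = cv2.imread('waldo.jpg', 0)
h, w = template.shape

# Normalized scores lie in [-1, 1], so a fixed threshold is meaningful
result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.8  # assumed value; tune per image
ys, xs = np.where(result >= threshold)

# Draw a box sized to the template at every location above the threshold
for x, y in zip(xs, ys):
    cv2.rectangle(image, (int(x), int(y)), (int(x) + w, int(y) + h), (0, 255, 0), 2)

cv2.imshow('all matches', image)
cv2.waitKey(0)
cv2.destroyAllWindows()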
Feature Description Theory
In template matching we slide a template image across a source image until
a match is found. But it is not the best method for object recognition, as it has severe limitations: the method simply isn't very resilient.
The following factors make template matching a bad choice for object
detection.
• Rotation renders this method ineffective.
• Size (known as scaling) affects this as well.
• Photometric changes (e.g. brightness, contrast, hue etc.)
• Distortion from viewpoint changes (affine).
One solution to this problem is image features.
Image features are interesting areas of an image that are somewhat
unique to that specific image. They are also called key point features or
interest points.
In the example image from the original article, the sky is an uninteresting feature, whereas certain keypoints (marked in red circles) can be used for detection (interesting features). That image clearly shows the difference between an interesting feature and an uninteresting one.
Importance of feature detection
Features are important as they can be used to analyze, describe and match
images. They have extensive use in:
• Image alignment – e.g. panorama stitching (finding corresponding matches so we can stitch images together; see the matching sketch after this list)
• 3D reconstruction
• Robot navigation
• Object recognition
• Motion tracking
• And more!
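As a quick illustration of the matching use case above, here is a minimal sketch (an addition, not from the original tutorial) that matches features between two views of the same scene; it uses ORB, which is covered later in this section, and the two file names are assumed placeholders.

import cv2

# Two views of the same scene; the file names are placeholders
img1 = cv2.imread('scene_left.jpg', 0)
img2 = cv2.imread('scene_right.jpg', 0)

# Detect keypoints and compute binary descriptors with ORB
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance (suited to binary descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Visualize the 20 best correspondences side by side
out = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None)
cv2.imshow('matches', out)
cv2.waitKey(0)
cv2.destroyAllWindows()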
What defines the interest points?
Interesting areas carry a lot of distinct, unique information about an area. Typically, they are areas of high change in intensity, such as corners and edges. But always be careful, as noise can appear “informative” when it is not; blurring the image first helps reduce noise.
Characteristics of Good or Interesting Features
Repeatable – They can be found in multiple pictures of the same scene.
Distinctive – Each feature is somewhat unique and different from other features of the same scene.
Compactness/Efficiency – There are significantly fewer features than pixels in the image.
Locality – A feature occupies a small area of the image and is robust to clutter and occlusion.
Corners as features
Corners are identified when shifting a window in any direction over that
point gives a large change in intensity.
Corners are not the best features for every image, but they certainly have good use cases that make them handy.
So to identify corners in your image, imagine a small (green) window sliding over the image (the black box) in which we want to find corners. When we move the window within a flat region of the image, there is no change in intensity, hence no corner is identified. When we move the window and there is a change of intensity in one direction only, it is an edge, not a corner.
When we place the window on a corner, no matter in which direction we move it there is a change in intensity, and this is identified as a corner.
So let's identify corners with the help of the Harris Corner Detection algorithm, developed in 1988, which works fairly well for corner detection.
The following OpenCV function is used for the detection of the corners.
cv2.cornerHarris(input image, block size, ksize, k)
Input image - Should be grayscale and float32 type.
blockSize - The size of neighborhood considered for corner detection
ksize - Aperture parameter of Sobel derivative used.
k - Harris detector free parameter in the equation
Output – array of corner locations (x,y)
Also, an important thing to note is that the Harris corner detection algorithm requires a float32 image, i.e. the image should be a grayscale image of float32 type.
import cv2
import numpy as np

# Load image then convert to grayscale
image = cv2.imread('chess.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# The cornerHarris function requires the array datatype to be float32
gray = np.float32(gray)
harris_corners = cv2.cornerHarris(gray, 3, 3, 0.05)

# We use dilation of the corner points to enlarge them
kernel = np.ones((7, 7), np.uint8)
harris_corners = cv2.dilate(harris_corners, kernel, iterations=2)

# Threshold for an optimal value; it may vary depending on the image
image[harris_corners > 0.025 * harris_corners.max()] = [255, 127, 127]

cv2.imshow('Harris Corners', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.cornerHarris returns a corner-response map the same size as the image, with strong responses at corner locations; to visualize these tiny locations we use dilation, which adds pixels around the detected corners. To enlarge the corners we run the dilation twice, and then threshold the response to change the color of the pixels belonging to the corners.
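If explicit (x, y) corner locations are needed rather than a recolored image, they can be pulled out of the thresholded response map; a small sketch (an addition, continuing from the code above):

import numpy as np

# harris_corners is the response map computed above; np.argwhere yields
# (row, col) pairs, i.e. (y, x) in image coordinates
corner_locations = np.argwhere(harris_corners > 0.025 * harris_corners.max())
print("Corners found:", len(corner_locations))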
Another corner detector, cv2.goodFeaturesToTrack (the Shi-Tomasi method), can be used for the same purpose, with the parameters mentioned below:
cv2.goodFeaturesToTrack(input image, maxCorners, qualityLevel, minDistance)
• Input Image – 8-bit or floating-point 32-bit, single-channel image.
• maxCorners – Maximum number of corners to return. If more corners are found than this limit, the strongest ones are returned.
• qualityLevel – Parameter characterizing the minimal accepted quality of image corners. The parameter value is multiplied by the best corner quality measure (smallest eigenvalue); corners with a quality measure less than the product are rejected. For example, if the best corner has a quality measure of 1500 and qualityLevel = 0.01, then all the corners with a quality measure less than 15 are rejected.
• minDistance – Minimum possible Euclidean distance between the returned corners.
import cv2
import numpy as np

img = cv2.imread('chess.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# We ask for the top 100 corners
corners = cv2.goodFeaturesToTrack(gray, 100, 0.01, 15)

for corner in corners:
    x, y = corner[0]
    x = int(x)
    y = int(y)
    cv2.rectangle(img, (x - 10, y - 10), (x + 10, y + 10), (0, 255, 0), 2)

cv2.imshow("Corners Found", img)
cv2.waitKey()
cv2.destroyAllWindows()
Like the previous method, it returns an array of corner locations, so we iterate through each corner position and plot a rectangle over it.
Problems with corners as features
Corner matching in images is tolerant of (i.e. corner detection has no problem with) images that are:
• Rotated
• Translated (i.e. shifted within the image)
• Subject to slight photometric changes, e.g. brightness or affine intensity
However, it is intolerant of:
• Large changes in intensity or photometric changes
• Scaling (i.e. enlarging or shrinking)
SIFT, SURF, FAST, BRIEF & ORB Algorithms
Scale Invariant Feature Transform (SIFT)
Corner detectors like the Harris corner detection algorithm are rotation invariant, which means that even if the image is rotated we can still get the same corners. This is expected, as corners remain corners in a rotated image. But when we scale the image, a corner may no longer be a corner (as illustrated in the original article's image).
SIFT detects interesting keypoints in an image using the Difference of Gaussians method: these are areas of the image where variation exceeds a certain threshold, and they are more distinctive than edges.
Then we create a vector descriptor for these interesting areas. Scale invariance is achieved via the following process:
i. Interesting points are scanned at several different scales.
ii. The scale at which a specific stability criterion is met is then selected and encoded by the vector descriptor. Therefore, regardless of the initial size, the most stable scale is found, which makes us scale invariant.
Rotation invariance is achieved by obtaining the Orientation Assignment of the keypoint using image gradient magnitudes. Once we know the 2D direction, we can normalize this direction.
A full paper on SIFT can be read here:
http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf.
And you can also find a tutorial on the official OpenCV link.
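The selected scale and assigned orientation end up stored on each keypoint object. This small sketch (an addition to the tutorial, assuming any test image such as the paris.jpg used below) prints them:

import cv2

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# SIFT lives in xfeatures2d in the opencv-contrib builds of this era
sift = cv2.xfeatures2d.SIFT_create()
keypoints = sift.detect(gray, None)

# Each keypoint records the scale (size) and orientation (angle, in degrees)
# that make its descriptor scale- and rotation-invariant
for kp in keypoints[:5]:
    print("location:", kp.pt, "scale:", kp.size, "orientation:", kp.angle)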
Speeded Up Robust Features (SURF)
SURF is the speeded-up version of SIFT, as SIFT is quite computationally expensive.
SURF was developed to improve the speed of a scale invariant feature
detector. Instead of using the Difference of Gaussian approach, SURF uses
Hessian matrix approximation to detect interesting points and uses the sum
of Haar wavelet responses for orientation assignment.
A full paper on SURF can be read here: http://www.vision.ee.ethz.ch/~surf/eccv06.pdf
Alternatives of SIFT and SURF
As SIFT and SURF are patented, they are not freely available for commercial use; however, there are alternatives to these algorithms, which are explained briefly here.
Features from Accelerated Segment Test (FAST)
• Key point detection only (no descriptor, we can use SIFT or SURF to
compute that)
• Used in real time applications
Here you can find the paper on FAST:
https://www.edwardrosten.com/work/rosten_2006_machine.pdf
Binary Robust Independent Elementary Features (BRIEF)
• Computes descriptors quickly (instead of using SIFT or SURF)
• It is quite fast.
Here you can find the paper on BRIEF
http://cvlabwww.epfl.ch/~lepetit/papers/calonder_pami11.pdf
Oriented FAST and Rotated BRIEF (ORB)
• Developed out of OpenCV Labs (not patented so free to use!)
• Combines both FAST and BRIEF
Here you can find the paper on ORB
http://www.willowgarage.com/sites/default/files/orb_final.pdf
Using SIFT, SURF, FAST, BRIEF & ORB in OpenCV
(Figure in the original article: flow process for SIFT, SURF, FAST, BRIEF & ORB.)
Feature Detection implementation
The SIFT & SURF algorithms are patented by their respective creators, and
while they are free to use in academic and research settings, you should
technically be obtaining a license/permission from the creators if you are
using them in a commercial (i.e. for-profit) application.
Below are programming examples of all the algorithms mentioned above.
SIFT
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Create SIFT feature detector object
sift = cv2.xfeatures2d.SIFT_create()

# Detect key points
keypoints = sift.detect(gray, None)
print("Number of keypoints Detected: ", len(keypoints))

# Draw rich key points on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - SIFT', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 1893
Here the keypoints are (x, y) coordinates extracted using the SIFT detector and drawn over the image using the cv2.drawKeypoints() function.
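If the descriptors themselves are needed (e.g. for matching), detectAndCompute returns them alongside the keypoints; a short sketch continuing from the code above:

# Detect keypoints and compute their 128-dimensional SIFT descriptors in one call
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(descriptors.shape)  # (number of keypoints, 128)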
SURF
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Create SURF feature detector object; here we set the Hessian threshold to 500
# Only features whose Hessian is larger than hessianThreshold are retained by the detector
# You can increase the Hessian threshold to decrease the number of keypoints
surf = cv2.xfeatures2d.SURF_create(500)
keypoints, descriptors = surf.detectAndCompute(gray, None)
print("Number of keypoints Detected: ", len(keypoints))

# Draw rich key points on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - SURF', image)
cv2.waitKey()
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 1548
FAST
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Create FAST detector object
fast = cv2.FastFeatureDetector_create()

# Obtain key points; by default non-max suppression is on
# (to turn it off, call fast.setNonmaxSuppression(False))
keypoints = fast.detect(gray, None)
print("Number of keypoints Detected: ", len(keypoints))

# Draw rich keypoints on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - FAST', image)
cv2.waitKey()
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 8960
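As noted in the code comment, non-max suppression can be toggled on the detector object; turning it off keeps every pixel that passes the segment test, so the keypoint count rises sharply. A small sketch continuing from the code above:

# Disable non-max suppression and detect again
fast.setNonmaxSuppression(False)
keypoints = fast.detect(gray, None)
print("Keypoints without non-max suppression:", len(keypoints))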
BRIEF
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Create FAST detector object (BRIEF is a descriptor only, so a detector is needed)
fast = cv2.FastFeatureDetector_create()

# Create BRIEF extractor object
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()

# Determine key points
keypoints = fast.detect(gray, None)

# Obtain descriptors and new final keypoints using BRIEF
keypoints, descriptors = brief.compute(gray, keypoints)
print("Number of keypoints Detected: ", len(keypoints))

# Draw rich keypoints on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - BRIEF', image)
cv2.waitKey()
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 8735
ORB
import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Create ORB object; we can specify the number of key points we desire
orb = cv2.ORB_create()

# Determine key points
keypoints = orb.detect(gray, None)

# Obtain the descriptors
keypoints, descriptors = orb.compute(gray, keypoints)
print("Number of keypoints Detected: ", len(keypoints))

# Draw rich keypoints on input image
image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - ORB', image)
cv2.waitKey()
cv2.destroyAllWindows()
Console Output:
Number of keypoints Detected: 500
We can specify the number of keypoints to detect; the default value is 500, i.e. ORB automatically detects the best 500 keypoints if no value is specified.
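For example, to request more keypoints, the count can be passed to the constructor (a small sketch; 1000 is an arbitrary choice):

# Ask ORB to retain up to 1000 keypoints instead of the default 500
orb = cv2.ORB_create(nfeatures=1000)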
So this is how object detection takes place in OpenCV. The same programs can also be run on a Raspberry Pi with OpenCV installed and used as a portable device, much like a smartphone running Google Lens.