Unit 3: Non-CNN Approaches to Object Recognition

The document discusses the differences between object detection and image classification, highlighting that object detection provides bounding boxes and class labels for identified entities in an image. It covers traditional non-CNN approaches to object detection, such as Haar features and cascading classifiers, as well as the Viola-Jones algorithm. Applications of object detection include facial recognition, autonomous vehicles, and features like smile detection in smartphones.


Object Detection and Instance Segmentation
Topics
• Difference between object detection and image classification
• Traditional, non-CNN approaches to object detection
• R-CNN (Regions with CNN features)
• Fast R-CNN (fast region-based CNN)
• Faster R-CNN
Image classification vs object detection

NEED FOR OBJECT DETECTION
• How confident is the model that the identified entity is the one that is claimed?
• What if we are very confident that there is an entity, say a dog, in the image, but its scale and position in the image are not as prominent as those of its owner, a Person entity?
Image classification vs object detection
• Image classification
• tells you that there is at least one instance of an entity in the image, but not exactly how many of them there are
• does not tell you where the identified entity in the image is
• Object detection
• tells you the placement of each entity in the image
• gives you bounding boxes and class labels (along with the probability of detection) for all the entities identified in an image
Differences between object detection
and image classification
Scenario 1
• Assume you are watching the movie 101 Dalmatians
(https://www.youtube.com/watch?v=nT-pCZyKmcw)
• You want to know how many Dalmatians you can actually count in a given scene from that movie.
• Image classification could, at best, tell you that there is at least one dog or one Dalmatian, but not exactly how many of them there are.
Scenario 2
• You want to extract the image of the dog from the scene to search the web for its breed or for similar dogs.
• The problem is that searching with the whole image might not work, and without identifying individual objects in the image, you have to do the cut-extract-search job manually.
Object detection
• We need a technique that not only identifies the entities in an image but also tells us their placement in the image.
• Object detection gives you bounding boxes and class labels (along with the probability of detection) for all the entities identified in an image.
Applications
• Facial recognition features in Facebook and Google Photos
• Face recognition: Google Photos uses facial recognition technology to identify and group photos of the same person across different albums and time periods.
• Autonomous vehicles
• Object detection helps them detect other vehicles, pedestrians, cyclists, traffic signs, and obstacles on the road, enabling safe navigation.
• Applications in phones
• To find out how many of the guests at your party were actually enjoying it, you can even run object detection for smiling faces, i.e., a smile detector.
• The Smile Shutter feature on phones automatically takes the picture when most of the faces in the scene are detected as smiling.
Object detection

• Object detection may be considered a combination of two tasks:
• Getting the right bounding boxes (or as many of them as possible, to filter later)
• Classifying the object in each bounding box (while returning the classification confidence for filtering)
Object detection
• For region proposals (regions that we send as proposals for classifying objects), we need some mechanism for finding the best values for the following parameters:
• Starting (or center) coordinates from which to extract/draw the candidate bounding box
• Length of the candidate bounding box
• Width of the candidate bounding box
• Stride across each axis (distance from one starting location to the next along the horizontal x-axis and the vertical y-axis)
Object detection
• each object will have a
different scale, so we know
that one fixed value for L and
W for these boxes will not
work.
• extract N number of candidate
boxes per starting coordinate
in the image, where N
encompasses most of the
sizes/scales that may fit
classification problem.
Candidate box generation
• Let l represent the length of the image and w represent the width of the image.
• If we consider all combinations of length and width within the image dimensions, we get l×w candidate boxes for each starting coordinate.
• For example, if our image is 100×100 pixels, considering all possible combinations of length and width within this range would lead to 100×100 = 10,000 candidate boxes per starting coordinate.
• This is computationally expensive and impractical.
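The combinatorics above can be checked with a few lines of arithmetic. This is a minimal sketch of the 100×100 example from the slide; the stride-1 assumption (one starting coordinate per pixel) is mine, added to show how quickly the total explodes.

```python
# Candidate box explosion for a 100x100 image: every (length, width)
# pair up to the image size is a separate candidate box shape at each
# starting coordinate.
img_l, img_w = 100, 100

boxes_per_start = img_l * img_w       # 10,000 shapes per coordinate
total_starts = img_l * img_w          # assume one start per pixel (stride = 1)
total_boxes = boxes_per_start * total_starts

print(boxes_per_start)   # 10000
print(total_boxes)       # 100000000 -- 100 million boxes for one tiny image
```

Even before classifying anything, a brute-force enumeration of shapes and positions is already in the hundreds of millions, which is why the stride and a small fixed set of scales matter so much.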
Object detection
• How many starting coordinates do we need to visit in the image, extracting N boxes from each? This is determined by the stride.
• A big stride means fewer starting points and fewer boxes to look at, but the regions it skips may contain objects of interest in themselves.
• A small stride (say, 1 pixel in each direction) means more starting points and a lot of candidate boxes to examine.
• The choice of stride, i.e., the step size of the sliding window as it moves across the image, is crucial in balancing computational efficiency against the accuracy of object detection.
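The sliding-window enumeration described above can be sketched directly. This is an illustrative implementation, not code from the slides; the function name, the box sizes, and the stride values are my own choices, used only to show how the stride changes the number of candidate boxes.

```python
def candidate_boxes(img_w, img_h, sizes, stride):
    """Enumerate (x, y, w, h) candidate boxes: a sliding window of each
    size in `sizes`, stepping `stride` pixels along both axes."""
    boxes = []
    for (w, h) in sizes:
        # Only positions where the whole box fits inside the image.
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                boxes.append((x, y, w, h))
    return boxes

# Same 100x100 image, same 32x32 window: a big stride yields far fewer
# boxes to classify than a small one.
few = candidate_boxes(100, 100, [(32, 32)], stride=16)
many = candidate_boxes(100, 100, [(32, 32)], stride=2)
print(len(few))    # 25   (5 positions per axis)
print(len(many))   # 1225 (35 positions per axis)
```

Going from stride 16 to stride 2 multiplies the work by roughly 50× for a single scale; with N scales per starting point, the gap widens proportionally, which is the trade-off the slide describes.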
Non-CNN approaches to object detection
• Libraries such as OpenCV include software bundles for smartphones, robotics projects, and many other uses, providing detection capabilities for specific objects (faces, smiles, and so on).

• Traditional approaches:
• Haar features
• Cascading classifiers
• Viola-Jones algorithm
Introduction
 Before CNNs, OpenCV libraries were used for object detection in smartphones, robotics projects, and many other applications.
 These were innovative ideas drawing inspiration from different fields of science and mathematics:
 Haar features
 Cascading classifiers
 Viola-Jones algorithm
 Haar features (from the Haar wavelet in mathematics)
 A Haar classifier, or Haar cascade classifier, is a machine learning object detection program that identifies objects in images and video.
 These features make it easy to find the edges or lines in an image, or to pick out areas where there is a sudden change in pixel intensity.
 Haar, or Haar-like, features are formations of rectangles with varying pixel density.
 They sum up the pixel intensities in adjacent rectangular regions at specific locations in the detection window.
Haar Features

 Based on the difference between the sums of pixel intensities across regions, they categorize the different subsections of the image: two-rectangle features, three-rectangle features, and four-rectangle features.
 They work better on monochrome (grayscale) images.
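The "difference between sums of pixel intensities" can be computed cheaply with an integral image, where any rectangle sum costs four lookups regardless of its size. The sketch below is illustrative, not from the slides: the function names, the toy 4×4 patch, and the feature placement are my own, chosen to show a two-rectangle (horizontal edge) feature firing on a bright-above-dark boundary.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and columns < x
    (one-pixel zero padding on top and left)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of any rectangle in four lookups.
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_horizontal(ii, x, y, w, h):
    """Horizontal-edge response: top-half sum minus bottom-half sum."""
    top = rect_sum(ii, x, y, w, h // 2)
    bottom = rect_sum(ii, x, y + h // 2, w, h // 2)
    return top - bottom

# Toy grayscale patch: bright top rows over dark bottom rows,
# i.e. a strong horizontal edge.
img = [[9, 9, 9, 9],
       [9, 9, 9, 9],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]
ii = integral_image(img)
print(two_rect_horizontal(ii, 0, 0, 4, 4))  # 72 - 8 = 64
```

A large response means the two halves differ sharply in intensity, exactly the edge pattern the two-rectangle feature is meant to detect; on a flat patch the response would be near zero.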
Haar Features
• These features can be grouped into three major categories:
• Two-rectangle features
• Three-rectangle features
• Four-rectangle features

Two-rectangle features
• responsible for finding edges in a horizontal or a vertical direction
Three-rectangle features
• responsible for finding whether there is a lighter region surrounded by darker regions on either side, or vice versa
Four-rectangle features
• responsible for finding changes of pixel intensity across diagonals
Haar Features
(Figure: how a Haar feature traverses an image from left to right.)
 Haar feature challenges:
 Haar feature extraction involves calculating the difference between the sums of pixel intensities in adjacent rectangular regions.
 Even though the vast majority of sub-regions do not contain the target object, the classifier still processes them, leading to unnecessary computational overhead.
 There is no way to efficiently prioritize the analysis of regions more likely to contain the object of interest.
Cascading Classifiers
 A cascading classifier combines multiple Haar features in a hierarchy to build a classifier.
 Instead of analyzing the entire image with every Haar feature, cascades break the detection process into stages, each consisting of a set of features.
 This works because only a small fraction of the pixels in the entire image is related to the object in question.
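The staged early-rejection idea can be sketched in a few lines. This is a toy illustration of the cascade control flow only, not the real OpenCV cascade: the stage functions and thresholds below are invented for the example, standing in for trained Haar-feature tests.

```python
def run_cascade(window, stages):
    """`stages` is a list of (feature_fn, threshold) pairs, cheapest
    first. A window is rejected at the FIRST failing stage, so most
    windows never reach the expensive later stages."""
    for feature_fn, threshold in stages:
        if feature_fn(window) < threshold:
            return False        # early exit: later stages never run
    return True                 # survived every stage

# Illustrative stages over a flat list of pixel intensities:
# a cheap mean-brightness check, then a contrast check on survivors.
stages = [
    (lambda w: sum(w) / len(w), 3.0),   # stage 1: brightness
    (lambda w: max(w) - min(w), 4.0),   # stage 2: contrast
]

print(run_cascade([1, 1, 2, 2], stages))   # False: rejected at stage 1
print(run_cascade([5, 5, 5, 5], stages))   # False: rejected at stage 2
print(run_cascade([2, 9, 2, 9], stages))   # True: passes both stages
```

The key property is that a negative window costs only as many feature evaluations as the stage that rejects it, which is what makes scanning thousands of sub-windows per image tractable.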
 The Viola-Jones algorithm
 capable of delivering detections with high TPRs (True Positive Rates) and low FPRs (False Positive Rates)
 Constraints of the algorithm:
 It could only detect faces, not recognize them.
 The faces had to be present in the image in a frontal view; no other view could be detected.

 Heart of the algorithm: Haar(-like) features and cascading classifiers
 It uses a subset of Haar features to determine general features on a face, such as:
 Eyes (determined by a horizontal two-rectangle feature, with a dark horizontal rectangle above the eye forming the brow, followed by a lighter rectangle below)
 Nose (a vertical three-rectangle feature, with the nose as the central light rectangle and a darker rectangle on either side of the nose, forming the temples), and so on
The Viola-Jones algorithm (continued)
 These Haar-like features are then used in cascading classifiers to speed up the detection problem without losing the robustness of detection.
 Drawback
 Training these cascades for a new object was still very time consuming, and they had a lot of constraints.
Edge detection
