
Real World Computer Vision Applications

Syllabus content
Unit 3: Facial Recognition with Computer Vision

 Facial Recognition with Computer Vision Overview
 Face Detection Algorithm
 Face Detection Implementation
 Test Photographs
 Alternative to OpenCV
 Object Detection with Computer Vision Overview
 Benefits of Object Detection
 Working of Object Detection
 Create a Custom Object Detector
 Use a Pretrained Object Detector
 Other Object Detection Methods

What is Facial Recognition?

Facial recognition is a type of computer vision that uses optical input to analyze images and identify
faces. It is a form of artificial intelligence (AI) that mimics the human ability to recognize faces. Facial
recognition software uses AI, image recognition, and other advanced technologies to map, analyze, and
confirm a face's identity.
Detection is the process of finding a face in an image. Enabled by computer vision, facial recognition can
detect and identify individual faces in an image containing one or many people's faces.

Facial recognition is a system used to identify a person by analyzing the individual's facial features, and the
term also refers to the software that automates the process. It scans the person's face, notes key characteristics,
and compares them to images stored in a database. If the images match, the system confirms the identity.

Two broad categories used to classify facial recognition software are holistic and feature-based:

Holistic models examine your entire face and compare your features to those in images stored in a database.

A feature-based model analyzes your face in more detail, for example by considering measurements between
features and the contours of bones.

First, detection and recognition are different tasks. Face detection is the crucial first part of face recognition:
it determines the number of faces in a picture or video without remembering or storing details. It may estimate
some demographic data, like age or gender, but it cannot recognize individuals.

Face recognition identifies a face in a photo or a video image against a pre-existing database of faces. Faces
first need to be enrolled into the system to create the database of unique facial features. Afterward, the
system breaks down a new image into key features and compares them against the information stored in the
database.

How facial recognition works

Facial recognition software typically follows a three-step process: detection, analysis, and recognition.
 Detect: In the first step, the program searches through an image looking for facial data. It views faces
from the front and side, looking for distinctive features to analyze in the next step.
 Analyze: After identifying a face in an image, the program examines facial landmarks like the distance
from the chin to the forehead and between the eyes. It also considers the shape of different features like
the cheekbones, lips, ears, and more.
 Recognize: In the final step of the process, the facial recognition program applies what it's learned
from the data to verify an individual's identity. It may compare the current image under analysis with a
stored image like one used on a government ID.
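
As a minimal sketch of these three steps, the snippet below uses the third-party face_recognition library (pip install face_recognition); the file names are placeholder examples, not part of any prescribed workflow:

import face_recognition  # third-party library built on dlib

# Detect: find face bounding boxes in the image
image = face_recognition.load_image_file("unknown_person.jpg")
face_locations = face_recognition.face_locations(image)

# Analyze: encode each detected face as a 128-dimensional feature vector
unknown_encodings = face_recognition.face_encodings(image, face_locations)

# Recognize: compare against an encoding from a stored reference image
# (assumes the reference photo contains exactly one face)
reference = face_recognition.load_image_file("id_photo.jpg")
reference_encoding = face_recognition.face_encodings(reference)[0]

for encoding in unknown_encodings:
    match = face_recognition.compare_faces([reference_encoding], encoding)[0]
    print("Identity confirmed" if match else "No match")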

What is facial recognition used for?


Facial recognition software has multiple uses, including protecting access to sensitive information, confirming
identity, and preventing fraud. What was once only seen in sci-fi flicks now has applications in your daily life.
The following list highlights some ways you may see people and organizations using facial recognition
technology:
 Access control: Verify identity before granting access to devices, buildings, and documents.
 Attendance: Scan people as they enter a facility to create an attendance record for work or school.
 Banking: Confirm a customer's identity at ATMs and banking centers to prevent fraud.
 Customer experience: Notify authorities when known shoplifters are in stores, suggest products for
customers, and allow customers to pay for purchases.
 Health care: Improve infection control by reducing the number of touchpoints in a facility, identify
genetic diseases, and monitor patients.
 Investigations: Assist detectives during investigations and ensure officers have arrested the correct
individuals.
 Security: Confirm the identity of individuals, track movements, and prevent unauthorized access to
sensitive locations and equipment.
 Transportation: Verify passengers' identity at airports and border crossings to increase convenience
and security.

What is face detection?


Face detection, also called facial detection, is an artificial intelligence (AI)-based computer technology used
to find and identify human faces in digital images and video. Face detection technology is often used for
surveillance and tracking of people in real time. It is used in various fields including security, biometrics, law
enforcement, entertainment and social media.

Face detection uses machine learning (ML) and artificial neural network (ANN) technology, and plays an
important role in face tracking, face analysis and facial recognition. In face analysis, face detection uses facial
expressions to identify which parts of an image or video should be focused on to determine age, gender and
emotions. In a facial recognition system, face detection data is required to generate a faceprint and match it
with other stored faceprints.

How face detection works


Face detection applications use AI algorithms, ML, statistical analysis and image processing to find human
faces within larger images and distinguish them from nonface objects such as landscapes, buildings and
other human body parts. Before face detection begins, the analyzed media is preprocessed to improve its
quality and remove elements that might interfere with detection.

Face detection algorithms typically start by searching for human eyes, one of the easiest features to detect.
They then try to detect facial landmarks, such as eyebrows, mouth, nose, nostrils and irises. Once the
algorithm concludes that it has found a facial region, it does additional tests to confirm that it has detected a
face.

To ensure accuracy, the algorithms are trained on large data sets that incorporate hundreds of thousands of
positive and negative images. The training improves the algorithms' ability to determine whether there are
faces in an image and where they are.

Face detection software

Face detection software detects faces by identifying facial features in a photo or video using machine
learning algorithms. It first looks for an eye, and from there it identifies other facial features. It then
compares these features to training data to confirm it has detected a face.

Face detection methods

First, the computer examines either a photo or a video image and tries to distinguish faces from any other
objects in the background. There are several methods a computer can use to achieve this, compensating for
illumination, orientation, or camera distance. Yang, Kriegman, and Ahuja presented a classification of face
detection methods. These methods are divided into four categories, and a face detection algorithm can
belong to two or more groups.



Knowledge-based face detection

This method relies on a set of rules developed by humans according to our knowledge: we know that a face
must have a nose, eyes, and a mouth within certain distances and positions relative to each other. The
difficulty with this method is building an appropriate set of rules. If the rules are too general, the system
ends up with many false positives; if they are too detailed, it fails to detect faces that do not fit them. Rules
based on skin color are also fragile, because lighting conditions can change the exact hue of a person's skin
in the picture.

Template matching

The template matching method uses predefined or parameterized face templates to locate or detect faces
through the correlation between the templates and input images. The face model can be constructed from
edges using an edge detection method.

A variation of this approach is the controlled background technique: if you are lucky enough to have a
frontal face image and a plain background, you can remove the background, leaving only the face boundaries.

For this approach, the software has several classifiers for detecting various types of front-on faces and some
for profile faces, such as detectors of eyes, a nose, a mouth, and in some cases even a whole body. While the
approach is easy to implement, it is usually inadequate for face detection on its own.
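
As a rough sketch of template matching with OpenCV, the snippet below slides a small cropped face template over a photo and treats the strongest correlation peak as a candidate face; the file names and the 0.7 threshold are illustrative assumptions:

import cv2

image = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("face_template.jpg", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Compute normalized cross-correlation at every position
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.7:  # arbitrary confidence threshold for illustration
    x, y = max_loc
    cv2.rectangle(image, (x, y), (x + w, y + h), 255, 2)
    cv2.imwrite("match.jpg", image)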

Feature-based face detection

The feature-based method extracts structural features of the face. A classifier is trained on these features
and then used to differentiate between facial and non-facial regions. One example of this method is
color-based face detection, which scans colored images or videos for areas with typical skin color and then
looks for face segments.

Haar feature selection relies on properties common to human faces to form matches from facial features:
the location and size of the eyes, mouth, and bridge of the nose, and the oriented gradients of pixel
intensities. The classic frontal-face cascade uses 38 layers of cascaded classifiers combining a total of 6,061
features. Pre-trained classifiers of this kind ship with OpenCV.

Histogram of Oriented Gradients (HOG) is a feature extractor for object detection. The features extracted
are the distributions (histograms) of the directions of gradients (oriented gradients) in the image. Gradients
are typically large around edges and corners, which allows us to detect those regions. Instead of considering
pixel intensities directly, the method counts the occurrences of gradient vectors representing the light
direction to localize image segments, and it uses overlapping local contrast normalization to improve
accuracy.

Appearance-based face detection

The more advanced appearance-based method depends on a set of representative training face images to
learn face models. It relies on machine learning and statistical analysis to find the relevant characteristics of
face images and extract features from them. This method unites several algorithms:

The eigenface-based algorithm efficiently represents faces using Principal Component Analysis (PCA).
PCA is applied to a set of images to lower the dimension of the dataset while best describing the variance of
the data. In this method, a face can be modeled as a linear combination of eigenfaces (a set of eigenvectors).
Face recognition, in this case, is based on comparing the coefficients of this linear representation.
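
A compact eigenface sketch using scikit-learn's PCA is shown below; the random array is a stand-in for real, aligned grayscale face images, which would normally be flattened into the rows of a matrix:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((100, 64 * 64))   # stand-in for 100 flattened 64x64 face images

pca = PCA(n_components=20)       # keep the top 20 eigenfaces
coeffs = pca.fit_transform(X)    # each face becomes 20 PCA coefficients

# Recognize a new face by nearest neighbor in coefficient space
new_face = rng.random((1, 64 * 64))
new_coeffs = pca.transform(new_face)
distances = np.linalg.norm(coeffs - new_coeffs, axis=1)
print("Closest enrolled face:", int(np.argmin(distances)))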

Face Detection Implementation

We will use the Python OpenCV library, one of the main tools on the market for developing visual
computing applications.

Download a test photograph and the pre-trained Haar cascade classifier XML file
(haarcascade_frontalface_default.xml) used for face detection.

Let's now walk through our code in Python:


import cv2  # OpenCV import
from google.colab.patches import cv2_imshow  # display helper for Colab notebooks

# Load the test image containing one or more faces
img = cv2.imread('/content/imagem-computer-vision.jpg', cv2.IMREAD_UNCHANGED)
cv2_imshow(img)

# Load the pre-trained Haar cascade for frontal face detection
detector_face = cv2.CascadeClassifier('/content/haarcascade_frontalface_default.xml')

# Haar cascades operate on grayscale images, so convert first
imagem_cinza = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2_imshow(imagem_cinza)

# Detect faces: scaleFactor sets the image-pyramid step,
# minSize discards detections smaller than 30x30 pixels
deteccoes = detector_face.detectMultiScale(imagem_cinza, scaleFactor=1.3, minSize=(30, 30))

deteccoes
# array([[1635,  156,  147,  147],
#        [ 284,  262,  114,  114],
#        [1149,  260,  129,  129],
#        [ 928,  491,  171,  171],
#        [ 222,  507,  151,  151]], dtype=int32)

# Each detection is (x, y, width, height); draw a green rectangle around each face
for (x, y, l, a) in deteccoes:
    cv2.rectangle(img, (x, y), (x + l, y + a), (0, 255, 0), 2)

cv2_imshow(img)

In selecting the alternatives to OpenCV, we adopted the following criteria:

 Ease of adoption of the technology
 Usability
 Scalability
 Robustness
 Flexibility

Below is a list of my alternatives, following the criteria above:

1) Microsoft Computer Vision API
2) AWS Rekognition
3) Google Cloud Vision API
4) Scikit-Image
5) SimpleCV
6) Azure Face API
7) DeepDream
8) IBM Watson Visual Recognition
9) Clarifai
10) DeepPy

Object Detection

Object detection is a computer vision technique for locating instances of objects in images or videos.
Object detection algorithms typically leverage machine learning or deep learning to produce meaningful
results, and they can identify multiple objects in a single image or video, telling you both whether an object
is present and where it is. When humans look at images or video, we can recognize and locate objects of
interest within a matter of moments; the goal of object detection is to replicate this intelligence using a
computer.

Object detection models typically identify objects from a known set of classes, such as people, cars, or
animals. For example, if a picture of two dogs is given to an object detection model, the model might draw a
box around each dog and label each box "dog".

Why is object detection required?

Object detection is a key technology behind advanced driver assistance systems (ADAS) that enable cars to
detect driving lanes or perform pedestrian detection to improve road safety. Object detection is also useful in
applications such as video surveillance or image retrieval systems.

How object detection works


Object Detection Using Deep Learning

You can use a variety of techniques to perform object detection. Popular deep learning–based approaches
using convolutional neural networks (CNNs), such as R-CNN and YOLO v2, automatically learn to detect
objects within images.

You can choose from two key approaches to get started with object detection using deep learning:

 Create and train a custom object detector. To train a custom object detector from scratch, you
need to design a network architecture to learn the features for the objects of interest. You also need to
compile a very large set of labeled data to train the CNN. The results of a custom object detector can
be remarkable. That said, you need to manually set up the layers and weights in the CNN, which
requires a lot of time and training data.

 Use a pretrained object detector. Many object detection workflows using deep learning
leverage transfer learning, an approach that enables you to start with a pretrained network and then
fine-tune it for your application. This method can provide faster results because the object detectors
have already been trained on thousands, or even millions, of images.

[Figure: Detecting a stop sign using a pretrained R-CNN.]
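
As an illustration of the pretrained route (not necessarily the exact model pictured above), the sketch below loads a Faster R-CNN pretrained on COCO from torchvision and runs it on a placeholder image; the weights="DEFAULT" argument assumes torchvision 0.13 or newer:

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("street.jpg").convert("RGB")  # placeholder file name
with torch.no_grad():
    predictions = model([to_tensor(image)])[0]

# Each prediction carries bounding 'boxes', class 'labels', and 'scores'
for box, label, score in zip(predictions["boxes"],
                             predictions["labels"],
                             predictions["scores"]):
    if score > 0.8:  # keep only confident detections
        print(int(label), [round(float(v), 1) for v in box])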

Whether you create a custom object detector or use a pretrained one, you will need to decide what type of
object detection network you want to use: a two-stage network or a single-stage network.
Two-Stage Networks

The initial stage of two-stage networks, such as R-CNN and its variants, identifies region proposals, or
subsets of the image that might contain an object. The second stage classifies the objects within the region
proposals. Two-stage networks can achieve very accurate object detection results; however, they are
typically slower than single-stage networks.

[Figure: High-level architecture of R-CNN (top) and Fast R-CNN (bottom) object detection.]

Single-Stage Networks

In single-stage networks, such as YOLO v2, the CNN produces network predictions for regions across the
entire image using anchor boxes, and the predictions are decoded to generate the final bounding boxes for
the objects. Single-stage networks can be much faster than two-stage networks, but they may not reach the
same level of accuracy, especially for scenes containing small objects.

[Figure: Overview of YOLO v2 object detection.]
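
YOLO v2 itself is rarely run directly today; as a modern single-stage stand-in, the sketch below uses the Ultralytics YOLO package (pip install ultralytics), which likewise predicts all boxes in one forward pass over the image; the image path is a placeholder:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")     # small pretrained model, downloaded automatically
results = model("street.jpg")  # run single-stage detection on one image

for result in results:
    for box in result.boxes:
        cls = int(box.cls[0])      # predicted class index
        conf = float(box.conf[0])  # confidence score
        print(result.names[cls], round(conf, 2), box.xyxy[0].tolist())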

Object Detection Using Machine Learning

Machine learning techniques are also commonly used for object detection, and they offer different
approaches than deep learning. Common machine learning techniques include:

 Aggregate channel features (ACF)

 SVM classification using histograms of oriented gradient (HOG) features

 The Viola-Jones algorithm for human face or upper body detection

[Figure: Tracking pedestrians using an ACF object detection algorithm.]

Similar to deep learning–based approaches, you can choose to start with a pretrained object detector or
create a custom object detector to suit your application. You will need to manually select the identifying
features for an object when using machine learning, compared with automatic feature selection in a deep
learning–based workflow.
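
Of these techniques, the HOG-plus-SVM combination is easy to try, because OpenCV ships a pre-trained pedestrian detector; a minimal sketch, with a placeholder file name:

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("pedestrians.jpg")  # placeholder file name
# winStride controls the sliding-window step; scale is the pyramid step
rects, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)

for (x, y, w, h) in rects:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite("detected.jpg", image)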

Machine Learning vs. Deep Learning for Object Detection

Determining the best approach for object detection depends on your application and the problem you’re
trying to solve. The main consideration to keep in mind when choosing between machine learning and deep
learning is whether you have a powerful GPU and lots of labeled training images. If the answer to either of
these questions is no, a machine learning approach might be the better choice. Deep learning techniques tend
to work better when you have more images, and GPUs decrease the time needed to train the model.

Other Object Detection Methods

In addition to deep learning– and machine learning–based object detection, there are several other common
techniques that may be sufficient depending on your application, such as:
 Image segmentation and blob analysis, which uses simple object properties such as size, shape, or
color

 Feature-based object detection, which uses feature extraction, matching, and RANSAC to estimate
the location of an object
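
A hedged sketch of the feature-based route with OpenCV: ORB keypoints are matched between a reference image of the object and a scene, and RANSAC fits a homography while rejecting outlier matches (the file names are placeholders, and at least four good matches are assumed):

import cv2
import numpy as np

obj = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(obj, None)
kp2, des2 = orb.detectAndCompute(scene, None)

# Brute-force Hamming matching suits ORB's binary descriptors
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC estimates the object's location in the scene despite bad matches
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("Inlier matches:", int(mask.sum()))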

What are the alternatives to OpenCV?

OpenCV is one of the most popular and widely-used libraries for computer vision tasks.
However, there are several other libraries and frameworks available that offer alternatives to
OpenCV, each with its own set of features, strengths, and weaknesses.

Alternatives to OpenCV for Computer Vision


Below are some of the top alternatives of OpenCV for computer vision in Python:
TensorFlow
TensorFlow, developed by Google, is primarily known as a deep learning framework.
However, it also provides a comprehensive set of tools and APIs for computer vision tasks
through its image-processing module (tf.image). TensorFlow offers high-level
abstractions for building and training deep neural networks for image classification, object
detection, segmentation, and more.

PyTorch
PyTorch, developed by Facebook, is another popular deep learning framework widely used in
the research community. PyTorch offers a flexible and intuitive interface for building custom
neural networks for various computer vision tasks. It provides dynamic computation graphs,
making it easy to experiment with different network architectures and algorithms.

scikit-image
scikit-image is a Python library specifically designed for image processing tasks. It provides a
collection of algorithms and functions for image filtering, feature extraction, segmentation,
and more. scikit-image is built on top of NumPy, making it easy to integrate with other
scientific computing libraries in the Python ecosystem.

Dlib
Dlib is a C++ library that offers a wide range of tools and algorithms for machine learning,
computer vision, and image processing. It is known for its robust implementation of facial
landmark detection, object tracking, and facial recognition algorithms. Dlib also provides
Python bindings for easy integration into Python projects.

SimpleCV
SimpleCV is a Python framework designed to make computer vision tasks accessible to
beginners and non-experts. It provides a high-level interface for common computer vision
tasks, such as image acquisition, processing, feature extraction, and object detection.
SimpleCV abstracts away much of the complexity involved in computer vision, making it
suitable for rapid prototyping and experimentation.

Caffe
Caffe is a deep learning framework developed by Berkeley AI Research (BAIR). While it is
primarily focused on deep learning tasks, Caffe also includes modules for computer vision
tasks such as image classification, object detection, and segmentation. Caffe is known for its
speed and efficiency, particularly in training large-scale convolutional neural networks
(CNNs).

MXNet
MXNet is a deep learning framework that offers support for both symbolic and imperative
programming models. It provides a comprehensive set of tools and APIs for building and
deploying deep learning models for computer vision tasks. MXNet’s flexibility and scalability
make it suitable for both research and production environments.

What is Object Detection?

Object detection is a computer vision technique that uses machine learning or deep learning to
locate and classify objects in images or videos. The goal is to develop computational models
that can answer the fundamental question, "What objects are where?"

Object detection is a technique that uses neural networks to localize and classify objects in
images. This computer vision task has a wide range of applications, from medical imaging to
self-driving cars.

Object detection can be used in many areas, including medical imaging, self-driving cars,
image retrieval, video surveillance, and food manufacturing.

To train an object detection model, you need to create a neural network and show it images of an
object in different scenarios. You then label the object and its location.

Understanding Object Detection


Object detection primarily aims to answer two critical questions about any image: “Which
objects are present?” and “Where are these objects situated?” This process involves both
object classification and localization:
 Classification: This step determines the category or type of one or more objects
within the image, such as a dog, car, or tree.
 Localization: This involves accurately identifying and marking the position of an
object in the image, typically using a bounding box to outline its location.
Key Components of Object Detection
1. Image Classification
Image classification assigns a label to an entire image based on its content. While it’s a crucial
step in understanding visual data, it doesn’t provide information about the object’s location
within the image.
2. Object Localization
Object localization goes a step further by not only identifying the object but also determining
its position within the image. This involves drawing bounding boxes around the objects.
3. Object Detection
Object detection merges image classification and localization. It detects multiple objects in an
image, assigns labels to them, and provides their locations through bounding boxes.
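
To make the three components concrete, here is a purely illustrative (hypothetical) detection result: each entry carries a class label (classification), a bounding box (localization), and a confidence score:

detections = [
    {"label": "dog", "box": (48, 120, 210, 330), "score": 0.97},
    {"label": "dog", "box": (260, 140, 420, 345), "score": 0.93},
]

for d in detections:
    x1, y1, x2, y2 = d["box"]  # box corners in pixel coordinates
    print(f'{d["label"]} ({d["score"]:.0%}) at ({x1},{y1})-({x2},{y2})')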

Working of object detection (how object detection works)

 Looking at the Picture: Imagine a computer looking at a picture.
 Finding Clues: The computer looks for clues like shapes, colors, and patterns in the picture.
 Guessing What's There: Based on those clues, it makes guesses about what might be in the picture.
 Checking the Guesses: It checks each guess by comparing it to things it already knows.
 Drawing Boxes: If it's pretty sure about something, it draws a box around it to show where it thinks the object is.
 Making Sure: Finally, it double-checks its guesses to make sure it got things right and fixes any mistakes.

How to create a custom object detector:

Creating a custom object detector in computer vision involves several steps, including preparing the dataset,
training the model, and evaluating the results.

Preparing the dataset


 Collect images: Gather photos of the objects you want to detect.
 Annotate images: Label the objects in the images with bounding boxes.
 Split data: Divide the dataset into training and testing sets.
Training the model
 Choose a model architecture: Select a model like YOLO or EfficientNet Lite.
 Train the model: Use the training data to train the model.
 Evaluate the model: Assess the model's performance using the testing data.
Additional steps
 Convert to TFRecord format: If using TensorFlow, convert the annotations to TFRecord format.
 Create a configuration file: Configure the training process.
 Export the model: Save the model for use in other applications.
Tools and resources
 TensorFlow Object Detection API: A popular framework for training object detectors.
 Detecto: A library that allows training a model with just a few lines of code.
 TensorFlow Lite Model Maker: A tool for training object detection models for edge devices.
 YOLOv7: A step-by-step guide for training a custom object detector with YOLOv7.
 Custom Vision Service: A Microsoft Azure service for building object detectors.
 MATLAB: A toolbox that allows training custom object detectors.
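
As one possible end-to-end sketch of these steps, the Ultralytics YOLO package condenses training, evaluation, and export into a few lines; "data.yaml" is a placeholder pointing at your own annotated, train/test-split dataset:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")                # start from pretrained weights
model.train(data="data.yaml", epochs=50)  # train on your labeled images
metrics = model.val()                     # evaluate on the validation split
model.export(format="tflite")             # export for edge deployment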

Optical Character Recognition with Computer Vision

Optical Character Recognition (OCR) is a machine-learning-based technology that uses computer vision and
pattern recognition to convert text from images and documents into a machine-readable format. This format
can be digitally modified and used in machine processes, such as cognitive computing, or presented on the
web.

OCR can be used to extract text from many types of documents, including:

forms, invoices, articles, reports, street signs, product labels, and posters.

OCR works by analyzing the structure of an image, dividing it into elements like blocks of text, tables, or
images. The lines are then divided into words and characters, which are compared to a set of pattern
images. Once a character is identified, it's converted into an ASCII code, which computer systems can use
for further manipulation.

Examples of OCR engines include text extraction tools, PDF-to-.txt converters, and Google's image search
function.
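
As a minimal sketch, this kind of text extraction can be tried with pytesseract, a Python wrapper around the Tesseract OCR engine (both the wrapper and the engine must be installed; the file name is a placeholder):

import pytesseract
from PIL import Image

image = Image.open("invoice.jpg")           # any scanned document or photo
text = pytesseract.image_to_string(image)   # run OCR and get plain text
print(text)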

History of OCR

Early OCR technologies were purely mechanical, dating back to 1914 when “Emanuel Goldberg developed
a machine that could read characters and then converted them into standard telegraph code” (Dhavale,
2017, p. 91). Goldberg continued his research into the 1920s and 1930s when he developed a system that
searched microfilm (scaled-down documents, typically films, newspapers, journals, etc.) for characters and
then OCR’d them.

In 1974, Ray Kurzweil and Kurzweil Computer Products, Inc. continued developing OCR systems,
mainly focusing on creating a "reading machine for the blind." Kurzweil's work caught the attention of
industry leader Xerox, which wished to commercialize the software further and develop OCR applications
for document understanding.
