CV Unit 3
Syllabus content
Unit 3: Facial Recognition with Computer Vision
Facial recognition is a type of computer vision that uses optical input to analyze images and identify
faces. It's a form of artificial intelligence (AI) that mimics the human ability to recognize faces. Facial
recognition software uses AI, image recognition, and other advanced technologies to map, analyze, and
confirm a face's identity.
Detection is the process of finding a face in an image. Enabled by computer vision, facial recognition can
detect and identify individual faces from an image containing one or many people's faces.
Facial recognition is a system used to identify a person by analyzing the individual's facial features, and the
term also refers to the software that automates the process. It scans the person's face, notes key characteristics,
and compares them with images stored in a database. If a stored image matches, the system confirms the identity.
Two broad categories used to classify facial recognition software are holistic and feature-based:
Holistic models examine your entire face and compare your features to those in images stored in a database.
A feature-based model analyzes your face in more detail, for example by considering measurements between
features and the contours of bones.
Real World Computer Vision Applications
First, detection and recognition are different tasks. Face detection, the first step of face recognition,
determines how many faces appear in a picture or video, and where, without remembering or storing details. It may
estimate some demographic data like age or gender, but it cannot recognize individuals.
Face recognition identifies a face in a photo or video image against a pre-existing database of faces. Faces
must first be enrolled into the system to create the database of unique facial features. Afterward, the
system breaks down a new image into key features and compares them against the information stored in the
database.
Facial recognition software typically follows a three-step process: detection, analysis, and recognition.
Detect: In the first step, the program searches through an image looking for facial data. It views faces
from the front and side, looking for distinctive features to analyze in the next step.
Analyze: After identifying a face in an image, the program examines facial landmarks like the distance
from the chin to the forehead and between the eyes. It also considers the shape of different features like
the cheekbones, lips, ears, and more.
Recognize: In the final step of the process, the facial recognition program applies what it's learned
from the data to verify an individual's identity. It may compare the current image under analysis with a
stored image like one used on a government ID.
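The recognize step can be sketched as a nearest-neighbor search over enrolled faceprints. This is a minimal sketch that assumes the detect and analyze steps have already turned each face into a numeric feature vector; the vectors, names, and the 0.25 distance threshold are hypothetical. Returning None when no stored face is close enough keeps an unknown person from being matched to whichever enrolled face happens to be nearest.

```python
import math

# Hypothetical enrolled database: identity -> stored feature vector ("faceprint").
ENROLLED = {
    "alice": [0.10, 0.80, 0.30],
    "bob": [0.90, 0.20, 0.40],
}


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(probe, database, threshold=0.25):
    """Return the closest enrolled identity, or None if nobody is within `threshold`."""
    best_name, best_dist = None, float("inf")
    for name, stored in database.items():
        dist = euclidean(probe, stored)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None
```

For example, recognize([0.12, 0.78, 0.31], ENROLLED) returns "alice", while a probe far from both stored vectors, such as [0.5, 0.5, 0.9], returns None.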
Face detection uses machine learning (ML) and artificial neural network (ANN) technology, and plays an
important role in face tracking, face analysis, and facial recognition. In face analysis, face detection identifies
which parts of an image or video should be focused on, for example to read facial expressions, so that age,
gender, and emotions can be determined. In a facial recognition system, face detection data is required to
generate a faceprint and match it with other stored faceprints.
Face detection algorithms typically start by searching for human eyes, one of the easiest features to detect.
They then try to detect facial landmarks, such as eyebrows, mouth, nose, nostrils and irises. Once the
algorithm concludes that it has found a facial region, it does additional tests to confirm that it has detected a
face.
To ensure accuracy, the algorithms are trained on large data sets that incorporate hundreds of thousands of
positive and negative images. The training improves the algorithms' ability to determine whether there are
faces in an image and where they are.
First, the computer examines either a photo or a video image and tries to distinguish faces from any other
objects in the background. Several methods can do this while compensating for differences in illumination,
orientation, or camera distance. Yang, Kriegman, and Ahuja presented a classification of face detection
methods into four categories: knowledge-based, feature-based (feature invariant), template matching, and
appearance-based. A given face detection algorithm may belong to more than one category.
Knowledge-based
This method relies on a set of rules developed by humans according to our knowledge. We know that a face
must have a nose, eyes, and mouth within certain distances and positions relative to each other. The problem with
this method is building an appropriate set of rules. If the rules are too general, the system ends up with many
false positives; if they are too detailed, it misses real faces. Rules based on skin color have a further weakness:
they do not work for all skin colors and depend on lighting conditions that can change the exact hue of a person’s
skin in the picture.
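A minimal sketch of a knowledge-based check, assuming an earlier step has already proposed (x, y) positions for the eyes, nose, and mouth, with y growing downward as in image coordinates. The rules and the 0.25, 0.3, 0.5, and 1.5 thresholds are hypothetical illustrations; adjusting them shows how hard it is to find a rule set that is neither too loose nor too strict.

```python
def looks_like_face(left_eye, right_eye, nose, mouth):
    """Apply hypothetical geometric rules to (x, y) landmark positions (y grows downward)."""
    eye_dist = right_eye[0] - left_eye[0]
    if eye_dist <= 0:
        return False  # the right eye must lie to the right of the left eye
    eyes_x = (left_eye[0] + right_eye[0]) / 2
    eyes_y = (left_eye[1] + right_eye[1]) / 2
    rules = [
        abs(left_eye[1] - right_eye[1]) < 0.25 * eye_dist,   # eyes roughly level
        eyes_y < nose[1] < mouth[1],                          # eyes above nose above mouth
        abs(nose[0] - eyes_x) < 0.3 * eye_dist,               # nose centered under the eyes
        abs(mouth[0] - eyes_x) < 0.3 * eye_dist,              # mouth centered under the eyes
        0.5 * eye_dist < mouth[1] - eyes_y < 1.5 * eye_dist,  # plausible eye-to-mouth distance
    ]
    return all(rules)
```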
Template matching
The template matching method uses predefined or parameterized face templates to locate or detect faces
by computing the correlation between the predefined or deformable templates and the input image. The face model can be …
A variation of this approach is the controlled background technique. If you have a frontal face
image against a plain background, you can remove the background, leaving only the face boundaries.
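Template matching by correlation can be sketched in pure Python, assuming a grayscale image and a fixed face template stored as 2D lists of intensities. The sketch uses normalized cross-correlation, which is insensitive to overall brightness and contrast, and deliberately brute-forces every position; libraries such as OpenCV provide optimized versions (cv2.matchTemplate).

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists (1.0 = perfect match)."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0  # flat patch -> 0


def match_template(image, template):
    """Slide the template over the image; return (best_score, (row, col) of top-left corner)."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # NCC is always >= -1, so any real score beats this
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best
```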
For this approach, the software has several classifiers for detecting various types of front-on faces and some
for profile faces, such as detectors for eyes, noses, mouths, and in some cases even a whole body.
The feature-based method extracts structural features of the face. It is trained as a classifier and then used to
differentiate facial and non-facial regions. One example of this method is color-based face detection that
scans colored images or videos for areas with typical skin color and then looks for face segments.
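The color-based idea can be sketched with one widely cited fixed-threshold RGB rule for skin under uniform daylight: R > 95, G > 40, B > 20, a spread of more than 15 between the largest and smallest channel, |R - G| > 15, and R larger than both G and B. Treat these thresholds as an illustrative assumption; fixed color rules like this do not hold for all skin tones or lighting conditions.

```python
def is_skin(r, g, b):
    """Fixed-threshold RGB skin rule (illustrative; tuned for uniform daylight)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_mask(image):
    """image: 2D list of (r, g, b) tuples -> 2D list of booleans marking skin-colored pixels."""
    return [[is_skin(*pixel) for pixel in row] for row in image]
```

The resulting mask would then be searched for connected skin regions, which are checked for face segments such as eyes and mouth.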
Haar Feature Selection relies on similar properties of human faces to form matches from facial features:
location and size of the eye, mouth, bridge of the nose, and the oriented gradients of pixel intensities. The
detector uses 38 layers of cascaded classifiers with a total of 6,061 features for each frontal face. Pre-trained
Haar cascade classifiers are available, for example as the XML files distributed with OpenCV.
Histogram of Oriented Gradients (HOG) is a feature extractor for object detection. The features extracted
are the distribution (histograms) of directions of gradients (oriented gradients) of the image.
Gradients are typically large around edges and corners, which allows us to detect those regions. Instead of
considering raw pixel intensities, HOG counts the occurrences of gradient orientations, which represent the
direction in which intensity changes, in localized portions of the image. The method uses overlapping local
contrast normalization to improve accuracy.
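A simplified sketch of the core HOG step for a single cell, assuming the cell is a small grayscale 2D list: gradients come from central differences, unsigned orientations (0 to 180 degrees) are split into 9 bins, and each pixel votes its gradient magnitude into one bin. Full HOG implementations also interpolate votes between neighboring bins and normalize histograms over overlapping blocks; only a plain L2 normalization helper is shown here.

```python
import math


def cell_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram (0-180 degrees) for one cell.

    Border pixels are skipped because central differences need a neighbor on each side.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal intensity change
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical intensity change
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold into 0-180
            hist[int(angle // bin_width) % bins] += magnitude  # vote weighted by magnitude
    return hist


def l2_normalize(hist, eps=1e-6):
    """Scale a histogram to unit length so overall contrast changes matter less."""
    norm = math.sqrt(sum(v * v for v in hist) + eps * eps)
    return [v / norm for v in hist]
```

A vertical edge (intensity changing left to right) puts all its votes in the 0-degree bin, while a horizontal edge lands in the bin covering 90 degrees.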
The more advanced appearance-based method depends on a set of representative training face images to learn
face models. It relies on machine learning and statistical analysis to find the relevant characteristics of face
images and extract features from them. This category includes several algorithms:
The eigenface-based algorithm efficiently represents faces using Principal Component Analysis (PCA). PCA is
applied to a set of face images to reduce the dimensionality of the dataset while preserving as much of its
variance as possible. In this method, a face can be modeled as a linear combination of eigenfaces (a set of
eigenvectors). Face recognition, …
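A minimal NumPy sketch of the eigenface idea, assuming faces are already aligned, equally sized grayscale images flattened into rows. PCA is computed with numpy.linalg.svd on the mean-centered data, whose right singular vectors are the eigenvectors of the data covariance. The function names are hypothetical, and nearest-neighbor matching of eigenface weights is one common recognition rule, assumed here rather than taken from the text.

```python
import numpy as np


def fit_eigenfaces(faces, k):
    """faces: (n_samples, n_pixels) array of flattened faces -> (mean face, k eigenfaces)."""
    mean = faces.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:k]


def project(face, mean, eigenfaces):
    """Weights of the linear combination of eigenfaces that best approximates `face`."""
    return eigenfaces @ (face - mean)


def reconstruct(weights, mean, eigenfaces):
    """Rebuild a face as the mean face plus a weighted sum of eigenfaces."""
    return mean + weights @ eigenfaces


def recognize(face, gallery_weights, labels, mean, eigenfaces):
    """Hypothetical rule: return the label of the nearest gallery face in weight space."""
    w = project(face, mean, eigenfaces)
    distances = np.linalg.norm(gallery_weights - w, axis=1)
    return labels[int(np.argmin(distances))]
```

Because each face is reduced to k weights instead of thousands of pixels, comparisons in the gallery become cheap.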
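We will use the Python OpenCV library, one of the main tools for developing computer vision applications.
Download a test image and the pre-trained Haar cascade classifier file (haarcascade_frontalface_default.xml),
which was trained to detect frontal faces. The code below runs in Google Colab:
import cv2
from google.colab.patches import cv2_imshow  # Colab replacement for cv2.imshow
detector_face = cv2.CascadeClassifier('/content/haarcascade_frontalface_default.xml')
img = cv2.imread('/content/image.jpg')  # hypothetical file name: use your uploaded image
imagem_cinza = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Haar cascades run on grayscale images
for (x, y, l, a) in detector_face.detectMultiScale(imagem_cinza):  # l = width, a = height
    cv2.rectangle(img, (x, y), (x + l, y + a), (0, 255, 0), 2)  # green box around each face
cv2_imshow(img)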
Usability
Scalability
Robustness
Flexibility
2) AWS Rekognition
4) Scikit-Image
5) SimpleCV
6) Azure Face API
7) DeepDream
9) Clarifai
10) DeepPy
Object Detection
Object detection is a computer vision technique that uses machine learning or deep learning algorithms to
identify and locate objects in images or videos. Object detection models can identify multiple objects in a
single image or video, and can tell you if an object is present and where it is.
Object detection is a computer vision technique for locating instances of objects in images or videos. Object
detection algorithms typically leverage machine learning or deep learning to produce meaningful results.
When humans look at images or video, we can recognize and locate objects of interest within a matter of
moments. The goal of object detection is to replicate this intelligence using a computer.
Object detection models typically identify objects from a known set of classes, such as people, cars, or
animals. For example, if a picture of two dogs is given to an object detection model, the model might draw a
box around each dog and label the box "dog".
Object detection is a key technology behind advanced driver assistance systems (ADAS) that enable cars to
detect driving lanes or perform pedestrian detection to improve road safety. Object detection is also useful in
applications such as video surveillance or image retrieval systems.
You can use a variety of techniques to perform object detection. Popular deep learning–based approaches
using convolutional neural networks (CNNs), such as R-CNN and YOLO v2, automatically learn to detect
objects within images.
You can choose from two key approaches to get started with object detection using deep learning:
Create and train a custom object detector. To train a custom object detector from scratch, you
need to design a network architecture to learn the features for the objects of interest. You also need to
compile a very large set of labeled data to train the CNN. The results of a custom object detector can
be remarkable. That said, you need to manually set up the layers and weights in the CNN, which
requires a lot of time and training data.
Use a pretrained object detector. Many object detection workflows using deep learning
leverage transfer learning, an approach that enables you to start with a pretrained network and then
fine-tune it for your application. This method can provide faster results because the object detectors
have already been trained on thousands, or even millions, of images.
Whether you create a custom object detector or use a pretrained one, you will need to decide what type of
object detection network you want to use: a two-stage network or a single-stage network.
Two-Stage Networks
The initial stage of two-stage networks, such as R-CNN and its variants, identifies region proposals, or
subsets of the image that might contain an object. The second stage classifies the objects within the region
proposals. Two-stage networks can achieve very accurate object detection results; however, they are
typically slower than single-stage networks.
High-level architecture of R-CNN (top) and Fast R-CNN (bottom) object detection.
Single-Stage Networks
In single-stage networks, such as YOLO v2, the CNN produces network predictions for regions across the
entire image using anchor boxes, and the predictions are decoded to generate the final bounding boxes for
the objects. Single-stage networks can be much faster than two-stage networks, but they may not reach the
same level of accuracy, especially for scenes containing small objects.
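For YOLO v2, each raw prediction (t_x, t_y, t_w, t_h) is decoded relative to its grid cell and anchor box as b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h), where (c_x, c_y) is the cell offset and (p_w, p_h) is the anchor size in grid units. A minimal sketch that decodes one prediction at a time, assuming a hypothetical stride parameter that converts grid units to pixels:

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_yolov2(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Turn one raw prediction into (center_x, center_y, width, height) in pixels.

    cell_x, cell_y: grid-cell indices; anchor_w, anchor_h: anchor size in grid units;
    stride: pixels per grid cell (assumed unit-conversion parameter).
    """
    bx = (sigmoid(tx) + cell_x) * stride  # sigmoid keeps the center inside its cell
    by = (sigmoid(ty) + cell_y) * stride
    bw = anchor_w * math.exp(tw) * stride  # exp scales the anchor up or down
    bh = anchor_h * math.exp(th) * stride
    return bx, by, bw, bh
```

With all raw outputs at zero, the box sits at the center of its cell with exactly the anchor's size: decode_yolov2(0, 0, 0, 0, 3, 4, 2.0, 1.5, 32) gives (112.0, 144.0, 64.0, 48.0).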
Machine learning techniques are also commonly used for object detection, and they offer different
approaches than deep learning. Common machine learning techniques include:
Similar to deep learning–based approaches, you can choose to start with a pretrained object detector or
create a custom object detector to suit your application. You will need to manually select the identifying
features for an object when using machine learning, compared with automatic feature selection in a deep
learning–based workflow.
Determining the best approach for object detection depends on your application and the problem you’re
trying to solve. The main considerations when choosing between machine learning and deep learning are
whether you have a powerful GPU and whether you have lots of labeled training images. If the answer to either of
these questions is no, a machine learning approach might be the better choice. Deep learning techniques tend
to work better when you have more images, and GPUs decrease the time needed to train the model.
In addition to deep learning– and machine learning–based object detection, there are several other common
techniques that may be sufficient depending on your application, such as:
Image segmentation and blob analysis, which uses simple object properties such as size, shape, or
color
Feature-based object detection, which uses feature extraction, matching, and RANSAC to estimate
the location of an object
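Blob analysis can be sketched as connected-component labeling on a binary mask (for example, the output of a color threshold), followed by a size filter. This pure-Python version assumes 4-connectivity and returns each blob's area and bounding box; the function name and the min_area parameter are illustrative.

```python
from collections import deque


def find_blobs(mask, min_area=1):
    """Label 4-connected truthy regions of a 2D mask.

    Returns a list of (area, (min_row, min_col, max_row, max_col)) for blobs with area >= min_area.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])  # breadth-first flood fill from this seed pixel
            seen[r][c] = True
            area, r0, c0, r1, c1 = 0, r, c, r, c
            while queue:
                y, x = queue.popleft()
                area += 1
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:  # size filter: drop blobs that are too small
                blobs.append((area, (r0, c0, r1, c1)))
    return blobs
```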
OpenCV is one of the most popular and widely-used libraries for computer vision tasks.
However, there are several other libraries and frameworks available that offer alternatives to
OpenCV, each with its own set of features, strengths, and weaknesses.
PyTorch
PyTorch, developed by Facebook, is another popular deep learning framework widely used in
the research community. PyTorch offers a flexible and intuitive interface for building custom
neural networks for various computer vision tasks. It provides dynamic computation graphs,
making it easy to experiment with different network architectures and algorithms.
scikit-image
scikit-image is a Python library specifically designed for image processing tasks. It provides a
collection of algorithms and functions for image filtering, feature extraction, segmentation,
and more. scikit-image is built on top of NumPy, making it easy to integrate with other
scientific computing libraries in the Python ecosystem.
Dlib
Dlib is a C++ library that offers a wide range of tools and algorithms for machine learning,
computer vision, and image processing. It is known for its robust implementation of facial
landmark detection, object tracking, and facial recognition algorithms. Dlib also provides
Python bindings for easy integration into Python projects.
SimpleCV
SimpleCV is a Python framework designed to make computer vision tasks accessible to
beginners and non-experts. It provides a high-level interface for common computer vision
tasks, such as image acquisition, processing, feature extraction, and object detection.
SimpleCV abstracts away much of the complexity involved in computer vision, making it
suitable for rapid prototyping and experimentation.
Caffe
Caffe is a deep learning framework developed by Berkeley AI Research (BAIR). While it is
primarily focused on deep learning tasks, Caffe also includes modules for computer vision
tasks such as image classification, object detection, and segmentation. Caffe is known for its
speed and efficiency, particularly in training large-scale convolutional neural networks
(CNNs).
MXNet
MXNet is a deep learning framework that offers support for both symbolic and imperative
programming models. It provides a comprehensive set of tools and APIs for building and
deploying deep learning models for computer vision tasks. MXNet’s flexibility and scalability
make it suitable for both research and production environments.
Object detection is a computer vision technique that uses machine learning or deep learning to
locate and classify objects in images or videos. The goal is to develop computational models
that can answer the fundamental question, "What objects are where?"
Object detection can be used in many areas, including medical imaging, self-driving cars,
image retrieval, video surveillance, and food manufacturing.
To train an object detection model, you need to create a neural network and show it images of an
object in different scenarios, each labeled with the object's class and its location.
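Locations are usually stored as one bounding box per object. As one common convention (the YOLO text label format, assumed here), each line holds the class index followed by the box center, width, and height, all normalized by the image size so that labels stay valid if the image is resized. A hypothetical helper that converts a pixel box:

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """box = (x_min, y_min, x_max, y_max) in pixels -> 'class xc yc w h' normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w  # box center, as a fraction of image width
    yc = (y_min + y_max) / 2 / img_h  # box center, as a fraction of image height
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For a 400 x 500 image, to_yolo_label(0, (100, 50, 300, 250), 400, 500) returns "0 0.500000 0.300000 0.500000 0.400000".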
Or, in simpler terms:
Finding Clues: The computer looks for clues like shapes, colors, and patterns in the picture.
Guessing What’s There: Based on those clues, it makes guesses about what might be in the
picture.
Checking the Guesses: It checks each guess by comparing it to things it already knows.
Drawing Boxes: If it’s pretty sure about something, it draws a box around it to show where it is.
Making Sure: Finally, it double-checks its guesses to make sure it got things right and fixes any
mistakes.
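One common form of this double-checking is non-maximum suppression (NMS): when several boxes overlap the same object, only the highest-scoring one is kept. A minimal sketch, assuming boxes are (x1, y1, x2, y2) pixel tuples and an illustrative overlap threshold of 0.5 measured as intersection over union (IoU):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes around one dog collapse to the more confident one, while a box elsewhere in the image survives.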
Creating a custom object detector in computer vision involves several steps, including preparing the dataset,
training the model, and evaluating the results.
Split data: Divide the dataset into training and testing sets.
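A minimal sketch of the split step, assuming the dataset is a list of annotated image file names. The 80/20 ratio and the fixed seed are illustrative choices; the seed keeps the split reproducible between runs, so test images never leak into training.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` and return (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # private RNG: does not disturb global random state
    n_test = int(round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]
```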
Training the model
Choose a model architecture: Select a model like YOLO or EfficientDet-Lite.
Train the model: Use the training data to train the model.
Evaluate the model: Assess the model's performance using the testing data.
Additional steps
Convert to TFRecord format: If using TensorFlow, convert the annotations to TFRecord format.
Create a configuration file: Configure the training process.
Export the model: Save the model for use in other applications.
Tools and resources
TensorFlow Object Detection API: A popular framework for training object detectors.
Detecto: A library that allows training a model with just a few lines of code.
TensorFlow Lite Model Maker: A tool for training object detection models for edge devices.
YOLOv7: A step-by-step guide for training a custom object detector with YOLOv7.
Custom Vision Service: A Microsoft Azure service for building object detectors.
MATLAB: Its Computer Vision Toolbox supports training custom object detectors.
Optical Character Recognition (OCR) is a machine-learning-based technology that uses computer vision and
pattern recognition to convert text from images and documents into a machine-readable format. This format
can be digitally modified and used in machine processes, such as cognitive computing, or presented on the
web.
OCR can be used to extract text from many types of documents, including:
Forms, Invoices, Articles, Reports, Street signs, Product labels, and Posters.
OCR works by analyzing the structure of an image, dividing it into elements like blocks of text, tables, or
images. Text blocks are divided into lines, and the lines are then divided into words and characters, which are
compared to a set of pattern images. Once a character is identified, it's converted into an ASCII code, which
computer systems can use for further manipulation.
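The comparison against pattern images can be sketched as follows, assuming characters have already been segmented and binarized into tiny bitmaps. The three 3x5 patterns are hypothetical; the closest pattern (fewest differing pixels) wins, and ord() returns its character code, which equals the ASCII code for these characters. Modern OCR engines use learned models rather than fixed templates like these.

```python
# Hypothetical 3x5 binary patterns ('#' = ink, '.' = background).
PATTERNS = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def mismatches(a, b):
    """Count pixels that differ between two equally sized glyph bitmaps."""
    return sum(ca != cb for row_a, row_b in zip(a, b) for ca, cb in zip(row_a, row_b))


def recognize_char(glyph):
    """Return (character, character code) of the closest stored pattern."""
    best = min(PATTERNS, key=lambda ch: mismatches(glyph, PATTERNS[ch]))
    return best, ord(best)
```

A slightly damaged "L" (one missing pixel in the bottom row) still differs from its own pattern by one pixel but from "I" and "T" by nine, so it is recognized correctly.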
Examples of OCR applications include text extraction tools, PDF-to-.txt converters, and Google's image search
function.
History of OCR
Early OCR technologies were purely mechanical, dating back to 1914 when “Emanuel Goldberg developed
a machine that could read characters and then converted them into standard telegraph code” (Dhavale,
2017, p. 91). Goldberg continued his research into the 1920s and 1930s, when he developed a system that
searched microfilm (film holding scaled-down copies of documents such as newspapers and journals) for
characters and then OCR’d them.
In 1974, Ray Kurzweil and Kurzweil Computer Products, Inc. continued developing OCR systems,
mainly focusing on creating a “reading machine for the blind.” Kurzweil’s work caught the attention of industry
leader Xerox, which wished to commercialize the software further and develop OCR applications for
document understanding.