
Real World Computer Vision Applications

Syllabus content
Unit 3: Facial Recognition with Computer Vision

 Facial Recognition with Computer Vision Overview
 Face Detection Algorithm
 Face Detection Implementation
 Test Photographs
 Alternatives to OpenCV
 Object Detection with Computer Vision Overview
 Benefits of Object Detection
 Working of Object Detection
 Create a Custom Object Detector
 Use a Pretrained Object Detector
 Other Object Detection Methods
 Optical Character Recognition with Computer Vision Overview
 How Does Optical Character Recognition Work
 OCR Applications in the Real World
 Text Recognition with Tesseract OCR
 The Different Ways for Text Detection

Satellite to Map Image Translation Dataset in computer vision


In computer vision, a Satellite to Map Image Translation Dataset is typically used for tasks that involve translating
satellite imagery into corresponding map-like representations, such as road networks, building layouts, or terrain
features. This task falls under image-to-image translation, which is often tackled using Generative Adversarial
Networks (GANs) or other deep learning techniques.
Here are key components and popular datasets for this type of task:
1. Task Overview
 Goal: The objective is to convert satellite images (typically high-resolution, containing complex and varied
details) into simplified map representations (such as road maps or segmentation masks showing specific
features like buildings or vegetation).
 Applications:
o Urban planning

o Autonomous driving

o Geographic Information Systems (GIS)

o Disaster management (e.g., detecting damaged areas)

 Techniques:
o Pix2Pix: A GAN-based model for image-to-image translation.

o CycleGAN: Used when paired data is not available.

o Segmentation Networks: For specific tasks like road network extraction.

2. Popular Datasets
 Google Maps Dataset: Often used for translating satellite images to Google-style maps. This may involve
scraping or using public APIs to gather pairs of satellite images and map tiles.
 DeepGlobe Road Extraction Dataset: Contains satellite images paired with road networks, useful for road
extraction tasks.
 Inria Aerial Image Labeling Dataset: Provides aerial imagery paired with building annotations, useful for
building footprint extraction.
 SpaceNet Dataset: A publicly available corpus of labeled satellite imagery focusing on road network
extraction and building footprint detection.
3. Challenges
 High Variability in Satellite Images: Weather conditions, seasons, and atmospheric conditions can
drastically alter satellite images.
 Precision: Translating complex details like roads and buildings while ensuring that small, intricate structures
are correctly represented.
 Scale Differences: The resolution of satellite images may vary greatly, requiring models to adapt to different
scales effectively.
4. Use of GANs
 Pix2Pix: A conditional GAN designed for paired image-to-image translation. It takes input images (satellite
images) and produces output images (maps).
 CycleGAN: Useful for unpaired image translation when paired datasets are unavailable. It translates between
domains (satellite and maps) without requiring direct pixel-to-pixel alignment.
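
To make the Pix2Pix approach concrete, here is a minimal, illustrative PyTorch sketch of one conditional-GAN training step. The tiny networks and random tensors below are placeholders standing in for the paper's U-Net generator and PatchGAN discriminator; the weight of 100 on the L1 term follows the Pix2Pix paper's default, and everything else is an assumption for illustration.

import torch
import torch.nn as nn

# Toy "generator": satellite (3 channels) -> map (3 channels).
G = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

# Toy "discriminator": judges concatenated (satellite, map) pairs and
# outputs patch-wise real/fake scores, in the spirit of PatchGAN.
D = nn.Sequential(
    nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

sat = torch.randn(1, 3, 64, 64)       # dummy satellite image
real_map = torch.randn(1, 3, 64, 64)  # dummy paired ground-truth map

# Discriminator step: real pairs -> 1, generated pairs -> 0.
fake_map = G(sat).detach()
d_real = D(torch.cat([sat, real_map], dim=1))
d_fake = D(torch.cat([sat, fake_map], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + \
         bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: fool the discriminator, plus an L1 term pulling the
# generated map toward the paired ground truth.
fake_map = G(sat)
d_fake = D(torch.cat([sat, fake_map], dim=1))
loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake_map, real_map)
opt_g.zero_grad()
loss_g.backward()
opt_g.step()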

What is Facial Recognition:

Facial recognition is a type of computer vision that uses optical input to analyze images and identify
faces. It's a form of artificial intelligence (AI) that mimics the human ability to recognize faces. Facial
recognition software uses AI, image recognition, and other advanced technologies to map, analyze, and
confirm a face's identity.
Detection is the process of finding a face in an image. Enabled by computer vision, facial recognition can
detect and identify individual faces from an image containing one or many people's faces.
Facial recognition is a system used to identify a person by analyzing the individual's facial features, and the
term also refers to the software that automates the process. It scans the person's face, notes key characteristics,
and compares it to another image stored in a database. If the images match, the system confirms the identity.

Two broad categories used to classify facial recognition software are holistic and feature-based:

Holistic models examine your entire face and compare your features to those in images stored in a database.

A feature-based model analyses your face more deeply—for example, considering measurements between
features and the contours of bones.

First, detection and recognition are different tasks. Face detection is the crucial first part of face recognition: determining the number of faces in a picture or video without remembering or storing details. It may estimate some demographic data like age or gender, but it cannot recognize individuals.

Face recognition identifies a face in a photo or a video image against a pre-existing database of faces. Faces indeed need to be enrolled into the system to create the database of unique facial features. Afterward, the system breaks down a new image into key features and compares them against the information stored in the database.

How facial recognition works

Facial recognition software typically follows a three-step process: detection, analysis, and recognition.
 Detect: In the first step, the program searches through an image looking for facial data. It views faces
from the front and side, looking for distinctive features to analyze in the next step.
 Analyze: After identifying a face in an image, the program examines facial landmarks like the distance
from the chin to the forehead and between the eyes. It also considers the shape of different features like
the cheekbones, lips, ears, and more.
 Recognize: In the final step of the process, the facial recognition program applies what it's learned
from the data to verify an individual's identity. It may compare the current image under analysis with a
stored image like one used on a government ID.

What is facial recognition used for?


Facial recognition software has multiple uses, including protecting access to sensitive information, confirming
identity, and preventing fraud. What was once only seen in sci-fi flicks now has applications in your daily life.
The following list highlights some ways you may see people and organizations using facial recognition
technology:
 Access control: Verify identity before granting access to devices, buildings, and documents.
 Attendance: Scan people as they enter a facility to create an attendance record for work or school.
 Banking: Confirm a customer's identity at ATMs and banking centers to prevent fraud.
 Customer experience: Notify authorities when known shoplifters are in stores, suggest products for
customers, and allow customers to pay for purchases.
 Health care: Improve infection control by reducing the number of touchpoints in a facility, identify
genetic diseases, and monitor patients.
 Investigations: Assist detectives during investigations and ensure officers have arrested the correct
individuals.
 Security: Confirm the identity of individuals, track movements, and prevent unauthorized access to
sensitive locations and equipment.
 Transportation: Verify passengers' identity at airports and border crossings to increase convenience
and security.

What is face detection?


Face detection, also called facial detection, is an artificial intelligence (AI)-based computer technology used
to find and identify human faces in digital images and video. Face detection technology is often used for
surveillance and tracking of people in real time. It is used in various fields including security, biometrics, law
enforcement, entertainment and social media.

Face detection uses machine learning (ML) and artificial neural network (ANN) technology, and plays an
important role in face tracking, face analysis and facial recognition. In face analysis, face detection uses facial
expressions to identify which parts of an image or video should be focused on to determine age, gender and
emotions. In a facial recognition system, face detection data is required to generate a faceprint and match it
with other stored faceprints.

How face detection works


Face detection applications use AI algorithms, ML, statistical analysis and image processing to find human
faces within larger images and distinguish them from nonface objects such as landscapes, buildings and
other human body parts. Before face detection begins, the analyzed media is preprocessed to improve its
quality and remove images that might interfere with detection.

Face detection algorithms typically start by searching for human eyes, one of the easiest features to detect.
They then try to detect facial landmarks, such as eyebrows, mouth, nose, nostrils and irises. Once the
algorithm concludes that it has found a facial region, it does additional tests to confirm that it has detected a
face.

To ensure accuracy, the algorithms are trained on large data sets that incorporate hundreds of thousands of
positive and negative images. The training improves the algorithms' ability to determine whether there are
faces in an image and where they are.

Face detection software detects faces by identifying facial features in a photo or video using machine
learning algorithms. It first looks for an eye, and from there it identifies other facial features. It then
compares these features to training data to confirm it has detected a face.

Face detection methods

First, the computer examines either a photo or a video image and tries to distinguish faces from any other objects in the background. There are several methods a computer can use to achieve this, compensating for illumination, orientation, or camera distance. Yang, Kriegman, and Ahuja presented a classification for face detection methods. These methods are divided into four categories, and a face detection algorithm may belong to two or more groups.

Face detection software uses several different methods, each with advantages and disadvantages:

Knowledge- or rule-based. These approaches describe a face based on rules. Establishing well-defined,
knowledge-based rules can be a challenge, however.

Feature-based or feature-invariant. These methods use features such as a person's eyes or nose to detect a
face. They can be negatively affected by noise and light.

Template matching. This method is based on comparing images with previously stored standard face
patterns or features and correlating the two to detect a face. However, this approach struggles to address
variations in pose, scale and shape.

Appearance-based. This method uses statistical analysis and ML to find the relevant characteristics of face
images. The appearance-based method can struggle with changes in lighting and orientation.

Algorithms used for face detection:

Face detection algorithms are a key component of computer vision and are used to identify
and locate human faces in digital images or videos. These algorithms are the foundation for a
wide range of applications, including facial recognition, emotion detection, and security
systems. Here's an overview of some common face detection algorithms:

1. Haar Cascades

- Developed by: Paul Viola and Michael Jones in 2001.

- Approach: Haar Cascades use machine learning to train a cascade function from a large
number of positive and negative images. The algorithm then detects faces by scanning the
image at different scales and positions.

- Pros: Fast and efficient for real-time applications.

- Cons: Can be less accurate, particularly with faces at different angles or under varied
lighting conditions.

2. Histogram of Oriented Gradients (HOG)

- Developed by: Navneet Dalal and Bill Triggs in 2005.

- Approach: HOG detects faces by capturing the structure of the human face using
gradients in the image. It divides the image into small regions and computes the gradient
orientation for each region.

- Pros: Robust to variations in lighting and small changes in pose.

- Cons: Can be computationally expensive and may struggle with complex backgrounds.

3. Convolutional Neural Networks (CNNs)

- Approach: CNN-based face detection algorithms use deep learning techniques to learn
features from a large set of labeled face images. These models are trained on large datasets
and can achieve high accuracy in detecting faces, even in challenging conditions.

- Examples: MTCNN (Multi-Task Cascaded Convolutional Networks), RetinaFace.

- Pros: Highly accurate, able to detect faces in various poses and lighting conditions.

- Cons: Requires a large amount of computational power and data for training.

4. You Only Look Once (YOLO)

- Approach: YOLO is a real-time object detection system that divides an image into a grid
and predicts bounding boxes and class probabilities for each grid cell. It is often used for fast
face detection in videos.

- Pros: Extremely fast, suitable for real-time applications.

- Cons: May miss small faces in the image due to its grid-based approach.

5. Single Shot MultiBox Detector (SSD)



- Approach: SSD is another deep learning-based object detection method that can detect
multiple objects, including faces, in an image in a single shot. It uses a series of convolutional
layers to predict the bounding boxes and classes.

- Pros: Balances speed and accuracy, works well for real-time detection.

- Cons: May be less accurate than other deep learning models like Faster R-CNN.

6. Facial Landmark Detection

- Approach: This technique involves detecting key points on a face, such as the eyes, nose,
and mouth, and then using these landmarks to identify the face. Algorithms like Dlib's 68-point facial landmark detector are commonly used.

- Pros: Provides precise information about face orientation and expression.

- Cons: More computationally intensive and may require post-processing.

7. Viola-Jones Algorithm

- Approach: This is one of the earliest and most well-known face detection algorithms,
using a combination of simple rectangular features, integral images, and a cascaded classifier
to detect faces.

- Pros: Fast and suitable for real-time face detection.

- Cons: Can struggle with faces that are not frontal or have complex backgrounds.

Applications of Face Detection:

- Security Systems: Surveillance and access control systems.

- Social Media: Tagging and photo organization.

- Healthcare: Monitoring patient emotions or conditions.

- Entertainment: Augmented reality filters, gaming.

These algorithms form the basis for more advanced tasks like facial recognition, expression
analysis, and other biometric applications.
Face Detection Implementation

Multiple face detection
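
A minimal implementation sketch using OpenCV's bundled Haar cascade is shown below. The image filename is a placeholder, and the scaleFactor/minNeighbors values are common starting points rather than fixed requirements.

import cv2

# Load the frontal-face Haar cascade that ships with opencv-python.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("group_photo.jpg")           # placeholder filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # cascades work on grayscale

# scaleFactor and minNeighbors trade off speed, recall and false positives.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(30, 30))

print(f"Detected {len(faces)} face(s)")
for (x, y, w, h) in faces:  # one bounding box per detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", img)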



Test Photograph

a "test photograph" refers to an image used for evaluating the performance of an algorithm or model. These
images are essential in the development and testing of computer vision systems, such as object detection,
face recognition, image classification, and more. Here's how test photographs are typically used in computer
vision:

1. Model Evaluation

 Purpose: Test photographs are used to assess how well a computer vision model performs on unseen
data. After training a model on a set of images (training set), the test photographs (test set) are used
to evaluate its accuracy, precision, recall, and other performance metrics.
 Example: After training a face detection algorithm, a set of test photographs containing faces in
various poses, lighting conditions, and backgrounds is used to evaluate how accurately the model
detects faces.
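
For detection models, metrics such as precision and recall are usually computed by matching predicted boxes to ground-truth boxes with Intersection over Union (IoU). A minimal sketch of that computation follows; the 0.5 threshold mentioned in the comment is a common convention, not a universal rule.

# Intersection over Union (IoU) between two boxes given as (x1, y1, x2, y2).
# A prediction typically counts as a true positive when IoU >= 0.5.
def iou(box_a, box_b):
    # Coordinates of the overlap rectangle (empty if boxes are disjoint).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

predicted = (50, 50, 150, 150)
ground_truth = (60, 60, 160, 160)
print(f"IoU = {iou(predicted, ground_truth):.2f}")  # ~0.68 -> true positive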

2. Algorithm Benchmarking

 Purpose: Test photographs are used to benchmark different algorithms by providing a standardized
set of images for comparison. This helps in determining which algorithm performs best under
specific conditions.
 Example: A researcher might use the same test photographs to compare the performance of different
object detection algorithms (e.g., YOLO vs. SSD) on identifying objects in an image.

3. Validation of Model Generalization

 Purpose: Test photographs help determine how well a model generalizes to new data. A model that
performs well on the training data might not perform well on unseen test photographs if it has
overfitted to the training set.
 Example: A model trained to classify dog breeds might be tested on photographs of dog breeds not
included in the training set to see how well it generalizes.

4. Cross-Domain Testing

 Purpose: Test photographs from different domains (e.g., different environments, cultures, or types
of images) are used to evaluate how robust a computer vision model is across various contexts.
 Example: A face recognition system might be tested on photographs from different countries to
ensure it performs well across different ethnicities and facial features.
5. Dataset Evaluation

 Purpose: Large test datasets composed of test photographs are often used to evaluate the overall
effectiveness of computer vision systems. Popular datasets like ImageNet, COCO (Common Objects
in Context), or MNIST are widely used benchmarks in the field.
 Example: The COCO dataset contains a large set of test photographs with labeled objects, which are
used to evaluate object detection algorithms.

6. Error Analysis

 Purpose: Test photographs are used to analyze and understand the types of errors a computer vision
model makes. This can help in improving the model by focusing on its weaknesses.
 Example: If a model frequently misclassifies certain objects, test photographs showing those objects
can be used to investigate why the model is failing and how it can be improved.

7. Performance on Edge Cases

 Purpose: Test photographs that represent edge cases or challenging scenarios are used to see how
well a model handles difficult situations, such as occlusions, low lighting, or unusual perspectives.
 Example: A self-driving car system might be tested using photographs of pedestrians partially
obscured by objects to see how well it detects them.

8. Real-World Application Testing

 Purpose: Test photographs taken in real-world environments are used to evaluate how a computer
vision model performs outside of controlled lab conditions.
 Example: A model developed for surveillance might be tested using photographs from real CCTV
footage to see how well it identifies people in varying conditions.

Key Considerations for Test Photographs in Computer Vision:

 Diversity: Test photographs should cover a wide range of scenarios to ensure the model is robust.
 Realism: Test images should resemble the real-world data the model will encounter in deployment.
 Balance: The test set should be balanced in terms of classes and conditions to provide a fair
evaluation.
 Unseen Data: Ideally, test photographs should not be part of the training dataset to ensure an
unbiased evaluation.

What are the alternatives to OpenCV?

OpenCV is one of the most popular and widely-used libraries for computer vision tasks.
However, there are several other libraries and frameworks available that offer alternatives to
OpenCV, each with its own set of features, strengths, and weaknesses.

Alternatives to OpenCV for Computer Vision


Below are some of the top alternatives to OpenCV for computer vision in Python:

TensorFlow
TensorFlow, developed by Google, is primarily known as a deep learning framework.
However, it also provides a comprehensive set of tools and APIs for computer vision tasks through its image-processing module (tf.image) and the wider TensorFlow ecosystem. TensorFlow offers high-level abstractions for building and training deep neural networks for image classification, object detection, segmentation, and more.

PyTorch
PyTorch, developed by Facebook, is another popular deep learning framework widely used in
the research community. PyTorch offers a flexible and intuitive interface for building custom
neural networks for various computer vision tasks. It provides dynamic computation graphs,
making it easy to experiment with different network architectures and algorithms.

scikit-image
scikit-image is a Python library specifically designed for image processing tasks. It provides
a collection of algorithms and functions for image filtering, feature extraction, segmentation,
and more. scikit-image is built on top of NumPy, making it easy to integrate with other
scientific computing libraries in the Python ecosystem.
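
A small illustrative snippet (the image filename is a placeholder):

from skimage import io, filters

image = io.imread("photo.jpg", as_gray=True)  # loads as a NumPy array
edges = filters.sobel(image)                  # Sobel edge magnitude
io.imsave("edges.png", (edges / edges.max() * 255).astype("uint8"))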

Dlib
Dlib is a C++ library that offers a wide range of tools and algorithms for machine learning,
computer vision, and image processing. It is known for its robust implementation of facial
landmark detection, object tracking, and facial recognition algorithms. Dlib also provides
Python bindings for easy integration into Python projects.

SimpleCV
SimpleCV is a Python framework designed to make computer vision tasks accessible to
beginners and non-experts. It provides a high-level interface for common computer vision
tasks, such as image acquisition, processing, feature extraction, and object detection.
SimpleCV abstracts away much of the complexity involved in computer vision, making it
suitable for rapid prototyping and experimentation.

Caffe
Caffe is a deep learning framework developed by Berkeley AI Research (BAIR). While it is
primarily focused on deep learning tasks, Caffe also includes modules for computer vision
tasks such as image classification, object detection, and segmentation. Caffe is known for its
speed and efficiency, particularly in training large-scale convolutional neural networks
(CNNs).

MXNet
MXNet is a deep learning framework that offers support for both symbolic and imperative
programming models. It provides a comprehensive set of tools and APIs for building and
deploying deep learning models for computer vision tasks. MXNet’s flexibility and
scalability make it suitable for both research and production environments.

Face Detection Algorithms

Face detection algorithms are specialized computer vision algorithms designed to identify and locate human
faces within images or videos. These algorithms are foundational for various applications, including facial
recognition, emotion detection, security systems, and human-computer interaction. Here’s an overview of
some commonly used face detection algorithms:
1. Haar Cascades

 Developer: Paul Viola and Michael Jones (2001).


 Method: The Haar Cascade algorithm uses machine learning where a cascade function is trained
with many positive and negative images. Haar-like features, which are simple patterns in the image
(like edges or changes in intensity), are used to detect objects like faces.
 Advantages:
o Fast and efficient for real-time face detection.
o Works well with frontal faces.
 Disadvantages:
o Less effective with non-frontal faces or under different lighting conditions.
o Prone to false positives.

2. Histogram of Oriented Gradients (HOG)

 Developer: Navneet Dalal and Bill Triggs (2005).


 Method: HOG features describe the appearance and shape of an object in an image by capturing the
direction of gradients or edge orientations within localized portions of the image.
 Advantages:
o Robust to lighting variations and minor changes in pose.
o Often used with Support Vector Machines (SVM) for face detection.
 Disadvantages:
o Computationally intensive compared to simpler methods like Haar Cascades.
o May not perform well on faces with extreme angles.
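
dlib (described in the alternatives section above) ships a pretrained HOG-plus-linear-SVM frontal face detector, which gives a quick way to try this method. A minimal sketch with a placeholder filename:

import dlib

detector = dlib.get_frontal_face_detector()  # pretrained HOG + linear SVM
img = dlib.load_rgb_image("photo.jpg")       # placeholder filename

# The second argument upsamples the image once, which helps with small faces.
faces = detector(img, 1)
for rect in faces:
    print(f"Face at left={rect.left()}, top={rect.top()}, "
          f"right={rect.right()}, bottom={rect.bottom()}")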

3. Convolutional Neural Networks (CNNs)

 Method: CNNs are deep learning models that automatically learn features from large datasets of
images. These models are particularly effective for face detection due to their ability to capture
complex patterns in the data.
 Examples:
o MTCNN (Multi-Task Cascaded Convolutional Networks): Detects faces and facial
landmarks in a multi-stage process.
o RetinaFace: A state-of-the-art face detector that provides high accuracy by combining face
detection with keypoint localization.
 Advantages:
o High accuracy, especially in detecting faces under varying poses and lighting conditions.
o Can detect small faces and faces in challenging conditions.
 Disadvantages:
o Requires significant computational resources and large datasets for training.
o More complex to implement and fine-tune compared to traditional methods.
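
As one concrete option, MTCNN is available through the third-party facenet-pytorch package (an assumption about the environment; other implementations exist). A minimal detection sketch with a placeholder filename:

from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(keep_all=True)         # keep_all=True returns every face found
img = Image.open("group_photo.jpg")  # placeholder filename

boxes, probs = mtcnn.detect(img)     # bounding boxes plus confidences
if boxes is not None:
    for box, p in zip(boxes, probs):
        print(f"Face at {box.round()} with confidence {p:.2f}")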

4. You Only Look Once (YOLO)

 Method: YOLO is a real-time object detection system that divides an image into a grid and predicts
bounding boxes and class probabilities directly from the full image in a single evaluation.
 Advantages:
o Extremely fast, making it suitable for real-time applications.
o Can detect multiple faces and objects in an image simultaneously.
 Disadvantages:
o May miss smaller faces due to its grid-based approach.
o Less accurate compared to some other deep learning-based models in certain scenarios.
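
A minimal YOLO inference sketch using the third-party ultralytics package is shown below. Note that the stock COCO-trained weights detect a generic "person" class rather than faces; detecting faces specifically would require weights trained on a face dataset such as WIDER FACE.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")           # small pretrained COCO model
results = model("street_scene.jpg")  # placeholder filename

for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]  # e.g. "person", "car"
        print(f"{cls_name}: conf={float(box.conf):.2f}, "
              f"xyxy={box.xyxy.tolist()}")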

5. Single Shot MultiBox Detector (SSD)

 Method: SSD is a deep learning-based object detection model that predicts bounding boxes and
object classes in a single pass through the network. It uses a series of convolutional layers to detect
faces at multiple scales.
 Advantages:
o Balanced trade-off between speed and accuracy.
o Suitable for detecting faces in real-time.
 Disadvantages:
o May not be as accurate as models like Faster R-CNN for complex images.
o Can struggle with very small or very large faces.

6. Faster R-CNN

 Method: Faster R-CNN is an advanced deep learning model that uses a Region Proposal Network
(RPN) to propose candidate object regions, followed by a classifier that refines these regions and
classifies them.
 Advantages:
o High accuracy in detecting faces, even in challenging conditions.
o Effective in detecting small faces and faces with occlusions.
 Disadvantages:
o Computationally intensive, making it less suitable for real-time applications without
specialized hardware.
o Slower compared to models like YOLO and SSD.

7. Facial Landmark Detection

 Method: Instead of detecting the entire face, facial landmark detection algorithms identify key
points on the face, such as the eyes, nose, mouth, and chin. These landmarks can then be used to
infer the presence and orientation of a face.
 Advantages:
o Provides detailed information about face orientation and expression.
o Useful for applications like face alignment and emotion recognition.
 Disadvantages:
o More computationally intensive than simple face detection.
o Requires accurate landmark localization, which can be challenging in some conditions.
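
A minimal landmark sketch with dlib's 68-point shape predictor. The .dat model file is downloaded separately from dlib.net, and both filenames below are placeholders:

import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("portrait.jpg")
for rect in detector(img, 1):
    shape = predictor(img, rect)  # 68 (x, y) landmark points
    nose_tip = shape.part(30)     # point 30 is the nose tip
    left_eye = shape.part(36)     # point 36 is an eye corner
    print(f"nose=({nose_tip.x},{nose_tip.y}) eye=({left_eye.x},{left_eye.y})")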

Applications of Face Detection:

 Security and Surveillance: Used in security cameras and access control systems to detect and track
individuals.
 Social Media: Automatic tagging and photo organization.
 Healthcare: Monitoring patient conditions and expressions.
 Automotive: Driver monitoring systems in vehicles to detect drowsiness or distraction.
 Retail: Customer behavior analysis and targeted advertising.

Challenges in Face Detection:

 Pose Variations: Detecting faces at various angles and orientations.


 Occlusions: Faces partially obscured by objects or other faces.
 Lighting Conditions: Variability in lighting can significantly affect detection accuracy.
 Expression Variations: Changes in facial expression can alter the appearance of a face, making
detection more challenging.
 Scale Variations: Detecting faces of different sizes in the same image.

Object Detection:
What is Object Detection?

Object detection is a computer vision technique that uses machine learning or deep learning to
locate and classify objects in images or videos. The goal is to develop computational models
that can answer the fundamental question, "What objects are where?"

Object detection is a technique that uses neural networks to localize and classify objects in images. This
computer vision task has a wide range of applications, from medical imaging to self-driving cars.

Object detection can be used in many areas, including medical imaging, self-driving cars, image retrieval, video surveillance, and food manufacturing.
To train an object detection model, you need to create a neural network and show it images of an
object in different scenarios. You then label the object and its location.

Understanding Object Detection


Object detection primarily aims to answer two critical questions about any image: “Which
objects are present?” and “Where are these objects situated?” This process involves both
object classification and localization:
 Classification: This step determines the category or type of one or more objects
within the image, such as a dog, car, or tree.
 Localization: This involves accurately identifying and marking the position of an
object in the image, typically using a bounding box to outline its location.
Key Components of Object Detection
1. Image Classification
Image classification assigns a label to an entire image based on its content. While it’s a crucial
step in understanding visual data, it doesn’t provide information about the object’s location
within the image.
2. Object Localization
Object localization goes a step further by not only identifying the object but also determining
its position within the image. This involves drawing bounding boxes around the objects.
3. Object Detection
Object detection merges image classification and localization. It detects multiple objects in an
image, assigns labels to them, and provides their locations through bounding boxes.

Working of object detection



 Looking at the Picture: Imagine a computer looking at a picture.

 Finding Clues: The computer looks for clues like shapes, colors, and patterns in the picture.

 Guessing What’s There: Based on those clues, it makes guesses about what might be in the picture.

 Checking the Guesses: It checks each guess by comparing it to things it already knows.

 Drawing Boxes: If it’s pretty sure about something, it draws a box around it to show where it thinks the

object is.

 Making Sure: Finally, it double-checks its guesses to make sure it got things right and fixes any mistakes.
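
This pipeline is exactly what a pretrained detector packages up. As an illustrative sketch (one option among many), torchvision's COCO-pretrained Faster R-CNN can be run on a placeholder image like this:

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()  # inference mode

img = to_tensor(Image.open("street.jpg").convert("RGB"))  # placeholder file
with torch.no_grad():
    pred = model([img])[0]  # dict with 'boxes', 'labels' and 'scores'

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.8:  # keep only confident detections
        print(f"class {int(label)}: score={score:.2f}, box={box.tolist()}")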

How to create a custom object detector:


Creating a custom object detector in computer vision involves several steps, including preparing the dataset,
training the model, and evaluating the results.

Preparing the dataset

 Collect images: Gather photos of the objects you want to detect.


 Annotate images: Label the objects in the images with bounding boxes.
 Split data: Divide the dataset into training and testing sets.
Training the model

 Choose a model architecture: Select a model like YOLO or EfficientDet-Lite.


 Train the model: Use the training data to train the model.
 Evaluate the model: Assess the model's performance using the testing data.
Additional steps

 Convert to TFRecord format: If using TensorFlow, convert the annotations to TFRecord format.
 Create a configuration file: Configure the training process.
 Export the model: Save the model for use in other applications.
Tools and resources

 TensorFlow Object Detection API: A popular framework for training object detectors.
 Detecto: A library that allows training a model with just a few lines of code.
 TensorFlow Lite Model Maker: A tool for training object detection models for edge devices.
 YOLOv7: A YOLO-family model with step-by-step guides available for training a custom object detector.
 Custom Vision Service: A Microsoft Azure service for building object detectors.
 MATLAB: A toolbox that allows training custom object detectors.
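
As an example of the "few lines of code" route, here is a hedged sketch using Detecto (mentioned in the list above). It assumes a folder of images with Pascal VOC-style XML bounding-box annotations; the class name and paths are placeholders.

from detecto.core import Dataset, Model
from detecto.utils import read_image

dataset = Dataset("training_data/")  # images + XML annotations together
model = Model(["raccoon"])           # your custom class label(s)
model.fit(dataset, epochs=10)        # fine-tunes a pretrained Faster R-CNN

model.save("raccoon_detector.pth")
labels, boxes, scores = model.predict_top(read_image("test.jpg"))
print(labels, scores)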

What is OCR?
Optical character recognition (OCR) is a technology that uses automated data extraction to quickly convert
images of text into a machine-readable format.
OCR is sometimes referred to as text recognition. An OCR program extracts and repurposes data from
scanned documents, camera images and image-only PDFs. OCR software singles out letters on the image,
puts them into words, and then puts the words into sentences, thus enabling access to and editing of the
original content. It also eliminates the wasted effort of redundant manual data entry.
OCR systems use a combination of hardware and software to convert physical, printed documents into
machine-readable text. Hardware, such as an optical scanner or specialized circuit board, copies or reads
text, then software typically handles the advanced processing.

OCR software can take advantage of artificial intelligence (AI) to implement more advanced methods of
intelligent character recognition (ICR) for identifying languages or handwriting. Organizations often use the
process of OCR to turn printed legal or historical documents into PDF documents so that users can edit,
format and search the documents as if created with a word processor.

The history of OCR


In 1974, Ray Kurzweil started Kurzweil Computer Products, Inc., whose omni-font OCR product could recognize text printed in virtually any font. He decided that the best application of this technology would be a machine learning (ML) device for the vision-impaired, so he created a reading machine that could read text aloud in a text-to-speech format. In 1980, Kurzweil sold his company to Xerox, which was interested in further commercializing paper-to-computer text conversion.
OCR technology became popular in the early 1990s, when it was used to digitize historical newspapers. Since then, the technology has undergone several improvements. Today, products can deliver near-perfect OCR accuracy. Advanced methods can automate complex document-processing workflows.
Before OCR technology became available, the only option to digitally format documents was manually
reentering the text. Not only is the redundant input time-consuming, but it also comes with inevitable
inaccuracies and typing errors. Today, OCR services are widely available to the public. For example,
Google Cloud Vision OCR can be used to scan and store documents on your smartphone.

How does OCR work?


OCR software converts the physical form of a document, captured with a scanner, into editable digital text. The OCR software can run as a free-standing program, an OCR application programming interface (API) or a web-based service.

Image acquisition: All document pages are copied and then the OCR engine converts the digital document
into a two-color or black-and-white version. The scanned-in image or bitmap is analyzed for light and dark
portions. The program then identifies the dark portions as characters that need to be recognized, while light
areas are identified as background.
Preprocessing: The digital image is cleaned to remove extraneous pixels. This preprocessing can include
deskewing to correct for the image being improperly aligned during scanning, removing graphic rules and
boxes that were part of the printed image and determining whether script text is included.
Text recognition: The dark portions are processed to find alphabetic letters, numeric digits or symbols. This
stage typically involves targeting one character, word or block of text at a time. Characters are then
identified by using one of two algorithms, either pattern recognition or feature recognition.
 Pattern recognition (or pattern matching): The OCR program has previously been trained on
examples of text in various fonts and formats to recognize characters by comparison to a template in
the scanned document or image file. Each unique combination of shape, scale and font is called a
glyph. For this to work, the characters must be in a font that the OCR program has already been
trained on. Given the number of fonts worldwide and languages that use different
characters, such as Arabic, Chinese, English, French, German, Greek,
Japanese, Korean or Spanish, training on every combination of font and language would be an
enormous system drain.

 Feature recognition (detection or extraction): This is used when the OCR program is analyzing a font
that it has not been trained on. OCR applies rules regarding the features of a specific letter or number
to recognize characters in the scanned document. Features include the number of angled lines, line
intersections, loops or curves in a character. For example, the capital letter “A” is stored as two
diagonal lines that meet with a horizontal line across the middle. When a character is identified, it is
converted into an American Standard Code for Information Interchange (ASCII) code that computer
systems use to handle further manipulations.
Layout recognition: A more complete OCR program will also analyze the structure of a document image. It
divides the page into elements, such as blocks of text, tables or images. The lines are divided into words and
then into characters. After the characters have been singled out, the program compares them with a set of
pattern images. After processing all likely matches, the program returns the recognized text.
Postprocessing: The gathered information is stored as a digital file, either in an editable form or PDF.
Some systems retain both the input image and the post-OCR versions for easier comparison and more
complete document management.
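
In Python, this pipeline is commonly driven through Tesseract via the pytesseract wrapper (Tesseract itself must be installed separately). A minimal sketch with a placeholder filename:

from PIL import Image
import pytesseract

img = Image.open("scanned_page.png")     # placeholder filename

text = pytesseract.image_to_string(img)  # full-page recognized text
print(text)

# Word-level boxes and confidences, useful for layout analysis and
# postprocessing:
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
for word, conf in zip(data["text"], data["conf"]):
    if word.strip():
        print(f"{word} (confidence {conf})")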

Types of OCR
There are four types of OCR programs, in increasing order of sophistication:
Simple OCR: Analysis is character-by-character pattern-matching, comparing scanned characters to the
stored glyphs. With so many potential font and language combinations, the types of documents that can be
analyzed are limited.
Optical mark recognition (OMR): For identifying checked boxes and other marks, such as bubbles in
surveys or a signature on a form, plus logos, symbols and watermarks. All can be identified by matching to
stored images, as with simple OCR.

Intelligent character recognition (ICR): As mentioned previously, ICR brings in the power of AI. By
using ML or deep learning, the OCR program learns to read just as humans do: through continual practice
and training. A neural network reviews text repeatedly looking for distinctive attributes: the locations of
curves, intersections, lines and loops.
Intelligent word recognition: This is the natural evolution of the previous ICR recognition, but now the AI
has been trained to recognize a word in a single image, making it ultimately faster.
The benefits of OCR
The benefits of employing OCR technology include the ability to:
 Cut costs by reducing or eliminating redundant manual input.
 Streamline workflows with the input of preprinted documents or written forms and speed research
with searchable digital data.

 Automate document routing, content processing and preparation for text mining.

 Save the cost of storing yet more paper records.

 Centralize and secure data sets for protection against fires, break-ins and documents lost in the bank
vaults.

 Enable greater access to data for visually impaired staff and customers.

 Improve service by giving employees the most up-to-date and accurate information.
