Unit 1
COMPUTER VISION:
Computer vision is a sub-field of AI and machine learning that enables machines to
see, understand, and interpret visual data such as images and videos, and to extract
useful information from them that supports decision-making in AI applications.
It can be considered the eyes of an AI application. With the help of computer vision
technology, tasks can be done that would be impossible without it, such as
self-driving cars.
COMPUTER VISION PROCESS:
1. Capturing an image: A camera or other sensor captures the scene, and the image is
stored as digital data in a file.
2. Processing the data: In the next step, different CV algorithms process the digital
data stored in the file. These algorithms determine the basic geometric elements and
reconstruct the image from the stored digital data.
3. Analysing and acting: Finally, the CV system analyses the data and, according to
this analysis, takes the required action for which it is designed.
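The three-step process above can be sketched in code. This is a minimal, illustrative Python sketch only: the hard-coded image, the threshold value, and all function names are invented for the example, and a real system would acquire frames from a camera (for instance with OpenCV) rather than hard-coding pixels.

```python
# Minimal sketch of the capture -> process -> analyse pipeline.
# The tiny grayscale "image" below is toy data for illustration.

def capture_image():
    """Step 1: acquire a grayscale image (rows of 0-255 pixel values)."""
    return [
        [10,  12, 200, 210],
        [11, 220, 230,  13],
        [14,  15,  16,  17],
    ]

def process_image(image, threshold=128):
    """Step 2: process the raw pixels, e.g. binarise bright vs dark."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]

def analyse(binary):
    """Step 3: analyse the processed data and decide on an action."""
    bright = sum(sum(row) for row in binary)
    total = sum(len(row) for row in binary)
    return "object detected" if bright / total > 0.25 else "no object"

image = capture_image()
binary = process_image(image)
print(analyse(binary))  # 4 bright pixels out of 12 -> "object detected"
```

The decision rule in step 3 stands in for whatever action the real application is designed to take.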
Although computer vision is used in many fields, a few tasks are common to most
computer vision systems. These tasks are given below:
o Object classification: Object classification is a computer vision technique/task
used to classify an image, such as whether an image contains a dog, a person's
face, or a banana. It analyzes the visual content (videos and images) and classifies
the object into a defined category. In other words, image classification lets us
accurately predict the class of an object present in an image.
o Object Identification/detection: Object identification or detection uses image
classification to identify and locate the objects in an image or video. With such
detection and identification technique, the system can count objects in a given
image or scene and determine their accurate location and labelling. For example,
in a given image, one dog, one cat, and one duck can be easily detected and
classified using the object detection technique.
o Object Verification: The system verifies whether a specific object is present in the
visual data; for example, it processes videos, finds the objects based on search
criteria, and tracks their movement.
o Object Landmark Detection: The system defines the key points for the given
object in the image data.
o Image Segmentation: Image segmentation does not merely detect the classes
present in an image, as image classification does; instead, it classifies each pixel
of the image to specify which object it belongs to. It tries to determine the role of
each pixel in the image.
o Object Recognition: In this, the system recognizes the object's location with
respect to the image.
o Facial recognition: Computer vision has enabled machines to detect people's faces
to verify their identity. Initially, the machines are given input images in which
computer vision algorithms detect facial features and compare them with databases
of face images. Popular social media platforms like Facebook also use facial
recognition to detect and tag users. Further, various government security agencies
employ this feature to identify criminals in video feeds.
o Healthcare and Medicine: Computer vision has played an important role in the
healthcare and medicine industry. Traditional approaches for evaluating cancerous
tumours are time-consuming and yield less accurate predictions, whereas computer
vision technology provides faster and more accurate chemotherapy response
assessments, letting doctors identify with life-saving precision the cancer patients
who need surgery sooner.
o Self-driving vehicles: Computer vision technology also plays a role in self-driving
vehicles, helping them make sense of their surroundings by capturing video from
different angles around the car and feeding it into the software. This helps the
vehicle detect other cars and objects, read traffic signals, recognize pedestrian
paths, etc., and drive its passengers safely to their destination.
o Optical character recognition (OCR)
Optical character recognition helps us extract printed or handwritten text from
visual data such as images. Further, it also enables us to extract text from
documents like invoices, bills, articles, etc.
o Machine inspection: Computer vision is vital for providing image-based automatic
inspection. It detects defects, missing features, functional flaws, and other
irregularities in manufactured products, and informs decisions such as inspection
goals, lighting, and material-handling techniques.
o Retail (e.g., automated checkouts): Computer vision is also being implemented
in the retail industry to track products, monitor shelves, record product movements
in the store, etc. This AI-based computer vision technique automatically charges
the customer for the marked products upon checkout from the retail store.
o 3D model building: 3D model building, or 3D modeling, is a technique to generate
a 3D digital representation of any object or surface using software. Here too,
computer vision plays a role in constructing 3D computer models from existing
objects. Furthermore, 3D modeling has a variety of applications, such as robotics,
autonomous driving, 3D tracking, 3D scene reconstruction, and AR/VR.
o Medical imaging: Computer vision helps medical professionals make better
decisions regarding treating patients by developing visualizations of specific body
parts such as organs and tissues. It helps them reach more accurate diagnoses and
provide a better patient care system. For example, Computed Tomography (CT)
and Magnetic Resonance Imaging (MRI) scans are used to diagnose pathologies,
to guide medical interventions such as surgical planning, or for research purposes.
o Automotive safety: Computer vision has added an important safety feature in
automotive industries. E.g., if a vehicle is taught to detect objects and dangers, it
could prevent an accident and save thousands of lives and property.
o Surveillance: This is one of computer vision technology's most important and
beneficial use cases. Nowadays, CCTV cameras are fitted in almost every place,
such as streets, roads, highways, shops, and stores, to spot doubtful or criminal
activities. Computer vision helps analyse live footage of public places to identify
suspicious behaviour and dangerous objects, and to prevent crimes by maintaining
law and order.
o Fingerprint recognition and biometrics: Computer vision technology detects
fingerprints and biometrics to validate a user's identity. Biometrics deals with
recognizing persons based on physiological characteristics, such as the face,
fingerprint, vascular pattern, or iris, and behavioural traits, such as gait or
speech. It combines Computer Vision with knowledge of human physiology and
behavior.
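The object-classification task described above can be sketched very simply. The toy sketch below assigns an image's feature vector to the nearest known class prototype; the prototype vectors, class names, and `classify` helper are all invented for illustration (real systems learn such features with neural networks rather than hand-coding them).

```python
# Toy object-classification sketch: nearest-prototype matching.
import math

# Hypothetical feature vectors for three classes (made-up numbers).
PROTOTYPES = {
    "dog":    [0.9, 0.1, 0.2],
    "face":   [0.2, 0.8, 0.3],
    "banana": [0.1, 0.2, 0.9],
}

def classify(features):
    """Return the class whose prototype is closest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(PROTOTYPES, key=lambda label: dist(features, PROTOTYPES[label]))

print(classify([0.85, 0.15, 0.25]))  # closest to the "dog" prototype
```

The same nearest-match idea underlies many of the tasks above; detection and segmentation add localization on top of it.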
Computer vision has emerged as one of the fastest-growing domains of artificial
intelligence, but a few challenges remain before it can become a leading technology.
OPTICAL CHARACTER RECOGNITION (OCR):
Optical Character Recognition (OCR) is the process of converting an image of text
into a machine-readable text format.
Image acquisition: A scanner reads documents and converts them to binary data. The
OCR software analyzes the scanned image and classifies the light areas as background
and the dark areas as text.
Preprocessing: The OCR software first cleans the image and removes errors to prepare
it for reading. Common cleaning techniques include deskewing (straightening a tilted
scan), despeckling (removing digital image spots and smoothing edges), and removing
lines and boxes that are not part of the text.
The two main types of OCR algorithms, or software processes, that OCR software uses
for text recognition are pattern matching and feature extraction.
Postprocessing
After analysis, the system converts the extracted text data into a computerized file.
Some OCR systems can create annotated PDF files that include both the before and after
versions of the scanned document.
Types of OCR:
Data scientists classify different types of OCR technologies based on their use and
application. The following are a few examples:
A simple OCR engine works by storing many different font and text image patterns as
templates. The OCR software uses pattern-matching algorithms to compare text images,
character by character, to its internal database. If the system matches the text word by
word, it is called optical word recognition. This solution has limitations because there
are virtually unlimited font and handwriting styles, and not every type can be
captured and stored in the database.
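The pattern-matching idea described above can be sketched in a few lines: compare a character image, pixel by pixel, against stored templates and keep the best match. The 3x3 glyphs, template set, and `match_char` helper below are invented toy data, not a real font database.

```python
# Sketch of pattern-matching OCR: best pixel overlap against templates.
TEMPLATES = {
    "I": ["010",
          "010",
          "010"],
    "L": ["100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010"],
}

def match_char(glyph):
    """Return the template character sharing the most pixels with glyph."""
    def score(template):
        return sum(a == b
                   for t_row, g_row in zip(template, glyph)
                   for a, b in zip(t_row, g_row))
    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))

# A slightly noisy "T": one pixel of the bottom stem is flipped.
noisy_t = ["111",
           "010",
           "011"]
print(match_char(noisy_t))  # best overlap is still "T"
```

This also makes the limitation stated above concrete: every font or handwriting style to be recognized must have a template in the database.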
Intelligent word recognition systems work on the same principles as intelligent
character recognition (ICR), but process whole word images instead of preprocessing
the images into characters.
Optical mark recognition identifies logos, watermarks, and other text symbols in a
document.
Applications of OCR
Automatically reading the machine-readable zone (MRZ) and other relevant parts of
a passport
Parsing the routing number, account number, and currency amount from a bank
check
Understanding text in natural scenes, such as photos captured with your
smartphone
OBJECT RECOGNITION:
Object recognition is the technique of identifying the object present in images
and videos. It is one of the most important applications of machine learning and deep
learning. The goal of this field is to teach machines to understand (recognize) the
content of an image just like humans do.
Augmented Reality:
2) Marker-less AR
It is used in events, business, and navigation apps. For instance, the technology uses
location-based information to determine what content the user gets or finds in a
certain area, and it may use the GPS, compass, gyroscope, and accelerometer
available on mobile phones.
For example, marker-less AR does not need any physical markers to place objects in
a real-world space.
3) Projection-based AR
This kind uses synthetic light projected onto physical surfaces to detect the user's
interaction with those surfaces. It is used for holograms, like those in Star Wars and
other sci-fi movies.
4) Superimposition-based AR
In this case, the original item is replaced with an augmentation, fully or partially.
For example, the IKEA Catalog app allows users to place a virtual furniture item,
to scale, over an image of a room.
Augmented reality creates an immersive experience for all its users. Though the most
common AR forms are through glasses or a camera lens, interest in AR is growing, and
businesses are showcasing more types of lenses and hardware through the marketplace. There
are five significant components of AR:
1. Artificial intelligence. Most augmented reality solutions need artificial intelligence (AI) to work,
allowing users to complete actions using voice prompts. AI can also help process information for
your AR application.
2. AR software. These are the tools and applications used to access AR. Some businesses can create
their own form of AR software.
3. Processing. You’ll need processing power for your AR technology to work, generally by leveraging
your device’s internal operating system.
4. Lenses. You’ll need a lens or image platform to view your content or images. The better quality your
screen is, the more realistic your image will appear.
5. Sensors. AR systems need to digest data about their environment to align the real and digital worlds.
When your camera captures information, it sends it through software for processing.
Advantages:
o It increases accuracy.
o It offers innovation, continuous improvement, and individualized learning.
o It helps developers build games that offer real experiences.
o It enhances the user's knowledge and information.
Disadvantages:
o Using VR, people start ignoring the real world and begin living in the virtual
world instead of dealing with the issues of the real one.
o Training in a virtual environment does not have the same result as training in
the actual world.
o It is not guaranteed that a person can perform a task well in the real world just
because he/she has performed it well in the virtual world.
Now, let's see the comparison chart between Augmented reality and Virtual reality.
Here, we are showing the comparison between both terms on the basis of some
characteristics.
On the basis of devices used: AR uses tablets, smartphones, or other mobile devices,
whereas VR uses a head-mounted display or glasses.
On the basis of reality and virtuality: Augmented reality is 75% real and 25% virtual,
whereas virtual reality is 75% virtual and 25% real.
On the basis of network data: Augmented reality requires upwards of 100 Mbps
bandwidth, whereas a 720p virtual reality video requires a connection of at least
50 Mbps.
On the basis of revenue: The projected revenue share for augmented reality in 2020
is $120 million, whereas for virtual reality it is $30 million.
CONTENT-BASED IMAGE RETRIEVAL (CBIR):
Consider an example query image that illustrates the user's information need, and a
very large dataset of images. The CBIR system's task is to rank all the images in the
dataset according to how likely they are to fulfil the user's information need.
Feature Extraction Methods in CBIR
CBIR systems need to perform feature extraction, which plays a significant role in
representing an image’s semantic content.
There are two main categories of visual features: global and local.
Global Features
Global features are those that describe an entire image as a whole. For example,
several descriptors characterize color, such as color moments, color histograms,
and so on.
Other global features are concerned with other visual elements, such as shape and
texture. Various methods exist for global feature extraction.
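One of the color descriptors mentioned above, the color histogram, is easy to sketch: each pixel's channel values are binned, producing a fixed-length vector that describes the whole image. The 2x2 "image", the bin count, and the `color_histogram` helper below are invented for illustration.

```python
# Sketch of a global feature: a color histogram over all channel values.

def color_histogram(pixels, bins=4):
    """Histogram of all (r, g, b) channel values, binned over 0-255."""
    hist = [0] * bins
    width = 256 // bins  # each bucket covers `width` intensity levels
    for r, g, b in pixels:
        for value in (r, g, b):
            hist[min(value // width, bins - 1)] += 1
    return hist

image = [(255, 0, 0), (250, 10, 5),   # two reddish pixels
         (0, 0, 255), (30, 40, 200)]  # two bluish pixels
print(color_histogram(image))  # [8, 0, 0, 4]
```

Because every pixel contributes regardless of position, the histogram describes the image globally, which is exactly why it is insensitive to where objects appear but, as noted below, less robust than local features.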
Local Features
While global features have many advantages, they change under scaling and rotation.
For this reason, local features are more reliable under varied conditions.
Local features describe visual patterns or structures identifiable in small groups of
pixels. For example, edges, points, and various image patches.
The descriptors used to extract local features consider the regions centered around the
detected visual structures. Those descriptors transform a local pixel neighborhood into
a vector presentation.
One of the most widely used local descriptors is SIFT, which stands for Scale-Invariant
Feature Transform. It consists of a key-point detector and a descriptor, and it does not
change when we rotate the image we're working on. However, it has some drawbacks,
such as requiring a fixed-length encoding vector and a large amount of memory.
Deep Neural Networks
Recently, state-of-the-art CBIR systems have started using machine-learning methods
such as deep-learning algorithms. They can perform feature extraction far better than
traditional methods.
Usually, a Deep Convolutional Neural Network (DCNN) is trained using available data.
Its job is to extract features from images. So, when a user sends the query image to
the database system, DCNN extracts its features. Then, the query-image features are
compared to those of the database images. In that step, the database system finds the
most similar images using similarity measures and returns them to the user.
Since there are various pre-trained convolutional networks as well as Computer Vision
Datasets, some people prefer ready-to-use models such as AlexNet, GoogLeNet, and
ResNet50 over training their networks from scratch.
So, deep-learning models such as DCNN extract features automatically. In contrast, in
traditional models, we pre-define the features to extract.
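The final retrieval step described above, comparing the query's features to those of the database images with a similarity measure, can be sketched with cosine similarity. The feature vectors and image ids below are invented; in practice the vectors would come from a DCNN such as ResNet50.

```python
# Sketch of CBIR ranking: sort database images by cosine similarity
# between their feature vectors and the query image's features.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_images(query_features, database):
    """Return database image ids, most similar to the query first."""
    return sorted(database,
                  key=lambda name: cosine_similarity(query_features, database[name]),
                  reverse=True)

database = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.9, 0.1],
    "img_c": [0.8, 0.2, 0.1],
}
print(rank_images([1.0, 0.1, 0.0], database))  # ['img_a', 'img_c', 'img_b']
```

Euclidean distance is an equally common choice of similarity measure; cosine similarity is shown here because it ignores vector magnitude, which often suits learned features.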
Computer Vision in the Retail Industry:
1. Automated payment
With the use of computer vision in retail, you don't have to wait in a long queue to pay.
Products can be monitored using a combination of sensors and computer vision, and
such systems can also recognize customers and automatically charge them after they
leave the store.
2. In-store advertisement
Computer vision in retail can also be used to identify certain customers when they enter
the store and send them special discounts. They can also get recommendations on what
to buy, depending on their purchase history.
3. Stock management
Computer vision can be used to detect empty shelves and shorten the replenishment
period, which increases product availability on the shelf. This solution can also verify
shelf prices, which is often a time-consuming manual operation, minimizing pricing
anomalies.
4. Customer advisory
In the near future, CV algorithms will be advanced enough to help you find the
perfect product or an accessory matching your new dress. They have the potential to
become fully operational customer advisors.
5. Virtual Mirrors
Virtual mirrors may become the central focus of personalization and customer
experience enhancement in retail. A virtual mirror is basically a mirror with a display
behind the glass. It is powered by computer vision cameras and AR and can display a
wide range of contextual information which helps buyers connect with the brand better.
6. Crowd Analysis
Computer vision can correctly count retail shoppers and study customer behavior in
total. Retailers will be able to track the customer journeys throughout the physical
store, calculate the total time spent with each product, and guarantee that the store
follows all standardized protocols.
7. Self-checkout
8. Cashierless Stores
As revolutionary as it may sound, cashier-less stores are paving the way for a more
streamlined, AI-assisted shopping experience in stores across the world. Computer
Vision is being tested out in various retail stores to completely replace the need for
human staff.
9. Inventory Management
By automating inventory cycle counts with computer vision, retail businesses can
update their inventory system in real-time to develop an omni-channel retail
experience.
10. Sentiment analysis
Systems can monitor facial expressions and identify how a customer feels, giving
marketers a way to know how people respond to specific goods.
2. Early disease recognition. A number of diseases respond to medical treatment
only in their early stages. Computer vision technology allows symptoms to be
recognized before they become apparent and enables doctors to intervene early. This
makes a huge difference for patients who would not otherwise get the help they need.
By recognizing early-onset illnesses, doctors can prescribe drugs to fight those
diseases or even perform surgeries earlier and save lives. The aim here is to
accelerate the diagnosis process through computer vision and make treatment more
successful.
3. Enhanced medical procedure efficiency. Computer vision is known not only for
diagnostic accuracy but also for being generally efficient for patients and healthcare
professionals alike. In particular, computer-aided diagnoses minimize doctor and
patient interaction, a reduction that is especially beneficial in light of physician
shortage projections.
1. Radiology and oncology. Computer vision has broad application in healthcare but
especially in the fields of radiology and oncology. The potential use cases include
monitoring of tumor progression, bone fractures detection, and the search for
metastases in the tissues. Breast cancer, lung cancer, leukemia, prostate cancer, and
others are all malignancies that can be detected through computer-aided diagnosis. In
particular, AI-powered solutions like IBM Watson Imaging Clinical Review are designed
to augment radiologists and make medical image interpretation cheaper, faster, and
more accurate. They help improve overall radiology department quality and provide
patients with better, more reliable medical care.
2. Cardiology. Although deep learning is still developing and its applications for
computer vision in the field of cardiology are limited, there are some ways in which CV
can benefit the industry. The rapid adoption of automated algorithms for computer
vision in radiology suggests the same is going to happen in other fields, too.
Remarkably, the incorporation of AI into cardiology is happening in the form of:
Vascular imaging
Artery highlighting
AI-aided echocardiographic views analysis
Automated cardiac pathology and anomaly detection
Automated analysis, diagnostics, and prognosis in cardiac CT
Electronic segmentation and calculation of variables in cardiac MRI
As a result, patient groups with cardiovascular risk will be able to get improved care as
physicians will be able to interpret more data in greater depth than ever before.
Computer vision algorithms will unobtrusively assist physicians and enable broader
characterization of patients’ disorders. As a result, they can potentially help plan early
intervention in patients at high risk and lead to better treatment selection and improved
outcomes.
4. Lab tests automation. Computer vision technology is also used for blood counts,
tissue cell analysis, change tracking, and other lab tests. Computer-vision-powered
blood analyzers either take images of blood samples directly or receive input in the
form of a picture of an already prepared slide containing a film of blood. As a rule,
trained professionals take such images from a custom-designed camera attached to an
ordinary microscope. Then, based on image processing and computer vision
technologies, the system processes the input and automatically detects specific
abnormalities in blood samples.
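The last step, automatically flagging abnormalities, can be sketched as a simple range check once the system has extracted counts from the image. The reference ranges, measurement names, and sample values below are hypothetical, illustrative numbers only, not clinical guidance.

```python
# Sketch of the abnormality-detection step after image analysis.
# Hypothetical reference ranges (low, high) per measured quantity.
REFERENCE_RANGES = {
    "wbc": (4.0, 11.0),       # white blood cells, 10^9/L (illustrative)
    "rbc": (4.2, 6.1),        # red blood cells, 10^12/L (illustrative)
    "platelets": (150, 450),  # platelets, 10^9/L (illustrative)
}

def flag_abnormal(counts):
    """Return the measurements falling outside their reference range."""
    return {name: value
            for name, value in counts.items()
            if name in REFERENCE_RANGES
            and not (REFERENCE_RANGES[name][0] <= value <= REFERENCE_RANGES[name][1])}

# Counts that a vision-based analyzer might extract from a slide image.
sample = {"wbc": 14.2, "rbc": 5.0, "platelets": 90}
print(flag_abnormal(sample))  # {'wbc': 14.2, 'platelets': 90}
```

In a real analyzer, the counts would come from segmenting and counting cells in the slide image; only the final flagging logic is shown here.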