1 Intro To CV

Computer vision is a branch of artificial intelligence that enables computers to interpret and understand visual information from the world, such as images and videos. It encompasses various applications including image recognition, face detection, and augmented reality, and relies on techniques like data collection, preprocessing, and model training using deep learning. The field has evolved significantly with advancements in technology, leading to increased enterprise interest and a projected market growth to $28 billion by 2030.

Uploaded by

Akhilesh S


COMPUTER VISION

Dr. Tripty Singh, Asso Prof

Senior Member IEEE
Department of Computer Science & Engineering,
Amrita School of Engineering, Bengaluru
Computer vision is a field of computer science that focuses on enabling computers to identify and understand objects and people in images and videos. Like other types of AI, computer vision seeks to perform and automate tasks that replicate human capabilities.

COMPUTER VISION, OFTEN ABBREVIATED AS CV, IS DEFINED AS A FIELD OF STUDY THAT SEEKS TO DEVELOP TECHNIQUES TO HELP COMPUTERS "SEE" AND UNDERSTAND THE CONTENT OF DIGITAL IMAGES SUCH AS PHOTOGRAPHS AND VIDEOS.
INTRODUCTION
Computer Vision is a field of artificial intelligence (AI) that
enables computers and systems to derive meaningful information
from digital images, videos, and other visual inputs—and take
actions or make recommendations based on that information.
If AI enables computers to think, computer vision enables them
to see, observe, and understand.
Digital image processing (DIP) focuses on enhancing, restoring, compressing, or
transforming images using mathematical operations, while computer vision (CV)
aims to extract meaningful information, recognize objects, or
perform actions based on images using artificial intelligence.
APPLICATIONS OF COMPUTER VISION
• Image Recognition: Identifying objects, places, people, writing, and actions in images or videos. It is used in applications such as photo tagging on social media, diagnostics in healthcare, and autonomous vehicles.

• Face Detection and Recognition: Identifying and verifying human faces in images and videos. This is widely used in security systems and user identification processes.

• Object Detection: Locating objects within an image and identifying each object. This is crucial for applications like self-driving cars, where the system must recognize objects on the road.

• Image Segmentation: Dividing an image into parts or segments to simplify its analysis. It is used in medical imaging to distinguish different tissues and organs.

• Optical Character Recognition (OCR): Converting different types of documents, such as scanned paper documents or PDFs, into editable and searchable data.

• Augmented Reality (AR): Overlaying digital content on the real world. Examples include Snapchat filters and apps like Pokémon GO.
HOW DOES COMPUTER VISION WORK
Computer Vision relies on pattern recognition. Machines are trained to recognize patterns through:
1. Data Collection: Gathering a large dataset of images or videos with labeled objects.
2. Preprocessing: Preparing the data for analysis, such as resizing images, normalizing pixel values, and augmenting the dataset with transformations like rotations and flips.
3. Feature Extraction: Identifying the important features or patterns within the images. Traditional methods include techniques like edge detection, texture analysis, and histogram of oriented gradients (HOG).
4. Model Training: Using machine learning algorithms, particularly deep learning, to create models that can recognize patterns and objects in images. Convolutional Neural Networks (CNNs) are particularly effective for image-related tasks.
5. Inference: Applying the trained model to new images to recognize patterns, detect objects, and classify images.
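The preprocessing step can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: images are nested lists of 8-bit grayscale values, and the helper names (normalize, flip_horizontal, rotate_90_clockwise, augment) are hypothetical, not taken from any library or from the slides.

```python
def normalize(image):
    """Scale 8-bit pixel values (0..255) to floats in [0.0, 1.0]."""
    return [[p / 255.0 for p in row] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two transformed copies."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]


# A tiny 2x2 grayscale image used purely as sample data.
image = [[0, 51],
         [102, 255]]

normalized = normalize(image)
variants = augment(image)
```

Here one labeled image yields three training samples, which is the idea behind augmenting a dataset with flips and rotations.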
COMPUTER VISION AND IMAGE PROCESSING
Computer vision is distinct from image processing.
Image processing is the process of creating a new image
from an existing image, typically simplifying or enhancing the
content in some way. It is a type of digital signal processing
and is not concerned with understanding the content of an
image.
A given computer vision system may require image
processing to be applied to raw input, e.g. pre-processing
images.
Examples of image processing include:
• Normalizing photometric properties of the image, such as brightness or color.
• Cropping the bounds of the image, such as centering an object in a photograph.
• Removing digital noise from an image, such as digital artifacts from low light levels.
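Two of these operations, cropping and noise removal, can be sketched in a few lines. This is a minimal Python illustration, assuming a grayscale image stored as a nested list; the 3x3 median filter is one common denoising choice picked for the example, not a method named in the slides, and the function names are hypothetical.

```python
from statistics import median


def crop(image, top, left, height, width):
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3x3 neighborhood.

    Border pixels are copied unchanged to keep the sketch short.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


# A flat gray patch with one bright "noise" pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]

denoised = median_filter_3x3(noisy)
cropped = crop(noisy, 0, 1, 3, 2)
```

In both cases the input is an image and the output is another image, which is what distinguishes image processing from computer vision in this section.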
HOW DOES COMPUTER VISION WORK?
Let’s leave our fluffy cat friends aside for a moment and get more technical 🤔😹.
Below is a simple illustration of the grayscale image buffer that stores our image of
Abraham Lincoln. Each pixel’s brightness is represented by a single 8-bit number
ranging from 0 (black) to 255 (white):
Black 0 00000000
White 255 11111111
In point of fact, pixel values are almost universally stored, at the hardware
level, in a one-dimensional array. For example, the data from the image
above is stored in a manner similar to this long list of unsigned chars:

This way of storing image data may run counter to your expectations, since the data
certainly appears to be two-dimensional when it is displayed. Yet, this is the case, since
computer memory consists simply of an ever-increasing linear list of address spaces.
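The mapping between the two-dimensional picture and the one-dimensional buffer can be sketched as follows. This Python example assumes row-major order (rows stored one after another), which is the usual convention but is not stated on the slide; the pixel values and the pixel_at helper are made up for illustration.

```python
# A 4x3 grayscale image written as rows, as it appears on screen.
width, height = 4, 3
image_2d = [
    [0, 64, 128, 255],
    [10, 20, 30, 40],
    [200, 150, 100, 50],
]

# The same data as one flat sequence of unsigned 8-bit values (row-major order).
buffer = bytes(p for row in image_2d for p in row)


def pixel_at(buffer, width, x, y):
    """Look up the pixel in column x, row y of a row-major buffer."""
    return buffer[y * width + x]


# 8-bit binary representation of the two extreme brightness values.
black_bits = format(0, "08b")
white_bits = format(255, "08b")
```

The index arithmetic y * width + x is all that is needed to treat the linear list as a grid.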
THE EVOLUTION OF
COMPUTER VISION
Before the advent of deep learning, the tasks that computer vision could
perform were very limited and required a lot of manual coding and effort
by developers and human operators. For instance, if you wanted to
perform facial recognition, you would have to perform the following steps:

1. Create a database
2. Annotate images
3. Capture new images
4. Feature selection
5. Feature reduction
6. Correction of error margins
THE EVOLUTION OF COMPUTER VISION - TO DEEP LEARNING
COMPUTER VISION APPLIES HERE TOO
The increased sophistication of computer vision and artificial neural networks (ANNs), coupled with the availability of AI-powered chips, has driven an unparalleled enterprise interest in computer vision (CV). This exciting new technology will find myriad applications in several industries, and according to GlobalData forecasts, the market will reach $28bn by 2030. The increasing adoption of AI-powered computer vision solutions and consumer drones, together with rising Industry 4.0 adoption, will drive this phenomenal change. Here are the top computer vision trends that will be behind the growth of computer vision for modern enterprises:
• Deep learning
• 3D holographic imaging
• Thermal imaging
• Liquid lenses
• Drones
COMPUTER VISION FOR AUTONOMOUS ROBOTS
LANE DETECTION
GAMES
FACE-TAGGING
The difference between computer vision and image processing in
Computer vision helps to gain a high-level understanding from images or videos.
For instance, object recognition, which is the process of identifying the type of objects in an image, is a computer vision problem. In computer vision, you receive an image as input, and you can produce an image or some other type of information as output. Image processing, by contrast, does not need such a high-level understanding of the image. In fact, it is a sub-field of signal processing applied to images. For example, if you have noisy or blurred images, image processing performs the denoising or deblurring that makes the object in the image clearly visible to machines.
Image processing tasks involve filtering, noise removal, edge detection, and color processing. Throughout, you receive an image as input and produce another image as output, which can then be used to train a machine through computer vision.
The main difference between computer vision and image processing lies in the goals, not the methods used. If the goal is to enhance the image quality for later use, it is called image processing. If the goal is to see as humans do, as in object recognition, defect detection or automatic driving, it is called computer vision.
WHAT IS COMPUTER VISION IN
AI AND MACHINE LEARNING?
Computer vision is simply the process of perceiving images and
videos available in digital formats. In machine learning (ML)
and AI, computer vision is used to train a model to recognize
certain patterns and store what it has learned so that it can
predict results in real-life use.
The main purpose of using computer vision technology in ML and AI
is to create a model that can work by itself without human
intervention. The whole process involves methods of acquiring,
processing, analyzing, and understanding digital images
for use in real-world scenarios.
Let’s clear things up: artificial intelligence
(AI), machine learning (ML), and deep
learning (DL) are three different things.
•Artificial intelligence is a science like
mathematics or biology. It studies ways to
build intelligent programs and machines that
can creatively solve problems, which has
always been considered a human
prerogative.
•Machine learning is a subset of artificial
intelligence (AI) that provides systems the
ability to automatically learn and improve
from experience without being explicitly
programmed. In ML, there are different
algorithms (e.g. neural networks) that help to
solve problems.
•Deep learning, or deep neural learning,
is a subset of machine learning, which uses
the neural networks to analyze different
factors with a structure that is similar to the
IMAGE DIGITIZATION

Sampling: measure the value of an image at a finite number of points.
Quantization: represent the measured value (e.g., voltage) at each
sampled point by an integer.
READING IMAGE IN MATLAB
Images are read into the MATLAB environment using the function 'imread'.
The syntax of imread is:
imread('filename');
where 'filename' is a string containing the complete name of the image,
including its extension.
For example,
>> F = imread('Penguins_grey.jpg');
>> G = imread('Penguins_RGB.jpg');
Please note that when no path information is included in 'filename',
'imread' reads the file from the current directory. When an image
from another directory has to be read, the path of the image has to
be specified.
A semicolon (;) at the end of a statement is used to suppress the
output. If it is not included, MATLAB displays on the screen the result
of the operation specified in that line.
'>>' indicates the beginning of a command line as it appears in the
MATLAB command window.
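>> imshow(G);
>> imshow(F);
Figure 1 (top): image obtained with the imshow(F) command
Figure 2 (bottom): image obtained with the imshow(G) command
>> A = imread('Penguins_grey.jpg');   % read the grayscale image
>> B = imread('Penguins_RGB.jpg');    % read the color image
>> figure                             % open a new figure window
>> subplot(1,2,1), imshow(A)          % left panel: grayscale image
>> subplot(1,2,2), imshow(B)          % right panel: color image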
SAMPLING AND QUANTIZATION
To become suitable for digital processing, an image function f(x,y) must be digitized both spatially and in amplitude. Typically, a frame grabber or digitizer is used to sample and quantize the analogue video signal. Hence, to create a digital image, we need to convert continuous data into digital form. This is done in two steps:
•Sampling
•Quantization
The sampling rate determines the spatial resolution of the digitized image, while the quantization level determines the number of grey levels in the digitized image. The magnitude of the sampled image is expressed as a digital value in image processing. The transition between continuous values of the image function and its digital equivalent is called quantization.
The number of quantization levels should be high enough for human perception of fine shading details in the image. The occurrence of false contours is the main problem in an image that has been quantized with insufficient brightness levels.
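The two steps can be sketched together. In the Python below, the continuous image function, its value range [0, 1], and the unit-square sampling domain are all assumptions chosen for illustration; digitize and ramp are hypothetical names.

```python
def digitize(f, width, height, levels=256):
    """Sample f on a width x height grid over the unit square and
    quantize each sample to an integer in 0..levels-1.

    f is assumed to return values in [0.0, 1.0].
    """
    image = []
    for row in range(height):
        y = row / height                 # sampling: pick discrete y positions
        line = []
        for col in range(width):
            x = col / width              # sampling: pick discrete x positions
            value = min(max(f(x, y), 0.0), 1.0)
            # quantization: map the continuous value to an integer level
            line.append(min(int(value * levels), levels - 1))
        image.append(line)
    return image


def ramp(x, y):
    """A continuous test image: brightness increases from left to right."""
    return x


coarse = digitize(ramp, width=4, height=2, levels=4)
fine = digitize(ramp, width=8, height=2, levels=256)
```

Changing width and height alters the spatial resolution, while changing levels alters the number of grey levels, matching the roles of sampling and quantization described above.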
TYPES OF AN IMAGE
BINARY IMAGE – A binary image, as its name suggests, contains only two pixel values, 0 and 1, where 0 refers to black and 1 refers to white. This type of image is also known as monochrome.

BLACK AND WHITE IMAGE – An image that consists of only black and white is called a black and white image.

8-bit COLOR FORMAT – One of the most widely used image formats. It has 256 different shades and is commonly known as a grayscale image. In this format, 0 stands for black, 255 stands for white, and 127 stands for gray.

16-bit COLOR FORMAT – A color image format with 65,536 different colors, also known as High Color format. In this format the distribution of color is not the same as in a grayscale image.

The 16 bits are divided among three channels: red, green, and blue. This is the familiar RGB format.

Pixel = f(r, g, b)

With 8 bits per channel, each component ranges from 0 to 255:

Red = f(255, 0, 0)

Green = f(0, 255, 0)

Purple ≈ f(128, 0, 128)
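How a 16-bit pixel can hold three channels is easiest to see in code. The sketch below assumes the common 5-6-5 layout (5 bits red, 6 bits green, 5 bits blue); the slides do not say which split is meant, so treat the layout and the function names as assumptions.

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit r, g, b values into one 16-bit integer (5-6-5 layout)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(pixel):
    """Recover approximate 8-bit r, g, b values from a 16-bit pixel."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Expand back to 8 bits by repeating the high bits in the low positions.
    return ((r5 << 3) | (r5 >> 2),
            (g6 << 2) | (g6 >> 4),
            (b5 << 3) | (b5 >> 2))


red = pack_rgb565(255, 0, 0)
green = pack_rgb565(0, 255, 0)
blue = pack_rgb565(0, 0, 255)
total_colors = 2 ** 16
```

Sixteen bits give 2^16 = 65,536 combinations, which is where the color count for this format comes from.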

[Figure: example image shown as a grid of pixel intensity values]
Sampling
Sampling: The sampling rate determines the spatial resolution of the digitized image.
[Figure: the same image sampled at resolutions of 1024, 512, 256, 128, 64, and 32]
Quantization: The quantization level determines the number of grey levels in the digitized image. The magnitude of the sampled image is expressed as a digital value in image processing. The transition between continuous values of the image function and its digital equivalent is called quantization.
Rounding of grey levels (1-bit to 16-bit rounding)
[Figure: the same image quantized at 8, 7, 6, 5, 4, 3, 2, and 1 bits]
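The effect of reducing the bit depth can be sketched by discarding the low-order bits of each 8-bit pixel. This is one simple way to round grey levels, assumed here for illustration; it is not necessarily how the slide's figure was produced, and the helper names are hypothetical.

```python
def requantize(image, bits):
    """Keep only the top `bits` bits of each 8-bit pixel (1 <= bits <= 8)."""
    shift = 8 - bits
    return [[(p >> shift) << shift for p in row] for row in image]


def distinct_levels(image, bits):
    """Count how many different grey levels remain after requantizing."""
    return len({p for row in requantize(image, bits) for p in row})


# A one-row image containing every 8-bit grey level once (a smooth ramp).
ramp = [list(range(256))]

levels_by_depth = {bits: distinct_levels(ramp, bits) for bits in (8, 4, 2, 1)}
```

At low bit depths the smooth ramp collapses into a few flat bands, which is the false-contour effect described for images quantized with too few levels.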

COLOR IMAGES
Color images comprise three color channels (red, green, and blue), which combine to create most of the colors we can see.

$$ f(x, y) = \begin{bmatrix} r(x, y) \\ g(x, y) \\ b(x, y) \end{bmatrix} $$
COLOR SENSING IN CAMERA:
PRISM
Requires three chips and precise alignment.

CCD(R)

CCD(G)

CCD(B)
COLOR SENSING IN CAMERA:
COLOR FILTER ARRAY
• In traditional systems, color filters are applied to a single
layer of photodetectors in a tiled mosaic pattern.

Bayer grid
Why more green?

Human Luminance Sensitivity Function

COLOR SENSING IN CAMERA:
COLOR FILTER ARRAY

red green blue output

demosaicing
(interpolation)
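A color filter array and the interpolation step can be sketched as follows. The code assumes an RGGB tile layout and plain neighbor averaging for the green channel only; real cameras use more elaborate demosaicing, and all names here are hypothetical.

```python
def bayer_channel(x, y):
    """Channel recorded at column x, row y, assuming an RGGB Bayer tile."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def mosaic(rgb_image):
    """Simulate the sensor: keep one channel per pixel according to the tile."""
    index = {"R": 0, "G": 1, "B": 2}
    return [[pixel[index[bayer_channel(x, y)]] for x, pixel in enumerate(row)]
            for y, row in enumerate(rgb_image)]


def interpolate_green(raw):
    """Fill in green at red and blue sites by averaging the adjacent green sites."""
    h, w = len(raw), len(raw[0])
    green = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if bayer_channel(x, y) == "G":
                green[y][x] = float(raw[y][x])
            else:
                # In an RGGB tile the four direct neighbors of a red or blue
                # site are all green sites.
                neighbors = [raw[ny][nx]
                             for nx, ny in ((x - 1, y), (x + 1, y),
                                            (x, y - 1), (x, y + 1))
                             if 0 <= nx < w and 0 <= ny < h]
                green[y][x] = sum(neighbors) / len(neighbors)
    return green


# A uniformly colored 4x4 test image, as (r, g, b) tuples.
image = [[(200, 100, 50)] * 4 for _ in range(4)]
raw = mosaic(image)
green = interpolate_green(raw)
green_sites = sum(bayer_channel(x, y) == "G" for y in range(4) for x in range(4))
```

Half of the sites in the tile record green, twice as many as red or blue, which is the "more green" property the slide asks about.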
COLOR SENSING IN CAMERA:
FOVEON X3
• CMOS sensor; takes advantage of the fact that red, green, and blue
light penetrate silicon to different depths.

http://www.foveon.com/article.php?a=67
ALTERNATIVE COLOR SPACES
Various other color representations can be computed from RGB.
This can be done for:
• Decorrelating the color channels: principal components.
• Bringing color information to the fore: hue, saturation and brightness.
• Perceptual uniformity: CIELuv, CIELab, …
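As one example of computing another representation from RGB, Python's standard colorsys module converts to HSV (hue, saturation, value). Treating the HSV "value" component as the brightness mentioned above is an assumption of this sketch, and rgb8_to_hsv is a hypothetical wrapper name.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue, saturation, value), each in [0.0, 1.0]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


red_hsv = rgb8_to_hsv(255, 0, 0)
green_hsv = rgb8_to_hsv(0, 255, 0)
gray_hsv = rgb8_to_hsv(128, 128, 128)
```

Pure red and pure green differ only in hue, while gray has zero saturation, so the color information is separated from the brightness.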
COLOR TRANSFORMATION -
EXAMPLES
SKIN COLOR
[Figure: skin-color pixels plotted in RGB space and in rg space, with axes r and g]
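The rg representation is sketched below under the assumption that "rg" refers to normalized rg chromaticity, r = R / (R + G + B) and g = G / (R + G + B); the slide shows only the axis labels. The pixel values are arbitrary sample data, and no skin-color thresholds are implied.

```python
def rg_chromaticity(r, g, b):
    """Map an RGB triple to normalized (r, g) chromaticity coordinates."""
    total = r + g + b
    if total == 0:
        # Black has no defined chromaticity; returning the neutral point
        # (1/3, 1/3) is a convention chosen for this sketch.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (r / total, g / total)


bright = rg_chromaticity(200, 100, 100)
dark = rg_chromaticity(100, 50, 50)
```

The bright and dark versions of the same color land on the same (r, g) point, so this two-dimensional space discards overall intensity while keeping the color ratio.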
SKIN DETECTION

M. Jones and J. Rehg, Statistical Color Models with Application to Skin Detection, International Journal of Computer Vision, 2002.
RESIZING IMAGE
Image interpolation occurs when
you resize or distort your image from
one pixel grid to another.
Image resizing is necessary when
you need to increase or decrease the
total number of pixels, whereas
remapping can occur when you are
correcting for lens distortion or rotating
an image.
Color image processing
Brightness Adaptation
Contrast control
Transpositioning
Softening EDGES
Smoothening + Brightening
Resizing Images

Zooming:
• Creating new pixel locations
• Assigning gray-level values to these locations
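Both zooming steps appear in a nearest-neighbor resize, sketched below in Python. Nearest-neighbor is the simplest interpolation rule and is chosen here only for illustration; the slides do not name a specific method, and resize_nearest is a hypothetical helper.

```python
def resize_nearest(image, new_width, new_height):
    """Resize a grayscale image (nested lists) with nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        # Step 1: each (x, y) in the new grid is a new pixel location.
        # Step 2: it receives the gray level of the source pixel it maps onto.
        [image[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


small = [[1, 2],
         [3, 4]]

zoomed = resize_nearest(small, 4, 4)
shrunk = resize_nearest(zoomed, 2, 2)
```

Zooming by a factor of two simply repeats each source pixel in a 2x2 block; smoother results would need an interpolation rule that blends neighboring values.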
COMMON IMAGE FILE FORMATS
GIF (Graphics Interchange Format)
PNG (Portable Network Graphics)
JPEG (Joint Photographic Experts Group)
TIFF (Tagged Image File Format)
PGM (Portable Gray Map)
FITS (Flexible Image Transport System)
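Of these formats, PGM is simple enough to write and read by hand, which makes it a useful illustration of what an image file contains. The Python sketch below handles only the plain-text "P2" variant of PGM; the function names are hypothetical and the parser is deliberately minimal.

```python
def to_pgm_ascii(image, max_value=255):
    """Serialize a grayscale image (nested lists) as plain-text PGM (P2)."""
    height, width = len(image), len(image[0])
    lines = ["P2", f"{width} {height}", str(max_value)]
    lines += [" ".join(str(p) for p in row) for row in image]
    return "\n".join(lines) + "\n"


def from_pgm_ascii(text):
    """Parse plain-text PGM (P2) back into (image, max_value)."""
    tokens = []
    for line in text.splitlines():
        # Anything after '#' on a line is a comment in the PGM format.
        tokens.extend(line.split("#", 1)[0].split())
    if tokens[0] != "P2":
        raise ValueError("not a plain-text PGM (P2) file")
    width, height, max_value = (int(t) for t in tokens[1:4])
    pixels = [int(t) for t in tokens[4:4 + width * height]]
    image = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return image, max_value


picture = [[0, 128, 255],
           [255, 128, 0]]

encoded = to_pgm_ascii(picture)
decoded, max_value = from_pgm_ascii(encoded)
```

The file is just a short header (format tag, width, height, maximum gray value) followed by the pixel values in row order.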
IMAGE PROCESSING
An image processing operation typically
defines a new image g in terms of an existing
image f.
We can transform either the range of f:

or the domain of f:

What kinds of operations can each perform?
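The two kinds of operation can be sketched side by side. A common way to write them, assumed here because the slide's formulas did not survive extraction, is g(x, y) = t(f(x, y)) for a range transformation and g(x, y) = f(t_x(x, y), t_y(x, y)) for a domain transformation. The helper names and sample values in this Python sketch are hypothetical.

```python
def transform_range(image, t):
    """g(x, y) = t(f(x, y)): change pixel values, keep pixel positions."""
    return [[t(value) for value in row] for row in image]


def transform_domain(image, t):
    """g(x, y) = f(t(x, y)): keep pixel values, change where they are read from.

    t maps output coordinates (x, y) to source coordinates (sx, sy),
    which are assumed to lie inside the image.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            sx, sy = t(x, y)
            row.append(image[sy][sx])
        out.append(row)
    return out


f = [[10, 20, 30],
     [40, 50, 60]]

# Range transformation: brighten every pixel, clamped to the 8-bit maximum.
brighter = transform_range(f, lambda v: min(v + 100, 255))

# Domain transformation: mirror the image left to right (width is 3).
mirrored = transform_domain(f, lambda x, y: (2 - x, y))
```

Range transformations cover operations such as brightness and contrast changes, while domain transformations cover geometric operations such as flipping, shifting, and warping.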

LEVELS OF COMPUTER VISION
The continuum from image processing to computer vision can be
broken up into low-, mid- and high-level processes:

Low-Level Process
Input: Image
Output: Image
Examples: Noise removal, image sharpening

Mid-Level Process
Input: Image
Output: Attributes
Examples: Object recognition, segmentation

High-Level Process
Input: Attributes
Output: Understanding
Examples: Scene understanding, autonomous navigation