Deep Learning based Computer Vision

The document discusses artificial neural networks and deep learning, particularly in the context of image processing and computer vision. It covers topics such as digital images, image formats, spatial filtering, and the effectiveness of deep learning algorithms in tasks like object detection and recognition. Additionally, it explains the architecture of convolutional neural networks and their applications in various fields.


ARTIFICIAL NEURAL NETWORKS AND DEEP LEARNING

Deep Learning and its role in Computer Vision


Introduction to Robotics

Dr. Sandeep Singh Sengar


What is a Digital Image?

An image is a two-dimensional intensity function f(x, y): the value of f at a spatial location (x, y) is the intensity (gray level) of the image at that point.

[Figure: a grayscale image drawn as the surface f(x, y) over the spatial coordinates x and y, with the gray level as the function value.]



Common image formats
– 1 sample per point (B&W), values in [0, 1]
– 1 sample per point (grayscale), values in [0, 255]
– 3 samples per point (Red, Green, and Blue), each in [0, 255]
– 4 samples per point (Red, Green, Blue, and “Alpha”, a.k.a. opacity), RGB in [0, 255], alpha in [0, 255] or [0, 1]



Color Image

RGB Color Space


A color image is just three functions pasted together. We can write this as a “vector-valued” function:

    f(x, y) = [ r(x, y)  g(x, y)  b(x, y) ]^T

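In code, this simply means one array with three channels (a minimal sketch, assuming NumPy; the image here is random and purely illustrative):

```python
import numpy as np

# Hypothetical 4 x 4 RGB image with values in [0, 255] (purely illustrative).
img = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# The "vector-valued" function f(x, y): one sample per channel at each pixel.
r, g, b = img[..., 0], img[..., 1], img[..., 2]

y, x = 2, 1                       # a spatial location (x, y)
print(img[y, x])                  # f(x, y) = [r(x, y), g(x, y), b(x, y)]
print(r[y, x], g[y, x], b[y, x])  # the same three samples, channel by channel
```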


RGB Image



Image Processing

An image processing operation typically defines a new image g in terms of an existing image f. We can write this image transform as g(x, y) = T[ f(x, y) ].



Why Digital Image Processing?
Digital image processing focuses on two major tasks
– Improvement of pictorial information for human interpretation
– Processing of image data for storage, transmission and
representation for autonomous machine perception
There is some debate about where image processing ends and fields such as image analysis and computer vision begin.



The Spatial Filtering Process
Origin at the top-left of the image f(x, y); x runs across and y runs down.

A simple 3*3 neighbourhood of image pixels centred on e,

    a b c
    d e f
    g h i

is combined with a 3*3 filter (mask) w,

    j k l
    m n o
    p q r

to give the new value of the centre pixel:

    e_processed = n*e + j*a + k*b + l*c + m*d + o*f + p*g + q*h + r*i

The above is repeated for every pixel in the original image to generate the filtered image.
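A minimal sketch of this process in code (assuming NumPy; img is a grayscale image array and w a 3*3 filter; border pixels are skipped here and handled later under border padding):

```python
import numpy as np

def spatial_filter(img, w):
    """Apply a 3*3 filter w to a grayscale image by sliding it over every
    interior pixel and summing the element-wise products (correlation)."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    for y in range(1, H - 1):          # skip the 1-pixel border for now
        for x in range(1, W - 1):
            neighbourhood = img[y - 1:y + 2, x - 1:x + 2]
            out[y, x] = np.sum(neighbourhood * w)   # e_processed
    return out

# Example: the 3*3 averaging filter used on the smoothing slides below.
w_avg = np.full((3, 3), 1 / 9)
img = np.array([[104, 100, 108],
                [ 99, 106,  98],
                [ 95,  90,  85]], dtype=float)
print(spatial_filter(img, w_avg)[1, 1])   # 98.333..., as computed later
```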
Levels of Digital Image Processing
The continuum from image processing to computer vision
can be broken up into low-, mid- and high-level processes

Low-level process:    Input: image.       Output: image.           Examples: noise removal, image sharpening.
Mid-level process:    Input: image.       Output: attributes.      Examples: object recognition, segmentation.
High-level process:   Input: attributes.  Output: understanding.   Examples: scene understanding, autonomous navigation.



Spatial filters
Recall the two types of neighbourhood:

intensity transformation: neighbourhood of size 1x1

spatial filter (or mask, kernel, template or window): neighbourhood of larger size, e.g. a 3*3 mask

The spatial filter mask is moved from point to point in an image. At each point (x, y),
the response of the filter is calculated
[Figure: a 3*3 neighbourhood about the point (x, y) in an image f(x, y), with the origin at the top-left.]
Neighbourhood Operations

For each pixel in the original image, the outcome is written at the same location in the target image.

[Figure: the neighbourhood of (x, y) in the original image produces the value at (x, y) in the target image.]
Smoothing Spatial Filtering
With the same process, consider the 3*3 neighbourhood of image pixels

    104 100 108
     99 106  98
     95  90  85

and the simple 3*3 smoothing filter whose coefficients are all 1/9. The filtered value of the centre pixel is

    e = 1/9*106 + 1/9*104 + 1/9*100 + 1/9*108 + 1/9*99 + 1/9*98 + 1/9*95 + 1/9*90 + 1/9*85 = 98.3333

The above is repeated for every pixel in the original image to generate the smoothed image.
Spatial filters : Smoothing
Linear smoothing: averaging kernels

Standard average


Spatial filters : Smoothing


Standard Average - example

The mask is moved from point to point in the image. At each point (x, y), the response of the filter is calculated. For the image

    110 120  90 130
     91  94  98 200
     90  91  99 100
     82  96  85  90

the standard averaging filter applied to the top-left 3*3 neighbourhood gives

    (110 + 120 + 90 + 91 + 94 + 98 + 90 + 91 + 99)/9 = 883/9 = 98.1



Spatial filters : Smoothing
Weighted Average - example



Spatial filters : Smoothing
Median Filter - example



Another smoothing example
By smoothing the original image we remove much of the finer detail, leaving only the gross features for thresholding.

[Figure: original image, smoothed image, thresholded image.]



Averaging filter vs. median filter example

[Figure: original image with noise; image after an averaging filter; image after a median filter.]

• Filtering is often used to remove noise from images.


• Sometimes a median filter works better than an averaging filter.

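A small illustration of this difference (a sketch assuming SciPy; scipy.ndimage provides uniform_filter for the averaging mask and median_filter for the median mask):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = np.full((64, 64), 100.0)
rows = rng.integers(0, 64, size=50)
cols = rng.integers(0, 64, size=50)
img[rows, cols] = 255.0           # sprinkle salt noise onto a flat image

mean_filtered = ndimage.uniform_filter(img, size=3)    # 3*3 averaging filter
median_filtered = ndimage.median_filter(img, size=3)   # 3*3 median filter

# The mean filter only smears the outliers; the median filter largely removes them.
print(np.abs(mean_filtered - 100).max(), np.abs(median_filtered - 100).max())
```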


Strange things happen at the edges!
At the edges of an image we are missing pixels to form a complete neighbourhood.

[Figure: at corner and edge pixels, part of the 3*3 neighbourhood falls outside the image f(x, y).]

What happens when the values of the kernel fall outside the image?

Border padding
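One common remedy is to pad the image before filtering so that every pixel has a full neighbourhood (a sketch using NumPy's np.pad; which padding mode is appropriate depends on the application):

```python
import numpy as np

img = np.array([[1, 2],
                [3, 4]])

print(np.pad(img, 1, mode="constant", constant_values=0))  # zero padding
print(np.pad(img, 1, mode="edge"))      # replicate the border pixels
print(np.pad(img, 1, mode="reflect"))   # mirror the image at its edges
```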


Applications

Text Recognition

Biometrics



Computer Vision



“One picture is worth more than a thousand words”



Object Detection
• Moving-object detection is one of the basic and most
active research domains in the field of computer vision.
• The underlying assumption is that moving objects generally cause intensity changes between consecutive frames.
Object Tracking
Object tracking computes the configuration (i.e., position and size) of the target in subsequent frames, given the state of the target in the initial frame.
Object Recognition

Object recognition is a computer vision technique for identifying objects in images or videos.
Medical Image Segmentation

[Figure: medical imaging.]


What is Machine Learning?
Machine learning is a subset of Artificial Intelligence that provides computers with the ability to learn without being explicitly programmed.

Machine learning emerged in the 1950s. The term was coined in 1959 by Arthur Samuel at IBM (who designed a checkers-playing program).

Ref: https://www.forbes.com/sites/kalevleetaru/2019/01/15/why-machine-learning-needs-semantics-not-just-statistics/?sh=730fa3aa77b5
Branches of Machine Learning

Ref: https://www.wordstream.com/blog/ws/2017/07/28/machine-learning-applications
Deep Learning
Deep Learning is a subfield of machine learning concerned
with algorithms inspired by the structure and function of the
brain called artificial neural networks.
DL/ML is used to find the algorithm (model); with large amounts of data, deep learning achieves high performance.

Ref: https://www.intel.la/content/www/xl/es/artificial-intelligence/posts/difference-between-ai-machine-learning-deep-learning.html
Why Deep Learning Today?
▪ Better algorithms and
understanding
▪ Computational power (GPUs,
TPUs, …)
▪ Massive labelled data
▪ Variety of open source tools
and models

Slide adapted from Wai K.


End-to-end approach?



Ref: https://lawtomated.com/a-i-technical-machine-vs-deep-learning/
Deep Learning Process
▪ Deep neural networks provide state-of-the-art accuracy in many tasks, from object detection to speech recognition
▪ They can learn automatically, without predefined knowledge explicitly coded by the programmers



Effectiveness of Deep Learning
▪ Deep learning algorithms attempt to learn
representation by using a hierarchy of multiple
layers
▪ If we provide the system tons of information, it
begins to understand it and respond in useful
ways
▪ Manually designed features are often over-
specified, incomplete and take a long time to
design and validate
▪ Learned features are easy to adapt, fast to learn



Effectiveness of Deep Learning
▪ Deep learning provides a very flexible, universal and learnable framework for representing the world
▪ Can learn in both unsupervised and supervised
manner
▪ Utilize large amounts of training data
▪ Since 2010, deep learning started outperforming
other machine learning techniques especially in
the areas of machine vision and speech
recognition



Deep Learning Examples
▪ Hierarchy of representations with increasing level
of abstraction
▪ Each stage is a kind of trainable nonlinear feature
transform
▪ Image recognition example
• Pixel → edge → texton → motif → part → object
▪ Text example
• Character → word → word group → clause →
sentence → story



Deep Learning in Practice
▪ Visual question answering : Given an image and a
natural language question about the image, the
task is to provide an accurate natural language
answer
▪ Demo: http://visualqa.csail.mit.edu/



Deep Learning Architectures

Architecture         Applications
CNN                  Image recognition, video analysis, natural language processing
RNN                  Speech recognition, handwriting recognition, machine translation
LSTM/GRU networks    Natural language text compression, handwriting recognition, speech recognition, gesture recognition, image captioning
DBN                  Image recognition, information retrieval, natural language understanding, failure prediction
DSN                  Information retrieval, continuous speech recognition


The Spatial Filtering Process
(Recap) A 3*3 filter w,

    j k l
    m n o
    p q r

slides over each 3*3 neighbourhood of image pixels,

    a b c
    d e f
    g h i

and the centre pixel is replaced by

    e_processed = n*e + j*a + k*b + l*c + m*d + o*f + p*g + q*h + r*i

This is repeated for every pixel in the original image to generate the filtered image. A convolutional layer applies exactly this kind of operation, except that the filter coefficients are learned.
Convolutional Neural Network
A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects/objects in the image, and can differentiate one from another. The pre-processing required by a CNN is much lower than for other classification algorithms.



CNN layers
An image is passed through a series of layers (a minimal sketch in code follows below):
– Convolutional: the filters can be thought of as feature identifiers
– Nonlinear (ReLU): approximates complex functions
– Max pooling: down-sampling
– Fully connected layers (softmax/sigmoid), which produce the output



Ref: https://towardsdatascience.com/understanding-and-implementing-lenet-5-cnn-architecture-deep-learning-a2d531ebc342
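A minimal sketch of such a layer stack (assuming PyTorch, a 1-channel 28 x 28 input and 10 output classes; the layer sizes are illustrative, not taken from the slides):

```python
import torch
import torch.nn as nn

# Illustrative stack: conv -> ReLU -> max pool -> flatten -> fully connected.
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),  # 1x28x28 -> 8x28x28
    nn.ReLU(),                                                           # nonlinearity
    nn.MaxPool2d(kernel_size=2),                                         # 8x28x28 -> 8x14x14
    nn.Flatten(),                                                        # -> 8*14*14 = 1568
    nn.Linear(8 * 14 * 14, 10),                                          # class scores (logits)
)

x = torch.randn(1, 1, 28, 28)            # a batch with one image
probs = torch.softmax(model(x), dim=1)   # softmax turns scores into probabilities
print(probs.shape, probs.sum())          # torch.Size([1, 10]), ~1.0
```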
Convolutional Neural Network



Basic idea of convolution


Convolutional Layer Example

Stride s = 2, # filters = 2, # channels = 3, padding p = 1


Size of Output
I/P size: n*n
Filter size: f*f
O/P size: (n-f+1)*(n-f+1)



Padding and stride convolutions
Padding is used to keep the O/P size equal to the I/P size.
With padding p: O/P size = (n+2p-f+1)*(n+2p-f+1); "same" output size requires p = (f-1)/2

With stride s:
O/P size = [floor((n+2p-f)/s)+1] * [floor((n+2p-f)/s)+1]
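A small helper to check these formulas (a sketch; f = 3, s = 2, p = 1 are reused from the convolutional layer example above, while the input size n = 6 is made up for illustration):

```python
def conv_output_size(n, f, p=0, s=1):
    """Spatial output size of an n*n input with an f*f filter,
    padding p and stride s: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))             # no padding, stride 1 -> 4   (n - f + 1)
print(conv_output_size(6, 3, p=1))        # "same" padding p = (f-1)/2 -> 6
print(conv_output_size(6, 3, p=1, s=2))   # stride 2 -> 3
```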


Multiple filters
For example, to detect horizontal and vertical edges at the same time.

An n × n × nc input convolved with nc' filters of size f × f × nc gives an output of size (n-f+1) × (n-f+1) × nc', where nc' = # of filters.



Number of parameters in one layer
Suppose 10 filters of size 3*3*3.

Total parameters: [3*3*3 + 1 (bias)] * 10 = 280, i.e. one bias per filter.

The parameter count does not depend on the size of the input image (a beauty of DL), which makes the model less prone to overfitting.
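This count can be checked directly (a sketch assuming PyTorch; Conv2d here stores one 10 x 3 x 3 x 3 weight tensor plus 10 biases):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 10*3*3*3 weights + 10 biases = 280
```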


Automatically learnt features

Retain most information (edge detectors)

Towards more abstract representation

Encode high level concepts

Sparser representations: detect fewer (more abstract) features

Ref: https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
Non-linear Activation Function



Pooling
▪ The goal of the pooling operation is to reduce the
spatial size of convolved features
▪ Pooling helps extract salient features that are invariant to rotation and position
• For example, even if the orientation of the nose, eyes and ears changes, the image segment would still be detected as a head
• This is one of the most prominent features of CNNs



Pooling
▪ Two types of pooling operators are common: Max
pooling and Average pooling
• Max pooling returns the maximum value from the
portion of the image covered by the filter
• Average pooling returns the average of all the
values from the portion of the image covered by the
filter



Max Pooling
▪ Let's apply a 3 x 3 filter (stride 1) to a 5 x 5 convolved feature map:

    15.5 23.8  7.9 20.6 12.9
    12.7 18.3 22.3  7.9  8.3
    11.3  9.2 11.8 18.9 10.3
    11.7 11.3 17.5  6.8 19.3
    18.3 19.6 11.2 15.2  7.2

▪ The first (top-left) window has maximum 23.8; sliding the window one step at a time gives the 3 x 3 max-pooled map

    23.8 23.8 22.3
    22.3 22.3 22.3
    19.6 19.6 19.3


Average Pooling
▪ Applying the same 3 x 3 filter (stride 1) to the 5 x 5 convolved feature map, each output is the mean of the values in the window: the first two outputs are 14.8 and 15.6, and so on across the map.
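A sketch that reproduces both results on the same 5 x 5 map (plain NumPy, stride 1, no padding):

```python
import numpy as np

fmap = np.array([[15.5, 23.8,  7.9, 20.6, 12.9],
                 [12.7, 18.3, 22.3,  7.9,  8.3],
                 [11.3,  9.2, 11.8, 18.9, 10.3],
                 [11.7, 11.3, 17.5,  6.8, 19.3],
                 [18.3, 19.6, 11.2, 15.2,  7.2]])

def pool2d(x, size=3, stride=1, op=np.max):
    """Slide a size x size window over a square map x and apply op."""
    out_dim = (x.shape[0] - size) // stride + 1
    out = np.zeros((out_dim, out_dim))
    for i in range(out_dim):
        for j in range(out_dim):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = op(window)
    return out

print(pool2d(fmap, op=np.max))   # first row: 23.8 23.8 22.3
print(pool2d(fmap, op=np.mean))  # first row: ~14.8 ~15.6 ~13.4
```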


Max Pooling

For a 4 x 4 input in hidden layer i,

    -4  5  4  6
     0 -3  2 -3
     7  8 -5  9
     3  0 -4  1

possible pooled nodes in hidden layer i+1 are:

    4 x 4 max:                    9

    2 x 2 max, non-overlapping:   5 6
                                  8 9

    2 x 2 max, overlapping:       5 5 6
                                  8 8 9
                                  8 8 9
    (the overlapping result contains the non-overlapping one, so there is no need for both)

Pooling output size: with I/P size n*n, filter size f*f, padding p and stride s, the O/P size is floor((n+2p-f)/s) + 1.
Fully Connected Layer



Fully Connected Layer
• Simply put, these are feed-forward neural networks.
• Fully connected layers form the last few layers of the network.
• The input to the fully connected layer is the output of the final pooling or convolutional layer, in flattened form.
• After passing through the fully connected layers, the final layer uses the softmax activation function to obtain the probability of the input belonging to each class (classification).

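As a small numeric illustration of that final softmax step (the logits here are made up):

```python
import numpy as np

def softmax(z):
    """Turn a vector of class scores (logits) into probabilities."""
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())            # approx [0.659 0.242 0.099], sums to 1
```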


CNN Architectures
There are various CNN architectures that have been key in building the algorithms which power, and will continue to power, AI in the foreseeable future. Some of them are listed below:
• LeNet
• AlexNet
• VGGNet
• GoogLeNet
• ResNet
• ZFNet



U-Net

Ref: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation." In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241. Springer, Cham, 2015.
Train, Validation and Test Datasets

• Training Dataset: The sample of data used to fit the model.


• Validation Dataset: The validation set is used to evaluate a given model, and machine learning researchers use it to fine-tune the model hyperparameters. The model occasionally sees this data but never "learns" from it, so the validation set affects the model only indirectly.
• Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the
training dataset. The Test dataset provides the gold standard used to evaluate the model. It is only
used once a model is completely trained (using the train and validation sets).

Make sure the validation and test sets come from the same distribution.

Hyperparameters: learning rate, # iterations, # hidden layers, # hidden units, choice of activation function
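A common way to carve out the three sets (a sketch using scikit-learn's train_test_split; the 60/20/20 proportions and the random data are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 32)        # hypothetical features
y = np.random.randint(0, 2, 1000)   # hypothetical labels

# First split off the test set, then split the rest into train and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600, 200, 200
```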
Under-fitting and Over-fitting

High bias: underfitting
High variance: overfitting
Bias and Variance

Training set error:     1%              15%          15%                       0.5%
Validation set error:   11%             16%          30%                       1%
Result:                 High variance   High bias    High bias and variance    Low bias and variance


Bias-variance Trade-off



Under fitting (High bias)
• A statistical model or machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data.
• Underfitting destroys the accuracy of our machine learning model.
• Training accuracy is very low in this case.

Steps for reducing underfitting:
⮚ Use a bigger network
⮚ Train for longer
⮚ Increase the number of parameters in the model



Overfitting (high variance)
• Overfitting happens when your model fits too well to the training set.
• It then becomes difficult for the model to generalize to new examples
that were not in the training set.
Steps for reducing overfitting:
⮚ Add more data
⮚ Data augmentation (rotate, crop, zoom)
⮚ Simplify the model
⮚ Change the training process (like loss function)
⮚ Early termination
⮚ Regularization
❑ Dropout and drop connect
❑ L1 and L2 regularization

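A sketch of two of these regularizers in PyTorch (dropout as a layer, L2 regularization via the optimizer's weight_decay; the network and values are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zero half of the activations during training
    nn.Linear(64, 10),
)

# weight_decay adds an L2 penalty on the weights to the update rule.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

model.train()   # dropout active during training
model.eval()    # dropout disabled at test time
```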


Ideas to improve ML/DL strategies
• Collect more data
• Collect more diverse training examples
• Train algorithm longer with suitable optimizer
• Try bigger network
• Try smaller network
• Try dropout
• Add regularization
• Network architectures:
❑ Activation function
❑ #hidden units
❑ Learning rate
❑ Iterations



Problems where ML/DL significantly surpasses
human level performance
• Online advertising: estimating how likely someone is to click on an ad
• Product recommendations
• Loan approval
• Lots of data



CNN for Computer Vision tasks
• Object detection
• Object tracking
• Recognition
• Face recognition
• Action and activity recognition
• Human pose estimation
• Image classification
• Image classification with localization
• Object segmentation
• Image style transfer
• Image colorization
• Image reconstruction
• Image super-resolution
• Image synthesis


Challenges

The challenge of making systems human-like:
• It is difficult to simulate something as complex as the human visual system.
• Objects may come in a variety of sizes and aspect ratios.
• One object must be distinguished from multiple others.
• There is a variety of handwriting styles, curves and shapes employed while writing.
• Deformation, appearance variation, scale variation, occlusion and rotation of objects.

Computer vision has its present challenges, but the humans working on this technology are steadily improving it.
CNN: A Real Example

[Figures: a worked example showing filters and the resulting feature maps.]


Convolutional Neural Network
Suppose the task is to predict an image caption.
▪ The CNN receives an image of, let's say, a cat
• In computer terms, this image is a collection of pixels
▪ Generally, there is one layer (channel) for a greyscale picture and three for a colour picture
▪ During feature learning (i.e., in the hidden layers), the network identifies unique features, for instance the tail of the cat, the ears, etc.
▪ Once the network has thoroughly learned how to recognize a picture, it can provide a probability for each label it knows
▪ The label with the highest probability will become the
prediction of the network
Which Works Better: RNN or CNN?
▪ There is a vast number of neural network architectures, each designed to perform a given task
▪ CNN works very well with images
▪ RNN (Recurrent Neural Network) provides
impressive results with time series and text
analysis



Self-Review Questions
▪ What is convolution and how does it work?
▪ What is pooling and how does it work?
▪ What would be the impact of a large/small stride length?



References

“Digital Image Processing”, Rafael C. Gonzalez & Richard E. Woods, Addison-Wesley, 2002
– Much of the material in the image-processing slides is taken from this book

“Machine Vision: Automated Visual Inspection and Robot Vision”, David Vernon, Prentice Hall, 1991


Thank You
