
Convolution Neural Networks

machine learning

Uploaded by

sudheer.emoa229

How Does a Computer Read an Image?


Why Convolutional Neural Networks?

• Convolutional Neural Networks are made up of neurons with learnable weights and biases.
• Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function, and responds with an output.
• The whole network has a loss function.
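The behaviour of a single neuron described above can be sketched in a few lines. This is only an illustration: the input values, weights, and bias are made up, and ReLU is assumed as the activation function.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = np.dot(inputs, weights) + bias  # weighted sum over the inputs
    return max(0.0, z)                  # activation (ReLU, assumed for illustration)

x = np.array([1.0, 2.0, 3.0])      # hypothetical inputs
w = np.array([0.5, -0.25, 0.25])   # hypothetical learnable weights
b = 0.5                            # hypothetical learnable bias
out = neuron(x, w, b)              # 0.5 - 0.5 + 0.75 + 0.5 = 1.25
```

During training, the weights and bias would be adjusted to reduce the network's loss function.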
Introduction to Convolutional Neural Networks
• A convolutional neural network is a specific kind of neural network with multiple layers.
• It processes data that has a grid-like arrangement, such as an image, and extracts important features from it.
Layers in a Convolutional Neural Network

• A convolutional neural network has multiple hidden layers that help in extracting information from an image.
• The four important layers in CNN are:
1. Convolution layer
2. ReLU layer
3. Pooling layer
4. Fully connected layer
How does CNN work?
• In a CNN, every image is represented in the form of an array of pixel values.
Convolutional Layer
• A convolution layer has several filters that perform the convolution
operation. Every image is considered as a matrix of pixel values.
• Consider the following 5x5 image whose pixel values are either 0 or 1.
There’s also a filter matrix with a dimension of 3x3.
• Slide the filter matrix over the image and compute the dot product to
get the convolved feature matrix.
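• In the convolution operation, the filter and the image patch beneath it are multiplied element-wise, and the products are summed to give one entry of a new array. Repeating this at every position yields the convolved array, written a*b for an input a and a filter b.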
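The sliding dot product can be sketched with NumPy. The slide's figure is not reproduced in the text, so the 5x5 binary image and 3x3 filter below are assumed values chosen for illustration. As is conventional in CNNs, the filter is not flipped before sliding.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and,
    at each position, sum the element-wise products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=image.dtype)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]   # filter-sized patch
            out[r, c] = np.sum(patch * kernel)  # dot product
    return out

# Assumed example values (not taken from the slide's figure).
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = convolve2d(image, kernel)
# feature_map is the 3x3 array [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5x5 image and a 3x3 filter give a 3x3 convolved feature, because the filter fits in three positions along each axis.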
Selecting Three Filters
Sliding the filter throughout the image
ReLU Layer
• ReLU stands for the rectified linear unit.
Once the feature maps are extracted, the
next step is to move them to a ReLU layer.
• ReLU performs an element-wise operation and sets all negative values to 0. It introduces non-linearity to the network, and the generated output is a rectified feature map.
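As a small sketch of the element-wise operation, ReLU can be written as a maximum with zero. The feature map values below are made up for illustration.

```python
import numpy as np

# Hypothetical feature map containing negative values.
feature_map = np.array([[ 3, -1,  2],
                        [-4,  0,  5],
                        [ 1, -2, -6]])

# ReLU: every negative entry becomes 0, everything else is unchanged.
rectified = np.maximum(feature_map, 0)
# rectified is [[3, 0, 2], [0, 0, 5], [1, 0, 0]]
```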
Pooling Layer
• Pooling is a down-sampling operation that reduces the
dimensionality of the feature map.
• The rectified feature map now goes through a pooling layer to
generate a pooled feature map.
• Each pooled feature map summarizes where its filter responded most strongly, so together the pooled maps retain the parts of the image that the filters detected, such as edges, corners, body, feathers, eyes, and beak.
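Down-sampling by max pooling can be sketched as follows. The 2x2 window, the stride of 2, and the input values are assumptions chosen for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for r in range(out_h):
        for c in range(out_w):
            window = feature_map[r * stride:r * stride + size,
                                 c * stride:c * stride + size]
            out[r, c] = window.max()
    return out

# Hypothetical rectified feature map.
rectified = np.array([[1, 4, 2, 1],
                      [0, 6, 3, 8],
                      [5, 1, 0, 2],
                      [7, 2, 9, 4]])

pooled = max_pool2d(rectified)
# pooled is [[6, 8], [7, 9]]: a 4x4 map reduced to 2x2
```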
Convolution-ReLU-Pooling
Flattening Layer
• The next step in the process is called flattening. Flattening is used
to convert all the resultant 2-Dimensional arrays from pooled
feature maps into a single long continuous linear vector.
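Flattening amounts to a reshape. The sketch below assumes two hypothetical 2x2 pooled feature maps.

```python
import numpy as np

# Two hypothetical pooled feature maps, each 2x2.
pooled_maps = np.array([[[6, 8],
                         [7, 9]],
                        [[1, 0],
                         [3, 2]]])

# Flatten all maps into one long vector for the fully connected layer.
flat = pooled_maps.reshape(-1)
# flat is [6, 8, 7, 9, 1, 0, 3, 2]
```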
Overview of CNN
• The pixels from the image are fed to the convolutional layer that
performs the convolution operation
• It results in a convolved map
• The convolved map is applied to a ReLU function to generate a rectified
feature map
• The image is processed with multiple convolutions and ReLU layers for
locating the features
• Pooling layers then down-sample the rectified feature maps while retaining the responses that identify specific parts of the image
• The pooled feature map is flattened and fed to a fully connected layer to
get the final output
Softmax Layer
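The slide gives only the title, so the following is a minimal sketch of the standard softmax function, which turns the scores from the last fully connected layer into class probabilities. The three scores are made-up values.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical outputs of the last layer
probs = softmax(scores)
predicted_class = int(np.argmax(probs))  # index of the most probable class
```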
Pooling

Average Pooling Max Pooling


Stride
zero padding = 1
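Stride and zero padding together determine the size of the output. The standard relation is given below as a sketch; the symbols W (input width), F (filter width), P (zero padding), S (stride), and O (output width) are notation introduced here, not taken from the slides.

```latex
O = \left\lfloor \frac{W - F + 2P}{S} \right\rfloor + 1
```

For example, a 5x5 input with a 3x3 filter, zero padding of 1, and stride 1 gives O = (5 - 3 + 2)/1 + 1 = 5, so the output keeps the input size. With stride 2 the same setup gives O = (5 - 3 + 2)/2 + 1 = 3.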
Convolution Layer
• A convolution is a linear operation that involves the multiplication
between an array of input data and weights, called a filter or a
kernel.
• The output of a convolution is referred to as a feature map.
• The type of multiplication applied between a filter-sized patch of
the input and the filter is a dot product.
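• Each dot product yields a single value, so the result of applying the filter at one position is a scalar; the dot product is therefore also called the scalar product.
• The filter is applied to each overlapping filter-sized patch of the input data.
• Because the same filter is applied at every position, a feature is detected wherever it appears in the input; this is how convolution supports translation invariance.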
Cont.,
• In general, a convolution layer will transform an input into a stack
of feature mappings of that input.
• Input vectors (Image)
• Filters (Feature Detector)
• Output vectors (Feature map)

• The depth of the feature map stack depends on how many filters
you define for a layer.
• The goal of convolution is to capture all the important features of the image while keeping the number of features to a minimum.
• The use of ReLU helps to prevent exponential growth in the computation required to operate the neural network.
• ReLU suffers from the exploding gradient problem and the dying ReLU problem (a neuron whose input stays negative always outputs 0 and stops learning).
• As the backpropagation algorithm advances downwards toward the earlier layers, gradients often get smaller; this is known as the vanishing gradient problem.
• When the gradients instead get larger as backpropagation advances downwards, it is known as the exploding gradient problem.
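A rough numerical sketch shows why repeated multiplication during backpropagation causes these problems. It assumes, purely for illustration, that every layer scales the gradient by the same factor; the factors 0.5 and 1.5 and the depth of 20 layers are made up.

```python
def gradient_scale(per_layer_factor, num_layers):
    """Size of a gradient after being scaled by the same factor at every layer."""
    return per_layer_factor ** num_layers

def relu_gradient(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

vanishing = gradient_scale(0.5, 20)  # about 9.5e-07: the gradient vanishes
exploding = gradient_scale(1.5, 20)  # about 3325: the gradient explodes
dead = relu_gradient(-2.0)           # 0.0: no gradient flows through a "dead" ReLU
```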
Translation Invariance
CNN-cont.,
• Max pooling makes the network tolerant of small spatial or textural distortions.
• Pooling mainly helps in extracting sharp and smooth features.
• Max pooling extracts sharp, low-level features such as edges, while average pooling extracts smooth features.
• Max pooling shrinks the feature maps, which reduces the number of parameters in later layers and thereby the chance of overfitting.
• The flattening process simply reorders the pooled feature map into a single column.
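The difference between the two pooling types can be seen on a small example. This sketch assumes non-overlapping 2x2 windows and a made-up 4x4 feature map whose top-left block contains a single strong response.

```python
import numpy as np

# Hypothetical 4x4 feature map.
feature_map = np.array([[0, 0, 1, 1],
                        [0, 8, 1, 1],
                        [2, 2, 0, 4],
                        [2, 2, 4, 0]])

# Split into 2x2 blocks: axes are (block row, row in block, block col, col in block).
blocks = feature_map.reshape(2, 2, 2, 2)

max_pooled = blocks.max(axis=(1, 3))   # [[8, 1], [2, 4]]: keeps the sharp peak
avg_pooled = blocks.mean(axis=(1, 3))  # [[2.0, 1.0], [2.0, 2.0]]: smooths it out
```

Max pooling preserves the isolated value 8, while average pooling spreads it over the window.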
CNN Architectures
Contents
• LeNet-5
• AlexNet
• VGG-16, VGG-19
• GoogLeNet
• ResNet-50
LeNet-5 (Yann LeCun et al. 1998)
• LeNet-5 is a very efficient multi-layer convolutional neural network for
handwritten character recognition.
• Convolutional neural networks can make good use of the structural
information of images.
• A convolutional layer has relatively few parameters, a consequence of its two main characteristics: local connections and shared weights.
• It has 5 layers with learnable parameters.
• The input to the model is a grayscale image.
• It has 3 convolution layers, two average pooling layers, and two fully connected layers with a softmax classifier.
• The number of trainable parameters is about 60,000.
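The parameter count can be checked with a short calculation. This sketch makes several assumptions that are not stated on the slides: 5x5 filters, 6, 16, and 120 filters in the three convolution layers, 84 and 10 units in the two fully connected layers, every filter connected to all input channels, and pooling layers without trainable parameters, as in common modern re-implementations.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels

def dense_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return (in_features + 1) * out_features

# Assumed layer sizes for a LeNet-5 style network.
layers = {
    "C1 (6 filters, 5x5)":   conv_params(5, 1, 6),     # 156
    "C3 (16 filters, 5x5)":  conv_params(5, 6, 16),    # 2416
    "C5 (120 filters, 5x5)": conv_params(5, 16, 120),  # 48120
    "F6 (84 units)":         dense_params(120, 84),    # 10164
    "Output (10 units)":     dense_params(84, 10),     # 850
}

total = sum(layers.values())  # 61706, roughly 60,000
```

Under these assumptions the five layers with learnable parameters hold 61,706 parameters, which agrees with the rounded figure above.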
Architecture
AlexNet
• AlexNet is one of the variants of CNN which is also referred to as a
Deep Convolutional Neural Network.
• Proposed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton.
• Won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 with a top-5 error of 15.3%.
• Eight layers, including five convolutional layers followed by three fully connected layers.
• ImageNet dataset (classes in ImageNet: https://deeplearning.cms.waikato.ac.nz/user-guide/class-maps/IMAGENET/)
AlexNet
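• AlexNet architecture consists of 5 convolutional layers, 3 max-pooling layers, 2 normalization layers, 2 fully connected layers, and 1 softmax layer (the softmax layer is itself fully connected, which gives the three fully connected layers counted earlier).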
• Each convolutional layer consists of convolutional filters and a
nonlinear activation function ReLU.
• The pooling layers are used to perform max pooling.
• Input size is fixed due to the presence of fully connected layers.
• AlexNet has about 60 million parameters.
Architecture
Dimension Table
Feature Map Calculation (Dimension)
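The dimension table itself is not reproduced in the text, so the following sketch recomputes the feature map sizes from the usual output-size formula. The layer settings (filter sizes, strides, padding, channel counts) and the 227x227x3 input are the commonly quoted AlexNet values and are assumptions here, not values taken from the slides.

```python
def output_size(input_size, filter_size, stride, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (input_size + 2 * padding - filter_size) // stride + 1

# (name, filter size, stride, padding, output channels or None for pooling)
layers = [
    ("conv1", 11, 4, 0, 96),
    ("pool1",  3, 2, 0, None),
    ("conv2",  5, 1, 2, 256),
    ("pool2",  3, 2, 0, None),
    ("conv3",  3, 1, 1, 384),
    ("conv4",  3, 1, 1, 384),
    ("conv5",  3, 1, 1, 256),
    ("pool3",  3, 2, 0, None),
]

size, channels = 227, 3  # assumed input: 227 x 227 RGB image
shapes = []
for name, f, s, p, out_ch in layers:
    size = output_size(size, f, s, p)
    if out_ch is not None:       # pooling keeps the channel count
        channels = out_ch
    shapes.append((name, size, size, channels))

flattened = size * size * channels  # input to the first fully connected layer
```

With these settings the spatial size goes 227, 55, 27, 27, 13, 13, 13, 13, 6, and the final 6x6x256 volume flattens to 9216 values.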
Highlights of AlexNet
• Batch normalization is a process to make neural networks faster and
more stable through adding extra layers in a deep neural network.
• A CNN using ReLU reached a 25% training error on the CIFAR-10 dataset six times faster than a CNN using tanh.
• Overlapping max pooling reduced the error by about 0.5%, and models with overlapping pooling were found to be generally harder to overfit.
• Dropout- “turning off” neurons with a predetermined probability.
• Removing any of the convolutional layers will drastically degrade the
performance of AlexNet.
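Dropout can be sketched as a random mask applied during training. The version below is the "inverted" variant, which rescales the kept activations at training time; it is an assumption chosen for illustration rather than the exact procedure used in AlexNet, and the drop probability of 0.5 and the input values are made up.

```python
import numpy as np

def dropout(activations, drop_prob, rng):
    """Turn off each neuron with probability drop_prob (training time only).
    Kept activations are scaled up so the expected output is unchanged."""
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob  # True where the neuron is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)  # fixed seed for reproducibility
activations = np.ones(10)       # hypothetical layer outputs
dropped = dropout(activations, drop_prob=0.5, rng=rng)
# every entry of dropped is either 0.0 (turned off) or 2.0 (kept and rescaled)
```

At test time no neurons are dropped, so the function would simply not be applied.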
Dropouts
Flower Dataset
https://www.robots.ox.ac.uk/~vgg/data/flowers/17/
VGG-16, GoogLeNet, ResNet: Pretrained Models
VGG-16
• VGG-16 was proposed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group at Oxford University in 2014 in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition".
• This architecture achieved a top-5 test accuracy of 92.7% on ImageNet, which has over 14 million images belonging to 1000 classes.
• Versions: VGG-11, VGG-13, VGG-16, VGG-19
Cont.,
• The region in the input space that affects a particular CNN feature is called its receptive field.
• With 3x3 filters, stride 1, and padding 1, the output activation map of each convolution has the same spatial size as its input, so the spatial resolution is preserved.
• Three consecutive 3x3 filters have an effective receptive field of 7x7.
• Two consecutive 3x3 filters have an effective receptive field of 5x5.
• Small filters with activation layers between them make the decision function more discriminative.
• Stacked small filters also use fewer parameters than one large filter, which reduces the tendency of the network to over-fit during training.
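Both claims about stacked 3x3 filters can be verified with a little arithmetic. The sketch assumes stride-1 convolutions, the same number of input and output channels in every layer, and ignores biases; the channel count of 64 is a made-up example.

```python
def stacked_receptive_field(num_layers, filter_size=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (filter_size - 1)

def conv_weights(filter_size, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no biases)."""
    return filter_size * filter_size * channels * channels

two_layers = stacked_receptive_field(2)    # 5, i.e. a 5x5 receptive field
three_layers = stacked_receptive_field(3)  # 7, i.e. a 7x7 receptive field

channels = 64                              # hypothetical channel count
three_3x3 = 3 * conv_weights(3, channels)  # 27 * 64^2 = 110592
one_7x7 = conv_weights(7, channels)        # 49 * 64^2 = 200704
```

Three 3x3 layers cover the same 7x7 region as one 7x7 layer with roughly 45% fewer weights, and they apply three non-linearities instead of one.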
VGG-16
GoogLeNet
• GoogLeNet was proposed by a research team at Google.
• Winner of the ILSVRC 2014 image classification challenge.
• It uses 1×1 convolutions in the middle of the architecture and global average pooling.
• The Inception module is a specialized neural network architecture.
• GoogLeNet is a 22-layer deep convolutional neural network.
• Inception leverages feature detection at different scales through convolutions with different filters.
• Reducing the input size between the inception modules is another effective method of lessening the network's computational load.
• Distinct features are extracted in parallel and then concatenated.
• The model learns local and abstract features, which in turn enhances the model performance.
GoogLeNet Architecture
Auxiliary Classifiers
• Auxiliary classifiers are added to the intermediate layers of the architecture, namely the third (Inception 4[a]) and sixth (Inception 4[d]).
• The purpose of an auxiliary classifier is to perform a classification based on the inputs within the network's midsection and add the loss calculated during training back to the total loss of the network.
• An auxiliary classifier consists of an average pool layer, a conv layer, two fully connected layers, a dropout layer (70%), and finally a linear layer with a softmax activation function.
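Adding the auxiliary losses back to the total loss can be sketched as a weighted sum. The weight of 0.3 is the value reported in the GoogLeNet paper and is not given on the slides; the loss values are made up.

```python
def total_training_loss(main_loss, aux_losses, aux_weight=0.3):
    """Main classifier loss plus the down-weighted auxiliary classifier losses."""
    return main_loss + aux_weight * sum(aux_losses)

# Hypothetical loss values for one training batch.
loss = total_training_loss(main_loss=1.2, aux_losses=[1.5, 1.4])
# 1.2 + 0.3 * (1.5 + 1.4) = 2.07
```

The auxiliary classifiers only contribute during training; at inference time they are discarded.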
ResNet-50
• Residual Networks (ResNet) is a classic neural network used as a backbone for many computer vision tasks.
• This model was the winner of the ImageNet challenge in 2015.
• Deep networks are hard to train because of the notorious vanishing gradient problem: as the gradient is back-propagated to earlier layers, repeated multiplication may make the gradient extremely small.
• Variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-110, ResNet-152, ResNet-164, ResNet-1202
• ResNet introduced the concept of skip connections (shortcut connections), which perform identity mapping.
Architecture of the ResNet-50 deep learning model
Dimension Table
Identity Block

Convolution Block
Identity block & Convolutional block
Cont.,
• The addition of the identity connection does not introduce extra
parameters.
• When the dimensions increase, extra zero entries are padded onto the shortcut so that the shapes match.
• Skip connections mitigate the problem of vanishing gradient by allowing an alternate shortcut path for the gradient to flow through.
• They allow the model to learn an identity function, which ensures that a higher layer performs at least as well as the lower layer.
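The effect of a skip connection can be sketched with a tiny block. For brevity this illustration uses two fully connected layers in place of the convolutions of a real identity block, and the input and weight values are made up.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def identity_block(x, w1, w2):
    """Two-layer residual block: output = ReLU(F(x) + x)."""
    out = relu(x @ w1)    # first layer of the residual branch F
    out = out @ w2        # second layer of the residual branch F
    return relu(out + x)  # skip connection adds the input back

x = np.array([1.0, 2.0, 3.0])  # hypothetical (non-negative) input
zeros = np.zeros((3, 3))       # residual branch that has learned nothing

y = identity_block(x, zeros, zeros)
# With F(x) = 0 the block reduces to the identity: y equals x
```

Note that the addition itself introduces no parameters; only the weights of the residual branch are learned.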
Pre-trained models
• A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task.
It can be used in three ways:
• Feature extraction
• Use the architecture of the pre-trained model
• Train some layers while freezing others
Learnings in different Layers
When can we go for pre-trained models?
• The data set is small and the data similarity is very high.
• The data set is small and the data similarity is very low.
• The data set is large and the data similarity is very low.
• The data set is large and the data similarity is high.
TensorFlow Flowers Dataset
References
• Mohamed Elgendy, Deep Learning for Vision Systems, Manning
Publications.
