Convolutional Neural Networks
• The depth of the feature map stack depends on how many filters
you define for a layer.
• The goal of convolution is to capture all the important features of the
image while keeping the number of retained values to a minimum.
• The usage of ReLU helps to
prevent the exponential
growth in the computation
required to operate the
neural network.
• ReLU suffers from the exploding gradient and dying ReLU problems.
• As the backpropagation algorithm advances downwards, gradients often
get smaller, known as vanishing gradient.
• As the backpropagation algorithm advances downwards, gradients often
get larger, known as exploding gradient.
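To see why this matters, the toy NumPy sketch below (with arbitrary example pre-activation values, an assumption for illustration) compares the gradient of a saturating sigmoid with the gradient of ReLU: the sigmoid's derivative collapses toward zero at the extremes, while ReLU's stays at 1 for positive inputs and 0 for negative ones (the dying-ReLU region).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid saturates toward 0 for large |x|,
    # which is what drives the vanishing-gradient problem.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU's derivative is 1 for positive inputs (non-saturating)
    # and 0 for negative inputs (the "dying ReLU" region).
    return (x > 0).astype(float)

x = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])  # example pre-activations (assumed)
print("sigmoid grad:", sigmoid_grad(x))    # very small at the extremes
print("relu grad:   ", relu_grad(x))       # 0 or 1, never shrinks for positive x
```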
Translation Invariance
CNN-cont.,
• Max pooling makes the representation robust to small spatial or textural distortions.
• Pooling mainly helps in extracting sharp and smooth features.
• Max pooling extracts the most prominent (sharp) features, while average
pooling extracts smooth features.
• Reducing the number of parameters through max-pooling reduces
the chance of overfitting.
• Flattening simply rearranges the pooled feature maps into a single
column vector that can be fed to the fully connected layers.
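A minimal Keras sketch of this convolution → pooling → flattening pipeline; the layer sizes and input shape are illustrative assumptions, not values from the slides.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative conv -> max-pool -> flatten pipeline (sizes are assumed).
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),              # small RGB input, assumed size
    layers.Conv2D(32, (3, 3), activation="relu"),  # feature extraction
    layers.MaxPooling2D((2, 2)),                   # keeps the strongest responses
    layers.Flatten(),                              # pooled feature maps -> single column
    layers.Dense(10, activation="softmax"),        # classifier head
])
model.summary()
```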
CNN Architectures
Contents
• Lenet-5
• AlexNet
• VGG-16, 19
• GoogleNet
• ResNet-50
LeNet-5 (Yann LeCun et al. 1998)
• LeNet-5 is a very efficient multi-layer convolutional neural network for
handwritten character recognition.
• Convolutional neural networks can make good use of the structural
information of images.
• Convolutional layers have relatively few parameters, a consequence of
their two main characteristics: local connectivity and shared weights.
• 5 layers with learnable parameters.
• The input to the model is a grayscale image.
• It has 3 convolution layers, two average pooling layers, and two fully
connected layers with a softmax classifier.
• The number of trainable parameters is about 60,000.
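A sketch of a LeNet-5-style model in Keras following the layer counts above (3 convolution layers, 2 average pooling layers, 2 fully connected layers, softmax output); the tanh activations approximate the original's saturating nonlinearities, and the original's radial-basis output is replaced here with a plain softmax.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-5-style network: 3 conv layers, 2 average-pooling layers,
# 2 fully connected layers ending in a softmax classifier (~60k parameters).
lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),                # grayscale input
    layers.Conv2D(6, (5, 5), activation="tanh"),    # C1: 6 feature maps
    layers.AveragePooling2D((2, 2)),                # S2
    layers.Conv2D(16, (5, 5), activation="tanh"),   # C3: 16 feature maps
    layers.AveragePooling2D((2, 2)),                # S4
    layers.Conv2D(120, (5, 5), activation="tanh"),  # C5
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),            # F6
    layers.Dense(10, activation="softmax"),         # output classifier
])
lenet5.summary()   # total parameters come out to roughly 60k
```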
Architecture
AlexNet
• AlexNet is one of the variants of CNN which is also referred to as a
Deep Convolutional Neural Network.
• Proposed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton.
• Won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012)
with a top-5 error of 15.3%.
• Eight layers including five convolutional layers followed by three
fully connected layers.
• ImageNet Dataset (Classes in ImageNet-
https://deeplearning.cms.waikato.ac.nz/user-guide/class-
maps/IMAGENET/)
AlexNet
• AlexNet architecture consists of 5 convolutional layers, 3 max-
pooling layers, 2 normalization layers, 2 fully connected layers,
and 1 softmax layer.
• Each convolutional layer consists of convolutional filters and a
nonlinear activation function ReLU.
• The pooling layers are used to perform max pooling.
• Input size is fixed due to the presence of fully connected layers.
• AlexNet has 60 million parameters.
Architecture
Dimension Table
Feature Map Calculation (Dimension)
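The feature-map dimensions in such a table come from the standard convolution output-size formula, output = (W − F + 2P) / S + 1. A small helper is sketched below, with example numbers chosen as assumptions that match AlexNet's first convolution and pooling layers.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer:
    floor((W - F + 2P) / S) + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# Example: AlexNet-style first convolution, 227x227 input, 11x11 filters, stride 4.
print(conv_output_size(227, 11, stride=4, padding=0))  # -> 55
# Followed by 3x3 overlapping max pooling with stride 2.
print(conv_output_size(55, 3, stride=2))               # -> 27
```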
Highlights of AlexNet
• Batch normalization is a process that makes neural networks faster and
more stable by adding extra normalization layers to a deep network
(AlexNet itself used local response normalization).
• A CNN using ReLU reached a 25% training error rate on the CIFAR-10
dataset six times faster than an equivalent CNN using tanh.
• Overlapping max pooling reduced the error by about 0.5%, and models
with overlapping pooling were found to be slightly harder to overfit.
• Dropout- “turning off” neurons with a predetermined probability.
• Removing any of the convolutional layers will drastically degrade the
performance of AlexNet.
Dropouts
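A minimal sketch of dropout in Keras, tying the "turning off neurons" idea to code; the 4096-unit layers and 0.5 rate mirror values commonly quoted for AlexNet's fully connected layers, and the rest is assumed for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Dense head with dropout: during training, each neuron's output is
# zeroed ("turned off") with the given probability, which regularizes
# the network; at inference time dropout is disabled automatically.
head = models.Sequential([
    layers.Input(shape=(4096,)),            # assumed feature-vector size
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),                     # 50% of activations dropped each step
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),
])
```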
Flower Dataset
https://www.robots.ox.ac.uk/~vgg/data/flowers/17/
VGG-16, GoogleNet, ResNet: Pretrained Models
VGG-16
• VGG-16 was proposed by Karen Simonyan and Andrew Zisserman
of the Visual Geometry Group Lab of Oxford University in 2014 in the
paper "Very Deep Convolutional Networks for Large-Scale Image Recognition".
• This architecture achieved top-5 test accuracy of 92.7% in ImageNet,
which has over 14 million images belonging to 1000 classes.
• Versions-VGG-11, VGG-13, VGG-16, VGG-19
Cont.,
• The region of the input space that affects a particular CNN feature is
called its receptive field.
• With 3×3 filters, stride 1, and padding of 1, the output activation map
has the same size as its input, so spatial resolution is preserved.
• Three consecutive 3×3 filters have the same receptive field as a single 7×7 filter.
• Two consecutive 3×3 filters have the same receptive field as a single 5×5 filter.
• Stacking small filters with nonlinear activation layers in between makes
the decision function more discriminative.
• It also uses fewer parameters, which reduces the tendency of the network
to overfit during training.
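A small helper to check these receptive-field claims numerically; the stride-1, 3×3 setup mirrors VGG's convolutional layers, and the function itself is only an illustrative sketch.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions with
    the same kernel size: it grows by (kernel_size - 1) per layer."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1
    return rf

print(stacked_receptive_field(2))  # two 3x3 layers   -> 5 (same as one 5x5)
print(stacked_receptive_field(3))  # three 3x3 layers -> 7 (same as one 7x7)
```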
VGG-16
GoogleNet
• GoogLeNet was proposed by a research team at Google.
• Winner of the ILSVRC 2014 image classification challenge.
• Uses 1×1 convolutions in the middle of the architecture and global
average pooling at the end.
• The Inception module is a specialized neural network building block.
• GoogLeNet is a 22-layer deep convolutional neural network.
• Inception leverages feature detection at different scales through
convolutions with different filter sizes.
• Reducing the input size between Inception modules is another effective
way of lessening the network's computational load.
• Distinct features are extracted in parallel and finally concatenated.
• The model learns both local and abstract features, which in turn
enhances its performance.
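A minimal sketch of an Inception-style module using the Keras functional API; the filter counts and input feature-map size are illustrative assumptions rather than the exact GoogLeNet values.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1=64, f3_reduce=96, f3=128, f5_reduce=16, f5=32, fpool=32):
    """Parallel 1x1, 3x3, 5x5 convolutions plus max pooling, with 1x1 "reduce"
    convolutions to limit computation, concatenated on the channel axis."""
    branch1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)

    branch3 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    branch3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(branch3)

    branch5 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    branch5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(branch5)

    branch_pool = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    branch_pool = layers.Conv2D(fpool, 1, padding="same", activation="relu")(branch_pool)

    # Concatenate the parallel feature maps along the channel dimension.
    return layers.Concatenate(axis=-1)([branch1, branch3, branch5, branch_pool])

inputs = tf.keras.Input(shape=(28, 28, 192))   # assumed feature-map size
outputs = inception_module(inputs)
print(outputs.shape)                            # (None, 28, 28, 256)
```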
GoogleNet Architecture
Auxiliary Classifiers
• Auxiliary classifiers are added to intermediate layers of the
architecture, namely the third (Inception 4[a]) and sixth
(Inception 4[d]) Inception modules.
• The purpose of an auxiliary classifier is to perform a classification
based on the inputs within the network's midsection and add the
loss calculated during the training back to the total loss of the
network.
• An auxiliary classifier consists of an average pooling layer, a conv
layer, two fully connected layers, a dropout layer (70%), and finally
a linear layer with a softmax activation function.
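A sketch of such an auxiliary classifier head in Keras, following the description above; the pooling size, filter count, single hidden dense layer, and feature-map shape are assumptions based on the commonly cited GoogLeNet configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def auxiliary_classifier(x, num_classes=1000):
    """Side classifier attached to an intermediate feature map; its loss is
    added (with a small weight) to the main loss during training."""
    x = layers.AveragePooling2D(pool_size=5, strides=3)(x)
    x = layers.Conv2D(128, 1, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dropout(0.7)(x)                        # 70% dropout as on the slide
    return layers.Dense(num_classes, activation="softmax")(x)

# Example: attach to an assumed 14x14x512 intermediate feature map.
feat = tf.keras.Input(shape=(14, 14, 512))
aux_out = auxiliary_classifier(feat)
print(aux_out.shape)   # (None, 1000)
```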
ResNet-50
• Residual Networks (ResNets) are classic neural networks used as a
backbone for many computer vision tasks.
• This model was the winner of the ImageNet challenge in 2015.
• Deep networks are hard to train because of the notorious vanishing
gradient problem: as the gradient is back-propagated to earlier layers,
repeated multiplication may make the gradient extremely small.
• Variants include ResNet-18, ResNet-34, ResNet-50, ResNet-101,
ResNet-110, ResNet-152, ResNet-164, and ResNet-1202.
• ResNet first introduced the concept of skip (shortcut) connections
that lead to identity mapping.
Architecture of ResNet-50 deep learning
model
Dimension Table
Identity Block
Convolution Block
Identity block & Convolutional block
Cont.,
• The addition of the identity connection does not introduce extra
parameters.
• When dimensions increase across a block, extra zero entries are padded on the shortcut to match them.
• Skip connections mitigate the problem of vanishing gradient by
allowing this alternate shortcut path for gradient to flow through.
• They allow the model to learn an identity function, which ensures
that a higher layer will perform at least as well as the lower layer.
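A minimal sketch of an identity block with a skip connection in Keras; the bottleneck filter counts and input shape are illustrative assumptions in the style of ResNet-50.

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters=(64, 64, 256)):
    """Bottleneck residual block: the input is added back to the block's
    output (the skip connection), so the addition introduces no extra
    parameters and gradients get a shortcut path."""
    f1, f2, f3 = filters
    shortcut = x

    x = layers.Conv2D(f1, 1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    x = layers.Conv2D(f2, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    x = layers.Conv2D(f3, 1, padding="same")(x)
    x = layers.BatchNormalization()(x)

    x = layers.Add()([x, shortcut])   # identity skip connection
    return layers.ReLU()(x)

# The input depth must already match f3 for the pure identity block.
inp = tf.keras.Input(shape=(56, 56, 256))
out = identity_block(inp)
print(out.shape)   # (None, 56, 56, 256)
```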
Pre-trained models
Learnings in different Layers
When can we go for pre-trained models?
• Size of the Data set is small while
the Data similarity is very high.
• Size of the data is small as well as
data similarity is very low
• Size of the data set is large
however the Data similarity is very
low
• Size of the data is large as well as
there is high data similarity
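As one example of the small-dataset, high-similarity case, the sketch below freezes a pretrained VGG-16 base and trains only a new classifier head; the 5-class flower setup, image size, and head sizes are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Pretrained VGG-16 convolutional base (ImageNet weights), used as a
# fixed feature extractor for a small, similar dataset.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pretrained layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),   # small new head trained from scratch
    layers.Dropout(0.5),
    layers.Dense(5, activation="softmax"),  # e.g. 5 flower classes (assumed)
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```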
TensorFlow Flower Dataset