
Module 3 CNN

The document provides an overview of convolutional neural networks including their architecture, components, and training process. It describes the concepts of convolution, CNN layers, and operations like stride and padding in detail.

Uploaded by

itsnavani2002

MODULE 3

CONVOLUTIONAL NEURAL NETWORK

BY
Dr REEMA MATHEW A
PROFESSOR, CSE
VIMAL JYOTHI ENGG COLLEGE
MOB:9645527132, [email protected]

Dr Reema Mathew A

Agenda-25/9/2023
Module 3 Syllabus
CNN Introduction
CNN Components-Overall idea
CNN Architecture
CNN training steps
CNN-Detailed explanation
Input image
Concept of convolution
Convolution of images
CNN Layers

A Convolutional Neural Network (CNN) is a type of Deep Learning
neural network architecture commonly used in Computer Vision.
Computer vision is a field of Artificial Intelligence that enables a
computer to understand and interpret images and other visual data.

CNN ARCHITECTURE

Back propagation with example

Convolutional Neural Network
➢ Convolutional Neural Networks are a class of Deep Learning architectures
that have been widely used for image recognition tasks.
➢ In a convolutional neural network, the input is treated as a volume:
an M x N x 3 array of colour pixels. Each colour pixel is associated with
three values that correspond to the Red, Green and Blue colour
components of the RGB image at a specified spatial location.
➢ A pixel has value (IR, IG, IB), determined by the intensities stored in
the red colour plane, green colour plane and blue colour plane,
respectively.
➢ An RGB image has three channels, and a greyscale image has one channel.
Therefore, a greyscale image is represented by an M x N array of pixel
intensities, while an RGB image is represented by an M x N x 3 array of pixel
intensities (Fig. 3.2).

Input image

Concept of convolution
Convolutional Neural Networks (CNN, or ConvNet) are a type of
multi-layer neural network designed to extract visual patterns
from pixel images.
In a CNN, ‘convolution’ refers to a mathematical operation. It is
a linear operation that combines two functions to produce a third
function expressing how the shape of one is modified by the other.
In simple terms, two matrices (the image and a filter) are multiplied
element by element over a sliding window and the products are summed;
the output is used to extract information from the image.
$y(k) = r(k) * h(k) = \sum_{j=-\infty}^{\infty} r(j)\, h(k - j)$    (3.1)
The response sequence y(k) can be viewed as a weighted
average of the input sequence r(k): a filtering operation
performed with weights provided by the filter h(k).
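The weighted-average view of y(k) can be sketched in a few lines of Python. This is a minimal illustration, assuming both sequences are finite lists and zero elsewhere; the function name `convolve_1d` and the sample values are made up for the example.

```python
def convolve_1d(r, h):
    """Full discrete convolution y(k) = sum_j r(j) * h(k - j) of two finite sequences."""
    y = [0.0] * (len(r) + len(h) - 1)
    for j, r_j in enumerate(r):
        for i, h_i in enumerate(h):
            # r(j) * h(i) contributes to output index k = j + i
            y[j + i] += r_j * h_i
    return y


signal = [1.0, 2.0, 3.0, 4.0]
averaging_filter = [0.5, 0.5]  # each output is the mean of two neighbouring inputs
print(convolve_1d(signal, averaging_filter))  # [0.5, 1.5, 2.5, 3.5, 2.0]
```

The interior outputs (1.5, 2.5, 3.5) are plain two-point averages of the input; the first and last values are smaller because the filter only partly overlaps the signal there.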


In general, convolution is defined for any functions for which the
summation in eqn. (3.1) is defined, and it is used in a variety of
applications.
In image processing applications in the field of computer vision, the
input is a multidimensional array of data, and the filter is a
multidimensional array of parameters.
The filter is also commonly referred to as the kernel.
Usually both the input and the kernel are zero everywhere except for
a finite set of points.
This means that, in practice, we can implement the infinite
summation (refer to eqn. (3.1)) over a finite number of array elements.
Convolution-Definition for images
If we use a two-dimensional image I as the input, we will use a
two-dimensional kernel K; the definition of the convolution operation then
takes the form of a double summation over the kernel elements.

We have used uppercase letters I, J and K to denote an image or a
kernel; and the lowercase letters (x and y) and (i and j) to denote
indices or positions in an image or a kernel.
K is the filter kernel with m rows and n columns; (i, j) are its
coordinates.
I is the original image with M rows and N columns; (x, y) are its
coordinates.
J(x, y) is the (x, y)th element of the filtered image obtained by
convolving the original image with an appropriate filter kernel; the
summation is carried out over the finite elements (i, j) of the kernel.

From eqn. (3.3) we observe that convolution basically performs a
weighted average of the pixels in the neighbourhood specified by the
kernel.
That is, it multiplies the value of each nearby pixel by the weight
given by the kernel, then adds all these values together.
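The weighted-neighbourhood idea can be sketched for small images as follows. This is an illustration only: it assumes the flipped-kernel convention J(x, y) = Σ_i Σ_j K(i, j) I(x − i, y − j) (up to an index offset) and computes outputs only where the kernel lies fully inside the image. The function name `convolve_2d` and the sample arrays are hypothetical.

```python
def convolve_2d(image, kernel):
    """'Valid' 2-D convolution of a list-of-lists image with a list-of-lists kernel."""
    M, N = len(image), len(image[0])
    m, n = len(kernel), len(kernel[0])
    # Flip the kernel in both directions (true convolution, not cross-correlation).
    flipped = [row[::-1] for row in kernel[::-1]]
    output = []
    for x in range(M - m + 1):
        out_row = []
        for y in range(N - n + 1):
            total = 0
            for i in range(m):
                for j in range(n):
                    # weight * neighbouring pixel, summed over the kernel window
                    total += flipped[i][j] * image[x + i][y + j]
            out_row.append(total)
        output.append(out_row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(convolve_2d(image, kernel))  # [[4, 4], [4, 4]]
```

A 3 x 3 image and a 2 x 2 kernel give a 2 x 2 result, because there are only two valid kernel positions along each axis.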

Why Convolutions
• Parameter sharing: a feature detector (such as a vertical edge
detector) that’s useful in one part of the image is probably useful in
another part of the image.
• Sparsity of connections: in each layer, each output value depends
only on a small number of inputs.
LAYERS IN CNN
INPUT LAYER
The input layer consists of raw pixel values of the image to be
classified.
An image is represented by three colour channels of Red, Green and
Blue; each channel is usually represented by a square matrix of pixel
values ranging from 0 to 255 (Figs 3.1 and 3.2).
The image matrix is flattened before being passed on to a traditional
neural network. Flattening is not required for processing by a CNN:
the image matrix itself forms the input layer of a CNN.


Convolutional layer
The convolutional layer follows the input layer. It derives an output
(feature map) using filter kernels that operate only on local regions of
the input.
In a fully-connected neural network, we connect each input pixel
to each neuron in the hidden layer, with a separate parameter
describing the interaction between each input unit and each hidden
unit.
Thus every hidden unit interacts with every input unit. In a
fully-connected network, the input is depicted as a vertical line of neurons.

Suppose that we had a fully-connected first layer with 784 (28 x 28)
input neurons, and 30 hidden neurons. Then every hidden neuron is
connected to all 784 input neurons, and a total of 784 x 30 = 23,520
weights are involved.
In a convolutional neural network, we consider the input as a 28 x 28
square of neurons. Here we don't connect every input unit to every
hidden unit.
Instead we only make connections in small localized regions of the
input image, for example a 3 x 3 region corresponding to 9 input
units.
That region in the input image is called the local receptive field for the
hidden unit. It is a little window on the input units. Each connection
learns a weight.
We can think of that particular hidden unit as trying to analyze its
particular local receptive field. Each hidden unit connects to its local
receptive field with 3 x 3 = 9 weights; these weights define a filter/kernel.
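The difference in weight counts can be checked with simple arithmetic. The sketch below ignores bias terms, and the helper names are hypothetical.

```python
def fully_connected_weights(n_inputs, n_hidden):
    """One weight per (input unit, hidden unit) pair; biases ignored."""
    return n_inputs * n_hidden


def shared_kernel_weights(kernel_size):
    """One weight per position of a square kernel, shared by every hidden unit."""
    return kernel_size * kernel_size


print(fully_connected_weights(28 * 28, 30))  # 23520
print(shared_kernel_weights(3))              # 9
```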

This sparse connectivity in convolutional neural networks reduces
the memory requirements and the number of operations needed to
compute the output, thereby giving rise to better performance on
machine learning tasks.
To sum up: the spatial extent of the local connectivity is a
hyperparameter called the receptive field of a neuron, which defines a
kernel/filter.
The individual neurons in a feature map learn to analyze their respective
receptive fields using kernels with shared weights. Kernel size is an
important hyperparameter.


The kernel is placed in a starting position (top-left corner) of the input
image.
The dot product between the weights of the kernel and the input pixel
intensities at that position is calculated.
The value derived from this dot product becomes the first hidden
neuron (top-left corner of the feature map).
We then slide the kernel across the input surface (width-wise as well
as height-wise) to capture meaningful characteristics. Since the result at
each position does not depend on the others, the positions can be
computed in parallel.

Stride
Stride is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, we
move the filter 1 pixel at a time. When the stride is 2, we move the filter 2 pixels at
a time, and so on. The figure below shows how convolution works with a stride of 2.

Convolution Operation
Basic Convolution Operation
Step 1: overlay the filter on the input, perform
element-wise multiplication, and add the results.

Step 2: move the overlay right one position (or according to the stride
setting), and repeat the same calculation to get the next result. And
so on.

Stride
Stride governs how many cells the filter is moved in the input to
calculate the next cell in the result.
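The two steps and the effect of the stride can be sketched as below. This is a minimal illustration, assuming a square filter, no padding, and no kernel flipping (the overlay-multiply-add procedure exactly as described in the steps). The function name `slide_filter` and the sample data are made up for the example.

```python
def slide_filter(image, kernel, stride=1):
    """Overlay the kernel, multiply element-wise, add, then move by `stride` cells."""
    k = len(kernel)  # assumes a square k x k kernel
    result = []
    for top in range(0, len(image) - k + 1, stride):
        row = []
        for left in range(0, len(image[0]) - k + 1, stride):
            row.append(sum(kernel[i][j] * image[top + i][left + j]
                           for i in range(k) for j in range(k)))
        result.append(row)
    return result


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]
print(slide_filter(image, kernel, stride=1))  # [[7, 9, 11], [15, 17, 19], [23, 25, 27]]
print(slide_filter(image, kernel, stride=2))  # [[7, 11], [23, 27]]
```

With stride 2 the filter skips every other position, so the 3 x 3 result shrinks to 2 x 2.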

Zero Padding
 Add a border of pixels, all with value zero, along both axes of the
feature map.
 Padding has the following benefits:
 It allows us to use a CONV layer without necessarily shrinking the height
and width of the volumes. This is important for building deeper networks,
since otherwise the height/width would shrink as we go to deeper layers.
 It helps us keep more of the information at the border of an image. Without
padding, very few values at the next layer would be affected by pixels at
the edges of an image.
 Some padding terminologies:
• “valid” padding: no padding
• “same” padding: padding so that the output dimension is the same as the
input
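Zero padding itself is a small array operation. The sketch below is illustrative only: `zero_pad` and `same_padding` are hypothetical helper names, and the "same" rule p = (f - 1) / 2 is assumed to apply to an odd filter size f with stride 1.

```python
def zero_pad(image, p):
    """Return a copy of a 2-D list with a border of p zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]            # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)    # left and right borders
    padded.extend([0] * width for _ in range(p))        # bottom border
    return padded


def same_padding(f):
    """Padding that keeps the output size equal to the input (odd f, stride 1)."""
    return (f - 1) // 2


print(zero_pad([[1, 2], [3, 4]], 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
print(same_padding(3), same_padding(5))  # 1 2
```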


If we have an input of size W x W x D and Dout
kernels with a spatial size of F, stride S and amount
of padding P, then the size of the output volume can be
determined by the following formula:
$W_{out} = \frac{W - F + 2P}{S} + 1$
This yields an output volume of size Wout x Wout x Dout.

If we have an activation map of size W x W x D, a pooling
kernel of spatial size F, and stride S, then the size of the
output volume can be determined by the following
formula:
$W_{out} = \frac{W - F}{S} + 1$
This yields an output volume of size Wout x Wout x D.
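The standard output-size relations, (W - F + 2P)/S + 1 for a convolution and (W - F)/S + 1 for pooling, can be wrapped in two small helpers. This is a sketch: the function names are hypothetical, and integer (floor) division is assumed for the case where the filter does not fit evenly, which is a common framework default.

```python
def conv_output_size(w, f, s=1, p=0):
    """Spatial size after a convolution: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1


def pool_output_size(w, f, s):
    """Spatial size after pooling without padding: (W - F) / S + 1."""
    return (w - f) // s + 1


print(conv_output_size(28, 5, s=1, p=2))  # 28 ("same" padding keeps the size)
print(conv_output_size(28, 5))            # 24 (no padding shrinks the map)
print(pool_output_size(28, 2, 2))         # 14
```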


Hyperparameters

Size of the kernel/filter
Nonlinearity: ReLU/tanh/sigmoid
Stride
Zero padding

POOLING LAYER
 A pooling layer receives the result from a convolutional layer and
compresses it. The filter of a pooling layer is always smaller than the feature
map.
 The pooling operation involves sliding a two-dimensional filter over each
channel of the feature map and summarising the features lying within the
region covered by the filter.
For a feature map having dimensions nh x nw x nc, the dimensions of the
output obtained after a pooling layer are
((nh - f) / s + 1) x ((nw - f) / s + 1) x nc
where each division is rounded down if the filter does not fit evenly, and
-> nh - height of feature map
-> nw - width of feature map
-> nc - number of channels in the feature map
-> f - size of filter
-> s - stride length

Why use Pooling Layers?

Pooling layers are used to reduce the dimensions of the
feature maps.
Thus, pooling reduces the amount of computation performed in the
network and the number of parameters to learn in the layers that follow.
The pooling layer summarises the features present in a
region of the feature map generated by a convolution
layer.
So, further operations are performed on summarised
features instead of precisely positioned features generated
by the convolution layer. This makes the model more
robust to variations in the position of the features in the
input image.
Types of Pooling Layers:
 Max Pooling
Max pooling is a pooling operation that selects the maximum element
from the region of the feature map covered by the filter. Thus, the
output after max-pooling layer would be a feature map containing the
most prominent features of the previous feature map.


Average Pooling
Average pooling computes the average of the elements present in the
region of the feature map covered by the filter.
Thus, average pooling gives the average of the features present in a patch.
Average pooling smooths the harsh edges of a picture and is used
when such edges are not important.

Min Pooling
In this type of pooling, the summary of the features in a
region is represented by the minimum value in that region.
It is mostly used when the image has a light background
since min pooling will select darker pixels.
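The three pooling types differ only in how each window is summarised, which the following sketch makes explicit. It assumes a single-channel feature map stored as a list of lists and a square f x f window; `pool2d` and `mean` are hypothetical helper names.

```python
def pool2d(feature_map, f, s, reduce_fn):
    """Slide an f x f window with stride s and summarise each window with reduce_fn."""
    pooled = []
    for top in range(0, len(feature_map) - f + 1, s):
        row = []
        for left in range(0, len(feature_map[0]) - f + 1, s):
            window = [feature_map[top + i][left + j]
                      for i in range(f) for j in range(f)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [3, 4, 1, 8]]
print(pool2d(feature_map, 2, 2, max))   # [[6, 4], [7, 9]]
print(pool2d(feature_map, 2, 2, mean))  # [[3.75, 2.25], [4.0, 4.5]]
print(pool2d(feature_map, 2, 2, min))   # [[1, 1], [2, 0]]
```

For a multi-channel volume the same function would be applied to each channel separately, leaving the number of channels unchanged.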

Example: Type of pooling?

Interesting properties of pooling layer:
it has hyper-parameters:
size (f)
stride (s)
type (max or avg)
but it doesn’t have parameters; there’s nothing for gradient
descent to learn
When done on input with multiple channels, pooling reduces the
height and width (nH and nW) but keeps nC unchanged.


Advantages of Pooling Layer:


Dimensionality reduction: The main advantage of pooling
layers is that they help in reducing the spatial dimensions of the
feature maps. This reduces the computational cost and also helps
in avoiding overfitting by reducing the number of parameters in
the model.
Translation invariance: Pooling layers also help make the feature
maps approximately invariant to small translations. This means
that a small shift in the position of an object in the image has
little effect on the classification result, as the same features are
detected despite the shift.
Feature selection: Pooling layers can also help in selecting the
most important features from the input, as max pooling selects
the most salient features and average pooling preserves more
information.
Disadvantages of Pooling Layer:
Information loss: One of the main disadvantages of pooling layers
is that they discard some information from the input feature maps,
which can be important for the final classification or regression
task.
Over-smoothing: Pooling layers can also cause over-smoothing of
the feature maps, which can result in the loss of some fine-grained
details that are important for the final classification or regression
task.
Hyperparameter tuning: Pooling layers also introduce
hyperparameters such as the size of the pooling regions and the
stride, which need to be tuned in order to achieve optimal
performance. This can be time-consuming and requires some
expertise in model building.

Fully Connected Layer
Neurons in this layer have full connectivity with all neurons in
the preceding and succeeding layers, as in a regular fully connected
neural network (FCNN).
This is why its output can be computed as usual by a matrix
multiplication followed by a bias offset.
The FC layer helps to map the representation between the input
and the output.
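The "matrix multiplication followed by a bias offset" can be sketched in plain Python. This is illustrative only: `fully_connected` is a hypothetical name, the weight matrix is stored with one row per output neuron, and no activation function is applied.

```python
def fully_connected(x, weights, biases):
    """Compute W x + b, where weights holds one row per output neuron."""
    return [sum(w * x_i for w, x_i in zip(row, x)) + b
            for row, b in zip(weights, biases)]


x = [1.0, 2.0, 3.0]                 # flattened input vector
weights = [[1.0, 0.0, -1.0],        # 2 output neurons x 3 inputs
           [0.5, 0.5, 0.5]]
biases = [0.0, 1.0]
print(fully_connected(x, weights, biases))  # [-2.0, 4.0]
```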

Non-Linearity Layers
Since convolution is a linear operation and images are far from linear,
non-linearity layers are often placed directly after the convolutional
layer to introduce non-linearity to the activation map.
There are several types of non-linear operations, the popular ones
being:
1. Sigmoid
2. Tanh
3. ReLU
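For reference, the three non-linearities can be written as scalar functions using their standard definitions. In a network they are applied element-wise to every value of the activation map; the function names here are just for illustration.

```python
import math


def sigmoid(x):
    """Squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any real value into the range (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Keeps positive values and replaces negative values with zero."""
    return max(0.0, x)


print(sigmoid(0.0), tanh(0.0), relu(-2.0), relu(3.0))  # 0.5 0.0 0.0 3.0
```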

Calculate the size of the output volumes of all
convolutional and pooling layers in the following CNN
architecture?

INPUT IMAGE: 28 x 28 x 1 (W1 x H1 x D1)
CONV 1: F=5, S=1, P=2, No of kernels=16; output W2 x H2 x D2 = ?
POOL 1: F=2, S=2, P=0; output W3 x H3 x D3 = ?
CONV 2: F=5, S=1, P=2, No of kernels=32; output W4 x H4 x D4 = ?
POOL 2: F=2, S=2, P=0; output W5 x H5 x D5 = ?

Calculate the size of the output volumes of all
convolutional and pooling layers in the following CNN
architecture?

INPUT IMAGE: 34 x 34 x 1 (W1 x H1 x D1)
CONV 1: F=6, S=1, P=2, No of kernels=32; output W2 x H2 x D2 = ?
POOL 1: F=2, S=2, P=0; output W3 x H3 x D3 = ?
CONV 2: F=6, S=1, P=2, No of kernels=32; output W4 x H4 x D4 = ?
POOL 2: F=2, S=2, P=0; output W5 x H5 x D5 = ?
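One way to work through the two exercises above is to apply the output-size relation (W - F + 2P)/S + 1 layer by layer. The sketch below rests on two assumptions: that F, S, P and the kernel counts are assigned to CONV 1, POOL 1, CONV 2 and POOL 2 as read from the figure (convolutions use P=2, pooling uses F=2, S=2, P=0), and that the division is rounded down when the window does not fit evenly. The function and variable names are hypothetical.

```python
def output_size(w, f, s, p):
    """(W - F + 2P) / S + 1, rounded down."""
    return (w - f + 2 * p) // s + 1


def trace_shapes(width, depth, layers):
    """Return the (W, H, D) shape of the input and of every layer's output."""
    shapes = [(width, width, depth)]
    for kind, f, s, p, n_kernels in layers:
        width = output_size(width, f, s, p)
        if kind == "conv":
            depth = n_kernels  # one output channel per kernel; pooling keeps depth
        shapes.append((width, width, depth))
    return shapes


exercise_1 = [("conv", 5, 1, 2, 16), ("pool", 2, 2, 0, None),
              ("conv", 5, 1, 2, 32), ("pool", 2, 2, 0, None)]
exercise_2 = [("conv", 6, 1, 2, 32), ("pool", 2, 2, 0, None),
              ("conv", 6, 1, 2, 32), ("pool", 2, 2, 0, None)]

print(trace_shapes(28, 1, exercise_1))
# [(28, 28, 1), (28, 28, 16), (14, 14, 16), (14, 14, 32), (7, 7, 32)]
print(trace_shapes(34, 1, exercise_2))
# [(34, 34, 1), (33, 33, 32), (16, 16, 32), (15, 15, 32), (7, 7, 32)]
```

In the second exercise the pooling windows do not tile the 33 x 33 and 15 x 15 maps exactly, so the results depend on the rounding assumption.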


Understanding and Calculating the number of
Parameters in Convolution Neural Networks (CNNs)

Basically, the number of parameters in a given
layer is the count of “learnable” elements of the
filters for that layer.
Parameters in general are weights that are learnt
during training.
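As a compact reference, the usual counting rules can be written as follows. The symbols are assumptions introduced here: F is the kernel size, D_in and D_out are the numbers of input and output channels, n_in and n_out are the sizes of a fully connected layer, and the "+1" is one bias per kernel or per output neuron.

```latex
\text{params}_{\text{conv}} = (F \cdot F \cdot D_{\text{in}} + 1)\, D_{\text{out}}, \qquad
\text{params}_{\text{FC}} = (n_{\text{in}} + 1)\, n_{\text{out}}, \qquad
\text{params}_{\text{pool}} = 0
```

For example, a layer of two 5 x 5 kernels on a single-channel input has (5 · 5 · 1 + 1) · 2 = 52 parameters.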

Calculate the number of parameters in each layer of
the following CNN architecture?

Problem
 Consider the following architecture of a CNN.
• Input: 28x28x1
• First Conv layer: Two 5x5 kernels with weights and bias parameters (no padding, unit
stride); ReLU nonlinearity
• First Max-pooling layer: Kernel size 2x2 and stride=2
• Second Conv layer: Four 7x7 kernels with weights and bias parameters (no padding, unit
stride); ReLU
• Second Max-Pooling layer: Kernel size 2x2 and stride=2
• Flatten layer: Vectorizing feature maps of previous layer and concatenating, resulting in
vector z
• Fully connected (FC) layer: Weights and biases for 10 class classification, resulting in
activation vector a
• Output layer with outputs
 (i) For each layer, give the shape of the output and the number of parameters.
 (ii) Describe the softmax activation function for the output layer.

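A possible way to check an answer to this problem is sketched below. It assumes that each kernel of the second convolutional layer spans both input channels (7 x 7 x 2 weights plus one bias), that pooling and flattening have no parameters, and that softmax is defined in the standard way, softmax(a)_i = exp(a_i) / Σ_j exp(a_j). All names are hypothetical.

```python
import math


def conv_shape(w, n_kernels, f):
    """Output shape of a convolution with no padding and unit stride."""
    return (w - f + 1, w - f + 1, n_kernels)


def pool_shape(shape, f=2, s=2):
    """Output shape of f x f pooling with stride s; depth is unchanged."""
    w, _, d = shape
    return ((w - f) // s + 1, (w - f) // s + 1, d)


def softmax(a):
    """Turn the activation vector a into probabilities that sum to 1."""
    shift = max(a)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


conv1 = conv_shape(28, 2, 5)            # (24, 24, 2)
conv1_params = (5 * 5 * 1 + 1) * 2      # 52
pool1 = pool_shape(conv1)               # (12, 12, 2), no parameters
conv2 = conv_shape(pool1[0], 4, 7)      # (6, 6, 4)
conv2_params = (7 * 7 * 2 + 1) * 4      # 396
pool2 = pool_shape(conv2)               # (3, 3, 4), no parameters
flat_len = pool2[0] * pool2[1] * pool2[2]   # 36 values in vector z
fc_params = flat_len * 10 + 10          # 370

print(conv1, pool1, conv2, pool2, flat_len)
print(conv1_params, conv2_params, fc_params)
print(softmax([0.0, 0.0]))              # [0.5, 0.5]
```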
Flattening

The intuition behind the flattening layer is to convert data into a
1-dimensional array for feeding to the next layer. We flatten the output
of the convolutional layers into a single long feature vector,
which is connected to the final classification model, called the
fully connected layer. For example, a [5,5,5] pooled
feature map is flattened into a single 1x125 vector. So, the
flatten layer converts a multidimensional array to a single
dimensional vector.
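Flattening a [5,5,5] pooled feature map into 125 values can be sketched as follows. The nested-list layout (channel, row, column) and the name `flatten` are assumptions for illustration; real frameworks may order the values differently.

```python
def flatten(volume):
    """Concatenate every value of a 3-D nested list into one flat list."""
    return [value for plane in volume for row in plane for value in row]


# A dummy 5 x 5 x 5 pooled feature map filled with distinct integers.
volume = [[[c * 25 + r * 5 + k for k in range(5)] for r in range(5)]
          for c in range(5)]
vector = flatten(volume)
print(len(vector))  # 125
```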


Agenda-6/10/23
Motivation behind CNN
Variants of convolution functions
Structured outputs, Data types
Efficient convolution algorithms

Convolution with stride

Convolution with padding

Unshared convolution

Tiled convolution

Dilated Convolution

Sparse Connectivity

CNN LAYERS

Agenda-7/10/2023

Structured Output
Data types
Efficient convolution algorithms
Applications of CNN


 a) Tensors:
 In the context of convolutional operations, tensors refer to the multi-
dimensional arrays that store the data. In the case of image processing, a 2D
tensor can represent a grayscale image, and a 3D tensor can represent a color
image (with channels for red, green, and blue).
 During convolutional operations in a neural network, these tensors are
convolved with learnable filters or kernels to extract features from the input
data. The output of a convolutional operation is also a tensor, and the depth of
this output tensor corresponds to the number of filters used in the
convolution.
 b) Kernel Flipping:
 Kernel flipping (also called filter flipping) is part of the mathematical
definition of convolution. Before the kernel is applied to an input tensor, it is
flipped horizontally and vertically; the element-wise multiplication with
the corresponding input values follows.
 The flipping is what distinguishes a true convolution from cross-correlation,
in which the same sliding window (kernel) moves across the input data without being
flipped. Flipping makes the operation commutative; it does not make the
detected features independent of their orientation in the input data. Because the
kernel weights are learned, many deep learning libraries skip the flip and
compute cross-correlation instead.
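The relationship between the two operations can be seen in one dimension: convolution is cross-correlation with the kernel reversed. The sketch below uses made-up helper names and only evaluates positions where the kernel fits entirely inside the input.

```python
def cross_correlate_1d(x, k):
    """Slide k over x without flipping; 'valid' positions only."""
    positions = len(x) - len(k) + 1
    return [sum(k[i] * x[t + i] for i in range(len(k))) for t in range(positions)]


def convolve_1d_valid(x, k):
    """True convolution: flip the kernel, then cross-correlate."""
    return cross_correlate_1d(x, k[::-1])


x = [1, 2, 3, 4]
k = [1, 0, -1]
print(cross_correlate_1d(x, k))  # [-2, -2]
print(convolve_1d_valid(x, k))   # [2, 2]
```

For a symmetric kernel the two results coincide, which is one reason the distinction rarely matters for learned filters.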

 c) Down Sampling:
 Downsampling is the process of reducing the spatial dimensions of an image or a
feature map. Two common techniques for downsampling are max pooling and
average pooling.
• Max Pooling:
• In max pooling, a window (usually 2x2) slides over the input tensor, and the
maximum value in each window is taken as the output for that region.
• Max pooling helps retain the most important features and reduces the spatial
dimensions.
• Average Pooling:
• In average pooling, similar to max pooling, a window slides over the input tensor,
but instead of taking the maximum value, the average of the values in the window
is computed.
• Average pooling provides a smoothed version of the input and also reduces
spatial dimensions.
 Downsampling is often used in convolutional neural networks to progressively reduce
the spatial resolution of feature maps, making the network more computationally
efficient and reducing the risk of overfitting. It also helps in creating a hierarchy of
features, where higher-level features are represented at lower spatial resolutions.
Structured outputs, Data types


Applications of Convolutional Networks


 Image Classification – Search Engines, Social Media, Recommender
Systems
 Face Recognition – Social Media,
Identification, and Surveillance
 Medical Image Computing – Predictive Analytics, Healthcare Data
Science
 Health Risk Assessment Using Predictive Analytics
 Drug Discovery Using Predictive Analytics
 Precision Medicine Using Predictive Analytics
 https://www.analyticsvidhya.com/blog/2021/10/applications-of-convolutional-neural-networkscnn/

Applications of Convolutional Networks

1. Content based image retrieval


2. Object localization
3. Object detection
4. Natural language and sequence learning
5. Video classification


Transfer Learning
Transfer learning is a machine learning method where a model already developed for a
task is reused in another task. Transfer learning is a popular approach in deep learning, as
it enables the training of deep neural networks with less data compared to having to create
a model from scratch.


Why Should You Use Transfer Learning?

 https://www.analyticsvidhya.com/blog/2021/10/understanding-transfer-learning-for-deep-learning/

 Transfer learning offers a number of advantages, the most important
of which are reduced training time, improved neural network
performance, and a reduced need for large amounts of data.
 To train a neural model from scratch, a lot of data is typically needed,
but access to that data isn’t always possible – this is when transfer
learning comes in handy.

Approaches to Transfer Learning
 1. TRAINING A MODEL TO REUSE IT
 Imagine you want to solve task A but don’t have enough data to train a
deep neural network. One way around this is to find a related task B
with an abundance of data. Train the deep neural network on task B
and use the model as a starting point for solving task A. Whether you'll
need to use the whole model or only a few layers depends heavily on the
problem you're trying to solve.
 2. USING A PRE-TRAINED MODEL
 The second approach is to use an already pre-trained model. There are
a lot of these models out there, so make sure to do a little research. How
many layers to reuse and how many to retrain depends on the problem.
 Keras, for example, provides numerous pre-trained models that can be
used for transfer learning, prediction, feature extraction and
fine-tuning.

Evolution of CNN Pretrained Architectures


IMAGENET DATASET
The most highly-used subset of ImageNet is the ImageNet Large Scale
Visual Recognition Challenge (ILSVRC) 2012-2017 image classification
and localization dataset.

This dataset spans 1000 object classes and contains 1,281,167 training
images, 50,000 validation images and 100,000 test images. This subset is
available on Kaggle.

Case Studies of Convolutional Architectures :
AlexNet, ZFNet, VGGNet19, ResNet-50


ZFNet

• Input is 224x224x3 images.
• Next, 96 convolutions of 7x7 with a stride of 2 are performed, followed
by ReLU activation, 3x3 max pooling with stride 2 and local contrast
normalization.
• These are followed by 256 filters of 3x3 each, whose output is again local
contrast normalized and pooled.
• The third and fourth layers are identical, with 384 kernels of 3x3 each.
• The fifth layer has 256 filters of 3x3, followed by 3x3 max pooling with
stride 2 and local contrast normalization.
• The sixth and seventh layers have 4096 dense units each.
• Finally, we feed into a Dense layer of 1000 neurons, i.e. the number of
classes in ImageNet.


 ZFNet architecture:

• 5 Convolutional layers.
• 3 Fully connected layers.
• 3 Overlapping Max pooling layers.
• ReLU as activation function for hidden layer.
• Softmax as activation function for output layer.
• 60,000,000 trainable parameters(60 Million).
• Cross-entropy as cost function
• Mini-batch gradient descent with Momentum optimizer.

ZFNET

VGG stands for Visual Geometry Group; it is a standard deep Convolutional
Neural Network (CNN) architecture with multiple layers. The “deep” refers to the
number of layers, with VGG-16 and VGG-19 consisting of 16 and 19 weight
layers respectively (13 and 16 convolutional layers, each followed by 3 fully
connected layers).

VGG 19

Skip Connections

Resnet 50

ResNet has fewer filters and is less complex
than a VGGNet. A 34-layer ResNet requires
3.6 billion FLOPs, and a smaller 18-layer ResNet
requires 1.8 billion FLOPs, which is significantly
less computation than a VGG-19 network with 19.6
billion FLOPs.
RESNET 50
• A 7×7 convolution with 64 kernels and a
stride of 2.
• A max pooling layer with a stride of 2.
• 9 more layers: a convolution with 1×1,64 kernels, another with
3×3,64 kernels, and a third with 1×1,256 kernels. These 3 layers
are repeated 3 times.
• 12 more layers with 1×1,128 kernels, 3×3,128 kernels, and
1×1,512 kernels, iterated 4 times.
• 18 more layers with 1×1,256 kernels, 3×3,256 kernels, and
1×1,1024 kernels, iterated 6 times.
• 9 more layers with 1×1,512 kernels, 3×3,512 kernels, and 1×1,2048
kernels, iterated 3 times.
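The name ResNet-50 can be checked by counting the weight layers in the list above. The sketch assumes that the final fully connected classification layer (not listed above) is counted and that pooling layers are not, as is conventional; the stage labels are only illustrative.

```python
# (stage label, layers per block, number of repeated blocks)
stages = [
    ("1x1,64 / 3x3,64 / 1x1,256", 3, 3),
    ("1x1,128 / 3x3,128 / 1x1,512", 3, 4),
    ("1x1,256 / 3x3,256 / 1x1,1024", 3, 6),
    ("1x1,512 / 3x3,512 / 1x1,2048", 3, 3),
]

stem_conv = 1   # the initial 7x7 convolution
final_fc = 1    # assumed final fully connected classifier

block_layers = sum(layers * repeats for _, layers, repeats in stages)
total_layers = stem_conv + block_layers + final_fc
print(block_layers, total_layers)  # 48 50
```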

RESNET


 Transfer learning involves leveraging knowledge gained while solving one problem and applying it to
a different, but related, problem. There are some rules of thumb related to the sizes of
the target and base datasets in the context of transfer learning:
1. Small Target Dataset, Large Base Dataset:
1. Rule: When you have a small dataset for the target task but a large dataset for the base
(pre-training) task.
2. Explanation: In this scenario, the base model has learned rich and general features from a large
dataset. You can adapt the pre-trained model to the smaller target dataset, either by fine-tuning it
or by freezing its layers and training only a new classifier on top. The latter is often referred to as
feature extraction.
2. Small Target Dataset, Small Base Dataset:
1. Rule: When both the target and base datasets are small.
2. Explanation: In situations where data is limited for both tasks, it might be challenging to achieve
good performance with transfer learning. In such cases, you might still use pre-trained models as
a starting point, but the risk of overfitting to the small datasets is higher. Consider using data
augmentation techniques and regularization to mitigate this.
3. Large Target Dataset, Small Base Dataset:
1. Rule: When you have a large dataset for the target task but a small dataset for the base task.
2. Explanation: Having a large target dataset allows you to train a model from scratch effectively.
Transfer learning might still be useful to initialize the model with weights learned from the base
Dr Reema Mathew A
task, but the model may require further training to adapt to the specifics of the target task.
4.Large Target Dataset, Large Base Dataset:
1. Rule: When both the target and base datasets are large.
2. Explanation: In scenarios with abundant data for both tasks, transfer learning might still
offer benefits. You can use the pre-trained model as an initialization and fine-tune it on the
target dataset. Fine-tuning allows the model to adapt to the requirements of the target task
while leveraging the knowledge gained from the base task.
5.Domain Similarity:
1. Rule: Transfer learning is often more effective when the source (base) and target domains
are similar.
2. Explanation: If the domains differ significantly, the pre-trained features may not be as
relevant to the target task. In such cases, the model might require more adaptation or fine-
tuning on the target data.
6.Layer Choice:
1. Rule: Earlier layers in a neural network capture more generic features, while later layers
capture more task-specific features.
2. Explanation: Depending on the similarity of the base and target tasks, you might choose to
freeze or fine-tune specific layers. For highly similar tasks, the pre-trained features remain largely
relevant, so freezing more layers and training only the top layers is often sufficient. For dissimilar
tasks, fine-tuning more of the later layers helps the model adapt, provided there is enough target
data to do so without overfitting.
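The layer-choice rule can be sketched without any deep learning framework. The snippet below is only an illustration: `Layer`, `freeze_first`, and the five-layer toy architecture are hypothetical names invented for this example, not part of any library. It shows how a "trainable" flag separates frozen early layers from the later layers that would be updated during fine-tuning.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    # Hypothetical stand-in for a network layer; not a real library class.
    name: str
    trainable: bool = True


def freeze_first(layers, n_frozen):
    """Freeze the first n_frozen layers and leave the remaining ones trainable."""
    for i, layer in enumerate(layers):
        layer.trainable = i >= n_frozen
    return layers


# Assumed toy architecture, earliest (most generic) layer first.
model = [Layer("conv1"), Layer("conv2"), Layer("conv3"), Layer("fc1"), Layer("classifier")]

# Option A: freeze everything except the final classifier (train only the top layer).
freeze_first(model, n_frozen=4)
trainable_top_only = [layer.name for layer in model if layer.trainable]

# Option B: freeze only the two earliest layers and fine-tune the rest.
freeze_first(model, n_frozen=2)
trainable_deeper = [layer.name for layer in model if layer.trainable]

print(trainable_top_only)
print(trainable_deeper)
```

How many layers to freeze is a judgment call that depends on the dataset sizes and domain similarity discussed in the rules above.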

Advantages of Convolutional Neural Network (CNN)


1. Efficient image processing – One of the key advantages of CNNs is their ability to process images
efficiently. They do this through convolution: a small filter is slid across the image to extract features
that are relevant to the task at hand. Because the same filter weights are reused at every position, a
CNN needs far fewer parameters than a fully connected network applied to raw pixels, which makes
it faster and more efficient on image data.
2. High accuracy rates – Another advantage of CNNs is their ability to achieve high accuracy rates. This
is because they can learn to recognize complex patterns in images by analyzing large datasets. This
means that they can be trained to recognize specific objects or features with a high degree of
accuracy, which makes them ideal for tasks like facial recognition or object detection.
3. Robust to noise – CNNs are also robust to noise, which means that they can still recognize patterns in
images even if they are distorted or corrupted. This is because they use multiple layers of filters to
extract features from images, which makes them more resilient to noise than other types of
algorithms.
4. Transfer learning – CNNs also support transfer learning, which means that they can be trained on
one task and then used to perform another task with little or no additional training. This is because
the features that are extracted by CNNs are often generic enough to be used for a wide range of
tasks, which makes them a versatile tool for many different applications.
5. Automated feature extraction – Finally, CNNs automate the feature extraction process, which means
that they can learn to recognize patterns in images without the need for manual feature engineering.
This makes them ideal for tasks where the features that are relevant to the task are not known in
advance, as the CNN can learn to identify the relevant features through training.
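The idea of "applying a filter to an image to extract features" from point 1 can be made concrete with a small pure-Python sketch. The 4x4 image and the 2x2 edge filter below are assumptions chosen for illustration. In a real CNN the filter values are learned during training rather than hand-picked.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            # Multiply the filter with the receptive field under it and sum.
            for a in range(kh):
                for b in range(kw):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)
```

The resulting 3x3 feature map is non-zero only in the column where the edge sits, so the 16 input pixels are summarized by a smaller map that highlights one feature.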
Disadvantages of CNN
1. High computational requirements – One of the main disadvantages of CNNs is their high
computational requirements. This is because CNNs typically have a large number of layers
and parameters, which require a lot of processing power and memory to train and run. This
can make them impractical for use in some applications where resources are limited.
2. Difficulty with small datasets – CNNs also require large datasets to achieve high accuracy
rates. This is because they learn to recognize patterns in images by analyzing many examples
of those patterns. If the dataset is too small, the CNN may overfit, meaning it becomes too
specialized to the training dataset and performs poorly on new data.
3. Lack of interpretability – Another disadvantage of CNNs is their lack of interpretability.
This means that it is difficult to understand how the CNN makes its decisions. This can be
problematic in applications where it is important to know why a certain decision was made.
4. Vulnerability to adversarial attacks – CNNs are also vulnerable to adversarial attacks, which
involve intentionally manipulating the input data to fool the CNN into making incorrect
decisions. This can be a serious problem in applications like autonomous vehicles, where
safety is a critical concern.
5. Limited ability to generalize – Finally, CNNs have a limited ability to generalize to new
situations. This means that they may perform poorly on images that are very different from
those in the training dataset. This can be a problem in applications where the CNN needs to
work with a wide variety of images.
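The adversarial-attack point can be illustrated with the fast gradient sign method (FGSM). The sketch below is an assumption-laden toy: it uses a three-feature logistic classifier instead of a CNN, and the weights, input, and `eps` value are invented for the example. The mechanism is the same as for a CNN: each input value is nudged by a small amount in the direction that increases the loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of the positive class under a toy logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast-gradient-sign perturbation for cross-entropy loss with true label y (0 or 1).

    For this model the gradient of the loss with respect to x is (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Assumed toy weights and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, 0.1, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.2)
p_clean = predict(w, b, x)
p_adv = predict(w, b, x_adv)

print(p_clean > 0.5, p_adv > 0.5)
```

In this toy setting, a change of at most 0.2 per feature is enough to push the predicted probability of the true class from above 0.5 to below it.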

Strengths and weaknesses of convolutional neural networks

Advantages                      Disadvantages
Efficient image processing      High computational requirements
High accuracy rates             Difficulty with small datasets
Robust to noise                 Lack of interpretability
Transfer learning               Vulnerability to adversarial attacks
Automated feature extraction    Limited ability to generalize


Q. What happens if the stride of the convolutional layer increases?

Increasing the stride reduces the computational cost of the convolutions. If we change
the stride from one to two, the computational cost drops by a factor of about four.
• This happens because the stride affects the distance between receptive fields in
both dimensions.
• Similarly, if we triple the stride, we can expect the computational cost to drop by
a factor of roughly nine.
The computational cost is reduced because a larger stride reduces the number of
receptive fields extracted from the input; consequently, the dimensions of the
output are also reduced.
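The fourfold and ninefold figures follow from the standard output-size formula, floor((n + 2p - k) / s) + 1 per dimension. The sketch below counts receptive fields for an assumed 224x224 input and 3x3 kernel with no padding; these sizes are illustrative and not taken from the slides. Since each receptive field costs the same number of multiply-adds, the field count is a reasonable proxy for the cost of the layer.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1


def num_receptive_fields(height, width, k, stride):
    """Number of filter positions, which equals the number of output pixels."""
    return conv_output_size(height, k, stride) * conv_output_size(width, k, stride)


# Assumed example sizes: 224x224 input, 3x3 kernel, no padding.
counts = {s: num_receptive_fields(224, 224, 3, s) for s in (1, 2, 3)}
base = counts[1]

for s, fields in counts.items():
    print(f"stride={s}: {fields} receptive fields, cost reduced {base / fields:.1f}x")
```

For other input sizes the ratios are only approximately 4 and 9, because the floor in the formula rounds the output size down.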

Q. Suppose that a CNN was trained to classify images into different
categories. It performed well on a validation set that was taken from the
same source as the training set but not on a testing set. What could be the
problem with the training of such a CNN? How will you ascertain the
problem? How can those problems be solved?
• If a convolutional neural network (CNN) performs well on a
validation set from the same source as the training set but fails to
generalize to a different testing set, it has likely overfit to the
characteristics of that source. Overfitting occurs when a model learns
the training data too well, including its noise and specific
characteristics, and fails to generalize to new, unseen data.
• Here are some steps to ascertain and potentially address the
problem:
1. Data Mismatch:
1. Problem: The training and validation sets may be too similar, coming from the same source, leading to
overfitting to the specific characteristics of that source.
2. Solution: Ensure that the training, validation, and testing sets are diverse and representative of the
general population of images the model is expected to encounter. This may involve obtaining data from
different sources or randomizing the selection of samples.
2. Insufficient Data Augmentation:
1. Problem: The augmentation applied during training might not be sufficient to expose the model to
various viewpoints, lighting conditions, and transformations.
2. Solution: Increase the diversity of data augmentation techniques during training. This can include
random rotations, flips, zooms, and other transformations that simulate real-world variations.
3. Model Complexity:
1. Problem: The model may be too complex, capturing noise and outliers in the training data, which
hinders its generalization.
2. Solution: Consider simplifying the model architecture, using techniques such as reducing the number
of layers or adding regularization methods like dropout. This helps prevent the model from memorizing
the training data.
4. Regularization Techniques:
1. Problem: Lack of regularization techniques may lead to overfitting.
2. Solution: Introduce regularization techniques such as dropout or L2 regularization to penalize
overly complex model parameters during training. These techniques help prevent the model
from fitting the noise in the training data.
5. Learning Rate and Optimization:
1. Problem: Incorrect choice of learning rate or optimization algorithm may hinder convergence
or cause overshooting.
2. Solution: Experiment with different learning rates and optimization algorithms. Techniques like
learning rate schedules or adaptive learning rate methods like Adam can be employed.
6. Evaluate on Multiple Metrics:
1. Problem: Evaluating solely on accuracy may not reveal the model's shortcomings.
2. Solution: Assess the model using various metrics, such as precision, recall, and F1 score,
especially if the classes are imbalanced. This provides a more comprehensive understanding of
the model's performance.

7. Cross-Validation:
1. Problem: The validation set might not be representative of the model's
generalization performance.
2. Solution: Use techniques like k-fold cross-validation to assess the
model's performance on multiple validation sets. This provides a more
robust estimate of its generalization capabilities.
By systematically analyzing these factors and making adjustments, you can
improve the generalization performance of the CNN on the testing set. Keep
in mind that the process may involve iterative experimentation and fine-tuning
to achieve the best results.
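The k-fold cross-validation mentioned in step 7 can be sketched as an index-splitting helper. The function name `k_fold_indices` and the choice of contiguous, unshuffled folds are assumptions made to keep the example short. In practice the samples would usually be shuffled first, and a library routine would typically be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for fold_number, (train, val) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={train} val={val}")
```

Each sample appears in exactly one validation fold, so averaging the validation score over the k runs gives an estimate that depends less on any single split.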


Steps to Use Transfer Learning

• Training a Model to Reuse It
• Using a Pre-Trained Model
• Extraction of Features
• Extraction of Features in Neural Networks