Module 3 CNN
BY
Dr REEMA MATHEW A
PROFESSOR, CSE
VIMAL JYOTHI ENGG COLLEGE
MOB:9645527132, [email protected]
Dr Reema Mathew A
Agenda-25/9/2023
Module 3 Syllabus
CNN Introduction
CNN Components-Overall idea
CNN Architecture
CNN training steps
CNN-Detailed explanation
Input image
Concept of convolution
Convolution of images
CNN Layers
CNN ARCHITECTURE
CNN ARCHITECTURE
Back propagation with example
Convolutional Neural Network
➢ Convolutional Neural Networks are a class of Deep Learning architectures
that have been widely used for image recognition tasks.
➢ In a convolutional neural network, the input is depicted as a volume,
which is basically an M x N x 3 array of colour pixels; each colour pixel is
associated with three values that correspond to the Red, Green and Blue colour
components of the RGB image at a specified spatial location.
➢ A pixel has value (IR, IG, IB), which is determined by the combination of
intensities stored in the red colour plane, green colour plane and blue colour
plane, respectively.
➢ An RGB image has three channels, and a greyscale image has one channel.
Therefore, a greyscale image is represented by an M x N array of pixel
intensities, while an RGB image is represented by an M x N x 3 array of pixel
intensities (Fig. 3.2).
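The array view of an image described above can be sketched with NumPy. The sizes M = 4 and N = 6 and the pixel value are made up for illustration; they are not taken from the slides.

```python
import numpy as np

M, N = 4, 6  # illustrative image height and width (assumed values)

# Greyscale image: one channel, so an M x N array of intensities.
grey = np.zeros((M, N), dtype=np.uint8)

# RGB image: three channels, so an M x N x 3 array of intensities.
rgb = np.zeros((M, N, 3), dtype=np.uint8)

# Set one pixel; its value is the triple (IR, IG, IB).
rgb[1, 2] = (255, 128, 0)
i_r, i_g, i_b = rgb[1, 2]

# Each colour plane is itself an M x N array.
red_plane = rgb[:, :, 0]

print(grey.shape, rgb.shape, red_plane.shape)
```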
Input image
Concept of convolution
Convolutional Neural Networks (CNN, or ConvNet) are a type of
multi-layer neural network that is meant to extract visual patterns
from pixel images.
In CNN, ‘convolution’ refers to a mathematical operation. It is
a linear operation that combines two functions to
create a third function that expresses how the shape of one function is
modified by the other.
In simple terms, a small matrix (the filter) is slid over the image
matrix; at each position the overlapping elements are multiplied and
summed to give an output that is used to extract information from the image.
y(k) = r(k) * h(k) = Σ_j r(j) h(k − j)      (3.1)
where * denotes convolution and the sum runs over all integers j.
In general, convolution is defined for any functions for which the
summation in eqn. (3.1) is defined, and it is used for a variety of
applications.
In image processing applications in the field of computer vision, the
input is a multidimensional array of data, and the filter is a
multidimensional array of parameters.
The filter is also commonly called a kernel.
Usually both the input and the kernel are zero everywhere except for
a finite set of points.
This means that, in practice, we can implement the infinite
summation (refer to eqn. (3.1)) over a finite number of array elements.
Convolution-Definition for images
If we use a two-dimensional image I as the input, we will use a two-dimensional
kernel K; the definition of the convolution operation then
takes the form
S(i, j) = (I * K)(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n)
Why Convolutions
• Parameter sharing: a feature detector (such as a vertical edge
detector) that’s useful in one part of the image is probably useful in
another part of the image.
• Sparsity of connections: in each layer, each output value depends
only on small number of inputs.
LAYERS IN CNN
INPUT LAYER
The input layer consists of raw pixel values of the image to be
classified.
An image is represented by three colour channels of Red, Green and
Blue; each channel is usually represented by a matrix of pixel
values ranging from 0 to 255 (Figs 3.1 and 3.2).
The image matrix is flattened before being passed on to the traditional
neural networks. Flattening is not required for processing by a CNN.
The image matrix itself forms the input layer of a CNN.
Convolutional layer
The convolutional layer is used after input layer. It derives an output
(feature map) using filter kernels that operate only on local regions in
the input.
In the fully-connected neural networks, we connect each input pixel
to each neuron in the hidden layer with a separate parameter
describing the interaction between each input unit and each hidden
unit.
Thus every hidden unit interacts with every input unit. In a fully-
connected network, the input is depicted as a vertical line of neurons.
Suppose that we had a fully-connected first layer with 784 (28 x 28)
input neurons and 30 hidden neurons. Then every hidden neuron is
connected to all 784 input neurons, and a total of 784 x 30 = 23,520
weights are involved.
In a convolutional neural network, we consider input as 28 x 28
square of neurons. Here we don't connect every input unit to every
hidden unit.
Instead we only make connections in small localized regions of the
input image, say, for example 3 x 3 region corresponding to 9 input
units.
That region in the input image is called local receptive field for the
hidden unit. It is a little window on the input units. Each connection
learns a weight.
We can think of that particular hidden unit as trying to analyze its
particular local receptive field. Each hidden unit connects only to its local
receptive field.
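The weight counts in the comparison above can be checked with a short sketch. The fully-connected figures (784 inputs, 30 hidden neurons) and the 3 x 3 receptive field come from the text; using 30 filters for the convolutional case, sharing each filter's weights across all positions, and ignoring biases are assumptions made only to keep the comparison simple.

```python
def dense_weights(n_inputs, n_hidden):
    """Weights in a fully-connected layer: one per (input, hidden unit) pair."""
    return n_inputs * n_hidden


def conv_weights(filter_size, n_filters, in_channels=1):
    """Weights in a convolutional layer when each filter is shared across all positions."""
    return filter_size * filter_size * in_channels * n_filters


fc = dense_weights(28 * 28, 30)               # every hidden unit sees all 784 inputs
conv = conv_weights(filter_size=3, n_filters=30)  # every unit sees only a 3 x 3 region

print("fully connected:", fc)
print("convolutional:  ", conv)
```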
Stride
Stride is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, we
move the filter 1 pixel at a time. When the stride is 2, we move the filter 2 pixels at
a time, and so on. The figure below shows how convolution works with a stride of 2.
Convolution Operation
Basic Convolution Operation
Step 1: overlay the filter on the input, perform
element-wise multiplication, and add the results.
Step 2: move the overlay right one position (or according to the stride
setting), and repeat the same calculation to get the next result, and
so on.
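The two steps above can be sketched in NumPy as follows. This is a minimal illustration without padding, not an optimized implementation; the function name `conv2d`, the 5 x 5 input and the vertical-edge kernel are assumptions chosen for the example. As in most CNN layers, the kernel is applied without flipping.

```python
import numpy as np


def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and sum the element-wise products."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride          # Step 2: move by the stride
            patch = image[r:r + kh, c:c + kw]      # Step 1: overlay the filter
            out[i, j] = np.sum(patch * kernel)     # multiply element-wise, then add
    return out


image = np.arange(25).reshape(5, 5)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])  # a simple vertical-edge detector

feature_map = conv2d(image, kernel, stride=1)  # 3 x 3 output
strided_map = conv2d(image, kernel, stride=2)  # 2 x 2 output
print(feature_map.shape, strided_map.shape)
```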
Stride
Stride governs how many cells the filter is moved in the input to
calculate the next cell in the result.
Zero Padding
Add a border of pixels, all with value zero, along both axes of the
feature map.
Padding has the following benefits:
• It allows us to use a CONV layer without necessarily shrinking the height
and width of the volumes. This is important for building deeper networks,
since otherwise the height/width would shrink as we go to deeper layers.
• It helps us keep more of the information at the border of an image. Without
padding, very few values at the next layer would be affected by pixels at
the edges of an image.
Some padding terminologies:
• “valid” padding: no padding
• “same” padding: padding so that the output dimension is the same as the
input
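Zero padding can be sketched with `numpy.pad`. The helper `same_padding` is a hypothetical name; it assumes a stride of 1 and an odd filter size, in which case (f − 1) / 2 zeros on each side keep the output the same size as the input.

```python
import numpy as np


def same_padding(filter_size):
    """Zeros to add on each side so a stride-1 convolution keeps height and width.

    Assumes an odd filter size.
    """
    return (filter_size - 1) // 2


feature_map = np.ones((4, 4))
p = same_padding(3)  # 3 x 3 filter

# Add a border of zeros of width p along both axes.
padded = np.pad(feature_map, pad_width=p, mode="constant", constant_values=0)

# A 3 x 3 filter with stride 1 over the padded map gives (6 - 3) + 1 = 4 rows and columns.
out_size = (padded.shape[0] - 3) // 1 + 1
print(padded.shape, out_size)
```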
If we have an activation map of size W x W x D, a pooling
kernel of spatial size F, and stride S, then the size of the
output volume can be determined by the following
formula:
Wout = (W − F) / S + 1
This gives an output volume of size Wout x Wout x D.
Hyperparameters
POOLING LAYER
A pooling layer receives the result from a convolutional layer and
compresses it. The filter of a pooling layer is always smaller than a feature
map.
The pooling operation involves sliding a two-dimensional filter over each
channel of feature map and summarising the features lying within the
region covered by the filter.
For a feature map having dimensions nh x nw x nc, the dimensions of the
output obtained after a pooling layer are
((nh - f) / s + 1) x ((nw - f) / s + 1) x nc   (fractions rounded down)
-> nh - height of feature map
-> nw - width of feature map
-> nc - number of channels in the feature map
-> f - size of filter
-> s - stride length
Types of Pooling Layers:
Max Pooling
Max pooling is a pooling operation that selects the maximum element
from the region of the feature map covered by the filter. Thus, the
output after max-pooling layer would be a feature map containing the
most prominent features of the previous feature map.
Average Pooling
Average pooling computes the average of the elements present in the
region of the feature map covered by the filter, so it
gives the average of the features present in a patch.
Average pooling smooths the harsh edges of a picture and is used
when such edges are not important.
Min Pooling
In this type of pooling, the summary of the features in a
region is represented by the minimum value in that region.
It is mostly used when the image has a light background
since min pooling will select darker pixels.
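The three pooling types can be compared on one small feature map. This is a minimal single-channel sketch; the function name `pool2d` and the 4 x 4 example values are assumptions for illustration, and the output size is computed as (n − f) // s + 1.

```python
import numpy as np


def pool2d(feature_map, f=2, s=2, mode="max"):
    """Slide an f x f window with stride s and summarise each region."""
    reducers = {"max": np.max, "avg": np.mean, "min": np.min}
    reduce_fn = reducers[mode]
    fm = np.asarray(feature_map, dtype=float)
    out_h = (fm.shape[0] - f) // s + 1
    out_w = (fm.shape[1] - f) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = fm[i * s:i * s + f, j * s:j * s + f]
            out[i, j] = reduce_fn(region)
    return out


fm = np.array([[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]])

max_pooled = pool2d(fm, mode="max")  # [[7, 8], [9, 6]]
avg_pooled = pool2d(fm, mode="avg")  # [[4, 5], [4.5, 3]]
min_pooled = pool2d(fm, mode="min")  # [[1, 2], [2, 0]]
print(max_pooled, avg_pooled, min_pooled, sep="\n")
```

For an input with several channels, the same function would be applied to each channel separately, which is why the number of channels is unchanged by pooling.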
Example: Type of pooling?
Interesting properties of pooling layer:
• It has hyper-parameters:
  size (f)
  stride (s)
  type (max or avg)
• It has no learnable parameters; there is nothing for gradient
descent to learn.
• When applied to an input with multiple channels, pooling reduces the
height and width (nH and nW) but keeps nC unchanged.
Disadvantages of Pooling Layer:
Information loss: One of the main disadvantages of pooling layers
is that they discard some information from the input feature maps,
which can be important for the final classification or regression
task.
Over-smoothing: Pooling layers can also cause over-smoothing of
the feature maps, which can result in the loss of some fine-grained
details that are important for the final classification or regression
task.
Hyperparameter tuning: Pooling layers also introduce
hyperparameters such as the size of the pooling regions and the
stride, which need to be tuned in order to achieve optimal
performance. This can be time-consuming and requires some
expertise in model building.
Fully Connected Layer
Neurons in this layer have full connectivity with all neurons in
the preceding and succeeding layers, as in a regular fully connected
neural network (FCNN).
This is why its output can be computed as usual by a matrix
multiplication followed by a bias offset.
The FC layer helps to map the representation between the input
and the output.
Non-Linearity Layers
Since convolution is a linear operation and images are far from linear,
non-linearity layers are often placed directly after the convolutional
layer to introduce non-linearity to the activation map.
There are several types of non-linear operations, the popular ones
being:
1. Sigmoid
2. Tanh
3. ReLU
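The three non-linearities listed above can be written in a few lines of NumPy. The sample activation values are made up for illustration.

```python
import numpy as np


def sigmoid(x):
    """Squashes values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    """Squashes values into the range (-1, 1)."""
    return np.tanh(x)


def relu(x):
    """Keeps positive values and replaces negative values with 0."""
    return np.maximum(0, x)


activation_map = np.array([-2.0, 0.0, 2.0])  # illustrative values
print(sigmoid(activation_map))
print(tanh(activation_map))
print(relu(activation_map))
```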
Calculate the size of the output volumes of all
convolutional and pooling layers in the following CNN
architecture.
Input: 28 x 28 x 1 (W1 x H1 x D1)
Conv 1: F=5, S=1, P=2, No of kernels=16 -> W2 x H2 x D2 = ?
Pool 1: F=2, S=2, P=0 -> W3 x H3 x D3 = ?
Conv 2: F=5, S=1, P=2, No of kernels=32 -> W4 x H4 x D4 = ?
Pool 2: F=2, S=2, P=0 -> W5 x H5 x D5 = ?
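One way to work through this exercise is to apply the output-size formula layer by layer. The sketch below assumes the figure's labels mean conv layers with F=5, S=1, P=2 (16 and then 32 kernels) and pooling layers with F=2, S=2, P=0, applied in the order conv, pool, conv, pool; the helper names are hypothetical.

```python
def output_size(w, f, s=1, p=0):
    """Spatial output size: (W - F + 2P) / S + 1, rounded down."""
    return (w - f + 2 * p) // s + 1


def trace_volumes(width, depth, layers):
    """Return the (W, H, D) volume after the input and after each layer."""
    volumes = [(width, width, depth)]
    for layer in layers:
        width = output_size(width, layer["f"], layer["s"], layer["p"])
        if layer["type"] == "conv":
            depth = layer["kernels"]  # a conv layer outputs one channel per kernel
        volumes.append((width, width, depth))
    return volumes


layers = [
    {"type": "conv", "f": 5, "s": 1, "p": 2, "kernels": 16},
    {"type": "pool", "f": 2, "s": 2, "p": 0},
    {"type": "conv", "f": 5, "s": 1, "p": 2, "kernels": 32},
    {"type": "pool", "f": 2, "s": 2, "p": 0},
]

volumes = trace_volumes(28, 1, layers)
for index, volume in enumerate(volumes, start=1):
    print(f"W{index} x H{index} x D{index} = {volume}")
```

Under these assumptions the volumes are 28 x 28 x 1, 28 x 28 x 16, 14 x 14 x 16, 14 x 14 x 32 and 7 x 7 x 32.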
Calculate the size of the output volumes of all
convolutional and pooling layers in the following CNN
architecture.
Input: 34 x 34 x 1 (W1 x H1 x D1)
Conv 1: F=6, S=1, P=2, No of kernels=32 -> W2 x H2 x D2 = ?
Pool 1: F=2, S=2, P=0 -> W3 x H3 x D3 = ?
Conv 2: F=6, S=1, P=2, No of kernels=32 -> W4 x H4 x D4 = ?
Pool 2: F=2, S=2, P=0 -> W5 x H5 x D5 = ?
Calculate the number of parameters in each layer of
the following CNN architecture?
Problem
Consider the following architecture of a CNN.
• Input: 28x28x1
• First Conv layer: Two 5x5 kernels with weights and bias parameters (no padding, unit
stride); ReLU nonlinearity
• First Max-pooling layer: Kernel size 2x2 and stride=2
• Second Conv layer: Four 7x7 kernels with weights and bias parameters (no padding, unit
stride); ReLU
• Second Max-pooling layer: Kernel size 2x2 and stride=2
• Flatten layer: Vectorizing feature maps of previous layer and concatenating, resulting in
vector z
• Fully connected (FC) layer: Weights and biases for 10-class classification, resulting in
activation vector a
• Output layer with outputs
(i) For each layer, give the shape of the output and the number of parameters.
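The shapes and parameter counts asked for in part (i) can be worked out with a short sketch. It assumes the usual convention that each kernel spans all input channels and has one bias, so a conv layer has (f x f x input depth + 1) x number of kernels parameters; the helper names are hypothetical.

```python
def conv_layer(shape, n_kernels, f, s=1, p=0):
    """Output shape and parameter count of a conv layer with square f x f kernels."""
    h, w, d = shape
    out = ((h - f + 2 * p) // s + 1, (w - f + 2 * p) // s + 1, n_kernels)
    params = (f * f * d + 1) * n_kernels  # weights plus one bias per kernel
    return out, params


def pool_layer(shape, f=2, s=2):
    """Output shape of a pooling layer; pooling has no learnable parameters."""
    h, w, d = shape
    return ((h - f) // s + 1, (w - f) // s + 1, d), 0


input_shape = (28, 28, 1)
conv1, conv1_params = conv_layer(input_shape, n_kernels=2, f=5)
pool1, _ = pool_layer(conv1)
conv2, conv2_params = conv_layer(pool1, n_kernels=4, f=7)
pool2, _ = pool_layer(conv2)

flat_length = pool2[0] * pool2[1] * pool2[2]  # length of vector z
fc_params = flat_length * 10 + 10             # weights plus biases for 10 classes
total_params = conv1_params + conv2_params + fc_params

print("conv1:", conv1, conv1_params)
print("pool1:", pool1, 0)
print("conv2:", conv2, conv2_params)
print("pool2:", pool2, 0)
print("flatten:", flat_length, 0)
print("fc:", 10, fc_params)
print("total parameters:", total_params)
```

Under these assumptions the outputs are 24x24x2 (52 parameters), 12x12x2, 6x6x4 (396 parameters), 3x3x4, a vector z of length 36, and 10 FC activations (370 parameters), for 818 parameters in total.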
Flattening
The intuition behind the flattening layer is to convert data into a
1-dimensional array for feeding the next layer. We flatten the output
of the convolutional layers into a single long feature vector,
which is connected to the final classification model, called the
fully connected layer. For example, a [5,5,5] pooled
feature map is flattened into a single 1x125 vector. So, the
flatten layer converts a multidimensional array to a
single-dimensional vector.
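The [5,5,5] example above can be reproduced with a NumPy reshape. The array contents are placeholder values used only to show that no element is lost.

```python
import numpy as np

# A pooled feature map of shape [5, 5, 5] filled with placeholder values 0..124.
pooled = np.arange(125).reshape(5, 5, 5)

# Flatten into a single 1 x 125 row vector for the fully connected layer.
flat = pooled.reshape(1, -1)

print(pooled.shape, "->", flat.shape)
```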
Agenda-6/10/23-
Convolution with stride
Unshared convolution
Tiled convolution
Dilated Convolution
Sparse Connectivity
CNN LAYERS
Agenda-7/10/2023
Structured Output
Data types
Efficient convolution algorithms
Applications of CNN
a) Tensors:
In the context of convolutional operations, tensors refer to the multi-
dimensional arrays that store the data. In the case of image processing, a 2D
tensor can represent a grayscale image, and a 3D tensor can represent a color
image (with channels for red, green, and blue).
During convolutional operations in a neural network, these tensors are
convolved with learnable filters or kernels to extract features from the input
data. The output of a convolutional operation is also a tensor, and the depth of
this output tensor corresponds to the number of filters used in the
convolution.
b) Kernel Flipping:
Kernel flipping (also called filter flipping) is part of the mathematical
definition of convolution: the kernel is flipped horizontally and vertically
before the element-wise multiplication with the corresponding input values.
Flipping is what makes the sliding-window operation a true convolution, and it
gives convolution its commutative property. Without flipping, the same
operation is called cross-correlation. Many deep learning libraries implement
cross-correlation and still call it convolution; because the kernel values are
learned, the network simply learns the flipped kernel, so the distinction
rarely matters in practice.
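The effect of kernel flipping can be seen by applying a kernel to an impulse image (a single 1 in a field of zeros). This is a minimal NumPy sketch with hypothetical function names: without flipping (cross-correlation) the output is the kernel flipped in both directions, while with flipping (true convolution) the output reproduces the kernel itself.

```python
import numpy as np


def cross_correlate(image, kernel):
    """Slide the kernel over the image without flipping it (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def convolve(image, kernel):
    """True convolution: flip the kernel horizontally and vertically, then slide it."""
    return cross_correlate(image, kernel[::-1, ::-1])


impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0

kernel = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)

print(cross_correlate(impulse, kernel))  # the kernel flipped in both directions
print(convolve(impulse, kernel))         # the kernel itself
```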
c) Down Sampling:
Downsampling is the process of reducing the spatial dimensions of an image or a
feature map. Two common techniques for downsampling are max pooling and
average pooling.
• Max Pooling:
• In max pooling, a window (usually 2x2) slides over the input tensor, and the
maximum value in each window is taken as the output for that region.
• Max pooling helps retain the most important features and reduces the spatial
dimensions.
• Average Pooling:
• In average pooling, similar to max pooling, a window slides over the input tensor,
but instead of taking the maximum value, the average of the values in the window
is computed.
• Average pooling provides a smoothed version of the input and also reduces
spatial dimensions.
Downsampling is often used in convolutional neural networks to progressively reduce
the spatial resolution of feature maps, making the network more computationally
efficient and reducing the risk of overfitting. It also helps in creating a hierarchy of
features, where higher-level features are represented in lower spatial resolutions.
Structured outputs, Data types
Applications of Convolutional Networks
Transfer Learning
Transfer learning is a machine learning method where a model already developed for a
task is reused in another task. Transfer learning is a popular approach in deep learning, as
it enables the training of deep neural networks with less data compared to having to create
a model from scratch.
Approaches to Transfer Learning
1. TRAINING A MODEL TO REUSE IT
Imagine you want to solve task A but don’t have enough data to train a
deep neural network. One way around this is to find a related task B
with an abundance of data. Train the deep neural network on task B
and use the model as a starting point for solving task A. Whether you'll
need to use the whole model or only a few layers depends heavily on the
problem you're trying to solve.
2. USING A PRE-TRAINED MODEL
The second approach is to use an already pre-trained model. There are
a lot of these models out there, so make sure to do a little research. How
many layers to reuse and how many to retrain depends on the problem.
Keras, for example, provides numerous pre-trained models that can be
used for transfer learning, prediction, feature extraction and fine-tuning.
IMAGENET DATASET
The most highly-used subset of ImageNet is the ImageNet Large Scale
Visual Recognition Challenge (ILSVRC) 2012-2017 image classification
and localization dataset.
Case Studies of Convolutional Architectures :
AlexNet, ZFNet, VGGNet19, ResNet-50
ZFNet
• Input is 224x224x3 images.
• Next, 96 convolutions of 7x7 with a stride of 2 are performed, followed
by ReLU activation, 3x3 max pooling with stride 2 and local contrast
normalization.
• Next come 256 filters of 5x5 each with a stride of 2, whose output is again pooled
and local contrast normalized.
• The third and fourth layers are identical with 384 kernels of 3x3 each.
• The fifth layer has 256 filters of 3x3, followed by 3x3 max pooling with
stride 2 and local contrast normalization.
• The sixth and seventh layers have 4096 dense units each.
• Finally, we feed into a Dense layer of 1000 neurons i.e. the number of
classes in ImageNet.
ZFNet architecture:
• 5 Convolutional layers.
• 3 Fully connected layers.
• 3 Overlapping Max pooling layers.
• ReLU as activation function for hidden layer.
• Softmax as activation function for output layer.
• 60,000,000 trainable parameters(60 Million).
• Cross-entropy as cost function
• Mini-batch gradient descent with Momentum optimizer.
ZFNET
VGG stands for Visual Geometry Group; it is a standard deep Convolutional
Neural Network (CNN) architecture with multiple layers. The “deep” refers to the
number of layers, with VGG-16 and VGG-19 having 16 and 19 weight layers
respectively (13 or 16 convolutional layers plus 3 fully connected layers).
VGG 19
Skip Connections
Resnet 50
RESNET
Transfer learning involves leveraging knowledge gained while solving one problem and applying it to
a different, but related, problem. There are some thumb rules or guidelines related to the sizes of
the target and base datasets in the context of transfer learning:
1. Small Target Dataset, Large Base Dataset:
1. Rule: When you have a small dataset for the target task but a large dataset for the base (pre-
training) task.
2. Explanation: In this scenario, the base model has learned rich and general features from a large
dataset. You can use the pre-trained model as a fixed feature extractor (freeze its layers and train
only a new classifier on top), or fine-tune it carefully on the smaller target dataset to adapt it to
the specific characteristics of the target task.
2. Small Target Dataset, Small Base Dataset:
1. Rule: When both the target and base datasets are small.
2. Explanation: In situations where data is limited for both tasks, it might be challenging to achieve
good performance with transfer learning. In such cases, you might still use pre-trained models as
a starting point, but the risk of overfitting to the small datasets is higher. Consider using data
augmentation techniques and regularization to mitigate this.
3. Large Target Dataset, Small Base Dataset:
1. Rule: When you have a large dataset for the target task but a small dataset for the base task.
2. Explanation: Having a large target dataset allows you to train a model from scratch effectively.
Transfer learning might still be useful to initialize the model with weights learned from the base
task, but the model may require further training to adapt to the specifics of the target task.
4.Large Target Dataset, Large Base Dataset:
1. Rule: When both the target and base datasets are large.
2. Explanation: In scenarios with abundant data for both tasks, transfer learning might still
offer benefits. You can use the pre-trained model as an initialization and fine-tune it on the
target dataset. Fine-tuning allows the model to adapt to the requirements of the target task
while leveraging the knowledge gained from the base task.
5.Domain Similarity:
1. Rule: Transfer learning is often more effective when the source (base) and target domains
are similar.
2. Explanation: If the domains differ significantly, the pre-trained features may not be as
relevant to the target task. In such cases, the model might require more adaptation or fine-
tuning on the target data.
6.Layer Choice:
1. Rule: Earlier layers in a neural network capture more generic features, while later layers
capture more task-specific features.
2. Explanation: Depending on the similarity of the base and target tasks, you might choose to
freeze or fine-tune specific layers. For highly similar tasks, most layers can be frozen and only
the top layers trained, while for dissimilar tasks, more layers usually need fine-tuning,
provided the target dataset is large enough to support it.
Advantages:
• Efficient image processing
• High accuracy rates
• Robust to noise
• Transfer learning
Disadvantages:
• High computational requirements
• Difficulty with small datasets: CNNs also require large datasets to
achieve high accuracy rates. This is because they learn to recognize
patterns in images by analyzing many examples of those patterns. If
the dataset is too small, the CNN may overfit, meaning it becomes
too specialized to the training dataset and performs poorly on
new data.
• Vulnerability to adversarial attacks
1. Data Mismatch:
1. Problem: The training and validation sets may be too similar, coming from the same source, leading to
overfitting to the specific characteristics of that source.
2. Solution: Ensure that the training, validation, and testing sets are diverse and representative of the
general population of images the model is expected to encounter. This may involve obtaining data from
different sources or randomizing the selection of samples.
2. Insufficient Data Augmentation:
1. Problem: The augmentation applied during training might not be sufficient to expose the model to
various viewpoints, lighting conditions, and transformations.
2. Solution: Increase the diversity of data augmentation techniques during training. This can include
random rotations, flips, zooms, and other transformations that simulate real-world variations.
3. Model Complexity:
1. Problem: The model may be too complex, capturing noise and outliers in the training data, which
hinders its generalization.
2. Solution: Consider simplifying the model architecture, using techniques such as reducing the number
of layers or adding regularization methods like dropout. This helps prevent the model from memorizing
the training data.
4.Regularization Techniques:
1. Problem: Lack of regularization techniques may lead to overfitting.
2. Solution: Introduce regularization techniques such as dropout or L2 regularization to penalize
overly complex model parameters during training. These techniques help prevent the model
from fitting the noise in the training data.
5.Learning Rate and Optimization:
1. Problem: Incorrect choice of learning rate or optimization algorithm may hinder convergence
or cause overshooting.
2. Solution: Experiment with different learning rates and optimization algorithms. Techniques like
learning rate schedules or adaptive learning rate methods like Adam can be employed.
6.Evaluate on Multiple Metrics:
1. Problem: Evaluating solely on accuracy may not reveal the model's shortcomings.
2. Solution: Assess the model using various metrics, such as precision, recall, and F1 score,
especially if the classes are imbalanced. This provides a more comprehensive understanding of
the model's performance.
7.Cross-Validation:
1. Problem: The validation set might not be representative of the model's
generalization performance.
2. Solution: Use techniques like k-fold cross-validation to assess the
model's performance on multiple validation sets. This provides a more
robust estimate of its generalization capabilities.
By systematically analyzing these factors and making adjustments, you can
improve the generalization performance of the CNN on the testing set. Keep
in mind that the process may involve iterative experimentation and fine-
tuning to achieve the best results.
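Two of the suggestions above, evaluating on several metrics (item 6) and k-fold cross-validation (item 7), can be sketched in plain Python. The function names, the binary labels and the choice of k = 3 are assumptions made for illustration; the fold splitter does not shuffle the data, which a real experiment normally would.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k):
    """Split sample indices into k (train, validation) pairs without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up predictions
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)

folds = k_fold_indices(n_samples=10, k=3)
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

In practice the model would be trained once per fold on the training indices and scored on the validation indices, and the scores averaged.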