Lecture 6 Deep Learning Training and Testing 2025

The document discusses key concepts of Convolutional Neural Networks (CNNs) and Transfer Learning, highlighting the architecture of CNNs, including convolutional and pooling layers, and the importance of using pre-trained models to enhance performance on smaller datasets. It emphasizes the process of Transfer Learning, including normalization, model architecture loading, and fine-tuning, as well as considerations for training and testing CNNs such as data preprocessing, model selection, and overfitting prevention. Popular CNN architectures like VGG, ResNet, and AlexNet are also mentioned, showcasing their features and applications.


CPCS432
Lecture 6.1
Deep Learning
Training and Testing
Computer Vision

Dr. Arwa Basbrain


Summary of CNN Concepts
• A CNN consists of alternating convolutional and pooling layers, with an MLP at the end.
Not every convolutional layer is necessarily followed by a pooling layer.
• Convolution is a feature extraction process in the convolutional layer.
• A kernel of dimension k×k is defined and slides over the input image, dividing it into local patches.
• Filters, of the same dimension as the kernel, are multiplied element-wise with the pixels under the
kernel, and the results are summed over each pixel and each image channel. An optional bias is
added to the result to generate feature matrices.
• The pooling layer implements downsampling algorithms (max pooling or average
pooling) to downsample the features.
• The process is repeated for each pair of convolutional-pooling layers where output from
one pooling layer is fed as input to the next convolutional layer.
• The last convolutional/pooling layer feeds feature matrices to the input layer of the MLP.
• The MLP part of the network learns as a conventional MLP network.
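To make this pipeline concrete, here is a minimal sketch in Python (TensorFlow/Keras). It is illustrative only, not code from the lecture; the input shape, filter counts, and number of classes are arbitrary assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal CNN: alternating convolution/pooling blocks followed by an MLP head.
# Input shape (32x32 RGB) and all layer sizes are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # feature extraction
    layers.MaxPooling2D(pool_size=2),                     # downsampling
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                     # feature maps -> vector for the MLP
    layers.Dense(64, activation="relu"),                  # conventional MLP part
    layers.Dense(10, activation="softmax"),               # one output per class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```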



Part 1: Transfer Learning

Transfer Learning
• Transfer learning is a technique where knowledge gained from one task is transferred to
solve another similar task.
• This technique is particularly useful when we don't have a large dataset to train from scratch.
• Example: Using pre-trained models from ImageNet to fine-tune on specific datasets (e.g.,
cats vs. dogs).



Why Use Transfer Learning?

• In general, as the number of training images increases, so does model accuracy.


• In many cases, we don’t have access to thousands of labelled images.
• Transfer learning allows us to use models pre-trained on large datasets (millions
of images) to solve smaller dataset tasks effectively.



Transfer Learning Intuition
• Features learned for one task may be useful for another task.
• Transfer learning took off in the field of computer vision.
• ImageNet: a large-scale image dataset (millions of images, 1,000 categories).



Popular Architectures for Transfer Learning

• VGG: Visual Geometry Group network, a deep network with small, fixed-size (3×3) filters.
• ResNet: Residual Network, uses skip connections to mitigate the vanishing gradient
problem.
• Both architectures are pre-trained on the ImageNet dataset (14 million images, 1,000
classes).



Examples of Popular CNNs
LeNet-5

The LeNet-5 CNN architecture, introduced in 1998 by LeCun et al. in their paper “Gradient-Based Learning Applied
to Document Recognition,” was mainly used for recognizing handwritten and machine-generated characters (optical
character recognition [OCR]) from documents.
It is a CNN consisting of seven layers.
• There are three convolutional layers (C1, C3, C5) and two subsampling layers (S2 and S4).
• There is one fully connected layer (F6) and one output layer.
• The convolutional layers use 5×5 convolution kernels with stride 1.
• The subsampling layers are 2×2 average pooling layers.
• The entire network uses the tanh activation function except for the
output layer, which uses softmax.
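The seven-layer structure above could be sketched in Keras roughly as follows. This is an approximation, not the lecture's code; the 32×32×1 input and the filter counts (6, 16, 120, 84) follow the original LeCun et al. paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Approximate LeNet-5: 5x5 convolutions, 2x2 average pooling, tanh activations,
# and a softmax output layer, as described above.
lenet5 = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, kernel_size=5, strides=1, activation="tanh"),    # C1
    layers.AveragePooling2D(pool_size=2),                             # S2
    layers.Conv2D(16, kernel_size=5, strides=1, activation="tanh"),   # C3
    layers.AveragePooling2D(pool_size=2),                             # S4
    layers.Conv2D(120, kernel_size=5, strides=1, activation="tanh"),  # C5
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),                              # F6
    layers.Dense(10, activation="softmax"),                           # output layer
])
```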



Examples of Popular CNNs
AlexNet

The input size is 224×224×3 colour images.
The features of AlexNet are as follows:
• It is a deep CNN containing eight layers.
• The network has 60 million parameters and 650,000 neurons, and it takes about 3 days to train on a GPU.
• Convolution layer 1: Kernel 11×11, filters 96, strides 4×4, activation ReLU
• Pooling layer 1: MaxPooling with kernel size 3×3, strides 2×2
• Convolution layer 2: Kernel 5×5, filters 256, strides 1×1, activation ReLU
• Pooling layer 2: MaxPooling with kernel size 3×3, strides 2×2
• Convolution layer 3: Kernel 3×3, filters 384, strides 1×1, activation ReLU
• Convolution layer 4: Kernel 3×3, filters 384, strides 1×1, activation ReLU
• Convolution layer 5: Kernel 3×3, filters 256, strides 1×1, activation ReLU
• Pooling layer 5: MaxPooling with kernel size 3×3, strides 2×2
• The last three layers are a fully connected MLP.
• All convolution layers use ReLU activation functions.
• The output layer uses softmax activation.
• There are 1,000 classes in the output layer.
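The layer listing above translates into roughly the following Keras sketch. It is illustrative only: padding choices and the local response normalization used in the original AlexNet paper are omitted, and the 4,096-unit dense layers follow the original paper rather than the slide.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Rough AlexNet-style stack following the layer listing above.
alexnet = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    layers.Conv2D(96, kernel_size=11, strides=4, activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(256, kernel_size=5, strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(384, kernel_size=3, strides=1, padding="same", activation="relu"),
    layers.Conv2D(384, kernel_size=3, strides=1, padding="same", activation="relu"),
    layers.Conv2D(256, kernel_size=3, strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),     # fully connected MLP part
    layers.Dense(4096, activation="relu"),
    layers.Dense(1000, activation="softmax"),  # 1,000 ImageNet classes
])
```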
Examples of Popular CNNs
VGG-16

Input size: 224×224×3
It is a CNN that consists of 16 layers: 13 convolutional layers and 3 fully connected dense layers.
This network has 138 million parameters.
• Convolution layer 1: Kernel 3×3, filters 64, activation ReLU
• Convolution layer 2: Kernel 3×3, filters 64, activation ReLU
• Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
• Convolution layer 3: Kernel 3×3, filters 128, activation ReLU
• Convolution layer 4: Kernel 3×3, filters 128, activation ReLU
• Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
• Convolution layer 5: Kernel 3×3, filters 256, activation ReLU
• Convolution layer 6: Kernel 3×3, filters 256, activation ReLU
• Convolution layer 7: Kernel 3×3, filters 256, activation ReLU
• Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
• Convolution layer 8: Kernel 3×3, filters 512, activation ReLU
• Convolution layer 9: Kernel 3×3, filters 512, activation ReLU
• Convolution layer 10: Kernel 3×3, filters 512, activation ReLU
• Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
• Convolution layer 11: Kernel 3×3, filters 512, activation ReLU
• Convolution layer 12: Kernel 3×3, filters 512, activation ReLU
• Convolution layer 13: Kernel 3×3, filters 512, activation ReLU
• Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Classification (MLP) part:
• Fully connected layer 14 (MLP input layer): Flatten; dense layer with input size 25,088
• Fully connected hidden layer 15: Dense layer with input size 4,096
• Fully connected output layer: Dense layer for 1,000 classes
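Because VGG-16 is so large (138 million parameters), it is rarely rebuilt by hand; it is usually loaded pre-trained. A sketch, assuming TensorFlow/Keras (which ships ImageNet weights for VGG16) rather than any specific framework used in the lecture:

```python
import tensorflow as tf

# Load VGG-16 pre-trained on ImageNet. include_top=True keeps the dense head
# (25,088 -> 4,096 -> 4,096 -> 1,000) described above.
vgg16 = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
vgg16.summary()   # 13 convolutional layers + 3 dense layers, ~138M parameters

# For transfer learning, the dense "head" is usually dropped:
vgg16_base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
```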



Examples of Popular CNNs

GoogLeNet (Inception)

The Inception architecture introduces several key ideas that allow it to achieve high performance,
both in terms of accuracy and computational cost.
Inception module: multiple convolutions in parallel branches. Instead of trying to choose between
different filter sizes (1×1, 3×3, 5×5, etc.), just try them all!
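The "try them all" idea can be sketched as a single Inception-style module with the Keras functional API. This illustrates only the parallel-branch idea, not the full GoogLeNet; the filter counts and input shape are arbitrary assumptions, and the 1×1 bottleneck convolutions before the larger filters are simplified away except on the pooling branch.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1, f3, f5, fpool):
    """Parallel 1x1, 3x3, 5x5 convolutions plus pooling, concatenated on channels."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])   # stack the branch outputs

inputs = tf.keras.Input(shape=(28, 28, 192))         # example feature-map input
outputs = inception_module(inputs, f1=64, f3=128, f5=32, fpool=32)
block = tf.keras.Model(inputs, outputs)              # output has 64+128+32+32 channels
```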



Examples of Popular CNNs

ResNet

The key innovation of ResNet is the introduction of "residual blocks" that allow for training
substantially deeper networks than was previously possible.
A residual block is a block with branches: one branch is the identity function, so the other
learns the residual.
Variations: ResNet50, ResNet101, ResNet152, ResNet_v2, ResNeXt.
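A minimal residual block sketch in Keras functional style. It is simplified (no batch normalization or projection shortcut), and the input shape and filter count are assumptions made only to show the skip connection.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Identity branch plus a small convolutional branch that learns the residual."""
    shortcut = x                                               # identity branch
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)           # residual branch
    y = layers.Add()([shortcut, y])                            # skip connection
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(56, 56, 64))   # filters must match the input channels here
outputs = residual_block(inputs, filters=64)
block = tf.keras.Model(inputs, outputs)
```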



Transfer Learning Intuition
Transfer Learning in a Picture
• Freeze the "body"; train only the head.
• Chop off the old "head" and add a new head with outputs for the new task's number of classes.
Transfer Learning Process

1. Normalize input images using the same mean and standard deviation used in the pre-trained model.
   (If the pre-trained model was trained on pixel intensities in one range, e.g. 0–1, and your images are
   in another, e.g. 0–255, you must rescale them to match.)
2. Load the pre-trained model's architecture and weights.
3. Discard the last layers, replacing them with freshly initialized layers (the number of output classes
   usually changes in the new task).
4. Freeze the weights of the pre-trained layers and train the new layers.
5. Create the model.
6. Fine-tune the model over increasing epochs (i.e., train the model).
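Put together, the six steps might look like the sketch below in Keras. This is an illustration under assumptions, not the lecture's reference code: VGG16 is used as the pre-trained model, the target task is a hypothetical two-class problem (e.g. cats vs. dogs) with data in `x_train`/`y_train`, and the head size is arbitrary.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Step 1: normalize inputs the same way the pre-trained model expects
# (VGG16's own preprocess_input applies the ImageNet channel-mean normalization).
preprocess = tf.keras.applications.vgg16.preprocess_input

# Step 2: load the pre-trained architecture and weights, without the old head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))

# Step 4: freeze the pre-trained layers so their filter weights do not change.
base.trainable = False

# Steps 3 and 5: create the model with a freshly initialized head for the new classes.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = preprocess(inputs)
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(2, activation="softmax")(x)   # e.g. cats vs. dogs
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 6: train the new head, then optionally unfreeze and fine-tune with a
# small learning rate over more epochs (x_train / y_train are hypothetical).
# model.fit(x_train, y_train, validation_split=0.2, epochs=5)
# base.trainable = True
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
#               loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.2, epochs=5)
```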



Choose Network Architecture

Dataset Similarity: When using transfer learning, ensure the source dataset (e.g., ImageNet) has
some similarity to your target dataset. For example, models trained on ImageNet might not be
effective for medical images unless fine-tuned.


Transfer Learning
Feature Extraction vs. Fine-Tuning:

• Fine-Tuning: Fine-tune some or all layers of the pre-trained model, allowing the
network to adapt to the specific dataset. This is especially useful if the dataset is large
and similar to the one the pre-trained model was originally trained on.

• Feature Extraction: Use the pre-trained model as a fixed feature extractor, freezing the
convolutional base and training only the classifier on top.
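In code, the difference is essentially whether the pre-trained base stays frozen. The fragment below continues the earlier transfer-learning sketch and assumes `base` and `model` are already defined as there; the number of unfrozen layers and the learning rate are arbitrary assumptions.

```python
import tensorflow as tf

# Feature extraction: the convolutional base is a fixed feature extractor;
# only the new classifier head is trained.
base.trainable = False

# Fine-tuning: unfreeze some (or all) base layers and retrain with a small
# learning rate so the pre-trained filters adapt gently to the new dataset.
base.trainable = True
for layer in base.layers[:-4]:        # e.g. keep all but the last few layers frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```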



Part 2: CNN Training and Testing
Training and Testing CNN

When training and testing Convolutional Neural Networks (CNNs), several key considerations should be
taken into account to ensure effective training, good generalization to unseen data, and reliable model
evaluation. Here are the main factors to consider:

• Data Preprocessing: Ensure clean, balanced, and augmented data.


• Model Selection: Choose an architecture suitable for the complexity and size of your dataset.
• Overfitting Prevention: Apply regularization techniques, early stopping, and data augmentation.
• Evaluation: Use a variety of metrics beyond accuracy, and ensure proper train/test/validation splits.
• Hardware Optimization: Utilize GPUs/TPUs for faster training.
• Transfer Learning: Leverage pre-trained models if your dataset is small or similar to a common large
dataset.
• Model Interpretation: Verify that the model focuses on meaningful patterns (e.g., with saliency maps or Grad-CAM).



Training and Testing CNN
Data Preprocessing

Normalization: Normalize image pixel values to a standard range (e.g., 0-1 or -1 to 1). CNNs perform
better with normalized inputs, and this ensures faster and more stable convergence during training.
Data Augmentation: Apply data augmentation techniques like rotations, flips, zooms, and shifts during
training. This helps the model generalize better by artificially increasing the diversity of the training set
and reducing overfitting.
Class Imbalance: Ensure balanced datasets. If one class has significantly more data than others, the model
may become biased toward that class. Consider using oversampling (of the under-represented class),
undersampling (of the over-represented class), or class weighting to mitigate this.
Train/Validation/Test Split: Ensure that you split the data properly into training, validation, and test sets.
Typical splits are 70%-80% training, 10%-15% validation, and 10%-15% testing.
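A sketch of these preprocessing steps in Python. The dataset is a hypothetical random array; the split ratios and the use of scikit-learn and NumPy are assumptions, not the lecture's code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical dataset: 1,000 RGB images with integer class labels.
images = np.random.randint(0, 256, size=(1000, 64, 64, 3)).astype("float32")
labels = np.random.randint(0, 2, size=(1000,))

# Normalization: scale pixel values from 0-255 into the 0-1 range.
images = images / 255.0

# Train/validation/test split (roughly 70% / 15% / 15%), keeping class ratios.
x_train, x_tmp, y_train, y_tmp = train_test_split(
    images, labels, test_size=0.30, stratify=labels, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)

# Class imbalance: per-class weights to pass to model.fit(class_weight=...).
weights = compute_class_weight("balanced", classes=np.unique(y_train), y=y_train)
class_weight = dict(enumerate(weights))
```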



Training and Testing CNN
Network Architecture

Model Complexity: Choose the appropriate architecture size based on your data. Large
datasets can handle complex models (e.g., ResNet, Inception), but smaller datasets may
require simpler models to avoid overfitting.
Transfer Learning: For smaller datasets, consider using pre-trained models (e.g., VGG16,
ResNet, MobileNet). Transfer learning helps leverage knowledge from models trained on
large datasets (like ImageNet).
Parameter Initialization: When training from scratch, weight initialization is important.
Good initialization helps the model converge faster. Modern CNN libraries handle this well,
but it's worth checking.



Training and Testing CNN
Overfitting and Underfitting
Overfitting: This happens when the model performs well on the training data but poorly on validation/test data.
To mitigate overfitting:
• Regularization: Use techniques like dropout, weight decay (L2 regularization), or data
augmentation.
• Early Stopping: Monitor validation loss and stop training when the model starts overfitting.
Underfitting: If your model is underfitting (i.e., not performing well on both training and test data), try:
• Increasing the model complexity (more layers, more neurons).
• Training longer (increasing epochs).
• Providing more relevant features or more training data.
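The main overfitting mitigations can be sketched in Keras as follows. The dropout rate, L2 strength, patience, and model shape are arbitrary assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Regularization inside the model: L2 weight decay and dropout on the dense part.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),   # weight decay
    layers.Dropout(0.5),                                      # dropout
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Early stopping: stop when validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])    # hypothetical data
```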



Training and Testing CNN
Hyperparameter Tuning

Learning Rate: Choosing an appropriate learning rate is crucial. If it's too large, the model might
converge too quickly to a suboptimal solution or may not converge at all. If it's too small, the
training process might be slow, and the model might get stuck in local minima.
• Consider using learning rate schedulers or optimizers like Adam, which adapt the
learning rate during training.
Batch Size: Larger batch sizes make training faster, but they can lead to overfitting. Smaller
batch sizes introduce noise but help generalization. Common batch sizes are 32, 64, or 128, but
experimentation is key.
Number of Epochs: Train for enough epochs to allow the model to learn but monitor validation
accuracy/loss to prevent overfitting. Use early stopping based on validation performance.
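In Keras, the learning rate, an adaptive optimizer, a scheduler, and the batch size are set roughly as below. The values are illustrative, and `model`, `x_train`, etc. are assumed to exist.

```python
import tensorflow as tf

# Adam with an explicit initial learning rate.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Learning-rate scheduling: reduce the rate when validation loss plateaus,
# and stop early once it stops improving altogether.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                 factor=0.1, patience=3)
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=8,
                                              restore_best_weights=True)

# model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=32,      # try 32 / 64 / 128 and compare
#           epochs=100,         # enough epochs, guarded by early stopping
#           callbacks=[reduce_lr, early_stop])
```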




Training and Testing CNN


Model Evaluation

Cross-Validation: Consider using cross-validation (e.g., k-fold) to ensure that the model
generalizes well to different subsets of the data. This is especially useful for small datasets.
Accuracy, Precision, Recall, F1-Score: Accuracy alone may not be sufficient, especially with
imbalanced datasets. Use metrics like precision, recall, and F1-score to get a better understanding
of model performance.
Confusion Matrix: Analyze the confusion matrix to understand which classes the model is
performing well or poorly on. It helps detect specific misclassifications.
ROC Curve / AUC: For binary classification tasks, the ROC curve and AUC provide insight into
how well the model separates classes at various thresholds.
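These metrics are readily available in scikit-learn. The sketch below assumes a trained Keras classifier `model` and a held-out binary test set `x_test`/`y_test`; for small datasets, the same evaluation could be wrapped in k-fold cross-validation.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Predicted class probabilities and hard labels from the trained model.
probs = model.predict(x_test)            # shape (n_samples, n_classes)
y_pred = np.argmax(probs, axis=1)

# Precision, recall, and F1-score per class (more informative than accuracy alone).
print(classification_report(y_test, y_pred))

# Confusion matrix: rows are true classes, columns are predicted classes.
print(confusion_matrix(y_test, y_pred))

# ROC AUC for a binary task, using the probability of the positive class.
print("AUC:", roc_auc_score(y_test, probs[:, 1]))
```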



Training and Testing CNN
Training Duration and Hardware

GPU/TPU Usage: CNNs are computationally expensive. Utilize hardware accelerators like
GPUs (or TPUs in Google Colab) to speed up training. Training large CNN models on a CPU can
be very slow.
Checkpointing: Regularly save model checkpoints to avoid losing progress in case of hardware
failures. This also allows you to resume training from a specific point.
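Checkpointing in Keras can be sketched as below; the file name, format, and monitored metric are assumptions for illustration.

```python
import tensorflow as tf

# Save the best weights seen so far; training can later resume from this file.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="best_model.keras",
    monitor="val_accuracy",
    save_best_only=True,
)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[checkpoint])
# ...later, resume training or evaluate from the saved checkpoint:
# model = tf.keras.models.load_model("best_model.keras")
```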



Training and Testing CNN
Data Size and Augmentation

Sufficient Data: Ensure you have a sufficient amount of data for training. If not, consider data
augmentation, transfer learning, or synthetic data generation.
Data Augmentation: Apply augmentation (e.g., rotation, zoom, flipping, cropping) during
training to artificially increase the training set size, improve model generalization, and avoid
overfitting.
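Augmentation can be applied on the fly with Keras preprocessing layers (available as `tf.keras.layers.Random*` in recent TensorFlow versions). The specific transforms and ranges below are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# On-the-fly augmentation: each epoch sees randomly transformed copies of the
# training images, which effectively enlarges and diversifies the training set.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),          # up to about ±10% of a full turn
    layers.RandomZoom(0.2),
    layers.RandomTranslation(0.1, 0.1),  # small vertical/horizontal shifts
])

# Typically placed at the start of the model so it is active only during training:
# inputs = tf.keras.Input(shape=(224, 224, 3))
# x = data_augmentation(inputs)
# x = rest_of_the_model(x)
```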



Training and Testing CNN
Model Interpretation

Saliency Maps/Grad-CAM: These techniques help visualize which parts of the image the model
focuses on when making predictions. This can help ensure that the model is learning meaningful
patterns and not focusing on irrelevant details.
Misclassification Analysis: Review misclassified examples to understand patterns in the model’s
errors. This might highlight issues with the data, model architecture, or training process.
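A compact Grad-CAM sketch, closely following the commonly published Keras recipe. The trained `model`, the single-image batch `img`, and the convolutional layer name `last_conv` are hypothetical placeholders.

```python
import tensorflow as tf

def grad_cam(model, img, last_conv):
    """Return a heatmap of where `model` looks when classifying `img` (batch of 1)."""
    # Model that outputs both the last conv feature maps and the predictions.
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(last_conv).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img)
        top_class = tf.argmax(preds[0])
        score = preds[:, top_class]                        # predicted-class score
    grads = tape.gradient(score, conv_out)                 # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))        # channel importance
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)    # weighted feature maps
    cam = tf.nn.relu(cam)                                  # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()     # normalize to [0, 1]

# Example (hypothetical): heatmap = grad_cam(model, img, last_conv="block5_conv3")
```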

