Lecture 6 Deep Learning Training and Testing 2025
CPCS432
Lecture 6.1
Deep Learning
Training and Testing
Computer Vision
Transfer Learning
• Transfer learning is a technique where knowledge gained from one task is transferred to
solve another similar task.
• This technique is particularly useful when we don't have a large dataset to train from scratch.
• Example: Using pre-trained models from ImageNet to fine-tune on specific datasets (e.g.,
cats vs. dogs).
• VGG: Visual Geometry Group network, deep network with fixed filter sizes.
• ResNet: Residual Network, uses skip connections to solve vanishing gradient
problem.
• Both architectures are pre-trained on the ImageNet dataset (14 million images, 1,000
classes).
The LeNet-5 CNN architecture, introduced in 1998 by LeCun et al. in their paper “Gradient-Based Learning Applied
to Document Recognition,” was mainly used for recognizing handwritten and machine-generated characters (optical
character recognition [OCR]) from documents.
• It is a CNN consisting of seven layers.
• There are two subsampling layers (S2 and S4).
• There is one fully connected layer (F6) and one output layer.
• The convolutional layers use 5×5 convolution kernels with stride 1.
• The subsampling layers are 2×2 average pooling layers.
• The entire network uses the TanH activation function except for the
output layer, which uses softmax.
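As a concrete illustration, here is a minimal Keras sketch of the LeNet-5 stack described above (5×5 convolutions with stride 1, 2×2 average pooling, TanH activations, softmax output). The 32×32 grayscale input and the exact filter counts follow the original paper; treat it as a sketch, not a definitive reimplementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-5-style stack: conv (C1) -> avg pool (S2) -> conv (C3) -> avg pool (S4)
# -> conv (C5) -> dense (F6) -> softmax output. TanH everywhere except the output.
lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),                                  # 32x32 grayscale input (assumed)
    layers.Conv2D(6, kernel_size=5, strides=1, activation="tanh"),    # C1
    layers.AveragePooling2D(pool_size=2),                             # S2
    layers.Conv2D(16, kernel_size=5, strides=1, activation="tanh"),   # C3
    layers.AveragePooling2D(pool_size=2),                             # S4
    layers.Conv2D(120, kernel_size=5, strides=1, activation="tanh"),  # C5
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),                              # F6
    layers.Dense(10, activation="softmax"),                           # output layer (10 digit classes)
])
lenet5.summary()
```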
Convolution layer 3: Kernel 3×3, filters 384, strides 1×1, activation ReLU
Convolution layer 4: Kernel 3×3, filters 384, strides 1×1, activation ReLU
Convolution layer 5: Kernel 3×3, filters 384, strides 1×1, activation ReLU
Pooling layer 5: MaxPooling with kernel size 3×3, strides 2×2
The last three layers are a fully connected MLP.
All convolution layers use ReLU activation functions.
The output layer uses softmax activation.
There are 1,000 classes in the output layer.
VGG-16
• It is a CNN that consists of 16 layers: 13 convolutional layers and 3 fully connected dense layers.
• This network has 138 million parameters.
Feature extraction:
Convolution layer 1: Kernel 3×3, filters 64, activation ReLU
Convolution layer 2: Kernel 3×3, filters 64, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Convolution layer 3: Kernel 3×3, filters 128, activation ReLU
Convolution layer 4: Kernel 3×3, filters 128, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Convolution layer 5: Kernel 3×3, filters 256, activation ReLU
Convolution layer 6: Kernel 3×3, filters 256, activation ReLU
Convolution layer 7: Kernel 3×3, filters 256, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Convolution layer 8: Kernel 3×3, filters 512, activation ReLU
Convolution layer 9: Kernel 3×3, filters 512, activation ReLU
Convolution layer 10: Kernel 3×3, filters 512, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Convolution layer 11: Kernel 3×3, filters 512, activation ReLU
Convolution layer 12: Kernel 3×3, filters 512, activation ReLU
Convolution layer 13: Kernel 3×3, filters 512, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Classification:
Fully connected layer 14 (MLP input layer): Flatten dense layer with input size 25088
Fully connected hidden layer 15: Dense layer with input size 4096
Fully connected output layer: Dense layer for 1,000 classes
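Since VGG-16 ships pre-trained on ImageNet in Keras, the short sketch below loads it and prints the layer stack listed above; this assumes a TensorFlow/Keras environment that can download the ImageNet weights.

```python
import tensorflow as tf

# Load VGG-16 pre-trained on ImageNet (13 conv layers + 3 dense layers,
# ~138 million parameters, 1,000 output classes).
vgg16 = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
vgg16.summary()  # prints every convolution, pooling, and dense layer plus the parameter count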
GoogLeNet
The Inception architecture introduces several key ideas that allow it to achieve high performance, both in terms of accuracy and computational cost.
• Inception: multiple convolutions in parallel branches.
• Instead of trying to choose between different filter sizes (1×1, 3×3, 5×5, etc.), just try them all!
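A minimal sketch of the idea, assuming the simplified ("naive") Inception module without the 1×1 dimension-reduction convolutions on the 3×3 and 5×5 branches; the filter counts and input shape are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1, f3, f5, fpool):
    """Naive Inception module: run several filter sizes in parallel and concatenate."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)   # 1x1 branch
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)   # 3x3 branch
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)   # 5x5 branch
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)         # pooling branch
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])                     # stack branch outputs along channels

inputs = tf.keras.Input(shape=(28, 28, 192))                          # illustrative input shape
outputs = inception_module(inputs, f1=64, f3=128, f5=32, fpool=32)
model = tf.keras.Model(inputs, outputs)
```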
ResNet
The key innovation of ResNet is the introduction of "residual blocks" that allow for training substantially deeper networks than what was previously possible.
• A residual block is a CNN block with branches: one branch is the identity function, so the other learns the residual.
• Variations: ResNet50, ResNet101, ResNet152, ResNet_v2, ResNeXt.
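A minimal sketch of a residual block, assuming the simple identity-shortcut variant: one branch passes the input through unchanged while the other learns the residual, and the two are added before the final activation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Identity-shortcut residual block: output = ReLU(F(x) + x)."""
    shortcut = x                                     # identity branch (the skip connection)
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                  # add the residual to the identity branch
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(32, 32, 64))          # illustrative shape
outputs = residual_block(inputs, filters=64)         # filters must match the input channels for the identity add
model = tf.keras.Model(inputs, outputs)
```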
Transfer Learning Intuition
• Transfer learning in a picture:
• Chop off the old "head" and add a new head (with outputs for the new task's different set of classes)!
Transfer Learning Process
1. Normalize input images using the same mean and standard deviation used in the pre-trained model (raw pixel intensities in the 0-255 range must be rescaled to match what the pre-trained model expects, e.g. 0-1).
2. Load the pre-trained model's architecture and weights.
3. Discard the last layers, replacing them with freshly initialized layers (the new output layer has one neuron per class in the new task).
4. Freeze the weights of the pre-trained layers and train the new layers.
5. Create the model.
6. Fine-tune the model over increasing epochs (train the model).
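The sketch below walks through these steps with Keras and a pre-trained VGG-16 base; the image size, the number of new classes, and the dataset objects (train_ds, val_ds) are placeholders for your own data.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 2  # e.g., cats vs. dogs (placeholder)

# 1. Normalize inputs the same way the pre-trained model expects: for VGG16,
#    run images through tf.keras.applications.vgg16.preprocess_input in the
#    input pipeline (it applies the ImageNet channel normalization).

# 2. Load the pre-trained architecture and ImageNet weights,
# 3. discarding the original 1,000-class head (include_top=False).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))

# 4. Freeze the pre-trained convolutional layers.
base.trainable = False

# 5. Create the model: frozen pre-trained base + freshly initialized head.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 6. Train the new head (train_ds / val_ds are your own tf.data datasets).
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```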
• Fine-Tuning: Fine-tune some or all layers of the pre-trained model, allowing the
network to adapt to the specific dataset. This is especially useful if the dataset is large
and similar to the one the pre-trained model was originally trained on.
• Feature Extraction: Use the pre-trained model as a fixed feature extractor, freezing the
convolutional base and training only the classifier on top.
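In Keras terms, the difference is just which layers are marked trainable. A minimal sketch, continuing the VGG-16 example above (base and model come from that sketch; the datasets are placeholders):

```python
import tensorflow as tf

# Feature extraction: freeze the entire convolutional base,
# so only the new classifier head is trained.
base.trainable = False

# Fine-tuning: unfreeze (some of) the base, e.g. the last few layers,
# and retrain with a much smaller learning rate so the pre-trained
# filters are only gently adjusted.
base.trainable = True
for layer in base.layers[:-4]:        # keep all but the last few layers frozen
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```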
When training and testing Convolutional Neural Networks (CNNs), several key considerations should be
taken into account to ensure effective training, good generalization to unseen data, and reliable model
evaluation. Here are the main factors to consider:
Normalization: Normalize image pixel values to a standard range (e.g., 0-1 or -1 to 1). CNNs perform
better with normalized inputs, and this ensures faster and more stable convergence during training.
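A quick sketch of rescaling raw 0-255 pixel values with a Keras preprocessing layer (the input shape is illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Map raw 0-255 pixel values to the 0-1 range before the first conv layer.
normalize = layers.Rescaling(1.0 / 255)
# Or map to the -1 to 1 range instead:
# normalize = layers.Rescaling(1.0 / 127.5, offset=-1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    normalize,
    # ... convolutional layers go here ...
])
```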
Data Augmentation: Apply data augmentation techniques like rotations, flips, zooms, and shifts during
training. This helps the model generalize better by artificially increasing the diversity of the training set
and reducing overfitting.
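A sketch using Keras preprocessing layers for random flips, rotations, zooms, and shifts (the factors are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Augmentation is applied only while training; at inference these layers pass images through unchanged.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),        # random horizontal flips
    layers.RandomRotation(0.1),             # rotations up to +/-10% of a full turn
    layers.RandomZoom(0.1),                 # random zooms
    layers.RandomTranslation(0.1, 0.1),     # random height/width shifts
])
```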
Class Imbalance: Ensure balanced datasets. If one class has significantly more data than others, the model may become biased toward that class. Consider using oversampling (adding examples of the class with less data), undersampling (removing examples from the class with more data), or class weighting to mitigate this.
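One way to apply class weighting with scikit-learn and Keras; y_train here is a small placeholder array of integer labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0, 0, 0, 0, 1])   # placeholder labels; class 1 is under-represented

classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))   # e.g. {0: 0.625, 1: 2.5}

# Pass the weights to Keras so errors on the rare class cost more:
# model.fit(x_train, y_train, class_weight=class_weight, ...)
```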
Train/Validation/Test Split: Ensure that you split the data properly into training, validation, and test sets. Typical splits are 70%-80% training, 10%-15% validation, and 10%-15% testing.
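A sketch of an 80/10/10 stratified split with scikit-learn (the data arrays are random placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples of 32x32 RGB images with binary labels.
x = np.random.rand(100, 32, 32, 3)
y = np.random.randint(0, 2, size=100)

# First carve off 20% as a temporary hold-out, then split that in half
# into validation and test sets -> 80% / 10% / 10%.
x_train, x_tmp, y_train, y_tmp = train_test_split(
    x, y, test_size=0.20, stratify=y, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
```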
Model Complexity: Choose the appropriate architecture size based on your data. Large
datasets can handle complex models (e.g., ResNet, Inception), but smaller datasets may
require simpler models to avoid overfitting.
Transfer Learning: For smaller datasets, consider using pre-trained models (e.g., VGG16,
ResNet, MobileNet). Transfer learning helps leverage knowledge from models trained on
large datasets (like ImageNet).
Parameter Initialization: When training from scratch, weight initialization is important.
Good initialization helps the model converge faster. Modern CNN libraries handle this well,
but it's worth checking.
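For example, Keras layers default to Glorot (Xavier) initialization, and ReLU-heavy CNNs often use He initialization instead; a small sketch:

```python
import tensorflow as tf
from tensorflow.keras import layers

# He-normal initialization tends to suit ReLU activations;
# the Keras default is glorot_uniform (Xavier).
conv = layers.Conv2D(
    64, kernel_size=3, activation="relu",
    kernel_initializer=tf.keras.initializers.HeNormal(),
)
```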
Learning Rate: Choosing an appropriate learning rate is crucial. If it's too large, the model might
converge too quickly to a suboptimal solution or may not converge at all. If it's too small, the
training process might be slow, and the model might get stuck in local minima.
• Consider using learning rate schedulers or optimizers like Adam, which adapt the
learning rate during training.
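A sketch of both options in Keras: the Adam optimizer with an explicit initial learning rate, and a callback that lowers the rate when the validation loss plateaus (the numbers are illustrative):

```python
import tensorflow as tf

# Adam with an explicit initial learning rate.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Reduce the learning rate by 10x whenever validation loss stops improving.
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.1, patience=3, min_lr=1e-6)

# model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=[lr_scheduler])
```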
Batch Size: Larger batch sizes make training faster, but they can lead to overfitting. Smaller
batch sizes introduce noise but help generalization. Common batch sizes are 32, 64, or 128, but
experimentation is key.
Number of Epochs: Train for enough epochs to allow the model to learn but monitor validation
accuracy/loss to prevent overfitting. Use early stopping based on validation performance.
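A sketch combining these settings: a common batch size, a generous epoch budget, and early stopping on the validation loss (the model and datasets are placeholders from the earlier sketches):

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch validation loss
    patience=5,                  # stop after 5 epochs without improvement
    restore_best_weights=True,   # roll back to the best epoch's weights
)

# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val),
#           batch_size=32,       # common starting point; try 64 or 128 as well
#           epochs=100,          # upper bound; early stopping usually ends training sooner
#           callbacks=[early_stop])
```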
Cross-Validation: Consider using cross-validation (e.g., k-fold) to ensure that the model
generalizes well to different subsets of the data. This is especially useful for small datasets.
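A sketch of stratified k-fold cross-validation with scikit-learn; build_model() is a hypothetical placeholder for whatever compiled CNN constructor you are evaluating, and the data is random dummy data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder data.
x = np.random.rand(100, 32, 32, 3)
y = np.random.randint(0, 2, size=100)

scores = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(x, y):
    model = build_model()                        # hypothetical: returns a freshly compiled Keras model
    model.fit(x[train_idx], y[train_idx], epochs=5, verbose=0)
    _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
    scores.append(acc)

print(f"mean accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```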
Accuracy, Precision, Recall, F1-Score: Accuracy alone may not be sufficient, especially with
imbalanced datasets. Use metrics like precision, recall, and F1-score to get a better understanding
of model performance.
Confusion Matrix: Analyze the confusion matrix to understand which classes the model is
performing well or poorly on. It helps detect specific misclassifications.
ROC Curve / AUC: For binary classification tasks, the ROC curve and AUC provide insight into
how well the model separates classes at various thresholds.
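A sketch that computes these evaluation metrics with scikit-learn; the true labels and predicted probabilities below are small placeholders standing in for a trained model's test-set output.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Placeholders: true labels and predicted probabilities for a binary task.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.3, 0.9, 0.2])    # P(class 1)
y_pred = (y_prob >= 0.5).astype(int)                  # threshold at 0.5

print(classification_report(y_true, y_pred))   # precision, recall, F1 per class
print(confusion_matrix(y_true, y_pred))        # rows = true classes, columns = predictions
print("AUC:", roc_auc_score(y_true, y_prob))   # area under the ROC curve
```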
GPU/TPU Usage: CNNs are computationally expensive. Utilize hardware accelerators like
GPUs (or TPUs in Google Colab) to speed up training. Training large CNN models on a CPU can
be very slow.
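A quick way to confirm TensorFlow actually sees an accelerator before starting a long training run:

```python
import tensorflow as tf

print("GPUs:", tf.config.list_physical_devices("GPU"))
print("TPUs:", tf.config.list_physical_devices("TPU"))
# An empty list means training will fall back to the (much slower) CPU.
```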
Checkpointing: Regularly save model checkpoints to avoid losing progress in case of hardware
failures. This also allows you to resume training from a specific point.
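A sketch of Keras checkpointing that keeps only the best weights seen so far (the file path is illustrative):

```python
import tensorflow as tf

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="checkpoints/best_model.keras",   # illustrative path
    monitor="val_loss",
    save_best_only=True,                       # overwrite only when validation loss improves
)

# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[checkpoint])
# Later, resume or evaluate from the saved file:
# model = tf.keras.models.load_model("checkpoints/best_model.keras")
```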
Sufficient Data: Ensure you have a sufficient amount of data for training. If not, consider data
augmentation, transfer learning, or synthetic data generation.
Data Augmentation: Apply augmentation (e.g., rotation, zoom, flipping, cropping) during
training to artificially increase the training set size, improve model generalization, and avoid
overfitting.
Saliency Maps/Grad-CAM: These techniques help visualize which parts of the image the model
focuses on when making predictions. This can help ensure that the model is learning meaningful
patterns and not focusing on irrelevant details.
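A compact Grad-CAM sketch with TensorFlow; the trained model, the (already preprocessed) image, and the name of the last convolutional layer are placeholders you supply.

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Return a heatmap in [0, 1] showing where `model` looks for `class_index`."""
    # Model that outputs both the last conv feature map and the predictions.
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])   # add a batch dimension
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))       # default to the predicted class
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)         # d(score)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(1, 2))         # channel-wise importance weights
    cam = tf.nn.relu(tf.einsum("bhwc,bc->bhw", conv_out, weights))[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()   # normalize to [0, 1]

# Example (placeholders): heatmap = grad_cam(model, image, "block5_conv3")
# "block5_conv3" is the last convolutional layer name in Keras' VGG16.
```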
Misclassification Analysis: Review misclassified examples to understand patterns in the model’s
errors. This might highlight issues with the data, model architecture, or training process.
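A small sketch for pulling out misclassified examples for manual review; the labels and probabilities are placeholders for a model's test-set predictions.

```python
import numpy as np

# Placeholders: true labels and the model's predicted class probabilities.
y_true = np.array([0, 2, 1, 1, 0])
y_prob = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.8, 0.1, 0.1]])

y_pred = y_prob.argmax(axis=1)
wrong = np.where(y_pred != y_true)[0]    # indices of misclassified samples
for i in wrong:
    print(f"sample {i}: true={y_true[i]}, predicted={y_pred[i]}, "
          f"confidence={y_prob[i, y_pred[i]]:.2f}")
```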