NB4-10: Biomedical Imaging and PyTorch V
NOTE: We are going to take a pre-trained network (ResNet50) and modify its last layer in
order to classify our dataset. We'll see how to do this using a powerful technique
called transfer learning.
1. Introduction
What is Transfer Learning?
We've built a few models by hand so far, but their performance has been poor. You
might be wondering: does a well-performing model already exist for our problem?
And in the world of deep learning, the answer is often yes.
Transfer learning is a machine learning technique where a model developed for a
particular task is reused as the starting point for a model on a different but related
task. This approach leverages the knowledge gained from solving one problem to help
solve another problem, often resulting in improved performance and reduced training
time.
Key Concepts in Transfer Learning:
1. Pre-trained Models: In transfer learning, a model is often pre-trained on a
large dataset (like ImageNet for image recognition) and then fine-tuned on a
smaller, task-specific dataset. The pre-trained model has already learned
general features that can be useful for many related tasks.
2. Fine-Tuning: This involves taking a pre-trained model and training it further on
a new, smaller dataset. The lower layers of the network, which capture more
generic features, are usually left unchanged, while the upper layers, which are
more task-specific, are modified to better suit the new task.
3. Feature Extraction: In some cases, rather than retraining the entire model,
only the features learned by the pre-trained model are used. These features are
then fed into a new model (such as a classifier) that is trained on the new task.
4. Domain Adaptation: Transfer learning often involves adapting a model trained
in one domain (e.g., medical images) to perform well in another domain (e.g.,
satellite images). The model needs to adjust to different data distributions or
feature spaces.
Applications of Transfer Learning:
Computer Vision: Transfer learning is widely used in image classification,
object detection, and segmentation tasks, where models pre-trained on large
datasets like ImageNet are fine-tuned for specific applications.
Natural Language Processing (NLP): Models like BERT (Bidirectional Encoder
Representations from Transformers) and GPT (Generative Pre-trained
Transformer) are pre-trained on vast text corpora and then fine-tuned for tasks
such as sentiment analysis, question answering, or language translation.
Speech Recognition: Pre-trained models can be adapted to
recognize different accents or dialects by fine-tuning on smaller datasets.
Advantages of Transfer Learning:
Reduced Training Time: Since the model starts with pre-learned features, it
requires less time to converge on the new task.
Better Performance: Transfer learning can improve the performance of
models, especially when the new dataset is small.
Resource Efficiency: It allows models to be trained with fewer computational
resources since not all layers of the network need to be retrained.
In summary, transfer learning enables models to learn new tasks more efficiently by
leveraging the knowledge gained from previously learned tasks, making it a powerful
tool in machine learning, particularly when data or computational resources are
limited.
Where to find pretrained models
Pre-trained models can be found on various platforms, repositories, and libraries that
cater to different machine learning tasks, such as computer vision, natural language
processing (NLP), and more. Below are some popular sources for pre-trained models:
1. Model Zoos and Libraries
TensorFlow Hub:
o URL: tfhub.dev
o Description: TensorFlow Hub provides a vast repository of pre-trained
models that can be easily integrated into TensorFlow projects. Models
cover areas like image classification, text embeddings, and more.
PyTorch Hub:
o URL: pytorch.org/hub
o Description: PyTorch Hub offers a variety of pre-trained models in areas
such as vision, language, and speech. Models can be directly loaded and
fine-tuned within the PyTorch framework.
Hugging Face Model Hub:
o URL: huggingface.co/models
o Description: Hugging Face is the go-to platform for NLP models, offering
pre-trained models for tasks like text classification, translation, question
answering, and more. It also hosts models for vision and other modalities.
Keras Applications:
o URL: keras.io/api/applications
o Description: Keras provides a set of popular pre-trained models for
image classification tasks, including models like VGG16, ResNet, and
InceptionV3, which can be easily imported and fine-tuned.
2. GitHub Repositories
Model-Specific Repositories: Many researchers and organizations release
their pre-trained models on GitHub. Examples include:
o OpenAI: For models like GPT and CLIP.
o Facebook AI Research (FAIR): For models like RoBERTa, Mask R-CNN,
etc.
3. Online Platforms and Marketplaces
Google Cloud Vertex AI:
o URL: cloud.google.com/vertex-ai
o Description: Google Cloud Vertex AI provides access to pre-trained
models and pipelines for various tasks, including vision and NLP, which
can be deployed on Google Cloud infrastructure.
AWS Marketplace:
o URL: aws.amazon.com/marketplace
o Description: AWS Marketplace offers machine learning models that can
be integrated into Amazon's cloud services, including pre-trained models
for image recognition, NLP, and more.
Microsoft Azure AI Gallery:
o URL: gallery.azure.ai
o Description: Azure AI Gallery provides pre-trained models and solutions
that can be deployed on Microsoft Azure for various tasks, including
predictive analytics, NLP, and computer vision.
4. Research Paper Repositories
Papers with Code:
o URL: paperswithcode.com
o Description: This platform links machine learning papers with their code
implementations and often provides pre-trained models. It's a great
resource for finding state-of-the-art models.
5. Specialized Repositories
Model Zoo for Object Detection (Detectron2):
o URL: github.com/facebookresearch/detectron2 (Model Zoo)
o Description: Facebook AI Research's Detectron2 provides a model zoo
with pre-trained models for object detection and segmentation tasks.
Open Model Zoo (Intel OpenVINO):
o URL: github.com/openvinotoolkit/open_model_zoo
o Description: The Open Model Zoo offers a collection of pre-trained
models optimized for Intel hardware, suitable for tasks like computer
vision and NLP.
These sources should cover most use cases for obtaining pre-trained models across
different domains.
Transfer Learning: Feature Extraction vs Fine-Tuning
Feature Extraction and Fine-Tuning are related concepts in transfer learning, but they
differ in their approach and purpose. Here's an explanation of the difference between
the two:
Feature Extraction
Feature Extraction is a transfer learning strategy in which a pre-trained model is
reused as a fixed feature extractor: the representations it learned on the original task
are kept as-is, and only a new output head is trained for the new task. This is
particularly useful when the new task has a limited amount of data.
How it works:
o You start with a pre-trained model, such as ResNet-50 trained on
ImageNet.
o The initial layers (backbone) of the model are typically retained because
they capture general features of the domain (e.g., edges, textures, etc., in
images).
o Only the final layers (head) are modified, usually by replacing the output
layer to match the specific classes of the new task.
Advantages:
o Reduces training time.
o Requires less data for training.
o Improves performance, especially in tasks with limited data.
Fine-Tuning
Fine-Tuning is a transfer learning strategy that goes a step further than feature extraction.
In fine-tuning, you not only reuse the pre-trained model but also allow some or all of
the layers of the model to be retrained on the new dataset. This allows the model to
adjust its parameters more precisely for the new task.
How it works:
o You take a pre-trained model and replace the output layer to match the
new classification task.
o Instead of keeping all layers "frozen" (not trainable), some or all layers of
the model are retrained with a smaller, task-specific dataset.
o Typically, layers closer to the output are trained more aggressively, while
earlier layers might be trained with a lower learning rate or kept frozen.
Advantages:
o Allows the model to adapt more closely to the specific features of the new
dataset.
o Can further improve performance if the new dataset is significantly
different from the original dataset.
Key Comparison:
Feature Extraction: The pre-trained model is reused as a frozen backbone; only
the replaced output layer (or final layers) is retrained.
Fine-Tuning: Part or all of the pre-trained model is retrained on the new
dataset, allowing the model to adapt more closely to the new task.
In summary,
transfer learning refers to the overall process of using a pre-trained model for
a new task, where
o feature extraction is the approach in which only the replaced output layer is
retrained, while
o fine-tuning involves more extensive retraining of the model to optimize its
performance on that specific task.
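A minimal sketch of the two strategies in PyTorch (this assumes torchvision's ResNet-50 and seven output classes, matching the dataset used later in this notebook):

import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)

# Feature extraction: freeze the whole pre-trained backbone...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the head; only this new layer will receive gradient updates
model.fc = nn.Linear(model.fc.in_features, 7)

# Fine-tuning instead: leave some (or all) backbone parameters trainable,
# typically trained with a lower learning rate than the new head, e.g.:
# for param in model.layer4.parameters():
#     param.requires_grad = True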
NOTE: In this notebook, transfer learning will be performed. A pre-trained
model (ResNet50) will be used, and its last layers will be adjusted to learn
how to classify images from the new dataset.
Fine-tuning is a specific technique within transfer learning where not only
the last layer is retrained, but also some of the previous layers of the
pre-trained model. In this case, only the last fully connected layer (model_ft.fc)
will be modified, so it will be considered feature extraction and not
fine-tuning.
2. Setting up Our Workspace
First, we check that a GPU is connected. nvidia-smi (NVIDIA System Management
Interface) is a command-line utility provided by NVIDIA for monitoring and managing
NVIDIA GPUs (Graphics Processing Units). It provides detailed information about the
status and performance of the GPUs, including GPU utilization, temperature, memory
usage, the processes using the GPU, and more.
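In Colab this check, plus selecting the compute device in PyTorch, can look like the following sketch (the variable name device is reused later when moving the model to the GPU):

!nvidia-smi

import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)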
Setting our workspace: /content and /content/datasets
Setting our Home
We save the project's root directory '/content' as 'HOME', since we will be navigating
between directories and may keep multiple projects under the same HOME.
Additionally, we will keep the datasets in the 'datasets' directory, so all datasets are
easily accessible for any project.
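A minimal sketch of this setup, assuming the Colab default root /content:

import os

HOME = '/content'   # project root; every project will live under HOME
os.chdir(HOME)      # navigate to HOME so relative paths resolve from here
print('HOME:', HOME)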
Mount Google Drive
Next, we import the drive module from the google.colab library, which provides the
functionality for mounting Google Drive in Google Colab.
Google Drive is then mounted in Google Colab at the path /content/drive. The user
will be prompted to authorize access to Google Drive; once authorized, the contents
of Google Drive are accessible from the Colab notebook.
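The mounting itself is two lines (this is the standard google.colab API):

from google.colab import drive

# Colab will prompt for authorization the first time this runs
drive.mount('/content/drive')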
3. Load a dataset (dataloader)
Create a directory where we can save our dataset
Create the dataset directory (if it doesn't exist), where we are going to save the
dataset with which we are going to train our CNN.
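A sketch of this step (the name dataset_dir is an assumption here and is reused in the snippets below):

import os

# Create HOME/datasets if it doesn't exist yet; exist_ok avoids an error
# when the directory is already there
dataset_dir = os.path.join(HOME, 'datasets')
os.makedirs(dataset_dir, exist_ok=True)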
Inspect the Dataset: Skin lesion recognition in 7 classes:
The dataset contains several thousand skin-lesion images organized into seven
subdirectories, one per class. The class indices and names are:
0: 'akiec' - actinic keratosis
1: 'bcc' - basal cell carcinoma
2: 'bkl' - benign keratosis
3: 'df' - dermatofibroma
4: 'mel' - melanoma
5: 'nv' - melanocytic nevus
6: 'vasc' - vascular lesion
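A short sketch that produces such a listing (hypothetical; it assumes dataset_dir from the previous step contains one subdirectory per class):

import os

# Print each class subdirectory together with its image count
for cls in sorted(os.listdir(dataset_dir)):
    cls_path = os.path.join(dataset_dir, cls)
    if os.path.isdir(cls_path):
        print(f'{cls}: {len(os.listdir(cls_path))} images')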
Setting a Dataloader
The purpose of a DataLoader is fundamental in the context of machine learning and
deep learning, especially when working with large or complex datasets. Its main
purpose is to facilitate the efficient loading and manipulation of data during model
training.
Dataloaders
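A sketch of a typical setup (the train/val folder split, the batch size, and the variable name dataloaders are assumptions; the normalization statistics are the standard ImageNet ones expected by a pre-trained ResNet-50):

import os
import torch
from torchvision import datasets, transforms

# Preprocessing expected by an ImageNet-pretrained ResNet-50
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# One ImageFolder per split; assumes the images were organized into
# train/ and val/ subdirectories under dataset_dir
image_datasets = {
    split: datasets.ImageFolder(os.path.join(dataset_dir, split), data_transforms)
    for split in ['train', 'val']
}

dataloaders = {
    split: torch.utils.data.DataLoader(
        image_datasets[split], batch_size=32, shuffle=(split == 'train'))
    for split in ['train', 'val']
}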
ResNet-50
ResNet-50 is a deep neural network designed for computer vision tasks, such as
image classification. "ResNet" stands for Residual Network, and "50" indicates that
the network is 50 layers deep.
What is ResNet-50?
ResNet-50 is part of the ResNet family of models, which were introduced by
researchers from Microsoft Research (Kaiming He et al.) in their 2015 paper "Deep
Residual Learning for Image Recognition". This architecture gained fame for its
outstanding performance in image classification challenges, such as the ImageNet
Large Scale Visual Recognition Challenge (ILSVRC).
What is the Problem that ResNet-50 Solves?
Degradation in Deep Networks
One of the challenges in training very deep neural networks is the degradation
problem. As networks get deeper, their performance on the training task tends to
degrade, not because of overfitting, but due to difficulties in optimizing a deep
network, such as vanishing gradients.
Residual Architecture
The key innovation in ResNet-50 is its residual architecture, which introduces the
idea of residual or "skip connections." These connections allow the input of one layer
to bypass several layers and be added to the output of a later layer. This makes it
easier for the layers to learn an identity function if necessary, enabling the network to
learn more complex functions.
Mathematically, instead of a layer learning a function H(x) directly, it is reformulated
so that the layer learns a residual function F(x), where H(x)=F(x)+x. Here, x is the
original input, and F(x) is the transformation learned by the intermediate layers.
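To make the idea concrete, here is a simplified residual block in PyTorch (not the exact bottleneck block ResNet-50 uses, which adds 1x1 convolutions, but it illustrates the skip connection):

import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative block computing H(x) = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                            # the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))         # F(x), the learned residual
        return self.relu(out + identity)        # H(x) = F(x) + x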
Structure of ResNet-50
ResNet-50 is a specific variant with 50 layers, consisting of:
1 initial convolutional layer (7x7 convolutions with a stride of 2).
4 main stages of residual blocks: Each stage contains several residual blocks,
each made of three convolutional layers, and it is here that the residual
connections are applied. The stages are organized as follows:
o Stage 1: 3 residual blocks
o Stage 2: 4 residual blocks
o Stage 3: 6 residual blocks
o Stage 4: 3 residual blocks
1 global average pooling layer at the end.
1 fully connected layer for the final classification.
The network uses 1x1 and 3x3 convolutions within its blocks, and the inclusion of
1x1 convolutions helps reduce dimensionality, which is crucial for making the network
computationally efficient.
Applications
ResNet-50 has become a standard architecture used in many computer vision
applications, including:
Image Classification: To categorize images into a large number of classes.
Object Detection: Used as a backbone in more complex algorithms to locate
and classify objects within images.
Image Segmentation: As part of pipelines that segment and label each pixel
in an image.
Advantages of ResNet-50
Depth: The network is deep enough to capture complex features in images.
Avoids Degradation: Thanks to the residual connections, ResNet-50 can be
effectively trained even with a large number of layers.
Versatility: It has proven to be very versatile and can be used as the basis for
many other computer vision tasks.
In summary, ResNet-50 is a powerful and widely used model in the field of computer
vision, capable of handling complex classification and detection tasks thanks to its
innovative residual architecture.
Initializing a ResNet50 model for transfer learning
This code snippet sets up a transfer learning pipeline using a ResNet50 model in
PyTorch, with the option to perform either feature extraction or fine-tuning, depending
on the feature_extract parameter. A sketch of the snippet is shown below.
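The following sketch is consistent with the description that follows (it uses torchvision's resnet50; note that recent torchvision versions replace the pretrained= argument with weights=):

import torch.nn as nn
import torch.optim as optim
from torchvision import models

def set_parameter_requires_grad(model, feature_extracting):
    # Freeze all parameters when feature extracting, so backpropagation
    # will only update the layers we replace afterwards
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

def initialize_model(num_classes, feature_extract, use_pretrained=True):
    # Load ResNet50, optionally with ImageNet weights
    model_ft = models.resnet50(pretrained=use_pretrained)
    set_parameter_requires_grad(model_ft, feature_extract)
    # Replace the 1000-class ImageNet head with one sized for our task
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, num_classes)
    return model_ft

num_classes = 7            # the seven skin-lesion classes
feature_extract = True     # True = feature extraction, False = fine-tuning
model_ft = initialize_model(num_classes, feature_extract, use_pretrained=True)
model_ft = model_ft.to(device)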
Key Points in the Code Snippet:
1. Import Statements:
o torch.nn as nn: Importing the nn module from PyTorch, which provides
various neural network layers and functionalities.
o torch.optim as optim: Importing the optim module, which provides various
optimization algorithms.
2. set_parameter_requires_grad Function:
o Purpose: This function controls whether the gradients for the parameters
of the model are computed during training.
o Functionality:
If feature_extracting is True, it freezes the model's parameters by
setting param.requires_grad = False. This means that during
backpropagation, the weights of these layers will not be updated.
If feature_extracting is False, the model's parameters will remain
trainable, allowing fine-tuning.
3. initialize_model Function:
o Purpose: This function initializes a ResNet50 model with a specified
number of output classes and allows for either feature extraction or fine-
tuning.
o Parameters:
num_classes: The number of output classes for the classification
task.
feature_extract: A boolean that determines whether to perform
feature extraction (True) or fine-tuning (False).
use_pretrained: A boolean that specifies whether to load a pre-
trained version of ResNet50.
o Process:
It loads a ResNet50 model, with pre-trained weights
if use_pretrained is True.
Calls set_parameter_requires_grad to freeze or unfreeze the model's
parameters based on feature_extract.
Modifies the final fully connected (FC) layer of the model to match
the number of classes specified by num_classes. This is necessary
because the original ResNet50 model is typically trained on the
ImageNet dataset, which has 1000 classes.
4. Model Initialization and Transfer to Device:
o model_ft = initialize_model(num_classes, feature_extract,
use_pretrained=True): Initializes the model using the specified
parameters.
o model_ft = model_ft.to(device): Transfers the model to the specified
computing device (e.g., GPU or CPU).
Feature Extraction or Fine-Tuning?
Transfer Learning: The overall process is an example of transfer learning
because it involves using a pre-trained model (ResNet50) as a starting point for
a new task (with a different number of output classes).
Feature Extraction or Fine-Tuning?:
o Feature Extraction: If feature_extract is set to True, most of the model's
parameters are frozen, and only the parameters of the final fully
connected layer are trained. This is a common approach in transfer
learning when the pre-trained features are deemed sufficient, and only the
classifier needs to be adapted to the new task.
o Fine-Tuning: If feature_extract is set to False, all parameters of the model
are updated during training. This is a more intensive form of transfer
learning where the model is fine-tuned to better fit the new task.
Summary: The code is primarily for transfer learning, with the option to perform
either feature extraction (only training the final layer) or fine-tuning (training
some/all layers).
4. Train model
Define loss and optimizer
This code sets up the optimizer for training a deep learning model in PyTorch,
specifically the model_ft model that was initialized earlier. A sketch of this setup
appears below, followed by a detailed explanation of each part of the code.
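A sketch of this cell (the names optimizer_ft and criterion are assumptions; the lr and momentum values match those discussed below):

import torch.nn as nn
import torch.optim as optim

# All parameters by default; only the unfrozen ones when feature extracting
params_to_update = model_ft.parameters()
if feature_extract:
    params_to_update = [param for param in model_ft.parameters()
                        if param.requires_grad]

# Stochastic gradient descent with momentum, plus cross-entropy loss
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()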
1. Selecting Parameters to Update
params_to_update = model_ft.parameters():
o Initially, it is assumed that all the model's parameters (model_ft) will be
updated during training. model_ft.parameters() returns an iterator over all
the model's parameters.
Conditional if feature_extract:
o If feature_extract is True, it indicates that feature extraction is being
performed. In this case, only the parameters of the layers that are not
frozen (i.e., those with requires_grad=True) will be updated.
o The line params_to_update = [param for param in model_ft.parameters()
if param.requires_grad] filters the parameters, creating a list that contains
only those parameters with requires_grad set to True.
o This is useful when you have opted to freeze the pre-trained layers of the
model (using the set_parameter_requires_grad function explained earlier)
and want to train only the final layer (or layers that have been added or
modified).
2. Setting Up the Optimizer
optim.SGD:
o Here, the SGD (Stochastic Gradient Descent) optimizer is being used
to train the model. This optimizer is one of the most common techniques
for updating the weights of a neural network during training.
params_to_update:
o This argument tells the optimizer which parameters should be updated
during training. Depending on whether feature extraction or fine-tuning is
being performed, this list may include all of the model's parameters or
only some of them.
lr=0.001:
o The learning rate determines the size of the steps the optimizer takes in
the direction of the gradient during loss minimization. A small value
like 0.001 allows the model to learn more gradually, reducing the risk of
overshooting minima of the loss function.
momentum=0.9:
o Momentum helps accelerate the optimizer in the direction of the
gradients and mitigates oscillation. A value of 0.9 is quite common and
often improves convergence, especially in deep networks.
3. Setting Up the Loss Function
nn.CrossEntropyLoss:
o This line defines the loss function to be used during
training. CrossEntropyLoss is a standard loss function for multi-class
classification tasks.
o It combines LogSoftmax and Negative Log Likelihood Loss into a single
function, making it suitable for problems where you need to predict a
single class among multiple categories.
Summary
Parameters to Update: These are configured to include either all the model’s
parameters or only those that have not been frozen, depending on whether
feature extraction or fine-tuning is being performed.
Optimizer: It is set up to use SGD with a low learning rate and momentum,
which is suitable for most deep learning problems.
Loss Function: CrossEntropyLoss is used, which is ideal for multi-class
classification problems.
This configuration is crucial for effective model training, as it determines which parts
of the model will be adjusted, how those adjustments will be made, and how the
model's performance will be measured during training.
Define training
This code defines a function called train_model that is used to train a deep learning
model in PyTorch. The function handles both the training and validation phases, tracks
the performance metrics, and uses the tqdm library to display a progress bar during
training. Below is a detailed explanation of its purpose and functionality:
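A condensed sketch of such a function (it assumes the device variable defined earlier; the original cell may differ in details such as logging):

import copy
import torch
from tqdm import tqdm

def train_model(model, dataloaders, criterion, optimizer, num_epochs=10):
    train_losses, val_losses = [], []
    train_accuracies, val_accuracies = [], []
    best_acc = 0.0
    best_model_wts = copy.deepcopy(model.state_dict())

    for epoch in range(num_epochs):
        for phase in ['train', 'val']:
            model.train() if phase == 'train' else model.eval()
            running_loss, running_corrects = 0.0, 0

            for inputs, labels in tqdm(dataloaders[phase],
                                       desc=f'Epoch {epoch+1}/{num_epochs} [{phase}]'):
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()

                # Track gradients only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            n = len(dataloaders[phase].dataset)
            epoch_loss = running_loss / n
            epoch_acc = running_corrects.double().item() / n

            if phase == 'train':
                train_losses.append(epoch_loss)
                train_accuracies.append(epoch_acc)
            else:
                val_losses.append(epoch_loss)
                val_accuracies.append(epoch_acc)
                # Checkpoint the best weights seen on the validation set
                if epoch_acc > best_acc:
                    best_acc = epoch_acc
                    best_model_wts = copy.deepcopy(model.state_dict())

    # Restore the weights with the highest validation accuracy
    model.load_state_dict(best_model_wts)
    return model, train_losses, val_losses, train_accuracies, val_accuracies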
Purpose of the train_model Function
The primary purpose of this function is to train a model over a specified number of
epochs, evaluate it on a validation set, and return the model with the best validation
accuracy, along with the recorded metrics such as losses and accuracies for both
training and validation phases.
Key Components and Functionality
1. Function Arguments:
o model: The PyTorch model to be trained.
o dataloaders: A dictionary containing the data loaders for both training
('train') and validation ('val') datasets.
o criterion: The loss function used to calculate the loss during training.
o optimizer: The optimizer used to update the model's parameters.
o num_epochs: The number of epochs to train the model.
2. Metric Storage:
o The function initializes lists
(train_losses, val_losses, train_accuracies, val_accuracies) to store the loss
and accuracy values for each epoch, separately for the training and
validation phases.
3. Training and Validation Loop:
o The function loops through the specified number of epochs.
o For each epoch, it iterates over two phases: 'train' and 'val'.
Training Phase (phase == 'train'): The model is set to training
mode using model.train(), allowing gradients to be computed and
the model to be updated.
Validation Phase (phase == 'val'): The model is set to evaluation
mode using model.eval(), which switches layers such as dropout and batch
normalization to their inference behavior; gradient computation is disabled
separately for this phase (e.g., via torch.set_grad_enabled(False)).
4. Progress Bar with tqdm:
o The tqdm library is used to display a progress bar for each epoch. It tracks
the progress of the model as it processes batches of data, showing the
current loss and accuracy for each batch.
5. Loss and Accuracy Calculation:
o For each batch, the function calculates the loss and the number of correct
predictions.
o The running_loss and running_corrects variables accumulate these values
for the entire phase (training or validation).
o After processing all batches in a phase, the epoch-level loss and accuracy
are calculated by dividing the accumulated values by the total number of
data points in the dataset.
6. Best Model Checkpointing:
o The function tracks the best validation accuracy across all epochs. If the
current epoch's validation accuracy is the highest seen so far, the model's
weights are saved as the best model (best_model_wts).
7. Returning Results:
o After all epochs are completed, the function loads the best model's
weights (the ones with the highest validation accuracy) into the model.
o It returns the trained model along with the lists of losses and accuracies
for both training and validation, which can be used for further analysis or
plotting.
Summary
This function is designed to streamline the training and evaluation of a PyTorch model
by:
Handling the training and validation phases in each epoch.
Tracking and storing performance metrics.
Displaying real-time progress with tqdm.
Returning the model with the best validation accuracy along with the recorded
metrics.
It's a robust utility for training models and monitoring their performance over time.
5. Validate our model
Validation metrics
This code is designed to evaluate a trained deep learning model's performance on a
test dataset using various metrics and visualizations. It performs the following steps:
evaluates the model, calculates accuracy, generates a classification report, creates a
confusion matrix, and plots ROC curves for a multiclass classification problem.
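A sketch of this evaluation loop (it assumes a 'test' entry in the dataloaders dictionary; the list names are illustrative):

import torch
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

model_ft.eval()
all_preds, all_labels, all_probs = [], [], []

# Inference only: torch.no_grad() disables gradient tracking
with torch.no_grad():
    for inputs, labels in dataloaders['test']:
        outputs = model_ft(inputs.to(device))
        probs = torch.softmax(outputs, dim=1)
        preds = torch.argmax(probs, dim=1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.numpy())
        all_probs.extend(probs.cpu().numpy())

print('Accuracy:', accuracy_score(all_labels, all_preds))
print(classification_report(all_labels, all_preds))
print(confusion_matrix(all_labels, all_preds))
# all_probs can then be fed to sklearn's roc_curve, one class at a time,
# to draw the per-class ROC curves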
Key Components and Functionality
1. Setting the Model to Evaluation Mode:
o The model is set to evaluation mode using model_ft.eval(). This ensures
that layers like dropout and batch normalization behave correctly during
evaluation, i.e., they don’t introduce randomness as they might during
training.
2. Making Predictions and Storing Results:
o Lists are initialized to store the predicted labels, true labels, and the
predicted probabilities for each class; these will be used later for the
various evaluations.
o The code iterates through the test data using torch.no_grad() to disable
gradient computation (not needed during evaluation), making the process
faster and using less memory.
6. Predictions (inference)
Display predictions
This code is designed to visualize a batch of images from a test dataset along with
their true labels and predicted labels after running them through a pre-trained model.
It uses Matplotlib for displaying the images in a grid format, which makes it easier to
inspect the model's predictions visually.
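A sketch of such a visualization (class_names matches the dataset's seven classes; the mean/std values assume the ImageNet normalization used in the transforms; the column count is arbitrary):

import math
import numpy as np
import torch
import matplotlib.pyplot as plt

class_names = ['akiec', 'bcc', 'bkl', 'df', 'mel', 'nv', 'vasc']
mean = np.array([0.485, 0.456, 0.406])   # ImageNet normalization statistics
std = np.array([0.229, 0.224, 0.225])

# One batch of test images and labels
images, labels = next(iter(dataloaders['test']))

model_ft.eval()
with torch.no_grad():
    outputs = model_ft(images.to(device))
    _, preds = torch.max(outputs, 1)
images, labels, preds = images.cpu(), labels.cpu(), preds.cpu()

cols = 4
rows = math.ceil(len(images) / cols)
fig, axes = plt.subplots(rows, cols, figsize=(3 * cols, 3 * rows))
for i, ax in enumerate(axes.flat):
    if i < len(images):
        # Denormalize and convert CHW -> HWC for Matplotlib
        img = images[i].numpy().transpose(1, 2, 0) * std + mean
        ax.imshow(np.clip(img, 0, 1))
        ax.set_title(f'true: {class_names[labels[i].item()]} / '
                     f'pred: {class_names[preds[i].item()]}', fontsize=8)
    ax.axis('off')   # also blanks out any unused subplots
plt.tight_layout()
plt.show()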
Key Components and Functionality
1. Class Names Definition:
o A list of class names corresponding to the different categories that the
model can predict. These names will be used to label the images with
their true and predicted classes.
2. Getting a Batch of Images:
o The code retrieves a batch of images and their corresponding labels from
the test data loader. This batch will be used for visualizing the model’s
predictions.
3. Preparing the Images and Labels:
o If the images and labels were originally on the GPU, they are moved to the
CPU for further processing and visualization.
4. Making Predictions:
o The model is set to evaluation mode (model_ft.eval()) so that layers like
dropout behave deterministically, and predictions are computed with
gradient calculation disabled (torch.no_grad()).
o Predictions are made on the batch of images, and the predicted labels are
obtained using torch.max(), which identifies the class with the highest
predicted probability.
5. Visualizing the Images:
o Subplot Creation:
The code dynamically creates a grid of subplots to display the
images. The number of rows is calculated based on the total number
of images and the desired number of columns.
o Denormalization (if necessary):
If the images were normalized during preprocessing, they are
denormalized before being displayed. This step ensures that the
images appear in their original color space rather than in a
normalized form.
o Image Display:
The images are converted to NumPy arrays and transposed to
match the format expected by Matplotlib (height x width x
channels).
Each image is displayed with a title indicating the true label and the
predicted label.
o Handling Unused Subplots:
If there are more subplots than images, the unused subplots are
hidden to make the visualization cleaner.
6. Displaying the Visualization:
o Finally, the layout of the subplots is adjusted to prevent overlap, and the
grid of images is displayed using plt.show().
Summary
This code is a visualization utility that:
Displays a batch of images from the test dataset.
Shows both the true and predicted labels for each image, allowing you to
visually inspect the model's performance.
Handles denormalization if the images were normalized during
preprocessing, ensuring they are displayed correctly.
Uses a dynamic grid layout to arrange the images neatly in a grid, adjusting
the number of rows based on the number of images.
This is particularly useful for quickly assessing how well the model is performing on
specific examples and identifying patterns or issues in its predictions.