
AdvAI Unit4

The document discusses transfer learning, a machine learning technique that reuses knowledge from previously trained models to improve performance on related tasks, particularly in areas like computer vision and natural language processing. It highlights the advantages of using pre-trained models, such as reduced training time and improved performance with limited data, and provides examples of popular datasets and models like ImageNet, VGG-16, ResNet-50, and BERT. The document emphasizes the importance of transfer learning in addressing challenges associated with data scarcity and computational costs in deep learning.


UNIT IV

TRANSFER
LEARNING
SEM 8, AI&DS
Dr. Himani Deshpande(TSEC)
UNIT 4-
TRANSFER LEARNING
• Introduction to transfer learning

– Basic terminology, pre-trained models and datasets,

– Feature extraction and fine-tuning in transfer learning,

– Recent advancements in transfer learning: self-supervised learning and meta-learning.



REAL-LIFE CHALLENGES IN NLP & IMAGE PROCESSING TASKS
• Deep learning methods are data-hungry
• Often more than 50K labelled examples are needed for training
• Labelled data in the target domain may be limited
• Huge computational cost is involved
• These problems are typically addressed with transfer learning



TRANSFER LEARNING

You learn to balance a bicycle, and the same skill helps you learn to ride a scooter.
TRANSFER LEARNING

• Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from one task is reused in order to boost performance on a related task.

• For example, in image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks.



TRANSFER LEARNING

• The reuse of a pre-trained model on a new problem is known as transfer learning in machine learning.

• In transfer learning, a machine uses the knowledge learned from a prior task to improve its predictions on a new task.


TRADITIONAL LEARNING VS
TRANSFER LEARNING
• Traditional learning is isolated: separate models are trained purely for specific tasks and datasets, and no knowledge is retained that could be transferred from one model to another.

• In transfer learning, you can leverage knowledge (features, weights, etc.) from previously trained models to train newer models, and even tackle problems like having less data for the newer task.



Traditional Learning vs Transfer learning



TRANSFER LEARNING
In transfer learning, the knowledge of an already trained machine learning model is transferred to a different but closely related problem.

With transfer learning, we basically try to use what we’ve learned in one task to better understand the concepts in another.

Weights learned by a network on “task A” are transferred to a network performing a new “task B.”
CONSERVATIVE TRAINING
[Diagram: Conservative training. The target model is initialized with the source model's parameters, and training is constrained so that the target model's outputs and parameters stay close to the source model's. Source data: e.g. audio data from many speakers; target data: e.g. a little audio data from the target speaker.]
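A minimal sketch of how conservative training could look in Keras (assuming TensorFlow is installed; source_model, the layer index, and the penalty strength are illustrative assumptions, not from the slides): the target layer starts from the source layer's weights, and a custom regularizer penalizes the kernel for drifting away from them.

import tensorflow as tf

class CloseToSource(tf.keras.regularizers.Regularizer):
    # Penalize the kernel for drifting away from the source model's kernel.
    def __init__(self, source_kernel, strength=1e-3):
        self.source = tf.constant(source_kernel, dtype=tf.float32)
        self.strength = strength
    def __call__(self, w):
        return self.strength * tf.reduce_sum(tf.square(w - self.source))

# source_model is assumed to be a Keras model already trained on the source data.
kernel0, bias0 = source_model.layers[1].get_weights()   # hypothetical layer index

target_layer = tf.keras.layers.Dense(kernel0.shape[1], activation="relu",
                                     kernel_regularizer=CloseToSource(kernel0))
_ = target_layer(tf.zeros((1, kernel0.shape[0])))        # build the layer
target_layer.set_weights([kernel0, bias0])               # start from the source weights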
LAYER TRANSFER
[Diagram: Layer transfer. Copy some layers (their parameters) from the model trained on the source data into the model for the target data, then either:
1. Train only the remaining layers (prevents overfitting when target data is scarce), or
2. Fine-tune the whole network (if there is sufficient target data).]
HOW TRANSFER LEARNING
WORKS?
• In computer vision, neural networks typically detect edges in the early layers, shapes in the middle layers, and task-specific features in the later layers.
• In transfer learning, the early and middle layers are reused and only the later layers are retrained, so the model makes use of the labelled data from the task it was originally trained on.



LAYER TRANSFER
• Which layer can be transferred (copied)?

– Speech: usually copy the last few layers


– Image: usually copy the first few layers

[Diagram: A deep network maps input pixels x1 … xN through Layer 1, Layer 2, …, Layer L to an output label (e.g. “elephant”).]
TRANSFER LEARNING
Andrew Ng



TRANSFER LEARNING

• Because of the massive amount of compute required to train models from scratch, transfer learning is typically applied in computer vision and natural language processing tasks such as sentiment analysis.


ADVANTAGES OF
PRE-TRAINED MODELS
- Transfer Learning:
Pre-trained models can be leveraged for tasks even with limited data, thanks
to transfer learning. The knowledge gained during pre-training can be transferred to
related tasks.

- Time and Resource Efficiency:


Pre-training on a large dataset requires substantial computational resources,
but users can benefit from these resources without having to replicate the training
process.
WHY SHOULD YOU USE TRANSFER
LEARNING?
• Transfer learning offers a number of advantages, the most important of which are reduced training time, improved neural network performance (in most circumstances), and not needing a large amount of data.
• Training a neural model from scratch typically requires a lot of data, and access to that data isn’t always possible – this is where transfer learning comes in handy.
• Because the model has already been pre-trained, a good machine learning model can be built with fairly little training data using transfer learning.
• This is especially useful in natural language processing, where huge labelled datasets require a lot of expert knowledge. Training time is also reduced, because building a deep neural network from scratch for a complex task can take days or even weeks.


TRANSFER LEARNING



DATASETS
In Image Classification, there are some very popular datasets that are used across research, industry, and
hackathons.
The choice of dataset depends on the specific task you are working on. For many tasks, using a pre-trained
model on a widely accepted dataset like ImageNet can be a good starting point due to the model's ability to
capture general features.
However, fine-tuning on a domain-specific dataset may be necessary for optimal performance on your
specific task. Always check the licensing terms and conditions associated with the datasets to ensure
compliance with usage policies.

The following are some of the prominent ones:


• ImageNet
• CIFAR
• MNIST
• COCO (Common Objects in Context)
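As a quick illustration (a minimal sketch, assuming TensorFlow/Keras is installed; ImageNet and COCO are not bundled with Keras and must be obtained separately), MNIST and CIFAR-10 ship with Keras and can be loaded in one line each:

import tensorflow as tf

# MNIST: 60,000 training / 10,000 test images of 28x28 grayscale digits
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# CIFAR-10: 50,000 training / 10,000 test 32x32 colour images in 10 classes
(cx_train, cy_train), (cx_test, cy_test) = tf.keras.datasets.cifar10.load_data()

print(x_train.shape)    # (60000, 28, 28)
print(cx_train.shape)   # (50000, 32, 32, 3)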
IMAGENET

• ImageNet is a large-scale dataset used for training and evaluating computer vision models, particularly for image classification tasks.
• ImageNet originally contained over 14 million images, but the most widely used subset, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) set, includes around 1.2 million images for training, 50,000 for validation, and 100,000 for testing.
• The images cover a diverse set of 1,000 object categories.



MNIST
• The MNIST dataset is widely used in the field of machine learning and computer
vision. It stands for Modified National Institute of Standards and Technology.

• MNIST is primarily used for the task of handwritten digit recognition, where the goal
is to train a model to correctly classify images of handwritten digits (0 through 9) into
their respective numerical values.

Image Format: The dataset consists of 28x28 pixel grayscale images of handwritten
digits.
Number of Classes: There are 10 classes, each corresponding to a digit from 0 to 9.
Training and Testing Sets: MNIST is commonly divided into a training set of 60,000
examples and a testing set of 10,000 examples.
CIFAR
• CIFAR stands for the Canadian Institute for Advanced Research, and the CIFAR
datasets refer to a collection of labelled datasets widely used for training and
evaluating machine learning models, particularly in the field of computer vision.

There are several CIFAR datasets, with CIFAR-10 and CIFAR-100 being the most
popular:
• CIFAR-10 consists of 60,000 32x32 colour images in 10 different classes, with 6,000
images per class.
The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and
truck.
• CIFAR-100 is an extension of CIFAR-10, containing 60,000 32x32 colour images.
However, in CIFAR-100, the images are divided into 100 different classes, each
containing 600 images. The classes in CIFAR-100 are more fine-grained, covering a
broader range of object categories.
COCO
• COCO addresses three main tasks: object detection, object segmentation, and image captioning. It
provides annotations for these tasks, making it a comprehensive dataset for evaluating models across
multiple visual understanding challenges.
• The dataset includes a large number of images (currently over 200,000 images) collected from a
wide range of everyday scenes. The images are diverse and cover a variety of contexts, making it
more challenging for models to generalize well.
• COCO contains images with 80 common object categories, such as people, animals, vehicles,
household items, and more.
• The availability of rich annotations, diverse images, and multiple tasks makes COCO a valuable resource for training and evaluating models in various aspects of computer vision.
• Researchers often use pre-trained models on COCO for downstream tasks, allowing their models to
learn from the large and diverse set of images and annotations present in the dataset.
DATASETS
1. Image Classification:
- ImageNet: A large-scale image dataset with millions of labeled images across thousands of categories.
Commonly used for training models like VGG, ResNet, and Inception.
2. Object Detection:
- COCO (Common Objects in Context): A dataset for object detection, segmentation, and captioning.
It includes images with complex scenes and multiple objects.
3. Natural Language Processing (NLP):
- Wikipedia Dump: Large text corpora from Wikipedia articles can be used for language model
pretraining.
- BookCorpus: A dataset containing books for training language models.
- OpenWebText: A large collection of text from the web, commonly used for training models like GPT.
4. Speech Recognition:
- LibriSpeech: A dataset for training Automatic Speech Recognition (ASR) models, containing
audiobooks with transcriptions.
- CommonVoice: A multilingual dataset of voices to train and benchmark speech recognition systems.
5. Facial Recognition:
- Labeled Faces in the Wild (LFW): A dataset for face verification and recognition,
containing images of celebrities collected from the web.
- CelebA: A dataset with celebrity images annotated with various attributes, commonly
used for facial recognition tasks.
6. Medical Imaging:
- ChestX-ray8: A dataset for chest X-ray image classification tasks, particularly for
pneumonia detection.
- ISIC Skin Cancer Dataset: A dataset for skin cancer classification, including various
types of skin lesions.
7. Scene Understanding:
- ADE20K: A dataset for semantic segmentation of scenes, with pixel-level annotations
for diverse indoor and outdoor scenes.
8. Recommendation Systems:
- MovieLens: A dataset for collaborative filtering and recommendation systems, containing movie ratings from users.
PRE-TRAINED MODELS



PRE-TRAINED MODELS
- These are neural network models that have been trained on a large dataset for a
specific task before being made available for use.
• The pre-training phase involves training the model on a general or diverse dataset
to learn generic features and patterns.
• This initial training is resource-intensive and often requires large amounts of
labeled data.
• Once pre-trained, these models can be fine-tuned on a smaller dataset specific to
a particular task or domain.
• Fine-tuning allows the model to adapt its knowledge to the specific nuances of the
target task.
POPULAR PRE-TRAINED MODELS

• These models serve as a starting point for further training or as feature extractors for new, related tasks.
• Pre-trained models capture general patterns and features from the data they were initially trained on, allowing them to be used as a foundation for various applications.
• They serve as valuable resources for researchers and practitioners, providing robust feature extractors for various machine learning applications.



EXAMPLES

VGG-16
ResNet50
Inception
BERT (Bidirectional Encoder Representations from Transformers)
YOLO
etc.



VGG-16
• VGG-16 is one of the most popular pre-trained models for image classification. It was introduced at the ILSVRC 2014 competition and long served as a standard baseline for image classification.
• Developed by the Visual Geometry Group at the University of Oxford, VGG-16 beat the then-standard AlexNet and was quickly adopted by researchers and industry for image classification tasks.

• Convolutional Layers = 13
• Pooling Layers = 5
• Dense Layers = 3



VGG16

• Trained on the ImageNet dataset
• 1,000 output classes
VGG16

• “Freeze” means fixing the weights and biases, since the learned knowledge is stored in those weights and biases.
VGG16

• Train only the newly added dense layers and the output layer



VGG16

• For Binary Outcome

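The workflow on these slides could be sketched in Keras roughly as follows (a minimal sketch, assuming TensorFlow is installed and the ImageNet weights can be downloaded; the dense layer size is illustrative): load VGG-16 without its original 1000-class head, freeze the convolutional base, and add new dense layers ending in a single sigmoid unit for a binary outcome.

import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                               # freeze the learned conv weights and biases

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),   # new trainable dense layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # single unit for the binary outcome
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, ...)         # only the new dense layers are trained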


WHY IT WORKS?

• Because the early layers work on extracting generic features: edges, lines, shapes, etc.



RESNET



RESNET 50

• ResNet-50 is a specific variant of the ResNet (Residual Network) architecture, a type of deep neural network designed to address the vanishing gradient problem when training very deep networks.
• The "50" in ResNet-50 refers to the depth of the network, indicating that it consists of 50 layers.
• ResNet-50 has approximately 25.6 million trainable parameters.



INCEPTION

• While VGG-16 secured 2nd place in that year’s ILSVRC, 1st place went to none other than Google, via its model GoogLeNet, now known as Inception.

• The “Inception” micro-architecture was first introduced by Szegedy et al. in their 2014 paper, Going Deeper with Convolutions.


INCEPTION



INCEPTION V1



INCEPTION

• The goal of the Inception module is to act as a “multi-level feature extractor” by computing 1×1, 3×3, and 5×5 convolutions within the same module of the network; the outputs of these filters are then stacked along the channel dimension before being fed into the next layer.
• The original incarnation of this architecture was called GoogLeNet, but subsequent versions have simply been called Inception vN, where N refers to the version number released by Google.


BERT(BIDIRECTIONAL ENCODER
REPRESENTATIONS FROM
TRANSFORMERS)
• Directional models read input sequentially, e.g. from left to right.
• The Transformer encoder instead reads the entire sequence of words at once.
• This helps in maintaining the context of the sentence.



BERT

• BERT is designed to pre-train deep bidirectional representations from


unlabeled text by jointly conditioning on both left and right context in all
layers.

• As a result, the pre-trained BERT model can be fine-tuned with just one
additional output layer to create state-of-the-art models for a wide range of
tasks, such as question answering and language inference.
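For instance (a minimal sketch, assuming the Hugging Face transformers library with a PyTorch backend is installed; the checkpoint name and label count are illustrative), a pre-trained BERT encoder can be loaded with one fresh classification layer on top and then fine-tuned:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)        # pre-trained BERT + one new output layer

inputs = tokenizer("Transfer learning is useful.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)                   # torch.Size([1, 2])
# The encoder and the new head can now be fine-tuned together on labelled data.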



IMAGE CLASSIFICATION MODELS
1.VGG (Visual Geometry Group):
1. VGG16, VGG19: Deep convolutional neural networks with a straightforward architecture.
2.ResNet (Residual Network):
1. ResNet-50, ResNet-101: Deep networks with residual connections, designed to handle
very deep architectures.
3.InceptionV3:
1. A model with inception modules that capture multi-scale features.
4.MobileNet:
1. Designed for mobile and edge devices, balancing accuracy and computational efficiency.
5.EfficientNet:
1. A family of models that achieve state-of-the-art performance with better efficiency.
OBJECT DETECTION MODELS
1.Faster R-CNN:
1. Region-based Convolutional Neural Network for object detection.
2.YOLO (You Only Look Once):
1. Real-time object detection model with high accuracy.
3.SSD (Single Shot MultiBox Detector):
1. Object detection model that predicts bounding boxes and class scores in a
single pass.



NLP MODELS

1.BERT (Bidirectional Encoder Representations from Transformers):


1. Pretrained for various NLP tasks, including language understanding and question
answering.
2.GPT (Generative Pretrained Transformer):
1. A series of models for language generation and understanding.
3.T5 (Text-to-Text Transfer Transformer):
1. A model designed to frame all NLP tasks as converting input text to target text.



SPEECH RECOGNITION MODELS

1.DeepSpeech:
1. Open-source automatic speech recognition (ASR) model by Mozilla.
2.Wav2Vec:
1. Model for unsupervised pretraining of speech representations.



RECOMMENDATION SYSTEM MODELS

1.Wide & Deep:


1. Combines linear models with deep neural networks for recommendation
systems.
2.Matrix Factorization Models:
1. Collaborative filtering models for recommendation tasks.



FACE RECOGNITION

1.OpenFace:
1. A model for facial landmark detection, recognition, and clustering.
2.VGGFace:
1. Pretrained for face recognition tasks.



TRANSDUCTIVE VS INDUCTIVE
TRANSFER LEARNING
• Transductive transfer
– No labeled target domain data available
– Focus of most transfer research in NLP
– E.g. Domain adaptation
• Inductive transfer
– Labeled target domain data available
– Goal: improve performance on the target task by training on other task(s)



TRANSFER LEARNING APPROACHES



Source: https://arxiv.org/pdf/1802.05934.pdf
WAYS FOR TRANSFER LEARNING

Transfer learning can mainly be done in two ways:

• Feature extraction

• Fine-tuning



FEATURE EXTRACTION

Feature extraction uses the representations of a pre-trained model and feeds them to another model, while fine-tuning involves further training the pre-trained model itself on the target task.

Feature extraction in transfer learning involves using the features a pre-trained model learned on a specific task and applying them to a new, related task. This process allows the model to leverage knowledge gained from the source task to improve performance on the target task, even when the datasets are different.



FEATURE EXTRACTION

• Feature extraction: the model weights are frozen and the output of the frozen model is sent directly to another model.
• The extracted features can either be fed to a fully connected network, or a classical model such as a Support Vector Machine (SVM) or Random Forest can be trained on them.
• The benefit is that the task-specific model can be reused for similar data; also, if the same data is used repeatedly, extracting the features once can save a lot of computing resources (see the sketch below).
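A minimal sketch of this workflow (assuming TensorFlow/Keras and scikit-learn are installed, and that images and labels are a small labelled target dataset already resized to 224x224; these names are illustrative): the frozen VGG-16 base is run once to extract features, and a classical SVM is trained on top of them.

import tensorflow as tf
from sklearn.svm import SVC

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   pooling="avg")    # frozen feature extractor (512-d output)
base.trainable = False

# images: (N, 224, 224, 3) array, labels: (N,) array -- assumed to already exist
features = base.predict(tf.keras.applications.vgg16.preprocess_input(images))

clf = SVC()               # classical model trained on the extracted features
clf.fit(features, labels)
# The extracted features can be cached and reused, saving compute on repeated experiments.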



FINE TUNING

• Fine-tuning: as the name implies, the pre-trained weights are kept trainable and are fine-tuned for the target task.

• The pre-trained model thus acts as a starting point, leading to faster convergence compared with random initialization.


FINE TUNING

• The main difference between the two is that in fine-tuning, more layers of the pre-trained model get unfrozen and tuned on custom data.
• This fine-tuning usually takes more data than feature extraction to be effective (see the sketch below).
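Continuing the earlier VGG-16 sketch (base and model are assumed from that example; the number of unfrozen layers and the learning rate are illustrative), fine-tuning unfreezes the top few layers of the pre-trained base and retrains them together with the new head, usually with a much smaller learning rate:

import tensorflow as tf

base.trainable = True
for layer in base.layers[:-4]:                 # keep all but the last few layers frozen
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),   # small LR for fine-tuning
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, ...)   # now more layers are tuned on the custom data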



FEATURE EXTRACTION & FINE TUNING



FINE TUNING



Recent advancements in transfer learning:

– Self-supervised learning

– Meta-learning



SELF-SUPERVISED LEARNING

• An alternative to transfer learning is self-supervised learning, in which a supervised task is created using the unlabelled images from the target domain itself to pre-train the lower layers.


EXAMPLE

Auto-suggestion (next-word prediction while typing)
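Auto-suggestion is self-supervised because the "labels" come from the text itself. A minimal plain-Python sketch of how (context, next-word) training pairs could be generated from unlabelled text (the sentence and window size are illustrative):

text = "transfer learning reuses knowledge learned from a previous task".split()

window = 3
pairs = [(text[i:i + window], text[i + window])    # context words -> "label" = the next word
         for i in range(len(text) - window)]

for context, target in pairs[:3]:
    print(context, "->", target)
# No human labelling is needed: the supervision signal is generated from the data itself.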



SELF-SUPERVISED
LEARNING



SELF-SUPERVISED LEARNING



SELF-SUPERVISED LEARNING

• Self-supervised learning is a machine learning process that uses automatically generated labels to transform an unsupervised problem into a supervised one. It's also known as predictive or pretext learning.

• In self-supervised learning, the model learns one part of the input from another part of the input.

• Self-supervised learning methods can perform well without large labelled datasets.
SELF-SUPERVISED LEARNING

Self-supervised learning can be used for classification and regression tasks.

Some types of self-supervised learning include:


• Autoassociative self-supervised learning
• Contrastive self-supervised learning
• Non-contrastive self-supervised learning



AUTOASSOCIATIVE
• Autoassociative self-supervised learning is a specific category of self-supervised
learning where a neural network is trained to reproduce or reconstruct its own input
data.
• In other words, the model is tasked with learning a representation of the data that
captures its essential features or structure, allowing it to regenerate the original input.
• The term "autoassociative" comes from the fact that the model is essentially
associating the input data with itself.
• This is often achieved using autoencoders, which are a type of neural network
architecture used for representation learning.
• Autoencoders consist of an encoder network that maps the input data to a lower-
dimensional representation (latent space), and a decoder network that reconstructs the
input data from this representation.



AUTOASSOCIATIVE
• The training process involves presenting the model with input data and requiring it to
reconstruct the same data as closely as possible.
• The loss function used during training typically penalizes the difference between the
original input and the reconstructed output.
• By minimizing this reconstruction error, the autoencoder learns a meaningful
representation of the data in its latent space.
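A minimal Keras sketch of such an autoencoder (assuming TensorFlow is installed; the 784-dimensional input, e.g. a flattened 28x28 image, and the 32-dimensional latent space are illustrative):

import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))                                  # e.g. a flattened 28x28 image
latent = tf.keras.layers.Dense(32, activation="relu")(inputs)          # encoder -> latent space
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(latent)     # decoder -> reconstruction

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")   # the loss penalizes the reconstruction error

# The "labels" are the inputs themselves, so no human annotation is needed:
# autoencoder.fit(x_train, x_train, epochs=..., batch_size=...)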



CONTRASTIVE
• For a binary classification task, training data can be divided into positive
examples and negative examples. Positive examples are those that match the
target.
• For example, if you're learning to identify birds, the positive training data are
those pictures that contain birds. Negative examples are those that do not.
• Contrastive self-supervised learning uses both positive and negative examples.
• Contrastive learning's loss function minimizes the distance between positive
samples while maximizing the distance between negative samples.
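A minimal sketch of this loss (assuming TensorFlow; the embeddings, batch size, and margin are illustrative, and this triplet-style formulation is one common variant rather than the slides' specific method): the distance between an anchor and a positive sample is pulled down, while the distance to a negative sample is pushed up to at least a margin.

import tensorflow as tf

def contrastive_triplet_loss(anchor, positive, negative, margin=1.0):
    # Small distance to the positive, large distance to the negative.
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))

# Toy embeddings; in practice they come from the same encoder network.
a = tf.random.normal((8, 128))                 # anchors
p = a + 0.05 * tf.random.normal((8, 128))      # positives: slightly perturbed views of the anchors
n = tf.random.normal((8, 128))                 # negatives: unrelated samples
print(contrastive_triplet_loss(a, p, n).numpy())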



NON- CONTRASTIVE
• Non-contrastive self-supervised learning (NCSSL) uses only positive
examples.
• Counterintuitively, NCSSL converges on a useful local minimum rather than collapsing to a trivial solution with zero loss.

• In the binary classification example, the trivial solution would be to classify every example as positive.

• Effective NCSSL requires an extra predictor on the online side that does not back-propagate on the target side.
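A minimal sketch of that asymmetry (assuming TensorFlow; the toy encoder, predictor, and negative-cosine loss follow a SimSiam-style setup, which is one concrete instance of NCSSL rather than the slides' specific method): the online branch has an extra predictor, and tf.stop_gradient blocks back-propagation through the target branch.

import tensorflow as tf

encoder = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                               tf.keras.layers.Dense(32)])
predictor = tf.keras.Sequential([tf.keras.layers.Dense(32)])   # extra head on the online side only

def ncssl_loss(view1, view2):
    z1, z2 = encoder(view1), encoder(view2)      # two augmented views of the same input (positives only)
    p1 = tf.math.l2_normalize(predictor(z1), axis=-1)
    target = tf.math.l2_normalize(tf.stop_gradient(z2), axis=-1)   # no gradient through the target side
    return -tf.reduce_mean(tf.reduce_sum(p1 * target, axis=-1))    # negative cosine similarity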



META LEARNING

• The word “meta” usually indicates something more comprehensive or more abstract.
• For example, a metaverse is a virtual world inside our world, and metadata is data that provides information about other data.
• Likewise, meta-learning refers to learning about learning: it includes machine learning algorithms that learn from the output of other machine learning algorithms.



META LEARNING

• Meta-learning is about learning how to learn, for example optimizing and speeding up the choice of hyperparameters for networks that have not yet been trained. Transfer learning, on the other hand, is a technique that uses a network that has already been trained to train on a new, similar task.



META LEARNING

• Transfer learning is a technique for reusing existing neural networks, whereas meta-learning is the idea of learning about learning.



Source: https://arxiv.org/pdf/2111.12146.pdf
