Transfer Learning Seminar


TRANSFER LEARNING

Done By: Ahmed Majid Ahmed


Supervised By: Dr. Ayad R. Abbas

University of Technology | Computer Science Department


Advanced Machine Learning (PhD) 2024
1. Introduction

Conventional machine learning algorithms have traditionally been designed to
work in isolation and trained to solve specific tasks. The models have to be
rebuilt from scratch once the feature-space distribution changes. Transfer
learning is the idea of overcoming this isolated learning paradigm and utilizing
knowledge acquired for one task to solve related ones.

Transfer learning (TL) is the process of reusing a pre-trained model, or a part
of it, for a different but related problem.

For example, if a machine learning model can identify images of dogs, it can be
retrained to identify cats using a smaller image set.

Transfer learning can also be applied across different modalities, such as using
a model that was trained on text to generate captions for images.

2. TL Advantages
• Computational costs: TL reduces the requisite computational costs to
build models for new problems.
• Dataset size: TL particularly helps alleviate difficulties involved in
acquiring large datasets.
• Generalizability: Because TL involves retraining an existing model with a
new dataset, the model gains broader knowledge. It will potentially
generalize better and can thus inhibit overfitting.
• Improved performance: Models developed through TL often demonstrate
greater robustness in diverse and challenging environments. They better
handle real-world variability and noise.

3. Types of TL Strategies

3.1 Inductive transfer learning: Here, the source and target domains are the
same, yet the source and target tasks are different from each other.

Example: Models are pre-trained on a large set of texts and then fine-tuned
using inductive transfer learning for specific tasks like sentiment analysis.
Depending upon whether the source domain contains labeled data or not, it can
be further divided into multitask learning and self-taught learning.
3.2 Unsupervised Transfer Learning:
This is similar to inductive TL, except that labeled data is unavailable in either
of the domains. By comparison, inductive transfer can be considered supervised
learning. Unsupervised TL is helpful when it is challenging or expensive to
obtain labeled data.
3.3 Transductive transfer learning
This occurs when the source and target tasks are the same, but the datasets (or
domains) are different. It is especially useful when there is little or no labeled
data from the target domain. For example, consider adapting a sentiment
analysis model trained on product reviews to analyze movie reviews.
This can be further classified into subcategories, referring to settings where
either the feature spaces or the marginal probability distributions differ
between the domains.

We can summarize the different settings for each of the above techniques:
inductive TL keeps the domain the same but changes the task (labeled target
data is available); unsupervised TL also changes the task but has no labeled
data in either domain; transductive TL keeps the task the same but changes the
domain (labeled data is available only in the source).

4. Transfer Learning Strategies for Deep Learning


Deep learning systems require much more training time and data than
traditional ML systems. There are various deep learning networks with
state-of-the-art performance, and in most cases teams share the details of
these networks for others to use. These pre-trained models form the basis of
transfer learning in the context of deep learning.

The two most popular strategies for deep transfer learning are:

4.1 Pre-trained Models as Feature Extractors
Deep learning models are layered architectures that learn different features at
different layers, which finally connect to a last layer to produce the final
output. This architecture allows us to utilize a pre-trained network, without
its final layer, as a fixed feature extractor for other tasks.
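
As a concrete illustration, the sketch below (a minimal example, assuming Keras
and an ImageNet-pre-trained ResNet50; any framework and backbone would do)
loads the network without its classification head and uses it as a fixed
feature extractor.

```python
import numpy as np
from tensorflow.keras.applications import ResNet50

# include_top=False drops the final classification layer;
# pooling="avg" returns one feature vector per image.
feature_extractor = ResNet50(weights="imagenet",
                             include_top=False,
                             pooling="avg")
feature_extractor.trainable = False  # fixed weights: pure feature extractor

images = np.random.rand(4, 224, 224, 3)  # placeholder batch of images
features = feature_extractor.predict(images)
print(features.shape)  # (4, 2048): one 2048-d feature vector per image
```

These feature vectors can then be fed to any simple classifier trained on the
new task.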

4.2 Fine Tuning Pre-Trained Models


Here, we do not just replace the final layer; we also selectively retrain some
of the previous layers. Using this insight, we may freeze (fix the weights of)
certain layers while retraining, and fine-tune the rest to suit our needs.
This brings us to the question: should we freeze layers in the network and use
them as fixed feature extractors, or should we also fine-tune layers in the
process?
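
A minimal sketch of the middle ground, again assuming Keras and ResNet50 (the
cutoff of 20 layers is an arbitrary illustration, not a recommendation):

```python
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights="imagenet", include_top=False)
for layer in base.layers[:-20]:  # freeze all but the last ~20 layers
    layer.trainable = False      # early layers act as fixed feature extractors
# the remaining layers stay trainable and are fine-tuned on the new task
```

In practice, the earlier the layer, the more generic its features, so freezing
early layers and fine-tuning later ones is a common compromise.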

5. Types of Deep Transfer Learning
5.1 Domain Adaptation: Usually referred to in scenarios where the marginal
probability distributions of the source and target domains differ. For
instance, a corpus of movie reviews labeled as positive or negative would be
different from a corpus of product-review sentiments.
5.2 Domain Confusion: Instead of allowing the model to learn any
representation, we nudge the representations of both domains to be as similar
as possible. This can be achieved by applying certain pre-processing steps
directly to the representations themselves.
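
The text does not spell out the mechanism; one common realization is to add an
alignment penalty on the representations. Below is a rough, hypothetical sketch
of a simple mean-matching loss (a crude relative of maximum mean discrepancy),
assuming TensorFlow:

```python
import tensorflow as tf

def domain_confusion_loss(source_feats, target_feats):
    # Penalize the distance between the average source representation
    # and the average target representation; minimizing this term
    # nudges the two domains toward similar feature statistics.
    return tf.norm(tf.reduce_mean(source_feats, axis=0) -
                   tf.reduce_mean(target_feats, axis=0))
```

This term would be added to the task loss during training.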
5.3 Multitask Learning

Several tasks are learned simultaneously without distinction between the
source and targets. The learner receives information about multiple tasks at
once, as compared to TL, where the learner initially has no idea about the
target task.
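
A minimal sketch of a multitask architecture (hypothetical task names, assuming
the Keras functional API): one shared trunk feeds two task-specific heads, so
both tasks are learned at once.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

inputs = Input(shape=(128,))
shared = Dense(64, activation="relu")(inputs)  # representation shared by all tasks
task_a = Dense(3, activation="softmax", name="task_a")(shared)  # e.g. 3-way classification
task_b = Dense(1, activation="sigmoid", name="task_b")(shared)  # e.g. binary detection

model = Model(inputs, [task_a, task_b])
model.compile(optimizer="adam",
              loss={"task_a": "sparse_categorical_crossentropy",
                    "task_b": "binary_crossentropy"})
```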
5.4 One-shot Learning: is a variant of transfer learning, where we try to infer
the required output based on just one or a few training examples. This is
especially helpful in real-world scenarios where it is not possible to have
labeled data for every possible class, and in scenarios where new classes can be
added often.

5.5 Zero-shot Learning: relies on no labeled examples to learn a task. These
methods make clever adjustments during the training stage itself to exploit
additional information to understand unseen data. It comes in handy in
scenarios such as machine translation.
6. How to implement TL
There are six general steps for implementing TL:
6.1 Obtain the pre-trained model: The first step is to get the pre-trained
model that you would like to use for your problem.
6.2 Create a base model: Usually, you instantiate the base model using one of
the well-known architectures, such as ResNet. You can also optionally download
the pre-trained weights; if you don’t, you will have to use the architecture to
train your model from scratch. You also have to remove the final output layer,
since later on you will add a final output layer that is compatible with your
problem.
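
A minimal sketch of this step (assuming Keras; the input size is an assumption
for ResNet50):

```python
from tensorflow.keras.applications import ResNet50

base_model = ResNet50(weights="imagenet",        # download pre-trained weights
                      include_top=False,         # remove the final output layer
                      input_shape=(224, 224, 3))
```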

6.3 Freeze layers so they don’t change during training: This is vital, because
you don’t want the weights in those layers to be modified during training. If
they are, you will lose all the learning that has already taken place.
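
Continuing the sketch from step 6.2 (base_model is the model created there):

```python
# Freezing the whole base keeps its pre-trained weights fixed.
base_model.trainable = False
```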

6.4 Add new trainable layers: The next step is to add new trainable layers that
will turn old features into predictions on the new dataset. This is important
because the pre-trained model is loaded without the final output layer.
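
Continuing the sketch, a new trainable head stacked on the frozen base
(num_classes is a placeholder for your problem’s number of outputs):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

num_classes = 10  # assumption: set to your task's output size
model = Sequential([
    base_model,                                # frozen pre-trained base
    GlobalAveragePooling2D(),                  # feature maps -> one vector
    Dense(128, activation="relu"),             # new trainable layer
    Dense(num_classes, activation="softmax"),  # new output layer
])
```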

6.5 Train the new layers on the dataset: You have to train the model with the
new output layer in place. You can add new dense layers as you please, but most
importantly, you need a final dense layer with units corresponding to the
number of outputs expected by your problem.
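
Continuing the sketch (train_ds is a placeholder for your labeled dataset):

```python
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)  # only the new layers are updated
```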
6.6 Improve the model via fine-tuning: Optionally, you can improve the
model’s performance through fine-tuning. Fine-tuning is done by unfreezing the
base model, or part of it, and training the entire model again on the whole
dataset at a very low learning rate. The low learning rate improves the model’s
performance on the new dataset while preventing overfitting; it has to be low
because the model is quite large while the dataset is small.
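
Continuing the sketch, the whole model is unfrozen and retrained at a very low
learning rate (the rate shown is an illustrative assumption):

```python
from tensorflow.keras.optimizers import Adam

base_model.trainable = True  # unfreeze the base for fine-tuning
# Recompile so the change in trainable weights takes effect.
model.compile(optimizer=Adam(learning_rate=1e-5),  # very low learning rate
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=3)
```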

7. Applications of Transfer Learning
• Natural Language Processing (NLP): Embeddings, such as Word2vec
and fastText, have been prepared using different training datasets.
These are utilized in different tasks, such as sentiment analysis and
document classification, by transferring the knowledge from the source
tasks.
• Audio/Speech: For instance, Automatic Speech Recognition (ASR)
models developed for English have been successfully used to improve
speech recognition performance for other languages, such as German.
• Computer Vision: TL can use models produced from large training
datasets and apply them to smaller image sets. This can include
determining the sharp edges of objects in the provided collection of
images. Moreover, the layers that specifically identify edges in images can
be determined and then trained based on the need.
