Lecture 3: Transfer Learning

The document outlines an introductory course on Applied Machine Learning led by Dr. Tao Han at the New Jersey Institute of Technology. It covers key concepts such as transfer learning, model fine-tuning, multitask learning, domain adaptation, and zero-shot learning, emphasizing their applications and challenges. The course is designed to equip students with practical skills in training classifiers and adapting models to different tasks and domains.

ECE 498 ST: Introduction to Applied Machine Learning

• Tao Han, Ph.D.
• Associate Professor
• Electrical and Computer Engineering
• Newark College of Engineering
• New Jersey Institute of Technology
• https://tao-han-njit.netlify.app

Slides are based on Prof. Hung-yi Lee's Machine Learning courses at National Taiwan University.

Transfer Learning

Example: a Dog/Cat Classifier (input: image; output: cat or dog), plus data not directly related to the task considered:

• elephant and tiger images — similar domain, different tasks
• cat and dog images from a different source — different domains, same task


Transfer Learning - Overview

Source data: labelled (not directly related to the task)
• Target data labelled → Model Fine-tuning
• Target data unlabeled → (addressed later in this lecture)

Warning: different terminology in different literature


Model Fine-tuning

One-shot learning: only a few examples in the target domain.

• Task description
  • Source data: $(x^s, y^s)$ — a large amount
  • Target data: $(x^t, y^t)$ — very little
• Example: (supervised) speaker adaptation
  • Source data: audio data and transcriptions from many speakers
  • Target data: audio data and its transcriptions from a specific user
• Idea: train a model on the source data, then fine-tune the model on the target data (a minimal sketch follows below)
• Challenge: only limited target data, so be careful about overfitting
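As a concrete illustration, here is a minimal PyTorch sketch of the fine-tuning recipe: start from the source-trained model and continue training briefly on the small target set with a low learning rate. The model, data loader, and hyperparameters are assumptions for illustration, not from the slides.

```python
# A minimal fine-tuning sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

def fine_tune(pretrained_model: nn.Module, target_loader, epochs: int = 3):
    # A small learning rate keeps the model close to the source-trained
    # solution, which helps avoid overfitting the tiny target set.
    optimizer = torch.optim.SGD(pretrained_model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    pretrained_model.train()
    for _ in range(epochs):
        for x_t, y_t in target_loader:  # very little labelled target data
            optimizer.zero_grad()
            loss = loss_fn(pretrained_model(x_t), y_t)
            loss.backward()
            optimizer.step()
    return pretrained_model
```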
Conservative Training

[Figure: a network trained on source data (e.g., audio data of many speakers) initializes a network trained on target data (e.g., a little audio data from the target speaker). Constraints keep the new network's output and/or parameters close to the original network's.]

Keeping the fine-tuned model close to the source-trained model guards against overfitting the small target set. A minimal sketch of the parameter-closeness variant follows below.
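Below is a minimal sketch of the parameter-closeness variant: an L2 penalty pulls the fine-tuned weights toward the source-trained ones. The penalty weight `alpha` and the surrounding training loop are assumptions for illustration.

```python
# Conservative training via an L2 penalty toward the pretrained weights.
import torch
import torch.nn as nn

def conservative_loss(model: nn.Module, ref_model: nn.Module,
                      task_loss: torch.Tensor, alpha: float = 0.01):
    # Penalize deviation of each parameter from its source-trained value.
    closeness = sum(
        ((p - p_ref.detach()) ** 2).sum()
        for p, p_ref in zip(model.parameters(), ref_model.parameters())
    )
    return task_loss + alpha * closeness

# Usage (hypothetical): keep a frozen copy of the pretrained model as the
# reference, e.g. ref_model = copy.deepcopy(pretrained_model), then
# minimize conservative_loss(model, ref_model, ce(model(x_t), y_t)).
```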
Layer Transfer

Copy some parameters (whole layers) from the source-trained network into the target network, then:
1. Train only the remaining layers (prevents overfitting when target data is scarce).
2. Fine-tune the whole network (if there is sufficient target data).

A freezing sketch follows below.
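A minimal PyTorch sketch of step 1, assuming a model whose transferred layers live under a `features` attribute (an illustrative assumption about the architecture):

```python
import torch.nn as nn

def freeze_transferred_layers(model: nn.Module):
    # Step 1: freeze the copied layers so only the rest are trained.
    for param in model.features.parameters():  # `features` is assumed
        param.requires_grad = False
    # Hand only the still-trainable parameters to the optimizer.
    return [p for p in model.parameters() if p.requires_grad]

# Step 2 (if there is sufficient target data): set requires_grad = True
# everywhere again and fine-tune the whole network with a small
# learning rate.
```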
Layer Transfer
• Which layers can be transferred (copied)?
• Image: usually copy the first few layers, since early layers capture generic low-level patterns while later layers become task-specific.

[Figure: a network maps pixels x1, x2, …, xN through Layer 1, Layer 2, …, Layer L to an output label such as "elephant".]
Transfer Learning - Overview

Source data: labelled (not directly related to the task)
• Target data labelled → Model Fine-tuning, Multitask Learning
• Target data unlabeled → (addressed later in this lecture)

Warning: different terminology in different literature


Multitask Learning
• The multi-layer structure makes NNs suitable for multitask learning.

[Figure: two architectures. Left: a single input feature feeds shared lower layers that branch into separate output layers for Task A and Task B. Right: separate input features for Task A and Task B pass through their own lower layers, share some middle layers, then branch into task-specific outputs.]

A minimal sketch of the shared-lower-layers variant follows below.
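A minimal sketch of the left (shared lower layers) architecture; the layer sizes and class counts are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    def __init__(self, in_dim=128, hidden=256, classes_a=10, classes_b=5):
        super().__init__()
        # Shared lower layers: representations common to both tasks.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Task-specific output heads.
        self.head_a = nn.Linear(hidden, classes_a)
        self.head_b = nn.Linear(hidden, classes_b)

    def forward(self, x):
        h = self.shared(x)
        return self.head_a(h), self.head_b(h)

# Training sums the per-task losses, e.g.:
# loss = loss_fn_a(out_a, y_a) + loss_fn_b(out_b, y_b)
```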
Multitask Learning - Multilingual Speech Recognition

[Figure: acoustic features feed a network whose lower layers are shared across languages, with separate output layers predicting the states of French, German, Spanish, Italian, and Mandarin.]

Human languages share some common characteristics, so the shared layers can learn from all of them.

Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang, "Multi-task learning for multiple language translation", ACL 2015.
Multitask Learning - Multilingual

[Figure: character error rate (roughly 25-50%) vs. hours of Mandarin training data (1 to 1000, log scale). Across the whole range, training jointly with European languages yields a lower error rate than training on Mandarin only.]

Huang, Jui-Ting, et al., "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers", ICASSP, 2013.
Transfer Learning - Overview

Source data: labelled (not directly related to the task)
• Target data labelled → Model Fine-tuning, Multitask Learning
• Target data unlabeled → Domain Adaptation

Warning: different terminology in different literature


You have learned a lot about ML. Training a classifier is not a big deal for you. ☺

[Figure: a digit classifier reaches 99.5% accuracy on the training data but only 57.5% on testing data drawn from a different distribution.]

The results are from: http://proceedings.mlr.press/v37/ganin15.pdf

Domain shift: training and testing data have different distributions. The remedy is domain adaptation.
Domain Shift

[Figure: training data from the source domain vs. testing data from the target domain. The shift can affect the input distribution (the same digits rendered differently), the output distribution over classes 1-5, or even the labeling itself (the same image labeled "0" in one domain and "1" in the other).]
Domain Adaptation (with labeled source data)

[Figure: labeled source-domain digits ("4", "0", "1", "8") alongside whatever knowledge of the target domain is available.]

Two cases for the target-domain data:
• Little but labeled data. Idea: train a model on the source data, then fine-tune the model on the target data. Challenge: only limited target data, so be careful about overfitting.
• A large amount of unlabeled data. This is the setting addressed by domain adversarial training on the following slides.
Basic Idea: learn to ignore domain-specific factors (e.g., colors).

[Figure: source and target images follow different distributions, but after the Feature Extractor (a network), their features follow the same distribution.]
Domain Adversarial Training

[Figure: an image passes through a Feature Extractor and then a Label Predictor, which outputs a class distribution (e.g., "4"). In feature space, source (labeled) examples appear as blue points and target (unlabeled) examples as red points; the goal is to make the two clouds overlap.]
Domain Adversarial Training

[Figure: the Feature Extractor ($\theta_f$, acting as a generator) feeds both the Label Predictor ($\theta_p$, classification loss $L$) and the Domain Classifier ($\theta_d$, acting as a discriminator, domain loss $L_d$), which guesses whether each feature came from the source or the target.]

• Label predictor: $\theta_p^* = \arg\min_{\theta_p} L$
• Domain classifier: $\theta_d^* = \arg\min_{\theta_d} L_d$
• Feature extractor: $\theta_f^* = \arg\min_{\theta_f} (L - L_d)$ — learn to "fool" the domain classifier, while also supporting the label predictor. (The slide's aside "always zero?" flags that maximizing $L_d$ alone could degenerate; the label-prediction loss $L$ keeps the features useful.)

A gradient-reversal sketch follows after the references below.
Domain Adversarial Training

Yaroslav Ganin, Victor Lempitsky, "Unsupervised Domain Adaptation by Backpropagation", ICML, 2015.
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, "Domain-Adversarial Training of Neural Networks", JMLR, 2016.
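Below is a minimal sketch of the objective using a gradient reversal layer, the mechanism from Ganin & Lempitsky (2015). The module names, batch handling, and lambda value are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) the gradient backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def dann_loss(feat_ext, label_pred, domain_clf, x_s, y_s, x_t, lamb=1.0):
    ce = nn.CrossEntropyLoss()
    f_s, f_t = feat_ext(x_s), feat_ext(x_t)
    # L: classification loss on labeled source data (min over theta_p).
    L = ce(label_pred(f_s), y_s)
    # L_d: domain loss. The domain classifier minimizes it (min over
    # theta_d); gradient reversal makes the feature extractor maximize it
    # instead, implementing min over theta_f of (L - lamb * L_d).
    feats = torch.cat([f_s, f_t])
    domains = torch.cat([torch.zeros(len(f_s)), torch.ones(len(f_t))]).long()
    L_d = ce(domain_clf(GradReverse.apply(feats, lamb)), domains)
    return L + L_d  # one backward pass updates all three modules
```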
Limitation

[Figure: feature space containing class 1 (source), class 2 (source), and target data of unknown class, with decision boundaries learned from the source domain.]

Source and target data are aligned, but alignment alone is not enough: we also want the (unlabeled) target data to lie far from the decision boundary.
Considering the Decision Boundary

[Figure: an unlabeled example goes through the Feature Extractor and Label Predictor, producing a distribution over classes 1-5. A peaked prediction has small entropy (far from the boundary, good); a flat prediction has large entropy (near the boundary, bad).]

Encouraging small prediction entropy on unlabeled target data pushes target examples away from the decision boundary (a minimal entropy-loss sketch follows below). This idea is used in:

• Decision-boundary Iterative Refinement Training with a Teacher (DIRT-T): https://arxiv.org/abs/1802.08735
• Maximum Classifier Discrepancy: https://arxiv.org/abs/1712.02560
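A minimal sketch of the entropy term on unlabeled target predictions; the weighting in the usage line is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    # H(p) = -sum_c p_c log p_c, averaged over the batch. Small entropy
    # means a confident prediction, i.e., far from the decision boundary.
    p = F.softmax(logits, dim=1)
    log_p = F.log_softmax(logits, dim=1)
    return -(p * log_p).sum(dim=1).mean()

# Usage (hypothetical):
# total = task_loss_on_source + 0.1 * entropy_loss(model(x_target))
```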


Transfer Learning - Overview

Source data: labelled (not directly related to the task)
• Target data labelled → Model Fine-tuning, Multitask Learning
• Target data unlabeled → Domain Adaptation, Zero-shot Learning

Warning: different terminology in different literature


Zero-shot Learning

• Source data: $(x^s, y^s)$ — training data
• Target data: $x^t$ — testing data
• The tasks differ: the test classes never appear in training.

Example: the training images $x^s$ carry labels $y^s \in \{$cat, dog, …$\}$, but the test image $x^t$ shows an alpaca, a class unseen during training.

How do we solve this problem?

Zero-shot Learning
• Represent each class by its attributes.

Training: instead of predicting the class directly, the NN is trained to output each image's attribute vector — e.g., a chimp image maps to (furry = 1, 4 legs = 0, tail = 0) and a dog image to (furry = 1, 4 legs = 1, tail = 1). A database stores the attributes of every class:

class   | furry | 4 legs | tail | …
--------|-------|--------|------|---
Dog     |   O   |   O    |  O   |
Fish    |   X   |   X    |  O   |
Chimp   |   O   |   X    |  X   |

The attribute set must be rich enough to give a one-to-one mapping between classes and attribute vectors.
Zero-shot Learning
• Represent each class by its attributes.

Testing: the NN predicts the test image's attribute vector — e.g., (furry = 0, 4 legs = 0, tail = 1) — and we find the class with the most similar attributes in the database (here, Fish). Again, the attributes must be sufficient for a one-to-one mapping between classes and attribute vectors. A matching sketch follows below.
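A minimal sketch of the attribute-matching step; the attribute database mirrors the slide's table, and the distance-based lookup is an illustrative assumption.

```python
import torch

# Class -> attribute vector (furry, 4 legs, tail), from the slide's table.
ATTRS = {
    "Dog":   torch.tensor([1., 1., 1.]),
    "Fish":  torch.tensor([0., 0., 1.]),
    "Chimp": torch.tensor([1., 0., 0.]),
}

def classify_by_attributes(pred_attrs: torch.Tensor) -> str:
    # Pick the class whose stored attributes are closest to the prediction.
    return min(ATTRS, key=lambda c: torch.dist(pred_attrs, ATTRS[c]).item())

# Example: a predicted vector of (0, 0, 1) is matched to "Fish".
# classify_by_attributes(torch.tensor([0., 0., 1.]))  # -> "Fish"
```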
Zero-shot Learning
• Attribute embedding

Learn two functions $f$ and $g$ (both can be NNs) that map images and class-attribute vectors into a shared embedding space. Training target: make $f(x^n)$ and $g(y^n)$ as close as possible for every training pair $(x^n, y^n)$.

[Figure: images $x^1, x^2, x^3$ and attribute vectors $y^1$ (chimp), $y^2$ (dog), $y^3$ (alpaca) are mapped into the embedding space, where each $f(x^n)$ lands near its matching $g(y^n)$ — including the unseen class, alpaca.]

A training sketch follows below.
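A minimal sketch of the attribute-embedding objective. The dimensions and the plain closeness loss are illustrative assumptions (the slide only asks that $f(x^n)$ and $g(y^n)$ be close).

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(1024, 64))  # image features -> embedding
g = nn.Sequential(nn.Linear(3, 64))     # attribute vector -> embedding
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

def embed_step(x_batch, y_attr_batch):
    # Pull f(x^n) toward g(y^n) for each training pair (x^n, y^n).
    # Note: a practical objective also pushes non-matching pairs apart,
    # otherwise f and g can collapse to a constant.
    loss = ((f(x_batch) - g(y_attr_batch)) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# At test time, embed the image with f and pick the class whose embedded
# attributes g(y) are nearest — this works even for unseen classes.
```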
More about Zero-shot Learning
• Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell, "Zero-shot Learning with Semantic Output Codes", NIPS 2009
• Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid, "Label-Embedding for Attribute-Based Classification", CVPR 2013
• Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov, "DeViSE: A Deep Visual-Semantic Embedding Model", NIPS 2013
• Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean, "Zero-Shot Learning by Convex Combination of Semantic Embeddings", arXiv preprint 2013
• Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko, "Captioning Images with Diverse Objects", arXiv preprint 2016
