Lecture 3: Transfer Learning
• Associate Professor
• Electrical and Computer Engineering
• Newark College of Engineering
• New Jersey Institute of Technology
• https://tao-han-njit.netlify.app
Slides are designed based on Prof. Hung-yi Lee’s Machine Learning courses at National Taiwan University
Transfer Learning
Image sources:
http://weebly110810.weebly.com/396403913129399.html
http://www.sucaitianxia.com/png/cartoon/200811/4261.html
[Figure: Dog/Cat Classifier with example outputs "cat" and "dog"]
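Model Fine-tuning
[Overview: source data labelled; target data labelled or unlabeled. Model Fine-tuning applies when the target data is labelled.]
[Figure: two networks, each with an input layer (inputs up to xN). Labels: initialization, parameter close, Target data.]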
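The labels "initialization" and "parameter close" can be read as: start the target model from the source model's parameters, then penalize moving away from them. Below is a minimal sketch of that reading, assuming an L2 penalty and a toy quadratic target loss. The names `fine_tune`, `closeness`, and `target_grad` are hypothetical and not from the slides.

```python
def fine_tune(source_params, target_grad, steps=200, lr=0.1, closeness=1.0):
    """Gradient descent on: target_loss(p) + (closeness / 2) * ||p - source_params||^2.

    source_params: parameters learned on the source data (used as initialization).
    target_grad:   function returning the gradient of the target-data loss.
    """
    params = list(source_params)  # initialization from the source model
    for _ in range(steps):
        grads = target_grad(params)
        params = [
            p - lr * (g + closeness * (p - p0))  # task gradient + pull toward p0
            for p, p0, g in zip(params, source_params, grads)
        ]
    return params


# Toy target loss (an assumption for illustration): 0.5 * (p - 3)^2 per parameter.
def toy_target_grad(params):
    return [p - 3.0 for p in params]


source = [0.5]
tuned = fine_tune(source, toy_target_grad)
```

With `closeness=1.0` the result settles between the source value 0.5 and the target optimum 3.0. A larger `closeness` keeps it nearer the source model.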
Transfer Learning - Overview
Source Data (not directly related to the task): labelled
Target Data: labelled or unlabeled
• Labelled source data, labelled target data: Model Fine-tuning, Multitask Learning
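[Figure: multitask network layouts. In one, a single input feature is shared by both tasks; in the other, there is a separate input feature for task A and for task B.]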
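A minimal sketch of the shared-input layout, assuming one shared hidden layer followed by one linear head per task. The weights and the names `multitask_forward`, `head_a`, and `head_b` are made up for illustration.

```python
def linear(weights, inputs):
    # weights is a list of rows; each row produces one output value.
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def multitask_forward(x, shared, head_a, head_b):
    """Shared layer first, then a separate output head for each task."""
    hidden = relu(linear(shared, x))  # representation used by both tasks
    return linear(head_a, hidden), linear(head_b, hidden)


shared = [[1.0, -1.0], [0.5, 0.5]]   # shared by task A and task B
head_a = [[1.0, 0.0]]                # task A: one output
head_b = [[0.0, 2.0], [1.0, 1.0]]    # task B: two outputs
out_a, out_b = multitask_forward([2.0, 1.0], shared, head_a, head_b)
```

In training, the losses of both tasks would be summed, so gradients from both tasks update the `shared` weights.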
Multitask Learning - Multilingual Speech Recognition
Human languages share some common characteristics.
[Figure: a network takes acoustic features as input and has separate outputs for the states of French, German, Spanish, Italian, and Mandarin]
Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu and
Haifeng Wang, “Multi-task learning for multiple language translation”, ACL 2015
Multitask Learning - Multilingual
[Plot: Character Error Rate (y-axis, 25 to 50) against a logarithmic x-axis (1 to 1000). Two curves: "Mandarin only" and "With European Language".]
[Overview: source data labelled. Target data labelled: Model Fine-tuning, Multitask Learning. Target data unlabeled: Domain-adaptation.]
Domain-adaptation
[Figure: results on two sets of testing data: 99.5% and 57.5%]
The results are from: http://proceedings.mlr.press/v37/ganin15.pdf
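[Figure: Source Domain and Target Domain panels, each with a class axis 1 to 5; example digit “8”.]
[Figure: a Feature Extractor (network) maps source and target inputs to features. The aim is for source and target features to follow the same distribution rather than different distributions.]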
Domain Adversarial Training
[Figure: image → Feature Extractor → Label Predictor → class distribution (e.g. “4”). Features of Source (labeled) data are shown as blue points; features of Target (unlabeled) data are shown as red points.]
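Domain Adversarial Training
[Figure: Feature Extractor (parameters $\theta_f$, labelled "Generator") → Label Predictor (parameters $\theta_p$) → “4”, with loss $L$]
$\theta_p^* = \arg\min_{\theta_p} L$
$\theta_f^* = \arg\min_{\theta_f} (L - L_d)$ (always zero?)
Here $L$ is the label predictor's loss and $L_d$ is the loss of the domain classifier.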
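The objectives for the feature extractor and the label predictor can be sketched as one gradient step on a toy model. This is a minimal illustration under stated assumptions, not the method of the cited paper: every component is a single scalar weight with a sigmoid output, and the domain classifier weight `theta_d` (which minimizes $L_d$) is a hypothetical name added here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, t):
    # Binary cross-entropy for a prediction p in (0, 1) and a target t in {0, 1}.
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))


def losses(theta_f, theta_p, theta_d, x, y, domain):
    """Return (L, L_d). y is None for an unlabeled target sample;
    domain is 0 for a source sample and 1 for a target sample."""
    feature = theta_f * x
    L = bce(sigmoid(theta_p * feature), y) if y is not None else 0.0
    L_d = bce(sigmoid(theta_d * feature), domain)
    return L, L_d


def adversarial_step(theta_f, theta_p, theta_d, x, y, domain, lr=0.1):
    feature = theta_f * x
    # For sigmoid + cross-entropy, d(loss)/d(logit) = prediction - target.
    err_label = (sigmoid(theta_p * feature) - y) if y is not None else 0.0
    err_domain = sigmoid(theta_d * feature) - domain
    grad_p = err_label * feature   # dL/d(theta_p)
    grad_d = err_domain * feature  # dL_d/d(theta_d)
    # Feature extractor descends L - L_d: it helps the label predictor
    # and works against the domain classifier.
    grad_f = err_label * theta_p * x - err_domain * theta_d * x
    return (theta_f - lr * grad_f, theta_p - lr * grad_p, theta_d - lr * grad_d)


new_f, new_p, new_d = adversarial_step(0.5, -0.3, 0.8, x=1.5, y=1, domain=0)
```

For an unlabeled target sample, pass `y=None`; only the domain term then contributes to the feature extractor's update.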
[Figure: unlabeled data → Feature Extractor → Label Predictor → distribution over classes 1 to 5, shown in two panels; one panel is marked "Large entropy"]
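"Large entropy" refers to the entropy of the label predictor's output distribution. A minimal sketch of that quantity for a 5-class output follows; the two example distributions are made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class distribution; zero entries add nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


peaked = [0.96, 0.01, 0.01, 0.01, 0.01]  # confident prediction: small entropy
uniform = [0.2, 0.2, 0.2, 0.2, 0.2]      # undecided prediction: large entropy

small = entropy(peaked)
large = entropy(uniform)
```

The uniform distribution gives the maximum value for five classes, `math.log(5)`; a sharply peaked distribution gives a value near zero.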
[Overview: source data labelled. Target data labelled: Model Fine-tuning, Multitask Learning. Target data unlabeled: Domain-adaptation, Zero-shot learning.]
Zero-shot learning
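[Figure: source data $x^s$ and target data $x^t$]
[Figure: Embedding Space. Images $x^1$, $x^2$, $x^3$ are mapped to $f(x^1)$, $f(x^2)$, $f(x^3)$. Attributes $y^1$ (attribute of chimp), $y^2$ (attribute of dog), $y^3$ (attribute of Alpaca) are mapped to $g(y^1)$, $g(y^2)$, $g(y^3)$.]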
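One way to use such an embedding space at test time is to embed the image with $f$ and pick the class whose attribute embedding $g(y)$ is nearest. The sketch below assumes Euclidean distance and uses made-up 2-dimensional vectors in place of learned $f$ and $g$; the name `zero_shot_classify` is hypothetical.

```python
import math


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def zero_shot_classify(image_embedding, attribute_embeddings):
    """Return the class whose attribute embedding g(y) is closest to f(x)."""
    return min(
        attribute_embeddings,
        key=lambda name: distance(image_embedding, attribute_embeddings[name]),
    )


# Hypothetical precomputed embeddings g(y) for each class's attributes.
g_of_y = {
    "chimp": [0.9, 0.1],
    "dog": [0.1, 0.9],
    "alpaca": [0.8, 0.8],
}
f_of_x3 = [0.7, 0.9]  # hypothetical embedding f(x) of an image from an unseen class

predicted = zero_shot_classify(f_of_x3, g_of_y)
```

Here `predicted` is `"alpaca"`. The class needs no training images, only an attribute vector that can be embedded with $g$.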
More about Zero-shot learning
• Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M.
Mitchell, “Zero-shot Learning with Semantic Output Codes”, NIPS
2009
• Zeynep Akata, Florent Perronnin, Zaid Harchaoui and Cordelia
Schmid, “Label-Embedding for Attribute-Based Classification”,
CVPR 2013
• Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff
Dean, Marc'Aurelio Ranzato, Tomas Mikolov, “DeViSE: A Deep
Visual-Semantic Embedding Model”, NIPS 2013
• Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram
Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey
Dean, “Zero-Shot Learning by Convex Combination of Semantic
Embeddings”, arXiv preprint 2013
• Subhashini Venugopalan, Lisa Anne Hendricks, Marcus
Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko,
“Captioning Images with Diverse Objects”, arXiv preprint 2016