Transfer Learning
http://weebly110810.weebly.com/396403913129399.html
• Dog/Cat Classifier: cat, dog
• Speech Recognition: English, Chinese, Taiwanese, ……
• Image Recognition: Medical Images
• Text Analysis: Specific domain, Webpages
Transfer Learning
• Example in real life
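Graduate student ↔ Manga artist
Running experiments ↔ Drawing storyboards
Advisor ↔ Editor in charge
Submitting to a journal ↔ Submitting to Jump
(word embedding knows that)
Bakuman (爆漫王)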
Transfer Learning - Overview
Source Data (not directly related to the task): labelled / unlabeled
Target Data: labelled
• Source labelled, Target labelled: Model Fine-tuning
[Figure: two networks with the same input layer. The source-trained network provides the initialization; fine-tuning on target data keeps the parameters close to it.]
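The "initialization" and "parameter close" labels can be illustrated with a minimal sketch. It assumes a one-parameter linear model and a squared penalty on the distance from the source weight; the function name `finetune`, the penalty strength `lam`, and the toy data are all hypothetical, not taken from the slides.

```python
def finetune(w_source, target_data, lam=1.0, lr=0.05, steps=200):
    """Fine-tune a 1-D linear model y = w * x on target data.

    The source-trained weight is the initialization, and the penalty
    lam * (w - w_source)**2 keeps the fine-tuned weight close to it.
    """
    w = w_source  # initialization from the source model
    n = len(target_data)
    for _ in range(steps):
        # Gradient of the mean squared error on the target data.
        grad_fit = sum(2 * (w * x - y) * x for x, y in target_data) / n
        # Gradient of the "parameter close" penalty.
        grad_close = 2 * lam * (w - w_source)
        w -= lr * (grad_fit + grad_close)
    return w


w_source = 1.0                           # hypothetical weight learned on source data
target_data = [(1.0, 2.0), (2.0, 4.0)]   # hypothetical tiny target set (slope 2)

w_free = finetune(w_source, target_data, lam=0.0)   # plain fine-tuning
w_close = finetune(w_source, target_data, lam=5.0)  # stays nearer to w_source
```

With very little target data, the penalized weight ends up between the source weight and the unconstrained fit, which is the intended effect of keeping parameters close.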
Layer Transfer - Image
fine-tune the whole network
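A rough sketch of layer transfer, assuming a model is just a dictionary from layer name to a list of weights. The helper names `transfer_layers` and `sgd_step`, the layer names, and the gradients are hypothetical. Passing an empty `frozen` set corresponds to fine-tuning the whole network; passing the copied layers trains only the remaining ones.

```python
import copy


def transfer_layers(source_model, layers_to_copy):
    """Copy the chosen layers from the source model; reset the others."""
    target, copied = {}, set()
    for name, weights in source_model.items():
        if name in layers_to_copy:
            target[name] = copy.deepcopy(weights)
            copied.add(name)
        else:
            # Toy choice: zeros. A real network would use random initialization.
            target[name] = [0.0] * len(weights)
    return target, copied


def sgd_step(model, grads, frozen, lr=0.1):
    """One gradient step that skips every layer listed in `frozen`."""
    for name, grad in grads.items():
        if name in frozen:
            continue
        model[name] = [w - lr * g for w, g in zip(model[name], grad)]


source_model = {"layer1": [0.5, -0.2], "layer2": [0.1], "output": [0.3]}
grads = {name: [1.0] * len(w) for name, w in source_model.items()}  # made-up gradients

target, copied = transfer_layers(source_model, {"layer1"})
sgd_step(target, grads, frozen=copied)   # train only the layers that were not copied
layer1_when_frozen = list(target["layer1"])
sgd_step(target, grads, frozen=set())    # fine-tune the whole network
layer1_after_full = list(target["layer1"])
```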
Overview (Source Data: labelled / unlabeled; Target Data: labelled / unlabeled)
• Source labelled, Target labelled: Fine-tuning, Multitask Learning
Multitask Learning
• The multi-layer structure makes NN suitable for multitask learning
[Figure: two networks, each with outputs for Task A and Task B. One takes a single shared input feature; the other takes an input feature for task A and an input feature for task B.]
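The shared-input case can be sketched as a forward pass in which both task outputs read the same hidden features. This is only an illustration: the weights, the sizes, and the function names are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def multitask_forward(x, shared_w, head_a_w, head_b_w):
    """Shared layer first, then one output head per task."""
    h = relu(linear(shared_w, x))  # features used by both tasks
    return linear(head_a_w, h), linear(head_b_w, h)


shared_w = [[1.0, -1.0], [0.5, 0.5]]   # hypothetical shared layer
head_a_w = [[1.0, 0.0]]                # hypothetical head for Task A
head_b_w = [[0.0, 2.0], [1.0, 1.0]]    # hypothetical head for Task B

out_a, out_b = multitask_forward([2.0, 1.0], shared_w, head_a_w, head_b_w)
```

Because both heads depend on `shared_w`, training on either task updates the shared layer, which is how data from one task can help the other.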
Multitask Learning - Multilingual Speech Recognition
[Figure: acoustic features feed one network with separate outputs for states of French, states of German, states of Spanish, states of Italian, and states of Mandarin.]
Human languages share some common characteristics.
Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu and Haifeng Wang, "Multi-task learning for multiple language translation", ACL 2015
Multitask Learning - Multilingual
[Plot: Character Error Rate (y-axis, 25 to 50) against x-axis ticks 1, 10, 100, 1000. Two curves: "Mandarin only" and "With European Language".]
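Overview (Source Data: labelled / unlabeled; Target Data: labelled / unlabeled)
• Source labelled, Target labelled: Fine-tuning, Multitask Learning
• Source labelled, Target unlabeled: Domain-adversarial training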
Task description
• Source data (with label): Training data
• Target data (without label): Testing data
Same task, mismatch
Domain-adversarial training
Similar to GAN: feature extractor against Domain classifier
Too easy for the feature extractor ……
Domain-adversarial training
• feature extractor: Maximize label classification accuracy + minimize domain classification accuracy
• Label predictor: Maximize label classification accuracy
• Domain classifier: Maximize domain classification accuracy
The feature extractor must not only cheat the domain classifier but also satisfy the label predictor at the same time.
This is a big network, but different parts have different goals.
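One common way to give the parts different goals is to reverse the sign of the domain gradient before it reaches the feature extractor. The sketch below assumes a deliberately tiny model: a scalar feature `w_f * x` with logistic units for the label predictor and the domain classifier. The function name `dann_gradients` and the weight `lam` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dann_gradients(params, x, domain, label=None, lam=1.0):
    """Gradients for one example in a scalar domain-adversarial model.

    `domain` is 0 or 1; `label` is None for target data (without label).
    Returns the gradients to DESCEND for (w_f, w_y, w_d).
    """
    w_f, w_y, w_d = params
    feat = w_f * x
    # Domain classifier: minimize its cross-entropy (maximize domain accuracy).
    d_err = sigmoid(w_d * feat) - domain
    grad_w_d = d_err * feat
    grad_feat_from_domain = d_err * w_d
    # Label predictor: only examples with a label contribute.
    if label is None:
        grad_w_y, grad_feat_from_label = 0.0, 0.0
    else:
        y_err = sigmoid(w_y * feat) - label
        grad_w_y = y_err * feat
        grad_feat_from_label = y_err * w_y
    # Feature extractor: follow the label gradient but REVERSE the domain
    # gradient, so the features get harder for the domain classifier.
    grad_w_f = (grad_feat_from_label - lam * grad_feat_from_domain) * x
    return grad_w_f, grad_w_y, grad_w_d


# Unlabelled target example: only the domain terms are active.
g_f, g_y, g_d = dann_gradients((1.0, 1.0, 1.0), x=1.0, domain=1, label=None)
g_f_no_adv, _, _ = dann_gradients((1.0, 1.0, 1.0), x=1.0, domain=1, lam=0.0)
```

In this toy setting `g_f` is exactly the negative of `g_d`: the domain classifier and the feature extractor move in opposite directions on the same loss.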
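Overview (Source Data: labelled / unlabeled; Target Data: labelled / unlabeled)
• Source labelled, Target labelled: Fine-tuning, Multitask Learning
• Source labelled, Target unlabeled: Domain-adversarial training, Zero-shot learning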
Zero-shot Learning
http://evchk.wikia.com/wiki/%E8%8D%89%E6%B3%A5%E9%A6%AC
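[Figure: training images with class labels cat, dog, ……]
[Figure: Embedding Space. Images x^1, x^2, x^3 are embedded as f(x^1), f(x^2), f(x^3). Attribute vectors y^1 (attribute of chimp), y^2 (attribute of dog), y^3 (attribute of Grass-mud horse) are embedded as g(y^1), g(y^2), g(y^3).]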
Zero-shot Learning
What if we don't have a database?
[Figure: Embedding Space. Word vectors V(chimp), V(dog), V(Grass-mud_horse) stand in for the attributes y^1 (attribute of chimp), y^2 (attribute of dog), y^3 (attribute of Grass-mud horse). Images x^1, x^2, x^3 are embedded as f(x^1), f(x^2), f(x^3) and the descriptions as g(y^1), g(y^2), g(y^3).]
Zero-shot Learning
$$f^*, g^* = \arg\min_{f,g} \sum_n \left\| f(x^n) - g(y^n) \right\|_2$$
Problem?
$$f^*, g^* = \arg\min_{f,g} \sum_n \max\Big(0,\; k - f(x^n) \cdot g(y^n) + \max_{m \neq n} f(x^n) \cdot g(y^m)\Big)$$
The n-th term is zero when
$$f(x^n) \cdot g(y^n) - \max_{m \neq n} f(x^n) \cdot g(y^m) > k$$
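The margin objective above can be evaluated directly on already-computed embeddings. The sketch below assumes `f_x[n]` holds f(x^n) and `g_y[n]` holds g(y^n) as plain lists; the function name `zero_shot_loss` and the toy vectors are hypothetical. It also shows one reading of "Problem?": if every embedding collapses to the same point, the distance objective is zero, while the margin objective is not.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def zero_shot_loss(f_x, g_y, k=1.0):
    """Sum over n of max(0, k - f(x^n).g(y^n) + max_{m != n} f(x^n).g(y^m)).

    Needs at least two classes so that the inner max is defined.
    """
    total = 0.0
    for n, fx in enumerate(f_x):
        matched = dot(fx, g_y[n])
        best_other = max(dot(fx, gy) for m, gy in enumerate(g_y) if m != n)
        total += max(0.0, k - matched + best_other)
    return total


# Well-separated embeddings: each pair beats every mismatch by more than k.
separated = zero_shot_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]])

# Collapsed embeddings: every distance is zero, but each term pays the margin k.
collapsed = zero_shot_loss([[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
```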
V(lion)
0.5V(tiger)+0.5V(lion)
Only need off-the-shelf NN for ImageNet and word vector
https://arxiv.org/pdf/1312.5650v3.pdf
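The combination 0.5V(tiger)+0.5V(lion) can be sketched as a probability-weighted average of word vectors followed by a nearest-word lookup. Everything concrete here is an assumption for illustration: the 2-D vectors, the extra words "liger" and "dog", the use of cosine similarity, and the function names.

```python
import math


def convex_combination(probs, word_vectors):
    """Average the class word vectors, weighted by classifier probabilities."""
    dim = len(next(iter(word_vectors.values())))
    mixed = [0.0] * dim
    for label, p in probs.items():
        for i, value in enumerate(word_vectors[label]):
            mixed[i] += p * value
    return mixed


def nearest_word(vector, word_vectors):
    """Return the word whose vector has the highest cosine similarity."""
    def cosine(u, v):
        num = sum(a * b for a, b in zip(u, v))
        den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return num / den
    return max(word_vectors, key=lambda w: cosine(vector, word_vectors[w]))


# Hypothetical 2-D word vectors; real ones would come from a trained embedding.
V = {"tiger": [1.0, 0.0], "lion": [0.0, 1.0], "liger": [0.7, 0.7], "dog": [-1.0, 0.2]}
probs = {"tiger": 0.5, "lion": 0.5}   # hypothetical output of an ImageNet classifier

mixed = convex_combination(probs, V)  # 0.5V(tiger) + 0.5V(lion)
prediction = nearest_word(mixed, V)
```

No training on the unseen class is involved: the classifier and the word vectors are used as they are, which matches the "off-the-shelf" remark.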
Example of Zero-shot Learning
Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen,
Nikhil Thorat. Google’s Multilingual Neural Machine Translation System: Enabling
Zero-Shot Translation, arXiv preprint 2016
Transfer Learning - Overview
Source Data (not directly related to the task): labelled / unlabeled
• Source unlabeled, Target labelled: Self-taught learning