
Transfer Learning

(image credits: http://weebly110810.weebly.com/396403913129399.html,
http://www.sucaitianxia.com/png/cartoon/200811/4261.html)

Task considered: a Dog/Cat classifier, trained on labelled cat and dog images.

Data not directly related to the task considered:
• elephant and tiger images → similar domain, different tasks
• cat and dog images of a different kind → different domains, same task

(image credits: http://www.bigr.nl/website/structure/main.php?page=researchlines&subpage=project&id=64,
http://www.spear.com.hk/Translation-company-Directory.html)

Why?

Task considered                        | Data not directly related
Speech recognition: Taiwanese          | English, Chinese, ……
Image recognition: medical images      |
Text analysis: a specific domain       | webpages
Transfer Learning
• Example in real life
  graduate student        ↔  manga artist
  running experiments     ↔  drawing storyboards
  advisor                 ↔  editor
  submitting to journals  ↔  submitting to Jump
  (word embedding knows that)   爆漫王 (Bakuman)
Transfer Learning - Overview
The methods are organized by whether the source data and the target data are labelled:
• Source data (not directly related to the task): labelled or unlabeled
• Target data: labelled or unlabeled
• Source labelled + target labelled → Model Fine-tuning
(Warning: different terminology in different literature)
Model Fine-tuning
• Task description
  • Source data: a large amount (labelled)
  • Target data: very little (labelled)
  • One-shot learning: only a few examples in the target domain
• Example: (supervised) speaker adaptation
  • Source data: audio data and transcriptions from many speakers
  • Target data: audio data and transcriptions of a specific user
• Idea: train a model on the source data, then fine-tune the model on the target data
• Challenge: only limited target data, so be careful about overfitting
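A minimal sketch of the fine-tuning idea in PyTorch, assuming a model already trained on the source data (source_model) and a small labelled target set (target_loader); both names are illustrative, not from the slides:

import copy
import torch

# Start from the model trained on the large source dataset.
model = copy.deepcopy(source_model)

# Small learning rate and few passes, because the target data is very
# limited and the model overfits easily.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(3):
    for x, y in target_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()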
Conservative Training
• Initialization: start from the model trained on the source data
  (e.g. audio data of many speakers)
• Fine-tune on the target data (e.g. a little data from the target speaker), but constrain
  the new model to stay close to the original one:
  • the outputs of the two models should be close on the same input, or
  • the parameters of the two models should be close
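One way to realise the "parameters close" constraint is an L2 penalty that pulls the fine-tuned parameters toward the source parameters. A sketch under that assumption (source_model, target_loader and the penalty weight lam are illustrative; keeping the outputs close is another option mentioned on the slide):

import copy
import torch

model = copy.deepcopy(source_model)                 # initialize from the source model
source_params = [p.detach().clone() for p in source_model.parameters()]

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
lam = 1e-2                                          # strength of the "stay close" constraint

for x, y in target_loader:                          # a little data from the target speaker
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # L2 penalty: new parameters should stay close to the source parameters.
    for p, p0 in zip(model.parameters(), source_params):
        loss = loss + lam * (p - p0).pow(2).sum()
    loss.backward()
    optimizer.step()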
Layer Transfer
• Copy some parameters (layers) from the model trained on the source data
• Then, on the target data:
  1. Only train the remaining layers (prevents overfitting when target data is limited)
  2. Fine-tune the whole network (if there is sufficient target data)
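A sketch of layer transfer in PyTorch: copy the first layers of the source model, freeze them, and train only the new layers (source_model, feature_dim and n_target_classes are assumptions for illustration):

import torch
import torch.nn as nn

# Copy (transfer) the first few layers from the source model and add new
# layers for the target task.
transferred = nn.Sequential(*list(source_model.children())[:2])
new_layers = nn.Sequential(nn.Linear(feature_dim, 128), nn.ReLU(),
                           nn.Linear(128, n_target_classes))
model = nn.Sequential(transferred, new_layers)

# 1. Only train the rest of the layers (prevents overfitting on small target data).
for p in transferred.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-3)

# 2. If there is sufficient target data, unfreeze everything and fine-tune
#    the whole network (usually with a smaller learning rate).
# for p in model.parameters():
#     p.requires_grad = True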
Layer Transfer
• Which layer can be transferred (copied)?
• Speech: usually copy the last few layers
• Image: usually copy the first few layers

(figure: a deep network mapping pixel inputs x1 … xN through Layer 1, Layer 2, …, Layer L to an output class such as "elephant")
Layer Transfer - Image
• Source task: 500 classes from ImageNet; target task: another 500 classes from ImageNet
• (figures: target-task accuracy as a function of the number of copied layers, comparing
  "only train the rest layers" with "fine-tune the whole network")

Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, "How transferable are features
in deep neural networks?", NIPS, 2014
Transfer Learning - Overview
• Source data labelled, target data labelled → Fine-tuning, Multitask Learning
Multitask Learning
• The multi-layer structure makes NN suitable for multitask learning
• Two common architectures:
  • Task A and Task B take the same input features, share the lower layers, and branch
    into task-specific output layers
  • Task A and Task B have different input features, which are mapped into shared
    intermediate layers before branching into task-specific outputs
• A sketch of the first architecture follows below.
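A minimal sketch of the shared-lower-layers architecture (layer sizes and names are illustrative assumptions):

import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim, n_classes_a, n_classes_b):
        super().__init__()
        # Lower layers are shared between the two tasks.
        self.shared = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                    nn.Linear(256, 256), nn.ReLU())
        # Each task keeps its own output layer.
        self.head_a = nn.Linear(256, n_classes_a)
        self.head_b = nn.Linear(256, n_classes_b)

    def forward(self, x):
        h = self.shared(x)
        return self.head_a(h), self.head_b(h)

# Training minimizes loss_a + loss_b, so data from either task updates the shared layers.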
Multitask Learning - Multilingual Speech Recognition
• One network takes acoustic features as input; the lower layers are shared, and separate
  output layers predict the states of French, German, Spanish, Italian and Mandarin
• Human languages share some common characteristics, so the shared layers can benefit
  from data in all the languages

Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu and Haifeng Wang,
"Multi-task learning for multiple language translation", ACL 2015
Multitask Learning - Multilingual
(figure: character error rate on Mandarin vs. hours of Mandarin training data, from 1 to
1000 hours; the model trained together with European languages reaches a lower error
rate than the Mandarin-only model)

Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep
neural network with shared hidden layers", ICASSP, 2013
Progressive Neural Networks
• Train a column of layers for Task 1; for Task 2, add a new column whose layers also
  receive the hidden activations of the (frozen) Task 1 column as extra inputs; Task 3
  adds another column connected to both previous columns
• Because earlier columns are frozen, learning a new task does not degrade the old ones

Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James
Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell, "Progressive Neural
Networks", arXiv preprint 2016
Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A.
Rusu, Alexander Pritzel, Daan Wierstra, "PathNet: Evolution Channels Gradient Descent
in Super Neural Networks", arXiv preprint, 2017
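A rough two-column sketch of the idea (not the paper's exact architecture; layer sizes and class names are assumptions): column 1 is trained on Task 1 and frozen, and column 2, trained on Task 2, also receives column 1's hidden activations through a lateral connection.

import torch
import torch.nn as nn

class Column1(nn.Module):
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.layer1 = nn.Linear(in_dim, hidden)
        self.layer2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h1 = torch.relu(self.layer1(x))
        h2 = torch.relu(self.layer2(h1))
        return self.out(h2)

class Column2(nn.Module):
    """Column for Task 2: gets a lateral input from the frozen Task 1 column."""
    def __init__(self, in_dim, hidden, n_classes, column1):
        super().__init__()
        self.column1 = column1                       # trained on Task 1, kept frozen
        for p in self.column1.parameters():
            p.requires_grad = False
        self.layer1 = nn.Linear(in_dim, hidden)
        # Layer 2 sees this column's layer-1 output plus column 1's layer-1 output.
        self.layer2 = nn.Linear(hidden * 2, hidden)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h1_old = torch.relu(self.column1.layer1(x))  # lateral feature from column 1
        h1 = torch.relu(self.layer1(x))
        h2 = torch.relu(self.layer2(torch.cat([h1, h1_old], dim=-1)))
        return self.out(h2)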
Transfer Learning - Overview
• Source data labelled, target data labelled → Fine-tuning, Multitask Learning
• Source data labelled, target data unlabeled → Domain-adversarial training
Task description
• Source data: training data (with labels)
• Target data: testing data (without labels)
• Same task, but the two domains do not match
Domain-adversarial training
• Learn a feature extractor such that the features of source and target data have the
  same distribution (similar in spirit to GAN)
• A domain classifier tries to tell whether a feature comes from the source or the target
  domain. If the feature extractor only had to fool the domain classifier, the task would
  be too easy (e.g. it could map everything to the same point), so a label predictor is
  added on top of the same features.
• The network has three parts with different goals:
  • Feature extractor: maximize label classification accuracy and minimize domain
    classification accuracy (not only cheat the domain classifier, but also satisfy the
    label predictor at the same time)
  • Label predictor: maximize label classification accuracy
  • Domain classifier: maximize domain classification accuracy
• This is a big network, but different parts have different goals.
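Ganin & Lempitsky implement these opposing goals with a gradient reversal layer. A sketch in PyTorch, assuming feature_extractor, label_predictor and domain_classifier modules exist (the names are illustrative):

import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam in the backward
    pass, so the feature extractor is updated to make the domain classifier fail
    while the domain classifier itself is trained normally."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

# Typical wiring (pseudo-usage):
# features      = feature_extractor(x)
# class_logits  = label_predictor(features)                       # labelled source data only
# domain_logits = domain_classifier(GradReverse.apply(features, 1.0))
# loss = label_loss(class_logits, y) + domain_loss(domain_logits, domain_label)
# loss.backward()   # one backward pass trains all three parts toward their own goals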
Domain-adversarial training
• In the end the domain classifier should fail, but it should struggle before it fails:
  it has to be trained seriously, otherwise the features are never forced to become
  domain-invariant.

Yaroslav Ganin, Victor Lempitsky, "Unsupervised Domain Adaptation by Backpropagation",
ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
"Domain-Adversarial Training of Neural Networks", JMLR, 2016
Domain-adversarial training
(experimental results; see Ganin & Lempitsky 2015 and Ajakan et al. 2016, cited above)
Transfer Learning - Overview
• Source data labelled, target data labelled → Fine-tuning, Multitask Learning
• Source data labelled, target data unlabeled → Domain-adversarial training,
  Zero-shot learning
Zero-shot Learning
• Source data: training data; target data: testing data. The tasks are different:
  the training classes (cat, dog, ……) do not include the classes that appear at testing
  time (e.g. the grass-mud horse).
• In speech recognition, we cannot have all possible words in the source (training)
  data. How do we solve this problem in speech recognition? (We recognize smaller units
  such as phonemes instead of whole words; the same idea, describing each class by a set
  of attributes, is used below.)
(image credit: http://evchk.wikia.com/wiki/%E8%8D%89%E6%B3%A5%E9%A6%AC)
Zero-shot Learning
• Representing each class by its attributes
• Training: the NN is trained to output the attributes of each image instead of its class.
  For example, an image of a chimp is labelled (furry 1, 4 legs 0, tail 0) and an image
  of a dog is labelled (furry 1, 4 legs 1, tail 1).
• A database records the attributes of every class:
      class    furry   4 legs   tail   …
      Dog        O       O       O
      Fish       X       X       O
      Chimp      O       X       X
• We need sufficient attributes so that the mapping from class to attributes is one-to-one.
Zero-shot Learning
• Representing each class by its attributes
• Testing: the NN predicts the attributes of the test image (e.g. furry 0, 4 legs 0, tail 1),
  then we find the class in the database with the most similar attributes (here: Fish).
      class    furry   4 legs   tail   …
      Dog        O       O       O
      Fish       X       X       O
      Chimp      O       X       X
• With sufficient attributes for a one-to-one mapping, even a class that never appeared
  in the training data can be identified.
Zero-shot Learning
• Attribute embedding: learn two functions f and g (both can be NNs) that map an image
  x^n and an attribute vector y^n into the same embedding space.
• Training target: f(x^n) and g(y^n) should be as close as possible, e.g. f(x^1) close to
  g(y^1) (attributes of chimp), f(x^2) close to g(y^2) (attributes of dog), f(x^3) close
  to g(y^3) (attributes of the grass-mud horse).
• A test image x is then assigned to the class whose embedded attributes g(y) are
  nearest to f(x).
What if we don’t
Zero-shot Learning have database

• Attribute embedding + word embedding

x 2 y1 (attribute
V(chimp)
of chimp) y2 (attribute
V(dog)
x1 of dog)

(
𝑓 𝑥
2
) 𝑔 ( 𝑦 2
)
𝑓 ( 𝑥1) 𝑔 ( 𝑦1)

𝑔 ( 𝑦3) 𝑓 ( 𝑦 )
3
V(Grass-of
y3 (attribute x3
mud_horse)
Grass-mud
Embedding Space
horse)
Zero-shot Learning
• The straightforward objective

      f*, g* = arg min_{f,g} Σ_n || f(x^n) − g(y^n) ||_2

  has a problem: f and g can simply map every image and every attribute vector to the
  same point, which gives zero loss without learning anything useful.
• Use a max-margin objective instead:

      f*, g* = arg min_{f,g} Σ_n max( 0, k − f(x^n)·g(y^n) + max_{m≠n} f(x^n)·g(y^m) )

  where k is the margin you defined.
• The loss of the n-th example is zero exactly when

      k − f(x^n)·g(y^n) + max_{m≠n} f(x^n)·g(y^m) < 0
      ⇔  f(x^n)·g(y^n) − max_{m≠n} f(x^n)·g(y^m) > k

  i.e. f(x^n) and g(y^n) are as close as possible, while f(x^n) and the embeddings of
  the wrong classes g(y^m), m ≠ n, are not as close.
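A sketch of this max-margin objective in PyTorch, assuming f and g have already mapped a batch of images and their matching class attributes into the same space (function and argument names are illustrative):

import torch

def zero_shot_margin_loss(f_x, g_y, k=0.1):
    """f_x: image embeddings f(x^n), shape (N, d);
    g_y: embeddings g(y^n) of the matching classes, shape (N, d);
    k: the margin you defined."""
    scores = f_x @ g_y.t()                 # scores[n, m] = f(x^n) . g(y^m)
    pos = scores.diag()                    # f(x^n) . g(y^n)
    # best score against a wrong class: max over m != n of f(x^n) . g(y^m)
    mask = torch.eye(f_x.size(0), dtype=torch.bool, device=f_x.device)
    neg = scores.masked_fill(mask, float('-inf')).max(dim=1).values
    # hinge: zero once f(x^n).g(y^n) exceeds every wrong score by at least k
    return torch.clamp(k - pos + neg, min=0).sum()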


Zero-shot Learning
• Convex Combination of Semantic Embeddings
• An off-the-shelf classifier gives, say, lion 0.5 and tiger 0.5 for a test image. Mix the
  word vectors of those classes with the same weights, 0.5 V(tiger) + 0.5 V(lion), and
  find the word vector closest to the mixture; here it is V(liger), so the image is
  labelled "liger".
• Only an off-the-shelf NN for ImageNet and pre-trained word vectors are needed; no
  additional training is required.
https://arxiv.org/pdf/1312.5650v3.pdf
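A small sketch of the convex-combination step (dictionary keys such as "lion" and "liger" are only examples; word_vec is assumed to map class names to word vectors):

import numpy as np

def convex_combination_prediction(probs, word_vec, candidate_classes):
    """probs: classifier output over known classes, e.g. {"lion": 0.5, "tiger": 0.5};
    word_vec: dict from class name to word vector;
    candidate_classes: names of the (possibly unseen) classes to choose from."""
    # Mix the word vectors of the recognised classes, weighted by their probabilities.
    mixed = sum(p * word_vec[c] for c, p in probs.items())

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Return the candidate whose word vector is closest to the mixture, e.g. "liger".
    return max(candidate_classes, key=lambda c: cosine(word_vec[c], mixed))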
Example of Zero-shot Learning
• Google's multilingual neural machine translation system can translate between language
  pairs that never appeared together in its training data (zero-shot translation).

Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen,
Nikhil Thorat, "Google's Multilingual Neural Machine Translation System: Enabling
Zero-Shot Translation", arXiv preprint 2016
Transfer Learning - Overview
All four combinations of labelled/unlabeled source and target data:
• Source labelled, target labelled → Fine-tuning, Multitask Learning
• Source labelled, target unlabeled → Domain-adversarial training, Zero-shot learning
• Source unlabeled, target labelled → Self-taught learning
  (different from semi-supervised learning)
  Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng, "Self-taught
  learning: transfer learning from unlabeled data", ICML, 2007
• Source unlabeled, target unlabeled → Self-taught Clustering
  Wenyuan Dai, Qiang Yang, Gui-Rong Xue, Yong Yu, "Self-taught clustering", ICML 2008
Self-taught learning
• Learn to extract a better representation from the (unlabeled) source data
  (an unsupervised approach), then use it to extract a better representation of the
  target data; a minimal sketch follows below.
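Raina et al. use sparse coding; the sketch below uses a plain autoencoder only to illustrate the two steps, assuming source_loader yields unlabelled 784-dimensional inputs and target_x holds the target data (all names and sizes are assumptions):

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Linear(128, 784)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

# Step 1: learn a representation from the unlabeled source data.
for x in source_loader:
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x)), x)
    loss.backward()
    opt.step()

# Step 2: extract the learned representation for the target data and train a
# simple classifier on top of it.
# target_features = encoder(target_x)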
Acknowledgement
• Thanks to 劉致廷 for pointing out errors on the slides during the lecture.
Appendix
More about Zero-shot learning
• Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell,
“Zero-shot Learning with Semantic Output Codes”, NIPS 2009
• Zeynep Akata, Florent Perronnin, Zaid Harchaoui and Cordelia Schmid,
“Label-Embedding for Attribute-Based Classification”, CVPR 2013
• Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean,
Marc'Aurelio Ranzato, Tomas Mikolov, “DeViSE: A Deep Visual-
Semantic Embedding Model”, NIPS 2013
• Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram
Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean,
“Zero-Shot Learning by Convex Combination of Semantic
Embeddings”, arXiv preprint 2013
• Subhashini Venugopalan, Lisa Anne Hendricks, Marcus
Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko, “Captioning
Images with Diverse Objects”, arXiv preprint 2016
