
Transfer Learning

(image credits: http://weebly110810.weebly.com/396403913129399.html,
http://www.sucaitianxia.com/png/cartoon/200811/4261.html)

Task considered: a Dog/Cat classifier, trained on labelled cat and dog images.

Data not directly related to the task considered:
• elephant and tiger images → similar domain, different tasks
• cat and dog images of a different kind → different domains, same task

(image credits: http://www.bigr.nl/website/structure/main.php?page=researchlines&subpage=project&id=64,
http://www.spear.com.hk/Translation-company-Directory.html)

Why?

Task considered                        | Data not directly related
Speech recognition: Taiwanese          | English, Chinese, ……
Image recognition: medical images      |
Text analysis: a specific domain       | webpages
Transfer Learning
• Example in real life
  graduate student        ↔  manga artist
  running experiments     ↔  drawing storyboards
  advisor                 ↔  editor
  submitting to journals  ↔  submitting to Jump
  (word embedding knows that)   爆漫王 (Bakuman)
Transfer Learning - Overview
The methods are organized by whether the source data and the target data are labelled:
• Source data (not directly related to the task): labelled or unlabeled
• Target data: labelled or unlabeled
• Source labelled + target labelled → Model Fine-tuning
(Warning: different terminology in different literature)
Model Fine-tuning
• Task description
  • Source data: a large amount (labelled)
  • Target data: very little (labelled)
  • One-shot learning: only a few examples in the target domain
• Example: (supervised) speaker adaptation
  • Source data: audio data and transcriptions from many speakers
  • Target data: audio data and transcriptions of a specific user
• Idea: train a model on the source data, then fine-tune the model on the target data
• Challenge: only limited target data, so be careful about overfitting
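A minimal sketch of the fine-tuning idea in PyTorch, assuming a model already trained on the source data (source_model) and a small labelled target set (target_loader); both names are illustrative, not from the slides:

import copy
import torch

# Start from the model trained on the large source dataset.
model = copy.deepcopy(source_model)

# Small learning rate and few passes, because the target data is very
# limited and the model overfits easily.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(3):
    for x, y in target_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()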
Conservative Training
• Initialization: start from the model trained on the source data
  (e.g. audio data of many speakers)
• Fine-tune on the target data (e.g. a little data from the target speaker), but constrain
  the new model to stay close to the original one:
  • the outputs of the two models should be close on the same input, or
  • the parameters of the two models should be close
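One way to realise the "parameters close" constraint is an L2 penalty that pulls the fine-tuned parameters toward the source parameters. A sketch under that assumption (source_model, target_loader and the penalty weight lam are illustrative; keeping the outputs close is another option mentioned on the slide):

import copy
import torch

model = copy.deepcopy(source_model)                 # initialize from the source model
source_params = [p.detach().clone() for p in source_model.parameters()]

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
lam = 1e-2                                          # strength of the "stay close" constraint

for x, y in target_loader:                          # a little data from the target speaker
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # L2 penalty: new parameters should stay close to the source parameters.
    for p, p0 in zip(model.parameters(), source_params):
        loss = loss + lam * (p - p0).pow(2).sum()
    loss.backward()
    optimizer.step()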
Layer Transfer
• Copy some parameters (layers) from the model trained on the source data
• Then, on the target data:
  1. Only train the remaining layers (prevents overfitting when target data is limited)
  2. Fine-tune the whole network (if there is sufficient target data)
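A sketch of layer transfer in PyTorch: copy the first layers of the source model, freeze them, and train only the new layers (source_model, feature_dim and n_target_classes are assumptions for illustration):

import torch
import torch.nn as nn

# Copy (transfer) the first few layers from the source model and add new
# layers for the target task.
transferred = nn.Sequential(*list(source_model.children())[:2])
new_layers = nn.Sequential(nn.Linear(feature_dim, 128), nn.ReLU(),
                           nn.Linear(128, n_target_classes))
model = nn.Sequential(transferred, new_layers)

# 1. Only train the rest of the layers (prevents overfitting on small target data).
for p in transferred.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-3)

# 2. If there is sufficient target data, unfreeze everything and fine-tune
#    the whole network (usually with a smaller learning rate).
# for p in model.parameters():
#     p.requires_grad = True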
Layer Transfer
• Which layer can be transferred (copied)?
• Speech: usually copy the last few layers
• Image: usually copy the first few layers

(figure: a deep network mapping pixel inputs x1 … xN through Layer 1, Layer 2, …, Layer L to an output class such as "elephant")
Layer Transfer - Image
• Source task: 500 classes from ImageNet; target task: another 500 classes from ImageNet
• (figures: target-task accuracy as a function of the number of copied layers, comparing
  "only train the rest layers" with "fine-tune the whole network")

Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, "How transferable are features
in deep neural networks?", NIPS, 2014
Transfer Learning - Overview
• Source data labelled, target data labelled → Fine-tuning, Multitask Learning
Multitask Learning
• The multi-layer structure makes NN suitable for multitask learning
• Two common architectures:
  • Task A and Task B take the same input features, share the lower layers, and branch
    into task-specific output layers
  • Task A and Task B have different input features, which are mapped into shared
    intermediate layers before branching into task-specific outputs
• A sketch of the first architecture follows below.
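A minimal sketch of the shared-lower-layers architecture (layer sizes and names are illustrative assumptions):

import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim, n_classes_a, n_classes_b):
        super().__init__()
        # Lower layers are shared between the two tasks.
        self.shared = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                    nn.Linear(256, 256), nn.ReLU())
        # Each task keeps its own output layer.
        self.head_a = nn.Linear(256, n_classes_a)
        self.head_b = nn.Linear(256, n_classes_b)

    def forward(self, x):
        h = self.shared(x)
        return self.head_a(h), self.head_b(h)

# Training minimizes loss_a + loss_b, so data from either task updates the shared layers.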
Multitask Learning - Multilingual Speech Recognition
• One network takes acoustic features as input; the lower layers are shared, and separate
  output layers predict the states of French, German, Spanish, Italian and Mandarin
• Human languages share some common characteristics, so the shared layers can benefit
  from data in all the languages

Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu and Haifeng Wang,
"Multi-task learning for multiple language translation", ACL 2015
Multitask Learning - Multilingual
(figure: character error rate on Mandarin vs. hours of Mandarin training data, from 1 to
1000 hours; the model trained together with European languages reaches a lower error
rate than the Mandarin-only model)

Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep
neural network with shared hidden layers", ICASSP, 2013
Progressive Neural Networks
• Train a column of layers for Task 1; for Task 2, add a new column whose layers also
  receive the hidden activations of the (frozen) Task 1 column as extra inputs; Task 3
  adds another column connected to both previous columns
• Because earlier columns are frozen, learning a new task does not degrade the old ones

Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James
Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell, "Progressive Neural
Networks", arXiv preprint 2016
Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A.
Rusu, Alexander Pritzel, Daan Wierstra, "PathNet: Evolution Channels Gradient Descent
in Super Neural Networks", arXiv preprint, 2017
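A rough two-column sketch of the idea (not the paper's exact architecture; layer sizes and class names are assumptions): column 1 is trained on Task 1 and frozen, and column 2, trained on Task 2, also receives column 1's hidden activations through a lateral connection.

import torch
import torch.nn as nn

class Column1(nn.Module):
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.layer1 = nn.Linear(in_dim, hidden)
        self.layer2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h1 = torch.relu(self.layer1(x))
        h2 = torch.relu(self.layer2(h1))
        return self.out(h2)

class Column2(nn.Module):
    """Column for Task 2: gets a lateral input from the frozen Task 1 column."""
    def __init__(self, in_dim, hidden, n_classes, column1):
        super().__init__()
        self.column1 = column1                       # trained on Task 1, kept frozen
        for p in self.column1.parameters():
            p.requires_grad = False
        self.layer1 = nn.Linear(in_dim, hidden)
        # Layer 2 sees this column's layer-1 output plus column 1's layer-1 output.
        self.layer2 = nn.Linear(hidden * 2, hidden)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h1_old = torch.relu(self.column1.layer1(x))  # lateral feature from column 1
        h1 = torch.relu(self.layer1(x))
        h2 = torch.relu(self.layer2(torch.cat([h1, h1_old], dim=-1)))
        return self.out(h2)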
Transfer Learning - Overview
• Source data labelled, target data labelled → Fine-tuning, Multitask Learning
• Source data labelled, target data unlabeled → Domain-adversarial training
Task description
• Source data: training data (with labels)
• Target data: testing data (without labels)
• Same task, but the two domains do not match
Domain-adversarial training
• Learn a feature extractor such that the features of source and target data have the
  same distribution (similar in spirit to GAN)
• A domain classifier tries to tell whether a feature comes from the source or the target
  domain. If the feature extractor only had to fool the domain classifier, the task would
  be too easy (e.g. it could map everything to the same point), so a label predictor is
  added on top of the same features.
• The network has three parts with different goals:
  • Feature extractor: maximize label classification accuracy and minimize domain
    classification accuracy (not only cheat the domain classifier, but also satisfy the
    label predictor at the same time)
  • Label predictor: maximize label classification accuracy
  • Domain classifier: maximize domain classification accuracy
• This is a big network, but different parts have different goals.
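Ganin & Lempitsky implement these opposing goals with a gradient reversal layer. A sketch in PyTorch, assuming feature_extractor, label_predictor and domain_classifier modules exist (the names are illustrative):

import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam in the backward
    pass, so the feature extractor is updated to make the domain classifier fail
    while the domain classifier itself is trained normally."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

# Typical wiring (pseudo-usage):
# features      = feature_extractor(x)
# class_logits  = label_predictor(features)                       # labelled source data only
# domain_logits = domain_classifier(GradReverse.apply(features, 1.0))
# loss = label_loss(class_logits, y) + domain_loss(domain_logits, domain_label)
# loss.backward()   # one backward pass trains all three parts toward their own goals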
Domain-adversarial training
• In the end the domain classifier should fail, but it should struggle before it fails:
  it has to be trained seriously, otherwise the features are never forced to become
  domain-invariant.

Yaroslav Ganin, Victor Lempitsky, "Unsupervised Domain Adaptation by Backpropagation",
ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
"Domain-Adversarial Training of Neural Networks", JMLR, 2016
Domain-adversarial training
(experimental results; see Ganin & Lempitsky 2015 and Ajakan et al. 2016, cited above)
Transfer Learning - Overview
• Source data labelled, target data labelled → Fine-tuning, Multitask Learning
• Source data labelled, target data unlabeled → Domain-adversarial training,
  Zero-shot learning
Zero-shot Learning
• Source data: training data; target data: testing data. The tasks are different:
  the training classes (cat, dog, ……) do not include the classes that appear at testing
  time (e.g. the grass-mud horse).
• In speech recognition, we cannot have all possible words in the source (training)
  data. How do we solve this problem in speech recognition? (We recognize smaller units
  such as phonemes instead of whole words; the same idea, describing each class by a set
  of attributes, is used below.)
(image credit: http://evchk.wikia.com/wiki/%E8%8D%89%E6%B3%A5%E9%A6%AC)
Zero-shot Learning
• Representing each class by its attributes
• Training: the NN is trained to output the attributes of each image instead of its class.
  For example, an image of a chimp is labelled (furry 1, 4 legs 0, tail 0) and an image
  of a dog is labelled (furry 1, 4 legs 1, tail 1).
• A database records the attributes of every class:
      class    furry   4 legs   tail   …
      Dog        O       O       O
      Fish       X       X       O
      Chimp      O       X       X
• We need sufficient attributes so that the mapping from class to attributes is one-to-one.
Zero-shot Learning
• Representing each class by its attributes
• Testing: the NN predicts the attributes of the test image (e.g. furry 0, 4 legs 0, tail 1),
  then we find the class in the database with the most similar attributes (here: Fish).
      class    furry   4 legs   tail   …
      Dog        O       O       O
      Fish       X       X       O
      Chimp      O       X       X
• With sufficient attributes for a one-to-one mapping, even a class that never appeared
  in the training data can be identified.
Zero-shot Learning
• Attribute embedding: learn two functions f and g (both can be NNs) that map an image
  x^n and an attribute vector y^n into the same embedding space.
• Training target: f(x^n) and g(y^n) should be as close as possible, e.g. f(x^1) close to
  g(y^1) (attributes of chimp), f(x^2) close to g(y^2) (attributes of dog), f(x^3) close
  to g(y^3) (attributes of the grass-mud horse).
• A test image x is then assigned to the class whose embedded attributes g(y) are
  nearest to f(x).
What if we don’t
Zero-shot Learning have database

• Attribute embedding + word embedding

x 2 y1 (attribute
V(chimp)
of chimp) y2 (attribute
V(dog)
x1 of dog)

(
𝑓 𝑥
2
) 𝑔 ( 𝑦 2
)
𝑓 ( 𝑥1) 𝑔 ( 𝑦1)

𝑔 ( 𝑦3) 𝑓 ( 𝑦 )
3
V(Grass-of
y3 (attribute x3
mud_horse)
Grass-mud
Embedding Space
horse)
Zero-shot Learning
• The straightforward objective

      f*, g* = arg min_{f,g} Σ_n || f(x^n) − g(y^n) ||_2

  has a problem: f and g can simply map every image and every attribute vector to the
  same point, which gives zero loss without learning anything useful.
• Use a max-margin objective instead:

      f*, g* = arg min_{f,g} Σ_n max( 0, k − f(x^n)·g(y^n) + max_{m≠n} f(x^n)·g(y^m) )

  where k is the margin you defined.
• The loss of the n-th example is zero exactly when

      k − f(x^n)·g(y^n) + max_{m≠n} f(x^n)·g(y^m) < 0
      ⇔  f(x^n)·g(y^n) − max_{m≠n} f(x^n)·g(y^m) > k

  i.e. f(x^n) and g(y^n) are as close as possible, while f(x^n) and the embeddings of
  the wrong classes g(y^m), m ≠ n, are not as close.
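A sketch of this max-margin objective in PyTorch, assuming f and g have already mapped a batch of images and their matching class attributes into the same space (function and argument names are illustrative):

import torch

def zero_shot_margin_loss(f_x, g_y, k=0.1):
    """f_x: image embeddings f(x^n), shape (N, d);
    g_y: embeddings g(y^n) of the matching classes, shape (N, d);
    k: the margin you defined."""
    scores = f_x @ g_y.t()                 # scores[n, m] = f(x^n) . g(y^m)
    pos = scores.diag()                    # f(x^n) . g(y^n)
    # best score against a wrong class: max over m != n of f(x^n) . g(y^m)
    mask = torch.eye(f_x.size(0), dtype=torch.bool, device=f_x.device)
    neg = scores.masked_fill(mask, float('-inf')).max(dim=1).values
    # hinge: zero once f(x^n).g(y^n) exceeds every wrong score by at least k
    return torch.clamp(k - pos + neg, min=0).sum()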


Zero-shot Learning
• Convex Combination of Semantic Embeddings
• An off-the-shelf classifier gives, say, lion 0.5 and tiger 0.5 for a test image. Mix the
  word vectors of those classes with the same weights, 0.5 V(tiger) + 0.5 V(lion), and
  find the word vector closest to the mixture; here it is V(liger), so the image is
  labelled "liger".
• Only an off-the-shelf NN for ImageNet and pre-trained word vectors are needed; no
  additional training is required.
https://arxiv.org/pdf/1312.5650v3.pdf
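A small sketch of the convex-combination step (dictionary keys such as "lion" and "liger" are only examples; word_vec is assumed to map class names to word vectors):

import numpy as np

def convex_combination_prediction(probs, word_vec, candidate_classes):
    """probs: classifier output over known classes, e.g. {"lion": 0.5, "tiger": 0.5};
    word_vec: dict from class name to word vector;
    candidate_classes: names of the (possibly unseen) classes to choose from."""
    # Mix the word vectors of the recognised classes, weighted by their probabilities.
    mixed = sum(p * word_vec[c] for c, p in probs.items())

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Return the candidate whose word vector is closest to the mixture, e.g. "liger".
    return max(candidate_classes, key=lambda c: cosine(word_vec[c], mixed))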
Example of Zero-shot Learning
• Google's multilingual neural machine translation system can translate between language
  pairs that never appeared together in its training data (zero-shot translation).

Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen,
Nikhil Thorat, "Google's Multilingual Neural Machine Translation System: Enabling
Zero-Shot Translation", arXiv preprint 2016
Transfer Learning - Overview
All four combinations of labelled/unlabeled source and target data:
• Source labelled, target labelled → Fine-tuning, Multitask Learning
• Source labelled, target unlabeled → Domain-adversarial training, Zero-shot learning
• Source unlabeled, target labelled → Self-taught learning
  (different from semi-supervised learning)
  Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng, "Self-taught
  learning: transfer learning from unlabeled data", ICML, 2007
• Source unlabeled, target unlabeled → Self-taught Clustering
  Wenyuan Dai, Qiang Yang, Gui-Rong Xue, Yong Yu, "Self-taught clustering", ICML 2008
Self-taught learning
• Learn to extract a better representation from the (unlabeled) source data
  (an unsupervised approach), then use it to extract a better representation of the
  target data; a minimal sketch follows below.
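Raina et al. use sparse coding; the sketch below uses a plain autoencoder only to illustrate the two steps, assuming source_loader yields unlabelled 784-dimensional inputs and target_x holds the target data (all names and sizes are assumptions):

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Linear(128, 784)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

# Step 1: learn a representation from the unlabeled source data.
for x in source_loader:
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x)), x)
    loss.backward()
    opt.step()

# Step 2: extract the learned representation for the target data and train a
# simple classifier on top of it.
# target_features = encoder(target_x)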
Acknowledgement
• Thanks to 劉致廷 for pointing out errors on the slides during the lecture.
Appendix
More about Zero-shot learning
• Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell,
“Zero-shot Learning with Semantic Output Codes”, NIPS 2009
• Zeynep Akata, Florent Perronnin, Zaid Harchaoui and Cordelia Schmid,
“Label-Embedding for Attribute-Based Classification”, CVPR 2013
• Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean,
Marc'Aurelio Ranzato, Tomas Mikolov, “DeViSE: A Deep Visual-
Semantic Embedding Model”, NIPS 2013
• Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram
Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean,
“Zero-Shot Learning by Convex Combination of Semantic
Embeddings”, arXiv preprint 2013
• Subhashini Venugopalan, Lisa Anne Hendricks, Marcus
Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko, “Captioning
Images with Diverse Objects”, arXiv preprint 2016
