This lecture covers meta-learning and transfer learning in reinforcement learning (RL), emphasizing the importance of prior knowledge and task similarity for effective learning. It outlines strategies for transfer learning, such as forward transfer, multi-task transfer, and meta-learning, addresses challenges like domain shift and finetuning issues, and highlights how meta-learning can accelerate learning by leveraging experience from multiple tasks and improving exploration.


Meta-Learning & Transfer Learning

CS 285
Instructor: Sergey Levine
UC Berkeley
What’s the problem?

[Figure: two example tasks, one labeled "this is easy (mostly)" and one labeled "this is impossible". Why?]
Montezuma’s revenge
• Getting key = reward
• Opening door = reward
• Getting killed by skull = bad
Montezuma’s revenge
• We know what to do because we understand what
these sprites mean!
• Key: we know it opens doors!
• Ladders: we know we can climb them!
• Skull: we don’t know what it does, but we know it
can’t be good!
• Prior understanding of problem structure can help
us solve complex tasks quickly!
Can RL use the same prior knowledge as us?

• If we’ve solved prior tasks, we might acquire useful knowledge for solving a new task
• How is the knowledge stored?
• Q-function: tells us which actions or states are good
• Policy: tells us which actions are potentially useful
• some actions are never useful!
• Models: what are the laws of physics that govern the world?
• Features/hidden states: provide us with a good representation
• Don’t underestimate this!
Transfer learning terminology

transfer learning: using experience from one set of tasks for faster
learning and better performance on a new task

in RL, task = MDP!
source domain → target domain

“shot”: number of attempts in the target domain
• 0-shot: just run a policy trained in the source domain
• 1-shot: try the task once
• few-shot: try the task a few times
How can we frame transfer learning problems?

1. Forward transfer: learn policies that transfer effectively


a) Train on source task, then run on target task (or finetune)
b) Relies on the tasks being quite similar!
2. Multi-task transfer: train on many tasks, transfer to a new task
a) Sharing representations and layers across tasks in multi-task learning
b) New task needs to be similar to the distribution of training tasks
3. Meta-learning: learn to learn on many tasks
a) Accounts for the fact that we’ll be adapting to a new task during training!
Pretraining + Finetuning

The most popular transfer learning method in (supervised) deep learning!


What issues are we likely to face?
➢ Domain shift: representations learned in the source domain might not work well in the target domain
➢ Difference in the MDP: some things that are possible to do in the source domain are not possible to do in the target domain
➢ Finetuning issues: if pretraining & finetuning, the finetuning process may still need to explore, but the optimal policy during finetuning may be deterministic!
Domain adaptation in computer vision
[Figure: domain-adversarial training. Train on the source domain ("train here") and aim to do well on the target domain ("do well here") with the same network. A domain classifier guesses the domain from the intermediate features z; its gradient is reversed into the feature layer, forcing that layer to be invariant to domain (correct answer for the label head, incorrect answer for the domain classifier).]

Is this true?
Invariance assumption: everything that is different between domains is irrelevant.
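To make the reversed-gradient idea concrete, here is a minimal sketch of domain-adversarial feature learning in PyTorch. The encoder, heads, dimensions, and loss weighting are illustrative assumptions, not the reference implementation of Ganin et al.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient: the encoder is pushed to *confuse* the domain classifier.
        return -ctx.lam * grad_output, None

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU())   # shared features z
label_head = nn.Linear(128, 10)                          # trained on source labels
domain_head = nn.Linear(128, 2)                          # guesses source vs. target

x_src, y_src = torch.randn(32, 64), torch.randint(0, 10, (32,))
x_tgt = torch.randn(32, 64)

z_src, z_tgt = encoder(x_src), encoder(x_tgt)
z_all = torch.cat([z_src, z_tgt])
domain_labels = torch.cat([torch.zeros(32), torch.ones(32)]).long()

task_loss = F.cross_entropy(label_head(z_src), y_src)
domain_loss = F.cross_entropy(domain_head(GradReverse.apply(z_all, 1.0)), domain_labels)
(task_loss + domain_loss).backward()   # a combined optimizer step would follow
```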
Domain adaptation in RL for dynamics?
Why is invariance not enough when the dynamics don’t match?

When might this not work?

Eysenbach et al., “Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers”
What if we can also finetune?

1. RL tasks are generally much less diverse
• Features are less general
• Policies & value functions become overly specialized
2. Optimal policies in fully observed MDPs are deterministic
• Loss of exploration at convergence
• Low-entropy policies adapt very slowly to new settings

See “exploration 2” lecture on unsupervised skill discovery and “control as inference” lecture on MaxEnt RL methods!
How to maximize forward transfer?
Basic intuition: the more varied the training domain is, the more likely
we are to generalize in zero shot to a slightly different domain.
“Randomization” (dynamics/appearance/etc.): widely used for
simulation to real world transfer (e.g., in robotics)
EPOpt: randomizing physical parameters

[Figure: training on a single torso mass vs. training on a model ensemble (train/test performance); ensemble adaptation handles unmodeled effects by adapting to target-domain data.]

Rajeswaran et al., “EPOpt: Learning Robust Neural Network Policies Using Model Ensembles.”
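A minimal sketch of dynamics randomization in Python. The parameter names, ranges, and `make_env` constructor are illustrative assumptions; EPOpt additionally optimizes a worst-case (CVaR) objective over the sampled models, which is omitted here.

```python
import random

def sample_physical_params():
    # Illustrative ranges; in practice these come from the simulator's parameters.
    return {"torso_mass": random.uniform(3.0, 9.0),
            "friction": random.uniform(0.5, 1.5)}

def train_with_randomized_dynamics(make_env, update_policy, n_iterations):
    """Each iteration trains on a model sampled from the ensemble, so the policy
    must be robust to a range of dynamics rather than a single model."""
    for _ in range(n_iterations):
        env = make_env(**sample_physical_params())  # hypothetical env constructor
        update_policy(env)                          # any standard RL update
```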


More randomization!

Sadeghi et al., “CAD2RL: Real Single-Image Flight without a Single Real Image.” 2016

Xue Bin Peng et al., “Sim-to-Real Transfer of Robotic Control with Dynamics Randomization.” 2018
Lee et al., “Learning Quadrupedal Locomotion over Challenging Terrain.” 2020
Some suggested readings
Domain adaptation:

Tzeng, Hoffman, Zhang, Saenko, Darrell. Deep Domain Confusion: Maximizing for Domain Invariance. 2014.

Ganin, Ustinova, Ajakan, Germain, Larochelle, Laviolette, Marchand, Lempitsky. Domain-Adversarial Training of Neural Networks. 2015.

Tzeng*, Devin*, et al., Adapting Visuomotor Representations with Weak Pairwise Constraints. 2016.

Eysenbach et al., Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers. 2020.

Finetuning:

Finetuning via MaxEnt RL: Haarnoja*, Tang*, et al. (2017). Reinforcement Learning with Deep Energy-Based Policies.

Andreas et al. Modular multitask reinforcement learning with policy sketches. 2017.

Florensa et al. Stochastic neural networks for hierarchical reinforcement learning. 2017.

Kumar et al. One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL. 2020

Simulation to real world transfer:

Rajeswaran, et al. (2017). EPOpt: Learning Robust Neural Network Policies Using Model Ensembles.

Yu et al. (2017). Preparing for the Unknown: Learning a Universal Policy with Online System Identification.

Sadeghi & Levine. (2017). CAD2RL: Real Single Image Flight without a Single Real Image.

Tobin et al. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.

Tan et al. (2018). Sim-to-Real: Learning Agile Locomotion For Quadruped Robots.

…and many many others!


How can we frame transfer learning problems?

1. Forward transfer: learn policies that transfer effectively


a) Train on source task, then run on target task (or finetune)
b) Relies on the tasks being quite similar!
2. Multi-task transfer: train on many tasks, transfer to a new task
a) Sharing representations and layers across tasks in multi-task learning
b) New task needs to be similar to the distribution of training tasks
3. Meta-learning: learn to learn on many tasks
a) Accounts for the fact that we’ll be adapting to a new task during training!
Can we learn faster by learning multiple tasks?

[Figure: several tasks, each being learned.]

Multi-task learning can:
- Accelerate learning of all tasks that are learned together
- Provide better pre-training for down-stream tasks
Can we solve multiple tasks at once?

Multi-task RL corresponds to single-task RL in a joint MDP

[Figure: an MDP (MDP 0, MDP 1, MDP 2, etc.) is picked randomly, i.e. sampled in the first state, and the episode then proceeds in that MDP.]
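A minimal sketch of this reduction, assuming gym-style environments with the classic reset()/step() interface; `envs`, `policy`, and the task-id conditioning are illustrative placeholders.

```python
import random

def run_joint_mdp_episode(envs, policy):
    """Multi-task RL as single-task RL in a joint MDP: the first 'transition'
    picks one of the MDPs at random, then the episode proceeds in that MDP."""
    task_id = random.randrange(len(envs))
    env = envs[task_id]
    obs, done, ep_return = env.reset(), False, 0.0
    while not done:
        action = policy(obs, task_id)   # pass the task id so the shared policy knows which MDP it is in
        obs, reward, done, _ = env.step(action)
        ep_return += reward
    return ep_return
```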
How does the model know what to do?

• What if the policy can do multiple things in the same environment?

Contextual policies: condition the policy on a context that specifies the task (e.g., do dishes or laundry).

images: Peng, van de Panne, Peters

Goal-conditioned policies: condition the policy on a goal, which is simply another state to reach (a minimal sketch follows this list).
➢ Convenient: no need to manually define rewards for each task
➢ Can transfer in zero shot to a new task if it’s another goal!
➢ Often hard to train in practice (see references)
➢ Not all tasks are goal-reaching tasks!

A few relevant papers:
• Kaelbling. Learning to achieve goals.
• Schaul et al. Universal value function approximators.
• Andrychowicz et al. Hindsight experience replay.
• Eysenbach et al. C-learning: Learning to achieve goals via recursive classification.
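Below is a minimal sketch of a goal-conditioned (equivalently, contextual) policy network in PyTorch. The architecture, hidden sizes, and dimensions are illustrative assumptions; the key point is simply that the goal (or context) enters as an extra input.

```python
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """pi(a | s, g): the goal g (another state, or a task context) is an extra input."""
    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, goal):
        # Concatenate state and goal; the reward can be defined as reaching the goal,
        # so no per-task reward engineering is needed.
        return self.net(torch.cat([obs, goal], dim=-1))

policy = GoalConditionedPolicy(obs_dim=17, goal_dim=3, act_dim=6)
action = policy(torch.randn(1, 17), torch.randn(1, 3))   # zero-shot transfer: just swap the goal
```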
Meta-Learning
What is meta-learning?

• If you’ve learned 100 tasks already, can you figure out how to learn more efficiently?
• Now having multiple tasks is a huge advantage!
• Meta-learning = learning to learn
• In practice, very closely related to multi-task
learning
• Many formulations
• Learning an optimizer
• Learning an RNN that ingests experience
• Learning a representation

image credit: Ke Li
Why is meta-learning a good idea?

• Deep reinforcement learning, especially model-free, requires a huge number of samples
• If we can meta-learn a faster reinforcement learner, we can learn
new tasks efficiently!
• What can a meta-learned learner do differently?
• Explore more intelligently
• Avoid trying actions that are known to be useless
• Acquire the right features more quickly
Meta-learning with supervised learning

image credit: Ravi & Larochelle ‘17


Meta-learning with supervised learning

[Figure: a conventional supervised learner maps an input (e.g., an image) to an output (e.g., a label). A meta-learner instead reads in a (few-shot) training set along with a test input and predicts the test label.]

• How to read in the training set?
• Many options; RNNs can work
• More on this later

What is being “learned”?

[Figure: the RNN weights are meta-learned (shared across tasks), while the RNN hidden state is what adapts to each (few-shot) training set.]
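To make the “read in the training set with an RNN” idea concrete, here is a minimal sketch in PyTorch. The architecture, shapes, and use of the final hidden state are illustrative assumptions, not a specific published model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNMetaLearner(nn.Module):
    """Reads the few-shot training set as a sequence of (input, label) pairs,
    then predicts the label of a test input from the resulting hidden state."""
    def __init__(self, x_dim, n_classes, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(x_dim + n_classes, hidden, batch_first=True)
        self.head = nn.Linear(hidden + x_dim, n_classes)
        self.n_classes = n_classes

    def forward(self, support_x, support_y, query_x):
        # support_x: (B, K, x_dim), support_y: (B, K) int labels, query_x: (B, x_dim)
        y_onehot = F.one_hot(support_y, self.n_classes).float()
        _, h = self.rnn(torch.cat([support_x, y_onehot], dim=-1))
        # The final hidden state summarizes the training set (the "adapted" part);
        # the RNN weights themselves are what gets meta-learned across tasks.
        return self.head(torch.cat([h[-1], query_x], dim=-1))

learner = RNNMetaLearner(x_dim=32, n_classes=5)
logits = learner(torch.randn(4, 5, 32), torch.randint(0, 5, (4, 5)), torch.randn(4, 32))
```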
Meta Reinforcement Learning
The meta reinforcement learning problem

[Figure: an example meta-RL task distribution with different target velocities (0.5 m/s, 0.7 m/s, -0.2 m/s, -0.7 m/s).]

Contextual policies and meta-learning

[Figure: the task-specific adaptation plays the role of the “context” of a contextual policy.]
Meta-RL with recurrent policies

[Figure: the RNN weights are meta-learned, while the RNN hidden state carries the task-specific adaptation.]
Meta-RL with recurrent policies

Crucially, the RNN hidden state is not reset between episodes!

[Figure: rewards received across episodes of a meta-episode (+0, +0, +1, +1, +0, +0).]
Why recurrent policies learn to explore

Optimizing total reward over the entire meta-episode (the sequence of episodes, not just a single episode) with an RNN policy automatically learns to explore!
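A minimal sketch of a recurrent meta-RL policy in the spirit of RL2, in PyTorch. The environment interface (gym-style reset()/step()), discrete actions, and all sizes are assumptions; the essential detail is that the hidden state is reset only once per meta-episode.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim + act_dim + 1, hidden)  # input: (obs, prev action, prev reward)
        self.pi = nn.Linear(hidden, act_dim)

    def step(self, obs, prev_action, prev_reward, h):
        h = self.rnn(torch.cat([obs, prev_action, prev_reward], dim=-1), h)
        return self.pi(h), h   # action logits and updated hidden state

def run_meta_episode(env, policy, act_dim, n_episodes):
    h = torch.zeros(1, policy.rnn.hidden_size)        # reset once per *meta*-episode only
    prev_a, prev_r = torch.zeros(1, act_dim), torch.zeros(1, 1)
    total_reward = 0.0
    for _ in range(n_episodes):                       # hidden state carries across episodes
        obs, done = env.reset(), False
        while not done:
            obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
            logits, h = policy.step(obs_t, prev_a, prev_r, h)
            action = torch.distributions.Categorical(logits=logits).sample()
            obs, r, done, _ = env.step(action.item())
            prev_a = F.one_hot(action, act_dim).float()
            prev_r = torch.tensor([[float(r)]])
            total_reward += r
    return total_reward   # maximizing this over the whole meta-episode rewards exploration
```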
Meta-RL with recurrent policies

Heess, Hunt, Lillicrap, Silver. Memory-based control with recurrent neural networks. 2015.
Wang, Kurth-Nelson, Tirumala, Soyer, Leibo, Munos, Blundell, Kumaran, Botvinick. Learning to Reinforcement Learn. 2016.
Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel. RL2: Fast Reinforcement Learning via Slow Reinforcement Learning. 2016.
Architectures for meta-RL
standard RNN (LSTM) architecture
Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel. RL2: Fast Reinforcement Learning via Slow Reinforcement Learning. 2016.

attention + temporal convolution
Mishra, Rohaninejad, Chen, Abbeel. A Simple Neural Attentive Meta-Learner.

parallel permutation-invariant context encoder
Rakelly*, Zhou*, Quillen, Finn, Levine. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables.
Gradient-Based Meta-Learning
Back to representations…

is pretraining a type of meta-learning?


better features = faster learning of new task!
Meta-RL as an optimization problem

Finn, Abbeel, Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.
MAML for RL in pictures
What did we just do??

Just another computation graph…
Can implement with any autodiff package (e.g., TensorFlow)
But has favorable inductive bias…
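Since the whole procedure really is just another computation graph, a minimal MAML sketch fits in a few lines of PyTorch. This version uses a toy sine-regression loss for clarity; the RL variant in Finn et al. replaces the inner and outer losses with policy-gradient surrogates. All sizes, hyperparameters, and the `sample_task` generator are illustrative.

```python
import math
import torch
import torch.nn as nn

def forward(params, x):
    w1, b1, w2, b2 = params
    return torch.tanh(x @ w1 + b1) @ w2 + b2

def sample_task(n_train=10, n_val=10):
    """Toy task distribution: sine waves with random amplitude and phase."""
    amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * math.pi
    f = lambda x: amp * torch.sin(x + phase)
    x_tr, x_val = torch.rand(n_train, 1) * 10 - 5, torch.rand(n_val, 1) * 10 - 5
    return x_tr, f(x_tr), x_val, f(x_val)

params = [nn.Parameter(0.1 * torch.randn(1, 40)), nn.Parameter(torch.zeros(40)),
          nn.Parameter(0.1 * torch.randn(40, 1)), nn.Parameter(torch.zeros(1))]
meta_opt = torch.optim.Adam(params, lr=1e-3)
inner_lr = 0.01

for meta_step in range(1000):
    meta_loss = 0.0
    for _ in range(4):                               # a small batch of tasks
        x_tr, y_tr, x_val, y_val = sample_task()
        # Inner loop: one gradient step on this task's training data, keeping
        # the graph so the outer update can differentiate through it.
        inner_loss = ((forward(params, x_tr) - y_tr) ** 2).mean()
        grads = torch.autograd.grad(inner_loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer loop: evaluate the adapted parameters on held-out data.
        meta_loss = meta_loss + ((forward(adapted, x_val) - y_val) ** 2).mean()
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```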
MAML for RL in videos
[Video frames: after MAML training, a single gradient step adapts the policy to the forward-reward task and, separately, to the backward-reward task.]
More on MAML/gradient-based meta-learning
for RL
MAML meta-policy gradient estimators:
• Finn, Abbeel, Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.
• Foerster, Farquhar, Al-Shedivat, Rocktaschel, Xing, Whiteson. DiCE: The Infinitely
Differentiable Monte Carlo Estimator.
• Rothfuss, Lee, Clavera, Asfour, Abbeel. ProMP: Proximal Meta-Policy Search.

Improving exploration:
• Gupta, Mendonca, Liu, Abbeel, Levine. Meta-Reinforcement Learning of Structured
Exploration Strategies.
• Stadie*, Yang*, Houthooft, Chen, Duan, Wu, Abbeel, Sutskever. Some Considerations on
Learning to Explore via Meta-Reinforcement Learning.

Hybrid algorithms (not necessarily gradient-based):


• Houthooft, Chen, Isola, Stadie, Wolski, Ho, Abbeel. Evolved Policy Gradients.
• Fernando, Sygnowski, Osindero, Wang, Schaul, Teplyashin, Sprechmann, Pritzel, Rusu. Meta-
Learning by the Baldwin Effect.
Meta-RL as a POMDP
Meta-RL as… partially observed RL?

[Figure: the task is treated as a latent variable z that encapsulates the information the policy needs to solve the current task.]

A simple approach: sample z from some approximate posterior (e.g., variational), then act as though z was correct!

This is not optimal! Why? But it’s pretty good, both in theory and in practice.

See, e.g. Russo, Roy. Learning to Optimize via Posterior Sampling.
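A minimal PyTorch sketch of "sample z from an approximate posterior and act as though it were correct". The encoder architecture, mean pooling, and dimensions are illustrative assumptions; PEARL's actual encoder combines per-transition Gaussian factors and is trained with a KL term to the prior.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Maps a set of context transitions (s, a, r, s') to an approximate posterior q(z | context)."""
    def __init__(self, transition_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(transition_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))

    def forward(self, context):                   # context: (N, transition_dim)
        stats = self.net(context).mean(dim=0)     # permutation-invariant pooling
        mu, log_std = stats.chunk(2)
        return torch.distributions.Normal(mu, log_std.exp())

encoder = ContextEncoder(transition_dim=20, z_dim=5)
q_z = encoder(torch.randn(64, 20))   # posterior after observing 64 context transitions
z = q_z.rsample()                    # posterior sampling: draw one z...
# ...and act as though z were correct: the policy conditions on it, e.g. pi(a | s, z)
```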


Variational inference for meta-RL

[Objective: maximize the post-update reward (same as standard meta-RL) while keeping the posterior over z close to the prior.]

Rakelly*, Zhou*, Quillen, Finn, Levine. Efficient Off-Policy Meta-Reinforcement learning via
Probabilistic Context Variables. ICML 2019.
Specific instantiation: PEARL

Perform the maximization using soft actor-critic (SAC), a state-of-the-art off-policy RL algorithm.

Rakelly*, Zhou*, Quillen, Finn, Levine. Efficient Off-Policy Meta-Reinforcement learning via
Probabilistic Context Variables. ICML 2019.
References on meta-RL, inference, and POMDPs

• Rakelly*, Zhou*, Quillen, Finn, Levine. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. ICML 2019.

• Zintgraf, Igl, Shiarlis, Mahajan, Hofmann, Whiteson. Variational Task Embeddings for Fast Adaptation in Deep Reinforcement Learning.

• Humplik, Galashov, Hasenclever, Ortega, Teh, Heess. Meta reinforcement learning as task inference.
The three perspectives on meta-RL

[Figure: in each perspective, the adapted quantity (RNN hidden state, adapted parameters, or latent variable z) encapsulates everything needed to solve the task.]

1. Recurrent (black-box) policies:
+ conceptually simple
+ relatively easy to apply
- vulnerable to meta-overfitting
- challenging to optimize in practice

2. Gradient-based meta-learning (e.g., MAML):
+ good extrapolation (“consistent”)
+ conceptually elegant
- complex, requires many samples

3. Inference (POMDP) perspective:
+ simple, effective exploration via posterior sampling
+ elegant reduction to solving a special POMDP
- vulnerable to meta-overfitting
- challenging to optimize in practice

But they’re not that different!
• The inference perspective is just perspective 1, but with stochastic hidden variables!
• The gradient-based update is just a particular architecture choice for the learned adaptation.
Meta-RL and emergent phenomena
➢ Meta-RL gives rise to model-free episodic learning: Ritter, Wang, Kurth-Nelson, Jayakumar, Blundell, Pascanu, Botvinick. Been There, Done That: Meta-Learning with Episodic Recall.
➢ Meta-RL gives rise to model-based adaptation: Wang, Kurth-Nelson, Kumaran, Tirumala, Soyer, Leibo, Hassabis, Botvinick. Prefrontal Cortex as a Meta-Reinforcement Learning System.
➢ Meta-RL gives rise to causal reasoning (!): Dasgupta, Wang, Chiappa, Mitrovic, Ortega, Raposo, Hughes, Battaglia, Botvinick, Kurth-Nelson. Causal Reasoning from Meta-Reinforcement Learning.

Humans and animals seemingly learn behaviors in a variety of ways:


➢ Highly efficient but (apparently) model-free RL
➢ Episodic recall
➢ Model-based RL
➢ Causal inference
➢ etc.

Perhaps each of these is a separate “algorithm” in the brain


But maybe these are all emergent phenomena resulting from meta-RL?
