Meta-Learning & Transfer Learning
CS 285
Instructor: Sergey Levine
UC Berkeley
What’s the problem?
Why?
Montezuma’s revenge
• Getting key = reward
• Opening door = reward
• Getting killed by skull = bad
Montezuma’s revenge
• We know what to do because we understand what
these sprites mean!
• Key: we know it opens doors!
• Ladders: we know we can climb them!
• Skull: we don’t know what it does, but we know it
can’t be good!
• Prior understanding of problem structure can help
us solve complex tasks quickly!
Can RL use the same prior knowledge as us?
transfer learning: using experience from one set of tasks for faster
learning and better performance on a new task
[Figure: domain classification loss — the classifier is trained toward the correct answer, while the feature extractor receives a reversed gradient.]
Eysenbach et al., “Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers”
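The “reversed gradient” on this slide is the gradient-reversal trick from domain-adversarial training (Ganin et al.): the domain classifier learns to predict which domain a feature came from, while the feature extractor receives the negated gradient so its features become domain-invariant. A minimal PyTorch sketch of such a layer (the class name GradReverse and the coefficient lam are illustrative, not taken from the papers above):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The feature extractor sees the reversed (negated) gradient of the domain loss.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage sketch: domain_logits = domain_classifier(grad_reverse(features))
# Minimizing the domain loss then trains the classifier normally while
# pushing the features toward domain invariance.
```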
What if we can also finetune?
See “exploration 2” lecture on unsupervised skill discovery and “control as inference” lecture on MaxEnt RL methods!
How to maximize forward transfer?
Basic intuition: the more varied the training domains are, the more likely
we are to generalize zero-shot to a slightly different domain.
“Randomization” (dynamics/appearance/etc.): widely used for
simulation to real world transfer (e.g., in robotics)
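A minimal sketch of how dynamics/appearance randomization is typically set up, assuming a simulator whose parameters can be resampled once per episode (the parameter names and ranges below are made up for illustration):

```python
import numpy as np

def sample_randomized_params(rng: np.random.Generator) -> dict:
    """One set of simulator parameters per episode; ranges are illustrative."""
    return {
        "torso_mass_scale": rng.uniform(0.8, 1.2),
        "ground_friction": rng.uniform(0.5, 1.5),
        "action_latency_steps": int(rng.integers(0, 3)),
    }

rng = np.random.default_rng(0)
for episode in range(5):
    params = sample_randomized_params(rng)
    # A real pipeline would rebuild or reset the simulator with `params`
    # and then run one episode of any standard RL algorithm in it.
    print(episode, params)
```

Because the policy never sees the same dynamics twice, it is pushed toward behaviors that work across the whole randomized family, which is what enables zero-shot transfer to the real system.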
EPOpt: randomizing physical parameters
[Figure: train/test performance when training on a single torso mass vs. on a model ensemble; ensemble adaptation — adapt the parameter distribution to unmodeled effects using real data.]
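EPOpt’s key mechanism can be summarized in a few lines: sample model parameters from the ensemble, roll out once per sample, and compute the policy update only on the worst ε-fraction of rollouts (a CVaR-style objective). A toy sketch with made-up returns:

```python
import numpy as np

def epopt_worst_case_indices(returns: np.ndarray, epsilon: float = 0.1) -> np.ndarray:
    """Indices of rollouts whose return falls in the worst epsilon-fraction."""
    cutoff = np.quantile(returns, epsilon)
    return np.where(returns <= cutoff)[0]

rng = np.random.default_rng(0)
torso_masses = rng.normal(loc=1.0, scale=0.2, size=100)               # one sampled mass per rollout
returns = -10.0 * np.abs(torso_masses - 1.0) + rng.normal(size=100)   # stand-in returns
worst = epopt_worst_case_indices(returns, epsilon=0.1)
print(f"{len(worst)} of 100 rollouts enter the policy update")
```

The ensemble-adaptation step in the figure corresponds to re-fitting the sampling distribution over physical parameters using real-world data, so that unmodeled effects are gradually covered.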
Sadeghi et al., “CAD2RL: Real Single-Image Flight without a Single Real Image.” 2016
Tzeng, Hoffman, Zhang, Saenko, Darrell. Deep Domain Confusion: Maximizing for Domain Invariance. 2014.
Ganin, Ustinova, Ajakan, Germain, Larochelle, Laviolette, Marchand, Lempitsky. Domain-Adversarial Training of Neural Networks. 2015.
Tzeng*, Devin*, et al., Adapting Visuomotor Representations with Weak Pairwise Constraints. 2016.
Eysenbach et al., Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers. 2020.
Finetuning:
Finetuning via MaxEnt RL: Haarnoja*, Tang*, et al. (2017). Reinforcement Learning with Deep Energy-Based Policies.
Andreas et al. Modular multitask reinforcement learning with policy sketches. 2017.
Florensa et al. Stochastic neural networks for hierarchical reinforcement learning. 2017.
Kumar et al. One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL. 2020.
Rajeswaran, et al. (2017). EPOpt: Learning Robust Neural Network Policies Using Model Ensembles.
Yu et al. (2017). Preparing for the Unknown: Learning a Universal Policy with Online System Identification.
Sadeghi & Levine. (2017). CAD2RL: Real Single Image Flight without a Single Real Image.
Tobin et al. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.
Tan et al. (2018). Sim-to-Real: Learning Agile Locomotion For Quadruped Robots.
etc.
[Figure: multi-task RL as a single MDP — pick an MDP (MDP 0, MDP 1, MDP 2, etc.) randomly in the first state, then sample transitions as usual.]
How does the model know what to do?
image credit: Ke Li
Why is meta-learning a good idea?
[Figure: supervised meta-learning — the meta-learner maps a (few-shot) training set plus a test input to a test label.]
What is being “learned”?
[Figure: the answer splits into the meta-learned weights and the RNN hidden state.]
Meta Reinforcement Learning
The meta reinforcement learning problem
[Figure: the adaptation procedure conditions on experience collected in the task, labeled as the “context”.]
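In symbols, the problem on this slide is to learn meta-parameters θ for an adaptation procedure f_θ that reads the collected experience (the “context”) from each training MDP and produces task-specific parameters; a standard way to write it (a hedged reconstruction of the lecture’s notation):

```latex
% Meta-RL objective: f_theta maps a task MDP (via its collected "context") to
% adapted parameters phi_i, and we maximize post-adaptation return across tasks.
\theta^{\star} = \arg\max_{\theta} \sum_{i=1}^{n}
    \mathbb{E}_{\pi_{\phi_i}(\tau)}\big[ R(\tau) \big],
\qquad \phi_i = f_{\theta}(\mathcal{M}_i)
```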
Meta-RL with recurrent policies
[Figure: the policy’s meta-learned weights vs. the RNN hidden state that adapts within a task.]
Meta-RL with recurrent policies
[Figure: reward sequences across attempts within a meta-episode (+0, +0, +1, +1; +0, +0).]
Why recurrent policies learn to explore
meta-episode
Meta-RL with recurrent policies
Heess, Hunt, Lillicrap, Silver. Memory-based control with recurrent neural networks. 2015.
Wang, Kurth-Nelson, Tirumala, Soyer, Leibo, Munos, Blundell, Kumaran, Botvinick. Learning to Reinforcement Learn. 2016.
Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel. RL2: Fast Reinforcement Learning via Slow Reinforcement Learning. 2016.
Architectures for meta-RL
standard RNN (LSTM) architecture
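A minimal sketch of the standard RNN architecture for meta-RL (RL²-style), assuming discrete actions and illustrative sizes; the important detail is that the LSTM reads (observation, previous action, previous reward) and its hidden state is not reset between episodes of the same task:

```python
import torch
import torch.nn as nn

class RecurrentMetaPolicy(nn.Module):
    """LSTM policy whose hidden state carries everything 'learned' about the current task."""
    def __init__(self, obs_dim=8, act_dim=4, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + act_dim + 1, hidden, batch_first=True)
        self.pi = nn.Linear(hidden, act_dim)   # action logits
        self.v = nn.Linear(hidden, 1)          # value estimate

    def forward(self, obs, prev_action, prev_reward, hidden_state=None):
        x = torch.cat([obs, prev_action, prev_reward], dim=-1)  # (batch, time, obs+act+1)
        h, hidden_state = self.lstm(x, hidden_state)
        return self.pi(h), self.v(h), hidden_state

policy = RecurrentMetaPolicy()
obs = torch.zeros(1, 10, 8); act = torch.zeros(1, 10, 4); rew = torch.zeros(1, 10, 1)
logits, values, h = policy(obs, act, rew)   # carry `h` across episode boundaries within a meta-episode
```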
Finn, Abbeel, Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.
MAML for RL in pictures
What did we just do??
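What MAML computes is a gradient through a gradient: one inner adaptation step per task, then a meta-gradient of the post-adaptation loss with respect to the initial parameters. A toy PyTorch sketch with a dummy quadratic standing in for the per-task policy-gradient surrogate (everything here is illustrative):

```python
import torch

def task_loss(w, target):
    # Stand-in for the per-task RL surrogate loss computed from rollouts.
    return ((w - target) ** 2).sum()

meta_w = torch.zeros(5, requires_grad=True)
inner_lr = 0.1
optimizer = torch.optim.SGD([meta_w], lr=0.01)

for _ in range(100):
    meta_loss = 0.0
    for target in (torch.ones(5), -torch.ones(5)):                      # two toy "tasks"
        loss = task_loss(meta_w, target)
        (grad,) = torch.autograd.grad(loss, meta_w, create_graph=True)  # keep graph for the meta-gradient
        adapted_w = meta_w - inner_lr * grad                            # one inner adaptation step
        meta_loss = meta_loss + task_loss(adapted_w, target)            # post-adaptation loss
    optimizer.zero_grad()
    meta_loss.backward()    # differentiates through the inner update
    optimizer.step()
```

In MAML for RL, the inner loss comes from pre-adaptation rollouts and the outer loss from post-adaptation rollouts, but the gradient-through-gradient structure is the same.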
Improving exploration:
• Gupta, Mendonca, Liu, Abbeel, Levine. Meta-Reinforcement Learning of Structured
Exploration Strategies.
• Stadie*, Yang*, Houthooft, Chen, Duan, Wu, Abbeel, Sutskever. Some Considerations on
Learning to Explore via Meta-Reinforcement Learning.
Rakelly*, Zhou*, Quillen, Finn, Levine. Efficient Off-Policy Meta-Reinforcement Learning via
Probabilistic Context Variables. ICML 2019.
Specific instantiation: PEARL
Rakelly*, Zhou*, Quillen, Finn, Levine. Efficient Off-Policy Meta-Reinforcement Learning via
Probabilistic Context Variables. ICML 2019.
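A minimal sketch of the PEARL idea, assuming illustrative sizes and a unit-Gaussian prior: each transition in the context is mapped to a Gaussian factor, the factors are multiplied (closed form for Gaussians) to give a posterior over a latent task variable z, and the policy is then conditioned on a sample of z:

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Maps each (s, a, r, s') transition to a Gaussian factor over the latent task variable z."""
    def __init__(self, transition_dim=20, latent_dim=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(transition_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * latent_dim))

    def forward(self, context):                       # context: (num_transitions, transition_dim)
        mu, log_var = self.net(context).chunk(2, dim=-1)
        prec = torch.exp(-log_var)                    # per-factor precisions
        post_prec = prec.sum(0) + 1.0                 # product of Gaussians + unit-Gaussian prior
        post_mu = (mu * prec).sum(0) / post_prec
        return post_mu, 1.0 / post_prec               # posterior mean and variance of q(z | context)

encoder = ContextEncoder()
context = torch.randn(32, 20)                          # 32 transitions collected in the current task
z_mu, z_var = encoder(context)
z = z_mu + z_var.sqrt() * torch.randn_like(z_mu)       # sample z; the actor is then pi(a | s, z)
```

Because the posterior is probabilistic, sampling z (rather than using its mean) is what drives exploration early in a new task, before the context pins the task down.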
References on meta-RL, inference, and POMDPs
• just perspective 1, but with stochastic hidden variables!
• just a particular architecture choice for these
Ritter, Wang, Kurth-Nelson, Jayakumar, Blundell, Pascanu, Botvinick. Been There, Done That: Meta-Learning with Episodic Recall.
Wang, Kurth-Nelson, Kumaran, Tirumala, Soyer, Leibo, Hassabis, Botvinick. Prefrontal Cortex as a Meta-Reinforcement Learning System.
Dasgupta, Wang, Chiappa, Mitrovic, Ortega, Raposo, Hughes, Battaglia, Botvinick, Kurth-Nelson. Causal Reasoning from Meta-Reinforcement Learning.