Lecture 16: Meta-Learning

Meta-Learning

CS 294-112: Deep Reinforcement Learning


Sergey Levine
Class Notes
1. Two weeks until the project milestone!
2. Guest lectures start next week, be sure to attend!
3. Today: part 1: meta-learning
4. Today: part 2: parallelism
How can we frame transfer learning problems?
No single solution! Survey of various recent research papers
1. “Forward” transfer: train on one task, transfer to a new task
a) Just try it and hope for the best
b) Finetune on the new task
c) Architectures for transfer: progressive networks
d) Randomize source task domain
2. Multi-task transfer: train on many tasks, transfer to a new task
a) Model-based reinforcement learning
b) Model distillation
c) Contextual policies
d) Modular policy networks
3. Multi-task meta-learning: learn to learn from many tasks
a) RNN-based meta-learning
b) Gradient-based meta-learning
So far…

• Forward transfer: source domain to target domain


• Diversity is good! The more varied the training, the more likely transfer is to
succeed
• Multi-task learning: even more variety
• No longer training on the same kind of task
• But more variety = more likely to succeed at transfer
• How do we represent transfer knowledge?
• Model (as in model-based RL): rules of physics are conserved across tasks
• Policies: require finetuning, but are closer to what we want to accomplish
• What about learning methods?
What is meta-learning?

• If you’ve learned 100 tasks already, can you figure out how to learn more efficiently?
• Now having multiple tasks is a huge advantage!
• Meta-learning = learning to learn
• In practice, very closely related to multi-task
learning
• Many formulations
• Learning an optimizer
• Learning an RNN that ingests experience
• Learning a representation

image credit: Ke Li
Why is meta-learning a good idea?

• Deep reinforcement learning, especially model-free, requires a huge number of samples
• If we can meta-learn a faster reinforcement learner, we can learn
new tasks efficiently!
• What can a meta-learned learner do differently?
• Explore more intelligently
• Avoid trying actions that are known to be useless
• Acquire the right features more quickly
Meta-learning with supervised learning

image credit: Ravi & Larochelle ‘17


Meta-learning with supervised learning

[Figure: few-shot supervised meta-learning setup: a (few-shot) training set of input/output pairs (e.g., images and labels) plus a test input are fed to the learner, which predicts the test label.]

• How to read in the training set?
  • Many options; RNNs can work
  • More on this later
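A minimal sketch of one way an RNN can "read in" the training set, as mentioned above: each (input, label) pair is fed as one timestep, followed by the test input with a blank label, and the final hidden state is decoded into the predicted test label. The sizes, the vanilla tanh-RNN cell, and the random (untrained) weights are illustrative assumptions; only the data flow matters here.

```python
# Data-flow sketch of an RNN few-shot learner (weights random and untrained).
import numpy as np

rng = np.random.default_rng(0)
d_in, n_classes, d_hid = 8, 5, 32

# Random, untrained parameters of a vanilla tanh RNN cell plus a readout layer.
W_x = rng.normal(scale=0.1, size=(d_hid, d_in + n_classes))
W_h = rng.normal(scale=0.1, size=(d_hid, d_hid))
W_out = rng.normal(scale=0.1, size=(n_classes, d_hid))

def predict(train_xs, train_ys, test_x):
    """Feed the (few-shot) training set, then the test input, through the RNN."""
    h = np.zeros(d_hid)
    for x, y in zip(train_xs, train_ys):
        step = np.concatenate([x, np.eye(n_classes)[y]])   # input + one-hot label
        h = np.tanh(W_x @ step + W_h @ h)
    step = np.concatenate([test_x, np.zeros(n_classes)])   # test input, blank label
    h = np.tanh(W_x @ step + W_h @ h)
    return int(np.argmax(W_out @ h))                       # predicted test label

# Usage: a 3-shot "training set" for one task, plus one test input.
train_xs = [rng.normal(size=d_in) for _ in range(3)]
train_ys = [0, 1, 2]
print(predict(train_xs, train_ys, rng.normal(size=d_in)))
```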
The meta-learning problem in RL

[Figure: the meta-learned RL agent conditions on its recent experience and the current state to output an action; the resulting new state and action are appended to the experience.]
Meta-learning in RL with memory
[Figure: “water maze” task: first, second, and third attempts, comparing an agent with memory vs. without memory; the agent with memory improves across attempts.]
Heess et al., “Memory-based control with recurrent neural networks.”


RL²

Duan et al., “RL²: Fast Reinforcement Learning via Slow Reinforcement Learning”
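A minimal sketch of the recurrent-policy interface used by memory-based meta-RL and RL²-style methods: the policy consumes (state, previous action, previous reward, done) at every step, and its hidden state persists across episodes of the same task, so "learning" at test time is just the forward pass. The environment, sizes, and untrained weights below are placeholders, not the setup from either paper.

```python
# Interface sketch: recurrent policy whose memory persists across episodes.
import numpy as np

rng = np.random.default_rng(0)
d_state, n_actions, d_hid = 4, 3, 32
step_dim = d_state + n_actions + 2        # state + one-hot action + reward + done

W_x = rng.normal(scale=0.1, size=(d_hid, step_dim))
W_h = rng.normal(scale=0.1, size=(d_hid, d_hid))
W_pi = rng.normal(scale=0.1, size=(n_actions, d_hid))

def policy_step(h, state, prev_action, prev_reward, done):
    inp = np.concatenate([state, np.eye(n_actions)[prev_action],
                          [prev_reward], [float(done)]])
    h = np.tanh(W_x @ inp + W_h @ h)
    probs = np.exp(W_pi @ h); probs /= probs.sum()
    return h, rng.choice(n_actions, p=probs)

# Hidden state is reset per task, NOT per episode: this is where the
# "learning" happens at test time, purely in the forward pass.
h = np.zeros(d_hid)
prev_a, prev_r = 0, 0.0
for episode in range(3):                  # several episodes of the same task
    state, done = rng.normal(size=d_state), False
    for t in range(10):
        h, a = policy_step(h, state, prev_a, prev_r, done)
        state = rng.normal(size=d_state)  # placeholder environment transition
        prev_a, prev_r, done = a, rng.random(), (t == 9)
```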
Connection to contextual policies

Just contextual policies, with experience as context.
Back to representations…

Is pretraining a type of meta-learning? Better features = faster learning of a new task!
Preparing a model for faster learning

Finn et al., “Model-Agnostic Meta-Learning”


What did we just do??

• Just another computation graph…
• Can implement with any autodiff package (e.g., TensorFlow)
• But has favorable inductive bias…
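MAML's learning rule (Finn et al.): adapt to each task i with one or more gradient steps, theta_i' = theta - alpha * grad L_i(theta), then update theta to reduce the loss measured at the adapted parameters theta_i'. Below is a minimal NumPy sketch assuming toy linear-regression tasks and the first-order approximation to the meta-gradient (second-order terms dropped); the task distribution, step sizes, and batch sizes are illustrative, and the RL variant would replace the regression loss with a policy gradient objective.

```python
# Minimal first-order MAML sketch on toy linear-regression tasks (NumPy only).
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A task is a random linear function y = x @ w_true + noise."""
    w_true = rng.normal(size=(3,))
    def make_batch(n=10):
        x = rng.normal(size=(n, 3))
        y = x @ w_true + 0.01 * rng.normal(size=(n,))
        return x, y
    return make_batch

def grad_mse(w, x, y):
    """Gradient of mean squared error on (x, y) with respect to w."""
    return x.T @ (x @ w - y) / len(y)

alpha, beta = 0.1, 0.01        # inner (adaptation) and outer (meta) step sizes
theta = np.zeros(3)            # meta-parameters

for meta_step in range(1000):
    meta_grad = np.zeros_like(theta)
    for _ in range(8):                                            # tasks per meta-batch
        task = sample_task()
        x_tr, y_tr = task()                                       # support set
        x_val, y_val = task()                                     # query set
        theta_prime = theta - alpha * grad_mse(theta, x_tr, y_tr) # inner adaptation step
        # First-order MAML: use the gradient at the adapted parameters directly,
        # ignoring second-order terms from differentiating through the inner step.
        meta_grad += grad_mse(theta_prime, x_val, y_val)
    theta -= beta * meta_grad / 8                                 # outer (meta) update
```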
Model-agnostic meta-learning: accelerating PG

[Figure: policy behavior after MAML training, and after one policy-gradient step adapting to the forward-reward and backward-reward tasks.]
Meta-learning summary & open problems

• Meta-learning = learning to learn


• Supervised meta-learning = supervised learning with datapoints that
are entire datasets
• RL meta-learning with RNN policies
• Ingest past experience with RNN
• Simply run forward pass at test time to “learn”
• Just contextual policies (no actual learning)
• Model-agnostic meta-learning
• Use gradient descent (e.g., policy gradient) learning rule
• Conceptually not that different
• …but can accelerate standard RL algorithms (e.g., learn in one iteration of PG)
Meta-learning summary & open problems

• The promise of meta-learning: use past experience to simply acquire a much more efficient deep RL algorithm
• The reality of meta-learning: mostly works well on smaller problems
• …but getting better all the time
• Main limitations
• RNN policies are extremely hard to train, and likely not scalable
• Model-agnostic meta-learning presents a tough optimization problem
• Designing the right task distribution is hard
• Generally very sensitive to task distribution (meta-overfitting)
Parallelism in RL
Overview
1. We learned about a number of policy search methods
2. These algorithms have all been sequential
3. Is there a natural way to parallelize RL algorithms?
• Experience sampling vs learning
• Multiple learning threads
• Multiple experience collection threads
Today’s Lecture
1. What can we parallelize?
2. Case studies: specific parallel RL methods
3. Tradeoffs & considerations
• Goals
• Understand the high-level anatomy of reinforcement learning algorithms
• Understand standard strategies for parallelization
• Tradeoffs of different parallel methods
High-level RL schematic

[Figure: high-level RL loop: generate samples (i.e., run the policy), fit a model / estimate the return, improve the policy, repeat.]

Which parts are slow?
• Generate samples (i.e., run the policy)
  • Real robot/car/power grid/whatever: 1x real time, until we invent time travel
  • MuJoCo simulator: up to 10,000x real time
• Fit a model / estimate the return: trivial and fast in the simplest cases; expensive, but non-trivial to parallelize, otherwise
• Improve the policy: trivial (nothing to do) in the simplest cases; expensive, but non-trivial to parallelize, otherwise
Which parts can we parallelize?

• Fit a model / estimate the return: parallel SGD
• Generate samples (i.e., run the policy)
• Improve the policy: parallel SGD

Helps to group data generation and training (worker generates data, computes gradients, and gradients are pooled)
High-level decisions
1. Online or batch-mode?
2. Synchronous or asynchronous?

[Diagram: batch-mode: several workers each generate full sample trajectories, followed by a policy gradient step; online: each worker generates one step at a time and fits the Q-value after every step.]
Relationship to parallelized SGD
1. Parallelizing model/critic/actor training typically involves parallelizing SGD
2. Simple parallel SGD:
   1. Each worker has a different slice of data
   2. Each worker computes gradients, sums them, and sends the result to the parameter server
   3. Parameter server sums gradients from all workers and sends back new parameters
3. Mathematically equivalent to SGD, but not asynchronous (communication delays)
4. Async SGD typically does not achieve perfect parallelism, but lack of locks can make it much faster
5. Somewhat problem dependent
Dai et al. ‘15
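A minimal sketch of the synchronous scheme in item 2, assuming a toy linear-regression objective: a multiprocessing pool plays the role of the workers and the driver process plays the parameter server, averaging gradients and broadcasting updated parameters each step.

```python
# Sketch of synchronous data-parallel SGD with a "parameter server" in the driver.
import numpy as np
from multiprocessing import Pool

def worker_grad(args):
    """Each worker computes the gradient of MSE on its own slice of data."""
    w, x, y = args
    return x.T @ (x @ w - y) / len(y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=(5,))
    # Split a toy dataset into one slice per worker.
    slices = []
    for _ in range(4):
        x = rng.normal(size=(256, 5))
        slices.append((x, x @ w_true + 0.01 * rng.normal(size=(256,))))

    w = np.zeros(5)            # parameters held by the "parameter server"
    lr = 0.1
    with Pool(processes=4) as pool:
        for step in range(100):
            # Workers compute gradients in parallel on the current parameters;
            # the server averages them and broadcasts the updated parameters
            # (here, simply by re-sending w on the next iteration).
            grads = pool.map(worker_grad, [(w, x, y) for x, y in slices])
            w -= lr * np.mean(grads, axis=0)
    print("parameter error:", np.linalg.norm(w - w_true))
```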
Simple example: sample parallelism with PG

[Diagram: (1) sample generation is parallelized across workers; (2, 3, 4) reward evaluation, gradient computation, and the policy gradient update happen centrally.]
Simple example: sample parallelism with PG

[Diagram: (1) workers generate samples and (2) evaluate rewards in parallel; (3, 4) the policy gradient is computed and applied centrally.]
Simple example: sample parallelism with PG

Dai et al. ‘15

[Diagram: (1) workers generate samples, (2) evaluate rewards, and (3) compute gradients in parallel; (4) the gradients are summed and applied centrally.]
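A sketch of the staged pipeline above, with a toy two-armed bandit and a softmax policy standing in for real trajectories: each worker (1) generates samples, (2) evaluates rewards, and (3) computes its local REINFORCE gradient, and the driver (4) sums and applies the gradients. The bandit, step size, and batch sizes are illustrative assumptions.

```python
# Sketch of stage-wise sample parallelism for policy gradient on a toy bandit.
import numpy as np
from multiprocessing import Pool

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def worker(args):
    """(1) generate samples, (2) evaluate rewards, (3) compute local gradient."""
    theta, n_samples, seed = args
    rng = np.random.default_rng(seed)
    probs = softmax(theta)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        a = rng.choice(2, p=probs)             # (1) sample an action
        r = 1.0 if a == 0 else 0.1             # (2) evaluate its reward
        grad += r * (np.eye(2)[a] - probs)     # (3) REINFORCE: r * grad log pi(a)
    return grad / n_samples

if __name__ == "__main__":
    theta = np.zeros(2)
    with Pool(processes=4) as pool:
        for step in range(200):
            grads = pool.map(worker, [(theta, 64, step * 4 + i) for i in range(4)])
            theta += 0.5 * np.sum(grads, axis=0)   # (4) sum & apply gradient
    print("final action probabilities:", softmax(theta))
```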
What if we add a critic?

See John’s actor-critic lecture for what the options here are.

[Diagram: (1, 2) workers collect samples & rewards; (3) workers compute critic gradients, which are summed and applied; (4) costly synchronization; (5) workers compute policy gradients, which are summed and applied.]
What if we add a critic?

[Diagram: the same actor-critic pipeline as above, shown without the costly synchronization between the critic update and the policy gradients.]
What if we run online?

Only the parameter update requires synchronization (actor + critic params).

[Diagram: (1, 2) workers collect samples & rewards online; (3) critic gradients are summed and applied; (4, 5) policy gradients are summed and applied.]
Actor-critic algorithm: A3C

• Some differences vs. DQN, DDPG, etc.:
  • No replay buffer; instead, rely on the diversity of samples from different workers to decorrelate
  • Some variability in exploration between workers
• Pro: generally much faster in terms of wall clock time
• Con: generally much slower in terms of # of samples (more on this later…)
Mnih et al. ‘16
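A sketch of only the asynchronous update pattern, not the full A3C algorithm from Mnih et al.: several worker threads repeatedly snapshot the shared parameters, collect their own experience (a toy bandit here, so each worker has its own exploration noise), and apply Hogwild-style lock-free gradient updates to the shared parameters. There is no critic, no n-step return, and no real environment in this sketch.

```python
# Minimal sketch of asynchronous (A3C-style) updates with shared parameters.
import threading
import numpy as np

theta = np.zeros(2)                     # shared policy parameters (logits)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def worker(seed, n_updates=500, lr=0.1):
    rng = np.random.default_rng(seed)
    for _ in range(n_updates):
        local = theta.copy()            # snapshot the shared parameters
        probs = softmax(local)
        a = rng.choice(2, p=probs)      # each worker explores on its own
        r = 1.0 if a == 0 else 0.1
        grad = r * (np.eye(2)[a] - probs)
        theta += lr * grad              # Hogwild-style lock-free update

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("learned action probabilities:", softmax(theta))
```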
Actor-critic algorithm: A3C

[Figure: learning curves comparing A3C (reaching similar performance around 20,000,000 steps) with DDPG (around 1,000,000 steps); more on this later…]
Model-based algorithms: parallel GPS

[Diagram: guided policy search pipeline: (1) rollout execution (parallelize sampling); (2, 3) local policy optimization (parallelize dynamics fitting and LQR); (4) global policy optimization (parallelize SGD).]

Yahya, Li, Kalakrishnan, Chebotar, L., ‘16
Real-world model-free deep RL: parallel NAF

Gu*, Holly*, Lillicrap, L., ‘16


Simplest example: sample parallelism with off-policy algorithms

[Diagram: multiple robots collect samples in parallel; the pooled data feeds grasp success predictor training.]