Lecture 16: Meta-Learning
image credit: Ke Li
Why is meta-learning a good idea?
[Figure: supervised learning maps a training set to a test label; meta-RL maps past experience and a new state to a new action]
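The contrast in the figure can be phrased as an interface: ordinary supervised learning learns a function f(x) → y, while a meta-learner is a single function f(D_train, x_test) → y_test that adapts to each new task's small training set. A toy sketch of that interface (the nearest-centroid rule here is my own illustrative choice, not a method from the lecture):

```python
import numpy as np

def nearest_centroid_meta_learner(train_x, train_y, test_x):
    """A 'meta-learner' viewed as one function f(D_train, x_test) -> y_test.

    The adaptation step is just computing class centroids from the small
    training set, then classifying the test input by nearest centroid
    (the idea behind prototypical networks, minus the learned embedding).
    """
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(centroids - test_x, axis=1)
    return classes[np.argmin(dists)]

# One "task": a tiny 2-class dataset; the same function handles any new task.
train_x = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(nearest_centroid_meta_learner(train_x, train_y, np.array([0.05, 0.0])))  # -> 0
```

A new task means a new (train_x, train_y); nothing about the function itself changes, which is exactly what makes it "meta".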
Meta-learning in RL with memory
[Figure: the “water maze” task, showing the agent’s first, second, and third attempts]
Duan et al., “RL2: Fast Reinforcement Learning via Slow Reinforcement Learning”
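The RL² setup can be sketched as a recurrent policy whose input at each step includes the previous action and reward, and whose hidden state persists across episode boundaries within a task, so that "learning" happens inside the recurrent dynamics. A minimal sketch with random, untrained weights (all class names, dimensions, and the tanh-RNN choice are my own placeholders, not details from the paper):

```python
import numpy as np

class RecurrentMetaPolicy:
    """Sketch of an RL^2-style policy: an RNN reading (obs, prev action,
    prev reward, done flag), with hidden state carried ACROSS episodes
    within a task. Weights are random stand-ins; in RL^2 they would be
    trained end-to-end by an outer ("slow") RL loop."""

    def __init__(self, obs_dim, n_actions, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + n_actions + 2  # obs, one-hot prev action, reward, done
        self.W_in = rng.normal(0, 0.1, (hidden_dim, in_dim))
        self.W_h = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0, 0.1, (n_actions, hidden_dim))
        self.n_actions = n_actions
        self.h = np.zeros(hidden_dim)

    def reset_task(self):
        # Hidden state resets only when a NEW task begins, not each episode:
        # memory across episodes is what lets the policy adapt within a task.
        self.h = np.zeros_like(self.h)

    def act(self, obs, prev_action, prev_reward, done):
        a_onehot = np.zeros(self.n_actions)
        a_onehot[prev_action] = 1.0
        x = np.concatenate([obs, a_onehot, [prev_reward, float(done)]])
        self.h = np.tanh(self.W_in @ x + self.W_h @ self.h)
        logits = self.W_out @ self.h
        return int(np.argmax(logits))
```

The key design point is in `reset_task`: because the reward and done flag are inputs and the hidden state survives episode resets, the second and third attempts at the maze can exploit what the first attempt revealed.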
Connection to contextual policies
[Figure: the RL algorithm loop: generate samples (i.e. run the policy), fit a model / estimate the return, improve the policy; with a contextual policy the adaptation step is trivial, nothing to do. The stages marked (1) and (2, 3, 4) are the candidates for parallelism.]
Simple example: sample parallelism with PG
[Figure: step counts compared: DDPG, 1,000,000 steps; parallel PG, 20,000,000 steps]
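In vanilla policy gradient, sample generation (stage 1) dominates the cost, and rollouts are independent of one another, so they parallelize trivially; return estimation and the gradient step then run centrally on the pooled batch. A minimal sketch using only the standard library (the toy environment and the random "policy" are placeholders of my own; a real sampler would hold a simulator and the current policy parameters in each worker):

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def collect_rollout(seed):
    """Worker: run the policy in its own copy of a toy environment and
    return (states, actions, rewards). Each worker is fully independent."""
    rng = np.random.default_rng(seed)
    states, actions, rewards = [], [], []
    s = rng.normal(size=2)
    for _ in range(10):
        a = int(rng.integers(2))       # placeholder for sampling pi_theta(a|s)
        r = -float(np.sum(s ** 2))     # toy reward
        states.append(s); actions.append(a); rewards.append(r)
        s = s + rng.normal(scale=0.1, size=2)
    return states, actions, rewards

def parallel_pg_iteration(n_workers=4):
    # Stage (1), generate samples, fans out across workers; stages
    # (2, 3, 4), return estimation and the policy update, stay central.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        rollouts = list(pool.map(collect_rollout, range(n_workers)))
    returns = [sum(rews) for _, _, rews in rollouts]
    return returns

if __name__ == "__main__":
    print(parallel_pg_iteration())
```

This structure is why sample parallelism buys wall-clock speed but not sample efficiency: the number of environment steps consumed per update grows with the number of workers.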
Model-based algorithms: parallel GPS
- [parallelize sampling]
- [parallelize dynamics fitting]
- [parallelize LQR]
- [parallelize SGD]
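Each of these stages is a map over independent work items: initial conditions for sampling, dynamics fitting, and LQR, and data shards or minibatches for SGD. As one hedged example (toy data and helper names of my own), the local linear dynamics fit for each initial condition can be farmed out to a worker pool:

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def fit_local_dynamics(trajectories):
    """Least-squares fit of s' = A s + B a + c for ONE initial condition.
    Fits for different conditions share nothing, so they run in parallel."""
    X, Y = [], []
    for states, actions in trajectories:
        for t in range(len(actions)):
            X.append(np.concatenate([states[t], actions[t], [1.0]]))
            Y.append(states[t + 1])
    coeffs, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
    return coeffs.T  # rows = next-state dims, columns = [A | B | c]

def make_toy_trajectories(seed, T=20):
    # Noiseless linear system, so the fit is exact; a stand-in for rollouts.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    rng = np.random.default_rng(seed)
    s = rng.normal(size=2)
    states, actions = [s], []
    for _ in range(T):
        a = rng.normal(size=1)
        s = A @ s + B @ a
        states.append(s)
        actions.append(a)
    return [(states, actions)]

if __name__ == "__main__":
    conditions = [make_toy_trajectories(seed) for seed in range(4)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        local_models = list(pool.map(fit_local_dynamics, conditions))
    print(local_models[0].round(3))
```

The same `pool.map` pattern applies to the LQR backward passes per condition; parallel SGD instead splits data or gradients, which requires synchronizing parameters rather than a pure map.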
[Figure: distributed pipeline: (1) rollout execution produces samples; (2, 3) grasp success labeling and predictor training consume them]