Introduction to Reinforcement Learning
CS 285
Instructor: Sergey Levine
UC Berkeley
Definitions
Terminology & notation
Possible actions:
1. run away
2. ignore
3. pet
Imitation Learning
training data → supervised learning
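The slide's point, that imitation learning is just supervised learning on demonstration data, can be sketched with a toy example. The dataset and the majority-vote "classifier" below are illustrative placeholders, not the method used in the lecture's experiments:

```python
# Imitation learning as supervised learning: a toy behavioral-cloning sketch.
# Hypothetical data: expert demonstrations as (observation, action) pairs.
from collections import Counter, defaultdict

def fit_policy(demos):
    """Majority-vote classifier: map each observation to the expert's most common action."""
    votes = defaultdict(Counter)
    for obs, act in demos:
        votes[obs][act] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in votes.items()}

demos = [("tiger", "run away"), ("tiger", "run away"),
         ("tiger", "pet"), ("squirrel", "ignore")]
policy = fit_policy(demos)
print(policy["tiger"])     # -> run away
print(policy["squirrel"])  # -> ignore
```

In practice the classifier would be a deep network trained by gradient descent, but the structure is the same: observations in, expert actions out.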
Definitions
[portraits: Andrey Markov, Richard Bellman]
The goal of reinforcement learning
we’ll come back to partially observed later
Finite horizon case: state-action marginal
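The finite-horizon state-action marginal p(s_t, a_t) can be computed by rolling the state distribution forward through the policy and the transition dynamics. A minimal sketch with a made-up two-state, two-action MDP (all probabilities here are illustrative numbers):

```python
# Finite-horizon state-action marginal p(s_t, a_t) for a tiny 2-state, 2-action MDP.
pi = [[0.5, 0.5], [0.9, 0.1]]           # pi[s][a] = pi(a | s)
T = [[[0.8, 0.2], [0.1, 0.9]],          # T[s][a][s'] = p(s' | s, a)
     [[0.6, 0.4], [0.3, 0.7]]]
p_s = [1.0, 0.0]                        # initial state distribution: start in state 0

def step(p_s):
    """One forward marginalization: p(s') = sum over s, a of p(s) pi(a|s) T(s'|s,a)."""
    return [sum(p_s[s] * pi[s][a] * T[s][a][sp] for s in range(2) for a in range(2))
            for sp in range(2)]

for t in range(3):                      # roll the marginal forward to t = 3
    p_s = step(p_s)

p_sa = [[p_s[s] * pi[s][a] for a in range(2)] for s in range(2)]  # p(s_t, a_t)
print(p_sa)  # rows sum to p(s_t = s); all entries together sum to 1
```

This is exactly the chain rule p(s_t, a_t) = p(s_t) pi(a_t | s_t), with p(s_t) obtained by repeated marginalization from t = 0.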
Infinite horizon case: stationary distribution
stationary = the same before and after transition
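The condition "same before and after transition" (mu = T mu) can be checked numerically by power iteration. A minimal sketch with a made-up two-state Markov chain induced by some fixed policy:

```python
# Stationary distribution: mu is unchanged by one more transition, mu(s') = sum_s mu(s) P(s'|s).
P = [[0.9, 0.1],   # P[s][s'] = p(s' | s) under the policy (illustrative numbers)
     [0.5, 0.5]]

mu = [0.5, 0.5]
for _ in range(1000):  # power iteration converges for an ergodic chain
    mu = [sum(mu[s] * P[s][sp] for s in range(2)) for sp in range(2)]

# Check stationarity: applying one more transition leaves mu unchanged.
mu_next = [sum(mu[s] * P[s][sp] for s in range(2)) for sp in range(2)]
print(mu)       # -> approximately [0.8333, 0.1667]
print(mu_next)  # -> same distribution
```

Equivalently, mu is the eigenvector of the transition operator with eigenvalue 1; ergodicity guarantees the chain forgets its initial distribution and converges to this mu.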
Expectations and stochastic systems
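The key observation here: even when the reward itself is discontinuous (a hard +1 or -1 outcome), its *expectation* under a stochastic policy is a smooth function of the policy parameters, which is what makes gradient-based optimization possible. A minimal sketch, where theta is a hypothetical parameter giving the probability of the -1 outcome:

```python
# Expectations in stochastic systems: the reward takes only the values +1 and -1,
# yet the expected reward is smooth (in fact linear) in the parameter theta.
def expected_reward(theta):
    """E[r] when the bad outcome (-1) happens with probability theta, else +1."""
    return theta * (-1.0) + (1.0 - theta) * (+1.0)   # = 1 - 2*theta

print(expected_reward(0.0))   # -> 1.0
print(expected_reward(0.25))  # -> 0.5
print(expected_reward(0.5))   # -> 0.0
```

The discrete outcomes never change, but E[r] varies continuously with theta, so small parameter changes produce small changes in the objective.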
Algorithms
The anatomy of a reinforcement learning algorithm
A repeating loop:
1. generate samples (i.e. run the policy)
2. fit a model / estimate the return
3. improve the policy
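The three boxes of the anatomy can be sketched as a generic training loop. The functions below are hypothetical placeholders, filled in with a trivial bandit-style example rather than any particular algorithm from the course:

```python
# The anatomy of an RL algorithm as a loop: sample, estimate, improve.
import random

random.seed(0)
theta = 0.5  # policy parameter: probability of picking action 1

def run_policy(theta, n=100):
    """1. Generate samples: roll out the policy, record (action, reward) pairs."""
    samples = []
    for _ in range(n):
        a = int(random.random() < theta)
        samples.append((a, float(a)))   # reward 1 for action 1, else 0
    return samples

def estimate_return(samples):
    """2. Estimate the return: here, a simple Monte Carlo average."""
    return sum(r for _, r in samples) / len(samples)

def improve_policy(theta, avg_return):
    """3. Improve the policy: a toy update pushing theta toward higher return."""
    return min(1.0, theta + 0.1 * (1.0 - avg_return))

for _ in range(20):
    samples = run_policy(theta)
    avg = estimate_return(samples)
    theta = improve_policy(theta, avg)

print(round(theta, 2))  # theta climbs toward 1.0, since action 1 yields reward 1
```

Real algorithms differ in which box does the heavy lifting: policy gradients put the work in step 3, model-based methods in step 2, and so on, but all of them cycle through this loop.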
• On-policy: each time the policy is changed, even a little bit (e.g., after just one gradient step), we need to generate new samples.
Comparison: sample efficiency
off-policy (more sample-efficient) ↔ on-policy (less sample-efficient)
Example 2: robots and model-based RL
• End-to-end training of deep visuomotor policies, Levine*, Finn* '16
• Guided policy search (model-based RL) for image-based robotic manipulation
Example 3: walking with policy gradients
• High-dimensional continuous control with generalized advantage estimation, Schulman et al. '16
• Trust region policy optimization with value function approximation
Example 4: robotic grasping with Q-functions
• QT-Opt, Kalashnikov et al. '18
• Q-learning from images for real-world robotic grasping