1 Introduction To RL
CS 294-112
Course logistics
Class Information & Resources
Sergey Levine, Assistant Professor, UC Berkeley
Abhishek Gupta, PhD Student, UC Berkeley
Josh Achiam, PhD Student, UC Berkeley
[Figure: decisions (actions) lead to consequences (observations, rewards)]
Examples
Actions: muscle contractions; Observations: sight, smell; Rewards: food
Actions: motor current or torque; Observations: camera images; Rewards: task success measure (e.g., running speed)
Actions: what to purchase; Observations: inventory levels; Rewards: profit
What is deep RL, and why should we care?
Deep learning: end-to-end training of expressive, multi-layer models
[Figure: sensorimotor loop; action (run away)]
Example: robotics
[Figure: robotic control pipeline: observations -> state estimation (e.g. vision) -> modeling & prediction -> planning -> low-level control -> controls]
Example: playing video games
[Figure: end-to-end training with deep learning]
[Figure: deep robotic learning, end-to-end training: observations -> state estimation (e.g. vision) -> modeling & prediction -> planning -> low-level control -> controls]
tiny, highly specialized “visual cortex”
tiny, highly specialized “motor cortex”
no direct supervision
actions have consequences
The reinforcement learning problem
[Figure: decisions (actions) lead to consequences (observations, rewards)]
Actions: motor current or torque
Observations: camera images
Rewards: task success measure (e.g., running speed)
The reinforcement learning problem is the AI problem!
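The loop in the figure above can be sketched in a few lines of Python. This is only an illustrative sketch: `ToyEnvironment`, `random_policy`, and `run_episode` are hypothetical names invented here, and the one-dimensional world with a reward for moving right is an assumption standing in for a task success measure such as running speed.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D world: the agent starts at position 0 and is rewarded for moving right."""

    def __init__(self, goal=5, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # the observation is simply the current position

    def step(self, action):
        # action is -1 (left) or +1 (right); it changes the state,
        # so it also changes every later observation.
        self.position += action
        self.steps += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.position >= self.goal or self.steps >= self.max_steps
        return self.position, reward, done


def random_policy(observation, rng):
    """Ignore the observation and pick a direction at random."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng):
    """Run one episode: observe, act, receive reward, repeat until done."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation, rng)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    rng = random.Random(0)
    print(run_episode(ToyEnvironment(), random_policy, rng))
```

The policy here never learns; the sketch only shows the interface between decisions, observations, and rewards that a learning algorithm would plug into.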
When do we not need to worry about sequential decision making?
When your system is making a single isolated decision, e.g. classification, regression
When that decision does not affect future decisions
When should we worry about sequential decision making?
Limited supervision: you know what you want, but not how to get it
Actions have consequences
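A small sketch may help show why "actions have consequences" matters. The two-step problem below is entirely hypothetical (the state names, action names, and rewards in `TRANSITIONS` are made up for illustration). It compares a decision rule that treats each choice as isolated with one that accounts for what the choice leads to.

```python
# Hypothetical two-step problem: TRANSITIONS[state][action] = (reward, next_state).
# States that do not appear as keys (here "end") are terminal.
TRANSITIONS = {
    "start": {"snack": (1.0, "dead_end"), "invest": (0.0, "payoff")},
    "dead_end": {"wait": (0.0, "end")},
    "payoff": {"collect": (10.0, "end")},
}


def greedy_action(state):
    """Pick the action with the largest immediate reward, ignoring consequences."""
    actions = TRANSITIONS[state]
    return max(actions, key=lambda a: actions[a][0])


def best_return(state):
    """Largest total reward obtainable from `state`, found by exhaustive search."""
    if state not in TRANSITIONS:  # terminal state
        return 0.0
    return max(r + best_return(nxt) for r, nxt in TRANSITIONS[state].values())


def farsighted_action(state):
    """Pick the action maximizing immediate reward plus the best future return."""
    actions = TRANSITIONS[state]
    return max(actions, key=lambda a: actions[a][0] + best_return(actions[a][1]))


def rollout(state, choose):
    """Follow the decision rule `choose` until a terminal state; return total reward."""
    total = 0.0
    while state in TRANSITIONS:
        reward, state = TRANSITIONS[state][choose(state)]
        total += reward
    return total


if __name__ == "__main__":
    print("greedy:", rollout("start", greedy_action))
    print("farsighted:", rollout("start", farsighted_action))
```

Exhaustive search is only feasible because this example is tiny and fully known in advance; it is used here to make the contrast concrete, not as a suggested method.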
Common Applications
autonomous driving
business operations
Tesauro, 1995
L.-J. Lin, “Reinforcement learning for robots using neural networks.” 1993
Why should we study this now?
[Figure: original video and predictions (Finn et al. 2017)]
How do we build intelligent machines?
• Imagine you have to build an intelligent machine. Where do you start?
Learning as the basis of intelligence
• Some things we can all do (e.g. walking)
• Some things we can only learn (e.g. driving a car)
• We can learn a huge variety of things, including very difficult things
• Therefore our learning mechanism(s) are likely powerful enough to do
everything we associate with intelligence
• But it may still be very convenient to “hard-code” a few really important bits
A single algorithm?
• An algorithm for each “module”?
• Or a single flexible algorithm?
[Figure: auditory cortex; human echolocation (sonar)]
[Figure: agent and environment exchanging observations and actions]
“... education one would obtain the adult brain.”
- Alan Turing