1 Introduction To RL

This document provides an overview of a deep reinforcement learning course at UC Berkeley. It discusses course logistics, prerequisites, what will be covered in the course, assignments, and an introduction to reinforcement learning and why it is important. The key topics covered are: deep reinforcement learning, examples of reinforcement learning problems, why we should study this now given advances in deep learning and computational power, and other challenges that need to be addressed to enable real-world sequential decision making applications.

Uploaded by

Nathaniel Saura

Deep Reinforcement Learning

CS 294-112
Course logistics
Class Information & Resources
Sergey Levine, Assistant Professor, UC Berkeley
Abhishek Gupta, PhD Student, UC Berkeley
Josh Achiam, PhD Student, UC Berkeley

• Course website: rll.berkeley.edu/deeprlcourse/

• Piazza: UC Berkeley, CS294-112
• Subreddit (for non-enrolled students): www.reddit.com/r/berkeleydeeprlcourse/
• Office hours: after class each day (but not today), sign up in advance for a 10-minute slot on the course website
Prerequisites & Enrollment
• All enrolled students must have taken CS189, CS289, or CS281A
• Please contact Sergey Levine if you haven’t
• Please enroll for 3 units
• Students on the wait list will be notified as slots open up
• Lectures will be recorded
• Since the class is full, please watch the lectures online if you are not enrolled
What you should know
• Assignments will require training neural networks with standard automatic differentiation packages (TensorFlow by default)
• Review Section
• Josh Achiam will teach a review section in week 3
• You should be able to at least do the TensorFlow MNIST tutorial (if not, come to the review section and ask questions!)
What we’ll cover
• Full syllabus on course website
1. From supervised learning to decision making
2. Basic reinforcement learning: Q-learning and policy gradients
3. Advanced model learning and prediction, distillation, reward learning
4. Advanced deep RL: trust region policy gradients, actor-critic methods, exploration
5. Open problems, research talks, invited lectures
Assignments
1. Homework 1: Imitation learning (control via supervised learning)
2. Homework 2: Policy gradients (“REINFORCE”)
3. Homework 3: Q learning with convolutional neural networks
4. Homework 4: Model-based reinforcement learning
5. Final project: Research-level project of your choice (form a group of up to 2-3 students, you’re welcome to start early!)

Grading: 40% homework (10% each), 60% project

Your “Homework” Today
1. Sign up for Piazza (see course website)
2. Start forming your final project groups, unless you want to work alone, which is fine
3. Fill out the enrolled student survey if you haven’t already!
4. Check out the TensorFlow MNIST tutorial, unless you’re a TensorFlow pro
What is reinforcement learning, and why should we care?
What is reinforcement learning?
[Figure: agent-environment loop. The agent sends decisions (actions) to the world; the world returns consequences (observations and rewards).]
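The loop in the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CorridorEnv`, `run_episode`, and the two policies are hypothetical names invented here, and the `reset`/`step` interface is a common convention rather than anything the slides prescribe.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach a goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the first observation

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.position = max(0, min(self.length, self.position + action))
        done = self.position == self.length
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                  # decision
        observation, reward, done = env.step(action)  # consequences
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(observation):
    return 1


def random_policy(observation):
    return random.choice([-1, 1])
```

Calling `run_episode(CorridorEnv(), always_right)` walks straight to the goal, while `random_policy` may or may not reach it within the step limit.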
Examples

• Actions: muscle contractions; Observations: sight, smell; Rewards: food
• Actions: motor current or torque; Observations: camera images; Rewards: task success measure (e.g., running speed)
• Actions: what to purchase; Observations: inventory levels; Rewards: profit
What is deep RL, and why should we care?
Deep learning: end-to-end training of expressive, multi-layer models

Deep models are what allow reinforcement learning algorithms to solve complex problems end to end!
What does end-to-end learning mean for sequential decision making?
[Figure: perception followed by action (run away), contrasted with a single sensorimotor loop that produces the action (run away).]
Example: robotics

Robotic control pipeline: observations → state estimation (e.g. vision) → modeling & prediction → planning → low-level control → controls
Example: playing video games

Video game AI pipeline: game API → extract relevant features → state machine for behavior → planner → low-level bot control → controls
Standard computer vision: features (e.g. HOG) → mid-level features (e.g. DPM) → classifier (e.g. SVM) (Felzenszwalb ‘08)

Deep learning: end-to-end training

Robotic control pipeline: observations → state estimation (e.g. vision) → modeling & prediction → planning → low-level control → controls

Deep robotic learning (end-to-end training): observations → state estimation (e.g. vision) → modeling & prediction → planning → low-level control → controls
• tiny, highly specialized “visual cortex”
• tiny, highly specialized “motor cortex”
• no direct supervision
• actions have consequences
The reinforcement learning problem
[Figure: agent-environment loop. Decisions (actions) go out; consequences (observations and rewards) come back.]
• Actions: motor current or torque; Observations: camera images; Rewards: task success measure (e.g., running speed)
• Actions: what to purchase; Observations: inventory levels; Rewards: profit
• Actions: words in French; Observations: words in English; Rewards: BLEU score

Deep models are what allow reinforcement learning algorithms to solve complex problems end to end!

The reinforcement learning problem is the AI problem!
When do we not need to worry about sequential decision making?
• When your system is making a single isolated decision, e.g. classification, regression
• When that decision does not affect future decisions
When should we worry about sequential decision making?
• Limited supervision: you know what you want, but not how to get it
• Actions have consequences
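To make "actions have consequences" concrete, here is a small sketch that compares a decision rule that only looks at the immediate reward with one that plans over the whole horizon. The three-state problem and the names `TRANSITIONS`, `greedy_return`, and `best_return` are invented for illustration and do not come from the course material.

```python
# Hypothetical deterministic problem: TRANSITIONS[state][action] = (reward, next_state)
TRANSITIONS = {
    "start": {"grab": (1.0, "dead_end"), "wait": (0.0, "good")},
    "dead_end": {"grab": (0.0, "dead_end"), "wait": (0.0, "dead_end")},
    "good": {"grab": (10.0, "dead_end"), "wait": (0.0, "good")},
}


def greedy_return(state, horizon):
    """Total reward when each decision is treated as isolated (best immediate reward)."""
    total = 0.0
    for _ in range(horizon):
        actions = TRANSITIONS[state]
        action = max(actions, key=lambda a: actions[a][0])
        reward, state = actions[action]
        total += reward
    return total


def best_return(state, horizon):
    """Total reward of the best action sequence, found by exhaustive search."""
    if horizon == 0:
        return 0.0
    return max(
        reward + best_return(next_state, horizon - 1)
        for reward, next_state in TRANSITIONS[state].values()
    )
```

With a horizon of two steps, the greedy rule grabs the small reward first and ends in the dead end, while the planner waits one step and collects the larger reward. The gap exists only because the first action changes which decisions are available later.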

Common Applications
• autonomous driving
• business operations
• language & dialogue (structured prediction)
• robotics
• finance
Why should we study this now?

1. Advances in deep learning

2. Advances in reinforcement learning
3. Advances in computational capability
Why should we study this now?

Tesauro, 1995

L.-J. Lin, “Reinforcement learning for robots using neural networks.” 1993
Why should we study this now?

Atari games:
Q-learning: V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, et al. “Playing Atari with Deep Reinforcement Learning”. (2013).
Policy gradients: J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel. “Trust Region Policy Optimization”. (2015).
V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, et al. “Asynchronous methods for deep reinforcement learning”. (2016).

Real-world robots:
Guided policy search: S. Levine*, C. Finn*, T. Darrell, P. Abbeel. “End-to-end training of deep visuomotor policies”. (2015).
Q-learning: S. Gu*, E. Holly*, T. Lillicrap, S. Levine. “Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates”. (2016).

Beating Go champions:
Supervised learning + policy gradients + value functions + Monte Carlo tree search: D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, et al. “Mastering the game of Go with deep neural networks and tree search”. Nature (2016).
What other problems do we need to solve to enable real-world sequential decision making?
Beyond learning from reward

• Basic reinforcement learning deals with maximizing rewards

• This is not the only problem that matters for sequential decision making!
• We will cover more advanced topics
• Learning reward functions from example (inverse reinforcement learning)
• Transferring skills between domains
• Learning to predict and using prediction to act
Where do rewards come from?
Are there other forms of supervision?

• Learning from demonstrations

• Directly copying observed behavior
• Inferring rewards from observed behavior (inverse reinforcement learning)
• Learning from observing the world
• Learning to predict
• Unsupervised learning
• Learning from other tasks
• Transfer learning
• Meta-learning: learning to learn
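As a rough illustration of "directly copying observed behavior", the sketch below fits a one-parameter linear policy to expert demonstrations by least squares, which treats imitation as ordinary supervised learning. The expert controller, the demonstration set, and all function names are assumptions made up for this example.

```python
def expert_policy(observation):
    # Hypothetical expert: a proportional controller that pushes the state toward zero
    return -2.0 * observation


def collect_demonstrations(observations):
    """Record (observation, expert action) pairs."""
    return [(o, expert_policy(o)) for o in observations]


def fit_linear_policy(demos):
    """Closed-form least-squares fit of action = w * observation."""
    numerator = sum(o * a for o, a in demos)
    denominator = sum(o * o for o, _ in demos)
    return numerator / denominator


demos = collect_demonstrations([-1.0, -0.5, 0.5, 1.0, 2.0])
w = fit_linear_policy(demos)


def cloned_policy(observation):
    # The learner never sees a reward; it only regresses onto the expert's actions
    return w * observation
```

Because the hypothetical expert is itself linear, the fit recovers it exactly. With a richer expert or a neural network policy the same recipe applies, but the learner can drift into states the demonstrations never covered.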
Imitation learning

Bojarski et al. 2016

More than imitation: inferring intentions

Warneken & Tomasello

Inverse RL examples

Finn et al. 2016

Prediction
What can we do with a perfect model?

Mordatch et al. 2015

Prediction for real-world control

[Figure: original video and predictions. Finn et al. 2017]
How do we build intelligent machines?
• Imagine you have to build an intelligent machine, where do you start?
Learning as the basis of intelligence
• Some things we can all do (e.g. walking)
• Some things we can only learn (e.g. driving a car)
• We can learn a huge variety of things, including very difficult things
• Therefore our learning mechanism(s) are likely powerful enough to do everything we associate with intelligence
• But it may still be very convenient to “hard-code” a few really important bits
A single algorithm?
• An algorithm for each “module”?
• Or a single flexible algorithm?

[Figure: seeing with your tongue; auditory cortex; human echolocation (sonar). Sources: BrainPort; Martinez et al.; Roe et al. Adapted from A. Ng]
What must that single algorithm do?
• Interpret rich sensory inputs

• Choose complex actions

Why deep reinforcement learning?
• Deep = can process complex sensory input
▪ …and also compute really complex functions
• Reinforcement learning = can choose complex actions
Some evidence in favor of deep learning
Some evidence for reinforcement learning
• Percepts that anticipate reward become associated with similar firing patterns as the reward itself
• The basal ganglia appear to be related to the reward system
• Model-free RL-like adaptation is often a good fit for experimental data of animal adaptation
• But not always…
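One common way to illustrate the first point is temporal-difference (TD) learning, a model-free RL update. The slides do not name a specific algorithm, so treating TD(0) as the model here is an assumption, as are the three-state cue-to-reward chain and the function name `td_learning`. In this sketch, the value of the cue state grows until it anticipates the reward, and the prediction error at reward time shrinks.

```python
def td_learning(num_episodes=500, alpha=0.1, gamma=1.0, num_states=3):
    """TD(0) on a chain: state 0 is the cue; a reward of 1 arrives on leaving the last state."""
    values = [0.0] * num_states
    reward_time_errors = []
    for _ in range(num_episodes):
        for s in range(num_states):
            last = s == num_states - 1
            reward = 1.0 if last else 0.0
            next_value = 0.0 if last else values[s + 1]
            # TD error: how much better or worse things went than predicted
            delta = reward + gamma * next_value - values[s]
            values[s] += alpha * delta
            if last:
                reward_time_errors.append(delta)
    return values, reward_time_errors


values, errors = td_learning()
```

In the first episode the reward is fully unexpected, so the error at reward time is 1. After many episodes the cue state's value approaches 1 and the reward-time error approaches 0, which is the qualitative pattern the bullet describes.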
What can deep learning & RL do well now?
• Acquire a high degree of proficiency in domains governed by simple, known rules
• Learn simple skills with raw sensory inputs, given enough experience
• Learn from imitating enough human-provided expert behavior
What has proven challenging so far?
• Humans can learn incredibly quickly
• Deep RL methods are usually slow
• Humans can reuse past knowledge
• Transfer learning in deep RL is an open problem
• Not clear what the reward function should be
• Not clear what the role of prediction should be
Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain.
- Alan Turing

[Figure: a general learning algorithm receiving observations from, and sending actions to, an environment.]
