Lecture 1
Today's agenda
• Administrivia
• Topics covered by the course
• Behavioral cloning
• Imitation learning
• Quiz about background and interests
• (Time permitting) Query the expert only when policy is uncertain
Administrivia
This is a graduate-level course.

Prerequisites
Mandatory:
• Introductory machine learning (e.g. CSC411/ECE521 or equivalent)
• Basic linear algebra + multivariable calculus
• Intro to probability
• Programming skills in Python or C++ (enough to validate your ideas)

Recommended:
• Experience training neural networks or other function approximators
• Introductory concepts from reinforcement learning or control (e.g. value function/cost-to-go)

If you're missing any of the mandatory prerequisites, this is not the course for you. You're welcome to audit.
Grading
• Two assignments (individual submissions): 50%
• Project presentation: 5%
• Final project report (6-8 pages) + code: 30%

Project guidelines:
http://www.cs.toronto.edu/~florian/courses/csc2626w21/CSC2626_Project_Guidelines.pdf
Guiding principles for this course
Robots do not operate in a vacuum. They do not need to learn everything from scratch.
Humans need to easily interact with robots and share our expertise with them.
Robots need to learn from the behavior and experience of others, not just their own.
Main questions
How can robots easily understand our objectives from demonstrations?
• Reward/cost learning
• Task specification
• Inverse reinforcement learning
• Inverse optimal control
• Inverse optimization
https://www.youtube.com/watch?v=2KMAAmkz9go https://www.youtube.com/watch?v=ilP4aPDTBPE
https://drive.google.com/file/d/0Bz9namoRlUKMa0pJYzRGSFVwbm8/view
Dean Pomerleau’s PhD thesis
ALVINN: training set
https://www.youtube.com/watch?v=qhUvQiKec2U
How much has changed?
“Our collected data is labeled with road type, weather condition, and the driver’s
activity (staying in a lane, switching lanes, turning, and so forth).”
A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots, Giusti et al., 2016
https://www.youtube.com/watch?v=umRdt3zGgpU
How much has changed?
But, there are a few other beautiful ideas that do not involve end-to-end learning.
Visual Teach & Repeat
Key Idea #1: Manifold Map
Key Idea #2: Visual Odometry
Visual Path Following on a Manifold in Unstructured Three-Dimensional Terrain, Furgale & Barfoot, 2010
https://www.youtube.com/watch?v=_ZdBfU4xJnQ
https://www.youtube.com/watch?v=9dN0wwXDuqo
Back to Pomerleau
(Ross & Bagnell, 2010): How can we be sure these errors are not due to overfitting or underfitting?
The learned policy maps image features s to steering commands: a = π(s).
Supervised learning assumes train/test data are i.i.d., but in imitation learning train/test data are not i.i.d.: the learner's own actions determine which states it encounters at test time.
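Behavioral cloning treats imitation as ordinary supervised regression on expert (state, action) pairs. A minimal sketch, assuming a hypothetical linear expert and synthetic demonstrations (the weights, feature dimension, and noise level are illustrative, not Pomerleau's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a linear map from image features s to steering commands a.
true_w = np.array([0.5, -0.2, 0.1])
S = rng.normal(size=(200, 3))                   # observed feature vectors
a = S @ true_w + 0.01 * rng.normal(size=200)    # expert steering commands (slightly noisy)

# Behavioral cloning = least-squares regression on the demonstration data.
w_hat, *_ = np.linalg.lstsq(S, a, rcond=None)

def policy(s):
    """Cloned policy: predicted steering command for features s."""
    return s @ w_hat
```

With enough demonstrations this fits the expert well on the expert's own state distribution; the i.i.d. caveat above is about what happens once the cloned policy drifts to states the expert never visited.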
https://www.youtube.com/watch?v=V00npNnWzSU
DAgger
https://www.youtube.com/watch?v=hNsP6-K3Hn4
Learning Monocular Reactive UAV Control in Cluttered Natural Environments, Ross et al, 2013
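The DAgger loop can be sketched on a toy problem: roll out the current policy, have the expert label the states the learner actually visits, aggregate those labels into one growing dataset, and retrain. The dynamics, expert, and linear policy class below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def expert(S):
    # Hypothetical expert: a known linear controller we try to imitate.
    return S @ np.array([0.5, -0.2])

def rollout(w, n=50):
    # Toy dynamics: the next state depends on the learner's own action,
    # so the visited-state distribution depends on the current policy.
    states, s = [], np.array([1.0, 0.0])
    for _ in range(n):
        a = s @ w
        states.append(s.copy())
        s = 0.9 * s + np.array([0.1 * a, 0.05]) + 0.01 * rng.normal(size=2)
    return np.array(states)

# DAgger: aggregate expert labels on the learner's own state distribution.
w = np.zeros(2)
S_agg = np.empty((0, 2))
a_agg = np.empty(0)
for i in range(10):
    S_i = rollout(w)                     # run the current policy
    a_i = expert(S_i)                    # expert labels the visited states
    S_agg = np.vstack([S_agg, S_i])      # dataset aggregation
    a_agg = np.concatenate([a_agg, a_i])
    w, *_ = np.linalg.lstsq(S_agg, a_agg, rcond=None)  # retrain on everything
```

The key difference from behavioral cloning is where the training states come from: each iteration labels states induced by the learner itself, so the final dataset covers the distribution the deployed policy will actually see.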
DAgger: Assumptions for theoretical guarantees
(Ross, Gordon & Bagnell, 2011): DAgger, or Dataset Aggregation
No-regret: the online learning procedure that produces the sequence of policies π_1, ..., π_N satisfies
(1/N) Σ_i ℓ_i(π_i) − min_{π ∈ Π} (1/N) Σ_i ℓ_i(π) → 0 as N → ∞,
i.e. its average loss approaches that of the best fixed policy in hindsight.
Appendix: Types of Uncertainty &
Query-Efficient Imitation
1. DropoutDAgger: keep an ensemble of learner policies (e.g. via dropout), and query the expert only when their predictions on the current observations significantly disagree.
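The disagreement test can be sketched as follows; the ensemble here is a set of hypothetical perturbed linear policies standing in for dropout samples, and the threshold is an arbitrary illustrative value:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ensemble of cloned policies (e.g. dropout masks or bootstraps):
# each member is a small random perturbation of a shared linear policy.
ensemble = [np.array([0.50, -0.20]) + 0.02 * rng.normal(size=2) for _ in range(10)]

def should_query_expert(s, threshold=0.1):
    """Query the expert only when ensemble predictions significantly disagree."""
    preds = np.array([s @ w for w in ensemble])
    return preds.std() > threshold

# Near the training distribution the members agree; far from it they diverge.
s_familiar = np.array([1.0, 1.0])
s_novel = np.array([50.0, -50.0])
```

The spread of ensemble predictions is a proxy for epistemic uncertainty: it grows on inputs unlike the training data, which is exactly where the learner should fall back on the expert.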
Biased Coin
Q: Even if you eventually discover the true model, can you predict if the next flip will be heads?
A: No, there is irreducible uncertainty / observation noise in the system. This is called aleatoric uncertainty.
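A quick numerical illustration: estimating the bias p shrinks our epistemic uncertainty about the model, but the per-flip variance p(1 − p) is irreducible no matter how many flips we observe (the bias 0.7 and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

p = 0.7                                   # true (initially unknown) bias
flips = rng.random(100_000) < p           # observed flips

p_hat = flips.mean()                      # epistemic uncertainty about p shrinks...
var_per_flip = p_hat * (1 - p_hat)        # ...but outcome variance p(1-p) remains
```

With 100,000 flips the estimate of p is accurate to about ±0.003, yet the next flip is still heads only with probability ~0.7: that residual randomness is the aleatoric part.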
Gaussian Process Regression
Noisy observations
http://pyro.ai/examples/gp.html
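The standard closed-form GP posterior with an observation-noise term on the diagonal can be sketched in a few lines (the RBF kernel, length scale, noise level, and sin target here are illustrative, not the Pyro tutorial's exact setup):

```python
import numpy as np

def rbf(X1, X2, length=1.0):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Noisy observations of an illustrative target function.
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(X)
noise = 0.1   # observation-noise std: the aleatoric term

# GP posterior at a test point; the noise enters on the diagonal of K.
Xs = np.array([0.5])
K = rbf(X, X) + noise**2 * np.eye(len(X))
Ks = rbf(X, Xs)                                 # cross-covariance, shape (5, 1)
Kss = rbf(Xs, Xs)                               # prior covariance at test point
mean = Ks.T @ np.linalg.solve(K, y)             # posterior mean
cov = Kss - Ks.T @ np.linalg.solve(K, Ks)       # posterior covariance
```

The posterior covariance separates the two uncertainty types: it shrinks near training points (epistemic uncertainty resolved by data) while the noise term on the diagonal models the irreducible aleatoric part.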
Gaussian Process Classification