Optimal Control and Planning
CS 285
Instructor: Sergey Levine
UC Berkeley
Today’s Lecture
1. Introduction to model-based reinforcement learning
2. What if we know the dynamics? How can we make decisions?
3. Stochastic optimization methods
4. Monte Carlo tree search (MCTS)
5. Trajectory optimization
• Goals:
• Understand how we can perform planning with known dynamics models in discrete and continuous spaces
Recap: the reinforcement learning objective
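As a recap (the equation on this slide did not survive extraction; this is the course's standard statement of it): the RL objective maximizes expected total reward under the trajectory distribution induced by the policy.

```latex
\theta^\star = \arg\max_\theta \, \mathbb{E}_{\tau \sim p_\theta(\tau)}\!\left[ \sum_t r(s_t, a_t) \right],
\qquad
p_\theta(\tau) = p(s_1) \prod_{t=1}^{T} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)
```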
Recap: model-free reinforcement learning
Model-free RL makes no use of the system dynamics p(s_{t+1} | s_t, a_t): it learns from sampled transitions without ever modeling them.
The objective
Example: if the dynamics are known, we can choose actions by predicting their outcomes. Upon encountering a tiger, the candidate actions might be:
1. run away
2. ignore
3. pet
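Formally (the slide's equation, reconstructed in the lecture's usual notation), planning means choosing the action sequence that minimizes total cost subject to the dynamics:

```latex
a_1, \ldots, a_T = \arg\min_{a_1, \ldots, a_T} \sum_{t=1}^{T} c(s_t, a_t)
\quad \text{s.t.} \quad s_{t+1} = f(s_t, a_t)
```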
The deterministic case
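In the deterministic case the dynamics s_{t+1} = f(s_t, a_t) pin down the whole trajectory once the actions are chosen, so (equivalently, with reward r = -c) the plan is:

```latex
a_1, \ldots, a_T = \arg\max_{a_1, \ldots, a_T} \sum_{t=1}^{T} r(s_t, a_t)
\quad \text{s.t.} \quad s_{t+1} = f(s_t, a_t)
```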
The stochastic open-loop case
closed-loop vs. open-loop: in the open-loop setting, the whole plan is only sent at t = 1, then it's one-way! The agent commits to a_1, …, a_T without seeing any further states; in the closed-loop setting, it observes a state and chooses an action at every time step.
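The stochastic open-loop objective (reconstructed in the lecture's notation): maximize expected reward over the state distribution induced by a fixed action sequence.

```latex
p(s_1, \ldots, s_T \mid a_1, \ldots, a_T) = p(s_1) \prod_{t=1}^{T} p(s_{t+1} \mid s_t, a_t),
\qquad
a_1, \ldots, a_T = \arg\max_{a_1, \ldots, a_T} \, \mathbb{E}\!\left[ \sum_t r(s_t, a_t) \,\middle|\, a_1, \ldots, a_T \right]
```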
The stochastic closed-loop case
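In the closed-loop case we optimize over a policy rather than a fixed action sequence, which can do much better under stochastic dynamics because it reacts to the states actually reached:

```latex
\pi = \arg\max_\pi \, \mathbb{E}_{\tau \sim p(\tau)}\!\left[ \sum_t r(s_t, a_t) \right],
\qquad
p(\tau) = p(s_1) \prod_{t=1}^{T} \pi(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)
```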
Stochastic optimization
The simplest method is random shooting: sample candidate action sequences, evaluate each under the model, and pick the best. Can we do better?
Cross-entropy method (CEM): sample action sequences from a distribution (typically use a Gaussian distribution), keep the best few samples as "elites," refit the distribution to the elites, and repeat.
see also: CMA-ES (sort of like CEM with momentum)
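A minimal sketch of CEM for open-loop planning, in NumPy. The evaluate(actions) function, which should roll out the known model and return the total reward, and all hyperparameter values here are illustrative assumptions, not the course's code:

```python
import numpy as np

def cem_plan(evaluate, horizon, action_dim, n_iters=5,
             n_samples=500, n_elites=50):
    """Cross-entropy method: iteratively refit a Gaussian to the elite samples."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = mean + std * np.random.randn(n_samples, horizon, action_dim)
        # Evaluate each full action sequence under the (known) dynamics model.
        returns = np.array([evaluate(a) for a in samples])
        # Keep the highest-return samples and refit the Gaussian to them.
        elites = samples[np.argsort(returns)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    return mean  # the planned open-loop action sequence
```

Since every sample is evaluated independently, the inner loop parallelizes trivially, which is exactly the upside listed below.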
What’s the upside?
1. Very fast if parallelized
2. Extremely simple
Discrete case: Monte Carlo tree search (MCTS)
[Figure: an example search tree; each node is annotated with its value estimate Q and visit count N (e.g., Q = 38, N = 2), which MCTS uses to decide where to expand next.]
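The generic MCTS sketch from the lecture: (1) find a leaf s_l using TreePolicy(s_1); (2) evaluate the leaf using DefaultPolicy(s_l), e.g., a random rollout; (3) update all values in the tree between s_1 and s_l; repeat, then take the best action from the root. The UCT TreePolicy expands any node that is not fully expanded, and otherwise descends to the child maximizing Score(s_t) = Q(s_t)/N(s_t) + 2C sqrt(2 ln N(s_{t-1}) / N(s_t)). Below is a minimal sketch under simplifying assumptions: deterministic dynamics exposed via hypothetical step(state, action) and reward(state) functions (both illustrative, not part of the lecture).

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # action -> Node
        self.Q = 0.0        # sum of rollout returns backed up through this node
        self.N = 0          # visit count

def uct(node, c=1.0):
    # TreePolicy score: average value plus an exploration bonus.
    return node.Q / node.N + 2 * c * math.sqrt(2 * math.log(node.parent.N) / node.N)

def mcts(root_state, actions, step, reward, n_iters=1000, rollout_depth=10):
    root = Node(root_state)
    root.N = 1
    for _ in range(n_iters):
        # 1. TreePolicy: descend via UCT, expanding the first untried action.
        node = root
        while all(a in node.children for a in actions):
            node = max(node.children.values(), key=uct)
        a = random.choice([a for a in actions if a not in node.children])
        node.children[a] = Node(step(node.state, a), parent=node)
        node = node.children[a]
        # 2. DefaultPolicy: random rollout to evaluate the new leaf.
        s, ret = node.state, 0.0
        for _ in range(rollout_depth):
            s = step(s, random.choice(actions))
            ret += reward(s)
        # 3. Back up Q and N along the path from the leaf to the root.
        while node is not None:
            node.N += 1
            node.Q += ret
            node = node.parent
    # Commit to the most-visited action at the root.
    return max(root.children, key=lambda a: root.children[a].N)
```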
Additional reading
1. Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, Colton. (2012). A Survey of Monte Carlo Tree Search Methods.
• Survey of MCTS methods and basic summary.
Trajectory Optimization with Derivatives
Can we use derivatives?
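Yes: in optimal control notation (x_t for state, u_t for action), substituting the dynamics into the cost turns planning into an unconstrained problem over the actions alone (reconstructed in the lecture's standard notation):

```latex
\min_{u_1, \ldots, u_T} \; c(x_1, u_1) + c\big(f(x_1, u_1), u_2\big) + \cdots + c\big(f(f(\cdots)\cdots), u_T\big)
```

which we can optimize using the derivatives df/dx_t, df/du_t, dc/dx_t, dc/du_t, chained together backprop-style.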
Shooting methods vs collocation
shooting method: optimize over actions only
collocation method: optimize over actions and states, with constraints
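Side by side, in the same notation:

```latex
\text{shooting:} \quad \min_{u_1, \ldots, u_T} \; c(x_1, u_1) + c\big(f(x_1, u_1), u_2\big) + \cdots + c\big(f(f(\cdots)\cdots), u_T\big)
```

```latex
\text{collocation:} \quad \min_{u_1, \ldots, u_T,\, x_1, \ldots, x_T} \; \sum_{t=1}^{T} c(x_t, u_t)
\quad \text{s.t.} \quad x_{t+1} = f(x_t, u_t)
```

Shooting is unconstrained but poorly conditioned, since early actions have an outsized effect on the whole trajectory; collocation trades that for a constrained optimization over both states and actions.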
Linear case: LQR
The linear quadratic regulator: linear dynamics, quadratic cost.
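The LQR equations did not survive extraction; in the lecture's notation, LQR assumes

```latex
f(x_t, u_t) = F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t,
\qquad
c(x_t, u_t) = \frac{1}{2} \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\!\top} C_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\!\top} c_t
```

and solves for the optimal actions with a backward recursion that keeps the value function quadratic at every time step, giving a linear feedback controller u_t = K_t x_t + k_t. A minimal NumPy sketch of that backward pass (the function name and array layout are my assumptions, not the course's code):

```python
import numpy as np

def lqr_backward(F, f, C, c, n):
    """Backward pass for LQR.

    F[t]: (n, n+m) dynamics matrix, f[t]: (n,) dynamics offset,
    C[t]: (n+m, n+m) cost matrix, c[t]: (n+m,) cost vector, n: state dim.
    Returns gains K[t] and offsets k[t] with u_t = K[t] @ x_t + k[t].
    """
    T = len(F)
    V = np.zeros((n, n))  # quadratic term of the value function at t+1
    v = np.zeros(n)       # linear term of the value function at t+1
    Ks, ks = [None] * T, [None] * T
    for t in reversed(range(T)):
        # Q-function at time t: immediate cost plus value at t+1,
        # a quadratic in the joint vector [x_t; u_t].
        Q = C[t] + F[t].T @ V @ F[t]
        q = c[t] + F[t].T @ V @ f[t] + F[t].T @ v
        Qxx, Qxu = Q[:n, :n], Q[:n, n:]
        Qux, Quu = Q[n:, :n], Q[n:, n:]
        qx, qu = q[:n], q[n:]
        # Minimizing over u_t gives an action linear in the state.
        Quu_inv = np.linalg.inv(Quu)
        K, k = -Quu_inv @ Qux, -Quu_inv @ qu
        # Substitute u_t = K x_t + k back in to get the value at time t.
        V = Qxx + Qxu @ K + K.T @ Qux + K.T @ Quu @ K
        v = qx + Qxu @ k + K.T @ qu + K.T @ Quu @ k
        Ks[t], ks[t] = K, k
    return Ks, ks
```

A forward pass then rolls the controller out through the dynamics from the known x_1 to recover the actual trajectory.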