
Optimal Control and Planning

CS 285
Instructor: Sergey Levine
UC Berkeley
Today’s Lecture
1. Introduction to model-based reinforcement learning
2. What if we know the dynamics? How can we make decisions?
3. Stochastic optimization methods
4. Monte Carlo tree search (MCTS)
5. Trajectory optimization
• Goals:
• Understand how we can perform planning with known dynamics models in
discrete and continuous spaces
Recap: the reinforcement learning objective
Recap: model-free reinforcement learning

assume the transition dynamics p(s_{t+1} | s_t, a_t) are unknown

don’t even attempt to learn them
What if we knew the transition dynamics?

• Often we do know the dynamics


1. Games (e.g., Atari games, chess, Go)
2. Easily modeled systems (e.g., navigating a car)
3. Simulated environments (e.g., simulated robots, video games)
• Often we can learn the dynamics
1. System identification – fit unknown parameters of a known model
2. Learning – fit a general-purpose model to observed transition data

Does knowing the dynamics make things easier?


Often, yes!
Model-based reinforcement learning
1. Model-based reinforcement learning: learn the transition dynamics,
then figure out how to choose actions
2. Today: how can we make decisions if we know the dynamics?
a. How can we choose actions under perfect knowledge of the system dynamics?
b. Optimal control, trajectory optimization, planning
3. Next week: how can we learn unknown dynamics?
4. How can we then also learn policies? (e.g. by imitating optimal control)
(figure: the policy interacting with the system dynamics)
The objective

(example from lecture: deciding what to do upon encountering a tiger)
1. run away
2. ignore
3. pet
The deterministic case
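The slide’s equation restated (a hedged reconstruction in the standard notation used throughout this lecture):

\[
\mathbf{a}_1,\dots,\mathbf{a}_T = \arg\max_{\mathbf{a}_1,\dots,\mathbf{a}_T} \sum_{t=1}^{T} r(\mathbf{s}_t,\mathbf{a}_t)
\quad\text{s.t.}\quad \mathbf{s}_{t+1} = f(\mathbf{s}_t,\mathbf{a}_t)
\]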
The stochastic open-loop case

why is this suboptimal?
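The open-loop objective, restated (again a hedged reconstruction): the whole action sequence is chosen up front, before any states are observed,

\[
p(\mathbf{s}_1,\dots,\mathbf{s}_T\mid\mathbf{a}_1,\dots,\mathbf{a}_T) = p(\mathbf{s}_1)\prod_{t=1}^{T-1} p(\mathbf{s}_{t+1}\mid\mathbf{s}_t,\mathbf{a}_t)
\]
\[
\mathbf{a}_1,\dots,\mathbf{a}_T = \arg\max_{\mathbf{a}_1,\dots,\mathbf{a}_T} \mathbb{E}\left[\sum_{t} r(\mathbf{s}_t,\mathbf{a}_t)\;\middle|\;\mathbf{a}_1,\dots,\mathbf{a}_T\right]
\]

This is suboptimal under stochastic dynamics because the plan cannot react to which states actually occur: committing to every action in advance discards information that only becomes available as the trajectory unfolds.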


Aside: terminology
what is this “loop”? In closed-loop control, the agent observes the state at every time step and chooses its action in response, so information keeps flowing back from the environment. In open-loop control, the entire action sequence is only sent at t = 1, and then it’s one-way: no further observations influence the plan.
The stochastic closed-loop case

(more on this later)
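The closed-loop objective, restated (a hedged reconstruction): instead of a fixed action sequence we optimize a policy that sees the current state at every step,

\[
\pi = \arg\max_{\pi} \mathbb{E}_{\tau\sim p(\tau)}\left[\sum_{t} r(\mathbf{s}_t,\mathbf{a}_t)\right],
\qquad
p(\tau) = p(\mathbf{s}_1)\prod_{t=1}^{T} \pi(\mathbf{a}_t\mid\mathbf{s}_t)\, p(\mathbf{s}_{t+1}\mid\mathbf{s}_t,\mathbf{a}_t)
\]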


Open-Loop Planning
But for now, open-loop planning
Stochastic optimization

simplest method: guess & check — the “random shooting method”: sample a batch of action sequences from some distribution (e.g., uniform), evaluate each candidate with the model, and pick the sequence with the best predicted return
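A minimal sketch of random shooting in Python. The scoring function evaluate_with_model is a hypothetical placeholder for whatever dynamics model and reward you have; it is not part of the lecture.

import numpy as np

def random_shooting(evaluate_with_model, action_dim, horizon,
                    num_samples=1000, low=-1.0, high=1.0):
    # Sample candidate open-loop action sequences uniformly at random.
    candidates = np.random.uniform(low, high, size=(num_samples, horizon, action_dim))
    # Score each sequence by rolling it out through the (known or learned) model.
    returns = np.array([evaluate_with_model(a) for a in candidates])
    # Return the single best sequence; no feedback is used, so this is open-loop.
    return candidates[np.argmax(returns)]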


Cross-entropy method (CEM)

can we do better? CEM iterates the guess-&-check idea: sample action sequences from a distribution (typically a Gaussian), evaluate them with the model, refit the distribution to the best-scoring “elite” samples, and repeat.
see also: CMA-ES (sort of like CEM with momentum)
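A minimal CEM sketch under the same assumptions as above (evaluate_with_model is again a hypothetical scoring function; the sample counts, elite count, and iteration count are arbitrary illustrative choices, not values from the lecture):

import numpy as np

def cem_plan(evaluate_with_model, action_dim, horizon,
             num_samples=500, num_elites=50, num_iters=5):
    # Start from a broad Gaussian over flattened action sequences.
    mean = np.zeros(horizon * action_dim)
    std = np.ones(horizon * action_dim)
    for _ in range(num_iters):
        # 1. Sample candidate action sequences from the current Gaussian.
        samples = mean + std * np.random.randn(num_samples, horizon * action_dim)
        # 2. Evaluate each sequence with the model.
        returns = np.array([evaluate_with_model(s.reshape(horizon, action_dim))
                            for s in samples])
        # 3. Refit the Gaussian to the elite (highest-return) samples.
        elites = samples[np.argsort(returns)[-num_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    # Return the mean of the final distribution as the plan.
    return mean.reshape(horizon, action_dim)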
What’s the upside?
1. Very fast if parallelized
2. Extremely simple

What’s the problem?


1. Very harsh dimensionality limit
2. Only open-loop planning
Discrete case: Monte Carlo tree search (MCTS)

estimate the value of a new leaf by running a rollout with some default policy (e.g., a random policy)


Discrete case: Monte Carlo tree search (MCTS)

(figure: an example search tree in which each node is annotated with its value estimate Q and visit count N — e.g., internal nodes with Q = 22, N = 2 and Q = 38, N = 2, leaves with Q = 10, 12, 8, 16 and N = 1, and rollout returns of +10 and +15 backed up toward the root)
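A hedged summary of the generic MCTS loop from the cited survey, using the common UCT-style selection score (C is a tunable exploration constant; this is a summary rather than the slides verbatim):

\[
\mathrm{Score}(s_t) = \frac{Q(s_t)}{N(s_t)} + 2C\sqrt{\frac{2\ln N(s_{t-1})}{N(s_t)}}
\]

1. find a leaf s_l using TreePolicy(s_1): at each node, take an untried action if one exists, otherwise descend into the child with the highest score
2. evaluate the leaf with DefaultPolicy(s_l), e.g., a random rollout to the end of the episode
3. update all Q values and visit counts N along the path between s_1 and s_l
Repeat steps 1–3 for as long as the budget allows, then take the best action from the root s_1.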
Additional reading
1. Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener,
Perez, Samothrakis, Colton. (2012). A Survey of Monte Carlo Tree
Search Methods.
• Survey of MCTS methods and basic summary.
Trajectory Optimization with Derivatives
Can we use derivatives?
Shooting methods vs collocation
shooting method: optimize over actions only
collocation method: optimize over actions and states, with constraints
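Written out in the lecture’s usual control notation (a hedged reconstruction of the two formulations):

\[
\text{shooting:}\qquad \min_{\mathbf{u}_1,\dots,\mathbf{u}_T}\; c(\mathbf{x}_1,\mathbf{u}_1) + c\big(f(\mathbf{x}_1,\mathbf{u}_1),\mathbf{u}_2\big) + \dots + c\big(f(f(\dots)\dots),\mathbf{u}_T\big)
\]
\[
\text{collocation:}\qquad \min_{\mathbf{u}_1,\dots,\mathbf{u}_T,\;\mathbf{x}_1,\dots,\mathbf{x}_T}\; \sum_{t=1}^{T} c(\mathbf{x}_t,\mathbf{u}_t)
\quad\text{s.t.}\quad \mathbf{x}_t = f(\mathbf{x}_{t-1},\mathbf{u}_{t-1})
\]

In shooting the states are eliminated by composing the dynamics, so small changes to early actions can have large effects late in the trajectory; collocation keeps the states as decision variables and enforces the dynamics as constraints, which tends to be better conditioned.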
Linear case: LQR

The dynamics are linear and the cost is quadratic:

\[
f(\mathbf{x}_t,\mathbf{u}_t) = \mathbf{F}_t \begin{bmatrix}\mathbf{x}_t\\ \mathbf{u}_t\end{bmatrix} + \mathbf{f}_t,
\qquad
c(\mathbf{x}_t,\mathbf{u}_t) = \frac{1}{2}\begin{bmatrix}\mathbf{x}_t\\ \mathbf{u}_t\end{bmatrix}^{T} \mathbf{C}_t \begin{bmatrix}\mathbf{x}_t\\ \mathbf{u}_t\end{bmatrix} + \begin{bmatrix}\mathbf{x}_t\\ \mathbf{u}_t\end{bmatrix}^{T} \mathbf{c}_t
\]

Because the cost stays quadratic and the dynamics stay linear at every step (the “quadratic / linear / linear” annotations on the slides), the optimal controls can be computed exactly with a backward recursion from t = T to t = 1, in which the value function remains quadratic and the optimal action is a linear function of the state.
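A hedged sketch of the LQR backward pass in the notation above, following the standard derivation (the x/u block-partition subscripts on Q_t and q_t are the usual convention; this summarizes the recursion rather than reproducing the slides verbatim). With V_{T+1} = 0 and v_{T+1} = 0, for t = T, ..., 1:

\[
\mathbf{Q}_t = \mathbf{C}_t + \mathbf{F}_t^{T}\mathbf{V}_{t+1}\mathbf{F}_t,
\qquad
\mathbf{q}_t = \mathbf{c}_t + \mathbf{F}_t^{T}\mathbf{V}_{t+1}\mathbf{f}_t + \mathbf{F}_t^{T}\mathbf{v}_{t+1}
\]
\[
\mathbf{u}_t = \mathbf{K}_t\mathbf{x}_t + \mathbf{k}_t,
\qquad
\mathbf{K}_t = -\mathbf{Q}_{\mathbf{u}_t,\mathbf{u}_t}^{-1}\mathbf{Q}_{\mathbf{u}_t,\mathbf{x}_t},
\qquad
\mathbf{k}_t = -\mathbf{Q}_{\mathbf{u}_t,\mathbf{u}_t}^{-1}\mathbf{q}_{\mathbf{u}_t}
\]
\[
\mathbf{V}_t = \mathbf{Q}_{\mathbf{x}_t,\mathbf{x}_t} + \mathbf{Q}_{\mathbf{x}_t,\mathbf{u}_t}\mathbf{K}_t + \mathbf{K}_t^{T}\mathbf{Q}_{\mathbf{u}_t,\mathbf{x}_t} + \mathbf{K}_t^{T}\mathbf{Q}_{\mathbf{u}_t,\mathbf{u}_t}\mathbf{K}_t,
\qquad
\mathbf{v}_t = \mathbf{q}_{\mathbf{x}_t} + \mathbf{Q}_{\mathbf{x}_t,\mathbf{u}_t}\mathbf{k}_t + \mathbf{K}_t^{T}\mathbf{q}_{\mathbf{u}_t} + \mathbf{K}_t^{T}\mathbf{Q}_{\mathbf{u}_t,\mathbf{u}_t}\mathbf{k}_t
\]

A forward pass then recovers the trajectory: starting from x_1, set u_t = K_t x_t + k_t and x_{t+1} = f(x_t, u_t).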
LQR for Stochastic and Nonlinear Systems
Stochastic dynamics
The stochastic closed-loop case
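A brief note on what the stochastic slides cover, stated as a summary of a standard result (notation mirrors the LQR section above): if the dynamics are linear-Gaussian,

\[
p(\mathbf{x}_{t+1}\mid\mathbf{x}_t,\mathbf{u}_t) = \mathcal{N}\!\left(\mathbf{F}_t\begin{bmatrix}\mathbf{x}_t\\ \mathbf{u}_t\end{bmatrix}+\mathbf{f}_t,\;\Sigma_t\right),
\]

then the same time-varying linear feedback controller u_t = K_t x_t + k_t obtained from the LQR backward pass is still optimal: the Gaussian noise does not change the gains, and because the controller is executed in closed loop it can correct for disturbances as they occur.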
Nonlinear case: DDP/iterative LQR
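A minimal sketch of one iLQR iteration in Python. The helper functions (linearize_dynamics, quadratize_cost, lqr_backward_pass) are hypothetical placeholders for the steps described above, and the simple backtracking line search over alpha is one common choice rather than the lecture's exact procedure; states and actions are assumed to be NumPy arrays.

def ilqr_step(f, cost, xs, us):
    """One iteration of iterative LQR around the current trajectory (xs, us)."""
    T = len(us)
    # 1. Approximate the dynamics to first order and the cost to second order
    #    around the current trajectory (hypothetical helpers).
    Fs, fs = zip(*[linearize_dynamics(f, xs[t], us[t]) for t in range(T)])
    Cs, cs = zip(*[quadratize_cost(cost, xs[t], us[t]) for t in range(T)])
    # 2. Run the LQR backward pass on the approximation to get feedback gains.
    Ks, ks = lqr_backward_pass(Fs, fs, Cs, cs)
    # 3. Forward pass: roll out the real (nonlinear) dynamics with the new controller,
    #    shrinking the step size alpha until the cost improves.
    old_cost = sum(cost(x, u) for x, u in zip(xs, us))
    for alpha in [1.0, 0.5, 0.25, 0.1]:
        new_xs, new_us = [xs[0]], []
        for t in range(T):
            u = us[t] + alpha * ks[t] + Ks[t] @ (new_xs[t] - xs[t])
            new_us.append(u)
            new_xs.append(f(new_xs[t], u))
        if sum(cost(x, u) for x, u in zip(new_xs, new_us)) < old_cost:
            return new_xs, new_us
    return xs, us  # no improvement found; keep the old trajectory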
Case Study and Additional Readings
Case study: nonlinear model-predictive control
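A minimal sketch of the model-predictive control loop the case study refers to: replan over a short horizon at every time step, execute only the first planned action, then observe the resulting state and replan. The planner could be any of the methods above (random shooting, CEM, or iLQR); env.step and plan_actions are hypothetical names, not an API from the lecture.

def mpc_loop(env, plan_actions, num_steps, horizon):
    """Run closed-loop control by replanning an open-loop sequence at every step."""
    state = env.reset()
    for _ in range(num_steps):
        # Plan a short open-loop action sequence from the current state...
        actions = plan_actions(state, horizon)
        # ...but execute only the first action, then observe and replan.
        state = env.step(actions[0])
    return state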
Additional reading
1. Mayne, Jacobson. (1970). Differential dynamic programming.
• Original differential dynamic programming algorithm.
2. Tassa, Erez, Todorov. (2012). Synthesis and Stabilization of Complex
Behaviors through Online Trajectory Optimization.
• Practical guide for implementing non-linear iterative LQR.
3. Levine, Abbeel. (2014). Learning Neural Network Policies with Guided
Policy Search under Unknown Dynamics.
• Probabilistic formulation and trust region alternative to deterministic line search.
What’s wrong with known dynamics?

Next time: learning the dynamics model
