02 - Dynamic Programming and LQR
Saverio Bolognani
Optimal control input
Optimization problem
min_{u∈U} J(u)
Problem decomposition
Bellman’s principle
min_{u,x} ∑_{t=0}^{T} g_t(x_t, u_t)    subject to x_{t+1} = f_t(x_t, u_t)

can be rewritten as

min_{u_0} g_0(x_0, u_0) + min_{u_1,...,u_T, x_1,...,x_T} ∑_{t=1}^{T} g_t(x_t, u_t)

subject to x_{t+1} = f_t(x_t, u_t),    x_1 = f_0(x_0, u_0)
V_0(x_0) = min_{u_0} g_0(x_0, u_0) + min_{u_1,...,u_T, x_1,...,x_T} ∑_{t=1}^{T} g_t(x_t, u_t)

subject to x_{t+1} = f_t(x_t, u_t),    x_1 = f_0(x_0, u_0)

where the inner minimization equals V_1(x_1) = V_1(f_0(x_0, u_0))
Dynamic programming
At the last stage (T), the subproblem is trivial:

V_T(x_T) = g_T(x_T)

At stage T−1, the subproblem is

min_{u_{T−1}} g_{T−1}(x_{T−1}, u_{T−1}) + V_T(x_T)    subject to x_T = f_{T−1}(x_{T−1}, u_{T−1})

that is

min_{u_{T−1}} g_{T−1}(x_{T−1}, u_{T−1}) + V_T(f_{T−1}(x_{T−1}, u_{T−1}))
[Figure: time axis 0…T; the past decisions u_0, …, u_{T−2} fix x_{T−1}; the choice of u_{T−1} determines x_T and the cost-to-go V_T(x_T).]
Dynamic programming
At stage T−2, the subproblem is

min_{u_{T−2}} g_{T−2}(x_{T−2}, u_{T−2}) + V_{T−1}(x_{T−1})    subject to x_{T−1} = f_{T−2}(x_{T−2}, u_{T−2})

that is

min_{u_{T−2}} g_{T−2}(x_{T−2}, u_{T−2}) + V_{T−1}(f_{T−2}(x_{T−2}, u_{T−2}))
[Figure: time axis 0…T; from x_{T−2}, the choice of u_{T−2} determines x_{T−1} and the cost-to-go V_{T−1}(x_{T−1}).]
Problem decomposition
The optimal control problem is decomposed into stage problems that we can
solve via backward induction.
V_t(x_t) = min_{u_t} { g_t(x_t, u_t) + V_{t+1}(f_t(x_t, u_t)) },    V_T(x_T) = g_T(x_T).
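As an illustration (not from the slides), here is a minimal sketch of this backward induction on a toy problem with finitely many states and inputs; the horizon, dynamics f, and costs g below are hypothetical placeholders.

```python
import numpy as np

# Backward induction V_t(x) = min_u { g_t(x,u) + V_{t+1}(f(x,u)) }
# on a toy problem with finitely many states and inputs (all hypothetical).
T = 10
states = np.arange(5)            # x ∈ {0,...,4}
inputs = np.array([-1, 0, 1])    # u ∈ {-1,0,1}

def f(x, u):                     # dynamics: clipped integrator
    return int(np.clip(x + u, 0, 4))

def g(x, u):                     # stage cost: quadratic in state and input
    return x**2 + u**2

V = {T: {x: x**2 for x in states}}          # terminal cost V_T(x) = g_T(x)
policy = {}
for t in range(T - 1, -1, -1):              # backward in time
    V[t], policy[t] = {}, {}
    for x in states:
        costs = [g(x, u) + V[t + 1][f(x, u)] for u in inputs]
        V[t][x] = min(costs)
        policy[t][x] = inputs[int(np.argmin(costs))]

print(V[0][4], policy[0][4])    # optimal cost-to-go and first input from x0 = 4
```

Note that the recursion returns a policy (an input for every state and time), not just a single input sequence.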
Optimal Linear-Quadratic Regulation (LQR)
Markovian update

x_{t+1} = A x_t + B u_t

Cost function

∑_{t=0}^{T−1} ( x_t^⊤ Q x_t + u_t^⊤ R u_t ) + x_T^⊤ S x_T,    Q, S ⪰ 0,  R ≻ 0
LQR
Key ideas
1. V_T(x) = x^⊤ S x (convex quadratic function)
2. Show that V_t(x) is also quadratic: V_t(x) = x^⊤ P_t x
3. Compute P_t recursively, working backward from T
4. The optimal u_t can be computed as a solution of a convex optimization problem.
Induction step: optimal control at stage t

Assume V_{t+1}(x) = x^⊤ P_{t+1} x. The stage problem is

min_{u} { x^⊤ Q x + u^⊤ R u + V_{t+1}(A x + B u) } = min_{u} { x^⊤ Q x + u^⊤ R u + (A x + B u)^⊤ P_{t+1} (A x + B u) }

Setting the gradient with respect to u to zero gives (R + B^⊤ P_{t+1} B) u + B^⊤ P_{t+1} A x = 0, that is

u∗ = −(R + B^⊤ P_{t+1} B)^{−1} B^⊤ P_{t+1} A x
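The formula can be spot-checked numerically: at u∗, the gradient of the minimized expression vanishes. A minimal sketch, with arbitrary random data standing in for A, B, Q, R, P_{t+1}:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2                                   # arbitrary dimensions
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, m))
Q = np.eye(n)                                 # Q ⪰ 0
R = np.eye(m)                                 # R ≻ 0
M = rng.normal(size=(n, n)); P1 = M @ M.T     # stand-in for P_{t+1} ⪰ 0
x = rng.normal(size=n)

# u* = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A x
u = -np.linalg.solve(R + B.T @ P1 @ B, B.T @ P1 @ A @ x)

# gradient of u -> x^T Q x + u^T R u + (Ax + Bu)^T P_{t+1} (Ax + Bu)
grad = 2 * (R @ u + B.T @ P1 @ (A @ x + B @ u))
print(np.allclose(grad, 0))                   # True: first-order condition holds
```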
Induction step: quadratic value function
Now we need to check that V_t(x) is also quadratic in x, i.e. V_t(x) = x^⊤ P_t x, and to find how to compute P_t.
We plug the minimizer u∗ into the expression for V_t(x) and obtain an ugly expression that simplifies to

V_t(x) = x^⊤ ( Q + A^⊤ P_{t+1} A − A^⊤ P_{t+1} B (R + B^⊤ P_{t+1} B)^{−1} B^⊤ P_{t+1} A ) x

which is indeed quadratic in x, with

P_t = Q + A^⊤ P_{t+1} A − A^⊤ P_{t+1} B (R + B^⊤ P_{t+1} B)^{−1} B^⊤ P_{t+1} A
LQR
Optimal control sequence

Backward induction: starting from P_T = S,

P_t = Q + A^⊤ P_{t+1} A − A^⊤ P_{t+1} B (R + B^⊤ P_{t+1} B)^{−1} B^⊤ P_{t+1} A

and the optimal input at each stage is u_t = −Γ_t x_t, where Γ_t = (R + B^⊤ P_{t+1} B)^{−1} B^⊤ P_{t+1} A.
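A minimal numerical sketch of this backward pass; the double-integrator matrices below are an arbitrary example, not from the slides:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # example system (arbitrary)
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.eye(1); S = np.eye(2)
T = 20

P = [None] * (T + 1)
Gamma = [None] * T
P[T] = S                                              # P_T = S
for t in range(T - 1, -1, -1):                        # backward induction
    Gamma[t] = np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
    P[t] = Q + A.T @ P[t + 1] @ A - A.T @ P[t + 1] @ B @ Gamma[t]

# optimal feedback: u_t = -Gamma[t] @ x_t;  optimal cost: x0^T P[0] x0
x0 = np.array([1.0, 0.0])
print(x0 @ P[0] @ x0)
```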
What if the optimal control input u takes us to
a trajectory different from the one we have computed?

[Figure: time axis 0…T; the precomputed optimal decisions u_0, …, u_{T−1} and the resulting nominal trajectory.]

Backward induction gives the optimal input in feedback form, u_t = −Γ_t x_t, so the optimal decision can be recomputed from whatever state x_t is actually reached.
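A sketch of this point, reusing A, B, T, x0 and the gains Γ_t from the previous snippet and adding a hypothetical additive disturbance w_t (not in the slides):

```python
import numpy as np

# Continuing the previous sketch: simulate with an additive disturbance w_t.
rng = np.random.default_rng(1)
x = x0.copy()
for t in range(T):
    u = -Gamma[t] @ x                 # feedback: recomputed from the actual state
    w = 0.05 * rng.normal(size=2)     # hypothetical disturbance, not in the slides
    x = A @ x + B @ u + w
print(x)                              # state stays regulated despite disturbances
```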
Computational complexity

The backward Riccati recursion requires T updates of an n × n matrix; unlike generic dynamic programming, no discretization of the state space is needed.
Infinite horizon LQR
Infinite-horizon cost function
min ∑_{t=0}^{∞} ( x_t^⊤ Q x_t + u_t^⊤ R u_t ),    Q ⪰ 0,  R ≻ 0
Feasibility
If the system is stabilizable, then there is an input sequence that yields a finite cost.
Proof:
Let K be a linear feedback such that A + BK has eigenvalues inside the unit circle.
Consider the input u(t) = Kx(t), which yields
V(x_0) = ∑_{t=0}^{∞} ( x_t^⊤ Q x_t + x_t^⊤ K^⊤ R K x_t )
As x_t = (A + BK)^t x_0, we have

V(x_0) = x_0^⊤ [ ∑_{t=0}^{∞} ((A + BK)^⊤)^t (Q + K^⊤ R K) (A + BK)^t ] x_0

and since A + BK is Schur stable, the series converges, so the cost is finite.
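The bracketed series is the solution of a discrete Lyapunov equation, so V(x_0) can be evaluated in closed form; a minimal sketch, with an arbitrary example system and an arbitrary stabilizing K:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[1.2, 0.0], [0.0, 0.5]])     # example system (arbitrary)
B = np.array([[1.0], [0.0]])
Q = np.eye(2); R = np.eye(1)
K = np.array([[-1.0, 0.0]])                # any K with A + BK Schur stable
Acl = A + B @ K
assert np.all(np.abs(np.linalg.eigvals(Acl)) < 1)

# P = sum_t (Acl^T)^t (Q + K^T R K) Acl^t  solves  Acl^T P Acl - P + (Q + K^T R K) = 0
P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
x0 = np.array([1.0, 1.0])
print(x0 @ P @ x0)                         # finite cost V(x0) of u = Kx
```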
Optimal feedback control (infinite horizon)
The input

u_t = −Γ_∞ x_t,    Γ_∞ := (R + B^⊤ P_∞ B)^{−1} B^⊤ P_∞ A,

where P_∞ is the fixed point of the backward Riccati recursion, is the optimal infinite-horizon feedback.
Infinite horizon LQR
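A minimal sketch of computing P_∞ by iterating the Riccati recursion to convergence, on an arbitrary example system (under the usual stabilizability/detectability assumptions, scipy.linalg.solve_discrete_are returns the same fixed point directly):

```python
import numpy as np

def dare_by_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Fixed point of P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A."""
    P = Q.copy()
    for _ in range(max_iter):
        Pn = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
            R + B.T @ P @ B, B.T @ P @ A)
        if np.max(np.abs(Pn - P)) < tol:
            return Pn
        P = Pn
    return P

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # example system (arbitrary)
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.eye(1)
P_inf = dare_by_iteration(A, B, Q, R)
Gamma_inf = np.linalg.solve(R + B.T @ P_inf @ B, B.T @ P_inf @ A)
print(np.abs(np.linalg.eigvals(A - B @ Gamma_inf)))  # all < 1: stable closed loop
```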
Stability of the optimal controller
Example (B = I):

x_{t+1} = [2 0; 0 3] x_t + u_t,    Q = [1 0; 0 0],    R = [1 0; 0 1],    u_t = −Γ_∞ x_t with Γ_∞ = [? ?; ? ?] to be determined
Theorem
Let Q = C^⊤ C. The optimal infinite-horizon feedback Γ_∞ stabilizes the system if and only if

x_{t+1} = A x_t,    y_t = C x_t
does not have unobservable unstable modes.
In practice: if the system has unstable dynamics, those states need to be weighted
in the cost function.
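The example above can be checked numerically, reusing dare_by_iteration from the earlier sketch: the optimal gain leaves the unweighted unstable mode untouched.

```python
import numpy as np

# Reuses dare_by_iteration from the earlier sketch; B = I as in the example.
A = np.array([[2.0, 0.0], [0.0, 3.0]])
B = np.eye(2)
Q = np.diag([1.0, 0.0])        # the unstable mode x2 is not weighted
R = np.eye(2)
P_inf = dare_by_iteration(A, B, Q, R)
Gamma_inf = np.linalg.solve(R + B.T @ P_inf @ B, B.T @ P_inf @ A)
print(np.linalg.eigvals(A - B @ Gamma_inf))  # one eigenvalue stays at 3: unstable
```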
The control engineer flowchart
This work is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License
https://bsaver.io/COCO