
Computational Control

Dynamic Programming and LQR

Saverio Bolognani

Automatic Control Laboratory (IfA)


ETH Zurich
Control task: decide sequence of control “pulses” {ut }t=0,1,2,...,T
Performance metric:
▶ minimize fuel use
▶ circumnavigate the Moon
▶ get back to Earth by time T

1 / 23
Optimal control input

In the absence of disturbance/uncertainty, the entire behavior of the system is
determined by the sequence u = (u0, u1, u2, . . .)

Optimization problem

min J(u)
u∈U

U = Ũ × · · · × Ũ (one copy of Ũ for each t = 0, 1, . . . , T) represents the set of feasible input sequences


J(u) is the sum of
▶ a fuel cost term ∥u∥
▶ a terminal constraint term δ(u)
δ(u) = 0 if the probe has reached the Moon and is back on Earth at time T, and δ(u) = +∞ otherwise

2 / 23
min J(u)
u∈U

An intractable optimization problem:

the dimension of u makes the problem size formidable

the cost function requires an accurate model of the system
▶ first-principle model
▶ numerical simulator
▶ black box/oracle for the terminal constraint

the optimization problem is non-convex

the open-loop nature of the optimal control sequence makes it useless in the presence of disturbances and uncertainty

3 / 23
Problem decomposition

Key assumptions: Markovian representation


1 There exists a Markovian state that evolves according to

xt+1 = ft (xt , ut ) (discrete-time dynamical system)

2 The initial condition x0 is known.


3 The cost is additive over time:

J = ∑t gt(xt, ut)

What does Markovian mean?


What would be a valid state for the lunar trajectory planning?
Would the proposed cost satisfy this assumption?

4 / 23
Bellman’s principle

min_{u,x}  ∑_{t=0}^{T} gt(xt, ut)

subject to  xt+1 = ft(xt, ut),  x0 = X0
            xt ∈ Xt ∀t
            ut ∈ Ut ∀t.

Bellman’s principle of optimality


“An optimal policy has the property that whatever the initial state and the initial
decisions are, the remaining decisions must constitute an optimal policy with
regard to the state resulting from the first decision.”

min_{u0}  g0(X0, u0) + [ min_{u1,...,uT, x1,...,xT}  ∑_{t=1}^{T} gt(xt, ut)
                         subject to  xt+1 = ft(xt, ut),  x1 = f0(x0, u0) ]

5 / 23
 

 

 
V0(X0) = min_{u0}  g0(X0, u0) + [ min_{u1,...,uT, x1,...,xT}  ∑_{t=1}^{T} gt(xt, ut)
                                  subject to  xt+1 = ft(xt, ut),  x1 = f0(x0, u0) ]

where the bracketed inner problem is exactly V1(x1) = V1(f0(x0, u0)).

The problems are nested, therefore we need to


▶ solve the subproblem parametrized in x1
▶ compute the value function V1 (x1 )
▶ solve the “outer” problem
min_{u0} { g0(X0, u0) + V1(f0(X0, u0)) }

The “inner” problem is a smaller optimal control problem (shorter horizon).

6 / 23
Dynamic programming
At the last stage (T ), the subproblem is trivial.

VT (xT ) = gT (xT )

At time T − 1, compute VT −1 (xT −1 ) as

min_{uT−1}  gT−1(xT−1, uT−1) + VT(xT)
subject to  xT = fT−1(xT−1, uT−1)

that is
min_{uT−1}  gT−1(xT−1, uT−1) + VT(fT−1(xT−1, uT−1))

[Figure: trajectory over t = 0, . . . , T; past decisions u0, . . . , uT−2 lead to xT−1, and the decision uT−1 determines xT and the terminal cost VT(xT).]

7 / 23
Dynamic programming

At time T − 2, compute VT −2 (xT −2 ) as

min_{uT−2}  gT−2(xT−2, uT−2) + VT−1(xT−1)
subject to  xT−1 = fT−2(xT−2, uT−2)

that is
min_{uT−2}  gT−2(xT−2, uT−2) + VT−1(fT−2(xT−2, uT−2))

[Figure: trajectory over t = 0, . . . , T; past decisions u0, . . . , uT−3 lead to xT−2, the decision uT−2 determines xT−1 and the cost-to-go VT−1(xT−1), and the remaining tail is optimal.]

8 / 23
Problem decomposition
The optimal control problem is decomposed into stage problems that we can
solve via backward induction.

Vt(xt) = min_{ut} { gt(xt, ut) + Vt+1(ft(xt, ut)) },     VT(xT) = gT(xT).

When is this stage problem simple to solve?


Decision space Ũ convex.
gt convex in ut .
Vt+1 ◦ ft convex in ut

In practice: Quadratic cost g and linear dynamics f


Unless we use approximations of the value function
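
As an illustration, here is a minimal backward-induction sketch on a toy finite (tabular) problem; the horizon, state/input sets, dynamics f, stage cost g, and terminal cost g_T are illustrative placeholders, not the lunar example:

# Minimal sketch of backward induction on a toy finite (tabular) problem.
# All problem data below are illustrative placeholders.

T = 10
states = range(-5, 6)            # finite state space
inputs = (-1, 0, 1)              # finite input set U~

def f(x, u):                     # dynamics x_{t+1} = f_t(x_t, u_t)
    return max(-5, min(5, x + u))

def g(x, u):                     # stage cost g_t(x_t, u_t)
    return x * x + u * u

def g_T(x):                      # terminal cost g_T(x_T)
    return 10 * x * x

V = {T: {x: g_T(x) for x in states}}     # V_T(x_T) = g_T(x_T)
policy = {}

for t in range(T - 1, -1, -1):           # backward in time
    V[t], policy[t] = {}, {}
    for x in states:
        # stage problem: min_u  g_t(x, u) + V_{t+1}(f_t(x, u))
        u_best = min(inputs, key=lambda u: g(x, u) + V[t + 1][f(x, u)])
        V[t][x] = g(x, u_best) + V[t + 1][f(x, u_best)]
        policy[t][x] = u_best

print(V[0][3], policy[0][3])             # cost-to-go and optimal first input from x_0 = 3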

9 / 23
Optimal Linear-Quadratic Regulation (LQR)

Markovian update
xt+1 = Axt + But
Cost function
∑_{t=0}^{T−1} ( xt⊤ Q xt + ut⊤ R ut ) + xT⊤ S xT,     Q, S ⪰ 0, R ≻ 0

Value function (minimum cost to go, starting from x at time t)


Vt(x) = min_{ut,...,uT−1}  ∑_{s=t}^{T−1} ( xs⊤ Q xs + us⊤ R us ) + xT⊤ S xT

subject to  xs+1 = A xs + B us,   s = t, . . . , T − 1
            xt = x

V0 (x0 ) is the optimal cost.

10 / 23
LQR

Key ideas
1 VT (x) = x ⊤ Sx (convex quadratic function)
2 Show that Vt (x) is also quadratic: Vt (x) = x ⊤ Pt x
3 Compute Pt recursively, working backward from T
4 The optimal ut can be computed as a solution of a convex optimization
problem.

We proceed by induction at stage t, where the value function is

Vt(x) = min_u { x⊤ Q x + u⊤ R u + Vt+1(Ax + Bu) }

where x⊤ Q x + u⊤ R u is the stage cost.

11 / 23
Induction step: optimal control at stage t

We proceed by induction, assuming that Vt+1 (x) = x ⊤ Pt+1 x.


Then Vt (x) becomes

Vt(x) = x⊤ Q x + min_u { u⊤ R u + (Ax + Bu)⊤ Pt+1 (Ax + Bu) }

       = x⊤ Q x + min_u { u⊤ R u + x⊤ A⊤ Pt+1 A x + 2 u⊤ B⊤ Pt+1 A x + u⊤ B⊤ Pt+1 B u }

       = x⊤ Q x + min_u { u⊤ (R + B⊤ Pt+1 B) u + x⊤ A⊤ Pt+1 A x + 2 u⊤ B⊤ Pt+1 A x }

Unconstrained optimization problem:


→ to find the minimizer, we set the gradient with respect to u to zero, i.e.

(R + B⊤ Pt+1 B)u∗ + B⊤ Pt+1 Ax = 0

that is
u∗ = −(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 Ax

12 / 23
Induction step: quadratic value function

Now we need to check that Vt (x) is also quadratic in x, i.e. Vt (x) = x ⊤ Pt x, and to
find how to compute Pt .
We plug the minimizer u∗ into Vt (x), and obtain the ugly expression

Vt(x) = x⊤ Q x + x⊤ A⊤ Pt+1 B (R + B⊤ Pt+1 B)−1 (R + B⊤ Pt+1 B)(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A x
        + x⊤ A⊤ Pt+1 A x − 2 x⊤ A⊤ Pt+1 B (R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A x.

Just collecting the different terms we get Vt (x) = x ⊤ Pt x where Pt is defined as

Pt = Q + A⊤ Pt+1 A − A⊤ Pt+1 B(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A

which completes the proof.


Anything missing?

13 / 23
LQR
Optimal control sequence
Backward induction:

ut = −(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 Axt

where

Pt−1 = Q + A⊤ Pt A − A⊤ Pt B(R + B⊤ Pt B)−1 B⊤ Pt A, PT = S.

Forward integration from x0 :

xt+1 = Axt + But .

Result from offline computation: open-loop sequence u0 , u1 , . . . , uT −1


[Figure: the precomputed optimal decisions u0, . . . , uT−1 applied over the horizon t = 0, . . . , T.]

14 / 23
What if the optimal control input u takes us to
a trajectory different from the one we have computed?

Optimal feedback control

ut = −(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A xt = −Γt xt,     Γt := (R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A

where

Pt−1 = Q + A⊤ Pt A − A⊤ Pt B(R + B⊤ Pt B)−1 B⊤ Pt A, PT = S.
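
A minimal numerical sketch of the recursion above (the system, weights, horizon, and initial state are illustrative placeholders, a double integrator chosen only for illustration); the accumulated closed-loop cost matches the predicted optimal cost x0⊤ P0 x0:

import numpy as np

# Offline: backward Riccati recursion for P_t and the gains Γ_t.
# Online: closed-loop simulation with u_t = -Γ_t x_t.
# A, B, Q, R, S, T, x0 below are illustrative placeholders.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
S = 10 * np.eye(2)
T = 50

P = {T: S}                                    # P_T = S
Gamma = {}
for t in range(T - 1, -1, -1):
    # Γ_t = (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
    Gamma[t] = np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
    # P_t = Q + A^T P_{t+1} A − A^T P_{t+1} B Γ_t
    P[t] = Q + A.T @ P[t + 1] @ A - A.T @ P[t + 1] @ B @ Gamma[t]

x0 = np.array([[5.0], [0.0]])
x, cost = x0, 0.0
for t in range(T):
    u = -Gamma[t] @ x                         # feedback law u_t = -Γ_t x_t
    cost += float(x.T @ Q @ x + u.T @ R @ u)
    x = A @ x + B @ u
cost += float(x.T @ S @ x)

print("closed-loop cost:         ", cost)
print("predicted cost x0^T P0 x0:", float(x0.T @ P[0] @ x0))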

15 / 23
Computational complexity

Optimal feedback control

ut = −(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A xt = −Γt xt,     Γt := (R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A

where

Pt−1 = Q + A⊤ Pt A − A⊤ Pt B(R + B⊤ Pt B)−1 B⊤ Pt A, PT = S.

Offline computation:
▶ n × n matrix multiplications
▶ m × m matrix inversion
▶ T iterations

Online computation:
▶ storage of T m × n matrices
▶ m × n matrix multiplications

16 / 23
Infinite horizon LQR

Extension of the LQR optimal control to T → ∞.


“persistent” tracking and regulation problems
time-invariant feedback control law
▶ requires less memory in online implementation
▶ allows analysis tools (stability, etc.) from LTI system theory

17 / 23
Infinite-horizon cost function
min  ∑_{t=0}^{∞} ( xt⊤ Q xt + ut⊤ R ut ),     Q ⪰ 0, R ≻ 0

subject to  xt+1 = A xt + B ut,   x0 = X0

Feasibility
If the system is stabilizable, then there is an input sequence that yields a finite cost.

Proof:
Let K be a linear feedback such that A + BK has eigenvalues inside the unit circle.
Consider the input ut = Kxt, which yields

V(X0) = ∑_{t=0}^{∞} ( xt⊤ Q xt + xt⊤ K⊤ R K xt )

As xt = (A + BK)^t X0, we have

V(X0) = ∑_{t=0}^{∞} X0⊤ ((A + BK)⊤)^t (Q + K⊤ R K) (A + BK)^t X0

which is finite (≈ geometric series).
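
A small numerical illustration of this argument (A, B, Q, R, the hand-picked stabilizing gain K, and X0 are placeholders chosen only for illustration): the truncated sum settles to a finite value because the terms decay geometrically.

import numpy as np

# Stabilizing feedback u_t = K x_t  =>  finite infinite-horizon cost.
# K was chosen by hand so that A + BK has eigenvalues inside the unit circle.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K = np.array([[-0.5, -1.5]])

A_cl = A + B @ K
print("closed-loop eigenvalues:", np.linalg.eigvals(A_cl))   # 0 and 0.5: stable

M = Q + K.T @ R @ K              # per-step cost matrix, x_t^T (Q + K^T R K) x_t
x = np.array([[5.0], [0.0]])     # X0
V = 0.0
for _ in range(200):             # truncated sum; terms decay geometrically
    V += float(x.T @ M @ x)
    x = A_cl @ x
print("V(X0) ~", V)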


18 / 23
How would you compute the optimal control for the infinite-horizon LQR?

Algebraic Riccati Equation


The iteration

Pt−1 = Q + A⊤ Pt A − A⊤ Pt B(R + B⊤ Pt B)−1 B⊤ Pt A, P0 = 0

converges, for t → −∞, to a solution P∞ ≻ 0 of the algebraic Riccati equation

P = Q + A⊤ PA − A⊤ PB(R + B⊤ PB)−1 B⊤ PA.

Sketch of the proof:


The sequence P0, P−1, P−2, . . . is non-decreasing (Pt ⪰ Pt+1). Why?
The sequence P0 , P−1 , P−2 , . . . is upper-bounded. Why?

19 / 23
Optimal feedback control (infinite horizon)
The input
ut = −(R + B⊤ P∞ B)−1 B⊤ P∞ A xt = −Γ∞ xt,     Γ∞ := (R + B⊤ P∞ B)−1 B⊤ P∞ A

minimizes the infinite-horizon control cost and yields V (X0 ) = X0⊤ P∞ X0 .

Offline computation:
▶ n × n matrix multiplications
▶ m × m matrix inversion
▶ “Infinite” iterations*

Online computation:
▶ storage of one m × n matrix
▶ m × n matrix multiplication

* P∞ can be computed by solving the Algebraic Riccati Equation.
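
A minimal sketch of this computation: iterate the Riccati recursion from P = 0 until it converges to P∞, then form Γ∞. The system below is an illustrative placeholder; SciPy's solve_discrete_are, if available, solves the same algebraic Riccati equation directly.

import numpy as np

# Infinite-horizon LQR via Riccati iteration. A, B, Q, R are placeholders.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = np.zeros_like(Q)                                   # start from P = 0
for _ in range(10000):
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        R + B.T @ P @ B, B.T @ P @ A)
    if np.max(np.abs(P_next - P)) < 1e-12:             # converged to P_inf
        P = P_next
        break
    P = P_next

Gamma_inf = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("Gamma_inf:", Gamma_inf)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ Gamma_inf))

# Optional cross-check with SciPy (same equation, solved directly):
#   from scipy.linalg import solve_discrete_are
#   P_are = solve_discrete_are(A, B, Q, R)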

20 / 23
Stability of the optimal controller

Is it a sensible question to ask?


Example

       
xt+1 = [2 0; 0 3] xt + B ut,   Q = [1 0; 0 0],   R = [1 0; 0 1],   ut = [? ?; ? ?] xt = Γ∞ xt

Theorem
Let Q = C⊤ C. The optimal infinite-horizon feedback Γ∞ stabilizes the system if
and only if
xt+1 = Axt , yt = Cxt
does not have unobservable unstable modes.

In practice: if the system has unstable dynamics, those states need to be weighted
in the cost function.
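
A numerical sketch of the example above, using the Riccati iteration from the previous slides. Two assumptions are made purely for illustration: B is taken as the 2×2 identity (the input matrix is not clearly legible in the source), and Q = diag(1, 0), R = I as reconstructed above. Since the unstable second state (eigenvalue 3) does not enter the cost, the optimal feedback leaves it unstable.

import numpy as np

# Example: an unstable mode that is not weighted in the cost is not stabilized.
# ASSUMPTION: B = identity (not legible in the source); Q = diag(1, 0), R = I.
A = np.diag([2.0, 3.0])
B = np.eye(2)
Q = np.diag([1.0, 0.0])          # the unstable mode x2 carries no cost
R = np.eye(2)

P = np.zeros((2, 2))             # Riccati iteration from P = 0
for _ in range(200):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

Gamma_inf = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("Gamma_inf:\n", Gamma_inf)                        # ~ [[1.618, 0], [0, 0]]
print("eigenvalues of A - B Gamma_inf:",
      np.linalg.eigvals(A - B @ Gamma_inf))             # ~ {0.382, 3}: not stable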

22 / 23
The control engineer flowchart

23 / 23
This work is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License

https://bsaver.io/COCO
