
Computational Control

Dynamic Programming and LQR

Saverio Bolognani

Automatic Control Laboratory (IfA)


ETH Zurich
Control task: decide sequence of control “pulses” {ut }t=0,1,2,...,T
Performance metric:
▶ minimize fuel use
▶ circumnavigate the Moon
▶ get back to Earth by time T

1 / 23
Optimal control input

In the absence of disturbance/uncertainty, the entire behavior of the system is
determined by the sequence u = (u0, u1, u2, . . .)

Optimization problem

min J(u)
u∈U

U = Ũ × · · · × Ũ (one copy of Ũ for each t = 0, 1, . . . , T) represents the set of feasible input sequences


J(u) is the sum of
▶ a fuel cost term ∥u∥
▶ a terminal constraint term δ(u)
δ(u) = 0 if the probe has reached the Moon and is back on Earth at time T, and δ(u) = +∞ otherwise

2 / 23
min J(u)
u∈U

An intractable optimization problem:

the dimension of u makes the problem size formidable

the cost function requires an accurate model of the system
▶ first-principle model
▶ numerical simulator
▶ black box/oracle for the terminal constraint

the optimization problem is non-convex

the open-loop nature of the optimal control sequence makes it useless in the presence of disturbances and uncertainty

3 / 23
Problem decomposition

Key assumptions: Markovian representation


1 There exists a Markovian state that evolves according to

xt+1 = ft (xt , ut ) (discrete-time dynamical system)

2 The initial condition x0 is known.


3 The cost is additive over time:

J = ∑t gt(xt, ut)

What does Markovian mean?


What would be a valid state for the lunar trajectory planning?
Would the proposed cost satisfy this assumption?

4 / 23
Bellman’s principle

min_{u,x}  ∑_{t=0}^{T} gt(xt, ut)

subject to  xt+1 = ft(xt, ut),  x0 = X0
            xt ∈ Xt ∀t
            ut ∈ Ut ∀t.

Bellman’s principle of optimality


“An optimal policy has the property that whatever the initial state and the initial
decisions are, the remaining decisions must constitute an optimal policy with
regard to the state resulting from the first decision.”

min_{u0}  g0(X0, u0) + [ min_{u1,...,uT, x1,...,xT}  ∑_{t=1}^{T} gt(xt, ut)
                         subject to  xt+1 = ft(xt, ut),  x1 = f0(x0, u0) ]

5 / 23
 

 

 
V0(X0) = min_{u0}  g0(X0, u0) + [ min_{u1,...,uT, x1,...,xT}  ∑_{t=1}^{T} gt(xt, ut)
                                  subject to  xt+1 = ft(xt, ut),  x1 = f0(x0, u0) ]

where the bracketed inner problem is exactly V1(x1) = V1(f0(x0, u0)).

The problems are nested, therefore we need to


▶ solve the subproblem parametrized in x1
▶ compute the value function V1 (x1 )
▶ solve the “outer” problem
min_{u0} { g0(X0, u0) + V1(f0(X0, u0)) }

The “inner” problem is a smaller optimal control problem (shorter horizon).

6 / 23
Dynamic programming
At the last stage (T ), the subproblem is trivial.

VT (xT ) = gT (xT )

At time T − 1, compute VT −1 (xT −1 ) as

min_{uT−1}  gT−1(xT−1, uT−1) + VT(xT)
subject to  xT = fT−1(xT−1, uT−1)

that is
min_{uT−1}  gT−1(xT−1, uT−1) + VT(fT−1(xT−1, uT−1))

[Figure: trajectory over t = 0, . . . , T; past decisions u0, . . . , uT−2 lead to xT−1, and the decision uT−1 determines xT and the terminal cost VT(xT).]

7 / 23
Dynamic programming

At time T − 2, compute VT −2 (xT −2 ) as

min_{uT−2}  gT−2(xT−2, uT−2) + VT−1(xT−1)
subject to  xT−1 = fT−2(xT−2, uT−2)

that is
min_{uT−2}  gT−2(xT−2, uT−2) + VT−1(fT−2(xT−2, uT−2))

[Figure: trajectory over t = 0, . . . , T; past decisions u0, . . . , uT−3 lead to xT−2, the decision uT−2 determines xT−1 and the cost-to-go VT−1(xT−1), and the remaining tail is optimal.]

8 / 23
Problem decomposition
The optimal control problem is decomposed into stage problems that we can
solve via backward induction.

Vt(xt) = min_{ut} { gt(xt, ut) + Vt+1(ft(xt, ut)) },     VT(xT) = gT(xT).

When is this stage problem simple to solve?


Decision space Ũ convex.
gt convex in ut .
Vt+1 ◦ ft convex in ut

In practice: Quadratic cost g and linear dynamics f


Unless we use approximations of the value function
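
As an illustration, here is a minimal backward-induction sketch on a toy finite (tabular) problem; the horizon, state/input sets, dynamics f, stage cost g, and terminal cost g_T are illustrative placeholders, not the lunar example:

# Minimal sketch of backward induction on a toy finite (tabular) problem.
# All problem data below are illustrative placeholders.

T = 10
states = range(-5, 6)            # finite state space
inputs = (-1, 0, 1)              # finite input set U~

def f(x, u):                     # dynamics x_{t+1} = f_t(x_t, u_t)
    return max(-5, min(5, x + u))

def g(x, u):                     # stage cost g_t(x_t, u_t)
    return x * x + u * u

def g_T(x):                      # terminal cost g_T(x_T)
    return 10 * x * x

V = {T: {x: g_T(x) for x in states}}     # V_T(x_T) = g_T(x_T)
policy = {}

for t in range(T - 1, -1, -1):           # backward in time
    V[t], policy[t] = {}, {}
    for x in states:
        # stage problem: min_u  g_t(x, u) + V_{t+1}(f_t(x, u))
        u_best = min(inputs, key=lambda u: g(x, u) + V[t + 1][f(x, u)])
        V[t][x] = g(x, u_best) + V[t + 1][f(x, u_best)]
        policy[t][x] = u_best

print(V[0][3], policy[0][3])             # cost-to-go and optimal first input from x_0 = 3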

9 / 23
Optimal Linear-Quadratic Regulation (LQR)

Markovian update
xt+1 = Axt + But
Cost function
∑_{t=0}^{T−1} ( xt⊤ Q xt + ut⊤ R ut ) + xT⊤ S xT,     Q, S ⪰ 0, R ≻ 0

Value function (minimum cost to go, starting from x at time t)


Vt(x) = min_{ut,...,uT−1}  ∑_{s=t}^{T−1} ( xs⊤ Q xs + us⊤ R us ) + xT⊤ S xT

subject to  xs+1 = A xs + B us,   s = t, . . . , T − 1
            xt = x

V0 (x0 ) is the optimal cost.

10 / 23
LQR

Key ideas
1 VT (x) = x ⊤ Sx (convex quadratic function)
2 Show that Vt (x) is also quadratic: Vt (x) = x ⊤ Pt x
3 Compute Pt recursively, working backward from T
4 The optimal ut can be computed as a solution of a convex optimization
problem.

We proceed by induction at stage t, where the value function is

Vt(x) = min_u { x⊤ Q x + u⊤ R u + Vt+1(Ax + Bu) }

where x⊤ Q x + u⊤ R u is the stage cost.

11 / 23
Induction step: optimal control at stage t

We proceed by induction, assuming that Vt+1 (x) = x ⊤ Pt+1 x.


Then Vt (x) becomes

Vt(x) = x⊤ Q x + min_u { u⊤ R u + (Ax + Bu)⊤ Pt+1 (Ax + Bu) }

       = x⊤ Q x + min_u { u⊤ R u + x⊤ A⊤ Pt+1 A x + 2 u⊤ B⊤ Pt+1 A x + u⊤ B⊤ Pt+1 B u }

       = x⊤ Q x + min_u { u⊤ (R + B⊤ Pt+1 B) u + x⊤ A⊤ Pt+1 A x + 2 u⊤ B⊤ Pt+1 A x }

Unconstrained optimization problem:


→ to find the minimizer, we set the gradient with respect to u to zero, i.e.

(R + B⊤ Pt+1 B)u∗ + B⊤ Pt+1 Ax = 0

that is
u∗ = −(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 Ax

12 / 23
Induction step: quadratic value function

Now we need to check that Vt (x) is also quadratic in x, i.e. Vt (x) = x ⊤ Pt x, and to
find how to compute Pt .
We plug the minimizer u∗ into Vt (x), and obtain the ugly expression

Vt(x) = x⊤ Q x + x⊤ A⊤ Pt+1 B (R + B⊤ Pt+1 B)−1 (R + B⊤ Pt+1 B)(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A x
        + x⊤ A⊤ Pt+1 A x − 2 x⊤ A⊤ Pt+1 B (R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A x.

Just collecting the different terms we get Vt (x) = x ⊤ Pt x where Pt is defined as

Pt = Q + A⊤ Pt+1 A − A⊤ Pt+1 B(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A

which completes the proof.


Anything missing?

13 / 23
LQR
Optimal control sequence
Backward induction:

ut = −(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 Axt

where

Pt−1 = Q + A⊤ Pt A − A⊤ Pt B(R + B⊤ Pt B)−1 B⊤ Pt A, PT = S.

Forward integration from x0 :

xt+1 = Axt + But .

Result from offline computation: open-loop sequence u0 , u1 , . . . , uT −1


[Figure: the precomputed optimal decisions u0, . . . , uT−1 applied over the horizon t = 0, . . . , T.]

14 / 23
What if the optimal control input u takes us to
a trajectory different from the one we have computed?

Optimal feedback control

ut = −(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A xt = −Γt xt,     Γt := (R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A

where

Pt−1 = Q + A⊤ Pt A − A⊤ Pt B(R + B⊤ Pt B)−1 B⊤ Pt A, PT = S.
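
A minimal numerical sketch of the recursion above (the system, weights, horizon, and initial state are illustrative placeholders, a double integrator chosen only for illustration); the accumulated closed-loop cost matches the predicted optimal cost x0⊤ P0 x0:

import numpy as np

# Offline: backward Riccati recursion for P_t and the gains Γ_t.
# Online: closed-loop simulation with u_t = -Γ_t x_t.
# A, B, Q, R, S, T, x0 below are illustrative placeholders.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
S = 10 * np.eye(2)
T = 50

P = {T: S}                                    # P_T = S
Gamma = {}
for t in range(T - 1, -1, -1):
    # Γ_t = (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
    Gamma[t] = np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
    # P_t = Q + A^T P_{t+1} A − A^T P_{t+1} B Γ_t
    P[t] = Q + A.T @ P[t + 1] @ A - A.T @ P[t + 1] @ B @ Gamma[t]

x0 = np.array([[5.0], [0.0]])
x, cost = x0, 0.0
for t in range(T):
    u = -Gamma[t] @ x                         # feedback law u_t = -Γ_t x_t
    cost += float(x.T @ Q @ x + u.T @ R @ u)
    x = A @ x + B @ u
cost += float(x.T @ S @ x)

print("closed-loop cost:         ", cost)
print("predicted cost x0^T P0 x0:", float(x0.T @ P[0] @ x0))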

15 / 23
Computational complexity

Optimal feedback control

ut = −(R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A xt = −Γt xt,     Γt := (R + B⊤ Pt+1 B)−1 B⊤ Pt+1 A

where

Pt−1 = Q + A⊤ Pt A − A⊤ Pt B(R + B⊤ Pt B)−1 B⊤ Pt A, PT = S.

Offline computation:
▶ n × n matrix multiplications
▶ m × m matrix inversion
▶ T iterations

Online computation:
▶ storage of T m × n matrices
▶ m × n matrix multiplications

16 / 23
Infinite horizon LQR

Extension of the LQR optimal control to T → ∞.


“persistent” tracking and regulation problems
time-invariant feedback control law
▶ requires less memory in online implementation
▶ allows analysis tools (stability, etc.) from LTI system theory

17 / 23
Infinite-horizon cost function
min  ∑_{t=0}^{∞} ( xt⊤ Q xt + ut⊤ R ut ),     Q ⪰ 0, R ≻ 0

subject to  xt+1 = A xt + B ut,   x0 = X0

Feasibility
If the system is stabilizable, then there is an input sequence that yields a finite cost.

Proof:
Let K be a linear feedback such that A + BK has eigenvalues inside the unit circle.
Consider the input ut = Kxt, which yields

V(X0) = ∑_{t=0}^{∞} ( xt⊤ Q xt + xt⊤ K⊤ R K xt )

As xt = (A + BK)^t X0, we have

V(X0) = ∑_{t=0}^{∞} X0⊤ ((A + BK)⊤)^t (Q + K⊤ R K) (A + BK)^t X0

which is finite (≈ geometric series).
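
A small numerical illustration of this argument (A, B, Q, R, the hand-picked stabilizing gain K, and X0 are placeholders chosen only for illustration): the truncated sum settles to a finite value because the terms decay geometrically.

import numpy as np

# Stabilizing feedback u_t = K x_t  =>  finite infinite-horizon cost.
# K was chosen by hand so that A + BK has eigenvalues inside the unit circle.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K = np.array([[-0.5, -1.5]])

A_cl = A + B @ K
print("closed-loop eigenvalues:", np.linalg.eigvals(A_cl))   # 0 and 0.5: stable

M = Q + K.T @ R @ K              # per-step cost matrix, x_t^T (Q + K^T R K) x_t
x = np.array([[5.0], [0.0]])     # X0
V = 0.0
for _ in range(200):             # truncated sum; terms decay geometrically
    V += float(x.T @ M @ x)
    x = A_cl @ x
print("V(X0) ~", V)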


18 / 23
How would you compute the optimal control for the infinite-horizon LQR?

Algebraic Riccati Equation


The iteration

Pt−1 = Q + A⊤ Pt A − A⊤ Pt B(R + B⊤ Pt B)−1 B⊤ Pt A, P0 = 0

converges, for t → −∞, to a solution P∞ ≻ 0 of the algebraic Riccati equation

P = Q + A⊤ PA − A⊤ PB(R + B⊤ PB)−1 B⊤ PA.

Sketch of the proof:


The sequence P0, P−1, P−2, . . . is non-decreasing (Pt ⪰ Pt+1). Why?
The sequence P0 , P−1 , P−2 , . . . is upper-bounded. Why?

19 / 23
Optimal feedback control (infinite horizon)
The input
ut = −(R + B⊤ P∞ B)−1 B⊤ P∞ A xt = −Γ∞ xt,     Γ∞ := (R + B⊤ P∞ B)−1 B⊤ P∞ A

minimizes the infinite-horizon control cost and yields V (X0 ) = X0⊤ P∞ X0 .

Offline computation:
▶ n × n matrix multiplications
▶ m × m matrix inversion
▶ “Infinite” iterations*

Online computation:
▶ storage of one m × n matrix
▶ m × n matrix multiplication

* P∞ can be computed by solving the Algebraic Riccati Equation.
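
A minimal sketch of this computation: iterate the Riccati recursion from P = 0 until it converges to P∞, then form Γ∞. The system below is an illustrative placeholder; SciPy's solve_discrete_are, if available, solves the same algebraic Riccati equation directly.

import numpy as np

# Infinite-horizon LQR via Riccati iteration. A, B, Q, R are placeholders.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = np.zeros_like(Q)                                   # start from P = 0
for _ in range(10000):
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        R + B.T @ P @ B, B.T @ P @ A)
    if np.max(np.abs(P_next - P)) < 1e-12:             # converged to P_inf
        P = P_next
        break
    P = P_next

Gamma_inf = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("Gamma_inf:", Gamma_inf)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ Gamma_inf))

# Optional cross-check with SciPy (same equation, solved directly):
#   from scipy.linalg import solve_discrete_are
#   P_are = solve_discrete_are(A, B, Q, R)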

20 / 23
Stability of the optimal controller

Is it a sensible question to ask?


Example

       
xt+1 = [2 0; 0 3] xt + B ut,   Q = [1 0; 0 0],   R = [1 0; 0 1],   ut = [? ?; ? ?] xt = Γ∞ xt

Theorem
Let Q = C⊤ C. The optimal infinite-horizon feedback Γ∞ stabilizes the system if
and only if
xt+1 = Axt , yt = Cxt
does not have unobservable unstable modes.

In practice: if the system has unstable dynamics, those states need to be weighted
in the cost function.
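
A numerical sketch of the example above, using the Riccati iteration from the previous slides. Two assumptions are made purely for illustration: B is taken as the 2×2 identity (the input matrix is not clearly legible in the source), and Q = diag(1, 0), R = I as reconstructed above. Since the unstable second state (eigenvalue 3) does not enter the cost, the optimal feedback leaves it unstable.

import numpy as np

# Example: an unstable mode that is not weighted in the cost is not stabilized.
# ASSUMPTION: B = identity (not legible in the source); Q = diag(1, 0), R = I.
A = np.diag([2.0, 3.0])
B = np.eye(2)
Q = np.diag([1.0, 0.0])          # the unstable mode x2 carries no cost
R = np.eye(2)

P = np.zeros((2, 2))             # Riccati iteration from P = 0
for _ in range(200):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

Gamma_inf = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("Gamma_inf:\n", Gamma_inf)                        # ~ [[1.618, 0], [0, 0]]
print("eigenvalues of A - B Gamma_inf:",
      np.linalg.eigvals(A - B @ Gamma_inf))             # ~ {0.382, 3}: not stable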

22 / 23
The control engineer flowchart

23 / 23
This work is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License

https://bsaver.io/COCO
