05 - Robust MPC
Saverio Bolognani
$x_{k+1} = f(x_k, u_k, w_k)$

Model mismatch
$x_{k+1} = \tilde f(x_k, u_k)$

Missing dynamics / non-Markovianity
$x_{k+1} = f(x_k, z_k, u_k), \qquad z_{k+1} = f_z(x_k, z_k, u_k)$

Linearization
$x_{k+1} = \tilde A x_k + \tilde B u_k$
And more
▶ time discretization
▶ quantization
▶ time-varying parameters
▶ ...
The main tool against model mismatch/disturbances: feedback.
By determining∗ the optimal control policy at the current state x, we incorporate all
the past information in the decision.
Parametric optimization
$$u_0^*(x) \ \text{determined by} \ \min_{u,\,x} \ \sum_{k=0}^{K-1} g_k(x_k, u_k) + g_K(x_K)$$
* determining =
▶ evaluating a policy, in very special cases: LQR, Explicit MPC, ...
▶ solving a program in real time, in general: tracking MPC, Economic MPC, ...
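As a concrete illustration of the general case (solving a program in real time), here is a minimal receding-horizon sketch, not from the slides: the finite-horizon program above is parametrized in the measured state x and re-solved at every step. The double-integrator model, horizon, weights, and input bound are placeholder choices.

```python
import numpy as np
import cvxpy as cp

# Placeholder double-integrator model and weights (illustrative only)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q, R, K = np.eye(2), 0.1 * np.eye(1), 20

x0 = cp.Parameter(2)              # the program is parametrized in the current state x
x = cp.Variable((2, K + 1))
u = cp.Variable((1, K))

cost = sum(cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R) for k in range(K))
cost += cp.quad_form(x[:, K], Q)  # terminal cost g_K
constraints = [x[:, 0] == x0]
constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k] for k in range(K)]
constraints += [cp.abs(u) <= 1.0]
problem = cp.Problem(cp.Minimize(cost), constraints)

# Receding-horizon loop: measure x, solve for u_0^*(x), apply it, repeat
state = np.array([1.0, 0.0])
for t in range(50):
    x0.value = state
    problem.solve()
    u0 = u[:, 0].value            # u_0^*(x): only the first input is applied
    state = A @ state + B @ u0    # plant simulated here with the nominal model
```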
We can do better than that if we have prior information on the disturbance.
[Figure: past trajectory up to the current time k, the nominal prediction, and an ensemble of predicted trajectories obtained with the same input sequence but different disturbance realizations.]
$x_{k+1} = f(x_k, u_k, w_k)$ with, for instance:
▶ finite disturbances: $w_k \in \{w^0, w^1, \ldots, w^p\}$
▶ disturbance set: $w_k \in \operatorname{co}\{w^0, w^1, \ldots, w^p\}$
▶ probability distribution: $w_k \sim \mathcal{W}$
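A small numerical sketch (mine, with a placeholder scalar model) of the ensemble picture above: the same input sequence is applied repeatedly while the disturbance $w_k$ is sampled from a bounded set, producing a bundle of trajectories around the nominal (w = 0) prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, d = 0.9, 1.0, 1.0          # placeholder scalar model x_{k+1} = a x_k + b u_k + d w_k
K, n_samples, x_init = 20, 50, 1.0
u_seq = 0.1 * np.ones(K)         # one fixed input sequence for the whole ensemble

# Nominal prediction (w = 0)
nominal = np.empty(K + 1); nominal[0] = x_init
for k in range(K):
    nominal[k + 1] = a * nominal[k] + b * u_seq[k]

# Ensemble: same inputs, different disturbance realizations w_k drawn from [-0.2, 0.2]
ensemble = np.empty((n_samples, K + 1)); ensemble[:, 0] = x_init
for s in range(n_samples):
    for k in range(K):
        w = rng.uniform(-0.2, 0.2)
        ensemble[s, k + 1] = a * ensemble[s, k] + b * u_seq[k] + d * w

spread = ensemble.max(axis=0) - ensemble.min(axis=0)   # how the bundle widens over time
print(spread.round(3))
```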
What is performance under uncertainty?
Cost
▶ Nominal cost
▶ Worst case cost
▶ Expected cost
▶ ...and more (e.g. depending on risk tolerance)
Constraints
▶ Guaranteed satisfaction of constraints
▶ Constraint satisfaction with high probability
▶ Constraint satisfaction for a number of samples (Monte Carlo, scenario approach)
▶ Bound on expected violation (CVaR, conditional value at risk)
▶ ...and more
In this course
Example of infeasible robust trajectory
▶ cost $g(x, u) = x^2 + u^2$
▶ state constraint $|x_k| \le 1$, unconstrained input
▶ horizon $K = 5$
No fixed input sequence is robustly feasible for this example, but a simple feedback policy is:
$$u_k = -x_k$$
as it yields the closed-loop dynamics $x_{k+1} = w_k$, which is clearly within bounds.
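The slide does not state the dynamics explicitly; the following sketch assumes the scalar model $x_{k+1} = x_k + u_k + w_k$ with $|w_k| \le 1$, which is consistent with the claim that $u_k = -x_k$ yields $x_{k+1} = w_k$. It contrasts a fixed (open-loop) input sequence with the feedback law over sampled disturbance sequences.

```python
import numpy as np

# Assumed scalar dynamics x_{k+1} = x_k + u_k + w_k with |w_k| <= 1 (not stated on the slide)
rng = np.random.default_rng(1)
K, x_init = 5, 0.0

def simulate(policy, w_seq):
    x, traj = x_init, [x_init]
    for k in range(K):
        x = x + policy(k, x) + w_seq[k]
        traj.append(x)
    return np.array(traj)

open_loop = lambda k, x: 0.0      # a fixed input sequence (here u_k = 0)
feedback  = lambda k, x: -x       # u_k = -x_k  =>  x_{k+1} = w_k

viol_ol = viol_fb = 0
for _ in range(1000):
    w = rng.uniform(-1.0, 1.0, K)
    viol_ol += np.any(np.abs(simulate(open_loop, w)) > 1.0)
    viol_fb += np.any(np.abs(simulate(feedback, w)) > 1.0)
print(viol_ol, viol_fb)           # the feedback policy never violates |x_k| <= 1
```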
In solving the open-loop finite-time optimal control problem at the core of the MPC routine, we are looking for a feasible input sequence
$$u_0, u_1, \ldots, u_{K-1}.$$
A more powerful alternative is to look for a feasible control policy
$$u_k = \pi_k(x_k), \qquad k = 0, \ldots, K-1,$$
or, equivalently,
$$u_k = \theta_k(w_0, \ldots, w_{k-1}), \qquad k = 0, \ldots, K-1.$$
Unfortunately, computing the optimal robust closed-loop control policies is
extremely hard.
If we could do that, we would have solved our optimal control problems via
dynamic programming.
Not surprisingly, a linear state update with quadratic cost is one of the very few cases in which this problem is tractable.
Robust LQR
(also known as min-max LQR, $H_\infty$ LQR, two-player LQR)
Special (solvable) case
$$V(x) = \min_{u} \max_{w} \ \sum_{k=0}^{K-1} \Big( x_k^\top Q x_k + u_k^\top R u_k - \gamma^2 w_k^\top w_k \Big) + x_K^\top S x_K, \qquad Q, S \succeq 0, \ R \succ 0,$$
subject to the linear dynamics $x_{k+1} = A x_k + B u_k + D w_k$.
Dynamic programming solution
1 Initialize the value function with the terminal cost: $V_K(x) = x^\top S x$.
2 Assume, inductively, that $V_{k+1}(x) = x^\top P_{k+1} x$.
3 Solve the min-max step (Isaacs equation)
$$V_k(x) = \min_u \max_w \ \Big\{ x^\top Q x + u^\top R u - \gamma^2 w^\top w + V_{k+1}(A x + B u + D w) \Big\}$$
and obtain the optimal input $u_k^*(x)$ and the optimal disturbance $w_k^*(x)$.
4 Prove that $V_k$ is a quadratic form: $V_k(x) = x^\top P_k x$.
5 Iterate backwards until $u_0^*$.
Solution of the Isaacs equation
Maximization over w yields a linear function of x and u.
Proof:
The argument of the min-max in $V_k(x)$ can be rewritten, using the inductive assumption $V_{k+1}(x) = x^\top P_{k+1} x$, as
$$x^\top Q x + u^\top R u - \gamma^2 w^\top w + (Ax + Bu + Dw)^\top P_{k+1} (Ax + Bu + Dw).$$
This expression is concave in w as long as $\gamma^2 I - D^\top P_{k+1} D \succ 0$, so the maximum over w is attained where the gradient with respect to w vanishes, that is when
$$\hat w = (\gamma^2 I - D^\top P_{k+1} D)^{-1} D^\top P_{k+1} (Ax + Bu).$$
Solution of the Isaacs equation
Minimization over u (assuming the worst-case $\hat w_k$) yields a linear function of x.
Proof:
We can plug the expression for the worst-case disturbance, $\hat w_k(x, u) = \Lambda x + \Gamma u$ with $\Lambda = (\gamma^2 I - D^\top P_{k+1} D)^{-1} D^\top P_{k+1} A$ and $\Gamma = (\gamma^2 I - D^\top P_{k+1} D)^{-1} D^\top P_{k+1} B$, into the expression for $V_k(x)$ and obtain
$$V_k(x) = \min_u \ x^\top Q x + u^\top R u - \gamma^2 (\Lambda x + \Gamma u)^\top (\Lambda x + \Gamma u) + \big(Ax + Bu + D(\Lambda x + \Gamma u)\big)^\top P_{k+1} \big(Ax + Bu + D(\Lambda x + \Gamma u)\big).$$
Notice that this is a standard LQR problem now, for which we know that the
optimal solution is a linear state feedback, i.e. u = Kx.
The expression for K can be computed by zeroing the gradient with respect to u.
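Carrying out this gradient computation explicitly (a step not spelled out on the slide), with the shorthand $M = A + D\Lambda$ and $N = B + D\Gamma$:
$$\nabla_u \Big[ u^\top R u - \gamma^2 (\Lambda x + \Gamma u)^\top (\Lambda x + \Gamma u) + (Mx + Nu)^\top P_{k+1} (Mx + Nu) \Big] = 0$$
$$\Longrightarrow \quad \big( R - \gamma^2 \Gamma^\top \Gamma + N^\top P_{k+1} N \big)\, u = \big( \gamma^2 \Gamma^\top \Lambda - N^\top P_{k+1} M \big)\, x,$$
so $K = \big( R - \gamma^2 \Gamma^\top \Gamma + N^\top P_{k+1} N \big)^{-1} \big( \gamma^2 \Gamma^\top \Lambda - N^\top P_{k+1} M \big)$.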
Recursive definition of $V_k$
We can finally prove that the value function is quadratic, by simple substitution of the linear forms of $u_k^*$ and $w_k^*$ in x, namely $u_k^* = Kx$ and $w_k^* = Hx$ with $H = \Lambda + \Gamma K$:
$$V_k(x) = x^\top Q x + x^\top K^\top R K x - \gamma^2 x^\top H^\top H x + \big((A + BK + DH)x\big)^\top P_{k+1} \big((A + BK + DH)x\big)$$
$$= x^\top \underbrace{\big( Q + K^\top R K - \gamma^2 H^\top H + (A + BK + DH)^\top P_{k+1} (A + BK + DH) \big)}_{P_k} \, x.$$
Things you would have to verify
▶ convexity in u at all steps
▶ concavity in w at all steps (requires γ large enough)
▶ invertibility of the Hessians → unique minimizers/maximizers
▶ positive semidefiniteness of $P_k$
Offline computation
Similarly to the LQR case, this entire computation can be performed offline.
Online part in a receding horizon scheme:
u0∗ (x) = K0 x
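A numerical sketch of the offline backward recursion (my own code, with placeholder matrices), following the formulas on the previous slides: starting from $P_K = S$, each step computes the worst-case disturbance gains Λ, Γ, the feedback gain K, and $P_k$; online, only $u_0^*(x) = K_0 x$ needs to be evaluated.

```python
import numpy as np

def minmax_lqr(A, B, D, Q, R, S, gamma, horizon):
    """Backward recursion for the finite-horizon min-max (H-infinity-type) LQR."""
    P = S.copy()
    gains = []
    for _ in range(horizon):
        # Maximization over w: w_hat = Lam x + Gam u (requires gamma^2 I - D'PD > 0)
        W = np.linalg.inv(gamma**2 * np.eye(D.shape[1]) - D.T @ P @ D)
        Lam, Gam = W @ D.T @ P @ A, W @ D.T @ P @ B
        # Minimization over u with the worst-case w plugged in: u = K x
        M, N = A + D @ Lam, B + D @ Gam
        K = np.linalg.solve(R - gamma**2 * Gam.T @ Gam + N.T @ P @ N,
                            gamma**2 * Gam.T @ Lam - N.T @ P @ M)
        H = Lam + Gam @ K                      # worst-case disturbance gain: w = H x
        Acl = A + B @ K + D @ H
        P = Q + K.T @ R @ K - gamma**2 * H.T @ H + Acl.T @ P @ Acl   # P_k
        gains.append(K)
    gains.reverse()                            # gains[0] is K_0
    return gains, P

# Placeholder data (illustrative only)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
D = np.array([[0.05], [0.1]])
Q, R, S = np.eye(2), np.eye(1), np.eye(2)
gains, P0 = minmax_lqr(A, B, D, Q, R, S, gamma=2.0, horizon=20)
u0 = gains[0] @ np.array([1.0, 0.0])           # online: u_0^*(x) = K_0 x
```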
Two ways to derive $u_0^*(x)$
▶ Dynamic programming: automatically returns the desired control law $u_0^*(x)$ from the offline computation. Corresponds to infinite-time optimal control in the limit $K \to \infty$.
▶ Parametric optimization: the desired control law $u_0^*(x)$ is obtained by parametrizing the online optimization problem in x.
[Figure: state, input, and disturbance trajectories under three disturbance realizations w1, w2, w3, comparing the nominal case (w = 0), the closed-loop response, and the open-loop response.]
A tradeoff
Instead of optimizing over arbitrary policies, we can optimize over policies parametrized by a finite-dimensional vector v:
$$u_k(x_k) = \pi_k(x_k; v)$$
Feedback MPC
$$\min_{v,\,x} \ \max_{w} \ \sum_{k=0}^{K-1} g_k\big(x_k, \pi_k(x_k; v)\big) + g_K(x_K)$$
Examples of policies
Open-loop policy
$$v \in \mathbb{R}^{K}, \qquad u_k(x_k) = \pi_k(x_k; v) = v_k$$
Parametrized feedback policy (linear combination of basis functions $\theta_m$)
$$v \in \mathbb{R}^{KM}, \qquad u_k(x_k) = \pi_k(x_k; v) = \sum_{m=1}^{M} v_{km}\, \theta_m(x_k)$$
Closed-loop policy
any $\pi_k(x_k)$, with no finite-dimensional parametrization
Feedback MPC
A policy parametrized in a vector of parameters v:
$$v_k \in \mathbb{R}^{M}, \qquad u_k(x_k) = \pi_k(x_k; v) = \sum_{m=1}^{M} v_{km}\, \theta_m(x_k)$$
The optimal trajectory is determined by the affine policy $u_k = v_k + L x_k$ applied to the linear dynamics $x_{k+1} = A x_k + B u_k + D w_k$, that is,
$$x_{k+1} = (A + BL) x_k + B v_k + D w_k$$
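To see why fixing L keeps the problem tractable, the recursion can be stacked: with $u_k = v_k + L x_k$ the whole predicted trajectory is affine in the parameters v and in the disturbances w. A small sketch (mine, generic matrices) building these stacked maps:

```python
import numpy as np

def stacked_maps(A, B, D, L, K):
    """Return F, G, E such that the stacked state trajectory satisfies
    x_stack = F x0 + G v_stack + E w_stack
    under the pre-stabilized dynamics x_{k+1} = (A + B L) x_k + B v_k + D w_k."""
    n, m, p = A.shape[0], B.shape[1], D.shape[1]
    Acl = A + B @ L
    F = np.vstack([np.linalg.matrix_power(Acl, k) for k in range(K + 1)])
    G = np.zeros(((K + 1) * n, K * m))
    E = np.zeros(((K + 1) * n, K * p))
    for k in range(1, K + 1):
        for i in range(k):
            blk = np.linalg.matrix_power(Acl, k - 1 - i)
            G[k * n:(k + 1) * n, i * m:(i + 1) * m] = blk @ B
            E[k * n:(k + 1) * n, i * p:(i + 1) * p] = blk @ D
    return F, G, E
```

Since the state is affine in v for every fixed disturbance realization, state and input constraints evaluated at the extreme disturbance sequences remain linear in v, i.e., the optimization over v stays convex.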
Remark: joint optimization of L and v
With the parametrization $u_k = v_k + L x_k$, the policy classes are nested, from the largest to the smallest:
▶ closed-loop policies: any $\pi_k(x_k)$
▶ time-varying affine policies: $\pi_k(x_k) = v_k + L_k x_k$
▶ affine policies with fixed L: $\pi_k(x_k) = v_k + L x_k$
▶ open-loop policies: $\pi_k(x_k) = v_k$ (i.e., $L = 0$)
Optimization over disturbance-feedback policies
Consider control policies of the form
$$u_k = \sum_{i=0}^{k-1} M_{ki} w_i + v_k.$$
The resulting robust finite-horizon problem, with the matrices $M_{ki}$ and the vectors $v_k$ as decision variables, is convex.
$$u_k = \sum_{i=0}^{k-1} M_{ki} w_i + v_k.$$
Notice first that a feedback from the disturbance w (with a unit delay) is equivalent to a feedback from the state, as past disturbances can be reconstructed from the observed trajectory: for the linear model, $D w_{k-1} = x_k - A x_{k-1} - B u_{k-1}$.
Computational complexity
Despite being convex, this problem can be computationally very hard, because of the huge number of constraints (proportional to the cardinality of the set $\mathcal{W}$).
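A small illustration (mine) of both points: the strict causality of the disturbance-feedback parametrization (M is lower block triangular with a zero diagonal, because of the unit delay) and the growth of the number of robust constraints when they are enforced at every vertex disturbance sequence, which is one concrete way the size of the disturbance set enters.

```python
import numpy as np
from itertools import product

K, p = 5, 1                                     # horizon and disturbance dimension (placeholders)
vertices = [np.array([-1.0]), np.array([1.0])]  # vertices of the disturbance set W

# Strictly causal disturbance feedback u_k = sum_{i<k} M_{ki} w_i + v_k:
# only the blocks with i < k are decision variables (unit delay).
free_blocks = [(k, i) for k in range(K) for i in range(k)]

# Enforcing the constraints robustly by enumeration requires one copy of the
# state/input constraints per vertex disturbance sequence.
vertex_sequences = list(product(vertices, repeat=K))

print(len(free_blocks), "free blocks in M")       # K(K-1)/2 = 10
print(len(vertex_sequences), "vertex sequences")  # 2**K = 32, grows exponentially in K
```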
Robust MPC: summary
The control engineer flowchart
This work is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License
https://bsaver.io/COCO