
Computational Control

Robust MPC

Saverio Bolognani

Automatic Control Laboratory (IfA)


ETH Zurich
“All models are wrong, but some are useful”

Neglected exogenous disturbances

xk+1 = f (xk , uk , wk )

Model mismatch
xk+1 = f̃ (xk , uk )
Missing dynamics / non-Markovianity

xk+1 = f (xk , zk , uk )
zk+1 = fz (xk , zk , uk )

Linearization
xk+1 = Ãxk + B̃uk
And more
▶ time discretization
▶ quantization
▶ time-varying parameters
▶ ...

1 / 26
The main tool against model mismatch/disturbances: feedback.
By determining∗ the optimal control policy at the current state x, we incorporate all
the past information in the decision.

Parametric optimization

   u0∗(x) determined by   min_{u,x} Σ_{k=0}^{K−1} gk(xk, uk) + gK(xK)

   subject to   xk+1 = f(xk, uk)
                x0 = x
                xk ∈ Xk
                uk ∈ Uk

* determining =
evaluating a policy in very special cases: LQR, Explicit MPC, ...
solving a program in real-time in general: tracking MPC, Economic MPC, ...
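For concreteness, here is a minimal cvxpy sketch of "solving a program in real-time": the finite-horizon program above for a linear system with quadratic stage cost. The matrices, horizon, and input bound are placeholder assumptions, not part of the slides.

```python
import numpy as np
import cvxpy as cp

def mpc_input(x, A, B, Q, R, K=10, u_max=1.0):
    """Solve the nominal finite-horizon program above and return u0*(x)."""
    n, m = B.shape
    xs = cp.Variable((K + 1, n))
    us = cp.Variable((K, m))
    cost = 0
    constraints = [xs[0] == x]
    for k in range(K):
        cost += cp.quad_form(xs[k], Q) + cp.quad_form(us[k], R)
        constraints += [xs[k + 1] == A @ xs[k] + B @ us[k],  # xk+1 = f(xk, uk)
                        cp.abs(us[k]) <= u_max]              # Uk as a box (assumed)
    cost += cp.quad_form(xs[K], Q)                           # terminal cost gK
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return us.value[0]

# Feedback enters through the re-solve: the program is parametrized by x.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
u0 = mpc_input(np.array([1.0, 0.0]), A, B, np.eye(2), 0.1 * np.eye(1))
```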

2 / 26
We can do better than that if we have prior information on the disturbance.

[Figure: a trajectory up to the current time k ("now"), followed by the nominal prediction and an ensemble of predicted trajectories obtained with the same input sequence but different disturbance realizations]

xk+1 = f (xk , uk , wk )

finite disturbances:       wk ∈ {w^0, w^1, . . . , w^p}
disturbance set:           wk ∈ co{w^0, w^1, . . . , w^p}
probability distribution:  wk ∼ W

Does it matter? How can we use this prior information?
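One quick way to see that it matters is to propagate an ensemble: the same input sequence under many disturbance realizations. A minimal numpy sketch, assuming a scalar integrator and uniformly distributed wk:

```python
import numpy as np

# Same input sequence, different disturbance realizations: the ensemble spreads.
rng = np.random.default_rng(1)
K, n_ens = 10, 200
u = np.zeros(K)                        # one fixed input sequence
x = np.full(n_ens, 0.5)                # identical initial states
for k in range(K):
    w = rng.uniform(-0.5, 0.5, n_ens)  # wk drawn from the disturbance set
    x = x + u[k] + w                   # xk+1 = f(xk, uk, wk)
print(x.min(), x.max())                # the nominal prediction (0.5) is long gone
```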

3 / 26
What is performance under uncertainty?
Cost
▶ Nominal cost
▶ Worst case cost
▶ Expected cost
▶ ...and more (e.g. depending on risk tolerance)
Constraints
▶ Guaranteed satisfaction of constraints
▶ Constraint satisfaction with high probability
▶ Constraint satisfaction for a number of samples (Monte Carlo, scenario approach)
▶ Bound on expected violation (CVaR, conditional value at risk)
▶ ...and more

Not covered: stability under uncertainty.

4 / 26
In this course

Linear system, additive disturbance

xk+1 = Axk + Buk + Dwk

Robust decision: Worst-case


Feedback control law u0∗ (x) determined as the first step of the open-loop robust
optimal control problem
   min_{u,x} max_w Σ_{k=0}^{K−1} gk(xk, uk) + gK(xK)

   subject to   xk+1 = Axk + Buk + Dwk
                x0 = x
                xk ∈ X   ∀w ∈ W
                uk ∈ U

5 / 26
Example of an infeasible robust trajectory

A very simple system

xk+1 = xk + uk + wk , |wk | ≤ 0.5

cost g(x, u) = x 2 + u2
state constraint |xk | ≤ 1, unconstrained input
horizon K = 5

Simple integration of the plant dynamics yields

   |x5| ≤ 1   ⟺   −1 ≤ x0 + Σ_{k=0}^{4} uk + Σ_{k=0}^{4} wk ≤ 1

If this needs to hold for all feasible disturbances w, we have

   −x0 − 1 − 2.5 ≤ Σ_{k=0}^{4} uk ≤ −x0 + 1 − 2.5   for wk = +0.5, +0.5, +0.5, . . .
   −x0 − 1 + 2.5 ≤ Σ_{k=0}^{4} uk ≤ −x0 + 1 + 2.5   for wk = −0.5, −0.5, −0.5, . . .

These two intervals never intersect, so no single input sequence is robustly feasible.
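This infeasibility can be confirmed numerically. A cvxpy sketch of the open-loop worst-case program for this example, enforcing the state constraint for every extreme disturbance sequence (x0 = 0 is an arbitrary assumption):

```python
import itertools
import cvxpy as cp

# Open-loop robust program: one input sequence must keep |xk| <= 1 for
# every extreme disturbance sequence.
K, x0 = 5, 0.0
u = cp.Variable(K)
t = cp.Variable()                      # epigraph variable for the worst-case cost
constraints = []
for w in itertools.product([-0.5, 0.5], repeat=K):   # 2^5 vertex sequences
    x, cost = x0, 0
    for k in range(K):
        cost += cp.square(x) + cp.square(u[k])
        x = x + u[k] + w[k]            # scalar dynamics, x is affine in u
        constraints.append(cp.abs(x) <= 1)
    constraints.append(t >= cost)
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()
print(prob.status)                     # 'infeasible', as derived above
```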

6 / 26
Example of an infeasible robust trajectory

A very simple system

xk+1 = xk + uk + wk , |wk | ≤ 0.5

cost g(x, u) = x 2 + u2
state constraint |xk | ≤ 1, unconstrained input
horizon K = 5

A feasible (although suboptimal) proportional controller clearly exists:

uk = −xk

as it yields the closed-loop dynamics xk+1 = wk, which is clearly within bounds.
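A short numpy check of this claim (initial state and random seed are arbitrary assumptions):

```python
import numpy as np

# Under uk = -xk the state satisfies |xk| <= 0.5 <= 1 after one step,
# for any admissible disturbance.
rng = np.random.default_rng(0)
x = 0.8
for k in range(5):
    u = -x
    x = x + u + rng.uniform(-0.5, 0.5)   # closed loop: xk+1 = wk
    assert abs(x) <= 0.5                 # the state constraint always holds
```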

MPC was supposed to be a clever way to produce static time-invariant feedback laws. Why is it failing to do so?

7 / 26
In solving the open-loop finite-time optimal control problem at the core of the MPC routine, we are looking for a feasible input sequence

   u0, u1, . . . , uK−1

that produces a feasible trajectory regardless of the disturbance wk.


That leads to
▶ extremely conservative control sequences
▶ infeasible problems

Closed-loop optimal control

We are allowed to optimize over the set of input policies

   uk = πk(xk),   k = 0, . . . , K − 1

or, equivalently,

   uk = θk(w0, . . . , wk−1),   k = 0, . . . , K − 1

8 / 26
Unfortunately, computing the optimal robust closed-loop control policies is
extremely hard.
If we could do that, we would have solved our optimal control problems via
dynamic programming.
Not surprisingly, linear state update and quadratic cost is one of the very few cases where this problem is tractable.

▶ Robust LQR
▶ Min-max LQR
▶ H∞ LQR
▶ Two-player LQR

Come to Game Theory and Control in Fall!

9 / 26
Special (solvable) case

Soft-constrained LQR game

   V(x) = min_u max_w Σ_{k=0}^{K−1} ( xk⊤Qxk + uk⊤Ruk − γ² wk⊤wk ) + xK⊤SxK ,     Q, S ≥ 0, R > 0
The term −γ² wk⊤wk
▶ is irrelevant in the minimization with respect to u
▶ is a "soft bound" on the energy of the disturbance wk
▶ makes the max over w concave and therefore solvable (for γ large enough)
▶ can be tuned a posteriori, iterating until the worst-case wk is acceptable

Interpretation: two players
▶ minimizing player: the controller, trying to reduce the plant cost
▶ maximizing player: nature playing against you, knowing uk (worst case!)

10 / 26
Dynamic programming solution

Let Vk(x) be the value of the min-max problem involving the steps from k to K.

1. Verify that VK is a quadratic form. → It is: VK(x) = x⊤Sx.
2. Assume that Vk+1 is a quadratic form: Vk+1(x) = x⊤Pk+1 x.
3. Solve the Isaacs equation

      Vk(x) = min_u max_w [ x⊤Qx + u⊤Ru − γ² w⊤w + Vk+1(Ax + Bu + Dw) ]

   and obtain the optimal input uk∗(x) and the optimal disturbance wk∗(x).
4. Prove that Vk is a quadratic form: Vk(x) = x⊤Pk x.
5. Iterate backwards until u0∗.

11 / 26
Solution of the Isaacs equation
Maximization over w yields a linear function in x and u:

   ŵk(x, u) = −(D⊤Pk+1 D − γ²I)⁻¹ D⊤Pk+1 (Ax + Bu) =: Λx + Γu

Proof:
The argument of the min-max function in Vk(x) can be rewritten, using the inductive assumption Vk+1(x) = x⊤Pk+1 x, as

   x⊤Qx + u⊤Ru − γ² w⊤w + (Ax + Bu + Dw)⊤ Pk+1 (Ax + Bu + Dw).

Its gradient with respect to w is

   −2γ² w + 2D⊤Pk+1 (Ax + Bu + Dw)

which is zero when

   (γ²I − D⊤Pk+1 D) w = D⊤Pk+1 (Ax + Bu)

that is, when

   w = (γ²I − D⊤Pk+1 D)⁻¹ D⊤Pk+1 (Ax + Bu).
12 / 26
Solution of the Isaacs equation
Minimization over u (assuming the worst case ŵk) yields a linear function in x:

   uk∗(x) = Kx   ⟹   wk∗(x) = Λx + ΓKx = (Λ + ΓK) x =: Hx

Proof:
We can plug the expression for the worst w, i.e., ŵk(x, u) = Λx + Γu, into the expression for Vk(x) and obtain

   Vk(x) = min_u [ x⊤Qx + u⊤Ru − γ² (Λx + Γu)⊤(Λx + Γu)
                   + (Ax + Bu + D(Λx + Γu))⊤ Pk+1 (Ax + Bu + D(Λx + Γu)) ]

Notice that this is now a standard LQR problem, for which we know that the optimal solution is a linear state feedback, i.e., u = Kx.
The expression for K follows by zeroing the gradient with respect to u, which gives the linear system

   ( R − γ² Γ⊤Γ + (B + DΓ)⊤Pk+1(B + DΓ) ) u = ( γ² Γ⊤Λ − (B + DΓ)⊤Pk+1(A + DΛ) ) x.
13 / 26
Recursive definition of Vk
We can finally prove that the value function is quadratic, by simple substitution of the linear forms of uk∗ and wk∗ in x:

   Vk(x) = min_u max_w [ x⊤Qx + u⊤Ru − γ² w⊤w + Vk+1(Ax + Bu + Dw) ]
         = x⊤Qx + x⊤K⊤RKx − γ² x⊤H⊤Hx + ((A + BK + DH)x)⊤ Pk+1 ((A + BK + DH)x)
         = x⊤ [ Q + K⊤RK − γ² H⊤H + (A + BK + DH)⊤Pk+1(A + BK + DH) ] x  =:  x⊤ Pk x
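The whole backward pass fits in a short function. A minimal numpy sketch, using the gain formula obtained from the zero-gradient condition above; it checks concavity in w but leaves the remaining verifications (next slide) to the user:

```python
import numpy as np

def minmax_lqr_gains(A, B, D, Q, R, S, gamma, K_horizon):
    """Backward recursion for the soft-constrained min-max LQR.

    Returns the gains [K_0, ..., K_{K-1}] (u_k*(x) = K_k x) and P_0.
    """
    P = S.copy()
    gains = []
    for _ in range(K_horizon):
        # worst-case disturbance: w = Lam x + Gam u
        M = gamma**2 * np.eye(D.shape[1]) - D.T @ P @ D
        assert np.all(np.linalg.eigvalsh(M) > 0), "gamma too small: max over w not concave"
        T = np.linalg.solve(M, D.T @ P)
        Lam, Gam = T @ A, T @ B
        # minimizing input u = K x, from the zero-gradient linear system
        At, Bt = A + D @ Lam, B + D @ Gam
        Ru = R - gamma**2 * Gam.T @ Gam + Bt.T @ P @ Bt
        K = np.linalg.solve(Ru, gamma**2 * Gam.T @ Lam - Bt.T @ P @ At)
        # quadratic value function V_k(x) = x' P_k x
        H = Lam + Gam @ K
        Acl = A + B @ K + D @ H
        P = Q + K.T @ R @ K - gamma**2 * H.T @ H + Acl.T @ P @ Acl
        P = (P + P.T) / 2            # keep P symmetric numerically
        gains.append(K)
    gains.reverse()                  # gains[0] is K_0
    return gains, P
```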

14 / 26
Things you would have to verify
▶ convexity in u at all steps
▶ concavity in w at all steps (requires γ large enough)
▶ invertibility of the Hessians → unique minimizers/maximizers
▶ positive semidefiniteness of Pk

Offline computation
Similarly to the LQR case, this entire computation can be performed offline.
Online part in a receding horizon scheme:

u0∗ (x) = K0 x

(time-invariant static feedback law)
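As a usage sketch (reusing the minmax_lqr_gains function above, with illustrative system matrices): the online part is a plain gain lookup, with no optimization in the loop.

```python
import numpy as np

# Online part of the receding-horizon scheme: a constant static gain.
# System matrices and gamma are illustrative assumptions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
D = np.array([[0.0], [0.1]])
gains, _ = minmax_lqr_gains(A, B, D, np.eye(2), np.eye(1), np.eye(2),
                            gamma=10.0, K_horizon=50)
K0 = gains[0]
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
for _ in range(100):
    u = K0 @ x                                  # u0*(x) = K0 x, no solver call
    x = A @ x + B @ u + D @ rng.uniform(-0.5, 0.5, 1)
```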

15 / 26
Two ways to derive u0∗(x)

Closed-loop solution
▶ Computationally intractable (except for very few cases like the soft-constrained min-max LQR)
▶ Constructs a feedback control u1∗(x1), u2∗(x2), . . . , uK∗(xK)
▶ Optimal: all past information about the disturbance is used at each stage k, as ŵk = D†(xk+1 − Axk − Buk)
▶ Dynamic programming automatically returns the desired control law u0∗(x) from offline computation
▶ Corresponds to infinite-time optimal control in the limit K → ∞

Open-loop solution
▶ Computationally tractable (convex optimization problem) but often infeasible
▶ Constructs an input sequence u0, u1, . . . , uK−1
▶ Conservative: no past information about the disturbance is used, except for the information available at k = 0
▶ The desired control law u0∗(x) is obtained by parametrizing the online optimization problem in x
16 / 26
[Figure: state, input, and disturbance trajectories under three disturbance realizations w1, w2, w3, comparing the nominal (w = 0), closed-loop, and open-loop solutions]
17 / 26
A tradeoff

Parametrize the feedback control law via a set of parameters v

   uk = πk(xk; v)

and solve a feedback MPC problem.

Feedback MPC

   min_{v,x} max_w Σ_{k=0}^{K−1} gk(xk, πk(xk; v)) + gK(xK)

   subject to   xk+1 = f(xk, πk(xk; v), wk)
                x0 = x
                xk ∈ Xk   ∀w ∈ W
                πk(xk; v) ∈ Uk   ∀w ∈ W

We are allowing uk to use current information xk via a policy π!

18 / 26
Examples of policies

Open-loop policy

   v ∈ R^K       uk = πk(xk; v) = vk

Some smart parametrization

   v ∈ R^{KM}    uk = πk(xk; v) = Σ_{m=1}^{M} vk,m θm(xk)

Closed-loop policy

   v ∈ R^∞       uk = πk(xk)    (any function!)
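A small sketch of how such a parametrization looks in code; the three basis functions are illustrative assumptions:

```python
import numpy as np

# Parametrized policy uk = sum_m v[k, m] * theta_m(xk).
def theta(x):
    return np.array([1.0, x, np.tanh(x)])     # M = 3 features (assumed)

def policy(k, x, v):
    """Evaluate pik(x; v) for a parameter array v of shape (K, M)."""
    return v[k] @ theta(x)

v = np.zeros((5, 3))      # K = 5, M = 3: v lives in R^(KM)
v[:, 1] = -1.0            # this choice recovers the proportional law u = -x
print(policy(0, 0.8, v))  # -0.8
```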

19 / 26
Feedback MPC
A policy parametrized in a vector of parameters v:

   vk ∈ R^M      uk = πk(xk; v) = Σ_{m=1}^{M} vk,m θm(xk)

Selecting the right basis functions {θm}m=1,...,M is often complicated.

Example for linear time-invariant systems: affine control law

   min_{v,x} max_w Σ_{k=0}^{K−1} xk⊤Qxk + πk(xk; v)⊤ R πk(xk; v) + xK⊤SxK

   subject to   xk+1 = Axk + Bπk(xk; v) + Dwk
                x0 = x
                xk ∈ Xk   ∀w ∈ W
                πk(xk; v) ∈ Uk   ∀w ∈ W

with the affine control law πk(xk; v) = vk + Lxk

20 / 26
The optimal trajectory is determined by

   xk+1 = Axk + B(vk + Lxk) + Dwk ,      where vk + Lxk = πk(xk; v),

that is

   xk+1 = (A + BL)xk + Bvk + Dwk

▶ The sequence vk is an open-loop optimal policy
▶ The feedback gain L rejects the disturbance wk
  ▶ A + BL Hurwitz
  ▶ design L offline for good disturbance rejection

Note: in case of no disturbance, there is no advantage in this parametrization, as vk can include the term Lxk computed for the nominal system trajectory.
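A minimal cvxpy sketch of this affine feedback MPC on the scalar example from before, with an assumed pre-designed gain L = −1. Since the states are affine in w and the costs and constraints are convex, it suffices to enforce them at the vertex disturbance sequences:

```python
import itertools
import cvxpy as cp

# Affine feedback MPC uk = vk + L xk on the scalar example:
# x+ = x + u + w, |w| <= 0.5, |x| <= 1.  L = -1 and x0 = 0.8 are assumptions.
A, B, D, L, K, x0 = 1.0, 1.0, 1.0, -1.0, 5, 0.8
v = cp.Variable(K)
t = cp.Variable()                       # epigraph of the worst-case cost
constraints = []
for w in itertools.product([-0.5, 0.5], repeat=K):   # 2^K vertex sequences
    x, cost = x0, 0
    for k in range(K):
        u = v[k] + L * x                # affine policy pik(xk; v)
        cost += cp.square(x) + cp.square(u)
        x = (A + B * L) * x + B * v[k] + D * w[k]
        constraints.append(cp.abs(x) <= 1)
    constraints.append(t >= cost)
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()
print(prob.status, prob.value)          # feasible, unlike the open-loop version
```

Unlike the purely open-loop program, this one is feasible: the feedback term Lxk absorbs the accumulated disturbance before it can violate the state constraint.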

21 / 26
Remark: joint optimization of L and v

Consider the MPC-feedback affine control law

uk = vk + Lxk

What if we optimize with respect to both
▶ the open-loop (feedforward) sequence vk
▶ the closed-loop (feedback) gain L?

[Diagram: nested sets of policies]

   open-loop policies: πk(xk) = vk   (L = 0)
   ⊂ πk(xk) = vk + Lxk   (fixed L)
   ⊂ πk(xk) = vk + Lk xk
   ⊂ closed-loop policies: any πk(xk)

A larger set of possible policies (still not any policy πk)

22 / 26
Optimization over disturbance-feedback policies
Consider control policies of the form

   uk = Σ_{i=0}^{k−1} Mki wi + vk .

The optimization problem

   min_{v,M,x} max_w Σ_{k=0}^{K−1} xk⊤Qxk + uk⊤Ruk + xK⊤SxK

   subject to   xk+1 = Axk + Buk + Dwk
                x0 = x
                xk ∈ Xk   ∀w ∈ W
                uk ∈ Uk   ∀w ∈ W
                uk = Σ_{i=0}^{k−1} Mki wi + vk

is convex.
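The same scenario-enumeration sketch, now with the disturbance-feedback parametrization (again on the scalar example, an assumption; the slide states the general LTI case). The key structural ingredient is the strictly lower-triangular M, which encodes causality:

```python
import itertools
import numpy as np
import cvxpy as cp

# Disturbance-feedback policy uk = sum_{i<k} M[k,i] w_i + v_k on the scalar
# example (x+ = x + u + w, |w| <= 0.5, |x| <= 1, x0 = 0.8 assumed).
K, x0 = 5, 0.8
v = cp.Variable(K)
M = cp.Variable((K, K))
t = cp.Variable()
# causality: uk may only use w0 ... w(k-1)  ->  M strictly lower triangular
constraints = [M[k, i] == 0 for k in range(K) for i in range(k, K)]
for w in itertools.product([-0.5, 0.5], repeat=K):
    w = np.array(w)
    x, cost = x0, 0
    for k in range(K):
        u = v[k] + (M[k, :k] @ w[:k] if k > 0 else 0)
        cost += cp.square(x) + cp.square(u)
        x = x + u + w[k]
        constraints.append(cp.abs(x) <= 1)
    constraints.append(t >= cost)
# convex: for each fixed w, x and u are affine in the decision variables (M, v)
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()
```

The number of enumerated constraints grows exponentially with the horizon, which is exactly the computational burden discussed on the next slide.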

23 / 26
   uk = Σ_{i=0}^{k−1} Mki wi + vk .

Notice first that a feedback from the disturbance w (with a unit delay) is equivalent to a feedback from the state, as

   wk = D†(xk+1 − Axk − Buk)

We then have convexity, as the feasible set

   { (M, v) :   xk+1 = Axk + Buk + Dwk
                x0 = x
                xk ∈ Xk   ∀w ∈ W
                uk ∈ Uk   ∀w ∈ W
                uk = Σ_{i=0}^{k−1} Mki wi + vk }

is convex (even for non-convex W).

Computational complexity
Despite being a convex problem, this problem can be computationally very hard, because of the huge number of constraints (proportional to the cardinality of the disturbance set W).
24 / 26
Robust MPC: summary

▶ Standard MPC already rejects disturbances through the feedback nature of the resulting control law u0∗(x)
▶ We may want to do better: min-max and robust performance criteria (cost and constraints)
▶ Simply enforcing worst-case satisfaction of constraints in the open-loop trajectory planning yields very conservative solutions (when not infeasible problems)
▶ Ideally, closed-loop policies should be used in the optimization, but this is only doable in very special cases (min-max LQR)
▶ A compromise consists in parametrizing the control policies in a tractable way
▶ Example: combine a pre-computed linear feedback with an optimized open-loop sequence
▶ The linear feedback and the open-loop sequence can be optimized together via a convex problem

25 / 26
The control engineer flowchart

26 / 26
This work is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License

https://bsaver.io/COCO
