
Distributionally Robust Infinite-horizon Control: from a pool of samples to the design of dependable controllers

Jean-Sébastien Brouillon⋆, Andrea Martin⋆, John Lygeros, Florian Dörfler, and Giancarlo Ferrari-Trecate

arXiv:2312.07324v1 [math.OC] 12 Dec 2023

Jean-Sébastien Brouillon, Andrea Martin, and Giancarlo Ferrari-Trecate are with the Institute of Mechanical Engineering, EPFL, Switzerland. E-mail addresses: {jean-sebastien.brouillon, andrea.martin, giancarlo.ferraritrecate}@epfl.ch.
John Lygeros and Florian Dörfler are with the Department of Information Technology and Electrical Engineering, ETH Zürich, Switzerland. E-mail addresses: {jlygeros, dorfler}@ethz.ch.
Research supported by the Swiss National Science Foundation (SNSF) under the NCCR Automation (grant agreement 51NF40 80545).
⋆ Jean-Sébastien Brouillon and Andrea Martin contributed equally.

Abstract— We study control of constrained linear systems when faced with only partial statistical information about the disturbance processes affecting the system dynamics and the sensor measurements. Specifically, given a finite collection of disturbance realizations, we consider the problem of designing a stabilizing control policy with provable safety and performance guarantees in face of the inevitable mismatch between the true and the empirical distributions. We capture this discrepancy using Wasserstein ambiguity sets, and we formulate a distributionally robust (DR) optimal control problem, which provides guarantees on the expected cost, safety, and stability of the system. To solve this problem, we first present new results for DR optimization of quadratic objectives using convex programming, showing that strong duality holds under mild conditions. Then, by combining our results with the system level parametrization (SLP) of linear feedback policies, we show that the design problem can be reduced to a semidefinite optimization problem (SDP).

I. INTRODUCTION

As modern engineered systems become increasingly complex and interconnected, classical control methods based on stochastic optimization face the challenge of overcoming the lack of a precise statistical description of the uncertainty. In fact, the probability distribution of the uncertainty is generally unknown and only indirectly observable through a finite number of independent samples. In addition, replacing the true distribution with a nominal estimate in the spirit of certainty equivalence often proves unsatisfactory; the optimization process amplifies any statistical error in the distribution inferred from data, resulting in solutions that are prone to yielding poor out-of-sample performance [1]–[3].

Motivated by these observations, the paradigm of distributionally robust optimization (DRO) considers a minimax stochastic optimization problem over a neighborhood of the nominal distribution defined in terms of a distance in the probability space. In this way, the solution becomes robust to the most averse distribution that is sufficiently close to the nominal distribution, while the degree of conservatism of the underlying optimization can be regulated by adjusting the radius of the ambiguity set.

While several alternatives have been proposed to measure the discrepancy between probability distributions, including the Kullback–Leibler divergence and the total variation distance [4], recent literature has shown that working with ambiguity sets defined using the Wasserstein metric [5] offers a number of advantages in terms of expressivity, computational tractability, and statistical out-of-sample guarantees [1]–[3]. Thanks to these properties, Wasserstein DRO has found application in a wide variety of domains, ranging from finance and machine learning to game theory, see, e.g., [2], [3], [6]–[8].

Similarly, Wasserstein ambiguity sets have recently been interfaced with the dynamic environments and continuous action spaces typical of control. In [9], the authors consider a generalization of classical linear quadratic Gaussian (LQG) control, where the noise distributions belong to Wasserstein balls centered at nominal Gaussian distributions. Motivated by the idea of leveraging uncertainty samples for data-driven decision-making under general distributions, a parallel line of research instead considers ambiguity sets centered at nominal empirical distributions. Among other contributions exploiting the greater expressivity provided by this data-driven approach, [10]–[12] consider the design of tube-based predictive control schemes, [13] and [14] address infinite-horizon problems using dynamic programming, and [15] and [16] focus on filtering and state estimation problems. More fundamentally, [17] and [18] provide exact characterizations of how Wasserstein ambiguity sets propagate through the system dynamics, shedding light on the role of feedback in controlling the shape and size of the ambiguity sets resulting from distributional uncertainty.

Despite these advances, it remains unclear how the availability of samples can drive the design of a control policy that guarantees safety and performance in face of distributional uncertainty while simultaneously ensuring stability of the closed-loop system. Motivated by this challenge, we first establish novel strong duality results for DRO of quadratic functions, which are routinely encountered in control, by explicitly accounting for the (possibly bounded) support of the uncertainty. Then, leveraging the system level parametrization (SLP) of linear dynamic controllers [19], we present a convex reformulation of the Distributionally Robust Infinite-horizon Controller (DRInC) synthesis problem, which exploits a finite impulse response (FIR) approximation of the system closed-loop maps. As key advantages, our optimization-based approach guarantees stability of the closed-loop interconnection by design, and only requires one-shot offline computations. As such, our solution bypasses the computational bottleneck that would result by recomputing the optimal control policy online
according to a receding horizon strategy [10]–[12]. In fact, as the complexity of the synthesis problem increases with the number of considered uncertainty samples, solving the policy optimization problem in real time becomes prohibitive whenever the nominal empirical distribution is estimated using a sufficiently large number of uncertainty samples. Further, differently from [13] and [14], which consider infinite-horizon DRC in unconstrained scenarios, our approach naturally extends to include satisfaction of probabilistic safety constraints expressed as distributionally robust conditional value-at-risk (CVaR) constraints. Lastly, the proposed optimization perspective allows us to seamlessly study the partially observed setting, extending the recent results [9], [20], [21] on output-feedback DRC to the infinite-horizon case. As we comment throughout the paper, our formulation encompasses several control problems considered in the literature, providing a unified perspective on stochastic and robust control objectives.

II. PROBLEM STATEMENT

A. System dynamics and uncertainty description

We consider controllable and observable linear dynamical systems described by the state-space equations:

    x_{t+1} = A x_t + B u_t + w_t ,    y_t = C x_t + v_t ,    (1)

where x_t ∈ R^n, u_t ∈ R^m, y_t ∈ R^p, w_t ∈ R^n and v_t ∈ R^p are the system state, the control input, the observable output, and the stochastic disturbances modeling process and measurement noise, respectively. We study infinite-horizon control when only partial statistical information about the distribution of the joint disturbance process ξ_t = (w_t, v_t) is available. Specifically, we assume availability of N ∈ N independent observations ξ_T^{(1)}, …, ξ_T^{(N)}, where each sample

    ξ_T^{(i)} = (w_T^{(i)}, v_T^{(i)}) = (w_0^{(i)}, …, w_T^{(i)}, v_0^{(i)}, …, v_T^{(i)}) ,    (2)

constitutes a trajectory of length T ∈ N of w_t and v_t. As no performance or safety guarantee can be established if the samples in (2) are not representative of the asymptotic statistics of w and v, we start by formulating the following stationarity assumption, see, e.g., [22, p. 154].

Assumption 1: For all t ∈ N, the stochastic process that generates the joint disturbance vector ξ_t = (w_t, v_t) is stationary of order T, i.e., P(ξ_0, …, ξ_T) = P(ξ_t, …, ξ_{t+T}).

We note that Assumption 1 subsumes the usual setting where each realization of the disturbance processes is independent and identically distributed, and more generally allows modeling temporal correlation between samples that are separated by up to T time steps. Further, as the order T can theoretically be arbitrarily large, this assumption is relatively mild, albeit, in practice, an upper bound on the order T is often dictated by computational complexity concerns.

Throughout the paper, we denote by Ξ ⊆ R^d, with d = (n+p)(T+1), the support of the unknown probability distribution P, and we make the following assumption.

Assumption 2: The support set Ξ = {ξ ∈ R^d : Hξ ≤ h} is full-dimensional, that is, Ξ contains a d-dimensional ball with strictly positive radius.

We mainly focus on the case where Ξ is a compact polyhedron. Nevertheless, as we will highlight in the following, our results naturally extend to the most studied case Ξ = R^d.

Remark 1: Reconstructing w_T^{(i)} and v_T^{(i)} online, that is, given the corresponding input and output signals (u_T^{(i)}, y_T^{(i)}) only, is in general not possible. Still, the samples in (2) can be reconstructed from a series of offline experiments conducted in a laboratory environment, where the availability of additional sensors allows measuring the entire state trajectory x_T^{(i)} of the system. Alternatively, if w and v represent the effect of complex physical phenomena, e.g., wind gusts and turbulence, and sensor inaccuracy, respectively, the samples in (2) can also be generated using high-fidelity simulators.

B. Control objectives, policies, and uncertainty propagation

We consider the problem of designing offline a stabilizing feedback policy that retains probabilistic safety and performance guarantees over an infinite horizon. Specifically, given D ⪰ 0, we measure the control cost that a policy u = π(y) incurs whenever the joint disturbance sequence ξ realizes as:

    J(π, ξ) = lim_{T′→∞} (1/T′) Σ_{t=0}^{T′} [x_t⊤ u_t⊤] D [x_t⊤ u_t⊤]⊤ ,

and we define polytopic safe sets X ⊆ R^n and U ⊆ R^m for the system state and input signals, respectively, as:

    X = {x ∈ R^n : g_x(x) = max_{j∈[J_x]} G_{xj}⊤ x + g_{xj} ≤ 0} ,  J_x ∈ N ,
    U = {u ∈ R^m : g_u(u) = max_{j∈[J_u]} G_{uj}⊤ u + g_{uj} ≤ 0} ,  J_u ∈ N ,

where [J_x] denotes the set {1, …, J_x} ⊂ N and similarly for [J_u]. Then, given a safety parameter γ ∈ (0, 1) to control the level of acceptable constraint violations, we formulate the following chance-constrained stochastic optimization problem:

    π⋆ = arg min_π E_P[J(π, ξ)]    (3a)
    subject to CVaR_γ^P(max{g_x(x_t(ξ)), g_u(u_t(ξ))}) ≤ 0 ,    (3b)

where CVaR constraints are defined according to

    CVaR_γ^P(g(ξ)) = inf_{τ∈R} τ + (1/γ) E_P[max{g(ξ) − τ, 0}] ,    (4)

for any measurable function g : R^d → R. We note that, besides implying that P[x_t ∈ X, u_t ∈ U] ≥ 1 − γ, (3b) also accounts for the expected amount of constraint violation in the γ percent of cases where any such violation occurs. As such, the CVaR formulation reflects the observation that, in most control applications, severe breaches of the safety constraints often have far more detrimental consequences than mild violations. As the probability distribution P is fundamentally unknown, however, we cannot address the decision problem (3) directly, and we instead rely on the following approximations.

First, we construct the empirical probability distribution

    P̂ = (1/N) Σ_{i=1}^N δ_{ξ_T^{(i)}} ,    (5)
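For intuition, the variational formula (4) is easy to evaluate on an empirical distribution such as (5): the objective is piecewise linear and convex in τ, so the infimum is attained at one of the sampled losses. A minimal sketch (the function `empirical_cvar` is our own illustration, not part of the paper):

```python
def empirical_cvar(losses, gamma):
    """CVaR of an empirical distribution via the variational formula (4):
    inf_tau tau + (1/gamma) * mean(max(loss - tau, 0)).
    The objective is piecewise linear and convex in tau, with breakpoints
    at the sampled losses, so scanning the samples finds the infimum."""
    n = len(losses)
    return min(
        tau + sum(max(l - tau, 0.0) for l in losses) / (gamma * n)
        for tau in losses
    )

# The worst 50% of the losses {1, 2, 3, 4} averages to 3.5.
assert abs(empirical_cvar([1.0, 2.0, 3.0, 4.0], 0.5) - 3.5) < 1e-9
```

This also illustrates the interpretation given after (4): CVaR averages the losses in the worst γ-fraction of cases, rather than merely bounding their probability.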
where δ_{ξ_T^{(i)}} denotes the Dirac delta distribution at ξ_T^{(i)}. In order to immunize against any error in P̂, we replace the nominal objective (3a) with the minimization of the worst-case expected loss over the set of distributions B_ǫ(P̂) ⊆ P(Ξ) that are supported on Ξ and are sufficiently close to the empirical estimate P̂.¹ More formally, we define

    B_ǫ(P̂) = {Q ∈ P(Ξ) : W(P̂, Q) ≤ ǫ} ,    (6)

where ǫ ≥ 0 is the radius of the ambiguity set B_ǫ(P̂), and W(P̂, Q) is the Wasserstein distance between P̂ and Q, i.e.,

    W(P̂, Q) = inf_{π∈Π} ∫_{Ξ²} ‖ξ − ξ′‖²₂ π(dξ, dξ′) ,    (7)

where Π denotes the set of joint probability distributions of ξ and ξ′ with marginal distributions P̂ and Q, respectively [1], [2]. In (7), the decision variable π encodes a transportation plan for moving a mass distribution described by P̂ to a distribution described by Q. Thus, B_ǫ(P̂) can be interpreted as the set of distributions onto which P̂ can be reshaped at a cost of at most ǫ, where the cost of moving a unit probability mass from ξ to ξ′ is given by ‖ξ − ξ′‖²₂.

¹ It is well-known that solving (3) upon naively replacing P with P̂, that is, setting ǫ to zero in (6), may lead to decisions that are unsafe or exhibit poor out-of-sample performance, as the optimization process often amplifies any estimation error in P̂. Instead, for any β > 0, if P is light-tailed and the radius ǫ is chosen as a sublinearly growing function of log(1/β)/N, then results from measure concentration theory ensure that P lies inside the ambiguity set (6) with confidence 1 − β, see [23, Theorem 2] and [2, Theorem 18]. Therefore, in this case, any solution to (8) retains finite-sample probabilistic guarantees in terms of out-of-sample control cost and constraint satisfaction.

Second, since dynamic programming solutions are generally computationally intractable, we restrict our attention to policies π ∈ Π_L that are linear in the past observations y, that is, u = π(y) = K(z)y for some real-rational proper transfer function K(z). Besides computational advantages, our choice is supported by recent advances in DRC, which show that linear policies are globally optimal for a generalization of the classical unconstrained LQG problem, where the noise distributions belong to a Wasserstein ambiguity set (6) centered at a nominal Gaussian distribution P̂ [9].

We are now in a position to state our problem of interest as:

    inf_{π∈Π_L} sup_{Q∈B_ǫ(P̂)} E_Q[J(π, ξ)]    (8a)
    subject to sup_{Q∈B_ǫ(P̂)} CVaR_γ^Q(g_t(ξ)) ≤ 0 ,  ∀t ∈ N ,    (8b)

where g_t(ξ) = max{g_x(x_t(ξ)), g_u(u_t(ξ))} for compactness. Note that the worst-case distributions in (8a) and (8b) may not coincide. Despite the fact that in practice the uncertainty distribution is unique, the formulation in (8) proves necessary to ensure safety for all distributions in B_ǫ(P̂) and not simply for the one maximizing the expected control cost.

C. Expressivity of the problem formulation and related work

The solution to the DRO problem (8) depends on the radius ǫ defining (6). In particular, we argue that (8) generalizes classical H₂ and H∞ control problems, which correspond to the limit cases of ǫ approaching 0 and ∞, respectively.

If ǫ = 0, the Wasserstein ball B_ǫ(P̂) reduces to the singleton {P̂} and the supremum disappears. This gives a simple Monte-Carlo-based control design problem [24], [25]. Moreover, because J(π, ξ) is quadratic, the resulting optimal controller is the LQG controller designed for P_N = N(E_{ξ∼P̂}[ξ], var_{ξ∼P̂}[ξ]) in the absence of constraints [26]. Indeed, because both the dynamics and the controller are linear, one has²

    E_{P_N}[J(π, ξ)] = E_{P̂}[J(π, ξ)] ,

which means that the arg min_π of both expectations is also the same.

² Both expectations are equal to the same linear transformation of the first and second moments of P̂ and P_N, which are equal.

If ǫ is very large and Ξ is compact, (8) can also be seen as a generalization of H∞ synthesis methods [26], [27]. In fact, in the limit case of ǫ → ∞ and no matter how P̂ is constructed, (6) contains all distributions in P(Ξ) supported on Ξ, including the degenerate distribution taking value at the most-averse ξ almost surely.

Intermediate values of ǫ instead yield solutions that leverage the observations (2) to trade off robustness to adversarial perturbations or distribution shifts against performance under distributions in a neighborhood of P̂.

We conclude this section by remarking that, differently from [9], we do not assume that the nominal distribution P̂ is Gaussian, and instead use the empirical estimate (5) to provide greater design flexibility. In fact, if P is, e.g., bimodal, then the Wasserstein distance between P and its closest Gaussian distribution G will generally be larger than the Wasserstein distance between P and its empirical estimate P̂. In turn, this implies that a larger radius ǫ needs to be used to ensure that P ∈ B_ǫ(G) with high probability, leading to a more conservative design.

III. BACKGROUND

In this section, we recall useful technical preliminaries, and we discuss the design assumptions that will allow us to compute an approximate solution to (8) through convex programming. In particular, we start by reviewing the system level approach to controller synthesis [19], and then present recent duality results from the DRO literature [3].

A. System level synthesis

The system level synthesis framework provides a convex parameterization of the non-convex set of internally stabilizing controllers K(z), allowing one to reformulate many control problems as optimization over the closed-loop responses Φ_xw(z), Φ_xv(z), Φ_uw(z) and Φ_uv(z) that map w and v to x and u. To define these maps, we first combine the linear output feedback policy u = K(z)y with the z-transform of the state dynamics in (1) to obtain:

    (zI − (A + BK(z)C))x = w + BK(z)v .
Then, since the transfer matrix (zI − (A + BK(z)C)) is invertible for any proper controller K(z), we have

    [x; u] = [Φ_xw(z), Φ_xv(z); Φ_uw(z), Φ_uv(z)] [w; v] = Φ_ξ(z) ξ
           = [(zI − (A + BK(z)C))⁻¹, Φ_xw(z)BK(z); K(z)CΦ_xw(z), Φ_uw(z)BK(z) + K(z)] ξ .

In particular, we note that causality of K(z) implies causality of Φ_uv and strict causality of Φ_xw, Φ_xv and Φ_uw. Further, one can show that the affine subspace defined by

    [zI − A, −B] Φ_ξ(z) = [I, 0] ,    (9a)
    Φ_ξ(z) [zI − A; −C] = [I; 0] ,    (9b)

characterizes all and only the system responses Φ_ξ(z) that are achievable by an internally stabilizing controller K(z) [19]. Despite the fact that (9) defines a convex feasible set, minimizing a given convex objective with respect to the closed-loop transfer matrix Φ_ξ(z) = Σ_{k=0}^∞ Φ(k) z^{−k} proves challenging, as the resulting optimization problem remains infinite-dimensional. Therefore, to recover tractability and following [19], [28], we rely on a FIR approximation of Φ_ξ(z), i.e., we restrict our attention to the truncated system response Φ_ξ^T(z) = Σ_{k=0}^T Φ(k) z^{−k}. We remark that controllability and observability of (1) ensure that (9) admits a FIR solution [19, Theorem 4]. At the same time, since Φ_ξ(z) represents a stable map, the effect of this FIR approximation becomes negligible if T is sufficiently large; for the case of LQR regulators, for instance, it was shown that the performance degradation relative to the solution of the infinite-horizon problem decays exponentially with T, see [29, Section 5].

According to the discussed FIR approximation, we let:

    Φ_x = [Φ_xw(T), …, Φ_xw(0), Φ_xv(T), …, Φ_xv(0)] ,
    Φ_u = [Φ_uw(T), …, Φ_uw(0), Φ_uv(T), …, Φ_uv(0)] ,

and we define Φ = [Φ_x⊤, Φ_u⊤]⊤ for compactness. With this notation in place, for any t ≥ T, we have that:

    x_t = Φ_x ξ_{t−T:t} ,    u_t = Φ_u ξ_{t−T:t} ,    (10)

where ξ_{t−T:t} = [w_{t−T}, …, w_t, v_{t−T}, …, v_t]⊤ collects the last T+1 realizations of the process and measurement noises. The following proposition, for which we provide a proof in Appendix A for the sake of comprehensiveness, shows how to implement a controller that achieves a given pair of system responses Φ_x and Φ_u.

Proposition 1: If the closed-loop map Φ is achievable, the corresponding control policy π(Φ) can be implemented as a linear system with dynamics

    δ_t = −Φ_x φ_{t−T:t} ,    u_t = Φ_u φ_{t−T:t} + Φ_uv(0) C δ_t ,    (11)

where φ_{t−T:t} = [δ_{t−T+1}⊤, …, δ_{t−1}⊤, 0_{2n}⊤, y_{t−T}⊤, …, y_t⊤]⊤.

B. A stationary control problem

As we consider an infinite-horizon control problem, we focus on the steady-state behavior of the system, and we are instead less interested in the transient behavior [30]. Motivated by this and to take full advantage of the stationarity properties of ξ_t in Assumption 1, we focus on designing an optimal safe controller to operate the system for t ≥ T only. In this setting, we proceed to show that the distributionally robust worst-case control cost and CVaR constraints admit finite-dimensional representations.

Assumption 3: The system is initialized by an external controller with x_0, …, x_{T−1} ∈ X and u_0, …, u_{T−1} ∈ U.

We therefore redefine the optimization cost J in (8a) as

    J_T(π(Φ), ξ) = lim_{T′→∞} (1/(T′−T)) Σ_{t=T}^{T′} ξ_{t−T:t}⊤ Φ⊤ D Φ ξ_{t−T:t} .

Note that due to the stationarity of Q (see Assumption 1), J_T satisfies

    E_Q J(π(Φ), ξ) = lim_{T′→∞} E_{ξ_{0:T}∼Q, …, ξ_{T′−T:T′}∼Q} (1/(T′−T)) Σ_{t=T}^{T′} ξ_{t−T:t}⊤ Φ⊤ D Φ ξ_{t−T:t}
                   = E_{ξ_T∼Q} ξ_T⊤ Φ⊤ D Φ ξ_T .    (12)

The problem statement (8) for DRInC synthesis can be reformulated as finding the optimal FIR map Φ⋆ of length T+1 given by

    Φ⋆ = arg min_{Φ achievable} sup_{Q∈B_ǫ(P̂)} E_{ξ_T∼Q} ξ_T⊤ Φ⊤ D Φ ξ_T ,    (13)

while satisfying the achievability constraints (9) as well as conditional value-at-risk constraints

    sup_{Q∈B_ǫ(P̂)} CVaR_{1−γ}^{ξ_T∼Q}(G_j⊤ Φ ξ_T + g_j) ≤ 0 ,  ∀j ∈ [J] ,    (14)

where J = J_x + J_u and [J] = {1, …, J} enumerates all the constraints on [x⊤, u⊤], which are defined by

    G = [G_x, 0; 0, G_u] ,    g = [g_x; g_u] .

We highlight that while (8a) is an infimum problem, the minimum in (13) is attained. Indeed, as Ξ is full-dimensional per Assumption 2, there is always a distribution Q̂ such that E_{Q̂} J(π(Φ), ξ) is strongly convex in Φ (e.g., an empirical distribution containing samples that form a basis for R^d). Moreover, since E_{Q̂} J(π(Φ), ξ) ≤ sup_{Q∈B_ǫ(P̂)} E_Q J(π(Φ), ξ) by definition, the supremum in (13) is strongly convex and the minimizer Φ⋆ is attainable. However, the control cost grows quadratically with ξ, which can render sup_{Q∈B_ǫ(P̂)} E_Q[J(π(Φ), ξ)] unattainable [3].³

³ The ratio between the growth rates of the loss function and the transport cost is crucial in DRO problems. If the control cost grows faster than the transport cost, the adversary can make the control cost diverge by moving an infinitesimal amount of mass very far away from the empirical distribution. Conversely, if the control cost grows slower, there is always a point at which it is no longer worth it for the adversary to keep moving mass, and the supremum is attained. This is the case for the constraints, as their cost grows linearly.
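To make the FIR representation (10) concrete, the following toy check (our illustration, not the paper's code) verifies it in the simplest open-loop case u = 0, where the map from disturbances to state has coefficients Φ_xw(k) = A^{k−1}:

```python
import numpy as np

# For x_{t+1} = A x_t + w_t with x_0 = 0 and u = 0, unrolling the
# recursion gives x_T = sum_{k=1}^{T} A^(k-1) w_{T-k}: the state is an
# explicit linear function of the past disturbances, which is exactly
# the structure exploited by (10) with Phi_xw(k) = A^(k-1).
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])  # Schur stable, so the FIR tail decays
T = 20
rng = np.random.default_rng(0)
w = rng.normal(size=(T, 2))

# Simulate the recursion.
x = np.zeros(2)
for t in range(T):
    x = A @ x + w[t]

# Reconstruct x_T directly from the FIR coefficients.
x_fir = sum(np.linalg.matrix_power(A, k - 1) @ w[T - k]
            for k in range(1, T + 1))
assert np.allclose(x, x_fir)
```

With feedback and measurement noise the coefficients are the general blocks Φ_xw(k), Φ_xv(k) of Section III-A rather than powers of A, but the principle is the same: for stable closed loops, truncating the series at lag T incurs only an exponentially small error.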
In what follows, we use the recent advances in DRO theory presented in [3] to reformulate the control design problem as a finite-dimensional and tractable problem.

C. Strong duality for DRO of piecewise linear objectives

The minimization (13) subject to (14) is infinite-dimensional and therefore cannot be directly solved. The next proposition, which serves as a starting point for our derivations in Section IV, shows how DRO of piecewise linear objectives can be recast as a finite-dimensional convex program.

Proposition 2: Let a_j ∈ R^d and b_j ∈ R constitute a piecewise linear cost with J pieces. If Assumption 2 holds and ǫ > 0, then the risk:

    sup_{Q∈B_ǫ} E_{ξ_T∼Q} max_{j∈[J]} a_j⊤ ξ_T + b_j ,    (15)

can be equivalently computed as:

    inf_{λ≥0, κ_ij≥0} λǫ + (1/N) Σ_{i∈[N]} s^{(i)} , subject to    (16a)
    s^{(i)} ≥ b_j + a_j⊤ ξ_T^{(i)} + ‖a_j‖²₂/(4λ)
            + (1/(4λ)) κ_ij⊤ H H⊤ κ_ij − (1/(2λ)) a_j⊤ H⊤ κ_ij + (h − H ξ_T^{(i)})⊤ κ_ij ,    (16b)

for all i = 1, …, N and j = 1, …, J.
Proof: This proposition is a direct consequence of [3, Proposition 2.12]. For the sake of clarity, we report detailed derivations in Appendix C.

Proposition 2 uses strong duality to establish an equivalence between (16) and (15). In particular, the decision variables λ and κ_ij in (16) correspond to the Lagrange multipliers associated with the constraints Q ∈ B_ǫ and ξ_T ∈ Ξ, respectively. The optimal value of λ can thus be interpreted as the shadow cost of robustification, i.e., the amount by which the risk (15) increases for each unit increase of ǫ. The variables s^{(i)} instead represent the empirical Lagrangian for each sample.

IV. MAIN RESULTS

In this section, we present our main results. Motivated by the observation that the operational costs of engineering applications usually relate to energy consumption and are thus often modeled using quadratic functions, we first extend the results of Proposition 2 beyond piecewise linear objectives.

A. Non-convexity challenges

While [3, Proposition 2.12] holds for general transport costs and regardless of whether Ξ is bounded, this strong duality result does not directly apply to (13), as the objective J(π(Φ), ξ) is not piecewise concave. An extension of current state-of-the-art results in DRO is therefore required to minimize a risk of the form

    R(Q) := sup_{Q∈B_ǫ} E_{ξ_T∼Q} ξ_T⊤ Q ξ_T ,    (17)

where Ξ does not necessarily equal R^d and Q ⪰ 0.

We start by observing that if the loss is not concave with respect to ξ_T, then the optimization problem in (17) may not be convex. In fact, while [2] shows that there is a hidden convexity when Ξ = R^d, this result does not hold in general. To illustrate this point, consider for example the situation drawn in Fig. 1. One can observe that if the constraint Q ∈ B_ǫ(δ) is active, then the problem (17) amounts to a Quadratically Constrained Quadratic Program (QCQP), which admits a tight convex relaxation as a Semi-Definite Program (SDP) [31]. Conversely, however, when the constraint Q ∈ B_ǫ(δ) is not active, the adversary must maximize a convex quadratic function, a problem which is not convex.

Fig. 1. Illustration of two worst-case distributions Q ∈ B_ǫ(δ) and Q′ ∈ B_ǫ′(δ) in different Wasserstein balls around the Dirac delta distribution. The support Ξ is represented by the horizontal blue line above the ξ axis, and the left-most Dirac distribution represents a local minimum in B_ǫ′(δ) for the risk R(Q) in (17).

Whether the constraint Q ∈ B_ǫ(δ) is active or not depends on the value taken at the optimum by its Lagrange multiplier λ, which represents the shadow cost of robustification. The following proposition provides a sufficient condition for the constraint to be active by generalizing the example shown in Fig. 1 to R^d.

Proposition 3: Let ∂Ξ = {ξ : max_{k∈[n_H]} H_k ξ − h_k = 0}, where n_H is the number of rows in H, denote the boundary of Ξ. If

    (1/N) Σ_{i∈[N]} min_{ξ̃∈∂Ξ} ‖ξ_T^{(i)} − ξ̃‖²₂ > ǫ ,    (18)

that is, if the average squared distance between the samples and the border ∂Ξ of the support Ξ is strictly greater than ǫ, then the optimal shadow cost of robustification λ⋆ is greater than λ_max(Q) for any Q ∈ R^{d×d}.
Proof: The proof is given in Appendix D.

Proposition 3 shows that λ is contingent on the radius ǫ, the support Ξ, and the realizations ξ^{(i)}. The radius ǫ is usually small, as the samples should approximate the true distribution well enough, which means that the condition (18) is often satisfied. In the next section, we utilize the inequality λ⋆ ≥ λ_max(Q) to propose a strong dual formulation for (17).

B. Tight convex relaxation for DRO of quadratic objectives

In this section, we present a convex upper bound for (17), and prove that it becomes tight if λ is greater than λ_max(Q), the largest eigenvalue of Q.
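Before presenting the bound, it may help to see Wasserstein strong duality at work numerically in the simplest instance covered by Proposition 2: one sample, one linear piece, and unconstrained support Ξ = R^d (H = 0, h = 0). The following sketch (our illustration; all numbers are arbitrary) compares the closed-form worst case with a grid minimization of the dual objective, using the completed square sup_ξ {a⊤ξ − λ‖ξ − ξ̂‖²} = a⊤ξ̂ + ‖a‖²/(4λ):

```python
import numpy as np

# With one atom xi_hat, one piece a^T xi + b, and squared transport
# cost as in (7), the adversary moves all mass a distance sqrt(eps)
# along a, so the primal worst case is a^T xi_hat + b + ||a|| sqrt(eps).
a = np.array([3.0, -4.0])     # ||a|| = 5
b = 1.0
xi_hat = np.array([0.2, 0.1])
eps = 0.04                     # ambiguity-set radius

primal = a @ xi_hat + b + np.linalg.norm(a) * np.sqrt(eps)

# Dual: inf_{lam >= 0} lam*eps + b + a^T xi_hat + ||a||^2 / (4 lam),
# minimized here by brute force over a fine grid of lam values.
lams = np.linspace(0.1, 100.0, 200001)
dual = (lams * eps + b + a @ xi_hat + (a @ a) / (4 * lams)).min()

assert abs(primal - dual) < 1e-3
```

The gap closes because the dual objective is convex in λ with minimizer λ⋆ = ‖a‖/(2√ǫ), illustrating the "shadow cost" interpretation of λ: at the optimum, one extra unit of radius ǫ raises the risk by λ⋆.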
Lemma 4: Let Q ∈ R^{d×d} be a symmetric and positive definite matrix. Under Assumption 2, if ǫ > 0 and if Ξ is bounded, the risk (17) satisfies

    R(Q) ≤ inf_{λ≥0, μ_i≥0, ψ_i≥−μ_i, α≥0} λǫ + (1/N) Σ_{i∈[N]} s^{(i)} ,    (19a)

subject to, ∀i ∈ [N]:

    [ s^{(i)} − h⊤ψ_i + λ‖ξ_T^{(i)}‖²₂ ,  ⋆ ,          ⋆  ;
      2λξ_T^{(i)} + H⊤ψ_i ,               4(λI − Q) ,  ⋆  ;
      H⊤μ_i ,                             0 ,           4Q ] ⪰ 0 ,    (19b)

    [ α ,      ⋆      ;
      H⊤μ_i ,  λI − Q ] ⪰ 0 .    (19c)

Moreover, (19a) holds with equality and (19c) is inactive if the optimum λ⋆ of λ satisfies λ⋆I ≻ Q.
Proof: This result is obtained by taking the limit of (16) when the number J of pieces tends to infinity. The detailed derivations are presented in Appendix E.

We stress that our results continue to hold even if H = 0 and h = 0, that is, if Ξ = R^d. In this case, (19) simplifies substantially.

Corollary 5: Lemma 4 also holds if Ξ = R^d, and (19) simplifies into

    R(Q) = inf_{λ≥0} λǫ + (1/N) Σ_{i∈[N]} s^{(i)} ,    (20a)

    subject to [ s^{(i)} + λ‖ξ_T^{(i)}‖²₂ ,  ⋆      ;
                 λξ_T^{(i)} ,                λI − Q ] ⪰ 0 .    (20b)

Proof: If Ξ = R^d, the problem (13) falls into the assumptions of [2, Theorem 11]. Additionally, we observe that, when H = 0 and h = 0, (20b) has the same Schur complement as (19b), and (19c) is always satisfied.

To understand the effect of having restricted our attention to distributions with bounded support, it is of interest to compare (19) with (20). In both problems, the presence of the term λI − Q in (19b) and (20b) implies that any feasible solution has a shadow cost λ greater than or equal to λ_max(Q). On the other hand, for (20b) to be feasible, λ should be large enough to guarantee s^{(i)} + λ‖ξ_T^{(i)}‖²₂ ≥ 0, whereas the presence of the additional term −h⊤ψ_i in the top-left entry of (19b) softens this requirement, demonstrating the helpful contribution of the bounded support.

C. Convex formulation of DRInC design

Our results of Section IV-B do not directly allow us to solve (13), as (12) shows that Q depends quadratically on Φ and may also be rank-deficient. In this subsection, we mitigate the issues associated with quadratic matrix inequalities by employing a Schur complement, and we address singularity concerns by examining the behavior of the system as Q approaches singularity, showing that this limit remains well-behaved.

Lemma 6: Under Assumption 2, if ǫ > 0 and Ξ is bounded, the optimal closed-loop map Φ⋆ in (13) is given by

    Φ⋆ = arg min_{Φ achievable, Q} lim_{η→0} R(Q + |η|I) ,    (21a)

    subject to [ Q ,         ⋆ ;
                 D^{1/2}Φ ,  I ] ⪰ 0 .    (21b)

Proof: The proof can be found in Appendix F.

We continue our derivations by presenting an equivalent convex reformulation of the safety constraints in (14). In particular, in the next proposition, we embed the function max{· − τ, 0} in (4) as a (J+1)-th constraint.

Lemma 7: Under Assumption 2 and if ǫ > 0, the constraints (14) can be reformulated as the following convex LMIs

    ρǫ + ((γ−1)/γ) τ + (1/N) Σ_{i∈[N]} ζ^{(i)} ≤ 0 ,  ρ ≥ 0 ,    (22a)

    ∀i ∈ [N] , ∀j ∈ [J+1] : κ_ij ≥ 0 ,    (22b)

    [ ζ^{(i)} − (1/γ)(G_j⊤ Φ ξ_T^{(i)} + g_j) − (h − H ξ_T^{(i)})⊤ κ_ij ,  ⋆     ;
      (1/γ) Φ⊤ G_j − H⊤ κ_ij ,                                            4ργ²I ] ⪰ 0 ,    (22c)

where G_{J+1} = 0 and g_{J+1} = τ.
Proof: The proof can be found in Appendix G.

Leveraging Lemmas 4, 6, and 7, we are now ready to reformulate (13) subject to (14) as an SDP.

Theorem 8: Under Assumption 2 and if ǫ > 0, the closed-loop map given by

    Φ⋆ = arg min_{Φ achievable} inf_{Q, s^{(i)}, ζ^{(i)}, τ, λ≥0, ρ≥0, α≥0, μ_i≥0, κ_ij≥0, ψ_i≥−μ_i} λǫ + (1/N) Σ_{i∈[N]} s^{(i)} ,

    subject to (21b), (22a),
               (19b), (19c), ∀i ∈ [N] ,
               (22c), ∀i ∈ [N], j ∈ [J+1] ,

is stable and satisfies the safety constraints (14). Moreover, it optimizes (13) if Ξ is bounded and the optimizer λ⋆ is greater than λ_max(Φ⋆⊤ D Φ⋆).
Proof: We first highlight that Φ is FIR and therefore stable by definition. Second, the safety constraints (14) are equivalent to (22), as shown in Lemma 7. Third, consider a closed-loop map Φ̂, which optimizes the expectation of ξ_T⊤ Φ̂⊤ D Φ̂ ξ_T + |η|‖ξ_T‖²₂ for η ≠ 0. With Q = Φ⊤DΦ + |η|I ≻ 0, Lemma 4 shows that R(Φ⊤DΦ) is tightly upper-bounded by (19). Fourth and finally, as shown in Lemma 6, taking the limit η → 0 yields Φ̂ → Φ⋆ from (13), which concludes the proof.

We remark that the reformulation proposed in Theorem 8 is exact whenever the true shadow cost of robustification λ is greater than or equal to λ_max(Q), a condition which is always satisfied for sufficiently small ǫ as per Proposition 3. When λ is lower than λ_max(Q), the solution computed using Theorem 8 may instead be suboptimal. Nevertheless, our solution retains safety and stability guarantees in face of the
uncertain distribution, since neither (22) nor the achievability [14] K. Kim and I. Yang, “Distributional robustness in minimax linear
constraints depend on λ. quadratic control with Wasserstein distance,” SIAM Journal on Control
and Optimization, vol. 61, no. 2, pp. 458–483, 2023.
[15] V. Krishnan and S. Martı́nez, “A probabilistic framework for moving-
V. C ONCLUSION horizon estimation: Stability and privacy guarantees,” IEEE Transac-
tions on Automatic Control, vol. 66, no. 4, pp. 1817–1824, 2020.
We have presented an end-to-end synthesis method from [16] J.-S. Brouillon, F. Dörfler, and G. Ferrari-Trecate, “Regularization for
a collection of a finite number of disturbance realizations to distributionally robust state estimation and prediction,” arXiv preprint
the design of a stabilizing linear policy with DR safety and arXiv:2304.09921, 2023.
[17] L. Aolaritei, N. Lanzetti, H. Chen, and F. Dörfler, “Distribu-
performance guarantees. Our approach consists in estimating tional uncertainty propagation via optimal transport,” arXiv preprint
an empirical distribution using samples of the uncertainty, arXiv:2205.00343, 2023.
and then computing a feedback policy that safely minimizes [18] L. Aolaritei, N. Lanzetti, and F. Dörfler, “Capture, propagate, and
the worst-case expected cost over all distributions within a control distributional uncertainty,” arXiv preprint arXiv:2304.02235,
2023.
Wasserstein ball around the nominal estimate through the [19] Y.-S. Wang, N. Matni, and J. C. Doyle, “A system-level approach
solution of an SDP. We have shown that, as the radius of to controller synthesis,” IEEE Transactions on Automatic Control,
this ambiguity set varies, our problem statement recovers vol. 64, no. 10, pp. 4079–4093, 2019.
[20] J. Coulson, J. Lygeros, and F. Dörfler, “Distributionally robust chance
classical control formulations. To address the resulting op- constrained data-enabled predictive control,” IEEE Transactions on
timal control problem, we have established a novel tight Automatic Control, vol. 67, no. 7, pp. 3289–3304, 2021.
convex relaxation for DRO of quadratic objectives, and we [21] A. Hakobyan and I. Yang, “Wasserstein distributionally robust control
of partially observable linear systems: Tractable approximation and
have combined our results with the system level synthesis performance guarantee,” in 2022 IEEE 61st Conference on Decision
framework, presenting conditions under which our design and Control (CDC). IEEE, 2022, pp. 4800–4807.
method is non-conservative. [22] K. I. Park, M. Park, and James, Fundamentals of probability and
Future work will validate the effectiveness of our approach stochastic processes with applications to communications. Springer,
2018.
by means of numerical simulations and real-world experi- [23] N. Fournier and A. Guillin, “On the rate of convergence in wasserstein
ments. distance of the empirical measure,” Probability theory and related
fields, vol. 162, no. 3-4, pp. 707–738, 2015.
R EFERENCES [24] T. S. Badings, A. Abate, N. Jansen, D. Parker, H. A. Poonawala, and
M. Stoelinga, “Sampling-based robust control of autonomous systems
[1] P. Mohajerin Esfahani and D. Kuhn, “Data-driven distributionally with non-gaussian noise,” in Proceedings of the AAAI Conference on
robust optimization using the wasserstein metric: Performance guar- Artificial Intelligence, vol. 36, no. 9, 2022, pp. 9669–9678.
antees and tractable reformulations,” Mathematical Programming, vol. [25] L. Blackmore, M. Ono, A. Bektassov, and B. C. Williams, “A proba-
171, no. 1-2, pp. 115–166, 2018. bilistic particle-control approximation of chance-constrained stochastic
[2] D. Kuhn, P. M. Esfahani, V. A. Nguyen, and S. Shafieezadeh-Abadeh, predictive control,” IEEE transactions on Robotics, vol. 26, no. 3, pp.
“Wasserstein distributionally robust optimization: Theory and appli- 502–517, 2010.
cations in machine learning,” in Operations research & management [26] B. Hassibi, A. H. Sayed, and T. Kailath, Indefinite-quadratic estima-
science in the age of analytics. Informs, 2019, pp. 130–166. tion and control: a unified approach to H2 and H∞ theories. SIAM,
[3] S. Shafieezadeh-Abadeh, L. Aolaritei, F. Dörfler, and D. Kuhn, 1999.
“New perspectives on regularization and computation in optimal [27] K. Zhou and J. C. Doyle, Essentials of robust control. Prentice hall
transport-based distributionally robust optimization,” arXiv preprint Upper Saddle River, NJ, 1998, vol. 104.
arXiv:2303.03900, 2023. [28] J. Anderson, J. C. Doyle, S. H. Low, and N. Matni, “System level
[4] A. L. Gibbs and F. E. Su, “On choosing and bounding probability synthesis,” Annual Reviews in Control, vol. 47, pp. 364–393, 2019.
metrics,” International statistical review, vol. 70, no. 3, pp. 419–435, [29] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample
2002. complexity of the linear quadratic regulator,” Foundations of Compu-
[5] C. Villani et al., Optimal transport: old and new. Springer, 2009, tational Mathematics, vol. 20, no. 4, pp. 633–679, 2020.
vol. 338. [30] B. P. Van Parys, D. Kuhn, P. J. Goulart, and M. Morari, “Distri-
[6] Z. Chen, D. Kuhn, and W. Wiesemann, “Data-driven chance con- butionally robust control of constrained stochastic systems,” IEEE
strained programs over wasserstein balls,” Operations Research, 2022. Transactions on Automatic Control, vol. 61, no. 2, pp. 430–442, 2015.
[7] C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. A. Poggio, [31] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge
“Learning with a wasserstein loss,” Advances in neural information university press, 2004.
processing systems, vol. 28, 2015. [32] J. Borwein and A. Lewis, Convex Analysis and Nonlinear Optimiza-
[8] D. O. Adu, T. Başar, and B. Gharesifard, “Optimal transport for a tion: Theory and Examples. Springer New York, 2005.
class of linear quadratic differential games,” IEEE Transactions on [33] R. G. Bartle, The elements of integration and Lebesgue measure. John
Automatic Control, vol. 67, no. 11, pp. 6287–6294, 2022. Wiley & Sons, 2014.
[9] B. Taşkesen, D. A. Iancu, Ç. Koçyiğit, and D. Kuhn, “Distributionally
[34] M. Sion, “On general minimax theorems.” Pacific Journal of Mathe-
robust linear quadratic control,” arXiv preprint arXiv:2305.17037,
matics, vol. 8, no. 4, pp. 171–176, 1958.
2023.
[10] C. Mark and S. Liu, “Stochastic MPC with distributionally robust
chance constraints,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 7136–
7141, 2020. A PPENDIX
[11] M. Fochesato and J. Lygeros, “Data-driven distributionally robust
bounds for stochastic model predictive control,” in 2022 IEEE 61st
Conference on Decision and Control (CDC). IEEE, 2022, pp. 3611– A. SLS controller implementation
3616.
[12] L. Aolaritei, M. Fochesato, J. Lygeros, and F. Dörfler, “Wasser- From [19],
stein tube MPC with exact uncertainty propagation,” arXiv preprint
arXiv:2304.12093, 2023.
[13] I. Yang, “Wasserstein distributionally robust stochastic control: A data- δ = (I − zΦxw (z))δ − Φxv (z)y,
driven approach,” IEEE Transactions on Automatic Control, vol. 66,
no. 8, pp. 3863–3870, 2020. u = zΦuw (z)δ + Φuv (z)y,
which means that at each timestep t ≥ T, one has

δ_t = δ_t − ∑_{k=1}^{T} Φ_xw(k) δ_{t−k+1} − ∑_{k=1}^{T} Φ_xv(k) y_{t−k},
u_t = ∑_{k=1}^{T} Φ_uw(k) δ_{t−k+1} + ∑_{k=0}^{T} Φ_uv(k) y_{t−k}.   (23)

The achievability constraints (24) imply Φ_xw(1) = I and Φ_uw(1) = Φ_uv(0)C (see Appendix B). Hence, (23) can be reformulated as

δ_t = −∑_{k=1}^{T−1} Φ_xw(k+1) δ_{t−k} − ∑_{k=1}^{T} Φ_xv(k) y_{t−k},
u_t = Φ_uv(0)C δ_t + ∑_{k=1}^{T−1} Φ_uw(k+1) δ_{t−k} + ∑_{k=0}^{T} Φ_uv(k) y_{t−k}.

Writing this controller implementation in matrix form and noting that Φ_xv(0) = 0 yields (11).

B. Infinite horizon achievability

Proposition 9: The achievability constraints (9) are equivalent to

[I, 0] Φ [Z^− ⊗ I_n, 0; 0, Z^− ⊗ I_p] = [A, B] Φ [Z^+ ⊗ I_n, 0; 0, Z^+ ⊗ I_p] + [Z^+_{T+1} ⊗ I_n, Z^+_{T+1} ⊗ 0],   (24a)

Φ [Z^− ⊗ I_n; Z^− ⊗ (0C)] = Φ [Z^+ ⊗ A; Z^+ ⊗ C] + [Z^+_{T+1} ⊗ I_n; Z^+_{T+1} ⊗ (0C)],   (24b)

where Z^+ = [I_{T+1}, 0] and Z^− = [0, I_{T+1}] are in R^{(T+1)×(T+2)}, and Z^+_{T+1} is the last row of Z^+.

Proof: By treating Φ_x and Φ_u as FIR filters:

∑_{k=1}^{T} Φ_xw(k) z^{−k+1} − AΦ_xw(k) z^{−k} − BΦ_uw(k) z^{−k} = I,
∑_{k=1}^{T} Φ_xv(k) z^{−k+1} − AΦ_xv(k) z^{−k} − BΦ_uv(k) z^{−k} = BΦ_uv(0),
∑_{k=1}^{T} Φ_xw(k) z^{−k+1} − Φ_xw(k) A z^{−k} − Φ_xv(k) C z^{−k} = I,
∑_{k=1}^{T} Φ_uw(k) z^{−k+1} − Φ_uw(k) A z^{−k} − Φ_uv(k) C z^{−k} = Φ_uv(0)C,

which is equivalent to

Φ_xw(0) = 0, Φ_xv(0) = 0, Φ_uw(0) = 0,   (25a)
Φ_xw(1) = I, Φ_xv(1) = BΦ_uv(0), Φ_uw(1) = Φ_uv(0)C,   (25b)
Φ_xw(k+1) = AΦ_xw(k) + BΦ_uw(k), ∀k = 1, …, T,   (25c)
Φ_xv(k+1) = AΦ_xv(k) + BΦ_uv(k), ∀k = 1, …, T,   (25d)
Φ_xw(k+1) = Φ_xw(k)A + Φ_xv(k)C, ∀k = 1, …, T,   (25e)
Φ_uw(k+1) = Φ_uw(k)A + Φ_uv(k)C, ∀k = 1, …, T,   (25f)
Φ(T+1) = 0.   (25g)

In matrix form, writing Z^− ⊗ I_n as a block superdiagonal shift matrix and Z^+ ⊗ I_n, Z^+ ⊗ A, Z^+ ⊗ C as block matrices whose last block column is zero, the conditions (25a)–(25g) stack exactly into the two identities of Proposition 9: the shifted blocks encode the recursions (25c)–(25f), the identity blocks encode the boundary conditions (25a)–(25b), and the last block column encodes (25g). The matrices can be written in a compact form as (24), which concludes the proof. ∎

C. Proof of Proposition 2

The risk (15) is contingent on three mathematical objects:
1) a loss function max_{j∈[J]} ℓ_j(ξ_T) = max_{j∈[J]} a_j^⊤ ξ_T + b_j,
2) a transport cost c(ξ_T^(i), ξ_T) = ‖ξ_T − ξ_T^(i)‖₂²,
3) and a support Ξ = {ξ : max_{k∈[n_H]} f_k(ξ) ≤ 0}, where n_H is the number of rows in H and f_k(ξ) = H_k ξ − h_k.

Moreover, since the loss is concave and both the transport cost and the support are convex, (15) shows strong duality
properties if and only if it is strictly feasible. The strict feasibility is guaranteed by the full-dimensionality of Ξ and the strict positivity of ε. The dual problem is given by [3] as

inf_{λ≥0} λε + (1/N) ∑_{i∈[N]} s^(i),
subject to sup_{ξ_T∈Ξ} ℓ(ξ_T) − λ c(ξ_T^(i), ξ_T) ≤ s^(i), ∀i ∈ [N].

While the dual problem does not seem much simpler to solve than the primal at first glance, we use [3, Proposition 2.12] to reformulate it using convex conjugates. In our own notation, this gives

inf_{λ≥0, κ_ijk≥0} λε + (1/N) ∑_{i∈[N]} s^(i), subject to, ∀i ∈ [N], ∀j ∈ [J]:
s^(i) ≥ (−ℓ_j)⋆(ζ^ℓ_ij) + λ c⋆(ζ^c_ij/λ, ξ̂^(i)) + ∑_{k∈[n_H]} κ_ijk f_k⋆(ζ^f_ijk/κ_ijk),
ζ^ℓ_ij + ζ^c_ij + ∑_{k∈[n_H]} ζ^f_ijk = 0,   (26)

where (−ℓ_j)⋆ is the convex conjugate of the opposite of ℓ_j, c⋆ is the convex conjugate of the transport cost c with respect to the first argument, and f_k⋆ is the convex conjugate of f_k. Note that the case where λ = 0 is also well defined in [3] despite the division. All three functions are either linear or quadratic, so their conjugates are well known [32]. Both −ℓ_j and f_k are linear, so their convex conjugates are b_j and h_k if the conjugates' arguments are equal to −a_j and H_k, respectively, and infinite otherwise. The conjugate of the transport cost is given by c⋆(ζ, ξ_T) = (1/4) ζ^⊤ζ − ζ^⊤ξ_T.

In order to minimize (26), one must avoid infinite costs, which adds constraints on λ, ζ^ℓ_ij, and ζ^f_ijk. This means that (26) is equivalent to

inf_{λ≥0, κ_ijk≥0} λε + (1/N) ∑_{i∈[N]} s^(i) ≤ 0,   (27a)
subject to, ∀i ∈ [N], ∀j ∈ [J]:
s^(i) ≥ b_j + (1/4λ) (ζ^c_ij)^⊤ ζ^c_ij − (ζ^c_ij)^⊤ ξ_T^(i) + ∑_{k∈[n_H]} κ_ijk h_k,   (27b)
ζ^ℓ_ij + ζ^c_ij + ∑_{k∈[n_H]} ζ^f_ijk = 0,  ζ^ℓ_ij = −a_j,  ζ^f_ijk = κ_ijk H_k.   (27c)

To conclude the proof, we stack κ_ijk for all k ∈ [n_H] into a vector κ_ij and plug the equality constraints (27c) into (27b) to obtain (16). ∎

D. Proof of Proposition 3

We first note that when Ξ is bounded, as mass cannot be moved infinitely far away, the supremum of (17) is attained. This means that Q⋆ = arg max_{Q∈B_ε} E_{ξ_T∼Q} ξ_T^⊤ Q ξ_T exists. Second, the limited average squared distance between the samples and the border of Ξ implies that no distribution in B_ε(P̂) has mass only at the border of Ξ, as the transport cost would be greater than ε. This means that there exists a δ > 0 such that Q⋆ has an amount δ of mass more than √δ away from the boundary of Ξ. Third and finally, let w_max(Q) be an eigenvector of Q associated with λ_max(Q). The distribution Q⋆ satisfies

dR(Q)/dε = lim_{dε→0⁺} (1/dε) sup_{Q′∈B_dε(Q⋆)} E_{ξ′_T∼Q′, ξ_T∼Q⋆} [ξ′_T^⊤ Q ξ′_T − ξ_T^⊤ Q ξ_T]
= lim_{dε→0⁺} (1/dε) sup_{Q′∈B_dε(Q⋆)} E_{ξ′_T∼Q′, ξ_T∼Q⋆} [(ξ′_T − ξ_T)^⊤ Q (ξ′_T − ξ_T) + 2(ξ′_T − ξ_T)^⊤ Q ξ_T]
≥ lim_{dε→0⁺} (1/dε) max_{δ‖dξ‖₂² ≤ dε} δ λ_max(Q) ‖dξ‖₂²,   (28)

because moving δ of Q⋆'s mass by ‖dξ‖ ≤ √(δ^{−1} dε) in the direction of ±w_max(Q) to obtain Q′ remains in B_dε(Q⋆) if dε ≤ δ², which is true at the limit dε → 0⁺. Hence, the inequality (28) implies that

λ⋆ = dR(Q)/dε ≥ lim_{dε→0⁺} (1/dε) λ_max(Q) dε = λ_max(Q),

which concludes the proof. ∎

E. Proof of Lemma 4

In order to prove Lemma 4, we first need the following proposition.

Proposition 10: Under the assumptions of Lemma 4, the risk R(Q) defined in (17) satisfies

R(Q) ≤ inf_{λ≥0, κ_i≥0} λε + (1/N) ∑_{i∈[N]} s^(i), subject to   (29a)
s^(i) ≥ max_{ξ̄∈Ξ} −ξ̄^⊤(Q − λ^{−1}Q²)ξ̄ − 2ξ̄^⊤Qξ_T^(i) + (1/4λ) κ_i^⊤ H H^⊤ κ_i − (1/λ) ξ̄^⊤ Q H^⊤ κ_i + (Hξ_T^(i) + h)^⊤ κ_i,   (29b)

for all i = 1, …, N. Moreover, (29a) holds with equality if the optimum λ⋆ of λ satisfies λ⋆I ⪰ Q.

Proof: The proof starts by linking the formulation (15) for piece-wise affine costs to R(Q). To do so, we approximate the quadratic cost using its tangents at each point of a d-dimensional grid G_J ⊆ Ξ, composed of J points. Because the approximation gets closer with more points, this yields

ξ_T^⊤ Q ξ_T = lim_{J→∞} max_{j∈[J]} 2ξ_T^⊤ Q ξ_j − ξ_j^⊤ Q ξ_j,

where ξ_j is the j-th element of G_J. In order to obtain a formulation that fits (15), one must show that the limit operator commutes with the supremum and the expectation. We show the commutation of the limit with the dominated convergence theorem [33] by finding bounds on the piece-wise affine approximation error

Δ_J = ξ_T^⊤ Q ξ_T − max_{j∈[J]} (2ξ_T^⊤ Q ξ_j − ξ_j^⊤ Q ξ_j)
    = ξ_T^⊤ Q ξ_T + min_{j∈[J]} (ξ_j^⊤ Q ξ_j − 2ξ_T^⊤ Q ξ_j)
    = min_{j∈[J]} (ξ_T − ξ_j)^⊤ Q (ξ_T − ξ_j).
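As a quick numerical sanity check of this tangent construction, the following sketch (with an arbitrary positive definite Q and a uniform grid, purely illustrative data rather than anything from the paper) verifies that the tangent planes never exceed the quadratic and that the error is bounded by λ_max(Q) times the squared distance to the nearest grid point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance (not data from the paper): a random positive definite Q
# and a uniform grid G_J inside the box [-1, 1]^2 playing the role of Xi.
d = 2
A = rng.standard_normal((d, d))
Q = A.T @ A + np.eye(d)
lam_max = np.linalg.eigvalsh(Q)[-1]

grid = np.array([[x, y] for x in np.linspace(-1, 1, 9) for y in np.linspace(-1, 1, 9)])
quad_grid = np.einsum("jk,kl,jl->j", grid, Q, grid)   # xi_j^T Q xi_j for every grid point

for xi in rng.uniform(-1, 1, size=(200, d)):
    quad = xi @ Q @ xi
    tangents = 2 * grid @ Q @ xi - quad_grid          # tangent planes evaluated at xi
    delta = quad - tangents.max()                     # approximation error Delta_J
    dist2 = ((grid - xi) ** 2).sum(axis=1).min()      # min_j ||xi - xi_j||_2^2
    assert delta >= -1e-9                             # tangents never exceed the quadratic
    assert delta <= lam_max * dist2 + 1e-9            # Delta_J <= lambda_max(Q) * dist^2
print("tangent bounds verified on 200 random points")
```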
Note that Δ_J ≥ 0 because the tangents of a quadratic function are always below the curve. Moreover, the inequality Δ_J ≤ λ_max(Q) min_{j∈[J]} ‖ξ_T − ξ_j‖₂² is satisfied by definition. Furthermore, the distance min_{j∈[J]} ‖ξ_T − ξ_j‖₂² between any ξ_T ∈ Ξ and the closest point of the grid G_J ⊆ Ξ can be bounded as

min_{j∈[J]} ‖ξ_T − ξ_j‖₂² ≤ 2 r(Ξ) √d J^{−1/d}, ∀ξ_T ∈ Ξ,

where r(Ξ) < ∞ is the radius of a ball containing Ξ, which is finite because Ξ is bounded. This gives the following inequality

ξ_T^⊤ Q ξ_T − Δ_Q J^{−1/d} ≤ max_{j∈[J]} 2ξ_T^⊤ Q ξ_j − ξ_j^⊤ Q ξ_j ≤ ξ_T^⊤ Q ξ_T,

where Δ_Q = 2 r(Ξ) √d λ_max(Q). Finally, if all points of a function satisfy an inequality, its supremum must satisfy it as well, hence

R(Q) − Δ_Q J^{−1/d} ≤ sup_{Q∈B_ε} E_{ξ_T∼Q} max_{j∈[J]} 2ξ_T^⊤ Q ξ_j − ξ_j^⊤ Q ξ_j ≤ R(Q).

The limit lim_{J→∞} R(Q) − Δ_Q J^{−1/d} is equal to R(Q). Therefore, the supremum of the piece-wise linear approximation is squeezed into the equality

lim_{J→∞} sup_{Q∈B_ε} E_{ξ_T∼Q} max_{j∈[J]} 2ξ_T^⊤ Q ξ_j − ξ_j^⊤ Q ξ_j = R(Q).

The second part of the proof aims at bringing the limit back into the problem and evaluating it. Using the previous result and Proposition 2 with a_j = 2Qξ_j and b_j = −ξ_j^⊤ Q ξ_j, we know that R(Q) as defined in (17) is equal to

lim_{J→∞} inf_{λ≥0, κ_ij≥0} λε + (1/N) ∑_{i∈[N]} s^(i),
subject to s^(i) ≥ f(ξ_j, κ_ij, λ), ∀i ∈ [N], ∀j ∈ [J],

where

f(ξ, κ, λ) = −ξ^⊤Qξ + (1/λ) ξ^⊤Q²ξ − 2ξ^⊤Qξ_T^(i) + (1/4λ) κ^⊤HH^⊤κ − (1/λ) ξ^⊤QH^⊤κ + (Hξ_T^(i) + h)^⊤κ.

Since there are only existence constraints for κ_ij, one can equivalently write

lim_{J→∞} inf_{λ≥0} λε + (1/N) ∑_{i∈[N]} s^(i),
subject to s^(i) ≥ min_{κ_ij≥0} f(ξ_j, κ_ij, λ), ∀i ∈ [N], ∀j ∈ [J].

The constraint holding for all j means that there are infinitely many constraints to satisfy. However, one can collapse all the constraints for a given i into

s^(i) ≥ max_{j∈[J]} min_{κ_ij≥0} f(ξ_j, κ_ij, λ), ∀i ∈ [N].

Interestingly, the cost does not depend on J. This means that the limit can be moved into the constraint as

inf_{λ≥0} λε + (1/N) ∑_{i∈[N]} s^(i),
subject to s^(i) ≥ lim_{J→∞} max_{j∈[J]} min_{κ_ij≥0} f(ξ_j, κ_ij, λ), ∀i ∈ [N].

Due to the boundedness of Ξ, the grid G_J fills the entire set when J tends to ∞. Hence, R(Q) is equal to

inf_{λ≥0} λε + (1/N) ∑_{i∈[N]} s^(i),
subject to s^(i) ≥ max_{ξ̄∈Ξ} min_{κ_i≥0} f(ξ̄, κ_i, λ), ∀i ∈ [N].   (30a)

In general, one has max_{ξ̄∈Ξ} min_{κ_i≥0} f(ξ̄, κ_i, λ) ≤ min_{κ_i≥0} max_{ξ̄∈Ξ} f(ξ̄, κ_i, λ). This means that (29b) is a stricter constraint than (30a), yielding a larger infimum. Nevertheless, if f is not only convex in κ but also concave in ξ, then Sion's minimax theorem proves that the max and min operators commute [34, Corollary 3.3]. This means that if Q − λ^{−1}Q² ⪰ 0, (29b) and (30a) are equivalent, which concludes the proof. ∎

Using Proposition 10, we are now ready to prove Lemma 4 by dualizing (29b) to remove the max operator, and by using Schur's complement to obtain linear inequalities. We start by highlighting that (29b) contains the maximization of the quadratic cost

−ξ̄^⊤(Q − λ^{−1}Q²)ξ̄ [quadratic] − ξ̄^⊤(2Qξ_T^(i) + λ^{−1}QH^⊤κ_i) [linear] + (1/4λ) κ_i^⊤HH^⊤κ_i + (Hξ_T^(i) + h)^⊤κ_i [constant],

subject to convex polytopic constraints Hξ̄ − h ≤ 0. The dual problem is therefore given by [31] as

min_{µ_i≥0} −µ_i^⊤h + (1/4λ) κ_i^⊤HH^⊤κ_i + (Hξ_T^(i) + h)^⊤κ_i + (1/4) ‖H^⊤µ_i − (1/λ)(HQ)^⊤κ_i − 2Qξ_T^(i)‖²_{Q₂},   (31a)
subject to P_λ (H^⊤µ_i − (1/λ)(HQ)^⊤κ_i − 2Qξ_T^(i)) = 0,   (31b)

where the two middle terms of (31a) are denoted (♣), ‖·‖²_{Q₂} = ·^⊤Q₂·, Q₂ = (Q − λ^{−1}Q²)†, and P_λ is the projection on null(Q₂†) = null(λI − Q). Note that P_λ = I − (λI − Q)†(λI − Q) is symmetric, commutes with Q and Q^{−1}, and is equal to both its square and pseudo-inverse. Since we are looking for an upper bound for R(Q) when λ ≤ λ_max(Q), we can replace (31b) by the stricter constraint

P_λ H^⊤µ_i = 0,  P_λ (2λξ_T^(i) + H^⊤κ_i) = 0,   (32)

as it leads to a larger minimum if P_λ ≠ 0 and as it is equivalent for any λ > λ_max(Q) because P_λ = 0. Moreover, the last term of (31a) can be split as

(1/4) ‖2Qξ_T^(i) + (1/λ)(HQ)^⊤κ_i‖²_{Q₂} − (1/2) (2Qξ_T^(i) + (1/λ)(HQ)^⊤κ_i)^⊤ Q₂ H^⊤µ_i + (1/4) µ_i^⊤ H Q₂ H^⊤ µ_i,

or equivalently,

(1/4) (2λξ_T^(i) + H^⊤κ_i)^⊤ (λ²Q^{−1} − λI)† (2λξ_T^(i) + H^⊤κ_i)   (33a)
− (1/2) (2λξ_T^(i) + H^⊤κ_i)^⊤ (λI − Q)† H^⊤µ_i   (33b)
+ (1/4) µ_i^⊤ H Q₂ H^⊤ µ_i.   (33c)

In order to obtain some simplifications, we use the following Woodbury-like identities:

(λ²Q^{−1} − λI)† = (1/λ²) (Q^{−1} − (1/λ)I)†
                 = (λI − Q)† − (1/λ)(I − P_λ),   (34a)

Q₂ = λQ^{−1}(λI − Q)†   (34b)
   = (λI − Q)† + Q^{−1}(I − P_λ)
   = (λI − Q)† + Q^{−1} − P_λ Q^{−1} P_λ.   (34c)

We plug (34a), (34b), and (34c) into (33a), (33b), and (33c), respectively, which gives

(33) = (1/4) ‖2λξ_T^(i) + H^⊤κ_i‖²_{(λI−Q)†} − (1/4λ) ‖2λξ_T^(i) + H^⊤κ_i‖₂² [♠] + (1/4λ) ‖2λξ_T^(i) + H^⊤κ_i‖²_{P_λ} [⋆]   (35a)
− (1/2) (H^⊤κ_i + 2λξ_T^(i))^⊤ (λI − Q)† H^⊤µ_i   (35b)
+ (1/4) ‖H^⊤µ_i‖²_{(λI−Q)†} + (1/4) ‖H^⊤µ_i‖²_{Q^{−1}} [] − (1/4) ‖P_λ H^⊤µ_i‖²_{Q^{−1}} [⋆].   (35c)

The terms (♣) in (31a) can be grouped by completing the squares as

(1/4λ) ‖2λξ_T^(i) + H^⊤κ_i‖₂² [♠] − λ‖ξ_T^(i)‖₂² + κ_i^⊤h.   (36)

We remark that the terms marked by [♠] in (36) and (35a) cancel out, and that the terms marked by [⋆] in (35) can be factorized as

(1/4) (2λξ_T^(i) + H^⊤κ_i + H^⊤µ_i)^⊤ P_λ (2ξ_T^(i) + (1/λ)H^⊤κ_i − Q^{−1}H^⊤µ_i),   (37)

because the cross terms are in the null space of (λI − Q). The constraint (31b) implies that (37) is zero, so the terms marked by [⋆] in (35) cancel out. Finally, all remaining terms besides [] in (35c) can be factorized. Hence, the dual problem (31a) is equal to

min_{µ_i≥0} h^⊤(κ_i − µ_i) + (1/4) ‖H^⊤µ_i‖²_{Q^{−1}} − λ‖ξ_T^(i)‖₂² + (1/4) ‖2λξ_T^(i) + H^⊤(κ_i − µ_i)‖²_{(λI−Q)†}.   (38)

In general, the right-hand side of (29b) is smaller than (38), which means that s^(i) ≥ (38) implies (29b). Moreover, if λI − Q ⪰ 0, the problem (38) is a convex and strictly feasible QP. Strong duality therefore shows that the right-hand side of (29b) is equal to (38) in this case. Finally, we replace the upper bound on a minimum by an existence constraint and perform the change of variable ψ_i = κ_i − µ_i to rewrite (29b) as

s^(i) ≥ h^⊤ψ_i − λ‖ξ_T^(i)‖₂² + (1/4) µ_i^⊤ H Q^{−1} H^⊤ µ_i + (1/4) (2λξ_T^(i) + H^⊤ψ_i)^⊤ (λI − Q)† (2λξ_T^(i) + H^⊤ψ_i).

Applying Schur's lemma to the last two terms and using (32), we obtain

R(Q) ≤ inf_{λ≥0, µ_i≥0, ψ_i≥−µ_i} λε + (1/N) ∑_{i∈[N]} s^(i),
subject to, ∀i ∈ [N]:
P_λ H^⊤µ_i = 0,
[ s^(i) − h^⊤ψ_i + λ‖ξ_T^(i)‖₂² + (1/4λ)‖2λξ_T^(i) + H^⊤κ_i‖²_{P_λ},  ⋆,  ⋆ ;
  2λξ_T^(i) + H^⊤ψ_i,  4(λI − Q),  ⋆ ;
  H^⊤µ_i,  0,  4Q ] ⪰ 0,

where the equality holds when λI − Q ⪰ 0. We highlight that P_λ = 0 if λI − Q ≻ 0. Moreover, P_λ(2λξ_T^(i) + H^⊤κ_i) = 0 because both P_λH^⊤µ_i and P_λ(2λξ_T^(i) + H^⊤ψ_i) are zero. Finally, the constraint P_λH^⊤µ_i = 0 can be enforced as an LMI using Schur's complement of α − µ_i^⊤H(λI − Q)†H^⊤µ_i with an arbitrarily large α, which concludes the proof. ∎

F. Proof of Lemma 6

The proof is conducted in three parts. First, we rewrite the quadratic form Φ^⊤DΦ as a matrix Q to obtain linear constraints. Second, we analyze the suboptimality when Q ≻ 0 and show that it vanishes when Q → Φ^⊤DΦ. Third and finally, we rewrite all the constraints as LMIs.

We start by showing that

R(Φ^⊤DΦ) = min_{Q⪰Φ^⊤DΦ} R(Q).   (39)

Recall the definition

R(Q) := sup_{Q∈B_ε} E_{ξ_T∼Q} ξ_T^⊤ Q ξ_T,

and note that for any ξ_T ∈ Ξ, if Q ⪰ Φ^⊤DΦ the following inequality holds

ξ_T^⊤ Q ξ_T ≥ ξ_T^⊤ Φ^⊤DΦ ξ_T.

Hence, because probability distributions are non-negative and integrals preserve the order, one has

E_{ξ_T∼Q}[ξ_T^⊤ Q ξ_T] ≥ E_{ξ_T∼Q}[ξ_T^⊤ Φ^⊤DΦ ξ_T],

for any probability distribution Q and therefore also for the worst one. Hence, Q ⪰ Φ^⊤DΦ implies that R(Q) ≥ R(Φ^⊤DΦ). Moreover, the equality is attained because Φ^⊤DΦ ∈ arg min_{Q⪰Φ^⊤DΦ} R(Q).
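The Woodbury-like pseudo-inverse identities (34a) and (34b) used in the proof of Lemma 4 can be spot-checked numerically in the simple regime λ > λ_max(Q), where λI − Q is invertible and P_λ = 0 (a sketch with a randomly generated Q; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative instance: positive definite Q and a shadow cost lam > lambda_max(Q),
# so that lam*I - Q is invertible and the projector P_lam is zero.
n = 4
B = rng.standard_normal((n, n))
Q = B.T @ B + np.eye(n)
lam = np.linalg.eigvalsh(Q)[-1] + 2.0
I = np.eye(n)

# (34a): (lam^2 Q^{-1} - lam I)^+ = (lam I - Q)^+ - (1/lam)(I - P_lam), with P_lam = 0 here.
lhs = np.linalg.pinv(lam**2 * np.linalg.inv(Q) - lam * I)
rhs = np.linalg.inv(lam * I - Q) - I / lam
assert np.allclose(lhs, rhs, atol=1e-6)

# (34b): Q2 = (Q - lam^{-1} Q^2)^+ = lam Q^{-1} (lam I - Q)^{-1}.
Q2 = np.linalg.pinv(Q - Q @ Q / lam)
assert np.allclose(Q2, lam * np.linalg.inv(Q) @ np.linalg.inv(lam * I - Q), atol=1e-6)
print("pseudo-inverse identities verified")
```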
The proof continues by showing

R(Φ^⊤DΦ) = min_{Q⪰Φ^⊤DΦ} lim_{η→0} R(Q + |η|I).   (40)

Note that R(Q) = R(lim_{η→0} Q + |η|I), where one can take the limit out of the risk using the inequality

R(Q) + |η| max_{ξ_T∈Ξ} ‖ξ_T‖₂² ≥ R(Q + |η|I) ≥ R(Q),   (41)

which holds if Ξ is bounded. This means that the limit for η → 0 is squeezed between two values that tend towards R(Q).
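The mechanism behind the squeeze (41) can be illustrated for a single fixed distribution in the ambiguity set, here the empirical distribution of random samples from a box (a hedged sketch: the supremum over the Wasserstein ball is not computed, only the pointwise bounds that imply it):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative instance: samples from a bounded set (a box), a PSD matrix Q,
# and a perturbation eta. For any fixed distribution, the expectation satisfies
# E[xi^T Q xi] <= E[xi^T (Q + |eta| I) xi] <= E[xi^T Q xi] + |eta| * max ||xi||^2,
# which, taken over the whole ambiguity set, yields the squeeze (41).
n, N, eta = 3, 500, 0.05
M = rng.standard_normal((n, n))
Q = M.T @ M
xis = rng.uniform(-1.0, 1.0, size=(N, n))

risk_Q = np.mean(np.einsum("ij,jk,ik->i", xis, Q, xis))
risk_Qeta = np.mean(np.einsum("ij,jk,ik->i", xis, Q + abs(eta) * np.eye(n), xis))
max_norm2 = (xis ** 2).sum(axis=1).max()

assert risk_Q <= risk_Qeta <= risk_Q + abs(eta) * max_norm2 + 1e-12
print("squeeze bounds hold for the empirical distribution")
```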
We finish the proof by expressing Q ⪰ Φ^⊤DΦ as a Schur complement. This yields

[ Q − ηI,  Φ^⊤D^{1/2} ;  D^{1/2}Φ,  αI ] ⪰ 0.   (42)

Combining (40) and (42) yields (21), which concludes the proof. ∎
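The Schur-complement step (42) can be checked numerically in the special case α = 1 (an assumption made here purely for illustration): the block matrix is then PSD exactly when Q − ηI ⪰ Φ^⊤DΦ. A minimal sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative instance: with the bottom-right block fixed to I (alpha = 1, an
# assumption for this sketch), [[Q - eta*I, Phi^T D^{1/2}], [D^{1/2} Phi, I]] >= 0
# is equivalent to Q - eta*I >= Phi^T D Phi by the Schur complement.
m, n, eta = 3, 4, 0.1
Phi = rng.standard_normal((m, n))
D = np.diag(rng.uniform(0.5, 2.0, size=m))
Dh = np.sqrt(D)                                           # D^{1/2} (D is diagonal)

Q = Phi.T @ D @ Phi + eta * np.eye(n) + 0.3 * np.eye(n)   # so Q - eta*I - Phi^T D Phi > 0
lmi = np.vstack([
    np.hstack([Q - eta * np.eye(n), Phi.T @ Dh]),
    np.hstack([Dh @ Phi, np.eye(m)]),
])

assert np.linalg.eigvalsh(lmi)[0] >= -1e-9                # the block matrix is PSD
assert np.linalg.eigvalsh(Q - eta * np.eye(n) - Phi.T @ D @ Phi)[0] >= -1e-9
print("Schur complement equivalence illustrated")
```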
G. Proof of Lemma 7

Lemma 7 is a direct consequence of applying Proposition 2 to the definition (4). Indeed, with G_{J+1} = 0 and g_{J+1} = τ, one can rewrite (14) as (15) by setting a_j = γ^{−1}Φ^⊤G_j and b_j = γ^{−1}(g_j − τ + γτ). This means that (14) is equivalent to

inf_{ρ≥0, κ_ij≥0} ρε + (1/N) ∑_{i∈[N]} s^(i) ≤ 0,
subject to, ∀i ∈ [N], ∀j ∈ [J+1]:
s^(i) ≥ (1/γ)(g_j − τ + γτ) − (1/γ) G_j^⊤Φξ_T^(i) + (Hξ_T^(i) + h)^⊤κ_ij + (1/4ρ)‖H^⊤κ_ij‖₂² − (1/(2ργ)) G_j^⊤ΦH^⊤κ_ij + (1/(4ργ²))‖Φ^⊤G_j‖₂².

One can factorize the last three terms of the constraint and do the change of variable ζ^(i) = s^(i) + γ^{−1}τ − τ, which gives

inf_{ρ≥0, κ_ij≥0} ρε − τ + (1/γ)τ + (1/N) ∑_{i∈[N]} ζ^(i) ≤ 0,   (43a)
subject to, ∀i ∈ [N], ∀j ∈ [J+1]:
ζ^(i) ≥ (1/γ) g_j − (1/γ) G_j^⊤Φξ_T^(i) + (Hξ_T^(i) + h)^⊤κ_ij + (1/(4ργ²)) (Φ^⊤G_j − γH^⊤κ_ij)^⊤ (Φ^⊤G_j − γH^⊤κ_ij).   (43b)

Finally, a zero upper-bound constraint on an infimum is equivalent to an existence constraint. Moreover, because ρ ≥ 0, (43b) can be written as an LMI using Schur's complement, which concludes the proof. ∎
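The completing-the-square step used to obtain (43b), grouping the last three terms of the dual constraint into a single quadratic, is a purely algebraic identity that can be verified numerically (a sketch with random, illustrative data; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Checks: ||H^T k||^2/(4 rho) - (G^T Phi H^T k)/(2 rho gamma) + ||Phi^T G||^2/(4 rho gamma^2)
#      == ||Phi^T G - gamma H^T k||^2 / (4 rho gamma^2).
n, m, rho, gamma = 5, 3, 0.7, 1.3
Phi = rng.standard_normal((m, n))
G = rng.standard_normal(m)          # one column vector G_j of the safety constraints
H = rng.standard_normal((4, n))
k = rng.uniform(0, 1, size=4)       # multipliers kappa_ij >= 0

Htk = H.T @ k
lhs = (Htk @ Htk) / (4 * rho) - (G @ Phi @ Htk) / (2 * rho * gamma) \
      + (Phi.T @ G) @ (Phi.T @ G) / (4 * rho * gamma**2)
rhs = np.sum((Phi.T @ G - gamma * Htk) ** 2) / (4 * rho * gamma**2)
assert np.isclose(lhs, rhs)
print("completing-the-square identity verified")
```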
