Math. Program., Ser. A (2017) 161:1–32
DOI 10.1007/s10107-016-0998-2

Relative entropy optimization and its applications

Venkat Chandrasekaran · Parikshit Shah

Received: 14 January 2015 / Accepted: 22 February 2016 / Published online: 1 April 2016
© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2016

Venkat Chandrasekaran
[email protected]
Parikshit Shah
[email protected]
Formally, relative entropy programs (REPs) are conic optimization problems in which
a linear functional of a decision variable is minimized subject to linear constraints as
well as a conic constraint specified by a relative entropy cone. The relative entropy
cone is defined for triples (ν, λ, δ) ∈ Rn × Rn × Rn via Cartesian products of the
following elementary cone RE 1 ⊂ R3 :
RE_1 = { (ν, λ, δ) ∈ R_+ × R_+ × R | ν log(ν/λ) ≤ δ }.   (1)
Sublevel sets of the relative entropy function d(ν, λ) = Σ_{i=1}^n ν_i log(ν_i/λ_i) between two nonnegative vectors ν, λ ∈ R^n_+ can be represented using the relative entropy cone RE_n and a linear equality as follows:

d(ν, λ) ≤ t  ⟺  ∃δ ∈ R^n s.t. (ν, λ, δ) ∈ RE_n, 1^T δ = t.
REPs can be solved efficiently to a desired accuracy via interior-point methods due
to the existence of computationally tractable barrier functions for the convex function
ν log(ν/λ) for ν, λ ≥ 0 [44].
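To make this concrete, here is a minimal sketch of imposing the constraint d(ν, λ) ≤ t with the CVXPY modeling package, whose rel_entr atom represents ν_i log(ν_i/λ_i) and hands the resulting exponential-cone program to a suitable solver; the problem data and the extra linear constraints below are hypothetical, chosen only for illustration.

```python
import cvxpy as cp
import numpy as np

n = 4
lam = np.array([1.0, 2.0, 0.5, 1.5])     # hypothetical fixed reference vector lambda
nu = cp.Variable(n, nonneg=True)         # decision variable nu
t = cp.Variable()

# d(nu, lam) = sum_i nu_i * log(nu_i / lam_i); cp.rel_entr is elementwise x*log(x/y)
rel_ent = cp.sum(cp.rel_entr(nu, lam))

# Minimize the relative entropy budget t subject to illustrative linear constraints on nu
prob = cp.Problem(cp.Minimize(t), [rel_ent <= t, cp.sum(nu) == 1, nu >= 0.05])
prob.solve(solver=cp.SCS)                # any exponential-cone-capable solver works here
print(nu.value, t.value)
```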
The relative entropy cone RE_1 is a reparametrization of the exponential cone {(ν, λ, δ) ∈ R_+ × R_+ × R | exp(δ/ν) ≤ λ/ν}, which has been studied previously [24]. One can easily see this based on the following relation for any (ν, λ, δ) ∈ R_+ × R_+ × R:

exp(δ/ν) ≤ λ/ν  ⟺  (ν, λ, −δ) ∈ RE_1.
Recall that a GP (in convex form) is an optimization problem of the following type in a decision variable x ∈ R^n:

inf_{x∈R^n}   Σ_{i=1}^k c_i^{(0)} exp(a^{(i)T} x)
s.t.          Σ_{i=1}^k c_i^{(j)} exp(a^{(i)T} x) ≤ 1,   j = 1, . . . , m.   (3)

Here the coefficient vectors c^{(j)} ∈ R^k_+ are nonnegative and the exponents a^{(i)} ∈ R^n. By introducing additional variables y, z ∈ R^k, the GP (3) can be reformulated as follows:

inf_{x∈R^n, y,z∈R^k}   c^{(0)T} z
s.t.   c^{(j)T} z ≤ 1,   j = 1, . . . , m   (4)
       y_i ≤ log(z_i),   i = 1, . . . , k
       a^{(i)T} x = y_i,   i = 1, . . . , k.
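To illustrate the reformulation (4), the following sketch poses a small GP directly in that form with CVXPY; the coefficient vectors and the exponent matrix A (whose rows are the a^{(i)}) are hypothetical data, not taken from the paper.

```python
import cvxpy as cp
import numpy as np

k, n = 3, 2
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # rows are the exponents a^(i)
c0 = np.array([1.0, 1.0, 1.0])                          # objective coefficients c^(0)
c1 = np.array([0.5, 0.25, 0.1])                         # constraint coefficients c^(1)

x = cp.Variable(n)
y = cp.Variable(k)
z = cp.Variable(k, pos=True)

constraints = [c1 @ z <= 1,        # c^(j)' z <= 1
               y <= cp.log(z),     # y_i <= log(z_i)
               A @ x == y]         # a^(i)' x = y_i
prob = cp.Problem(cp.Minimize(c0 @ z), constraints)
prob.solve(solver=cp.SCS)
print(x.value, prob.value)
```

The constraint y <= log(z) is the relative entropy inequality with the first argument fixed to one, which is what makes this formulation fit the conic format described above.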
SOCPs are conic optimization problems specified using the second-order cone L^k ≜ {(x, t) ∈ R^k × R | ‖x‖_2 ≤ t}. Applications of SOCPs include filter design in signal processing [46], portfolio opti-
mization [42], truss system design [4], and robust least-squares problems in statistical
inference [22]. It is well-known that any second-order cone L k for k ≥ 2 can be
represented via suitable Cartesian products of L 2 [6]. In the “Appendix”, we show
that the second-order cone L^2 ⊂ R^3 can be specified using linear and relative entropy
inequalities:
L^2 = { (x, y) ∈ R^2 × R | y − x_1 ≥ 0, y + x_1 ≥ 0, and ∃ν ∈ R_+ s.t.
        ν log(ν/(y + x_1)) + ν log(ν/(y − x_1)) − 2ν ≤ 2x_2   and
        ν log(ν/(y + x_1)) + ν log(ν/(y − x_1)) − 2ν ≤ −2x_2 }.   (6)
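As a quick numerical sanity check of (6) (not part of the appendix argument): minimizing the left-hand side over ν ∈ R_+ gives ν = sqrt((y + x_1)(y − x_1)) with minimum value −2 sqrt((y + x_1)(y − x_1)), so a feasible ν exists precisely when x_1^2 + x_2^2 ≤ y^2. The NumPy snippet below tests this equivalence on random points; the tolerances and the choice to skip near-boundary points are ours.

```python
import numpy as np

def in_re_description(x1, x2, y, tol=1e-9):
    # Membership in the representation (6), using the minimizing nu = sqrt((y+x1)(y-x1))
    if y - x1 < -tol or y + x1 < -tol:
        return False
    prod = max((y + x1) * (y - x1), 0.0)
    lhs_min = -2.0 * np.sqrt(prod)        # minimum of the left-hand side of (6) over nu >= 0
    return lhs_min <= 2 * x2 + tol and lhs_min <= -2 * x2 + tol

rng = np.random.default_rng(0)
for _ in range(10000):
    x1, x2, y = rng.uniform(-2, 2, size=3)
    if abs(np.hypot(x1, x2) - y) < 1e-6:
        continue                          # skip points too close to the cone boundary
    assert in_re_description(x1, x2, y) == (np.hypot(x1, x2) <= y)
```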
As a further illustration, consider regularized logistic regression problems, in which one solves

inf_{w∈R^k, b∈R}   Σ_{i=1}^n log( 1 + exp( −y^{(i)} (w^T x^{(i)} + b) ) ) + μ r(w)

given feature vectors x^{(i)} ∈ R^k with labels y^{(i)} ∈ {−1, +1}, a regularization parameter μ > 0, and a regularizer r(w). Sublevel sets of the sum of logistic losses can be described via relative entropy inequalities.
On the other hand, sublevel sets of the regularizers ‖w‖_1 and ‖w‖_2^2 can be described via LPs and SOCPs. Consequently, regularized logistic regression problems with the ℓ_1-norm or the squared-ℓ_2-norm regularizer are examples of REPs.
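A hedged CVXPY sketch of the ℓ_1-regularized case on synthetic data (the data-generating choices and parameter values are ours, not the paper's); cp.logistic(u) evaluates log(1 + exp(u)), so the model below is exactly the objective displayed above with r(w) = ‖w‖_1.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n, k, mu = 200, 10, 0.1
X = rng.standard_normal((n, k))                                           # feature vectors x^(i) as rows
y = np.sign(X @ rng.standard_normal(k) + 0.1 * rng.standard_normal(n))   # labels in {-1, +1}

w = cp.Variable(k)
b = cp.Variable()
loss = cp.sum(cp.logistic(cp.multiply(-y, X @ w + b))) + mu * cp.norm1(w)
cp.Problem(cp.Minimize(loss)).solve(solver=cp.SCS)
print(w.value)
```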
Although REPs contain GPs and SOCPs as special cases, the relative entropy frame-
work is more general than either of these classes of convex programs. Indeed, REPs
are useful for solving a range of problems to which these other classes of convex
programs are not directly applicable. We illustrate the utility of REPs in the following
diverse application domains:
1. Permanent maximization Given a collection of matrices, find the one with the
largest permanent. Computing the permanent of a matrix is a well-studied problem
that is believed to be computationally intractable. Therefore, we seek approximate
solutions to the problem of permanent maximization.
2. Robust GPs The solution of a GP (3) is sensitive to the input parameters of the prob-
lem. Compute GPs within a robust optimization framework so that the solutions
offer robustness to variability in the problem parameters.
3. Hitting-times in dynamical systems Given a linear dynamical system consisting of
multiple modes and a region of feasible starting points, compute the smallest time
required to hit a specified target set from an arbitrary feasible starting point.
We give a formal mathematical description of each of these problems in Sects. 2
and 3, and we describe computationally tractable solutions based on REPs. Variants of
these questions as well as more restrictive formulations than the ones we consider have
been investigated in previous work, with some techniques proposed for solving these
problems based on GPs. We survey this prior literature, highlighting the limitations
of previous approaches and emphasizing the more powerful generalizations afforded
by the relative entropy formalism.
In Sect. 2, we describe an approximation algorithm for the permanent maximization
problem based on an REP relaxation. We bound the quality of the approximation
provided by our method via Van Der Waerden’s inequality. We contrast our discussion
with previous work in which GPs have been employed to approximate the permanent
of a fixed matrix [41]. In Sect. 3 we describe an REP-based method for robust GPs
in which the coefficients c( j) ’s and the exponents a(i) ’s in a GP (3) are not known
precisely, but instead lie in some known uncertainty set. Our technique enables exact
and tractable solutions of robust GPs for a significantly broader class of uncertainty
sets than those considered in prior work [5,36]. We illustrate the power of these REP-
based approaches for robust GPs by employing them to compute hitting-times in
dynamical systems (these problems can be recast as robust GPs). In recent work Han
et al. [31] have also employed our reformulations of robust GPs (based on an earlier
preprint of our work [12]) to optimally allocate resources to control the worst-case
spread of epidemics in a network; here the exact network is unknown and it belongs
to an uncertainty set.
As yet another application of REPs, we note that REP-based relaxations are useful
for obtaining bounds on the optimal value in signomial programming problems [20].
Signomial programs are a generalization of GPs in which the entries of the coefficient
vectors c( j) ’s in (3) in both the objective and the constraints can take on positive and
negative values. Sums of exponentials in which the coefficients are both positive and
negative are no longer convex functions. As such, signomial programs are intractable
to solve in general (unlike GPs), and NP-hard problems can be reduced to special cases
of signomial programs. In a separate paper [13], we describe a family of tractable REP-
based convex relaxations for computing lower bounds in general signomial programs.
In the present manuscript we do not discuss this application further, and we refer the
interested reader to [13] for more details.
Building on our discussion of relative entropy optimization problems and their appli-
cations, we consider optimization problems specified in terms of the quantum relative
entropy function in Sect. 4. The quantum relative entropy function is a matrix gener-
alization of the function ν log(ν/λ), and the domain of each of its arguments is the
cone of positive semidefinite matrices:

D(N, Λ) = Tr[ N log(N) − N log(Λ) ],   N, Λ ∈ S^d_+.   (7)

Here log refers to the matrix logarithm. As this function is convex and positively homogeneous, its epigraph defines a natural matrix analog of the relative entropy cone RE_1 from (1):
QRE_d = { (N, Λ, δ) ∈ S^d_+ × S^d_+ × R | D(N, Λ) ≤ δ }.   (8)
Here S^d_+ denotes the cone of d × d positive semidefinite matrices in the space S^d ≅ R^{(d²+d)/2} of d × d symmetric matrices. Hence the convex cone QRE_d ⊂ S^d × S^d × R. Quantum relative entropy programs (QREPs) are conic optimization problems specified with respect to the quantum relative entropy cone QRE_d.
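For reference, the quantum relative entropy of two strictly positive definite matrices can be evaluated numerically from matrix logarithms; the small NumPy/SciPy helper below is our own illustration (the density-matrix normalization in the example is incidental).

```python
import numpy as np
from scipy.linalg import logm

def quantum_rel_entropy(N, Lam):
    # D(N, Lam) = Tr[N (log N - log Lam)] for symmetric positive definite N, Lam
    return float(np.trace(N @ (logm(N) - logm(Lam))).real)

def von_neumann_entropy(N):
    # H(N) = -Tr[N log N] = -D(N, I)
    return -quantum_rel_entropy(N, np.eye(N.shape[0]))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); N = A @ A.T; N /= np.trace(N)
B = rng.standard_normal((3, 3)); L = B @ B.T; L /= np.trace(L)
print(quantum_rel_entropy(N, L), von_neumann_entropy(N))
```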
The focus of Sect. 4 is on applications of QREPs. Indeed, a broader objective of this
expository article is to initiate an investigation of the utility of the quantum relative
entropy function from a mathematical programming perspective. We begin by survey-
ing previously-studied applications of Von-Neumann entropy optimization, which is a
class of convex programs obtained by restricting the second argument of the quantum
relative entropy function (7) to be the identity matrix. We also describe a Von-Neumann
entropy optimization approach for obtaining bounds on certain capacities associated
with quantum channels; as a demonstration, in Sect. 4.3 we provide a comparison
between a “classical-to-quantum” capacity of a quantum channel and the capacity of a
purely classical channel induced by the quantum channel (i.e., one is restricted to send
and receive only classical information). Finally, we describe a stylized application of
a QREP that exploits the full convexity of the quantum relative entropy function with
respect to both its arguments. This application illustrates some of the key distinctions,
especially in the context of convex duality, between the classical and quantum cases.
As a general remark, we note that there are several types of relative entropy functions
(and associated entropies) that have been studied in the literature in both the classical
and quantum settings. In this article, we restrict our attention in the classical case to
the relative entropy function ν log(ν/λ) corresponding to Shannon entropy [16]. In the
quantum setting, the function D(N, Λ) from (7) is called the Araki–Umegaki relative
entropy [45], and it gives the Von-Neumann entropy when suitably restricted.
This paper is structured as follows. In Sect. 2, we describe our REP relaxation for the
permanent maximization problem, and in Sect. 3 we discuss REP-based approaches
for robust GPs and the application of these techniques to the computation of hitting-
times in dynamical systems. In Sect. 4 we describe QREPs and their applications,
highlighting the similarities and distinctions with respect to the classical case. We
conclude with a brief discussion and open questions in Sect. 5.
for ν ∈ Rn .
perm(M) ≜ Σ_{σ∈S_n} Π_{i=1}^n M_{i,σ(i)},   (10)
where Sn refers to the set of all permutations of elements from the set {1, . . . , n}. The
matrix permanent arises in combinatorics as the sum over weighted perfect matchings
in a bipartite graph [43], in geometry as the mixed volume of hyperrectangles [8], and
in multivariate statistics in the computation of order statistics [1]. In this section we
consider the problem of maximizing the permanent over a family of matrices. This
problem has received some attention for families of positive semidefinite matrices with
specified eigenvalues [19,28], but all of those works have tended to seek analytical
solutions for very special instances of this problem. Here we consider permanent maxi-
mization over general convex subsets of the set of nonnegative matrices. (As a parallel,
recall that SDPs are useful for maximizing the determinant over an affine section of
the cone of symmetric positive semidefinite matrices [54].) Permanent maximization
is relevant for designing a bipartite network in which the average weighted matching
is to be maximized subject to additional topology constraints on the network. This
problem also arises in geometry in finding a configuration of a set of hyperrectangles
so that their mixed volume is maximized.
To begin with, we note that even computing the permanent of a fixed matrix is a #P-hard
problem and it is therefore considered to be intractable in general [53]; accordingly
a large body of research has investigated approximate computation of the permanent
[2,30,38,41]. The class of elementwise nonnegative matrices has particularly received
attention as these arise most commonly in application domains (e.g., bipartite graphs
with nonnegative edge weights). For such matrices, several deterministic polynomial-
time approximation algorithms provide exponential-factor (in the size n of the matrix)
approximations of the permanent, e.g. [41]. The approach in [41] describes a technique
based on solving a GP to approximate the permanent of a nonnegative matrix. The
approximation guarantees in [41] are based on Van Der Waerden’s inequality, which
states that the matrix M = (1/n) 1 1^T has the smallest permanent among all n × n doubly
stochastic matrices (nonnegative matrices with all row-sums and column-sums equal to
one). This inequality was proved originally in the 1980s [21,23], and Gurvits recently
gave a very simple and elegant proof of this result [29]. In what follows, we describe
the approach of [41], but using the terminology developed by Gurvits in [29].
The permanent of a matrix M ∈ R^{n×n}_+ can be defined in terms of a particular coefficient of the following homogeneous polynomial in y ∈ R^n_+:

p_M(y_1, . . . , y_n) = Π_{i=1}^n ( Σ_{j=1}^n M_{i,j} y_j ).   (11)

Specifically, the permanent is the coefficient of the monomial y_1 ⋯ y_n:

perm(M) = ∂^n p_M(y_1, . . . , y_n) / (∂y_1 ⋯ ∂y_n).
In his proof of Van Der Waerden’s inequality, Gurvits defines the capacity of a homogeneous polynomial p(y_1, . . . , y_n) of degree n over y ∈ R^n_+ as follows:

cap(p) ≜ inf_{y ∈ R^n_+}  p(y) / (y_1 ⋯ y_n)  =  inf_{y ∈ R^n_+, y_1⋯y_n = 1}  p(y).   (12)
Theorem 1 [29] For any M ∈ R^{n×n}_+,

(n!/n^n) cap(p_M) ≤ perm(M) ≤ cap(p_M).
Here the polynomial pM and its capacity cap( pM ) are as defined in (11) and (12).
Further if each column of M has at most k nonzeros, then the factor in the lower bound
can be improved from n!/n^n to ((k−1)/k)^{(k−1)n}.
Gurvits in fact proves a more general statement involving so-called stable polynomials,
but the above restricted version will suffice for our purposes. The upper-bound in this
statement is straightforward to prove; it is the lower bound that is the key technical
novelty. Thus, if one could compute the capacity of the polynomial pM associated to
a nonnegative matrix M, then one can obtain an exponential-factor approximation of
the permanent of M, as n!/n^n ≈ exp(−n). In order to compute the capacity of p_M via
GP, we apply the transformation xi = log(yi ), i = 1, . . . , n in (12) and solve the
following program:1
log(cap(p_M)) = inf_{x∈R^n}   Σ_{i=1}^n log( Σ_{j=1}^n M_{i,j} exp(x_j) )   s.t.  1^T x = 0

             = inf_{x∈R^n, β∈R^n}   Σ_{i,j=1}^n M_{i,j} exp(x_j − β_i − 1) + 1^T β   s.t.  1^T x = 0.   (13)
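As a numerical illustration of (13) and of the bounds in Theorem 1, the sketch below computes log(cap(p_M)) with CVXPY (using the log-sum-exp form on the first line of (13)) and compares it with a brute-force permanent for a small random matrix with strictly positive entries; the data and solver choice are ours.

```python
import itertools
import math
import numpy as np
import cvxpy as cp

def brute_force_perm(M):
    n = M.shape[0]
    return sum(np.prod([M[i, s[i]] for i in range(n)])
               for s in itertools.permutations(range(n)))

rng = np.random.default_rng(0)
n = 5
M = rng.uniform(0.1, 1.0, size=(n, n))    # strictly positive so log(M) is finite

x = cp.Variable(n)
# log cap(p_M) = inf_x sum_i log(sum_j M_ij exp(x_j))  s.t.  1'x = 0
obj = cp.sum(cp.hstack([cp.log_sum_exp(np.log(M[i]) + x) for i in range(n)]))
prob = cp.Problem(cp.Minimize(obj), [cp.sum(x) == 0])
prob.solve(solver=cp.SCS)

cap = np.exp(prob.value)
perm = brute_force_perm(M)
print(math.factorial(n) / n**n * cap, perm, cap)   # sandwich: lower bound <= perm <= cap
```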
In particular, let M̂_cap ∈ argmax_{M∈M} cap(p_M) and M̂_perm ∈ argmax_{M∈M} perm(M) for a family M of nonnegative matrices. Then

(n!/n^n) perm(M̂_perm) ≤ (n!/n^n) cap(p_{M̂_cap}) ≤ perm(M̂_perm).

The factor n!/n^n in the lower bound can be improved to ((k−1)/k)^{(k−1)n} if every matrix in M has at most k nonzeros in each column.
1 Such a logarithmic transformation of the variables is also employed in converting a GP specified in terms
of non-convex posynomial functions to a GP in convex form; see [11,20] for more details.
(n!/n^n) perm(M̂_perm) ≤ (n!/n^n) cap(p_{M̂_perm})
                       ≤ (n!/n^n) cap(p_{M̂_cap})
                       ≤ perm(M̂_cap)
                       ≤ perm(M̂_perm).
The first and third inequalities are a result of Theorem 1, and the second and fourth
inequalities follow from the definitions of M̂cap and of M̂perm respectively. The
improvement in the lower bound also follows from Theorem 1.
In summary, maximizing the capacity with respect to the family M gives a matrix
in M that approximately maximizes the permanent over all matrices in M. As the
computation of the capacity cap( pM ) of a fixed matrix M ∈ M involves the solution
of a GP, the maximization of cap( pM ) over the set M involves the maximization of
the optimal value of the GP (13) over M ∈ M:
sup_{M∈M} log(cap(p_M)) = sup_{M∈M} [ inf_{x∈R^n, 1^T x = 0, β∈R^n}   Σ_{i,j=1}^n M_{i,j} exp(x_j − β_i − 1) + 1^T β ].   (15)
Rewriting the inner convex program as an REP (as in Sect. 1.1 in the introduction), we have that:

log(cap(p_M)) = inf_{x, β ∈ R^n, Y ∈ R^{n×n}_+}   Σ_{i,j=1}^n M_{i,j} Y_{i,j} + 1^T β
s.t.   1^T x = 0
       x_j − β_i − 1 ≤ log(Y_{i,j}),   i, j = 1, . . . , n.
The dual of this REP reformulation of the GP (13), obtained via a straightforward calculation, is the following optimization problem:

log(cap(p_M)) = sup_{Λ ∈ R^{n×n}_+}   −Σ_{i,j=1}^n Λ_{i,j} log( Λ_{i,j} / M_{i,j} )
s.t.   Σ_{j=1}^n Λ_{i,j} = 1,   i = 1, . . . , n
       Σ_{i=1}^n Λ_{i,1} = ⋯ = Σ_{i=1}^n Λ_{i,n}.   (16)
The constraint on the last line requires that the column-sums be equal to each other.
As the relative entropy function d(Λ, M) = Σ_{i,j=1}^n Λ_{i,j} log(Λ_{i,j}/M_{i,j}) is jointly convex with respect to (Λ, M), combining (15) with the dual reformulation (16) yields the following program for maximizing the capacity over M:

sup_{M ∈ R^{n×n}_+, Λ ∈ R^{n×n}_+}   −Σ_{i,j=1}^n Λ_{i,j} log( Λ_{i,j} / M_{i,j} )
s.t.   M ∈ M
       Σ_{j=1}^n Λ_{i,j} = 1,   i = 1, . . . , n
       Σ_{i=1}^n Λ_{i,1} = ⋯ = Σ_{i=1}^n Λ_{i,n}.
If the set M ⊂ R^{n×n}_+ is convex, this problem is a convex program. Further, if M can be represented tractably, then this convex program can be solved efficiently. We record these observations in the following proposition:

Proposition 2 Suppose that M ⊂ R^{n×n}_+ has a conic representation:

M = { M ∈ R^{n×n} | M ∈ R^{n×n}_+, ∃z ∈ R^m s.t. g + L(M, z) ∈ K }

for a convex cone K, a linear map L, and an element g. Then sup_{M∈M} log(cap(p_M)) is the optimal value of the following convex program:

sup_{M ∈ R^{n×n}_+, Λ ∈ R^{n×n}_+, z ∈ R^m}   −Σ_{i,j=1}^n Λ_{i,j} log( Λ_{i,j} / M_{i,j} )
s.t.   g + L(M, z) ∈ K
       Σ_{j=1}^n Λ_{i,j} = 1,   i = 1, . . . , n
       Σ_{i=1}^n Λ_{i,1} = ⋯ = Σ_{i=1}^n Λ_{i,n}.
Suppose the set M has a tractable LP, GP, or SOCP representation so that the convex
cone K in the above proposition is the relative entropy cone. Then the convex program
in Proposition 2 is an REP that can be solved efficiently. If the set M has a tractable
SDP representation, then K is the cone of positive semidefinite matrices; in such cases,
the program in Proposition 2 is no longer an REP, but it can still be solved efficiently
via interior-point methods by combining logarithmic barrier functions for the positive
semidefinite cone and for the relative entropy cone [44].
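To make the program in Proposition 2 concrete, here is a hedged CVXPY sketch for one illustrative choice of the family M (entrywise upper bounds plus a budget on the total weight; this particular M, and the data, are ours). The joint objective is expressed through the rel_entr atom, which is jointly convex in both arguments, so its negative can be maximized.

```python
import cvxpy as cp

n = 4
M = cp.Variable((n, n), nonneg=True)
Lam = cp.Variable((n, n), nonneg=True)

# -sum_{ij} Lam_ij log(Lam_ij / M_ij), jointly concave in (Lam, M)
objective = cp.Maximize(-cp.sum(cp.rel_entr(Lam, M)))

constraints = [cp.sum(Lam, axis=1) == 1]                 # each row of Lam sums to one
col = cp.sum(Lam, axis=0)
constraints += [col[j] == col[0] for j in range(1, n)]   # equal column sums
# Illustrative convex family M: entrywise bounds and a total-weight budget
constraints += [M <= 1, cp.sum(M) <= 0.5 * n * n]

prob = cp.Problem(objective, constraints)
prob.solve(solver=cp.SCS)
print(M.value)    # approximate permanent maximizer over this family
```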
As our next application, we describe the utility of REPs in addressing the problem
of computing solutions to GPs that are robust to uncertainty in the input parameters
of the GP. Robust GPs arise in power control problems in communication systems
[14] as well as in robust digital circuit gate sizing [10]. Specifically, a robust GP is an
optimization problem in which a positive sum of exponentials is minimized subject to
affine constraints and constraints of the following form in a decision variable x ∈ Rn :
sup_{[c, a^{(1)}, ..., a^{(k)}] ∈ U}   Σ_{i=1}^k c_i exp(a^{(i)T} x) ≤ 1.   (17)

Here the set U ⊂ R^k_+ × R^{nk} specifies the possible uncertainty in the coefficients c and
in the exponents a(1) , . . . , a(k) . In principle, constraints of the type (17) specify convex
sets in a decision variable x because they can be viewed as the intersection of a (possibly
infinite) collection of convex constraints. However, this observation does not lead
directly to efficient methods for the numerical solution of robust GPs, as constraints
of the form (17) are not tractable to describe in general. For example, if U is a finite
set then the constraint (17) reduces to a finite collection of constraints on positive
sums of exponentials; however, if U consists of infinitely
many elements, then the
k (i)
expression sup[c,a(1) ,...,a(k) ]∈U i=1 ci exp a x may not be efficiently specified
in closed form. Thus, the objective in robust GPs is to obtain tractable reformulations
of constraints of the form (17) via a small, finite number of inequalities with a tractable
description.
Such optimization problems in which one seeks solutions that are robust to parame-
ter uncertainty have been extensively investigated in the field of robust optimization
[3,5], and exact, tractable reformulations of robust convex programs are available in
a number of settings, e.g., for robust LPs. However, progress has been limited in the
context of robust GPs. We discuss prior work on this problem in Sect. 3.1, highlighting
some of the shortcomings, and we describe the more powerful generalizations afforded
by REP-based reformulations in Sect. 3.2. In Sect. 3.3, we illustrate the utility of these
reformulations in estimating hitting-times in dynamical systems.
In their seminal work on robust convex optimization [5], Ben-Tal and Nemirovski
obtained an exact, tractable reformulation of robust GPs in which a very restricted
form of coefficient uncertainty is allowed in a positive sum-of-exponentials function—
specifically, they assume that the uncertainty set U is decomposed as U = C × {ā(1) } ×
· · · × {ā(k) }, where each ā(i) ∈ Rn , i = 1, . . . , k, and C ⊂ Rk+ is a convex ellipsoid
in Rk+ specified by a quadratic form defined by an elementwise nonnegative matrix.
Thus, the exponents are assumed to be known exactly (i.e., there is no uncertainty in
these exponents) and the uncertainty in the coefficients is specified by a very particular
type of ellipsoidal set. The reformulation given in [5] for such robust GPs is itself a
GP with additional variables, which can be solved efficiently.
In subsequent work, Hsiung et al. [36] considered sum-of-exponentials functions
with the coefficients absorbed into the exponent as follows:
sup_{[d, a^{(1)}, ..., a^{(k)}] ∈ D}   Σ_{i=1}^k exp(a^{(i)T} x + d_i) ≤ 1.
For such constraints with D being either a polyhedral set or an ellipsoidal set, Hsiung
et al. [36] obtain tractable but inexact reformulations via piecewise linear approxima-
tions, with the reformulations again being GPs.
A reason for the limitations in these previous works—very restrictive forms of
uncertainty sets in [5] and inexact reformulations in [36]—is that they considered
GP-based reformulations of the inequality (17). In the next section, we discuss exact
and tractable REP-based reformulations of the inequality (17) for a general class of
uncertainty sets.
We describe exact and efficient REP-based reformulations of robust GPs for settings
in which the uncertainty set U is decoupled as follows:
for convex cones K^{(i)} ⊂ R^{ℓ_i}, linear maps F^{(i)} : R^n → R^{ℓ_i}, linear maps H^{(i)} : R^{m_i} → R^{ℓ_i}, and g^{(i)} ∈ R^{ℓ_i}. Further, assume that there exists a point (c̄, z̄) ∈ R^k × R^{m_C} satisfying the conic and nonnegativity constraints in C strictly, and similarly that there exists a strictly feasible point in each E^{(i)}. Then we have that x ∈ R^n satisfies the constraint (17) with U = C × E^{(1)} × ⋯ × E^{(k)} if and only if there exist ζ ∈ R^{ℓ_C}, θ^{(i)} ∈ R^{ℓ_i} for i = 1, . . . , k such that:
Note Here F^†, F^{(i)†}, and H^† represent the adjoints of the operators F, F^{(i)}, and H respectively. The cones K_C^∗ and K^{(i)∗} denote the duals of the cones K_C and K^{(i)}
respectively. The assumption that there exist points that strictly satisfy the constraints
specifying C and those specifying each E (i) allows us to appeal to strong duality in
deriving our result [48]. We give a concrete illustration of the power of this result in
the sequel as well as an application to hitting-time estimation in dynamical systems
in Sect. 3.3.
Proof The constraint (17) can be reformulated as follows for uncertainty sets U that
are decomposable according to (18):
∃y ∈ R^k s.t.   sup_{c∈C} y^T c ≤ 1
                sup_{a^{(i)}∈E^{(i)}} a^{(i)T} x ≤ log(y_i),   i = 1, . . . , k.
This restatement is possible because the set C is a subset of the nonnegative orthant
Rk+ , and because the uncertainty sets E (i) are decoupled (from C and from each other)
and are therefore independent of each other. The first expression, supc∈C y c ≤ 1, is a
universal quantification for all c ∈ C. In order to convert this universal quantifier to an
existential quantifier, we appeal to convex duality as is commonly done in the theory
of robust optimization [3,5]. Specifically, by noting that C has a conic representation
and by appealing to conic duality [7], we have that:
∀c ∈ C, y^T c ≤ 1   ⟺   ∃ζ ∈ R^{ℓ_C} s.t. F_C^†(ζ) + y ≤ 0, H_C^†(ζ) = 0, ζ ∈ K_C^∗, g^T ζ ≤ 1.   (19)
The assumptions on strict feasibility are required to derive (19) and (20). Combining
these results and eliminating y, we have the desired conclusion.
In summary, robust GPs in which the uncertainty sets are decomposable according to (18) and are tractably specified as polyhedral
or ellipsoidal sets (or more generally, sets that are tractably represented via relative
entropy inequalities) can be reformulated exactly and efficiently via REPs. (As before,
robust GPs in which the cones KC and K(i) are semidefinite cones are not directly
reformulated as REPs, but can nonetheless be solved efficiently by combining barrier
penalties for the relative entropy cone and the semidefinite cone.)
In contrast to previous results on robust GP, note that the form of uncertainty for
which we obtain an efficient and exact REP reformulation is significantly more general
than that considered in [5], in which C can only be a restricted kind of ellipsoidal set
and each E (i) must be a singleton set (no uncertainty). On the other hand, Hsiung et
al. [36] consider robust GPs with polyhedral or ellipsoidal uncertainty that may be
coupled across E (i) , but their reformulation is inexact. A distinction of our approach
relative to those in [5] and in [36] is that our reformulation is an REP, while those
described in [5] and in [36] are GPs. In particular, note that some of the inequalities
described in the reformulation in Proposition 3 involve a combination of an affine term
and a logarithmic term; such inequalities cannot be represented via GPs, and it is the
additional expressive power provided by REPs that enables the efficient reformulation
and solution of the general class of robust GPs considered here.
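To see the duality mechanism in its simplest instance, suppose the exponents are fixed and only the coefficients are uncertain, with a polyhedral set C = {c ≥ 0 : A_U c ≤ b_U}. Then sup_{c∈C} Σ_i c_i exp(a^{(i)T} x) ≤ 1 holds whenever there exist y and ζ ≥ 0 with exp(a^{(i)T} x) ≤ y_i, A_U^T ζ ≥ y, and b_U^T ζ ≤ 1 (and, under strict feasibility, such a certificate exists whenever the robust constraint holds). The CVXPY sketch below uses hypothetical data and only illustrates this special case, not the full conic generality treated above.

```python
import cvxpy as cp
import numpy as np

k, n = 3, 2
A_exp = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # fixed exponents a^(i) (rows)
c_obj = np.array([1.0, 2.0, 1.0])                         # nominal objective coefficients
# Polyhedral coefficient uncertainty: C = {c >= 0 : A_U c <= b_U}
A_U = np.vstack([np.eye(k), np.ones((1, k))])              # c_i <= 1 and sum(c) <= 2
b_U = np.concatenate([np.ones(k), [2.0]])

x = cp.Variable(n)
y = cp.Variable(k)                                 # y_i upper-bounds exp(a^(i)' x)
zeta = cp.Variable(A_U.shape[0], nonneg=True)      # duality certificate for sup_{c in C} c'y <= 1

constraints = [cp.exp(A_exp @ x) <= y,
               A_U.T @ zeta >= y,
               b_U @ zeta <= 1]
objective = cp.Minimize(c_obj @ cp.exp(A_exp @ x))  # minimize a positive sum of exponentials
prob = cp.Problem(objective, constraints)
prob.solve(solver=cp.SCS)
print(x.value, prob.value)
```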
for G^{(i)} ∈ S^{ℓ_i} and linear operators F^{(i)} : R^n → S^{ℓ_i}. With these uncertainty sets, we have from Proposition 3 that

sup_{c∈C, a^{(i)}∈E^{(i)}}   Σ_{i=1}^k c_i exp(a^{(i)T} x) ≤ 1
   ⟺   ∃Z ∈ S^{ℓ} and Θ^{(i)} ∈ S^{ℓ_i}, i = 1, . . . , k s.t.
        Tr(GZ) ≤ 1,  Z ⪰ 0,  and for each i = 1, . . . , k,
        F^{(i)†}(Θ^{(i)}) + x = 0,  Tr(G^{(i)} Θ^{(i)}) ≤ log(−[F^†(Z)]_i),  Θ^{(i)} ⪰ 0.
Note that the uncertainty sets C and E (i) are SDP-representable sets. The correspond-
ing robust GP inequality (17) cannot be handled by the previous approaches [5,36]
described in Sect. 3.1, but it can be efficiently reformulated via semidefinite and rela-
tive entropy inequalities.
Consider a continuous-time linear dynamical system

ẋ(t) = A x(t),   (21)

where the state x(t) ∈ R^n for t ≥ 0. We assume throughout this section that the transi-
tion matrix A ∈ Rn×n is diagonal; otherwise, one can always change to the appropriate
modal coordinates given by the eigenvectors of A (assuming A is diagonalizable). The
diagonal entries of A are called the modes of the system. Suppose that the parameters
of the dynamical system can take on a range of values with A ∈ A and x(0) ∈ Xinitial ;
the set A specifies the set of modes and the set Xinitial ⊆ Rn specifies the set of initial
conditions. Suppose also that we are given a target set Xtarget ⊆ Rn , and we wish
to find the smallest time required for the system to reach a state in Xtarget from an
arbitrary initial state in Xinitial . Formally, we define the worst-case hitting-time of the
dynamical system (21) to be:
τ(X_initial, X_target, A) ≜ inf{ t ∈ R_+ | ∀ x(0) ∈ X_initial, A ∈ A we have exp(At) x(0) ∈ X_target }.   (22)
Indeed, for an initial state x(0), the state of the system (21) at time t is given by x(t) =
exp (At) x(0); consequently, the quantity τ (Xinitial , Xtarget , A) represents the amount
of time that the worst-case trajectory of the system, taken over all initial conditions
Xinitial and mode values A, requires to enter the target set Xtarget . An assumption
underlying this definition is that the target set Xtarget is absorbing so that once a
trajectory enters Xtarget it remains in Xtarget for all subsequent time.
Hitting-times are of interest in system analysis and verification [47]. As an example,
suppose that a system has the property that τ (Xinitial , Xtarget , A) = ∞; this pro-
vides a proof that from certain initial states in Xinitial , the system never enters the
target region X_target. On the other hand, if the hitting-time τ(X_initial, X_target, A) = 0, we have a certificate that X_initial ⊆ X_target. While verification of linear systems has
been extensively studied via Lyapunov and barrier certificate methods, the study of
hitting-times has received relatively little attention, with a few exceptions such as
[56]. In particular, the approaches in [56] can lead to loose bounds as the worst-case
hitting-time is computed based on box-constrained outer approximations of Xinitial and
Xtarget .
For a particular class of dynamical systems, we show next that the hitting-time
can be computed exactly by solving an REP. Specifically, we make the following
assumptions regarding the structure of the dynamical system (21):
– The set of modes is given by
A = { A ∈ R^{n×n} | A diagonal with A_{j,j} ∈ [ℓ_j, u_j] ∀ j = 1, . . . , n }

with ℓ_j, u_j ≤ 0 ∀ j = 1, . . . , n.
– The set of initial states Xinitial ⊆ Rn+ is given by a convex set with a tractable
representation via affine and conic constraints. In particular, as in Proposition 3,
Xinitial is specified as follows:
Each constraint here is a robust GP inequality of the form (17) with the uncertainty set
being decomposable according to (18). Consequently, the hitting-time computation
problem for the particular family of dynamical systems we consider can be reformu-
lated as an REP [possibly with additional conic constraints depending on the cone K
in (23)] by appealing to Proposition 3.
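As a rough illustration of the kind of computation involved (not the REP (24) itself, whose full conic description is not reproduced here), the sketch below treats a single hypothetical target halfspace {x : c^T x ≤ 1} with c ≥ 0. For diagonal modes with A_{j,j} ∈ [ℓ_j, u_j] ⊆ (−∞, 0] and nonnegative initial states, the worst case over the modes is attained at the upper endpoints u_j, the worst case over x(0) is a small SDP when X_initial is the shifted elliptope of the example that follows, and the hitting-time can then be located by bisection over t. The decay rates below are the upper endpoints of that example's mode intervals; the target halfspace is our own stand-in for the tetrahedron in Fig. 1.

```python
import numpy as np
import cvxpy as cp

u = np.array([-0.4, -0.5, -0.6])          # slowest decay rate of each mode
c = np.array([0.25, 0.25, 0.25])          # hypothetical target halfspace {x : c'x <= 1}

def worst_case(t):
    # max over x(0) in the shifted elliptope of sum_j c_j exp(u_j t) x_j(0)
    X = cp.Variable((3, 3), symmetric=True)
    x0 = cp.hstack([X[0, 1], X[0, 2], X[1, 2]]) + 3   # recover x(0) from the elliptope entries
    # X PSD with unit diagonal forces |X_ij| <= 1, so x0 >= 2 > 0 automatically
    prob = cp.Problem(cp.Maximize((c * np.exp(u * t)) @ x0),
                      [X >> 0, cp.diag(X) == 1])
    prob.solve(solver=cp.SCS)
    return prob.value

lo, hi = 0.0, 50.0                         # the worst-case value is nonincreasing in t
for _ in range(30):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if worst_case(mid) <= 1 else (mid, hi)
print("estimated worst-case hitting-time for this target:", hi)
```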
Example Consider a dynamical system with three modes that each take on values in
certain ranges as follows:
A = { A ∈ R^{3×3} | A diagonal, A_{1,1} ∈ [−0.45, −0.4], A_{2,2} ∈ [−0.6, −0.5], A_{3,3} ∈ [−0.7, −0.6] }.
The set of initial states is a shifted elliptope that is contained within the nonnegative
orthant:
X_initial = { x(0) ∈ R^3_+ :  [ 1            x_1(0) − 3   x_2(0) − 3
                                x_1(0) − 3   1            x_3(0) − 3
                                x_2(0) − 3   x_3(0) − 3   1          ]  ⪰ 0 }.
Fig. 1 Some sample trajectories of a linear system from Xinitial to Xtarget for the dynamical system
described in the example in Sect. 3.3. The (shifted) elliptope specifies the set of feasible starting points,
while the tetrahedron specifies the target set. The system consists of three modes
For these system attributes, a few sample trajectories are shown in Fig. 1. We solve the
REP (24) and obtain that τ (Xinitial , Xtarget , A) = 7.6253 is the worst-case hitting-time.
problem provides a natural setting to examine the similarities and differences between
the classical and the quantum relative entropy functions.
As this map is positively homogeneous, its epigraph gives the following power cone in R^3:

P^α = { (ν, λ, δ) ∈ R_+ × R_+ × R | −ν^α λ^{1−α} ≤ δ }.   (26)
Theorem 2 [40] For each fixed X ∈ R^{d×d} and α ∈ [0, 1], the following function is convex on S^d_+ × S^d_+:

(N, Λ) ↦ −Tr(X^T N^α X Λ^{1−α}).

As before, the quantum relative entropy function (7) may be viewed as a limiting case as follows:

D(N, Λ) = lim_{α→1}  (1/(1−α)) [ −Tr(N^α Λ^{1−α}) + Tr(N) ].
Indeed, this perspective of the quantum relative entropy function offers a proof of its
joint convexity with respect to both arguments, as the function −Tr(Nα Λ1−α )+Tr(N)
is jointly convex with respect to (N, Λ) for each α ∈ [0, 1].
H(N) = −D(N, I) = −Tr[ N log(N) ].   (28)
g(N) ≤ t  ⟺  ∃x ∈ R^n s.t.
    f(x) ≤ t
    x_1 ≥ ⋯ ≥ x_n
    s_r(N) ≤ x_1 + ⋯ + x_r,   r = 1, . . . , n − 1
    Tr(N) = x_1 + ⋯ + x_n,
This minimization is subject to the decision variable M being positive definite as well
as constraints on M that impose bounds on similarities between pairs of entities in
the linear space. We refer the reader to [39] for more details, and for the virtues of
minimizing the quantum relative entropy as a method for learning kernel matrices. In
the context of the present paper, the optimization problems considered in [39] can be
viewed as Von-Neumann entropy maximization problems subject to linear constraints,
as the second argument in the quantum relative entropy function (29) is fixed.
Here H is the Von-Neumann entropy function (28) and the number of states v(i) is
part of the optimization.
We note that there are several kinds of capacities associated with a quantum channel,
depending on the protocol employed for encoding and decoding states transmitted
across the channel; the version considered here is the one most commonly investigated,
and we refer the reader to Shor’s survey [51] for details of the other types of capacities.
2 In full generality, density matrices are trace-one, positive semidefinite Hermitian matrices, and A( j) ∈
Ck×n . As with SDPs, the Von-Neumann entropy optimization framework can handle linear matrix inequal-
ities on Hermitian matrices, but we stick with the real case for simplicity.
The quantity C(L) (30) on which we focus here is called the C1,∞ quantum capacity—
roughly speaking, it is the capacity of a quantum channel in which the sender cannot
couple inputs across multiple uses of the channel, while the receiver can jointly decode
messages received over multiple uses of the channel.
Shor’s approach for lower bounds via LP In [51] Shor describes a procedure based on LP to provide lower bounds on C(L). As the first step of this method, one fixes a finite set of states {v^{(i)}}_{i=1}^m with each v^{(i)} ∈ S^n and a density matrix ρ ∈ S^n, so that ρ is in the convex hull conv({v^{(i)} v^{(i)T}}_{i=1}^m). With these quantities fixed, one can obtain
In [51] Shor also suggests local heuristics to search for better sets of states and density matrices to improve the lower bound (31). It is clear that C(L, {v^{(i)}}_{i=1}^m, ρ) as computed in (31) is a lower bound on C(L). Indeed, one can check that:

C(L) = sup_{v^{(i)} ∈ S^n}   sup_{ρ ∈ conv({v^{(i)} v^{(i)T}})}   C(L, {v^{(i)}}, ρ).   (32)
Here the number of states is not explicitly denoted, and it is also a part of the opti-
mization.
One can improve upon the bound (31) by further optimizing over the set of density matrices ρ in the convex hull conv({v^{(i)} v^{(i)T}}_{i=1}^m). Our improved lower bound entails the solution of a Von-Neumann entropy optimization problem. Specifically, we observe that for a fixed set of states {v^{(i)}}_{i=1}^m the following optimization problem involves Von-Neumann entropy maximization subject to affine constraints:
C(L, {v^{(i)}}_{i=1}^m)  =  sup_{ρ ∈ conv({v^{(i)} v^{(i)T}}_{i=1}^m)}   C(L, {v^{(i)}}_{i=1}^m, ρ)

                         =  sup_{p_i ≥ 0, Σ_{i=1}^m p_i = 1}   H( L( Σ_{i=1}^m p_i v^{(i)} v^{(i)T} ) )  −  Σ_{i=1}^m p_i H( L( v^{(i)} v^{(i)T} ) ).   (33)
This additional optimization over ρ can be folded into the computation of (31) at the expense of solving a Von-Neumann entropy maximization problem instead of an LP. It is easily seen from (30), (32), and (33) that for a fixed set of states {v^{(i)}}_{i=1}^m:

C(L)  ≥  C(L, {v^{(i)}}_{i=1}^m)  ≥  C(L, {v^{(i)}}_{i=1}^m, ρ).
In Shor’s approach [51], one fixes both the set of states {v^{(i)}}_{i=1}^m and a density matrix ρ ∈ conv({v^{(i)} v^{(i)T}}_{i=1}^m), and one obtains a lower bound on the C_{1,∞} quantum capacity C(L) (30) by optimizing over decompositions of ρ in terms of convex combinations of elements of the set {v^{(i)} v^{(i)T}}_{i=1}^m. In contrast, in our method we fix only the set of states and optimize over all density matrices ρ in their convex hull. Shor’s bound (31) involves the solution of an LP, while the improved bound using our approach comes at the cost of solving a Von-Neumann entropy optimization problem. The question of computing the C_{1,∞} capacity C(L) (30) exactly in a tractable fashion remains open.
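As an illustration of the optimization in (33), the sketch below evaluates the Von-Neumann entropy objective for a fixed, hypothetical set of qubit states and a toy depolarizing-style channel of our own choosing (not the channel (34) of the paper), and maximizes over the probability simplex with a generic derivative-free solver via a softmax parametrization. A conic solver with native support for Von-Neumann entropy would be the more principled route; this is only a numerical sanity check of the lower-bounding idea.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import eigh

def vn_entropy(rho, eps=1e-12):
    # H(rho) = -Tr[rho log rho], computed from the eigenvalues of rho
    w = np.clip(eigh(rho, eigvals_only=True), eps, None)
    return float(-np.sum(w * np.log(w)))

def channel(rho, mix=0.3):
    # Toy channel (hypothetical): blend the input with the maximally mixed state
    d = rho.shape[0]
    return (1 - mix) * rho + mix * np.trace(rho) * np.eye(d) / d

V = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0]) / np.sqrt(2)]
outs = [channel(np.outer(v, v)) for v in V]      # channel outputs for each pure state

def neg_objective(theta):
    p = np.exp(theta) / np.sum(np.exp(theta))    # softmax keeps p on the simplex
    avg = sum(pi * out for pi, out in zip(p, outs))
    return -(vn_entropy(avg) - sum(pi * vn_entropy(out) for pi, out in zip(p, outs)))

res = minimize(neg_objective, x0=np.zeros(len(V)), method="Nelder-Mead")
print("lower bound on the capacity for these fixed states:", -res.fun)
```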
Fig. 2 A sequence of increasingly tighter lower bounds on the quantum capacity of the channel specified
by (34), obtained by computing classical-to-quantum capacities with increasingly larger collections of input
states
For diagonal input density matrices ρ (i.e., classical inputs), this map specifies a classical channel induced by the quantum channel (34), as the output is also restricted to be a diagonal density matrix (i.e., a classical output). Figure 3 shows two plots, as the channel parameter in (34) ranges from 0 to 1, of both the classical-to-quantum capacity of (34) and the classical capacity of the induced classical channel (35). When this parameter equals 0, the output of the operator (34) is diagonal if the input is given by a diagonal density matrix; therefore, the two curves coincide at the left endpoint of Fig. 3.
Fig. 3 Comparison of a classical-to-quantum capacity of the quantum channel specified by (34) and the
classical capacity of a classical channel induced by the quantum channel given by (34)
For larger values of the parameter, the classical-to-quantum capacity is greater than the capacity of the induced classical channel, thus demonstrating the improvement in capacity as one goes from a classical-to-classical communication protocol to a classical-to-quantum protocol. (The C_{1,∞} capacity C(L) (30) of the channel (34) is in general even larger than the classical-to-quantum capacity computed here.)
In the inner optimization problem, the constraint set specified by the inequality Y ⪯ log(Z) is a convex set as the matrix logarithm function is operator concave. The inner
problem is a non-commutative analog of an unconstrained GP (3); the set M, over
which the outer supremum is taken, specifies “coefficient uncertainty.” Hence, the
nested optimization problem (36) is a matrix equivalent of a robust unconstrained
GP with coefficient uncertainty. To see this connection more precisely, consider the
analogous problem over vectors for a convex set C ⊂ R^k_+ and a collection {b^{(j)}}_{j=1}^n ⊂ R^k:

f(C; {b^{(j)}}_{j=1}^n) = sup_{c∈C} [ inf_{x∈R^n, y∈R^k, z∈R^k_+}   c^T z   s.t.   y ≤ log(z),   Σ_{j=1}^n x_j b^{(j)} = y ].   (37)
The inner optimization problem here is simply an unconstrained GP, and the set
C specifies coefficient uncertainty. The reason for this somewhat non-standard
description—an equivalent, more familiar specification of (37) via sums of exponentials as in (3) would be sup_{c∈C} inf_{x∈R^n} Σ_{i=1}^k c_i exp(a^{(i)T} x), where a^{(i)} is the i'th row of the k × n matrix consisting of the b^{(j)}'s as columns—is to make a more transparent connection between the matrix case (36) and the vector case (37).
Our method for obtaining bounds on F(M; {B^{(j)}}_{j=1}^n) is based on a relationship between the constraint Y ⪯ log(Z) and the quantum relative entropy function via
convex conjugacy. To begin with, we describe the relationship between the vector
constraint y ≤ log(z) and classical relative entropy. Consider the following character-
istic function for y ∈ Rk and z ∈ −Rk+ :
χ_aff-log(y, z) = { 0,  if y ≤ log(−z);   ∞,  otherwise }.   (38)
The proof of this lemma follows from a straightforward calculation. Based on this
result and by computing the dual of the inner convex program in (37), we have that
the function f (C; {b( j) }nj=1 ) can be computed as the optimal value of an REP:
Moving to the matrix case, consider the natural matrix analog of the characteristic
function (38) for Y ∈ Sk and Z ∈ −Sk+ :
χ_mat-aff-log(Y, Z) = { 0,  if Y ⪯ log(−Z);   ∞,  otherwise }.   (40)
Here (i) follows from the Golden–Thompson inequality (42), and the equality (ii)
follows from the fact that the optimal U in the previous line is N. Consequently, one
can check that if N and Λ are simultaneously diagonalizable, then the inequality (i)
becomes an equality.
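A quick numerical check of the Golden–Thompson inequality (42), Tr[exp(A + B)] ≤ Tr[exp(A) exp(B)] for symmetric A and B, used in step (i); plain NumPy/SciPy on random matrices, with a commuting case at the end to illustrate when it holds with equality.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
for _ in range(100):
    A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
    B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
    assert np.trace(expm(A + B)) <= np.trace(expm(A) @ expm(B)) + 1e-9

A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
B = A @ A                                   # B commutes with A
print(np.isclose(np.trace(expm(A + B)), np.trace(expm(A) @ expm(B))))
```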
Thus, the non-commutativity underlying the quantum relative entropy function, in contrast to the classical case, results in D(N, eΛ) only being an upper bound (in general) on the convex conjugate χ*_mat-aff-log(N, Λ). From the perspective of this result,
Lemma 1 follows from the observation that the relative entropy between two nonneg-
ative vectors can be viewed as the quantum relative entropy between two diagonal
positive semidefinite matrices (which are, trivially, simultaneously diagonalizable).
Based on Proposition 5 and again appealing to convex duality, the function
F(M; {B( j) }nj=1 ) can be bounded below by solving a QREP as follows:
The quantity on the right-hand-side can be computed, for example, via projected
coordinate descent. If the matrices {B( j) }nj=1 ∪ {C} are simultaneously diagonalizable
for each C ∈ M, then the QREP lower bound (43) is equal to F(M; {B( j) }nj=1 ). In
summary, we have that REPs are useful for computing f (C; {b( j) }nj=1 ) exactly, while
QREPs only provide a lower bound via (43) of F(M; {B( j) }nj=1 ) in general.
5 Further directions
There are several avenues for further research that arise from this paper. It is of interest
to develop efficient numerical methods to solve REPs and QREPs in order to scale
to large problem instances. Such massive size problems are especially prevalent in
data analysis tasks, and are of interest in settings such as kernel learning. On a related
note, there exists a vast literature on exploiting the structure of a particular problem
instance of an SDP or a GP—e.g., sparsity in the problem parameters—which can result
in significant computational speedups in practice in the solution of these problems.
A similar set of techniques would be useful and relevant in all of the applications
described in this paper.
We also seek a deeper understanding of the expressive power of REPs and QREPs,
i.e., of conditions under which convex sets can be tractably represented via REPs
and QREPs as well as obstructions to efficient representations in these frameworks
(in the spirit of similar results that have recently been obtained for SDPs [9,27,34]).
Such an investigation would be useful in identifying problems in statistics and infor-
mation theory that may be amenable to solution via tractable convex optimization
techniques.
Acknowledgements The authors would like to thank Pablo Parrilo and Yong-Sheng Soh for helpful
conversations, and Leonard Schulman for pointers to the literature on Von-Neumann entropy. Venkat Chan-
drasekaran was supported in part by National Science Foundation Career award CCF-1350590 and Air
Force Office of Scientific Research grant FA9550-14-1-0098.
6 Appendix
Combining this reformulation with the next result gives us the description (6).
Proof We have that the matrix [a c; c b] ∈ S^2_+ if and only if a z̄_1^2 + b z̄_2^2 + 2c z̄_1 z̄_2 ≥ 0 for all z̄ ∈ R^2. This latter condition can in turn be rewritten to obtain the following reformulation:

[a c; c b] ∈ S^2_+   ⟺   a z_1^2 + b z_2^2 + 2c z_1 z_2 ≥ 0  and  a z_1^2 + b z_2^2 − 2c z_1 z_2 ≥ 0   ∀ z ∈ R^2_+.   (44)
References
1. Bapat, R.B., Beg, M.I.: Order statistics for nonidentically distributed variables and permanents.
Sankhya Indian J. Stat. A 51, 79–93 (1989)
2. Barvinok, A.I.: Computing mixed discriminants, mixed volumes, and permanents. Discrete Comput.
Geom. 18, 205–237 (1997)
3. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton
(2009)
4. Ben-Tal, A., Nemirovski, A.: Optimal design of engineering structures. Optima. 47, 4–8 (1995)
5. Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Math. Oper. Res. 23, 769–805 (1998)
6. Ben-Tal, A., Nemirovski, A.: On polyhedral approximations of the second-order cone. Math. Oper.
Res. 26, 193–205 (2001)
7. Ben-Tal, A., Nemirovskii, A.: Lectures on Modern Convex Optimization. Society for Industrial and
Applied Mathematics, Philadelphia (2001)
8. Betke, U.: Mixed volumes of polytopes. Arch. Math. 58, 388–391 (1992)
9. Blekherman, G., Parrilo, P., Thomas, R.: Semidefinite Optimization and Convex Algebraic Geometry.
Society for Industrial and Applied Mathematics, Philadelphia (2013)
10. Boyd, S., Kim, S.J., Patil, D., Horowitz, M.: Digital circuit optimization via geometric programming.
Oper. Res. 53, 899–932 (2005)
11. Boyd, S., Kim, S.J., Vandenberghe, L., Hassibi, A.: A tutorial on geometric programming. Optim. Eng.
8, 67–127 (2007)
12. Chandrasekaran, V., Shah, P.: Conic geometric programming. In: Proceedings of the Conference on
Information Sciences and Systems (2014)
13. Chandrasekaran, V., Shah, P.: Relative entropy relaxations for signomial optimization. SIAM J. Optim.
(2014)
14. Chiang, M.: Geometric programming for communication systems. Found. Trends Commun. Inf. Theory
2, 1–154 (2005)
15. Chiang, M., Boyd, S.: Geometric programming duals of channel capacity and rate distortion. IEEE
Trans. Inf. Theory 50, 245–258 (2004)
16. Cover, T., Thomas, J.: Elements of Information Theory. Wiley, New York (2006)
17. Cox, D.R.: The regression analysis of binary sequences. J. R. Stat. Soc. 20, 215–242 (1958)
18. Dinkel, J.J., Kochenberger, G.A., Wong, S.N.: Entropy maximization and geometric programming.
Environ. Plan. A 9, 419–427 (1977)
19. Drew, J.H., Johnson, C.R.: The maximum permanent of a 3-by-3 positive semidefinite matrix, given
the eigenvalues. Linear Multilinear Algebra 25, 243–251 (1989)
20. Duffin, R.J., Peterson, E.L., Zener, C.M.: Geometric Programming: Theory and Application. Wiley,
New York (1967)
21. Egorychev, G.P.: Proof of the Van der Waerden conjecture for permanents (english translation; original
in russian). Sib. Math. J. 22, 854–859 (1981)
22. El Ghaoui, L., Lebret, H.: Robust solutions to least-squares problems with uncertain data. SIAM J.
Matrix Anal. Appl. 18, 1035–1064 (1997)
23. Falikman, D.I.: Proof of the Van der Waerden conjecture regarding the permanent of a doubly stochastic
matrix (english translation; original in russian). Math. Notes 29, 475–479 (1981)
24. Glineur, F.: An extended conic formulation for geometric optimization. Found. Comput. Decis. Sci.
25, 161–174 (2000)
25. Golden, S.: Lower bounds for the Helmholtz function. Phys. Rev. Ser. II 137, B1127–B1128 (1965)
26. Gonçalves, D.S., Lavor, C., Gomes-Ruggiero, M.A., Cesário, A.T., Vianna, R.O., Maciel, T.O.: Quantum
state tomography with incomplete data: maximum entropy and variational quantum tomography. Phys.
Rev. A 87 (2013)
27. Gouveia, J., Parrilo, P., Thomas, R.: Lifts of convex sets and cone factorizations. Math. Oper. Res. 38,
248–264 (2013)
28. Grone, R., Johnson, C.R., Eduardo, S.A., Wolkowicz, H.: A note on maximizing the permanent of
a positive definite hermitian matrix, given the eigenvalues. Linear Multilinear Algebra 19, 389–393
(1986)
29. Gurvits, L.: Van der Waerden/Schrijver-Valiant like conjectures and stable (aka hyperbolic) homoge-
neous polynomials: one theorem for all. Electron. J. Comb. 15 (2008)
30. Gurvits, L., Samorodnitsky, A.: A deterministic algorithm for approximating the mixed discriminant
and mixed volume, and a combinatorial corollary. Discrete Comput. Geom. 27, 531–550 (2002)
31. Han, S., Preciado, V.M., Nowzari, C., Pappas, G.J.: Data-Driven Network Resource Allocation for
Controlling Spreading Processes. IEEE Trans. Netw. Sci. Eng. 2(4), 127–38 (2015)
32. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Berlin (2008)
33. Hellwig, K., Kraus, K.: Operations and measurements II. Commun. Math. Phys. 16, 142–147 (1970)
34. Helton, J.W., Nie, J.: Sufficient and necessary conditions for semidefinite representability of convex
hulls and sets. SIAM J. Optim. 20, 759–791 (2009)
35. Holevo, A.S.: The capacity of the quantum channel with general signal states. IEEE Trans. Inf. Theory
44, 269–273 (1998)
36. Hsiung, K.L., Kim, S.J., Boyd, S.: Tractable approximate robust geometric programming. Optim. Eng.
9, 95–118 (2008)
37. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. Ser. II 106, 620–630 (1957)
38. Jerrum, M., Sinclair, A., Vigoda, E.: A polynomial-time approximation algorithm for the permanent
of a matrix with non-negative entries. J. ACM 51, 671–697 (2004)
39. Kulis, B., Sustik, M., Dhillon, I.: Low-rank kernel learning with Bregman matrix divergences. J. Mach.
Learn. Res. 10, 341–376 (2009)
40. Lieb, E.: Convex trace functions and the Wigner–Yanase–Dyson conjecture. Adv. Math. 11, 267–288
(1973)
41. Linial, N., Samorodnitsky, A., Wigderson, A.: A deterministic strongly polynomial algorithm for matrix
scaling and approximate permanents. Combinatorica 20, 545–568 (2000)
42. Lobo, M., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming.
Linear Algebra Appl. 284, 193–228 (1998)
43. Minc, H.: Permanents. Cambridge University Press, Cambridge (1984)
44. Nesterov, Y., Nemirovski, A.: Interior-Point Polynomial Algorithms in Convex Programming. Society
of Industrial and Applied Mathematics, Philadelphia (1994)
45. Nielsen, M., Chuang, I.: Quantum Computation and Quantum Information. Cambridge University
Press, Cambridge (2011)
46. Potchinkov, A.W., Reemsten, R.M.: The design of FIR filters in the complex plane by convex opti-
mization. Signal Process. 46, 127–146 (1995)
47. Prajna, S., Jadbabaie, A.: Safety Verification of Hybrid Systems Using Barrier Certificates. In: Hybrid
Systems: Computation and Control, pp. 477–492. Springer (2004)
48. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
49. Schumacher, B., Westmoreland, M.D.: Sending classical information via noisy quantum channels.
Phys. Rev. A 56, 131–138 (1997)
50. Scott, C.H., Jefferson, T.R.: Trace optimization problems and generalized geometric programming. J.
Math. Anal. Appl. 58, 373–377 (1977)
51. Shor, P.W.: Capacities of quantum channels and how to find them. Math. Program. B 97, 311–335
(2003)
52. Thompson, C.J.: Inequality with applications in statistical mechanics. J. Math. Phys. 6, 1812–1813
(1965)
53. Valiant, L.: The complexity of computing the permanent. Theor. Comput. Sci. 8, 189–201 (1979)
54. Vandenberghe, L., Boyd, S., Wu, S.: Determinant maximization with linear matrix inequality con-
straints. SIAM J. Matrix Anal. Appl. 19, 499–533 (1998)
55. Wall, T., Greening, D., Woolsey, R.: Solving complex chemical equilibria using a geometric program-
ming based technique. Oper. Res. 34, 345–355 (1986)
56. Yazarel, H., Pappas, G.: Geometric programming relaxations for linear system reachability. In: Pro-
ceedings of the American Control Conference (2004)