
Math. Program., Ser. A (2017) 161:1–32
DOI 10.1007/s10107-016-0998-2

FULL LENGTH PAPER

Relative entropy optimization and its applications

Venkat Chandrasekaran1 · Parikshit Shah2

Received: 14 January 2015 / Accepted: 22 February 2016 / Published online: 1 April 2016
© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2016

Abstract In this expository article, we study optimization problems specified via linear and relative entropy inequalities. Such relative entropy programs (REPs) are convex optimization problems, as the relative entropy function is jointly convex with respect to both its arguments. Prominent families of convex programs such as geometric programs (GPs), second-order cone programs, and entropy maximization problems are special cases of REPs, although REPs are more general than these classes of problems. We provide solutions based on REPs to a range of problems such as permanent maximization, robust optimization formulations of GPs, and hitting-time estimation in dynamical systems. We survey previous approaches to some of these problems and the limitations of those methods, and we highlight the more powerful generalizations afforded by REPs. We conclude with a discussion of quantum analogs of the relative entropy function, including a review of the similarities and distinctions with respect to the classical case. We also describe a stylized application of quantum relative entropy optimization that exploits the joint convexity of the quantum relative entropy function.

Keywords Matrix permanent · Robust optimization · Dynamical systems · Quantum information · Quantum channel capacity · Shannon entropy · Von-Neumann entropy · Araki–Umegaki relative entropy · Golden–Thompson inequality · Optimization over non-commuting variables

Venkat Chandrasekaran
[email protected]
Parikshit Shah
[email protected]

1 Departments of Computing and Mathematical Sciences and of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
2 Wisconsin Institutes for Discovery, University of Wisconsin, Madison, WI 53715, USA


Mathematics Subject Classification 81P45 · 90C25 · 94A15 · 94A17


1 Introduction

The relative entropy d(ν, λ) = Σ_{i=1}^n ν_i log(ν_i/λ_i) between two nonnegative vectors ν, λ ∈ R^n_+ plays a prominent role in information theory and in statistics, in the characterization of the performance of a variety of inferential procedures as well as in proofs of a number of fundamental inequalities [16]. In this expository article, we focus on the computational aspects of this function by considering relative entropy programs in which the objective and constraints are specified in terms of linear and relative entropy inequalities.
1.1 Relative entropy programs

Formally, relative entropy programs (REPs) are conic optimization problems in which a linear functional of a decision variable is minimized subject to linear constraints as well as a conic constraint specified by a relative entropy cone. The relative entropy cone is defined for triples (ν, λ, δ) ∈ R^n × R^n × R^n via Cartesian products of the following elementary cone RE_1 ⊂ R^3:

RE_1 = { (ν, λ, δ) ∈ R_+ × R_+ × R | ν log(ν/λ) ≤ δ }.    (1)

As the function ν log(ν/λ) is the perspective of the negative logarithm, which is a convex function, this cone is convex. More generally, the relative entropy cone RE_n ⊂ R^{3n} is defined as follows:

RE_n = RE_1^{⊗n} = { (ν, λ, δ) ∈ R^n_+ × R^n_+ × R^n | ν_i log(ν_i/λ_i) ≤ δ_i, ∀i }.    (2)

Sublevel sets of the relative entropy function d(ν, λ) between two nonnegative vectors ν, λ ∈ R^n_+ can be represented using the relative entropy cone RE_n and a linear equality as follows:

d(ν, λ) ≤ t ⇔ ∃δ ∈ R^n s.t. (ν, λ, δ) ∈ RE_n, 1ᵀδ = t.

REPs can be solved efficiently to a desired accuracy via interior-point methods due to the existence of computationally tractable barrier functions for the convex function ν log(ν/λ) for ν, λ ≥ 0 [44].
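As a quick numerical illustration of cone membership and the sublevel-set representation above, here is a minimal Python sketch (the vectors are arbitrary illustrative values; `scipy.special.rel_entr` computes ν_i log(ν_i/λ_i) elementwise):

```python
import numpy as np
from scipy.special import rel_entr  # rel_entr(v, l)[i] = v[i]*log(v[i]/l[i])

# Two nonnegative vectors and their relative entropy d(nu, lam).
nu = np.array([0.2, 0.5, 0.3])
lam = np.array([0.4, 0.4, 0.2])
d = rel_entr(nu, lam).sum()

# Membership in RE_n: (nu, lam, delta) lies in the cone iff
# nu_i * log(nu_i / lam_i) <= delta_i for every coordinate i.
def in_RE(nu, lam, delta, tol=1e-12):
    return bool(np.all(rel_entr(nu, lam) <= delta + tol))

# Sublevel-set representation: d(nu, lam) <= t iff some delta satisfies
# (nu, lam, delta) in RE_n with 1'delta = t.  The elementwise relative
# entropies themselves witness this for t = d(nu, lam).
delta = rel_entr(nu, lam)
assert in_RE(nu, lam, delta) and abs(delta.sum() - d) < 1e-12
```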
The relative entropy cone RE_1 is a reparametrization of the exponential cone {(ν, λ, δ) ∈ R_+ × R_+ × R | exp(δ/ν) ≤ λ/ν}, which has been studied previously [24]. One can easily see this based on the following relation for any (ν, λ, δ) ∈ R_+ × R_+ × R:

exp(δ/ν) ≤ λ/ν ⇔ (ν, λ, −δ) ∈ RE_1.

However, we stick with our description (1) of RE_1 because it leads to a transparent generalization in the sequel to quantum relative entropy optimization problems (see Sect. 1.3 for a brief introduction and Sect. 4 for more details).


REPs offer a common generalization of a number of prominent families of convex optimization problems such as geometric programming (GP) [11,20], second-order cone programming (SOCP) [7,42], and entropy maximization.

GPs as a special case of REPs A GP in convex form is an optimization problem in which the objective and the constraints are specified by positive sums of exponentials:

inf_{x ∈ R^n}  Σ_{i=1}^k c_i^(0) exp(a^(i)ᵀ x)
s.t.  Σ_{i=1}^k c_i^(j) exp(a^(i)ᵀ x) ≤ 1,  j = 1, …, m.    (3)

Here the exponents a^(i) ∈ R^n, i = 1, …, k and the coefficients c^(j) ∈ R^k_+, j = 0, …, m are fixed parameters. Applications of GPs include the computation of information-theoretic quantities [15], digital circuit gate sizing [10], chemical process control [55], matrix scaling and approximate permanent computation [41], entropy maximization problems in statistical learning [18], and power control in communication systems [14]. The GP above can be reformulated as follows based on the observation that each coefficient vector c^(j) consists of nonnegative entries:

inf_{x ∈ R^n, y,z ∈ R^k}  c^(0)ᵀ z
s.t.  c^(j)ᵀ z ≤ 1,  j = 1, …, m    (4)
      y_i ≤ log(z_i),  i = 1, …, k
      a^(i)ᵀ x = y_i,  i = 1, …, k.

The set described by the constraints y_i ≤ log(z_i), i = 1, …, k can be specified using the relative entropy cone and affine constraints as (1, z, −y) ∈ RE_k, and consequently GPs are a subclass of REPs. We also refer the reader to [24], in which GPs are viewed as a special case of optimization problems involving the exponential cone.
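To make the convex form (3)–(4) concrete, here is a minimal sketch that solves a toy GP in convex form with a generic solver (the instance is ours, chosen purely for illustration; in practice one would use a dedicated GP/REP solver):

```python
import numpy as np
from scipy.optimize import minimize

# A tiny GP in convex form (illustrative instance, not from the paper):
#   minimize   exp(x1) + exp(x2)
#   subject to exp(-x1 - x2) <= 1.
# Exponents: a^(1) = e1, a^(2) = e2 in the objective, a = (-1,-1) in the
# constraint; all coefficients equal 1.
obj = lambda x: np.exp(x[0]) + np.exp(x[1])
cons = [{"type": "ineq", "fun": lambda x: 1.0 - np.exp(-x[0] - x[1])}]
res = minimize(obj, x0=np.array([1.0, -0.5]), constraints=cons)

# By symmetry the optimum is x = (0, 0) with optimal value 2.
assert res.success and abs(res.fun - 2.0) < 1e-4
```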

SOCPs as a special case of REPs An SOCP is a conic optimization problem with respect to the following second-order, or Lorentz, cone [7] in R^{k+1}:

L_k = { (x, y) ∈ R^k × R | √(x_1² + ⋯ + x_k²) ≤ y }.    (5)

Applications of SOCPs include filter design in signal processing [46], portfolio optimization [42], truss system design [4], and robust least-squares problems in statistical inference [22]. It is well-known that any second-order cone L_k for k ≥ 2 can be represented via suitable Cartesian products of L_2 [6]. In the “Appendix”, we show that the second-order cone L_2 ⊂ R^3 can be specified using linear and relative entropy inequalities:

L_2 = { (x, y) ∈ R^2 × R | y − x_1 ≥ 0, y + x_1 ≥ 0, and ∃ν ∈ R_+ s.t.
        ν log(ν/(y + x_1)) + ν log(ν/(y − x_1)) − 2ν ≤ 2x_2
        ν log(ν/(y + x_1)) + ν log(ν/(y − x_1)) − 2ν ≤ −2x_2 }.    (6)

Consequently, SOCPs are also a special case of REPs.
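A numerical sanity check of (6) (our own sketch; the formal derivation is in the paper's Appendix): minimizing the common left-hand side of the two inequalities over ν ≥ 0 gives ν* = √((y + x₁)(y − x₁)) with minimum value −2√(y² − x₁²), so a witness ν exists exactly when x₁² + x₂² ≤ y²:

```python
import numpy as np

# f(nu) is the left-hand side of both inequalities in (6).  Its minimum
# over nu >= 0 is -2*sqrt(y^2 - x1^2), attained at nu* = sqrt((y+x1)(y-x1)),
# so a witness nu exists iff |x2| <= sqrt(y^2 - x1^2), i.e. (x, y) in L_2.
def f(nu, x1, y):
    return nu * np.log(nu / (y + x1)) + nu * np.log(nu / (y - x1)) - 2 * nu

rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2, y = rng.uniform(-2, 2, size=3)
    if y <= abs(x1):          # (6) requires y - |x1| >= 0; skip these
        continue
    nu_star = np.sqrt((y + x1) * (y - x1))
    witness_exists = f(nu_star, x1, y) <= -2 * abs(x2) + 1e-12
    in_L2 = np.hypot(x1, x2) <= y + 1e-12
    assert witness_exists == in_L2
```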


Recall that linear programs (LPs) are a special case both of GPs and of SOCPs; as
a result, we have that REPs also contain LPs as a special case. The relation between
semidefinite programs (SDPs) and REPs is less clear. It is still an open question
whether REPs contain SDPs as a special case. In the other direction, SDPs do not
contain REPs as a special case; this follows from the observation that the boundary
of the constraint set of an REP is not algebraic in general, whereas constraint sets of
SDPs have algebraic boundaries.
As an illustration of the consequence of the observation that REPs contain GPs
and SOCPs as special cases, we note that a variety of regularized logistic regression
problems in machine learning can be recast as REPs. Logistic regression methods are widely used for identifying classifiers f : R^k → {−1, +1} of data points in R^k based on a collection of labelled observations {(x^(i), y^(i))}_{i=1}^n ⊂ R^k × {−1, +1} [17,32]. To identify good classification functions of the form f(x) = sign(wᵀx + b) for some parameters w ∈ R^k, b ∈ R, regularized logistic regression techniques entail the solution of convex relaxation problems of the following form:

inf_{w ∈ R^k, b ∈ R}  Σ_{i=1}^n log( 1 + exp{ −y^(i) (wᵀx^(i) + b) } ) + μ r(w)

for μ > 0 and for suitable regularization functions r : R^k → R. Examples of convex regularization functions that are widely used in practice include r(w) = ‖w‖_1 and r(w) = ‖w‖_2². Sublevel sets of the logistic loss function log(1 + exp{a}) can be represented within the GP framework. In particular, for any (a, t) ∈ R², we have that:

log(1 + exp{a}) ≤ t ⇔ exp{−t} + exp{a − t} ≤ 1.

On the other hand, sublevel sets of the regularizers ‖w‖_1 and ‖w‖_2² can be described via LPs and SOCPs. Consequently, regularized logistic regression problems with the ℓ_1-norm or the squared-ℓ_2-norm regularizer are examples of REPs.
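The equivalence is elementary, since exp{−t} + exp{a − t} = (1 + exp{a}) exp{−t}; a quick numerical check:

```python
import numpy as np

# Check: log(1 + exp(a)) <= t  iff  exp(-t) + exp(a - t) <= 1,
# because exp(-t) + exp(a - t) = (1 + exp(a)) * exp(-t).
rng = np.random.default_rng(1)
for a, t in rng.uniform(-5, 5, size=(1000, 2)):
    lhs = np.log1p(np.exp(a)) <= t
    rhs = np.exp(-t) + np.exp(a - t) <= 1.0
    assert lhs == rhs
```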

1.2 Applications of relative entropy optimization

Although REPs contain GPs and SOCPs as special cases, the relative entropy frame-
work is more general than either of these classes of convex programs. Indeed, REPs
are useful for solving a range of problems to which these other classes of convex


programs are not directly applicable. We illustrate the utility of REPs in the following
diverse application domains:
1. Permanent maximization Given a collection of matrices, find the one with the
largest permanent. Computing the permanent of a matrix is a well-studied problem
that is believed to be computationally intractable. Therefore, we seek approximate
solutions to the problem of permanent maximization.
2. Robust GPs The solution of a GP (3) is sensitive to the input parameters of the prob-
lem. Compute GPs within a robust optimization framework so that the solutions
offer robustness to variability in the problem parameters.
3. Hitting-times in dynamical systems Given a linear dynamical system consisting of
multiple modes and a region of feasible starting points, compute the smallest time
required to hit a specified target set from an arbitrary feasible starting point.
We give a formal mathematical description of each of these problems in Sects. 2
and 3, and we describe computationally tractable solutions based on REPs. Variants of
these questions as well as more restrictive formulations than the ones we consider have
been investigated in previous work, with some techniques proposed for solving these
problems based on GPs. We survey this prior literature, highlighting the limitations
of previous approaches and emphasizing the more powerful generalizations afforded
by the relative entropy formalism.
In Sect. 2, we describe an approximation algorithm for the permanent maximization
problem based on an REP relaxation. We bound the quality of the approximation
provided by our method via Van Der Waerden’s inequality. We contrast our discussion
with previous work in which GPs have been employed to approximate the permanent
of a fixed matrix [41]. In Sect. 3 we describe an REP-based method for robust GPs
in which the coefficients c( j) ’s and the exponents a(i) ’s in a GP (3) are not known
precisely, but instead lie in some known uncertainty set. Our technique enables exact
and tractable solutions of robust GPs for a significantly broader class of uncertainty
sets than those considered in prior work [5,36]. We illustrate the power of these REP-
based approaches for robust GPs by employing them to compute hitting-times in
dynamical systems (these problems can be recast as robust GPs). In recent work Han
et al. [31] have also employed our reformulations of robust GPs (based on an earlier
preprint of our work [12]) to optimally allocate resources to control the worst-case
spread of epidemics in a network; here the exact network is unknown and it belongs
to an uncertainty set.
As yet another application of REPs, we note that REP-based relaxations are useful
for obtaining bounds on the optimal value in signomial programming problems [20].
Signomial programs are a generalization of GPs in which the entries of the coefficient
vectors c( j) ’s in (3) in both the objective and the constraints can take on positive and
negative values. Sums of exponentials in which the coefficients are both positive and
negative are no longer convex functions. As such, signomial programs are intractable
to solve in general (unlike GPs), and NP-hard problems can be reduced to special cases
of signomial programs. In a separate paper [13], we describe a family of tractable REP-
based convex relaxations for computing lower bounds in general signomial programs.
In the present manuscript we do not discuss this application further, and we refer the
interested reader to [13] for more details.


1.3 From relative entropy to quantum relative entropy

Building on our discussion of relative entropy optimization problems and their applications, we consider optimization problems specified in terms of the quantum relative entropy function in Sect. 4. The quantum relative entropy function is a matrix generalization of the function ν log(ν/λ), and the domain of each of its arguments is the cone of positive semidefinite matrices:

D(N, Λ) = Tr[N log(N) − N log(Λ)].    (7)

Here log refers to the matrix logarithm. As this function is convex and positively homogenous, its epigraph defines a natural matrix analog of the relative entropy cone RE_1 from (1):

QRE_d = { (N, Λ, δ) ∈ S^d_+ × S^d_+ × R | D(N, Λ) ≤ δ }.    (8)

Here S^d_+ denotes the cone of d × d positive semidefinite matrices in the space S^d ≅ R^{(d²+d)/2} of d × d symmetric matrices. Hence the convex cone QRE_d ⊂ S^d × S^d × R ≅ R^{d²+d+1}, and the case d = 1 corresponds to RE_1. Quantum relative entropy programs (QREPs) are conic optimization problems specified with respect to the quantum relative entropy cone QRE_d.
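Concretely, D(N, Λ) can be evaluated with the matrix logarithm; the following minimal sketch (our illustration, using `scipy.linalg.logm`) checks that the case d = 1 reduces to ν log(ν/λ) and that D is nonnegative on unit-trace positive definite matrices (Klein's inequality):

```python
import numpy as np
from scipy.linalg import logm

# Quantum relative entropy D(N, L) = Tr[N log N - N log L] for
# positive definite N, L, with log the matrix logarithm.
def qre(N, L):
    return np.trace(N @ logm(N) - N @ logm(L)).real

# d = 1 reduces to the scalar relative entropy nu*log(nu/lambda):
# here 2*log(2/0.5) = 2*log(4).
assert abs(qre(np.array([[2.0]]), np.array([[0.5]])) - 2 * np.log(4.0)) < 1e-9

# Klein's inequality: D(N, L) >= 0 for unit-trace positive definite
# matrices (density matrices), with equality iff N = L.
rng = np.random.default_rng(2)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
N = A @ A.T + np.eye(3)
L = B @ B.T + np.eye(3)
Nn, Ln = N / np.trace(N), L / np.trace(L)
assert qre(Nn, Ln) >= -1e-9 and abs(qre(Nn, Nn)) < 1e-9
```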
The focus of Sect. 4 is on applications of QREPs. Indeed, a broader objective of this
expository article is to initiate an investigation of the utility of the quantum relative
entropy function from a mathematical programming perspective. We begin by survey-
ing previously-studied applications of Von-Neumann entropy optimization, which is a
class of convex programs obtained by restricting the second argument of the quantum
relative entropy function (7) to be the identity matrix. We also describe a Von-Neumann
entropy optimization approach for obtaining bounds on certain capacities associated
with quantum channels; as a demonstration, in Sect. 4.3 we provide a comparison
between a “classical-to-quantum” capacity of a quantum channel and the capacity of a
purely classical channel induced by the quantum channel (i.e., one is restricted to send
and receive only classical information). Finally, we describe a stylized application of
a QREP that exploits the full convexity of the quantum relative entropy function with
respect to both its arguments. This application illustrates some of the key distinctions,
especially in the context of convex duality, between the classical and quantum cases.

1.4 Remarks and paper outline

As a general remark, we note that there are several types of relative entropy functions
(and associated entropies) that have been studied in the literature in both the classical
and quantum settings. In this article, we restrict our attention in the classical case to
the relative entropy function ν log(ν/λ) corresponding to Shannon entropy [16]. In the
quantum setting, the function D(N, Λ) from (7) is called the Araki–Umegaki relative
entropy [45], and it gives the Von-Neumann entropy when suitably restricted.


This paper is structured as follows. In Sect. 2, we describe our REP relaxation for the
permanent maximization problem, and in Sect. 3 we discuss REP-based approaches
for robust GPs and the application of these techniques to the computation of hitting-
times in dynamical systems. In Sect. 4 we describe QREPs and their applications,
highlighting the similarities and distinctions with respect to the classical case. We
conclude with a brief discussion and open questions in Sect. 5.

1.5 Notational convention

The nonnegative orthant in R^n is denoted by R^n_+. The space of n × n symmetric matrices is denoted by S^n (in particular, S^n ≅ R^{n(n+1)/2}), and the cone of n × n positive semidefinite matrices is denoted by S^n_+. For vectors y, z ∈ R^n we denote elementwise ordering by y ≤ z to specify that y_i ≤ z_i for i = 1, …, n. We denote positive semidefinite ordering by ⪯, so that M ⪯ N implies N − M ∈ S^n_+ for symmetric matrices M, N ∈ S^n. Finally, for any real-valued function f with domain D ⊆ R^n, the Fenchel or convex conjugate f* is given by [48]:

f*(ν) = sup{ xᵀν − f(x) | x ∈ D }    (9)

for ν ∈ R^n.

2 An approximation algorithm for permanent maximization via relative entropy optimization

The permanent of a matrix M ∈ R^{n×n} is defined as:

perm(M) ≜ Σ_{σ ∈ S_n} Π_{i=1}^n M_{i,σ(i)},    (10)

where S_n refers to the set of all permutations of elements from the set {1, …, n}. The
matrix permanent arises in combinatorics as the sum over weighted perfect matchings
in a bipartite graph [43], in geometry as the mixed volume of hyperrectangles [8], and
in multivariate statistics in the computation of order statistics [1]. In this section we
consider the problem of maximizing the permanent over a family of matrices. This
problem has received some attention for families of positive semidefinite matrices with
specified eigenvalues [19,28], but all of those works have tended to seek analytical
solutions for very special instances of this problem. Here we consider permanent maxi-
mization over general convex subsets of the set of nonnegative matrices. (As a parallel,
recall that SDPs are useful for maximizing the determinant over an affine section of
the cone of symmetric positive semidefinite matrices [54].) Permanent maximization
is relevant for designing a bipartite network in which the average weighted matching
is to be maximized subject to additional topology constraints on the network. This
problem also arises in geometry in finding a configuration of a set of hyperrectangles
so that their mixed volume is maximized.
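For intuition, and for checking approximations on tiny instances, the definition (10) can be evaluated by brute force; a minimal sketch (O(n · n!) work, consistent with the intractability discussed below):

```python
from itertools import permutations
import numpy as np

# Brute-force evaluation of perm(M) from (10); usable only for tiny n.
def perm(M):
    n = M.shape[0]
    return sum(np.prod([M[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

# The all-ones 3x3 matrix has permanent 3! = 6; the identity has 1.
assert perm(np.ones((3, 3))) == 6
assert perm(np.eye(3)) == 1
```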


2.1 Approximating the permanent of a nonnegative matrix

To begin with, we note that even computing the permanent of a fixed matrix is a #P-hard
problem and it is therefore considered to be intractable in general [53]; accordingly
a large body of research has investigated approximate computation of the permanent
[2,30,38,41]. The class of elementwise nonnegative matrices has particularly received
attention as these arise most commonly in application domains (e.g., bipartite graphs
with nonnegative edge weights). For such matrices, several deterministic polynomial-
time approximation algorithms provide exponential-factor (in the size n of the matrix)
approximations of the permanent, e.g. [41]. The approach in [41] describes a technique
based on solving a GP to approximate the permanent of a nonnegative matrix. The
approximation guarantees in [41] are based on Van Der Waerden’s inequality, which
states that the matrix M = (1/n) 11ᵀ has the smallest permanent among all n × n doubly
stochastic matrices (nonnegative matrices with all row-sums and column-sums equal to
one). This inequality was proved originally in the 1980s [21,23], and Gurvits recently
gave a very simple and elegant proof of this result [29]. In what follows, we describe
the approach of [41], but using the terminology developed by Gurvits in [29].
The permanent of a matrix M ∈ R^{n×n}_+ can be defined in terms of a particular coefficient of the following homogenous polynomial in y ∈ R^n_+:

p_M(y_1, …, y_n) = Π_{i=1}^n ( Σ_{j=1}^n M_{i,j} y_j ).    (11)

Specifically, the permanent of M is the coefficient corresponding to the y_1 ⋯ y_n monomial term of p_M(y_1, …, y_n):

perm(M) = ∂^n p_M(y_1, …, y_n) / (∂y_1 ⋯ ∂y_n).

In his proof of Van Der Waerden's inequality, Gurvits defines the capacity of a homogenous polynomial p(y_1, …, y_n) of degree n over y ∈ R^n_+ as follows:

cap(p) ≜ inf_{y ∈ R^n_+}  p(y) / (y_1 ⋯ y_n)  =  inf_{y ∈ R^n_+, y_1 ⋯ y_n = 1}  p(y).    (12)

Gurvits then proves the following result:

Theorem 1 [29] For any matrix M ∈ R^{n×n}_+, we have that:

(n!/n^n) cap(p_M) ≤ perm(M) ≤ cap(p_M).

Here the polynomial p_M and its capacity cap(p_M) are as defined in (11) and (12). Further, if each column of M has at most k nonzeros, then the factor in the lower bound can be improved from n!/n^n to ((k−1)/k)^{(k−1)n}.


Gurvits in fact proves a more general statement involving so-called stable polynomials, but the above restricted version will suffice for our purposes. The upper bound in this statement is straightforward to prove; it is the lower bound that is the key technical novelty. Thus, if one could compute the capacity of the polynomial p_M associated to a nonnegative matrix M, then one can obtain an exponential-factor approximation of the permanent of M, as n!/n^n ≈ exp(−n). In order to compute the capacity of p_M via GP, we apply the transformation x_i = log(y_i), i = 1, …, n in (12) and solve the following program:¹

log(cap(p_M)) = inf_{x ∈ R^n}  Σ_{i=1}^n log( Σ_{j=1}^n M_{i,j} exp(x_j) )  s.t.  1ᵀx = 0
              = inf_{x,β ∈ R^n}  Σ_{i,j=1}^n M_{i,j} exp(x_j − β_i − 1) + 1ᵀβ  s.t.  1ᵀx = 0.    (13)

Here we obtain the second formulation by appealing to the following easily-verified variational characterization of the logarithm, which holds for γ > 0:

log(γ) = inf_{β ∈ R}  e^{−β−1} γ + β.    (14)
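On small instances, Theorem 1 and the GP formulation (13) can be checked directly; a sketch using a generic solver (scipy) in place of a dedicated GP solver, on a made-up 3 × 3 matrix:

```python
import math
import numpy as np
from itertools import permutations
from scipy.optimize import minimize

# log(cap(p_M)) from the first form in (13): minimize
# sum_i log(sum_j M_ij exp(x_j)) subject to sum(x) = 0.
def log_cap(M):
    n = M.shape[0]
    obj = lambda x: np.sum(np.log(M @ np.exp(x)))
    cons = [{"type": "eq", "fun": lambda x: np.sum(x)}]
    return minimize(obj, np.zeros(n), constraints=cons).fun

def perm(M):  # brute force, only for tiny n
    n = M.shape[0]
    return sum(np.prod([M[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

n = 3
M = np.array([[1.0, 2.0, 1.0],
              [0.5, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
cap = np.exp(log_cap(M))

# Theorem 1: (n!/n^n) * cap(p_M) <= perm(M) <= cap(p_M).
assert math.factorial(n) / n**n * cap <= perm(M) + 1e-3
assert perm(M) <= cap + 1e-3
```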

2.2 An approximation algorithm for permanent maximization

Focusing on a set M ⊂ R^{n×n}_+ of entrywise nonnegative matrices, our proposal to approximately maximize the permanent is to find M ∈ M with maximum capacity cap(p_M), which leads to the following consequence of Theorem 1:

Proposition 1 Let M ⊂ R^{n×n}_+ be a set of nonnegative matrices, and consider the following two quantities:

M̂_perm = arg sup_{M ∈ M} perm(M)
M̂_cap = arg sup_{M ∈ M} cap(p_M).

Then we have that

(n!/n^n) perm(M̂_perm) ≤ (n!/n^n) cap(p_{M̂_cap}) ≤ perm(M̂_perm).

The factor in the lower bound can be improved from n!/n^n to ((k−1)/k)^{(k−1)n} if every matrix in M has at most k nonzeros in each column.

¹ Such a logarithmic transformation of the variables is also employed in converting a GP specified in terms of non-convex posynomial functions to a GP in convex form; see [11,20] for more details.


Proof The proof follows from a direct application of Theorem 1:

(n!/n^n) perm(M̂_perm) ≤ (n!/n^n) cap(p_{M̂_perm})
                       ≤ (n!/n^n) cap(p_{M̂_cap})
                       ≤ perm(M̂_cap)
                       ≤ perm(M̂_perm).

The first and third inequalities are a result of Theorem 1, and the second and fourth inequalities follow from the definitions of M̂_cap and of M̂_perm respectively. The improvement in the lower bound also follows from Theorem 1. □


In summary, maximizing the capacity with respect to the family M gives a matrix in M that approximately maximizes the permanent over all matrices in M. As the computation of the capacity cap(p_M) of a fixed matrix M ∈ M involves the solution of a GP, the maximization of cap(p_M) over the set M involves the maximization of the optimal value of the GP (13) over M ∈ M:

sup_{M ∈ M} log(cap(p_M)) = sup_{M ∈ M} [ inf_{x ∈ R^n, 1ᵀx = 0; β ∈ R^n}  Σ_{i,j=1}^n M_{i,j} exp(x_j − β_i − 1) + 1ᵀβ ].    (15)

Rewriting the inner convex program as an REP (as in Sect. 1.1 in the introduction), we have that:

log(cap(p_M)) = inf_{x,β ∈ R^n; Y ∈ R^{n×n}_+}  Tr(MᵀY) + 1ᵀβ
               s.t.  1ᵀx = 0
                     x_j − β_i − 1 ≤ log(Y_{i,j}),  i, j = 1, …, n.

The dual of this REP reformulation of the GP (13) is the following optimization problem, based on a straightforward calculation:

log(cap(p_M)) = sup_{Λ ∈ R^{n×n}_+}  −Σ_{i,j=1}^n Λ_{i,j} log( Λ_{i,j} / M_{i,j} )
               s.t.  Σ_{j=1}^n Λ_{i,j} = 1,  i = 1, …, n
                     Σ_{i=1}^n Λ_{i,1} = ⋯ = Σ_{i=1}^n Λ_{i,n}.    (16)


The constraint on the last line requires that the column-sums be equal to each other. As the relative entropy function d(Λ, M) = Σ_{i,j=1}^n Λ_{i,j} log(Λ_{i,j}/M_{i,j}) is convex with respect to Λ, this dual optimization problem is a convex program and it can be expressed as an REP.

However, the function d(Λ, M) is jointly convex with respect to (Λ, M). Consequently, one can optimize the objective function in the dual problem (16) above jointly with respect to (Λ, M). This observation leads to the following reformulation of (15):

sup_{M ∈ M} log(cap(p_M)) = sup_{M, Λ ∈ R^{n×n}_+}  −Σ_{i,j=1}^n Λ_{i,j} log( Λ_{i,j} / M_{i,j} )
                            s.t.  M ∈ M
                                  Σ_{j=1}^n Λ_{i,j} = 1,  i = 1, …, n
                                  Σ_{i=1}^n Λ_{i,1} = ⋯ = Σ_{i=1}^n Λ_{i,n}.

If the set M ⊂ R^{n×n}_+ is convex, this problem is a convex program. Further, if M can be represented tractably, then this convex program can be solved efficiently. We record these observations in the following proposition:

Proposition 2 Suppose that M ⊂ R^{n×n}_+ has a conic representation:

M = { M ∈ R^{n×n} | M ∈ R^{n×n}_+, ∃z ∈ R^m s.t. g + L(M, z) ∈ K }

for L : R^{n×n} × R^m → R^ℓ a linear operator, g ∈ R^ℓ, and K ⊂ R^ℓ a convex cone. Then the problem of maximizing capacity with respect to the set M can be reformulated as follows:

sup_{M ∈ M} log(cap(p_M)) = sup_{M, Λ ∈ R^{n×n}_+; z ∈ R^m}  −Σ_{i,j=1}^n Λ_{i,j} log( Λ_{i,j} / M_{i,j} )
                            s.t.  g + L(M, z) ∈ K
                                  Σ_{j=1}^n Λ_{i,j} = 1,  i = 1, …, n
                                  Σ_{i=1}^n Λ_{i,1} = ⋯ = Σ_{i=1}^n Λ_{i,n}.

Suppose the set M has a tractable LP, GP, or SOCP representation so that the convex
cone K in the above proposition is the relative entropy cone. Then the convex program
in Proposition 2 is an REP that can be solved efficiently. If the set M has a tractable
SDP representation, then K is the cone of positive semidefinite matrices; in such cases,
the program in Proposition 2 is no longer an REP, but it can still be solved efficiently
via interior-point methods by combining logarithmic barrier functions for the positive
semidefinite cone and for the relative entropy cone [44].

3 Robust GP and hitting-time estimation in dynamical systems

As our next application, we describe the utility of REPs in addressing the problem of computing solutions to GPs that are robust to uncertainty in the input parameters of the GP. Robust GPs arise in power control problems in communication systems [14] as well as in robust digital circuit gate sizing [10]. Specifically, a robust GP is an optimization problem in which a positive sum of exponentials is minimized subject to affine constraints and constraints of the following form in a decision variable x ∈ R^n:

sup_{[c, a^(1), …, a^(k)] ∈ U}  Σ_{i=1}^k c_i exp(a^(i)ᵀ x) ≤ 1.    (17)

Here the set U ⊂ R^k_+ × R^{nk} specifies the possible uncertainty in the coefficients c and in the exponents a^(1), …, a^(k). In principle, constraints of the type (17) specify convex sets in a decision variable x because they can be viewed as the intersection of a (possibly infinite) collection of convex constraints. However, this observation does not lead directly to efficient methods for the numerical solution of robust GPs, as constraints of the form (17) are not tractable to describe in general. For example, if U is a finite set then the constraint (17) reduces to a finite collection of constraints on positive sums of exponentials; however, if U consists of infinitely many elements, then the expression sup_{[c, a^(1), …, a^(k)] ∈ U} Σ_{i=1}^k c_i exp(a^(i)ᵀ x) may not be efficiently specified in closed form. Thus, the objective in robust GPs is to obtain tractable reformulations of constraints of the form (17) via a small, finite number of inequalities with a tractable description.
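For the finite-U case just mentioned, the reduction to finitely many sum-of-exponentials constraints is immediate; a toy sketch (the instance is ours, purely for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Finite uncertainty set: one sum-of-exponentials constraint per scenario.
#   minimize exp(x1) + exp(x2)
#   s.t.     c1*exp(a1'x) + c2*exp(a2'x) <= 1  for every (c, a1, a2) in U.
U = [
    (np.array([0.5, 0.5]), np.array([-1.0, 0.0]), np.array([0.0, -1.0])),
    (np.array([0.6, 0.4]), np.array([-2.0, 0.0]), np.array([0.0, -1.0])),
]
obj = lambda x: np.exp(x[0]) + np.exp(x[1])
cons = [{"type": "ineq",
         "fun": lambda x, c=c, a1=a1, a2=a2:
             1.0 - (c[0] * np.exp(a1 @ x) + c[1] * np.exp(a2 @ x))}
        for (c, a1, a2) in U]
res = minimize(obj, x0=np.array([1.0, 1.0]), constraints=cons)

# The solution satisfies every scenario's constraint simultaneously.
assert res.success
for con in cons:
    assert con["fun"](res.x) >= -1e-6
```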
Such optimization problems in which one seeks solutions that are robust to parame-
ter uncertainty have been extensively investigated in the field of robust optimization
[3,5], and exact, tractable reformulations of robust convex programs are available in
a number of settings, e.g., for robust LPs. However, progress has been limited in the
context of robust GPs. We discuss prior work on this problem in Sect. 3.1, highlighting
some of the shortcomings, and we describe the more powerful generalizations afforded
by REP-based reformulations in Sect. 3.2. In Sect. 3.3, we illustrate the utility of these
reformulations in estimating hitting-times in dynamical systems.

3.1 GP-based reformulations of robust GPs

In their seminal work on robust convex optimization [5], Ben-Tal and Nemirovski obtained an exact, tractable reformulation of robust GPs in which a very restricted form of coefficient uncertainty is allowed in a positive sum-of-exponentials function: specifically, they assume that the uncertainty set U decomposes as U = C × {ā^(1)} × ⋯ × {ā^(k)}, where each ā^(i) ∈ R^n, i = 1, …, k, and C ⊂ R^k_+ is a convex ellipsoid in R^k_+ specified by a quadratic form defined by an elementwise nonnegative matrix. Thus, the exponents are assumed to be known exactly (i.e., there is no uncertainty in these exponents), and the uncertainty in the coefficients is specified by a very particular type of ellipsoidal set. The reformulation given in [5] for such robust GPs is itself a GP with additional variables, which can be solved efficiently.
In subsequent work, Hsiung et al. [36] considered sum-of-exponentials functions with the coefficients absorbed into the exponent as follows:

sup_{[d, a^(1), …, a^(k)] ∈ D}  Σ_{i=1}^k exp(a^(i)ᵀ x + d_i) ≤ 1.

For such constraints with D being either a polyhedral set or an ellipsoidal set, Hsiung et al. [36] obtain tractable but inexact reformulations via piecewise-linear approximations, with the reformulations again being GPs.
A reason for the limitations in these previous works—very restrictive forms of
uncertainty sets in [5] and inexact reformulations in [36]—is that they considered
GP-based reformulations of the inequality (17). In the next section, we discuss exact
and tractable REP-based reformulations of the inequality (17) for a general class of
uncertainty sets.

3.2 Relative entropy reformulations of robust GPs

We describe exact and efficient REP-based reformulations of robust GPs for settings in which the uncertainty set U is decoupled as follows:

U = C × E^(1) × ⋯ × E^(k),    (18)

where C ⊂ R^k_+ specifies uncertainty in the coefficients and each E^(i) ⊂ R^n, i = 1, …, k specifies uncertainty in the i'th exponent. Although this decomposition imposes a restriction on the types of uncertainty sets U we consider, our methodology allows for flexible specifications of the sets C and E^(1), …, E^(k), as each of these can essentially be arbitrary convex sets with a tractable description.
For such forms of uncertainty, the following proposition provides an exact and
efficient REP reformulation of (17) based on an appeal to convex duality:

Proposition 3 Suppose C ⊂ R^k is a convex set of the form:

C = {c ∈ R^k | ∃z ∈ R^{m_C} s.t. c ≥ 0, g + F_C(c) + H_C(z) ∈ K_C}

for a convex cone K_C ⊂ R^{ℓ_C}, a linear map F_C : R^k → R^{ℓ_C}, a linear map H_C : R^{m_C} → R^{ℓ_C}, and g ∈ R^{ℓ_C}, and suppose each E^(i) ⊂ R^n for i = 1, ..., k is a set of the form:

E^(i) = { q ∈ R^n | ∃z ∈ R^{m_i} s.t. g^(i) + F^(i)(q) + H^(i)(z) ∈ K^(i) }

for convex cones K^(i) ⊂ R^{ℓ_i}, linear maps F^(i) : R^n → R^{ℓ_i}, linear maps H^(i) : R^{m_i} → R^{ℓ_i}, and g^(i) ∈ R^{ℓ_i}. Further, assume that there exists a point (c̄, z̄) ∈ R^k × R^{m_C} satisfying the conic and nonnegativity constraints in C strictly, and similarly that there exists a strictly feasible point in each E^(i). Then we have that x ∈ R^n satisfies the constraint (17) with U = C × E^(1) × ··· × E^(k) if and only if there exist ζ ∈ R^{ℓ_C}, θ^(i) ∈ R^{ℓ_i} for i = 1, ..., k such that:


g'ζ ≤ 1,  ζ ∈ K_C^*,  H_C^†(ζ) = 0,  and for i = 1, ..., k,

F^(i)†(θ^(i)) + x = 0,  H^(i)†(θ^(i)) = 0,  g^(i)'θ^(i) ≤ log(−[F_C^†(ζ)]_i),  θ^(i) ∈ K^(i)*.


Note Here F_C^†, F^(i)†, H_C^†, and H^(i)† represent the adjoints of the operators F_C, F^(i), H_C, and H^(i), respectively. The cones K_C^* and K^(i)* denote the duals of the cones K_C and K^(i), respectively. The assumption that there exist points that strictly satisfy the constraints specifying C and those specifying each E^(i) allows us to appeal to strong duality in deriving our result [48]. We give a concrete illustration of the power of this result in the sequel as well as an application to hitting-time estimation in dynamical systems in Sect. 3.3.

Proof The constraint (17) can be reformulated as follows for uncertainty sets U that
are decomposable according to (18):

∃y ∈ R^k s.t. sup_{c∈C} y'c ≤ 1, and sup_{a^(i)∈E^(i)} a^(i)'x ≤ log(y_i), i = 1, ..., k.

This restatement is possible because the set C is a subset of the nonnegative orthant
Rk+ , and because the uncertainty sets E (i) are decoupled (from C and from each other)
and are therefore independent of each other. The first expression, sup_{c∈C} y'c ≤ 1, is a universal quantification for all c ∈ C. In order to convert this universal quantifier to an
existential quantifier, we appeal to convex duality as is commonly done in the theory
of robust optimization [3,5]. Specifically, by noting that C has a conic representation
and by appealing to conic duality [7], we have that:

∀c ∈ C, y'c ≤ 1
⇔ ∃ζ ∈ R^{ℓ_C} s.t. F_C^†(ζ) + y ≤ 0, H_C^†(ζ) = 0, ζ ∈ K_C^*, g'ζ ≤ 1. (19)

Similarly, we have that:

∀a^(i) ∈ E^(i), a^(i)'x ≤ log(y_i)
⇔ ∃θ^(i) ∈ R^{ℓ_i} s.t. F^(i)†(θ^(i)) + x = 0, H^(i)†(θ^(i)) = 0, θ^(i) ∈ K^(i)*, g^(i)'θ^(i) ≤ log(y_i). (20)

The assumptions on strict feasibility are required to derive (19) and (20). Combining these results and eliminating y, we have the desired conclusion. □

In summary, Proposition 3 gives an extended formulation for (17) with additional variables. In a similar spirit to Proposition 2, if the sets C and E^(1), ..., E^(k) are representable via LP or SOCP (i.e., the cones K_C and K^(i) are orthants, second-order cones, or relative entropy cones of suitable dimensions), then the inequality (17) can be represented via REP. In other words, robust GPs in which the uncertainty

sets are decomposable according to (18) and are tractably specified as polyhedral or ellipsoidal sets (or more generally, sets that are tractably represented via relative entropy inequalities) can be reformulated exactly and efficiently via REPs. (As before, robust GPs in which the cones K_C and K^(i) are semidefinite cones are not directly reformulated as REPs, but can nonetheless be solved efficiently by combining barrier penalties for the relative entropy cone and the semidefinite cone.)
In contrast to previous results on robust GP, note that the form of uncertainty for
which we obtain an efficient and exact REP reformulation is significantly more general
than that considered in [5], in which C can only be a restricted kind of ellipsoidal set
and each E (i) must be a singleton set (no uncertainty). On the other hand, Hsiung et
al. [36] consider robust GPs with polyhedral or ellipsoidal uncertainty that may be
coupled across E (i) , but their reformulation is inexact. A distinction of our approach
relative to those in [5] and in [36] is that our reformulation is a REP, while those
described in [5] and in [36] are GPs. In particular, note that some of the inequalities
described in the reformulation in Proposition 3 involve a combination of an affine term
and a logarithmic term; such inequalities cannot be represented via GPs, and it is the
additional expressive power provided by REPs that enables the efficient reformulation
and solution of the general class of robust GPs considered here.
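As a small numerical sanity check of the worst-case reduction underlying these reformulations (a sketch with synthetic data of our own choosing, not the REP reformulation itself): when the coefficients c are fixed and each exponent uncertainty set E^(i) is a coordinatewise box, the inner supremum in (17) can be evaluated in closed form, since a linear function attains its maximum over a box at a vertex.

```python
import numpy as np
from itertools import product

# Synthetic instance: k posynomial terms in n variables, with each exponent
# vector a^(i) ranging over a box [lo_i, up_i] and fixed coefficients c.
rng = np.random.default_rng(0)
k, n = 3, 4
c = np.abs(rng.standard_normal(k))
lo = rng.standard_normal((k, n)) - 1.0
up = lo + np.abs(rng.standard_normal((k, n)))
x = rng.standard_normal(n)

# Closed form: sup_{a in [lo_i, up_i]} a'x = sum_j max(lo_ij*x_j, up_ij*x_j)
worst = np.maximum(lo * x, up * x).sum(axis=1)
lhs_closed = float(np.sum(c * np.exp(worst)))

# Brute force over the 2^n vertices of each box (the sup of a linear
# function over a box is attained at a vertex)
def box_sup(lo_i, up_i):
    return max(float(np.where(np.array(s, dtype=bool), up_i, lo_i) @ x)
               for s in product([0, 1], repeat=n))

lhs_brute = float(np.sum(c * np.exp([box_sup(lo[i], up[i]) for i in range(k)])))
```

Both evaluations agree, illustrating why box (and more generally, tractably represented convex) exponent uncertainty admits an exact robust counterpart.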

Example To illustrate these distinctions concretely, consider a specific instance of a robust GP constraint (17) in which the uncertainty set U is decomposable according to (18) as U = C × E^(1) × ··· × E^(k). Further, suppose C ⊂ R^k is a convex set of the form:

C = {c ∈ R^k | c ≥ 0, G + F_C(c) ⪰ 0}

for G ∈ S^ℓ and a linear operator F_C : R^k → S^ℓ, and suppose also that each E^(i) ⊂ R^n for i = 1, ..., k is a set of the form:

E^(i) = { q ∈ R^n | G^(i) + F^(i)(q) ⪰ 0 }

for G^(i) ∈ S^{ℓ_i} and linear operators F^(i) : R^n → S^{ℓ_i}. With these uncertainty sets, we
have from Proposition 3 that
sup_{c∈C, a^(i)∈E^(i)} Σ_{i=1}^k c_i exp( a^(i)'x ) ≤ 1

⇔ ∃Z ∈ S^ℓ and Θ^(i) ∈ S^{ℓ_i}, i = 1, ..., k s.t.
Tr(GZ) ≤ 1, Z ⪰ 0, and for each i = 1, ..., k,

F^(i)†(Θ^(i)) + x = 0, Tr(G^(i)Θ^(i)) ≤ log(−[F_C^†(Z)]_i), Θ^(i) ⪰ 0.

Note that the uncertainty sets C and E (i) are SDP-representable sets. The correspond-
ing robust GP inequality (17) cannot be handled by the previous approaches [5,36]
described in Sect. 3.1, but it can be efficiently reformulated via semidefinite and rela-
tive entropy inequalities.


3.3 Application to hitting-time estimation in dynamical systems

Consider a linear dynamical system with state-space equations as follows:

ẋ(t) = Ax(t), (21)

where the state x(t) ∈ Rn for t ≥ 0. We assume throughout this section that the transi-
tion matrix A ∈ Rn×n is diagonal; otherwise, one can always change to the appropriate
modal coordinates given by the eigenvectors of A (assuming A is diagonalizable). The
diagonal entries of A are called the modes of the system. Suppose that the parameters
of the dynamical system can take on a range of values with A ∈ A and x(0) ∈ Xinitial ;
the set A specifies the set of modes and the set Xinitial ⊆ Rn specifies the set of initial
conditions. Suppose also that we are given a target set Xtarget ⊆ Rn , and we wish
to find the smallest time required for the system to reach a state in Xtarget from an
arbitrary initial state in Xinitial . Formally, we define the worst-case hitting-time of the
dynamical system (21) to be:

τ(Xinitial, Xtarget, A) ≜ inf { t ∈ R_+ | ∀x(0) ∈ Xinitial, A ∈ A we have exp(At)x(0) ∈ Xtarget }. (22)

Indeed, for an initial state x(0), the state of the system (21) at time t is given by x(t) =
exp (At) x(0); consequently, the quantity τ (Xinitial , Xtarget , A) represents the amount
of time that the worst-case trajectory of the system, taken over all initial conditions
Xinitial and mode values A, requires to enter the target set Xtarget . An assumption
underlying this definition is that the target set Xtarget is absorbing so that once a
trajectory enters Xtarget it remains in Xtarget for all subsequent time.
Hitting-times are of interest in system analysis and verification [47]. As an example,
suppose that a system has the property that τ (Xinitial , Xtarget , A) = ∞; this pro-
vides a proof that from certain initial states in Xinitial , the system never enters the
target region Xtarget . On the other hand, if the hitting-time τ (Xinitial , Xtarget , A) = 0,
we have a certificate that Xinitial ⊆ Xtarget. While verification of linear systems has
been extensively studied via Lyapunov and barrier certificate methods, the study of
hitting-times has received relatively little attention, with a few exceptions such as
[56]. In particular, the approaches in [56] can lead to loose bounds as the worst-case
hitting-time is computed based on box-constrained outer approximations of Xinitial and
Xtarget .
For a particular class of dynamical systems, we show next that the hitting-time
can be computed exactly by solving an REP. Specifically, we make the following
assumptions regarding the structure of the dynamical system (21):
– The set of modes is given by

A = { A ∈ R^{n×n} | A diagonal with A_{j,j} ∈ [ℓ_j, u_j] ∀j = 1, ..., n }

with ℓ_j, u_j ≤ 0 ∀j = 1, ..., n.


– The set of initial states Xinitial ⊆ Rn+ is given by a convex set with a tractable
representation via affine and conic constraints. In particular, as in Proposition 3,
Xinitial is specified as follows:

Xinitial = {x ∈ Rn+ | ∃z ∈ Rm s.t. g + F(x) + H(z) ∈ K}. (23)

Here g ∈ R^ℓ, the maps F : R^n → R^ℓ and H : R^m → R^ℓ are linear, and the cone K ⊂ R^ℓ is efficiently described.
– The target set Xtarget ⊂ R^n is representable as the intersection of finitely many halfspaces:

Xtarget = { x ∈ R^n_+ | c^(i)'x ≤ d_i, i = 1, ..., k }

with c^(i) ∈ R^n_+, i = 1, ..., k, and d ∈ R^k_+.
The first condition on the modes of the dynamical system and the subsequent conditions on the structure of the initial and target states ensure that the target set is absorbing. Indeed, as each c^(i) ∈ R^n_+, each A_{j,j} ≤ 0, and Xinitial ⊂ R^n_+, one can check for each i = 1, ..., k and any x(0) ∈ Xinitial that Σ_{j=1}^n c_j^(i) x_j(0) exp(A_{j,j} t_0) ≤ d_i for some t_0 ∈ R_+ implies that Σ_{j=1}^n c_j^(i) x_j(0) exp(A_{j,j} t) ≤ d_i for all t ≥ t_0. Under
these conditions, the worst-case hitting-time τ (Xinitial , Xtarget , A) can be computed
exactly as follows:

τ(Xinitial, Xtarget, A) = inf_{t ≥ 0} t
s.t. sup_{x(0)∈Xinitial, A∈A} Σ_{j=1}^n c_j^(i) x_j(0) exp(A_{j,j} t) ≤ d_i, ∀i. (24)

Each constraint here is a robust GP inequality of the form (17) with the uncertainty set
being decomposable according to (18). Consequently, the hitting-time computation
problem for the particular family of dynamical systems we consider can be reformu-
lated as an REP [possibly with additional conic constraints depending on the cone K
in (23)] by appealing to Proposition 3.

Example Consider a dynamical system with three modes that each take on values in
certain ranges as follows:

A = { A ∈ R^{3×3} | A diagonal, A_{1,1} ∈ [−0.45, −0.4], A_{2,2} ∈ [−0.6, −0.5], A_{3,3} ∈ [−0.7, −0.6] }.

The set of initial states is a shifted elliptope that is contained within the nonnegative orthant:

Xinitial = { x(0) ∈ R^3_+ :
  [ 1            x_1(0) − 3   x_2(0) − 3
    x_1(0) − 3   1            x_3(0) − 3
    x_2(0) − 3   x_3(0) − 3   1          ] ⪰ 0 }.


Fig. 1 Some sample trajectories of a linear system from Xinitial to Xtarget for the dynamical system
described in the example in Sect. 3.3. The (shifted) elliptope specifies the set of feasible starting points,
while the tetrahedron specifies the target set. The system consists of three modes

Finally, we let the target region be a tetrahedron:

Xtarget = { x ∈ R^3_+ | 1'x ≤ 1 }.

For these system attributes, a few sample trajectories are shown in Fig. 1. We solve the
REP (24) and obtain that τ (Xinitial , Xtarget , A) = 7.6253 is the worst-case hitting-time.
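A crude numerical cross-check of this example can be sketched without solving the REP, by rejection-sampling the elliptope and bisecting on t. This is a heuristic of our own (it assumes the worst case over A is attained at the slowest-decaying endpoint of each mode interval, and sampling only lower-bounds the supremum over x(0)), so it need not reproduce the exact REP value reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_elliptope(x):
    # the shifted elliptope from the example
    M = np.array([[1.0, x[0] - 3, x[1] - 3],
                  [x[0] - 3, 1.0, x[2] - 3],
                  [x[1] - 3, x[2] - 3, 1.0]])
    return np.all(np.linalg.eigvalsh(M) >= 0)

# feasible points have off-diagonal entries in [-1, 1], i.e. x_i in [2, 4]
X0 = np.array([x for x in rng.uniform(2, 4, size=(5000, 3)) if in_elliptope(x)])
A_worst = np.array([-0.4, -0.5, -0.6])     # least-negative mode endpoints

def inside_target(t):                       # 1'x(t) <= 1 for every sample?
    return bool(np.all(X0 @ np.exp(A_worst * t) <= 1.0))

lo, hi = 0.0, 100.0                         # bisection on the hitting time
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if inside_target(mid):
        hi = mid
    else:
        lo = mid
tau_estimate = hi
```

The resulting estimate is a sampled lower bound on the worst-case hitting time; the exact value requires the REP reformulation of (24).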

4 Quantum relative entropy optimization

In this section we discuss optimization problems involving the quantum relative


entropy function. Both the classical and the quantum relative entropy cones can be
viewed as limiting cases of more general families of cones known as power cones;
we describe this relationship in Sect. 4.1. Next we survey applications involving the
solution of Von-Neumann entropy optimization problems, which constitute an impor-
tant special class of quantum relative entropy optimization problems. In Sect. 4.3 we
discuss a method to obtain bounds on certain capacities associated to quantum com-
munication channels via Von-Neumann entropy optimization. Finally, in Sect. 4.4 we
consider a matrix analog of a robust GP and the utility of the quantum relative entropy
function in providing lower bounds on the optimal value of this problem; this stylized


problem provides a natural setting to examine the similarities and differences between
the classical and the quantum relative entropy functions.

4.1 Relation to power cones

The relative entropy cone RE_1 ⊂ R^3 can be viewed as a limiting case of a more general family of cones known as power cones [24]. Specifically, for each α ∈ [0, 1], the following map is a convex function on R_+ × R_+:

(ν, λ) → −ν^α λ^{1−α}. (25)

As this map is positively homogeneous, its epigraph gives the following power cone in R^3:

P^α = { (ν, λ, δ) ∈ R_+ × R_+ × R | −ν^α λ^{1−α} ≤ δ }. (26)

Power cones in higher dimensions are obtained by taking Cartesian products of P^α. The relative entropy function can be obtained as a suitable limit of the functions defined in (25):

lim_{α→1} (1/(1−α)) [ −ν^α λ^{1−α} + ν ] = ν log(ν/λ).

Hence the relative entropy cone RE_1 can be viewed as a kind of “extremal” power cone.
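This limit is straightforward to verify numerically; the sketch below (arbitrary positive test values ν = 2, λ = 0.5 of our choosing) evaluates the scaled expression near α = 1.

```python
import numpy as np

nu, lam = 2.0, 0.5                         # arbitrary positive test point

def scaled_power(alpha):
    # the expression inside the limit above
    return (-(nu ** alpha) * (lam ** (1 - alpha)) + nu) / (1 - alpha)

target = nu * np.log(nu / lam)             # nu * log(nu / lambda)
approx = scaled_power(1 - 1e-6)            # evaluate just below alpha = 1
```

As α → 1 the scaled power-cone expression converges to the relative entropy value ν log(ν/λ).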
One can also define a matrix analog of the power cones P α based on the following
theorem due to Lieb [40]. In analogy to the classical case, the quantum relative entropy
cone can also be viewed as a limiting case of these quantum power cones.

Theorem 2 [40] For each fixed X ∈ R^{d×d} and α ∈ [0, 1], the following function is convex on S^d_+ × S^d_+:

(N, Λ) → −Tr(X'N^α X Λ^{1−α}).

This theorem has significant consequences in various mathematical and statistical


aspects of quantum mechanics. As with the map defined in (25), the function considered in this theorem is also positively homogeneous, and consequently its epigraph defines a convex cone analogous to P^α:

QP^α = { (N, Λ, δ) ∈ S^d_+ × S^d_+ × R | −Tr(X'N^α X Λ^{1−α}) ≤ δ }. (27)

As before, the quantum relative entropy function (7) may be viewed as a limiting case as follows:

D(N, Λ) = lim_{α→1} (1/(1−α)) [ −Tr(N^α Λ^{1−α}) + Tr(N) ].

Indeed, this perspective of the quantum relative entropy function offers a proof of its
joint convexity with respect to both arguments, as the function −Tr(Nα Λ1−α )+Tr(N)
is jointly convex with respect to (N, Λ) for each α ∈ [0, 1].
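A lightweight numerical spot check of the convexity assertion in Theorem 2 can be sketched on random test matrices (midpoint convexity only, which is implied by but of course does not prove the theorem; X and the positive definite pairs below are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(0)
d, alpha = 3, 0.3

def rand_pd():
    B = rng.standard_normal((d, d))
    return B @ B.T + 0.1 * np.eye(d)       # random positive definite matrix

def mpow(M, p):                             # matrix power via eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** p) @ V.T

X = rng.standard_normal((d, d))

def f(N, Lam):                              # the function in Theorem 2
    return -np.trace(X.T @ mpow(N, alpha) @ X @ mpow(Lam, 1 - alpha))

N1, Lam1, N2, Lam2 = rand_pd(), rand_pd(), rand_pd(), rand_pd()
mid = f(0.5 * (N1 + N2), 0.5 * (Lam1 + Lam2))
avg = 0.5 * (f(N1, Lam1) + f(N2, Lam2))
```

By joint convexity, the value at the midpoint is at most the average of the endpoint values.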


4.2 Survey of applications of Von-Neumann entropy optimization

An important subclass of QREPs is the family of optimization problems involving the Von-Neumann entropy function, which is obtained by restricting the second argument of the negative quantum relative entropy function to be the identity matrix:

H(N) = −D(N, I) = −Tr[N log(N)]. (28)

Here N is a positive semidefinite matrix and I is the identity matrix of appropriate


dimension. In quantum information theory, the Von-Neumann entropy is typically
considered for positive semidefinite matrices with trace equal to one (i.e., quantum
density matrices), but the function is concave over the full cone of positive semidefinite
matrices.
Based on the next proposition, Von-Neumann entropy optimization problems are
essentially REPs with additional semidefinite constraints:

Proposition 4 [7] Let f : R^n → R be a convex function that is invariant under permutation of its argument, and let g : S^n → R be the convex function defined as g(N) = f(λ(N)). Here λ(N) refers to the list of eigenvalues of N. Then we have that

g(N) ≤ t
⇔ ∃x ∈ R^n s.t.
  f(x) ≤ t
  x_1 ≥ ··· ≥ x_n
  s_r(N) ≤ x_1 + ··· + x_r, r = 1, ..., n − 1
  Tr(N) = x_1 + ··· + x_n,

where s_r(N) is the sum of the r largest eigenvalues of N.

The function s_r is SDP-representable [7]. Further, we observe that the Von-Neumann entropy function H(N) is concave and invariant under conjugation of the argument N by any orthogonal matrix. That is, it is a concave and permutation-invariant function of the eigenvalues of N:

H(N) = −d(λ(N), 1),

where 1 is the all-ones vector. Consequently, convex optimization problems involving linear constraints as well as constraints on the Von-Neumann entropy function H(N) are REPs with additional semidefinite constraints.
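The identity H(N) = −d(λ(N), 1) can be checked directly on a random positive definite matrix (synthetic test data of our own) by computing −Tr[N log N] both spectrally and via the matrix logarithm:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
N = B @ B.T + 1e-3 * np.eye(4)              # random positive definite matrix

w, V = np.linalg.eigh(N)
H_spectral = -float(np.sum(w * np.log(w)))  # -d(lambda(N), 1)

logN = V @ np.diag(np.log(w)) @ V.T         # matrix logarithm of N
H_trace = -float(np.trace(N @ logN))        # -Tr[N log N]
```

Both evaluations agree, since Tr[N log N] = Σ_i λ_i log λ_i.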
Several previously proposed methods for solving problems in varied applica-
tion domains can be viewed as special cases of Von-Neumann entropy optimization
problems, involving the maximization of the Von-Neumann entropy of a positive
semidefinite matrix subject to affine constraints on the matrix. We briefly survey these
methods here.


Quantum state tomography Quantum state tomography arises in the characterization


of optical systems and in quantum computation [26]. The goal is to reconstruct a
quantum state specified by a density matrix given partial information about the state.
Such information is typically provided via measurements of the state, which can be
viewed as linear functionals of the underlying density matrix. As measurements can
frequently be expensive to obtain, tomography is a type of inverse problem in which one
must choose among the many density matrices that satisfy the limited measurement
information that is acquired. One proposal for tomography, based on the original
work of Jaynes [37] on the maximum entropy principle, is to find the Von-Neumann-
entropy-maximizing density matrix among all density matrices consistent with the
measurements [26]—the rationale behind this approach is that the entropy-maximizing
matrix makes the least assumptions about the quantum state beyond the constraints
imposed by the acquired measurements. Using this method, the optimal reconstructed
quantum state can be computed as the solution of a Von-Neumann entropy optimization
problem.

Equilibrium densities in statistical physics In statistical mechanics, a basic objective


is to investigate the properties of a system at equilibrium given information about
macroscopic attributes of the system such as energy, number of particles, volume,
and pressure. In mathematical terms, the situation is quite similar to the previous
setting with quantum state tomography—specifically, the system is described by a
density matrix (as in quantum state tomography, this matrix is symmetric positive
semidefinite with unit trace), and the macroscopic properties can be characterized as
constraints on linear functionals of this density matrix. The Massieu–Planck extremum
principle in statistical physics states that the system at equilibrium is given by the Von-
Neumann-entropy-maximizing density matrix among all density matrices that satisfy
the specified constraints [50]. As before, this equilibrium density can be computed
efficiently via Von-Neumann entropy optimization.

Kernel learning A commonly encountered problem in machine learning is to measure


similarity or affinity between entities that may not belong to a Euclidean space, e.g.,
text documents. A first step in such settings is to specify coordinates for the entities in
a linear vector space after performing some nonlinear mapping, which is followed by a
computation of pairwise similarities in the linear space. Kernel methods approach this
problem implicitly by working directly with inner-products between pairs of entities,
thus combining the nonlinear mapping and the similarity computation in a single step.
This approach leads naturally to the question of learning a good kernel matrix, which
is a symmetric positive semidefinite matrix specifying similarities between pairs of
entities. Kulis et al. [39] propose to minimize the quantum relative entropy between
a kernel M ∈ Sn+ (the decision variable) and a given reference kernel M0 ∈ Sn+
(specifying prior knowledge about the affinities among the n entities):

D(M, M0 ) = −H (M) − Tr[M log(M0 )]. (29)

This minimization is subject to the decision variable M being positive definite as well
as constraints on M that impose bounds on similarities between pairs of entities in


the linear space. We refer the reader to [39] for more details, and for the virtues of
minimizing the quantum relative entropy as a method for learning kernel matrices. In
the context of the present paper, the optimization problems considered in [39] can be
viewed as Von-Neumann entropy maximization problems subject to linear constraints,
as the second argument in the quantum relative entropy function (29) is fixed.
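As a small illustration of (29) on synthetic kernels (random test data, not the experiments of [39]), the quantum relative entropy can be evaluated via eigendecompositions; Klein's inequality, D(M, M0) ≥ 0 for density matrices with D(M, M) = 0, provides a sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_density(d):
    B = rng.standard_normal((d, d))
    M = B @ B.T + 0.05 * np.eye(d)          # well-conditioned PSD matrix
    return M / np.trace(M)                  # normalize to unit trace

def logm(M):                                # matrix logarithm via eigh
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.log(w)) @ V.T

M, M0 = rand_density(4), rand_density(4)
D = float(np.trace(M @ (logm(M) - logm(M0))))      # quantum relative entropy (29)
D_self = float(np.trace(M @ (logm(M) - logm(M))))  # should vanish
```

The value D is nonnegative and vanishes only when the two arguments coincide, which is what makes (29) a sensible discrepancy measure between kernels.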

4.3 Approximating quantum channel capacity via Von-Neumann entropy optimization

In this section we describe a Von-Neumann entropy optimization approach to bound-


ing the capacity of quantum communication channels. A quantum channel transmits
quantum information, and every such channel has an associated quantum capacity that
characterizes the maximum rate at which quantum information can be reliably com-
municated across the channel. We refer the reader to the literature for an overview of
this topic [35,45,49,51]. The input and the output of a quantum channel are quantum
states that are specified by quantum density matrices. Formally, a quantum channel is
characterized by a linear operator L : Sn → Sk that maps density matrices to density
matrices; the dimensions of the input and output density matrices may, in general,
be different. For the linear operator L to specify a valid quantum channel, it must
map positive semidefinite matrices in Sn to positive semidefinite matrices in Sk and it
must be trace-preserving. These properties ensure that density matrices are mapped to
density matrices. In addition, L must satisfy a property known as complete-positivity,
which is required by the postulates of quantum mechanics—see [45] for more details.
Any linear map L : S^n → S^k that satisfies all these properties can be expressed via matrices² A^(j) ∈ R^{k×n} as follows [33]:

L(ρ) = Σ_j A^(j) ρ A^(j)'  where  Σ_j A^(j)' A^(j) = I.

Letting S^n denote the n-sphere, the capacity of the channel specified by L : S^n → S^k is defined as:

C(L) ≜ sup_{v^(i) ∈ S^n; p_i ≥ 0, Σ_i p_i = 1}  { H[ L( Σ_i p_i v^(i)v^(i)' ) ] − Σ_i p_i H[ L( v^(i)v^(i)' ) ] }. (30)

Here H is the Von-Neumann entropy function (28) and the number of states v(i) is
part of the optimization.
We note that there are several kinds of capacities associated with a quantum channel,
depending on the protocol employed for encoding and decoding states transmitted
across the channel; the version considered here is the one most commonly investigated,
and we refer the reader to Shor’s survey [51] for details of the other types of capacities.

2 In full generality, density matrices are trace-one, positive semidefinite Hermitian matrices, and A( j) ∈
Ck×n . As with SDPs, the Von-Neumann entropy optimization framework can handle linear matrix inequal-
ities on Hermitian matrices, but we stick with the real case for simplicity.


The quantity C(L) (30) on which we focus here is called the C1,∞ quantum capacity—
roughly speaking, it is the capacity of a quantum channel in which the sender cannot
couple inputs across multiple uses of the channel, while the receiver can jointly decode
messages received over multiple uses of the channel.

Shor’s approach for lower bounds via LP In [51] Shor describes a procedure based on LP to provide lower bounds on C(L). As the first step of this method, one fixes a finite set of states {v^(i)}_{i=1}^m with each v^(i) ∈ S^n and a density matrix ρ ∈ S^n, so that ρ is in the convex hull conv({v^(i)v^(i)'}_{i=1}^m). With these quantities fixed, one can obtain a lower bound on C(L) by solving the following LP:

C(L, {v^(i)}_{i=1}^m, ρ) = sup_{p ∈ R^m_+; Σ_i p_i = 1; ρ = Σ_i p_i v^(i)v^(i)'}  H[L(ρ)] − Σ_{i=1}^m p_i H[ L( v^(i)v^(i)' ) ]  (31)

In [51] Shor also suggests local heuristics to search for better sets of states and density matrices to improve the lower bound (31). It is clear that C(L, {v^(i)}_{i=1}^m, ρ) as computed in (31) is a lower bound on C(L). Indeed, one can check that:

C(L) = sup_{v^(i) ∈ S^n}  sup_{ρ ∈ conv({v^(i)v^(i)'})}  C(L, {v^(i)}, ρ). (32)

Here the number of states is not explicitly denoted, and it is also a part of the opti-
mization.

Improved lower bounds via Von-Neumann entropy optimization In the computation of the lower bound by Shor’s approach (31), the density matrix ρ remains fixed and the optimization is over decompositions of ρ as a convex combination of elements from the set {v^(i)v^(i)'}_{i=1}^m. We describe an improvement upon Shor’s lower bound (31) by further optimizing over the set of density matrices ρ in the convex hull conv({v^(i)v^(i)'}_{i=1}^m). Our improved lower bound entails the solution of a Von-Neumann entropy optimization problem. Specifically, we observe that for a fixed set of states {v^(i)}_{i=1}^m the following optimization problem involves Von-Neumann entropy maximization subject to affine constraints:
  m    m 
C L, v(i) = 
sup m
C L, v(i) ,ρ
i=1 i=1
ρ∈conv v(i) v(i)
i=1
& ' m
()
(i) (i)
= sup

H L pi v v
m
pi ≥0, i=1 pi =1 i=1
m   
− pi H L v(i) v(i) . (33)
i=1


In particular, the optimization over density matrices ρ ∈ conv({v^(i)v^(i)'}_{i=1}^m) in (32) can be folded into the computation of (31) at the expense of solving a Von-Neumann entropy maximization problem instead of an LP. It is easily seen from (30), (32), and (33) that for a fixed set of states {v^(i)}_{i=1}^m:

C(L) ≥ C(L, {v^(i)}_{i=1}^m) ≥ C(L, {v^(i)}_{i=1}^m, ρ).

For a finite set of states {v^(i)}_{i=1}^m, the quantity C(L, {v^(i)}_{i=1}^m) in (33) is referred to as the classical-to-quantum capacity with respect to the states {v^(i)}_{i=1}^m [35,49]; the reason for this name is that C(L, {v^(i)}_{i=1}^m) is the capacity of a quantum channel in which the input is “classical” as it is restricted to be a convex combination of the finite set {v^(i)v^(i)'}_{i=1}^m. One can improve upon the bound C(L, {v^(i)}_{i=1}^m) by progressively adding more states to the collection {v^(i)}_{i=1}^m. It is of interest to develop techniques to accomplish this in a principled manner.


In summary, in Shor’s approach one fixes a set of states {v(i) }i=1 m and a density

matrix ρ ∈ conv({v(i) v(i) }i=1m ), and one obtains a lower bound on the C
1,∞ quan-
tum capacity C(L) (30) by optimizing over decompositions of ρ in terms of convex
combinations of elements of the set {v(i) v(i) }i=1 m . In contrast, in our method we

fix a set of states {v(i) }i=1


m , and we optimize simultaneously over density matrices
(i) (i)
in conv({v v }i=1 ) as well as decompositions of these matrices. Shor’s method
m

involves the solution of an LP, while the improved bound using our approach comes
at the cost of solving a Von-Neumann entropy optimization problem. The question of
computing the C1,∞ capacity C(L) (30) exactly in a tractable fashion remains open.

4.3.1 Numerical examples

We consider a quantum channel given by a linear operator L : S^8 → S^8 with:

L(ρ) = A^(1)ρA^(1)' + ε² A^(2)ρA^(2)' + A^(3)ρA^(3)', (34)

where A^(1) is a random 8 × 8 diagonal matrix (entries chosen to be i.i.d. standard Gaussian), A^(2) is a random 8 × 8 matrix (entries chosen to be i.i.d. standard Gaussian), ε ∈ [0, 1], and A^(3) is the symmetric square root of I − A^(1)'A^(1) − ε² A^(2)'A^(2). We scale A^(1) and A^(2) suitably so that I − A^(1)'A^(1) − ε² A^(2)'A^(2) is positive definite for all ε ∈ [0, 1]. For this quantum channel, we describe the results of two numerical experiments.
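A construction of a random instance of (34) can be sketched as follows (the scaling factor 0.5 is our choice, since the paper does not specify its exact scaling); the check at the end confirms that the resulting map is trace-preserving on a random input density matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 8, 0.5

A1 = np.diag(rng.standard_normal(n))
A2 = rng.standard_normal((n, n))
A1 *= 0.5 / np.linalg.norm(A1, 2)          # spectral norm 0.5: A1'A1 <= 0.25 I
A2 *= 0.5 / np.linalg.norm(A2, 2)          # so eps^2 A2'A2 <= 0.25 I for eps <= 1

R = np.eye(n) - A1.T @ A1 - eps ** 2 * (A2.T @ A2)
w, V = np.linalg.eigh(R)                   # R is positive definite by the scaling
A3 = V @ np.diag(np.sqrt(w)) @ V.T         # symmetric square root of R

def L(rho):                                # the quantum channel (34)
    return A1 @ rho @ A1.T + eps ** 2 * (A2 @ rho @ A2.T) + A3 @ rho @ A3.T

B = rng.standard_normal((n, n))
rho = B @ B.T
rho /= np.trace(rho)                       # random input density matrix
out = L(rho)
```

Trace preservation follows since A^(1)'A^(1) + ε²A^(2)'A^(2) + A^(3)'A^(3) = I by construction, and the output is positive semidefinite because each Kraus term preserves positive semidefiniteness.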
In the first experiment, we compute the classical-to-quantum capacity of this channel with the input states v^(i) ∈ S^8 for i = 1, ..., 8 being the 8 standard basis vectors in R^8. In other words, the input density ρ ∈ S^8_+ is given by a diagonal matrix of unit trace. We add unit vectors in R^8 at random to this collection, and we plot the increase in Fig. 2 in the corresponding classical-to-quantum channel capacity (33)—in each case, the capacity is evaluated at ε = 0.5. In this manner, Von-Neumann entropy optimization problems can be used to provide successively tighter lower bounds for the capacity of a quantum channel.


Fig. 2 A sequence of increasingly tighter lower bounds on the quantum capacity of the channel specified
by (34), obtained by computing classical-to-quantum capacities with increasingly larger collections of input
states

A quantum channel can also be restricted in a natural manner to a classical channel.


In the next experiment, we compare the capacity of such a purely classical channel
induced by the quantum channel (34) to a classical-to-quantum capacity of the channel
(34). In both cases we consider input states v(i) ∈ S 8 , i = 1, . . . , 8 given by the 8
standard basis vectors in R8 . With these input states, the classical-to-quantum capacity
of the channel (34) is computed via Von-Neumann entropy optimization by solving
(33); in other words, this capacity corresponds to a setting in which the input is classical
(i.e., a diagonal density matrix) while the output is in general a quantum state (i.e.,
specified by an arbitrary density matrix). To obtain a purely classical channel from
(34) in which both the inputs and the outputs are classical, let Pdiag : S8 → S8 denote
an operator that projects a matrix onto the space of diagonal matrices (i.e., zeros out
the off-diagonal entries), and consider the following linear map:

Lclassical (ρ)  Pdiag (L(ρ)). (35)

For diagonal input density matrices ρ (i.e., classical inputs), this map specifies a
classical channel induced by the quantum channel (34), as the output is also restricted
to be a diagonal density matrix (i.e., a classical output). Figure 3 shows two plots for ε ranging from 0 to 1 of both the classical-to-quantum capacity of (34) and the classical capacity of the induced classical channel (35). Note that for ε = 0 the output of the operator (34) is diagonal if the input is given by a diagonal density matrix. Therefore, the two curves coincide at ε = 0 in Fig. 3. For larger values of ε, the classical-to-


Fig. 3 Comparison of a classical-to-quantum capacity of the quantum channel specified by (34) and the
classical capacity of a classical channel induced by the quantum channel given by (34)

quantum capacity is greater than the capacity of the induced classical channel, thus demonstrating the improvement in capacity as one goes from a classical-to-classical communication protocol to a classical-to-quantum protocol. (The C_{1,∞} capacity C(L) (30) of the channel (34) is in general even larger than the classical-to-quantum capacity computed here.)
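For a purely classical channel such as (35), the classical capacity can alternatively be cross-checked with the standard Blahut–Arimoto iteration (a classical algorithm, not the Von-Neumann entropy solver used in the paper, and shown here on a binary symmetric channel rather than the paper's random instance, since the former has a known closed-form capacity):

```python
import numpy as np

def blahut_arimoto(P, iters=200):
    """Capacity (in nats) of a classical channel; P[x, y] = Prob(output y | input x)."""
    m = P.shape[0]
    p = np.full(m, 1.0 / m)                 # start from the uniform input distribution
    for _ in range(iters):
        q = p @ P                           # induced output distribution
        # D[x] = KL divergence between row x of P and the output distribution q
        D = np.sum(np.where(P > 0, P * np.log(P / q), 0.0), axis=1)
        p *= np.exp(D)
        p /= p.sum()                        # multiplicative update, renormalize
    q = p @ P
    D = np.sum(np.where(P > 0, P * np.log(P / q), 0.0), axis=1)
    return float(p @ D)                     # lower bound; equals capacity at the fixed point

P = np.array([[0.9, 0.1], [0.1, 0.9]])      # binary symmetric channel, crossover 0.1
C = blahut_arimoto(P)                        # known value: log 2 - H_b(0.1) nats
```

For the binary symmetric channel the uniform input is optimal, so the iteration recovers the closed-form capacity log 2 − H_b(0.1).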

4.4 QREP bounds for a robust matrix GP

Moving beyond the family of Von-Neumann entropy optimization problems, which


are effectively just REPs with additional semidefinite constraints, we discuss QREPs
that exploit the full expressive power of the quantum relative entropy function. In
contrast to REPs, general QREPs are optimization problems involving non-commuting
variables. This non-commutativity leads to some important distinctions between the
classical and the quantum settings, and we present these differences in the context of
a stylized application.
We investigate the problem of obtaining bounds on the optimal value of a matrix analog of a robust GP. Specifically, given a collection of matrices B^(j) ∈ S^k, j = 1, ..., n and a convex set M ⊂ S^k_+, consider the following function of M and the B^(j)'s:

F(M; {B^(j)}_{j=1}^n) = sup_{C∈M} [ inf_{x∈R^n, Y∈S^k, Z∈S^k_+} Tr(CZ)  s.t.  Y ⪯ log(Z),  Σ_{j=1}^n B^(j) x_j = Y ]. (36)

In the inner optimization problem, the constraint set specified by the inequality Y ⪯ log(Z) is a convex set as the matrix logarithm function is operator concave. The inner
problem is a non-commutative analog of an unconstrained GP (3); the set M, over
which the outer supremum is taken, specifies “coefficient uncertainty.” Hence, the
nested optimization problem (36) is a matrix equivalent of a robust unconstrained
GP with coefficient uncertainty. To see this connection more precisely, consider the
analogous problem over vectors for a convex set C ⊂ Rn+ and a collection {b( j) }nj=1 ⊂
Rk : ⎡ ⎤
⎢ ⎥
⎢ n

f (C; {b( j) }nj=1 ) = sup ⎢
⎢ infn c z s.t. y ≤ log(z),
( j)
b x j = y⎥
⎥. (37)
c∈C ⎣ x∈Rk j=1 ⎦
y∈R
z∈Rk+

The inner optimization problem here is simply an unconstrained GP, and the set
C specifies coefficient uncertainty. The reason for this somewhat non-standard
description—an equivalent, more familiar specification of (37) via sums of exponentials
as in (3) would be sup_{c∈C} inf_{x∈R^n} Σ_{i=1}^k c_i exp(a^(i)⊤ x), where a^(i) is the i'th row
of the k × n matrix consisting of the b^(j)'s as columns—is to make a more transparent
connection between the matrix case (36) and the vector case (37).
Our method for obtaining bounds on F(M; {B^(j)}_{j=1}^n) is based on a relationship
between the constraint Y ⪯ log(Z) and the quantum relative entropy function via
convex conjugacy. To begin with, we describe the relationship between the vector
constraint y ≤ log(z) and classical relative entropy. Consider the following
characteristic function for y ∈ R^k and z ∈ −R^k_+:

χ_aff-log(y, z) = 0 if y ≤ log(−z), and ∞ otherwise.   (38)

We then have the following result:

Lemma 1 Let χ_aff-log(y, z) : R^k × −R^k_+ → R denote the characteristic function
defined in (38). Then the convex conjugate satisfies χ*_aff-log(ν, λ) = d(ν, eλ) with
domain (ν, λ) ∈ R^k_+ × R^k_+, where e is Euler's number.
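As a sanity check, Lemma 1 can be verified numerically in the scalar case k = 1: after substituting s = −z (and taking y = log(s), which is optimal), the conjugate becomes sup_{s>0} ν log(s) − λs, which should match d(ν, eλ) = ν log(ν/λ) − ν. A minimal sketch (the values of ν and λ are arbitrary test choices, not from the paper):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Scalar case (k = 1) of Lemma 1: the conjugate of the characteristic
# function of {(y, z) : y <= log(-z)} evaluated at (nu, lam) is
#   sup_{s > 0}  nu*log(s) - lam*s   (after substituting s = -z, y = log(s)),
# which Lemma 1 identifies with d(nu, e*lam) = nu*log(nu/lam) - nu.
nu, lam = 1.7, 0.4  # arbitrary positive test values

# Numerical maximization (minimize the negated objective).
res = minimize_scalar(lambda s: -(nu * np.log(s) - lam * s),
                      bounds=(1e-8, 100.0), method="bounded")
numerical = -res.fun

closed_form = nu * np.log(nu / lam) - nu  # d(nu, e*lam)
assert abs(numerical - closed_form) < 1e-6
```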

The proof of this lemma follows from a straightforward calculation. Based on this
result and by computing the dual of the inner convex program in (37), we have that
the function f(C; {b^(j)}_{j=1}^n) can be computed as the optimal value of an REP:

f(C; {b^(j)}_{j=1}^n) = sup_{c∈C, ν∈R^k_+} −d(ν, ec)  s.t.  b^(j)⊤ν = 0,  j = 1, . . . , n.   (39)
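To make (39) concrete, consider a toy instance with k = 2, n = 1, b^(1) = (1, −1)⊤, and a singleton uncertainty set C = {c}. The inner GP in (37) is then inf_x c_1 e^x + c_2 e^{−x}, and the constraint b^(1)⊤ν = 0 in (39) forces ν = (t, t). The following sketch (with arbitrary positive c) checks numerically that the primal and dual optimal values agree:

```python
import numpy as np
from scipy.optimize import minimize_scalar

c = np.array([0.8, 2.5])  # arbitrary positive coefficients

# Primal side: the unconstrained GP  inf_x  c1*exp(x) + c2*exp(-x).
primal = minimize_scalar(lambda x: c[0] * np.exp(x) + c[1] * np.exp(-x)).fun

# Dual side: the REP (39).  With b = (1, -1)^T, the constraint b^T nu = 0
# forces nu = (t, t), and  -d(nu, e*c) = -t*log(t/c1) - t*log(t/c2) + 2t.
dual = -minimize_scalar(
    lambda t: t * np.log(t / c[0]) + t * np.log(t / c[1]) - 2 * t,
    bounds=(1e-8, 100.0), method="bounded").fun

# Both equal 2*sqrt(c1*c2), the closed-form GP optimum.
assert abs(primal - dual) < 1e-5
assert abs(primal - 2 * np.sqrt(c.prod())) < 1e-5
```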

Moving to the matrix case, consider the natural matrix analog of the characteristic
function (38) for Y ∈ S^k and Z ∈ −S^k_+:

χ_mat-aff-log(Y, Z) = 0 if Y ⪯ log(−Z), and ∞ otherwise.   (40)


As the matrix logarithm is operator concave, this characteristic function is also a
convex function. However, its relationship to the quantum relative entropy function is
somewhat more complicated, as described in the following proposition:

Proposition 5 Let χ_mat-aff-log(Y, Z) : S^k × −S^k_+ → R denote the characteristic
function defined in (40). Then the convex conjugate satisfies χ*_mat-aff-log(N, Λ) ≤ D(N, eΛ),
with equality holding if the matrices N and Λ are simultaneously diagonalizable. Here
(N, Λ) ∈ S^k_+ × S^k_+, and e is again Euler's number.
Proof The convex conjugate of χ_mat-aff-log(Y, Z) can be simplified as follows:

χ*_mat-aff-log(N, Λ) = sup_{Y∈S^k, Z∈−S^k_+} Tr(NY) + Tr(ΛZ)  s.t.  Y ⪯ log(−Z)
  = sup_{Z∈−S^k_+} Tr[N log(−Z)] + Tr(ΛZ)   [set W = log(−Z)]
  = sup_{W∈S^k} Tr(NW) − Tr[Λ exp(W)]
  = sup_{W∈S^k} Tr(NW) − Tr[exp(log(Λ)) exp(W)].   (41)

In order to simplify further, we appeal to the Golden–Thompson inequality [25,52],
which states that:

Tr[exp(A + B)] ≤ Tr[exp(A) exp(B)],   (42)

for Hermitian matrices A, B. Equality holds here if the matrices A and B commute.
Consequently, we can bound χ*_mat-aff-log(N, Λ) as:

χ*_mat-aff-log(N, Λ) ≤ sup_{W∈S^k} Tr(NW) − Tr[exp(log(Λ) + W)]   (i)
  = sup_{U∈S^k_+} Tr[N log(U)] − Tr[N log(Λ)] − Tr(U)   [set U = exp(log(Λ) + W)]
  = [ sup_{U∈S^k_+} Tr[N log(U)] − Tr(U) ] − Tr[N log(Λ)]
  = Tr[N log(N)] − Tr(N) − Tr[N log(Λ)] = D(N, eΛ).   (ii)

Here (i) follows from the Golden–Thompson inequality (42), and the equality (ii)
follows from the fact that the optimal U in the previous line is N. Consequently, one
can check that if N and Λ are simultaneously diagonalizable, then the inequality (i)
becomes an equality. □
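The Golden–Thompson inequality (42) used in step (i), along with the equality case for commuting matrices, is easy to probe numerically. A small sketch with randomly generated symmetric matrices (the dimension and seed are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Golden–Thompson:  Tr[exp(A + B)] <= Tr[exp(A) exp(B)]  for symmetric A, B.
M1, M2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
A, B = (M1 + M1.T) / 2, (M2 + M2.T) / 2

lhs = np.trace(expm(A + B))
rhs = np.trace(expm(A) @ expm(B))
assert lhs <= rhs + 1e-10

# Equality when A and B commute, e.g. when both are diagonal.
D1, D2 = np.diag(rng.standard_normal(4)), np.diag(rng.standard_normal(4))
assert np.isclose(np.trace(expm(D1 + D2)), np.trace(expm(D1) @ expm(D2)))
```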

Thus, the non-commutativity underlying the quantum relative entropy function, in
contrast to the classical case, results in D(N, eΛ) only being an upper bound (in general)
on the convex conjugate χ*_mat-aff-log(N, Λ). From the perspective of this result,
Lemma 1 follows from the observation that the relative entropy between two nonnegative
vectors can be viewed as the quantum relative entropy between two diagonal
positive semidefinite matrices (which are, trivially, simultaneously diagonalizable).
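Proposition 5 itself can be probed numerically by approximating χ*_mat-aff-log(N, Λ) = sup_W Tr(NW) − Tr[Λ exp(W)] through direct maximization over symmetric W: the approximate value should never exceed D(N, eΛ), with near-equality when N and Λ are both diagonal. A sketch for k = 2 (the particular matrices are arbitrary positive definite test choices):

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.optimize import minimize

def conj_approx(N, Lam):
    """Approximate sup_W Tr(N W) - Tr[Lam exp(W)] over symmetric 2x2 W."""
    def obj(p):  # p = (W_11, W_22, W_12) parametrizes W in S^2
        W = np.array([[p[0], p[2]], [p[2], p[1]]])
        return -(np.trace(N @ W) - np.trace(Lam @ expm(W)))
    res = minimize(obj, np.zeros(3), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 5000})
    return -res.fun

def D(N, M):
    """Quantum relative entropy Tr[N log N - N log M] for positive definite N, M."""
    return np.trace(N @ (logm(N) - logm(M))).real

N   = np.array([[1.0, 0.3], [0.3, 0.5]])
Lam = np.array([[0.7, -0.2], [-0.2, 1.1]])

# Upper bound of Proposition 5 (strict in general for non-commuting N, Lam).
assert conj_approx(N, Lam) <= D(N, np.e * Lam) + 1e-6

# Near-equality in the (simultaneously diagonalizable) diagonal case.
Nd, Ld = np.diag([1.0, 0.5]), np.diag([0.7, 1.1])
assert abs(conj_approx(Nd, Ld) - D(Nd, np.e * Ld)) < 1e-4
```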
Based on Proposition 5 and again appealing to convex duality, the function
F(M; {B^(j)}_{j=1}^n) can be bounded below by solving a QREP as follows:

F(M; {B^(j)}_{j=1}^n) ≥ sup_{C∈M, N∈S^k_+} −D(N, eC)  s.t.  Tr(B^(j)N) = 0,  j = 1, . . . , n.   (43)

The quantity on the right-hand side can be computed, for example, via projected
coordinate descent. If the matrices {B^(j)}_{j=1}^n ∪ {C} are simultaneously diagonalizable
for each C ∈ M, then the QREP lower bound (43) is equal to F(M; {B^(j)}_{j=1}^n). In
summary, REPs are useful for computing f(C; {b^(j)}_{j=1}^n) exactly, while QREPs in
general only provide a lower bound (43) on F(M; {B^(j)}_{j=1}^n).

5 Further directions

There are several avenues for further research arising from this paper. It is of interest
to develop efficient numerical methods for solving REPs and QREPs in order to scale
to large problem instances. Such large-scale problems are especially prevalent in
data-analysis tasks, and are of interest in settings such as kernel learning. On a related
note, there exists a vast literature on exploiting the structure of a particular problem
instance of an SDP or a GP—e.g., sparsity in the problem parameters—which can result
in significant computational speedups in practice in the solution of these problems.
A similar set of techniques would be useful and relevant in all of the applications
described in this paper.
We also seek a deeper understanding of the expressive power of REPs and QREPs,
i.e., of conditions under which convex sets can be tractably represented via REPs
and QREPs as well as obstructions to efficient representations in these frameworks
(in the spirit of similar results that have recently been obtained for SDPs [9,27,34]).
Such an investigation would be useful in identifying problems in statistics and infor-
mation theory that may be amenable to solution via tractable convex optimization
techniques.

Acknowledgements The authors would like to thank Pablo Parrilo and Yong-Sheng Soh for helpful
conversations, and Leonard Schulman for pointers to the literature on Von-Neumann entropy. Venkat Chan-
drasekaran was supported in part by National Science Foundation CAREER award CCF-1350590 and Air
Force Office of Scientific Research grant FA9550-14-1-0098.

6 Appendix

The second-order cone L^2 ⊂ R^3 from (5) can be written as:

L^2 = { (x, y) ∈ R^2 × R : [[y − x_1, x_2], [x_2, y + x_1]] ⪰ 0 }.


Combining this reformulation with the next result gives us the description (6).
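This reformulation is easy to check: the eigenvalues of the 2 × 2 matrix are y ± ||x||_2, so positive semidefiniteness is exactly the second-order cone condition ||x||_2 ≤ y. A quick numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# The eigenvalues of [[y - x1, x2], [x2, y + x1]] are y +/- ||x||_2, so the
# matrix is PSD exactly when ||x||_2 <= y, i.e. when (x, y) lies in L^2.
for _ in range(1000):
    x1, x2, y = rng.uniform(-2, 2, size=3)
    M = np.array([[y - x1, x2], [x2, y + x1]])
    lam_min = np.linalg.eigvalsh(M).min()
    assert abs(lam_min - (y - np.hypot(x1, x2))) < 1e-9
```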

Proposition 6 We have that [[a, c], [c, b]] ∈ S^2_+ if and only if there exists ν ∈ R_+ such that:

ν log(ν/a) + ν log(ν/b) − 2ν ≤ 2c
ν log(ν/a) + ν log(ν/b) − 2ν ≤ −2c
a, b ∈ R_+.

Proof We have that [[a, c], [c, b]] ∈ S^2_+ if and only if a z̄_1^2 + b z̄_2^2 + 2c z̄_1 z̄_2 ≥ 0 for all z̄ ∈ R^2.
This latter condition can in turn be rewritten to obtain the following reformulation:

[[a, c], [c, b]] ∈ S^2_+ ⇔ a z_1^2 + b z_2^2 + 2c z_1 z_2 ≥ 0 and a z_1^2 + b z_2^2 − 2c z_1 z_2 ≥ 0 ∀z ∈ R^2_+.   (44)

Each of these inequalities with universal quantifiers can be reformulated by appealing
to GP duality. Specifically, based on the change of variables w_i ← log(z_i), i = 1, 2,
which is commonly employed in the GP literature [11,20], we have from (44) that
[[a, c], [c, b]] ∈ S^2_+ if and only if:

inf_{w∈R^2}  a exp{w_1 − w_2} + b exp{w_2 − w_1} ≥ max{2c, −2c}.   (45)

As the optimization problem on the left-hand side is a GP for a, b ∈ R_+, we can
appeal to convex duality to conclude that

inf_{w∈R^2}  a exp{w_1 − w_2} + b exp{w_2 − w_1} = sup_{ν∈R_+}  −ν log(ν/a) − ν log(ν/b) + 2ν.

Combining this result with (45), we have that [[a, c], [c, b]] ∈ S^2_+ if and only if:

a, b ∈ R_+ and ∃ν ∈ R_+ s.t. −ν log(ν/a) − ν log(ν/b) + 2ν ≥ max{2c, −2c}.   (46)

This concludes the proof. □
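Proposition 6 can also be checked numerically: the supremum over ν of −ν log(ν/a) − ν log(ν/b) + 2ν evaluates to 2√(ab), so condition (46) reduces to √(ab) ≥ |c|, which together with a, b ∈ R_+ is exactly positive semidefiniteness of the 2 × 2 matrix. A sketch over random instances:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

for _ in range(200):
    a, b = rng.uniform(0.1, 3.0, size=2)
    c = rng.uniform(-3.0, 3.0)

    # sup_{nu >= 0} -nu*log(nu/a) - nu*log(nu/b) + 2*nu equals 2*sqrt(a*b).
    sup_val = -minimize_scalar(
        lambda nu: nu * np.log(nu / a) + nu * np.log(nu / b) - 2 * nu,
        bounds=(1e-9, 10.0), method="bounded").fun
    assert abs(sup_val - 2 * np.sqrt(a * b)) < 1e-6

    # Hence condition (46) reads sqrt(a*b) >= |c|, which (with a, b > 0) is
    # exactly positive semidefiniteness of [[a, c], [c, b]].
    psd = np.linalg.eigvalsh(np.array([[a, c], [c, b]])).min() >= -1e-9
    assert (np.sqrt(a * b) >= abs(c)) == psd
```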


References
1. Bapat, R.B., Beg, M.I.: Order statistics for nonidentically distributed variables and permanents.
Sankhya Indian J. Stat. A 51, 79–93 (1989)
2. Barvinok, A.I.: Computing mixed discriminants, mixed volumes, and permanents. Discrete Comput.
Geom. 18, 205–237 (1997)
3. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton
(2009)
4. Ben-Tal, A., Nemirovski, A.: Optimal design of engineering structures. Optima. 47, 4–8 (1995)
5. Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Math. Oper. Res. 23, 769–805 (1998)
6. Ben-Tal, A., Nemirovski, A.: On polyhedral approximations of the second-order cone. Math. Oper.
Res. 26, 193–205 (2001)


7. Ben-Tal, A., Nemirovskii, A.: Lectures on Modern Convex Optimization. Society for Industrial and
Applied Mathematics, Philadelphia (2001)
8. Betke, U.: Mixed volumes of polytopes. Arch. Math. 58, 388–391 (1992)
9. Blekherman, G., Parrilo, P., Thomas, R.: Semidefinite Optimization and Convex Algebraic Geometry.
Society for Industrial and Applied Mathematics, Philadelphia (2013)
10. Boyd, S., Kim, S.J., Patil, D., Horowitz, M.: Digital circuit optimization via geometric programming.
Oper. Res. 53, 899–932 (2005)
11. Boyd, S., Kim, S.J., Vandenberghe, L., Hassibi, A.: A tutorial on geometric programming. Optim. Eng.
8, 67–127 (2007)
12. Chandrasekaran, V., Shah, P.: Conic geometric programming. In: Proceedings of the Conference on
Information Sciences and Systems (2014)
13. Chandrasekaran, V., Shah, P.: Relative entropy relaxations for signomial optimization. SIAM J. Optim.
(2014)
14. Chiang, M.: Geometric programming for communication systems. Found. Trends Commun. Inf. Theory
2, 1–154 (2005)
15. Chiang, M., Boyd, S.: Geometric programming duals of channel capacity and rate distortion. IEEE
Trans. Inf. Theory 50, 245–258 (2004)
16. Cover, T., Thomas, J.: Elements of Information Theory. Wiley, New York (2006)
17. Cox, D.R.: The regression analysis of binary sequences. J. R. Stat. Soc. 20, 215–242 (1958)
18. Dinkel, J.J., Kochenberger, G.A., Wong, S.N.: Entropy maximization and geometric programming.
Environ. Plan. A 9, 419–427 (1977)
19. Drew, J.H., Johnson, C.R.: The maximum permanent of a 3-by-3 positive semidefinite matrix, given
the eigenvalues. Linear Multilinear Algebra 25, 243–251 (1989)
20. Duffin, R.J., Peterson, E.L., Zener, C.M.: Geometric Programming: Theory and Application. Wiley,
New York (1967)
21. Egorychev, G.P.: Proof of the Van der Waerden conjecture for permanents (english translation; original
in russian). Sib. Math. J. 22, 854–859 (1981)
22. El Ghaoui, L., Lebret, H.: Robust solutions to least-squares problems with uncertain data. SIAM J.
Matrix Anal. Appl. 18, 1035–1064 (1997)
23. Falikman, D.I.: Proof of the Van der Waerden conjecture regarding the permanent of a doubly stochastic
matrix (english translation; original in russian). Math. Notes 29, 475–479 (1981)
24. Glineur, F.: An extended conic formulation for geometric optimization. Found. Comput. Decis. Sci.
25, 161–174 (2000)
25. Golden, S.: Lower bounds for the Helmholtz function. Phys. Rev. Ser. II 137, B1127–B1128 (1965)
26. Gonçalves, D.S., Lavor, C., Gomes-Ruggiero, M.A., Cesário, A.T., Vianna, R.O., Maciel, T.O.: Quantum
state tomography with incomplete data: maximum entropy and variational quantum tomography. Phys.
Rev. A 87 (2013)
27. Gouveia, J., Parrilo, P., Thomas, R.: Lifts of convex sets and cone factorizations. Math. Oper. Res. 38,
248–264 (2013)
28. Grone, R., Johnson, C.R., Eduardo, S.A., Wolkowicz, H.: A note on maximizing the permanent of
a positive definite hermitian matrix, given the eigenvalues. Linear Multilinear Algebra 19, 389–393
(1986)
29. Gurvits, L.: Van der Waerden/Schrijver-Valiant like conjectures and stable (aka hyperbolic) homoge-
neous polynomials: one theorem for all. Electron. J. Comb. 15 (2008)
30. Gurvits, L., Samorodnitsky, A.: A deterministic algorithm for approximating the mixed discriminant
and mixed volume, and a combinatorial corollary. Discrete Comput. Geom. 27, 531–550 (2002)
31. Han, S., Preciado, V.M., Nowzari, C., Pappas, G.J.: Data-Driven Network Resource Allocation for
Controlling Spreading Processes. IEEE Trans. Netw. Sci. Eng. 2(4), 127–38 (2015)
32. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Berlin (2008)
33. Hellwig, K., Krauss, K.: Operations and measurements II. Commun. Math. Phys. 16, 142–147 (1970)
34. Helton, J.W., Nie, J.: Sufficient and necessary conditions for semidefinite representability of convex
hulls and sets. SIAM J. Optim. 20, 759–791 (2009)
35. Holevo, A.S.: The capacity of the quantum channel with general signal states. IEEE Trans. Inf. Theory
44, 269–273 (1998)
36. Hsiung, K.L., Kim, S.J., Boyd, S.: Tractable approximate robust geometric programming. Optim. Eng.
9, 95–118 (2008)
37. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. Ser. II 106, 620–630 (1957)


38. Jerrum, M., Sinclair, A., Vigoda, E.: A polynomial-time approximation algorithm for the permanent
of a matrix with non-negative entries. J. ACM 51, 671–697 (2004)
39. Kulis, B., Sustik, M., Dhillon, I.: Low-rank kernel learning with Bregman matrix divergences. J. Mach.
Learn. Res. 10, 341–376 (2009)
40. Lieb, E.: Convex trace functions and the Wigner–Yanase–Dyson conjecture. Adv. Math. 11, 267–288
(1973)
41. Linial, N., Samorodnitsky, A., Wigderson, A.: A deterministic strongly polynomial algorithm for matrix
scaling and approximate permanents. Combinatorica 20, 545–568 (2000)
42. Lobo, M., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming.
Linear Algebra Appl. 284, 193–228 (1998)
43. Minc, H.: Permanents. Cambridge University Press, Cambridge (1984)
44. Nesterov, Y., Nemirovski, A.: Interior-Point Polynomial Algorithms in Convex Programming. Society
of Industrial and Applied Mathematics, Philadelphia (1994)
45. Nielsen, M., Chuang, I.: Quantum Computation and Quantum Information. Cambridge University
Press, Cambridge (2011)
46. Potchinkov, A.W., Reemsten, R.M.: The design of FIR filters in the complex plane by convex opti-
mization. Signal Process. 46, 127–146 (1995)
47. Prajna, S., Jadbabaie, A.: Safety Verification of Hybrid Systems Using Barrier Certificates. In: Hybrid
Systems: Computation and Control, pp. 477–492. Springer (2004)
48. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
49. Schumacher, B., Westmoreland, M.D.: Sending classical information via noisy quantum channels.
Phys. Rev. A 56, 131–138 (1997)
50. Scott, C.H., Jefferson, T.R.: Trace optimization problems and generalized geometric programming. J.
Math. Anal. Appl. 58, 373–377 (1977)
51. Shor, P.W.: Capacities of quantum channels and how to find them. Math. Program. B 97, 311–335
(2003)
52. Thompson, C.J.: Inequality with applications in statistical mechanics. J. Math. Phys. 6, 1812–1813
(1965)
53. Valiant, L.: The complexity of computing the permanent. Theor. Comput. Sci. 8, 189–201 (1979)
54. Vandenberghe, L., Boyd, S., Wu, S.: Determinant maximization with linear matrix inequality con-
straints. SIAM J. Matrix Anal. Appl. 19, 499–533 (1998)
55. Wall, T., Greening, D., Woolsey, R.: Solving complex chemical equilibria using a geometric program-
ming based technique. Oper. Res. 34, 345–355 (1986)
56. Yazarel, H., Pappas, G.: Geometric programming relaxations for linear system reachability. In: Pro-
ceedings of the American Control Conference (2004)

