A Course On Semidefinite Optimization
Monique Laurent
Frank Vallentin
For convenience we briefly recall some notation that we will use in this
chapter and we refer to the Appendices for more details on notation and basic
properties. Throughout S^n denotes the set of symmetric n × n matrices. For a matrix X ∈ S^n, X ⪰ 0 means that X is positive semidefinite and S^n_+ is the set of all positive semidefinite matrices.
Throughout, In (or simply I when the dimension is clear from the context)
denotes the n × n identity matrix, e denotes the all-ones vector, i.e., e =
(1, . . . , 1)T ∈ Rn , and Jn = eeT (or simply J) denotes the all-ones matrix.
The vectors e1 , . . . , en are the standard unit vectors in Rn , and the matrices
E_ij = (e_i e_j^T + e_j e_i^T)/2 (1 ≤ i ≤ j ≤ n) form the standard basis of S^n. We
let O(n) denote the set of orthogonal matrices, where an n × n matrix A is
orthogonal if AAT = In or, equivalently, AT A = In .
We consider the trace inner product ⟨A, B⟩ = Tr(A^T B) = Σ_{i,j=1}^n A_ij B_ij for two matrices A, B ∈ R^{n×n}. Here Tr(A) = ⟨I_n, A⟩ = Σ_{i=1}^n A_ii denotes the trace of A. The constraints ⟨A_j, X⟩ = b_j (j ∈ [m]) of the primal program (1.1) define an affine subspace W.
The set F = S^n_+ ∩ W is called the feasible region of (1.1), and a matrix X ∈ F is a feasible solution; (1.1) is strictly feasible when F contains a positive definite matrix. Minimization problems can be handled as well, since they can be brought into the above standard maximization form using the fact that inf⟨C, X⟩ = −sup⟨−C, X⟩.
Note that we write a supremum in (1.1) rather than a maximum. This
is because the optimum value p∗ may or may not be attained in (1.1). In
general, we have: p∗ ∈ R ∪ {±∞}, with p∗ = −∞ if the problem (1.1) is
infeasible and p∗ = +∞ might occur in which case we say that the problem
is unbounded.
We give a small example as an illustration. Consider the problem of min-
imizing/maximizing X11 over the feasible region
F_a = { X ∈ S² : X = [ X_11  a ; a  0 ] ⪰ 0 },  where a ∈ R,
with a being a given parameter. Note that det(X) = −a² for X ∈ F_a. Hence, if a ≠ 0 then the problem is infeasible (since X_22 = 0 implies X_12 = 0 if X is
positive semidefinite). Moreover, if a = 0 then the problem is feasible (e.g.,
the zero matrix is feasible) but not strictly feasible (since the standard unit
vector e2 belongs to the kernel of every feasible solution X). The minimum
value of X11 over F0 is equal to 0, attained at X = 0, while the maximum
value of X11 over F0 is equal to ∞ (the problem is unbounded).
As another example, consider the problem
p* = inf_{X∈S²} { X_11 : [ X_11  1 ; 1  X_22 ] ⪰ 0 }.
Then the infimum is p* = 0, which is approached in the limit by taking X_11 = 1/X_22 and letting X_22 tend to +∞. So the infimum is not attained.
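For concreteness, here is a small numerical sketch of this example; it assumes the cvxpy modeling package with an SDP solver (e.g. SCS) is installed, and the variable names are ours, not the book's.

```python
# Sketch of the example above: the infimum p* = 0 is not attained.
import cvxpy as cp

X = cp.Variable((2, 2), symmetric=True)
constraints = [X >> 0, X[0, 1] == 1]          # X must be PSD with off-diagonal entry 1
prob = cp.Problem(cp.Minimize(X[0, 0]), constraints)
prob.solve()

# The solver returns a tiny positive value for X[0,0] together with a very large
# X[1,1]; the true infimum 0 is only approached, never attained.
print(prob.value, X.value)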
In the special case when the matrices A_j, C are diagonal, with diagonals a_j, c ∈ R^n, the program (1.1) reduces to the linear program (LP)
max_x { c^T x : a_j^T x = b_j for j ∈ [m], x ≥ 0 },
Thus the dual program has variables yj for j ∈ [m], one for each linear
constraint of the primal program (1.1). The positive semidefinite constraint
Σ_{j=1}^m y_j A_j − C ⪰ 0   (1.3)
arising in (1.2) is also named a linear matrix inequality (LMI). The notions
of (strict) feasibility can be analogously defined for the dual program. For
instance, y ∈ Rm is called feasible (resp., strictly feasible) for (1.2) if the
matrix Σ_{j=1}^m y_j A_j − C is positive semidefinite (resp., positive definite).
The following facts relate the primal and dual SDP’s. They are simple,
but very important.
Lemma 1.1.1 Let (X, y) be a primal/dual pair of feasible solutions, i.e.,
X is a feasible solution of (1.1) and y is a feasible solution of (1.2).
(i) (weak duality) We have that ⟨C, X⟩ ≤ b^T y and thus p* ≤ d*.
(ii) (complementary slackness) Assume that the primal program attains
its supremum at X, that the dual program attains its infimum at y, and
that p* = d*. Then the equalities ⟨C, X⟩ = b^T y and ⟨X, Σ_{j=1}^m y_j A_j − C⟩ = 0 hold.
(iii) (optimality criterion) If equality ⟨C, X⟩ = b^T y holds, then the supremum of (1.1) is attained at X, the infimum of (1.2) is attained at y and
p∗ = d∗ .
Proof If (X, y) is a primal/dual pair of feasible solutions, then
0 ≤ ⟨X, Σ_{j=1}^m y_j A_j − C⟩ = Σ_{j=1}^m ⟨X, A_j⟩ y_j − ⟨X, C⟩ = Σ_{j=1}^m b_j y_j − ⟨X, C⟩ = b^T y − ⟨C, X⟩.
The left most inequality follows from the fact that both X and Σ_j y_j A_j − C are positive semidefinite and from the self-duality of the cone of positive semidefinite matrices, and we use the fact that ⟨A_j, X⟩ = b_j to get the
second equality. This implies that
⟨C, X⟩ ≤ p* ≤ d* ≤ b^T y.
The rest of the lemma follows immediately by considering the equality case
p∗ = d∗ .
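As a quick sanity check of weak duality, the following numpy sketch evaluates both sides of the inequality on a small hand-made instance (one constraint A_1 = I, b_1 = 1, which is our own choice, not the book's).

```python
# Numerical check of weak duality (Lemma 1.1.1(i)) on a tiny instance, numpy only.
import numpy as np

n = 4
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n)); C = (C + C.T) / 2   # random symmetric C

# Primal feasible X: Tr(X) = 1, X PSD.
X = np.eye(n) / n
# Dual feasible y: y*I - C PSD, i.e. y >= lambda_max(C); add a margin.
y = np.linalg.eigvalsh(C)[-1] + 0.1

print(np.trace(C @ X), "<=", y)   # weak duality: <C, X> <= b^T y = y
```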
The difference d∗ − p∗ is called the duality gap. In general there might
be a positive duality gap between the primal and dual SDP’s. When there
is no duality gap, i.e., p∗ = d∗ , one says that strong duality holds, a very
desirable situation. This topic and criteria for strong duality will be discussed
in detail in Chapter 2. For now we only quote the following result on strong
duality which will be proved in Chapter 2 (in the general setting of conic
programming).
Theorem 1.1.2 (Strong duality: no duality gap) Consider the pair
of primal and dual programs (1.1) and (1.2).
(i) Assume that the dual program (1.2) is bounded from below (d∗ > −∞)
and that it is strictly feasible. Then the primal program (1.1) attains its
supremum (i.e., there exists X ∈ F such that p∗ = hC, Xi) and there is
no duality gap: p∗ = d∗ .
(ii) Assume that the primal program (1.1) is bounded from above (p∗ < ∞)
and that it is strictly feasible. Then the dual program (1.2) attains its
infimum (i.e., d∗ = bT y for some dual feasible y) and there is no duality
gap: p∗ = d∗ .
In the rest of this chapter we discuss several examples of applications
of semidefinite programming, many of which will be studied in much more
detail in the following parts of the book.
This is known as the Rayleigh-Ritz principle and can be proved, e.g., using
the spectral decomposition theorem. As we will now see the largest and
smallest eigenvalues can also be expressed via a semidefinite program. For
this, consider the following semidefinite program
p* = sup_X {⟨C, X⟩ : Tr(X) = 1, X ⪰ 0}   (1.6)
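As an illustration, the following sketch solves (1.6) numerically and compares the optimal value with the largest eigenvalue computed by numpy; it assumes cvxpy with an SDP solver is available.

```python
# Sketch: the largest eigenvalue of a symmetric matrix C via program (1.6).
import cvxpy as cp
import numpy as np

n = 5
rng = np.random.default_rng(1)
C = rng.standard_normal((n, n)); C = (C + C.T) / 2

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(C @ X)),
                  [cp.trace(X) == 1, X >> 0])
prob.solve()

print(prob.value, np.linalg.eigvalsh(C)[-1])   # both should be close to lambda_max(C)
```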
That is, λ1 + · · · + λk = µ1 = µ2 .
Note that the program (1.9) can be reformulated as a semidefinite program
in standard primal form by introducing an additional matrix variable (see
Exercise 1.1). The proof of Theorem 1.2.2 will use the fact that the extreme
points of the polytope
P = {x ∈ [0, 1]n : eT x = k} (1.10)
are exactly the points x ∈ P ∩ {0, 1}n . (See Exercise A.10).
Proof Let u1 , . . . , un denote an orthonormal basis of eigenvectors corre-
sponding to the eigenvalues λ1 , . . . , λn of C, let U denote the matrix with
Now let z = (Z_ii)_{i=1}^n denote the vector containing the diagonal entries of Z. The condition I ⪰ Z ⪰ 0 implies that z ∈ [0, 1]^n. Moreover, the condition Tr(Z) = k implies e^T z = k, and we have ⟨D, Z⟩ = Σ_{i=1}^n λ_i z_i. Hence the vector z lies in the polytope P from (1.10) and thus we obtain µ_2 ≤ max_{z∈P} Σ_{i=1}^n λ_i z_i. Now recall that the maximum of the linear function Σ_{i=1}^n λ_i z_i is attained at an extreme point of P. As recalled above, the extreme points of P are the 0/1 valued vectors with exactly k ones. From this it follows immediately that the maximum value of Σ_{i=1}^n λ_i z_i taken over P is equal to λ_1 + · · · + λ_k.
For the proof we will use an intermediary well-known result about doubly
stochastic matrices. Recall that a matrix X ∈ Rn×n is doubly stochastic if
X is nonnegative and has all its row and column sums equal to 1. So the
polyhedron
DS(n) = { X ∈ R^{n×n}_+ : Σ_{i=1}^n X_ij = 1 for j ∈ [n], Σ_{j=1}^n X_ij = 1 for i ∈ [n] }
is, by a classical result of Birkhoff, equal to the convex hull of the set of permutation matrices.
The following lemma will play a key role in the proof of Theorem 1.3.1.
Proof The feasible region of the program (1.16) is precisely the set DS(n) of
doubly stochastic matrices and, by the above mentioned result of Birkhoff,
it is equal to the convex hull of the set of permutation matrices. As the
minimum value of (1.16) is attained at an extreme point of DS(n) (i.e., at a
permutation matrix), it is equal to the minimum value of Σ_{i=1}^n α_i β_{σ(i)} taken over all permutations σ.
This shows that the optimum value of (1.16) (and thus of (1.15)) is equal to Σ_{i=1}^n α_i β_i.
Proof (of Theorem 1.3.1) The proof is structured as follows: First we show
the identity (1.14) and then we show that the optimal value of the program
(1.13) is equal to Σ_{i=1}^n α_i β_i.
First, we show the identity (1.14): OPT(A, B) = Σ_i α_i β_i. For this the
first step consists of replacing the program (1.11) by an equivalent program
where the matrices A and B are diagonal. For this, write A = P DP T and
B = QEQT where P, Q ∈ O(n) and D (resp., E) is the diagonal matrix with
diagonal entries αi (resp., βi ). For X ∈ O(n), we have Y := P T XQ ∈ O(n)
and Tr(AXBX T ) = Tr(DY EY T ). Hence the optimization problem (1.11)
is equivalent to the program:
That is,
OPT(A, B) = OPT(D, E). (1.18)
The next step is to show that the program (1.17) has the same optimum
value as the linear program (1.16), i.e., in view of Lemma 1.3.3, that
OPT(D, E) = Σ_i α_i β_i.   (1.19)
For this, pick X ∈ O(n) and consider the matrix Z = ((X_ij)²)_{i,j=1}^n, which is doubly stochastic (since X is orthogonal). Moreover, since
Tr(DXEX^T) = Σ_{i,j=1}^n α_i β_j (X_ij)² = Σ_{i,j=1}^n α_i β_j Z_ij,
semidefinite program:
max_{S′, T′} { Tr(S′) + Tr(T′) : E ⊗ F − I_n ⊗ T′ − S′ ⊗ I_n ⪰ 0, S′, T′ ∈ S^n },   (1.20)
obtained from (1.13) by replacing A and B by E and F. Indeed, using the relation
(P ⊗ Q)(E ⊗ F − I_n ⊗ T − S ⊗ I_n)(P ⊗ Q)^T = A ⊗ B − I_n ⊗ (Q T Q^T) − (P S P^T) ⊗ I_n
and the fact that P ⊗ Q is orthogonal, we see that the pair (S, T) is feasible for (1.13) if and only if the pair (S′ = P S P^T, T′ = Q T Q^T) is feasible for (1.20) and moreover we have Tr(S) + Tr(T) = Tr(S′) + Tr(T′).
Next we show that the program (1.20) is equivalent to the linear program
(1.15). For this, observe that in the program (1.20) we may assume without
loss of generality that the matrices S′ and T′ are diagonal because the matrix E ⊗ F is diagonal. Indeed, if we define the vectors x = diag(S′) and y = diag(T′), we see that, since E ⊗ F is diagonal, the diagonal matrices S″ = Diag(x) and T″ = Diag(y) are still feasible for (1.20) with the same objective value: Tr(S′) + Tr(T′) = Tr(S″) + Tr(T″). Now, the program (1.20) with the additional condition that S′, T′ are diagonal matrices can be rewritten as the linear program (1.15), since the matrix E ⊗ F − I_n ⊗ T′ − S′ ⊗ I_n is diagonal with diagonal entries α_i β_j − x_i − y_j for i, j ∈ [n].
Hence, we can conclude that the maximum value of the program (1.20) is equal to the maximum value of the program (1.15), which is Σ_i α_i β_i by Lemma 1.3.3. This implies that the optimal value of (1.13) is equal to Σ_i α_i β_i, which concludes the proof.
The QAP problem models the following facility location problem, where one
wants to allocate n facilities to n locations at the lowest possible total cost.
The cost of allocating two facilities i and j to the respective locations σ(i)
and σ(j) is then Aij Bσ(i)σ(j) , where Aij can be seen as the ‘flow’ cost between
the facilities i and j, and Bσ(i)σ(j) is the ‘distance’ between the locations
σ(i) and σ(j). For instance, you may think of the campus building problem,
where one needs to locate n buildings at n locations, Aij represents the traffic
intensity between the buildings i and j, and Bhk is the distance between the
locations h and k.
As QAP is an NP-hard problem one is interested in finding some tractable
lower bounds for it. As we now see, such a bound can be obtained from the
result in Theorem 1.3.1. For this observe first that problem (1.21) can be
reformulated as the following optimization problem over the set of permu-
tation matrices:
QAP(A, B) = min{Tr(AXBX^T) : X is a permutation matrix}
(because Σ_{i,j=1}^n A_ij B_{σ(i)σ(j)} = Tr(AXBX^T) if X = P_σ). Then, observe
that a matrix X is a permutation matrix if and only if it is simultaneously
doubly stochastic and orthogonal (Exercise 1.2). Hence, if in program (1.21)
we replace the condition that X is a permutation matrix by the condition
that X is an orthogonal matrix, then we obtain the program (1.11), which
is thus a relaxation of the QAP program (1.21). This shows the inequality
QAP(A, B) ≥ OPT(A, B)
and the next theorem.
Theorem 1.3.4 Let A, B ∈ S n be symmetric matrices with respective
eigenvalues α1 , . . . , αn and β1 , . . . , βn ordered as follows: α1 ≤ . . . ≤ αn and
β1 ≥ . . . ≥ βn . Then we have
QAP(A, B) ≥ OPT(A, B) = Σ_{i=1}^n α_i β_i.
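The bound of Theorem 1.3.4 can be checked numerically on small instances by brute force; the following sketch (numpy and itertools only, on a random instance of our own choosing) enumerates all permutation matrices.

```python
# Sketch: checking the eigenvalue bound of Theorem 1.3.4 on a small instance.
import itertools
import numpy as np

n = 5
rng = np.random.default_rng(2)
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

# QAP(A, B) by enumeration: min over permutation matrices of Tr(A X B X^T).
qap = min(np.trace(A @ P @ B @ P.T)
          for sigma in itertools.permutations(range(n))
          for P in [np.eye(n)[list(sigma)]])

alpha = np.sort(np.linalg.eigvalsh(A))          # ascending
beta = np.sort(np.linalg.eigvalsh(B))[::-1]     # descending
print(qap, ">=", np.dot(alpha, beta))           # eigenvalue lower bound
```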
L^{n+1} := {(x, t) ∈ R^n × R : ‖x‖ ≤ t} = { (x, t) ∈ R^n × R : [ t  x^T ; x  tI_n ] ⪰ 0 }.
So, if we intersect the set Ln+1 with the hyperplane t = t0 (for some scalar
t0 ), then we obtain in the x-space the Euclidean ball of radius t0 . The set
Ln+1 is a cone, known as the second-order cone (or Lorentz cone). This
cone is briefly introduced in Appendix 2.2.2 and we will come back to it in
Chapter 2.
min_x {c^T x : ‖x‖ ≤ 1, x ∈ R^n}
= min_x { c^T x : [ 1  x^T ; x  I_n ] ⪰ 0, x ∈ R^n }   (1.23)
= max_X { −Tr(X) : 2X_{0i} = c_i for i ∈ [n], X ⪰ 0, X ∈ S^{n+1} }.
Proof Apply Lemma 1.4.1 combined with the fact that strong duality holds
between the primal and dual programs (see Theorem 1.1.2).
where c, a ∈ Rn and b ∈ R are given data, with just one constraint for
simplicity of exposition. In practical applications the data a, b might be given
through experimental results and might not be known exactly with 100%
certainty, which is in fact the case in most of the real world applications
of linear programming. One may write a = a(z) and b = b(z) as functions
of an uncertainty parameter z assumed to lie in a given uncertainty region
Z ⊆ Rk . Then one wants to find an optimum solution x that is robust
against this uncertainty, i.e., that satisfies the constraints a(z)T x ≥ b(z) for
all values of the uncertainty parameter z ∈ Z. That is, solve the following
robust counterpart of the linear program (1.24):
Depending on the set Z this problem might have infinitely many constraints.
However, for certain choices of the functions a(z), b(z) and of the uncertainty
region Z, one can reformulate the problem as a semidefinite programming
problem.
Suppose that the uncertainty region Z is the unit ball and that a(z), b(z)
are linear functions in the uncertainty parameter z = (ζ1 , · · · , ζk ) ∈ Rk , of
the form
a(z) = a_0 + Σ_{j=1}^k ζ_j a_j,   b(z) = b_0 + Σ_{j=1}^k ζ_j b_j   (1.26)
Theorem 1.5.1 Suppose that the functions a(z) and b(z) are given by
(1.26) and that Z = {z ∈ Rk : kzk ≤ 1}. Then problem (1.25) is equivalent
to the problem:
min_{x∈R^n, Z∈S^{k+1}} { c^T x : a_j^T x − 2Z_{0j} = b_j for j ∈ [k],
a_0^T x − Tr(Z) ≥ b_0,  Z ⪰ 0, Z ∈ S^{k+1}, x ∈ R^n }.   (1.27)
Proof Fix x ∈ R^n, set α_j = a_j^T x − b_j for j = 0, 1, . . . , k, and define the vector α = (α_j)_{j=1}^k ∈ R^k (which depends on x). Then the constraints a(z)^T x ≥ b(z)
Therefore, we find the problem of deciding whether p*_x ≥ −α_0, where we set
Now the above problem fits precisely within the setting considered in Corol-
lary 1.4.2. Hence, we can rewrite it using the second formulation in (1.23) –
the one in maximization form – as
p*_x = max_Z { −Tr(Z) : 2Z_{0j} = α_j (j ∈ [k]), Z ⪰ 0, Z ∈ S^{k+1} }.
So, in problem (1.25), we can substitute the condition: a(z)T x ≥ b(z) for all
z ∈ Z, by the condition:
∃ Z ∈ S^{k+1}_+ such that −Tr(Z) ≥ −α_0 and 2Z_{0j} = α_j for all j ∈ [k].
The crucial fact here is that the quantifier “∀z” has been replaced by the
existential quantifier “∃Z”. As problem (1.25) is a maximization problem in
x, it is equivalent to the following maximization problem in the variables x
and Z:
max_{x,Z} { c^T x : a_0^T x − Tr(Z) ≥ b_0,  a_j^T x − 2Z_{0j} = b_j for j ∈ [k],  x ∈ R^n, Z ∈ S^{k+1}_+ }
and its optimum value is equal to ϑ(G) (because (1.28) is strictly feasible
and bounded – check it). Here, in the program (1.29), we have used the
elementary matrices Eij introduced in the abstract of the chapter.
We will come back in detail to the theta number in Chapter 4. As we will
see there, there is an interesting class of graphs for which α(G) = ϑ(G),
the so-called perfect graphs. For these graphs, the maximum independent
set problem can be solved in polynomial time. This result is one of the
first breakthrough applications of semidefinite programming obtained by
Grötschel, Lovász and Schrijver in the early 1980s.
(S, V \ S), i.e., for which exactly one of the two nodes i, j belongs to S; that
is,
δG (S) = {{i, j} ∈ E : i ∈ S, j ∈ V \ S}.
The maximum cut problem (or max-cut) asks to find a cut of maximum
cardinality in G. This is an NP-hard problem.
One can encode the max-cut problem using variables x ∈ {±1}n . For this,
given a subset S ⊆ V , assign xi = 1 to the nodes i ∈ S and xi = −1
to the nodes i ∈ V \ S. Then the cardinality of the cut δG (S) is equal to
Σ_{{i,j}∈E} (1 − x_i x_j)/2. Therefore max-cut can be formulated as
max-cut(G) = max_x { Σ_{{i,j}∈E} (1 − x_i x_j)/2 : x ∈ {±1}^n }.   (1.30)
x^T L_G x = Σ_{{i,j}∈E} (x_i − x_j)² for all x ∈ R^n,  and  (1/4) x^T L_G x = (1/2) Σ_{{i,j}∈E} (1 − x_i x_j) for all x ∈ {±1}^n.
The first identity shows that L_G ⪰ 0 and the second one shows that one can
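For concreteness, here is a sketch of the semidefinite relaxation of max-cut obtained this way (the same program appears again in Exercise 3.3); it assumes cvxpy with an SDP solver, and the 5-cycle used as a test graph is our own choice.

```python
# Sketch: the standard SDP relaxation of max-cut, max <L_G/4, X> s.t. X_ii = 1, X PSD.
import cvxpy as cp
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # the cycle C_5
n = 5
L = np.zeros((n, n))                                # Laplacian of the graph
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(L @ X) / 4),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()
print(prob.value)   # upper bound on max-cut(C_5) = 4; here about 4.52
```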
Of course, this problem comes in several flavors. One may search for such
vectors ui lying in a space of prescribed dimension k; then typically k = 1, 2,
or 3 would be of interest. This is in fact a hard problem. However, if we relax
the bound on the dimension and simply ask for the existence of the ui ’s in
Rk for some k ≥ 1, then the problem can be cast as the problem of deciding
feasibility of a semidefinite program.
Moreover, such vectors exist in the space Rk if and only if the above semidef-
inite program has a feasible solution of rank at most k.
Proof Directly, using the fact that X 0 if and only if X admits a Gram
representation u_1, . . . , u_n ∈ R^k (for some k ≥ 1), i.e., X_ij = u_i^T u_j for all
i, j ∈ [n]. Moreover, the rank of X is equal to the dimension of the linear
span of the set {u1 , . . . , un }.
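In floating-point arithmetic a Gram representation can be read off from a spectral decomposition, as in the following numpy sketch (the function name and tolerance are our own).

```python
# Sketch: recover a Gram representation u_1, ..., u_n of a PSD matrix X.
import numpy as np

def gram_factors(X, tol=1e-9):
    """Return a matrix U whose columns u_1, ..., u_n satisfy X_ij = u_i^T u_j."""
    w, V = np.linalg.eigh(X)           # X = V diag(w) V^T with w ascending
    w = np.clip(w, 0, None)            # clip tiny negative eigenvalues
    keep = w > tol                     # k = number of nonzero eigenvalues = rank(X)
    return (V[:, keep] * np.sqrt(w[keep])).T   # k x n matrix, column i is u_i

X = np.array([[2.0, 1.0], [1.0, 2.0]])
U = gram_factors(X)
print(np.allclose(U.T @ U, X))         # True: X is the Gram matrix of the u_i
```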
x^α stands for the monomial x_1^{α_1} · · · x_n^{α_n}, where α ∈ N^n. The sum is finite and the maximum value of |α| = Σ_{i=1}^n α_i for which p_α ≠ 0 is the degree of p. For
and historical perspective and the monographs by Prestel and Delzell [2001]
and by Marshall [2008] give an in-depth treatment of positivity.
Exercises
1.1 Consider the program (1.9) in Fan’s theorem (Theorem 1.2.2).
(a) Formulate the program (1.9) as a semidefinite program in primal
standard form.
(b) Show that the dual SDP of the program (1.9) can be formulated as
the following SDP:
min_{z∈R, Z∈S^n} { kz + Σ_{i=1}^n Z_ii : Z ⪰ 0, −C + zI + Z ⪰ 0 }
1.2 Show that the following assertions (a)–(d) are equivalent for a matrix
X ∈ Rn×n :
(a) X is a permutation matrix.
(b) X is an orthogonal matrix and X is doubly stochastic.
(c) X is doubly stochastic and ‖X‖ = √n.
(d) X is doubly stochastic with entries in {0, 1}.
Here ‖X‖ = (Σ_{i,j} X_ij²)^{1/2} denotes the Frobenius norm of the matrix X.
(a) Build the dual of the semidefinite program (1.33) and show that it is equivalent to
(n/4) min_{u∈R^n} { λ_max(Diag(u) + L_G) : e^T u = 0 },
where Diag(u) is the diagonal matrix with diagonal entries u1 , . . . , un .
(b) Show that the maximum cardinality of a cut is at most
(n/4) λ_max(L_G),
where λmax (LG ) is the maximum eigenvalue of the Laplacian matrix
of G.
(c) Show that the maximum cardinality of a cut in G is at most
(1/2)|E| − (n/4) λ_min(A_G),
where AG is the adjacency matrix of G (with entry 1 at the positions
(i, j) corresponding to edges of G and 0 elsewhere).
(d) Show that both bounds in (b) and (c) coincide when G is a regular
graph (i.e., when all nodes have the same degree).
Hint: Check and use the following fact (for the implication (c) =⇒ (d)): Given a matrix V ∈ R^{n×m}, let x_1, . . . , x_n ∈ R^m be the vectors corresponding to its rows. Then we have V^T F_ij V = x_i x_j^T.
2
Duality in conic programming (Version: May 24,
2022)
minimize f0 (x)
subject to f1 (x) ≤ 0, . . . , fN (x) ≤ 0,
a_1^T x = b_1, . . . , a_M^T x = b_M,
The equality constraints are given by vectors aj ∈ Rn \ {0} and right hand
sides bj ∈ R. The convex set of feasible solutions is the intersection of N
convex sets with M hyperplanes
⋂_{i=1}^N {x ∈ D : f_i(x) ≤ 0} ∩ ⋂_{j=1}^M {x ∈ R^n : a_j^T x = b_j}.
maximize cT x
subject to x ∈ K,
a_1^T x = b_1, . . . , a_m^T x = b_m.
programming (SDP).
As we will see in the next lecture, these three cones have particularly nice
analytic properties: They have a self-concordant barrier function which is
easy to evaluate. This implies that there are theoretically (polynomial-time)
and practically efficient algorithms to solve these standard problems.
In addition to this, the three examples are ordered by their “difficulty”,
which can be pictured as
LP ⊆ CQP ⊆ SDP.
This means that one can formulate every linear program as a conic quadratic
program and one can formulate every conic quadratic program as a semidef-
inite program.
Dual cones of proper cones are proper as well; you will prove the following
lemma in Exercise 2.1.
Lemma 2.1.6 If K is a proper convex cone, then its dual cone K ∗ is
proper.
One can verify that a vector lies in the interior of K via its dual cone K ∗ ;
this fact will turn out to be quite useful and you will prove it in Exercise 2.2:
Lemma 2.1.7 Let K be a closed, full-dimensional convex cone. Then x
lies in the interior of K if and only if xT y > 0 for all y ∈ K ∗ \ {0}.
One can translate Carathéodory’s theorem for convex hulls, Lemma A.2.1,
including its proof, for conic hulls: Let y ∈ cone X be a vector in the conic
hull of X. Then there are linearly independent vectors x1 , . . . , xN ∈ X so
that y ∈ cone{x1 , . . . , xN }. In particular, N ≤ n.
Sometimes we also call the conic hull of X the cone generated by X. If
X is a finite set, then cone X is called finitely generated. By the theorem
of Minkowski and Weyl for convex cones (see Section A.7) a convex cone
K is finitely generated if and only if it is polyhedral, that is, if it is the
intersection of finitely many halfspaces through the origin. Then,
K = cone{x1 , . . . , xN } = {x ∈ Rn : Ax ≤ 0}
for some vectors x1 , . . . , xN and some matrix A ∈ Rm×n . Using this repre-
sentation it is immediate to find the dual cone of K, it is
where a_1^T, . . . , a_m^T are the row vectors of the matrix A.
The direct product of two convex cones K1 ⊆ Rn1 and K2 ⊆ Rn2 is
Here (x, s) stands for the (column) vector in Rn+1 obtained by appending a
new entry s ∈ R to x ∈ Rn , we use this notation to emphasize the different
nature of the vector’s components. Sometimes Ln+1 is also called the ice
cream cone (make a drawing of L2+1 to convince yourself) or the Lorentz
cone.
The second-order cone is a special case of a norm cone. The norm cone
associated to any norm k · k in Rn is defined as
L^{n+1}_{‖·‖} = {(x, s) ∈ R^n × R : ‖x‖ ≤ s}.
The norm cone is proper for every norm. The dual norm of ‖·‖ is defined as
‖x‖_* = sup{x^T y : y ∈ R^n, ‖y‖ ≤ 1}.
The dual norm is again a norm and its norm cone satisfies
L^{n+1}_{‖·‖_*} = (L^{n+1}_{‖·‖})^*.
Since the ℓ_2-norm is its own dual, the second-order cone is self-dual; we have (L^{n+1})^* = L^{n+1}.
where we only consider the upper triangular part of the matrix X. The cone
of semidefinite matrices is
S^n_+ = {X ∈ S^n : X is positive semidefinite},
inner product. For example, we will see that the cone of positive semidefinite matrices is a proper convex cone which is self-dual.
CP^n = cone{xx^T : x ∈ R^n_+}
is contained in S^n_+. It is a proper convex cone. Its dual is
Now the symmetry between the primal and the dual conic program becomes
more clear, both programs maximize, respectively minimize, a linear func-
tional over the intersection of a cone with an affine subspace. Using this
geometric view and by applying the bipolar theorem, Theorem 2.1.5, it is
easy to see that computing the dual of the dual conic program gives back
the primal.
Now we specialize the cone K to the main examples of Section 2.2. These
examples are useful for a huge spectrum of applications.
sup{c^T x : x ≥ 0, Ax = b}.
inf{b^T y : A^T y − c ≥ 0}.
The dual can be written in a nicer and more intuitive form using the defini-
tion of the cone Ln+1 and the Euclidean norm. For this define the matrices
A_i = (a_{1,i}, . . . , a_{m,i}) ∈ R^{n_i × m} for i ∈ [r],
and vectors
α_i = (α_{1,i}, . . . , α_{m,i})^T for i ∈ [r].
Then the dual is equivalent to
inf{b^T y : y ∈ R^m, ‖A_i y − c_i‖ ≤ α_i^T y − γ_i for i ∈ [r]}.
In particular, when setting A_i = 0 and c_i = 0, we see that linear programming is a special case of conic quadratic programming.
The cone of positive semidefinite matrices is self-dual, and hence the dual
semidefinite program is
inf { Σ_{j=1}^m b_j y_j : y_1, . . . , y_m ∈ R, Σ_{j=1}^m y_j A_j − C ∈ S^n_+ }.
Second, one can model computationally difficult, NP-hard problems, like determining the independence number of a graph, as a copositive program,
see Exercise 2.7. This in particular shows that conic optimization is not
necessarily computationally easy.
d* = inf{b^T y : y ∈ R^m, A^T y − c ∈ K*}.   (D)
Theorem 2.4.1 Suppose we are given a pair of primal and dual conic
programs. Let p∗ be the supremum of the primal and let d∗ be the infimum
of the dual.
(i) (weak duality) Suppose x is a feasible solution of the primal conic pro-
gram, and y is a feasible solution of the dual conic program. Then,
cT x ≤ bT y.
In particular p∗ ≤ d∗ .
(ii) (complementary slackness) Suppose that the primal conic program
attains its supremum at x, and that the dual conic program attains its
infimum at y, and that p∗ = d∗ . Then
(A^T y − c)^T x = 0.
Before we proceed to the proof one more comment about the usefulness
of weak duality: Suppose you want to solve a primal conic program. If an
oracle gives you y, then it might be wise to check whether AT y − c lies in
K ∗ . If so, then this gives immediately an upper bound for p∗ .
One last remark: If the dual conic program is not bounded from below,
that is, if d∗ = −∞, then weak duality implies that p∗ = −∞, and so the
primal conic program is infeasible.
Proof The proof of weak duality is important but simple. It reveals the
origin of the definition of the dual conic program: We have
b^T y = (Ax)^T y = x^T A^T y ≥ x^T c,
or, equivalently,
b^T y ≤ d* =⇒ (Ax)^T y ≤ x^T c.
This means that
{y ∈ R^m : b^T y ≤ d*} ⊆ {y ∈ R^m : (Ax)^T y ≤ x^T c}.
The set on the left hand side is a half-space with normal vector b. If Ax ≠ 0, then the set on the right hand side is also a half-space, with normal vector Ax, which points in the same direction as b. So there is a strictly positive scalar µ > 0 such that
Ax = µb and µd* ≤ x^T c,
and we are done.
If Ax = 0, then on the one hand, we have that x^T c ≥ 0. On the other hand, using the assumption that the conic dual program is strictly feasible, there exists y′ ∈ R^m such that A^T y′ − c ∈ int K*. This implies
0 < (A^T y′ − c)^T x = −c^T x,
where the strict inequality follows from Lemma 2.1.7. This gives c^T x < 0, a contradiction.
Third step: x* = x/µ is a maximizer of the primal conic program. We saw above that x* ∈ K and Ax* = b. Thus x* is a primal feasible solution. Furthermore, c^T x* ≥ d* ≥ p*.
Then every primal feasible solution satisfies X13 = 0, X22 = 1, so that the
Proof Suppose (i) is not weakly feasible. Then b does not lie in the closed
convex cone
CA = {Ax : x ∈ K}.
Now we can complete the proof exactly as in the case of the non-negative
orthant.
Example 2.6.5 Consider Example 2.6.2: The system (2.6) is weakly fea-
sible as one sees by choosing the sequence
X_i = [ 1/i  1 ; 1  i ],  i ∈ N.
We can derive a variant of Theorem 2.6.4 by switching primal with dual,
see Exercise 2.8.
Definition 2.6.6 Given A ∈ Rm×n and c ∈ Rn . We say that the conic
programming system
AT y − c ∈ K ∗
kAT y − c − zk ≤ ε.
0 ≤ (A^T y)^T x = y^T Ax = y^T b ≤ 0,
implying (A^T y)^T x = 0. This contradicts Lemma 2.1.7 since x ∈ int K and A^T y ∈ K* \ {0}.
Let us turn to the other direction. By assumption, the affine space L = {x : Ax = b} is not empty as it contains x_0. Write L = L_0 + x_0 for the linear space L_0 = {x : Ax = 0}.
Because (i) has no solution, L ∩ int K = ∅. By the separation theorem, Theorem A.4.5, there exists a hyperplane separating L and int K: There exists a non-zero vector c ∈ R^n and a scalar β such that
holds.
Then β ≤ 0 as 0 ∈ K. Furthermore, c ∈ K* because for all x ∈ K and all t > 0 we have c^T(tx) ≥ β, which implies c^T x ≥ 0. Moreover, for every x ∈ L_0 and for every scalar t ∈ R, either positive or negative, we have that c^T(tx + x_0) ≤ β, which implies c^T x = 0. Therefore c ∈ L_0^⊥ and thus c is a linear combination of the row vectors of A, say c = A^T y for some y ∈ R^m. Therefore, A^T y ∈ K* \ {0}.
Finally, since x_0 ∈ L, we have
y^T b = y^T A x_0 = c^T x_0 ≤ β ≤ 0.
Example 2.6.9 Consider again Example 2.6.2: The system (2.6) is not
strictly feasible, but there is a feasible solution y1 = 1, y2 = 0 of (2.7)
after replacing the condition y2 < 0 by y2 ≤ 0 and adding the condition
y_1 E_11 + y_2 E_12 ≠ 0.
p* = sup{c^T x : x ∈ K, a_j^T x = b_j for j ∈ [m]},   (2.8)
d* = inf { b^T y : y ∈ R^m, Σ_{j=1}^m y_j a_j − c ∈ K* }.   (2.9)
primal and dual programs and the supremum/infimum might not be attained
even though they are finite. We point out some more differences regarding
rationality and bit size of optimal solutions.
In the classical bit (Turing machine) model of computation an integer
number p is encoded in binary notation, so that its bit size is log p + 1
(logarithm in base 2). Rational numbers are encoded as two integer numbers
and the bit size of a vector or a matrix is the sum of the bit sizes of its entries.
Consider a linear program
max{cT x : Ax = b, x ≥ 0}, (2.10)
where the data A, b, c is rational-valued. From the point of view of com-
putability this is a natural assumption and it would be desirable to have
an optimal solution which is also rational-valued. A fundamental result in
linear programming asserts that this is indeed the case: If program (2.10)
has an optimal solution, then it has a rational optimal solution x ∈ Qn ,
whose bit size is polynomially bounded in terms of the bit sizes of A, b, c.
On the other hand it is easy to construct instances of semidefinite pro-
gramming where the data are rational valued, yet there is no rational optimal
solution. For instance, the following program
max { x : [ 1  x ; x  2 ] ⪰ 0 }
attains its maximum at x = √2.
Consider now the semidefinite program, with variables x1 , . . . , xn ,
inf { x_n : [ 1  2 ; 2  x_1 ] ⪰ 0,  [ 1  x_{i−1} ; x_{i−1}  x_i ] ⪰ 0 for i = 2, . . . , n }.
Then any feasible solution satisfies x_n ≥ 2^{2^n}. Hence the bit-size of an optimal solution is exponential in n, thus exponential in terms of the bit-size of the data.
The above facts suggest difficulties for complexity issues about semidef-
inite programming. For instance, one cannot hope for a polynomial time
algorithm in the bit model of computation for solving a semidefinite pro-
gram exactly, since the output might not even be representable in this model.
Moreover, even if we settle for the less ambitious goal of just computing ε-approximate optimal solutions, we should make some assumptions on the
semidefinite program, roughly speaking, in order to avoid having too large
or too small optimal solutions. We will come back to these complexity issues
about semidefinite programming in Chapter 3.
Exercises
2.1 Prove Lemma 2.1.6.
2.2 Prove Lemma 2.1.7.
2.3 Show that the set of non-negative polynomials of
degree at most 2d
{ (a_0, a_1, . . . , a_{2d}) ∈ R^{2d+1} : a_0 + a_1 x + · · · + a_{2d} x^{2d} ≥ 0 for all x ∈ R }
(c) Give some conditions ensuring that there is no duality gap between
(2.11) and (2.12), i.e., that p∗ = d∗ .
2.7 Let G = (V, E) be a graph. The independence
number α(G) of the graph is the maximal cardinality of a set S ⊆ V
such that {i, j} 6∈ E for any i, j ∈ S. Show that α(G) equals the optimal
value of the following conic program:
maximize ⟨J, A⟩
subject to A ∈ CP^n,
⟨I, A⟩ = 1,
A_ij = 0 if {i, j} ∈ E.
2.8 Prove Theorem 2.6.7.
2.9 Prove Theorem 2.6.10.
Hint: You may derive it from Theorem 2.6.8.
3
The ellipsoid method
max c^T x
x ∈ R^n   (3.1)
Ax = b
x ≥ 0
size is polynomially bounded in terms of the bit sizes of the input A, b, and
c (see e.g. Schrijver [1986]).
The first polynomial-time algorithm for solving LPs was given by Khachiyan2
in 1979, based on the ellipsoid method. The value of this algorithm is how-
ever mainly theoretical as it is very slow in practice. Later the algorithm of
Karmarkar3 in 1984 opened the way to polynomial time algorithms for LP
based on interior-point algorithms, which also perform well in practice.
What about algorithms for solving semidefinite programs?
First of all, one cannot hope for a polynomial time algorithm permitting
to solve any semidefinite program rationally. Indeed, even if the data of the
SDP are assumed to be rational-valued, the output might be an irrational
number. For instance, the following program
max x
x ∈ R   (3.2)
[ 1  x ; x  2 ] ⪰ 0
attains its maximum at x = √2. Therefore, one should look at algorithms
permitting to compute in polynomial time an ε-approximate optimal solu-
tion.
However, even if we settle for this less ambitious goal of just computing
ε-approximate optimal solutions, we should make some assumptions on the
semidefinite program, roughly speaking, in order to avoid having too large
optimal solutions.
An instance of SDP whose output is exponentially large in the bit size of
the data is the semidefinite program, with variables x1 , . . . , xn ,
inf x_n
x ∈ R^n   (3.3)
[ 1  2 ; 2  x_1 ] ⪰ 0,  [ 1  x_{i−1} ; x_{i−1}  x_i ] ⪰ 0 for i = 2, . . . , n
Then any feasible solution satisfies x_n ≥ 2^{2^n}. Hence the bit-size of an optimal
solution is exponential in n, thus exponential in terms of the bit-size of the
data. So even writing down the optimal solution requires exponentially many
basic operations.
On the positive side, it is well known that one can test whether a given
rational matrix is positive semidefinite in polynomial time — using Gaussian
2 Leonid Khachiyan (1952–2005)
3 Narendra Karmarkar (1957–)
elimination. Hence one can test in polynomial time membership in the pos-
itive semidefinite cone. Moreover, as a byproduct of Gaussian elimination,
if X ∉ S^n_+, then one can compute in polynomial time a hyperplane strictly separating X from S^n_+. See Section 3.4 below for details.
This observation is at the base of the polynomial time algorithm for solving
approximately semidefinite programs, based on the ellipsoid method which is
the subject of this chapter. Roughly speaking, one can solve a semidefinite
program in polynomial time up to any given precision. More precisely, in
this chapter we shall prove the following result describing the complexity of
solving semidefinite programming with the ellipsoid method:
Theorem 3.0.1 Consider the semidefinite program
p* = sup ⟨C, X⟩
X ∈ S^n
⟨A_j, X⟩ = b_j for j ∈ [m]
X ⪰ 0
where A_j, C, b_j are rational-valued. Denote by F its feasibility region. Suppose we know a rational point x_0 ∈ F and positive rational numbers r, R so that
x_0 + rB_d ⊆ F ⊆ x_0 + RB_d,
where B_d is the unit ball in the subspace
{Y ∈ S^n : ⟨A_j, Y⟩ = 0 (j ∈ [m])}.
Let ε > 0 be given. Then, one can find a rational matrix X* ∈ F such that
p* − ⟨C, X*⟩ ≤ ε.
3.1 Ellipsoids
3.1.1 Definitions
A positive definite matrix A ∈ S^n_{++} and a vector x ∈ R^n define the ellipsoid E(A, x) by
E(A, x) = {y ∈ R^n : (y − x)^T A^{-1}(y − x) ≤ 1}.
For instance, E(r²I_n, 0) = rB_n is the ball of radius r centered at the origin.
Let A = Σ_{i=1}^n λ_i u_i u_i^T be a spectral decomposition of A. Then the directions of the vectors u_i are the axes of the ellipsoid E(A, x). Furthermore, the value √λ_i equals the length of the corresponding semiaxis, and the volume of E(A, x) equals
vol E(A, x) = √(λ_1 · · · λ_n) vol B_n = √(det A) vol B_n,
where
B_n = {x ∈ R^n : ‖x‖ ≤ 1}
is the n-dimensional unit ball. It has volume
vol B_n = π^{n/2} / Γ(n/2 + 1),
where Γ is the gamma function, a continuation of the factorial function, which for nonnegative half-integral arguments is determined by
Γ(1/2) = √π,  Γ(1) = 1,  Γ(x + 1) = xΓ(x).
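As a small sanity check of the volume formula, assuming scipy is available for the gamma function:

```python
# Quick numerical check of vol B_n = pi^(n/2) / Gamma(n/2 + 1).
import numpy as np
from scipy.special import gamma

for n in [1, 2, 3]:
    print(n, np.pi**(n / 2) / gamma(n / 2 + 1))   # 2, pi, 4*pi/3
```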
This dimension dependent factor of vol Bn usually does not play a role.
The definition of E(A, x) is an implicit definition using a strictly convex
quadratic inequality. There is also an explicit definition of ellipsoids as the
image of the unit ball under an invertible affine transformation
{T y + x : y ∈ Bn },
where T ∈ Rn×n is an invertible matrix, and where x ∈ Rn is a translation
vector.
From linear algebra it is known that every invertible matrix T has a
factorization of the form T = BP where B ∈ S^n_{++} is a positive definite
matrix and P ∈ O(n) is an orthogonal matrix. So we may assume in the
following that the matrix T which defines the ellipsoid is a positive definite
matrix.
In fact one can find this factorization, the polar factorization, from the
singular value decomposition of T
T = U^T Σ V,  U, V ∈ O(n),  Σ = diag(σ_1, . . . , σ_n),
where σ_i ≥ 0 are the singular values of T (i.e., σ_i² are the eigenvalues of the matrix T^T T or, equivalently, of T T^T). Then,
T = BP with B = U^T Σ U, P = U^T V.
The singular values of T are at the same time the lengths of the semiaxes of the ellipsoid.
E(A, x) ∩ {y ∈ R^n : a^T y ≥ a^T x}
is equal to E(A′, x′) with
A′ = (n²/(n²−1)) (A − (2/(n+1)) bb^T),   x′ = x + (1/(n+1)) b,   b = (1/√(a^T A a)) Aa.   (3.4)
Furthermore, if n ≥ 2,
vol E(A′, x′) / vol E(A, x) ≤ e^{−1/(2n)} < 1.   (3.5)
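The update formula (3.4) and the volume bound (3.5) are easy to check numerically; the following numpy sketch (function and variable names are ours) performs one update step starting from a ball.

```python
# Sketch: one ellipsoid update step via (3.4), with the volume ratio check (3.5).
# E(A, x) is the ellipsoid {y : (y - x)^T A^{-1} (y - x) <= 1}.
import numpy as np

def ellipsoid_update(A, x, a):
    """Smallest ellipsoid containing E(A, x) ∩ {y : a^T y >= a^T x}, as in (3.4)."""
    n = len(x)
    b = A @ a / np.sqrt(a @ A @ a)
    x_new = x + b / (n + 1)
    A_new = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(b, b))
    return A_new, x_new

n = 6
A, x, a = np.eye(n), np.zeros(n), np.ones(n)
A_new, x_new = ellipsoid_update(A, x, a)

# Volume ratio sqrt(det A') / sqrt(det A) should be at most exp(-1/(2n)), cf. (3.5).
ratio = np.sqrt(np.linalg.det(A_new) / np.linalg.det(A))
print(ratio, "<=", np.exp(-1 / (2 * n)))
```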
Proof By performing an affine transformation T : Rn → Rn we may assume
that A = In , x = 0, and a = e1 . So we are trying to find the smallest volume
ellipsoid containing Bn ∩ {y ∈ Rn : y1 ≥ 0}. The symmetry of this convex
body forces its Loewner-John ellipsoid to be defined by a diagonal matrix
for some α, β, γ > 0. The points e1 , ±e2 , . . . , ±en lie on the boundary of
the Loewner-John ellipsoid. Hence for i = 2, . . . , n we get the equations
1 = (±e_i − γe_1)^T diag(1/α, 1/β, . . . , 1/β)(±e_i − γe_1) = 1/β + γ²/α,
and
1 = (e_1 − γe_1)^T diag(1/α, 1/β, . . . , 1/β)(e_1 − γe_1) = (1 − γ)²/α.
So we can eliminate the variables α = (1 − γ)² and β = (1 − γ)²/(1 − 2γ), because
1/β = 1 − γ²/α = ((1 − γ)² − γ²)/(1 − γ)² = (1 − 2γ)/(1 − γ)².
Now we have to find the minimum of √(det A′), which is the minimum of the function
γ ↦ √(α β^{n−1}) = √( (1 − γ)^{2n} / (1 − 2γ)^{n−1} ).
How do you give a convex body? Here there are many possibilities: If K is a polytope,
then one can give a list of supporting hyperplanes, or one can give the list
of extreme points. If K is the set of feasible solutions of a primal semidefinite
program, a compact spectrahedron, then one can use the symmetric matrices
A1 , . . . , Am ∈ S n and the right hand sides b1 , . . . , bm to determine K. In
the following we will use a less explicit description of K, in fact it is a
black-box description. We give K in terms of a separation oracle, which is an algorithm that can decide whether some point belongs to K and, if not, gives a separating hyperplane.
Formally the separation oracle is an algorithm which can solve the sepa-
ration problem for a convex body K:
given: vector x ∈ Rn
find: either assert x ∈ K or find d ∈ R^n with d^T x ≥ max_{y∈K} d^T y
The ellipsoid method uses this separation oracle to solve the optimization
problem for a convex body K:
E_0 = E(R²I, x_0)
for k = 0, . . . , N − 1 do
Let xk be the center of the ellipsoid Ek
Use the separation procedure for xk
if xk ∈ K then
k is a feasible index
a=c
else
a = −d
end if
Ek+1 = E(Ek ∩ {y ∈ Rn : aT y ≥ aT xk })
end for
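A minimal, self-contained Python sketch of this loop for a toy problem (maximizing c^T x over the unit ball, so that the separation oracle is trivial); the concrete parameters and names are our own, and no rounding issues are addressed here.

```python
# Sketch of the ellipsoid method loop above, using the update formula (3.4).
import numpy as np

def ellipsoid_max(c, R=2.0, N=200, n=3):
    A, x = R**2 * np.eye(n), np.zeros(n)      # E_0 = E(R^2 I, x_0)
    best = None
    for _ in range(N):
        if np.linalg.norm(x) <= 1.0:          # separation: x in K?
            a = c                              # feasible index: cut with the objective
            best = x if best is None or c @ x > c @ best else best
        else:
            a = -x                             # infeasible: d = x separates x from the ball
        b = A @ a / np.sqrt(a @ A @ a)         # update (3.4)
        x = x + b / (n + 1)
        A = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(b, b))
    return best

c = np.array([1.0, 2.0, 2.0])
print(ellipsoid_max(c), c / np.linalg.norm(c))   # maximizer over the ball is c/||c||
```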
c^T z − c^T x_j ≤ ( (n vol B_n)/(vol B_{n−1}) )^{1/n} (R²/r) e^{−k/(2n²)} ≤ 2 (R²/r) e^{−k/(2n²)}.
(recall that πK is the metric projection on the convex body K from Sec-
tion ??) and the weak optimization problem for a convex body K:
N = 4n² ⌈log(2R²‖c‖/(rε))⌉
δ = R² 4^{−N}/(300n)
p = 5N
A_0 = R²I_n
for k = 0, . . . , N − 1 do
Use the separation procedure with input xk , and δ
if kxk − πK (xk )k ≤ δ then
k is a feasible index
a=c
else
a = −d
end if
b_k = A_k a / √(a^T A_k a)
x*_k = x_k + (1/(n+1)) b_k
A*_k = ((2n²+3)/(2n²)) (A_k − (2/(n+1)) b_k b_k^T)
x_{k+1} ≈ x*_k
A_{k+1} ≈ A*_k
end for
The entries of b_k, x*_k and A*_k might not be rational as the formulæ contain square roots. To make them rational we use the sign ≈, which means that we round to p binary digits after the binary point. So the entries of x*_k and x_{k+1}, respectively of A*_k and A_{k+1}, differ by at most 2^{−p}. In an implementation we are careful so that A_{k+1} stays symmetric.
The following lemma from linear algebra is sometimes very useful when
analyzing numerical algorithms which perform rank-1 updates of matrices:
If we know the inverse of a matrix A and if we add to A a matrix of rank 1,
then finding the inverse of A can be done by a simple formula. In the ellipsoid
method the transition from the matrix defining ellipsoid Ek to the matrix
of ellipsoid Ek+1 is such a rank-1 update.
The next lemma states that we have some control on the size of the
coefficients which occur during the computation. We use the operator norm 8
for this. The operator norm of a matrix A is defined by
‖A‖ = max{‖Ax‖ : x ∈ R^n, ‖x‖ = 1},
where ‖Ax‖ and ‖x‖ denote the Euclidean norm. If A is symmetric, then ‖A‖ is the maximum absolute value of the eigenvalues of A.
(A*_k)^{-1} = (2n²/(2n²+3)) ( A_k^{-1} + (2/(n−1)) · (aa^T)/(a^T A_k a) ).
Hence, we see by induction on k that (A∗k )−1 is positive definite as it is
the sum of a positive definite matrix with a positive semidefinite matrix.
Also its inverse, A∗k , is positive definite since its eigenvalues, which are the
reciprocals of the eigenvalues of (A∗k )−1 , are all positive.
Using the induction hypothesis for Ak we get
‖A*_k‖ = ((2n²+3)/(2n²)) ‖A_k − (2/(n+1)) b_k b_k^T‖ ≤ ((2n²+3)/(2n²)) ‖A_k‖ ≤ (1 + 3/(2n²)) R² 2^k.
8 Note that the operator norm is different from the Frobenius norm. In this chapter we only use
the operator norm so this should not cause confusion.
So
‖A_{k+1}‖ ≤ ‖A*_k‖ + ‖A_{k+1} − A*_k‖ ≤ (1 + 3/(2n²)) R² 2^k + n 2^{−p} ≤ R² 2^{k+1}.
Further,
‖b_k‖ = ‖A_k a‖ / √(a^T A_k a) = √( (a^T A_k² a)/(a^T A_k a) ) ≤ √(‖A_k‖) ≤ R 2^{k/2},
and so
‖x_{k+1}‖ = ‖x_{k+1} − x*_k + x*_k‖ ≤ ‖x_{k+1} − x*_k‖ + ‖x_k‖ + (1/(n+1)) ‖b_k‖
≤ √n 2^{−p} + ‖x_0‖ + R 2^k + (1/(n+1)) R 2^{k/2} ≤ ‖x_0‖ + R 2^{k+1}.
Further,
‖(A*_k)^{-1}‖ ≤ (2n²/(2n²+3)) ( ‖A_k^{-1}‖ + (2/(n−1)) · ‖a‖²/(a^T A_k a) )
≤ (2n²/(2n²+3)) ( ‖A_k^{-1}‖ + (2/(n−1)) ‖A_k^{-1}‖ ) ≤ ((n+1)/(n−1)) ‖A_k^{-1}‖,
where we used the induction hypothesis that A_k is positive definite and thus the fraction ‖a‖²/(a^T A_k a) is at most 1/λ_min(A_k) = ‖A_k^{-1}‖.
Let v be a normalized eigenvector of the smallest eigenvalue λ_min(A_{k+1}) of A_{k+1}. Then
λ_min(A_{k+1}) = v^T A_{k+1} v = v^T A*_k v + v^T (A_{k+1} − A*_k) v
≥ ‖(A*_k)^{-1}‖^{-1} − ‖A_{k+1} − A*_k‖ ≥ ((n−1)/(n+1)) ‖A_k^{-1}‖^{-1} − n 2^{−p}
≥ ((n−1)/(n+1)) R² 4^{−k} − n 2^{−p} ≥ R² 4^{−(k+1)}.
Hence, A_{k+1} is positive definite and
‖A_{k+1}^{-1}‖ = 1/λ_min(A_{k+1}) ≤ R^{−2} 4^{k+1}.
Proof Similar to Lemma 3.1.1; one only needs to take the rounding errors into account. This can be done as in the proof of the previous lemma.
The next lemma is Lemma 3.2.1 word for word, only its proof needs more work.
a^T x ≥ a^T x_k − δ.   (3.7)
We only need the δ when k is not a feasible index and a = −d. Decompose
the vector x:
x = xk + αbk + y,
1 ≥ (y + αb_k)^T A_k^{-1} (y + αb_k) = y^T A_k^{-1} y + α² b_k^T A_k^{-1} b_k = y^T A_k^{-1} y + α²,
(x − x_{k+1})^T A_{k+1}^{-1} (x − x_{k+1}) ≤ (x − x*_k)^T (A*_k)^{-1} (x − x*_k) + R_1,
where the remainder term R_1 is at most 1/(12n²), as can be shown by the same
techniques as the ones in the previous lemma. The main term: We have by
the decomposition of x and the definition of x∗k
x − x*_k = x_k + y + αb_k − x_k − (1/(n+1)) b_k = (α − 1/(n+1)) b_k + y
and so
(x − x∗k )T (A∗k+1 )−1 (x − x∗k )
T
2n2
1
= 2 α− bk + y
2n + 3 n+1
T
!
2 aa 1
A−1
k + n + 1 · aT A a α− bk + y
k n+1
2 2 !
2n2
1 T −1 2 1
= 2 α− + y Ak y + α−
2n + 3 n+1 n+1 n+1
2n2 n2
2α(1 − α)
≤ 2 2
−
2n + 3 n − 1 n−1
2n 4 4δ 2n 4δ
≤ 4 + + kA−1 k
2n + n2 − 3 (n − 1) aT Ak a 2n4 + n2 − 3 n − 1 k
p
2n4 4δR−2 4N
≤ +
2n4 + n2 − 3 n−1
1
≤1− .
12n2
Hence (x − x_{k+1})^T A_{k+1}^{-1} (x − x_{k+1}) ≤ 1 and so x ∈ E_{k+1}.
Proof (of Theorem 3.3.1) The argument for the fact that N iterations suffice to guar-
antee that the found solution is ε-close to the optimum was already given
in the proof of Theorem 3.2.2. Note that N depends polynomially on the
input size. So the only thing which is left to do is to see that the (rational)
coefficients of xk and Ak have a polynomial-size bit-encoding. This follows
from Lemma 3.3.3 plus the fact that we round the coefficients to p binary
digits behind the comma.
Case 2: a_{11} = 0, but some entry a_{1j} is not zero, say a_{12} ≠ 0. Then choose λ ∈ Q such that 2λa_{12} + a_{22} < 0, so that
Case 3: a_{11} > 0. Then we apply Gaussian elimination to the rows R_j and columns C_j of A for j = 2, . . . , n. Namely, for each j = 2, . . . , n, we replace C_j by C_j − (a_{1j}/a_{11}) C_1, and analogously we replace R_j by R_j − (a_{1j}/a_{11}) R_1, which amounts to making all entries of A equal to zero at the positions (1, j) and (j, 1) for j ≠ 1. For this, define the matrices
P_j = I_n − (a_{1j}/a_{11}) E_{1j} and P = P_2 · · · P_n.
Then, P is rational and nonsingular, and P^T AP has the block form:
P^T AP = [ a_{11}  0 ; 0  A′ ], where A′ ∈ S^{n−1}.
Thus,
A ⪰ 0 ⟺ P^T AP ⪰ 0 ⟺ A′ ⪰ 0.
Or, we find y ∈ Q^{n−1} such that y^T A′ y < 0. Then, we obtain that x^T Ax < 0, after defining z = (0, y) and x = P z ∈ Q^n.
Case 4: a_{11} = 0 and the matrix A is of the form
A = [ 0  0 ; 0  A′ ] for A′ ∈ S^{n−1}.
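The case analysis above translates directly into a short recursive procedure; the following Python sketch (using exact Fraction arithmetic, our own implementation choice) either asserts positive semidefiniteness or returns a rational vector x with x^T A x < 0.

```python
# Sketch of the recursive PSD test of this section with an explicit certificate.
from fractions import Fraction

def psd_or_certificate(A):
    """Return (True, None) if A is PSD, else (False, x) with x^T A x < 0."""
    n = len(A)
    if n == 0:
        return True, None
    a11 = A[0][0]
    if a11 < 0:                                   # Case 1: x = e_1 works
        return False, [Fraction(1)] + [Fraction(0)] * (n - 1)
    if a11 == 0:
        for j in range(1, n):
            if A[0][j] != 0:                      # Case 2: pick lambda with 2*lam*a_1j + a_jj < 0
                lam = -(A[j][j] + 1) / (2 * A[0][j])
                x = [Fraction(0)] * n
                x[0], x[j] = lam, Fraction(1)
                return False, x
        ok, y = psd_or_certificate([row[1:] for row in A[1:]])   # Case 4: recurse on A'
        return (True, None) if ok else (False, [Fraction(0)] + y)
    # Case 3: a11 > 0, eliminate the first row and column and recurse on A'.
    Aprime = [[A[i][j] - A[0][i] * A[0][j] / a11 for j in range(1, n)]
              for i in range(1, n)]
    ok, y = psd_or_certificate(Aprime)
    if ok:
        return True, None
    # Pull the certificate back: x = P z with z = (0, y), P as in Case 3.
    x0 = -sum(A[0][j + 1] * y[j] for j in range(n - 1)) / a11
    return False, [x0] + y

A = [[Fraction(v) for v in row] for row in [[1, 2], [2, 1]]]
ok, x = psd_or_certificate(A)
print(ok, x, sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2)))
```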
With this procedure we almost proved Theorem 3.0.1. There is only one
technical detail missing: We have to work in the linear span of the set of
feasible solutions since we require that we know a full-dimensional ball lying
inside F. So we have to do some postprocessing in case when the matrix X
satisfies the linear constraints ⟨A_j, X⟩ = b_j but is not positive semidefinite and the vector x ∈ R^n certifies this by the inequality x^T X x < 0. Then the hyperplane {Y ∈ S^n : ⟨Y, xx^T⟩ = 0} separates X from the positive semidefinite cone. Then we have to project the matrix xx^T onto the linear space {Y ∈ S^n : ⟨A_j, Y⟩ = 0 (j ∈ [m])} in order to get the desired output for the weak separation procedure.
1947 Dantzig invented the simplex algorithm for linear programming. The
simplex algorithm works extremely well in practice, but until today
nobody really understands why (although there are meanwhile good
theoretical indications). It is fair to say that the simplex algorithm
is one of the most important algorithms invented in the last century.
1972 Klee and Minty found a linear program for which the simplex al-
gorithm is extremely slow (when one uses Dantzig’s most-negative-
entry pivoting rule): It uses exponentially many steps.
1979 Khachiyan invented the ellipsoid method for linear programming which
runs in polynomial time. It is a very valuable theoretical algorithm.
1981 Grötschel, Lovász, and Schrijver showed that the problems of separa-
tion and optimization are polynomial time equivalent.
1984 Karmarkar showed that one can use interior-point methods for design-
ing a polynomial-time algorithm for linear programming. Nowadays,
interior-point methods can compete with the simplex algorithm.
1994 Nesterov and Nemirovski generalized Karmarkar’s result to conic pro-
gramming with the use of self-concordant barrier functions.
since 1994 Every day conic programming becomes more useful (in theory
and practice).
Some more words about interior point methods: It is fair to say that
during the last twenty years there has been a revolution in mathematical
optimization based on the development of efficient interior point algorithms
for convex optimization problems.
Exercises
3.1 Give a proof of the Sherman-Morrison formula.
3.2 Use induction on k to prove Lemma 3.2.1.
3.3 Let G = (V, E) be a graph and let LG be its Laplacian matrix. Show
that one can approximate
SDP(G) = max ⟨(1/4)L_G, X⟩
X ∈ S^n
X_ii = 1 for i ∈ [n]
X ⪰ 0.
to any desired accuracy in polynomial time.
3.4 Implement the ellipsoid method for the above semidefinite program and
compute the value for the Petersen graph.
3.5 Show the inequality
( n/(n+1) )^{n+1} ( n/(n−1) )^{n−1} ≤ exp(−1/n).
Hint: Reduce this to showing the inequality:
(1 + x)^{1+x} (1 − x)^{1−x} ≥ exp(x²) for all 0 < x < 1.
PART TWO
COMBINATORIAL OPTIMIZATION
4
Graph coloring and independent sets
In this chapter we revisit in detail the theta number ϑ(G), which has al-
ready been introduced in earlier chapters. In particular, we present several
equivalent formulations for ϑ(G), we discuss its geometric properties, and we
present some applications: for bounding the Shannon capacity of a graph,
and for computing in polynomial time maximum stable sets and minimum
colorings in perfect graphs. We also show the link to Delsarte linear pro-
gramming bounds for binary codes and we present a hierarchy of stronger
bounds for the stability number, based on the approach of Lasserre.
Here are some additional definitions used in this chapter. Let G = (V, E) be a graph. Then, Ē denotes the set of pairs {i, j} of distinct nodes that are not adjacent in G. The graph Ḡ = (V, Ē) is called the complementary graph of G, and G is called self-complementary if G and Ḡ are isomorphic graphs. Given a subset S ⊆ V, G[S] denotes the subgraph induced by S:
its node set is S and its edges are all pairs {i, j} ∈ E with i, j ∈ S. The
graph Cn is the circuit (or cycle) of length n, with node set [n] and edges
the pairs {i, i + 1} (for i ∈ [n], indices taken modulo n). For a set S ⊆ V ,
its characteristic vector is the vector χS ∈ {0, 1}S , whose i-th entry is 1 if
i ∈ S and 0 otherwise. As before, e denotes the all-ones vector.
Figure 4.1 The Petersen graph has α(G) = 4, ω(G) = 2, χ(G) = 3, χ(Ḡ) = 5
Clearly, any two nodes in a clique of G must receive distinct colors. There-
fore, for any graph, the following inequality holds:
ω(G) ≤ χ(G). (4.1)
This inequality is strict, for example, when G is an odd circuit, i.e., a circuit
1 The four colour theorem was proved by Appel and Haken [1977, 1977]; this is a long proof,
which relies on computer check. Another proof, a bit simplified but still relying on computer
check, was given later by Robertson, Sanders, Seymour, Thomas [1997]. A fully automated
proof has been given recently by Gonthier [2008].
of odd length at least 5, or its complement. Indeed, for an odd circuit C2n+1
(n ≥ 2), ω(C2n+1 ) = 2 while χ(C2n+1 ) = 3. Moreover, for the complement
G = C̄_{2n+1}, ω(G) = n while χ(G) = n + 1. For an illustration see the cycle
of length 7 and its complement in Figure 4.2.
called the stable set polytope of G. Hence, computing α(G) is linear opti-
mization over the stable set polytope:
We have now defined the stable set polytope by listing explicitly its ex-
treme points. Alternatively, it can also be represented by its hyperplanes
representation, i.e., in the form
ST(G) = {x ∈ RV : Ax ≤ b}
for some matrix A and some vector b. As computing the stability number is
a hard problem one cannot hope to find the full linear inequality description
of the stable set polytope (i.e., the explicit A and b). However some partial
information is known: many classes of valid inequalities for the stable set
polytope are known. For instance, if C is a clique of G, then the clique
inequality
x(C) = Σ_{i∈C} x_i ≤ 1   (4.2)
is valid for ST(G): any stable set can contain at most one vertex from the
clique C. The clique inequalities define the polytope
Maximizing the linear function eT x over the polytope QST(G) gives the
parameter
α*(G) = max{e^T x : x ∈ QST(G)},   (4.5)
If we add the constraint that all λS should be integral then we obtain the
coloring number of G. Thus, χ∗ (G) ≤ χ(G). In fact the fractional stability
number of G coincides with the fractional coloring number of its complement:
α*(G) = χ*(Ḡ), and it is nested between α(G) and χ(Ḡ).
Proof The inequality α(G) ≤ α∗ (G) in (4.7) follows from the inclusion (4.4)
and the inequality χ*(Ḡ) ≤ χ(Ḡ) was observed above. We now show that α*(G) = χ*(Ḡ). For this, we first observe that in the linear program (4.5)
the condition x ≥ 0 can be removed without changing the optimal value;
that is,
α∗ (G) = max{eT x : x(C) ≤ 1 for C clique of G} (4.9)
(check it). Now, it suffices to observe that the dual LP of the above linear
program (4.9) coincides with the linear program (4.8).
When w is the all-ones weight function, we find again α(G), α∗ (G), χ(G)
and χ∗ (G), respectively. The following analogue of (4.7) holds for arbitrary
node weights:
the wi copies of each node i of G). Now, these stable sets S1 , . . . , St have the
property that each node i of G belongs to exactly wi of them, which shows
that χ(G, w) ≤ t = χ(H). This implies that χ(G, w) ≤ χ(H) = α(G, w),
giving equality χ(G, w) = α(G, w).
Its optimal value is denoted as ϑ(G), and called the theta number of G.
This parameter was introduced by Lovász [1979]. He proved the following
simple, but crucial result – called the Sandwich Theorem by Knuth [1994]
– which shows that ϑ(G) provides a bound for both the stability number of
G and the chromatic number of the complementary graph Ḡ.
Theorem 4.3.2 (Lovász’ sandwich theorem) For any graph G, we
have that
α(G) ≤ ϑ(G) ≤ χ(Ḡ).
Proof Given a stable set S of cardinality |S| = α(G), define the matrix
X = (1/|S|) χ^S (χ^S)^T ∈ S^n.
Then X is feasible for (4.11) with objective value hJ, Xi = |S| (check it).
This shows the inequality α(G) ≤ ϑ(G).
Now, consider a matrix X feasible for the program (4.11) and a partition of
V into k cliques: V = C_1 ∪ · · · ∪ C_k. Our goal is now to show that ⟨J, X⟩ ≤ k, which will imply ϑ(G) ≤ χ(Ḡ). For this, using the relation e = Σ_{i=1}^k χ^{C_i}, observe that
Y := Σ_{i=1}^k (kχ^{C_i} − e)(kχ^{C_i} − e)^T = k² Σ_{i=1}^k χ^{C_i} (χ^{C_i})^T − kJ.
Moreover,
⟨X, Σ_{i=1}^k χ^{C_i} (χ^{C_i})^T⟩ = Tr(X).
Indeed the matrix Σ_i χ^{C_i} (χ^{C_i})^T has all its diagonal entries equal to 1 and it has zero off-diagonal entries outside the edge set of G, while X has zero off-diagonal entries on the edge set of G. As X, Y ⪰ 0, we obtain
0 ≤ ⟨X, Y⟩ = k² Tr(X) − k⟨J, X⟩
and thus ⟨J, X⟩ ≤ k Tr(X) = k.
We also refer to Lemma 4.4.3 for the inequality ϑ(G) ≤ χ(Ḡ), where the link to coverings by cliques will be even more transparent.
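As an illustration, the following sketch computes the theta number of the 5-cycle via formulation (4.11); it assumes cvxpy with an SDP solver, and the known value is ϑ(C_5) = √5.

```python
# Sketch: theta number of C_5 via (4.11): max <J, X>, Tr(X) = 1, X_ij = 0 on edges, X PSD.
import cvxpy as cp
import numpy as np

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.trace(X) == 1] + [X[i, j] == 0 for i, j in edges]
prob = cp.Problem(cp.Maximize(cp.sum(X)), constraints)   # <J, X> = sum of all entries
prob.solve()
print(prob.value, np.sqrt(5))
```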
More generally, given integer node weights w ∈ ZV+ , the above algorithm
can also be used to find a stable set S of maximum weight w(S). For this,
construct the new graph G′ in the following way: Duplicate each node i ∈ V w_i times, i.e., replace node i ∈ V by a set W_i of w_i nodes pairwise non-adjacent, and make two nodes x ∈ W_i and y ∈ W_j adjacent if i and j are adjacent in G. By Lemma 4.2.2, the graph G′ is perfect. Moreover, α(G′)
is equal to the maximum weight w(S) of a stable set S in G. From this it
follows that, if the weights wi are bounded by a polynomial in n, then one
can compute α(G, w) in polynomial time. (More generally, one can compute
α(G, w) in polynomial time, e.g. by optimizing the linear function wT x over
the theta body TH(G), introduced in Section 4.5.1 below.)
(i) We find a stable set S meeting each of the cliques C_1, · · · , C_t (see below).
(ii) Compute ω(G\S).
(iii) If ω(G\S) < ω(G) then S meets all maximum cliques and we are done.
(iv) Otherwise, compute a maximum clique C_{t+1} in G\S, which is thus a new maximum clique of G, and we add it to the list L.
The first step can be done as follows: Set w = Σ_{i=1}^t χ^{C_i} ∈ Z^V_+. As G is
feasible for the program defining χ(G, w) then, on the one hand, w^T e = Σ_C y_C |C| ≤ Σ_C y_C ω(G) and, on the other hand, w^T e = tω(G), thus implying t ≤ χ(G, w). Now we compute a stable set S having maximum possible weight w(S). Hence, we have w(S) = t and thus S meets each of the cliques C_1, · · · , C_t.
The above algorithm has polynomial running time, since the number of iterations is bounded by |V|. To see this, consider the affine space L_t ⊆ R^V defined by the equations x(C_1) = 1, · · · , x(C_t) = 1 corresponding to the cliques in the current list L. Then, L_t strictly contains L_{t+1}, since χ^S ∈ L_t \ L_{t+1} for the set S constructed in the first step, and thus the dimension decreases at least by 1 at each iteration.
Proof First we build the dual of the semidefinite program (4.11), which
reads:
min_{t∈R, y∈R^E} { t : tI + Σ_{{i,j}∈E} y_ij E_ij − J ⪰ 0 }.   (4.16)
As both programs (4.11) and (4.16) are strictly feasible, there is no duality
gap: the optimal value of (4.16) is equal to ϑ(G), and the optimal values
are attained in both programs – here we have applied the duality theorem
(Theorem 2.4.1).
Setting A = Σ_{{i,j}∈E} y_ij E_ij, B = J − A and C = tI + A in (4.16), it follows that the program (4.16) is equivalent to (4.12), (4.13) and (4.14). Finally the formulation (4.15) follows directly from (4.13) after recalling that λ_max(B) is the smallest scalar t for which tI − B ⪰ 0.
Lemma 4.4.2 The theta number ϑ(G) is equal to the optimal value of the
following semidefinite program:
Proof We show that the two semidefinite programs in (4.12) and (4.17) are
equivalent. For this, observe that
tI + A − J ⪰ 0 ⟺ Z := [ t  e^T ; e  I + (1/t)A ] ⪰ 0,
which follows by taking the Schur complement of the upper left corner t in
the block matrix Z. Hence, if (t, A) is feasible for (4.12), then Z is feasible for
(4.17) with same objective value: Z00 = t. The construction can be reversed:
if Z is feasible for (4.17), then one can construct (t, A) feasible for (4.12)
with t = Z00 . Hence both programs are equivalent.
From the formulation (4.17), the link of the theta number to the (frac-
tional) chromatic number is even more transparent.
Proof Let y = (yC ) be feasible for the linear program (4.8) defining χ∗ (G).
For each clique C define the (column) vector zC = (1 χC ) ∈ R1+n , obtained
by appending an entry equal to 1 to the characteristic vector of C. Define
T . One can verify that Z is feasible for
P
the matrix Z = C clique of G yC zC zC
P
the program (4.17) with objective value Z00 = C yC (check it). This shows
ϑ(G) ≤ χ∗ (G).
4.4 Other formulations of the theta number 87
Observe that the matrix Y ∈ S n+1 occurring in this last program can be
equivalently characterized by the conditions: Y00 = 1, Yij = 0 if {i, j} ∈
P
E and Y 0. Moreover the objective function reads: i∈V yi + 2zi =
P
− i∈V Y ii + 2Y 0i . Therefore the dual can be equivalently reformulated
as
n X o
sup − Yii + 2Y0i : Y 0, Y00 = 1, Yij = 0 for {i, j} ∈ E . (4.19)
i∈V
As (4.17) is strictly feasible (check it) there is no duality gap, the optimal
value of (4.19) is attained and it is equal to ϑ(G).
Let Y be an optimal solution of (4.19). We claim that Y0i + Yii = 0 for all
i ∈ V . Indeed, assume that Y0i + Yii 6= 0 for some i ∈ V , so that Yii 6= 0. We
construct a new matrix Y 0 feasible for (4.19) and having a larger objective
value than Y , thus contradicting the optimality of Y . If Y0i ≥ 0, then we
let Y 0 be obtained from Y by setting to 0 all the entries at the positions
(i, 0) and (i, j) for j ∈ [n], which indeed has a larger objective value since
Yii + 2Y0i > 0. Assume now Y0i < 0. Then set λ = −Y0i /Yii > 0 and let
Y 0 be obtained from Y by multiplying its i-th row and column by λ. Then,
Yii0 = λ2 Yii = Y0i2 /Yii , Y0i0 = λY0i = −Yii0 , and Y 0 has a larger objective value
than Y since −Yii0 − 2Y0i0 = Y0i2 /Yii > −Yii − 2Y0i .
4 Of course there is more than one road leading to Rome: one can also show directly the
equivalence of the two programs (4.11) and (4.18).
88 Graph coloring and independent sets
As shown in Lovász [1979], it turns out that this bound can be strengthened,
by replacing AG by any matrix A supported by the graph G, and that this
gives yet another formulation for the theta number.
which is called the theta body of G. It turns out that TH(G) is nested between
the stable set polytope ST(G) and its linear relaxation QST(G).
Lemma 4.5.1 For any graph G, we have that ST(G) ⊆ TH(G) ⊆ QST(G).
Proof The inclusion ST(G) ⊆ TH(G) follows from the fact that the char-
acteristic vector of any stable set S lies in TH(G). To see this, define the
(column) vector y = (1 χS ) ∈ Rn+1 obtained by adding an entry equal to
1 to the characteristic vector of S, and define the matrix Y = yy T ∈ S n+1 .
Then Y ∈ MG and χS = (Yii )i∈V , which shows that χS ∈ TH(G).
We now show the inclusion TH(G) ⊆ QST(G). For this pick a vector x
in TH(G) and a clique C of G; we show that x(C) ≤ 1. Say xi = Yii for all
i ∈ V , where Y ∈ MG . Consider the principal submatrix YC of Y indexed
by {0} ∪ C, which is of the form
xT
1 C
YC = ,
xC Diag(xC )
Corollary 4.5.2 For any graph G, we have that α(G) ≤ ϑ(G) ≤ α∗ (G).
Combining the inclusion from Lemma 4.5.1 with Theorem 4.2.4, we deduce
that TH(G) = ST(G) = QST(G) for perfect graphs. As we will see in
Theorem 4.5.9 below it turns out that these equalities characterize perfect
graphs.
uT
i uj = 0 for {i, j} ∈ E.
Note that the smallest integer d for which there exists an orthonormal
representation of G is upper bounded by χ(G) (check it). Moreover, if S is
a stable set in G and the ui ’s form an ONR of G in Rd , then the vectors ui
labeling the nodes of S are pairwise orthogonal, which implies that d ≥ α(G).
It turns out that the stronger lower bound d ≥ ϑ(G) holds.
setting Zij = hUi , Uj i for i, j ∈ {0} ∪ [n]. One can verify that Z is feasible
for the program (4.17) defining ϑ(G) (check it) with Z00 = d. This gives
ϑ(G) ≤ d.
4.5 Geometric properties of the theta number 91
To conclude the proof of Theorem 4.5.5 we use the following result, which
characterizes which partially specified matrices can be completed to a posi-
tive semidefinite matrix – this will be proved in Exercise 4.2.
Proof (of Theorem 4.5.5). Let x ∈ RV+ such that xT z ≤ 1 for all z ∈ TH(G);
we show that x ∈ TH(G). For this we need to find a matrix Y ∈ MG such
that x = (Yii )i∈V . In other words, the entries of Y are specified already at
the following positions: Y00 = 1, Y0i = Yii = xi for i ∈ V , and Y{i,j} = 0
for all {i, j} ∈ E, and we need to show that the remaining entries (at the
positions of non-edges of G) can be chosen in such a way that Y 0.
To show this we apply Proposition 4.5.8, where the graph H is G with
an additional node 0 adjacent to all i ∈ V . Hence it suffices now to show
{0}∪V
that hY, Zi ≥ 0 for all matrices Z ∈ S+ with Zij = 0 if {i, j} ∈ E. Pick
such Z, say with Gram representation w0 , w1 , · · · , wn . Then wiT wj = 0 if
{i, j} ∈ E. We can assume without loss of generality that all wi are non-
zero (use continuity if some wi is zero) and up to scaling that w0 is a unit
vector. Then the vectors wi /kwi k (for i ∈ V ) form an ONR of G. By Lemma
4.5.7 (applied to G), the vector z ∈ RV , defined by zi = (w0T wi )2 /kwi k2 for
i ∈ V , belongs to TH(G) and thus we have xT z ≤ 1 by assumption. We can
now verify that hY, Zi is equal to
X X X (wT wi )2
T 2 0 T 2
1+2 x i w0 wi + xi kwi k ≥ xi + 2w0 wi + kwi k
kwi k2
i∈V i∈V i∈V
2
w0T wi
X
= xi + kwi k ≥ 0,
kwi k
i∈V
Indeed, for any unit vector d ∈ Rk , the vector ((dT ui )2 )ni=1 belongs to TH(G)
and thus ni=1 (dT ui )2 xi ≤ 1 for all x ∈ F . In other words, the maximum
P
called the Shannon capacity p of the graph G. Using Fekete’s lemma7 one can
verify that Θ(G) = limk→∞ k α(Gk ).
The parameter Θ(G) was introduced by Shannon in 1956. The motivation
is as follows. Suppose V is a finite alphabet, where some pairs of letters could
be confused when they are transmitted over some transmission channel.
These pairs of confusable letters can be seen as the edge set E of a graph G =
(V, E). Then the stability number of G is the largest number of one-letter
messages that can be sent over the transmission channel without danger
of confusion. Words of length k correspond to k-tuples in V k . Two words
(i1 , · · · , ik ) and (j1 , · · · , jk ) can be confused if at every position h ∈ [k]
the two letters ih and jh are equal or can be confused, which corresponds
to having an edge in the strong product Gk . Hence the largest number of
words of length k that can be sent without danger of confusion is equal to the
stability number of Gk and the Shannon capacity of the graph G represents
the rate of correct transmission of the channel.
For instance, for the 5-cycle C5 , α(C5 ) = 2, but α((C5 )2 ) ≥ 5. Indeed, if
1, 2, 3, 4, 5 are the nodes of C5 (in this cyclic order), then the five 2-letters
words (1, 1), (2, 3), (3, 5), (4, 2), √ (5, 4) form a stable set in the strong product
2
G . This implies that Θ(C5 ) ≥ 5.
Determining the exact Shannon capacity of a graph is a very difficult
problem in general, even for small graphs. For instance, the exact value
of the Shannon capacity of C5 was not known until Lovász [1979] showed
how to use the theta number in order to upper bound √ the Shannon capacity.
Lovász [1979] √ showed that Θ(G) ≤ ϑ(G) and ϑ(C 5 ) = 5, which implies that
Θ(C5 ) = 5. For instance, although the exact value of the theta number
of C2n+1 is known (cf. Proposition 4.7.6), the exact value of the Shannon
capacity of C2n+1 is not known, already for C7 .
7 Consider a sequence (ak )k of positive real numbers satisfying: ak+m ≥ ak + am for all
integers k, m ∈ N. Fekete’s lemma claims that limk→∞ ak /k = supk∈N ak /k. Then apply
Fekete’s lemma to the sequence ak = log α(Gk ).
96 Graph coloring and independent sets
Lemma 4.6.3 The theta number of the strong product of two graphs G
and H satisfies ϑ(G · H) = ϑ(G)ϑ(H).
(Check it.) First we mention the following analogous inequality relating the
theta numbers of G and its complement G.
Proposition 4.7.1 For any graph G = (V, E), we have that ϑ(G)ϑ(G) ≥
|V |.
Proof Using the formulation of the theta number from (4.14), we obtain
matrices C, C 0 ∈ S n such that C − J, C 0 − J 0, Cii = ϑ(G), Cii0 = ϑ(G)
for i ∈ V , Cij = 0 for {i, j} ∈ E and Cij0 = 0 for {i, j} ∈ E. Combining the
graphs, namely for vertex-transitive graphs. In order to show this, one ex-
ploits in a crucial manner the symmetry of G, which permits to show that
the semidefinite program defining the theta number has an optimal solution
with a special (symmetric) structure. We need to introduce some definitions.
Proof Directly from the fact that hJ, σ(X)i = hJ, Xi, Tr(σ(X)) = Tr(X)
and σ(X)ij = Xσ(i)σ(j) = 0 if {i, j} ∈ E (since σ is an automorphism of
G).
obtained by averaging over all matrices σ(X) for σ ∈ Aut(G). As the set of
optimal solutions of (4.11) is convex, X ∗ is still an optimal solution of (4.11).
Moreover, by construction, X ∗ is invariant under action of Aut(G).
Corollary 4.7.4 If G is a vertex-transitive graph then the program (4.11)
has an optimal solution X ∗ satisfying Xii∗ = 1/n for all i ∈ V and X ∗ e =
ϑ(G)
n e.
The minimum distance of the code C is the largest integer d ∈ N for which
any two distinct words of C have Hamming distance at least d: dH (u, v) ≥ d
for all u 6= v ∈ C. A fundamental problem in coding theory is to compute the
maximum cardinality A(n, d) of a code of length n with minimum distance
at least d. This is the maximum number of messages of length n that can
correctly be decoded if after transmission at most (d − 1)/2 bits can be
erroneously transmitted in each word of C.
Computing A(n, d) is in fact an instance of the maximum stable set prob-
lem. Indeed, let G(n, d) denote the graph with vertex set V = {0, 1}n and
with an edge {u, v} if dH (u, v) ≤ d − 1. Such graph is called a Hamming
graph since its edges depend only on the Hamming distances between the
vertices. Then, a code C ⊆ V has minimum distance at least d if and only
if C is a stable set in G(n, d) and thus
A natural idea for getting an upper bound for A(n, d) is to use the theta
number ϑ(G), or its strengthening ϑ0 (G) obtained by adding nonnegativity
conditions on the entries of the matrix variable:
Proof (i) is direct verification (check it) and then (ii),(iii) follow easily.
(iv) Set t = dH (u, v) and Z = {i ∈ [n] : ui 6= vi } with t = |Z|. Moreover
for each a ∈ V define the set A = {i ∈ [n] : ai = 1}. Then, we find that
Bk (u, v) = A⊆[n]:|A|=k (−1)|A∩Z| = ti=0 (−1)i ti n−t k
P P
k−i = Pn (t).
(v) follows from (ii) and (iv).
Note that the vectors {Ca : a ∈ V = {0, 1}n } form an orthogonal basis of
RV and that they are the common eigenvectors to all matrices Bk , and thus
to all matrices in the Bose-Mesner algebra Bn .
Lemma 4.8.2 Let X = nk=0 xk Bk ∈ Bn . Then, X 0 ⇐⇒ x0 , x1 , . . . , xn ≥
P
0.
Proof The claim follows from the fact that the Bk ’s are positive semidef-
inite and pairwise orthogonal. Indeed, X 0 if all xk ’s are nonnegative.
Conversely, if X 0 then 0 ≤ hX, Bk i = xk hBk , Bk i, implying xk ≥ 0.
program:
Pn
maxx0 ,...,xn ∈R 22n x0 s.t. xk nk = 2−n ,
Pnk=0
xk Pnk (t) = 0 for t = 1, . . . , d − 1,
Pk=0
n k
k=0 xk Pn (t) ≥ 0 for t = d, . . . , n,
xk ≥ 0 for k = 0, 1, . . . , n,
(4.36)
where Pnk (t) is the Krawtchouk poynomial in (4.35).
Proof In the formulation (4.32) of ϑ0 (G(n, d)) we may assume without loss
of generality that the variable X is invariant under action of the automor-
phism group of G(n, d). Hence, in particular, we may assume that X is
invariant under action of the group Gn and thus that X belongs to Bn .
In other words, X = nk=0 xk Bk for some scalars x0 , . . . , xn ∈ R. It now
P
x ≥ 0.
Then [x]t is indexed by the set Pt (n), consisting of all subsets I ⊆ [n] with
Q
|I| ≤ t, i.e., [x]t = ( i∈I xi )I∈Pt (n)(n) . For instance, [x]1 = (1, x1 , . . . , xn )
and [x]n contains all 2n possible products of distinct xi ’s.
Next we consider the matrix Y = [x]t [x]T t . By construction, this matrix is
positive semidefinite and satisfies the following linear conditions:
Y (I, J) = Y (I 0 , J 0 ) if I ∪ J = I 0 ∪ J 0
(use here the fact that x is binary valued) and Y∅,∅ = 1. This motivates the
following definition.
Definition 4.9.1 Given an integer 0 ≤ t ≤ n and a vector y = (yI ) ∈
RP2t (n) , let Mt (y) denote the symmetric matrix indexed by Pt (n), with (I, J)th
entry yI∪J for I, J ∈ Pt (n). Mt (y) is called the moment matrix of order t of
y.
Example 4.9.2 As an example, for n = 2, the matrices M1 (y) and M2 (y)
have the form
∅ 1 2 12
∅ 1 2
∅
y∅ y1 y2 y12
∅ y∅ y1 y2
1 y1 y1 y12 y12
M1 (y) = 1 y1 y1 y12 , M2 (y) = .
2 y2 y12 y2 y12
2 y2 y12 y2
12 y12 y12 y12 y12
Here to simplify notation we use yi instead of y{i} and y12 instead of y{1,2} .
Note that M1 (y) corresponds to the matrix variable in the formulation (4.18)
of ϑ(G). Moreover, M1 (y) occurs as a principal submatrix of M2 (y).
We can now formulate new upper bounds for the stability number.
Definition 4.9.3 For any integer 1 ≤ t ≤ n, define the parameter
( n )
X
last (G) = max yi : y∅ = 1, yij = 0 for {i, j} ∈ E, Mt (y) 0 ,
y∈RP2t (n)
i=1
(4.37)
known as the Lasserre bound of order t.
Lemma 4.9.4 For each 1 ≤ t ≤ n, we have that α(G) ≤ last (G). More-
over, last+1 (G) ≤ last (G).
Proof Let x = χS where S is a stable set of G, and let y = [x]t . Then the
moment matrix Mt (y) is feasible for the program (4.37) with value ni=1 yi =
P
|S|, which shows |S| ≤ last (G) and thus α(G) ≤ last (G).
The inequality last+1 (G) ≤ last (G) follows from the fact that Mt (y) occurs
as a principal submatrix of Mt+1 (y).
4.9 Lasserre hierarchy of semidefinite bounds 105
α(G) ≤ lasn (G) ≤ . . . ≤ last (G) ≤ . . . ≤ las2 (G) ≤ las1 (G) = ϑ(G).
It turns out that, at order t = α(G), the Lasserre bound is exact: last (G) =
α(G).
Theorem 4.9.5 For any graph G, last (G) = α(G) for any t ≥ α(G).
Lemma 4.9.7 Let y ∈ RPn (n) and set λ = Zn−1 y ∈ RPn (n) . Then,
Mn (y) = Zn Diag(λ)ZnT .
Proof Pick I, J ⊆ [n]. We show that the (I, J)th entry of Zn Diag(λ)ZnT is
106 Graph coloring and independent sets
X X X X
= (−1)|H\K| yH = yH (−1)|H\K| ,
K:I∪J⊆K H:K⊆H H:I∪J⊆H K:I∪J⊆K⊆H
|H\K|
P
which is equal to yI∪J , since the inner summation K:I∪J⊆K⊆H (−1)
is equal to zero whenever H 6= I ∪ J.
Corollary 4.9.8 Let y ∈ RPn (n) . The following assertions are equivalent.
(i) Mn (y) 0.
(ii) Zn−1 y ≥ 0.
(iii) The vector y is a conic combination of the vectors [x]n for x ∈ {0, 1}n ;
P
that is, we have y = x∈{0,1}n λx [x]n for some nonnegative scalars λx .
y = (y∅ − y1 − y2 + y12 )[0]2 + (y1 − y12 )[e1 ]2 + (y2 − y12 )[e2 ]2 + y12 [e1 + e2 ]2
(setting e1 = (1, 0) and e2 = (0, 1)). Therefore, we see that this is indeed a
conic combination if and only if any of the following equivalent conditions
holds:
∅ 1 2 12
∅ y∅ − y1 − y2 + y12 ≥ 0
y0 y1 y2 y12
1 y y1 y12 y12 y1 − y12 ≥ 0
M2 (y) = 1 0 ⇐⇒
2 y2 y12 y2 y12
y2 − y12 ≥ 0
y12 ≥ 0.
12 y12 y12 y12 y12
set of incidence vectors of the stable sets in a graph. The classical so-called
polyhedral approach is to consider the polytope
P = conv(X ) ⊆ Rn ,
defined as the convex hull of all vectors in X . Then the question boils down
to finding the linear inequality description of P (or at least a part of it). It
turns out that this question gets a simpler answer if we ‘lift’ the problem
into higher dimension and allow the use of additionnal variables.
Define the polytope
P = conv([x]n : x ∈ X ) ⊆ RPn (n) .
Let π denote the projection from the space RPn (n) onto the space Rn where,
for a vector y = (yI )I⊆[n] , π(y) = (y1 , . . . , yn ) denotes its projection onto the
coordinates indexed by the n singleton subsets of [n]. Then, by construction,
we have that
P = π(P).
As we now indicate the results from the previous subsection show that the
lifted polytope P admits a very simple explicit description.
Indeed, for a vector y ∈ RPn (n) , the trivial identity y = Zn (Zn−1 y) shows
that y ∈ P if and only if it satisfies the following conditions:
Z −1 y ≥ 0, (Z −1 y)x = 0 ∀x 6∈ X , eT Z −1 y = 1. (4.38)
The first condition says that y is a conic combination of vectors [x]n (for x ∈
{0, 1}n ), the second condition says that only vectors [x]n for x ∈ X are used
in this conic combination, the last condition says that the conic combination
is in fact a convex combination and it can be equivalently written as y∅ = 1.
Hence (4.38) gives an explicit linear inequality description for the polytope
P. Moreover, using Corollary 4.9.8, we can replace the condition Zn−1 y ≥ 0
by the condition Mn (y) 0. In this way we get a description of P involving
positive semidefiniteness of the full moment matrix Mn (y).
So what this says is that any polytope P with 0 − 1 vertices can be ob-
tained as projection of a polytope P admitting a simple explicit description.
The price to pay however is that the lifted polytope P “lives” in a 2n -
dimensional space, thus exponentially large with respect to the dimension n
of the ambiant space of the polytope P . Nevertheless this perspective leads
naturally to hierarchies of semidefinite relaxations for P , obtained by con-
sidering only truncated parts Mt (y) of the full matrix Mn (y) for growing
orders t. We explain below how this idea applies to the stable set polytope.
We refer to Lasserre [2001] and Laurent [2003] for a detailed treatment, also
108 Graph coloring and independent sets
able to find the original combinatorial polytope after finitely many steps
(typically, n steps when the combinatorial polytope is the convex hull of a
subset of {0, 1}n ). These relaxations can be LP-based or SDP-based. This
includes the hierarchies designed by Lovász and Schrijver [1991], by Sherali
and Adams [1990] and by Lasserre [2001]. It turns out that the construction
of Lasserre [2001], which we have briefly described in this chapter, yields
the tigthest bounds. We will return to it within the more general setting of
polynomial optimization in Part IV of this book. We refer to Laurent [2003]
for a comparative study of these various constructions and to the monograph
by Tunçel [2010] for a detailed exposition.
As explained in this chapter the computation of the theta number for the
Hamming graph G(n, d) boils down to the LP bound of Delsarte [1973] for
the maximum cardinality A(n, d) of a binary code of length n with mini-
mum distance at least d. Strengthenings of the Delsarte bound have been
investigated, that can be seen as (variations of) the next levels in Lasserre
hierarchy. Once again what is crucial for being able to compute numerically
these bounds is exploiting symmetry. However, after exploiting symmetry
(applying block-diagonalization), the resulting programs remain SDP’s, al-
beit with many smaller blocks (instead of a single large matrix) and much
less variables. These ideas are developed, in particular, in the papers Schri-
jver [2005] and Litjens, Polak, Schrijver [2017] (where the bound of order
t = 2 is computed). Still using symmetry reduction, it is shown in Laurent
[2007a] that computing the level t bound last (G(n, d)) boils down to solving
an SDP whose size is polynomial in n for any fixed t.
The theta number plays an important role in many different areas. For
instance it will pop up again in Chapter 6 when studying analogues of the
celebrated Grothendieck constant for graphs. The theta number has also
been extended to infinite graphs arising to model geometric problems, like
for instance packing problems on the sphere, to which we will return in
Chapter 10.8
Exercises
4.1 Consider two graphs G1 = (V1 , E2 ) and G2 = (V2 , E2 ) with disjoint
vertex sets. Their disjoint union is the graph (V1 ∪ V2 , E1 ∪ E2 ).
(a) Show that the disjoint union of two perfect graphs is a perfect graph.
(b) Construct a class of perfect graphs having exponentially many max-
imal stable sets (in terms of the number of nodes).
8 Rephrase better and insert relevant references?
Exercises 111
where the minimum is taken over all unit vectors c and all orthonor-
mal representations u1 , · · · , un of G (i.e., u1 , . . . , un are unit vectors
satisfying uTi uj = 0 for all pairs {i, j} ∈ E).
Show: ϑ(G) = ϑ1 (G).
Hint for the inequality ϑ(G) ≤ ϑ1 (G): Use the dual formulation of
ϑ(G) from Lemma 4.4.1 and the matrix M = (viT vj )ni,j=1 , where vi =
c − cTuui for i ∈ [n].
i
Hint for the inequality ϑ1 (G) ≤ ϑ(G): Use an optimal solution X =
tI − B of the dual formulation for ϑ(G), written as the Gram matrix
of vectors x1 , . . . , xn . Show that there exists a nonzero vector c which
is orthogonal to x1 , . . . , xn , and consider the vectors ui = c+x √ i.
t
√
4.8 Show that ϑ(C5 ) ≤ 5, using the formulation of Exercise 4.7 for the
theta number.
Hint: Consider the following vectors c, u1 , . . . , u5 ∈ R3 : c = (0, 0, 1),
uk = (s cos(2kπ/5), s sin(2kπ/5), t) for k = 1, 2, 3, 4, 5, where the scalars
s, t ∈ R are chosen in such a way that u1 ,√. . . , u5 form an orthonormal
representation of C5 . Recall cos(2π/5) = 5−1 4 .
This is the original proof of Lovász [1979], known as the umbrella con-
struction.
5.1 Introduction
5.1.1 The MAX CUT problem
The maximum cut problem (MAX CUT) is the following problem in com-
binatorial optimization. Let G = (V, E) be a graph and let w = (wij ) ∈ RE +
be nonnegative weights assigned to the edges. Given a subset S ⊆ V , the
cut δG (S) consists of the edges {i, j} ∈ E having exactly one endnode in S,
i.e., with |{i, j} ∩ S| = 1. In other words, δG (S) consists of the edges that
are cut by the partition (S, S = V \ S) of V . The cut δG (S) is called trivial
if S = ∅ or V (in which case it is empty). Then the weight of the cut δG (S)
P
is w(δG (S)) = {i,j}∈δG (S) wij and the MAX CUT problem asks for a cut
of maximum weight, i.e., compute
mc(G, w) = max w(δG (S)).
S⊆V
Thus,
w(S, S) = w(δG (S)) for all S ⊆ V.
To state its complexity, we formulate MAX CUT as a decision problem:
MAX CUT: Given a graph G = (V, E), edge weights w ∈ ZE + and an
integer k ∈ N, decide whether there exists a cut of weight at least k.
It is well known that MAX CUT is an NP-complete problem. In fact, MAX
CUT is one of Karp’s 21 NP-complete problems. So unless the complexity
5.1 Introduction 115
Then, we have
X X X X
w(S, S) = wij = ai aj = ( ai )( aj ) = a(S)(σ−a(S)) ≤ σ 2 /4,
i∈S,j∈S i∈S,j∈S i∈S j∈S
with equality if and only if a(S) = σ/2 or, equivalently, a(S) = a(S). From
this it follows that there is a cut of weight at least k if and only if the
sequence a1 , . . . , an can be partitioned. This concludes the proof.
This hardness result for MAX CUT is in sharp contrast to the situation of
the MIN CUT problem, which asks for a nontrivial cut of minimum weight,
i.e., to compute
min w(S, S).
S⊆V :S6=∅,V
(For MIN CUT the weights of edges are usually called capacities and they
are also assumed to be nonnegative). It is well known that the MIN CUT
problem can be solved in polynomial time (together with its dual MAX
FLOW problem), using the Ford-Fulkerson algorithm. Specifically, the Ford-
Fulkerson algorithm permits to find in polynomial time a minimum cut (S, S)
separating a given source s and a given sink t, i.e., with s ∈ S and t ∈ S.
Thus a minimum weight nontrivial cut can be obtained by applying this
algorithm |V | times, fixing any s ∈ V and letting t vary over all nodes of
V \ {s}. Details can be found in essentially every textbook on combinatorial
optimization.
Even stronger, Håstad in 2001 showed that it is NP-hard to approximate
MAX CUT within a factor of 1617 ∼ 0.941.
On the positive side, one can compute a 0.878-approximation of MAX
CUT in polynomial time, using semidefinite programming. This algorithm,
116 Approximating the MAX CUT problem
due to Goemans and Williamson [1995], is one of the most influential ap-
proximation algorithms which are based on semidefinite programming. We
will explain this result in detail in Section 5.2.1.
Before doing that we recall some results for MAX CUT based on using
linear programming.
the graph G is taken into account by the objective function of this LP.
The following triangle inequalities are valid for the cut polytope CUTn :
xij − xik − xjk ≤ 0, xij + xjk + xjk ≤ 2, (5.1)
for all distinct i, j, k ∈ [n]. This is easy to see, just verify that these in-
equalities hold when x is equal to the incidence vector of a cut. The triangle
inequalities (5.1) imply the following bounds (check it):
0 ≤ xij ≤ 1 (5.2)
on the variables. Let METn denote the polytope in RE(Kn ) defined by the tri-
angle inequalities (5.1). Thus, METn is a linear relaxation of CUTn , tighter
than the trivial relaxation by the unit hypercube:
CUTn ⊆ METn ⊆ [0, 1]E(Kn ) .
It is known that equality CUTn = METn holds for n ≤ 4, but the inclusion
CUTn ⊂ METn is strict for n ≥ 5. Indeed, the inequality:
X
xij ≤ 6 (5.3)
1≤i<j≤5
is valid for CUT5 (as any cut of K5 has cardinality 0, 4 or 6), but it is not
valid for MET5 . For instance, the vector (2/3, . . . , 2/3) ∈ R10 belongs to
MET5 but it violates the inequality (5.3) (since 10 · 2/3 > 6).
We can define the following linear programming bound:
X
lp(G, w) = max wij xij : x ∈ METn (5.4)
{i,j}∈E(G)
Hence, the expected weight of the cut produced by this random partition is
equal to
X X 1 w(E)
E(w(S, S)) = wij P({i, j} is cut) = wij = .
2 2
{i,j}∈E {i,j}∈E
Figure 5.2 Views on the convex set E3 behind the semidefinite relaxation.
5.2 The algorithm of Goemans and Williamson 121
arccos(uT v)
P(sign(uT r) 6= sign(v T r)) = . (5.8)
π
(ii) The expectation of the random variable sign(uT r) sign(v T r) ∈ {−1, +1} is
equal to
2
E[sign(uT r) sign(v T r)] = arcsin(uT v). (5.9)
π
Proof (i) Since the probability distribution from which we sample the unit
vector r is rotationally invariant we can assume that u, v and r lie in a
common plane. Hence we can assume that they lie on a unit circle in R2
and that r is chosen according to the uniform distribution on this circle.
Then the probability that sign(uT r) 6= sign(v T r) depends only on the angle
between u and v. Using a figure (draw one!) it is easy to see that
1 1
P[sign(uT r) 6= sign(v T r)] = 2 · arccos(uT v) = arccos(uT v).
2π π
(ii) By definition, the expectation E[sign(uT r) sign(v T r)] can be computed
as
(+1) · P[sign(uT r) = sign(v T r)] + (−1) · P[sign(uT r) 6= sign(v T r)]
arccos(uT v)
= 1 − 2 · P[sign(uT r) 6= sign(v T r)] = 1 − 2 · π ,
where we have used (i) for the last equality. Now use the trigonometric
identity
π
arcsin t + arccos t = ,
2
to conclude the proof of (ii).
Using elementary univariate calculus one can show the following fact.
Lemma 5.2.3 For all t ∈ [−1, 1)], the following inequality holds:
2 arccos t
≥ 0.878. (5.10)
π 1−t
One can also “see” this on the following plots of the function in (5.10),
where t varies in [−1, 1) in the first plot and in [−0.73, −0.62] in the second
plot.
5.2 The algorithm of Goemans and Williamson 123
10
8 8.791e-1
8.79e-1
6
8.789e-1
4 8.788e-1
8.787e-1
2
8.786e-1
-1 -0.5 0 0.5 1 -0.73 -0.72 -0.71 -0.7 -0.69 -0.68 -0.67 -0.66 -0.65
Proof (of Theorem 5.2.1) Let X be the optimal solution of the semidefinite
program (5.6) and let v1 , . . . , vn be unit vectors such that X = (viT vj )ni,j=1 ,
as in Steps 1,2 of the GW algorithm. Let (S, S) be the random partition of
V , as in Steps 3,4 of the algorithm. We now use Lemma 6.2.2(i) to compute
the expected value of the cut (S, S):
P
E(w(S, S)) = {i,j}∈E wij P({i, j} is cut)
arccos(viT vj )
wij P(sign(viT r) 6= sign(vjT r)) =
P P
= {i,j}∈E {i,j}∈E wij π
1−viT vj arccos(v T v )
· π2 1−vT vi j .
P
= {i,j}∈E wij 2 i j
2 arccos(viT vj )
By Lemma 5.2.3, each term π 1−viT vj
can be lower bounded by the con-
stant 0.878. Since all weights are nonnegative, each term wij (1 − viT vj ) is
nonnegative. Therefore, we can lower bound E(w(S, S)) in the following way:
1 − viT vj
X
E(w(S, S)) ≥ 0.878 · wij .
2
{i,j}∈E
Finally, it is clear that the maximum weight of a cut is at least the expected
value of the random cut (S, S):
mc(G, w) ≥ E(w(S, S)).
Putting things together we can conclude that
mc(G, w) ≥ E(w(S, S)) ≥ 0.878 · sdp(G, w).
124 Approximating the MAX CUT problem
This concludes the proof, since the other inequality mc(G, w) ≤ sdp(G, w)
holds by (5.7).
(Lw )ij = −wij for {i, j} ∈ E, (Lw )ij = 0 for (i 6= j, {i, j} 6∈ E).
The following can be checked (Exercise 4.2).
Lemma 5.3.1 The following properties hold for the Laplacian matrix Lw :
(i) For any vector x ∈ {±1}n , 14 xT Lw x = 12 {i,j}∈E wij (1 − xi xj ).
P
qp(A) ≤ sdp(A)
for any symmetric matrix A. In the special case when A is positive semidefi-
nite, Nesterov shows that sdp(A) is a π2 -approximation for qp(A). The proof
is based on the same rounding technique of Goemans-Williamson, but the
analysis is different. It relies on the following property of the function arcsin t:
There exist positive scalars ak > 0 (k ≥ 0) such that
X
arcsin t = t + ak t2k+1 for all t ∈ [−1, 1]. (5.13)
k≥0
whose entries are the images of the entries of X under the map t 7→ arcsin t−
t. Then, X 0 implies X̃ 0.
Proof The proof uses the following fact: If X = (xij )ni,j=1 is positive semidef-
inite then, for any integer k ≥ 1, the matrix (Xijk )ni,j=1 (whose entries are the
k-th powers of the entries of X) is positive semidefinite as well. This follows
from the fact that the Hadamard product preserves positive semidefinite
matrices (recall Section B.2.3). Using this fact, the form of the series de-
composition (5.13), and taking limits, implies the result of the lemma.
2 Pn 2 Pn
= π i,j=1 Aij arcsin(viT vj ) = π i,j=1 Aij arcsin Xij
P
2 n Pn
= π i,j=1 Aij Xij + i,j=1 A ij (arcsin Xij − Xij ) .
By Lemma 5.3.2, the second term is equal to hA, X̃i ≥ 0, since A 0 and
X̃ 0. Moreover, we recognize in the first term the objective value of the
semidefinite program (5.12). Combining these facts, we obtain:
n
X 2
E( Aij xi xj ) ≥ sdp(A).
π
i,j=1
X X
qp(a, b) = max aij (1 − xi xj ) + bij (1 + xi xj ) : x ∈ {±1}n ,
ij∈E1 ij∈E2
(5.14)
where aij , bij ≥ 0 for all ij. Write the semidefinite relaxation:
X X
sdp(a, b) = max aij (1 − Xij ) + bij (1 + Xij ) : X 0, Xii = 1 ∀i ∈ [n] .
ij∈E1 ij∈E2
(5.15)
Goemans and Williamson [1995] show that the same approximation result
holds as for MAX CUT:
Lemma 5.3.5 For any z ∈ [−1, 1], the following inequality holds:
2 π − arccos z
≥ 0.878.
π 1+z
P P
E ij∈E1 aij (1 − xi xj ) + ij∈E2 bij (1 + xi xj )
arccos(viT vj ) arccos(viT vj )
P P
=2· ij∈E1 aij π +2· ij∈E2 ij 1 −
b π
2 arccos(viT vj ) P 2 π − arccos(viT vj )
aij (1 − viT vj ) T
P
= ij∈E1 + ij∈E bij (1 + v i v j )
| {z } π 1 − viT vj 2
| {z }π 1 + viT vj
≥0 | {z } ≥0 | {z }
≥0.878 ≥0.878
Here we have used Lemmas 5.2.3 and 5.3.5. From this we can conclude that
qp(a, b) ≥ 0.878 · sdp(a, b).
In the next section we indicate how to use the quadratic program (5.14)
in order to formulate and approximate MAX 2-SAT.
0 otherwise. Thus,
1 + x0 xi 1 − x0 xi
v(zi ) = , v(z i ) = 1 − v(zi ) = .
2 2
Based on this one can now express v(C) for a clause with two literals:
1−x0 xi 1−x0 xj
v(zi ∨ zj ) = 1 − v(z i ∧ z j ) = 1 − v(z i )v(z j ) = 1 − 2 2
This quadratic program is of the form (5.14). Hence Theorem 5.3.4 applies.
Therefore, the approximation algorithm of Goemans and Williamson gives
a 0.878 approximation for MAX 2SAT.
holds.
The smallest constant K for which the second inequality holds, is called
the Grothendieck constant KG . It is known that KG lies between 1.676 . . .
and 1.782 . . . but its exact value is currently not known. In the following we
will prove that
π
KG ≤ √ = 1.782 . . .
2 ln(1 + 2)
The argument will also rely on an approximation algorithm which uses ran-
domized rounding (in a tricky way).
The proof for Theorem 5.3.6 is algorithmic and has the following steps:
u1 , . . . , um , v1 , . . . , vn ∈ S m+n−1 .
Now by Lemma 5.3.7 below each last expectation will turn out to be equal to
βuT
i vj . Then the total sum in the right hand side will be equal to βsdp∞→1 (A).
This implies kAk∞→1 ≥ βsdp∞→1 (A) and thus KG ≤ β −1 .
Now the following lemma, Krivine’s trick, finishes the proof of Theo-
rem 5.3.6.
5.3 Some extensions 131
holds with
2 √
β= ln(1 + 2) = 0.561 . . .
π
2
Proof Define the function E : [−1, +1] → [−1, +1] by E(t) = π arcsin t.
Then by Grothendieck’s identity, Lemma 6.2.2,
h i
E sign((u0i )T r) sign((vj0 )T r) = E((u0i )T vj0 ).
and
(vj0 )r = |g2r+1 |β 2r+1 vj⊗2r+1 .
p
Then
∞
X
(u0i )T vj0 = g2r+1 β 2r+1 (uT
i vj )
2r+1
= E −1 (βuT
i vj )
r=0
and
∞
X
1 = (u0i )T u0i = (vj0 )T vj0 = |g2r+1 |β 2r+1 ,
r=0
2
E(t) = arcsin t,
π
and so
∞
π X (−1)2r+1 π 2r+1
E −1 (t) = sin t = t .
2 (2r + 1)! 2
r=0
Hence,
∞
X (−1)2r+1 π 2r+1 π
1= β = sinh β ,
(2r + 1)! 2 2
r=0
which implies
2 2 √
β= arsinh 1 = ln(1 + 2)
π π
√
because arsinh t = ln(t + t2 + 1).
∞
X
(u0i )T vj0 = g2r+1 β 2r+1 (uT 2r+1
i vj )
r=0
by its series expansion which converges fast enough and then we use its
Cholesky decomposition.
For their work [1995], Goemans and Williamson won in 2000 the Fulk-
erson prize (sponsored jointly by the Mathematical Programming Society
and the AMS) which recognizes outstanding papers in the area of discrete
mathematics for this result.
How good is the MAX CUT algorithm? Are there graphs where the value
of the semidefinite relaxation and the value of the maximal cut are a factor
of 0.878 apart or is this value 0.878, which maybe looks strange at first sight,
only an artefact of our analysis? It turns out that the value is optimal. In
2002 Feige and Schechtmann gave an infinite family of graphs for which the
ratio mc/sdp converges to exactly 0.878 . . .. This proof uses a lot of nice
mathematics (continuous graphs, Voronoi regions, isoperimetric inequality)
and it is explained in detail in the Chapter 8 of the book Approximation
Algorithms and Semidefinite Programming of Gärtner and Matoušek.
In 2007, Khot, Kindler, Mossel, O’Donnell showed that the algorithm of
Goemans and Williamson is optimal in the following sense: If the Unique
Games conjecture is true, then there is no polynomial time approximation
algorithm achieving a better approximation ratio than 0.878 unless P = NP.
Currently, the validity and the implications of the Unique Games conjecture
134 Approximating the MAX CUT problem
are under heavy investigation. The book of Gärtner and Matoušek also con-
tains an introduction to the unique games conjecture.
Exercises
5.1 The goal of this exercise is to show that the maximum weight stable
set problem can be formulated as an instance of the maximum cut
problem.
Let G = (V, E) be a graph with node weights c ∈ RV+ . Define the
new graph G0 = (V 0 , E 0 ) with node set V 0 = V ∪ {0}, with edge set
0
E 0 = E ∪ {{0, i} : i ∈ V }, and with edge weights w ∈ RE
+ defined by
and
X X
g(A) = max Aij xi yj : x1 , . . . , xm , y1 , . . . , yn ∈ {±1} .
i∈[m] j∈[n]
(f) Show that the maximum cut problem in a graph G = ([n], E) with
nonnegative edge weights can be formulated as an instance of com-
puting the cut norm f (A) of some matrix A.
5.5 The goal of this exercise is to show that computing the cut norm of a
matrix (recall Exercise 5.4) is at least as difficult as solving the maxi-
mum cut problem.
Consider a graph G = (V = [n], E) with m edges and with non-
negative edge weights (wjk ). We define the (2m) × n matrix A, whose
columns are indexed by the vertices and having two rows for each edge.
Namely, for the edge {vj , vk }, the corresponding two rows of A have
entries
A2i−1,j = A2i,k = wjk , A2i−1,k = A2i,j = −wjk .
136 Approximating the MAX CUT problem
Show that if all diagonal entries of A are equal to zero then equality
holds:
X
qp(A) = max Aij xi xj : xi ∈ [−1, 1], i ∈ V .
i,j=1
mc(C5 )
How does the ratio sdp(C5 ) compare to the GW ratio 0.878?
5.8 (a) Determine the optimal value of the following quadratic optimization
problem:
qp = max{x1 x2 + x2 x3 + x3 x4 − x4 x1 : x1 , x2 , x3 , x4 ∈ {±1}}.
and the definition of the basic semidefinite bound sdp(G, w) from (5.6)
X 1 − Xij n
sdp(G, w) = max wij : X ∈ S+ , Xii = 1 for all i ∈ [n] .
2
{i,j}∈E
.
6
Generalizations of Grothendieck’s inequality and
applications
holds, where:
Xm X
n
kAk∞→1 = max Aij xi yj : x2i = yj2 = 1, i ∈ [m], j ∈ [n] .
i=1 j=1
holds true. The left hand side is the semidefinite relaxation of the right
hand side. Furthermore, observe that the original Grothendieck constant
KG , which we studied in Chapter 5, is equal to the supremum of K(G) over
all bipartite graphs G. Hence the graph parameter K(G) can be seen as an
extension to an arbitrary graph of the classical Grothendieck constant.
The following theorem gives a surprising connection between the Grothendieck
constant of a graph and the theta number.
K(G) ≤ C ln ϑ(G),
holds. Similarly,
X
Γmax = max Auv fu · fv : fu ∈ RV , u ∈ V, kfu k ≤ 1
{u,v}∈E
Then,
X √
E Auv (Xu Yv + Xv Yu ) ≤ 2 AB(Γmax − Γmin ),
{u,v}∈E
where
X
Γmin = min Auv fu · fv : fu ∈ RV , u ∈ V, kfu k ≤ 1 .
{u,v}∈E
for some vectors fu0 with kfu0 k ≤ 1, because the matrix on the left hand side is
positive semidefinite (Exercise 6.1 (c)) and thus has a Cholesky factorization.
We introduce new variables Uu and Vu to be able to apply (6.2). The new
variables are
1 √ √ 1 √ √
Uu = Xu / A + Yu / B , Vv = Xu / A − Yu / B .
2 2
Then E[Uu2 ] ≤ 1 and E[Vu2 ] ≤ 1 (verify it). So we can apply (6.2)
X
E Auv (Xu Yv + Xv Yu )
{u,v}∈E
√ X X
= 2 AB E Auv Uu Uv − E Auv Vu Vv
{u,v}∈E {u,v}∈E
√
≤ 2 AB(Γmax − Γmin ).
The third summand in (6.1) we estimate by applying the useful lemma with
Xu = yu − xu , Yu = −(yu − xu ). We get
2
X
E Auv (yu − xu )(yv − xv ) ≥ −M e−M /2 (Γmax − Γmin ).
{u,v}∈E
Altogether,
X p 2
Auv E[xu xv ] ≥ Γmax − 2 M e−M 2 /2 + M e−M /2 (Γmax − Γmin ).
{u,v}∈E
So,
X 2 1 Γmax
Auv E[xu xv ] ≥ Γmax − √ Γmax − Γmax .
10 10 Γmax − Γmin
{u,v}∈E
where
2
2 Γ((r + 1)/2)
γ(r) = ,
r Γ(r/2)
and where Γ is the usual Gamma function, which is the extension of the
factorial function.
1
The first three values of 2γ(r)−1 are:
1 1
= = 3.65979 . . . ,
2γ(1) − 1 4/π − 1
1 1
= = 1.75193 . . . ,
2γ(2) − 1 π/2 − 1
1 1
= = 1.43337 . . .
2γ(3) − 1 16/(3π) − 1
1
For r → ∞ the values 2γ(r)−1 converge to 1. In particular, the proof of the
theorem gives another proof of the original Grothendieck’s inequality albeit
1
with a worse constant KG ≤ 4/π−1 .
Zu T Zv
E
kZuk kZvk
∞
2 Γ((r + 1)/2) 2 X (1 · 3 · · · (2k − 1))2
= (uT v)2k+1 .
r Γ(r/2) (2 · 4 · · · 2k)((r + 2) · (r + 4) · · · (r + 2k))
k=0
2
E[sign(Zu)sign(Zv)] = arcsin(uT v)
π
T 3
1 · 3 (uT v)5
2 T 1 (u v)
= u v+ + + ··· .
π 2 3 2·4 5
The proof of Lemma 6.2.2 requires quite some integration. The computa-
tion starts of by
Zu T Zv
E
kZuk kZvk
x · x − 2tx · y + y · y
Z Z
p
2 −r x y
= (2π 1 − t ) · exp − dxdy,
Rr Rr kxk kyk 2(1 − t2 )
where t = uT v. We will omit the tedious calculation here. For those who
cannot resist a definite integral (like G.H. Hardy): it can be found in [? ].
The only three facts which will be important is that the power series
expansion
∞
Zu T Zv
X
E = f2k+1 (uT v)2k+1
kZuk kZvk
k=0
The first summand equals f1 SDPm+n (A). The second summand is bounded
in absolute value by (1 − f1 )SDPm+n (A) as you will prove in Exercise 8.1
(d).
Thus for the second sum we have
m X
X n ∞
X
Aij f2k+1 (uT
i vj )
2k+1
≥ (f1 − 1)SDPm+n (A),
i=1 j=1 k=1
In case you want to know more about them: The book ”A=B” by Petkovsek,
Wilf and Zeilberger
http://www.math.upenn.edu/~wilf/AeqB.html
is a good start.
Exercises
6.1 (a) Why does Theorem 6.1.1 give a proof of the original Grothendieck
inequality? Which explicit upper bound for KG does it provide?
(Determine a concrete number.)
(b) Show that E[yu yv ] = fu · fv holds.
(c) Prove that the matrix
E[Xu Xv ] u,v∈V
is positive semidefinite.
(d) Show that
m X
X n ∞
X
Aij f2k+1 (ui · vj )2k+1 ≤ (1 − f1 )SDPm+n (A).
i=1 j=1 k=1
F : S n → R ∪ {∞}
which only depends on the spectrum of the matrix X; the collection of its
eigenvalues λ1 (X), . . . , λn (X).
Note that this implies that the function f is symmetric, i.e. its value stays
the same if we permute its n arguments; it is invariant under permutation
of the variables.
154 Optimizing with ellipsoids and determinants (Version: May 24, 2022)
2
Set Sij = eT i Auj and verify that the matrix S = (Sij ) is doubly stochastic:
Its entries are clearly nonnegative, the rows sums are
n
X n
X 2
Sij = eT
i Auj = kAuj k2 = 1
j=1 i=1
and similarly the column sums are also all equal to 1. By Theorem A.7.2 we
can write S as a convex combination of permutation matrices:
X X
S= ασ Pσ , with ασ ≥ 0, ασ = 1.
σ∈Sn σ∈Sn
We continue
n
X X X n
X X
Yii = λj ασ (Pσ )ij = ασ λj (Pσ )ij = ασ λσ−1 (i) ,
j=1 σ∈Sn σ∈Sn j=1 σ∈Sn
which yields
(Y11 , . . . , Ynn ) ∈ Π(λ1 , . . . , λn ).
Proof of Theorem 7.1.2 One implication follows without any work. Let F
be a convex spectral function. Then f is symmetric by definition. It is convex
156 Optimizing with ellipsoids and determinants (Version: May 24, 2022)
Thus,
F (X) ≤ max f diag(AXAT ) .
A∈O(n)
By Theorem 7.1.5
max f (diag(Y )) = max f (x).
Y ∈SH(X) x∈Π(λ1 ,...,λn )
maximizing a convex function over a polytope, see Section A.6. The vertices
of Π(λ1 , . . . , λn ) are of the form (λσ(1) , . . . , λσ(n) ) with σ ∈ Sn . Since f is
symmetric, the objective values of all vertices coincide and hence
max f (x) = f (λ1 , . . . , λn ) ≤ F (X)
x∈Π(λ1 ,...,λn )
so that we proved
max f diag(AXAT ) ≥ F (X)
A∈O(n)
Lemma 7.1.7 Let X, Y ∈ S++ n be two positive definite matrices which are
where the strict inequality follows from the addition to Corollary A.2.8.
Tr(XY ) = Tr(LT Y L)
The elements of the max det cone have two components and in this way
it is similar to the Lorentz cone Ln+1 .
Theorem 7.2.2 The max det cone Dn+1 is a proper convex cone.
160 Optimizing with ellipsoids and determinants (Version: May 24, 2022)
Proof We first verify that Dn+1 is a convex cone. For this let α ≥ 0,
(X, s), (Y, t) ∈ Dn+1 be given. Then α(X, s) ∈ Dn+1 because
αX 0, αs ≥ 0, det(αX)1/n = α(det X)1/n ≥ αs.
Also (X, s) + (Y, t) ∈ Dn+1 because
X + Y 0, s + t ≥ 0, (det(X + Y ))1/n ≥ (det X)1/n + (det Y )1/n ≥ s + t
where we used Minkowski’s determinant inequality.
The max det cone is pointed: Suppose (X, s) and −(X, s) both lie in Dn+1 ,
then X = 0 because S+ n is pointed and s = 0 because R is pointed.
+
It is full-dimensional since an open neighborhood of (I, 1/2) is contained
in Dn+1 .
It is closed because S+n is closed, Rn is closed, and the function g(X, s) =
and because the psd cone is self dual, Y 0 follows. Now consider (X, (det X)1/n ) ∈
Dn+1 with X 0. Then
and therefore
Tr(XY )
−t ≤ .
(det X)1/n
Definition 7.2.5 The max det problem in dual standard form is given as
d∗ = inf bT y
y ∈ Rm
m (7.5)
X
yj (Aj , aj ) − (C, c) ∈ (Dn+1 )∗ .
j=1
d∗ = inf bT y
y ∈ Rm
Xm
yj Aj − C 0
j=1
1/n
m
X 1
det yj Aj − C ≥ .
n
j=1
Of course, the duality theory developed in Chapter 2.4 also holds for
max det problems. Often the following optimality condition is useful when
analyzing max det problems.
Theorem 7.2.6 Suppose there is no duality gap between the primal max
det problem (7.4) and its dual (7.5); p∗ = d∗ . Suppose (X, s) is feasible
for (7.4) and y ∈ Rm is feasible for (7.5). Then (X, s) is optimal for (7.4)
and y ∈ Rm is optimal for (7.5) if and only if the following three conditions
are satisfied:
!
m
P
(i) X yj Aj − C = αI for some α ≥ 0,
j=1
= 0.
So the two inequalities are tight. The first tight inequality implies together
with Corollary 7.1.9 condition (i), the second tight inequality implies condi-
tion (ii) and (iii).
Recall from Chapter 3.1 that one can represent an ellipsoid by a positive
n and a vector x ∈ Rn either implicitly by a strictly
definite matrix A ∈ S++
convex inequality
E(A2 , x) = {Ay + x : y ∈ Bn }.
164 Optimizing with ellipsoids and determinants (Version: May 24, 2022)
√
The volume of E(A, x) equals det A vol Bn . In the following the volume of
the unit ball is just a dimension dependent factor which does not play any
further role.
Also recall that one can represent a polytope in two ways. Explicitly, as
a convex hull of finitely many points
P = conv{x1 , . . . , xN } ∈ Rn ,
or implicitly as a bounded intersection of finitely many halfspaces
P = {y ∈ Rn : aT T
1 y ≤ b1 , . . . , am y ≤ bm }.
d A−1
is positive semidefinite with d = A−1 x and the inequality
−1
xT T
i A xi − 2xi d + s ≤ 1
The constraint
−1
xT T
i A xi − 2xi d + s ≤ 1
can be expressed by
xT s dT
1 i , ≤ 1.
xi xi xTi d A−1
Our goal is now to find a best outer ellipsoidal outer approximation of the
polytope P . That is an ellipsoid which contains P and whose volume is as
small as possible. We can find such an ellipsoid by a conic program
sup (det A−1 )1/n
s dT
n+1
∈ S+
d A−1
−1
xT T
i A xi − 2xi d + s ≤ 1, i ∈ [N ]
Note here that maximizing (det A−1 )1/n is equivalent to minimizing (det A)1/n .
Rewriting this program into primal standard form yields
sup t
n+1
((B, t), Y, u) ∈ Dn+1 × S+ × RN
+
hEij , Bi + h−Ei+1,j+1 , Y i = 0, 1 ≤ i ≤ j ≤ n (7.6)
−xT
1 i , Y + ui = 1, i ∈ [N ].
−xi xi xT i
y1 , . . . , yN ≥ 0, i ∈ [N ]
Duality theory implies that there is no duality gap between primal and
dual and that the optimal values are attained whenever the polytope is full
dimensional.
Lemma 7.3.2 If dim P = n, then both programs are strictly feasible.
166 Optimizing with ellipsoids and determinants (Version: May 24, 2022)
x1 = e1 , x2 = e2 , . . . , xn = en , xn+1 = 0
P
holds. Choose zij so that i,j zij Eij = I and then (I, −1) lies in the interior
of the dual cone (Dn+1 )∗ . For ε > 0 set
y1 = y2 = · · · = yn = 2, yn+1 = 2n + ε, yn+2 = · · · = yN = ε.
Then
N
X 1 −xi
−I + yi
−xi xi xTi
i=1
n T T !
X 1 1 0 0
= 2 −
−ei −ei ei ei
i=1
N T
2n + ε 0 X 1 1
+ +ε .
0 0 −xi −xi
i=n+2
Here the third summand is positive semidefinite and the first two summands
together are positive definite as one can see from the Schur complement, see
Lemma B.2.1:
4n + ε −2 . . . −2
−2 1
0
.. ..
. .
−2 1
because
4n + ε − (−2e)T I(−2e) = 4n + ε − 4n = ε > 0.
7.3 Approximating polytopes by ellipsoids 167
P = {x ∈ Rn : aT T
1 y ≤ b1 , . . . , am y ≤ bm }
kAaj k ≤ bj − aT
jx
aT
j (Ay + x) ≤ bj
if and only if
max{(Aaj )T y : y ∈ Bn }} ≤ bj − aT
j x.
Using the second order cone we can model the inequality kAaj k ≤ bj −aT
jx
T
as (Aaj , bj − aj x) ∈ Ln+1
sup s
(A, s) ∈ Dn , x ∈ Rn
(Aaj , bj − aT
j x) ∈ L
n+1
, j ∈ [m]
sup s
(A, s) ∈ Dn , x ∈ Rn , (yj , tj ) ∈ Ln+1 , j ∈ [m] (7.8)
yj = Aaj , tj = bj − aT
j x, j ∈ [m].
168 Optimizing with ellipsoids and determinants (Version: May 24, 2022)
By assumption
0 0
(B, t) = (I, 1) and Y = .
0 I
is an optimal solution of the primal. By complementary slackness we know
si = 0 for i ∈ [M ]. Hence
−xT
1 i 0 0
, = 1 for i ∈ [M ]
−xi xi xT
i 0 I
170 Optimizing with ellipsoids and determinants (Version: May 24, 2022)
and so
−xT
X 1 ∗ 0
yi∗ i = 1
−x1 xi xT
i 0 nI
i
because generally
0 0 A B 0 0 0 0
= =
0 I C D C D 0 0
Define λi = nyi∗ , then the second and third optimality condition follow from
the equality above.
The second statement about Ein (P ) follows from the fist statement about
Eout (P ) by considering the polar polytope
P o = {y ∈ Rn : xT y ≤ 1 for all x ∈ P },
see Exercise 7.13.
the cube are balls.
This optimality condition is helpful in surprisingly many situation. For
example one can use them to prove an estimate on the quality of the inner
and outer approximation.
Corollary 7.3.5 Let P ⊆ Rn be an n-dimensional polytope, then there is
an invertible affine transformation T so that
Bn = T Ein (P ) ⊆ T P ⊆ nT Ein (P ) = nBn
holds.
Proof It is clear that we can map Ein (P ) to the unit ball by an invertible
affine transformation, see Exercise 7.11. So we can use the equations
M
X M
X
λ i xi = 0 and λ i xi xT
i =I
i=1 i=1
7.3 Approximating polytopes by ellipsoids 171
Exercises
7.1 Let X, Y ∈ Sn
be symmetric matrices. Use Theorem 7.1.5 to determine
the minimum
min{hX, AY AT i : A ∈ O(n)}.
7.2 Show that the sum of the largest k eigenvalues of a symmetric matrix
is a convex spectral function.
7.3 Use Davis’ characterization of convex and spectral functions to show:
The function
− ln det X, if X 0,
F : S n → R ∪ {∞}, F (X) =
∞, otherwise,
is convex and spectral.
7.4 For which values of k ∈ Z is the function
Φk : S n → R, X 7→ Tr(X k )
exists, then
∀{i, j} ∈ E : ((A∗ )−1 )ij = 0.
7.6 Let
P = {x ∈ Rn : aT
j x ≤ bj (j ∈ [m])}
R = {x ∈ Rn : α1 ≤ x1 ≤ β1 , . . . , αn ≤ xn ≤ βn }
with R ⊆ P .
174 Optimizing with ellipsoids and determinants (Version: May 24, 2022)
7.7 Determine the dual conic program of (7.8) and prove that their is no
duality gap between primal and dual.
7.8 Let P ⊆ Rn be an n-dimensional polytope and let A ∈ Rn×n be an
invertible matrix. Show:
Eout(AP ) = AEout (P ).
7.9 Determine the Löwner-John ellipsoid Ein (Cn ) of the regular n-gon Cn
in the plane
Cn = conv{(cos(2πk/n), sin(2πk/n)) ∈ R2 : k = 0, 1, . . . , n − 1}.
7.10 Let P be the polytope
1
P = conv √ (±ei ± ej ) : i, j = 1, . . . , 4, i 6= j ⊆ R4 ,
2
which has 24 42 = 96 many vertices. Show that Eout (P ) = B4 holds.
precisely when FK (x) = {x}. Recall also that if K does not contain a line
then it has at least one extreme point.
A point z ∈ Rn is called a perturbation of x ∈ K if x ± z ∈ K for some
> 0; then the whole segment [x − z, x + z] is contained in the face FK (x).
The set of perturbations of x ∈ K is a linear space, which we denote by
PK (x), and whose dimension is equal to the dimension of the face FK (x).
nonnegative orthant Rn+ it is easy to see that its faces are obtained by fixing
some coordinates to zero and thus each face of Rn+ can be identified to a
smaller nonnegative orthant Rr+ for some integer 0 ≤ r ≤ n. It turns out
that an analogous property holds for the faces of S+ n , namely each face of
u1 , · · · , ur ). The map
φA : S r → S n
Z 0 (8.1)
Z 7→ U U T = U0 ZU0T
0 0
is a rank-preserving isometry, which r:
identifies F (A) and S+
r Z 0 T T r
F (A) = φA (S+ ) = U U = U0 ZU0 : Z ∈ S+ .
0 0
Moreover, F (A) is given by
n
F (A) = {X ∈ S+ : kerX ⊇ kerA} (8.2)
and its dimension is equal to r+1
2 .
rank-preserving isometry:
Z 0
Z 7→ Y = 7→ X = U Y U T
0 0
D0 7→ D 7→ A
r r
S+ → S+ ⊕ 0n−r → F (A)
r = r+1 .
and thus the dimension of F is equal to dim S+ 2
As a direct n
application, the possible dimensions for the faces of the cone S+
r+1
are 2 for r = 0, 1, · · · , n. Moreover there is a one-to-one correspondence
n and the lattice of subspaces of Rn :
between the lattice of faces of S+
U subspace of Rn 7→ FU = {X ∈ S+
n
: kerX ⊇ U }, (8.3)
with U1 ⊆ U2 ⇐⇒ FU1 ⊇ FU2 .
i.e., B ∈ φA (LA ). Therefore, we find that PK (A) = φA (LA ), and thus the
dimension of FK (A) is equal to dim PK (A) = dim φA (LA ) = dim AA , which
gives (8.10).
Remark 8.1.4 The codimension of the affine space AA can also be ex-
pressed as follows. Given a factorization A = W W T , where W ∈ Rn×r , we
have
codim AA = dimhW T Aj W : j ∈ [m]i.
Indeed, the matrix P = W T U0 D0−1 is nonsingular, since P T P = D0−1 using
the fact that U0T U0 = Ir . Moreover, W P = U0 , and thus
dimhW T Aj W : j ∈ [m]i = dimhP T W T Aj W P : j ∈ [m]i = dimhU0T Aj U0 : j ∈ [m]i
is indeed equal to codimAA .
As an application, for the elliptope K = En , if A ∈ En is the Gram matrix
of vectors {a1 , · · · , an } ⊆ Rk , then codim AA = dimha1 aT T
1 , · · · , an an i.
Figure 8.1 shows the elliptope E3 (more precisely, its bijective image in R3
obtained by taking the upper triangular part of X). Note the four corners,
which correspond to the four cuts of the graph K3 . All the points on the
boundary of E3 - except those lying on an edge between two of the four
corners – are extreme points. For instance, the matrix
√
1 0 1/√2
A = 0√ 1√ 1/ 2
1/ 2 1/ 2 1
is an extreme point of E3 (check it), with rank r = 2.
Theorem 8.1.7 Consider the projective space Pn−1 , consisting of all lines
in Rn passing through the origin, and let Sn−1 be the unit sphere in Rn . For
n ≥ 3 there does not exist a continuous map Φ : Sn−1 → Pn−1 such that
Φ(x) 6= Φ(y) for all distinct x, y ∈ Sn−1 .
most t − 1 ≤ s.
s+2 s+2
Suppose now A∩S++ 6= ∅. Then K = FK (A) for any matrix A ∈ A∩S++ .
Using (8.10) and the assumption on codimA, we obtain that
s+3 s+3 s+2
dim K = − codim A = − = s + 2.
2 2 2
Hence, K is a (s + 2)-dimensional compact convex set, whose boundary ∂K
is (topologically) the sphere Ss+1 . We now show that the boundary of K
contains a matrix with rank at most s.
Clearly every matrix in ∂K has rank at most s + 1. Suppose for a contra-
diction that no matrix of ∂K has rank at most s. Then, each matrix X ∈ ∂K
has rank s+1 and thus its kernel kerX has dimension 1, it is a line though the
origin. We can define a continuous map Φ from ∂K to Ps+1 in the following
way: For each matrix X ∈ ∂K, its image Φ(X) is the line kerX. The map Φ
is continuous (check it) from Ss+1 to Ps+1 with s + 1 ≥ 2. Hence, applying
Theorem 8.1.7, we deduce that there are two distinct matrices X, X 0 ∈ ∂K
with the same kernel: kerX = kerX 0 . Hence X and X 0 are two distinct
points lying in the same face of K: FK (X) = FK (X 0 ). Then this face has an
extreme point A, whose rank satisfies rankA ≤ rankX − 1 ≤ s.
We can now conclude the proof of Proposition 8.1.6.
Then S+2 ∩ A = {I} and thus this set contains no rank 1 matrix. Moreover,
codim A = 3 = s+2
2 with s = 1. This example shows that the condition
n ≥ s + 2 cannot be omitted in Lemma 8.1.8.
Example 8.2.3 below shows that also the assumption that K is bounded
cannot be omitted.
8.2 Applications
8.2.1 Euclidean realizations of graphs
The graph realization problem can be stated as follows. Suppose we are given
a graph G = (V = [n], E) together with nonnegative edge weights w ∈ RE +,
viewed as ‘lengths’ assigned to the edges. We say that (G, w) is d-realizable
if one can place the nodes of G at points v1 , · · · , vn ∈ Rd in such a way that
their Euclidean distances respect the given edge lengths:
Observe that the result of Lemma 8.2.2 still holds if we only assume that
G is a graph that contains one node adjacent to all other nodes.
Proposition 8.2.4 (Saxe [1979]) Given a graph (G, w) with integer lengths
w ∈ NE , deciding whether (G, w) is 1-realizable is an NP-complete problem,
already when G is restricted to be a circuit.
186 Euclidean embeddings: Low dimension
You will show this in Exercise 8.1. The following basic geometrical fact
will be useful for the proof.
Hence what the above shows is that any realizable weighted circuit can
be embedded in the line or in the plane, but deciding which one of these two
possibilities holds is an NP-complete problem!
is a convex set in R2 .
Proof It suffices to show that, if the set
n
K = {X ∈ S+ : hA, Xi = a, hB, Xi = b, Tr(X) = 1}
is not empty then it contains a matrix of rank 1. Define the affine space
A = {X ∈ S n : hA, Xi = a, hB, Xi = b, Tr(X) = 1}.
Then the existence of a matrix of rank 1 in K follows from Corollary 8.1.3 if
codim A ≤ 2, and
from Proposition 8.1.6 if codim A = 3 (as K is bounded,
s+2
codim A = 3 , n ≥ s + 2 for s = 1).
Observe that the assumption n ≥ 3 cannot be omitted in Proposition
8.2.9. To see it consider the quadratic map q defined using the matrices A
and B from Example 8.1.9. Then, q(1, 0) = (1, 0), q(0, 1) = (−1, 0), but
(0, 0) does not belong to the image of S1 under q.
We conclude with the following application of Proposition 8.2.9, which
shows that the numerical range R(M ) of a complex matrix M ∈ Cn×n is a
convex subset of C (viewed as R2 ). Recall that the numerical range of M is
n
X n
X
∗ n
R(M ) = {z M z = zi Mij zi : z ∈ C , |zi |2 = 1}.
i,j=1 i=1
(i) {x ∈ Rn : xT Ax ≥ 0} ⊆ {x ∈ Rn : xT Bx ≥ 0}.
(ii) There exists a scalar λ ≥ 0 such that B − λA 0.
Proof The implication (ii) =⇒ (i) is obvious. Now, assume (i) holds, we
show (ii). For this consider the semidefinite program:
As (P) is bounded and strictly feasible, applying the strong duality the-
orem, we deduce that there is no duality gap and that the dual prob-
lem has an optimal solution (y, z) with y, z ≥ 0. Therefore, B − zA =
(B − zA − yI) + yI 0, thus showing (ii).
Exercises
8.1 A graph G is said to be d-realizable if, for any edge weights w, (G, w) is
d-realizable whenever it is realizable. For instance, the complete graph
Kn is (n − 1)-realizable, but not (n − 2)-realizable (Example 8.2.3).
(a) Given two graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ) such that V1 ∩ V2
is a clique in G1 and G2 , their clique sum is the graph G = (V1 ∪
V2 , E1 ∪ E2 ).
Show: If G1 is d1 -realizable and G2 is d2 -realizable, then G is d-
realizable where d = max{d1 , d2 }.
(b) Given a graph G = (V, E) and an edge e ∈ E, G\e = (V, E \ {e})
denotes the graph obtained by deleting the edge e in G.
Show: If G is d-realizable, then G\e is d-realizable.
(c) Given a graph G = (V, E) and an edge e = {i1 , i2 } ∈ E, G/e
denotes the graph obtained by contracting the edge e in G, which
means: Identify the two nodes i1 and i2 , i.e., replace them by a new
node, called i0 , and replace any edge {i1 , j} ∈ E by {i0 , j} and any
edge {i2 , j} ∈ E by {i0 , j}.
Show: If G is d-realizable, then G/e is d-realizable.
(d) Show that the circuit Cn is 2-realizable, but not 1-realizable.
(e) Show that G is 1-realizable if and only if G is a forest (i.e., a disjoint
union of trees).
(f) Show that K2,2,2 is 4-realizable.
Recall that a minor of G is any graph that can be obtained from G
by deleting and contracting edges and by deleting nodes. So the above
shows that if G is d-realizable then any minor of G is d-realizable.
Belk and Connelly [2007] show that K2,2,2 is not 3-realizable, and
that a graph G is 3-realizable if and only if G has no K5 and K2,2,2
minor. (The ‘if part’ requires quite some work.)
8.2 Let A, B, C ∈ S n and let
Q = {q(x) = (xT Ax, xT Bx, xT Cx) : x ∈ Rn } ⊆ R3
denote the image of Rn under the quadratic map q. Assume that n ≥ 3
and that there exist α, β, γ ∈ R such that αA + βB + γC 0.
Show that the set Q is convex.
9.1 Motivation
We start the chapter by giving the definition of its central contender:
One important example of a finite metric space is the shortest path met-
ric of a finite connected graph G = (V, E). There we measure the distance
d(x, y) between two vertices x and y by the length of a shortest path con-
necting x and y. Here the length of a path equals the number of its edges.
In many applications one has to deal with data sets which come equipped
with a natural metric, for example with a similarity measure. However, a
priori these metric spaces do not have a lot of structure. In contrast, Eu-
clidean spaces are metric spaces which have a lot of additional structure.
They are linear spaces with a vector space structure. They are normed spaces, and the norm defines the Euclidean metric. They are complete, as every Cauchy sequence has a limit. The parallelogram law (A.2), kv + wk² + kv − wk² = 2kvk² + 2kwk², holds. So one natural idea is to embed
the data set and its metric into a Euclidean space. Then one can use existing
geometric algorithms like clustering to work with the data set. Low dimen-
sional Euclidean embeddings are especially useful if one wants to visualize
the data set.
Then we can consider the inner product matrix Z = (f (xi )T f (xj ))1≤i,j≤n ,
which is positive semidefinite. Also note that
Fij = ei eiᵀ − ei ejᵀ − ej eiᵀ + ej ejᵀ for 1 ≤ i < j ≤ n.
The condition Y e = 0 says that the all-ones vector e lies in the kernel of Y .
We would like to point out a technical difficulty before giving the proof
of the theorem. The dual of (9.1) reads
sup Σ_{i,j} αij d(xi , xj )²
As always one can prove weak duality without any problem: For feasible
solutions of (9.1) and (9.3) we verify
τ − Σ_{i,j} αij d(xi , xj )² ≥ τ Σ_{i,j} βij d(xi , xj )² − Σ_{i,j} αij hFij , Zi
≥ Σ_{i,j} βij hFij , Zi − Σ_{i,j} αij hFij , Zi          (9.4)
= h − Σ_{i,j} αij Fij + Σ_{i,j} βij Fij , Z i ≥ 0.
Pτ = {Z ∈ S n : h−Fij , Zi ≤ −d(xi , xj )2 ,
hFij , Zi ≤ τ d(xi , xj )2 for 1 ≤ i < j ≤ n}.
This polyhedron is not empty because for every pair of distinct indices i, j we can choose either Zij = −(1/2) d(xi , xj )² or Zij = −(1/2) τ d(xi , xj )², while the diagonal elements Zii are all zero.
Suppose the intersection of the interior of Pτ and the interior of S+ n is
empty which implies c2 (X, d)2 ≥ τ . Now we shall construct a feasible so-
lution of the dual with objective at least τ . For this we use a separating
hyperplane, separating Pτ from S+n . By Theorem A.4.5 there is a nonzero matrix A ∈ S n so that
sup{hA, Zi : Z ∈ Pτ } ≤ inf{hA, Zi : Z ∈ S+n }.
It follows that hA, Zi ≤ 0 for all Z ∈ Pτ which means that the inequality
hA, Zi ≤ 0 is implied by the inequalities defining the polyhedron Pτ . We
We already saw that Pτ 6= ∅ and also the minimization problem has a feasible
solution. For suppose not, then by Farkas’ lemma, Theorem 2.6.1, there is a
matrix Z′ with
hFij , Z′ i ≤ 0, −hFij , Z′ i ≤ 0 and hA, Z′ i > 0.
So we could add a positive multiple of Z′ to any feasible solution of the maximization problem, thereby increasing the objective value by the same positive multiple of hA, Z′ i, which finally contradicts that the maximum is
at most 0. Thus the feasible sets of the maximization and the minimization
problems are both not empty and we can indeed claim by linear program-
ming duality that both problems have a common optimal value δ. Then
δ ≤ 0 and the inequality hA, Zi ≤ δ is a nonnegative linear combination of
the inequalities defining Pτ .
Consider this representation
A = Σ_{i,j} βij Fij − Σ_{i,j} αij Fij .   (9.5)
Now we may choose αij , βij to have disjoint support because for an optimum
solution Z ∗ of the maximization problem it can only happen that
hFij , Z ∗ i = τ d(xi , xj )2 or − hFij , Z ∗ i = −d(xi , xj )2 ,
but never both; unless of course τ = 1, which is not interesting. So we can eliminate in the maximization problem all the inequalities which are not sharp.
Define
Zij = −(1/2) τ d(xi , xj )²  if βij > 0,
Zij = −(1/2) d(xi , xj )²   if αij < 0,
Zij = 0                     otherwise.
In the next sections we will use this theorem to find lower bounds for
least distortion Euclidean embeddings of several graphs. For this one has
to construct a matrix Y which sometimes appears to come out of the blue.
By complementary slackness, which is the same as analyzing the case of
equality in the proof of weak duality (9.4), we get hints where to search for
an appropriate matrix Y .
Corollary 9.2.2 If Y is an optimal solution of the maximization prob-
lem (9.2), then Yij > 0 only for the most contracted pairs. These are pairs
(xi , xj ) for which d(xi , xj )/kf (xi ) − f (xj )k is maximized. Similarly, Yij < 0 only for the most expanded pairs, maximizing kf (xi ) − f (xj )k/d(xi , xj ).
For graphs most expanded pairs are simply adjacent vertices:
Lemma 9.2.3 Let (X, d) be a finite metric space which is defined by the
shortest path metric of a graph. Let f : X → Rn be a Euclidean embedding
of X. Then the most expanded pairs are always adjacent vertices.
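Before turning to concrete graphs, here is a minimal computational sketch of the least distortion problem (assuming Python with numpy and cvxpy and the SCS solver, which are not part of the text): minimize τ over Gram matrices Z ⪰ 0 subject to d(xi , xj )² ≤ hFij , Zi ≤ τ d(xi , xj )², illustrated on the shortest path metric of the 4-cycle.

import itertools
import numpy as np
import cvxpy as cp

# shortest path distances of the 4-cycle on the vertices 0, 1, 2, 3
D = np.array([[0, 1, 2, 1],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [1, 2, 1, 0]], dtype=float)
n = D.shape[0]

Z = cp.Variable((n, n), PSD=True)     # Gram matrix of the embedding
tau = cp.Variable()
constraints = []
for i, j in itertools.combinations(range(n), 2):
    # <F_ij, Z> = Z_ii + Z_jj - 2 Z_ij = ||f(x_i) - f(x_j)||^2
    dist2 = Z[i, i] + Z[j, j] - 2 * Z[i, j]
    constraints += [dist2 >= D[i, j] ** 2, dist2 <= tau * D[i, j] ** 2]

prob = cp.Problem(cp.Minimize(tau), constraints)
prob.solve(solver=cp.SCS)
print(np.sqrt(prob.value))            # least distortion of the 4-cycle, approximately sqrt(2)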
[Figure: the hypercube graph Qr for r = 4, with vertices labeled by their binary strings 0000, . . . , 1111.]
so that it will turn out that in a least distortion embedding of Qr the most
expanded pairs are coming from all adjacent pairs of vertices and the most
contracted pairs are coming from all pairs of vertices which are diametrically
opposite.
The matrix Y satisfies the properties of Theorem 9.2.1. We clearly have
Y e = 0 because every row of Y sums to
r(−1) + (r − 1) + 1 = 0.
where e` is the `-th standard basis vector of Rr . We see that the second
factor does not depend on x, so χu is indeed an eigenvector of Y . To see
that the eigenvalues are all nonnegative we consider the possible values of
the first summand (−1) Σ_{`=1}^r (−1)^{uᵀ e` } , which are −r, −r + 2, . . . The second
so,
c2 (Qr )² ≥ (2^r r²)/(2^r r) = r.
Figure 9.3 Examples of strongly regular graphs include the square graph
with parameters (4, 2, 0, 2), the pentagon graph with parameters (5, 2, 0, 1),
or the Petersen graph with parameters (10, 3, 0, 1).
Lemma 9.4.4 Let G be a strongly regular graph with parameters (v, k, λ, µ).
Then the following statements are equivalent:
(i) G is not connected.
(ii) µ = 0.
(iii) λ = k − 1.
(iv) G is isomorphic to m > 1 copies of the complete graph Kk+1 on k + 1
vertices.
Proof Exercise 9.6
If (X, d) is a finite metric space defined by the shortest path metric of a
connected strongly regular graph, then distinct pairs of points are either at
distance 1 or at distance 2. It will turn out that there exists a least distortion
embedding for which all pairs of points at distance 1 are most expanded pairs
and all pairs of points at distance 2 are most contracted pairs.
Definition 9.4.5 Let (X, d) be a finite metric space. We say that an em-
bedding f : X → Rn of X into Euclidean space is faithful if for every two
pairs of points (x, y) and (x0 , y 0 ) ∈ X × X we have
d(x, y) = d(x0 , y 0 ) =⇒ kf (x) − f (y)k = kf (x0 ) − f (y 0 )k.
Lemma 9.4.6 For every connected strongly regular graph G = (V, E) there
exists a faithful embedding into Euclidean space with minimal distortion.
Proof Let Z ∈ S+ V be the Gram matrix of an embedding f : V → Rn .
Clearly,
only depend on the distance d(x, y) it follows that f¯ is faithful. We set for
i = 0, 1, 2
Mi = {(x, y) ∈ X × X : d(x, y) = i},
and get
Z̄xy = (1/|Md(x,y) |) Σ_{(x′,y′)∈Md(x,y)} Zx′y′ .
Since
d(x, y)² = (1/|Md(x,y) |) Σ_{(x′,y′)∈Md(x,y)} d(x′ , y′ )²
we see that the distortion of f¯ is at most √τ :
d(x, y)² ≤ (1/|Md(x,y) |) Σ_{(x′,y′)∈Md(x,y)} hFx′y′ , Zi ≤ τ d(x, y)².
Proof By the previous lemma there exists a least distortion Euclidean em-
bedding of G which is faithful. For this embedding the most expanded pairs
are all pairs of adjacent vertices and the most contracted pairs are all pairs
of nonadjacent vertices. So we may assume that for the maximization prob-
lem (9.2) there is a matrix attaining the maximum which is of the form
where we used the i-th adjacency matrix Ai introduced in the proof of the
previous lemma. We have
Yα e = (k − α(v − k − 1))A0 e − A1 e + αA2 e
= ((k − α(v − k − 1)) − k + α(v − k − 1))e = 0.
The numerator of the objective function equals
Σ_{x,y:Yxy>0} 2² [Yα ]xy = 4αv(v − k − 1),
together
c2 (G)² = max{ 4α(v − k − 1)/k : α ∈ R, Yα is positive semidefinite },
so the only thing left is to maximize α so that the matrix Yα is positive
semidefinite. The eigenvectors of A and of Yα coincide by the construction
of Yα . So we only need to ensure that the corresponding eigenvalues given
by
Yα vi = ((k − α(v − k − 1)) − r − α(r + 1))vi , i = 1, . . . , f,
and similarly by
Since k−s > 0 the second condition gives a constraint for α only if v−k+s >
0. But then this constraint is dominated by
α ≤ (k − r)/(v − k + r)
because
(k − r)/(v − k + r) ≤ (k − s)/(v − k + s) ⇐= (k − r)/(k − s) < 1 < (v − k + r)/(v − k + s),
as r > s, so the maximum possible α equals (k − r)/(v − k + r), which finishes the proof.
Example 9.4.8 (Example of Figure 9.4 further continued) The least dis-
tortion of the square graph equals
2 √( (4 − 2 − 1)(2 − 0) / ((4 − 2 + 0) · 2) ) = √2,
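The formula just derived can be evaluated mechanically from the parameters (v, k, λ, µ). The sketch below (Python with numpy assumed; not part of the text) first computes the larger nontrivial eigenvalue r of the adjacency matrix from the parameters and then the value c2 (G) = 2 √( (v − k − 1)(k − r)/((v − k + r)k) ) for the three graphs of Figure 9.3.

import numpy as np

def least_distortion_srg(v, k, lam, mu):
    # nontrivial adjacency eigenvalues of a strongly regular graph with parameters (v, k, lam, mu)
    disc = np.sqrt((lam - mu) ** 2 + 4 * (k - mu))
    r = ((lam - mu) + disc) / 2                     # the larger one
    # the formula for the least distortion derived above
    return 2 * np.sqrt((v - k - 1) * (k - r) / ((v - k + r) * k))

for params in [(4, 2, 0, 2), (5, 2, 0, 1), (10, 3, 0, 1)]:   # square, pentagon, Petersen
    print(params, least_distortion_srg(*params))
# the square and the Petersen graph both give sqrt(2) by this formula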
Such families of graphs are called expander graphs. So, Bourgain's theorem is tight for expander graphs.
In the proof of the theorem we will apply Dirac’s theorem on Hamiltonian
cycles from graph theory:
is Hamiltonian. For suppose not. Then there are vertices which are not
contained in C. Then, since G is connected, there is a vertex xj in C and
another vertex y not contained in C such that {xj , y} ∈ E. Define a path
Q starting at y, going to xj , around the cycle C, and stopping before xj .
But Q would be one edge longer than P in contradiction to the maximality
of P .
hence,
c2 (G)² ≥ r²/(2k).
Exercises
9.1 Let (X, d) be a finite metric space. Show that c2 (X) ≤ D with D =
max{d(x, y) : x, y ∈ X} being the diameter of X.
9.2 Let G = (V, E) be a k-regular graph with n vertices. Let
k = λ1 ≥ λ2 ≥ . . . ≥ λn
be the eigenvalues of the adjacency matrix of G. Show that:
(a) λi ∈ [−k, k] for all i = 1, . . . , n.
S n−1 = {x ∈ Rn : x · x = 1}
and the objects we want to pack are spherical caps of angle γ. The spherical
cap with angle γ ∈ [0, π] and center x ∈ S n−1 is given by
w(γ) = (ωn−1 (S n−2 )/ωn (S n−1 )) ∫_{cos γ}^{1} (1 − u²)^{(n−3)/2} du,
where ωn (S n−1 ) = 2π^{n/2} /Γ(n/2) is the surface area of the unit sphere.
Two spherical caps C(x1 , γ) and C(x2 , γ) intersect in their topological in-
terior if and only if the inner product of x1 and x2 lies in the half-open
interval (cos(2γ), 1]. Conversely we have
In the 1970s advanced methods to determine upper bounds for the kiss-
ing number based on linear programming were introduced. Using these new
techniques, the kissing number problem in dimensions 8 and 24 was solved by Odlyzko, Sloane, and Levenshtein. For four dimensions, however, the optimization bound is 25, while the exact kissing number is 24. In a celebrated work Oleg Musin proved this in 2003, see Pfender, Ziegler [2004].
The goal of this lecture is to provide a proof of τ8 = 240.
V = S n−1 = {x ∈ Rn : x · x = 1},
x ∼ y ⇐⇒ x · y ∈ (cos(2γ), 1).
Then,
K : S n−1 × S n−1 → R
We use this cone C(S n−1 × S n−1 )0 to define the theta prime number of
The automorphism group of the graph G(n, 2γ) is the orthogonal group
O(n) because for all A ∈ O(n) we have
Ax · Ay = x · y.
Furthermore the graph G(n, 2γ) is vertex transitive because for every two
points x and y on the unit sphere there is an orthogonal matrix mapping x
to y. Even stronger it is two-point homogeneous, meaning that if x, y, x0 , y 0 ∈
S n−1 are so that
x · y = x0 · y 0 ,
is a feasible solution for ϑ0 with the same objective value. So we can sym-
metrize any feasible solution K of ϑ0
K′ (x, y) = ∫_{A∈O(n)} K^A (x, y) dµ(A),
where
C(S n−1 × S n−1 )0^{O(n)} = {K ∈ C(S n−1 × S n−1 ) : K^A (x, y) = K(Ax, Ay) = K(x, y) for all A ∈ O(n)}.
So we get
where
Ekn (x, y) = Pkn (x · y),
K(x, y) = Σ_{k=0}^∞ fk Ekn (x, y)
holds. Here the right hand side converges absolutely and uniformly over
S n−1 × S n−1 .
For n = 2, Pk2 are the Chebyshev polynomials (of the first kind). For
larger n the polynomials belong to the family of Jacobi polynomials. The
Jacobi polynomials with parameters (α, β) are orthogonal polynomials for
the measure (1 − t)α (1 + t)β dt on the interval [−1, 1]. They form a complete
orthogonal system of the space L2 ([−1, 1], (1 − t)α (1 + t)β dt). This space
consists of all real-valued functions f : [−1, 1] → R for which the integral
∫_{−1}^{1} f²(t)(1 − t)^α (1 + t)^β dt
exists and is finite. We denote by Pk^{(α,β)} the normalized Jacobi polynomial of degree k with normalization Pk^{(α,β)} (1) = 1. The first few normalized Jacobi polynomials (here for the parameters α = β = 1/2) are
1,  x,  4/3 x² − 1/3,  2x³ − x,  16/5 x⁴ − 12/5 x² + 1/5.
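As a quick numerical check (Python with numpy and scipy assumed; not part of the text), the normalized Jacobi polynomials can be obtained from scipy's Jacobi polynomials by rescaling them to take the value 1 at the point 1; for α = β = 1/2 this reproduces the list above.

import numpy as np
from scipy.special import jacobi

def normalized_jacobi(k, alpha, beta):
    P = jacobi(k, alpha, beta)          # a numpy poly1d object
    return np.poly1d(P.coeffs / P(1))   # rescale so that the value at 1 equals 1

for k in range(5):
    print(k, normalized_jacobi(k, 0.5, 0.5).coeffs)
# k = 2 gives [ 1.333..., 0, -0.333...],  k = 4 gives [ 3.2, 0, -2.4, 0, 0.2 ]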
between K and L is
hK, Li = ∫_{S n−1} ∫_{S n−1} K(x, y)L(x, y) dωn (x) dωn (y).
Lemma 10.4.1 We have the orthogonality relation Ekn ⊥ Eln whenever
k 6= l.
Proof Since Ekn (x, y) = Pkn (x·y) and the integrals are invariant under O(n),
we can take x = N , where N is the North Pole, and therefore,
hEkn , Eln i = ωn (S n−1 ) ∫_{S n−1} Pkn (N · y)Pln (N · y) dωn (y)
= ωn (S n−1 )ωn−1 (S n−2 ) ∫_{−1}^{1} Pkn (t)Pln (t)(1 − t²)^{(n−3)/2} dt
= 0,
if k ≠ l.
because we have
(Avk,x , f ) = (vk,x , A−1 f )
= f (Ax)
= (vk,Ax , f ) (by definition of vk,· )
= (vk,x , f ) (Ax = x)
and by uniqueness of vk,x , it follows that Avk,x = vk,x . Thus (x, y) 7→ vk,x (y)
is purely a function of x · y.
Also, for k 6= l, vk,x ⊥ vl,x since Vk ⊥ Vl , thus they have the right orthog-
onality relations. Hence Ekn (x, y) and vk,x (y) are multiples of each other.
Since we have
Ekn (x, x) = 1 and vk,x (x) = (vk,x , vk,x ) > 0,
the claim follows.
Now we are ready to show that Ekn is positive semidefinite. Observe that
Ekn (x, y)
= αk vk,x (y) and that vk,x (y) = (vk,y , vk,x ). Thus we have,
∫_{S n−1} ∫_{S n−1} Ekn (x, y)f (x)f (y) dωn (x) dωn (y)
= αk ∫_{S n−1} ∫_{S n−1} (vk,y , vk,x )f (x)f (y) dωn (x) dωn (y)
= αk ( ∫_{S n−1} vk,x f (x) dωn (x) , ∫_{S n−1} vk,y f (y) dωn (y) )
≥ 0
as both the integrals in the last inner product are identical. It follows that
Ekn is positive semidefinite.
since this is the inner product of two positive semidefinite kernels. Now by
orthogonality of Ekn ’s, we have
0 ≤ h Σ_k fk Ekn , Eln i = fl hEln , Eln i,  where hEln , Eln i > 0,
in the sense of L2 convergence, it follows that for each m the kernel Km (x, y) =
hm (x · y) is positive semidefinite.
This implies in particular that hm (1) ≥ 0 for all m. But then we have
h(1) − Σ_{k=0}^m fk = h(1) − Σ_{k=0}^m fk Pkn (1) = hm (1) ≥ 0
and we conclude that the series of nonnegative terms Σ_{k=0}^∞ fk converges to a number less than or equal to h(1), as we wanted.
context of error correcting codes and therefore they also carry the name
“Delsarte’s LP method”.
Note that the infinitely many inequalities can be replaced by a finite
dimensional semidefinite condition using sums of squares (see Chapter 2.7):
−1 − Σ_{k=1}^d fk Pkn (t) = p(t) − (t + 1)(t − cos(2γ))q(t)
B = (1/√8, 1/√8, 1/√8, 1/√8, 1/√8, 1/√8, 1/√8, 1/√8)ᵀ
There are \binom{8}{2} · 2² = 112 points generated by A and 2⁷ = 128 points generated
by B. All possible inner products for points from this set are
{ −1, −1/2, 0, 1/2, 1 }.
In particular, note that there is no inner product strictly between 1/2 and 1. Thus,
this is a valid kissing configuration. In fact, this configuration of points on
the unit sphere is coming from the root system E8 which has connections to
many areas in mathematics and physics.
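This configuration is easy to generate and check by computer. The following sketch (Python with numpy and itertools assumed; not part of the text) builds the 112 points of type A and the 128 points of type B, where the latter carry an even number of minus signs, and verifies that all pairwise inner products lie in the set above.

import itertools
import numpy as np

points = []
# type A: (±1, ±1, 0, ..., 0)/sqrt(2), giving binom(8,2) * 4 = 112 points
for i, j in itertools.combinations(range(8), 2):
    for si, sj in itertools.product([1, -1], repeat=2):
        v = np.zeros(8)
        v[i], v[j] = si, sj
        points.append(v / np.sqrt(2))
# type B: (±1/sqrt(8), ..., ±1/sqrt(8)) with an even number of minus signs, 2^7 = 128 points
for signs in itertools.product([1, -1], repeat=8):
    if np.prod(signs) == 1:
        points.append(np.array(signs, dtype=float) / np.sqrt(8))

P = np.array(points)
print(len(P))                                   # 240
print(np.unique(np.round(P @ P.T, 8)))          # [-1.  -0.5  0.   0.5  1. ]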
Now, taking hints from the formulation for ϑ0 (G(8, π/3)), we explicitly
construct a kernel K(x, y). Recall, K(x, y) = −1 if {x, y} 6∈ E. Also, recall
that K(x, y) was a function of the inner product x·y = t only. Now, consider
the following polynomial
F (t) = −1 + β(t + 1)(t + 1/2)² t² (t − 1/2).
Note that F (−1) = F (−1/2) = F (0) = F (1/2) = −1 by construction. Also,
x1 = sin θ cos ϕ
x2 = sin θ sin ϕ
x3 = cos θ.
x · x0 = (sin θ cos ϕ)(sin θ0 cos ϕ0 ) + (sin θ sin ϕ)(sin θ0 sin ϕ0 ) + cos θ cos θ0
= sin θ sin θ0 (cos ϕ cos ϕ0 + sin ϕ sin ϕ0 ) + cos θ cos θ0
= sin θ sin θ0 (cos |ϕ − ϕ0 |) + cos θ cos θ0 ,
The (i, j)-th entry (here it is convenient to start the indexing with 0, thus
i = 0, . . . , d, and j = 0, . . . , d) of this matrix equals
[ Zkd (x, x′ ) ]ij = sin^k θ sin^k θ′ ( cos^i θ cos(kϕ) cos^j θ′ cos(kϕ′ ) + cos^i θ sin(kϕ) cos^j θ′ sin(kϕ′ ) )
Lemma 10.7.2 For all natural numbers N and for all points x1 , . . . , xN ∈
S 2 the symmetric (d + 1) × (d + 1)-matrix
Σ_{i=1}^N Σ_{j=1}^N Ykd (xi , xj )
is positive semidefinite.
Proof Let y = (y0 , y1 , . . . , yd ) ∈ R^{d+1} be a vector. Then,
yᵀ ( Σ_{i=1}^N Σ_{j=1}^N Ykd (xi , xj ) ) y = Σ_{i=1}^N Σ_{j=1}^N yᵀ Ekd (xi )(Ekd (xj ))ᵀ y
= ( Σ_{i=1}^N (Ekd (xi ))ᵀ y )ᵀ ( Σ_{i=1}^N (Ekd (xi ))ᵀ y )
≥ 0.
Now we simplify the entries of the matrices Zkd (x, x0 ) and Ykd (x, x0 ) by
introducing the following “inner product coordinates”:
u = x · (0, 0, 1)ᵀ = cos θ,
v = x′ · (0, 0, 1)ᵀ = cos θ′ ,
t = x · x′ = √((1 − u²)(1 − v²)) cos |ϕ − ϕ′ | + uv.
T0 (x) = 1
T1 (x) = x
T2 (x) = 2x2 − 1
T3 (x) = 4x3 − 3x
T4 (x) = 8x4 − 8x2 + 1,
such that
Then for every spherical (N, s)-code we have the upper bound
N ≤ B/f0 + 1.
Proof Denote with N = (0, 0, 1)T the north pole of the sphere. Let C be a
spherical (N, s)-code. We may assume that the north pole lies in C after we
rotated C appropriately. Consider the double sum
S = Σ_{x∈C\{N}} Σ_{x′∈C\{N}} F (x · N, x′ · N, x · x′ ).
≥ f0 (N − 1)2 .
The second inequality follows because of the second assumption and because
[Y0d (u, v, t)]00 = 1.
S = Σ_{x∈C\{N}} F (x · N, x · N, 1) + Σ_{(x,x′)∈(C\{N})², x≠x′} F (x · N, x′ · N, x · x′ )
≤ B(N − 1) + 0,
where the inequality follows from the third and fourth assumption.
Together,
F (u, v, t) = 145.5146532 u⁸v⁸ + 988.2590039 u⁸v⁷ + 363.0801883 u⁸v⁶ − 672.8504875 u⁸v⁵ − 225.3291719 u⁸v⁴ + · · · − 1.064050680 t − 1.168270210,
a polynomial of degree 8 in u and in v and of degree 7 in t, with several hundred further terms whose floating-point coefficients were computed numerically.
Now the question remains: How did we come up with this polynomial?
Answer: We found it by solving a semidefinite program on the computer.
Without loss of generality we can set f0 = 1. Then, we have the following
semidefinite program with infinitely many constraints:
minimize B
subject to B ≥ 0, F1 , . . . , Fm ⪰ 0,
F0 − E00 ⪰ 0,
Σ_{k=0}^m hFk , Ykd (u, v, t)i ≤ 0 for all (u, v, t) as in 10.7.3.3,
Σ_{k=0}^m hFk , Ykd (u, u, 1)i ≤ B for all u as in 10.7.3.4.
G0 , . . . , Gm ⪰ 0,
Σ_{k=0}^m hGk , Ykd (u, v, t)i ≤ −1,
Σ_{k=0}^m hGk , Ykd (u, u, 1)i ≤ B − 1.
We can model the infinitely many constraints using sums of squares. For
this we define the polynomials
{(u, v, t) : g1 (u, v, t) ≥ 0}
equals the domain described in 10.7.3.4. Hence, any feasible solution of the
following finite-dimensional semidefinite program provides a polynomial F
which satisfies the assumptions of Theorem 10.7.3:
minimize B
subject to B ≥ 0, G0 , . . . , Gm ⪰ 0,
q1 , . . . , q5 , p1 , p2 sums of squares of polynomials,
−1 − Σ_{k=0}^m hGk , Ykd (u, v, t)i = q1 g1 + · · · + q4 g4 + q5 ,
−1 + B − Σ_{k=0}^m hGk , Ykd (u, u, 1)i = p1 g1 + p2 .
Exercises
10.1(a) Determine fk in (10.3), completing the proof of τ8 = 240.
(b) Compute ϑ0 (G(2, π/3)).
(c) Determine α(G(n, π/4)).
10.2 Consider 12 points x1 , . . . , x12 on the sphere S 2 . What is the largest
possible minimal angle between distinct points xi and xj with i 6= j?
10.3 Write a computer program for finding ϑ0 (G(n, π/3)) and produce a
table for n = 2, . . . , 24.
10.4 Determine α(G(24, π/3)).
PART FOUR
ALGEBRA
11
Sums of Squares of Polynomials
x1^{α1} · · · xn^{αn} with α ∈ Nn , and only finitely many pα are nonzero. If p is not the zero polynomial, the maximum value of |α| = Σ_{i=1}^n αi for which pα ≠ 0 is the degree of p, denoted deg(p) (one may set deg(0) = −∞). The polynomial p is homogeneous with degree d if it is of the form p = Σ_{α∈Nn :|α|=d} pα x^α .
by
psos = sup{λ : λ ∈ R, p − λ ∈ Σ}
and thus, as a direct application of Lemma 11.1.1, one can compute psos
Coste and Roy [1998, Prop. 6.4.4] or by Blekherman, Parrilo and Thomas
[2012, Prop???]1 . For the ‘only if’ part, what Hilbert’s theorem claims is
that, for every pair (n, d) 6= (2, 4) with n ≥ 2 and even d ≥ 4, there is an
n-variate polynomial of degree d which is nonnegative over Rn but not sos.
One can check that it suffices to give such a polynomial for the two pairs
(n, d) = (2, 6), (3, 4). (See Exercise 11.5.) Concrete polynomials for the cases
(n, d) = (2, 6) and (3, 4) are given in the next example.
Example 11.1.6 Hilbert’s proof for the ‘only if ’ part of Theorem 11.1.5
was not constructive. The first concrete example of a nonnegative polynomial
that is not sos is the following polynomial, for the case (n, d) = (2, 6):
Proof To see that p is nonnegative on R², one can use the arithmetic-geometric mean inequality (a + b + c)/3 ≥ (abc)^{1/3} , applied to a = x⁴y², b = x²y⁴ and c = 1.
To show that p is not sos, one may use brute force. Assume p = Σ_l ql² for
In fact, the same argument shows that p − λ is not sos for any scalar
λ ∈ R. Therefore, for the infimum of the Motzkin polynomial p over R2 , the
sos bound psos carries no information: psos = −∞, while pmin = 0 is attained
at (±1, ±1).
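The brute force can also be delegated to a computer: by Lemma 11.1.1, "p is sos" is the feasibility of a semidefinite program in a Gram matrix Q indexed by the monomials of degree at most 3, and a solver then reports this program as infeasible for the Motzkin polynomial p = x⁴y² + x²y⁴ − 3x²y² + 1. The sketch below assumes Python with cvxpy and the SCS solver (not part of the text).

import itertools
import numpy as np
import cvxpy as cp

# monomial exponents (i, j) with i + j <= 3, the candidate monomials of the squares
basis = [(i, j) for i in range(4) for j in range(4) if i + j <= 3]
m = len(basis)
motzkin = {(4, 2): 1.0, (2, 4): 1.0, (2, 2): -3.0, (0, 0): 1.0}   # coefficients of p

Q = cp.Variable((m, m), symmetric=True)
constraints = [Q >> 0]
# match every coefficient of z^T Q z (z the monomial vector) with the corresponding one of p
exponents = {(basis[k][0] + basis[l][0], basis[k][1] + basis[l][1])
             for k in range(m) for l in range(m)}
for e in exponents:
    expr = sum(Q[k, l] for k in range(m) for l in range(m)
               if (basis[k][0] + basis[l][0], basis[k][1] + basis[l][1]) == e)
    constraints.append(expr == motzkin.get(e, 0.0))

prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve(solver=cp.SCS)
print(prob.status)   # expected: 'infeasible', so the Motzkin polynomial is not a sum of squares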
For the case (n, d) = (3, 4), the Choi-Lam polynomial:
q(x, y, z) = 1 + x2 y 2 + y 2 z 2 + x2 z 2 − 4xyz
Q(x, y, z, t) = t4 + x2 y 2 + y 2 z 2 + x2 z 2 − 4xyzt.
Figure 11.1 A polynomial which is nonnegative but not sos: the Motzkin
polynomial (11.12)
On the other hand, if we fix the degree and let the number of variables
grow, then Blekherman [2006] showed that (roughly speaking) there are sig-
nificantly more nonnegative polynomials than sums of squares. To state his
result, let Pn,2d (resp., Σn,2d ) denote the set of homogeneous n-variate poly-
nomials of degree 2d that are nonnegative over Rn (resp., sums of squares).
Then the cone Pn,2d contains the cone Σn,2d and what Blekherman shows is
that Pn,2d is much larger than Σn,2d when n grows. To formulate the result
in a precise way we intersect these two cones by the hyperplane
H = { p : ∫_{S^{n−1}} p(x) dµ(x) = 1 }
c · n^{(d−1)/2} ≤ ( vol(Pn,2d ∩ H) / vol(Σn,2d ∩ H) )^{1/D} ≤ C · n^{(d−1)/2} .
The following result by Reznick [1995] shows that, under some strict pos-
itivity assumption, a sum of squares decomposition exists involving denom-
inators of a very special form.
where g1 , . . . , gm ∈ R[x].
It is convenient to set g0 = 1. When all the polynomials p, gj have degree
one, Farkas’ lemma (see Theorem 2.6.1) states:
p ≥ 0 on K ⇐⇒ p = Σ_{j=0}^m λj gj for some scalars λj ≥ 0.   (11.15)
Hence this gives a full characterization of the linear polynomials that are
nonnegative over the polyhedron K.
Such a characterization does not extend to the case when p is a nonlinear
polynomial or when the description of K involves some nonlinear polynomi-
als gj . The situation is then much more delicate and results depend on the
assumptions made on the description of the set K.
Of course, the following implication holds trivially:
p = Σ_{j=0}^m sj gj for some polynomials s0 , . . . , sm ∈ Σ =⇒ p ≥ 0 on K.
This result is due to Putinar [1993], we will discuss it in Section 11.2.5 below.
Note the analogy between (11.15) and (11.17): while the variables in
(11.15) are nonnegative scalars λj , the variables in (11.17) are sos poly-
nomials sj .
A result of the form (11.17) provides a positivity certificate for the poly-
nomial p; it also goes under the name Positivstellensatz. This terminology
has historical reasons, the name originates from the analogy to the classical
Nullstellensatz of Hilbert for the existence of complex roots:
In particular, we have:
VC (g1 , . . . , gm ) = ∅ ⇐⇒ 1 = Σ_{j=1}^m uj gj for some uj ∈ K[x].
on R if and only if p is sos. We now consider the other two cases: the half-
line K = [0, ∞) and the compact interval K = [−1, 1]. In both cases a
full characterization of nonnegative polynomials is known in terms of sos
representations, moreover with explicit degree bounds. See Exercises 11.9
and 11.10 for the proofs of the next two theorems.
Theorem 11.2.2 (Pólya-Szegö) Let p be a univariate polynomial of degree
d. Then, p ≥ 0 on [0, ∞) if and only if p = s0 + s1 x for some s0 , s1 ∈ Σ with
deg(s0 ) ≤ d and deg(s1 ) ≤ d − 1.
Theorem 11.2.3 (Fekete, Markov-Lukácz) Let p be a univariate polyno-
mial of degree d. Assume that p ≥ 0 on [−1, 1]. Then the following holds.
(i) p = s0 +s1 (1−x2 ), where s0 , s1 ∈ Σ, deg(s0 ) ≤ d+1 and deg(s1 ) ≤ d−1.
(ii) p = s1 (1 + x) + s2 (1 − x), where s1 , s2 ∈ Σ, deg(s1 ), deg(s2 ) ≤ d.
Note the two different representations in (i), (ii), depending on the choice
of the polynomials chosen to describe the set K = [−1, 1].
In the next sections we will discuss various positivity certificates for the
multivariate case n ≥ 2.
Note that the above result does not help us yet directly to tackle the poly-
nomial optimization problem (11.8). Indeed, using (i), we can reformulate
pmin as
pmin = sup{λ : (p − λ)f = 1 + h, f, h ∈ T (g), λ ∈ R}.
However, this does not lead directly to a sequence of semidefinite programs
after adding a bound on the degrees of the variable polynomials f, h. This is
because we have the quadratic term λf , where both λ and f are unknown. Of
course, one could fix λ and solve the corresponding semidefinite programs,
and iterate using binary search on λ. However, there is a much more elegant
and efficient remedy: Using the refined representation results of Schmüdgen
and Putinar in the next sections one can set up simpler semidefinite pro-
grams permitting to optimize directly over the variable λ, without binary
search.
Note that relation (11.22) was already mentioned earlier in (11.16). Clearly,
Then M is Archimedean. For this we show that the polynomial n − Σ_{i=1}^n xi²
is an ideal (called the ideal generated by the gj ’s) and the set M(g) from
(11.20) is a quadratic module (called the quadratic module generated by the
gj ’s).
We start with some technical lemmas.
Lemma 11.2.15 If M ⊆ R[x] is a quadratic module, then I = M ∩ (−M )
is an ideal.
Proof This follows from the fact that, for any f ∈ R[x] and g ∈ I, we have:
f g = ((f + 1)/2)² g + ((f − 1)/2)² (−g) ∈ I.
sup A, there exists a scalar ε such that 0 < ε < 1/N and a0 − ε ∈ A. Then, we have f − (a0 − ε) = (f − a0 ) + ε ∈ M and thus
−1 + εs = g + (f − a0 + ε)s ∈ M.
proper, we must have that gj (a) ≥ 0 for each j. This shows that a ∈ K.
Finally,
−p(a) = (p − p(a)) − p ∈ M,
since p − p(a) ∈ I ⊆ M and −p ∈ M0 ⊆ M . Again, as M is proper, this
implies that −p(a) ≥ 0. We reach a contradiction because a ∈ K and p > 0
on K by assumption.
Lemma 11.2.19 Assume p > 0 on K. Then there exist N ∈ N and
h ∈ M(g) such that N − h ∈ Σ and hp − 1 ∈ M(g).
Proof Choose the polynomial s as in Lemma 11.2.18. Thus, s ∈ Σ and
sp − 1 ∈ M(g). As M(g) is Archimedean, we can find k ∈ N such that
2k − s ∈ M(g) and 2k − s2 p − 1 ∈ M(g).
Set h = s(2k −s) and N = k 2 . Then, h ∈ M(g) and N −h = k 2 −s(2k −s) =
(k − s)2 ∈ Σ. Moreover,
hp − 1 = s(2k − s)p − 1 = 2k(sp − 1) + (2k − s2 p − 1) ∈ M(g),
since sp − 1, 2k − s2 p − 1 ∈ M(g).
We can now show Theorem 11.2.11.
Proof (of Theorem 11.2.11)
Assume p > 0 on K. We want to show that p ∈ M(g). Let h and N satisfy
the conclusion of Lemma 11.2.19. We may assume that N > 0. Moreover let
k ∈ N such that k + p ∈ M(g) (such k exists since M(g) is Archimedean).
Then we have:
k − 1/N + p = (1/N ) ( (N − h)(k + p) + (hp − 1) + kh ) ∈ M(g),
where N − h ∈ Σ and k + p, hp − 1, kh ∈ M(g).
Exercises
11.1 Assume f ∈ R[x] is a sum of squares of polynomials, with deg(f ) = 2d.
(a) Show that if f has a decomposition f = Σ_{k=1}^m qk² with qk ∈ R[x], then each polynomial qk has degree at most d.
(b) Show that if f is homogeneous and has a decomposition f = Σ_{k=1}^m qk² with qk ∈ R[x], then each polynomial qk is homogeneous and has degree d.
11.7 Give a "sum of squares" proof for the Cauchy-Schwarz inequality. For this show that the polynomial
f (x, y) = ( Σ_{i=1}^n xi² )( Σ_{i=1}^n yi² ) − ( Σ_{i=1}^n xi yi )² ∈ R[x1 , . . . , xn , y1 , . . . , yn ]
is a sum of squares of polynomials.
11.11 Show the Real Nullstellensatz (Theorem 11.2.6) (you may use Theo-
rem 11.2.5).
11.12 Let G = (V, E) be a graph. The goal is to show the reformulation
(11.6) for the stability number α(G). Define the parameter
µ = min{ xᵀ (AG + I)x : Σ_{i∈V} xi = 1, x ≥ 0 }.   (11.25)
11.13 Let Σ2t = Σ∩R[x]2t denote the cone of sums of squares of polynomials
with degree at most 2t. Show that Σ2t is a closed set.
12
Moment matrices and polynomial equations
which asks for the infimum pmin of a polynomial p over a basic closed semi-
algebraic set K, of the form:
K = {x ∈ Rn : g1 (x) ≥ 0, . . . , gm (x) ≥ 0}, (12.2)
where g1 , . . . , gm ∈ R[x]. In the preceding chapter we defined a lower bound
for pmin obtained by considering sums of squares of polynomials. Here we
consider another approach, which will turn out to be dual to the sum of
squares approach.
Write the polynomial p ∈ R[x] as p = Σ_α pα x^α , where there are only
finitely many nonzero coefficients pα , and let
p = (pα )α∈Nn
denote the vector of coefficients of p, so pα = 0 for all |α| > deg(p). Through-
out we let
[x]∞ = (xα )α∈Nn
denote the vector containing all monomials x^α . Then, one can write:
p(x) = Σ_α pα x^α = pᵀ [x]∞ .
Proof (of Theorem 12.1.8). In view of Remark 12.1.9, we may assume that
K = C.
(i) Assume first that dim A = k < ∞, we show that |VC (I)| < ∞. For
this, pick a variable xi and consider the k + 1 cosets [1], [xi ], . . . , [xki ]. Then
they are linearly dependent in A and thus there exist scalars λh (0 ≤ h ≤ k) (not all zero) for which the (univariate) polynomial f = Σ_{h=0}^k λh xi^h is a nonzero polynomial belonging to I. As f is univariate, it has finitely many
roots. This implies that the i-th coordinates of the points v ∈ VC (I) take
only finitely many values. As this holds for all coordinates we deduce that
VC (I) is finite.
Assume now that |VC (I)| < ∞, we show that dim A < ∞. For this,
assume that the i-th coordinates of the points v ∈ VC (I) take k distinct
values: a1 , . . . , ak ∈ C. Then the polynomial f = (xi − a1 ) · · · (xi − ak )
vanishes at all v ∈ VC (I). Applying the Nullstellensatz, f^m ∈ I for some integer m ∈ N. This implies that the cosets [1], [xi ], . . . , [xi^{mk} ] are linearly dependent. Therefore, there exists an integer ni for which [xi^{ni} ] lies in the linear span of {[xi^h ] : 0 ≤ h ≤ ni − 1}. From this one can easily derive that the set {[x^α ] : 0 ≤ αi ≤ ni − 1, i ∈ [n]} generates the vector space A, thus
showing that dim A < ∞.
(ii) Assume |VC (I)| < ∞. Lemma 12.1.12 (i) shows that |VC (I)| ≤ dim A.
If I is radical then the equality dim A = |VC (I)| follows from Lemma 12.1.12 (iii). Assume now that I is not radical and let f ∈ √I \ I. If pv (v ∈ VC (I))
are interpolation polynomials at the points of VC (I), then one can easily
verify that the system {[pv ] : v ∈ VC (I)} ∪ {[f ]} is linearly independent in
A, so that dim A ≥ |VC (I)| + 1.
h1 (x) = 0, . . . , hm (x) = 0.
whose entries are the evaluations at v of the polynomials in the set B. (Note
that this is well defined since the value bj (v) does not depend on the choice
of representative in the coset [bj ].)
Lemma 12.1.13 The vectors {[v]B : v ∈ VC (I)} are linearly independent.
Proof Assume that Σ_{v∈VC (I)} λv [v]B = 0 for some scalars λv , which means Σ_{v∈VC (I)} λv bj (v) = 0 for all j ∈ [N ]. As the set B is a basis of the space A, this implies that Σ_{v∈VC (I)} λv f (v) = 0 for all f ∈ K[x] (check it). Applying this to the polynomial f = pv , we obtain that λv = 0 for all v ∈ VC (I).
As we now show, the matrix Mh carries useful information about the
elements of VC (I): its eigenvalues are the evaluations h(v) of h at the points
v ∈ VC (I) and the corresponding left eigenvectors are the vectors [v]B .
Theorem 12.1.14 Let h ∈ K[x], let I ⊆ K[x] be an ideal with |VC (I)| < ∞,
and let mh be the linear map from (12.8).
(i) Let B be a basis of A and let Mh be the matrix of mh in this basis B.
Then, for each v ∈ VC (I), the vector [v]B is a left eigenvector of Mh
with eigenvalue h(v), that is,
[v]Bᵀ Mh = h(v)[v]Bᵀ .   (12.9)
[hbj ] = Σ_{i=1}^N aij [bi ], i.e., hbj − Σ_{i=1}^N aij bi ∈ I.
Evaluating the above polynomial at v ∈ VC (I) gives the identity h(v)bj (v) = Σ_{i=1}^N aij bi (v) = ([v]Bᵀ Mh )j for all j ∈ [N ], which is relation (12.9).
(ii) By (i), we already know that each scalar h(v) is an eigenvalue of MhT ,
thus also of Mh and of mh . We now show that the scalars h(v) (v ∈ VC (I))
are the only eigenvalues of mh . For this, let λ 6∈ {h(v) : v ∈ VC (I)}, we
which can be built using the relation [x3 ] = 6[1] − 11[x] + 6[x2 ]. It is easy
to verify that the matrix MxT has three eigenvectors: (1, 1, 1) with eigenvalue
λ = 1, (1, 2, 4) with eigenvalue λ = 2, and (1, 3, 9) with eigenvalue λ = 3.
Thus the eigenvectors are of the form [v]B = (1, v, v 2 ) for v ∈ VC (I) =
{1, 2, 3}.
The polynomials p1 = (x − 2)(x − 3)/2, p2 = −(x − 1)(x − 3) and p3 =
(x − 1)(x − 2)/2 are interpolation polynomials at the roots v = 1, 2, 3. Using
the relation [(x − 1)(x − 2)(x − 3)] = 0, one finds that the matrix of mx with
respect to the base {[p1 ], [p2 ], [p3 ]} is
[xp1 ] [xp2 ] [xp3 ]
[p1 ] 1 0 0
[p2 ] 0 2 0 ,
[p3 ] 0 0 3
thus a diagonal matrix with the values v = 1, 2, 3 as diagonal entries.
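This eigenvalue computation is easy to reproduce (Python with numpy assumed; not part of the text): build the multiplication matrix Mx in the basis {[1], [x], [x²]} from the relation [x³] = 6[1] − 11[x] + 6[x²] and read off the roots as its eigenvalues.

import numpy as np

# columns are the coordinates of [x * 1], [x * x], [x * x^2] in the basis {[1], [x], [x^2]}
Mx = np.array([[0, 0, 6],
               [1, 0, -11],
               [0, 1, 6]], dtype=float)
print(np.sort(np.linalg.eigvals(Mx).real))   # [1. 2. 3.], the points of V_C(I)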
Finally, we indicate how to compute the number of real roots using the
multiplication operators. This is a classical result, going back to work of
Hermite in the univariate case. You will prove it in Exercise 14.4 for radical
ideals. (See, e.g., Laurent [2009] for the general nonradical case.)
Theorem 12.1.16 Let I be an ideal in R[x] with |VC (I)| < ∞. Define the
Hermite quadratic form:
H : R[x]/I × R[x]/I → R,  ([f ], [g]) 7→ Tr(mf g ),   (12.10)
where Tr(mf g ) denotes the trace of the multiplication operator by f g. Let
σ+ (H) (resp., σ− (H)) denote the number of positive eigenvalues (resp., neg-
ative eigenvalues) of H. Then, the rank of H is equal to |VC (I)| and
σ+ (H) − σ− (H) = |VR (I)|.
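For the example I = ((x − 1)(x − 2)(x − 3)) above, the Hermite matrix in the basis {[1], [x], [x²]} has entries Tr(m_{x^i x^j }) = Tr(Mx^{i+j}), so it can be formed directly from the multiplication matrix. The sketch below (Python with numpy assumed; not part of the text) confirms that H has rank 3 and signature 3, so all three roots are real.

import numpy as np

Mx = np.array([[0, 0, 6],
               [1, 0, -11],
               [0, 1, 6]], dtype=float)
powers = [np.linalg.matrix_power(Mx, k) for k in range(3)]
# Hermite matrix: H_ij = Tr(m_{x^i x^j}) = Tr(Mx^(i+j))
H = np.array([[np.trace(powers[i] @ powers[j]) for j in range(3)] for i in range(3)])
eigs = np.linalg.eigvalsh(H)
print(np.linalg.matrix_rank(H), int((eigs > 1e-9).sum() - (eigs < -1e-9).sum()))   # 3 3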
ingredient: the notion of moment matrix, which plays a crucial role in this
problem.
one also says that µ is a representing measure for y. The moment problem is
n
the problem of deciding whether a given sequence y ∈ RN is the sequence
of moments of a measure; a variant is to decide existence of such a measure
supported by a given closed set K.
For any v ∈ Rn , the Dirac measure at v, denoted by δv , has support {v}
and mass 1 at v. Its sequence of moments is the sequence [v]∞ = (v α )α .
A measure µ is called finitely atomic if µ has finite support. Then µ is a conic combination of Dirac measures: µ = Σ_{i=1}^r λi δ_{vi} , for some scalars λ1 , . . . , λr > 0 and supp(µ) = {v1 , . . . , vr } ⊆ Rn . Clearly, µ is a probability measure precisely when Σ_{i=1}^r λi = 1.
Ly : R[x] → R,  f = Σ_α fα x^α 7→ Ly (f ) = Σ_α fα yα .   (12.11)
n
Definition 12.2.4 Given a sequence y ∈ RN and a polynomial g ∈ R[x],
n
define the new sequence gy ∈ RN , with entries
X
(gy)α = Ly (gxα ) = gγ yα+γ for all α ∈ Nn .
|γ|≤deg(g)
has the shape of a Hankel matrix (all entries are the same on each antidi-
agonal). For the polynomial g = 1 − x2 , the sequence gy ∈ RN has entries
(gy)i = yi − yi+2 for i ∈ N and thus its moment matrix has the form
M (gy) =
[ y0 − y2   y1 − y3   y2 − y4   y3 − y5   · · · ]
[ y1 − y3   y2 − y4   y3 − y5   y4 − y6   · · · ]
[ y2 − y4   y3 − y5   y4 − y6   y5 − y7   · · · ]
[ y3 − y5   y4 − y6   y5 − y7   y6 − y8   · · · ]
[   ...       ...       ...       ...     · · · ].
We now show the shape of the moment matrix M (y) in the bivariate case n = 2, where we assume it is indexed by the monomials ordered by the graded
Proof It suffices to show the first identity, since the rest follows then easily.
For f = Σ_α fα x^α , h = Σ_β hβ x^β , we have:
Ly (f gh) = Ly ( Σ_{α,β} fα hβ x^{α+β} g ) = Σ_{α,β} fα hβ Ly (x^{α+β} g) = Σ_{α,β} fα hβ (gy)α+β ,
Next we observe that the kernel of the moment matrix M (y) can be seen as an ideal of R[x], which is real radical when M (y) ⪰ 0. This observation will play a crucial role in the characterization of the set C∞ (K) in the next section.
Lemma 12.2.7 Let y = (yα )α∈Nn be a sequence of real numbers and let
Ly be the corresponding linear functional from (12.11). Define the set
(i) L(p) ≥ 0 for all p ∈ P(K), i.e., such that p(x) ≥ 0 for all x ∈ K.
(ii) There exists a measure µ supported by K such that L(p) = ∫_K p(x) dµ(x) for all p ∈ R[x].
Assume that K is given as in (12.2). Then clearly the quadratic module
M(g) is contained in P(K). A natural question is whether nonnegativity
over M(g) suffices to claim existence of a representing measure. Putinar
[1993] gave an affirmative answer under the Archimedean condition, which
is thus the analogue on the ‘moment side’ of Theorem 11.2.11.
Theorem 12.2.10 (Putinar [1993]) Assume that M(g) is Archimedean.
Then, for any L ∈ R[x]∗ , the following assertions are equivalent:
Proof The equivalence of (ii) and (iii) follows directly from Lemma 12.2.6
(and the fact that Ly (1) = y0 ) and the implication (i) =⇒ (ii) follows from
Lemma 12.2.8.
We now show the implication (ii) =⇒ (i), the technical core of Theo-
rem 12.2.14. So we assume that (ii) holds and set r = rank M (y); we will
construct a finite r-atomic measure supported by K with y as sequence of
moments. Recall Ly is the linear functional from (12.11) and let I be the set
from (12.13). By assumption, we have Ly (1) = y0 = 1. By Lemma 12.2.7,
we know that I is a real radical (thus also radical) ideal in R[x].
First we claim
dim R[x]/I = r.
This follows directly from the fact that a set of columns {C1 , . . . , Cs } of the
moment matrix M (y), indexed by {α1 , . . . , αs } ⊆ Nn , is linearly independent
if and only if the corresponding cosets of monomials {[xα1 ], . . . , [xαs ]} is
linearly independent in R[x]/I. Hence, dim R[x]/I = rank M (y) = r.
As dim R[x]/I = r < ∞, it follows using Theorem 12.1.8(ii) that |VC (I)| =
dim R[x]/I = r. Finally, by Lemma 12.1.7, we obtain VR (I) = VC (I). Hence
VC (I) = {v1 , . . . , vr } ⊆ Rn
282 Moment matrices and polynomial equations
For this, note that the polynomial pvi − pvi ² vanishes at all points in VC (I) and thus pvi − pvi ² ∈ I(VC (I)) = I since I is radical. Therefore, we have Ly (pvi ) = Ly (pvi ²) ≥ 0, where the inequality follows since M (y) ⪰ 0. Also, Ly (pvi ) ≠ 0 since, otherwise, in view of (12.15) the rank of M (y) would be smaller than r. Finally, since the polynomial Σ_{i=1}^r pvi − 1 vanishes at all
As we just showed that Ly (pvi ) > 0 this implies gj (vi ) ≥ 0, as desired, and
the proof is complete.
Based on the discussion in the preceding section, we can define the following
parameter, which also provides a lower bound for pmin :
pmom = inf{pᵀ y : y ∈ R^{N^n} , y0 = 1, M (y) ⪰ 0, M (gj y) ⪰ 0 for j ∈ [m]}
     = inf{L(p) : L ∈ R[x]∗ , L(1) = 1, L ≥ 0 on M(g)}.   (12.17)
As we now see, these two bounds are in weak duality to each other.
Proof The inequality pmom ≤ pmin follows from the fact that, for each
v ∈ K, the evaluation Evv at v is feasible for the second program in (12.17),
with objective value Evv (p) = p(v). An alternative, equivalent way is to
notice that the sequence y = [v]∞ is feasible for the first program in (12.17).
The inequality psos ≤ pmom follows from ‘weak duality’: Let λ be feasible
for (12.16) and let L be feasible for (12.17). That is, p − λ ∈ M(g), L(1) = 1
and L ≥ 0 on M(g). Then, we have L(p) − λ = L(p − λ) ≥ 0, which implies
L(p) ≥ λ and thus pmom ≥ psos .
holds if the program (12.17) has an optimal solution y for which M (y) has
finite rank. We will come back to this in the next section.
Note that the programs (12.16) and (12.17) are infinite dimensional pro-
grams, since no degree bound has been put on the unknown sums of squares
polynomials, and the linear functional L acts on the full polynomial space.
In the next chapter we will consider hierarchies of semidefinite programming
relaxations for problem (12.1) that are obtained by adding degree constraints
to the programs (12.16) and (12.17), and we will use the results of Theo-
rems 12.1.14 and 12.2.14 for giving a procedure to find global optimizers of
problem (12.1).
The terminology of ‘moment matrix’ which we have used for the matrix
M (y) is motivated by the relevance of these matrices to the classical moment
problem. Recall that, given a (positive Borel) measure µ on a subset K ⊆ Rn , the quantity yα = ∫_K x^α dµ(x) is called its moment of order α. The classical K-moment problem asks to characterize the sequences y ∈ R^{N^n} which are
the sequence of moments of some measure µ supported by K.
In the special case when µ is a Dirac measure at a point v ∈ Rn , i.e.,
when µ has mass only at the point v, its sequence of moments is precisely
the sequence [v]∞ = (v α )α∈Nn . More generally, when µ is a finite atomic
measure, which means that µ is supported by finitely many points of K,
then its sequence of moments is of the form y = Σ_{i=1}^r λi [vi ]∞ for finitely many positive scalars λi and points vi ∈ K. In other words, the set C∞ (K)
coincides with the set of sequences of moments of finite atomic measures on
K. Moreover, the closure of the set C∞ (K) is the set of sequences of moments
of an arbitrary measure on K. Hence, Theorem 12.2.14 characterizes which
sequences admit a finite atomic measure on K, when K is a basic closed semi-
algebraic set, in terms of positivity and finite rank conditions on the sequence
y. This result is due to Curto and Fialkow [1996]. (When the condition
rank M (y) < ∞ holds, Curto and Fialkow speak of flat data). The proof of
Curto and Fialkow [1996] uses tools from functional analysis, the algebraic
proof given here is based on Laurent [2005] (see also Laurent [2009]). In
Chapter 14.4 we will see another functional analytic based approach in the
more general setting of noncommutative polynomial optimization.
We refer to the books of Cox, Little and O’Shea [1992, 1998] for further
1 This section needs to be updated
Exercises
12.1 Recall the definitions (12.5) and (12.6) for √I and ᴿ√I.
(a) Show that the radical √I of an ideal I ⊆ C[x] is an ideal.
(b) Show that the real radical ᴿ√I of an ideal I ⊆ R[x] is an ideal.
12.2 Show Lemma 12.1.5.
12.3 (a) Let I and J be two ideals in C[x]. Show that I ∩ J is an ideal and
that VC (I ∩ J) = VC (I) ∪ VC (J).
(b) Given v ∈ Cn , show that the set {v} is a complex variety.
(c) Show that any finite set V ⊆ Cn is a complex variety.
12.4 The goal is to show Theorem 12.1.16 in the radical case.
Let I be a radical ideal in R[x] with N = |VC (I)| = dim R[x]/I < ∞.
Let B = {[b1 ], . . . , [bN ]} be a base of A = R[x]/I and, for any h ∈ R[x],
let Mh denote the matrix of the multiplication by h in the base B. Then,
the matrix of the Hermite quadratic form (12.10) in the base B is the
real symmetric matrix H = (Hij )_{i,j=1}^N with entries Hij = Tr(Mbi bj ).
Finally, σ+ (H), σ− (H) denote, respectively, the numbers of positive
and negative eigenvalues of H.
(a) Show that H = Σ_{v∈VC (I)} [v]B [v]Bᵀ and rank(H) = |VC (I)|.
(b) Let µ be a finite atomic measure and let y be the sequence of mo-
ments of µ. Show that rankM (y) = |supp(µ)|.
and
pmom = inf{L(p) : L ∈ R[x]∗ , L(1) = 1, L ≥ 0 on M(g)}.
Recall that R[x]∗ is the dual space of R[x], consisting of all real valued linear
maps on R[x]. These two parameters satisfy the inequalities:
psos ≤ pmom ≤ pmin
(Lemma 12.2.15), with equality throughout when the quadratic module
M(g) is Archimedean (Theorem 12.2.16). They both can be reformulated
using positive semidefinite matrices. However these are infinite matrices, in-
dexed by Nn , since there is no a priori degree bound on the sums of squares
entering decompositions of polynomials in M(g), and since the linear func-
tionals act on the full polynomial space R[x]. Hence, it is not clear how to
compute the parameters pmom and psos .
There is however a simple remedy: instead of working with the full quadratic
module M(g) ⊆ R[x], we truncate it by adding increasing degree bounds. In
and
We will refer to (13.3) as the sos program and to (13.4) as the moment
program.
It follows from the definitions that
Moreover, as t grows, we get bounds for pmin that are potentially better and
better:
dg = ⌈deg(g)/2⌉,
g = (gα )α∈Nnt
denote the coefficient vector of g, where gα = 0 whenever |α| > deg(g). Then
we have
g(x) = gT [x]t for any t ≥ deg(g).
to M(g); clearly, this implies that K is contained in the ball with radius R
(and thus K is compact).
Of course, if a ball constraint is already explicitly present in the description
(13.2) of K, say the first polynomial is g1 (x) = R² − Σ_{i=1}^n xi² , then it is immediately clear that the quadratic module M(g) is Archimedean.
Theorem 13.1.1 Assume that M(g) is Archimedean. Then, the bounds
pmom,t and psos,t converge asymptotically to pmin as t → ∞.
Proof Pick ε > 0. Then the polynomial p − pmin + ε is strictly positive on K. As M(g) is Archimedean, we can apply Putinar's theorem (Theorem 13.2.9) and deduce that p − pmin + ε ∈ M(g). Hence, there exists t ∈ N such that p − pmin + ε ∈ M(g)2t and thus pmin − ε ≤ psos,t . Therefore, limt→∞ psos,t = pmin . By (13.5), psos,t ≤ pmom,t ≤ pmin for all t ≥ deg(p)/2.
Hence limt→∞ pmom,t = pmin holds as well.
As the above proof shows, it follows from Putinar’s theorem that the sos
program (13.3) is feasible for all t large enough when M(g) is Archimedean.
In fact, as we will see later in Lemma 13.3.2, it is feasible for all t ≥ 1 when
a ball constraint is present in the description of K.
defining the set K: for γ ∈ Nn2t and j ∈ [m], Ajt,γ is the matrix indexed by Nnt−dgj , with entries
(Ajt,γ )α,β = Σ_{δ: α+β+δ=γ} (gj )δ .
Mt−dgj (y) = Σ_{γ∈Nn2t} yγ Ajt,γ  for j = 0, 1, . . . , m.   (13.8)
Note that if we apply this to the case when y = [x]∞ = (xα )α is a monomial
vector then we obtain:
(13.10)
where the optimization variable is y = (yγ )γ∈Nn2t \{0} .
We now proceed to reformulate the sos problem (13.3) using the matrices
Ajt,γ . By definition, the parameter psos,t is the largest scalar λ for which the
polynomial p − λ can be written as
p − λ = Σ_{j=0}^m sj gj with sj ∈ Σ2(t−dgj ) .
and
pγ = Σ_{j=0}^m hAjt,γ , Qj i for all 1 ≤ |γ| ≤ 2t,   (13.11)
where the matrix variable Qj is indexed by Nnt−dgj , for j = 0, 1, . . . , m.
Now, we can clearly see that the two programs (13.10) and (13.11) are
dual of each other, with the sos program (13.11) being in standard primal
form and the moment program (13.10) being in standard dual form (albeit
over a product of positive semidefinite cones). Summarizing we have shown:
Lemma 13.2.2 The moment program (13.4) and the sos program (13.3)
are dual semidefinite programs.
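To see how the moment program becomes a concrete semidefinite program, here is a minimal univariate sketch (Python with cvxpy and the SCS solver assumed; the polynomial and the order t = 2 are our own illustrative choices): minimize p(x) = x⁴ − 3x² + x over K = {x : 1 − x² ≥ 0}.

import cvxpy as cp

t = 2
p = {1: 1.0, 2: -3.0, 4: 1.0}                    # nonzero coefficients p_gamma of p

y = cp.Variable(2 * t + 1)                       # pseudo-moments y_0, ..., y_{2t}
M = cp.Variable((t + 1, t + 1), PSD=True)        # will be forced to equal M_t(y)
Mg = cp.Variable((t, t), PSD=True)               # will be forced to equal M_{t-1}(gy)

constraints = [y[0] == 1]
constraints += [M[i, j] == y[i + j] for i in range(t + 1) for j in range(t + 1)]
constraints += [Mg[i, j] == y[i + j] - y[i + j + 2] for i in range(t) for j in range(t)]

prob = cp.Problem(cp.Minimize(sum(c * y[k] for k, c in p.items())), constraints)
prob.solve(solver=cp.SCS)
print(prob.value)    # a lower bound p_mom,t on the minimum of p over [-1, 1] (about -3 here)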
have
|yα | ≤ R^{|α|} for all |α| ≤ 2t.
Lemma 13.3.2 Assume the inequality g1 (x) = R² − Σ_{i=1}^n xi² ≥ 0 is present in (13.6). Then, strong duality holds: pmom,t = psos,t for all t ≥ max{dp , dK }.
Proof By Lemma 13.3.1 the feasibility region of the moment program is
bounded. We distinguish two cases, depending whether it is empty or not.
If it is nonempty then the set of optimal solutions of the moment program
is bounded and nonempty and thus strong duality holds in view of Proposi-
tion 2.6.11 (recall the discussion at the beginning of this section).
For this we use Nβ ≥ 1 if β ≠ 0 and the fact that λmin (Mt (y)) ≥ −ε, i.e., Mt (y) + εI ⪰ 0, so y2β ≥ −ε for |β| ≤ t. Then Σ_{1≤|β|≤k} Nβ y2β is equal to
Σ_{1≤|β|≤k} Nβ (y2β + ε) − ε Σ_{1≤|β|≤k} Nβ ≥ Σ_{1≤|β|≤k} (y2β + ε) − ε Σ_{1≤|β|≤k} Nβ
= Tr(Mk (y)) − y0 + ε Σ_{1≤|β|≤k} (1 − Nβ ).
We now consider the sequence yε from (13.13). As k ≤ t the matrix Mk−1 (g1 yε ) is a principal submatrix of the matrix Mt−1 (g1 yε ) and thus λmin (Mk−1 (g1 yε )) ≥ λmin (Mt−1 (g1 yε )) ≥ −ε, which implies
Tr(Mk−1 (g1 yε )) ≥ −ε \binom{n+k−1}{k−1} ≥ −ε \binom{n+t−1}{t−1} .
Combining with (13.14) applied to y = yε and using y0 ≤ 1 + ε we get
Tr(Mk (yε )) ≤ R² Tr(Mk−1 (yε )) + 1 + ε C,  where C := 1 + Σ_{1≤|β|≤k} (Nβ − 1) + \binom{n+t−1}{t−1} ,
and thus
(i) Assume that rank Mt (y) = rank Mt−1 (y). Then we have
We now state the main result of this section, known as the flat extension
theorem.
Theorem 13.4.3 (Curto and Fialkow [1996]) Given a sequence y ∈ R^{N^n_{2t}}
and t ≥ 1, consider its moment matrices Mt (y) and Mt−1 (y). Assume that
the following flatness condition holds:
(ii) Let I denote the ideal in R[x] corresponding to the kernel of M (ỹ). If
{α1 , . . . , αr } ⊆ Nnt−1 indexes a maximum linearly independent set of
columns of the matrix Mt−1 (y), then the subset {[xα1 ], . . . , [xαr ]} of
R[x]/I is a basis of R[x]/I. Moreover, the ideal I is generated by the
polynomials in ker Mt (y).
Proof Since the proof of (i) is quite technical, we begin with the proof of
(ii) and will prove (i) thereafter. Let V denote the linear span of the set {x^{α1} , . . . , x^{αr} }, consisting of the polynomials Σ_{i=1}^r λi x^{αi} (λi ∈ R). As the
To show the reverse inclusion we first claim that the polynomial space can
be decomposed as
For this, we show using induction on |α| that any monomial xα belongs
to V + (ker Mt (y)). If |α| ≤ t this follows from the definition of the αi ’s.
Assume now |α| ≥ t + 1 and αi ≥ 1 for some i ∈ [n]. By the induction
assumption, we know that xα−ei = p + q, where p ∈ V and q ∈ (ker Mt (y)).
Hence, xα = xi xα−ei = xi (p + q) = xi p + xi q ∈ V + (ker Mt (y)), because
xi p ∈ V + (ker Mt (y)) (since deg(xi p) ≤ t) and xi q ∈ (ker Mt (y)) (since
q ∈ (ker Mt (y))). This concludes the proof of (13.19).
We can now show the reverse inclusion
We now turn to the proof of (i), whose details are elementary but a bit
technical. (As a warm-up you may want to consider the univariate case
n = 1 in Exercise 13.5). A first observation is that it suffices to construct
n
an extension ỹ ∈ RN2t+2 of y, whose moment matrix is a flat extension of
the moment matrix of y, i.e., such that rank Mt+1 (ỹ) = rank Mt (y). Indeed,
after iterating this construction we then obtain an infinite sequence with the
desired property.
By assumption, $M_t(y)$ is a flat extension of its principal submatrix $M_{t-1}(y)$. Hence, for any $\alpha \in \mathbb{N}^n_t$ there exists a (unique) $p \in V$ such that
$$x^\alpha - p \in \ker M_t(y), \qquad (13.21)$$
which follows from the flatness condition (13.17).
The construction of a flat extension M as in (13.20) relies in a crucial way
on Lemma 13.4.2. Take γ ∈ Nn with |γ| = t + 1, assume γi ≥ 1 for some
i ∈ [n] and let p ∈ V such that xγ−ei − p ∈ ker Mt (y) (as in (13.21)). If M
is a flat extension of Mt (y) then it follows from Lemma 13.4.2(i) that the
polynomial xi (xγ−ei − p) = xγ − xi p must belong to the kernel of M . This
fact enables us to define the entries of the block B in terms of the entries of
Mt (y) and in turn the entries of the block C in terms of the entries of B T .
After that we only need to verify that these definitions are good, i.e., that
they do not depend on the choice of the index i for which γi ≥ 1, and that
the matrix M constructed in this way is indeed a moment matrix.
This is what we do through the next three claims. It will be convenient to
also use the symbol vec(f ) to denote the coefficient vector of a polynomial
f , especially when we want to discuss the coefficient vector of a combination
of polynomials.
Proof As $\operatorname{rank} M_t(y) = \operatorname{rank} M_{t-1}(y)$, in view of Lemma 13.4.1(i), in order to show that $x_i p - x_j p'$ belongs to the kernel of $M_t(y)$ it suffices to show that $L_y(h(x_i p - x_j p')) = 0$ for all $h \in \mathbb{R}[x]_{t-1}$. So, let $h \in \mathbb{R}[x]_{t-1}$. Then we have
$$L_y(h(x_i p - x_j p')) = L_y((hx_i)p) - L_y((hx_j)p') = L_y\big((hx_i)(p - x^{\gamma-e_i})\big) - L_y\big((hx_j)(p' - x^{\gamma-e_j})\big) = 0.$$
The second equality follows from $hx_i x^{\gamma-e_i} = hx_j x^{\gamma-e_j}$ ($= hx^\gamma$, with degree at most $2t$) and the last equality follows since $p - x^{\gamma-e_i},\, p' - x^{\gamma-e_j} \in \ker M_t(y)$ and $\deg(x_i h), \deg(x_j h) \le t$.
Next we show that xi p − xj p0 belongs to the kernel of B T , or, equivalently,
that vec(xδ )T B T vec(xi p − xj p0 ) = 0 for all |δ| = t + 1. Let |δ| = t + 1, let
k ∈ [n] such that δk ≥ 1 and let p00 ∈ V such that xδ−ek − p00 ∈ ker Mt (y). By
construction, the polynomial xδ − xk p00 belongs to the kernel of the matrix
(Mt (y) B). This gives:
which implies
Claim 13.4.5 Mγ,δ = Mγ+ei ,δ−ei for all γ, δ ∈ Nnt+1 and i ∈ [n] such that
δi ≥ 1 and |γ| ≤ t.
Claim 13.4.6 Mγ,δ = Mγ−ej +ei ,δ+ej −ei for all γ, δ ∈ Nn and i, j ∈ [n]
such that γj ≥ 1, δi ≥ 1 and |γ| = |δ| = t + 1.
$x_k(x^{\gamma-e_j}-p)$ and $x_k(x^{\delta-e_i}-p')$ belong to $\ker M$. Using these facts we obtain:
$$M_{\gamma,\delta} = \operatorname{vec}(x^\gamma)^T M \operatorname{vec}(x^\delta) = \operatorname{vec}(x_j p)^T M \operatorname{vec}(x_i p') = \operatorname{vec}(x_j p)^T M_t(y)\operatorname{vec}(x_i p') = L_y((x_j p)(x_i p')) = L_y((x_i p)(x_j p')) = \operatorname{vec}(x_i p)^T M_t(y)\operatorname{vec}(x_j p') = \operatorname{vec}(x_i p)^T M \operatorname{vec}(x_j p') = \operatorname{vec}(x_i x^{\gamma-e_j})^T M \operatorname{vec}(x_j x^{\delta-e_i}) = M_{\gamma+e_i-e_j,\,\delta+e_j-e_i}.$$
(iii) If L is an optimal solution of (13.4) for which the matrix Mt (y) has
maximum possible rank, then VC (ker Ms (y)) = Kp∗ .
Proof By assumption, y satisfies the condition (13.22) and thus the flatness
condition:
rank Ms (y) = rank Ms−1 (y).
Hence we can apply Theorem 13.4.3 and conclude that there exists a sequence $\tilde y \in \mathbb{R}^{\mathbb{N}^n}$ which extends the subsequence $(y_\alpha)_{|\alpha|\le 2s}$ of $y$ and satisfies
$$\operatorname{rank} M(\tilde y) = \operatorname{rank} M_s(y) =: r.$$
Thus, $\tilde y_\alpha = y_\alpha$ if $|\alpha| \le 2s$; it could be that $\tilde y$ and $y$ differ at entries indexed by monomials of degree higher than $2s$, but these entries of $y$ will be irrelevant in the rest of the proof. Let $I$ be the ideal corresponding to the
kernel of M (ỹ). By Theorem 13.4.3, I is generated by ker Ms (y) and thus
VC (I) = VC (ker Ms (y)).
As $M(\tilde y)$ is positive semidefinite with finite rank $r$, we can apply Theorem 12.2.14 (and its proof) to the sequence $\tilde y \in \mathbb{R}^{\mathbb{N}^n}$ and deduce that
$$V_{\mathbb{C}}(I) = \{v_1,\ldots,v_r\} \subseteq \mathbb{R}^n$$
and
$$\tilde y = \sum_{i=1}^r \lambda_i [v_i]_\infty \quad\text{where } \lambda_i > 0 \text{ and } \sum_{i=1}^r \lambda_i = 1.$$
n
Taking the projection of ỹ onto the subspace RN2s , we obtain:
r
X r
X
(yα )α∈Nn2s = λi [vi ]2s where λi > 0 and λi = 1.
i=1 i=1
In other words, the restriction of the linear map $L$ to the subspace $\mathbb{R}[x]_{2s}$ is the convex combination $\sum_{i=1}^r \lambda_i \mathrm{Ev}_{v_i}$ of evaluations at the points of $V_{\mathbb{C}}(I)$:
$$L(f) = \sum_{i=1}^r \lambda_i f(v_i) \quad\text{for all } f \in \mathbb{R}[x]_{2s}. \qquad (13.23)$$
and $f_i$ lies in the linear span of the monomials $x^{\alpha_1},\ldots,x^{\alpha_r}$. Thus the $f_i$'s are again interpolation polynomials, but now with degree at most $s - d_K$.
Next we claim that
$$v_1,\ldots,v_r \in K.$$
To see this, we use the fact that $L \ge 0$ on $(g_j\Sigma)\cap\mathbb{R}[x]_{2t}$ for all $j \in [m]$. As $\deg(p_{v_i}) \le s - d_K$, we have $\deg(g_j p_{v_i}^2) \le \deg(g_j) + 2(s-d_K) \le 2s$. Hence we can compute $L(g_j p_{v_i}^2)$ using (13.23) and obtain that $L(g_j p_{v_i}^2) = \lambda_i g_j(v_i) \ge 0$. As $\lambda_i > 0$ this gives $g_j(v_i) \ge 0$ for all $j$ and thus $v_i \in K$.
We can now proceed to show the claims (i)--(iii). By assumption, $L$ is an optimal solution of (13.4) and thus $p_{\mathrm{mom},t} = L(p)$. As $\deg(p) \le 2s$, we can evaluate $L(p)$ using (13.23). We obtain:
$$p_{\mathrm{mom},t} = L(p) = \sum_{i=1}^r \lambda_i p(v_i) \;\ge\; p_{\min},$$
since $\sum_i \lambda_i = 1$ and, for any $i \in [r]$, $\lambda_i > 0$ and $p(v_i) \ge p_{\min}$ as $v_i \in K$. As the reverse inequality $p_{\mathrm{mom},t} \le p_{\min}$ always holds, we have equality $p_{\mathrm{mom},t} = p_{\min}$ and (i) holds. In turn, the equality $\sum_i \lambda_i p(v_i) = p_{\min}$ implies termwise equality: $p(v_i) = p_{\min}$ for all $i$. This shows that each $v_i$ is a global minimizer of $p$ in $K$, i.e., $\{v_1,\ldots,v_r\} \subseteq K^*_p$, showing (ii).
Assume now that the optimal solution L of (13.4) is chosen in such a
way that rank Mt (y) is maximum among all optimal solutions of (13.4). In
view of the results in Chapter 8.3, this means that y lies in the relative
interior of the face of the feasible region of (13.4) consisting of all opti-
mal solutions. Therefore, for any other optimal solution y 0 , we have that
ker Mt (y) ⊆ ker Mt (y 0 ). Consider a global minimizer v ∈ Kp∗ of p in K
and the corresponding optimal solution y 0 = [v]2t of (13.4). The inclusion
ker Mt (y) ⊆ ker Mt (y 0 ) implies that any polynomial in ker Mt (y) vanishes at
the point v. Therefore, we obtain: ker Ms (y) ⊆ ker Mt (y) ⊆ I(Kp∗ ), which
implies
I = (ker Ms (y)) ⊆ I(Kp∗ ).
In turn, this implies the inclusions:
Kp∗ ⊆ VC (I(Kp∗ )) ⊆ VC (I) = {v1 , . . . , vr }.
Thus (iii) holds and the proof is complete.
Note that Theorem 13.5.1 needs some assumptions. First, the moment
program (13.4) should have an optimal solution L. This is the case, for
On the other hand, surprisingly, the flatness condition often holds in prac-
tice. In fact, it has been shown by Nie [2014] that the flatness condition holds
generically (which, roughly speaking, means that the polynomials defining
the polynomial optimization problem have generic coefficients when fixing
their degrees).
Another question raised by Theorem 13.5.1 is how to find an optimal so-
lution whose moment matrix has maximum possible rank. It is in fact a
property of most interior-point algorithms that, when solving a semidefinite
program, they return an optimal solution lying in the relative interior of the
optimal face of the feasible region, and thus an optimal solution with maximum rank. See the monographs by de Klerk [2002, Chap. 4] and Wolkowicz
et al. [2000, Chap. 5] for details.
We now turn to the question of how to find the global minimizers of p in
K under the assumptions of Theorem 13.5.1.
As Theorem 13.5.1 shows, if L is an optimal solution of the moment pro-
gram (13.4) which satisfies the flatness condition (13.22), then any common
root to the polynomials in ker Ms (L) is a global minimizer of p in K. More-
over, if the rank of the matrix Mt (L) is largest possible (among all optimal
solutions of (13.4)) then all global minimizers are found in this way. Hence,
we only need to compute the variety VC (ker Ms (y)). For this we can apply
the eigenvalue method described in Section 12.1.3. Indeed, as we see below,
all the information that we need in order to be able to apply this method is
contained in the moment matrix Ms (y).
Define the ideal $I = (\ker M_s(y))$. Then $I$ is a real radical ideal. Indeed, by Theorem 13.5.1, $I = \ker M(\tilde y)$, where $\tilde y$ is the extension to $\mathbb{R}^{\mathbb{N}^n}$ of the sequence $(y_\alpha)_{|\alpha|\le 2s}$ and, by Lemma 12.2.7, $I$ is real radical since $M(\tilde y) \succeq 0$.
We saw in Theorem 12.1.14 how to compute the variety VC (I) via the
eigenvalues/eigenvectors of multiplication matrices. One needs the assump-
tion that VC (I) is finite, which is the case here. In fact we know that
|VC (I)| = r, the rank of the associated moment matrix Ms (y), and that
VC (I) ⊆ Rn . Then, what we need in order to compute VC (I) is an explicit
basis B of the quotient space R[x]/I and the matrix Mh in this basis B of
some multiplication (‘by h’) operator acting on R[x]/I.
Finding a basis B of R[x]/I is easy: if a set {α1 , . . . , αr } ⊆ Nns−dK indexes
a maximum linearly independent set of columns of the matrix Ms−1 (y), then
the set B = {[xα1 ], . . . , [xαr ]} of corresponding cosets in R[x]/I is a basis of
R[x]/I.
Finally, we indicate how to construct the explicit matrix Mh representing
the ‘multiplication by h’ operator in the basis B. For any variable xk , recall
that the ‘multiplication by xk ’ is the linear map:
[f ] ∈ R[x]/I 7→ [xk f ] ∈ R[x]/I.
Hence, in order to build its matrix Mxk in the basis B, we need to know
how to express each coset [xk xαj ] in the basis B = {[xα1 ], . . . , [xαr ]}. This
can easily be done using the moment matrix Ms (y). Indeed, for any index
j ∈ [r], we have: deg(xk xαj ) = 1 + |αj | ≤ 1 + s − dK ≤ s. Hence we can
express the column of Ms (y) indexed by xk xαj as a linear combination of
the columns indexed by the monomials xα1 , . . . , xαr , which directly gives
the j-th column of the matrix Mxk . Once we know the matrix Mxk for each variable xk , the matrix Mh of any polynomial h can be assembled by evaluating h at the (commuting) matrices Mx1 , . . . , Mxn .
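The column-extraction step just described can be sketched numerically as follows (a sketch assuming numpy; the toy moment sequence, coming from the measure $(\delta_1+\delta_2)/2$ on the real line, is a hypothetical example, not taken from the text).

    import numpy as np

    # n = 1, y_j = (1^j + 2^j)/2, moment matrix M_2(y) with columns indexed by 1, x, x^2
    y = [(1.0**j + 2.0**j) / 2 for j in range(5)]
    M2 = np.array([[y[0], y[1], y[2]],
                   [y[1], y[2], y[3]],
                   [y[2], y[3], y[4]]])

    basis = [0, 1]   # the columns of 1 and x form a maximum linearly independent set
    # express the columns indexed by x*1 and x*x as combinations of the basis columns
    Mx = np.column_stack([
        np.linalg.lstsq(M2[:, basis], M2[:, 1], rcond=None)[0],   # column of x*1 = x
        np.linalg.lstsq(M2[:, basis], M2[:, 2], rcond=None)[0],   # column of x*x = x^2
    ])
    print(np.linalg.eigvals(Mx))   # eigenvalues 1 and 2: the points of the variety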
h1 (x) = 0, . . . , hm (x) = 0
$$L_y(1) = 1, \quad L_y \ge 0 \text{ on } \Sigma_{2t}, \quad L_y(u h_j) = 0 \text{ for all } j \in [m] \text{ and } u \in \mathbb{R}[x] \text{ with } \deg(u h_j) \le 2t. \qquad (13.24)$$
Then there exist integers t0 and s such that dK ≤ s ≤ t0 and the following
rank condition holds:
Proof The goal is to show that if we choose t large enough, then the kernel
of Mt (y) contains sufficiently many polynomials to show that
the rank condition (13.25) holds. Here y is an arbitrary feasible solution in
Ft and Ly is its corresponding linear functional on R[x]2t . We assume that
t ≥ maxj deg(hj ). Then we have
(That such a nice set of generators exists follows from the theory of Gröbner
bases.) Next we claim:
t2 = max{t1 , d0 + dK }.
where $p^{(\alpha)}$ lies in the span of $\mathcal{M}$. Hence $p^{(\alpha)}$ has degree at most $d_0$ and thus each term $u_l^{(\alpha)} f_l$ has degree at most $\max\{|\alpha|, d_0\} \le t_2$. Here we have used the fact that $\{[b_1],\ldots,[b_r]\}$ is a basis of $\mathbb{R}[x]/\sqrt[\mathbb{R}]{I}$, combined with the property (13.27) of the generators $f_l$ of $\sqrt[\mathbb{R}]{I}$.
We can now conclude the proof. Namely, we show that, if $t \ge t_0 := t_2 + 1$, then the rank condition (13.25) holds with $s = t_2$. For this pick a monomial $x^\alpha$ of degree at most $t_2$ and consider the decomposition in (13.29). As $\deg(u_l^{(\alpha)} f_l) \le t_2 \le t-1$ and $f_l \in \ker M_t(y)$ (by (13.28)), we obtain that $u_l^{(\alpha)} f_l \in \ker M_t(y)$ (by Lemma 13.4.2(ii)). Therefore, the polynomial $x^\alpha - p^{(\alpha)}$ belongs to the kernel of $M_t(y)$. As the degree of $p^{(\alpha)}$ is at most $d_0 \le t_2 - d_K$, we can conclude that $\operatorname{rank} M_{t_2}(y) = \operatorname{rank} M_{t_2-d_K}(y)$.
Finally, the equality $\sqrt[\mathbb{R}]{I} = (\ker M_{t_2}(y))$ follows from Theorem 13.5.1(iii).
Example 13.6.2 Let $I$ be the ideal generated by the polynomial $x_1^2 + x_2^2$. Clearly, $V_{\mathbb{R}}(I) = \{(0,0)\}$ and $\sqrt[\mathbb{R}]{I} = (x_1, x_2)$ is generated by the monomials $x_1$ and $x_2$. As we now see, this can also be found by applying the above result.
Indeed, let $L_y$ be a feasible solution in the set $\mathcal{F}_t$ defined by (13.24) for $t = 1$. Then we have $L_y(x_1^2), L_y(x_2^2) \ge 0$ and $L_y(x_1^2 + x_2^2) = 0$. This implies $L_y(x_1^2) = L_y(x_2^2) = 0$ and thus $L_y(x_1) = L_y(x_2) = L_y(x_1 x_2) = 0$. Hence the moment matrix $M_1(y)$ has the form:
$$M_1(y) = \begin{array}{c|ccc} & 1 & x_1 & x_2\\ \hline 1 & 1 & y_{10} & y_{01}\\ x_1 & y_{10} & y_{20} & y_{11}\\ x_2 & y_{01} & y_{11} & y_{02}\end{array} \;=\; \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}.$$
We will now explain the link between these two types of moment matrices.
In particular, as we will see, if we apply the moment hierarchy introduced in
this chapter to problem (13.30) then the resulting parameters $p_{\mathrm{mom},t}$ coincide
with the bounds $\mathrm{las}_t(G)$.
Let I 01 = (x2i − xi : i ∈ [n]) denote the ideal generated by the polynomials
2 Recall from Lemma 4.9.10 that we could equivalently require that zI = 0 for any I ∈ P2t (n)
that contains an edge of G.
3 In order to distinguish with the notion of moment matrix Mt (y) considered in this chapter,
we use here the notation Mt01 (z) to denote the moment matrix introduced in Definition 4.9.1;
the superscript ‘01’ refers to the fact that we work with binary variables.
Proof The first claim follows from the fact that $x^\alpha - x^{\bar\alpha} \in \mathcal{I}_{2t}$ for all $\alpha \in \mathbb{N}^n_{2t}$ (check it). This in turn shows that for any $\alpha \in \mathbb{N}^n_t$ the $\alpha$th column of $M_t(y)$ coincides with its $\bar\alpha$th column. Indeed, given $\beta \in \mathbb{N}^n_t$, we have $L_y(x^\beta x^\alpha) = L_y(x^\beta x^{\bar\alpha})$, since $y_{\beta+\alpha} = y_{\beta+\bar\alpha}$ as the two sequences $\beta+\alpha$ and $\beta+\bar\alpha$ have the same support.
the finite convergence of the moment bounds and in fact a sharper result:
pmom,t = α(G) for t ≥ α(G). In addition, this offers an alternative argument
for finite convergence (recall Section 4.9.1), which is easier than relying on
the flat extension theorem. The key here is exploiting the equations x2i = xi
in order to derive a simpler structure for the moment matrices, with fewer
variables and smaller size.
The above applies to any binary polynomial optimization problem:
$gz$ in the obvious way: $gz$ is indexed by subsets of $[n]$ (of suitable size), with $I$th entry $(gz)_I = \sum_H g_H z_{I\cup H}$. As above we may apply Theorem 13.5.1 to conclude:
$$p_{\mathrm{mom},t} = p_{\min} \quad\text{for } t \ge n + d_K.$$
{±1}n . Then the Lasserre relaxation of order t = 1 coincides with the basic
semidefinite relaxation sdp(G, w) introduced in Chapter 5.2.1 in relation
(5.6). This follows from the next lemma (to be shown in Exercise 13.8), since
the objective polynomial uses only monomials of even degree (in fact, only
quadratic monomials). Detailed information about the Lasserre hierarchy
for max-cut can be found in Laurent [2004].
Lemma 13.7.2 Assume $p, g_1,\ldots,g_m$ are multilinear polynomials which use only monomials of even degree. If $t$ is even (resp., odd), then the optimum value of the moment relaxation (13.38) remains unchanged if we replace the matrices $M_t^{\pm1}(z)$ and $M_{t-d_j}^{\pm1}(g_j z)$ by their submatrices indexed by sets with even (resp., odd) cardinality.
As in the binary case finite convergence holds: pmom,t = pmin if t ≥ n+dK .
The unconstrained case, asking to optimize a degree d polynomial p over
{±1}n , is of special interest and contains several problems such as max-cut
and satisfiability problems (recall Section 5.3) and it is equivalent to uncon-
strained binary polynomial optimization (by a linear change of variables).
Then a stronger finite convergence result can be shown: Fawzi, Saunderson and Parrilo [2016] show that $p_{\mathrm{mom},t} = p_{\min}$ for $t \ge \lceil n/2\rceil$ for the max-cut problem; Sakaue et al. [2017] show that $p_{\mathrm{mom},t} = p_{\min}$ for $t \ge \lceil (n+d-1)/2\rceil$ when $p$ has degree $d$, and for $t \ge \lceil (n+d-2)/2\rceil$ when in addition all monomials in $p$ have even degree (which recovers the case of max-cut). Moreover
these bounds have been shown to be tight (by Laurent [2003] for max-cut
and by Kurpisz et al. [2016], Sakaue et al. [2017] for general degree d).
where ui ∈ R(B) for all i and q ∈ I(h). In other words, it suffices to work in
R[x]/I(h) to deal with sums of squares, which leads to semidefinite programs
involving matrices indexed by (a subset of) B instead of the full monomial
set, and polynomial equations can be directly used to eliminate variables.
We only illustrate this on simple examples.
Example 13.7.3 In the binary case when $h = \{x_i^2 - x_i : i \in [n]\}$, or in the $\pm1$-case when $h = \{x_i^2 - 1 : i \in [n]\}$, the set $\mathcal{B} = \{x_I = \prod_{i\in I} x_i : I \subseteq [n]\}$ is a linear basis of $\mathbb{R}[x]/\mathcal{I}(h)$ and we saw above that it indeed suffices to deal with sums of squares of polynomials in $\mathbb{R}(\mathcal{B})$ modulo the ideal $\mathcal{I}(h)$.
Example 13.7.4 Assume $K = \{x \in \mathbb{R}^2 : 1 - x_1^2 - x_2^2 = 0\}$ is the unit circle. Then the set $\mathcal{B} = \{x_1^i,\, x_1^i x_2 : i \in \mathbb{N}\}$ is a linear basis of the quotient space $\mathbb{R}[x_1,x_2]/(1-x_1^2-x_2^2)$ and, in the moment relaxation (13.7) of order $t$, one may impose the constraints $y_\alpha - y_{\alpha+2e_1} - y_{\alpha+2e_2} = 0$ for all $|\alpha| \le 2t-2$. See Parrilo [2005] for a detailed discussion.
There is an interesting special case, when the polynomials $h_1,\ldots,h_{m_0}$ have finitely many common real roots, i.e., $|V_{\mathbb{R}}(h)| < \infty$. Note that this implies that the quadratic module $\mathcal{M}(g) + \mathcal{I}(h)$ is Archimedean, since the polynomial $u := -\sum_{l=1}^{m_0} h_l^2$ belongs to $\mathcal{I}(h)$ and its level set $\{x : u(x) \ge 0\}$ is compact (in fact, finite). Then it has been shown by Nie [2013] that finite convergence holds: $p_{\mathrm{sos},t} = p_{\mathrm{mom},t} = p_{\min}$ for some $t \in \mathbb{N}$. In the restricted case when the polynomials $h_l$ also have finitely many common complex roots, i.e., $|V_{\mathbb{C}}(h)| < \infty$, one can give an explicit bound on the order of finite convergence and reformulate the moment bound $p_{\mathrm{mom},t}$ in a more economical way.
Indeed, as $|V_{\mathbb{C}}(h)| < \infty$ the quotient space $\mathbb{R}[x]/\mathcal{I}(h)$ has finite dimension (recall Theorem 12.1.8). Say $N = \dim \mathbb{R}[x]/\mathcal{I}(h)$ and let $b_1 = 1, b_2, \ldots, b_N$ be polynomials whose cosets form a basis $\mathcal{B}$ of $\mathbb{R}[x]/\mathcal{I}(h)$. For any $i,j \in [N]$ there exist (unique) scalars $\lambda_{ij}^k$ such that $b_i b_j - \sum_{k=1}^N \lambda_{ij}^k b_k \in \mathcal{I}(h)$. Finally, given a vector $z \in \mathbb{R}^N$ indexed by $\mathcal{B}$, define the matrix $M_{\mathcal{B}}(z)$ indexed by $\mathcal{B}$, whose $(i,j)$th entry is
$$(M_{\mathcal{B}}(z))_{i,j} = \sum_{k=1}^N \lambda_{ij}^k z_k \quad\text{for } i,j \in [N],$$
$$p_{\min} = \min\Big\{\sum_{i=1}^N p_i z_i : z_1 = 1,\ M_{\mathcal{B}}(z) \succeq 0\Big\}.$$
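As a small illustration (a sketch assuming cvxpy; the univariate data are a hypothetical example, not taken from the text), consider $h = x^2 - 1$, so that $V_{\mathbb{C}}(h) = \{-1,+1\}$ and the cosets of $b_1 = 1$, $b_2 = x$ form a basis $\mathcal{B}$. The multiplication data give $(M_{\mathcal{B}}(z))_{11} = (M_{\mathcal{B}}(z))_{22} = z_1$ and $(M_{\mathcal{B}}(z))_{12} = z_2$, and minimizing $p = x$ over $\{-1,+1\}$ becomes the small SDP below.

    import cvxpy as cp

    M = cp.Variable((2, 2), symmetric=True)
    constraints = [M >> 0,
                   M[0, 0] == 1,           # z_1 = 1
                   M[1, 1] == M[0, 0]]     # b_2*b_2 = x^2 = 1 mod (x^2 - 1), so this entry is also z_1
    prob = cp.Problem(cp.Minimize(M[0, 1]), constraints)   # objective sum_i p_i z_i = z_2
    prob.solve()
    print(prob.value)   # approximately -1 = min of x over {-1, +1}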
Exercises
13.1 Show Lemma 13.4.1.
13.2 Let $p$ be a homogeneous polynomial. Assume that $p$ can be written as $p = s_0 + s_1\big(1 - \sum_{i=1}^n x_i^2\big)$ for some sums of squares polynomials $s_0, s_1$.
(a) Show that one can find scalars a, b ∈ R for which the extended
sequence ỹ = (y0 , y1 , . . . , y2s , a, b) satisfies:
satisfying
rankM (ỹ) = rankMs (y).
This shows the flat extension theorem (Theorem 13.4.3) in the univari-
ate case n = 1.
Mα,β = Mα−ei +ej ,β+ei −ej for all α, β ∈ Nnt and i, j ∈ [n]
such that αi , βj ≥ 1 and |α| = |β| = t.
(13.40)
13.7 Consider the problem of computing pmin = inf x∈K p(x), where p = x1 x2
and
K = {x ∈ R2 : −x22 ≥ 0, 1 + x1 ≥ 0, 1 − x1 ≥ 0}.
(a) Show that, at order t = 1, pmom,1 = pmin = 0 and psos,1 = −∞.
(b) At order t = 2, what is the value of psos,2 ?
where we optimize over all $d \in \mathbb{N}$, all unit vectors $\psi \in \mathbb{R}^d$, and all matrix tuples $X \in (\mathcal{S}^d)^n$. When restricting to size $d = 1$ in the optimization we recover the classical (commutative) polynomial optimization problem.
In this chapter, we explain how the moment/sums-of-squares approaches
of the previous chapters extend naturally to this noncommutative setting
Likewise, for a set $T \subseteq \mathbb{R}\langle x\rangle$, we can define the truncated ideal at degree $2t$, denoted by $\mathcal{I}_{2t}(T)$, as the vector space spanned by all polynomials $ph \in \mathbb{R}\langle x\rangle_{2t}$ with $h \in T$:
$$\mathcal{I}_{2t}(T) = \operatorname{span}\big\{\,ph : p \in \mathbb{R}\langle x\rangle,\ h \in T,\ \deg(ph) \le 2t\,\big\}. \qquad (14.2)$$
We say that $\mathcal{M}(S) + \mathcal{I}(T)$ is Archimedean when there exists a scalar $R > 0$ such that
$$R^2 - \sum_{i=1}^n x_i^2 \in \mathcal{M}(S) + \mathcal{I}(T). \qquad (14.3)$$
and we set M (L) = M∞ (L). The following are easy verifications (please
check it!).
M(S). By the definition of M(S) this implies that, for all X ∈ D(S), we
have R2 I − Xi2 0 and thus kXi k ≤ R.
Proof The first two (left) inequalities are clear, since any feasible solution $L$ to the program defining the parameter $p^{(s)}_{\mathrm{mom}}$ provides a feasible solution to the program defining $p^{(t)}_{\mathrm{mom}}$ whenever $t < s \le \infty$, simply by restricting $L$ to the subspace $\mathbb{R}\langle x\rangle_{2t}$. We now check the rightmost inequality: $p^{(\infty)}_{\mathrm{mom}} \le p^{\mathrm{nc}}_{\min}$.
For this, pick a feasible solution $(d, \psi, X)$ to the program (14.4) and consider the linear functional $L \in \mathbb{R}\langle x\rangle^*$ defined by $L(p) = \psi^T p(X)\psi$ for $p \in \mathbb{R}\langle x\rangle$. Then $L$ satisfies the desired properties: $L(1) = \psi^T\psi = 1$ since $\psi$ is a unit vector; $L$ is symmetric:
$$L(p^*) = \psi^T p^*(X)\psi = \psi^T p(X)^T\psi = \psi^T p(X)\psi = L(p);$$
$L$ is positive on $\mathcal{M}(S)$, since
$$L(p^*p) = \psi^T p^*(X)p(X)\psi = \psi^T p(X)^T p(X)\psi \ge 0$$
and
$$L(p^* g p) = \psi^T p^*(X)g(X)p(X)\psi = \psi^T p(X)^T g(X)p(X)\psi \ge 0,$$
as $g(X) \succeq 0$ implies $p(X)^T g(X) p(X) \succeq 0$. This shows $p^{\mathrm{nc}}_{\mathrm{mom},\infty} \le L(p) = \psi^T p(X)\psi$, and thus the desired inequality $p^{\mathrm{nc}}_{\mathrm{mom},\infty} \le p^{\mathrm{nc}}_{\min}$.
As shown in Corollary 14.1.3, the conditions on the linear functional $L$ in the definition of the parameter $p^{(t)}_{\mathrm{mom}}$ can be expressed as a semidefinite program.
We now see some finite and asymptotic convergence properties for the hier-
archy of bounds (pncmom,t )t .
3 Here too we use the same notation as in the commutative case, which setting is meant should
be clear from the context.
We also say that $L$ is flat when (14.6) holds. Here we set $d_K = \max_{j=1}^m d_{g_j}$, where $d_g = \lceil \deg(g)/2\rceil$ for any polynomial $g$. If the problem is unconstrained (i.e., no polynomial constraints $g_j(X) \succeq 0$ are imposed) then we set $d_K = 1$. In the constrained case we may assume that $d_K \ge 1$.
Note that the Archimedean condition is not needed for the next result.
We consider the real vector space generated by the vectors $u$ for all $u \in \langle x\rangle_t$: $\mathcal{H} = \operatorname{Span}\{u : u \in \langle x\rangle_t\} \subseteq \mathbb{R}^r$. Using the flatness condition (14.6) one can show (check it!):
$$X_i u = x_i u \quad\text{for } u \in \langle x\rangle_{t-d_K}$$
Proof To see that the map $X_i$ is well defined we check that, given scalars $\lambda_u$,
$$\sum_{u\in\langle x\rangle_{t-d_K}} \lambda_u u = 0 \;\Longrightarrow\; \sum_{u\in\langle x\rangle_{t-d_K}} \lambda_u x_i u = 0.$$
For this, assume $\sum_{u\in\langle x\rangle_{t-d_K}} \lambda_u u = 0$. Then, for any $v \in \langle x\rangle_{t-d_K}$, we have:
$$\Big\langle v, \sum_u \lambda_u x_i u\Big\rangle = \sum_u \lambda_u L(v^* x_i u) = \sum_u \lambda_u L((x_i v)^* u) = \sum_u \lambda_u \langle x_i v, u\rangle = \Big\langle x_i v, \sum_u \lambda_u u\Big\rangle = 0,$$
which shows $\sum_u \lambda_u x_i u = 0$.
To show that $X_i$ is Hermitian we check that $\langle v, X_i u\rangle = \langle X_i v, u\rangle$ for all words $u, v \in \langle x\rangle_{t-d_K}$. Indeed, we have $\langle v, X_i u\rangle = \langle v, x_i u\rangle = L(v^* x_i u)$ and $\langle X_i v, u\rangle = \langle x_i v, u\rangle = L((x_i v)^* u) = L(v^* x_i u)$.
Proof Use induction on the length of u. The claim is true for u = 1 by the
definition of ψ. Assume now that u = xi v where v ∈ hxit−1 . Then, v(X)ψ =
v by induction. Therefore, u(X)ψ = Xi v(X)ψ = Xi v = xi v = u.
Lemma 14.1.10 We have: L(w) = hψ, w(X)ψi for all w ∈ hxi2t . There-
fore, L(p) = hψ, p(X)ψi holds, and L(u∗ gj v) = hψ, u∗ (X)gj (X)v(X)ψi holds
for all u, v ∈ hxit−dK and j ∈ [m].
equals L(uv). The last claim follows since, by assumption, deg(p) ≤ 2t and
deg(u∗ gj v) ≤ 2(t − dK ) + deg(gj ) ≤ 2t.
where the two equalities follow using Lemmas 14.1.9 and 14.1.10. Finally, $\sum_{u,v} \lambda_u \lambda_v L(u^* g_j v) = \sum_{u,v} \lambda_u \lambda_v (M_{t-d_K}(g_j L))_{u,v}$ is nonnegative, since the matrix $M_{t-d_K}(g_j L)$ is positive semidefinite by assumption.
problem (14.4), so that we now have dM = 1. This implies that the entries of
Mt (L) can be bounded (as in (14.8)) for all feasible L for (14.5); in particular
the infimum is attained in (14.5). In the general case dM ≥ 2, one can only
bound the entries of a principal submatrix of Mt (L) (see Remark ?? below).
Lemma 14.1.13 For $t \in \mathbb{N}$ let $L_t$ be a feasible solution for the program (14.5) defining $p^{\mathrm{nc}}_{\mathrm{mom},t}$. There exists a converging subsequence of the sequence $(L_t)_t$, whose limit $L \in K\langle x\rangle^*$ is feasible for the program defining $p^{\mathrm{nc}}_{\mathrm{mom},\infty}$.
$$L'_t(w) = L_t(w) \ \text{ for } |w| \le 2t, \qquad L'_t(w) = 0 \ \text{ for } |w| > 2t.$$
In view of relation (14.8), we have $|\tilde L_t(w)| \le 1$ for all $w$. Hence, the sequence $(\tilde L_t)_t$ lies in the unit ball of the space $\mathbb{R}\langle x\rangle^*$. Using the Banach-Alaoglu theorem, we know that this unit ball is compact in the weak-* topology. This implies that the sequence $(\tilde L_t)_t$ admits a converging subsequence. For simplicity, we use the same notation $(\tilde L_t)_t$ to denote this subsequence and $(L'_t)_t$ for its unnormalized version. So there exists $\tilde L \in K\langle x\rangle^*$ such that $\lim_{t\to\infty} \tilde L_t(w) = \tilde L(w)$ for all $w \in \langle x\rangle$. After scaling $\tilde L$ we obtain $L \in K\langle x\rangle^*$ defined by
$$L(w) = R^{|w|}\,\tilde L(w) \quad\text{for all } w \in \langle x\rangle.$$
Then it follows that limt→∞ L0t (w) = L(w) for all w. From this one deduces
easily that L is nonnegative on M(S) with L(1) = 1 and thus L is feasible
for the program defining pnc
mom,∞ .
We can now complete the proof of Theorem 14.1.12. Consider the real
vector space H = Span(V ), spanned by the set V constructed in Lemma
14.1.15, and let H be its closure in `2 (N). Then H is a real separable Hilbert
space.
Define the linear map Xi : H → H by Xi u = xi u for all u ∈ hxi, and ex-
tending by linearity. It is well-defined and it can be extended to H by setting
Xi a = limk Xi ak if a = limk ak where ak ∈ H for all k. That this is well de-
fined follows from Lemma 14.1.4, because limk ak = 0 implies limk Xi ak = 0
as kXi ak k ≤ R2 kak k.
Each Xi is Hermitian, since hv, Xi ui = hXi v, ui (= L(v ∗ xi u)) for u, v ∈
hxi.
Finally, set ψ := 1. Then ψ is a unit vector since hψ, ψi = L(1) = 1.
As in Lemmas 14.1.9, 14.1.10, 14.1.11 one can check that w(X)ψ = w and
hψ, w(X)ψi = L(w) for all w ∈ hxi, which implies: hψ, p(X)ψi = L(p) and
gj (X) 0 for all j.
This shows that (H, ψ, X) is a feasible solution for problem (14.4) with
value $L(p)$, which implies $L(p) \ge p^{\mathrm{nc}}_{\min}$. Hence, $p^{\mathrm{nc}}_{\mathrm{mom},\infty} = L(p) = p^{\mathrm{nc}}_{\min}$ and
out that the relaxation of order d + 1 is exact, since it admits a flat optimal
solution.
We will need the following simple result about positive semidefinite ma-
trices (Exercise ??).
Lemma 14.1.16 Consider a matrix in block-form:
$$M = \begin{pmatrix} A & B\\ B^* & C\end{pmatrix}.$$
If M 0 then there exists a matrix Z such that B = AZ and C −Z ∗ AZ 0.
Theorem 14.1.17 Consider the NC polynomial optimization problem:
$$p^{\mathrm{nc}}_{\min} = \inf_{\mathcal{H},\psi,X}\ \langle\psi, p(X)\psi\rangle \quad\text{s.t.}\quad g(X) := 1 - \sum_{i=1}^n X_i^2 \succeq 0 \qquad (14.9)$$
Note that the implications (i) $\Longrightarrow$ (ii) and (iii) $\Longrightarrow$ (i) are straightforward. We only need to prove that (ii) $\Longrightarrow$ (iii). For this assume that (ii) holds and, for contradiction, that $p \notin \mathcal{M}(g)_{d+1}$, setting $g = 1 - \sum_i x_i^2$.
Remark 1: Theorems 14.1.17 and 14.1.18 also hold when replacing the ball (defined by $I - \sum_i X_i^2 \succeq 0$) by the hypercube (defined by the inequalities $I - X_i^2 \succeq 0$ for $i \in [n]$).
Remark 2: The result of Theorem 4 is not true when restricting to polynomials in commutative variables. For this consider the polynomial (in commutative variables):
$$p_\varepsilon(x_1, x_2, x_3) = M(x_1, x_2, x_3) + \varepsilon(x_1^6 + x_2^6 + x_3^6),$$
where $M$ is the homogenized Motzkin polynomial:
$$M(x_1, x_2, x_3) = x_1^2 x_2^4 + x_1^4 x_2^2 - 3x_1^2 x_2^2 x_3^2 + x_3^6.$$
Then, for $\varepsilon > 0$, $p_\varepsilon > 0$ on $\mathbb{R}^3\setminus\{0\}$. Moreover, $M = \lim_{\varepsilon\to 0} p_\varepsilon$ and $M$ is not a sum of squares. Since the cone of sums of squares of a given degree is closed, this implies that $p_\varepsilon$ is not a sum of squares for some $\varepsilon > 0$. Then, for the minimum $p_{\varepsilon,\min}$ of $p_\varepsilon$ over the unit ball $\{x : x_1^2 + x_2^2 + x_3^2 \le 1\}$, we have that
$$p^{(t)}_{\varepsilon,\mathrm{sos}} = p^{(t)}_{\varepsilon,\mathrm{mom}} < p_{\varepsilon,\min} = 0,$$
$$p^{\mathrm{nc}}_{\min} = \min_{\mathcal{H},\psi,X}\ \langle\psi, p(X)\psi\rangle \quad\text{s.t.}\quad X_i^2 = I \ (i \in [n]) \qquad (14.12)$$
$$p^{(1)}_{\mathrm{mom}} = \min_{L\in\mathbb{R}\langle x\rangle_2^*}\ L(p) \quad\text{s.t.}\quad L(1) = 1,\ L(x_i^2) = 1 \ (i \in [n]),\ M_1(L) \succeq 0. \qquad (14.13)$$
The following result holds.
The proof proceeds in the following steps. As $M_1(L) \succeq 0$ there exist vectors $v_0, v_1, \ldots, v_n \in \mathbb{R}^{n+1}$ forming a Gram representation of $M_1(L)$, i.e., the entries of $M_1(L)$ are the pairwise inner products of the vectors $v_i$.
$$p^{(1)}_{\mathrm{mom}} = \min_{Y\in\mathcal{S}^n}\ \sum_{i,j=1}^n p_{ij}\, y_{ij} \quad\text{s.t.}\quad Y \succeq 0,\ y_{ii} = 1 \ (i \in [n]).$$
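A minimal sketch of this semidefinite program (assuming numpy and cvxpy; the random symmetric coefficient matrix P is an arbitrary illustrative choice, not data from the text):

    import numpy as np
    import cvxpy as cp

    n = 4
    rng = np.random.default_rng(0)
    P = rng.standard_normal((n, n))
    P = (P + P.T) / 2                      # symmetric objective coefficients p_ij

    Y = cp.Variable((n, n), symmetric=True)
    prob = cp.Problem(cp.Minimize(cp.trace(P @ Y)),   # trace(PY) = sum_{ij} p_ij y_ij
                      [Y >> 0, cp.diag(Y) == 1])
    prob.solve()
    print(prob.value)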
15
Symmetries
where $\Re(X) \in \mathbb{R}^{n\times n}$ and $\Im(X) \in \mathbb{R}^{n\times n}$ are the real and the imaginary parts of $X$, respectively. Then $X$ is Hermitian and positive semidefinite if and only if $X'$ is symmetric and positive semidefinite.
(g, X) 7→ gX = π(g)Xπ(g)∗ .
and
g(X ∗ ) = π(g)X ∗ π(g)∗ = (π(g)Xπ(g)∗ )∗ = (gX)∗ = X ∗ .
$$p^* = \sup\{\langle C, X\rangle_T : x_1,\ldots,x_N \in \mathbb{C},\ x_1\varphi(B_1) + \cdots + x_N\varphi(B_N) \succeq 0,\ X = x_1 B_1 + \cdots + x_N B_N,\ \langle A_1, X\rangle_T = b_1,\ \ldots,\ \langle A_m, X\rangle_T = b_m\}.$$
Thus, instead of dealing with one (potentially big) matrix of size n × n one
only has to work with d (hopefully small) block diagonal matrices of size
$m_1, \ldots, m_d$. This reduces the dimension from $n^2$ to $m_1^2 + \cdots + m_d^2$. Many
practical semidefinite programming solvers can take advantage of this block
structure and numerical calculations can become much faster. However, find-
ing an explicit ∗-isomorphism is usually a nontrivial task, especially if one
is interested in parameterized families of matrix ∗-algebras.
Indeed,
$$(B_r\chi_a)_x = \sum_{y\in\mathbb{F}_2^n} (B_r)_{x,y}(\chi_a)_y = \sum_{y\in\mathbb{F}_2^n} (B_r)_{x,y}(\chi_a)_{y-x}(\chi_a)_x = \sum_{\substack{y\in\mathbb{F}_2^n\\ d_H(x,y)=r}} (\chi_a)_{y-x}(\chi_a)_x = \Big(\sum_{\substack{y\in\mathbb{F}_2^n\\ d_H(0,y)=r}} (\chi_a)_y\Big)(\chi_a)_x.$$
through
$$\sum_{\substack{y\in\mathbb{F}_2^n\\ d_H(0,y)=r}} (\chi_a)_y = K_r^{(n,2)}(d_H(0,a)).$$
(so $m_0 = \cdots = m_n = 1$) defined by
$$\varphi(B_r) = \big(K_r^{(n,2)}(0),\, K_r^{(n,2)}(1),\, \ldots,\, K_r^{(n,2)}(n)\big).$$
This is the Delsarte Linear Programming Bound; see also Theorem ??.
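As a sanity check of the Krawtchouk identity above, here is a small numerical sketch (assuming numpy; the choice of n and of the characters tested is arbitrary) verifying that each character $\chi_a$ is an eigenvector of $B_r$ with eigenvalue $K_r^{(n,2)}(d_H(0,a))$.

    import itertools
    import numpy as np
    from math import comb

    n = 5
    pts = np.array(list(itertools.product([0, 1], repeat=n)))        # the elements of F_2^n
    dH = (pts[:, None, :] != pts[None, :, :]).sum(axis=2)            # Hamming distances

    def krawtchouk(r, k, n):
        # K_r^{(n,2)}(k) = sum_j (-1)^j C(k,j) C(n-k, r-j)
        return sum((-1) ** j * comb(k, j) * comb(n - k, r - j) for j in range(r + 1))

    for r in range(n + 1):
        Br = (dH == r).astype(float)
        for a in pts[:3]:                                            # a few characters chi_a
            chi = (-1.0) ** (pts @ a)                                # chi_a(x) = (-1)^{a.x}
            assert np.allclose(Br @ chi, krawtchouk(r, a.sum(), n) * chi)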
which is an undirected graph whose vertices are the elements of Z/nZ and
where Σ defines the neighborhood of the neutral element 0; this neigh-
borhood is then transported to every vertex by group translations. Since
Σ = −Σ, the definition is consistent, and since 0 ∉ Σ, the Cayley graph
does not have loops. For example, the n-cycle can be represented as a Cayley graph:
Cn = Cayley(Z/nZ, Σ) with Σ = {1, −1}.
The goal in this section is to show that the computation of the theta
number ϑ0e (Cayley(Z/nZ, Σ)) with unit weights e = (1, . . . , 1) reduces from a
semidefinite program to a linear program if one works in the Fourier domain.
For this we need the characters of Z/nZ, which are group homomorphisms
χ : Z/nZ → T, where T is the unit circle in the complex plane. So every
character χ satisfies
χ(x + y) = χ(x)χ(y)
for all x, y ∈ Z/nZ.
The characters themselves form a group with the operation of pointwise
multiplication (χψ)(x) = χ(x)ψ(x); this is the dual group (Z/nZ)∗ of Z/nZ.
The trivial character e of Z/nZ defined by e(x) = 1 for all x ∈ Z/nZ is the
unit element. Moreover, if χ is a character, then its inverse is its complex
conjugate χ that is such that χ(x) = χ(x) for all x ∈ Z/nZ. We often view
characters as vectors in the vector space CZ/nZ .
Lemma 15.2.1 Let χ and ψ be characters of Z/nZ. Then the following orthogonality relation holds:
$$\chi^*\psi = \sum_{x\in\mathbb{Z}/n\mathbb{Z}} \overline{\chi(x)}\,\psi(x) = \begin{cases} |\mathbb{Z}/n\mathbb{Z}| & \text{if } \chi = \psi,\\ 0 & \text{otherwise.}\end{cases}$$
Proof If χ = ψ, then
$$\chi^*\chi = \sum_{x\in\mathbb{Z}/n\mathbb{Z}} \overline{\chi(x)}\,\chi(x) = \sum_{x\in\mathbb{Z}/n\mathbb{Z}} 1 = |\mathbb{Z}/n\mathbb{Z}|$$
so χ∗ ψ has to be zero.
As a corollary we can explicitly give all characters of Z/nZ and see that
they form an orthogonal basis of CZ/nZ . It follows that the dual group
(Z/nZ)∗ is isomorphic to Z/nZ.
$$\chi_u(x) = e^{2\pi i u x/n}.$$
The map u 7→ χu is a group isomorphism between Z/nZ and its dual group
(Z/nZ)∗ .
$$\hat f(\chi) = \frac{1}{|\mathbb{Z}/n\mathbb{Z}|}\sum_{x\in\mathbb{Z}/n\mathbb{Z}} f(x)\,\chi^{-1}(x)$$
is the discrete Fourier transform of $f$; the coefficients $\hat f(\chi)$ are called the Fourier coefficients of $f$. We then have the Fourier inversion formula:
$$f(x) = \sum_{\chi\in(\mathbb{Z}/n\mathbb{Z})^*} \hat f(\chi)\,\chi(x).$$
which is Z/nZ-invariant.
So we can translate problem (??) into (15.4). The objective function and
the constraint on nonedges translate easily. The positive-semidefiniteness
constraint requires a bit more work.
First, observe that to require K to be real and symmetric is to require f
to be real and such that f (x) = f (−x) for all x ∈ Z/nZ. We claim that each
character χ of Z/nZ gives an eigenvector of K with eigenvalue |Z/nZ|fˆ(χ).
Indeed, using the inversion formula we have
X X
(Kχ)(x) = K(x, y)χ(y) = f (x − y)χ(y)
y∈Z/nZ y∈Z/nZ
X X
= fˆ(ψ)ψ(x − y)χ(y)
y∈Z/nZ ψ∈(Z/nZ)∗
X X
= fˆ(ψ) ψ(y)χ(x − y)
ψ∈(Z/nZ)∗ y∈Z/nZ
X X
= fˆ(ψ)χ(x) ψ(y)χ(y)
ψ∈(Z/nZ)∗ y∈Z/nZ
= |Z/nZ|fˆ(χ)χ(x),
as claimed.
This immediately implies that K is positive semidefinite — or, equiva-
lently, f is of positive type — if and only if fˆ(χ) ≥ 0 for all characters χ.
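This eigenvalue computation is easy to check numerically. Below is a minimal sketch (assuming numpy; the choice $f = \mathbf{1}_{\{\pm1\}}$, i.e. the $n$-cycle, is only an illustrative example) verifying $K\chi = |\mathbb{Z}/n\mathbb{Z}|\,\hat f(\chi)\,\chi$ for every character.

    import numpy as np

    n = 8
    f = np.zeros(n)
    f[[1, n - 1]] = 1.0                       # f(x) = 1 if x = +-1 mod n (the n-cycle C_n)

    K = np.array([[f[(x - y) % n] for y in range(n)] for x in range(n)])   # K(x,y) = f(x-y)

    for u in range(n):
        chi = np.exp(2j * np.pi * u * np.arange(n) / n)                    # character chi_u
        fhat = (f * np.exp(-2j * np.pi * u * np.arange(n) / n)).sum() / n  # Fourier coefficient
        assert np.allclose(K @ chi, n * fhat * chi)                        # eigenvalue n*fhat(chi_u)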
Now, since $\hat f(e) = |\mathbb{Z}/n\mathbb{Z}|^{-1}\sum_{x\in\mathbb{Z}/n\mathbb{Z}} f(x)$, and since $e$ is an eigenvector of $K$,
Cayley graphs on the cyclic group are not particularly exciting. Every-
thing in this section, however, can be straightforwardly applied to any finite
Abelian group. If, for instance, one considers the group Zn2 , then it becomes
possible to model binary codes as independent sets of Cayley graphs, and the
analogue of Theorem 15.2.3 gives Delsarte’s linear programming bound [? ].
The inner product used here is the trace inner product, defined as hA, Bi =
Tr(B ∗ A) for square complex matrices A and B of the same dimension, where
B ∗ denotes the conjugate-transpose of B.
The convolution of two functions $f : \Gamma \to \mathbb{C}$ and $g : \Gamma \to \mathbb{C}$ is defined by
$$f * g(\gamma) = \sum_{\beta\in\Gamma} f(\beta)\,g(\beta^{-1}\gamma),$$
for all functions g : Γ → C; that is, the sum is a nonnegative real number.
We denote by P(Γ) the set of functions on Γ of positive type. Notice that
f ∈ P(Γ) if and only if f¯ ∈ P(Γ), where f¯ is the pointwise complex-conjugate
of f . One fact that will be needed later is that f (γ −1 ) = f (γ) for all γ ∈ Γ
when f is of positive type. For a proof of this fact and more information on
functions of positive type, see Folland [? , Chapter 3.3].
For vectors u, v ∈ Cn , we use hu, vi to denote the usual inner product
of u and v. An n × n matrix A with entries from C will be called positive
semidefinite if hAv, vi is a nonnegative real number for all v ∈ Cn . Using the
polarization identity, it is possible to prove that every positive semidefinite
matrix is Hermitian. For each finite set $V$, the set of positive semidefinite matrices with rows and columns indexed by $V$ will be denoted $\mathcal{S}^V_+$. When $V = \{1,\ldots,n\}$, we will use the notation $\mathcal{S}^n_+$ instead. It is a fact that $A \in \mathcal{S}^n_+$ if and only if $\langle A, B\rangle \ge 0$ for all $B \in \mathcal{S}^n_+$; this fact is known as the self-duality of $\mathcal{S}^n_+$.
When G is the Cayley graph Cayley(Γ, X), the optimization over matrices
in (A) can be replaced with optimization over functions on Γ, as we proceed
to show.
Proof of Theorem 15.3.2 For one direction, let $A$ be a feasible solution for (A). Define $\bar A : \Gamma\times\Gamma \to \mathbb{R}$ entrywise by
$$\bar A(\gamma, \gamma') = \frac{1}{|\Gamma|}\sum_{\beta\in\Gamma} A(\gamma\beta,\, \gamma'\beta).$$
where $1 \in \hat\Gamma$ denotes the trivial representation.
Then $\{\bar A_\pi : \pi \in \hat\Gamma\}$ is again a solution to (C): If $x \in X$, then
$$\sum_{\pi\in\hat\Gamma} d_\pi \langle\bar A_\pi, \pi(x)\rangle = \frac{1}{|\Gamma|}\sum_{\pi\in\hat\Gamma} d_\pi \sum_{\gamma\in\Gamma} \langle\pi(\gamma)A_\pi\pi(\gamma)^*, \pi(x)\rangle = \frac{1}{|\Gamma|}\sum_{\pi\in\hat\Gamma} d_\pi \sum_{\gamma\in\Gamma} \langle\pi(\gamma)A_\pi, \pi(x\gamma)\rangle.$$
Moreover, since $\pi(\gamma)A_\pi\pi(\gamma)^*$ is similar to $A_\pi$ for each $\gamma \in \Gamma$, the matrix $\bar A_\pi$ is positive semidefinite for each $\pi \in \hat\Gamma$ and $\sum_{\pi\in\hat\Gamma} d_\pi \operatorname{Tr}(\bar A_\pi) = |\Gamma|$.
We have constructed $\bar A_\pi$ so that $\bar A_\pi\pi(\gamma) = \pi(\gamma)\bar A_\pi$ for all $\gamma \in \Gamma$. Schur's lemma then implies that $\bar A_\pi$ is equal to $a_\pi I_{d_\pi}$ for some scalar $a_\pi$, and since $\bar A_\pi$ is positive semidefinite this scalar is nonnegative. We have $d_\pi a_\pi = \operatorname{Tr}(\bar A_\pi)$ as well as
$$\langle\bar A_\pi, \pi(\gamma)\rangle = a_\pi \chi_\pi(\gamma) \quad\text{for all } \gamma \in \Gamma,$$
so $\{a_\pi : \pi \in \hat\Gamma\}$ is a feasible solution to (D) having objective value $a_{\mathrm{id}} = A_{\mathrm{id}}$.
For the other direction, we take a feasible solution $\{a_\pi : \pi \in \hat\Gamma\}$ to (D), and for each $\pi \in \hat\Gamma$ we set $A_\pi = a_\pi I_{d_\pi}$. This is a feasible solution to (C) with objective value $A_{\mathrm{id}} = a_{\mathrm{id}}$.
Denote the constraint $\sum_{\pi\in\hat\Gamma} d_\pi a_\pi \chi_\pi(x) = 0$ by $C_x$ ($x \in X$). For computational purposes, the following simplifications can be applied to (D): First, only one of the constraints $\{C_x, C_{x^{-1}}\}$ is needed. Second, since the characters $\chi_\pi$ are constant on conjugacy classes, it suffices to keep only the constraints $C_x$, with one $x$ per conjugacy class.
Schreier graphs
Theorem 15.3.6 Let $G = (V, E)$ be a graph and let $\Gamma$ be a group of automorphisms of $G$. Suppose $\Gamma$ acts transitively on $V$. Then there exists a connection set $X \subseteq \Gamma$ such that
$$\alpha(G) = \frac{|V|}{|\Gamma|}\,\alpha(\mathrm{Cayley}(\Gamma, X)).$$
Exercises
15.1 Challenge: Given a natural number $n$ and a set $\Sigma \subseteq \mathbb{Z}/n\mathbb{Z}$ which is closed under taking inverses ($\Sigma = -\Sigma$), find a formula for $\vartheta(\mathrm{Cayley}(\mathbb{Z}/n\mathbb{Z}, \Sigma))$.
Appendix A
Convexity (Version: May 24, 2022)
A set C is called convex if, given any two points x and y in C, the straight
line segment connecting x and y lies completely inside of C. For instance,
cubes, balls or ellipsoids are convex sets whereas a torus is not. Intuitively,
convex sets do not have holes or dips. A real-valued function f : C → R is
called convex, if the set of points (x, y) ∈ C × R which lie above the graph
of the function (x, f (x)), the epigraph of f , is a convex set. Linear functions
are convex, and so is the quadratic function f (x) = x2 .
Usually, arguments involving convex sets are easy to visualize by two-dim-
ensional drawings. One reason being that the definition of convexity only
involves three points which always lie in some two-dimensional plane. On
the other hand, convexity is a very powerful concept which appears (sometimes unexpectedly) in many branches of mathematics and its applications.
Here are a few areas where convexity is an important concept: mathematical optimization, high-dimensional geometry, analysis, probability theory,
systems and control, harmonic analysis, calculus of variations, game theory,
computer science, functional analysis, economics, and there are many more.
Our concern is mathematical optimization and especially convex opti-
mization. Geometrically, solving a convex optimization problem amounts to
finding the minimum of a given convex function in a given convex set. One
attractive property of convex optimization problems is that it suffices to find
local minima because every local minimum is already a global minimum.
We want to solve convex optimization problems in an algorithmically ef-
ficient way. We want to use a computer having (very) limited resources of
time and space. So we have to work with convex sets and convex func-
tions algorithmically. Here we discuss possible ways to represent them in
the computer; in particular we discuss which data we have to give to
the computer. Roughly speaking, there are two convenient possibilities to
T : E → Rn defined by T v = x = (x1 , . . . , xn )T
Then the inner product is the usual one, the norm is the Euclidean norm (or $\ell_2$-norm), and the metric is the Euclidean distance:
$$v\cdot w = x^T y = \sum_{i=1}^n x_i y_i, \qquad \|v\| = \|x\|_2 = \Big(\sum_{i=1}^n x_i^2\Big)^{1/2}, \qquad d(v, w) = \|v - w\| = \|x - y\|_2.$$
A.1.2 Topology
The n-dimensional (open) ball with center x ∈ Rn and radius r is
$$\overline{A} = A \cup \partial A \quad\text{and}\quad \partial A = \overline{A} \setminus \operatorname{int} A.$$
∂B(0, 1) = {x ∈ Rn : xT x = 1}.
We denote the unit sphere by S n−1 , where the superscript indicates the
dimension of the manifold.
$$y = \sum_{i=1}^N \alpha_i x_i \quad\text{with}\quad \sum_{i=1}^N \alpha_i = 1.$$
Figure A.2 The barycenter of the three vertices x1 , x2 , x3 of the triangle, which are affinely independent points, has barycentric coordinates (1/3, 1/3, 1/3).
Points which are not affinely independent, are called affinely dependent.
If x1 , . . . , xN are affinely independent and if y is an affine linear combina-
tion of these N points, then the coefficients α1 , . . . , αN in the affine linear
combination are uniquely defined. They give the barycentric coordinates 1 of
the point y in the coordinate system given by x1 , . . . , xN . The (affine) di-
mension of a set is the largest integer N − 1 so that one can find N affinely
independent points in the set. For example, Rn has dimension n and the
empty set has dimension −1. A set in Rn that has dimension n is also called
full-dimensional.
One should keep in mind that the affine dimension is a rather naive notion.
For example, the unit sphere S n−1 has affine dimension n but it is an (n−1)-
dimensional submanifold of Rn . Nevertheless, the affine dimension will be
very useful when considering convex sets.
A subset A ⊆ Rn is called an affine subspace of Rn if it is closed under
taking affine linear combinations. The (by inclusion) smallest affine subspace
containing a given set is its affine hull. Equivalently, it is the set of
1 August Ferdinand Möbius introduced barycentric coordinates in his work “Der barycentrische
Calcul, ein neues Hülfsmittel zur analytischen Behandlung der Geometrie” published in 1827.
A = x + L = {x + y : y ∈ L}
$$y = \sum_{i=1}^N \alpha_i x_i \quad\text{with}\quad \sum_{i=1}^N \alpha_i = 1.$$
The convex hull of two points is the line segment between them: [x, y] =
conv{x, y}.
The convex hull of finitely many points is called a polytope. Two-dimensional,
planar, polytopes are polygons. Other important examples of convex sets
are balls, affine subspaces, halfspaces, and line segments. Furthermore, ar-
bitrary intersections of convex sets are convex again. The Minkowski sum of
two convex sets C and D, given by
C + D = {x + y : x ∈ C, y ∈ D},
is a convex set.
Suppose we are given a point y lying in the convex hull of a set A. How
many points of A have to be used for a convex combination of y? An answer
is given by a theorem of Carathéodory, originally stated in a paper published
in 1911. In fact, Hilbert used and proved a special case of Carathéodory’s
theorem already in a paper from 1888. This paper also plays an important
role in Chapter 11.3. The argument below can be traced back to Hilbert.
N ≤ dim A + 1 ≤ n + 1.
$$\sum_{i=2}^N \gamma_i = \sum_{i=2}^N\Big(\alpha_i - \frac{\alpha_1}{\beta_1}\beta_i\Big) = (1 - \alpha_1) + \frac{\alpha_1}{\beta_1}\beta_1 = 1,$$
Here are two useful properties of convex sets. The first one, together
with the general observation that sets whose interior is not empty are full-
dimensional, shows that a convex set is full-dimensional if and only if its
interior is not empty. This implies that if a convex set does not have interior
points, then we can pass to its affine hull, so that in this new ambient space
the convex set obtains interior points.
holds; then (A.6) follows immediately. By the mean value theorem there are
ξ1 ∈ (x, y) and ξ2 ∈ (y, z) so that
$$f'(\xi_1) = \frac{f(y)-f(x)}{y-x} \quad\text{and}\quad f'(\xi_2) = \frac{f(z)-f(y)}{z-y}$$
hold. Suppose f is convex, then this together with inequality (A.6) implies
that f 0 is monotonically increasing in C, and so the second derivative f 00 (x)
is nonnegative for every x ∈ C. Conversely, suppose that f 00 (x) ≥ 0 for all
x ∈ C. Then f 0 is monotonically increasing and (A.6) is fulfilled, so f is
convex.
Now the case n > 1: A multivariate function f is convex if and only if for
all x ∈ C and v ∈ Rn the univariate function gx,v (α) = f (x + αv) is convex
for all α so that x + αv ∈ C. Setting v = y − x we get
gx,v (α) = f ((1 − α)x + αy).
We apply the chain rule to find the first and second derivatives of $g_{x,v}$:
$$g'_{x,v}(\alpha) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x+\alpha v)\,v_i, \qquad g''_{x,v}(\alpha) = \sum_{i=1}^n\sum_{j=1}^n \frac{\partial^2 f}{\partial x_i\,\partial x_j}(x+\alpha v)\,v_i v_j.$$
Setting $\alpha = 0$, we get
$$g''_{x,v}(0) = v^T\,\nabla^2 f(x)\,v \quad\text{with } v = (v_1,\ldots,v_n)^T.$$
So $g''_{x,v}(0)$ is nonnegative for all $v$ if and only if the Hessian matrix $\nabla^2 f(x)$ is positive semidefinite.
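As a numerical illustration of this criterion (a sketch assuming numpy; the function f and the finite-difference Hessian are hypothetical choices, not part of the text), one can test convexity of a smooth function by checking that its Hessian is positive semidefinite at sample points.

    import numpy as np

    def hessian(f, x, h=1e-5):
        # rough numerical Hessian of f at x via central differences
        n = len(x)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
                H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                           - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
        return H

    f = lambda x: np.exp(x[0] + x[1]) + x[0] ** 2      # a smooth convex function
    for _ in range(100):
        x = np.random.randn(2)
        assert np.linalg.eigvalsh(hessian(f, x)).min() > -1e-6   # Hessian is (numerically) psd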
Whereas convexity of a C 2 -function f can be recognized by the positive
semidefiniteness of its Hessian matrix, strict convexity of a C 2 -function can
be seen from the positive definiteness of ∇2 f ; the next corollary follows by
obvious modifications of the proof of Theorem A.2.4.
Corollary A.2.5 Let C ⊆ Rn be an open convex set and let f : C → R be
a function which is twice continuously differentiable. It is a strictly convex
function on C if and only if its Hessian matrix is positive definite for all
x ∈ C.
Using the corollary one sees immediately that the exponential function
f (x) = ex is strictly convex on the real line. From this one deduces the
inequality between the arithmetic mean and the geometric mean, the AM-
GM inequality for short.
$$\Big(\prod_{i=1}^n x_i\Big)^{1/n} \le \frac{1}{n}\sum_{i=1}^n x_i,$$
$$\Big(\prod_{i=1}^n x_i\Big)^{1/n} + \Big(\prod_{i=1}^n y_i\Big)^{1/n} \le \Big(\prod_{i=1}^n (x_i + y_i)\Big)^{1/n},$$
where we have equality if and only if the two vectors $(x_1,\ldots,x_n)$ and $(y_1,\ldots,y_n)$ are linearly dependent.
which is (1 − α)f (x) + αf (y). Also, the supplement about strict concavity
follows from the previous corollary.
exists and is unique is very intuitive, see Figure A.3; in the case when C
is a linear subspace we are talking simply about the orthogonal projection
onto C. We give a proof of this based on the parallelogram law (A.2). This
proof does not only work for Rn , but also for arbitrary, potentially infinite
dimensional, Hilbert spaces.
In the next section we will apply the metric projection for constructing
separating and supporting hyperplanes.
Lemma A.3.1 Let C be a nonempty closed convex set in Rn . Let z ∈
Rn be a point. Then there is a unique point y in C which is closest to z.
Additionally, the vectors z − y and x − y form an obtuse angle whenever
x ∈ C:
(z − y)T (x − y) ≤ 0 for all x ∈ C. (A.7)
Moreover, if z lies outside of C, then y lies on ∂C, the boundary of C.
$$(z - y)^T(x - y) \le \frac{\alpha}{2}\,\|y - x\|^2.$$
and together
$$(-z + y - y')^T(y - y') \le 0.$$
$$\begin{aligned}
\|y - y'\|^2 &= (y - y')^T(y - y')\\
&= (-z + y - y')^T(y - y') + z^T(y - y')\\
&\le z^T(y - y')\\
&= z^T z + z^T(-z + y - y')\\
&= z^T z + (z - y + y')^T(-z + y - y') + (y - y')^T(-z + y - y')\\
&\le \|z\|^2 - \|z - y + y'\|^2\\
&\le \|z\|^2,
\end{aligned}$$
Proof First note that one can assume that C is bounded (since otherwise
replace C by its intersection with a ball of radius 1 around y). Since C is
bounded it is contained in a ball B of sufficiently large radius.
We will construct the desired point z which lies on the boundary ∂B by
a limit argument. For this choose a sequence of points yi ∈ Rn \ C such that
ky−yi k < 1/i. Because the metric projection is a contraction (Lemma A.3.2)
we have
Since C is convex, one of the two points of the line aff{yi , πC (yi )} intersected
with the boundary ∂B is a point zi ∈ ∂B so that πC (zi ) = πC (yi ). Since ∂B
is compact, there is a convergent subsequence (zij ) having a limit z ∈ ∂B.
Then we have, because $\pi_C$ is continuous,
$$y = \pi_C(y) = \pi_C\big(\lim_{j\to\infty} y_{i_j}\big) = \lim_{j\to\infty}\pi_C(y_{i_j}) = \lim_{j\to\infty}\pi_C(z_{i_j}) = \pi_C\big(\lim_{j\to\infty} z_{i_j}\big) = \pi_C(z),$$
Proof We prove (i) and then (ii) follows by the same argument. Let y =
πC (z) and consider the hyperplane
H = {x ∈ Rn : cT x = δ} with c = z − y, δ = cT y.
One can generalize Lemma A.4.1 (i) and remove the assumption that C
is closed.
Proof In view of Lemma A.4.1 we only have to show the statement for
convex sets C which are not closed.
First we argue that with $C$ also its topological closure $\overline{C}$ is convex: Let $x, y \in \overline{C}$ and let $(1-\alpha)x + \alpha y$ with $0 \le \alpha \le 1$ be a point on the line segment $[x, y]$. There are sequences $(x_i)_{i\in\mathbb{N}}$, respectively $(y_i)_{i\in\mathbb{N}}$, of points in $C$ which converge to $x$, respectively to $y$. Then $(1-\alpha)x_i + \alpha y_i$, which lies in $C$, tends to $(1-\alpha)x + \alpha y$ as $i$ tends to infinity. Hence, $(1-\alpha)x + \alpha y \in \overline{C}$.
For proving the lemma we are left with two cases: If $z \notin \overline{C}$, then a hyperplane separating $\{z\}$ and the closed convex set $\overline{C}$ also separates $\{z\}$ and $C$. If $z \in \overline{C}$, then $z \in \partial\overline{C}$ as $z$ lies outside $C$. By Lemma A.4.3 there is a hyperplane supporting $\overline{C}$ at $z$. In particular, it separates $\{z\}$ and $C$.
C − D = {x − y : x ∈ C, y ∈ D}
so δ = cT y = cT z and y, z ∈ Hc,δ .
We proceed to show sufficiency. Let x be an interior point of C. Clearly,
the minimal face FC (x) is contained in C. For the reverse inclusion, consider
a point y ∈ C distinct from x. Since x ∈ int C, Lemma A.2.3 guarantees
that there exists a point z ∈ C and α ∈ (0, 1) so that x = (1 − α)y + αz.
Hence, y ∈ FC (x) because FC (x) is a face.
(ii) follows from (i) by considering the affine hull of FC (x).
We say that a point x ∈ C is an extreme point of C if FC (x) = {x}, that is,
if it is not a relative interior point of any line segment in C. In other words,
if x cannot be written in the form x = (1 − α)y + αz with distinct points
y, z ∈ C and 0 < α < 1. The set of all extreme points of C we denote by
ext C.
Theorem A.5.2 (Minkowski) Let C ⊆ Rn be a compact and convex set.
Then,
C = conv(ext C).
Proof We may assume that the interior of C is not empty by considering
the affine hull of C. We prove the theorem by induction on the dimension
n.
If n = 0, then C is a point and the result follows.
Let the dimension n be at least one. We have to show that every x ∈ C
can be written as the convex hull of extreme points of C. We distinguish
between two cases:
First case: If x lies on the boundary of C, then by Lemma A.4.3 there is a
supporting hyperplane H of C through x. Consider the set F = H ∩ C. This
is a compact and convex set which lies in an affine subspace of dimension at
most n − 1 and hence we have by the induction hypothesis x ∈ conv(ext F ).
Since ext F ⊆ ext C, we are done.
Second case: If x does not lie on the boundary of C, then the intersection
of a line through x with C is a line segment [y, z] with y, z ∈ ∂C. By
the previous argument we have y, z ∈ conv(ext C). Since x is a convex
combination of y and z, the theorem follows.
For example one can specify the set of feasible solutions explicitly by finitely
many constraints
C = {x ∈ Rn : f1 (x) ≤ 0, . . . , fm (x) ≤ 0},
where f1 , . . . , fm are convex functions.
Convex optimization has the attractive feature that every local minimizer
is at the same time a global minimizer: A local minimizer is a point x ∈ C,
a feasible solution, having the property that there is a positive ε so that
f (x) = inf{f (y) : y ∈ C and kx − yk ≤ ε},
and a global minimizer or optimal solution is a point x ∈ C such that
f (x) ≤ f (y) for all y ∈ C.
To see that local optimality implies global optimality assume that x is a
local but not a global minimizer. Then there is a feasible solution z so that
f (z) < f (x). Clearly, kx − zk > ε. Define y, which lies on the line segment
[x, z] by setting
$$y = (1 - \alpha)x + \alpha z, \qquad \alpha = \frac{\varepsilon}{\|x - z\|},$$
which is a feasible solution because of the convexity of C. Then, kx − yk = ε
and by the convexity of the function f inequality
f (y) ≤ (1 − α)f (x) + αf (z) < f (x)
holds. This contradicts the fact that x is a local minimizer.
One way to solve convex optimization problems is by constructing a minimizing sequence $x_0 \in C, x_1 \in C, \ldots$ with $f(x_0) \ge f(x_1) \ge \ldots$, which converges to a local optimum. The computational efficiency of these kinds of methods is determined by how efficiently one can evaluate $f$ and by how efficiently one can represent the set $C$. In particular, the computational complexity of deciding that an intermediate step $x_i$ stays in $C$ plays a decisive role, as discussed in Chapter 3.
Extreme points are important for convex optimization. Suppose the ob-
jective function f is an affine function and suppose that the set of feasible
solutions is a compact and convex set C. Then, one can always find an ex-
treme point of C as a global minimum: Let x ∈ C be a global minimizer. By
Theorem A.5.2 we can write x as a convex combination of extreme points
x1 , . . . , xN ∈ ext C
$$x = \sum_{i=1}^N \alpha_i x_i \quad\text{with } \alpha_1,\ldots,\alpha_N > 0,\ \sum_{i=1}^N \alpha_i = 1.$$
Then
$$f(x) = \sum_{i=1}^N \alpha_i f(x_i) \ge \sum_{i=1}^N \alpha_i f(x) = f(x),$$
P = conv{x1 , . . . , xN }.
It is easy to see that the extreme points of P consist of the minimal subset
of {x1 , . . . , xN } so that its convex hull equals P . Often, the extreme points
of a polytope P are called the vertices of P .
If a set P ⊆ Rn is given as an intersection of finitely many halfspaces,
then it is called a polyhedron, i.e. if there is a matrix A ∈ Rm×n and a vector
b ∈ Rm so that
P = {x ∈ Rn : Ax ≤ b}
Sometimes the equalities that are used to define Az are called the inequalities of the system Ax ≤ b that are active at z.
$A_z c = 0$. Since $a_i^T z < b_i$ for all rows of $A$ which do not belong to $A_z$, there is a $\delta > 0$ with
$$a_i^T(z + \delta c) \le b_i \quad\text{and}\quad a_i^T(z - \delta c) \le b_i.$$
$$b_i = a_i^T z = a_i^T(\alpha x + (1-\alpha)y) = \alpha a_i^T x + (1-\alpha)a_i^T y \le \alpha b_i + (1-\alpha)b_i = b_i.$$
Also $a_i^T x = a_i^T y = b_i$, since $\alpha \in (0,1)$. Because $\operatorname{rank} A_z = n$, the linear system $a_i^T w = b_i$, which consists of rows of $A_z$, has a unique solution, which means $x = z = y$ and $z$ is a vertex of $P$.
A graph G = (V, E) is called bipartite if one can partition the vertex set
V = U ∪ W so that every edge e ∈ E contains exactly one vertex from U
and one from W : |e ∩ U | = |e ∩ W | = 1.
P (G) = {x ∈ RE : x ≥ 0, Ax = e},
Furthermore, the extreme points of P (G) are exactly the incidence vectors
of perfect matchings.
Proof Let B be a t×t submatrix of A. We shall show that det B = −1, 0, +1.
The proof is by induction. The base case t = 1 is trivial. For t > 1 we consider
three cases:
Thus, $\det B = \pm\det B'$ and by the induction hypothesis, $\det B' \in \{-1, 0, +1\}$.
(iii) All columns of $B$ contain exactly two 1s.
Since $G$ is bipartite with bipartition $V = U \cup W$, we can permute the rows of $B$ so that we get the matrix $\binom{B'}{B''}$, where the row indices of $B'$ belong to $U$ and the row indices of $B''$ belong to $W$. Now every column of $B'$ and every column of $B''$ contains exactly one 1. Summing up the rows of $B'$ gives the all-ones vector $(1,\ldots,1)$. The same happens when summing up the rows of $B''$. Hence, the rows of $B$ are linearly dependent and $\det B = 0$.
and n2 edges
E = {{ui , wj } : i, j = 1, . . . , n}
One can identify the spaces RE and Rn×n . Using this identification we see
that the linear conditions of the perfect matching polytope P (Kn,n ) exactly
describe the Birkhoff polytope DSn .
Because every perfect matching in Kn,n is of the form
M = {{ui , wπ(i) } : i = 1, . . . , n}
for some permutation π ∈ Sn , there is a one-to-one correspondence between
the perfect matchings in Kn,n and the set of permutations Sn . These obser-
vations together with Theorem A.7.3 prove the theorem.
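The correspondence between doubly stochastic matrices and convex combinations of permutation matrices can also be made algorithmic. The following sketch (assuming numpy and scipy; the greedy peeling strategy is a standard argument, not the proof given above) decomposes a doubly stochastic matrix into permutation matrices.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def birkhoff_decomposition(D, tol=1e-9):
        # write a doubly stochastic matrix D as a convex combination of permutation matrices
        D = D.copy()
        terms = []
        while D.max() > tol:
            cost = np.where(D > tol, 0.0, 1.0)           # forbid (near-)zero entries
            rows, cols = linear_sum_assignment(cost)     # a perfect matching inside the support
            P = np.zeros_like(D)
            P[rows, cols] = 1.0
            alpha = D[rows, cols].min()
            terms.append((alpha, P))
            D = D - alpha * P
        return terms

    rng = np.random.default_rng(1)
    n = 4
    D = sum(np.eye(n)[rng.permutation(n)] for _ in range(6)) / 6   # a doubly stochastic matrix
    terms = birkhoff_decomposition(D)
    print(sum(alpha for alpha, _ in terms))   # the weights sum to 1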
überhaupt der Begriff des konvexen Körpers ein fundamentaler Begriff in unserer
Wissenschaft ist und zu deren fruchtbarsten Forschungsmitteln gehört.
Ein konvexer (nirgends konkaver) Körper ist nach Minkowski als ein solcher
Körper definiert, der die Eigenschaft hat, daß, wenn man zwei seiner Punkte
in Auge faßt, auch die ganze geradlinige Strecke zwischen denselben zu dem
Körper gehört.3
Until the end of the 1940s convex geometry was a small discipline in pure
mathematics. This changed dramatically when the breakthrough of general
linear programming came during and shortly after World War II. Leonid
Kantorovich (1912–1986), John von Neumann, and George Dantzig (1914–
2005) are the founding fathers of the theory of linear programming. Nowa-
days, convex geometry is an important toolbox for researchers, algorithm
designers and practitioners in mathematical optimization.
Two very good books which emphasize the relation between convex ge-
ometry and optimization are by Barvinok [2002] and by Gruber [2007]. Less
optimization but more convex geometry is discussed in the encyclopedic
book by Schneider [1993]. Somewhat exceptional, and fun to read, is Chap-
ter VII in the book of Berger [2010] where he gives a panoramic view on
the concept of convexity and its many relations to modern higher geometry.
One should also mention the classical study on convex analysis by Rockafel-
lar [1970]. Boyd and Vandenberghe [2004] provide an excellent starting point
for learning about convex optimization. For more on polytopes we advise taking a look at the Lectures on Polytopes by Ziegler [1995].
Exercises
A.1 Let A = {x1 , . . . , xn+2 } be a set containing n + 2 points in Rn .
(a) Show: One can partition A into two sets A1 and A2 such that their
convex hulls intersect: conv A1 ∩ conv A2 6= ∅.
(b) Show: If any proper subset of A is affinely independent, then there
is exactly one possible choice for the partition in (a).
(c) Give an example which shows that the statement in (a) is wrong when
A only contains n + 1 points.
3 It is not easy to translate Hilbert’s praise into English without losing its poetic tone, but here
is an attempt. This proof of a deep theorem in number theory contains little calculation.
Using chiefly geometry, it is a gem of Minkowski’s mathematical craft. With a generalization
to forms having n variables, Minkowski’s proof led to an upper bound M which is more
natural and also much smaller than the bound due to Hermite. More important than the
result itself was his insight, namely that the only salient features of ellipsoids used in the
proof were that ellipsoids are convex and have a center, thereby showing that the proof could
be immediately generalized to arbitrary convex bodies having a center. This circumstance
led Minkowski for the first time to the insight that the notion of a convex body is a
fundamental notion in our science and belongs to its most fruitful research tools.
Minkowski defines a convex (nowhere concave) body as one having the property that, when
one looks at two of its points, the straight line segment joining them entirely belongs to the
body.
A.2 Prove the Gauss-Lucas theorem4 : Let f be a complex polynomial in
one variable and let z1 , . . . , zn ∈ C be the roots of f , i.e.
f (z) = (z − z1 )(z − z2 ) · · · (z − zn ).
Show that every root of the derivative f′ lies in the convex hull of
z1, . . . , zn, where one interprets the complex plane C as R².
Hint for n = 2 (but it works also for larger n): For w ∈ C with
f′(w) = 0 we have
0 = (w − z1) + (w − z2).
Multiply this equation by the complex conjugate of (w − z1)(w − z2) and
use it to show that w ∈ conv{z1, z2}.
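Before proving the statement it can be instructive to test it numerically. The sketch below (with an arbitrarily chosen set of roots) computes the critical points of f with NumPy and checks membership in the convex hull by solving a small feasibility LP with SciPy:

import numpy as np
from scipy.optimize import linprog

z = np.array([1 + 2j, -3 + 0.5j, 2 - 1j, -1 - 2j])   # roots of f (chosen arbitrarily)
coeffs = np.poly(z)                                   # coefficients of f
crit = np.roots(np.polyder(coeffs))                   # roots of the derivative f'

def in_convex_hull(w, pts):
    # Feasibility LP: find lambda >= 0 with sum(lambda) = 1 and sum lambda_i pts_i = w.
    n = len(pts)
    A_eq = np.vstack([pts.real, pts.imag, np.ones(n)])
    b_eq = np.array([w.real, w.imag, 1.0])
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0

print(all(in_convex_hull(w, z) for w in crit))        # expected output: True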
A.3 Prove the converse of Lemma A.2.3: Let C ⊆ Rn be a convex set and
let x be a point lying in C. Suppose that for every y ∈ C there is a
point z ∈ C so that
x = (1 − α)y + αz with α ∈ (0, 1).
Then x is an interior point of C.
A.4 Give a proof for the following statement: Let C ⊆ Rn be a closed
convex set and let x ∈ Rn \ C be a point lying outside of C. A separating
hyperplane H is defined in Lemma A.4.1. Consider a point y on the line
aff{x, πC (x)} which lies on the same side of the separating hyperplane
H as x. Then, πC (x) = πC (y).
A.5 Show that
CPn = { Σ_{i=1}^{N} αi xi xi^T : N ∈ N, αi ∈ R+, xi ∈ R^n_+ (i = 1, . . . , N) }
holds.
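The following sketch builds a matrix of this form from randomly chosen nonnegative weights αi and entrywise nonnegative vectors xi, and confirms that the result is both positive semidefinite and entrywise nonnegative (two necessary conditions for membership in CPn):

import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 6
alphas = rng.random(N)          # nonnegative scalars alpha_i
xs = rng.random((N, n))         # entrywise nonnegative vectors x_i

# X = sum_i alpha_i x_i x_i^T is completely positive by construction.
X = sum(a * np.outer(x, x) for a, x in zip(alphas, xs))

print(np.all(np.linalg.eigvalsh(X) >= -1e-12))   # X is positive semidefinite
print(np.all(X >= 0))                            # X is entrywise nonnegative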
(ii) =⇒ (iii): By assumption, X has a decomposition (B.1) where all scalars
λi are nonnegative. Define the matrix L ∈ R^{n×n} whose i-th column is
the vector √λi ui. Then X = LL^T holds.
(iii) =⇒ (iv): Assume X = LLT where L ∈ Rn×k . Let vi ∈ Rk denote the
i-th row of L. The equality X = LLT gives directly that Xij = viT vj for all
i, j ∈ [n].
(iv) =⇒ (i): Assume Xij = viT vj for all i, j ∈ [n], where v1 , . . . , vn ∈ Rk .
For x ∈ Rn we have
x^T X x = Σ_{i,j=1}^{n} xi xj Xij = Σ_{i,j=1}^{n} xi xj vi^T vj = ‖ Σ_{i=1}^{n} xi vi ‖² ≥ 0.
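The chain of implications can be mirrored computationally. The sketch below (for a randomly generated positive semidefinite X) builds L from the spectral decomposition as in (ii) =⇒ (iii), with i-th column √λi ui, and checks both the factorization X = LL^T and the Gram representation Xij = vi^T vj, where vi is the i-th row of L:

import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
X = B @ B.T                      # a positive semidefinite matrix

lam, U = np.linalg.eigh(X)       # eigenvalues lam[i] with eigenvectors U[:, i]
lam = np.clip(lam, 0, None)      # guard against tiny negative round-off
L = U * np.sqrt(lam)             # column i of L is sqrt(lam[i]) * u_i
print(np.allclose(X, L @ L.T))   # (ii) => (iii): X = L L^T

i, j = 1, 3
print(np.isclose(X[i, j], L[i] @ L[j]))   # (iv): X_ij = v_i^T v_j for rows v_i of L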
In other words, this is the usual Euclidean norm, just viewing a matrix
as a vector in R^{n²}. Therefore, the Cauchy-Schwarz inequality holds for the
Frobenius norm:
|⟨X, Y⟩| ≤ ‖X‖F · ‖Y‖F,
with equality if and only if X and Y are linearly dependent.
For a vector x ∈ Rn we have
⟨X, xx^T⟩ = x^T X x.
If λ is an eigenvalue of X and x is a corresponding eigenvector of unit length,
then we can bound the modulus of λ by the Frobenius norm of X using the
Cauchy-Schwarz inequality:
|λ| = |x^T X x| = |⟨X, xx^T⟩| ≤ ‖X‖F.
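A quick numerical check of this bound, for a randomly generated symmetric matrix:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
X = (A + A.T) / 2                          # a symmetric matrix

fro = np.linalg.norm(X, "fro")             # Frobenius norm ||X||_F
eigs = np.linalg.eigvalsh(X)
print(np.max(np.abs(eigs)) <= fro + 1e-12) # |lambda| <= ||X||_F for every eigenvalue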
B.1.3 The positive semidefinite cone S^n_+
Definition B.1.6 We let
S^n_+ = {X ∈ S^n : X ⪰ 0}
denote the set of all positive semidefinite matrices in S^n, called the positive
semidefinite cone. We denote the set of all positive definite matrices by S^n_{++}.
cone. Its interior is exactly the convex cone of positive definite matrices
S^n_{++}.
Proof Indeed, S^n_+ is a convex cone in S^n, that is, it is closed under taking
X ⪰ 0 ⇐⇒ P X P^T ⪰ 0.
X ≻ 0 ⇐⇒ A ≻ 0 and C − B^T A^{−1} B ≻ 0,
X ⪰ 0 ⇐⇒ A ≻ 0 and C − B^T A^{−1} B ⪰ 0.
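As a numerical sanity check of this criterion, the sketch below generates a positive definite matrix X, partitions it as a 2 × 2 block matrix with diagonal blocks A, C and off-diagonal block B (the partition assumed from the surrounding lemma), and verifies that A and the Schur complement C − B^T A^{−1} B are positive definite as well:

import numpy as np

rng = np.random.default_rng(3)
p, q = 3, 2
M = rng.standard_normal((p + q, p + q))
X = M @ M.T + 1e-3 * np.eye(p + q)         # a positive definite matrix

A, B, C = X[:p, :p], X[:p, p:], X[p:, p:]  # block partition of X
S = C - B.T @ np.linalg.solve(A, B)        # Schur complement C - B^T A^{-1} B

def is_pd(Y, tol=1e-10):
    return np.all(np.linalg.eigvalsh(Y) > tol)

print(is_pd(X), is_pd(A), is_pd(S))        # expected output: True True True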
of eigenvectors of X. Then,
Xx = Σ_{i=1}^{n} λi xi ui and x^T X x = Σ_{i=1}^{n} λi xi².
Hence,
0 = x^T X x =⇒ 0 = Σ_{i=1}^{n} λi xi² =⇒ xi = 0 if λi > 0.
by
          ( A11 B  · · ·  A1m B )
          ( A21 B  · · ·  A2m B )
A ⊗ B =   (   ⋮      ⋱      ⋮  )
          ( An1 B  · · ·  Anm B ).
Here are some (easy to verify) facts about these products, where the ma-
trices and vectors have the appropriate sizes.
A, B ⪰ 0 =⇒ A ⊗ B ⪰ 0 and A ◦ B ⪰ 0.
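A quick numerical check of this fact, using NumPy's Kronecker product and the entrywise (Hadamard) product:

import numpy as np

rng = np.random.default_rng(4)

def random_psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T

def is_psd(Y, tol=1e-10):
    return np.all(np.linalg.eigvalsh(Y) >= -tol)

A, B = random_psd(3), random_psd(3)
print(is_psd(np.kron(A, B)), is_psd(A * B))   # expected output: True True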
Exercises
B.1 Show: int S^n_+ = S^n_{++}.
B.2 Recall that a complex square matrix A ∈ C^{n×n} is Hermitian (or self-
adjoint) if A = A∗, i.e., Aij = Āji for all entries of A. The Hermitian
matrices form a real vector space (of dimension n²), with the Frobenius
inner product
⟨A, B⟩ = Σ_{i,j} Āij Bij = Tr(A∗B).
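A small numerical check that this inner product is real-valued on Hermitian matrices and agrees with the entrywise formula:

import numpy as np

rng = np.random.default_rng(5)

def random_hermitian(n):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(3), random_hermitian(3)
inner = np.trace(A.conj().T @ B)                 # Tr(A* B)
print(np.isclose(inner.imag, 0))                 # real-valued on Hermitian matrices
print(np.isclose(inner, np.sum(A.conj() * B)))   # equals sum_ij conj(A_ij) B_ij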
[2003] M. Aigner, G.M. Ziegler. Proofs from The Book, Springer, Berlin, 1998.
[1965] N.I. Akhiezer, The classical moment problem, Hafner, New York, 1965.
[2006] N. Alon, K. Makarychev, Y. Makarychev, A. Naor, Quadratic forms on
graphs, Inventiones Mathematicae 163 (2006) 499–522.
[2006] N. Alon, A. Naor, Approximating the cut-norm via Grothendieck’s inequal-
ity, SIAM Journal on Computing 35 (2006) 787–803.
[2012] M.F. Anjos and J.B. Lasserre (eds), Handbook on Semidefinite, Conic and
Polynomial Optimization [International Series in Operations Research &
Management Science, Volume 166], Springer, New York, 2012, pp. 25–60.
[2000] K. Anstreicher, H. Wolkowicz, On Lagrangian relaxation of quadratic ma-
trix constraints, SIAM Journal on Matrix Analysis and Applications 22
(2000) 41–55.
[1977] K. Appel, W. Haken, Every planar map is four colorable. I. Discharging,
Illinois Journal of Mathematics 21 (1977) 429–490.
[1977] K. Appel, W. Haken, J. Koch, Every planar map is four colorable. II.
Reducibility. Illinois Journal of Mathematics 21 (1977) 491–567.
[2009] S. Arora, B. Barak, Computational Complexity — A Modern Approach,
Cambridge University Press, Cambridge, 2009.
[1927] E. Artin, Ueber die Zerlegung definiter Funktionen in Quadrate, Abh. Math.
Sem. Hamburg 5, 1927.
[2008] C. Bachoc, F. Vallentin, New upper bounds for kissing numbers from
semidefinite programming, Journal of the American Mathematical Society
21 (2008) 909–924.
[2012] C. Bachoc, D.C. Gijswijt, A. Schrijver, F. Vallentin, Invariant semidefinite
programs, in: Handbook on Semidefinite, Conic and Polynomial Optimiza-
tion (M.F. Anjos, J.B. Lasserre, eds.) [International Series in Operations
Research & Management Science 166], Springer, New York, 2012, pp. 219–
269.
[1992] K. Ball, Ellipsoids of maximal volume in convex bodies, Geometriae Dedi-
cata 41 (1992) 241–250.
[1997] K. Ball, An Elementary Introduction to Modern Convex Geometry, in:
Flavors of Geometry (S. Levy, ed.) [MSRI Publications 31], Cambridge
University Press, Cambridge, 1997, pp. 1–58.
[1983] F. Barahona, The max-cut problem in graphs not contractible to K5 , Op-
erations Research Letters 2 (1983) 107–111.
[2021] S.M. Cioabă, H. Gupta, F. Ihringer, H. Kurihara, The least Euclidean dis-
tortion constant of a distance-regular graph, arXiv:2109.09708 [math.CO],
2021
[2010] H. Cohn, Order and disorder in energy minimization, in: Proceedings of the
International Congress of Mathematicians, Hindustan Book Agency, New
Delhi, 2010, pp. 2416–2443.
[1992] D.A. Cox, J.B. Little, D. O’Shea. Ideals, Varieties and Algorithms — An
Introduction to Computational Algebraic Geometry and Commutative Al-
gebra [Undergraduate Texts in Mathematics], Springer, New York, 1992.
[1998] D.A. Cox, J.B. Little, D. O’Shea. Using Algebraic Geometry [Graduate
Texts in Mathematics 185], Springer, New York, 1998.
[1996] R.E. Curto, L.A. Fialkow, Solution of the truncated complex moment prob-
lem for flat data, Memoirs of the American Mathematical Society 119
(1996) x+52 pp.
[1957] C. Davis, All convex invariant functions of hermitian matrices, Archiv der
Mathematik 8 (1957) 276–278.
[1973] P. Delsarte, An Algebraic Approach to the Association Schemes of Cod-
ing Theory [Philips Research Reports Supplements 1973 No. 10], Philips
Research Laboratories, Eindhoven, 1973.
[1997] M.M. Deza, M. Laurent, Geometry of Cuts and Metrics [Algorithms and
Combinatorics 15], Springer, Berlin, 1997.
[1997] R. Diestel, Graph Theory [Graduate Texts in Mathematics 173], Springer,
New York, 1997.
[1956] R.J. Duffin, Infinite programs, in: Linear Inequalities and Related Sys-
tems (H.W. Kuhn, A.W. Tucker, eds.) [Annals of Mathematics Studies
38], Princeton University Press, Princeton, New Jersey, 1956, pp. 157–170.
[1969] P. Enflo, On the nonexistence of uniform homeomorphisms between Lp -
spaces, Arkiv för Matematik 8 (1969), 103–105.
[2016] H. Fawzi, J. Saunderson, P. Parrilo, Sparse sums of squares on finite abelian
groups and improved semidefinite lifts, Mathematical Programming, Series
A 160(1) (2016) 149–191.
[1968] A.V. Fiacco, G.P. McCormick, Nonlinear Programming: Sequential Uncon-
strained Minimization Techniques, Wiley, New York-London-Sydney, 1968.
[1979] M.R. Garey, D.S. Johnson, Computers and Intractability — A Guide to the
Theory of NP-Completeness, Freeman, San Francisco, California, 1979.
[1976] M.R. Garey, D.S. Johnson, L. Stockmeyer, Some simplified NP-complete
problems, Theoretical Computer Science 1 (1976) 237–267.
[1996] G.S. Gasparian, Minimal imperfect graphs: a simple approach, Combina-
torica 16 (1996) 209–212.
[2005] D. Gijswijt, Matrix algebras and semidefinite programming techniques for
codes, Ph.D. thesis, University of Amsterdam, 2005.
[1997] M.X. Goemans, Semidefinite programming in combinatorial optimization,
Mathematical Programming, Series B 79 (1997), 143–161.
[1995] M.X. Goemans, D.P. Williamson, Improved approximation algorithms for
maximum cuts and satisfiability problems using semidefinite programming,
Journal of the Association for Computing Machinery 42 (1995) 1115–1145.
[2008] G. Gonthier, Formal proof—the four-color theorem, Notices of the Ameri-
can Mathematical Society 55 (2008) 1382–1393.
[1994] S. Poljak, Z. Tuza, The expected relative error of the polyhedral approx-
imation of the max-cut problem, Operations Research Letters 16 (1994)
191–198.
[1998] V. Powers, T. Wörmann, An algorithm for sums of squares of real polyno-
mials, Journal of Pure and Applied Algebra 127 (1998) 99–104.
[2001] A. Prestel, C.N. Delzell, Positive Polynomials — From Hilbert’s 17th Prob-
lem to Real Algebra [Springer Monographs in Mathematics], Springer,
Berlin, 2001.
[1993] M. Putinar, Positive polynomials on compact semi-algebraic sets, Indiana
University Mathematics Journal 42 (1993) 969–984.
[1997] M. Ramana, An exact duality theory for semidefinite programming and its
complexity implications, Mathematical Programming, Series B 77 (1997)
129–162.
[1999] S. Rao, Small distortion and volume preserving embeddings for planar and
Euclidean metrics, in: Proceedings of the 15th Annual Symposium on Com-
putational Geometry (Miami Beach, FL, 1999), ACM, New York, 1999, pp.
300–306.
[2001] J. Renegar, A Mathematical View of Interior-Point Methods in Convex
Optimization [MPS/SIAM Series on Optimization], Society for Industrial
and Applied Mathematics, Philadelphia, Pennsylvania; Mathematical Pro-
gramming Society, Philadelphia, Pennsylvania, 2001.
[1995] B. Reznick, Uniform denominators in Hilbert’s Seventeenth Problem,
Mathematische Zeitschrift 220 (1995) 75–97.
[2000] B. Reznick, Some concrete aspects of Hilbert’s 17th problem, in: Real Al-
gebraic Geometry and Ordered Structures (Baton Rouge, Louisiana, 1996;
C.N. Delzell, J.J. Madden, eds.) [Contemporary Mathematics 253], Amer-
ican Mathematical Society, Providence, Rhode Island, 2000, pp. 251–272.
[1997] N. Robertson, D.P. Sanders, P. Seymour, R. Thomas. The four-colour the-
orem, Journal of Combinatorial Theory Series B 70 (1997) 2–44.
[1970] R.T. Rockafellar, Convex analysis [Princeton Mathematical Series 28],
Princeton University Press, Princeton, New Jersey, 1970.
[1997] C. Roos, T. Terlaky, J.-Ph. Vial, Theory and Algorithms for Linear Opti-
mization — An Interior Point Approach [Wiley-Interscience Series in Dis-
crete Mathematics and Optimization], Wiley, Chichester, 1997.
[2017] S. Sakaue, A. Takeda, S. Kim, N. Ito, Exact semidefinite programming re-
laxations with truncated moment matrix for binary polynomial optimiza-
tion, SIAM Journal on Optimization 27(1) (2017) 565–582.
[2004] P. Sarnak, What is . . . an Expander?, Notices of the American Mathematical
Society 51 (2004) 762–763.
[1979] J.B. Saxe, Embeddability of weighted graphs in k-space is strongly NP-
hard, in: Proceedings of Seventeenth Allerton Conference in Communica-
tions, Control and Computing, University of Illinois, Urbana-Champaign,
Illinois, 1979, pp. 480–489.
[1991] K. Schmüdgen, The K-moment problem for compact semi-algebraic sets,
Mathematische Annalen 289 (1991), 203–206.
[2017] K. Schmüdgen, The moment problem, Graduate Texts in Mathematics,
Springer, 2017.
[1993] R. Schneider, Convex bodies: the Brunn-Minkowski theory [Encyclopedia of
Mathematics and its Applications 44], Cambridge University Press, Cam-
bridge, 1993.
[1979] A. Schrijver, A comparison of the Delsarte and Lovász bounds, IEEE Trans-
actions on Information Theory IT-25 (1979), 425–429.
[1986] A. Schrijver, Theory of Linear and Integer Programming [Wiley-
Interscience Series in Discrete Mathematics], Wiley, Chichester, 1986.
[2005] A. Schrijver, New code upper bounds from the Terwilliger algebra and
semidefinite programming, IEEE Transactions on Information Theory IT-
51 (2005) 2859–2866.
[1990] H.D. Sherali, W.P. Adams, A hierarchy of relaxations between the contin-
uous and convex hull representations for zero-one programming problems,
SIAM Journal Discrete Mathematics 3 (1990) 411–430.
[1987] N.Z. Shor, An approach to obtaining global extremums in polynomial math-
ematical programming problems, Kibernetika 5 (1987) 102–106.
[1974] G. Stengle, A Nullstellensatz and a Positivstellensatz in semialgebraic ge-
ometry, Mathematische Annalen 207 (1974) 87–97.
[2001] M.J. Todd, Semidefinite optimization, Acta Numerica 10 (2001) 515–560.
[2012] L. Trevisan, On Khot’s unique games conjecture, American Mathematical
Society, Bulletin, New Series 49 (2012) 91–111.
[2005] M. Trnovská, Strong duality conditions in semidefinite programming, Jour-
nal of Electrical Engineering 56 (2005) 1–5.
[2010] L. Tunçel, Polyhedral and Semidefinite Programming Methods in Combi-
natorial Optimization [Fields Institute Monographs 27], American Mathe-
matical Society, Providence, Rhode Island; Fields Institute for Research in
Mathematical Sciences, Toronto, Ontario, 2010.
[2008] F. Vallentin, Lecture notes: Semidefinite programs and harmonic analysis,
arXiv:0809.2017 [math.OC], 2008.
[2008] F. Vallentin, Optimal distortion embeddings of distance regular graphs into
Euclidean spaces, Journal of Combinatorial Theory, Series B 98 (2008),
95–104.
[1996] L. Vandenberghe, S. Boyd, Semidefinite programming, SIAM Review 38
(1996), 49–95.
[1998] L. Vandenberghe, S. Boyd, S.-P. Wu, Determinant maximization with lin-
ear matrix inequality constraints, SIAM Journal on Matrix Analysis and
Applications 19 (1998), 499–533.
[2000] H. Wolkowicz, R. Saigal, L. Vandenberghe (eds.), Handbook of Semidefinite
Programming, Kluwer Academic, Boston, 2000.
[2005] M.H. Wright, The interior-point revolution in optimization: history, recent
developments, and lasting consequence, American Mathematical Society,
Bulletin, New Series 42 (2005) 39–56.
[1997] Y. Ye, Interior Point Algorithms — Theory and Analysis [Wiley-
Interscience Series in Discrete Mathematics and Optimization], Wiley, New
York, 1997.
[1995] G.M. Ziegler, Lectures on Polytopes [Graduate Texts in Mathematics 152],
Springer, New York, 1995.