
A Course on Semidefinite Optimization

Draft Lecture Notes, Autumn 2020


May 24, 2022

Monique Laurent

Centrum Wiskunde & Informatica


Science Park 123
1098 XG Amsterdam
The Netherlands
[email protected]

Frank Vallentin

Department of Mathematics and Computer Science


Division of Mathematics
Universität zu Köln
Weyertal 86–90
50931 Köln
Germany
[email protected]
Contents

PART ONE BASIC THEORY page 1


1 Semidefinite programs: Basic facts and examples 3
1.1 Primal and dual semidefinite programs 4
1.1.1 Primal form 4
1.1.2 Dual form 5
1.2 Eigenvalue optimization 7
1.3 Hoffman-Wielandt inequality 10
1.3.1 Semidefinite reformulation 10
1.3.2 Application to the quadratic assignment problem 13
1.4 Convex quadratic constraints 14
1.5 Robust optimization 16
1.6 Examples in combinatorial optimization 17
1.6.1 The maximum independent set problem 18
1.6.2 The maximum cut problem 18
1.7 Examples in geometry 21
1.8 Examples in algebra 22
1.9 Further reading 23
Exercises 25
2 Duality in conic programming (Version: May 24, 2022) 28
2.1 Convex cones and their duals 30
2.2 Examples of convex cones 32
2.2.1 The non-negative orthant 33
2.2.2 The second-order cone 33
2.2.3 The cone of positive semidefinite matrices 34
2.2.4 The completely positive and the copositive cone 34

2.3 Primal and dual conic programs 35


2.3.1 Example: Linear programming (LP) 36
2.3.2 Example: Conic quadratic programming (CQP) 36
2.3.3 Example: Semidefinite programming (SDP) 37
2.3.4 Example: Copositive programming 38
2.4 Duality theory 39
2.5 Some pathological examples 42
2.5.1 Dual infimum not attained 43
2.5.2 Positive duality gap 43
2.5.3 Both primal and dual infeasible 44
2.6 Theorems of alternatives 44
2.7 More differences between linear and semidefinite programming 48
2.8 Further reading 50
2.9 Historical remarks 50
Exercises 51
3 The ellipsoid method 53
3.1 Ellipsoids 55
3.1.1 Definitions 55
3.1.2 Loewner-John ellipsoids 57
3.2 Platonic version of the ellipsoid method 58
3.3 Separation and optimization 61
3.4 Separation for semidefinite programming: Gaussian elimination 66
3.5 Further reading 68
3.6 Historical remarks 69
Exercises 70

PART TWO COMBINATORIAL OPTIMIZATION 71


4 Graph coloring and independent sets 73
4.1 Preliminaries on graphs 73
4.1.1 Stability and chromatic numbers 73
4.1.2 Perfect graphs 75
4.1.3 The perfect graph theorem 76
4.2 Linear programming bounds 77
4.2.1 Fractional stable sets and colorings 77
4.2.2 Polyhedral characterization of perfect graphs 79
4.3 Semidefinite programming bounds 82

4.3.1 The theta number 82


4.3.2 Computing maximum stable sets in perfect graphs 83
4.3.3 Minimum colorings of perfect graphs 84
4.4 Other formulations of the theta number 85
4.4.1 Dual formulation 85
4.4.2 Two more (lifted) formulations 86
4.4.3 Hoffman eigenvalue bound for coloring 88
4.5 Geometric properties of the theta number 89
4.5.1 The theta body TH(G) 89
4.5.2 Orthonormal representations of graphs 90
4.5.3 Geometric properties of the theta body 91
4.5.4 Geometric characterizations of perfect graphs 92
4.6 Bounding the Shannon capacity 94
4.7 The theta number for vertex-transitive graphs 96
4.8 Application to Hamming graphs: Delsarte LP bound for codes 99
4.8.1 Automorphisms of the Hamming graph 101
4.8.2 Invariant matrices under action of Gn 101
4.8.3 Delsarte linear programming bound for A(n, d) 102
4.9 Lasserre hierarchy of semidefinite bounds 103
4.9.1 Characterizing positive semidefinite moment matrices M_n(y) 105
4.9.2 Canonical lifted representation for 0 − 1 polytopes 106
4.9.3 Convergence of the Lasserre hierarchy to α(G) 108
4.10 Notes and further reading 109
Exercises 110
5 Approximating the MAX CUT problem 114
5.1 Introduction 114
5.1.1 The MAX CUT problem 114
5.1.2 Linear programming relaxation 116
5.2 The algorithm of Goemans and Williamson 119
5.2.1 Semidefinite programming relaxation 119
5.2.2 The Goemans-Williamson algorithm 121
5.2.3 Remarks on the algorithm 124
5.3 Some extensions 124
5.3.1 Reformulating MAX CUT using the Laplacian matrix 124
5.3.2 Nesterov’s approximation algorithm 125
5.3.3 Quadratic programs modeling MAX 2SAT 127

5.3.4 Approximating MAX 2-SAT 128


5.3.5 Grothendieck’s inequality 129
5.4 Further reading and remarks 132
Exercises 134
6 Generalizations of Grothendieck's inequality and applications 139
6.1 The Grothendieck constant of a graph 141
6.1.1 Randomized rounding by truncating 142
6.1.2 Quality of expected solution 143
6.1.3 A useful lemma 143
6.1.4 Estimating A and B in the useful lemma 144
6.1.5 Applying the useful lemma 145
6.1.6 Connection to the theta number 145
6.2 Higher rank Grothendieck inequality 146
6.2.1 Randomized rounding by projecting 147
6.2.2 Extension of Grothendieck’s identity 147
6.2.3 Proof of the theorem 149
6.3 Further reading 149
Exercises 150

PART THREE GEOMETRY 151


7 Optimizing with ellipsoids and determinants (Version: May 24, 2022) 153
7.1 Convex spectral functions 153
7.1.1 Davis’ characterization of convex spectral functions 153
7.1.2 Minkowski’s determinant inequality 157
7.2 Determinant maximization problems 159
7.3 Approximating polytopes by ellipsoids 163
7.3.1 Outer approximation 164
7.3.2 Inner approximation 167
7.3.3 The Löwner-John ellipsoids 168
7.4 Further reading 172
Exercises 173
8 Euclidean embeddings: Low dimension 175
8.1 Geometry of the positive semidefinite cone 176
8.1.1 Faces of convex sets 176
8.1.2 Faces of the positive semidefinite cone 176
8.1.3 Faces of spectrahedra 178

8.1.4 Finding an extreme point in a spectrahedron 181


8.1.5 A refined bound on ranks of extreme points 182
8.2 Applications 184
8.2.1 Euclidean realizations of graphs 184
8.2.2 Hidden convexity results for quadratic maps 186
8.2.3 The S-Lemma 189
8.3 Notes and further reading 190
Exercises 191
9 Euclidean embeddings: Low distortion (Version: May 24, 2022) 193
9.1 Motivation 193
9.2 Least distortion Euclidean embeddings via semidefinite
optimization 196
9.3 Least distortion embeddings of cubes 200
9.4 Least distortion embeddings of strongly regular graphs 202
9.5 Least distortion embeddings of expander graphs 208
9.6 Notes and further reading 211
Exercises 212
10 Packings on the sphere 214
10.1 α and ϑ for packing graphs 215
10.2 Symmetry reduction 217
10.3 Schoenberg’s theorem 219
10.4 Proof of Schoenberg’s theorem 220
10.4.1 Orthogonality relation 220
10.4.2 Positive semidefiniteness 221
10.4.3 End of proof 222
10.5 Delsarte’s LP method 224
10.6 τ8 equals 240 225
10.7 The problem of the thirteen spheres 226
10.7.1 Reminder: Some trigonometry and stereometry 226
10.7.2 Construction of a semidefinite matrix 227
10.7.3 Upper bounds for spherical codes 229
10.7.4 Application to the kissing number 231
10.8 Further reading 235
Exercises 235

PART FOUR ALGEBRA 239



11 Sums of Squares of Polynomials 241


11.1 Sums of squares of polynomials 242
11.1.1 Polynomial optimization 243
11.1.2 Hilbert’s theorem 245
11.1.3 Are sums of squares a rare event? 247
11.1.4 Artin’s theorem 248
11.2 Positivity certificates 249
11.2.1 The univariate case 251
11.2.2 Krivine’s Positivstellensatz 252
11.2.3 Schmüdgen’s Positivstellensatz 254
11.2.4 Putinar’s Positivstellensatz 254
11.2.5 Proof of Putinar’s Positivstellensatz 256
11.3 Notes and further reading 260
Exercises 261
12 Moment matrices and polynomial equations 264
12.1 The polynomial algebra R[x] 265
12.1.1 (Real) radical ideals and the (real) Nullstellensatz 265
12.1.2 The quotient algebra K[x]/I 268
12.1.3 The eigenvalue method for complex roots 271
12.2 Characterizing the convex set C∞ (K) 274
12.2.1 The moment problem 275
12.2.2 Moment matrices 275
12.2.3 Some results on the moment problem 280
12.2.4 Finite rank positive semidefinite moment matrices 281
12.2.5 Moment relaxation for polynomial optimization 283
12.3 Notes and further reading 284
Exercises 285
13 Polynomial optimization 287
13.1 Asymptotic convergence 290
13.2 Dual semidefinite programs 290
13.3 Strong duality 293
13.4 Flat extensions of moment matrices 296
13.5 Finite convergence and global minimizers 302
13.6 Real solutions of polynomial equations 307
13.7 Binary polynomial optimization 310
13.7.1 Revisiting Lasserre's hierarchy for the stable set problem 310
13.7.2 Lasserre’s hierarchy for the maximum cut problem 312
13.7.3 Exploiting equations 314

13.8 Notes and further reading 316


Exercises 317
14 Noncommutative polynomial optimization and applications to quantum information theory 320
14.1 Noncommutative polynomial optimization 321
14.1.1 Notation and preliminaries 321
14.1.2 The NC polynomial optimization problem 323
14.1.3 Application to NC optimization over the ball 330
14.1.4 NC optimization over the discrete hypercube 333
14.1.5 Reduction from complex to real 334
14.2 Application to quantum information 334
14.2.1 Quantum correlations 334
14.2.2 Nonlocal games 334
14.2.3 XOR games 334
14.3 Notes and further reading 334
14.4 Exercises 334
References 335
15 Symmetries 336
15.1 Symmetry reduction and matrix ∗ algebras 336
15.1.1 Complex semidefinite programs 336
15.1.2 Invariant semidefinite programs 337
15.1.3 Matrix ∗-algebras 338
15.1.4 Example: Delsarte Linear Programming Bound 339
15.2 Fourier analysis on finite abelian groups 341
15.2.1 Discrete Fourier transform 341
15.2.2 Bochner’s characterization 341
15.2.3 Example: Cycle graphs 345
15.2.4 Example: SDP lifts of cyclic polytopes 345
15.3 Fourier analysis of finite Nonabelian groups 345
15.3.1 Theory 345
15.3.2 Bochner’s theorem 346
15.3.3 Example: The ϑ-number of a Cayley graph 347
15.3.4 Blowing up vertex transitive graphs 350
15.3.5 Example: Product free sets 351
15.4 Notes 351
Exercises 351
Appendix A Convexity (Version: May 24, 2022) 353
A.1 Preliminaries and conventions 354
A.1.1 Euclidean space 354

A.1.2 Topology 356


A.1.3 Affine geometry 356
A.2 Convex sets and convex functions 358
A.2.1 Convex sets 358
A.2.2 Convex functions 361
A.3 Metric projection 365
A.4 Separating and supporting hyperplanes 369
A.5 Extreme points 372
A.6 Convex optimization 373
A.7 Polytopes and polyhedra 375
A.8 Some historical remarks and further reading 380
Exercises 381
Appendix B Positive semidefinite matrices (Version: May 24, 2022) 384
References 396
PART ONE
BASIC THEORY
1 Semidefinite programs: Basic facts and examples

In this chapter we introduce semidefinite programs and we give some of their basic properties. After that we point out links to some classical results about eigenvalues of symmetric matrices, including Fan's theorem about the sum of the first k largest eigenvalues, and the Hoffman-Wielandt inequality about an interlacing property of the eigenvalues of two symmetric matrices. In addition we present several problems that can be modeled as instances of semidefinite programs, arising from combinatorial optimization, geometry and algebra, to which we will come back in detail later in the book. So this chapter may also serve as a gentle introduction to some of the main topics treated in this book.

For convenience we briefly recall some notation that we will use in this chapter and we refer to the Appendices for more details on notation and basic properties. Throughout, S^n denotes the set of symmetric n × n matrices. For a matrix X ∈ S^n, X ⪰ 0 means that X is positive semidefinite and S^n_+ is the cone of positive semidefinite matrices. Analogously, X ≻ 0 means that X is positive definite and S^n_{++} is the open cone of positive definite matrices. Throughout, I_n (or simply I when the dimension is clear from the context) denotes the n × n identity matrix, e denotes the all-ones vector, i.e., e = (1, ..., 1)^T ∈ ℝ^n, and J_n = ee^T (or simply J) denotes the all-ones matrix. The vectors e_1, ..., e_n are the standard unit vectors in ℝ^n, and the matrices E_ij = (e_i e_j^T + e_j e_i^T)/2 (1 ≤ i ≤ j ≤ n) form the standard basis of S^n. We let O(n) denote the set of orthogonal matrices, where an n × n matrix A is orthogonal if AA^T = I_n or, equivalently, A^T A = I_n.

We consider the trace inner product: ⟨A, B⟩ = Tr(A^T B) = Σ_{i,j=1}^n A_ij B_ij for two matrices A, B ∈ ℝ^{n×n}. Here Tr(A) = ⟨I_n, A⟩ = Σ_{i=1}^n A_ii denotes the trace of A. Recall that Tr(AB) = Tr(BA); in particular, ⟨QAQ^T, QBQ^T⟩ = ⟨A, B⟩ if Q is an orthogonal matrix. A well known property of the positive semidefinite cone S^n_+ is that it is self-dual: for a matrix X ∈ S^n, X ⪰ 0 if and only if ⟨X, Y⟩ ≥ 0 for all Y ∈ S^n_+.

1.1 Primal and dual semidefinite programs


We begin by introducing semidefinite programs in standard primal and dual forms.

1.1.1 Primal form


The typical form of a semidefinite program (often abbreviated as SDP) is a
maximization problem of the form
p* = sup_X { ⟨C, X⟩ : ⟨A_j, X⟩ = b_j for j ∈ [m], X ⪰ 0 }.    (1.1)

Here C, A_1, ..., A_m ∈ S^n are given n × n symmetric matrices and b ∈ ℝ^m is a given vector; they are the data of the semidefinite program (1.1). The matrix X is the variable, which is constrained to be positive semidefinite and to lie in the affine subspace

W = { X ∈ S^n : ⟨A_j, X⟩ = b_j for j ∈ [m] }

of S^n. The goal is to maximize the linear objective function ⟨C, X⟩ over the set

F = S^n_+ ∩ W,

obtained by intersecting the positive semidefinite cone S^n_+ with the affine subspace W.
The set F = S^n_+ ∩ W is called the feasible region of (1.1) and a matrix X ∈ F is said to be feasible for (1.1). A feasible solution X ∈ F is said to be strictly feasible if X is positive definite. The program (1.1) is said to be strictly feasible if it admits at least one strictly feasible solution, and it is said to be infeasible if it has no feasible solution (i.e., F = ∅).
One can also handle minimization problems, of the form

inf_X { ⟨C, X⟩ : ⟨A_j, X⟩ = b_j for j ∈ [m], X ⪰ 0 },

since they can be brought into the above standard maximization form using the fact that inf ⟨C, X⟩ = − sup ⟨−C, X⟩.
Note that we write a supremum in (1.1) rather than a maximum. This is because the optimum value p* may or may not be attained in (1.1). In general we have p* ∈ ℝ ∪ {±∞}, with p* = −∞ if the problem (1.1) is infeasible, while p* = +∞ may occur, in which case we say that the problem is unbounded.
We give a small example as an illustration. Consider the problem of minimizing/maximizing X_11 over the feasible region

F_a = { X ∈ S^2 : X = [ X_11  a ]
                       [  a    0 ] ⪰ 0 },   where a ∈ ℝ,

with a being a given parameter. Note that det(X) = −a² for X ∈ F_a. Hence, if a ≠ 0 then the problem is infeasible (since X_22 = 0 implies X_12 = 0 if X is positive semidefinite). Moreover, if a = 0 then the problem is feasible (e.g., the zero matrix is feasible) but not strictly feasible (since the standard unit vector e_2 belongs to the kernel of every feasible solution X). The minimum value of X_11 over F_0 is equal to 0, attained at X = 0, while the maximum value of X_11 over F_0 is equal to ∞ (the problem is unbounded).
As another example, consider the problem

p* = inf_{X ∈ S^2} { X_11 : [ X_11   1  ]
                            [  1    X_22 ] ⪰ 0 }.

Then the infimum is p* = 0, which is reached in the limit by taking X_11 = 1/X_22 and letting X_22 tend to +∞. So the infimum is not attained.
In the special case when the matrices A_j, C are diagonal matrices, with diagonals a_j, c ∈ ℝ^n, the program (1.1) reduces to the linear program (LP)

max_x { c^T x : a_j^T x = b_j for j ∈ [m], x ≥ 0 },

where x ≥ 0 means that x is component-wise nonnegative. Indeed, if we let x = diag(X) denote the vector consisting of the diagonal entries of the matrix X, then x ≥ 0 if X ⪰ 0, and we have ⟨C, X⟩ = c^T x and ⟨A_j, X⟩ = a_j^T x. Therefore the theory of semidefinite programming contains linear programming as a special instance.

1.1.2 Dual form


The program (1.1) is often referred to as the primal SDP in standard form. One can define its dual SDP, which takes the form

d* = inf_y { Σ_{j=1}^m b_j y_j = b^T y : Σ_{j=1}^m y_j A_j − C ⪰ 0, y ∈ ℝ^m }.    (1.2)

Thus the dual program has variables y_j for j ∈ [m], one for each linear constraint of the primal program (1.1). The positive semidefinite constraint

Σ_{j=1}^m y_j A_j − C ⪰ 0    (1.3)

arising in (1.2) is also named a linear matrix inequality (LMI). The notions of (strict) feasibility can be analogously defined for the dual program. For instance, y ∈ ℝ^m is called feasible (resp., strictly feasible) for (1.2) if the matrix Σ_{j=1}^m y_j A_j − C is positive semidefinite (resp., positive definite).
The following facts relate the primal and dual SDPs. They are simple, but very important.
Lemma 1.1.1   Let (X, y) be a primal/dual pair of feasible solutions, i.e., X is a feasible solution of (1.1) and y is a feasible solution of (1.2).
(i) (weak duality) We have that ⟨C, X⟩ ≤ b^T y and thus p* ≤ d*.
(ii) (complementary slackness) Assume that the primal program attains its supremum at X, that the dual program attains its infimum at y, and that p* = d*. Then the equalities ⟨C, X⟩ = b^T y and ⟨X, Σ_{j=1}^m y_j A_j − C⟩ = 0 hold.
(iii) (optimality criterion) If equality ⟨C, X⟩ = b^T y holds, then the supremum of (1.1) is attained at X, the infimum of (1.2) is attained at y, and p* = d*.
Proof   If (X, y) is a primal/dual pair of feasible solutions, then

0 ≤ ⟨X, Σ_{j=1}^m y_j A_j − C⟩ = Σ_{j=1}^m ⟨X, A_j⟩ y_j − ⟨X, C⟩ = Σ_{j=1}^m b_j y_j − ⟨X, C⟩ = b^T y − ⟨C, X⟩.

The leftmost inequality follows from the fact that both X and Σ_j y_j A_j − C are positive semidefinite, together with the self-duality of the cone of positive semidefinite matrices, and we use the fact that ⟨A_j, X⟩ = b_j to get the second equality. This implies that

⟨C, X⟩ ≤ p* ≤ d* ≤ b^T y.

The rest of the lemma follows immediately by considering the equality case p* = d*.
The difference d* − p* is called the duality gap. In general there might be a positive duality gap between the primal and dual SDPs. When there is no duality gap, i.e., p* = d*, one says that strong duality holds, a very desirable situation. This topic and criteria for strong duality will be discussed in detail in Chapter 2. For now we only quote the following result on strong duality, which will be proved in Chapter 2 (in the general setting of conic programming).
Theorem 1.1.2 (Strong duality: no duality gap)   Consider the pair of primal and dual programs (1.1) and (1.2).
(i) Assume that the dual program (1.2) is bounded from below (d* > −∞) and that it is strictly feasible. Then the primal program (1.1) attains its supremum (i.e., there exists X ∈ F such that p* = ⟨C, X⟩) and there is no duality gap: p* = d*.
(ii) Assume that the primal program (1.1) is bounded from above (p* < ∞) and that it is strictly feasible. Then the dual program (1.2) attains its infimum (i.e., d* = b^T y for some dual feasible y) and there is no duality gap: p* = d*.
In the rest of this chapter we discuss several examples of applications
of semidefinite programming, many of which will be studied in much more
detail in the following parts of the book.

1.2 Eigenvalue optimization


Given a symmetric matrix C ∈ S^n, let λ_min(C) (resp., λ_max(C)) denote its smallest (resp., largest) eigenvalue. One can express them as follows:

λ_max(C) = max_{x ∈ ℝ^n \ {0}} x^T C x / ‖x‖² = max_{x ∈ S^{n−1}} x^T C x,    (1.4)

where S^{n−1} = {x ∈ ℝ^n : ‖x‖ = 1} denotes the unit sphere in ℝ^n and ‖·‖ denotes the Euclidean norm, and

λ_min(C) = min_{x ∈ ℝ^n \ {0}} x^T C x / ‖x‖² = min_{x ∈ S^{n−1}} x^T C x.    (1.5)

This is known as the Rayleigh-Ritz principle and can be proved, e.g., using the spectral decomposition theorem. As we will now see, the largest and smallest eigenvalues can also be expressed via a semidefinite program. For this, consider the following semidefinite program

p* = sup_X { ⟨C, X⟩ : Tr(X) = 1, X ⪰ 0 }    (1.6)

and its dual program

d* = inf_y { y : yI − C ⪰ 0, y ∈ ℝ }.    (1.7)
In view of (1.4), we have that d* = λ_max(C). Indeed, yI − C ⪰ 0 holds if and only if x^T(yI − C)x ≥ 0 for all x ∈ ℝ^n or, equivalently, y ≥ x^T C x for all unit vectors x ∈ ℝ^n. Hence the infimum in (1.7) is attained at y = λ_max(C). The feasible region of (1.6) is closed and bounded since all entries of every feasible X lie in [−1, 1], and thus the supremum is attained in program (1.6). In addition, p* = λ_max(C) is attained at X = uu^T, where u is a unit eigenvector of C with eigenvalue λ_max(C). Moreover, note that both programs (1.6) and (1.7) are strictly feasible, so we could also have derived that they have optimal solutions and that strong duality holds using Theorem 1.1.2. Thus we have shown:
Lemma 1.2.1   The largest and smallest eigenvalues of a symmetric matrix C ∈ S^n can be expressed with the following semidefinite programs:

λ_max(C) = max { ⟨C, X⟩ : Tr(X) = 1, X ⪰ 0 } = min { y : y I_n − C ⪰ 0 },

λ_min(C) = min { ⟨C, X⟩ : Tr(X) = 1, X ⪰ 0 } = max { y : C − y I_n ⪰ 0 }.
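A minimal numerical illustration of Lemma 1.2.1, under the assumption that the cvxpy modeling package (with an SDP-capable solver such as SCS) is installed; the matrix C below is random illustrative data.

import cvxpy as cp
import numpy as np

# lambda_max(C) via the SDP (1.6), compared with a direct eigenvalue computation
n = 4
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
C = (M + M.T) / 2                      # a random symmetric matrix

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(C @ X)),
                  [cp.trace(X) == 1, X >> 0])
prob.solve()

print(prob.value)                      # SDP value p*
print(np.linalg.eigvalsh(C)[-1])       # largest eigenvalue; should agree up to solver accuracy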
More generally, the sum of the k largest eigenvalues of a symmetric matrix can also be computed via a semidefinite program.
Theorem 1.2.2 (Fan's theorem)   Let C ∈ S^n be a symmetric matrix with eigenvalues λ_1 ≥ ... ≥ λ_n. Then the sum of its k largest eigenvalues, λ_1 + ... + λ_k, is equal to the optimal value of any of the following two programs:

µ_1 := max_Y { ⟨C, Y Y^T⟩ : Y^T Y = I_k, Y ∈ ℝ^{n×k} },    (1.8)

µ_2 := max_X { ⟨C, X⟩ : Tr(X) = k, I_n ⪰ X ⪰ 0 }.    (1.9)

That is, λ_1 + ··· + λ_k = µ_1 = µ_2.

Note that the program (1.9) can be reformulated as a semidefinite program in standard primal form by introducing an additional matrix variable (see Exercise 1.1). The proof of Theorem 1.2.2 will use the fact that the extreme points of the polytope

P = { x ∈ [0, 1]^n : e^T x = k }    (1.10)

are exactly the points x ∈ P ∩ {0, 1}^n. (See Exercise A.10.)
Proof   Let u_1, ..., u_n denote an orthonormal basis of eigenvectors corresponding to the eigenvalues λ_1, ..., λ_n of C, let U denote the matrix with columns u_1, ..., u_n, which is thus an orthogonal matrix, and let D denote the diagonal matrix with entries λ_1, ..., λ_n. Thus we have C = U D U^T. The proof is in three steps and consists of showing each of the following three inequalities: λ_1 + ··· + λ_k ≤ µ_1 ≤ µ_2 ≤ λ_1 + ··· + λ_k.

Step 1: λ_1 + ··· + λ_k ≤ µ_1: Consider the matrix Y with columns u_1, ..., u_k; then Y is feasible for the program (1.8) with value ⟨C, Y Y^T⟩ = λ_1 + ··· + λ_k.

Step 2: µ_1 ≤ µ_2: Let Y be feasible for the program (1.8) and set X = Y Y^T. Then X is feasible for the program (1.9), because X has k eigenvalues equal to 1 and n − k eigenvalues equal to 0.

Step 3: µ_2 ≤ λ_1 + ··· + λ_k: This is the most interesting part of the proof. A first key observation is that the program (1.9) is equivalent to the same program where we replace C by the diagonal matrix D (containing its eigenvalues). Indeed, using the spectral decomposition C = U D U^T, we have

⟨C, X⟩ = Tr(CX) = Tr(U D U^T X) = Tr(D U^T X U) = Tr(DZ) = ⟨D, Z⟩,

where the matrix Z = U^T X U is again feasible for (1.9). Therefore we obtain

µ_2 = max_{Z ∈ S^n} { ⟨D, Z⟩ : Tr(Z) = k, I_n ⪰ Z ⪰ 0 }.

Now let z = (Z_ii)_{i=1}^n denote the vector containing the diagonal entries of Z. The condition I ⪰ Z ⪰ 0 implies that z ∈ [0, 1]^n. Moreover, the condition Tr(Z) = k implies e^T z = k, and we have ⟨D, Z⟩ = Σ_{i=1}^n λ_i z_i. Hence the vector z lies in the polytope P from (1.10) and thus we obtain µ_2 ≤ max_{z ∈ P} Σ_{i=1}^n λ_i z_i. Now recall that the maximum of the linear function Σ_{i=1}^n λ_i z_i is attained at an extreme point of P. As recalled above, the extreme points of P are the 0/1 valued vectors with exactly k ones. From this it follows immediately that the maximum value of Σ_{i=1}^n λ_i z_i taken over P is equal to λ_1 + ··· + λ_k. This shows the last inequality µ_2 ≤ λ_1 + ··· + λ_k and thus concludes the proof.
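A small numerical check of Fan's theorem, again assuming cvxpy is available: the program (1.9) is solved directly for random illustrative data and compared with the sum of the k largest eigenvalues.

import cvxpy as cp
import numpy as np

# Theorem 1.2.2: the value of program (1.9) equals lambda_1 + ... + lambda_k
n, k = 5, 2
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
C = (M + M.T) / 2

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(C @ X)),
                  [cp.trace(X) == k, X >> 0, np.eye(n) - X >> 0])
prob.solve()

eigs = np.linalg.eigvalsh(C)           # ascending order
print(prob.value, eigs[-k:].sum())     # the two values should agree up to solver accuracy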

As an application, we obtain that the feasible region of the program (1.9) is equal to the convex hull of the feasible region of the program (1.8). That is, we have shown the following result.

Lemma 1.2.3   We have equality

{ X ∈ S^n : I_n ⪰ X ⪰ 0, Tr(X) = k } = conv{ Y Y^T : Y ∈ ℝ^{n×k}, Y^T Y = I_k }

and the extreme points of the set { X ∈ S^n : I_n ⪰ X ⪰ 0, Tr(X) = k } are exactly the matrices Y Y^T with Y ∈ ℝ^{n×k} and Y^T Y = I_k.

1.3 Hoffman-Wielandt inequality


In this section we consider the following optimization problem over the set of orthogonal matrices:

OPT(A, B) = min_X { Tr(A X B X^T) : X ∈ O(n) },    (1.11)

where A, B ∈ S^n are two given symmetric matrices. We will indicate below its relation to the quadratic assignment problem.
Quite surprisingly, it turns out that the optimal value of the program (1.11) can be expressed in a closed form in terms of the eigenvalues of A and B. This gives the following nice inequality

Tr(AB) ≥ Σ_{i=1}^n α_i β_i,    (1.12)

about the interlacing of the eigenvalues α_i of A and β_i of B when they are ordered as α_1 ≤ ... ≤ α_n and β_1 ≥ ... ≥ β_n, which is due to Hoffman-Wielandt (1953). In addition, the program (1.11) has an equivalent reformulation as a semidefinite program, given in (1.13) below.

1.3.1 Semidefinite reformulation


Here we reformulate the problem (1.11) as a semidefinite program and as
an application we show the Hoffman-Wielandt inequality (1.12).

Theorem 1.3.1   Let A, B ∈ S^n be symmetric matrices with respective eigenvalues α_1, ..., α_n and β_1, ..., β_n ordered as follows: α_1 ≤ ... ≤ α_n and β_1 ≥ ... ≥ β_n. The program (1.11) is equivalent to the following semidefinite program:

max_{S,T} { Tr(S) + Tr(T) : A ⊗ B − I_n ⊗ T − S ⊗ I_n ⪰ 0, S, T ∈ S^n }    (1.13)

and its optimum value is equal to

OPT(A, B) = Σ_{i=1}^n α_i β_i.    (1.14)

In particular, setting X = I in (1.11), the following inequality holds:

Tr(AB) ≥ Σ_{i=1}^n α_i β_i.
For the proof we will use an intermediate, well-known result about doubly stochastic matrices. Recall that a matrix X ∈ ℝ^{n×n} is doubly stochastic if X is nonnegative and has all its row and column sums equal to 1. So the polyhedron

DS(n) = { X ∈ ℝ^{n×n}_+ : Σ_{i=1}^n X_ij = 1 for j ∈ [n], Σ_{j=1}^n X_ij = 1 for i ∈ [n] }

is the set of all doubly stochastic matrices.
Given a permutation σ of [n] one can represent it by the corresponding permutation matrix P_σ ∈ {0, 1}^{n×n} with entries (P_σ)_{i,σ(i)} = 1 for all i ∈ [n], and all other entries equal to 0. Hence the 0/1-valued doubly stochastic matrices are precisely the permutation matrices. Moreover, the well-known theorem of Birkhoff shows that the set of doubly stochastic matrices is equal to the convex hull of the set of permutation matrices.

Theorem 1.3.2 (Birkhoff's theorem)   The set of doubly stochastic matrices is given by

DS(n) = conv{ P_σ : σ is a permutation of [n] } = conv(DS(n) ∩ {0, 1}^{n×n}).

The following lemma will play a key role in the proof of Theorem 1.3.1.

Lemma 1.3.3   Given scalars α_1, ..., α_n and β_1, ..., β_n ordered as α_1 ≤ ... ≤ α_n and β_1 ≥ ... ≥ β_n, consider the following linear program:

max_{x,y} { Σ_{i=1}^n x_i + Σ_{j=1}^n y_j : α_i β_j − x_i − y_j ≥ 0 for i, j ∈ [n], x, y ∈ ℝ^n }    (1.15)

and its dual linear program:

min_Z { Σ_{i,j=1}^n α_i β_j Z_ij : Σ_{i=1}^n Z_ij = 1 for j ∈ [n], Σ_{j=1}^n Z_ij = 1 for i ∈ [n], Z ∈ ℝ^{n×n}_+ }.    (1.16)

The optimum value of (1.15) and (1.16) is equal to Σ_{i=1}^n α_i β_i.

Proof   The feasible region of the program (1.16) is precisely the set DS(n) of doubly stochastic matrices and, by the above mentioned result of Birkhoff, it is equal to the convex hull of the set of permutation matrices. As the minimum value of (1.16) is attained at an extreme point of DS(n) (i.e., at a permutation matrix), it is equal to the minimum value of Σ_{i=1}^n α_i β_{σ(i)} taken over all permutations σ of [n]. We leave it as an exercise to verify that this minimum value is attained for the identity permutation (please check it!). This shows that the optimum value of (1.16) (and thus of (1.15)) is equal to Σ_{i=1}^n α_i β_i.

Proof (of Theorem 1.3.1)   The proof is structured as follows: first we show the identity (1.14), and then we show that the optimal value of the program (1.13) is equal to Σ_{i=1}^n α_i β_i.

First, we show the identity (1.14): OPT(A, B) = Σ_i α_i β_i. For this the first step consists of replacing the program (1.11) by an equivalent program where the matrices A and B are diagonal. For this, write A = P D P^T and B = Q E Q^T, where P, Q ∈ O(n) and D (resp., E) is the diagonal matrix with diagonal entries α_i (resp., β_i). For X ∈ O(n), we have Y := P^T X Q ∈ O(n) and Tr(A X B X^T) = Tr(D Y E Y^T). Hence the optimization problem (1.11) is equivalent to the program

OPT(D, E) = min { Tr(D X E X^T) : X ∈ O(n) }.    (1.17)

That is,

OPT(A, B) = OPT(D, E).    (1.18)

The next step is to show that the program (1.17) has the same optimum value as the linear program (1.16), i.e., in view of Lemma 1.3.3, that

OPT(D, E) = Σ_i α_i β_i.    (1.19)

For this, pick X ∈ O(n) and consider the matrix Z = ((X_ij)²)_{i,j=1}^n, which is doubly stochastic (since X is orthogonal). Moreover, since

Tr(D X E X^T) = Σ_{i,j=1}^n α_i β_j (X_ij)² = Σ_{i,j=1}^n α_i β_j Z_ij,

it follows that Tr(D X E X^T) is at least the minimum value of the program (1.16). Hence the minimum value of (1.17) is at least the minimum value of (1.16), which is equal to Σ_i α_i β_i by Lemma 1.3.3. So we can already conclude that OPT(D, E) ≥ Σ_{i=1}^n α_i β_i. The reverse inequality follows by selecting the matrix X = I_n as feasible solution of (1.17), so that OPT(D, E) ≤ Tr(DE) = Σ_{i=1}^n α_i β_i. Hence we have shown the desired equality (1.19), which, combined with (1.18), gives the desired identity OPT(A, B) = Σ_i α_i β_i.

Now we show that the optimal value of the program (1.13) is equal to Σ_i α_i β_i. For this we first show that the program (1.13) is equivalent to the semidefinite program

max_{S',T'} { Tr(S') + Tr(T') : D ⊗ E − I_n ⊗ T' − S' ⊗ I_n ⪰ 0, S', T' ∈ S^n },    (1.20)

obtained from (1.13) by replacing A and B by their diagonalizations D and E. Indeed, using the relation

(P ⊗ Q)^T (A ⊗ B − I_n ⊗ T − S ⊗ I_n)(P ⊗ Q) = D ⊗ E − I_n ⊗ (Q^T T Q) − (P^T S P) ⊗ I_n

and the fact that P ⊗ Q is orthogonal, we see that the pair (S, T) is feasible for (1.13) if and only if the pair (S' = P^T S P, T' = Q^T T Q) is feasible for (1.20), and moreover we have Tr(S) + Tr(T) = Tr(S') + Tr(T').
Next we show that the program (1.20) is equivalent to the linear program (1.15). For this, observe that in the program (1.20) we may assume without loss of generality that the matrices S' and T' are diagonal, because the matrix D ⊗ E is diagonal. Indeed, if we define the vectors x = diag(S') and y = diag(T'), we see that, since D ⊗ E is diagonal, the diagonal matrices S'' = Diag(x) and T'' = Diag(y) are still feasible for (1.20) with the same objective value: Tr(S') + Tr(T') = Tr(S'') + Tr(T''). Now the program (1.20) with the additional condition that S', T' are diagonal matrices can be rewritten as the linear program (1.15), since the matrix D ⊗ E − I_n ⊗ T' − S' ⊗ I_n is then diagonal with diagonal entries α_i β_j − x_i − y_j for i, j ∈ [n].
Hence we can conclude that the maximum value of the program (1.20) is equal to the maximum value of the program (1.15), which is Σ_i α_i β_i by Lemma 1.3.3. This implies that the optimal value of (1.13) is equal to Σ_i α_i β_i, which concludes the proof.

1.3.2 Application to the quadratic assignment problem


The result of Theorem 1.3.1 can be used to give an explicit lower bound for the following quadratic assignment problem (QAP):

QAP(A, B) = min { Σ_{i,j=1}^n A_ij B_σ(i)σ(j) : σ is a permutation of [n] }.    (1.21)

The QAP problem models the following facility location problem, where one wants to allocate n facilities to n locations at the lowest possible total cost. The cost of allocating two facilities i and j to the respective locations σ(i) and σ(j) is A_ij B_σ(i)σ(j), where A_ij can be seen as the 'flow' cost between the facilities i and j, and B_σ(i)σ(j) is the 'distance' between the locations σ(i) and σ(j). For instance, you may think of the campus building problem, where one needs to locate n buildings at n locations, A_ij represents the traffic intensity between the buildings i and j, and B_hk is the distance between the locations h and k.
As QAP is an NP-hard problem, one is interested in finding tractable lower bounds for it. As we now see, such a bound can be obtained from the result in Theorem 1.3.1. For this, observe first that problem (1.21) can be reformulated as the following optimization problem over the set of permutation matrices:

QAP(A, B) = min { Tr(A X B X^T) : X is a permutation matrix }

(because Σ_{i,j=1}^n A_ij B_σ(i)σ(j) = Tr(A X B X^T) if X = P_σ). Then observe that a matrix X is a permutation matrix if and only if it is simultaneously doubly stochastic and orthogonal (Exercise 1.2). Hence, if in this program we replace the condition that X is a permutation matrix by the condition that X is an orthogonal matrix, then we obtain the program (1.11), which is thus a relaxation of the QAP problem (1.21). This shows the inequality

QAP(A, B) ≥ OPT(A, B)

and the next theorem.
Theorem 1.3.4   Let A, B ∈ S^n be symmetric matrices with respective eigenvalues α_1, ..., α_n and β_1, ..., β_n ordered as follows: α_1 ≤ ... ≤ α_n and β_1 ≥ ... ≥ β_n. Then we have

QAP(A, B) ≥ OPT(A, B) = Σ_{i=1}^n α_i β_i.
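The closed form (1.14) is easy to test numerically. The following numpy sketch, using random illustrative data, checks the Hoffman-Wielandt inequality (1.12) with the eigenvalues ordered oppositely, as in Theorem 1.3.4 (an illustration only).

import numpy as np

# Tr(AB) >= sum_i alpha_i beta_i with alpha ascending and beta descending
rng = np.random.default_rng(2)
n = 6
for _ in range(1000):
    A = rng.standard_normal((n, n)); A = (A + A.T) / 2
    B = rng.standard_normal((n, n)); B = (B + B.T) / 2
    alpha = np.sort(np.linalg.eigvalsh(A))          # ascending
    beta = np.sort(np.linalg.eigvalsh(B))[::-1]     # descending
    assert np.trace(A @ B) >= alpha @ beta - 1e-8   # small tolerance for rounding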

1.4 Convex quadratic constraints


Consider a quadratic constraint for a vector x ∈ ℝ^n of the form

x^T A x ≤ b^T x + c,    (1.22)

where A ∈ S^n, b ∈ ℝ^n and c ∈ ℝ. In the special case when A is positive semidefinite, the function x ∈ ℝ^n ↦ x^T A x − b^T x − c is convex and thus the feasible region defined by the constraint (1.22) is convex. In fact, as we now show, it can be equivalently defined by a semidefinite constraint, which has the form of an LMI as in (1.3).

Lemma 1.4.1   Assume A ⪰ 0. Say, A = L L^T, where L ∈ ℝ^{n×k}. Then, for any x ∈ ℝ^n, we have

x^T A x ≤ b^T x + c   ⟺   [ I_k      L^T x    ]
                           [ x^T L   b^T x + c ] ⪰ 0.
Proof   The equivalence follows as a direct application of the Schur complement operation: take the Schur complement of the submatrix I_k in the block matrix on the right hand side of the displayed equation in the lemma (i.e., apply Lemma B.2.2 with A = I_k, B = L^T x ∈ ℝ^{k×1} and C = b^T x + c ∈ ℝ^{1×1}).
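A quick numpy check of the equivalence in Lemma 1.4.1, for random illustrative data (boundary cases aside, the two tests agree); this is only a sketch.

import numpy as np

# x^T A x <= b^T x + c  iff  the (k+1)x(k+1) block matrix of Lemma 1.4.1 is PSD
rng = np.random.default_rng(3)
n, k = 5, 3
L = rng.standard_normal((n, k))
A = L @ L.T                                     # A is PSD by construction
b, c = rng.standard_normal(n), rng.standard_normal()

for _ in range(100):
    x = rng.standard_normal(n)
    quad_ok = x @ A @ x <= b @ x + c
    M = np.block([[np.eye(k), (L.T @ x)[:, None]],
                  [(L.T @ x)[None, :], np.array([[b @ x + c]])]])
    psd_ok = np.linalg.eigvalsh(M).min() >= -1e-9
    assert quad_ok == psd_ok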

As a direct application, the Euclidean unit ball can be represented by an LMI:

{ x ∈ ℝ^n : ‖x‖ ≤ 1 } = { x ∈ ℝ^n : [ 1   x^T ]     [ 1   0  ]              [ 0    e_i^T ]
                                     [ x   I_n ]  =  [ 0   I_n ] + Σ_{i=1}^n x_i [ e_i   0  ] ⪰ 0 }

and the same holds for its homogenization:

L^{n+1} := { (x, t) ∈ ℝ^n × ℝ : ‖x‖ ≤ t } = { (x, t) ∈ ℝ^n × ℝ : [ t   x^T  ]
                                                                  [ x   t I_n ] ⪰ 0 }.

So, if we intersect the set L^{n+1} with the hyperplane t = t_0 (for some scalar t_0), then we obtain in the x-space the Euclidean ball of radius t_0. The set L^{n+1} is a cone, known as the second-order cone (or Lorentz cone). This cone is briefly introduced in Section 2.2.2 and we will come back to it in Chapter 2.

Therefore one can reformulate the problem of maximizing a linear objective function over the Euclidean ball as a maximization semidefinite program. In addition, using the duality theorem (Theorem 1.1.2), one can further reformulate it as a minimization semidefinite program; this fact can be very useful as we will see in the next section.

Corollary 1.4.2   Given c ∈ ℝ^n, the following holds:

min_x { c^T x : ‖x‖ ≤ 1, x ∈ ℝ^n }
   = min_x { c^T x : [ 1   x^T ]
                     [ x   I_n ] ⪰ 0, x ∈ ℝ^n }    (1.23)
   = max_X { −Tr(X) : 2X_{0i} = c_i for i ∈ [n], X ⪰ 0, X ∈ S^{n+1} }.

Proof Apply Lemma 1.4.1 combined with the fact that strong duality holds
between the primal and dual programs (see Theorem 1.1.2).
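A numerical sanity check of Corollary 1.4.2: the maximization form in (1.23) should return −‖c‖, the value of min{c^T x : ‖x‖ ≤ 1}. A sketch assuming cvxpy with an SDP-capable solver is installed, on random illustrative data.

import cvxpy as cp
import numpy as np

n = 4
rng = np.random.default_rng(4)
c = rng.standard_normal(n)

X = cp.Variable((n + 1, n + 1), symmetric=True)   # indexed 0, 1, ..., n
cons = [X >> 0] + [2 * X[0, i + 1] == c[i] for i in range(n)]
prob = cp.Problem(cp.Maximize(-cp.trace(X)), cons)
prob.solve()

print(prob.value, -np.linalg.norm(c))             # the two values should agree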

We will come back to the question of finding SDP representations for quadratic maps in Chapter 8.

1.5 Robust optimization


We indicate here how semidefinite programming comes up naturally when
dealing with some robust optimization problems.
Consider the following linear programming problem:

max { c^T x : a^T x ≥ b },    (1.24)

where c, a ∈ Rn and b ∈ R are given data, with just one constraint for
simplicity of exposition. In practical applications the data a, b might be given
through experimental results and might not be known exactly with 100%
certainty, which is in fact the case in most of the real world applications
of linear programming. One may write a = a(z) and b = b(z) as functions
of an uncertainty parameter z assumed to lie in a given uncertainty region
Z ⊆ Rk . Then one wants to find an optimum solution x that is robust
against this uncertainty, i.e., that satisfies the constraints a(z)T x ≥ b(z) for
all values of the uncertainty parameter z ∈ Z. That is, solve the following
robust counterpart of the linear program (1.24):

max { c^T x : a(z)^T x ≥ b(z) for all z ∈ Z }.    (1.25)

Depending on the set Z this problem might have infinitely many constraints.
However, for certain choices of the functions a(z), b(z) and of the uncertainty
region Z, one can reformulate the problem as a semidefinite programming
problem.
Suppose that the uncertainty region Z is the unit ball and that a(z), b(z) are linear functions in the uncertainty parameter z = (ζ_1, ..., ζ_k) ∈ ℝ^k, of the form

a(z) = a_0 + Σ_{j=1}^k ζ_j a_j,   b(z) = b_0 + Σ_{j=1}^k ζ_j b_j,    (1.26)

where a_j ∈ ℝ^n and b_j ∈ ℝ (j = 0, 1, ..., k) are known. Then it turns out that the robust optimization problem (1.25) can be reformulated as a semidefinite programming problem involving the variable x ∈ ℝ^n and a new matrix variable Z ∈ S^{k+1}_+. The proof relies on the result from Corollary 1.4.2, where we made use in a crucial manner of the duality theory for semidefinite programming for showing the equivalence of both problems in (1.23).

Theorem 1.5.1   Suppose that the functions a(z) and b(z) are given by (1.26) and that Z = {z ∈ ℝ^k : ‖z‖ ≤ 1}. Then problem (1.25) is equivalent to the problem

max_{x ∈ ℝ^n, Z ∈ S^{k+1}} { c^T x : a_j^T x − 2Z_{0j} = b_j for j ∈ [k],
                                      a_0^T x − Tr(Z) ≥ b_0, Z ⪰ 0, Z ∈ S^{k+1}, x ∈ ℝ^n }.    (1.27)

Proof   Fix x ∈ ℝ^n, set α_j = a_j^T x − b_j for j = 0, 1, ..., k, and define the vector α = (α_j)_{j=1}^k ∈ ℝ^k (which depends on x). Then the constraints a(z)^T x ≥ b(z) for all z ∈ Z can be rewritten as

α^T z ≥ −α_0 for all z ∈ Z.

Therefore, we find the problem of deciding whether p*_x ≥ −α_0, where we set

p*_x = min { α^T z : ‖z‖ ≤ 1, z ∈ ℝ^k }.

Now the above problem fits precisely within the setting considered in Corollary 1.4.2. Hence, we can rewrite it using the second formulation in (1.23) – the one in maximization form – as

p*_x = max_Z { −Tr(Z) : 2Z_{0j} = α_j (j ∈ [k]), Z ⪰ 0, Z ∈ S^{k+1} }.

So, in problem (1.25), we can substitute the condition: a(z)^T x ≥ b(z) for all z ∈ Z, by the condition:

∃ Z ∈ S^{k+1}_+ such that −Tr(Z) ≥ −α_0 and 2Z_{0j} = α_j for all j ∈ [k].

The crucial fact here is that the quantifier "∀z" has been replaced by the existential quantifier "∃Z". As problem (1.25) is a maximization problem in x, it is equivalent to the following maximization problem in the variables x and Z:

max_{x,Z} { c^T x : a_0^T x − Tr(Z) ≥ b_0, a_j^T x − 2Z_{0j} = b_j for j ∈ [k], x ∈ ℝ^n, Z ∈ S^{k+1}_+ }

(after substituting back in α_j their expression in terms of x).
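For a fixed point x, the key step of the proof can be tested numerically: the robust constraint over the unit ball holds if and only if the small feasibility SDP above is feasible. A sketch with random illustrative data, assuming cvxpy (with an SDP solver such as SCS) is installed.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 3
a = rng.standard_normal((k + 1, n))   # rows a_0, a_1, ..., a_k (illustrative data)
b = rng.standard_normal(k + 1)        # b_0, b_1, ..., b_k
x = rng.standard_normal(n)            # a fixed candidate point

alpha = a @ x - b                     # alpha_j = a_j^T x - b_j, j = 0, ..., k

# direct test: a(z)^T x >= b(z) for all ||z|| <= 1  iff  alpha_0 >= ||(alpha_1, ..., alpha_k)||
robust_direct = alpha[0] >= np.linalg.norm(alpha[1:])

# SDP test: exists Z >= 0 with -Tr(Z) >= -alpha_0 and 2 Z_{0j} = alpha_j
Z = cp.Variable((k + 1, k + 1), symmetric=True)
cons = [Z >> 0, cp.trace(Z) <= alpha[0]]
cons += [2 * Z[0, j] == alpha[j] for j in range(1, k + 1)]
feas = cp.Problem(cp.Minimize(0), cons)
feas.solve()
robust_sdp = feas.status == cp.OPTIMAL

print(robust_direct, robust_sdp)      # the two tests should agree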

1.6 Examples in combinatorial optimization


Semidefinite programs provide a powerful tool for constructing useful convex
relaxations for combinatorial optimization problems. We will treat this in
detail in Part II. For now we illustrate the main idea on the following two
examples: finding a maximum independent set and a maximum cut in a
graph.

1.6.1 The maximum independent set problem


Consider a graph G = (V, E) with vertex set V = [n]; the edges are unordered pairs of distinct vertices. A set of nodes (or vertices) S ⊆ V is said to be independent (or stable) in G if it does not contain an edge, i.e., if {i, j} ∉ E for all i ≠ j ∈ S. The maximum cardinality of an independent set is denoted by α(G), known as the stability number (or independence number) of G. The maximum independent set problem asks to compute α(G). This problem is NP-hard.
Here is a simple recipe for constructing a semidefinite programming upper bound for α(G). It is based on the following observation: let S be an independent set in G and let x ∈ {0, 1}^n be its incidence vector, with x_i = 1 if i ∈ S and x_i = 0 otherwise. Define the matrix X = xx^T/|S|. Then the matrix X satisfies the following conditions: X ⪰ 0, X_ij = 0 for all edges {i, j} ∈ E, Tr(X) = 1, and ⟨J, X⟩ = |S|. It is therefore natural to consider the following semidefinite program (in standard primal form)

ϑ(G) = max_X { ⟨J, X⟩ : Tr(X) = 1, X_ij = 0 ({i, j} ∈ E), X ⪰ 0 },    (1.28)

whose optimum value ϑ(G) is known as the theta number of G. It follows from the above discussion that ϑ(G) is an upper bound for the stability number. That is, for any graph G we have

α(G) ≤ ϑ(G).

The dual semidefinite program of (1.28) reads

min { t : tI + Σ_{{i,j}∈E} y_ij E_ij − J ⪰ 0, t ∈ ℝ, y ∈ ℝ^E },    (1.29)

and its optimum value is equal to ϑ(G) (because (1.28) is strictly feasible and bounded – check it). Here, in the program (1.29), we have used the elementary matrices E_ij introduced at the beginning of the chapter.
We will come back in detail to the theta number in Chapter 4. As we will see there, there is an interesting class of graphs for which α(G) = ϑ(G), the so-called perfect graphs. For these graphs, the maximum independent set problem can be solved in polynomial time. This result is one of the first breakthrough applications of semidefinite programming obtained by Grötschel, Lovász and Schrijver in the early 1980s.
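The program (1.28) is small enough to solve directly. The following sketch, assuming cvxpy with an SDP solver is installed, computes ϑ(C_5) for the 5-cycle, whose value √5 is discussed in Section 1.9.

import cvxpy as cp
import numpy as np

# theta number of the 5-cycle C5 via the SDP (1.28)
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
X = cp.Variable((n, n), symmetric=True)
cons = [X >> 0, cp.trace(X) == 1]
cons += [X[i, j] == 0 for (i, j) in edges]
prob = cp.Problem(cp.Maximize(cp.sum(X)), cons)   # cp.sum(X) = <J, X>
prob.solve()
print(prob.value, np.sqrt(5))                     # both close to 2.236...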

1.6.2 The maximum cut problem


Consider again a graph G = (V, E), where V = [n]. Given a subset S ⊆ V, the cut δ_G(S) consists of all the edges {i, j} of G that are cut by the partition (S, V \ S), i.e., for which exactly one of the two nodes i, j belongs to S; that is,

δ_G(S) = { {i, j} ∈ E : i ∈ S, j ∈ V \ S }.

The maximum cut problem (or max-cut) asks to find a cut of maximum cardinality in G. This is an NP-hard problem.
One can encode the max-cut problem using variables x ∈ {±1}^n. For this, given a subset S ⊆ V, assign x_i = 1 to the nodes i ∈ S and x_i = −1 to the nodes i ∈ V \ S. Then the cardinality of the cut δ_G(S) is equal to Σ_{{i,j}∈E} (1 − x_i x_j)/2. Therefore max-cut can be formulated as

max-cut(G) = max_x { Σ_{{i,j}∈E} (1 − x_i x_j)/2 : x ∈ {±1}^n }.    (1.30)

Again there is a simple recipe for constructing a semidefinite relaxation for max-cut: pick a vector x ∈ {±1}^n (arising in the above formulation (1.30) of max-cut) and consider the matrix X = xx^T. This matrix X satisfies the following conditions: X ⪰ 0 and X_ii = 1 for all i ∈ [n]. Therefore, it is natural to consider the following semidefinite relaxation for max-cut:

sdp(G) = max_X { Σ_{{i,j}∈E} (1 − X_ij)/2 : X ⪰ 0, X_ii = 1 for i ∈ [n] }.    (1.31)

As we will see later in Chapter 5 this semidefinite program provides a constant factor approximation, independent of the graph size, for the max-cut problem:

max-cut(G) ≤ sdp(G) ≤ 1.13 · max-cut(G).

This is a second breakthrough application of semidefinite programming, obtained by Goemans and Williamson in the early 1990s.
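The relaxation (1.31) can also be solved directly on small graphs. A sketch for the 5-cycle, again assuming cvxpy with an SDP-capable solver is installed.

import cvxpy as cp

# the basic semidefinite relaxation (1.31) for max-cut on the 5-cycle C5
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]

X = cp.Variable((n, n), symmetric=True)
objective = cp.Maximize(sum((1 - X[i, j]) / 2 for (i, j) in edges))
prob = cp.Problem(objective, [X >> 0] + [X[i, i] == 1 for i in range(n)])
prob.solve()

print(prob.value)        # roughly 4.52 for C5, while max-cut(C5) = 4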
Let L_G ∈ S^n denote the Laplacian matrix of G: its (i, i)th diagonal entry is the degree of node i in G, and the (i, j)th off-diagonal entry is −1 if {i, j} is an edge and 0 otherwise. One can easily verify that

x^T L_G x = Σ_{{i,j}∈E} (x_i − x_j)²   for all x ∈ ℝ^n,

(1/4) x^T L_G x = (1/2) Σ_{{i,j}∈E} (1 − x_i x_j)   for all x ∈ {±1}^n.

The first identity shows that L_G ⪰ 0 and the second one shows that one can reformulate max-cut as an optimization problem involving the Laplacian matrix:

max-cut(G) = max_x { (1/4) x^T L_G x : x ∈ {±1}^n }.    (1.32)

Analogously one can reformulate the semidefinite program (1.31) as

sdp(G) = max_X { (1/4) ⟨L_G, X⟩ : X ⪰ 0, X_ii = 1 for i ∈ [n] }.    (1.33)
Note that in the above reformulation (1.32) of max-cut one can replace the condition x ∈ {±1}^n by the condition x ∈ [−1, 1]^n. Indeed, as the matrix L_G is positive semidefinite, the quadratic function x^T L_G x is convex and thus the maximum is attained at an extreme point x ∈ {±1}^n of the box [−1, 1]^n.
More generally, given a positive semidefinite matrix A, consider the following maximization problem

opt(A) = max_x { x^T A x : ‖x‖_∞ ≤ 1 } = max_x { x^T A x : x ∈ [−1, 1]^n },    (1.34)

where ‖x‖_∞ = max_i |x_i| is the ℓ_∞-norm. Again, as we maximize a convex function over the convex set [−1, 1]^n, the maximum is attained at an extreme point and thus (1.34) is equivalent to

opt(A) = max_x { x^T A x : x ∈ {±1}^n }.    (1.35)

This problem is NP-hard since it contains the max-cut problem, which is obtained when choosing A = L_G/4.
Note that if we were to replace in (1.34) the ℓ_∞-unit ball by the Euclidean unit ball, then we would find the problem of computing the largest eigenvalue of A which, as we saw earlier, can be modeled as a semidefinite program.
Just as for max-cut one can formulate the following semidefinite relaxation of (1.35) (and thus of (1.34)):

sdp(A) = max_X { ⟨A, X⟩ : X ⪰ 0, X_ii = 1 for i ∈ [n] }.

As we will see in Chapter 5 this semidefinite program too gives a constant factor approximation of the problem (1.34):

opt(A) ≤ sdp(A) ≤ (π/2) · opt(A).

We will investigate further variations of semidefinite approximations of quadratic problems in Chapter 5, as well as generalizations in the context of the Grothendieck problem in Chapter 6.

1.7 Examples in geometry


Given vectors u_1, ..., u_n ∈ ℝ^k, let d = (d_ij) denote the vector consisting of their pairwise squared Euclidean distances, i.e., d_ij = ‖u_i − u_j‖² for all 1 ≤ i < j ≤ n. Now, think of the vectors u_i as representing the locations
of some objects (e.g., atoms of a molecule, or sensors in a sensor network).
One might be able to determine the pairwise distances dij by making some
measurements. However, in general, one can determine these distances dij
only for a subset of pairs, that we can view as the set of edges of a graph G.
Then the problem arises whether one can reconstruct the locations of the
objects (the vectors ui ) from these partial measurements (the distances dij
for the edges {i, j} of G).

In mathematical terms, given a graph G = (V = [n], E) and d ∈ ℝ^E_+, decide whether there exist vectors u_1, ..., u_n ∈ ℝ^k such that

‖u_i − u_j‖² = d_ij   for all {i, j} ∈ E.

Of course, this problem comes in several flavors. One may search for such
vectors ui lying in a space of prescribed dimension k; then typically k = 1, 2,
or 3 would be of interest. This is in fact a hard problem. However, if we relax
the bound on the dimension and simply ask for the existence of the ui ’s in
Rk for some k ≥ 1, then the problem can be cast as the problem of deciding
feasibility of a semidefinite program.

Lemma 1.7.1   Given d ∈ ℝ^E_{≥0}, there exist vectors u_1, ..., u_n ∈ ℝ^k (for some k ≥ 1) satisfying ‖u_i − u_j‖² = d_ij for all {i, j} ∈ E if and only if the following semidefinite program is feasible:

X ⪰ 0,   X_ii + X_jj − 2X_ij = d_ij for {i, j} ∈ E.

Moreover, such vectors exist in the space ℝ^k if and only if the above semidefinite program has a feasible solution of rank at most k.

Proof   Directly, using the fact that X ⪰ 0 if and only if X admits a Gram representation u_1, ..., u_n ∈ ℝ^k (for some k ≥ 1), i.e., X_ij = u_i^T u_j for all i, j ∈ [n]. Moreover, the rank of X is equal to the dimension of the linear span of the set {u_1, ..., u_n}.
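In one direction the proof is constructive: from a feasible X one reads off points via a Gram factorization. A small numpy sketch on a hypothetical graph and hidden locations (illustration only):

import numpy as np

# From a feasible solution X of the program in Lemma 1.7.1, recover points whose
# pairwise squared distances match d on the edges of G.
rng = np.random.default_rng(5)
n, k = 6, 3
U = rng.standard_normal((n, k))                    # "hidden" locations (illustrative data)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
d = {e: np.sum((U[e[0]] - U[e[1]]) ** 2) for e in edges}

X = U @ U.T                                        # a feasible (rank <= k) solution
w, V = np.linalg.eigh(X)
w = np.clip(w, 0, None)                            # clip tiny negative eigenvalues
P = V * np.sqrt(w)                                 # rows of P are recovered points u_1, ..., u_n

for (i, j) in edges:
    assert abs(np.sum((P[i] - P[j]) ** 2) - d[(i, j)]) < 1e-8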

Thus the problem of finding low rank solutions to a semidefinite program arises naturally. We will come back to this topic in Chapter 8 in Part III.

1.8 Examples in algebra


Another, maybe a bit unexpected at first sight, application of semidefinite programming is testing whether a multivariate polynomial can be written as a sum of squares of polynomials.
First recall a bit of notation. We let ℝ[x_1, ..., x_n] (or simply ℝ[x]) denote the ring of polynomials with real coefficients in n variables. A polynomial p ∈ ℝ[x] can be written as p = Σ_α p_α x^α, where p_α ∈ ℝ and x^α stands for the monomial x_1^{α_1} ··· x_n^{α_n}, where α ∈ ℕ^n. The sum is finite and the maximum value of |α| = Σ_{i=1}^n α_i for which p_α ≠ 0 is the degree of p. For an integer d, [x]_d denotes the vector consisting of all monomials of degree at most d, which has (n+d choose d) entries. Here the entries are listed in some given order; for instance, in the case n = 2, if we use the graded lexicographic order then the entries of [x]_2 are listed as 1, x_1, x_2, x_1², x_1x_2, x_2². We use the bold-face letter p = (p_α) to denote the vector of coefficients of p (listed in the same order). Then we can write

p = Σ_α p_α x^α = p^T [x]_d.    (1.36)
Definition 1.8.1   A polynomial p is said to be a sum of squares (SOS) if p can be written as a sum of squares of polynomials, i.e., if p = Σ_{j=1}^m (q_j)² for some polynomials q_1, ..., q_m ∈ ℝ[x] and some integer m ≥ 1.

As an example, consider the polynomial p = 3x_1² + x_2² − 2x_1x_2 − 4x_1 + 4 in ℝ[x_1, x_2]. Then p = p^T [x]_2, where [x]_2 = (1, x_1, x_2, x_1², x_1x_2, x_2²)^T and p = (4, −4, 0, 3, −2, 1)^T. It is clear that p is SOS since p = (x_1 − x_2)² + (x_1 − 2)² + x_1².
It turns out that checking whether a polynomial p is SOS can be reformulated via a semidefinite program. Clearly, we may assume that p has even degree 2d (else p is not SOS) and the polynomials q_j arising in a SOS decomposition of p will have degree at most d.
To see this let us make the following simple manipulation, based on (1.36):

Σ_j q_j² = Σ_j [x]_d^T q_j q_j^T [x]_d = [x]_d^T ( Σ_j q_j q_j^T ) [x]_d = [x]_d^T Q [x]_d,

after setting Q = Σ_j q_j q_j^T. Having such a decomposition for the matrix Q amounts to requiring that Q is positive semidefinite. Therefore, we have just shown that the polynomial p is SOS if and only if

p = [x]_d^T Q [x]_d for some matrix Q ⪰ 0.
Linear conditions on Q arise by equating the coefficients of the polynomials on both sides in the above identity.
Summarizing, one can test whether p can be written as a sum of squares by checking the feasibility of a semidefinite program. If p has degree 2d, this SDP involves a variable matrix Q of size (n+d choose d) (the number of monomials of degree at most d) and (n+2d choose 2d) (the number of monomials of degree at most 2d) linear constraints.
One can sometimes restrict to smaller matrices Q. For instance, if the polynomial p is homogeneous (i.e., all its terms have degree equal to 2d), then we may assume without loss of generality that the polynomials q_j appearing in a SOS decomposition are homogeneous of degree d. Hence Q will be indexed by the (n+d−1 choose d) monomials of degree equal to d.
Why bother about sums of squares of polynomials? A good reason is that
they can be useful to recognize and certify that a polynomial is globally
nonnegative and to approximate optimization problems dealing with poly-
nomials. Let us just give a glimpse on this.
Suppose that one wants to compute the infimum pmin of a polynomial p
over the full space Rn . In other words, one wants to find the largest scalar
λ for which p(x) − λ ≥ 0 for all x ∈ Rn . This is in general a hard problem.
However, if we relax the positivity condition on p−λ and instead require that
p − λ is a sum of squares, then it follows from the above considerations that
we can compute the maximum λ for which p − λ is SOS using semidefinite
programming. This gives a tractable bound psos satisfying: psos ≤ pmin .
In general the bound psos might be distinct from pmin . However in the
univariate case (n = 1), it is known that equality holds: pmin = psos . (This
follows from the result in Exercise 11.3.) Equality holds also in the quadratic
case: d = 2, and in one exceptional case: n = 2 and d = 4, as was shown by
Hilbert in 1888.
We will return to sums of squares of polynomials and their use in polynomial optimization in Chapters 11, 12 and 13 in Part IV. We will also consider the more general context of noncommutative polynomial optimization and its application in quantum information in Chapter 14.

1.9 Further reading


A detailed treatment about Fan’s theorem (Theorem 1.2.2) can be found
in Overton and Womersley [1992] and a detailed discussion about Hoffman-
Wielandt inequality, Theorem 1.3.1 and applications to the quadratic as-
signment problem can be found in Anstreicher and Wolkowicz [2000].

The recent monograph of Ben-Tal, El Ghaoui and Nemirovski [2009] of-


fers a detailed treatment of robust optimization. The result presented in
Theorem 1.5.1 is just one of the many instances of problems which admit a
robust counterpart which is a tractable optimization problem. Although we
formulated it in terms of semidefinite programming (to fit our discussion), it
can in fact be formulated in terms of second-order conic optimization, which
admits faster algorithms.
The theta number ϑ(G) was introduced in the seminal work of Lovász [1979]. A main motivation of Lovász was to give good bounds for the Shannon capacity of a graph, an information theoretic measure of the graph. Lovász succeeded in determining the exact value of the Shannon capacity of C_5, the circuit on five nodes, by computing ϑ(C_5) = √5. This work of Lovász can be considered as the first breakthrough application of semidefinite programming, although the name semidefinite programming was coined only later. Aigner and Ziegler [2003] give a beautiful treatment of this result.
The monograph by Grötschel, Lovász and Schrijver [1988] treats in detail
algorithmic questions related to semidefinite programming and, in particu-
lar, to the theta number. Polynomial time solvability based on the ellipsoid
method is treated in detail there.
Using semidefinite programming to approximate max-cut was pioneered
by the work of Goemans and Williamson [1995]. This novel approach and
their result had a great impact on the area of combinatorial optimization.
It indeed spurred a lot of research activity for getting tight approximations
for various problems. This line of research is now very active also in theoret-
ical computer science. In particular, Subhash Khot formulated his Unique Games conjecture, which is directly relevant to the approximation guarantee
for max-cut achieved by Goemans and Williamson using the basic semidef-
inite relaxation (1.31). For his work around Unique Games Khot received
the Nevanlinna Prize in 2014. His plenary talk at the International Congress
of Mathematicians 2014 in Seoul and the related paper are available online,
see Khot [2014]. See also, e.g., the survey by Trevisan [2012].
Sums of squares of polynomials are a classical topic in mathematics and
they have many applications, e.g., to control theory and engineering. In the
late 1800s David Hilbert classified the parameters degree/number of vari-
ables for which any positive polynomial can be written as a sum of squares
of polynomials. He posed the question whether any positive polynomial can
be written as a sum of squares of rational functions, known as Hilbert’s 17th
problem. This was solved by Artin in 1927, a result which started the field of
real algebraic geometry. The survey by Reznick [2000] gives a nice overview
and historical perspective and the monographs by Prestel and Delzell [2001]
and by Marshall [2008] give an in-depth treatment of positivity.

Exercises
1.1 Consider the program (1.9) in Fan's theorem (Theorem 1.2.2).
(a) Formulate the program (1.9) as a semidefinite program in primal standard form.
(b) Show that the dual SDP of the program (1.9) can be formulated as the following SDP:

min_{z ∈ ℝ, Z ∈ S^n} { kz + Σ_{i=1}^n Z_ii : Z ⪰ 0, −C + zI + Z ⪰ 0 }

and that there is no duality gap.
(c) Give a semidefinite programming formulation for the following problem:

min { λ_1(X) + ... + λ_k(X) : ⟨A_j, X⟩ = b_j for j ∈ [m] },

which asks for a matrix X ∈ S^n satisfying a system of linear constraints and for which the sum of the k largest eigenvalues of X is minimum.

1.2 Show that the following assertions (a)–(d) are equivalent for a matrix
X ∈ Rn×n :
(a) X is
a permutation matrix.
(b) X is
an orthogonal matrix and X is doubly stochastic.

(c) X doubly stochastic and kXk = n.
is
(d) X doubly stochastic with entries in {0, 1}.
is
qP
Here kXk = 2
i,j Xij denotes the Frobenius norm of the matrix X.

1.3 Let G = (V = [n], E) be a graph and let λ1 ≤ λ2 ≤ . . . ≤ λn denote


the eigenvalues of its Laplacian matrix LG .
(a) Show that LG is positive semidefinite.
(b) Show: If G is connected then the kernel of LG has dimension 1.
(c) Show: The dimension of the kernel of LG is equal to the number of
connected components of G.
(d) Show: λ2 > 0 if and only if G is connected.

1.4 Consider the program (1.33).


26 Semidefinite programs: Basic facts and examples

(a) Build the dual of the semidefinite programming (1.33) and show
that it is equivalent to
n
min {λmax (Diag(u) + LG ) : eT u = 0},
4 u∈Rn
where Diag(u) is the diagonal matrix with diagonal entries u1 , . . . , un .
(b) Show that the maximum cardinality of a cut is at most
n
λmax (LG ),
4
where λmax (LG ) is the maximum eigenvalue of the Laplacian matrix
of G.
(c) Show that the maximum cardinality of a cut in G is at most
1 n
|E| − λmin (AG ),
2 4
where AG is the adjacency matrix of G (with entry 1 at the positions
(i, j) corresponding to edges of G and 0 elsewhere).
(d) Show that both bounds in (b) and (c) coincide when G is a regular
graph (i.e., when all nodes have the same degree).

1.5 Consider the following polynomial in two variables x and y

p = x4 + 2x3 y + 3x2 y 2 + 2xy 3 + 2y 4 .

(a) Build a semidefinite program permitting to recognize whether p can


be written as a sum of squares of polynomials.
(b) Describe all possible sums of squares decompositions for p.
(c) What can you say about the number of squares needed?

1.6 Given integers n, m ∈ N we let Mn = Rn×n denote the set of n × n


matrices with real entries and Mn (Mm ) denote the set of nm × nm
matrices of the form (aij )m
i,j=1 with entries aij ∈ Mm .
Consider a linear map Φ : Mn → Mm . For any integer p ∈ N we can
define the linear map
Φ(p) : Mp (Mn ) → Mp (Mm )
(aij )pi,j=1 7→ (Φ(aij ))pi,j=1 .

Then the linear map Φ : Mn → Mm is said to be positive if Φ(A)  0


for any A ∈ Mn such that A  0, and Φ is said to be completely positive
(CP) if Φ(p) is positive for all p ∈ N. The goal of this exercise is to give
several equivalent characterizations for CP maps.
Exercises 27

Let Fij ∈ Mn denote the asymmetric elementary matrices, with all


zero entries except 1 at position (i, j). Define the matrix
CΦ = (Φ(Fij ))ni,j=1 ∈ Mn (Mm ),
known as the Choi matrix of the map Φ.
Show that the following items (a)–(d) are equivalent.
(a) Φ is CP.
(b) Φ(n) is positive.
(c) The matrix CΦ is positive semidefinite.
(d) There exist matrices Vi ∈ Rn×m (1 ≤ i ≤ nm) such that the map Φ
is given by
nm
X
Φ(A) = ViT AVi for all A ∈ Mn .
i=1

Hint: Check and use the following fact (for the implication (c) =⇒
(d)): Given a matrix V ∈ Rn×m , let x1 , . . . , xn ∈ Rm be the vectors
corresponding to its rows. Then we have V T Fij V = xi xTj.
2
Duality in conic programming (Version: May 24,
2022)

Traditionally, convex optimization problems are of the form

minimize f0 (x)
subject to f1 (x) ≤ 0, . . . , fN (x) ≤ 0,
aT T
1 x = b1 , . . . , aM x = bM ,

where the objective function f0 : D → R and the inequality constraint func-


tions fi : D → R which are defined on a convex domain D ⊆ Rn are convex,
i.e. their epigraphs

epi fi = {(x, α) : D × R : fi (x) ≤ α}, i = 0, . . . , N,

are convex sets in D × R ⊆ Rn+1 . Equivalently, the function fi is convex if


and only if

∀x, y ∈ D ∀α ∈ [0, 1] : fi ((1 − α)x + αx) ≤ (1 − α)fi (x) + αfi (y).

The equality constraints are given by vectors aj ∈ Rn \ {0} and right hand
sides bj ∈ R. The convex set of feasible solutions is the intersection of N
convex sets with M hyperplanes

N
\ M
\
{x ∈ D : fi (x) ≤ 0} ∩ {x ∈ Rn : aT
j x = bj }.
i=1 j=1

The set-up for conic programming is slightly different. We start by consid-


ering a fixed convex cone K lying in the n-dimensional Euclidean space Rn .
The task of conic programming is the following: One wants to maximize (or
minimize) a linear function over the feasible region which is given as the
Duality in conic programming (Version: May 24, 2022) 29

intersection of the convex cone K with an affine subspace:

maximize cT x
subject to x ∈ K,
aT T
1 x = b1 , . . . , am x = bm .

This differs only slightly from a traditional convex optimization problem:


The objective function is linear and feasibility with respect to the inequality
constraint functions is replaced by membership in the fixed convex cone K.
In principle, one can transform every convex optimization problem into a
conic program. However, the important point in conic programming is that
it seems that a vast majority of convex optimization problems which come
up in practice can be formulated as conic programs using the three standard
cones:

1. the non-negative orthant Rn≥0 – giving linear programming (LP),


2. the second-order cone Ln+1 – giving second-order cone programming (aka
conic quadratic programming, CQP),
3. or the cone of positive semidefinite matrices S+ n – giving semidefinite

programming (SDP).

As we will see in the next lecture, these three cones have particular nice
analytic properties: They have a self-concordant barrier function which is
easy to evaluate. This implies that there are theoretically (polynomial-time)
and practically efficient algorithms to solve these standard problems.
In addition to this, the three examples are ordered by their “difficulty”,
which can be pictured as

LP ⊆ CQP ⊆ SDP.

This means that one can formulate every linear program as a conic quadratic
program and one can formulate every conic quadratic program as a semidef-
inite program.

Why do we care about conic programming in general and do not focus on


these three most important special cases?
The answer is that conic programming gives a unifying framework to
design algorithms, to understand the basic principles of its geometry and
duality, and to model optimization problems. Moreover this offers the flexi-
bility of dealing with new cones obtained, e.g., by taking direct products of
the three standard types of cones.
30 Duality in conic programming (Version: May 24, 2022)

2.1 Convex cones and their duals


Before we can say what a “conic program” is, we have to define convex
cones. For smoothly developing the theory we desire a “nice” convex cone
K, satisfying the following properties: K is closed, convex, pointed, and has
a non-empty interior or, equivalently, it is full-dimensional.
Definition 2.1.1 A non-empty subset K of Rn is called a convex cone (or
simply cone) if it is closed under non-negative linear combinations:
αx + βy ∈ K for all α, β ∈ R+ x, y ∈ K.
One can easily check that convex cones are indeed convex sets.
Definition 2.1.2 A convex cone K is pointed if it does not contain lines:
If x and −x ∈ K, then x = 0.
A pointed convex cone in Rn defines a partial order on Rn by
x K y ⇐⇒ x − y ∈ K
for x, y ∈ Rn . In most cases it is clear from the context which cone we
consider, so we will simply write x  y.
Generally, a partial order is a relation that satisfies the following three
conditions:
(i) reflexivity: x  x for all x ∈ Rn ,
(ii) antisymmetry: x  y, y  x =⇒ x = y, for all x, y ∈ Rn ,
(iii) transitivity: x  y, y  z =⇒ x  z for all x, y, z ∈ Rn .
The model case is the half-line K = [0, ∞) ⊆ R. Then x K y is just
the greater or equal relation x ≥ y. However, in a general partial order not
every pair of vectors can be compared.
The partial order given by a pointed convex cone satisfies two extra con-
ditions:
(iv) homogenity: x  y =⇒ αx  αy for all x, y ∈ Rn , α ∈ R+ ,
(v) additivity: x  y, x0  y 0 =⇒ x + x0  y + y 0 for all x, y, x0 , y 0 ∈ Rn .
Convex cones which are useful for conic optimization are closed and full-
dimensional, i.e. have non-empty interior. Together, we call a convex cone
proper if it is closed, pointed, and full-dimensional.
For a proper convex cone K we can define strict inequalities by:
x  y ⇐⇒ x − y ∈ int K.
So vectors in the interior of K can be regarded as positive elements and K
itself is a domain of positivity.
2.1 Convex cones and their duals 31

Definition 2.1.3 The dual cone of a convex cone K ⊆ Rn is defined as


K ∗ = {y ∈ Rn : xT y ≥ 0 for all x ∈ K}.
Clearly, K ∗ is a convex cone which is closed, because it is the intersection
of closed halfspaces.
Vectors in the dual cone K ∗ can be used to separate vectors from the cone
K. The separation result from Lemma A.4.1 specializes to convex cones in
the following way. For this note that the origin belongs to K.

0
c
H

Figure 2.1 Separating a point z from a closed convex cone K.

Lemma 2.1.4 Let K ⊆ Rn be a closed convex cone and let z ∈ Rn \ K be


a vector outside of K. Then there is a linear hyperplane separating {z} and
K. Even stronger, there is a non-zero vector c ∈ Rn such that
cT x ≥ 0 > cT z for all x ∈ K.
In particular, the inner product cT z is strictly negative and c lies in the dual
cone K ∗ .
What happens when one builds the dual of the dual? Not too much, as
the bipolar theorem shows:
Theorem 2.1.5 (Bipolar theorem) Let K ⊆ Rn be a convex cone, then
K = (K ∗ )∗ .
Proof The inclusion K ⊆ (K ∗ )∗ is easy to verify using the definition only.
For the reverse inclusion, one applies the separation result in the form of
Lemma 2.1.4. Let z ∈ Rn \ K. Then there exists c ∈ K ∗ so that cT z < 0.
Hence, z 6∈ (K ∗ )∗ .
32 Duality in conic programming (Version: May 24, 2022)

Dual cones of proper cones are proper as wll; you will prove the following
lemma in Exercise 2.1.
Lemma 2.1.6 If K is a proper convex cone, then its dual cone K ∗ is
proper.
One can verify that a vector lies in the interior of K via its dual cone K ∗ ;
this fact will turn out to be quite useful and you will prove it in Exercise 2.2:
Lemma 2.1.7 Let K be a closed, full-dimensional convex cone. Then x
lies in the interior of K if and only if xT y > 0 for all y ∈ K ∗ \ {0}.

2.2 Examples of convex cones


We start by considering basic examples and constructions. Every linear sub-
space L of Rn is a convex cone, its dual cone is its orthogonal complement
L⊥ .
The conic hull of set of vectors X ⊆ Rn is the smallest convex cone
containing X. It is
(N )
X
cone X = αi xi : N ∈ N, x1 , . . . , xN ∈ X, α1 , . . . , αN ∈ R+ .
i=1

One can translate Carathéodory’s theorem for convex hulls, Lemma A.2.1,
including its proof, for conic hulls: Let y ∈ cone X be a vector in the conic
hull of X. Then there are linearly independent vectors x1 , . . . , xN ∈ X so
that y ∈ cone{x1 , . . . , xN }. In particular, N ≤ n.
Sometimes we also call the conic hull of X the cone generated by X. If
X is a finite set, then cone X is called finitely generated. By the theorem
of Minkowski and Weyl for convex cones (see Section A.7) a convex cone
K is finitely generated cone if and only it is polyhedral, that is, if it is the
intersection of finitely many halfspaces through the origin. Then,

K = cone{x1 , . . . , xN } = {x ∈ Rn : Ax ≤ 0}

for some vectors x1 , . . . , xN and some matrix A ∈ Rm×n . Using this repre-
sentation it is immediate to find the dual cone of K, it is

K ∗ = cone{−a1 , . . . , −am } = {x ∈ Rn : −xT T


1 x ≤ 0, . . . , −xN x ≤ 0},

where aT T
1 , . . . , am are the row vectors of matrix A.
The direct product of two convex cones K1 ⊆ Rn1 and K2 ⊆ Rn2 is

K1 × K2 = {(x1 , x2 ) ∈ Rn1 +n2 : x1 ∈ K1 , x2 ∈ K2 }


2.2 Examples of convex cones 33

is again a convex cone. We have (K × K1 )∗ = K1∗ × K2∗ . If K1 and K2 are


both proper, then their direct product K1 × K2 is proper, too.

2.2.1 The non-negative orthant


The non-negative orthant is defined as
Rn+ = {x = (x1 , . . . , xn )T ∈ Rn : x1 , . . . , xn ≥ 0}.
It is a pointed, closed and full-dimensional cone. The non-negative orthant is
a polyhedral cone and it can also be written as Rn+ = cone{e1 , . . . , en } where
e1 , . . . , en are the standard unit basis vectors in Rn . Another way to write
Rn+ is as an n-fold direct product Rn+ = R+ × · · · × R+ . The non-negative
orthant is self-dual: (Rn+ )∗ = Rn .
The partial order x  y associated to the non-negative orthant means
inequality coordinate-wise: xi ≥ yi for all i ∈ [n]. We also then simply write
x ≥ y.

2.2.2 The second-order cone


While the non-negative orthant is a polyhedral cone, the following cone is
not. The second-order cone is defined in the Euclidean space Rn+1 = Rn × R
with the standard inner product. It is
 q 
n+1 n 2 2
L = (x, s) ∈ R × R : kxk2 = x1 + · · · + xn ≤ s .

Here (x, s) stands for the (column) vector in Rn+1 obtained by appending a
new entry s ∈ R to x ∈ Rn , we use this notation to emphasize the different
nature of the vector’s components. Sometimes Ln+1 is also called the ice
cream cone (make a drawing of L2+1 to convince yourself) or the Lorentz
cone.
The second-order cone is a special case of a norm cone. The norm cone
associated to any norm k · k in Rn is defined as
Ln+1 n
k·k = {(x, s) ∈ R × R : kxk ≤ s} .

The norm cone is proper for every norm. The dual norm of k · k is defined
as
kxk∗ = sup{xT y : y ∈ Rn , kyk ≤ 1}
The dual norm is again a norm and its norm cone satisfies
n+1 ∗
Ln+1
k·k∗ = (Lk·k ) .
34 Duality in conic programming (Version: May 24, 2022)

Since the `2 -norm is its own dual, the second-order cone is self-dual; we have
(Ln+1 )∗ = Ln+1 .

2.2.3 The cone of positive semidefinite matrices


The cone of positive semidefinite matrices lies in the space S n of symmetric
n × n matrices, which can be seen as the (n(n + 1)/2)-dimensional Euclidean
space, equipped with the trace inner product: for two matrices X, Y ∈ Rn×n ,
n X
X n n
X
T
hX, Y i = Tr(X Y ) = Xij Yij , where TrX = Xii .
i=1 j=1 i=1

Here we identify the Euclidean space S n with Rn(n+1)/2 by the isometry


T : S n → Rn(n+1)/2 defined by
√ √ √
T (X) = (X11 , 2X12 , √2X13 , . . . , √2X1n ,
X22 , 2X23 , . . . , 2X2n , (2.1)
..., Xnn )

where we only consider the upper triangular part of the matrix X. The cone
of semidefinite matrices is
n
S+ = {X ∈ S n : X is positive semidefinite},

where a matrix X is positive semidefinite if xT Xx ≥ 0 for all x ∈ Rn . Other


characterizations of positive semidefinite matrices are given in Appendix
B, there we assemble more useful facts about the cone S+ n and the trace

iner product. For example, we will see that the cone of positive semidefinite
matrices is a proper convex cone which is self-dual.

2.2.4 The completely positive and the copositive cone


The cone of completely postive matrices, defined as

CP n = cone{xxT : x ∈ Rn+ }
n . It is a proper convex cone. Its dual is
is contained in S+

(CP n )∗ = COP \ = {X ∈ S n : xT Xx ≥ 0 for all x ∈ Rn+ },

which is the cone of copositive matrices.


2.3 Primal and dual conic programs 35

2.3 Primal and dual conic programs


Let K ⊆ Rn be a proper convex cone.
Definition 2.3.1 Given c ∈ Rn , a1 , . . . , am ∈ Rn , and b1 , . . . , bm ∈ R,
a primal conic program (in standard form) is the following maximization
problem:
sup{cT x : x ∈ K, aT T
1 x = b1 , . . . , am x = bm },

which can also be written in a more compact form as


sup{cT x : x ∈ K, Ax = b}, (P)
where A is the m × n matrix with rows aT T
1 , . . . , am and right hand side
T m
b = (b1 , . . . , bm ) ∈ R .
We say that x ∈ Rn is a feasible solution of the primal (P) if it lies in
the cone K and if it satisfies the equality constraints. It is a strictly feasible
solution if it additionally lies in the interior of K.
Note that we used a supremum here instead of a maximum. The reason is
simply that sometimes the supremum is not attained. We shall see examples
in Section 2.5.
The principal problem of duality is to find upper bounds for the primal
conic program (a maximization problem), in a systematic, or even mechani-
cal way. This is helpful e.g. in formulating optimality criteria and in the de-
sign of efficient algorithms. Duality is a powerful technique, and sometimes
translating primal problems into dual problems gives unexpected benefits
and insights.
Definition 2.3.2 With the primal conic program (P) associate its dual
conic program, which is the following minimization problem
 
Xm m
X 

inf yj bj : y1 , . . . , ym ∈ R, yj aj − c ∈ K ,
 
j=1 j=1

or, more compactly,


inf{bT y : y ∈ Rm , AT y − c ∈ K ∗ }. (D)
We say that y ∈ Rm is a feasible solution of the dual (D) if AT y − c ∈ K ∗ .
It is a strictly feasible solution if AT y − c ∈ int K ∗ .
The following explanation shows how to view the primal and the dual
conic program geometrically. For this consider the linear subspace
L = {x ∈ Rn : Ax = 0},
36 Duality in conic programming (Version: May 24, 2022)

and its orthogonal complement


n o
L⊥ = AT y : y ∈ Rm .

We may assume that there exists a point x0 ∈ Rn satisfying Ax0 = b for, if


not, the primal conic program would not have a feasible solution. Note then
that
bT y = xT T T T T
0 A y = x0 (A y − c) + x0 c.

Therefore, the primal conic program (P) can be written as

sup{cT x : x ∈ K ∩ (x0 + L)}

and the dual conic program (D) as


∗ ⊥
cT x0 + inf{xT
0 z : z ∈ K ∩ (−c + L )}.

Now the symmetry between the primal and the dual conic program becomes
more clear, both programs maximize, respectively minimize, a linear func-
tional over the intersection of a cone with an affine subspace. Using this
geometric view and by applying the bipolar theorem, Theorem 2.1.5, it is
easy to see that computing the dual of the dual conic program gives back
the primal.
Now we specialize the cone K to the main examples of Section 2.2. These
examples are useful for a huge spectrum of applications.

2.3.1 Example: Linear programming (LP)


A conic program where K is the non-negative orthant Rn+ is a linear program.
We write a primal linear program (in standard form) as

sup{cT x : x ≥ 0, Ax = b}.

The non-negative orthant is self-dual, so the dual linear program is

inf{bT y : AT y − c ≥ 0}.

2.3.2 Example: Conic quadratic programming (CQP)


A conic program where K is the direct product of second-order cones Ln+1
is a conic quadratic program. For

K = Ln1 +1 × Ln2 +1 × · · · × Lnr +1


2.3 Primal and dual conic programs 37

we write a primal conic quadratic program (in standard form) as


sup{((c1 , γ1 ), . . . , (cr , γr ))T ((x1 , s1 ), . . . , (xr , sr )) :
((x1 , s1 ), . . . , (xr , sr )) ∈ Ln1 +1 × Ln2 +1 × · · · × Lnr +1
((aj,1 , αj,1 ), . . . , (aj,r , αj,r ))T ((x1 , s1 ), . . . , (xr , sr )) = bj for j ∈ [m]}.
The second-order cone is self-dual, hence K ∗ = K, and the dual conic
quadratic program equals
n
inf bT y : y ∈ Rm ,
m
X
yj ((aj,1 , αj,1 ), . . . , (aj,r , αj,r )) − ((c1 , γ1 ), . . . , (cr , γr ))
j=1
o
∈ Ln1 +1 × Ln2 +1 × · · · × Lnr +1

The dual can be written in a nicer and more intuitive form using the defini-
tion of the cone Ln+1 and the Euclidean norm. For this define the matrices
Ai = a1,i , . . . , am,i ∈ Rni ×m for i ∈ [r],


and vectors
αi = (α1,i , . . . , αm,i )T for i ∈ [r].
Then the dual if equivalent to
inf{bT y : y ∈ Rm , kAi y − ci k ≤ αiT y − γi for i ∈ [r]}.
In particular, when setting Ai = 0 and ci = 0, we see that linear program-
ming is a special case of convex quadratic programming.

2.3.3 Example: Semidefinite programming (SDP)


A conic program where K is the cone of positive semidefinite matrices S+n is

a semidefinite program. We write a primal semidefinite program (in standard


form) as
n
sup{hC, Xi : X ∈ S+ , hA1 , Xi = b1 , . . . , hAm , Xi = bm }.
In this case, when we map S n to Rn(n+1)/2 by the isometry defined in (2.1),
the matrix A in (P) has size m × (n(n + 1)/2), has T (Aj ), with j ∈ [m], as
row vectors, and satisfies
 
hA1 , Xi
AT (X) =  ..
.
 
.
hAm , Xi
38 Duality in conic programming (Version: May 24, 2022)

The cone of positive semidefinite matrices is self-dual, and hence the dual
semidefinite program is
 
Xm m
X 
n
inf bj yj : y1 , . . . , ym ∈ R, yj Aj − C ∈ S+ .
 
j=1 j=1

Engineers and applied mathematicians like to call an inequality of the form


m
X
y i Ai − C  0
i=1

a linear matrix inequality (LMI) between the parameters y1 , . . . , ym . It is


a convenient way to express a convex constraint posed on the vector y =
(y1 , . . . , ym )T .
Semidefinite programs where the optimization variable X is restricted to a
diagonal matrix are equivalent to linear programs. Thus, linear programming
is a special case of semidefinite programming.
Convex quadratic programming is also a special case of semidefinite pro-
gramming. This can be proved via the Schur complement.
Lemma 2.3.3 One can express the second order cone by a linear matrix
inequality:
   
n+1 n+1 sIn x
L = (x, s) ∈ R : 0 ,
xT s

where In ∈ Rn×n is the identity matrix.

Proof If s = 0, then x = 0. If s > 0, then sIn is positive


 definite and we con-

sIn x
sider the Schur complement of sIn in the matrix xT s . By Lemma B.2.2
 
sIn x 1
T  0 ⇐⇒ s − xT In x ≥ 0 ⇐⇒ kxk ≤ s.
x s s

2.3.4 Example: Copositive programming


Conic programming involving the copositive cone is called copositive pro-
gramming. Syntactically it looks similar to semidefinite programming but
there are two very important differences. First, the copositive cone is not
self-dual. The primal conic program is using the cone of completely positive
matrices.

sup{hC, Xi : X ∈ CP n , hA1 , Xi = b1 , . . . , hAm , Xi = bm }.


2.4 Duality theory 39

The dual conic program is using the cone of copositive matrices.


 
Xm m
X 
inf bj yj : y1 , . . . , ym ∈ R, yj Aj − C ∈ COP n .
 
j=1 j=1

Second, one can model computationally difficult, NP-hard problem, like de-
termining the independence number of a graph, as a copositive program,
see Exercise 2.7. This in particular shows that conic optimization is not
necessarily computationally easy.

2.4 Duality theory


Duality is concerned with understanding the relation between the primal
conic program and the dual conic program. We denote the supremum of the
primal conic program by p∗

p∗ = sup{cT x : x ∈ K, Ax = b}, (P)

and the infimum of the dual conic program by d∗

d∗ = inf{bT y : y ∈ Rm , AT y − c ∈ K ∗ }. (D)

What is the relation between p∗ and d∗ ?


As we shall see in the next theorem it turns out that we have always
p∗ ≤ d∗ . The nonnegative difference d∗ − p∗ is called the duality gap between
the primal conic program and dual conic program.
In many cases one has equality p∗ = d∗ and that the supremum as well as
the infimum are attained. In these cases duality theory can be very useful
because sometimes it is easier to work with the dual problem instead of the
primal problem.

Theorem 2.4.1 Suppose we are given a pair of primal and dual conic
programs. Let p∗ be the supremum of the primal and let d∗ be the infimum
of the dual.

(i) (weak duality) Suppose x is a feasible solution of the primal conic pro-
gram, and y is a feasible solution of the dual conic program. Then,

cT x ≤ bT y.

In particular p∗ ≤ d∗ .
(ii) (complementary slackness) Suppose that the primal conic program
40 Duality in conic programming (Version: May 24, 2022)

attains its supremum at x, and that the dual conic program attains its
infimum at y, and that p∗ = d∗ . Then
 T
AT y − c x = 0.

(iii) (optimality criterion) Suppose that x is a feasible solution of the pri-


mal conic program, and y is a feasible solution of the dual conic program,
and equality
 T
AT y − c x = 0

holds. Then the supremum of the primal conic program is attained at x,


the infimum of the dual conic program is attained at y, and p∗ = d∗ holds.
(iv) (strong duality; no duality gap) If the dual conic program is bounded
from below and if it is strictly feasible, then the primal conic program
attains its supremum and there is no duality gap: p∗ = d∗ .
Dually, if the primal conic program is bounded from above and if it
is strictly feasible, then the dual conic programs attains its infimum and
there is no duality gap.

Before we proceed to the proof one more comment about the usefulness
of weak duality: Suppose you want to solve a primal conic program. If an
oracle gives you y, then it might be wise to check whether AT y − c lies in
K ∗ . If so, then this gives immediately an upper bound for p∗ .
One last remark: If the dual conic program is not bounded from below,
that is, if d∗ = −∞, then weak duality implies that p∗ = −∞, and so the
primal conic program is infeasible.

Proof The proof of weak duality is important but simple. It reveals the
origin of the definition of the dual conic program: We have

bT y = (Ax)T y = xT AT y ≥ xT c,

where the last inequality is implied by AT y − c ∈ K ∗ and by x ∈ K.


Now complementary slackness and the optimality criterion imme-
diately follow from this.
Strong duality needs considerably more work. It suffices to prove the
first statement; the second one follows using the symmetry between the
primal and dual problems. We assume that d∗ > −∞ and that the dual
program has a strict feasible solution, so d∗ is a real number. Using these
assumptions we will construct a primal feasible solution x∗ with cT x∗ ≥ d∗ .
2.4 Duality theory 41

Then, weak duality implies p∗ = d∗ and hence x∗ is a maximizer of the


primal conic program.
If b = 0 then d∗ = 0 and setting x∗ = 0 proves the result immediately.
From now on we assume that b 6= 0.
Consider the convex set
n o
M = AT y − c : y ∈ Rm , bT y ≤ d∗ ,

which is not empty because b 6= 0. We first claim that


M ∩ int K ∗ = ∅.
For suppose not. Then there exists a vector y ∈ Rm such that
AT y − c ∈ int K ∗ and bT y ≤ d∗ .
Consider
b
y0 = y − ε for some ε > 0.
kbk2
Then  
T 0 b
b y =b T
y−ε ≤ d∗ − ε
kbk2
and AT y 0 − c ∈ int K ∗ for small enough ε since the linear map given by AT
is continuous. This contradicts the fact that d∗ is the infimum of the dual
conic program.
Since M and int K ∗ are disjoint nonempty convex sets we can separate
them by an affine hyperplane. According to Theorem A.4.5 there is a vector
x ∈ Rn \ {0} so that
−∞ < sup{xT z : z ∈ M } ≤ inf{xT z : z ∈ int K ∗ }
(2.2)
= inf{xT z : z ∈ K ∗ }.
We shall use this vector x to construct a maximizer of the primal conic
program. We proceed in three steps.
First step: x ∈ K.
It suffices to show that
inf{xT z : z ∈ K ∗ } ≥ 0, (2.3)
as this implies that x ∈ (K ∗ )∗ = K because K is a proper cone. We show the
inequality by contradiction. Suppose there is a vector z ∈ K ∗ with xT z < 0.
For every positive λ, the vector λz lies in the convex cone K ∗ . Making λ
extremely large drives xT (λz) towards −∞. We reach a contradiction since,
42 Duality in conic programming (Version: May 24, 2022)

inequality (2.2) would give sup{xT z : z ∈ M } = −∞, which is impossible


since M 6= ∅.
Second step: There exists µ > 0 so that Ax = µb and cT x ≥ µd∗ .
Since 0 ∈ K ∗ we have that the infimum in (2.3) is equal to 0. Therefore,
by (2.2), sup{xT z : z ∈ M } ≤ 0. In other words, by the definition of M , for
any y ∈ Rm ,
 
bT y ≤ d∗ =⇒ xT AT y − c ≤ 0

or, equivalently,
bT y ≤ d∗ =⇒ (Ax)T y ≤ xT c.
This means that
{y ∈ Rm : bT y ≤ d∗ } ⊆ {y ∈ Rm : (Ax)T y ≤ xT c}.
The set on the left hand side is a half-space with normal vector b.
If Ax 6= 0, then the set on the right hand side is also half-space with
normal vector Ax which points in the same direction as b. So there is a
strictly positive scalar µ > 0 such that
Ax = µb and µd∗ ≤ xT c,
and we are done.
If Ax = 0, then on the one hand, we have that xT c ≥ 0. On the other hand,
using the assumption that the conic dual program is strictly feasible, there
exists y 0 ∈ Rm such that AT y 0 − c ∈ int K ∗ . This implies
 T
0 < AT y 0 − c x = −cT x,

where the strict inequality follows from Lemma 2.1.7. This gives cT x < 0, a
contradiction.
Third step: x∗ = x/µ is a maximizer of the primal conic program.
We saw above that x∗ ∈ K and Ax∗ = b. Thus x∗ is a primal feasible
solution. Furthermore, cT x∗ ≥ d∗ ≥ p∗ .

2.5 Some pathological examples


If you know linear programming and its duality theory you might wonder
why do we always write sup and inf instead of max and min and why do we
care about strictly feasibility in Theorem 2.4.1. Why doesn’t strong duality
always hold? Here are some examples of semidefinite programs showing that
we indeed have to be more careful.
2.5 Some pathological examples 43

2.5.1 Dual infimum not attained


In the first example, there is no duality gap, the primal optimal value is
attained, but the optimal dual value is not attained.
Consider the semidefinite program
n 0 1 

p = sup , X : X  0,
1 0
     
1 0 0 0 o
, X = 1, ,X = 0 .
0 0 0 1
The matrix X = ( 10 00 ) is the only feasible solution, and so p∗ = 0 is attained.
The dual is
       
∗ 1 0 0 0 0 1
d = inf y1 : y1 , y2 ∈ R, y1 + y2 − 0 .
0 0 0 1 1 0
For solving the dual we need to find a positive semidefinite matrix
 
y1 −1
−1 y2
where y1 is a nonnegative number which should be as small as possible.
Choosing y1 to be an arbitrary positive number is possible, but not choosing
y1 = 0. Hence, d∗ = 0 and the infimum is not attained. Note indeed that
the primal is not strictly feasible.

2.5.2 Positive duality gap


The second example demonstrates that there can be a positive duality gap.
Consider the primal semidefinite program with data matrices
     
0 0 0 1 0 0 0 0 1
C = 0 −1 0 , A1 = 0 0 0 , A2 = 0 1 0 ,
0 0 0 0 0 0 1 0 0
and b1 = 0, b2 = 1. It reads
p∗ = sup{−X22 : X  0, X11 = 0, 2X13 + X22 = 1}.
Its dual reads
   
 y1 0 y2 
d∗ = inf y2 : y1 , y2 ∈ R,  0 y2 + 1 0   0 .
y2 0 0
 

Then every primal feasible solution satisfies X13 = 0, X22 = 1, so that the
44 Duality in conic programming (Version: May 24, 2022)

primal optimum value is equal to p∗ = −1, attained at the matrix X = E22 .


Every dual feasible solution satisfies y2 = 0, so that the dual optimum value
is equal to d∗ = 0, attained at y1 = y2 = 0. Hence there is a positive duality
gap, d∗ − p∗ = 1, and both primal and dual are not strictly feasible.

2.5.3 Both primal and dual infeasible


The third example shows that primal and dual conic programs can be in-
feasible together.
Consider the semidefinite program
n 0 0 

p = sup , X : X  0,
0 1
     
1 0 0 1 o
, X = 0, ,X = 1
0 0 1 0
and its dual
   
∗ y1 y2
d = inf y2 : y1 , y2 ∈ R, 0 .
y2 −1
Both programs are infeasible, so that
−∞ = p∗ < d∗ = +∞.

2.6 Theorems of alternatives


Consider the following two conic programming systems
Ax = b, x ∈ K, (2.4)
and
AT y ∈ K ∗ , bT y < 0. (2.5)
Clearly, if (2.4) has a solution then (2.5) has no solution: If x is feasible
for (2.4) and y is feasible for (2.5) then
0 ≤ (AT y)T x = y T Ax = y T b < 0,
giving a contradiction.
When K is the non-negative orthant then the converse also holds: If (2.4)
has no solution then (2.5) has a solution. This fact follows by applying
the separation theorem (Lemma 2.1.4). Indeed, assume that (2.4) has no
solution. Then b does not belong to the cone
CA = {Ax : x ∈ K}
2.6 Theorems of alternatives 45

generated by the columns of A. This is a finitely generated cone. Therefore


it is closed and by Lemma 2.1.4, there exists a hyperplane separating {b}
and the cone CA . Say this hyperplane has normal vector y ∈ Rm . So we have
the inequalities AT y ≥ 0 and y T b < 0. This shows that y is feasible for (2.5).
In fact, we just proved Farkas’ lemma for linear programming:
Theorem 2.6.1 Given A ∈ Rm×n and b ∈ Rm , exactly one of the following
two alternatives holds:
(i) Either the linear system Ax = b, x ≥ 0 has a solution,
(ii) or the linear system AT y ≥ 0, bT y < 0 has a solution.
However, for general conic programming, it is not true that infeasibility
of (2.4) implies feasibility of (2.5).
Example 2.6.2 Consider the following semidefinite systems:
hE11 , Xi = 0, hE12 , Xi = 1, X  0, (2.6)
and
y1 E11 + y2 E12  0, y2 < 0, (2.7)
which are both infeasible. Why does the proof above, which worked for the
non-negative orthant, fail for the cone of positive semidefinite matrices? The
answer is that the cone
{(hE11 , Xi, hE12 , Xi) : X  0} = {(X11 , X12 ) : X11 > 0, X12 ∈ R} ∪ {(0, 0)}
is not closed und thus Lemma 2.1.4 is not applicable.
There are two possible routes to establish a theorem of alternatives for
conic programming systems.
If one wants to leave the condition of feasibility of the dual (2.5) intact,
then one route is to weaken the condition of feasibility in the primal (2.4).
Definition 2.6.3 We say that the conic programming system (2.4) is
weakly feasible if for every ε > 0 there is a x ∈ K so that
kAx − bk ≤ ε.
In other words, there is a sequence (xi )i∈N with xi ∈ K so that lim Axi = b.
i→∞

Theorem 2.6.4 Let K ⊆ Rn be a proper convex cone. Given A ∈ Rm×n


and b ∈ Rm , exactly one of the following two alternatives holds:
(i) Either the system Ax = b, x ∈ K is weakly feasible,
(ii) or the system AT y ∈ K ∗ , bT y < 0 has a solution.
46 Duality in conic programming (Version: May 24, 2022)

Proof Suppose (i) is not weakly feasible. Then b does not lie in the closed
convex cone
CA = {Ax : x ∈ K}.

Now we can complete the proof exactly as in the case of the non-negative
orthant.

Example 2.6.5 Consider Example 2.6.2: The system (2.6) is weakly fea-
sible as one sees by choosing the sequence
1 
i 1
Xi = , i ∈ N.
1 i
We can derive a variant of Theorem 2.6.4 by switching primal with dual,
see Exercise 2.8.
Definition 2.6.6 Given A ∈ Rm×n and c ∈ Rn . We say that the conic
programming system
AT y − c ∈ K ∗

is weakly feasible if for every ε > 0 there exists y ∈ Rm and z ∈ K ∗ so that

kAT y − c − zk ≤ ε.

Theorem 2.6.7 Let K ⊆ Rn be a proper convex cone. Given A ∈ Rm×n


and c ∈ Rn , exactly one of the following two alternatives holds:

(i) Either the system Ax = 0, x ∈ K, and cT x > 0 has a solution,


(ii) or the system AT y − c ∈ K ∗ is weakly feasible.
The second route towards a theorem of alternatives is to strengthen the
condition of feasibility in the primal (2.4) by requiring strict feasibility. Then
one also has to modify the dual (2.5) to avoid trivial feasibility.
Theorem 2.6.8 Let K ⊆ Rn be a proper convex cone, and let A ∈ Rm×n ,
b ∈ Rm . Assume that the linear system Ax = b has a solution x0 . Then
exactly one of the following two alternatives holds:

(i) Either there exists x ∈ int K such that Ax = b,


(ii) or there exists y ∈ Rm such that AT y ∈ K ∗ \ {0}, bT y ≤ 0.

Proof Again one direction is clear: If x ∈ int K satisfies Ax = b and y


satisfies AT y ∈ K ∗ \ {0} and bT y ≤ 0, then we get

0 ≤ (AT y)T x = y T Ax = y T b ≤ 0,
2.6 Theorems of alternatives 47

implying (AT y)T x = 0. This gives a contradicts Lemma 2.1.7 since x ∈ int K
and AT y ∈ K ∗ \ {0}.
Let us turn to the other direction. By assumption, the affine space L =
{x : Ax = b} is not empty as it contains x0 . Write L = L + x0 for the linear
space L = {x : Ax = 0}.
Because (i) has no solution, L ∩ int K = ∅. By the separation theorem,
Theorem A.4.5, there exists a hyperplane separating L and int K: There
exists a non-zero vector c ∈ Rn and a scalar β such that

sup{cT x : x ∈ L} ≤ β ≤ inf{cT x : x ∈ int K} = inf{cT x : x ∈ K}

holds.
Then β ≤ 0 as 0 ∈ K. Furthermore, c ∈ K ∗ because for all y ∈ K and
all t > 0 we have cT (tx) ≥ β which implies cT x ≥ 0. Moreover, for every
x ∈ L and for every scalar t ∈ R, either positive or negative, we have that
cT (tx + x0 ) leβ which implies cT x = 0. Therefore c ∈ L⊥ and thus c is a
linear combination of the row vectors of A, say c = AT y for some y ∈ Rm .
Therefore, AT y ∈ K ∗ \ {0}.
Finally, since x0 ∈ L, we have

y T b = y T Ax0 = cT x0 ≤ β ≤ 0.

Example 2.6.9 Consider again Example 2.6.2: The system (2.6) is not
strictly feasible, but there is a feasible solution y1 = 1, y2 = 0 of (2.7)
after replacing the condition y2 < 0 by y2 ≤ 0 and adding the condition
y1 E11 + y2 E12 6= 0.

Again we can derive a variant of Theorem 2.6.8 by switching primal with


dual, see Exercise 2.9.

Theorem 2.6.10 Let K ⊆ Rn be a proper convex cone, and let A ∈ Rm×n ,


c ∈ Rn . Then exactly one of the following two alternatives holds:

(i) Either there exists x ∈ K \ {0} such that Ax = 0 and cT x ≥ 0,


(ii) or there exists y ∈ Rm such that AT y − c ∈ int K ∗ .

For instance, we can use this theorem of alternatives to reformulate the


sufficient condition for strong duality from Theorem 2.4.1, in terms of the
set of primal optimal solutions.
Consider again the pair of primal/dual conic programs

p∗ = sup{cT x : x ∈ K, aT
j x = bj for j ∈ [m]}, (2.8)
48 Duality in conic programming (Version: May 24, 2022)
 
 m
X 
d∗ = inf bT y : y ∈ Rm , yj aj − c ∈ K ∗ . (2.9)
 
j=1

Proposition 2.6.11 The following assertions are equivalent:1


(1) The set of optimal solutions of program (2.8) is nonempty and bounded.
(2) Program (2.8) is feasible and program (2.9) is strictly feasible.
(3) Program (2.9) is strictly feasible and bounded from below (i.e., d∗ > −∞).
Moreover, any of them implies strong duality: p∗ = d∗ .
Proof Assume (1) holds, we show (2). Suppose for contradiction that (2.9)
is not strictly feasible. Then alternative (2) in Theorem 2.6.10 does not hold
and thus alternative (1) holds: there exists x ∈ K \ {0} such that aT jx = 0
T ∗
for j ∈ [m] and c x ≥ 0. Pick an optimal solution x of (2.8). Then, for all
t ≥ 0, the point x∗ +tx is also an optimal solution of (2.8), which contradicts
the assumption that the set of optimal solutions is bounded.
Clearly, (2) implies (3).
Assume now (3) holds, we show (1). Then, in view of Theorem 2.4.1, strong
duality holds and (2.8) has an optimal solution. Assume for contradiction
that the set of optimal solutions is not bounded. Then there exists an optimal
solution x∗ and a nonzero vector w ∈ Rn such that all points in the half-ray
R = {x∗ + tw : t ≥ 0} are optimal solutions of (2.8). This implies aT jw = 0
for j ∈ [m] and cT w = 0. By assumption, alternative (2) holds in Theorem
2.6.10 (since (2.9) is strictly feasible) and thus alternative (1) does not hold.
This implies that w 6∈ K and in turn that x∗ + tw 6∈ K for some t large
enough, thus contradicting the fact that the half-ray R is contained in K.
As an application, we obtain a simple sufficient condition for strong du-
ality, which depends only on the primal program (and does not require
explicitly checking strict feasibility of the dual).
Corollary 2.6.12 Assume that the (primal) program (2.8) has a nonempty
and bounded feasibility region. Then strong duality holds: p∗ = d∗ .

2.7 More differences between linear and semidefinite


programming
We have already seen above several differences between linear programming
and semidefinite programming: there might be a duality gap between the
1 This can be found as an exercise in the book by Ben-Tal and Nemirovski [2001], and the SDP
case was treated by Trnovská [2005].
2.7 More differences between linear and semidefinite programming 49

primal and dual programs and the supremum/infimum might not be attained
even though they are finite. We point out some more differences regarding
rationality and bit size of optimal solutions.
In the classical bit (Turing machine) model of computation an integer
number p is encoded in binary notation, so that its bit size is log p + 1
(logarithm in base 2). Rational numbers are encoded as two integer numbers
and the bit size of a vector or a matrix is the sum of the bit sizes of its entries.
Consider a linear program
max{cT x : Ax = b, x ≥ 0}, (2.10)
where the data A, b, c is rational-valued. From the point of view of com-
putability this is a natural assumption and it would be desirable to have
an optimal solution which is also rational-valued. A fundamental result in
linear programming asserts that this is indeed the case: If program (2.10)
has an optimal solution, then it has a rational optimal solution x ∈ Qn ,
whose bit size is polynomially bounded in terms of the bit sizes of A, b, c.
On the other hand it is easy to construct instances of semidefinite pro-
gramming where the data are rational valued, yet there is no rational optimal
solution. For instance, the following program
   
1 x
max x : 0
x 2

attains its maximum at x = 2.
Consider now the semidefinite program, with variables x1 , . . . , xn ,
     
1 2 1 xi−1
inf xn :  0,  0 for i = 2, . . . , n .
2 x1 xi−1 xi
n
Then any feasible solution satisfies xn ≥ 22 . Hence the bit-size of an optimal
solution is exponential in n, thus exponential in terms of the bit-size of the
data.
The above facts suggest difficulties for complexity issues about semidef-
inite programming. For instance, one cannot hope for a polynomial time
algorithm in the bit model of computation for solving a semidefinite pro-
gram exactly, since the output might not even be representable in this model.
Moreover, even if we set up to the less ambitious goal of just computing -
approximate optimal solutions, we should make some assumptions on the
semidefinite program, roughly speaking, in order to avoid having too large
or too small optimal solutions. We will come back to these complexity issues
about semidefinite programming in Chapter 3.
50 Duality in conic programming (Version: May 24, 2022)

2.8 Further reading


Conic programs, especially linear programs, conic quadratic programs, and
semidefinite programs are the central topic in the text book of Ben-Tal
and Nemirovski [2001]. There, also many interesting engineering applica-
tions (synthesis of filters and antennas, truss topology design, robust opti-
mization, optimal control, stability analysis and synthesis, design of chips)
are covered. A nutshell version of this book is Nemirovski’s plenary talk
“Advances in convex optimization: conic programming” at the International
Congress of Mathematicians in Madrid 2006 for which a paper and a video is
available online: see Nemirovski [2007]. It is astonishing how much material
Nemirovski covers in only 60 minutes.
A second excellent text book on convex optimization is the book by Boyd
and Vandenberghe [2004] (available online). Here the treated applications
are: approximation and fitting, statistical estimation, and geometric prob-
lems. Videos of Boyd’s course held at Stanford can also be found there.
The duality theory for linear programming, especially the absence of du-
ality gaps, is explained in every book on linear programming. For example,
the monograph by Schrijver [1986] is a good source, also for a detailed ex-
position of the existence of polynomial-size rational optimal solutions for
LP.
We have seen that there might be a duality gap between a primal semidef-
inite program and its dual. An exact extended duality theory for semidefi-
nite programming has been developed by Ramana [1997], showing that for
each primal semidefinite program one can construct in polynomial time a
new semidefinite program (an ‘extended dual’) having the property that the
original primal program is feasible if and only if the new extended dual pro-
gram is infeasible. This exact extended duality theory was recently revisited
by Klep and Schweighofer [2013], who offer an algebraic interpretation of the
duality gap and the extended dual. Ramana [1997] uses this extended dual-
ity theory to show that the following semidefinite feasibility problem belongs
to NP if and only if it belongs to co-NP: given integer data matrices Aj , C
decide whether the LMI m m
P
j=1 yj Aj − C  0 admits a solution y ∈ R . On
the other hand, it is not known whether this problem belongs to NP.

2.9 Historical remarks


The history of conic programming is difficult to trace. Only recently re-
searchers recognized that they give a unifying framework for convex opti-
mization.
Exercises 51

In 1956, Duffin in a short paper “Infinite programs” (Duffin [1956]) intro-


duced conic programs. His approach even works in infinite dimensions and he
focused on these cases. However, the real beginning of conic programming
seems to be 1993 when the book “Interior-Point Polynomial Algorithms
in Convex Optimization” by Yurii Nesterov and Arkadi Nemirovski [1994]
was published. There they described for the first time a unified theory of
polynomial-time interior point methods for convex optimization problems
based on their conic formulations. Concerning the history of conic programs
they write:
Duality for convex program involving “non-negativity constraints” defined by a
general-type convex cone in a Banach space is a relatively old (and, possibly,
slightly forgotten by the mathematical programming community) part of convex
analysis (see, e.g. [ET76]). The corresponding general results, as applied to the
case of conic problems (i.e., finite-dimensional problems with general-type non-
negativity constraints and affine functional constraints), form the contents of
§3.2. To our knowledge, in convex analysis, there was no special interest to conic
problems, and consequently to the remarkable symmetric form of the aforemen-
tioned duality in this particular case. The only previous result in spirit of this
duality known to us it the dual characterization of the Lovasz capacity number
θ(Γ) of a graph (see [Lo79]).

Exercises
2.1 *** WS 2020/21 *** Prove Lemma 2.1.6.
2.2 *** WS 2020/21 *** Prove Lemma 2.1.7
2.3 *** WS 2020/21 *** Show that the set of non-negative polynomials of
degree at most 2d
n o
(a0 , a1 , . . . , a2d ) ∈ R2d+1 : a0 + a1 x + · · · + a2d x2d ≥ 0 for all x ∈ R

is a proper convex cone for any d ≥ 0.


2.4 (a) For the Lorentz cone, show that (Ln+1 )∗ = Ln+1 .
(b) Determine the dual cone of the cone of copositive matrices.
2.5 Consider the following location problem: We are given N locations in
the plane x1 , . . . , xN ∈ R2 . Find a point y ∈ R2 which minimizes the
sum of the distances to the N locations:
N
X
min d(xi , y).
y∈R2
i=1

(a) Formulate this problem as a conic program using the cone

L2+1 × L2+1 × · · · × L2+1 .


52 Duality in conic programming (Version: May 24, 2022)

(b) Determine its dual.


(c) Is there a duality gap?
2.6 Consider the following semidefinite program, which involves inequali-
ties:
p∗ = sup{hC, Xi : hAj , Xi ≤ bj for j = 1, . . . , m, X  0}. (2.11)
(a) Bring this program in standard primal form and write its dual
semidefinite program.
(b) Show that weak duality holds between the program (2.11) and the
semidefinite program
 
X m m
X 
d∗ = inf bj yj ; yj Aj − C  0, y ∈ Rm
+ . (2.12)
 
j=1 j=1

(c) Give some conditions ensuring that there is no duality gap between
(2.11) and (2.12), i.e., that p∗ = d∗ .
2.7 *** WS 2020/21 *** Let G = (V, E) be a graph. The independence
number α(G) of the graph is the maximal cardinality of a set S ⊆ V
such that {i, j} 6∈ E for any i, j ∈ S. Show that α(G) equals the optimal
value of the following conic program:
maximize hJ, Ai
subject to A ∈ CP n ,
hI, Ai = 1,
Aij = 0 if {i, j} ∈ E.
2.8 *** WS 2020/21 *** Prove Theorem 2.6.7
2.9 *** WS 2020/21 *** Prove Theorem 2.6.10.
Hint: You may derive it from Theorem 2.6.8.
3
The ellipsoid method

It is well known that linear programs (with rational data c, a1 , . . . , am ,


b) can be solved by the simplex method. It was invented by Dantzig1 in
1947 and performs very well in practice. However, it is still an open problem
whether there is a variant of the simplex algorithm which gives a polynomial
time algorithm for solving general LPs.
What do we mean by a polynomial time algorithm? We say an algorithm
runs in polynomial time if the number of basic operations the algorithm
uses is polynomial in the input size. Computers can only operate with bits,
that is with sequences of 0s and 1s. So we measure the input size in the
number of bits which have to be used to encode the input data. For instance
an integer number p is encoded in binary notation, so that its bit size is
log p + 1 (logarithm in base 2). Rational numbers are encoded as two integer
numbers and the bit size of a vector or a matrix is the sum of the bit sizes
of its entries. Basic operations are bit operations. For example, adding two
integer numbers p and q would correspond to log p+log q+2 basic operations.
Consider a linear program

max cT x
x ∈ Rn
(3.1)
Ax = b
x≥0

where the data A, b, c is rational-valued. From the point of view of com-


putability this is a natural assumption and it would be desirable to have
an optimal solution which is also rational-valued. A basic result in linear
programming asserts that this is indeed the case: If program (3.1) has an
optimal solution, then it has a rational optimal solution x ∈ Qn , whose bit
1 George Dantzig (1914–2005)
54 The ellipsoid method

size is polynomially bounded in terms of the bit sizes of the input A, b, and
c (see e.g. Schrijver [1986]).
The first polynomial-time algorithm for solving LPs was given by Khachiyan2
in 1979, based on the ellipsoid method. The value of this algorithm is how-
ever mainly theoretical as it is very slow in practice. Later the algorithm of
Karmarkar3 in 1984 opened the way to polynomial time algorithms for LP
based on interior-point algorithms, which also perform well in practice.
What about algorithms for solving semidefinite programs?
First of all, one cannot hope for a polynomial time algorithm permitting
to solve any semidefinite program rationally. Indeed, even if the data of the
SDP are assumed to be rational-valued, the output might be an irrational
number. For instance, the following program
max x
x∈R
  (3.2)
1 x
0
x 2

attains its maximum at x = 2. Therefore, one should look at algorithms
permitting to compute in polynomial time an ε-approximate optimal solu-
tion.
However, even if we set up to this less ambitious goal of just computing
ε-approximate optimal solutions, we should make some assumptions on the
semidefinite program, roughly speaking, in order to avoid having too large
optimal solutions.
An instance of SDP whose output is exponentially large in the bit size of
the data is the semidefinite program, with variables x1 , . . . , xn ,
inf xn
x ∈ Rn
    (3.3)
1 2 1 xi−1
 0,  0 for i = 2, . . . , n
2 x1 xi−1 xi
n
Then any feasible solution satisfies xn ≥ 22 . Hence the bit-size of an optimal
solution is exponential in n, thus exponential in terms of the bit-size of the
data. So even writing down the optimal solution requires exponentially many
basic operations.
On the positive side, it is well known that one can test whether a given
rational matrix is positive semidefinite in polynomial time — using Gaussian
2 Leonid Khachiyan (1952–2005)
3 Narendra Karmarkar (1957–)
3.1 Ellipsoids 55

elimination. Hence one can test in polynomial time membership in the pos-
itive semidefinite cone. Moreover, as a byproduct of Gaussian elimination,
n , then one can compute in polynomial time a hyperplane strictly
if X 6∈ S+
separating X from S+ n . See Section 3.4 below for details.

This observation is at the base of the polynomial time algorithm for solving
approximately semidefinite programs, based on the ellipsoid method which is
the subject of this chapter. Roughly speaking, one can solve a semidefinite
program in polynomial time up to any given precision. More precisely, in
this chapter we shall prove the following result describing the complexity of
solving semidefinite programming with the ellipsoid method:
Theorem 3.0.1 Consider the semidefinite program
p∗ = sup hC, Xi
X ∈ Sn
hAj , Xi = bj for j ∈ [m]
X0
where Aj , C, bj are rational-valued. Denote by F its feasibility region. Sup-
pose we know a rational point x0 ∈ F and rational numbers positive r, R so
that
x0 + rBd ⊆ F ⊆ x0 + RBd ,

where Bd is the unit ball in the d-dimensional subspace

{Y ∈ S n : hAj , Y i = 0 (j ∈ [m])}.

Let ε > 0 be given. Then, one can find a rational matrix X ∗ ∈ F such that

p∗ − hC, X ∗ i ≤ ε.

The complexity of this algorithm is polynomial in n, m, log r, log R, log(1/ε),


and the bit size of the input data.

3.1 Ellipsoids
3.1.1 Definitions
n
A positive definite matrix A ∈ S++ and a vector x ∈ Rn define the ellip-
soid E(A, x) by

E(A, x) = {y ∈ Rn : (y − x)T A−1 (y − x) ≤ 1}.

For instance E(r2 In , 0) = rBn is the ball of radius r centered at the origin.
56 The ellipsoid method

Let A = ni=1 λi ui uT
P
i be a spectral decomposition of A. Then the direc-
tions of the vectors ui are the axis of the ellipsoid E(A, x). Furthermore, the

value λi equals the length of the corresponding semiaxis, and the volume
of E(A, x) equals
p √
vol E(A, x) = λ1 · · · λn vol Bn = det A vol Bn ,
where
Bn = {x ∈ Rn : kxk ≤ 1}
is the n-dimensional unit ball. It has volume
π n/2
vol Bn = ,
Γ(n/2 + 1)
where Γ is the gamma function, a continuation of the factorial function,
which for half-integral nonnegative integers is defined by

Γ(1/2) = π, Γ(1) = 1, Γ(x + 1) = xΓ(x).
This dimension dependent factor of vol Bn usually does not play a role.
The definition of E(A, x) is an implicit definition using a strictly convex
quadratic inequality. There is also an explicit definition of ellipsoids as the
image of the unit ball under an invertible affine transformation
{T y + x : y ∈ Bn },
where T ∈ Rn×n is an invertible matrix, and where x ∈ Rn is a translation
vector.
From linear algebra it is known that every invertible matrix T has a
factorization of the form T = BP where B ∈ S++ n is a positive definite
matrix and P ∈ O(n) is an orthogonal matrix. So we may assume in the
following that the matrix T which defines the ellipsoid is a positive definite
matrix.
In fact one can find this factorization, the polar factorization, from the
singular value decomposition of T
T = U T ΣV, U, V ∈ O(n), Σ = diag(σ1 , . . . , σn ),
where σi ≥ 0 are the singular values of T (i.e., σi2 are the eigenvalues of the
matrix T T T or, equivalently, of T T T ). Then,
T = BP with B = U T ΣU, P = U T V.
The singular values of T are at the same time the lengths of the semiaxis of
the ellipsoid.
3.1 Ellipsoids 57

The relation between the implicit and explicit descriptions is given by

E(A2 , x) = {Ay + x : y ∈ Bn } when A  0.

3.1.2 Loewner-John ellipsoids


Ellipsoids are important geometric objects partially due to their simple de-
scriptions. They can be used for instance to approximate other more compli-
cated convex sets. A famous approximation is the Loewner-John4 ellipsoid:
Every convex body K ⊆ Rn (a compact and convex set) is contained in a
unique ellipsoid of minimum volume. This ellipsoid is the called the Loewner-
John ellipsoid of the convex body. We denote it by E(K). Generally, deter-
mining the Loewner-John ellipsoid involves solving a convex optimization
problem and we will come to this in one of the next chapters. However, in
some cases there are explicit formulæ which determine the Loewner-John
ellipsoid, like in the case of the intersection of an ellipsoid with an halfspace
going through the center of the ellipsoid, a central cut.

Lemma 3.1.1 Let A ∈ S++ be a positive definite matrix, and let x, a ∈ Rn


be vectors with a 6= 0. The Loewner-John ellipsoid of the intersection

E(A, x) ∩ {y ∈ Rn : aT y ≥ aT x}

is equal to

E(E(A, x) ∩ {y ∈ Rn : aT y ≥ aT x}) = E(A0 , x0 )

with
n2
A0 = n2 −1
(A − 2 T
n+1 bb ), x0 = x + 1
n+1 b, b= √ 1 Aa. (3.4)
aT Aa

Furthermore, if n ≥ 2,
vol E(A0 , x0 ) 1
≤ e− 2n < 1. (3.5)
vol E(A, x)
Proof By performing an affine transformation T : Rn → Rn we may assume
that A = In , x = 0, and a = e1 . So we are trying to find the smallest volume
ellipsoid containing Bn ∩ {y ∈ Rn : y1 ≥ 0}. The symmetry of this convex
body forces its Loewner-John ellipsoid to be defined by a diagonal matrix

A0 = Diag(α, β, . . . , β) and by x0 = 0 + γe1


4 Charles Loewner (1893–1968) and Fritz John (1910–1994)
58 The ellipsoid method

for some α, β, γ > 0. The points e1 , ±e2 , . . . , ±en lie on the boundary of
the Loewner-John ellipsoid. Hence for i = 2, . . . , n we get the equations
1 γ2
1 = (±ei − γe1 )T diag( α1 , β1 , . . . , β1 )(±e1 − γe1 ) = + ,
β α
and
(1 − γ)2
1 = (e1 − γe1 )T diag( α1 , β1 , . . . , β1 )(ei − γe1 ) = .
α
(1−γ)2
So we can eliminate the variables α = (1 − γ)2 and β = 1−2γ because

1 γ2 (1 − γ)2 − γ 2 1 − 2γ
=1− = 2
= .
β α (1 − γ) (1 − γ)2

Now we have find the minimum of det A0 which is the minimum of the
function
s
p (1 − γ)2n
γ 7→ αβ n−1 = .
(1 − 2γ)n−1

Using calculus one finds γ = n+11


. By applying T −1 we arrive at the first
result (3.4). We just computed the determinant of A0 exactly:
 n+1  n−1
0 n n
det(A ) = .
n+1 n−1
The inequality det(A0 ) ≤ exp (−1/n) follows now from another calculus
exercise (Exercise 3.5) and it implies (3.5).

3.2 Platonic version of the ellipsoid method


For the moment, in this section, we ignore the fact that physical computers
cannot work with arbitrary real numbers. We shall give an “algorithm”,
the ellipsoid method, which approximates an optimal solution of the convex
optimization problem
max cT x
x∈K
where c ∈ Rn is a given vector and where K ⊆ Rn is a given convex body.
In a sense this algorithm only works on a Platonic computer which can
operate with the arbitrary real numbers. Even though these are unrealistic
assumptions, the geometric intuition gathered here will be very helpful for
understanding the “real” ellipsoid method in the next section.
It is clear how to give a vector c: Simply give the coordinates. But how do
3.2 Platonic version of the ellipsoid method 59

you give a convex body? Here there are many possibilities: If K is a polytope,
then one can give a list of supporting hyperplanes, or one can give the list
of extreme points. If K is the set of feasible solution of a primal semidefinite
program, a compact spectrahedron, then one can use the symmetric matrices
A1 , . . . , Am ∈ S n and the right hand sides b1 , . . . , bm to determine K. In
the following we will use a less explicit description of K, in fact it is a
black-box description. We give K in term of a separation oracle, which is
an algorithm that can decide wether some point belongs to K and if not it
gives a separating hyperplane.
Formally the separation oracle is an algorithm which can solve the sepa-
ration problem for a convex body K:

given: vector x ∈ Rn
find: either assert x ∈ K or find d ∈ Rn with dT x ≥ maxy∈K dT y

The ellipsoid method uses this separation oracle to solve the optimization
problem for a convex body K:

given: vector c ∈ Rn with kck = 1, ε > 0, x0 ∈ Rn , r, R > 0 so that


x0 + rBn ⊆ K ⊆ x0 + RBn
find: x ∈ K with cT x ≥ maxy∈K cT y − ε

The ellipsoid method constructs a sequence of smaller and smaller ellip-


soids Ek , with k = 0, . . . , N , whose volume goes to zero exponentially fast.
The set of optimal solutions is contained in every ellipsoid Ek and every step
provides a more accurate approximation of this set of optimal solutions. For
this we use the separation oracle in every step to test if the center xk of Ek
is contained in the convex body K. If yes, then the set of optimal solutions
lies in the halfspace
{y ∈ Rn : cT y ≥ cT xk }
If no, then the separation oracle gives us a separating hyperplane, deter-
mined by the vector d ∈ Rn , separating xk from K and we know that the
set of optimal solutions lies in the halfspace
{y ∈ Rn : (−d)T y ≥ (−d)T xk }
Now we take in the next iteration the ellipsoid Ek+1 which is the Loewner-
John ellipsoid of the corresponding halfspace intersected with Ek .
N
Choose N so that 2 Rr e− 2n2 ≤ ε
2

E0 = E(R2 I, x0 )
60 The ellipsoid method

for k = 0, . . . , N − 1 do
Let xk be the center of the ellipsoid Ek
Use the separation procedure for xk
if xk ∈ K then
k is a feasible index
a=c
else
a = −d
end if
Ek+1 = E(Ek ∩ {y ∈ Rn : aT y ≥ aT xk })
end for

Lemma 3.2.1 For the convex body Kk ⊆ Rn defined by


ζk = max{cT xj : 0 ≤ j < k, j feasible index}
Kk = K ∩ {x ∈ Rn : cT x ≥ ζk }
the inclusion Kk ⊆ Ek holds.
Proof Exercise 3.2.
Theorem 3.2.2 Let k be a natural number and let j be a feasible index
with ζk = cT xj , then
R2 − k 2
cT xj ≥ max cT y − 2 e 2n .
y∈K r
Hence for any desired accuracy ε we can choose the number of iterations N
large enough, so that we can find by the ellipsoid method feasible solutions
whose objective value approximates the optimal value up to ε.
Proof Let z ∈ K be an optimal solution: cT z = maxy∈K cT y. Consider the
convex body C which is the convex hull of point z and the (n−1)-dimensional
ball of radius r:
(x0 + rBn ) ∩ {y ∈ Rn : cT y = cT x0 }. (3.6)
Geometrically, the convex body C is a cone with vertex z and base (3.6).
Intersecting C with the halfspace
{y ∈ Rn : cT y ≥ cT xj }
gives a convex body C 0 having volume
n
rn−1 vol Bn−1 (cT z − cT x0 ) cT z − cT xj

0
vol C = .
n cT z − cT x0
3.3 Separation and optimization 61

Clearly, C 0 ⊆ Kk and by the previous lemma Kk ⊆ Ek . Hence, by applying


iteratively relation (3.5), we obtain vol Ek ≤ e−k/2n vol E0 . Combining with
vol E0 = Rn vol Bn , we can give an upper bound for the volume of C 0 :
k
vol C 0 ≤ vol Ek ≤ Rn e− 2n vol Bn .
By the Cauchy-Schwarz inequality and using the assumption kck = 1 we get
|cT z − cT x0 | ≤ kck · kz − x0 k ≤ kckR = R.
Putting things together we obtain

n vol Bn 1/n R2 − k2 R2
 
k
T T
c z − c xj ≤ e 2n ≤ 2 e− 2n2 .
vol Bn−1 r r

3.3 Separation and optimization


We already emphasized that physical computers cannot work with arbitrary
real number and rather work with sequences of 0s and 1s. So it is more
realistic to allow only rational numbers with some predescribed accuracy.
Since it might happen that the optimal solution of an optimization cannot
be represented with rational coordinates we need appropriate modification
of the separation and of the optimization problems to be able to work with
rational numbers only. These are the weak separation problem for a convex
body K:

given: vector x ∈ Qn , rational number δ > 0


find: either assert kx − πK (x)k ≤ δ or find d ∈ Qn with kdk ≥ 1
and dT x ≥ maxy∈K dT y − δ

(recall that πK is the metric projection on the convex body K from Sec-
tion ??) and the weak optimization problem for a convex body K:

given: vector c ∈ Qn , rational number ε > 0, x0 ∈ Qn , r, R > 0 rational


so that x0 + rB ⊆ K ⊆ x0 + RB
find: x ∈ K with cT x ≥ maxy∈K cT y − ε

One of the most important results in the theory of convex optimization


due to Grötschel5 , Lovász6 and Schrijver7 from 1981 is that optimization is
5 Martin Grötschel (1948–)
6 Lászlo Lovász (1948–)
7 Alexander Schrijver (1948–)
62 The ellipsoid method

not more difficult than separation: Whenever we have an efficient procedure


to solve the weak separation problem, we get an efficient procedure to solve
the weak optimization for free.
Theorem 3.3.1 Let K be a class of convex bodies. If there is a polynomial
time algorithm to solve the weak separation problem for any K ∈ K, then
there is a polynomial time algorithm to solve the weak optimization problem
for any K ∈ K.
We use the remainder of this section to prove this theorem. We follow the
presentation of the very beautiful paper of Grötschel, Lovász, and Schrijver
[1981] closely.

2
N = 4n2 dlog 2Rrεkck e
2 4−N
δ = R300n
p = 5N
A0 = R 2 I n
for k = 0, . . . , N − 1 do
Use the separation procedure with input xk , and δ
if kxk − πK (xk )k ≤ δ then
k is a feasible index
a=c
else
a = −d
end if
bk = √Ak a
aT Ak a

xk = xk + n+1 1
bk
∗ 2n2 +3 2
Ak = 2n2 (Ak − n+1 bk bT
k)
xk+1 ≈ xk ∗

Ak+1 ≈ A∗k
end for

The entries of bk , x∗k and A∗k might not be rational as the formulæ contain
square roots. To make them rational we use the sign ≈ which means that we
round to p binary digits behind the comma. So the entries of x∗k and xk+1 ,
respectively of A∗k and Ak+1 , differ by at most 2−p . In an implementation
we are careful so that Ak+1 stays symmetric.
The following lemma from linear algebra is sometimes very useful when
analyzing numerical algorithms which perform rank-1 updates of matrices:
If we know the inverse of a matrix A and if we add to A a matrix of rank 1,
3.3 Separation and optimization 63

then finding the inverse of A can be done by a simple formula. In the ellipsoid
method the transition from the matrix defining ellipsoid Ek to the matrix
of ellipsoid Ek+1 is such a rank-1 update.

Lemma 3.3.2 Sherman-Morrison formula


Let A ∈ Rn×n be a square matrix which has an inverse, and let u and
v ∈ Rn be vectors so that 1 + v T A−1 u 6= 0. Then the matrix A + uv T is
invertible and
A−1 uv T A−1
(A + uv T )−1 = A−1 − .
1 + v T A−1 u
Proof Exercise 3.1.

The next lemma states that we have some control on the size of the
coefficients which occur during the computation. We use the operator norm 8
for this. The operator norm of a matrix A is defined by

kAk = max{kAxk : kxk = 1},

where kAxk and kxk is the Euclidean norm. If A is symmetric, then kAk is
the maximum absolute value of the eigenvalues of A.

Lemma 3.3.3 The matrices A0 , . . . , AN are positive definite. Moreover,

kxk k ≤ kx0 k + R2k , kAk k ≤ R2 2k , kA−1 −2 k


k k≤R 4 .

Proof We prove all the statements by induction on k. For k = 0 there is


nothing to do.
2
Applying the Sherman-Morrison formula with A = Ak , u = − n+1 bk , and
T −1 n−1
v = bk , we get 1 + v Ak u = n+1 6= 0 and thus

2n2 aaT
 
∗ −1 −1 2
(Ak ) = 2 Ak + · .
2n + 3 n−1 aT Ak a
Hence, we see by induction on k that (A∗k )−1 is positive definite as it is
the sum of a positive definite matrix with a positive semidefinite matrix.
Also its inverse, A∗k , is positive definite since its eigenvalues, which are the
reciprocals of the eigenvalues of (A∗k )−1 , are all positive.
Using the induction hypothesis for Ak we get
2n2 + 3 2n2 + 3
 
2 3
kA∗k k = A k − bk bT
≤ kA k k ≤ 1 + R2 2k .
2n2 n+1 k 2n2 2n2
8 Note that the operator norm is different from the Frobenius norm. In this chapter we only use
the operator norm so this should not cause confusion.
64 The ellipsoid method

So
 
3
kAk+1 k ≤ kA∗k k + kAk+1 − A∗k k ≤ 1+ 2 R2 2k + n2−p ≤ R2 2k+1 .
2n
Further,
s
kAk ak aT A2k a p
kbk k = p = ≤ kAk k ≤ R2k/2 ,
aT Ak a aT Ak a
and so
1
kxk+1 k = kxk+1 − x∗k + x∗k k ≤ kxk+1 − x∗k k + kxk k + kbk k
n+1
√ 1
≤ n2−p + kx0 k + R2k + R2k/2 ≤ kx0 k + R2k+1 .
n+1
Further,
2n2 kak2
 
−1 2
k(A∗k )−1 k
≤ 2 kAk k + ·
2n + 3 n−1 aT Ak a
2n2
 
2 n + 1 −1
≤ 2 kA−1
k k + kA−1
k ≤ kA k,
2n + 3 n−1 k n−1 k
where we used the induction hypothesis that Ak is positive definite and thus
2
the fraction akak
T A a is at most λ
1
min (Ak )
= kA−1
k k.
k
Let v be a normalized eigenvector of the smallest eigenvalue λmin (Ak+1 )
of Ak+1 . Then
λmin (Ak+1 ) = v T Ak+1 v = v T A∗k v + v T (Ak+1 − A∗k )v
n − 1 −1 −1
≥ k(A∗k )−1 k−1 − kAk+1 − A∗k k ≥ kA k − n2−p
n+1 k
n − 1 2 −k
≥ R 4 − n2−p ≥ R2 4−(k+1) .
n+1
Hence, Ak+1 is positive definite and
−1 1
kAk+1 k= ≤ R−2 4k+1 .
λmin (Ak+1 )

Lemma 3.3.4 Define the ellipsoid


Ek = E(Ak , xk ).
Then
vol Ek+1 1
≤ e− 5n .
vol Ek
3.3 Separation and optimization 65

Proof Similar to Lemma 3.1.1, one only needs to take the rounding errors
into account. This can be done like in the proof of the previous lemma.

The next lemma is Lemma 3.2.1 word by word, only its proof needs more
work.

Lemma 3.3.5 For the convex body Kk ⊆ Rn defined by

ζk = max{cT xj : 0 ≤ j < k, j feasible index}


Kk = K ∩ {x ∈ Rn : cT x ≥ ζk }
the inclusion Kk ⊆ Ek holds.

Proof Again we do induction on k and for the case k = 0 there is nothing


to do. For x ∈ Kk+1 we have x ∈ Kk ⊆ Ek and

aT x ≥ aT xk − δ. (3.7)

We only need the δ when k is not a feasible index and a = −d. Decompose
the vector x:
x = xk + αbk + y,

with a vector y ∈ Rn satisfying aT y = 0. Choosing y like this is possible


because bk and a are not orthogonal since
a T Ak a
aT bk = p > 0.
aT Ak a
Since x lies in Ek we have

1 ≥ (y + αbk )T A−1 T −1 2 T −1 T −1 2
k (y + αbk ) = y Ak y + α bk Ak bk = y Ak y + α ,

and so α ≤ 1. From (3.7) and aT y = 0 we derive


p
−δ ≤ αaT bk = α aT Ak a.

Now we come to the main estimate:

(x − xk+1 )T A−1 ∗ T ∗ −1 ∗
k+1 (x − xk+1 ) ≤ (x − xk ) (Ak+1 ) (x − xk ) + R1 ,
1
where the remainder term R1 is at most 12n 2 as can be shown by the same

techniques as the ones in the previous lemma. The main term: We have by
the decomposition of x and the definition of x∗k
 
∗ 1 1
x − xk = xk + y + αbk − xk − bk = α − bk + y
n+1 n+1
66 The ellipsoid method

and so
(x − x∗k )T (A∗k+1 )−1 (x − x∗k )
T
2n2
 
1
= 2 α− bk + y
2n + 3 n+1
 T
   !
2 aa 1
A−1
k + n + 1 · aT A a α− bk + y
k n+1
2 2 !
2n2
 
1 T −1 2 1
= 2 α− + y Ak y + α−
2n + 3 n+1 n+1 n+1
2n2 n2
 
2α(1 − α)
≤ 2 2

2n + 3 n − 1 n−1
2n 4 4δ 2n 4δ
≤ 4 + + kA−1 k
2n + n2 − 3 (n − 1) aT Ak a 2n4 + n2 − 3 n − 1 k
p

2n4 4δR−2 4N
≤ +
2n4 + n2 − 3 n−1
1
≤1− .
12n2
Hence (x − xk+1 )T A−1
k+1 (x − xk+1 ) ≤ 1 and so x ∈ Ek+1 .

Theorem 3.3.1 The argument for the fact that N iterations suffice to guar-
antee that the found solution is ε-close to the optimum was already given
in the proof of Theorem 3.2.2. Note that N depends polynomially on the
input size. So the only thing which is left to do is to see that the (rational)
coefficients of xk and Ak have a polynomial-size bit-encoding. This follows
from Lemma 3.3.3 plus the fact that we round the coefficients to p binary
digits behind the comma.

3.4 Separation for semidefinite programming: Gaussian


elimination
Now in order to prove Theorem 3.0.1 the only thing we have to do is to
provide an algorithm for the weak separation problem for the feasible set F
which runs in polynomial time. In particular we have to decide whether a
rational matrix is positive semidefinite. For this we use our favorite algorithm
of linear algebra: Gaussian elimination.
Let A = (aij ) ∈ S n be a rational matrix. Gaussian elimination permits to
do the following tasks in polynomial time:
(i) Either: find a rational matrix U ∈ Qn×n and a rational diagonal matrix
3.4 Separation for semidefinite programming: Gaussian elimination 67

D ∈ Qn×n with nonnegative coefficients such that A = U DU T , thus


showing that A  0.
(ii) Or: find a rational vector x ∈ Qn such that xT Ax < 0, thus showing that
A is not positive semidefinite and giving a hyperplane separating A from
the cone S+n.

Here is a sketch. We distinguish four cases.


Case 1: a11 < 0. Then eT
1 Ae1 < 0.

Case 2: a11 = 0, but some entry a1j is not zero, say a12 6= 0. Then choose
λ ∈ Q such that 2λa12 + a22 < 0, so that

xT Ax < 0 for x = (λ, 1, 0, . . . , 0)T .

Case 3: a11 > 0. Then we apply Gaussian elimination to the rows Rj and
columns Cj of A for j = 2, . . . , n. Namely, for each j = 2, . . . , n, we replace
a1j a1j
Cj by Cj − a11 C1 , and analogously we replace Rj by Rj − a11 Rj , which
amounts to making all entries of A equal to zero at the positions (1, j) and
(j, 1) for j 6= 1. For this, define the matrices
a1j
Pj = In − E1j and P = P2 · · · Pn .
a11
Then, P is rational and nonsingular, and P T AP has the block form:
 
a11 0
T
P AP = , where A0 ∈ S n−1 .
0 A0
Thus,
A  0 ⇐⇒ P T AP  0 ⇐⇒ A0  0.

Then, we proceed inductively with the matrix A0 ∈ S n−1 :


Either, we find W ∈ Q(n−1)×(n−1) and a diagonal matrix D0 ∈ Q(n−1)×(n−1)
such that A0 = W T D0 W . Then, we obtain that A = U T DU , setting
   
1 0 −1 1 0
U= P , D= .
0 W 0 D0

Or, we find y ∈ Qn−1 such that y T A0 y < 0. Then, we obtain that xT Ax < 0,
after defining z = (0, y) and x = P z ∈ Qn .
Case 4: a11 = 0 and the matrix A is of the form
 
0 0
A= for A0 ∈ S n−1 .
0 A0
68 The ellipsoid method

Then choose P = In and continue inductively with the matrix A0 as in


Case 3.

With this procedure we almost proved Theorem 3.0.1. There is only one
technical detail missing: We have to work in the linear span of the set of
feasible solutions since we require that we know a full-dimensional ball lying
inside F. So we have to do some postprocessing in case when the matrix X
satisfies the linear constraints hAj , Xi = bj but is not positive semidefinite
and the vector x ∈ Rn certifies this by the inequality xT Ax < 0. Then
the hyperplane {Y ∈ S n : hY, xxT i = 0} separates X from the positive
semidefinite cone. Then we have to project the matrix xxT onto the linear
space {Y ∈ S n : hAj , Y i = 0 (j ∈ [m])} in order to get the desired output
for the weak separation procedure.

3.5 Further reading


In this chapter we showed that we can get a polynomial time algorithm for
the weak optimization problem when we have one for the weak separation
problem. In fact one can also show this the opposite direction is true. So
separation and optimization are polynomial time equivalent. In the book
Grötschel, Lovász, Schrijver [1988] more fundamental algorithmic problems
for convex sets are shown to be polynomial time equivalent: optimization,
violation, validity, separation, and membership. There they also show that
for the polynomial time equivalence of separation and optimization it is not
needed that we know a small ball inside of the convex body, knowing a big
ball containing the convex body already suffices.
We explained the notion of polynomial time algorithms somewhat vaguely.
A precise discussion would require the introduction of the Turing machine
model for computation (or some other equivalent mathematical abstraction).
For this and much more we refer to books on computational complexity
theory where standard references are: Garey and Johnson [1979] and Arora
and Barak [2009]
Although polynomial time in theory, algorithms based on the ellipsoid
method are not very practical. Instead, interior-point algorithms are used
to solve semidefinite programs in practice. We chose to present the ellipsoid
method in this chapter because it gives the easiest and cleanest way to prove
that one can solve semidefinite programs in polynomial time. The analysis
of interior-point methods is much more involved and the final results are
mathematically less satisfying.
In any case, there are quite some books on interior point methods which
3.6 Historical remarks 69

we highly recommend for further reading. The classical barrier method is


developed in the book Fiacco and McCormick [1968]. The standard reference
is the book by Nesterov and Nemirovski [1994] but not easy to read. Boyd
and Vandenberghe [2004] and Ye [1997] as well as Ben-Tal and Nemirovski
[2001] are very helpful. Then, the books by Renegar [2001] and by Roos,
Terlaky, Vial [1997] consider interior point methods for linear programs.
There are some surveys available: Nemirovski, Todd [2008], Vandenberghe,
Boyd [1996], Todd [2001].

3.6 Historical remarks


The ellipsoid method is one of the milestones in the history of mathemati-
cal programming. It opened the possibility for the development of efficient,
polynomial-time algorithms for many convex optimization problems.

1947 Dantzig invented the simplex algorithm for linear programming. The
simplex algorithm works extremely good in practice, but until today
nobody really understands why (although there are meanwhile good
theoretical indications). It is fair to say that the simplex algorithm
is one of the most important algorithms invented in the last century.
1972 Klee and Minty found a linear program for which the simplex al-
gorithm is extremely slow (when one uses Dantzig’s most-negative-
entry pivoting rule): It uses exponentially many steps.
1979 Khachiyan invented the ellipsoid method for linear programming which
runs in polynomial time. It is a very valuable theoretical algorithm.
1981 Grötschel, Lovász, and Schrijver showed that the problems of separa-
tion and optimization are polynomial time equivalent.
1984 Karmakar showed that one can use interior-point methods for design-
ing a polynomial-time algorithm for linear programming. Nowadays,
interior-point methods can compete with the simplex algorithm.
1994 Nesterov and Nemirovski generalized Karmarkar’s result to conic pro-
gramming with the use of self-concordant barrier functions.
since 1994 Every day conic programming becomes more useful (in theory
and practice).

Some more words about interior point methods: It is fair to say that
during the last twenty years there has been a revolution in mathematical
optimization based on the development of efficient interior point algorithms
for convex optimization problems.
70 The ellipsoid method

Margaret H. Wright [2005] begins her survey “The interior-point rev-


olution in optimization: History, recent developments, and lasting conse-
quences” with:
REVOLUTION:
(i) a sudden, radical, or complete change;
(ii) a fundamental change in political organization, especially the overthrow or
renunciation of one government or ruler and the substitution of another.
It can be asserted with a straight face that the field of continuous optimization
has undergone a revolution since 1984 in the sense of the first definition and that
the second definition applies in a philosophical sense: Because the interior-point
presence in optimization today is ubiquitous, it is easy to lose sight of the mag-
nitude and depth of the shifts that have occurred during the past twenty years.
Building on the implicit political metaphor of our title, successful revolutions
eventually become the status quo.
The interior-point revolution, like many other revolutions, includes old ideas that
are rediscovered or seen in a different light, along with genuinely new ideas. The
stimulating interplay of old and new continues to lead to increased understanding
as well as an ever-larger set of techniques for an ever-larger array of problems,
familiar and heretofore unexplored. Because of the vast size of the interior-point
literature, it would be impractical to cite even a moderate fraction of the relevant
references, but more complete treatments are mentioned throughout. The author
regrets the impossibility of citing all important work individually.

Exercises
3.1 Give a proof of the Sherman-Morrison formula.
3.2 Use induction on k to prove Lemma 3.2.1.
3.3 Let G = (V, E) be a graph and let LG be its Laplacian matrix. Show
that one can approximate
SDP(G) = max h 14 LG , Xi
X ∈ Sn
Xii = 1 for i ∈ [n]
X  0.
to any desired accuracy in polynomial time.
3.4 Implement the ellipsoid method for the above semidefinite program and
compute the value for the Petersen graph.
3.5 Show the inequality
 n+1  n−1
n n
≤ exp(−1/n).
n+1 n−1
Hint: Reduce this to showing the inequality:
(1 + x)1+x (1 − x)1−x ≥ exp(x2 ) for all 0 < x < 1.
PART TWO
COMBINATORIAL OPTIMIZATION
4
Graph coloring and independent sets

In this chapter we revisit in detail the theta number ϑ(G), which has al-
ready been introduced in earlier chapters. In particular, we present several
equivalent formulations for ϑ(G), we discuss its geometric properties, and we
present some applications: for bounding the Shannon capacity of a graph,
and for computing in polynomial time maximum stable sets and minimum
colorings in perfect graphs. We also show the link to Delsarte linear pro-
gramming bounds for binary codes and we present a hierarchy of stronger
bounds for the stability number, based on the approach of Lasserre.
Here are some additional definitions used in this chapter. Let G = (V, E)
be a graph. Then, E denotes the set of pairs {i, j} of distinct nodes that
are not adjacent in G. The graph G = (V, E) is called the complementary
graph of G and G is called self-complementary if G and G are isomorphic
graphs. Given a subset S ⊆ V , G[S] denotes the subgraph induced by S:
its node set is S and its edges are all pairs {i, j} ∈ E with i, j ∈ S. The
graph Cn is the circuit (or cycle) of length n, with node set [n] and edges
the pairs {i, i + 1} (for i ∈ [n], indices taken modulo n). For a set S ⊆ V ,
its characteristic vector is the vector χS ∈ {0, 1}S , whose i-th entry is 1 if
i ∈ S and 0 otherwise. As before, e denotes the all-ones vector.

4.1 Preliminaries on graphs


4.1.1 Stability and chromatic numbers
A subset S ⊆ V of nodes is said to be stable (or independent) if no two nodes
of S are adjacent in G. Then the stability number of G is the parameter α(G)
defined as the maximum cardinality of an independent set in G.
A subset C ⊆ V of nodes is called a clique if every two distinct nodes in
C are adjacent. The maximum cardinality of a clique in G is denoted ω(G),
74 Graph coloring and independent sets

the clique number of G. Clearly,


ω(G) = α(G).
Computing the stability number of a graph is a hard problem: Given a
graph G and an integer k, deciding whether α(G) ≥ k is an NP-complete
problem, see Garey, Johnson, Stockmeyer [1976].
Given an integer k ≥ 1, a k-coloring of G is an assignment of numbers
(view them as colors) from {1, · · · , k} to the nodes in such a way that two
adjacent nodes receive distinct colors. In other words, a k-coloring corre-
sponds to a partition of V into k stable sets: V = S1 ∪ · · · ∪ Sk , where Si
is the stable set consisting of all the nodes that received the i-th color. The
coloring (or chromatic) number is the smallest integer k for which G admits
a k-coloring, it is denoted as χ(G).
Observe that a graph G is 2-colorable if and only if G is bipartite, a
property which can be decided in polynomial time. On the other hand, for
any integer k ≥ 3, it is an NP-complete problem to decide whether a graph
is k-colorable; in fact, it is already NP-complete to decide whether a planar
graph is 3-colorable, see Garey, Johnson, Stockmeyer [1976]. A well known
result is that every planar graph is 4-colorable — this is the celebrated four
color theorem1 .

Figure 4.1 The Petersen graph has α(G) = 4, ω(G) = 2, χ(G) = 3, χ(G) =
5

Clearly, any two nodes in a clique of G must receive distinct colors. There-
fore, for any graph, the following inequality holds:
ω(G) ≤ χ(G). (4.1)
This inequality is strict, for example, when G is an odd circuit, i.e., a circuit
1 The four colour theorem was proved by Appel and Haken [1977, 1977]; this is a long proof,
which relies on computer check. Another proof, a bit simplified but still relying on computer
check, was given later by Robertson, Sanders, Seymour, Thomas [1997]. A fully automated
proof has been given recently by Gonthier [2008].
4.1 Preliminaries on graphs 75

of odd length at least 5, or its complement. Indeed, for an odd circuit C2n+1
(n ≥ 2), ω(C2n+1 ) = 2 while χ(C2n+1 ) = 3. Moreover, for the complement
G = C2n+1 , ω(G) = n while χ(G) = n + 1. For an illustration see the cycle
of length 7 and its complement in Figure 4.2.

Figure 4.2 For C7 and its complement C7 : ω(C7 ) = 2, χ(C7 ) = 3, ω(C7 ) =


α(C7 ) = 3, χ(C7 ) = 4

4.1.2 Perfect graphs


It is intriguing to understand for which graphs equality ω(G) = χ(G) holds.
Note that any graph G with ω(G) < χ(G) can be embedded in a larger
graph Ĝ with ω(Ĝ) = χ(Ĝ), simply by adding to G a set of χ(G) new nodes
forming a clique. This justifies the following definition, introduced by C.
Berge2 in the early sixties, which makes the problem well posed.
Definition 4.1.1 A graph G is said to be perfect if equality
ω(H) = χ(H)
holds for all induced subgraphs H of G (including H = G).
For instance, bipartite graphs are perfect. Indeed for any bipartite graph
the min-max relation ω(G) = χ(G) (≤ 2) holds clearly. Also complements
of line graphs of bipartite graphs are perfect. Indeed the min-max relation
claims then that the maximum cardinality of a matching in a bipartite graph
is equal to the minimum cardinality of a vertex cover, which is true by a
theorem of König3 .
It follows from the definition and the above observation about odd circuits
that if G is a perfect graph then it does not contain an odd circuit of length at
least 5 or its complement as an induced subgraph. Berge already conjectured
in 1961 that all perfect graphs arise in this way. Resolving this conjecture
2 Claude Berge (1926–2002)
3 Dénes König (1884–1944)
76 Graph coloring and independent sets

has haunted generations of graph theorists. It was finally settled in 2006


by Chudnovsky, Robertson, Seymour and Thomas [2006] who proved the
following result, known as the strong perfect graph theorem:
Theorem 4.1.2 (The strong perfect graph theorem, Chudnovsky,
Robertson, Seymour and Thomas [2006]) A graph G is perfect if and
only if it does not contain an odd circuit of length at least 5 or its complement
as an induced subgraph.
This implies the following structural result about perfect graphs, known
as the perfect graph theorem, already proved by Lovász in 1972.
Theorem 4.1.3 (The perfect graph theorem, Lovász [1972]) If G
is a perfect graph, then its complement G too is a perfect graph.
We give a direct proof of Theorem 4.1.3 in the next section and we
will mention later some other, more geometric, characterizations of perfect
graphs (see, e.g., Theorem 4.2.4).

4.1.3 The perfect graph theorem


Lovász [1972] proved the following result, which implies the perfect graph
theorem (Theorem 4.1.3). The proof given below follows the elegant linear-
algebraic argument of Gasparian [1996].
Theorem 4.1.4 A graph G is perfect if and only if |V (G0 )| ≤ α(G0 )ω(G0 )
for each induced subgraph G0 of G.
Proof Necessity is easy: Assume that G is perfect and let G0 be an induced
subgraph of G. Then χ(G0 ) = ω(G0 ) and thus V (G0 ) can be covered by ω(G0 )
stable sets, which implies that |V (G0 )| ≤ ω(G0 )α(G0 ).
To show sufficiency, assume for a contradiction that there exists a graph
G which satisfies the condition but is not perfect; choose such a graph with
|V (G)| minimal. Then, n ≤ α(G)ω(G), ω(G) < χ(G) and ω(G0 ) = χ(G0 )
for each induced subgraph G0 6= G of G. Set ω = ω(G) and α = α(G) for
simplicity. Our first claim is:
Claim 1: There exist αω + 1 stable sets S0 , . . . , Sαω such that each vertex
of G is covered by exactly α of them.
Proof of the claim: Let S0 be a stable set of size α in G, and set S0 =
{v1 , . . . , vα }. For each node vk ∈ S0 , the graph G \ vk is perfect (by the
minimality assumption on G), and thus we have χ(G \ vk ) = ω(G \ vk ) ≤ ω.
Hence, for each k = 1, . . . , α, V \ {vk } can be partitioned into ω stable
4.2 Linear programming bounds 77

sets, denoted by S(k−1)ω+1 , . . . , Skω . In this way we obtain a collection of αω


stable sets which together with S0 satisfy the claim.

Our next claim is:

Claim 2: For each i = 0, 1, . . . , αω, there exists a clique Ki of size ω such


that Ki ∩ Si = ∅ and Ki ∩ Sj 6= ∅ for all j ∈ {0, 1, . . . , αω} \ {i}.
Proof of the claim: For each i = 0, 1, . . . , αω, as G \ Si is perfect we have
that χ(G \ Si ) = ω(G \ Si ) ≤ ω. This implies that χ(G \ Si ) = ω since, if
χ(G \ Si ) ≤ ω − 1, then one could color G with ω colors, contradicting our
assumption on G. Hence there exists a clique Ki disjoint from Si and with
|Ki | = ω. We now verify that Ki meets the other αω stable sets Sj with
j 6= i. For this recall the construction of the stable sets S0 , S1 , . . . , Sαω in
Claim 1: with S0 = {v1 , . . . , vα }, we have V \ {vk } = S(k−1)ω+1 ∪ . . . ∪ Skω for
k = 1, . . . , α. Consider first the clique K0 . For each k ∈ {1, . . . , α}, we have
K0 = K0 ∩(V \{vk }) = ∪ωr=1 (K0 ∩S(k−1)ω+r ), since K0 ∩S0 = ∅. This implies
K0 ∩S(k−1)ω+r 6= ∅ since |K0 | = ω and |K0 ∩S(i−1)ω+j | ≤ 1, and thus K0 must
meet each of the stable sets S1 , . . . , Sαω . Consider now the clique K1 (the
reasoning is analogous for any clique Ki with 1 ≤ i ≤ αω). As K1 ∩ S1 = ∅
we have K1 = K1 ∩ (V \ S1 ) = (K1 ∩ S2 ) ∪ . . . ∪ (K1 ∩ Sω ) ∪ (K1 ∩ {v1 }),
which implies that K1 meets each of the stable sets S2 , . . . , Sω and v1 ∈ K1 .
Hence K1 meets S0 and K ∩ {v2 , . . . , vα } = ∅. As above, this now implies
that K1 meets each of the remaining stable sets Sω+1 , . . . , Sαω .

We can now conclude the proof. Define the matrices M, N ∈ Rn×(αω+1) ,


whose columns are χS0 , . . . , χSαω (the incidence vectors of the stable sets
Si ), and the vectors χK0 , . . . , χαω+1 (the incidence vectors of the cliques
Ki ), respectively. By Claim 2, we have that M T N = J − I (where J is
the all-ones matrix and I is the identity matrix). As J − I is nonsingular,
we obtain that rank(M T N ) = rank(J − I) = αω + 1. On the other hand,
rank(M T N ) ≤ rank(N ) ≤ n. Thus we obtain that n ≥ αω + 1, contradicting
our assumption on G.

4.2 Linear programming bounds


4.2.1 Fractional stable sets and colorings
Let ST(G) denote the polytope in RV defined as the convex hull of the
characteristic vectors of the stable sets of G:

ST(G) = conv{χS : S ⊆ V, S is a stable set in G},


78 Graph coloring and independent sets

called the stable set polytope of G. Hence, computing α(G) is linear opti-
mization over the stable set polytope:

α(G) = max{eT x : x ∈ ST(G)}.

We have now defined the stable set polytope by listing explicitly its ex-
treme points. Alternatively, it can also be represented by its hyperplanes
representation, i.e., in the form

ST(G) = {x ∈ RV : Ax ≤ b}

for some matrix A and some vector b. As computing the stability number is
a hard problem one cannot hope to find the full linear inequality description
of the stable set polytope (i.e., the explicit A and b). However some partial
information is known: many classes of valid inequalities for the stable set
polytope are known. For instance, if C is a clique of G, then the clique
inequality
X
x(C) = xi ≤ 1 (4.2)
i∈C

is valid for ST(G): any stable set can contain at most one vertex from the
clique C. The clique inequalities define the polytope

QST(G) = x ∈ RV : x ≥ 0, x(C) ≤ 1 for all cliques C of G .



(4.3)

Cleary, QST(G) is a relaxation of the stable set polytope:

ST(G) ⊆ QST(G). (4.4)

Maximizing the linear function eT x over the polytope QST(G) gives the
parameter
α∗ (G) = max{eT x : x ∈ QST(G)}, (4.5)

known as the fractional stability number of G. Analogously, χ∗ (G) denotes


the fractional coloring number of G, defined by the following linear program:
nP
χ∗ (G) = min S
P
S stable in G λS : S stable in G λS χ = e,
o (4.6)
λS ≥ 0 for S stable set of G .

If we add the constraint that all λS should be integral then we obtain the
coloring number of G. Thus, χ∗ (G) ≤ χ(G). In fact the fractional stability
number of G coincides with the fractional coloring number of its complement:
α∗ (G) = χ∗ (G), and it is nested between α(G) and χ(G).
4.2 Linear programming bounds 79

Lemma 4.2.1 For any graph G, we have

α(G) ≤ α∗ (G) = χ∗ (G) ≤ χ(G), (4.7)

where χ∗ (G) is the optimum value of the linear program:


n X X o
min yC : yC χC = e, yC ≥ 0 for C clique of G .
C clique of G C clique of G
(4.8)

Proof The inequality α(G) ≤ α∗ (G) in (4.7) follows from the inclusion (4.4)
and the inequality χ∗ (G) ≤ χ(G) was observed above. We now show that
α∗ (G) = χ∗ (G). For this, we first observe that in the linear program (4.5)
the condition x ≥ 0 can be removed without changing the optimal value;
that is,
α∗ (G) = max{eT x : x(C) ≤ 1 for C clique of G} (4.9)

(check it). Now, it suffices to observe that the dual LP of the above linear
program (4.9) coincides with the linear program (4.8).

For instance, for an odd circuit C2n+1 (n ≥ 2), α∗ (C2n+1 ) = 2n+1


2 (check
it) lies strictly between α(C2n+1 ) = n and χ(C2n+1 ) = n + 1.
When G is a perfect graph, equality holds throughout in relation (4.7).
As we see in the next section, there is a natural extension of this result to
weighted graphs, which permits to show the equality ST(G) = QST(G) when
G is a perfect graph. Moreover, it turns out that this geometric property
characterizes perfect graphs.

4.2.2 Polyhedral characterization of perfect graphs


For any graph G, the fractional stable set polytope is a linear relaxation
of the stable set polytope: ST(G) ⊆ QST(G). Here we show a geometric
characterization of perfect graphs: G is perfect if and only if both polytopes
coincide: ST(G) = QST(G).
The following operation of duplicating a node will be useful. Let G =
(V, E) be a graph and let v ∈ V . Add to G a new node, say v 0 , which
is adjacent to v and to all neighbours of v in G. In this way we obtain a
new graph H, which we say is obtained from G by duplicating v. Repeated
duplicating is called replicating.
Lemma 4.2.2 Let H arise from G by duplicating a node. If G is perfect
then H too is perfect.
80 Graph coloring and independent sets

Proof First we show that α(H) = χ(H) if H arises from G by duplicating


node v. Indeed, by construction, α(H) = α(G), which is equal to χ(G) since
G is perfect. Now, if C1 , . . . , Ct are cliques in G that cover V with (say)
v ∈ C1 , then C1 ∪ {v 0 }, . . . , Ct are cliques in H covering V (H). This shows
that χ(G) = χ(H), which implies that α(H) = χ(H).
From this we can conclude that, if H arises from G by duplicating a node
v, then α(H 0 ) = χ(H 0 ) for any induced subgraph H 0 of H, using induction
on the number of nodes of G. Indeed, either H 0 is an induced subgraph of
G (if H 0 does not contain both v and v 0 ), or H 0 is obtained by duplicating
v in an induced subgraph of G; in both cases we have that α(H 0 ) = χ(H 0 ).
Hence, if H arises by duplicating a node in a perfect graph G, then H is
perfect which, by Theorem 4.1.3, implies that H is perfect.

Given node weights w ∈ RV+ , we define the following weighted analogues


of the (fractional) stability numbers α(G) and α∗ (G)

α(G, w) = max wT x, α∗ (G, w) = max wT x.


x∈ST(G) x∈QST(G)

Similarly, the weighted analogues of the (fractional) chromatic numbers


χ(G) and χ∗ (G) are the parameters χ(G, w) and χ∗ (G, w) defined by
n X X
χ(G, w) = min yC : yC χC = w, yC ∈ Z,
y
C clique of G C clique of G o
yC ≥ 0 for C clique of G ,
n X X
χ∗ (G, w) = min yC : yC χC = w,
y
C clique of G C clique of G o
yC ≥ 0 for C clique of G .

When w is the all-ones weight function, we find again α(G), α∗ (G), χ(G)
and χ∗ (G), respectively. The following analogue of (4.7) holds for arbitrary
node weights:

α(G, w) ≤ α∗ (G, w) = χ∗ (G, w) ≤ χ(G, w). (4.10)

Lemma 4.2.3 Let G be a perfect graph and let w ∈ ZV+ be nonnegative


integer node weights. Then, α(G, w) = χ(G, w).

Proof Let H denote the graph obtained from G by duplicating node i


wi times if wi ≥ 1 and deleting node i if wi = 0. Then, by construction,
α(G, w) = ω(H), which is equal to χ(H) since H is perfect (by Lemma 4.2.2).
Say, S̃1 , . . . , S̃t are t = χ(H) stable sets in H partitioning V (H). Each stable
set S̃k corresponds to a stable set Sk in G (since S̃k contains at most one of
4.2 Linear programming bounds 81

the wi copies of each node i of G). Now, these stable sets S1 , . . . , St have the
property that each node i of G belongs to exactly wi of them, which shows
that χ(G, w) ≤ t = χ(H). This implies that χ(G, w) ≤ χ(H) = α(G, w),
giving equality χ(G, w) = α(G, w).

We can now show the following geometric characterization of perfect


graphs, due to Chvátal [1975]. In the proof we will use the fact that ST(G) ⊆
QST(G) are down-monotone polytopes in Rn+ (and the properties from Ex-
ercise A.9). Recall that a polytope P ⊆ Rn+ is down-monotone if x ∈ P and
0 ≤ y ≤ x (coordinate-wise) implies y ∈ P .

Theorem 4.2.4 (Chvátal [1975]) A graph G is perfect if and only if


ST(G) = QST(G).

Proof First assume that G is perfect, we show that ST(G) = QST(G).


As ST(G) ⊆ QST(G) are down-monotone in RV+ , we can use the following
property shown in Exercise A.9: To show equality ST(G) = QST(G) it
suffices to show that α(G, w) = α∗ (G, w) for all w ∈ ZV+ ; now the latter
property follows from Lemma 4.2.3 (applied to G).
Conversely, assume that ST(G) = QST(G) and that G is not perfect.
Pick a minimal subset U ⊆ V for which the subgraph G0 of G induced by
U satisfies α(G0 ) < χ(G0 ). Setting w = χU , we have that α(G0 ) = α(G, w)
which, by assumption, is equal to maxx∈QST(G) wT x = α∗ (G, w). Consider
the dual of the linear program defining α∗ (G, w) with an optimal solution
y = (yC ). Pick a clique C of G for which yC > 0, then C is a nonempty subset
of U . Moreover, using complementary slackness, we deduce that x(C) = 1
for any optimal solution x ∈ QST(G) and thus, in particular, |C ∩ S| = 1 for
any maximum cardinality stable set S ⊆ U . Let G00 denote the subgraph of
G induced by U \ C. Then, α(G00 ) ≤ α(G0 ) − 1 < χ(G0 ) − 1 ≤ χ(G00 ), which
contradicts the minimality assumption made on U .

We have just seen that equality ST(G) = QST(G) holds when G is a


perfect graph. Hence the explicit linear inequality description is known for
its stable set polytope, which is given by the clique inequalities and the
nonnegativity constraints. However, it is not clear how to use this informa-
tion in order to give an efficient algorithm for optimizing over the stable
set polytope of a perfect graph! As we see later in Section 4.5.1 there is yet
another description of ST(G) – in terms of semidefinite programming, using
the theta body TH(G) – that will allow to give such an efficient algorithm.
82 Graph coloring and independent sets

4.3 Semidefinite programming bounds


4.3.1 The theta number
Definition 4.3.1 Given a graph G = (V, E), consider the following semidef-
inite program
max {hJ, Xi : Tr(X) = 1, Xij = 0 for {i, j} ∈ E, X  0} . (4.11)
X∈S n

Its optimal value is denoted as ϑ(G), and called the theta number of G.
This parameter was introduced by Lovász [1979]. He proved the following
simple, but crucial result – called the Sandwich Theorem by Knuth [1994]
– which shows that ϑ(G) provides a bound for both the stability number of
G and the chromatic number of the complementary graph G.
Theorem 4.3.2 (Lovász’ sandwich theorem) For any graph G, we
have that
α(G) ≤ ϑ(G) ≤ χ(G).
Proof Given a stable set S of cardinality |S| = α(G), define the matrix
1 S S T
X= χ (χ ) ∈ S n .
|S|
Then X is feasible for (4.11) with objective value hJ, Xi = |S| (check it).
This shows the inequality α(G) ≤ ϑ(G).
Now, consider a matrix X feasible for the program (4.11) and a partition of
V into k cliques: V = C1 ∪ · · · ∪ Ck . Our goal is now to show that hJ, Xi ≤ k,
which will imply ϑ(G) ≤ χ(G). For this, using the relation e = ki=1 χCi ,
P

observe that
k k
X
Ci Ci
T 2
X
χCi (χCi )T − kJ.

Y := kχ −e kχ −e =k
i=1 i=1

Moreover,
k
* +
X
X, χCi (χCi )T = Tr(X).
i=1

Indeed the matrix i χCi (χCi )T has all its diagonal entries equal to 1 and
P

it has zero off-diagonal entries outside the edge set of G, while X has zero
off-diagonal entries on the edge set of G. As X, Y  0, we obtain
0 ≤ hX, Y i = k 2 Tr(X) − khJ, Xi
and thus hJ, Xi ≤ k Tr(X) = k.
4.3 Semidefinite programming bounds 83

We also refer to Lemma 4.4.3 for the inequality ϑ(G) ≤ χ(G), where the
link to coverings by cliques will be even more trasparent.

4.3.2 Computing maximum stable sets in perfect graphs


Assume that G is a graph satisfying α(G) = χ(G). Then, as a direct appli-
cation of Theorem 4.3.2, α(G) = χ(G) = ϑ(G) can be computed by solving
the semidefinite program (4.11), it suffices to solve this semidefinite program
with precision  < 1/2 as one can then find α(G) by rounding the optimal
value to the nearest integer. In particular, combining with the perfect graph
theorem (Theorem 4.1.3):
Theorem 4.3.3 If G is a perfect graph then α(G) = χ(G) = ϑ(G) and
ω(G) = χ(G) = ϑ(G).
Hence one can compute the stability number and the chromatic number
in polynomial time for perfect graphs. Moreover, as was shown by Grötschel,
Lovász and Schrijver [1981], one can also find a maximum stable set and a
minimum coloring in polynomial time for perfect graphs. We now indicate
how to construct a maximum stable set – we deal with minimum graph
colorings in the next section.
Let G = (V, E) be a perfect graph. Order the nodes of G as v1 , · · · , vn .
Then we construct a sequence of induced subgraphs G0 , G1 , · · · , Gn of G.
Hence each Gi is perfect, also after removing a node, so that we can compute
in polynomial time the stability number of such graphs. The construction
goes as follows: Set G0 = G. For each i = 1, · · · , n do the following:

(i) Compute α(Gi−1 \vi ).


(ii) If α(Gi−1 \vi ) = α(G), then set Gi = Gi−1 \vi .
(iii) Otherwise, set Gi = Gi−1 .
By construction, α(Gi ) = α(G) for all i. In particular, α(Gn ) = α(G).
Moreover, the node set of the final graph Gn is a stable set and, therefore,
it is a maximum stable set of G. Indeed, if the node set of Gn is not stable
then it contains a node vi for which α(Gn \vi ) = α(Gn ). But then, as Gn is
an induced subgraph of Gi−1 , one would have that α(Gn \vi ) ≤ α(Gi−1 \vi )
and thus α(Gi−1 \vi ) = α(G), so that node vi would have been removed at
Step (ii).
Hence, the above algorithm permits to construct a maximum stable set in
a perfect graph G in polynomial time – namely by solving n + 1 semidefinite
programs for computing α(G) and α(Gi−1 \vi ) for i = 1, · · · , n.
84 Graph coloring and independent sets

More generally, given integer node weights w ∈ ZV+ , the above algorithm
can also be used to find a stable set S of maximum weight w(S). For this,
construct the new graph G0 in the following way: Duplicate each node i ∈ V
wi times, i.e., replace node i ∈ V by a set Wi of wi nodes pairwise non-
adjacent, and make two nodes x ∈ Wi and y ∈ Wj adjacent if i and j are
adjacent in G. By Lemma 4.2.2, the graph G0 is perfect. Moreover, α(G0 )
is equal to the maximum weight w(S) of a stable set S in G. From this it
follows that, if the weights wi are bounded by a polynomial in n, then one
can compute α(G, w) in polynomial time. (More generally, one can compute
α(G, w) in polynomial time, e.g. by optimizing the linear function wT x over
the theta body TH(G), introduced in Section 4.5.1 below.)

4.3.3 Minimum colorings of perfect graphs


We now describe an algorithm for computing a minimum coloring of a perfect
graph G in polynomial time. This will be reduced to several computations
of the theta number, which we will use for computing the clique number of
some induced subgraphs of G.
Let G = (V, E) be a perfect graph. Call a clique of G maximum if it has
maximum cardinality ω(G). The crucial observation is that it suffices to find
a stable set S in G which meets all maximum cliques.
First of all, such a stable set S exists: in a ω(G)-coloring, any color class
S must meet all maximum cliques.
Now, if we have found such a stable set S, then one can recursively color
G\S with ω(G\S) = ω(G) − 1 colors (in polynomial time), and thus, using
an additional color to color the elements in S, one obtains a coloring of G
with ω(G) colors.
The algorithm for finding a stable set meeting all maximum cliques goes
as follows: For t ≥ 1, we grow a list L of t maximum cliques C1 , · · · , Ct .
Suppose C1 , · · · , Ct have been found. Then do the following:

(i)We find a stable set S meeting each of the cliques C1 , · · · , Ct (see below).
(ii)Compute ω(G\S).
(iii)If ω(G\S) < ω(G) then S meets all maximum cliques and we are done.
(iv) Otherwise, compute a maximum clique Ct+1 in G\S, which is thus a new
maximum clique of G, and we add it to the list L.
The first step can be done as follows: Set w = ti=1 χCi ∈ ZV+ . As G is
P

perfect, we know that α(G, w) = χ(G, w). Moreover, χ(G, w) = t. Indeed,


χ(G, w) ≤ t follows from the definition of w. Conversely, if y = (yC ) is
4.4 Other formulations of the theta number 85

feasible for the program defining χ(G, w) then, on the one hand, wT e =
T
P P
C yC |C| ≤ C yC ω(G) and, on the other hand, w e = tω(G), thus imply-
ing t ≤ χ(G, w). Now we compute a stable set S having maximum possible
weight w(S). Hence, we have w(S) = t and thus S meets each of the cliques
C1 , · · · , Ct .
The above algorithm has polynomial running time, since the number of
iterations is bounded by |V |. To see this, define the affine space Lt ⊆ RV
defined by the equations x(C1 ) = 1, · · · , x(Ct ) = 1 corresponding to the
cliques in the current list L. Then, Lt contains strictly Lt+1 , since χS ∈
Lt \ Lt+1 for the set S constructed in the first step, and thus the dimension
decreases at least by 1 at each iteration.

4.4 Other formulations of the theta number


4.4.1 Dual formulation
We now give several equivalent formulations for the theta number obtained
by applying semidefinite programming duality and some further elementary
manipulations.
Lemma 4.4.1 The theta number can be expressed by any of the following
programs:
ϑ(G) = min {t : tI + A − J  0, Aij = 0 for i = j or {i, j} ∈ E},
t∈R,A∈S n
(4.12)

ϑ(G) = min n t : tI − B  0, Bij = 1 for i = j or {i, j} ∈ E ,
t∈R,B∈S
(4.13)
ϑ(G) = min {t : C − J  0, Cii = t for i ∈ V, Cij = 0 for {i, j} ∈ E},
t∈R,C∈S n
(4.14)

ϑ(G) = minn λmax (B) : Bij = 1 for i = j or {i, j} ∈ E . (4.15)
B∈S

Proof First we build the dual of the semidefinite program (4.11), which
reads:  
 X 
min t : tI + yij Eij − J  0 . (4.16)
t∈R,y∈RE  
{i,j}∈E

As both programs (4.11) and (4.16) are strictly feasible, there is no duality
gap: the optimal value of (4.16) is equal to ϑ(G), and the optimal values
are attained in both programs – here we have applied the duality theorem
(Theorem 2.4.1).
86 Graph coloring and independent sets
P
Setting A = {i,j}∈E yij Eij , B = J −A and C = tI +A in (4.16), it follows
that the program (4.16) is equivalent to (4.12), (4.13) and (4.14). Finally the
formulation (4.15) follows directly from (4.13) after recalling that λmax (B)
is the smallest scalar t for which tI − B  0.

4.4.2 Two more (lifted) formulations


We give here two more formulations for the theta number. They rely on
semidefinite programs involving symmetric matrices of order 1 + n, which
we will index by the set {0} ∪ V , where 0 is an additional index that does
not belong to V .

Lemma 4.4.2 The theta number ϑ(G) is equal to the optimal value of the
following semidefinite program:

min {Z00 : Z  0, Z0i = Zii = 1 for i ∈ V, Zij = 0 for {i, j} ∈ E}.


Z∈S n+1
(4.17)

Proof We show that the two semidefinite programs in (4.12) and (4.17) are
equivalent. For this, observe that

eT
 
t
tI + A − J  0 ⇐⇒ Z :=  0,
e I + 1t A

which follows by taking the Schur complement of the upper left corner t in
the block matrix Z. Hence, if (t, A) is feasible for (4.12), then Z is feasible for
(4.17) with same objective value: Z00 = t. The construction can be reversed:
if Z is feasible for (4.17), then one can construct (t, A) feasible for (4.12)
with t = Z00 . Hence both programs are equivalent.

From the formulation (4.17), the link of the theta number to the (frac-
tional) chromatic number is even more transparent.

Lemma 4.4.3 For any graph G, we have that ϑ(G) ≤ χ∗ (G).

Proof Let y = (yC ) be feasible for the linear program (4.8) defining χ∗ (G).
For each clique C define the (column) vector zC = (1 χC ) ∈ R1+n , obtained
by appending an entry equal to 1 to the characteristic vector of C. Define
T . One can verify that Z is feasible for
P
the matrix Z = C clique of G yC zC zC
P
the program (4.17) with objective value Z00 = C yC (check it). This shows
ϑ(G) ≤ χ∗ (G).
4.4 Other formulations of the theta number 87

Applying duality to the semidefinite program (4.17), we obtain4 the fol-


lowing formulation for ϑ(G).
Lemma 4.4.4 The theta number ϑ(G) is equal to the optimal value of the
following semidefinite program:
nX o
max Yii : Y  0, Y00 = 1, Y0i = Yii for i ∈ V, Yij = 0 for {i, j} ∈ E .
Y ∈S n+1
i∈V
(4.18)
Proof First we write the program (4.17) in standard form, using the el-
ementary matrices Eij (with entries 1 at positions (i, j) and (j, i) and 0
elsewhere):
inf{hE00 , Zi : hEii , Zi = 1, hE0i , Zi = 2 for i ∈ V,
hEij , Zi = 0 for {i, j} ∈ E, Z  0}.
Next we write the dual of this sdp:
nX X X o
sup yi + 2zi : Y = E00 − yi Eii + zi E0i + uij Eij  0 .
i∈V i∈V {i,j}∈E

Observe that the matrix Y ∈ S n+1 occurring in this last program can be
equivalently characterized by the conditions: Y00 = 1, Yij = 0 if {i, j} ∈
P
E and Y  0. Moreover  the objective function reads: i∈V yi + 2zi =
P
− i∈V Y ii + 2Y 0i . Therefore the dual can be equivalently reformulated
as
n X  o
sup − Yii + 2Y0i : Y  0, Y00 = 1, Yij = 0 for {i, j} ∈ E . (4.19)
i∈V

As (4.17) is strictly feasible (check it) there is no duality gap, the optimal
value of (4.19) is attained and it is equal to ϑ(G).
Let Y be an optimal solution of (4.19). We claim that Y0i + Yii = 0 for all
i ∈ V . Indeed, assume that Y0i + Yii 6= 0 for some i ∈ V , so that Yii 6= 0. We
construct a new matrix Y 0 feasible for (4.19) and having a larger objective
value than Y , thus contradicting the optimality of Y . If Y0i ≥ 0, then we
let Y 0 be obtained from Y by setting to 0 all the entries at the positions
(i, 0) and (i, j) for j ∈ [n], which indeed has a larger objective value since
Yii + 2Y0i > 0. Assume now Y0i < 0. Then set λ = −Y0i /Yii > 0 and let
Y 0 be obtained from Y by multiplying its i-th row and column by λ. Then,
Yii0 = λ2 Yii = Y0i2 /Yii , Y0i0 = λY0i = −Yii0 , and Y 0 has a larger objective value
than Y since −Yii0 − 2Y0i0 = Y0i2 /Yii > −Yii − 2Y0i .
4 Of course there is more than one road leading to Rome: one can also show directly the
equivalence of the two programs (4.11) and (4.18).
88 Graph coloring and independent sets

Therefore, we can add w.l.o.g. the condition Y0i = −Yii (i ∈ V ) to (4.19),


P
so that its objective function can be replaced by i∈V Yii . Finally, in order
to get the program (4.18), it suffices to observe that one can change the
signs on the first row and column of Y (indexed by the index 0). In this way
we obtain a matrix Ỹ such that Ỹ0i = −Y0i for all i and Ỹij = Yij at all
other positions. Thus Ỹ now satisfies the conditions Ỹii = Ỹ0i for i ∈ V and
it is an optimal solution of (4.18).

4.4.3 Hoffman eigenvalue bound for coloring


Let AG be the adjacency matrix of G. Then, the following eigenvalue bound
for the chromatic number is known as Hoffman’s bound:
λmax (AG )
χ(G) ≥ 1 − .
λmin (AG )

As shown in Lovász [1979], it turns out that this bound can be strengthened,
by replacing AG by any matrix A supported by the graph G, and that this
gives yet another formulation for the theta number.

Theorem 4.4.5 Let G = (V, E) be a graph. Then


 
λmax (A)
ϑ(G) = maxn 1 − : Aii = 0 for i ∈ V, Aij = 0 for {i, j} ∈ E .
A∈S λmin (A)
(4.20)

Proof Let A ∈ S n with Aij = 0 if i = j or {i, j} ∈ E. We show the


inequality ϑ(G) ≥ 1 − λmax (A)/λmin (A). As A has a zero diagonal (and can
be assumed to be nonzero) we have λmin (A) < 0 and thus we can pick an
eigenvector x for the eigenvalue λmax (A) of A with kxk2 = −1/λmin (A), so
that xT Ax = −λmax (A)/λmin (A). Consider the matrix B = Diag(x)(A −
λmin (A)I)Diag(x). Then B  0, Tr(B) = 1 and Bij = 0 if {i, j} ∈ E. Hence
ϑ(G) ≥ ni,j=1 Bij = (λmax (A) − λmin (A))kxk2 = 1 − λmax (A)/λmin (A).
P

Conversely, assume B is an optimal solution for the program defining


ϑ(G); that is, B  0, Tr(B) = 1, Bij = 0 if {i, j} ∈ E, and ni,j=1 Bij =
P

ϑ(G). Consider the vector x = ( Bii ) ∈ Rn , which is a unit vector. We
may assume Bii 6= 0 for all i (else the argument can be easily modifed).
Then the matrix A = Diag(x)−1 BDiag(x)−1 − I satisfies Aij = 0 if i = j
or {i, j} ∈ E. Moreover A + I  0, which implies λmin (A) ≥ −1. Finally,
xT Ax = ϑ(G) + 1, which implies λmax (A) ≥ ϑ(G) + 1. From this one can
deduce 1 − λmax (A)/λmin (A) ≥ ϑ(G), which concludes the proof.
4.5 Geometric properties of the theta number 89

4.5 Geometric properties of the theta number


In this section we introduce the theta body TH(G). This is a semidefinite
relaxation of the stable set polytope ST(G), which is at least as tight as
its linear relaxation QST(G). Moreover, the theta body TH(G) provides an-
other, more geometric formulation for the theta number as well as geometric
characterizations of perfect graphs.

4.5.1 The theta body TH(G)


It is convenient to introduce the following set of matrices X ∈ S n+1 , where
columns and rows are indexed by the set {0} ∪ V :
n+1
MG = {Y ∈ S+ : Y00 = 1, Y0i = Yii for i ∈ V, Yij = 0 for {i, j} ∈ E}.
(4.21)
The set MG is thus the feasible region of the semidefinite program (4.18).
Now let TH(G) denote the convex set obtained by projecting the set MG
onto the subspace RV of the diagonal entries:

TH(G) = {x ∈ Rn : ∃Y ∈ MG such that xi = Yii for i ∈ V }, (4.22)

which is called the theta body of G. It turns out that TH(G) is nested between
the stable set polytope ST(G) and its linear relaxation QST(G).

Lemma 4.5.1 For any graph G, we have that ST(G) ⊆ TH(G) ⊆ QST(G).

Proof The inclusion ST(G) ⊆ TH(G) follows from the fact that the char-
acteristic vector of any stable set S lies in TH(G). To see this, define the
(column) vector y = (1 χS ) ∈ Rn+1 obtained by adding an entry equal to
1 to the characteristic vector of S, and define the matrix Y = yy T ∈ S n+1 .
Then Y ∈ MG and χS = (Yii )i∈V , which shows that χS ∈ TH(G).
We now show the inclusion TH(G) ⊆ QST(G). For this pick a vector x
in TH(G) and a clique C of G; we show that x(C) ≤ 1. Say xi = Yii for all
i ∈ V , where Y ∈ MG . Consider the principal submatrix YC of Y indexed
by {0} ∪ C, which is of the form

xT
 
1 C
YC = ,
xC Diag(xC )

where we set xC = (xi )i∈C . Now, YC  0 implies Diag(xC ) − xC xT C  0


(taking a Schur complement) and in turn: eT (Diag(xC ) − xC xT
C )e ≥ 0. The
2
latter can be rewritten as x(C) − (x(C)) ≥ 0 and thus gives x(C) ≤ 1.
90 Graph coloring and independent sets

In view of Lemma 4.4.4, maximizing the all-ones objective function over


TH(G) gives the theta number:

ϑ(G) = max{eT x : x ∈ TH(G)}.

As maximizing eT x over QST(G) gives the LP bound α∗ (G), Lemma 4.5.1


implies directly that the SDP bound ϑ(G) dominates the LP bound α∗ (G):

Corollary 4.5.2 For any graph G, we have that α(G) ≤ ϑ(G) ≤ α∗ (G).

Combining the inclusion from Lemma 4.5.1 with Theorem 4.2.4, we deduce
that TH(G) = ST(G) = QST(G) for perfect graphs. As we will see in
Theorem 4.5.9 below it turns out that these equalities characterize perfect
graphs.

4.5.2 Orthonormal representations of graphs


We introduce the notion of orthonormal representation of a graph G, which
will be used in the next section to give further geometric descriptions of the
theta body TH(G).

Definition 4.5.3 An orthonormal representation of G, abbreviated as


ONR, consists of a set of unit vectors {u1 , . . . , un } ⊆ Rd (for some d ≥ 1)
satisfying

ui^T uj = 0 for {i, j} ∈ E.

Note that the smallest integer d for which there exists an orthonormal
representation of G is upper bounded by χ(G) (check it). Moreover, if S is
a stable set in G and the ui ’s form an ONR of G in Rd , then the vectors ui
labeling the nodes of S are pairwise orthogonal, which implies that d ≥ α(G).
It turns out that the stronger lower bound d ≥ ϑ(G) holds.

Lemma 4.5.4 The smallest dimension d for which a graph G admits an


orthonormal representation in Rd satisfies: ϑ(G) ≤ d.

Proof Let u1, . . . , un ∈ R^d be an ONR of G. Define the matrices U0 = Id and
Ui = ui ui^T ∈ S^d for i ∈ [n]. Now we define a symmetric matrix Z ∈ S^{n+1} by
setting Zij = ⟨Ui, Uj⟩ for i, j ∈ {0} ∪ [n]. One can verify that Z is feasible
for the program (4.17) defining ϑ(G) (check it) with Z00 = d. This gives
ϑ(G) ≤ d.

4.5.3 Geometric properties of the theta body


There is a beautiful relationship between the theta bodies of a graph G and
of its complementary graph G:

Theorem 4.5.5 For any graph G,

TH(G) = {x ∈ RV+ : xT z ≤ 1 for all z ∈ TH(G)}.

In other words, we know an explicit linear inequality description of TH(G);


moreover, the normal vectors to the supporting hyperplanes of TH(G) are
precisely the elements of TH(G). One inclusion is easy:

Lemma 4.5.6 If x ∈ TH(G) and z ∈ TH(G) then xT z ≤ 1.

Proof Let Y ∈ MG and Z ∈ MG such that x = (Yii) and z = (Zii). Let Z′
be obtained from Z by changing signs in its first row and column (indexed
by 0). Then ⟨Y, Z′⟩ ≥ 0 as Y, Z′ ⪰ 0. Moreover, ⟨Y, Z′⟩ = 1 − x^T z (check it),
thus giving x^T z ≤ 1.

Next we observe how the elements of TH(G) can be expressed in terms of


orthonormal representations of G.

Lemma 4.5.7 For x ∈ RV+ , x ∈ TH(G) if and only if there exist an


orthonormal representation v1 , . . . , vn of G and a unit vector d such that
x = ((dT vi )2 )i∈V .

Proof Let d, vi be unit vectors where the vi ’s form an ONR of G; we show


that x = ((dT vi)²) ∈ TH(G). For this, let Y ∈ S^{n+1} denote the Gram
matrix of the vectors d and (viT d)vi for i ∈ V , so that x = (Yii ). One can
verify that Y ∈ MG , which implies x ∈ TH(G).
For the reverse inclusion, pick Y ∈ MG and a Gram representation w0 , wi
(i ∈ V ) of Y . Set d = w0 and vi = wi /kwi k for i ∈ V . Then the conditions
expressing membership of Y in MG imply that the vi ’s form an ONR of G,
kdk = 1, and Yii = (dT vi )2 for all i ∈ V .

To conclude the proof of Theorem 4.5.5 we use the following result, which
characterizes which partially specified matrices can be completed to a posi-
tive semidefinite matrix – this will be proved in Exercise 4.2.

Proposition 4.5.8 Let H = (W, F ) be a graph and let aij (i = j ∈ W


or {i, j} ∈ F ) be given scalars, corresponding to a vector a ∈ RW ∪F . Define
the convex set

Ka = {Y ∈ S W : Y  0, Yij = aij for i = j ∈ W and {i, j} ∈ F } (4.23)


92 Graph coloring and independent sets

(consisting of all possible positive semidefinite completions of a) and the


cone
CH = {Z ∈ S W : Z  0, Zij = 0 for {i, j} ∈ F } (4.24)
(consisting of all positive semidefinite matrices supported by the graph H).
Then, Ka 6= ∅ if and only if
Σ_{i∈W} aii Zii + 2 Σ_{{i,j}∈F} aij Zij ≥ 0 for all Z ∈ CH.    (4.25)
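
Proposition 4.5.8 can also be explored computationally: deciding whether Ka ≠ ∅ is a semidefinite feasibility problem. As an illustrative aside, here is a small Python sketch (it assumes the CVXPY package with a default SDP solver is available; the graph H and the prescribed values below are made up for illustration only).

    import cvxpy as cp
    import numpy as np

    # Partial matrix on W = {0, 1, 2}: the diagonal and the entries on the
    # edges F = {{0,1}, {1,2}} are prescribed; the entry at position (0, 2) is free.
    W = 3
    prescribed = {(0, 0): 1.0, (1, 1): 1.0, (2, 2): 1.0,   # diagonal entries
                  (0, 1): 0.9, (1, 2): 0.9}                # entries on the edges of F

    Y = cp.Variable((W, W), symmetric=True)
    constraints = [Y >> 0] + [Y[i, j] == v for (i, j), v in prescribed.items()]

    # K_a is nonempty exactly when this SDP is feasible.
    problem = cp.Problem(cp.Minimize(0), constraints)
    problem.solve()
    print(problem.status)             # 'optimal' means a psd completion exists
    if problem.status == 'optimal':
        print(np.round(Y.value, 3))   # one possible completion
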

Proof (of Theorem 4.5.5). Let x ∈ RV+ such that xT z ≤ 1 for all z ∈ TH(G);
we show that x ∈ TH(G). For this we need to find a matrix Y ∈ MG such
that x = (Yii )i∈V . In other words, the entries of Y are specified already at
the following positions: Y00 = 1, Y0i = Yii = xi for i ∈ V , and Y{i,j} = 0
for all {i, j} ∈ E, and we need to show that the remaining entries (at the
positions of non-edges of G) can be chosen in such a way that Y  0.
To show this we apply Proposition 4.5.8, where the graph H is G with
an additional node 0 adjacent to all i ∈ V . Hence it suffices now to show
that ⟨Y, Z⟩ ≥ 0 for all matrices Z ∈ S+^{{0}∪V} with Zij = 0 if {i, j} ∈ E. Pick
such Z, say with Gram representation w0 , w1 , · · · , wn . Then wiT wj = 0 if
{i, j} ∈ E. We can assume without loss of generality that all wi are non-
zero (use continuity if some wi is zero) and up to scaling that w0 is a unit
vector. Then the vectors wi /kwi k (for i ∈ V ) form an ONR of G. By Lemma
4.5.7 (applied to G), the vector z ∈ RV , defined by zi = (w0T wi )2 /kwi k2 for
i ∈ V , belongs to TH(G) and thus we have xT z ≤ 1 by assumption. We can
now verify that hY, Zi is equal to
1 + 2 Σ_{i∈V} xi w0^T wi + Σ_{i∈V} xi ‖wi‖² ≥ Σ_{i∈V} xi ( (w0^T wi)²/‖wi‖² + 2 w0^T wi + ‖wi‖² )
    = Σ_{i∈V} xi ( w0^T wi/‖wi‖ + ‖wi‖ )² ≥ 0,

which concludes the proof.

4.5.4 Geometric characterizations of perfect graphs


We can now prove the following geometric characterization of perfect graphs,
which strengthens the polyhedral characterization of Theorem 4.2.4.
Theorem 4.5.9 For any graph G the following assertions are equivalent.
(1) G is perfect.

(2) TH(G) = ST(G)


(3) TH(G) = QST(G).
(4) TH(G) is a polytope.
We start with the following observations which will be useful for the proof.
Recall that the antiblocker of a set P ⊆ Rn+ is defined as
abl(P ) = {y ∈ Rn+ : y T x ≤ 1 for all x ∈ P }.
We will use the following property, shown in Exercise A.9: If P ⊆ Rn+ is a
down-monotone polytope in Rn+ then P = abl(abl(P )).
Using this notion of antiblocker, we see that Theorem 4.5.5 shows that
TH(G) is the antiblocker of TH(G): TH(G) = abl(TH(G)) and, analo-
gously, TH(G) = abl(TH(G)). Moreover, by its definition, QST(G) is the
antiblocker of ST(G): QST(G) = abl(ST(G)). This implies the equalities
abl(QST(G)) = abl(abl(ST(G))) = ST(G)
and thus
ST(G) = abl(QST(G)). (4.26)
We now show that if TH(G) is a polytope then it coincides with QST(G),
which is the main ingredient in the proof of Theorem 4.5.9. As any polytope
is equal to the solution set of its facet defining inequalities, it suffices to show
that the only inequalities that define facets of TH(G) are the nonnegativity
conditions and the clique inequalities.
Lemma 4.5.10 Let a ∈ Rn and α ∈ R. If the inequality aT x ≤ α defines
a facet of TH(G) then it is a multiple of a nonnegativity condition xi ≥ 0
for some i ∈ V or of a clique inequality x(C) ≤ 1 for some clique C of G.
Proof Let F = {x ∈ TH(G) : aT x = α} be the facet of TH(G) defined by
the inequality aT x ≤ α. Pick a point z in the relative interior of F , thus
z lies on the boundary of TH(G). We use the description of TH(G) from
Theorem 4.5.5. If zi = 0 for some i ∈ V , then the inequality aT x ≤ α is
equivalent to the nonnegativity condition xi ≥ 0. Suppose now that z T y = 1
for some y ∈ TH(G). In view of Lemma 4.5.7, y = ((cT ui )2 )ni=1 for some unit
vectors c, u1 , . . . , un ∈ Rk forming an orthonormal representation of G, i.e.,
satisfying ui^T uj = 0 for {i, j} ∈ E. Then the inequality a^T x ≤ α is equivalent
to Σ_{i=1}^n (c^T ui)² xi ≤ 1, i.e., up to scaling we may assume that α = 1 and
ai = (c^T ui)² for all i ∈ V. We claim that

c = Σ_{i=1}^n xi (c^T ui) ui for all x ∈ F.    (4.27)

Indeed, for any unit vector d ∈ R^k, the vector ((d^T ui)²)_{i=1}^n belongs to TH(G)
and thus Σ_{i=1}^n (d^T ui)² xi ≤ 1 for all x ∈ F. In other words, the maximum
of the quadratic form d^T(Σ_{i=1}^n xi ui ui^T) d taken over all unit vectors d ∈ R^k
is equal to 1 and it is attained at d = c. This shows that c is an eigenvector
of the matrix Σ_{i=1}^n xi ui ui^T for the eigenvalue 1, and thus equality
(Σ_{i=1}^n xi ui ui^T) c = c holds, which gives (4.27).
From (4.27) we deduce that each equation Σ_{i=1}^n xi (ui^T c)(ui)_j = cj is
a scalar multiple of the equation Σ_{i=1}^n xi (ui^T c)² = 1. This implies that
(ui^T c)(ui)_j = cj (ui^T c)² for all i, j ∈ [n]. Hence, ui^T c ≠ 0 implies ui = (ui^T c) c,
thus ui = ±c (since ui and c are both unit vectors) and without loss of
generality ui = c. Set C = {i ∈ V : ui = c}, so that the inequality
Σ_{i=1}^n (c^T ui)² xi ≤ 1 reads Σ_{i∈C} xi ≤ 1. Finally we now observe that C
is a clique in G, since i ≠ j ∈ C implies ui^T uj = c^T c = 1 and thus {i, j} ∈ E.
This concludes the proof.

We can now complete the proof of Theorem 4.5.9.

Proof (of Theorem 4.5.9). By Theorem 4.2.4 we know that G is perfect


if and only if QST(G) = ST(G). Moreover, by Lemma 4.5.1, we have the
inclusion ST(G) ⊆ TH(G) ⊆ QST(G). Hence, in order to show the theorem
it suffices to show that QST(G) = ST(G) if and only if TH(G) is a polytope;
only the ‘if’ part needs a proof.
Assume that TH(G) is a polytope; we show TH(G) = QST(G) = ST(G).
If TH(G) is a polytope then TH(G) = QST(G) in view of Lemma 4.5.10.
Moreover, TH(G) too is a polytope since TH(G) = abl(TH(G)), and thus
we have TH(G) = QST(G), again in view of Lemma 4.5.10 (applied to G).
Finally, taking the antiblocker of both sides and using (4.26), we obtain that
TH(G) = abl(TH(G)) = abl(QST(G)) = ST(G).

4.6 Bounding the Shannon capacity


The theta number was introduced by Lovász [1979] in connection with the
problem of computing the Shannon capacity of a graph, a problem in coding
theory considered by Shannon5 . We need some definitions.6
Definition 4.6.1 (Strong product) Let G = (V, E) and H = (W, F ) be
two graphs. Their strong product is the graph denoted as G · H with node
set V × W and with edges the pairs of distinct nodes {(i, r), (j, s)} ∈ V × W
with (i = j or {i, j} ∈ E) and (r = s or {r, s} ∈ F ).
5 Claude Shannon (1916–2001)
6 Insert a picture of the strong product of a path and K2 ?

Observe that if S ⊆ V is stable in G and T ⊆ W is stable in H then


S × T is stable in G · H. Hence, α(G · H) ≥ α(G)α(H). We let Gk denote
the strong product of k copies of G. Then for any integers k, m ∈ N we have
that
α(Gk+m ) ≥ α(Gk )α(Gm ) and α(Gk ) ≥ (α(G))k .

Consider the parameter


Θ(G) := sup_{k≥1} (α(G^k))^{1/k},    (4.28)

called the Shannon capacity of the graph G. Using Fekete’s lemma7 one can
verify that Θ(G) = lim_{k→∞} (α(G^k))^{1/k}.
The parameter Θ(G) was introduced by Shannon in 1956. The motivation
is as follows. Suppose V is a finite alphabet, where some pairs of letters could
be confused when they are transmitted over some transmission channel.
These pairs of confusable letters can be seen as the edge set E of a graph G =
(V, E). Then the stability number of G is the largest number of one-letter
messages that can be sent over the transmission channel without danger
of confusion. Words of length k correspond to k-tuples in V k . Two words
(i1 , · · · , ik ) and (j1 , · · · , jk ) can be confused if at every position h ∈ [k]
the two letters ih and jh are equal or can be confused, which corresponds
to having an edge in the strong product Gk . Hence the largest number of
words of length k that can be sent without danger of confusion is equal to the
stability number of Gk and the Shannon capacity of the graph G represents
the rate of correct transmission of the channel.
For instance, for the 5-cycle C5, α(C5) = 2, but α((C5)²) ≥ 5. Indeed, if
1, 2, 3, 4, 5 are the nodes of C5 (in this cyclic order), then the five 2-letter
words (1, 1), (2, 3), (3, 5), (4, 2), (5, 4) form a stable set in the strong product
(C5)². This implies that Θ(C5) ≥ √5.
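
This stable set can also be checked mechanically. The following small Python sketch (added purely for illustration) verifies that the five words are pairwise non-adjacent in the strong product, following Definition 4.6.1.

    from itertools import combinations

    n = 5
    def adjacent_C5(i, j):
        # nodes 1..5 of the 5-cycle, in cyclic order
        return i != j and (abs(i - j) % n in (1, n - 1))

    def adjacent_strong(u, v):
        # strong product: distinct nodes whose coordinates are equal or adjacent
        return u != v and all(a == b or adjacent_C5(a, b) for a, b in zip(u, v))

    words = [(1, 1), (2, 3), (3, 5), (4, 2), (5, 4)]
    # True: no two of the five words are adjacent, so they form a stable set
    print(all(not adjacent_strong(u, v) for u, v in combinations(words, 2)))
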
Determining the exact Shannon capacity of a graph is a very difficult
problem in general, even for small graphs. For instance, the exact value
of the Shannon capacity of C5 was not known until Lovász [1979] showed
how to use the theta number in order to upper bound the Shannon capacity.
Lovász [1979] showed that Θ(G) ≤ ϑ(G) and ϑ(C5) = √5, which implies that
Θ(C5) = √5. For instance, although the exact value of the theta number
of C2n+1 is known (cf. Proposition 4.7.6), the exact value of the Shannon
capacity of C2n+1 is not known, already for C7 .
7 Consider a sequence (ak )k of positive real numbers satisfying: ak+m ≥ ak + am for all
integers k, m ∈ N. Fekete’s lemma claims that limk→∞ ak /k = supk∈N ak /k. Then apply
Fekete’s lemma to the sequence ak = log α(Gk ).

Theorem 4.6.2 For any graph G, we have that Θ(G) ≤ ϑ(G).

The proof is based on the multiplicative property of the theta number


from Lemma 4.6.3 – which you will prove in Exercise 4.3 – combined with
the fact that the theta number upper bounds the stability number: For any
integer k, α(G^k) ≤ ϑ(G^k) = (ϑ(G))^k implies (α(G^k))^{1/k} ≤ ϑ(G) and thus
Θ(G) ≤ ϑ(G).

Lemma 4.6.3 The theta number of the strong product of two graphs G
and H satisfies ϑ(G · H) = ϑ(G)ϑ(H).

As an application one can compute the Shannon capacity of the 5-cycle


C5 :

Θ(C5) = √5.

Indeed, Θ(C5) ≥ √(α((C5)²)) ≥ √5 (by the definition of the Shannon capacity)
and Θ(C5) ≤ ϑ(C5) (by Theorem 4.6.2). Finally, one can compute the exact
value of the theta number and show ϑ(C5) = √5 in several ways. For instance
one can exploit the fact that C5 is vertex-transitive to find a closed form
expression for the optimum value of the SDP defining the theta number; we
will do this in the next section (see relation (4.30)). The original proof of
Lovász [1979] will be worked out in Exercises 4.7 and 4.8.

4.7 The theta number for vertex-transitive graphs


The following inequalities relate the stability number and the (fractional)
coloring number of a graph:

|V | ≤ α(G)χ∗ (G) ≤ α(G)χ(G).

(Check it.) First we mention the following analogous inequality relating the
theta numbers of G and its complement G.

Proposition 4.7.1 For any graph G = (V, E), we have that ϑ(G)ϑ(Ḡ) ≥ |V|.

Proof Using the formulation of the theta number from (4.14), we obtain
matrices C, C 0 ∈ S n such that C − J, C 0 − J  0, Cii = ϑ(G), Cii0 = ϑ(G)
for i ∈ V , Cij = 0 for {i, j} ∈ E and Cij0 = 0 for {i, j} ∈ E. Combining the

inequalities hC − J, Ji ≥ 0, hC 0 − J, Ji ≥ 0 and hC − J, C 0 − Ji ≥ 0 with the


identity hC, C 0 i = nϑ(G)ϑ(G), we get the desired inequality.

We now show that equality ϑ(G)ϑ(Ḡ) = |V| holds for certain symmetric



graphs, namely for vertex-transitive graphs. In order to show this, one ex-
ploits in a crucial manner the symmetry of G, which permits to show that
the semidefinite program defining the theta number has an optimal solution
with a special (symmetric) structure. We need to introduce some definitions.

Let G = (V, E) be a graph. A permutation σ of the node set V is called


an automorphism of G if it preserves the edge set of G, i.e., {i, j} ∈ E if
and only if {σ(i), σ(j)} ∈ E. Then the set of automorphisms of G, which is
denoted by Aut(G), is a group. The graph G is said to be vertex-transitive
if for any two nodes i, j ∈ V there exists an automorphism σ ∈ Aut(G)
mapping i to j: σ(i) = j. For instance any complete graph Kn and any cycle
Cn are vertex-transitive (check it).
The group of permutations of V acts on symmetric matrices X indexed by
V . Namely, if σ is a permutation of V and Pσ is the corresponding permu-
tation matrix (with (i, j)th entry Pσ (i, j) = 1 if j = σ(i) and 0 otherwise),
then one can build the new symmetric matrix

σ(X) := Pσ XPσT = (Xσ(i),σ(j) )i,j∈V .

If σ is an automorphism of G, then σ preserves the feasible region of the


semidefinite program (4.11) defining the theta number ϑ(G). This is an easy,
but very useful observation, which follows from the fact that the matrices
entering the program (4.11) behave well under action of Aut(G). Indeed the
all-ones matrix J and the identity matrix I are invariant under action of
Aut(G) and, when applying a permutation σ ∈ Aut(G) to an elementary
matrix Eij indexed by an edge {i, j} ∈ E we still have an elementary matrix
Eσ−1 (i)σ−1 (j) indexed by another edge of G.

Lemma 4.7.2 If X is feasible for the program (4.11) and σ is an auto-


morphism of G, then σ(X) is again feasible for (4.11), moreover with the
same objective value as X.

Proof Directly from the fact that hJ, σ(X)i = hJ, Xi, Tr(σ(X)) = Tr(X)
and σ(X)ij = Xσ(i)σ(j) = 0 if {i, j} ∈ E (since σ is an automorphism of
G).

Lemma 4.7.3 The program (4.11) has an optimal solution X ∗ which


is invariant under action of the automorphism group of G, i.e., satisfies
σ(X ∗ ) = X ∗ for all σ ∈ Aut(G).

Proof Let X be an optimal solution of (4.11). By Lemma 4.7.2, σ(X) is



again an optimal solution for each σ ∈ Aut(G). Define the matrix


X* = (1/|Aut(G)|) Σ_{σ∈Aut(G)} σ(X),

obtained by averaging over all matrices σ(X) for σ ∈ Aut(G). As the set of
optimal solutions of (4.11) is convex, X ∗ is still an optimal solution of (4.11).
Moreover, by construction, X ∗ is invariant under action of Aut(G).
Corollary 4.7.4 If G is a vertex-transitive graph then the program (4.11)
has an optimal solution X* satisfying Xii* = 1/n for all i ∈ V and X*e = (ϑ(G)/n) e.

Proof By Lemma 4.7.3, there is an optimal solution X ∗ which is invariant


under action of Aut(G). As G is vertex-transitive, all diagonal entries of
X ∗ are equal. Indeed, let i, j ∈ V and σ ∈ Aut(G) such that σ(i) = j.
Then, Xjj* = Xσ(i)σ(i)* = Xii*. As Tr(X*) = 1 we must have Xii* = 1/n for
all i. Moreover, Σ_{k∈V} Xjk* = Σ_{k∈V} Xσ(i)k* = Σ_{h∈V} Xσ(i)σ(h)* = Σ_{h∈V} Xih*,
which shows that X*e = λe for some scalar λ. Combining with the condition
⟨J, X*⟩ = ϑ(G) we obtain that λ = ϑ(G)/n.

Proposition 4.7.5 If G is a vertex-transitive graph, then ϑ(G)ϑ(Ḡ) = |V|.


Proof By Corollary 4.7.4, there is an optimal solution X ∗ of the program
(4.11) defining ϑ(G) which satisfies Xii* = 1/n for i ∈ V and X*e = (ϑ(G)/n) e.
Then (n²/ϑ(G)) X* − J ⪰ 0 (check it). Hence, t = n/ϑ(G) and C = (n²/ϑ(G)) X* define a
feasible solution of the program (4.14) defining ϑ(G), which implies ϑ(G) ≤
n/ϑ(G). Combining with Proposition 4.7.1 we get the equality ϑ(G)ϑ(G) =
|V|.
As an application, since the cycle Cn is vertex-transitive, we obtain
ϑ(Cn)ϑ(C̄n) = n.    (4.29)
In particular, as C5 is isomorphic to its complement, we have ϑ(C5) = ϑ(C̄5)
and thus we deduce that

ϑ(C5) = √5.    (4.30)

For n even, Cn is bipartite (and thus perfect), so that ϑ(Cn) = α(Cn) = n/2
and ϑ(Cn ) = ω(Cn ) = 2. For n odd, one can compute ϑ(Cn ) using the above
symmetry reduction.
Proposition 4.7.6 For any odd n ≥ 3,
ϑ(Cn) = n cos(π/n)/(1 + cos(π/n))   and   ϑ(C̄n) = (1 + cos(π/n))/cos(π/n).

Proof As ϑ(Cn)ϑ(C̄n) = n, it suffices to compute ϑ(Cn). We use the for-


mulation (4.15). As Cn is vertex-transitive, there is an optimal solution B
whose entries are all equal to 1, except Bij = 1 + x for some scalar x
whenever |i − j| = 1 (modulo n). In other words, B = J + xACn , where
ACn is the adjacency matrix of the cycle Cn . Thus ϑ(Cn ) is equal to the
minimum value of λmax (B) for all possible x. Since ACn is a symmetric cir-
culant matrix, its eigenvalues are known in closed form: They are the scalars
2iπ
ω k + ω −k = 2 cos(2kπ/n) (for k = 0, 1, · · · , n − 1), where ω = e n is an n-th
root of unity and the eigenvector corresponding to the eigenvalue 2 (case
k = 0) is the all-ones vector. Hence the eigenvalues of B are
n + 2x and x(ω k + ω −k ) for k = 1, · · · , n − 1. (4.31)
So we need to find the value of x that minimizes the maximum of the values
in (4.31) or, in other words, max{n + 2x, 2x cos(2kπ/n) for 1 ≤ k ≤ (n −
1)/2}. This occurs when choosing x such that n + 2x = −2x cos(π/n) (check
it), which gives ϑ(Cn) = −2x cos(π/n) = n cos(π/n)/(1 + cos(π/n)).
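
As a numerical sanity check (not part of the original argument), one can compare the closed formula with the value min_x λmax(J + x A_Cn) computed on a grid of x values; here is a small numpy sketch, illustrative only.

    import numpy as np

    def theta_Cn_formula(n):
        return n * np.cos(np.pi / n) / (1 + np.cos(np.pi / n))

    def theta_Cn_eigenvalue_bound(n):
        # minimize lambda_max(J + x * A_Cn) over a grid of x <= 0
        A = np.zeros((n, n))
        for i in range(n):
            A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
        J = np.ones((n, n))
        xs = np.linspace(-n, 0.0, 4001)
        return min(np.linalg.eigvalsh(J + x * A).max() for x in xs)

    for n in (5, 7, 9):
        # the two printed values agree up to the grid resolution
        print(n, round(theta_Cn_formula(n), 4), round(theta_Cn_eigenvalue_bound(n), 4))
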

As another application, as indicated in Lovász [1979] one can compute


the Shannon capacity of any graph G which is vertex-transitive and self-
complementary (like the 5-cycle C5 ).
Theorem 4.7.7 If G = (V, E) is a vertex-transitive graph, then Θ(G · Ḡ) =
|V|. If, moreover, G is self-complementary, then Θ(G) = √|V|.

Proof We have Θ(G · Ḡ) ≥ α(G · Ḡ) ≥ |V|, since the set of diagonal pairs
{(i, i) : i ∈ V} is stable in G · Ḡ. The reverse inequality follows from Propo-
sition 4.7.5 combined with Lemma 4.6.3: Θ(G · Ḡ) ≤ ϑ(G · Ḡ) = ϑ(G)ϑ(Ḡ) =
|V|. Therefore, Θ(G · Ḡ) = |V|.
If moreover G is isomorphic to Ḡ then ϑ(G) = √|V| and thus we have
Θ(G) ≤ ϑ(G) = √|V|. On the other hand, |V| = Θ(G · Ḡ) = Θ(G²) ≤ (Θ(G))²
(check it), which implies: Θ(G) ≥ √|V| and thus equality: Θ(G) = √|V|.

4.8 Application to Hamming graphs: Delsarte LP bound for codes

A binary code of length n is a subset C of the set V = {0, 1}n of binary
sequences (aka words) of length n. Given two words u, v ∈ V , their Hamming
distance dH (u, v) is the number of positions i ∈ [n] such that ui 6= vi . The
Hamming weight |u| of a word u ∈ V is its number of nonzero coordinates:
|u| = dH (u, 0).

The minimum distance of the code C is the largest integer d ∈ N for which
any two distinct words of C have Hamming distance at least d: dH (u, v) ≥ d
for all u 6= v ∈ C. A fundamental problem in coding theory is to compute the
maximum cardinality A(n, d) of a code of length n with minimum distance
at least d. This is the maximum number of messages of length n that can
correctly be decoded if after transmission at most (d − 1)/2 bits can be
erroneously transmitted in each word of C.
Computing A(n, d) is in fact an instance of the maximum stable set prob-
lem. Indeed, let G(n, d) denote the graph with vertex set V = {0, 1}n and
with an edge {u, v} if dH (u, v) ≤ d − 1. Such graph is called a Hamming
graph since its edges depend only on the Hamming distances between the
vertices. Then, a code C ⊆ V has minimum distance at least d if and only
if C is a stable set in G(n, d) and thus

A(n, d) = α(G(n, d)).

A natural idea for getting an upper bound for A(n, d) is to use the theta
number ϑ(G), or its strengthening ϑ0 (G) obtained by adding nonnegativity
conditions on the entries of the matrix variable:

ϑ0(G) = max{⟨J, X⟩ : Tr(X) = 1, Xuv = 0 for {u, v} ∈ E, X ≥ 0, X ⪰ 0}.    (4.32)

Computing the parameter ϑ(G(n, d)) or ϑ0(G(n, d)) is apparently a difficult
problem. Indeed the graph G(n, d) has 2^n vertices and thus the matrix X in
the above semidefinite program has size 2^n. However, using the fact that the
Hamming graph G(n, d) has a large automorphism group, one can greatly
simplify the above semidefinite program and in fact reformulate it as an
equivalent linear program with only n + 1 variables and linear constraints.
This is thus an enormous gain in complexity, thanks to which one can com-
pute the parameter ϑ0 (G(n, d)) for large values of n. In fact this reformulated
LP coincides with an LP bound on A(n, d) that had already been discovered
earlier by Delsarte [1973].
In a nutshell this symmetry reduction is possible because the symmetric
matrices that are invariant under action of the automorphism group of the
Hamming graph form a commutative algebra, so that they can all be diag-
onalized simultaneously, by the same orthogonal basis. In what follows we
will explain how to use this idea to derive the linear program equivalent to
(4.32).

4.8.1 Automorphisms of the Hamming graph


Any permutation π of [n] induces a permutation of the set V = {0, 1}n by
setting
π(v) = (vπ(1) , . . . , vπ(n) ) ∈ {0, 1}n for v ∈ {0, 1}n .
Moreover, any a ∈ {0, 1}n also induces a permutation sa of V by setting
sa (v) = v ⊕ a for v ∈ {0, 1}n .
Here we use addition modulo 2 in {0, 1}n : u ⊕ v = (ui ⊕ vi )ni=1 , setting
0 ⊕ 0 = 1 ⊕ 1 = 0 and 1 ⊕ 0 = 0 ⊕ 1 = 1. Thus dH (u, v) = |u ⊕ v|.
It is easy to see that these induced permutations π and sa of V are auto-
morphisms of G(n, d). Moreover, the set
Gn = {πsa : π permutation of [n], a ∈ {0, 1}n }
is a group (check it), which is contained in the automorphism group of
G(n, d). Note, for instance, that πsa = sπ(a) π (check it).
The graph G(n, d) is vertex-transitive under action of Gn (since, for any
two vertices u, v ∈ V , the map su⊕v maps u to v). Moreover, given words
u, v, u0 , v 0 in V , there exists σ ∈ Gn such that σ(u) = u0 and σ(v) = v 0 if and
only if dH (u, v) = dH (u0 , v 0 ) (check it).

4.8.2 Invariant matrices under action of Gn


Let Bn denote the set of matrices indexed by V = {0, 1}n that are invariant
under action of Gn . That is, X ∈ Bn if it satisfies: X(u, v) = X(σ(u), σ(v)) for
all u, v ∈ V and σ ∈ Gn or, equivalently, if each entry X(u, v) depends only
on the value of the Hamming distance dH (u, v). For k ∈ {0, 1, . . . , n} let Mk
denote the matrix indexed by V with entries Mk (u, v) = 1 if dH (u, v) = k
and Mk (u, v) = 0 otherwise. Then the matrices M0 , M1 , . . . , Mn form a
basis of the vector space Bn , and Bn has dimension n + 1. Moreover, Bn is
an algebra, which is commutative (this will be clear from Lemma 4.8.1); it
is known as the Bose-Mesner algebra.
It will be convenient to use another basis of Bn in order to describe positive
semidefinite matrices in Bn .
Given a ∈ V = {0, 1}^n, define the vector Ca ∈ {±1}^V by

Ca = ((−1)^{a^T v})_{v∈V} for a ∈ V.    (4.33)

Next define the V × V matrices B0, B1, . . . , Bn by

Bk = Σ_{a∈V: |a|=k} Ca Ca^T for k = 0, 1, . . . , n.    (4.34)

Lemma 4.8.1 (i) The vectors in {Ca : a ∈ V } are pairwise orthogonal.


(ii) The matrices B0, B1, . . . , Bn are pairwise orthogonal: Bh Bk = 2^n δ_{h,k} Bk.
(iii) B0 = J, Tr(Bk) = 2^n \binom{n}{k} for 0 ≤ k ≤ n, and ⟨J, Bk⟩ = 0 for 1 ≤ k ≤ n.


(iv) For any k and u, v ∈ V, we have

Bk(u, v) = Pn^k(dH(u, v)),

where Pn^k(t) is the Krawtchouk polynomial which, at any integer t = 0, 1, . . . , n, is given by

Pn^k(t) = Σ_{i=0}^k (−1)^i \binom{t}{i} \binom{n−t}{k−i}.    (4.35)

(v) The set {B0 , . . . , Bn } is a basis of Bn and Bn is a commutative algebra.

Proof (i) is direct verification (check it) and then (ii),(iii) follow easily.
(iv) Set t = dH(u, v) and Z = {i ∈ [n] : ui ≠ vi} with t = |Z|. Moreover
for each a ∈ V define the set A = {i ∈ [n] : ai = 1}. Then, we find that
Bk(u, v) = Σ_{A⊆[n]: |A|=k} (−1)^{|A∩Z|} = Σ_{i=0}^t (−1)^i \binom{t}{i}\binom{n−t}{k−i} = Pn^k(t).
(v) follows from (ii) and (iv).
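
The identity in (iv) is easy to test numerically. The following Python sketch (illustrative only; the helper name krawtchouk is ours) builds the matrices Bk from (4.34) for n = 4 and checks Bk(u, v) = Pn^k(dH(u, v)) entrywise.

    import numpy as np
    from itertools import product
    from math import comb

    def krawtchouk(n, k, t):
        return sum((-1)**i * comb(t, i) * comb(n - t, k - i) for i in range(k + 1))

    n = 4
    V = list(product([0, 1], repeat=n))
    C = {a: np.array([(-1)**(np.dot(a, v)) for v in V]) for a in V}        # vectors C_a
    B = {k: sum(np.outer(C[a], C[a]) for a in V if sum(a) == k) for k in range(n + 1)}

    # check B_k(u, v) = P_n^k(d_H(u, v)) on all pairs, as in Lemma 4.8.1(iv)
    ok = all(B[k][r, s] == krawtchouk(n, k, sum(x != y for x, y in zip(u, v)))
             for k in range(n + 1)
             for r, u in enumerate(V) for s, v in enumerate(V))
    print(ok)   # True
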

Note that the vectors {Ca : a ∈ V = {0, 1}n } form an orthogonal basis of
RV and that they are the common eigenvectors to all matrices Bk , and thus
to all matrices in the Bose-Mesner algebra Bn .
Lemma 4.8.2 Let X = Σ_{k=0}^n xk Bk ∈ Bn. Then, X ⪰ 0 ⟺ x0, x1, . . . , xn ≥ 0.

Proof The claim follows from the fact that the Bk ’s are positive semidef-
inite and pairwise orthogonal. Indeed, X  0 if all xk ’s are nonnegative.
Conversely, if X  0 then 0 ≤ hX, Bk i = xk hBk , Bk i, implying xk ≥ 0.

4.8.3 Delsarte linear programming bound for A(n, d)


Using the above facts we can now formulate the parameter ϑ0 (G(n, d)) as
the optimum value of a linear program. This linear program (4.36) pro-
vides an upper bound for A(n, d), which was first discovered by Delsarte
[1973]. That this bound coincides with the theta number ϑ0 (G(n, d)) was
discovered independently by McEliece, Rodemich and Rumsey [1978] and
Schrijver [1979].
Theorem 4.8.3 (Delsarte LP bound for A(n, d) = α(G(n, d))) The pa-
rameter ϑ0 (G(n, d)) is equal to the optimum value of the following linear

program:

    max_{x0,...,xn ∈ R}  2^{2n} x0
    s.t.  Σ_{k=0}^n \binom{n}{k} xk = 2^{−n},
          Σ_{k=0}^n xk Pn^k(t) = 0 for t = 1, . . . , d − 1,
          Σ_{k=0}^n xk Pn^k(t) ≥ 0 for t = d, . . . , n,
          xk ≥ 0 for k = 0, 1, . . . , n,                    (4.36)

where Pn^k(t) is the Krawtchouk polynomial in (4.35).
Proof In the formulation (4.32) of ϑ0 (G(n, d)) we may assume without loss
of generality that the variable X is invariant under action of the automor-
phism group of G(n, d). Hence, in particular, we may assume that X is
invariant under action of the group Gn and thus that X belongs to Bn .
In other words, X = Σ_{k=0}^n xk Bk for some scalars x0, . . . , xn ∈ R. It now
suffices to rewrite the constraints on X as constraints on the xk's. Using
Lemma 4.8.1, we find: ⟨J, X⟩ = 2^{2n} x0, 1 = Tr(X) = Σ_{k=0}^n 2^n \binom{n}{k} xk, and
X(u, v) = Σ_{k=0}^n xk Pn^k(t) if dH(u, v) = t. Finally the condition X ⪰ 0 gives
x ≥ 0.
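
For concreteness, here is a small sketch of the linear program (4.36) using scipy.optimize.linprog (assumed available; the function names are ours, for illustration). It returns the Delsarte bound, an upper bound on A(n, d).

    import numpy as np
    from scipy.optimize import linprog
    from math import comb

    def krawtchouk(n, k, t):
        return sum((-1)**i * comb(t, i) * comb(n - t, k - i) for i in range(k + 1))

    def delsarte_bound(n, d):
        c = np.zeros(n + 1); c[0] = -2**(2 * n)        # maximize 2^{2n} x_0
        A_eq = [[comb(n, k) for k in range(n + 1)]]
        b_eq = [2.0**(-n)]
        for t in range(1, d):                          # equalities for t = 1, ..., d-1
            A_eq.append([krawtchouk(n, k, t) for k in range(n + 1)])
            b_eq.append(0.0)
        A_ub = [[-krawtchouk(n, k, t) for k in range(n + 1)] for t in range(d, n + 1)]
        b_ub = [0.0] * len(A_ub)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (n + 1))
        return -res.fun

    print(delsarte_bound(5, 3))    # an upper bound on A(5, 3)
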

4.9 Lasserre hierarchy of semidefinite bounds


As we have seen in this chapter the theta number ϑ(G) offers an interest-
ing upper bound for the stability number α(G), which can be efficiently
computed via semidefinite programming. This raises naturally the question
whether it is possible to strengthen this basic bound. A first easy way of get-
ting a stronger bound toward α(G) is by adding nonnegativity constraints
to the formulation of ϑ(G). In this way we get the parameter ϑ0 (G) in (4.32),
which satisfies α(G) ≤ ϑ0 (G) ≤ ϑ(G).
There is a more systematic way of constructing stronger and stronger
bounds for α(G). The idea is to start from the formulation of ϑ(G) from
(4.18) and to observe that the matrix variable is indexed by all nodes to-
gether with an additional index 0. More generally, we can define a hierarchy
of upper bounds for α(G) that are obtained by optimizing over a matrix vari-
able indexed by all products of at most t (distinct) variables, for increasing
values of t.
The idea is simple and consists of ‘lifting’ the problem into higher dimen-
sion by adding new variables. Given a set S ⊆ V , let x = χS ∈ {0, 1}n
denote its characteristic vector and, for t ∈ [n], define the vector
[x]t = (1, x1, . . . , xn, x1x2, . . . , xn−1xn, . . . , x1x2 · · · xt, . . . , xn−t+1 · · · xn)

consisting of all products of at most t distinct xi's (listed in some order).

Then [x]t is indexed by the set Pt(n), consisting of all subsets I ⊆ [n] with
|I| ≤ t, i.e., [x]t = (Π_{i∈I} xi)_{I∈Pt(n)}. For instance, [x]1 = (1, x1, . . . , xn)
and [x]n contains all 2^n possible products of distinct xi's.
Next we consider the matrix Y = [x]t [x]t^T. By construction, this matrix is
positive semidefinite and satisfies the following linear conditions:

Y(I, J) = Y(I′, J′) if I ∪ J = I′ ∪ J′

(use here the fact that x is binary valued) and Y∅,∅ = 1. This motivates the
following definition.
Definition 4.9.1 Given an integer 0 ≤ t ≤ n and a vector y = (yI ) ∈
RP2t (n) , let Mt (y) denote the symmetric matrix indexed by Pt (n), with (I, J)th
entry yI∪J for I, J ∈ Pt (n). Mt (y) is called the moment matrix of order t of
y.
Example 4.9.2 As an example, for n = 2, the matrices M1 (y) and M2 (y)
have the form
    M1(y) =
            ∅    1    2
      ∅  ( y∅   y1   y2  )
      1  ( y1   y1   y12 )
      2  ( y2   y12  y2  ),

    M2(y) =
            ∅    1    2    12
      ∅  ( y∅   y1   y2   y12 )
      1  ( y1   y1   y12  y12 )
      2  ( y2   y12  y2   y12 )
      12 ( y12  y12  y12  y12 ).
Here to simplify notation we use yi instead of y{i} and y12 instead of y{1,2} .
Note that M1 (y) corresponds to the matrix variable in the formulation (4.18)
of ϑ(G). Moreover, M1 (y) occurs as a principal submatrix of M2 (y).
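
The construction of Mt(y) is easy to carry out by machine. The following small Python sketch (the helper functions are ours, for illustration) builds Mt(y) for a vector y coming from a 0/1 point x and confirms that it is positive semidefinite.

    import numpy as np
    from itertools import combinations

    def subsets(n, t):
        return [frozenset(c) for size in range(t + 1) for c in combinations(range(n), size)]

    def moment_matrix(y, n, t):
        idx = subsets(n, t)
        return np.array([[y[I | J] for J in idx] for I in idx])   # entry (I, J) is y_{I u J}

    # y obtained from a 0/1 point x (here the characteristic vector of S = {0, 2}):
    n, t = 3, 1
    x = np.array([1, 0, 1])
    y = {I: int(np.prod(x[list(I)])) for I in subsets(n, 2 * t)}  # y_I = prod_{i in I} x_i

    M = moment_matrix(y, n, t)
    print(M)
    print(np.all(np.linalg.eigvalsh(M) >= -1e-9))   # True: M_t(y) is psd
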
We can now formulate new upper bounds for the stability number.
Definition 4.9.3 For any integer 1 ≤ t ≤ n, define the parameter

las_t(G) = max_{y∈R^{P2t(n)}} { Σ_{i=1}^n yi : y∅ = 1, yij = 0 for {i, j} ∈ E, Mt(y) ⪰ 0 },    (4.37)
known as the Lasserre bound of order t.
Lemma 4.9.4 For each 1 ≤ t ≤ n, we have that α(G) ≤ last (G). More-
over, last+1 (G) ≤ last (G).
Proof Let x = χS where S is a stable set of G, and let y = [x]t . Then the
moment matrix Mt(y) is feasible for the program (4.37) with value Σ_{i=1}^n yi =
|S|, which shows |S| ≤ las_t(G) and thus α(G) ≤ las_t(G).
The inequality last+1 (G) ≤ last (G) follows from the fact that Mt (y) occurs
as a principal submatrix of Mt+1 (y).

Some further observations:


• For t = 1, the Lasserre bound is simply the theta number: las1 (G) = ϑ(G).
• For t = 2, the Lasserre bound improves ϑ0 (G): las2 (G) ≤ ϑ0 (G). This is
because the condition M2 (y)  0 implies that all entries yij are nonnegative
(as yij occurs as a diagonal entry of M2 (y)).
• The bounds form a hierarchy of stronger and stronger bounds:

α(G) ≤ lasn (G) ≤ . . . ≤ last (G) ≤ . . . ≤ las2 (G) ≤ las1 (G) = ϑ(G).

It turns out that, at order t = α(G), the Lasserre bound is exact: last (G) =
α(G).

Theorem 4.9.5 For any graph G, last (G) = α(G) for any t ≥ α(G).

In the rest of the section we will prove this result.

4.9.1 Characterizing positive semidefinite moment matrices Mn(y)

In a first step we characterize the vectors y = (yI )I⊆[n] whose moment matrix
Mn (y) (of the largest order n) is positive semidefinite.
For this we use the 2^n × 2^n matrix Zn, whose columns are the vectors [x]n
for x ∈ {0, 1}n . Alternatively, Zn is the matrix indexed by the collection
Pn (n) of all subsets of [n], with entries Zn (I, J) = 1 if I ⊆ J and Zn (I, J) = 0
otherwise. Its inverse matrix Zn−1 is defined by Zn−1 (I, J) = (−1)|J\I| if I ⊆ J
and 0 otherwise. (Check it). The matrix Zn is known as the Zeta matrix of
the lattice Pn (n) (consisting of all subsets of the set [n], being ordered by
set inclusion) and its inverse Zn−1 as its Möbius matrix (cf., e.g., Lovász,
Schrijver [1991]).

Example 4.9.6 For n = 2 we have:


    Z2 =
            ∅   1   2   12
      ∅  ( 1   1   1   1 )
      1  ( 0   1   0   1 )
      2  ( 0   0   1   1 )
      12 ( 0   0   0   1 ),

    Z2^{-1} =
            ∅   1   2   12
      ∅  ( 1  −1  −1   1 )
      1  ( 0   1   0  −1 )
      2  ( 0   0   1  −1 )
      12 ( 0   0   0   1 ).

Lemma 4.9.7 Let y ∈ RPn (n) and set λ = Zn−1 y ∈ RPn (n) . Then,

Mn (y) = Zn Diag(λ)ZnT .

Proof Pick I, J ⊆ [n]. We show that the (I, J)th entry of Zn Diag(λ)ZnT is

equal to yI∪J . This is direct verification:


(Zn Diag(λ) Zn^T)_{I,J} = Σ_{K: I⊆K} (Diag(λ) Zn^T)_{K,J} = Σ_{K: I∪J⊆K} (Zn^{-1} y)_K
    = Σ_{K: I∪J⊆K} Σ_{H: K⊆H} (−1)^{|H\K|} yH = Σ_{H: I∪J⊆H} yH Σ_{K: I∪J⊆K⊆H} (−1)^{|H\K|},

which is equal to y_{I∪J}, since the inner summation Σ_{K: I∪J⊆K⊆H} (−1)^{|H\K|}
is equal to zero whenever H ≠ I ∪ J.

Corollary 4.9.8 Let y ∈ RPn (n) . The following assertions are equivalent.

(i) Mn (y)  0.
(ii) Zn−1 y ≥ 0.
(iii) The vector y is a conic combination of the vectors [x]n for x ∈ {0, 1}^n;
that is, we have y = Σ_{x∈{0,1}^n} λx [x]n for some nonnegative scalars λx.

Example 4.9.9 Let n = 2 and consider a vector y = (y∅ , y1 , y2 , y12 ). Then,


y can be written as the following linear combination of the vectors [x]2 for
x ∈ {0, 1}2 :

y = (y∅ − y1 − y2 + y12 )[0]2 + (y1 − y12 )[e1 ]2 + (y2 − y12 )[e2 ]2 + y12 [e1 + e2 ]2

(setting e1 = (1, 0) and e2 = (0, 1)). Therefore, we see that this is indeed a
conic combination if and only if any of the following equivalent conditions
holds:
    M2(y) =
            ∅    1    2    12
      ∅  ( y∅   y1   y2   y12 )
      1  ( y1   y1   y12  y12 )   ⪰ 0
      2  ( y2   y12  y2   y12 )
      12 ( y12  y12  y12  y12 )

⟺   y∅ − y1 − y2 + y12 ≥ 0,   y1 − y12 ≥ 0,   y2 − y12 ≥ 0,   y12 ≥ 0.
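
The zeta and Möbius matrices, and the factorization of Lemma 4.9.7, can be checked directly for small n. A short numpy sketch (illustrative only):

    import numpy as np
    from itertools import combinations

    n = 3
    P = [frozenset(c) for size in range(n + 1) for c in combinations(range(n), size)]
    N = len(P)                                      # 2^n subsets of [n]

    Z = np.array([[1.0 if I <= J else 0.0 for J in P] for I in P])                    # zeta matrix
    Zinv = np.array([[(-1.0)**len(J - I) if I <= J else 0.0 for J in P] for I in P])  # Moebius matrix

    print(np.allclose(Z @ Zinv, np.eye(N)))         # True

    # Lemma 4.9.7: M_n(y) = Z diag(lambda) Z^T with lambda = Z^{-1} y, for any y
    y = np.random.default_rng(0).random(N)
    M = np.array([[y[P.index(I | J)] for J in P] for I in P])                         # M_n(y)
    print(np.allclose(M, Z @ np.diag(Zinv @ y) @ Z.T))   # True
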

4.9.2 Canonical lifted representation for 0 − 1 polytopes


We sketch here the significance of the above results about moment matrices
for discrete optimization. A fundamental question is whether one can op-
timize efficiently a linear objective function over a given set X ⊆ {0, 1}n .
Think for instance of the traveling salesman problem, in which case X is
the set of the incidence vectors of all Hamiltonian cycles in a graph, or of
the maximum stable set problem considered here, in which case X is the

set of incidence vectors of the stable sets in a graph. The classical so-called
polyhedral approach is to consider the polytope
P = conv(X ) ⊆ Rn ,
defined as the convex hull of all vectors in X . Then the question boils down
to finding the linear inequality description of P (or at least a part of it). It
turns out that this question gets a simpler answer if we ‘lift’ the problem
into higher dimension and allow the use of additional variables.
Define the polytope
P = conv([x]n : x ∈ X ) ⊆ RPn (n) .
Let π denote the projection from the space RPn (n) onto the space Rn where,
for a vector y = (yI )I⊆[n] , π(y) = (y1 , . . . , yn ) denotes its projection onto the
coordinates indexed by the n singleton subsets of [n]. Then, by construction,
we have that
P = π(P).
As we now indicate the results from the previous subsection show that the
lifted polytope P admits a very simple explicit description.
Indeed, for a vector y ∈ RPn (n) , the trivial identity y = Zn (Zn−1 y) shows
that y ∈ P if and only if it satisfies the following conditions:
Zn^{-1} y ≥ 0,   (Zn^{-1} y)_x = 0 ∀x ∉ X,   e^T Zn^{-1} y = 1.    (4.38)
The first condition says that y is a conic combination of vectors [x]n (for x ∈
{0, 1}n ), the second condition says that only vectors [x]n for x ∈ X are used
in this conic combination, the last condition says that the conic combination
is in fact a convex combination and it can be equivalently written as y∅ = 1.
Hence (4.38) gives an explicit linear inequality description for the polytope
P. Moreover, using Corollary 4.9.8, we can replace the condition Zn−1 y ≥ 0
by the condition Mn (y)  0. In this way we get a description of P involving
positive semidefiniteness of the full moment matrix Mn (y).
So what this says is that any polytope P with 0 − 1 vertices can be ob-
tained as projection of a polytope P admitting a simple explicit description.
The price to pay however is that the lifted polytope P “lives” in a 2^n-dimensional
space, thus exponentially large with respect to the dimension n
of the ambient space of the polytope P. Nevertheless this perspective leads
naturally to hierarchies of semidefinite relaxations for P , obtained by con-
sidering only truncated parts Mt (y) of the full matrix Mn (y) for growing
orders t. We explain below how this idea applies to the stable set polytope.
We refer to Lasserre [2001] and Laurent [2003] for a detailed treatment, also

about the links to other lift-and-project techniques used in combinatorial


optimization.
This idea of ‘lifting’ a problem by adding new variables is widely used in
optimization and it can sometimes lead to a huge efficiency gain. As a simple
illustrating example consider the ℓ1-ball B = {x ∈ R^n : |x1| + · · · + |xn| ≤ 1}.
The explicit linear inequality description of B requires the following 2^n
inequalities: Σ_{i=1}^n ai xi ≤ 1 for all a ∈ {±1}^n. On the other hand, if we allow
n additional variables we can describe B using only 2n + 1 linear inequalities.
Namely, define the polytope

Q = {(x, y) ∈ R^n × R^n : xi ≤ yi, −xi ≤ yi for i ∈ [n], y1 + · · · + yn ≤ 1}.

Then B coincides with the projection of Q onto the x-subspace.

4.9.3 Convergence of the Lasserre hierarchy to α(G)


We can now conclude the proof of Theorem 4.9.5, showing that the Lasserre
relaxation solves the maximum stable set problem at any order t ≥ α(G).
The proof follows through the following three lemmas. We let SG denote the
set of all characteristic vectors of the stable sets of G (with the notation of
the preceding section we consider here the set X = SG ).
Lemma 4.9.10 Assume Mt (y)  0 and yij = 0 for all edges {i, j} ∈ E.
Then yI = 0 if I ⊆ [n] contains an edge and |I| ≤ 2t.

Proof We leave this as an exercise.

Lemma 4.9.11 Assume y0 = 1, Mn (y)  0 and yij = 0 for all edges


{i, j} ∈ E. Then y is a convex combination of vectors [x]n for x ∈ SG .
Proof By Corollary 4.9.8, y = Σ_{x∈{0,1}^n} λx [x]n, with all λx ≥ 0. It suffices
now to observe that λx > 0 implies x ∈ SG, which follows directly from the
fact that 0 = yij = Σ_x λx xi xj for all edges {i, j} ∈ E.

Lemma 4.9.12 Let t ≥ α(G). Assume y ∈ RP2t (n) satisfies Mt (y)  0,


yij = 0 for all edges {i, j} ∈ E and y∅ = 1. Then, Σ_{i=1}^n yi ≤ α(G) and thus
las_t(G) ≤ α(G).

Proof We extend the vector y to a vector ỹ ∈ R^{Pn(n)} by setting ỹI = yI if
|I| ≤ 2t and ỹI = 0 if |I| > 2t. We claim that the matrix Mn(ỹ) has the
block-form:

    Mn(ỹ) = ( Mt(y)   0 )
            (   0     0 ).

Indeed, by construction, ỹI∪J = yI∪J if both I and J have cardinality at


most t. Otherwise, say |I| > t. If |I ∪ J| ≤ 2t, then ỹI∪J = yI∪J = 0 by
Lemma 4.9.10 (since I ∪ J is not stable as |I| > t ≥ α(G)). If |I ∪ J| > 2t,
then ỹI∪J = 0 by construction.
Hence Mt (y)  0 implies Mn (ỹ)  0. Using Lemma 4.9.11, we can con-
clude that ỹ is a convex combination of vectors [x]n for x ∈ SG . By pro-
jecting onto the positions indexed by 1, 2, . . . , n, we get that the vector
(y1 , . . . , yn ) = (ỹ1 , . . . , ỹn ) is a convex combination of characteristic vectors
of stable sets of G, and thus this implies ni=1 yi ≤ α(G).
P

4.10 Notes and further reading


In his seminal paper Lovász [1979], Lovász introduced the theta number and
gave several equivalent characterizations for it. His main motivation was its
application for bounding the Shannon capacity, which enabled him to com-
pute the Shannon capacity of the 5-cycle C5. It is worth noting that Lovász’
paper was published in 1979, thus before the discovery of polynomial time
algorithms for semidefinite programming and in fact before the terminology
‘semidefinite programming’ was even coined!
In 1981, Grötschel, Lovász and Schrijver [1981] derived polynomial time
algorithms for finding maximum stable sets and minimum graph colorings in
perfect graphs, that are based on iterated computations of the theta number.
They showed that computing the theta number can be done using the ellip-
soid method for solving semidefinite programs. As of today, the semidefinite
programming approach is still the only known method for designing poly-
nomial time algorithms for maximum stable sets and minimum coloring in
perfect graphs – in particular, no purely combinatorial algorithm is known!
Whether such an efficient algorithm can be designed using the structural
decomposition theory for perfect graphs discovered by M. Chudnovsky, N.
Robertson, P. Seymour, R. Thomas [2006] is still open.
Detailed information about the theta number can also be found in the
survey of Knuth [1994] and a detailed treatment about the material about
the theta number, algorithms for perfect graphs and the theta body TH(G)
can be found in the monograph by Grötschel, Lovász and Schrijver [1988].
Strengthenings of the theta number have been investigated both toward
the stable set number and toward the coloring number (as described, e.g., in
[2008]). General lift-and-project methods have been developed, that permit
to design hierarchies of relaxations for combinatorial polytopes (like, for
instance, stable set polytopes) that construct tighter and tighter relaxations

able to find the original combinatorial polytope after finitely many steps
(typically, n steps when the combinatorial polytope is the convex hull of a
subset of {0, 1}n ). These relaxations can be LP-based or SDP-based. This
includes the hierarchies designed by Lovász and Schrijver [1991], by Sherali
and Adams [1990] and by Lasserre [2001]. It turns out that the construction
of Lasserre [2001], which we have briefly described in this chapter, yields
the tightest bounds. We will return to it within the more general setting of
polynomial optimization in Part IV of this book. We refer to Laurent [2003]
for a comparative study of these various constructions and to the monograph
by Tunçel [2010] for a detailed exposition.
As explained in this chapter the computation of the theta number for the
Hamming graph G(n, d) boils down to the LP bound of Delsarte [1973] for
the maximum cardinality A(n, d) of a binary code of length n with mini-
mum distance at least d. Strengthenings of the Delsarte bound have been
investigated, that can be seen as (variations of) the next levels in Lasserre
hierarchy. Once again what is crucial for being able to compute numerically
these bounds is exploiting symmetry. However, after exploiting symmetry
(applying block-diagonalization), the resulting programs remain SDP’s, al-
beit with many smaller blocks (instead of a single large matrix) and much
less variables. These ideas are developed, in particular, in the papers Schri-
jver [2005] and Litjens, Polak, Schrijver [2017] (where the bound of order
t = 2 is computed). Still using symmetry reduction, it is shown in Laurent
[2007a] that computing the level t bound last (G(n, d)) boils down to solving
an SDP whose size is polynomial in n for any fixed t.
The theta number plays an important role in many different areas. For
instance it will pop up again in Chapter 6 when studying analogues of the
celebrated Grothendieck constant for graphs. The theta number has also
been extended to infinite graphs arising to model geometric problems, like
for instance packing problems on the sphere, to which we will return in
Chapter 10.8

Exercises
4.1 Consider two graphs G1 = (V1, E1) and G2 = (V2, E2) with disjoint
vertex sets. Their disjoint union is the graph (V1 ∪ V2 , E1 ∪ E2 ).
(a) Show that the disjoint union of two perfect graphs is a perfect graph.
(b) Construct a class of perfect graphs having exponentially many max-
imal stable sets (in terms of the number of nodes).
8 Rephrase better and insert relevant references?

(c) Construct a class of perfect graphs having exponentially many max-


imal stable sets and exponentially many maximal cliques.

4.2 Show the result of Proposition 4.5.8.


4.3 The goal is to show the result of Lemma 4.6.3 about the theta number
of the strong product of two graphs G = (V, E) and H = (W, F ):
ϑ(G · H) = ϑ(G)ϑ(H).

(a) Show that ϑ(G · H) ≥ ϑ(G)ϑ(H).


(b) Show that ϑ(G · H) ≤ ϑ(G)ϑ(H).
Hint: Use the primal formulation (4.11) for (a), and the dual formula-
tion (4.12) for (b), and think of using Kronecker products of matrices
in order to build feasible solutions.
4.4 Let G = (V, E) be the disjoint union of two graphs G1 = (V1 , E1 ) and
G2 = (V2 , E2 ). That is, V1 , V2 are disjoint, V = V1 ∪V2 and E = E1 ∪E2 .
(a) Show that ϑ(G) ≤ ϑ(G1 ) + ϑ(G2 ).
(b) Show that ϑ(G) ≥ ϑ(G1 ) + ϑ(G2 ).
Hint: Use the formulation (4.14) of the theta number for (a) and the
formulation (4.18) for (b).
4.5 Given two graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ) with V1 ∩ V2 = ∅,
consider the graph G = (V, E) obtained by adding all edges between
V1 and V2 to the disjoint union of G1 and G2 . That is, V = V1 ∪ V2 and
E = E1 ∪ E2 ∪ {{i, j} : i ∈ V1 , j ∈ V2 }. Show that
ϑ(G) = max{ϑ(G1 ), ϑ(G2 )}.

4.6 Given a graph G = (V = [n], E), a symmetric matrix B ∈ S n is said


to fit G if it has non-zero diagonal entries and zero entries at the off-
diagonal positions corresponding to non-edges of G, i.e.,
Bii 6= 0 for i ∈ [n] and Bij = 0 for {i, j} ∈ E.
Consider the parameter R(G), defined as the smallest possible rank of
a matrix B which fits G, i.e.,
R(G) = min rank(B) such that Bii 6= 0 (i ∈ V ), Bij = 0 ({i, j} ∈ E).

(a) Show that R(G) ≤ χ(G).


(b) Show that R(G) ≥ α(G).
(c) Show that R(G) ≥ Θ(G).

(This upper bound on the Shannon capacity is due to W. Haemers.)

4.7 Let G = (V = [n], E) be a graph. Consider the graph parameter


ϑ1(G) = min_{c, ui} max_{i∈V} 1/(c^T ui)²,

where the minimum is taken over all unit vectors c and all orthonor-
mal representations u1 , · · · , un of G (i.e., u1 , . . . , un are unit vectors
satisfying uTi uj = 0 for all pairs {i, j} ∈ E).
Show: ϑ(G) = ϑ1 (G).
Hint for the inequality ϑ(G) ≤ ϑ1 (G): Use the dual formulation of
ϑ(G) from Lemma 4.4.1 and the matrix M = (vi^T vj)_{i,j=1}^n, where vi = c − ui/(c^T ui) for i ∈ [n].
Hint for the inequality ϑ1 (G) ≤ ϑ(G): Use an optimal solution X =
tI − B of the dual formulation for ϑ(G), written as the Gram matrix
of vectors x1 , . . . , xn . Show that there exists a nonzero vector c which
is orthogonal to x1 , . . . , xn , and consider the vectors ui = c+x √ i.
t

4.8 Show that ϑ(C5) ≤ √5, using the formulation of Exercise 4.7 for the
theta number.
Hint: Consider the following vectors c, u1 , . . . , u5 ∈ R3 : c = (0, 0, 1),
uk = (s cos(2kπ/5), s sin(2kπ/5), t) for k = 1, 2, 3, 4, 5, where the scalars
s, t ∈ R are chosen in such a way that u1, . . . , u5 form an orthonormal
representation of C5. Recall cos(2π/5) = (√5 − 1)/4.
This is the original proof of Lovász [1979], known as the umbrella con-
struction.

4.9 Assume G = (V, E) is a regular graph and let λ1 ≥ . . . ≥ λn denote


the eigenvalues of its adjacency matrix AG .
Recall that G is said to be edge-transitive if, for any two edges
{i, j}, {i0 , j 0 } of G, there exists an automorphism σ of G such that
{σ(i), σ(j)} = {i0 , j 0 }.
(a) Show: min{λmax(J + xAG) : x ∈ R} = −nλn/(λ1 − λn).
(b) Show: ϑ(G) ≤ min{λmax(J + xAG) : x ∈ R}, with equality if G is
edge-transitive.
(c) Show: ϑ(G) ≤ −nλn/(λ1 − λn), with equality if G is edge-transitive.
Hint: Use the formulation for ϑ(G) from relation (4.15).

4.10 Show the result of Lemma 4.9.10.



4.11 Let G = Cn be the complementary graph of the cycle (1, 2, . . . , n) of


length n ≥ 4. Consider a vector y = (yI )I⊆V,|I|≤4 which satisfies the
conditions of the Lasserre relaxation of order 2. That is, M2 (y)  0,
yij = 0 for all edges {i, j} ∈ E(G), and y∅ = 1.
Show: Σ_{i∈V(G)} yi ≤ 2.
5
Approximating the MAX CUT problem

5.1 Introduction
5.1.1 The MAX CUT problem
The maximum cut problem (MAX CUT) is the following problem in com-
binatorial optimization. Let G = (V, E) be a graph and let w = (wij ) ∈ RE +
be nonnegative weights assigned to the edges. Given a subset S ⊆ V , the
cut δG (S) consists of the edges {i, j} ∈ E having exactly one endnode in S,
i.e., with |{i, j} ∩ S| = 1. In other words, δG (S) consists of the edges that
are cut by the partition (S, S̄ = V \ S) of V . The cut δG (S) is called trivial
if S = ∅ or V (in which case it is empty). Then the weight of the cut δG (S)
is w(δG(S)) = Σ_{{i,j}∈δG(S)} wij and the MAX CUT problem asks for a cut
of maximum weight, i.e., compute

mc(G, w) = max_{S⊆V} w(δG(S)).

It is sometimes convenient to extend the weight function w ∈ RE to all


pairs of nodes of V , by setting wij = 0 if {i, j} is not an edge of G. Given
disjoint subsets S, T ⊆ V , it is also convenient to use the following notation:
w(S, T) = Σ_{i∈S, j∈T} wij.

Thus,
w(S, S̄) = w(δG(S)) for all S ⊆ V.
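
For very small graphs the definition can be evaluated directly by enumerating all subsets S ⊆ V. The following brute-force Python sketch (exponential time, for illustration only; the weight function is stored as a dictionary keyed by edges) computes mc(G, w).

    from itertools import combinations

    def mc_bruteforce(n, w):
        """Maximum cut of the graph on nodes 0..n-1 with weights w[{i,j}] (0 if absent)."""
        best = 0.0
        nodes = range(n)
        for size in range(n + 1):
            for S in combinations(nodes, size):
                S = set(S)
                value = sum(w.get(frozenset({i, j}), 0.0)
                            for i in S for j in nodes if j not in S)
                best = max(best, value)
        return best

    # 5-cycle with unit weights: the maximum cut has 4 edges
    w = {frozenset({i, (i + 1) % 5}): 1.0 for i in range(5)}
    print(mc_bruteforce(5, w))   # 4.0
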
To state its complexity, we formulate MAX CUT as a decision problem:
MAX CUT: Given a graph G = (V, E), edge weights w ∈ ZE + and an
integer k ∈ N, decide whether there exists a cut of weight at least k.
It is well known that MAX CUT is an NP-complete problem. In fact, MAX
CUT is one of Karp’s 21 NP-complete problems. So unless the complexity

classes P and NP coincide there is no efficient polynomial-time algorithm


which solves MAX CUT exactly. We give here a reduction of MAX CUT
from the PARTITION problem, defined below, which is one the first six
basic NP-complete problems in Garey and Johnson [1979]:
PARTITION: Given natural numbers a1 , . . . , an ∈ N, decide whether
there exists a subset S ⊆ [n] such that Σ_{i∈S} ai = Σ_{i∉S} ai.
Theorem 5.1.1 The MAX CUT problem is NP-complete.
Proof It is clear that MAX CUT belongs to the class NP. We now show
a reduction from PARTITION. Let a1 , . . . , an ∈ N be given. Construct the
following weights wij = ai aj for the edges of the complete graph Kn . Set
σ = Σ_{i=1}^n ai and k = σ²/4. For any subset S ⊆ [n], set a(S) = Σ_{i∈S} ai.
Then, we have

w(S, S̄) = Σ_{i∈S, j∈S̄} wij = Σ_{i∈S, j∈S̄} ai aj = (Σ_{i∈S} ai)(Σ_{j∈S̄} aj) = a(S)(σ − a(S)) ≤ σ²/4,

with equality if and only if a(S) = σ/2 or, equivalently, a(S) = a(S̄). From
this it follows that there is a cut of weight at least k if and only if the
sequence a1 , . . . , an can be partitioned. This concludes the proof.
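
As a small concrete illustration of the reduction: for a = (1, 2, 3, 4) one gets σ = 10 and k = 25, and the cut of K4 defined by S = {1, 4} (so that a(S) = 5 = a(S̄)) has weight a(S)(σ − a(S)) = 25 = k, matching the partition {1, 4}, {2, 3}.
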

This hardness result for MAX CUT is in sharp contrast to the situation of
the MIN CUT problem, which asks for a nontrivial cut of minimum weight,
i.e., to compute
min_{S⊆V: S≠∅,V} w(S, S̄).

(For MIN CUT the weights of edges are usually called capacities and they
are also assumed to be nonnegative). It is well known that the MIN CUT
problem can be solved in polynomial time (together with its dual MAX
FLOW problem), using the Ford-Fulkerson algorithm. Specifically, the Ford-
Fulkerson algorithm permits to find in polynomial time a minimum cut (S, S)
separating a given source s and a given sink t, i.e., with s ∈ S and t ∈ S.
Thus a minimum weight nontrivial cut can be obtained by applying this
algorithm |V | times, fixing any s ∈ V and letting t vary over all nodes of
V \ {s}. Details can be found in essentially every textbook on combinatorial
optimization.
Even stronger, Håstad in 2001 showed that it is NP-hard to approximate
MAX CUT within a factor of 16/17 ≈ 0.941.
On the positive side, one can compute a 0.878-approximation of MAX
CUT in polynomial time, using semidefinite programming. This algorithm,

Figure 5.1 Minimum and maximum weight cuts

due to Goemans and Williamson [1995], is one of the most influential ap-
proximation algorithms which are based on semidefinite programming. We
will explain this result in detail in Section 5.2.1.
Before doing that we recall some results for MAX CUT based on using
linear programming.

5.1.2 Linear programming relaxation


In order to define linear programming bounds for MAX CUT, one needs
to find some linear inequalities that are satisfied by all cuts of G, i.e., some
valid inequalities for the cut polytope of G. Large classes of such inequalities
are known (cf. e.g. [1997] for an overview and references).
We now present some simple but important valid inequalities for the cut
polytope of the complete graph Kn , which is denoted as CUTn , and defined
as the convex hull of the incidence vectors of the cuts of Kn :
CUTn = conv{χδKn (S) : S ⊆ [n]}.
For instance, for n = 2, CUTn = [0, 1] and, for n = 3, CUT3 is a simplex
in R3 (indexed by the edges of K3 ordered as {1, 2}, {1, 3}, {2, 3}) with as
vertices the incidence vectors of the four cuts (S, S) of K3 : (0, 0, 0), (1, 1, 0),
(1, 0, 1), and (0, 1, 1) (for S = ∅, {1}, {2} and {3}, respectively).
As a first easy observation it is important to realize that in order to
compute the maximum cut mc(G, w) in a weighted graph G on n nodes,
one can as well deal with the complete graph Kn . Indeed, any cut δG (S)
of G can be obtained from the corresponding cut δKn (S) of Kn , simply by
ignoring the pairs that are not edges of G, in other words, by projecting onto
the edge set of G. Hence one can reformulate any maximum cut problem as
a linear optimization problem over the cut polytope of Kn :
mc(G, w) = max_{x∈CUTn} Σ_{{i,j}∈E} wij xij;

the graph G is taken into account by the objective function of this LP.
The following triangle inequalities are valid for the cut polytope CUTn :
xij − xik − xjk ≤ 0,   xij + xik + xjk ≤ 2,    (5.1)
for all distinct i, j, k ∈ [n]. This is easy to see, just verify that these in-
equalities hold when x is equal to the incidence vector of a cut. The triangle
inequalities (5.1) imply the following bounds (check it):
0 ≤ xij ≤ 1 (5.2)
on the variables. Let METn denote the polytope in RE(Kn ) defined by the tri-
angle inequalities (5.1). Thus, METn is a linear relaxation of CUTn , tighter
than the trivial relaxation by the unit hypercube:
CUTn ⊆ METn ⊆ [0, 1]E(Kn ) .
It is known that equality CUTn = METn holds for n ≤ 4, but the inclusion
CUTn ⊂ METn is strict for n ≥ 5. Indeed, the inequality:
Σ_{1≤i<j≤5} xij ≤ 6    (5.3)

is valid for CUT5 (as any cut of K5 has cardinality 0, 4 or 6), but it is not
valid for MET5 . For instance, the vector (2/3, . . . , 2/3) ∈ R10 belongs to
MET5 but it violates the inequality (5.3) (since 10 · 2/3 > 6).
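
The value max{Σ xij : x ∈ MET5} is easy to compute with any LP solver. A small sketch using scipy.optimize.linprog (assumed available; illustrative only) confirms that the all-ones objective reaches 20/3 over MET5, strictly more than mc(K5, 1) = 6.

    import numpy as np
    from scipy.optimize import linprog
    from itertools import combinations

    n = 5
    edges = list(combinations(range(n), 2))          # the 10 edge variables x_ij
    col = {e: k for k, e in enumerate(edges)}

    A_ub, b_ub = [], []
    for i, j, k in combinations(range(n), 3):
        e = [col[(i, j)], col[(i, k)], col[(j, k)]]
        # perimeter inequality x_ij + x_ik + x_jk <= 2
        row = np.zeros(len(edges)); row[e] = 1; A_ub.append(row); b_ub.append(2)
        # the three inequalities of the form x_e - x_f - x_g <= 0
        for p in range(3):
            coeffs = -np.ones(3); coeffs[p] = 1
            full = np.zeros(len(edges)); full[e] = coeffs
            A_ub.append(full); b_ub.append(0)

    c = -np.ones(len(edges))                         # maximize the sum of the x_ij
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * len(edges))
    print(-res.fun)    # 20/3 = 6.666..., while mc(K5, all-ones) = 6
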
We can define the following linear programming bound:

lp(G, w) = max{ Σ_{{i,j}∈E(G)} wij xij : x ∈ METn }    (5.4)

for the maximum cut:


mc(G, w) ≤ lp(G, w).
The graphs for which this bound is tight have been characterized by Bara-
hona [1983]:
Theorem 5.1.2 Let G be a graph. Then, mc(G, w) = lp(G, w) for all
weight functions w ∈ RE if and only if the graph G has no K5 minor.
In particular, if G is a planar graph, then mc(G, w) = lp(G, w) so that the
maximum cut can be computed in polynomial time using linear program-
ming.
A natural question is how good the LP bound is for general graphs. Here
are some easy bounds.

Lemma 5.1.3 Let G be a graph with nonnegative weights w. The following


holds.
(i) mc(G, w) ≤ lp(G, w) ≤ w(E).
(ii) mc(G, w) ≥ w(E)/2.
Proof (i) follows from the fact that METn ⊆ [0, 1]^{E(Kn)} and w ≥ 0. For
(ii) pick S ⊆ V for which (S, S̄) is a cut of maximum weight: w(S, S̄) =
mc(G, w). Thus if we move one node i ∈ S to S̄, or if we move one node
j ∈ S̄ to S, then we obtain another cut whose weight is at most w(S, S̄).
This gives:

w(S \ {i}, S̄ ∪ {i}) − w(S, S̄) = w(S \ {i}, {i}) − w({i}, S̄) ≤ 0,
w(S ∪ {j}, S̄ \ {j}) − w(S, S̄) = w({j}, S̄ \ {j}) − w(S, {j}) ≤ 0.

Summing the first relation over i ∈ S and using the fact that 2w(E(S)) =
Σ_{i∈S} w(S \ {i}, {i}), where E(S) is the set of edges contained in S, and the
fact that Σ_{i∈S} w({i}, S̄) = w(S, S̄), we obtain:

2w(E(S)) ≤ w(S, S̄).

Analogously, summing over j ∈ S̄, we obtain:

2w(E(S̄)) ≤ w(S, S̄).

Summing these two relations yields: w(E(S)) + w(E(S̄)) ≤ w(S, S̄). Now
adding w(S, S̄) to both sides implies: w(E) ≤ 2w(S, S̄) = 2 mc(G, w), which
shows (ii).
As an application of Lemma 5.1.3, we obtain that
1/2 ≤ mc(G, w)/lp(G, w) ≤ 1 for all nonnegative weights w ≥ 0.
It turns out that there are graphs for which the ratio mc(G, w)/lp(G, w)
can be arbitrarily close to 1/2, see Poljak, Tuza [1994]. This means that for
these graphs, the metric polytope does not provide a better approximation
of the cut polytope than its trivial relaxation by the hypercube [0, 1]E .
We now provide another argument for the lower bound mc(G, w) ≥ w(E)/2.
This argument is probabilistic and based on the following simple randomized
algorithm: Construct a random partition (S, S) of V by assigning, indepen-
dently, with probability 1/2, each node i ∈ V to either side of the partition.
Then the probability that an edge {i, j} is cut by the partition is equal to
1 1 1 1 1
P({i, j} is cut) = P(i ∈ S, j ∈ S) + P(i ∈ S, j ∈ S) = · + · = .
2 2 2 2 2
5.2 The algorithm of Goemans and Williamson 119

Hence, the expected weight of the cut produced by this random partition is
equal to
X X 1 w(E)
E(w(S, S)) = wij P({i, j} is cut) = wij = .
2 2
{i,j}∈E {i,j}∈E

Here we have used the linearity of the expectation.


In the next section, we will see another probabilistic argument, due to
Goemans and Williamson, which permits to construct a much better random
cut. Namely we will get a random cut whose expected weight satisfies:
E(w(S, S)) ≥ 0.878 · w(E),
thus improving the above factor 0.5. The crucial tool will be to use a semidef-
inite relaxation for MAX CUT combined with a simple, but ingenious ran-
domized “hyperplane rounding” technique.

5.2 The algorithm of Goemans and Williamson


5.2.1 Semidefinite programming relaxation
We now want to describe the Goemans-Williamson algorithm.
For this we first reformulate MAX CUT as a (non-convex) quadratic opti-
mization problem having quadratic equality constraints. With every vertex
i ∈ V , we associate a binary variable xi ∈ {−1, +1} which indicates whether
i lies in S or in S, say, i ∈ S if xi = −1 and i ∈ S if xi = +1. We model the
binary constraint xi ∈ {−1, +1} as a quadratic equality constraint
x2i = 1 for i ∈ V.
For two vertices i, j ∈ V we have
1 − xi xj ∈ {0, 2}.
This value equals to 0 if i and j lie on the same side of the cut (S, S) and
the value equals to 2 if i and j lie on different sides of the cut. Hence, one
can express the weight of the cut (S, S) by
X 1 − xi xj
w(S, S) = wij .
2
{i,j}∈E

Now, the MAX CUT problem can be equivalently formulated as


 
1 X 
mc(G, w) = max wij (1 − xi xj ) : x2i = 1 ∀i ∈ V . (5.5)
2 
{i,j}∈E
120 Approximating the MAX CUT problem

Next, we introduce a matrix variable X = (xij ) ∈ S n , whose entries


xij model the pairwise products xi xj . Then, as the matrix (xi xj )ni,j=1 =
xxT is positive semidefinite, we can require the condition that X should be
positive semidefinite. Moreover, the constraints x2i = 1 give the constraints
Xii = 1 for all i ∈ [n]. Therefore we can formulate the following semidefinite
programming relaxation:
 
1 X 
sdp(G, w) = max wij (1 − Xij ) : X  0, Xii = 1 ∀i ∈ [n] .
2 
{i,j}∈E
(5.6)
By construction, we have:
mc(G, w) ≤ sdp(G, w). (5.7)
The feasible region of the above semidefinite program is the convex (non-
polyhedral) set
En = {X ∈ S n : X  0, Xii = 1 ∀i ∈ [n]},
called the elliptope (and its members are known as correlation matrices).
One can visualize the elliptope E3 . Indeed, for a 3 × 3 symmetric matrix X
with an all-ones diagonal, we have:
 
1 x y
X = x 1 z   0 ⇐⇒ 1 + 2xyz − x2 − y 2 − z 2 ≥ 0, x, y, z ∈ [−1, 1],
y z 1
which expresses the fact that the determinant of X is nonnegative as well as
the three 2 × 2 principal subdeterminants. The following Figure 5.2.1 visu-
alizes the set of triples (x, y, z) for which X ∈ E3 . Notice that the elliptope
E3 looks like an “inflated” tetrahedron, while the underlying tetrahedron
corresponds to the linear relaxation MET3 .

Figure 5.2 Views on the convex set E3 behind the semidefinite relaxation.
5.2 The algorithm of Goemans and Williamson 121

5.2.2 The Goemans-Williamson algorithm


Goemans and Williamson [1995] show the following result for the semidefi-
nite programming bound sdp(G, w).

Theorem 5.2.1 Given a graph G with nonnegative edge weights w, the


following inequalities hold:

sdp(G, w) ≥ mc(G, w) ≥ 0.878 · sdp(G, w).

The proof is algorithmic and it gives an approximation algorithm which


approximates the MAX CUT problem within a ratio of 0.878. The Goemans-
Williamson algorithm has five steps:

(i) Solve the semidefinite program (5.6); let X be an optimal solution, so


P
that sdp(G, w) = {i,j}∈E wij (1 − Xij )/2.
(ii) Perform a Cholesky decomposition of X to find unit vectors vi ∈ R|V |−1
for i ∈ V , so that X = (viT vj )i,j∈V .
(iii) Choose a random unit vector r ∈ R|V |−1 , according to the rotationally
invariant probability distribution on the unit sphere.
(iv) Define a cut (S, S) by setting xi = sign(viT r) for all i ∈ V . That is, i ∈ S
if and only if sign(viT r) ≤ 0.
P
(v) Check whether {i,j}∈E wij (1 − xi xj )/2 ≥ 0.878 · sdp(G, w). If not, go to
step 3.

The steps 3 and 4 in the algorithm are called a randomized rounding


procedure because a solution of a semidefinite program is “rounded” (or
better: projected) to a solution of the original combinatorial problem with
the help of randomness.
Note also that because the expectation of the constructed solution is at
least 0.878 · sdp(G, w) the algorithm eventually terminates; it will pass step
5 and without getting stuck in an endless loop. One can show that with
high probability we do not have to wait long until the condition in step 5 is
fulfilled.

The following lemma (also known as Grothendieck’s identity, since it came


up in work of Grothendieck in the 50’s, however in the different context of
functional analysis) is the key to the proof of Theorem 5.2.1.

Lemma 5.2.2 Let u, v ∈ Rd (for some d ≥ 1) be unit vectors and let


r ∈ Rd be a random unit vector chosen according to the rotationally invariant
probability distribution on the unit sphere. The following holds.
122 Approximating the MAX CUT problem

(i) The probability that sign(uT r) 6= sign(v T r) is equal to

arccos(uT v)
P(sign(uT r) 6= sign(v T r)) = . (5.8)
π

(ii) The expectation of the random variable sign(uT r) sign(v T r) ∈ {−1, +1} is
equal to
2
E[sign(uT r) sign(v T r)] = arcsin(uT v). (5.9)
π
Proof (i) Since the probability distribution from which we sample the unit
vector r is rotationally invariant we can assume that u, v and r lie in a
common plane. Hence we can assume that they lie on a unit circle in R2
and that r is chosen according to the uniform distribution on this circle.
Then the probability that sign(uT r) 6= sign(v T r) depends only on the angle
between u and v. Using a figure (draw one!) it is easy to see that
1 1
P[sign(uT r) 6= sign(v T r)] = 2 · arccos(uT v) = arccos(uT v).
2π π
(ii) By definition, the expectation E[sign(uT r) sign(v T r)] can be computed
as
(+1) · P[sign(uT r) = sign(v T r)] + (−1) · P[sign(uT r) 6= sign(v T r)]

arccos(uT v)
= 1 − 2 · P[sign(uT r) 6= sign(v T r)] = 1 − 2 · π ,

where we have used (i) for the last equality. Now use the trigonometric
identity
π
arcsin t + arccos t = ,
2
to conclude the proof of (ii).

Using elementary univariate calculus one can show the following fact.

Lemma 5.2.3 For all t ∈ [−1, 1)], the following inequality holds:
2 arccos t
≥ 0.878. (5.10)
π 1−t
One can also “see” this on the following plots of the function in (5.10),
where t varies in [−1, 1) in the first plot and in [−0.73, −0.62] in the second
plot.
5.2 The algorithm of Goemans and Williamson 123
10

8 8.791e-1

8.79e-1
6
8.789e-1

4 8.788e-1

8.787e-1
2
8.786e-1

-1 -0.5 0 0.5 1 -0.73 -0.72 -0.71 -0.7 -0.69 -0.68 -0.67 -0.66 -0.65

Proof (of Theorem 5.2.1) Let X be the optimal solution of the semidefinite
program (5.6) and let v1 , . . . , vn be unit vectors such that X = (viT vj )ni,j=1 ,
as in Steps 1,2 of the GW algorithm. Let (S, S) be the random partition of
V , as in Steps 3,4 of the algorithm. We now use Lemma 6.2.2(i) to compute
the expected value of the cut (S, S):
P
E(w(S, S)) = {i,j}∈E wij P({i, j} is cut)

arccos(viT vj )
wij P(sign(viT r) 6= sign(vjT r)) =
P P
= {i,j}∈E {i,j}∈E wij π

1−viT vj arccos(v T v )
   
· π2 1−vT vi j .
P
= {i,j}∈E wij 2 i j

2 arccos(viT vj )
By Lemma 5.2.3, each term π 1−viT vj
can be lower bounded by the con-
stant 0.878. Since all weights are nonnegative, each term wij (1 − viT vj ) is
nonnegative. Therefore, we can lower bound E(w(S, S)) in the following way:
1 − viT vj
X  
E(w(S, S)) ≥ 0.878 · wij .
2
{i,j}∈E

Now we recognize that the objective value sdp(G, w) of the semidefinite


program is appearing in the right hand side and we obtain:
1 − viT vj
X  
E(w(S, S)) ≥ 0.878 · wij = 0.878 · sdp(G, w).
2
{i,j}∈E

Finally, it is clear that the maximum weight of a cut is at least the expected
value of the random cut (S, S):
mc(G, w) ≥ E(w(S, S)).
Putting things together we can conclude that
mc(G, w) ≥ E(w(S, S)) ≥ 0.878 · sdp(G, w).
124 Approximating the MAX CUT problem

This concludes the proof, since the other inequality mc(G, w) ≤ sdp(G, w)
holds by (5.7).

5.2.3 Remarks on the algorithm


It remains to give a procedure which samples a random vector from the
unit sphere. This can be done if one can sample random numbers from the
standard normal (Gaussian) distribution (with mean zero and variance one),
which has probability density
1 2
f (x) = √ e−x /2 .

Many software packages include a procedure which produces random num-
bers from the standard normal distribution.
If we sample n real numbers x1 , . . . , xn independently uniformly at random
from the standard normal distribution, then, the vector
1
r=p 2 (x1 , . . . , xn )T ∈ S n−1
2
x1 + · · · + xn
is distributed according to the rotationally invariant probability measure on
the unit sphere.
Finally we mention that one can modify the Goemans-Williamson algo-
rithm so that it becomes an algorithm which runs deterministically (without
the use of randomness) in polynomial time and which gives the same ap-
proximation ratio. This was done by Mahajan and Ramesh in 1995.

5.3 Some extensions


5.3.1 Reformulating MAX CUT using the Laplacian matrix
Given a graph G with edge weights w, its Laplacian matrix Lw is the sym-
metric n × n matrix with entries:
X
(Lw )ii = wij for i ∈ [n],
j:{i,j}∈E

(Lw )ij = −wij for {i, j} ∈ E, (Lw )ij = 0 for (i 6= j, {i, j} 6∈ E).
The following can be checked (Exercise 4.2).
Lemma 5.3.1 The following properties hold for the Laplacian matrix Lw :
(i) For any vector x ∈ {±1}n , 14 xT Lw x = 12 {i,j}∈E wij (1 − xi xj ).
P

(ii) For any nonnegative edge weights w ≥ 0, Lw  0.


5.3 Some extensions 125

This permits to reformulate the quadratic formulation (5.5) of MAX CUT


as
 
1 T 2
mc(G, w) = max x Lw x : xi = 1 for i ∈ V
4
and its semidefinite relaxation (5.6) as
 
1
sdp(G, w) = max hLw , Xi : X  0, Xii = 1 for i ∈ V .
4
A property of the above programs is that the matrix Lw /4 occurring in the
objective function is positive semidefinite. In the next section we consider
general quadratic programs, where Lw is replaced by an arbitrary positive
semidefinite matrix A. Then one can still show an approximation algorithm,
however with performance ration π2 ∼ 0.636, thus weaker than the 0.878 ratio
of Goemans and Williamson for the case when A = Lw for some w ≥ 0.

5.3.2 Nesterov’s approximation algorithm


Nesterov [1997] considers the class of quadratic problems:
 
X n 
qp(A) = max Aij xi xj : x2i = 1 for i ∈ [n] , (5.11)
 
i,j=1

where A ∈ S n is a symmetric matrix. (Thus, qp(A) = mc(G, w) for A =


Lw /4). Analogously define the semidefinite programming relaxation:

sdp(A) = max {hA, Xi : X  0, Xii = 1 for i ∈ [n]} . (5.12)

The following inequality holds:

qp(A) ≤ sdp(A)

for any symmetric matrix A. In the special case when A is positive semidefi-
nite, Nesterov shows that sdp(A) is a π2 -approximation for qp(A). The proof
is based on the same rounding technique of Goemans-Williamson, but the
analysis is different. It relies on the following property of the function arcsin t:
There exist positive scalars ak > 0 (k ≥ 0) such that
X
arcsin t = t + ak t2k+1 for all t ∈ [−1, 1]. (5.13)
k≥0

Based on this one can show the following result.


126 Approximating the MAX CUT problem

Lemma 5.3.2 Given a matrix X = (xij ) ∈ S n , define the new matrix

X̃ = (arcsin Xij − Xij )ni,j=1 ,

whose entries are the images of the entries of X under the map t 7→ arcsin t−
t. Then, X  0 implies X̃  0.

Proof The proof uses the following fact: If X = (xij )ni,j=1 is positive semidef-
inite then, for any integer k ≥ 1, the matrix (Xijk )ni,j=1 (whose entries are the
k-th powers of the entries of X) is positive semidefinite as well. This follows
from the fact that the Hadamard product preserves positive semidefinite
matrices (recall Section B.2.3). Using this fact, the form of the series de-
composition (5.13), and taking limits, implies the result of the lemma.

Theorem 5.3.3 Assume A is a positive semidefinite matrix. Then,


2
sdp(A) ≥ qp(A) ≥ sdp(A).
π
Proof Let X be an optimal solution of the semidefinite program (5.12) and
let v1 , . . . , vn be unit vectors such that X = (viT vj )ni,j=1 (as in Steps 1,2 of
the GW algorithm). Pick a random unit vector r and set xi = sign(viT r) for
i ∈ V (as in Steps 3,4 of the GW algorithm). We now use Lemma 6.2.2(ii)
to compute the expected value of ni,j=1 Aij xi xj :
P

E( ni,j=1 Aij xi xj ) = ni,j=1 Aij E(xi xj )


P P

2 Pn 2 Pn
= π i,j=1 Aij arcsin(viT vj ) = π i,j=1 Aij arcsin Xij
P 
2 n Pn
= π i,j=1 Aij Xij + i,j=1 A ij (arcsin Xij − Xij ) .

By Lemma 5.3.2, the second term is equal to hA, X̃i ≥ 0, since A  0 and
X̃  0. Moreover, we recognize in the first term the objective value of the
semidefinite program (5.12). Combining these facts, we obtain:
n
X 2
E( Aij xi xj ) ≥ sdp(A).
π
i,j=1

On the other hand, it is clear that


n
X
qp(A) ≥ E( Aij xi xj ).
i,j=1

This concludes the proof.


5.3 Some extensions 127

5.3.3 Quadratic programs modeling MAX 2SAT


Here we consider another class of quadratic programs, of the form:

 
X X 
qp(a, b) = max aij (1 − xi xj ) + bij (1 + xi xj ) : x ∈ {±1}n ,
 
ij∈E1 ij∈E2
(5.14)
where aij , bij ≥ 0 for all ij. Write the semidefinite relaxation:

 
X X 
sdp(a, b) = max aij (1 − Xij ) + bij (1 + Xij ) : X  0, Xii = 1 ∀i ∈ [n] .
 
ij∈E1 ij∈E2
(5.15)
Goemans and Williamson [1995] show that the same approximation result
holds as for MAX CUT:

Theorem 5.3.4 Assume that a, b ≥ 0. Then,

sdp(a, b) ≥ qp(a, b) ≥ 0.878 · sdp(a, b).

In the proof we will use the following variation of Lemma 5.2.3.

Lemma 5.3.5 For any z ∈ [−1, 1], the following inequality holds:

2 π − arccos z
≥ 0.878.
π 1+z

Proof Set t = −z ∈ [−1, 1]. Using the identity arccos(−t) = π − arccos t


and applying (5.10), we get: π2 π−arccos
1+z
z
= π2 arccos t
1−t ≥ 0.878.

Proof (of Theorem 5.3.4) We apply the GW algorithm: Let X = (viT vj )


be an optimal solution of (5.15). Pick a random unit vector r and set xi =
sign(viT r) for i ∈ [n]. Using the fact that E(xi xj ) = 1 − 2 · P(xi 6= xj ) = 1 −
arccos(v T v )
i j
2· π , we can compute the expected value of the quadratic objective
of (5.14) evaluated at x:
128 Approximating the MAX CUT problem

P P 
E ij∈E1 aij (1 − xi xj ) + ij∈E2 bij (1 + xi xj )

arccos(viT vj ) arccos(viT vj )
P P  
=2· ij∈E1 aij π +2· ij∈E2 ij 1 −
b π

2 arccos(viT vj ) P 2 π − arccos(viT vj )
aij (1 − viT vj ) T
P
= ij∈E1 + ij∈E bij (1 + v i v j )
| {z } π 1 − viT vj 2
| {z }π 1 + viT vj
≥0 | {z } ≥0 | {z }
≥0.878 ≥0.878

≥ 0.878 · sdp(a, b).

Here we have used Lemmas 5.2.3 and 5.3.5. From this we can conclude that
qp(a, b) ≥ 0.878 · sdp(a, b).

In the next section we indicate how to use the quadratic program (5.14)
in order to formulate and approximate MAX 2-SAT.

5.3.4 Approximating MAX 2-SAT


An instance of MAX SAT is given by a collection of Boolean clauses C1 , . . . , Cm ,
where each clause Cj is a disjunction of literals, drawn from a set of variables
{z1 , . . . , zn }. A literal is a variable zi or its negation z i . Moreover there is
a weight wj attached to each clause Cj . The MAX SAT problem asks for
an assignment of truth values to the variables z1 , . . . , zn that maximizes the
total weight of the clauses that are satisfied. MAX 2SAT consists of the
instances of MAX SAT where each clause has at most two literals. It is an
NP-complete problem [1979] and analogously to MAX CUT it is also hard
to approximate.
Goemans and Williamson show that their randomized algorithm for MAX
CUT also applies to MAX 2SAT and yields again a 0.878-approximation
algorithm. Prior to their result, the best approximation was 3/4, due to
Yannakakis (1994).

To show this it suffices to model MAX 2SAT as a quadratic program of


the form (5.14). We now indicate how to do this. We introduce a variable
xi ∈ {±1} for each variable zi of the SAT instance. We also introduce an
additional variable x0 ∈ {±1} which is used as follows: zi is true if xi = x0
and false otherwise.
Given a clause C, define its value v(C) to be 1 if the clause C is true and
5.3 Some extensions 129

0 otherwise. Thus,
1 + x0 xi 1 − x0 xi
v(zi ) = , v(z i ) = 1 − v(zi ) = .
2 2
Based on this one can now express v(C) for a clause with two literals:
1−x0 xi 1−x0 xj
v(zi ∨ zj ) = 1 − v(z i ∧ z j ) = 1 − v(z i )v(z j ) = 1 − 2 2

1+x0 xi 1+x0 xj 1−xi xj


= 4 + 4 + 4 .
Analogously, one can express v(zi ∨ z j ) and v(z i ∨ z j ), by replacing xi by
−xi when zi is negated. In all cases we see that v(C) is a linear combination
of terms of the form 1 + xi xj and 1 − xi xj with nonnegative coefficients.
Now MAX 2SAT can be modelled as
Xm
max{ wj v(Cj ) : x21 = . . . = x2n = 1}.
j=1

This quadratic program is of the form (5.14). Hence Theorem 5.3.4 applies.
Therefore, the approximation algorithm of Goemans and Williamson gives
a 0.878 approximation for MAX 2SAT.

5.3.5 Grothendieck’s inequality


Given a (possibly rectangular) matrix A ∈ Rm×n we consider the follow-
ing quadratic program, which can be seen as the bilinear analogue of the
program (5.11):
 
X m Xn
kAk∞→1 = max Aij xi yj : x2i = 1, yj2 = 1 (i ∈ [m], j ∈ [n]}.

i=1 j=1
(5.16)
See Exercise 5.4 for some properties of this quantity; in particular, when A
is symmetric positive semidefinite the program (5.16) is equivalent to the
program (5.11) (which corresponds to selecting xi = yi for all i).
A natural relaxation for kAk∞→1 is
 
X m X n 
sdp∞→1 (A) = max Aij uT
i v j : kui k = kvj k = 1, i ∈ [m], j ∈ [n] ,
 
i=1 j=1

where we optimize over m + n unit vectors ui , vj ∈ Rm+n . Note that this


optimization problem is indeed a semidefinite program (why?).
A beautiful result, due to Grothendieck, is that this semidefinite program
provides a constant approximation for kAk∞→1 .
130 Approximating the MAX CUT problem

Theorem 5.3.6 (Grothendieck’s inequality) There is a constant K so that


for all matrices A ∈ Rm×n the inequality

kAk∞→1 ≤ sdp∞→1 (A) ≤ KkAk∞→1

holds.
The smallest constant K for which the second inequality holds, is called
the Grothendieck constant KG . It is known that KG lies between 1.676 . . .
and 1.782 . . . but its exact value is currently not known. In the following we
will prove that
π
KG ≤ √ = 1.782 . . .
2 ln(1 + 2)
The argument will also rely on an approximation algorithm which uses ran-
domized rounding (in a tricky way).
The proof for Theorem 5.3.6 is algorithmic and has the following steps:

(i) Solve sdp∞→1 (A). This gives optimal unit vectors

u1 , . . . , um , v1 , . . . , vn ∈ S m+n−1 .

(ii) Use these unit vectors to construct new unit vectors

u01 , . . . , u0m , v10 , . . . , vn0 ∈ S m+n−1

according to Krivine’s trick presented in Lemma 5.3.7 below.


(iii) Choose a random vector r ∈ S m+n−1 according to the rotationally in-
variant probability distribution on the unit sphere.
(iv) Randomized rounding: Set

xi = sign((u0i )T r), yj = sign((vj0 )T r).

We analyze the expected quality of the constructed solution (xi , yj ). By


linearity of expectation we have
 
Xm X n m X
X n h i
kAk∞→1 ≥ E  Aij xi yj =
 Aij E sign((u0i )T r) sign((vj0 )T r) .
i=1 j=1 i=1 j=1

Now by Lemma 5.3.7 below each last expectation will turn out to be equal to
βuT
i vj . Then the total sum in the right hand side will be equal to βsdp∞→1 (A).
This implies kAk∞→1 ≥ βsdp∞→1 (A) and thus KG ≤ β −1 .
Now the following lemma, Krivine’s trick, finishes the proof of Theo-
rem 5.3.6.
5.3 Some extensions 131

Lemma 5.3.7 Let u1 , . . . , um and v1 , . . . , vn be unit vectors in Rm+n . Then


there exist unit vectors u01 , . . . , u0m and v10 , . . . , vn0 in Rm+n such that
h i
E sign((u0i )T r) sign((vj0 )T r) = βuT i vj for all i ∈ [m], j ∈ [n]

holds with
2 √
β= ln(1 + 2) = 0.561 . . .
π
2
Proof Define the function E : [−1, +1] → [−1, +1] by E(t) = π arcsin t.
Then by Grothendieck’s identity, Lemma 6.2.2,
h i
E sign((u0i )T r) sign((vj0 )T r) = E((u0i )T vj0 ).

Now the idea is to invert the function E so that we have

(u0i )T vj0 = E −1 (βuT


i vj )

and use the series expansion



X
E −1 (t) = g2r+1 t2r+1 ,
r=0

which is valid for all t ∈ [−1, 1] to define u0i and vj0 .


For this define the infinite dimensional Hilbert space

M
H= (Rm+n )⊗2r+1 ,
r=0

and the vectors u0i , vj0 ∈ H componentwise by

(u0i )r = sign(g2r+1 ) |g2r+1 |β 2r+1 u⊗2r+1


p
i

and
(vj0 )r = |g2r+1 |β 2r+1 vj⊗2r+1 .
p

Then

X
(u0i )T vj0 = g2r+1 β 2r+1 (uT
i vj )
2r+1
= E −1 (βuT
i vj )
r=0

and

X
1 = (u0i )T u0i = (vj0 )T vj0 = |g2r+1 |β 2r+1 ,
r=0

which defines the value of β uniquely.


132 Approximating the MAX CUT problem

It’s a fun exercise to work out β explicitly: We have

2
E(t) = arcsin t,
π

and so


π  X (−1)2r+1  π 2r+1
E −1 (t) = sin t = t .
2 (2r + 1)! 2
r=0

Hence,


X (−1)2r+1  π 2r+1 π 
1= β = sinh β ,
(2r + 1)! 2 2
r=0

which implies

2 2 √
β= arsinh 1 = ln(1 + 2)
π π


because arsinh t = ln(t + t2 + 1).

Last concern: How to find/approximate u0i , vj0 in polynomial time? Answer:


we approximate the inner product matrix


 X
(u0i )T vj0 = g2r+1 β 2r+1 (uT 2r+1

i vj )
r=0

by its series expansion which converges fast enough and then we use its
Cholesky decomposition.

5.4 Further reading and remarks


We start with an anecdote. About the finding of the approximation ratio
0.878, Knuth writes in the article “Mathematical Vanity Plates”:
5.4 Further reading and remarks 133

For their work [1995], Goemans and Williamson won in 2000 the Fulk-
erson prize (sponsored jointly by the Mathematical Programming Society
and the AMS) which recognizes outstanding papers in the area of discrete
mathematics for this result.
How good is the MAX CUT algorithm? Are there graphs where the value
of the semidefinite relaxation and the value of the maximal cut are a factor
of 0.878 apart or is this value 0.878, which maybe looks strange at first sight,
only an artefact of our analysis? It turns out that the value is optimal. In
2002 Feige and Schechtmann gave an infinite family of graphs for which the
ratio mc/sdp converges to exactly 0.878 . . .. This proof uses a lot of nice
mathematics (continuous graphs, Voronoi regions, isoperimetric inequality)
and it is explained in detail in the Chapter 8 of the book Approximation
Algorithms and Semidefinite Programming of Gärtner and Matoušek.
In 2007, Khot, Kindler, Mossel, O’Donnell showed that the algorithm of
Goemans and Williamson is optimal in the following sense: If the Unique
Games conjecture is true, then there is no polynomial time approximation
algorithm achieving a better approximation ratio than 0.878 unless P = NP.
Currently, the validity and the implications of the Unique Games conjecture
134 Approximating the MAX CUT problem

are under heavy investigation. The book of Gärtner and Matoušek also con-
tains an introduction to the unique games conjecture.

Exercises
5.1 The goal of this exercise is to show that the maximum weight stable
set problem can be formulated as an instance of the maximum cut
problem.
Let G = (V, E) be a graph with node weights c ∈ RV+ . Define the
new graph G0 = (V 0 , E 0 ) with node set V 0 = V ∪ {0}, with edge set
0
E 0 = E ∪ {{0, i} : i ∈ V }, and with edge weights w ∈ RE
+ defined by

w0i = ci − degG (i)M for i ∈ V, and wij = M for {i, j} ∈ E.


Here, degG (i) denotes the degree of node i in G, and M is a constant
to be determined.
(a) Let S ⊆ V . Show: w(S, V 0 \ S) = c(S) − 2M |E(S)|.
(b) Show: If M is sufficiently large, then S ⊆ V is a stable set of maxi-
mum weight in (G, c) if and only if (S, V 0 \ S) is a cut of maximum
weight in (G0 , w).
Give an explicit value of M for which the above holds.

5.2 Let G = (V = [n], E) be a graph with edge weights w ∈ RE and


consider the associated Laplacian matrix Lw .
(a) Show: xT Lw x = 2 · {i,j}∈E wij (1 − xi xj ) for any vector x ∈ {±1}n .
P

(b) Show: If w ≥ 0 then Lw  0.


(c) Given an example of weights w for which Lw is not positive semidef-
inite.

5.3 Let G = (V = [n], E) be a graph and let w ∈ RE


+ be nonnegative edge
weights.
(a) Show the following reformulation for the MAX CUT problem:
 
 X T
arccos(vi vj ) 
n
mc(G, w) = max wij : v1 , . . . , vn unit vectors in R .
 π 
{i,j}∈E

Hint: Use the analysis of the Goemans-Williamson algorithm.


(b) Let v1 , . . . , v7 be unit vectors. Show:
X
arccos(viT vj ) ≤ 12π.
1≤i<j≤7
Exercises 135

5.4 For a matrix A ∈ Rm×n we define the following parameters:


XX
f (A) = max | Aij |,
I⊆[m],J⊆[n]
i∈I j∈J

and
 
X X 
g(A) = max Aij xi yj : x1 , . . . , xm , y1 , . . . , yn ∈ {±1} .
 
i∈[m] j∈[n]

The parameter f (A) is known as the cut norm of A, also denoted as


kAk , and the parameter g(A) coincides with the parameter kAk∞→1
considered in Section 5.3.5.
(a) Show: f (A) ≤ g(A) ≤ 4f (A).
(b) Assume that all row sums and all column sums of A are equal to 0.
Show: g(A) = 4f (A).
(c) Formulate a semidefinite programming relaxation for g(A).
(d) Show:
 
X X 
g(A) = max Aij xi yj : x1 , . . . , xm , y1 , . . . , yn ∈ [−1, 1] .
 
i∈[m] j∈[n]

(e) Assume that A is a symmetric positive semidefinite n × n matrix.


Show:
 
Xn Xn 
g(A) = max Aij xi xj : x1 , . . . , xn ∈ {±1} .
 
i=1 j=1

(f) Show that the maximum cut problem in a graph G = ([n], E) with
nonnegative edge weights can be formulated as an instance of com-
puting the cut norm f (A) of some matrix A.

5.5 The goal of this exercise is to show that computing the cut norm of a
matrix (recall Exercise 5.4) is at least as difficult as solving the maxi-
mum cut problem.
Consider a graph G = (V = [n], E) with m edges and with non-
negative edge weights (wjk ). We define the (2m) × n matrix A, whose
columns are indexed by the vertices and having two rows for each edge.
Namely, for the edge {vj , vk }, the corresponding two rows of A have
entries
A2i−1,j = A2i,k = wjk , A2i−1,k = A2i,j = −wjk .
136 Approximating the MAX CUT problem

All other entries of A are zero.


Show that mc(G, w) = 14 kAk∞→1 = kAk .

5.6 Consider the quadratic optimization problem


 
X n 
qp(A) = max Aij xi xj : xi ∈ {±1}, i ∈ [n] .
 
ij=1

Show that if all diagonal entries of A are equal to zero then equality
holds:
 
X 
qp(A) = max Aij xi xj : xi ∈ [−1, 1], i ∈ V .
 
i,j=1

5.7 Let G = C5 denote the cycle on 5 nodes.


Compute the Goemans-Williamson semidefinite relaxation for max-
cut (where all edge weights are taken equal to 1):
( 5 )
1X
sdp(C5 ) = max (1 − Xi,i+1 ) : X ∈ S 5 , X  0, Xii = 1 ∀i ∈ [5] .
2
i=1

mc(C5 )
How does the ratio sdp(C5 ) compare to the GW ratio 0.878?

5.8 (a) Determine the optimal value of the following quadratic optimization
problem:

qp = max{x1 x2 + x2 x3 + x3 x4 − x4 x1 : x1 , x2 , x3 , x4 ∈ {±1}}.

(b) Determine the optimum value of its semidefinite programming re-


laxation:
4
sdp = max{X12 + X23 + X34 − X14 : X ∈ S+ , Xii = 1 for i ∈ [4]}.

Remark: As we will see in Chapter 14.4 these questions are relevant


to quantum information theory, since they are related to the study of
Bell inequalities arising in the so-called CHSH non-local game.

5.9 Given a graph G = (V = [n], E) with edge weights w ∈ RE , recall the


definition of max-cut
 
 X 1 − xi xj 
mc(G, w) = max wij : x ∈ {±1}n
 2 
{i,j}∈E
Exercises 137

and the definition of the basic semidefinite bound sdp(G, w) from (5.6)
 
 X 1 − Xij n

sdp(G, w) = max wij : X ∈ S+ , Xii = 1 for all i ∈ [n] .
 2 
{i,j}∈E

We now introduce a new semidefinite relaxation. For this, given a vector


y = (yij , yijkl ) indexed by all subsets of [n] with cardinality 2 or 4,
consider the symmetric matrix C(y) of size 1 + n2 indexed by all 2-
subsets of V and an additional index corresponding to ∅, with entries:
C(y)∅,∅ = C(y)ij,ij = 1, C(y)∅,ij = yij , C(y)ij,ik = yjk , C(y)ij,kl = yijkl
for all distinct i, j, k, l ∈ [n]. (Here yij , yijkl denote the cooordinates
of y indexed by the subsets {i, j}, {i, j, k, l}, resp.). we consider the
semidefinite program
 
 X 1 − yij 
sdp2 (G, w) = max wij : y = (yij , yijkl ), C(y)  0 .
 2 
{i,j}∈E
(5.17)
(a) Show that (5.17) is a relaxation of max-cut and that it is at least as
good as the basic semidefinite bound, i.e., that mc(G, w) ≤ sdp2 (G, w) ≤
SDP(G, w).
(b) Show that the condition C(y)  0 implies the following inequalities
(known as the triangle inequalities):
yij + yik + yjk ≥ −1, yij − yik − yjk ≥ −1 for distinct i, j, k ∈ [n].
(5.18)
(c) Give an example of a matrix X ∈ S n which is feasible for the pro-
gram defining sdp(G, w) and satisfies Xij + Xik + Xjk < −1 (i.e, it
violates some triangle inequality in (5.18)).
(d) For n ≥ 5 give an example of a vector y = (yij )1≤i<j≤n which
satisfies all the triangle inequalities in (5.18) and does not give a
feasible solution to sdp(G, w) (i.e., the matrix with diagonal entries
1 and with (i, j)th entry yij is not positive semidefinite).
NB: What items (d),(e) show is that the polyhedral relaxation of max-
cut defined by the triangle inequalities and the semidefinite relaxation
given by the elliptope are not comparable for n ≥ 5. Note that for n ≤ 4
the polyhedral relaxation coincides with the cut polytope and thus it
is contained in the elliptope.
138 Approximating the MAX CUT problem

.
6
Generalizations of Grothendieck’s inequality and
applications

In the previous Chapter 5 we considered Grothendieck’s inequality: There


is a constant K so that for all matrices A ∈ Rm×n the inequality

kAk∞→1 ≤ SDP∞→1 (A) ≤ KkAk∞→1

holds, where:
 
Xm X
n 
kAk∞→1 = max Aij xi yj : x2i = yj2 = 1, i ∈ [m], j ∈ [n] .
 
i=1 j=1

and where the semidefinite relaxation equals


 
X m X n 
SDP∞→1 (A) = max Aij ui · vj : kui k = kvj k = 1, i ∈ [m], j ∈ [n] .
 
i=1 j=1

As can be seen in Exercise 5.4 the parameter kAk∞→1 is closely related to


the cut norm, which is useful in many graph theoretic applications.
The number kAk∞→1 also has a meaning in theoretical physics. It can
be used to find ground states in the Ising model. The Ising model (named
after the physicist Ernst Ising), is a mathematical model of ferromagnetism
in statistical mechanics. The model consists of discrete variables called spins
that can be in one of two states, namely +1 or −1, UP or DOWN. The spins
are arranged in a graph, and each spin only interacts with its neighbors.
In many cases, the interaction graph is a finite subgraph of the integer
lattice Zn where the vertices are the lattice points and where two vertices
are connected if their Euclidean distance is one. These graphs are bipartite
since they can be partitioned into even and odd vertices, corresponding to
the parity of the sum of the coordinates. Let G = (V, E) be a bipartite
interaction graph. The potential function is given by a symmetric matrix
140 Generalizations of Grothendieck’s inequality and applications

A = (Auv ) ∈ S V . Auv = 0 if u and v are not adjacent, Auv is positive if there


is ferromagnetic interaction between u and v, and Auv is negative if there is
antiferromagnetic interaction. The particles possess a spin x ∈ {−1, +1}V .
In the absence of an external field, the total energy of the system is given
by
X
− Auv xu xv .
{u,v}∈E

The ground state of this model is a configuration of spins x ∈ {−1, +1}V


which minimizes the total energy. So computing the xu ∈ {−1, +1} which
give kAk∞→1 is equivalent to finding the ground state and computing SDP∞→1
amounts to approximate this ground state energy.
In this chapter we consider two generalizations of this bipartite Ising
model.
We start by studying the Ising model for arbitrary interaction graphs
and we find approximations of the ground state energy. The quality of this
approximation will clearly depend on properties of the interaction graph. In
particular, the theta number will appear here in an unexpected way.

Figure 6.1 Spins in the XY model

Another generalization will be the consideration of more complicated spins.


Instead of looking only at spins attaining the values −1 and +1 as in the
Ising model, the r-vector model considers spins which are vectors in the unit
sphere S r−1 = {x ∈ Rr : x · x = 1}. The case r = 1 corresponds to the Ising
6.1 The Grothendieck constant of a graph 141

model, the case r = 2 to the XY model, the case r = 3 to the Heisenberg


model, and the case r = |V | to the Berlin-Kac spherical model. We will
derive approximations of ground state energies
X
− max Au,v xu · xv , for xu ∈ S r−1 and u ∈ V
{u,v}∈E

for fixed r and for bipartite graphs.


In principle a mixture of both generalizations is possible. We do not give
it here as it would require adding even more technical details.

6.1 The Grothendieck constant of a graph


The Grothendieck constant of an undirected graph G = (V, E) is defined
as the smallest constant1 K(G) = K so that for every symmetric matrix
A ∈ S V the following inequality
 
 X 
max Auv fu · fv : fu ∈ RV , u ∈ V, kfu k = 1
 
{u,v}∈E
 
 X 
≤ K max Auv xu xv : xu = ±1, u ∈ V
 
{u,v}∈E

holds true. The left hand side is the semidefinite relaxation of the right
hand side. Furthermore, observe that the original Grothendieck constant
KG , which we studied in Chapter 5, is equal to the supremum of K(G) over
all bipartite graphs G. Hence the graph parameter K(G) can be seen as an
extension to an arbitrary graph of the classical Grothendieck constant.
The following theorem gives a surprising connection between the Grothendieck
constant of a graph and the theta number.

Theorem 6.1.1 There is a constant C so that for any graph G we have

K(G) ≤ C ln ϑ(G),

where ϑ(G) is the theta number of the complementary graph of G.

The proof of this theorem will again be based on an approximation algo-


rithm which performs randomized rounding of the solution of the semidefi-
nite relaxation.
1 Do not confuse this parameter K(G) - which depends on graph G - with the constant KG in
Chapter 5 - where the index G refers to Grothendieck.
142 Generalizations of Grothendieck’s inequality and applications

6.1.1 Randomized rounding by truncating


q
In the algorithm we use the constant M = 3 1 + ln ϑ(G). The meaning of
it will become clear when we analyze the algorithm.

(i) Solve the semidefinite relaxation


 
 X 
Γmax = max Auv fu · fv : fu ∈ RV , u ∈ V, kfu k = 1 .
 
{u,v}∈E

(ii) Choose a random vector z = (zu ) ∈ RV so that every entry zu is dis-


tributed independently according to the standard normal distribution
with mean 0 and variance 1: zu ∼ N (0, 1).
(iii) Round to real numbers yu = z · fu for all u ∈ V .
(iv) Truncate yu by setting

yu if |yu | ≤ M ,
xu =
0 otherwise

We denote by ∆ the optimal value of the ±1-constrained problem


 
 X 
∆ = max Auv xu xv : xu = ±1, u ∈ V .
 
{u,v}∈E

Important note. The solution xu which the algorithm determines does


not satisfy the ±1-constraint. It only lies in the interval [−M, M ] by con-
struction. However, it is easy to show (see Exercise 5.6) that the following
equality
 
 X 
M 2 ∆ = max Auv xu xv : xu ∈ [−M, M ], u ∈ V
 
{u,v}∈E

holds. Similarly,
 
 X 
Γmax = max Auv fu · fv : fu ∈ RV , u ∈ V, kfu k ≤ 1
 
{u,v}∈E

In the remainder of this section we shall prove the theorem by giving an


explicit bound on the ratio Γmax /∆.
6.1 The Grothendieck constant of a graph 143

6.1.2 Quality of expected solution


The expected quality of the solution xu constructed in the fourth step of the
algorithm is
X
Auv E[xu xv ]
{u,v}∈E
X
= Auv (E[yu yv ] − E[yu (yv − xv )]
{u,v}∈E

− E[yv (yu − xu )] + E[(yu − xu )(yv − xv )])


  (6.1)
X
= Γmax − E  Auv ((yu (yv − xv ) + yv (yu − xu ))
{u,v}∈E
 
X
+ E Auv (yu − xu )(yv − xv )
{u,v}∈E

because E[yu yv ] = fu · fv (Exercise 6.1 (b)).

6.1.3 A useful lemma


To estimate the second and third summands in (6.1) we use the following
lemma.
Lemma 6.1.2 Let Xu , Yu be random variables with u ∈ V . Assume

E[Xu2 ] ≤ A and E[Yu2 ] ≤ B.

Then,
 
X √
E Auv (Xu Yv + Xv Yu ) ≤ 2 AB(Γmax − Γmin ),
{u,v}∈E

where
 
 X 
Γmin = min Auv fu · fv : fu ∈ RV , u ∈ V, kfu k ≤ 1 .
 
{u,v}∈E

Proof If E[Xu2 ] ≤ 1, then


 
X
E Auv Xu Xv  ∈ [Γmin , Γmax ]. (6.2)
{u,v}∈E
144 Generalizations of Grothendieck’s inequality and applications

This follows from the fact we can write


E[Xu Xv ] u,v∈V = fu0 · fv0 u,v∈V
 

for some vectors fu0 with kfu0 k ≤ 1, because the matrix on the left hand side is
positive semidefinite (Exercise 6.1 (c)) and thus has a Cholesky factorization.
We introduce new variables Uu and Vu to be able to apply (6.2). The new
variables are
1 √ √  1 √ √ 
Uu = Xu / A + Yu / B , Vv = Xu / A − Yu / B .
2 2
Then E[Uu2 ] ≤ 1 and E[Vu2 ] ≤ 1 (verify it). So we can apply (6.2)
 
X
E Auv (Xu Yv + Xv Yu )
{u,v}∈E
    
√ X X
= 2 AB E  Auv Uu Uv  − E  Auv Vu Vv 
{u,v}∈E {u,v}∈E

≤ 2 AB(Γmax − Γmin ).

6.1.4 Estimating A and B in the useful lemma


It is clear that E[yu2 ] = 1. We find an upper bound for E[(yu − xu )2 ] in the
following lemma.
Lemma 6.1.3
Z ∞
1 2 /2 2 /2
2
E[(yu − xu ) ] = 2 √ t2 e−t dt ≤ M e−M .
2π M
Proof The equation follows from the definition of yu and xu and the nor-
mal distribution. The inequality is coming from the following simple but
noteworthy trick to estimate the integrand by an expression which can be
integrated:
2 /2 2 /2
t2 e−t ≤ (t2 + t−2 )e−t .
Then,
t2 + 1 −t2 /2
Z
2 /2
(t2 + t−2 )e−t dt = − e + constant of integration,
t
and the lemma follows by
Z ∞ r
1 2 −t2 /2 2 2 2
2√ t e dt ≤ (M + 1/M )e−M /2 ≤ M e−M /2
2π M π
6.1 The Grothendieck constant of a graph 145

because the fact M ≥ 2 implies that


r
2 4
(M + 1/M ) ≤ (M + 1/M ) ≤ M.
π 5

6.1.5 Applying the useful lemma


In (6.1) we estimate the second summand by applying the useful lemma
with Xu = yu , Yu = yu − xu . We get
 
X p
−E  Auv ((yu (yv − xv ) + yv (yu − xu )) ≥ −2 M e−M 2 /2 (Γmax −Γmin ).
{u,v}∈E

The third summand in (6.1) we estimate by applying the useful lemma with
Xu = yu − xu , Yu = −(yu − xu ). We get
 
2
X
E Auv (yu − xu )(yv − xv ) ≥ −M e−M /2 (Γmax − Γmin ).
{u,v}∈E

Altogether,
X  p 2

Auv E[xu xv ] ≥ Γmax − 2 M e−M 2 /2 + M e−M /2 (Γmax − Γmin ).
{u,v}∈E

6.1.6 Connection to the theta number


The connection to the theta number comes in the following lemma.
Lemma 6.1.4
Γmax − Γmin
≤ ϑ(G).
Γmax
Proof Exercise 6.2.
q
In particular, from the definition M = 3 1 + ln ϑ(G) we obtain
p
M ≥ M̃ = 3 1 + ln((Γmax − Γmin )/Γmax ).
Furthermore (using the inequality ln x ≤ x − 1),
p
M̃ ≤ 3 (Γmax − Γmin )/Γmax .
From this it follows that 2
 2
−M 2 /2 −M̃ 2 /2 1 Γmax
Me ≤ M̃ e ≤ .
10 Γmax − Γmin
2 GIve a hint for the right most inequality!
146 Generalizations of Grothendieck’s inequality and applications

So,
X 2 1 Γmax
Auv E[xu xv ] ≥ Γmax − √ Γmax − Γmax .
10 10 Γmax − Γmin
{u,v}∈E

Since Γmax − Γmin ≥ Γmax this leads to


X 1
Auv E[xu xv ] ≥ Γmax .
4
{u,v}∈E

Finally we can put everything together: There is a positive constant C (which


is not difficult to estimate) so that
1 X 1
∆≥ 2 Auv E[xu xv ] ≥ Γmax ,
M C ln ϑ(G)
{u,v}∈E

which finishes the proof of Theorem 6.1.1.

6.2 Higher rank Grothendieck inequality


Now we model finding ground states in the r-vector model. Given positive
integers m, n, r and a matrix A = (Aij ) ∈ Rm×n , the Grothendieck problem
with rank-r-constraint is defined as
Xm X n 
T r−1 r−1
SDPr (A) = max Aij xi yj : x1 , . . . , xm ∈ S , y1 , . . . , yn ∈ S ,
i=1 j=1

where S r−1 = {x ∈ Rr : xT x = 1} is the unit sphere; the inner product


matrix of the vectors x1 , . . . , xm , y1 , . . . , yn has rank at most r. When r = 1,
then SDP1 (A) = kAk∞→1 because S 0 = {−1, +1}.
When r is a constant that does not depend on the matrix size m, n there
is no polynomial-time algorithm known which solves SDPr . However, it is
not known if the problem SDPr is NP-hard when r ≥ 2. On the other hand
the semidefinite relaxation of SDPr (A) defined by
Xm X n 
T m+n−1
SDPm+n (A) = max Aij ui vj : u1 , . . . , um , v1 , . . . , vn ∈ S
i=1 j=1

can be computed in polynomial time using semidefinite programming.

Theorem 6.2.1 For all matrices A ∈ Rm×n we have


1
SDPr (A) ≤ SDPm+n (A) ≤ SDPr (A),
2γ(r) − 1
6.2 Higher rank Grothendieck inequality 147

where
 2
2 Γ((r + 1)/2)
γ(r) = ,
r Γ(r/2)
and where Γ is the usual Gamma function, which is the extension of the
factorial function.
1
The first three values of 2γ(r)−1 are:

1 1
= = 3.65979 . . . ,
2γ(1) − 1 4/π − 1
1 1
= = 1.75193 . . . ,
2γ(2) − 1 π/2 − 1
1 1
= = 1.43337 . . .
2γ(3) − 1 16/(3π) − 1
1
For r → ∞ the values 2γ(r)−1 converge to 1. In particular, the proof of the
theorem gives another proof of the original Grothendieck’s inequality albeit
1
with a worse constant KG ≤ 4/π−1 .

6.2.1 Randomized rounding by projecting


The approximation algorithm which we use to prove the theorem is the
following three-step process.

(i) By solving SDPm+n (A) we obtain the vectors u1 , . . . , um , v1 , . . . , vn in the


unit sphere S m+n−1 .
(ii) Choose Z = (Zij ) ∈ Rr×(m+n) so that every matrix entry Zij is dis-
tributed independently according to the standard normal distribution
with mean 0 and variance 1: Zij ∼ N (0, 1).
(iii) Project xi = Zui /kZui k ∈ S r−1 with i = 1, . . . , m, and yj = Zvj /kZvj k ∈
S r−1 with j = 1, . . . , n.

6.2.2 Extension of Grothendieck’s identity


The quality of the feasible solution x1 , . . . , xm , y1 , . . . , yn for SDPr is mea-
sured by the expectation
Xm X n 
T
SDPr (A) ≥ E Aij xi yj .
i=1 j=1

Lemma 6.2.2 Let u, v be unit vectors in Rm+n and let Z ∈ Rr×(m+n) be a


148 Generalizations of Grothendieck’s inequality and applications

random matrix whose entries are distributed independently according to the


standard normal distribution with mean 0 and variance 1. Then,

Zu T Zv
  
E
kZuk kZvk
 ∞
2 Γ((r + 1)/2) 2 X (1 · 3 · · · (2k − 1))2

= (uT v)2k+1 .
r Γ(r/2) (2 · 4 · · · 2k)((r + 2) · (r + 4) · · · (r + 2k))
k=0

The case r = 1 specializes to Grothendieck’s identity from the previous


chapter:

2
E[sign(Zu)sign(Zv)] = arcsin(uT v)
π
  T 3 
1 · 3 (uT v)5
  
2 T 1 (u v)
= u v+ + + ··· .
π 2 3 2·4 5

The proof of Lemma 6.2.2 requires quite some integration. The computa-
tion starts of by

Zu T Zv
  
E
kZuk kZvk
 
x · x − 2tx · y + y · y
Z Z
p
2 −r x y
= (2π 1 − t ) · exp − dxdy,
Rr Rr kxk kyk 2(1 − t2 )

where t = uT v. We will omit the tedious calculation here. For those who
cannot resist a definite integral (like G.H. Hardy): it can be found in [? ].
The only three facts which will be important is that the power series
expansion

Zu T Zv
   X
E = f2k+1 (uT v)2k+1
kZuk kZvk
k=0

has the following three properties:

(i) the leading coefficient f1 equals γ(r)


(ii) all coefficients f2k+1 are nonnegative
P∞
(iii) k=0 f2k+1 = 1.
6.3 Further reading 149

6.2.3 Proof of the theorem


Now we have
m Xn m X
n
X  X " T #
Zui Zvj
E Aij xT
i yj = Aij E
kZui k kZvj k
i=1 j=1 i=1 j=1
Xm X n m X
X n ∞
X
= f1 Aij uT
i vj + Aij f2k+1 (uT
i vj )
2k+1
.
i=1 j=1 i=1 j=1 k=1

The first summand equals f1 SDPm+n (A). The second summand is bounded
in absolute value by (1 − f1 )SDPm+n (A) as you will prove in Exercise 8.1
(d).
Thus for the second sum we have
m X
X n ∞
X
Aij f2k+1 (uT
i vj )
2k+1
≥ (f1 − 1)SDPm+n (A),
i=1 j=1 k=1

which finishes the proof.

6.3 Further reading


Section 6.1: The result is from [2006] and the presentation of the proof
is closely following Chapter 10 of the book Approximation Algorithms and
Semidefinite Programming of Gärtner and Matoušek, which mostly follows
K. Makarychev’s thesis.
Section 6.2: The proof is from Briët, Oliveira, Vallentin [2010] and it
follows the idea of Alon and Naor AlonN2006 which in turn relies on ideas
of Rietz.
More on the definite integral: When working with power series expansions
it is sometimes useful to use hypergeometric functions for this. For instance
we have

X (1 · 3 · · · (2k − 1))2
(u · v)2k+1
(2 · 4 · · · 2k)((r + 2) · (r + 4) · · · (r + 2k))
k=0
 
1/2, 1/2 2
= (u · v) 2 F1 ; (u · v) ,
r/2 + 1

where 2 F1 is a hypergeometric function. Hypergeometric functions are a


classical subject in mathematics. In fact, many (all?) functions you know,
are hypergeometric functions. However the topic of hypergeometric functions
seems somehow to be too classical for many modern universities.
150 Generalizations of Grothendieck’s inequality and applications

In case you want to know more about them: The book ”A=B” by Petkovsek,
Wilf and Zeilberger
http://www.math.upenn.edu/~wilf/AeqB.html
is a good start.

Exercises
6.1 (a) Why does Theorem 6.1.1 give a proof of the original Grothendieck
inequality? Which explicit upper bound for KG does it provide?
(Determine a concrete number.)
(b) Show that E[yu yv ] = fu · fv holds.
(c) Prove that the matrix

E[Xu Xv ] u,v∈V
is positive semidefinite.
(d) Show that
m X
X n ∞
X
Aij f2k+1 (ui · vj )2k+1 ≤ (1 − f1 )SDPm+n (A).
i=1 j=1 k=1

6.2 Let G = (V, E) be a graph. A vector k-coloring of G is a collection of


unit vectors fu ∈ RV so that
1
fu · fv = − if {u, v} ∈ E.
k−1
(a) Show that if G is colorable with k colors, then it also has a vector
k-coloring.
(b) Find a connection between vector k-colorings and the theta number.
(c) Prove Lemma 6.1.4.
PART THREE
GEOMETRY
7
Optimizing with ellipsoids and determinants
(Version: May 24, 2022)

7.1 Convex spectral functions


In this section we shall provide a conceptional proof for the fact that the
function X 7→ −(det X)1/n is a convex function. For this we will first char-
acterize, in Section 7.1.1, all convex functions on symmetric matrices which
only depend on their eigenvalues. In Section 7.1.2 we apply this characteriza-
tion to the function X 7→ −(det X)1/n and derive Minkowski’s determinant
inequality.

7.1.1 Davis’ characterization of convex spectral functions


Definition 7.1.1 A convex spectral function is a convex function

F : S n → R ∪ {∞}

which only depends on the spectrum of the matrix X; the collection of its
eigenvalues λ1 (X), . . . , λn (X).

In other words, by the spectral theorem,

F (X) = F (AXAT ) for all A ∈ O(n).

Hence, there is a function f : Rn → R∪{∞} which defines F by the following


equation
F (X) = f (λ1 (X), . . . , λn (X)).

Note that this implies that the function f is symmetric, i.e. its value stays
the same if we permute its n arguments; it is invariant under permutation
of the variables.
154 Optimizing with ellipsoids and determinants (Version: May 24, 2022)

We shall prove that the function


(
n −(det X)1/n if X  0,
F : S → R ∪ {∞}, X 7→
∞ otherwise,
is an example of a convex spectral function; it will turn out to be convex
and it only depends on the eigenvalues of X as we have here
 
n
1/n
Q
− λi if all λi ≥ 0,

f (λ1 , . . . , λn ) = i=1

∞ otherwise.
The following theorem is due to Davis [1957]. It gives a complete charac-
terization of convex spectral functions.
Theorem 7.1.2 A function F : S n → R∪{∞} is a convex spectral function
if and only if the function f : Rn → R ∪ {∞} defined by
F (X) = f (λ1 (X), . . . , λn (X))
is symmetric and convex.
As a preparation for the proof we define two convex sets.
Definition 7.1.3 Let X ∈ S n be a symmetric matrix. The Schur-Horn
orbitope of X is defined as
SH(X) = conv{AXAT : A ∈ O(n)}.
Definition 7.1.4 Let (λ1 , . . . , λn ) ∈ Rn be a vector. The permutahedron
of (λ1 , . . . , λn ) is defined as the polytope
Π(λ1 , . . . , λn ) = conv{(λσ(1) , . . . , λσ(n) ) : σ ∈ Sn }
The permutahedron is a polytope lying in the hyperplane in which the
coordinates sum to λ1 +· · ·+λn . For example Π(1, 2, 3) is a hexagon. On the
other hand, the Schur-Horn orbitope is generally not a polytope. Neverthe-
less, projecting the Schur-Horn orbitope of X onto the diagonal coordinates
gives the permutahedron of the spectrum of X.
Theorem 7.1.5 Let X ∈ S n be a symmetric matrix and let λ1 (X), . . . , λn (X)
be its eigenvalues. Define the diagonal projection by
diag : S n → Rn , X 7→ (X11 , . . . , Xnn ).
Then
diag(SH(X)) = Π(λ1 (X), . . . , λn (X)).
7.1 Convex spectral functions 155

Proof The image diag(SH(X)) is convex because it is a linear image of a


convex set. Furthermore for every permutation σ ∈ Sn we have

(λσ(1) (X), . . . , λσ(n) (X)) ∈ diag(SH(X)), (7.1)

which follows from a spectral decomposition of X. Since the permutahe-


dron Π(λ1 (X), . . . , λn (X)) is the smallest convex set containing these vec-
tors in (7.1), we see that the image is contained in the permutahedron.
For the other inclusion we apply the theorem of Birkhoff and von Neu-
mann about the Birkhoff polytope, Theorem A.7.2. We perform a spectral
decomposition of X and get
n
X
X= λj uj uT
j.
j=1

with orthonormal basis u1 , . . . , un . For A ∈ O(n) define Y = AXAT . Then


 
Xn n
X  2
Yii = eT
i Y e i = e T
i λ j Au u
j j
T 
A e i = λ j e T
i Auj .
j=1 j=1

2
Set Sij = eT i Auj and verify that the matrix S = (Sij ) is doubly stochastic:
Its entries are clearly nonnegative, the rows sums are
n
X n 
X 2
Sij = eT
i Auj = kAuj k2 = 1
j=1 i=1

and similarly the column sums are also all equal to 1. By Theorem A.7.2 we
can write S as a convex combination of permutation matrices:
X X
S= ασ Pσ , with ασ ≥ 0, ασ = 1.
σ∈Sn σ∈Sn

We continue
n
X X X n
X X
Yii = λj ασ (Pσ )ij = ασ λj (Pσ )ij = ασ λσ−1 (i) ,
j=1 σ∈Sn σ∈Sn j=1 σ∈Sn

which yields
(Y11 , . . . , Ynn ) ∈ Π(λ1 , . . . , λn ).

Proof of Theorem 7.1.2 One implication follows without any work. Let F
be a convex spectral function. Then f is symmetric by definition. It is convex
156 Optimizing with ellipsoids and determinants (Version: May 24, 2022)

since F and f “coincide” on diagonal matrices. Let x = (x1 , . . . , xn ) and


y = (y1 , . . . , yn ) and t ∈ [0, 1] be given. Define the diagonal matrices
   
x1 y1
X=
 ..  and Y = 
  .. .

. .
xn yn
Then
f (tx + (1 − t)y) = F (tX + (1 − t)Y )
≤ tF (X) + (1 − t)F (Y )
= tf (x) + (1 − t)f (y).
The proof of the other implication is more interesting. We will show that
 
F (X) = max f diag(AXAT ) (7.2)
A∈O(n)

holds. From this representation it follows that F is convex because it is a


maximum of a family of convex functions
 
fA : S n → R ∪ {∞}, X 7→ f diag(AXAT ) , with A ∈ O(n),

and hence it is convex itself.


To prove (7.2) consider a spectral decomposition of X
 
λ1
X = A
 ..  T
A .
.
λn
Then
 
F (X) = f (λ1 , . . . , λn ) = f diag(AXAT )

Thus,
 
F (X) ≤ max f diag(AXAT ) .
A∈O(n)

Let us turn to the reverse inequality. We have


 
max f diag(AXAT ) ≤ max f (diag(Y )).
A∈O(n) Y ∈SH(X)

By Theorem 7.1.5
max f (diag(Y )) = max f (x).
Y ∈SH(X) x∈Π(λ1 ,...,λn )

The maximum is attained at a vertex of the permutahedron since we are


7.1 Convex spectral functions 157

maximizing a convex function over a polytope, see Section A.6. The vertices
of Π(λ1 , . . . , λn ) are of the form (λσ(1) , . . . , λσ(n) ) with σ ∈ Sn . Since f is
symmetric, the objective values of all vertices coincide and hence
max f (x) = f (λ1 , . . . , λn ) ≤ F (X)
x∈Π(λ1 ,...,λn )

so that we proved
 
max f diag(AXAT ) ≥ F (X)
A∈O(n)

and the statement of the theorem follows.

7.1.2 Minkowski’s determinant inequality


Corollary 7.1.6 The function
(
n −(det X)1/n if X  0,
F : S → R ∪ {∞}, X 7→
∞ otherwise,
is a convex function.
Proof Indeed, the function F is a convex spectral function since det(AXAT ) =
det(X) for all A ∈ O(n) because of the multiplicativity of the determinant.
Let λ1 , . . . , λn be the eigenvalues of X, then
n
!1/n
Y
F (X) = f (λ1 , . . . , λn ) = − λi .
i=1

The function f is convex because of the AM-GM inequality, see Corol-


lary A.2.8.
By applying Jensen’s inequality
 
1 1 1
F (X + Y ) ≤ F (X) + F (Y )
2 2 2
we derive Minkowski’s determinant inequality:
(det(X + Y ))1/n ≥ (det X)1/n + (det Y )1/n
which holds for all X, Y ∈ S+ n . Geometrically, this shows that in the cone

of positive semidefinite matrices, the set of matrices having determinant


greater or equal than a given constant is a convex set.
Whereas the function F is constant along a line segment [X, Y ] when
X, Y ∈ S++n are linearly dependent, it is strictly convex along those line
segments when X, Y are linearly independent.
158 Optimizing with ellipsoids and determinants (Version: May 24, 2022)

Lemma 7.1.7 Let X, Y ∈ S++ n be two positive definite matrices which are

linearly independent. Then the following strict inequality holds true

F ((1 − α)X + αY ) < (1 − α)F (X) + αF (Y ) for α ∈ (0, 1).

Proof Since X is strictly positive definite, it has a Cholesky decomposition


X = LLT where the matrix L is nonsingular. Then

(1 − α)X + αY = L((1 − α)I + αL−1 Y (L−1 )T )LT . (7.3)

The matrix L−1 Y (L−1 )T is positive definite; its eigenvalues we denote by


µ1 , . . . , µn . Because X and Y are linearly independent, the vectors (1, . . . , 1)
and (µ1 , . . . , µn ) are linearly independent as well. We continue our compu-
tation using (7.3) and basic properties of the determinant
 1/n
F ((1 − α)X + αY ) = − det X det((1 − α)I + αL−1 Y (L−1 )T )
n
!1/n
Y
1/n
= − (det X) ((1 − α) + αµi )
i=1
 !1/n !1/n 
n
Y n
Y
< − (det X)1/n  (1 − α) + αµi 
i=1 i=1
 1/n
= − (det X)1/n det(1 − α)I)1/n + det(αL−1 Y (L−1 )T )
= (1 − α)F (X) + αF (Y )

where the strict inequality follows from the addition to Corollary A.2.8.

Another useful consequence of the AM-GM inequality is the following


inequality between the trace and the determinant of a positive semidefinite
matrix.

Lemma 7.1.8 n be a positive semidefinite matrix. Then the


Let X ∈ S+
inequality
Tr(X) − n (det X)1/n ≥ 0

holds where equality holds if and only if X is a nonnegative multiple of the


identity.

Proof Without loss of generality we can assume that the matrix X is a


diagonal matrix. Then the inequality we want to prove is simply a rewriting
of the AM-GM inequality, see Theorem A.2.6.
7.2 Determinant maximization problems 159

In some cases we want to apply the previous lemma to products XY


of positive semidefinite matrices. However, this is not immediately possible
since the product of two symmetric matrices is generally not even symmetric
again:
    
2 1 2 −1 3 1
= .
1 2 −1 3 0 5

Nevertheless using the Cholesky factorization trick as already discussed in


the proof of Lemma 7.1.7 it is possible to rescue the statement.

Corollary 7.1.9 n positive semidefinite matrices. Then the


Let X, Y ∈ S+
inequality
Tr(XY ) − n det(XY )1/n ≥ 0

holds. Equality holds if and only if XY is a nonnegative multiple of the


identity.

Proof Consider a Cholesky factorization of X = LLT . Then

Tr(XY ) = Tr(LT Y L)

and the matrix LT Y L is positive semidefinite, thus Lemma 7.1.8 applies. So

0 ≤ Tr(LT Y L) − n det(LT Y L)1/n = Tr(XY ) − n det(XY )1/n .

We have equality if and only if LT Y L is a nonnegative multiple of the


identity. If L is singular, this implies that XY = 0, if L is nonsingular this
implies that XY = αI for some nonnegative scalar α.

7.2 Determinant maximization problems


Now we develop determinant maximization in the framework of conic pro-
gramming. All in all it quite similar to semidefinite programming.
We start by defining the relevant convex cone.

Definition 7.2.1 Define the max det cone by

Dn+1 = {(X, s) ∈ S n × R : X  0, s ≥ 0, (det X)1/n ≥ s}.

The elements of the max det cone have two components and in this way
it is similar to the Lorentz cone Ln+1 .

Theorem 7.2.2 The max det cone Dn+1 is a proper convex cone.
160 Optimizing with ellipsoids and determinants (Version: May 24, 2022)

Proof We first verify that Dn+1 is a convex cone. For this let α ≥ 0,
(X, s), (Y, t) ∈ Dn+1 be given. Then α(X, s) ∈ Dn+1 because
αX  0, αs ≥ 0, det(αX)1/n = α(det X)1/n ≥ αs.
Also (X, s) + (Y, t) ∈ Dn+1 because
X + Y  0, s + t ≥ 0, (det(X + Y ))1/n ≥ (det X)1/n + (det Y )1/n ≥ s + t
where we used Minkowski’s determinant inequality.
The max det cone is pointed: Suppose (X, s) and −(X, s) both lie in Dn+1 ,
then X = 0 because S+ n is pointed and s = 0 because R is pointed.
+
It is full-dimensional since an open neighborhood of (I, 1/2) is contained
in Dn+1 .
It is closed because S+n is closed, Rn is closed, and the function g(X, s) =

(det X)1/n − s is continuous function and so the preimage of the closed


interval [0, ∞) is closed.
Definition 7.2.3 The max det problem in primal standard form is defined
as

    p* = sup  [(C, c), (X, s)]
              (X, s) ∈ D^{n+1}                                          (7.4)
              [(A_j, a_j), (X, s)] = b_j,  j = 1, . . . , m.

Here C, A_1, . . . , A_m are symmetric matrices and c, a_1, . . . , a_m, b_1, . . . , b_m
are real numbers. Furthermore, the inner product on S^n × R is defined by

    [(X, s), (Y, t)] = ⟨X, Y⟩ + st = Tr(XY) + st.

If (C, c) = (0, 1) and a_j = 0 for j ∈ [m] then the above problem simplifies
to

    p* = sup{s : X ⪰ 0, (det X)^{1/n} ≥ s, ⟨A_j, X⟩ = b_j (j ∈ [m])}
       = sup{(det X)^{1/n} : X ⪰ 0, ⟨A_j, X⟩ = b_j (j ∈ [m])},

so that it becomes apparent that we aim at maximizing the determinant of
a positive semidefinite matrix subject to linear equality constraints.
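The following small sketch illustrates this specialization numerically. It uses
the CVXPY modeling package (an assumption, not part of these notes) and
replaces the objective (det X)^{1/n} by the equivalent log det X; the example
constraints simply fix the diagonal of X.

    import cvxpy as cp
    import numpy as np

    # maximize (det X)^{1/n} over psd X subject to <E_ii, X> = d_i
    n = 3
    d = np.array([1.0, 2.0, 3.0])

    X = cp.Variable((n, n), PSD=True)
    constraints = [X[i, i] == d[i] for i in range(n)]
    prob = cp.Problem(cp.Maximize(cp.log_det(X)), constraints)
    prob.solve()

    # by Hadamard's inequality the optimum is attained at X = Diag(d),
    # so (det X)^{1/n} should come out as (d_1 d_2 d_3)^{1/3}
    print(np.exp(prob.value / n), np.prod(d) ** (1.0 / n))

Here the diagonal constraints play the role of the data A_j, b_j above.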
Theorem 7.2.4 The dual of the max det cone is

    (D^{n+1})^* = { (Y, t) ∈ S^n × R : Y ⪰ 0, (det Y)^{1/n} ≥ −t/n }.

Proof Given (Y, t) ∈ S^n × R with Y ⪰ 0 and (det Y)^{1/n} ≥ −t/n, for
(X, s) ∈ D^{n+1} we have by Corollary 7.1.9

    [(X, s), (Y, t)] = Tr(XY) + st ≥ n (det(XY))^{1/n} + st ≥ ns(−t/n) + st = 0,

and so (Y, t) ∈ (D^{n+1})^*.
For the reverse inclusion consider (Y, t) ∈ (D^{n+1})^*. For (X, 0) ∈ D^{n+1} we
have

    0 ≤ [(X, 0), (Y, t)] = ⟨X, Y⟩,

and because the psd cone is self-dual, Y ⪰ 0 follows. Now consider
(X, (det X)^{1/n}) ∈ D^{n+1} with X ≻ 0. Then

    0 ≤ [(X, (det X)^{1/n}), (Y, t)] = Tr(XY) + (det X)^{1/n} t,

and therefore

    −t ≤ Tr(XY) / (det X)^{1/n}.

We minimize the right hand side as a function depending on X.

First case: Y is positive definite. Then the minimum is attained at X = Y^{−1}
    and it is equal to n (det Y)^{1/n}. Indeed, by Corollary 7.1.9 we have

        Tr(XY) / (det X)^{1/n} ≥ n (det X)^{1/n} (det Y)^{1/n} / (det X)^{1/n} = n (det Y)^{1/n}

    with equality if and only if XY = αI for some α > 0. Since the
    fraction to minimize is invariant under scaling we can choose α = 1.
Second case: Y is not positive definite, but only positive semidefinite. Then
    the minimum is not attained but the infimum is zero. Indeed let
    u_1 ∈ R^n be a unit vector such that u_1^T Y u_1 = 0. Complete u_1 to an
    orthonormal basis u_1, . . . , u_n of R^n and define

        X = u_1 u_1^T + ε Σ_{i=2}^n u_i u_i^T

    for some ε > 0. Then

        Tr(XY) / (det X)^{1/n} = (u_1^T Y u_1 + ε Σ_{i=2}^n u_i^T Y u_i) / ε^{(n−1)/n}
                               ≤ (0 + ε (n − 1) λ_max(Y)) / ε^{(n−1)/n}
                               = ε^{1/n} (n − 1) λ_max(Y) → 0   as ε → 0,

    and so t ≥ 0.

Definition 7.2.5 The max det problem in dual standard form is given as

    d* = inf  b^T y
              y ∈ R^m                                                   (7.5)
              Σ_{j=1}^m y_j (A_j, a_j) − (C, c) ∈ (D^{n+1})^*.

Specializing to (C, c) = (0, 1) and a_j = 0 for all j ∈ [m] gives

    d* = inf  b^T y
              y ∈ R^m
              Σ_{j=1}^m y_j A_j − C ⪰ 0
              (det(Σ_{j=1}^m y_j A_j − C))^{1/n} ≥ 1/n.

Of course, the duality theory developed in Chapter 2.4 also holds for
max det problems. Often the following optimality condition is useful when
analyzing max det problems.

Theorem 7.2.6 Suppose there is no duality gap between the primal max
det problem (7.4) and its dual (7.5); p* = d*. Suppose (X, s) is feasible
for (7.4) and y ∈ R^m is feasible for (7.5). Then (X, s) is optimal for (7.4)
and y ∈ R^m is optimal for (7.5) if and only if the following three conditions
are satisfied:

(i) X (Σ_{j=1}^m y_j A_j − C) = αI for some α ≥ 0,
(ii) (det X)^{1/n} = s,
(iii) (det(Σ_{j=1}^m y_j A_j − C))^{1/n} = −(Σ_{j=1}^m y_j a_j − c)/n.

Proof We specialize the optimality condition of Theorem 2.4.1 and get

    0 = [(X, s), (Σ_{j=1}^m y_j A_j − C, Σ_{j=1}^m y_j a_j − c)]
      = Tr(X (Σ_{j=1}^m y_j A_j − C)) + s (Σ_{j=1}^m y_j a_j − c)
      ≥ n (det(X (Σ_{j=1}^m y_j A_j − C)))^{1/n} + s (Σ_{j=1}^m y_j a_j − c)
      ≥ ns (−(Σ_{j=1}^m y_j a_j − c)/n) + s (Σ_{j=1}^m y_j a_j − c)
      = 0.

So the two inequalities are tight. The first tight inequality together with
Corollary 7.1.9 implies condition (i), the second tight inequality implies
conditions (ii) and (iii).

7.3 Approximating polytopes by ellipsoids


Ellipsoids are important geometric objects partially due to their simple de-
scriptions. They can be used for instance to approximate other more com-
plicated convex sets. Here we will use ellipsoids to approximate polytopes.
In particular we will answer the questions:

(i) Inner approximation: How can we determine an ellipsoid contained in a


polytope which has largest volume?
(ii) Outer approximation: How can we determine an ellipsoid containing a
polytope which has smallest volume?
(iii) Can we estimate the quality of this inner and of this outer approximation?

Recall from Chapter 3.1 that one can represent an ellipsoid by a positive
definite matrix A ∈ S^n_{++} and a vector x ∈ R^n either implicitly by a strictly
convex inequality

    E(A, x) = {y ∈ R^n : (y − x)^T A^{−1} (y − x) ≤ 1}

or explicitly as the affine image of the unit ball

    E(A^2, x) = {Ay + x : y ∈ B^n}.

The volume of E(A, x) equals det A · vol B^n. In the following the volume of
the unit ball is just a dimension dependent factor which does not play any
further role.
Also recall that one can represent a polytope in two ways: explicitly, as
a convex hull of finitely many points

    P = conv{x_1, . . . , x_N} ⊆ R^n,

or implicitly as a bounded intersection of finitely many halfspaces

    P = {y ∈ R^n : a_1^T y ≤ b_1, . . . , a_m^T y ≤ b_m}.

7.3.1 Outer approximation


To formulate the condition that an ellipsoid contains a polytope we will
use the implicit representation of the ellipsoid and the explicit representation
of the polytope.
Proposition 7.3.1 The ellipsoid E(A, x) contains the polytope

    P = conv{x_1, . . . , x_N}

if and only if the (n + 1) × (n + 1)-matrix

    ( s   d^T    )
    ( d   A^{−1} )

is positive semidefinite with d = A^{−1} x and the inequality

    x_i^T A^{−1} x_i − 2 x_i^T d + s ≤ 1

holds for all i ∈ [N].


Proof For verifying the condition P ⊆ E(A, x) it suffices to consider only
the points x_i, with i ∈ [N]. The point x_i lies in the ellipsoid E(A, x) if and
only if

    (x_i − x)^T A^{−1} (x_i − x) ≤ 1
    ⟺ x_i^T A^{−1} x_i − 2 x_i^T A^{−1} x + x^T A^{−1} x ≤ 1
    ⟺ x_i^T A^{−1} x_i − 2 x_i^T d + d^T A d ≤ 1
    ⟺ x_i^T A^{−1} x_i − 2 x_i^T d + s ≤ 1,

where d = A^{−1} x and s ≥ d^T A d. Because the matrix A is positive definite
we can express s ≥ d^T A d using the Schur complement, see Lemma B.2.1, as

    ( s   d^T    )
    ( d   A^{−1} )  ⪰ 0.

The constraint

    x_i^T A^{−1} x_i − 2 x_i^T d + s ≤ 1

can be expressed by

    ⟨ [[1, x_i^T], [x_i, x_i x_i^T]], [[s, d^T], [d, A^{−1}]] ⟩ ≤ 1.
Our goal is now to find a best ellipsoidal outer approximation of the
polytope P. That is an ellipsoid which contains P and whose volume is as
small as possible. We can find such an ellipsoid by a conic program

    sup  (det A^{−1})^{1/n}
         [[s, d^T], [d, A^{−1}]] ∈ S^{n+1}_+
         x_i^T A^{−1} x_i − 2 x_i^T d + s ≤ 1,  i ∈ [N].

Note here that maximizing (det A^{−1})^{1/n} is equivalent to minimizing (det A)^{1/n}.
Rewriting this program into primal standard form yields

    sup  t
         ((B, t), Y, u) ∈ D^{n+1} × S^{n+1}_+ × R^N_+
         ⟨E_ij, B⟩ + ⟨−E_{i+1,j+1}, Y⟩ = 0,  1 ≤ i ≤ j ≤ n                (7.6)
         ⟨ [[1, −x_i^T], [−x_i, x_i x_i^T]], Y ⟩ + u_i = 1,  i ∈ [N].

Its dual max det problem is

    inf  Σ_{i=1}^N y_i
         ( Σ_{i,j} z_ij E_ij, −1 ) ∈ (D^{n+1})^*
         −Σ_{i,j} z_ij E_{i+1,j+1} + Σ_{i=1}^N y_i [[1, −x_i^T], [−x_i, x_i x_i^T]] ∈ S^{n+1}_+     (7.7)
         y_i ≥ 0,  i ∈ [N].

Duality theory implies that there is no duality gap between primal and
dual and that the optimal values are attained whenever the polytope is full
dimensional.
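Before turning to strict feasibility, here is a small computational sketch of the
outer approximation. It uses CVXPY (an assumption, not part of these notes)
and the equivalent parametrization E = {y ∈ R^n : ‖My + c‖ ≤ 1} with M ≻ 0,
for which A = M^{−2} and x = −M^{−1}c in the notation E(A, x) above; minimizing
the volume then amounts to maximizing log det M.

    import cvxpy as cp
    import numpy as np

    pts = np.array([[1.0, 1], [1, -1], [-1, 1], [-1, -1]])   # points x_1, ..., x_N
    n, N = 2, len(pts)

    M = cp.Variable((n, n), PSD=True)    # M = A^{-1/2}, determines the shape
    c = cp.Variable(n)                   # c = -M x, determines the center
    cons = [cp.norm(M @ pts[i] + c) <= 1 for i in range(N)]
    prob = cp.Problem(cp.Maximize(cp.log_det(M)), cons)
    prob.solve()

    center = -np.linalg.solve(M.value, c.value)
    A = np.linalg.inv(M.value @ M.value)
    # for the square with vertices (±1, ±1) the optimal ellipsoid is the
    # circle of radius sqrt(2): center (0, 0) and A = 2 I
    print(center, A)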
Lemma 7.3.2 If dim P = n, then both programs are strictly feasible.

Proof First we construct a strictly feasible solution of the primal (7.6).
Since P is a polytope, it is a bounded set. So there is a ball B(x, r) with
center x and radius r which contains P. We can make r so large that no
vertices of P lie on the boundary of B(x, r). Let ε > 0 be sufficiently small.
Then

    B = (1/r^2) I,  t = ε,  Y = [[s, d^T], [d, B]],  d = Bx,  s = d^T B^{−1} d + ε,
    u_i = 1 − (x_i^T B x_i − 2 x_i^T d + s),  i ∈ [N]

is a strictly feasible solution of the primal.


Then we exhibit a strictly feasible solution of the dual (7.7). Since dim P =
n we may assume, after performing an affine transformation, that

    x_1 = e_1, x_2 = e_2, . . . , x_n = e_n, x_{n+1} = 0

holds. Choose z_ij so that Σ_{i,j} z_ij E_ij = I; then (I, −1) lies in the interior
of the dual cone (D^{n+1})^*. For ε > 0 set

    y_1 = y_2 = · · · = y_n = 2,  y_{n+1} = 2n + ε,  y_{n+2} = · · · = y_N = ε.

Then

    −Σ_{i,j} z_ij E_{i+1,j+1} + Σ_{i=1}^N y_i [[1, −x_i^T], [−x_i, x_i x_i^T]]
    = Σ_{i=1}^n ( 2 (1, −e_i)(1, −e_i)^T − (0, e_i)(0, e_i)^T )
      + [[2n + ε, 0], [0, 0]]
      + ε Σ_{i=n+2}^N (1, −x_i)(1, −x_i)^T.

Here the third summand is positive semidefinite and the first two summands
together add up to the matrix [[4n + ε, −2e^T], [−2e, I]], where e denotes the
all-ones vector, which is positive definite as one can see from the Schur
complement, see Lemma B.2.1, because

    4n + ε − (−2e)^T I (−2e) = 4n + ε − 4n = ε > 0.

7.3.2 Inner approximation


To formulate the condition that an ellipsoid is contained in a polytope we will
use the explicit representation of the ellipsoid and the implicit representation
of the polytope.

Proposition 7.3.3 The ellipsoid E(A^2, x) is contained in the polytope

    P = {y ∈ R^n : a_1^T y ≤ b_1, . . . , a_m^T y ≤ b_m}

if and only if the inequality

    ‖A a_j‖ ≤ b_j − a_j^T x

holds for all j ∈ [m].

Proof All points in the unit ball y ∈ B^n satisfy the inequality

    a_j^T (Ay + x) ≤ b_j

if and only if

    max{(A a_j)^T y : y ∈ B^n} ≤ b_j − a_j^T x.

To find the maximum we use the Cauchy-Schwarz inequality and get

    max{(A a_j)^T y : y ∈ B^n} = (A a_j)^T (A a_j / ‖A a_j‖) = ‖A a_j‖.

Using the second order cone we can model the inequality ‖A a_j‖ ≤ b_j − a_j^T x
as (A a_j, b_j − a_j^T x) ∈ L^{n+1}.

Now a best inner ellipsoidal approximation of the polytope P is an ellip-
soid E(A^2, x) with E(A^2, x) ⊆ P and vol E(A^2, x) = det A · vol B^n maximal.
To find such an ellipsoid we can formulate the conic program

    sup  s
         (A, s) ∈ D^{n+1},  x ∈ R^n
         (A a_j, b_j − a_j^T x) ∈ L^{n+1},  j ∈ [m],

which in primal standard form equals

    sup  s
         (A, s) ∈ D^{n+1},  x ∈ R^n,  (y_j, t_j) ∈ L^{n+1},  j ∈ [m]      (7.8)
         y_j = A a_j,  t_j = b_j − a_j^T x,  j ∈ [m].
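A small CVXPY sketch of this inner approximation for the unit square
{x ∈ R^2 : 0 ≤ x_1, x_2 ≤ 1} follows. CVXPY is an assumption, not part of
these notes, and the objective (det A)^{1/n} of (7.8) is replaced by the
equivalent log det A.

    import cvxpy as cp
    import numpy as np

    a = np.array([[1.0, 0], [0, 1], [-1, 0], [0, -1]])   # rows a_j^T
    b = np.array([1.0, 1, 0, 0])
    n, m = 2, len(b)

    A = cp.Variable((n, n), PSD=True)    # shape matrix of E(A^2, x)
    x = cp.Variable(n)                   # center of the ellipsoid
    cons = [cp.norm(A @ a[j]) <= b[j] - a[j] @ x for j in range(m)]
    prob = cp.Problem(cp.Maximize(cp.log_det(A)), cons)
    prob.solve()

    # the maximum volume inscribed ellipse of the unit square is the circle
    # with center (1/2, 1/2) and radius 1/2, i.e. A = (1/2) I
    print(x.value, A.value)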

7.3.3 The Löwner-John ellipsoids


Using Proposition 7.3.3 and Proposition 7.3.1 one can find an ellipsoid of
largest volume contained in a polytope P as well as an ellipsoid of smallest
volume containing P by solving determinant maximization problems. Recall
Lemma 7.1.7: the objective function X ↦ −(det X)^{1/n} is strictly convex
along line segments [X, Y] when X, Y ∈ S^n_{++} are linearly independent.
Therefore both determinant maximization problems have unique solutions.
There is exactly one ellipsoid of largest volume contained in P and exactly
one ellipsoid of smallest volume containing P.
These two ellipsoids are called the Löwner-John ellipsoids¹ of P. We denote
by E_in(P) the ellipsoid giving the optimal inner approximation of P and by
E_out(P) the ellipsoid giving the optimal outer approximation of P.
The following theorem can be traced back to John (1948). Historically, it is
considered to be one of the first theorems involving an optimality condition
for nonlinear optimization.

Theorem 7.3.4 Let P ⊆ R^n be a polytope with dim P = n.

(i) The Löwner-John ellipsoid E_out(P) is the closed unit ball B^n = {x ∈
    R^n : ‖x‖ ≤ 1} if and only if P ⊆ B^n and if there are positive numbers
    λ_1, . . . , λ_M > 0 and vertices x_1, . . . , x_M of P so that the following three
    conditions hold:
    (a) ‖x_i‖ = 1 for i ∈ [M],
    (b) Σ_{i=1}^M λ_i x_i = 0,
    (c) Σ_{i=1}^M λ_i x_i x_i^T = I.
(ii) The Löwner-John ellipsoid E_in(P) is the closed unit ball B^n if and only
    if B^n ⊆ P and if there are positive numbers λ_1, . . . , λ_M > 0 and points
    x_1, . . . , x_M ∈ ∂P on the boundary of P so that the above three conditions
    hold.

Before we give the proof we comment on the optimality conditions. By
applying an affine transformation we can always assume that the Löwner-
John ellipsoids are unit balls, see Exercise 7.11.
The second optimality condition makes sure that not all the vectors x_1, . . . , x_M
lie on one side of the sphere. The third optimality condition shows that the
1 In the literature the terminology seems to differ from author to author.

vectors behave similarly to an orthonormal basis, in the sense that we can
compute the inner product of two vectors x and y by

    x^T y = Σ_{i=1}^M λ_i (x_i^T x)(x_i^T y).

Both optimality conditions together imply that M ≥ n + 1.


Proof Sufficiency in the first statement is an application of weak duality.
Consider the dual conic program (7.7) and label the vertices of P accordingly.
Define

    (y_1, . . . , y_M, y_{M+1}, . . . , y_N) = (λ_1/n, . . . , λ_M/n, 0, . . . , 0)

and

    Σ_{i,j} z_ij E_ij = (1/n) I.

This is a feasible solution of the dual.
Consider the primal conic program (7.6). Define

    (B, t) = (I, 1)   and   Y = [[0, 0], [0, I]].

This is a feasible solution of the primal because P ⊆ B^n. The objective value
of the primal feasible solution equals 1. The objective value of the dual also
equals (1/n) Σ_{i=1}^M λ_i = 1 because by the third optimality condition

    Σ_{i=1}^M λ_i = Σ_{i=1}^M λ_i x_i^T x_i = Tr(Σ_{i=1}^M λ_i x_i x_i^T) = Tr(I) = n.

Necessity in the first statement is an application of strong duality. Suppose
that B^n = E_out(P) holds. Clearly, P ⊆ B^n. Let y_1^*, . . . , y_N^* be an optimal
solution of the dual program. After reordering we may assume that

    y_1^* > 0, . . . , y_M^* > 0,  y_{M+1}^* = 0, . . . , y_N^* = 0.

By assumption

    (B, t) = (I, 1)   and   Y = [[0, 0], [0, I]]

is an optimal solution of the primal. By complementary slackness we know
u_i = 0 for i ∈ [M]. Hence

    ⟨ [[1, −x_i^T], [−x_i, x_i x_i^T]], [[0, 0], [0, I]] ⟩ = 1   for i ∈ [M]

and we arrive at the first optimality condition ‖x_i‖ = 1 for i ∈ [M]. Let
Z^* = Σ_{i,j} z_ij^* E_ij be an optimal solution of the dual. Then by the optimality
condition for max det problems, Theorem 7.2.6, there is a positive scalar
α > 0 so that

    I Z^* = αI   and   (det Z^*)^{1/n} = 1/n.

Hence, Z^* = (1/n) I. Let Y^* = [[0, 0], [0, I]] be the optimal solution of the
primal from above. Then

    ⟨ Y^*, −Σ_{i,j} z_ij^* E_{i+1,j+1} + Σ_i y_i^* [[1, −x_i^T], [−x_i, x_i x_i^T]] ⟩ = 0,

and since both matrices in this inner product are positive semidefinite their
product must be the zero matrix. Because generally

    [[0, 0], [0, I]] [[A, B], [C, D]] = [[0, 0], [C, D]],

this forces

    Σ_i y_i^* [[1, −x_i^T], [−x_i, x_i x_i^T]] = [[ ∗ , 0], [0, (1/n) I]].

Define λ_i = n y_i^*; then the second and third optimality conditions follow from
the equality above.
The second statement about E_in(P) follows from the first statement about
E_out(P) by considering the polar polytope

    P° = {y ∈ R^n : x^T y ≤ 1 for all x ∈ P},

see Exercise 7.13.

These optimality conditions are helpful in surprisingly many situations. For
example one can use them to prove an estimate on the quality of the inner
and outer approximation.
Corollary 7.3.5 Let P ⊆ R^n be an n-dimensional polytope, then there is
an invertible affine transformation T so that

    B^n = T E_in(P) ⊆ T P ⊆ n T E_in(P) = n B^n

holds.

Proof It is clear that we can map E_in(P) to the unit ball by an invertible
affine transformation, see Exercise 7.11. So we can use the equations

    Σ_{i=1}^M λ_i x_i = 0   and   Σ_{i=1}^M λ_i x_i x_i^T = I

to show T P ⊆ n B^n. By taking the trace on both sides of the second
equation we also have

    Σ_{i=1}^M λ_i = n.

The supporting hyperplane through the boundary point x_i of P is orthogonal
to the unit vector x_i (draw a figure). Hence,

    B^n ⊆ P ⊆ Q = {x ∈ R^n : x_i^T x ≤ 1 (i ∈ [M])}.

Let x be in Q, then because x^T x_i ∈ [−‖x‖, 1] we have

    0 ≤ Σ_{i=1}^M λ_i (1 − x^T x_i)(‖x‖ + x^T x_i)
      = ‖x‖ Σ_{i=1}^M λ_i + (1 − ‖x‖) Σ_{i=1}^M λ_i x^T x_i − Σ_{i=1}^M λ_i (x^T x_i)^2
      = ‖x‖ n + 0 − ‖x‖^2,

and so ‖x‖ ≤ n.
Similarly, we have

    (1/n) B^n = (1/n) T E_out(P) ⊆ T P ⊆ T E_out(P) = B^n

for a suitable affine transformation T, which we can derive from the previous
corollary by considering the polar polytope.
If P is centrally symmetric, i.e. if P = −P holds, then in the above
inequalities the factor n can be improved to √n, see Exercise 7.14.
Another nice mathematical application of the uniqueness of the Löwner-John
ellipsoids is the following.
Proposition 7.3.6 Let P be a polytope and consider the group G of all
affine transformations which map P into itself. Then there is an affine trans-
formation T so that T GT −1 is a subgroup of the orthogonal group.
Proof Since the volume is invariant under affine transformations with de-
terminant equal to 1 or −1 (only those affine transformations can be in G)
and since the Löwner-John ellipsoid is the unique maximum volume ellipsoid
contained in a polytope we have
AEin (P ) = Ein (AP ) = Ein (P )
for all A ∈ G.

Let T be the affine transformation which maps the Löwner-John ellipsoid
E_in(P) to the unit ball B^n. Then for every A ∈ G

    T A T^{−1} B^n = T A E_in(P) = T E_in(P) = B^n.

So T A T^{−1} leaves the unit ball invariant, hence it is an orthogonal transfor-
mation.

7.4 Further reading


Davis’ characterization together with Fan’s theorem, Theorem 1.2.2, can
be used to determine an explicit linear matrix inequality modeling the
condition F (X) ≤ t for many functions F . See Ben-Tal and Nemirovski
[2001][Proposition 4.2.1] for the complete statement. A similar argument
also works for functions depending only on singular values. For more in-
formation on convex spectral functions and general eigenvalue optimization
problems the survey Lewis [2003] is a good start.
Many examples of determinant maximization problems are in Vanden-
berghe, Boyd, and Wu [1998]. They treat matrix completion problems, risk-
averse linear estimation, experimental design, maximum likelihood estima-
tion of structured covariance matrices, and Gaussian channel capacity. Next
to this, they also develop the duality theory and an interior point algorithm
for determinant maximization problems.
In fact one can reformulate determinant maximization problems as semidef-
inite programs, but dealing directly with determinant maximization problems
is generally far easier and more efficient.
The Löwner-John ellipsoid is an important and useful concept in geometry,
optimization, and functional analysis, see Henk [2012] for a summary which
also includes historical remarks. For instance, Lenstra’s polynomial time
algorithm for solving integer programs in fixed dimension is based on it.
Another excellent and very elegant source on applications of the Löwner-
John ellipsoid in geometry and functional analysis is by Ball [1997]. He
uses John’s optimality criterion to give a reverse isoperimetric inequality
(the ratio between surface and volume is maximized by cubes) and to prove
Dvoretsky’s theorem (high dimensional convex bodies have almost ellip-
soidal slices). The proof of the second part of Theorem 7.3.4 (b) is from the
beautiful note Ball [1992].
Many more examples of computing ellipsoidal approximations are in Ben-
Tal and Nemirovski [2001][Chapter 4.9], especially ellipsoidal approxima-
tions of unions and intersections of ellipsoids and approximating sums of
ellipsoids.

Exercises
7.1 Let X, Y ∈ S^n be symmetric matrices. Use Theorem 7.1.5 to determine
    the minimum

        min{⟨X, A Y A^T⟩ : A ∈ O(n)}.

7.2 Show that the sum of the largest k eigenvalues of a symmetric matrix
is a convex spectral function.
7.3 Use Davis’ characterization of convex and spectral functions to show:
    The function

        F : S^n → R ∪ {∞},   F(X) = −ln det X if X ≻ 0, and F(X) = ∞ otherwise,

    is convex and spectral.
7.4 For which values of k ∈ Z is the function

        Φ_k : S^n → R,   X ↦ Tr(X^k)

a convex spectral function?


7.5 Let C ∈ S^n be a symmetric matrix and let G = (V, E) be a graph with
    vertex set V = [n]. The solution of the following max det problem

        max  (det(C + Σ_{{i,j}∈E} x_ij E_ij))^{1/n}
             C + Σ_{{i,j}∈E} x_ij E_ij ∈ S^n_{≻0}

    is said to be a G-modification of C with maximal entropy. Show: If a
    G-modification of C with maximal entropy A^* = C + Σ_{{i,j}∈E} x^*_ij E_ij
    exists, then

        ∀{i, j} ∈ E : ((A^*)^{−1})_ij = 0.

7.6 Let

        P = {x ∈ R^n : a_j^T x ≤ b_j (j ∈ [m])}

    be an n-dimensional polytope. Formulate the following optimization
    problem as a conic program in primal standard form: Find the largest
    volume of an axis-parallel parallelepiped R,

        R = {x ∈ R^n : α_1 ≤ x_1 ≤ β_1, . . . , α_n ≤ x_n ≤ β_n},

    with R ⊆ P.

7.7 Determine the dual conic program of (7.8) and prove that there is no
    duality gap between primal and dual.
7.8 Let P ⊆ Rn be an n-dimensional polytope and let A ∈ Rn×n be an
invertible matrix. Show:
Eout(AP ) = AEout (P ).
7.9 Determine the Löwner-John ellipsoid Ein (Cn ) of the regular n-gon Cn
in the plane
Cn = conv{(cos(2πk/n), sin(2πk/n)) ∈ R2 : k = 0, 1, . . . , n − 1}.
7.10 Let P be the polytope

         P = conv{ (±e_i ± e_j)/√2 : i, j = 1, . . . , 4, i ≠ j } ⊆ R^4,

     which has 2^2 · \binom{4}{2} = 24 vertices. Show that E_out(P) = B^4 holds.


7.11 Let P ⊆ Rn be an n-dimensional polytope and let A ∈ Rn×n be an


invertible matrix. Show:
Eout (AP ) = AEout (P ).
7.12 Let T ⊆ R^n be a regular simplex with inradius 1. Show that E_in(T) =
     B^n and that B^n ⊆ T ⊆ n B^n holds.
7.13 Prove part (ii) of Theorem 7.3.4.
7.14 Let P be a centrally symmetric convex polytope. Show the following
     strengthening of Corollary 7.3.5: There is an invertible affine transfor-
     mation T so that

         B^n ⊆ T P ⊆ √n B^n

     holds.
7.15 Let P ⊆ Rn be a centrally symmetric polytope (P = −P ). Find a conic
program (with possibly infinitely many constraints) which determines
the minimal value ρ ∈ R such that there exists an ellipsoid E for which
E ⊆ P ⊆ ρE holds.
8
Euclidean embeddings: Low dimension

In many situations one is interested in finding solutions to semidefinite pro-


grams having a small rank. For instance, if the semidefinite program arises
as relaxation of a combinatorial optimization problem - like the maximum
cut problem or the maximum stable set problem discussed earlier in Part II
- then its rank one solutions correspond to the solutions of the underlying
combinatorial problem, which are the ones one is really interested in. As
other examples, finding an embedding of a weighted graph in the Euclidean
space of dimension d, or finding a sum of squares decomposition of a poly-
nomial with d squares, amounts to finding a solution of rank at most d to
some semidefinite program. As a last example, the minimum dimension of
an orthonormal representation of a graph G is given by the minimum rank
of a positive semidefinite matrix with nonzero diagonal entries and with zero
entries at the positions corresponding to the non-edges of G.

This chapter is organized as follows. First we show some upper bounds


on the smallest possible rank of solutions to semidefinite programs. For this
we have to look in some detail into the geometry of the faces of the cone
of positive semidefinite matrices. Then we discuss several applications: Eu-
clidean embeddings of weighted graphs, hidden convexity results for images
of quadratic maps, and the S-lemma which deals with quadratic inequalities.
We also discuss complexity issues related to the problem of determining the
smallest possible rank of solutions to semidefinite programs.

8.1 Geometry of the positive semidefinite cone


8.1.1 Faces of convex sets
We begin with recalling some preliminary facts about faces of convex sets
which we will use to study the faces of the positive semidefinite cone S^n_+.

See also Appendix A for details.


Let K be a convex set in Rn . A set F ⊆ K is called a face of K if for all
x ∈ F the following holds:

x = ty + (1 − t)z with t ∈ (0, 1), y, z ∈ K =⇒ y, z ∈ F.

Clearly any intersection of faces is again a face. Hence, for x ∈ K, the


smallest face containing x is well defined (as the intersection of all the faces
of K that contain x), let us denote it by FK (x).
Recall that FK (x) is the unique face of K which contains x in its relative
interior (see Lemma A.5.1). In particular, x lies in the relative interior of
K precisely when FK (x) = K. Moreover, x is an extreme point of K, i.e.,

x = ty + (1 − t)z with y, z ∈ K and t ∈ (0, 1) =⇒ y = z = x,

precisely when FK (x) = {x}. Recall also that if K does not contain a line
then it has at least one extreme point.
A point z ∈ R^n is called a perturbation of x ∈ K if x ± εz ∈ K for some ε > 0;
then the whole segment [x − εz, x + εz] is contained in the face F_K(x).
The set of perturbations of x ∈ K is a linear space, which we denote by
PK (x), and whose dimension is equal to the dimension of the face FK (x).

8.1.2 Faces of the positive semidefinite cone


Here we describe the faces of the positive semidefinite cone S^n_+. For the
nonnegative orthant R^n_+ it is easy to see that its faces are obtained by fixing
some coordinates to zero and thus each face of R^n_+ can be identified to a
smaller nonnegative orthant R^r_+ for some integer 0 ≤ r ≤ n. It turns out
that an analogous property holds for the faces of S^n_+, namely each face of
S^n_+ can be identified to a smaller semidefinite cone S^r_+ for some 0 ≤ r ≤ n
(see Hill, Waters [1987]).

Proposition 8.1.1 Let A ∈ S^n_+, r = rank(A), and let F(A) = F_{S^n_+}(A) de-
note the smallest face of S^n_+ containing A. Let u_1, · · · , u_n be an orthonormal
set of eigenvectors of A, where u_1, · · · , u_r correspond to its nonzero eigen-
values, and let U (resp., U_0) be the matrix with columns u_1, · · · , u_n (resp.,

u_1, · · · , u_r). The map

    φ_A : S^r → S^n,   Z ↦ U [[Z, 0], [0, 0]] U^T = U_0 Z U_0^T              (8.1)

is a rank-preserving isometry, which identifies F(A) and S^r_+:

    F(A) = φ_A(S^r_+) = { U [[Z, 0], [0, 0]] U^T = U_0 Z U_0^T : Z ∈ S^r_+ }.

Moreover, F(A) is given by

    F(A) = {X ∈ S^n_+ : ker X ⊇ ker A}                                       (8.2)

and its dimension is equal to \binom{r+1}{2}.

Proof Let λ_1, . . . , λ_r be the nonzero eigenvalues of A corresponding to its
eigenvectors u_1, . . . , u_r and define the matrices D = Diag(λ_1, · · · , λ_r, 0, · · · , 0) ∈
S^n_+, D_0 = Diag(λ_1, · · · , λ_r) ∈ S^r_{++} and ∆ = Diag(0, · · · , 0, 1, · · · , 1) ∈ S^n_+,
where the first r entries are 0 and the last n − r entries are 1. Finally, define
the matrix Q = U ∆ U^T = Σ_{i=r+1}^n u_i u_i^T. Then, ⟨Q, A⟩ = 0, since the vectors
u_{r+1}, · · · , u_n span the kernel of A, and we have A = U D U^T and ⟨∆, D⟩ = 0.
As Q ⪰ 0, the hyperplane

    H = {X ∈ S^n : ⟨Q, X⟩ = 0}

is a supporting hyperplane for S^n_+ and thus its intersection with S^n_+

    F = S^n_+ ∩ H = {X ∈ S^n_+ : ⟨Q, X⟩ = 0}

is a face of S^n_+ containing A. We first claim that

    F = {X ∈ S^n_+ : ker X ⊇ ker A}.

Indeed, for X ⪰ 0, the condition ⟨Q, X⟩ = 0 is equivalent to X u_i = 0 for all
i = r + 1, . . . , n and thus to ker A ⊆ ker X. Moreover, we have

    F = φ_A(S^r_+).

For this, consider X ∈ S^n written as X = U Y U^T where Y ∈ S^n. Then,
X ⪰ 0 if and only if Y ⪰ 0. Moreover, ⟨Q, X⟩ = 0 if and only if ⟨∆, Y⟩ = 0
or, equivalently, Y = [[Z, 0], [0, 0]] for some Z ∈ S^r. Summarizing, X ∈ F if
and only if X = φ_A(Z) for some Z ∈ S^r_+.
We now show that F = F(A). In view of Lemma A.5.1, it suffices to show
that A lies in the relative interior of the face F. We use the characterization
of interior points from Lemma A.2.3: let X ∈ F; we show that there exist

X' ∈ F and a scalar t ∈ (0, 1) such that A = tX + (1 − t)X'. As we just
saw above, X = φ_A(Z) for some Z ∈ S^r_+. As D_0 is an interior point of S^r_+,
there exists Z' ∈ S^r_+ and t ∈ (0, 1) such that D_0 = tZ + (1 − t)Z'. Then,
X' = φ_A(Z') ∈ F and A = tX + (1 − t)X', as required.
Summarizing, we have shown that F(A) can be identified with S^r_+ via the
rank-preserving isometry:

    Z ↦ Y = [[Z, 0], [0, 0]] ↦ X = U Y U^T
    D_0 ↦ D ↦ A
    S^r_+ → S^r_+ ⊕ 0_{n−r} → F(A),

and thus the dimension of F is equal to dim S^r_+ = \binom{r+1}{2}.

As a direct application, the possible dimensions for the faces of the cone S^n_+
are \binom{r+1}{2} for r = 0, 1, · · · , n. Moreover there is a one-to-one correspondence
between the lattice of faces of S^n_+ and the lattice of subspaces of R^n:

    U subspace of R^n ↦ F_U = {X ∈ S^n_+ : ker X ⊇ U},                        (8.3)

with U_1 ⊆ U_2 ⟺ F_{U_1} ⊇ F_{U_2}.

8.1.3 Faces of spectrahedra


Consider an affine subspace A in the space of symmetric matrices, of the
form
A = {X ∈ S n : hAj , Xi = bj for j ∈ [m]}, (8.4)
where A1 , · · · , Am are given matrices in S n and b1 , · · · , bm are given scalars.
Assume that A is not empty. The codimension of A is
codim A = dim S n − dim A = dimhA1 , · · · , Am i,
where hA1 , · · · , Am i denotes the linear subspace of S n spanned by {A1 , . . . , Am }.
If we intersect the cone of positive semidefinite matrices with the affine
space A, we obtain the convex set
n
K = S+ ∩ A = {X ∈ S n : X  0, hAj , Xi = bj for j ∈ [m]}. (8.5)
This is the feasible region of a typical semidefinite program (in standard
primal form). Such a convex set is called a spectrahedron – this name is
in the analogy with polyhedron, which corresponds to the feasible region of
a linear program, and spectra reflects the fact that the definition involves
spectral properties of matrices.

An example of spectrahedron is the elliptope


    E_n = {X ∈ S^n_+ : X_ii = 1 for i ∈ [n]},                                 (8.6)
which is the feasible region of the semidefinite relaxation for Max-Cut con-
sidered in Chapter 5.
As an application of the description of the faces of the positive semidefinite
cone in Proposition 8.1.1, we can describe the faces of the spectrahedron K.
These results can be found in Deza, Laurent [1997, §31.5] and Pataki [2000].
Proposition 8.1.2 Let K be the spectrahedron (8.5). Let A ∈ K, r =
rank(A), and let U, U_0 be as in Proposition 8.1.1. Define the affine space

    A_A = {Z ∈ S^r : ⟨U_0^T A_j U_0, Z⟩ = b_j for j ∈ [m]} ⊆ S^r,             (8.7)

and the corresponding linear space:

    L_A = {Z ∈ S^r : ⟨U_0^T A_j U_0, Z⟩ = 0 for j ∈ [m]} ⊆ S^r.               (8.8)

Then, the map φ_A from (8.1) identifies the sets F_K(A) (the smallest face of
K containing A) and S^r_+ ∩ A_A:

    F_K(A) = φ_A(S^r_+ ∩ A_A),

and the set of perturbations of A in K is given by

    P_K(A) = φ_A(L_A).

Moreover, F_K(A) is also given by

    F_K(A) = {X ∈ K : ker X ⊇ ker A}                                          (8.9)

and its dimension is equal to

    dim F_K(A) = dim A_A = \binom{r+1}{2} − dim⟨U_0^T A_j U_0 : j ∈ [m]⟩.     (8.10)
Proof By definition, K = S^n_+ ∩ A and F(A) is the smallest face of S^n_+
containing A. One can verify that the set F(A) ∩ A contains A in its relative
interior and thus we have that F_K(A) = F(A) ∩ A. Hence (8.9) follows from
(8.2).
If X = φ_A(Z) is the image of Z ∈ S^r under the map φ_A from (8.1), then

    ⟨A_j, X⟩ = ⟨U^T A_j U, U^T X U⟩ = ⟨U^T A_j U, [[Z, 0], [0, 0]]⟩ = ⟨U_0^T A_j U_0, Z⟩.

Therefore, the face F_K(A) is the image of S^r_+ ∩ A_A under the map φ_A, i.e.,
F_K(A) = φ_A(S^r_+ ∩ A_A). Moreover, a matrix B ∈ S^n is a perturbation of A if
and only if A ± εB ∈ K for some ε > 0, which is equivalent to B ∈ U_0 L_A U_0^T,

i.e., B ∈ φ_A(L_A). Therefore, we find that P_K(A) = φ_A(L_A), and thus the
dimension of F_K(A) is equal to dim P_K(A) = dim φ_A(L_A) = dim A_A, which
gives (8.10).

Corollary 8.1.3 Let K be defined as in (8.5). Let A ∈ K and r = rank(A).
If A is an extreme point of K then

    \binom{r+1}{2} ≤ codim A ≤ m.                                              (8.11)

In particular, K contains a matrix A whose rank r satisfies

    r ≤ (−1 + √(8m + 1))/2.                                                    (8.12)

Proof If A is an extreme point of K then dim F_K(A) = 0. Then, by (8.10),
we have \binom{r+1}{2} = codim A_A. Now, (8.11) follows from

    codim A_A = dim⟨U_0^T A_j U_0 : j ∈ [m]⟩ ≤ dim⟨A_j : j ∈ [m]⟩ = codim A ≤ m.

As K contains no line, K has at least one extreme point. Then, for any
extreme point A with rank r, the relation \binom{r+1}{2} ≤ m implies directly (8.12).
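For instance, with m = 2 constraints the bound (8.12) gives r ≤ (−1 + √17)/2 ≈ 1.56,
hence r ≤ 1; this is exactly the fact exploited for the quadratic maps in Section 8.2.2.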

Remark 8.1.4 The codimension of the affine space A_A can also be ex-
pressed as follows. Given a factorization A = W W^T, where W ∈ R^{n×r}, we
have

    codim A_A = dim⟨W^T A_j W : j ∈ [m]⟩.

Indeed, the matrix P = W^T U_0 D_0^{−1} is nonsingular, since P^T P = D_0^{−1} using
the fact that U_0^T U_0 = I_r. Moreover, W P = U_0, and thus

    dim⟨W^T A_j W : j ∈ [m]⟩ = dim⟨P^T W^T A_j W P : j ∈ [m]⟩ = dim⟨U_0^T A_j U_0 : j ∈ [m]⟩

is indeed equal to codim A_A.
As an application, for the elliptope K = E_n, if A ∈ E_n is the Gram matrix
of vectors {a_1, · · · , a_n} ⊆ R^k, then codim A_A = dim⟨a_1 a_1^T, · · · , a_n a_n^T⟩.

As an illustration we discuss a bit the geometry of the elliptope E_n. As a
direct application of Corollary 8.1.3, we obtain the following bound for the
rank of its extreme points:

Corollary 8.1.5 Any extreme point of E_n has rank r satisfying \binom{r+1}{2} ≤ n.

This result comes from Grone, Pierce, Watkins [1990], where also the
reverse is shown: there exists an extreme matrix of rank r in E_n whenever
\binom{r+1}{2} ≤ n holds. Geometric properties of the elliptope are further studied
in Laurent, Poljak [1996], where the following facts can be found.
A matrix X ∈ E_n has rank 1 if and only if it is of the form X = xx^T for
some x ∈ {±1}^n. Such a matrix is also called a cut matrix (since it corresponds
to a cut in the complete graph K_n). There are 2^{n−1} distinct cut matrices.
In fact all of them are extreme points of E_n and any two of them form an
edge (face of dimension 1) of E_n. While for n ≤ 4, these are the only faces
of dimension 1, the elliptope E_n for n ≥ 5 has faces of dimension 1 that are
not an edge between two cut matrices. You will see an example in Exercise
10.3.

Figure 8.1 The elliptope E3

Figure 8.1 shows the elliptope E_3 (more precisely, its bijective image in R^3
obtained by taking the upper triangular part of X). Note the four corners,
which correspond to the four cuts of the graph K_3. All the points on the
boundary of E_3 – except those lying on an edge between two of the four
corners – are extreme points. For instance, the matrix

    A = ( 1     0     1/√2 )
        ( 0     1     1/√2 )
        ( 1/√2  1/√2  1    )

is an extreme point of E_3 (check it), with rank r = 2.

8.1.4 Finding an extreme point in a spectrahedron


In order to find a matrix A in a spectrahedron K whose rank satisfies (8.12),
it suffices to find an extreme point A of K. Algorithmically this can be done
as follows.
Suppose we have a matrix A ∈ K with rank r. Observe that A is an
extreme point of K precisely when the linear space L_A (in (8.8)) is reduced to
the zero matrix. Assume that A is not an extreme point of K. Pick a nonzero
matrix C in L_A, then the matrix B = U_0 C U_0^T is a nonzero perturbation of

A. Hence we have A ± tB  0 for some t > 0. Moreover, at least one of the


following two supremums sup{t > 0 : A + tB  0} and sup{t > 0 : A − tB 
0} is finite, since K contains no line. Say, the first supremum is finite. Then
compute the largest scalar t > 0 for which A + tB  0 (this is a semidefinite
program). Then the matrix A0 = A + tB still belongs to the face FK (A), but
it now lies on its border (by the maximality of t). Therefore, A0 has a larger
kernel: kerA0 ⊃ kerA, and thus a smaller rank: rankA0 ≤ rankA − 1. Then
iterate this procedure, replacing A by A0 , until finding an extreme point of
K.
Therefore, one can find an extreme point of K by solving at most n
semidefinite programs and thus find a matrix in K whose rank satisfies
(8.12). On the other hand, finding the smallest possible rank of a matrix in
K is a hard problem – see Proposition 8.2.4.
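The following numpy sketch (an illustration written for the setting above, not
code taken from these notes) implements one possible version of this rank-
reduction loop; it assumes the constraint matrices A_j are given as a nonempty
list and that the input X already satisfies ⟨A_j, X⟩ = b_j.

    import numpy as np

    def reduce_to_extreme_point(A_list, X, tol=1e-9):
        """Move X inside K = {X psd : <A_j, X> = b_j} to an extreme point of K."""
        X = (X + X.T) / 2
        while True:
            lam, U = np.linalg.eigh(X)
            keep = lam > tol
            lam0, U0 = lam[keep], U[:, keep]        # X = U0 Diag(lam0) U0^T
            r = U0.shape[1]
            if r == 0:
                return X
            # L_A = {Z in S^r : <U0^T A_j U0, Z> = 0}; parametrize a symmetric Z
            # by its upper triangle (off-diagonal entries appear twice in <.,.>)
            iu = np.triu_indices(r)
            weights = np.where(iu[0] == iu[1], 1.0, 2.0)
            rows = [(U0.T @ Aj @ U0)[iu] * weights for Aj in A_list]
            Msys = np.vstack(rows)                  # m x r(r+1)/2
            _, sing, Vt = np.linalg.svd(Msys)
            if int(np.sum(sing > tol)) == Vt.shape[0]:
                return X                            # L_A = {0}: X is extreme
            z = Vt[-1]                              # a nonzero element of L_A
            Z = np.zeros((r, r))
            Z[iu] = z
            Z = Z + Z.T - np.diag(np.diag(Z))
            # largest step keeping Diag(lam0) + t Z psd, i.e. X + t B psd with
            # B = U0 Z U0^T; at least one direction gives a finite step
            S = np.diag(1.0 / np.sqrt(lam0))
            mu = np.linalg.eigvalsh(S @ Z @ S)
            t = 1.0 / (-mu.min()) if mu.min() < 0 else -1.0 / mu.max()
            X = U0 @ (np.diag(lam0) + t * Z) @ U0.T
            X = (X + X.T) / 2

Each pass either certifies that L_A = {0} (so X is extreme and its rank r
satisfies \binom{r+1}{2} ≤ m) or strictly decreases the rank, so the loop stops
after at most n iterations.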

8.1.5 A refined bound on ranks of extreme points


The upper bound on the rank of an extreme point from Corollary 8.1.3 is
tight – see Example 8.2.3 below. However, there is one special case where
this bound can be sharpened, as we now explain.
Consider again the affine space A from (8.4) and the spectrahedron in
(8.5), K = S^n_+ ∩ A. From Corollary 8.1.3, we know that every extreme point
A of K has rank r satisfying

    \binom{r+1}{2} ≤ codim A.

Hence, r ≤ s + 1 if codim A = \binom{s+2}{2}. Under some assumptions, Barvinok
(see [2002, §13]) shows that r ≤ s for at least one extreme point of K.

Proposition 8.1.6 Assume K is nonempty and bounded, and codim A =
\binom{s+2}{2} for some integer s satisfying 1 ≤ s ≤ n − 2. Then there exists A ∈ K
with rank A ≤ s.

The proof uses the following topological result.

Theorem 8.1.7 Consider the projective space P^{n−1}, consisting of all lines
in R^n passing through the origin, and let S^{n−1} be the unit sphere in R^n. For
n ≥ 3 there does not exist a continuous map Φ : S^{n−1} → P^{n−1} such that
Φ(x) ≠ Φ(y) for all distinct x, y ∈ S^{n−1}.

In the following lemma we first deal with the case n = s + 2, which


represents the core of the proof of Proposition 8.1.6.

Lemma 8.1.8 Let n = s + 2 with s ≥ 1 and let A ⊆ S^{s+2} be an affine
space with codim A = \binom{s+2}{2}. If K = S^{s+2}_+ ∩ A is nonempty and bounded,
then there is a matrix A ∈ K with rank A ≤ s.

Proof Assume first that A ∩ S^{s+2}_{++} = ∅. Then A lies in a hyperplane H sup-
porting a proper face F of S^{s+2}_+. (This can be checked using the separation
theorem from Theorem 1.3.8 (i).) By Proposition 8.1.1, F can be identified
with S^t_+ for some t ≤ s + 1 and thus an extreme point of K has rank at
most t − 1 ≤ s.
Suppose now A ∩ S^{s+2}_{++} ≠ ∅. Then K = F_K(A) for any matrix A ∈ A ∩ S^{s+2}_{++}.
Using (8.10) and the assumption on codim A, we obtain that

    dim K = \binom{s+3}{2} − codim A = \binom{s+3}{2} − \binom{s+2}{2} = s + 2.

Hence, K is a (s + 2)-dimensional compact convex set, whose boundary ∂K
is (topologically) the sphere S^{s+1}. We now show that the boundary of K
contains a matrix with rank at most s.
Clearly every matrix in ∂K has rank at most s + 1. Suppose for a contra-
diction that no matrix of ∂K has rank at most s. Then, each matrix X ∈ ∂K
has rank s + 1 and thus its kernel ker X has dimension 1, it is a line through
the origin. We can define a continuous map Φ from ∂K to P^{s+1} in the following
way: For each matrix X ∈ ∂K, its image Φ(X) is the line ker X. The map Φ
is continuous (check it) from S^{s+1} to P^{s+1} with s + 1 ≥ 2. Hence, applying
Theorem 8.1.7, we deduce that there are two distinct matrices X, X' ∈ ∂K
with the same kernel: ker X = ker X'. Hence X and X' are two distinct
points lying in the same face of K: F_K(X) = F_K(X'). Then this face has an
extreme point A, whose rank satisfies rank A ≤ rank X − 1 ≤ s.
We can now conclude the proof of Proposition 8.1.6.

Proof (of Proposition 8.1.6). By Corollary 8.1.3 there exists a matrix A ∈
K with rank A ≤ s + 1. Pick a vector space U ⊆ ker A with codim U = s + 2
and consider the face F_U of S^n_+ as defined in (8.3). By Proposition 8.1.1, there
is a rank-preserving isometry between F_U and S^{s+2}_+. Moreover, A ∈ F_U ∩ A.
Hence the result follows by applying Lemma 8.1.8.
Example 8.1.9 Consider the three matrices

    A = ( 1  0 ),   B = ( 0  1 ),   C = ( 1  1 )
        ( 0 −1 )        ( 1  0 )        ( 1  0 )

and the affine space

    A = {X ∈ S^2 : ⟨A, X⟩ = 0, ⟨B, X⟩ = 0, ⟨C, X⟩ = 1}.

Then S^2_+ ∩ A = {I} and thus this set contains no rank 1 matrix. (Indeed,
⟨A, X⟩ = 0 forces X_11 = X_22, ⟨B, X⟩ = 0 forces X_12 = 0, and then
⟨C, X⟩ = X_11 + 2X_12 = 1 gives X = I.) Moreover, codim A = 3 = \binom{s+2}{2}
with s = 1. This example shows that the condition n ≥ s + 2 cannot be
omitted in Lemma 8.1.8.
Example 8.2.3 below shows that also the assumption that K is bounded
cannot be omitted.

8.2 Applications
8.2.1 Euclidean realizations of graphs
The graph realization problem can be stated as follows. Suppose we are given
a graph G = (V = [n], E) together with nonnegative edge weights w ∈ R^E_+,
viewed as ‘lengths’ assigned to the edges. We say that (G, w) is d-realizable
if one can place the nodes of G at points v_1, · · · , v_n ∈ R^d in such a way that
their Euclidean distances respect the given edge lengths:

    There exist v_1, · · · , v_n ∈ R^d such that ‖v_i − v_j‖^2 = w_ij for all {i, j} ∈ E.   (8.13)
(We use here the squares of the Euclidean distances as this will make the
notation a bit easier). Moreover, we say (G, w) is realizable if it is d-realizable
for some d ≥ 1. In dimension 3, the problem of testing d-realizability arises
naturally in robotics or in computational chemistry. In the latter case the
given lengths represent known distances between some pairs of atoms in a
molecule and one wants to reconstruct the full molecule from these partial
data.
Testing whether a weighted graph is realizable amounts to testing feasi-
bility of a semidefinite program:

Lemma 8.2.1 (G, w) is realizable if and only if the semidefinite program

    X_ii + X_jj − 2X_ij = w_ij for {i, j} ∈ E,   X ∈ S^n,   X ⪰ 0              (8.14)

has a feasible solution. Moreover, (G, w) is d-realizable if and only if the
system (8.14) has a solution with rank at most d.

Proof If v_1, · · · , v_n ∈ R^d is a realization of (G, w), then their Gram matrix
X = (v_i^T v_j) is a solution of (8.14) with rank at most d. Conversely, if X is a
solution of (8.14) with rank d and v_1, · · · , v_n ∈ R^d is a Gram decomposition
of X, then the v_i's form a d-realization of (G, w).
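As a small illustration, the feasibility problem (8.14) can be handed to an SDP
solver directly; the sketch below uses CVXPY (an assumption, not part of these
notes) and recovers a realization from a Gram decomposition of the solution.

    import cvxpy as cp
    import numpy as np

    def realize(n, w):
        """w: dict {(i, j): w_ij} of squared edge lengths; returns an n x n array
        whose rows v_1, ..., v_n realize (G, w), or None if not realizable."""
        X = cp.Variable((n, n), PSD=True)
        cons = [X[i, i] + X[j, j] - 2 * X[i, j] == wij for (i, j), wij in w.items()]
        cp.Problem(cp.Minimize(0), cons).solve()
        if X.value is None:
            return None
        lam, U = np.linalg.eigh(X.value)
        return U @ np.diag(np.sqrt(np.clip(lam, 0, None)))   # Gram decomposition

    # the 4-cycle with all squared lengths 1 is realizable (a unit square), while
    # a triangle with squared lengths 1, 1, 9 violates the triangle inequality
    print(realize(4, {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}) is not None)
    print(realize(3, {(0, 1): 1, (1, 2): 1, (0, 2): 9}) is None)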

As a direct application of Corollary 8.1.3, any realizable graph (G, w) is
d-realizable in dimension d satisfying

    \binom{d+1}{2} ≤ |E|,   i.e.,   d ≤ (−1 + √(8|E| + 1))/2.                  (8.15)
When G = K_n is a complete graph, checking whether (K_n, w) is d-
realizable amounts to checking whether a suitable matrix is positive semidef-
inite and computing its rank:

Lemma 8.2.2 Consider the complete graph G = K_n with edge weights w,
and define the matrix X ∈ S^{n−1} by

    X_ii = w_in for i ∈ [n − 1],   X_ij = (w_in + w_jn − w_ij)/2 for i ≠ j ∈ [n − 1].

Then, (K_n, w) is d-realizable if and only if X ⪰ 0 and rank X ≤ d.

Proof The proof relies on the observation that if a set of vectors v1 , · · · , vn


satisfies (8.13), then one can translate it and thus assume without loss of
generality that vn = 0.

Observe that the result of Lemma 8.2.2 still holds if we only assume that
G is a graph that contains one node adjacent to all other nodes.
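A direct numpy check along the lines of Lemma 8.2.2 (a sketch using the
notation above; the tolerance handling is ad hoc):

    import numpy as np

    def complete_graph_realizable(W, d, tol=1e-9):
        """W: symmetric n x n array of squared lengths w_ij with zero diagonal."""
        n = W.shape[0]
        X = np.empty((n - 1, n - 1))
        for i in range(n - 1):
            for j in range(n - 1):
                X[i, j] = (W[i, n - 1] + W[j, n - 1] - W[i, j]) / 2
        lam = np.linalg.eigvalsh(X)
        return lam.min() >= -tol and np.sum(lam > tol) <= d   # X psd, rank <= d

    # the equilateral triangle with all squared lengths 1 is 2- but not 1-realizable
    W = np.array([[0.0, 1, 1], [1, 0, 1], [1, 1, 0]])
    print(complete_graph_realizable(W, 2), complete_graph_realizable(W, 1))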

Example 8.2.3 Consider the complete graph G = Kn with weights wij = 1


for all edges. Then (Kn , w) is (n−1)-realizable but it is not (n−2)-realizable
(which is easy to check using Lemma 8.2.2).
Hence, the upper bound (8.15) is tight on this example. This shows that the
condition that K is bounded cannot be omitted in Proposition 8.1.6. (Note
that the set of feasible solutions to the program (8.14) is indeed not bounded).

On the other hand, if we fix the dimension d ≥ 1, then deciding whether a


graph (G, w) is d-realizable is a hard problem. Therefore, deciding whether
the semidefinite program (8.14) has a solution of rank at most d is a hard
problem.
We show this for dimension d = 1. Then there is a simple reduction from
the following combinatorial problem, known as the partition problem and
well known to be NP-complete: Decide whether a given sequence of positive
integers a_1, · · · , a_n can be partitioned into two groups with equal sums, i.e.,
whether there exists ε ∈ {±1}^n such that ε_1 a_1 + · · · + ε_n a_n = 0.

Proposition 8.2.4 (Saxe [1979]) Given a graph (G, w) with integer lengths
w ∈ NE , deciding whether (G, w) is 1-realizable is an NP-complete problem,
already when G is restricted to be a circuit.

Proof Let a_1, · · · , a_n be positive integers, used as an instance of the parti-
tion problem. Consider the circuit G = C_n of length n, with edges {i, i + 1}
for i ∈ [n] (indices taken modulo n). Assign the weight w_{i,i+1} = a_{i+1} to edge
{i, i + 1} for i = 1, · · · , n. It is now an easy exercise to show that (C_n, w) is
1-realizable if and only if the sequence (a_1, · · · , a_n) can be partitioned.
Indeed, assume that v_1, · · · , v_{n−1}, v_n ∈ R is a 1-realization of (C_n, w).
Without loss of generality we may assume that v_n = 0. The condition
w_{n,1} = a_1 = |v_1| implies that v_1 = ε_1 a_1 for some ε_1 ∈ {±1}. Next, for
i = 1, · · · , n − 1, the conditions w_{i,i+1} = a_{i+1} = |v_i − v_{i+1}| imply the ex-
istence of ε_2, · · · , ε_n ∈ {±1} such that v_{i+1} = v_i + ε_{i+1} a_{i+1}. This implies
0 = v_n = ε_1 a_1 + · · · + ε_n a_n and thus the sequence a_1, · · · , a_n can be parti-
tioned.
These arguments can be reversed to show the reverse implication.

On the other hand:

Lemma 8.2.5 If a circuit (Cn , w) is realizable, then it is 2-realizable.

You will show this in Exercise 8.1. The following basic geometrical fact
will be useful for the proof.

Lemma 8.2.6 Let u_1, · · · , u_k ∈ R^n and v_1, · · · , v_k ∈ R^n be two sets of vectors
representing the same Euclidean distances, i.e., satisfying

    ‖u_i − u_j‖ = ‖v_i − v_j‖ for i, j ∈ [k].

Then there exists an orthogonal matrix A ∈ O(n) and a vector a ∈ Rn such


that vi = Aui + a for all i ∈ [k].

Hence what the above shows is that any realizable weighted circuit can
be embedded in the line or in the plane, but deciding which one of these two
possibilities holds is an NP-complete problem!

8.2.2 Hidden convexity results for quadratic maps


We discuss here some classical “hidden convexity” results for quadratic equa-
tions. Such results claim that, in some cases, deciding whether a system of
quadratic equations has a solution can be reformulated as deciding feasibil-
ity of some semidefinite program. In the next section we will consider such
results for systems of quadratic inequalities.
First, as a direct application of Corollary 8.1.3, we obtain the following
result for systems of two quadratic equations.

Proposition 8.2.7 Consider two matrices A, B ∈ S^n and a, b ∈ R. Then
the system of two quadratic equations

    Σ_{i,j=1}^n A_ij x_i x_j = a,   Σ_{i,j=1}^n B_ij x_i x_j = b                (8.16)

has a real solution x = (x_1, · · · , x_n) ∈ R^n if and only if the system of two
linear matrix equations

    ⟨A, X⟩ = a,   ⟨B, X⟩ = b                                                    (8.17)

has a positive semidefinite solution X ⪰ 0.

Proof If x is a solution of (8.16), then X = xx^T is a solution of (8.17).
Conversely, assume that the system (8.17) has a solution. Applying Corollary
8.1.3, we know that it has a solution X of rank r satisfying \binom{r+1}{2} ≤ m = 2,
thus with r ≤ 1. Now, if X has rank 1, it can be written in the form X = xx^T,
so that x is a solution of (8.16).
Note that this result does not extend to three equations: The affine space
from Example 8.1.9 contains a positive semidefinite matrix, but none of rank
1.
As we now observe, the above result can be reformulated as claiming that
the image of Rn under a quadratic map into R2 is a convex set.
Proposition 8.2.8 (Dines 1941) Given two matrices A, B ∈ S n , the
image of Rn under the quadratic map q(x) = (xT Ax, xT Bx):
Q = {(xT Ax, xT Bx) : x ∈ Rn }, (8.18)
is a convex set in R2 .
Proof Set

    Q' = {(⟨A, X⟩, ⟨B, X⟩) ∈ R^2 : X ∈ S^n_+}.

Clearly, Q ⊆ Q' and the set Q' is convex (because it is the image of a convex
set under a linear map). Thus it suffices to show equality: Q = Q'. For this,
let (a, b) ∈ Q'. Then the system (8.17) has a solution X ⪰ 0. By Proposition
8.2.7, the system (8.16) too has a solution, and thus (a, b) ∈ Q.
While it is not obvious from its definition that the set Q in (8.18) is
convex, it is obvious from its definition that the above set Q' is convex.
Then convexity of Q follows from the equality Q = Q'. This is the reason
why such a result is called a hidden convexity result.
Here is another hidden convexity result, showing that the image of the

unit sphere S^{n−1} (n ≥ 3) under a quadratic map in R^2 is convex. We show
it using the refined bound from Proposition 8.1.6.

Proposition 8.2.9 (Brickman 1961) Let n ≥ 3 and A, B ∈ S^n. Then
the image of the unit sphere under the quadratic map q(x) = (x^T Ax, x^T Bx):

    C = {(x^T Ax, x^T Bx) : Σ_{i=1}^n x_i^2 = 1}

is a convex set in R^2.

Proof It suffices to show that, if the set

    K = {X ∈ S^n_+ : ⟨A, X⟩ = a, ⟨B, X⟩ = b, Tr(X) = 1}

is not empty then it contains a matrix of rank 1. Define the affine space

    A = {X ∈ S^n : ⟨A, X⟩ = a, ⟨B, X⟩ = b, Tr(X) = 1}.

Then the existence of a matrix of rank 1 in K follows from Corollary 8.1.3 if
codim A ≤ 2, and from Proposition 8.1.6 if codim A = 3 (as K is bounded,
codim A = 3 = \binom{s+2}{2} and n ≥ s + 2 for s = 1).

Observe that the assumption n ≥ 3 cannot be omitted in Proposition
8.2.9. To see it consider the quadratic map q defined using the matrices A
and B from Example 8.1.9. Then, q(1, 0) = (1, 0), q(0, 1) = (−1, 0), but
(0, 0) does not belong to the image of S^1 under q.
We conclude with the following application of Proposition 8.2.9, which
shows that the numerical range R(M) of a complex matrix M ∈ C^{n×n} is a
convex subset of C (viewed as R^2). Recall that the numerical range of M is

    R(M) = { z^* M z = Σ_{i,j=1}^n \bar{z}_i M_ij z_j : z ∈ C^n, Σ_{i=1}^n |z_i|^2 = 1 }.

Proposition 8.2.10 (Toeplitz-Hausdorff) The numerical range of a
complex matrix is convex.

Proof Write z ∈ C^n as z = x + iy where x, y ∈ R^n, so that Σ_i |z_i|^2 =
Σ_i x_i^2 + y_i^2. Moreover write M = A + iB where A, B ∈ R^{n×n}. Then define
the quadratic map q(x, y) = (q_1(x, y), q_2(x, y)) by z^* M z = q_1(x, y) + i q_2(x, y)
with

    q_1(x, y) = x^T Ax + y^T Ay + x^T (B^T − B) y,
    q_2(x, y) = x^T Bx + y^T By + x^T (A − A^T) y.

Then, the numerical range of M is the image of the unit sphere S^{2n−1} under
the map q, and the result follows from Proposition 8.2.9.

8.2.3 The S-Lemma


In the preceding section we dealt with systems of quadratic equations. We
now discuss systems of quadratic inequalities.
Recall Farkas’ lemma for linear programming: If a system of linear in-
equalities

    a_1^T x ≤ b_1, . . . , a_m^T x ≤ b_m

implies the linear inequality c^T x ≤ d, then there exist nonnegative scalars
λ_1, · · · , λ_m ≥ 0 such that c = λ_1 a_1 + · · · + λ_m a_m and λ_1 b_1 + · · · + λ_m b_m ≤ d.
This type of inference rule does not extend to general nonlinear inequali-
ties. However such an extension does hold in the case of quadratic polynomi-
als, in the special case m = 1 (and under some strict feasibility assumption).

Theorem 8.2.11 (The homogeneous S-lemma) Given matrices A, B ∈
S^n, assume that x^T Ax > 0 for some x ∈ R^n. The following assertions are
equivalent.

(i) {x ∈ R^n : x^T Ax ≥ 0} ⊆ {x ∈ R^n : x^T Bx ≥ 0}.
(ii) There exists a scalar λ ≥ 0 such that B − λA ⪰ 0.

Proof The implication (ii) =⇒ (i) is obvious. Now, assume (i) holds, we
show (ii). For this consider the semidefinite program:

    inf{⟨B, X⟩ : ⟨A, X⟩ ≥ 0, Tr(X) = 1, X ⪰ 0}                                  (P)

and its dual:

    sup{y : B − zA − yI ⪰ 0, z ≥ 0}.                                            (D)

First we show that (P) is strictly feasible. By assumption, there exists a
vector x for which x^T Ax > 0. The matrix X = xx^T/‖x‖^2 is feasible for (P)
with ⟨A, X⟩ > 0, and perturbing it to (1 − δ)X + (δ/n)I for a small δ > 0
gives a positive definite feasible solution which still satisfies ⟨A, X⟩ > 0, so
(P) is strictly feasible.
Next we show that the optimum value of (P) is nonnegative. For this,
consider a feasible solution X_0 of (P) and consider the set

    K = {X ∈ S^n_+ : ⟨A, X⟩ = ⟨A, X_0⟩, ⟨B, X⟩ = ⟨B, X_0⟩}.

As K ≠ ∅, applying Corollary 8.1.3, there is a matrix X ∈ K with rank at
most 1, say X = xx^T. Then, x^T Ax = ⟨A, X_0⟩ ≥ 0 which, by assumption (i),
implies x^T Bx ≥ 0, and thus ⟨B, X_0⟩ = x^T Bx ≥ 0.

As (P) is bounded and strictly feasible, applying the strong duality the-
orem, we deduce that there is no duality gap and that the dual prob-
lem has an optimal solution (y, z) with y, z ≥ 0. Therefore, B − zA =
(B − zA − yI) + yI ⪰ 0, thus showing (ii).

This extends to non-homogeneous quadratic polynomials (Exercise 8.5):

Theorem 8.2.12 (The non-homogeneous S-lemma)
Let f(x) = x^T Ax + 2a^T x + α and g(x) = x^T Bx + 2b^T x + β be two quadratic
polynomials where A, B ∈ S^n, a, b ∈ R^n and α, β ∈ R. Assume that f(x) > 0
for some x ∈ R^n. The following assertions are equivalent.

(i) {x ∈ R^n : f(x) ≥ 0} ⊆ {x ∈ R^n : g(x) ≥ 0}.
(ii) There exists a scalar λ ≥ 0 such that [[β, b^T], [b, B]] − λ [[α, a^T], [a, A]] ⪰ 0.
(iii) There exist a scalar λ ≥ 0 and a polynomial h(x) which is a sum of
     squares of polynomials such that g = λf + h.

8.3 Notes and further reading


Part of the material in this chapter can be found in the book of Barvinok
[2002]. In particular, the refined bound (from Section 8.1.5) on the rank
of extreme points of a spectrahedron is due to Barvinok. The description
of the faces of spectrahedra and the results in in Section 8.1 can be found
in Barvinok [2002], Deza, Laurent [1997], Pataki [2000]. For more details
about the geometry of general spectrahedra we refer, e.g., to the overview
by Pataki [2000] and details about the geometry of the elliptope En can
be found in Deza, Laurent [1997], Laurent, Poljak [1996]. For instance, all
possible dimensions for the faces of En are known, as well as for its polyhedral
faces. It is also known that the vertices of En correspond precisely to the cuts
of the graph Kn .
The structure of the d-realizable graphs has been studied by Belk and
Connelly [2007]. It turns out that the class of d-realizable graphs is closed
under taking minors, and thus it can be characterized by finitely many
forbidden minors. For d ≤ 3 the forbidden minors are known: A graph G
is 1-realizable if and only if it is a forest (no K3 -minor), G is 2-realizable if
and only if it has no K4 -minor, and G is 3-realizable if and only if it does
not contain K5 and K2,2,2 as a minor. (You will show some partial results
in Exercise 8.1.)
The S-lemma dates back to work of Jakubovich in the 1970s in con-
trol theory. There is a rich history and many links to classical results about

quadratic systems of (in)equalities (including the results of Dines and Brick-
man presented here); this is nicely exposed in the survey of Polik and Terlaky
[2007].

Exercises
8.1 A graph G is said to be d-realizable if, for any edge weights w, (G, w) is
d-realizable whenever it is realizable. For instance, the complete graph
Kn is (n − 1)-realizable, but not (n − 2)-realizable (Example 8.2.3).
(a) Given two graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ) such that V1 ∩ V2
is a clique in G1 and G2 , their clique sum is the graph G = (V1 ∪
V2 , E1 ∪ E2 ).
Show: If G1 is d1 -realizable and G2 is d2 -realizable, then G is d-
realizable where d = max{d1 , d2 }.
(b) Given a graph G = (V, E) and an edge e ∈ E, G\e = (V, E \ {e})
denotes the graph obtained by deleting the edge e in G.
Show: If G is d-realizable, then G\e is d-realizable.
(c) Given a graph G = (V, E) and an edge e = {i1 , i2 } ∈ E, G/e
denotes the graph obtained by contracting the edge e in G, which
means: Identify the two nodes i1 and i2 , i.e., replace them by a new
node, called i0 , and replace any edge {i1 , j} ∈ E by {i0 , j} and any
edge {i2 , j} ∈ E by {i0 , j}.
Show: If G is d-realizable, then G/e is d-realizable.
(d) Show that the circuit Cn is 2-realizable, but not 1-realizable.
(e) Show that G is 1-realizable if and only if G is a forest (i.e., a disjoint
union of trees).
(f) Show that K2,2,2 is 4-realizable.
Recall that a minor of G is any graph that can be obtained from G
by deleting and contracting edges and by deleting nodes. So the above
shows that if G is d-realizable then any minor of G is d-realizable.
Belk and Connelly [2007] show that K2,2,2 is not 3-realizable, and
that a graph G is 3-realizable if and only if G has no K5 and K2,2,2
minor. (The ‘if part’ requires quite some work.)
8.2 Let A, B, C ∈ S n and let
Q = {q(x) = (xT Ax, xT Bx, xT Cx) : x ∈ Rn } ⊆ R3
denote the image of Rn under the quadratic map q. Assume that n ≥ 3
and that there exist α, β, γ ∈ R such that αA + βB + γC ≻ 0.
Show that the set Q is convex.

Hint: Use Proposition 8.1.6.


8.3 (a) Consider the matrices J (the all-ones matrix) and X = xx^T, where
        x ∈ {±1}^n is distinct from the all-ones vector.
        Show that the segment F = [J, X] is a face of the elliptope E_n.
    (b) Consider the matrix

            A = ( 1     0     0     1/√2  1/√2 )
                ( 0     1     0     1/√2  0    )
                ( 0     0     1     0     1/√2 )
                ( 1/√2  1/√2  0     1     1/2  )
                ( 1/√2  0     1/√2  1/2   1    )  ∈ E_5.

        What is the dimension of the face F_{E_5}(A)?
        What are its extreme points?
8.4 Let p be a polynomial in two variables and with even degree d.
Show: If p can be written as a sum of squares, then it can be written
as a sum of at most d + 1 squares.
NB: For d = 4, Hilbert has shown that p can be written as a sum of at
most three squares, but this is a difficult result.
8.5 Show the result of Theorem 8.2.12.
9
Euclidean embeddings: Low distortion (Version:
May 24, 2022)

9.1 Motivation
We start the chapter by giving the definition of its central concept:

Definition 9.1.1 A finite metric space is a pair (X, d) where X is a finite


set and where the function d : X × X → R defines a metric: For all x, y, z ∈
X we have

(i) non-negativity: d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y,


(ii) symmetry: d(x, y) = d(y, x),
(iii) triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).

One important example of a finite metric space is the shortest path met-
ric of a finite connected graph G = (V, E). There we measure the distance
d(x, y) between two vertices x and y by the length of a shortest path con-
necting x and y. Here the length of a path equals the number of its edges.

In many applications one has to deal with data sets which come equipped
with a natural metric, for example with a similarity measure. However, a
priori these metric spaces do not have a lot of structure. In contrast, Eu-
clidean spaces are metric spaces which have a lot of additional structure.
They are linear spaces having a vector space structure. They are normed
spaces, the norm defines the Euclidean metric. The normed space is com-
plete as every Cauchy sequence has a limit. The parallelogram law (A.2),
‖v + w‖^2 + ‖v − w‖^2 = 2‖v‖^2 + 2‖w‖^2, holds. So one natural idea is to embed
the data set and its metric into a Euclidean space. Then one can use existing
geometric algorithms like clustering to work with the data set. Low dimen-
sional Euclidean embeddings are especially useful if one wants to visualize
the data set.

Definition 9.1.2 A Euclidean embedding f : X → R^n is an injective map
from X to n-dimensional Euclidean space R^n.

Ideally, we want to embed X isometrically into Euclidean space R^n, so
that for all x, y ∈ X we have

    d(x, y) = ‖f(x) − f(y)‖ = ( Σ_{i=1}^n (f(x)_i − f(y)_i)^2 )^{1/2},

where f(x)_i denotes the i-th component of the vector f(x) ∈ R^n.


There are two problems with embeddings into Euclidean spaces: If we
insist on finding an embedding with a fixed dimension n, independent of
the cardinality of X, then finding such an embedding is a semidefinite op-
timization problem with an additional rank constraint. If we relax the rank
constraint, then we are dealing with a semidefinite feasibility problem. How-
ever, in general it will not be feasible as for instance the simple star graph
on four vertices shows, see Figure 9.1.

x3

x4 x2

x1

Figure 9.1 The star graph on four vertices given by X = {x1 , x2 , x3 , x4 }


with d(x1 , x4 ) = d(x2 , x4 ) = d(x3 , x4 ) = 1, and d(xi , xj ) = 2, otherwise,
having vertex x4 in its center. To embed (X, d) isometrically into Euclidean
space, one needs that each of the triplets {x1 , x2 , x4 }, {x1 , x3 , x4 } and
{x2 , x3 , x4 } lie on a single line, which is impossible. The star graph on
four vertices does not embed isometrically into Euclidean space.

If there are finite metric spaces which cannot be isometrically embedded


into Euclidean space we ask: How close is (X, d) to a Euclidean space? In this
chapter we propose to measure this closeness by using Euclidean embeddings
having low distortion.
Definition 9.1.3 Let (X, d) be a finite metric space and let f : X → Rn
be an embedding into Euclidean space. We define the expansion, contraction
and the distortion of f by
 
kf (x) − f (y)k
expansion(f ) = max : x, y ∈ X, x 6= y ,
d(x, y)
9.1 Motivation 195
 
d(x, y)
contraction(f ) = max : x, y ∈ X, x 6= y ,
kf (x) − f (y)k

distortion(f ) = expansion(f ) · contraction(f ).


It is quite easy to constructpa Euclidean embedding of (X, d) into Eu-
clidean space with distortion |X|. For this we first consider the Fréchet
embedding of X into the normed space RX equipped with the supremum
norm kvk∞ = sup{|vx | : x ∈ X}. We define the embedding f : X → RX
componentwise by f (x)y = d(x, y) with y ∈ X. This embedding is isometric
because for all x, y ∈ X we have d(x, y) = kf (x) − f (y)k∞ because
kf (x) − f (y)k∞ = sup{|d(x, z) − d(y, z)| : z ∈ X} ≤ d(x, y)
by the triangle inequality, and
kf (x) − f (y)k∞ ≥ |f (x)y − f (y)y | = |d(x, y) − 0| = d(x, y).
In the next step we consider RX equipped with thepEuclidean norm k · k.
We see that the embedding f has distortion at most |X| because for every
v ∈ RX we have
1
p kvk∞ ≤ kvk ≤ kvk∞ .
|X|
p
But we can do substantially better than |X| as we discuss now.
Definition 9.1.4 The least (Euclidean) distortion of (X, d) is given by
c2 (X, d) = min distortion(f ).
f :X→R|X|

When (X, d) is the shortest path metric of a graph G we sometimes write


c2 (G) instead of c2 (X, d).
Notice that in the definition of c2 (X, d) it is not necessary to allow embed-
dings into Euclidean spaces of dimension larger than |X|. Indeed, for every
Euclidean embedding f : X → Rn we can restrict to the Euclidean space
spanned by the points f (x), with x ∈ X, which has dimension at most |X|.
By a fundamental result of Bourgain [1985] it is known that every fi-
nite metric space embeds into Euclidean space with distortion much bet-
ter than the Fréchet embedding. Bourgain’s theorem states that any finite
metric space (X, d) can be embedded into Euclidean space with distortion
c2 (X, d) = O(log |X|). For comparison, the least Euclideanpdistortion is
known for specific graph classes. For example c2 (X, d) = Θ( log |X|) for
cubes (see Section 9.3) or for planar graph (see Section 9.6), or c2 (X, d) =
196 Euclidean embeddings: Low distortion (Version: May 24, 2022)
p
Θ( log log |X|) for complete binary trees (see Section 9.6). For expander
graphs Bourgain’s theorem is tight c2 (X, d) = Θ(log |X|) (see Section 9.5).
In this chapter we show that finding a least distortion embedding into
Euclidean space can be done via semidefinite optimization. Using duality
theory one finds proofs for lower bounds for c2 (X, d). We will treat the
cases of regular cubes, strongly regular graphs, and expander graphs by this
approach.

9.2 Least distortion Euclidean embeddings via semidefinite


optimization
Let (X, d) be a finite metric space with X = {x1 , . . . , xn }. Then we can find
a Euclidean embedding of X which minimizes the distortion by solving a
semidefinite optimization problem.
For this let f : X → Rn be an embedding. The distortion of f is invariant
under scaling, that is we have
distortion(f ) = distortion(αf ) for every α > 0.
So we can assume, by scaling, that contraction(f ) = 1. Then we have to min-
imize expansion(f ) to minimize the distortion. The following minimization
problem does exactly this:
c2 (X, d) = min γ
γ ∈ R, f : X → Rn ,
d(xi , xj ) ≤ kf (xi ) − f (xj )k ≤ γd(xi , xj ) for 1 ≤ i < j ≤ n.
The minimum
p exists because of a compactness argument. We know that
c2 (X, d) ≤ |X| by the Fréchet embedding. By translation we can assume
that f (x1 ) = 0 so that each vector f (xi ) of a potential optimal Euclidean
embedding lies in the compact ball of radius
p
|X| max{d(x1 , xi ) : i = 2, . . . , n}
around the origin. Now we intersect these balls with the closed constraints
d(xi , xj ) ≤ kf (xi ) − f (xj )k,
and minimize the continuous function
 
kf (xi ) − f (xj )k
max :1≤i<j≤n .
d(xi , xj )
As usual it is more convenient to work with the squares of the norms.
9.2 Least distortion Euclidean embeddings via semidefinite optimization 197

Then we can consider the inner product matrix Z = (f (xi )T f (xj ))1≤i,j≤n ,
which is positive semidefinite. Also note that

kf (xi ) − f (xj )k2 = Zii − 2Zij + Zjj = hei eT T T T


i − ei ej − ej ei + ej ej , Zi.

We define the matrices

Fij = ei eT T T T
i − ei ej − ej ei + ej ej for 1 ≤ i < j ≤ n.

Thus the following semidefinite optimization problem computes the square


of c2 (X, d):
c2 (X, d)2 = min τ
n
τ ∈ R, Z ∈ S+ ,
(9.1)
h−Fij , Zi ≤ −d(xi , xj )2 ,
hFij , Zi ≤ τ d(xi , xj )2 for 1 ≤ i < j ≤ n.

Here τ = γ 2 and we can recover an optimal embedding f from an optimal


solution Z by applying the Cholesky factorization.
This shows how to compute an optimal Euclidean embedding of a finite
metric space. Another benefit of this formulation is that we can apply duality
theory. Then the dual maximization problem will play a key role to deter-
mine lower bounds for c2 (X, d) for specific graph classes. By using strong
duality we arrive at the following theorem.
Theorem 9.2.1 The least distortion of a finite metric space (X, d), with
X = {x1 , . . . , xn }, into Euclidean space is given by
( Pn 2
)
2 i,j=1:Yij >0 Yij d(xi , xj ) n
c2 (X, d) = max : Y ∈ S+ , Y e = 0 . (9.2)
− ni,j=1:Yij <0 Yij d(xi , xj )2
P

The condition Y e = 0 says that the all-ones vector e lies in the kernel of Y .
We would like to point out a technical difficulty before giving the proof
of the theorem. The dual of (9.1) reads
X
sup αij d(xi , xj )2
i,j

αij , βij ≥ 0 for 1 ≤ i < j ≤ n,


(9.3)
X X
n
− αij Fij + βij Fij ∈ S+ ,
i,j i,j
X
βij d(xi , xj )2 ≤ 1.
i,j
198 Euclidean embeddings: Low distortion (Version: May 24, 2022)

As always one can prove weak duality without any problem: For feasible
solutions of (9.1) and (9.3) we verify
X X X
τ− αij d(xi , xj )2 ≥ τ βij d(xi , xj )2 − αij hFij , Zi
i,j i,j i,j
X X
≥ βij hFij , Zi − αij hFij , Zi
i,j i,j
(9.4)
* +
X X
= − αij Fij + βij Fij , Z ≥ 0.
i,j i,j

To derive strong duality in (9.2) one is tempted to directly apply The-


orem 2.4.1 about conic duality, by verifying that there are strictly feasi-
P
ble points for the primal and the dual, and then set Y = − i,j αij Fij +
P
i,j βij Fij ; recall that the nonzero, nondiagonal elements of Fij are equal
to −1. But there is a problem in the dual: All matrices Fij have the all-ones
vector e in their kernel, so the dual does not have a strictly feasible solu-
tion. In the proof of the theorem we work around this by employing linear
programming duality.

Proof Fix the parameter τ and consider the polyhedron

Pτ = {Z ∈ S n : h−Fij , Zi ≤ −d(xi , xj )2 ,
hFij , Zi ≤ τ d(xi , xj )2 for 1 ≤ i < j ≤ n}.

This polyhedron is not empty because for every pair of distinct indices i, j
we can choose either Zij = − 21 d(xi , xj )2 or Zij = − 12 τ d(xi , xj )2 whereas the
diagonal elements Zii are all zero.
Suppose the intersection of the interior of Pτ and the interior of S+ n is

empty which implies c2 (X, d)2 ≥ τ . Now we shall construct a feasible so-
lution of the dual with objective at least τ . For this we use a separating
hyperplane, separating Pτ from S+ n . By Theorem A.4.5 there is a nonzero
n
matrix A ∈ S so that
n
sup{hA, Zi : Z ∈ Pτ } ≤ inf{hA, Zi : Z ∈ S+ }.

Because 0 ∈ S+ n the infimum is at most 0. If it would be strictly less than

0, we could make it −∞ by scaling which contradicts Pτ 6= ∅. Hence, the


infimum equals 0 and hA, Zi ≥ 0 for all Z ∈ S+ n . Thus, A ∈ S n because the
+
cone S+ n is selfdual.

It follows that hA, Zi ≤ 0 for all Z ∈ Pτ which means that the inequality
hA, Zi ≤ 0 is implied by the inequalities defining the polyhedron Pτ . We
9.2 Least distortion Euclidean embeddings via semidefinite optimization 199

apply linear programming duality now. We have


0 ≥ max{hA, Zi : Z ∈ Pτ }
n X X
= min τ βij d(xi , xj )2 − αij d(xi , xj )2 :
i,j i,j
X X o
αij , βij ≥ 0, βij Fij − αij Fij = A .
i,j i,j

We already saw that Pτ 6= ∅ and also the minimization problem has a feasible
solution. For suppose not, then by Farkas’ lemma, Theorem 2.6.1, there is a
matrix Z 0 with
hFij , Z 0 i ≤ 0, −hFij , Z 0 i ≤ 0 and hA, Z 0 i > 0.
So we could add a positive multiple of Z 0 to any feasible solution of the
maximization problem, thereby increasing the objective value by the same
positive multiple of hA, Z 0 i which finally contradicts that the maximum is
at most 0. Thus the feasible sets of the maximization and the minimization
problems are both not empty and we can indeed claim by linear program-
ming duality that both problems have a common optimal value δ. Then
δ ≤ 0 and the inequality hA, Zi ≤ δ is a nonnegative linear combination of
the inequalities defining Pτ .
Consider this representation
X X
A= βij Fij − αij Fij . (9.5)
i,j i,j

Now we may choose αij , βij to have disjoint support because for an optimum
solution Z ∗ of the maximization problem it can only happen that
hFij , Z ∗ i = τ d(xi , xj )2 or − hFij , Z ∗ i = −d(xi , xj )2 ,
but never both; unless of course τ = 1 which is not interesting. So we
can eliminate in the maximization problem all the inequalities with are not
sharp.
Define
 1
2
− 2 τ d(xi , xj )
 if βij > 0,
1
Zij = − 2 d(xi , xj ) 2 if αij < 0,

0 otherwise.

So Z ∈ Pτ and hence hA, Zi ≤ 0 which is equivalent to


* +
X X X X
0≥ βij Fij − αij Fij , Z = γ βij d(xi , xj )2 − αij d(xi , xj )2
i,j i,j i,j i,j
200 Euclidean embeddings: Low distortion (Version: May 24, 2022)

which again is equivalent to


αij d(xi , xj )2
P
Pi,j ≥ τ.
i,j βij d(xi , xj )2

This yields (9.2) by setting Y = A. By scaling of A in (9.5) so that the


coefficients βij satisfy i,j βij d(xi , xj )2 = 1 this also yields that equality
P

between (9.1) and (9.3) holds.

In the next sections we will use this theorem to find lower bounds for
least distortion Euclidean embeddings of several graphs. For this one has
to construct a matrix Y which sometimes appears to come out of the blue.
By complementary slackness, which is the same as analyzing the case of
equality in the proof of weak duality (9.4), we get hints where to search for
an appropriate matrix Y .
Corollary 9.2.2 If Y is an optimal solution of the maximization prob-
lem (9.2), then Yij > 0 only for the most contracted pairs. These are pairs
d(xi ,xj )
(xi , xj ) for which kf (xi )−f (xj )} is maximized. Similarly, then Yij < 0 only for
kf (xi )−f (xj )k
the most expanded pairs, maximizing d(xi ,xj ) .
For graphs most expanded pairs are simply adjacent vertices:
Lemma 9.2.3 Let (X, d) be a finite metric space which is defined by the
shortest path metric of a graph. Let f : X → Rn be a Euclidean embedding
of X. Then the most expanded pairs are always adjacent vertices.

Proof For a pair (x, y) let x = x0 , x1 , . . . , xk = y be a shortest path from


x to y in the graph so that d(x, y) = k. By the triangle inequality we have
kf (x) − f (y)k max{kf (xi−1 ) − f (xi )k : i = 1, . . . , k}
≤ ,
d(x, y) k
which is maximized when k = 1.

9.3 Least distortion embeddings of cubes


To warm up we consider the graph of the r-dimensional unit cube Qr =
(Vr , Er ) with vertex set Vr = {0, 1}r . Here two vertices are adjacent when-
ever their Euclidean distance equals 1. The standard Euclidean embedding

has distortion r. In fact, as the following theorem shows, one cannot im-
prove it.

Theorem 9.3.1 c2 (Qr ) = r.
9.3 Least distortion embeddings of cubes 201

1101 1111

1001 1011

0101 0111

0001 0011

0100 0110

0000 0010

1100 1110

1000 1010

Figure 9.2 The regular cube graph Q4 .

Proof Define the matrix Y ∈ RVr ×Vr by





 −1 if d(x, y) = 1,

r − 1 if x = y,
Yxy =


 1 if d(x, y) = r,

0 otherwise,

so that it will turn out that in a least distortion embedding of Qr the most
expanded pairs are coming from all adjacent pairs of vertices and the most
contracted pairs are coming from all pairs of vertices which are diametrically
opposite.
The matrix Y satisfies the properties of Theorem 9.2.1. We clearly have
Y e = 0 because every row of Y sums to

r(−1) + (r − 1) + 1 = 0.

We verify that Y is positive semidefinite by computing the eigenvalues.


202 Euclidean embeddings: Low distortion (Version: May 24, 2022)
T
For u ∈ Vr componentwise define the vector χu ∈ RVr by [χu ]x = (−1)u x .
The vectors form a basis of RVr and are eigenvectors of Y as a calculation
shows
X
[Y χu ]x = Yxy [χu ]y
y∈Vr
Ty
X
= Yxy (−1)u
y∈Vr
Tx T (y−x)
X
= (−1)u Yxy (−1)u
y∈Vr
r
!
X T T T
Pr
u
= [χ ]x (−1) (−1)u e` + (r − 1)(−1)u 0 + 1 · (−1)u ( `=1 e` ) ,
`=1

where e` is the `-th standard basis vector of Rr . We see that the second
factor does not depend on x, so χu is indeed an eigenvector of Y . To see
that the eigenvalues are all nonnegative we consider the possible values of
T
the first summand (−1) r`=1 (−1)u e` which are −r, −r + 2, . . . The second
P

summand equals r − 1 and the third summand is either +1 or −1 depending


on the parity of u. So we only have to look at the case when the first
summand is −r. But then u = 0 and the last sum is +1 and the entire sum
is nonnegative.
To complete the proof we only have to evaluate the objective value of Y :
The numerator equals
X
Yxy d(x, y)2 = 2r r2
x,y:Yxy >0

and the denominator is


X
− Yxy d(x, y)2 = 2r r,
x,y:Yxy <0

so,
2r r 2
c2 (Qr )2 ≥ = r.
2r r

9.4 Least distortion embeddings of strongly regular graphs


The next class of graphs we will consider are strongly regular graphs.
Let G = (V, E) be an undirected graph. We denote the neighborhood of a
vertex x ∈ V by
N (x) = {y ∈ V : {x, y} ∈ E}.
9.4 Least distortion embeddings of strongly regular graphs 203

Definition 9.4.1 A graph G = (V, E) is called strongly regular with pa-


rameters (v, k, λ, µ) if

(i) G is neither complete nor edgeless,


(ii) v = |V |,
(iii) for every pair of vertices x, y ∈ V we have

k if x = y,

|N (x) ∩ N (y)| = λ if {x, y} ∈ E,

µ otherwise.

Figure 9.3 Examples of strongly regular graphs include the square graph
with parameters (4, 2, 0, 2), the pentagon graph with parameters (5, 2, 0, 1),
or the Petersen graph with parameters (10, 3, 0, 1).

Next to this combinatorial definition of strongly regular graphs we can


also characterize them through the eigenvalues of their adjacency matrices.
The adjacency matrix of G = (V, E) is the 0/1-square matrix A ∈ {0, 1}V ×V
with entries
(
1 if {x, y} ∈ E,
Axy =
0 otherwise.

If G is a strongly regular graph with parameters (v, k, λ, µ) then its adjacency


matrix A satisfies the relation

A2 = (λ − µ)A + (k − µ)I + µJ (9.6)

since the xy-entry of A2 counts the number of walks from x to y of length


2 which is equal to |N (x) ∩ N (y)|.

Lemma 9.4.2 The adjacency matrix A of a strongly regular graph with


204 Euclidean embeddings: Low distortion (Version: May 24, 2022)

parameters (v, k, λ, µ) has eigenvalues:


k,
s 2
λ−µ λ−µ
r= + + (k − µ),
2 2
s 2
λ−µ λ−µ
s= − + (k − µ).
2 2
Proof We have Ae = ke. Let u be an eigenvector of A for the real eigenvalue
z which is orthogonal to e. By (9.6) this eigenvector satisfies the relation
z 2 u = A2 u = ((λ − µ)z + (k − µ)) u.
The eigenvalue z therefore satisfies the quadratic equation
z 2 − (λ − µ)z − (k − µ) = 0
with solution s 2
λ−µ λ−µ
z= ± + (k − µ).
2 2
Example 9.4.3 (Example of Figure 9.4 continued) The eigenvalues of the
adjacency matrix of the square graph (4, 2, 0, 2) are
k = 2, r = 0, s = −2,
the eigenvalues of the adjacency matrix of the pentagon graph (5, 2, 0, 1) are
√ √
−1 + 5 −1 − 5
k = 2, r = , s= ,
2 2
and the eigenvalues of the adjacency matrix of the Petersen graph (10, 3, 0, 1)
are
k = 3, r = 1, s = −2.
The parameters λ and µ are integers and satisfy the bounds
0≤λ≤k−1 and 0 ≤ µ ≤ k.
Hence,
k ≥ r ≥ 0 > s,
where we have k = r if and only if λ = k − 1 and µ = 0. The graph
is disconnected in this case as the following lemma shows. Otherwise, the
graph is connected, has diameter 2, and its adjacency matrix has exactly
three different eigenvalues.
9.4 Least distortion embeddings of strongly regular graphs 205

Lemma 9.4.4 The G be a strongly regular graph with parameters (v, k, λ, µ).
Then the following statements are equivalent:
(i) G is not connected.
(ii) µ = 0.
(iii) λ = k − 1.
(iv) G is isomorphic to m > 1 copies of the complete graph Kk+1 on k + 1
vertices.
Proof Exercise 9.6
If (X, d) is a finite metric space defined by the shortest path metric of a
connected strongly regular graph, then distinct pairs of points are either at
distance 1 or at distance 2. It will turn out that there exists a least distortion
embedding for which all pairs of points at distance 1 are most expanded pairs
and all pairs of points at distance 2 are most contracted pairs.
Definition 9.4.5 Let (X, d) be a finite metric space. We say that an em-
bedding f : X → Rn of X into Euclidean space is faithful if for every two
pairs of points (x, y) and (x0 , y 0 ) ∈ X × X we have
d(x, y) = d(x0 , y 0 ) =⇒ kf (x) − f (y)k = kf (x0 ) − f (y 0 )k.
Lemma 9.4.6 For every connected strongly regular graph G = (V, E) there
exists a faithful embedding into Euclidean space with minimal distortion.
Proof Let Z ∈ S+ V be the Gram matrix of an embedding f : V → Rn .

Suppose f satisfies the inequality


d(x, y)2 ≤ hFxy , Zi ≤ τ d(x, y)2

for all x, y ∈ V so that the distortion of f is at most τ .
We will use Z to construct a faithful embedding f¯ which also has distortion

at most τ . For this we consider the matrix algebra A generated by the
adjacency matrix A, and by I, and J.
By (9.6) the algebra A has dimension 3 and it is contained in the space of
symmetric matrices. When G is connected it has two interesting orthogonal
basis, one combinatorial and one spectral.
The combinatorial basis is defined by the i-th adjacency matrix, with
i = 0, 1, 2,
(
1 if d(x, y) = i,
[Ai ]xy =
0 otherwise.
Clearly, A0 = I, A1 = A, and A2 = J − I − A. We have orthogonality
hAi , Aj i = 0 for i 6= j because the support of Ai and of Aj are disjoint.
206 Euclidean embeddings: Low distortion (Version: May 24, 2022)

Let k > r > s be the eigenvalues of A. Let u be a normal eigenvector


of the eigenspace of A for the eigenvalue k, let v1 , . . . , vf be pairwise or-
thonormal vectors spanning the eigenspace of A for the eigenvalue r, and
let w1 , . . . , wg be pairwise orthonormal vectors spanning the eigenspace of
A for the eigenvalue s. Define the spectral basis
f
X g
X
T
E0 = uu , E1 = vi viT , E2 = wj wjT .
i=1 j=1

Clearly,

A = kE0 + rE1 + sE2 , I = E0 + E1 + E2 , J = vE0 ,

and hEi , Ej i = 0 for i 6= j and hEi , Ei i = 1 because u, v1 , . . . , vf , w1 , . . . , wg


form an orthonormal basis of RV . It is also clear from the definition that
the matrices E0 , E1 , E2 are positive semidefinite.
When we project the matrix Z orthogonally onto the algebra A we get
the positive semidefinite matrix

Z̄ = hE0 , ZiE0 + hE1 , ZiE1 + hE2 , ZiE2 .

We shall verify that Z̄ defines a faithful Euclidean embedding f¯ with dis-


tortion at most τ . To do so we represent Z̄ in terms of the combinatorial
basis
hA0 , Zi hA1 , Zi hA2 , Zi
Z̄ = A0 + A1 + A2 ,
hA0 , A0 i hA1 , A1 i hA2 , A2 i
and because the entries
hAd(x,y) , Zi
Z̄xy = A .
hAd(x,y) , Ad(x,y) i d(x,y)

only depend on the distance d(x, y) it follows that f¯ is faithful. We set for
i = 0, 1, 2
Mi = {(x, y) ∈ X × X : d(x, y) = i},

and get
1 X
Z̄xy = Z x0 y 0 .
|Md(x,y) |
(x0 ,y 0 )∈M d(x,y)

Since
1 X
d(x, y)2 = d(x0 , y 0 )2
|Md(x,y) |
(x0 ,y 0 )∈Md(x,y)
9.4 Least distortion embeddings of strongly regular graphs 207

we see that the distortion of f¯ is at most τ :
1 X
d(x, y)2 ≤ hFx0 y0 , Zx0 y0 i ≤ τ d(x, y)2 .
|Md(x,y) | 0 0
(x ,y )∈Md(x,y)

Theorem 9.4.7 Let G = (V, E) be a connected, strongly regular graph


with parameters (v, k, λ, µ) having eigenvalues k > r ≥ 0 > s. Then
s
(v − k − 1)(k − r)
c2 (G) = 2 .
(v − k + r)k

Proof By the previous lemma there exists a least distortion Euclidean em-
bedding of G which is faithful. For this embedding the most expanded pairs
are all pairs of adjacent vertices and the most contracted pairs are all pairs
of nonadjacent vertices. So we may assume that for the maximization prob-
lem (9.2) there is a matrix attaining the maximum which is of the form

Yα = (k − α(v − k − 1))A0 − A1 + αA2 ,

where we used the i-th adjacency matrix Ai introduced in the proof of the
previous lemma. We have
Yα e = (k − α(v − k − 1))A0 e − A1 e + αA2 e
= ((k − α(v − k − 1)) − k + α(v − k − 1))e = 0.
The numerator of the objective function equals
X
22 [Yα ]xy = 4αv(v − k − 1)
x,y:Yxy >0

and the denominator of the objective function


X
− 12 [Yα ]xy = vk,
x,y:Yxy <0

together
 
2 α(v − k − 1)
c2 (G) = max 4 : α ∈ R, Yα is positive semidefinite ,
k
so the only thing left is to maximize α so that the matrix Yα is positive
semidefinite. The eigenvectors of A and of Yα coincide by the construction
of Yα . So we only need to ensure that the corresponding eigenvalues given
by
Yα vi = ((k − α(v − k − 1)) − r − α(r + 1))vi , i = 1, . . . , f,
208 Euclidean embeddings: Low distortion (Version: May 24, 2022)

and similarly by

Yα wj = ((k − α(v − k − 1)) − s − α(s + 1)wj , j = 1, . . . , g,

are nonnegative. Hence,

k − r ≥ α(v − k + r) and k − s ≥ α(v − k + s).

Since k−s > 0 the second condition gives a constraint for α only if v−k+s >
0. But then this constraint is dominated by
k−r
α≤
v−k+r
because
k−r k−s k−r v−k+r
≤ ⇐= <1< ,
v−k+r v−k+s k−s v−k+s
k−r
as r > s, so the maximum possible α equals v−k+r which finishes the proof.

Example 9.4.8 (Example of Figure 9.4 further continued) The least dis-
tortion of the square graph equals
s
(4 − 2 − 1)(2 − 0) √
2 = 2,
(4 − 2 + 0)2

the least distortion of the pentagon graph equals


v √
u (5 − 2 − 1)(2 − −1+ 5 ) √
u
2t √ 2 = 5 − 1,
(5 − 2 + −1+2 5 )2

and the least distortion of the Petersen graph equals


s
(10 − 3 − 1)(3 − 1) √
2 = 2.
(10 − 3 + 1)3

9.5 Least distortion embeddings of expander graphs


Theorem 9.5.1 Let G = (V, E) be a k-regular graph with an even number
n = |V | of vertices. Suppose that the second largest eigenvalue λ2 of the
adjacency matrix A of G satisfies k − λ2 ≥  > 0. Then we have
r j
 nk 
c2 (G) ≥ logk −1 .
2k 2
9.5 Least distortion embeddings of expander graphs 209

In particular, for families of k-regular graphs (Gn ) with n vertices and


with n → ∞ for which k and  are constant, independent of n, we have

c2 (Gn ) ∈ Ω(log n).

Such family of graphs are called expander graphs. So, Bourgain’s theorem is
tight for expander graphs.
In the proof of the theorem we will apply Dirac’s theorem on Hamiltonian
cycles from graph theory:

Lemma 9.5.2 Let G = (V, E) be a graph with n ≥ 3 vertices. Suppose


every vertex in G has at least n/2 neighbors, then G contains a Hamiltonian
cycle, that is, a cycle which passes through all the vertices of G.

Proof The graph G is connected because every connected component has at


least n/2+1 vertices and so there is space only for one connected component.
Let P = (x0 , x1 , . . . , xk ) be a longest path in G. Every neighbour of x0
and xk must be contained in P by maximality. By the pigeon hole principle
there must be an index i with 0 ≤ i ≤ k − 1 for which both {x0 , , xi+1 } and
{xi , xk } are edges in G.
We claim that the cycle

C = (x0 , xi+1 , xi+2 , . . . , xk , xi , xi−1 , . . . , x0 )

is Hamiltonian. For suppose not. Then there are vertices which are not
contained in C. Then, since G is connected, there is a vertex xj in C and
another vertex y not contained in C such that {xj , y} ∈ E. Define a path
Q starting at y, going to xj , around the cycle C, and stopping before xj .
But Q would be one edge longer than P in contradiction to the maximality
of P .

Proof of Theorem 9.5.1 Again we shall construct an appropriate matrix Y


and apply Theorem 9.2.1 to derive the lower bound on c2 (G).
We set
j nk
r = logk −1
2
and define the auxiliary graph H = (V, EH ) with edges {x, y} ∈ EH when-
ever the shortest path between x and y in G has length at least r.
In H every vertex has degree at least n/2. Indeed, since G is k-regular
every vertex x ∈ V has one vertex at distance 0, it has k vertices at distance
1, it has at most k(k − 1) vertices at distance 2, . . . , it has at at most
k(k − 1)r−1 vertices at distance r, see Figure 9.5.
210 Euclidean embeddings: Low distortion (Version: May 24, 2022)

Figure 9.4 Vertices at distance at most r = 3 in a k = 3-regular graph.

Hence, the vertex x has at most


r
X
r−1
1 + k + k(k − 1) + · · · + k(k − 1) ≤ k i ≤ k r+1
i=0

vertices at distance at most r. Now r is chosen so that k r+1 ≤ n/2.


By Dirac’s theorem, Lemma 9.5.2, H contains a Hamiltonian cycle. Since
n is even, we can choose in the Hamiltonian cycle every second edge and
get a perfect matching in H. Let B ∈ S V be the adjacency matrix of such
a perfect matching. It is a permutation matrix of a permutation consisting
out of n/2 disjoint transpositions. We denote the edges in H participating
in the perfect matching by M .
Define the matrix Y by

Y = kI − A + (B − I).
2
In the remainder of the proof we show that Y satisfies the assumptions of
Theorem 9.2.1.
It is easy to verify that Y e = 0 holds:

Y e = ke − ke + (Be − Ie) = 0,
2
where Be = e holds because B is a permutation matrix.
Next we verify that Y is positive semidefinite. For this let v ∈ RV be a
9.6 Notes and further reading 211

vector which is perpendicular to e. By the Rayleigh-Ritz principle we we


have
v T Av ≤ λ2 v T v,
and then the desired inequality

v T Y v = v T (kI − A + (B − I))v
2

≥ (k − λ2 )v v + v T (B − I)v
T
2
T  X
≥ v v + (2vx vy − vx2 − vy2 )
2
{x,y}∈M
 X
≥ xT x − · 2 (vx2 + vy2 )
2
{x,y}∈M
T T
= v v − v v
=0
follows.
To finish the proof we only have to evaluate Y ’s objective value. The
numerator of the objective function equals
X X 
Yxy d(x, y)2 = Bxy d(x, y)2 ≥ nr2 ,
2 x,y 2
x,y:Yxy >0

and the denominator of the objective function is


X
− Yxy d(x, y)2 = kn,
x,y:Yxy <0

hence,
 2
c2 (G)2 ≥ r .
2k

9.6 Notes and further reading


Linial, London, Rabinovich [1995] noticed that one can compute a least dis-
tortion Euclidean embedding via a semidefinite optimization problem. They
also gave the dual formulation Theorem 9.2.1. Enflo [1969] proved that the

least Euclidean distortion of the r-dimensional unit cube equals r, Theo-
rem 9.3.1. Linial, Magen [2000] realized that one can easily derive Enflo’s
result using the dual formulation. Least distortion Euclidean embeddings
of strongly regular graphs (Theorem 9.4.7) were found by Vallentin [2008].
Linial, London, Rabinovich [1995] showed that Bourgain’s theorem is tight
for expander graphs, the proof of Theorem 9.5.1 is from Linial, Magen [2000].
212 Euclidean embeddings: Low distortion (Version: May 24, 2022)

In this chapter we only took the semidefinite perspective of embedding fi-


nite metric spaces. Much more can be said about this geometrization of com-
binatorics, its relations to functional analysis (Ribe program), data science
(dimension reduction and Johnson-Lindenstrauss flattening lemma) and ap-
proximation algorithms (multicommodity flow and sparsest cut). For this we
refer to Linial [2002] and to Chapter 15 of Matoušek [2002]. The general idea
of relating Banach space concepts (linear, norm, complete) to metric space
concepts is central to the “Ribe program” in functional analysis, see the in-
troduction by Naor [2012]. In the Ribe program Bourgain’s theorem is seen
as the “grand ancestor” of the area of metric embeddings. One algorithmic
proof is presented in Chapter 15.7 of Matoušek [2002]. However, currently,
there is no proof known which is based on semidefinite optimization. Goe-
mans [1997] writes: “it would be nice to prove this result from semidefinite
programming duality.”
Strongly regular graphs are highly structured graphs which are relevant for
statistics, Euclidean geometry, group theory, finite geometry and extremal
combinatorics, see the monograph Brouwer, Van Maldeghem [2022].
Expander graphs are highly connected graphs with few edges. They have
many applications in mathematics and computer science, see the comple-
mentary surveys Hoory, Linial, Wigderson [2006] and Lubotzky [2012].
In this chapter we treated least distortion Euclidean embeddings of reg-
ular cubes, strongly regular graphs, and expander graphs. More cases have
been treated in the literature: product of cycles (Linial, Magen [2000]), trees
(Bourgain [1986], Matoušek [1999]), planar graphs (Rao [1999], Newman,
Rabinovich [2003]), graphs of high girth (Linial, Magen, Naor [2002]), dis-
tance regular graphs (Vallentin [2008], Kobayashi, Kondo [2015], Cioabă,
Gupta, Ihringer, Kurihara [2021]), In all these cases except for trees and
planar graphs Theorem 9.2.1 has been used. It would be interesting to re-
cover all cases from the theorem.

Exercises
9.1 Let (X, d) be a finite metric space. Show that c2 (X) ≤ D with D =
max{d(x, y) : x, y ∈ X} being the diameter of X.
9.2 Let G = (V, E) be a k-regular graph with n vertices. Let
k = λ1 ≥ λ2 ≥ . . . ≥ λn
be the eigenvalues of the adjacency matrix of G. Show that:
(a) λi ∈ [−k, k] for all i = 1, . . . , n.
Exercises 213

(b) The number of connected components of G coincides with the mul-


tiplicity of the largest eigenvalue. In particular, G is connected if
and only if λ1 > λ2 .
(c) G is bipartite if and only if λ1 = −λn .
(d) λ22 ≥ k n−k
n−1 .
9.3 Determine the least Euclidean distortion of the star graph on n vertices,
see Figure 9.1.
9.4 Let Cn be circuit graph of length n. Show that c2 (Cn ) = n2 sin πn when
n is even.
9.5 Consider the symmetric group Sn . The graph Gn has vertex set V =
Sn , and two vertices σ, π ∈ Sn are adjacent if and only if σ is the
composition of π and a transposition that swaps consecutive elements.

Show that c2 (Gn ) = O( log n!).
9.6 Prove Lemma 9.4.4.
9.7 Let G be a strongly regular graph. Show that also the complementary
graph G is strongly regular.
9.8 Let G be a connected k-regular graph. Suppose that next to k it has
only two more eigenvalues r and s. Show that G is a strongly regular
graph.
9.9 The vertex set of the triangular graph Tn consists of all 2-subsets {i, j}
with i, j ∈ [n]. In the triangular graph two vertices are adjacent when-
ever their intersection is not empty. Determine a least distortion Eu-
clidean embedding of Tn .
9.10 Prove the expander mixing lemma: Let G = (V, E) be a k-regular graph
and let λ2 the second largest eigenvalue of its adjacency matrix. Then
for S, T ⊆ V we have
|S||T | p
|{{x, y} ∈ E : x ∈ S, y ∈ T }| − k ≤ λ2 |S||T |.
|V |
10
Packings on the sphere

Packing problems are fundamental in geometric optimization and coding


theory: How densely can one pack given objects into a given container?
In this lecture the container will be the unit sphere

S n−1 = {x ∈ Rn : x · x = 1}

and the objects we want to pack are spherical caps of angle γ. The spherical
cap with angle γ ∈ [0, π] and center x ∈ S n−1 is given by

C(x, γ) = {y ∈ S n−1 : x · y ≥ cos γ}.

Its normalized volume equals (by integration with spherical coordinates)

ωn−1 (S n−2 ) 1
Z
w(γ) = (1 − u2 )(n−3)/2 du,
ωn (S n−1 ) cos γ

where ωn (S n−1 ) = (2π n/2 )/Γ(n/2) is the surface area of the unit sphere.
Two spherical caps C(x1 , γ) and C(x2 , γ) intersect in their topological in-
terior if and only if the inner product of x1 and x2 lies in the half-open
interval (cos(2γ), 1]. Conversely we have

C(x1 , γ)◦ ∩ C(x2 , γ)◦ = ∅ ⇐⇒ −1 ≤ x1 · x2 ≤ cos(2γ).

A packing of spherical caps with angle γ, is a collection of any number


of spherical caps with this angle and pairwise-disjoint topological interiors.
Given the dimension n and the angle γ we define1

A(n, 2γ) = max{N : C(x1 , γ), . . . , C(xN , γ) is a packing in S n−1 }.


1 Note here that we use 2γ in the definition of A(n, 2γ) because we want to make the notation
consistent with the common literature. There one emphasizes that 2γ is the angle between
the centers of the spherical caps.
10.1 α and ϑ for packing graphs 215

One particular case of packings of spherical caps has received a lot of


attention over the last centuries.
In geometry, the kissing number τn is the maximum number of non-
overlapping equally-sized spheres that can simultaneously touch a central
sphere. It is easy to see that τn = A(n, π/3) because the points where the
spheres touch the central sphere form the centers of a packing of spherical
caps with angle π/6.
Today, the kissing number is only known for dimensions 1, 2, 3, 4, 8 and
24. It is easy to see that the kissing number in dimension 1 is 2, and in
dimension 2 it is 6. The kissing number problem has a rich history. In 1694
Isaac Newton and David Gregory had a famous discussion about the kissing
number in three dimensions. The story is that Gregory thought thirteen
spheres could fit while Newton believed the limit was twelve. Note that the
easy area argument, which proves τ2 = 6, only gives that
   
1 4π
τ3 ≤ = = b14.92 . . .c = 14.
w(π/3) 2π(1 − cos(π/6))
It took many years, until 1953, when Schütte and van der Waerden proved
Newton right.

Figure 10.1 Construction of 12 kissing spheres. Image credit: Anja Traffas

In the 1970s advanced methods to determine upper bounds for the kiss-
ing number based on linear programming were introduced. Using these new
techniques, the kissing number problem in dimension 8 and 24 was solved
by Odlyzko, Sloane, and Levensthein. For four dimensions, however, the op-
timization bound is 25, while the exact kissing number is 24. In a celebrated
work Oleg Musin proved this in 2003, see Pfender, Ziegler [2004].
The goal of this lecture is to provide a proof of τ8 = 240.

10.1 α and ϑ for packing graphs


Many, often notoriously difficult, problems in combinatorics and geometry
can be modeled as packing problems of graphs G = (V, E) where the vertex
216 Packings on the sphere

set V can be an infinite or even a continuous set. All possible positions of


the objects which we can use for the packing are vertices of a graph and we
draw edges between two vertices whenever the two corresponding objects
cannot be simultaneously present in the packing because they overlap in
their interior. Now every independent set in this conflict graph gives a valid
packing.
For the problem of determining the optimal packing of spherical caps with
angle γ, A(n, 2γ), we define the packing graph G(n, 2γ) with vertex set

V = S n−1 = {x ∈ Rn : x · x = 1},

and edge set

x ∼ y ⇐⇒ x · y ∈ (cos(2γ), 1).

Then,

A(n, 2γ) = α(G(n, 2γ)), and τn = A(n, π/3) = α(G(n, π/3))

Now it is an “obvious” strategy to compute the theta number for this


graph in order to find upper bounds for the independence number α(G(n, 2γ)).
To generalize the theta number for infinite graphs, we will need a notion of
positive semidefinite infinite matrices because in the definition of the theta
number we need matrices whose rows and columns are indexed by the vertex
set of the graph.
This leads to positive semidefinite, continuous Hilbert-Schmidt kernels.

Definition 10.1.1 A continuous function (called continuous Hilbert-Schmidt


kernel)

K : S n−1 × S n−1 → R

is called symmetric if K(x, y) = K(y, x) holds for all x, y ∈ S n−1 . It is called


positive semidefinite if for all N and all x1 , . . . , xN ∈ S n−1 the symmetric
N × N matrix

K(xi , xj ) 1≤i,j≤N  0

is positive semidefinite. We denote the cone of positive semidefinite contin-


uous Hilbert-Schmidt kernels by C(S n−1 × S n−1 )0

We use this cone C(S n−1 × S n−1 )0 to define the theta prime number of
10.2 Symmetry reduction 217

the packing graph G(n, 2γ):


ϑ0 (G(n, 2γ)) = inf λ
K ∈ C(S n−1 × S n−1 )0
K(x, x) = λ − 1 for all x ∈ S n−1
K(x, y) ≤ −1 for all {x, y} 6∈ E.
We have {x, y} 6∈ E whenever the spherical caps C(x, γ) and C(y, γ) do
not intersect in their topological interior, i.e. whenever x · y ∈ [−1, cos(2γ)].
The definition ϑ0 is similar to the dual formulations in Lemma 6.4.1. We
use a prime to indicate that we replace the equality K(x, y) = −1 by the
inequality K(x, y) ≤ −1.
Similar to the finite case, ϑ0 provides an upper bound for the independence
number:
Theorem 10.1.2
α(G(n, 2γ)) ≤ ϑ0 (G(n, 2γ))
Proof Let C ⊆ S n−1 be an independent set. Let K be a feasible solution
of ϑ0 (G(n, 2γ)). Because K is positive semidefinite we have
XX
0≤ K(x, y)
x∈C y∈C
X X
= K(x, x) + K(x, y)
x∈C x6=y
| {z } | {z }
=|C|(λ−1) ≤(−1)(|C|2 −|C|)

≤ |C|(λ − 1) − (|C|2 − |C|)

This implies |C| ≤ λ, yielding the theorem.


Note that if we are in the lucky case that α(G(n, 2γ)) = ϑ0 (G(n, 2γ)),
the inequalities in the proof of the theorem are tight. This can only happen
when K(x, y) = −1 for {x, y} 6∈ E. We will use this observation later when
we determine τ8 .

10.2 Symmetry reduction


Computing ϑ0 does not seem to be easy since it is defined as an infinite-
dimensional semidefinite program. However, the underlying graph is highly
symmetric and so we can perform symmetry reduction, similar to the one
in Chapter 6.6.
218 Packings on the sphere

The automorphism group of the graph G(n, 2γ) is the orthogonal group
O(n) because for all A ∈ O(n) we have

Ax · Ay = x · y.

Furthermore the graph G(n, 2γ) is vertex transitive because for every two
points x and y on the unit sphere there is an orthogonal matrix mapping x
to y. Even stronger it is two-point homogeneous, meaning that if x, y, x0 , y 0 ∈
S n−1 are so that
x · y = x0 · y 0 ,

then there is an A ∈ O(n) with Ax = x0 , Ay = y 0 .


If K is a feasible solution for ϑ0 with objective value λ = K(x, x) + 1 and
if A ∈ O(n) is an orthogonal matrix then also

K A (x, y) = K(Ax, Ay)

is a feasible solution for ϑ0 with the same objective value. So we can sym-
metrize any feasible solution K of ϑ0
Z
K 0 (x, y) = K A (x, y)dµ(A),
A∈O(n)

where µ is the normalized Haar measure of the orthogonal group.


That means that we can restrict the optimization variable K to be a
positive semidefinite continuous Hilbert-Schmidt kernel which is invariant
under the orthogonal group, i.e.
O(n)
K ∈ C(S n−1 × S n−1 )0 ,

where
O(n)
C(S n−1 × S n−1 )0
= {K ∈ C(S n−1 × S n−1 ) : K A (x, y) = K(Ax, Ay) = K(x, y) for all A ∈ O(n)}.

So we get

ϑ0 (G(n, 2γ)) = inf λ


O(n)
K ∈ C(S n−1 × S n−1 )0
K(x, x) = λ − 1 for all x ∈ S n−1
K(x, y) ≤ −1 for all {x, y} 6∈ E.
10.3 Schoenberg’s theorem 219

10.3 Schoenberg’s theorem


Now the idea is to find an explicit characterization of the cone C(S n−1 ×
O(n)
S n−1 )0 . Such a characterization was proved by Schoenberg in 1941. He
parameterized this cone by its extreme rays.

Theorem 10.3.1 (Schoenberg (1941))


(∞ ∞
)
O(n)
X X
C(S n−1 × S n−1 )0 = fk Ekn (x, y) : fk ≥ 0, fk < ∞ , (10.1)
k=0 k=0

where
Ekn (x, y) = Pkn (x · y),

and where Pkn is a polynomial of degree k satisfying the orthogonality relation


Z 1 n−3
Pkn (t)Pln (t)(1 − t2 ) 2 dt = 0 if k 6= l,
−1

and where the polynomial Pkn is normalized by Pkn (1) = 1.

The equality in (10.1) should be interpreted as follows: A kernel K lies in


C(S n−1 × S n−1 )O(n) if and only if there are nonnegative numbers f0 , f1 , . . .
so that the series ∞
P
k=0 fk converges and so that


X
K(x, y) = fk Ekn (x, y)
k=0

holds. Here the right hand side converges absolutely and uniformly over
S n−1 × S n−1 .
For n = 2, Pk2 are the Chebyshev polynomials (of the first kind). For
larger n the polynomials belong to the family of Jacobi polynomials. The
Jacobi polynomials with parameters (α, β) are orthogonal polynomials for
the measure (1 − t)α (1 + t)β dt on the interval [−1, 1]. They form a complete
orthogonal system of the space L2 ([−1, 1], (1 − t)α (1 + t)β dt). This space
consists of all real-valued functions f : [−1, 1] → R for which the integral
Z 1
f 2 (t)(1 − t)α (1 + t)β dt
−1

(α,β)
exists and is finite. We denote by Pk the normalized Jacobi polynomial of
(α,β)
degree k with normalization Pk (1) = 1. The first few normalized Jacobi
220 Packings on the sphere

polynomials with parameter (α, α) and α = (n − 3)/2 are


(α,α)
P0n (t) = P0 (t) = 1,
(α,α)
P1n (t) = P1 (t) = t,
(α,α) n 2 1
P2n (t) = P2 (t) = t − .
n−1 n−1
Much more information is known about these orthogonal polynomials. They
are also known to many computer algebra systems.

sage: x = PolynomialRing(QQ, ’x’).gen()


sage: n = 4
sage: a = (n-3)/2
sage: for k in range(0,5):
sage: print(jacobi_P(k,a,a,x)/jacobi_P(k,a,a,1))

1
x
4/3*x^2 - 1/3
2*x^3 - x
16/5*x^4 - 12/5*x^2 + 1/5

10.4 Proof of Schoenberg’s theorem


In this section we prove Theorem 10.3.1 in three steps. In the first two steps
we derive some properties of the extreme rays Ekn .

10.4.1 Orthogonality relation


The space of symmetric continuous Hilbert-Schmidt kernel is an inner prod-
uct space, just like the space of symmetric matrices. The inner product
10.4 Proof of Schoenberg’s theorem 221

between K and L is
Z
hK, Li = K(x, y)L(x, y)dωn (x)dωn (y).
S n−1
Lemma 10.4.1 We have the orthogonality relation Ekn ⊥ Eln whenever
k 6= l.
Proof Since Ekn (x, y) = Pkn (x·y) and the integrals are invariant under O(n),
we can take x = N , where N is the North Pole, and therefore,
Z
n n n−1
hEk , El i = ωn (S ) Pkn (N · y)Pln (N · y)dωn (y)
S n−1
Z 1
n−3
= ωn (S n−1 )ωn−1 (S n−2 ) Pkn (t)Pln (t)(1 − t2 ) 2 dt
−1
= 0,
if k 6= l.

10.4.2 Positive semidefiniteness


Lemma 10.4.2 The Ekn ’s are positive semidefinite.
Proof Let us consider the space of continuous functions f : S n−1 → R with
inner product
Z
(f, g) = f (x)g(x)dωn (x).
S n−1

Let V0 be the space of constant functions on S n−1 and, for k ≥ 1, let Vk be


the space of polynomial functions on S n−1 of degree k which are orthogonal
to V0 , V1 , . . . , Vk−1 .
The key idea is to relate Vk to Ekn .
Fix x ∈ S n−1 . Consider the evaluation map f 7→ f (x). This is a linear
function on Vk . By the Riesz representation theorem2 , there is a unique
vk,x ∈ Vk with
(vk,x , f ) = f (x).
Claim: αk vk,x (y) = Ekn (x, y) for some αk > 0
Proof Note that both sides are polynomials of the right degree. Also, vk,x (y)
is invariant under rotations that leave x fixed: Let A ∈ O(n) such that
Ax = x. Then,
(Avk,x )(y) = vk,x (A−1 y) = vk,x (y),
2 In fact it follows from basic linear algebra because Vk is of finite dimension.
222 Packings on the sphere

because we have
(Avk,x , f ) = (vk,x , A−1 f )
= f (Ax)
= (vk,Ax , f ) (by definition of vk,· )
= (vk,x , f ) (Ax = x)
and by uniqueness of vk,x , it follows that Avk,x = vk,x . Thus (x, y) 7→ vk,x (y)
is purely a function of x · y.
Also, for k 6= l, vk,x ⊥ vl,x since Vk ⊥ Vl , thus they have the right orthog-
onality relations. Hence Ekn (x, y) and vk,x (y) are multiples of each other.
Since we have
Ekn (x, x) = 1 and vk,x (x) = (vk,x , vk,x ) > 0,
the claim follows.
Now we are ready to show that Ekn is positive semidefinite. Observe that
Ekn (x, y)
= αk vk,x (y) and that vk,x (y) = (vk,y , vk,x ). Thus we have,
Z Z
Ekn (x, y)f (x)f (y)dωn (x)dωn (y)
S n−1
Z S n−1
Z
= αk (vk,y , vk,x )f (x)f (y)dωn (x)dωn (y)
S n−1 S n−1
Z Z 
= αk vk,x f (x)dωn (x), vk,y f (y)dωn (y)
S n−1 S n−1
≥0
as both the integrals in the last inner product are identical. It follows that
Ekn is positive semidefinite.

10.4.3 End of proof


We first show that, if f0 , f1 , . . . are nonnegative numbers such that ∞
P
P∞ k=0 fk
converges, then the series f E n (x, y) converges absolutely and uni-
k=0 k k
formly for all x, y ∈ S n−1 .
By Lemma 10.4.2 Ekn is positive semidefinite and so
|Ekn (x, y)| ≤ Ekn (x, x) = Pkn (1) = 1
for all x, y ∈ S n−1 and so

X
fk Ekn (x, y)
k=0
10.4 Proof of Schoenberg’s theorem 223

converges absolutely for all x, y ∈ S n−1 .


Now, for all x, y ∈ S n−1 for all m ∈ N we have

X ∞
X
fk Ekn (x, y) ≤ fk
k=m k=m

and so the series also converges uniformly for all x, y ∈ S n−1 .


With the above observation, if we are given nonnegative numbers f0 , f1 , . . .
such that ∞
P
k=0 fk converges, then the kernel

X
K(x, y) = fk Ekn (x, y)
k=0

is continuous. From Lemma 10.4.2 it is also positive semidefinite, and so we


showed the inclusion “⊇”.
For the other inclusion “⊆” let K : S n−1 × S n−1 → R be a continu-
ous, positive semidefinite, and invariant kernel. Kernel K is invariant, so
let h : [−1, 1] → R be the function such that K(x, y) = h(x · y) for all
x, y ∈ S n−1 . The polynomials P0n , P1n form a complete orthogonal system of
L2 ([−1, 1], (1 − t2 )(n−3)/2 dt) with convergence in the L2 -norm.
We first claim that the fk are all nonnegative. To see this, recall the
orthogonality relation from Lemma 10.4.1. First note that
* +
X
n n
fk Ek , El ≥ 0
k

since this is the inner product of two positive semidefinite kernels. Now by
orthogonality of Ekn ’s, we have
* +
X
0≤ fk Ekn , Eln = fl hEln , Eln i
| {z }
k >0

This is possible only if fl ≥ 0.


To finish, we show that the series ∞
P
k=0 fk converges. To this end, consider
for m = 0, 1, . . . the function
m
X
hm (u) = h(u) − fk Pkn (u) for all u ∈ [−1, 1].
k=0

These are continuous functions. Moreover, since we have



X
hm = fk Pkn
k=m+1
224 Packings on the sphere

in the sense of L2 convergence, it follows that for each m the kernel Km (x, y) =
hm (x · y) is positive semidefinite.
This implies in particular that hm (1) ≥ 0 for all m. But then we have
m
X m
X
h(1) − fk = h(1) − fk Pkm (1) = hm (1) ≥ 0
k=0 k=0
P∞
and we conclude that the series of nonnegative terms k=0 fk converges to
a number less than or equal to h(1), as we wanted.

10.5 Delsarte’s LP method


Using Schoenberg’s theorem we can reformulate ϑ0 (G(n, 2γ)) where we use
the nonnegative optimization variables f0 , f1 , . . .
ϑ0 (G(n, 2γ)) = inf λ
f0 , f1 , . . . ≥ 0
X∞
fk < ∞
k=0 (10.2)

X
fk Pkn (1) = λ − 1
k=0
X
fk Pkn (t) ≤ −1 for all t ∈ [−1, cos(2γ)]
k

This problem has infinitely many variables. If we truncate the variables3


we get the following bound:
α(G(n, 2γ)) ≤ ϑ0 (G(n, 2γ)) ≤ inf λ
f0 , f1 , . . . , fd ≥ 0,
d
X
fk Pkn (1) = λ − 1
k=1
Xd
fk Pkn (t) ≤ −1 ∀t ∈ [−1, cos(2γ)]
k=1

Since this optimization problem is a linear program (with infinitely many


constraints) it carries the name linear programming bound. These kind of
linear programming bounds were first invented by Delsarte in 1973 in the
3 Formally we set 0 = fd+1 = fd+2 = . . .
10.6 τ8 equals 240 225

context of error correcting codes and therefore they also carry the name
“Delsarte’s LP method”.
Note that the infinitely many inequalities can be replaced by a finite
dimensional semidefinite condition using sums of squares (see Chapter 2.7):
d
X
−1 − fk Pkn (t) = p(t) − (t + 1)(t − cos(2γ))q(t)
k=1

where p and q are polynomials which can be written as sum of squares.

10.6 τ8 equals 240


It so happens that for n = 8, α(G(8, π/3)) = ϑ0 (G(8, π/3)) = 240. This
result is due to Odlyzko, Sloane, and independently due to Levenshtein.
First, consider the set of 240 points in S 7 obtained by all possible permu-
tations and sign-changes of the point
 T
1 1
A = √ , √ , 0, 0, 0, 0, 0
2 2
and all possible even sign-changes of the point

1 1 1 1 1 1 1 T
 
B= √ ,√ ,√ ,√ ,√ ,√ ,√
8 8 8 8 8 8 8
There are 82 22 = 112 points generated by A and 27 = 128 points generated


by B. All possible inner products for points from this set are
 
1 1
−1, − , 0, , 1 .
2 2
In particular, note that there is no inner product between 12 and 1. Thus,
this is a valid kissing configuration. In fact, this configuration of points on
the unit sphere is coming from the root system E8 which has connections to
many areas in mathematics and physics.
Now, taking hints from the formulation for ϑ0 (G(8, π/3)), we explicitly
construct a kernel K(x, y). Recall, K(x, y) = −1 if {x, y} 6∈ E. Also, recall
that K(x, y) was a function of the inner product x·y = t only. Now, consider
the following polynomial
1 2 2
   
1
F (t) = −1 + β(t + 1) t + t t−
2 2
Note that, F (−1) = F (−1/2) = F (0) = f (1/2) = −1 by construction. Also,
226 Packings on the sphere

F (t) ≤ −1 for t ∈ [−1, 1/2]. Setting, F (1) = λ − 1 = 240 − 1 = 239, we get


β = 320
3 .
Now, it can be verified (Exercise 12.1 (a)) that
6
X
F (t) = fk Pk8 (t), fk ≥ 0. (10.3)
k=0

Thus, F (t) is a feasible point for the optimization problem (10.2).


Now by construction of the set of points, we know that α(G(8, π/3)) ≥
240. By the construction of F (t), we know that ϑ0 (G(8), π/3) ≤ 240. Thus
we have α = ϑ0 (G(8, π/3)). Thus,
τ8 = 240.

10.7 The problem of the thirteen spheres


We will now give a computer proof which resolves the problem of the thirteen
spheres. It is a “stripped” version of the proof presented in Bachoc, Vallentin
[2008] which also works in arbitrary dimension n.

10.7.1 Reminder: Some trigonometry and stereometry


Th approach of finding upper bounds for the cardinality of spherical codes
is based on spherical harmonics (Fourier analysis on the unit sphere). For
the unit sphere S 2 spherical harmonics can be understood using high school
math. We recall the trigonometry and stereometry which we use.
We have the addition formula for cosine
cos(α − β) = cos α cos β + sin α sin β.
This follows easily from Euler’s formula
eiθ = cos θ + i sin θ,
because
cos(α − β) = <(ei(α−β) )
= <(eiα e−iβ )
= <((cos α + i sin α)(cos(−β) + i sin(−β)))
= <((cos α + i sin α)(cos β − i sin β))
= cos α cos β + sin α sin β,
where we used that cosine is an even function, cos(−β) = cos β, and sine is
10.7 The problem of the thirteen spheres 227

an odd function, sin(−β) = − sin(β). Also, since cosine is an even function,


the addition formula for cosine holds with absolute values:

cos |α − β| = cos α cos β + sin α sin β.

Let x = (x1 , x2 , x3 )T ∈ S 2 be a point on the 2-dimensional unit sphere


S 2 = {x ∈ R3 : x · x = 1}. We can represent the point x using spherical
coordinates θ ∈ [0, π] and ϕ ∈ [0, 2π] by

x1 = sin θ cos ϕ
x2 = sin θ sin ϕ
x3 = cos θ.

If we have two points x = (x1 , x2 , x3 )T , x0 = (x01 , x02 , x03 )T ∈ S 2 , both lying


on the unit sphere S 2 then we can express the inner product between x and
x0 using their spherical coordinates (θ, ϕ) and (θ0 , ϕ0 ) as

x · x0 = (sin θ cos ϕ)(sin θ0 cos ϕ0 ) + (sin θ sin ϕ)(sin θ0 sin ϕ0 ) + cos θ cos θ0
= sin θ sin θ0 (cos ϕ cos ϕ0 + sin ϕ sin ϕ0 ) + cos θ cos θ0
= sin θ sin θ0 (cos |ϕ − ϕ0 |) + cos θ cos θ0 ,

where we used the addition formula for cosine.

10.7.2 Construction of a semidefinite matrix


Here we construct semidefinite matrices which will be crucial in the proof of
Theorem 10.7.3. Although the construction is quite simple it is not clear at
this moment why we perform exactly this construction. One can show (but
we do not do this here) that this construction is a result of the properties
of Lemma 10.7.1 and Lemma 10.7.2.
Let k and d be natural numbers. For a point x ∈ S 2 on the unit sphere
with spherical coordinates (θ, ϕ) we define a matrix Ekd (x) ∈ R(d+1)×2 by
 0
cos θ cos kϕ cos0 θ sin kϕ

cos1 θ cos kϕ cos1 θ sin kϕ
Ekd (x) = sink θ  .
 
.. ..
 . . 
cosd θ cos kϕ cosd θ sin kϕ

Furthermore, for two points x, x0 ∈ S 2 we define the matrix Zkd (x, x0 ) ∈


R(d+1)×(d+1) by
 T
Zkd (x, x0 ) = Ekd (x) Ekd (x0 ) .
228 Packings on the sphere

The (i, j)-th entry (here it is convenient to start the indexing with 0, thus
i = 0, . . . , d, and j = 0, . . . , d) of this matrix equals
h i
Zkd (x, x0 ) = sink θ sink θ0 cosi θ cos kϕ cosj θ0 cos kϕ0
ij
+ cosi θ sin kϕ cosj θ0 sin kϕ0


= sink θ sink θ0 cosi θ cosj θ0 cos k|ϕ − ϕ0 |, (10.4)


where we again used the addition formula. Finally, we define the symmetric
matrix Ykd (x, x0 ) ∈ S d+1 by
1 d 
Ykd (x, x0 ) = Zk (x, x0 ) + Zkd (x0 , x) .
2
This matrix Ykd (x, x0 ) has the following two important properties:
Lemma 10.7.1 The entries of Ykd (x, x0 ) only depend on θ, θ0 and |ϕ − ϕ0 |.
Proof This immediately follows from the definition of Ykd and from (10.4).

Lemma 10.7.2 For all natural numbers N and for all points x1 , . . . , xN ∈
S 2 the symmetric (d + 1) × (d + 1)-matrix
N X
X N
Ykd (xi , xj )
i=1 j=1

is positive semidefinite.
Proof Let y = (y0 , y1 , . . . , yd ) ∈ Rd be a vector. Then,
   
XN X N XN X N
yT  Ykd (xi , xj ) y = y T  Ekd (xi )(Ekd (xj ))T  y
i=1 j=1 i=1 j=1

N
!T N
!
X X
= (Ekd (xi ))T y (Ekd (xi ))T y
i=1 i=1
≥ 0.

Now we simplify the entries of the matrices Zkd (x, x0 ) and Ykd (x, x0 ) by
introducing the following “inner product coordinates”:
u = x · (0, 0, 1)T = cos θ,
v = x0 · (0, 0, 1)T = cos θ0 ,
t = x · x0 = (1 − u2 )(1 − v 2 ) cos |ϕ − ϕ0 | + uv.
p
10.7 The problem of the thirteen spheres 229

Recall the Chybeshev polynomials of the first kind Tn . They satisfy

Tn (cos θ) = cos nθ,

and in general the Chebyshev polynomial of the first kind Tn is a polynomial


of degree n which is even, when n is even, and odd, when n is odd. The first
few Chebyshev polynomials are

T0 (x) = 1
T1 (x) = x
T2 (x) = 2x2 − 1
T3 (x) = 4x3 − 3x
T4 (x) = 8x4 − 8x2 + 1,

and they satisfy the three-term recurrence relation

Tn+1 (x) = 2xTn (x) − Tn−1 (x).

They are orthogonal with respect to the weight function √ 1 , i.e. if n 6= m,


1−x2
then
Z 1
1
Tn (x)Tm (x) √ dx = 0.
−1 1 − x2
Now it is easy to verify that the matrices Zkd (x, x0 ) and Ykd (x, x0 ) have the
following expressions in the inner product coordinates u, v, t:
!
h
d 0
i
i j
p k t − uv
Zk (x, x ) = u v (1 − u2 )(1 − v 2 ) Tk p .
ij (1 − u2 )(1 − v 2 )

In particular, every entry is a multivariate polynomial in the variables u, v, t


since all square roots cancel because Tk is even if and only if k is even. We
shall write Ykd (u, v, t) for Ykd (x, x0 ).

10.7.3 Upper bounds for spherical codes


The kissing number problem asks for the largest cardinality N of a spherical
(N, 21 )-code. Using a volume argument it is easy to see that

N≤ ≈ 14.92 . . . ,
2π(1 − cos π/6)
thus it does not resolve the question of the thirteen spheres. The following
theorem can be used to give sharper upper bound for N for any given s.
230 Packings on the sphere

Theorem 10.7.3 Let F ∈ R[u, v, t] be a polynomial of the form


m
X
F (u, v, t) = hFk , Ykd (u, v, t)i
k=0

such that

(i) Fk ∈ S d+1 , with k = 1, . . . , m, is a positive semidefinite matrix,


(ii) F0 − f0 E00 is a positive semidefinite matrix, where f0 > 0 and where E00
is the matrix which only has a 1 at position (0, 0) and zeroes everywhere
else,
(iii) F (u, v, t) ≤ 0 for all (u, v, t) which satisfy
 
1 u v
u, v, t ∈ [−1, s]3 , and, u 1 t   0,
v t 1

(iv) F (u, u, 1) ≤ B for all u ∈ [−1, s].

Then for every spherical (N, s)-code we have the upper bound
B
N≤ + 1.
f0

Proof Denote with N = (0, 0, 1)T the north pole of the sphere. Let C be a
spherical (N, s)-code. We may assume that the north pole lies in C after we
rotated C appropriately. Consider the double sum
X X
S= F (x · N, x0 · N, x · x0 ).
x∈C\{N} x0 ∈C\{N}

On the one side we have


m
* +
X X X
S= Fk , Ykd (x · N, x0 · N, x · x0 )
k=0 x∈C\{N} x0 ∈C\{N}
* +
X X
≥ F0 , Y0d (x · N, x0 · N, x · x0 )
x∈C\{N} x0 ∈C\{N}

≥ f0 (N − 1)2 .

Here, the first inequality follows because Fk  0 and x x0 Ykd (x, x0 )  0.


P P

The second inequality follows because of the second assumption and because
[Y0d (u, v, t)]00 = 1.
10.7 The problem of the thirteen spheres 231

On the other hand we have

X X
S= F (x · N, x0 · N, x · x0 ) + F (x · N, x0 · N, x · x0 )
x∈C\{N} (x,x0 )∈(C\{N})2
x6=x0

≤ B(N − 1) + 0,

where the inequality follows from the third and fourth assumption.
Together,

f0 (N − 1)2 ≤ S ≤ B(N − 1),

and from this the theorem follows immediately.

10.7.4 Application to the kissing number

The following polynomial satisfies the assumptions of Theorem 10.7.3 with


s = 1/2, B = 11.799 . . . and f0 = 1. Hence, the kissing number in dimension
3 is at most 12.799 . . . . Since it is a natural number, it is at most 12. It is
also at least 12 because of the construction given in Figure 10.1. Thus, this
gives the solution to the thirteen-sphere problem.
232 Packings on the sphere

8 8 8 7 8 6 8 5 8 4
F (u, v, t) = 145.5146532u v + 988.2590039u v + 363.0801883u v − 672.8504875u v − 225.3291719u v +
8 3 8 2 8 8 7 8
191.5409281u v + 59.48900200u v − 19.01121967u v − 6.236063758u + 988.2590039u v +
7 7 7 7 7 6 7 6 7 5
2884.321650u v t + 2098.660249u v + 3568.090777u v t + 18.16433328u v + 344.9966795u v t−
7 5 7 4 7 4 7 3 7 3
1553.436106u v − 1392.615104u v t − 147.6093630u v − 437.8480785u v t + 446.8628584u v +
7 2 7 2 7 7 7
123.8174512u v t + 71.67183278u v + 62.08524794u vt − 49.22376389u v − 6.539547051u t−
7 6 8 6 7 6 7 6 6 2
6.216856357u + 363.0801883u v + 3568.090777u v t + 18.16433328u v − 1165.053624u v t +
6 6 6 6 6 5 2 6 5 6 5
5312.362070u v t − 1024.256466u v − 71.15855319u v t + 196.7871602u v t − 471.6846248u v +
6 4 2 6 4 6 4 6 3 2
110.7356966u v t − 2163.927390u v t + 464.1969368u v − 78.18545046u v t −
6 3 6 3 6 2 2 6 2 6 2
505.3867129u v t + 203.1645374u v + 41.35985267u v t + 199.8129603u v t − 54.08669500u v
6 2 6 6 6 2 6
− 17.33067199u vt + 77.24220247u vt − 21.52580028u v + 16.53555569u t − 10.78477271u t+
6 5 8 5 7 5 7 5 6 2
0.8926140522u − 672.8504875u v + 344.9966795u v t − 1553.436106u v − 71.15855319u v t +
5 6 5 6 5 5 3 5 5 2
196.7871602u v t − 471.6846248u v + 1168.212790u v t + 134.9381447u v t −
5 5 5 5 5 4 3 5 4 2
599.4393672u v t + 752.0835370u v + 386.8314957u v t + 37.58791494u v t −
5 4 5 4 5 3 3 5 3 2
219.0874465u v t + 274.0310180u v − 377.6693697u v t − 116.2455138u v t +
5 3 5 3 5 2 3 5 2 2
135.0094523u v t − 132.6882963u v − 29.72556641u v t − 35.53845637u v t +
5 2 5 2 5 3 5 2 5
37.13287924u v t − 50.62106262u v − 55.38456003u vt − 6.687980644u vt + 13.24127784u vt+
5 5 3 5 2 5 5
16.77621554u v + 47.57727922u t + 37.98428531u t − 14.54446445u t − 3.453914175u −
4 8 4 7 4 7 4 6 2 4 6
225.3291719u v − 1392.615104u v t − 147.6093630u v + 110.7356966u v t − 2163.927390u v t
4 6 4 5 3 4 5 2 4 5
+ 464.1969368u v + 386.8314957u v t + 37.58791494u v t − 219.0874465u v t+
4 5 4 4 4 4 4 3 4 4 2
274.0310180u v − 73.36820000u v t + 392.6271541u v t + 222.2882726u v t +
4 4 4 4 4 3 4 4 3 3
735.4659250u v t − 207.3205267u v + 83.02988588u v t − 118.0842510u v t −
4 3 2 4 3 4 3 4 2 4
77.09075609u v t + 244.0885057u v t − 88.19709904u v − 37.62483034u v t −
4 2 3 4 2 2 4 2 4 2
113.7068820u v t − 8.326854722u v t − 45.23470569u v t + 12.09797900u v −
4 4 4 3 4 2 4 4
77.65166209u vt − 55.08896234u vt + 31.85022557u vt − 17.02325455u vt + 10.36532594u v+
4 4 4 3 4 2 4 4
28.47777799u t + 46.28598839u t − 2.917811290u t − 7.358099446u t − 1.944166441u +
3 8 3 7 3 7 3 6 2 3 6
191.5409281u v − 437.8480785u v t + 446.8628584u v − 78.18545046u v t − 505.3867129u v t
3 6 3 5 3 3 5 2 3 5
+ 203.1645374u v − 377.6693697u v t − 116.2455138u v t + 135.0094523u v t−
3 5 3 4 4 3 4 3 3 4 2
132.6882963u v + 83.02988588u v t − 118.0842510u v t − 77.09075609u v t +
3 4 3 4 3 3 5 3 3 4
244.0885057u v t − 88.19709904u v + 57.53328043u v t + 22.60680881u v t +
3 3 3 3 3 2 3 3 3 3
186.2840976u v t + 140.5263123u v t − 8.091457206u v t − 20.29157793u v +
3 2 5 3 2 4 3 2 3 3 2 2
2.757110052u v t − 23.04651883u v t + 34.97617188u v t + 56.64805815u v t −
3 2 3 2 3 5 3 4 3 3
36.40483748u v t + 4.218095336u v + 27.34220407u vt − 9.135170204u vt − 35.39213527u vt

3 2 3 3 3 5 3 4
− 22.42382897u vt + 0.7941054727u vt + 5.569355446u v + 7.486743724u t + 13.30789637u t −
3 3 3 2 3 3 2 8
17.88091824u t − 20.64369889u t + 5.781887281u t + 2.801486009u + 59.48900200u v +
2 7 2 7 2 6 2 2 6 2 6
123.8174512u v t + 71.67183278u v + 41.35985267u v t + 199.8129603u v t − 54.08669500u v
2 5 3 2 5 2 2 5 2 5
− 29.72556641u v t − 35.53845637u v t + 37.13287924u v t − 50.62106262u v −
2 4 4 2 4 3 2 4 2 2 4
37.62483034u v t − 113.7068820u v t − 8.326854722u v t − 45.23470569u v t+
2 4 2 3 5 2 3 4 2 3 3
12.09797900u v + 2.757110052u v t − 23.04651883u v t + 34.97617188u v t +
2 3 2 2 3 2 3 2 2 6
56.64805815u v t − 36.40483748u v t + 4.218095336u v − 31.31247252u v t +
2 2 5 2 2 4 2 2 3 2 2 2
14.32494116u v t + 13.18700960u v t − 37.71556777u v t − 39.18630288u v t +
2 2 2 2 2 6 2 5
16.22869753u v t + 9.192752212u v + 0.3706844433u vt − 1.226131739u vt +
2 4 2 3 2 2 2 2
15.24079688u vt + 4.448344730u vt − 12.51651111u vt + 2.559716007u vt + 0.2868981952u v−
2 6 2 5 2 4 2 3 2 2
5.434675350u t − 2.282422116u t − 2.376206451u t − 5.756516944u t + 2.285069305u t +
2 2 8 7 7
0.9051509003u t − 0.1807740107u − 19.01121967uv + 62.08524794uv t − 49.22376389uv −
6 2 6 6 5 3 5 2
17.33067199uv t + 77.24220247uv t − 21.52580028uv − 55.38456003uv t − 6.687980644uv t +
5 5 4 4 4 3 4 2
13.24127784uv t + 16.77621554uv − 77.65166209uv t − 55.08896234uv t + 31.85022557uv t −
4 4 3 5 3 4 3 3
17.02325455uv t + 10.36532594uv + 27.34220407uv t − 9.135170204uv t − 35.39213527uv t −
3 2 3 3 2 6 2 5
22.42382897uv t + 0.7941054727uv t + 5.569355446uv + 0.3706844433uv t − 1.226131739uv t +
2 4 2 3 2 2 2 2
15.24079688uv t + 4.448344730uv t − 12.51651111uv t + 2.559716007uv t + 0.2868981952uv +
7 6 5 4 3
11.42699511uvt − 3.865608217uvt − 18.70371050uvt + 14.88704132uvt + 23.94378706uvt +
2 7 6
4.201735301uvt − 4.333409705uvt − 1.954761337uv − 0.1528426696ut + 0.07806634840ut −
5 4 3 2
1.066251935ut − 2.244106275ut + 1.817221500ut + 2.535123200ut − 0.5798954908ut − 0.3852239752u−
8 7 7 6 2 6 6
6.236063758v − 6.539547051v t − 6.216856357v + 16.53555569v t − 10.78477271v t + 0.8926140522v
5 3 5 2 5 5 4 4
+ 47.57727922v t + 37.98428531v t − 14.54446445v t − 3.453914175v + 28.47777799v t +
4 3 4 2 4 4 3 5
46.28598839v t − 2.917811290v t − 7.358099446v t − 1.944166441v + 7.486743724v t +
3 4 3 3 3 2 3 3
13.30789637v t − 17.88091824v t − 20.64369889v t + 5.781887281v t + 2.801486009v −
2 6 2 5 2 4 2 3 2 2
5.434675350v t − 2.282422116v t − 2.376206451v t − 5.756516944v t + 2.285069305v t +
2 2 7 6 5
0.9051509003v t − 0.1807740107v − 0.1528426696vt + 0.07806634840vt − 1.066251935vt −
4 3 2
2.244106275vt + 1.817221500vt + 2.535123200vt − 0.5798954908vt − 0.3852239752v
7 6 5 4 3 2
+ 0.5536647161t + 0.6067608795t + 2.504492501t + 6.632189667t + 3.925884209t − 1.191716995t −
1.064050680t − 1.168270210

Now the question remains: How did we come up with this polynomial?
Answer: We found it by solving a semidefinite program on the computer.
Without loss of generality we can set f0 = 1. Then, we have the following
semidefinite program with infinitely many constraints:

minimize    B
subject to  B ≥ 0,  F_1, . . . , F_m ⪰ 0,
            F_0 − E_{00} ⪰ 0,
            ∑_{k=0}^{m} ⟨F_k, Y_k^d(u, v, t)⟩ ≤ 0  for all (u, v, t) as in Theorem 10.7.3 (iii),
            ∑_{k=0}^{m} ⟨F_k, Y_k^d(u, u, 1)⟩ ≤ B  for all u as in Theorem 10.7.3 (iv).

By setting G_0 = F_0 − E_{00} and G_i = F_i for i = 1, . . . , m, we get the
conditions

G_0, . . . , G_m ⪰ 0,
∑_{k=0}^{m} ⟨G_k, Y_k^d(u, v, t)⟩ ≤ −1,
∑_{k=0}^{m} ⟨G_k, Y_k^d(u, u, 1)⟩ ≤ B − 1.

We can model the infinitely many constraints using sums of squares. For
this we define the polynomials

g_1(u, v, t) = −(u + 1)(u − s),
g_2(u, v, t) = −(v + 1)(v − s),
g_3(u, v, t) = −(t + 1)(t − s),
g_4(u, v, t) = det ( 1 u v ; u 1 t ; v t 1 ) = 1 + 2uvt − u^2 − v^2 − t^2.

Then the domain

{(u, v, t) : g1 (u, v, t) ≥ 0, . . . , g4 (u, v, t) ≥ 0}

equals the domain described in Theorem 10.7.3 (iii), and the domain

{(u, v, t) : g1 (u, v, t) ≥ 0}

equals the domain described in Theorem 10.7.3 (iv). Hence, any feasible solution of the
following finite-dimensional semidefinite program provides a polynomial F
which satisfies the assumptions of Theorem 10.7.3:

minimize    B
subject to  B ≥ 0,  G_0, . . . , G_m ⪰ 0,
            q_1, . . . , q_5, p_1, p_2 sums of squares of polynomials,
            −1 − ∑_{k=0}^{m} ⟨G_k, Y_k^d(u, v, t)⟩ = q_1 g_1 + · · · + q_4 g_4 + q_5,
            −1 + B − ∑_{k=0}^{m} ⟨G_k, Y_k^d(u, u, 1)⟩ = p_1 g_1 + p_2.

10.8 Further reading


Schoenberg’s result can be seen as a special case of Bochner’s theorem which
gives a similar statement for every compact, homogeneous space and which
is based on the Peter-Weyl theorem. In Vallentin [2008] and Bachoc, Gijswijt,
Schrijver, Vallentin [2012] general techniques are presented which use
Bochner’s theorem to simplify semidefinite programs which are invariant
under a group of symmetries.
A very readable introduction to the area of geometric packing problems
and energy minimization is Cohn [2010].
Casselman [2004] writes about the discussion between Newton and Gre-
gory:

The history of the problem is obscure. It is commonly said that in a discussion


that took place in Cambridge the Scottish astronomer and mathematician David
Gregory asserted that 13 spheres could be placed in contact with a central sphere,
while Isaac Newton claimed that only 12 were possible. Evidence for exactly what
was said in this discussion is murky. The first published reference to it that I know
of is in the third volume of Newton’s correspondence, edited by H. W. Turnbull,
which came out in 1961. There is an entry for May 4, 1694, one of several Latin
memoranda written about that day by Gregory, summarizing a conversation with
Newton on the distribution of stars of various magnitudes. On the question of
12 versus 13, the entry does not support what is commonly said. Two distinct
possibilities are not mentioned, and the most plausible reading is that Newton
himself thought that 13 spheres surrounding a fourteenth was a possibility! More
likely, some would think, is that Gregory didn’t understand what Newton had
said in an apparently rather rapid discourse. Turnbull refers to a more elaborate
entry in a notebook of Gregory kept at Christ Church, Oxford, but at least
one person’s attempt to locate that entry where Turnbull said it should be was
unsuccessful. In any event, Turnbull’s paraphrase suggests that there is nothing
important there not already mentioned in the published memorandum. Other
puzzling features of this story are that the 1953 paper by Schütte and van der
Waerden refers to a Newton-Gregory discussion, and in 1956 John Leech referred
in more detail to the Christ Church notebook. These both appeared several years
before the correspondence of Newton appeared in print. What was the source of
their information? To paraphrase a familiar dictum of the Mattel Toy Company,
history is hard.

Exercises
10.1(a) Determine fk in (10.3), completing the proof of τ8 = 240.
(b) Compute ϑ0 (G(2, π/3)).
(c) Determine α(G(n, π/4)).
10.2 Consider 12 points x1 , . . . , x12 on the sphere S 2 . What is the largest
possible minimal angle between distinct points x_i and x_j with i ≠ j?

10.3 Write a computer program for finding ϑ0 (G(n, π/3)) and produce a
table for n = 2, . . . , 24.
10.4 Determine α(G(24, π/3)).
PART FOUR
ALGEBRA
11
Sums of Squares of Polynomials

In this chapter we investigate sums of squares of polynomials, already briefly


introduced in Chapter 2. We address the following basic question:
Given a subset K ⊆ Rn defined by finitely many polynomial inequalities,
how can one certify that a given polynomial p is nonnegative on K?
This question is motivated in particular by its relevance to the problem
of minimizing the polynomial p over the set K, known as an instance of
polynomial optimization. We introduce this problem briefly here and will re-
turn to it in the next two chapters. We collect a number of results from real
algebraic geometry which give certificates for nonnegative (positive) poly-
nomials on K in terms of sums of squares of polynomials. We cannot give
all full proofs since some of these results require advanced algebraic tools
beyond the scope of this book. However, we give a proof for the representa-
tion result of Putinar (relying on an earlier result of Schmüdgen), which we
will use later for designing converging hierarchies of semidefinite relaxations
for polynomial optimization problems.
In this chapter and the next ones we will use the following notation. We
let R[x_1, . . . , x_n] (or simply R[x]) denote the ring of n-variate polynomials
with real coefficients. Analogously, for a field K, K[x] stands for the set of
polynomials with coefficients in K. We will mostly use the field K = R, but
sometimes also the field K = C of complex numbers. A polynomial p ∈ K[x]
can be written as p = ∑_α p_α x^α, where p_α ∈ K, x^α stands for the monomial
x_1^{α_1} · · · x_n^{α_n} with α ∈ N^n, and only finitely many p_α are nonzero. If p is not the
zero polynomial, the maximum value of |α| = ∑_{i=1}^n α_i for which p_α ≠ 0 is
the degree of p, denoted deg(p) (one may set deg(0) = −∞). The polynomial
p is homogeneous with degree d if it is of the form p = ∑_{α∈N^n: |α|=d} p_α x^α.
For an n-variate polynomial p of degree d one can define its homogenization
P, which is the homogeneous (n + 1)-variate polynomial of degree d defined

by

P(x_1, . . . , x_n, x_{n+1}) = x_{n+1}^d · p(x_1/x_{n+1}, . . . , x_n/x_{n+1}).   (11.1)
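For instance, for the univariate polynomial p(x_1) = x_1^3 + 2x_1 + 5 (so n = 1 and
d = 3), the homogenization is P(x_1, x_2) = x_1^3 + 2x_1 x_2^2 + 5x_2^3; setting x_2 = 1
recovers p.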

For d ∈ N, N^n_d denotes the set of sequences α ∈ N^n with |α| ≤ d, corresponding
to the exponents of the monomials of degree at most d. Moreover,
R[x]_d denotes the vector space of all polynomials of degree at most d; its
dimension is s(n, d) = |N^n_d| = (n+d choose d), and the set {x^α : α ∈ N^n_d} of monomials
of degree at most d is its canonical basis.
Throughout, for p ∈ R[x] and a set K ⊆ Rn , we use the notation ‘p ≥ 0
on K’ to express that p(x) ≥ 0 for all x ∈ K; in the same way, p > 0 on
K means p(x) > 0 for all x ∈ K, and p = 0 on K means p(x) = 0 for all
x ∈ K.

11.1 Sums of squares of polynomials


A polynomial p is said to be a sum of squares of polynomials, abbreviated
as p is sos, if p can be written as p = ∑_{k=1}^{m} q_k^2 for some m ∈ N and some
polynomials q_1, . . . , q_m ∈ R[x]. Then p must have even degree 2d and each
summand q_k has degree at most d. In addition, if p is homogeneous with
degree 2d, then each q_k is homogeneous with degree d. (See Exercise 11.1.)
We let Σ denote the set of all polynomials that are sos and Σ2d = Σ∩R[x]2d
the set of sos polynomials with degree at most 2d. Both Σ and Σ2d are convex
cones. A fundamental property, already proved in Section 2.7, is that sums of
squares of polynomials can be recognized using semidefinite programming.

Lemma 11.1.1 Let p ∈ R[x]_{2d}. Then p is a sum of squares of polynomials
if and only if there exists a matrix Q ∈ S^{s(n,d)} satisfying the properties:

Q ⪰ 0,   ∑_{β,γ ∈ N^n_d: β+γ=α} Q_{β,γ} = p_α   for all α ∈ N^n_{2d}.   (11.2)

In other words, checking whether a given polynomial is sos amounts to


checking whether the feasibility region of a semidefinite program is nonempty.
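For instance, for a univariate quartic p(x) = p_4 x^4 + p_3 x^3 + p_2 x^2 + p_1 x + p_0
and the monomial basis (1, x, x^2) of R[x]_2, writing p(x) = (1, x, x^2) Q (1, x, x^2)^T
and comparing coefficients, condition (11.2) reads

p_0 = Q_{00},  p_1 = 2Q_{01},  p_2 = 2Q_{02} + Q_{11},  p_3 = 2Q_{12},  p_4 = Q_{22},

so p is sos precisely when this affine system has a solution Q ⪰ 0.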
Here are some other basic properties (to be shown in Exercise 11.2). Given
a polynomial p ∈ R[x1 , . . . , xn ] and the corresponding homogeneous polyno-
mial P ∈ R[x1 , . . . , xn , xn+1 ] from (11.1), p is nonnegative over Rn (resp.,
a sum of squares of polynomials) if and only if P is nonnegative over Rn+1
(resp., a sum of squares of polynomials).

11.1.1 Polynomial optimization


Why do we care about sums of squares of polynomials? Sums of squares
are useful because they constitute a sufficient condition for nonnegativity
(of polynomials): If p can be written as a sum of squares of polynomials
then clearly one has p(x) ≥ 0 for all x ∈ Rn . Hence sums of squares provide
certificates for nonnegativity. As an illustration, the next example indicates
how to use sums of squares to give a ‘sos type’ proof for the arithmetic-
geometric mean inequality.
Example 11.1.2 Consider the polynomial:

f_n(x) = x_1^n + · · · + x_n^n − n x_1 · · · x_n.

For n = 2 we have f_2(x) = (x_1 − x_2)^2. More generally, Hurwitz [1891]
showed that, for any even n, f_n is a sum of squares of polynomials. This
permits deriving the arithmetic-geometric mean inequality:

(x_1 · · · x_n)^{1/n} ≤ (x_1 + · · · + x_n)/n   (11.3)

for all x_1, . . . , x_n ≥ 0 and n ≥ 1. (See Exercise 11.8.)
Since sums of squares of polynomials can be recognized using semidefinite
programming they can be used to design tractable bounds for hard opti-
mization problems of the form: Compute the infimum pmin of a polynomial
p over a set K ⊆ Rn :
p_min = inf_{x∈K} p(x),   (11.4)

where the set K is defined by polynomial inequalities:


K = {x ∈ Rn : g1 (x) ≥ 0, . . . , gm (x) ≥ 0}
with g1 , . . . , gm ∈ R[x]. Such an optimization problem, where the objective
and the constraints are polynomial functions, is called a polynomial opti-
mization problem. Depending on the setting, ‘inf’ in (11.4) can be ‘min’,
e.g., when K is compact. If the polynomials p, g_1, . . . , g_m have degree 1 then
we find a linear programming problem, which can be solved in polynomial
time. But, as the next examples show, computing pmin is a hard problem in
general.
Example 11.1.3 Given integers a_1, · · · , a_n ∈ N, consider the polynomial

p(x) = (∑_{i=1}^{n} a_i x_i)^2 + ∑_{i=1}^{n} (x_i^2 − 1)^2,

which is obviously nonnegative. Then the minimum of p over Rn is equal



to 0 if and only if there exists x ∈ {±1}n for which xT a = 0, i.e., if the


sequence a1 , · · · , an can be partitioned. So if one could compute the infimum
over Rn of a quartic polynomial then one could solve the partition problem,
which is well known to be NP-complete (Garey and Johnson [1979]).
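For instance, for the sequence a = (1, 2, 3) the point x = (1, 1, −1) gives
p(x) = (1 + 2 − 3)^2 + 0 = 0, reflecting the partition {1, 2} versus {3}, whereas
for a = (1, 1, 3) no sign vector x ∈ {±1}^3 makes the first square vanish, so the
minimum of p over R^3 is strictly positive.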

Example 11.1.4 As another example, the stability number α(G) of a graph


G = (V, E) can be expressed using any of the following two programs:

α(G) = max { ∑_{i∈V} x_i : x_i + x_j ≤ 1 for {i, j} ∈ E,  x_i^2 − x_i = 0 for i ∈ V },   (11.5)

1/α(G) = min { x^T (A_G + I) x : ∑_{i∈V} x_i = 1,  x ≥ 0 },   (11.6)

where AG is the adjacency matrix of G. The formulation (11.6) is due to


Motzkin and Straus [1965] (see Exercise 11.12). This shows that polynomial
optimization captures NP-hard problems, when allowing the objective or the
constraints to be quadratic polynomials.
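As a small numerical illustration of (11.5) and (11.6) (our own sketch; it assumes
Python with the numpy package is available), one can check both formulations on
the 5-cycle C_5, computing α(G) by enumerating stable sets and evaluating the
quadratic form of (11.6) at the uniform vector supported on a maximum stable set:

    import itertools
    import numpy as np

    # the 5-cycle C_5
    V = range(5)
    E = [(i, (i + 1) % 5) for i in V]
    A = np.zeros((5, 5))
    for i, j in E:
        A[i, j] = A[j, i] = 1

    # alpha(G) as in (11.5), by brute-force enumeration of stable sets
    alpha = max(len(S) for r in range(6) for S in itertools.combinations(V, r)
                if all(not (i in S and j in S) for i, j in E))

    # the quadratic form of (11.6) at the uniform vector on the stable set {0, 2}
    x = np.zeros(5)
    x[[0, 2]] = 1 / 2
    print(alpha, x @ (A + np.eye(5)) @ x)   # prints 2 and 0.5 = 1/alpha(G)

This only exhibits one feasible point achieving the value 1/α(G); Exercise 11.12
asks for a proof that it is indeed the minimum.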

Here is a possible approach to problem (11.4). Consider the set of poly-


nomials that are nonnegative on K:

P(K) = {f ∈ R[x] : f (x) ≥ 0 for all x ∈ K}, (11.7)

which is a convex cone. Then, clearly, we can reformulate the problem of
minimizing p over K as

p_min = inf_{x∈K} p(x) = sup{λ : λ ∈ R, p − λ ∈ P(K)}.   (11.8)

A natural idea is now to replace in (11.8) the (hard) positivity condition:


p ∈ P(K), by the (easier) sos type condition: p ∈ Σ + g1 Σ + . . . + gm Σ. The
latter condition means p admits a decomposition p = s0 + s1 g1 + . . . + sm gm
for some s0 , s1 , . . . , sm ∈ Σ. The existence of such a decomposition clearly
implies p ∈ P(K). This leads to defining the following parameter:

psos = sup{λ : λ ∈ R, p − λ ∈ Σ + g1 Σ + · · · + gm Σ}. (11.9)

For instance, in the unconstrained case (m = 0 and K = Rn ), we have

psos = sup{λ : λ ∈ R, p − λ ∈ Σ}

and thus, as a direct application of Lemma 11.1.1, one can compute psos

using the semidefinite program:


p_sos = p_0 + sup { −Q_{00} : Q ⪰ 0,  p_α = ∑_{β,γ ∈ N^n_d: β+γ=α} Q_{β,γ} for α ∈ N^n_{2d} \ {0} }.   (11.10)
Note indeed that λ = p0 − Q00 when using (11.2) to express that the poly-
nomial p − λ is sos. For general K, as will be explained later in more detail,
by bounding the degrees of the sos polynomials s0 , . . . , sm one obtains a
sequence of semidefinite programs whose optimal values converge to the pa-
rameter psos .
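As a concrete instance of (11.10) in the unconstrained univariate case, here is a
minimal sketch (our own example; it assumes the cvxpy modeling package with an
SDP solver is available) computing p_sos for p(x) = x^4 − 3x^2 + 1, matching the
coefficients of p − λ against a Gram matrix in the basis (1, x, x^2):

    import cvxpy as cp

    # Gram matrix in the monomial basis (1, x, x^2)
    Q = cp.Variable((3, 3), symmetric=True)
    constraints = [
        Q >> 0,                          # Q positive semidefinite
        Q[2, 2] == 1,                    # coefficient of x^4
        2 * Q[1, 2] == 0,                # coefficient of x^3
        2 * Q[0, 2] + Q[1, 1] == -3,     # coefficient of x^2
        2 * Q[0, 1] == 0,                # coefficient of x
    ]
    # lambda = p_0 - Q_00 = 1 - Q_00, so maximize it
    prob = cp.Problem(cp.Maximize(1 - Q[0, 0]), constraints)
    prob.solve()
    print(prob.value)                    # about -1.25

Since univariate nonnegative polynomials are sums of squares, here p_sos coincides
with p_min = −5/4.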
Clearly the following inequality holds:
psos ≤ pmin . (11.11)
In general this inequality is strict. In the unconstrained setting (K = Rn ),
the cases when equality holds have been classified by Hilbert (Theorem 11.1.5
below). In the constrained setting, equality holds in (11.11), e.g., when the
set K is compact and satisfies an additional condition (the Archimedean
condition, described below). This follows from Putinar’s theorem (Theo-
rem 11.2.11), which claims that any polynomial strictly positive on K be-
longs to Σ + g1 Σ + · · · + gm Σ.
We will return to the polynomial optimization problem (11.8) and to its
sos relaxation (11.9) in the next chapters. In the remainder of this chapter
we investigate sums of squares representations for positive polynomials.

11.1.2 Hilbert’s theorem


In 1888 Hilbert [1888] classified the pairs (n, d) for which every nonnegative
polynomial of degree d in n variables is a sum of squares of polynomials.
He showed that this happens only in three cases: for univariate polynomi-
als, for quadratic polynomials, and in the last exceptional case of degree 4
polynomials in two variables.
Theorem 11.1.5 (Hilbert [1888]) Every nonnegative n-variate polynomial
of even degree d is a sum of squares of polynomials if and only if n = 1, or
d = 2, or (n, d) = (2, 4).
How about the proof of this result? For the ‘if part’ one needs to show
that any nonnegative n-variate polynomial is sos when n = 1, or d = 2, or
(n, d) = (2, 4). The univariate case n = 1 and the quadratic case will be
considered in Exercises 11.3 and 11.4, but the proof for the last exceptional
case (n, d) = (2, 4) is involved. A proof can be found in the books of Bochnak,

Coste and Roy [1998, Prop. 6.4.4] or by Blekherman, Parrilo and Thomas
[2012, Prop???]1 . For the ‘only if’ part, what Hilbert’s theorem claims is
that, for every pair (n, d) ≠ (2, 4) with n ≥ 2 and even d ≥ 4, there is an
n-variate polynomial of degree d which is nonnegative over Rn but not sos.
One can check that it suffices to give such a polynomial for the two pairs
(n, d) = (2, 6), (3, 4). (See Exercise 11.5.) Concrete polynomials for the cases
(n, d) = (2, 6) and (3, 4) are given in the next example.

Example 11.1.6 Hilbert’s proof for the ‘only if ’ part of Theorem 11.1.5
was not constructive. The first concrete example of a nonnegative polynomial
that is not sos is the following polynomial, for the case (n, d) = (2, 6):

p(x, y) = x4 y 2 + x2 y 4 − 3x2 y 2 + 1, (11.12)

constructed by Motzkin in 1967. See Figure 11.1.

Proof To see that p is nonnegative on R^2, one can use the arithmetic-geometric
mean inequality: (a + b + c)/3 ≥ (abc)^{1/3}, applied to a = x^4 y^2, b = x^2 y^4
and c = 1.
To show that p is not sos, one may use brute force. Assume p = ∑_l q_l^2 for
some polynomials q_l of degree at most 3. We will reach a contradiction by
considering some coefficients in the polynomials at both sides of the identity
p = ∑_l q_l^2. As the coefficient of x^6 in p is 0, we see that the coefficient of
x^3 in each q_l is 0; analogously, the coefficient of y^3 in q_l is 0. Then, as the
coefficients of x^4 and y^4 in p are 0, we get that the coefficients of x^2 and y^2
in q_l are 0. After that, as the coefficients of x^2 and y^2 in p are 0, we can
conclude that the coefficients of x and y in q_l are 0. Finally, we now know
that each polynomial q_l is of the form q_l = a_l xy^2 + b_l x^2 y + c_l xy + d_l for some
scalars a_l, b_l, c_l, d_l. Then the coefficient of x^2 y^2 in p is equal to −3 = ∑_l c_l^2,
which yields a contradiction.

In fact, the same argument shows that p − λ is not sos for any scalar
λ ∈ R. Therefore, for the infimum of the Motzkin polynomial p over R2 , the
sos bound psos carries no information: psos = −∞, while pmin = 0 is attained
at (±1, ±1).
For the case (n, d) = (3, 4), the Choi-Lam polynomial:

q(x, y, z) = 1 + x2 y 2 + y 2 z 2 + x2 z 2 − 4xyz

is nonnegative (directly, using the arithmetic-geometric mean inequality) but


not sos (direct inspection as for Motzkin’s polynomial).
1 insert reference

As an application, also the homogenized versions of the Motzkin and Choi-


Lam polynomials are globally nonnegative, but not sums of squares of poly-
nomials:
P (x, y, z) = x4 y 2 + x2 y 4 − 3x2 y 2 z 2 + z 6 , (11.13)

Q(x, y, z, t) = t4 + x2 y 2 + y 2 z 2 + x2 z 2 − 4xyzt.


Figure 11.1 A polynomial which is nonnegative but not sos: the Motzkin
polynomial (11.12)

11.1.3 Are sums of squares a rare event?


A natural question is how abundant sums of squares are within nonnegative
polynomials. It turns out that the answer depends on whether we fix or let
grow the number of variables and the degree.
On the one hand, if we fix the number of variables and allow the degree to
grow, then every nonnegative polynomial p can be approximated by sums of
squares, which are obtained by adding a small high degree perturbation to p.
In fact, as the next theorem shows, this result remains valid when assuming
only that p is nonnegative over the box [−1, 1]n .

Theorem 11.1.7 (Lasserre, Netzer [2006]) If p is a polynomial which is
nonnegative on [−1, 1]^n, then the following holds:

For every ε > 0 there exists k ∈ N for which p + ε(1 + ∑_{i=1}^{n} x_i^{2k}) ∈ Σ.

On the other hand, if we fix the degree and let the number of variables
grow, then Blekherman [2006] showed that (roughly speaking) there are sig-
nificantly more nonnegative polynomials than sums of squares. To state his
result, let Pn,2d (resp., Σn,2d ) denote the set of homogeneous n-variate poly-
nomials of degree 2d that are nonnegative over Rn (resp., sums of squares).
Then the cone Pn,2d contains the cone Σn,2d and what Blekherman shows is
that Pn,2d is much larger than Σn,2d when n grows. To formulate the result
in a precise way we intersect these two cones by the hyperplane
H = { p : ∫_{S^{n−1}} p(x) dµ(x) = 1 }

consisting of the homogeneous polynomials whose integral on the unit sphere
is 1 (here µ is the Haar measure on the unit sphere). Then we have the
inclusion Σ_{n,2d} ∩ H ⊆ P_{n,2d} ∩ H, where both sets are compact with non-empty
interior (within the hyperplane H), and D = (n+2d−1 choose 2d) − 1 denotes the
dimension of H. Blekherman [2006] shows that there exist universal constants
c, C > 0 (not depending on n or d) such that

c · n^{(d−1)/2} ≤ ( vol(P_{n,2d} ∩ H) / vol(Σ_{n,2d} ∩ H) )^{1/D} ≤ C · n^{(d−1)/2}.

11.1.4 Artin’s theorem


As mentioned above, Hilbert showed in 1888 that not every nonnegative
polynomial can be written as a sum of squares of polynomials. Later, in
1900, Hilbert asked the following question, known as Hilbert’s 17th problem:
Is it true that every nonnegative polynomial on Rn can be written as a
sum of squares of rational functions?
Artin answered this question in the affirmative in 1927. This was a major
breakthrough, which started the field of real algebraic geometry.
Theorem 11.1.8 (Artin [1927]) A polynomial p is nonnegative on R^n if
and only if p = ∑_{j=1}^{m} (p_j/q_j)^2 for some m ∈ N and p_1, q_1, . . . , p_m, q_m ∈ R[x].

Example 11.1.9 Consider the homogenized Motzkin polynomial P (x, y, z)


from relation (11.13). As we have seen earlier, P is nonnegative but not a

sum of squares of polynomials. By Artin’s theorem we know that P can be


written as a sum of squares of rational functions. Here is an explicit such
decomposition (check it!), taken from Reznick [2000]:

(x2 + y 2 )2 P (x, y, z) = x2 y 2 (x2 + y 2 + z 2 )(x2 + y 2 − 2z 2 )2 + (x2 − y 2 )2 z 6 .
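The identity can be checked mechanically; here is a minimal sketch (our own,
assuming the sympy package is available for symbolic expansion):

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    P = x**4*y**2 + x**2*y**4 - 3*x**2*y**2*z**2 + z**6   # the form (11.13)
    lhs = (x**2 + y**2)**2 * P
    rhs = (x**2*y**2*(x**2 + y**2 + z**2)*(x**2 + y**2 - 2*z**2)**2
           + (x**2 - y**2)**2 * z**6)
    print(sp.expand(lhs - rhs))   # prints 0

Dividing by (x^2 + y^2)^2 and writing x^2 y^2 (x^2 + y^2 + z^2) = (x^2 y)^2 + (x y^2)^2 + (xyz)^2
then exhibits P as a sum of squares of rational functions, as guaranteed by Artin's theorem.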

The following result by Reznick [1995] shows that, under some strict pos-
itivity assumption, a sum of squares decomposition exists involving denom-
inators of a very special form.

Theorem 11.1.10 (Reznick [1995]) Assume p is a homogeneous polynomial
which is strictly positive on R^n \ {0}. Then, there exists an integer r ∈ N
such that the polynomial (∑_{i=1}^{n} x_i^2)^r p(x) is a sum of squares of polynomials.

11.2 Positivity certificates


We now turn to the study of nonnegative polynomials p on a basic closed
semialgebraic set K, i.e., a set K of the form

K = {x ∈ Rn : g1 (x) ≥ 0, . . . , gm (x) ≥ 0}, (11.14)

where g1 , . . . , gm ∈ R[x].
It is convenient to set g0 = 1. When all the polynomials p, gj have degree
one, Farkas' lemma (see Theorem 2.6.1) states:

p ≥ 0 on K ⇐⇒ p = ∑_{j=0}^{m} λ_j g_j for some scalars λ_j ≥ 0.   (11.15)

Hence this gives a full characterization of the linear polynomials that are
nonnegative over the polyhedron K.
Such a characterization does not extend to the case when p is a nonlinear
polynomial or when the description of K involves some nonlinear polynomi-
als gj . The situation is then much more delicate and results depend on the
assumptions made on the description of the set K.
Of course, the following implication holds trivially:

p = ∑_{j=0}^{m} s_j g_j for some polynomials s_0, . . . , s_m ∈ Σ =⇒ p ≥ 0 on K.

However, this is not an equivalence. One needs a stronger assumption: strict


positivity of p over K, and an assumption on K (a bit more than compact-
ness), in order to claim the reverse implication. More precisely, assume that

K is compact and satisfies the following additional condition:

∃ N > 0, ∃ s_0, . . . , s_m ∈ Σ  s.t.  N − ∑_{i=1}^{n} x_i^2 = ∑_{j=0}^{m} s_j g_j   (11.16)

(which clearly implies K is compact). Then the following implication holds:

p > 0 on K =⇒ p = ∑_{j=0}^{m} s_j g_j for some s_0, . . . , s_m ∈ Σ.   (11.17)

This result is due to Putinar [1993]; we will discuss it in Section 11.2.5 below.
Note the analogy between (11.15) and (11.17): while the variables in
(11.15) are nonnegative scalars λj , the variables in (11.17) are sos poly-
nomials sj .
A result of the form (11.17) provides a positivity certificate for the poly-
nomial p; it also goes under the name Positivstellensatz. This terminology
has historical reasons, the name originates from the analogy to the classical
Nullstellensatz of Hilbert for the existence of complex roots:

Theorem 11.2.1 (Hilbert’s Nullstellensatz) Let K = R or C. Given


g1 , . . . , gm ∈ K[x], define their complex variety, which consists of their com-
mon complex roots:

VC (g1 , . . . , gm ) = {x ∈ Cn : g1 (x) = 0, . . . , gm (x) = 0}.

For any polynomial p ∈ K[x], the following holds:

p = 0 on V_C(g_1, . . . , g_m) ⇐⇒ p^k = ∑_{j=1}^{m} u_j g_j for some u_j ∈ K[x], k ∈ N.

In particular, we have:

V_C(g_1, . . . , g_m) = ∅ ⇐⇒ 1 = ∑_{j=1}^{m} u_j g_j for some u_j ∈ K[x].

The set of polynomials of the form ∑_{j=1}^{m} u_j g_j with u_1, . . . , u_m ∈ K[x] is
the ideal generated by the polynomials g_1, . . . , g_m, denoted as (g_1, . . . , g_m),
and the set of polynomials p for which p^k ∈ (g_1, . . . , g_m) for some k ∈ N
is its radical ideal; we will come back to these notions in the next chapter
(Section 12.1.1).
What the last claim in Theorem 11.2.1 says is that a system of polynomial
equations g1 (x) = 0, . . . , gm (x) = 0 does not have any common complex so-
lution in Cn if and only if the constant polynomial 1 belongs to the ideal

(g1 , . . . , gm ), or equivalently if (g1 , . . . , gm ) = K[x] (indeed an obvious suffi-


cient condition for nonexistence of a solution). This result is also known as
the weak Hilbert Nullstellensatz.
Suppose we search for a certificate that a given polynomial p ∈ K[x]
lies in the ideal (g_1, . . . , g_m), i.e., for a decomposition p = ∑_{j=1}^{m} u_j g_j. If
we impose a degree restriction on the degrees of the unknown polynomials
u_j, then the search boils down to solving a linear program (obtained by
equating coefficients at both sides of the identity p = ∑_{j=1}^{m} u_j g_j). Under
some conditions on the polynomials g_j, one knows a priori a low degree
bound on the polynomials u_j. Namely, if {g_1, . . . , g_m} forms a Groebner
basis of the ideal they generate (w.r.t. some monomial ordering), then there
is a decomposition p = ∑_{j=1}^{m} u_j g_j with deg(u_j g_j) ≤ deg(p) for all j ∈ [m]
(see, e.g., Cox, Little and O'Shea [1992] for details on Groebner bases).
On the other hand, suppose we search for a decomposition of the form
p = s_0 + ∑_{j=1}^{m} s_j g_j, where the s_j's are sums of squares of polynomials (i.e.,
we want to certify that p belongs to the quadratic module generated by the
g_j's, as introduced in Section 11.2.5). If we fix a bound on the degrees of
the unknown polynomials sj , then this amounts to solving a semidefinite
program (as will be seen later). Giving such bound on the degrees of the
summands sj is a delicate question and the bounds are usually very high (we
refer, e.g., to the work of Lombardi, Perruci and Roy [2014] and references
therein, where such questions are addressed in a more general setting).
There is a real analogue of Hilbert’s weak Nullstellensatz, which charac-
terizes when a system of polynomial equations g1 (x) = 0, . . . , gm (x) = 0
does not have any common real solution in Rn : this happens when the con-
stant polynomial −1 can be written as the sum of a sos polynomial and a
polynomial in the ideal generated by the gj ’s (see Theorem 11.2.6 below).
Hence, in a nutshell, semidefinite programming is the key ingredient to
deal with real algebraic notions (such as sums of squares and real roots),
while linear programming permits to deal with complex algebraic notions
(such as ideals and complex roots). We will consider more general positivity
certificates in the rest of this chapter. We will also mention a method for
computing complex solutions of polynomial equations in Chapter 12 and we
will return to the topic of real solutions in Chapter 13.

11.2.1 The univariate case


We consider here nonnegative univariate polynomials over a closed interval
K ⊆ R, thus of the form K = R, K = [0, ∞), or K = [−1, 1] (up to scaling
and translation). The case K = R was considered in Theorem 11.1.5: p ≥ 0

on R if and only if p is sos. We now consider the other two cases: the half-
line K = [0, ∞) and the compact interval K = [−1, 1]. In both cases a
full characterization of nonnegative polynomials is known in terms of sos
representations, moreover with explicit degree bounds. See Exercises 11.9
and 11.10 for the proofs of the next two theorems.
Theorem 11.2.2 (Pólya-Szegö) Let p be a univariate polynomial of degree
d. Then, p ≥ 0 on [0, ∞) if and only if p = s0 + s1 x for some s0 , s1 ∈ Σ with
deg(s0 ) ≤ d and deg(s1 ) ≤ d − 1.
Theorem 11.2.3 (Fekete, Markov-Lukácz) Let p be a univariate polyno-
mial of degree d. Assume that p ≥ 0 on [−1, 1]. Then the following holds.
(i) p = s0 +s1 (1−x2 ), where s0 , s1 ∈ Σ, deg(s0 ) ≤ d+1 and deg(s1 ) ≤ d−1.
(ii) p = s1 (1 + x) + s2 (1 − x), where s1 , s2 ∈ Σ, deg(s1 ), deg(s2 ) ≤ d.
Note the two different representations in (i), (ii), depending on the choice
of the polynomials chosen to describe the set K = [−1, 1].
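For instance, for p = 1 − x^2 (which is nonnegative on [−1, 1] and has degree d = 2)
one may take s_0 = 0 and s_1 = 1 in (i), while a decomposition as in (ii) is

1 − x^2 = ((1 − x)^2/2)(1 + x) + ((1 + x)^2/2)(1 − x),

as one checks by expanding the right-hand side; note that the degree bounds of
the theorem are met.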
In the next sections we will discuss various positivity certificates for the
multivariate case n ≥ 2.

11.2.2 Krivine’s Positivstellensatz


2

Here we state the Positivstellensatz of Krivine (1964), which characterizes


nonnegative polynomials on an arbitrary basic closed semialgebraic set K
(with no compactness assumption). Let K be as in (11.14). Throughout we
set

g = {g_1, . . . , g_m}

and, for a set of indices J ⊆ {1, . . . , m}, we set g_J = ∏_{j∈J} g_j, with g_∅ = 1
for J = ∅. The set

T(g) = { ∑_{J⊆[m]} s_J g_J : s_J ∈ Σ for J ⊆ [m] }   (11.18)

is called the preordering generated by g = {g1 , . . . , gm }. The set T (g) con-


sists of all weighted sums of the products gJ , weighted by sums of squares.
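For instance, for m = 2 the preordering T(g) consists of all polynomials of the
form s_∅ + s_1 g_1 + s_2 g_2 + s_{12} g_1 g_2 with s_∅, s_1, s_2, s_{12} ∈ Σ.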
Clearly, any polynomial in T (g) is nonnegative on K:
T (g) ⊆ P(K).
As the next example shows this inclusion can be strict.
2 Name Stengle?

Example 11.2.4 Let K = {x ∈ R : g = (1 − x^2)^3 ≥ 0} and p = 1 − x^2.
Then, p is nonnegative on K, but p ∉ T(g) (check it). However, observe
that pg = p^4 (and compare with item (ii) in the next theorem).
Theorem 11.2.5 (Krivine [1964]) Let K be as in (11.14) and let p ∈ R[x].
The following assertions hold.
(i) p > 0 on K ⇐⇒ pf = 1 + h for some f, h ∈ T (g).
(ii) p ≥ 0 on K ⇐⇒ pf = p2k + h for some f, h ∈ T (g) and k ∈ N.
(iii) p = 0 on K ⇐⇒ −p2k ∈ T (g) for some k ∈ N.
(iv) K = ∅ ⇐⇒ −1 ∈ T (g).
In (i)–(iv) above, there is always one implication which is clear. Indeed, in
(iv), it is clear that −1 ∈ T (g) implies K = ∅ and, in (i)–(iii), the existence
of a sos type certificate for p of the prescribed form implies the desired
property for p.
When choosing K = Rn (i.e., g = {1}), we have T (g) = Σ and thus
(ii) implies Artin’s theorem. Moreover, one can derive the following result,
which characterizes the polynomials that vanish on the set of common real
roots of a set of polynomials.
Theorem 11.2.6 (The Real Nullstellensatz) Given g1 , . . . , gm ∈ R[x],
define their real variety, consisting of their common real roots:
VR (g1 , . . . , gm ) = {x ∈ Rn : g1 (x) = 0, . . . , gm (x) = 0}. (11.19)
For any polynomial p ∈ R[x], the following holds:
p = 0 on V_R(g_1, . . . , g_m) ⇐⇒ p^{2k} + s = ∑_{j=1}^{m} u_j g_j for some s ∈ Σ,
u_1, . . . , u_m ∈ R[x] and k ∈ N.

In particular, we have:

V_R(g_1, . . . , g_m) = ∅ ⇐⇒ −1 = s + ∑_{j=1}^{m} u_j g_j for some s ∈ Σ, u_j ∈ R[x].

Note that the above result does not help us yet directly to tackle the poly-
nomial optimization problem (11.8). Indeed, using (i), we can reformulate
pmin as
pmin = sup{λ : (p − λ)f = 1 + h, f, h ∈ T (g), λ ∈ R}.
However, this does not lead directly to a sequence of semidefinite programs
after adding a bound on the degrees of the variable polynomials f, h. This is
because we have the quadratic term λf , where both λ and f are unknown. Of
course, one could fix λ and solve the corresponding semidefinite programs,

and iterate using binary search on λ. However, there is a much more elegant
and efficient remedy: Using the refined representation results of Schmüdgen
and Putinar in the next sections one can set up simpler semidefinite pro-
grams permitting to optimize directly over the variable λ, without binary
search.

11.2.3 Schmüdgen’s Positivstellensatz


For compact K, Schmüdgen [1991] proved the following simpler representa-
tion result for positive polynomials on K.
Theorem 11.2.7 (Schmüdgen [1991]) Let K be as in (11.14) and p ∈ R[x].
Assume K is compact. Then,

p(x) > 0 for all x ∈ K =⇒ p ∈ T (g).


A drawback of a representation ∑_{J⊆[m]} s_J g_J in the preordering T(g) is
that it involves 2^m sos polynomials s_J, thus exponentially many in terms of
the number m of constraints defining K. Next we see how to get a representation
of the form ∑_{j=0}^{m} s_j g_j, involving only m + 1 terms, thus a linear
number of terms.

11.2.4 Putinar’s Positivstellensatz


Under an additional (mild) assumption on the polynomials defining the com-
pact set K, Putinar [1993] showed an analogue of Schmüdgen’s theorem,
where the preordering T(g) is replaced by the following set

M(g) = { ∑_{j=0}^{m} s_j g_j : s_0, . . . , s_m ∈ Σ },   (11.20)

known as the quadratic module generated by g = {g1 , . . . , gm }.


First we describe this additional assumption. For this consider the follow-
ing conditions on the polynomials gj defining K:

∃h ∈ M(g) for which K0 = {x ∈ Rn : h(x) ≥ 0} is compact, (11.21)


Xn
∃N ∈ N for which N − x2i ∈ M(g), (11.22)
i=1
∀f ∈ R[x] ∃N ∈ N for which N ± f ∈ M(g). (11.23)

Note that relation (11.22) was already mentioned earlier in (11.16). Clearly,

the following implications hold:


(11.23) =⇒ (11.22) =⇒ (11.21).
As an application of Schmüdgen’s theorem (Theorem 11.2.7) it follows that
these conditions are all equivalent.
Theorem 11.2.8 The conditions (11.21), (11.22) and (11.23) are all
equivalent. If any of them holds, then the quadratic module M(g) is said
to be Archimedean.
Proof It suffices to show the implication (11.21) =⇒ (11.23). Assume
(11.21) holds and let f ∈ R[x]. As the set K0 = {x : h(x) ≥ 0} is com-
pact, there exists N ∈ N such that −N < f (x) < N over K0 . Moreover,
as K ⊆ K0 , the two polynomials N ± f are positive on K. Applying Theo-
rem 11.2.7, we deduce that N ± f ∈ T (h) ⊆ M(g).
Note that being Archimedean is a property of the polynomials g1 , . . . , gm
defining the set K, and not of the set K itself. If M(g) is Archimedean then
this provides a certificate that the set K is compact.
On the other hand, if the set K is compact, then it is contained in a ball
{x : R^2 − ∑_{i=1}^{n} x_i^2 ≥ 0} for some scalar R > 0. Hence, if we know the radius
R of such a ball containing K and if we add the (redundant) ball constraint
g_{m+1} := R^2 − ∑_{i=1}^{n} x_i^2 ≥ 0 to the description of K, then the quadratic
module M(g') is now Archimedean, after setting g' = g ∪ {g_{m+1}}.


Example 11.2.9 Consider the standard simplex

K = { x ∈ R^n : x ≥ 0, ∑_{i=1}^{n} x_i ≤ 1 }

and the corresponding quadratic module M = M(x_1, . . . , x_n, 1 − ∑_{i=1}^{n} x_i).
Then M is Archimedean. For this we show that the polynomial n − ∑_{i=1}^{n} x_i^2
belongs to M. This follows from the following identities:

• 1 − x_i = (1 − ∑_j x_j) + ∑_{j≠i} x_j ∈ M.
• 1 − x_i^2 = (1 + x_i)(1 − x_i^2)/2 + (1 − x_i)(1 − x_i^2)/2
            = ((1 + x_i)^2/2)(1 − x_i) + ((1 − x_i)^2/2)(1 + x_i) ∈ M.
• n − ∑_i x_i^2 = ∑_i (1 − x_i^2) ∈ M.
Example 11.2.10 Consider the hypercube

K = [0, 1]^n = {x ∈ R^n : 0 ≤ x_i ≤ 1 for i ∈ [n]}

and the corresponding quadratic module M = M(x_1, 1 − x_1, . . . , x_n, 1 − x_n).
Then M is Archimedean. Indeed, as in the previous example, 1 − x_i^2 ∈ M
and thus n − ∑_{i=1}^{n} x_i^2 ∈ M.

Theorem 11.2.11 (Putinar [1993]) Let K be as in (11.14) and p ∈ R[x].


Assume that the quadratic module M(g) is Archimedean, i.e., the gj ’s satisfy
any of the equivalent conditions (11.21)-(11.23). Then, we have:
p(x) > 0 for all x ∈ K =⇒ p ∈ M(g).

11.2.5 Proof of Putinar’s Positivstellensatz


In this section we give a full proof for Theorem 11.2.11 (however relying on
Theorem 11.2.8). The proof is elementary, combining some (often ingenious)
algebraic manipulations.3 We start with defining the notions of ideal and
quadratic module in the ring R[x].
Definition 11.2.12 A set I ⊆ R[x] is an ideal if I is closed under addition
and multiplication by R[x]: I + I ⊆ I and R[x] · I ⊆ I.
Definition 11.2.13 A subset M ⊆ R[x] is a quadratic module if 1 ∈ M
and M is closed under addition and multiplication by squares: M + M ⊆ M
and Σ · M ⊆ M. M is said to be proper if M ≠ R[x] or, equivalently, if
−1 ∉ M.
Example 11.2.14 Given polynomials g_1, . . . , g_m, the set

(g_1, . . . , g_m) = { ∑_{j=1}^{m} u_j g_j : u_1, . . . , u_m ∈ R[x] }

is an ideal (called the ideal generated by the gj ’s) and the set M(g) from
(11.20) is a quadratic module (called the quadratic module generated by the
gj ’s).
We start with some technical lemmas.
Lemma 11.2.15 If M ⊆ R[x] is a quadratic module, then I = M ∩ (−M )
is an ideal.
Proof This follows from the fact that, for any f ∈ R[x] and g ∈ I, we have:

f g = ((f + 1)/2)^2 g + ((f − 1)/2)^2 (−g) ∈ I.

Lemma 11.2.16 Let M ⊆ R[x] be a maximal (by inclusion) proper quadratic


module. Then, M ∪ (−M ) = R[x].
3 We thank Markus Schweighofer for communicating this proof to us; it can also be found in
the book by Marshall [2008].

Proof Assume f ∉ M ∪ (−M). Each of the sets M' = M + fΣ and M'' =
M − fΣ is a quadratic module, strictly containing M. By the maximality
assumption on M, the two quadratic modules M' and M'' are not proper:
M' = M'' = R[x]. Hence we have

−1 = g_1 + s_1 f,  −1 = g_2 − s_2 f  for some g_1, g_2 ∈ M, s_1, s_2 ∈ Σ.

This implies

−s_2 − s_1 = s_2(g_1 + s_1 f) + s_1(g_2 − s_2 f) = s_2 g_1 + s_1 g_2

and thus s_1, s_2 ∈ −M. On the other hand, s_1, s_2 ∈ Σ ⊆ M. Therefore, we
have s_1, s_2 ∈ I = M ∩ (−M). As I is an ideal (by Lemma 11.2.15), we
get s_1 f ∈ I ⊆ M and therefore −1 = g_1 + s_1 f ∈ M, contradicting the
assumption that M is proper.
Lemma 11.2.17 Let M be a maximal proper quadratic module in R[x]
and set I = M ∩ (−M ). Assume that M is Archimedean, i.e., M satisfies
(11.23):
for all f ∈ R[x] there exists N ∈ N such that N ± f ∈ M.
(i) For all f ∈ R[x], there exists a (unique) scalar a ∈ R such that f −a ∈ I.
(ii) There exists a ∈ Rn such that
f − f (a) ∈ I for all f ∈ R[x]. (11.24)
Proof (i) Given a polynomial f ∈ R[x], define the sets
A = {a ∈ R : f − a ∈ M }, B = {b ∈ R : b − f ∈ M }.
As M is Archimedean, A, B are both nonempty. We show that |A ∩ B| = 1.
First observe that a ≤ b for any a ∈ A and b ∈ B. For, if one would have
a > b, then b − a = (f − a) + (b − f ) is a negative scalar in M , contradicting
the fact that M is proper. Let a0 be the supremum of A and b0 the infimum
of B. Thus a0 ≤ b0 . Moreover, a0 = b0 . For, if not, there is a scalar c such
that a_0 < c < b_0. Then, f − c ∉ M ∪ (−M), which gives a contradiction
since M ∪ (−M ) = R[x] by Lemma 11.2.16.
We now show that a0 = b0 belongs to A ∩ B, which implies A ∩ B = {a0 }
and thus concludes the proof of (i). Suppose for contradiction that a_0 ∉ A,
i.e., f − a_0 ∉ M. Then the quadratic module M' = M + (f − a_0)Σ is not
proper: M' = R[x]. Hence,
−1 = g + (f − a0 )s for some g ∈ M, s ∈ Σ.
As M is Archimedean, there exists N ∈ N such that N − s ∈ M . As a0 =

sup A, there exists a scalar  such that 0 <  < 1/N and a0 −  ∈ A. Then,
we have f − (a0 − ) = (f − a0 ) +  ∈ M and thus

−1 + s = g + (f − a0 + )s ∈ M.

Adding (N − s) ∈ M to both sides, we obtain:

−1 + N = (−1 + s) + (N − s) ∈ M.

We reach a contradiction since −1 + N < 0. This concludes the proof of (i).


We now show (ii). By applying (i) to each coordinate polynomial f = xi
(for i ∈ [n]), we find scalars ai ∈ R for which

xi − ai ∈ I = M ∩ (−M ) for all i ∈ [n].

These scalars constitute the vector a = (a_1, . . . , a_n) ∈ R^n. Using the fact
that I is an ideal (Lemma 11.2.15), we can show relation (11.24), namely
that f − f(a) ∈ I for any f ∈ R[x]. Indeed, say f = ∑_α f_α x^α, then f − f(a) =
∑_α f_α (x^α − a^α). It suffices now to show that each x^α − a^α belongs to I. We do
this using induction on |α| ≥ 0. If α = 0 there is nothing to prove. Otherwise,
say α_1 ≥ 1 and write β = α − e_1 so that x^α = x_1 x^β and a^α = a_1 a^β. Then
we have

x^α − a^α = x_1(x^β − a^β) + a^β(x_1 − a_1) ∈ I

since xβ − aβ ∈ I (using induction) and x1 − a1 ∈ I.


Lemma 11.2.18 Assume p(x) > 0 for all x ∈ K. Then, sp − 1 ∈ M(g)
for some s ∈ Σ.

Proof We need to show that the quadratic module M0 = M(g) − pΣ is


not proper. Assume for a contradiction that M0 is proper. We are going
to construct an element a ∈ K for which p(a) ≤ 0, which contradicts the
assumption that p is positive on K. By Zorn’s lemma4 there exists a maximal
proper quadratic module M containing M0 . Then M is Archimedean since
it contains M(g), which is Archimedean.
By Lemma 11.2.17(ii) there exists a ∈ Rn such that f − f (a) ∈ I for any
polynomial f and thus gj − gj (a) ∈ I for all j ∈ [m]. From this we derive
that
gj (a) = gj − (gj − gj (a)) ∈ M,

since gj ∈ M(g) ⊆ M and gj − gj (a) ∈ I = M ∩ −M ⊆ −M . As M is


4 Zorn’s lemma states the following: Let (P, ≤) be a partially ordered set in which every chain
(totally ordered subset) has an upper bound. Then P has a maximal element. Here we apply
Zorn’s lemma with P the set of quadratic modules and ‘≤’ is inclusion.

proper, we must have that gj (a) ≥ 0 for each j. This shows that a ∈ K.
Finally,
−p(a) = (p − p(a)) − p ∈ M,
since p − p(a) ∈ I ⊆ M and −p ∈ M0 ⊆ M . Again, as M is proper, this
implies that −p(a) ≥ 0. We reach a contradiction because a ∈ K and p > 0
on K by assumption.
Lemma 11.2.19 Assume p > 0 on K. Then there exist N ∈ N and
h ∈ M(g) such that N − h ∈ Σ and hp − 1 ∈ M(g).
Proof Choose the polynomial s as in Lemma 11.2.18. Thus, s ∈ Σ and
sp − 1 ∈ M(g). As M(g) is Archimedean, we can find k ∈ N such that
2k − s ∈ M(g) and 2k − s2 p − 1 ∈ M(g).
Set h = s(2k −s) and N = k 2 . Then, h ∈ M(g) and N −h = k 2 −s(2k −s) =
(k − s)2 ∈ Σ. Moreover,
hp − 1 = s(2k − s)p − 1 = 2k(sp − 1) + (2k − s2 p − 1) ∈ M(g),
since sp − 1, 2k − s2 p − 1 ∈ M(g).
We can now show Theorem 11.2.11.
Proof (of Theorem 11.2.11)
Assume p > 0 on K. We want to show that p ∈ M(g). Let h and N satisfy
the conclusion of Lemma 11.2.19. We may assume that N > 0. Moreover let
k ∈ N such that k + p ∈ M(g) (such k exists since M(g) is Archimedean).
Then we have:

k − 1/N + p = (1/N)(N − h)(k + p) + (1/N)((hp − 1) + kh) ∈ M(g),

where (N − h) ∈ Σ and (k + p), (hp − 1), kh ∈ M(g). If k ≤ 1/N then
1/N − k ∈ M(g) (as 1 ∈ M(g)) and thus

p = (k − 1/N + p) + (1/N − k) ∈ M(g),

and the proof is complete. Otherwise, we have just shown the following
implication:

k + p ∈ M(g) =⇒ k − 1/N + p ∈ M(g).

Iterating this (kN) times, we obtain that

p = (k − kN · 1/N) + p ∈ M(g).
This concludes the proof of Theorem 11.2.11.

11.3 Notes and further reading


The mathematician David Hilbert obtained the first fundamental results
about nonnegative polynomials and sums of squares. As mentioned earlier
he already knew that not all nonnegative polynomials can be written as sums
of squares of polynomials and, in 1900 at the first International Congress of
Mathematicians in Paris, he asked whether every nonnegative polynomial
on Rn can be written as a sum of squares of rational functions, now known
as Hilbert’s 17th problem.
Artin [1927] gave in 1927 a positive answer to Hilbert’s 17th problem.
This was a major breakthrough which started the field of real algebraic
geometry. Artin’s proof works in the setting of formal real (ordered) fields.
It combines understanding which elements are positive in any ordering of the
field and using Tarski’s transfer principle, which roughly states the following:
If (F, ≤) is an ordered field extension of R which contains a solution x ∈ F n
to a system of polynomial equations and inequalities with coefficients in R,
then this system also has a solution x0 ∈ Rn . Tarski’s transfer principle
also plays a crucial role in the proof of the Positivstellensatz by Krivine
[1964] (Theorem 11.2.5). The Positivstellensatz was in fact rediscovered ten
years later by Stengle [1974]. The monograph of Marshall [2008] contains the
proofs of all the Positivstellensätze described in this chapter (and more), but
they can also be found in other textbooks about real algebraic geometry (like
Bochnak, Coste, Roy [1998], Prestel and Delzell [2001]).
A detailed treatment of Hilbert’s Nullstellensatz can be found in Cox,
Little and O’Shea [1992] (and any other book about commutative algebra).
Reznick [2000] gives a nice historical overview of results about positive
polynomials and sums of squares and many more examples of nonnegative
polynomials that are not sums of squares of polynomials can be found there.
The procedure described in Section 11.1 for testing whether a polynomial
can be written as a sum of squares of polynomials is also known as the
Gram matrix method; it was already presented in the works of Choi, Lam
and Reznick [1995], Powers and Wörmann [1998] and Reznick [2000]. The
idea of using sums of squares combined with the power of semidefinite pro-
gramming in order to obtain tractable sufficient conditions for nonnegativity
of polynomials goes back to the works by Nesterov [2000], Parrilo [2000] and
Lasserre [2001]. So, while sums of squares belong to an old, classical math-
ematical field, it is only recent that they were studied from an algorithmic
point of view and that their relevance to optimization was fully appreciated.
This has started a whole wealth of research activity within the new field
of polynomial optimization and numerous extensions and applications. We
refer in particular to the monographs by Lasserre [2009, 2015], Blekherman,


Parrilo, Thomas [2012], the handbook by Anjos and Lasserre [2012], the
survey by Laurent [2009], and references therein, for in-depth treatments.
We will touch upon some of these extensions and applications in the next
chapters. In particular, how to solve polynomial optimization problems and
find (sometimes) global minimizers, which exploits the link to the dual the-
ory of moments; how to find real solutions to polynomial equations; and how
to extend polynomial optimization to the general setting of noncommutative
variables with applications to quantum information.

Exercises
11.1 Assume f ∈ R[x] is a sum of squares of polynomials, with deg(f ) = 2d.
(a) Show that if f has a decomposition f = ∑_{k=1}^{m} q_k^2 with q_k ∈ R[x],
    then each polynomial q_k has degree at most d.
(b) Show that if f is homogeneous and has a decomposition f = ∑_{k=1}^{m} q_k^2
    with q_k ∈ R[x], then each polynomial q_k is homogeneous and has
    degree d.

11.2 Let f(x_1, . . . , x_n) = ∑_{α:|α|≤2d} f_α x^α be an n-variate polynomial of
     degree 2d and let F(x_1, . . . , x_n, t) = ∑_{α:|α|≤2d} f_α x^α t^{2d−|α|} be the
     corresponding homogeneous (n+1)-variate polynomial (in the n+1 variables
     x_1, . . . , x_n, t).
(a) Show: f (x) ≥ 0 for all x ∈ Rn ⇐⇒ F (x, t) ≥ 0 for all (x, t) ∈ Rn+1 .
(b) Show: f is a sum of squares of polynomials in R[x1 , . . . , xn ] ⇐⇒ F
is a sum of squares of polynomials in R[x1 , . . . , xn , t].

11.3 Let p be a univariate polynomial.


(a) Show that p can be written as a sum of squares of polynomials if
and only if p is nonnegative over R, i.e., p(x) ≥ 0 for all x ∈ R.
(b) Show that if p is nonnegative over R then it can be written as a sum
of at most two squares.

11.4 Let p be a quadratic n-variate polynomial. Show that p is nonnegative


on Rn if and only if p is a sum of squares of polynomials.
11.5 Assume p is an n-variate polynomial of even degree d, which is non-
negative over Rn but not a sum of squares of polynomials.
(a) Construct an n-variate polynomial of degree d + 2, which is nonneg-
ative on Rn but not a sum of squares of polynomials.

(b) Construct an (n + 1)-variate polynomial of degree d, which is non-


negative on Rn+1 but not a sum of squares of polynomials.

11.6 Consider the polynomial


f (x, y) = x4 y 2 + y 4 + x2 − 3x2 y 2 ∈ R[x, y].
(a) Show that f is nonnegative over R2 .
(b) Show that f cannot be written as a sum of squares of polynomials.

11.7 Give a "sum of squares" proof for the Cauchy-Schwarz inequality. For
     this show that the polynomial

     f(x, y) = (∑_{i=1}^{n} x_i^2)(∑_{i=1}^{n} y_i^2) − (∑_{i=1}^{n} x_i y_i)^2 ∈ R[x_1, . . . , x_n, y_1, . . . , y_n]

     is a sum of squares of polynomials.


11.8 Given a ∈ N^n with |a| = ∑_i a_i = 2d, define the polynomial in n
     variables x = (x_1, . . . , x_n) and of degree 2d:

     F_{n,2d}(a, x) = ∑_{i=1}^{n} a_i x_i^{2d} − 2d ∏_{i=1}^{n} x_i^{a_i} = ∑_{i=1}^{n} a_i x_i^{2d} − 2d x^a.

(a) Show: For n = 2, F_{n,2d}(a, x) is a sum of two squares of polynomials.
(b) Let a ∈ N^n with |a| = 2d. Show that a = b + c for some b, c ∈ N^n,
    where |b| = |c| = d and both b_i, c_i > 0 for at most one index i ∈ [n].
(c) With a, b, c as in (b), show that

    F_{n,2d}(a, x) = (1/2)(F_{n,2d}(2b, x) + F_{n,2d}(2c, x)) + d(x^b − x^c)^2.

(d) Show that, for any a ∈ N^n with |a| = 2d, the polynomial F_{n,2d}(a, x)
    can be written as the sum of at most 3n − 4 squares.
(e) Show the arithmetic-geometric mean inequality (11.3) for any n ∈ N.

11.9 Show Theorem 11.2.2: A univariate polynomial p of degree d is non-


negative on [0, ∞) if and only if p = s0 + s1 x for some s0 , s1 ∈ Σ with
deg(s0 ), deg(s1 x) ≤ d.
11.10 For a univariate polynomial f of degree d define the following univari-
ate polynomial G(f), known as its Goursat transform:

G(f)(x) = (1 + x)^d f((1 − x)/(1 + x)).
(a) Show that f ≥ 0 on [−1, 1] if and only if G(f ) ≥ 0 on [0, ∞).

(b) Show Theorem 11.2.3 (using the result of Theorem 11.2.2).

11.11 Show the Real Nullstellensatz (Theorem 11.2.6) (you may use Theo-
rem 11.2.5).
11.12 Let G = (V, E) be a graph. The goal is to show the reformulation
(11.6) for the stability number α(G). Define the parameter

µ = min { x^T (A_G + I) x : ∑_{i∈V} x_i = 1, x ≥ 0 }.   (11.25)

(a) Show that µ ≤ 1/α(G).


(b) Let x be an optimal solution of the program (11.25) and let the set
S = {i : x_i ≠ 0} denote its support. Show that µ ≥ 1/α(G) if S is a
stable set in G.
(c) Show that the program (11.25) has an optimal solution x whose
support is a stable set. Conclude that equality µ = 1/α(G) holds.

11.13 Let Σ2t = Σ∩R[x]2t denote the cone of sums of squares of polynomials
with degree at most 2t. Show that Σ2t is a closed set.
12
Moment matrices and polynomial equations

Consider the polynomial optimization problem:

p_min = inf_{x∈K} p(x),   (12.1)

which asks for the infimum pmin of a polynomial p over a basic closed semi-
algebraic set K, of the form:
K = {x ∈ Rn : g1 (x) ≥ 0, . . . , gm (x) ≥ 0}, (12.2)
where g1 , . . . , gm ∈ R[x]. In the preceding chapter we defined a lower bound
for pmin obtained by considering sums of squares of polynomials. Here we
consider another approach, which will turn out to be dual to the sum of
squares approach.
Write the polynomial p ∈ R[x] as p = ∑_α p_α x^α, where there are only
finitely many nonzero coefficients p_α, and let

p = (p_α)_{α∈N^n}

denote the vector of coefficients of p, so p_α = 0 for all |α| > deg(p). Throughout
we let

[x]_∞ = (x^α)_{α∈N^n}

denote the vector containing all monomials x^α. Then, one can write:

p(x) = ∑_α p_α x^α = p^T [x]_∞.

Here we have implicitly selected an ordering of the monomials which we use


to order the coordinates in both vectors p and [x]∞ . Let us introduce new
variables yα = xα for α ∈ Nn , aiming to ‘linearize’ the monomials. Then we
can rewrite problem (12.1) as
pmin = inf{pT y : y ∈ {[x]∞ : x ∈ K}}.

As the objective function is linear in y, we can replace the set of feasible


solutions by its convex hull. So we define the set C_∞(K) as the convex hull
of the vectors [x]_∞ for x ∈ K:

C_∞(K) = conv{[x]_∞ : x ∈ K} ⊆ R^{N^n}.   (12.3)

We can now further reformulate problem (12.1) as

p_min = inf_{x∈K} p(x) = inf_{x∈K} p^T [x]_∞ = inf_{y=(y_α)_{α∈N^n}} { p^T y : y ∈ C_∞(K) }.   (12.4)

This leads naturally to the problem of understanding which sequences y


belong to the set C∞ (K). In this chapter we give a characterization for the
set C∞ (K), we will use it in the next chapter as a tool for deriving global
optimal solutions to the polynomial optimization problem (12.1).
This chapter is organized as follows. We introduce some algebraic facts
and ideas about polynomial ideals I ⊆ R[x] and their associated quotient
spaces R[x]/I, which we will need for the characterization of the set C∞ (K).
Using these tools we can describe the so-called eigenvalue method for com-
puting the complex solutions of a system of polynomial equations. As we
will see in the next chapter, this method also gives a useful tool to extract
the global optimizers of problem (12.1). Then we give a characterization
for the sequences y belonging to the set C∞ (K), in terms of properties of
their associated moment matrices. Along the way we mention the link with
the classical moment problem, which asks to characterize the sequences of
moments of measures.

12.1 The polynomial algebra R[x]


Much of the material in this section can be found in textbooks about commu-
tative algebra, such as Cox, Little and O’Shea [1992]. We present it in detail
since it will be very relevant in applications to polynomial optimization.

12.1.1 (Real) radical ideals and the (real) Nullstellensatz


Here, K = R or C denotes the field of real or complex numbers and K[x] is
the algebra of polynomials with coefficients in K. Recall that a set I ⊆ K[x]
is an ideal if I + I ⊆ I and K[x] · I ⊆ I and, given polynomials h1 , . . . , hm ,
the ideal generated by the hj ’s is
m
nX o
(h1 , . . . , hm ) = uj hj : u1 , . . . , um ∈ K[x] .
j=1
266 Moment matrices and polynomial equations

In order not to overload the notation we do not indicate the dependence


on K in the notation (h1 , . . . , hm ), which should be clear from the context.
Conversely, a basic property of the polynomial ring K[x] is that it is Noethe-
rian: every ideal admits a finite set of generators and thus it is of the form
above.
Given a subset V ⊆ Cn , the set
I(V ) = {f ∈ K[x] : f (x) = 0 for all x ∈ V }
is an ideal in K[x], called the vanishing ideal of V .
The complex variety of an ideal I ⊆ K[x] is
VC (I) = {x ∈ Cn : f (x) = 0 for all f ∈ I}
and its real variety is
VR (I) = {x ∈ Rn : f (x) = 0 for all f ∈ I} = VC (I) ∩ Rn .
The elements x ∈ VC (I) are thus the common complex roots to all the
polynomials in I. Clearly, if I = (h1 , . . . , hm ) is generated by the hj ’s, then
VC (I) is the set of common complex roots of the polynomials h1 , . . . , hm and
VR (I) is their set of common real roots.
Given an ideal I ⊆ K[x], the set

I = {f ∈ K[x] : f m ∈ I for some m ∈ N} (12.5)
is an ideal (Exercise 12.1), called the radical of I. (Here too we omit the
dependence on K in the notation.) Clearly we have the inclusions:

I ⊆ I ⊆ I(VC (I)).
Generally the first inclusion is strict.
Example 12.1.1 Consider the ideal I = (x2 ) generated by√the monomial
x2 . Then, VC (I) = {0}. Clearly, the polynomial x belongs to I and thus to
I(VC (I)), but it does not belong to I.
Hilbert’s Nullstellensatz, which was√already stated in Chapter 11.3 (The-
orem 11.2.1), claims that both ideals I and I(VC (I)) coincide.
Theorem 12.1.2 (Hilbert’s Nullstellensatz) For any ideal I ⊆ K[x], we
have equality:

I = I(VC (I)).
That is, a polynomial f ∈ K[x] vanishes at all x ∈ VC (I) if and only if some
power of f belongs to I.
12.1 The polynomial algebra R[x] 267

The ideal I is said to be radical if I = I or, equivalently (in view of
2
√ I = I(VC (I)). For instance, the ideal I = (x ) is
the Nullstellensatz), if
not radical since x ∈ I \ I. Note that 0 is a root with double multiplicity
of the polynomial x2 . Roughly speaking, an ideal is radical when all roots
x ∈ VC (I) have single multiplicity, but we will not go into details about
multiplicities of roots.
Given an ideal I ⊆ R[x], the set

I = {f ∈ R[x] : f 2m + s ∈ I
R
for some m ∈ N, s ∈ Σ} (12.6)
is an ideal in R[x] (Exercise 14.1), called the real radical of I. Clearly we
have the inclusions:
√R
I ⊆ I ⊆ I(VR (I)).
The Real Nullstellensatz, which was√already stated in Chapter 11.3 (The-
orem 11.2.6), claims that both ideals R I and I(VR (I)) coincide.
Theorem 12.1.3 (The Real Nullstellensatz) For any ideal I ⊆ R[x],

R
I = I(VR (I)).
That is, a polynomial f ∈ R[x] vanishes at all common real roots of I if and
only if the sum of an even power of f and of a sum of squares belongs to I.

An ideal I ⊆ R[x] is called real radical if equality I = R I holds.
Example 12.1.4 Consider the ideal I = (x2 +y 2 ) ⊆ R[x, y]. Then, VR (I) =
{(0, 0)} while VC (I) = {(z, ±iz) : z ∈ C}. Here, the symbol i denotes the
complex
√ square root of 1. Note that the polynomials x, x2 , y and y 2 belong to
R
I = I(VR (I)) but not to I. Hence I is not real radical.
We will use the following characterization of (real) radical ideals (see Ex-
ercise 12.2).
Lemma 12.1.5 The following holds.

(i) An ideal I ⊆ K[x] is radical (i.e., I = I) if and only if
for all f ∈ K[x] f 2 ∈ I =⇒ f ∈ I.

(ii) An ideal I ⊆ R[x] is real radical (i.e., R I = I) if and only if
for all f1 , . . . , fm ∈ R[x] f12 + · · · + fm
2
∈ I =⇒ f1 , . . . , fm ∈ I.
Here is a simple, useful observation.
Lemma 12.1.6 Assume V ⊆ Cn is a complex variety, i.e., V = VC (I) for
some ideal I ⊆ K[x]. Then, we have VC (I(V )) = V .
268 Moment matrices and polynomial equations

Proof The inclusion VC (I) ⊆ VC (I(VC (I)))) is clear. Conversely, assume


v 6∈ VC (I). Then there is a polynomial f ∈ I such that f (v) 6= 0. As
f ∈ I ⊆ I(VC (I)), we deduce that v 6∈ VC (I(VC (I))). This shows the reverse
inclusion: VC (I(VC (I)))) ⊆ VC (I).
Note that the inclusion V ⊆ VC (I(V )) can be strict if V is not a complex
variety. For example, for V = C \ {0} ⊆ C, I(V ) = {0}, since the zero
polynomial is the only polynomial vanishing at all elements of V . Hence,
VC (I(V )) = C contains strictly V .
For any ideal I ⊆ K[x], we have the inclusions:
I ⊆ I(VC (I)) ⊆ I(VR (I)),
with equality throughout if I is real radical. Yet this does not imply in
general that VC (I) = VR (I), i.e., that all roots are real. As an example
illustrating this, consider, e.g., the ideal I = (x − y) ⊆ R[x, y]; then I is real
radical, but VR (I) ⊂ VC (I). The situation is however different when the set
VR (I) is finite.
Lemma 12.1.7 If I ⊆ R[x] is a real radical ideal, with finite real variety:
|VR (I)| < ∞, then VC (I) = VR (I).
Proof By assumption, equality I(VR (I)) = I(VC (I)) holds. Hence these
two ideals have the same complex variety: VC (I(VR (I))) = VC (I(VC (I))).
Using Lemma 12.1.6 we obtain the equality VR (I) = VC (I), since VR (I) is a
complex variety (as it is finite, see Exercise 12.3) and VC (I) too is a complex
variety (by definition).

12.1.2 The quotient algebra K[x]/I


Let I be an ideal in K[x]. We define the quotient space A = K[x]/I, whose
elements are the cosets
[f ] = f + I = {f + q : q ∈ I}
for f ∈ K[x]. Then A is an algebra with addition: [f ] + [g] = [f + g], scalar
multiplication λ[f ] = [λf ], and multiplication [f ][g] = [f g], for f, g ∈ K[x]
and λ ∈ K. (It is easy to check that these operations are well defined.) As we
now see, the dimension of the quotient space A = K[x]/I is closely related
to the cardinality of the complex variety VC (I).
Theorem 12.1.8 Let I ⊆ K[x] be an ideal and let A = K[x]/I be the
associated quotient space. The following holds.
12.1 The polynomial algebra R[x] 269

(i) dim A < ∞ if and only if |VC (I)| < ∞.


(ii) Assume |VC (I)| < ∞. Then |VC (I)| √ ≤ dim A, with equality if and only
if the ideal I is radical (i.e., I = I).
Remark 12.1.9 Let I be an ideal in R[x]. Then the set
IC := I + iI = {f + ig : f, g ∈ I}
is an ideal in C[x]. Moreover, the two quotient spaces R[x]/I and C[x]/IC
have the same dimension. Indeed, if f1 , . . . , fr ∈ R[x] are real polynomials
whose cosets in R[x]/I form a basis of R[x]/I, then their cosets in C[x]/IC
form a basis of C[x]/IC . Hence, in order to compute the dimension of R[x]/I,
we can as well deal with the corresponding ideal IC = I + iI in the complex
polynomial ring.
For the proof of Theorem 12.1.8, it is useful to have the following con-
struction of interpolation polynomials. For a polynomial p = α xα ∈ C[x]
P

we set p = α pα xα and, for a set V ⊆ Cn , we set V = {v : v ∈ V }. One


P

says V is stable under conjugation if V = V .


Lemma 12.1.10 Let V ⊆ Kn be a finite set.
(i) There exist polynomials pv ∈ K[x] for v ∈ V satisfying
pv (u) = δu,v for all u, v ∈ V,
where δu,v = 1 if u = v and 0 otherwise. They are called interpolation
polynomials at the points of V . Then, for any polynomial f ∈ K[x],
X
f− f (v)pv ∈ I(V ). (12.7)
v∈V

(ii) Assume V ⊆ Cn is stable under conjugation, so V can be partitioned


as V = S ∪ T ∪ T with S = V ∩ Rn , and consider scalars av ∈ C
satisfying av = av for all v ∈ V . Then there exist interpolation
polynomials pv ∈ C[x] at the points of V that satisfy pv = pv for all
v ∈ V , and a polynomial f ∈ R[x] such that f (v) = av for all v ∈ V .
Proof (i) Fix v ∈ V . For any u ∈ V \ {v}, let iu ∈ [n] be an index of a coor-
dinate where v and u differ, that is, viu 6= uiu . Then define the polynomial
pv by
Y xi − ui
u u
pv = .
viu − uiu
u∈V \{v}

Clearly, pv (v) = 1 and pv (u) = 0 if u ∈ V \ {u}, as desired. By construction


the polynomial in (12.7) vanishes at all v ∈ V and thus belongs to I(V ).
270 Moment matrices and polynomial equations

(ii) Interpolation polynomials qv satisfying the required condition are ob-


tained from the polynomials pv as follows: for v ∈ S define qv as the real
part of pv and, for v ∈ T , set qv = pv and qv = pv . Then the polynomial
P
f = v∈V av qv has real coefficients and satisfies f (v) = av for all v ∈ V .
Example 12.1.11 Consider the set
V = {(0, 0), (1, 0), (0, 2)} ⊆ R2 .
Then the polynomials
p(0,0) = (x1 − 1)(x2 − 2)/2, p(1,0) = x21 , p(0,2) = x2 (1 − x1 )/2
are interpolation polynomials at the points of V .
Lemma 12.1.12 Let I be an ideal in C[x] and A = C[x]/I. Assume that
VC (I) is finite, let pv (v ∈ VC (I)) be interpolation polynomials at the points
of VC (I), and let
L = {[pv ] : v ∈ VC (I)}
be the corresponding set of cosets in A. Then the following holds.
(i) L is linearly independent in A.
(ii) L generates the vector space C[x]/I(VC (I)).
(iii) If I is radical, then L is a basis of A and thus dim A = |VC (I)|.
P
Proof (i) Assume that v∈VC (I) λv [pv ] = 0 for some scalars λv . That is,
P
the polynomial f = v∈VC (I) λv pv belongs to I. Then f (v) = λv = 0 for all
v ∈ VC (I), which shows that L is linearly independent in A.
(ii) Relation (12.7) implies directly that L generates K[x]/I(VC (I)).
(iii) Assume that I is radical and thus I = I(VC (I)) by the Nullstellensatz.
Then, L is linearly independent and generating in A and thus a basis of
A.

Proof (of Theorem 12.1.8). In view of Remark 12.1.9, we may assume that
K = C.
(i) Assume first that dim A = k < ∞, we show that |VC (I)| < ∞. For
this, pick a variable xi and consider the k + 1 cosets [1], [xi ], . . . , [xki ]. Then
they are linearly dependent in A and thus there exist scalars λh (0 ≤ h ≤ k)
Pk h
(not all zero) for which the (univariate) polynomial f = h=0 λh xi is a
nonzero polynomial belonging to I. As f is univariate, it has finitely many
roots. This implies that the i-th coordinates of the points v ∈ VC (I) take
only finitely many values. As this holds for all coordinates we deduce that
VC (I) is finite.
12.1 The polynomial algebra R[x] 271

Assume now that |VC (I)| < ∞, we show that dim A < ∞. For this,
assume that the i-th coordinates of the points v ∈ VC (I) take k distinct
values: a1 , . . . , ak ∈ C. Then the polynomial f = (xi − a1 ) · · · (xi − ak )
vanishes at all v ∈ VC (I). Applying the Nullstellensatz, f m ∈ I for some
integer m ∈ N. This implies that the cosets [1], [xi ], . . . , [xmk
i ] are linearly
dependent. Therefore, there exists an integer ni for which [xni i ] lies in the
linear span of {[xhi ] : 0 ≤ h ≤ ni − 1}. From this one can easily derive that
the set {[xα ] : 0 ≤ αi ≤ ni − 1, i ∈ [n]} generates the vector space A, thus
showing that dim A < ∞.
(ii) Assume |VC (I)| < ∞. Lemma 12.1.12 (i) shows that |VC (I)| ≤ dim A.
If I is radical then the equality dim A = |VC (I)| follows
√ from Lemma 12.1.12
(iii). Assume now that I is not radical and let f ∈ I \ I. If pv (v ∈ VC (I))
are interpolation polynomials at the points of VC (I), then one can easily
verify that the system {[pv ] : v ∈ VC (I)} ∪ {[f ]} is linearly independent in
A, so that dim A ≥ |VC (I)| + 1.

12.1.3 The eigenvalue method for complex roots


A basic, fundamental problem in mathematics and many areas of applica-
tions is how to solve a system of polynomial equations:

h1 (x) = 0, . . . , hm (x) = 0.

In other words, how to compute the complex variety of the ideal I =


(h1 , . . . , hm ). Here we assume that I ⊆ K[x] is an ideal which has finitely
many complex roots: |VC (I)| < ∞. We now describe a method for finding the
elements of VC (I), which is based on computing the eigenvalues of a suitable
linear map on the algebra A = K[x]/I.
Namely, given an arbitrary polynomial h ∈ K[x], we consider the following
linear map, given by ‘multiplication by h’:
mh : A → A
(12.8)
[f ] 7→ [f h].
As VC (I) is finite we known from Theorem 12.1.8 that the vector space A
has finite dimension, say N = dim A. Then N ≥ |VC (I)|, with equality if
and only if I is radical (by Theorem 12.1.8).
Let us choose a basis B = {[b1 ], . . . , [bN ]} of A and let Mh denote the
matrix of the ‘multiplication by h’ operator mh from (12.8) with respect to
the basis B. Moreover, for v ∈ VC (I), define the vector

[v]B = (bj (v))N


j=1
272 Moment matrices and polynomial equations

whose entries are the evaluations at v of the polynomials in the set B. (Note
that this is well defined since the value bj (v) does not depend on the choice
of representative in the coset [bj ].)
Lemma 12.1.13 The vectors {[v]B : v ∈ VC (I)} are linearly independent.
P
Proof Assume that v∈VC (I) λv [v]B = 0 for some scalars λv , which means
P
v∈VC (I) λv bj (v) = 0 for all j ∈ [N ]. As the set B is a basis of the space A,
P
this implies that v∈VC (I) λv f (v) = 0 for all f ∈ K[x] (check it). Applying
this to the polynomial f = pv , we obtain that λv = 0 for all v ∈ VC (I).
As we now show, the matrix Mh carries useful information about the
elements of VC (I): its eigenvalues are the evaluations h(v) of h at the points
v ∈ VC (I) and the corresponding left eigenvectors are the vectors [v]B .
Theorem 12.1.14 Let h ∈ K[x], let I ⊆ K[x] be an ideal with |VC (I)| < ∞,
and let mh be the linear map from (12.8).
(i) Let B be a basis of A and let Mh be the matrix of mh in this basis B.
Then, for each v ∈ VC (I), the vector [v]B is a left eigenvector of Mh
with eigenvalue h(v), that is,
[v]T T
B Mh = h(v)[v]B . (12.9)

(ii) The set {h(v) : v ∈ VC (I)} is the set of eigenvalues of mh .


(iii) Assume that I is radical. Let pv (v ∈ VC (I)) be interpolation polyno-
mials at the points of VC (I). Then,
mh ([pu ]) = h(u)[pu ]
for all u ∈ VC (I). Hence, the interpolation polynomials are (right)
eigenvectors of the multiplication operator mh , and the matrix of
mh in the basis {[pv ] : v ∈ VC (I)} is a diagonal matrix with h(v)
(v ∈ VC (I)) as diagonal entries.
Proof (i) Let Mh be the matrix Mh = (aij )N
i,j=1 , so that

N
X N
X
[hbj ] = aij [bi ], i.e., hbj − aij bi ∈ I.
i=1 i=1

Evaluating the above polynomial at v ∈ VC (I) gives the identity h(v)bj (v) =
PN T
i=1 aij bi (v) = ([v]B Mh )j for all j ∈ [N ], which is relation (12.9).
(ii) By (i), we already know that each scalar h(v) is an eigenvalue of MhT ,
thus also of Mh and of mh . We now show that the scalars h(v) (v ∈ VC (I))
are the only eigenvalues of mh . For this, let λ 6∈ {h(v) : v ∈ VC (I)}, we
12.1 The polynomial algebra R[x] 273

show that λ is not an eigenvalue of mh . Let J denote the ideal generated


by I ∪ {h − λ}. Then, VC (J) = ∅. Applying the Nullstellensatz (Theorem
12.1.2), we obtain that 1 ∈ J and thus 1 − u(h − λ) ∈ I for some u ∈ K[x].
It suffices now to observe that the latter implies that mu (mh − λid) = id,
where id is the identity map from A to A. But then mh − λid is nonsingular,
which implies that λ is not an eigenvalue of mh .
(iii) Assume that I is radical and let {pv : v ∈ VC (I)} be interpolation
polynomials at the points in VC (I). Using relation (12.7), we obtain that
h X i X
mh ([f ]) = [hf ] = (hf )(v)pv = f (v)h(v)[pv ]
v∈VC (I) v∈VC (I)

for any polynomial f . In particular, mh ([pv ]) = h(v)[pv ].


Here is a simple strategy on how to use the above result in order to
compute the points v ∈ VC (I).
Assume that the ideal I is radical (which will be the case in our application
to polynomial optimization) and that we have a polynomial h for which the
values h(v) (v ∈ VC (I)) are pairwise distinct; for instance, if we have a linear
polynomial h with random coefficients this will happen with probability 1.
Assume moreover that we have a basis B of A which contains the constant
polynomial 1 (such a basis can be easily computed in our application to
polynomial optimization and we can take b1 = 1) and that we have the
matrix Mh of mh in this basis.
By Theorem 12.1.14(ii) we know that Mh has N = |VC (I)| distinct eigen-
values, hence each of its eigenspaces is one-dimensional. Therefore, by com-
puting the (right) eigenvectors of MhT , we can recover a scaling of each vector
[v]B = (bj (v))Nj=1 and thus the vector [v]B itself (since we know that its co-
ordinate indexed by b1 is 1). In order to compute the i-th coordinate vi of
v, it suffices to express the coset [xi ] in the basis B: If [xi ] = N
P
j=1 cij [bj ] for
PN
some scalars cij , then vi = j=1 cij bj (v).
Example 12.1.15 Let I = (x3 − 6x2 + 11x − 6) be the ideal generated
by the univariate polynomial x3 − 6x2 + 11x − 6 = (x − 1)(x − 2)(x − 3).
Then, its complex variety is VC (I) = {1, 2, 3} and the set B = {[1], [x], [x2 ]}
is a basis of the quotient space A = R[x]/I. With respect to this basis B, the
matrix of mx , the ‘multiplication by x’ operator, is given by
[x] [x2 ] [x3 ]
 
[1] 0 0 6
Mx = [x]  1 0 −11 ,
[x2 ] 0 1 6
274 Moment matrices and polynomial equations

which can be built using the relation [x3 ] = 6[1] − 11[x] + 6[x2 ]. It is easy
to verify that the matrix MxT has three eigenvectors: (1, 1, 1) with eigenvalue
λ = 1, (1, 2, 4) with eigenvalue λ = 2, and (1, 3, 9) with eigenvalue λ = 3.
Thus the eigenvectors are of the form [v]B = (1, v, v 2 ) for v ∈ VC (I) =
{1, 2, 3}.
The polynomials p1 = (x − 2)(x − 3)/2, p2 = −(x − 1)(x − 3) and p3 =
(x − 1)(x − 2)/2 are interpolation polynomials at the roots v = 1, 2, 3. Using
the relation [(x − 1)(x2 )(x − 3)] = 0, one finds that the matrix of mx with
respect to the base {[p1 ], [p2 ], [p3 ]} is
[xp1 ] [xp2 ] [xp3 ]
 
[p1 ] 1 0 0
[p2 ]  0 2 0 ,
[p3 ] 0 0 3
thus a diagonal matrix with the values v = 1, 2, 3 as diagonal entries.
Finally, we indicate how to compute the number of real roots using the
multiplication operators. This is a classical result, going back to work of
Hermite in the univariate case. You will prove it in Exercise 14.4 for radical
ideals. (See, e.g., Laurent [2009] for the general nonradical case.)
Theorem 12.1.16 Let I be an ideal in R[x] with |VC (I)| < ∞. Define the
Hermite quadratic form:
H : R[x]/I × R[x]/I → R
(12.10)
([f ], [g]) 7→ Tr(mf g ),
where Tr(mf g ) denotes the trace of the multiplication operator by f g. Let
σ+ (H) (resp., σ− (H)) denote the number of positive eigenvalues (resp., neg-
ative eigenvalues) of H. Then, the rank of H is equal to |VC (I)| and
σ+ (H) − σ− (H) = |VR (I)|.

12.2 Characterizing the convex set C∞ (K)


Our goal in this section is to characterize the convex set
n
C∞ (K) = conv{[v]∞ : v ∈ K} ⊆ RN ,
introduced in (12.3). So we want to characterize the sequences y = (yα )α∈Nn
that can be written as convex combinations of the vectors [v]∞ = (v α )α∈Nn
for v ∈ K. Before going into this, we place this question in the broader
perspective of the moment problem. Then we introduce a new important
12.2 Characterizing the convex set C∞ (K) 275

ingredient: the notion of moment matrix, which plays a crucial role in this
problem.

12.2.1 The moment problem


Throughout, when speaking of a measure µ, we mean that µ is a (positive)
Borel measure on Rn , i.e., a measure defined on the open subsets of Rn . The
measure µ is a probability measure if its total mass is one: µ(Rn ) = 1. The
support supp(µ) of µ is the smallest (w.r.t. inclusion) closed set S ⊆ Rn for
which µ(Rn \S) = 0. One says that µ is supported by a set K if supp(µ) ⊆ K.
The moment of order α ∈ Nn of µ is the quantity
Z
xα dµ(x)

and the sequence y = ( xα dµ(x))α is the sequence of moments of µ. Then


R

one also says that µ is a representing measure for y. The moment problem is
n
the problem of deciding whether a given sequence y ∈ RN is the sequence
of moments of a measure; a variant is to decide existence of such a measure
supported by a given closed set K.
For any v ∈ Rn , the Dirac measure at v, denoted by δv , has support {v}
and mass 1 at v. Its sequence of moments is the sequence [v]∞ = (v α )α .
A measure µ is called finitely atomic if µ has finite support. Then µ is
Pr
a conic combination of Dirac measures: µ = i=1 λi δvi , for some scalars
λ1 , . . . , λr > 0 and supp(µ) = {v1 , . . . , vr } ⊆ Rn . Clearly, µ is a probability
measure precisely when ri=1 λi = 1.
P

Hence the set C∞ (K) consists precisely of the sequences of moments of


the finitely atomic measures supported by the set K. It turns out that one
can give a full characterization for this set and this is what we will do in this
chapter. For this we need the notion of moment matrix, which also plays a
crucial role for the study of the general moment problem, to which we will
briefly come back below.

12.2.2 Moment matrices


n
There is a natural one-to-one correspondence between the set RN of se-
quences y = (yα )α and the set R[x]∗ of linear functionals acting on R[x]: any
n
sequence y ∈ RN corresponds uniquely to a linear functional Ly ∈ R[x]∗ ,
defined by
276 Moment matrices and polynomial equations

Ly : R[x] → R
P α 7→ L (f ) =
P (12.11)
f= α fα x y α fα yα .

Given v ∈ Rn , the evaluation map at v is the linear map


Evv : R[x] → R
α 7→ Ev (f ) = f (v) = α
P P
f= α fα x v α fα v .

Hence, Evv is the linear map corresponding to the sequence [v]∞ .


Observe that, for the sequence y = [v]∞ , the (infinite) matrix yy T has
a special structure: its (α, β)-th entry is equal to v α v β = v α+β = yα+β ,
thus it depends only on the sum of the indices α and β. A matrix with
such structure is also known as being of ‘generalized Hankel’ type. This
observation motivates the following definition.
Definition 12.2.1 Given a sequence y = (yα )α∈Nn of real numbers, its
moment matrix is the real symmetric (infinite) matrix M (y) indexed by Nn ,
defined by
M (y) = (yα+β )α,β∈Nn .
Example 12.2.2 For any v ∈ Rn , the moment matrix of the sequence
[v]∞ = (v α )α∈Nn satisfies: M ([v]∞ ) = [v]∞ [v]T
∞ and thus it is positive
semidefinite with rank one.
Remark 12.2.3 We could also define the notion of moment matrix for a
linear functional L ∈ R[x]∗ , as the matrix
M (L) = (L(xα xβ ))α,β∈Nn .
Then, M (Ly ) = M (y) if Ly corresponds to the sequence y, i.e., yα = Ly (xα )
for all α. Both terminologies (in terms of sequences or of linear functionals)
are useful and depending on the context we may prefer one over the other.
Working with linear functionals L ∈ R[x]∗ has the advantage that this
n
is basis-free. On the other hand, working with sequences y ∈ RN , which
amounts to considering the coordinates in the monomial basis, has the ad-
vantage that this is more concrete. For instance, as we see in Lemma 12.2.6
below, the property that Ly is nonnegative on sums of squares is equivalent
to the moment matrix M (y) being positive semidefinite (which means all its
finite principal submatrices are positive semidefinite).
As an example, we have M (Evv ) = M ([v]∞ ) for any v ∈ Rn .
As another example, given
R a measure µ on a closed set K ⊆ Rn consider
the linear map L(p) = K p(x)dµ(x) for p ∈ R[x]. Note that p ≥ 0 on K
12.2 Characterizing the convex set C∞ (K) 277

implies L(p) ≥ 0. As will be seen in Theorem 12.2.9 below, this positivity


property in fact characterizes such linear functionals.

We now define how a polynomial g can act on a sequence y to get a new


sequence, denoted gy.

n
Definition 12.2.4 Given a sequence y ∈ RN and a polynomial g ∈ R[x],
n
define the new sequence gy ∈ RN , with entries

X
(gy)α = Ly (gxα ) = gγ yα+γ for all α ∈ Nn .
|γ|≤deg(g)

So, gy = y for the constant polynomial g = 1. The matrix M (gy) is also


known as a localizing moment matrix.

Example 12.2.5 We give an example to illustrate the notion of moment


matrix. Consider the univariate case n = 1 and a sequence y = (yi )i≥0 ∈ RN .
Its moment matrix:
 
y0 y1 y2 y3 ...
y1 y2 y3 y4 . . .
 
M (y) = y2 y3 y4 y5 . . .


y y4 y5 y6 . . .
 3 
.. .. .. .. ..
. . . . .

has the shape of a Hankel matrix (all entries are the same on each antidi-
agonal). For the polynomial g = 1 − x2 , the sequence gy ∈ RN has entries
(gy)i = yi − yi+2 for i ∈ N and thus its moment matrix has the form

 
y0 − y2 y1 − y3 y2 − y4 y3 − y5 ...
y1 − y3 y2 − y4 y3 − y5 y4 − y6 . . .
 
M (gy) = y2 − y4 y3 − y5 y4 − y6 y5 − y7 . . ..

y − y y4 − y6 y5 − y7 y6 − y8 . . .
 3 5 
.. .. .. .. ..
. . . . .

We now show the shape of the moment matrix M (y) in the bivariate case
n = 2, where we assume it is indexed by the monomials orderd by the graded
278 Moment matrices and polynomial equations

lexicographic order: 1, x1 , x2 , x21 , x1 x2 , x22 , x31 , x21 x2 , x1 x22 , x32 , . . . .:


y0,0 y1,0 y0,1 y2,0 y1,1 y0,2 . . .
 
y1,0 y2,0 y1,1 y3,0 y2,1 y1,2 . . . 
 
y0,1 y1,1 y0,2 y2,1 y1,2 y0,3 . . . 
 
M (y) = y2,0 y3,0 y2,1 y3,0 y3,1 y2,2 . . . 
 
y
 1,1 y2,1 y1,2 y3,1 y2,2 y1,3 . . . 

y
 0,2 y1,2 y0,3 y2,2 y1,3 y0,4 . . . 

.. .. .. .. .. .. ..
. . . . . . .

We now observe that nonnegativity of a linear functional on the cone gΣ


can be reformulated in terms of positive semidefiniteness of the correspond-
ing (localizing) moment matrix M (gy).
Lemma 12.2.6 Let y = (yα )α∈Nn be a sequence of real numbers and let
Ly be the corresponding linear functional from (12.11). For any polynomials
f, g, h ∈ R[x] we have:

Ly (f gh) = fT M (gy)h, (12.12)


n
where gy ∈ RN is the sequence as defined in Definition 12.2.4. In particular,
we have: Ly (f g) = fT M (y)g, Ly (gf 2 ) = fT M (gy)f, Ly (f 2 ) = fT M (y)f.
Therefore, we have:

Ly ≥ 0 on Σ ⇐⇒ M (y)  0 and Ly ≥ 0 on gΣ ⇐⇒ M (gy)  0.

Proof It suffices to show the first identity, since the rest follows then easily.
For f = α fα xα , h = β hβ xβ , we have:
P P

X  X X
Ly (f gh) = Ly fα hβ xα+β g = fα hβ Ly (xα+β g) = fα hβ (gy)α+β ,
α,β α,β α,β

which is equal to fT M (gy)h.

Next we observe that the kernel of the moment matrix M (y) can be seen
as an ideal of R[x], which is real radical when M (y)  0. This observation
will play a cucial role in the characterization of the set C∞ (K) in the next
section.
Lemma 12.2.7 Let y = (yα )α∈Nn be a sequence of real numbers and let
Ly be the corresponding linear functional from (12.11). Define the set

I = {f ∈ R[x] : Ly (f h) = 0 for all h ∈ R[x]}. (12.13)

Then the following holds.


12.2 Characterizing the convex set C∞ (K) 279

(i) I is an ideal in R[x].


(ii) A polynomial f belongs to I if and only if its coefficient vector f belongs
to the kernel of M (y), i.e., M (y)f = 0.
(iii) If M (y)  0 then the ideal I is real radical.
Proof (i), (ii): Direct verification, using (12.12).
(iii) Using Lemma 12.2.6 and the fact that M (y)  0, the following holds
for any polynomial f :
Ly (f 2 ) = fT M (y)f = 0 =⇒ M (y)f = 0 =⇒ f ∈ I,
applying (i) for the last implication. We now show that I is real radical,
P 2
using the characterization from Lemma 12.1.5: Assume that
P 2 i fi ∈ I.
2 ) and thus L (f 2 ) = 0, which in turn
P
Then, 0 = Ly ( i fi ) = L (f
i y i y i
implies that fi ∈ I for all i.
From this we obtain some necessary conditions for a sequence to have a
representing measure.
n
Lemma 12.2.8 Assume y ∈ RN is the sequence of moments of a measure
µ. Let f, g ∈ R[x]. The following holds.
(i) We have M (y)  0. Moreover, if supp(µ) ⊆ {x ∈ Rn : g(x) ≥ 0}, then
M (gy)  0.
(ii) If M (y)f = 0, then µ is supported by the real variety of f , i.e., we have
supp(µ) ⊆ VR (f ). Moreover, if supp(µ) ⊆ {x ∈ Rn : g(x) ≥ 0} and
M (gy)f = 0 then supp(µ) ⊆ VR (f g).
(iii) If µ is finitely atomic then rank M (y) = |supp(µ)|.
Proof Let Ly ∈R R[x]∗ be the linear functional corresponding to y, so that
Ly (f ) = fT y = f (x)dµ(x) for any f ∈ R[x]. Then it is clear that Ly ≥ 0
on Σ and that, if supp(µ) ⊆ {x : g(x) ≥ 0}, then Ly ≥ 0 on gΣ. Hence (i)
follows directly using Lemma 12.2.6.
(ii) It suffices to show the second claim since it implies the first one (taking
g = 1). Assume supp(µ) ⊆ {x : g(x) ≥ 0}. ThenR M (gy)  0 by (i). Assume
also M (gy)f = 0. This implies 0 = fT M (gy)f = f (x)2 g(x)dµ(x). We show
that µ(Rn \ VR (f 2 g)) = 0, which implies that supp(µ) ⊆ VR (f 2 g) = VR (f g)
since VR (f g) is a closed set. For this, write Rn \ VR (f 2 g) = ∪k≥1 Uk , where
we set Uk = {x : f (x)2 g(x) ≥ 1/k}. Then, we have:
Z Z
2 1
0 = f (x) g(x)dµ(x) ≥ f (x)2 g(x)dµ(x) ≥ µ(Uk ).
Uk k
This shows µ(Uk ) = 0 for all k ≥ 1 and thus µ(Rn \ VR (f 2 g)) = 0 as desired.
(iii) will be shown in Exercise 12.5.
280 Moment matrices and polynomial equations

The above lemma shows that if y is the sequence of moments of a finite


atomic measure then its moment matrix M (y) is positive semidefinite with
finite rank. In fact, this implication holds as an equivalence, as will be shown
in Theorem 12.2.14. First, we mention in the next section some selected
fundamental results about the problem of moments.

12.2.3 Some results on the moment problem


Here we group some classical results about the moment problem, we mention
them without proofs, since these are out of the scope of this book. We refer,
e.g., to the monographs by Marshall [2008] and Schmüdgen [2017], or the
survey by Laurent [2009]) for a detailed exposition.
Let K ⊆ Rn be a closed set and recall the definition from (11.7) of the cone
P(K) of nonnegative polynomials over K. A fundamental fact is that the
cone P(K) and the cone of linear functionals having a representing measure
on K are dual to each other.
Theorem 12.2.9 (Haviland [1936]) For a linear functional L ∈ R[x]∗ the
following assertions are equivalent:

(i) L(p) ≥ 0 for all p ∈ P(K), i.e., such that p(x) ≥ 0 for all x ∈ K.
(ii) There exists
R a measure µ supported by K such that
L(p) = K p(x)dµ(x) for all p ∈ R[x].
Assume that K is given as in (12.2). Then clearly the quadratic module
M(g) is contained in P(K). A natural question is whether nonnegativity
over M(g) suffices to claim existence of a representing measure. Putinar
[1993] gave an affirmative answer under the Archimedean condition, which
is thus the analogue on the ‘moment side’ of Theorem 11.2.11.
Theorem 12.2.10 (Putinar [1993]) Assume that M(g) is Archimedean.
Then, for any L ∈ R[x]∗ , the following assertions are equivalent:

(i) L(p) ≥ 0 on M(g).


(ii) There exists
R a measure µ supported by K such that
L(p) = K p(x)dµ(x) for all p ∈ R[x].
In the univariate case, we have the following classical results when K ⊆ R
is an interval, thus the analogues on the ‘moment side’ of the results in
Section 11.2.1.
Theorem 12.2.11 (Hamburger) A linear functional L ∈ R[x] has a rep-
resenting measure on the line R if and only if L ≥ 0 on Σ.
12.2 Characterizing the convex set C∞ (K) 281

Theorem 12.2.12 (Stieltjes) A linear functional L ∈ R[x] has a repre-


senting measure on the half-line [0, ∞) if and only if L ≥ 0 on M(x).
Theorem 12.2.13 (Hausdorff) A linear functional L ∈ R[x] has a rep-
resenting measure on the compact interval [a, b] if and only if L ≥ 0 on
M(x − a, b − x) or, equivalently, L ≥ 0 on M((x − a)(b − x)).

12.2.4 Finite rank positive semidefinite moment matrices


We now characterize the sequences belonging to the convex set C∞ (K), in
terms of positive semidefiniteness and finite rank conditions on their moment
matrices.
Theorem 12.2.14 (Curto and Fialkow [1996]) Let K be the set from
(12.2). Let y = (yα )α∈Nn be a sequence of real numbers and let Ly be the
linear functional from (12.11). Then the following assertions are equivalent.

(i) y ∈ C∞ (K), i.e., y has a representing probability measure which is


finitely atomic and supported by K.
(ii) rank M (y) < ∞, M (y)  0, M (gj y)  0 for all j ∈ [m], and y0 = 1.
(iii) rank M (y) < ∞, Ly ≥ 0 on Σ + g1 Σ + · · · + gm Σ, and Ly (1) = 1.

Proof The equivalence of (ii) and (iii) follows directly from Lemma 12.2.6
(and the fact that Ly (1) = y0 ) and the implication (i) =⇒ (ii) follows from
Lemma 12.2.8.
We now show the implication (ii) =⇒ (i), the technical core of Theo-
rem 12.2.14. So we assume that (ii) holds and set r = rank M (y); we will
construct a finite r-atomic measure supported by K with y as sequence of
moments. Recall Ly is the linear functional from (12.11) and let I be the set
from (12.13). By assumption, we have Ly (1) = y0 = 1. By Lemma 12.2.7,
we know that I is a real radical (thus also radical) ideal in R[x].
First we claim
dim R[x]/I = r.

This follows directly from the fact that a set of columns {C1 , . . . , Cs } of the
moment matrix M (y), indexed by {α1 , . . . , αs } ⊆ Nn , is linearly independent
if and only if the corresponding cosets of monomials {[xα1 ], . . . , [xαs ]} is
linearly independent in R[x]/I. Hence, dim R[x]/I = rank M (y) = r.
As dim R[x]/I = r < ∞, it follows using Theorem 12.1.8(ii) that |VC (I)| =
dim R[x]/I = r. Finally, by Lemma 12.1.7, we obtain VR (I) = VC (I). Hence

VC (I) = {v1 , . . . , vr } ⊆ Rn
282 Moment matrices and polynomial equations

for some v1 , . . . , vr ∈ Rn . Let pv1 , . . . , pvr ∈ R[x] be interpolation polynomi-


als at the vi ’s (which exist by Lemma 12.1.10) and define the linear form
r
X
0
L = Ly (pvi )Evvi ,
i=1

where Evvi is the evaluation at vi . We next claim that


r
X
0
Ly = L , i.e., y = Ly (pvi )[vi ]∞ . (12.14)
i=1

As both Ly and L0 vanish at all polynomials in I, in order to show that Ly =


L0 , it suffices to show that Ly and L0 coincide at all elements of a given basis
of R[x]/I. Now, by Lemma 12.1.12, we know that the set {[pv1 ], . . . , [pvr ]} is
a basis of R[x]/I and it is indeed true that L0 (pvi ) = Ly (pvi ) for all i. Thus
(12.14) holds and this implies
r
X
M (y) = Ly (pvi )[vi ]∞ [vi ]T
∞. (12.15)
i=1

Next, we claim that


r
X
Ly (pvi ) > 0 for all i ∈ [r] and Ly (pvi ) = 1.
i=1

For this, note that the polynomial pv1 − p2vi vanishes at all points in VC (I)
and thus pvi − p2vi ∈ I(VC (I)) = I since I is radical. Therefore, we have
Ly (pv1 ) = Ly (p2vi ) ≥ 0, where the inequality follows since M (y)  0. Also,
Ly (pvi ) 6= 0 since, otherwise, in view of (12.15) the rank of M (y) would be
smaller than r. Finally, since the polynomial ri=1 pvi − 1 vanishes at all
P

points in VC (I) it belongs to I(VC (I)) = I, thus we have Ly ( ri=1 pvi − 1) =


P
Pr
0, which implies i=1 Ly (pvi ) = L(1) = 1.
It remains to show that the points v1 , . . . , vr belong to the set K, i.e.,
that gj (vi ) ≥ 0 for all j ∈ [m] and i ∈ [r]. For this, we use the fact that
Ly (gj p2vi ) ≥ 0, which follows from the assumption M (gj y)  0. Then, using
(12.14), we get:

Ly (gj p2vi ) = gj (vi )Ly (pvi ) ≥ 0.

As we just showed that Ly (pvi ) > 0 this implies gj (vi ) ≥ 0, as desired, and
the proof is complete.
12.2 Characterizing the convex set C∞ (K) 283

12.2.5 Moment relaxation for polynomial optimization


We come back to the polynomial optimization problem (12.1). In Chap-
ter 11.3 (relation (11.9)), we defined the following parameter psos , which
provides a lower bound on the infimum pmin of p over the set K:

psos = sup{λ : λ ∈ R, p − λ ∈ M(g)}. (12.16)

Recall that M(g) is the quadratic module generated by g = {g1 , . . . , gm },


defined by
M(g) = Σ + g1 Σ + . . . + gm Σ.

Based on the discussion in the preceding section, we can define the following
parameter, which also provides a lower bound for pmin :
n
pmom = inf{pT y : y ∈ RN , y0 = 1, M (y)  0, M (gj y)  0 for j ∈ [m]}
= inf{L(p) : L ∈ R[x]∗ , L(1) = 1, L ≥ 0 on M(g)}.
(12.17)
As we now see, these two bounds are in weak duality to each other.

Lemma 12.2.15 We have: psos ≤ pmom ≤ pmin .

Proof The inequality pmom ≤ pmin follows from the fact that, for each
v ∈ K, the evaluation Evv at v is feasible for the second program in (12.17),
with objective value Evv (p) = p(v). An alternative, equivalent way is to
notice that the sequence y = [v]∞ is feasible for the first program in (12.17).
The inequality psos ≤ pmom follows from ‘weak duality’: Let λ be feasible
for (12.16) and let L be feasible for (12.17). That is, p − λ ∈ M(g), L(1) = 1
and L ≥ 0 on M(g). Then, we have L(p) − λ = L(p − λ) ≥ 0, which implies
L(p) ≥ λ and thus pmom ≥ psos .

Moreover, as a direct application of Putinar’s theorem (Theorem 11.2.11),


we see that the bounds psos and pmom coincide with pmin when the quadratic
module M(g) is Archimedean.

Theorem 12.2.16 If the quadratic module M(g) is Archimedean then we


have: psos = pmom = pmin .

Proof Fix a scalar  > 0. Then the polynomial p − pmin +  is positive on


the set K and thus, by Theorem 11.2.11, it belongs to M(g). This implies
that psos ≥ pmin − . Letting  tend to 0 we obtain the inequality psos ≥ pmin .
Hence equality holds throughout in Lemma 12.2.15.

In addition, it follows from Theorem 12.2.14 that equality pmom = pmin


284 Moment matrices and polynomial equations

holds if the program (12.17) has an optimal solution y for which M (y) has
finite rank. We will come back to this in the next section.
Note that the programs (12.16) and (12.17) are infinite dimensional pro-
grams, since no degree bound has been put on the unknown sums of squares
polynomials, and the linear functional L acts on the full polynomial space.
In the next chapter we will consider hierarchies of semidefinite programming
relaxations for problem (12.1) that are obtained by adding degree constraints
to the programs (12.16) and (12.17), and we will use the results of Theo-
rems 12.1.14 and 12.2.14 for giving a procedure to find global optimizers of
problem (12.1).

12.3 Notes and further reading


1

The terminology of ‘moment matrix’ which we have used for the matrix
M (y) is motivated by the relevance of these matrices to the classical moment
problem. Recall that,R given a (positive Borel) measure µ on a subset K ⊆ Rn ,
α
the quantity yα = K x dµ(x) is called its moment of order α. The classical
n
K-moment problem asks to characterize the sequences y ∈ RN which are
the sequence of moments of some measure µ supported by K.
In the special case when µ is a Dirac measure at a point v ∈ Rn , i.e.,
when µ has mass only at the point v, its sequence of moments is precisely
the sequence [v]∞ = (v α )α∈Nn . More generally, when µ is a finite atomic
measure, which means that µ is supported by finitely many points of K,
then its sequence of moments is of the form y = ri=1 λi [vi ]∞ for finitely
P

many positive scalars λi and points vi ∈ K. In other words, the set C∞ (K)
coincides with the set of sequences of moments of finite atomic measures on
K. Moreover, the closure of the set C∞ (K) is the set of sequences of moments
of an arbitrary measure on K. Hence, Theorem 12.2.14 characterizes which
sequences admit a finite atomic measure on K, when K is a basic closed semi-
algebraic set, in terms of positivity and finite rank conditions on the sequence
y. This result is due to Curto and Fialkow [1996]. (When the condition
rank M (y) < ∞ holds, Curto and Fialkow speak of flat data). The proof of
Curto and Fialkow [1996] uses tools from functional analysis, the algebraic
proof given here is based on Laurent [2005] (see also Laurent [2009]). In
Chapter 14.4 we will see another functional analytic based approach in the
more general setting of noncommutative polynomial optimization.
We refer to the books of Cox, Little and O’Shea [1992, 1998] for further
1 This section needs to be updated
Exercises 285

reading about ideals and varieties and, in particular, about multiplication


operators in the quotient space R[x]/I.

Exercises
√ √
12.1 Recall the definitions (12.5) and (12.6) for I and R I.

(a) Show that the radical I of an ideal I ⊆ C[x] is an ideal.

(b) Show that the real radical R I of an ideal I ⊆ R[x] is an ideal.
12.2 Show Lemma 12.1.5.
12.3 (a) Let I and J be two ideals in C[x]. Show that I ∩ J is an ideal and
that VC (I ∩ J) = VC (I) ∪ VC (J).
(b) Given v ∈ Cn , show that the set {v} is a complex variety.
(c) Show that any finite set V ⊆ Cn is a complex variety.
12.4 The goal is to show Theorem 12.1.16 in the radical case.
Let I be a radical ideal in R[x] with N = |VC (I)| = dim R[x]/I < ∞.
Let B = {[b1 ], . . . , [bN ]} be a base of A = R[x]/I and, for any h ∈ R[x],
let Mh denote the matrix of the multiplication by h in the base B. Then,
the matrix of the Hermite quadratic form (12.10) in the base B is the
real symmetric matrix H = (Hij )N i,=1 with entries Hij = Tr(Mbi bj ).
Finally, σ+ (H), σ− (H) denote, respectively, the numbers of positive
and negative eigenvalues of H.
(a) Show that H = v∈VC (I) [v]B [v]T
P
B and rank(H) = |VC (I)|.

(b) Show that VC (I) can be partitioned into VR (I) ∪ T ∪ T , where T is


the set of complex conjugates of the elements of T .
(c) Show that there exist matrices P and Q such that H = P − Q,
P  0, Q  0, rank(P ) = |VR (I)| + |T |, and rank(Q) = |T |.
(d) Show that H = A − B for some matrices A, B such that A, B  0,
AB = BA = 0, rank(A) = σ+ (H) and rank(B) = σ− (H).
(e) Show that σ+ (H) = |VR (I)| + |T | and σ− (H) = |T |.

12.5(a) Given a finite set {v1 , . . . , vr } ⊆ Rn , show that the corresponding


evaluation maps {Evv1 , . . . , Evvr } ⊆ R[x]∗ are linearly independent.

(b) Let µ be a finite atomic measure and let y be the sequence of mo-
ments of µ. Show that rankM (y) = |supp(µ)|.
286 Moment matrices and polynomial equations

12.6 Let K = {x ∈ Rn : g1 (x) ≥ 0, . . . , gm (x) ≥ 0} ⊆ Rn , where gj are


nonzero polynomials. Let µ be a measure with supp(µ) = K and let y
be its sequence of moments. Assume that K has a nonempty interior.
Set g0 = 1.
Show that, for any j = 0, 1, . . . , m, we have: M (gj y)  0; that is,
M (gj y)  0 and, for every nonzero polynomial f ∈ R[x], M (gj y)f 6= 0.
12.7 Given polynomials g1 , . . . , gm , h1 , . . . , hm0 ∈ R[x] set
K = {x ∈ Rn : g1 (x) ≥ 0, . . . , gm (x) ≥ 0, h1 (x) = 0, . . . , hm0 (x) = 0}
and let I = (h1 , . . . , hm0 ) denote the ideal generated by the hl ’s. One
can partition VC (I) as VC (I) = S ∪ T ∪ T , where S = VC (I) ∩ Rn .
Assume the ideal I is zero-dimensional (i.e., |VC (I)| < ∞) and radical.
Let f ∈ R[x].
(a) Assume f ≥ 0 on S. Show that f = σ + q, where σ ∈ Σ and q ∈ I.
Hint: Let pv be interpolation polynomials at the points in VC (I) as
in Lemma 12.1.10(ii) and consider the polynomial
Xp X p p 2
f− ( f (v)pv )2 − f (v)pv + f (v)pv .
v∈S v∈T
Pm
(b) Assume f ≥ 0 on K. Show that f = σ0 + j=1 σj gj + q, where
σ0 , σj ∈ Σ and q ∈ I.
Hint: Using Lemma 12.1.10(ii)) construct polynomials f0 , f1 , . . . , fm ∈
R[x] satisfying the following conditions: f0 (v) = f (v) and f1 (v) =
. . . = fm (v) = 0 if v ∈ T ∪T ∪{v ∈ S : f (v) ≥ 0}. For each remaining
v ∈ VC (I), we have v ∈ S \ K and thus gjv (v) < 0 for some jv ∈ [m];
then require fjv (v) = gfj (v) and f0 (v) = fj (v) = 0 for j ∈ [m] \ {jv }.
v (v)
Next, consider the polynomials f − f0 − m
P
j=1 fj gj .
13
Polynomial optimization

In this chapter we insvestigate in detail the polynomial optimization prob-


lem:
pmin = inf p(x), (13.1)
x∈K

where K is defined by polynomial inequalities:


K = {x ∈ Rn : g1 (x) ≥ 0, . . . , gm (x) ≥ 0} (13.2)
with p, g1 , . . . , gm ∈ R[x]. We set g0 = 1 and g = {g1 , . . . , gm }. In the
previous two chapters we have introduced the two parameters:
n m
X o
psos = sup λ : λ ∈ R, p − λ ∈ M(g) = gj Σ ,
j=0

and
pmom = inf{L(p) : L ∈ R[x]∗ , L(1) = 1, L ≥ 0 on M(g)}.
Recall that R[x]∗ is the dual space of R[x], consisting of all real valued linear
maps on R[x]. These two parameters satisfy the inequalities:
psos ≤ pmom ≤ pmin
(Lemma 12.2.15), with equality throughout when the quadratic module
M(g) is Archimedean (Theorem 12.2.16). They both can be reformulated
using positive semidefinite matrices. However these are infinite matrices, in-
dexed by Nn , since there is no a priori degree bound on the sums of squares
entering decompositions of polynomials in M(g), and since the linear func-
tionals act on the full polynomial space R[x]. Hence, it is not clear how to
compute the parameters pmom and psos .
There is however a simple remedy: instead of working with the full quadratic
module M(g) ⊆ R[x], we truncate it by adding increasing degree bounds. In
288 Polynomial optimization

this way we obtain finite matrices, leading to semidefinite programs whose


optimal values converge to the parameters pmom and psos as the degree
bounds grow.
This approach goes as follows. Given an integer t, define the quadratic
module, truncated at degree 2t:
m
nX o
M(g)2t = gj sj : sj ∈ Σ, deg(sj gj ) ≤ 2t for j = 0, 1, . . . , m ,
j=0
P
which consists of the elements j sj gj in the quadratic module M (g) where
all summands have degree at most 2t. Then, for any integer t ≥ deg(p)/2,
we can define the parameters:

psos,t = sup{λ : λ ∈ R, p − λ ∈ M(g)2t } (13.3)

and

pmom,t = inf{L(p) : L ∈ R[x]∗2t , L(1) = 1, L ≥ 0 on M(g)2t }. (13.4)

We will refer to (13.3) as the sos program and to (13.4) as the moment
program.
It follows from the definitions that

psos,t ≤ psos and pmom,t ≤ pmom for all t ≥ deg(p)/2.

Using an analogous ‘weak duality’ argument as for Lemma 12.2.15, we ob-


tain:
psos,t ≤ pmom,t ≤ pmin for all t ≥ deg(p)/2. (13.5)

Moreover, as t grows, we get bounds for pmin that are potentially better and
better:

psos,t ≤ psos,t+1 and pmom,t ≤ pmom,t+1 for all t ≥ deg(p)/2.

In this chapter we investigate some of the most important properties of


these hierarchies of bounds {psos,t } and {pmom,t }:

1. Asymptotic convergence: Both bounds converge to pmin when M(g)


is Archimedean.
2. Dual semidefinite programs: The parameters psos,t and pmom,t are
defined by dual semidefinite programs.
3. Strong duality: There is no duality gap, i.e., pmom,t = psos,t , when K
has a nonempty interior, or when a ball constraint is present in the
description of K.
Polynomial optimization 289

4. Finite convergence and global minimizers: When the moment pro-


gram (13.4) has an optimal solution satisfying a special rank condi-
tion (the so-called flatness condition), the bound pmom,t is exact, i.e.,
we have: pmom,t = pmin . In other words, we have finite convergence
at the current order t. In addition, one can then compute global
minimizers of the polynomial p in the set K.

To establish the finite convergence result mentioned above we will need an


additional ingredient: how to extend a finite sequence satisfying the flatness
condition to an infinite sequence, in such a way that the ranks of the corre-
sponding moment matrices stay the same; this is the ‘flat extension theorem’
of Curto and Fialkow (Theorem 13.4.3).
We also consider the case when explicit polynomial equations are present
in the description of the set K and mention how they can be used to
give more economical reformulations for the sos/moment bounds involv-
ing smaller matrices and less variables. We discuss in particular the case of
binary polynomial optimization, when the set K is contained in the Boolean
cube {0, 1}n (or {±1}n ) and explain the link to the Lasserre hierarchy of
bounds introduced in Chapter 4 for the stable set problem.
Here we group some additional notation that will be used in this chapter.
For a polynomial g ∈ R[x] we set

dg = ddeg(g)/2e,

so dg0 = 0 for the constant polynomial g0 = 1 and t ≥ dg means 2t ≥ deg(g).


We also set

dK = max{dg1 , . . . , dgm } with K defined as in (13.2). (13.6)

In the unconstrained case (when m = 0, so that K = Rn ) we set dK = 1.


For an integer t ∈ N,
[x]t = (xα )α∈Nnt

is the vector containing all monomials of degree at most t. For t = ∞ this is


consistent with the definition of [x]∞ = (xα )α∈Nn introduced in the previous
chapter. For a polynomial g = α gα xα and an integer t ≥ deg(g), we let
P

g = (gα )α∈Nnt

denote the coefficient vector of g, where gα = 0 whenever |α| > deg(g). Then
we have
g(x) = gT [x]t for any t ≥ deg(g).
290 Polynomial optimization

13.1 Asymptotic convergence


The Archimedean condition discussed in Chapter 11.2.5 plays a crucial role
for the convergence of the sos/moment bounds to the infimum of p. Recall
from Theorem 11.2.8 that the quadratic module M(g) is Archimedean when
there exists a scalar R > 0 for which the polynomial R2 − ni=1 x2i belongs
P

to M(g); clearly, this implies that K is contained in the ball with radius R
(and thus K is compact).
Of course, if a ball constraint is already explicitly present in the description
(13.2) of K, say the first polynomial is g1 (x) = R2 − m 2
P
i=1 xi , then it is
immediately clear that the quadratic module M(g) is Archimedean.
Theorem 13.1.1 Assume that M(g) is Archimedean. Then, the bounds
pmom,t and psos,t converge asymptotically to pmin as t → ∞.
Proof Pick  > 0. Then the polynomial p − pmin +  is strictly positive on
K. As M(g) is Archimedean, we can apply Putinar’s theorem (Theorem
13.2.9) and deduce that p − pmin +  ∈ M(g). Hence, there exists t ∈ N
such that p − pmin +  ∈ M(g)2t and thus pmin −  ≤ psos,t . Therefore,
limt→∞ psos,t = pmin . By (13.5), psos,t ≤ pmom,t ≤ pmin for all t ≥ deg(p)/2.
Hence limt→∞ pmom,t = pmin holds as well.
As the above proof shows, it follows from Putinar’s theorem that the sos
program (13.3) is feasible for all t large enough when M(g) is Archimedean.
In fact, as we will see later in Lemma 13.3.2, it is feasible for all t ≥ 1 when
a ball constraint is present in the description of K.

13.2 Dual semidefinite programs


In this section we give the explicit reformulation of the programs (13.3)
and (13.4) as semidefinite programs and we observe that these semidefinite
programs are in fact dual of each other.
For this we need to introduce the analogues for the truncated setting of
some notions and facts introduced in Chapter 12 for the case of the full
polynomial space R[x]. Fix an integer t ∈ N. Clearly, there is a one-to-one
n
correspondence between linear functionals on R[x]2t and sequences in RN2t :
any linear functional L ∈ R[x]∗2t is completely specified by the sequence
n
of real numbers y = (yα )α∈Nn2t ∈ RN2t , where yα = L(xα ) for |α| ≤ 2t
and, to stress this correspondence, we rename L as Ly . Then we define the
corresponding moment matrix, truncated at order t, which is indexed by Nnt
and defined by
Mt (y) = (yα+β )α,β∈Nnt = (Ly (xα+β ))α,β∈Nnt .
13.2 Dual semidefinite programs 291

We now extend the operation of defining a new sequence gy by action of


a polynomial g on a sequence y. However, we must take into account that
n
the entries of y are defined only up to degree 2t. Then, given y ∈ RN2t and
Nn
g ∈ R[x] with deg(g) ≤ 2t, define gy as the sequence in R 2(t−dg ) with entries
X
(gy)α = Ly (gxα ) = gβ yα+β for |α| ≤ 2(t − dg ).
β:|β|≤deg(g)

Note that each entry of gy is well-defined if |α| ≤ 2(t − dg ), since then


|α + β| ≤ 2(t − dg ) + deg(g) ≤ 2t.
The following is the ‘truncated’ analogue of Lemma 12.2.6:
n
Lemma 13.2.1 Let y ∈ RN2t and Ly ∈ R[x]∗2t be the corresponding linear
functional. Then we have:
Ly ≥ 0 on Σ2t ⇐⇒ Mt (y)  0,
and
Ly ≥ 0 on gΣ2(t−dg ) ⇐⇒ Mt−dg (gy)  0,
for any polynomial g with degree at most 2t.
Based on this, the moment program (13.4) can be equivalently reformu-
lated as:
pmom,t = inf{pT y : y ∈ Nn2t , y0 = 1, Mt−dgj (gj y)  0 for j = 0, 1, . . . , m}.
(13.7)
The matrices Mt−dgj (gj y) (for j ∈ [m]) are also known as the (truncated)
localizing moment matrices. Roughly speaking, if the corresponding linear
functional Ly is a convex combination of evaluations at points v ∈ Rn , then
the role of these moment matrices is to ensure that the v’s all lie within the
set K and thus correspond to the global minimizers of p in K. This will
become more clear later on in the chapter.
We now work out explicitly the fact that the dual semidefinite program
of the moment program (13.7) coincides with the sos program (13.3). The
details are not difficult but a bit technical, nevertheless it is instructive to
go through them at least once. As a warm-up the reader may want to work
them out first in the unconstrained case (when m = 0, g0 = 1).
We need to introduce some matrices in order to express (13.7) as a semidef-
inite program in standard form.
For any γ ∈ Nn2t , let A0t,γ be the matrix indexed by Nnt , with entries
(A0t,γ )α,β = 1 if γ = α + β, 0 otherwise.
292 Polynomial optimization

Then, we define such matrices for each polynomial constraint gj = δ (gj )δ xδ


P

defining the set K: for γ ∈ Nn2t and j ∈ [m], Ajt,γ is the matrix indexed by
Nnt−dg , with entries
j

(Ajt,γ )α,β =
X
(gj )δ .
δ:α+β+δ=γ

The notation is consistent: for j = 0 (g0 = 1) we recover the matrix A0t,γ .


Using these matrices we can express the localizing moment matrices as

yγ Ajt,γ for j = 0, 1, . . . , m.
X
Mt−dgj (y) = (13.8)
γ∈Nn
2t

Note that if we apply this to the case when y = [x]∞ = (xα )α is a monomial
vector then we obtain:

xγ Ajt,γ = gj (x) · [x]t−dgj ([x]t−dgj )T for j = 0, 1, . . . , m.


X
(13.9)
γ∈Nn
2t

Using the expression (13.8) of the localizing moment matrices in terms of


the matrices Ajt,γ , we can reformulate the moment problem (13.7) as
P
pmom,t = p0 + inf 1≤|γ|≤2t pγ yγ
s.t. At,0 + 1≤|γ|≤2t yγ Ajt,γ  0 for j = 0, 1, . . . , m,
j P

(13.10)
where the optimization variable is y = (yγ )γ∈Nn2t \{0} .
We now proceed to reformulate the sos problem (13.3) using the matrices
Ajt,γ . By definition, the parameter psos,t is the largest scalar λ for which the
polynomial p − λ can be written as
m
X
p−λ= sj gj with sj ∈ Σ2(t−dgj ) .
j=0

For each j = 0, 1, . . . , m we introduce a matrix variable Qj  0, indexed by


Nnt−dg , to express the sos polynomial sj as
j

sj (x) = ([x]t−dgj ])T Qj [x]t−dgj = hQj , [x]t−dgj ([x]t−dgj )T i

and we arrive at the following polynomial identity:


m
X
p(x) − λ = hQj , gj (x) · [x]t−dgj ([x]t−dgj )T i.
j=0

By using relation (13.9) and equating the coefficients of the polynomials at


13.3 Strong duality 293

both sides of the above polynomial identity, we obtain:


m
hAjt,0 , Qj i
X
p0 − λ =
j=0

and
m
hAjt,γ , Qj i
X
pγ = for all 1 ≤ |γ| ≤ 2t.
j=0

Hence we arrive at the following reformulation of the sos bound:


n m
hAjt,0 , Qj i : Q0 , Q1 , . . . , Qm  0
X
psos,t = p0 + sup −
j=0
m o
hAjt,γ , Qj i = pγ for 1 ≤ |γ| ≤ 2t ,
X

j=0
(13.11)
where the matrix variable Qj is indexed by Nnt−dg , for j = 0, 1, . . . , m.
j

Now, we can clearly see that the two programs (13.10) and (13.11) are
dual of each other, with the sos program (13.11) being in standard primal
form and the moment program (13.10) being in standard dual form (albeit
over a product of positive semidefinite cones). Summarizing we have shown:

Lemma 13.2.2 The moment program (13.4) and the sos program (13.3)
are dual semidefinite programs.

13.3 Strong duality


Here we discuss some conditions under which strong duality holds between
the moment program (13.4) and the sos program (13.3).
Recall from Theorem 2.4.1 that strong duality holds if the moment pro-
gram (13.4) is bounded from below (i.e., pmom,t > −∞) and strictly feasible,
or if the sos program (13.3) is bounded from above (i.e., psos,t < ∞) and
strictly feasible. It is useful to recall that this latter condition is equivalent
to requiring that the set of optimal solutions of the moment program (13.4)
is bounded and nonempty (Proposition 2.6.11). We will use this fact later.
We begin with some general observations about the moment and sos pro-
grams. The proof of the next lemma is left as an exercise (Exercise 13.3).
n
Lemma 13.3.1 Let R > 0 and t ≥ 1. Assume y ∈ RN2t satisfies the
conditions: y0 = 1, Mt (y)  0 and Mt−1 ((R2 − ni=1 x2i )y)  0. Then we
P
294 Polynomial optimization

have
|yα | ≤ R|α| for all |α| ≤ 2t.
Lemma 13.3.2 Assume the inequality g1 (x) = R2 − ni=1 x2i ≥ 0 is present
P

in the description of K. Then, for any t ≥ 1, we have:


(i) The feasibility region of the moment program (13.4) is bounded.
(ii) The feasibility region of the sos program (13.3) is nonempty.
Proof (i) follows directly from Lemma 13.3.1. We now show (ii); we will in
fact show that there exists λ such that p − λ ∈ Σ2t + g1 Σ2t−2 ⊆ M(g)2t . In
other words, we may as well assume here that the ball constraint g1 (x) ≥ 0
is the only constraint entering the description of K, so K is the ball of radius
R. By (i) the feasibility region of the moment program (13.4) is bounded.
In addition, it contains a strictly feasible solution. Indeed, if µ is a measure
whose support is exactly the ball K then its moment sequence provides
a strictly feasible solution of (13.4) (Exercise 12.6). Hence strong duality
applies: we can conclude that there is no duality gap and thus the (dual)
sos program is feasible, which shows existence of λ for which p − λ belongs
to Σ2t + g1 Σ2t−2 ⊆ M(g)2t .
Lemma 13.3.3 If K has a nonempty interior and the sos program (13.3)
is feasible then strong duality holds: pmom,t = psos,t for all t ≥ max{dp , dK }.
Proof As K has a nonempty interior it contains a ball B. Let µ be a
measure whose support is B and let y be its sequence of moments. Then,
Mt (y)  0 and Mt−dgj (gj y)  0 (use Exercise 12.6). Hence y is a strictly
feasible solution of the moment program (13.4). In addition, the moment
program is bounded from below since the dual sos program is feasible. Hence
strong duality holds (by Theorem 2.4.1).
As the next result shows, when an explicit ball constraint is present in the
description of K, strong duality holds under a weaker assumption, without
assuming that K has a nonempty interior.
Theorem 13.3.4 (Josz and Henrion [2016])  Assume the ball constraint
$g_1(x) = R^2 - \sum_{i=1}^n x_i^2 \ge 0$ is present in the description of K and let dK be as
in (13.6). Then, strong duality holds: pmom,t = psos,t for all t ≥ max{dp , dK }.
Proof By Lemma 13.3.1 the feasibility region of the moment program is
bounded. We distinguish two cases, depending on whether it is empty or not.
If it is nonempty then the set of optimal solutions of the moment program
is bounded and nonempty and thus strong duality holds in view of Proposi-
tion 2.6.11 (recall the discussion at the beginning of this section).

We now assume that the program (13.4) is infeasible, so pmom,t = +∞; we
will show that psos,t = +∞. For this we use the results about strong/weak
infeasibility of semidefinite programs in Section ?? and we use the explicit
semidefinite moment and sos programs (13.10) and (13.11). In view of The-
orem ?? there are two possibilities for the moment program (13.10):

(1) Either it is strongly infeasible, i.e., there exist positive semidefinite ma-
trices $Q^*_0, Q^*_1, \ldots, Q^*_m \succeq 0$ satisfying the two conditions:
\[
\sum_{j=0}^{m} \langle A^j_{t,\gamma}, Q^*_j \rangle = 0 \ \text{ for } 1 \le |\gamma| \le 2t \quad \text{and} \quad z^* := -\sum_{j=0}^{m} \langle A^j_{t,0}, Q^*_j \rangle > 0. \tag{13.12}
\]
(2) Or it is weakly infeasible, i.e., for all $\varepsilon > 0$ there exists $y^\varepsilon \in \mathbb{R}^{\mathbb{N}^n_{2t}}$ which
is `almost feasible', which means it satisfies
\[
1 - \varepsilon \le y^\varepsilon_0 \le 1 + \varepsilon \quad \text{and} \quad \lambda_{\min}\big(M_{t-d_{g_j}}(g_j\, y^\varepsilon)\big) \ge -\varepsilon \ \text{ for } j = 0, 1, \ldots, m. \tag{13.13}
\]
Consider first case (1); then we show $p_{\mathrm{sos},t} = \infty$ as desired. For this,
assume the tuple $(Q_0, \ldots, Q_m)$ provides a feasible solution of (13.11) (which
exists by Lemma 13.3.2), with objective value $\mathrm{val} = p_0 - \sum_{j=0}^{m} \langle A^j_{t,0}, Q_j \rangle$.
Let $\theta > 0$. Using the scalar $z^*$ and the matrices $Q^*_j$ in (13.12), we obtain
that the tuple $(Q_0 + \theta Q^*_0, \ldots, Q_m + \theta Q^*_m)$ provides a feasible solution of
(13.11), with objective value $p_0 - \sum_{j=0}^{m} \langle A^j_{t,0}, Q_j + \theta Q^*_j \rangle = \mathrm{val} + \theta z^*$. This
shows $p_{\mathrm{sos},t} \ge \mathrm{val} + \theta z^*$ for all $\theta > 0$ and thus $p_{\mathrm{sos},t} = \infty$.
We now consider case (2); we will contradict the assumption that the
moment program (13.4) is infeasible. For $\beta \in \mathbb{N}^n$ let $N_\beta$ denote the number
of indices $i \in [n]$ with $\beta_i \ge 1$, so $N_\beta \ge 1$ if $\beta \ne 0$. We first claim that if
$y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$ satisfies $\lambda_{\min}(M_t(y)) \ge -\varepsilon$ then, for any $1 \le k \le t$, we have:
\[
\mathrm{Tr}(M_{k-1}(g_1 y)) \ \le\ R^2\, \mathrm{Tr}(M_{k-1}(y)) + y_0 - \mathrm{Tr}(M_k(y)) + \varepsilon \sum_{1 \le |\beta| \le k} (N_\beta - 1). \tag{13.14}
\]
Indeed, $\mathrm{Tr}(M_{k-1}(g_1 y))$ is equal to
\[
\sum_{|\gamma| \le k-1} \Big( R^2 y_{2\gamma} - \sum_{i=1}^{n} y_{2\gamma+2e_i} \Big) = R^2\, \mathrm{Tr}(M_{k-1}(y)) - \sum_{|\gamma| \le k-1} \sum_{i=1}^{n} y_{2\gamma+2e_i}
= R^2\, \mathrm{Tr}(M_{k-1}(y)) - \sum_{1 \le |\beta| \le k} N_\beta\, y_{2\beta}.
\]
Hence (13.14) follows if we can show the inequality
\[
\sum_{1 \le |\beta| \le k} N_\beta\, y_{2\beta} \ \ge\ \mathrm{Tr}(M_k(y)) - y_0 + \varepsilon \sum_{1 \le |\beta| \le k} (1 - N_\beta).
\]

For this we use $N_\beta \ge 1$ if $\beta \ne 0$ and the fact that $\lambda_{\min}(M_t(y)) \ge -\varepsilon$, i.e.,
$M_t(y) + \varepsilon I \succeq 0$, so $y_{2\beta} \ge -\varepsilon$ for $|\beta| \le t$. Then $\sum_{1 \le |\beta| \le k} N_\beta\, y_{2\beta}$ is equal to
\[
\sum_{1 \le |\beta| \le k} N_\beta (y_{2\beta} + \varepsilon) - \varepsilon \sum_{1 \le |\beta| \le k} N_\beta \ \ge\ \sum_{1 \le |\beta| \le k} (y_{2\beta} + \varepsilon) - \varepsilon \sum_{1 \le |\beta| \le k} N_\beta
= \mathrm{Tr}(M_k(y)) - y_0 + \varepsilon \sum_{1 \le |\beta| \le k} (1 - N_\beta).
\]

We now consider the sequence $y^\varepsilon$ from (13.13). As $k \le t$ the matrix $M_{k-1}(g_1 y^\varepsilon)$
is a principal submatrix of the matrix $M_{t-1}(g_1 y^\varepsilon)$ and thus
\[
\lambda_{\min}(M_{k-1}(g_1 y^\varepsilon)) \ \ge\ \lambda_{\min}(M_{t-1}(g_1 y^\varepsilon)) \ \ge\ -\varepsilon,
\]
which implies
\[
\mathrm{Tr}(M_{k-1}(g_1 y^\varepsilon)) \ \ge\ -\varepsilon \binom{n+k-1}{k-1} \ \ge\ -\varepsilon \binom{n+t-1}{t-1}.
\]
Combining with (13.14) applied to $y = y^\varepsilon$ and using $y^\varepsilon_0 \le 1 + \varepsilon$ we get
\[
\mathrm{Tr}(M_k(y^\varepsilon)) \ \le\ R^2\, \mathrm{Tr}(M_{k-1}(y^\varepsilon)) + \underbrace{1 + \varepsilon \Big( 1 + \sum_{1 \le |\beta| \le t} (N_\beta - 1) + \binom{n+t-1}{t-1} \Big)}_{=:\ C_\varepsilon}
\]
and thus
\[
\mathrm{Tr}(M_k(y^\varepsilon)) \ \le\ R^2\, \mathrm{Tr}(M_{k-1}(y^\varepsilon)) + C_\varepsilon \quad \text{for all } 1 \le k \le t.
\]
Solving this recurrence gives the upper bound
\[
\mathrm{Tr}(M_t(y^\varepsilon)) \ \le\ \sum_{k=0}^{t-1} R^{2k} C_\varepsilon + (1 + \varepsilon) R^{2t} \ =:\ \varphi(\varepsilon),
\]
where the function $\varphi(\varepsilon)$ is monotone increasing in $\varepsilon$. We can now conclude.
Fix $\varepsilon_0 > 0$. Then we find that, for any $0 < \varepsilon < \varepsilon_0$, $\mathrm{Tr}(M_t(y^\varepsilon)) \le \varphi(\varepsilon_0)$,
which implies that all entries of $y^\varepsilon$ are bounded by an absolute constant
(see Exercise 13.4(ii)). By letting $\varepsilon \to 0$ and using a compactness argument we find an accumulation
point $y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$, which provides a feasible solution of the moment program,
contradicting the assumption that the moment program is infeasible.

13.4 Flat extensions of moment matrices


We state here a technical result about moment matrices which will play a
crucial role for establishing an optimality certificate for the moment bounds

pmom,t . Roughly speaking, this result permits us to extend a (truncated) sequence
$y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$ that satisfies a rank condition (see (13.17) below) to an
infinite sequence $\tilde y \in \mathbb{R}^{\mathbb{N}^n}$, whose moment matrix $M(\tilde y)$ has the same rank
as $M_t(y)$. Hence, if $M_t(y) \succeq 0$ then also $M(\tilde y) \succeq 0$, and thus we can apply
the result from Theorem 12.2.14 to $M(\tilde y)$ and obtain an atomic representing
measure.

Fix an integer $t \ge 1$. Recall that $\mathbf{f} = (f_\alpha)_\alpha \in \mathbb{R}^{\mathbb{N}^n_t}$ denotes the vector of
coefficients of a polynomial $f \in \mathbb{R}[x]_t$. Given a sequence $y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$ and the
corresponding linear functional $L_y \in \mathbb{R}[x]^*_{2t}$, we have:
\[
\mathbf{f} \in \ker M_t(y) \iff L_y(hf) = 0 \ \text{ for all } h \in \mathbb{R}[x]_t. \tag{13.15}
\]
We will sometimes abuse notation and say that `the polynomial $f$ belongs
to the kernel of $M_t(y)$' and write `$f \in \ker M_t(y)$' instead of `$\mathbf{f} \in \ker M_t(y)$'.
Recall that the kernel of an infinite moment matrix corresponds to an
ideal I in R[x] (Lemma 12.2.7). As we will see below this property extends
in a well defined way to finite moment matrices. The following simple result
about kernels of matrices will be useful (Exercise 13.1).
Lemma 13.4.1  Let X be a symmetric matrix with block-form
\[
X = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix}, \tag{13.16}
\]
where A and C are square matrices.

(i) Assume that rank X = rank A. Then we have
\[
\ker X = \ker \begin{pmatrix} A & B \end{pmatrix}.
\]
(ii) Assume that rank X = rank A or that $X \succeq 0$. Then we have
\[
u \in \ker A \iff u \in \ker A \text{ and } u \in \ker B^T \iff \begin{pmatrix} u \\ 0 \end{pmatrix} \in \ker X.
\]

For a sequence $y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$, note that its order $t-1$ moment matrix $M_{t-1}(y)$
is a principal submatrix of its order t moment matrix Mt (y). When equality:
rank Mt (y) = rank Mt−1 (y)
holds, following Curto and Fialkow [1996], one says that Mt (y) is a flat
extension of Mt−1 (y) and one may also say that y is a ‘flat’ sequence. As
we will see in this chapter this flatness condition plays a crucial role in the
polynomial optimization problem.
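
In floating-point computations the ranks in this flatness condition have to be evaluated with a tolerance, since the moment matrices typically come from a numerical SDP solver. The following small helper (ours, not from the notes) illustrates one common way to test the condition numerically, via a singular-value cutoff; the default threshold 1e-6 is an arbitrary choice.

    import numpy as np

    def is_flat(M_t, M_t_minus_1, tol=1e-6):
        """Numerically test rank M_t(y) == rank M_{t-1}(y) via singular values."""
        rank = lambda A: int(np.sum(np.linalg.svd(np.asarray(A), compute_uv=False) > tol))
        return rank(M_t) == rank(M_t_minus_1)
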
As an application of Lemma 13.4.1 we have the following result showing
298 Polynomial optimization

that the kernel of a (flat or positive semidefinite) truncated moment matrix
behaves like a ‘truncated ideal’.

Lemma 13.4.2  Given a sequence $y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$ and $t \ge 1$ consider its moment
matrices Mt (y) and Mt−1 (y), and let f, g ∈ R[x].

(i) Assume that rank Mt (y) = rank Mt−1 (y). Then we have

f ∈ ker Mt (y), deg(f g) ≤ t =⇒ f g ∈ ker Mt (y).

(ii) Assume that Mt (y)  0. Then we have

f ∈ ker Mt (y), deg(f g) ≤ t − 1 =⇒ f g ∈ ker Mt (y).

Proof Let Ly be the linear functional on R[x]2t corresponding to y. Observe


that it suffices to show the result when g has degree 1, say g = xi , since the
general result follows by iterating this special case. We first show (i). For
this write the matrix Mt (y) in block-form (13.16) with A = Mt−1 (y). Then,
in view of Lemma 13.4.1(i), it suffices to show that the polynomial f g = xi f
belongs to the kernel of the row submatrix (Mt−1 (y) B) of Mt (y) (since this
implies that f g ∈ ker Mt (y)). So, pick a polynomial h ∈ R[x]t−1 and let us
show that Ly (h(xi f )) = 0. But Ly (h(xi f )) = Ly ((hxi )f ) = 0 follows from
the fact that f ∈ ker Mt (y) and deg(hxi ) ≤ t (recall (13.15)).
Claim (ii) follows in an analogous manner using Lemma 13.4.1(ii).

We now state the main result of this section, known as the flat extension
theorem.

Theorem 13.4.3 (Curto and Fialkow [1996])  Given a sequence $y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$
and t ≥ 1, consider its moment matrices Mt (y) and Mt−1 (y). Assume that
the following flatness condition holds:

rank Mt (y) = rank Mt−1 (y) =: r. (13.17)

The following holds.

(i) One can extend y to a (unique) sequence $\tilde y \in \mathbb{R}^{\mathbb{N}^n}$ satisfying:

rank M (ỹ) = rank Mt (y). (13.18)

(ii) Let I denote the ideal in R[x] corresponding to the kernel of M (ỹ). If
{α1 , . . . , αr } ⊆ Nnt−1 indexes a maximum linearly independent set of
columns of the matrix Mt−1 (y), then the subset {[xα1 ], . . . , [xαr ]} of
R[x]/I is a basis of R[x]/I. Moreover, the ideal I is generated by the
polynomials in ker Mt (y).

Proof Since the proof of (i) is quite technical, we begin with the proof of
(ii) and will prove (i) thereafter. Let V denote the linear span of the set
$\{x^{\alpha_1}, \ldots, x^{\alpha_r}\}$, consisting of the polynomials $\sum_{i=1}^{r} \lambda_i x^{\alpha_i}$ ($\lambda_i \in \mathbb{R}$). As the
set {α1 , . . . , αr } indexes a maximum set of linearly independent columns


of Mt−1 (y), and rank M (ỹ) = rank Mt−1 (y), it also indexes a maximum
set of linearly independent columns of M (ỹ). This implies that the set
{[xα1 ], . . . , [xαr ]} is a basis of R[x]/I and that V ∩ ker M (ỹ) = {0}.
As rank M (ỹ) = rank Mt (y), the inclusion: ker Mt (y) ⊆ ker M (ỹ) fol-
lows using Lemma 13.4.1(ii). Hence, also the ideal generated by ker Mt (y) is
contained in the ideal ker M (ỹ):

(ker Mt (y)) ⊆ ker M (ỹ).

To show the reverse inclusion we first claim that the polynomial space can
be decomposed as

R[x] = V + (ker Mt (y)). (13.19)

For this, we show using induction on |α| that any monomial xα belongs
to V + (ker Mt (y)). If |α| ≤ t this follows from the definition of the αi ’s.
Assume now |α| ≥ t + 1 and αi ≥ 1 for some i ∈ [n]. By the induction
assumption, we know that xα−ei = p + q, where p ∈ V and q ∈ (ker Mt (y)).
Hence, xα = xi xα−ei = xi (p + q) = xi p + xi q ∈ V + (ker Mt (y)), because
xi p ∈ V + (ker Mt (y)) (since deg(xi p) ≤ t) and xi q ∈ (ker Mt (y)) (since
q ∈ (ker Mt (y))). This concludes the proof of (13.19).
We can now show the reverse inclusion

ker M (ỹ) ⊆ (ker Mt (y)).

Assume f ∈ ker M (ỹ). Using (13.19) write f = p + q, where p ∈ V and


q ∈ (ker Mt (y)) ⊆ ker M (ỹ). Then, f − q ∈ ker M (ỹ) ∩ V = {0}, thus
showing f = q ∈ (ker Mt (y)).

We now turn to the proof of (i), whose details are elementary but a bit
technical. (As a warm-up you may want to consider the univariate case
$n = 1$ in Exercise 13.5.) A first observation is that it suffices to construct
an extension $\tilde y \in \mathbb{R}^{\mathbb{N}^n_{2t+2}}$ of y, whose moment matrix is a flat extension of
the moment matrix of y, i.e., such that rank Mt+1 (ỹ) = rank Mt (y). Indeed,
after iterating this construction we then obtain an infinite sequence with the
desired property.
By assumption, Mt (y) is a flat extension of its principal submatrix Mt−1 (y),

and our objective is to construct a matrix M indexed by $\mathbb{N}^n_{t+1}$ of the form:
\[
M = \begin{pmatrix} M_t(y) & B \\ B^T & C \end{pmatrix}, \tag{13.20}
\]

with the following two properties:

• M is a flat extension of Mt (y), i.e., rank M = rank Mt (y),
• M is a moment matrix, i.e., M = Mt+1 (ỹ) for some $\tilde y \in \mathbb{R}^{\mathbb{N}^n_{2t+2}}$.

Recall {α1 , . . . , αr } ⊆ Nnt−1 indexes a maximum linearly independent set of


columns of Mt−1 (y) and V denotes the linear span of {xα1 , . . . , xαr }. We will
repeatedly make use of the following property:

for any α ∈ Nnt there exists a (unique) p ∈ V such that xα − p ∈ ker Mt (y),
(13.21)
which follows from the flatness condition (13.17).
The construction of a flat extension M as in (13.20) relies in a crucial way
on Lemma 13.4.2. Take γ ∈ Nn with |γ| = t + 1, assume γi ≥ 1 for some
i ∈ [n] and let p ∈ V such that xγ−ei − p ∈ ker Mt (y) (as in (13.21)). If M
is a flat extension of Mt (y) then it follows from Lemma 13.4.2(i) that the
polynomial xi (xγ−ei − p) = xγ − xi p must belong to the kernel of M . This
fact enables us to define the entries of the block B in terms of the entries of
Mt (y) and in turn the entries of the block C in terms of the entries of B T .
After that we only need to verify that these definitions are good, i.e., that
they do not depend on the choice of the index i for which γi ≥ 1, and that
the matrix M constructed in this way is indeed a moment matrix.
This is what we do through the next three claims. It will be convenient to
also use the symbol vec(f ) to denote the coefficient vector of a polynomial
f , especially when we want to discuss the coefficient vector of a combination
of polynomials.

Claim 13.4.4 Let γ ∈ Nn with |γ| = t + 1 and γi , γj ≥ 1 for distinct


i, j ∈ [n], and let p, p0 ∈ V such that xγ−ei − p, xγ−ej − p0 ∈ ker Mt (y). Then,
the polynomial xi p − xj p0 belongs to the kernel of Mt (y) (which implies B is
well defined) and also to the kernel of B T (which implies C is well defined).

Proof As rank Mt (y) = rank Mt−1 (y), in view of Lemma 13.4.1(i), in order
to show that xi p − xj p0 belongs to the kernel of Mt (y) it suffices to show
that Ly (h(xi p − xj p0 )) = 0 for all h ∈ R[x]t−1 . So, let h ∈ R[x]t−1 . Then we

have
Ly (h(xi p − xj p0 )) = Ly ((hxi )p) − Ly ((hxj )p0 )
= Ly ((hxi )(p − xγ−ei )) − Ly ((hxj )(p0 − xγ−ej ))
= 0.

The second equality follows from hxi xγ−ei = hxj xγ−ej (= hxγ , with degree
at most 2t) and the last equality follows since p−xγ−ei , p0 −xγ−ej ∈ ker Mt (y)
and deg(xi h), deg(xj h) ≤ t.
Next we show that xi p − xj p0 belongs to the kernel of B T , or, equivalently,
that vec(xδ )T B T vec(xi p − xj p0 ) = 0 for all |δ| = t + 1. Let |δ| = t + 1, let
k ∈ [n] such that δk ≥ 1 and let p00 ∈ V such that xδ−ek − p00 ∈ ker Mt (y). By
construction, the polynomial xδ − xk p00 belongs to the kernel of the matrix
(Mt (y) B). This gives:

Mt (y)vec(xk p00 ) = Bvec(xδ ),

which implies

vec(xi p − xj p0 )T Bvec(xδ ) = vec(xi p − xj p0 )T Mt (y)vec(xk p00 ) = 0,

since xi p − xj p0 lies in the kernel of Mt (y), as shown above.

Next we show that the constructed matrix M is indeed a moment matrix.


For this we use the characterization from Exercise 13.6. So we only need to
show the following two claims.

Claim 13.4.5 Mγ,δ = Mγ+ei ,δ−ei for all γ, δ ∈ Nnt+1 and i ∈ [n] such that
δi ≥ 1 and |γ| ≤ t.

Proof Let p, p0 ∈ V ⊆ R[x]t−1 such that xδ−ei − p and xγ − p0 belong to


ker Mt (y) ⊆ ker M . By construction, the polynomials xδ −xi p and xγ xi −xi p0
belong to ker M . Using these facts we obtain:

Mγ,δ = vec(xγ )T M vec(xδ ) = vec(p0 )T M vec(xi p) = vec(p0 )T Mt (y)vec(xi p)


= Ly (p0 (xi p)) = Ly ((p0 xi )p) = vec(p0 xi )T Mt (y)vec(p)
= vec(p0 xi )T M vec(p) = vec(xγ xi )T M vec(xδ−ei ) = Mγ+ei ,δ−ei .

Claim 13.4.6 Mγ,δ = Mγ−ej +ei ,δ+ej −ei for all γ, δ ∈ Nn and i, j ∈ [n]
such that γj ≥ 1, δi ≥ 1 and |γ| = |δ| = t + 1.

Proof Let p, p0 ∈ V ⊆ R[x]t−1 such that xγ−ej − p and xδ−ei − p0 be-


long to ker Mt (y) ⊆ ker M. By construction, for k ∈ {i, j}, the polynomials

xk (xγ−ej −p) and xk (xδ−ei −p0 ) belong to ker M . Using these facts we obtain:
Mγ,δ
= vec(xγ )T M vec(xδ ) = vec(xj p)T M vec(xi p0 ) = vec(xj p)T Mt (y)vec(xi p0 )
= Ly ((xj p)(xi p0 )) = Ly ((xi p)(xj p0 )) = vec(xi p)T Mt (y)vec(xj p0 )
= vec(xi p)T M vec(xj p0 ) = vec(xi xγ−ej )T M vec(xj xδ−ei )
= Mγ+ei −ej ,δ+ej −ei .

In view of the characterization in Exercise 13.6, Claims 13.4.5 and 13.4.6


show that M is a moment matrix. Combining with Claim 13.4.4 we have
shown that M is a moment matrix which is a flat extension of Mt (y), and
this concludes the proof of the theorem.

13.5 Finite convergence and global minimizers


As mentioned earlier, each parameter pmom,t provides a lower bound on pmin ,
the minimum value taken by p on the set K. Here, we present a sufficient
condition which, if satisfied, permits to conclude that the moment bound is
in fact exact, i.e., that equality pmom,t = pmin holds. This condition depends on
whether an optimal solution L to the moment program (13.4) satisfies a
suitable flatness relation. In addition, one can then find global minimizers of
p in K by computing the common roots of the polynomials in the kernel of
a moment matrix associated to L, and this can be done using the eigenvalue
method presented in Section 12.1.3.
Recall the definition of the parameter dK from (13.6) and define the set
Kp∗ = {x ∈ K : p(x) = pmin },
which consists of the global minimizers of the polynomial p over the set K.
So Kp∗ ≠ ∅ when K is compact.
Theorem 13.5.1  Assume L is an optimal solution to the program (13.4)
and let $y = (L(x^\alpha))_\alpha \in \mathbb{R}^{\mathbb{N}^n_{2t}}$ be the corresponding sequence. Assume also that
y satisfies the condition:
∃s ∈ N such that max{dp , dK } ≤ s ≤ t and rank Ms (y) = rank Ms−dK (y).
(13.22)
Then the following properties hold:
(i) The relaxation (13.4) is exact, i.e., we have equality: pmom,t = pmin .
(ii) The common roots to the polynomials in ker Ms (y) are all real and
global minimizers, i.e., we have the inclusion: VC (ker Ms (y)) ⊆ Kp∗ .

(iii) If L is an optimal solution of (13.4) for which the matrix Mt (y) has
maximum possible rank, then VC (ker Ms (y)) = Kp∗ .
Proof By assumption, y satisfies the condition (13.22) and thus the flatness
condition:
rank Ms (y) = rank Ms−1 (y).
Hence we can apply Theorem 13.4.3 and conclude that there exists a se-
quence $\tilde y \in \mathbb{R}^{\mathbb{N}^n}$ which extends the subsequence (yα )|α|≤2s of y and satisfies
rank M (ỹ) = rank Ms (y) =: r.
Thus, ỹα = yα if |α| ≤ 2s, but it could be that ỹ and y differ at entries
indexed by monomials of degree higher than 2s, these entries of y will be
irrelevant in the rest of the proof. Let I be the ideal corresponding to the
kernel of M (ỹ). By Theorem 13.4.3, I is generated by ker Ms (y) and thus
VC (I) = VC (ker Ms (y)).
As M (ỹ) is positive semidefinite with finite rank r, we can apply Theo-
rem 12.2.14 (and its proof) to the sequence $\tilde y \in \mathbb{R}^{\mathbb{N}^n}$ and deduce that
\[
V_{\mathbb{C}}(I) = \{v_1, \ldots, v_r\} \subseteq \mathbb{R}^n
\]
and
\[
\tilde y = \sum_{i=1}^{r} \lambda_i [v_i]_\infty \quad \text{where } \lambda_i > 0 \text{ and } \sum_{i=1}^{r} \lambda_i = 1.
\]
Taking the projection of $\tilde y$ onto the subspace $\mathbb{R}^{\mathbb{N}^n_{2s}}$, we obtain:
\[
(y_\alpha)_{\alpha \in \mathbb{N}^n_{2s}} = \sum_{i=1}^{r} \lambda_i [v_i]_{2s} \quad \text{where } \lambda_i > 0 \text{ and } \sum_{i=1}^{r} \lambda_i = 1.
\]
In other words, the restriction of the linear map L to the subspace R[x]2s is
the convex combination $\sum_{i=1}^{r} \lambda_i \mathrm{Ev}_{v_i}$ of evaluations at the points of VC (I):
\[
L(f) = \sum_{i=1}^{r} \lambda_i f(v_i) \quad \text{for all } f \in \mathbb{R}[x]_{2s}. \tag{13.23}
\]

Moreover, if {α1 , . . . , αr } ⊆ Nns−dK indexes a maximum linearly independent


set of columns of Ms−dK (y), then the set B = {[xα1 ], . . . , [xαr ]} is a linear
basis of R[x]/I (by Theorem 13.4.3).
Next we claim that we can choose interpolation polynomials pvi at the
points vi ∈ VC (I) with deg(pvi ) ≤ s − dK . Indeed, if pvi are arbitrary inter-
polation polynomials then, using the basis B, write pvi = fi +gi where gi ∈ I

and fi lies in the linear span of the monomials xα1 , . . . , xαr . Thus the fi ’s
are again interpolation polynomials but now with degree at most s − dK .
Next we claim that
v1 , . . . , vr ∈ K.
To see this, we use the fact that L ≥ 0 on (gj Σ) ∩ R[x]2t for all j ∈ [m]. As
deg(pvi ) ≤ s − dK , we have: deg(gj p2vi ) ≤ deg(gj ) + 2(s − dK ) ≤ 2s. Hence we
can compute L(gj p2vi ) using (13.23) and obtain that L(gj p2vi ) = gj (vi )λi ≥ 0.
As λi > 0 this gives gj (vi ) ≥ 0 for all j and thus vi ∈ K.
We can now proceed to show the claims (i)-(iii). By assumption, L is an
optimal solution of (13.4) and thus pmom,t = L(p). As deg(p) ≤ 2s, we can
evaluate L(p) using (13.23). We obtain:
\[
p_{\mathrm{mom},t} = L(p) = \sum_{i=1}^{r} \lambda_i p(v_i) \ \ge\ p_{\min},
\]
since $\sum_i \lambda_i = 1$ and, for any $i \in [r]$, $\lambda_i > 0$ and $p(v_i) \ge p_{\min}$ as $v_i \in K$.
As the reverse inequality $p_{\mathrm{mom},t} \le p_{\min}$ always holds, we have equality
$p_{\mathrm{mom},t} = p_{\min}$ and (i) holds. In turn, the equality $\sum_i \lambda_i p(v_i) = p_{\min}$ implies
termwise equality: p(vi ) = pmin for all i. This shows that each vi is a global
minimizer of p in K, i.e., {v1 , . . . , vr } ⊆ Kp∗ , showing (ii).
Assume now that the optimal solution L of (13.4) is chosen in such a
way that rank Mt (y) is maximum among all optimal solutions of (13.4). In
view of the results in Chapter 8.3, this means that y lies in the relative
interior of the face of the feasible region of (13.4) consisting of all opti-
mal solutions. Therefore, for any other optimal solution y 0 , we have that
ker Mt (y) ⊆ ker Mt (y 0 ). Consider a global minimizer v ∈ Kp∗ of p in K
and the corresponding optimal solution y 0 = [v]2t of (13.4). The inclusion
ker Mt (y) ⊆ ker Mt (y 0 ) implies that any polynomial in ker Mt (y) vanishes at
the point v. Therefore, we obtain: ker Ms (y) ⊆ ker Mt (y) ⊆ I(Kp∗ ), which
implies
I = (ker Ms (y)) ⊆ I(Kp∗ ).
In turn, this implies the inclusions:
Kp∗ ⊆ VC (I(Kp∗ )) ⊆ VC (I) = {v1 , . . . , vr }.
Thus (iii) holds and the proof is complete.
Note that Theorem 13.5.1 needs some assumptions. First, the moment
program (13.4) should have an optimal solution L. This is the case, for

instance, if the feasible region of (13.4) is bounded (which happens, e.g.,


when an explicit ball constraint is present in the description of K, recall
Lemma 13.3.2 (i)), or when the sos program (13.3) is strictly feasible. Sec-
ond, this optimal solution L should satisfy the flatness condition (13.22)
for some s such that max(dp , dK ) ≤ s ≤ t. If so, one can then conclude
that pmom,t = pmin , which means finite convergence of the moment hierar-
chy at the current order t. Another consequence is that the set Kp∗ is finite
(with cardinality equal to the rank of the associated moment matrix Ms (y)).
Hence, the flatness condition cannot hold when the set Kp∗ of global mini-
mizers is infinite. On the other hand, as we see in the next example, having
finitely many global minimizers is not a sufficient condition for having finite
convergence.

Example 13.5.2  Consider the problem of minimizing a polynomial p over
the unit ball $K = \{x \in \mathbb{R}^n : 1 - \sum_{i=1}^n x_i^2 \ge 0\}$. Assume that p is homogeneous,
strictly positive on $\mathbb{R}^n \setminus \{0\}$, and p is not a sum of squares of polynomials.
Then $p_{\min} = 0$ is attained only at the origin, but $p_{\mathrm{sos},t} = p_{\mathrm{mom},t} < p_{\min}$
for all $t \ge \deg(p)/2$. Indeed, using Lemmas 13.3.2 and 13.3.3, we can con-
clude that strong duality holds: $p_{\mathrm{sos},t} = p_{\mathrm{mom},t}$ and, in addition, that the sos
relaxation attains its optimum. Now, assume for contradiction that
$p_{\mathrm{sos},t} = p_{\min} = 0$. Then p belongs to the quadratic module $\mathcal{M}(1 - \sum_{i=1}^n x_i^2)$.
As p is homogeneous this implies that p itself must be a sum of squares of
polynomials (use Exercise 13.2), and we reach a contradiction.
To get an example of such a polynomial, consider the homogenized Motzkin
polynomial $P \in \mathbb{R}[x, y, z]$ from relation (11.13), which is nonnegative over
$\mathbb{R}^3$ but not a sum of squares of polynomials. Then for $\varepsilon > 0$ the perturbed
polynomial $P_\varepsilon = P + \varepsilon(x^6 + y^6 + z^6)$ is homogeneous, strictly positive on
$\mathbb{R}^3 \setminus \{0\}$, and $P_\varepsilon$ is not a sum of squares of polynomials for $\varepsilon$ small enough.
(To see the latter, use the fact that the cone $\Sigma_6$ is closed; see Exercise 11.13.)

On the other hand, surprisingly, the flatness condition often holds in prac-
tice. In fact, it has been shown by Nie [2014] that the flatness condition holds
generically (which, roughly speaking, means that the polynomials defining
the polynomial optimization problem have generic coefficients when fixing
their degrees).
Another question raised by Theorem 13.5.1 is how to find an optimal so-
lution whose moment matrix has maximum possible rank. It is in fact a
property of most interior-point algorithms that, when solving a semidefinite
program, they return an optimal solution lying in the relative interior of the
optimal face of the feasible region, and thus an optimal solution with maxi-

mum rank. See the monographs by de Klerk [2002, Chap. 4] and Wolkowicz
et al. [2000, Chap. 5] for details.
We now turn to the question of how to find the global minimizers of p in
K under the assumptions of Theorem 13.5.1.
As Theorem 13.5.1 shows, if L is an optimal solution of the moment pro-
gram (13.4) which satisfies the flatness condition (13.22), then any common
root to the polynomials in ker Ms (L) is a global minimizer of p in K. More-
over, if the rank of the matrix Mt (L) is largest possible (among all optimal
solutions of (13.4)) then all global minimizers are found in this way. Hence,
we only need to compute the variety VC (ker Ms (y)). For this we can apply
the eigenvalue method described in Section 12.1.3. Indeed, as we see below,
all the information that we need in order to be able to apply this method is
contained in the moment matrix Ms (y).
Define the ideal I = (ker Ms (y)). Then, I is a real radical ideal. Indeed,
by Theorem 13.5.1, $I = \ker M(\tilde y)$, where $\tilde y$ is the extension to $\mathbb{R}^{\mathbb{N}^n}$ of the
sequence (yα )|α|≤2s and, by Lemma 12.2.7, I is real radical since $M(\tilde y) \succeq 0$.
We saw in Theorem 12.1.14 how to compute the variety VC (I) via the
eigenvalues/eigenvectors of multiplication matrices. One needs the assump-
tion that VC (I) is finite, which is the case here. In fact we know that
|VC (I)| = r, the rank of the associated moment matrix Ms (y), and that
VC (I) ⊆ Rn . Then, what we need in order to compute VC (I) is an explicit
basis B of the quotient space R[x]/I and the matrix Mh in this basis B of
some multiplication (‘by h’) operator acting on R[x]/I.
Finding a basis B of R[x]/I is easy: if a set {α1 , . . . , αr } ⊆ Nns−dK indexes
a maximum linearly independent set of columns of the matrix Ms−1 (y), then
the set B = {[xα1 ], . . . , [xαr ]} of corresponding cosets in R[x]/I is a basis of
R[x]/I.
Finally, we indicate how to construct the explicit matrix Mh representing
the ‘multiplication by h’ operator in the basis B. For any variable xk , recall
that the ‘multiplication by xk ’ is the linear map:
[f ] ∈ R[x]/I 7→ [xk f ] ∈ R[x]/I.
Hence, in order to build its matrix Mxk in the basis B, we need to know
how to express each coset [xk xαj ] in the basis B = {[xα1 ], . . . , [xαr ]}. This
can easily be done using the moment matrix Ms (y). Indeed, for any index
j ∈ [r], we have: deg(xk xαj ) = 1 + |αj | ≤ 1 + s − dK ≤ s. Hence we can
express the column of Ms (y) indexed by xk xαj as a linear combination of
the columns indexed by the monomials xα1 , . . . , xαr , which directly gives
the j-th column of the matrix Mxk . Once we know the matrix Mxk for each

k ∈ [n] we can derive the multiplication matrix $M_h$ for any polynomial
$h = \sum_\alpha h_\alpha x^\alpha$; namely, we have $M_h = h(M_{x_1}, \ldots, M_{x_n}) = \sum_\alpha h_\alpha M_{x_1}^{\alpha_1} \cdots M_{x_n}^{\alpha_n}$.
As explained in Section 12.1.3 a good strategy is to take h to be a random
linear polynomial.
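
The following is a schematic sketch of this extraction procedure (ours, not part of the notes). It assumes that a flat optimal moment sequence y, given as a dictionary mapping exponent tuples with |α| ≤ 2s to floats, has already been obtained from an SDP solver; numpy/scipy are used for the linear algebra, the pivoted QR is a simple numerical proxy for choosing the basis monomials, and the tolerance is an arbitrary choice.

    import itertools
    import numpy as np
    from scipy.linalg import qr

    def monomials(n, deg):
        return [a for a in itertools.product(range(deg + 1), repeat=n) if sum(a) <= deg]

    def madd(a, b):
        return tuple(x + z for x, z in zip(a, b))

    def extract_minimizers(y, n, s, tol=1e-6):
        mons = monomials(n, s)
        M = np.array([[y[madd(a, b)] for b in mons] for a in mons])      # M_s(y)
        # candidate basis monomials of degree <= s-1, as in the text
        low = [i for i, a in enumerate(mons) if sum(a) <= s - 1]
        _, R, piv = qr(M[:, low], pivoting=True)
        r = int(np.sum(np.abs(np.diag(R)) > tol * max(1.0, abs(R[0, 0]))))
        basis = [mons[low[j]] for j in piv[:r]]                          # x^{alpha_1},...,x^{alpha_r}
        B = M[:, [low[j] for j in piv[:r]]]
        # multiplication matrices: column j of M_{x_k} expresses [x_k x^{alpha_j}] in the basis
        mult = []
        for k in range(n):
            ek = tuple(1 if i == k else 0 for i in range(n))
            cols = np.column_stack([M[:, mons.index(madd(a, ek))] for a in basis])
            Mk, *_ = np.linalg.lstsq(B, cols, rcond=None)
            mult.append(Mk)
        # common (left) eigenvectors of the M_{x_k}: use a random combination h
        H = sum(c * Mk for c, Mk in zip(np.random.rand(n), mult))
        _, W = np.linalg.eig(H.T)
        points = []
        for j in range(W.shape[1]):
            w = np.real(W[:, j])
            points.append([float(w @ (Mk.T @ w)) / float(w @ w) for Mk in mult])
        return points

Here the coordinates of each candidate minimizer are read off as (generalized) Rayleigh quotients of the transposed multiplication matrices on the common left eigenvectors, which is the matrix form of the eigenvalue method of Section 12.1.3.
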

13.6 Real solutions of polynomial equations


Here we consider the problem of computing the real roots of a system of
polynomial equations:

h1 (x) = 0, . . . , hm (x) = 0

where h1 , . . . , hm ∈ R[x]. In other words, with I denoting the ideal generated


by the hj ’s, this is the problem of computing the real variety VR (I) of I. We
address this question in the case when VR (I) is finite.
Of course, if the complex variety VC (I) of I is finite, then we can just
apply the eigenvalue method presented in Chapter 14 to compute VC (I) and
thus VR (I) = VC (I) ∩ Rn . However, it can be that VR (I) is finite while VC (I)
is infinite. As a simple such example, consider the ideal generated by the
polynomial x21 +x22 in two variables, to which we come back in Example 13.6.2
below. In that case we cannot apply directly the eigenvalue method. However
we can apply it indirectly. Indeed, we can view the problem of computing
VR (I) as an instance of polynomial optimization problem to which we can
then apply the results of the preceding section. Namely, consider the problem
of minimizing the constant polynomial p = 0 over the set

K = {x ∈ Rn : h1 (x) = 0, . . . , hm (x) = 0}.

Then K = VR (I) coincides with the set of global minimizers of p = 0 in K.


As before, we consider the moment relaxations (13.4). Now, any feasible
solution L is an optimal solution of (13.4). Hence, by Theorem 13.5.1, if
the rank condition (13.22) holds, then we can compute all points in VR (I).
We now show that it is indeed the case that, for t large enough, the rank
condition (13.22) will be satisfied.

Theorem 13.6.1 Let h1 , . . . , hm ∈ R[x] be polynomials having finitely


many common real roots. Set $d_K = \max_j \lceil \deg(h_j)/2 \rceil$. For $t \ge d_K$, let $\mathcal{F}_t$
denote the set of sequences $y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$ whose corresponding linear functional
Ly ∈ (R[x]2t )∗ satisfies the conditions:

Ly (1) = 1, Ly ≥ 0 on Σ2t ,
(13.24)
Ly (uhj ) = 0 for all j ∈ [m] and u ∈ R[x] with deg(uhj ) ≤ 2t.

Then there exist integers t0 and s such that dK ≤ s ≤ t0 and the following
rank condition holds:

rankMs (y) = rankMs−dK (y) for all y ∈ Ft and t ≥ t0 . (13.25)



Moreover, $\sqrt[\mathbb{R}]{I} = (\ker M_s(y))$ for any y ∈ Ft whose moment matrix Mt (y)
has maximum possible rank.

Proof The goal is to show that if we choose t large enough, then the kernel
of Mt (y) contains sufficiently many polynomials permitting to show that
the rank condition (13.25) holds. Here y is an arbitrary feasible solution in
Ft and Ly is its corresponding linear functional on R[x]2t . We assume that
t ≥ maxj deg(hj ). Then we have

hj ∈ ker Mt (y) for all j ∈ [m] (13.26)

(by (13.24), since L(h2j ) = 0 as deg(h2j ) ≤ 2t).



Now we choose a ‘nice’ set of polynomials $\{f_1, \ldots, f_L\}$ generating $\sqrt[\mathbb{R}]{I}$, the
real radical ideal of the ideal I; namely, a set for which we can claim the
following degree bounds:

Any $f \in \sqrt[\mathbb{R}]{I}$ can be written as $f = \sum_{l=1}^{L} u_l f_l$   (13.27)
for some $u_l \in \mathbb{R}[x]$ with $\deg(u_l f_l) \le \deg(f)$.

(That such a nice set of generators exists follows from the theory of Gröbner
bases.) Next we claim:

There exists t1 ∈ N for which f1 , . . . , fL ∈ ker Mt (y) for all t ≥ t1 .


(13.28)
Fix l ∈ [L]. Applying the Real Nullstellensatz, we know that there exist
polynomials pl,i and ul,j and an integer N (which, for convenience, we can
choose to be a power of 2) satisfying the following identity:
\[
f_l^N + \sum_i p_{l,i}^2 = \sum_{j=1}^{m} u_{l,j} h_j.
\]

If t is large enough, then L vanishes at each ul,j hj (in view of (13.24)).


Hence L vanishes at the polynomial $f_l^N + \sum_i p_{l,i}^2$. As L is nonnegative on
$\Sigma_{2t}$, we deduce that $L(f_l^N) = 0$. Now an easy induction permits us to show


that L(fl2 ) = 0 (this is where choosing N a power of 2 was helpful) and thus
fl ∈ ker Mt (y).
By assumption, the set $V_{\mathbb{R}}(I)$ is finite. Therefore, the quotient space
$\mathbb{R}[x]/\sqrt[\mathbb{R}]{I}$ has finite dimension (Theorem 14.1.5). Let $\mathcal{B} = \{b_1, \ldots, b_r\}$ be a
set of polynomials whose cosets form a basis of the quotient space $\mathbb{R}[x]/\sqrt[\mathbb{R}]{I}$.
Let d0 denote the maximum degree of the polynomials in B and set

t2 = max{t1 , d0 + dK }.

Pick any monomial $x^\alpha$ of degree at most $t_2$. We can write:
\[
x^\alpha = p^{(\alpha)} + q^{(\alpha)}, \quad \text{with } q^{(\alpha)} = \sum_{l=1}^{L} u_l^{(\alpha)} f_l, \tag{13.29}
\]
where $p^{(\alpha)}$ lies in the span of $\mathcal{B}$. Hence $p^{(\alpha)}$ has degree at most $d_0$ and
thus each term $u_l^{(\alpha)} f_l$ has degree at most $\max\{|\alpha|, d_0\} \le t_2$. Here we have
used the fact that $\{[b_1], \ldots, [b_r]\}$ is a basis of $\mathbb{R}[x]/\sqrt[\mathbb{R}]{I}$, combined with the
property (13.27) of the generators $f_l$ of $\sqrt[\mathbb{R}]{I}$.
We can now conclude the proof. Namely, we show that, if t ≥ t0 :=
t2 + 1, then the rank condition (13.25) holds with s = t2 . For this pick a
monomial $x^\alpha$ of degree at most $t_2$ and consider the decomposition in (13.29).
As $\deg(u_l^{(\alpha)} f_l) \le t_2 \le t - 1$ and $f_l \in \ker M_t(y)$ (by (13.28)), we obtain that
$u_l^{(\alpha)} f_l \in \ker M_t(y)$ (by Lemma 13.4.2(ii)). Therefore, the polynomial $x^\alpha - p^{(\alpha)}$
belongs to the kernel of $M_t(y)$. As the degree of $p^{(\alpha)}$ is at most $d_0 \le t_2 - d_K$,
we can conclude that $\operatorname{rank} M_{t_2}(y) = \operatorname{rank} M_{t_2 - d_K}(y)$.
Finally, the equality $\sqrt[\mathbb{R}]{I} = (\ker M_{t_2}(y))$ follows from Theorem 13.5.1(iii).

Example 13.6.2  Let I be the ideal generated by the polynomial $x_1^2 + x_2^2$.
Clearly, $V_{\mathbb{R}}(I) = \{(0, 0)\}$ and $\sqrt[\mathbb{R}]{I} = (x_1, x_2)$ is generated by the monomials
x1 and x2 . As we now see this can also be found by applying the above result.
Indeed, let Ly be a feasible solution in the set Ft defined by (13.24) for
t = 1. Then we have Ly (x21 ), Ly (x22 ) ≥ 0 and Ly (x21 + x22 ) = 0. This implies:
Ly (x21 ) = Ly (x22 ) = 0 and thus Ly (x1 ) = Ly (x2 ) = Ly (x1 x2 ) = 0. Hence the
moment matrix $M_1(y)$ has the form:
\[
M_1(y) = \begin{pmatrix} 1 & y_{10} & y_{01} \\ y_{10} & y_{20} & y_{11} \\ y_{01} & y_{11} & y_{02} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
with rows and columns indexed by the monomials $1, x_1, x_2$.

Therefore, rank M1 (y) = rank M0 (y), the polynomials $x_1, x_2$ belong to the kernel of $M_1(y)$,
and we find that $\ker M_1(y)$ generates $\sqrt[\mathbb{R}]{I}$.
As an exercise, check what happens when I is the ideal generated by the
polynomial (x21 + x2 )2 . When does the rank condition holds?

13.7 Binary polynomial optimization


In this section we consider binary polynomial optimization, that is, the case
of optimizing a polynomial over a semialgebraic set K which is a subset
of the Boolean hypercube: K ⊆ {0, 1}n (or K ⊆ {±1}n ). This case is of
particular importance since it permits to model a wealth of discrete opti-
mization problems. It contains for instance the maximum stable set problem
and the maximum cut problems, which were discussed in detail in previous
chapters. We indicate how the moment/sos method applies to these cases
and to binary polynomial optimization, as well as how to exploit equations
in general. General references include Lasserre [2001] and Laurent [2003].

13.7.1 Revisiting Lasserre’s hierarchy for the stable set problem


Given a graph G = (V = [n], E) consider again the problem of computing
the maximum stability number α(G), which can be expressed via the binary
polynomial optimization problem:
\[
\alpha(G) = \max\Big\{ p(x) = \sum_{i \in V} x_i \;:\; x_i x_j = 0 \text{ for } \{i,j\} \in E,\ x_i^2 - x_i = 0 \text{ for } i \in V \Big\}. \tag{13.30}
\]
We saw in Section 4.9 the bounds last (G), which converge to α(G) in finitely
many steps (in t = α(G) steps). Recall their definition2 from relation (4.37):
\[
\mathrm{las}_t(G) = \max\Big\{ \sum_{i \in V} z_i \;:\; M^{01}_t(z) \succeq 0,\ z_\emptyset = 1,\ z_{ij} = 0 \text{ for } \{i,j\} \in E \Big\}. \tag{13.31}
\]
Here the variable z is indexed by P2t (n), the collection of all subsets of V
with cardinality at most 2t, and the matrix3 Mt01 (z) is indexed by Pt (n),
with entries
(Mt01 (z))I,J = zI∪J for I, J ∈ Pt (n). (13.32)

We will now explain the link between these two types of moment matrices.
In particular, as we will see, if we apply the moment hierarchy introduced in
this chapter to problem (13.30) then the resulting parameters pmom,t coincide
with the bounds last (G).
Let I 01 = (x2i − xi : i ∈ [n]) denote the ideal generated by the polynomials
2 Recall from Lemma 4.9.10 that we could equivalently require that zI = 0 for any I ∈ P2t (n)
that contains an edge of G.
3 In order to distinguish with the notion of moment matrix Mt (y) considered in this chapter,
we use here the notation Mt01 (z) to denote the moment matrix introduced in Definition 4.9.1;
the superscript ‘01’ refers to the fact that we work with binary variables.

x2i − xi for i ∈ [n] and, for an integer t ∈ N, let


n
nX o
It01 = ui (x2i − xi ) : u1 , . . . , un ∈ R[x]t−2
i=1

denote its truncation at degree t. Clearly an equation x2i − xi = 0 can be


seen as the two opposite inequalities x2i − xi ≥ 0 and xi − x2i ≥ 0; note that
the truncated quadratic module M(±(x2i − xi ) : i ∈ [n])2t coincides with the
01 .
truncated ideal I2t
n
In the definition of the parameter pmom,t , the variable y ∈ RN2t and its
corresponding linear functional Ly ∈ (R[x]2t )∗ must satisfy the constraints
y0 = 1, Mt (y)  0, Ly (uxi xj ) = 0 for all {i, j} ∈ E and u ∈ R[x]2t−2 , and
01
Ly = 0 on I2t , i.e., Ly (u(x2i − xi )) = 0 for i ∈ [n] and u ∈ R[x]2t−2
(13.33)
and the parameter pmom,t asks to maximize $\sum_{i=1}^{n} L_y(x_i) = \sum_{i=1}^{n} y_i$. The
above condition (13.33) allows us to derive that many entries of y must be
equal and thus to eliminate variables.
We use the following notation: for α ∈ Nn we let ᾱ ∈ {0, 1}n denote the
binary sequence with the same support, i.e., ᾱi = min(αi , 1) for i ∈ [n].

Lemma 13.7.1  Assume $y \in \mathbb{R}^{\mathbb{N}^n_{2t}}$ satisfies (13.33). Then yα = yᾱ for all
α ∈ Nn2t . Moreover, if t ≥ n + 1 then Mt (y) is a flat extension of Mn (y),
i.e., rank Mt (y) = rank Mn (y).

Proof The first claim follows from the fact that xα −xᾱ ∈ I2t for all α ∈ Nn2t
(check it). This in turn permits to show that for any α ∈ Nnt the αth column
of Mt (y) coincides with its ᾱth column. Indeed, given β ∈ Nnt , we have
Ly (xβ xα ) = Ly (xβ xᾱ ), since yβ+α = yβ+ᾱ as the two sequences β + α and
β + ᾱ have the same support.

As a first application we can conclude (using Theorem 13.5.1) that finite


convergence holds: pmom,n+1 = α(G).
A second application is that we can now link the two types of moment
matrices. Here it is convenient to identify a set I ∈ P2t (n) with a binary
n
sequence in Nn2t (namely the characteristic vector of I). Given y ∈ RN2t let
us denote by z ∈ RP2t (n) its subsequence indexed by the binary sequences in
Nn2t . Then it follows from Lemma 13.7.1 that if y satisfies (13.33) then Mt01 (z)
coincides with the submatrix of Mt (y) indexed by Pt (n) and, moreover,
Mt (y) is a flat extension of Mt01 (z).
As zij = Ly (xi xj ) = 0 for all edges {i, j} ∈ E we can conclude that the two
parameters last (G) and pmom,t coincide: last (G) = pmom,t . This gives again

the finite convergence of the moment bounds and in fact a sharper result:
pmom,t = α(G) for t ≥ α(G). In addition, this offers an alternative argument
for finite convergence (recall Section 4.9.1), which is easier than relying on
the flat extension theorem. The key here is exploiting the equations x2i = xi
in order to derive a simpler structure for the moment matrices, with fewer
variables and smaller size.
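
For illustration, here is a minimal sketch (ours, not from the notes) of the order t = 1 bound (13.31) for the 5-cycle, written as an explicit SDP in cvxpy; the matrix variable below plays the role of $M^{01}_1(z)$, indexed by ∅, {1}, ..., {n}, and an SDP-capable solver such as SCS is assumed to be installed.

    import cvxpy as cp

    n = 5
    edges = [(i, (i + 1) % n) for i in range(n)]               # the 5-cycle C5
    M = cp.Variable((n + 1, n + 1), symmetric=True)            # rows/cols: emptyset, {1}, ..., {n}
    constraints = [M >> 0, M[0, 0] == 1]                       # z_emptyset = 1
    constraints += [M[i + 1, i + 1] == M[0, i + 1] for i in range(n)]   # z_{{i} u {i}} = z_{{i}}
    constraints += [M[i + 1, j + 1] == 0 for (i, j) in edges]           # z_{ij} = 0 on edges
    prob = cp.Problem(cp.Maximize(cp.sum(M[0, 1:])), constraints)
    prob.solve()
    print(prob.value)   # about 2.236 for C5 (the theta number), an upper bound on alpha(C5) = 2
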
The above applies to any binary polynomial optimization problem:

pmin = min{p(x) : gj (x) ≥ 0 for j ∈ [m], x2i − xi = 0 for i ∈ [n]}. (13.34)

We may assume without loss of generality that p, gj are multilinear poly-


nomials (i.e., use only monomials $x^\alpha$ with $\alpha \in \{0,1\}^n$). Hence we can
write $p = \sum_{I \in \mathcal{P}_d(n)} p_I x^I$ if p has degree d (and analogously for gj ) with
$x^I = \prod_{i \in I} x_i$. Then we have:
\[
p_{\mathrm{mom},t} = \inf\{ L_y(p) : L_y(1) = 1,\ L_y \ge 0 \text{ on } \mathcal{M}(g)_{2t},\ L_y = 0 \text{ on } I^{01}_{2t} \}
= \inf\Big\{ \sum_{I \in \mathcal{P}_d(n)} p_I z_I \;:\; z_\emptyset = 1,\ M^{01}_t(z) \succeq 0,\ M^{01}_{t-d_j}(g_j z) \succeq 0 \text{ for } j \in [m] \Big\}. \tag{13.35}
\]
Here, for a polynomial $g = \sum_H g_H x^H$, we adapt the definition of the vector
gz in the obvious way: gz is indexed by subsets of [n] (of suitable size), with
Ith entry $(gz)_I = \sum_H g_H z_{I \cup H}$. As above we may apply Theorem 13.5.1 to
conclude:
pmom,t = pmin for t ≥ n + dK .

Also the sos bound admits a more economical reformulation:
\[
p_{\mathrm{sos},t} = \sup\Big\{ \lambda \;:\; p - \lambda = \sum_{j=0}^{m} \sigma_j g_j + q \ \text{ where } q \in I^{01}_{2t},\ \sigma_j \in \Sigma_{2(t-d_j)} \Big\},
\]

where each σj can be assumed to be a sum of squares of multilinear polyno-


mials; hence the semidefinite matrix expressing σj is indexed by the smaller
set Pt (n) instead of Nnt .

13.7.2 Lasserre’s hierarchy for the maximum cut problem


The above treatment applies to polynomial optimization over a semialge-
braic set K ⊆ {±1}n , with the following straightforward modifications. We
now use the equations x2i − 1 = 0 for i ∈ [n] to define the set K and de-
note the ideal they generate by I ±1 and its truncations by It±1 . Note that
xI xJ = xI∆J for any sets I, J ⊆ [n] when x ∈ {±1}. This justifies defining

the matrix Mt±1 (z), indexed by Pt (n) and with entries


(Mt±1 (z))I,J = zI∆J for I, J ∈ Pt (n). (13.36)
Given multilinear polynomials p (with degree d) and g = {g1 , . . . , gm }, the
moment relaxation of order t ≥ max{dp , dK } for the problem
pmin = min{p(x) : gj (x) ≥ 0 for j ∈ [m], x2i − 1 = 0 for i ∈ [n]} (13.37)
can be expressed as
\[
p_{\mathrm{mom},t} = \inf\{ L_y(p) : L_y(1) = 1,\ L_y \ge 0 \text{ on } \mathcal{M}(g)_{2t},\ L_y = 0 \text{ on } I^{\pm 1}_{2t} \}
= \inf\Big\{ \sum_{I \in \mathcal{P}_d(n)} p_I z_I \;:\; z_\emptyset = 1,\ M^{\pm 1}_t(z) \succeq 0,\ M^{\pm 1}_{t-d_j}(g_j z) \succeq 0 \text{ for } j \in [m] \Big\} \tag{13.38}
\]
and the sos relaxation as
\[
p_{\mathrm{sos},t} = \sup\Big\{ \lambda \;:\; p - \lambda = \sum_{j=0}^{m} \sigma_j g_j + q \ \text{ where } q \in I^{\pm 1}_{2t},\ \sigma_j \in \Sigma_{2(t-d_j)} \Big\},
\]

where each σj can be assumed to be a sum of squares of multilinear poly-


nomials.
This applies in particular to the maximum cut problem, which asks to
maximize the quadratic polynomial $p(x) = \frac{1}{2} \sum_{\{i,j\} \in E} w_{ij} (1 - x_i x_j)$ over
$\{\pm 1\}^n$. Then the Lasserre relaxation of order t = 1 coincides with the basic
semidefinite relaxation sdp(G, w) introduced in Chapter 5.2.1 in relation
(5.6). This follows from the next lemma (to be shown in Exercise 13.8), since
the objective polynomial uses only monomials of even degree (in fact, only
quadratic monomials). Detailed information about the Lasserre hierarchy
for max-cut can be found in Laurent [2004].
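
Before stating the lemma, here is a small numerical sketch (ours) of this order-1 bound for the 5-cycle with unit weights; by the identification above it is just the basic semidefinite relaxation, and cvxpy with an SDP solver such as SCS is assumed.

    import cvxpy as cp

    n = 5
    edges = [(i, (i + 1) % n) for i in range(n)]
    X = cp.Variable((n, n), symmetric=True)     # X_ij plays the role of z_{ij} = L_y(x_i x_j)
    constraints = [X >> 0] + [X[i, i] == 1 for i in range(n)]
    prob = cp.Problem(cp.Maximize(sum(0.5 * (1 - X[i, j]) for (i, j) in edges)), constraints)
    prob.solve()
    print(prob.value)   # about 4.52, an upper bound on the maximum cut value 4 of C5
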
Lemma 13.7.2 Assume p, g1 , . . . , gm are multilinear polynomials which
use only monomials of even degree. If t is even (resp., odd), then the optimum
value of the moment relaxation (13.38) remains unchanged if we replace the
matrices Mt±1 (z) and Mt−d
±1
j
(gj z) by their submatrices indexed by sets with
even (resp., odd) cardinality.
As in the binary case finite convergence holds: pmom,t = pmin if t ≥ n+dK .
The unconstrained case, asking to optimize a degree d polynomial p over
{±1}n , is of special interest and contains several problems such as max-cut
and satisfiability problems (recall Section 5.3) and it is equivalent to uncon-
strained binary polynomial optimization (by a linear change of variables).
Then a stronger finite convergence result can be shown: Fawzi, Saunderson
and Parrilo [2016] show that pmom,t = pmin for t ≥ dn/2e for the max-cut
problem; Sakaue et al. [2017] show that pmom,t = pmin for t ≥ d(n + d − 1)/2e

when p has degree d and for t ≥ d(n + d − 2)/2e when in addition all mono-
mials in p have even degree (which recovers the case of max-cut). Moreover
these bounds have been shown to be tight (by Laurent [2003] for max-cut
and by Kurpisz et al. [2016], Sakaue et al. [2017] for general degree d).

13.7.3 Exploiting equations


Consider the general case when there are explicit polynomial equations
present in the description of the semialgebraic set K, namely

K = {x ∈ Rn : gj (x) ≥ 0 for j ∈ [m], hl (x) = 0 for l ∈ [m0 ]},

where gj , hl ∈ R[x] and m, m0 ∈ N. Let I(h) denote the ideal generated by


h = {h1 , . . . , hm0 } and I(h)t = { m
P 0
l=1 ul hl : deg(ul hl ) ≤ t for l ∈ [m0 ]} its
truncation at degreee t.
As observed above in the binary case, equations can be exploited to get
more economical semidefinite programming formulations for the sos/moment
programs. This still holds in general. In a nutshell, instead of working in the
full polynomial space, it suffices to work in the quotient space R[x]/I(h).
Indeed, if B is a linear basis of R[x]/I(h) then R[x] = R(B) ⊕ I(h) holds,
where R(B) denotes the linear span of the set B. Then, one can easily see
that any sum of squares polynomial p admits a decomposition: p = i u2i +q,
P

where ui ∈ R(B) for all i and q ∈ I(h). In other words, it suffices to work in
R[x]/I(h) to deal with sums of squares, which leads to semidefinite programs
involving matrices indexed by (a subset of) B instead of the full monomial
set, and polynomial equations can be directly used to eliminate variables.
We only illustrate this on simple examples.
Example 13.7.3 In the binary case when h = {x2i − xi : i ∈ [n]}, or the
±1-case when h = {x2i − xi : i ∈ [n]}, the set B = {xI = i∈I xi : I ⊆ [n]} is
Q

a linear basis of R[x]/I(h) and we saw above that it indeed suffices to deal
with sums of squares of polynomials in R(B) modulo the ideal I(h).
Example 13.7.4 Assume K = {x ∈ R2 : 1 − x21 − x22 = 0} is the unit
circle. Then, the set B = {xi1 , xi1 x2 : i ∈ N} is a linear basis of the quotient
space R[x1 , x2 ]/(1−x21 −x22 ) and, in the moment relaxation (13.7) of order t,
one may impose the constraints yα − yα+2e1 − yα+2e2 = 0 for all |α| ≤ 2t − 2.
See Parrilo [2005] for a detailed discussion.
There is an interesting special case, when the polynomials h1 , . . . , hm0
have finitely many common real roots, i.e., |VR (h)| < ∞. Note that this
implies that the quadratic module M(g) + I(h) is Archimedean since the
13.7 Binary polynomial optimization 315

polynomial u := − m
P 0 2
l=1 hl belongs to I(h) and its level set {x : u(x) ≥ 0}
is compact (in fact, finite). Then it has been shown by Nie [2013] that
finite convergence holds: psos,t = pmom,t = pmin for some t ∈ N. In the
restricted case when the polynomials hl also have finitely many common
complex roots, i.e., |VC (h)| < ∞, one can give an explicit bound on the
order of finite convergence and reformulate the moment bound pmom,t in a
more economical way.
Indeed, as |VC (h)| < ∞ the quotient space R[x]/I(h) has finite dimension
(recall Theorem 12.1.8). Say N = dim R[x]/I(h) and let b1 = 1, b2 , . . . , bN
be polynomials whose cosets form a basis B of R[x]/I(h). For any i, j ∈ [N ]
there exist (unique) scalars λkij such that bi bj − N k
P
k=1 λij bk ∈ I(h). Finally,
N
given a vector z ∈ R indexed by B, define the matrix MB (z) indexed by
B, whose (i, j)th entry is

N
X
(MB (z))i,j = λkij zk for i, j ∈ [N ]
k=1

(thus the ‘linearization’ of the product bi bj expressed in the basis B and


replacing bk by zk ). This matrix is known as a combinatorial moment matrix
(see Laurent [2007b]); note that it coincides with the matrix Mn01 (z) from
(13.32) (resp., the matrix Mn±1 (z) from (13.36)) when h = {x2i − xi : i ∈ [n]}
(resp., h = {x2i − 1 : i ∈ [n]}). For the problem of minimizing a polynomial
p= N
P
i=1 pi bi over VR (h), we have

N
nX o
pmin = min pi zi : z1 = 1, MB (z)  0 .
i=1

An analogous reformulation holds when restricting to points in VR (h) satis-


fying prescribed inequalities (see Laurent [2007b]).
When the ideal I(h) is both finite-dimensional and radical then finite
convergence: psos,t = pmin for some t ∈ N, follows from the following repre-
sentation result, which was shown in Exercise 12.7.

Theorem 13.7.5 (Parrilo [2002]) Let p ∈ R[x]. Assume

p ≥ 0 on {x ∈ Rn : gj (x) ≥ 0 for j ∈ [m], hl (x) = 0 for l ∈ [m0 ]}

and the ideal generated by h = {h1 , . . . , hm0 } is zero-dimensional and radi-


cal. Then p ∈ M(g) + I(h).

13.8 Notes and further reading


The sos/moment approach to polynomial optimization presented in this
chapter was introduced by Lasserre [2001] and Parrilo [2000, 2003], with
roots in the works by Shor [1987] and Nesterov [2000]. Lasserre [2001] real-
ized the relevance of the results of Curto and Fialkow [1996] for optimization
and, in particular, that their ‘flat extension theorem’ (Theorem 13.4.3) can
be used to get an optimality certificate: if the current moment relaxation
has a ‘flat’ optimal solution then this relaxation is in fact exact (Theo-
rem 13.5.1). Moreover, Henrion and Lasserre [2005] adapted the eigenvalue
method to compute global optimizers when the flatness condition holds.
Having such a stopping criterium and being able to compute global opti-
mizers is a remarkable property of this ‘moment based’ approach. It has
been implemented in the software GloptiPoly, see Henrion, Lasserre, Loef-
berg [2009]. This approach applies more generally to the general moment
problem, where one optimizes over measures satisfying certain prescribed
linear conditions on their moments (see the treatments in the monographs
by Lasserre [2009], [2015]).
An alternative algebraic proof of the ‘flat extension theorem’, which holds
in a more general ‘sparse’ setting, was given by Mourrain and Laurent [91]
(see also the exposition in the survey by Laurent [2009]).
The application to computing real roots of polynomial equations (and real
radical ideals) has been developed by Lasserre, Laurent and Rostalski [2008]
(see also the survey by Laurent, Rostalski [2012]).
Several implementations of the sums of squares vs. moment approaches
for polynomial optimization are available, including
- GloptiPoly:
http://homepages.laas.fr/henrion/software/gloptipoly3/
- YALMIP:
http://users.isy.liu.se/johanl/yalmip/pmwiki.php?n=Main.HomePage,
- SOSTOOLS: http://www.cds.caltech.edu/sostools/,
- SparsePOP, for polynomial optimization problems with sparsity pattern:
https://sparsepop.sourceforge.io/
The latter software applies to sparse polynomial optimization problems.
This means that the objective and constraint polynomials p, gj have some
well-defined sparsity structure. Namely, there are subsets I1 , . . . , Ik ⊆ [n]
such that each polynomial gj uses only variables indexed by one of the sets
I1 , . . . , Ik and the polynomial p can be decomposed as p = p1 + . . . + pk
where pl uses only variables indexed by Il for l ∈ [k]. Then instead of de-
compositions in the quadratic module M(g) one may consider weaker de-

compositions involving terms σj gj where the sos polynomial σj involves only


the same subset of variables as gj . In this way one obtains semidefinite pro-
gramming formulations involving smaller matrices. While this gives weaker
positivity certificates in general, this is still sufficient to claim asymptotic
convergence when the sets I1 , . . . , Ik satisfy the so-called intersection run-
ning property (up to re-ordering): for all l ∈ [k] there exists 1 ≤ l0 ≤ l−1 such
that Il ∩ (I1 ∪ . . . ∪ Il−1 ) = Il ∩ Il0 (under suitable Archimedean conditions,
see Lasserre [2006], Grimm, Netzer, Schweighofer [2007]).
There is a broad literature about polynomial optimization, its many ap-
plications and extensions. We refer, e.g., to the monographs by Blekherman
et al. [2012], Marshall [2008], Lasserre [2009, 2015], the survey by Laurent
[2009], and further references therein.
We have only briefly sketched how the general sos/moment hierarchy ap-
plies to the special case of binary polynomial optimization, when the vari-
ables are assumed to take values in {0, 1}n (or {±1}n ). This area has at-
tracted a lot of attention in the past recent years, with many interesting ques-
tions, such as analyzing the integrality gap of the moment/sos relaxations
and their power to get good approximation algorithms. For the first topic we
refer, for instance, to the analysis of the integrality gap at any order t ≤ n for
the knapsack problem by Karlin, Mathieu, Nguyen [2011]. About the power
of the sos/moment method let us just mention the groundbreaking work by
Lee, Raghavendra and Steurer [2015]. For maximum constraint satisfaction
problems they show a tight connection between arbitrary semidefinite pro-
gramming reformulations and those obtained via the moment/sos method.
Based on this they can show super-polynomial lower bounds for the size
of semidefinite programs permitting to reformulate combinatorial problems
such as max-cut, maximum stable sets and traveling salesman problems.

Exercises
13.1 Show Lemma 13.4.1.
13.2 Let p be a homogeneous polynomial. Assume that p can be written as
$p = s_0 + s_1 (1 - \sum_{i=1}^n x_i^2)$ for some sums of squares polynomials s0 , s1 .

Show that p is a sum of squares of polynomials.


n
13.3 Consider a sequence y ∈ RN2t and the polynomial g(x) = R2 − ni=1 x2i .
P

Assume that y satisfies the following two conditions:


Mt (y)  0 and Mt−1 (gy)  0.
Show: |yα | ≤ R|α| for all |α| ≤ 2t.
n
13.4 Consider a sequence y ∈ RN2t .
(a) Assume that Mt (y)  0 and Tr(Mt (y)) ≤ C for some scalar C > 0.
Show that |yγ | ≤ C for all |γ| ≤ 2t.
(b) Assume that $M_t(y) + \varepsilon I \succeq 0$ and $\mathrm{Tr}(M_t(y)) \le C$ for some scalars
$\varepsilon, C > 0$. Show that $|y_\gamma| \le C + \varepsilon \binom{n+t}{t}$ for all $|\gamma| \le 2t$.

13.5 Given an integer s ≥ 1, consider a sequence y = (y0 , y1 , . . . , y2s ) ∈


R2s+1 and its moment matrix Ms (y) of order s. Assume that the rank
condition holds:
rankMs (y) = rankMs−1 (y).

(a) Show that one can find scalars a, b ∈ R for which the extended
sequence ỹ = (y0 , y1 , . . . , y2s , a, b) satisfies:

rankMs+1 (ỹ) = rankMs (y).

(b) Show that one can find an (infinite) extension

ỹ = (y0 , y1 , . . . , y2s , ỹ2s+1 , ỹ2s+2 , . . .) ∈ RN

satisfying
rankM (ỹ) = rankMs (y).

This shows the flat extension theorem (Theorem 13.4.3) in the univari-
ate case n = 1.

13.6 Let M be a real symmetric matrix indexed by Nt , where t ≥ 1. Then,


M is called a moment matrix if it is of the form M = Mt (y) for some
n
sequence y ∈ RN2t . In other words, M is a moment matrix if and only
if it satisfies the conditions:

Mα,β = Mα0 ,β 0 for all α, β, α0 , β 0 ∈ Nnt such that α + β = α0 + β 0 .

Show the following additional characterization: M is a moment matrix


if and only if it satisfies the following two conditions:
Mα,β = Mα−ei ,β+ei for all α, β ∈ Nnt and i ∈ [n]
(13.39)
such that αi ≥ 1 and |β| ≤ t − 1,

Mα,β = Mα−ei +ej ,β+ei −ej for all α, β ∈ Nnt and i, j ∈ [n]
such that αi , βj ≥ 1 and |α| = |β| = t.
(13.40)

13.7 Consider the problem of computing pmin = inf x∈K p(x), where p = x1 x2
and
K = {x ∈ R2 : −x22 ≥ 0, 1 + x1 ≥ 0, 1 − x1 ≥ 0}.
(a) Show that, at order t = 1, pmom,1 = pmin = 0 and psos,1 = −∞.
(b) At order t = 2, what is the value of psos,2 ?

13.8 Consider the polynomial optimization problem (13.37), where p, gj are


multilinear polynomials which use only even monomials. That is, p =
I
P
I pI x with pI = 0 whenever |I| is odd, and the same for gj . Consider
the moment relaxation (13.38).
Show that, if t is even (resp., odd), then the optimum value of (13.38)
remains unchanged if we replace the matrices Mt±1 (z) and Mt−d ±1
j
(gj z)
by their submatrices indexed by sets I with |I| even (resp., odd).
14
Noncommutative polynomial optimization and
applications to quantum information theory

In this chapter we introduce the noncommutative polynomial optimization


problem. This is an analogue of the polynomial optimization problem:

pmin = min p(x), where K = {x ∈ Rn : gj (x) ≥ 0 ∀j ∈ [m]}


x∈K

and p, gj ∈ R[x] are polynomials (in commutative varibales), investigated in


the previous chapters. But, instead of optimizing over scalar variables, we
now allow variables that are real symmetric matrices of arbitrary size. So,
the variables are now noncommutative symbols. This means, e.g., that the
polynomials x1 x2 x1 , x21 x2 and x2 x21 are all different polynomials, since when
the xi ’s are replaced by matrices one gets different matrices in general. Hence
we now evaluate the polynomial p at an n-tuple X = (X1 , . . . , Xn ) ∈ (S d )n ,
where the matrix size d ≥ 1 is not fixed. So p(X) is now a matrix and
the objective is to minimize the smallest eigenvalue of p(X). Specifically,
the noncommutative polynomial optimization problem asks to minimize the
quantity ψ T p(X)ψ taken over all possible choices of unit vectors ψ ∈ Rd ,
all matrix tuples X = (X1 , . . . , Xn ) ∈ (S d )n and all d ∈ N. In addition,
we may restrict to X satisfying some positivity conditions of the form
g1 (X)  0, . . . , gm (X)  0. Thus the noncommutative polynomial optimiza-
tion problem reads:

inf {ψ T p(X)ψ : gj (X)  0 for all j ∈ [m]},


d,ψ,X

where we optimize over all d ∈ N, all unit vectors ψ ∈ Rd , and all matrix
tuples X ∈ (S d )n . When restricting to size d = 1 in the optimization we find
back the classical (commutative) polynomial optimization problem.
In this chapter, we explain how the moment/sums-of-squares approaches
of the previous chapters extend naturally to this noncommutative setting

and we present its application to optimization problems arising in quantum


information theory.

14.1 Noncommutative polynomial optimization


14.1.1 Notation and preliminaries
Noncommutative polynomials
Throughout the variables x1 , . . . , xn (also called letters, or symbols) are as-
sumed to be noncommutative, so $x_1 x_2 x_3 x_1 \ne x_1^2 x_3 x_2$. Any product of vari-
ables is called a word (or monomial). We denote the set of all words in the
symbols x1 , . . . , xn by hxi = hx1 , . . . , xn i, where the empty word is denoted
by 1. This is a semigroup with involution, where the binary operation is con-
catenation, and the involution of a word w ∈ hxi is the word w∗ obtained by
reversing the order of the symbols in w, letting x∗i = xi for all i ∈ [n]. For
instance, the concatenation of the two words w1 = x2 x1 and w2 = x31 x23 x2 is
the word w1 w2 = x2 x41 x23 x2 , and w∗ = x3 x21 x2 x1 for the word w = x1 x2 x21 x3 .
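
As a small illustration (ours, not part of the notes), words can be represented concretely as tuples of symbol indices; concatenation is then tuple concatenation and the involution is reversal:

    from itertools import product

    def words(n, t):
        """All words of degree at most t in the symbols x1,...,xn; the empty tuple is the word 1."""
        return [w for d in range(t + 1) for w in product(range(1, n + 1), repeat=d)]

    def star(w):
        """The involution w -> w*: reverse the order of the symbols."""
        return tuple(reversed(w))

    w1, w2 = (2, 1), (1, 1, 1, 3, 3, 2)     # w1 = x2 x1,  w2 = x1^3 x3^2 x2
    print(w1 + w2)                          # concatenation: x2 x1^4 x3^2 x2
    print(star((1, 2, 1, 1, 3)))            # (3, 1, 1, 2, 1), i.e. x3 x1^2 x2 x1
    print(len(words(2, 3)))                 # 1 + 2 + 4 + 8 = 15 words of degree <= 3
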
The set of all real linear combinations of the words in hxi is denoted by
Rhxi, and its elements are called noncommutative polynomials, abbreviated
as NC polynomials. The involution extends to Rhxi by linearity. The set
Rhxi is a real vector space; it is also a ∗-algebra, which means it is an
algebra which is closed under the ∗-involution. A polynomial p ∈ Rhxi is
called symmetric if p∗ = p and Sym Rhxi denotes the set of symmetric
polynomials. For instance, the polynomial x1 x2 + x2 x1 is symmetric, but
the polynomial x1 x2 − x2 x1 is not.
The degree of a word w ∈ ⟨x⟩ is the number of symbols composing it,
denoted as |w| or deg(w), and the degree of a polynomial p = Σ_w p_w w ∈
R⟨x⟩ is the maximum degree of a word w with p_w ≠ 0. Given t ∈ N ∪ {∞},
we let hxit be the set of words w of degree |w| ≤ t, so that hxi∞ = hxi, and
Rhxit is the real vector space of noncommutative polynomials p of degree
deg(p) ≤ t.
For a finite set S ⊆ Sym Rhxi and t ∈ N ∪ {∞}, the truncated quadratic
module at degree 2t associated to S is defined as the cone generated by all
polynomials p∗ gp ∈ Rhxi2t with g ∈ S ∪ {1}:
    M_{2t}(S) = cone{ p*gp : p ∈ R⟨x⟩, g ∈ S ∪ {1}, deg(p*gp) ≤ 2t }.   (14.1)

Likewise, for a set T ⊆ R⟨x⟩, we can define the truncated ideal at degree
2t, denoted by I_{2t}(T), as the vector space spanned by all polynomials ph ∈
R⟨x⟩_{2t} with h ∈ T:
    I_{2t}(T) = span{ ph : p ∈ R⟨x⟩, h ∈ T, deg(ph) ≤ 2t }.   (14.2)

We say that M(S) + I(T ) is Archimedean when there exists a scalar R > 0
such that
    R² − Σ_{i=1}^n x_i² ∈ M(S) + I(T).   (14.3)

As we see later this condition plays a crucial role in establishing convergence


of some bounds for the problem (14.4).

Linear forms on noncommutative polynomials and moment matrices


Throughout we will use the dual space Rhxi∗t of real-valued linear functionals
on Rhxit . We list some basic definitions.
A linear functional L ∈ Rhxi∗t is called symmetric if L(p) = L(p∗ ) for all
p ∈ Rhxit and a linear functional L ∈ Rhxi∗2t is called positive if L(p∗ p) ≥
0 for all p ∈ Rhxit . As in the commutative case, many properties of a
linear functional L ∈ Rhxi∗2t can be expressed as properties of its associated
moment matrix (also known as its Hankel matrix). For L ∈ Rhxi∗2t we define
its associated moment matrix 1 , which has rows and columns indexed by
words in hxit , by

Mt (L)w,w0 = L(w∗ w0 ) for w, w0 ∈ hxit ,

and we set M (L) = M∞ (L). The following are easy verifications (please
check it!).

Lemma 14.1.1 Consider a linear functional L ∈ Rhxi∗2t and its associated


moment matrix Mt (L) = (L(w∗ w0 ))w,w0 ∈hxit .

(i) L is symmetric if and only if Mt (L) is symmetric.


(ii) L is positive if and only if Mt (L) is positive semidefinite.

One can also express nonnegativity of a linear form L ∈ Rhxi∗2t on the


truncated quadratic module M2t (S) in terms of certain associated positive
semidefinite moment matrices.

Lemma 14.1.2 Given a polynomial g ∈ Rhxi and L ∈ Rhxi∗2t , define


the linear form gL ∈ R⟨x⟩*_{2t−deg(g)} by (gL)(p) = L(gp), and set d_g = ⌈deg(g)/2⌉. We have

L(p∗ gp) ≥ 0 for all p ∈ Rhxit−dg ⇐⇒ Mt−dg (gL)  0.


1 Note that for simplicity we use the same notation Mt (L) as in the commutative case; which
setting is considered - commutative or not - should be clear from the context.

Corollary 14.1.3 Given a finite set S ⊆ SymRhxi and L ∈ Rhxi∗2t , we


have

L ≥ 0 on M2t (S) if and only if Mt−dg (gL)  0 for all g ∈ S ∪ {1}.

In the same way, the condition L = 0 on I2t (T ) corresponds to linear


equalities on the entries of Mt (L).
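
To make these objects concrete, here is a small numerical sketch (an illustrative aside; the matrices X_1, X_2 and the vector ψ below are arbitrary choices, and the NumPy library is assumed). It builds the linear functional L(p) = ψ^T p(X)ψ for n = 2 symbols and assembles the moment matrix M_1(L) indexed by the words {1, x_1, x_2}; since M_1(L) is by construction a Gram matrix, it is positive semidefinite, in line with Lemma 14.1.1(ii).

```python
import numpy as np

# two arbitrary real symmetric matrices X1, X2 and a unit vector psi (illustrative choices)
rng = np.random.default_rng(0)
d = 3
A1, A2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
X1, X2 = (A1 + A1.T) / 2, (A2 + A2.T) / 2
psi = rng.standard_normal(d)
psi /= np.linalg.norm(psi)

# words of degree <= 1: the empty word 1, x1, x2; the vectors w(X) psi
vecs = [psi, X1 @ psi, X2 @ psi]

# moment matrix M_1(L): L(w* w') = psi^T w(X)^T w'(X) psi = <w(X)psi, w'(X)psi>
M1 = np.array([[u @ v for v in vecs] for u in vecs])

print(np.round(M1, 3))
print("L(1) =", round(M1[0, 0], 6))                     # equals 1 since psi is a unit vector
print("min eigenvalue:", np.linalg.eigvalsh(M1).min())  # nonnegative, so M_1(L) is PSD
```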

14.1.2 The NC polynomial optimization problem


We introduce the noncommutative (abbreviated as NC) polynomial opti-
mization problem:

    p^{nc}_{min} = inf_{d,ψ,X} { ψ^T p(X)ψ : g_1(X) ⪰ 0, ..., g_m(X) ⪰ 0 }.   (14.4)

Here, p, g1 , . . . , gm ∈ SymRhxi are NC symmetric polynomials, and the op-


timization is over all integers d ∈ N, unit vectors ψ ∈ Rd , and matrix tuples
X = (X1 , . . . , Xn ) ∈ (S d )n . Note that if we would restrict to size d = 1
then we would recover the classical polynomial optimization problem (in
commutative variables) treated in the previous chapters. Alternatively one
may write ψ T p(X)ψ = hψ, p(X)ψi and one may instead optimize in (14.4)
over the triples (H, ψ, X), where H is a finite dimensional real Hilbert space
with inner product h·, ·i, ψ ∈ H is a unit vector, and X1 , . . . , Xn are linear
maps on H. 2
In what follows we let
S = {g1 , . . . , gm }

denote the set of polynomials involved as constraints in (14.4). In other


words, we are asking for the smallest possible eigenvalue of p(X) taken over
all X in the positivity domain D(S) of the set S, setting
    D(S) = ∪_{d≥1} { X ∈ (S^d)^n : g_1(X) ⪰ 0, ..., g_m(X) ⪰ 0 }.

We begin with a useful property of matrix tuples in the positivity domain


D(S) when the Archimedean condition (14.3) holds. Recall that, for a matrix
X ∈ S^d, ‖X‖ = max{‖Xu‖ : ‖u‖ = 1} denotes its operator norm. Therefore,
‖X‖ = √(λ_max(X²)), with ‖X‖ = λ_max(X) when X ⪰ 0.
Lemma 14.1.4 Assume the Archimedean condition (14.3) holds. Then,
for any X = (X1 , . . . , Xn ) ∈ D(S), we have kXi k ≤ R for all i ∈ [n].
2 As we will see later in Section ?? the optimum value does not change if we would optimize
over complex Hilbert spaces instead of real ones.

Proof As R² − x_i² = (R² − Σ_{j=1}^n x_j²) + Σ_{j≠i} x_j², we obtain that R² − x_i² ∈
M(S). By the definition of M(S) this implies that, for all X ∈ D(S), we
have R²I − X_i² ⪰ 0 and thus ‖X_i‖ ≤ R.

Hierarchy of lower bounds


We now introduce a hierarchy of lower bounds for problem (14.4). For any
t ∈ N ∪ {∞}, define the parameter3

    p^{nc}_{mom,t} = inf{ L(p) : L ∈ R⟨x⟩*_{2t}, L symmetric, L(1) = 1, L ≥ 0 on M_{2t}(S) }.   (14.5)
When t = ∞ we allow L to act on the full polynomial space Rhxi and
M∞ (S) = M(S) is the full NC quadratic module.
Lemma 14.1.5 For any integer t ∈ N we have:
    p^{nc}_{mom,t} ≤ p^{nc}_{mom,t+1} ≤ p^{nc}_{mom,∞} ≤ p^{nc}_{min}.

Proof The first two (left) inequalities are clear, since any feasible solution
L to the program defining the parameter p^{nc}_{mom,s} provides a feasible solution to
the program defining p^{nc}_{mom,t} whenever t < s ≤ ∞, simply by restricting L to
the subspace R⟨x⟩_{2t}. We now check the rightmost inequality: p^{nc}_{mom,∞} ≤ p^{nc}_{min}.
For this, pick a feasible solution (d, ψ, X) to the program (14.4) and con-
sider the linear functional L ∈ R⟨x⟩*, defined by L(p) = ψ^T p(X)ψ for
p ∈ R⟨x⟩. Then, L satisfies the desired properties: L(1) = ψ^T ψ = 1 since ψ
is a unit vector; L is symmetric:
    L(p*) = ψ^T p*(X)ψ = ψ^T p(X)^T ψ = ψ^T p(X)ψ = L(p);
L is positive on M(S), since
    L(p*p) = ψ^T p*(X)p(X)ψ = ψ^T p(X)^T p(X)ψ ≥ 0
and
    L(p*gp) = ψ^T p*(X)g(X)p(X)ψ = ψ^T p(X)^T g(X)p(X)ψ ≥ 0
as g(X) ⪰ 0 implies p(X)^T g(X)p(X) ⪰ 0. This shows p^{nc}_{mom,∞} ≤ L(p) =
ψ^T p(X)ψ, and thus the desired inequality p^{nc}_{mom,∞} ≤ p^{nc}_{min}.
As shown in Corollary 14.1.3 the conditions on the linear functional L
in the definition of the parameter p^{nc}_{mom,t} can be expressed as a semidefinite
program. Its matrices have rows and columns indexed by the words in ⟨x⟩_t,
so they have size Σ_{k=0}^t n^k, which is polynomial in n for fixed t.
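
As a quick check on these sizes, the following snippet (an illustrative aside; the helper name words_up_to is ad hoc) enumerates the words of degree at most t in n noncommuting symbols, i.e., the index set of M_t(L).

```python
from itertools import product

def words_up_to(n, t):
    """All words of degree <= t in the symbols 1..n (tuples of symbol indices)."""
    return [w for k in range(t + 1) for w in product(range(1, n + 1), repeat=k)]

for n, t in [(2, 2), (2, 3), (3, 2)]:
    N = len(words_up_to(n, t))   # equals 1 + n + ... + n^t
    print(f"n={n}, t={t}: M_t(L) has size {N} x {N}")
```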
We now see some finite and asymptotic convergence properties for the hierarchy of bounds (p^{nc}_{mom,t})_t.
3 Here too we use the same notation as in the commutative case, which setting is meant should
be clear from the context.

Finite convergence under flatness


We first state a finite convergence result for these bounds when the following
‘flatness’ condition holds:

rankMt (L) = rankMt−dK (L). (14.6)

We also say that L is flat when (14.6) holds. Here we set d_K = max_{j∈[m]} d_{g_j},
where d_g = ⌈deg(g)/2⌉ for any polynomial g. If the problem is unconstrained
(i.e., no polynomial constraints gj (X)  0 are imposed) then we set dK = 1.
In the constrained case we may assume that dK ≥ 1.
Note that the Archimedean condition is not needed for the next result.

Theorem 14.1.6 Given t ≥ max{d_p, d_K}, let L be an optimal solution
of the program (14.5) and set r := rank M_t(L). Assume that L satisfies
the flatness condition (14.6). Then the relaxation (14.5) is exact: p^{nc}_{mom,t} =
p^{nc}_{min}, and one can find an optimal solution (H, ψ, X) to problem (14.4) with
dimension dim H = r.

We prove the theorem by constructing a feasible solution (H, ψ, X) to


problem (14.4) with value hψ, p(X)ψi = L(p) and dim H = r. This implies
that p^{nc}_{min} ≤ L(p) = p^{nc}_{mom,t} and, by Lemma 14.1.5, equality holds through-
out. We can then conclude that (H, ψ, X) is an optimal solution of the NC
optimization problem (14.4).

By assumption, the matrix Mt (L) is positive semidefinite (by Lemma 14.1.2)


with rankMt (L) = r. Thus, Mt (L) can be written as the Gram matrix of a
set of vectors in Kr , i.e., there exist vectors, denoted u for each u ∈ hxit ,
such that
L(u∗ v) = hu, vi for u, v ∈ hxit .

We consider the real vector space generated by the vectors u for all u ∈ hxit :
H = Span{u : u ∈ hxit } ⊆ Rr . Using the flatness condition (14.6) one can
show (check it!):4

Lemma 14.1.7 The vector space H is spanned by the set {u : u ∈ hxit−dK }


and it has dimension r.

We set ψ = 1, where 1 is the vector labeling the empty word. Next, we


consider the linear mappings Xi on H, defined by setting

Xi u = xi u for u ∈ hxit−dK
4 Make it an exercise

and extending to H by linearity. For clarity, xi u is the vector associated to


the word xi u, which is well-defined since |xi u| = 1 + |u| ≤ 1 + t − dK ≤ t.

Lemma 14.1.8 The map Xi is well defined and it is Hermitian.

Proof To see that the map Xi is well defined we check that, given scalars
λu ,
    Σ_{u∈⟨x⟩_{t−d_K}} λ_u u = 0   =⇒   Σ_{u∈⟨x⟩_{t−d_K}} λ_u x_i u = 0.
For this, assume Σ_{u∈⟨x⟩_{t−d_K}} λ_u u = 0. Then, for any v ∈ ⟨x⟩_{t−d_K}, we have
    ⟨v, Σ_u λ_u x_i u⟩ = Σ_u λ_u L(v* x_i u) = Σ_u λ_u L((x_i v)* u) = Σ_u λ_u ⟨x_i v, u⟩ = ⟨x_i v, Σ_u λ_u u⟩ = 0,
which shows Σ_u λ_u x_i u = 0.
To show that Xi is Hermitian we check that hv, Xi ui = hXi v, ui for all
words u, v ∈ hxit−dK . Indeed, we have: hv, Xi ui = hv, xi ui = L(v ∗ xi u) and
hXi v, ui = hxi v, ui = L((xi v)∗ u) = L(v ∗ xi u).

Lemma 14.1.9 We have: u(X)ψ = u for all u ∈ hxit .

Proof Use induction on the length of u. The claim is true for u = 1 by the
definition of ψ. Assume now that u = xi v where v ∈ hxit−1 . Then, v(X)ψ =
v by induction. Therefore, u(X)ψ = Xi v(X)ψ = Xi v = xi v = u.

Lemma 14.1.10 We have: L(w) = hψ, w(X)ψi for all w ∈ hxi2t . There-
fore, L(p) = hψ, p(X)ψi holds, and L(u∗ gj v) = hψ, u∗ (X)gj (X)v(X)ψi holds
for all u, v ∈ hxit−dK and j ∈ [m].

Proof Write w = uv with u, v ∈ hxit . Then,

hψ, w(X)ψi = hψ, u(X)v(X)ψi = hu∗ (X)ψ, v(X)ψi = hu∗ , vi = L((u∗ )∗ v)

equals L(uv). The last claim follows since, by assumption, deg(p) ≤ 2t and
deg(u∗ gj v) ≤ 2(t − dK ) + deg(gj ) ≤ 2t.

To conclude the proof of Theorem 14.1.6 it suffices to verify that (H, ψ, X)


provides a feasible solution for problem (14.4).

Lemma 14.1.11 ψ is a unit vector in H and gj (X)  0 for all j ∈ [m].



Proof Indeed, ψ = 1 is a unit vector, since h1, 1i = L(1) = 1.


We now check that g_j(X) ⪰ 0. For this consider an arbitrary vector ξ ∈ H,
written as ξ = Σ_{u∈⟨x⟩_{t−d_K}} λ_u u for some scalars λ_u. Then, ξ* g_j(X) ξ is equal to
    Σ_{u,v} λ_u λ_v u* g_j(X) v = Σ_{u,v} λ_u λ_v ⟨ψ, u*(X) g_j(X) v(X) ψ⟩ = Σ_{u,v} λ_u λ_v L(u* g_j v),
where the two equalities follow using Lemmas 14.1.9 and 14.1.10. Finally,
Σ_{u,v} λ_u λ_v L(u* g_j v) = Σ_{u,v} λ_u λ_v (M_{t−d_K}(g_j L))_{u,v} is nonnegative, since the
matrix Mt−dK (gj L) is positive semidefinite by assumption.

Asymptotic convergence under the Archimedean condition


Under the Archimedean condition we can show the asymptotic convergence
of the moment bounds (14.5) to the optimum value of (14.4).
Theorem 14.1.12 Assume that the Archimedean condition (14.3) holds.
Then the bounds p^{nc}_{mom,t} converge asymptotically to the parameter p^{nc}_{min}. More-
over one can find a solution (H, ψ, X) (where H may have infinite dimen-
sion) which is optimal for problem (14.4).
The proof will follow from the following lemmas.
We assume that the condition (14.3) holds, i.e., R² − Σ_i x_i² ∈ M_{2d_M}(S)
for some integer d_M ≥ 1.
To simplify some technical details in the proofs of Lemma 14.1.13 and Corollary 14.1.14
below, we add the additional constraint g_{m+1}(X) := R² − Σ_{i=1}^n X_i² ⪰ 0 to
problem (14.4), so that we now have d_M = 1. This implies that the entries of
Mt (L) can be bounded (as in (14.8)) for all feasible L for (14.5); in particular
the infimum is attained in (14.5). In the general case dM ≥ 2, one can only
bound the entries of a principal submatrix of Mt (L) (see Remark ?? below).
Lemma 14.1.13 For t ∈ N let L_t be a feasible solution for the program
(14.5) defining p^{nc}_{mom,t}. There exists a converging subsequence of the sequence
(L_t)_t, whose limit L ∈ K⟨x⟩* is feasible for the program defining p^{nc}_{mom,∞}.

Proof Let L_t be feasible for the program (14.5). We first claim
    L_t(p*(R² − x_i²)p) ≥ 0  for all p ∈ K⟨x⟩_{t−1},   (14.7)
which follows from the fact that p*(R² − x_i²)p ∈ M_{2t}(S), since R² − x_i² ∈
M_2(S) by assumption. Next we claim that
    |L_t(w)| ≤ R^{|w|}  for all |w| ≤ 2t.   (14.8)
For this we first check that L_t(u*u) ≤ R^{2|u|} if |u| ≤ t, using induction
on |u| ≥ 0. If u = 1 then L_t(1) = 1. Consider a word v ∈ ⟨x⟩_t with |v| ≥ 1
and write it as v = x_i u with |u| = |v| − 1 ≤ t − 1. Then, L_t(u*u) ≤ R^{2|u|} by
induction. By (14.7), we have that L_t(u*(R² − x_i²)u) ≥ 0 and thus
    L_t(v*v) = R² L_t(u*u) − L_t(u*(R² − x_i²)u) ≤ R² L_t(u*u) ≤ R² R^{2|u|} = R^{2|v|}.
Suppose now that w = u*v where u ≠ v and |u| + |v| ≤ 2t. Since M_t(L_t) ⪰ 0,
we have (M_t(L_t))_{u,u} (M_t(L_t))_{v,v} ≥ ((M_t(L_t))_{u,v})². This implies
    |L_t(w)| = |L_t(u*v)| ≤ √(L_t(u*u) L_t(v*v)) ≤ √(R^{2|u|} R^{2|v|}) = R^{|u|+|v|} = R^{|w|},

and thus (14.8) is proved.


We now proceed with showing the existence of a converging subsequence
of (L_t)_t. For this we first extend L_t to the full polynomial algebra R⟨x⟩;
namely define the linear functional L′_t ∈ K⟨x⟩* by
    L′_t(w) = L_t(w) for |w| ≤ 2t,   L′_t(w) = 0 for |w| > 2t.
Next we scale L′_t and define L̃_t ∈ K⟨x⟩* by
    L̃_t(w) = L′_t(w)/R^{|w|}  for all w ∈ ⟨x⟩.
In view of relation (14.8), we have |L̃_t(w)| ≤ 1 for all w. Hence, the sequence
(L̃_t)_t lies in the unit ball of the space R⟨x⟩*. Using the Banach-Alaoglu
theorem, we know that this unit ball is compact for the weak *-topology.
This implies that the sequence (L̃_t)_t admits a converging subsequence.
For simplicity, we use the same notation (L̃_t)_t to denote this subsequence
and (L′_t)_t for its unnormalized version. So there exists L̃ ∈ K⟨x⟩* such that
lim_{t→∞} L̃_t(w) = L̃(w) for all w ∈ ⟨x⟩. After scaling L̃ we obtain L ∈ K⟨x⟩*
defined by
    L(w) = R^{|w|} L̃(w)  for all w ∈ ⟨x⟩.
Then it follows that lim_{t→∞} L′_t(w) = L(w) for all w. From this one deduces
easily that L is nonnegative on M(S) with L(1) = 1 and thus L is feasible
for the program defining p^{nc}_{mom,∞}.

Corollary 14.1.14 We have lim_{t→∞} p^{nc}_{mom,t} = p^{nc}_{mom,∞} and the program
defining p^{nc}_{mom,∞} has an optimal solution L.

Proof Set p* := lim_{t→∞} p^{nc}_{mom,t}, so that p^{nc}_{mom,t} ≤ p* ≤ p^{nc}_{mom,∞}. Fix ε >
0. For each t ∈ N let L_t be a feasible solution to the program defining
p^{nc}_{mom,t} such that L_t(p) ≤ p^{nc}_{mom,t} + ε. By Lemma 14.1.13 there is a feasible
solution L to the program defining p^{nc}_{mom,∞} which is the limit of a converging
subsequence of (L_t)_t. By letting t → ∞ we obtain p^{nc}_{mom,∞} ≤ L(p) ≤ p* + ε.
Now, letting ε → 0, we obtain p^{nc}_{mom,∞} ≤ p*, thus equality holds and the
corollary is proved.

We now proceed to prove Theorem 14.1.12. In view of Corollary 14.1.14,


there exists L ∈ R⟨x⟩* which is optimal for p^{nc}_{mom,∞}. Thus, L(p) = p^{nc}_{mom,∞}
and it suffices to show the inequality p^{nc}_{min} ≤ p^{nc}_{mom,∞}.
Set ℓ₂(N) = {x ∈ K^N : ‖x‖₂ < ∞} and let ℓ₂⁰(N) be its subset consisting
of all sequences x ∈ ℓ₂(N) with finite support.

Lemma 14.1.15 There exists a system of vectors V = {v : v ∈ hxi} ⊆


`02 (N) such that L(u∗ v) = hu, vi for all u, v ∈ hxi.

Proof By assumption, Mt (L)  0 for all t ∈ N. Fix t ∈ N. Then there exists


a system of vectors Vt = {u : u ∈ hxit } forming a Gram representation of
Mt (L), i.e., L(u∗ v) = hu, vi for all u, v ∈ hxit . We claim that we can find
vectors w for |w| = t + 1 such that the set Vt ∪ {w : |w| = t + 1} is a
Gram representation of Mt+1 (L). Indeed, consider a system of vectors u0 for
u ∈ hxit+1 forming a Gram representation of Mt+1 (L). As hu, vi = hu0 , v 0 i
for all u, v ∈ hxit there exists a unitary matrix P such that u = P u0 for all
u ∈ hxit . Then Vt ∪ {P w0 : |w| = t + 1} provides a Gram representation of
Mt+1 (L), as desired. Iterating this procedure we get a system V of vectors
satisfying the lemma.

We can now complete the proof of Theorem 14.1.12. Consider the real
vector space H = Span(V ), spanned by the set V constructed in Lemma
14.1.15, and let H be its closure in `2 (N). Then H is a real separable Hilbert
space.
Define the linear map Xi : H → H by Xi u = xi u for all u ∈ hxi, and ex-
tending by linearity. It is well-defined and it can be extended to H by setting
Xi a = limk Xi ak if a = limk ak where ak ∈ H for all k. That this is well de-
fined follows from Lemma 14.1.4, because limk ak = 0 implies limk Xi ak = 0
as kXi ak k ≤ R2 kak k.
Each Xi is Hermitian, since hv, Xi ui = hXi v, ui (= L(v ∗ xi u)) for u, v ∈
hxi.
Finally, set ψ := 1. Then ψ is a unit vector since hψ, ψi = L(1) = 1.
As in Lemmas 14.1.9, 14.1.10, 14.1.11 one can check that w(X)ψ = w and
hψ, w(X)ψi = L(w) for all w ∈ hxi, which implies: hψ, p(X)ψi = L(p) and
gj (X)  0 for all j.
This shows that (H, ψ, X) is a feasible solution for problem (14.4) with
value L(p), which implies L(p) ≥ p^{nc}_{min}. Hence, p^{nc}_{mom,∞} = L(p) = p^{nc}_{min} and
thus (H, ψ, X) is an optimal solution of (14.4). The proof of Theorem 14.1.12
is now complete.

Bounds in terms of sums-of-squares


One can also define dual bounds in terms of sums of squares: p^{nc}_{sos,t} := sup{λ : p − λ ∈ M_{2t}(S)}, which satisfy p^{nc}_{sos,t} ≤ p^{nc}_{mom,t} ≤ p^{nc}_{min} by weak duality. Under the Archimedean condition (14.3) these bounds also converge asymptotically to p^{nc}_{min}. This follows from the Positivstellensatz of Helton and McCullough [10]: under (14.3), every symmetric NC polynomial that is positive definite on D(S) belongs to M(S).

14.1.3 Application to NC optimization over the ball


We consider the instance of problem (14.4), where the feasibility region is
defined by a single inequality: h = 1 − Σ_{i=1}^n x_i². When deg(p) = 2d, it turns
out that the relaxation of order d + 1 is exact, since it admits a flat optimal
solution.
We will need the following simple result about positive semidefinite ma-
trices (Exercise ??).
Lemma 14.1.16 Consider a matrix in block-form:
    M = ( A   B
          B*  C ).
If M ⪰ 0 then there exists a matrix Z such that B = AZ and C − Z*AZ ⪰ 0.
Theorem 14.1.17 Consider the NC polynomial optimization problem:
    p^{nc}_{min} = inf_{H,ψ,X} ⟨ψ, p(X)ψ⟩   s.t.   g(X) := 1 − Σ_{i=1}^n X_i² ⪰ 0,   (14.9)
where p is a symmetric polynomial of degree 2d. Then the moment relaxation
of order t = d + 1 has a flat optimal solution and thus p^{(d+1)}_{mom} = p^{nc}_{min}.
As the feasibility region of (14.5) is bounded, the program defining p^{(d+1)}_{mom}
has an optimal solution L. With respect to the partition ⟨x⟩_{d+1} = ⟨x⟩_d ∪ {w :
|w| = d + 1}, the matrix M_{d+1}(L) has the block-form
    M_{d+1}(L) = ( A   B
                   B*  C ),
where A = Md (L).
The key idea is to build from L another linear form L′ ∈ K⟨x⟩*_{2d+2}, which
coincides with L on words of size at most 2d + 1 and which is still nonnegative
on M(g)_{d+1}. For this consider the matrix
    M′ := ( A   B
            B*  Z*AZ ),
where Z is the matrix such that B = AZ (which exists by Lemma 14.1.16).


Here are the steps of the proof. Show:
1. M′ ⪰ 0 and rank(M′) = rank(A).
2. M′ is a moment matrix, i.e., for all u, v, w, z ∈ ⟨x⟩_{d+1}, M′_{u,v} = M′_{w,z} if
   u*v = w*z. In other words, there exists a linear form L′ ∈ K⟨x⟩*_{2d+2} such
   that M′ = M_{d+1}(L′).
3. L′(u) = L(u) for all words u ∈ ⟨x⟩_{2d+1}.
4. L′(f*(1 − Σ_i x_i²)f) ≥ 0 for all f ∈ K⟨x⟩_d.

Based on this one can conclude the proof of Theorem 14.1.17.

Proof for assertion 4. Write f = g + r, where deg(g) ≤ d − 1 and r =
Σ_{u:|u|=d} r_u u. Then we have
    L′(f*(1 − Σ_i x_i²)f) = L(f*(1 − Σ_i x_i²)f) + [ L′(r*(1 − Σ_i x_i²)r) − L(r*(1 − Σ_i x_i²)r) ],
where the first term on the right-hand side is nonnegative and the bracketed
term is denoted Δ. So it suffices to show that Δ ≥ 0. For this write
    Δ = Σ_i ( L(r* x_i² r) − L′(r* x_i² r) ) =: Σ_i Δ_i
and observe that Δ_i is equal to
    Σ_{u,v:|u|=|v|=d} r_u r_v (L(u* x_i² v) − L′(u* x_i² v)) = Σ_{u,v} r_u r_v (C − Z*AZ)_{x_i u, x_i v} ≥ 0,
using the fact that C − Z*AZ ⪰ 0.


One can in fact prove a stronger result. Let us introduce the NC version
of the sum-of-squares bound of order t:
p(t)
sos := sup λ s.t. p − λ ∈ M(h)t . (14.10)
The following inequalities hold:
p(t) (t)
sos ≤ pmom ≤ pmin . (14.11)
Theorem 14.1.18 Let p be a symmetric polynomial of degree 2d. The
following assertions are equivalent:
(i) p(X)  0 over {X ∈ k (S k )n : I − i Xi2  0},
S P

(ii) p(X)  0 over {X ∈ (S |hxid | )n : I − i Xi2  0},


P

(iii) p ∈ M(1 − i x2i )d+1 .


P
Noncommutative polynomial optimization and applications to quantum information theory
332

Note that the implications (i) =⇒ (ii) and (iii) =⇒ (i) are straightforward.
We only need to prove that (ii) =⇒ (iii). For this assume that (ii) holds and,
for contradiction, p ∉ M(g)_{d+1}, setting g = 1 − Σ_i x_i².
Here are the main steps of the proof. Show:


1. There exists a linear form L ∈ Khxi∗2d+2 such that L(p) < 0 and L ≥ 0
on M(g)d+1 . Hint: You may use the fact that the truncated quadratic
module M(g)d+1 is a closed subset of Khxi2d+2 .
2. Such a linear form L may be assumed to be flat, i.e., rankMd+1 (L) =
rankMd (L) =: r.
3. There exist a unit vector ψ ∈ H = K^r and Hermitian matrices X_i ∈ K^{r×r}
   such that I − Σ_i X_i² ⪰ 0 and L(p) = ⟨ψ, p(X)ψ⟩.
   Hint: Use Theorem 14.1.6.

Conclude that condition (ii) is violated, which finishes the proof.

Remark 1: Theorems 14.1.17 and 14.1.18 also hold when replacing the ball
(defined by I − Σ_i X_i² ⪰ 0) by the hypercube (defined by the inequalities
I − X_i² ⪰ 0 for i ∈ [n]).
Remark 2: The result of Theorem 14.1.18 is not true when restricting to polyno-
mials in commutative variables. For this consider the polynomial (in com-
mutative variables)
    p_ε(x_1, x_2, x_3) = M(x_1, x_2, x_3) + ε(x_1^6 + x_2^6 + x_3^6),
where M is the homogenized Motzkin polynomial
    M(x_1, x_2, x_3) = x_1²x_2^4 + x_1^4x_2² − 3x_1²x_2²x_3² + x_3^6.
Then, for ε > 0, p_ε > 0 on R³ \ {0}. Moreover, M = lim_{ε→0} p_ε and M is
not a sum of squares. Since the cone of sums of squares of a given degree is
closed, this implies that p_ε is not a sum of squares for some ε > 0. Then, for
the minimum p_{ε,min} of p_ε over the unit ball {x : x_1² + x_2² + x_3² ≤ 1}, we have
that
    p^{(t)}_{ε,sos} = p^{(t)}_{ε,mom} < p_{ε,min} = 0,

which follows from the following arguments:


• As the program defining p^{(t)}_{ε,mom} is strictly feasible, there is no duality gap: p^{(t)}_{ε,sos} =
  p^{(t)}_{ε,mom}, and the optimum is attained in the program defining p^{(t)}_{ε,sos}, so p_ε −
  p^{(t)}_{ε,sos} ∈ M(h)_t.
• p^{(t)}_{ε,sos} < 0, since if p^{(t)}_{ε,sos} = 0 then p_ε ∈ M(h)_t, which implies that p_ε is a
  sum of squares (show this using the fact that p_ε is homogeneous).

14.1.4 NC optimization over the discrete hypercube


Given a quadratic symmetric polynomial p ∈ Rhxi2 , we consider the NC
polynomial optimization problem:

    p^{nc}_{min} = min_{H,ψ,X} ⟨ψ, p(X)ψ⟩   s.t.   X_i² = I (i ∈ [n])   (14.12)

and its moment relaxation of order 1:
    p^{(1)}_{mom} = min_{L∈R⟨x⟩*_2} L(p)   s.t.   L(1) = 1, L(x_i²) = 1 (i ∈ [n]), M_1(L) ⪰ 0.   (14.13)
The following result holds.

Theorem 14.1.19 For any quadratic symmetric polynomial p the moment
relaxation of order 1 is exact: p^{nc}_{min} = p^{(1)}_{mom}, and problem (14.12) admits an
optimal solution with dimension dim H = n + 1.

The proof proceeds along the following steps. As M_1(L) ⪰ 0 there exist vectors
v_0, v_1, ..., v_n ∈ R^{n+1} forming a Gram representation of M_1(L), i.e.,
    ⟨v_0, v_0⟩ = L(1) = 1,  ⟨v_0, v_i⟩ = L(x_i),  ⟨v_i, v_j⟩ = L(x_i x_j)  for i, j ∈ [n].   (14.14)
Let H denote the span of the vectors v_0, v_1, ..., v_n.
If v_i = v_0 set X_i = I and, if v_i ≠ v_0, set w_i := (v_0 − v_i)/‖v_0 − v_i‖, let P_i : H → H
denote the projection onto w_i and set X_i = I − 2P_i.

1. Show that X_i² = I, X_i v_0 = v_i and ⟨v_0, p(X)v_0⟩ = L(p).
2. Conclude that (H, ψ = v_0, X_1, ..., X_n) is an optimal solution for problem (14.12).
3. If p = Σ_{i,j=1}^n p_{ij} x_i x_j is homogeneous quadratic then
    p^{(1)}_{mom} = min_{Y∈S^n} Σ_{i,j=1}^n p_{ij} y_{ij}   s.t.   Y ⪰ 0, y_{ii} = 1 (i ∈ [n]).

Remark 3. One can associate to XOR games NC polynomial optimization


problems of the type (14.12). For example, the following NC polynomial
optimization problem is associated to the CHSH game:
    max Σ_{s,t∈{0,1}} (−1)^{st} ⟨ψ, X_s Y_t ψ⟩   s.t.   ψ unit vector, X_s² = Y_t² = I (s, t = 0, 1).
We will come back to this in the following section.
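
To give a flavor of how such a game is handled numerically, here is a sketch of the first-level moment relaxation for the CHSH objective (an illustrative aside; it assumes the CVXPY package). The moment matrix is indexed by the words {1, X_0, X_1, Y_0, Y_1}; the constraints X_s² = Y_t² = I force a unit diagonal, and the optimal value should come out close to 2√2 ≈ 2.828, Tsirelson's bound.

```python
import numpy as np
import cvxpy as cp

# moment matrix indexed by the words [1, X0, X1, Y0, Y1]
M = cp.Variable((5, 5), symmetric=True)
X = [1, 2]   # positions of X0, X1
Y = [3, 4]   # positions of Y0, Y1

constraints = [M >> 0, cp.diag(M) == 1]   # X_s^2 = Y_t^2 = I forces L(w*w) = 1

# CHSH objective: sum_{s,t} (-1)^{st} L(X_s Y_t)
chsh = M[X[0], Y[0]] + M[X[0], Y[1]] + M[X[1], Y[0]] - M[X[1], Y[1]]

prob = cp.Problem(cp.Maximize(chsh), constraints)
prob.solve()
print(prob.value, 2 * np.sqrt(2))   # both approximately 2.828
```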



14.1.5 Reduction from complex to real


We consider problem (14.4) in the case when p is a real symmetric polyno-
mial, i.e., p ∈ Rhxi and p∗ = p. Our goal is to show that we can restrict wlog
to (H, ψ, X) being real valued. For this let p^C_{min} (resp., p^R_{min} = p^{nc}_{min}) denote
the optimum value of problem (14.4) where we optimize over (H, ψ, X) be-
ing complex valued (resp., real valued). Clearly, p^C_{min} ≤ p^R_{min}. We will show
that equality holds.


Consider the following standard ‘realification’ operator:
    Φ : C^{d×d} → R^{2d×2d},   M = A + iB ∈ C^{d×d} with A, B ∈ R^{d×d},   M ↦ Φ(M) = ( A  −B
                                                                                        B   A ).
Consider also the following additional optimization problems:
    p̃^K_{min} = min_{H,ρ,X} ⟨ρ, p(X)⟩   s.t.   ρ ⪰ 0, Tr(ρ) = 1,   (14.15)
with K = C, R, choosing respectively H, X, ρ to be complex valued or real
valued.

valued.
The proof goes along the following steps, which you will show in Exer-
cise ??.
1. For any X, Y ∈ Cd×d , we have
hX, Y i = 21 hΦ(X), Φ(Y )i,
X is Hermitian ⇐⇒ Φ(X) is symmetric,
X  0 ⇐⇒ Φ(X)  0.
C R
2. pC
min = pg
R
min and pmin = pgmin .
C R C R
3. pg
min = pgmin , and conclude that pmin = pmin .

14.2 Application to quantum information


14.2.1 Quantum correlations
14.2.2 Nonlocal games
14.2.3 XOR games
14.3 Notes and further reading
14.4 Exercises
14.1
14.2
References

[1] M. Anjos and J.B. Lasserre (eds.) Handbook on Semidefinite, Conic and Poly-
nomial Optimization, (Anjos and Lasserre, eds.), Springer, 2012.
[2] J.-B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Opt., 11:796–817, 2001.
[3] J.B. Lasserre. Moments, positive polynomials and their applications, Imperial
College Press, 2009.
[4] M. Laurent. Sums of squares, moment matrices and optimization over poly-
nomials. In Emerging Applications of Algebraic Geometry, Vol. 149 of IMA
Volumes in Mathematics and its Applications, M. Putinar and S. Sullivant
(eds.), Springer, pages 157-270, 2009.
[5] P. Parrilo. Structured semidefinite programs and semialgebraic geometry
methods in robustness and optimization. PhD thesis, Caltech, 2000.
[6] P. Parrilo. Semidefinite programming relaxations for semialgebraic problems.
Math. Prog. 96:293–320, 2003.
[7] S. Burgdorf, I. Klep, J. Povh. Optimization of Polynomials in Non-
Commuting Variables. Springer Briefs in Mathematics, 2016.
[8] K. Cafuta, I. Klep, J. Povh. Constrained polynomial optimization problems
with noncommuting variables. SIAM J. Opt., 22:363–383, 2012.
[9] A.C. Doherty, Y.-C. Liang, B. Toner and S. Wehner. The quantum moment
problem and bounds on entangled multi-prover games. Proc. IEEE CCC,
2008.
[10] J.W. Helton, S.A. McCullough. A Positivstellensatz for non-commutative
polynomials. Trans. Amer. Math. Soc. 356:3721–3737, 2004.
[11] I. Klep, J. Povh. Constrained trace-optimization of polynomials in freely non-
commuting variables. J. Global Opt., 64:325–348, 2016.
[12] I. Klep, M. Schweighofer. Connes’ embedding conjecture and sums of Hermi-
tian squares. Adv. Math. 217:1816–1837, 2008.
[13] S. Pironio, M. Navascues, A. Acin. Convergent relaxations of polynomial
optimization problems with non-commuting variables. SIAM J. Optimization,
20:2157–2180, 2010.
[14] M. Navascues, S. Pironio, A. Acin. SDP relaxations for non-commutative
polynomial optimization. Chapter 21 in Handbook on Semidefinite, Conic
and Polynomial Optimization, (Anjos and Lasserre, eds.), Springer, 2012.
15
Symmetries

Motivation and running example: The theta number of Cayley graphs


Suppose Γ is a finite group. A subset X ⊆ Γ will be called a connection set
if the unit element e of Γ does not belong to X, and if X is inverse-closed;
that is x−1 ∈ X whenever x ∈ X. For any connection set X ⊆ Γ, the Cayley
graph Cayley(Γ, X) is the graph with vertex set Γ, where two vertices x
and y are adjacent if and only if y −1 x ∈ X. The defining conditions of a
connection set imply that Cayley(Γ, X) is an undirected graph without self-
loops. Notice that we do not require X to generate Γ; therefore Cayley(Γ, X)
need not be connected.

15.1 Symmetry reduction and matrix ∗ algebras


15.1.1 Complex semidefinite programs
Sometimes—especially when dealing with invariant semidefinite programs
or in the area of quantum information theory—it is convenient to work with
complex Hermitian matrices instead of real symmetric matrices. A complex
matrix X ∈ C^{n×n} is called Hermitian if X = X*, where X* = X̄^T denotes
the conjugate transpose of X, i.e. X_{ij} = X̄_{ji}. A Hermitian matrix is called
positive semidefinite if for all vectors x ∈ Cn we have x∗ Xx ≥ 0. The
space of Hermitian matrices is equipped with the real-valued inner product
hX, Y iT = Tr(Y ∗ X). Now a primal complex semidefinite program is

p∗ = sup{hC, XiT : X  0, hA1 , XiT = b1 , . . . , hAm , XiT = bm }, (15.1)

where A1 , . . . , Am ∈ Cn×n , and C ∈ Cn×n are given Hermitian matrices,


b ∈ Rm is a given vector and X ∈ Cn×n is the positive semidefinite Hermitian
optimization variable (denoted by X  0).
One can easily reduce complex semidefinite programming to real semidef-

inite programming by the following construction: A complex matrix X ∈
C^{n×n} defines a real matrix
    X′ = ( ℜ(X)  −ℑ(X)
           ℑ(X)   ℜ(X) ) ∈ R^{2n×2n},
where ℜ(X) ∈ R^{n×n} and ℑ(X) ∈ R^{n×n} are the real, respectively, the imagi-
nary parts of X. Then X is Hermitian and positive semidefinite if and only
if X′ is symmetric and positive semidefinite.
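
The following short NumPy sketch (an illustrative aside, with an ad hoc helper name) implements the map X ↦ X′ and checks the stated equivalence on a randomly generated Hermitian positive semidefinite matrix.

```python
import numpy as np

def realify(X):
    """Map a complex n x n matrix X to the real 2n x 2n matrix X' from the text."""
    A, B = X.real, X.imag
    return np.block([[A, -B], [B, A]])

rng = np.random.default_rng(1)
n = 4
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = Z @ Z.conj().T                    # Hermitian and positive semidefinite

Xp = realify(X)
print(np.allclose(Xp, Xp.T))                   # True: X' is symmetric
print(np.linalg.eigvalsh(Xp).min() >= -1e-9)   # True: X' is positive semidefinite
```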

15.1.2 Invariant semidefinite programs


Symmetry reduction of semidefinite programs is easiest explained using com-
plex semidefinite programs of the form (15.1). Let Γ be a finite group and
let π : Γ → U(Cn ) be a unitary representation of Γ, that is, a group
homomorphism from Γ to the group of unitary matrices U(Cn ). Then Γ acts
on the space of complex matrices by

(g, X) 7→ gX = π(g)Xπ(g)∗ .

A complex matrix X is called Γ-invariant if X = gX holds for all g ∈ Γ.


By
(Cn×n )Γ = {X ∈ Cn×n : X = gX for all g ∈ Γ}

we denote the set of all Γ-invariant matrices.


Definition 15.1.1 Let Γ be a finite group. A complex semidefinite pro-
gram is called Γ-invariant if for every feasible solution X and every g ∈ Γ,
the matrix gX also is feasible and hC, XiT = hC, gXiT holds. (Recall
hX, Y iT = Tr(Y ∗ X).)
Suppose that the complex semidefinite program (15.1) is Γ-invariant. Then
we may restrict the optimization variable X to be Γ-invariant without chang-
ing the supremum. In fact, if X is feasible for (15.1), so is its Γ-average
    X̄ = (1/|Γ|) Σ_{g∈Γ} gX.

Hence, (15.1) simplifies to
    p* = sup{ ⟨C, X⟩_T : X ⪰ 0, X ∈ (C^{n×n})^Γ,
              ⟨A_1, X⟩_T = b_1, ..., ⟨A_m, X⟩_T = b_m }.   (15.2)

If we intersect the Γ-invariant complex matrices (Cn×n )Γ with the Hermitian



matrices we get a vector space having a basis B1 , . . . , BN . If we express X


in terms of this basis, (15.2) becomes
p∗ = sup{hC, XiT : x1 , . . . , xN ∈ C,
X = x1 B1 + · · · + xN BN  0, (15.3)
hA1 , XiT = b1 , . . . , hAm , XiT = bm }.
So the number of optimization variables is N . It turns out that we can sim-
plify (15.3) even more by performing a simultaneous block diagonalization
of the basis B1 , . . . , BN . This is a consequence of the main structure theorem
of matrix ∗-algebras.

15.1.3 Matrix ∗-algebras


Definition 15.1.2 A linear subspace A ⊆ Cn×n is called a matrix al-
gebra if it is closed under matrix multiplication. It is called a matrix
∗-algebra if it is closed under taking the conjugate transpose: if A ∈ A,
then A∗ ∈ A.
The space of Γ-invariant matrices (Cn×n )Γ is a matrix ∗-algebra. Indeed,
for Γ-invariant matrices X, Y and g ∈ Γ, we have

g(XY ) = π(g)XY π(g)∗ = (π(g)Xπ(g)∗ )(π(g)Y π(g)∗ ) = (gX)(gY ) = XY,

and
g(X ∗ ) = π(g)X ∗ π(g)∗ = (π(g)Xπ(g)∗ )∗ = (gX)∗ = X ∗ .

The main structure theorem of matrix ∗-algebras—it is due to Wedder-


burn and it is well-known in the theory of C ∗ -algebras, where it can be also
stated for the compact operators on a Hilbert space—is the following:
Theorem 15.1.3 Let A ⊆ Cn×n be a matrix ∗-algebra. Then there are
natural numbers d, m1 , . . . , md such that there is a ∗-isomorphism between
A and a direct sum of full matrix ∗-algebras
    ϕ : A → ⊕_{k=1}^d C^{m_k×m_k}.

Here a ∗-isomorphism is a bijective linear map between two matrix ∗-


algebras which respects multiplication and taking the conjugate transpose.
An elementary proof of Theorem 15.1.3, which also shows how to find a
∗-isomorphism ϕ algorithmically, is presented in [2012]. An alternative proof

is given in [? , Section 3] in the framework of representation theory of finite


groups; see also [2008] and [? ].
Now we want to apply Theorem 15.1.3 to block diagonalize the Γ-invariant
semidefinite program (15.3). Let A = (Cn×n )Γ be the matrix ∗-algebra of
Γ-invariant matrices. Let ϕ be a ∗-isomorphism as in Theorem 15.1.3; then
ϕ preserves positive semidefiniteness. Hence, (15.3) is equivalent to

p∗ = sup{hC, XiT : x1 , . . . , xN ∈ C,
x1 ϕ(B1 ) + · · · + xN ϕ(BN )  0,
X = x1 B1 + · · · + xN BN ,
hA1 , XiT = b1 , . . . , hAm , XiT = bm }.

Thus, instead of dealing with one (potentially big) matrix of size n × n one
only has to work with d (hopefully small) block diagonal matrices of size
m1 , . . . , md . This reduces the dimension from n2 to m21 + · · · + m2d . Many
practical semidefinite programming solvers can take advantage of this block
structure and numerical calculations can become much faster. However, find-
ing an explicit ∗-isomorphism is usually a nontrivial task, especially if one
is interested in parameterized families of matrix ∗-algebras.
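
A small example where the ∗-isomorphism is completely explicit is Γ = Z/nZ acting on C^{n×n} by simultaneously shifting rows and columns: the invariant matrices are exactly the circulant matrices, and conjugating with the normalized discrete Fourier transform matrix diagonalizes all of them simultaneously, so all blocks have size m_k = 1. The following NumPy sketch (an illustrative aside) verifies this numerically.

```python
import numpy as np

n = 6
rng = np.random.default_rng(2)
c = rng.standard_normal(n)

# a random circulant matrix: C[x, y] depends only on (x - y) mod n
C = np.array([[c[(x - y) % n] for y in range(n)] for x in range(n)])

# normalized DFT matrix F[u, x] = exp(2*pi*i*u*x/n) / sqrt(n)
u, x = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(2j * np.pi * u * x / n) / np.sqrt(n)

D = F @ C @ F.conj().T          # conjugating by F block-diagonalizes (here: diagonalizes) C
print(np.allclose(D, np.diag(np.diag(D))))   # True: all off-diagonal entries vanish
```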

15.1.4 Example: Delsarte Linear Programming Bound


Let us apply the symmetry reduction technique to demonstrate that the
exponential size semidefinite program ϑ′(G(n, d)) collapses to the linear size
Delsarte Linear Programming Bound.
Since the graph G(n, d) is a Cayley graph over the additive group Fn2 , the
semidefinite program ϑ′(G(n, d)) is F_2^n-invariant where the group is acting as
permutations of the rows and columns of the matrix X ∈ C^{F_2^n × F_2^n}. The graph
G(n, d) has even more symmetries. Its automorphism group Aut(G(n, d))
consists of all permutations of the n coordinates x = x1 x2 · · · xn ∈ Fn2 fol-
lowed by independently switching the elements of F2 from 0 to 1, or vice
versa. So the semidefinite program ϑ′(G(n, d)) is Aut(G(n, d))-invariant. The
∗-algebra Bn of Aut(G(n, d))-invariant matrices is called the Bose-Mesner
algebra (of the binary Hamming scheme). A basis B0 , . . . , Bn is given
by zero-one matrices
(
1, if dH (x, y) = r,
(Br )x,y =
0, otherwise,

with r = 0, . . . , n. So, ϑ′(G(n, d)) in the form of (15.3) is the following



semidefinite program in n + 1 variables:
    max{ 2^n Σ_{r=0}^n \binom{n}{r} x_r : x_0 = 1/2^n, x_1 = ··· = x_{d−1} = 0,
         x_d, ..., x_n ≥ 0, Σ_{r=0}^n x_r B_r ⪰ 0 }.

Finding a simultaneous block diagonalization of the Br ’s is easy since they


pairwise commute and have a common system of eigenvectors. An orthogonal
basis of eigenvectors is given by χ_a ∈ C^{F_2^n}, defined componentwise by
    (χ_a)_x = ∏_{j=1}^n (−1)^{a_j x_j}.

Indeed,
    (B_r χ_a)_x = Σ_{y∈F_2^n} (B_r)_{x,y} (χ_a)_y
                = Σ_{y∈F_2^n} (B_r)_{x,y} (χ_a)_{y−x} (χ_a)_x
                = ( Σ_{y∈F_2^n : d_H(x,y)=r} (χ_a)_{y−x} ) (χ_a)_x
                = ( Σ_{y∈F_2^n : d_H(0,y)=r} (χ_a)_y ) (χ_a)_x.

The eigenvalues are given by the Krawtchouk polynomials
    K_r^{(n,2)}(x) = Σ_{j=0}^r (−1)^j \binom{x}{j} \binom{n−x}{r−j}
through
    Σ_{y∈F_2^n : d_H(0,y)=r} (χ_a)_y = K_r^{(n,2)}(d_H(0, a)).

Altogether, we have the ∗-algebra isomorphism
    ϕ : B_n → ⊕_{r=0}^n C
(so m_0 = ··· = m_n = 1) defined by
    ϕ(B_r) = (K_r^{(n,2)}(0), K_r^{(n,2)}(1), ..., K_r^{(n,2)}(n)).

So the semidefinite program ϑ′(G(n, d)) degenerates to the following linear
program
    max{ 2^n Σ_{r=0}^n \binom{n}{r} x_r : x_0 = 1/2^n, x_1 = ··· = x_{d−1} = 0, x_d, ..., x_n ≥ 0,
         Σ_{r=0}^n x_r K_r^{(n,2)}(j) ≥ 0 for j = 0, ..., n }.

This is the Delsarte Linear Programming Bound; see also Theorem ??.
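
This linear program is straightforward to solve in practice. The following sketch (an illustrative aside; it assumes SciPy, and the helper names are ad hoc) sets it up with scipy.optimize.linprog; for instance, for n = 7 and d = 3 the optimal value should come out as 16, the size of the Hamming code.

```python
import numpy as np
from scipy.special import comb
from scipy.optimize import linprog

def krawtchouk(n, r, x):
    """K_r^{(n,2)}(x) = sum_j (-1)^j C(x, j) C(n - x, r - j)."""
    return sum((-1) ** j * comb(x, j) * comb(n - x, r - j) for j in range(r + 1))

def delsarte_bound(n, d):
    # variables x_0, ..., x_n; maximize 2^n * sum_r C(n, r) x_r  (linprog minimizes)
    c = -np.array([2 ** n * comb(n, r) for r in range(n + 1)])
    # sum_r x_r K_r(j) >= 0 for j = 0, ..., n, written as  -K @ x <= 0
    K = np.array([[krawtchouk(n, r, j) for r in range(n + 1)] for j in range(n + 1)])
    # equalities: x_0 = 2^{-n}, x_1 = ... = x_{d-1} = 0
    A_eq = np.eye(n + 1)[:d]
    b_eq = np.zeros(d)
    b_eq[0] = 2.0 ** (-n)
    res = linprog(c, A_ub=-K, b_ub=np.zeros(n + 1), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    return -res.fun

print(delsarte_bound(7, 3))   # should print (approximately) 16.0
```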

15.2 Fourier analysis on finite abelian groups


The symmetry reduction of a semidefinite program becomes especially easy
when the group Γ is abelian. In this case, as one sees from the formulas
involved, the theory is immediately related to the discrete Fourier transform
(DFT). The discrete Fourier transform is used in many practical applications
because one can compute it very efficiently using the fast Fourier transform (FFT)
of James Cooley and John W. Tukey from 1965. Applications are for example
in digital signal processing for data compression or in the design of fast
algorithms for multiplying polynomials or large integers (Schönhage–Strassen
algorithm).

15.2.1 Discrete Fourier transform


15.2.2 Bochner’s characterization
To make things concrete, let us demonstrate the basic strategy using the
cyclic group Z/nZ. This group is finite, so that discretization is unnecessary,
and Abelian, so that harmonic analysis becomes simple. Nevertheless, this
simple example already carries many essential features, and ought to be kept
in mind by the reader when the more complicated cases are treated later.
Let Σ ⊆ Z/nZ with 0 ∉ Σ be closed under taking negatives, i.e., Σ = −Σ.
Then we define the Cayley graph
    Cayley(Z/nZ, Σ) = (Z/nZ, {{x, y} : x − y ∈ Σ}),

which is an undirected graph whose vertices are the elements of Z/nZ and
where Σ defines the neighborhood of the neutral element 0; this neigh-
borhood is then transported to every vertex by group translations. Since
Σ = −Σ, the definition is consistent, and since 0 ∉ Σ, the Cayley graph
does not have loops. For example, the n-cycle can be represented as a Cay-
ley graph:
Cn = Cayley(Z/nZ, Σ) with Σ = {1, −1}.
The goal in this section is to show that the computation of the theta
number ϑ′_e(Cayley(Z/nZ, Σ)) with unit weights e = (1, . . . , 1) reduces from a
semidefinite program to a linear program if one works in the Fourier domain.
For this we need the characters of Z/nZ, which are group homomorphisms
χ : Z/nZ → T, where T is the unit circle in the complex plane. So every
character χ satisfies
χ(x + y) = χ(x)χ(y)
for all x, y ∈ Z/nZ.
The characters themselves form a group with the operation of pointwise
multiplication (χψ)(x) = χ(x)ψ(x); this is the dual group (Z/nZ)∗ of Z/nZ.
The trivial character e of Z/nZ defined by e(x) = 1 for all x ∈ Z/nZ is the
unit element. Moreover, if χ is a character, then its inverse is its complex
conjugate χ that is such that χ(x) = χ(x) for all x ∈ Z/nZ. We often view
characters as vectors in the vector space CZ/nZ .
Lemma 15.2.1 Let χ and ψ be characters of Z/nZ. Then the following
orthogonality relation holds:
    χ*ψ = Σ_{x∈Z/nZ} \overline{χ(x)} ψ(x) = |Z/nZ| if χ = ψ, and 0 otherwise.

Proof If χ = ψ, then
    χ*χ = Σ_{x∈Z/nZ} \overline{χ(x)} χ(x) = Σ_{x∈Z/nZ} 1 = |Z/nZ|
holds. If χ ≠ ψ, then there is y ∈ Z/nZ so that (\overline{χ}ψ)(y) ≠ 1. Furthermore, we have
    (\overline{χ}ψ)(y) χ*ψ = (\overline{χ}ψ)(y) Σ_{x∈Z/nZ} \overline{χ(x)} ψ(x) = Σ_{x∈Z/nZ} \overline{χ(x + y)} ψ(x + y) = Σ_{x∈Z/nZ} \overline{χ(x)} ψ(x) = χ*ψ,

so χ∗ ψ has to be zero.
As a corollary we can explicitly give all characters of Z/nZ and see that
they form an orthogonal basis of CZ/nZ . It follows that the dual group
(Z/nZ)∗ is isomorphic to Z/nZ.

Corollary 15.2.2 Every element u ∈ Z/nZ defines a character of Z/nZ


by

χu (x) = e2πiux/n .

The map u 7→ χu is a group isomorphism between Z/nZ and its dual group
(Z/nZ)∗ .

Proof One immediately verifies that the map u 7→ χu is well-defined, that


it is an injective group homomorphism, and that χu is a character of Z/nZ.
By the orthogonality relation we see that the number of different charac-
ters of Z/nZ is at most the dimension of the space CZ/nZ , hence |Z/nZ|
equals |(Z/nZ)∗ | and the map is a bijection.
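
A quick numerical check of the orthogonality relation (an illustrative aside, assuming NumPy): the matrix F with rows χ_0, ..., χ_{n−1} satisfies F F* = |Z/nZ| · I.

```python
import numpy as np

n = 8
u, x = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(2j * np.pi * u * x / n)                   # F[u, x] = chi_u(x)
print(np.allclose(F @ F.conj().T, n * np.eye(n)))    # True: the orthogonality relation
```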

Given a function f : Z/nZ → C, the function f̂ : (Z/nZ)* → C such that
    f̂(χ) = (1/|Z/nZ|) Σ_{x∈Z/nZ} f(x) χ^{−1}(x)
is the discrete Fourier transform of f; the coefficients f̂(χ) are called the
Fourier coefficients of f. We have then the Fourier inversion formula:
    f(x) = Σ_{χ∈(Z/nZ)*} f̂(χ) χ(x).

We say that f : Z/nZ → C is of positive type if f (x) = f (−x) for all x ∈


Z/nZ and for all ρ : Z/nZ → C we have
X
f (x − y)ρ(x)ρ(y) ≥ 0.
x,y∈Z/nZ

So f is of positive type if and only if the matrix K(x, y) = f (x−y) is positive


semidefinite. With this we have the following characterization for the theta
number of Cayley(Z/nZ, Σ).

Theorem 15.2.3 We have that

    ϑ′_e(Cayley(Z/nZ, Σ)) = min f(0)
        s.t.  f(x) ≤ 0 for all x ∉ Σ ∪ {0},
              Σ_{x∈Z/nZ} f(x) ≥ |Z/nZ|,                               (15.4)
              f : Z/nZ → R is of positive type.

Alternatively, expressing f in the Fourier domain we obtain:


    ϑ′_e(Cayley(Z/nZ, Σ)) = min Σ_{χ∈(Z/nZ)*} f̂(χ)
        s.t.  Σ_{χ∈(Z/nZ)*} f̂(χ) χ(x) ≤ 0 for all x ∉ Σ ∪ {0},
              f̂(e) ≥ 1,                                               (15.5)
              f̂(χ) ≥ 0 and f̂(χ) = f̂(χ^{−1}) for all χ ∈ (Z/nZ)*.
Proof Functions f : Z/nZ → C correspond to Z/nZ-invariant matrices K : Z/nZ×
Z/nZ → C, which are matrices such that K(x + z, y + z) = K(x, y) for all x,
y, z ∈ Z/nZ.
In solving problem (??) for computing ϑ′_e we may restrict ourselves to
Z/nZ-invariant matrices. This can be seen via a symmetrization argument:
If (M, K) is an optimal solution of (??), then so is (M, K̄) with
    K̄(x, y) = (1/|Z/nZ|) Σ_{z∈Z/nZ} K(x + z, y + z),

which is Z/nZ-invariant.
So we can translate problem (??) into (15.4). The objective function and
the constraint on nonedges translate easily. The positive-semidefiniteness
constraint requires a bit more work.
First, observe that to require K to be real and symmetric is to require f
to be real and such that f (x) = f (−x) for all x ∈ Z/nZ. We claim that each
character χ of Z/nZ gives an eigenvector of K with eigenvalue |Z/nZ|fˆ(χ).
Indeed, using the inversion formula we have
    (Kχ)(x) = Σ_{y∈Z/nZ} K(x, y) χ(y) = Σ_{y∈Z/nZ} f(x − y) χ(y)
            = Σ_{y∈Z/nZ} Σ_{ψ∈(Z/nZ)*} f̂(ψ) ψ(x − y) χ(y)
            = Σ_{ψ∈(Z/nZ)*} f̂(ψ) Σ_{y∈Z/nZ} ψ(y) χ(x − y)
            = Σ_{ψ∈(Z/nZ)*} f̂(ψ) χ(x) Σ_{y∈Z/nZ} ψ(y) \overline{χ(y)}
            = |Z/nZ| f̂(χ) χ(x),
as claimed.
This immediately implies that K is positive semidefinite — or, equiva-
lently, f is of positive type — if and only if fˆ(χ) ≥ 0 for all characters χ.
Now, since f̂(e) = |Z/nZ|^{−1} Σ_{x∈Z/nZ} f(x), and since e is an eigenvector of K,
K − ee^T is positive semidefinite if and only if Σ_{x∈Z/nZ} f(x) ≥ |Z/nZ|
and f is of positive type.


So we see that (??) can be translated into (15.4). Using the inversion
formula and noting that f is real-valued if and only if fˆ(χ) = fˆ(χ−1 ) for
all χ, one immediately obtains (15.5).

Cayley graphs on the cyclic group are not particularly exciting. Every-
thing in this section, however, can be straightforwardly applied to any finite
Abelian group. If, for instance, one considers the group Zn2 , then it becomes
possible to model binary codes as independent sets of Cayley graphs, and the
analogue of Theorem 15.2.3 gives Delsarte’s linear programming bound [? ].

15.2.3 Example: Cycle graphs
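
For the n-cycle C_n = Cayley(Z/nZ, {1, −1}), formulation (15.5) becomes a small linear program. The following sketch (an illustrative aside, assuming SciPy; the helper name is ad hoc) sets it up; for n = 5 the optimal value should come out close to √5 ≈ 2.236, the familiar value of the theta number of the 5-cycle.

```python
import numpy as np
from scipy.optimize import linprog

def theta_prime_cycle(n):
    """Solve (15.5) for Cayley(Z/nZ, {1, -1}): variables fhat(0), ..., fhat(n-1) >= 0."""
    c = np.ones(n)                                   # minimize sum_u fhat(u)
    nonedges = list(range(2, n - 1))                 # x not in Sigma ∪ {0}
    # sum_u fhat(u) cos(2*pi*u*x/n) <= 0 for every non-edge x; the symmetry constraint
    # fhat(chi) = fhat(chi^{-1}) may be dropped since the data is invariant under u -> n-u
    A_ub = np.array([[np.cos(2 * np.pi * u * x / n) for u in range(n)] for x in nonedges])
    A_ub = np.vstack([A_ub, -np.eye(n)[0]])          # fhat(e) >= 1, i.e. -fhat(0) <= -1
    b_ub = np.zeros(len(nonedges) + 1)
    b_ub[-1] = -1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
    return res.fun

print(theta_prime_cycle(5), np.sqrt(5))   # both approximately 2.236
```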


15.2.4 Example: SDP lifts of cyclic polytopes
15.3 Fourier analysis of finite Nonabelian groups
15.3.1 Theory
In the following we recall some basic facts from representation theory of
finite groups. For a good reference, see for instance Terras [? ]. A (finite-
dimensional) unitary representation of Γ is a group homomorphism π : Γ →
U (dπ ) where U (dπ ) is the group of unitary dπ × dπ matrices. The number dπ
is called the degree of π. The character of π is defined as χπ (γ) = Tr(π(γ)),
where Tr denotes trace. A subspace M of Cdπ is π-invariant if π(γ)m ∈
M for all γ ∈ Γ and m ∈ M . The unitary representation π is said to
be irreducible if {0} and Cdπ are the only π-invariant subspaces of Cdπ .
Two unitary representations π and π 0 are (unitarily) equivalent if there is a
unitary matrix T such that T π(γ) = π 0 (γ)T for all γ ∈ Γ.
Schur’s lemma says that if π and π 0 are two irreducible unitary represen-
tations, and if T is a matrix such that T π(γ) = π 0 (γ)T for all γ ∈ Γ, then
T is either invertible or zero; if π = π 0 , then T is a scalar multiple of the
identity matrix.
We fix a set of mutually inequivalent irreducible unitary representations
of Γ, so that each unitary equivalence class has a representative; call this set
Γ̂. This allows us to define the Fourier transform of a function f : Γ → C:
    f̂(π) = Σ_{γ∈Γ} f(γ) π(γ),

where f̂(π) is a complex d_π × d_π matrix. The Fourier inversion formula says
we can recover f from its Fourier transform:
    f(γ) = (1/|Γ|) Σ_{π∈Γ̂} d_π ⟨f̂(π), π(γ)⟩.

The inner product used here is the trace inner product, defined as hA, Bi =
Tr(B ∗ A) for square complex matrices A and B of the same dimension, where
B ∗ denotes the conjugate-transpose of B.
The convolution of two functions f : Γ → C and g : Γ → C is defined by
    (f ∗ g)(γ) = Σ_{β∈Γ} f(β) g(β^{−1}γ),
and the involution of f is defined as f*(γ) = \overline{f(γ^{−1})}. It is a fact that
\widehat{f ∗ g}(π) = f̂(π) ĝ(π), and that \widehat{f*}(π) = f̂(π)*.
A function f : Γ → C is of positive type if
X
g ∗ g ∗ (γ)f (γ) ≥ 0
γ∈Γ

for all functions g : Γ → C; that is, the sum is a nonnegative real number.
We denote by P(Γ) the set of functions on Γ of positive type. Notice that
f ∈ P(Γ) if and only if f¯ ∈ P(Γ), where f¯ is the pointwise complex-conjugate
of f . One fact that will be needed later is that f (γ −1 ) = f (γ) for all γ ∈ Γ
when f is of positive type. For a proof of this fact and more information on
functions of positive type, see Folland [? , Chapter 3.3].
For vectors u, v ∈ Cn , we use hu, vi to denote the usual inner product
of u and v. An n × n matrix A with entries from C will be called positive
semidefinite if hAv, vi is a nonnegative real number for all v ∈ Cn . Using the
polarization identity, it is possible to prove that every positive semidefinite
matrix is Hermitian. For each finite set V , the set of positive semidefinite
matrices with rows and columns indexed on V will be denoted S+ V . When
n
V = {1, . . . , n}, we will use the notation S+ instead. It is a fact that A ∈ S+ n

if and only if hA, Bi ≥ 0 for all B ∈ S+ n ; this fact is known as the self-duality
n
of S+ .

15.3.2 Bochner’s theorem


The following theorem is an application of self-duality, as well as Parseval’s
identity, which says that
    Σ_{γ∈Γ} f(γ) \overline{g(γ)} = (1/|Γ|) Σ_{π∈Γ̂} d_π ⟨f̂(π), ĝ(π)⟩
for all functions f and g on Γ. A more general analytic version is proven in


[? ]. In the case of a finite group, the proof is simpler and we include it to
make the text self-contained.

Theorem 15.3.1 (Bochner’s theorem for finite groups) Suppose Γ is a


finite group and let f : Γ → C. Then f is of positive type if and only if f̂(π)
is positive semidefinite for each π ∈ Γ̂.

Proof For any two complex-valued functions f and g on Γ, we have


    Σ_{γ∈Γ} (g ∗ g*)(γ) f(γ) = (1/|Γ|) Σ_{π∈Γ̂} d_π ⟨\widehat{g ∗ g*}(π), f̂(π)⟩ = (1/|Γ|) Σ_{π∈Γ̂} d_π ⟨ĝ(π) ĝ(π)*, f̂(π)⟩.   (15.6)
The matrices ĝ(π)ĝ(π)∗ are always positive semidefinite, so (15.6) is non-
negative if all the matrices fˆ(π) are positive semidefinite. This gives one
direction.
For the other direction, suppose f : Γ → C is of positive type, and fix
π ∈ Γ̂. Now let A ∈ S_+^{d_π} be arbitrary, and let A = BB* be the Cholesky
decomposition. Define g : Γ → C by g(γ) = (d_π/|Γ|) ⟨B, π(γ)⟩. By uniqueness
of Fourier coefficients we have ĝ(π) = B and ĝ(π 0 ) = 0 when π 0 and π are
inequivalent, whence

ĝ(π)ĝ(π)∗ = BB ∗ = A and ĝ(π 0 )ĝ(π 0 )∗ = 0.

Now (15.6), which is nonnegative by hypotheses, is equal to dπ /|Γ|hA, fˆ(π)i.


Since π and A were arbitrary, we conclude that hA, fˆ(π)i ≥ 0 for every π
and every A ∈ S_+^{d_π}. Self-duality of S_+^{d_π} now implies f̂(π) ∈ S_+^{d_π} for each
π ∈ Γ̂.

15.3.3 Example: The ϑ-number of a Cayley graph


Let G = (V, E) be a finite graph. In [? ], the Lovász ϑ-number ϑ(G) of G is
defined and a number of equivalent formulations are given. The formulation
of ϑ(G) which will be most important for us is:
    ϑ(G) = max{ Σ_{u,v∈V} A(u, v) : A ∈ S_+^V real-valued,
                Tr(A) = 1, A(u, v) = 0 for {u, v} ∈ E }.   (A)

When G is the Cayley graph Cayley(Γ, X), the optimization over matrices
in (A) can be replaced with optimization over functions on Γ, as we proceed
to show.

Theorem 15.3.2 Suppose G = Cayley(Γ, X). Then
    ϑ(G) = max{ Σ_{γ∈Γ} f(γ) : f ∈ P(Γ) real-valued,
                f(e) = 1, f(x) = 0 for x ∈ X }.   (B)

Before we prove Theorem 15.3.2, we require a lemma:


Lemma 15.3.3 Suppose A : Γ × Γ → C is a Hermitian matrix satisfying
A(γ, e) = A(γβ, β) for all γ, β ∈ Γ. Define f : Γ → C by f (γ) = A(γ, e).
Then for any function g : Γ → C we have
    Σ_{γ∈Γ} (g ∗ g*)(γ) f(γ) = Σ_{γ,γ′∈Γ} g(γ) g(γ′) A(γ, γ′).

Proof This follows from a straightforward computation.

Proof of Theorem 15.3.2 For one direction, let A be a feasible solution for
(A). Define Ā : Γ × Γ → R entrywise by
    Ā(γ, γ′) = (1/|Γ|) Σ_{β∈Γ} A(γβ, γ′β).

Being the average of matrices similar to A (via permutation matrices), the


matrix Ā is positive semidefinite, and one now easily checks that Ā is again
a feasible solution for (A) having the same objective value as A. Moreover,
we have Ā(γ, e) = Ā(γβ, β) for all γ, β ∈ Γ.
Now define f : Γ → R by f (γ) = |Γ|Ā(γ, e). Then Ā and f /|Γ| satisfy the
hypotheses of Lemma 15.3.3, so
    Σ_{γ∈Γ} (g ∗ g*)(γ) f(γ) = |Γ| Σ_{γ,γ′∈Γ} g(γ) g(γ′) Ā(γ, γ′),

and since Ā is positive semidefinite, it follows that the function f is of pos-


itive type. It is easily checked that the other constraints of (B) are satisfied
by f, and moreover that the objective values are equal:
    Σ_{γ∈Γ} f(γ) = |Γ| Σ_{γ∈Γ} Ā(γ, e) = Σ_{γ,γ′∈Γ} Ā(γ, γ′) = Σ_{γ,γ′∈Γ} A(γ, γ′).

For the other direction, we begin with a feasible solution f : Γ → R to (B),


and we define A : Γ × Γ → R by A(β, γ) = (1/|Γ|) f(βγ^{−1}). Then A is a feasible
solution to (A) by Lemma 15.3.3, and its objective value is Σ_{γ∈Γ} f(γ).

Using Theorem 15.3.1, we can also give a (complex) semidefinite program-


ming formulation of (B) using block matrices.

Theorem 15.3.4 Suppose G = Cayley(Γ, X). Then
    ϑ(G) = max{ A_1 : A_π ⪰ 0 for each π ∈ Γ̂,
                Σ_{π∈Γ̂} d_π Tr(A_π) = |Γ|,  Σ_{π∈Γ̂} d_π ⟨A_π, π(x)⟩ = 0 for x ∈ X },   (C)
where 1 ∈ Γ̂ denotes the trivial representation.

Proof If f : Γ → R is any feasible solution to (B), set Aπ = fˆ(π) for


each π ∈ Γ̂. By Theorem 15.3.1, the matrices Aπ are positive semidefinite.
Moreover, one easily checks using the Fourier inversion formula that the
other constraints of (C) are satisfied by {A_π : π ∈ Γ̂}, and that the objective
values are equal: A_1 = Σ_{γ∈Γ} f(γ).
For the other direction, let {A_π : π ∈ Γ̂} be a feasible solution for (C) and
define g : Γ → C by
    g(γ) = (1/|Γ|) Σ_{π∈Γ̂} d_π ⟨A_π, π(γ)⟩   for all γ ∈ Γ.
Then g is of positive type by Theorem 15.3.1. Now define f(γ) = ½(g(γ) +
g(γ^{−1})) for all γ ∈ Γ. Then f is real-valued, and that f satisfies all the other
constraints of (B) is easily checked using the fact that X is inverse-closed.
Moreover
    Σ_{γ∈Γ} f(γ) = Σ_{γ∈Γ} g(γ) = ĝ(1) = A_1.

When Γ is an Abelian group, then all its irreducible representations are


one-dimensional. Therefore, the semidefinite program (C) is just a linear
program. More generally, (C) is equivalent to a linear program whenever the
connection set of the Cayley graph Cayley(Γ, X) is closed under conjugation;
that is, γxγ −1 ∈ X for all x ∈ X and γ ∈ Γ. This is the content of the next
theorem.

Theorem 15.3.5 Let G be the Cayley graph Cayley(Γ, X) and suppose


that the connection set X is closed under conjugation. Then
    ϑ(G) = max{ a_1 : a_π ≥ 0 for each π ∈ Γ̂,
                Σ_{π∈Γ̂} d_π² a_π = |Γ|,  Σ_{π∈Γ̂} d_π a_π χ_π(x) = 0 for x ∈ X }.   (D)

Proof We prove the equivalence of (C) and (D). Let {A_π : π ∈ Γ̂} be a
feasible solution for (C), and for each π let
    Ā_π = (1/|Γ|) Σ_{γ∈Γ} π(γ) A_π π(γ)*.
γ∈Γ

Then {Ā_π : π ∈ Γ̂} is again a solution to (C): If x ∈ X, then
    Σ_{π∈Γ̂} d_π ⟨Ā_π, π(x)⟩ = (1/|Γ|) Σ_{π∈Γ̂} Σ_{γ∈Γ} d_π ⟨π(γ) A_π π(γ)*, π(x)⟩
                            = (1/|Γ|) Σ_{π∈Γ̂} Σ_{γ∈Γ} d_π ⟨π(γ) A_π, π(xγ)⟩.
Since X is closed under conjugation there is a y ∈ X so that xγ = γy holds.
Hence, the sum above equals
    (1/|Γ|) Σ_{π∈Γ̂} Σ_{γ∈Γ} d_π ⟨π(γ) A_π, π(γy)⟩ = (1/|Γ|) Σ_{π∈Γ̂} Σ_{γ∈Γ} d_π ⟨A_π, π(y)⟩ = Σ_{π∈Γ̂} d_π ⟨A_π, π(y)⟩ = 0.

Moreover, since π(γ)Aπ π(γ)∗ is similar to Aπ for each γ ∈ Γ, the matrix Āπ
is positive semidefinite for each π ∈ Γ̂ and Σ_{π∈Γ̂} d_π Tr(Ā_π) = |Γ|.
We have constructed Āπ so that Āπ π(γ) = π(γ)Āπ for all γ ∈ Γ. Schur’s
lemma then implies that Āπ is equal to aπ Idπ for some scalar aπ and since Āπ
is positive semidefinite this scalar is nonnegative. We have dπ aπ = Tr(Āπ )
as well as
hĀπ , π(γ)i = aπ χπ (γ) for all γ ∈ Γ,
so {a_π : π ∈ Γ̂} is a feasible solution to (D) having objective value a_1 = A_1.
For the other direction, we take a feasible solution {a_π : π ∈ Γ̂} to (D),
and for each π ∈ Γ̂, we set A_π = a_π I_{d_π}. This is a feasible solution to (C)
with objective value A_1 = a_1.
Denote the constraint Σ_{π∈Γ̂} d_π a_π χ_π(x) = 0 by C_x (x ∈ X). For compu-
tational purposes, the following simplifications can be applied to (D): First,
only one of the constraints {Cx , Cx−1 } is needed. Second, since the characters
χπ are constant on conjugacy classes, it suffices to keep only the constraints
Cx , with one x per conjugacy class.
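
As a tiny worked instance (an illustrative aside): take Γ = S_3 and let X be the conjugacy class of the three transpositions, so that Cayley(Γ, X) is the complete bipartite graph K_{3,3} between the even and the odd permutations. The irreducible representations of S_3 have degrees (1, 1, 2) and character values (1, −1, 0) on a transposition, so a single constraint C_x suffices and (D) reads: max a_1 subject to a_π ≥ 0, a_1 + a_sgn + 4a_std = 6 and a_1 − a_sgn = 0, whose optimum is 3 = α(K_{3,3}) = ϑ(K_{3,3}). The sketch below (assuming SciPy) solves this linear program.

```python
import numpy as np
from scipy.optimize import linprog

d = np.array([1, 1, 2])                    # degrees of the irreducible representations of S_3
chi_t = np.array([1, -1, 0])               # their character values on a transposition

res = linprog(c=-np.eye(3)[0],             # maximize a_1 (coefficient of the trivial representation)
              A_eq=np.vstack([d ** 2, d * chi_t]),
              b_eq=[6, 0],                 # |S_3| = 6 and the constraint C_x for x a transposition
              bounds=[(0, None)] * 3)
print(-res.fun)                            # 3.0 = alpha(K_{3,3}) = theta(K_{3,3})
```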

15.3.4 Blowing up vertex transitive graphs


The final theorem in this note shows that for the purposes of estimating the
independence number of a graph, the theory presented in the preceding sec-
tions can be applied not just to Cayley graphs, but also to vertex-transitive
graphs.

Schreier graphs
Theorem 15.3.6 Let G = (V, E) be a graph and let Γ be a group of
automorphisms of G. Suppose Γ acts transitively on V . Then there exists a
connection set X ⊆ Γ such that
    α(G) = (|V|/|Γ|) · α(Cayley(Γ, X)).

Proof Pick a vertex x0 ∈ V and define


X = {γ ∈ Γ : {x0 , γ · x0 } ∈ E}.
Then for β, γ ∈ Γ, one has an edge {β, γ} in the Cayley graph Cayley(Γ, X)
if and only if
γ −1 β ∈ X ⇐⇒ {x0 , γ −1 β · x0 } ∈ E ⇐⇒ {γ · x0 , β · x0 } ∈ E.
Now notice that by the orbit-stabilizer theorem, one has
    |{γ ∈ Γ : γ · x = x}| = |Γ|/|V|   for all x ∈ V,
and the theorem follows immediately.
Going from G to the Cayley graph Cayley(Γ, X) is accomplished using
the following procedure: First choose a vertex x0 ∈ V arbitrarily, and let H
be the stabilizer subgroup of x0 in Γ. Each vertex x ∈ V is then replaced
with an empty graph on the left coset of H in Γ consisting of all those γ ∈ Γ
such that γ · x0 = x. In other words, the vertex set V is regarded as a Γ-
homogeneous space, and each vertex is “blown up” to an independent set of
size |Γ|/|V | by replacing it with its inverse image under the projection map.

15.3.5 Example: Product free sets


15.4 Notes
Powers of cyclic graphs: Bachoc, et al.
Product free sets: Gowers
EKR Theorems

Exercises
15.1 Challenge: Given a natural number n and a set Σ ⊆ Z/nZ which is
     closed under taking inverses, i.e., Σ = −Σ, find a formula for ϑ(Cayley(Z/nZ, Σ)).
Appendix A
Convexity (Version: May 24, 2022)

A set C is called convex if, given any two points x and y in C, the straight
line segment connecting x and y lies completely inside of C. For instance,
cubes, balls or ellipsoids are convex sets whereas a torus is not. Intuitively,
convex sets do not have holes or dips. A real-valued function f : C → R is
called convex, if the set of points (x, y) ∈ C × R which lie above the graph
of the function (x, f (x)), the epigraph of f , is a convex set. Linear functions
are convex, and so is the quadratic function f(x) = x².
Usually, arguments involving convex sets are easy to visualize by two-dim-
ensional drawings. One reason is that the definition of convexity only
involves three points which always lie in some two-dimensional plane. On
the other hand, convexity is a very powerful concept which appears (some-
times unexpected) in many branches of mathematics and its applications.
Here are a few areas where convexity is an important concept: mathemat-
ical optimization, high-dimensional geometry, analysis, probability theory,
system and control, harmonic analysis, calculus of variations, game theory,
computer science, functional analysis, economics, and there are many more.
Our concern is mathematical optimization and especially convex opti-
mization. Geometrically, solving a convex optimization problem amounts to
finding the minimum of a given convex function in a given convex set. One
attractive property of convex optimization problems is that it suffices to find
local minima because every local minimum is already a global minimum.
We want to solve convex optimization problems in an algorithmically ef-
ficient way. We want to use a computer having (very) limited resources of
time and space. So we have to work with convex sets and convex func-
tions algorithmically. Here we discuss possible ways to represent them in
the computer, in particular which data we have to give to the computer. Roughly speaking, there are two convenient possibilities to represent convex sets: by an implicit description as an intersection of halfspaces, or by an explicit description as the convex hull of extreme
points. The goal of this chapter is to discuss these two representations. In
the context of functional analysis they are connected to two famous theo-
rems, the Hahn-Banach separation theorem and the Krein-Milman theorem.
Since we are only working in finite-dimensional Euclidean spaces (and not
in the more general setting of infinite-dimensional topological vector spaces)
we can derive the statements using simple geometric arguments.

A.1 Preliminaries and conventions


We recall some fundamental geometric notions. The following is a brief re-
view, without proofs, of some basic definitions and notations frequently ap-
pearing in this textbook.

A.1.1 Euclidean space


Let E be an n-dimensional Euclidean space which is an n-dimensional vector
space over the field of real numbers together with an inner product. We
usually use the notation v · w for the inner product of the vectors v and w

in E. This inner product defines a norm on E by kvk = √(v · v) and a metric by d(v, w) = kv − wk. The angle θ = ∠(v, w), with 0 ≤ θ ≤ π, between two nonzero vectors v and w is defined through

v · w = cos θ kvkkwk.
The Cauchy-Schwarz inequality
|v · w| ≤ kvkkwk (A.1)
frequently plays a crucial role. We have equality in (A.1) if and only if the
vectors v and w are linearly dependent.
Two vectors are orthogonal if their inner product is zero. A family of n
vectors b1 , . . . , bn is an orthonormal basis of E if it is linearly independent, if
it consists of unit vectors, kb1 k = . . . = kbn k = 1, and if the vectors
are pairwise orthogonal, bi · bj = 0 whenever i 6= j.
The parallelogram law
kv + wk2 + kv − wk2 = 2kvk2 + 2kwk2 for all v, w ∈ E, (A.2)
holds in Euclidean space; this is easy to verify and yet a surprisingly powerful fact. The parallelogram law is illustrated in Figure A.1.
For the sake of concreteness we will work with coordinates most of the time:
Figure A.1 Parallelogram law.

After choosing an orthonormal basis b1 , . . . , bn of E, one can always identify


E with Rn . A vector v = x1 b1 + · · · + xn bn ∈ E corresponds to the column
vector x = (x1 , . . . , xn )T ∈ Rn . Formally this identification is carried out by
the linear transformation

T : E → Rn defined by T v = x = (x1 , . . . , xn )T

which is an isometry; we have v · w = (T v)T (T w) for all v, w ∈ E. The


image of the orthonormal basis b1 , . . . , bn of E under the transformation T consists of the standard unit vectors e1 , . . . , en of Rn , which form an orthonormal basis of Rn :

e1 = (1, 0, 0, . . . , 0)T , e2 = (0, 1, 0, . . . , 0)T , . . . , en = (0, 0, . . . , 0, 1)T .

Then the inner product is the usual one, the norm is the Euclidean norm
(or `2 -norm), and the metric is the Euclidean distance:

v · w = xT y = x1 y1 + · · · + xn yn ,
kvk = kxk2 = √(x1² + · · · + xn²),
d(v, w) = kv − wk = kx − yk2 .

We will often simply write kxk instead of kxk2 .



A.1.2 Topology
The n-dimensional (open) ball with center x ∈ Rn and radius r is

B(x, r) = {y ∈ Rn : ky − xk < r}.

Let A be a subset of n-dimensional Euclidean space Rn . A point x ∈ A is


an interior point of A if there is a positive radius ε > 0 so that B(x, ε) ⊆ A.
The set of all interior points of A is denoted by int A, the interior of A. We
say that A is open if all its points are interior points, and so A = int A. The
set A is closed if its complement Rn \ A is open. The (topological) closure Ā of A is the smallest (by the inclusion relation) closed set containing A. One
can show that A is closed if and only if every converging sequence of points
in A has a limit which also lies in A.
A point x ∈ Rn belongs to the boundary ∂A of A if for every ε > 0 the
ball B(x, ε) contains points in A and in Rn \ A. The boundary ∂A is a closed
set and we have

Ā = A ∪ ∂A and ∂A = Ā \ int A.

The set A is compact if every sequence in A contains a subsequence converging to a point of A. Since E has dimension n < ∞, we can characterize the compactness
of a subset of E by the two properties closedness and boundedness; a subset
of E is called bounded if it is contained in a ball of sufficiently large, but
finite, radius.
For instance, the boundary of the ball with radius 1 and center 0 is the
unit sphere

∂B(0, 1) = {x ∈ Rn : xT x = 1}.

We denote the unit sphere by S n−1 , where the superscript indicates the
dimension of the manifold.

A.1.3 Affine geometry


We say that a point y ∈ Rn is an affine linear combination of points x1 , . . . , xN ∈
Rn if there are real coefficients α1 , . . . , αN ∈ R so that

y = α1 x1 + · · · + αN xN with α1 + · · · + αN = 1.

Points x1 , . . . , xN are said to be affinely independent if for all α1 , . . . , αN ∈ R


the two conditions
α1 + · · · + αN = 0 and α1 x1 + · · · + αN xN = 0 imply α1 = · · · = αN = 0.

In other words, the vectors


   
(1, x1 ), . . . , (1, xN ) ∈ Rn+1

are linearly independent.

Figure A.2 The barycenter of the three vertices of the triangle, which are affinely independent points, has barycentric coordinates (1/3, 1/3, 1/3).

Points which are not affinely independent are called affinely dependent.
If x1 , . . . , xN are affinely independent and if y is an affine linear combina-
tion of these N points, then the coefficients α1 , . . . , αN in the affine linear
combination are uniquely defined. They give the barycentric coordinates 1 of
the point y in the coordinate system given by x1 , . . . , xN . The (affine) di-
mension of a set is the largest integer N − 1 so that one can find N affinely
independent points in the set. For example, Rn has dimension n and the
empty set has dimension −1. A set in Rn that has dimension n is also called
full-dimensional.
One should keep in mind that the affine dimension is a rather naive notion.
For example, the unit sphere S n−1 has affine dimension n but it is an (n−1)-
dimensional submanifold of Rn . Nevertheless, the affine dimension will be
very useful when considering convex sets.
A subset A ⊆ Rn is called an affine subspace of Rn if it is closed under taking affine linear combinations. The (by inclusion) smallest affine subspace containing a given set is its affine hull. Equivalently, it is the set of
1 August Ferdinand Möbius introduced barycentric coordinates in his work “Der barycentrische
Calcul, ein neues Hülfsmittel zur analytischen Behandlung der Geometrie” published in 1827.

all possible affine linear combinations


aff A = {α1 x1 + · · · + αN xN : N ∈ N, x1 , . . . , xN ∈ A, α1 , . . . , αN ∈ R, α1 + · · · + αN = 1}.

Again, equivalently, a set A is an affine subspace if and only if it is the set


of solutions of a system of linear equations. If A is not empty, then one can
write A in the form

A = x + L = {x + y : y ∈ L}

where x ∈ Rn and where L is a linear subspace of Rn . Then the affine


dimension of A equals dim L. Moreover, it is easy to see that the affine
dimension of a set A equals the affine dimension of its affine hull aff A.
Zero-dimensional affine subspaces are single points, one-dimensional affine
subspaces are called (affine) lines, and (n − 1)-dimensional affine subspaces
are called (affine) hyperplanes. A hyperplane can be written as the set of
solutions of one non-trivial linear equality

Hc,δ = {x ∈ Rn : cT x = δ}, (A.3)

where c ∈ Rn \ {0} is a normal vector of the hyperplane, being orthogonal


to it, and where δ ∈ R.
If the dimension of A is strictly smaller than n, then A does not have
interior points: int A = ∅. In this situation one is frequently interested in the
interior points of A relative to the affine subspace aff A. We say that a point
x ∈ A belongs to the relative interior of A when there is a ball B(x, ε) with
strictly positive radius ε > 0 so that aff A ∩ B(x, ε) ⊆ A. We denote the set
of all relative interior points of A by relint A. Of course, if dim A = n, then
the interior coincides with the relative interior: int A = relint A.

A.2 Convex sets and convex functions


A.2.1 Convex sets
We say that a point y ∈ Rn is a convex combination of points x1 , . . . , xN ∈
Rn if there are nonnegative real numbers α1 , . . . , αN ≥ 0 so that

y = α1 x1 + · · · + αN xN with α1 + · · · + αN = 1.

A set C ⊆ Rn is called a convex set if it is closed under taking convex


combinations: For all x1 , . . . , xN ∈ C we have
α1 x1 + · · · + αN xN ∈ C whenever α1 , . . . , αN ≥ 0 with α1 + · · · + αN = 1.

Equivalently, as one can show by induction on the number of points N , a set


C is convex if and only if for every pair of points x, y ∈ C the entire line
segment between x and y is contained in C, where the line segment between
the points x and y is defined as

[x, y] = {(1 − α)x + αy : 0 ≤ α ≤ 1}.

The convex hull of A ⊆ Rn is the smallest convex set containing A. We


denote it by conv A. Equivalently, the convex hull of A is the set of all
possible convex combinations which one can achieve by using points in A
conv A = {α1 x1 + · · · + αN xN : N ∈ N, x1 , . . . , xN ∈ A, α1 , . . . , αN ≥ 0, α1 + · · · + αN = 1}.

The convex hull of two points is the line segment between them: [x, y] =
conv{x, y}.
The convex hull of finitely many points is called a polytope. Two-dimensional,
planar, polytopes are polygons. Other important examples of convex sets
are balls, affine subspaces, halfspaces, and line segments. Furthermore, ar-
bitrary intersections of convex sets are convex again. The Minkowski sum of
two convex sets C and D, given by

C + D = {x + y : x ∈ C, y ∈ D},

is a convex set.
Suppose we are given a point y lying in the convex hull of a set A. How
many points of A have to be used for a convex combination of y? An answer
is given by a theorem of Carathéodory, originally stated in a paper published
in 1911. In fact, Hilbert used and proved a special case of Carathéodory’s
theorem already in a paper from 1888. This paper also plays an important
role in Chapter 11.3. The argument below can be traced back to Hilbert.

Lemma A.2.1 (Carathéodory’s theorem) Let A ⊆ Rn be a set and let y ∈


conv A be a point in the convex hull of A. Then there are affinely independent
points x1 , . . . , xN ∈ A so that y ∈ conv{x1 , . . . , xN }. In particular,

N ≤ dim A + 1 ≤ n + 1.

Proof Write y ∈ conv A as


y = α1 x1 + · · · + αN xN , xi ∈ A, αi ≥ 0, α1 + · · · + αN = 1.

Suppose N is minimal and x1 , . . . , xN are affinely dependent. Then there are


coefficients β1 , . . . , βN ∈ R with
β1 + · · · + βN = 0, β1 x1 + · · · + βN xN = 0, (β1 , . . . , βN ) 6= (0, . . . , 0).

We may assume that the minimum


 
min{αi /βi : βi > 0, i = 1, . . . , N }
is attained at i = 1. Consider the new expression for y which only involves
N − 1 summands
y = (α1 x1 + · · · + αN xN ) − (α1 /β1 )(β1 x1 + · · · + βN xN ) = γ2 x2 + · · · + γN xN , where γi = αi − (α1 /β1 )βi .

Since γi ≥ 0 (which is obvious when βi ≤ 0; and otherwise, when βi > 0,


follows from the minimality of α1 /β1 ), and

γ2 + · · · + γN = (α2 + · · · + αN ) − (α1 /β1 )(β2 + · · · + βN ) = (1 − α1 ) + (α1 /β1 )β1 = 1,

this new expression contradicts the minimality of N . Hence, x1 , . . . , xN have


to be affinely independent.
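The proof is constructive, and the elimination step can be turned into a small algorithm. Below is a minimal numerical sketch (in Python with NumPy; the tolerance and the example points are assumptions of the sketch, not part of the text) which, given y as a convex combination of points with positive weights, repeatedly removes a point along an affine dependency until the remaining points are affinely independent.

import numpy as np

def caratheodory(points, alpha, tol=1e-10):
    # points: list of vectors in R^n, alpha: positive weights summing to 1
    pts = [np.asarray(p, dtype=float) for p in points]
    alpha = np.asarray(alpha, dtype=float)
    while True:
        N = len(pts)
        A = np.vstack([np.ones(N), np.column_stack(pts)])   # columns (1, x_i)
        beta = np.linalg.svd(A)[2][-1]        # candidate affine dependency
        if np.linalg.norm(A @ beta) > tol:
            return pts, alpha                 # points are affinely independent
        if beta.max() <= 0:
            beta = -beta                      # make sure some beta_i is positive
        t = min(alpha[i] / beta[i] for i in range(N) if beta[i] > tol)
        alpha = alpha - t * beta              # still nonnegative, still sums to 1
        keep = alpha > tol
        pts = [p for p, k in zip(pts, keep) if k]
        alpha = alpha[keep]

# toy usage: a combination of 5 points in the plane ends with at most 3 points
pts = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
w = [0.2, 0.2, 0.2, 0.2, 0.2]
new_pts, new_w = caratheodory(pts, w)
print(len(new_pts))                                   # at most 3
print(sum(wi * p for wi, p in zip(new_w, new_pts)))   # still the point (0.5, 0.5)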

Here are two useful properties of convex sets. The first one, together
with the general observation that sets whose interior is not empty are full-
dimensional, shows that a convex set is full-dimensional if and only if its
interior is not empty. This implies that if a convex set does not have interior
points, then we can pass to its affine hull, so that in this new ambient space
the convex set obtains interior points.

Lemma A.2.2 If a convex set C ⊆ Rn has affine dimension n, then its


interior is not empty.

Proof By assumption, C contains n + 1 affinely independent points. The


convex hull of these n + 1 points is contained in C. Geometrically, ∆ =

conv{x1 , . . . , xn+1 } is an n-dimensional simplex with vertices x1 , . . . , xn+1 .


Since the points x1 , . . . , xn+1 are affinely independent, the linear system
   
α1 (1, x1 ) + · · · + αn+1 (1, xn+1 ) = (1, y)
has a unique solution. By Cramer’s rule, the solution α depends continu-
ously on y. We denote this continuous function by f : Rn → Rn+1 . Con-
sider the center of mass y = (x1 + · · · + xn+1 )/(n + 1). Then f (y) = α with α = (1, . . . , 1)/(n + 1). If ε > 0 is small enough, then all vectors in the ball
B(α, ε) have only positive coordinates. Since f is continuous there is a δ > 0
so that f (B(y, δ)) ⊆ B(α, ε). So, the ball B(y, δ) consists only of convex
combinations of x1 , . . . , xn+1 and thus is contained in ∆. Thus y is an inte-
rior point of ∆, and of C.
The second result deals with line segments through an interior point of a convex set.
Lemma A.2.3 Let C ⊆ Rn be a convex set. If a point x lies in the interior
of C, then for every y ∈ C there is a point z ∈ C so that
x = (1 − α)y + αz with α ∈ (0, 1),
where (0, 1) denotes the open interval 0 < α < 1.
Proof By assumption there is a positive ε so that C contains a ball of radius
ε around x. We choose ε so that also the boundary of the ball B(x, ε) is
contained in C. Now consider the line through x and y. It hits the boundary
of B(x, ε) twice. From these two intersection points, take the one which is
further from y and call it z. Then, x lies in the relative interior of the line
segment [y, z].
Also the reverse implication of Lemma A.2.3 holds, see Exercise A.3.

A.2.2 Convex functions


Let C ⊆ Rn be a convex set. A function f : C → R is called a convex function
(on C) if its epigraph
epi f = {(x, α) ∈ C × R : f (x) ≤ α}
is a convex set in C × R ⊆ Rn+1 . Equivalently, the function f is convex if and only if for all x, y ∈ C

f ((1 − α)x + αy) ≤ (1 − α)f (x) + αf (y) for all 0 ≤ α ≤ 1 (A.4)

holds. By induction this gives Jensen’s inequality


f (α1 x1 + · · · + αN xN ) ≤ α1 f (x1 ) + · · · + αN f (xN ), (A.5)

which holds for all x1 , . . . , xN ∈ C and all α1 , . . . , αN ≥ 0 with α1 + · · · + αN = 1.
A function f : C → R is called strictly convex if for all distinct x, y ∈ C
inequality (A.4) holds strictly:
f ((1 − α)x + αy) < (1 − α)f (x) + αf (y) with 0 < α < 1.
A function f : C → R is a concave function on C if its negative −f is convex.
Some examples of convex functions: Affine functions f : Rn → R given
by f (x) = bT x + c, with b ∈ Rn and c ∈ R, are trivially convex and at
the same time concave functions. A quadratic function f : Rn → R given by
f (x) = xT Ax + bT x + c, with symmetric matrix A ∈ S n , vector b ∈ Rn ,
constant c ∈ R is convex if and only if A is positive semidefinite.
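As a quick numerical illustration of the last claim (a sketch only; the matrices below are made-up examples), one can check positive semidefiniteness of A via its eigenvalues; the Hessian of the quadratic function is the constant matrix 2A, so this is exactly the criterion of Theorem A.2.4 below.

import numpy as np

def quadratic_is_convex(A, tol=1e-10):
    # f(x) = x^T A x + b^T x + c has constant Hessian 2A
    A = 0.5 * (A + A.T)                              # symmetrize
    return np.linalg.eigvalsh(A).min() >= -tol       # A positive semidefinite?

print(quadratic_is_convex(np.array([[2.0, 1.0], [1.0, 1.0]])))   # True
print(quadratic_is_convex(np.array([[1.0, 2.0], [2.0, 1.0]])))   # False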
Some constructions preserving convexity: The supremum of any family of
convex functions, the sum of convex functions, or nonnegative multiples of
convex functions yield again convex functions.
To test whether a twice continuously differentiable function—a C 2 -function—is convex it is convenient to consider its Hessian matrix.
Theorem A.2.4 Let C ⊆ Rn be an open convex set and let f : C → R be
a twice continuously differentiable function. It is a convex function on C if and only if its Hessian matrix

∇2 f (x) = ( ∂2 f /∂xi ∂xj (x) )1≤i,j≤n

is positive semidefinite for every x ∈ C.


Proof First we consider the univariate case n = 1. A univariate function
f : C → R is convex if and only if for all x, y, z ∈ C with x < y < z inequality
(f (y) − f (x))/(y − x) ≤ (f (z) − f (y))/(z − y) (A.6)
holds. One can see the validity of the inequality above by setting
y = (1 − α)x + αz with α = (y − x)/(z − x) ∈ (0, 1)
so that for every convex function f
f (y) ≤ (1 − α)f (x) + αf (z) = ((z − y)/(z − x)) f (x) + ((y − x)/(z − x)) f (z)

holds; then (A.6) follows immediately. By the mean value theorem there are
ξ1 ∈ (x, y) and ξ2 ∈ (y, z) so that
f 0 (ξ1 ) = (f (y) − f (x))/(y − x) and f 0 (ξ2 ) = (f (z) − f (y))/(z − y)
hold. Suppose f is convex; then this together with inequality (A.6) implies
that f 0 is monotonically increasing in C, and so the second derivative f 00 (x)
is nonnegative for every x ∈ C. Conversely, suppose that f 00 (x) ≥ 0 for all
x ∈ C. Then f 0 is monotonically increasing and (A.6) is fulfilled, so f is
convex.
Now the case n > 1: A multivariate function f is convex if and only if for
all x ∈ C and v ∈ Rn the univariate function gx,v (α) = f (x + αv) is convex
for all α so that x + αv ∈ C. Setting v = y − x we get
gx,v (α) = f ((1 − α)x + αy).
We apply the chain rule to find the first and second derivative of gx,v :
g′x,v (α) = ∑i (∂f /∂xi )(x + αv) vi ,
g″x,v (α) = ∑i,j (∂2 f /∂xi ∂xj )(x + αv) vi vj ,

where the sums run over i, j = 1, . . . , n.

Setting α = 0, we get
g″x,v (0) = v T ∇2 f (x)v with v = (v1 , . . . , vn )T .

So g″x,v (0) is nonnegative for all v if and only if the Hessian matrix ∇2 f (x) is positive semidefinite.
Whereas convexity of a C 2 -function f can be recognized by the positive semidefiniteness of its Hessian matrix, strict convexity of a C 2 -function follows from the positive definiteness of ∇2 f (the converse implication fails: f (x) = x4 is strictly convex although f 00 (0) = 0); the next corollary follows by obvious modifications of the proof of Theorem A.2.4.

Corollary A.2.5 Let C ⊆ Rn be an open convex set and let f : C → R be a function which is twice continuously differentiable. If its Hessian matrix is positive definite for all x ∈ C, then f is a strictly convex function on C.
Using the corollary one sees immediately that the exponential function
f (x) = ex is strictly convex on the real line. From this one deduces the
inequality between the arithmetic mean and the geometric mean, the AM-
GM inequality for short.

Theorem A.2.6 For nonnegative real numbers x1 , . . . , xn ≥ 0 the inequal-


ity between the arithmetic and the geometric mean holds:

(x1 · · · xn )1/n ≤ (x1 + · · · + xn )/n,

where equality holds if and only if all numbers coincide, x1 = . . . = xn .

Proof Since the exponential function is convex, Jensen’s inequality implies


exp((y1 + · · · + yn )/n) ≤ (ey1 + · · · + eyn )/n.
After setting xi = eyi , which is nonnegative, we get the AM-GM inequality
(x1 · · · xn )1/n ≤ (x1 + · · · + xn )/n.
Strict convexity of the exponential function gives the case of equality.

The AM-GM inequality has an elementary geometric interpretation: The


volume x1 · · · xn of a rectangular parallelepiped (a box) whose side lengths sum to S = x1 + · · · + xn is maximal if and only if x1 = . . . = xn = S/n.
The above proof also can easily accommodate nonuniform weights: For
λ1 , . . . , λn ≥ 0 with λ1 + · · · + λn = 1 we have
x1^λ1 · · · xn^λn ≤ λ1 x1 + · · · + λn xn .

The AM-GM inequality also implies that the function

f (x1 , . . . , xn ) = (x1 · · · xn )1/n

is concave, and even strictly concave for linearly independent arguments (see Corollary A.2.8 below). We prove this in two steps.

Corollary A.2.7 For positive real numbers x1 , . . . , xn > 0 and y1 , . . . , yn >


0 the following inequality holds

(x1 · · · xn )1/n + (y1 · · · yn )1/n ≤ ((x1 + y1 ) · · · (xn + yn ))1/n ,

where we have the case of equality if and only if the two vectors (x1 , . . . , xn )
and (y1 , . . . , yn ) are linearly dependent.

Proof Consider the fraction


( (x1 · · · xn )1/n + (y1 · · · yn )1/n ) / ( (x1 + y1 ) · · · (xn + yn ) )1/n
= ( x1 /(x1 + y1 ) · · · xn /(xn + yn ) )1/n + ( y1 /(x1 + y1 ) · · · yn /(xn + yn ) )1/n .

By the AM-GM inequality this fraction is at most


(1/n)( x1 /(x1 + y1 ) + · · · + xn /(xn + yn ) ) + (1/n)( y1 /(x1 + y1 ) + · · · + yn /(xn + yn ) ) = 1,

which proves the inequality. Equality holds if and only if


x1 /(x1 + y1 ) = . . . = xn /(xn + yn ) and y1 /(x1 + y1 ) = . . . = yn /(xn + yn ).
Hence,
xi /xj = (xi + yi )/(xj + yj ) = yi /yj for all i 6= j.
In other words, the vectors (x1 , . . . , xn ) and (y1 , . . . , yn ) are linearly depen-
dent.
Corollary A.2.8 The function
f : Rn++ → R with f (x1 , . . . , xn ) = (x1 · · · xn )1/n
is a concave function. Additionally, if x, y are linearly independent, then we
have strict concavity
f ((1 − α)x + αy) > (1 − α)f (x) + αf (y) for α ∈ (0, 1).
Proof For points x, y ∈ Rn++ and α ∈ [0, 1] we apply the previous corollary
to f ((1 − α)x + αy) and get
( ((1 − α)x1 + αy1 ) · · · ((1 − α)xn + αyn ) )1/n ≥ (1 − α)(x1 · · · xn )1/n + α(y1 · · · yn )1/n ,

which is (1 − α)f (x) + αf (y). Also, the supplement about strict concavity
follows from the previous corollary.

A.3 Metric projection


Let C be a nonempty closed convex set in Rn . One can project every point
z ∈ Rn onto C by simply taking the point in C which is closest to it. This
projection is called the metric projection. The fact that the metric projection

exists and is unique is very intuitive, see Figure A.3; in the case when C
is a linear subspace we are talking simply about the orthogonal projection
onto C. We give a proof of this based on the parallelogram law (A.2). This
proof does not only work for Rn , but also for arbitrary, potentially infinite
dimensional, Hilbert spaces.
In the next section we will apply the metric projection for constructing
separating and supporting hyperplanes.
Lemma A.3.1 Let C be a nonempty closed convex set in Rn . Let z ∈
Rn be a point. Then there is a unique point y in C which is closest to z.
Additionally, the vectors z − y and x − y form an obtuse angle whenever
x ∈ C:
(z − y)T (x − y) ≤ 0 for all x ∈ C. (A.7)
Moreover, if z lies outside of C, then y lies on ∂C, the boundary of C.

Figure A.3 Metric projection of a point z onto C.

Proof We may assume, after performing a translation of the complete


situation, that z = 0. We denote the shortest distance from z to C by
d = inf x∈C kxk. Then, by the definition of the infimum, there is a sequence
(xr )r∈N of elements in C with limr→∞ kxr k = d.
We shall prove that (xr )r∈N is a Cauchy sequence: For large r and s we
have by the parallelogram law (A.2)
k(xr + xs )/2k2 + k(xr − xs )/2k2 = kxr k2 /2 + kxs k2 /2 → d2 (A.8)

By convexity (xr + xs )/2 ∈ C and hence k(xr + xs )/2k2 ≥ d2 . So k(xr − xs )/2k2
tends to zero in (A.8).
Since Rn is a complete space, every Cauchy sequence is convergent. This
limit also lies in C because C is closed. So we established the existence of y.
To prove the uniqueness suppose there is y 0 ∈ C distinct from y with
kyk = ky 0 k. Then again by the parallelogram law (A.2)
k(y + y 0 )/2k2 < k(y + y 0 )/2k2 + k(y − y 0 )/2k2 = kyk2 /2 + ky 0 k2 /2 = d2 .

By convexity, (y + y 0 )/2 lies in C, which gives a contradiction to the minimality


of d.
Clearly, y ∈ ∂C, otherwise one would find another point in C closer to z
lying in some small ball B(y, ε) ⊆ C.
For α ∈ [0, 1] we have

kz − yk2 ≤ kz − ((1 − α)y + αx)k2 = kz − y + α(y − x)k2


= kz − yk2 + 2α(z − y)T (y − x) + α2 ky − xk2 .

From this it follows if α 6= 0 that

(z − y)T (x − y) ≤ (α/2) ky − xk2 .

By letting α tend to zero we derive the conclusion of the lemma.

Thus, the map πC : Rn → C defined by the property

kx − zk ≥ kπC (z) − zk for all x∈C

is well-defined. This map is called metric projection of z on C. Sometimes


we refer to πC (z) as the best approximation of z in the set C.
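For concrete sets the metric projection is often available in closed form. The following small sketch (assuming C = [0, 1]^n, where πC is just coordinatewise clipping; the random test points are an assumption of the sketch) verifies inequality (A.7) numerically.

import numpy as np

def project_box(z):
    # metric projection onto the box C = [0,1]^n: clip each coordinate
    return np.clip(z, 0.0, 1.0)

rng = np.random.default_rng(0)
z = 3.0 * rng.normal(size=5)
y = project_box(z)
for _ in range(100):
    x = rng.uniform(0.0, 1.0, size=5)        # an arbitrary point of C
    assert (z - y) @ (x - y) <= 1e-12        # the obtuse-angle inequality (A.7)
print(y)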
The metric projection πC is a contraction:

Lemma A.3.2 Let C be a nonempty closed and convex set in Rn . Then,

kπC (z) − πC (z 0 )k ≤ kz − z 0 k for all z, z 0 ∈ Rn .

In particular, the metric projection πC is a Lipschitz continuous map with


Lipschitz constant 1, a contraction.

Proof Define y = πC (z) and y 0 = πC (z 0 ). To simplify the calculation we


assume, without loss of generality, z 0 = 0; so we want to show ky −y 0 k ≤ kzk.
By (A.7) we have

(z − y)T (y 0 − y) ≤ 0 and (−y 0 )T (y − y 0 ) ≤ 0,

and together

(−z + y − y 0 )T (y − y 0 ) ≤ 0.

We use this inequality two times,

ky − y 0 k2 = (y − y 0 )T (y − y 0 )
= (−z + y − y 0 )T (y − y 0 ) + z T (y − y 0 )
≤ z T (y − y 0 )
= z T z + z T (−z + y − y 0 )
= z T z + (z − y + y 0 )T (−z + y − y 0 ) + (y − y 0 )T (−z + y − y 0 )
≤ kzk2 − kz − y + y 0 k2
≤ kzk2 ,

and the lemma follows.

The metric projection can reach every point on the boundary of C:

Lemma A.3.3 Let C be a nonempty closed and convex set in Rn . Then,


for every boundary point y ∈ ∂C there is a point z lying outside of C so that
y = πC (z).

Proof First note that one can assume that C is bounded (since otherwise
replace C by its intersection with a ball of radius 1 around y). Since C is
bounded it is contained in a ball B of sufficiently large radius.
We will construct the desired point z which lies on the boundary ∂B by
a limit argument. For this choose a sequence of points yi ∈ Rn \ C such that
ky−yi k < 1/i. Because the metric projection is a contraction (Lemma A.3.2)
we have

ky − πC (yi )k = kπC (y) − πC (yi )k ≤ ky − yi k < 1/i.

Since C is convex, one of the two points of the line aff{yi , πC (yi )} intersected
with the boundary ∂B is a point zi ∈ ∂B so that πC (zi ) = πC (yi ). Since ∂B
is compact, there is a convergent subsequence (zij ) having a limit z ∈ ∂B.
Then we have, because πC is continuous,

y = πC (y) = πC (limj→∞ yij ) = limj→∞ πC (yij ) = limj→∞ πC (zij ) = πC (limj→∞ zij ) = πC (z),

which proves the lemma.



A.4 Separating and supporting hyperplanes


As defined by (A.3) the hyperplane through a point x ∈ Rn with normal
vector c ∈ Rn \ {0} is
H = {y ∈ Rn : cT y = δ} with δ = cT x.
It is an affine subspace of dimension n − 1. The hyperplane H divides Rn
into two closed halfspaces
H + = {y ∈ Rn : cT y ≥ δ} and H − = {y ∈ Rn : cT y ≤ δ}
which have hyperplane H as common boundary, see Figure A.4.

Figure A.4 A hyperplane H through a point x with normal vector c defining two halfspaces H + and H − . The distance of H to the origin equals δ/kck.

A hyperplane H separates two sets A ⊆ Rn and B ⊆ Rn if they lie on


different sides of the hyperplane, i.e., if A ⊆ H + and B ⊆ H − or conversely
A ⊆ H − and B ⊆ H + . In other words, A and B are separated by a hyper-
plane if there exists a nonzero vector c ∈ Rn \ {0} and a scalar β ∈ R such
that
cT x ≤ β ≤ cT y for all x ∈ A, y ∈ B.
Then, hyperplane H is also called a separating hyperplane of A and B.
Separation is said to be strict if both inequalities are strict, i.e.,
cT x < β < cT y for all x ∈ A, y ∈ B.
A hyperplane H is said to support A at a point x if x ∈ A ∩ H and if A
is contained in one of the two halfspaces H + or H − , say H − . Then H is
a supporting hyperplane of A at x and H − is a supporting halfspace, see
Figure A.5.
It is a fundamental property of convex sets that one can use the metric
projection to construct separating and supporting hyperplanes.

Figure A.5 Hyperplane H supports A at x and is a separating hyperplane of A and B.

Lemma A.4.1 Let C be a nonempty closed convex set in Rn . Let z ∈ Rn \C


be a point outside C and let πC (z) its closest point in C. Then the following
holds.

(i) The hyperplane H through πC (z) with normal z − πC (z) supports C at


πC (z). Thus, it separates {z} and C; additionally z 6∈ H.
(ii) The hyperplane through the midpoint 1/2(z+πC (z)) with normal z−πC (z)
strictly separates {z} and C.

Proof We prove (i) and then (ii) follows by the same argument. Let y =
πC (z) and consider the hyperplane

H = {x ∈ Rn : cT x = δ} with c = z − y, δ = cT y.

Then z ∈ H + and z 6∈ H because

cT z > δ if and only if kz − yk2 = (z − y)T (z − y) > 0.

Furthermore, C ⊆ H − because for x ∈ C

cT x ≤ δ if and only if (z − y)T (x − y) ≤ 0,

where the last inequality is (A.7).
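Lemma A.4.1 is also a recipe for computing a separating hyperplane once the projection is available. A small self-contained sketch (again with the box C = [0, 1]^3 and a made-up point z, both assumptions of the example) is given below; since a linear function attains its maximum over the box at a vertex, checking the eight vertices suffices.

import numpy as np
from itertools import product

z = np.array([2.0, -1.0, 0.5])
y = np.clip(z, 0.0, 1.0)              # pi_C(z) for the box C = [0,1]^3
c, delta = z - y, (z - y) @ y         # hyperplane H = {x : c^T x = delta}

print(c @ z > delta)                  # z lies strictly on the positive side of H
print(all(c @ np.array(v) <= delta + 1e-12 for v in product([0, 1], repeat=3)))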

As a direct application of Lemma A.4.1 (i), we can formulate the following


fundamental structural result for closed convex sets.
Theorem A.4.2 A nonempty closed convex set C ⊆ Rn is the intersection
of its supporting halfspaces.

Proof It is clear that C is contained in the intersection of its supporting


halfspaces. For the other inclusion, let z ∈ Rn \C be a point outside C. Then
by Lemma A.4.1 (i) there is a supporting hyperplane H of C separating {z}
and C so that z 6∈ H − .

Theorem A.4.2 provides an implicit description of a convex set: it gives a


method to verify whether a point belongs to the closed convex set in ques-
tion: One has to check whether the point lies in all its supporting halfspaces.
Combining Lemma A.3.3 and Lemma A.4.1 we deduce that one can con-
struct a supporting hyperplane through every boundary point.

Lemma A.4.3 Let C ⊆ Rn be a nonempty closed convex set and let y ∈ ∂C


be a point lying on the boundary of C. Then there is a hyperplane which
supports C at y.

One can generalize Lemma A.4.1 (i) and remove the assumption that C
is closed.

Lemma A.4.4 Let C ⊆ Rn be a nonempty convex set and let z ∈ Rn \C be


a point lying outside C. Then, {z} and C can be separated by a hyperplane.

Proof In view of Lemma A.4.1 we only have to show the statement for
convex sets C which are not closed.
First we argue that the topological closure of C is convex as well: Let x and y be points in the closure of C and let (1 − α)x + αy with 0 ≤ α ≤ 1 be a point in the line segment [x, y]. There are sequences (xi )i∈N , respectively (yi )i∈N , of points in C which converge to x, respectively to y. Then (1 − α)xi + αyi , which lies in C, goes to (1 − α)x + αy when i tends to infinity. Hence, (1 − α)x + αy lies in the closure of C.
For proving the lemma we are left with two cases: If z does not lie in the closure of C, then a hyperplane separating {z} and this closed convex set also separates {z} and C. If z lies in the closure of C, then z is a boundary point of the closure, as z lies outside C. By Lemma A.4.3 there is a hyperplane supporting the closure of C at z. In particular, it separates {z} and C.

We conclude with a general separation theorem for nonintersecting convex


sets. This will be the basis of our discussion of the duality theory of conic
programs.

Theorem A.4.5 Let C, D ⊆ Rn be nonempty convex sets with C ∩ D = ∅.


Then, C and D can be separated by a hyperplane. In particular, there exists
a vector c ∈ Rn \ {0} so that

sup{cT x : x ∈ C} ≤ inf{cT y : y ∈ D}.

Proof Consider the nonempty convex set

C − D = {x − y : x ∈ C, y ∈ D}

The origin does not lie in C − D by assumption. Hence, by Lemma A.4.4,


{0} and C −D can be separated by a hyperplane H = Hc,δ with c ∈ Rn \ {0}

and δ ∈ R. We choose the direction of c so that 0 ∈ H + and C − D ⊆ H − .


Then, for x ∈ C and y ∈ D it follows that
cT (x − y) ≤ δ and so cT x ≤ δ + cT y ≤ cT y
because 0 = cT 0 ≥ δ.

A.5 Extreme points


Now we turn to an explicit description of convex sets. An explicit description
gives an easy way to generate points lying in the convex set.
Let C be a convex set in Rn . A set F ⊆ C is called a face of C if for all
x ∈ F the following holds:
x = (1 − α)y + αz with α ∈ (0, 1), y, z ∈ C =⇒ y, z ∈ F.
Clearly any intersection of faces is again a face. Hence, for x ∈ C, the
smallest face containing x is well defined. It is the intersection of all the
faces of C that contain x. We denote it by FC (x).
A vector z ∈ Rn is called a perturbation of x ∈ C if x ± εz ∈ FC (x) for
some ε > 0; then the whole segment [x − εz, x + εz] is contained in the face
FC (x).
Lemma A.5.1 Let C ⊆ Rn be a convex set.
(i) Suppose C has nonempty interior. A point x ∈ C lies in the interior of
C if and only if the face FC (x) equals C.
(ii) The face FC (x) is the unique face of C containing x in its relative interior.
Proof (i) Assume that FC (x) = C and suppose for contradiction that
x does not lie in the interior C. Then x lies on the boundary of C. By
Lemma A.4.3 we can find a supporting hyperplane Hc,δ which supports C
at x, and so
cT x = δ and cT y ≤ δ for all y ∈ C.
Now F = FC (x) ∩ Hc,δ is strictly contained in C = FC (x) as C is full-dimensional and F is not. Moreover, F is a face of C containing x, which contradicts the minimality of the face FC (x). To verify that F is indeed a face of C is easy: Given x0 ∈ F and y, z ∈ C
with
x0 = (1 − α)y + αz for α ∈ (0, 1),
Since FC (x) is a face, we see y, z ∈ FC (x), but also
δ = cT x0 = (1 − α)cT y + αcT z ≤ (1 − α)δ + αδ = δ,

so δ = cT y = cT z and y, z ∈ Hc,δ .
We proceed to show sufficiency. Let x be an interior point of C. Clearly,
the minimal face FC (x) is contained in C. For the reverse inclusion, consider
a point y ∈ C distinct from x. Since x ∈ int C, Lemma A.2.3 guarantees
that there exists a point z ∈ C and α ∈ (0, 1) so that x = (1 − α)y + αz.
Hence, y ∈ FC (x) because FC (x) is a face.
(ii) follows from (i) by considering the affine hull of FC (x).
We say that a point x ∈ C is an extreme point of C if FC (x) = {x}, that is,
if it is not a relative interior point of any line segment in C. In other words,
if x cannot be written in the form x = (1 − α)y + αz with distinct points
y, z ∈ C and 0 < α < 1. The set of all extreme points of C we denote by
ext C.
Theorem A.5.2 (Minkowski) Let C ⊆ Rn be a compact and convex set.
Then,
C = conv(ext C).
Proof We may assume that the interior of C is not empty by considering
the affine hull of C. We prove the theorem by induction on the dimension
n.
If n = 0, then C is a point and the result follows.
Let the dimension n be at least one. We have to show that every x ∈ C
can be written as the convex hull of extreme points of C. We distinguish
between two cases:
First case: If x lies on the boundary of C, then by Lemma A.4.3 there is a
supporting hyperplane H of C through x. Consider the set F = H ∩ C. This
is a compact and convex set which lies in an affine subspace of dimension at
most n − 1 and hence we have by the induction hypotheses x ∈ conv(ext F ).
Since ext F ⊆ ext C, we are done.
Second case: If x does not lie on the boundary of C, then the intersection
of a line through x with C is a line segment [y, z] with y, z ∈ ∂C. By
the previous argument we have y, z ∈ conv(ext C). Since x is a convex
combination of y and z, the theorem follows.

A.6 Convex optimization


Geometrically, convex optimization is the problem of minimizing a convex
function f , the objective function, over a convex set C, the set of feasible
solutions.
inf{f (x) : x ∈ C}

For example one can specify the set of feasible solutions explicitly by finitely
many constraints
C = {x ∈ Rn : f1 (x) ≤ 0, . . . , fm (x) ≤ 0},
where f1 , . . . , fm are convex functions.
Convex optimization has the attractive feature that every local minimizer
is at the same time a global minimizer: A local minimizer is a point x ∈ C,
a feasible solution, having the property that there is a positive ε so that
f (x) = inf{f (y) : y ∈ C and kx − yk ≤ ε},
and a global minimizer or optimal solution is a point x ∈ C such that
f (x) ≤ f (y) for all y ∈ C.
To see that local optimality implies global optimality assume that x is a
local but not a global minimizer. Then there is a feasible solution z so that
f (z) < f (x). Clearly, kx − zk > ε. Define y, which lies on the line segment
[x, z] by setting
y = (1 − α)x + αz, α = ε/kx − zk,
which is a feasible solution because of the convexity of C. Then, kx − yk = ε
and by the convexity of the function f inequality
f (y) ≤ (1 − α)f (x) + αf (z) < f (x)
holds. This contradicts the fact that x is a local minimizer.
One way to solve convex optimization problems is by constructing a min-
imizing sequence x0 ∈ C, x1 ∈ C, . . . with f (x0 ) ≥ f (x1 ) ≥ . . ., which
converges to a local optimum. The computational efficiency of this kind of method is determined by how efficiently one can evaluate f and by how efficiently one can represent the set C. In particular, the computational com-
plexity of deciding that an intermediate step xi stays in C plays a decisive
role, as discussed in Chapter 3.
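As a concrete instance of such a minimizing sequence, the following sketch (with randomly generated data A, b, which are assumptions of the example and not from the text) runs projected gradient descent for the convex function f (x) = kAx − bk2 over the box C = [0, 1]^4 ; the projection onto C is coordinatewise clipping.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 4))
b = rng.normal(size=8)
f = lambda x: np.sum((A @ x - b) ** 2)
grad = lambda x: 2.0 * A.T @ (A @ x - b)

x = np.full(4, 0.5)                                 # start inside C = [0,1]^4
step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)      # 1/L, L = Lipschitz constant of grad
values = []
for _ in range(500):
    x = np.clip(x - step * grad(x), 0.0, 1.0)       # gradient step, then project onto C
    values.append(f(x))
print(values[0] >= values[-1], values[-1])          # the sequence f(x_k) decreases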
Extreme points are important for convex optimization. Suppose the ob-
jective function f is an affine function and suppose that the set of feasible
solutions is a compact and convex set C. Then, one can always find an ex-
treme point of C as a global minimum: Let x ∈ C be a global minimizer. By
Theorem A.5.2 we can write x as a convex combination of extreme points
x1 , . . . , xN ∈ ext C
x = α1 x1 + · · · + αN xN with α1 , . . . , αN > 0, α1 + · · · + αN = 1.

Then
f (x) = α1 f (x1 ) + · · · + αN f (xN ) ≥ α1 f (x) + · · · + αN f (x) = f (x),

and so f (xi ) = f (x) for all i = 1, . . . , N .


More generally, if f is a concave function, then the same argument as above
together with Jensen’s inequality (A.5) implies that a global minimum of f
in a convex and compact set C can be found in ext C.

A.7 Polytopes and polyhedra


In this section we consider convex sets which are characterized by having an easy, finite description.
The convex hull of finitely many points is called a polytope. So for a
polytope P ⊆ Rn there are finitely many points x1 , . . . , xN so that

P = conv{x1 , . . . , xN }.

It is easy to see that the extreme points of P form the minimal subset
of {x1 , . . . , xN } so that its convex hull equals P . Often, the extreme points
of a polytope P are called the vertices of P .
If a set P ⊆ Rn is given as an intersection of finitely many halfspaces,
then it is called a polyhedron, i.e. if there is a matrix A ∈ Rm×n and a vector
b ∈ Rm so that
P = {x ∈ Rn : Ax ≤ b}

holds, where Ax ≤ b denotes the system of m linear inequalities aT j x ≤ bj


given by the row vectors aj , with j = 1, . . . , m, of A. Every nonzero vector
aj is the normal vector of a hyperplane which bounds a halfspace in which
P lies. Again, the extreme points of a polyhedron, are frequently called
vertices. The vertices of a polyhedron P can be characterized using linear
algebra.
Lemma A.7.1 Let P = {x ∈ Rn : Ax ≤ b} be a polyhedron. A point z ∈ P
is a vertex of P if and only if the rank of the matrix Az equals n. Here the
matrix Az is the submatrix of A consisting only of the row vectors aj so that
aTj z = bj .

Sometimes the inequalities of the system Ax ≤ b which are satisfied with equality at z, and which are used to define Az , are called the inequalities active at z.

Proof Suppose that rank Az < n, then there is a vector c ∈ Rn , c 6= 0 with



Az c = 0. Since aT
i z < bi for all rows of A which do not belong to Az , there
is a δ > 0 with

aT i (z + δc) ≤ bi and aT i (z − δc) ≤ bi .

Hence, A(z + δc) ≤ b and A(z − δc) ≤ b and so z + δc, z − δc ∈ P . This


means
z = (1/2)(z + δc) + (1/2)(z − δc)
and z cannot be a vertex of P .
Now suppose that rank Az = n. Assume that z = αx + (1 − α)y with
x, y ∈ P , α ∈ (0, 1). Equality holds for a row of Az

bi = aT i z = aT i (αx + (1 − α)y) = α aT i x + (1 − α) aT i y ≤ αbi + (1 − α)bi = bi .

Also aT i x = aT i y = bi , since α ∈ (0, 1). Because rank Az = n, the linear system aT i w = bi , which consists of rows of Az , has a unique solution, which
means x = z = y and z is a vertex of P .
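The rank condition of Lemma A.7.1 is easy to check numerically. A minimal sketch follows (the unit square is a made-up example polyhedron and the tolerance is an assumption of the sketch).

import numpy as np

def is_vertex(A, b, z, tol=1e-9):
    # Lemma A.7.1: z is a vertex of {x : Ax <= b} iff the active rows have rank n
    active = np.abs(A @ z - b) <= tol
    return bool(np.linalg.matrix_rank(A[active]) == A.shape[1])

# the unit square {x in R^2 : 0 <= x <= 1} written as Ax <= b
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])
print(is_vertex(A, b, np.array([1.0, 0.0])))    # True: a corner of the square
print(is_vertex(A, b, np.array([0.5, 0.0])))    # False: midpoint of an edge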

For a matrix A with m rows and n columns there are at most


 
m m!
=
n (m − n)! · n!
submatrices of rank n. In particular, every polyhedron only has finitely many
vertices.
We note that the upper bound of (m choose n) is not optimal and can be improved.
McMullen proved in 1970 the upper bound theorem which implies an optimal
upper bound on the number of vertices of a polytope given the parameters
m and n. The upper bound theorem also characterizes the combinatorial
structure of the polytopes reaching the optimal upper bound.
The finiteness result together with Theorem A.5.2 implies that every
bounded polyhedron is a polytope. Conversely, one can show that every
polytope is a bounded polyhedron.
The equivalence of polytopes and bounded polyhedra is a special case of
the Minkowski-Weyl theorem for polyhedra: Every polyhedron is a Minkowski
sum of a polytope and a polyhedral cone, see for example Ziegler [1995].
Converting the inequality description of a bounded polyhedron (also called
the H-representation) into the vertex description of a polytope (also called

the V-representation) is a task which appears frequently but tends to be


difficult and computationally demanding.
A classical example where both descriptions are elegant is the Birkhoff
polytope. The Birkhoff polytope is the set of all doubly stochastic matri-
ces, and a doubly stochastic matrix is a square matrix where all entries are
nonnegative reals and where every row and every column sums to 1. So, the
Birkhoff polytope equals
DSn = {X ∈ Rn×n : Xe = e, X T e = e, X ≥ 0},

where e = (1, . . . , 1) is the all-ones vector. A theorem of Garrett Birkhoff


(1911–1996) and John von Neumann (1903–1957) says that the extreme
points of DSn are exactly the permutation matrices. These are doubly stochas-
tic matrices with entries either 0 or 1. A permutation π : {1, . . . , n} →
{1, . . . , n}, a bijective map from the set {1, . . . , n} to itself, can be uniquely
represented with the permutation matrix P = Pπ by

π(i) = j ⇐⇒ (Pπ )ji = 1

Permutation matrices permute the standard basis vectors of Rn by Pπ ei =


eπ(i) . We denote the group of all permutations by Sn , the symmetric group,
where the group operation is composition of maps.

Theorem A.7.2 (Birkhoff, von Neumann)

ext DSn = {Pπ ∈ Rn×n : π ∈ Sn }.
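Before turning to the proof, here is a small computational sketch (in Python; it uses SciPy's assignment solver to find a permutation inside the support, and the matrix D is a made-up example) which expresses a doubly stochastic matrix as a convex combination of permutation matrices, as the theorem guarantees is possible.

import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decomposition(X, tol=1e-9):
    X = X.astype(float).copy()
    terms = []                                    # list of (weight, permutation)
    while X.max() > tol:
        # a permutation supported on the positive entries exists by the theorem;
        # maximize the number of positive entries hit by the assignment
        rows, cols = linear_sum_assignment(-(X > tol).astype(float))
        w = X[rows, cols].min()
        terms.append((w, cols.copy()))
        X[rows, cols] -= w
    return terms

D = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
for w, sigma in birkhoff_decomposition(D):
    print(round(w, 3), sigma)                     # the weights sum to 1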

We prove the theorem by relating it to the theory of perfect matchings


in bipartite graphs. A matching in an undirected graph G = (V, E) is a set
of disjoint edges M ⊆ E: For every pair of distinct edges e, f ∈ M we have
e ∩ f = ∅. Matching M is called perfect if 2|M | = |V |. The incidence vector
χM ∈ RE of a matching M ⊆ E is defined componentwise by
(χM )e = 1 if e ∈ M , and (χM )e = 0 otherwise.

The perfect matching polytope P (G) is

P (G) = conv{χM : M is a perfect matching of G}.

A graph G = (V, E) is called bipartite if one can partition the vertex set
V = U ∪ W so that every edge e ∈ E contains exactly one vertex from U
and one from W : |e ∩ U | = |e ∩ W | = 1.

Theorem A.7.3 Let G = (V, E) be a bipartite graph. The perfect matching


polytope of G equals

P (G) = {x ∈ RE : x ≥ 0, Ax = e},

where A ∈ RV ×E is the incidence matrix of the graph G defined as


Av,e = 1 if v ∈ e, and Av,e = 0 otherwise.

Furthermore, the extreme points of P (G) are exactly the incidence vectors
of perfect matchings.

We begin by proving that the incidence matrix A of a bipartite graph is


totally unimodular, which means that every minor of A is either −1, 0, or
+1.

Lemma A.7.4 The incidence matrix of a bipartite graph is totally uni-


modular.

Proof Let B be a t×t submatrix of A. We shall show that det B = −1, 0, +1.
The proof is by induction. The base case t = 1 is trivial. For t > 1 we consider
three cases:

(i) B contains a zero column.


Then det B = 0.
(ii) B contains a column which contains only one 1.
Then we can permute the rows and columns of B so that we get the
following block structure; the permutation can only change the sign of B’s determinant:

det B = ± det ( 1 bT ; 0 B 0 ), with b ∈ Rt−1 and B 0 a (t − 1) × (t − 1) matrix.

Thus, det B = ± det B 0 and by the induction hypothesis, det B 0 = −1, 0, +1.
(iii) All columns of B contain exactly two 1s.
Since G is bipartite with bipartition V = U ∪ W , we can permute the
rows of B so that we get a matrix consisting of two blocks of rows, B 0 on top of B 00 , where the row indices of B 0
belong to U and the row indices of B 00 belong to W . Now every column
of B 0 and every column of B 00 contains exactly one 1. Summing up the
rows of B 0 gives the all-ones vector (1, . . . , 1). The same happens when
summing up the rows of B 00 . Hence, the rows of B are linearly dependent
and det B = 0.

Proof of Theorem A.7.3 The incidence vector χM of a perfect matching M of G obviously satisfies the linear conditions χM ≥ 0 and AχM = e.
To prove the reverse inclusion we rewrite {x ∈ RE : x ≥ 0, Ax = e} as a
system of linear inequalities
    
{ x ∈ RE : (−I; A; −A) x ≤ (0; e; −e) }, (A.9)

where I is the identity matrix in RE×E and where (−I; A; −A), respectively (0; e; −e), denotes the matrix, respectively the vector, obtained by stacking the three blocks on top of each other. To determine the vertices of (A.9)


we want to apply Lemma A.7.1. Using Lemma A.7.4 it is easy to check that
the matrix
 
(−I; A; −A) (A.10)
is totally unimodular. In view of Lemma A.7.1, if a point z is a vertex of
the polyhedron (A.9) then there is a square submatrix B ∈ RE×E of (A.10)
and a corresponding right hand side b ∈ RE , which is a subvector of the
right hand side (0, e, −e)T , so that z is the unique solution of the linear
system Bx = b. Using Cramer’s rule we see that the coordinates of z are
only integers: We have
ze = det Be0 / det B ,
where Be0 is the matrix that we get from B by replacing the e-th column
with the vector b. Since the matrix in (A.10) is totally unimodular, we see
det B ∈ {±1}. Also det Be0 has to be integral. Hence, in this situation the
coordinates of z are either 0 or 1. Every such 0/1-vector satisfying (A.9) is
the incidence vector of a perfect matching.
In fact, every incidence vector χM of a perfect matching M is a vertex of
the polyhedron (A.9) because the active inequalities determine a matrix of
full rank. More precisely, there are |E|−|M | = |E|−|V |/2 active inequalities
in the system −IχM ≤ 0, all other inequalities AχM ≤ e and −AχM ≤ −e
are active as well. We assemble a full rank |E| × |E|-matrix as follows: We
add to the (|E| − |M |) × |E| submatrix of −I, which contains the rows
indexed by edges with χM e = 0, the |M | rows of the matrix A, indexed by
vertices of one side in the bipartition V = U ∪ W .
Proof of Theorem A.7.2 Consider the complete bipartite graph Kn,n with
2n vertices
V = U ∪ W = {u1 , . . . , un } ∪ {w1 , . . . , wn }

and n2 edges
E = {{ui , wj } : i, j = 1, . . . , n}
One can identify the spaces RE and Rn×n . Using this identification we see
that the linear conditions of the perfect matching polytope P (Kn,n ) exactly
describe the Birkhoff polytope DSn .
Because every perfect matching in Kn,n is of the form
M = {{ui , wπ(i) } : i = 1, . . . , n}
for some permutation π ∈ Sn , there is a one-to-one correspondence between
the perfect matchings in Kn,n and the set of permutations Sn . These obser-
vations together with Theorem A.7.3 prove the theorem.

A.8 Some historical remarks and further reading


The history of convexity is astonishing: On the one hand, the notion of
convexity is very natural. The ancient Greek mathematicians intensively
studied the three-dimensional polytopes in which all faces are congruent —
the Platonic solids.
On the other hand, the first mathematician who realized how important
convexity is as a geometric concept was the brilliant Hermann Minkowski
(1864–1909) who in a series of very influential papers “Allgemeine Lehrsätze
über die konvexen Polyeder” (1897), “Theorie der konvexen Körper, ins-
besondere Begründung ihres Oberflächenbegriffs” (published posthumously)
initiated the mathematical study of convex sets and their properties. All the
results in this chapter on the implicit and the explicit representation of
convex sets can be found there (although with different proofs).
Not much can be added to David Hilbert’s (1862–1943) praise in his obit-
uary of his close friend Minkowski:
Dieser Beweis eines tiefliegenden zahlentheoretischen Satzes2 ohne rechnerische
Hilfsmittel wesentlich auf Grund einer geometrisch anschaulichen Betrachtung ist
eine Perle Minkowskischer Erfindungskunst. Bei der Verallgemeinerung auf For-
men mit n Variablen führte der Minkowskische Beweis auf eine natürlichere und
weit kleinere obere Schranke für jenes Minimum M , als sie bis dahin Hermite ge-
funden hatte. Noch wichtiger aber als dies war es, daß der wesentliche Gedanke
des Minkowskischen Schlußverfahrens nur die Eigenschaft des Ellipsoids, daß
dasselbe eine konvexe Figur ist und einen Mittelpunkt besitzt, benutzte und
daher auf beliebige konvexe Figuren mit Mittelpunkt übertragen werden kon-
nte. Dieser Umstand führte Minkowski zum ersten Male zu der Erkenntnis, daß
2 Hilbert is referring to Minkowski’s lattice point theorem. It states that for any invertible matrix A ∈ Rn×n defining a lattice AZn , any convex set in Rn which is symmetric with respect to the origin and has volume greater than 2n | det(A)| contains a nonzero lattice point.

überhaupt der Begriff des konvexen Körpers ein fundamentaler Begriff in unserer
Wissenschaft ist und zu deren fruchtbarsten Forschungsmitteln gehört.
Ein konvexer (nirgends konkaver) Körper ist nach Minkowski als ein solcher
Körper definiert, der die Eigenschaft hat, daß, wenn man zwei seiner Punkte
in Auge faßt, auch die ganze geradlinige Strecke zwischen denselben zu dem
Körper gehört.3

Until the end of the 1940s convex geometry was a small discipline in pure
mathematics. This changed dramatically when the breakthrough of general
linear programming came during and shortly after World War II. Leonid
Kantorovich (1912–1986), John von Neumann, and George Dantzig (1914–
2005) are the founding fathers of the theory of linear programming. Nowa-
days, convex geometry is an important toolbox for researchers, algorithm
designers and practitioners in mathematical optimization.
Two very good books which emphasize the relation between convex ge-
ometry and optimization are by Barvinok [2002] and by Gruber [2007]. Less
optimization but more convex geometry is discussed in the encyclopedic
book by Schneider [1993]. Somewhat exceptional, and fun to read, is Chap-
ter VII in the book of Berger [2010] where he gives a panoramic view on
the concept of convexity and its many relations to modern higher geometry.
One should also mention the classical study on convex analysis by Rockafel-
lar [1970]. Boyd and Vandenberghe [2004] provide an excellent starting point
for learning about convex optimization. For more on polytopes we advise to
take a look at the lectures on polytopes by Ziegler [1995].

Exercises
A.1 Let A = {x1 , . . . , xn+2 } be a set containing n + 2 points in Rn .
(a) Show: One can partition A into two sets A1 and A2 such that their
convex hulls intersect: conv A1 ∩ conv A2 6= ∅.
(b) Show: If any proper subset of A is affinely independent, then there
is exactly one possible choice for the partition in (a).
3 It is not easy to translate Hilbert’s praise into English without losing its poetic tone, but here
is an attempt. This proof of a deep theorem in number theory contains little calculation.
Using chiefly geometry, it is a gem of Minkowski’s mathematical craft. With a generalization
to forms having n variables Minkowski’s proof lead to an upper bound M which is more
natural and also much smaller than the bound due to Hermite. More important than the
result itself was his insight, namely that the only salient features of ellipsoids used in the
proof were that ellipsoids are convex and have a center, thereby showing that the proof could
be immediately generalized to arbitrary convex bodies having a center. This circumstance
led Minkowski for the first time to the insight that the notion of a convex body is a
fundamental and very fruitful notion in our scientific investigations ever since.
Minkowski defines a convex (nowhere concave) body as one having the property that, when
one looks at two of its points, the straight line segment joining them entirely belongs to the
body.

(c) Give an example which shows that the statement (a) is wrong when
A only contains n + 1 points.
A.2 Prove the Gauss-Lucas theorem4 : Let f be a complex polynomial in
one variable and let z1 , . . . , zn ∈ C be the roots of f , i.e.
f (z) = (z − z1 )(z − z2 ) · · · (z − zn ).
Show that every root of the derivative f 0 lies in the convex hull of
z1 , . . . , zn where one interprets the complex plane C as R2 .
Hint for n = 2 (but it works also for larger n): For w ∈ C with
f 0 (w) = 0 we have
0 = w − z1 + w − z2 .
Multiply this equation by (w − z1 )(w − z2 ) and use it to show that
w ∈ conv{z1 , z2 }.
A.3 Prove the converse of Lemma A.2.3: Let C ⊆ Rn be a convex set and
let x be a point lying in C. Suppose that for every y ∈ C there is a
point z ∈ C so that
x = (1 − α)y + αz with α ∈ (0, 1),
then x is an interior point of C.
A.4 Give a proof for the following statement: Let C ⊆ Rn be a closed
convex set and let x ∈ Rn \ C a point lying outside of C. A separating
hyperplane H is defined in Lemma A.4.1. Consider a point y on the line
aff{x, πC (x)} which lies on the same side of the separating hyperplane
H as x. Then, πC (x) = πC (y).
A.5 Show that
CP n = {α1 x1 xT 1 + · · · + αN xN xT N : N ∈ N, αi ∈ R+ , xi ∈ Rn+ (i = 1, . . . , N )}

is a closed, unbounded, convex set.


A.6 For α > 0 determine the maximum
max{ x1 · · · xn : x ∈ Rn+ , x1 + · · · + xn = α }.
4 As an addition to his third proof of the fundamental theorem of algebra he claimed: Carl
Friedrich Gauss Werke, Band III., page 112: Lehrsatz. Sind a, b, c . . . m, n die Wurzeln der
Gleichung f x = 0, a0 , b0 , c0 . . . m0 die Wurzeln der Gleichung f 0 x = 0, wo f 0 x = df x/dx, und
werden dieselben Buchstaben die entsprechenden Punkte in plano bezeichnet, so ist, wenn
man sich in a, b, c . . . m, n gleiche abstossende oder anziehende Massen denkt, die im
umgekehrten Verhältniss der Entfernung wirken, in a0 , b0 , c0 . . . m0 Gleichgewicht. It is
interesting that the first published proof of this theorem—by Félix Lucas (1836–1914)—only
appeared in 1879.

A.7 Show that the lpn unit ball


{ (x1 , . . . , xn )T ∈ Rn : kxkp = (|x1 |p + · · · + |xn |p )1/p ≤ 1 }

is convex for p = 1, p = 2 and p = ∞ (kxk∞ = maxi=1,...,n |xi |).


Determine the extreme points and determine a supporting hyperplane
for every boundary point.
(*) What happens for the other p?
A.8 Consider a subset S ⊆ Rn+ . Then, S is said to be down-monotone in
Rn+ if for each x ∈ S all vectors y ∈ Rn+ with 0 ≤ y ≤ x belong to S.
Moreover, its antiblocker abl(S) is defined as
abl(S) = {y ∈ Rn+ : y T x ≤ 1 ∀x ∈ S}.
Show: abl(abl(S)) = S if and only if S is nonempty, closed, convex
and down-monotone in Rn+ .
A.9 Let P and Q be polyhedra in Rn such that P ⊆ Q.
(a) Show: P = Q if and only if the following equality holds for all weights
w ∈ Rn :
max{wT x : x ∈ P } = max{wT x : x ∈ Q}. (A.11)

(b) Assume that P ⊆ Q ⊆ Rn+ are down-monotone in Rn+ .


Show: P = Q if and only if (A.11) holds for all nonnegative weights
w ∈ Rn+ .
(c) Show that in (a),(b) it suffices to show that (A.11) holds for all
integer valued weights w.
A.10 Given an integer k ∈ [n] consider the hypersimplex ∆n,k defined by
∆n,k = {x ∈ [0, 1]n : x1 + · · · + xn = k}.
(a) Show: ∆n,k = conv(∆n,k ∩ {0, 1}n ).
(b) Show that every point x ∈ ∆n,k ∩ {0, 1}n is a vertex of ∆n,k .
Appendix B
Positive semidefinite matrices (Version: May 24,
2022)

In this chapter we collect basic facts about positive semidefinite matrices,


which we will need throughout the book.

B.1 Basic definitions


We use the following notation: By [n] we denote the set {1, . . . , n}, In denotes
the n × n identity matrix and Jn denotes the all-ones matrix. We may
sometimes omit the index n if the dimension is clear from the context and
write I or J instead of In or Jn .
A matrix X is symmetric if it coincides with its transpose X T . A matrix
P ∈ Rn×n is orthogonal if P P T = In or, equivalently, P T P = In , i.e. the
rows (respectively the columns) of P form an orthonormal basis of Rn . By
S n we denote the set of symmetric n × n matrices and by O(n) we denote
the set of orthogonal matrices. The set of orthogonal matrices O(n) forms a
group under matrix multiplication.
A diagonal matrix D ∈ S n has entries zero at all off-diagonal positions:
Dij = 0 for all i 6= j.

B.1.1 Characterizations of positive semidefinite matrices


Definition B.1.1 A symmetric matrix X ∈ S n is called positive semidef-
inite if
xT Xx ≥ 0 for all x ∈ Rn .
To indicate that a matrix X is positive semidefinite we often use the
notation X ⪰ 0.
There are many equivalent characterizations of positive semidefinite ma-
trices known. We give the most useful ones below.

First we recall the notions of eigenvalues and eigenvectors. For a matrix


X ∈ Rn×n , a nonzero vector u ∈ Rn is an eigenvector of X if there exists
a scalar λ ∈ R such that Xu = λu. Then λ is the eigenvalue of X for the
eigenvector u. The spectral decomposition theorem is one of the most im-
portant theorems about symmetric matrices. It says that for any symmetric
matrix X there exists an orthonormal system u1 , . . . , un of Rn consisting of
eigenvectors of X.
Theorem B.1.2 (Spectral decomposition theorem) Any real sym-
metric matrix X ∈ S n can be decomposed as
X = λ1 u1 uT 1 + · · · + λn un uT n , (B.1)

where λ1 , . . . , λn ∈ R are the eigenvalues of X and where u1 , . . . , un ∈ Rn


are the corresponding eigenvectors. These eigenvectors form an orthonormal
basis of Rn .
We can write (B.1) in matrix terms as X = P DP T , where D is the
diagonal matrix with the λi ’s on the diagonal and P is the orthogonal matrix
with the ui ’s as its columns.
The smallest, respectively the largest, eigenvalue of a real symmetric ma-
trix X frequently plays an important role. We denote it by λmin (X), respec-
tively by λmax (X).
Next we give several equivalent characterizations of positive semidefinite
matrices.
Theorem B.1.3 (Positive semidefinite matrices) The following as-
sertions are equivalent for a symmetric matrix X ∈ S n .
(i) X is positive semidefinite.
(ii) The smallest eigenvalue of X is nonnegative, i.e., a spectral decomposition
of X is of the form X = Σ_{i=1}^n λ_i u_i u_i^T with all λ_1, . . . , λ_n ≥ 0.
(iii) There exists a matrix L ∈ Rn×k , with k ≥ 1, such that X = LLT . In
addition, such L can be chosen to be lower triangular in which case the
decomposition X = LLT is called a Cholesky decomposition of X.
(iv) There exist vectors v1 , . . . , vn ∈ Rk , with k ≥ 1, such that Xij = viT vj for
all i, j ∈ [n]. These vectors are called a Gram representation of X.
(v) All principal minors of X are non-negative.
Proof (i) =⇒ (ii): Let X = Σ_{i=1}^n λ_i u_i u_i^T be a spectral decomposition of X.
For i ∈ [n] the inequality
0 ≤ u_i^T X u_i = u_i^T (λ_i u_i) = λ_i ‖u_i‖² = λ_i

holds.
(ii) =⇒ (iii): By assumption, X has a decomposition (B.1) where all scalars
λ_i are nonnegative. Define the matrix L ∈ R^{n×n} whose i-th column is
the vector √λ_i u_i. Then X = LL^T holds.
(iii) =⇒ (iv): Assume X = LLT where L ∈ Rn×k . Let vi ∈ Rk denote the
i-th row of L. The equality X = LLT gives directly that Xij = viT vj for all
i, j ∈ [n].
(iv) =⇒ (i): Assume X_{ij} = v_i^T v_j for all i, j ∈ [n], where v_1, . . . , v_n ∈ R^k.
For x ∈ R^n we have
x^T X x = Σ_{i,j=1}^n x_i x_j X_{ij} = Σ_{i,j=1}^n x_i x_j v_i^T v_j = ‖Σ_{i=1}^n x_i v_i‖² ≥ 0.
This shows that X ⪰ 0.


The equivalence (i) ⇐⇒ (v) can be found in any standard textbook on linear
algebra.

Observe that a diagonal matrix D is positive semidefinite if and only if


all its diagonal entries are nonnegative:

D ⪰ 0 ⇐⇒ D_{ii} ≥ 0 for all i ∈ [n].

The above characterization extends to positive definite matrices. A matrix


X is said to be positive definite, which is denoted as X ≻ 0, if it satisfies
any of the following equivalent properties:

(i) xT Xx > 0 for all x ∈ Rn \ {0};


(ii) all eigenvalues of X are strictly positive;
(iii) in a Cholesky decomposition of X, the matrix L is nonsingular;
(iv) in any Gram representation of X as (v_i^T v_j)_{i,j=1}^n, the system of vectors
{v_1, . . . , v_n} has full rank n;
(v) all the principal minors of X are positive. In fact, positivity of all the leading
principal minors already implies positive definiteness; this is known
as Sylvester's criterion.

The following is an easy but useful observation: if X ∈ S^n then X + tI ≻ 0
for any t large enough; namely, for any t > −λ_min(X).
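As a hedged numerical illustration of Theorem B.1.3 and the observation above (again only a sketch; the sizes, the random data and the tolerances are arbitrary), one can build a positive semidefinite matrix from a Gram representation, check characterizations (i) and (ii), and verify that X + tI admits a Cholesky factorization once t > −λ_min(X).

import numpy as np

rng = np.random.default_rng(1)

# Gram representation (iv): X_ij = v_i^T v_j, i.e. X = L L^T with rows v_i
L = rng.standard_normal((5, 3))          # five vectors in R^3, so rank(X) <= 3
X = L @ L.T

# characterization (ii): all eigenvalues of X are nonnegative
lam = np.linalg.eigvalsh(X)
assert lam.min() >= -1e-10

# characterization (i): x^T X x >= 0, tested on randomly sampled x
for _ in range(100):
    x = rng.standard_normal(5)
    assert x @ X @ x >= -1e-10

# the shift observation: X + t I is positive definite for t > -lambda_min(X);
# np.linalg.cholesky succeeds precisely on (numerically) positive definite input
t = max(0.0, -lam.min()) + 1.0
np.linalg.cholesky(X + t * np.eye(5))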

B.1.2 The trace inner product


The trace of an n × n-matrix X is defined as the sum of diagonal coefficients
of X:
Tr(X) = Σ_{i=1}^n X_{ii}.

Taking the trace is a linear operation: For matrices X, Y ∈ Rn×n and α ∈ R


we have
Tr(αX) = αTr(X), Tr(X + Y ) = Tr(X) + Tr(Y ).
The trace satisfies the following properties:
Tr(X) = Tr(X^T),   Tr(XY) = Tr(YX),
Tr(xx^T) = x^T x = ‖x‖² for x ∈ R^n.        (B.2)
Using the fact that Tr(uuT ) = 1 for any unit vector u, combined with (B.1),
we deduce that the trace of a symmetric matrix is equal to the sum of its
eigenvalues.
Lemma B.1.4 If X ∈ S n has eigenvalues λ1 , . . . , λn , then
Tr(X) = λ1 + · · · + λn .
One can define an inner product, denoted as ⟨·, ·⟩, on R^{n×n} by setting
⟨X, Y⟩ = Tr(X^T Y) = Σ_{i,j=1}^n X_{ij} Y_{ij} for X, Y ∈ R^{n×n}.        (B.3)

This defines the Frobenius norm on R^{n×n} by setting
‖X‖_F = √⟨X, X⟩ = ( Σ_{i,j=1}^n X_{ij}² )^{1/2}.

In other words, this is the usual Euclidean norm, just viewing a matrix
as a vector in R^{n²}. Therefore, the Cauchy-Schwarz inequality holds for the
Frobenius norm:
|⟨X, Y⟩| ≤ ‖X‖_F · ‖Y‖_F,
with equality if and only if X and Y are linearly dependent.
For a vector x ∈ R^n we have
⟨X, xx^T⟩ = x^T X x.
If λ is an eigenvalue of X and x is a corresponding eigenvector of unit length,

then we can bound the modulus of λ by the Frobenius norm of X using the
Cauchy-Schwarz inequality:
|λ| = |x^T X x| = |⟨X, xx^T⟩| ≤ ‖X‖_F.

We are mainly interested in the n(n+1)/2-dimensional subspace S^n ⊆ R^{n×n}
of symmetric matrices. The standard basis of S^n is
E_{ij} = ½ (e_i e_j^T + e_j e_i^T) for 1 ≤ i ≤ j ≤ n,
where e_i, with i = 1, . . . , n, is the standard basis of R^n.
With the trace inner product, S^n is a Euclidean space. Then we can identify
the Euclidean space
(S^n, (X, Y) ↦ ⟨X, Y⟩) with (R^{n(n+1)/2}, (x, y) ↦ x^T y)
via the isometry T : S^n → R^{n(n+1)/2} defined by
T(X) = (X_{11}, √2 X_{12}, √2 X_{13}, . . . , √2 X_{1n}, X_{22}, √2 X_{23}, . . . , √2 X_{2n}, . . . , X_{nn}),
where we only consider the upper triangular part of the matrix X.
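For readers who want to experiment, the map T above can be implemented in a few lines. In the sketch below (illustrative only) the helper name svec is our own choice; we simply check on random symmetric matrices that T preserves the trace inner product and the Frobenius norm.

import numpy as np

def svec(X):
    """The map T above: stack the upper triangular part of a symmetric X
    row by row, scaling the off-diagonal entries by sqrt(2)."""
    n = X.shape[0]
    out = []
    for i in range(n):
        out.append(X[i, i])
        out.extend(np.sqrt(2) * X[i, i + 1:])
    return np.array(out)

rng = np.random.default_rng(2)
A, B = rng.standard_normal((2, 4, 4))
X, Y = (A + A.T) / 2, (B + B.T) / 2

# trace inner product on S^n versus Euclidean inner product on R^{n(n+1)/2}
assert np.isclose(np.trace(X.T @ Y), svec(X) @ svec(Y))
assert np.isclose(np.linalg.norm(X, 'fro'), np.linalg.norm(svec(X)))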
The following property, which says that conjugation with an orthogonal
matrix P ∈ O(n), namely the transformation X ↦ P X P^T, is an isometry
of the Euclidean space S^n, is useful to know.
Lemma B.1.5 Let X, Y ∈ S^n be symmetric matrices, and let P ∈ O(n)
be an orthogonal matrix. Then,
⟨X, Y⟩ = ⟨P X P^T, P Y P^T⟩.
Proof We have
⟨P X P^T, P Y P^T⟩ = Tr(P X P^T P Y P^T) = Tr(P X Y P^T) = Tr(X Y P^T P) = Tr(XY) = ⟨X, Y⟩,
where we have used the orthogonality of P, i.e. P^T P = I_n, and the commutativity rule from (B.2).

B.1.3 The positive semidefinite cone S^n_+
Definition B.1.6 We let
S^n_+ = {X ∈ S^n : X ⪰ 0}
denote the set of all positive semidefinite matrices in S^n, called the positive
semidefinite cone. We denote the set of all positive definite matrices by S^n_{++}.

As a direct application of the spectral decomposition (B.1), we find that
the cone S^n_+ is generated by rank one matrices, i.e.,
S^n_+ = cone{xx^T : x ∈ R^n}
      = { Σ_{i=1}^N α_i x_i x_i^T : N ∈ N, α_i ≥ 0, x_i ∈ R^n (i ∈ [N]) }.        (B.4)

Theorem B.1.7 The positive semidefinite cone S^n_+ is a proper convex
cone. Its interior is exactly the convex cone of positive definite matrices
S^n_{++}.
Proof Indeed, S^n_+ is a convex cone in S^n, that is, it is closed under taking
nonnegative linear combinations.
Moreover, S^n_+ is a closed subset of S^n because S^n_+ is by definition an
intersection of closed half spaces:
S^n_+ = ⋂_{x∈R^n} {X ∈ S^n : ⟨X, xx^T⟩ ≥ 0}.
To show that S^n_+ is pointed we choose X ∈ S^n_+ so that also −X ∈ S^n_+.
Consider the smallest eigenvalue of −X, which is nonnegative. Then,
0 ≤ λ_min(−X) = −λ_max(X).
Since the largest eigenvalue of X has to be nonnegative as well, it equals
zero, so all eigenvalues of X are zero. Hence, X = 0.
The positive semidefinite cone S^n_+ is full-dimensional because its interior
is not empty; its interior equals the set of all positive definite matrices S^n_{++},
see Exercise B.1.
A very important fact is that the cone S^n_+ is self-dual, i.e., it coincides
with its dual cone, which is defined as
(S^n_+)* = {Y ∈ S^n : ⟨Y, X⟩ ≥ 0 for all X ∈ S^n_+}.
The following lemma is sometimes attributed to the Hungarian mathematician Lipót
Fejér (1880–1959).
Lemma B.1.8 For a symmetric matrix X ∈ S^n,
X ⪰ 0 ⇐⇒ ⟨X, Y⟩ ≥ 0 for all Y ∈ S^n_+.
In particular, (S^n_+)* = S^n_+.

Proof The direction “⇐=” is easy: If ⟨X, Y⟩ ≥ 0 for all Y ∈ S^n_+ then, in
particular for Y = yy^T ⪰ 0, we obtain that y^T X y ≥ 0, which shows X ⪰ 0.

The direction “=⇒” uses the spectral decomposition theorem. We can
write a positive semidefinite matrix Y as Y = Σ_{i=1}^n λ_i u_i u_i^T with λ_i ≥ 0. If
X ⪰ 0, then
⟨X, Y⟩ = ⟨X, Σ_{i=1}^n λ_i u_i u_i^T⟩ = Σ_{i=1}^n λ_i ⟨X, u_i u_i^T⟩ = Σ_{i=1}^n λ_i u_i^T X u_i ≥ 0.
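A quick numerical sanity check of Lemma B.1.8 (an illustration only; the random instances are arbitrary): the trace inner product of two positive semidefinite matrices is never negative.

import numpy as np

rng = np.random.default_rng(3)

def random_psd(n):
    # any matrix of the form L L^T is positive semidefinite (Theorem B.1.3 (iii))
    L = rng.standard_normal((n, n))
    return L @ L.T

for _ in range(1000):
    X, Y = random_psd(4), random_psd(4)
    assert np.trace(X @ Y) >= -1e-9      # <X, Y> = Tr(XY) >= 0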

B.2 Basic properties


B.2.1 The Schur complement
We recall some basic operations about positive semidefinite matrices. The
proof of the following lemma is easy and left as an exercise.
Lemma B.2.1 (i) If X ⪰ 0 then every principal submatrix of X is positive
semidefinite.
(ii) Any matrix congruent to X ⪰ 0, i.e., of the form P X P^T where P is
nonsingular, is positive semidefinite: if P ∈ R^{n×n} is nonsingular, then
X ⪰ 0 ⇐⇒ P X P^T ⪰ 0.

(iii) Let X ∈ S^n be a matrix having the following block-diagonal form:
X = ( A   0
      0   B ).
Then,
X ⪰ 0 ⇐⇒ A ⪰ 0 and B ⪰ 0.

We now introduce the notion of Schur complement 1 , which can be very


useful for showing positive semidefiniteness.
Lemma B.2.2 Let X ∈ S^n be a matrix in block-form
X = ( A     B
      B^T   C ),        (B.5)
where A ∈ S^p, C ∈ S^{n−p} and B ∈ R^{p×(n−p)}. If A is nonsingular, then
X ⪰ 0 ⇐⇒ A ⪰ 0 and C − B^T A^{−1} B ⪰ 0.
The matrix C − B^T A^{−1} B is called the Schur complement of A in X.


1 after Issai Schur (1875–1941)

Proof The following identity holds:
X = P^T ( A   0
          0   C − B^T A^{−1} B ) P,   where P = ( I   A^{−1}B
                                                  0   I ),
which we get from a calculation with matrices:
( I             0
  (A^{−1}B)^T   I ) ( A   0
                      0   C − B^T A^{−1} B ) ( I   A^{−1}B
                                               0   I )
  = ( I             0
      (A^{−1}B)^T   I ) ( A   B
                          0   C − B^T A^{−1} B )
  = ( A     B
      B^T   C ).
Using the previous lemma the result follows: As P is nonsingular, we deduce
that X ⪰ 0 if and only if (P^{−1})^T X P^{−1} ⪰ 0, which is thus equivalent to A ⪰ 0
and C − B^T A^{−1} B ⪰ 0.

Note that from the proof we can also deduce
X ≻ 0 ⇐⇒ A ≻ 0 and C − B^T A^{−1} B ≻ 0.
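The following NumPy sketch (illustrative only; the block sizes, the random data and the helper is_psd are our own choices) tests Lemma B.2.2 on a random instance by comparing positive semidefiniteness of X with that of A and of the Schur complement C − B^T A^{−1} B.

import numpy as np

rng = np.random.default_rng(4)

def is_psd(M, tol=1e-9):
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

p, q = 3, 2
M = rng.standard_normal((p + q, p + q))
X = M @ M.T                              # a random positive semidefinite matrix
A, B, C = X[:p, :p], X[:p, p:], X[p:, p:]

# A is nonsingular (in fact positive definite) with probability one
S = C - B.T @ np.linalg.solve(A, B)      # Schur complement of A in X
assert is_psd(X) == (is_psd(A) and is_psd(S))

# perturbing the lower-right block destroys positive semidefiniteness,
# and the Schur complement detects this as well
X2 = X.copy()
X2[p:, p:] -= 100 * np.eye(q)
S2 = X2[p:, p:] - B.T @ np.linalg.solve(A, B)
assert not is_psd(X2)
assert is_psd(X2) == (is_psd(A) and is_psd(S2))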

B.2.2 Properties of the kernel


Here is a first useful property of the kernel of positive semidefinite matrices.
Lemma B.2.3 Assume X ∈ S n is positive semidefinite and let x ∈ Rn .
Then,
Xx = 0 ⇐⇒ xT Xx = 0.

Proof The “only if” part is clear.


Conversely, decompose the vector x = Σ_{i=1}^n x_i u_i in an orthonormal basis
of eigenvectors of X. Then,
Xx = Σ_{i=1}^n λ_i x_i u_i   and   x^T X x = Σ_{i=1}^n λ_i x_i².
Hence,
0 = x^T X x =⇒ 0 = Σ_{i=1}^n λ_i x_i² =⇒ x_i = 0 if λ_i > 0.

This shows that x is a linear combination of eigenvectors ui with eigenvalue


λi = 0, and thus Xx = 0.
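A small numerical illustration of Lemma B.2.3 (not part of the text; the data are arbitrary): for a singular positive semidefinite matrix, an eigenvector for the eigenvalue 0 satisfies x^T X x = 0 and indeed lies in the kernel.

import numpy as np

rng = np.random.default_rng(5)
L = rng.standard_normal((4, 2))
X = L @ L.T                              # psd of rank at most 2, so singular

lam, U = np.linalg.eigh(X)
x = U[:, 0]                              # eigenvector for the (numerically) zero eigenvalue

assert abs(x @ X @ x) < 1e-10            # x^T X x = 0 ...
assert np.allclose(X @ x, 0)             # ... and indeed X x = 0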

Clearly, X ⪰ 0 implies X_{ii} ≥ 0 for all i, because X_{ii} = e_i^T X e_i ≥ 0.


Moreover, if a positive semidefinite matrix X has a zero diagonal entry at
position (i, i) then the whole i-th row/column is identically zero. This follows
from the following lemma:
Lemma B.2.4 Let X ∈ S^n_+ be a positive semidefinite matrix in block-form
X = ( A     B
      B^T   C ),        (B.6)

where A ∈ S p , C ∈ S n−p and B ∈ Rp×(n−p) . Assume y ∈ Rp belongs to the


kernel of A, i.e., Ay = 0. Then the vector x = (y, 0, . . . , 0) ∈ Rn (obtained
from y by adding zero coordinates at the remaining n − p positions) belongs
to the kernel of X, i.e., Xx = 0.
Proof We have xT Xx = y T Ay = 0, which, in view of Lemma B.2.3, implies
Xx = 0.
We conclude with the following property: The inner product of two posi-
tive semidefinite matrices is zero if and only if their matrix product is equal
to 0.
Lemma B.2.5 Let X, Y ⪰ 0. Then,
⟨X, Y⟩ = 0 ⇐⇒ XY = 0.
Proof The “if” part is clear since ⟨X, Y⟩ = Tr(XY).
Assume now that ⟨X, Y⟩ = 0 holds. Let Y = Σ_{i=1}^n λ_i u_i u_i^T be a spectral
decomposition of Y, where λ_i ≥ 0 and the vectors u_i form an orthonormal
basis. Then,
0 = ⟨X, Y⟩ = Σ_{i=1}^n λ_i ⟨X, u_i u_i^T⟩.
This implies that each summand λ_i ⟨X, u_i u_i^T⟩ = λ_i u_i^T X u_i is equal to 0, since
λ_i ≥ 0 and u_i^T X u_i ≥ 0, as X ⪰ 0. Hence, λ_i > 0 implies u_i^T X u_i = 0 and
thus X u_i = 0 by Lemma B.2.3. Therefore, each term λ_i X u_i is 0 and thus
XY = X ( Σ_{i=1}^n λ_i u_i u_i^T ) = Σ_{i=1}^n λ_i X u_i u_i^T = 0.
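Lemma B.2.5 can also be observed numerically. In the sketch below (illustrative only; the vectors are arbitrary) two rank-one positive semidefinite matrices built from orthogonal vectors have zero inner product and zero matrix product, while an indefinite example shows that positive semidefiniteness cannot be dropped.

import numpy as np

u = np.array([1.0, 1.0, 0.0])
v = np.array([1.0, -1.0, 0.0])            # u and v are orthogonal
X = np.outer(u, u)                        # positive semidefinite, rank one
Y = np.outer(v, v)                        # positive semidefinite, rank one

assert np.isclose(np.trace(X @ Y), 0.0)   # <X, Y> = 0
assert np.allclose(X @ Y, 0.0)            # hence XY = 0, as the lemma predicts

# without positive semidefiniteness the equivalence may fail:
Z = np.array([[0.0, 1.0], [1.0, 0.0]])    # indefinite
W = np.eye(2)
assert np.isclose(np.trace(Z @ W), 0.0) and not np.allclose(Z @ W, 0.0)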

B.2.3 Kronecker and Hadamard products


Given two matrices A = (Aij ) ∈ Rn×m and B = (Bhk ) ∈ Rp×q , their
Kronecker product (or tensor product) is the matrix A⊗B ∈ Rnp×mq defined

by
A ⊗ B = ( A_{11}B   . . .   A_{1m}B
          A_{21}B   . . .   A_{2m}B
            ...               ...
          A_{n1}B   . . .   A_{nm}B ),

which is the n × m block-matrix whose (i, j)-th block is the p × q matrix


Aij B for all i ∈ [n], j ∈ [m].
The most convenient way to describe single entries of A ⊗ B is by using
quadruple indices. For this we write A(i,j) for Aij and B(h,k) for Bhk . Then,

(A ⊗ B)((i,h),(j,k)) = A(i,j) B(h,k) ∀i ∈ [n], j ∈ [m], h ∈ [p], k ∈ [q].

The definition of Kronecker product of matrices includes in particular


defining the Kronecker product u ⊗ v ∈ Rnp of two vectors u ∈ Rn and
v ∈ Rp , with entries (u ⊗ v)ih = ui vh for i ∈ [n], h ∈ [p].
Given two matrices A, B ∈ Rn×m , their Schur-Hadamard product is the
matrix A ◦ B ∈ Rn×m with entries

(A ◦ B)ij = Aij Bij ∀i ∈ [n], j ∈ [m].

Note that A ◦ B coincides with the principal submatrix of A ⊗ B indexed


by the subset of all “diagonal” pairs of indices of the form ((i, i), (j, j)) for
i ∈ [n], j ∈ [m].

Here are some (easy to verify) facts about these products, where the ma-
trices and vectors have the appropriate sizes.

(i) (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).


(ii) In particular, (A ⊗ B)(u ⊗ v) = (Au) ⊗ (Bv).
(iii) Assume A ∈ S n and B ∈ S p have, respectively, eigenvalues α1 , . . . , αn
and β1 , . . . , βp . Then A⊗B ∈ S np has eigenvalues αi βh for i ∈ [n], h ∈ [p].
In particular,

A, B ⪰ 0 =⇒ A ⊗ B ⪰ 0 and A ◦ B ⪰ 0,

and

A ⪰ 0 =⇒ A^{◦k} = ((A_{ij})^k)_{i,j=1}^n ⪰ 0 for all k ∈ N.
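These facts are easy to check numerically. The sketch below (an illustration only, with arbitrary random data) uses numpy.kron for the Kronecker product and entrywise multiplication for the Hadamard product, and verifies the eigenvalue statement in (iii) together with the resulting positive semidefiniteness.

import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3)); A = A @ A.T       # positive semidefinite
B = rng.standard_normal((3, 3)); B = B @ B.T       # positive semidefinite

K = np.kron(A, B)                                  # Kronecker product, 9 x 9
H = A * B                                          # Hadamard (entrywise) product

# fact (iii): the eigenvalues of A (x) B are exactly the products alpha_i * beta_h
alpha, beta = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)
assert np.allclose(np.sort(np.linalg.eigvalsh(K)),
                   np.sort(np.outer(alpha, beta).ravel()))

# consequently A (x) B and A o B (a principal submatrix of A (x) B) are psd
assert np.linalg.eigvalsh(K).min() >= -1e-9
assert np.linalg.eigvalsh(H).min() >= -1e-9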



B.2.4 Block-diagonal matrices


Given matrices X_1 ∈ S^{n_1}, . . . , X_r ∈ S^{n_r},
X = X_1 ⊕ · · · ⊕ X_r = ( X_1   0     . . .   0
                          0     X_2   . . .   0
                          ...                 ...
                          0     0     . . .   X_r )        (B.7)
denotes a block-diagonal matrix X ∈ S n , where n = n1 + · · · + nr . Then, X
is positive semidefinite if and only if all the blocks X1 , . . . , Xr are positive
semidefinite.
Given two sets of matrices A and B, A ⊕ B denotes the set of all matrices
X ⊕ Y , where X ∈ A and Y ∈ B. Moreover, for an integer m ≥ 1, mA
denotes A ⊕ · · · ⊕ A, the m-fold sum.
From an algorithmic point of view it is much more economical to deal
with positive semidefinite matrices in block-form like (B.7).
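As an illustration of this point (not part of the text; the blocks are arbitrary and the function scipy.linalg.block_diag is assumed to be available), the sketch below forms X_1 ⊕ X_2 and checks positive semidefiniteness block by block.

import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(7)
L1 = rng.standard_normal((3, 3)); X1 = L1 @ L1.T   # positive semidefinite block
L2 = rng.standard_normal((2, 2)); X2 = L2 @ L2.T   # positive semidefinite block

X = block_diag(X1, X2)                             # X = X1 (+) X2, a 5 x 5 matrix

def is_psd(M, tol=1e-9):
    return np.linalg.eigvalsh(M).min() >= -tol

# X is psd if and only if every block is psd; for many blocks, checking the
# blocks separately is much cheaper than checking the full matrix
assert is_psd(X) == (is_psd(X1) and is_psd(X2))

# making one block indefinite destroys positive semidefiniteness of X
Y = block_diag(X1, X2 - 100 * np.eye(2))
assert not is_psd(Y)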
For instance, if we have a set A of matrices that pairwise commute, then
it is well known that they admit a common set of eigenvectors. In other
words, there exists an orthogonal matrix P ∈ O(n) such that the matrices
P T XP are diagonal for all X ∈ A.
In general one may use the following powerful result about C*-algebras,
which permits showing that certain sets of matrices can be block-diagonalized.
Consider a non-empty set A ⊆ Cn×n of matrices. A is said to be a C ∗ -
algebra if it satisfies the following conditions:
(i) A is closed under matrix addition and multiplication, and under scalar
multiplication.
(ii) For any matrix A ∈ A, its conjugate transpose A∗ also belongs to A.
For instance, the full matrix algebra C^{n×n} is a simple instance of a C*-algebra,
and so is the algebra ⊕_{i=1}^r m_i C^{n_i×n_i}, where n_i, m_i are integers.

The following fundamental result shows that up to a unitary transformation
this is the general form of a C*-algebra.
Theorem B.2.6 (Wedderburn-Artin theorem) Assume A is a C ∗ -
algebra of matrices in Cn×n containing the identity matrix. Then there exists
a unitary matrix P (i.e., such that P P ∗ = In ) and integers r, n1 , m1 , . . . , nr , mr ≥
1 such that the set P ∗ AP = {P ∗ XP : X ∈ A} is equal to
m1 Cn1 ×n1 ⊕ . . . ⊕ mr Cnr ×nr .
See e.g. the thesis of Gijswijt [2005] for a detailed exposition and its use
for bounding the size of error correcting codes in finite fields.

Exercises
B.1 Show: int S^n_+ = S^n_{++}.
B.2 Recall that a complex square matrix A ∈ C^{n×n} is Hermitian (or self-adjoint)
if A = A*, i.e., A_{ij} = \overline{A_{ji}} for all entries of A. The Hermitian
matrices form a real vector space (of dimension n²), with the Frobenius
inner product
⟨A, B⟩ = Σ_{ij} \overline{A_{ij}} B_{ij} = Tr(A*B).
A Hermitian matrix M ∈ C^{n×n} is positive semidefinite if z*Mz ≥ 0 for
all z ∈ C^n, or equivalently if all eigenvalues of M are non-negative.
Consider the set H^n_+ of positive semidefinite complex n×n matrices
as a subset of the Hermitian matrices. Show:
(a) H^n_+ is a self-dual proper convex cone for any n ≥ 1.
(b) H^2_+ is isometric to L^{3+1}.
B.3 Given x_1, . . . , x_n ∈ R, consider the following matrix
X = ( 1     x_1   . . .   x_n
      x_1   x_1    0       0
      ...          ...
      x_n    0     0      x_n ).
That is, X ∈ S^{n+1} is the matrix indexed by {0, 1, . . . , n}, with entries
X_{00} = 1, X_{0i} = X_{i0} = X_{ii} = x_i for i ∈ [n], and all other entries are
equal to 0.
Use the Schur complement to show:
X ⪰ 0 ⇐⇒ x_i ≥ 0 for all i ∈ [n] and Σ_{i=1}^n x_i ≤ 1.
B.4 Assume that X ∈ S^n satisfies the condition:
X_{ii} ≥ Σ_{j∈[n]: j≠i} |X_{ij}| for all i ∈ [n].
(Then X is said to be diagonally dominant.)
Show: X ⪰ 0.
B.5 A matrix X is called scaled diagonally dominant if there is a positive
definite diagonal matrix D so that DXD is diagonally dominant.
Show that a scaled diagonally dominant matrix is positive semidefinite.
B.6 Let X be a symmetric matrix whose entries are either +1 or −1. Show:
X ⪰ 0 if and only if X = xx^T for some x ∈ {±1}^n.
References

[2003] M. Aigner, G.M. Ziegler. Proofs from The Book, Springer, Berlin, 1998.
[1965] N.I. Akhiezer, The classical moment problem, Hafner, New York, 1965.
[2006] N. Alon, K. Makarychev, Y. Makarychev, A. Naor, Quadratic forms on
graphs, Inventiones Mathematicae 163 (2006) 499–522.
[2006] N. Alon, A. Naor, Approximating the cut-norm via Grothendieck’s inequal-
ity, SIAM Journal on Computing 35 (2006) 787–803.
[2012] M.F. Anjos and J.B. Lasserre (eds), Handbook on Semidefinite, Conic and
Polynomial Optimization [International Series in Operations Research &
Management Science, Volume 166], Springer, New York, 2012, pp. 25–60.
[2000] K. Anstreicher, H. Wolkowicz, On Lagrangian relaxation of quadratic ma-
trix constraints, SIAM Journal on Matrix Analysis and Applications 22
(2000) 41–55.
[1977] K. Appel, W. Haken, Every planar map is four colorable. I. Discharging,
Illinois Journal of Mathematics 21 (1977) 429–490.
[1977] K. Appel, W. Haken, J. Koch, Every planar map is four colorable. II.
Reducibility. Illinois Journal of Mathematics 21 (1977) 491–567.
[2009] S. Arora, B. Barak, Computational Complexity — A Modern Approach,
Cambridge University Press, Cambridge, 2009.
[1927] E. Artin, Ueber die Zerlegung definiter Funktionen in Quadrate, Abh. Math.
Sem. Hamburg 5, 1927.
[2008] C. Bachoc, F. Vallentin, New upper bounds for kissing numbers from
semidefinite programming, Journal of the American Mathematical Society
21 (2008) 909–924.
[2012] C. Bachoc, D.C. Gijswijt, A. Schrijver, F. Vallentin, Invariant semidefinite
programs, in: Handbook on Semidefinite, Conic and Polynomial Optimiza-
tion (M.F. Anjos, J.B. Lasserre, eds.) [International Series in Operations
Research & Management Science 166], Springer, New York, 2012, pp. 219–
269.
[1992] K. Ball, Ellipsoids of maximal volume in convex bodies, Geometriae Dedi-
cata 41 (1992) 241–250.
[1997] K. Ball, An Elementary Introduction to Modern Convex Geometry, in:
Flavors of Geometry (S. Levy, ed.) [MSRI Publications 31], Cambridge
University Press, Cambridge, 1997, pp. 1–58.
[1983] F. Barahona, The max-cut problem in graphs not contractible to K5 , Op-
erations Research Letters 2 (1983) 107–111.

[2002] A. Barvinok, A course in convexity, [Graduate Studies in Mathematics 54],


American Mathematical Society, Providence, RI, 2002.
[2007] M. Belk, R. Connelly, Realizability of graphs, Discrete & Computational
Geometry 37 (2007) 125–137.
[2009] A. Ben-Tal, L. El Ghaoui, A. Nemirovski, Robust Optimization, [Prince-
ton Series in Applied Mathematics], Princeton University Press, Princeton,
New Jersey, 2009.
[2001] A. Ben-Tal, A. Nemirovski, Lectures on modern convex optimization —
Analysis, algorithms, and engineering applications, [MPS/SIAM Series on
Optimization], Society for Industrial and Applied Mathematics, Philadel-
phia, Pennsylvania; Mathematical Programming Society, Philadelphia,
Pennsylvania, 2001.
[2010] M. Berger, Geometry Revealed — A Jacob’s Ladder to Modern Higher Ge-
ometry, Springer, Heidelberg, 2010.
[2006] G. Blekherman, There are significantly more nonnegative polynomials than
sums of squares, Israel Journal of Mathematics 153 (2006) 355–380.
[2012] G. Blekherman, P.A. Parrilo, R.R. Thomas (eds.), Semidefinite Optimiza-
tion and Convex Algebraic Geometry [MOS-SIAM Series on Optimization,
Volume 13], Society for Industrial and Applied Mathematics, Philadelphia,
Pennsylvania; Mathematical Optimization Society, Philadelphia, Pennsyl-
vania, 2012.
[1998] J. Bochnak, M. Coste, M.-F. Roy, Géometrie Algebrique Réelle, Ergeb.
Math. 12, Springer, 1987. Real Algebraic Geometry, Ergeb. Math. 36,
Springer, 1998.
[1985] J. Bourgain, On Lipschitz embedding of finite metric spaces in Hilbert
space, Israel Journal of Mathematics 52 (1985), 46–52.
[1986] J. Bourgain, The metrical interpretation of superreflexivity in Banach
spaces. Israel Journal of Mathematics 56 (1986), 222–230.
[2004] S. Boyd, L. Vandenberghe, Convex optimization, Cambridge University
Press, Cambridge, 2004.
[2010] J. Briët, F.M. de Oliveira Filho, F. Vallentin, The Grothendieck problem
with rank constraint, in: Proceedings of the 19th Symposium on Mathemat-
ical Theory of Networks and Systems, 2010, pp. 111–113.
[2014] J. Briët, F.M. de Oliveira Filho, F. Vallentin, Grothendieck inequalities
for semidefinite programs with rank constraint, Theory of Computing. An
Open Access Journal 10 (2014) 77–105.
[2022] A.E. Brouwer, H. Van Maldeghem, Strongly regular graphs [Encyclopedia
of Mathematics and its Applications 182], Cambridge University Press,
Cambridge, 2022.
[2004] B. Casselman, The difficulties of kissing in three dimensions, Notices of the
American Mathematical Society 51 (2004), 884–885.
[1995] M.D. Choi, T.Y. Lam, B. Reznick, Sums of squares of real polynomials,
in: K-theory and algebraic geometry: connections with quadratic forms and
division algebras (Santa Barbara, California, 1992) [Proceedings of Sym-
posia in Pure Mathematics, Volume 58], American Mathematical Society,
Providence, Rhode Island, 1995, pp. 103–126.
[1975] V. Chvátal, On certain polytopes associated with graphs, Journal of Com-
binatorial Theory, Series B 18 (1975) 138–154.
[2006] M. Chudnovsky, N. Robertson, P. Seymour, R. Thomas, The strong perfect
graph theorem, Annals of Mathematics (2) 164 (2006) 51–229.

[2021] S.M. Cioabă, H. Gupta, F. Ihringer, H. Kurihara, The least Euclidean dis-
tortion constant of a distance-regular graph, arXiv:2109.09708 [math.CO],
2021
[2010] H. Cohn, Order and disorder in energy minimization, in: Proceedings of the
International Congress of Mathematicians, Hindustan Book Agency, New
Delhi, 2010, pp. 2416–2443.
[1992] D.A. Cox, J.B. Little, D. O’Shea. Ideals, Varieties and Algorithms — An
Introduction to Computational Algebraic Geometry and Commutative Al-
gebra [Undergraduate Texts in Mathematics], Springer, New York, 1992.
[1998] D.A. Cox, J.B. Little, D. O’Shea. Using Algebraic Geometry [Graduate
Texts in Mathematics 185], Springer, New York, 1998.
[1996] R.E. Curto, L.A. Fialkow, Solution of the truncated complex moment prob-
lem for flat data, Memoirs of the American Mathematical Society 119
(1996) x+52 pp.
[1957] C. Davis, All convex invariant functions of hermitian matrices, Archiv der
Mathematik 8 (1957) 276–278.
[1973] P. Delsarte, An Algebraic Approach to the Association Schemes of Cod-
ing Theory [Philips Research Reports Supplements 1973 No. 10], Philips
Research Laboratories, Eindhoven, 1973.
[1997] M.M. Deza, M. Laurent, Geometry of Cuts and Metrics [Algorithms and
Combinatorics 15], Springer, Berlin, 1997.
[1997] R. Diestel, Graph Theory [Graduate Texts in Mathematics 173], Springer,
New York, 1997.
[1956] R.J. Duffin, Infinite programs, in: Linear Inequalities and Related Sys-
tems (H.W. Kuhn, A.W. Tucker, eds.) [Annals of Mathematics Studies
38], Princeton University Press, Princeton, New Jersey, 1956, pp. 157–170.
[1969] P. Enflo, On the nonexistence of uniform homeomorphisms between Lp -
spaces, Arkiv för Matematik 8 (1969), 103–105.
[2016] H. Fawzi, J. Saunderson, P. Parrilo, Sparse sums of squares on finite abelian
groups and improved semidefinite lifts, Mathematical Programming, Series
A 160(1) (2016) 149–191.
[1968] A.V. Fiacco, G.P. McCormick, Nonlinear Programming: Sequential Uncon-
strained Minimization Techniques, Wiley, New York-London-Sydney, 1968.
[1979] M.R. Garey, D.S. Johnson, Computers and Intractability — A Guide to the
Theory of NP-Completeness, Freeman, San Francisco, California, 1979.
[1976] M.R. Garey, D.S. Johnson, L. Stockmeyer, Some simplified NP-complete
problems, Theoretical Computer Science 1 (1976) 237–267.
[1996] G.S. Gasparian, Minimal imperfect graphs: a simple approach, Combina-
torica 16 (1996) 209–212.
[2005] D. Gijswijt, Matrix algebras and semidefinite programming techniques for
codes Ph.D. thesis, University of Amsterdam, 2005.
[1997] M.X. Goemans, Semidefinite programming in combinatorial optimization,
Mathematical Programming, Series B 79 (1997), 143–161.
[1995] M.X. Goemans, D.P. Williamson, Improved approximation algorithms for
maximum cuts and satisfiability problems using semidefinite programming,
Journal of the Association for Computing Machinery 42 (1995) 1115–1145.
[2008] G. Gonthier, Formal proof—the four-color theorem, Notices of the Ameri-
can Mathematical Society 55 (2008) 1382–1393.

[2007] D. Grimm, T. Netzer, M. Schweighofer, A note on the representation of


positive polynomials with structured sparsity, Archiv der Mathematik 89(5)
(2007) 399–403.
[1990] R. Grone, S. Pierce, W. Watkins, Extremal correlation matrices, Linear
Algebra and its Applications 134 (1990) 63–70.
[1981] M. Grötschel, L. Lovász, A. Schrijver, The ellipsoid method and its conse-
quences in combinatorial optimization, Combinatorica 1 (1981) 169–197.
[1988] M. Grötschel, L. Lovász, A. Schrijver, Geometric Algorithms and Combi-
natorial Optimization [Algorithms and Combinatorics 2], Springer, Berlin,
1988.
[2007] P.M. Gruber, Convex and Discrete Geometry [Grundlehren der Mathema-
tischen Wissenschaften 336], Springer, Berlin, 2007.
[2008] N. Gvozdenović, M. Laurent, The operator Ψ for the chromatic number of
a graph. SIAM Journal on Optimization 19 (2008) 572–591.
[1936] E.K. Haviland, On the momentum problem for distributions in more than
one dimension, American Journal of Mathematics 58 (1936) 164–168.
[2012] M. Henk, Löwner-John ellipsoids, Documenta Mathematicae [Extra vol-
ume.: Optimization stories], 95–106, 2012.
[2005] D. Henrion, J.-B. Lasserre, Detecting global optimality and extracting so-
lutions in GloptiPoly, In Positive Polynomials in Control, D. Henrion and
A. Garulli (eds.), Springer, Lecture Notes on Control and Information Sci-
ences 312 (2005) 293–310.
[2009] D. Henrion, J.-B. Lasserre, J. Loefberg. GloptiPoly 3: moments, optimiza-
tion and semidefinite programming, Optimization Methods & Software 24
(2009) 761–779.
[1888] D. Hilbert, Ueber die Darstellung definiter Formen als Summe von For-
menquadraten, Mathematische Annalen 32 (1888) 342–350.
[2006] S. Hoory, N. Linial, A. Wigderson, Expander graphs and their applications,
American Mathematical Society, Bulletin, New Series 43 (2006) 439–561.
[1891] A. Hurwitz, Ueber den Vergleich des arithmetischen und des geometrischen
Mittels, J. Reine angew. Math. 108, 1891.
[1987] R.D. Hill, S.R. Waters, On the cone of positive semidefinite matrices, Linear
Algebra and its Applications 90 (1987) 81–88.
[2016] C. Josz and D. Henrion, Strong duality in Lasserre’s hierarchy for polyno-
mial optimization, Optimization Letters 10:3–10, 2016.
[2011] A.R. Karlin, C. Mathieu, C.T. Nguyen, Integrality Gaps of Linear and
Semi-Definite Programming Relaxations for Knapsack, In Integer Pro-
gramming and Combinatoral Optimization - IPCO 2011, O. Günlük, G.J.
Woeginger (eds), Lecture Notes in Computer Science, vol 6655. Springer,
Berlin, Heidelberg.
[2014] S. Khot, Hardness of Approximation, in: Proceedings of the International
Congress of Mathematicians, Kyung Moon Sa, Seoul, 2014, pp. 711–728.
[2013] I. Klep, M. Schweighofer, An exact duality theory for semidefinite pro-
gramming based on sums of squares, Mathematics of Operations Research
38 (2013) 569–590.
[2002] E. de Klerk, Aspects of Semidefinite Programming – Interior Point Algorithms
and Selected Applications, Kluwer, 2002.
[1994] D.E. Knuth, The sandwich theorem, The Electronic Journal of Combina-
torics 1 (1994) Article 1, 48pp.

[2015] T. Kobayashi, T. Kondo, The Euclidean distortion of generalized polygons,


Advances in Geometry 15 (2015), 499–506.
[2016] A. Kurpisz, S. Leppänen, M. Mastrolilli, Tight sum-of-squares bounds for
binary polynomial optimization problems, in 43rd International Colloquium
on Automata, Languages, and Programming, ICALP 2016, Rome, Italy,
July 11-15, pages 78:1-78:14.
[1964] J.-L. Krivine, Anneaux préordonnés, J. Analyse Math., 12:307–326, 1964.
[2001] J.B. Lasserre, Global optimization with polynomials and the problem of
moments, SIAM Journal on Optimization 11 (2001) 796–817.
[2001] J.B. Lasserre, An explicit exact SDP relaxation for nonlinear 0 − 1 pro-
grams, in: Integer Programming and Combinatorial Optimization (Proceed-
ings 8th IPCO Conference, Utrecht, 2001; K. Aardal, A.M.H. Gerards, eds.)
[Lecture Notes in Computer Science 2081] Springer, Berlin, 2001, pp. 293–
303.
[2006] J.-B. Lasserre, Convergent semidefinite relaxations in polynomial optimiza-
tion with sparsity, SIAM Journal on Optimization 17 (2006) 822–843.
[2009] J.B. Lasserre, Moments, Positive Polynomials and Their Applications [Im-
perial College Press Optimization Series 1], Imperial College Press, London,
2010.
[2015] J.B. Lasserre, Introduction to Polynomial and Semi-Algebraic Optimiza-
tion, Cambridge University Press, Cambridge, UK, 2015.
[2008] J.-B. Lasserre, M. Laurent, P. Rostalski, Semidefinite characterization and
computation of real radical ideals, Foundations of Computational Mathe-
matics 8 (2008) 607–647.
[2006] J.-B. Lasserre, T. Netzer, SOS approximations of nonnegative polynomials
via simple high degree perturbations, Mathematische Zeitschrift 256 (2007)
99–112.
[2003] M. Laurent, A comparison of the Sherali-Adams, Lovász-Schrijver, and
Lasserre relaxations for 0-1 programming, Mathematics of Operations Re-
search 28 (2003) 470–496.
[2003] M. Laurent, Lower bound for the number of iterations in semidefinite re-
laxations for the cut polytope, Mathematics of Operations Research 28(4)
(2003) 871–883.
[2004] M. Laurent, Semidefinite relaxations for Max-Cut. In The Sharpest Cut:
The Impact of Manfred Padberg and His Work, M. Grötschel, ed. (2004)
pp. 257–290, MPS-SIAM Series in Optimization 4.
[2005] M. Laurent, Revisiting two theorems of Curto and Fialkow on moment
matrices, Proceedings of the American Mathematical Society 133 (2005)
2965–2976.
[2007a] M. Laurent, Strengthened semidefinite programming bounds for codes,
Mathematical Programming, Series B 109 (2007) 239–261.
[2007b] M. Laurent, Semidefinite representations for finite varieties, Mathematical
Programming 109 (2007) 1–26.
[2009] M. Laurent, Sums of squares, moment matrices and optimization over poly-
nomials, in: Emerging Applications of Algebraic Geometry (M. Putinar, S.
Sullivant, eds.) [IMA Volumes in Mathematics and its Applications 149]
Springer, New York, 2009, pp. 157–270.
[2009] M. Laurent, B. Mourrain, A generalized flat extension theorem for moment
matrices, Archiv der Mathematik 93(1) (2009) 87–98.

[1996] M. Laurent, S. Poljak, On the facial structure of the set of correlation


matrices, SIAM Journal on Matrix Analysis and Applications 17 (1996)
530–547.
[2012] M. Laurent, P. Rostalski, The approach of moments for polynomial equa-
tions, in: Handbook on Semidefinite, Conic and Polynomial Optimization
(M.F. Anjos, J.B. Lasserre, eds.) [International Series in Operations Re-
search & Management Science, Volume 166], Springer, New York, 2012,
pp. 25–60.
[2015] J.R. Lee, P. Raghavendra, D. Steurer, Lower bounds on the size of semidefi-
nite programming relaxations, In STOC’15, Proceedings of the forty-seventh
annual ACM symposium on Theory of Computing (2015) pp. 567–576.
[2003] A.S. Lewis, The mathematics of eigenvalue optimization, Mathematical
Programmming, Series B 97 (2003) 155–176.
[1995] N. Linial, E. London, Y. Rabinovich, The geometry of graphs and some of
its algorithmic applications, Combinatorica 15 (1995), 215–246.
[2000] N. Linial, A. Magen, Least-distortion Euclidean embeddings of graphs:
products of cycles and expanders, Journal of Combinatorial Theory, Se-
ries B 79 (2000), 157–171.
[2002] N. Linial, A. Magen, A. Naor, Girth and Euclidean distortion, Geometric
and Functional Analysis 12 (2002), 380–394.
[2002] N. Linial, Finite metric-spaces—combinatorics, geometry and algorithms,
in: Proceedings of the International Congress of Mathematicians, Higher
Education Press, Beijing, 2002, pp. 573—586.
[2017] B. Litjens, S. Polak, A. Schrijver, Semidefinite bounds for nonbinary codes
based on quadruples, Designs, Codes and Cryptography, An International
Journal 84 (2017) 87–100.
[2012] A. Lubotzky, Expander graphs in pure and applied mathematics, American
Mathematical Society, Bulletin, New Series 49 (2012) 113–162.
[2014] H. Lombardi, D. Perruci and M.-F. Roy, An elementary recursive bound
for effective Positivstellensatz and Hilbert 17-th problem, arXiv:1404.2338,
2014.
[1972] L. Lovász, A characterization of perfect graphs, Journal of Combinatorial
Theory, Series B 13 (1972) 95–98.
[1979] L. Lovász, On the Shannon capacity of a graph, IEEE Transactions on
Information Theory IT-25 (1979) 1–7.
[1991] L. Lovász, A. Schrijver, Cones of matrices and set-functions and 0-1 opti-
mization, SIAM Journal on Optimization 1 (1991) 166–190.
[2008] M. Marshall, Positive Polynomials and Sums of Squares [Mathematical
Surveys and Monographs 146], American Mathematical Society, Provi-
dence, Rhode Island, 2008.
[1999] J. Matoušek, On embedding trees into uniformly convex Banach spaces,
Israel Journal of Mathematics 114 (1999) 221–237.
[2002] J. Matoušek, Lectures on discrete geometry [Graduate Texts in Mathemat-
ics 212], Springer, New York, 2002.
[1978] R.J. McEliece, E.R. Rodemich, H.C. Rumsey, Jr., The Lovász’ bound and
some generalizations, Journal of Combinatorics, Information & System
Sciences 3 (1978) 134–152.
[1965] T.S. Motzkin, E.G. Straus, Maxima for graphs and a new proof of a the-
orem of Turán, Canadian Journal of Mathematics, Journal canadien de
mathématiques 17 (1965) 533–540.

[2012] A. Naor, An introduction to the Ribe program, Japanese Journal of Mathematics 7


(2012) 167–233.
[2007] A. Nemirovski, Advances in convex optimization — Conic programming,
in: International Congress of Mathematicians, European Mathematical So-
ciety Publishing House, Zürich, 2007, pp. 413–444.
[2008] A.S. Nemirovski, M.J. Todd, Interior-point methods for optimization, Acta
Numerica 17 (2008) 191—234.
[1994] Y. Nesterov, A. Nemirovskii, Interior-Point Polynomial Methods in Con-
vex Programming [SIAM Studies in Applied Mathematics 13], Society for
Industrial and Applied Mathematics, Philadelphia, Pennsylvania, 1994.
[1997] Y. Nesterov, Quality of semidefinite relaxation for nonconvex quadratic
optimization, CORE Discussion Paper, Number 9719, Louvain-la-Neuve
(Belgium), 1997.
[2000] Y. Nesterov, Squared functional systems and optimization problems, in:
High Performance Optimization (J.B.G. Frenk, C. Roos, T. Terlaky, S.
Zhang, eds.) [Applied Optimization 33], Springer, Boston, Massachussets,
2000, pp. 405–440.
[2003] I. Newman, Y. Rabinovich, A lower bound on the distortion of embedding
planar metrics into Euclidean space, Discrete & Computational Geometry
29 (2003) 77–81.
[2013] J. Nie, Polynomial optimization with real varieties, SIAM Journal on Op-
timization 23(3) (2013) 1634-1646.
[2014] J. Nie, Optimality conditions and finite convergence of Lasserre's hierarchy,
Mathematical Programming 146(1-2):97–121, 2014.
[2007] J. Nie and M. Schweighofer. On the complexity of Putinar’s Positivstellen-
satz, J. of Complexity 23(1):135–150, 2007.
[1992] M.L. Overton, R.S. Womersley, On the sum of the largest eigenvalues of a
symmetric matrix, SIAM Journal on Matrix Analysis and its Applications
13 (1992) 41–45.
[2000] P.A. Parrilo. Structured Semidefinite Programs and Semialgebraic Geome-
try Methods in Robustness and Optimization, Ph.D. thesis, California In-
stitute of Technology, 2000.
[2002] P. Parrilo, An explicit construction of distinguished representations of poly-
nomials nonnegative over finite sets, IfA Technical Report AUT02-02, ETH
Zurich (2002)
[2003] P. Parrilo, Semidefinite programming relaxations for semialgebraic prob-
lems, Mathematical Programming B 96 (2003) 293–320.
[2005] P. Parrilo. Exploiting algebraic structure in sum of squares programs, In
Positive Polynomials in Control, D. Henrion and A. Garulli, eds., LNCIS
312 (2005) 181–194.
[2000] G. Pataki, The geometry of semidefinite programming, in: Handbook of
Semidefinite Programming (H. Wolkowicz, R. Saigal, L. Vandenberghe,
eds.) [International Series in Operations Research & Management Science
27], Springer, Boston, Massachusetts, 2000.
[2004] F. Pfender, G.M. Ziegler, Kissing numbers, sphere packings and some un-
expected proofs, Notices of the American Mathematical Society 51 (2004)
873–883.
[2007] I. Polik, T. Terlaky, A survey of the S-lemma, SIAM Review 49 (2007)
371–418.

[1994] S. Poljak, Z. Tuza, The expected relative error of the polyhedral approx-
imation of the max-cut problem, Operations Research Letters 16 (1994)
191–198.
[1998] V. Powers, T. Wörmann, An algorithm for sums of squares of real polyno-
mials, Journal of Pure and Applied Algebra 127 (1998) 99–104.
[2001] A. Prestel, C.N. Delzell, Positive Polynomials — From Hilbert’s 17th Prob-
lem to Real Algebra [Springer Monographs in Mathematics], Springer,
Berlin, 2001.
[1993] M. Putinar, Positive polynomials on compact semi-algebraic sets, Indiana
University Mathematics Journal 42 (1993) 969–984.
[1997] M. Ramana, An exact duality theory for semidefinite programming and its
complexity implications, Mathematical Programming, Series B 77 (1997)
129–162.
[1999] S. Rao, Small distortion and volume preserving embeddings for planar and
Euclidean metrics, in: Proceedings of the 15th Annual Symposium on Com-
putational Geometry (Miami Beach, FL, 1999), ACM, New York, 1999, pp.
300–306.
[2001] J. Renegar, A Mathematical View of Interior-Point Methods in Convex
Optimization [MPS/SIAM Series on Optimization], Society for Industrial
and Applied Mathematics, Philadelphia, Pennsylvania; Mathematical Pro-
gramming Society, Philadelphia, Pennsylvania, 2001.
[1995] B. Reznick, Uniform denominators in Hilbert’s Seventeenth Problem,
Mathematische Zeitschrift 220 (1995) 75–97.
[2000] B. Reznick, Some concrete aspects of Hilbert’s 17th problem, in: Real Al-
gebraic Geometry and Ordered Structures (Baton Rouge, Louisiana, 1996;
C.N. Delzell, J.J. Madden, eds.) [Contemporary Mathematics 253], Amer-
ican Mathematical Society, Providence, Rhode Island, 2000, pp. 251–272.
[1997] N. Robertson, D.P. Sanders, P. Seymour, R. Thomas. The four-colour the-
orem, Journal of Combinatorial Theory Series B 70 (1997) 2–44.
[1970] R.T. Rockafellar, Convex analysis [Princeton Mathematical Series 28],
Princeton University Press, Princeton, New Jersey, 1970.
[1997] C. Roos, T. Terlaky, J.-Ph. Vial, Theory and Algorithms for Linear Opti-
mization — An Interior Point Approach [Wiley-Interscience Series in Dis-
crete Mathematics and Optimization], Wiley, Chichester, 1997.
[2017] S. Sakaue, A. Takeda, S. Kim, N. Ito, Exact semidefinite programming re-
laxations with truncated moment matrix for binary polynomial optimiza-
tion, SIAM Journal on Optimization 27(1) (2017) 565–582.
[2004] P. Sarnak, What is . . . an Expander?, Notices of the American Mathematical
Society 51 (2004) 762–763.
[1979] J.B. Saxe, Embeddability of weighted graphs in k-space is strongly NP-
hard, in: Proceedings of Seventeenth Allerton Conference in Communica-
tions, Control and Computing, University of Illinois, Urbana-Champaign,
Illinois, 1979, pp. 480–489.
[1991] K. Schmüdgen, The K-moment problem for compact semi-algebraic sets,
Mathematische Annalen 289 (1991), 203–206.
[2017] K. Schmüdgen, The moment problem, Graduate Texts in Mathematics,
Springer, 2017.
[1993] R. Schneider, Convex bodies: the Brunn-Minkowski theory [Encyclopedia of
Mathematics and its Applications 44], Cambridge University Press, Cam-
bridge, 1993.

[1979] A. Schrijver, A comparison of the Delsarte and Lovász bounds, IEEE Trans-
actions on Information Theory IT-25 (1979), 425–429.
[1986] A. Schrijver, Theory of Linear and Integer Programming [Wiley-
Interscience Series in Discrete Mathematics], Wiley, Chichester, 1986.
[2005] A. Schrijver, New code upper bounds from the Terwilliger algebra and
semidefinite programming, IEEE Transactions on Information Theory IT-
51 (2005) 2859–2866.
[1990] H.D. Sherali, W.P. Adams, A hierarchy of relaxations between the contin-
uous and convex hull representations for zero-one programming problems,
SIAM Journal Discrete Mathematics 3 (1990) 411–430.
[1987] N.Z. Shor, An approach to obtaining global extremums in polynomial math-
ematical programming problems, Kibernetika 5 (1987) 102–106.
[1974] G. Stengle. A Nullstellensatz and a Positivstellensatz in semialgebraic ge-
ometry. Math. Ann. 207 (1974) 87–97.
[2001] M.J. Todd, Semidefinite optimization, Acta Numerica 10 (2001) 515–560.
[2012] L. Trevisan, On Khot’s unique games conjecture, American Mathematical
Society, Bulletin, New Series 49 (2012) 91–111.
[2005] M. Trnovská, Strong duality conditions in semidefinite programming, Jour-
nal of Electrical Engineering 56 (2005) 1–5.
[2010] L. Tunçel, Polyhedral and Semidefinite Programming Methods in Combi-
natorial Optimization [Fields Institute Monographs 27], American Mathe-
matical Society, Providence, Rhode Island; Fields Institute for Research in
Mathematical Sciences, Toronto, Ontario, 2010.
[2008] F. Vallentin, Lecture notes: Semidefinite programs and harmonic analysis,
arXiv:0809.2017 [math.OC], 2008.
[2008] F. Vallentin, Optimal distortion embeddings of distance regular graphs into
Euclidean spaces, Journal of Combinatorial Theory, Series B 98 (2008),
95–104.
[1996] L. Vandenberghe, S. Boyd, Semidefinite programming, SIAM Review 38
(1996), 49–95.
[1998] L. Vandenberghe, S. Boyd, S.-P. Wu, Determinant maximization with lin-
ear matrix inequality constraints, SIAM Journal on Matrix Analysis and
Applications 19 (1998), 499–533.
[2000] H. Wolkowicz, R. Saigal, L. Vandenberghe (eds.), Handbook of Semidefinite
Programming, Boston, Kluwer Academic, 2000.
[2005] M.H. Wright, The interior-point revolution in optimization: history, recent
developments, and lasting consequence, American Mathematical Society,
Bulletin, New Series 42 (2005) 39–56.
[1997] Y. Ye, Interior Point Algorithms — Theory and Analysis [Wiley-
Interscience Series in Discrete Mathematics and Optimization], Wiley, New
York, 1997.
[1995] G.M. Ziegler, Lectures on Polytopes [Graduate Texts in Mathematics 152],
Springer, New York, 1995.
