Lecture 3
Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo
In this lecture, we will discuss one of the most important applications of semidefinite programming,
namely its use in the formulation of convex relaxations of nonconvex optimization problems. We will
present the results from several different, but complementary, points of view. These will also serve us as
starting points for the generalizations to be presented later in the course.
We will first discuss the case of binary quadratic optimization, since there the notation is simpler and the setting perfectly illustrates many of the issues that appear in more complicated problems. Afterwards,
a more general formulation containing arbitrary linear and quadratic constraints will be presented.
1 Binary optimization
Binary (or Boolean) quadratic optimization is a classical combinatorial optimization problem. In the
version we consider, we want to minimize a quadratic function, where the decision variables can only
take the values ±1. In other words, we are minimizing an (indefinite) quadratic form over the vertices
of an n-dimensional hypercube. The problem is formally expressed as:

\[
\begin{aligned}
\text{minimize} \quad & x^T Q x\\
\text{s.t.} \quad & x_i \in \{-1, 1\}
\end{aligned}
\tag{1}
\]
where Q ∈ S^n, the set of n × n symmetric matrices. There are many well-known problems that can be naturally written in the form above. Among these, we mention the maximum cut problem (MAXCUT) discussed below, the 0/1 knapsack, the linear quadratic regulator (LQR) control problem with binary inputs, etc.
Notice that we can model the Boolean constraints using quadratic equations, i.e.,

\[
x_i^2 = 1, \qquad i = 1, \ldots, n.
\]
These n quadratic equations define a finite set, with an exponential number of elements, namely all the n-tuples with entries in {−1, 1}. There are exactly 2^n points in this set, so a direct enumeration approach to (1) is computationally prohibitive when n is large (already for n = 30, we have 2^30 ≈ 10^9).
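To fix ideas, here is a minimal brute-force solver for (1) (a sketch of ours, with our own naming), which makes the exponential scaling explicit:

```python
import itertools
import numpy as np

def brute_force_binary_qp(Q):
    """Minimize x^T Q x over x in {-1,+1}^n by exhaustive enumeration.

    Exponential in n: only practical for small instances (say n <= 25).
    """
    n = Q.shape[0]
    best_val, best_x = np.inf, None
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(signs)
        val = x @ Q @ x
        if val < best_val:
            best_val, best_x = val, x
    return best_val, best_x

# Example: a small random symmetric Q.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
Q = (A + A.T) / 2
f_star, x_star = brute_force_binary_qp(Q)
print(f_star, x_star)
```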
We can thus write the equivalent polynomial formulation:

\[
\begin{aligned}
\text{minimize} \quad & x^T Q x\\
\text{s.t.} \quad & x_i^2 = 1
\end{aligned}
\tag{2}
\]
We will denote the optimal value and optimal solution of this problem as f⋆ and x⋆, respectively. It is well-known that the decision version of this problem is NP-complete (e.g., [GJ79]). Notice that this is true even if the matrix Q is positive definite (i.e., Q ≻ 0), since we can always make Q positive definite by adding to it a constant multiple of the identity; because x^T x = n on the feasible set, this only shifts the objective by a constant.
Example 1 (MAXCUT) The maximum cut (MAXCUT) problem consists of finding a partition of the nodes of a graph G = (V, E) into two disjoint sets V1 and V2 (V1 ∩ V2 = ∅, V1 ∪ V2 = V), in such a way as to maximize the number of edges that have one endpoint in V1 and the other in V2. It has important practical applications, such as optimal circuit layout. The decision version of this problem (does there exist a cut with value greater than or equal to K?) is NP-complete [GJ79].
We can easily rewrite the MAXCUT problem as a binary optimization problem. A standard formu
lation (for the weighted problem) is the following:
\[
\max_{y_i \in \{-1,1\}} \; \frac{1}{4} \sum_{i,j} w_{ij} (1 - y_i y_j),
\tag{3}
\]
where w_ij is the weight corresponding to the (i, j) edge, and is zero if the nodes i and j are not connected. The constraints y_i ∈ {−1, 1} are equivalent to the quadratic constraints y_i^2 = 1.
We can easily convert the MAXCUT formulation into binary quadratic programming. Removing the
constant term, and changing the sign, the original problem is clearly equivalent to:
\[
\min_{y_i^2 = 1} \; \sum_{i,j} w_{ij}\, y_i y_j.
\tag{4}
\]
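As a quick numerical sanity check (our own sketch, with hypothetical names), the following verifies that (3) and (4) differ only by the constant (1/4) Σ_{i,j} w_ij and a sign:

```python
import numpy as np

def cut_value(W, y):
    """Value of the cut (3): (1/4) * sum_ij w_ij (1 - y_i y_j)."""
    return 0.25 * np.sum(W * (1 - np.outer(y, y)))

def quadratic_form(W, y):
    """Objective of (4): sum_ij w_ij y_i y_j = y^T W y."""
    return y @ W @ y

rng = np.random.default_rng(1)
W = rng.integers(0, 2, size=(6, 6)).astype(float)
W = np.triu(W, 1); W = W + W.T          # symmetric weights, zero diagonal
y = rng.choice([-1.0, 1.0], size=6)

# For every y: cut_value = (sum_ij w_ij - y^T W y) / 4.
assert np.isclose(cut_value(W, y), (W.sum() - quadratic_form(W, y)) / 4)
```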
The binary problem (2) can be relaxed into the following primal-dual pair of semidefinite programs:

\[
\begin{array}{llcll}
\text{minimize} & \operatorname{Tr} QX & \qquad\quad & \text{maximize} & \operatorname{Tr} \Lambda\\
\text{s.t.} & X_{ii} = 1 & & \text{s.t.} & Q \succeq \Lambda\\
& X \succeq 0 & & & \Lambda \text{ diagonal}
\end{array}
\tag{5}
\]
In the next sections, we will derive these SDPs several times, in a number of different ways. Let us first notice here that for this primal-dual pair of SDPs, strong duality always holds, and both problems achieve their corresponding optimal solutions (why?).
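As a concrete illustration (a sketch of ours, not part of the notes; we use the CVXPY modeling package, one possible choice, with our own variable names), the primal problem in (5) can be written almost verbatim:

```python
import cvxpy as cp
import numpy as np

n = 8
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.diag(X) == 1]      # X PSD, unit diagonal
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), constraints)
prob.solve()                                  # assumes an SDP-capable solver (e.g., SCS)
print("SDP lower bound:", prob.value)         # <= min over x in {-1,1}^n of x^T Q x
```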
Consider the Lagrangian obtained by dualizing the constraints x_i^2 = 1 with multipliers λ_i, where Λ := diag(λ_1, …, λ_n):

\[
L(x, \lambda) \;=\; x^T Q x - \sum_{i=1}^n \lambda_i (x_i^2 - 1) \;=\; x^T (Q - \Lambda)\, x + \operatorname{Tr} \Lambda.
\tag{6}
\]

For the dual function g(λ) := inf_x L(x, λ) to be bounded below, we need the implicit constraint that the matrix Q − Λ must be positive semidefinite. In this case, the infimum is achieved at x = 0, and thus we obtain a lower bound given by the solution of the SDP:

\[
\begin{aligned}
\text{maximize} \quad & \operatorname{Tr} \Lambda\\
\text{s.t.} \quad & Q - \Lambda \succeq 0.
\end{aligned}
\tag{7}
\]
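As a quick sanity check on (7) (an observation added here, not from the original notes): the choice Λ = λ_min(Q) I is always dual feasible, since Q − λ_min(Q) I ⪰ 0, and it recovers the well-known eigenvalue bound

\[
f^\star \;\ge\; \operatorname{Tr}\bigl(\lambda_{\min}(Q)\, I\bigr) \;=\; n\,\lambda_{\min}(Q).
\]

The SDP (7) optimizes over all diagonal Λ, and hence can only improve on this bound.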
[Figure 1: A two-dimensional illustration of the ellipsoids E_1 (inner) and E_2 (outer), drawn over the square [−1, 1]^2.]
This lower bound can also be verified directly: for any feasible x of (2) and any diagonal Λ with Q ⪰ Λ, we have

\[
x^T Q x \;\ge\; x^T \Lambda x \;=\; \sum_i \lambda_i x_i^2 \;=\; \operatorname{Tr} \Lambda,
\]

where
• the first inequality follows from Q ⪰ Λ;
• the second equation holds since the matrix Λ is diagonal;
• finally, the third one holds since x_i ∈ {+1, −1}.
There is also a nice corresponding geometric interpretation. For simplicity, we assume without loss of generality that Q is positive definite. Then, problem (2) can be interpreted as finding the largest value of γ for which the ellipsoid {x ∈ R^n | x^T Q x ≤ γ} does not contain a vertex of the unit hypercube.
Consider now the two ellipsoids in R^n defined by:

\[
\begin{aligned}
E_1 &= \{x \in \mathbb{R}^n \mid x^T Q x \le \operatorname{Tr} \Lambda\},\\
E_2 &= \{x \in \mathbb{R}^n \mid x^T \Lambda x \le \operatorname{Tr} \Lambda\}.
\end{aligned}
\]
The principal axes of the ellipsoid E_2 are aligned with the coordinate axes (since Λ is diagonal), and furthermore its boundary contains all the vertices of the unit hypercube. Also, it is easy to see that the condition Q ⪰ Λ implies E_1 ⊆ E_2.
With these facts, it is easy to understand the related problem that the SDP relaxation is solving: dilate E_1 as much as possible, while ensuring the existence of another ellipsoid E_2 with coordinate-aligned axes that touches the hypercube at all 2^n vertices; see Figure 1 for an illustration.
Figure 2: The three-dimensional “spectraplex.” This is the set of 3 × 3 positive semidefinite matrices with unit diagonal.
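As an aside (a sketch of ours, with our own naming), membership in this set is easy to test numerically; for 3 × 3 matrices with unit diagonal it reduces to |x|, |y|, |z| ≤ 1 together with the determinant condition 1 + 2xyz − x² − y² − z² ≥ 0:

```python
import numpy as np

def in_spectraplex(x, y, z, tol=1e-12):
    """Membership test for the 3x3 'spectraplex' of Figure 2."""
    M = np.array([[1.0, x, y],
                  [x, 1.0, z],
                  [y, z, 1.0]])
    return np.linalg.eigvalsh(M).min() >= -tol

# Equivalent closed form: |x|,|y|,|z| <= 1 and 1 + 2xyz - x^2 - y^2 - z^2 >= 0.
assert in_spectraplex(0.5, 0.5, 0.5)
assert not in_spectraplex(1.0, 1.0, -1.0)
```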
2 Bounds: Goemans-Williamson and Nesterov
So far, our use of the SDP relaxation (5) has been limited to providing only a posteriori bounds on the
optimal solution of the original minimization problem. However, two desirable features are missing:
• Approximation guarantees: is it possible to prove general properties on the quality of the bounds
obtained by SDP?
• Feasible solutions: can we (somehow) use the SDP relaxations to provide not just bounds, but
actual feasible points with good (or optimal) values of the objective?
As we will see, it turns out that both questions can be answered in the affirmative. As shown by Goemans and Williamson [GW95] in the MAXCUT case, and by Nesterov in a more general setting, we can actually achieve both of these objectives by randomly “rounding,” in an appropriate manner, the solution X of this relaxation. We discuss these results below.
Given a feasible matrix X of the relaxation (5), factorize it as X = V^T V, and let v_i denote the i-th column of V; since X_ii = 1, each v_i is a unit vector. The rounding scheme chooses a uniformly random hyperplane through the origin, with normal vector r, and sets x_i = sign(v_i^T r), so the expected objective value is Σ_ij Q_ij E[x_i x_j].

We can easily compute the value of this expectation. Consider the plane spanned by v_i and v_j, and let θ_ij be the angle between these two vectors. Then, it is easy to see that the desired expectation E[x_i x_j] is equal to the probability that both points are on the same side of the hyperplane, minus the probability that they are on different sides. These probabilities are 1 − θ_ij/π and θ_ij/π, respectively. Thus, the expected value of the rounded solution is exactly:

\[
\sum_{ij} Q_{ij}\left(1 - \frac{2\theta_{ij}}{\pi}\right)
= \sum_{ij} Q_{ij}\left(1 - \frac{2}{\pi}\arccos(v_i^T v_j)\right)
= \frac{2}{\pi}\sum_{ij} Q_{ij} \arcsin X_{ij}.
\tag{8}
\]
Notice that the expression is of course well-defined, since if X is PSD and has unit diagonal, all its
entries are bounded in absolute value by 1. This result exactly characterizes the expected value of the
rounding procedure, as a function of the optimal solution of the SDP. We would like, however, to directly
relate this quantity to the optimal solution of the original optimization problem. For this, we will need
additional assumptions on the matrix Q. We discuss next two of the most important results in this
direction.
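Before doing so, we note that the rounding scheme and the expectation formula (8) are easy to test numerically; below is a minimal sketch (our own code and naming, assuming X is any feasible point of (5)):

```python
import numpy as np

def hyperplane_rounding(X, rng, trials=20000):
    """Random hyperplane rounding: factor X = V^T V, set x_i = sign(v_i^T r)."""
    n = X.shape[0]
    w, U = np.linalg.eigh(X)                     # X = U diag(w) U^T
    V = (U * np.sqrt(np.clip(w, 0, None))).T     # columns v_i, with X_ij = v_i^T v_j
    R = rng.standard_normal((trials, n))         # random hyperplane normals
    return np.sign(R @ V)                        # one rounded x per row

rng = np.random.default_rng(0)
n = 5
G = rng.standard_normal((n, n))
C = G @ G.T
d = np.sqrt(np.diag(C))
X = C / np.outer(d, d)                           # feasible: PSD with unit diagonal
S = rng.standard_normal((n, n)); Q = (S + S.T) / 2

xs = hyperplane_rounding(X, rng)
empirical = np.einsum('ti,ij,tj->t', xs, Q, xs).mean()
predicted = (2 / np.pi) * np.sum(Q * np.arcsin(X))   # formula (8)
print(empirical, predicted)                           # should roughly agree
```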
[Figure: The functions (2/π) arccos t and α(1 − t), plotted for t ∈ [−1, 1].]

For MAXCUT, where the weights satisfy w_ij ≥ 0, the expected value of the cut produced by the rounding procedure is, by (8),

\[
c_{\mathrm{sdp\text{-}expected}} \;=\; \frac{1}{4}\sum_{i,j} w_{ij}\left(1 - \frac{2}{\pi}\arcsin X_{ij}\right)
\;=\; \frac{1}{4}\sum_{i,j} w_{ij}\,\frac{2}{\pi}\arccos X_{ij}.
\]

As the figure illustrates, there is a constant α ≈ 0.878 such that (2/π) arccos t ≥ α(1 − t) for all t ∈ [−1, 1]; applying this inequality termwise gives

\[
c_{\mathrm{sdp\text{-}expected}} \;\ge\; \alpha \cdot \frac{1}{4}\sum_{i,j} w_{ij} (1 - X_{ij}).
\]
On the other hand, the solution of the SDP gives an upper bound on the cut capacity equal to:

\[
c_{\mathrm{sdp\text{-}upperbound}} \;=\; \frac{1}{4}\sum_{ij} w_{ij} (1 - X_{ij}).
\]
Notice that here we have used the nonnegativity of the weights (i.e., w_ij ≥ 0). Thus, so far we have the following inequalities:

• c_sdp-upperbound ≤ (1/α) · c_sdp-expected
• Also, clearly c_sdp-expected ≤ c_max
• And c_max ≤ c_sdp-upperbound
Putting it all together, we can sandwich the value of the relaxation as follows:

\[
\alpha \cdot c_{\mathrm{sdp\text{-}upperbound}} \;\le\; c_{\mathrm{sdp\text{-}expected}} \;\le\; c_{\max} \;\le\; c_{\mathrm{sdp\text{-}upperbound}},
\]

where α ≈ 0.878. In particular, the expected value of the cut obtained by randomized rounding is at least α times the value of the maximum cut.
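The constant α can be recovered numerically as the minimum over t of the ratio between the two curves in the figure (a quick check added here, not in the original notes):

```python
import numpy as np

# alpha = min over t in (-1, 1) of (2/pi) * arccos(t) / (1 - t)
t = np.linspace(-1 + 1e-9, 1 - 1e-9, 200001)
ratio = (2 / np.pi) * np.arccos(t) / (1 - t)
print(ratio.min())   # approximately 0.8786
```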
2.3 Nesterov’s 2/π result
A result by Nesterov generalizes the MAXCUT bound described above to a larger class of problems. The original formulation is for binary maximization, i.e., f_max = max_{x_i ∈ {−1,1}} x^T A x, and applies to the case when the matrix A is positive semidefinite. Since the problem is homogeneous, the optimal value is guaranteed to be nonnegative.
As we have seen, the expected value of the solution after randomized rounding is given by (8). Since X is positive semidefinite, it follows from the nonnegativity of the Taylor series coefficients of arcsin(t) − t and the Schur product theorem that

\[
\arcsin[X] \succeq X,
\]

where the arcsin function is applied componentwise. This inequality can be combined with (8) to give the bounds:

\[
\frac{2}{\pi} \cdot f_{\mathrm{sdp\text{-}upperbound}} \;\le\; f_{\mathrm{sdp\text{-}expected}} \;\le\; f_{\max} \;\le\; f_{\mathrm{sdp\text{-}upperbound}},
\]

where 2/π ≈ 0.636. For more details, see [BTN01, Section 4.3.4]. Among others, the paper [Meg01]
presents several new results, as well as a review of many of the available approximation schemes.
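As an aside, the matrix inequality arcsin[X] ⪰ X is easy to spot-check numerically (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    G = rng.standard_normal((6, 6))
    C = G @ G.T
    d = np.sqrt(np.diag(C))
    X = C / np.outer(d, d)              # PSD with unit diagonal
    D = np.arcsin(X) - X                # arcsin applied componentwise
    assert np.linalg.eigvalsh(D).min() >= -1e-10
```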
3 Linear and quadratic constraints

The general primal form of the SDP optimization problems we are concerned with is
\[
\begin{aligned}
\min \quad & x^T Q x\\
\text{s.t.} \quad & x^T A_i x \ge 0\\
& Bx \ge 0,
\end{aligned}
\qquad\text{where } x = \begin{bmatrix} 1 \\ y \end{bmatrix},
\]

i.e., the first component of x is fixed to one.
The corresponding primal-dual pair of SDP relaxations is:

\[
\begin{array}{llcll}
\min & Q \bullet X & \qquad\quad & \max & \gamma\\
\text{s.t.} & A_i \bullet X \ge 0 & & \text{s.t.} & Q \succeq \gamma E_{11} + \sum_i \lambda_i A_i + B^T N B\\
& B X B^T \ge 0 & & & \lambda_i \ge 0\\
& E_{11} \bullet X = 1 & & & N \ge 0, \quad N_{ii} = 0\\
& X \succeq 0 & & &
\end{array}
\tag{9}
\]
Here E_11 denotes the matrix with a 1 in the (1, 1) component, and all other entries equal to zero. The dual variables λ_i can be interpreted as Lagrange multipliers associated with the quadratic constraints of the primal problem, while N corresponds to pairwise products of the linear constraints.
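A hedged sketch of the primal in (9), again in CVXPY with our own naming (here E_11 • X = 1 is imposed as X[0, 0] == 1, and B X B^T ≥ 0 is elementwise):

```python
import cvxpy as cp

def general_relaxation(Q, A_list, B):
    """Primal SDP in (9): relaxation of min x^T Q x s.t. x^T A_i x >= 0, Bx >= 0."""
    n = Q.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0,
                   X[0, 0] == 1]                      # E11 . X = 1
    constraints += [cp.trace(A @ X) >= 0 for A in A_list]
    constraints += [B @ X @ B.T >= 0]                 # elementwise nonnegativity
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), constraints)
    prob.solve()                                      # assumes an SDP-capable solver
    return prob.value, X.value
```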
References
[BTN01] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.
[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness. W. H. Freeman and Company, 1979.
[GW95] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut
and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–
1145, 1995.
[Meg01] A. Megretski. Relaxations of quadratic programs in operator theory and system analysis.
In Systems, approximation, singular integral operators, and related topics (Bordeaux, 2000),
volume 129 of Oper. Theory Adv. Appl., pages 365–392. Birkhäuser, Basel, 2001.