
MIT 6.972 Algebraic techniques and semidefinite optimization          February 14, 2006

Lecture 3

Lecturer: Pablo A. Parrilo                                            Scribe: Pablo A. Parrilo

In this lecture, we will discuss one of the most important applications of semidefinite programming,
namely its use in the formulation of convex relaxations of nonconvex optimization problems. We will
present the results from several different, but complementary, points of view. These will also serve us as
starting points for the generalizations to be presented later in the course.
We will first discuss the case of binary quadratic optimization, since there the notation is
simpler and the setting perfectly illustrates many of the issues that appear in more complicated problems. Afterwards,
a more general formulation containing arbitrary linear and quadratic constraints will be presented.

1 Binary optimization
Binary (or Boolean) quadratic optimization is a classical combinatorial optimization problem. In the
version we consider, we want to minimize a quadratic function, where the decision variables can only
take the values ±1. In other words, we are minimizing an (indefinite) quadratic form over the vertices
of an n-dimensional hypercube. The problem is formally expressed as:

$$
\begin{aligned}
\text{minimize} \quad & x^T Q x \\
\text{s.t.} \quad & x_i \in \{-1, 1\}
\end{aligned}
\qquad (1)
$$

where Q ∈ S^n. There are many well-known problems that can be naturally written in the form above.
Among these, we mention the maximum cut problem (MAXCUT) discussed below, the 0-1 knapsack,
the linear quadratic regulator (LQR) control problem with binary inputs, etc.
Notice that we can model the Boolean constraints using quadratic equations, i.e.,

$$
x_i^2 - 1 = 0 \iff x_i \in \{-1, 1\}.
$$

These n quadratic equations define a finite set, with an exponential number of elements, namely all
the n-tuples with entries in {−1, 1}. There are exactly 2^n points in this set, so a direct enumeration
approach to (1) is computationally prohibitive when n is large (already for n = 30, we have 2^30 ≈ 10^9).
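For very small instances, this enumeration can be carried out directly, which is useful as a ground-truth check for the relaxations discussed below. A minimal sketch in Python (the function name and the random test matrix are illustrative, not from the notes):

```python
import itertools
import numpy as np

def brute_force_min(Q):
    """Minimize x^T Q x over x in {-1, 1}^n by enumerating all 2^n points."""
    n = Q.shape[0]
    best_val, best_x = np.inf, None
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(signs)
        val = x @ Q @ x
        if val < best_val:
            best_val, best_x = val, x
    return best_val, best_x

# Small random symmetric Q; feasible only for modest n (cost grows as 2^n).
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
Q = (A + A.T) / 2
f_star, x_star = brute_force_min(Q)
```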
We can thus write the equivalent polynomial formulation:

$$
\begin{aligned}
\text{minimize} \quad & x^T Q x \\
\text{s.t.} \quad & x_i^2 = 1
\end{aligned}
\qquad (2)
$$

We will denote the optimal value and optimal solution of this problem as f* and x*, respectively. It is
well-known that the decision version of this problem is NP-complete (e.g., [GJ79]). Notice that this is
true even if the matrix Q is positive definite (i.e., Q ≻ 0), since we can always make Q positive definite
by adding to it a constant multiple of the identity (this only shifts the objective by a constant).

Example 1 (MAXCUT) The maximum cut (MAXCUT) problem consists in finding a partition of
the nodes of a graph G = (V, E) into two disjoint sets V1 and V2 (V1 ∩ V2 = ∅, V1 ∪ V2 = V), in such a
way as to maximize the number of edges that have one endpoint in V1 and the other in V2. It has important
practical applications, such as optimal circuit layout. The decision version of this problem (does there
exist a cut with value greater than or equal to K?) is NP-complete [GJ79].
We can easily rewrite the MAXCUT problem as a binary optimization problem. A standard formulation
(for the weighted problem) is the following:
$$
\max_{y_i \in \{-1,1\}} \; \frac{1}{4} \sum_{i,j} w_{ij} (1 - y_i y_j),
\qquad (3)
$$

where w_ij is the weight corresponding to the (i, j) edge, and is zero if the nodes i and j are not connected.
The constraints y_i ∈ {−1, 1} are equivalent to the quadratic constraints y_i^2 = 1.
We can easily convert the MAXCUT formulation into binary quadratic programming. Removing the
constant term, and changing the sign, the original problem is clearly equivalent to:

$$
\min_{y_i^2 = 1} \; \sum_{i,j} w_{ij} y_i y_j.
\qquad (4)
$$
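To make the correspondence concrete, the weight matrix W can be assembled from an edge list and used both for the cut value in (3) and the quadratic form in (4). A small sketch (the edge-list input format is a hypothetical choice, not something fixed by the notes):

```python
import numpy as np

def maxcut_matrix(n, edges):
    """Symmetric weight matrix W with W[i, j] = w_ij for each edge, so the
    cut value of y in {-1,1}^n is (1/4) * sum_{i,j} W_ij (1 - y_i y_j),
    and problem (4) is the minimization of y^T W y."""
    W = np.zeros((n, n))
    for i, j, w in edges:
        W[i, j] = w
        W[j, i] = w
    return W

# 5-cycle with unit weights; the maximum cut has value 4.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 0, 1.0)]
W = maxcut_matrix(5, edges)
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0])             # cuts 4 of the 5 edges
cut_value = 0.25 * np.sum(W * (1 - np.outer(y, y)))   # = 4.0
```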

1.1 Semidefinite relaxations


Computing “good” solutions to the binary optimization problem given in (2) is quite a difficult task, so
it is of interest to produce accurate bounds on its optimal value. As in all minimization problems, upper
bounds can be directly obtained from feasible points. In other words, if x_0 ∈ R^n has entries equal to
±1, it always holds that f* ≤ x_0^T Q x_0 (of course, for a poorly chosen x_0, this upper bound may be very
loose).
To prove lower bounds, we need a different technique. There are several approaches to do this, but
as we will see in detail in the next sections, many of them will turn out to be exactly equivalent in the
end. Indeed, many of these different approaches will yield a characterization of a lower bound in terms
of the following primal-dual pair of semidefinite programming problems:

$$
\begin{array}{llcll}
\text{minimize} & \operatorname{Tr} QX & \qquad & \text{maximize} & \operatorname{Tr} \Lambda \\
\text{s.t.} & X_{ii} = 1 & & \text{s.t.} & Q \succeq \Lambda \\
& X \succeq 0 & & & \Lambda \text{ diagonal}
\end{array}
\qquad (5)
$$

In the next sections, we will derive these SDPs several times, in a number of different ways. Let us
first notice here that for this primal-dual pair of SDPs, strong duality always holds, and both problems
achieve their corresponding optimal solutions (why?).
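The primal problem in (5) can be solved directly with an off-the-shelf conic modeling tool. A minimal sketch, assuming the cvxpy package with an installed SDP-capable solver such as SCS (both are assumptions; the notes do not prescribe any software):

```python
import cvxpy as cp

def sdp_lower_bound(Q):
    """Primal SDP of (5): minimize Tr(QX) s.t. diag(X) = 1, X PSD.
    The optimal value is a lower bound on f*."""
    n = Q.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                      [X >> 0, cp.diag(X) == 1])
    prob.solve()  # cvxpy dispatches to an installed SDP solver
    return prob.value, X.value
```

On small instances one can compare this bound with the exact value from brute-force enumeration to see how tight the relaxation is.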

1.2 Lagrangian duality


A general approach to obtain lower bounds on the value of general (non)convex minimization problems
is to use Lagrangian duality. As we have seen, the original Boolean minimization problem can be written
as:

$$
\begin{aligned}
\text{minimize} \quad & x^T Q x \\
\text{s.t.} \quad & x_i^2 - 1 = 0
\end{aligned}
\qquad (6)
$$
For notational convenience, let Λ = diag(λ_1, …, λ_n). Then, the Lagrangian function can be written as:

$$
L(x, \lambda) = x^T Q x - \sum_{i=1}^{n} \lambda_i (x_i^2 - 1) = x^T (Q - \Lambda) x + \operatorname{Tr} \Lambda.
$$

For the dual function g(λ) := inf_x L(x, λ) to be bounded below, we need the implicit constraint that the
matrix Q − Λ must be positive semidefinite. In this case, the infimum is achieved at x = 0, where
L(0, λ) = Tr Λ, and thus we obtain a lower bound given by the solution of the SDP:

$$
\begin{aligned}
\text{maximize} \quad & \operatorname{Tr} \Lambda \\
\text{s.t.} \quad & Q - \Lambda \succeq 0
\end{aligned}
\qquad (7)
$$

This is exactly the dual side of the SDP in (5).
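The dual (7) is just as easy to pose in a modeling tool, since Λ is determined by the vector λ. A minimal sketch under the same cvxpy/SCS assumptions as before:

```python
import cvxpy as cp

def sdp_dual_bound(Q):
    """Dual SDP (7): maximize Tr(Lambda) s.t. Q - Lambda PSD, Lambda = diag(lam).
    Its optimal value coincides with the primal bound from (5)."""
    n = Q.shape[0]
    lam = cp.Variable(n)
    prob = cp.Problem(cp.Maximize(cp.sum(lam)),
                      [Q - cp.diag(lam) >> 0])
    prob.solve()
    return prob.value, lam.value
```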

Figure 1: The ellipsoids E1 and E2 .

1.3 Underestimator of the objective


A different but related interpretation of the SDP relaxation (5) is through the notion of an underestimator
of the objective function. Indeed, the quadratic function x^T Λx is an “easily optimizable” function that
is guaranteed to lie below the desired objective x^T Qx. To see this, notice that for any feasible x we have

$$
x^T Q x \ge x^T \Lambda x = \sum_{i=1}^{n} \Lambda_{ii} x_i^2 = \operatorname{Tr} \Lambda,
$$

where
• the first inequality follows from Q ⪰ Λ;
• the second equation holds since the matrix Λ is diagonal;
• finally, the third one holds since x_i ∈ {+1, −1}.
There is also a nice corresponding geometric interpretation. For simplicity, we assume without loss
of generality that Q is positive definite. Then, the problem (2) can be interpreted as finding the largest
value of γ for which the ellipsoid {x ∈ R^n | x^T Qx ≤ γ} does not contain a vertex of the unit hypercube.
Consider now the two ellipsoids in R^n defined by:

$$
\begin{aligned}
E_1 &= \{x \in \mathbb{R}^n \mid x^T Q x \le \operatorname{Tr} \Lambda\}, \\
E_2 &= \{x \in \mathbb{R}^n \mid x^T \Lambda x \le \operatorname{Tr} \Lambda\}.
\end{aligned}
$$

The principal axes of ellipsoid E2 are aligned with the coordinate axes (since Λ is diagonal), and
furthermore its boundary contains all the vertices of the unit hypercube. Also, it is easy to see that the
condition Q ⪰ Λ implies E1 ⊆ E2.
With these facts, it is easy to understand the related problem that the SDP relaxation is solving:
dilating E1 as much as possible, while ensuring the existence of another ellipsoid E2 with coordinate-aligned
axes and touching the hypercube at all 2^n vertices; see Figure 1 for an illustration.
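Both the underestimator chain and the containment E1 ⊆ E2 are easy to sanity-check numerically on small instances. A sketch, assuming the sdp_dual_bound helper from the previous section (the helper name and tolerance are illustrative):

```python
import itertools
import numpy as np

def check_underestimator(Q, lam, tol=1e-6):
    """Verify x^T Q x >= x^T Lambda x = Tr(Lambda) at every hypercube vertex,
    where lam solves the dual SDP (7). Returns the common value Tr(Lambda)."""
    n = Q.shape[0]
    tr = float(np.sum(lam))
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(signs)
        # x^T Lambda x = sum_i lam_i x_i^2 = Tr(Lambda), since x_i^2 = 1
        assert x @ Q @ x >= tr - tol
    return tr
```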

1.4 Probabilistic interpretation

To be written. [ToDo]

Figure 2: The three-dimensional “spectraplex.” This is the set of 3 × 3 positive semidefinite matrices with unit diagonal.

1.5 Lifting and rank relaxation


We present yet another derivation of the SDP relaxations, this time focused on the primal side. Recall
the original formulation of the optimization problem (2). Define now X := xx^T. By construction, the
matrix X ∈ S^n satisfies X ⪰ 0, X_ii = x_i^2 = 1, and has rank one. Conversely, any matrix X with

$$
X \succeq 0, \quad X_{ii} = 1, \quad \operatorname{rank} X = 1
$$

necessarily has the form X = xx^T for some ±1 vector x (why?). Furthermore, by the cyclic property of
the trace, we can express the objective function directly in terms of the matrix X, via:

$$
x^T Q x = \operatorname{Tr} x^T Q x = \operatorname{Tr} Q x x^T = \operatorname{Tr} QX.
$$
As a consequence, the original problem (2) can be exactly rewritten as:

$$
\begin{aligned}
\text{minimize} \quad & \operatorname{Tr} QX \\
\text{s.t.} \quad & X_{ii} = 1, \quad X \succeq 0, \\
& \operatorname{rank}(X) = 1.
\end{aligned}
$$

This is almost an SDP problem (all the constraints are either linear or conic), except for the rank one
constraint on X. Since this is a minimization problem, a lower bound on the solution can be obtained
by dropping the (nonconvex) rank constraint, which enlarges the feasible set.
A useful interpretation is in terms of a nonlinear lifting to a higher dimensional space. Indeed, rather
than solving the original problem in terms of the n-dimensional vector x, we are instead solving for the
n × n matrix X, effectively converting the problem from R^n to S^n (which has dimension $\binom{n+1}{2}$).
Observe that this line of reasoning immediately shows that if we find an optimal solution X of the
SDP (5) that has rank one, then we have solved the original problem. Indeed, in this case the upper
and lower bounds on the solution coincide.
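In floating-point practice, “rank one” must be tested up to a tolerance. A sketch of this check and of the recovery of x from a numerically rank-one X (the function name and tolerance are illustrative):

```python
import numpy as np

def recover_if_rank_one(X, tol=1e-6):
    """If X is numerically rank one, return a +/-1 vector x with X ~ x x^T;
    otherwise return None. Uses the eigendecomposition of the symmetric X."""
    vals, vecs = np.linalg.eigh(X)      # eigenvalues in ascending order
    if vals[-2] > tol:                  # second-largest eigenvalue too big
        return None
    x = np.sqrt(max(vals[-1], 0.0)) * vecs[:, -1]
    return np.sign(x)                   # entries are +/-1 up to roundoff
```

Note the sign ambiguity: x and −x give the same X, which is exactly the two-to-one correspondence mentioned below.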
As a graphical illustration, in Figure 2 we depict the set of 3 × 3 positive semidefinite matrices of
unit diagonal. The rank one matrices correspond to the four “vertices” of this convex set, and are in
(two-to-one) correspondence with the eight 3-vectors with ±1 entries.
In general, it is not the case that the optimal solution of the SDP relaxation will be rank one.
However, as we will see in the next section, it is possible to use rounding schemes to obtain “nearby”
rank one solutions. Furthermore, in some cases, it is possible to do so while obtaining some approximation
guarantees on the quality of the rounded solutions.

2 Bounds: Goemans-Williamson and Nesterov
So far, our use of the SDP relaxation (5) has been limited to providing only a posteriori bounds on the
optimal solution of the original minimization problem. However, two desirable features are missing:
• Approximation guarantees: is it possible to prove general properties on the quality of the bounds
obtained by SDP?
• Feasible solutions: can we (somehow) use the SDP relaxations to provide not just bounds, but
actual feasible points with good (or optimal) values of the objective?
As we will see, it turns out that both questions can be answered in the affirmative. As shown
by Goemans and Williamson [GW95] in the MAXCUT case, and by Nesterov in a more general setting,
we can actually achieve both of these objectives by randomly “rounding” in an appropriate manner the
solution X of this relaxation. We discuss these results below.

2.1 Goemans and Williamson rounding


In their celebrated MAXCUT paper, Goemans and Williamson developed the following randomized
method for finding a “good” feasible cut from the solution of the SDP (a code sketch follows the list).
• Factorize X as X = V^T V, where V = [v_1 … v_n] ∈ R^{r×n} and r is the rank of X.
• Then X_ij = v_i^T v_j, and since X_ii = 1, this factorization gives n vectors v_i on the unit sphere in R^r.
• Instead of assigning either 1 or −1 to each variable, we have assigned to each a point on the unit
sphere in R^r.
• Now, choose a random hyperplane in R^r, and assign to each variable x_i either a +1 or a −1,
depending on which side of the hyperplane the point v_i lies on.
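A minimal sketch of this rounding step in Python (numpy assumed; here an eigendecomposition plays the role of the factorization X = V^T V, which is one valid choice among several, e.g., Cholesky):

```python
import numpy as np

def gw_round(X, rng=None):
    """Goemans-Williamson rounding: factor X = V^T V, then cut the unit
    vectors v_i with a random hyperplane through the origin."""
    rng = np.random.default_rng() if rng is None else rng
    d, U = np.linalg.eigh(X)               # X = U diag(d) U^T
    d = np.clip(d, 0.0, None)              # clear tiny negative eigenvalues
    V = np.sqrt(d)[:, None] * U.T          # columns v_i are unit vectors
    p = rng.standard_normal(V.shape[0])    # Gaussian normal of the hyperplane
    return np.sign(V.T @ p)                # x_i = sign(p^T v_i)
```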
It turns out that this procedure gives a solution that, on average, is quite close to the value of the
SDP bound. We will compute the expected value of the rounded solution in a slightly different form
from the original G-W argument, but one that will be helpful later. The random hyperplane can be
characterized by its normal vector p, which is chosen to be uniformly distributed on the unit sphere
(e.g., by suitably normalizing a standard multivariate Gaussian random variable). Then, according to
the description above, the rounded solution is given by x_i = sign(p^T v_i). The expected value of this
solution can then be written as:

$$
\mathbb{E}_p[x^T Q x] = \sum_{ij} Q_{ij}\, \mathbb{E}_p[x_i x_j] = \sum_{ij} Q_{ij}\, \mathbb{E}_p[\operatorname{sign}(p^T v_i)\, \operatorname{sign}(p^T v_j)].
$$

We can easily compute the value of this expectation. Consider the plane spanned by v_i and v_j, and let
θ_ij be the angle between these two vectors. Then, it is easy to see that the desired expectation is equal
to the probability that both points are on the same side of the hyperplane, minus the probability that
they are on different sides. These probabilities are 1 − θ_ij/π and θ_ij/π, respectively. Thus, the expected
value of the rounded solution is exactly:

$$
\sum_{ij} Q_{ij} \left(1 - \frac{2\theta_{ij}}{\pi}\right) = \sum_{ij} Q_{ij} \left(1 - \frac{2}{\pi} \arccos(v_i^T v_j)\right) = \frac{2}{\pi} \sum_{ij} Q_{ij} \arcsin X_{ij}.
\qquad (8)
$$

Notice that the expression is of course well-defined, since if X is PSD and has unit diagonal, all its
entries are bounded in absolute value by 1. This result exactly characterizes the expected value of the
rounding procedure, as a function of the optimal solution of the SDP. We would like, however, to directly
relate this quantity to the optimal solution of the original optimization problem. For this, we will need
additional assumptions on the matrix Q. We discuss next two of the most important results in this
direction.
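Formula (8) can be checked against a Monte Carlo average of the rounding procedure. A sketch, assuming the gw_round helper above (the trial count and seed are arbitrary):

```python
import numpy as np

def expected_rounded_value(Q, X):
    """Closed-form expectation (8): (2/pi) * sum_ij Q_ij arcsin(X_ij)."""
    return (2.0 / np.pi) * np.sum(Q * np.arcsin(np.clip(X, -1.0, 1.0)))

def monte_carlo_value(Q, X, trials=20000, seed=0):
    """Empirical mean of x^T Q x over repeated random-hyperplane roundings."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        x = gw_round(X, rng)
        total += x @ Q @ x
    return total / trials  # should approach expected_rounded_value(Q, X)
```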

Figure 3: Bound on the inverse cosine function, for α ≈ 0.878. (The plot compares the curves (2/π) arccos t and α(1 − t) over t ∈ [−1, 1].)

2.2 MAXCUT bound


Recall from (3) that for the MAXCUT problem, the objective function includes not only the quadratic
part but also a constant term:

$$
\frac{1}{4} \sum_{ij} w_{ij} (1 - y_i y_j).
$$

The expected value of the cut is then:

$$
c_{\text{sdp-expected}} = \frac{1}{4} \sum_{ij} w_{ij} \left(1 - \frac{2}{\pi} \arcsin X_{ij}\right) = \frac{1}{4} \cdot \frac{2}{\pi} \sum_{ij} w_{ij} \arccos X_{ij}.
$$

On the other hand, the solution of the SDP gives an upper bound on the cut capacity equal to:

$$
c_{\text{sdp-upper-bound}} = \frac{1}{4} \sum_{ij} w_{ij} (1 - X_{ij}).
$$

To relate these two quantities, we look for a constant α such that

$$
\alpha (1 - t) \le \frac{2}{\pi} \arccos(t) \quad \text{for all } t \in [-1, 1].
$$
The best possible (i.e., largest) such constant is α ≈ 0.878; see Figure 3. So we have

$$
c_{\text{sdp-upper-bound}} \le \frac{1}{\alpha} \cdot \frac{1}{4} \cdot \frac{2}{\pi} \sum_{ij} w_{ij} \arccos X_{ij} = \frac{1}{\alpha}\, c_{\text{sdp-expected}}.
$$

Notice that here we have used the nonnegativity of the weights (i.e., w_ij ≥ 0). Thus, so far we have the
following inequalities:
• c_sdp-upper-bound ≤ (1/α) · c_sdp-expected
• Also, clearly c_sdp-expected ≤ c_max
• And c_max ≤ c_sdp-upper-bound
Putting it all together, we can sandwich the value of the relaxation as follows:

$$
\alpha \cdot c_{\text{sdp-upper-bound}} \le c_{\text{sdp-expected}} \le c_{\max} \le c_{\text{sdp-upper-bound}}.
$$
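The constant α can itself be recovered numerically as the minimum of the ratio (2/π) arccos t / (1 − t) over t ∈ [−1, 1). A rough grid sketch (the grid size is arbitrary):

```python
import numpy as np

# alpha = min over t in [-1, 1) of (2/pi) * arccos(t) / (1 - t)
t = np.linspace(-1.0, 1.0 - 1e-9, 200_001)
ratio = (2.0 / np.pi) * np.arccos(t) / (1.0 - t)
alpha = ratio.min()  # approximately 0.87856
```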

2.3 Nesterov's 2/π result

A result by Nesterov generalizes the MAXCUT bound described above to a larger class of problems.
The original formulation is for the case of binary maximization, and applies to the case when the matrix
Q is positive semidefinite. Since the problem is homogeneous, the optimal value is guaranteed to be
nonnegative.
As we have seen, the expected value of the solution after randomized rounding is given by (8). Since
X is positive semidefinite, it follows from the nonnegativity of the Taylor series of arcsin(t) − t and the
Schur product theorem that

$$
\arcsin[X] \succeq X,
$$

where the arcsin function is applied componentwise. This inequality can be combined with (8) to give
the bounds:

$$
\frac{2}{\pi} \cdot f_{\text{sdp-upper-bound}} \le f_{\text{sdp-expected}} \le f_{\max} \le f_{\text{sdp-upper-bound}},
$$

where 2/π ≈ 0.636. For more details, see [BTN01, Section 4.3.4]. Among others, the paper [Meg01]
presents several new results, as well as a review of many of the available approximation schemes.
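The matrix inequality arcsin[X] ⪰ X is easy to spot-check numerically for a given X. A sketch (the function name and tolerance are illustrative):

```python
import numpy as np

def arcsin_dominates(X, tol=1e-9):
    """Check that arcsin[X] - X is PSD, where arcsin is applied entrywise,
    as used in Nesterov's 2/pi bound (valid for PSD X with unit diagonal)."""
    M = np.arcsin(np.clip(X, -1.0, 1.0)) - X
    return np.linalg.eigvalsh(M).min() >= -tol
```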

3 Linearly constrained problems


In this section we extend the earlier results to general quadratic optimization problems under linear
and quadratic constraints. For notational simplicity, we write the constraints in homogeneous form, i.e.,
in terms of the vector x = [1  y^T]^T.

The general primal form of the SDP optimization problems we are concerned with is

$$
\begin{aligned}
\min \quad & x^T Q x \\
\text{s.t.} \quad & x^T A_i x \ge 0, \\
& B x \ge 0, \\
& x = \begin{bmatrix} 1 \\ y \end{bmatrix}.
\end{aligned}
$$

The corresponding primal and dual SDP relaxations are given by

$$
\begin{array}{llcll}
\min & Q \bullet X & \qquad & \max & \gamma \\
\text{s.t.} & A_i \bullet X \ge 0 & & \text{s.t.} & Q \succeq \gamma E_{11} + \sum_i \lambda_i A_i + B^T N B \\
& B X B^T \ge 0 & & & \lambda_i \ge 0 \\
& E_{11} \bullet X = 1 & & & N \ge 0 \\
& X \succeq 0 & & & N_{ii} = 0
\end{array}
\qquad (9)
$$

Here E_11 denotes the matrix with a 1 in the (1, 1) component and zeros elsewhere. The dual
variables λ_i can be interpreted as Lagrange multipliers associated with the quadratic constraints of the
primal problem, while N corresponds to pairwise products of the linear constraints.
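The primal relaxation in (9) is again directly expressible in cvxpy, under the same solver assumptions as before (the function signature and input conventions are illustrative; the A_i are assumed symmetric):

```python
import cvxpy as cp

def primal_relaxation(Q, A_list, B):
    """Primal SDP in (9). The constraint X[0, 0] == 1 encodes E11 . X = 1
    (fixing the homogenizing coordinate), and B X B^T >= 0 is elementwise,
    collecting the pairwise products of the linear constraints."""
    n = Q.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, X[0, 0] == 1, B @ X @ B.T >= 0]
    constraints += [cp.trace(Ai @ X) >= 0 for Ai in A_list]
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), constraints)
    prob.solve()
    return prob.value, X.value
```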

References
[BTN01] A. Ben-Tal and A. Nemirovski. Lectures on modern convex optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.

[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability: A guide to the theory of NP-completeness. W. H. Freeman and Company, 1979.
[GW95] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.
[Meg01] A. Megretski. Relaxations of quadratic programs in operator theory and system analysis.
In Systems, approximation, singular integral operators, and related topics (Bordeaux, 2000),
volume 129 of Oper. Theory Adv. Appl., pages 365–392. Birkhäuser, Basel, 2001.
