
princeton univ. F'18 cos 521: Advanced Algorithm Design


Lecture 18: Semidefinite Programming
Lecturer: Christopher Musco

1 Positive Semidefinite Matrices


Recall that a symmetric matrix A ∈ R^{n×n} is positive semidefinite (PSD) if

x^T A x ≥ 0 for all x ∈ R^n.

This property is equivalent to:

1. A has all non-negative eigenvalues.

2. A can be written as A = U^T U for some U ∈ R^{n×n}. I.e., A_ij = u_j^T u_i, where u_i is the ith column of U. (A quick numerical check of this equivalence appears below.)
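
As a sanity check, here is a minimal numpy sketch (an illustrative aside, not part of the development that follows): any Gram matrix U^T U has only non-negative eigenvalues, up to floating point roundoff.

```python
import numpy as np

# Build a random Gram matrix A = U^T U; by the equivalence above it must be PSD.
rng = np.random.default_rng(0)
U = rng.standard_normal((5, 5))
A = U.T @ U  # A_ij = u_j^T u_i for columns u_i of U

# Equivalence 1: all eigenvalues of the symmetric matrix A are non-negative.
print(np.linalg.eigvalsh(A))  # all >= 0, up to roundoff
```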

To denote that a matrix is PSD, we write A ⪰ 0. A ⪰ B indicates that A − B is PSD, or equivalently that x^T A x ≥ x^T B x for all x ∈ R^n. The symbols ⪰ and ⪯ can be used to define an ordering on matrices, which is called the "Loewner ordering". It's a partial order. While it's impossible for both A ≻ B and B ≻ A to hold, it could be that neither A ⪰ B nor B ⪰ A holds.

Exercise 1. Come up with a simple example where neither A ⪰ B nor B ⪰ A holds.

The Loewner ordering has many useful properties. For example, A ⪰ B ≻ 0 implies that A^{-1} ⪯ B^{-1}. A ⪰ B ⪰ 0 also implies that, for all i, σ_i(A) ≥ σ_i(B), where σ_i denotes the ith singular value (which is the same as the ith eigenvalue for PSD matrices).¹
You have to be careful though. For example, A ⪰ B does not imply A^2 ⪰ B^2.
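
To make this warning concrete, here is a small numerical check of one such pair (an example chosen for illustration; any matrices with these properties would do):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    # A symmetric matrix is PSD iff its smallest eigenvalue is non-negative.
    return np.linalg.eigvalsh(M).min() >= -tol

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0]])

print(is_psd(A - B))          # True:  A - B = [[1,1],[1,1]] has eigenvalues 0 and 2
print(is_psd(A @ A - B @ B))  # False: A^2 - B^2 = [[4,3],[3,2]] has determinant -1
```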
PSD matrices appear all the time in algorithmic applications, including some that we
have already seen. Graph Laplacians, Hessians of convex functions, covariance matrices,
and many other natural matrices are always PSD. As we will see today, PSD matrices are
also very useful in formulating optimization problems.

2 Semidefinite programming
The goal of semidefinite programming is to solve optimization problems where the variable is a matrix that is constrained to be PSD. I.e., we optimize over X ∈ R^{n×n} where X ∈ K and:

K = {M | M ⪰ 0}.

K is a convex set: if X ⪰ 0 and Y ⪰ 0 are PSD, then for all λ ∈ [0, 1] it's easy to see that λX + (1 − λ)Y ⪰ 0. This realization leads to the following convex optimization problem:
¹The opposite statement is not true – it can be that σ_i(A) ≥ σ_i(B) for all i, yet A ⪰ B does not hold.


Problem 1 (Semidefinite program – SDP). Let f be a convex function and let ⟨M, N⟩ denote Σ_{i,j} M_ij N_ij. We seek to find X ∈ R^{n×n} which solves:

min f(X) such that:
X ⪰ 0,
for i = 1, . . . , k, ⟨A_i, X⟩ ≥ b_i.

Here A_1, . . . , A_k and b_1, . . . , b_k are input constraints. It is very common to have:

f(X) = ⟨C, X⟩

for some C, i.e., to have our objective be a linear function in X.
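
As a concrete illustration of Problem 1, here is a minimal sketch using the cvxpy modeling library (the data C, A_1, b_1 below are arbitrary placeholders, not from any particular application):

```python
import cvxpy as cp
import numpy as np

n = 3
C = np.eye(n)          # placeholder objective matrix
A1 = np.ones((n, n))   # placeholder constraint matrix
b1 = 1.0               # placeholder constraint bound

X = cp.Variable((n, n), PSD=True)  # encodes the constraint X ⪰ 0
# <C, X> = trace(C^T X); since C is symmetric here, trace(C @ X) suffices.
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                  [cp.trace(A1 @ X) >= b1])  # <A_1, X> >= b_1
prob.solve()
print(X.value)
```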

Problem 1 is optimizing over a convex set, since the convex PSD constraint intersected with k linear constraints forms a convex set. It can be viewed as a linear program with an infinite number of constraints. Specifically, our constraints are equivalent to:

min f(X) such that:
∀v ∈ R^n, ⟨vv^T, X⟩ ≥ 0,
for i = 1, . . . , k, ⟨A_i, X⟩ ≥ b_i.

The PSD constraint gives a compact way of encoding these infinitely many linear constraints. In this sense, SDPs are strictly stronger than linear programs.

Exercise 2. Show that every LP can be written as an SDP. The idea is that a diagonal matrix, i.e., one whose off-diagonal entries are 0, is PSD if and only if its diagonal entries are non-negative.
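
One way to make this hint concrete (a sketch of one possible encoding): given an LP min c^T x subject to a_i^T x ≥ b_i for i = 1, . . . , k and x ≥ 0, solve

min ⟨diag(c), X⟩ such that:
X ⪰ 0,
for i = 1, . . . , k, ⟨diag(a_i), X⟩ ≥ b_i,
for all i ≠ j, ⟨e_i e_j^T, X⟩ = 0.

The last family of (linear) constraints forces X to be diagonal, in which case X ⪰ 0 reduces to diag(X) ≥ 0 entrywise, and each inner product reduces to the corresponding LP expression with x = diag(X).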

Semidefinite programs can be solved (relatively) efficiently with a variety of methods, including the ellipsoid method and specially designed interior point methods. They model a wide range of natural problems, several examples of which are outlined in [1]. One example problem is as follows:

Example 1 (Minimum Volume Ellipsoid via an SDP). Suppose we have points v_1, . . . , v_k ∈ R^n and we want to find the smallest (specifically, minimum volume) ellipsoid E that contains these points. This problem can be formulated as a semidefinite program.

Recall from our lecture on the Ellipsoid Method that any ellipsoid E can be parameterized by a PSD matrix X ∈ R^{n×n} and center c ∈ R^n, where a point y lies inside E if and only if:

‖Xy − c‖_2 ≤ 1.

Also note that E's volume is proportional to det(X^{-1}) = Π_{i=1}^n 1/σ_i(X). With some work, it's possible to verify that log(det(X^{-1})) = − log(det(X)) is a convex function in X. So to solve the minimum volume ellipsoid problem we can solve:

min − log(det(X)) such that:
X ⪰ 0,
for i = 1, . . . , k, ‖Xv_i − c‖_2^2 ≤ 1.

To check that the k constraints involving v_1, . . . , v_k can be encoded in an SDP, note that Xv_i − c is linear in the variables X and c, so each constraint ‖Xv_i − c‖_2 ≤ 1 is convex. One standard way to write it as a PSD constraint is via a Schur complement: ‖Xv_i − c‖_2 ≤ 1 holds if and only if

[ I            Xv_i − c ]
[ (Xv_i − c)^T     1    ]  ⪰ 0,

and every entry of this block matrix is linear in X and c.
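
Here is a minimal cvxpy sketch of this program (an illustration with made-up random points, assuming an installed solver that supports log_det, e.g. SCS; the containment constraints are stated in norm form, which cvxpy handles directly):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, k = 2, 20
points = rng.standard_normal((k, n))  # v_1, ..., v_k

X = cp.Variable((n, n), PSD=True)  # ellipsoid shape matrix
c = cp.Variable(n)                 # "center" in the ||Xy - c|| <= 1 parameterization

# Volume scales as det(X^{-1}), so minimize -log det(X).
prob = cp.Problem(cp.Minimize(-cp.log_det(X)),
                  [cp.norm(X @ points[i] - c, 2) <= 1 for i in range(k)])
prob.solve()
print(X.value, c.value)
```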

2.1 Alternative View of SDP


Since any PSD matrix X can be written as V^T V where V = [v_1, . . . , v_n] is a matrix in R^{n×n}, we can equivalently formulate the SDP problem as solving:

min f(V^T V) subject to ⟨A_i, V^T V⟩ ≥ b_i.    (1)

3 Maximum Cut
Just as we saw for linear programs, SDPs can be very useful in obtaining approximation algorithms for combinatorial optimization problems. In fact, it's possible to use the same "relax-and-round" framework that we saw for linear programs. Semidefinite programs allow for a richer variety of relaxation and rounding techniques.
One classic example of a combinatorial problem that can be approximated using an algorithm based on semidefinite programming is the maximum cut problem:

Problem 2 (Maximum cut). Given an undirected, unweighted graph G = (V, E) with |V| = n, find S ⊂ V such that |E(S, V \ S)| is maximized. |E(S, V \ S)| denotes the number of edges between nodes in S and nodes not in S – i.e., the size of the cut between S and V \ S. Denote the optimal value for this problem by OPT_MC = max_S |E(S, V \ S)|.

This problem can be formulated as an integer optimization problem:

max_{u_1,...,u_n ∈ {−1,1}} (1/4) Σ_{(i,j)∈E} |u_i − u_j|^2.    (2)

If we set u_i = 1 for all i ∈ S and −1 otherwise, then this objective function exactly captures the size of the cut between S and V \ S: |u_i − u_j|^2 = 0 if i, j are on the same side of the cut and |u_i − u_j|^2 = 4 if they're on different sides.
Unfortunately, solving (2) is NP-hard. It's possible to solve it approximately using a greedy algorithm or an LP relaxation (a minimal greedy sketch is given below), but both only guarantee objective values of (1/2)·OPT_MC.
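
For reference, here is a sketch of the standard greedy 1/2-approximation (an illustration written for these notes: place each node on whichever side cuts more of the edges to its already-placed neighbors):

```python
def greedy_max_cut(n, edges):
    # Greedy 1/2-approximation: assign nodes one at a time to the side that
    # cuts more of the edges to already-assigned neighbors.
    adj = {v: [] for v in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    side = {}
    for v in range(n):
        placed = [side[u] for u in adj[v] if u in side]
        # Joining side +1 cuts the edges to neighbors already placed on side -1.
        side[v] = 1 if placed.count(-1) >= placed.count(1) else -1
    return side

# Example: on a 4-cycle, greedy happens to recover the maximum cut of size 4.
print(greedy_max_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```

Each node's greedy choice cuts at least half of the edges to its already-placed neighbors, so the final cut has size ≥ |E|/2 ≥ (1/2)·OPT_MC.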
Our main result today is that the maximum cut problem can be approximated to much better accuracy using an algorithm based on semidefinite programming:

Theorem 3 (Goemans, Williamson '94 [2]). There is a randomized SDP rounding scheme that finds a cut with expected size ≥ .878 · OPT_MC.

3.1 SDP Relaxation


The Goemans and Williamson approach relaxes binary variables to continuous vectors:

u_i ∈ {−1, 1}  =⇒  v_i ∈ R^n with ‖v_i‖_2 = 1, ∀i.

Specifically, they solve:

Problem 4 (Relaxed Maximum Cut).

max_{v_1,...,v_n : ‖v_i‖_2 = 1 ∀i} (1/4) Σ_{(i,j)∈E} ‖v_i − v_j‖_2^2.    (3)

This problem can be solved as a semidefinite program and we denote its optimal value by OPT_SDP.

To check that Problem 4 can be solved as an SDP, we refer to the formulation of (1). The constraint that ‖v_i‖_2 = 1 is simply a constraint that all diagonal entries of X = V^T V are 1, which can be encoded as a linear constraint. Additionally, since ‖v_i − v_j‖_2^2 = v_i^T v_i + v_j^T v_j − 2v_i^T v_j, our objective function (3) can be written as ⟨C, X⟩ for some C.
Intuitively, Problem 4 seeks to arrange vectors on the unit sphere in such a way that vectors corresponding to connected nodes i, j are placed as far as possible from each other.

Figure 1: SDP solutions are unit vectors which are arranged so that vectors v_i and v_j are far apart when nodes i and j are connected with an edge in G.
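
To make the relaxation concrete, here is a minimal cvxpy sketch of Problem 4 (the toy graph is made up for illustration). Since X_ij = v_i^T v_j and X_ii = 1, each term (1/4)‖v_i − v_j‖_2^2 equals (1/2)(1 − X_ij):

```python
import cvxpy as cp

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # toy graph

X = cp.Variable((n, n), PSD=True)
constraints = [cp.diag(X) == 1]  # ||v_i||_2^2 = X_ii = 1
# (1/4)||v_i - v_j||^2 = (1/2)(1 - X_ij) once the diagonal is fixed to 1.
prob = cp.Problem(cp.Maximize(sum(0.5 * (1 - X[i, j]) for i, j in edges)),
                  constraints)
prob.solve()
print("OPT_SDP ≈", prob.value)
```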

Problem 4 is a valid relaxation of Problem 2. In particular, we have:

Claim 5.

OPT_MC ≤ OPT_SDP.

Proof. Given a solution u_1, . . . , u_n to Problem 2, we simply set v_i = u_i · e_1, where e_1 = [1, 0, . . . , 0]^T is a standard basis vector. Then (3) exactly equals (2).

3.2 Random Hyperplane Rounding


To obtain a solution to Problem 2 from an optimal solution to Problem 4 we employ the
following rounding strategy:

1. Solve the semidefinite program in Problem 4 to obtain vectors v_1, . . . , v_n.

2. Choose a random vector c ∈ R^n by choosing each entry to be an independent standard Gaussian random variable.

3. Set ũ_i = sign(c^T v_i). (A numpy sketch of steps 2–3 appears below.)
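
Here is that sketch, continuing the cvxpy example above (an illustrative helper; an eigendecomposition stands in for any factorization X = V^T V):

```python
import numpy as np

def random_hyperplane_rounding(X, rng=None):
    # Factor X = V^T V: X = Q diag(w) Q^T gives V = diag(sqrt(w)) Q^T.
    rng = rng or np.random.default_rng()
    w, Q = np.linalg.eigh(X)
    w = np.clip(w, 0.0, None)        # clip tiny negative eigenvalues from solver roundoff
    V = np.sqrt(w)[:, None] * Q.T    # column i of V is the unit vector v_i
    c = rng.standard_normal(V.shape[0])
    return np.sign(c @ V)            # u~_i = sign(c^T v_i); ties have probability zero
```

Repeating the rounding a few times and keeping the best cut found gives the high-probability guarantee discussed after Claim 6.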




Claim 6.

E[ Σ_{(i,j)∈E} (1/4)|ũ_i − ũ_j|^2 ] ≥ .878 · Σ_{(i,j)∈E} (1/4)‖v_i − v_j‖_2^2.

It follows that our rounded solution obtains an expected cut value ≥ .878 · OPT_SDP, which is ≥ .878 · OPT_MC by Claim 5. Applying Markov's inequality, a few repeated trials ensure that we obtain a good approximate max cut with high probability.

Proof. Since c is spherically symmetric, our rounding strategy corresponds to choosing a uniformly random hyperplane through the origin in R^n. For all vectors v_i placed on one side of the hyperplane, node i belongs to S. The nodes corresponding to all vectors on the other side of the hyperplane belong to V \ S. This approach is known as random hyperplane rounding. It is visualized in Figure 2.

Figure 2: Our SDP solution is rounded by choosing a random hyperplane through the origin and assigning nodes to each side of the cut based on what side of the hyperplane their corresponding vector lies on. In this case, nodes i and j are placed on one side of the cut, with node k placed on the other side. In other words, ũ_i = ũ_j = −ũ_k.

Intuitively, since vectors corresponding to connected nodes are in general placed as far apart as possible by the SDP, it is more likely that a random hyperplane separates connected nodes, and thus that we obtain a large cut value.
Formally, we bound the expected number of edges cut in our solution ũ_1, . . . , ũ_n. Let θ_ij denote the angle (in radians) between vectors v_i and v_j. What is the probability that nodes i and j end up on different sides of the cut after random hyperplane rounding? This may seem a difficult n-dimensional calculation, until we realize that there is a 2-dimensional subspace defined by v_i, v_j, and all that matters is the intersection of the random hyperplane with this 2-dimensional subspace, which is a random line in this subspace.
So this probability is exactly equal to θ_ij/π: the random line separates v_i and v_j precisely when it falls within the angle θ_ij between them, which happens for θ_ij out of π possible line directions. Thus, by linearity of expectation,

E[number of edges in cut defined by ũ_1, . . . , ũ_n] = Σ_{(i,j)∈E} θ_ij/π.    (4)

How do we relate this to OPT_SDP? We use the fact that ⟨v_i, v_j⟩ = cos θ_ij to rewrite the SDP objective as:

OPT_SDP = Σ_{(i,j)∈E} (1/4)‖v_i − v_j‖^2 = Σ_{(i,j)∈E} (1/4)(‖v_i‖^2 + ‖v_j‖^2 − 2⟨v_i, v_j⟩) = Σ_{(i,j)∈E} (1/2)(1 − cos θ_ij).    (5)
To compare this objective function to (4), Goemans and Williamson observed that:

(θ/π) / ((1/2)(1 − cos θ)) = 2θ / (π(1 − cos θ)) ≥ 0.87856 . . .  for all θ ∈ (0, π].

This is easy to verify by plotting, e.g., in MATLAB.


It follows that the expected size of our cut is ≥ 0.878 · OPT_SDP ≥ 0.878 · OPT_MC.
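
The plot-based check mentioned in the proof can be reproduced numerically (a quick sketch, with Python standing in for MATLAB):

```python
import numpy as np

# Evaluate 2*theta / (pi * (1 - cos(theta))) on a fine grid of (0, pi].
theta = np.linspace(1e-6, np.pi, 1_000_000)
ratio = 2 * theta / (np.pi * (1 - np.cos(theta)))
print(ratio.min())  # ~0.87856, attained near theta ~ 2.33
```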

The saga of 0.878... The GW paper came on the heels of the PCP Theorem (1992), which established that there is a constant ε > 0 such that (1 − ε)-approximation to MAX-CUT is NP-hard. In the ensuing few years this constant was improved. Meanwhile, most researchers hoped that the GW algorithm could not be optimal. The most trivial relaxation, the most trivial rounding, and an approximation ratio derived by MATLAB calculation: it all just didn't smell right. However, in 2005 Khot et al. showed that Khot's unique games conjecture implies that the GW algorithm cannot be improved by any polynomial-time algorithm. (Aside: not all experts believe the unique games conjecture.)

References
[1] Vandenberghe, Lieven, and Stephen Boyd. Applications of semidefinite programming.
Applied Numerical Mathematics 29.3 (1999): 283-300.

[2] Goemans, Michel X., and David P. Williamson. Improved approximation algorithms for
maximum cut and satisfiability problems using semidefinite programming. Journal of
the ACM (JACM) 42.6 (1995): 1115-1145.
