Lec 18
1. $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$.
2. $A$ can be written as $A = U^T U$ for some $U \in \mathbb{R}^{n \times n}$. I.e., $A_{ij} = u_j^T u_i$ where $u_i$ is the $i$th column of $U$.
The Loewner ordering has many useful properties. For example, $A \succeq B$ implies that $A^{-1} \preceq B^{-1}$. $A \succeq B$ also implies that, for all $i$, $\sigma_i(A) \geq \sigma_i(B)$, where $\sigma_i$ denotes the $i$th singular value (which is the same as the $i$th eigenvalue for PSD matrices).$^1$
You have to be careful though. For example, $A \succeq B \not\Rightarrow A^2 \succeq B^2$.
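As a quick sanity check, here is a small numerical illustration of that last point; the matrices below are a counterexample of our own choosing, not one from lecture:

```python
import numpy as np

def is_psd(M, tol=1e-9):
    # A symmetric matrix is PSD iff its smallest eigenvalue is >= 0.
    return np.linalg.eigvalsh(M).min() >= -tol

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])

print(is_psd(A - B))          # True:  A - B = diag(1, 0), so A ⪰ B
print(is_psd(A @ A - B @ B))  # False: A² - B² has a negative eigenvalue
```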
PSD matrices appear all the time in algorithmic applications, including some that we
have already seen. Graph Laplacians, Hessians of convex functions, covariance matrices,
and many other natural matrices are always PSD. As we will see today, PSD matrices are
also very useful in formulating optimization problems.
2 Semidefinite programming
The goal of semidefinite programming is to solve optimization problems where the variable is a matrix constrained to be PSD. I.e., we optimize over $X \in \mathbb{R}^{n \times n}$ where $X \in K$ and:
$$K = \{M \mid M \succeq 0\}.$$
$K$ is a convex set: if $X \succeq 0$ and $Y \succeq 0$ are PSD, then for all $\lambda \in [0, 1]$ we have $\lambda X + (1 - \lambda) Y \succeq 0$, since $x^T(\lambda X + (1 - \lambda)Y)x = \lambda x^T X x + (1 - \lambda) x^T Y x \geq 0$ for every $x$. This realization leads to the following convex optimization problem:
$^1$The opposite statement is not true – it can be that $\sigma_i(A) \geq \sigma_i(B)$ for all $i$, but $A \not\succeq B$.
Problem 1 (Semidefinite program – SDP). Let $f$ be a convex function and let $\langle M, N \rangle$ denote $\sum_{i,j} M_{ij} N_{ij}$. We seek to find $X \in \mathbb{R}^{n \times n}$ which solves:
$$\min_X\ f(X) = \langle C, X \rangle \quad \text{subject to} \quad \langle A_i, X \rangle \leq b_i \text{ for } i = 1, \dots, k, \qquad X \succeq 0.$$
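As a concrete illustration, here is a minimal sketch of how a toy instance of Problem 1 could be solved with the cvxpy library; cvxpy and all of the data below are our own illustrative choices, not part of the lecture:

```python
import cvxpy as cp
import numpy as np

# Toy instance: minimize <C, X> subject to one linear constraint and X ⪰ 0.
n = 3
C = np.eye(n)             # illustrative cost matrix
A1 = np.ones((n, n))      # illustrative constraint matrix
b1 = 2.0

X = cp.Variable((n, n), PSD=True)        # encodes the constraint X ⪰ 0
constraints = [cp.trace(A1 @ X) == b1]   # <A1, X> = b1
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()

print(prob.value)   # optimal objective value
print(X.value)      # optimal PSD matrix
```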
Problem 1 is optimizing over a convex set, since the convex PSD constraint intersected with $k$ linear constraints forms a convex set. It can be viewed as a Linear Program with an infinite number of constraints. Specifically, the PSD constraint $X \succeq 0$ is equivalent to the infinite family of linear constraints:
$$\langle zz^T, X \rangle = z^T X z \geq 0 \quad \text{for all } z \in \mathbb{R}^n.$$
The PSD constraint gives a compact way of encoding these infinitely many linear constraints. In this sense, SDPs are strictly stronger than linear programs.
Exercise 2. Show that every LP can be written as an SDP. The idea is that a diagonal matrix (i.e., one whose off-diagonal entries are all 0) is PSD if and only if its diagonal entries are non-negative.
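One way to flesh out this hint, for an LP in a generic standard form of our own choosing:
$$\min_{x \in \mathbb{R}^n} c^T x \ \text{ s.t. } Ax \leq b,\ x \geq 0
\qquad \Longleftrightarrow \qquad
\min_{X \succeq 0} \langle \operatorname{diag}(c), X \rangle \ \text{ s.t. } \langle \operatorname{diag}(a_i), X \rangle \leq b_i \ \forall i, \quad X_{ij} = 0 \ \forall i \neq j,$$
where $a_i$ is the $i$th row of $A$: the linear constraints force $X = \operatorname{diag}(x)$ for some $x \in \mathbb{R}^n$, and $X \succeq 0$ holds exactly when $x \geq 0$.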
Recall from our lecture on the Ellipsoid Method that any ellipsoid $E$ can be parameterized by a PSD matrix $X \in \mathbb{R}^{n \times n}$ and center $c \in \mathbb{R}^n$, where a point $y$ lies inside $E$ if and only if:
$$\|Xy - c\|_2 \leq 1.$$
Also note that $E$'s volume is proportional to $\det(X^{-1}) = \prod_{i=1}^n 1/\sigma_i(X)$. With some work, it's possible to verify that $\log(\det(X^{-1})) = -\log(\det(X))$ is a convex function in $X$. So minimizing an ellipsoid's volume amounts to maximizing the concave function $\log(\det(X))$, a convex optimization problem over PSD matrices.
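For instance, the minimum-volume ellipsoid containing a set of points can be found with an off-the-shelf solver. A minimal sketch using cvxpy (the library and the test data are our additions, not from lecture):

```python
import cvxpy as cp
import numpy as np

# Find the minimum-volume ellipsoid {y : ||Xy - c||_2 <= 1} covering points.
rng = np.random.default_rng(0)
pts = rng.standard_normal((20, 2))   # 20 arbitrary test points in R^2

n = 2
X = cp.Variable((n, n), PSD=True)
c = cp.Variable(n)

# Volume ∝ det(X^{-1}), so minimizing volume = maximizing the concave
# function log det(X), subject to every point lying inside the ellipsoid.
constraints = [cp.norm(X @ p - c, 2) <= 1 for p in pts]
prob = cp.Problem(cp.Maximize(cp.log_det(X)), constraints)
prob.solve()

print(X.value, c.value)
```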
3 Maximum Cut
Just as we saw for linear programs, SDPs can be very useful in obtaining approximation
algorithms for combinatorial optimization problems. In fact, it’s possible to use the same
“relax-and-round” framework that we saw for linear programs. Semidefinite programs allow
for a richer variety of relaxation and rounding techniques.
One classic example of a combinatorial problem that can be approximated using an algorithm based on semidefinite programming is the maximum cut problem: given a graph $G = (V, E)$, find a set $S \subseteq V$ that maximizes the number of edges crossing between $S$ and $V \setminus S$. In other words, we want to solve:
$$OPT_{MC} = \max_{u_1, \dots, u_n \in \{-1, +1\}} \frac{1}{4} \sum_{(i,j) \in E} |u_i - u_j|^2. \tag{2}$$
If we set $u_i = 1$ for all $i \in S$ and $-1$ otherwise, then this objective function exactly captures the size of the cut between $S$ and $V \setminus S$: $|u_i - u_j|^2 = 0$ if $i, j$ are on the same side of the cut and $|u_i - u_j|^2 = 4$ if they're on different sides.
Unfortunately, solving (2) is NP-hard. It's possible to solve it approximately using a greedy algorithm or LP relaxation, but both only guarantee objective values of $\frac{1}{2} \cdot OPT_{MC}$.
Our main result today is that the maximum cut problem can be approximated to much better accuracy using an algorithm based on semidefinite programming:
Theorem 3 (Goemans, Williamson '94 [2]). There is a randomized SDP rounding scheme that finds a cut with expected size $\geq .878 \cdot OPT_{MC}$.
The first step is to relax (2): instead of requiring each $u_i$ to be a scalar in $\{-1, +1\}$, we allow it to be a unit vector $v_i \in \mathbb{R}^n$, giving the relaxed objective
$$\max_{\substack{v_1, \dots, v_n \in \mathbb{R}^n \\ \|v_i\|_2 = 1}} \frac{1}{4} \sum_{(i,j) \in E} \|v_i - v_j\|_2^2.$$
Writing $X_{ij} = \langle v_i, v_j \rangle$, this is an SDP over $X \succeq 0$ with $X_{ii} = 1$, so this problem can be solved as a semidefinite program. We denote its optimal value by $OPT_{SDP}$.
Figure 1: SDP solutions are unit vectors which are arranged so that vectors $v_i$ and $v_j$ are far apart when nodes $i$ and $j$ are connected with an edge in $G$.
Claim 5.
$$OPT_{MC} \leq OPT_{SDP}.$$
This holds because any $\pm 1$ solution to (2) is feasible for the relaxation (take $v_i = u_i \cdot e_1$ for a fixed unit vector $e_1$), so the relaxation's maximum can only be larger.
Claim 6.
$$\mathbb{E}\left[\sum_{(i,j) \in E} \frac{1}{4} |\tilde{u}_i - \tilde{u}_j|^2\right] \geq .878 \cdot \sum_{(i,j) \in E} \frac{1}{4} \|v_i - v_j\|_2^2.$$
It follows that our rounded solution obtains an expected cut value $\geq .878 \cdot OPT_{SDP}$, which is $\geq .878 \cdot OPT_{MC}$ by Claim 5. Applying Markov's inequality, a few repeated trials ensure that we obtain a good approximate max cut with high probability.
Figure 2: Our SDP solution is rounded by choosing a random hyperplane through the origin and assigning nodes to each side of the cut based on which side of the hyperplane their corresponding vector lies on. In this case, nodes $i$ and $j$ are placed on one side of the cut, with node $k$ placed on the other side. In other words, $\tilde{u}_i = \tilde{u}_j = -\tilde{u}_k$.
Intuitively, since vectors corresponding to connected nodes are in general placed as far apart as possible by the SDP, it is more likely that a random hyperplane separates connected nodes, and thus that we obtain a large cut value.
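Putting the relaxation and the rounding together, here is a minimal sketch of the whole pipeline; the use of cvxpy and the eigendecomposition step are our own implementation choices (any factorization $X = VV^T$ works):

```python
import cvxpy as cp
import numpy as np

def goemans_williamson(n, edges, trials=50, seed=0):
    # SDP relaxation: X_ij = <v_i, v_j>, with unit-norm vectors (diag(X) = 1).
    # Per edge, (1 - X_ij)/2 equals (1/4)||v_i - v_j||^2.
    X = cp.Variable((n, n), PSD=True)
    cut_value = sum((1 - X[i, j]) / 2 for (i, j) in edges)
    cp.Problem(cp.Maximize(cut_value), [cp.diag(X) == 1]).solve()

    # Recover vectors v_i as the rows of V, where V V^T ≈ X.
    w, Q = np.linalg.eigh(X.value)
    V = Q * np.sqrt(np.maximum(w, 0))

    # Random hyperplane rounding, repeated a few times, keeping the best cut.
    rng = np.random.default_rng(seed)
    best_cut, best_signs = -1, None
    for _ in range(trials):
        g = rng.standard_normal(n)   # normal vector of a random hyperplane
        signs = np.sign(V @ g)       # which side each v_i falls on
        cut = sum(1 for (i, j) in edges if signs[i] != signs[j])
        if cut > best_cut:
            best_cut, best_signs = cut, signs
    return best_cut, best_signs

# Example: a 5-cycle, whose max cut is 4.
print(goemans_williamson(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])[0])
```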
Formally, we bound the expected number of edges cut in our solution $\tilde{u}_1, \dots, \tilde{u}_n$. Let $\theta_{ij}$ denote the angle (in radians) between vectors $v_i$ and $v_j$. What is the probability that nodes $i$ and $j$ end up on different sides of the cut after random hyperplane rounding? This may seem like a difficult $n$-dimensional calculation, until we realize that there is a 2-dimensional subspace defined by $v_i, v_j$, and all that matters is the intersection of the random hyperplane with this 2-dimensional subspace, which is a random line in this subspace.
So this probability is exactly equal to $\frac{\theta_{ij}}{\pi}$. Thus, by linearity of expectation,
$$\mathbb{E}[\text{number of edges in cut defined by } \tilde{u}_1, \dots, \tilde{u}_n] = \sum_{\{i,j\} \in E} \frac{\theta_{ij}}{\pi}. \tag{4}$$
How do we relate this to $OPT_{SDP}$? We use the fact that $\langle v_i, v_j \rangle = \cos \theta_{ij}$ to rewrite the SDP objective as:
$$OPT_{SDP} = \sum_{\{i,j\} \in E} \frac{1}{4} \|v_i - v_j\|^2 = \sum_{\{i,j\} \in E} \frac{1}{4} \left( \|v_i\|^2 + \|v_j\|^2 - 2\langle v_i, v_j \rangle \right) = \sum_{\{i,j\} \in E} \frac{1}{2} (1 - \cos \theta_{ij}). \tag{5}$$
To compare this objective function to (4), Goemans and Williamson observed that:
$$\frac{\theta / \pi}{\frac{1}{2}(1 - \cos \theta)} = \frac{2\theta}{\pi (1 - \cos \theta)} \geq 0.87856\ldots \quad \forall \theta \in [0, \pi].$$
Applying this bound term by term to the sums in (4) and (5) shows that the expected cut size is at least $0.87856 \cdot OPT_{SDP}$, which proves Claim 6.
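That constant is easy to reproduce numerically; a quick check (our own snippet, in the spirit of the MATLAB calculation mentioned below):

```python
import numpy as np

# Minimize 2θ / (π(1 - cos θ)) over (0, π] on a fine grid.
theta = np.linspace(1e-6, np.pi, 1_000_000)
ratio = 2 * theta / (np.pi * (1 - np.cos(theta)))
print(ratio.min())              # ≈ 0.87856
print(theta[ratio.argmin()])    # minimizer ≈ 2.3311 radians
```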
The saga of 0.878... The GW paper came on the heels of the PCP Theorem (1992), which established that there is a constant $\epsilon > 0$ such that $(1 - \epsilon)$-approximation to MAX-CUT is NP-hard. In the ensuing few years this constant was improved. Meanwhile, most
researchers hoped that the GW algorithm could not be optimal. The most trivial relaxation,
the most trivial rounding, and an approximation ratio derived by MATLAB calculation: it
all just didn’t smell right. However, in 2005 Khot et al. showed that Khot’s unique games
conjecture implies that the GW algorithm cannot be improved by any polynomial-time
algorithm. (Aside: not all experts believe the unique games conjecture.)
References
[1] Lieven Vandenberghe and Stephen Boyd. Applications of semidefinite programming. Applied Numerical Mathematics, 29(3):283–300, 1999.
[2] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.