Advanced Algorithms: Approximation Algorithms
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
18.415/6.854 Advanced Algorithms, November 1994
Introduction
Many of the optimization problems we would like to solve are NP-hard. There are
several ways of coping with this apparent hardness. For most problems, there are
straightforward exhaustive search algorithms, and one could try to speed up such an
algorithm. Techniques which can be used include divide-and-conquer (or the refined
branch-and-bound, which allows one to eliminate part of the search tree by computing, at
every node, bounds on the optimum value), dynamic programming (which sometimes
leads to pseudo-polynomial algorithms), cutting plane algorithms (in which one tries
to refine a linear programming relaxation to better match the convex hull of integer
solutions), randomization, etc. Instead of trying to obtain an optimum solution, we
could also settle for a suboptimal solution. The latter approach refers to heuristic
or "rule of thumb" methods. The most widely used such methods involve some sort
of local search of the problem space, yielding a locally optimal solution. In fact,
heuristic methods can also be applied to polynomially solvable problems for which
existing algorithms are not "efficient" enough. A Θ(n^10) algorithm (or even a linear-time
algorithm with a constant factor of 10^100), although efficient from a complexity point
of view, will probably never get implemented because of its inherent inefficiency.
The drawback with heuristic algorithms is that it is difficult to compare them.
Which is better, which is worse? For this purpose, several kinds of analyses have
been introduced.
1. Empirical analysis. Here the heuristic is tested on a bunch of (hopefully
meaningful) instances, but there is no guarantee that the behavior of the heuristic
on these instances will be "typical" (what does it mean to be typical?).
2. Average-case analysis, dealing with the average-case behavior of a heuristic
over some distribution of instances. The difficulty with this approach is that it
can be difficult to find a distribution that matches the real-life data an algorithm
will face. Probabilistic analyses tend to be quite hard.
3. Worst-case analysis. Here, one tries to evaluate the performance of the
heuristic on the worst possible instance. Although this may be overly pes-
simistic, it gives a stronger guarantee about an algorithm's behavior. This is
the type of analysis we will be considering in these notes.
To this end, we introduce the following definition:
Definition 1 The performance guarantee of a heuristic algorithm for a minimization
(maximization) problem is α if the algorithm is guaranteed to deliver a solution whose
value is at most (at least) α times the optimal value.
Definition 2 An α-approximation algorithm is a polynomial-time algorithm with a
performance guarantee of α.
Before presenting techniques to design and analyze approximation algorithms as
well as specific approximation algorithms, we should first consider which performance
guarantees are unlikely to be achievable.
Negative Results
For some hard optimization problems, it is possible to show a limit on the performance
guarantee achievable in polynomial time (assuming P ≠ NP). A standard method
for proving results of this form is to show that the existence of an a-approximation
algorithm would allow you to solve some NP-complete decision problem in polynomial
time. Even though NP-complete problems have equivalent complexity when exact
solutions are desired, the reductions don't necessarily preserve approximability. The
class of NP-complete problems can be subdivided according to how well a problem
can be approximated.
As a first example, for the traveling salesman problem (given nonnegative lengths
on the edges of a complete graph, find a tour, i.e., a closed walk visiting every vertex
exactly once, of minimum total length), no constant performance guarantee is achievable
in polynomial time unless P = NP: an α-approximation algorithm, for any α, could be
used to decide whether a graph has a Hamiltonian cycle.
The proof goes as follows. Take any NP-complete language L. Consider the
verifier V given by the characterization of Arora et al. The number of possible
outcomes of the O(log n) coin tosses is S = 2^{O(log n)}, which is polynomial in n. Consider
any outcome of these coin tosses. This gives k bits, say i_1, ..., i_k, to examine in the
proof y. Based on these k bits, V will decide whether to answer yes or no. The
condition that it answers yes can be expressed as a Boolean formula on these k bits
(with the Boolean variables being the bits of y). This formula can be expressed as
the disjunction ("or") of conjunctions ("and") of k literals, one for each satisfying
assignment. Equivalently, it can be written as the conjunction of disjunctions of k
literals (one for each rejecting assignment). Since k is O(1), this latter k-SAT formula
with at most 2^k clauses can be expressed as a 3-SAT formula with a constant number
of clauses and variables (depending exponentially on k). (More precisely, using the
classical reduction from SAT to 3-SAT, we would get a 3-SAT formula with at most
k2^k clauses and variables.) Call this constant number of clauses M ≤ k2^k = O(1).
If x ∈ L, we know that there exists a proof y such that the conjunction of all SM clauses
obtained by concatenating the clauses for each random outcome is satisfiable. However,
if x ∉ L, then for any y, the clauses corresponding to at least half the possible random
outcomes cannot all be satisfied. This means that if x ∉ L, at least S/2 clauses cannot be
satisfied. Thus either all SM clauses can be satisfied, or at most SM − S/2 clauses
can be satisfied. If we had an approximation algorithm with performance guarantee
better than 1 − ε, where ε = 1/(2M), we could decide whether x ∈ L or not in polynomial
time.
The class of MAX-SNP problems is defined in the next section and the corollary
is derived there. We first give some examples of problems that are complete for
MAX-SNP.
1. MAX 2-SAT: Given a set of clauses with one or two literals each, find an as-
signment that maximizes the number of satisfied clauses.
2. MAX k-SAT: Same as MAX 2-SAT, but each clause has up to k literals.
3. MAX CUT: Find a subset of the vertices of a graph that maximizes the number
of edges crossing the associated cut.
where P(G, y, x) is true iff y appears positively in clause x, and N(G, y, x) is true iff
y appears negated in clause x.
Strict NP is the set of problems in NP that can be defined without the third
quantifier:
In this way, we can derive an optimization problem from an SNP predicate. These
maximization problems comprise the class MAX-SNP (MAXimization, Strict NP)
defined by Papadimitriou and Yannakakis [21]. Thus, MAX 3SAT is in MAX-SNP.
Papadimitriou and Yannakakis then introduce an L-reduction (L for linear), which
preserves approximability. In particular, if P L-reduces to P′ and there exists an
α-approximation algorithm for P′, then there exists a γα-approximation algorithm
for P, where γ is some constant depending on the reduction.
Given L-reductions, we can define MAX-SNP-complete problems to be those P ∈
MAX-SNP for which Q ≤_L P for all Q ∈ MAX-SNP. Some examples of MAX-SNP-complete
problems are MAX 3SAT, MAX 2SAT (and in fact MAX kSAT for any
fixed k ≥ 2), and MAX CUT. The fact that MAX 3SAT is MAX-SNP-complete and
Theorem 1 imply the corollary mentioned previously.
For MAX 3SAT, ε in the statement of Theorem 1 can be set to 1/74
(Bellare and Sudan [5]).
Minimization problems may not be expressible so that they are in MAX-SNP,
but they can still be MAX-SNP-hard. Examples of such problems are:
TSP with edge weights 1 and 2 (i.e., d(i, j) ∈ {1, 2} for all i, j). In this case,
there exists a 7/6-approximation algorithm due to Papadimitriou and Yannakakis.
Minimum Vertex Cover. (Given a graph G = (V, E), a vertex cover is a set
S ⊆ V such that (u, v) ∈ E implies u ∈ S or v ∈ S.)
The Design of Approximation Algorithms
We now look at key ideas in the design and analysis of approximation algorithms.
We will concentrate on minimization problems, but the ideas apply equally well to
maximization problems. Since we are interested in the minimization case, we know
that an α-approximation algorithm H has cost C_H ≤ α·C_OPT, where C_OPT is the cost
of the optimal solution, and α ≥ 1.
Relating C_H to C_OPT directly can be difficult. One reason is that for NP-hard
problems, the optimum solution is not well characterized. So instead we can relate
the two in two steps: first exhibit a lower bound LB with LB ≤ C_OPT, and then show
that C_H ≤ α·LB.
We illustrate this with Christofides' algorithm [6] for the TSP with triangle inequality:
1. Compute a minimum spanning tree T of G.
2. Let O be the odd-degree vertices in T. One can prove that |O| is even.
3. Compute a minimum-cost perfect matching M on the vertices of O.
4. Add the edges in M to T. Now the degree of every vertex of G is even. Therefore
G has an Eulerian tour. Trace the tour, and take shortcuts when the same vertex
is reached twice. This cannot increase the cost since the triangle inequality
holds.
The cost of the tour produced is at most Z_T + Z_M, where Z_T is the cost of the
minimum spanning tree and Z_M is the cost of the matching.
Clearly Z_T ≤ Z_TSP, since if we delete an edge of the optimal tour a spanning tree
results, and the cost of the minimum spanning tree is at most the cost of that tree.
Therefore Z_T / Z_TSP ≤ 1.
To show Z_M / Z_TSP ≤ 1/2, consider the optimal tour visiting only the vertices in O.
Clearly, by the triangle inequality, this is of length no more than Z_TSP. There are an
even number of vertices in this tour, and so also an even number of edges, and the
tour defines two disjoint matchings on the graph induced by O. At least one of these
has cost ≤ (1/2)·Z_TSP, and the cost Z_M is no more than this. Hence the tour produced
has cost at most Z_T + Z_M ≤ (3/2)·Z_TSP.
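To make the four steps concrete, here is a minimal Python sketch, assuming the
networkx package (not part of these notes); its minimum_spanning_tree,
min_weight_matching, and eulerian_circuit routines stand in for steps 1, 3, and 4.
The input is assumed to be a complete graph whose 'weight' attributes obey the
triangle inequality.

    import networkx as nx

    def christofides_sketch(G):
        # G: complete nx.Graph with metric 'weight' on every edge.
        T = nx.minimum_spanning_tree(G)                 # step 1: MST
        odd = [v for v in T if T.degree(v) % 2 == 1]    # step 2: odd-degree vertices
        M = nx.min_weight_matching(G.subgraph(odd))     # step 3: min-cost matching on O
        multi = nx.MultiGraph(T)
        multi.add_edges_from(M)                         # step 4: T + M is Eulerian
        tour, seen = [], set()
        for u, _ in nx.eulerian_circuit(multi):         # shortcut repeated vertices
            if u not in seen:
                seen.add(u)
                tour.append(u)
        return tour + tour[:1]                          # close the tour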
A vertex cover U in a graph G = (V, E) is a subset of vertices such that every edge is
incident to at least one vertex in U. The vertex cover problem is defined as follows:
Given a graph G = (V, E) and a weight w(v) ≥ 0 for each vertex v, find a vertex cover
U ⊆ V minimizing w(U) = Σ_{v∈U} w(v). (Note that the problem in which nonpositive
weight vertices are allowed can be handled by including all such vertices in the cover,
deleting them and the incident edges, and finding a minimum-weight cover of the
remaining graph. Although this reduction preserves optimality, it does not maintain
approximability; consider, for example, the case in which the optimum vertex cover
has 0 cost, or even negative cost.)
This can be expressed as an integer program as follows. Let x(v) = 1 if v ∈ U
and x(v) = 0 otherwise. Then

    C_OPT = min_{x∈S} Σ_{v∈V} w(v) x(v),

where S is the set of 0-1 vectors satisfying x(u) + x(v) ≥ 1 for all (u, v) ∈ E, i.e., the
incidence vectors of vertex covers. Relaxing S to a larger set R (for example, by
dropping the integrality constraints) gives a lower bound

    LB = min_{x∈R} Σ_{v∈V} w(v) x(v).
1. Rounding
Find an optimal solution x* to the relaxation. Round x* ∈ R to an element
x′ ∈ S. Then prove f(x′) ≤ α·g(x*) (where f and g denote the objective functions
over S and R, respectively), which implies f(x′) ≤ α·LB ≤ α·C_OPT.
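A minimal sketch of this rounding scheme for weighted vertex cover, assuming the
scipy and numpy packages (an assumption, not part of the notes). Rounding every LP
variable with x*(v) ≥ 1/2 up to 1 gives the factor-2 guarantee of Hochbaum [16],
since every edge constraint x(u) + x(v) ≥ 1 forces at least one endpoint to 1/2.

    import numpy as np
    from scipy.optimize import linprog

    def vertex_cover_by_rounding(n, edges, w):
        # 2-approximation for weighted vertex cover via LP rounding.
        # n: number of vertices, edges: list of (u, v) pairs, w: weight vector.
        A_ub = np.zeros((len(edges), n))
        for row, (u, v) in enumerate(edges):
            A_ub[row, u] = A_ub[row, v] = -1.0   # -x(u) - x(v) <= -1
        res = linprog(c=w, A_ub=A_ub, b_ub=-np.ones(len(edges)),
                      bounds=[(0, 1)] * n)
        x_star = res.x
        # Rounding: x'(v) = 1 iff x*(v) >= 1/2; feasible, and cost <= 2 * LB.
        return [v for v in range(n) if x_star[v] >= 0.5]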
Initialize C (the vertex cover) to the empty set, y = 0, and F = E. The algorithm
proceeds by repeating the following two steps while F ≠ ∅:
1. Choose some e = (u, v) ∈ F. Increase y(e) as much as possible, until inequality (3)
becomes tight for u or v. Assume WLOG it is tight for u.
2. Add u to C and remove all edges incident to u from F.
Clearly C is a vertex cover. Furthermore, w(C) = Σ_{u∈C} Σ_{e: u∈e} y(e) ≤
2 Σ_{e∈E} y(e) ≤ 2·LB, since every edge has at most two endpoints in C and y is dual
feasible.
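The following sketch implements this primal-dual scheme (essentially the algorithm of
Bar-Yehuda and Even [4]) in plain Python; the variable names are illustrative.

    def primal_dual_vertex_cover(edges, w):
        # Primal-dual 2-approximation for weighted vertex cover.
        # edges: list of (u, v) pairs, w: dict of nonnegative vertex weights.
        slack = dict(w)          # slack[v] = w(v) minus sum of y(e) over e incident to v
        cover = set()
        for (u, v) in edges:     # any edge still in F may be chosen
            if u in cover or v in cover:
                continue         # edge already removed from F
            delta = min(slack[u], slack[v])   # raise y(e) until a constraint is tight
            slack[u] -= delta
            slack[v] -= delta
            if slack[u] == 0:
                cover.add(u)
            if slack[v] == 0:
                cover.add(v)
        return cover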
The Min-Cost Perfect Matching Problem
In this section, we illustrate the power of the primal-dual technique to derive approximation
algorithms. We consider the following problem: given a graph G = (V, E) with
edge costs c_e, find a perfect matching of minimum total cost (MCPMP).
The first polynomial-time algorithm for this problem was given by Edmonds [8] and
has a running time of O(n^4), where n = |V|. To date, the fastest strongly polynomial-time
algorithm is due to Gabow [10] and has a running time of O(n(m + n log n)),
where m = |E|. For dense graphs, with m = Θ(n²), this algorithm gives a running time
of O(n³). The best weakly polynomial algorithm is due to Gabow and Tarjan [12]
and runs in time O(m √(n α(m, n) log n) log(nC)), where C is a bound on the costs c_e.
As you might suspect from these bounds, the algorithms involved are fairly
complicated. Also, these algorithms are too slow for many of the instances of the problem
that arise in practice. In this section, we discuss an approximation algorithm by Goemans
and Williamson [13] that runs in time O(n² log n). (This bound has recently
been improved by Gabow, Goemans, and Williamson [11].)
Although MCPMP itself is in PTIME, this algorithm is sufficiently general to give
approximations for many NP-hard problems as well.
The algorithm of Goemans and Williamson is a 2-approximation algorithm: it
outputs a perfect matching with cost not more than a factor of 2 larger than the cost
of a minimum-cost perfect matching. This algorithm requires that the costs c_e form
a metric, that is, they must respect the triangle inequality: c_ij + c_jk ≥ c_ik for all
triples i, j, k of vertices.
Consider the following linear program:

    Z = min Σ_{e∈E} c_e x_e
    subject to:  Σ_{e∈δ(S)} x_e ≥ 1    for all S ⊂ V with |S| odd
                 x_e ≥ 0               for all e ∈ E.

We can now see that the value Z of this linear program is a lower bound on the cost
of any perfect matching. In particular, for any perfect matching M, we let

    x_e = 1 if e ∈ M, and x_e = 0 otherwise.

Clearly, this assignment is a feasible solution to the linear program, so we know that
Z ≤ c(M). This bound also applies to a minimum-cost perfect matching M*, so we
have Z ≤ c(M*).
Note that this is a huge linear program, having one constraint for each S ⊂ V of
odd cardinality. Though it is too large to be solved in polynomial time by any of the
linear programming algorithms we have seen, the ellipsoid method can actually solve
this program in polynomial time. We do not consider this solution technique; rather
we let the linear program and its dual serve as a tool for developing and analyzing
the algorithm.
We now consider the dual linear program:

    Z = max Σ_{S⊂V, |S| odd} y_S
    subject to:  Σ_{S: e∈δ(S)} y_S ≤ c_e    for all e ∈ E
                 y_S ≥ 0                    for all S ⊂ V with |S| odd.

The algorithm will construct a perfect matching M′ together with a dual feasible
solution y such that

    c(M′) ≤ 2 Σ_{S⊂V, |S| odd} y_S.
Since y is dual feasible, we know that

    Σ_{S⊂V, |S| odd} y_S ≤ Z ≤ c(M)

for any perfect matching M. The algorithm actually first constructs a forest F′ and a
dual feasible solution y satisfying

    c(F′) ≤ 2 Σ_{S⊂V, |S| odd} y_S.
We now show how to convert F′ into a perfect matching M′ such that c(M′) ≤
c(F′). The idea is as follows. Starting from the forest F′, consider any vertex v with
degree at least 3. Take two edges (u, v) and (v, w); remove them and replace them
with the single edge (u, w). Since the edge costs obey the triangle inequality, the
resulting forest must have a cost not more than c(F′). Thus, if we can iterate this
operation until all vertices have degree 1, then we have our perfect matching M′.
The only thing that can get in the way of the operation just described is a vertex
of degree 2. Fortunately, we can show that all vertices of F′ have odd degree. Notice
then that this property is preserved by the basic operation we are using. (As a direct
consequence, the property that all components are even is also preserved.) Therefore,
if all vertices of F′ have odd degree, we can iterate the basic operation to produce
a perfect matching M′ such that c(M′) ≤ c(F′). Notice that M′ is produced after
O(n) iterations.
Proof: Suppose there is a vertex v with even degree, and let v be in component
A of F′. Removing v and all its incident edges partitions A into an even number k of
smaller components A_1, A_2, ..., A_k. If all k of these components have odd size, then
it must be the case that A has odd size. But we know that A has even size (all
components of F′ have even size), so there must be a component A_i with even size.
Let v_i denote the vertex in A_i such that (v, v_i) is an edge of F′. Now if we start from
F′ and remove the edge (v, v_i), we separate A into two even-size components. This
contradicts the edge-minimality of F′.
The algorithm
The algorithm must now output an edge-minimal forest F′ with even-size components
and be able to compute a dual feasible solution y such that c(F′) ≤ 2 Σ_S y_S.
At the highest level, the algorithm is:
1. Start with F = ∅.
2. As long as there exists an odd-size component of F, add an edge between two
components (at least one of which has odd size).
Note that the set of components of F is initially just the set of vertices V.
The choice of edges is guided by the dual linear program shown earlier. We start
with all the dual variables equal to zero: y_S = 0. Suppose at some point in the
execution we have a forest F and a dual solution y. We update the duals by

    y_S ← y_S + δ   if S is an odd-size component of F,
    y_S ← y_S       otherwise.

Make δ as large as possible while keeping y dual feasible. By doing this, we make
the constraint on some edge e tight: for some e, the constraint

    Σ_{S: e∈δ(S)} y_S ≤ c_e

becomes an equality. This is the edge e that we add to F. (If more than one edge
constraint becomes tight simultaneously, then just arbitrarily pick one of the edges
to add.)
We now state the algorithm to compute F′. The steps that compute the dual
feasible solution y are commented out by placing the text in curly braces.

    F ← ∅
    C ← {{i} : i ∈ V}                    {the components of F}
    {Let y_S ← 0 for all S with |S| odd.}
    ∀i ∈ V do d(i) ← 0                   {d(i) = Σ_{S∋i} y_S}
    while ∃C ∈ C with |C| odd do
        Find the edge e = (i, j) with i ∈ C_p, j ∈ C_q, p ≠ q,
        which minimizes δ = (c_e − d(i) − d(j)) / (χ(C_p) + χ(C_q)),
        where χ(C) = 1 if |C| is odd and 0 if |C| is even (i.e., the parity of C).
        F ← F ∪ {e}
        ∀C ∈ C with |C| odd do
            ∀i ∈ C do d(i) ← d(i) + δ
            {Let y_C ← y_C + δ.}
        C ← C \ {C_p, C_q} ∪ {C_p ∪ C_q}
    F′ ← edge-minimal F
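A direct (unoptimized, roughly O(n³)) Python transcription of this pseudocode follows;
it is a sketch for intuition, not the O(n² log n) implementation of [13], and the final
edge-minimal pruning (repeatedly discard any edge whose removal leaves two even-size
components) is omitted. It assumes |V| is even and metric costs.

    def gw_forest(n, cost):
        # Grow a forest F with even components, maintaining the implicit duals d(i).
        # n: number of vertices (even); cost[i][j]: metric edge costs.
        comp = {i: {i} for i in range(n)}    # component containing each vertex
        d = [0.0] * n                        # d(i) = sum of y_S over S containing i
        F = []
        while any(len(C) % 2 == 1 for C in comp.values()):
            best, best_edge = None, None
            for i in range(n):
                for j in range(i + 1, n):
                    if comp[i] is comp[j]:
                        continue
                    parity = (len(comp[i]) % 2) + (len(comp[j]) % 2)
                    if parity == 0:
                        continue             # no odd endpoint: never becomes tight
                    delta = (cost[i][j] - d[i] - d[j]) / parity
                    if best is None or delta < best:
                        best, best_edge = delta, (i, j)
            # raise the dual of every odd component by delta
            for C in {id(C): C for C in comp.values()}.values():
                if len(C) % 2 == 1:
                    for v in C:
                        d[v] += best
            i, j = best_edge
            merged = comp[i] | comp[j]       # merge the two components
            for v in merged:
                comp[v] = merged
            F.append(best_edge)
        return F, d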
Proof: We show this by induction on the while loop. Specifically, we show that
at the start of each iteration, the values of the variables y_S are feasible for the dual
linear program. We want to show that for each edge e ∈ E,

    Σ_{S⊂V: e∈δ(S)} y_S ≤ c_e.

The base case is trivial since all variables y_S are initialized to zero and the cost
function c_e is nonnegative. Now consider an edge e′ = (i′, j′) and an iteration. There
are two cases to consider.
In the first case, suppose both i′ and j′ are in the same component at the start
of the iteration. In this case, there is no component C ∈ C for which e′ ∈ δ(C).
Therefore, since the only way a variable y_S gets increased is when S is a component,
none of the variables y_S with e′ ∈ δ(S) get increased at this iteration. By the induction
hypothesis, we assume the iteration starts with

    Σ_{S⊂V: e′∈δ(S)} y_S ≤ c_{e′},

and therefore, since the left-hand side of this inequality does not change during the
iteration, this inequality is also satisfied at the start of the next iteration.
In the second case, suppose i′ and j′ are in different components, i′ ∈ C_{p′} and
j′ ∈ C_{q′}, at the start of the iteration. In this case, we can write

    Σ_{S⊂V: e′∈δ(S)} y_S = d(i′) + d(j′),

since every set S with y_S > 0 contains either i′ or j′ but not both. During the
iteration, this quantity increases by δ·(χ(C_{p′}) + χ(C_{q′})), which by the choice of
δ is at most c_{e′} − d(i′) − d(j′). Hence the constraint for e′ still holds at the start
of the next iteration.
Figure 1: A sample run of the algorithm. The various values of d(i) are indicated by
the shaded regions around the components.
We'll assume a Euclidean distance metric to ease visualization. Now, initially, all
points (1 through 8) are in separate components, and d(i) is 0 for all i. Since the metric
is Euclidean distance, the first edge to be found will be (7, 8). Since both components
are of odd size, δ will be half the distance between them ((c_e − 0 − 0)/(1 + 1)). Since,
in fact, all components are of odd size, every d(i) will be increased by this amount,
as indicated by the innermost white circle around each point. The set {7, 8} now
becomes a single component of even size.
In general, we can see the next edge to be chosen by finding the pair of components
whose boundaries in the picture can be most easily made to touch. Thus, the next
edge is (3,5), since the boundaries of their regions are closest, and the resulting values
of d(i) are represented by the dark gray bands around points 1 through 6. Note that
the component {7, 8} does not increase its d(i) values since it is of even size.
We continue in this way, expanding the "moats" around odd-sized components
until all components are of even size. Since there is an even number of vertices and
we always expand odd-sized components, we are guaranteed to reach such a point.
1. every component of F′ has an even number of vertices, and F′ is edge-minimal
with respect to this property.
Proof:
Let us first show that every component of F′ has an even number of vertices.
Suppose not. Then consider the components of F. Every component of F has an
even number of vertices by design of the algorithm. Consider a component of F′ which
has an odd number of vertices and denote it by T_i′. Let T_i be the component
that T_i′ belongs to in F. Let N_1, ..., N_j be the components of F′ within T_i obtained
by removing T_i′ (see figure 3). T_i has an even number of vertices. Each N_k with 1 ≤ k ≤ j
has an even number of vertices because, otherwise, the edge from N_k to T_i′ would
belong to F′ by definition. But this implies that T_i′ is even, which is a contradiction.
2. Its removal divides a component into two odd-sized components. Despite the
fact that other edges may be removed as well, two odd-sized components would
remain in the forest. Thus, e cannot be removed.
Now let us prove the second portion of the theorem. In what follows, though we
do not explicitly notate it, when we refer to a set S of vertices, we mean a set S of
vertices with |S| odd. We observe that, by the choice of the edges e in F, it suffices to
show that at each iteration

    δ Σ_{C∈C: |C| odd} |F′ ∩ δ(C)| ≤ 2 δ |{C ∈ C : |C| odd}|,

i.e., that

    Σ_{C∈C: |C| odd} |F′ ∩ δ(C)| ≤ 2 |{C ∈ C : |C| odd}|.
To prove this, form the graph H from F′ by contracting each component of C to a
single vertex, calling the resulting vertex odd or even according to the parity of its
component (isolated vertices of H may be discarded). Then

    Σ_{C∈C: |C| odd} |F′ ∩ δ(C)| = Σ_{v∈Odd} d_H(v),

where d_H(v) denotes the degree of node v in the graph H. Since F′ is a forest, H is
also a forest and we have:

    number of edges in H ≤ number of vertices in H,

or

    Σ_{v∈Odd} d_H(v) ≤ 2|Odd| + Σ_{v∈Even} (2 − d_H(v)).

We now claim that if v ∈ Even then v is not a leaf. If this is true, then (2 − d_H(v)) ≤
0 for v ∈ Even and so we are done.
Suppose there is a v_i ∈ Even which is a leaf. Consider the component C in H
that v_i is contained in. By the construction of H, each tree in F′ is either contained
solely in the vertices represented by C or is strictly outside C. Since each tree in
F′ contains an even number of vertices, C does as well (with respect to the original
graph). So v_i and C − v_i each contain an even number of vertices. As a result, removing
the edge between v_i and C − v_i would leave even-sized components, thus contradicting
the minimality of F′.
For simplicity we focus on the unweighted case. The results that we shall obtain
also apply to the weighted case.
Recall that an α-approximation algorithm for MAX CUT is a polynomial-time
algorithm which delivers a cut δ(S) such that d(S) ≥ α·z_MC, where z_MC is the value
of the optimum cut. Until 1993 the best known α was 0.5, but now it is 0.878 due
to an approximation algorithm of Goemans and Williamson [14]. We shall first
look at three (almost identical) algorithms which have an approximation ratio of 0.5.
1. Randomized construction. We select S uniformly from all subsets of V, i.e.,
for each i ∈ V we put i ∈ S with probability 1/2 (independently of j ≠ i). Then

    E[d(S)] = Σ_{(i,j)∈E} Pr[(i,j) ∈ δ(S)]    (by linearity of expectation)
            = Σ_{(i,j)∈E} Pr[i ∈ S, j ∉ S or i ∉ S, j ∈ S]
            = |E|/2.

But clearly z_MC ≤ |E|, and so we have E[d(S)] ≥ (1/2)·z_MC. Note that by comparing
our cut to |E|, the best possible bound that we could obtain is 1/2, since for K_n
(the complete graph on n vertices) we have |E| = n(n−1)/2 and z_MC = ⌊n²/4⌋.
(A short code sketch of this and the greedy variant appears after this list.)
2. Greedy procedure. Let V = {1, 2, ..., n} and let E_j = {i : (i, j) ∈ E and i <
j}. It is clear that {E_j : j = 2, ..., n} forms a partition of E. The algorithm
is:

    Set S = {1}.
    For j = 2 to n do
        if |S ∩ E_j| ≤ (1/2)|E_j|
        then S ← S ∪ {j}.

If we define F_j = E_j ∩ δ(S), then we can see that {F_j : j = 2, ..., n} is a
partition of δ(S). By definition of the algorithm it is clear that |F_j| ≥ |E_j|/2. By
summing over j, we get d(S) ≥ |E|/2. In fact, the greedy algorithm can
be obtained from the randomized algorithm by using the method of conditional
expectations.
3. Local search. In local search we move one vertex at a time from one side of the cut
to the other until we reach a local optimum. A locally optimal cut satisfies
|δ(S) ∩ δ(i)| ≥ (1/2)|δ(i)| for every vertex i, and hence d(S) ≥ |E|/2.
The inequality is true because if |δ(S) ∩ δ(i)| < (1/2)|δ(i)| for some i, then we can
move i to the other side of the cut and get an improvement. This contradicts
local optimality. In the unweighted case this is a polynomial-time
algorithm, since the number of different values that a cut can take is O(n²).
In the weighted case the running time can be exponential. Haken and Luby [15]
have shown that this can be true even for 4-regular graphs. For cubic graphs
the running time is polynomial [22].
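A minimal Python sketch of the first two 0.5-approximations above, for unweighted
graphs (names are illustrative):

    import random

    def random_cut(n, edges):
        # Randomized construction: each vertex joins S with probability 1/2,
        # so E[cut size] = |E|/2 >= z_MC / 2.
        return {i for i in range(n) if random.random() < 0.5}

    def greedy_cut(n, edges):
        # Greedy (derandomized) version: place vertex j so that at least half
        # of its edges E_j to earlier vertices are cut; cut size >= |E|/2.
        S = {0}
        for j in range(1, n):
            E_j = [i for (i, k) in edges if k == j and i < j] + \
                  [k for (i, k) in edges if i == j and k < j]
            if 2 * sum(1 for i in E_j if i in S) <= len(E_j):
                S.add(j)       # adding j cuts the edges to E_j \ S
        return S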
Over the last 15-20 years a number of small improvements were made in the
approximation ratio obtainable for MAX CUT. We now describe the algorithm of Goemans
and Williamson, which compares the cut that we obtain to an upper bound which is
better than |E|.
Suppose that for each vertex i ∈ V we have a vector v_i ∈ R^n (where n = |V|). Let
S_n be the unit sphere {x ∈ R^n : ||x|| = 1}. Take a point r uniformly distributed on
S_n and let S = {i ∈ V : v_i · r ≥ 0} (Figure 5). (Note that without loss of generality
||v_i|| = 1.) Then by linearity of expectations,

    E[d(S)] = Σ_{(i,j)∈E} Pr[(i,j) ∈ δ(S)].
Lemma 7 Pr[(i, j) ∈ δ(S)] = α/π,
where α = arccos(v_i · v_j) (the angle between v_i and v_j).
Proof: This result is easy to see but it is a little difficult to formalize. Let P be the
2-dimensional plane containing v_i and v_j. Then P ∩ S_n is a circle. With probability
1, H = {x : x · r = 0} intersects this circle in exactly two points s and t (which are
diametrically opposed). See figure 6. By symmetry, s and t are uniformly distributed
on the circle. The vectors v_i and v_j are separated by the hyperplane H if and only if
either s or t lies on the smaller arc between v_i and v_j. This happens with probability

    2α/(2π) = α/π.
Figure 6: The plane P.
where we maximize over all choices of the v_i's. We actually have max_v E[d(S)] =
z_MC. Let δ(T) be a cut such that d(T) = z_MC, and let e be the unit vector whose first
component is 1 and whose other components are 0. If we set

    v_i = e    if i ∈ T,
    v_i = −e   otherwise,

then arccos(v_i · v_j)/π equals 1 for every cut edge and 0 for every other edge, so for
this choice E[d(S)] = d(T) = z_MC.
Corollary 8 z_MC = max Σ_{(i,j)∈E} arccos(v_i · v_j)/π, the maximum being over all
unit vectors v_i.
Unfortunately this is as difficult to solve as the original problem and so at first glance
we have not made any progress.
Choosing a good set of vectors
Let f : [−1, 1] → [0, 1] be a function which satisfies f(−1) = 1 and f(1) = 0. Consider
the following program (P):

    Max Σ_{(i,j)∈E} f(v_i · v_j)
    subject to: ||v_i|| = 1 for all i ∈ V.
If we denote the optimal value of this program by z_P, then we have z_MC ≤ z_P. This
is because, given a cut δ(T), we can let

    v_i = e    if i ∈ T,
    v_i = −e   otherwise.

Hence Σ_{(i,j)∈E} f(v_i · v_j) = d(T), and z_MC ≤ z_P follows immediately.
The random hyperplane technique, applied to an optimal solution of (P), delivers a
cut of expected value at least α z_P, where

    α = min_{−1≤x≤1} arccos(x) / (π f(x)).

Proof:

    E[d(S)] = Σ_{(i,j)∈E} arccos(v_i · v_j)/π ≥ α Σ_{(i,j)∈E} f(v_i · v_j) = α z_P.
We must now choose f such that (P) can be solved in polynomial time and α is as
large as possible. We shall show that (P) can be solved in polynomial time whenever
f is linear, and so if we define

    f(x) = (1 − x)/2,

then our first criterion is satisfied. (Note that f(−1) = 1 and f(1) = 0.) With this
choice of f,

    α = min_{−1≤x≤1} 2 arccos(x) / (π(1 − x)) ≈ 0.87856.
Figure 7: Calculating α.
Solving (P)
We now turn our attention to solving:

    Max Σ_{(i,j)∈E} (1 − v_i · v_j)/2
    subject to: ||v_i|| = 1 for all i ∈ V.

Let Y = (y_ij) where y_ij = v_i · v_j. Then

    y_ij = v_i · v_j  implies  Y ⪰ 0,

where Y ⪰ 0 means that Y is positive semidefinite: ∀x : x^T Y x ≥ 0. This is true because

    x^T Y x = Σ_{i,j} x_i x_j (v_i · v_j) = ||Σ_i x_i v_i||² ≥ 0.

Conversely, if Y ⪰ 0 and y_ii = 1 for all i, then it can be shown that there exists a set
of v_i's such that y_ij = v_i · v_j. Hence (P) is equivalent to:
    (P′)  Max Σ_{(i,j)∈E} (1 − y_ij)/2
          subject to: Y ⪰ 0, y_ii = 1 for all i.

Note that Q := {Y : Y ⪰ 0, y_ii = 1} is convex. (If A ⪰ 0 and B ⪰ 0, then
λA + (1 − λ)B ⪰ 0 for 0 ≤ λ ≤ 1.) It can be shown that maximizing a concave function over a
convex set can be done in polynomial time. Hence we can solve (P′) in polynomial
time, since linear functions are concave. This completes the analysis of the algorithm.
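Putting the pieces together, here is a compact sketch of the whole algorithm, assuming
the cvxpy and numpy packages (neither is mentioned in these notes; any SDP solver
would do): solve (P′), factor Y into unit vectors, and round with a random hyperplane.

    import cvxpy as cp
    import numpy as np

    def goemans_williamson_cut(n, edges):
        # Solve (P'): maximize sum over edges of (1 - y_ij)/2 over PSD Y, diag = 1.
        Y = cp.Variable((n, n), PSD=True)
        objective = cp.Maximize(sum((1 - Y[i, j]) / 2 for (i, j) in edges))
        cp.Problem(objective, [cp.diag(Y) == 1]).solve()
        # Recover unit vectors v_i with v_i . v_j ~ y_ij via eigendecomposition.
        vals, vecs = np.linalg.eigh(Y.value)
        V = vecs * np.sqrt(np.clip(vals, 0, None))   # rows of V are the v_i
        r = np.random.randn(n)                        # random hyperplane normal
        return {i for i in range(n) if V[i] @ r >= 0}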
5.5 Remarks
1. The optimum Y could be irrational, but in this case we can find a solution with
an arbitrarily small error in polynomial time.
2. To solve (P′) in polynomial time we could use a variation of the interior point
method for linear programming.
5. The analysis is very nearly tight: for the 5-cycle we have z_MC = 4 and z_P =
(5/2)(1 + cos(π/5)) ≈ 4.5225, which implies that z_MC/z_P ≈ 0.88445.
Bin Packing and P || C_max
One can push the notion of approximation algorithms a bit further than we have been
doing and define the notion of approximation schemes: a polynomial approximation
scheme (pas) is a family of (1 + ε)-approximation algorithms, one for each fixed ε > 0,
each running in polynomial time. We consider two problems:
• Bin Packing: Given item sizes a_1, a_2, ..., a_n ≥ 0 and a bin size of T, find a
partition I_1, ..., I_k of {1, ..., n} such that Σ_{i∈I_j} a_i ≤ T for every j and k is
minimized (the items in I_j are assigned to bin j).
• P || C_max: Given n jobs with processing times p_1, ..., p_n and m machines,
find a partition {I_1, ..., I_m} of {1, ..., n} such that the makespan, defined as
max_i (Σ_{j∈I_i} p_j), is minimum. (The makespan represents the maximum
completion time on any machine, given that the jobs in I_i are assigned to machine
i.)
The decision versions of the two problems are identical and NP-complete. However,
when we consider approximation algorithms for the two problems, we have completely
different results. In the case of the bin packing problem there is no α-approximation
algorithm with α < 3/2, unless P = NP.
Proposition 10 There is no α-approximation algorithm with α < 3/2 for bin packing,
unless P = NP, as seen in Section 2.
However, as we shall see, for P || C_max we have α-approximation algorithms for every
fixed α > 1. For bin packing, it is natural to relax the notion of a performance
guarantee: an algorithm has an asymptotic performance guarantee α if

    α ≥ lim sup_{k→∞} α_k,   where   α_k := sup_{I: OPT(I)=k} A(I)/OPT(I)

and A(I) denotes the value of the solution delivered on instance I.
For P || C_max, there is no difference between an asymptotic performance and a
performance guarantee. This follows from the fact that P || C_max satisfies a scaling
property: an instance with value β·OPT(I) can be constructed by multiplying every
processing time p_j by β.
Using this definition we can analogously define a polynomial asymptotic approximation
scheme (paas) and a fully polynomial asymptotic approximation scheme
(fpaas).
Now we will state some results to illustrate the difference between the two problems
when we consider approximation algorithms.
1. For bin packing, there does not exist an α-approximation algorithm with α <
3/2, unless P = NP. Therefore there is no pas for bin packing unless P = NP.
2. For P || C_max there exists a pas. This is due to Hochbaum and Shmoys [17].
We will study this algorithm in more detail in today's lecture.
3. For bin packing there exists a paas (Fernandez de la Vega and Lueker [7]).
5. For bin packing there even exists a fpaas. This was shown by Karmarkar and
Karp [18].
Notice that in some cases both answers are valid. In such a case, we do not care
if the procedure outputs "yes" or "no". Suppose we have such a procedure. Then we
use binary search to find the solution. To begin our binary search, we must find an
interval containing the optimal C_max. Notice that (Σ_j p_j)/m is the average load
per machine and max_j p_j is the length of the longest job. We can put a bound on the
optimum C_max as follows:

Lemma 11 Let k be the last job to finish in a list schedule. Then

    C_max ≤ (Σ_{j≠k} p_j)/m + p_k.
Step 3: If the answer in Step 2 is "no", then return that there does not exist a schedule with
makespan ≤ T.
If the answer in Step 2 is "yes", then with a deadline of (1 + ε′)T put back all small
jobs using list scheduling (i.e., a greedy strategy), one at a time. If all jobs are
accommodated, then return that schedule; else return that there does not exist a
schedule with makespan ≤ T.
Step 3 of the algorithm gives the final answer of the procedure. In case of a "yes" it
is clear that the answer is correct. In case of a "no" that was propagated from Step 2,
it is also clear that the answer is correct. Finally, if we fail to put back all the small
jobs, we must also show that the algorithm is correct. Let us look at a list schedule
in which some small jobs have been scheduled but others couldn't (see Figure 9).
If we cannot accommodate all small jobs with a deadline of (1 + ε′)T, it means
that all machines are busy at time T, since the processing time of each small job is
≤ ε′T. Hence, the average load per machine exceeds T. Therefore, the answer "no"
is correct.
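A sketch of the Step 3 greedy in Python (illustrative names; eps stands for ε′):

    def put_back_small_jobs(machine_loads, small_jobs, T, eps):
        # List-schedule the small jobs (p_j <= eps*T) with deadline (1+eps)*T.
        # machine_loads: loads of the schedule returned by Step 2.
        # Returns the loads if all jobs fit, or None (correctly answering "no").
        loads = list(machine_loads)
        for p in small_jobs:
            i = min(range(len(loads)), key=lambda i: loads[i])  # least-loaded machine
            if loads[i] + p > (1 + eps) * T:
                return None    # even the least-loaded machine is busy past T, so OPT > T
            loads[i] += p
        return loads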
Now, we describe Step 2 of the algorithm, which deals with the jobs with p_j > ε′T.
Having eliminated the small jobs, we obtain a constant (when ε′ is fixed) upper bound
on the number of jobs processed on one machine. Also, we would like to have only a
small number of distinct processing times in order to be able to enumerate in polynomial
time all possible schedules. For this purpose, the idea is to use rounding. Let q_j be the
largest number of the form ε′T + kε′²T, for some k ∈ ℕ, that is at most p_j. A
refinement of Step 2 is the following.
2.1 Address the decision problem: is there a schedule for {q_j} with makespan ≤ T?
2.2 If the answer is "no", then return that there does not exist a schedule with
makespan ≤ T.
If the answer is "yes", then return that schedule.
The Lemma that follows justifies the correctness of the refined Step 2.
The number of machines needed satisfies the recurrence f(n⃗) = 1 + min_{r⃗∈R, r⃗≤n⃗} f(n⃗ − r⃗):
namely, we need one machine to accommodate the jobs in r⃗ ∈ R and f(n⃗ − r⃗) machines
to accommodate the remaining jobs. In order to compute this recurrence we first have
to compute the at most Q^P vectors in R. The upper bound on the size of R comes
from the fact that we have at most P jobs per machine and each job can have one of
at most Q processing times. Subsequently, for each one of the vectors in R, we have
to iterate over at most n^Q table entries, since n_i ≤ n and there are Q components in
n⃗. Thus, the running time of Step 2.1 is O(n^{1/ε′²} (1/ε′²)^{1/ε′}).
From this point we can derive the overall running time of the pas in a
straightforward manner. Since Step 2 iterates O(log(1/ε)) times and since ε = 2ε′, the overall
running time of the algorithm is O(n^{1/ε′²} (1/ε′²)^{1/ε′} log(1/ε)).
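The recurrence lends itself to a short memoized implementation. The sketch below
(illustrative names, not from the notes) takes the Q rounded times q, the vector n_vec
of job counts per type, the deadline T, and the per-machine job bound P (here P ≈ 1/ε′
and Q ≈ 1/ε′²):

    from functools import lru_cache
    from itertools import product

    def min_machines(n_vec, q, T, P):
        # R: single-machine configurations r (at most P jobs, total time <= T).
        R = [r for r in product(*(range(min(P, n) + 1) for n in n_vec))
             if any(r) and sum(r) <= P
             and sum(ri * qi for ri, qi in zip(r, q)) <= T]

        @lru_cache(maxsize=None)
        def f(rest):
            if not any(rest):
                return 0
            best = float('inf')
            for r in R:
                if all(ri <= ni for ri, ni in zip(r, rest)):
                    # one machine for the jobs in r, f(rest - r) for the others
                    best = min(best, 1 + f(tuple(n - x for n, x in zip(rest, r))))
            return best

        return f(tuple(n_vec))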
Choosing one path from s_i to t_i for each commodity i so as to minimize the maximum
edge congestion W can be expressed as the following integer program:

    Min W
    subject to:
        Σ_w x_i(v, w) − Σ_w x_i(w, v) = 1 if v = s_i, −1 if v = t_i, 0 otherwise   (6)
        Σ_i x_i(v, w) ≤ W    for all (v, w) ∈ E                                    (7)
        x_i(v, w) ∈ {0, 1}.

Notice that constraint (6) forces the x_i to define a path (perhaps not simple) from s_i
to t_i. Constraint (7) ensures that every edge has width no greater than W, and the
overall integer program minimizes W.
We can consider the LP relaxation of this integer program by replacing the
constraints x_i(v, w) ∈ {0, 1} with x_i(v, w) ≥ 0. The resulting linear program can be
solved in polynomial time by using interior point methods discussed earlier in the
course. The resulting solution may not be integral. For example, consider the
multicommodity flow problem with one source and sink, and suppose that there are exactly
i edge-disjoint paths between the source and sink. If we weight the edges of each path
by 1/i (i.e., set x(v, w) = 1/i for each edge of each path), then W_LP = 1/i. The value W_LP
can be no smaller: since there are exactly i edge-disjoint paths, there is a cut in the graph
with i edges. The average flow on these edges will be 1/i, so that the width will be at
least 1/i.
The fractional solution can be decomposed into paths using flow decomposition, a
standard technique from network flow theory. Let x be such that x ≥ 0 and

    Σ_w x(v, w) − Σ_w x(w, v) = a if v = s, −a if v = t, 0 otherwise.

Then x can be decomposed into flows along paths P_1, ..., P_l from s to t (plus possibly
flow on cycles, which can be discarded): there exist a_j > 0 with Σ_j a_j = a such that
the flow

    x_j(v, w) = a_j if (v, w) ∈ P_j, and 0 otherwise

satisfies Σ_j x_j ≤ x.
1. Solve the LP relaxation, obtaining an optimal fractional solution x and width W_LP.
2. Decompose the fractional solution into paths, yielding paths P_ij for i = 1, ..., k
and j = 1, ..., j_i (where P_ij is a path from s_i to t_i), and yielding a_ij > 0 such
that Σ_j a_ij = 1.
3. (Randomization step) For all i, cast a j_i-faced die with face probabilities a_ij. If
the outcome is face f, select path P_if as the path P_i from s_i to t_i.
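The randomization step is essentially one line with numpy (a sketch; paths[i] and
a[i] are assumed to come from the decomposition above):

    import numpy as np

    def select_paths(paths, a):
        # paths[i]: list of candidate paths for commodity i;
        # a[i]: their probabilities (nonnegative, summing to 1).
        # Returns one path per commodity, sampled independently.
        return [paths[i][np.random.choice(len(paths[i]), p=a[i])]
                for i in range(len(paths))]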
We will show, using a Chernoff bound, that with high probability we will get small
congestion. Later we will show how to derandomize this algorithm. To carry out the
derandomization it will be important to have a strong handle on the Chernoff bound
and its derivation.
Chernoff bound
For completeness, we include the derivation of a Chernoff bound, although it already
appears in the randomized algorithms chapter.
Proof:

    Pr[Σ_{i=1}^k X_i > t] = Pr[e^{a Σ_i X_i} > e^{at}]

for any a > 0. Moreover, this can be written as Pr[Y > b] with Y ≥ 0. From
Markov's inequality we have

    Pr[Σ_{i=1}^k X_i > t] ≤ e^{−at} E[e^{a Σ_i X_i}]
                          = e^{−at} Π_{i=1}^k E[e^{a X_i}]   because of independence.

Choosing a such that e^a = 1 + β and t = (1 + β)M, where M ≥ E[Σ_i X_i] = Σ_i p_i,
we obtain

    Pr[Σ_{i=1}^k X_i > (1 + β)M] ≤ (1 + β)^{−(1+β)M} Π_{i=1}^k (1 + β p_i)
                                 ≤ (1 + β)^{−(1+β)M} e^{β Σ_i p_i}
                                 ≤ [e^β / (1 + β)^{(1+β)}]^M,

using 1 + x ≤ e^x.
The second inequality of the corollary follows from the fact that
Theorem 15 Given ε > 0, if the optimal solution to the multicommodity flow problem
has value W* = Ω(log n), where n = |V|, then the algorithm produces a
solution of width W ≤ W* + c√(W* log n) with probability 1 − ε (where c and the
constant in Ω(log n) depend on ε; see the proof).
Proof: Fix an edge (v, w) ∈ E. Edge (v, w) is used by commodity i with probability
p_i = Σ_{j: (v,w)∈P_ij} a_ij. Let X_i be a Bernoulli random variable denoting whether
or not (v, w) is in path P_i. Then W(v, w) = Σ_i X_i, where W(v, w) is the width of
edge (v, w). Hence, E[W(v, w)] = Σ_i p_i ≤ W*.
Therefore, for

    β = c√(log n / W*)

(with c chosen appropriately as a function of ε), the Chernoff bound gives

    Pr[W(v, w) ≥ (1 + β)W*] ≤ ε/n².

Summing over the at most n² edges, the probability that some edge has width more
than (1 + β)W* = W* + c√(W* log n) is at most ε.
Derandomization
We will use the method of conditional probabilities. We will need to supplement this
technique, however, with an additional trick to carry through the derandomization.
This result is due to Raghavan [23].
We can represent the probability space using a decision tree. At the root of the
tree we haven't made any decisions. As we descend the tree from the root, we represent
the choices first for commodity 1, then for commodity 2, etc. Hence the root has j_1
children representing the j_1 possible paths for commodity 1. Each of these nodes has
j_2 children, one for each of the j_2 possible paths for commodity 2. We continue in
this manner until we have reached level k. Clearly the leaves of this tree represent
all the possible choices of paths for the k commodities.
A node at level i (the root is at level 0) is labeled by the i choices of paths for
commodities 1, ..., i: l_1, ..., l_i. Now we define:

    g(l_1, ..., l_i) = Pr[ max_{(v,w)∈E} W(v, w) > (1 + β)W*
                           | path l_1 for commodity 1, ..., path l_i for commodity i ].
By conditioning on the choice of the path for commodity i + 1, we obtain that

    g(l_1, ..., l_i) = Σ_l a_{(i+1) l} g(l_1, ..., l_i, l) ≥ min_l g(l_1, ..., l_i, l).

If we could compute g(l_1, l_2, ...) efficiently, we could start from g(∅) and, by selecting
the minimum at each stage, construct a sequence g(∅) ≥ g(l_1) ≥ g(l_1, l_2) ≥
... ≥ g(l_1, l_2, ..., l_k). Unfortunately, we don't know how to calculate these quantities.
Therefore we need to use an additional trick.
Instead of using the exact value g, we shall use a pessimistic estimator for the
probability of failure. From the derivation of the Chernoff bound and the analysis of
the algorithm, we know that

    g(l_1, ..., l_i) ≤ Σ_{(v,w)∈E} (1 + β)^{−(1+β)W*} Π_{j≤i} (1 + β)^{X_j^{(v,w)}} Π_{j>i} E[(1 + β)^{X_j^{(v,w)}}],   (10)

where the superscript on X_j denotes the dependence on the edge (v, w), i.e., X_j^{(v,w)} = 1
if (v, w) belongs to the path P_j. Letting h(l_1, ..., l_i) be the RHS of (10) when we
condition on selecting path P_{j l_j} for commodity j, j = 1, ..., i, we observe that h can
be computed efficiently, that h(l_1, ..., l_i) ≥ min_l h(l_1, ..., l_i, l), and that h(∅) ≤ ε.
Therefore, selecting the minimum in the last inequality at each stage, we construct
a sequence such that 1 > ε ≥ h(∅) ≥ h(l_1) ≥ h(l_1, l_2) ≥ ... ≥ h(l_1, l_2, ..., l_k) ≥
g(l_1, l_2, ..., l_k). Since g(l_1, l_2, ..., l_k) is either 0 or 1 (there is no randomness involved),
we must have that the choice of paths of this deterministic algorithm gives a maximum
congestion less than (1 + β)W*.
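The pessimistic estimator is easy to carry out in code. The sketch below (all names
illustrative, not from [23]) maintains, for every edge, the product of realized factors
(1+β)^{X_j} for the commodities fixed so far and expectation factors 1 + βp_j for the
rest, and greedily fixes each commodity so as to minimize h:

    import math

    def derandomized_paths(edges, paths, a, W_star, beta):
        # edges: list of hashable edges; paths[i][j]: set of edges of the j-th
        # candidate path of commodity i; a[i][j]: its probability.
        k = len(paths)
        scale = (1 + beta) ** (-(1 + beta) * W_star)
        # p[e][i] = probability that commodity i uses edge e
        p = {e: [sum(a[i][j] for j, P in enumerate(paths[i]) if e in P)
                 for i in range(k)] for e in edges}
        # prod[e] = product over commodities of its current factor in (10)
        prod = {e: math.prod(1 + beta * p[e][i] for i in range(k)) for e in edges}
        chosen = []
        for i in range(k):
            best_j, best_h, best_prod = None, None, None
            for j, P in enumerate(paths[i]):
                # replace commodity i's expectation factor by its realized factor
                trial = {e: prod[e] / (1 + beta * p[e][i])
                            * ((1 + beta) if e in P else 1.0) for e in edges}
                h = scale * sum(trial.values())
                if best_h is None or h < best_h:
                    best_j, best_h, best_prod = j, h, trial
            chosen.append(best_j)
            prod = best_prod
        return chosen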
Multicommodity Flow
Consider an undirected graph G = (V, E) with a capacity u_e on each edge. Suppose
that we are given k commodities and a demand of f_i units of commodity i between
two points s_i and t_i. In the area of multicommodity flow, one is interested in knowing
whether all commodities can be shipped simultaneously. That is, can we find flows
of value f_i between s_i and t_i such that the sum over all commodities of the flow on
each edge (in either direction) is at most the capacity of the edge?
There are several variations of the problem. Here, we consider the concurrent flow
problem: find α*, where α* is the maximum α such that for each commodity we can
ship αf_i units from s_i to t_i. This problem can be solved by linear programming, since
all the constraints are linear. Indeed, one can have a flow variable for each edge and
each commodity (in addition to the variable α), and the constraints consist of the
flow conservation constraints for each commodity as well as a capacity constraint for
every edge. An example is shown in figure 8. The demand for each commodity is 1
unit and the capacity on each edge is 1 unit. It can be shown that α* = 2.
When there is only one commodity, we know that the maximum flow value is equal
to the minimum cut value. Let us investigate whether there is such an analogue for
multicommodity flow. Consider a cut (S, S̄). As usual, δ(S) is the set of edges with
exactly one endpoint in S. Let

    u(δ(S)) = Σ_{(x,y)∈δ(S)} u_xy   and   f(S) = Σ_{i: S separates s_i and t_i} f_i.

Since all flow between S and S̄ must pass along one of the edges in δ(S), we must
have

    α* ≤ u(δ(S)) / f(S).
Proof: Assume that there exists an embedding of {1, 2, ..., n} into {v_1 = 0, v_2, ..., v_n}.
Consider one such embedding. Then

    d²(i, j) = ||v_i − v_j||² = (v_i − v_j) · (v_i − v_j) = v_i · v_i − 2 v_i · v_j + v_j · v_j.
This theorem is given without proof. The reduction is from MAX CUT since, as we
will see later, there is a very close relationship between ℓ_1-embeddable metrics and
cuts. We also omit the proof of the following theorem.
Theorem 19 Let X ⊆ R^n. Then (X, ℓ_2) can be embedded into (R^m, ℓ_1) for some m.
The converse of this theorem is not true, as can be seen from the metric space
({(0,0), (−1,0), (1,0), (0,1)}, ℓ_1): in any ℓ_2 embedding, the image of (0,0) would have
to be the midpoint of the images of (−1,0) and (1,0), but then its distance to the
image of (0,1) would be √3 rather than 1.
Claim 20

    min_S u(δ(S)) / f(S) = min_{ℓ_1-embeddable metrics ℓ} ( Σ_{(x,y)∈E} u_xy ℓ(x, y) ) / ( Σ_{i=1}^k f_i ℓ(s_i, t_i) ).

Proof:
(≥) Given S, let

    ℓ_S(x, y) = 1 if x, y are separated by S, and 0 otherwise.

The cut metric ℓ_S is ℓ_1-embeddable, and

    u(δ(S)) = Σ_{(x,y)∈E} u_xy ℓ_S(x, y)   and   f(S) = Σ_{i=1}^k f_i ℓ_S(s_i, t_i),

so every cut ratio is realized by some ℓ_1-embeddable metric.
(≤) We can view any ℓ_1-embeddable metric ℓ as a nonnegative combination of cut
metrics, ℓ = Σ_i α_i ℓ_{S_i}; see figure 11 for the 2-dimensional case. Then

    Σ_{(x,y)∈E} u_xy ℓ(x, y) / Σ_{i=1}^k f_i ℓ(s_i, t_i)
        = Σ_i α_i u(δ(S_i)) / Σ_i α_i f(S_i)
        ≥ min_i u(δ(S_i)) / f(S_i)
        ≥ min_S u(δ(S)) / f(S),

since a ratio of nonnegative sums is at least the minimum of the termwise ratios.
Claim 21

    α* = min_{ℓ_∞-embeddable metrics (V, d)} Σ_{(x,y)∈E} u_xy d(x, y) / Σ_{i=1}^k f_i d(s_i, t_i).

Note that by Theorem 16 we actually minimize over all metrics.
Proof: Write the concurrent flow problem as a linear program:

    Max α
    subject to: for each commodity i, a flow of value αf_i from s_i to t_i, and, for
                each edge e, total flow on e at most u_e.

Its dual has the form

    Min Σ_{e∈E} u_e ℓ_e
    subject to: Σ_{i=1}^k f_i h_i ≥ 1,
                h_i ≤ Σ_{e∈P} ℓ_e for every path P from s_i to t_i,
                ℓ_e ≥ 0.

The second constraint in the dual implies that h_i is at most the shortest path
length between s_i and t_i with respect to ℓ. By strong duality, if ℓ is an optimum
solution to the dual, then

    α* = Σ_{(i,j)∈E} u_ij ℓ_ij ≥ Σ_{(i,j)∈E} u_ij d(i, j) / Σ_{i=1}^k f_i d(s_i, t_i),

where d(a, b) represents the shortest path length with respect to ℓ. The first
inequality holds because Σ_i f_i h_i is constrained to be at least 1 and h_i ≤ d(s_i, t_i).
Linial, London, and Rabinovich and Aumann and Rabani use the following
strategy to bound the gap between min_S u(δ(S))/f(S) and α*, and to approximate the
minimum multicommodity cut: embed the optimal metric d into an ℓ_1-embeddable
metric ℓ with distortion c, and then use Claim 20 to extract a cut S with

    u(δ(S)) / f(S) ≤ Σ_{(x,y)∈E} u_xy ℓ(x, y) / Σ_{i=1}^k f_i ℓ(s_i, t_i).

• What is c?
Proof: Let k range over {1, 2, 4, 8, ..., 2^j, ..., 2^p}, where p = ⌊log n⌋. Hence we
have p + 1 = O(log n) different values for k. Now choose n_k sets of size k. At first,
take all sets of size k, i.e., n_k = (n choose k). Introduce a coordinate for every such set. This
implies that points are mapped into a space of dimension Σ_k n_k < 2^n. For a set A
of size k, the corresponding coordinate of a point x is d(x, A) = min_{z∈A} d(x, z),
suitably scaled.
Exchanging the roles of x and y, we deduce that |d(x, A) − d(y, A)| ≤ d(x, y). Hence,
each coordinate changes by at most d(x, y) between x and y.
We now want to prove that ||x − y||_1 ≥ d(x, y) (after the appropriate scaling). Fix
two points x and y and define
B ( x , r ) = { z :d(x,z) 5 r } ,
B(X,T) = { z :d(x,z)< r),
po = 0,
pt = infir : IB(x,r)l > 2t, l B ( y , r ) l > 2t).
Let l be the least index such that pl 2 y.Redefine p' so that it is equal to y.
Observe that, for all t, either |B°(x, ρ_t)| < 2^t or |B°(y, ρ_t)| < 2^t. Since B(x, ρ_{l−1}) ∩
B(y, ρ_{l−1}) = ∅, we have 2^{l−1} + 2^{l−1} ≤ n, i.e., l ≤ p. Now fix k = 2^j, where
p − 1 ≥ j ≥ p − l, and let t = p − j (thus, 1 ≤ t ≤ l). By our observation we can assume
without loss of generality that |B°(x, ρ_t)| < 2^t. Let A be a set of size k and consider
the following two conditions.
In order to prove this theorem, we restrict the metric to T and then embed the
restricted metric. If we look at the entire vertex set V, then the first part of the
original proof still works.
This new theorem is enough to show that the gap is O(log k) and to approximate the
multicommodity cut to within O(log k). This result is best possible in the sense that
the gap can be Θ(log k).
References
[1] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy. Proof verification
and hardness of approximation problems. In Proceedings of the 33rd Annual
Symposium on Foundations of Computer Science, pages 14-23, 1992.
[4] R. Bar-Yehuda and S. Even. A linear time approximation algorithm for the
weighted vertex cover problem. Journal of Algorithms, 2:198-203, 1981.
[5] M. Bellare and M. Sudan. Improved non-approximability results. In Proceedings
of the 26th Annual ACM Symposium on Theory of Computing, pages 184-193,
1994.
[6] N. Christofides. Worst-case analysis of a new heuristic for the travelling salesman
problem. Technical Report 388, Graduate School of Industrial Administration,
Carnegie Mellon University, Pittsburgh, PA, 1976.
[10] H. N. Gabow. Data structures for weighted matching and nearest common ancestors
with linking. In Proceedings of the 1st ACM-SIAM Symposium on Discrete
Algorithms, pages 434-443, 1990.
[12] H. N. Gabow and R. E. Tarjan. Faster scaling algorithms for general graph
matching problems. Technical Report CU-CS-432-89, University of Colorado,
Boulder, 1989.
[15] A. Haken and M. Luby. Steepest descent can take exponential time for symmetric
connection networks. Complex Systems, 2:191-196, 1988.
[16] D. Hochbaum. Approximation algorithms for set covering and vertex cover prob-
lems. SIAM Journal on Computing, 11:555-556, 1982.
[17] D. Hochbaum and D. Shmoys. Using dual approximation algorithms for schedul-
ing problems: theoretical and practical results. Journal of the ACM, 34(1), Jan.
1987.
[18] N. Karmarkar and R. Karp. An efficient approximation scheme for the one-
dimensional bin-packing problem. In Proceedings of the 23rd Annual Symposium
on Foundations of Computer Science, 1982.
[19] T. Leighton and S. Rao. An approximate max-flow min-cut theorem for uniform
multicommodity flow problems with applications to approximation algorithms.
In Proceedings of the 29th Annual Symposium on Foundations of Computer
Science, pages 422-431, 1988.
[20] N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of
its algorithmic applications. In Proceedings of the 35th Annual Symposium on
Foundations of Computer Science, 1994.
[22] S. Poljak. Integer linear programs and local search for max-cut. Preprint, 1993.