Approx Algorithms Lecture Notes
Chandra Chekuri
Contents

1 Introduction
1.1 Formal Aspects
1.1.1 NP Optimization Problems
1.1.2 Relative Approximation
1.1.3 Additive Approximation
1.1.4 Hardness of Approximation
1.2 Designing Approximation Algorithms
2 Covering Problems
2.1 Greedy for Set Cover and Maximum Coverage
2.1.1 Greedy Algorithm
2.1.2 Analysis of Greedy Cover
2.1.3 Dominating Set
2.2 Vertex Cover
2.2.1 A 2-approximation for Vertex Cover
2.2.2 Set Cover with small frequencies
2.3 Vertex Cover via LP
2.4 Set Cover via LP
2.4.1 Deterministic Rounding
2.4.2 Randomized Rounding
2.4.3 Dual-fitting
2.4.4 Greedy for implicit instances of Set Cover
2.5 Submodularity
2.5.1 Submodular Set Cover
2.5.2 Submodular Maximum Coverage
2.6 Covering Integer Programs (CIPs)
3 Knapsack
3.1 The Knapsack Problem
3.1.1 A Greedy Algorithm
4 Packing Problems
4.1 Maximum Independent Set Problem in Graphs
4.1.1 Elimination Orders and MIS
4.2 The efficacy of the Greedy algorithm for a class of Independence Families
4.3 Randomized Rounding with Alteration for Packing Problems
4.4 Packing Integer Programs (PIPs)
4.4.1 Randomized Rounding with Alteration for PIPs
16 Multicut
16.1 Upper Bound on the Integrality Gap
16.2 Lower Bound on the Integrality Gap
16.2.1 Expander Graphs
16.2.2 The Multicut Instance
Chapter 1

Introduction
The complexity class P contains the set of problems that can be solved
in polynomial time. From a theoretical viewpoint, this describes the class of
tractable problems, that is, problems that can be solved efficiently. The class NP
is the set of problems that can be solved in non-deterministic polynomial time, or
equivalently, problems for which a solution can be verified in polynomial time.
NP contains many interesting problems that often arise in practice, but there is
good reason to believe P ≠ NP. That is, it is unlikely that there exist algorithms
that solve NP-Hard optimization problems efficiently, and so we often resort to
heuristic methods to solve such problems.
Heuristic approaches include backtrack search and its variants, mathematical
programming methods, local search, genetic algorithms, tabu search, simulated
annealing, etc. Some methods are guaranteed to find an optimal solution, though
they may take exponential time; others are guaranteed to run in polynomial time,
though they may not return an optimal solution. Approximation algorithms are
(typically) polynomial-time heuristics that do not always find an optimal solution,
but they are distinguished from general heuristics in providing guarantees on
the quality of the solution they output.
Approximation Ratio: To give a guarantee on solution quality, one must first
define what we mean by the quality of a solution. We discuss this more carefully
later. For now, note that each instance of an optimization problem has a set of
feasible solutions. The optimization problems we consider have an objective
function which assigns a (real/rational) number/value to each feasible solution
of each instance 𝐼. The goal is to find a feasible solution with minimum objective
function value or maximum objective function value. The former problems are
minimization problems and the latter are maximization problems.
For each instance 𝐼 of a problem, let OPT(𝐼) denote the value of an optimal
solution to instance 𝐼. We say that an algorithm 𝒜 is an 𝛼-approximation
algorithm for a problem if, for every instance 𝐼, the value of the feasible solution
returned by 𝒜 is within a (multiplicative) factor of 𝛼 of OPT(𝐼). Equivalently,
we say that 𝒜 is an approximation algorithm with approximation ratio 𝛼. For a
minimization problem we would have 𝛼 ≥ 1 and for a maximization problem
we would have 𝛼 ≤ 1. However, it is not uncommon to find in the literature a
different convention for maximization problems where one says that 𝒜 is an
𝛼-approximation algorithm if the value of the feasible solution returned by 𝒜
is at least (1/𝛼) · OPT(𝐼); the reason for using this convention is that approximation
ratios for both minimization and maximization problems are then ≥ 1. In this
course we will for the most part use the convention that 𝛼 ≥ 1 for minimization
problems and 𝛼 ≤ 1 for maximization problems.
Remarks:

2. The approximation ratio 𝛼 can depend on the size of the instance 𝐼, so one
should technically write 𝛼(|𝐼|).

3. One can also consider additive approximations: 𝒜 is an 𝛼-additive approximation
algorithm if the value of the solution it returns is at most
OPT(𝐼) + 𝛼 for all 𝐼. This is a valid definition and is the more relevant one
in some settings. However, for many NP problems it is easy to show that
one cannot obtain any interesting additive approximation (unless of course
𝑃 = 𝑁𝑃) due to scaling issues. We will illustrate this via an example later.
Pros and cons of the approximation approach: Some advantages of the approach include:
1. It explains why problems can vary considerably in difficulty.
Disadvantages include:
1. The focus on worst-case measures risks ignoring algorithms or heuristics
that are practical or perform well on average.
4. The framework does not (at least directly) apply to decision problems or
those that are inapproximable.
• For each 𝐼, and 𝑆 ∈ 𝒮(𝐼), |𝑆| ≤ poly(|𝐼|). That is, the solutions are of size
polynomial in the input size.
• There exists a poly-time decision procedure that, for each 𝐼 and 𝑆 ∈ Σ∗, decides if
𝑆 ∈ 𝒮(𝐼). This is the key property of NP: we should be able to verify solutions
efficiently.
For maximization problems, it is also common to use 1/𝛼 (which must
be ≥ 1) as the approximation ratio.
Lemma 1.2. Metric-TSP does not admit an 𝛼 additive approximation algorithm for
any polynomial-time computable 𝛼 unless 𝑃 = 𝑁𝑃.
Proof. For simplicity, suppose every edge has integer cost. For the sake of
contradiction, suppose there exists an additive 𝛼 approximation 𝒜 for Metric-TSP.
Given an instance 𝐼, let 𝐼_𝛽 denote the instance obtained by multiplying every
edge cost by 𝛽, where 𝛽 = 2𝛼; note that 𝑣𝑎𝑙(𝑆, 𝐼_𝛽) = 𝛽 · 𝑣𝑎𝑙(𝑆, 𝐼) for any tour 𝑆
and hence OPT(𝐼_𝛽) = 𝛽 OPT(𝐼). We run 𝒜 on 𝐼_𝛽 and let 𝑆 be the returned solution.
We claim that 𝑆 is an optimal solution for 𝐼. We have 𝑣𝑎𝑙(𝑆, 𝐼) = 𝑣𝑎𝑙(𝑆, 𝐼_𝛽)/𝛽 ≤
OPT(𝐼_𝛽)/𝛽 + 𝛼/𝛽 = OPT(𝐼) + 1/2, as 𝒜 is an 𝛼-additive approximation. Since
OPT(𝐼) ≤ 𝑣𝑎𝑙(𝑆, 𝐼) and both OPT(𝐼) and 𝑣𝑎𝑙(𝑆, 𝐼) are integers, we conclude that
OPT(𝐼) = 𝑣𝑎𝑙(𝑆, 𝐼). Thus 𝒜 solves Metric-TSP exactly in polynomial time, which
is impossible unless 𝑃 = 𝑁𝑃.
Theorem 1.3 ([88]). Unless 𝑃 = 𝑁𝑃, there is no 2 − 𝜖 approximation for 𝑘-center for
any fixed 𝜖 > 0.
Chapter 2

Covering Problems
Part of these notes were scribed by Abul Hassan Samee and Lewis Tseng.
Packing and Covering problems together capture many important problems
in combinatorial optimization. We will discuss several covering problems in
this chapter. Two canonical problems are Minimum Vertex Cover and
its generalization Minimum Set Cover. (Typically we will omit the use of the
qualifiers minimum and maximum since this is often clear from the definition
of the problem and the context.) They play an important role in the study of
approximation algorithms.
A vertex cover in an undirected graph 𝐺 = (𝑉 , 𝐸) is a set 𝑆 ⊆ 𝑉 of vertices
such that for each edge 𝑒 ∈ 𝐸, at least one of its end points is in 𝑆. It is also
called a node cover. In the Vertex Cover problem, our goal is to find a smallest
vertex cover of 𝐺. In the weighted version of the problem, a weight function
𝑤 : 𝑉 → ℝ+ is given, and our goal is to find a minimum weight vertex cover of
𝐺. The unweighted version of the problem is also known as Cardinality Vertex
Cover. Note that we are picking vertices to cover the edges. Vertex Cover is
NP-Hard and is one of the problems on Karp's list.
In the Set Cover problem the input is a set 𝒰 of 𝑛 elements, and a collection
𝒮 = {𝑆_1, 𝑆_2, . . . , 𝑆_𝑚} of 𝑚 subsets of 𝒰 such that ∪_𝑖 𝑆_𝑖 = 𝒰. Our goal in the
Set Cover problem is to select as few subsets as possible from 𝒮 such that their
union covers 𝒰. In the weighted version each set 𝑆_𝑖 has a non-negative weight
𝑤_𝑖 and the goal is to find a set cover of minimum weight. Closely related to the Set
Cover problem is the Maximum Coverage problem. In this problem the input is
again 𝒰 and 𝒮 but we are also given an integer 𝑘 ≤ 𝑚. The goal is to select 𝑘
subsets from 𝒮 such that their union has the maximum cardinality. Note that Set
Cover is a minimization problem while Maximum Coverage is a maximization
problem. Set Cover is essentially equivalent to the Hitting Set problem. In
Hitting Set the input is 𝒰 and 𝒮 but the goal is to pick the smallest number of
elements of 𝒰 that cover the given sets in 𝒮. In other words, we are seeking a
set cover in the dual set system. It is easy to see that Vertex Cover is a special case of
Set Cover.
Set Cover is an important problem in discrete optimization. In the
standard definition the set system is given explicitly. In many applications the
set system is implicit, and often exponential in the explicit part of the input;
nevertheless such set systems are ubiquitous and one can often obtain exact
or approximation algorithms. As an example consider the well known MST
problem in graphs. One way to phrase MST is the following: given an edge-
weighted graph 𝐺 = (𝑉, 𝐸), find a minimum cost subset of the edges that covers
all the cuts of 𝐺; by covering a cut 𝑆 ⊆ 𝑉 we mean that at least one of the edges in
𝛿(𝑆) must be chosen. This may appear to be a strange way of looking at the MST
problem but this view is useful as we will see later. Another implicit example
is the following. Suppose we are given 𝑛 rectangles in the plane and the goal
is to choose a minimum number of points in the plane such that each input
rectangle contains one of the chosen points. This is perhaps more natural to
view as a special case of the Hitting Set problem. In principle the set of points
that we can choose from is infinite but it can be seen that we can confine our
attention to vertices in the arrangement of the given rectangles and it is easy to
see that there are only 𝑂(𝑛²) vertices — however, explicitly computing them may
be expensive, and one may want to treat the problem as an implicit one for the
sake of efficiency.
Covering problems have the feature that a superset of a feasible solution is
also a feasible solution. More abstractly one can cast covering problems as the
following. We are given a finite ground set 𝑉 (vertices in a graph or sets in a
set system) and a family of feasible solutions ℐ ⊆ 2𝑉 where ℐ is upward closed;
by this we mean that if 𝐴 ∈ ℐ and 𝐴 ⊂ 𝐵 then 𝐵 ∈ ℐ. The goal is to find the
smallest cardinality set 𝐴 in ℐ. In the weighted case 𝑉 has weights and the goal
is to find a minimum weight set in ℐ. In some cases one can also consider more
complex non-additive objectives that assign a cost 𝑐(𝑆) to each 𝑆 ∈ ℐ.
Greedy Cover(𝒰 , 𝒮 )
1. repeat
A. pick the set that covers the maximum number of uncovered elements
B. mark elements in the chosen set as covered
2. until done
In the case of Set Cover, the algorithm Greedy Cover is done when all the
elements in 𝒰 have been covered. In the case of Maximum Coverage, the
algorithm is done when exactly 𝑘 subsets have been selected from 𝒮.
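The pseudocode translates directly into a short program. Below is a minimal
Python sketch (the function name and the toy instance are ours, not from the
notes) implementing both stopping rules: pass k=None for Set Cover and a
budget k for Maximum Coverage.

def greedy_cover(universe, sets, k=None):
    # sets: list of Python sets; returns indices of the chosen sets
    uncovered = set(universe)
    chosen = []
    while uncovered and (k is None or len(chosen) < k):
        # pick the set covering the most uncovered elements
        j = max(range(len(sets)), key=lambda j: len(sets[j] & uncovered))
        if not sets[j] & uncovered:
            break  # no set covers anything new
        chosen.append(j)
        uncovered -= sets[j]
    return chosen

U = range(1, 7)
S = [{1, 2, 3}, {4, 5, 6}, {1, 4}, {2, 5}, {3, 6}]
print(greedy_cover(U, S))        # Set Cover: stop when everything is covered
print(greedy_cover(U, S, k=2))   # Maximum Coverage: stop after k sets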
We will prove the following theorem.
Theorem 2.1. Greedy Cover is a 1 − (1 − 1/𝑘)^𝑘 ≥ (1 − 1/𝑒) ≈ 0.632 approximation
for Maximum Coverage, and a (ln 𝑛 + 1) approximation for Set Cover.
The following theorem due to Feige [56] implies that Greedy Cover is
essentially the best possible in terms of the approximation ratio that it guarantees.
Let 𝑥_𝑖 denote the number of new elements covered by Greedy Cover in iteration 𝑖,
let 𝑦_𝑖 = 𝑥_1 + . . . + 𝑥_𝑖 be the total number of elements covered after 𝑖 iterations,
and let 𝑧_𝑖 = OPT − 𝑦_𝑖.

Claim 2.1.1. For 0 ≤ 𝑖 < 𝑘, 𝑥_{𝑖+1} ≥ 𝑧_𝑖/𝑘.
Proof. Let 𝐹∗ ⊆ 𝒰 be the elements covered by some fixed optimum solution; we
have |𝐹∗| = OPT. Consider iteration 𝑖 + 1. Greedy Cover selects the subset 𝑆_𝑗
whose inclusion covers the maximum number of uncovered elements. Since 𝑦_𝑖
elements are covered up to iteration 𝑖, at least OPT − 𝑦_𝑖 = 𝑧_𝑖 elements from 𝐹∗
are uncovered. Let 𝐹∗_𝑖 be the set of uncovered elements from 𝐹∗ at the end of
iteration 𝑖; thus |𝐹∗_𝑖| ≥ 𝑧_𝑖. Since 𝑘 sets together cover 𝐹∗, and hence 𝐹∗_𝑖 as well,
there must be some set in that collection of 𝑘 sets that covers at least |𝐹∗_𝑖|/𝑘
elements. This is a candidate set that can be chosen in iteration 𝑖 + 1. Since the
algorithm picks the set that covers the maximum number of uncovered elements,
the chosen set in iteration 𝑖 + 1 covers at least |𝐹∗_𝑖|/𝑘 ≥ 𝑧_𝑖/𝑘 uncovered elements.
Hence, 𝑥_{𝑖+1} ≥ 𝑧_𝑖/𝑘.
Remark 2.1. It is tempting to make the stronger claim that 𝑥_{𝑖+1} ≥ 𝑧_𝑖/(𝑘 − 𝑖). This is
however false, and it is worthwhile to come up with an example.
By definition, 𝑦_𝑘 = 𝑥_1 + 𝑥_2 + . . . + 𝑥_𝑘 is the total number of elements
covered by Greedy Cover. To analyze the worst case we want to make this sum
as small as possible subject to the preceding claim. Heuristically (and one can
formalize this), choosing 𝑥_{𝑖+1} = 𝑧_𝑖/𝑘 minimizes the sum, and with this choice
the sum is at least (1 − (1 − 1/𝑘)^𝑘) OPT. We give a formal argument now.
Claim 2.1.2. For 𝑖 ≥ 0, 𝑧_𝑖 ≤ (1 − 1/𝑘)^𝑖 · OPT.
Proof. By induction on 𝑖. The claim is trivially true for 𝑖 = 0 since 𝑧_0 = OPT. We
assume inductively that 𝑧_𝑖 ≤ (1 − 1/𝑘)^𝑖 · OPT. Then

𝑧_{𝑖+1} = 𝑧_𝑖 − 𝑥_{𝑖+1} ≤ 𝑧_𝑖 (1 − 1/𝑘)   [using Claim 2.1.1]   ≤ (1 − 1/𝑘)^{𝑖+1} · OPT.
The preceding claims yield the following lemma for algorithm Greedy Cover
when applied on Maximum Coverage.
Lemma 2.1. Greedy Cover is a 1 − (1 − 1/𝑘)^𝑘 ≥ 1 − 1/𝑒 approximation for Maximum
Coverage.

Proof. It follows from Claim 2.1.2 that 𝑧_𝑘 ≤ (1 − 1/𝑘)^𝑘 · OPT ≤ OPT/𝑒. Hence,
𝑦_𝑘 = OPT − 𝑧_𝑘 ≥ (1 − 1/𝑒) · OPT.

We note that (1 − 1/𝑒) ≈ 0.632.
Thus, after 𝑡 steps, at most 𝑘∗ elements are left to be covered. Since Greedy Cover
covers at least one element in each step, it covers all the elements after picking at
most ⌈𝑘∗ ln(𝑛/𝑘∗)⌉ + 𝑘∗ ≤ 𝑘∗(ln 𝑛 + 1) sets.
A useful special case of Set Cover is when all sets are “small”. Does the
approximation bound for Greedy improve? We can prove the following corollary
via Lemma 2.2.
Corollary 2.3. If each set in the set system has at most 𝑑 elements, then Greedy Cover
is a (ln 𝑑 + 1) approximation for Set Cover.

Proof. If each set has at most 𝑑 elements then we have that 𝑘∗ ≥ 𝑛/𝑑 and hence
ln(𝑛/𝑘∗) ≤ ln 𝑑. The claim then follows from Lemma 2.2.
Theorem 2.1 follows directly from Lemmas 2.1 and 2.2.
A near-tight example for Greedy Cover when applied on Set Cover: Consider
a set 𝒰 of 𝑛 elements along with a collection 𝒮 of 𝑘 + 2 subsets
{𝑅_1, 𝑅_2, 𝐶_1, 𝐶_2, . . . , 𝐶_𝑘} of 𝒰. Assume that |𝐶_𝑖| = 2^𝑖 and |𝑅_1 ∩ 𝐶_𝑖| =
|𝑅_2 ∩ 𝐶_𝑖| = 2^{𝑖−1} for 1 ≤ 𝑖 ≤ 𝑘, as illustrated in Fig. 2.1.

Clearly, the optimal solution consists of only two sets, 𝑅_1 and 𝑅_2; hence
OPT = 2. However, Greedy Cover will pick each of the remaining 𝑘 sets, namely
𝐶_𝑘, 𝐶_{𝑘−1}, . . . , 𝐶_1. Since 𝑛 = 2 · ∑_{𝑖=0}^{𝑘−1} 2^𝑖 = 2 · (2^𝑘 − 1), we get 𝑘 = Ω(log 𝑛). One
can construct tighter examples with a more involved analysis.
[Figure: two rows 𝑅_1 and 𝑅_2, each intersecting every column 𝐶_𝑖 in 2^{𝑖−1} elements.]
Figure 2.1: A near-tight example for Greedy Cover when applied on Set Cover
Exercise 2.1. Consider the weighted version of the Set Cover problem where a
weight function 𝑤 : 𝒮 → ℝ+ is given, and we want to select a collection 𝒮′ of
subsets from 𝒮 such that ∪_{𝑋∈𝒮′} 𝑋 = 𝒰 and ∑_{𝑋∈𝒮′} 𝑤(𝑋) is minimized. One
can generalize the greedy heuristic in the natural fashion: in each iteration
the algorithm picks the set that maximizes the ratio of the number of new elements
covered to its weight. Adapt the unweighted analysis to prove that the greedy algorithm
yields an 𝑂(ln 𝑛) approximation for the weighted version (you can be sloppy
with the constant in front of ln 𝑛).
Exercise 2.2. 1. Show that Dominating Set is a special case of Set Cover.

2. What is the greedy heuristic when applied to Dominating Set? Prove that it
yields a (ln(Δ + 1) + 1) approximation where Δ is the maximum degree
in the graph.
Matching-VC(𝐺)

1. 𝑆 ← ∅

2. Compute a maximal matching 𝑀 in 𝐺

3. 𝑆 ← 𝑆(𝑀), the set of endpoints of the edges in 𝑀

4. Output 𝑆
Claim 2.2.1. Let OPT be the size of an optimal vertex cover. Then OPT ≥ |𝑀|.
Proof. Any vertex cover must contain at least one end point of each edge in 𝑀
since no two edges in 𝑀 intersect. Hence OPT ≥ |𝑀 |.
Claim 2.2.2. Let 𝑆(𝑀) = {𝑢, 𝑣 | (𝑢, 𝑣) ∈ 𝑀}. Then 𝑆(𝑀) is a vertex cover.

Proof. If 𝑆(𝑀) is not a vertex cover, then there must be an edge 𝑒 ∈ 𝐸 such that
neither of its endpoints is in 𝑆(𝑀). But then 𝑒 can be added to 𝑀, which
contradicts the maximality of 𝑀.
We now finish the proof of Theorem 2.4. Since 𝑆(𝑀) is a vertex cover,
Claim 2.2.1 implies that |𝑆(𝑀)| = 2 · |𝑀| ≤ 2 · OPT.
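The algorithm is easily implemented in a few lines; here is a Python sketch
(names ours). A maximal matching is built by a single scan over the edges:
an edge is added whenever both of its endpoints are still unmatched, and the
cover is the set of matched endpoints.

def matching_vertex_cover(edges):
    cover = set()
    for u, v in edges:
        # (u, v) joins the matching iff both endpoints are unmatched
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover  # endpoints of a maximal matching: size at most 2 * OPT

# Example: a path on 4 vertices; OPT = 2 and the output has size 4.
print(matching_vertex_cover([(1, 2), (2, 3), (3, 4)]))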
Weighted Vertex Cover: The matching-based heuristic does not generalize in
a straightforward fashion to the weighted case, but 2-approximation algorithms
for the Weighted Vertex Cover problem can be designed based on LP rounding.
Exercise 2.3. Give an 𝑓-approximation for Set Cover, where 𝑓 is the maximum
frequency of an element. Hint: follow the approach used for Vertex Cover.
min ∑_{𝑣∈𝑉} 𝑤_𝑣 𝑥_𝑣
subject to 𝑥_𝑢 + 𝑥_𝑣 ≥ 1 for each edge 𝑒 = (𝑢, 𝑣) ∈ 𝐸
𝑥_𝑣 ∈ {0, 1} for each 𝑣 ∈ 𝑉
Integrality Gap: We introduce the notion of integrality gap to show the best
approximation guarantee we can obtain if we only use the LP values as a lower
bound.
Definition 2.5. For a minimization problem Π, the integrality gap of a linear programming
relaxation/formulation LP for Π is sup_{𝐼∈Π} OPT(𝐼)/OPT_{LP}(𝐼).
That is, the integrality gap is the worst case ratio, over all instances 𝐼 of Π, of
the integral optimal value and the fractional optimal value. Note that different
linear programming formulations for the same problem may have different
integrality gaps.
Claims 2.3.1 and 2.3.2 show that the integrality gap of the Vertex Cover LP
formulation above is at most 2.
Question: Is this bound tight for the Vertex Cover LP?
Consider the following example: take the complete graph 𝐾_𝑛 on 𝑛 vertices,
where each vertex has 𝑤_𝑣 = 1. It is clear that we have to choose 𝑛 − 1 vertices to
cover all the edges; thus OPT(𝐾_𝑛) = 𝑛 − 1. However, 𝑥_𝑣 = 1/2 for each 𝑣 is a
feasible solution to the LP, of total weight 𝑛/2. So the gap is (𝑛 − 1)/(𝑛/2) = 2 − 2/𝑛,
which tends to 2 as 𝑛 → ∞. One can also prove that the integrality gap is essentially 2
even in a class of sparse graphs.
Exercise 2.4. The vertex cover problem can be solved optimally in polynomial
time in bipartite graphs. In fact the LP is integral. Prove this via the maxflow-
mincut theorem and the integrality of flows when capacities are integral.
We give several algorithms for Set Cover based on this primal/dual pair of LPs.

1. Solve the LP relaxation and let 𝑥∗ be an optimal fractional solution

2. Let 𝑃 = {𝑗 | 𝑥∗_𝑗 > 0}

3. Output {𝑆_𝑗 | 𝑗 ∈ 𝑃}
Note that the above algorithm, even when specialized to Vertex Cover is
different from the one we saw earlier. It includes all sets which have a strictly
positive value in an optimum solution to the LP.
Claim 2.4.1. The output of the algorithm is a feasible set cover for the given instance.
Proof. Exercise.
Claim 2.4.2. ∑_{𝑗∈𝑃} 𝑤_𝑗 ≤ 𝑓 ∑_𝑗 𝑤_𝑗 𝑥∗_𝑗 = 𝑓 OPT_{LP}.

Proof. Let 𝑦∗ be an optimal dual solution; by strong duality, ∑_𝑖 𝑦∗_𝑖 = OPT_{LP}.
By complementary slackness, if 𝑥∗_𝑗 > 0 then the corresponding dual constraint
is tight, that is, 𝑤_𝑗 = ∑_{𝑖∈𝑆_𝑗} 𝑦∗_𝑖. Therefore

∑_{𝑗∈𝑃} 𝑤_𝑗 = ∑_{𝑗: 𝑥∗_𝑗>0} 𝑤_𝑗 = ∑_{𝑗: 𝑥∗_𝑗>0} (∑_{𝑖∈𝑆_𝑗} 𝑦∗_𝑖) = ∑_𝑖 𝑦∗_𝑖 (∑_{𝑗: 𝑖∈𝑆_𝑗, 𝑥∗_𝑗>0} 1) ≤ 𝑓 ∑_𝑖 𝑦∗_𝑖 = 𝑓 OPT_{LP},

since each element 𝑖 is contained in at most 𝑓 sets.
Randomized Rounding

1. Solve the LP relaxation and let 𝑥∗ be an optimal fractional solution

2. for 𝑘 = 1 to 2 ln 𝑛 do

A. pick each set 𝑆_𝑗 independently with probability 𝑥∗_𝑗

3. Output the collection of all picked sets

Claim 2.4.3. P[𝑖 is not covered in an iteration] = ∏_{𝑗: 𝑖∈𝑆_𝑗} (1 − 𝑥∗_𝑗) ≤ 1/𝑒.

Proof. Subject to the LP constraint ∑_{𝑗: 𝑖∈𝑆_𝑗} 𝑥∗_𝑗 ≥ 1, if we want to minimize the
probability that element 𝑖 is covered, one can see that the minimum is achieved
by setting 𝑥∗_𝑗 = 1/ℓ for each set 𝑆_𝑗 that covers 𝑖; here ℓ is the number of sets
that cover 𝑖. Then the probability that 𝑖 is not covered is (1 − 1/ℓ)^ℓ ≤ 1/𝑒.
We then obtain the following corollaries:

Corollary 2.6. P[𝑖 is not covered at the end of the algorithm] ≤ 𝑒^{−2 ln 𝑛} ≤ 1/𝑛².

Corollary 2.7. P[all elements are covered when the algorithm stops] ≥ 1 − 1/𝑛.
Proof. Via the union bound. The probability that 𝑖 is not covered is at most
1/𝑛 2 , hence the probability that there is some 𝑖 that is not covered is at most
𝑛 · 1/𝑛 2 ≤ 1/𝑛.
Now we bound the expected cost of the algorithm. Let 𝐶_𝑡 be the cost of the sets
picked in iteration 𝑡; then E[𝐶_𝑡] = ∑_{𝑗=1}^{𝑚} 𝑤_𝑗 𝑥∗_𝑗 = OPT_{LP}, where E[𝑋] denotes the
expectation of a random variable 𝑋. Let 𝐶 = ∑_{𝑡=1}^{2 ln 𝑛} 𝐶_𝑡; we have
E[𝐶] = ∑_{𝑡=1}^{2 ln 𝑛} E[𝐶_𝑡] ≤ 2 ln 𝑛 · OPT_{LP}. By Markov's inequality, P[𝐶 > 2 E[𝐶]] ≤ 1/2, hence
P[𝐶 ≤ 4 ln 𝑛 · OPT_{LP}] ≥ 1/2. Therefore, P[𝐶 ≤ 4 ln 𝑛 · OPT_{LP} and all elements are covered] ≥
1/2 − 1/𝑛. Thus, with probability close to 1/2, the randomized rounding algorithm
succeeds in giving a feasible solution of cost 𝑂(log 𝑛) OPT_{LP}. Note that we can
check whether the solution satisfies the desired properties (feasibility and cost)
and repeat the algorithm if it does not.
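As a concrete sketch, the rounding step can be written as follows in Python,
assuming an optimal fractional solution x_star is already available from an LP
solver (function and variable names are ours).

import math
import random

def randomized_round_set_cover(universe, sets, weights, x_star):
    n = len(universe)
    chosen = set()
    for _ in range(math.ceil(2 * math.log(n))):      # 2 ln n rounds
        for j, xj in enumerate(x_star):
            if random.random() < xj:                 # pick S_j w.p. x*_j
                chosen.add(j)
    covered = set()
    for j in chosen:
        covered |= sets[j]
    cost = sum(weights[j] for j in chosen)
    return chosen, cost, covered == set(universe)    # sets, cost, feasible?

# Per the discussion above, one simply re-runs the rounding whenever the
# returned solution is infeasible or too expensive.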
2. We can also use Chernoff bounds (large deviation bounds) to show that
a single rounding succeeds with high probability (probability at least
1 − 1/poly(𝑛)).
4. After a few rounds, select the cheapest set that covers each uncovered
element. This has low expected cost. This algorithm ensures feasibility
but guarantees cost only in the expected sense. We will see a variant on
the homework.
Lemma 2.3. Let 𝛽_𝑖 be the cost of the cheapest set covering 𝑖. Then ∑_𝑖 𝛽_𝑖 ≤ 𝑑 OPT_{LP}.

Proof. Consider an element 𝑖. We have the constraint ∑_{𝑗: 𝑖∈𝑆_𝑗} 𝑥∗_𝑗 ≥ 1. Since
𝛽_𝑖 ≤ 𝑐_𝑗 for every set 𝑆_𝑗 containing 𝑖, it follows that 𝛽_𝑖 ≤ ∑_{𝑗: 𝑖∈𝑆_𝑗} 𝑐_𝑗 𝑥∗_𝑗. Thus,

∑_𝑖 𝛽_𝑖 ≤ ∑_𝑖 ∑_{𝑗: 𝑖∈𝑆_𝑗} 𝑐_𝑗 𝑥∗_𝑗 ≤ ∑_𝑗 𝑐_𝑗 𝑥∗_𝑗 |𝑆_𝑗| ≤ 𝑑 ∑_𝑗 𝑐_𝑗 𝑥∗_𝑗 = 𝑑 OPT_{LP}.
Now we bound the expected cost of the second phase.

Lemma 2.4. E[𝐶_2] ≤ OPT_{LP}.

Proof. We pay for a set to cover element 𝑖 in the second phase only if 𝑖 is not
covered in the first phase. Letting ℰ_𝑖 denote this event, 𝐶_2 = ∑_𝑖 1[ℰ_𝑖] 𝛽_𝑖. Note
that the events ℰ_𝑖 for different elements 𝑖 are not necessarily independent;
however, we can apply linearity of expectation:

E[𝐶_2] = ∑_𝑖 P[ℰ_𝑖] 𝛽_𝑖 ≤ (1/𝑑) ∑_𝑖 𝛽_𝑖 ≤ OPT_{LP}.
Combining the expected costs of the two phases we obtain the following
theorem.
Theorem 2.8. Randomized rounding with alteration outputs a feasible solution of
expected cost (1 + ln 𝑑) OPT𝐿𝑃 .
Note the simplicity of the algorithm and the tightness of the bound.
Remark 2.4. If 𝑑 = 2 the Set Cover problem becomes the Edge Cover problem
in a graph, which is the following: given an edge-weighted graph 𝐺 = (𝑉, 𝐸),
find a minimum weight subset of edges such that each vertex is covered. Edge
Cover admits a polynomial-time algorithm via a reduction to the minimum-cost
matching problem in a general graph. However, Set Cover with 𝑑 = 3 is NP-Hard
via a reduction from 3-Dimensional Matching.
2.4.3 Dual-fitting
In this section, we introduce the technique of dual-fitting for the analysis of
approximation algorithms. At a high level the approach is the following:

1. Run an algorithm for the problem and charge its cost to a payment over the
dual variables.

2. Scale the payment down by a suitable factor so that it becomes a feasible
solution to the dual LP.

3. Show that the cost of the solution returned by the algorithm can be bounded
in terms of the value of the dual solution.
Note that the algorithm itself need not be LP based. Here, we use Set Cover
as an example. See the previous section for the primal and dual LP formulations
for Set Cover .
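For convenience, the standard primal/dual pair is the following. The primal
(the LP relaxation of Set Cover) is

min ∑_𝑗 𝑤_𝑗 𝑥_𝑗 subject to ∑_{𝑗: 𝑖∈𝑆_𝑗} 𝑥_𝑗 ≥ 1 for all 𝑖 ∈ 𝒰, and 𝑥_𝑗 ≥ 0 for all 𝑗,

and its dual is

max ∑_{𝑖∈𝒰} 𝑦_𝑖 subject to ∑_{𝑖∈𝑆_𝑗} 𝑦_𝑖 ≤ 𝑤_𝑗 for all 𝑗, and 𝑦_𝑖 ≥ 0 for all 𝑖 ∈ 𝒰.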
We can interpret the dual as follows: Think of 𝑦 𝑖 as how much element 𝑖 is
willing to pay to be covered; the dual maximizes the total payment, subject to
the constraint that for each set, the total payment of elements in that set is at
most the cost of the set.
We rewrite the Greedy algorithm for Weighted Set Cover. (Below, 𝑈𝑛𝑐𝑜𝑣𝑒𝑟𝑒𝑑
denotes 𝒰 \ 𝐶𝑜𝑣𝑒𝑟𝑒𝑑.)

1. 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 = ∅

2. 𝐴 = ∅

3. While 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 ≠ 𝒰 do

A. 𝑗 ← arg min_𝑘 𝑤_𝑘 / |𝑆_𝑘 ∩ 𝑈𝑛𝑐𝑜𝑣𝑒𝑟𝑒𝑑|

B. 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 = 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 ∪ 𝑆_𝑗

C. 𝐴 = 𝐴 ∪ {𝑗}

4. end while

5. Output 𝐴
Theorem 2.9. Greedy Set Cover picks a solution of cost ≤ 𝐻𝑑 · OPT𝐿𝑃 , where 𝑑 is
the maximum set size, i.e., 𝑑 = max 𝑗 |𝑆 𝑗 |.
1. 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 = ∅; 𝐴 = ∅

2. While 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 ≠ 𝒰 do

A. 𝑗 ← arg min_𝑘 𝑤_𝑘 / |𝑆_𝑘 ∩ 𝑈𝑛𝑐𝑜𝑣𝑒𝑟𝑒𝑑|

B. if 𝑖 is uncovered and 𝑖 ∈ 𝑆_𝑗, set 𝑝_𝑖 = 𝑤_𝑗 / |𝑆_𝑗 ∩ 𝑈𝑛𝑐𝑜𝑣𝑒𝑟𝑒𝑑|

C. 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 = 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 ∪ 𝑆_𝑗

D. 𝐴 = 𝐴 ∪ {𝑗}
Claim 2.4.4. ∑_{𝑗∈𝐴} 𝑤_𝑗 = ∑_𝑖 𝑝_𝑖.

Proof. Consider the iteration in which 𝑗 is added to 𝐴. Let 𝑆′_𝑗 ⊆ 𝑆_𝑗 be the elements
that are uncovered just before 𝑗 is added. For each 𝑖 ∈ 𝑆′_𝑗 the algorithm sets
𝑝_𝑖 = 𝑤_𝑗/|𝑆′_𝑗|. Hence, ∑_{𝑖∈𝑆′_𝑗} 𝑝_𝑖 = 𝑤_𝑗. Moreover, it is easy to see that the sets 𝑆′_𝑗,
𝑗 ∈ 𝐴, are disjoint and together partition 𝒰. Therefore,

∑_{𝑗∈𝐴} 𝑤_𝑗 = ∑_{𝑗∈𝐴} ∑_{𝑖∈𝑆′_𝑗} 𝑝_𝑖 = ∑_{𝑖∈𝒰} 𝑝_𝑖.
For each 𝑖, let 𝑦′_𝑖 = 𝑝_𝑖/𝐻_𝑑.

Claim 2.4.5. 𝑦′ is a feasible solution to the dual LP.

Suppose the claim is true. Then the cost of Greedy Set Cover's solution is
∑_𝑖 𝑝_𝑖 = 𝐻_𝑑 ∑_𝑖 𝑦′_𝑖 ≤ 𝐻_𝑑 OPT_{LP}. The last step is because any feasible solution for
the dual problem is a lower bound on the value of the primal LP (weak duality).
Now, we prove the claim. Let 𝑆_𝑗 be an arbitrary set, and let |𝑆_𝑗| = 𝑡 ≤ 𝑑. Let
𝑆_𝑗 = {𝑖_1, 𝑖_2, . . . , 𝑖_𝑡}, where the elements are ordered such that 𝑖_1 is covered by
Greedy no later than 𝑖_2, 𝑖_2 is covered no later than 𝑖_3, and so on.

Claim 2.4.6. For 1 ≤ ℎ ≤ 𝑡, 𝑝_{𝑖_ℎ} ≤ 𝑤_𝑗/(𝑡 − ℎ + 1).
Proof. Let 𝑆_{𝑗′} be the set that covers 𝑖_ℎ in Greedy. When Greedy picked 𝑆_{𝑗′}, the
elements 𝑖_ℎ, 𝑖_{ℎ+1}, . . . , 𝑖_𝑡 of 𝑆_𝑗 were uncovered, and hence Greedy could have
picked 𝑆_𝑗 as well. This implies that the density of 𝑆_{𝑗′} when it was picked was
no more than 𝑤_𝑗/(𝑡 − ℎ + 1). Therefore 𝑝_{𝑖_ℎ}, which is set to the density of 𝑆_{𝑗′}, is
at most 𝑤_𝑗/(𝑡 − ℎ + 1).
From the above claim, we have

∑_{1≤ℎ≤𝑡} 𝑝_{𝑖_ℎ} ≤ ∑_{1≤ℎ≤𝑡} 𝑤_𝑗/(𝑡 − ℎ + 1) = 𝑤_𝑗 𝐻_𝑡 ≤ 𝑤_𝑗 𝐻_𝑑,

which establishes the feasibility of 𝑦′.
We will see several examples of implicit use of the greedy analysis in the
course.
2.5 Submodularity
Set Cover turns out to be a special case of a more general problem called
Submodular Set Cover. The Greedy algorithm and its analysis apply in this more
general setting. Submodularity is a fundamental notion with many applications in
Greedy Submodular(𝑓, 𝑁)

1. 𝑆 ← ∅

2. While 𝑓(𝑆) < 𝑓(𝑁): pick 𝑒 ∈ 𝑁 \ 𝑆 maximizing 𝑓(𝑆 + 𝑒) − 𝑓(𝑆) and set 𝑆 ← 𝑆 + 𝑒

3. Output 𝑆
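A small Python sketch of Greedy Submodular (the names and the toy coverage
function are ours); here 𝑓 is passed as a set function.

def greedy_submodular(f, ground):
    S = set()
    target = f(ground)
    while f(S) < target:
        # add the element with the largest marginal gain f(S + e) - f(S)
        e = max(ground - S, key=lambda e: f(S | {e}) - f(S))
        if f(S | {e}) == f(S):
            break  # no element makes progress
        S.add(e)
    return S

# Example: f is the coverage function of a set system, so this
# specializes to Greedy Cover.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}}
f = lambda S: len(set().union(*(sets[j] for j in S))) if S else 0
print(greedy_submodular(f, set(sets)))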
Exercise 2.10. 1. Prove that the greedy algorithm is a 1 + ln(𝑓(𝑁)) approximation
for Submodular Set Cover.
The above and many related results were shown in the influential papers of
Fisher, Nemhauser and Wolsey [61, 128].
min ∑_{𝑗=1}^{𝑛} 𝑤_𝑗 𝑥_𝑗
subject to 𝐴𝑥 ≥ 𝑏
𝑥_𝑗 ≤ 𝑑_𝑗 for 1 ≤ 𝑗 ≤ 𝑛
𝑥_𝑗 ≥ 0 for 1 ≤ 𝑗 ≤ 𝑛
𝑥_𝑗 ∈ ℤ for 1 ≤ 𝑗 ≤ 𝑛
Exercise 2.12. Prove that CIPs are a special case of Submodular Set Cover.
One can apply the Greedy algorithm to the above problem, and the standard
analysis shows that the approximation ratio obtained is 𝑂(log 𝐵) where 𝐵 = ∑_𝑖 𝑏_𝑖
(assuming the 𝑏_𝑖 are integers). Even though this is reasonable, we would
prefer a strongly polynomial bound. In fact there are instances where 𝐵 is
exponential in 𝑛 and the worst-case approximation ratio can be poor. The
natural LP relaxation of the above integer program has a large integrality gap, in
contrast to the case of Set Cover. One needs to strengthen the LP relaxation via
what are known as knapsack cover inequalities. We refer the reader to the paper of
Kolliopoulos and Young [106] and a recent one by Chekuri and Quanrud [40] for
more on this problem.
Chapter 3
Knapsack
The next claim is perhaps more interesting and captures the intuition that
the bad case for greedy happens only when there are “big” items.
Proof. We give a proof sketch via the LP relaxation. Recall that 𝑘 is the first item
that did not fit in the knapsack, and that OPT_{LP} is the optimum value of the LP
relaxation. Suppose we reduce the knapsack capacity to 𝐵′ = 𝑠_1 + 𝑠_2 + . . . + 𝑠_{𝑘−1}
while keeping all the items the same, and let OPT′_{LP} be the LP value for the new
capacity. We claim that OPT′_{LP} ≥ (𝐵′/𝐵) OPT_{LP} — this is because we can take
any feasible solution to the original LP and scale each variable by 𝐵′/𝐵 to obtain
a feasible solution with the new capacity. What is OPT′_{LP}? We note that Greedy
will fill 𝐵′ exactly to capacity with the first 𝑘 − 1 items, and hence
OPT′_{LP} = 𝑝_1 + . . . + 𝑝_{𝑘−1}. Combining, we obtain that

𝑝_1 + . . . + 𝑝_{𝑘−1} ≥ (𝐵′/𝐵) OPT_{LP} ≥ (𝐵′/𝐵) OPT.

We note that 𝐵′ + 𝑠_𝑘 ≥ 𝐵 since item 𝑘 did not fit, and hence 𝐵′ ≥ 𝐵 − 𝑠_𝑘 ≥ 𝐵 − 𝜖𝐵 =
(1 − 𝜖)𝐵. Therefore 𝐵′/𝐵 ≥ (1 − 𝜖), and this finishes the proof.
We may now describe the following algorithm. Let 𝜖 ∈ (0, 1) be a fixed
constant and let ℎ = ⌈1/𝜖⌉. We will try to guess the ℎ most profitable items in an
optimal solution and pack the rest greedily.
Guess h + Greedy(𝑁, 𝐵)

1. For each 𝑆 ⊆ 𝑁 with |𝑆| ≤ ℎ that fits in the knapsack: pack 𝑆, and then greedily (in non-increasing density order) pack the items of 𝑁 − 𝑆 whose profit is at most min_{𝑖∈𝑆} 𝑝_𝑖

2. Output the best of the solutions found
Theorem 3.4. Guess h + Greedy gives a (1 − 𝜖) approximation and runs in 𝑂(𝑛^{⌈1/𝜖⌉+1})
time.
Proof. For the running time, observe that there are 𝑂(𝑛^ℎ) subsets of 𝑁. For each
subset, we spend linear time greedily packing the remaining items. The time
initially spent sorting the items can be ignored thanks to the rest of the running
time.

For the approximation ratio, consider the run of the loop where 𝑆 actually is
the set of ℎ most profitable items in an optimal solution, and suppose the algorithm's
greedy stage packs the set of items 𝐴′ ⊆ (𝑁 − 𝑆). Let OPT′ be the optimal way to pack
the smaller items in 𝑁 − 𝑆, so that OPT = 𝑝(𝑆) + OPT′. Let item 𝑘 be the first
item rejected by the greedy packing of 𝑁 − 𝑆. We know 𝑝_𝑘 ≤ 𝜖 OPT, so by Claim
3.1.1, 𝑝(𝐴′) ≥ OPT′ − 𝜖 OPT. This means the total profit found in that run of the
loop is 𝑝(𝑆) + 𝑝(𝐴′) ≥ (1 − 𝜖) OPT.
Note that for any fixed choice of 𝜖 > 0, the preceding algorithm runs
in polynomial time. This type of algorithm is known as a polynomial time
approximation scheme or PTAS. The term “scheme” refers to the fact that the
algorithm varies with 𝜖. We say a maximization problem Π has a PTAS if for all
𝜖 > 0, there exists a polynomial time algorithm that gives a (1 − 𝜖) approximation
((1 + 𝜖) for minimization problems). In general, one can often find a PTAS for a
problem by greedily filling in a solution after first searching for a good basis on
which to work. As described below, Knapsack actually has something stronger
known as a fully polynomial time approximation scheme or FPTAS. A maximization
problem Π has an FPTAS if for all 𝜖 > 0, there exists an algorithm that gives
a (1 − 𝜖) approximation ((1 + 𝜖) for minimization problems) and runs in time
polynomial in both the input size and 1/𝜖.
optimal solution during the rounding, but the scaled-down OPT is still at least
𝑛/𝜖, so we have only lost an 𝜖 fraction of the solution. This process of rounding and
scaling values for use in exact algorithms is useful in a large number of other
maximization problems. We now formally state the algorithm Round&Scale
and prove its correctness and running time.
Round&Scale(𝑁, 𝐵)

1. For each 𝑖 set 𝑝′_𝑖 = ⌊(𝑛/(𝜖 𝑝_max)) · 𝑝_𝑖⌋

2. Run the exact algorithm with run time 𝑂(𝑛𝑃′), where 𝑃′ = ∑_𝑖 𝑝′_𝑖, to obtain 𝐴

3. Output 𝐴
Proof. The rounding can be done in linear time, and since 𝑃′ = 𝑂(𝑛²/𝜖), the
dynamic-programming-based exact algorithm runs in 𝑂(𝑛³/𝜖) time. For the
approximation guarantee, let 𝛼 = 𝑛/(𝜖 𝑝_max) denote the scaling factor and let 𝐴∗
be an optimal solution to the original instance. Since 𝛼𝑝_𝑖 − 1 ≤ 𝑝′_𝑖 ≤ 𝛼𝑝_𝑖 for each 𝑖,

𝑝(𝐴) ≥ (1/𝛼) 𝑝′(𝐴) ≥ (1/𝛼) 𝑝′(𝐴∗) ≥ 𝑝(𝐴∗) − 𝑛/𝛼 = OPT − 𝜖 𝑝_max ≥ (1 − 𝜖) OPT,

where the last step uses 𝑝_max ≤ OPT.
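The following Python sketch puts Round&Scale together with a textbook dynamic
program over rounded profits (dp[i][q] is the minimum size of a subset of the
first 𝑖 items with rounded profit exactly 𝑞); the names are ours, and the DP is
just one of several exact algorithms one could plug in.

def knapsack_fptas(profits, sizes, B, eps):
    n = len(profits)
    alpha = n / (eps * max(profits))          # scaling factor n/(eps*pmax)
    r = [int(alpha * p) for p in profits]     # rounded profits p'_i
    P = sum(r)
    INF = float("inf")
    dp = [[INF] * (P + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(n):
        for q in range(P + 1):
            dp[i + 1][q] = dp[i][q]           # skip item i
            if q >= r[i] and dp[i][q - r[i]] + sizes[i] < dp[i + 1][q]:
                dp[i + 1][q] = dp[i][q - r[i]] + sizes[i]
    best = max(q for q in range(P + 1) if dp[n][q] <= B)
    A, q = [], best                           # standard traceback
    for i in range(n, 0, -1):
        if dp[i][q] != dp[i - 1][q]:
            A.append(i - 1)
            q -= r[i - 1]
    return A

print(knapsack_fptas([60, 100, 120], [10, 20, 30], B=50, eps=0.5))  # [2, 1]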
It should be noted that this is not the best FPTAS known for Knapsack. In
particular, [111] gives an FPTAS that runs in 𝑂(𝑛 log(1/𝜖) + 1/𝜖⁴) time. There
have been several improvements since, and we refer the reader to Chan's paper for
the latest [32].
Chapter 4

Packing Problems
Remark 4.1. The maximum clique problem is to find the maximum cardinality
clique in a given graph. It is approximation-equivalent to the MIS problem;
simply complement the graph.
The theorem basically says the following: there is a class of graphs in which
the maximum independent set size is either less than 𝑛^𝛿 or greater than 𝑛^{1−𝛿},
and it is NP-Complete to decide whether a given graph falls into the former
category or the latter.
The lower bound result suggests that one should focus on special cases, and
several interesting positive results are known. First, we consider a simple greedy
algorithm for the unweighted problem.
Greedy(𝐺)

1. 𝑆 ← ∅

2. While 𝐺 is not empty: pick a vertex 𝑣 of minimum degree in the current graph, set 𝑆 ← 𝑆 ∪ {𝑣}, and remove 𝑣 and its neighbors from 𝐺

3. Output 𝑆
Theorem 4.3. Greedy outputs an independent set 𝑆 such that |𝑆| ≥ 𝑛/(Δ + 1) where Δ
is the maximum degree of any node in the graph. Moreover |𝑆| ≥ 𝛼(𝐺)/Δ where 𝛼(𝐺)
is the cardinality of the largest independent set. Thus Greedy is a 1/Δ approximation.
Proof. We upper bound the number of nodes in 𝑉 \ 𝑆 as follows. A node 𝑢 is
in 𝑉 \ 𝑆 because it is removed as a neighbor of some node 𝑣 ∈ 𝑆 when Greedy
added 𝑣 to 𝑆. Charge 𝑢 to 𝑣. A node 𝑣 ∈ 𝑆 can be charged at most Δ times since
it has at most Δ neighbors. Hence we have that |𝑉 \ 𝑆| ≤ Δ|𝑆|. Since every node
is either in 𝑆 or 𝑉 \ 𝑆 we have |𝑆| + |𝑉 \ 𝑆| = 𝑛 and therefore (Δ + 1)|𝑆| ≥ 𝑛 which
implies that |𝑆| ≥ 𝑛/(Δ + 1).
We now argue that |𝑆| ≥ 𝛼(𝐺)/Δ. Let 𝑆∗ be a largest independent set in 𝐺.
As in the above proof we can charge each node 𝑣 in 𝑆∗ \ 𝑆 to a node 𝑢 ∈ 𝑆 \ 𝑆∗
which is a neighbor of 𝑣. The number of nodes charged to a node 𝑢 ∈ 𝑆 \ 𝑆∗ is at
most Δ. Thus |𝑆∗ \ 𝑆| ≤ Δ|𝑆 \ 𝑆∗|, and hence 𝛼(𝐺) = |𝑆∗| ≤ |𝑆∗ ∩ 𝑆| + Δ|𝑆 \ 𝑆∗| ≤ Δ|𝑆|.
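A Python sketch of Greedy (assuming the minimum-degree selection rule; names
ours), where the graph is given as an adjacency dict:

def greedy_mis(adj):
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    S = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))       # minimum-degree vertex
        S.append(v)
        removed = adj[v] | {v}                        # delete v and N(v)
        for u in removed:
            adj.pop(u, None)
        for u in adj:
            adj[u] -= removed
    return S

# Example: the 5-cycle, where alpha(C5) = 2.
C5 = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 1}}
print(greedy_mis(C5))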
𝑛
Exercise 4.1. Show that Greedy outputs an independent set of size at least 2(𝑑+1)
where 𝑑 is the average degree of 𝐺.
Remark 4.2. The well-known Turán's theorem shows via a clever argument that
there is always an independent set of size 𝑛/(𝑑 + 1) where 𝑑 is the average degree of
𝐺.
Remark 4.3. For the case of unweighted graphs one can obtain an approximation
ratio of Ω(log 𝑑/(𝑑 log log 𝑑)) where 𝑑 is the average degree. Surprisingly, under a
complexity-theoretic conjecture called the Unique Games conjecture, it is known to
be NP-Hard to approximate MIS to within a factor of 𝑂(log²Δ/Δ) in graphs with
maximum degree Δ when Δ is sufficiently large.
Exercise 4.2. Consider the weighted MIS problem on graphs of maximum degree
Δ. Alter Greedy to sort the nodes in non-increasing order of weight and
show that it gives a 1/Δ-approximation. Can one obtain an Ω(1/𝑑)-approximation
for the weighted case where 𝑑 is the average degree?
LP Relaxation: One can formulate a simple linear-programming relaxation
for the (weighted) MIS problem where we have a variable 𝑥(𝑣) for each node
𝑣 ∈ 𝑉 indicating whether 𝑣 is chosen in the independent set or not. We have
constraints which state that for each edge (𝑢, 𝑣), at most one of 𝑢 and 𝑣 can be chosen.

maximize ∑_{𝑣∈𝑉} 𝑤(𝑣)𝑥(𝑣)
subject to 𝑥(𝑢) + 𝑥(𝑣) ≤ 1 for each (𝑢, 𝑣) ∈ 𝐸
𝑥(𝑣) ∈ [0, 1] for each 𝑣 ∈ 𝑉
maximize ∑_{𝑣∈𝑉} 𝑤(𝑣)𝑥(𝑣)
subject to ∑_{𝑣∈𝑆} 𝑥(𝑣) ≤ 1 for each clique 𝑆 in 𝐺
𝑥(𝑣) ∈ [0, 1] for each 𝑣 ∈ 𝑉
The above linear program has an exponential number of constraints, and it
cannot be solved in polynomial time in general, but for some special cases of
interest the above linear program can indeed be solved (or approximately solved)
in polynomial time and leads to either exact algorithms or good approximation
bounds.
Approximability of Vertex Cover and MIS: The following is a basic fact and
is easy to prove.
Fact 4.1. In any graph 𝐺 = (𝑉 , 𝐸), 𝑆 is a vertex cover in 𝐺 if and only if 𝑉 \ 𝑆 is an
independent set in 𝐺. Thus 𝛼(𝐺) + 𝛽(𝐺) = |𝑉 | where 𝛼(𝐺) is the size of a maximum
independent set in 𝐺 and 𝛽(𝐺) is the size of a minimum vertex cover in 𝐺.
The above shows that if one of Vertex Cover or MIS is NP-Hard then the
other is as well. We have seen that Vertex Cover admits a 2-approximation
while MIS admits no constant factor approximation. It is useful to see why a
2-approximation for Vertex Cover does not give any useful information for MIS
even though 𝛼(𝐺) + 𝛽(𝐺) = |𝑉 |. Suppose 𝑆 ∗ is an optimal vertex cover and has
size ≥ |𝑉 |/2. Then a 2-approximation algorithm is only guaranteed to give a
vertex cover of size |𝑉 |! Hence one does not obtain a non-trivial independent
set by complementing the approximate vertex cover.
Some special cases of MIS: We mention some special cases of MIS that have
been considered in the literature; this is by no means an exhaustive list.
• Interval graphs; these are intersection graphs of intervals on a line. An
exact algorithm can be obtained via dynamic programming and one can
solve more general versions via linear programming methods.
For a vertex 𝑣 in a graph we use 𝑁(𝑣) to denote the set of neighbors of 𝑣 (not including
𝑣 itself). For a graph 𝐺 = (𝑉, 𝐸) and 𝑆 ⊂ 𝑉 we use 𝐺[𝑆] to denote the subgraph
of 𝐺 induced by 𝑆.
Definition 4.4. An undirected graph 𝐺 = (𝑉 , 𝐸) is inductive 𝑘-independent if there
is an ordering of the vertices 𝑣 1 , 𝑣2 , . . . , 𝑣 𝑛 such that for 1 ≤ 𝑖 ≤ 𝑛, 𝛼(𝐺[𝑁(𝑣 𝑖 ) ∩
{𝑣 𝑖+1 , . . . , 𝑣 𝑛 }]) ≤ 𝑘.
Graphs which are inductively 1-independent have a perfect elimination ordering
and are called chordal graphs, owing to an alternative characterization: a graph
is chordal iff each cycle 𝐶 in 𝐺 has a chord (an edge connecting two
nodes of 𝐶 which is not an edge of 𝐶), or in other words there is no induced
cycle of length more than 3.
Exercise 4.3. Prove that the intersection graph of intervals is chordal.
Exercise 4.4. Prove that if Δ(𝐺) ≤ 𝑘 then 𝐺 is inductively 𝑘-independent. Prove
that if 𝐺 is 𝑘-degenerate then 𝐺 is inductively 𝑘-independent.
The preceding shows that planar graphs are inductively 5-independent. In
fact, one can show something stronger, that they are inductively 3-independent.
Given a graph 𝐺 one can ask whether there is an algorithm that checks whether
𝐺 is inductively 𝑘-independent. There is such an algorithm that runs in time
𝑂(𝑘 2 𝑛 𝑘+2 ) [159]. A classical result shows how to recognize chordal graphs (𝑘 = 1)
in linear time. However, most of the useful applications arise by showing that a
certain class of graphs are inductively 𝑘-independent for some small value of 𝑘.
See [159] for several examples.
Exercise 4.5. Prove that the Greedy algorithm that considers the vertices in the
inductive 𝑘-independent order gives a 1/𝑘-approximation for MIS.
Interestingly, one can obtain a 1/𝑘-approximation for the maximum weight
independent set problem in inductively 𝑘-independent graphs. The algorithm
is simple and runs in linear time but is not obvious. To see this, consider the
weighted problem for intervals. The standard algorithm to solve it is via
dynamic programming; however, one can obtain an optimum solution for
all chordal graphs (given the ordering). We refer the reader to [159] for the
algorithm and proof (originally from [7]). Showing an Ω(1/𝑘)-approximation is
easier.
Greedy(𝑁 ,ℐ )
1. 𝑆 ← ∅
2. While (TRUE)
A. Let 𝐴 ← {𝑒 ∈ 𝑁 \ 𝑆 | 𝑆 + 𝑒 ∈ ℐ}
B. If 𝐴 = ∅ break
C. 𝑒 ← argmax𝑒∈𝐴 𝑤(𝑒)
D. 𝑆 ← 𝑆 ∪ {𝑒}
3. Output 𝑆
Exercise 4.6. Prove that the Greedy algorithm gives a 1/2-approximation for the
maximum weight matching problem in a general graph. Also prove that this
bound is tight even in bipartite graphs. Note that max weight matching can be
solved exactly in polynomial time.
The following theorem is not too difficult but not so obvious either.
Theorem 4.6. Greedy gives a 1/𝑘-approximation for the maximum weight independent
set problem in a 𝑘-system.
The above theorem generalizes and unifies several examples that we have
seen so far including MIS in bounded degree graphs, matchings, matroids etc.
How does one see that a given independence system is indeed a 𝑘-system for
some parameter 𝑘? For instance matchings in graphs form a 2-system. The
following simple lemma gives an easy way to argue that a given system is a
𝑘-system.
Lemma 4.1. Suppose (𝑁 , ℐ) is an independence system with the following property: for
any 𝐴 ∈ ℐ and 𝑒 ∈ 𝑁 \ 𝐴 there is a set 𝑌 ⊂ 𝐴 such that |𝑌| ≤ 𝑘 and (𝐴 \ 𝑌) ∪ {𝑒} ∈ ℐ.
Then ℐ is a 𝑘-system.
maximize ∑_{𝑖=1}^{𝑛} 𝑤_𝑖 𝑥_𝑖
subject to ∑_{𝑖: 𝑝_𝑗 ∈ 𝐼_𝑖} 𝑥_𝑖 ≤ 1 for 1 ≤ 𝑗 ≤ 𝑚
𝑥_𝑖 ∈ [0, 1] for 1 ≤ 𝑖 ≤ 𝑛
Rounding-with-Alteration

1. Solve the LP and let 𝑥 be an optimal fractional solution

2. For each 𝑖 independently, set 𝑥′_𝑖 = 1 with probability 𝑥_𝑖/2 and 𝑥′_𝑖 = 0 otherwise

3. 𝑅 ← {𝑖 | 𝑥′_𝑖 = 1}

4. 𝑆 ← ∅

5. For 𝑖 = 𝑛 down to 1 do

A. If 𝑖 ∈ 𝑅 and 𝑆 ∪ {𝐼_𝑖} is an independent set, then 𝑆 ← 𝑆 ∪ {𝐼_𝑖}

6. Output 𝑆
The algorithm consists of two phases. The first phase is a simple selection
phase via independent randomized rounding. The second phase is deterministic
and is a greedy pruning step in the reverse elimination order. To analyze the
expected value of 𝑆 we consider two binary random variables for each 𝑖, 𝑌𝑖 and
𝑍 𝑖 . 𝑌𝑖 is 1 if 𝑖 ∈ 𝑅 and 0 otherwise. 𝑍 𝑖 is 1 if 𝑖 ∈ 𝑆 and 0 otherwise.
By linearity of expectation we have the following claim.

Claim 4.3.1. 𝔼[𝑤(𝑆)] = ∑_𝑖 𝑤_𝑖 𝔼[𝑍_𝑖] = ∑_𝑖 𝑤_𝑖 P[𝑍_𝑖 = 1].
at the point 𝑏_𝑗. Let ℰ_1 be the event that 𝐼_𝑖 is rejected in the pruning phase, and let
ℰ_2 be the event that at least one of the intervals in 𝐴 is selected in the first phase.
Note that ℰ_1 can happen only if ℰ_2 happens, thus P[ℰ_1] ≤ P[ℰ_2]. In general we
try to upper bound P[ℰ_2]. In this simple case we have an exact formula for it:

P[ℰ_2] = 1 − ∏_{𝑗∈𝐴} P[𝑌_𝑗 = 0] = 1 − ∏_{𝑗∈𝐴} (1 − 𝑥_𝑗/2).
We claim that P[ℰ_2] ≤ ∑_{𝑗∈𝐴} 𝑥_𝑗/2 ≤ 1/2. One can derive the first inequality by
showing, via induction, that ∏_{𝑗∈𝐴}(1 − 𝑥_𝑗/2) ≥ 1 − ∑_{𝑗∈𝐴} 𝑥_𝑗/2; the second follows
from the LP constraint for the point 𝑏_𝑗, which gives ∑_{𝑗∈𝐴} 𝑥_𝑗 ≤ 1.
maximize ∑_{𝑖=1}^{𝑛} 𝑝_𝑖 𝑥_𝑖
subject to ∑_𝑖 𝑠_𝑖 𝑥_𝑖 ≤ 1
𝑥_𝑖 ∈ {0, 1} for 1 ≤ 𝑖 ≤ 𝑛
Definition 4.7. A packing integer program (PIP) is an integer program of the form
max{𝑤𝑥 | 𝐴𝑥 ≤ 1, 𝑥 ∈ {0, 1}^𝑛} where 𝑤 is a 1 × 𝑛 non-negative vector and 𝐴 is an
𝑚 × 𝑛 matrix with entries in [0, 1]. We call it a {0, 1}-PIP if all entries of 𝐴 are in {0, 1}.
1. Let 𝑥 be a feasible fractional solution; for each 𝑖 independently set 𝑥′_𝑖 = 1 with probability 𝑥_𝑖/4, and 𝑥′_𝑖 = 0 otherwise

2. 𝑥″ = 𝑥′

3. Alteration: if exactly one big item is chosen in 𝑥′, the algorithm retains that item and rejects all the small items; otherwise, the algorithm rejects all items if two or more big items are chosen in 𝑥′, or if the total size of the small items chosen in 𝑥′ exceeds the capacity
The following claim is easy to verify.
Now let us analyze the probability of an item 𝑖 being present in the final
solution. Let ℰ_1 be the event that ∑_{𝑖∈𝑆} 𝑎_𝑖 𝑥′_𝑖 > 1, that is, the sum of the sizes of
the small items chosen in 𝑥′ exceeds the capacity. Let ℰ_2 be the event that at least
one big item is chosen in 𝑥′.
Proof. Let 𝑋_𝑠 = ∑_{𝑖∈𝑆} 𝑎_𝑖 𝑥′_𝑖 be the random variable that measures the sum of the
sizes of the small items chosen. We have, by linearity of expectation, that

𝔼[𝑋_𝑠] = ∑_{𝑖∈𝑆} 𝑎_𝑖 𝔼[𝑥′_𝑖] = ∑_{𝑖∈𝑆} 𝑎_𝑖 𝑥_𝑖/4 ≤ 1/4.

By Markov's inequality, P[ℰ_1] = P[𝑋_𝑠 > 1] ≤ 1/4.
Proof. Since the size of each big item in 𝐵 is at least 1/2, we have 1 ≥ ∑_{𝑖∈𝐵} 𝑎_𝑖 𝑥_𝑖 ≥
(1/2) ∑_{𝑖∈𝐵} 𝑥_𝑖, and hence ∑_{𝑖∈𝐵} 𝑥_𝑖 ≤ 2. Each big item is chosen with probability
𝑥_𝑖/4, so by the union bound P[ℰ_2] ≤ ∑_{𝑖∈𝐵} 𝑥_𝑖/4 ≤ 1/2.
Thus,

P[𝑍_𝑖 = 1] = P[𝑋_𝑖 = 1] · P[𝑍_𝑖 = 1 | 𝑋_𝑖 = 1] = (𝑥_𝑖/4)(1 − P[𝑍_𝑖 = 0 | 𝑋_𝑖 = 1]) ≥ 𝑥_𝑖/16.
One can improve the above analysis to show that P[𝑍 𝑖 = 1] ≥ 𝑥 𝑖 /8.
Theorem 4.9. The randomized algorithm outputs a feasible solution of expected weight
at least ∑_{𝑖=1}^{𝑛} 𝑤_𝑖 𝑥_𝑖/16.
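The rounding and alteration steps are short enough to state in Python (a sketch
under the conventions above: a[i] in [0,1] are the sizes, x is a feasible fractional
solution, and items with a[i] > 1/2 are big; names ours).

import random

def round_with_alteration(a, x):
    n = len(a)
    chosen = [i for i in range(n) if random.random() < x[i] / 4]
    big = [i for i in chosen if a[i] > 0.5]
    small = [i for i in chosen if a[i] <= 0.5]
    if len(big) == 1:
        return big        # exactly one big item: keep it, drop the smalls
    if big or sum(a[i] for i in small) > 1:
        return []         # two or more big items, or the smalls overflow
    return small          # no big item and the small items fit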
Rounding for 𝒌-sparse PIPs: We now extend the rounding algorithm and
analysis above to 𝑘-sparse PIPs. Let 𝑥 be a feasible fractional solution to
max{𝑤𝑥 | 𝐴𝑥 ≤ 1, 𝑥 ∈ [0, 1]𝑛 }. For a column index 𝑖 we let 𝑁(𝑖) = {𝑗 | 𝐴 𝑗,𝑖 > 0}
be the indices of the rows in which 𝑖 has a non-zero entry. Since 𝐴 is 𝑘-
column-sparse we have that |𝑁(𝑖)| ≤ 𝑘 for 1 ≤ 𝑖 ≤ 𝑛. When we have more
than one constraint we cannot classify an item/index 𝑖 as big or small since it
may be big for some constraints and small for others. We say that 𝑖 is small
for constraint 𝑗 ∈ 𝑁(𝑖) if 𝐴_{𝑗,𝑖} ≤ 1/2; otherwise 𝑖 is big for constraint 𝑗. Let
𝑆_𝑗 = {𝑖 | 𝑗 ∈ 𝑁(𝑖), 𝑖 small for 𝑗} be the set of all small columns for 𝑗 and
𝐵_𝑗 = {𝑖 | 𝑗 ∈ 𝑁(𝑖), 𝑖 big for 𝑗} be the set of all big columns for 𝑗. Note that
𝑆_𝑗 ∪ 𝐵_𝑗 is the set of all 𝑖 with 𝐴_{𝑗,𝑖} > 0.
1. Let 𝑥 be a feasible fractional solution to the LP relaxation

2. For each 𝑖 independently, set 𝑥′_𝑖 = 1 with probability 𝑥_𝑖/(4𝑘), and 𝑥′_𝑖 = 0 otherwise

3. 𝑥″ = 𝑥′

4. For 𝑗 = 1 to 𝑚 do

A. If ∑_{𝑖∈𝑆_𝑗} 𝐴_{𝑗,𝑖} 𝑥′_𝑖 > 1 (event ℰ_1(𝑗)) or some item of 𝐵_𝑗 is chosen in 𝑥′ (event ℰ_2(𝑗)), set 𝑥″_𝑖 = 0 for all 𝑖 with 𝐴_{𝑗,𝑖} > 0

5. Output 𝑥″
We upper bound the probability P[𝑍 𝑖 = 0|𝑋𝑖 = 1], that is, the probability
that we reject 𝑖 conditioned on the fact that it is chosen in the random solution
𝑥 0. We observe that
P[𝑍_𝑖 = 0 | 𝑋_𝑖 = 1] ≤ ∑_{𝑗∈𝑁(𝑖)} (P[ℰ_1(𝑗)] + P[ℰ_2(𝑗)]) ≤ 𝑘 (1/(4𝑘) + 1/(2𝑘)) ≤ 3/4.
We used the fact that |𝑁(𝑖)| ≤ 𝑘 and the claims above. Therefore,

P[𝑍_𝑖 = 1] = P[𝑋_𝑖 = 1] · P[𝑍_𝑖 = 1 | 𝑋_𝑖 = 1] = (𝑥_𝑖/(4𝑘))(1 − P[𝑍_𝑖 = 0 | 𝑋_𝑖 = 1]) ≥ 𝑥_𝑖/(16𝑘).
The theorem below follows by using the above lemma and linearity of
expectation to compare the expected weight of the output of the randomized
algorithm with that of the fractional solution.
Theorem 4.10. The randomized algorithm outputs a feasible solution of expected weight
at least ∑_{𝑖=1}^{𝑛} 𝑤_𝑖 𝑥_𝑖/(16𝑘). Hence there is a 1/(16𝑘)-approximation for 𝑘-sparse PIPs.
Larger width helps: We saw during the discussion of the Knapsack problem
that if all items are small with respect to the capacity constraint then one can
obtain better approximations. For PIPs we define the width of a given instance
as 𝑊 if max_{𝑖,𝑗} 𝐴_{𝑖𝑗}/𝑏_𝑖 ≤ 1/𝑊; in other words, no single item is more than 1/𝑊
times the capacity of any constraint. One can show, using a very similar algorithm
and analysis as above, that the approximation bound improves to Ω(1/𝑘^{1/⌈𝑊⌉})
for instances of width 𝑊. Thus if 𝑊 = 2 we get an Ω(1/√𝑘) approximation
instead of an Ω(1/𝑘)-approximation. More generally, when 𝑊 ≥ 𝑐 log 𝑘/𝜖 for some
sufficiently large constant 𝑐, we can get a (1 − 𝜖)-approximation. Thus, in the
setting with multiple knapsack constraints, the notion of small with respect
to capacities is that in each constraint the size of the item is at most (𝑐𝜖/log 𝑘) times the
capacity of that constraint.
Chapter 5

Load Balancing and Bin Packing
(The load of a machine is defined as the sum of the processing times of the jobs
that are assigned to that machine.)
which implies

𝐿 − 𝑝_𝑘 ≤ ((∑_{𝑖=1}^{𝑛} 𝑝_𝑖) − 𝑝_𝑘)/𝑚,

and hence

𝐿 ≤ (∑_{𝑖=1}^{𝑛} 𝑝_𝑖)/𝑚 + 𝑝_𝑘 (1 − 1/𝑚) ≤ OPT + OPT (1 − 1/𝑚) = OPT (2 − 1/𝑚),

where the third step follows from the two lower bounds on OPT.
The above analysis is tight, i.e., there exist instances where the greedy
algorithm produces a schedule which has a makespan (2 − 1/𝑚) times the
optimal. Consider the following instance: 𝑚(𝑚 − 1) jobs with unit processing
time and a single job with processing time 𝑚. Suppose the greedy algorithm
schedules all the short jobs before the long job, then the makespan of the schedule
obtained is (2𝑚 − 1) while the optimal makespan is 𝑚. Hence the algorithm
gives a schedule which has makespan 2 − 1/𝑚 times the optimal.
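Greedy list scheduling is a few lines with a heap of machine loads; sorting the
jobs first gives the modified (LPT) algorithm discussed next. A Python sketch
(names ours):

import heapq

def list_schedule(jobs, m, lpt=False):
    if lpt:
        jobs = sorted(jobs, reverse=True)   # longest processing time first
    loads = [(0, j) for j in range(m)]      # (load, machine) min-heap
    heapq.heapify(loads)
    for p in jobs:                          # assign to the least-loaded machine
        load, j = heapq.heappop(loads)
        heapq.heappush(loads, (load + p, j))
    return max(load for load, _ in loads)   # the makespan

m = 4
jobs = [1] * (m * (m - 1)) + [m]            # the tight example above
print(list_schedule(jobs, m))               # 2m - 1 = 7
print(list_schedule(jobs, m, lpt=True))     # m = 4, the optimum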
It may seem from the tight example above that an approximation ratio
𝛼 < (2 − 1/𝑚) could be achieved if the jobs are sorted before processing, which
indeed is the case. The following algorithm, due to [69], sorts the jobs in
decreasing order of processing time prior to running Greedy Multiprocessor
Scheduling algorithm.
We will not prove the preceding theorem which requires some careful case
analysis. Instead we will show how one can obtain an easier bound of 3/2 via
the following claim.
Claim 5.1.1. Suppose the jobs are sorted so that 𝑝_1 ≥ 𝑝_2 ≥ . . . ≥ 𝑝_𝑛 and 𝑛 > 𝑚. Then OPT ≥ 𝑝_𝑚 + 𝑝_{𝑚+1}.
Proof. Since 𝑛 > 𝑚 and the processing times are sorted in decreasing order,
some two of the (𝑚 + 1) largest jobs must be scheduled on the same machine.
Notice that the load of this machine is at least 𝑝 𝑚 + 𝑝 𝑚+1 .
Exercise 5.1. Prove that Modified Greedy Multiprocessor Scheduling gives a
(3/2 − 1/2𝑚)-approximation using the preceding claim and the other two lower
bounds on OPT that we have seen already.
Before going to the description of a PTAS for Multiprocessor Scheduling
problem, we discuss the case when the processing times of the jobs are bounded
from above.
Claim 5.1.2. If 𝑝 𝑖 ≤ 𝜖·OPT, ∀𝑖, then Modified Greedy Multiprocessor Scheduling
gives a (1 + 𝜖)-approximation.
Rounding Jobs:

For each big job 𝑖 do: if 𝑝_𝑖 ∈ (𝜖(1 + 𝜖)^𝑗, 𝜖(1 + 𝜖)^{𝑗+1}], set 𝑝_𝑖 = 𝜖(1 + 𝜖)^{𝑗+1}
Proof. Notice that due to scaling, we have 𝑝_𝑖 ≤ 1 for all jobs 𝐽_𝑖. Since the big job
sizes are between 𝜖 and 1, the number of geometric powers of (1 + 𝜖) required is 𝑘
where 𝜖(1 + 𝜖)^𝑘 ≤ 1, which gives

𝑘 ≤ ln(1/𝜖)/ln(1 + 𝜖) = 𝑂(ln(1/𝜖)/𝜖).
Lemma 5.1. If the number of distinct job sizes is 𝑘, then there is an exact algorithm
that returns a schedule (if one exists) and runs in time 𝑂(𝑛^{2𝑘}).
Proof. Use Dynamic Programming.
Corollary 5.6. Big Jobs can be scheduled (if possible) with load (1 + 𝜖) in time
𝑛^{𝑂(ln(1/𝜖)/𝜖)}.
Once we have scheduled the jobs in ℬ, using Claim 5.1.3, we can pack small
items using greedy list scheduling on top of them. The overall algorithm is then
given as:
PTAS Multiprocessor Scheduling:
1. Guess OPT
2. Define ℬ and 𝒮
3. Round ℬ to ℬ 0
4. If jobs in ℬ 0 can be scheduled in (1 + 𝜖) OPT
Greedily pack 𝒮 on top
Else
Modify the guess and Repeat.
In the following subsection, we comment on the guessing process.
Definition 5.7. Given 𝜖 > 0 and a time 𝑇, a (1 + 𝜖)-relaxed decision procedure either
returns a schedule with makespan at most (1 + 𝜖)𝑇, or correctly declares that there is
no schedule with makespan at most 𝑇.
Define

𝐿 = max{ max_𝑗 𝑝_𝑗 , (1/𝑚) ∑_𝑗 𝑝_𝑗 }.
𝐿 is a lower bound on OPT as we saw earlier. Furthermore, an upper bound on
OPT is given by the Greedy Multiprocessor Scheduling algorithm, which is 2𝐿.
Consider running the decision procedure with guess 𝐿 + 𝑖𝜖𝐿 for each integer
𝑖 ∈ {0, 1, . . . , ⌈2/𝜖⌉}. We choose the schedule with the best makespan among all
the successful runs. If 𝐿∗ is the optimum makespan, then the algorithm will try the
decision procedure with some guess 𝑇 satisfying 𝐿∗ ≤ 𝑇 ≤ 𝐿∗ + 𝜖𝐿 ≤ (1 + 𝜖)𝐿∗. For
this guess we are guaranteed a solution, and the decision procedure will succeed in
outputting a schedule with load (1 + 𝜖)(1 + 𝜖)𝐿∗ ≤ (1 + 3𝜖)𝐿∗ for sufficiently small 𝜖.
We run the decision procedure 𝑂(1/𝜖) times. This gives us the desired PTAS.
Remark 5.1. A PTAS indicates that the problem can be approximated arbitrarily
well in polynomial time. However, a running time of the form 𝑛^{𝑓(𝜖)} is typically
not very interesting. We have seen that an FPTAS is ruled out for the makespan
minimization problem. However, it does admit what is now called an Efficient
PTAS (EPTAS) whose running time is 2^{𝑂((1/𝜖²)·(log(1/𝜖))³)} + poly(𝑛). See [93].
In the Greedy Bin Packing algorithm, a new bin is opened only if the item cannot
be packed in any of the already opened bins. However, there might be
several open bins in which item 𝑖 could be packed. Several rules could be
formulated in such a scenario:
• Best Fit: Pack item in the bin that would have least amount of space left
after packing the item
• Worst Fit: Pack item in the bin that would have most amount of space left
after packing the item
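For concreteness, here is a Python sketch of the greedy algorithm with the First
Fit rule (names ours); a bin is represented by its remaining capacity.

def first_fit(sizes):
    bins = []                      # remaining capacity of each open bin
    for s in sizes:
        for j, cap in enumerate(bins):
            if s <= cap:
                bins[j] -= s       # first open bin with enough room
                break
        else:
            bins.append(1.0 - s)   # no open bin fits: open a new one
    return len(bins)

print(first_fit([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1]))  # 4 bins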
Observation 5.9. OPT ≥ ∑_𝑖 𝑠_𝑖.
Claim 5.2.1. At most one of the bins opened by Greedy Bin Packing is at most half-full;
hence ∑_𝑖 𝑠_𝑖 > (𝑚 − 1)/2, where 𝑚 is the number of opened bins.

Proof. For the sake of contradiction, assume that there are two bins 𝐵_𝑖 and 𝐵_𝑗
that are at most half-full, and WLOG assume that Greedy Bin Packing opened bin 𝐵_𝑖
before 𝐵_𝑗. Then the first item that the algorithm packed into 𝐵_𝑗 must be of size
at most 1/2. However, this item could have been packed into 𝐵_𝑖 since 𝐵_𝑖 is at most
half-full. This contradicts the fact that Greedy Bin Packing opens a
new bin only if the item cannot be packed in any of the opened bins.
Proof of Theorem 5.8. Let 𝑚 be the number of bins opened by the Greedy Bin Packing
algorithm. From Claim 5.2.1, we have

∑_𝑖 𝑠_𝑖 > (𝑚 − 1)/2.

Using the observation that OPT ≥ ∑_𝑖 𝑠_𝑖, we get OPT > (𝑚 − 1)/2, which gives us
𝑚 < 2 · OPT + 1, i.e., 𝑚 ≤ 2 · OPT.
Claim 5.2.2. If Bin Packing has a (3/2 − 𝜖)-approximation for some 𝜖 > 0, then the
Partition problem can be solved exactly in polynomial time.

Proof. Given an instance 𝐼 of Partition, scale the numbers so that they sum to 2
and treat them as the item sizes of an instance 𝐼′ of Bin Packing. If all items of 𝐼′ can
be packed in 2 bins, then we have a "yes" answer to 𝐼. Otherwise, the items of 𝐼′
need 3 bins and the answer to 𝐼 is "no". Thus OPT for 𝐼′ is 2 or 3. Hence, if there is
a (3/2 − 𝜖)-approximation algorithm for the Bin Packing problem, we can determine
the value of OPT, which in turn implies that we can solve 𝐼. Thus, there cannot exist
a (3/2 − 𝜖)-approximation algorithm for the Bin Packing problem, unless P = NP.
Recall the scaling property, which we used to argue that many optimization
problems do not admit additive approximations. The Bin Packing problem
does not have this scaling property, and hence it may be possible to find
additive approximation algorithms. We state some of the results in this context:
Theorem 5.10 (Johnson '74 [96]). There exists a polynomial time algorithm 𝒜_𝐽
such that

𝒜_𝐽(𝐼) ≤ (11/9) OPT(𝐼) + 4

for all instances 𝐼 of the Bin Packing problem.
Theorem 5.11 (de la Vega, Lueker '81 [48]). For any fixed 𝜖 > 0 there exists a
polynomial time algorithm 𝒜_{𝐹𝐿} such that

𝒜_{𝐹𝐿}(𝐼) ≤ (1 + 𝜖) OPT(𝐼) + 1

for all instances 𝐼 of the Bin Packing problem.
Theorem 5.12 (Karmarkar, Karp '82 [100]). There exists a polynomial time algorithm
𝒜_{𝐾𝐾} such that

𝒜_{𝐾𝐾}(𝐼) ≤ OPT(𝐼) + 𝑂(log²(OPT(𝐼)))

for all instances 𝐼 of the Bin Packing problem.
This has been improved recently.
Theorem 5.13 (Hoberg and Rothvoss 2017 [82]). There exists a polynomial time
algorithm 𝒜_{𝐻𝑅} such that

𝒜_{𝐻𝑅}(𝐼) ≤ OPT(𝐼) + 𝑂(log(OPT(𝐼)))

for all instances 𝐼 of the Bin Packing problem.
A major open problem is the following.
Open Question 5.14. Is there a polynomial-time algorithm 𝒜 such that 𝒜(𝐼) ≤
OPT(𝐼) + 𝑐, for some fixed constant 𝑐? In particular is 𝑐 = 1?
Exercise 5.2. Show that the First Fit greedy rule yields a solution with at most
(3/2) OPT + 1 bins.
Proof Sketch. If the number of big items is small, one can find the optimal solution
using brute force search.
The following gives a procedure to round up the items in ℬ:
Lemma 5.2. Consider the restriction of the bin packing problem to instances in which
the number of distinct item sizes is 𝑘. There is an 𝑛^{𝑂(𝑘)}-time algorithm that outputs an
optimum solution.
Proof Sketch. Use Dynamic Programming.
Claim 5.2.5. The items in ℬ can be packed in OPT + |ℬ_1| bins in time 𝑛^{𝑂(1/𝜖²)}.
Proof. Using Rounding Item Sizes, we have restricted all items but those in ℬ_1 to
have one of 𝑘 − 1 distinct sizes. Using Lemma 5.2, these items can be packed
optimally, in at most OPT bins. Furthermore, the items in ℬ_1 can always be packed
in |ℬ_1| bins (one per bin). Hence, the total number of bins is at most OPT + |ℬ_1|.

The running time of the algorithm follows since 𝑘 = 𝑂(1/𝜖²).
Lemma 5.3. Let 𝜖 > 0 be fixed. Consider the restriction of the bin packing problem to
instances in which each item is of size at least 𝜖. There is a polynomial time algorithm
that solves this restricted problem within a factor of (1 + 𝜖).
Proof. Using Claim 5.2.5, we can pack ℬ in OPT + |ℬ_1| bins. Recall that |ℬ_1| =
⌊𝑛′/𝑘⌋ ≤ 𝜖² · 𝑛′/2 ≤ 𝜖 · OPT/8, where we have used Claim 5.2.3 to reach the final
expression.
Theorem 5.15. For any 𝜖, 0 < 𝜖 < 1/2, there is an algorithm 𝒜 𝜖 that runs in time
polynomial in 𝑛 and finds a packing using at most (1 + 2𝜖) OPT +1 bins.
Proof. Assume that the number of bins used to pack the items in ℬ is 𝑚, and that
the total number of bins used after packing the items in 𝒮 is 𝑚′. Clearly,

𝑚′ ≤ max{ 𝑚, ⌈(∑_𝑖 𝑠_𝑖)/(1 − 𝜖)⌉ },

since at most one bin can be less than (1 − 𝜖) full, by the argument used for Greedy
Bin Packing. Furthermore,

(∑_𝑖 𝑠_𝑖)/(1 − 𝜖) ≤ (∑_𝑖 𝑠_𝑖)(1 + 2𝜖) + 1 ≤ (1 + 2𝜖) OPT + 1,

using Observation 5.9.
Chapter 6

Unrelated Machine Scheduling and Generalized Assignment
minimize 𝜆
subject to ∑_{𝑗∈𝑀} 𝑥_𝑖𝑗 = 1 for all 𝑖 ∈ 𝐽
∑_{𝑖∈𝐽} 𝑥_𝑖𝑗 𝑝_𝑖𝑗 ≤ 𝜆 for all 𝑗 ∈ 𝑀
𝑥_𝑖𝑗 ≥ 0 for all 𝑖 ∈ 𝐽, 𝑗 ∈ 𝑀
The above LP is very natural, but unfortunately it has an unbounded integrality
gap. Suppose that we have a single job that has processing time 𝑇 on each of
the 𝑚 machines. Clearly, the optimal schedule has makespan 𝑇. However, the LP
can schedule the job to the extent of 1/𝑚 on each of the machines, i.e., it can set
𝑥_{1𝑗} = 1/𝑚 for all 𝑗, and the makespan of the resulting fractional schedule is only
𝑇/𝑚.
To overcome this difficulty, we modify the LP slightly. Suppose we knew
that the makespan of the optimal solution is equal to 𝜆, where 𝜆 is some
fixed number. If the processing time 𝑝 𝑖𝑗 of job 𝑖 on machine 𝑗 is greater than
𝜆, job 𝑖 is not scheduled on machine 𝑗, and we can strengthen the LP by
setting 𝑥 𝑖𝑗 to 0 or equivalently, by removing the variable. More precisely, let
𝒮𝜆 = {(𝑖, 𝑗) | 𝑖 ∈ 𝐽, 𝑗 ∈ 𝑀, 𝑝 𝑖𝑗 ≤ 𝜆}. Given a value 𝜆, we can write the following
LP for the problem.
LP(𝜆)

∑_{𝑗: (𝑖,𝑗)∈𝒮_𝜆} 𝑥_𝑖𝑗 = 1 for all 𝑖 ∈ 𝐽
∑_{𝑖: (𝑖,𝑗)∈𝒮_𝜆} 𝑥_𝑖𝑗 𝑝_𝑖𝑗 ≤ 𝜆 for all 𝑗 ∈ 𝑀
𝑥_𝑖𝑗 ≥ 0 for all (𝑖, 𝑗) ∈ 𝒮_𝜆
Note that the LP above does not have an objective function. In the following,
we are only interested in whether the LP is feasible, i.e., whether there is an
assignment that satisfies all the constraints. Also, we can think of 𝜆 as a
parameter and LP(𝜆) as a family of LPs, one for each value of the parameter. A
useful observation is that, if 𝜆 is a lower bound on the makespan of the optimal
schedule, LP(𝜆) is feasible and it is a valid relaxation for the Scheduling on
Unrelated Parallel Machines problem.
Lemma 6.1. Let 𝜆∗ be the minimum value of the parameter 𝜆 such that LP(𝜆) is feasible.
We can find 𝜆∗ in polynomial time.
Proof. For any fixed value of 𝜆, we can check whether LP(𝜆) is feasible using a polynomial-time algorithm for solving LPs. Thus we can find 𝜆∗ using binary search starting with the interval [0, ∑_{𝑖,𝑗} 𝑝_{𝑖𝑗}].
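For illustration, a minimal sketch of this binary search, assuming a hypothetical oracle lp_feasible(lam, p) that builds LP(𝜆) and solves it with an off-the-shelf LP solver, returning whether it is feasible; the names and the stopping tolerance are illustrative.

def find_lambda_star(p, lp_feasible, tol=1e-6):
    """Binary search for the smallest lambda with LP(lambda) feasible.
    p[i][j] = processing time of job i on machine j; lp_feasible is an
    assumed oracle that solves LP(lambda) and returns a bool."""
    lo, hi = 0.0, sum(sum(row) for row in p)   # interval [0, sum_{i,j} p_ij]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lp_feasible(mid, p):
            hi = mid       # feasible: lambda* is at most mid
        else:
            lo = mid       # infeasible: lambda* exceeds mid
    return hi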
In the following, we will show how to round a solution to LP(𝜆∗ ) in order to
get a schedule with makespan at most 2𝜆∗ . As we will see shortly, it will help to
round a solution to LP(𝜆∗ ) that is a vertex solution.
Let 𝑥 be a vertex solution to LP(𝜆∗ ). Let 𝐺 be a bipartite graph on the vertex
set 𝐽 ∪ 𝑀 that has an edge 𝑖𝑗 for each variable 𝑥 𝑖𝑗 ≠ 0. We say that job 𝑖 is
fractionally set if 𝑥 𝑖𝑗 ∈ (0, 1) for some 𝑗. Let 𝐹 be the set of all jobs that are
fractionally set, and let 𝐻 be a bipartite graph on the vertex set 𝐹 ∪ 𝑀 that has
an edge 𝑖𝑗 for each variable 𝑥 𝑖𝑗 ∈ (0, 1); note that 𝐻 is the induced subgraph of 𝐺
on 𝐹 ∪ 𝑀. As shown in Lemma 6.2, the graph 𝐻 has a matching that matches
every job in 𝐹 to a machine, and we will use it in the rounding algorithm.
Lemma 6.2. The graph 𝐻 has a matching that matches every job in 𝐹 to a machine.
SUPM-Rounding
Find 𝜆∗
Find a vertex solution 𝑥 to LP(𝜆∗ )
For each 𝑖 and 𝑗 such that 𝑥 𝑖𝑗 = 1, assign job 𝑖 to machine 𝑗
Construct the graph 𝐻
Find a maximum matching ℳ in 𝐻
Assign the fractionally set jobs according to the matching ℳ
Theorem 6.1. SUPM-Rounding outputs a schedule with makespan at most 2𝜆∗.

Proof. By Lemma 6.2, the matching ℳ matches every fractionally set job to a
machine and therefore all of the jobs are assigned. After assigning all of the
integrally set jobs, the makespan (of the partial schedule) is at most 𝜆∗ . Since
ℳ is a matching, each machine receives at most one additional job. Let 𝑖 be a
fractionally set job, and suppose that 𝑖 is matched (in ℳ) to machine 𝑗. Since
the pair (𝑖, 𝑗) is in 𝒮_{𝜆∗}, the processing time 𝑝_{𝑖𝑗} is at most 𝜆∗, and therefore the total processing time of machine 𝑗 increases by at most 𝜆∗ after assigning the fractionally set jobs. Therefore the makespan of the final schedule is at most 2𝜆∗.
Exercise 6.1. Give an example that shows that Theorem 6.1 is tight. That is, give an instance and a vertex solution such that the makespan of the schedule output by SUPM-Rounding is at least (2 − 𝑜(1))𝜆∗.
Since 𝜆∗ is a lower bound on the makespan of the optimal schedule, we get the
following corollary.
Corollary 6.2. SUPM-Rounding achieves a 2-approximation.
Now we turn our attention to Lemma 6.2 and some other properties of vertex
solutions to LP(𝜆). The following can be derived from the rank lemma which is
described in Chapter A. Here we give a self-contained proof.
Lemma 6.3. If LP(𝜆) is feasible, any vertex solution has at most 𝑚 + 𝑛 non-zero
variables and it sets at least 𝑛 − 𝑚 of the jobs integrally.
Proof. Let 𝑥 be a vertex solution to LP(𝜆). Let 𝑟 denote the number of pairs in 𝒮_𝜆. Note that LP(𝜆) has 𝑟 variables, one for each pair (𝑖, 𝑗) ∈ 𝒮_𝜆. Since 𝑥 is a vertex solution, it satisfies 𝑟 linearly independent constraints of LP(𝜆) with equality. The first set of constraints consists of 𝑛 constraints, and the second set consists of 𝑚 constraints. Therefore at least 𝑟 − (𝑚 + 𝑛) of the tight constraints are from the third set, i.e., at least 𝑟 − (𝑚 + 𝑛) of the variables are set to zero; equivalently, at most 𝑚 + 𝑛 variables are non-zero.
We say that job 𝑖 is set fractionally if 𝑥 𝑖𝑗 ∈ (0, 1) for some 𝑗; job 𝑖 is set
integrally if 𝑥 𝑖𝑗 ∈ {0, 1} for all 𝑗. Let 𝐼 and 𝐹 be the set of jobs that are set
integrally and fractionally (respectively). Clearly, |𝐼 | + |𝐹| = 𝑛. Any job 𝑖 that
is fractionally set is assigned (fractionally) to at least two machines, i.e., there
exist 𝑗 ≠ ℓ such that 𝑥 𝑖𝑗 ∈ (0, 1) and 𝑥 𝑖ℓ ∈ (0, 1). Therefore there are at least
2|𝐹| distinct non-zero variables corresponding to jobs that are fractionally set.
Additionally, for each job 𝑖 that is integrally set, there is a variable 𝑥 𝑖𝑗 that is
non-zero. Thus the number of non-zero variables is at least |𝐼| + 2|𝐹|. Hence |𝐼| + |𝐹| = 𝑛 and |𝐼| + 2|𝐹| ≤ 𝑚 + 𝑛, which gives |𝐼| ≥ 𝑛 − 𝑚.
Definition 6.3. A connected graph is a pseudo-tree if the number of edges is at most
the number of vertices. A graph is a pseudo-forest if each of its connected components
is a pseudo-tree.
Lemma 6.4. The graph 𝐺 is a pseudo-forest.
Proof. Let 𝐶 be a connected component of 𝐺. We restrict LP(𝜆) and 𝑥 to the jobs and machines in 𝐶 to get LP′(𝜆) and 𝑥′. Note that 𝑥′ is a feasible solution to LP′(𝜆). Additionally, 𝑥′ is a vertex solution to LP′(𝜆): if not, 𝑥′ is a convex combination of two feasible solutions 𝑥′_1 and 𝑥′_2 to LP′(𝜆). We can extend 𝑥′_1 and 𝑥′_2 to two solutions 𝑥_1 and 𝑥_2 to LP(𝜆) using the entries of 𝑥 that are not in 𝑥′. By construction, 𝑥_1 and 𝑥_2 are feasible solutions to LP(𝜆), and 𝑥 is a convex combination of 𝑥_1 and 𝑥_2, which contradicts the fact that 𝑥 is a vertex solution. Thus 𝑥′ is a vertex solution to LP′(𝜆) and, by Lemma 6.3, 𝑥′ has at most 𝑛′ + 𝑚′ non-zero variables, where 𝑛′ and 𝑚′ are the number of jobs and machines in 𝐶. Thus 𝐶 has 𝑛′ + 𝑚′ vertices and at most 𝑛′ + 𝑚′ edges, and therefore it is a pseudo-tree.
Proof of Lemma 6.2. Note that each job that is integrally set has degree one in 𝐺. We remove each integrally set job from 𝐺; note that the resulting graph is 𝐻.
Since we removed an equal number of vertices and edges from 𝐺, it follows that
𝐻 is a pseudo-forest as well. Now we construct a matching ℳ as follows.
Note that every job vertex has degree at least 2, since the job is fractionally
assigned to at least two machines. Thus all of the leaves (degree-one vertices) of
𝐻 are machines. While 𝐻 has at least one leaf, we add the edge incident to the
leaf to the matching and we remove both of its endpoints from the graph. If 𝐻
does not have any leaves, 𝐻 is a collection of vertex-disjoint cycles, since it is a
pseudo-forest. Moreover, each cycle has even length, since 𝐻 is bipartite. We
construct a perfect matching for each cycle (by taking alternate edges), and we
add it to our matching.
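To make the two phases concrete, here is a minimal Python sketch of the matching construction, under the assumptions of Lemma 6.2 (𝐻 is a bipartite pseudo-forest in which every job has degree at least 2); the function name and graph representation are illustrative.

from collections import defaultdict

def pseudoforest_matching(edges):
    """Matching step of Lemma 6.2. edges: (job, machine) pairs of H.
    Matches every job by peeling leaf machines, then taking alternate
    edges around the remaining even cycles."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[('J', i)].add(('M', j))
        adj[('M', j)].add(('J', i))

    def remove(v):
        for u in adj.pop(v, set()):
            adj[u].discard(v)
            if not adj[u]:            # drop machines left isolated
                adj.pop(u, None)

    M = {}
    # Phase 1: all leaves are machines; match each leaf to its unique job.
    while True:
        leaf = next((v for v in adj if v[0] == 'M' and len(adj[v]) == 1), None)
        if leaf is None:
            break
        (job,) = adj[leaf]
        M[job[1]] = leaf[1]
        remove(leaf)
        remove(job)
    # Phase 2: the rest is a disjoint union of even cycles (H is a
    # pseudo-forest and bipartite); take alternate edges around each cycle.
    while adj:
        start = next(v for v in adj if v[0] == 'J')
        cycle, prev, v = [start], None, start
        while True:
            nxt = next(u for u in adj[v] if u != prev)
            if nxt == start:
                break
            cycle.append(nxt)
            prev, v = v, nxt
        for t in range(0, len(cycle), 2):    # jobs sit at even positions
            M[cycle[t][1]] = cycle[t + 1][1]
        for v in list(cycle):
            remove(v)
    return M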
Exercise 6.2. (Exercise 17.1 in [152]) Give a proof of Lemma 6.2 using Hall’s
theorem.
GAP-LP

min ∑_{(𝑖,𝑗)∈𝒮_𝜆} 𝑐_{𝑖𝑗} 𝑥_{𝑖𝑗}
subject to ∑_{𝑗: (𝑖,𝑗)∈𝒮_𝜆} 𝑥_{𝑖𝑗} = 1    ∀𝑖 ∈ 𝐽
           ∑_{𝑖: (𝑖,𝑗)∈𝒮_𝜆} 𝑝_{𝑖𝑗} 𝑥_{𝑖𝑗} ≤ 𝜆    ∀𝑗 ∈ 𝑀
           𝑥_{𝑖𝑗} ≥ 0    ∀(𝑖, 𝑗) ∈ 𝒮_𝜆
Since we also need to preserve the costs, we can no longer use the previous
rounding; in fact, it is easy to see that the previous rounding is arbitrarily bad
for the Generalized Assignment problem. However, we will still look for a
matching, but in a slightly different graph.
But before we give the rounding algorithm for the Generalized Assignment
problem, we take a small detour into the problem of finding a minimum-cost
matching in a bipartite graph. In the Minimum Cost Bipartite Matching problem,
we are given a bipartite graph 𝐵 = (𝑉1 ∪ 𝑉2 , 𝐸) with costs 𝑐 𝑒 on the edges, and
we want to construct a minimum cost matching ℳ that matches every vertex in
𝑉1 , if there is such a matching. For each vertex 𝑣, let 𝛿(𝑣) be the set of all edges
incident to 𝑣. We can write the following LP for the problem.
BipartiteMatching(𝐵)

min ∑_{𝑒∈𝐸(𝐵)} 𝑐_𝑒 𝑦_𝑒
subject to ∑_{𝑒∈𝛿(𝑣)} 𝑦_𝑒 = 1    ∀𝑣 ∈ 𝑉_1
           ∑_{𝑒∈𝛿(𝑣)} 𝑦_𝑒 ≤ 1    ∀𝑣 ∈ 𝑉_2
           𝑦_𝑒 ≥ 0    ∀𝑒 ∈ 𝐸(𝐵)
The following is well-known in combinatorial optimization [137].
Theorem 6.4. For any bipartite graph 𝐵, any vertex solution to BipartiteMatching(𝐵)
is an integer solution. Moreover, given a feasible fractional solution 𝑦, we can find in
polynomial time a feasible solution 𝑧 such that 𝑧 is integral and
∑_{𝑒∈𝐸(𝐵)} 𝑐_𝑒 𝑧_𝑒 ≤ ∑_{𝑒∈𝐸(𝐵)} 𝑐_𝑒 𝑦_𝑒.
In the rest of the section we give two different proofs that establish our
claimed result. One is based on the first work that gave this result [140], and the
other is based on iterative rounding [110].
GreedyPacking(𝑥)
    𝑦 = 0    // initialize 𝑦 to 0
    𝑠 = 1    // 𝑠 is the current bin
    𝑅 = 1    // 𝑅 is the space available on bin 𝑠
    for 𝑖 = 1 to ℎ
        // pack 𝑥_{𝑖𝑗} into the bins
        if 𝑥_{𝑖𝑗} ≤ 𝑅
            𝑦_{𝑖,(𝑗,𝑠)} = 𝑥_{𝑖𝑗}
            𝑅 = 𝑅 − 𝑥_{𝑖𝑗}
            if 𝑅 = 0
                𝑠 = 𝑠 + 1
                𝑅 = 1
        else
            𝑦_{𝑖,(𝑗,𝑠)} = 𝑅
            𝑦_{𝑖,(𝑗,𝑠+1)} = 𝑥_{𝑖𝑗} − 𝑅    // pack the remaining 𝑥_{𝑖𝑗} − 𝑅 in the next bin
            𝑅 = 1 − 𝑦_{𝑖,(𝑗,𝑠+1)}
            𝑠 = 𝑠 + 1
    return 𝑦
Figure 1: GreedyPacking on a single machine 𝑗. The fractional assignments 𝑥_{1𝑗} = 0.5, 𝑥_{2𝑗} = 0.7, 𝑥_{3𝑗} = 0.3, 𝑥_{4𝑗} = 0.2, 𝑥_{5𝑗} = 0.6 are packed into unit-capacity slots/bins: 𝑦_{1,(𝑗,1)} = 0.5, 𝑦_{2,(𝑗,1)} = 0.5, 𝑦_{2,(𝑗,2)} = 0.2, 𝑦_{3,(𝑗,2)} = 0.3, 𝑦_{4,(𝑗,2)} = 0.2, 𝑦_{5,(𝑗,2)} = 0.3, 𝑦_{5,(𝑗,3)} = 0.3.
The idea is to “pack” the jobs into the bins greedily. We only consider jobs 𝑖 such that 𝑝_{𝑖𝑗} is at most 𝜆; let ℎ denote the number of such jobs. We assume without loss of generality that these are labeled 1, 2, · · · , ℎ, and 𝑝_{1𝑗} ≥ 𝑝_{2𝑗} ≥ · · · ≥ 𝑝_{ℎ𝑗}. Informally, when we construct 𝑦, we consider the jobs 1, 2, · · · , ℎ in this order. Additionally, we keep track of the current bin 𝑠 that has not been filled and the amount of space 𝑅 available in that bin. When we consider job 𝑖, we try to pack 𝑥_{𝑖𝑗} into the current bin: if there is at least 𝑥_{𝑖𝑗} space available, i.e., 𝑥_{𝑖𝑗} ≤ 𝑅, we pack the entire amount into the current bin; otherwise, we pack as much as we can into the current bin, and we pack the rest into the next bin. (See Figure 1 for an example.)
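For concreteness, a small Python sketch of GreedyPacking restricted to a single machine 𝑗, assuming the relevant jobs are already sorted by non-increasing 𝑝_{𝑖𝑗}; names and the dictionary representation are illustrative.

def greedy_packing(x):
    """Pack fractional assignments x[i] (for one fixed machine j) into
    unit-capacity slots. Returns y[(i, s)] = fraction of job i in slot s;
    jobs are assumed pre-sorted by non-increasing processing time."""
    y = {}
    s, R = 1, 1.0                       # current slot and its free space
    for i, xi in enumerate(x):
        if xi <= R + 1e-12:             # fits entirely in the current slot
            y[(i, s)] = xi
            R -= xi
            if R <= 1e-12:
                s, R = s + 1, 1.0
        else:                           # split across slots s and s+1
            y[(i, s)] = R
            y[(i, s + 1)] = xi - R
            R = 1.0 - y[(i, s + 1)]
            s += 1
    return y

On the data of Figure 1, greedy_packing([0.5, 0.7, 0.3, 0.2, 0.6]) reproduces the slot assignments shown there.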
Proof. Note that, by construction, 𝑥_{𝑖𝑗} = ∑_{𝑠=1}^{𝑘_𝑗} 𝑦_{𝑖,(𝑗,𝑠)}. Therefore, for any job 𝑖, we have

∑_{(𝑖,(𝑗,𝑠))∈𝛿(𝑖)} 𝑦_{𝑖,(𝑗,𝑠)} = ∑_{𝑗: (𝑖,𝑗)∈𝒮_𝜆} ∑_{𝑠=1}^{𝑘_𝑗} 𝑦_{𝑖,(𝑗,𝑠)} = ∑_{𝑗: (𝑖,𝑗)∈𝒮_𝜆} 𝑥_{𝑖𝑗} = 1,

and the cost is preserved:

∑_{(𝑖,(𝑗,𝑠))∈𝐸(𝐺)} 𝑐_{𝑖,(𝑗,𝑠)} 𝑦_{𝑖,(𝑗,𝑠)} = ∑_{𝑖=1}^{𝑛} ∑_{𝑗: (𝑖,𝑗)∈𝒮_𝜆} ∑_{𝑠=1}^{𝑘_𝑗} 𝑐_{𝑖𝑗} 𝑦_{𝑖,(𝑗,𝑠)} = ∑_{(𝑖,𝑗)∈𝒮_𝜆} 𝑐_{𝑖𝑗} 𝑥_{𝑖𝑗}.
Theorem 6.4 gives us the following corollary.
Corollary 6.5. The graph 𝐺 has a matching ℳ that matches every job and has cost at most ∑_{(𝑖,𝑗)∈𝒮_𝜆} 𝑐_{𝑖𝑗} 𝑥_{𝑖𝑗}. Moreover, we can find such a matching in polynomial time.
GAP-Rounding
    let 𝑥 be an optimal solution to GAP-LP
    𝑦 = GreedyPacking(𝑥)
    construct the graph 𝐺
    construct a matching ℳ in 𝐺 such that ℳ matches every job
        and the cost of ℳ is at most ∑_{(𝑖,𝑗)∈𝒮_𝜆} 𝑐_{𝑖𝑗} 𝑥_{𝑖𝑗}
    for each edge (𝑖, (𝑗, 𝑠)) ∈ ℳ
        assign job 𝑖 to machine 𝑗

Let 𝑞_{𝑗𝑠} = max_{𝑖: 𝑦_{𝑖,(𝑗,𝑠)} > 0} 𝑝_{𝑖𝑗}.
That is, 𝑞_{𝑗𝑠} is the maximum processing time over pairs 𝑖𝑗 such that job 𝑖 is assigned (in 𝑦) to the slot (𝑗, 𝑠). It follows that the total processing time of the jobs that ℳ assigns to machine 𝑗 is at most ∑_{𝑠=1}^{𝑘_𝑗} 𝑞_{𝑗𝑠}.
Since GAP-LP has a variable 𝑥_{𝑖𝑗} only for pairs (𝑖, 𝑗) such that 𝑝_{𝑖𝑗} is at most 𝜆, it follows that 𝑞_{𝑗1} is at most 𝜆. We restrict attention to the case when at least two slots are assigned to 𝑗, for otherwise it is easy to see that the load is at most 𝜆. Therefore we only need to show that ∑_{𝑠=2}^{𝑘_𝑗} 𝑞_{𝑗𝑠} is at most 𝜆 as well. Consider a slot 𝑠 on machine 𝑗 such that 𝑠 > 1. Recall that we labeled the jobs that are relevant to machine 𝑗 — that is, jobs 𝑖 such that 𝑝_{𝑖𝑗} is at most 𝜆 — as 1, 2, · · · , ℎ such that 𝑝_{1𝑗} ≥ 𝑝_{2𝑗} ≥ · · · ≥ 𝑝_{ℎ𝑗}. Consider a job ℓ that is assigned to slot 𝑠. Since GreedyPacking considers jobs in non-increasing order of processing time, the processing time 𝑝_{ℓ𝑗} of job ℓ is at most the processing time of any job assigned to slot 𝑠 − 1. Therefore 𝑝_{ℓ𝑗} is upper bounded by any convex combination of the processing times of the jobs assigned to slot 𝑠 − 1. Since slot 𝑠 − 1 is full, ∑_𝑖 𝑦_{𝑖,(𝑗,𝑠−1)} = 1 and thus 𝑝_{ℓ𝑗} is at most ∑_𝑖 𝑦_{𝑖,(𝑗,𝑠−1)} 𝑝_{𝑖𝑗}; in particular 𝑞_{𝑗𝑠} ≤ ∑_𝑖 𝑦_{𝑖,(𝑗,𝑠−1)} 𝑝_{𝑖𝑗}. Summing over the slots,

∑_{𝑠=2}^{𝑘_𝑗} 𝑞_{𝑗𝑠} ≤ ∑_{𝑠=2}^{𝑘_𝑗} ∑_𝑖 𝑦_{𝑖,(𝑗,𝑠−1)} 𝑝_{𝑖𝑗} ≤ ∑_{𝑠=1}^{𝑘_𝑗} ∑_𝑖 𝑦_{𝑖,(𝑗,𝑠)} 𝑝_{𝑖𝑗} = ∑_𝑖 𝑝_{𝑖𝑗} ∑_{𝑠=1}^{𝑘_𝑗} 𝑦_{𝑖,(𝑗,𝑠)} = ∑_𝑖 𝑝_{𝑖𝑗} 𝑥_{𝑖𝑗} ≤ 𝜆,

where the final inequality is the load constraint of GAP-LP for machine 𝑗.
GAP-LP

min ∑_{(𝑖,𝑗)∈𝐸} 𝑐_{𝑖𝑗} 𝑥_{𝑖𝑗}
subject to ∑_{𝑗: (𝑖,𝑗)∈𝛿(𝑖)} 𝑥_{𝑖𝑗} = 1    ∀𝑖 ∈ 𝐽
           ∑_{𝑖: (𝑖,𝑗)∈𝐸} 𝑝_{𝑖𝑗} 𝑥_{𝑖𝑗} ≤ 𝑏_𝑗    ∀𝑗 ∈ 𝑀
           𝑥_{𝑖𝑗} ≥ 0    ∀(𝑖, 𝑗) ∈ 𝐸
When there are no load constraints, the optimum LP solution is the one obtained by assigning each job to its cheapest allowed machine (one can also argue that the LP is an integer polytope). Now consider another scenario.
Suppose each machine 𝑗 has in-degree at most 𝑘 in 𝐺 — that is, there are only 𝑘
jobs that can ever be assigned to any machine 𝑗. Now suppose we assign each
job to its cheapest allowed machine. Clearly the cost is at most the optimum
cost of any feasible solution. But what about the load? Since each machine had
in-degree at most 𝑘 we will load a machine 𝑗 to at most 𝑘𝑏 𝑗 . Thus, if 𝑘 = 2 we will
only violate the machine's load by a factor of 2. However, this seems to be a very restrictive assumption. Now consider a less restrictive scenario where there is
one machine 𝑗 such that its in-degree is at most 2. Then, in the LP relaxation, we
can omit the constraint that limits its load since we are guaranteed that at most
2 jobs can be assigned to it (note that we still have the job assignment constraints
which only allow a job to be assigned to machines according to the edges of 𝐺).
Omitting constraints in an iterative fashion by taking advantage of sparsity in
the basic feasible solution is the key idea.
To allow dropping of constraints we need some notation. Given an instance of GAP specified by 𝐺 = (𝐽 ∪ 𝑀, 𝐸) and 𝑀′ ⊆ 𝑀, we let GAPLP(𝐺, 𝑀′) denote the LP relaxation for GAP where we only impose the load constraints for machines in 𝑀′. In other words we drop the load constraints for 𝑀 \ 𝑀′. Note that jobs are still allowed to be assigned to machines in 𝑀 \ 𝑀′.
The key structural lemma that allows for iterated rounding is the following.
Lemma 6.6. Let 𝑦 be a basic feasible solution to GAPLP(𝐺, 𝑀′). Then one of the following properties holds:

1. There is some 𝑖𝑗 ∈ 𝐸 such that 𝑦_{𝑖𝑗} = 0 or 𝑦_{𝑖𝑗} = 1.
2. There is some machine 𝑗 ∈ 𝑀′ with degree 𝑑(𝑗) ≤ 1 in 𝐺.
3. There is some machine 𝑗 ∈ 𝑀′ with degree 𝑑(𝑗) = 2 and ∑_𝑖 𝑦_{𝑖𝑗} ≥ 1.

GAP-Iter-Rounding(𝐺)
1. 𝐹 = ∅, 𝑀′ = 𝑀
2. While 𝐽 ≠ ∅: compute a basic feasible solution 𝑦 to GAPLP(𝐺, 𝑀′) and apply a case of Lemma 6.6 that holds: remove from 𝐸 any 𝑖𝑗 with 𝑦_{𝑖𝑗} = 0; if 𝑦_{𝑖𝑗} = 1, add 𝑖𝑗 to 𝐹, remove job 𝑖 from 𝐽, and set 𝑏_𝑗 = 𝑏_𝑗 − 𝑝_{𝑖𝑗}; if 𝑑(𝑗) ≤ 1, or 𝑑(𝑗) = 2 and ∑_𝑖 𝑦_{𝑖𝑗} ≥ 1, for some 𝑗 ∈ 𝑀′, remove 𝑗 from 𝑀′
3. Output assignment 𝐹
Theorem 6.7. Given an instance of GAP that is feasible and has optimum cost 𝐶, the
algorithm GAP-Iter-Rounding outputs an assignment whose cost is at most 𝐶 and
such that each machine 𝑗 has load at most 2𝑏 𝑗 .
The proof is by induction on the number of iterations. Alternatively, it is
useful to view the algorithm recursively. We will sketch the proof and leave
some of the formal details to the reader (who can also consult [110]). We observe
that the algorithm makes progress in each iteration via Lemma 6.6. The analysis
will consider the four cases that can happen in each iteration: (i) 𝑦_{𝑖𝑗} = 0 for some 𝑖𝑗 ∈ 𝐸, (ii) 𝑦_{𝑖𝑗} = 1 for some 𝑖𝑗 ∈ 𝐸, (iii) 𝑑(𝑗) ≤ 1 for some 𝑗 ∈ 𝑀′, and (iv) 𝑑(𝑗) = 2 and ∑_𝑖 𝑦_{𝑖𝑗} ≥ 1 for some 𝑗 ∈ 𝑀′.
Thus the algorithm terminates in a polynomial number of iterations. It is also not hard to see that 𝐹 corresponds to an assignment of jobs to machines.
Observation 6.8. The algorithm terminates and outputs an assignment of jobs to machines, and if job 𝑖 is assigned to 𝑗 then 𝑖𝑗 ∈ 𝐸.
Now we prove that the assignment has good properties in terms of the cost
and loads.
Lemma 6.7. The cost of the LP solution at the start of each iteration is at most 𝐶 − ∑_{𝑖𝑗∈𝐹} 𝑐_{𝑖𝑗}. Hence, at the end of the algorithm the cost of the assignment 𝐹 is at most 𝐶.
Proof. This is true in the first iteration since 𝐹 = ∅ and the LP cost is at most that of an optimum integer feasible solution. Now consider an iteration, assuming that the precondition holds.
If 𝑦 𝑖𝑗 = 0 we remove 𝑖𝑗 from 𝐸 and we note that the cost of the LP for the next
iteration does not increase since 𝑦 itself is feasible for the residual instance.
If 𝑦 𝑖𝑗 = 1 and we add 𝑖𝑗 to 𝐹, we can charge the cost of 𝑖𝑗 to what the LP has
already paid on the edge 𝑖𝑗, and the solution 𝑦 with 𝑖𝑗 removed is feasible to the
residual instance obtained by removing job 𝑖 and reducing the capacity of 𝑗 to
𝑏 𝑗 − 𝑝 𝑖𝑗 .
In the other cases we do not change 𝐹 but drop constraints so the LP cost can
only decrease in the subsequent iteration.
Now we upper bound the load on each machine 𝑗.
Lemma 6.8. For each machine 𝑗, ∑_{𝑖𝑗∈𝐹} 𝑝_{𝑖𝑗} ≤ 2𝑏_𝑗. In fact, a stronger property holds: for each 𝑗, its load at the end of the algorithm is at most 𝑏_𝑗, or there is a single job assigned to 𝑗 such that removing it reduces the load of 𝑗 to at most 𝑏_𝑗.
• Use the preceding assignment to find a feasible packing of items that has
profit at least OPT/2.
For Max-GAP one can use a stronger LP relaxation and obtain a (1 − 1/𝑒 + 𝛿)-
approximation. We refer the reader to [58] for this result, and also to [28] for
connections to submodular function maximization. The latter connection allows
one to obtain an extremely simple (1/2 − 𝜖)-approximation greedy algorithm that
is not obvious to discover.
Chapter 7

Congestion Minimization in Networks
For each pair 𝑖 and each path 𝑃 ∈ 𝒫_𝑖 there is a variable 𝑥_{𝑖,𝑃} indicating whether we choose 𝑃 to route pair 𝑖. The constraints express that exactly one path is chosen for each pair 𝑖. To minimize the maximum number of paths using any edge we introduce a variable 𝜆 and minimize it subject to a natural packing constraint.
minimize 𝜆
subject to ∑_{𝑃∈𝒫_𝑖} 𝑥_{𝑖,𝑃} = 1    1 ≤ 𝑖 ≤ 𝑘
           ∑_{𝑖=1}^{𝑘} ∑_{𝑃∈𝒫_𝑖, 𝑃∋𝑒} 𝑥_{𝑖,𝑃} ≤ 𝜆    ∀𝑒 ∈ 𝐸
           𝑥_{𝑖,𝑃} ∈ {0, 1}    1 ≤ 𝑖 ≤ 𝑘, 𝑃 ∈ 𝒫_𝑖
Randomized-Rounding
1. Solve the LP relaxation to obtain a fractional solution 𝑥 with congestion value 𝜆∗
2. For 𝑖 = 1 to 𝑘 do
       pick a single path 𝑄_𝑖 ∈ 𝒫_𝑖, choosing 𝑄_𝑖 = 𝑃 with probability 𝑥_{𝑖,𝑃}
3. Output 𝑄_1, 𝑄_2, . . . , 𝑄_𝑘.

Note that the choices for the pairs are made with independent randomness.
The analysis requires the use of Chernoff-Hoeffding bounds. See Chapter B.
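A minimal Python sketch of the rounding step, assuming the LP has already been solved and its support is given explicitly per pair; the representation (a weight list and a path list per pair) is illustrative.

import random

def round_paths(x, paths):
    """Independent randomized rounding for congestion minimization.
    x[i][q] = LP value of the q-th path of pair i; paths[i][q] = its
    edge list. Picks one path per pair with probability x[i][q]."""
    chosen = []
    for i in range(len(paths)):
        q = random.choices(range(len(paths[i])), weights=x[i])[0]
        chosen.append(paths[i][q])
    return chosen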
Theorem 7.1. Randomized rounding outputs one path per pair, and with probability at least (1 − 1/𝑚²) no edge is contained in more than 𝑐 · (log 𝑚/log log 𝑚) · 𝜆∗ paths, where 𝑐 is an absolute constant and 𝑚 is the number of edges in the graph 𝐺. One can also show that for any fixed 𝜖 > 0 the congestion is at most (1 + 𝜖)𝜆∗ + 𝑐 (log 𝑚)/𝜖² with high probability.
minimize 𝜆
subject to ∑_{𝑒∈𝛿⁺(𝑠_𝑖)} 𝑓(𝑒, 𝑖) − ∑_{𝑒∈𝛿⁻(𝑠_𝑖)} 𝑓(𝑒, 𝑖) = 1    1 ≤ 𝑖 ≤ 𝑘
           ∑_{𝑒∈𝛿⁺(𝑣)} 𝑓(𝑒, 𝑖) − ∑_{𝑒∈𝛿⁻(𝑣)} 𝑓(𝑒, 𝑖) = 0    1 ≤ 𝑖 ≤ 𝑘, 𝑣 ∈ 𝑉 − {𝑠_𝑖, 𝑡_𝑖}
           ∑_{𝑖=1}^{𝑘} 𝑓(𝑒, 𝑖) ≤ 𝜆    ∀𝑒 ∈ 𝐸
           𝑓(𝑒, 𝑖) ≥ 0    1 ≤ 𝑖 ≤ 𝑘, 𝑒 ∈ 𝐸
It is not hard to see that the separation oracle for the dual is another shortest-path type problem that can be solved efficiently (via a Bellman–Ford type algorithm). This is not easy to capture/see via the compact flow-based formulation.
Derandomization: Is there a deterministic algorithm with roughly the same approximation guarantee? The algorithm can be derandomized via the notion of pessimistic estimators. Congestion Minimization was one of the first instances with a sophisticated use of this technique [132].
Integrality gap and Hardness of Approximation: There is a simple yet clever example demonstrating that the integrality gap of the flow relaxation in directed graphs is Ω(log 𝑚/log log 𝑚) [114]. In a remarkable result, [44] showed a matching hardness of approximation factor of Ω(log 𝑚/log log 𝑚). The complexity of Congestion Minimization is less clear in undirected graphs. It is known that the LP integrality gap and hardness of approximation are Ω(log log 𝑛/log log log 𝑛) [8]. Closing the gap between the upper and lower bounds is a major open problem.
Here we outline the integrality gap example for directed graphs from [114].
The graph 𝐺 and the pairs are constructed in a recursive fashion. Let ℎ be a
parameter that we will fix later. We start with a directed path 𝑣 0 , 𝑣1 , . . . , 𝑣 𝑛 . We
add a demand pair (𝑠_1, 𝑡_1) which connects to the path as follows. We partition the path into ℎ sub-paths of equal length (each with roughly 𝑛/ℎ edges); we add an arc from 𝑠_1 to the start of each sub-path and an arc from the end of each sub-path to 𝑡_1. See figure.
One can see from the figure that the pair (𝑠_1, 𝑡_1) can split its flow along the ℎ sub-paths.
Now we consider each of the ℎ sub-paths and recursively create an instance on
the path with length 𝑛/ℎ − 1 (while keeping parameter ℎ the same). Note that in
the second level of the recursion we add ℎ new source-sink pairs, one for each
sub-path. We stop the recursion when the size of the sub-path is Θ(ℎ). Let 𝑑 be
the depth of the recursion.
We claim that there is a fractional routing of all demand pairs where the
congestion is at most 𝑑/ℎ. This follows by splitting the flow of the pairs ℎ ways.
The next claim is that some edge has congestion 𝑑 in any integral routing. This
can be seen inductively. The top-level pair (𝑠_1, 𝑡_1) has to choose one amongst the ℎ sub-paths — all edges in that sub-path will be used by the route for (𝑠_1, 𝑡_1). Inductively there is some edge in that sub-path with congestion 𝑑 − 1, and hence the congestion of that edge will be 𝑑 when we add the path for (𝑠_1, 𝑡_1).
It now remains to set the parameters. If we choose ℎ = log² 𝑛, say, then 𝑑 = Θ(log 𝑛/log log 𝑛). The fractional congestion is ≤ 1 while the integral congestion is Θ(log 𝑛/log log 𝑛).
Short paths and improved congestion via Lovász-Local-Lemma: We consider
the congestion minimization problem when the path for each pair is required
to be “short”. By this we mean that we are required to route on a path with
at most 𝑑 edges where 𝑑 is some given parameter. One can imagine that in
many applications 𝑑 is small and is a fixed constant, say 10. The question
is whether the approximation ratio can be improved. Indeed one can show
that the LP integrality gap is 𝑂(log 𝑑/log log 𝑑). Thus, when 𝑑 ≪ 𝑛 we get a
substantial improvement. However, proving this and obtaining a polynomial
time algorithm are quite non-trivial. One requires the use of the subtle Lovász-
Local-Lemma (LLL), a powerful tool in probabilistic combinatorics. Typically
LLL only gives a proof of existence and there was substantial work in making LLL
constructive/efficient. Srinivasan obtained an algorithm via derandomization of
LLL in this context with a lot of technical work [142]. There was a breakthrough
work of Moser and Tardos [123] that gave an extremely simple way to make
LLL constructive and this has been refined and developed over the last decade.
For the congestion minimization problem we refer the reader to [75] which
builds upon [123] and describes an efficient randomized algorithm that outputs
a solution with congestion 𝑂(log 𝑑/log log 𝑑). In fact the application is given in
the context of a more abstract problem that we discuss in the next section.
Integer flows and Unsplittable flows: We worked with the simple setting
where each pair (𝑠 𝑖 , 𝑡 𝑖 ) wishes to send one unit of flow. One can imagine a
situation where one wants to send 𝑑 𝑖 units of flow for pair 𝑖 where 𝑑 𝑖 is some
(integer) demand value. There are two interesting variants. The first one requires
integer valued flow for each pair which means that we want to find 𝑑 𝑖 paths for
(𝑠 𝑖 , 𝑡 𝑖 ) that each carry one unit of flow (the paths can overlap). This variant can
be essentially reduced to the unit demand flow by creating 𝑑 𝑖 copies of (𝑠 𝑖 , 𝑡 𝑖 )
— we leave this as a simple exercise for the reader. The second variant is that
we want each pair’s flow of 𝑑 𝑖 units to be sent along a single path — this is
called unsplittable flow. When discussing unsplittable flow it is also natural to
consider capacities on the edges. Thus, each edge has a capacity 𝑢𝑒 and one
wants to minimize congestion relative to 𝑢𝑒 . The techniques we discussed can
be generalized relatively easily to this version as well to obtain the same kind
of bounds. The unsplittable flow problem is interesting even in the setting
where there is a single source/sink or when the graph is a simple ring or a path.
Interesting results are known here and we refer the reader to [2, 30, 49, 70, 122,
141] for further pointers.
minimize 𝜆
subject to ∑_{1≤𝑗≤ℓ_𝑖} 𝑥_{𝑖,𝑗} = 1    1 ≤ 𝑖 ≤ 𝑘
           ∑_{𝑖=1}^{𝑘} ∑_{1≤𝑗≤ℓ_𝑖} 𝑣_{𝑖,𝑗,𝑘} 𝑥_{𝑖,𝑗} ≤ 𝜆    ∀𝑒_𝑘
           𝑥_{𝑖,𝑗} ∈ {0, 1}    1 ≤ 𝑖 ≤ 𝑘, 1 ≤ 𝑗 ≤ ℓ_𝑖

minimize 𝜆
subject to ∑_{1≤𝑗≤ℓ_𝑖} 𝑥_{𝑖,𝑗} = 1    1 ≤ 𝑖 ≤ 𝑘
           𝐴𝑥 ≤ 𝜆𝟙
           𝑥_{𝑖,𝑗} ∈ {0, 1}    1 ≤ 𝑖 ≤ 𝑘, 1 ≤ 𝑗 ≤ ℓ_𝑖
Chapter 8

Introduction to Local Search

Local search is a powerful and widely used heuristic method (with various
extensions). In this lecture we introduce this technique in the context of
approximation algorithms. The basic outline of local search is as follows. For
an instance 𝐼 of a given problem let 𝒮(𝐼) denote the set of feasible solutions for
𝐼. For a solution 𝑆 we use the term (local) neighborhood of 𝑆 for the set of all solutions 𝑆′ that can be obtained from 𝑆 via some local moves. We let 𝑁(𝑆) denote the neighborhood of 𝑆.
LocalSearch:
    Find a “good” initial solution 𝑆_0 ∈ 𝒮(𝐼)
    𝑆 ← 𝑆_0
    repeat
        If (∃𝑆′ ∈ 𝑁(𝑆) such that val(𝑆′) is strictly better than val(𝑆))
            𝑆 ← 𝑆′
        Else
            return 𝑆    // 𝑆 is a local optimum
        EndIf
    Until (True)
We will first focus on the quality of the solution output by the local search algorithm.
Lemma 8.1. Let 𝑆 be the output of the local search algorithm. Then for each vertex 𝑣,
𝑤(𝛿(𝑆) ∩ 𝛿(𝑣)) ≥ 𝑤(𝛿(𝑣))/2.
Proof. Let 𝛼 𝑣 = 𝑤(𝛿(𝑆) ∩ 𝛿(𝑣)) be the weight of edges among those incident to 𝑣
that cross the cut 𝑆. Let 𝛽 𝑣 = 𝑤(𝛿(𝑣)) − 𝛼 𝑣 .
We claim that 𝛼 𝑣 ≥ 𝛽 𝑣 for each 𝑣. If 𝑣 ∈ 𝑉 \ 𝑆 and 𝛼 𝑣 < 𝛽 𝑣 then moving 𝑣
to 𝑆 will strictly increase 𝑤(𝛿(𝑆)) and 𝑆 cannot be a local optimum. Similarly if
𝑣 ∈ 𝑆 and 𝛼 𝑣 < 𝛽 𝑣 , we would have 𝑤(𝛿(𝑆 − 𝑣)) > 𝑤(𝛿(𝑆)) and 𝑆 would not be a
local optimum.
Corollary 8.1. If 𝑆 is a local optimum then 𝑤(𝛿(𝑆)) ≥ 𝑤(𝐸)/2 ≥ OPT/2.
Proof. Since each edge is incident to exactly two vertices we have 𝑤(𝛿(𝑆)) = (1/2) ∑_{𝑣∈𝑉} 𝑤(𝛿(𝑆) ∩ 𝛿(𝑣)). Applying the above lemma,

𝑤(𝛿(𝑆)) = (1/2) ∑_{𝑣∈𝑉} 𝑤(𝛿(𝑆) ∩ 𝛿(𝑣)) ≥ (1/2) ∑_{𝑣∈𝑉} 𝑤(𝛿(𝑣))/2 ≥ (1/2) 𝑤(𝐸) ≥ (1/2) OPT,

since OPT ≤ 𝑤(𝐸).
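For concreteness, a minimal Python sketch of the single-vertex-move local search for weighted Max Cut; the graph representation (a dictionary of edge weights) and names are illustrative. As discussed next, without the modified improvement rule the number of iterations need not be polynomial.

def local_search_max_cut(n, w):
    """Single-swap local search for weighted Max Cut on vertices 0..n-1.
    w: dict {(u, v): weight} over unordered pairs. Returns a set S that
    is a local optimum, so w(delta(S)) >= w(E)/2 by Corollary 8.1."""
    def gain(S, v):
        # change in cut weight if v switches sides
        g = 0.0
        for (a, b), wt in w.items():
            if v in (a, b):
                u = b if a == v else a
                g += wt if (u in S) == (v in S) else -wt
        return g
    S = set()
    improved = True
    while improved:
        improved = False
        for v in range(n):
            if gain(S, v) > 0:
                S ^= {v}          # move v to the other side of the cut
                improved = True
    return S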
The running time of the local search algorithm depends on the number of
local improvement iterations; checking whether there is a local move that results
Lemma 8.3. The modified local search algorithm terminates in 𝑂((1/𝜖) 𝑛 log 𝑛) iterations of the improvement step.
Proof. We observe that 𝑤(𝛿(𝑆_0)) = 𝑤(𝛿(𝑣∗)) ≥ (2/𝑛) 𝑤(𝐸) (why?). Each local improvement iteration improves 𝑤(𝛿(𝑆)) by a multiplicative factor of (1 + 𝜖/𝑛). Therefore, if 𝑘 is the number of iterations of the algorithm, then (1 + 𝜖/𝑛)^𝑘 𝑤(𝛿(𝑆_0)) ≤ 𝑤(𝛿(𝑆)) where 𝑆 is the final output. However, 𝑤(𝛿(𝑆)) ≤ 𝑤(𝐸). Hence (1 + 𝜖/𝑛)^𝑘 ≤ 𝑤(𝐸)/𝑤(𝛿(𝑆_0)) ≤ 𝑛/2, which implies 𝑘 = 𝑂((𝑛/𝜖) log 𝑛).
Max Directed Cut: A problem related to Max Cut is Max Directed Cut in
which we are given a directed edge-weighted graph 𝐺 = (𝑉 , 𝐸) and the goal is
to find a set 𝑆 ⊆ 𝑉 that maximizes 𝑤(𝛿+𝐺 (𝑆)); that is, the weight of the directed
edges leaving 𝑆. One can apply a similar local search as the one for Max Cut.
However, the following example shows that the output 𝑆 can be arbitrarily bad.
Let 𝐺 = (𝑉 , 𝐸) be a directed in-star with center 𝑣 and arcs connecting each of
𝑣 1 , . . . , 𝑣 𝑛 to 𝑣. Then 𝑆 = {𝑣} is a local optimum with 𝛿+ (𝑆) = ∅ while OPT = 𝑛.
However, a minor tweak to the algorithm gives a 1/3-approximation! Instead
of returning the local optimum 𝑆 return the better of 𝑆 and 𝑉 \ 𝑆. This step is
needed because the directed cuts are not symmetric.
Problem 8.2. Max Submod Func. Given a non-negative submodular set function 𝑓 on a ground set 𝑉 via a value oracle¹, find max_{𝑆⊆𝑉} 𝑓(𝑆).
¹A value oracle for a set function 𝑓 : 2^𝑉 → ℝ provides access to the function by giving the value 𝑓(𝐴) when presented with the set 𝐴.
Note that if 𝑓 is monotone then the problem is trivial since 𝑉 is the optimum
solution. Therefore, the problem is interesting (and NP-Hard) only when 𝑓
is not necessarily monotone. We consider a simple local search algorithm
for Max Submod Func and show that it gives a 1/3-approximation and a 1/2-
approximation when 𝑓 is symmetric. This was shown in [57].
Remark 8.4. Given a graph 𝐺 = (𝑉, 𝐸), consider the submodular function 𝑓 : 2^𝑉 → ℝ where 𝑓(𝑆) = |𝛿(𝑆)| − 𝐵 for a fixed number 𝐵. Is there a polynomial time algorithm to decide whether there is a set 𝑆 such that 𝑓(𝑆) ≥ 0?
• If 𝑓(𝐵) > 𝑓(𝐴) then there is an element 𝑣 ∈ 𝐵 \ 𝐴 such that 𝑓(𝐴 + 𝑣) − 𝑓(𝐴) > 0. More generally there is an element 𝑣 ∈ 𝐵 \ 𝐴 such that 𝑓(𝐴 + 𝑣) − 𝑓(𝐴) ≥ (𝑓(𝐵) − 𝑓(𝐴))/|𝐵 \ 𝐴|.

• If 𝑓(𝐴) > 𝑓(𝐵) then there is an element 𝑣 ∈ 𝐵 \ 𝐴 such that 𝑓(𝐵 − 𝑣) − 𝑓(𝐵) > 0. More generally there is an element 𝑣 ∈ 𝐵 \ 𝐴 such that 𝑓(𝐵 − 𝑣) − 𝑓(𝐵) ≥ (𝑓(𝐴) − 𝑓(𝐵))/|𝐵 \ 𝐴|.
Corollary 8.3. Let 𝑆 be a local optimum for the local search algorithm and let 𝑆∗ be an optimum solution. Then 𝑓(𝑆) ≥ 𝑓(𝑆 ∩ 𝑆∗) and 𝑓(𝑆) ≥ 𝑓(𝑆 ∪ 𝑆∗).

The corollary yields the claimed 1/3-approximation. Recall that the algorithm outputs the better of 𝑆 and 𝑉 \ 𝑆. By submodularity,

𝑓(𝑉 \ 𝑆) + 𝑓(𝑆 ∪ 𝑆∗) ≥ 𝑓(𝑆∗ \ 𝑆) + 𝑓(𝑉) ≥ 𝑓(𝑆∗ \ 𝑆),

and therefore

2𝑓(𝑆) + 𝑓(𝑉 \ 𝑆) ≥ 𝑓(𝑆 ∩ 𝑆∗) + 𝑓(𝑆 ∪ 𝑆∗) + 𝑓(𝑉 \ 𝑆)
≥ 𝑓(𝑆 ∩ 𝑆∗) + 𝑓(𝑆∗ \ 𝑆)
≥ 𝑓(𝑆∗) + 𝑓(∅)
≥ 𝑓(𝑆∗) = OPT,

where the third inequality is again submodularity, applied to 𝑆 ∩ 𝑆∗ and 𝑆∗ \ 𝑆. Hence max{𝑓(𝑆), 𝑓(𝑉 \ 𝑆)} ≥ OPT/3.
The running time of the local search algorithm may not be polynomial, but one can modify the algorithm as we did for Max Cut to obtain a strongly polynomial time algorithm that gives a (1/3 − 𝑜(1))-approximation ((1/2 − 𝑜(1)) for symmetric 𝑓).
symmetric). See [57] for more details. There has been much work on submodular
function maximization including work on variants with additional constraints.
Local search has been a powerful tool for these problems. See [24, 60, 112]
for some of the results on local search based method, and [25] for a survey on
submodular function maximization.
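A minimal Python sketch of the add/delete local search with the better-of-complement output step, assuming a value oracle f on frozensets; names are illustrative. For polynomial running time one would only accept moves that improve 𝑓 by a significant factor, as in the Max Cut modification.

def local_search_max_submod(V, f):
    """Local search sketch for non-negative (possibly non-monotone)
    submodular maximization. f is a value oracle on frozensets.
    Returns the better of a local optimum S and its complement, which
    by the analysis above is a 1/3-approximation (1/2 if f is symmetric)."""
    S = frozenset()
    improved = True
    while improved:
        improved = False
        for v in V:
            if v not in S and f(S | {v}) > f(S):
                S = S | {v}
                improved = True
            elif v in S and f(S - {v}) > f(S):
                S = S - {v}
                improved = True
    comp = frozenset(V) - S
    return S if f(S) >= f(comp) else comp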
Chapter 9

Clustering and Facility Location
Clustering and Facility Location are two widely studied topics with a vast
literature. Facility location problems have been well-studied in Operations
Research and logistics. Clustering is ubiquitous with many applications in data
analysis and machine learning. We confine attention to a few central problems
and provide some pointers as needed to other topics. These problems have also
played an important role in approximation algorithms and their study has led to
a variety of interesting techniques. Research on these topics is still quite active.
For both classes of problems a key assumption that we will make is that we
are working with points in some underlying metric space. Recall that a space
(𝑉 , 𝑑) where 𝑑 : 𝑉 × 𝑉 → ℝ+ is a metric space if the distance function 𝑑 satisfies
metric properties: (i) 𝑑(𝑢, 𝑣) = 0 iff 𝑢 = 𝑣 (reflexivity) (ii) 𝑑(𝑢, 𝑣) = 𝑑(𝑣, 𝑢) for
all 𝑢, 𝑣 ∈ 𝑉 (symmetry) and (iii) 𝑑(𝑢, 𝑣) + 𝑑(𝑣, 𝑤) ≥ 𝑑(𝑢, 𝑤) for all 𝑢, 𝑣, 𝑤 ∈ 𝑉
(triangle inequality). We will abuse the notation and use 𝑑(𝐴, 𝐵) for two sets
𝐴, 𝐵 ⊆ 𝑉 to denote the quantity min𝑝∈𝐴,𝑞∈𝐵 𝑑(𝑝, 𝑞). Similarly 𝑑(𝑝, 𝐴) for 𝑝 ∈ 𝑉
and 𝐴 ⊆ 𝑉 will denote min𝑞∈𝐴 𝑑(𝑝, 𝑞).
Center based clustering: In center based clustering we are given 𝑛 points
𝑃 = {𝑝 1 , 𝑝2 , . . . , 𝑝 𝑛 } in a metric space (𝑉 , 𝑑), and an integer 𝑘. The goal is to
cluster/partition 𝑃 into 𝑘 clusters 𝐶1 , 𝐶2 , . . . , 𝐶 𝑘 which are induced by choosing
𝑘 centers 𝑐 1 , 𝑐2 , . . . , 𝑐 𝑘 from 𝑉. Each point 𝑝 𝑖 is assigned to its nearest center
from 𝑐1 , 𝑐2 , . . . , 𝑐 𝑘 and this induces a clustering. The nature of the clustering
is controlled by an objective function that measures the quality of the clusters.
Typically we phrase the problem as choosing 𝑐_1, 𝑐_2, . . . , 𝑐_𝑘 to minimize the clustering objective ∑_{𝑖=1}^{𝑛} 𝑑(𝑝_𝑖, {𝑐_1, . . . , 𝑐_𝑘})^𝑞 for some 𝑞. The three most well-studied problems are special cases obtained by choosing an appropriate 𝑞.
• 𝑘-Center is the problem when 𝑞 = ∞, which can be equivalently phrased as min_{𝑐_1,𝑐_2,...,𝑐_𝑘∈𝑉} max_𝑖 𝑑(𝑝_𝑖, {𝑐_1, . . . , 𝑐_𝑘}). In other words we want to minimize the maximum distance of the input points to the cluster centers.
• 𝑘-Median is the problem when 𝑞 = 1: min_{𝑐_1,𝑐_2,...,𝑐_𝑘∈𝑉} ∑_{𝑖=1}^{𝑛} 𝑑(𝑝_𝑖, {𝑐_1, . . . , 𝑐_𝑘}).

• 𝑘-Means is the problem when 𝑞 = 2: min_{𝑐_1,𝑐_2,...,𝑐_𝑘∈𝑉} ∑_{𝑖=1}^{𝑛} 𝑑(𝑝_𝑖, {𝑐_1, . . . , 𝑐_𝑘})².
9.1 𝒌-Center
Recall that in 𝑘-Center we are given 𝑛 points 𝑝 1 , . . . , 𝑝 𝑛 in a metric space and
an integer 𝑘 and we need to choose 𝑘 cluster centers 𝐶 = {𝑐1 , 𝑐2 , . . . , 𝑐 𝑘 } such
that we minimize max𝑖 𝑑(𝑝 𝑖 , 𝐶). An alternative view is that we wish to find the
smallest radius 𝑅 such that there are 𝑘 balls of radius 𝑅 that together cover all
the input points. Given a fixed 𝑅 this can be seen as a Set Cover problem. In
fact there is an easy reduction from Dominating Set to 𝑘-Center establishing
the NP-Hardness. Moreover, as we saw already in Chapter 1, 𝑘-Center has no (2 − 𝜖)-approximation unless 𝑃 = 𝑁𝑃, via a reduction from Dominating Set. Here
we will see two 2-approximation algorithms that are quite different and have
their own advantages. The key lemma for their analysis is common and is stated
below.
Lemma 9.1. Suppose there are 𝑘 + 1 points 𝑞1 , 𝑞2 , . . . , 𝑞 𝑘+1 ∈ 𝑃 such that 𝑑(𝑞 𝑖 , 𝑞 𝑗 ) >
2𝑅 for all 𝑖 ≠ 𝑗. Then OPT > 𝑅.
Gonzalez-𝑘-Center(𝑃, 𝑘)
1. 𝐶 ← ∅
2. For 𝑖 = 1 to 𝑘 do
       let 𝑝 be a point in 𝑃 maximizing 𝑑(𝑝, 𝐶) (an arbitrary point if 𝐶 = ∅); 𝐶 ← 𝐶 ∪ {𝑝}
3. Output 𝐶
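A minimal Python sketch of the farthest-point procedure, assuming a metric given as a function dist; names are illustrative. Maintaining each point's distance to the current center set keeps the running time 𝑂(𝑛𝑘).

def gonzalez(P, k, dist):
    """Gonzalez's farthest-point heuristic for k-Center (2-approximation).
    P: list of points; dist: metric on pairs of points. Returns k centers."""
    C = [P[0]]                          # arbitrary first center
    d = [dist(p, C[0]) for p in P]      # distance of each point to C
    for _ in range(k - 1):
        far = max(range(len(P)), key=lambda i: d[i])   # farthest point
        C.append(P[far])
        d = [min(d[i], dist(P[i], P[far])) for i in range(len(P))]
    return C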
Theorem 9.1. Let 𝑅 = max𝑝∈𝑃 𝑑(𝑝, 𝐶) where 𝐶 is the set of centers chosen by
Gonzalez’s algorithm. Then 𝑅 ≤ 2𝑅 ∗ where 𝑅 ∗ is the optimum 𝑘-Center radius for 𝑃.
Proof. Suppose not. Then there is a point 𝑝 ∈ 𝑃 such that 𝑑(𝑝, 𝐶) > 2𝑅∗, which implies that 𝑝 ∉ 𝐶. Since the algorithm chose the farthest point in each iteration and could have chosen 𝑝 in each of the 𝑘 iterations but did not, we have the property that 𝑑(𝑐_𝑖, {𝑐_1, . . . , 𝑐_{𝑖−1}}) > 2𝑅∗ for 𝑖 = 2 to 𝑘. This implies that the distance between each pair of points in the set {𝑐_1, 𝑐_2, . . . , 𝑐_𝑘, 𝑝} is more than 2𝑅∗. By Lemma 9.1, the optimum radius must be larger than 𝑅∗, a contradiction.
HS-𝑘-Center(𝑃, 𝑘)
1. Guess the optimum radius 𝑅
2. 𝐶 ← ∅, 𝑆 ← 𝑃
3. While (𝑆 ≠ ∅) do
       pick an arbitrary point 𝑝 ∈ 𝑆; 𝐶 ← 𝐶 ∪ {𝑝}; 𝑆 ← 𝑆 \ 𝐵(𝑝, 2𝑅)
4. Output 𝐶
Theorem 9.2. Let 𝐶 be the output of the HS algorithm for a guess 𝑅. Then for all
𝑝 ∈ 𝑃, 𝑑(𝑝, 𝐶) ≤ 2𝑅 and moreover if 𝑅 ≥ 𝑅∗ then |𝐶| ≤ 𝑘.
Proof. The first property is easy to see since we only remove a point 𝑝 from 𝑆 if
we add a center 𝑐 to 𝐶 such that 𝑝 ∈ 𝐵(𝑐, 2𝑅). Let 𝑐1 , 𝑐2 , . . . , 𝑐 ℎ be the centers
chosen by the algorithm. We observe that 𝑑(𝑐 𝑖 , {𝑐1 , . . . , 𝑐 𝑖−1 }) > 2𝑅. Thus, if the
algorithm outputs ℎ points then the pairwise distance between any two of them
is more than 2𝑅. By Lemma 9.1, if ℎ ≥ 𝑘 + 1 the optimum radius is > 𝑅. Hence,
if the guess 𝑅 ≥ 𝑅∗ the algorithm outputs at most 𝑘 centers.
The guessing of 𝑅∗ can be implemented by binary search in various ways.
We omit these routine details.
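A minimal Python sketch combining the subroutine with the guessing step, under the assumption that the candidate radii (e.g. the sorted pairwise distances, over which one would binary search) are supplied; names are illustrative.

def hs_k_center(P, k, R_values, dist):
    """Hochbaum–Shmoys sketch: for a guess R, greedily pick an uncovered
    point as a center and discard everything within 2R of it; accept the
    smallest guess that uses at most k centers (Theorem 9.2)."""
    def centers_for(R):
        S, C = list(P), []
        while S:
            c = S[0]                    # arbitrary uncovered point
            C.append(c)
            S = [p for p in S if dist(p, c) > 2 * R]
        return C
    for R in sorted(R_values):          # binary search works here too
        C = centers_for(R)
        if len(C) <= k:
            return R, C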
Exercise 9.2. Describe an example where the algorithm uses exactly 𝑘 centers
even with guess 𝑅 ∗ . Describe an example where the algorithm outputs less than
𝑘 centers with a guess of 𝑅∗ .
∑_{𝑖=1}^{𝑛} 𝑥_𝑖 = 𝑘
∑_{𝑝_𝑖∈𝐵(𝑝_𝑗,𝑅)} 𝑥_𝑖 ≥ 1    ∀𝑝_𝑗 ∈ 𝑃
𝑥_𝑖 ≥ 0    ∀𝑝_𝑖 ∈ 𝑃
Exercise 9.3. Prove that if 𝑅 is feasible for the preceding LP then one can obtain
a solution with 𝑘 centers with max radius 2𝑅.
Exercise 9.4. Generalize the LP for the 𝑘-Supplier problem and prove that one
can obtain a 3-approximation with respect to lower bound provided via the LP
approach.
9.2.1 LP Rounding
The first constant factor approximation for UCFL was via LP rounding by Aardal,
Shmoys and Tardos using a filtering technique of Lin and Vitter. We start
with the LP relaxation. We use a variable 𝑦 𝑖 for 𝑖 ∈ ℱ to indicate whether 𝑖 is
opened or not. We use a variable 𝑥 𝑖,𝑗 to indicate whether 𝑗 is connected to 𝑖
(or assigned to 𝑖). One set of constraints are natural here: each client has to be
assigned/connected to a facility. The other constraint requires that 𝑗 is assigned
to 𝑖 only if 𝑖 is open.
min ∑_{𝑖∈ℱ} 𝑓_𝑖 𝑦_𝑖 + ∑_{𝑗∈𝒟} ∑_{𝑖∈ℱ} 𝑑(𝑖, 𝑗) 𝑥_{𝑖,𝑗}
∑_𝑖 𝑥_{𝑖,𝑗} = 1    ∀𝑗 ∈ 𝒟
𝑥_{𝑖,𝑗} ≤ 𝑦_𝑖    ∀𝑖 ∈ ℱ, 𝑗 ∈ 𝒟
𝑥, 𝑦 ≥ 0
Let (𝑥, 𝑦) be an optimal LP solution, and for each client 𝑗 let 𝛼_𝑗 = ∑_𝑖 𝑑(𝑖, 𝑗) 𝑥_{𝑖,𝑗} denote its fractional connection cost; the LP cost is ∑_𝑖 𝑓_𝑖 𝑦_𝑖 + ∑_𝑗 𝛼_𝑗.
Lemma 9.2. For each 𝑗 and each 𝛿 ∈ (0, 1) there is a total facility value of at least (1 − 𝛿) in 𝐵(𝑗, 𝛼_𝑗/𝛿). That is, ∑_{𝑖∈𝐵(𝑗,𝛼_𝑗/𝛿)} 𝑦_𝑖 ≥ 1 − 𝛿. In particular ∑_{𝑖∈𝐵(𝑗,2𝛼_𝑗)} 𝑦_𝑖 ≥ 1/2.

Proof. This essentially follows from Markov's inequality or averaging. Note that 𝛼_𝑗 = ∑_𝑖 𝑑(𝑖, 𝑗) 𝑥_{𝑖,𝑗} and ∑_𝑖 𝑥_{𝑖,𝑗} = 1. Suppose ∑_{𝑖∈𝐵(𝑗,𝛼_𝑗/𝛿)} 𝑦_𝑖 < 1 − 𝛿. Since 𝑥_{𝑖,𝑗} ≤ 𝑦_𝑖 for all 𝑖, 𝑗, more than a 𝛿 fraction of 𝑗's assignment would be to facilities at distance greater than 𝛼_𝑗/𝛿, and we would have 𝛼_𝑗 > 𝛿 · 𝛼_𝑗/𝛿 = 𝛼_𝑗, which is impossible.
We say that two clients 𝑗 and 𝑗′ intersect if there is some 𝑖 ∈ ℱ such that 𝑖 ∈ 𝐵(𝑗, 2𝛼_𝑗) ∩ 𝐵(𝑗′, 2𝛼_{𝑗′}). The rounding algorithm is described below.

UCFL-primal-rounding
1. Solve the LP to obtain (𝑥, 𝑦); let 𝛼_𝑗 = ∑_𝑖 𝑑(𝑖, 𝑗) 𝑥_{𝑖,𝑗}
2. Relabel the clients so that 𝛼_{𝑗_1} ≤ 𝛼_{𝑗_2} ≤ · · · ≤ 𝛼_{𝑗_ℎ}
3. Mark all clients as unassigned
4. For 𝑗 = 1 to ℎ do
       if client 𝑗 is unassigned, open the cheapest facility 𝑖 ∈ 𝐵(𝑗, 2𝛼_𝑗) and
       assign 𝑗, and every unassigned client 𝑗′ that intersects 𝑗, to 𝑖
It is not hard to see that every client is assigned to an open facility. The main
issue is to bound the total cost. Let 𝐹 be the total facility opening cost, and let 𝐶
be the total connection cost. We will bound these separately.
Lemma 9.3. 𝐹 ≤ 2 ∑_𝑖 𝑓_𝑖 𝑦_𝑖.
Proof. Note that a client 𝑗 opens a new facility only if it has not been assigned
when it is considered by the algorithm. Let 𝑗1 , 𝑗2 , . . . , 𝑗 𝑘 be the clients that open
It should be clear to the reader that the algorithm and analysis are not optimized for the approximation ratio. The goal here was simply to outline the basic scheme that led to the first constant factor approximation.
9.2.2 Primal-Dual
Jain and Vazirani developed an elegant and influential primal-dual algorithm
for UCFL [91]. It was influential since it allowed new algorithms for 𝑘-median
and several generalizations of UCFL in a clean way. Moreover the primal-dual
algorithm is simple and efficient to implement. On the other hand we should
mention that one advantage of having an LP solution is that it gives an explicit
lower bound on the optimum value while a primal-dual method yields a lower
bound via a feasible dual which may not be optimal. We need some background
to describe the primal-dual method in approximation.
Complementary slackness: To understand primal-dual we need some basic background in complementary slackness. Suppose we have a primal LP (P) of the form min 𝑐𝑥 s.t. 𝐴𝑥 ≥ 𝑏, 𝑥 ≥ 0, which we intentionally wrote in this standard form as a covering LP. Its dual (D) is a packing LP: max 𝑏ᵀ𝑦 s.t. 𝐴ᵀ𝑦 ≤ 𝑐, 𝑦 ≥ 0. We will assume that both primal and dual are feasible and hence the optimum values are finite, and via strong duality we know that the optimum values are the same.
Definition 9.4. A feasible solution 𝑥 to (P) and a feasible solution 𝑦 to (D) satisfy the primal complementary slackness condition with respect to each other if the following is true: for each 𝑖, 𝑥_𝑖 = 0 or the corresponding dual constraint is tight, that is ∑_𝑗 𝐴_{𝑗,𝑖} 𝑦_𝑗 = 𝑐_𝑖. They satisfy the dual complementary slackness condition if the following is true: for each 𝑗, 𝑦_𝑗 = 0 or the corresponding primal constraint is tight, that is ∑_𝑖 𝐴_{𝑗,𝑖} 𝑥_𝑖 = 𝑏_𝑗.
min ∑_{𝑣∈𝑉} 𝑤_𝑣 𝑥_𝑣
𝑥_𝑢 + 𝑥_𝑣 ≥ 1    ∀𝑒 = (𝑢, 𝑣) ∈ 𝐸
𝑥_𝑣 ≥ 0    ∀𝑣 ∈ 𝑉

max ∑_{𝑒∈𝐸} 𝑦_𝑒
∑_{𝑒∈𝛿(𝑣)} 𝑦_𝑒 ≤ 𝑤_𝑣    ∀𝑣 ∈ 𝑉
𝑦_𝑒 ≥ 0    ∀𝑒 ∈ 𝐸
Lemma 9.5. Let 𝑥∗ be an optimum solution to the primal covering LP. Then 𝑆 = {𝑣 | 𝑥∗_𝑣 > 0} is a feasible vertex cover for 𝐺 and moreover 𝑤(𝑆) ≤ 2 ∑_𝑣 𝑤_𝑣 𝑥∗_𝑣.
Proof. It is easy to see that 𝑆 is a vertex cover via the same argument that we have seen before. How do we bound the cost now that we may be rounding 𝑥∗_𝑣 to 1 even though 𝑥∗_𝑣 may be tiny? Let 𝑦∗ be any optimum solution to the dual; one exists (why?). Via strong duality we have ∑_𝑣 𝑤_𝑣 𝑥∗_𝑣 = ∑_𝑒 𝑦∗_𝑒. Via primal complementary slackness we have the property that if 𝑥∗_𝑣 > 0 then ∑_{𝑒∈𝛿(𝑣)} 𝑦∗_𝑒 = 𝑤_𝑣. Hence

𝑤(𝑆) = ∑_{𝑣: 𝑥∗_𝑣 > 0} 𝑤_𝑣 = ∑_{𝑣: 𝑥∗_𝑣 > 0} ∑_{𝑒∈𝛿(𝑣)} 𝑦∗_𝑒 ≤ 2 ∑_𝑒 𝑦∗_𝑒 = 2 ∑_𝑣 𝑤_𝑣 𝑥∗_𝑣,

where we use the fact that an edge 𝑒 has only two end points in the inequality.
VC-primal-dual(𝐺 = (𝑉, 𝐸), 𝑤 : 𝑉 → ℝ₊)
1. 𝑦 ← 0, 𝑥 ← 0
2. 𝑈 ← 𝐸 // uncovered edges
3. While (𝑈 ≠ ∅) do
    A. Increase 𝑦_𝑒 uniformly for each 𝑒 ∈ 𝑈 until the constraint ∑_{𝑒∈𝛿(𝑎)} 𝑦_𝑒 ≤ 𝑤_𝑎 becomes tight for some vertex 𝑎
    B. Set 𝑥_𝑎 = 1 // Maintain primal complementary slackness
    C. 𝑈 ← 𝑈 \ 𝛿(𝑎) // Remove all edges covered by 𝑎
Remark 9.2. Note that when checking whether a vertex 𝑣 is tight we count the
payments from 𝑒 ∈ 𝛿(𝑣) even though some of them are no longer active.
Lemma 9.6. At the end of the algorithm 𝑥 is a feasible vertex cover for 𝐺 and ∑_𝑣 𝑤_𝑣 𝑥_𝑣 ≤ 2 OPT.
Proof. By induction on the iterations one can prove that (i) 𝑦 remains dual feasible throughout and (ii) primal complementary slackness is maintained: 𝑥_𝑣 = 1 only if the constraint of 𝑣 is tight. At termination 𝑈 = ∅, hence 𝑥 is a feasible vertex cover. Therefore

∑_𝑣 𝑤_𝑣 𝑥_𝑣 = ∑_{𝑣: 𝑥_𝑣 > 0} ∑_{𝑒∈𝛿(𝑣)} 𝑦_𝑒 ≤ 2 ∑_𝑒 𝑦_𝑒 ≤ 2 OPT_{𝐿𝑃}.
We used the fact that 𝑒 has at most two end points in the first inequality and
the fact that 𝑦 is dual feasible in the second inequality. In terms of payment
what this says is that edge 𝑢𝑣’s payment of 𝑦𝑢𝑣 can be used to pay for opening 𝑢
and 𝑣 while the dual pays only once.
As the reader can see, the algorithm is very simple to implement.
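A minimal Python sketch of the uniform-increase primal-dual algorithm, assuming vertices are hashable ids and edges are given as tuples; names are illustrative.

def vc_primal_dual(V, E, w):
    """Primal-dual 2-approximation for weighted Vertex Cover: raise the
    duals y_e of uncovered edges uniformly until some vertex becomes
    tight, add it to the cover, and delete its incident edges."""
    y = {e: 0.0 for e in E}
    paid = {v: 0.0 for v in V}        # sum of y_e over edges incident to v
    U = set(E)                        # currently uncovered edges
    cover = set()
    while U:
        deg = {v: 0 for v in V}
        for u, v in U:
            deg[u] += 1
            deg[v] += 1
        active = [v for v in V if deg[v] > 0]
        # largest uniform increase before the first vertex goes tight
        tight = min(active, key=lambda v: (w[v] - paid[v]) / deg[v])
        eps = (w[tight] - paid[tight]) / deg[tight]
        for u, v in U:
            y[(u, v)] += eps
            paid[u] += eps
            paid[v] += eps
        cover.add(tight)
        U = {e for e in U if tight not in e}
    return cover, y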
Exercise 9.5. Describe an example to show that the primal-dual algorithm's worst-case performance is 2. Describe an example to show that the dual value constructed by the algorithm is ≃ OPT/2. Are these two parts the same?
Remark 9.3. The algorithm generalizes to give an 𝑓 -approximation for Set Cover
where 𝑓 is the maximum frequency of any element. There are examples showing
that the performance of this algorithm in the worst-case can indeed be a factor
of 𝑓 . We saw earlier that the integrality gap of the LP is at most 1 + ln 𝑑 where
𝑑 is the maximum set size. There is no contradiction here since the specific
primal-dual algorithm that we developed need not achieve the tight integrality
gap.
max ∑_{𝑗∈𝒟} 𝛼_𝑗
∑_{𝑗∈𝒟} 𝛽_{𝑖,𝑗} ≤ 𝑓_𝑖    ∀𝑖 ∈ ℱ
𝛼_𝑗 − 𝛽_{𝑖,𝑗} ≤ 𝑑(𝑖, 𝑗)    ∀𝑖 ∈ ℱ, 𝑗 ∈ 𝒟
𝛼, 𝛽 ≥ 0
JV-primal-dual((ℱ ∪ 𝒟, 𝑑), 𝑓_𝑖, 𝑖 ∈ ℱ)
5. Create graph 𝐻 with vertex set 𝑂 and edge set 𝑄 where (𝑖, 𝑖′) ∈ 𝑄 iff there exists a client 𝑗 such that 𝛽_{𝑖,𝑗} > 0 and 𝛽_{𝑖′,𝑗} > 0
Example: The example in Figure 9.1 illustrates the need for the pruning phase. There are 2𝑛 clients and 𝑛 facilities, and the opening costs of the facilities are 𝑛 + 2, except for the first one which has an opening cost of 𝑛 + 1. The first group of clients, shown at the top of the figure, have a connection cost of 1 to each facility. The second group of clients have the following property: 𝑑(𝑖_ℎ, 𝑗′_ℎ) = 1 and 𝑑(𝑖_ℓ, 𝑗′_ℎ) = 3 when ℓ ≠ ℎ. The rest of the distances are induced by these. One can verify that in the growth phase all the facilities will be opened. However, the total dual value after the growth phase is 5𝑛 − 1 while the total cost of the opened facilities is Θ(𝑛²), and hence pruning is necessary to obtain a good approximation.
Figure 9.1: Example to illustrate the need for the pruning phase.
Claim 9.2.1. If (𝑖, 𝑗) is an edge in 𝐺 then 𝛼_𝑗 − 𝛽_{𝑖,𝑗} = 𝑑(𝑖, 𝑗) and hence 𝛼_𝑗 ≥ 𝑑(𝑖, 𝑗).

Proof. The algorithm adds an edge (𝑖, 𝑗) only if (𝑖, 𝑗) is a witness edge for 𝑗 or if 𝛽_{𝑖,𝑗} > 0 (in which case (𝑖, 𝑗) is special). The algorithm maintains the invariant 𝛽_{𝑖,𝑗} = max{0, 𝛼_𝑗 − 𝑑(𝑖, 𝑗)}, and hence if 𝛽_{𝑖,𝑗} > 0 the claim is clear. If 𝛽_{𝑖,𝑗} = 0 then (𝑖, 𝑗) is a witness edge for 𝑗 and case (i) happens when 𝑗 is removed from 𝐴, and in this case 𝛼_𝑗 = 𝑑(𝑖, 𝑗).
Analysis: We upper bound the cost of opening the facilities in 𝑂′ and the connection cost of the clients.
Since a client 𝑗 can pay for multiple facilities, which we cannot afford, the pruning phase removes facilities so that no client 𝑗 is connected to two facilities in 𝑂′ with special edges (otherwise those two facilities would have an edge in 𝐻). We say that a client 𝑗 is directly connected to a facility 𝑖 ∈ 𝑂′ if (𝑖, 𝑗) is a special edge. We call all such clients directly connected clients and the rest of the clients are called indirectly connected. For 𝑖 ∈ 𝑂′ let 𝑍_𝑖 be the set of clients directly connected to 𝑖. We let 𝒟_1 = ∪_{𝑖∈𝑂′} 𝑍_𝑖 be the set of all directly connected clients and let 𝒟_2 be the set of all indirectly connected clients. By the pruning rule we have the property that a client 𝑗 is directly connected to at most one facility in 𝑂′. We show that each facility in 𝑂′ can be paid for by its directly connected clients.
Lemma 9.8. For each 𝑖 ∈ 𝑂′, ∑_{𝑗∈𝑍_𝑖} 𝛽_{𝑖,𝑗} = 𝑓_𝑖.
Proof. This follows from Lemma 9.7 and the fact that if 𝑖 ∈ 𝑂′ then every client 𝑗 with a special edge (𝑖, 𝑗) must be directly connected to 𝑖.
From Claim 9.2.1 we see that if 𝑗 is directly connected to 𝑖 then 𝛼_𝑗 − 𝛽_{𝑖,𝑗} = 𝑑(𝑖, 𝑗), and hence 𝑗 can pay its connection cost to 𝑖 and its contribution to opening 𝑖. What about indirectly connected clients? The next lemma bounds their connection cost.

Lemma 9.9. Suppose 𝑗 ∈ 𝒟_2, that is, it is an indirectly connected client. Let 𝑖 be its closest facility in 𝑂′; then 𝑑(𝑖, 𝑗) ≤ 3𝛼_𝑗.
Proof. Let (𝑖, 𝑗) be the witness edge for 𝑗. Note that 𝑖 ∈ 𝑂. Since 𝑗 is an indirectly connected client there is no facility 𝑖′ ∈ 𝑂′ such that (𝑖′, 𝑗) is a special edge. Since 𝑖 ∉ 𝑂′ it must be because 𝑖 was closed in the pruning phase, and hence there must be a facility 𝑖′ ∈ 𝑂′ such that (𝑖, 𝑖′) is an edge in 𝐻 (otherwise 𝑂′ would not be a maximal independent set). Therefore there is some client 𝑗′ ≠ 𝑗 such that (𝑖′, 𝑗′) and (𝑖, 𝑗′) are both special edges. We claim that 𝛼_𝑗 ≥ 𝛼_{𝑗′}. Assuming this claim we see via the triangle inequality and Claim 9.2.1 that

𝑑(𝑖′, 𝑗) ≤ 𝑑(𝑖, 𝑗) + 𝑑(𝑖, 𝑗′) + 𝑑(𝑖′, 𝑗′) ≤ 𝛼_𝑗 + 2𝛼_{𝑗′} ≤ 3𝛼_𝑗.

Since 𝑖′ ∈ 𝑂′, the nearest facility in 𝑂′ to 𝑗 is within distance 3𝛼_𝑗.
We now prove the claim that 𝛼_𝑗 ≥ 𝛼_{𝑗′}. Let 𝑡 = 𝛼_𝑗 be the time when 𝑗 connects to facility 𝑖 as its witness. Consider two cases. In the first case 𝑑(𝑖, 𝑗′) ≤ 𝑡, which means that 𝑗′ had already reached 𝑖 at or before 𝑡; in this case 𝛼_{𝑗′} ≤ 𝑡 since 𝑖 was open at 𝑡. In the second case 𝑑(𝑖, 𝑗′) > 𝑡; this means that 𝑗′ reached 𝑖 strictly after 𝑡. Since 𝑖 was already open at 𝑡, 𝑗′ would not pay to open 𝑖, which implies that 𝛽_{𝑖,𝑗′} = 0; but then (𝑖, 𝑗′) would not be a special edge and hence this case cannot arise. This finishes the proof of the claim.
With the preceding two lemmas in place we can bound the total cost of
opening facilities in 𝑂 0 and connecting clients to them. We will provide a refined
statement that turns out to be useful in some applications.
Theorem 9.6.
∑_{𝑗∈𝒟} 𝑑(𝑂′, 𝑗) + 3 ∑_{𝑖∈𝑂′} 𝑓_𝑖 ≤ 3 ∑_{𝑗∈𝒟} 𝛼_𝑗 ≤ 3 OPT_{𝐿𝑃}.

Proof. Consider the directly connected clients. For each 𝑖 ∈ 𝑂′ and 𝑗 ∈ 𝑍_𝑖 we have 𝛼_𝑗 = 𝛽_{𝑖,𝑗} + 𝑑(𝑖, 𝑗) by Claim 9.2.1. Hence, using Lemma 9.8,

∑_{𝑗∈𝒟_1} 𝛼_𝑗 = ∑_{𝑖∈𝑂′} ( 𝑓_𝑖 + ∑_{𝑗∈𝑍_𝑖} 𝑑(𝑖, 𝑗) ) ≥ ∑_{𝑖∈𝑂′} 𝑓_𝑖 + ∑_{𝑗∈𝒟_1} 𝑑(𝑂′, 𝑗).
For indirectly connected clients, via Lemma 9.9, we have 3 ∑_{𝑗∈𝒟_2} 𝛼_𝑗 ≥ ∑_{𝑗∈𝒟_2} 𝑑(𝑂′, 𝑗). Thus

3 ∑_{𝑗∈𝒟} 𝛼_𝑗 = 3 ∑_{𝑗∈𝒟_1} 𝛼_𝑗 + 3 ∑_{𝑗∈𝒟_2} 𝛼_𝑗
≥ 3 ∑_{𝑖∈𝑂′} 𝑓_𝑖 + 3 ∑_{𝑗∈𝒟_1} 𝑑(𝑂′, 𝑗) + ∑_{𝑗∈𝒟_2} 𝑑(𝑂′, 𝑗)
≥ 3 ∑_{𝑖∈𝑂′} 𝑓_𝑖 + ∑_{𝑗∈𝒟} 𝑑(𝑂′, 𝑗).
The algorithm is easy and efficient to implement. One of the main advantages
of the stronger property that we saw in the theorem is that it leads to a nice
algorithm for the 𝑘-median problem; we refer the reader to Chapter 25 in [152]
for a detailed description. In addition the flexibility of the primal-dual algorithm
has led to algorithms for several other variants; see [20] for one such example.
9.3 𝒌-Median
𝑘-Median has been extensively studied in approximation algorithms due to its
simplicity and connection to UCFL. The first constant factor approximation was
obtained in [35] via LP rounding. We will consider a slight generalization of
𝑘-Median where the medians are to be selected from the facility set ℱ . We
describe the LP which is closely related to that for UCFL which we have already
seen. The variables are the same: 𝑦 𝑖 indicates whether a center is opened at
location 𝑖 ∈ ℱ and 𝑥 𝑖,𝑗 indicates whether client 𝑗 is connected to facility 𝑖. The
objective and constraints change since the problem requires one to open at most
𝑘 facilities but there is no cost to opening them.
min ∑_{𝑗∈𝒟} ∑_{𝑖∈ℱ} 𝑑(𝑖, 𝑗) 𝑥_{𝑖,𝑗}
∑_𝑖 𝑥_{𝑖,𝑗} = 1    ∀𝑗 ∈ 𝒟
𝑥_{𝑖,𝑗} ≤ 𝑦_𝑖    ∀𝑖 ∈ ℱ, 𝑗 ∈ 𝒟
∑_𝑖 𝑦_𝑖 ≤ 𝑘
𝑥, 𝑦 ≥ 0
thus the analysis cannot be based on preserving the cost of each client within a
constant factor of its fractional cost.
There are several LP rounding algorithms known. An advantage of LP based
approach is that it leads to a constant factor approximation for the Matroid
Median problem which is a nice and powerful generalization of the 𝑘-Median
problem; here there is a matroid defined over the facilities and the constraint is
that the set of facilities chosen must be independent in the given matroid. One
can write a natural LP relaxation for this problem and prove that the LP has a
constant integrality gap by appealing to matroid intersection! It showcases the power of classical results in combinatorial optimization. We refer the reader to
[109, 147].
Theorem 9.7 (Arya et al. [11]). For any fixed 𝑝 ≥ 1 the 𝑝-swap local search heuristic
has a tight worst-case approximation ratio of (3 + 2/𝑝) for 𝑘-Median. In particular the
basic local search algorithm yields a 5-approximation.
in this fashion. The short answer is that the analysis ideas required a series of
developments with the somewhat easier case of UCFL coming first.
We set up some notation. Let 𝜙 : 𝒟 → 𝑆 be the mapping of clients to facilities in 𝑆 based on nearest distance. Similarly let 𝜙∗ : 𝒟 → 𝑆∗ be the mapping to facilities in the optimum solution 𝑆∗. Thus 𝑗 connects to facility 𝜙(𝑗) in the local optimum and to 𝜙∗(𝑗) in the optimum solution. We also let 𝑁(𝑖) denote the set of all clients assigned to a facility 𝑖 ∈ 𝑆 and let 𝑁∗(𝑖) denote the set of all clients assigned to a facility 𝑖 ∈ 𝑆∗. Let 𝐴_𝑗 = 𝑑(𝑗, 𝑆) and 𝑂_𝑗 = 𝑑(𝑗, 𝑆∗) be the cost paid by 𝑗 in the local optimum and the optimum solution respectively. To reinforce the notation we express cost(𝑆) as follows.
notation we express cost(𝑆) as follows.
Õ Õ Õ
cost(𝑆) = 𝐴𝑗 = 𝑑(𝑗, 𝑖).
𝑗∈𝒟 𝑖∈𝑆 𝑗∈𝑁(𝑖)
Similarly Õ Õ Õ
cost(𝑆∗ ) = 𝑂𝑗 = 𝑑(𝑗, 𝑖).
𝑗∈𝒟 𝑖∈𝑆 ∗ 𝑗∈𝑁 ∗ (𝑖)
We create a set of pairs 𝒫 as follows. There will be exactly 𝑘 pairs. For each 𝑓∗ ∈ 𝑆∗_1 we add the pair (𝑟, 𝑓∗) where 𝜌(𝑓∗) = 𝑟. For each 𝑓∗ ∈ 𝑆∗ \ 𝑆∗_1 we add a pair (𝑟, 𝑓∗) where 𝑟 is an arbitrary center from 𝑅′ — however we make sure that a center 𝑟 ∈ 𝑅′ is in at most two pairs in 𝒫; this is possible because of Claim 9.3.1.
The pairs satisfy the following property.
The intuition for the pairs is as follows. If 𝜌⁻¹(𝑟) = {𝑓∗} then we are essentially forced to consider the pair (𝑟, 𝑓∗) since 𝑟 could be the only center near 𝑓∗, with all other centers from 𝑆 very far away. When considering the swap (𝑟, 𝑓∗) we
Figure 9.2: Mapping between 𝑆 ∗ and 𝑆 with each 𝑓 ∗ ∈ 𝑆 ∗ mapped to its closest
center in 𝑆.
can move the clients connecting to 𝑟 to 𝑓∗. On the other hand if |𝜌⁻¹(𝑟)| > 1 then 𝑟 is close to several centers in 𝑆∗ and may be serving many clients. Thus we do not consider such an 𝑟 in the swap pairs.
The main technical claim about the swap pairs is the following.
We defer the proof of the lemma for now and use it to show that cost(𝑆) ≤ 5 cost(𝑆∗). We sum the inequality of the lemma over all pairs (𝑟, 𝑓∗) ∈ 𝒫 and note that each 𝑓∗ ∈ 𝑆∗ occurs in exactly one pair and each 𝑟 ∈ 𝑆 occurs in at most two pairs; since the terms 2𝑂_𝑗 are non-negative, counting each 𝑟 twice only weakens the bound:

0 ≤ ∑_{(𝑟,𝑓∗)∈𝒫} ( ∑_{𝑗∈𝑁∗(𝑓∗)} (𝑂_𝑗 − 𝐴_𝑗) + ∑_{𝑗∈𝑁(𝑟)} 2𝑂_𝑗 )
≤ ∑_{𝑓∗∈𝑆∗} ∑_{𝑗∈𝑁∗(𝑓∗)} (𝑂_𝑗 − 𝐴_𝑗) + 2 ∑_{𝑟∈𝑆} ∑_{𝑗∈𝑁(𝑟)} 2𝑂_𝑗
≤ cost(𝑆∗) − cost(𝑆) + 4 cost(𝑆∗).

Rearranging gives cost(𝑆) ≤ 5 cost(𝑆∗).
Figure 9.3: Two cases in the proof of Lemma 9.10. Consider the swap pair (𝑟, 𝑓∗). In the figure on the left the client 𝑗 ∈ 𝑁∗(𝑓∗) is assigned to 𝑓∗. In the figure on the right the client 𝑗 ∈ 𝑁(𝑟) \ 𝑁∗(𝑓∗) is assigned to 𝑟′ = 𝜌(𝑓̂∗).
facility. Which one? Let 𝜙∗(𝑗) = 𝑓̂∗ be the facility that 𝑗 is assigned to in the optimum solution. Note that 𝑓̂∗ ≠ 𝑓∗. We assign 𝑗 to 𝑟′ = 𝜌(𝑓̂∗); from Claim 9.3.2, 𝜌(𝑓̂∗) ≠ 𝑟 and hence 𝑟′ ∈ 𝑆 − 𝑟 + 𝑓∗. See Figure 9.3. The change in the cost for such a client 𝑗 is 𝑑(𝑗, 𝑟′) − 𝑑(𝑗, 𝑟). We bound it as follows:

𝑑(𝑗, 𝑟′) − 𝑑(𝑗, 𝑟) ≤ 𝑑(𝑗, 𝑓̂∗) + 𝑑(𝑓̂∗, 𝑟′) − 𝑑(𝑗, 𝑟) ≤ 𝑂_𝑗 + 𝑑(𝑓̂∗, 𝑟) − 𝑑(𝑗, 𝑟) ≤ 𝑂_𝑗 + 𝑑(𝑓̂∗, 𝑗) + 𝑑(𝑗, 𝑟) − 𝑑(𝑗, 𝑟) = 2𝑂_𝑗,

where we used the triangle inequality and the fact that 𝑟′ = 𝜌(𝑓̂∗) is the closest center in 𝑆 to 𝑓̂∗.
Every other client is assigned to its existing center in 𝑆. Thus the total increase in the cost is at most

∑_{𝑗∈𝑁∗(𝑓∗)} (𝑂_𝑗 − 𝐴_𝑗) + ∑_{𝑗∈𝑁(𝑟)\𝑁∗(𝑓∗)} 2𝑂_𝑗 ≤ ∑_{𝑗∈𝑁∗(𝑓∗)} (𝑂_𝑗 − 𝐴_𝑗) + ∑_{𝑗∈𝑁(𝑟)} 2𝑂_𝑗.
See [11] for a description of the tight example. The example in the conference
version of the paper is buggy.
9.4 𝒌-Means
The 𝑘-Means problem is very popular in practice for a variety of reasons. In terms
of center-based clustering the goal is to choose 𝑘 centers 𝐶 = {𝑐1 , 𝑐2 , . . . , 𝑐 𝑘 }
to minimize ∑_𝑝 𝑑(𝑝, 𝐶)². In the discrete setting one can obtain constant factor
approximation algorithms via several techniques that follow the approach of
𝑘-Median. We note that the squared distance does not satisfy triangle inequality,
however, it satisfies a relaxed triangle inequality and this is sufficient to generalize
certain LP rounding and local search techniques.
In practice the continuous version is popular for clustering applications.
The input points 𝑃 are in the Euclidean space ℝ 𝑑 where 𝑑 is typically large.
Let 𝑋 be the set of input points where each 𝑥 ∈ 𝑋 is now a 𝑑-dimensional
vector. The centers are now allowed to be in ambient space. This is called
the Euclidean 𝑘-Means. Here the squared distance actually helps in a certain sense. For instance if 𝑘 = 1 then we can see that the optimum center is simply (1/|𝑋|) ∑_{𝑥∈𝑋} 𝑥 — in other words we take the “average”. One can see this by considering the problem of finding the center as an optimization problem: min_{𝑦∈ℝ^𝑑} ∑_{𝑥∈𝑋} ‖𝑥 − 𝑦‖²₂ = min_{𝑦∈ℝ^𝑑} ∑_{𝑥∈𝑋} ∑_{𝑖=1}^{𝑑} (𝑥_𝑖 − 𝑦_𝑖)². It can be seen that we can optimize in each dimension separately, and the optimum in dimension 𝑖 is the average of the 𝑖-th coordinates of the input points.
Lloyds-𝑘-Means(𝑋, 𝑘)
1. Pick initial centers 𝑐_1, 𝑐_2, . . . , 𝑐_𝑘
2. repeat
       (a) assign each point of 𝑋 to its nearest center to form clusters 𝐶_1, 𝐶_2, . . . , 𝐶_𝑘
       (b) for each 𝑗, set 𝑐_𝑗 to the mean of the points in 𝐶_𝑗
3. until the clusters stop changing
4. Output clusters 𝐶_1, 𝐶_2, . . . , 𝐶_𝑘
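A minimal NumPy sketch of Lloyd's iteration, assuming the points are given as an (𝑛, 𝑑) array and initial centers as a (𝑘, 𝑑) array; names and the iteration cap are illustrative.

import numpy as np

def lloyd(X, centers, iters=100):
    """Lloyd's iteration: alternate nearest-center assignment (squared
    Euclidean distance) and mean recomputation until the centers settle."""
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)          # nearest center per point
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))    # keep empty clusters in place
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels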
There are two issues with the algorithm. The first issue is that the algorithm
can, in the worst-case, run for an exponential number of iterations. This issue
is common for many local search algorithms and, as we discussed earlier, it can be handled by requiring each iteration to improve the objective by a significant factor.
Figure 9.4: Example demonstrating that a local optimum for Lloyd’s algorithm
can be arbitrarily bad compared to the optimum clustering. The green clusters
are the optimum ones and the red ones are the local optimum.
𝑫²-sampling and 𝒌-Means++: To overcome the bad local optima it is common to run the algorithm with random starting centers. Arthur and Vassilvitskii [151] suggested a specific random sampling scheme to initialize the centers that is closely related to independent work in [130]. This is called 𝐷²-sampling.
𝐷²-sampling-𝑘-Means++(𝑋, 𝑘)
1. Pick the first center 𝑐_1 uniformly at random from 𝑋; 𝑆 ← {𝑐_1}
2. for 𝑖 = 2 to 𝑘 do
       pick 𝑐_𝑖 = 𝑥 ∈ 𝑋 with probability proportional to 𝑑(𝑥, 𝑆)²; 𝑆 ← 𝑆 ∪ {𝑐_𝑖}
3. Output 𝑆
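A minimal Python sketch of the seeding step, assuming a metric dist; names are illustrative.

import random

def d2_sample(X, k, dist):
    """k-means++ style D^2 seeding: first center uniform; each subsequent
    center sampled with probability proportional to the squared distance
    to the nearest chosen center."""
    S = [random.choice(X)]
    for _ in range(1, k):
        d2 = [min(dist(x, c) for c in S) ** 2 for x in X]
        if sum(d2) == 0:                # all points coincide with centers
            S.append(random.choice(X))
        else:
            S.append(random.choices(X, weights=d2)[0])
    return S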
The analysis establishes that the seeding already creates a good approxi-
mation, so in a sense the local search is only refining the initial approximation.
[4, 6] show that if one uses 𝑂(𝑘) centers, initialized according to 𝐷 2 sampling,
then the local optimum will yield a constant factor approximation with constant
probability; note that this is a bicriteria approximation where the number of
centers is a constant factor more than 𝑘 and the cost is being compared with
respect to the optimum cost with 𝑘 centers. The authors also show that there is a
subset of 𝑘 centers from the output of the algorithm that yields a constant factor
approximation. One can then run a discrete optimization algorithm using the
centers. Another interesting result based on 𝐷²-sampling ideas yields a PTAS, but the running time is of the form Õ(𝑛𝑑 · 2^{𝑂(𝑘²/𝜖)}) [92]. See [15] for a scalable version of 𝑘-Means++.
Chapter 10

Introduction to Network Design

(Parts of this chapter are based on previous scribed lecture notes by Nitish Korula and Sungjin Im.)
Network Design is a broad topic that deals with finding a subgraph 𝐻 of a
given graph 𝐺 = (𝑉 , 𝐸) of minimum cost while satisfying certain requirements.
𝐺 represents an existing network or a constraint over where one can build. The
subgraph 𝐻 is what we want to select/build. Many natural problems can be
viewed this way. For instance the minimum spanning tree (MST) can be viewed
as follows: given an undirected graph 𝐺 = (𝑉 , 𝐸) with edge costs 𝑐 : 𝐸 → ℝ+ ,
find the cheapest connected spanning (spanning means that all vertices are
included) subgraph of 𝐺. The fact that a minimal solution is a tree is clear, but
the point is that the motivation does not explicitly mention the requirement that
the output be a tree.
Connectivity problems are a large part of network design. As we already saw
MST is the most basic one and can be solved in polynomial time. The Steiner Tree
problem is a generalization where we are given a subset 𝑆 of terminals in an edge-
weighted graph 𝐺 = (𝑉 , 𝐸) and the goal is to find a cheapest connected subgraph
that contains all terminals. This is NP-Hard. Traveling Salesman Problem (TSP)
and its variants can also be viewed as network design problems. Network design
is heavily motivated by real-world problems in telecommunication networks and
those problems combine aspects of connectivity and routing and in this context
there are several problems related to buy-at-bulk network design, fixed-charge
flow problems etc.
Graph theory plays an important role in most network algorithmic questions.
The complexity and nature of the problems vary substantially based on whether
the graph is undirected or directed. To illustrate this consider the Directed Steiner
Tree problem. Here 𝐺 = (𝑉 , 𝐸) is a directed graph with non-negative edge/arc
weights, and we are given a root 𝑟 and a set of terminals 𝑆 ⊆ 𝑉. The goal is to
¹Variants of the Steiner Tree problem, named after Jakob Steiner, have been studied by Fermat,
Weber, and others for centuries. The front cover of the course textbook contains a reproduction of
a letter from Gauss to Schumacher on a Steiner tree question.
Definition 10.1. Given a connected graph 𝐺(𝑉 , 𝐸) with edge costs, the metric com-
pletion of 𝐺 is a complete graph 𝐻(𝑉 , 𝐸0) such that for each 𝑢, 𝑣 ∈ 𝑉, the cost of
edge 𝑢𝑣 in 𝐻 is the cost of the shortest path in 𝐺 from 𝑢 to 𝑣. The graph 𝐻 with
edge costs is a metric on 𝑉, because the edge costs satisfy the triangle inequality:
∀𝑢, 𝑣, 𝑤, 𝑐𝑜𝑠𝑡(𝑢𝑣) ≤ 𝑐𝑜𝑠𝑡(𝑢𝑤) + 𝑐𝑜𝑠𝑡(𝑤𝑣).
Figure 10.1: On the left, a graph. On the right, its metric completion, with new
edges and modified edge costs in red.
Observation 10.2. To solve the Steiner Tree problem on a graph 𝐺, it suffices to solve
it on the metric completion of 𝐺.
We now look at two approximation algorithms for the Steiner Tree problem.
(Here, we use the notation 𝐻[𝑆] to denote the subgraph of 𝐻 induced by the set
of terminals 𝑆.)
Lemma 10.1. For any instance 𝐼 of Steiner Tree, let 𝐻 denote the metric completion of the graph, and 𝑆 the set of terminals. There exists a spanning tree in 𝐻[𝑆] (the graph induced by terminals) of cost at most 2(1 − 1/|𝑆|) OPT, where OPT is the cost of an optimal solution to instance 𝐼.
[Figure: Graph 𝐺, its metric completion 𝐻, the induced graph 𝐻[𝑆], and the output tree 𝑇.]
Before we prove the lemma, we note that if there exists some spanning tree in 𝐻[𝑆] of cost at most 2(1 − 1/|𝑆|) OPT, the minimum spanning tree has at most this cost. Therefore, Lemma 10.1 implies that the algorithm SteinerMST is a 2(1 − 1/|𝑆|)-approximation for the Steiner Tree problem.
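To make the algorithm concrete, here is a minimal Python sketch of the SteinerMST heuristic, with vertices assumed to be numbered 0, . . . , 𝑛 − 1 (the function and variable names are ours, not from the notes). Each MST edge of 𝐻[𝑆] corresponds to a shortest path in 𝐺, so unfolding those paths yields a Steiner tree in 𝐺 of no greater cost.

def steiner_mst(n, edges, terminals):
    """MST heuristic for Steiner Tree: metric completion + MST on terminals.
    edges: list of (u, v, cost); returns (cost, MST edges of H[S])."""
    INF = float("inf")
    # Metric completion via Floyd-Warshall: d[u][v] = shortest-path distance in G.
    d = [[INF] * n for _ in range(n)]
    for u in range(n):
        d[u][u] = 0.0
    for u, v, c in edges:
        d[u][v] = min(d[u][v], c)
        d[v][u] = min(d[v][u], c)
    for w in range(n):
        for u in range(n):
            for v in range(n):
                if d[u][w] + d[w][v] < d[u][v]:
                    d[u][v] = d[u][w] + d[w][v]
    # Prim's algorithm on H[S], the metric completion restricted to terminals.
    S = list(terminals)
    in_tree = {S[0]}
    tree, cost = [], 0.0
    while len(in_tree) < len(S):
        u, v = min(((x, y) for x in in_tree for y in S if y not in in_tree),
                   key=lambda p: d[p[0]][p[1]])
        in_tree.add(v)
        tree.append((u, v))
        cost += d[u][v]
    return cost, tree

Floyd–Warshall computes the metric completion in 𝑂(𝑛³) time; Prim's algorithm is then run only on the terminal set.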
Proof of Lemma 10.1. Let 𝑇* denote an optimal solution in 𝐻 to the given instance, with cost 𝑐(𝑇*). Double all the edges of 𝑇* to obtain an Eulerian graph, and fix an Eulerian tour 𝑊 of this graph. See Fig 10.3. Now, shortcut edges of 𝑊 to obtain a tour 𝑊′ of the vertices in 𝑇* in which each vertex is visited exactly once. Again, shortcut edges of 𝑊′ to eliminate all non-terminals; this gives a walk 𝑊″ that visits each terminal exactly once.
Figure 10.3: Blue edges show the shortcut tour 𝑊′; red edges show the shortcut walk 𝑊″ on terminals.
It is easy to see that 𝑐(𝑊″) ≤ 𝑐(𝑊′) ≤ 𝑐(𝑊) = 2𝑐(𝑇*), where the inequalities follow from the fact that by shortcutting, we can only decrease the length of the walk. (Recall that we are working in the metric completion 𝐻.) Now, delete the heaviest edge of 𝑊″ to obtain a path through all the terminals in 𝑆, of cost at most (1 − 1/|𝑆|)𝑐(𝑊″). This path is a spanning tree of the terminals, and contains only terminals; therefore, there exists a spanning tree in 𝐻[𝑆] of cost at most 2(1 − 1/|𝑆|)𝑐(𝑇*).
A tight example: The following example (Figure 10.4) shows that this analysis is tight; there are instances of Steiner Tree where the SteinerMST algorithm finds a tree of cost 2(1 − 1/|𝑆|) OPT. Here, each pair of terminals is connected by an edge of cost 2, and each terminal is connected to the central non-terminal by an edge of cost 1. The optimal tree is a star containing the central non-terminal, with edges to all the terminals; it has cost |𝑆|. However, the only trees in 𝐻[𝑆] are formed by taking |𝑆| − 1 edges of cost 2; they have cost 2(|𝑆| − 1).
Figure 10.4: A tight example for the SteinerMST algorithm: the graph 𝐺, the induced metric 𝐻[𝑆], and an MST of 𝐻[𝑆] (not all edges shown).
To prove Theorem 10.3, we introduce some notation. Let 𝑐(𝑖) denote the cost of the path 𝑃_𝑖 used in the 𝑖th iteration to connect the terminal 𝑠_𝑖 to the already existing tree. Clearly, the total cost of the tree is Σ_{𝑖=1}^{|𝑆|} 𝑐(𝑖). Now, let {𝑖₁, 𝑖₂, . . . , 𝑖_{|𝑆|}} be a permutation of {1, 2, . . . , |𝑆|} such that 𝑐(𝑖₁) ≥ 𝑐(𝑖₂) ≥ · · · ≥ 𝑐(𝑖_{|𝑆|}). (That is, relabel the terminals in decreasing order of the cost paid to connect them to the tree that exists when they are considered by the algorithm.)
Claim 10.1.1. For all 𝑗, the cost 𝑐(𝑖 𝑗 ) is at most 2 OPT/𝑗, where OPT is the cost of an
optimal solution to the given instance.
Proof. Suppose by way of contradiction this were not true; since 𝑠 𝑖 𝑗 is the terminal
with 𝑗th highest cost of connection, there must be 𝑗 terminals that each pay more
than 2 OPT/𝑗 to connect to the tree that exists when they are considered. Let
𝑆0 = {𝑠 𝑖1 , 𝑠 𝑖2 , . . . 𝑠 𝑖 𝑗 } denote this set of terminals.
We argue that no two terminals in 𝑆0 ∪ {𝑠 1 } are within distance 2 OPT/𝑗 of
each other. If some pair 𝑥, 𝑦 were within this distance, one of these terminals
(say 𝑦) must be considered later by the algorithm than the other. But then the
cost of connecting 𝑦 to the already existing tree (which includes 𝑥) must be at
most 2 OPT/𝑗, and we have a contradiction.
Therefore, the minimum distance between any two terminals in 𝑆0 ∪ {𝑠 1 }
must be greater than 2 OPT/𝑗. Since there must be 𝑗 edges in any MST of these
terminals, an MST must have cost greater than 2 OPT. But the MST of a subset
of terminals cannot have cost more than 2 OPT, exactly as argued in the proof of
Lemma 10.1. Therefore, we obtain a contradiction.
Given this claim, it is easy to prove Theorem 10.3.
Σ_{𝑖=1}^{|𝑆|} 𝑐(𝑖) = Σ_{𝑗=1}^{|𝑆|} 𝑐(𝑖_𝑗) ≤ Σ_{𝑗=1}^{|𝑆|} (2 OPT)/𝑗 = 2 OPT Σ_{𝑗=1}^{|𝑆|} 1/𝑗 = 2𝐻_{|𝑆|} · OPT.
Remark 10.3. We emphasize again that the analysis above holds for every ordering
of the terminals. A natural variant might be to adaptively order the terminals so that in each iteration 𝑖, the algorithm picks the terminal 𝑠_𝑖 to be the one closest to the already existing tree built in the first 𝑖 − 1 iterations. Do you see that this
is equivalent to using the MST Heuristic with Prim’s algorithm for MST? This
illustrates the need to be careful in the design and analysis of heuristics.
10.1.3 LP Relaxation
A natural LP relaxation for the Steiner Tree problem is the following. For each
edge 𝑒 ∈ 𝐸 we have an indicator variable 𝑥 𝑒 to decide if we choose to include
𝑒 in our solution. The chosen edges should ensure that no two terminals are
separated. We write this via a constraint Σ_{𝑒∈δ(𝐴)} 𝑥_𝑒 ≥ 1 for every set 𝐴 ⊂ 𝑉 such that 𝐴 contains a terminal and 𝑉 ∖ 𝐴 contains a terminal.

min Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    Σ_{𝑒∈δ(𝐴)} 𝑥_𝑒 ≥ 1    ∀𝐴 : 𝐴 ∩ 𝑆 ≠ ∅, (𝑉 − 𝐴) ∩ 𝑆 ≠ ∅
    𝑥_𝑒 ≥ 0               ∀𝑒 ∈ 𝐸
Note that the preceding LP has an exponential number of constraints. However, there is a polynomial-time separation oracle: given 𝑥, it is feasible for the LP iff the minimum 𝑠-𝑡 cut value is at least 1 for every pair of terminals 𝑠, 𝑡 ∈ 𝑆, with
that there is a 2(1 − 1/|𝑆|)-approximation via this LP. Interestingly the LP has an
integrality gap of 2(1 − 1/|𝑆|) even if 𝑆 = 𝑉 in which case we want to solve the
MST problem! Despite the weakness of this cut based LP for these simple cases,
we will see later that it generalizes nicely for higher connectivity problems and
one can derive a 2-approximation even for those much more difficult problems.
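The separation oracle just described can be sketched as follows (the function names are ours): a simple Edmonds–Karp maxflow checks that every pair of terminals has mincut value at least 1 under the capacities given by 𝑥.

from collections import deque

def max_flow(n, cap, s, t):
    """Edmonds-Karp maximum flow; cap is a dict (u, v) -> capacity."""
    flow, total = {}, 0.0
    def residual(u, v):
        return cap.get((u, v), 0.0) - flow.get((u, v), 0.0)
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:           # BFS for an augmenting path
            u = q.popleft()
            for v in range(n):
                if v not in parent and residual(u, v) > 1e-12:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        aug = min(residual(u, v) for u, v in path)
        for u, v in path:                      # push flow along the path
            flow[u, v] = flow.get((u, v), 0.0) + aug
            flow[v, u] = flow.get((v, u), 0.0) - aug
        total += aug

def violated_cut_exists(n, edges, terminals, x):
    """x (dict edge -> LP value) is infeasible iff some terminal pair has
    mincut value < 1 under capacities x; that mincut is a violated constraint."""
    cap = {}
    for (u, v) in edges:
        cap[(u, v)] = cap.get((u, v), 0.0) + x[(u, v)]
        cap[(v, u)] = cap.get((v, u), 0.0) + x[(u, v)]
    for i, s in enumerate(terminals):
        for t in terminals[i + 1:]:
            if max_flow(n, cap, s, t) < 1.0 - 1e-9:
                return True
    return False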
1. The first algorithm to obtain a ratio better than 2 was due to Alexander Zelikovsky [160]; the approximation ratio of this algorithm was 11/6 ≈ 1.83. This was improved to 1 + (ln 3)/2 ≈ 1.55 [135]; that algorithm is based on local-search improvements starting with the MST heuristic, and follows the original approach of Zelikovsky.
3. The bidirected cut LP relaxation for the Steiner Tree was proposed by [52]; it has an integrality gap of at most 2(1 − 1/|𝑆|), but it is conjectured that the gap is smaller. No algorithm is currently known that exploits this LP relaxation to obtain an approximation ratio better than that of the SteinerMST algorithm. Though the true integrality gap is not known, there are examples that show it is at least 6/5 = 1.2 [153].
4. For many applications, the vertices can be modeled as points on the plane, where the distance between them is simply the Euclidean distance. The MST-based algorithm performs fairly well on such instances; it has an approximation ratio of 2/√3 ≈ 1.15 [51]. An example which achieves this bound is three points at the corners of an equilateral triangle, say of side-length 1; the MST heuristic outputs a tree of cost 2 (two sides of the triangle) while the optimum solution is to connect the three points to a Steiner vertex at the circumcenter of the triangle. One can do better still for instances in the plane (or in any Euclidean space of small dimension); for any 𝜖 > 0, there is a (1 + 𝜖)-approximation algorithm that runs in polynomial time [10]. Such an approximation scheme is also known for planar graphs [23] and more generally bounded-genus graphs.
We focus on Metric-TSP for the rest of this section. We first consider a natural
greedy approach, the Nearest Neighbor Heuristic (NNH).
TSP-MST(𝐺 = (𝑉, 𝐸), 𝑐 : 𝐸 → ℝ⁺):
    Compute an MST 𝑇 of 𝐺
    Obtain an Eulerian graph 𝐻 = 2𝑇 by doubling the edges of 𝑇
    An Eulerian tour of 2𝑇 gives a tour in 𝐺
    Obtain a Hamiltonian cycle by shortcutting the tour
Proof. We have 𝑐(𝑇) = Σ_{𝑒∈𝐸(𝑇)} 𝑐(𝑒) ≤ OPT, since we can get a spanning tree in 𝐺 by removing any edge from the optimal Hamiltonian cycle, and 𝑇 is an MST. Thus 𝑐(𝐻) = 2𝑐(𝑇) ≤ 2 OPT. Also, shortcutting only decreases the cost.
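A short Python sketch of TSP-MST follows, assuming a distance matrix that satisfies the triangle inequality (names are ours). It uses the fact that a DFS preorder of the MST visits vertices in exactly the order obtained by shortcutting an Euler tour of the doubled tree.

import math

def tsp_double_tree(n, dist):
    """Double-tree 2-approximation for Metric-TSP (a sketch).
    dist: n x n symmetric matrix obeying the triangle inequality.
    Returns (tour, cost) where tour is a Hamiltonian cycle as a vertex list."""
    # Prim's algorithm: build an MST rooted at 0, recording children.
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    # DFS preorder of the MST = shortcut Euler tour of the doubled tree.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    tour.append(0)  # close the cycle
    cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
    return tour, cost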
We observe that the loss of a factor 2 in the approximation ratio is due to doubling edges; we did this in order to obtain an Eulerian tour. But any graph in which all vertices have even degree is Eulerian, so one can still get an Eulerian tour by adding edges only between odd-degree vertices in 𝑇. The Christofides Heuristic [43] exploits this and improves the approximation ratio from 2 to 3/2. See Fig 10.6 for a snapshot.
Let 𝐹* denote an optimal tour, so 𝑐(𝐹*) = OPT, and let 𝐹*_𝑆 be obtained from 𝐹* by shortcutting the portions of the tour that touch the vertices 𝑉 ∖ 𝑆. By the metric condition, 𝑐(𝐹*_𝑆) ≤ 𝑐(𝐹*) = OPT. Let 𝑆 = {𝑣₁, 𝑣₂, . . . , 𝑣_{|𝑆|}}. Without loss of generality 𝐹*_𝑆 visits the vertices of 𝑆 in the order 𝑣₁, 𝑣₂, . . . , 𝑣_{|𝑆|}. Recall that |𝑆| is even. Let 𝑀₁ = {𝑣₁𝑣₂, 𝑣₃𝑣₄, . . . , 𝑣_{|𝑆|−1}𝑣_{|𝑆|}} and 𝑀₂ = {𝑣₂𝑣₃, 𝑣₄𝑣₅, . . . , 𝑣_{|𝑆|}𝑣₁}. Note that both 𝑀₁ and 𝑀₂ are matchings, and 𝑐(𝑀₁) + 𝑐(𝑀₂) = 𝑐(𝐹*_𝑆) ≤ OPT. We can assume without loss of generality that 𝑐(𝑀₁) ≤ 𝑐(𝑀₂). Then 𝑐(𝑀₁) ≤ 0.5 OPT. We also know that 𝑐(𝑀) ≤ 𝑐(𝑀₁), since 𝑀 is a minimum-cost matching on 𝑆 in 𝐺[𝑆]. Hence 𝑐(𝑀) ≤ 𝑐(𝑀₁) ≤ 0.5 OPT, which completes the proof.
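Below is a hedged sketch of the Christofides Heuristic using networkx; it assumes a recent networkx version in which nx.min_weight_matching returns a minimum-weight perfect matching on a complete graph with an even number of vertices. It is an illustration under those assumptions, not the notes' reference implementation.

import networkx as nx

def christofides_sketch(dist):
    """Christofides 3/2-approximation for Metric-TSP (a sketch).
    dist: symmetric n x n matrix obeying the triangle inequality."""
    n = len(dist)
    G = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            G.add_edge(i, j, weight=dist[i][j])
    T = nx.minimum_spanning_tree(G)                      # c(T) <= OPT
    odd = [v for v in T.nodes if T.degree(v) % 2 == 1]   # evenly many odd vertices
    M = nx.min_weight_matching(G.subgraph(odd))          # c(M) <= OPT/2
    H = nx.MultiGraph(T)
    H.add_edges_from(M)                                  # now every degree is even
    # Shortcut an Eulerian circuit of T + M to a Hamiltonian cycle.
    tour, seen = [], set()
    for u, _ in nx.eulerian_circuit(H, source=0):
        if u not in seen:
            seen.add(u)
            tour.append(u)
    tour.append(tour[0])
    cost = sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
    return tour, cost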
10.2.2 LP Relaxation
We describe a well-known LP relaxation for TSP called the Subtour-Elimination
LP and sometimes also called the Held-Karp LP relaxation although the formu-
lation was first given by Dantzig, Fulkerson and Johnson [47]. The LP relaxation
has a variable 𝑥 𝑒 for each edge 𝑒 ∈ 𝐸. Note that the TSP solution is a Hamilton
Cycle of least cost. A Hamilton cycle can be viewed as a connected subgraph of 𝐺
with degree 2 at each vertex. Thus we write the degree constraints and also the
cut constraints.
min Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    Σ_{𝑒∈δ(𝑣)} 𝑥_𝑒 = 2    ∀𝑣 ∈ 𝑉
    Σ_{𝑒∈δ(𝑆)} 𝑥_𝑒 ≥ 2    ∀𝑆, ∅ ⊊ 𝑆 ⊊ 𝑉
    𝑥_𝑒 ∈ [0, 1]          ∀𝑒 ∈ 𝐸
The relaxation is not useful for a general graph since we saw that TSP is not
approximable. To obtain a relaxation for Metric-TSP we apply the above to the
metric completion of the graph 𝐺.
Another alternative is to consider the following LP, which views the problem as finding a connected Eulerian multigraph of the underlying graph 𝐺. In other words, we are allowed to take an integer number of copies of each edge with the constraint that the degree of each vertex is even and the graph is connected. It is not easy to write the even-degree condition since we do not have an a priori bound. Instead one can write the following simpler LP, and interestingly one can show that its optimum value is the same as that of the preceding relaxation (when applied to the metric completion).
min Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    Σ_{𝑒∈δ(𝑆)} 𝑥_𝑒 ≥ 2    ∀𝑆, ∅ ⊊ 𝑆 ⊊ 𝑉
    𝑥_𝑒 ≥ 0               ∀𝑒 ∈ 𝐸
1. In practice, local search heuristics are widely used and they perform extremely well. A popular heuristic, 2-Opt, swaps a pair of edges 𝑥𝑦, 𝑧𝑤 to 𝑥𝑧, 𝑦𝑤 or 𝑥𝑤, 𝑦𝑧 whenever it improves the tour; a small sketch follows.
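A minimal sketch of 2-Opt (function and parameter names are ours): replacing edges 𝑥𝑦, 𝑧𝑤 by 𝑥𝑧, 𝑦𝑤 corresponds to reversing a contiguous segment of the tour.

def two_opt(tour, dist):
    """2-Opt local search: reverse a segment whenever that shortens the tour.
    tour: list of vertices (the closing edge back to tour[0] is implicit)."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a vertex; skip
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # replace edges (a,b),(c,d) by (a,c),(b,d): reverse tour[i+1..j]
                if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d] - 1e-12:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour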
found. Recall that a cycle cover is a collection of disjoint cycles covering all vertices. It is known that a minimum-cost cycle cover can be found in polynomial time (see Homework 0). The Cycle Shrinking Algorithm achieves a log₂ 𝑛 approximation ratio; a sketch of the recursion appears after Fig. 10.8.
Figure 10.8: A snapshot of Cycle Shrinking Algorithm. To the left, a cycle cover
𝒞1 . In the center, blue vertices indicate proxy nodes, and a cycle cover 𝒞2 is
found on the proxy nodes. To the right, pink vertices are new proxy nodes, and
a cycle cover 𝒞3 is found on the new proxy nodes.
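The recursion behind the Cycle Shrinking Algorithm can be sketched as follows, assuming SciPy (names are ours). A min-cost cycle cover on the current node set is an assignment problem with self-assignments forbidden. The sketch returns the total cost of all covers computed; since each cover costs at most OPT (Lemmas 10.3 and 10.4 below) and the node set at least halves at each level, this total is at most log₂ 𝑛 · OPT. Combining the covers into an actual tour by shortcutting is omitted.

import numpy as np
from scipy.optimize import linear_sum_assignment

def cycle_shrinking_cost(dist, nodes=None):
    """Total cost of the cycle covers in the Cycle Shrinking recursion.
    dist: asymmetric matrix obeying the (asymmetric) triangle inequality."""
    if nodes is None:
        nodes = list(range(len(dist)))
    if len(nodes) <= 1:
        return 0.0
    BIG = 1.0 + sum(map(sum, dist))          # forbids self-assignments
    C = np.array([[dist[u][v] if u != v else BIG for v in nodes] for u in nodes])
    rows, cols = linear_sum_assignment(C)    # min-cost cycle cover on `nodes`
    succ = {nodes[r]: nodes[c] for r, c in zip(rows, cols)}
    cover_cost = sum(dist[u][succ[u]] for u in succ)
    proxies, visited = [], set()
    for u in nodes:
        if u not in visited:                 # new cycle: pick u as its proxy
            proxies.append(u)
            while u not in visited:
                visited.add(u)
                u = succ[u]
    # every cycle has >= 2 nodes, so the proxy set at least halves per level
    return cover_cost + cycle_shrinking_cost(dist, proxies)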
Lemma 10.3. Let the cost of edges in 𝐺 satisfy the asymmetric triangle inequality.
Then for any 𝑆 ⊆ 𝑉, the cost of an optimal TSP tour in 𝐺[𝑆] is at most the cost of an
optimal TSP tour in 𝐺.
Proof. Since 𝐺 satisfies the triangle inequality, there is an optimal TSP tour in 𝐺 that is a Hamiltonian cycle 𝐶. Given any 𝑆 ⊆ 𝑉, the cycle 𝐶 can be shortcut to produce another cycle 𝐶′ that visits only 𝑆 and whose cost is at most the cost of 𝐶.
Lemma 10.4. The cost of a min-cost cycle-cover is at most the cost of an optimal TSP
tour.
10.2.4 LP Relaxation
The LP relaxation for ATSP is given below. For each arc 𝑒 ∈ 𝐸 we have a variable
𝑥 𝑒 . We view the problem as finding a connected Eulerian multi-graph in the
support of 𝐺. That is, we can choose each edge 𝑒 an integer number of times.
We impose Eulerian constraint at each vertex by requiring the in-degree to be
equal to the out-degree. We impose connectivity constraint by ensuring that at
least one arc leaves each set of vertices 𝑆 which is not 𝑉 or ∅.
min Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    Σ_{𝑒∈δ⁺(𝑣)} 𝑥_𝑒 − Σ_{𝑒∈δ⁻(𝑣)} 𝑥_𝑒 = 0    ∀𝑣 ∈ 𝑉
    Σ_{𝑒∈δ⁺(𝑆)} 𝑥_𝑒 ≥ 1                      ∀𝑆, ∅ ⊊ 𝑆 ⊊ 𝑉
    𝑥_𝑒 ≥ 0                                  ∀𝑒 ∈ 𝐸
Remarks:
1. It has remained an open problem for more than 25 years whether there
exists a constant factor approximation for ATSP. Asadpour et al [12] have
obtained an 𝑂(log 𝑛/log log 𝑛)-approximation for ATSP using some very
novel ideas and a well-known LP relaxation.
Chapter 11

Steiner Forest Problem
work of Goemans and Williamson (see their survey [66]) and several others.
Steiner Forest has the advantage that one can visualize the algorithm and
analysis more easily when compared to the more abstract settings that we will
see shortly.
We now describe an integer programming formulation for the problem. In
the IP we will have a variable 𝑥 𝑒 for each edge 𝑒 ∈ 𝐸 such that 𝑥 𝑒 is 1 if and only
if 𝑒 is part of the solution. Let the set 𝒮 be the collection of all sets 𝑆 ⊂ 𝑉 such
that |𝑆 ∩ {𝑠 𝑖 , 𝑡 𝑖 }| = 1 for some 1 ≤ 𝑖 ≤ 𝑘. For a set 𝑆 ⊂ 𝑉 let 𝛿(𝑆) denote the set
of edges crossing the cut (𝑆, 𝑉 \ 𝑆). The IP can be written as the following.
min Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    such that Σ_{𝑒∈δ(𝑆)} 𝑥_𝑒 ≥ 1    ∀𝑆 ∈ 𝒮
    𝑥_𝑒 ∈ {0, 1}                    ∀𝑒 ∈ 𝐸
The LP relaxation replaces the integrality constraints by 𝑥_𝑒 ≥ 0; its dual is the following.

max Σ_{𝑆∈𝒮} 𝑦_𝑆
    such that Σ_{𝑆: 𝑒∈δ(𝑆)} 𝑦_𝑆 ≤ 𝑐_𝑒    ∀𝑒 ∈ 𝐸
    𝑦_𝑆 ≥ 0                             ∀𝑆 ∈ 𝒮
Before we continue, we state some definitions that will help describe our algorithm for the problem.
Next we show that any two minimally violated sets are disjoint.
In fact we will prove the following claim which implies the preceding one.
Claim 11.0.2. Let 𝑋 ⊆ 𝐸. The minimal violated sets with respect to 𝑋 are the connected components of the graph (𝑉, 𝑋) that are violated with respect to 𝑋.

Proof. Consider a minimal violated set 𝑆. We may assume the set 𝑆 contains 𝑠_𝑖 but not 𝑡_𝑖 for some 𝑖. If 𝑆 is not a connected component of (𝑉, 𝑋), then there must be some connected component 𝑆′ of 𝐺[𝑆] that contains 𝑠_𝑖. But δ_𝑋(𝑆′) = ∅, and hence 𝑆′ is violated; this contradicts the fact that 𝑆 is minimal. Therefore, if a set 𝑆 is a minimal violated set then it must be connected in the graph (𝑉, 𝑋). Now suppose that 𝑆 is a connected component of (𝑉, 𝑋); it is easy to see that no proper subset of 𝑆 can be violated, since some edge of 𝑋 will cross any such subset. Thus, if 𝑆 is a violated set then it is a minimal violated set.

Thus, the minimal violated sets with respect to 𝑋 are the connected components of the graph (𝑉, 𝑋) that are themselves violated sets. It follows that any two distinct minimal violated sets are disjoint.
The primal-dual algorithm for Steiner Forest is described below.
SteinerForest:
𝐹←∅
while 𝐹 is not feasible
Let 𝐶1 , 𝐶2 , . . . , 𝐶 ℎ be minimally violated sets with respect to 𝐹
Raise 𝑦𝐶 𝑖 for 1 ≤ 𝑖 ≤ ℎ uniformly until some edge 𝑒 becomes tight
𝐹←𝐹+𝑒
𝑥𝑒 = 1
Output 𝐹0 = {𝑒 ∈ 𝐹 | 𝐹 − 𝑒 is not feasible}
The first thing to notice about the algorithm above is that it is closely related to our solution to the Vertex Cover problem; however, there are two main differences. In Vertex Cover we raised the dual variables for all uncovered edges uniformly, whereas here we are more careful about which dual variables are raised: we raise only the variables corresponding to the minimally violated sets. Unlike the case of Steiner Tree, in Steiner Forest there can be non-trivial connected components that are not violated and hence become inactive. A temporarily inactive component may become part of an active component later if an active component merges with it. The other main difference is that when we finally output the solution, we prune 𝐹 to get 𝐹′. This is done for technical reasons, but the intuition is that we should include no edge in the solution that is not needed to obtain a feasible solution. There is a non-trivial example in the textbook [152] that demonstrates the algorithm's finer points.
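A compact Python sketch of the primal-dual algorithm follows (all helper names are ours; it assumes 𝐺 is connected so that the growth phase always finds an edge to make tight). Each edge accumulates dual "load" at a rate equal to the number of its endpoints lying in active components; the edge whose constraint becomes tight first is added.

def steiner_forest_primal_dual(n, edges, pairs):
    """Primal-dual Steiner Forest (a sketch).
    edges: list of (u, v, cost); pairs: list of (s_i, t_i).
    Returns the pruned solution F' as a list of edge indices."""
    def labels(F):
        lab = list(range(n))                  # simple union-find
        def find(x):
            while lab[x] != x:
                lab[x] = lab[lab[x]]
                x = lab[x]
            return x
        for i in F:
            u, v, _ = edges[i]
            lab[find(u)] = find(v)
        return [find(v) for v in range(n)]

    def active(F):
        lab = labels(F)
        sep = [(s, t) for s, t in pairs if lab[s] != lab[t]]
        return lab, {lab[x] for st in sep for x in st}

    F, load = [], [0.0] * len(edges)
    while True:
        lab, act = active(F)
        if not act:                           # F is feasible
            break
        # an edge gains load at rate = number of endpoints in active components
        best, delta = None, None
        for i, (u, v, c) in enumerate(edges):
            if i in F or lab[u] == lab[v]:
                continue
            rate = (lab[u] in act) + (lab[v] in act)
            if rate > 0:
                t = (c - load[i]) / rate
                if delta is None or t < delta:
                    best, delta = i, t
        for i, (u, v, _) in enumerate(edges):
            if i not in F and lab[u] != lab[v]:
                load[i] += delta * ((lab[u] in act) + (lab[v] in act))
        F.append(best)                        # `best` just became tight
    # pruning: keep e only if F - e is infeasible
    return [e for e in F if active([x for x in F if x != e])[1]]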
Lemma 11.1. At the end of the algorithm, 𝐹0 and y are primal and dual feasible solutions,
respectively.
Proof. In each iteration of the while loop, only the dual variables corresponding
to connected components were raised. Therefore, no edge that is contained
within the same component can become tight, and, therefore, 𝐹 is acyclic. To see
that none of the dual constraints are violated, observe that when a constraint
becomes tight (that is, it holds with equality), the corresponding edge 𝑒 is added
to 𝐹. Subsequently, since 𝑒 is contained in some connected component of 𝐹, no
set 𝑆 with 𝑒 ∈ 𝛿(𝑆) ever has 𝑦𝑆 raised. Therefore, the constraint for 𝑒 cannot be
violated, and so y is dual feasible.
As long as 𝐹 is not feasible, the while loop will not terminate, and there are
some minimal violated sets that can have their dual variables raised. Therefore,
at the end of the algorithm 𝐹 is feasible. Moreover, since 𝐹 is acyclic (it is a
forest), there is a unique 𝑠 𝑖 -𝑡 𝑖 path in 𝐹 for each 1 ≤ 𝑖 ≤ 𝑘. Thus, each edge on a
𝑠 𝑖 -𝑡 𝑖 path is not redundant and is not deleted when pruning 𝐹 to get 𝐹0.
Theorem 11.3. The primal-dual algorithm for Steiner Forest gives a 2-approximation.
Proof. Let 𝐹′ be the output of our algorithm. To prove this theorem, we want to show that 𝑐(𝐹′) ≤ 2 Σ_{𝑆∈𝒮} 𝑦_𝑆, where 𝑦 is the feasible dual constructed by the algorithm. It follows from this that the algorithm is in fact a 2-approximation. First, we know that 𝑐(𝐹′) = Σ_{𝑒∈𝐹′} 𝑐_𝑒 = Σ_{𝑒∈𝐹′} Σ_{𝑆∈𝒮: 𝑒∈δ(𝑆)} 𝑦_𝑆, because every edge picked is tight. Let deg_{𝐹′}(𝑆) denote the number of edges of 𝐹′ that cross the cut (𝑆, 𝑉 ∖ 𝑆). It can be seen that Σ_{𝑒∈𝐹′} Σ_{𝑆∈𝒮: 𝑒∈δ(𝑆)} 𝑦_𝑆 = Σ_{𝑆∈𝒮} 𝑦_𝑆 deg_{𝐹′}(𝑆).
Let 𝐴_𝑖 contain the minimally violated sets in iteration 𝑖 and let Δ_𝑖 denote the amount of dual growth in the 𝑖th iteration. Say that our algorithm runs for 𝛼 iterations. We can then rewrite Σ_{𝑆∈𝒮} 𝑦_𝑆 deg_{𝐹′}(𝑆) as the double summation Σ_{𝑖=1}^𝛼 Σ_{𝑆∈𝐴_𝑖} Δ_𝑖 deg_{𝐹′}(𝑆). In the next lemma it will be shown that, for any iteration 𝑖, Σ_{𝑆∈𝐴_𝑖} deg_{𝐹′}(𝑆) ≤ 2|𝐴_𝑖|. Knowing this we can prove the theorem:

Σ_{𝑆∈𝒮} 𝑦_𝑆 deg_{𝐹′}(𝑆) = Σ_{𝑆∈𝒮} Σ_{𝑖: 𝑆∈𝐴_𝑖} Δ_𝑖 deg_{𝐹′}(𝑆) = Σ_{𝑖=1}^𝛼 Δ_𝑖 Σ_{𝑆∈𝐴_𝑖} deg_{𝐹′}(𝑆) ≤ Σ_{𝑖=1}^𝛼 Δ_𝑖 · 2|𝐴_𝑖| ≤ 2 Σ_{𝑆∈𝒮} 𝑦_𝑆.
Now we show the lemma used in the previous theorem. It is in this lemma that we use the fact that we prune 𝐹 to get 𝐹′. We need a simple claim.

Lemma 11.2. Let 𝑇 = (𝑉, 𝐸) be a forest and let 𝑍 ⊆ 𝑉 contain all the leaves of 𝑇. Then Σ_{𝑣∈𝑍} deg_𝑇(𝑣) ≤ 2|𝑍| − 2.

Proof. We will prove it for a tree. We have Σ_{𝑣∈𝑉} deg_𝑇(𝑣) = 2|𝑉| − 2, since a tree has |𝑉| − 1 edges. Every node 𝑢 ∈ 𝑉 − 𝑍 has degree at least 2 since it is not a leaf. Thus Σ_{𝑢∈𝑉−𝑍} deg_𝑇(𝑢) ≥ 2|𝑉 − 𝑍|. Thus,

Σ_{𝑣∈𝑍} deg_𝑇(𝑣) ≤ 2|𝑉| − 2 − 2|𝑉 − 𝑍| = 2|𝑍| − 2.
Lemma 11.3. For any iteration 𝑖 of our algorithm, Σ_{𝑆∈𝐴_𝑖} deg_{𝐹′}(𝑆) ≤ 2|𝐴_𝑖| − 2.
Proof. Consider the graph (𝑉, 𝐹′), and fix an iteration 𝑖. In this graph, contract each set 𝑆 active in iteration 𝑖 to a single node (call such a node an active node), and each inactive component to a single node. Let the resulting graph be denoted by 𝐻. We know that 𝐹 is a forest and we have contracted connected subsets of vertices in 𝐹; as 𝐹′ ⊆ 𝐹, we conclude that 𝐻 is also a forest.

Claim 11.0.3. Every leaf of 𝐻 is an active node.

Proof. If not, consider a leaf node 𝑣 of 𝐻 which is inactive, and let 𝑒 ∈ 𝐹′ be the edge incident to it. We claim that 𝐹′ − 𝑒 is feasible, which would contradict the minimality of 𝐹′. To see this, if 𝑥, 𝑦 are two nodes of 𝐻 with 𝑣 ≠ 𝑥 and 𝑣 ≠ 𝑦 that are connected in 𝐻, then they are also connected in 𝐻 − 𝑒, since 𝑣 is a leaf. Thus the only utility of 𝑒 is to connect 𝑣 to other nodes in 𝐻; but if 𝑒 were needed for that purpose, 𝑣 would have been an active node at the start of the iteration, which is not the case.

The degree in 𝐻 of an active node corresponding to a violated set 𝑆 is deg_{𝐹′}(𝑆). Now we apply Lemma 11.2.
Chapter 12

Primal Dual for Constrained Forest Problems
Consider the following abstract problem: given a graph 𝐺 = (𝑉, 𝐸) with edge costs and a requirement function 𝑓 : 2^𝑉 → {0, 1}, find a min-cost subset of edges 𝐸′ such that |δ_{𝐸′}(𝑆)| ≥ 𝑓(𝑆) for each 𝑆 ⊆ 𝑉. We use the notation δ_𝐹(𝑆) to denote the edges from an edge set 𝐹 that cross the set/cut 𝑆. Alternatively, we want a min-cost subset of edges 𝐸′ such that each set 𝑆 ∈ 𝒮 is crossed by an edge of 𝐸′, where 𝒮 = {𝑆 | 𝑓(𝑆) = 1}. It is easy to observe that a minimal solution to this abstract problem is a forest, since any cut needs to be covered at most once. This formulation is too general since 𝑓 may be completely arbitrary. The goal is to find a sufficiently general class that captures interesting problems while still being tractable. The advantage of {0, 1} functions is precisely that the minimal solutions are forests. We will later consider integer-valued functions.
Definition 12.1. Given a graph 𝐺 = (𝑉, 𝐸) and an integer-valued function 𝑓 : 2^𝑉 → ℤ, we say that a subset of edges 𝐹 is feasible for 𝑓, or covers 𝑓, iff |δ_𝐹(𝑆)| ≥ 𝑓(𝑆) for all 𝑆 ⊆ 𝑉.
Remark 12.1. Even though it may seem natural to restrict attention to requirement
functions that only have non-negative entries, we will see later that the flexibility
of negative requirements is useful.
Given a network design problem Π in an undirected graph 𝐺 = (𝑉, 𝐸) and an integer-valued function 𝑓 : 2^𝑉 → ℤ₊, we say that the requirement function of Π is 𝑓 if covering 𝑓 is equivalent to satisfying the constraints of Π.
An integer-valued function 𝑓 is skew-supermodular (also called weakly supermodular) if for all 𝐴, 𝐵 ⊆ 𝑉 at least one of the following holds:

1. 𝑓(𝐴) + 𝑓(𝐵) ≤ 𝑓(𝐴 ∪ 𝐵) + 𝑓(𝐴 ∩ 𝐵)
2. 𝑓(𝐴) + 𝑓(𝐵) ≤ 𝑓(𝐴 − 𝐵) + 𝑓(𝐵 − 𝐴)
Definition 12.6. A {0, 1}-valued function 𝑓 is uncrossable if for all 𝐴 and 𝐵 such that 𝑓(𝐴) = 1 and 𝑓(𝐵) = 1, one of the following conditions holds:

1. 𝑓(𝐴 ∪ 𝐵) = 1 and 𝑓(𝐴 ∩ 𝐵) = 1
2. 𝑓(𝐴 − 𝐵) = 1 and 𝑓(𝐵 − 𝐴) = 1.
𝑓(𝐴) ≤ 𝑓(𝐴 − 𝐵),   𝑓(𝐵) ≤ 𝑓(𝐵 − 𝐴),
2(𝑓(𝐴) + 𝑓(𝐵)) ≤ 𝑓(𝐴 − 𝐵) + 𝑓(𝐵 − 𝐴) + 𝑓(𝐴 ∩ 𝐵) + 𝑓(𝐴 ∪ 𝐵).
The function is also symmetric and hence also satisfies posi-modularity. There-
fore,
|𝛿 𝑋 (𝐴)| + |𝛿 𝑋 (𝐵)| ≥ |𝛿 𝑋 (𝐴 − 𝐵)| + |𝛿 𝑋 (𝐵 − 𝐴)| ∀𝐴, 𝐵 ⊆ 𝑉.
Now consider the function 𝑓𝑋 and let 𝐴, 𝐵 be any two subsets of 𝑉. Suppose
𝑓 (𝐴) + 𝑓 (𝐵) ≥ 𝑓 (𝐴 ∪ 𝐵) + 𝑓 (𝐴 ∩ 𝐵). Then
Definition 12.8. Let 𝑓 be a {0, 1} requirement function over 𝑉 and let 𝑋 ⊆ 𝐸. A set
𝑆 is violated with respect to 𝑋 if 𝑓𝑋 (𝑆) = 1 (in other words 𝑆 is not yet covered by
the edge set 𝑋). A set 𝑆 is a minimal violated set if there is no 𝑆0 ( 𝑆 such that 𝑆0 is
violated.
Lemma 12.4. Let 𝑓 be an uncrossable function and 𝑋 ⊆ 𝐸. Then the minimal violated sets of 𝑓 with respect to 𝑋 are disjoint. That is, if 𝐴, 𝐵 are minimal violated sets then 𝐴 = 𝐵 or 𝐴 ∩ 𝐵 = ∅.
min Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    such that Σ_{𝑒∈δ(𝑆)} 𝑥_𝑒 ≥ 𝑓(𝑆)    ∀𝑆 ⊆ 𝑉
    𝑥_𝑒 ≥ 0                            ∀𝑒 ∈ 𝐸

max Σ_𝑆 𝑓(𝑆) 𝑦_𝑆
    such that Σ_{𝑆: 𝑒∈δ(𝑆)} 𝑦_𝑆 ≤ 𝑐_𝑒    ∀𝑒 ∈ 𝐸
    𝑦_𝑆 ≥ 0                             ∀𝑆 ⊆ 𝑉
The primal-dual algorithm is similar to the one for Steiner Forest with a
growth phase and reverse-delete phase.
CoverUncrossableFunc(𝐺 = (𝑉 , 𝐸), 𝑓 ):
𝐹←∅
while 𝐹 is not feasible
Let 𝐶1 , 𝐶2 , . . . , 𝐶ℓ be minimally violated sets of 𝑓 with respect to 𝐹
Raise 𝑦𝐶 𝑖 for 1 ≤ 𝑖 ≤ ℓ uniformly until some edge 𝑒 becomes tight
𝐹←𝐹+𝑒
𝑥𝑒 = 1
    Let 𝑒₁, 𝑒₂, . . . , 𝑒_𝑡 be the edges of 𝐹 in the order they were added
    𝐹′ ← 𝐹
    for 𝑖 = 𝑡 down to 1 do
        if (𝐹′ − 𝑒_𝑖) is feasible then 𝐹′ ← 𝐹′ − 𝑒_𝑖
    Output 𝐹′
Analysis: We can prove the following by induction on the iterations and omit
the routine details.
Lemma 12.5. The output of the algorithm, 𝐹0, is a feasible solution that covers 𝑓 .
Assuming oracle access to finding the minimal violated sets in each iteration, the
algorithm can be implemented in polynomial time.
The preceding lemma shows that the algorithm correctly outputs a feasible
solution. The main part is to analyze the cost of the solution.
Theorem 12.9. Let 𝐹′ be the output of the algorithm. Then 𝑐(𝐹′) ≤ 2 Σ_𝑆 𝑓(𝑆) 𝑦_𝑆, and hence 𝑐(𝐹′) ≤ 2 OPT.
𝑐(𝐹′) = Σ_{𝑒∈𝐹′} Σ_{𝑆: 𝑒∈δ(𝑆)} 𝑦_𝑆 = Σ_𝑆 𝑓(𝑆) 𝑦_𝑆 deg_{𝐹′}(𝑆) = Σ_𝑆 deg_{𝐹′}(𝑆) Σ_{𝑖: 𝑆∈𝒞_𝑖} Δ_𝑖 = Σ_{𝑖=1}^𝑡 Δ_𝑖 Σ_{𝑆∈𝒞_𝑖} deg_{𝐹′}(𝑆).
To complete the analysis we need the following lemma which can be seen as
a corollary of Lemma 12.6.
Proof. Consider iteration 𝑖. Let 𝑋 = {𝑒₁, 𝑒₂, . . . , 𝑒_{𝑖−1}} be the set of edges added by the algorithm in the growth phase prior to iteration 𝑖. Thus 𝒞_𝑖 is the set of minimal violated sets with respect to 𝑋. Consider the graph 𝐺′ = (𝑉, 𝐸 ∖ 𝑋) and the function 𝑓_𝑋. We observe that 𝐹″ = 𝐹′ ∖ 𝑋 is a minimal feasible solution to covering 𝑓_𝑋 in 𝐺′ — this is because of the reverse-delete process. Moreover, we claim that deg_{𝐹′}(𝐶) = deg_{𝐹″}(𝐶) for any 𝐶 ∈ 𝒞_𝑖 (why?). We can now apply Lemma 12.6 to the function 𝑓_𝑋 (which is uncrossable), 𝐺′, and 𝐹″ to obtain:

Σ_{𝐶∈𝒞_𝑖} deg_{𝐹′}(𝐶) = Σ_{𝐶∈𝒞_𝑖} deg_{𝐹″}(𝐶) ≤ 2|𝒞_𝑖|.
With the preceding lemma in place we have

𝑐(𝐹′) = Σ_{𝑖=1}^𝑡 Δ_𝑖 Σ_{𝑆∈𝒞_𝑖} deg_{𝐹′}(𝑆) ≤ Σ_{𝑖=1}^𝑡 Δ_𝑖 · 2|𝒞_𝑖| = 2 Σ_𝑆 𝑓(𝑆) 𝑦_𝑆.
Claim 12.2.1. Let 𝑆 𝑒 be a witness set for 𝑒 ∈ 𝐹. Then for any minimal violated set 𝐶
we have 𝐶 ⊆ 𝑆 𝑒 or 𝐶 ∩ 𝑆 𝑒 = ∅.
We have seen that there is a witness family for 𝐹 since it is minimal. We can
obtain a special witness family starting with an arbitrary witness family.
The first inequality holds since each set is covered, and the second follows from the posi-modularity of |δ_𝑌(·)|. The other case, when 𝑓(𝑆_{𝑒₁} ∩ 𝑆_{𝑒₂}) = 1 and 𝑓(𝑆_{𝑒₁} ∪ 𝑆_{𝑒₂}) = 1, can be handled by a similar argument.
Let 𝒮 be a laminar witness family for 𝐹. We create a rooted tree 𝑃 from
𝒮 as follows. To do this we first add 𝑉 to the family. For each set 𝑆 ∈ 𝒮 we
add a vertex 𝑣 𝑆 . For any 𝑆, 𝑇 such that 𝑇 is the smallest set in 𝒮 containing 𝑆
we add the edge (𝑣 𝑆 , 𝑣𝑇 ) which makes 𝑣𝑇 the parent of 𝑣 𝑆 . Since we added 𝑉
to the family we obtain a tree since every maximal set in the original family
becomes a child of 𝑣𝑉 . Note that 𝑃 has |𝐹| edges and |𝐹| + 1 nodes and each
𝑒 ∈ 𝐹 corresponds to the edge (𝑣 𝑆𝑒 , 𝑣𝑇 ) of 𝑃 where 𝑣𝑇 is the parent of 𝑣 𝑆𝑒 in 𝑃.
We keep this bijection between 𝐹 and edges of 𝑃 in mind for later.
See Figure 12.1.
Consider any minimal violated set 𝐶 ∈ 𝒞. We observe that 𝐶 cannot cross any set in 𝒮, since 𝒮 is a witness family. For each 𝐶 we associate the minimal set 𝑆 ∈ 𝒮 such that 𝐶 ⊆ 𝑆. We call 𝑣_𝑆 an active node of 𝑃 if there is a 𝐶 ∈ 𝒞 associated with it. Note that (i) not all nodes of 𝑃 may be active, (ii) every 𝐶 is associated with exactly one active node of 𝑃, and (iii) multiple sets from 𝒞 can be associated with the same active node of 𝑃.
Lemma 12.9. Let 𝑃𝑎 be the active nodes of 𝑃. Then |𝑃𝑎 | ≤ |𝒞| and every leaf of 𝑃 other
than potentially the root is an active node.
Proof. Since each 𝐶 is associated with an active node we obtain |𝑃𝑎 | ≤ |𝒞|. A
leaf of 𝑃 corresponds to a minimal set from 𝒮 or the root. If 𝑆 𝑒 is a minimal set
from 𝒮 then 𝑆 𝑒 is a violated set and hence must contain a minimal violated set
𝐶. But then 𝑆 𝑒 is active because of 𝐶. Consider 𝑣𝑉 . If it is a leaf then it has only
one child which is the unique maximal witness set 𝑆. The root can be inactive
since the function 𝑓 is not necessarily symmetric. (However, if 𝑓 is symmetric
then 𝑉 − 𝑆 would also be a violated set and hence contain a minimal violated
set from 𝒞.)
Figure 12.1: Laminar witness family for a minimal solution 𝐹 shown as red
edges. Sets in green are the minimal violated sets and the sets in black are the
witness sets. The root set 𝑉 is not drawn.
Lemma 12.10. Let 𝑣_𝑇 be an active node in 𝑃. Let 𝒞′ ⊆ 𝒞 be the set of all minimal violated sets associated with 𝑣_𝑇. Then Σ_{𝐶∈𝒞′} deg_𝐹(𝐶) ≤ deg_𝑃(𝑣_𝑇).
Proof. Let 𝑌 = ∪_{𝐶∈𝒞′} δ_𝐹(𝐶) be the set of edges incident to the sets in 𝒞′. Consider any 𝐶 ∈ 𝒞′. We have 𝐶 ⊂ 𝑇, and 𝐶 is also disjoint from the children of 𝑇. Consider an edge 𝑒 ∈ δ_𝐹(𝐶). If 𝑒 crosses 𝑇 then 𝑇 is the witness set for 𝑒 (only one edge from 𝐹 can cross 𝑇, namely the one for which it is the witness), and we charge 𝑒 to the parent edge of 𝑇. If 𝑒 does not cross 𝑇 then the witness set 𝑆_𝑒 must be a child of 𝑇. Since only one edge can cross each child of 𝑇, we can charge 𝑒 to one child of 𝑇. Note that no edge 𝑒 ∈ 𝑌 can be incident to two sets 𝐶₁, 𝐶₂ ∈ 𝒞′, since one endpoint of 𝑒 must be contained in a child of 𝑇 (assuming 𝑒 does not cross 𝑇) while both 𝐶₁ and 𝐶₂ are contained in 𝑇 and disjoint from the children of 𝑇. See Figure 12.1. Therefore, we can charge Σ_{𝐶∈𝒞′} deg_𝐹(𝐶) to the number of children of 𝑇 plus the parent edge of 𝑇, which is deg_𝑃(𝑣_𝑇).
Now we are ready to finish the proof. From Lemma 12.10, and the fact that every minimal violated set is associated with exactly one active node, we have Σ_{𝐶∈𝒞} deg_𝐹(𝐶) ≤ Σ_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣).

To bound Σ_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣), recall that in the tree 𝑃 every leaf, except perhaps the root, is an active node. Suppose the root is an active node or is not a leaf. Then from Lemma 11.2, Σ_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) ≤ 2|𝑃_𝑎| − 2. Suppose the root is a leaf and inactive. Again from Lemma 11.2, applied with 𝑍 = 𝑃_𝑎 ∪ {𝑣_𝑉}, we have Σ_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) + 1 ≤ 2(|𝑃_𝑎| + 1) − 2 = 2|𝑃_𝑎|; hence Σ_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) ≤ 2|𝑃_𝑎| − 1. Thus, in both cases, Σ_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) ≤ 2|𝑃_𝑎|. Finally, |𝑃_𝑎| ≤ |𝒞|, and putting everything together we have

Σ_{𝐶∈𝒞} deg_𝐹(𝐶) ≤ 2|𝑃_𝑎| ≤ 2|𝒞|
as desired.
Bibliographic Notes: The primal-dual algorithm for uncrossable function is
from the paper of Williamson, Goemans, Mihail and Vazirani [157]. The proof in
the paper assumes that ℎ is symmetric without explicitly stating it; for symmetric
functions one obtains a slight advantage over 2. See Williamson’s thesis [156]
where he gives the proof for both symmetric and general uncrossable functions.
The survey by Goemans and Williamson [67] describes the many applications of
the primal-dual method in network design.
Chapter 13

Survivable Network Design Problem
Figure 13.1: Example of EC-SNDP with requirements only for three pairs. A feasible solution is shown in the second figure as red edges. In this example the paths for each pair are also vertex-disjoint even though the requirement is only for edge-disjointness.
Proof. For each pair of nodes (𝑠, 𝑡), find via maxflow¹ a source-minimal 𝑠-𝑡 mincut 𝑆 in the graph 𝐻 = (𝑉, 𝑋) (and similarly a sink-minimal mincut 𝑇). If |δ_𝑋(𝑆)| < 𝑝 then 𝑆 is a violated set. We compute all such minimal cuts over all pairs of vertices and take the minimal sets in this collection. We leave it as an exercise to check that the minimal violated sets of 𝑔_𝑝 are the minimal sets in this collection and that they are disjoint.
Corollary 13.1. Let 𝑓 be the requirement function of an instance of EC-SNDP in 𝐺 = (𝑉, 𝐸) and let 𝑝 be an integer. Let 𝑋 be a set of edges such that 𝑋 is feasible to cover 𝑔_𝑝. In the graph 𝐺′ = (𝑉, 𝐸 ∖ 𝑋), and for any 𝐹 ⊆ (𝐸 ∖ 𝑋), the minimal violated sets of ℎ_{𝑝+1} with respect to 𝐹 can be computed in polynomial time.

Proof. The minimal violated sets of ℎ_{𝑝+1} with respect to 𝐹 are the same as the minimal violated sets of 𝑔_{𝑝+1} with respect to 𝑋 ∪ 𝐹.
¹The source-minimal 𝑠-𝑡 mincut in a directed/undirected graph is unique via submodularity and can be found by computing an 𝑠-𝑡 maxflow and finding the set reachable from 𝑠 in the residual graph. The sink-minimal set is found similarly.
Augmentation-Algorithm(𝐺 = (𝑉, 𝐸), 𝑓 ):
    1. 𝐴 ← ∅
    2. for 𝑝 = 1 to 𝑘 do
        A. 𝐺′ = (𝑉, 𝐸 ∖ 𝐴)
        B. Let 𝑔_𝑝 be the function defined as 𝑔_𝑝(𝑆) = min{𝑓(𝑆), 𝑝}
        C. Let ℎ_𝑝 be the uncrossable function where ℎ_𝑝(𝑆) = 1 iff 𝑔_𝑝(𝑆) > |δ_𝐴(𝑆)|
        D. Find 𝐴′ ⊆ 𝐸 ∖ 𝐴 that covers ℎ_𝑝 in 𝐺′
        E. 𝐴 ← 𝐴 ∪ 𝐴′
    3. Output 𝐴
[Figure: an EC-SNDP instance with requirements 𝑟(𝑠₁𝑡₁) = 2, 𝑟(𝑠₂𝑡₂) = 2, 𝑟(𝑠₃𝑡₃) = 1, and a feasible solution.]
Proof. We sketch the proof. The algorithm has 𝑘 iterations and in each iteration it uses a black-box algorithm to cover an uncrossable function. We saw a primal-dual 2-approximation for this problem. We observe that if 𝐹* is an optimum solution to the given instance, then in each iteration 𝐹* ∖ 𝐴 is a feasible solution to the covering problem in that iteration. Thus the cost paid by the algorithm in each iteration can be bounded by 2𝑐(𝐹*), and hence the total cost is at most 2𝑘 OPT. The preceding lemmas argue that the primal-dual algorithm can be implemented in polynomial time.
Remark 13.1. A different algorithm that is based on augmentation in reverse
yields a 2𝐻 𝑘 approximation where 𝐻 𝑘 is the 𝑘’th harmonic number. We refer
the reader to [66].
min Σ_{𝑒∈𝐸} 𝑐(𝑒) 𝑥_𝑒
    Σ_{𝑒∈δ(𝑆)} 𝑥_𝑒 ≥ 𝑓(𝑆)    ∀𝑆 ⊂ 𝑉
    𝑥_𝑒 ∈ [0, 1]             ∀𝑒 ∈ 𝐸
Note that upper bound constraints 𝑥 𝑒 ≤ 1 are necessary in the general setting
when 𝑓 is integer valued since we can only take one copy of an edge. The key
structural theorem of Jain is the following.
Theorem 13.3. Let 𝑥 be a basic feasible solution to the LP relaxation. Then there is
some edge 𝑒 ∈ 𝐸 such that 𝑥 𝑒 = 0 or 𝑥 𝑒 ≥ 1/2.
With the above in place, and the observation that the residual function of a skew-supermodular function is again skew-supermodular, one obtains an iterative rounding algorithm.
Cover-Skew-Supermodular(𝐺, 𝑓 ):
    1. 𝐴 ← ∅, 𝑔 ← 𝑓
    2. while 𝑔 is not identically zero do
        A. Compute a basic optimal solution 𝑥 to the cut LP for 𝑔; discard edges with 𝑥_𝑒 = 0
        B. Pick an edge 𝑒̃ with 𝑥_{𝑒̃} ≥ 1/2; set 𝐴 ← 𝐴 ∪ {𝑒̃} and 𝑔 ← the residual function of 𝑔 after adding 𝑒̃
    3. Output 𝐴
Corollary 13.4. The integrality gap of the cut LP is at most 2 for any skew-supermodular
function 𝑓 .
Proof. We consider the iterative rounding algorithm and prove the result via
induction on 𝑚 the number of edges in 𝐺. The base case of 𝑚 = 0 is trivial since
the function has to be 0.
Let 𝑥* be an optimum basic feasible solution to the LP relaxation. We have Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑥*_𝑒 ≤ OPT. We can assume without loss of generality that 𝑓 is not trivial, in the sense that 𝑓(𝑆) ≥ 1 for at least some set 𝑆; otherwise 𝑥 = 0 is optimal and there is nothing to prove. By Theorem 13.3, there is an edge 𝑒̃ ∈ 𝐸 such that 𝑥*_{𝑒̃} = 0 or 𝑥*_{𝑒̃} ≥ 1/2. Let 𝐸′ = 𝐸 ∖ {𝑒̃} and 𝐺′ = (𝑉, 𝐸′). In the former case we can discard 𝑒̃, and the current LP solution restricted to 𝐸′ is a feasible fractional solution; we obtain the desired result via induction since we have one less edge.

The more interesting case is when 𝑥*_{𝑒̃} ≥ 1/2. The algorithm includes 𝑒̃ and recurses on 𝐺′ and the residual function 𝑔 : 2^𝑉 → ℤ where 𝑔(𝑆) = 𝑓(𝑆) − |δ_{𝑒̃}(𝑆)|. Note that 𝑔 is skew-supermodular. We observe that 𝐴′ ⊆ 𝐸′ is a feasible solution to cover 𝑔 in 𝐺′ iff 𝐴′ ∪ {𝑒̃} is a feasible solution to cover 𝑓 in 𝐺. Furthermore, we also observe that the fractional solution 𝑥′ obtained by restricting 𝑥 to 𝐸′ is a feasible fractional solution to the LP relaxation to cover 𝑔 in 𝐺′. Thus, by induction, there is a solution 𝐴′ ⊆ 𝐸′ such that 𝑐(𝐴′) ≤ 2 Σ_{𝑒∈𝐸′} 𝑐(𝑒) 𝑥*_𝑒. The algorithm outputs 𝐴 = 𝐴′ ∪ {𝑒̃}, which is feasible to cover 𝑓 in 𝐺. We have

𝑐(𝐴) = 𝑐(𝐴′) + 𝑐(𝑒̃) ≤ 𝑐(𝐴′) + 2𝑐(𝑒̃)𝑥*_{𝑒̃} ≤ 2 Σ_{𝑒∈𝐸′} 𝑐(𝑒) 𝑥*_𝑒 + 2𝑐(𝑒̃)𝑥*_{𝑒̃} = 2 Σ_{𝑒∈𝐸} 𝑐(𝑒) 𝑥*_𝑒.
We used the fact that 𝑥 ∗𝑒˜ ≥ 1/2 to upper bound 𝑐(˜𝑒 ) by 2𝑐(˜𝑒 )𝑥 ∗𝑒˜ .
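For intuition, here is a toy-scale sketch of the iterative rounding loop, with all cut constraints written explicitly (exponential in 𝑛, so only for tiny instances). It assumes SciPy is available, that the HiGHS solver returns a basic (vertex) optimal solution — which is what Theorem 13.3 requires — and that the instance is feasible; all names are ours.

import itertools
from scipy.optimize import linprog

def iterative_rounding(n, edges, costs, f):
    """Iterative rounding sketch for covering a skew-supermodular f.
    edges: list of (u, v); f: callable frozenset(S) -> int requirement.
    Returns the set of chosen edge indices."""
    A_chosen = set()
    alive = list(range(len(edges)))
    sets = [frozenset(S) for r in range(1, n)
            for S in itertools.combinations(range(n), r)]

    def residual(S):
        crossing = sum(1 for i in A_chosen
                       if (edges[i][0] in S) != (edges[i][1] in S))
        return f(S) - crossing

    while any(residual(S) >= 1 for S in sets):
        rows, rhs = [], []
        for S in sets:
            r = residual(S)
            if r >= 1:   # constraint: sum over delta(S) of x_e >= r
                rows.append([-1.0 if (edges[i][0] in S) != (edges[i][1] in S)
                             else 0.0 for i in alive])
                rhs.append(-float(r))
        res = linprog([costs[i] for i in alive], A_ub=rows, b_ub=rhs,
                      bounds=[(0, 1)] * len(alive), method="highs")
        x = res.x
        # Theorem 13.3: some edge has x_e = 0 (drop it) or x_e >= 1/2 (pick it)
        j0 = min(range(len(alive)), key=lambda j: x[j])
        if x[j0] <= 1e-9:
            alive.pop(j0)
        else:
            j1 = max(range(len(alive)), key=lambda j: x[j])
            A_chosen.add(alive.pop(j1))
    return A_chosen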
2-approximation for EC-SNDP: We had already seen that the requirement function for EC-SNDP is skew-supermodular. To apply Theorem 13.3 and obtain a 2-approximation for EC-SNDP, we need to argue that the LP relaxation can be solved efficiently. We observe that the LP relaxation at the top level can be solved efficiently via maxflow: we need to check that in the graph 𝐺, with edge capacities given by the fractional solution 𝑥, the mincut between every pair of vertices (𝑠, 𝑡) is at least 𝑟(𝑠, 𝑡). Note that the algorithm is iterative; as we proceed, the function is 𝑔 = 𝑓_𝐴, where 𝑓 is the original requirement function and 𝐴 is the set of edges already chosen.
Exercise 13.2. Prove that there is an efficient separation oracle for each step of
the iterative rounding algorithm when 𝑓 is the requirement function for a given
EC-SNDP instance.
We now prove Theorem 13.3. The proof consists of two steps. The first step
is a characterization of basic feasible solutions via laminar tight sets. The second
step is a counting argument.
Lemma 13.3. Let 𝑥 be a basic feasible solution to the cut covering LP relaxation of a skew-supermodular function 𝑓, with 𝑥_𝑒 ∈ (0, 1) for all 𝑒. Then there is a laminar family ℒ of tight sets 𝑆₁, 𝑆₂, . . . , 𝑆_𝑚 such that 𝑥 is the unique solution to the system χ_{𝑆_𝑖}ᵀ 𝑥 = 𝑓(𝑆_𝑖), 1 ≤ 𝑖 ≤ 𝑚.
Lemma 13.4. Suppose 𝐴 and 𝐵 are two tight sets with respect to 𝑥 such that 𝐴, 𝐵 cross. Then one of the following holds:

1. 𝐴 ∩ 𝐵 and 𝐴 ∪ 𝐵 are tight and χ_𝐴 + χ_𝐵 = χ_{𝐴∩𝐵} + χ_{𝐴∪𝐵};
2. 𝐴 − 𝐵 and 𝐵 − 𝐴 are tight and χ_𝐴 + χ_𝐵 = χ_{𝐴−𝐵} + χ_{𝐵−𝐴}.

Proof sketch. Since 𝑓 is skew-supermodular, suppose first that 𝑓(𝐴) + 𝑓(𝐵) ≤ 𝑓(𝐴 ∪ 𝐵) + 𝑓(𝐴 ∩ 𝐵). By submodularity of the cut function and feasibility, 𝑥(δ(𝐴)) + 𝑥(δ(𝐵)) ≥ 𝑥(δ(𝐴 ∪ 𝐵)) + 𝑥(δ(𝐴 ∩ 𝐵)) ≥ 𝑓(𝐴 ∪ 𝐵) + 𝑓(𝐴 ∩ 𝐵) ≥ 𝑓(𝐴) + 𝑓(𝐵) = 𝑥(δ(𝐴)) + 𝑥(δ(𝐵)), so equality holds throughout. This implies that 𝑥(δ(𝐴 ∪ 𝐵)) = 𝑓(𝐴 ∪ 𝐵) and 𝑥(δ(𝐴 ∩ 𝐵)) = 𝑓(𝐴 ∩ 𝐵); thus both 𝐴 ∩ 𝐵 and 𝐴 ∪ 𝐵 are tight. Moreover, we observe that 𝑥(δ(𝐴)) + 𝑥(δ(𝐵)) = 𝑥(δ(𝐴 ∪ 𝐵)) + 𝑥(δ(𝐴 ∩ 𝐵)) + 2𝑥(𝐸(𝐴 − 𝐵, 𝐵 − 𝐴)), where 𝐸(𝐴 − 𝐵, 𝐵 − 𝐴) is the set of edges between 𝐴 − 𝐵 and 𝐵 − 𝐴. From the above equalities we see that 𝑥(𝐸(𝐴 − 𝐵, 𝐵 − 𝐴)) = 0, and since 𝑥 is fully fractional this means 𝐸(𝐴 − 𝐵, 𝐵 − 𝐴) = ∅. This implies that χ_𝐴 + χ_𝐵 = χ_{𝐴∪𝐵} + χ_{𝐴∩𝐵} (why?). The second case is similar, using posi-modularity of the cut function.
Proof of Lemma 13.3. One natural way to proceed is as follows. We start with
tight sets 𝒮 = {𝑆1 , 𝑆2 , . . . , 𝑆𝑚 } such that 𝑥 is characterized as the unique solution
Now we consider a second setting where the forest associated with ℒ has 𝑘
leaves and ℎ internal nodes but each internal node has at least two children. In
this case, following Jain, we can easily prove a weaker statement that 𝑥 𝑒 ≥ 1/3
for some edge 𝑒. If not, then each leaf set 𝑆 must have four edges leaving it
and hence the total number of endpoints must be at least 4𝑘. However, if each
internal node has at least two children, we have ℎ < 𝑘 and since ℎ + 𝑘 = 𝑚 we
have 𝑘 > 𝑚/2. This implies that there must be at least 4𝑘 > 2𝑚 endpoints since
the leaf sets are disjoint. But 𝑚 edges can have at most 2𝑚 endpoints. Our
assumption on each internal node having at least two children is obviously a
restriction. So far we have not used the fact that the vectors 𝜒𝑆 , 𝑆 ∈ ℒ are linearly
independent. We can handle the general case to prove 𝑥 𝑒 ≥ 1/3 by using the
following lemma.
Lemma 13.5. Suppose 𝐶 is the unique child of 𝑆 in ℒ. Then there must be at least two endpoints of edges that lie in 𝑆 but outside 𝐶.

Proof. If there is no such endpoint then δ(𝑆) = δ(𝐶), but then χ_𝑆 and χ_𝐶 are linearly dependent. Suppose there is exactly one such endpoint, belonging to an edge 𝑒. Then 𝑥(δ(𝑆)) = 𝑥(δ(𝐶)) + 𝑥_𝑒 or 𝑥(δ(𝑆)) = 𝑥(δ(𝐶)) − 𝑥_𝑒. Both cases are impossible because 𝑥(δ(𝑆)) = 𝑓(𝑆) and 𝑥(δ(𝐶)) = 𝑓(𝐶), where 𝑓(𝑆) and 𝑓(𝐶) are positive integers, while 𝑥_𝑒 ∈ (0, 1). Thus there are at least two such endpoints.
Using the preceding lemma we prove that 𝑥_𝑒 ≥ 1/3 for some edge 𝑒. Let 𝑘 be the number of leaves in ℒ, let ℎ be the number of internal nodes with at least two children, and let ℓ be the number of internal nodes with exactly one child. We again have ℎ < 𝑘, and we also have 𝑘 + ℎ + ℓ = 𝑚. Each leaf has at least four endpoints. Each internal node with exactly one child has at least two endpoints, which means the total number of endpoints is at least 4𝑘 + 2ℓ. But 4𝑘 + 2ℓ = 2𝑘 + 2𝑘 + 2ℓ > 2𝑘 + 2ℎ + 2ℓ = 2𝑚, and there are only 2𝑚 endpoints for 𝑚 edges. In other words, we can ignore the internal nodes with exactly one child, since there are two endpoints in such a node/set and we can effectively charge one edge to such a node.
We now come to the more delicate argument to prove the tight bound that 𝑥_𝑒 ≥ 1/2 for some edge 𝑒. We describe an invariant that effectively reduces the argument to the case where we can assume that ℒ is a collection of leaves. This is encapsulated in the lemma below, which requires some notation. Let α(𝑆) be the number of sets of ℒ contained in 𝑆, including 𝑆 itself. Let β(𝑆) be the number of edges with both endpoints inside 𝑆. Recall that 𝑓(𝑆) is the requirement of 𝑆.
Σ_{𝑖=1}^ℎ 𝑓(𝑅_𝑖) ≥ Σ_{𝑖=1}^ℎ α(𝑅_𝑖) − Σ_{𝑖=1}^ℎ β(𝑅_𝑖) ≥ 𝑚 − Σ_{𝑖=1}^ℎ β(𝑅_𝑖).

Note that Σ_{𝑖=1}^ℎ 𝑓(𝑅_𝑖) is the total requirement of the maximal sets, and 𝑚 − Σ_{𝑖=1}^ℎ β(𝑅_𝑖) is the total number of edges that cross the sets 𝑅₁, . . . , 𝑅_ℎ. Let 𝐸′ be the set of edges crossing these maximal sets. Now we are back to the setting with ℎ disjoint sets and the edge set 𝐸′, with Σ_{𝑖=1}^ℎ 𝑓(𝑅_𝑖) ≥ |𝐸′|. This easily leads to a contradiction, as before, if we assume that 𝑥_𝑒 < 1/2 for all 𝑒 ∈ 𝐸′: each set 𝑅_𝑖 requires more than 2𝑓(𝑅_𝑖) edges of 𝐸′ crossing it, and therefore 𝑅_𝑖 contains at least 2𝑓(𝑅_𝑖) + 1 endpoints of edges from 𝐸′. Since 𝑅₁, . . . , 𝑅_ℎ are disjoint, the total number of endpoints is at least 2 Σ_𝑖 𝑓(𝑅_𝑖) + ℎ, which is strictly more than 2|𝐸′|.
Proof of Lemma 13.6. It remains to prove the claim, which we do by induction, starting at the leaves of the forest for ℒ.

Case 1: 𝑆 is a leaf node. We have 𝑓(𝑆) ≥ 1 while α(𝑆) = 1 and β(𝑆) = 0, which verifies the claim.
Case 2: 𝑆 is an internal node with 𝑘 children 𝐶₁, 𝐶₂, . . . , 𝐶_𝑘. See Fig 13.5 for the different types of edges that are relevant. 𝐸_𝑐𝑐 is the set of edges with endpoints in two different children of 𝑆. 𝐸_𝑐𝑝 is the set of edges that cross exactly one child but do not cross 𝑆. 𝐸_𝑝𝑜 is the set of edges that cross 𝑆 but do not cross any of the children. 𝐸_𝑐𝑜 is the set of edges that cross both a child and 𝑆. This notation is borrowed from [155].

Figure 13.5: 𝑆 is an internal node with several children; different types of edges play a role. Here 𝑝 refers to the parent set 𝑆, 𝑐 refers to a child set, and 𝑜 refers to the outside.
Let 𝛾(𝑆) be the number of edges whose both endpoints belong to 𝑆 but not
to any child of 𝑆. Note that 𝛾(𝑆) = |𝐸 𝑐𝑐 | + |𝐸 𝑐𝑝 |.
Then,

β(𝑆) = γ(𝑆) + Σ_{𝑖=1}^𝑘 β(𝐶_𝑖)
     ≥ γ(𝑆) + Σ_{𝑖=1}^𝑘 α(𝐶_𝑖) − Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖)        (13.1)
     = γ(𝑆) + α(𝑆) − 1 − Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖)
(13.1) follows by applying the inductive hypothesis to each child. From the
preceding inequality, to prove that 𝛽(𝑆) ≥ 𝛼(𝑆) − 𝑓 (𝑆) (the claim for 𝑆), it suffices
to show the following inequality.
γ(𝑆) ≥ Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1.        (13.2)
The right hand side of the above inequality can be written as:

Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 = Σ_{𝑒∈𝐸_𝑐𝑐} 2𝑥_𝑒 + Σ_{𝑒∈𝐸_𝑐𝑝} 𝑥_𝑒 − Σ_{𝑒∈𝐸_𝑝𝑜} 𝑥_𝑒 + 1.        (13.3)
Case 2.1: γ(𝑆) = 0. Then 𝐸_𝑐𝑐 and 𝐸_𝑐𝑝 are empty, and hence

Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 = − Σ_{𝑒∈𝐸_𝑝𝑜} 𝑥_𝑒 + 1 < 1;

here 𝐸_𝑝𝑜 ≠ ∅, for otherwise χ_𝑆 = Σ_𝑖 χ_{𝐶_𝑖}, contradicting the linear independence of the vectors χ_𝑆, 𝑆 ∈ ℒ, and each 𝑥_𝑒 > 0. Since the left hand side is an integer, it follows that Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 ≤ 0 = γ(𝑆).
Case 2.2: 𝛾(𝑆) ≥ 1. Recall that 𝛾(𝑆) = |𝐸 𝑐𝑐 | + |𝐸 𝑐𝑝 |.
Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 = Σ_{𝑒∈𝐸_𝑐𝑐} 2𝑥_𝑒 + Σ_{𝑒∈𝐸_𝑐𝑝} 𝑥_𝑒 − Σ_{𝑒∈𝐸_𝑝𝑜} 𝑥_𝑒 + 1 ≤ Σ_{𝑒∈𝐸_𝑐𝑐} 2𝑥_𝑒 + Σ_{𝑒∈𝐸_𝑐𝑝} 𝑥_𝑒 + 1.

By our assumption that 𝑥_𝑒 < 1/2 for each 𝑒, we have Σ_{𝑒∈𝐸_𝑐𝑐} 2𝑥_𝑒 < |𝐸_𝑐𝑐| if |𝐸_𝑐𝑐| > 0, and similarly Σ_{𝑒∈𝐸_𝑐𝑝} 𝑥_𝑒 < |𝐸_𝑐𝑝|/2 if |𝐸_𝑐𝑝| > 0. Since γ(𝑆) = |𝐸_𝑐𝑐| + |𝐸_𝑐𝑝| ≥ 1, we conclude that

Σ_{𝑒∈𝐸_𝑐𝑐} 2𝑥_𝑒 + Σ_{𝑒∈𝐸_𝑐𝑝} 𝑥_𝑒 < γ(𝑆).

Hence Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 < γ(𝑆) + 1, and by integrality γ(𝑆) ≥ Σ_{𝑖=1}^𝑘 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1,
as desired.
Tightness of the analysis: The LP relaxation has an integrality gap of 2 even
for the MST problem. Let 𝐺 be the cycle on 𝑛 vertices with all edge costs equal
to 1. Then setting 𝑥 𝑒 = 1/2 on each edge is feasible and the cost is 𝑛/2 while the
MST cost is 𝑛 − 1. Note that the optimum fractional solution here is 1/2-integral.
However, there are more involved examples (see Jain’s paper or [152]) based on
the Petersen graph where the optimum basic feasible solution is not half-integral
while there are one or more edges with fractional value at least 1/2. Jain’s
iterated rounding algorithm is an unusual algorithm in that the output of the
algorithm may not have any discernible structure until it is completely done.
Running time: The strength of the iterated rounding approach is the remark-
able approximation guarantees it delivers for various problems. The weakness
is the high running time which is due to two reasons. First, one needs a basic
feasible solution for the LP — this is typically much more expensive than finding
an approximately good feasible solution. Second, the algorithm requires com-
puting an LP solution many times. Finding faster algorithms with comparable
approximation guarantees is an open research area.
Chapter 14

Introduction to Cut and Partitioning Problems
Graph cut and partitioning problems such as the well-known 𝑠-𝑡 minimum cut
problem play a fundamental role in combinatorial optimization. Many natural
cut problems that go beyond the 𝑠-𝑡 cut problem are NP-Hard and there has
been extensive work on approximation algorithms and heuristics since they
arise in many applications. In addition to algorithms, the structural results that
capture approximate relationships between flows and cuts (called flow-cut gaps),
and the connections to the theory of metric embeddings as well as graph theory
have led to many beautiful and important results.
Theorem (Maxflow-Mincut). In an edge-capacitated graph, the 𝑠-𝑡 maximum flow value is equal to the 𝑠-𝑡 minimum cut value, and both can be computed in strongly polynomial time. Further, if 𝑐 is integer-valued then there exists an integer-valued maximum flow.
The proof of the preceding theorem is typically established via the augmenting path algorithm for computing a maximum flow. Here we take a different approach to finding an 𝑠-𝑡 cut via an LP relaxation whose dual can be seen as the maxflow LP.

Suppose we want to find an 𝑠-𝑡 mincut. We can write it as an integer program as follows. For each edge 𝑒 ∈ 𝐸 we have a boolean variable 𝑥_𝑒 ∈ {0, 1} to indicate whether we cut 𝑒. The constraint is that for any path 𝑃 ∈ 𝒫_{𝑠,𝑡} (here 𝒫_{𝑠,𝑡} is the set of all 𝑠-𝑡 paths) we must choose at least one edge from 𝑃. This leads to the following IP.
min Σ_{𝑒∈𝐸} 𝑐(𝑒) 𝑥_𝑒
    Σ_{𝑒∈𝑃} 𝑥_𝑒 ≥ 1    ∀𝑃 ∈ 𝒫_{𝑠,𝑡}
    𝑥_𝑒 ∈ {0, 1}       ∀𝑒 ∈ 𝐸.
Theta-Rounding(𝐺, 𝑠, 𝑡):
    Solve the LP relaxation to obtain 𝑦
    Let 𝑑_𝑦(𝑠, 𝑣) be the shortest-path distance from 𝑠 to 𝑣 with edge lengths 𝑦
    Pick θ uniformly at random from (0, 1)
    Output the cut δ(𝐵(𝑠, θ)), where 𝐵(𝑠, θ) = {𝑣 | 𝑑_𝑦(𝑠, 𝑣) ≤ θ}

It is easy to see that the algorithm outputs a valid 𝑠-𝑡 cut, since 𝑑_𝑦(𝑠, 𝑡) ≥ 1 by feasibility of the LP solution 𝑦, and hence 𝑡 ∉ 𝐵(𝑠, θ) for any θ < 1.
Lemma 14.1. Let 𝑒 = (𝑢, 𝑣) be an edge. Then P[𝑒 is cut by the algorithm] ≤ 𝑦(𝑢, 𝑣).

Proof. An edge 𝑒 = (𝑢, 𝑣) is cut iff 𝑑_𝑦(𝑠, 𝑢) ≤ θ < 𝑑_𝑦(𝑠, 𝑣). Hence the edge is not cut if 𝑑_𝑦(𝑠, 𝑣) ≤ 𝑑_𝑦(𝑠, 𝑢). If 𝑑_𝑦(𝑠, 𝑣) > 𝑑_𝑦(𝑠, 𝑢), we have 𝑑_𝑦(𝑠, 𝑣) − 𝑑_𝑦(𝑠, 𝑢) ≤ 𝑦(𝑢, 𝑣). Since θ is chosen uniformly at random from (0, 1), the probability that θ lies in the interval [𝑑_𝑦(𝑠, 𝑢), 𝑑_𝑦(𝑠, 𝑣)) is at most 𝑦(𝑢, 𝑣).
Corollary 14.2. The expected cost of the cut output by the algorithm is at most Σ_𝑒 𝑐(𝑒) 𝑦_𝑒.
The preceding corollary shows that there is an integral cut whose cost is at most that of the LP relaxation, which implies that the LP relaxation yields an optimum solution. The algorithm can be easily derandomized by trying "all possible values of θ". What does this mean? Once we have 𝑦, we compute the shortest-path distances from 𝑠 to each vertex 𝑣. We can think of these distances as producing a line embedding where we place 𝑠 at 0 and each vertex 𝑣 at 𝑑_𝑦(𝑠, 𝑣). The only interesting choices for θ are given by the 𝑛 values 𝑑_𝑦(𝑠, 𝑣); one can try each of them, form the corresponding cut, and output the cheapest one. Its cost is guaranteed to be at most Σ_𝑒 𝑐(𝑒) 𝑦_𝑒.
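A sketch of the derandomized theta-rounding follows (names are ours): compute 𝑑_𝑦 by Dijkstra, try each distance level below 𝑑_𝑦(𝑠, 𝑡) as the threshold θ, and return the cheapest of the resulting cuts.

import heapq

def theta_rounding_cut(n, edges, y, s, t):
    """Derandomized theta-rounding for s-t mincut (a sketch).
    edges: list of (u, v, capacity); y: dict (u, v) -> LP length."""
    # Dijkstra from s with edge lengths y
    adj = {v: [] for v in range(n)}
    for (u, v, c) in edges:
        w = y[(u, v)]
        adj[u].append((v, w)); adj[v].append((u, w))
    dist = {v: float("inf") for v in range(n)}
    dist[s] = 0.0
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    # try each ball B(s, theta) with theta a distance level below dist[t]
    best_cost, best_cut = float("inf"), None
    for theta in sorted(set(dist[v] for v in range(n) if dist[v] < dist[t])):
        cut = [(u, v, c) for (u, v, c) in edges
               if (dist[u] <= theta) != (dist[v] <= theta)]
        cost = sum(c for (_, _, c) in cut)
        if cost < best_cost:
            best_cost, best_cut = cost, cut
    return best_cost, best_cut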
What is the dual LP? We write it down below and you can verify that it is
the path version of the maxflow!
max Σ_{𝑃∈𝒫_{𝑠,𝑡}} 𝑧_𝑃
    Σ_{𝑃: 𝑒∈𝑃} 𝑧_𝑃 ≤ 𝑐(𝑒)    ∀𝑒 ∈ 𝐸
    𝑧_𝑃 ≥ 0                  ∀𝑃 ∈ 𝒫_{𝑠,𝑡}.
Thus, we have seen a proof of the maxflow-mincut theorem via LP rounding
of a relaxation for the 𝑠-𝑡 cut problem.
A compact LP via distance variables: The path-based LP relaxation for the 𝑠-𝑡 mincut problem is natural and easy to formulate. We can also express shortest-path constraints via distance variables. We first write a bigger LP than necessary via variables 𝑑(𝑢, 𝑣) for all ordered pairs of vertices (hence there are 𝑛² variables). We need triangle inequality constraints to enforce that the 𝑑(𝑢, 𝑣) values respect shortest-path distances.
Exercise 14.1. Write the dual of the above LP and see it as the standard edge-
based flow formulation for 𝑠-𝑡 maximum flow.
In Sparsest Cut, the objective is to find min_{𝐸′⊆𝐸} 𝑐(𝐸′)/𝐷(𝐸′), where 𝑐(𝐸′) is the total cost of the removed edges 𝐸′ and 𝐷(𝐸′) is the total demand of pairs separated by removing 𝐸′. In undirected graphs one can see that this ratio is also minimized by a connected component, and hence one can alternatively phrase the problem as min_{𝑆⊆𝑉} 𝑐(δ(𝑆))/𝐷(𝑆, 𝑉 ∖ 𝑆). In the Uniform Sparsest Cut problem we associate 𝐷(𝑢, 𝑣) = 1 for every pair of vertices, and hence one wants to find min_{𝑆⊆𝑉} 𝑐(δ(𝑆))/(|𝑆| |𝑉 − 𝑆|). This is closely related (to within a factor of 2) to the problem of finding the expansion of a graph, where the objective is min_{𝑆⊆𝑉, |𝑆|≤|𝑉|/2} 𝑐(δ(𝑆))/|𝑆|. Other closely related variants are to find the conductance, and sparsest cut for product multicommodity flow instances where the demands are induced by vertex weights (that is, 𝐷(𝑢, 𝑣) = π(𝑢)π(𝑣) where π : 𝑉 → ℝ⁺).
Sparsest Cut can be generalized to node-weighted settings and to directed
graphs. In directed graphs there is a substantial difference when considering
the non-uniform versus the uniform settings because the latter can be thought
of a symmetric version. We will not detail the issues here.
Minimum Bisection and Balanced Partitioning: The input is an undirected edge-weighted graph 𝐺. In Minimum Bisection the goal is to partition 𝑉 into 𝑉₁, 𝑉₂ where ⌊|𝑉|/2⌋ ≤ |𝑉₁|, |𝑉₂| ≤ ⌈|𝑉|/2⌉ so as to minimize the weight of the edges crossing the partition. In Balanced Partition the sizes of the two parts need only be approximately equal — there is a balance parameter α ∈ (0, 1/2] and the goal is to partition 𝑉 into 𝑉₁, 𝑉₂ such that α|𝑉| ≤ |𝑉₁|, |𝑉₂| ≤ (1 − α)|𝑉|.
These problems are partially motivated by parallel and distributed computation
where a graph representing some computation is recursively decomposed into
several pieces while minimizing the communication required between the pieces
(captured by the edge weights). In this context partitioning into 𝑘 given pieces
to minimize the number of edges while each piece has size roughly |𝑉 |/𝑘 is also
considered as Balanced 𝑘-Partition.
Hypergraphs and Submodular functions: One can generalize several edge-
weighted problems to node-weighted problems in a natural fashion but in some
cases it is useful to consider other ways to model. In this context hypergraphs
CHAPTER 14. INTRODUCTION TO CUT AND PARTITIONING PROBLEMS177
come in handy and have their own intrinsic appeal. Recall that a hypergraph
𝐻 = (𝑉 , 𝐸) consists of a vertex set 𝑉 and a set of hyperedges 𝐸 where each 𝑒 ∈ 𝐸
is a subset of vertices; thus 𝑒 ⊆ 𝑉. The rank of a hypergraph is the maximum cardinality of a hyperedge, typically denoted by 𝑟. Graphs are rank-2 hypergraphs.
One can typically reduce a hypergraph cut problem to a node-weighted cut
problem and vice-versa with some distinctions based on the specific problem at
hand. Finally some of the problems can be naturally lifted to the setting where
we consider an abstract submodular set function defined over the vertex set
𝑉; that is we are given 𝑓 : 2𝑉 → ℝ+ rather than a graph or a hypergraph and
the goal is to partition 𝑉 where the cost of the partition is now measured with
respect to 𝑓 .
Chapter 15
Multiway Cut
IsolatingCut(𝐺 = (𝑉, 𝐸), 𝑆 = {𝑠₁, . . . , 𝑠_𝑘}):
    for 𝑖 = 1 to 𝑘: compute a minimum cut δ(𝐴_𝑖) separating 𝑠_𝑖 from 𝑆 ∖ {𝑠_𝑖}
    Discard the most expensive of these 𝑘 cuts and output the union of the rest

We can also write an LP relaxation. For each edge 𝑒 we have a length variable 𝑥_𝑒 indicating whether 𝑒 is cut or not. We require that the length of any path 𝑝 connecting 𝑠_𝑖 and 𝑠_𝑗, 𝑖 ≠ 𝑗, should be at least 1.

min Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    s.t. Σ_{𝑒∈𝑝} 𝑥_𝑒 ≥ 1    ∀𝑝 ∈ 𝒫_{𝑠_𝑖,𝑠_𝑗}, 𝑖 ≠ 𝑗
    𝑥_𝑒 ≥ 0                 ∀𝑒 ∈ 𝐸
The preceding LP can be solved in polynomial time via the Ellipsoid method
since the separation oracle is the shortest path problem. Alternatively, one can
write a compact LP. We focus on rounding the LP and establishing its integrality
gap.
BallCut(𝐺 = (𝑉, 𝐸), 𝑆 ⊆ 𝑉):
    Solve the LP to obtain 𝑥; let 𝑑 be shortest-path distances with edge lengths 𝑥
    Pick θ uniformly at random from (0, 1/2)
    Output ∪_𝑖 δ(𝐵_𝑑(𝑠_𝑖, θ))
Proof. There are several cases to consider, but the main one is when both 𝑢, 𝑣 ∈ 𝐵_𝑑(𝑠_𝑖, 1/2) for some 𝑠_𝑖. Suppose this is the case. Then only 𝑠_𝑖 can cut the edge (𝑢, 𝑣). It is easy to see that P[𝑒 is cut] = 2|𝑑(𝑠_𝑖, 𝑢) − 𝑑(𝑠_𝑖, 𝑣)|, since we pick θ uniformly at random from (0, 1/2). But |𝑑(𝑠_𝑖, 𝑢) − 𝑑(𝑠_𝑖, 𝑣)| ≤ 𝑥_𝑒 by the triangle inequality, and hence we obtain the desired claim.
Now we consider the case when 𝑢 ∈ 𝐵(𝑠 𝑖 , 1/2) and 𝑣 ∈ 𝐵(𝑠 𝑗 , 1/2) where 𝑖 ≠ 𝑗.
Let 𝛼 = 1/2 − 𝑑(𝑠 𝑖 , 𝑢) and let 𝛽 = 1/2 − 𝑑(𝑠 𝑗 , 𝑣). We observe that 𝛼 + 𝛽 ≤ 𝑥 𝑒 for
otherwise 𝑑(𝑠 𝑖 , 𝑢) + 𝑥 𝑒 + 𝑑(𝑠 𝑗 , 𝑣) < 1. We see that 𝑒 is cut iff 𝜃 lies in the interval
[𝑑(𝑠 𝑖 , 𝑢), 1/2) or it lies in the interval [𝑑(𝑠 𝑗 , 𝑣), 1/2). Thus 𝑒 is cut with probability
2 max(𝛼, 𝛽) ≤ 2(𝛼 + 𝛽) ≤ 2𝑥 𝑒 .
There are two other cases. One is when both 𝑢, 𝑣 are outside every half-ball.
In this case the edge 𝑒 is not cut. The other is when 𝑢 ∈ 𝐵 𝑑 (𝑠 𝑖 , 1/2) for some 𝑖
and 𝑣 is not inside any ball. The analysis here is similar to the second case and
we leave it as an exercise.
Thus the expected cost of the cut is at most 2 Σ_𝑒 𝑐_𝑒 𝑥_𝑒 ≤ 2 OPT_LP ≤ 2 OPT.
One can improve and obtain a 2(1 − 1/𝑘)-approximation by saving on one
terminal as we did in the preceding section.
Exercise 15.2. Modify the algorithm to obtain a 2(1 − 1/𝑘)-approximation with
respect to the LP relaxation.
Exercise 15.3. Consider a variant of the algorithm where each 𝑠 𝑖 picks an
independent 𝜃𝑖 ∈ (0, 1/2); we output the cut ∪𝑖 𝛿(𝐵 𝑑 (𝑠 𝑖 , 𝜃𝑖 )). Prove that this also
yields a 2-approximation (can be improved to 2(1 − 1/𝑘)-approximation).
Integrality gap: The analysis is tight as shown by the following integrality gap
example. Consider a star with center 𝑟 and 𝑘 leaves 𝑠 1 , 𝑠2 , . . . , 𝑠 𝑘 which are the
terminals. All edges have cost 1. Then it is easy to see that a feasible integral
solution consists of removing 𝑘 − 1 edges from the star. Hence OPT = 𝑘 − 1. On
the other hand, setting 𝑥 𝑒 = 1/2 for each edge is a feasible fractional solution of
cost 𝑘/2. Hence the integrality gap is 2(1 − 1/𝑘).
http://chekuri.cs.illinois.edu/papers/dir-multiway-cut-soda.pdf.
Chapter 16
Multicut
min Σ_{𝑒∈𝐸} 𝑐_𝑒 𝑑_𝑒
    s.t. Σ_{𝑒∈𝑝} 𝑑_𝑒 ≥ 1    ∀𝑝 ∈ 𝒫_{𝑠_𝑖,𝑡_𝑖}, 1 ≤ 𝑖 ≤ 𝑘
    𝑑_𝑒 ≥ 0                 ∀𝑒 ∈ 𝐸
The LP assigns distance labels to edges so that, on each path 𝑝 between 𝑠 𝑖
and 𝑡 𝑖 , the distance labels of the edges on 𝑝 sum up to at least one. Note that,
even though the LP can have exponentially many constraints, we can solve the
LP in polynomial time using the ellipsoid method and the following separation
oracle. Given distance labels 𝑑 𝑒 , we set the length of each edge to 𝑑 𝑒 and, for
each pair (𝑠 𝑖 , 𝑡 𝑖 ), we compute the length of the shortest path between 𝑠 𝑖 and 𝑡 𝑖
and check whether it is at least one. If the shortest path between 𝑠 𝑖 and 𝑡 𝑖 has
length smaller than one, we have a violated constraint. Conversely, if all shortest
paths have length at least one, the distance labels define a feasible solution.
We also consider the dual of the previous LP. For each path 𝑝 between any
pair (𝑠 𝑖 , 𝑡 𝑖 ) we have a dual variable 𝑓𝑝 . We interpret each variable 𝑓𝑝 as the
amount of flow between 𝑠_𝑖 and 𝑡_𝑖 that is routed along the path 𝑝. We have the following dual LP:

max Σ_{𝑖=1}^𝑘 Σ_{𝑝∈𝒫_{𝑠_𝑖,𝑡_𝑖}} 𝑓_𝑝
    s.t. Σ_{𝑝: 𝑒∈𝑝} 𝑓_𝑝 ≤ 𝑐_𝑒    ∀𝑒 ∈ 𝐸(𝐺)
    𝑓_𝑝 ≥ 0
Exercise 16.1. Write the Multicut LP and its dual in a compact form with
polynomially many constraints.
CKR-RandomPartition:
    Solve the LP to get the distance labels 𝑑_𝑒
    Pick θ uniformly at random from [0, 1/2)
    Pick a random permutation σ on {1, 2, . . . , 𝑘}
    for 𝑖 = 1 to 𝑘:
        𝑉_{σ(𝑖)} = 𝐵_𝑑(𝑠_{σ(𝑖)}, θ) ∖ ∪_{𝑗<𝑖} 𝑉_{σ(𝑗)}
    Output ∪_{𝑖=1}^𝑘 δ(𝑉_𝑖)
Lemma 16.1. CKR-RandomPartition correctly outputs a feasible multicut for the given
instance.
Proof. Let 𝐹 be the set of edges output by the algorithm. Suppose 𝐹 is not a
feasible multicut. Then there exists a pair of vertices (𝑠 𝑖 , 𝑡 𝑖 ) such that there is a
path between 𝑠 𝑖 and 𝑡 𝑖 in 𝐺 − 𝐹. Therefore there exists a 𝑗 such that 𝑉𝑗 contains
𝑠 𝑖 and 𝑡 𝑖 . Since 𝑉𝑗 ⊆ 𝐵 𝑑 (𝑠 𝑗 , 𝜃), both 𝑠 𝑖 and 𝑡 𝑖 are contained in the ball of radius 𝜃
centered at 𝑠 𝑗 . Consequently, the distance between 𝑠 𝑗 and 𝑠 𝑖 is at most 𝜃 and the
distance between 𝑠 𝑗 and 𝑡 𝑖 is at most 𝜃. By the triangle inequality, the distance
between 𝑠 𝑖 and 𝑡 𝑖 is at most 2𝜃. Since 𝜃 is smaller than 1/2, it follows that the
distance between 𝑠 𝑖 and 𝑡 𝑖 is smaller than one. This contradicts the fact that the
distance labels 𝑑 𝑒 are a feasible solution for the LP. Therefore 𝐹 is a multicut, as
desired.
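A sketch of CKR-RandomPartition in Python follows (names are ours), assuming the LP distance labels 𝑑_𝑒 are given; shortest-path distances from each terminal are computed with Dijkstra, and vertices that fall in no ball simply remain outside every 𝑉_𝑖.

import random, heapq

def ckr_random_partition(n, edges, d, sources):
    """CKR rounding for Multicut (a sketch).
    edges: list of (u, v, capacity); d: dict (u, v) -> LP distance label;
    sources: the terminals s_1..s_k. Returns the list of cut edges."""
    adj = {v: [] for v in range(n)}
    for (u, v, c) in edges:
        adj[u].append((v, d[(u, v)])); adj[v].append((u, d[(u, v)]))
    def dijkstra(s):
        dist = {v: float("inf") for v in range(n)}
        dist[s] = 0.0
        pq = [(0.0, s)]
        while pq:
            dd, u = heapq.heappop(pq)
            if dd > dist[u]:
                continue
            for v, w in adj[u]:
                if dd + w < dist[v]:
                    dist[v] = dd + w
                    heapq.heappush(pq, (dist[v], v))
        return dist
    dist = [dijkstra(s) for s in sources]
    theta = random.uniform(0.0, 0.5)
    order = list(range(len(sources)))
    random.shuffle(order)                 # the random permutation sigma
    part = [None] * n                     # which V_i each vertex lands in
    for i in order:
        for v in range(n):
            if part[v] is None and dist[i][v] <= theta:
                part[v] = i
    # cut edges: endpoints in different parts (None = in no ball)
    return [(u, v, c) for (u, v, c) in edges if part[u] != part[v]]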
Lemma 16.2. The probability that an edge 𝑒 is cut is at most 2𝐻_𝑘 𝑑_𝑒, where 𝐻_𝑘 is the 𝑘-th harmonic number and 𝑑_𝑒 is the distance label of the edge 𝑒.

Proof. Fix an edge 𝑒 = (𝑢, 𝑣). For each 𝑖, let 𝐿_𝑖 = min{𝑑(𝑠_𝑖, 𝑢), 𝑑(𝑠_𝑖, 𝑣)} and 𝑅_𝑖 = max{𝑑(𝑠_𝑖, 𝑢), 𝑑(𝑠_𝑖, 𝑣)}, and renumber the pairs so that 𝐿₁ ≤ 𝐿₂ ≤ . . . ≤ 𝐿_𝑘 (see Figure 16.1). Let 𝐴_𝑖 be the event that 𝑠_𝑖 is the first terminal to cut the edge 𝑒. Then

P[𝑒 is cut] = Σ_𝑖 P[𝐴_𝑖].

Figure 16.1: For a fixed edge 𝑒 = (𝑢, 𝑣) we renumber the pairs such that 𝐿₁ ≤ 𝐿₂ ≤ . . . ≤ 𝐿_𝑘.
Let us fix 𝑟 ∈ [0, 1/2) and consider P[𝐴 𝑖 | 𝜃 = 𝑟]. Note that 𝑠 𝑖 cuts the edge 𝑒
only if one of 𝑢, 𝑣 is inside the ball of radius 𝑟 centered at 𝑠 𝑖 and the other is
outside the ball. Differently said, 𝑠 𝑖 cuts the edge only if 𝑟 ∈ [𝐿 𝑖 , 𝑅 𝑖 ):
P[𝐴 𝑖 | 𝜃 = 𝑟] = 0 if 𝑟 ∉ [𝐿 𝑖 , 𝑅 𝑖 )
Now suppose that 𝑟 ∈ [𝐿 𝑖 , 𝑅 𝑖 ). Let us fix 𝑗 < 𝑖 and suppose 𝑗 comes before
𝑖 in the permutation (that is, 𝜎(𝑗) < 𝜎(𝑖)). Recall that, since 𝑗 < 𝑖, we have
𝐿 𝑗 ≤ 𝐿 𝑖 ≤ 𝑟. Therefore at least one of 𝑢, 𝑣 is inside the ball of radius 𝑟 centered at
𝑠 𝑗 . Consequently, 𝑠 𝑖 cannot be the first to cut the edge 𝑒. Therefore 𝑠 𝑖 is the first
to cut the edge 𝑒 only if 𝜎(𝑖) < 𝜎(𝑗) for all 𝑗 < 𝑖. See Fig 16.2. Since 𝜎 is a random
permutation, 𝑖 appears before 𝑗 for all 𝑗 < 𝑖 with probability 1/𝑖. Therefore we
have:
    P[A_i | θ = r] ≤ 1/i        if r ∈ [L_i, R_i)
Figure 16.2: If 𝜎(𝑗) < 𝜎(𝑖), 𝑠 𝑖 cannot be the first to cut the edge 𝑒 = (𝑢, 𝑣). On the
left 𝑠 𝑗 also cuts the edge. On the right 𝑠 𝑗 captures both end points and therefore
𝑠 𝑖 cannot cut it.
Since 𝜃 was selected uniformly at random from the interval [0, 1/2), and
independently from 𝜎, we have:
    P[A_i] ≤ (1/i) · P[θ ∈ [L_i, R_i)] ≤ (2/i) · (R_i − L_i)
By the triangle inequality, 𝑅 𝑖 ≤ 𝐿 𝑖 + 𝑑 𝑒 . Therefore:
    P[A_i] ≤ 2d_e/i
Consequently,
    P[e is cut] = Σ_i P[A_i] ≤ Σ_{i=1}^{k} 2d_e/i = 2H_k d_e.
Corollary 16.1. The integrality gap of the Multicut LP is 𝑂(log 𝑘).
Proof. Let F be the set of edges output by the randomized rounding algorithm. For each edge e, let χ_e be an indicator random variable equal to 1 if and only if the edge e is in F. As we have already seen,

    E[χ_e] = P[χ_e = 1] ≤ 2H_k d_e.
Let 𝑐(𝐹) be a random variable equal to the total capacity of the edges in 𝐹. We
have:
    E[c(F)] = E[Σ_e c_e χ_e] = Σ_e c_e P[χ_e = 1] ≤ 2H_k Σ_e c_e d_e = 2H_k OPT_LP.
Consequently, there exists a set of edges 𝐹 such that the total capacity of the
edges in 𝐹 is at most 2𝐻 𝑘 OPTLP . Therefore OPT ≤ 2𝐻 𝑘 OPTLP , as desired.
Together with LP duality, the bound E[c(F)] ≤ 2H_k · OPT_LP also yields the approximate max-flow min-cut theorem of Garg, Vazirani and Yannakakis [63]:

    max_f |f| ≤ min_C |C| ≤ O(log k) · max_f |f|,

where |f| represents the value of the multicommodity flow f, and |C| represents the capacity of the multicut C.

Proof. Let OPT_LP denote the total capacity of an optimal (fractional) solution for the Multicut LP. Let OPT_dual denote the flow value of an optimal solution for the dual LP. By strong LP duality, OPT_dual = OPT_LP. Since OPT_LP is a lower bound on the capacity of the minimum (integral) multicut, we have:

    max_f |f| = OPT_dual = OPT_LP ≤ min_C |C| ≤ 2H_k · OPT_LP = 2H_k · max_f |f|.
A graph G = (V, E) is an α-edge-expander if |δ(S)| ≥ α|S| for every S ⊆ V with |S| ≤ |V|/2. Note that the complete graph K_n is a (|V|/2)-edge-expander. However, the more interesting expander graphs are also sparse. Cycles and grids are examples of graphs that are very poor expanders.
Figure 16.3: The top half of the cycle has |V|/2 vertices and only two edges crossing the cut. The left half of the grid has roughly |V|/2 vertices and only √|V| edges crossing the cut.
Note that 2-regular graphs consist of a collection of vertex-disjoint cycles and therefore they have poor expansion. However, for any d ≥ 3, there exist d-regular graphs that are very good expanders.
Theorem 16.6. For every 𝑑 ≥ 3 there exists an infinite family of 𝑑-regular 1-edge-
expanders.
We will only need the following special case of the previous theorem.
Theorem 16.7. There exists a universal constant 𝛼 > 0 and an integer 𝑛0 such that,
for all even integers 𝑛 ≥ 𝑛0 , there exists an 𝑛-vertex, 3-regular 𝛼-edge-expander.
Proof Idea. The easiest way to prove this theorem is using the probabilistic method. The proof itself is beyond the scope of this lecture.¹ The proof idea is the following.
Let’s fix an even integer 𝑛. We will generate a 3-regular random graph 𝐺 by
selecting three random perfect matchings on the vertex set {1, 2, ..., 𝑛} (recall
that a perfect matching is a set of edges such that every vertex is incident to
exactly one of these edges). We select a random perfect matching as follows.
We maintain a list of vertices that have not been matched so far. While there is
at least one vertex that is not matched, we select a pair of distinct vertices 𝑢, 𝑣
uniformly at random from all possible pairs of unmatched vertices. We add the
edge (𝑢, 𝑣) to our matching and we remove 𝑢 and 𝑣 from the list. We repeat this
process three times (independently) to get three random matchings. The graph
𝐺 will consist of the edges in these three matchings. Note that 𝐺 is actually a
3-regular multigraph since it might have parallel edges (if the same edge is in at
least two of the matchings). There are two properties of interest: (1) 𝐺 is a simple
graph and (2) 𝐺 is an 𝛼-edge-expander for some constant 𝛼 > 0. If we can show
that 𝐺 has both properties with positive probability, it follows that there exists
a 3-regular 𝛼-edge-expander (if no graph is a 3-regular 𝛼-edge-expander, the
probability that our graph 𝐺 has both properties is equal to 0).
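The construction is short to code. In the Python sketch below (names ours), pairing off a uniformly shuffled vertex list produces a uniformly random perfect matching, which is equivalent to the sequential pairing process just described.

    import random

    def random_perfect_matching(n):
        # n must be even; pairing consecutive entries of a uniformly
        # shuffled list yields a uniformly random perfect matching
        verts = list(range(n))
        random.shuffle(verts)
        return [(verts[i], verts[i + 1]) for i in range(0, n, 2)]

    def random_cubic_multigraph(n):
        # union of three independent random perfect matchings; this is a
        # 3-regular multigraph since matchings can share (parallel) edges
        edges = []
        for _ in range(3):
            edges.extend(random_perfect_matching(n))
        return edges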
It is not very hard to show that the probability that G does not have property (1) is small. To show that the probability that G does not have property (2) is small, for each set S with at most n/2 vertices, we estimate the expected number of edges that cross the cut (S, V\S) (e.g., one can easily show that E[|δ(S)|] ≥ |S|/2).
Using tail inequalities (e.g., Chernoff bounds), we can show that the probability
that |𝛿(𝑆)| differs significantly from its expectation is extremely small (i.e., small
enough so that the sum – taken over all sets 𝑆 – of these probabilities is also
small) and we can use the union bound to get the desired result.
Note that explicit constructions of d-regular expanders are also known. Margulis [119] gave an infinite family of 8-regular expanders. There are many explicit constructions by now and it is a very important topic of study; we refer the reader to the survey on expanders by Hoory, Linial and Wigderson.
¹A more accurate statement is that the calculations are a bit involved and not terribly interesting for us.
The lower bound instance (G, X) consists of a 3-regular α-edge-expander G = (V, E) on n vertices with unit edge capacities, together with the set X of all pairs (u, v) of vertices with dist(u, v) > (log_3 n)/2.

Claim 16.2.2. For every vertex v, |B(v, (log_3 n)/2)| ≤ √n.

Proof. Note that B(v, (log_3 n)/2) is the set of all vertices w such that dist(v, w) is at most (log_3 n)/2. As we have seen in the proof of the previous claim, we have

    |B(v, (log_3 n)/2)| ≤ 3^{(log_3 n)/2} = √n.
Claim 16.2.3. There exists a feasible fractional solution for (𝐺, 𝑋) of capacity 𝑂(𝑛/log 𝑛).
Proof. Let 𝑑 𝑒 = 2/log3 𝑛, for all 𝑒. Note that, since 𝐺 is 3-regular, 𝐺 has 3𝑛/2
edges. Therefore the total capacity of the fractional solution is
    Σ_e d_e = (3n/2) · (2/log_3 n) = 3n/log_3 n.
Therefore we only need to show that the solution is feasible. Let (𝑢, 𝑣) be a pair
in 𝑋. Let’s consider a path 𝑝 between 𝑢 and 𝑣. Since 𝑢 is not in 𝐵(𝑣, log3 𝑛/2),
the path 𝑝 has more than log3 𝑛/2 edges (recall that 𝐵(𝑣, 𝑖) is the set of all vertices
𝑢 such that there is a path between 𝑢 and 𝑣 with at most 𝑖 edges). Consequently,
    Σ_{e∈p} d_e > ((log_3 n)/2) · (2/log_3 n) = 1.
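The pair set X is also easy to generate explicitly. The sketch below (our own naming; BFS for hop distances) collects all pairs at distance greater than (log_3 n)/2; with d_e = 2/log_3 n every path between such a pair then has total length more than one, as in the claim.

    import math
    from collections import deque

    def bfs_distances(n, adj, src):
        # hop distances from src via breadth-first search
        dist = [-1] * n
        dist[src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[v] == -1:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist

    def demand_pairs(n, edges):
        # X = all pairs (u, v) with dist(u, v) > (log_3 n) / 2
        adj = [[] for _ in range(n)]
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        radius = math.log(n, 3) / 2
        pairs = []
        for u in range(n):
            dist = bfs_distances(n, adj, u)
            pairs.extend((u, v) for v in range(u + 1, n) if dist[v] > radius)
        return pairs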
Claim 16.2.4. Any integral solution for (𝐺, 𝑋) has capacity Ω(𝑛).
Proof. Let 𝐹 be an integral solution for (𝐺, 𝑋). Let 𝑉1 , ..., 𝑉ℎ be the connected
components of 𝐺 − 𝐹. Fix an 𝑖 and let 𝑣 be an arbitrary vertex in the connected
component 𝑉𝑖 . Note that, for any 𝑢 in 𝑉𝑖 , there is a path between 𝑣 and 𝑢
with at most log3 𝑛/2 edges (if not, (𝑢, 𝑣) is a pair in 𝑋 which contradicts the
fact that removing the edges in F disconnects every pair in X). Therefore V_i is contained in B(v, (log_3 n)/2). It follows from Claim 16.2.2 that |V_i| ≤ √n. Since
𝐺 is an 𝛼-edge-expander and |𝑉𝑖 | ≤ |𝑉 |/2, we have |𝛿(𝑉𝑖 )| ≥ 𝛼|𝑉𝑖 |, for all 𝑖.
Consequently, since each edge of F lies in δ(V_i) for at most two of the components,

    |F| ≥ (1/2) Σ_{i=1}^{h} |δ(V_i)| ≥ (α/2) Σ_{i=1}^{h} |V_i| = αn/2.
Therefore 𝐹 has total capacity Ω(𝑛) (recall that every edge has unit capacity).
Putting the claims together gives the promised lower bound on the integrality gap.

Proof. Note that k = |X| = O(n²). It follows from Claims 16.2.3 and 16.2.4 that the LP has integrality gap Ω(log n) = Ω(log k), as desired.
Bibliographic Notes
Multicut is closely related to the Sparsest Cut problem. Initial algorithms for Multicut were based on algorithms for Sparsest Cut. Garg, Vazirani and
Yannakakis [63] then used Leighton and Rao’s region growing argument (as
well as their integrality gap example on expanders for the uniform sparsest
cut problem) [113] to obtain a tight 𝑂(log 𝑘) bound on the integrality gap
for Multicut. The randomized proof that we described is from the work of
Calinescu, Karloff and Rabani [27] on the 0-extension problem; their algorithm
and analysis eventually led to an optimal bound for approximating an arbitrary
metric via random trees [54]. For planar graphs (and more generally any proper
minor closed family of graphs) the integrality gap is 𝑂(1), as shown by Klein,
Plotkin and Rao [104] — the constant depends on the family. There have been
several subsequent refinements of the precise dependence of the constant in the
integrality gap — see [1]. The 𝑂(log 𝑘) bound extends to node-weighted case
and the 𝑂(1) approximation for planar graphs also extends to the node-weighted
case. Multicut is APX-Hard even on trees. In general graphs, assuming the UGC, the problem is known to be hard to approximate to within any super-constant factor [36]. For some special cases of Multicut based on the structure of the demand graph one can obtain improved approximation ratios [39].
The directed graph version of Multicut turns out to be much more difficult. The flow-cut gap is known to be Ω̃(n^{1/7}) and the problem is also known to be hard to approximate to within almost polynomial factors; these negative results are due to Chuzhoy and Khanna [45]. The best known approximation ratio is min{k, Õ(n^{11/23})} [3]. Very recently Kawarabayashi and Sidiropoulos obtained
a poly-logarithmic approximation for Directed Multicut if 𝐺 is a planar directed
graph [101]. There is a notion of symmetric demands in directed graphs and
for that version of Multicut one can get a poly-logarithmic flow-cut gap and
approximation; see [37, 105]. This is closely connected to the Feedback Arc Set
problem in directed graphs [53, 138].
Chapter 17
Sparsest Cut
In the Sparsest Cut problem we are given a graph G = (V, E) with edge capacities c_e and demand values D(u, v) on pairs of vertices. The sparsity of a cut S ⊆ V is c(δ(S))/Dem(δ(S)), where Dem(δ(S)) is the total demand separated by S, and the goal is to find a cut of minimum sparsity.
Uniform Sparsest Cut: Very often when people say Sparsest Cut they mean the uniform version. This is the version in which D(u, v) = 1 for each unordered pair of vertices (u, v). For these demands the sparsity of a cut S is c(δ_G(S))/(|S||V \ S|). Alternatively, the demand graph H is a complete graph with unit demand values on each edge.
A slight generalization of Uniform Sparsest Cut is obtained by considering demands induced by weights on vertices (the dual flow instances are called Product Multicommodity Flow instances). There is a weight function π : V → ℝ+ on the vertices and the demand D(u, v) for a pair (u, v) is set to be π(u)π(v). Note that if π(u) = 1 for all u then we obtain Uniform Sparsest Cut. If π(u) ∈ {0, 1} for all u then we are focusing our attention on sparsity with respect to the set V′ = {v | π(v) = 1} since the vertices with π(u) = 0 play no role. This may seem unnatural at first but it is closely connected to expansion and conductance as we will see below.
Expansion: The expansion of a multi-graph G = (V, E) is defined as min_{S: |S| ≤ |V|/2} |δ(S)|/|S|.
Recall that G is an α-expander if the expansion of G is at least α. A random 3-regular graph is an α-expander with α = Ω(1) with high probability. Thus, to
find an 𝛼-expander one can obtain an efficient randomized algorithm by picking
a random graph and then verifying its expansion. However, checking expansion
is coNP-Hard. Expansion is closely related to Uniform Sparsest Cut. Note that
when |𝑆| ≤ |𝑉 |/2 we have
    (1/|V|) · |δ(S)|/|S| ≤ |δ(S)|/(|S||V \ S|) ≤ (2/|V|) · |δ(S)|/|S|.
Thus, after scaling by |V|, Expansion and Uniform Sparsest Cut are within a factor of 2 of each other.
Sometimes it is useful to consider expansion with vertex weights w : V → ℝ+. Here the expansion is defined as min_{S: w(S) ≤ w(V)/2} |δ(S)|/w(S). This corresponds to product multicommodity flow instances where π(v) = w(v). The term conductance is sometimes used to denote the quantity |δ(S)|/vol(S) where vol(S) = Σ_{v∈S} deg(v) (here vol is short for volume). When a graph is d-regular, conductance is simply expansion scaled by the factor 1/d, but the two notions differ in the general setting. Note that we can capture conductance by setting weights on vertices where w(v) = deg(v).
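Since several closely related quantities are in play, the following small Python sketch (our own helper names; unit capacities) computes the sparsity, expansion, and conductance of a given cut side by side.

    def cut_size(edges, S):
        # |delta(S)|: number of edges with exactly one endpoint in S
        S = set(S)
        return sum(1 for (u, v) in edges if (u in S) != (v in S))

    def uniform_sparsity(n, edges, S):
        # |delta(S)| / (|S| * |V \ S|): uniform demands, unit capacities
        return cut_size(edges, S) / (len(S) * (n - len(S)))

    def expansion(edges, S):
        # |delta(S)| / |S|, meaningful when |S| <= |V| / 2
        return cut_size(edges, S) / len(S)

    def conductance(edges, S):
        # |delta(S)| / vol(S), where vol(S) sums the degrees inside S
        Sset = set(S)
        vol = sum((u in Sset) + (v in Sset) for (u, v) in edges)
        return cut_size(edges, S) / vol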
Some key applications: Uniform Sparsest Cut is fundamentally interesting
because it helps us directly and indirectly solve the Balanced Separator problem.
In the latter problem we want to partition 𝐺 = (𝑉 , 𝐸) into two pieces 𝐺1 = (𝑉1 , 𝐸1 )
and G_2 = (V_2, E_2) where |V_1| and |V_2| are roughly the same size so that we minimize the number of edges between V_1 and V_2. One can repeatedly use a sparse cut routine to get a balanced separator.
Sparsest Cut can be formulated as the following integer program, in which x_e indicates whether the edge e is cut and y_i indicates whether the pair (s_i, t_i) is separated:

    min  (Σ_{e∈E} c_e x_e) / (Σ_{i=1}^{k} D_i y_i)
    s.t. Σ_{e∈p} x_e ≥ y_i        p ∈ 𝒫_{s_i,t_i}, i ∈ [k]
         x_e ∈ {0, 1}             e ∈ E
         y_i ∈ {0, 1}             i ∈ [k]
Note, however, that the objective is a ratio and not linear. It is a standard trick to obtain an LP relaxation wherein we normalize the denominator in the ratio to 1 and relax the variables to be real-valued. Thus we obtain the following LP relaxation.
    min  Σ_{e∈E} c_e x_e
    s.t. Σ_{i=1}^{k} D_i y_i = 1
         Σ_{e∈p} x_e ≥ y_i        p ∈ 𝒫_{s_i,t_i}, i ∈ [k]
         x_e ≥ 0                  e ∈ E
         y_i ≥ 0                  i ∈ [k]
Exercise 17.1. Show that the LP is indeed a relaxation for the Sparsest Cut
problem. Formally, given an integer feasible solution with sparsity 𝜆 find a
feasible solution to the relaxation such that its value is no more than 𝜆.
The dual of the preceding LP is the maximum concurrent multicommodity flow LP:

    max  λ
    s.t. Σ_{p∈𝒫_{s_i,t_i}} y_p ≥ λ·D_i        i ∈ [k]
         Σ_{i=1}^{k} Σ_{p∈𝒫_{s_i,t_i}: e∈p} y_p ≤ c_e        e ∈ E
         y_p ≥ 0        p ∈ 𝒫_{s_i,t_i}, i ∈ [k]
Note that the LP can be solved via the Ellipsoid method. One can also
write a compact LP via distance variables which will help us later to focus on
constraining the metric in other ways.
    min  Σ_{uv∈E} c(uv) d(uv)
    s.t. Σ_{i=1}^{k} D_i d(s_i t_i) = 1
         d is a metric on V
Flow-cut gap: The flow-cut gap in this context is the following equivalent way
of thinking about the problem. Consider a multicommodity flow instance on 𝐺
with demand pairs (𝑠 1 , 𝑡1 ), . . . , (𝑠 𝑘 , 𝑡 𝑘 ) and demand values 𝐷1 , . . . , 𝐷 𝑘 . Suppose
𝐺 satisfies the cut-condition, that is, for every 𝑆 ⊆ 𝑉 the capacity 𝑐(𝛿(𝑆)) is at least
CHAPTER 17. SPARSEST CUT 198
the demand separated by S. Can we route all the demand pairs? This is true when k = 1 but is not true in general even for k = 3 in undirected graphs. The question then is: what is the maximum value of λ such that we can route λD_i for every pair i?
The worst-case integrality gap of the preceding LP relaxation for Sparsest Cut is
precisely the flow-cut gap. One can ask about the flow-cut gap for all graphs, a
specific class of graphs, for a specific class of demand graphs, a specific class of
supply and demand graphs, and so on.
In these notes we will establish that the flow-cut gap in general undirected
graphs is at most 𝑂(log 𝑘). And there are instances where the gap is Ω(log 𝑘). It
is conjectured that the gap is O(1) for planar graphs but the best upper bound we have is O(√(log n)). Resolving the flow-cut gap in planar graphs is a major
open problem.
Remark 17.2. Approximating the Sparsest Cut problem is not the same as
establishing flow-cut gaps. One can obtain improved approximations for Sparsest
Cut via stronger relaxations than the natural LP. Indeed the best approximation ratio for Sparsest Cut is O(√(log n)) via an SDP relaxation.
Can we prove some form of a converse? That is, can we use an approximation
algorithm for Multicut to obtain an approximation algorithm for Sparsest Cut?
Note that if someone told us the pairs to separate in an optimum solution
to the Sparsest Cut instance then we can use an (approximation) algorithm
for Multicut to separate those pairs. Here we show that one can use the LP
relaxation and obtain an algorithm via the integrality gap that we have already
established for the Multicut LP. We sketch the argument and focus our attention on
the simpler case when 𝐷𝑖 = 1 for all 𝑖 ∈ [𝑘]. We give this argument even though
it does not lead to the optimum ratio, for historical interest, as well as to illustrate
a useful high-level technique that has found applications in other settings.
Identifying the pairs to separate from the LP solution: Suppose we solve the LP and obtain a feasible solution (x, y). y_i indicates the extent to which pair i is separated. Suppose we have an ideal situation where y_i ∈ {0, p} for every i. Let A = {i | y_i = p}. We have |A| = 1/p since Σ_i y_i = 1. Then it is intuitively clear that the LP is separating the pairs in A. We can then solve the Multicut problem for the pairs in A and consider the ratio of the cost of the cut to |A|.
How do we argue about this algorithm? We do the following. Consider a fractional assignment x′ : E → ℝ+ where x′_e = min{1, x_e/p}; in other words we scale up each x_e by a factor of 1/p. Note that we may assume y_i = d_x(s_i, t_i). Since we scaled up each x_e by 1/p it is not hard to see that d_{x′}(s_i, t_i) ≥ 1 for each i ∈ A; in other words x′ is a feasible solution to the Multicut instance on G for the pairs in A. The fractional cost of x′ is Σ_e c_e x′_e ≤ Σ_e c_e x_e/p. Thus, by the algorithm for Multicut in the previous chapter, we can find a feasible Multicut E′ ⊆ E that separates all pairs in A with c(E′) = O(log k) Σ_e c_e x_e/p. What is the sparsity of this cut? It is c(E′)/|A|, which is O(log k) Σ_e c_e x_e. Thus the sparsity of the cut is O(log k)·λ where λ is the value of the LP relaxation.
Now we consider the general setting. Recall that Σ_i y_i = 1. We partition the pairs into groups that have similar y_i values. For j ≥ 0, let A_j = {i | y_i ∈ (1/2^{j+1}, 1/2^j]}. Thus all pairs in A_j have y_i values that are within a factor of 2 of each other.

Claim 17.1.1. There is some j such that

    Σ_{i∈A_j} y_i ≥ 1/(2(1 + log_2 k)).

(The pairs with y_i ≤ 1/(2k) contribute at most 1/2 in total, so the remaining mass of at least 1/2 is spread over only about 1 + log_2 k groups.)
Claim 17.1.2. Consider the fractional solution x′ : E → [0, 1] where x′_e = min{1, 2^{j+1} x_e}. Then d_{x′}(s_i, t_i) ≥ 1 for all i ∈ A_j. Thus x′ is a feasible fractional solution to the Multicut LP for separating the pairs in A_j.
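The grouping and rescaling steps fit together as in the following Python sketch for the unit-demand case. The function multicut_solver is a hypothetical callback (for instance, the LP rounding from the previous chapter); the rest follows Claims 17.1.1 and 17.1.2.

    import math

    def sparsest_cut_via_multicut(x, y, pairs, multicut_solver):
        # x: dict edge -> x_e; y: list of y_i in [0, 1] summing to 1;
        # pairs[i] = (s_i, t_i). Group pair i by the j with
        # y_i in (2^-(j+1), 2^-j], take the group of largest y-mass,
        # scale x up by 2^(j+1), and separate that group via multicut.
        groups = {}
        for i, yi in enumerate(y):
            if yi > 0:
                j = math.floor(-math.log2(yi))
                groups.setdefault(j, []).append(i)
        j = max(groups, key=lambda g: sum(y[i] for i in groups[g]))
        x_scaled = {e: min(1.0, (2 ** (j + 1)) * xe) for e, xe in x.items()}
        A = [pairs[i] for i in groups[j]]
        cut_edges = multicut_solver(x_scaled, A)   # hypothetical callback
        return cut_edges, A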
For trees the LP is in fact exact. On a tree each pair (s_i, t_i) is connected by a unique path P_{s_i,t_i}; for an edge e let D_e = Σ_{i: e∈P_{s_i,t_i}} D_i denote the total demand whose path uses e. For the LP value λ we have

    λ = (Σ_e c_e x_e) / (Σ_i D_i d_x(s_i, t_i))
      = (Σ_e c_e x_e) / (Σ_i D_i Σ_{e∈P_{s_i,t_i}} x_e)
      = (Σ_e c_e x_e) / (Σ_e D_e x_e)
      ≥ min_e c_e/D_e.

On the other hand, removing the single edge e attaining the minimum yields a cut of sparsity exactly c_e/D_e, so the LP value and the optimum sparsity coincide on trees.
In the last inequality we are using the simple fact that (a_1 + a_2 + ... + a_n)/(b_1 + b_2 + ... + b_n) ≥ min_i a_i/b_i for positive a's and b's.
What made the proof work for trees? Is there a more general phenomenon than the fact that trees are pretty simple structures? It turns out that the key fact is that shortest path distances induced by a tree are ℓ_1 metrics, or equivalently, non-negative combinations of cut metrics.
Definition 17.1. Let 𝑉 be a finite set and let 𝑆 ⊆ 𝑉. The metric 𝑑𝑆 associated with the
cut 𝑆 is the following: 𝑑𝑆 (𝑢, 𝑣) = 1 if |𝑆 ∩ {𝑢, 𝑣}| = 1 and 𝑑𝑆 (𝑢, 𝑣) = 0 otherwise.
Definition 17.2. Let (V, d) be a finite metric space. The metric d is a cut metric if there is a set S ⊂ V such that d = d_S. d is in the cut cone (or in the cone of cut metrics) if there exist non-negative scalars y_S, S ⊂ V, such that d(u, v) = Σ_{S⊂V} y_S d_S(u, v) for all u, v ∈ V.
Definition 17.3. Let (V, d) be a finite metric space. The metric d is a line metric if there is a mapping f : V → ℝ (the real line) such that d(u, v) = |f(u) − f(v)| for all u, v ∈ V.
The standard fact we need is that a finite metric is in the cut cone if and only if it is an ℓ_1 metric.

Proof. Consider the metric d_S. It is easy to see that it is a simple line metric: map all vertices in S to 0 and all vertices in V − S to 1. If d is in the cut cone then it is a non-negative combination of the cut metrics, and hence it is a non-negative combination of line metrics, and hence an ℓ_1 metric.

To prove the converse, it suffices to argue that any line metric is in the cut cone. Let V = {v_1, v_2, ..., v_n} and let d be a line metric on V. Without loss of generality assume that the coordinates of the points corresponding to the line metric d are x_1 ≤ x_2 ≤ ... ≤ x_n on the real line. For 1 ≤ i < n let S_i = {v_1, v_2, ..., v_i}. It is not hard to verify that Σ_{i=1}^{n−1} (x_{i+1} − x_i) d_{S_i} = d.
The refinement of Bourgain's theorem that we will use (see [117]) is the following: given a finite metric space (V, d) on n points and a set S ⊆ V of k terminals, there is an embedding f : V → ℝ^{O(log² n)} such that (i) the embedding is a contraction (that is, ‖f(u) − f(v)‖_1 ≤ d(u, v) for all u, v ∈ V) and (ii) for every u, v ∈ S, ‖f(u) − f(v)‖_1 ≥ (c/log k)·d(u, v) for some universal constant c.
Theorem 17.9. Let 𝐺 = (𝑉 , 𝐸) be a graph. Suppose any finite metric induced by edge
lengths on 𝐸 can be embedded into ℓ 1 with distortion 𝛼. Then the integrality gap of the
LP for Sparsest Cut is at most 𝛼 for any instance on 𝐺.
Proof. Let (x, y) be a feasible fractional solution and let d be the metric induced by edge lengths given by x. Let λ be the value of the solution and recall that

    λ = (Σ_{uv∈E} c(uv) d(uv)) / (Σ_{i=1}^{k} D_i d(s_i, t_i)).

Since d can be embedded into ℓ_1 with distortion at most α and any ℓ_1 metric is in the cut cone, there are non-negative scalars z_S, S ⊂ V, such that for all u, v

    (1/α) Σ_{S⊂V} z_S d_S(u, v) ≤ d(u, v) ≤ Σ_{S⊂V} z_S d_S(u, v).

Therefore

    λ = (Σ_{uv∈E} c(uv) d(uv)) / (Σ_{i=1}^{k} D_i d(s_i, t_i))
      ≥ (1/α) · (Σ_{uv∈E} c(uv) Σ_{S⊂V} z_S d_S(uv)) / (Σ_{i=1}^{k} D_i Σ_{S⊂V} z_S d_S(s_i, t_i))
      = (1/α) · (Σ_{S⊂V} z_S c(δ(S))) / (Σ_{S⊂V} z_S Dem(δ(S)))
      ≥ (1/α) · min_{S⊂V} c(δ(S))/Dem(δ(S)),

where Dem(δ(S)) denotes the total demand separated by S. Hence some cut S with z_S > 0 has sparsity at most α·λ, which gives the claimed bound on the integrality gap.
SparseCutviaEmbedding:
    1. Solve the LP and let d_x be the metric induced on V by the edge lengths x
    2. Embed (V, d_x) into ℓ_1 with d = O(log² n) coordinates; let f be the embedding
    3. For i = 1 to d do
        Sort the vertices by their i-th coordinate f(v)_i and, for 1 ≤ h ≤ n − 1, let S_{i,h} be the first h vertices in sorted order
    4. Among all cuts S_{i,h} with 1 ≤ i ≤ d and 1 ≤ h ≤ n − 1 output the one with the smallest sparsity.
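Steps 3 and 4 amount to a sweep over each coordinate. A Python sketch is below; the embedding (a list of per-coordinate value maps) is assumed to be computed elsewhere, and the naive recomputation of sparsity per threshold can be sped up with prefix sums.

    def best_sweep_cut(n, edges, demands, embedding):
        # edges: (u, v, c_e) triples; demands: (s_i, t_i, D_i) triples;
        # embedding[i][v]: the i-th coordinate of f(v).
        def sparsity(S):
            S = set(S)
            cut = sum(c for (u, v, c) in edges if (u in S) != (v in S))
            dem = sum(D for (s, t, D) in demands if (s in S) != (t in S))
            return cut / dem if dem > 0 else float('inf')
        best, best_val = None, float('inf')
        for coord in embedding:
            order = sorted(range(n), key=lambda v: coord[v])
            for h in range(1, n):          # the n - 1 threshold cuts
                val = sparsity(order[:h])
                if val < best_val:
                    best, best_val = order[:h], val
        return best, best_val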
Exercise 17.5. Use the refined embedding guarantee stated above and the proof outline of Theorem 17.9 to show that the described algorithm is a randomized O(log k)-approximation algorithm for Sparsest Cut.
Bibliographic Notes
The highly influential paper of Leighton and Rao [113] obtained an O(log n)-approximation and flow-cut gap for Uniform Sparsest Cut and introduced the region growing argument as well as the lower bound via expanders (an important influence is the paper of Shahrokhi and Matula [SharokhiM99]). [113] demonstrated many applications of the divide and conquer approach. There is
a large literature on Sparsest Cut and related problems and we only touched
upon a small part. An outstanding open problem is whether the flow-cut gap
for Non-Uniform Sparsest Cut in planar graphs is O(1) (this is called the GNRS conjecture [73] in the more general context of minor-free graphs); Rao, building on ideas from [104], showed that the gap is O(√(log n)) [134]. No super-constant
lower bound is known for planar graphs. The theory of metric embeddings
has been a fruitful bridge between TCS and mathematics and there are several
surveys and connections from both perspectives.
Bibliography
[1] Ittai Abraham, Cyril Gavoille, Anupam Gupta, Ofer Neiman, and Kunal
Talwar. “Cops, robbers, and threatening skeletons: Padded decomposi-
tion for minor-free graphs”. In: SIAM Journal on Computing 48.3 (2019),
pp. 1120–1145.
[2] Anna Adamaszek, Parinya Chalermsook, Alina Ene, and Andreas Wiese.
“Submodular unsplittable flow on trees”. In: International Conference
on Integer Programming and Combinatorial Optimization. Springer. 2016,
pp. 337–349.
[3] Amit Agarwal, Noga Alon, and Moses S Charikar. “Improved approxi-
mation for directed cut problems”. In: Proceedings of the thirty-ninth annual
ACM symposium on Theory of computing. 2007, pp. 671–680.
[4] Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. “Adaptive sampling
for k-means clustering”. In: Approximation, Randomization, and Combina-
torial Optimization. Algorithms and Techniques. Springer, 2009, pp. 15–
28.
[5] Ajit Agrawal, Philip Klein, and Ramamoorthi Ravi. “When trees collide:
An approximation algorithm for the generalized Steiner problem on
networks”. In: SIAM journal on Computing 24.3 (1995), pp. 440–456.
[6] Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. “Streaming k-means
approximation.” In: NIPS. Vol. 22. 2009, pp. 10–18.
[7] Karhan Akcoglu, James Aspnes, Bhaskar DasGupta, and Ming-Yang Kao.
“Opportunity cost algorithms for combinatorial auctions”. In: Computa-
tional Methods in Decision-Making, Economics and Finance. Springer, 2002,
pp. 455–479.
[8] Matthew Andrews, Julia Chuzhoy, Venkatesan Guruswami, Sanjeev
Khanna, Kunal Talwar, and Lisa Zhang. “Inapproximability of edge-
disjoint paths and low congestion routing on undirected graphs”. In:
Combinatorica 30.5 (2010), pp. 485–520.
[9] Kenneth Appel, Wolfgang Haken, et al. “Every planar map is four
colorable. Part I: Discharging”. In: Illinois Journal of Mathematics 21.3
(1977), pp. 429–490.
[10] Sanjeev Arora. “Polynomial time approximation schemes for Euclidean
traveling salesman and other geometric problems”. In: Journal of the ACM
(JACM) 45.5 (1998), pp. 753–782.
[11] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh
Munagala, and Vinayaka Pandit. “Local search heuristics for k-median
and facility location problems”. In: SIAM Journal on computing 33.3 (2004),
pp. 544–562.
[12] Arash Asadpour, Michel X Goemans, Aleksander Mądry, Shayan Oveis Gharan, and Amin Saberi. “An O(log n/log log n)-approximation algorithm for the asymmetric traveling salesman problem”. In: Operations Research 65.4 (2017). Preliminary version in Proc. of ACM-SIAM SODA, 2010., pp. 1043–1061.
[13] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali
Kemal Sinop. “The Hardness of Approximation of Euclidean k-Means”.
In: 31st International Symposium on Computational Geometry (SoCG 2015). Ed.
by Lars Arge and János Pach. Vol. 34. Leibniz International Proceedings
in Informatics (LIPIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-
Zentrum fuer Informatik, 2015, pp. 754–767. isbn: 978-3-939897-83-5. doi:
10.4230/LIPIcs.SOCG.2015.754. url: http://drops.dagstuhl.de/opus/
volltexte/2015/5117.
[14] Baruch Awerbuch, Yossi Azar, and Yair Bartal. “On-line generalized
Steiner problem”. In: Theoretical Computer Science 324.2-3 (2004), pp. 313–
324.
[15] Bahman Bahmani, Benjamin Moseley, Andrea Vattani, Ravi Kumar, and
Sergei Vassilvitskii. “Scalable K-Means++”. In: Proc. VLDB Endow. 5.7
(Mar. 2012), pp. 622–633. issn: 2150-8097. doi: 10.14778/2180912.2180915.
url: https://doi.org/10.14778/2180912.2180915.
[16] Tanvi Bajpai, Deeparnab Chakrabarty, Chandra Chekuri, and Maryam
Negahbani. “Revisiting Priority 𝑘-Center: Fairness and Outliers”. In:
arXiv preprint arXiv:2103.03337 (2021).
[17] Brenda S Baker. “Approximation algorithms for NP-complete problems
on planar graphs”. In: Journal of the ACM (JACM) 41.1 (1994), pp. 153–180.
[18] Nikhil Bansal, Nitish Korula, Viswanath Nagarajan, and Aravind Srini-
vasan. “Solving packing integer programs via randomized rounding
with alterations”. In: Theory of Computing 8.1 (2012), pp. 533–565.
[30] Amit Chakrabarti, Chandra Chekuri, Anupam Gupta, and Amit Ku-
mar. “Approximation algorithms for the unsplittable flow problem”. In:
Algorithmica 47.1 (2007), pp. 53–78.
[31] Deeparnab Chakrabarty and Maryam Negahbani. “Generalized center
problems with outliers”. In: ACM Transactions on Algorithms (TALG) 15.3
(2019), pp. 1–14.
[32] Timothy M Chan. “Approximation schemes for 0-1 knapsack”. In: 1st
Symposium on Simplicity in Algorithms (SOSA 2018). Schloss Dagstuhl-
Leibniz-Zentrum fuer Informatik. 2018.
[33] Ashok K Chandra, Daniel S. Hirschberg, and Chak-Kuen Wong. “Ap-
proximate algorithms for some generalized knapsack problems”. In:
Theoretical Computer Science 3.3 (1976), pp. 293–304.
[34] Moses Charikar, Chandra Chekuri, To-Yat Cheung, Zuo Dai, Ashish Goel,
Sudipto Guha, and Ming Li. “Approximation algorithms for directed
Steiner problems”. In: Journal of Algorithms 33.1 (1999), pp. 73–91.
[35] Moses Charikar, Sudipto Guha, Éva Tardos, and David B Shmoys. “A
constant-factor approximation algorithm for the k-median problem”. In:
Journal of Computer and System Sciences 65.1 (2002), pp. 129–149.
[36] Shuchi Chawla, Robert Krauthgamer, Ravi Kumar, Yuval Rabani, and
D Sivakumar. “On the hardness of approximating multicut and sparsest-
cut”. In: computational complexity 15.2 (2006), pp. 94–114.
[37] Chandra Chekuri, Sreeram Kannan, Adnan Raja, and Pramod Viswanath.
“Multicommodity flows and cuts in polymatroidal networks”. In: SIAM
Journal on Computing 44.4 (2015), pp. 912–943.
[38] Chandra Chekuri and Sanjeev Khanna. “A polynomial time approxima-
tion scheme for the multiple knapsack problem”. In: SIAM Journal on
Computing 35.3 (2005), pp. 713–728.
[39] Chandra Chekuri and Vivek Madan. “Approximating multicut and the
demand graph”. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM
Symposium on Discrete Algorithms. SIAM. 2017, pp. 855–874.
[40] Chandra Chekuri and Kent Quanrud. “On approximating (sparse) cover-
ing integer programs”. In: Proceedings of the Thirtieth Annual ACM-SIAM
Symposium on Discrete Algorithms. SIAM. 2019, pp. 1596–1615.
[41] Chandra Chekuri and Thapanapong Rukkanchanunt. “A note on it-
erated rounding for the Survivable Network Design Problem”. In: 1st
Symposium on Simplicity in Algorithms (SOSA 2018). Schloss Dagstuhl-
Leibniz-Zentrum fuer Informatik. 2018.
[55] Tomás Feder and Daniel Greene. “Optimal algorithms for approximate
clustering”. In: Proceedings of the twentieth annual ACM symposium on
Theory of computing. 1988, pp. 434–444.
[56] Uriel Feige. “A threshold of ln n for approximating set cover”. In: Journal
of the ACM (JACM) 45.4 (1998), pp. 634–652.
[57] Uriel Feige, Vahab S Mirrokni, and Jan Vondrak. “Maximizing non-
monotone submodular functions”. In: SIAM Journal on Computing 40.4
(2011), pp. 1133–1153.
[58] Uriel Feige and Jan Vondrak. “Approximation algorithms for allocation
problems: Improving the factor of 1-1/e”. In: 2006 47th Annual IEEE
Symposium on Foundations of Computer Science (FOCS’06). IEEE. 2006,
pp. 667–676.
[59] Moran Feldman, Joseph Seffi Naor, Roy Schwartz, and Justin Ward. “Im-
proved approximations for k-exchange systems”. In: European Symposium
on Algorithms. Springer. 2011, pp. 784–798.
[60] Moran Feldman, Joseph Seffi Naor, Roy Schwartz, and Justin Ward. “Im-
proved approximations for k-exchange systems”. In: European Symposium
on Algorithms. Springer. 2011, pp. 784–798.
[61] Marshall L Fisher, George L Nemhauser, and Laurence A Wolsey. “An
analysis of approximations for maximizing submodular set functions II”.
In: Polyhedral combinatorics (1978), pp. 73–87.
[62] Michael R Garey and David S Johnson. Computers and intractability.
Vol. 174. Freeman, San Francisco, 1979.
[63] N. Garg, V.V. Vazirani, and M. Yannakakis. “Approximate max-flow
min-(multi) cut theorems and their applications”. In: Proceedings of the
twenty-fifth annual ACM symposium on Theory of computing. ACM New
York, NY, USA. 1993, pp. 698–707.
[64] Shayan Oveis Gharan, Amin Saberi, and Mohit Singh. “A randomized
rounding approach to the traveling salesman problem”. In: 2011 IEEE
52nd Annual Symposium on Foundations of Computer Science. IEEE. 2011,
pp. 550–559.
[65] Michel X Goemans, Neil Olver, Thomas Rothvoß, and Rico Zenklusen.
“Matroids and integrality gaps for hypergraphic steiner tree relaxations”.
In: Proceedings of the forty-fourth annual ACM symposium on Theory of
computing. 2012, pp. 1161–1176.
[92] Ragesh Jaiswal, Amit Kumar, and Sandeep Sen. “A simple D 2-sampling
based PTAS for k-means and other clustering problems”. In: Algorithmica
70.1 (2014), pp. 22–46.
[93] Klaus Jansen. “An EPTAS for scheduling jobs on uniform processors:
using an MILP relaxation with a constant number of integral variables”.
In: SIAM Journal on Discrete Mathematics 24.2 (2010), pp. 457–485.
[94] Klaus Jansen. “Parameterized approximation scheme for the multiple
knapsack problem”. In: SIAM Journal on Computing 39.4 (2010), pp. 1392–
1412.
[95] Klaus Jansen and Lars Rohwedder. “A quasi-polynomial approxima-
tion for the restricted assignment problem”. In: International Conference
on Integer Programming and Combinatorial Optimization. Springer. 2017,
pp. 305–316.
[96] David S Johnson. “Approximation algorithms for combinatorial prob-
lems”. In: Journal of computer and system sciences 9.3 (1974), pp. 256–278.
[97] E.G. Coffman Jr, M.R. Garey, and D.S. Johnson. “Approximation algorithms for bin packing: A survey”. In: Approximation algorithms for NP-hard problems (1996), pp. 46–93.
[98] George Karakostas. “A better approximation ratio for the vertex cover
problem”. In: ACM Transactions on Algorithms (TALG) 5.4 (2009), p. 41.
[99] Anna R Karlin, Nathan Klein, and Shayan Oveis Gharan. “A (slightly)
improved approximation algorithm for metric TSP”. In: Proceedings of
the 53rd Annual ACM SIGACT Symposium on Theory of Computing. 2021,
pp. 32–45.
[100] Narendra Karmarkar and Richard M Karp. “An efficient approximation
scheme for the one-dimensional bin-packing problem”. In: 23rd Annual
Symposium on Foundations of Computer Science (sfcs 1982). IEEE. 1982,
pp. 312–320.
[101] Ken-ichi Kawarabayashi and Anastasios Sidiropoulos. “Embeddings of Planar Quasimetrics into Directed ℓ_1 and Polylogarithmic Approximation for Directed Sparsest-Cut”. In: Proceedings of IEEE FOCS (2021). To appear. 2021.
[102] H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack Problems. Springer Nature Book Archives Millennium. Springer, 2004. isbn: 9783540402862. url: https://books.google.com/books?id=u5DB7gck08YC.
[103] Subhash Khot and Oded Regev. “Vertex cover might be hard to approximate to within 2 − ε”. In: Journal of Computer and System Sciences 74.3 (2008), pp. 335–349.
[104] Philip Klein, Serge A Plotkin, and Satish Rao. “Excluded minors, network
decomposition, and multicommodity flow”. In: Proceedings of the twenty-
fifth annual ACM symposium on Theory of computing. 1993, pp. 682–690.
[105] Philip N Klein, Serge A Plotkin, Satish Rao, and Eva Tardos. “Approx-
imation algorithms for Steiner and directed multicuts”. In: Journal of
Algorithms 22.2 (1997), pp. 241–269.
[106] Stavros G Kolliopoulos and Neal E Young. “Approximation algorithms
for covering/packing integer programs”. In: Journal of Computer and
System Sciences 71.4 (2005), pp. 495–505.
[107] Guy Kortsarz and Zeev Nutov. “Approximating minimum cost con-
nectivity problems”. In: Parameterized complexity and approximation al-
gorithms. Ed. by Erik D. Demaine, MohammadTaghi Hajiaghayi, and
Dániel Marx. Dagstuhl Seminar Proceedings 09511. Dagstuhl, Germany:
Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany, 2010. url:
http://drops.dagstuhl.de/opus/volltexte/2010/2497.
[108] Madhukar R Korupolu, C Greg Plaxton, and Rajmohan Rajaraman.
“Analysis of a local search heuristic for facility location problems”. In:
Journal of algorithms 37.1 (2000), pp. 146–188.
[109] Ravishankar Krishnaswamy, Amit Kumar, Viswanath Nagarajan, Yo-
gish Sabharwal, and Barna Saha. “The matroid median problem”. In:
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete
Algorithms. SIAM. 2011, pp. 1117–1130.
[110] Lap Chi Lau, Ramamoorthi Ravi, and Mohit Singh. Iterative methods in
combinatorial optimization. Vol. 46. Cambridge University Press, 2011.
[111] Eugene L Lawler. “Fast approximation algorithms for knapsack prob-
lems”. In: Mathematics of Operations Research 4.4 (1979), pp. 339–356.
[112] Jon Lee, Maxim Sviridenko, and Jan Vondrák. “Matroid matching: the
power of local search”. In: SIAM Journal on Computing 42.1 (2013), pp. 357–
379.
[113] T. Leighton and S. Rao. “Multicommodity max-flow min-cut theorems
and their use in designing approximation algorithms”. In: Journal of the
ACM (JACM) 46.6 (1999). Conference version is from 1988., pp. 787–832.
[114] Tom Leighton, Satish Rao, and Aravind Srinivasan. “Multicommodity
flow and circuit switching”. In: Proceedings of the Thirty-First Hawaii
International Conference on System Sciences. Vol. 7. IEEE. 1998, pp. 459–465.
[115] Jan Karel Lenstra, David B Shmoys, and Éva Tardos. “Approximation
algorithms for scheduling unrelated parallel machines”. In: Mathematical
programming 46.1 (1990), pp. 259–271.
[116] Shi Li. “A 1.488 approximation algorithm for the uncapacitated facility
location problem”. In: Information and Computation 222 (2013), pp. 45–58.
[117] Nathan Linial, Eran London, and Yuri Rabinovich. “The geometry of
graphs and some of its algorithmic applications”. In: Combinatorica 15.2
(1995), pp. 215–245.
[118] Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. “The
planar k-means problem is NP-hard”. In: Theoretical Computer Science 442
(2012), pp. 13–21.
[119] G.A. Margulis. “Explicit constructions of expanders”. In: Problemy Peredaci
Informacii 9.4 (1973), pp. 71–80.
[120] Julián Mestre. “Greedy in approximation algorithms”. In: European Sym-
posium on Algorithms. Springer. 2006, pp. 528–539.
[121] Michael Mitzenmacher and Eli Upfal. Probability and computing: Random-
ization and probabilistic techniques in algorithms and data analysis. Cambridge
university press, 2017.
[122] Sarah Morell and Martin Skutella. “Single source unsplittable flows with
arc-wise lower and upper bounds”. In: Mathematical Programming (2021),
pp. 1–20.
[123] Robin A Moser and Gábor Tardos. “A constructive proof of the general
Lovász local lemma”. In: Journal of the ACM (JACM) 57.2 (2010), pp. 1–15.
[124] Dana Moshkovitz. “The Projection Games Conjecture and the NP-
Hardness of ln 𝑛-Approximating Set-Cover”. In: Theory of Computing
11.1 (2015), pp. 221–235.
[125] Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. Cam-
bridge university press, 1995.
[126] Viswanath Nagarajan, R Ravi, and Mohit Singh. “Simpler analysis of LP
extreme points for traveling salesman and survivable network design
problems”. In: Operations Research Letters 38.3 (2010), pp. 156–160.
[127] Viswanath Nagarajan, Baruch Schieber, and Hadas Shachnai. “The Eu-
clidean k-supplier problem”. In: Mathematics of Operations Research 45.1
(2020), pp. 1–14.
[128] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. “An
analysis of approximations for maximizing submodular set functions—I”.
In: Mathematical Programming 14.1 (1978), pp. 265–294.
[129] Zeev Nutov. “Node-connectivity survivable network problems”. In:
Handbook of Approximation Algorithms and Metaheuristics. Chapman and
Hall/CRC, 2018, pp. 233–253.
[158] Laurence A Wolsey. “An analysis of the greedy algorithm for the sub-
modular set covering problem”. In: Combinatorica 2.4 (1982), pp. 385–
393.
[159] Yuli Ye and Allan Borodin. “Elimination graphs”. In: ACM Transactions
on Algorithms (TALG) 8.2 (2012), pp. 1–23.
[160] Alexander Z Zelikovsky. “An 11/6-approximation algorithm for the
network Steiner problem”. In: Algorithmica 9.5 (1993), pp. 463–470.
Appendix A

Basic Feasible Solutions to LPs and the Rank Lemma
We discuss the rank lemma about vertex solutions for linear programs. Recall that a polyhedron in ℝ^n is defined as the intersection of a finite collection of half spaces. Without loss of generality we can assume that it is defined by a system of inequalities of the form Ax ≤ b where A is an m × n matrix and b is an m × 1 vector. A polyhedron P is bounded if P is contained in a ball of finite radius around the origin. A polytope in ℝ^n is defined as the convex hull of a finite collection of points. A fundamental theorem about linear programming states that any bounded polyhedron is a polytope. If the polyhedron is not bounded then it can be expressed as the Minkowski sum of a polytope and a cone.
A bounded polyhedron P in ℝ^n defined by a system Ax ≤ b must necessarily have m ≥ n. A point p ∈ P is a basic feasible solution or a vertex solution of the system if it is the unique solution to a system A′x = b′ where A′ is a sub-matrix of A consisting of n of the inequalities and the rank of A′ is equal to n. The inequalities in A′ are said to be tight at p. Note that there may be many other inequalities in Ax ≤ b that are tight at p and in general there may be many different rank-n sub-matrices that give rise to the same basic feasible solution p.
Lemma A.1. Let y be a basic feasible solution of the system Ax ≤ b, ℓ ≤ x ≤ u, where A is an m × n matrix. Then the number of coordinates of y that are strictly between their lower and upper bounds is at most rank(A) ≤ m.

An extension of the previous lemma is often useful when the system defining the polyhedron has equality constraints.
A special case of the preceding corollary is called the rank lemma in [110].
The lemmas are a simple consequence of the definition of a basic feasible solution. We will focus on the proof of Lemma A.1. It is interesting only when rank(A) or m is smaller than n; otherwise the claim is trivial. Before we prove it formally we observe some simple corollaries. Suppose we have a system Ax ≤ b, x ≥ 0 where m < n. Then the number of non-zero variables in a basic feasible solution is at most m. Similarly if the system is Ax ≤ b, x ∈ [0, 1]^n then the number of non-integer variables in a basic feasible solution y is at most m. For example in the knapsack LP we have m = 1 and hence in any basic feasible solution there can only be one fractional variable.
Now for the proof. We consider the system Ax ≤ b, −x ≤ −ℓ, x ≤ u as a single system Cx ≤ d which has m + 2n inequalities. Since y is a basic feasible solution to this system, from the definition, it is the unique solution of a sub-system C′x = d′ where C′ is an n × n full-rank matrix. How many rows of C′ can come from A? At most rank(A) ≤ m rows. It means that the rest of the rows of C′ come from the other sets of inequalities −x ≤ −ℓ or x ≤ u. There are at least n − rank(A) such rows which are tight at y. Thus n − rank(A) variables in y are tight at their lower or upper bounds, and hence there can be only rank(A) fractional variables in y.
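As a concrete illustration of the knapsack example, the sketch below solves a small knapsack LP with SciPy and counts the fractional coordinates. The data is made up, and we assume the solver returns a vertex (basic) solution, as simplex-based solvers do.

    import numpy as np
    from scipy.optimize import linprog

    # Knapsack LP: max sum p_i x_i  s.t.  sum s_i x_i <= B, x in [0, 1]^n.
    # With m = 1 a basic feasible solution has at most one fractional x_i.
    p = np.array([10.0, 7.0, 6.0, 4.0])   # profits (hypothetical data)
    s = np.array([5.0, 4.0, 3.0, 2.0])    # sizes
    B = 7.0
    res = linprog(-p, A_ub=[s], b_ub=[B], bounds=[(0.0, 1.0)] * 4,
                  method="highs")
    x = res.x
    fractional = [i for i in range(len(x)) if 1e-9 < x[i] < 1.0 - 1e-9]
    print(x, "fractional coordinates:", fractional)   # at most one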
See [110] for an iterated-rounding-based methodology for exact and approximation algorithms. The whole methodology relies on properties of basic feasible solutions to LP relaxations of combinatorial optimization problems.
at most 𝑚 fractional variables. When 𝑚 is a fixed constant one can exploit this
after guessing the big items to obtain a PTAS.
Generalized Assignment: See Chapter 6.
Appendix B

Probabilistic Inequalities
The course will rely heavily on probabilistic methods. We will mostly rely on discrete probability spaces. We will keep the discussion high-level where possible and use certain results in a black-box fashion.

Let Ω be a finite set. A probability measure p assigns a non-negative number p(ω) to each ω ∈ Ω such that Σ_{ω∈Ω} p(ω) = 1. The tuple (Ω, p) defines a discrete probability space; an event in this space is any subset A ⊆ Ω and the probability of an event is simply p(A) = Σ_{ω∈A} p(ω). When Ω is a continuous space such as the interval [0, 1] things get trickier and we need to talk about measure spaces and σ-algebras over Ω; we can only assign probability to certain subsets of Ω. We will not go into details since we will not need any formal machinery for what we do in this course.
An important definition is that of a random variable. We will focus only on real-valued random variables in this course. A random variable X in a probability space is a function X : Ω → ℝ. In the discrete setting the expectation of X, denoted by E[X], is defined as Σ_{ω∈Ω} p(ω)X(ω). For continuous spaces E[X] = ∫ X(ω) dp(ω) with an appropriate definition of the integral. The variance of X, denoted by Var[X] or σ²_X, is defined as E[(X − E[X])²]. The standard deviation is σ_X, the square root of the variance.
Markov's inequality is the most basic tail bound: if X is a non-negative random variable and t > 0, then P[X ≥ t] ≤ E[X]/t.

Proof. The proof is in some sense obvious, especially in the discrete case. Here is a sketch. Define a new random variable Y where Y(ω) = X(ω) if X(ω) < t and Y(ω) = t if X(ω) ≥ t. Y is non-negative and Y ≤ X point-wise, and hence t · P[X ≥ t] ≤ E[Y] ≤ E[X], which yields the inequality.
Corollary B.5. Under the conditions of Theorem B.3, there is a universal constant α such that for any μ ≥ max{1, E[X]}, sufficiently large n, and c ≥ 1,

    P[X > αc · (ln n/ln ln n) · μ] ≤ 1/n^c.

Similarly, there is a constant α such that for any ε > 0,

    P[X ≥ (1 + ε)μ + αc·(log n)/ε] ≤ 1/n^c.
Remark B.1. If the X_i are in the range [0, b] for some b not equal to 1 one can scale them appropriately and then use the standard bounds.

Sometimes we need to deal with random variables that are in the range [−1, 1]. Consider the setting where X = Σ_i X_i where for each i, X_i ∈ [−1, 1] and E[X_i] = 0, and the X_i are independent. In this case E[X] = 0 and we can no longer expect a dimension-free bound. Suppose each X_i is 1 with probability 1/2 and −1 with probability 1/2. Then X = Σ_i X_i corresponds to a 1-dimensional random walk and even though the expected value is 0 the standard deviation of X is Θ(√n). One can show that P[|X| ≥ t√n] ≤ 2e^{−t²/2}. For these settings we can use the following bounds.
Theorem B.6. Let X_1, X_2, ..., X_n be independent random variables such that for each i, X_i ∈ [a_i, b_i]. Let X = Σ_i X_i and let μ = E[X]. Then

    P[|X − μ| ≥ t] ≤ 2 exp(−2t² / Σ_{i=1}^{n} (b_i − a_i)²).
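As a sanity check, one can compare the bound of Theorem B.6 with the empirical tail of a sum of independent uniform random variables. This is a quick illustrative sketch, not part of the development.

    import math, random

    def hoeffding_bound(t, widths):
        # 2 * exp(-2 t^2 / sum_i (b_i - a_i)^2)
        return 2.0 * math.exp(-2.0 * t * t / sum(w * w for w in widths))

    def empirical_tail(n, t, trials=100000):
        # X = sum of n independent uniform [0, 1] variables; mu = n / 2
        hits = 0
        for _ in range(trials):
            x = sum(random.random() for _ in range(n))
            if abs(x - n / 2.0) >= t:
                hits += 1
        return hits / trials

    # with n = 100 and t = 10 the bound is 2e^{-2} (about 0.27); the
    # observed frequency comes out far smaller, as the bound is not tight
    print(empirical_tail(100, 10), hoeffding_bound(10, [1.0] * 100))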
Note that Var[X] = Σ_i Var[X_i]. One can also show a bound of the following form (Bernstein's inequality), where M is an upper bound on the |X_i|:

    P[|X − μ| ≥ t] ≤ 2 exp(−t² / (2(σ²_X + Mt/3))).