
CS 583: Approximation Algorithms

Chandra Chekuri

November 18, 2021

Dept. of Computer Science, University of Illinois, Urbana, IL 61820. Email: [email protected].
Contents

1 Introduction
  1.1 Formal Aspects
      1.1.1 NP Optimization Problems
      1.1.2 Relative Approximation
      1.1.3 Additive Approximation
      1.1.4 Hardness of Approximation
  1.2 Designing Approximation Algorithms

2 Covering Problems
  2.1 Greedy for Set Cover and Maximum Coverage
      2.1.1 Greedy Algorithm
      2.1.2 Analysis of Greedy Cover
      2.1.3 Dominating Set
  2.2 Vertex Cover
      2.2.1 A 2-approximation for Vertex Cover
      2.2.2 Set Cover with small frequencies
  2.3 Vertex Cover via LP
  2.4 Set Cover via LP
      2.4.1 Deterministic Rounding
      2.4.2 Randomized Rounding
      2.4.3 Dual-fitting
      2.4.4 Greedy for implicit instances of Set Cover
  2.5 Submodularity
      2.5.1 Submodular Set Cover
      2.5.2 Submodular Maximum Coverage
  2.6 Covering Integer Programs (CIPs)

3 Knapsack
  3.1 The Knapsack Problem
      3.1.1 A Greedy Algorithm
      3.1.2 A Polynomial Time Approximation Scheme
      3.1.3 Rounding and Scaling
  3.2 Other Problems

4 Packing Problems
  4.1 Maximum Independent Set Problem in Graphs
      4.1.1 Elimination Orders and MIS
  4.2 The efficacy of the Greedy algorithm for a class of Independence Families
  4.3 Randomized Rounding with Alteration for Packing Problems
  4.4 Packing Integer Programs (PIPs)
      4.4.1 Randomized Rounding with Alteration for PIPs

5 Load Balancing and Bin Packing
  5.1 Load Balancing / MultiProcessor Scheduling
      5.1.1 Problem Description
      5.1.2 Greedy Algorithm
      5.1.3 A PTAS for Multi-Processor Scheduling
      5.1.4 Section Notes
  5.2 Bin Packing
      5.2.1 Problem Description
      5.2.2 Greedy Approaches
      5.2.3 (Asymptotic) PTAS for Bin Packing
      5.2.4 Asymptotic PTAS for Bin Packing
      5.2.5 Section Notes

6 Unrelated Machine Scheduling and Generalized Assignment
  6.1 Scheduling on Unrelated Parallel Machines
  6.2 Generalized Assignment Problem
      6.2.1 Shmoys-Tardos Rounding
      6.2.2 Iterative Rounding
  6.3 Maximization version of GAP
  6.4 Bibliographic Notes

7 Congestion Minimization in Networks
  7.1 Congestion Minimization and VLSI Routing
  7.2 Min-max Integer Programs

8 Introduction to Local Search
  8.1 Local Search for Max Cut
  8.2 Local Search for Submodular Function Maximization

9 Clustering and Facility Location
  9.1 𝑘-Center
      9.1.1 Gonzalez's algorithm and nets in metric spaces
      9.1.2 Hochbaum-Shmoys bottleneck approach
      9.1.3 Related Problems and Discussion
  9.2 Uncapacitated Facility Location
      9.2.1 LP Rounding
      9.2.2 Primal-Dual
      9.2.3 Local Search
  9.3 𝑘-Median
      9.3.1 Local Search
  9.4 𝑘-Means
  9.5 Lloyd's algorithm, 𝐷²-sampling and 𝑘-Means++
  9.6 Bibliographic Notes

10 Introduction to Network Design
  10.1 The Steiner Tree Problem
       10.1.1 The MST Algorithm
       10.1.2 The Greedy/Online Algorithm
       10.1.3 LP Relaxation
       10.1.4 Other Results on Steiner Trees
  10.2 The Traveling Salesperson Problem (TSP)
       10.2.1 TSP in Undirected Graphs
       10.2.2 LP Relaxation
       10.2.3 TSP in Directed Graphs
       10.2.4 LP Relaxation

11 Steiner Forest Problem

12 Primal Dual for Constrained Forest Problems
  12.1 Classes of Functions and Setup
  12.2 A Primal-Dual Algorithm for Covering Uncrossable Functions
       12.2.1 Proof of Lemma 12.6

13 Survivable Network Design Problem
  13.1 Augmentation approach
  13.2 Iterated rounding based 2-approximation
       13.2.1 Basic feasible solutions and laminar family of tight sets
       13.2.2 Counting argument

14 Introduction to Cut and Partitioning Problems
  14.1 𝑠-𝑡 mincut via LP Rounding and Maxflow-Mincut
  14.2 A Catalog of Cut and Partitioning Problems

15 Multiway Cut
  15.1 Isolating Cut Heuristic
  15.2 Distance based LP Relaxation
  15.3 A Partitioning View and Geometric Relaxation
  15.4 Node-weighted and Directed Multiway Cut

16 Multicut
  16.1 Upper Bound on the Integrality Gap
  16.2 Lower Bound on the Integrality Gap
       16.2.1 Expander Graphs
       16.2.2 The Multicut Instance

17 Sparsest Cut
       17.0.1 LP Relaxation and Maximum Concurrent Flow
  17.1 Rounding LP via Connection to Multicut
  17.2 Rounding via ℓ1 embeddings
       17.2.1 A digression through trees
       17.2.2 Cut metrics, line metrics, and ℓ1 metrics
       17.2.3 Brief introduction to metric embeddings
       17.2.4 Utilizing the ℓ1 embedding
  17.3 SDP and Spectral Relaxations

A Basic Feasible Solutions to LPs and the Rank Lemma
  A.0.1 Some Examples
  A.0.2 Connection to Caratheodory's Theorem

B Probabilistic Inequalities


Chapter 1

Introduction

These are lecture notes for a course on approximation algorithms.


Course Objectives
1. To appreciate that not all intractable problems are the same. NP opti-
mization problems, identical in terms of exact solvability, can appear very
different from the approximation point of view. This sheds light on why, in
practice, some optimization problems (such as Knapsack) are easy, while
others (like Clique) are extremely difficult.
2. To learn techniques for design and analysis of approximation algorithms,
via some fundamental problems.
3. To build a toolkit of broadly applicable algorithms/heuristics that can be
used to solve a variety of problems.
4. To understand reductions between optimization problems, and to develop
the ability to relate new problems to known ones.

The complexity class P contains the set of problems that can be solved
in polynomial time. From a theoretical viewpoint, this describes the class of
tractable problems, that is, problems that can be solved efficiently. The class NP
is the set of problems that can be solved in non-deterministic polynomial time, or
equivalently, problems for which a solution can be verified in polynomial time.
NP contains many interesting problems that often arise in practice, but there is
good reason to believe P ≠ NP. That is, it is unlikely that there exist algorithms
to solve NP optimization problems efficiently, and so we often resort to heuristic
methods to solve these problems.
Heuristic approaches include backtrack search and its variants, mathematical
programming methods, local search, genetic algorithms, tabu search, simulated
annealing, etc. Some methods are guaranteed to find an optimal solution, though
they may take exponential time; others are guaranteed to run in polynomial time,
though they may not return an (optimal) solution. Approximation algorithms are
(typically) polynomial time heuristics that do not always find an optimal solution
but they are distinguished from general heuristics in providing guarantees on
the quality of the solution they output.
Approximation Ratio: To give a guarantee on solution quality, one must first
define what we mean by the quality of a solution. We discuss this more carefully
later. For now, note that each instance of an optimization problem has a set of
feasible solutions. The optimization problems we consider have an objective
function which assigns a (real/rational) number/value to each feasible solution
of each instance 𝐼. The goal is to find a feasible solution with minimum objective
function value or maximum objective function value. The former problems are
minimization problems and the latter are maximization problems.
For each instance 𝐼 of a problem, let OPT(𝐼) denote the value of an optimal
solution to instance 𝐼. We say that an algorithm 𝒜 is an 𝛼-approximation
algorithm for a problem if, for every instance 𝐼, the value of the feasible solution
returned by 𝒜 is within a (multiplicative) factor of 𝛼 of OPT(𝐼). Equivalently,
we say that 𝒜 is an approximation algorithm with approximation ratio 𝛼. For a
minimization problem we would have 𝛼 ≥ 1 and for a maximization problem
we would have 𝛼 ≤ 1. However, it is not uncommon to find in the literature a
different convention for maximization problems where one says that 𝒜 is an
𝛼-approximation algorithm if the value of the feasible solution returned by 𝒜
is at least OPT(𝐼)/𝛼; the reason for using this convention is so that approximation
ratios for both minimization and maximization problems will be ≥ 1. In this
course we will for the most part use the convention that 𝛼 ≥ 1 for minimization
problems and 𝛼 ≤ 1 for maximization problems.
Remarks:

1. The approximation ratio of an algorithm for a minimization problem is


the maximum (or supremum), over all instances of the problem, of the
ratio between the values of solution returned by the algorithm and the
optimal solution. Thus, it is a bound on the worst-case performance of the
algorithm.

2. The approximation ratio 𝛼 can depend on the size of the instance 𝐼, so one
should technically write 𝛼(|𝐼 |).

3. A natural question is whether the approximation ratio should be defined


in an additive sense. For example, an algorithm has an 𝛼-approximation for
a minimization problem if it outputs a feasible solution of value at most

OPT(𝐼) + 𝛼 for all 𝐼. This is a valid definition and is the more relevant one
in some settings. However, for many NP problems it is easy to show that
one cannot obtain any interesting additive approximation (unless of course
𝑃 = 𝑁𝑃) due to scaling issues. We will illustrate this via an example later.

Pros and cons of the approximation approach: Some advantages to the ap-
proximation approach include:
1. It explains why problems can vary considerably in difficulty.

2. The analysis of problems and problem instances distinguishes easy cases


from difficult ones.

3. The worst-case ratio is robust in many ways. It allows reductions between


problems.

4. Approximation algorithmic ideas/tools/relaxations are valuable in developing heuristics, including many that are practical and effective.

5. Quantification of performance via a concrete metric such as the approxi-


mation ratio allows for innovation in algorithm design and has led to many
new ideas.
As a bonus, many of the ideas are beautiful and sophisticated, and involve
connections to other areas of mathematics and computer science.

Disadvantages include:
1. The focus on worst-case measures risks ignoring algorithms or heuristics
that are practical or perform well on average.

2. Unlike, for example, integer programming, there is often no incremen-


tal/continuous tradeoff between the running time and quality of solution.

3. Approximation algorithms are often limited to cleanly stated problems.

4. The framework does not (at least directly) apply to decision problems or
those that are inapproximable.

Approximation as a broad lens


The use of approximation algorithms is not restricted solely to NP-Hard opti-
mization problems. In general, ideas from approximation can be used to solve
many problems where finding an exact solution would require too much of any
resource.

A resource we are often concerned with is time. Solving NP-Hard problems


exactly would (to the best of our knowledge) require exponential time, and so we
may want to use approximation algorithms. However, for large data sets, even
polynomial running time is sometimes unacceptable. As an example, the best
exact algorithm known for the Matching problem in general graphs requires
𝑂(𝑚√𝑛) time; on large graphs, this may not be practical. In contrast, a simple
greedy algorithm takes near-linear time and outputs a matching of cardinality at
least 1/2 that of the maximum matching; moreover there have been randomized
sub-linear time algorithms as well.
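To make the contrast concrete, here is a minimal Python sketch (our illustration, not part of the original text) of the greedy matching heuristic; any maximal matching has cardinality at least half that of a maximum matching.

# Greedy maximal matching: scan the edges once and keep an edge if both
# of its endpoints are still unmatched. The result is a maximal matching,
# and hence a 1/2-approximation to the maximum matching.
def greedy_matching(edges):
    matched = set()
    matching = []
    for (u, v) in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# On the path 1-2-3-4, scanning the middle edge first yields one edge,
# while the maximum matching has two; the factor 1/2 is tight.
print(greedy_matching([(2, 3), (1, 2), (3, 4)]))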
Another often limited resource is space. In the area of data streams/streaming
algorithms, we are often only allowed to read the input in a single pass, and given
a small amount of additional storage space. Consider a network switch that
wishes to compute statistics about the packets that pass through it. It is easy to
exactly compute the average packet length, but one cannot compute the median
length exactly. Surprisingly, though, many statistics can be approximately
computed.
Other resources include programmer time (as for the Matching problem,
the exact algorithm may be significantly more complex than one that returns
an approximate solution), or communication requirements (for instance, if the
computation is occurring across multiple locations).

1.1 Formal Aspects


1.1.1 NP Optimization Problems
In this section, we cover some formal definitions related to approximation
algorithms. We start from the definition of optimization problems. A problem
is simply an infinite collection of instances. Let Π be an optimization problem. Π
can be either a minimization or maximization problem. Instances 𝐼 of Π are a
subset of Σ∗ where Σ is a finite encoding alphabet. For each instance 𝐼 there is a
set of feasible solutions 𝒮(𝐼). We restrict our attention to real/rational-valued
optimization problems; in these problems each feasible solution 𝑆 ∈ 𝒮(𝐼) has
a value 𝑣𝑎𝑙(𝑆, 𝐼). For a minimization problem Π the goal is, given 𝐼, to find
OPT(𝐼) = min_{𝑆∈𝒮(𝐼)} 𝑣𝑎𝑙(𝑆, 𝐼).
Now let us formally define NP optimization (NPO) which is the class of
optimization problems corresponding to 𝑁𝑃.
Definition 1.1. Π is in NPO if
• Given 𝑥 ∈ Σ∗ , there is a polynomial-time algorithm that decides if 𝑥 is a valid
instance of Π. That is, we can efficiently check if the input string is well-formed.
This is a basic requirement that is often not spelled out.

• For each 𝐼, and 𝑆 ∈ 𝒮(𝐼), |𝑆| ≤ poly(|𝐼 |). That is, the solutions are of size
polynomial in the input size.

• There exists a poly-time decision procedure that for each 𝐼 and 𝑆 ∈ Σ∗ , decides if
𝑆 ∈ 𝒮(𝐼). This is the key property of 𝑁𝑃; we should be able to verify solutions
efficiently.

• 𝑣𝑎𝑙(𝐼, 𝑆) is a polynomial-time computable function.

We observe that for a minimization NPO problem Π, there is an associated


natural decision problem 𝐿(Π) = {(𝐼, 𝐵) : OPT(𝐼) ≤ 𝐵} which is the following:
given instance 𝐼 of Π and a number 𝐵, is the optimal value on 𝐼 at most 𝐵? For
maximization problem Π we reverse the inequality in the definition.

Lemma 1.1. 𝐿(Π) is in 𝑁𝑃 if Π is in NPO.

1.1.2 Relative Approximation


When Π is a minimization problem, recall that an approximation
algorithm 𝒜 is said to have approximation ratio 𝛼 iff

• 𝒜 is a polynomial time algorithm

• for all instances 𝐼 of Π, 𝒜 produces a feasible solution 𝒜(𝐼) s.t. 𝑣𝑎𝑙(𝒜(𝐼), 𝐼) ≤


𝛼 𝑣𝑎𝑙 (OPT(𝐼), 𝐼). (Note that 𝛼 ≥ 1.)

Approximation algorithms for maximization problems are defined similarly.


An approximation algorithm 𝒜 is said to have approximation ratio 𝛼 iff

• 𝒜 is a polynomial time algorithm

• for all instances 𝐼 of Π, 𝒜 produces a feasible solution 𝒜(𝐼) s.t. 𝑣𝑎𝑙(𝒜(𝐼), 𝐼) ≥


𝛼 𝑣𝑎𝑙 (OPT(𝐼), 𝐼). (Note that 𝛼 ≤ 1.)

For maximization problems, it is also common to use 1/𝛼 (which is then ≥ 1) as the approximation ratio.

1.1.3 Additive Approximation


Note that all the definitions above are about relative approximations; one could
also define additive approximations. 𝒜 is said to be an 𝛼-additive approximation
algorithm if, for all 𝐼, 𝑣𝑎𝑙(𝒜(𝐼), 𝐼) ≤ OPT(𝐼) + 𝛼. Most NPO problems, however,
do not allow any additive approximation ratio because OPT(𝐼) has a scaling
property.

To illustrate the scaling property, let us consider Metric-TSP. Given an instance


𝐼, let 𝐼𝛽 denote the instance obtained by increasing all edge costs by a factor of 𝛽.
It is easy to observe that for each 𝑆 ∈ 𝒮(𝐼) = 𝒮(𝐼𝛽 ), 𝑣𝑎𝑙(𝑆, 𝐼𝛽 ) = 𝛽 · 𝑣𝑎𝑙(𝑆, 𝐼) and
OPT(𝐼𝛽 ) = 𝛽 OPT(𝐼). Intuitively, scaling the edge costs by a factor of 𝛽 scales the value
of every solution by the same factor 𝛽. Thus by choosing 𝛽 sufficiently large, we can essentially make
the additive approximation (or error) negligible.

Lemma 1.2. Metric-TSP does not admit an 𝛼 additive approximation algorithm for
any polynomial-time computable 𝛼 unless 𝑃 = 𝑁𝑃.

Proof. For simplicity, suppose every edge has integer cost. For the sake of
contradiction, suppose there exists an additive 𝛼 approximation 𝒜 for Metric-
TSP. Given 𝐼, we run the algorithm on 𝐼𝛽 and let 𝑆 be the solution, where 𝛽 = 2𝛼.
We claim that 𝑆 is an optimal solution for 𝐼. We have 𝑣𝑎𝑙(𝑆, 𝐼) = 𝑣𝑎𝑙(𝑆, 𝐼𝛽 )/𝛽 ≤
OPT(𝐼𝛽 )/𝛽 + 𝛼/𝛽 = OPT(𝐼) + 1/2, as 𝒜 is an 𝛼-additive approximation. Thus we
conclude that OPT(𝐼) = 𝑣𝑎𝑙(𝑆, 𝐼), since OPT(𝐼) ≤ 𝑣𝑎𝑙(𝑆, 𝐼), and OPT(𝐼), 𝑣𝑎𝑙(𝑆, 𝐼)
are integers. This is impossible unless 𝑃 = 𝑁𝑃. 

Now let us consider two problems which allow additive approximations.


In the Planar Graph Coloring, we are given a planar graph 𝐺 = (𝑉 , 𝐸). We are
asked to color all vertices of the given graph 𝐺 such that for any 𝑣𝑤 ∈ 𝐸, 𝑣 and
𝑤 have different colors. The goal is to minimize the number of different colors.
It is known that deciding whether a planar graph admits a 3-coloring is NP-complete
[143], while one can always color any planar graph 𝐺 using 4 colors (this is
the famous 4-color theorem) [9, 149]. Further, one can efficiently check whether
a graph is 2-colorable (that is, if it is bipartite). Thus, the following algorithm is
a 1-additive approximation for Planar Graph Coloring: If the graph is bipartite,
color it with 2 colors; otherwise, color with 4 colors.
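This algorithm is simple enough to sketch in Python (our sketch, not from the notes): a BFS 2-coloring attempt either succeeds, or certifies that the graph is not bipartite, in which case the 4-color theorem guarantees that 4 colors suffice. We do not implement the (nontrivial, but polynomial-time) procedure that actually produces a 4-coloring; the sketch only reports the number of colors used in that case.

from collections import deque

def planar_coloring_bound(n, adj):
    # adj: adjacency lists of a planar graph on vertices 0, ..., n-1.
    color = [-1] * n
    for s in range(n):
        if color[s] != -1:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]   # alternate colors along the BFS tree
                    queue.append(v)
                elif color[v] == color[u]:
                    # Odd cycle: not bipartite. By the 4-color theorem the graph
                    # is 4-colorable; computing such a coloring is omitted here.
                    return 4, None
    return 2, color   # bipartite: the BFS labels form a proper 2-coloring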
As a second example, consider the Edge Coloring Problem, in which we are
asked to color the edges of a given graph 𝐺 with the minimum number of colors
so that no two adjacent edges receive the same color. By Vizing’s theorem
[154], we know that one can color edges with either Δ(𝐺) or Δ(𝐺) + 1 different
colors, where Δ(𝐺) is the maximum degree of 𝐺. Since Δ(𝐺) is a trivial lower
bound on the minimum number, we can say that the Edge Coloring Problem
allows a 1-additive approximation. Note that the problem of deciding whether
a given graph can be edge colored with Δ(𝐺) colors is NP-complete [85].

1.1.4 Hardness of Approximation


Now we move to hardness of approximation.

Definition 1.2 (Approximability Threshold). Given a minimization problem Π, we say that Π has approximation threshold 𝛼∗ (Π) if, for any 𝜖 > 0, Π admits an (𝛼∗ (Π) + 𝜖)-approximation, but if it admits an (𝛼∗ (Π) − 𝜖)-approximation then 𝑃 = 𝑁𝑃.

If 𝛼∗ (Π) = 1, it implies that Π is solvable in polynomial time. Many NPO


problems Π are known to have 𝛼 ∗ (Π) > 1 assuming that 𝑃 ≠ 𝑁𝑃. We can say
that approximation algorithms try to decrease the upper bound on 𝛼∗ (Π), while
hardness of approximation attempts to increase lower bounds on 𝛼∗ (Π).
To prove hardness results on NPO problems in terms of approximation,
there are broadly two approaches: a direct way via reductions from NP-complete
problems, and an indirect way via gap reductions. Here let us take a quick look
at an example using a reduction from an NP-complete problem.
In the (metric) 𝑘-center problem, we are given an undirected graph 𝐺 = (𝑉 , 𝐸)
and an integer 𝑘. We are asked to choose a subset of 𝑘 vertices from 𝑉
called centers. The goal is to minimize the maximum distance to a center, i.e.
min𝑆⊆𝑉 ,|𝑆|=𝑘 max𝑣∈𝑉 dist𝐺 (𝑣, 𝑆), where dist𝐺 (𝑣, 𝑆) = min𝑢∈𝑆 dist𝐺 (𝑢, 𝑣).
The 𝑘-center problem has approximation threshold 2, since there are a few
2-approximation algorithms for 𝑘-center and there is no 2 − 𝜖 approximation
algorithm for any 𝜖 > 0 unless 𝑃 = 𝑁𝑃. We can prove the inapproximability
using a reduction from the decision version of Dominating Set: Given an
undirected graph 𝐺 = (𝑉 , 𝐸) and an integer 𝑘, does 𝐺 have a dominating set
of size at most 𝑘? A set 𝑆 ⊆ 𝑉 is said to be a dominating set in 𝐺 if for all
𝑣 ∈ 𝑉, 𝑣 ∈ 𝑆 or 𝑣 is adjacent to some 𝑢 in 𝑆. Dominating Set is known to be
NP-complete.

Theorem 1.3 ([88]). Unless 𝑃 = 𝑁𝑃, there is no 2 − 𝜖 approximation for 𝑘-center for
any fixed 𝜖 > 0.

Proof. Let 𝐼 be an instance of the Dominating Set problem consisting of a graph
𝐺 = (𝑉 , 𝐸) and an integer 𝑘. We create an instance 𝐼′ of 𝑘-center keeping the
graph 𝐺 and 𝑘 the same. If 𝐼 has a dominating set of size 𝑘 then OPT(𝐼′) = 1,
since every vertex is reachable from the dominating set by at most one hop.
Otherwise, we claim that OPT(𝐼′) ≥ 2. This is because if OPT(𝐼′) < 2, then every
vertex must be within distance 1 of some center, which implies that the set of 𝑘
centers witnessing OPT(𝐼′) is a dominating set of 𝐼. Therefore, a (2 − 𝜖) approximation
for 𝑘-center can be used to solve the Dominating Set problem. This is impossible,
unless 𝑃 = 𝑁𝑃. 

1.2 Designing Approximation Algorithms


How does one design and more importantly analyze the performance of approx-
imation algorithms? This is a non-trivial task and the main goal of the course
is to expose you to basic and advanced techniques as well as central problems.
The purpose of this section is to give some high-level insights. We start with
how we design polynomial-time algorithms. Note that approximation makes
sense mainly in the setting where one can find a feasible solution relatively
easily but finding an optimum solution is hard. In some cases finding a feasible
solution itself may involve some non-trivial algorithm, in which case it is useful
to properly understand the structural properties that guarantee feasibility, and
then build upon it.
Some of the standard techniques we learn in basic and advanced undergradu-
ate algorithms courses are recursion based methods such as divide and conquer,
dynamic programming, greedy, local search, combinatorial optimization via
duality, and reductions to existing problems. How do we adapt these to the
approximation setting? Note that intractability implies that there are no efficient
characterizations of the optimum solution value.
Greedy and related techniques are often fairly natural for many problems
and simple heuristic algorithms often suggest themselves for many problems.
(Note that the algorithms may depend on being able to solve some existing
problem efficiently. Thus, knowing a good collection of general poly-time
solvable problems is often important.) The main difficulty is in analyzing their
performance. The key challenge here is to identify appropriate lower bounds on
the optimal value (assuming that the problem is a minimization problem) or
upper bounds on the optimal value (assuming that the problem is a maximization
problem). These bounds allow one to compare the output of the algorithm and
prove an approximation bound. In designing poly-time algorithms we often
prove that greedy algorithms do not work. We typically do this via examples.
This skill is also useful in proving that some candidate algorithm does not give a
good approximation. Often the bad examples lead one to a new algorithm.
How does one come up with lower or upper bounds on the optimum value?
This depends on the problem at hand and knowing some background and
related problems. However, one would like to find some automatic ways of
obtaining bounds. This is often provided via linear programming relaxations
and more advanced convex programming methods including semi-definite
programming, lift-and-project hierarchies etc. The basic idea is quite simple.
Since integer linear programming is NP-Complete one can formulate most
discrete optimization problems easily and “naturally” as an integer program.
Note that there may be many different ways of expressing a given problem as
an integer program. Of course we cannot solve the integer program but we

can solve the linear-programming relaxation which is obtained by removing


the integrality constraints on the variables. Thus, for each instance 𝐼 of a given
problem we can obtain an LP relaxation 𝐿𝑃(𝐼) which we can typically solve
in polynomial-time. This automatically gives a bound on the optimum value
since it is a relaxation. How good is this bound? It depends on the problem, of
course, and also the specific LP relaxation. How do we obtain a feasible solution
that is close to the bound given by the LP relaxation? The main technique here
is to round the fractional solution 𝑥 to an integer feasible solution 𝑥′ such that
𝑥′'s value is close to that of 𝑥. There are several non-trivial rounding techniques
that have been developed over the years that we will explore in the course. We
should note that in several cases one can analyze combinatorial algorithms via
LP relaxations even though the LP relaxation does not play any direct role in
the algorithm itself. Finally, there is the question of which LP relaxation to use.
Often it is required to “strengthen” an LP relaxation via addition of constraints
to provide better bounds. There are some automatic ways to strengthen any LP
and often one also needs problem specific ideas.
Local search is another powerful technique and the analysis here is not
obvious. One needs to relate the value of a local optimum to the value of a
global optimum via various exchange properties which define the local search
heuristic. For a formal analysis it is necessary to have a good understanding of
the problem structure.
Finally, dynamic programming plays a key role in the following way. Its
main use is in solving to optimality a restricted version of the given problem or a
subroutine that is useful as a building block. How does one obtain a restricted
version? This is often done by some clever preprocessing of a given instance.
Reductions play a very important role in both designing approximation
algorithms and in proving inapproximability results. Often reductions serve
as a starting point in developing a simple and crude heuristic that allows
one to understand the structure of a problem which then can lead to further
improvements.
Discrete optimization problems are brittle — changing the problem a little can
lead to substantial changes in the complexity and approximability. Nevertheless
it is useful to understand problems and their structure in broad categories
so that existing results can be leveraged quickly and robustly. Thus, some of
the emphasis in the course will be on classifying problems and how various
parameters influence the approximability.
Chapter 2

Covering Problems

Part of these notes were scribed by Abul Hassan Samee and Lewis Tseng.
Packing and Covering problems together capture many important problems
in combinatorial optimization. We will discuss several covering problems in
this chapter. Two canonical problems are Minimum Vertex Cover and
its generalization Minimum Set Cover. (Typically we will omit the use of the
qualifiers minimum and maximum since this is often clear from the definition
of the problem and the context.) They play an important role in the study of
approximation algorithms.
A vertex cover in an undirected graph 𝐺 = (𝑉 , 𝐸) is a set 𝑆 ⊆ 𝑉 of vertices
such that for each edge 𝑒 ∈ 𝐸, at least one of its end points is in 𝑆. It is also
called a node cover. In the Vertex Cover problem, our goal is to find a smallest
vertex cover of 𝐺. In the weighted version of the problem, a weight function
𝑤 : 𝑉 → ℛ + is given, and our goal is to find a minimum weight vertex cover of
𝐺. The unweighted version of the problem is also known as Cardinality Vertex
Cover. Note that we are picking vertices to cover the edges. Vertex Cover is
NP-Hard and is one of the problems on Karp's original list of NP-complete problems.
In the Set Cover problem the input is a set 𝒰 of 𝑛 elements, and a collection
𝒮 = {𝑆1 , 𝑆2 , . . . , 𝑆𝑚 } of 𝑚 subsets of 𝒰 such that ∪𝑖 𝑆 𝑖 = 𝒰. Our goal in the
Set Cover problem is to select as few subsets as possible from 𝒮 such that their
union covers 𝒰. In the weighted version each set 𝑆 𝑖 has a non-negative weight
𝑤 𝑖 and the goal is to find a set cover of minimum weight. Closely related to the Set
Cover problem is the Maximum Coverage problem. In this problem the input is
again 𝒰 and 𝒮 but we are also given an integer 𝑘 ≤ 𝑚. The goal is to select 𝑘
subsets from 𝒮 such that their union has the maximum cardinality. Note that Set
Cover is a minimization problem while Maximum Coverage is a maximization
problem. Set Cover is essentially equivalent to the Hitting Set problem. In
Hitting Set the input is 𝒰 and 𝒮 but the goal is to pick the smallest number of


elements of 𝒰 that cover the given sets in 𝒮. In other words we are seeking a
set cover in the dual set system. It is easy to see that Vertex Cover is a special case of
Set Cover.
Set Cover is an important problem in discrete optimization. In the
standard definition the set system is given explicitly. In many applications the
set system is implicit, and often exponential in the explicit part of the input;
nevertheless such set systems are ubiquitous and one can often obtain exact
or approximation algorithms. As an example consider the well known MST
problem in graphs. One way to phrase MST is the following: given an edge-
weighted graph 𝐺 = (𝑉 , 𝐸) find a minimum cost subset of the edges that cover
all the cuts of 𝐺; by cover a cut 𝑆 ⊆ 𝑉 we mean that at least one of the edges in
𝛿(𝑆) must be chosen. This may appear to be a strange way of looking at the MST
problem but this view is useful as we will see later. Another implicit example
is the following. Suppose we are given 𝑛 rectangles in the plane and the goal
is to choose a minimum number of points in the plane such that each input
rectangle contains one of the chosen points. This is perhaps more natural to
view as a special case of the Hitting Set problem. In principle the set of points
that we can choose from is infinite but it can be seen that we can confine our
attention to vertices in the arrangement of the given rectangles and it is easy to
see that there are only 𝑂(𝑛²) vertices; however, explicitly computing them may
be expensive and one may want to treat the problem as an implicit one for the
sake of efficiency.
Covering problems have the feature that a superset of a feasible solution is
also a feasible solution. More abstractly one can cast covering problems as the
following. We are given a finite ground set 𝑉 (vertices in a graph or sets in a
set system) and a family of feasible solutions ℐ ⊆ 2^𝑉 where ℐ is upward closed;
by this we mean that if 𝐴 ∈ ℐ and 𝐴 ⊂ 𝐵 then 𝐵 ∈ ℐ. The goal is to find the
smallest cardinality set 𝐴 in ℐ. In the weighted case 𝑉 has weights and the goal
is to find a minimum weight set in ℐ. In some cases one can also consider more
complex non-additive objectives that assign a cost 𝑐(𝑆) for each 𝑆 ∈ ℐ.

2.1 Greedy for Set Cover and Maximum Coverage


In this section we consider the unweighted version of Set Cover.

2.1.1 Greedy Algorithm


A natural greedy approximation algorithm for these problems is easy to come
up with.

Greedy Cover(𝒰 , 𝒮 )

1. repeat

A. pick the set that covers the maximum number of uncovered elements
B. mark elements in the chosen set as covered

2. until done

In the case of Set Cover, the algorithm Greedy Cover is done when all the
elements of 𝒰 have been covered. In the case of Maximum Coverage, the
algorithm is done when exactly 𝑘 subsets have been selected from 𝒮.
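The following is a minimal Python sketch of Greedy Cover (our illustration), assuming the set system is given explicitly as Python sets; the same loop serves both problems by changing the stopping rule.

def greedy_cover(universe, sets, k=None):
    # If k is None, run until everything is covered (Set Cover);
    # otherwise stop after k picks (Maximum Coverage).
    covered, chosen = set(), []
    while covered != universe and (k is None or len(chosen) < k):
        # pick the set covering the maximum number of uncovered elements
        j = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not (sets[j] - covered):
            break                      # no set covers any new element
        chosen.append(j)
        covered |= sets[j]
    return chosen, covered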
We will prove the following theorem.
Theorem 2.1. Greedy Cover is a (1 − (1 − 1/𝑘)^𝑘) ≥ (1 − 1/𝑒) ≈ 0.632 approximation
for Maximum Coverage, and a (ln 𝑛 + 1) approximation for Set Cover.
The following theorem due to Feige [56] implies that Greedy Cover is
essentially the best possible in terms of the approximation ratio that it guarantees.

Theorem 2.2. Unless NP ⊆ DTIME(𝑛^{𝑂(log log 𝑛)}), there is no (1 − 𝑜(1)) ln 𝑛 approximation for Set Cover. Unless P = NP, for any fixed 𝜖 > 0, there is no (1 − 1/𝑒 + 𝜖)
approximation for Maximum Coverage.
Recently the preceding theorem has been strengthened so that the hardness
holds under the assumption that 𝑁𝑃 ≠ 𝑃 [124].

2.1.2 Analysis of Greedy Cover


We proceed towards the proof of Theorem 2.1 by providing an analysis of Greedy
Cover separately for Set Cover and Maximum Coverage.

Analysis for Maximum Coverage


Let OPT denote the value of an optimal solution to the Maximum Coverage
problem; this is the maximum number of elements that are covered by 𝑘 sets in
the given set system. Let 𝑥 𝑖 denote the number of new elements covered by the
𝑖-th set chosen by Greedy Cover. Also, let 𝑦 𝑖 = ∑_{𝑗=1}^{𝑖} 𝑥 𝑗 be the total number of
elements covered after 𝑖 iterations, and 𝑧 𝑖 = OPT − 𝑦 𝑖 . Note that, according to
our notation, 𝑦0 = 0, 𝑦 𝑘 is the number of elements covered by Greedy Cover at
the end of the algorithm, and 𝑧 0 = OPT. The key to the analysis is the following
simple claim.
simple claim.

Claim 2.1.1. For 0 ≤ 𝑖 < 𝑘, 𝑥 𝑖+1 ≥ 𝑧 𝑖 /𝑘.
Proof. Let 𝐹 ∗ ⊆ 𝒰 be the set of elements covered by some fixed optimum solution; we
have |𝐹 ∗ | = OPT. Consider iteration 𝑖 + 1. Greedy Cover selects the subset 𝑆 𝑗
whose inclusion covers the maximum number of uncovered elements. Since 𝑦 𝑖 is
the total number of elements covered up to iteration 𝑖, at least OPT − 𝑦 𝑖 = 𝑧 𝑖 elements
from 𝐹 ∗ are uncovered. Let 𝐹 ∗𝑖 be the set of uncovered elements from 𝐹 ∗ at the end of
iteration 𝑖; thus |𝐹 ∗𝑖 | ≥ 𝑧 𝑖 . Since 𝑘 sets together cover 𝐹 ∗ , and hence 𝐹 ∗𝑖 as well, there must
be some set in that collection of 𝑘 sets that covers at least |𝐹 ∗𝑖 |/𝑘 of these elements. This is
a candidate set that can be chosen in iteration 𝑖 + 1. Since the algorithm picks
the set that covers the maximum number of uncovered elements, the chosen
set in iteration 𝑖 + 1 covers at least |𝐹 ∗𝑖 |/𝑘 ≥ 𝑧 𝑖 /𝑘 uncovered elements. Hence,
𝑥 𝑖+1 ≥ 𝑧 𝑖 /𝑘. 
Remark 2.1. It is tempting to make the stronger claim that 𝑥 𝑖+1 ≥ 𝑧 𝑖 /(𝑘 − 𝑖). This is
however false, and it is worthwhile to come up with an example.
By definition we have 𝑦 𝑘 = 𝑥 1 + 𝑥 2 + . . . + 𝑥 𝑘 is the total number of elements
covered by Greedy Cover. To analyze the worst-case we want to make this sum
as small as possible given the preceding claim. Heuristically (which one can
formalize), one can argue that choosing 𝑥 𝑖+1 = 𝑧 𝑖 /𝑘 minimizes the sum. Using
this one can argue that the sum is at least (1 − (1 − 1/𝑘) 𝑘 ) OPT. We give a formal
argument now.
Claim 2.1.2. For 𝑖 ≥ 0, 𝑧 𝑖 ≤ (1 − 1/𝑘)^𝑖 · OPT.

Proof. By induction on 𝑖. The claim is trivially true for 𝑖 = 0 since 𝑧 0 = OPT. We
assume inductively that 𝑧 𝑖 ≤ (1 − 1/𝑘)^𝑖 · OPT. Then

    𝑧 𝑖+1 = 𝑧 𝑖 − 𝑥 𝑖+1
          ≤ 𝑧 𝑖 (1 − 1/𝑘)            [using Claim 2.1.1]
          ≤ (1 − 1/𝑘)^{𝑖+1} · OPT .

The preceding claims yield the following lemma for algorithm Greedy Cover
when applied on Maximum Coverage.
Lemma 2.1. Greedy Cover is a (1 − (1 − 1/𝑘)^𝑘) ≥ (1 − 1/𝑒) approximation for Maximum
Coverage.

Proof. It follows from Claim 2.1.2 that 𝑧 𝑘 ≤ (1 − 1/𝑘)^𝑘 · OPT ≤ OPT /𝑒. Hence,
𝑦 𝑘 = OPT − 𝑧 𝑘 ≥ (1 − 1/𝑒) · OPT. 

We note that (1 − 1/𝑒) ≈ 0.632.

Analysis for Set Cover


Let 𝑘 ∗ denote the value of an optimal solution to the Set Cover problem. Then the
optimal value of the Maximum Coverage problem with the same set system
and 𝑘 = 𝑘 ∗ would be 𝑛 = |𝒰 | since it is possible to cover all 𝑛 elements of
𝒰 with 𝑘 ∗ sets. From our previous analysis, 𝑧 𝑘 ∗ ≤ 𝑛/𝑒. Therefore, at most
𝑛/𝑒 elements remain uncovered after the first 𝑘 ∗ steps of Greedy Cover.
Similarly, after 2𝑘 ∗ steps of Greedy Cover, at most 𝑛/𝑒² elements remain
uncovered. This intuition suggests that Greedy Cover is a (ln 𝑛 + 1)
approximation for the Set Cover problem. A more formal proof is given below.

Lemma 2.2. Greedy Cover is a (ln 𝑛 + 1) approximation for Set Cover.


Proof. Since 𝑧 𝑖 ≤ (1 − 1/𝑘 ∗)^𝑖 · 𝑛, after 𝑡 = ⌈𝑘 ∗ ln(𝑛/𝑘 ∗)⌉ steps,

    𝑧 𝑡 ≤ 𝑛(1 − 1/𝑘 ∗)^{𝑘 ∗ ln(𝑛/𝑘 ∗)} ≤ 𝑛 𝑒^{−ln(𝑛/𝑘 ∗)} = 𝑘 ∗ .

Thus, after 𝑡 steps, at most 𝑘 ∗ elements are left to be covered. Since Greedy Cover
covers at least one new element in each step, it covers all the elements after picking at
most ⌈𝑘 ∗ ln(𝑛/𝑘 ∗)⌉ + 𝑘 ∗ ≤ 𝑘 ∗ (ln 𝑛 + 1) sets. 
A useful special case of Set Cover is when all sets are “small”. Does the
approximation bound for Greedy improve? We can prove the following corollary
via Lemma 2.2.

Corollary 2.3. If each set in the set system has at most 𝑑 elements, then Greedy Cover
is a (ln 𝑑 + 1) approximation for Set Cover.
Proof. If each set has at most 𝑑 elements then 𝑘 ∗ ≥ 𝑛/𝑑 and hence
ln(𝑛/𝑘 ∗) ≤ ln 𝑑. Then the claim follows from the proof of Lemma 2.2. 
Theorem 2.1 follows directly from Lemmas 2.1 and 2.2.
A near-tight example for Greedy Cover when applied on Set Cover : Let
us consider a set 𝒰 of 𝑛 elements along with a collection 𝒮 of 𝑘 + 2 subsets
{𝑅1 , 𝑅2 , 𝐶1 , 𝐶2 , . . . , 𝐶 𝑘 } of 𝒰. Let us also assume that |𝐶 𝑖 | = 2^𝑖 and |𝑅1 ∩ 𝐶 𝑖 | =
|𝑅2 ∩ 𝐶 𝑖 | = 2^{𝑖−1} (1 ≤ 𝑖 ≤ 𝑘), as illustrated in Fig. 2.1.
Clearly, the optimal solution consists of only two sets, i.e., 𝑅1 and 𝑅2 . Hence,
OPT = 2. However, Greedy Cover will pick each of the remaining 𝑘 sets, namely
𝐶 𝑘 , 𝐶 𝑘−1 , . . . , 𝐶1 . Since 𝑛 = 2 · ∑_{𝑖=0}^{𝑘−1} 2^𝑖 = 2 · (2^𝑘 − 1), we get 𝑘 = Ω(log 𝑛). One
can construct tighter examples with more involved analysis.

[Figure: sets 𝑅1 and 𝑅2 drawn as rows and 𝐶1 , 𝐶2 , . . . , 𝐶 𝑘 as columns; each 𝐶 𝑖 contributes 2^{𝑖−1} elements to each of 𝑅1 and 𝑅2 .]

Figure 2.1: A near-tight example for Greedy Cover when applied on Set Cover
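A small Python check (ours, not part of the notes) that builds this instance and runs the greedy rule; with strict comparisons greedy picks 𝐶 𝑘 , 𝐶 𝑘−1 , . . . , 𝐶1 even though {𝑅1 , 𝑅2 } is optimal.

def near_tight_instance(k):
    # Elements are pairs (i, j): C_i = {(i, j) : 0 <= j < 2**i};
    # R_1 takes the first half of each C_i and R_2 the second half.
    C = [set((i, j) for j in range(2 ** i)) for i in range(1, k + 1)]
    R1 = set((i, j) for i in range(1, k + 1) for j in range(2 ** (i - 1)))
    R2 = set((i, j) for i in range(1, k + 1) for j in range(2 ** (i - 1), 2 ** i))
    return R1 | R2, [R1, R2] + C

universe, sets = near_tight_instance(6)
covered, picked = set(), []
while covered != universe:
    j = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
    picked.append(j)
    covered |= sets[j]
print(len(picked))   # prints k = 6, while OPT = 2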

Exercise 2.1. Consider the weighted version of the Set Cover problem where a
weight function 𝑤 : 𝒮 → ℛ+ is given, and we want to select a collection 𝒮′ of
subsets from 𝒮 such that ∪_{𝑋∈𝒮′} 𝑋 = 𝒰 and ∑_{𝑋∈𝒮′} 𝑤(𝑋) is minimized. One
can generalize the greedy heuristic in the natural fashion: in each iteration
the algorithm picks the set that maximizes the ratio of the number of new elements
it covers to its weight. Adapt the unweighted analysis to prove that the greedy algorithm
yields an 𝑂(ln 𝑛) approximation for the weighted version (you can be sloppy
with the constant in front of ln 𝑛).

2.1.3 Dominating Set


A dominating set in a graph 𝐺 = (𝑉 , 𝐸) is a set 𝑆 ⊆ 𝑉 such that for each 𝑢 ∈ 𝑉,
either 𝑢 ∈ 𝑆, or some neighbor 𝑣 of 𝑢 is in 𝑆. In other words 𝑆 covers/dominates
all the nodes in 𝑉. In the Dominating Set problem, the input is a graph 𝐺 and
the goal is to find a smallest sized dominating set in 𝐺.

Exercise 2.2. 1. Show that Dominating Set is a special case of Set Cover.

2. What is the greedy heuristic when applied to Dominating Set? Prove that it
yields an (ln (Δ + 1) + 1) approximation where Δ is the maximum degree
in the graph.

3. Show that Set Cover can be reduced in an approximation preserving


fashion to Dominating Set. More formally, show that if Dominating Set
has an 𝛼(𝑛)-approximation where 𝑛 is the number of vertices in the given
instance then Set Cover has an (1 − 𝑜(1))𝛼(𝑛)-approximation.

2.2 Vertex Cover


We have already seen that the Vertex Cover problem is a special case of the
Set Cover problem. The Greedy algorithm when specialized to Vertex Cover
picks a highest degree vertex, removes it and the covered edges from the graph,
and recurses in the remaining graph. It follows that the Greedy algorithm
gives an 𝑂(ln Δ + 1) approximation for the unweighted version of the Vertex
Cover problem. One can wonder whether the Greedy algorithm has a better worst-case
guarantee for Vertex Cover than this analysis suggests. Unfortunately the answer is
negative and there are examples where the algorithm outputs a solution with
Ω(log 𝑛 · OPT) vertices.
We sketch the construction. Consider a bipartite graph 𝐺 = (𝑈 , 𝑉 , 𝐸) where
𝑈 = {𝑢1 , 𝑢2 , . . . , 𝑢 ℎ }. 𝑉 is partitioned into 𝑆1 , 𝑆2 , . . . , 𝑆 ℎ where 𝑆 𝑖 has ⌊ℎ/𝑖⌋
vertices. Each vertex 𝑣 in 𝑆 𝑖 is connected to exactly 𝑖 distinct vertices of 𝑈; thus,
any vertex 𝑢 𝑗 is incident to at most one edge from 𝑆 𝑖 . It can be seen that the
degree of each vertex 𝑢 𝑗 ∈ 𝑈 is roughly ℎ. Clearly 𝑈 is a vertex cover of 𝐺 since
the graph is bipartite. Convince yourself that the Greedy algorithm will pick
all of 𝑉 starting with the lone vertex in 𝑆 ℎ (one may need to break ties to make
this happen but the example can be easily perturbed to make this unnecessary).
We have 𝑛 = Θ(ℎ log ℎ) and 𝑂𝑃𝑇 ≤ ℎ and Greedy outputs a solution of size
Ω(ℎ log ℎ).

2.2.1 A 2-approximation for Vertex Cover


There is a very simple 2-approximation algorithm for the Cardinality Vertex
Cover problem.

Matching-VC(𝐺 )

1. 𝑆 ← ∅

2. Compute a maximal matching 𝑀 in 𝐺

3. for each edge (𝑢, 𝑣) ∈ 𝑀 do

A. add both 𝑢 and 𝑣 to 𝑆

4. Output 𝑆

Theorem 2.4. Matching-VC is a 2-approximation algorithm.

The proof of Theorem 2.4 follows from two simple claims.



Claim 2.2.1. Let OPT be the size of the vertex cover in an optimal solution. Then
OPT ≥ |𝑀|.

Proof. Any vertex cover must contain at least one end point of each edge in 𝑀
since no two edges in 𝑀 intersect. Hence OPT ≥ |𝑀 |. 
Claim 2.2.2. Let 𝑆(𝑀) = {𝑢, 𝑣|(𝑢, 𝑣) ∈ 𝑀}. Then 𝑆(𝑀) is a vertex cover.

Proof. If 𝑆(𝑀) is not a vertex cover, then there must be an edge 𝑒 ∈ 𝐸 such that
neither of its endpoints is in 𝑆(𝑀). But then 𝑒 can be added to 𝑀, which
contradicts the maximality of 𝑀. 
We now finish the proof of Theorem 2.4. Since 𝑆(𝑀) is a vertex cover,
Claim 2.2.1 implies that |𝑆(𝑀)| = 2 · |𝑀| ≤ 2 · OPT.
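A minimal Python sketch of Matching-VC (ours), with the graph given as a list of edges; the vertices added are exactly the endpoints of a greedily built maximal matching.

def matching_vc(edges):
    cover = set()
    for (u, v) in edges:
        if u not in cover and v not in cover:
            # (u, v) extends the maximal matching; take both endpoints
            cover.update((u, v))
    return cover   # a vertex cover of size at most 2 * OPT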
Weighted Vertex Cover: The matching based heuristic does not generalize in
a straightforward fashion to the weighted case but 2-approximation algorithms
for the Weighted Vertex Cover problem can be designed based on LP rounding.

2.2.2 Set Cover with small frequencies


Vertex Cover is an instance of Set Cover where each element in 𝒰 is in at most
two sets (in fact, each element is in exactly two sets). This special case of the
Set Cover problem admits a 2-approximation algorithm. What would be the
case if every element is contained in at most three sets? More generally, given
an instance of Set Cover, for each 𝑒 ∈ 𝒰, let 𝑓 (𝑒) denote the number of sets
containing 𝑒. Let 𝑓 = max𝑒 𝑓 (𝑒), which we call the maximum frequency.

Exercise 2.3. Give an 𝑓 -approximation for Set Cover, where 𝑓 is the maximum
frequency of an element. Hint: Follow the approach used for Vertex Cover .

2.3 Vertex Cover via LP


Let 𝐺 = (𝑉 , 𝐸) be an undirected graph with vertex weights 𝑤 : 𝑉 → ℛ+ . We can
formulate Vertex Cover as an integer linear programming problem as follows.
For each vertex 𝑣 we have a variable 𝑥 𝑣 , interpreted as follows: 𝑥 𝑣 = 1 if 𝑣 is
chosen to be included in the vertex cover, and 𝑥 𝑣 = 0 otherwise. With this
interpretation we can easily see that the minimum weight vertex cover can be
formulated as the following integer linear program.

    min ∑_{𝑣∈𝑉} 𝑤 𝑣 𝑥 𝑣
    subject to
        𝑥 𝑢 + 𝑥 𝑣 ≥ 1      ∀𝑒 = (𝑢, 𝑣) ∈ 𝐸
        𝑥 𝑣 ∈ {0, 1}       ∀𝑣 ∈ 𝑉

However, solving the preceding integer linear program is NP-Hard since it


would solve Vertex Cover exactly. Therefore we use Linear Programming (LP)
to approximate the optimal solution, OPT(𝐼), for the integer program. First, we
can relax the constraint 𝑥 𝑣 ∈ {0, 1} to 𝑥 𝑣 ∈ [0, 1]. It can be further simplified to
𝑥 𝑣 ≥ 0, ∀𝑣 ∈ 𝑉.
Thus, a linear programming formulation for Vertex Cover is:
    min ∑_{𝑣∈𝑉} 𝑤 𝑣 𝑥 𝑣
    subject to
        𝑥 𝑢 + 𝑥 𝑣 ≥ 1      ∀𝑒 = (𝑢, 𝑣) ∈ 𝐸
        𝑥 𝑣 ≥ 0            ∀𝑣 ∈ 𝑉

We now use the following algorithm:

Vertex Cover via LP

1. Solve LP to obtain an optimal fractional solution 𝑥 ∗


2. Let 𝑆 = {𝑣 | 𝑥 ∗𝑣 ≥ 1/2}
3. Output 𝑆

Claim 2.3.1. 𝑆 is a vertex cover.


Proof. Consider any edge 𝑒 = (𝑢, 𝑣). By feasibility of 𝑥 ∗ , 𝑥 ∗𝑢 + 𝑥 ∗𝑣 ≥ 1, and thus
𝑥 ∗𝑢 ≥ 1/2 or 𝑥 ∗𝑣 ≥ 1/2. Therefore, at least one of 𝑢 and 𝑣 will be in 𝑆. 
Claim 2.3.2. 𝑤(𝑆) ≤ 2 OPT𝐿𝑃 (𝐼).

Proof. OPT𝐿𝑃 (𝐼) = ∑_{𝑣} 𝑤 𝑣 𝑥 ∗𝑣 ≥ (1/2) ∑_{𝑣∈𝑆} 𝑤 𝑣 = 𝑤(𝑆)/2. 

Therefore, OPT𝐿𝑃 (𝐼) ≥ OPT(𝐼)/2 for all instances 𝐼.
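Here is a sketch of this rounding in Python (ours), assuming SciPy's linprog is available; the covering constraint 𝑥 𝑢 + 𝑥 𝑣 ≥ 1 is passed to the solver as −𝑥 𝑢 − 𝑥 𝑣 ≤ −1.

import numpy as np
from scipy.optimize import linprog

def lp_round_vertex_cover(n, edges, w):
    # LP relaxation: min w.x subject to x_u + x_v >= 1 for every edge, 0 <= x <= 1.
    A = np.zeros((len(edges), n))
    for r, (u, v) in enumerate(edges):
        A[r, u] = A[r, v] = -1.0              # encodes -x_u - x_v <= -1
    res = linprog(c=w, A_ub=A, b_ub=-np.ones(len(edges)),
                  bounds=[(0, 1)] * n, method="highs")
    x = res.x
    # Round: take every vertex whose LP value is at least 1/2.
    return [v for v in range(n) if x[v] >= 0.5]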
Remark 2.2. For minimization problems: OPT𝐿𝑃 (𝐼) ≤ OPT(𝐼), where OPT𝐿𝑃 (𝐼)
is the optimal solution found by LP; for maximization problems, OPT𝐿𝑃 (𝐼) ≥
OPT(𝐼).

Integrality Gap: We introduce the notion of integrality gap to show the best
approximation guarantee we can obtain if we only use the LP values as a lower
bound.

Definition 2.5. For a minimization problem Π, the integrality gap of a linear programming relaxation/formulation 𝐿𝑃 for Π is sup_{𝐼∈Π} OPT(𝐼)/OPT𝐿𝑃 (𝐼).

That is, the integrality gap is the worst case ratio, over all instances 𝐼 of Π, of
the integral optimal value and the fractional optimal value. Note that different
linear programming formulations for the same problem may have different
integrality gaps.
Claims 2.3.1 and 2.3.2 show that the integrality gap of the Vertex Cover LP
formulation above is at most 2.
Question: Is this bound tight for the Vertex Cover LP?
Consider the following example: take the complete graph 𝐾 𝑛 on 𝑛 vertices,
where each vertex has 𝑤 𝑣 = 1. It is clear that we have to choose 𝑛 − 1 vertices to
cover all the edges. Thus, OPT(𝐾 𝑛 ) = 𝑛 − 1. However, 𝑥 𝑣 = 1/2 for each 𝑣 is a
feasible solution to the LP, which has a total weight of 𝑛/2. So the gap is 2 − 2/𝑛, which
tends to 2 as 𝑛 → ∞. One can also prove that the integrality gap is essentially 2
even in a class of sparse graphs.

Exercise 2.4. The vertex cover problem can be solved optimally in polynomial
time in bipartite graphs. In fact the LP is integral. Prove this via the maxflow-
mincut theorem and the integrality of flows when capacities are integral.

Other Results on Vertex Cover


1. The current best approximation ratio for Vertex Cover is 2 − Θ(1/√log 𝑛) [98].

2. It is known that unless 𝑃 = 𝑁𝑃 there is no 𝛼-approximation for Vertex Cover
for any 𝛼 < 1.36 [50]. Under a stronger hypothesis called the Unique Games
Conjecture it is known that there is no 2 − 𝜖 approximation for any fixed
𝜖 > 0 [103].

3. There is a polynomial time approximation scheme (PTAS), that is a (1 + 𝜖)-


approximation for any fixed 𝜖 > 0, for planar graphs. This follows from a
general approach due to Baker [17]. The theorem extends to more general
classes of graphs.

2.4 Set Cover via LP


The input to the Set Cover problem consists of a finite set 𝑈 = {1, 2, ..., 𝑛}, and
𝑚 subsets of 𝑈, 𝑆1 , 𝑆2 , ..., 𝑆𝑚 . Each set 𝑆 𝑗 has a non-negative weight 𝑤 𝑗 and the
goal is to find the minimum weight collection of sets which cover all elements in
𝑈 (in other words their union is 𝑈).
A linear programming relaxation for Set Cover is:
    min ∑_{𝑗} 𝑤 𝑗 𝑥 𝑗
    subject to
        ∑_{𝑗: 𝑖∈𝑆 𝑗} 𝑥 𝑗 ≥ 1      ∀𝑖 ∈ {1, 2, ..., 𝑛}
        𝑥 𝑗 ≥ 0                   1 ≤ 𝑗 ≤ 𝑚

And its dual is:


    max ∑_{𝑖=1}^{𝑛} 𝑦 𝑖
    subject to
        ∑_{𝑖∈𝑆 𝑗} 𝑦 𝑖 ≤ 𝑤 𝑗      ∀𝑗 ∈ {1, 2, ..., 𝑚}
        𝑦 𝑖 ≥ 0                  ∀𝑖 ∈ {1, 2, ..., 𝑛}

We give several algorithms for Set Cover based on this primal/dual pair of LPs.

2.4.1 Deterministic Rounding

Set Cover via LP

1. Solve the LP to obtain an optimal (possibly fractional) solution 𝑥 ∗

2. Let 𝑃 = {𝑗 | 𝑥 ∗𝑗 > 0}

3. Output {𝑆 𝑗 | 𝑗 ∈ 𝑃}

Note that the above algorithm, even when specialized to Vertex Cover is
different from the one we saw earlier. It includes all sets which have a strictly
positive value in an optimum solution to the LP.

Let 𝑥 ∗ be an optimal solution to the primal LP, let 𝑦 ∗ be an optimal solution to the
dual, and let 𝑃 = {𝑗 | 𝑥 ∗𝑗 > 0}. First, note that by strong duality, ∑_{𝑗} 𝑤 𝑗 𝑥 ∗𝑗 = ∑_{𝑖} 𝑦 ∗𝑖 .
Second, by complementary slackness, if 𝑥 ∗𝑗 > 0 then the corresponding dual
constraint is tight, that is ∑_{𝑖∈𝑆 𝑗} 𝑦 ∗𝑖 = 𝑤 𝑗 .

Claim 2.4.1. The output of the algorithm is a feasible set cover for the given instance.

Proof. Exercise. 
Claim 2.4.2. ∑_{𝑗∈𝑃} 𝑤 𝑗 ≤ 𝑓 ∑_{𝑗} 𝑤 𝑗 𝑥 ∗𝑗 = 𝑓 · OPT𝐿𝑃 .

Proof.

    ∑_{𝑗∈𝑃} 𝑤 𝑗 = ∑_{𝑗: 𝑥 ∗𝑗 >0} 𝑤 𝑗 = ∑_{𝑗: 𝑥 ∗𝑗 >0} ( ∑_{𝑖∈𝑆 𝑗} 𝑦 ∗𝑖 ) = ∑_{𝑖} 𝑦 ∗𝑖 ( ∑_{𝑗: 𝑖∈𝑆 𝑗 , 𝑥 ∗𝑗 >0} 1 ) ≤ 𝑓 ∑_{𝑖} 𝑦 ∗𝑖 ≤ 𝑓 OPT𝐿𝑃 (𝐼). 

Notice that the second equality is due to the complementary slackness
conditions (if 𝑥 ∗𝑗 > 0, the corresponding dual constraint is tight), the penultimate
inequality uses the definition of 𝑓 , and the last inequality follows from weak
duality (a feasible solution to the dual problem is a lower bound on the optimal
primal value).
Therefore we have that the algorithm outputs a cover of weight at most
𝑓 OPT𝐿𝑃 . We note that 𝑓 can be as large as 𝑛 in which case the bound given by
the algorithm is quite weak. In fact, it is not hard to construct examples that
demonstrate the tightness of the analysis.
Remark 2.3. The analysis crucially uses the fact that 𝑥 ∗ is an optimal solution.
On the other hand the algorithm for Vertex Cover is more robust and works
with any feasible solution 𝑥. It is easy to generalize the earlier rounding for
Vertex Cover to obtain an 𝑓 -approximation. The point of the above rounding is
to illustrate the utility of complementary slackness.

2.4.2 Randomized Rounding


Now we describe a different rounding that yields an approximation bound that
does not depend on 𝑓 .

Set Cover via Randomized Rounding

1. 𝐴 = ∅, and let 𝑥 ∗ be an optimal solution to the LP

2. for 𝑘 = 1 to 2 ln 𝑛 do

A. pick each 𝑆 𝑗 independently with probability 𝑥 ∗𝑗


B. if 𝑆 𝑗 is picked, 𝐴 = 𝐴 ∪ {𝑗}

3. Output the sets with indices in 𝐴
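A minimal sketch of the rounding loop in Python (ours), taking an optimal LP solution 𝑥 ∗ as input (computed separately, e.g. with an LP solver); the caller checks feasibility and cost and repeats if necessary.

import math, random

def randomized_round(sets, x_star, n):
    # Independently pick set j with probability x*_j in each of 2 ln n rounds.
    A = set()
    for _ in range(math.ceil(2 * math.log(n))):
        for j, xj in enumerate(x_star):
            if random.random() < xj:
                A.add(j)
    covered = set().union(*(sets[j] for j in A)) if A else set()
    return A, covered   # caller verifies coverage and cost, re-runs if needed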

Claim 2.4.3. P[𝑖 is not covered in an iteration] = ∏_{𝑗: 𝑖∈𝑆 𝑗} (1 − 𝑥 ∗𝑗 ) ≤ 1/𝑒.

Intuition: We know that ∑_{𝑗: 𝑖∈𝑆 𝑗} 𝑥 ∗𝑗 ≥ 1. Subject to this constraint, if we
want to minimize the probability that element 𝑖 is covered, one can see that the
minimum is achieved with 𝑥 ∗𝑗 = 1/ℓ for each set 𝑆 𝑗 that covers 𝑖; here ℓ is the
number of sets that cover 𝑖. Then the probability that 𝑖 is not covered is (1 − 1/ℓ)^ℓ .

Proof. We use the inequality (1 − 𝑥) ≤ 𝑒 −𝑥 for all 𝑥 ∈ [0, 1].


    P[𝑖 is not covered in an iteration] = ∏_{𝑗: 𝑖∈𝑆 𝑗} (1 − 𝑥 ∗𝑗 ) ≤ ∏_{𝑗: 𝑖∈𝑆 𝑗} 𝑒^{−𝑥 ∗𝑗} = 𝑒^{−∑_{𝑗: 𝑖∈𝑆 𝑗} 𝑥 ∗𝑗} ≤ 𝑒^{−1} = 1/𝑒 .


We then obtain the following corollaries:

Corollary 2.6. P[𝑖 is not covered at the end of the algorithm] ≤ 𝑒^{−2 ln 𝑛} = 1/𝑛² .

Corollary 2.7. P[all elements are covered after the algorithm stops] ≥ 1 − 1/𝑛 .

Proof. Via the union bound. The probability that 𝑖 is not covered is at most
1/𝑛² , hence the probability that there is some 𝑖 that is not covered is at most
𝑛 · 1/𝑛² ≤ 1/𝑛. 
Now we bound the expected cost of the algorithm. Let 𝐶𝑡 be the cost of the sets
picked in iteration 𝑡; then E[𝐶𝑡 ] = ∑_{𝑗=1}^{𝑚} 𝑤 𝑗 𝑥 ∗𝑗 , where E[𝑋] denotes the
expectation of a random variable 𝑋. Then, let 𝐶 = ∑_{𝑡=1}^{2 ln 𝑛} 𝐶𝑡 ; we have
E[𝐶] = ∑_{𝑡=1}^{2 ln 𝑛} E[𝐶𝑡 ] ≤ 2 ln 𝑛 · OPT𝐿𝑃 . By Markov’s inequality, P[𝐶 > 2 E[𝐶]] ≤ 1/2, hence
P[𝐶 ≤ 4 ln 𝑛 · OPT𝐿𝑃 ] ≥ 1/2. Therefore, P[𝐶 ≤ 4 ln 𝑛 · OPT𝐿𝑃 and all elements are covered] ≥
1/2 − 1/𝑛. Thus, the randomized rounding algorithm, with probability close to 1/2

succeeds in giving a feasible solution of cost 𝑂(log 𝑛) OPT𝐿𝑃 . Note that we can
check whether the solution satisfies the desired properties (feasibility and cost)
and repeat the algorithm if it does not.

1. We can check if the solution after rounding satisfies the desired properties,


such as all elements are covered, or cost at most 2𝑐 log 𝑛 OPT𝐿𝑃 . If not,
repeat rounding. Expected number of iterations to succeed is a constant.

2. We can also use Chernoff bounds (large deviation bounds) to show that
a single rounding succeeds with high probability (probability at least
1 − 1/poly(𝑛)).

3. The algorithm can be derandomized. Derandomization is a technique of


removing randomness or using as little randomness as possible. There
are many derandomization techniques, such as the method of conditional
expectation, discrepancy theory, and expander graphs.

4. After a few rounds, select the cheapest set that covers each uncovered
element. This has low expected cost. This algorithm ensures feasibility
but guarantees cost only in the expected sense. We will see a variant on
the homework.

Randomized Rounding with Alteration: In the preceding analysis we had to


worry about the probability of covering all the elements and the expected cost
of the solution. Here we illustrate a simple yet powerful technique of alteration
in randomized algorithms and analysis. Let 𝑑 be the maximum set size.

Set Cover: Randomized Rounding with Alteration

1. 𝐴 = ∅, and let 𝑥 ∗ be an optimal solution to the LP

2. Add to 𝐴 each 𝑆 𝑗 independently with probability min{1, ln 𝑑 · 𝑥 ∗𝑗 }

3. Let 𝒰′ be the elements uncovered by the chosen sets in 𝐴

4. For each uncovered element 𝑖 ∈ 𝒰′ do

A. Add to 𝐴 the cheapest set that covers 𝑖

5. Output the sets with indices in 𝐴
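A sketch of the two phases in Python (ours, with the LP solution 𝑥 ∗ given); the alteration phase simply patches every element missed by the random phase with the cheapest set that contains it, so feasibility is guaranteed.

import math, random

def round_with_alteration(universe, sets, weights, x_star):
    d = max(len(S) for S in sets)                       # maximum set size
    A = {j for j, xj in enumerate(x_star)
         if random.random() < min(1.0, math.log(d) * xj)}
    covered = set().union(*(sets[j] for j in A)) if A else set()
    for i in universe - covered:                        # alteration phase
        cheapest = min((j for j, S in enumerate(sets) if i in S),
                       key=lambda j: weights[j])
        A.add(cheapest)
    return A   # assumes the sets cover the universe, so every i has a candidate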

The algorithm has two phases. A randomized phase and a fixing/altering


phase. In the second phase we apply a naive algorithm that may have a high
cost in the worst case but we will bound its expected cost appropriately. The

algorithm deterministically guarantees that all elements will be covered, and


hence we only need to focus on the expected cost of the chosen sets. Let
𝐶1 be the random cost of the sets chosen in the first phase and let 𝐶2 be the
random cost of the sets chosen in the second phase. It is easy to see that
E[𝐶1] ≤ ln 𝑑 · ∑𝑗 𝑤𝑗 𝑥*𝑗 = ln 𝑑 · OPT𝐿𝑃. Let ℰ𝑖 be the event that element 𝑖 is not
covered after the first randomized phase.


Exercise 2.5. P[ℰ𝑖] ≤ 𝑒^{−ln 𝑑} ≤ 1/𝑑.
The worst case second phase cost can be upper bounded via the next lemma.

Lemma 2.3. Let 𝛽𝑖 be the cost of the cheapest set covering 𝑖. Then ∑𝑖 𝛽𝑖 ≤ 𝑑 · OPT𝐿𝑃.
Proof. Consider an element 𝑖. We have the constraint that ∑_{𝑗:𝑖∈𝑆𝑗} 𝑥*𝑗 ≥ 1. Since
each set covering 𝑖 has cost at least 𝛽𝑖, we have ∑_{𝑗:𝑖∈𝑆𝑗} 𝑤𝑗 𝑥*𝑗 ≥ 𝛽𝑖 ∑_{𝑗:𝑖∈𝑆𝑗} 𝑥*𝑗 ≥ 𝛽𝑖.
Thus,

∑𝑖 𝛽𝑖 ≤ ∑𝑖 ∑_{𝑗:𝑖∈𝑆𝑗} 𝑤𝑗 𝑥*𝑗 = ∑𝑗 𝑤𝑗 𝑥*𝑗 |𝑆𝑗| ≤ 𝑑 ∑𝑗 𝑤𝑗 𝑥*𝑗 = 𝑑 · OPT𝐿𝑃. □


Now we bound the expected second phase cost.
Lemma 2.4. E[𝐶2 ] ≤ OPT𝐿𝑃 .
Proof. We pay for a set to cover element 𝑖 in the second phase only if 𝑖 is not
covered in the first phase. Hence 𝐶2 ≤ ∑𝑖 𝟏_{ℰ𝑖} 𝛽𝑖, where 𝟏_{ℰ𝑖} is the indicator of the
event ℰ𝑖. Note that the events ℰ𝑖 for different elements 𝑖 are not necessarily
independent; however, we can apply linearity of expectation:

E[𝐶2] ≤ ∑𝑖 E[𝟏_{ℰ𝑖}] 𝛽𝑖 = ∑𝑖 P[ℰ𝑖] 𝛽𝑖 ≤ (1/𝑑) ∑𝑖 𝛽𝑖 ≤ OPT𝐿𝑃. □

Combining the expected costs of the two phases we obtain the following
theorem.
Theorem 2.8. Randomized rounding with alteration outputs a feasible solution of
expected cost at most (1 + ln 𝑑) OPT𝐿𝑃.

Note the simplicity of the algorithm and the tightness of the bound.
Remark 2.4. If 𝑑 = 2 the Set Cover problem becomes the Edge Cover problem
in a graph which is the following. Given an edge-weighted graph 𝐺 = (𝑉 , 𝐸),
find the minimum weight subset of edges such that each vertex is covered. Edge
Cover admits a polynomial-time algorithm via a reduction to the minimum-cost
matching problem in a general graph. However, Set Cover with 𝑑 = 3 is NP-Hard
via a reduction from 3-Dimensional Matching.

2.4.3 Dual-fitting
In this section, we introduce the technique of dual-fitting for the analysis of
approximation algorithms. At a high-level the approach is the following:

1. Consider an algorithm that one wants to analyze.

2. Construct a feasible solution to the dual LP based on the structure of the


algorithm.

3. Show that the cost of the solution returned by the algorithm can be bounded
in terms of the value of the dual solution.

Note that the algorithm itself need not be LP based. Here, we use Set Cover
as an example. See the previous section for the primal and dual LP formulations
for Set Cover .
We can interpret the dual as follows: Think of 𝑦 𝑖 as how much element 𝑖 is
willing to pay to be covered; the dual maximizes the total payment, subject to
the constraint that for each set, the total payment of elements in that set is at
most the cost of the set.
We rewrite the Greedy algorithm for Weighted Set Cover.

Greedy Set Cover

1. 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 = ∅

2. 𝐴 = ∅

3. While 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 ≠ 𝑈 do

   A. 𝑗 ← arg min_𝑘 𝑤𝑘 / |𝑆𝑘 ∩ 𝑈𝑛𝑐𝑜𝑣𝑒𝑟𝑒𝑑|
   B. 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 = 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 ∪ 𝑆𝑗
   C. 𝐴 = 𝐴 ∪ {𝑗}

4. end while;

5. Output sets in 𝐴 as cover

Let 𝐻𝑘 = 1 + 1/2 + . . . + 1/𝑘 be the 𝑘-th Harmonic number. It is well known


that 𝐻 𝑘 ≤ 1 + ln 𝑘.

Theorem 2.9. Greedy Set Cover picks a solution of cost ≤ 𝐻𝑑 · OPT𝐿𝑃 , where 𝑑 is
the maximum set size, i.e., 𝑑 = max 𝑗 |𝑆 𝑗 |.

To prove this, we augment the algorithm to keep track of some additional


information.

Augmented Greedy Algorithm for Weighted Set Cover

1. 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 = ∅

2. While 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 ≠ 𝑈 do

   A. 𝑗 ← arg min_𝑘 𝑤𝑘 / |𝑆𝑘 ∩ 𝑈𝑛𝑐𝑜𝑣𝑒𝑟𝑒𝑑|
   B. if 𝑖 is uncovered and 𝑖 ∈ 𝑆𝑗, set 𝑝𝑖 = 𝑤𝑗 / |𝑆𝑗 ∩ 𝑈𝑛𝑐𝑜𝑣𝑒𝑟𝑒𝑑|
   C. 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 = 𝐶𝑜𝑣𝑒𝑟𝑒𝑑 ∪ 𝑆𝑗
   D. 𝐴 = 𝐴 ∪ {𝑗}

3. Output sets in 𝐴 as cover

It is easy to see that the algorithm outputs a feasible cover.
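As a small illustration, here is a Python sketch of the augmented greedy above: it returns both the cover and the prices 𝑝𝑖 used in the dual-fitting argument. The data layout (sets as Python sets, weights as a list) is a hypothetical choice, and every element is assumed to lie in some set. Dividing each recorded price by 𝐻𝑑 yields the dual solution 𝑦′ analyzed below.

def greedy_set_cover_with_prices(universe, sets, weights):
    # Greedy weighted Set Cover; price[e] records p_e, the density of the
    # set that first covered e (used in the dual-fitting analysis).
    uncovered = set(universe)
    A, price = [], {}
    while uncovered:
        # set of minimum density w_j / |S_j ∩ uncovered|
        j = min((j for j in range(len(sets)) if sets[j] & uncovered),
                key=lambda j: weights[j] / len(sets[j] & uncovered))
        newly = sets[j] & uncovered
        for e in newly:
            price[e] = weights[j] / len(newly)
        uncovered -= newly
        A.append(j)
    return A, price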

Claim 2.4.4. ∑_{𝑗∈𝐴} 𝑤𝑗 = ∑𝑖 𝑝𝑖.

Proof. Consider when 𝑗 is added to 𝐴. Let 𝑆′𝑗 ⊆ 𝑆𝑗 be the elements that are
uncovered before 𝑗 is added. For each 𝑖 ∈ 𝑆′𝑗 the algorithm sets 𝑝𝑖 = 𝑤𝑗/|𝑆′𝑗|.
Hence ∑_{𝑖∈𝑆′𝑗} 𝑝𝑖 = 𝑤𝑗. Moreover, it is easy to see that the sets 𝑆′𝑗, 𝑗 ∈ 𝐴, are disjoint
and together partition 𝑈. Therefore,

∑_{𝑗∈𝐴} 𝑤𝑗 = ∑_{𝑗∈𝐴} ∑_{𝑖∈𝑆′𝑗} 𝑝𝑖 = ∑_{𝑖∈𝑈} 𝑝𝑖. □


For each 𝑖, let 𝑦′𝑖 = 𝑝𝑖/𝐻𝑑.

Claim 2.4.5. 𝑦′ is a feasible solution for the dual LP.

Suppose the claim is true; then the cost of Greedy Set Cover's solution is
∑𝑖 𝑝𝑖 = 𝐻𝑑 ∑𝑖 𝑦′𝑖 ≤ 𝐻𝑑 · OPT𝐿𝑃. The last step is because any feasible solution for
the dual problem is a lower bound on the value of the primal LP (weak duality).
Now we prove the claim. Let 𝑆𝑗 be an arbitrary set, and let |𝑆𝑗| = 𝑡 ≤ 𝑑. Let
𝑆𝑗 = {𝑖1, 𝑖2, . . . , 𝑖𝑡}, where the elements are ordered such that 𝑖1 is covered by
Greedy no later than 𝑖2, 𝑖2 is covered no later than 𝑖3, and so on.
Claim 2.4.6. For 1 ≤ ℎ ≤ 𝑡, 𝑝_{𝑖ℎ} ≤ 𝑤𝑗/(𝑡 − ℎ + 1).

Proof. Let 𝑆𝑗′ be the set that covers 𝑖ℎ in Greedy. When Greedy picked 𝑆𝑗′ the
elements 𝑖ℎ, 𝑖_{ℎ+1}, . . . , 𝑖𝑡 from 𝑆𝑗 were uncovered, and hence Greedy could have
picked 𝑆𝑗 as well. This implies that the density of 𝑆𝑗′ when it was picked was
no more than 𝑤𝑗/(𝑡 − ℎ + 1). Therefore 𝑝_{𝑖ℎ}, which is set to the density of 𝑆𝑗′, is at most
𝑤𝑗/(𝑡 − ℎ + 1). □
From the above claim, we have

∑_{1≤ℎ≤𝑡} 𝑝_{𝑖ℎ} ≤ ∑_{1≤ℎ≤𝑡} 𝑤𝑗/(𝑡 − ℎ + 1) = 𝑤𝑗 𝐻𝑡 ≤ 𝑤𝑗 𝐻𝑑.

Thus, setting 𝑦′𝑖 to be 𝑝𝑖 scaled down by a factor of 𝐻𝑑 gives a feasible dual
solution. □

2.4.4 Greedy for implicit instances of Set Cover


Set Cover and the Greedy heuristic are quite useful in applications because
many instances are implicit; nevertheless, the algorithm and the analysis still apply.
That is, the universe 𝒰 of elements and the collection 𝒮 of subsets of 𝒰 need
not be restricted to be finite or explicitly enumerated in the Set Cover problem.
For instance, a problem could require covering a finite set of points in the plane
using disks of unit radius. There is an infinite set of such disks, but the Greedy
approximation algorithm can still be applied. For such implicit instances, the
Greedy algorithm can be used if we have access to an oracle, which, at each
iteration, selects a set having the optimal density. However, an oracle may not
always be capable of selecting an optimal set. In some cases it may have to make
the selections approximately. We call an oracle an 𝛼-approximate oracle for some
𝛼 ≥ 1 if, at each iteration, it selects a set 𝑆 such that 𝑤(𝑆) 𝐴
𝑆 ≤ 𝛼 min𝐴 in collection 𝑤(𝐴) .

Exercise 2.6. Prove that the approximation guarantee of Greedy with an 𝛼-


approximate oracle would be 𝛼(ln 𝑛 + 1) for Set Cover, and (1 − 𝑒^{−1/𝛼}) for Maximum
Coverage.

We will see several examples of implicit use of the greedy analysis in the
course.

2.5 Submodularity
Set Cover turns out to be a special case of a more general problem called
Submodular Set Cover. The Greedy algorithm and analysis applies in this more
generality. Submodularity is a fundamental notion with many applications in

combinatorial optimization and elsewhere. Here we take the opportunity to


provide some definitions and a few results.
Definition 2.10. Given a finite set 𝐸, a real-valued set function 𝑓 : 2𝐸 → ℝ is
submodular iff

𝑓 (𝐴) + 𝑓 (𝐵) ≥ 𝑓 (𝐴 ∪ 𝐵) + 𝑓 (𝐴 ∩ 𝐵) ∀𝐴, 𝐵 ⊆ 𝐸.

Alternatively, 𝑓 is a submodular function iff

𝑓 (𝐴 ∪ {𝑖}) − 𝑓 (𝐴) ≥ 𝑓 (𝐵 ∪ {𝑖}) − 𝑓 (𝐵) ∀𝐴 ⊂ 𝐵, 𝑖 ∈ 𝐸 \ 𝐵.

The second characterization shows that submodularity is based on the decreasing
marginal utility property in the discrete setting. Adding element 𝑖 to a set 𝐴 will
help at least as much as adding it to a (larger) set 𝐵 ⊃ 𝐴. It is common to use
𝐴 + 𝑖 to denote 𝐴 ∪ {𝑖} and 𝐴 − 𝑖 to denote 𝐴 \ {𝑖}.
Exercise 2.7. Prove that the two characterizations of submodular functions are
equivalent.
Many applications of submodular functions are in settings where 𝑓 is non-negative,
though there are several important applications where 𝑓 can be negative.
A submodular function 𝑓 (·) is monotone if 𝑓 (𝐴 + 𝑖) ≥ 𝑓 (𝐴) for all 𝑖 ∈ 𝐸 and
𝐴 ⊆ 𝐸. Typically one assumes that 𝑓 is normalized by which we mean that
𝑓 (∅) = 0; this can always be done by shifting the function by 𝑓 (∅). 𝑓 is
symmetric if 𝑓 (𝐴) = 𝑓 (𝐸 \ 𝐴) for all 𝐴 ⊆ 𝐸. Submodular set functions arise in a
large number of fields including combinatorial optimization, probability, and
geometry. Examples include rank function of a matroid, the sizes of cutsets in a
directed or undirected graph, the probability that a subset of events do not occur
simultaneously, entropy of random variables, etc. In the following we show that
the Set Cover and Maximum Coverage problems can be easily formulated in
terms of submodular set functions.
Exercise 2.8. Let 𝒰 be a set and let 𝒮 = {𝑆1 , 𝑆2 , . . . , 𝑆𝑚 } be a finite collection of
subsets of 𝒰. Let 𝑁 = {1, 2, . . . , 𝑚}, and define 𝑓 : 2𝑁 → ℝ as: 𝑓 (𝐴) = | ∪𝑖∈𝐴 𝑆 𝑖 |
for 𝐴 ⊆ 𝑁. Show that 𝑓 is a monotone non-negative submodular set function.
Exercise 2.9. Let 𝐺 = (𝑉, 𝐸) be a directed graph and let 𝑓 : 2^𝑉 → ℝ where
𝑓(𝑆) = |𝛿⁺(𝑆)| is the number of arcs leaving 𝑆. Prove that 𝑓 is submodular. Is the
function monotone?

2.5.1 Submodular Set Cover


When formulated in terms of submodular set functions, the Set Cover problem
is the following. Given a monotone submodular function 𝑓 (whose value would

be computed by an oracle) on 𝑁 = {1, 2, . . . , 𝑚}, find the smallest set 𝑆 ⊆ 𝑁


such that 𝑓 (𝑆) = 𝑓 (𝑁). Our previous greedy approximation can be applied to
this formulation as follows.

Greedy Submodular ( 𝑓 , 𝑁)

1. 𝑆 ← ∅

2. While 𝑓 (𝑆) ≠ 𝑓 (𝑁) do

A. find 𝑖 to maximize 𝑓 (𝑆 + 𝑖) − 𝑓 (𝑆)


B. 𝑆 ← 𝑆 ∪ {𝑖}

3. Output 𝑆
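A small Python sketch of Greedy Submodular follows, assuming 𝑓 is available as a value oracle (a callable on subsets). The coverage oracle at the end is a hypothetical example that only illustrates how Set Cover fits this formulation.

def greedy_submodular_cover(N, f):
    # f: monotone submodular value oracle on frozensets of N.
    N = list(N)
    S = set()
    target = f(frozenset(N))
    while f(frozenset(S)) < target:
        # add the element with the largest marginal gain f(S + i) - f(S)
        i = max((i for i in N if i not in S),
                key=lambda i: f(frozenset(S | {i})) - f(frozenset(S)))
        S.add(i)
    return S

# Coverage oracle: f(A) = |union of the sets indexed by A| (Set Cover).
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
f = lambda A: len(set().union(*(sets[i] for i in A))) if A else 0
print(greedy_submodular_cover(sets.keys(), f))   # e.g. {1, 3}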

The following exercise is not so easy.

Exercise 2.10. 1. Prove that the greedy algorithm is a 1 + ln( 𝑓 (𝑁)) approxi-
mation for Submodular Set Cover.

2. Prove that the greedy algorithm is a 1 + ln (max𝑖 𝑓 (𝑖)) approximation for


Submodular Set Cover.

The above results were first obtained by Wolsey [158].

2.5.2 Submodular Maximum Coverage


By formulating the Maximum Coverage problem in terms of submodular func-
tions, we seek to maximize 𝑓 (𝑆) such that |𝑆| ≤ 𝑘. We can apply algorithm
Greedy Submodular for this problem by changing the condition in line 2 to be:
while |𝑆| < 𝑘.

Exercise 2.11. Prove that greedy gives a (1 − 1/𝑒)-approximation for Submodular


Maximum Coverage problem when 𝑓 is monotone and non-negative. Hint:
Generalize the main claim that we used for Maximum Coverage .

The above and many related results were shown in the influential papers of
Fisher, Nemhauser and Wolsey [61, 128].

2.6 Covering Integer Programs (CIPs)


There are several extensions of Set Cover that are interesting and useful. Sub-
modular Set Cover is a very general problem while there are intermediate

problems of interest such as Set Multicover. We refer the reader to the
relevant chapters in the two reference books. Here we consider a general problem
called Covering Integer Programs (CIPs for short). The goal is to solve the
following integer program where 𝐴 ∈ ℝ₊^{𝑛×𝑚} is a non-negative matrix. We can
assume without loss of generality that 𝑤 and 𝑏 are also non-negative.

min ∑_{𝑗=1}^{𝑚} 𝑤𝑗 𝑥𝑗
subject to
          𝐴𝑥 ≥ 𝑏
          𝑥𝑗 ≤ 𝑑𝑗   1 ≤ 𝑗 ≤ 𝑚
          𝑥𝑗 ≥ 0    1 ≤ 𝑗 ≤ 𝑚
          𝑥𝑗 ∈ ℤ    1 ≤ 𝑗 ≤ 𝑚

The constraints 𝐴𝑥 ≥ 𝑏 model covering constraints and 𝑥𝑗 ≤ 𝑑𝑗 models multiplicity con-


straints. Note that Set Cover is a special case where 𝐴 is simply the incidence
matrix of the sets and elements (the columns correspond to sets and the rows to
elements) and 𝑑 𝑗 = 1 for all 𝑗. What are CIPs modeling? It is a generalization of
Set Cover . To see this, assume, without loss of generality, that 𝐴, 𝑏 are integer
matrices. For each element corresponding to row 𝑖 the quantity 𝑏 𝑖 corresponds
to the requirement of how many times 𝑖 needs to be covered. 𝐴 𝑖𝑗 corresponds to
the number of times set 𝑆 𝑗 covers element 𝑖. 𝑑 𝑗 is an upper bound on the number
of copies of set 𝑆 𝑗 that are allowed to be picked.

Exercise 2.12. Prove that CIPs are a special case of Submodular Set Cover.

One can apply the Greedy algorithm to the above problem and the standard
analysis shows that the approximation ratio obtained is 𝑂(log 𝐵) where 𝐵 = ∑𝑖 𝑏𝑖
(assuming that the 𝑏𝑖 are integers). Even though this is reasonable we would
prefer a strongly polynomial bound. In fact there are instances where 𝐵 is
exponential in 𝑛 and the worst-case approximation ratio can be poor. The
natural LP relaxation of the above integer program has a large integrality gap in
contrast to the case of Set Cover. One needs to strengthen the LP relaxation via
what are known as knapsack cover inequalities. We refer the reader to the paper of
Kolliopoulos and Young [106] and a more recent one by Chekuri and Quanrud [40] for
more on this problem.
Chapter 3

Knapsack

In this lecture we explore the Knapsack problem. This problem provides a


good basis for learning some important procedures used for approximation
algorithms that give better solutions at the cost of higher running time.

3.1 The Knapsack Problem


In the Knapsack problem we are given a number (knapsack capacity) 𝐵 ≥ 0, and
a set 𝑁 of 𝑛 items; each item 𝑖 has a given size 𝑠 𝑖 ≥ 0 and a profit 𝑝 𝑖 ≥ 0. We
will assume that all the input numbers are integers (or more generally rationals).
Given a subset of the items 𝐴 ⊆ 𝑁, we define two functions, 𝑠(𝐴) = 𝑖∈𝐴 𝑠 𝑖
Í
and 𝑝(𝐴) = 𝑖∈𝐴 𝑝 𝑖 , representing the total size and profit of the group of items
Í
respectively. The goal is to choose a subset of the items, 𝐴, such that 𝑠(𝐴) ≤ 𝐵
and 𝑝(𝐴) is maximized. We will assume, without loss of generality, that 𝑠 𝑖 ≤ 𝐵
for all 𝑖; we can discard items that do not satisfy this constraint.
It is not difficult to see that if all the profits are identical (say 1), the natural
greedy algorithm that inserts items in order of non-decreasing size yields an optimal solution.
Assuming the profits and sizes are integral, we can still find an optimal solution
to the problem using dynamic programming in either 𝑂(𝑛𝐵) or 𝑂(𝑛𝑃) time,
where 𝑃 = ∑_{𝑖=1}^{𝑛} 𝑝𝑖. These are standard exercises. While these algorithms appear
to run in polynomial time, it should be noted that 𝐵 and 𝑃 can be exponential in
the size of the input written in binary. We call such algorithms pseudo-polynomial
time algorithms as their running times are polynomial when numbers in the input
are given in unary. Knapsack is a classical NP-Hard problem and these results
(and the proof of NP-Hardness) show that the hardness manifests itself when
the numbers are large (exponential in 𝑛 which means that the number of bits in
the size or profit are polynomial in 𝑛).


3.1.1 A Greedy Algorithm


Consider the following greedy algorithm for the Knapsack problem which we
will refer to as GreedyKnapsack. We sort all the items by the ratio of their profits
to their sizes so that 𝑝1/𝑠1 ≥ 𝑝2/𝑠2 ≥ · · · ≥ 𝑝𝑛/𝑠𝑛. Afterward, we greedily take items
in this order as long as adding an item to our collection does not exceed the
capacity of the knapsack. It turns out that this algorithm can be arbitrarily bad.
Suppose we only have two items in 𝑁. Let 𝑠 1 = 1, 𝑝 1 = 2, 𝑠 2 = 𝐵, and 𝑝 2 = 𝐵.
GreedyKnapsack will take only item 1, but taking only item 2 would be a better
solution and the ratio of the profits in the two cases is 2/𝐵 which can be made
arbitrarily small. As it turns out, we can easily modify this algorithm to provide
a 2-approximation by simply taking the best of GreedyKnapsack’s solution or
the most profitable item. We will call this new algorithm ModifiedGreedy.
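A short Python sketch of ModifiedGreedy is below; the item data in the example is hypothetical and item sizes are assumed positive.

def modified_greedy_knapsack(items, B):
    # items: list of (profit, size) pairs with size > 0; B: knapsack capacity.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    packed, used, profit = [], 0, 0
    for i in order:                                  # GreedyKnapsack by density
        p, s = items[i]
        if used + s <= B:
            packed.append(i); used += s; profit += p
    best = max(range(len(items)), key=lambda i: items[i][0])
    if items[best][0] > profit:                      # single most profitable item
        return [best], items[best][0]
    return packed, profit

# The bad example for plain greedy: s1 = 1, p1 = 2, s2 = B, p2 = B with B = 100.
print(modified_greedy_knapsack([(2, 1), (100, 100)], 100))   # ([1], 100)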
Theorem 3.1. ModifiedGreedy has an approximation ratio of 1/2 for the Knapsack
problem.
Proof. Let 𝑘 be the index of the first item that is not accepted by GreedyKnapsack.
Consider the following claim:
Claim 3.1.1. 𝑝1 + 𝑝2 + · · · + 𝑝𝑘 ≥ OPT. In fact, 𝑝1 + 𝑝2 + · · · + 𝑝_{𝑘−1} + 𝛼𝑝𝑘 ≥ OPT where
𝛼 = (𝐵 − (𝑠1 + 𝑠2 + · · · + 𝑠_{𝑘−1}))/𝑠𝑘 is the fraction of item 𝑘 that can still fit in the knapsack after
packing the first 𝑘 − 1 items.
The proof of Theorem 3.1 follows immediately from the claim. In particular,
either 𝑝 1 + 𝑝2 + · · · + 𝑝 𝑘−1 or 𝑝 𝑘 must be at least OPT/2. We now only have to
prove Claim 3.1.1. We give an LP relaxation of the Knapsack problem as follows:
Here, 𝑥 𝑖 ∈ [0, 1] denotes the fraction of item 𝑖 packed in the knapsack.
maximize ∑_{𝑖=1}^{𝑛} 𝑝𝑖 𝑥𝑖
subject to ∑_{𝑖=1}^{𝑛} 𝑠𝑖 𝑥𝑖 ≤ 𝐵
          𝑥𝑖 ≤ 1   for all 𝑖 ∈ {1, . . . , 𝑛}
          𝑥𝑖 ≥ 0   for all 𝑖 ∈ {1, . . . , 𝑛}
Let OPT𝐿𝑃 be the optimal value of the objective function in this linear
programming instance. Any solution to Knapsack is a feasible solution to the LP
and both problems share the same objective function, so OPT𝐿𝑃 ≥ OPT. Now
set 𝑥 1 = 𝑥 2 = · · · = 𝑥 𝑘−1 = 1, 𝑥 𝑘 = 𝛼, and 𝑥 𝑖 = 0 for all 𝑖 > 𝑘. This is a feasible
solution to the LP. We leave it as an exercise to the reader to argue that it is an
optimum solution. Therefore, 𝑝1 + 𝑝2 + · · · + 𝑝_{𝑘−1} + 𝛼𝑝𝑘 = OPT𝐿𝑃 ≥ OPT. The first
statement of the claim follows from the second since 𝛼 ≤ 1. □

3.1.2 A Polynomial Time Approximation Scheme


Using the results from the last section, we make a few simple observations.
Some of these lead to a better approximation.

Observation 3.2. If for all 𝑖, 𝑝 𝑖 ≤ 𝜖 OPT, GreedyKnapsack gives a (1 − 𝜖) approxi-


mation.

Proof. Follows easily from Claim 3.1.1. 


Observation 3.3. There are at most ⌈1/𝜖⌉ items with profit at least 𝜖 OPT in any optimal
solution.

The next claim is perhaps more interesting and captures the intuition that
the bad case for greedy happens only when there are “big” items.

Claim 3.1.2. If for all 𝑖, 𝑠 𝑖 ≤ 𝜖𝐵, GreedyKnapsack gives a (1 − 𝜖) approximation.

Proof. We give a proof sketch via the LP relaxation. Recall that 𝑘 is the first item
that did not fit in the knapsack and that OPT𝐿𝑃 is the optimum value of the LP relaxation.
Suppose we reduce the knapsack capacity to 𝐵′ = 𝑠1 + 𝑠2 + · · · + 𝑠_{𝑘−1} while keeping
all the items the same, and let OPT′𝐿𝑃 be the LP value for the new capacity. We claim
that OPT′𝐿𝑃 ≥ (𝐵′/𝐵) OPT𝐿𝑃 — this is because we can take any feasible solution to
the original LP and scale each variable by 𝐵′/𝐵 to obtain a feasible solution with
the new capacity. What is OPT′𝐿𝑃? We note that Greedy fills 𝐵′ to capacity with
the first 𝑘 − 1 items and hence OPT′𝐿𝑃 = 𝑝1 + · · · + 𝑝_{𝑘−1}. Combining, we obtain that

𝑝1 + · · · + 𝑝_{𝑘−1} ≥ (𝐵′/𝐵) OPT𝐿𝑃 ≥ (𝐵′/𝐵) OPT.

We note that 𝐵′ + 𝑠𝑘 ≥ 𝐵 since item 𝑘 did not fit, and hence 𝐵′ ≥ 𝐵 − 𝑠𝑘 ≥ 𝐵 − 𝜖𝐵 =
(1 − 𝜖)𝐵. Therefore 𝐵′/𝐵 ≥ (1 − 𝜖) and this finishes the proof. □
We may now describe the following algorithm. Let 𝜖 ∈ (0, 1) be a fixed
constant and let ℎ = ⌈1/𝜖⌉. We will try to guess the ℎ most profitable items in an
optimal solution and pack the rest greedily.

Guess h + Greedy(𝑁 , 𝐵)

1. For each 𝑆 ⊆ 𝑁 such that |𝑆| ≤ ℎ and 𝑠(𝑆) ≤ 𝐵 do

A. Pack 𝑆 in knapsack of size at most 𝐵


B. Let 𝑖 be the least profitable item in 𝑆 . Remove all items 𝑗 ∈ 𝑁 − 𝑆
where 𝑝 𝑗 > 𝑝 𝑖 .
C. Run GreedyKnapsack on the remaining items with remaining capacity 𝐵 − ∑_{𝑖∈𝑆} 𝑠𝑖

2. Output best solution from above

Theorem 3.4. Guess h + Greedy gives a (1 − 𝜖) approximation and runs in 𝑂(𝑛^{⌈1/𝜖⌉+1})
time.

Proof. For the running time, observe that there are 𝑂(𝑛^ℎ) subsets of 𝑁 of size at most ℎ. For each
subset, we spend linear time greedily packing the remaining items. The time
initially spent sorting the items can be ignored thanks to the rest of the running
time.
For the approximation ratio, consider a run of the loop where 𝑆 actually is
the set of the ℎ most profitable items in an optimal solution, and the algorithm's greedy
stage packs the set of items 𝐴′ ⊆ (𝑁 − 𝑆). Let OPT′ be the optimal way to pack
the smaller items in 𝑁 − 𝑆 so that OPT = 𝑝(𝑆) + OPT′. Let item 𝑘 be the first
item rejected by the greedy packing of 𝑁 − 𝑆. We know 𝑝𝑘 ≤ 𝜖 OPT, so by Claim
3.1.1, 𝑝(𝐴′) ≥ OPT′ − 𝜖 OPT. This means the total profit found in that run of the
loop is 𝑝(𝑆) + 𝑝(𝐴′) ≥ (1 − 𝜖) OPT. □
Note that for any fixed choice of 𝜖 > 0, the preceding algorithm runs
in polynomial time. This type of algorithm is known as a polynomial time
approximation scheme or PTAS. The term “scheme” refers to the fact that the
algorithm varies with 𝜖. We say a maximization problem Π has a PTAS if for all
𝜖 > 0, there exists a polynomial time algorithm that gives a (1 − 𝜖) approximation
((1 + 𝜖) for minimization problems). In general, one can often find a PTAS for a
problem by greedily filling in a solution after first searching for a good basis on
which to work. As described below, Knapsack actually has something stronger
known as a fully polynomial time approximation scheme or FPTAS. A maximization
problem Π has a FPTAS if for all 𝜖 > 0, there exists an algorithm that gives
a (1 − 𝜖) approximation ((1 + 𝜖) for minimization problems) and runs in time
polynomial in both the input size and 1/𝜖.

3.1.3 Rounding and Scaling


Earlier we mentioned exact algorithms based on dynamic programming that
run in 𝑂(𝑛𝐵) and 𝑂(𝑛𝑃) time but noted that 𝐵 and 𝑃 may be prohibitively large.
If we could somehow decrease one of those to be polynomial in 𝑛 without losing
too much information, we might be able to find an approximation based on one
of these algorithms. Let 𝑝 max = max𝑖 𝑝 𝑖 and note the following.
Observation 3.5. 𝑝 max ≤ OPT ≤ 𝑛𝑝max
Now, fix some 𝜖 ∈ (0, 1). We want to scale the profits and round them to
be integers so we may use the 𝑂(𝑛𝑃) algorithm efficiently while still keeping
enough information in the numbers to allow for an accurate approximation.
For each 𝑖, let 𝑝′𝑖 = ⌊(𝑛/(𝜖 𝑝max)) 𝑝𝑖⌋. Observe that 𝑝′𝑖 ≤ 𝑛/𝜖, so the sum of the
scaled profits 𝑃′ is at most 𝑛²/𝜖. Also, note that we lost at most 𝑛 (scaled) profit from the
optimal solution during the rounding, but the scaled-down OPT is still at least
𝑛/𝜖. We have only lost an 𝜖 fraction of the solution. This process of rounding and
scaling values for use in exact algorithms has use in a large number of other
maximization problems. We now formally state the algorithm Round&Scale
and prove its correctness and running time.

Round&Scale(𝑁, 𝐵)
1. For each 𝑖 set 𝑝′𝑖 = ⌊(𝑛/(𝜖 𝑝max)) 𝑝𝑖⌋

2. Run the exact algorithm with running time 𝑂(𝑛𝑃′) to obtain 𝐴

3. Output 𝐴
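The following Python sketch spells out the whole pipeline under the assumption that profits are positive and 𝜖 ∈ (0, 1); the exact 𝑂(𝑛𝑃′) step is written out here as a standard profit-indexed dynamic program (minimum size needed to reach each scaled profit level), which is one concrete way to realize it.

def knapsack_fptas(profits, sizes, B, eps):
    n = len(profits)
    alpha = n / (eps * max(profits))
    sp = [int(alpha * p) for p in profits]          # scaled, rounded-down profits
    P = sum(sp)
    INF = float("inf")
    # best[i][q] = minimum size of a subset of items 0..i-1 with scaled profit q
    best = [[INF] * (P + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):
        for q in range(P + 1):
            best[i][q] = best[i - 1][q]
            if q >= sp[i - 1]:
                cand = best[i - 1][q - sp[i - 1]] + sizes[i - 1]
                if cand < best[i][q]:
                    best[i][q] = cand
    q = max(q for q in range(P + 1) if best[n][q] <= B)   # best feasible profit level
    A = []                                                # trace back chosen items
    for i in range(n, 0, -1):
        if best[i][q] != best[i - 1][q]:
            A.append(i - 1)
            q -= sp[i - 1]
    return A

The dynamic program uses 𝑂(𝑛𝑃′) = 𝑂(𝑛³/𝜖) time and space, matching the bound in the theorem below.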

Theorem 3.6. Round&Scale gives a (1 − 𝜖) approximation and runs in 𝑂(𝑛³/𝜖) time.

Proof. The rounding can be done in linear time, and since 𝑃′ = 𝑂(𝑛²/𝜖), the dynamic
programming portion of the algorithm runs in 𝑂(𝑛³/𝜖) time. To show
the approximation ratio, let 𝛼 = 𝑛/(𝜖 𝑝max), let 𝐴 be the solution returned by
the algorithm, and let 𝐴* be an optimal solution. Observe that for all 𝑋 ⊆ 𝑁,
𝛼𝑝(𝑋) − |𝑋| ≤ 𝑝′(𝑋) ≤ 𝛼𝑝(𝑋), as the rounding lowers each scaled profit by at
most 1. The algorithm returns the best choice for 𝐴 given the scaled and rounded
values, so we know 𝑝′(𝐴) ≥ 𝑝′(𝐴*). Therefore,

𝑝(𝐴) ≥ (1/𝛼) 𝑝′(𝐴) ≥ (1/𝛼) 𝑝′(𝐴*) ≥ 𝑝(𝐴*) − 𝑛/𝛼 = OPT − 𝜖𝑝max ≥ (1 − 𝜖) OPT. □

It should be noted that this is not the best FPTAS known for Knapsack. In
particular, [111] shows an FPTAS that runs in 𝑂(𝑛 log(1/𝜖) + 1/𝜖⁴) time. There
have been several improvements and we refer the reader to Chan’s paper for the
latest [32].

3.2 Other Problems


There are many variants of Knapsack and it is a fundamental problem of interest
in integer programming as well in several other areas. One can find a book
length treatment in [102]. We close with an interesting variant.
Multiple Knapsack: The input now consists of 𝑚 knapsacks with capacities
𝐵1 , 𝐵2 , . . . , 𝐵𝑚 and 𝑛 items with sizes and profits as in Knapsack. We again wish
to pack items to obtain as large a profit as possible, except now we have more
than one knapsack with which to do so. An interesting special case is when
all the knapsack capacities are the same quantity 𝐵 which is related to the well
known Bin Packing problem.
Chapter 4

Packing Problems

In the previous lecture we discussed the Knapsack problem. In this lecture we


discuss other packing and independent set problems. We first discuss an abstract
model of packing problems. Let 𝑁 be a finite ground set. A collection ℐ ⊆ 2^𝑁
of subsets of 𝑁 is said to be down closed if the following property holds: 𝐴 ∈ ℐ
implies that for all 𝐵 ⊂ 𝐴, 𝐵 ∈ ℐ. A down closed collection is also often called
an independence system. The sets in ℐ are called independent sets. Given an
independence family (𝑁 , ℐ) and a non-negative weight function 𝑤 : 𝑁 → ℝ+
the maximum weight independent set problem is to find max𝑆∈ℐ 𝑤(𝑆). That is,
find an independent set in ℐ of maximum weight. Often we may be interested in
the setting where all weights are 1 in which case we wish to find the maximum
cardinality independent set. We discuss some canonical examples.
Example 4.1. Independent sets in graphs: Given a graph 𝐺 = (𝑉 , 𝐸)
ℐ = {𝑆 ⊆ 𝑉 | there are no edges between nodes in 𝑆}. Here the ground set is
𝑉. There are many interesting special cases of the graph problem. For instance
problems arising from geometric objects such as intervals, rectangles, disks and
others.
Example 4.2. Matchings in graphs: Given a graph 𝐺 = (𝑉 , 𝐸) let ℐ = {𝑀 ⊆ 𝐸 |
𝑀 is a matching in 𝐺}. Here the ground set is 𝐸.
Example 4.3. Matroids: A matroid ℳ = (𝑁 , ℐ) is defined as a system where ℐ is
down closed and in addition satisfies the following key property: if 𝐴, 𝐵 ∈ ℐ
and |𝐵| > |𝐴| then there is an element 𝑒 ∈ 𝐵 \ 𝐴 such that 𝐴 ∪ {𝑒} ∈ ℐ. There
are many examples of matroids. We will not go into details here.
Example 4.4. Intersections of independence systems: given some 𝑘 independence
systems on the same ground set (𝑁 , ℐ1 ), (𝑁 , ℐ2 ), . . . , (𝑁 , ℐ𝑘 ) the system defined
by (𝑁 , ℐ1 ∩ ℐ2 . . . ∩ ℐ𝑘 ) is also an independence system. Well-known examples
include intersections of matroids.


4.1 Maximum Independent Set Problem in Graphs


A basic graph optimization problem with many applications is the maximum
(weighted) independent set problem (MIS) in graphs.

Definition 4.1. Given an undirected graph 𝐺 = (𝑉 , 𝐸) a subset of nodes 𝑆 ⊆ 𝑉 is an


independent set (stable set) iff there is no edge in 𝐸 between any two nodes in 𝑆. A
subset of nodes 𝑆 is a clique if every pair of nodes in 𝑆 have an edge between them in 𝐺.

The MIS problem is the following: given a graph 𝐺 = (𝑉 , 𝐸) find an


independent set in 𝐺 of maximum cardinality. In the weighted case, each node
𝑣 ∈ 𝑉 has an associated non-negative weight 𝑤(𝑣) and the goal is to find a
maximum weight independent set. This problem is NP-Hard and it is natural to
ask for approximation algorithms. Unfortunately, as the famous theorem below
shows, the problem is extremely hard to approximate.

Theorem 4.2 (Håstad [80]). Unless 𝑃 = 𝑁𝑃 there is no 1/𝑛^{1−𝜖}-approximation for MIS
for any fixed 𝜖 > 0, where 𝑛 is the number of nodes in the given graph.

Remark 4.1. The maximum clique problem is to find the maximum cardinality
clique in a given graph. It is approximation-equivalent to the MIS problem;
simply complement the graph.
The theorem basically says the following: there are a class of graphs in which
the maximum independent set size is either less than 𝑛 𝛿 or greater than 𝑛 1−𝛿
and it is NP-Complete to decide whether a given graph falls into the former
category or the latter.
The lower bound result suggests that one should focus on special cases, and
several interesting positive results are known. First, we consider a simple greedy
algorithm for the unweighted problem.

Greedy(𝐺 )

1. 𝑆 ← ∅

2. While 𝐺 is not empty do

A. Let 𝑣 be a node of minimum degree in 𝐺


B. 𝑆 ← 𝑆 ∪ {𝑣}
C. Remove 𝑣 and its neighbors from 𝐺

3. Output 𝑆
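A Python sketch of this minimum-degree greedy follows; the adjacency-dictionary input format and the small 4-cycle example are hypothetical choices made for illustration.

def greedy_mis(adj):
    # adj: dict mapping each vertex to the set of its neighbors.
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a shrinking copy
    S = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))       # vertex of minimum degree
        S.append(v)
        removed = adj[v] | {v}                        # delete v and its neighbors
        for u in removed:
            adj.pop(u, None)
        for u in adj:
            adj[u] -= removed
    return S

# Example: a 4-cycle a-b-c-d-a; the output is an independent set of size 2.
print(greedy_mis({"a": {"b", "d"}, "b": {"a", "c"},
                  "c": {"b", "d"}, "d": {"a", "c"}}))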

Theorem 4.3. Greedy outputs an independent set 𝑆 such that |𝑆| ≥ 𝑛/(Δ + 1) where Δ
is the maximum degree of any node in the graph. Moreover |𝑆| ≥ 𝛼(𝐺)/Δ where 𝛼(𝐺)
is the cardinality of the largest independent set. Thus Greedy is a 1/Δ approximation.
Proof. We upper bound the number of nodes in 𝑉 \ 𝑆 as follows. A node 𝑢 is
in 𝑉 \ 𝑆 because it is removed as a neighbor of some node 𝑣 ∈ 𝑆 when Greedy
added 𝑣 to 𝑆. Charge 𝑢 to 𝑣. A node 𝑣 ∈ 𝑆 can be charged at most Δ times since
it has at most Δ neighbors. Hence we have that |𝑉 \ 𝑆| ≤ Δ|𝑆|. Since every node
is either in 𝑆 or 𝑉 \ 𝑆 we have |𝑆| + |𝑉 \ 𝑆| = 𝑛 and therefore (Δ + 1)|𝑆| ≥ 𝑛 which
implies that |𝑆| ≥ 𝑛/(Δ + 1).
We now argue that |𝑆| ≥ 𝛼(𝐺)/Δ. Let 𝑆 ∗ be a largest independent set in 𝐺.
As in the above proof we can charge each node 𝑣 in 𝑆 ∗ \ 𝑆 to a node 𝑢 ∈ 𝑆 \ 𝑆∗
which is a neighbor of 𝑣. The number of nodes charged to a node 𝑢 ∈ 𝑆 \ 𝑆 ∗ is at
most Δ. Thus |𝑆 ∗ \ 𝑆| ≤ Δ|𝑆 \ 𝑆∗ |.

Exercise 4.1. Show that Greedy outputs an independent set of size at least 𝑛/(2(𝑑 + 1)),
where 𝑑 is the average degree of 𝐺.
Remark 4.2. The well-known theorem of Turán shows via a clever argument that
there is always an independent set of size 𝑛/(𝑑 + 1), where 𝑑 is the average degree of
𝐺.
Remark 4.3. For the case of unweighted graphs one can obtain an approximation
ratio of Ω(log 𝑑/(𝑑 log log 𝑑)) where 𝑑 is the average degree. Surprisingly, under a
complexity theory conjecture called the Unique-Games conjecture, it is known to
be NP-Hard to approximate MIS to within a factor of 𝑂(log² Δ/Δ) in graphs with
maximum degree Δ when Δ is sufficiently large.
Exercise 4.2. Consider the weighted MIS problem on graphs of maximum degree
Δ. Alter Greedy to sort the nodes in non-increasing order of weight and
show that it gives a 1/Δ-approximation. Can one obtain an Ω(1/𝑑)-approximation
for the weighted case where 𝑑 is the average degree?
LP Relaxation: One can formulate a simple linear-programming relaxation
for the (weighted) MIS problem where we have a variable 𝑥(𝑣) for each node
𝑣 ∈ 𝑉 indicating whether 𝑣 is chosen in the independent set or not. We have
constraints which state that for each edge (𝑢, 𝑣) only one of 𝑢 or 𝑣 can be chosen.
maximize ∑_{𝑣∈𝑉} 𝑤(𝑣)𝑥(𝑣)
subject to 𝑥(𝑢) + 𝑥(𝑣) ≤ 1   (𝑢, 𝑣) ∈ 𝐸
          𝑥(𝑣) ∈ [0, 1]   𝑣 ∈ 𝑉

Although the above is a valid integer programming relaxation of MIS when


the variables are constrained to be in {0, 1}, it is not a particularly useful
formulation for the following simple reason.
Claim 4.1.1. For any graph the optimum value of the above LP relaxation is at least
𝑤(𝑉)/2. In particular, for the unweighted case it is at least 𝑛/2.
Simply set each 𝑥(𝑣) to 1/2!
One can obtain a strengthened formulation below by observing that if 𝑆 is a
clique in 𝐺 then any independent set can pick at most one node from 𝑆.

maximize ∑_{𝑣∈𝑉} 𝑤(𝑣)𝑥(𝑣)
subject to ∑_{𝑣∈𝑆} 𝑥(𝑣) ≤ 1   𝑆 is a clique in 𝐺
          𝑥(𝑣) ∈ [0, 1]   𝑣 ∈ 𝑉
The above linear program has an exponential number of constraints, and it
cannot be solved in polynomial time in general, but for some special cases of
interest the above linear program can indeed be solved (or approximately solved)
in polynomial time and leads to either exact algorithms or good approximation
bounds.
Approximability of Vertex Cover and MIS: The following is a basic fact and
is easy to prove.
Fact 4.1. In any graph 𝐺 = (𝑉 , 𝐸), 𝑆 is a vertex cover in 𝐺 if and only if 𝑉 \ 𝑆 is an
independent set in 𝐺. Thus 𝛼(𝐺) + 𝛽(𝐺) = |𝑉 | where 𝛼(𝐺) is the size of a maximum
independent set in 𝐺 and 𝛽(𝐺) is the size of a minimum vertex cover in 𝐺.
The above shows that if one of Vertex Cover or MIS is NP-Hard then the
other is as well. We have seen that Vertex Cover admits a 2-approximation
while MIS admits no constant factor approximation. It is useful to see why a
2-approximation for Vertex Cover does not give any useful information for MIS
even though 𝛼(𝐺) + 𝛽(𝐺) = |𝑉 |. Suppose 𝑆 ∗ is an optimal vertex cover and has
size ≥ |𝑉 |/2. Then a 2-approximation algorithm is only guaranteed to give a
vertex cover of size |𝑉 |! Hence one does not obtain a non-trivial independent
set by complementing the approximate vertex cover.
Some special cases of MIS: We mention some special cases of MIS that have
been considered in the literature, this is by no means an exhaustive list.
• Interval graphs; these are intersection graphs of intervals on a line. An
exact algorithm can be obtained via dynamic programming and one can
solve more general versions via linear programming methods.

• Note that a maximum (weight) matching in a graph 𝐺 can be viewed as


a maximum (weight) independent set in the line-graph of 𝐺 and can be
solved exactly in polynomial time. This has been extended to what are
known as claw-free graphs.

• Planar graphs and generalizations to bounded-genus graphs, and graphs


that exclude a fixed minor. For such graphs one can obtain a PTAS due to
ideas originally from Brenda Baker.

• Geometric intersection graphs. For example, given 𝑛 disks on the plane


find a maximum number of disks that do not overlap. One could consider
other (convex) shapes such as axis parallel rectangles, line segments,
pseudo-disks etc. A number of results are known. For example a PTAS is
known for disks in the plane. An Ω(1/log 𝑛)-approximation is known for axis-parallel
rectangles in the plane when the rectangles are weighted, and an Ω(1/log log 𝑛)-
approximation for the unweighted case. For the unweighted case, very
recently, Mitchell obtained a constant factor approximation!

4.1.1 Elimination Orders and MIS


We have seen that a simple Greedy algorithm gives a 1/Δ-approximation for MIS
in graphs with maximum degree Δ. One can also get a 1/Δ-approximation for the larger
class of Δ-degenerate graphs. To motivate degenerate graphs consider the class
of planar graphs. The maximum degree of a planar graph need not be small.
Nevertheless, via Euler’s theorem, we know that every planar graph has a vertex
of degree at most 5 since the maximum number of edges in a planar graph is at
most 3𝑛 − 6. Moreover, every subgraph of a planar graph is planar, and hence
the Greedy algorithm will repeatedly find a vertex of degree at most 5 in each
iteration. From this one can show that Greedy gives a 1/5-approximation for
MIS in planar graphs. Now consider the intersection graph of a collection of
intervals on the real line. That is, we are given 𝑛 intervals 𝐼1 , 𝐼2 , . . . , 𝐼𝑛 where
each 𝐼 𝑖 = [𝑎 𝑖 , 𝑏 𝑖 ] for real numbers 𝑎 𝑖 ≤ 𝑏 𝑖 . The goal is to find a maximum
number of the intervals in the given set of intervals which do not overlap. This
is the same as finding MIS in the intersection graph of the intervals - the graph is
obtained by creating a vertex 𝑣 𝑖 for each 𝐼 𝑖 , and by adding edges 𝑣 𝑖 𝑣 𝑗 if 𝐼 𝑖 and 𝐼 𝑗
overlap. It is well-known that greedily picking intervals in earliest finish time
order (ordering them according to 𝑏 𝑖 values) is optimal; the reader should try to
prove this. Can one understand the analysis of all these examples in a unified
fashion? Yes. For this purpose we consider the class of inductive 𝑘-independent
graphs considered by Akcoglu et al. [7] and later again by Ye and Borodin
[159].

For a vertex 𝑣 in a graph we use 𝑁(𝑣) to denote the neighbors of 𝑣 (not including
𝑣 itself). For a graph 𝐺 = (𝑉 , 𝐸) and 𝑆 ⊂ 𝑉 we use 𝐺[𝑆] to denote the subgraph
of 𝐺 induced by 𝑆.
Definition 4.4. An undirected graph 𝐺 = (𝑉 , 𝐸) is inductive 𝑘-independent if there
is an ordering of the vertices 𝑣 1 , 𝑣2 , . . . , 𝑣 𝑛 such that for 1 ≤ 𝑖 ≤ 𝑛, 𝛼(𝐺[𝑁(𝑣 𝑖 ) ∩
{𝑣 𝑖+1 , . . . , 𝑣 𝑛 }]) ≤ 𝑘.
Graphs which are inductively 1-independent have a perfect elimination order-
ing and are called chordal graphs because they have an alternate characterization.
A graph is chordal iff each cycle 𝐶 in 𝐺 has a chord (an edge connecting two
nodes of 𝐶 which is not an edge of 𝐶), or in other words there is no induced
cycle of length more than 3.
Exercise 4.3. Prove that the intersection graph of intervals is chordal.
Exercise 4.4. Prove that if Δ(𝐺) ≤ 𝑘 then 𝐺 is inductively 𝑘-independent. Prove
that if 𝐺 is 𝑘-degenerate then 𝐺 is inductively 𝑘-independent.
The preceding shows that planar graphs are inductively 5-independent. In
fact, one can show something stronger, that they are inductively 3-independent.
Given a graph 𝐺 one can ask whether there is an algorithm that checks whether
𝐺 is inductively 𝑘-independent. There is such an algorithm that runs in time
𝑂(𝑘²𝑛^{𝑘+2}) [159]. A classical result shows how to recognize chordal graphs (𝑘 = 1)
in linear time. However, most of the useful applications arise by showing that a
certain class of graphs are inductively 𝑘-independent for some small value of 𝑘.
See [159] for several examples.
Exercise 4.5. Prove that the Greedy algorithm that considers the vertices in the
inductive 𝑘-independent order gives a 1/𝑘-approximation for MIS.
Interestingly one can obtain a 1/𝑘-approximation for the maximum weight
independent set problem in inductively 𝑘-independent graphs. The algorithm
is simple and runs in linear time but is not obvious. To see this consider the
weighted problem for intervals. The standard algorithm to solve this is via
dynamic programming. However, one can obtain an optimum solution for
all chordal graphs (given the ordering). We refer the reader to [159] for the
algorithm and proof (originally from [7]). Showing a Ω(1/𝑘)-approximation is
easier.

4.2 The efficacy of the Greedy algorithm for a class of


Independence Families
The Greedy algorithm can be defined easily for an arbitrary independence
system. It iteratively adds the best element to the current independent set while

maintaining feasibility. Note that the implementation of the algorithm requires


having an oracle to find the best element to add to a current independent set 𝑆.

Greedy(𝑁 ,ℐ )

1. 𝑆 ← ∅

2. While (TRUE)

A. Let 𝐴 ← {𝑒 ∈ 𝑁 \ 𝑆 | 𝑆 + 𝑒 ∈ ℐ}
B. If 𝐴 = ∅ break
C. 𝑒 ← argmax𝑒∈𝐴 𝑤(𝑒)
D. 𝑆 ← 𝑆 ∪ {𝑒}

3. Output 𝑆
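A Python sketch of this generic greedy follows, with the independence oracle passed in as a callable. The matching oracle and the small example are hypothetical illustrations of a 2-system; compare the matching exercise just below.

def greedy_independent(elements, weight, is_independent):
    # Scanning elements in non-increasing weight order and adding each one that
    # keeps independence is equivalent to repeatedly adding the maximum-weight
    # feasible element, because independence systems are down closed.
    S = set()
    for e in sorted(elements, key=weight, reverse=True):
        if is_independent(S | {e}):
            S.add(e)
    return S

def is_matching(edges):
    # Oracle: is this edge set a matching (no shared endpoints)?
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

edges = [("a", "b"), ("b", "c"), ("c", "d")]
w = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 2}
print(greedy_independent(edges, lambda e: w[e], is_matching))
# Greedy keeps only ("b","c") with weight 3 while OPT has weight 4,
# consistent with the 1/2 bound for this 2-system.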

Exercise 4.6. Prove that the Greedy algorithm gives a 1/2-approximation for the
maximum weight matching problem in a general graph. Also prove that this
bound is tight even in bipartite graphs. Note that max weight matching can be
solved exactly in polynomial time.

Remark 4.4. It is well-known that the Greedy algorithm gives an optimum


solution when (𝑁 , ℐ) is a matroid. Kruskal’s algorithm for min/max weight
spanning tree is a special case of this fact.
It is easy to see that Greedy does poorly for the MIS problem in general graphs.
A natural question is what properties of ℐ enable some reasonable performance
guarantee for Greedy. A very general result in this context was established
by Jenkyns, generalizing several previous results. In order to state the result
we set up some notation. Given an independence system (𝑁 , ℐ) we say that a
set 𝐴 ∈ ℐ is a base if it is a maximal independent set. It is well-known that in
a matroid ℳ all bases have the same cardinality. However this is not true in
general independence system.

Definition 4.5. An independence system (𝑁 , ℐ) is a 𝑘-system if for any two bases


𝐴, 𝐵 ∈ ℐ, |𝐴| ≤ 𝑘|𝐵|. That is, the ratio of the cardinality of a maximum base and the
cardinality of a minimum base is at most 𝑘.

The following theorem is not too difficult but not so obvious either.

Theorem 4.6. Greedy gives a 1/𝑘-approximation for the maximum weight independent
set problem in a 𝑘-system.

The above theorem generalizes and unifies several examples that we have
seen so far including MIS in bounded degree graphs, matchings, matroids etc.
How does one see that a given independence system is indeed a 𝑘-system for
some parameter 𝑘? For instance matchings in graphs form a 2-system. The
following simple lemma gives an easy way to argue that a given system is a
𝑘-system.

Lemma 4.1. Suppose (𝑁 , ℐ) is an independence system with the following property: for
any 𝐴 ∈ ℐ and 𝑒 ∈ 𝑁 \ 𝐴 there is a set 𝑌 ⊂ 𝐴 such that |𝑌| ≤ 𝑘 and (𝐴 \ 𝑌) ∪ {𝑒} ∈ ℐ.
Then ℐ is a 𝑘-system.

We leave the proof of the above as an exercise.


We refer the reader to [59, 120] for analysis of Greedy in 𝑘-systems and other
special cases.

4.3 Randomized Rounding with Alteration for Packing


Problems
The purpose of this section to highlight a technique for rounding LP relaxations
for packing problems. We will consider a simple example, namely the maximum
weight independent set problem in interval graphs. Recall that we are given 𝑛
intervals 𝐼1 , 𝐼2 , . . . , 𝐼𝑛 with non-negative weights 𝑤 1 , . . . , 𝑤 𝑛 and the goal is to
find a maximum weight subset of them which do not overlap. Let 𝐼 𝑖 = [𝑎 𝑖 , 𝑏 𝑖 ]
and let 𝑝 1 , 𝑝2 , . . . , 𝑝 𝑚 be the collection of end points of the intervals. We can
write a simple LP relaxation for this problem. For each interval 𝑖 we have a
variable 𝑥 𝑖 ∈ [0, 1] to indicate whether 𝐼 𝑖 is chosen or not. For each point 𝑝 𝑗 ,
among all intervals that contain it, at most one can be chosen. These are clique
constraints in the underlying interval graph.

maximize ∑_{𝑖=1}^{𝑛} 𝑤𝑖 𝑥𝑖
subject to ∑_{𝑖:𝑝𝑗∈𝐼𝑖} 𝑥𝑖 ≤ 1   1 ≤ 𝑗 ≤ 𝑚
          𝑥𝑖 ∈ [0, 1]   1 ≤ 𝑖 ≤ 𝑛

Note that it is important to retain the constraint that 𝑥 𝑖 ≤ 1. Interestingly


it is known that the LP relaxation defines an integer polytope and hence one
can solve the integer program by solving the LP relaxation! This is because
the incidence matrix defining the LP is totally unimodular (TUM). We refer the

reader to books on combinatorial optimization for further background on this


topic. Here we assume that we do not know the integer properties of the LP.
We will round it via a technique that is powerful and generalizes to NP-Hard
variants of the interval scheduling problem among many others.
Suppose we solve the LP and obtain an optimum fractional solution 𝑥*. We
have ∑𝑖 𝑤𝑖 𝑥*𝑖 ≥ OPT. How do we round to obtain an integer solution whose
value is close to that of OPT? Suppose we randomly choose 𝐼𝑖 with probability
𝑐𝑥*𝑖 for some 𝑐 ≤ 1. Let 𝑅 be the random set of chosen intervals. Then the
expected weight of 𝑅, by linearity of expectation, is 𝑐 ∑𝑖 𝑤𝑖 𝑥*𝑖 ≥ 𝑐 · OPT. However,
it is highly likely that the random solution 𝑅 is not going to be feasible. Some
constraint will be violated. The question is how we can fix or alter 𝑅 to find
a subset 𝑅0 ⊆ 𝑅 such that 𝑅0 is a feasible solution and the expected value of
𝑅0 is not too much smaller than that of 𝑅. This depends on the independence
structure.
Here we illustrate this via the interval problem. Without loss of generality
we assume that 𝐼1 , . . . , 𝐼𝑛 are sorted by their right end point. In other words the
order is a perfect elimination order for the underlying interval graph.

Rounding-with-Alteration

1. Let 𝑥 be an optimum fractional solution

2. Round each 𝑖 to 1 independently with probability 𝑥𝑖/2. Let 𝑥′ be the rounded
solution.

3. 𝑅 ← {𝑖 | 𝑥′𝑖 = 1}

4. 𝑆 ← ∅

5. For 𝑖 = 𝑛 down to 1 do

A. If (𝑖 ∈ 𝑅) and (𝑆 ∪ {𝑖} is feasible) then 𝑆 ← 𝑆 ∪ {𝑖}

6. Output feasible solution 𝑆
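A Python sketch of this procedure is below. It assumes the intervals are given sorted by right endpoint and that the fractional values 𝑥 come from solving the LP, which is not shown; the tuple-based input format is a hypothetical choice.

import random

def round_intervals_with_alteration(intervals, x):
    # intervals: list of (a_i, b_i) sorted by right endpoint b_i;
    # x: fractional LP values in [0,1], one per interval (assumed given).
    n = len(intervals)
    R = [i for i in range(n) if random.random() < x[i] / 2]   # phase 1
    S = []                                                    # phase 2: prune
    for i in sorted(R, reverse=True):      # reverse perfect-elimination order
        a_i, b_i = intervals[i]
        disjoint = all(intervals[j][1] < a_i or b_i < intervals[j][0] for j in S)
        if disjoint:
            S.append(i)
    return S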

The algorithm consists of two phases. The first phase is a simple selection
phase via independent randomized rounding. The second phase is deterministic
and is a greedy pruning step in the reverse elimination order. To analyze the
expected value of 𝑆 we consider two binary random variables for each 𝑖, 𝑌𝑖 and
𝑍 𝑖 . 𝑌𝑖 is 1 if 𝑖 ∈ 𝑅 and 0 otherwise. 𝑍 𝑖 is 1 if 𝑖 ∈ 𝑆 and 0 otherwise.
By linearity of expectation, we have:

Claim 4.3.1. 𝔼[𝑤(𝑆)] = ∑𝑖 𝑤𝑖 𝔼[𝑍𝑖] = ∑𝑖 𝑤𝑖 P[𝑍𝑖 = 1].

The next claim follows from the independent randomized rounding in the algorithm.


Claim 4.3.2. P[𝑌𝑖 = 1] = 𝑥 𝑖 /2.


How do we analyze P[𝑍 𝑖 = 1]? The random variables 𝑍1 , . . . , 𝑍 𝑛 are
not independent and could be highly correlated even though 𝑌1 , . . . , 𝑌𝑛 are
independent. For this purpose we try to understand P[𝑍 𝑖 = 0 | 𝑌𝑖 = 1] which
is the conditional probability that an interval 𝐼 𝑖 that is chosen in the first step
is rejected in the pruning phase. We often would not be able to get an exact
estimate of this quantity but we can upper bound it as follows. Here the ordering
plays a crucial role. Why would 𝐼 𝑖 be rejected in the pruning phase? Note
that when 𝐼 𝑖 is considered in the pruning phase, the only intervals that have
been considered have their right end points after the right end point of 𝐼 𝑖 . Let
𝐴𝑖 = {𝑗 | 𝑗 > 𝑖 and 𝐼𝑗 and 𝐼𝑖 intersect at 𝑏𝑖} be the potential set of intervals that
can cause 𝑖 to be rejected. Recall that the LP implies the following constraint at the point 𝑏𝑖:

𝑥𝑖 + ∑_{𝑗∈𝐴𝑖} 𝑥𝑗 ≤ 1.

Let ℰ1 be the event that 𝐼𝑖 is rejected in the pruning phase. Let
ℰ2 be the event that at least one of the intervals in 𝐴𝑖 is selected in the first phase.
Note that ℰ1 can happen only if ℰ2 happens. Thus P[ℰ1 ] ≤ P[ℰ2 ]. In general we
try to upper bound P[ℰ2 ]. In this simple case we have an exact formula for it.
P[ℰ2] = 1 − ∏_{𝑗∈𝐴𝑖} P[𝑌𝑗 = 0] = 1 − ∏_{𝑗∈𝐴𝑖} (1 − 𝑥𝑗/2).

We claim that P[ℰ2] ≤ ∑_{𝑗∈𝐴𝑖} 𝑥𝑗/2 ≤ 1/2. One can derive this by showing that
∏_{𝑗∈𝐴𝑖} (1 − 𝑥𝑗/2) is at least 1/2 subject to ∑_{𝑗∈𝐴𝑖} 𝑥𝑗/2 ≤ 1/2. Another way of doing
this is via Markov's inequality. Let 𝑇 = ∑_{𝑗∈𝐴𝑖} 𝑌𝑗 be the number of intervals from
𝐴𝑖 selected in the first phase. Then E[𝑇] ≤ ∑_{𝑗∈𝐴𝑖} 𝑥𝑗/2 ≤ 1/2, and ℰ2 is the event
that 𝑇 ≥ 1. By Markov's inequality, P[ℰ2] = P[𝑇 ≥ 1] ≤ E[𝑇] ≤ 1/2.
Using the claim,
P[𝑍 𝑖 = 1 | 𝑌𝑖 = 1] = 1 − P[𝑍 𝑖 = 0|𝑌𝑖 = 1] ≥ 1/2.
This allows us to lower bound the expected weight of the solution output by the
algorithm, and yields a randomized 1/4 approximation.
Claim 4.3.3. 𝔼[𝑤(𝑆)] ≥ ∑𝑖 𝑤𝑖 𝑥𝑖/4.
Proof. We have
𝔼[𝑤(𝑆)] = ∑𝑖 𝑤𝑖 P[𝑍𝑖 = 1] = ∑𝑖 𝑤𝑖 P[𝑌𝑖 = 1] P[𝑍𝑖 = 1 | 𝑌𝑖 = 1] ≥ ∑𝑖 𝑤𝑖 (𝑥𝑖/2 · 1/2) = ∑𝑖 𝑤𝑖 𝑥𝑖/4.

This type of rounding has applications to a variety of settings - see [CVZ]
for applications and the general framework called contention resolution schemes.

4.4 Packing Integer Programs (PIPs)


We can express the Knapsack problem as the following integer program. We
scaled the knapsack capacity to 1 without loss of generality.

maximize ∑_{𝑖=1}^{𝑛} 𝑝𝑖 𝑥𝑖
subject to ∑𝑖 𝑠𝑖 𝑥𝑖 ≤ 1
          𝑥𝑖 ∈ {0, 1}   1 ≤ 𝑖 ≤ 𝑛

More generally, if we have multiple linear constraints on the “items” we obtain


the following integer program.

Definition 4.7. A packing integer program (PIP) is an integer program of the form
max{𝑤𝑥 | 𝐴𝑥 ≤ 1, 𝑥 ∈ {0, 1} 𝑛 } where 𝑤 is a 1 × 𝑛 non-negative vector and 𝐴 is a
𝑚 × 𝑛 matrix with entries in [0, 1]. We call it a {0, 1}-PIP if all entries are in {0, 1}.

In some cases it is useful/natural to define the problem as max{𝑤𝑥 | 𝐴𝑥 ≤


𝑏, 𝑥 ∈ {0, 1} 𝑛 } where entries in 𝐴 and 𝑏 are required to rational/integer valued.
We can convert it into the above form by dividing each row of 𝐴 by 𝑏 𝑖 .
When 𝑚, the number of rows of 𝐴 (equivalently the number of constraints), is small the
problem is tractable. It is sometimes called the 𝑚-dimensional knapsack problem and
one can obtain a PTAS for any fixed constant 𝑚. However, when 𝑚 is large
we observe that MIS can be cast as a special case of {0, 1}-PIP. It corresponds
exactly to the simple integer/linear program that we saw in the previous section.
Therefore the problem is at least as hard to approximate as MIS. Here we show
via a clever LP-rounding idea that one can generalize the notion of bounded-
degree to column-sparsity in PIPs and obtain a related approximation. We will
then introduce the notion of width of the constraints and show how it allows for
improved bounds.

Definition 4.8. A PIP is 𝑘-column-sparse if the number of non-zero entries in each


column of 𝐴 is at most 𝑘. A PIP has width 𝑊 if max𝑖,𝑗 𝐴 𝑖𝑗 /𝑏 𝑖 ≤ 1/𝑊.

4.4.1 Randomized Rounding with Alteration for PIPs


We saw that randomized rounding gave an 𝑂(log 𝑛) approximation algorithm
for the Set Cover problem which is a canonical covering problem. Here we
will consider the use of randomized rounding for packing problems. Let 𝑥 be
an optimum fractional solution to the natural LP relaxation of a PIP where we

replace the constraint 𝑥 ∈ {0, 1}^𝑛 by 𝑥 ∈ [0, 1]^𝑛. Suppose we apply independent
randomized rounding where we set 𝑥′𝑖 to 1 with probability 𝑥𝑖. Let 𝑥′ be the
resulting integer solution. The expected weight of this solution is exactly ∑𝑖 𝑤𝑖 𝑥𝑖,
which is the LP solution value. However, 𝑥′ may not satisfy the constraints given
by 𝐴𝑥 ≤ 𝑏. A natural strategy to try to satisfy the constraints is to set 𝑥′𝑖 to 1 with
probability 𝑐𝑥𝑖 where 𝑐 < 1 is some scaling constant. This may help in satisfying
the constraints because the scaling creates some room in the constraints; we
now have that the expected solution value is 𝑐 ∑𝑖 𝑤𝑖 𝑥𝑖, a loss of a factor of 𝑐.
now have that the expected solution value is 𝑐 𝑖 𝑤 𝑖 𝑥 𝑖 , a loss of a factor of 𝑐.
Í
Scaling by itself does not allow us to claim that all constraints are satisfied with
good probability. A very useful technique in this context is the technique of
alteration; we judiciously fix/alter the rounded solution 𝑥′ to force it to satisfy
the constraints by setting some of the variables that are 1 in 𝑥′ to 0. The trick
is to do this in such a way as to have a handle on the final probability that a
variable is set to 1. We will illustrate this for the Knapsack problem and then
generalize the idea to 𝑘-sparse PIPs. The algorithms we present are from [18].
See [CVZ] for further applications and related problems.
Rounding for Knapsack: Consider the Knapsack problem. It is convenient to
think of this in the context of PIPs. So we have 𝑎𝑥 ≤ 1 where 𝑎 𝑖 now represents
the size of item 𝑖 and the knapsack capacity is 1; 𝑤𝑖 is the weight of item 𝑖. Suppose
𝑥 is a fractional solution. Call an item 𝑖 “big” if 𝑎 𝑖 > 1/2 and otherwise it is
“small”. Let 𝑆 be the indices of small items and 𝐵 the indices of the big items.
Consider the following rounding algorithm.

Rounding-with-Alteration for Knapsack

1. Let 𝑥 be an optimum fractional solution

2. Round each 𝑖 to 1 independently with probability 𝑥𝑖/4. Let 𝑥′ be the rounded
solution.

3. 𝑥″ = 𝑥′

4. If (𝑥′𝑖 = 1 for exactly one big item 𝑖)

   A. For each 𝑗 ≠ 𝑖 set 𝑥″𝑗 = 0

5. Else If (∑_{𝑖∈𝑆} 𝑎𝑖 𝑥′𝑖 > 1 or two or more big items are chosen in 𝑥′)

   A. For each 𝑗 set 𝑥″𝑗 = 0

6. Output feasible solution 𝑥″

In words, the algorithm alters the rounded solution 𝑥′ as follows. If exactly
one big item is chosen in 𝑥′ then the algorithm retains that item and rejects
all the other (small) items. Otherwise, the algorithm rejects all items if two or
more big items are chosen in 𝑥′ or if the total size of the small items chosen in 𝑥′
exceeds the capacity.
The following claim is easy to verify.

Claim 4.4.1. The integer solution 𝑥″ is feasible.

Now let us analyze the probability of an item 𝑖 being present in the final
solution. Let ℰ1 be the event that ∑_{𝑖∈𝑆} 𝑎𝑖 𝑥′𝑖 > 1, that is, the sum of the sizes of
the small items chosen in 𝑥′ exceeds the capacity. Let ℰ2 be the event that at least
one big item is chosen in 𝑥′.

Claim 4.4.2. P[ℰ1 ] ≤ 1/4.

Proof. Let 𝑋𝑠 = ∑_{𝑖∈𝑆} 𝑎𝑖 𝑥′𝑖 be the random variable that measures the sum of the
sizes of the small items chosen. We have, by linearity of expectation, that

𝔼[𝑋𝑠] = ∑_{𝑖∈𝑆} 𝑎𝑖 𝔼[𝑥′𝑖] = ∑_{𝑖∈𝑆} 𝑎𝑖 𝑥𝑖/4 ≤ 1/4.

By Markov's inequality, P[𝑋𝑠 > 1] ≤ 𝔼[𝑋𝑠]/1 ≤ 1/4. □


Claim 4.4.3. P[ℰ2 ] ≤ 1/2.

Proof. Since the size of each big item in 𝐵 is at least 1/2, we have 1 ≥ ∑_{𝑖∈𝐵} 𝑎𝑖 𝑥𝑖 ≥
∑_{𝑖∈𝐵} 𝑥𝑖/2. Therefore ∑_{𝑖∈𝐵} 𝑥𝑖/4 ≤ 1/2. Event ℰ2 happens if some item 𝑖 ∈ 𝐵 is
chosen in the random selection. Since 𝑖 is chosen with probability 𝑥𝑖/4, by the
union bound, P[ℰ2] ≤ ∑_{𝑖∈𝐵} 𝑥𝑖/4 ≤ 1/2. □

Lemma 4.2. Let 𝑍𝑖 be the indicator random variable that is 1 if 𝑥″𝑖 = 1 and 0 otherwise.
Then 𝔼[𝑍 𝑖 ] = P[𝑍 𝑖 = 1] ≥ 𝑥 𝑖 /16.

Proof. We consider the binary random variable 𝑋𝑖 which is 1 if 𝑥′𝑖 = 1. We have
𝔼[𝑋𝑖] = P[𝑋𝑖 = 1] = 𝑥𝑖/4. We write

P[𝑍𝑖 = 1] = P[𝑋𝑖 = 1] · P[𝑍𝑖 = 1 | 𝑋𝑖 = 1] = (𝑥𝑖/4) · P[𝑍𝑖 = 1 | 𝑋𝑖 = 1].
To lower bound P[𝑍 𝑖 = 1 | 𝑋𝑖 = 1] we upper bound the probability P[𝑍 𝑖 =
0|𝑋𝑖 = 1], that is, the probability that we reject 𝑖 conditioned on the fact that it is
chosen in the random solution 𝑥′.
First consider a big item 𝑖 that is chosen in 𝑥′. Then 𝑖 is rejected iff another
big item is chosen in 𝑥′; the probability of this can be upper bounded by P[ℰ2].
If item 𝑖 is small then it is rejected if and only if ℰ1 happens or a big item is
chosen, and the latter happens with probability P[ℰ2]. In either case

P[𝑍𝑖 = 0 | 𝑋𝑖 = 1] ≤ P[ℰ1] + P[ℰ2] ≤ 1/4 + 1/2 = 3/4.

Thus,

P[𝑍𝑖 = 1] = P[𝑋𝑖 = 1] · P[𝑍𝑖 = 1 | 𝑋𝑖 = 1] = (𝑥𝑖/4)(1 − P[𝑍𝑖 = 0 | 𝑋𝑖 = 1]) ≥ 𝑥𝑖/16.

One can improve the above analysis to show that P[𝑍 𝑖 = 1] ≥ 𝑥 𝑖 /8.

Theorem 4.9. The randomized algorithm outputs a feasible solution of expected weight
at least ∑_{𝑖=1}^{𝑛} 𝑤𝑖 𝑥𝑖/16.

Proof. The expected weight of the output is


𝔼[∑𝑖 𝑤𝑖 𝑥″𝑖] = ∑𝑖 𝑤𝑖 𝔼[𝑍𝑖] ≥ ∑𝑖 𝑤𝑖 𝑥𝑖/16,

where we used the previous lemma to lower bound 𝔼[𝑍 𝑖 ]. 

Rounding for 𝒌-sparse PIPs: We now extend the rounding algorithm and
analysis above to 𝑘-sparse PIPs. Let 𝑥 be a feasible fractional solution to
max{𝑤𝑥 | 𝐴𝑥 ≤ 1, 𝑥 ∈ [0, 1]𝑛 }. For a column index 𝑖 we let 𝑁(𝑖) = {𝑗 | 𝐴 𝑗,𝑖 > 0}
be the indices of the rows in which 𝑖 has a non-zero entry. Since 𝐴 is 𝑘-
column-sparse we have that |𝑁(𝑖)| ≤ 𝑘 for 1 ≤ 𝑖 ≤ 𝑛. When we have more
than one constraint we cannot classify an item/index 𝑖 as big or small since it
may be big for some constraints and small for others. We say that 𝑖 is small
for constraint 𝑗 ∈ 𝑁(𝑖) if 𝐴 𝑗,𝑖 ≤ 1/2 otherwise 𝑖 is big for constraint 𝑗. Let
𝑆𝑗 = {𝑖 | 𝑗 ∈ 𝑁(𝑖) and 𝑖 is small for 𝑗} be the set of all small columns for 𝑗 and
𝐵𝑗 = {𝑖 | 𝑗 ∈ 𝑁(𝑖) and 𝑖 is big for 𝑗} be the set of all big columns for 𝑗. Note that
𝑆𝑗 ∪ 𝐵𝑗 is the set of all 𝑖 with 𝐴_{𝑗,𝑖} > 0.

Rounding-with-Alteration for 𝑘-sparse PIPs

1. Let 𝑥 be an optimum fractional solution

2. Round each 𝑖 to 1 independently with probability 𝑥𝑖/(4𝑘). Let 𝑥′ be the rounded
solution.

3. 𝑥″ = 𝑥′

4. For 𝑗 = 1 to 𝑚 do

   A. If (𝑥′𝑖 = 1 for exactly one 𝑖 ∈ 𝐵𝑗)

      1. For each ℎ ∈ 𝑆𝑗 ∪ 𝐵𝑗 with ℎ ≠ 𝑖 set 𝑥″ℎ = 0

   B. Else If (∑_{𝑖∈𝑆𝑗} 𝐴_{𝑗,𝑖} 𝑥′𝑖 > 1 or two or more items from 𝐵𝑗 are chosen in 𝑥′)

      1. For each ℎ ∈ 𝑆𝑗 ∪ 𝐵𝑗 set 𝑥″ℎ = 0

5. Output feasible solution 𝑥″
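A Python sketch of this rounding is below. It assumes the fractional solution 𝑥 is given and that 𝐴 is stored as a dense list of rows with entries in [0, 1]; this data layout is a hypothetical choice for illustration.

import random

def round_ksparse_pip(A, x, k):
    # A: m x n matrix (list of rows), constraints A x'' <= 1;
    # x: fractional solution; k: column sparsity.
    m, n = len(A), len(x)
    xp = [1 if random.random() < x[i] / (4 * k) else 0 for i in range(n)]  # x'
    xpp = xp[:]                                                            # x''
    for j in range(m):
        big = [i for i in range(n) if A[j][i] > 0.5]
        small = [i for i in range(n) if 0 < A[j][i] <= 0.5]
        chosen_big = [i for i in big if xp[i] == 1]
        small_load = sum(A[j][i] for i in small if xp[i] == 1)
        if len(chosen_big) == 1:
            keep = chosen_big[0]
            for h in big + small:
                if h != keep:
                    xpp[h] = 0           # reject everything else touching j
        elif len(chosen_big) >= 2 or small_load > 1:
            for h in big + small:
                xpp[h] = 0               # reject everything touching j
    return xpp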

The algorithm, after picking the random solution 𝑥′, alters it as follows: it


applies the previous algorithm’s strategy to each constraint 𝑗 separately. Thus
an element 𝑖 can be rejected at different constraints 𝑗 ∈ 𝑁(𝑖). We need to bound
the total probability of rejection. As before, the following claim is easy to verify.
Claim 4.4.4. The integer solution 𝑥″ is feasible.
Now let us analyze the probability of an item 𝑖 being present in the final
solution. Let ℰ1(𝑗) be the event that ∑_{𝑖∈𝑆𝑗} 𝐴_{𝑗,𝑖} 𝑥′𝑖 > 1, that is, the sum of the sizes
of the items that are small for 𝑗 in 𝑥′ exceeds the capacity. Let ℰ2(𝑗) be the event
that at least one item that is big for 𝑗 is chosen in 𝑥′. The following claims follow from
the same reasoning as the ones before with the only change being the scaling
factor.
Claim 4.4.5. P[ℰ1 (𝑗)] ≤ 1/(4𝑘).
Claim 4.4.6. P[ℰ2 (𝑗)] ≤ 1/(2𝑘).
Lemma 4.3. Let 𝑍𝑖 be the indicator random variable that is 1 if 𝑥″𝑖 = 1 and 0 otherwise.
Then 𝔼[𝑍 𝑖 ] = P[𝑍 𝑖 = 1] ≥ 𝑥 𝑖 /(16𝑘).
Proof. We consider the binary random variable 𝑋𝑖 which is 1 if 𝑥′𝑖 = 1 after the
randomized rounding. We have 𝔼[𝑋𝑖] = P[𝑋𝑖 = 1] = 𝑥𝑖/(4𝑘). We write

P[𝑍𝑖 = 1] = P[𝑋𝑖 = 1] · P[𝑍𝑖 = 1 | 𝑋𝑖 = 1] = (𝑥𝑖/(4𝑘)) · P[𝑍𝑖 = 1 | 𝑋𝑖 = 1].

We upper bound the probability P[𝑍𝑖 = 0 | 𝑋𝑖 = 1], that is, the probability
that we reject 𝑖 conditioned on the fact that it is chosen in the random solution
𝑥′. We observe that

P[𝑍𝑖 = 0 | 𝑋𝑖 = 1] ≤ ∑_{𝑗∈𝑁(𝑖)} (P[ℰ1(𝑗)] + P[ℰ2(𝑗)]) ≤ 𝑘(1/(4𝑘) + 1/(2𝑘)) ≤ 3/4.

We used the fact that |𝑁(𝑖)| ≤ 𝑘 and the claims above. Therefore,

P[𝑍𝑖 = 1] = P[𝑋𝑖 = 1] · P[𝑍𝑖 = 1 | 𝑋𝑖 = 1] = (𝑥𝑖/(4𝑘))(1 − P[𝑍𝑖 = 0 | 𝑋𝑖 = 1]) ≥ 𝑥𝑖/(16𝑘).

The theorem below follows by using the above lemma and linearity of
expectation to compare the expected weight of the output of the randomized
algorithm with that of the fractional solution.

Theorem 4.10. The randomized algorithm outputs a feasible solution of expected weight
at least ∑_{𝑖=1}^{𝑛} 𝑤𝑖 𝑥𝑖/(16𝑘). Thus there is a 1/(16𝑘)-approximation for 𝑘-sparse PIPs.

Larger width helps: We saw during the discussion on the Knapsack problem
that if all items are small with respect to the capacity constraint then one can
obtain better approximations. For PIPs we defined the width of a given instance
as 𝑊 if max_{𝑖,𝑗} 𝐴_{𝑖𝑗}/𝑏𝑖 ≤ 1/𝑊; in other words, no single item is more than 1/𝑊
times the capacity of any constraint. One can show, using a very similar algorithm
and analysis as above, that the approximation bound improves to Ω(1/𝑘^{1/⌊𝑊⌋})
for instances with width 𝑊. Thus if 𝑊 = 2 we get an Ω(1/√𝑘)-approximation
instead of an Ω(1/𝑘)-approximation. More generally, when 𝑊 ≥ 𝑐 log 𝑘/𝜖 for some
sufficiently large constant 𝑐 we can get a (1 − 𝜖)-approximation. Thus, in the
setting with multiple knapsack constraints, the notion of small with respect
to capacities is that in each constraint the size of the item is at most 𝑐𝜖/log 𝑘 times the
capacity of that constraint.
Chapter 5

Load Balancing and Bin Packing

This chapter is based on notes first scribed by Rachit Agarwal.


In the last lecture, we studied the Knapsack problem. The Knapsack problem
is an NP-hard problem but does admit a pseudo-polynomial time algorithm and can
be solved efficiently if the numbers in the input are small. We used this pseudo-polynomial
time algorithm to obtain an FPTAS for Knapsack. In this lecture, we study
another class of problems, known as strongly NP-hard problems.

Definition 5.1 (Strongly NP-hard Problems). An NPO problem Π is said to be strongly NP-hard if it is NP-hard even when the numbers in the input are polynomially bounded in the combinatorial size of the input.¹

Many NP-hard problems are in fact strongly NP-hard. If a problem Π is strongly NP-hard, then Π does not admit a pseudo-polynomial time algorithm unless P = NP. We study two such problems in this lecture, Multiprocessor Scheduling and Bin Packing.

5.1 Load Balancing / MultiProcessor Scheduling


A central problem in scheduling theory is to design a schedule such that the finishing time of the last job (also called the makespan) is minimized. This problem is often referred to as the Load Balancing, the Minimum Makespan Scheduling or the Multiprocessor Scheduling problem.
¹An alternative definition: a problem Π is strongly NP-hard if every problem in NP can be polynomially reduced to Π in such a way that the numbers in the reduced instance are always written in unary.


5.1.1 Problem Description


In the Multiprocessor scheduling problem, we are given 𝑚 identical machines
𝑀1 , . . . , 𝑀𝑚 and 𝑛 jobs 𝐽1 , 𝐽2 , . . . , 𝐽𝑛 . Job 𝐽𝑖 has a processing time 𝑝 𝑖 ≥ 0 and the
goal is to assign jobs to the machines so as to minimize the maximum load2.

5.1.2 Greedy Algorithm


Consider the following greedy algorithm for the Multiprocessor Scheduling
problem which we will call Greedy Multiprocessor Scheduling.
Greedy Multiprocessor Scheduling:
Order (list) the jobs arbitrarily
For 𝑖 = 1 to 𝑛 do
Assign Job 𝐽𝑖 to the machine with least current load
Update load of the machine that receives job 𝐽𝑖

This algorithm is also called a list scheduling algorithm, following Graham's terminology from his 1966 paper [68]. The list is the order in which the jobs are processed, and changing it creates different schedules. We prove that the Greedy Multiprocessor Scheduling algorithm gives a (2 − 1/m)-approximation.
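A minimal Python sketch of list scheduling, with machine loads tracked in a heap (the function name and interface are ours, not from the notes):

    import heapq

    def list_schedule(processing_times, m):
        # Assign each job, in the given list order, to the machine
        # with the least current load.
        loads = [(0.0, machine) for machine in range(m)]
        heapq.heapify(loads)
        assignment = [None] * len(processing_times)
        for job, p in enumerate(processing_times):
            load, machine = heapq.heappop(loads)        # least loaded machine
            assignment[job] = machine
            heapq.heappush(loads, (load + p, machine))  # update its load
        makespan = max(load for load, _ in loads)
        return assignment, makespan

    # The tight instance below: m(m-1) unit jobs followed by one job of length m.
    m = 4
    jobs = [1.0] * (m * (m - 1)) + [float(m)]
    print(list_schedule(jobs, m)[1])   # prints 7 = 2m - 1, while OPT = m = 4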
Theorem 5.2. The Greedy Multiprocessor Scheduling algorithm gives a (2 − 1/m)-approximation for any list.
To prove the theorem, we will make use of two lower bounds on the length
of the optimal schedule which we denote by OPT.
Observation 5.3. OPT ≥ (∑_i p_i)/m.
Observation 5.4. OPT ≥ max𝑖 𝑝 𝑖 .
We leave the proofs of the observations as easy exercises.
Proof of Theorem 5.2. Fix the list and let 𝐿 denote the makespan of the Greedy
Multiprocessor Scheduling algorithm. Let 𝐿 𝑖 denote the load of machine 𝑀 𝑖
and let 𝑀 𝑖 ∗ be the most heavily loaded machine in the schedule by Greedy
Multiprocessor Scheduling algorithm.
Let 𝐽𝑘 be the last job assigned to 𝑀 𝑖 ∗ . Since Greedy Multiprocessor Schedul-
ing algorithm assigns a job to the machine that is least loaded, all machines must
be loaded at least 𝐿 − 𝑝 𝑘 at the time of assigning 𝐽𝑘 . Hence, we have:
(∑_{i=1}^{n} p_i) − p_k ≥ m (L − p_k)    (5.1)

2The load of a machine is defined as the sum of the processing times of jobs that are assigned
to that machine.

which implies

L − p_k ≤ ((∑_{i=1}^{n} p_i) − p_k)/m,

hence

L ≤ (∑_{i=1}^{n} p_i)/m + p_k (1 − 1/m) ≤ OPT + OPT (1 − 1/m) = OPT (2 − 1/m),

where the second inequality follows from the two lower bounds on OPT. □
The above analysis is tight, i.e., there exist instances where the greedy
algorithm produces a schedule which has a makespan (2 − 1/𝑚) times the
optimal. Consider the following instance: 𝑚(𝑚 − 1) jobs with unit processing
time and a single job with processing time 𝑚. Suppose the greedy algorithm
schedules all the short jobs before the long job, then the makespan of the schedule
obtained is (2𝑚 − 1) while the optimal makespan is 𝑚. Hence the algorithm
gives a schedule which has makespan 2 − 1/𝑚 times the optimal.
It may seem from the tight example above that an approximation ratio
𝛼 < (2 − 1/𝑚) could be achieved if the jobs are sorted before processing, which
indeed is the case. The following algorithm, due to [69], sorts the jobs in
decreasing order of processing time prior to running Greedy Multiprocessor
Scheduling algorithm.

Modified Greedy Multiprocessor Scheduling:


Sort the jobs in decreasing order of processing times
For 𝑖 = 1 to 𝑛 do
Assign Job 𝐽𝑖 to the machine with least current load
Update load of the machine that receives job 𝐽𝑖

Graham [69] proved the following tight bound.

Theorem 5.5. Modified Greedy Multiprocessor Scheduling algorithm gives a


(4/3 − 1/3𝑚)-approximation for the Multiprocessor Scheduling problem.

We will not prove the preceding theorem which requires some careful case
analysis. Instead we will show how one can obtain an easier bound of 3/2 via
the following claim.

Claim 5.1.1. Suppose p_1 ≥ p_2 ≥ · · · ≥ p_n and n > m. Then, OPT ≥ p_m + p_{m+1}.

Proof. Since 𝑛 > 𝑚 and the processing times are sorted in decreasing order,
some two of the (𝑚 + 1) largest jobs must be scheduled on the same machine.
Notice that the load of this machine is at least 𝑝 𝑚 + 𝑝 𝑚+1 . 
Exercise 5.1. Prove that Modified Greedy Multiprocessor Scheduling gives a
(3/2 − 1/2𝑚)-approximation using the preceding claim and the other two lower
bounds on OPT that we have seen already.
Before going to the description of a PTAS for Multiprocessor Scheduling
problem, we discuss the case when the processing times of the jobs are bounded
from above.
Claim 5.1.2. If 𝑝 𝑖 ≤ 𝜖·OPT, ∀𝑖, then Modified Greedy Multiprocessor Scheduling
gives a (1 + 𝜖)-approximation.

5.1.3 A PTAS for Multi-Processor Scheduling


We will now give a PTAS for the problem of scheduling jobs on identical
machines. We would like to use the same set of ideas that were used for the
Knapsack problem (see Lecture 4): that is, given an explicit time 𝑇 we would
like to round the job lengths and use dynamic programming to see if they will fit
within time 𝑇. Then the unrounded job lengths should fit within time 𝑇(1 + 𝜖).
Big Jobs, Small Jobs and Rounding Big Jobs: For the discussion that follows,
we assume that all the processing times have been scaled so that OPT = 1 and
hence, 𝑝 𝑚𝑎𝑥 ≤ 1.
Given all the jobs, we partition the jobs into two sets: Big jobs and Small jobs.
We call a job 𝐽𝑖 “big” if 𝑝 𝑖 ≥ 𝜖. Let ℬ and 𝒮 denote the set of big jobs and small
jobs respectively, i.e., ℬ = {𝐽𝑖 : 𝑝 𝑖 ≥ 𝜖} and 𝒮 = {𝐽𝑖 : 𝑝 𝑖 < 𝜖}. The significance
of such a partition is that once we pack the jobs in set ℬ, the jobs in set 𝒮 can be
greedily packed using list scheduling.
Claim 5.1.3. If there is an assignment of jobs in ℬ to the machines with load 𝐿,
then greedily scheduling jobs of 𝒮 on top gives a schedule of value no greater than
max {𝐿, (1 + 𝜖) OPT}.
Proof. Consider scheduling the jobs in 𝒮 after all the jobs in ℬ have been
scheduled (with load 𝐿). If all of these jobs in 𝒮 finish processing by time 𝐿, the
total load is clearly no greater than 𝐿.
If the jobs in 𝒮 cannot be scheduled within time L, consider the last job to finish (after scheduling the small jobs). Suppose this job starts at time T′. All the machines must have been fully loaded up to T′, which gives OPT ≥ T′. Since, for all jobs in 𝒮, we have p_i ≤ ε · OPT, this job finishes by time T′ + ε · OPT. Hence, the makespan of the schedule is at most T′ + ε · OPT ≤ (1 + ε) OPT, settling the claim. □

Scheduling Big Jobs: We concentrate on scheduling the jobs in ℬ. We round


the sizes of all jobs in ℬ using geometrically increasing interval sizes using the
following procedure:

Rounding Jobs:
For each big job i do
    If p_i ∈ (ε(1 + ε)^j, ε(1 + ε)^{j+1}]
        Set p_i = ε(1 + ε)^{j+1}

Let ℬ′ be the set of rounded jobs.

Claim 5.1.4. If the jobs in ℬ can be scheduled with load 1, then the rounded jobs in ℬ′ can be scheduled with load (1 + ε).

Claim 5.1.5. The number of distinct big job sizes after rounding is O(ln(1/ε)/ε).

Proof. Notice that due to scaling, we have p_i ≤ 1 for all jobs J_i. Since the big job sizes are between ε and 1, the number of geometric powers of (1 + ε) required is k where

ε(1 + ε)^k ≤ 1  ⇒  k ≤ ln(1/ε)/ln(1 + ε) = O(ln(1/ε)/ε). □

Lemma 5.1. If the number of distinct job sizes is k, then there is an exact algorithm that returns the schedule (if there is one) and runs in time O(n^{2k}).

Proof. Use dynamic programming over machine "configurations", i.e., vectors recording how many jobs of each of the k distinct sizes are placed on a single machine; see the sketch below. □
Corollary 5.6. Big jobs can be scheduled (if possible) with load (1 + ε) in time n^{O(ln(1/ε)/ε)}.
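The following Python sketch (our own helper with a hypothetical interface) makes the dynamic program of Lemma 5.1 concrete: given counts of jobs for each of the k distinct sizes, it computes the minimum number of machines of capacity T needed. There are at most (n+1)^k states and at most (n+1)^k feasible single-machine configurations, which gives the O(n^{2k}) bound.

    from functools import lru_cache
    from itertools import product

    def min_machines(counts, sizes, T):
        # counts[t] = number of jobs of size sizes[t]; k = len(sizes).
        k = len(sizes)
        # all single-machine configurations (c_1, ..., c_k) with total size <= T
        configs = [c for c in product(*(range(n + 1) for n in counts))
                   if sum(ci * si for ci, si in zip(c, sizes)) <= T and any(c)]

        @lru_cache(maxsize=None)
        def f(remaining):
            if not any(remaining):
                return 0
            best = float('inf')
            for c in configs:
                if all(ci <= ri for ci, ri in zip(c, remaining)):
                    best = min(best, 1 + f(tuple(r - ci for r, ci in zip(remaining, c))))
            return best

        return f(tuple(counts))

    # e.g. four (rounded) jobs of size 0.3 and two of size 0.55 fit on 3 machines of capacity 1:
    print(min_machines([4, 2], [0.3, 0.55], 1.0))   # 3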
Once we have scheduled the jobs in ℬ, using Claim 5.1.3, we can pack small
items using greedy list scheduling on top of them. The overall algorithm is then
given as:
PTAS Multiprocessor Scheduling:
1. Guess OPT
2. Define ℬ and 𝒮
3. Round ℬ to ℬ 0
4. If jobs in ℬ 0 can be scheduled in (1 + 𝜖) OPT
Greedily pack 𝒮 on top
Else
Modify the guess and Repeat.
In the following subsection, we comment on the guessing process.

Guessing: We define a (1 + 𝜖)-relaxed decision procedure:

Definition 5.7. Given ε > 0 and a time T, a (1 + ε)-relaxed decision procedure either:

• outputs a schedule with makespan at most (1 + ε) · T (it must do so whenever a schedule with makespan T exists), or

• outputs correctly that there is no schedule with makespan T.

Define

L = max { max_j p_j , (1/m) ∑_j p_j }.
𝐿 is a lower bound on OPT as we saw earlier. Furthermore, an upper bound on
OPT is given by the Greedy Multiprocessor Scheduling algorithm, which is 2𝐿.
Consider running the decision procedure with guess L + iεL for each integer i ∈ [⌈2/ε⌉]. We will choose the schedule with the best makespan among all the successful runs. If L* is the optimum load, then the algorithm will try the decision procedure with some guess in [L*, L* + εL] ⊆ [L*, (1 + ε)L*]. For this guess we are guaranteed a solution, and the decision procedure will succeed in outputting a schedule with load at most (1 + ε)(1 + ε)L* ≤ (1 + 3ε)L* for sufficiently small ε. We run the decision procedure O(1/ε) times. This gives us the desired PTAS.
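A sketch of this guessing loop in Python (here relaxed_decision is assumed to be a (1+ε)-relaxed decision procedure as in Definition 5.7, returning the makespan of a schedule of value at most (1+ε)T, or None if no schedule of makespan T exists; both the function and its interface are our own illustration):

    def ptas_makespan(p, m, eps, relaxed_decision):
        # Lower bound L on OPT; greedy list scheduling shows OPT <= 2L.
        L = max(max(p), sum(p) / m)
        guesses = [L + i * eps * L for i in range(int(2 / eps) + 2)]
        results = [relaxed_decision(p, m, T) for T in guesses]
        return min(r for r in results if r is not None)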
Remark 5.1. A PTAS indicates that the problem can be approximated arbitrarily well in polynomial time. However, a running time of the form n^{f(ε)} is typically not very interesting. We have seen that an FPTAS is ruled out for the makespan minimization problem. However, it does admit what is now called an Efficient PTAS (EPTAS) whose running time is 2^{O((1/ε²)·(log(1/ε))³)} + poly(n). See [93].

5.1.4 Section Notes


Multiprocessor Scheduling is NP-hard as we can reduce 2-Partition to Multi-
processor Scheduling on two machines. Note that this reduction only proves
that Multiprocessor Scheduling is weakly NP-hard. When 𝑚 is a fixed constant
Horowitz and Sahni [87] give an FPTAS. However Multiprocessor Scheduling
problem is strongly NP-hard when 𝑚 is part of the input (by a reduction from
3-Partition [62]). Thus, there can not exist an FPTAS for the Multiprocessor
Scheduling problem in general, unless 𝑃 = 𝑁𝑃. However, Hochbaum and
Shmoys [83] gave a PTAS which is the one we described. EPTASes have been
developed for several problems and a key technique is the use of integer linear
programming solvers with a small number of variables.

5.2 Bin Packing


5.2.1 Problem Description
In the Bin Packing problem, we are given a set of 𝑛 items {1, 2, . . . , 𝑛}. Item 𝑖
has size 𝑠 𝑖 ∈ (0, 1]. The goal is to find a minimum number of bins of capacity 1
into which all the items can be packed.
One could also formulate the problem as partitioning {1, 2, . . . , n} into k sets ℬ_1, ℬ_2, . . . , ℬ_k such that ∑_{i∈ℬ_j} s_i ≤ 1 for each j, and k is minimum.

5.2.2 Greedy Approaches


Consider the following greedy algorithm for bin packing:

Greedy Bin Packing:


Order items in some way
For 𝑖 = 1 to 𝑛
If item 𝑖 can be packed in some open bin
Pack it
Else
Open a new bin and pack 𝑖 in the new bin

In Greedy Bin Packing algorithm, a new bin is opened only if the item can
not be packed in any of the already opened bins. However, there might be
several opened bins in which the item 𝑖 could be packed. Several rules could be
formulated in such a scenario:

• First Fit: Pack item in the earliest opened bin

• Last Fit: Pack item in the last opened bin

• Best Fit: Pack item in the bin that would have least amount of space left
after packing the item

• Worst Fit: Pack item in the bin that would have most amount of space left
after packing the item
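For concreteness, a short Python sketch of the First Fit rule (our own helper, not from the notes):

    def first_fit(sizes):
        # Greedy bin packing with First Fit: place each item into the earliest
        # opened bin with enough room; open a new bin otherwise.
        bins = []          # bins[b] = remaining capacity of bin b
        assignment = []
        for s in sizes:
            for b, remaining in enumerate(bins):
                if s <= remaining:
                    bins[b] -= s
                    assignment.append(b)
                    break
            else:
                bins.append(1.0 - s)
                assignment.append(len(bins) - 1)
        return len(bins), assignment

    print(first_fit([0.6, 0.5, 0.5, 0.4, 0.3, 0.2]))   # uses 3 bins on this instance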

Irrespective of what strategy is chosen to pack an item in the opened bins,


one could get the following result:

Theorem 5.8. Any greedy rule yields a 2-approximation.

Observation 5.9. OPT ≥ ∑_i s_i.

We call a bin 𝛼-full if items occupy space at most 𝛼.



Claim 5.2.1. Greedy has at most one bin that is ½-full.

Proof. For the sake of contradiction, assume that there are two bins B_i and B_j that are ½-full. WLOG, assume that the Greedy Bin Packing algorithm opened bin B_i before B_j. Then, the first item that the algorithm packed into B_j must be of size at most ½. However, this item could have been packed into B_i since B_i is ½-full. This is a contradiction to the fact that the Greedy Bin Packing algorithm opens a new bin if and only if the item cannot be packed in any of the opened bins. □

Proof of Theorem 5.8. Let m be the number of bins opened by the Greedy Bin Packing algorithm. From Claim 5.2.1, we have

∑_i s_i > (m − 1)/2.

Using the observation that OPT ≥ ∑_i s_i, we get

OPT > (m − 1)/2,

which gives us m < 2 · OPT + 1, and hence m ≤ 2 · OPT. □

5.2.3 (Asymptotic) PTAS for Bin Packing


A natural question follows the discussion above: can Bin Packing have a PTAS? In this subsection, we settle this question in the negative. In particular, we give a reduction from an NP-complete problem to the Bin Packing problem and show that a PTAS for the Bin Packing problem would give us an exact solution for the NP-complete problem in polynomial time. We consider the Partition problem:
In the Partition problem, we are given a set of items {1, 2, . . . , n}. Item i has a size s_i. The goal is to partition {1, 2, . . . , n} into two sets 𝒜 and ℬ such that ∑_{i∈𝒜} s_i = ∑_{j∈ℬ} s_j.

Claim 5.2.2. If Bin Packing has a (3/2 − ε)-approximation for any ε > 0, the Partition problem can be solved exactly in polynomial time.

Proof. Given an instance I of the Partition problem, we construct an instance of the Bin Packing problem as follows: scale the sizes of the items so that ∑_i s_i = 2. Consider the scaled sizes of the items as an instance I′ of the Bin Packing problem. If all items of I′ can be packed in 2 bins, then we have a “yes” answer to I. Otherwise, the items of I′ need 3 bins and the answer to I is “no”. Thus OPT for I′ is 2 or 3. Hence, if there is a (3/2 − ε)-approximation algorithm for the Bin Packing problem, we can determine the value of OPT, which in turn implies that we can solve I. Thus, there cannot exist a (3/2 − ε)-approximation algorithm for the Bin Packing problem, unless P = NP. □
Recall the scaling property where we discussed why many optimization
problems do not admit additive approximations. We notice that the Bin Packing
problem does not have the scaling property. Hence it may be possible to find an
additive approximation algorithms. We state some of the results in this context:
Theorem 5.10 (Johnson ’74 [96]). There exists a polynomial time algorithm 𝒜_J such that

𝒜_J(I) ≤ (11/9) OPT(I) + 4

for all instances I of the Bin Packing problem.
Theorem 5.11 (de la Vega, Lueker ’81 [48]). For any fixed ε > 0 there exists a polynomial time algorithm 𝒜_FL such that

𝒜_FL(I) ≤ (1 + ε) OPT(I) + 1

for all instances I of the Bin Packing problem.
Theorem 5.12 (Karmarkar, Karp ’82 [100]). There exists a polynomial time algorithm 𝒜_KK such that

𝒜_KK(I) ≤ OPT(I) + O(log²(OPT(I)))

for all instances I of the Bin Packing problem.
This has been improved recently.
Theorem 5.13 (Hoberg and Rothvoss 2017 [82]). There exists a polynomial time algorithm 𝒜_HT such that

𝒜_HT(I) ≤ OPT(I) + O(log(OPT(I)))

for all instances I of the Bin Packing problem.
A major open problem is the following.
Open Question 5.14. Is there a polynomial-time algorithm 𝒜 such that 𝒜(𝐼) ≤
OPT(𝐼) + 𝑐, for some fixed constant 𝑐? In particular is 𝑐 = 1?
Exercise 5.2. Show that the First Fit greedy rule uses at most (3/2) OPT + 1 bins.

5.2.4 Asymptotic PTAS for Bin Packing


A recurring theme in the last two lectures has been the rounding of jobs/tasks/items. To construct an asymptotic PTAS for the Bin Packing problem, we use the same set of ideas, with modifications that are simple in retrospect but non-obvious. In particular, we divide the set of items into big and small items and concentrate on packing the big items first. We show that such a technique results in an asymptotic PTAS for the Bin Packing problem.
Consider the set of items, 𝑠 1 , 𝑠2 , . . . , 𝑠 𝑛 . We divide the items into two sets,
ℬ = {𝑖 : 𝑠 𝑖 ≥ 𝜖} and 𝒮 = {𝑗 : 𝑠 𝑗 < 𝜖}. Similar to the Multiprocessor
Scheduling problem, where we rounded up the processing times of the jobs,
we round up the sizes of the items in the Bin Packing problem. Again, we
concentrate only on the items in ℬ. Let 𝑛 0 = |ℬ| be the number of big items.
We observed earlier that OPT ≥ ∑_i s_i and hence OPT ≥ ε · n′ by just considering the big items.
Claim 5.2.3. Suppose n′ > 4/ε². Then OPT ≥ 4/ε.
If there are very few big items one can solve the problem by brute force.
Claim 5.2.4. Suppose n′ < 4/ε². An optimal solution for the Bin Packing problem can be computed in 2^{O(1/ε⁴)} time.

Proof Sketch. If the number of big items is small, one can find the optimal solution using brute force search. □
The following gives a procedure to round up the items in ℬ:

Rounding Item Sizes:
Sort the items such that s_1 ≥ s_2 ≥ · · · ≥ s_{n′}
Group the items into k = 2/ε² groups ℬ_1, . . . , ℬ_k such that each group has ⌊n′/k⌋ items
For i ≥ 2, round the size of each item in group ℬ_i up to the size of the smallest item in ℬ_{i−1} (ℬ_1 is left unrounded)
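A short Python sketch of this linear grouping step (our own helper; details such as a non-divisible n′ are glossed over):

    def linear_grouping(big_sizes, eps):
        # Returns (B1, rounded): B1 is the group of largest items (later packed
        # one per bin) and `rounded` are the rounded sizes of the remaining
        # items, which take at most k - 1 distinct values.
        sizes = sorted(big_sizes, reverse=True)
        nprime = len(sizes)
        k = max(1, int(2 / eps ** 2))
        g = max(1, nprime // k)                 # group size floor(n'/k)
        groups = [sizes[i:i + g] for i in range(0, nprime, g)]
        B1 = groups[0]
        rounded = []
        for i in range(1, len(groups)):
            target = groups[i - 1][-1]          # smallest size in the previous group
            rounded.extend([target] * len(groups[i]))
        return B1, rounded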

Lemma 5.2. Consider the restriction of the bin packing problem to instances in which
the number of distinct item sizes is 𝑘. There is an 𝑛 𝑂(𝑘) -time algorithm that outputs the
optimum solution.
Proof Sketch. Use Dynamic Programming. 

Claim 5.2.5. The items in ℬ can be packed in OPT + |ℬ_1| bins in time n^{O(1/ε²)}.

Proof. Using Rounding Item Sizes, we have restricted all items but those in ℬ_1 to have one of at most k − 1 distinct sizes. Moreover, each rounded item of group ℬ_i is no larger than the corresponding original item of group ℬ_{i−1}, so an optimal packing of the original instance yields a packing of the rounded items (other than those in ℬ_1) into at most OPT bins, and by Lemma 5.2 such a packing can be found efficiently. Furthermore, the items in ℬ_1 can always be packed in |ℬ_1| bins (one per bin). Hence, the total number of bins is OPT + |ℬ_1|.
The running time of the algorithm follows since k = O(1/ε²). □

Lemma 5.3. Let ε > 0 be fixed. Consider the restriction of the bin packing problem to instances in which each item is of size at least ε. There is a polynomial time algorithm that solves this restricted problem within a factor of (1 + ε).

Proof. Using Claim 5.2.5, we can pack ℬ in OPT + |ℬ_1| bins. Recall that |ℬ_1| = ⌊n′/k⌋ ≤ ε² · n′/2 ≤ ε · OPT/2, where we have used OPT ≥ ε · n′ to reach the final expression. □
Theorem 5.15. For any 𝜖, 0 < 𝜖 < 1/2, there is an algorithm 𝒜 𝜖 that runs in time
polynomial in 𝑛 and finds a packing using at most (1 + 2𝜖) OPT +1 bins.

Proof. Assume that the number of bins used to pack the items in ℬ is m and the total number of bins used after packing the items in 𝒮 is m′. Clearly

m′ ≤ max { m, ⌈(∑_i s_i)/(1 − ε)⌉ },

since at most one bin is at most (1 − ε) full, by the argument used for Greedy Bin Packing. Furthermore,

⌈(∑_i s_i)/(1 − ε)⌉ ≤ (∑_i s_i)(1 + 2ε) + 1 ≤ (1 + 2ε) OPT + 1

for ε < 1/2. This gives the required expression. □


The algorithm is summarized below:

Asymptotic Ptas Bin Packing:


Split the items in ℬ (big items) and 𝒮 (small items)
Round the sizes of the items in ℬ to obtain constant number of item sizes
Find optimal packing for items with rounded sizes
Use this packing for original items in ℬ
Pack items in 𝒮 using Greedy Bin Packing.

5.2.5 Section Notes


An excellent but perhaps somewhat dated survey on approximation algorithms
for Bin Packing problem is [97]. See [82] for some pointers to more recent
work. There has also been substantial recent work on various generalizations to
multiple dimensions.
Chapter 6

Unrelated Machine Scheduling


and Generalized Assignment

This chapter is based on notes first scribed by Alina Ene.

6.1 Scheduling on Unrelated Parallel Machines


We have a set J of n jobs, and a set M of m machines. The processing time of job i is p_ij on machine j. Let f : J → M be a function that assigns each job to exactly one machine. The makespan of f is max_{1≤j≤m} ∑_{i: f(i)=j} p_ij, where ∑_{i: f(i)=j} p_ij is the total processing time of the jobs that are assigned to machine j. In the Scheduling on Unrelated Parallel Machines problem, the goal is to find an assignment of jobs to machines of minimum makespan.
We can write an LP for the problem that is very similar to the routing LP
from the previous lecture. For each job 𝑖 and each machine 𝑗, we have a variable
𝑥 𝑖𝑗 that denotes whether job 𝑖 is assigned to machine 𝑗. We also have a variable
𝜆 for the makespan. We have a constraint for each job that ensures that the job
is assigned to some machine, and we have a constraint for each machine that
ensures that the total processing time of jobs assigned to the machines is at most
the makespan 𝜆.


minimize  λ
subject to  ∑_{j∈M} x_ij = 1          ∀i ∈ J
            ∑_{i∈J} x_ij p_ij ≤ λ      ∀j ∈ M
            x_ij ≥ 0                   ∀i ∈ J, j ∈ M
The above LP is very natural, but unfortunately it has an unbounded integrality gap. Suppose that we have a single job that has processing time T on each of the machines. Clearly, the optimal schedule has makespan T. However, the LP can schedule the job to the extent of 1/m on each of the machines, i.e., it can set x_{1j} = 1/m for all j, and the makespan of the resulting fractional schedule is only T/m.
To overcome this difficulty, we modify the LP slightly. Suppose we knew
that the makespan of the optimal solution is equal to 𝜆, where 𝜆 is some
fixed number. If the processing time 𝑝 𝑖𝑗 of job 𝑖 on machine 𝑗 is greater than
𝜆, job 𝑖 is not scheduled on machine 𝑗, and we can strengthen the LP by
setting 𝑥 𝑖𝑗 to 0 or equivalently, by removing the variable. More precisely, let
𝒮𝜆 = {(𝑖, 𝑗) | 𝑖 ∈ 𝐽, 𝑗 ∈ 𝑀, 𝑝 𝑖𝑗 ≤ 𝜆}. Given a value 𝜆, we can write the following
LP for the problem.

LP(λ)
∑_{j: (i,j)∈𝒮_λ} x_ij = 1          ∀i ∈ J
∑_{i: (i,j)∈𝒮_λ} x_ij p_ij ≤ λ      ∀j ∈ M
x_ij ≥ 0                            ∀(i, j) ∈ 𝒮_λ
Note that the LP above does not have an objective function. In the following,
we are only interested in whether the LP is feasible, i.e, whether there is an
assignment that satisfies all the constraints. Also, we can think of 𝜆 as a
parameter and LP(𝜆) as a family of LPs, one for each value of the parameter. A
useful observation is that, if 𝜆 is a lower bound on the makespan of the optimal
schedule, LP(𝜆) is feasible and it is a valid relaxation for the Scheduling on
Unrelated Parallel Machines problem.
Lemma 6.1. Let 𝜆∗ be the minimum value of the parameter 𝜆 such that LP(𝜆) is feasible.
We can find 𝜆∗ in polynomial time.

Proof. For any fixed value of λ, we can check whether LP(λ) is feasible using a polynomial-time algorithm for solving LPs. Thus we can find λ* using binary search starting with the interval [0, ∑_{i,j} p_ij]. □
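As an illustration, the feasibility check for a single guess λ can be written with an off-the-shelf LP solver; the sketch below uses scipy and is our own helper, not part of the notes. Binary search over λ in [0, ∑ p_ij] then locates λ*.

    import numpy as np
    from scipy.optimize import linprog

    def lp_lambda_feasible(p, lam):
        # p: n x m array of processing times; variables x_ij only for p_ij <= lam.
        n, m = p.shape
        idx = [(i, j) for i in range(n) for j in range(m) if p[i, j] <= lam]
        if len({i for i, _ in idx}) < n:      # some job has no allowed machine
            return False
        A_eq = np.zeros((n, len(idx)))        # assignment constraints
        A_ub = np.zeros((m, len(idx)))        # load constraints
        for t, (i, j) in enumerate(idx):
            A_eq[i, t] = 1.0
            A_ub[j, t] = p[i, j]
        res = linprog(c=np.zeros(len(idx)), A_ub=A_ub, b_ub=np.full(m, lam),
                      A_eq=A_eq, b_eq=np.ones(n), bounds=(0, None), method="highs")
        return res.status == 0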

In the following, we will show how to round a solution to LP(𝜆∗ ) in order to
get a schedule with makespan at most 2𝜆∗ . As we will see shortly, it will help to
round a solution to LP(𝜆∗ ) that is a vertex solution.
Let 𝑥 be a vertex solution to LP(𝜆∗ ). Let 𝐺 be a bipartite graph on the vertex
set 𝐽 ∪ 𝑀 that has an edge 𝑖𝑗 for each variable 𝑥 𝑖𝑗 ≠ 0. We say that job 𝑖 is
fractionally set if 𝑥 𝑖𝑗 ∈ (0, 1) for some 𝑗. Let 𝐹 be the set of all jobs that are
fractionally set, and let 𝐻 be a bipartite graph on the vertex set 𝐹 ∪ 𝑀 that has
an edge 𝑖𝑗 for each variable 𝑥 𝑖𝑗 ∈ (0, 1); note that 𝐻 is the induced subgraph of 𝐺
on F ∪ M. As shown in Lemma 6.2, the graph H has a matching that matches every job in F to a machine, and we will use it in the rounding algorithm.

Lemma 6.2. The graph 𝐺 has a matching that matches every job in 𝐹 to a machine.

We are now ready to give the rounding algorithm.

SUPM-Rounding
Find 𝜆∗
Find a vertex solution 𝑥 to LP(𝜆∗ )
For each 𝑖 and 𝑗 such that 𝑥 𝑖𝑗 = 1, assign job 𝑖 to machine 𝑗
Construct the graph 𝐻
Find a maximum matching ℳ in 𝐻
Assign the fractionally set jobs according to the matching ℳ

Theorem 6.1. Consider the assignment constructed by SUPM-Rounding. Each job is


assigned to a machine, and the makespan of the schedule is at most 2𝜆∗ .

Proof. By Lemma 6.2, the matching ℳ matches every fractionally set job to a
machine and therefore all of the jobs are assigned. After assigning all of the
integrally set jobs, the makespan (of the partial schedule) is at most 𝜆∗ . Since
ℳ is a matching, each machine receives at most one additional job. Let 𝑖 be a
fractionally set job, and suppose that 𝑖 is matched (in ℳ) to machine 𝑗. Since
the pair (𝑖, 𝑗) is in 𝒮𝜆∗ , the processing time 𝑝 𝑖𝑗 is at most 𝜆∗ , and therefore the
total processing time of machine 𝑗 increases by at most 𝜆 after assigning the
fractionally set jobs. Therefore the makespan of the final schedule is at most
2𝜆∗ . 
Exercise 6.1. Give an example that shows that Theorem 6.1 is tight. That is,
give an instance and a vertex solution such that the makespan of the schedule
SUPM-Rounding is at least (2 − 𝑜(1))𝜆∗ .

Since 𝜆∗ is a lower bound on the makespan of the optimal schedule, we get the
following corollary.
Corollary 6.2. SUPM-Rounding achieves a 2-approximation.
Now we turn our attention to Lemma 6.2 and some other properties of vertex
solutions to LP(𝜆). The following can be derived from the rank lemma which is
described in Chapter A. Here we give a self-contained proof.
Lemma 6.3. If LP(𝜆) is feasible, any vertex solution has at most 𝑚 + 𝑛 non-zero
variables and it sets at least 𝑛 − 𝑚 of the jobs integrally.
Proof. Let 𝑥 be a vertex solution to LP(𝜆). Let 𝑟 denote the number of pairs in
𝒮𝜆 . Note that LP(𝜆) has 𝑟 variables, one for each pair (𝑖, 𝑗) ∈ 𝒮𝜆 . If 𝑥 is a vertex
solution, it satisfies 𝑟 of the constraints of LP(𝜆) with equality. The first set of
constraints consists of 𝑛 constraints, and the second set of constraints consists of
𝑚 constraints. Therefore at least 𝑟 − (𝑚 + 𝑛) of the tight constraints are from the
third set of constraints, i.e., at least 𝑟 − (𝑚 + 𝑛) of the variables are set to zero.
We say that job 𝑖 is set fractionally if 𝑥 𝑖𝑗 ∈ (0, 1) for some 𝑗; job 𝑖 is set
integrally if 𝑥 𝑖𝑗 ∈ {0, 1} for all 𝑗. Let 𝐼 and 𝐹 be the set of jobs that are set
integrally and fractionally (respectively). Clearly, |𝐼 | + |𝐹| = 𝑛. Any job 𝑖 that
is fractionally set is assigned (fractionally) to at least two machines, i.e., there
exist 𝑗 ≠ ℓ such that 𝑥 𝑖𝑗 ∈ (0, 1) and 𝑥 𝑖ℓ ∈ (0, 1). Therefore there are at least
2|𝐹| distinct non-zero variables corresponding to jobs that are fractionally set.
Additionally, for each job 𝑖 that is integrally set, there is a variable 𝑥 𝑖𝑗 that is
non-zero. Thus the number of non-zero variables is at least |𝐼 | + 2|𝐹|. Hence
|𝐼 | + |𝐹| = 𝑛 and |𝐼 | + 2|𝐹| ≤ 𝑚 + 𝑛, which give us that |𝐼 | is at least 𝑛 − 𝑚. 
Definition 6.3. A connected graph is a pseudo-tree if the number of edges is at most
the number of vertices. A graph is a pseudo-forest if each of its connected components
is a pseudo-tree.
Lemma 6.4. The graph 𝐺 is a pseudo-forest.
Proof. Let C be a connected component of G. We restrict LP(λ) and x to the jobs and machines in C to get LP′(λ) and x′. Note that x′ is a feasible solution to LP′(λ). Additionally, x′ is a vertex solution to LP′(λ). If not, x′ is a convex combination of two feasible solutions x′_1 and x′_2 to LP′(λ). We can extend x′_1 and x′_2 to two solutions x_1 and x_2 to LP(λ) using the entries of x that are not in x′. By construction, x_1 and x_2 are feasible solutions to LP(λ). Additionally, x is a convex combination of x_1 and x_2, which contradicts the fact that x is a vertex solution. Thus x′ is a vertex solution to LP′(λ) and, by Lemma 6.3, x′ has at most n′ + m′ non-zero variables, where n′ and m′ are the number of jobs and machines in C. Thus C has n′ + m′ vertices and at most n′ + m′ edges, and therefore it is a pseudo-tree. □

Proof. of Lemma 6.2 Note that each job that is integrally set has degree one in
𝐺. We remove each integrally set job from 𝐺; note that the resulting graph is 𝐻.
Since we removed an equal number of vertices and edges from 𝐺, it follows that
𝐻 is a pseudo-forest as well. Now we construct a matching ℳ as follows.
Note that every job vertex has degree at least 2, since the job is fractionally
assigned to at least two machines. Thus all of the leaves (degree-one vertices) of
𝐻 are machines. While 𝐻 has at least one leaf, we add the edge incident to the
leaf to the matching and we remove both of its endpoints from the graph. If 𝐻
does not have any leaves, 𝐻 is a collection of vertex-disjoint cycles, since it is a
pseudo-forest. Moreover, each cycle has even length, since 𝐻 is bipartite. We
construct a perfect matching for each cycle (by taking alternate edges), and we
add it to our matching. 
Exercise 6.2. (Exercise 17.1 in [152]) Give a proof of Lemma 6.2 using Hall’s
theorem.

6.2 Generalized Assignment Problem


The Generalized Assignment problem is a generalization of the Scheduling on
Unrelated Parallel Machines problem in which there are costs associated with
each job-machine pair, in addition to a processing time. More precisely, we have
a set 𝐽 of 𝑛 jobs, a set 𝑀 of 𝑚 machines, and a target 𝜆. The processing time
of job 𝑖 is 𝑝 𝑖𝑗 on machine 𝑗, and the cost of assigning job 𝑖 to machine 𝑗 is 𝑐 𝑖𝑗 .
Let 𝑓 : 𝐽 → 𝑀 be a function that assigns each job to exactly one machine. The
assignment 𝑓 is feasible if its makespan is at most 𝜆 (recall that 𝜆 is part of the
input), and its cost is ∑_i c_{i f(i)}. In the Generalized Assignment problem, the goal is to construct a minimum cost assignment f that is feasible, provided that there
is a feasible assignment.
Remark 6.1. We could allow each machine 𝑀 𝑗 to have a different capacity 𝑐 𝑗
which is more natural in certain settings. However, since 𝑝 𝑖𝑗 values are allowed
to depend on 𝑗 we can scale them to ensure that 𝑐 𝑗 = 𝜆 for every 𝑗 without loss
of generality.
In the following, we will show that, if there is an assignment of cost 𝐶 and
makespan at most 𝜆, then we can construct a schedule of cost at most 𝐶 and
makespan at most 2λ. In fact the assignment will have a stronger property: the load on a machine exceeds λ due to at most one job.
As before, we let 𝒮𝜆 denote the set of all pairs (𝑖, 𝑗) such that 𝑝 𝑖𝑗 ≤ 𝜆. We can
generalize the relaxation LP(𝜆) from the previous section to the following LP.

GAP-LP
min  ∑_{(i,j)∈𝒮_λ} c_ij x_ij
subject to  ∑_{j: (i,j)∈𝒮_λ} x_ij = 1          ∀i ∈ J
            ∑_{i: (i,j)∈𝒮_λ} x_ij p_ij ≤ λ      ∀j ∈ M
            x_ij ≥ 0                            ∀(i, j) ∈ 𝒮_λ
Since we also need to preserve the costs, we can no longer use the previous
rounding; in fact, it is easy to see that the previous rounding is arbitrarily bad
for the Generalized Assignment problem. However, we will still look for a
matching, but in a slightly different graph.
But before we give the rounding algorithm for the Generalized Assignment
problem, we take a small detour into the problem of finding a minimum-cost
matching in a bipartite graph. In the Minimum Cost Bipartite Matching problem,
we are given a bipartite graph 𝐵 = (𝑉1 ∪ 𝑉2 , 𝐸) with costs 𝑐 𝑒 on the edges, and
we want to construct a minimum cost matching ℳ that matches every vertex in
𝑉1 , if there is such a matching. For each vertex 𝑣, let 𝛿(𝑣) be the set of all edges
incident to 𝑣. We can write the following LP for the problem.

BipartiteMatching(B)
min  ∑_{e∈E(B)} c_e y_e
subject to  ∑_{e∈δ(v)} y_e = 1   ∀v ∈ V_1
            ∑_{e∈δ(v)} y_e ≤ 1   ∀v ∈ V_2
            y_e ≥ 0              ∀e ∈ E(B)
The following is well-known in combinatorial optimization [137].
Theorem 6.4. For any bipartite graph 𝐵, any vertex solution to BipartiteMatching(𝐵)
is an integer solution. Moreover, given a feasible fractional solution 𝑦, we can find in
polynomial time a feasible solution 𝑧 such that 𝑧 is integral and
∑_{e∈E(B)} c_e z_e ≤ ∑_{e∈E(B)} c_e y_e.

In the rest of the section we give two different proofs that establish our
claimed result. One is based on the first work that gave this result [140], and the
other is based on iterative rounding [110].

6.2.1 Shmoys-Tardos Rounding


Let 𝑥 be an optimal vertex solution to GAP-LP. As before, we want to construct
a graph 𝐺 that has a matching ℳ that matches all jobs. The graph 𝐺 will now
have costs on its edges and we want a matching of cost at most 𝐶. Recall that for
Scheduling on Unrelated Parallel Machines we defined a bipartite graph on
the vertex set 𝐽 ∪ 𝑀 that has an edge 𝑖𝑗 for every variable 𝑥 𝑖𝑗 that is non-zero. We
can construct the same graph for Generalized Assignment, and we can assign
a cost 𝑐 𝑖𝑗 to each edge 𝑖𝑗. If the solution 𝑥 was actually a fractional matching
— that is, if 𝑥 was a feasible solution to BipartiteMatching(𝐺) — Theorem 6.4
would give us the desired matching. The solution 𝑥 satisfies the constraints
corresponding to vertices 𝑣 ∈ 𝐽, but it does not necessarily satisfy the constraints
corresponding vertices 𝑣 ∈ 𝑀, since a machine can be assigned more than one
job. To get around this difficulty, we will introduce several nodes representing
the same machine, and we will use 𝑥 to construct a fractional matching for the
resulting graph.
The fractional solution x assigns ∑_{i∈J} x_ij jobs to machine j; let k_j = ⌈∑_{i∈J} x_ij⌉.
We construct a bipartite graph 𝐺 as follows. For each job 𝑖, we have a node 𝑖. For
each machine 𝑗, we have 𝑘 𝑗 nodes (𝑗, 1), · · · , (𝑗, 𝑘 𝑗 ). We can think of the nodes
(𝑗, 1), · · · , (𝑗, 𝑘 𝑗 ) as slots on machine 𝑗. Since now we have multiple slots on each
of the machines, we need a fractional assignment 𝑦 that assigns a job to slots
on the machines. More precisely, 𝑦 has an entry 𝑦 𝑖,(𝑗,𝑠) for each job 𝑖 and each
slot (𝑗, 𝑠) that represents the fraction of job 𝑖 that is assigned to the slot. We give
the algorithm that constructs 𝑦 from 𝑥 below. Once we have the solution 𝑦, we
add an edge between any job 𝑖 and any machine slot (𝑗, 𝑠) such that 𝑦 𝑖,(𝑗,𝑠) is
non-zero. Additionally, we assign a cost 𝑐 𝑖,(𝑗,𝑠) to each edge (𝑖, (𝑗, 𝑠)) of 𝐺 that is
equal to 𝑐 𝑖𝑗 .

GreedyPacking(x)    ⟨⟨for the current machine j; the jobs 1, . . . , h with p_ij ≤ λ are sorted so that p_1j ≥ · · · ≥ p_hj⟩⟩
    y = 0                       ⟨⟨initialize y to 0⟩⟩
    s = 1                       ⟨⟨s is the current bin⟩⟩
    R = 1                       ⟨⟨R is the space available on bin s⟩⟩
    for i = 1 to h              ⟨⟨pack x_ij into the bins⟩⟩
        if x_ij ≤ R
            y_{i,(j,s)} = x_ij
            R = R − x_ij
            if R = 0
                s = s + 1
                R = 1
        else
            y_{i,(j,s)} = R
            y_{i,(j,s+1)} = x_ij − R    ⟨⟨pack x_ij − R in the next bin⟩⟩
            R = 1 − y_{i,(j,s+1)}
            s = s + 1
    return y

[Figure: the fractions x_1j = 0.5, x_2j = 0.7, x_3j = 0.3, x_4j = 0.2, x_5j = 0.6 are split greedily across unit-capacity slots of machine j: slot (j,1) receives 0.5 of job 1 and 0.5 of job 2; slot (j,2) receives 0.2 of job 2, 0.3 of job 3, 0.2 of job 4 and 0.3 of job 5; slot (j,3) receives the remaining 0.3 of job 5.]
Figure 6.1: Constructing 𝑦 from 𝑥.


When we construct y, we consider each machine in turn. Let j be the current machine. Recall that we want to ensure that y assigns at most one job to each slot; as such, we will think of each slot on machine j as a bin with capacity 1. We
(j,s))2E(G) (i,j)2S

“pack” jobs into the bins greedily. We only consider jobs 𝑖 such that 𝑝 𝑖𝑗 is at most
𝜆; let ℎ denote the number of such jobs. We assume without loss of generality
that these are labeled as 1, 2, · · · , ℎ, and 𝑝 1𝑗 ≥ 𝑝 2𝑗 ≥ · · · ≥ 𝑝 ℎ 𝑗 . Informally, when
we construct 𝑦, we consider the jobs 1, 2, · · · , ℎ in this order. Additionally, we
keep track of the current bin that has not been filled and the amount of space R available on that bin. When we consider job i, we try to pack x_ij into the current bin: if there is at least x_ij space available, i.e., x_ij ≤ R, we pack the entire amount into
the current bin; otherwise, we pack as much as we can into the current bin, and
we pack the rest into the next bin. (See Figure 1 for an example.)
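A small Python sketch of GreedyPacking for a single machine j (our own helper; eps guards against floating-point issues):

    def greedy_packing(x_j, eps=1e-12):
        # x_j = [x_1j, ..., x_hj] with jobs ordered by non-increasing p_ij.
        # Returns a dict mapping (job index, slot index) -> fraction packed.
        y = {}
        s, R = 0, 1.0                  # current slot and its remaining space
        for i, xij in enumerate(x_j):
            if xij <= R + eps:
                y[(i, s)] = xij
                R -= xij
                if R <= eps:
                    s, R = s + 1, 1.0
            else:
                y[(i, s)] = R          # fill the current slot
                y[(i, s + 1)] = xij - R
                R = 1.0 - y[(i, s + 1)]
                s += 1
        return y

    # greedy_packing([0.5, 0.7, 0.3, 0.2, 0.6]) reproduces the split shown in Figure 6.1.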

Lemma 6.5. The solution y constructed by GreedyPacking is a feasible solution to BipartiteMatching(G). Moreover,

∑_{(i,(j,s))∈E(G)} y_{i,(j,s)} c_{i,(j,s)} = ∑_{(i,j)∈𝒮_λ} x_ij c_ij.

Proof. Note that, by construction, x_ij = ∑_{s=1}^{k_j} y_{i,(j,s)}. Therefore, for any job i, we have

∑_{(i,(j,s))∈δ(i)} y_{i,(j,s)} = ∑_{j: (i,j)∈𝒮_λ} ∑_{s=1}^{k_j} y_{i,(j,s)} = ∑_{j: (i,j)∈𝒮_λ} x_ij = 1.

Additionally, since we imposed a capacity of 1 on the bins associated with each slot, it follows that, for any slot (j, s),

∑_{(i,(j,s))∈δ((j,s))} y_{i,(j,s)} ≤ 1.

Therefore y is a feasible solution to BipartiteMatching(G). Finally,

∑_{(i,(j,s))∈E(G)} y_{i,(j,s)} c_{i,(j,s)} = ∑_{i=1}^{n} ∑_{j: (i,j)∈𝒮_λ} ∑_{s=1}^{k_j} y_{i,(j,s)} c_ij = ∑_{(i,j)∈𝒮_λ} x_ij c_ij. □


Theorem 6.4 gives us the following corollary.

Corollary 6.5. The graph G has a matching ℳ that matches every job and has cost at most ∑_{(i,j)∈𝒮_λ} x_ij c_ij. Moreover, we can find such a matching in polynomial time.

GAP-Rounding
    let x be an optimal solution to GAP-LP
    y = GreedyPacking(x)
    construct the graph G
    construct a matching ℳ in G such that ℳ matches every job
        and the cost of ℳ is at most ∑_{(i,j)∈𝒮_λ} x_ij c_ij
    for each edge (i, (j, s)) ∈ ℳ
        assign job i to machine j

Theorem 6.6. Let C = ∑_{(i,j)∈𝒮_λ} x_ij c_ij. The schedule returned by GAP-Rounding has cost at most C and makespan at most 2λ.
Proof. By Corollary 6.5, the cost of the schedule is at most C. Therefore we only need to upper bound the makespan of the schedule.
Consider a machine j. For any slot (j, s) on machine j, let

q_js = max_{i: y_{i,(j,s)} > 0} p_ij.

That is, q_js is the maximum processing time of any pair ij such that job i is assigned (in y) to the slot (j, s). It follows that the total processing time of the jobs that ℳ assigns to machine j is at most ∑_{s=1}^{k_j} q_js.
Since GAP-LP has a variable x_ij only for pairs (i, j) such that p_ij is at most λ, it follows that q_j1 is at most λ. We restrict attention to the case when at least two slots are assigned to j, for otherwise it is easy to see that the load is at most λ. Therefore we only need to show that ∑_{s=2}^{k_j} q_js is at most λ as well. Consider a slot s on machine j such that s > 1. Recall that we labeled the jobs that are relevant to machine j (that is, jobs i such that p_ij is at most λ) as 1, 2, · · · , h such that p_1j ≥ p_2j ≥ · · · ≥ p_hj. Consider a job ℓ that is assigned to slot s. Since GreedyPacking considers jobs in non-increasing order according to their processing times, the processing time p_ℓj of job ℓ is at most the processing time of any job assigned to the slot s − 1. Therefore p_ℓj is upper bounded by any convex combination of the processing times of the jobs that are assigned to the slot s − 1. Since the slot s − 1 is full, ∑_i y_{i,(j,s−1)} = 1 and thus p_ℓj is at most ∑_i y_{i,(j,s−1)} p_ij. It follows that

∑_{s=2}^{k_j} q_js ≤ ∑_{s=2}^{k_j} ∑_i y_{i,(j,s−1)} p_ij ≤ ∑_{s=1}^{k_j} ∑_i y_{i,(j,s)} p_ij.

By construction, ∑_s y_{i,(j,s)} = x_ij, and therefore

∑_{s=1}^{k_j} ∑_i y_{i,(j,s)} p_ij = ∑_i p_ij ∑_s y_{i,(j,s)} = ∑_i p_ij x_ij.

Since x is a feasible solution to GAP-LP,

∑_{s=2}^{k_j} q_js ≤ ∑_i p_ij x_ij ≤ λ,

which completes the proof. □

6.2.2 Iterative Rounding


Here we describe an alternative proof/algorithm that illustrates the iterative
rounding framework that initially came out of Jain’s seminal work [90]. Since
then it has become a powerful technique in exact and approximation algorithms.
We need some additional formalism to describe the algorithm. We will consider
the input instance as being specified by a graph G = (J ∪ M, E) where an edge ij ∈ E implies that i is allowed to be scheduled on j and has size p_ij. It is also
important to consider non-uniform capacities on the machines to allow for a
recursive (or iterative) algorithm. Thus we will assume that each machine 𝑀 𝑗 has
a capacity 𝑏 𝑗 . We will assume that 𝑝 𝑖𝑗 ≤ 𝑏 𝑗 for all 𝑖𝑗 ∈ 𝐸; in fact this assumption
will not be needed until the very end when we analyze the approximation ratio.
It is easy to generalize the LP relaxation for GAP-LP to handle non-uniform
capacities and to handle the constraints specified by 𝐺. We will use the notation
𝛿(𝑖) and 𝛿(𝑗) to denote the edges incident to job 𝑖 and machine 𝑗 respectively.

GAP-LP
min  ∑_{(i,j)∈E} c_ij x_ij
subject to  ∑_{j: (i,j)∈δ(i)} x_ij = 1          ∀i ∈ J
            ∑_{i: (i,j)∈E} p_ij x_ij ≤ b_j       ∀j ∈ M
            x_ij ≥ 0                             ∀(i, j) ∈ E

To explain the underlying intuition for the iterated rounding approach in the specific context of GAP, consider the situation where each machine j has infinite capacity. In this case it is easy to find the minimum cost assignment. We simply assign each job i to the machine j = arg min_{j′∈δ(i)} c_{ij′}, which is the cheapest one
that it is allowed to be assigned to. We also observe that if we drop all the
capacity constraints from GAP-LP, and only leave the assignment constraints
(∑_j x_ij = 1 for each i), then the optimum solution of this LP is the same as the

one obtained by assigning each job to its cheapest allowed machine (one can
also argue that the LP is an integer polytope). Now consider another scenario.
Suppose each machine 𝑗 has in-degree at most 𝑘 in 𝐺 — that is, there are only 𝑘
jobs that can ever be assigned to any machine 𝑗. Now suppose we assign each
job to its cheapest allowed machine. Clearly the cost is at most the optimum
cost of any feasible solution. But what about the load? Since each machine had
in-degree at most 𝑘 we will load a machine 𝑗 to at most 𝑘𝑏 𝑗 . Thus, if 𝑘 = 2 we will
only violate the machine’s load by a factor of 2. However, this seems to be very
restrictive assumption. Now consider a less restrictive scenario where there is
one machine 𝑗 such that its in-degree is at most 2. Then, in the LP relaxation, we
can omit the constraint that limits its load since we are guaranteed that at most
2 jobs can be assigned to it (note that we still have the job assignment constraints
which only allow a job to be assigned to machines according to the edges of 𝐺).
Omitting constraints in an iterative fashion by taking advantage of sparsity in
the basic feasible solution is the key idea.
To allow dropping of constraints we need some notation. Given an instance of GAP specified by G = (J ∪ M, E) and M′ ⊆ M, we let GAPLP(G, M′) denote the LP relaxation for GAP where we only impose the load constraints for machines in M′. In other words, we drop the load constraints for M \ M′. Note that jobs are still allowed to be assigned to machines in M \ M′.
The key structural lemma that allows for iterated rounding is the following.
Lemma 6.6. Let y be a basic feasible solution to GAPLP(G, M′). Then one of the following properties holds:

1. There is some ij ∈ E such that y_ij = 0 or y_ij = 1.

2. There is a machine j ∈ M′ such that deg(j) ≤ 1.

3. There is a machine j ∈ M′ such that deg(j) = 2 and ∑_{i: ij∈E} y_ij ≥ 1.

Proof. Let y be a basic feasible solution. If y_ij = 0 or y_ij = 1 for some edge ij ∈ E we are done. Similarly if there is a machine j ∈ M′ with deg(j) ≤ 1 we are done. Thus we restrict our attention to the case when y_ij is strictly fractional for every edge ij ∈ E and deg(j) > 1 for each machine j ∈ M′. Note that deg(i) ≥ 2 for each job i ∈ J; otherwise deg(i) = 1 and in that case y_ij = 1 for the lone edge ij incident to i. We will prove that the third property holds.
GAPLP(G, M′) has n + m′ non-trivial constraints, where n = |J| and m′ = |M′|. Since y is a basic feasible solution, the rank lemma implies that the number of non-zero variables, which here is |E|, is at most n + m′. On the other hand, every job has degree at least 2, so |E| ≥ 2n, and every machine in M′ has degree at least 2, so |E| ≥ 2m′. Together these force |E| = n + m′, deg(i) = 2 for every i ∈ J, deg(j) = 2 for every j ∈ M′, and deg(j) = 0 for every j ∈ M \ M′. Thus G consists of a collection of disjoint even cycles. Let S be any such cycle. For every job i ∈ S we have ∑_j y_ij = 1; hence ∑_{ij∈S} y_ij = |S|/2. Hence there is some machine j ∈ S such that ∑_{i: ij∈E} y_ij ≥ 1 and moreover its degree is exactly two as we argued. □

GAP-Iter-Rounding(G)

1. F = ∅, M′ = M

2. While (|F| < n) do

    A. Obtain an optimum basic feasible solution y to GAPLP(G, M′)
    B. If there is ij ∈ E such that y_ij = 0 then G = G − ij
    C. Else If there is ij ∈ E such that y_ij = 1 then
        F = F ∪ {ij}, G = G − i, b_j = b_j − p_ij
    D. Else If there is j ∈ M′ such that deg(j) ≤ 1, or deg(j) = 2 and ∑_i y_ij ≥ 1, then
        M′ = M′ − j

3. Output assignment F

Theorem 6.7. Given an instance of GAP that is feasible and has optimum cost 𝐶, the
algorithm GAP-Iter-Rounding outputs an assignment whose cost is at most 𝐶 and
such that each machine 𝑗 has load at most 2𝑏 𝑗 .
The proof is by induction on the number of iterations. Alternatively, it is
useful to view the algorithm recursively. We will sketch the proof and leave
some of the formal details to the reader (who can also consult [110]). We observe
that the algorithm makes progress in each iteration via Lemma 6.6. The analysis
will consider the four cases that can happen in each iteration: (i) y_ij = 0 for some ij ∈ E; (ii) y_ij = 1 for some ij ∈ E; (iii) deg(j) ≤ 1 for some j ∈ M′; and (iv) deg(j) = 2 and ∑_{i: ij∈E} y_ij ≥ 1 for some j ∈ M′.
Thus the algorithm terminates in a polynomial number of iterations. It is also not hard to see that F corresponds to an assignment of jobs to machines.
Observation 6.8. The algorithm terminates and outputs an assignment of jobs to machines, and job i is assigned to j only if ij ∈ E.
Now we prove that the assignment has good properties in terms of the cost
and loads.
Lemma 6.7. The cost of the LP solution at the start of each iteration is at most C − ∑_{ij∈F} c_ij. Hence, at the end of the algorithm the cost of the assignment F is at most C.

Proof. This is true in the first iteration since 𝐹 = ∅ and the LP cost is less than that
of an optimum integer feasible solution. Now consider an iteration assuming
that the precondition holds.
If 𝑦 𝑖𝑗 = 0 we remove 𝑖𝑗 from 𝐸 and we note that the cost of the LP for the next
iteration does not increase since 𝑦 itself is feasible for the residual instance.
If 𝑦 𝑖𝑗 = 1 and we add 𝑖𝑗 to 𝐹, we can charge the cost of 𝑖𝑗 to what the LP has
already paid on the edge 𝑖𝑗, and the solution 𝑦 with 𝑖𝑗 removed is feasible to the
residual instance obtained by removing job 𝑖 and reducing the capacity of 𝑗 to
𝑏 𝑗 − 𝑝 𝑖𝑗 .
In the other cases we do not change 𝐹 but drop constraints so the LP cost can
only decrease in the subsequent iteration. 
Now we upper bound the load on each machine 𝑗.

Lemma 6.8. For each machine j, ∑_{ij∈F} p_ij ≤ 2b_j. In fact, a stronger property holds: for each j, its load at the end of the algorithm is at most b_j, or there is a single job assigned to j such that removing it reduces the load of j to at most b_j.

Proof. The proof is by induction on iterations. We will sketch it. Consider a


machine 𝑗. If 𝑦 𝑖𝑗 = 0 in some iteration we remove 𝑖𝑗 and the load on any machine
does not change. If 𝑦 𝑖𝑗 = 1 we add 𝑖𝑗 to 𝐹 but for subsequent iterations we reduce
𝑏 𝑗 by 𝑝 𝑖𝑗 hence we account for the increase in load of 𝑗.
Thus, the only reason why the load of 𝑗 may exceed 𝑏 𝑗 is because we drop
the load constraint for 𝑗 in some iteration. If we drop it when 𝑑(𝑗) = 1, then at
most one more job can be assigned to 𝑗 and hence its final load can be at most
𝑏 𝑗 + 𝑝 𝑖𝑗 for some 𝑖𝑗 ∈ 𝐸. Thus, if 𝑝 𝑖𝑗 ≤ 𝑏 𝑗 for all 𝑖 the load is at most 2𝑏 𝑗 . We can
also drop the constraint for 𝑗 when 𝑑(𝑗) = 2. However, in this case we have the
property that y_{i_a j} + y_{i_b j} ≥ 1 for some two jobs i_a and i_b whose edges are the only edges incident to j in that iteration. Since y was feasible, we also had the constraint that p_{i_a j} y_{i_a j} + p_{i_b j} y_{i_b j} ≤ b′_j, where b′_j was the residual capacity of j in that iteration. Assume without loss of generality that p_{i_a j} ≤ p_{i_b j}; then it follows that b′_j ≥ p_{i_a j}. Thus the final load of j is at most b_j − b′_j + p_{i_a j} + p_{i_b j} since both i_a and i_b can be assigned to j. But this load is at most b_j + p_{i_b j} ≤ 2b_j. We leave it to the reader to
verify the refined property regarding the load claimed in the lemma. 
Running time: The algorithm runs in polynomial number of iterations, and in
each iteration it requires an optimum basic feasible solution to the 𝐺𝐴𝑃𝐿𝑃(𝐺, 𝑀 0).
This can be done in polynomial time. We remark that the algorithm is deliberately
described in a simple iterative fashion to make the proof easier. One can speed
up the algorithm by considering all the cases together in each iteration. Although
iterated rounding is a powerful technique, the running time is typically expensive.

Finding faster algorithms that achieve similar guarantees is an open area of


research.

6.3 Maximization version of GAP


We consider the maximization version which we refer to Max-GAP. We have
𝑛 items and 𝑚 bins (instead of jobs and machines) where 𝑝 𝑖𝑗 is the size of
item 𝑖 in bin 𝑗. Each bin 𝑗 has a capacity 𝑐 𝑗 and assigning item 𝑖 to bin 𝑗
yields a profit/weight 𝑤 𝑖𝑗 . The goal is to assign items to bins to maximize the
weight while not violating any capacities. When 𝑚 = 1 we obtain the Knapsack
problem.
Multiple Knapsack Problem (MKP): MKP is a special case of Max-GAP in
which 𝑤 𝑖𝑗 = 𝑤 𝑖 for all 𝑖, 𝑗 and 𝑝 𝑖𝑗 = 𝑝 𝑖 for all 𝑖, 𝑗. In other words the item
characteristics do not depend on where it is assigned to.
Exercise 6.3. Prove that MKP does not admit an FPTAS even for 𝑚 = 2.
MKP admits a PTAS [38] and even an EPTAS [94]. Simple greedy algorithms that pack bins one by one, using an algorithm for Knapsack as a black box, yield a (1 − 1/e − ε)-approximation when the bins are identical and a (1/2 − ε)-approximation when the bins are arbitrary [38].
In contrast to MKP, Max-GAP does not admit a PTAS. There is an absolute
constant 𝑐 > 1 such that a 𝑐 − 𝜖 approximation implies 𝑃 = 𝑁𝑃 [38]. However,
the following is known.
Theorem 6.9. For every fixed 𝑚 there is a PTAS for Max-GAP.
The preceding theorem can be shown by generalizing the ideas behind the
PTAS for Knapsack we discussed in an earlier chapter. An interested reader
may try to prove this by considering the case of 𝑚 = 2.
A ½-approximation: There is a simple yet clever way to achieve a ½-approximation for Max-GAP via the 2-approximation for the min-cost version that we already saw. Recall that for the min-cost version the algorithm outputs a solution with cost no more than the optimum cost while violating the capacity of each bin by at most one item. We outline how one can use this to obtain a ½-approximation for Max-GAP and leave the details as an exercise to the reader.
• Reduce the exact maximization version to the exact min-cost version in
which all items have to be assigned by adding an extra dummy bin.
• Use the result for min-cost version to obtain an assignment with weight at
least that of the optimum while violating each bin’s capacity by at most
one item.

• Use the preceding assignment to find a feasible packing of items that has
profit at least OPT/2.

For Max-GAP one can use a stronger LP relaxation and obtain a (1 − 1/𝑒 + 𝛿)-
approximation. We refer the reader to [58] for this result, and also to [28] for
connections to submodular function maximization. The latter connection allows
one to obtain an extremely simple 1/2 − 𝜖 greedy approximation algorithm that
is not obvious to discover.

6.4 Bibliographic Notes


The 2-approximation for unrelated machine scheduling is by Lenstra, Shmoys
and Tardos [115]. The same paper showed that unless 𝑃 = 𝑁𝑃 there is no
3/2 − 𝜖-approximation for unrelated machine scheduling. Bridging this gap
has been a major open problem in scheduling. A special case called restricted
assignment problem has been extensively studied in this context; in such
instances 𝑝 𝑖𝑗 ∈ {𝑝 𝑖 , ∞} which means that a job specifies the machines it can be
assigned to, but its processing time does not vary among the machines it can
be assigned to. The 3/2 hardness from [115] applies to restricted assignement
problem as well. Svensson [145] showed that a strengthend form of LP relaxation
(called the configuration LP) has an integrality gap better than 2 for restricted
assignment problem but so far there is no polynomial time algorithm to actually
output an assignment! The best algorithm that beats 2 for this case runs in
quasi-polynomial time [95].
As we mentioned Shmoys and Tardos obtained the 2-approximation for
GAP. The iterated rounding proof is from [110].
The approximability of Max-GAP when 𝑚 is not a fixed constant was studied
in [38] although a PTAS for fixed 𝑚 was known quite early [33]. The current best
approximation ratio is via a configuration LP [58]. The precise integrality gap of
the configuration LP is an interesting open problem.
Chapter 7

Congestion Minimization in
Networks

In Chapter 6 we saw the Scheduling on Unrelated Parallel Machines problem.


Here we consider two problems that also consider allocation with the objective of
minimizing load/congestion. We will first consider the Congestion Minimization
problem in graphs and then the abstract problem of Min-Max-Integer-Programs.

7.1 Congestion Minimization and VLSI Routing


A classical routing problem that was partly inspired by VLSI (very large scale
integrated) circuit design is the following. Let 𝐺 = (𝑉 , 𝐸) be a directed graph and
let (𝑠 1 , 𝑡1 ), . . . , (𝑠 𝑘 , 𝑡 𝑘 ) be 𝑘 source-sink pairs. We want to connect each pair (𝑠 𝑖 , 𝑡 𝑖 )
by a path 𝑃𝑖 such that the paths do not share edges (or nodes). Alternatively
we would like to minimize the maximum number of paths that use any edge
— this is called the Congestion Minimization problem. A special case is EDP
which is to decide if there are paths for the pairs that are edge-disjoint (NDP is
the version where the paths are required to be node-disjoint). EDP and NDP
are classical NP-Complete problems and have many important connections
to multicommodity flows, routing, cuts, and graph theory. Thus Congestion
Minimization is an NP-Hard optimization problem. Here we will consider two
variants.
Choosing one path from a given collection: We consider a conceptually
simpler variant where we are given a finite path collection 𝒫𝑖 for each pair (𝑠 𝑖 , 𝑡 𝑖 )
where each path 𝑃 ∈ 𝒫𝑖 connects 𝑠 𝑖 to 𝑡 𝑖 . The goal is to choose, for each pair
(𝑠 𝑖 , 𝑡 𝑖 ), one path from 𝒫𝑖 so as to minimize the maximum congestion on any
edge. We can develop an integer programming formulation as follows. For each
𝑖 and each 𝑃 ∈ 𝒫𝑖 we have a variable 𝑥 𝑖,𝑃 which indicates whether we choose


P to route pair i. The constraints express that exactly one path is chosen for each pair i. To minimize the maximum number of paths using any edge we introduce a variable λ and minimize it subject to a natural packing constraint.

minimize  λ
subject to  ∑_{P∈𝒫_i} x_{i,P} = 1                    1 ≤ i ≤ k
            ∑_{i=1}^{k} ∑_{P∈𝒫_i, P∋e} x_{i,P} ≤ λ    ∀e ∈ E
            x_{i,P} ∈ {0, 1}                          1 ≤ i ≤ k, P ∈ 𝒫_i

As usual we relax the integer constraints to obtain an LP relaxation where


we replace 𝑥 𝑖,𝑃 ∈ {0, 1} with 𝑥 𝑖,𝑃 ∈ [0, 1] (in this particular case we can simply
use 𝑥 𝑖,𝑃 ≥ 0 due to the other constraints). Note the similarities with the IP/LP
for Scheduling on Unrelated Parallel Machines. The LP relaxation is of size
polynomial in the input size since the path collection 𝒫𝑖 is explicitly given for
each 𝑖 as part of the input.
Let 𝜆∗ be the optimum LP solution value. There is a technicality that arises
just as we saw with Scheduling on Unrelated Parallel Machines. It may
happen that 𝜆∗ < 1 while we know that the optimum congestion is at least 1.
Technically we should find the smallest integer 𝜆∗ such that the preceding LP is
feasible. We will assume henceforth that 𝜆∗ is an integer.
How do we round? A simple strategy is randomized rounding. In fact the technique of randomized rounding and analysis via Chernoff bounds was developed in the influential paper of Raghavan and Thompson [133] precisely for this problem!

Randomized-Rounding

1. Solve LP relaxation and find optimum solution 𝑥 ∗ , 𝜆∗ .

2. For 𝑖 = 1 to 𝑘 do

A. Pick exactly one path 𝑄 𝑖 ∈ 𝒫𝑖 randomly where the probability of


picking 𝑃 is exactly 𝑥 ∗𝑖,𝑃 .

3. Output 𝑄 1 , 𝑄 2 , . . . , 𝑄 𝑘 .

Note that the choices for the pairs are made with independent randomness.
The analysis requires the use of Chernoff-Hoeffding bounds. See Chapter B.
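To make the rounding step concrete, the following is a minimal Python sketch of the rounding itself (not of solving the LP); the representation of the fractional solution as per-pair weight lists and the helper name are our own choices.

import random
from collections import defaultdict

def randomized_rounding(paths, x):
    # paths[i]: list of candidate paths for pair i, each path a list of (hashable) edges
    # x[i]: LP values for the paths of pair i (non-negative, summing to 1)
    chosen = {}
    for i in paths:
        # pick exactly one path for pair i, with probability x[i][j] for the j-th path
        chosen[i] = random.choices(paths[i], weights=x[i], k=1)[0]
    congestion = defaultdict(int)
    for P in chosen.values():
        for e in P:
            congestion[e] += 1
    return chosen, (max(congestion.values()) if congestion else 0)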

Theorem 7.1. Randomized rounding outputs one path per pair and with probability
at least $(1 - 1/m^2)$ no edge is contained in more than $c \frac{\log m}{\log\log m} \cdot \lambda^*$ paths, where $c$ is
an absolute constant. Here $m$ is the number of edges in the graph $G$. One can also
show that for any fixed $\epsilon > 0$ the congestion is at most $(1 + \epsilon)\lambda^* + c \frac{\log m}{\epsilon^2}$ with high
probability.

Proof. The proof is a simple application of Chernoff bounds and the union
bound. Fix an edge $e \in E$. Let $Y_e$ be the random variable which is the total
number of paths in $Q_1, Q_2, \ldots, Q_k$ that use $e$. Let $Y_{e,i}$ be the binary random
variable that indicates whether $e \in Q_i$. Note that $Y_e = \sum_{i=1}^{k} Y_{e,i}$.
The first observation is that the variables $Y_{e,1}, Y_{e,2}, \ldots, Y_{e,k}$ are independent
since we used independent randomness for the pairs. Second we claim that
$\mathbf{E}[Y_{e,i}] = \mathbf{P}[Y_{e,i} = 1] = \sum_{P \in \mathcal{P}_i, e \in P} x^*_{i,P}$. Do you see why? Thus, by linearity of
expectation,
$$\mathbf{E}[Y_e] = \sum_i \mathbf{E}[Y_{e,i}] = \sum_i \sum_{P \in \mathcal{P}_i, e \in P} x^*_{i,P} \le \lambda^*$$
where the last inequality follows from the LP constraint.
Since $Y_e$ is the sum of independent binary valued random variables and
$\mathbf{E}[Y_e] \le \lambda^*$ we can apply Chernoff bounds to estimate $\mathbf{P}[Y_e \ge c \frac{\log m}{\log\log m} \lambda^*]$. Applying
Corollary B.5 we conclude that we can choose $c$ such that this probability
is at most $1/m^3$. Now we apply the union bound over all edges and conclude
that
$$\mathbf{P}\Big[\exists e \in E,\ Y_e \ge c \tfrac{\log m}{\log\log m} \lambda^*\Big] \le m/m^3 \le 1/m^2.$$
Thus, with probability $\ge 1 - 1/m^2$ no edge is loaded to more than $c \frac{\log m}{\log\log m} \lambda^*$.
The second bound can be derived in the same way by using the second
Chernoff bound in Corollary B.5. □
Remark 7.1. We chose the bound (1−1/𝑚 2 ) for concreteness. A success probability
of the form (1 − 1/poly(𝑛)) where 𝑛 is the input size is typically called “with
high probability”.
Remark 7.2. The bound $(1 + \epsilon)\lambda^* + c \log m/\epsilon^2$ implies that when $\lambda^* \ge c \log m$ we
obtain a constant factor approximation.
Remark 7.3. In a graph 𝑚 = 𝑂(𝑛 2 ) and hence one often sees the bounds expressed
in terms of 𝑛 rather than 𝑚. We chose to write in terms of 𝑚 rather than 𝑛 to
highlight the fact that the bound depends on the number of constraints via the
union bound. We will discuss later how column sparsity based bounds can give
refined results that avoid the union bound.

Implicitly given path collections: In the traditional version of Congestion


Minimization we are only given 𝐺 and the pairs, and the goal is to choose one
path 𝑃 for each (𝑠 𝑖 , 𝑡 𝑖 ) pair from the set of all paths between 𝑠 𝑖 and 𝑡 𝑖 . In other
words 𝒫𝑖 = {𝑃 | 𝑃 is an 𝑠 𝑖 -𝑡 𝑖 path in 𝐺}. 𝒫𝑖 is implicitly defined and its size can
be exponential in the graph size. It is not obvious that we can solve the LP
relaxation that we saw above. However, one can indeed solve it in polynomial-
time via the Ellipsoid method. First, we observe that the LP could have an
exponential number of variables but only a polynomial number of non-trivial
constraints: 𝑘 for the pairs and 𝑚 for the edges. Thus, one is guaranteed, by
the rank lemma, that there is an optimum solution that has only 𝑘 + 𝑚 non-zero
variables. To see that one can indeed find it efficiently, we need to look at the dual
and notice that the separation oracle for the dual is the shortest path problem.
Another way to see that the LP can be solved is by writing a compact formulation
via the well-known multicommodity flow. We want to send one unit of flow
from 𝑠 𝑖 to 𝑡 𝑖 so that the total flow on any edge is at most 𝜆. We use variables
𝑓 (𝑒 , 𝑖) to denote a flow for pair 𝑖 on edge 𝑒.

\begin{align*}
\text{minimize} \quad & \lambda & \\
\text{subject to} \quad & \sum_{e \in \delta^+(s_i)} f(e,i) - \sum_{e \in \delta^-(s_i)} f(e,i) = 1 & 1 \le i \le k \\
& \sum_{e \in \delta^+(v)} f(e,i) - \sum_{e \in \delta^-(v)} f(e,i) = 0 & 1 \le i \le k,\ v \in V - \{s_i, t_i\} \\
& \sum_{i=1}^{k} f(e,i) \le \lambda & \forall e \in E \\
& f(e,i) \ge 0 & 1 \le i \le k,\ e \in E
\end{align*}

The preceding multicommodity flow LP has a polynomial number of vari-


ables and can be solved in polynomial-time. Given a flow, for each com-
modity/pair 𝑖 we can take the one unit of 𝑠 𝑖 -𝑡 𝑖 flow and use standard flow-
decomposition to obtain a path-flow with at most 𝑚 paths in the collection. We
can then apply the rounding that we saw above with a given explicit path
collection in exactly the same way.
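The flow-decomposition step is standard; the sketch below (our own helper, in Python) assumes the per-commodity flow is given as a dictionary over directed edges and that it is acyclic (flow on cycles can be canceled first without increasing congestion), so repeatedly tracing a source-to-sink path and subtracting the bottleneck terminates after at most $m$ iterations.

def path_decomposition(flow, source, sink, eps=1e-9):
    # flow: dict mapping directed edges (u, v) to non-negative flow values (assumed acyclic)
    flow = dict(flow)  # work on a copy
    paths = []
    while True:
        # trace an s-t path along edges with remaining flow
        path, u = [], source
        while u != sink:
            nxt = next(((v, f) for (a, v), f in flow.items() if a == u and f > eps), None)
            if nxt is None:
                break
            v, _ = nxt
            path.append((u, v))
            u = v
        if u != sink:
            break  # no remaining flow out of the source
        bottleneck = min(flow[e] for e in path)
        for e in path:
            flow[e] -= bottleneck  # at least one edge drops to (numerically) zero
        paths.append((path, bottleneck))
    return paths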
Remark 7.4. The Ellipsoid based algorithm may seem impractical. However, one
can approximately solve the implicit path based LP via multiplicative weight
update methods efficiently. The implicit formulation and Ellipsoid method is
also useful when one may want to restrict 𝒫𝑖 in some fashion. For instance we
can set 𝒫𝑖 to be the set of all 𝑠 𝑖 -𝑡 𝑖 paths in 𝐺 with at most 𝑑 edges for some given
parameter 𝑑. This will ensure that we choose only “short” paths for each pair. It
is not hard to see that the separation oracle for the dual is another shortest path
type problem that can be solved efficiently (via Bellman-Ford type algorithm).
This is not easy to capture/see via the compact flow based formulation.
Derandomization: Is there a deterministic algorithm with roughly the
same approximation guarantee? The algorithm can be derandomized via the
notion of pessimistic estimators. Congestion Minimization was one of the first
instances with a sophisticated use of this technique [132].
Integrality gap and Hardness of Approximation: There is a simple yet clever
example demonstrating that the integrality gap of the flow relaxation in directed
graphs is Ω(log 𝑚/log log 𝑚) [114]. In a remarkable result, [44] showed a matching
hardness of approximation of Ω(log 𝑚/log log 𝑚). The complexity of Congestion Mini-
mization is less clear in undirected graphs. It is known that the LP integrality gap
and hardness of approximation are Ω(log log 𝑛/log log log 𝑛) [8]. Closing the
gap between the upper and lower bounds is a major open problem.
Here we outline the integrality gap example for directed graphs from [114].
The graph 𝐺 and the pairs are constructed in a recursive fashion. Let ℎ be a
parameter that we will fix later. We start with a directed path 𝑣 0 , 𝑣1 , . . . , 𝑣 𝑛 . We
add a demand pair (𝑠 1 , 𝑡1 ) which connects to the path as follows. We partition
the path into ℎ sub-paths of equal length (roughly 𝑛/ℎ each): add an arc from 𝑠 to the start of each
sub-path and an arc from the end of each sub-path to 𝑡. See figure.
One can see from the figure that the pair (𝑠, 𝑡) can split its flow along ℎ paths.
Now we consider each of the ℎ sub-paths and recursively create an instance on
the path with length 𝑛/ℎ − 1 (while keeping parameter ℎ the same). Note that in
the second level of the recursion we add ℎ new source-sink pairs, one for each
sub-path. We stop the recursion when the size of the sub-path is Θ(ℎ). Let 𝑑 be
the depth of the recursion.
We claim that there is a fractional routing of all demand pairs where the
congestion is at most 𝑑/ℎ. This follows by splitting the flow of the pairs ℎ ways.
The next claim is that some edge has congestion 𝑑 in any integral routing. This

can be seen inductively. The top level pair (𝑠, 𝑡) has to choose one amongst the
ℎ sub-paths — all edges in that sub-path will be used by the route for (𝑠, 𝑡).
Inductively there is some edge in that sub-path with congestion 𝑑 − 1 and hence
the congestion of that edge will be 𝑑 when we add the path for (𝑠, 𝑡).
It now remains to set the parameters. If we choose ℎ = log² 𝑛, say, then
𝑑 = Θ(log 𝑛/log log 𝑛). The fractional congestion is ≤ 1 while the integral congestion
is Θ(log 𝑛/log log 𝑛).
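The recursive construction can be made concrete as follows; this is only a rough Python sketch (the function name is ours, and the exact handling of sub-path boundaries in [114] is glossed over), meant to convey the structure of the instance.

def build_gap_instance(n, h):
    # Underlying directed path has nodes 0..n; each demand pair gets fresh terminal
    # labels; arcs[] collects the extra arcs attaching the terminals to the sub-paths.
    arcs, pairs = [], []

    def recurse(lo, hi, depth):
        if hi - lo < h:                      # stop when the sub-path is short
            return depth
        s, t = ('s', lo, hi), ('t', lo, hi)  # fresh terminals for this level
        pairs.append((s, t))
        step = (hi - lo) // h
        d = depth
        for j in range(h):
            a = lo + j * step
            b = lo + (j + 1) * step if j < h - 1 else hi
            arcs.append((s, a))              # source -> start of sub-path
            arcs.append((b, t))              # end of sub-path -> sink
            d = max(d, recurse(a, b, depth + 1))
        return d

    depth = recurse(0, n, 1)
    return arcs, pairs, depth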
Short paths and improved congestion via Lovász-Local-Lemma: We consider
the congestion minimization problem when the path for each pair is required
to be “short”. By this we mean that we are required to route on a path with
at most 𝑑 edges where 𝑑 is some given parameter. One can imagine that in
many applications 𝑑 is small and is a fixed constant, say 10. The question
is whether the approximation ratio can be improved. Indeed one can show
that the LP integrality gap is 𝑂(log 𝑑/log log 𝑑). Thus, when 𝑑 ≪ 𝑛 we get a
substantial improvement. However, proving this and obtaining a polynomial
time algorithm are quite non-trivial. One requires the use of the subtle Lovász-
Local-Lemma (LLL), a powerful tool in probabilistic combinatorics. Typically
LLL only gives a proof of existence and there was substantial work in making LLL
constructive/efficient. Srinivasan obtained an algorithm via derandomization of
LLL in this context with a lot of technical work [142]. There was a breakthrough
work of Moser and Tardos [123] that gave an extremely simple way to make
LLL constructive and this has been refined and developed over the last decade.
For the congestion minimization problem we refer the reader to [75] which
builds upon [123] and describes an efficient randomized algorithm that outputs
a solution with congestion 𝑂(log 𝑑/log log 𝑑). In fact the application is given in
the context of a more abstract problem that we discuss in the next section.
Integer flows and Unsplittable flows: We worked with the simple setting
where each pair (𝑠 𝑖 , 𝑡 𝑖 ) wishes to send one unit of flow. One can imagine a
situation where one wants to send 𝑑 𝑖 units of flow for pair 𝑖 where 𝑑 𝑖 is some
(integer) demand value. There are two interesting variants. The first one requires
integer valued flow for each pair which means that we want to find 𝑑 𝑖 paths for
(𝑠 𝑖 , 𝑡 𝑖 ) that each carry one unit of flow (the paths can overlap). This variant can
be essentially reduced to the unit demand flow by creating 𝑑 𝑖 copies of (𝑠 𝑖 , 𝑡 𝑖 )
— we leave this as a simple exercise for the reader. The second variant is that
we want each pair’s flow of 𝑑 𝑖 units to be sent along a single path — this is
called unsplittable flow. When discussing unsplittable flow it is also natural to
consider capacities on the edges. Thus, each edge has a capacity 𝑢𝑒 and one
wants to minimize congestion relative to 𝑢𝑒 . The techniques we discussed can
be generalized relatively easily to this version as well to obtain the same kind
of bounds. The unsplittable flow problem is interesting even in the setting

where there is a single source/sink or when the graph is a simple ring or a path.
Interesting results are known here and we refer the reader to [2, 30, 49, 70, 122,
141] for further pointers.

7.2 Min-max Integer Programs


If one looks at the rounding and analysis for Congestion Minimization we notice
that the algorithm uses very little about the structure of the graph. This can be
thought about in two ways. One is that perhaps we can do better by exploiting
graph structure. Two, we can abstract the problem into a more general class
where the same technique applies. As we mentioned, in directed graphs the
bound of 𝑂(log 𝑛/log log 𝑛) is tight but the bound may not be tight in undirected
graphs which admit more structure.
Here we consider the second point and develop a resource allocation view
point while making an analogy to Congestion Minimization so that the ab-
stract problem can be more easily understood. Suppose we have 𝑚 resources
𝑒1 , 𝑒2 , . . . , 𝑒 𝑚 . We have 𝑘 requests 𝑟1 , 𝑟2 , . . . , 𝑟 𝑘 . Each request 𝑖 can be satisfied in
ℓ 𝑖 ways — let 𝒫𝑖 denote a collection of ℓ 𝑖 vectors 𝑣 𝑖,1 , 𝑣 𝑖,2 , . . . , 𝑣 𝑖,ℓ 𝑖 . Each vector
𝑣 𝑖,𝑗 ∈ 𝒫𝑖 is 𝑚-dimensional: for each 𝑘 ∈ [𝑚], 𝑣 𝑖,𝑗,𝑘 is a scalar that represents
the load it induces on resource 𝑒 𝑘 . The goal is to choose, for each 𝑖, exactly one
𝑗 ∈ [ℓ 𝑖 ] so as to minimize the maximum load on any resource. One can write
this conveniently as the following integer program where we have variables 𝑥 𝑖,𝑗
for 1 ≤ 𝑖 ≤ 𝑘 and 1 ≤ 𝑗 ≤ ℓ 𝑖 which indicates whether 𝑖 chooses 𝑗.

\begin{align*}
\text{minimize} \quad & \lambda & \\
\text{subject to} \quad & \sum_{1 \le j \le \ell_i} x_{i,j} = 1 & 1 \le i \le k \\
& \sum_{i=1}^{k} \sum_{1 \le j \le \ell_i} v_{i,j,k}\, x_{i,j} \le \lambda & \forall e_k \\
& x_{i,j} \in \{0,1\} & 1 \le i \le k,\ 1 \le j \le \ell_i
\end{align*}

One can view the above integer program compactly as



\begin{align*}
\text{minimize} \quad & \lambda & \\
\text{subject to} \quad & \sum_{1 \le j \le \ell_i} x_{i,j} = 1 & 1 \le i \le k \\
& Ax \le \lambda \mathbb{1} & \\
& x_{i,j} \in \{0,1\} & 1 \le i \le k,\ 1 \le j \le \ell_i
\end{align*}
where 𝐴 is a non-negative matrix with 𝑚 rows. As with Scheduling on


Unrelated Parallel Machines we need to be careful when relaxing the IP to an
LP since the optimum solution to the LP can be a poor lower bound unless we
ensure that 𝑥 𝑖,𝑗 = 0 if 𝑣 𝑖,𝑗,𝑘 > 𝜆. We will assume that we have indeed done this.
One can do randomized rounding exactly as we did for Congestion Min-
imization and obtain an 𝑂(log 𝑚/log log 𝑚) approximation. We say that 𝐴 is
𝑑-column-sparse if the maximum number of non-zeroes in any column of 𝐴 is at
most 𝑑. This corresponds to paths in Congestion Minimization being allowed to
have only 𝑑 edges. One can obtain an 𝑂(log 𝑑/log log 𝑑)-approximation in this
more general setting as well [75].
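For concreteness, here is a hedged sketch of the randomized rounding in this abstract setting (assuming numpy, with the options and the LP values passed in explicitly; solving the LP is not shown and the helper name is ours).

import numpy as np

def round_min_max_ip(options, x):
    # options[i]: list of m-dimensional load vectors for request i (one per way of satisfying it)
    # x[i]: the corresponding LP values, summing to 1 over the options of request i
    m = len(options[0][0])
    load = np.zeros(m)
    for i in range(len(options)):
        probs = np.asarray(x[i], dtype=float)
        j = np.random.choice(len(options[i]), p=probs / probs.sum())
        load += np.asarray(options[i][j], dtype=float)
    return load.max()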
Chapter 8

Introduction to Local Search

Local search is a powerful and widely used heuristic method (with various
extensions). In this lecture we introduce this technique in the context of
approximation algorithms. The basic outline of local search is as follows. For
an instance 𝐼 of a given problem let 𝒮(𝐼) denote the set of feasible solutions for
𝐼. For a solution 𝑆 we use the term (local) neighborhood of 𝑆 to denote the set of all
solutions 𝑆′ such that 𝑆′ can be obtained from 𝑆 via some local moves. We let
𝑁(𝑆) denote the neighborhood of 𝑆.

LocalSearch:
Find a “good” initial solution 𝑆0 ∈ 𝒮(𝐼)
𝑆 ← 𝑆0
repeat
If (∃𝑆′ ∈ 𝑁(𝑆) such that val(𝑆′) is strictly better than val(𝑆))
𝑆 ← 𝑆′
Else
𝑆 is a local optimum
return 𝑆
EndIf
Until (True)

For minimization problems 𝑆′ is strictly better than 𝑆 if val(𝑆′) < val(𝑆)
whereas for maximization problems it is the case if val(𝑆′) > val(𝑆).
The running time of the generic local search algorithm depends on several
factors. First, we need an algorithm that given a solution 𝑆 either declares that 𝑆
is a local optimum or finds a solution 𝑆′ ∈ 𝑁(𝑆) such that val(𝑆′) is strictly better
than val(𝑆). A standard and easy approach for this is to ensure that the local
moves are defined in such a way that |𝑁(𝑆)| is polynomial in the input size |𝐼 |
and 𝑁(𝑆) can be enumerated efficiently; thus one can check each 𝑆′ ∈ 𝑁(𝑆) to


see if any of them is an improvement over 𝑆. However, in some more advanced


settings, 𝑁(𝑆) may be exponential in the input size but one may be able to find
a solution 𝑆′ ∈ 𝑁(𝑆) that improves on 𝑆 in polynomial time. Second, the
running time of the algorithm depends also on the number of iterations it takes
to go from 𝑆0 to a local optimum. In the worst case the number of iterations
could be | OPT −val(𝑆0 )| which can be exponential in the input size. One can
often use a standard scaling trick to overcome this issue; we stop the algorithm
unless the improvement obtained over the current 𝑆 is a significant fraction of
val(𝑆). Finally, the quality of the initial solution 𝑆0 also factors into the running
time.
Remark 8.1. There is a distinction made sometimes in the literature between
oblivious and non-oblivious local search. In oblivious local search the algorithm
uses 𝑓 as a black box when comparing 𝑆 with a candidate solution 𝑆′; the
analysis and the definition of local moves are typically based on properties of
𝑓 . In non-oblivious local search one may use an auxiliary function 𝑔 derived
from 𝑓 in comparing 𝑆 with 𝑆′. Typically the function 𝑔 is some kind of potential
function that aids the analysis. This allows one to move to a solution 𝑆′ even
though 𝑓 (𝑆′) may not be an improvement over 𝑓 (𝑆).
Remark 8.2. There are several heuristics that are (loosely) connected to local
search. These include simulated annealing, tabu search, genetic algorithms and
others. These fall under the broad terminology of metaheuristics.

8.1 Local Search for Max Cut


We illustrate local search for the well-known Max Cut problem. In Max Cut
we are given an undirected graph 𝐺 = (𝑉 , 𝐸) and the goal is to partition 𝑉 into
(𝑆, 𝑉 \ 𝑆) so as to maximize the number of edges crossing 𝑆, that is, |𝛿 𝐺 (𝑆)|;
we will use 𝛿(𝑆) when 𝐺 is clear. For a vertex 𝑣 we use 𝛿(𝑣) instead of 𝛿({𝑣})
to simplify notation. In the weighted version each edge 𝑒 has a non-negative
weight 𝑤(𝑒) and the goal is to maximize the weight of the edges crossing 𝑆, that is,
𝑤(𝛿(𝑆)); here 𝑤(𝐴) for a set 𝐴 denotes the quantity $\sum_{e \in A} w(e)$.
We consider a simple local search algorithm for Max Cut that starts with an
arbitrary set 𝑆 ⊆ 𝑉 and in each iteration either adds a vertex to 𝑆 or removes a
vertex from 𝑆 as long as it improves the cut value 𝛿(𝑆).

LocalSearch for Max Cut:


𝑆←∅
repeat
If (∃𝑣 ∈ 𝑉 \ 𝑆 such that 𝑤(𝛿(𝑆 + 𝑣)) > 𝑤(𝛿(𝑆)))
𝑆 ←𝑆+𝑣
Else If (∃𝑣 ∈ 𝑆 such that 𝑤(𝛿(𝑆 − 𝑣)) > 𝑤(𝛿(𝑆)))
𝑆 ←𝑆−𝑣
Else
𝑆 is a local optimum
return 𝑆
EndIf
Until (True)
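A direct Python implementation of this local search is given below as a sketch; as discussed later in this section, in the weighted case the number of improvement steps is not polynomially bounded in general.

from collections import defaultdict

def local_search_max_cut(n, weighted_edges):
    # weighted_edges: list of (u, v, w) with u, v in range(n) and w >= 0
    adj = defaultdict(list)
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    in_S = [False] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # gain of flipping v = (weight to same side) - (weight to other side)
            same = sum(w for u, w in adj[v] if in_S[u] == in_S[v])
            other = sum(w for u, w in adj[v] if in_S[u] != in_S[v])
            if same > other:
                in_S[v] = not in_S[v]
                improved = True
    cut = sum(w for u, v, w in weighted_edges if in_S[u] != in_S[v])
    return [v for v in range(n) if in_S[v]], cut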

We will first focus on the quality of solution output by the local search
algorithm.
Lemma 8.1. Let 𝑆 be the output of the local search algorithm. Then for each vertex 𝑣,
𝑤(𝛿(𝑆) ∩ 𝛿(𝑣)) ≥ 𝑤(𝛿(𝑣))/2.
Proof. Let 𝛼 𝑣 = 𝑤(𝛿(𝑆) ∩ 𝛿(𝑣)) be the weight of edges among those incident to 𝑣
that cross the cut 𝑆. Let 𝛽 𝑣 = 𝑤(𝛿(𝑣)) − 𝛼 𝑣 .
We claim that 𝛼 𝑣 ≥ 𝛽 𝑣 for each 𝑣. If 𝑣 ∈ 𝑉 \ 𝑆 and 𝛼 𝑣 < 𝛽 𝑣 then moving 𝑣
to 𝑆 will strictly increase 𝑤(𝛿(𝑆)) and 𝑆 cannot be a local optimum. Similarly if
𝑣 ∈ 𝑆 and 𝛼 𝑣 < 𝛽 𝑣 , we would have 𝑤(𝛿(𝑆 − 𝑣)) > 𝑤(𝛿(𝑆)) and 𝑆 would not be a
local optimum. 
Corollary 8.1. If 𝑆 is a local optimum then 𝑤(𝛿(𝑆)) ≥ 𝑤(𝐸)/2 ≥ OPT/2.
Proof. Since each edge is incident to exactly two vertices we have $w(\delta(S)) = \frac{1}{2}\sum_{v \in V} w(\delta(S) \cap \delta(v))$. Applying the above lemma,
$$w(\delta(S)) = \frac{1}{2}\sum_{v \in V} w(\delta(S) \cap \delta(v)) \ge \frac{1}{2}\sum_{v \in V} \frac{w(\delta(v))}{2} \ge \frac{1}{2} w(E) \ge \frac{1}{2}\,\mathrm{OPT},$$
since OPT ≤ 𝑤(𝐸). □

The running time of the local search algorithm depends on the number of
local improvement iterations; checking whether there is a local move that results

in an improvement can be done by trying all possible vertices. If the graph is


unweighted then the algorithm terminates in at most |𝐸| iterations. However,
in the weighted case, it is known that the algorithm can take an exponential
time in |𝑉 | when the weights are large. A very interesting problem is whether
one can find a local optimum efficiently — note that we do not need to find a
local optimum by doing local search! It is believed that finding a local optimum
for Max Cut in the weighted case is hard. There is a complexity class called
PLS that was defined by Johnson, Papadimitriou and Yannakakis for which
Max Cut is known to be a complete problem. PLS plays an important role in
algorithmic game theory in recent times. We refer the reader to the Wikipedia
page on PLS for many pointers.
Many local search algorithms can be modified slightly to terminate with
an approximate local optimum such that (i) the running time of the modified
algorithm is strongly polynomial in the input size and (ii) the quality of the
solution is very similar to that given by the original local search. We illustrate
these ideas for Max Cut. Consider the following algorithm where 𝜖 > 0 is a
parameter that can be chosen. Let 𝑛 be the number of nodes in 𝐺.

Modified LocalSearch for Max Cut(𝜖):


𝑆 ← {𝑣 ∗ } where 𝑣 ∗ = arg max𝑣∈𝑉 𝑤(𝛿(𝑣))
repeat
If (∃𝑣 ∈ 𝑉 \ 𝑆 such that 𝑤(𝛿(𝑆 + 𝑣)) > (1 + 𝜖/𝑛) 𝑤(𝛿(𝑆)))
𝑆 ←𝑆+𝑣
Else If (∃𝑣 ∈ 𝑆 such that 𝑤(𝛿(𝑆 − 𝑣)) > (1 + 𝜖/𝑛) 𝑤(𝛿(𝑆)))
𝑆 ←𝑆−𝑣
Else
return 𝑆
EndIf
Until (True)

The above algorithm terminates unless the improvement is a relative factor of


(1 + 𝜖/𝑛) over the current solution's value. Thus the final output 𝑆 is an approximate
local optimum.
Remark 8.3. An alert reader may wonder why the improvement is measured
with respect to the global value 𝑤(𝛿(𝑆)) rather than with respect to 𝑤(𝛿(𝑣)).
One reason is to illustrate the general idea when one may not have fine grained
information about the function like we do here in the specific case of Max Cut.
The global analysis will also play a role in the running time analysis as we will
see shortly.
Lemma 8.2. Let 𝑆 be the output of the modified local search algorithm for Max Cut.
Then $w(\delta(S)) \ge \frac{1}{2(1+\epsilon/4)}\, w(E)$.

Proof. As before let $\alpha_v = w(\delta(S) \cap \delta(v))$ and $\beta_v = w(\delta(v)) - \alpha_v$. Since $S$ is an
approximate local optimum we claim that for each $v$,
$$\beta_v - \alpha_v \le \frac{\epsilon}{n} w(\delta(S)).$$
Otherwise a local move using $v$ would improve $S$ by more than a $(1 + \epsilon/n)$ factor.
(The formal proof is left as an exercise to the reader.)
We have,
$$w(\delta(S)) = \frac{1}{2}\sum_{v \in V} \alpha_v = \frac{1}{2}\sum_{v \in V} \frac{(\alpha_v + \beta_v) - (\beta_v - \alpha_v)}{2} \ge \frac{1}{4}\sum_{v \in V} \Big(w(\delta(v)) - \frac{\epsilon}{n} w(\delta(S))\Big) \ge \frac{1}{2} w(E) - \frac{1}{4}\epsilon\, w(\delta(S)).$$
Therefore $w(\delta(S))(1 + \epsilon/4) \ge w(E)/2$ and the lemma follows. □
Now we argue about the number of iterations of the algorithm.

Lemma 8.3. The modified local search algorithm terminates in $O(\frac{1}{\epsilon} n \log n)$ iterations
of the improvement step.

Proof. We observe that $w(\delta(S_0)) = w(\delta(v^*)) \ge \frac{2}{n} w(E)$ (why?). Each local improvement
iteration improves $w(\delta(S))$ by a multiplicative factor of $(1 + \epsilon/n)$. Therefore
if $k$ is the number of iterations of the algorithm, then $(1 + \epsilon/n)^k\, w(\delta(S_0)) \le w(\delta(S))$
where $S$ is the final output. However, $w(\delta(S)) \le w(E)$. Hence
$$(1 + \epsilon/n)^k \cdot \frac{2 w(E)}{n} \le w(E)$$
which implies that $k = O(\frac{1}{\epsilon} n \log n)$. □


A tight example for local optimum: Does the local search algorithm do better
than 1/2? Here we show that a local optimum need not be better than a 1/2-
approximation. Consider a complete bipartite graph $K_{2n,2n}$ with 2𝑛 vertices in
each part. If 𝐿 and 𝑅 are the two parts, then any set 𝑆 with |𝑆 ∩ 𝐿| = 𝑛 = |𝑆 ∩ 𝑅| is a local
optimum with |𝛿(𝑆)| = |𝐸|/2. The optimum solution for this instance is |𝐸|.

Max Directed Cut: A problem related to Max Cut is Max Directed Cut in
which we are given a directed edge-weighted graph 𝐺 = (𝑉 , 𝐸) and the goal is
to find a set 𝑆 ⊆ 𝑉 that maximizes 𝑤(𝛿+𝐺 (𝑆)); that is, the weight of the directed
edges leaving 𝑆. One can apply a similar local search as the one for Max Cut.
However, the following example shows that the output 𝑆 can be arbitrarily bad.
Let 𝐺 = (𝑉 , 𝐸) be a directed in-star with center 𝑣 and arcs connecting each of
𝑣 1 , . . . , 𝑣 𝑛 to 𝑣. Then 𝑆 = {𝑣} is a local optimum with 𝛿+ (𝑆) = ∅ while OPT = 𝑛.
However, a minor tweak to the algorithm gives a 1/3-approximation! Instead
of returning the local optimum 𝑆 return the better of 𝑆 and 𝑉 \ 𝑆. This step is
needed because the directed cuts are not symmetric.

8.2 Local Search for Submodular Function Maximization


In this section we consider the utility of local search for maximizing non-
negative submodular functions. Let 𝑓 : 2𝑉 → ℝ+ be a non-negative submodular
set function on a ground set 𝑉. Recall that 𝑓 is submodular if 𝑓 (𝐴) + 𝑓 (𝐵) ≥
𝑓 (𝐴 ∪ 𝐵) + 𝑓 (𝐴 ∩ 𝐵) for all 𝐴, 𝐵 ⊆ 𝑉. Equivalently 𝑓 is submodular if 𝑓 (𝐴 + 𝑣) −
𝑓 (𝐴) ≥ 𝑓 (𝐵 + 𝑣) − 𝑓 (𝐵) for all 𝐴 ⊂ 𝐵 and 𝑣 ∉ 𝐵. 𝑓 is monotone if 𝑓 (𝐴) ≤ 𝑓 (𝐵) for
all 𝐴 ⊆ 𝐵. 𝑓 is symmetric if 𝑓 (𝐴) = 𝑓 (𝑉 \ 𝐴) for all 𝐴 ⊆ 𝑉. Submodular functions
arise in a number of settings in combinatorial optimization. Two important
examples are the following.
Example 8.1. Coverage in set systems. Let 𝑆1 , 𝑆2 , . . . , 𝑆𝑛 be subsets of a set 𝒰.
Let 𝑉 = {1, 2, . . . , 𝑛} and define 𝑓 : 2𝑉 → ℝ+ where 𝑓 (𝐴) = | ∪𝑖∈𝐴 𝑆 𝑖 |. 𝑓 is a
monotone submodular function. One can also associate weights to elements of
𝒰 via a function 𝑤 : 𝒰 → ℝ+ ; the function 𝑓 defined as 𝑓 (𝐴) = 𝑤(∪𝑖∈𝐴 𝑆 𝑖 ) is
also monotone submodular.
Example 8.2. Cut functions in graphs. Let 𝐺 = (𝑉 , 𝐸) be an undirected graph with
non-negative edge weights 𝑤 : 𝐸 → ℝ+ . The cut function 𝑓 : 2^𝑉 → ℝ+ defined
as $f(S) = \sum_{e \in \delta_G(S)} w(e)$ is a symmetric submodular function; it is not monotone
unless the graph is trivial. If 𝐺 is directed and we define 𝑓 as $f(S) = \sum_{e \in \delta^+_G(S)} w(e)$
then 𝑓 is submodular but is not necessarily symmetric.
The following problem generalizes Max Cut and Max Directed Cut that we
have already seen.

Problem 8.2. Max Submod Func. Given a non-negative submodular set function 𝑓
on a ground set 𝑉 via a value oracle1 find max𝑆⊆𝑉 𝑓 (𝑆).
1A value oracle for a set function 𝑓 : 2𝑉 → ℝ provides access to the function by giving the
value 𝑓 (𝐴) when presented with the set 𝐴.

Note that if 𝑓 is monotone then the problem is trivial since 𝑉 is the optimum
solution. Therefore, the problem is interesting (and NP-Hard) only when 𝑓
is not necessarily monotone. We consider a simple local search algorithm
for Max Submod Func and show that it gives a 1/3-approximation and a 1/2-
approximation when 𝑓 is symmetric. This was shown in [57].
Remark 8.4. Given a graph 𝐺 = (𝑉 , 𝐸) consider the submodular function 𝑓 :
2𝑉 → ℝ where 𝑓 (𝑆) = |𝛿(𝑆)| − 𝐵 where 𝐵 is a fixed number. Is there a polynomial
time algorithm to decide whether there is a set 𝑆 such that 𝑓 (𝑆) ≥ 0?

LocalSearch for Max Submod Func:


𝑆←∅
repeat
If (∃𝑣 ∈ 𝑉 \ 𝑆 such that 𝑓 (𝑆 + 𝑣) > 𝑓 (𝑆))
𝑆 ←𝑆+𝑣
Else If (∃𝑣 ∈ 𝑆 such that 𝑓 (𝑆 − 𝑣) > 𝑓 (𝑆))
𝑆 ←𝑆−𝑣
Else
𝑆 is a local optimum
return the better of 𝑆 and 𝑉 \ 𝑆
EndIf
Until (True)
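The following Python sketch implements this local search assuming a value oracle is supplied as a function on frozensets; as with Max Cut, the plain version shown here does not include the approximate-improvement threshold needed for a polynomial running time.

def local_search_submodular(V, f):
    # V: ground set (iterable); f: value oracle taking a frozenset and returning a number
    V = set(V)
    S = set()
    improved = True
    while improved:
        improved = False
        for v in V - S:
            if f(frozenset(S | {v})) > f(frozenset(S)):
                S.add(v); improved = True; break
        else:
            for v in set(S):
                if f(frozenset(S - {v})) > f(frozenset(S)):
                    S.remove(v); improved = True; break
    # return the better of the local optimum and its complement
    return S if f(frozenset(S)) >= f(frozenset(V - S)) else V - S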

We start the analysis of the algorithm with a basic lemma on submodularity.

Lemma 8.4. Let $f : 2^V \to \mathbb{R}_+$ be a submodular set function on 𝑉. Let 𝐴 ⊂ 𝐵 ⊆ 𝑉.
Then

• If 𝑓 (𝐵) > 𝑓 (𝐴) then there is an element 𝑣 ∈ 𝐵 \ 𝐴 such that 𝑓 (𝐴 + 𝑣) − 𝑓 (𝐴) > 0.
More generally there is an element 𝑣 ∈ 𝐵 \ 𝐴 such that $f(A + v) - f(A) \ge \frac{1}{|B \setminus A|}(f(B) - f(A))$.

• If 𝑓 (𝐴) > 𝑓 (𝐵) then there is an element 𝑣 ∈ 𝐵 \ 𝐴 such that 𝑓 (𝐵 − 𝑣) − 𝑓 (𝐵) > 0.
More generally there is an element 𝑣 ∈ 𝐵 \ 𝐴 such that $f(B - v) - f(B) \ge \frac{1}{|B \setminus A|}(f(A) - f(B))$.

Exercise 8.1. Prove the preceding lemma.

We obtain the following corollary.

Corollary 8.3. Let 𝑆 be a local optimum for the local search algorithm and let 𝑆 ∗ be an
optimum solution. Then 𝑓 (𝑆) ≥ 𝑓 (𝑆 ∩ 𝑆∗ ) and 𝑓 (𝑆) ≥ 𝑓 (𝑆 ∪ 𝑆 ∗ ).

Theorem 8.4. The local search algorithm is a 1/3-approximation and is a 1/2-


approximation if 𝑓 is symmetric.

Proof. Let 𝑆 be the local optimum and 𝑆 ∗ be a global optimum for the given
instance. From the previous corollary we have that 𝑓 (𝑆) ≥ 𝑓 (𝑆 ∩ 𝑆∗ ) and
𝑓 (𝑆) ≥ 𝑓 (𝑆 ∪ 𝑆 ∗ ). Note that the algorithm outputs the better of 𝑆 and 𝑉 \ 𝑆. By
submodularity, we have,

𝑓 (𝑉 \ 𝑆) + 𝑓 (𝑆 ∪ 𝑆 ∗ ) ≥ 𝑓 (𝑆 ∗ \ 𝑆) + 𝑓 (𝑉) ≥ 𝑓 (𝑆 ∗ \ 𝑆)

where we used the non-negativity of 𝑓 in the second inequality. Putting together


the inequalities,

2 𝑓 (𝑆) + 𝑓 (𝑉 \ 𝑆) = 𝑓 (𝑆) + 𝑓 (𝑆) + 𝑓 (𝑉 \ 𝑆)


≥ 𝑓 (𝑆 ∩ 𝑆 ∗ ) + 𝑓 (𝑆∗ \ 𝑆)
≥ 𝑓 (𝑆 ∗ ) + 𝑓 (∅)
≥ 𝑓 (𝑆 ∗ ) = OPT .

Thus 2 𝑓 (𝑆) + 𝑓 (𝑉 \ 𝑆) ≥ OPT and hence max{ 𝑓 (𝑆), 𝑓 (𝑉 \ 𝑆)} ≥ OPT/3.


If 𝑓 is symmetric we argue as follows. Using Lemma 8.4 we claim that
𝑓 (𝑆) ≥ 𝑓 (𝑆 ∩ 𝑆 ∗ ) as before but also that $f(S) \ge f(S \cup \bar{S}^*)$ where $\bar{A}$ is shorthand
notation for the complement 𝑉 \ 𝐴. Since 𝑓 is symmetric $f(S \cup \bar{S}^*) =
f(V \setminus (S \cup \bar{S}^*)) = f(\bar{S} \cap S^*) = f(S^* \setminus S)$. Thus,

2 𝑓 (𝑆) ≥ 𝑓 (𝑆 ∩ 𝑆 ∗ ) + 𝑓 (𝑆 ∪ 𝑆¯ ∗ )
≥ 𝑓 (𝑆 ∩ 𝑆 ∗ ) + 𝑓 (𝑆∗ \ 𝑆)
≥ 𝑓 (𝑆∗ ) + 𝑓 (∅)
≥ 𝑓 (𝑆∗ ) = OPT .

Therefore 𝑓 (𝑆) ≥ OPT/2. 

The running time of the local search algorithm may not be polynomial
but one can modify the algorithm as we did for Max Cut to obtain a strongly
polynomial time algorithm that gives a (1/3 − 𝑜(1))-approximation ((1/2 − 𝑜(1)) for
symmetric). See [57] for more details. There has been much work on submodular
function maximization including work on variants with additional constraints.
Local search has been a powerful tool for these problems. See [24, 60, 112]
for some of the results on local search based method, and [25] for a survey on
submodular function maximization.
Chapter 9

Clustering and Facility Location

Clustering and Facility Location are two widely studied topics with a vast
literature. Facility location problems have been well-studied in Operations
Research and logistics. Clustering is ubiquitous with many applications in data
analysis and machine learning. We confine attention to a few central problems
and provide some pointers as needed to other topics. These problems have also
played an important role in approximation algorithms and their study has led to
a variety of interesting techniques. Research on these topics is still quite active.
For both classes of problems a key assumption that we will make is that we
are working with points in some underlying metric space. Recall that a space
(𝑉 , 𝑑) where 𝑑 : 𝑉 × 𝑉 → ℝ+ is a metric space if the distance function 𝑑 satisfies
metric properties: (i) 𝑑(𝑢, 𝑣) = 0 iff 𝑢 = 𝑣 (reflexivity) (ii) 𝑑(𝑢, 𝑣) = 𝑑(𝑣, 𝑢) for
all 𝑢, 𝑣 ∈ 𝑉 (symmetry) and (iii) 𝑑(𝑢, 𝑣) + 𝑑(𝑣, 𝑤) ≥ 𝑑(𝑢, 𝑤) for all 𝑢, 𝑣, 𝑤 ∈ 𝑉
(triangle inequality). We will abuse the notation and use 𝑑(𝐴, 𝐵) for two sets
𝐴, 𝐵 ⊆ 𝑉 to denote the quantity min𝑝∈𝐴,𝑞∈𝐵 𝑑(𝑝, 𝑞). Similarly 𝑑(𝑝, 𝐴) for 𝑝 ∈ 𝑉
and 𝐴 ⊆ 𝑉 will denote min𝑞∈𝐴 𝑑(𝑝, 𝑞).
Center based clustering: In center based clustering we are given 𝑛 points
𝑃 = {𝑝 1 , 𝑝2 , . . . , 𝑝 𝑛 } in a metric space (𝑉 , 𝑑), and an integer 𝑘. The goal is to
cluster/partition 𝑃 into 𝑘 clusters 𝐶1 , 𝐶2 , . . . , 𝐶 𝑘 which are induced by choosing
𝑘 centers 𝑐 1 , 𝑐2 , . . . , 𝑐 𝑘 from 𝑉. Each point 𝑝 𝑖 is assigned to its nearest center
from 𝑐1 , 𝑐2 , . . . , 𝑐 𝑘 and this induces a clustering. The nature of the clustering
is controlled by an objective function that measures the quality of the clusters.
Typically we phrase the problem as choosing $c_1, c_2, \ldots, c_k$ to minimize the
clustering objective $\sum_{i=1}^{n} d(p_i, \{c_1, \ldots, c_k\})^q$ for some $q$. The three most well-studied
problems are special cases obtained by choosing an appropriate $q$ (a small
computational sketch of these objectives follows the list below).
• 𝑘-Center is the problem when 𝑞 = ∞, which can be equivalently phrased
as $\min_{c_1, \ldots, c_k \in V} \max_{i=1}^{n} d(p_i, \{c_1, \ldots, c_k\})$. In other words we want to
minimize the maximum distance of the input points to the cluster centers.

• 𝑘-Median is the problem when 𝑞 = 1: $\min_{c_1, \ldots, c_k \in V} \sum_{i=1}^{n} d(p_i, \{c_1, \ldots, c_k\})$.

• 𝑘-Means is the problem when 𝑞 = 2: $\min_{c_1, \ldots, c_k \in V} \sum_{i=1}^{n} d(p_i, \{c_1, \ldots, c_k\})^2$.
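As a small illustration (our own helper, assuming the metric is given as a function d), the three objectives can be computed for a candidate set of centers as follows.

def clustering_objectives(points, centers, d):
    # distance of each point to its nearest chosen center
    nearest = [min(d(p, c) for c in centers) for p in points]
    k_center = max(nearest)                    # q = infinity
    k_median = sum(nearest)                    # q = 1
    k_means = sum(x * x for x in nearest)      # q = 2
    return k_center, k_median, k_means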

We will mainly focus on the discrete versions of the problems where 𝑉 =


{𝑝 1 , 𝑝2 , . . . , 𝑝 𝑛 } which means that the centers are to be chosen from the input
points themselves. However, in many data analysis applications the points lie in
ℝ 𝑑 for some 𝑑 and the centers can be chosen anywhere in the ambient space. In
fact this makes the problems more difficult in a certain sense since the center
locations now come from an infinite set. One can argue that limiting centers to
the input points does not lose more than a constant factor in the approximation
and this may be reasonable from a first-cut point of view but perhaps not ideal
from a practical point of view. In some settings there may be a requirement or
advantage in choosing the cluster centers from the input set.
Facility Location: In facility location we typically have two finite sets ℱ and
𝒟 where ℱ represents a set of locations where facilities can be located and 𝒟
is a set of client/demand locations. We will assume that 𝑉 = ℱ ⊎ 𝒟 and that
there is a metric 𝑑 over 𝑉. There are several variants but one of the simplest
ones is the Uncapacitated Facility Location (UCFL) problem. In UCFL we are
given (ℱ ⊎ 𝒟, 𝑑) as well as auxiliary information which specifies the cost 𝑓𝑖 of
opening a facility at location 𝑖 ∈ ℱ . The goal is to open a subset of facilities in ℱ
to minimize the sum of the cost of the opened facilities and the total distance
traveled by the clients to their nearest open facility. In other words we want to
solve $\min_{\mathcal{F}' \subseteq \mathcal{F}} \big(\sum_{i \in \mathcal{F}'} f_i + \sum_{j \in \mathcal{D}} d(j, \mathcal{F}')\big)$. The problem has close connections to
the 𝑘-Median problem. The term “uncapacitated” refers to the fact that we do not
limit the number of clients that can be assigned to an open facility. In several OR
applications that motivate facility location (opening warehouses or distribution
facilities), capacity constraints are likely to be important. For this reason there
are several capacitated versions.

9.1 𝒌-Center
Recall that in 𝑘-Center we are given 𝑛 points 𝑝 1 , . . . , 𝑝 𝑛 in a metric space and
an integer 𝑘 and we need to choose 𝑘 cluster centers 𝐶 = {𝑐1 , 𝑐2 , . . . , 𝑐 𝑘 } such
that we minimize max𝑖 𝑑(𝑝 𝑖 , 𝐶). An alternative view is that we wish to find the
smallest radius 𝑅 such that there are 𝑘 balls of radius 𝑅 that together cover all
the input points. Given a fixed 𝑅 this can be seen as a Set Cover problem. In
fact there is an easy reduction from Dominating Set to 𝑘-Center establishing
the NP-Hardness. Moreover, as we saw already in Chapter 1, 𝑘-Center has no
(2 − 𝜖)-approximation unless 𝑃 = 𝑁𝑃 via a reduction from Dominating Set. Here
we will see two 2-approximation algorithms that are quite different and have

their own advantages. The key lemma for their analysis is common and is stated
below.

Lemma 9.1. Suppose there are 𝑘 + 1 points 𝑞1 , 𝑞2 , . . . , 𝑞 𝑘+1 ∈ 𝑃 such that 𝑑(𝑞 𝑖 , 𝑞 𝑗 ) >
2𝑅 for all 𝑖 ≠ 𝑗. Then OPT > 𝑅.

Proof. Suppose OPT ≤ 𝑅. Then there are 𝑘 centers 𝐶 = {𝑐1 , 𝑐2 , . . . , 𝑐 𝑘 } which


induces 𝑘 clusters 𝐶1 , . . . , 𝐶 𝑘 such that for each cluster 𝐶 ℎ and each 𝑝 ∈ 𝐶 ℎ we
have 𝑑(𝑝, 𝑐 ℎ ) ≤ 𝑅. By the pigeon hole principle some 𝑞 𝑖 , 𝑞 𝑗 , 𝑖 ≠ 𝑗 are in the
same cluster 𝐶 ℎ but this implies that 𝑑(𝑞 𝑖 , 𝑞 𝑗 ) ≤ 𝑑(𝑞 𝑖 , 𝑐 ℎ ) + 𝑑(𝑞 𝑗 , 𝑐 ℎ ) ≤ 2𝑅 which
contradicts the assumption of the lemma. 
Note that the lemma holds even if the centers can be chosen from outside
the given point set 𝑃.

9.1.1 Gonzalez’s algorithm and nets in metric spaces


The algorithm starts with an empty set of centers, and in each iteration picks a
new center which is the farthest point from the set of centers chosen so far.

Gonzalez-𝑘-Center(𝑃, 𝑘 )

1. 𝐶 ← ∅

2. For 𝑖 = 1 to 𝑘 do

A. Let $c_i = \arg\max_{c \in P} d(c, C)$
B. 𝐶 ← 𝐶 ∪ {𝑐 𝑖 }.

3. Output 𝐶

Note that 𝑐1 is chosen arbitrarily.
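A Python sketch of the algorithm (our own helper name; the metric is passed in as a function d):

def gonzalez_k_center(points, k, d):
    centers = [points[0]]  # the first center can be arbitrary
    dist = [d(p, centers[0]) for p in points]
    for _ in range(k - 1):
        # next center: the point farthest from the current centers
        i = max(range(len(points)), key=lambda j: dist[j])
        centers.append(points[i])
        dist = [min(dist[j], d(points[j], points[i])) for j in range(len(points))]
    return centers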

Theorem 9.1. Let 𝑅 = max𝑝∈𝑃 𝑑(𝑝, 𝐶) where 𝐶 is the set of centers chosen by
Gonzalez’s algorithm. Then 𝑅 ≤ 2𝑅 ∗ where 𝑅 ∗ is the optimum 𝑘-Center radius for 𝑃.

Proof. Suppose not. There is a point 𝑝 ∈ 𝑃 such that 𝑑(𝑝, 𝐶) > 2𝑅 ∗ which implies
that 𝑝 ∉ 𝐶. Since the algorithm chose the farthest point in each iteration and
could have chosen 𝑝 in each of the 𝑘 iterations but did not, we have the property
that 𝑑(𝑐 𝑖 , {𝑐1 , . . . , 𝑐 𝑖−1 }) > 2𝑅 ∗ for 𝑖 = 2 to 𝑘. This implies that the distance
between each pair of points in the set {𝑐1 , 𝑐2 , . . . , 𝑐 𝑘 , 𝑝} is more than 2𝑅 ∗ . By
Lemma 9.1, the optimum radius must be larger than 𝑅∗ , a contradiction. 

Exercise 9.1. Construct an instance to demonstrate that the algorithm’s worst-


case performance is 2.

Remark 9.1. Gonzalez’s algorithm can be extended in a simple way to create a


permutation of the points 𝑃; we simply run the algorithm with 𝑘 = 𝑛. It is easy
to see from the proof above that for any 𝑘 ∈ [𝑛], the prefix of the permutation
consisting of the first 𝑘 points provides a 2-approximation for that choice of 𝑘.
Thus, one can compute the permutation once and reuse it for all 𝑘.

9.1.2 Hochbaum-Shmoys bottleneck approach


A second algorithmic approach for 𝑘-Center is due to Hochbaum and Shmoys
and has played an influential role in variants of this problem.
For a point 𝑣 and radius 𝑟 let 𝐵(𝑣, 𝑟) = {𝑢 | 𝑑(𝑢, 𝑣) ≤ 𝑟} denote the ball of
radius 𝑟 around 𝑣.

HS-𝑘-Center(𝑃, 𝑘 )

1. Guess 𝑅 ∗ the optimum radius

2. 𝐶 ← ∅, 𝑆 ← 𝑃

3. While (𝑆 ≠ ∅) do

A. Let 𝑐 be an arbitrary point in 𝑆


B. 𝐶 ← 𝐶 ∪ {𝑐}
C. 𝑆 ← 𝑆 \ 𝐵(𝑐, 2𝑅 ∗ )

4. Output 𝐶
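A Python sketch for a fixed guess R (our own helper name; the binary search over the guess is omitted):

def hochbaum_shmoys_k_center(points, k, d, R):
    centers = []
    uncovered = list(points)
    while uncovered:
        c = uncovered[0]  # an arbitrary uncovered point
        centers.append(c)
        uncovered = [p for p in uncovered if d(p, c) > 2 * R]
        if len(centers) > k:
            return None   # the guess R was too small
    return centers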

Theorem 9.2. Let 𝐶 be the output of the HS algorithm for a guess 𝑅. Then for all
𝑝 ∈ 𝑃, 𝑑(𝑝, 𝐶) ≤ 2𝑅 and moreover if 𝑅 ≥ 𝑅∗ then |𝐶| ≤ 𝑘.

Proof. The first property is easy to see since we only remove a point 𝑝 from 𝑆 if
we add a center 𝑐 to 𝐶 such that 𝑝 ∈ 𝐵(𝑐, 2𝑅). Let 𝑐1 , 𝑐2 , . . . , 𝑐 ℎ be the centers
chosen by the algorithm. We observe that 𝑑(𝑐 𝑖 , {𝑐1 , . . . , 𝑐 𝑖−1 }) > 2𝑅. Thus, if the
algorithm outputs ℎ points then the pairwise distance between any two of them
is more than 2𝑅. By Lemma 9.1, if ℎ ≥ 𝑘 + 1 the optimum radius is > 𝑅. Hence,
if the guess 𝑅 ≥ 𝑅∗ the algorithm outputs at most 𝑘 centers. 
The guessing of 𝑅∗ can be implemented by binary search in various ways.
We omit these routine details.

Exercise 9.2. Describe an example where the algorithm uses exactly 𝑘 centers
even with guess 𝑅 ∗ . Describe an example where the algorithm outputs less than
𝑘 centers with a guess of 𝑅∗ .

9.1.3 Related Problems and Discussion


The 𝑘-Center problem is natural in geometric settings and one can see from the
proof that the 2-approximation for the two algorithms holds even when the optimum is
allowed to choose centers outside the input point set. A surprising result of Feder and Greene [55]
shows that even in two dimensions (the Euclidean plane) one cannot improve
the factor of 2 unless 𝑃 = 𝑁𝑃.
The 𝑘-Supplier problem is closely related to 𝑘-Center and is motivated by
facility location considerations. Here we are given ℱ ∪ 𝒟 which are in a metric
space. We need to choose 𝑘 centers 𝐶 ⊆ ℱ to minimize max𝑝∈𝒟 𝑑(𝑝, 𝐶). Note
that we don’t have to cover the facilities. Hochbaum and Shmoys gave a variant
of their algorithm that obtains a 3-approximation for 𝑘-Supplier and moreover
showed that unless 𝑃 = 𝑁𝑃 one cannot improve upon 3 [84]. Interestingly in Euclidean
spaces 3 is not tight [127]. Several generalizations of 𝑘-Center which constrain
the choice of centers have been considered — see [31] for a general framework
that also considers outliers.
One can also consider a weighted version of 𝑘-Center, relabeled as the priority
version in subsequent work. We refer to the work of Plesnik [131] and a recent
one [16] on this variant which has found applications in fair clustering.
We finally mention that 𝑘-Center clustering has nice connections to the notion
of 𝑟-nets in metric spaces. Given a set of points 𝑃 in a metric space and a radius
𝑟, an 𝑟-net is a set of centers 𝐶 such that (i) for all 𝑝 ∈ 𝑃 we have 𝑑(𝑝, 𝐶) ≤ 𝑟 (that
is the points are covered by balls of radius 𝑟) and (ii) for any distinct 𝑐, 𝑐 0 ∈ 𝐶 we
have 𝐵(𝑐, 𝑟/2) and 𝐵(𝑐 0 , 𝑟/2) are disjoint (packing property or the property that
no two centers are too close). 𝑟-nets provide a concise summary of distances
in a metric space at scale 𝑟. One can use 𝑟-nets to obtain nearest-neighbor data
structures and other applications, especially in low-dimensional settings. We
refer the reader to [77, 78].
LP Relaxation: The two 𝑘-Center algorithms we described are combinatorial.
One can also consider an LP relaxation. Since it is a bottleneck problem, writing
an LP relaxation involves issues similar to what we saw with unrelated machine
scheduling. Given a guess 𝑅 we can write an LP to check whether a radius 𝑅 is
feasible and then find the smallest 𝑅 for which it is feasible. The feasibility LP
can be written as follows. Let 𝑥 𝑖 be an indicator for whether we open a center at
point 𝑝 𝑖 .

\begin{align*}
& \sum_{i=1}^{n} x_i = k & \\
& \sum_{p_i \in B(p_j, R)} x_i \ge 1 & \forall p_j \in P \\
& x_i \ge 0 & \forall p_i \in P
\end{align*}
Exercise 9.3. Prove that if 𝑅 is feasible for the preceding LP then one can obtain
a solution with 𝑘 centers with max radius 2𝑅.
Exercise 9.4. Generalize the LP for the 𝑘-Supplier problem and prove that one
can obtain a 3-approximation with respect to lower bound provided via the LP
approach.

9.2 Uncapacitated Facility Location


We now discuss UCFL. One can obtain a constant factor for UCFL via several
techniques: LP rounding, primal-dual, local search and greedy. The best
known approximation bound is 1.488 due to Li [116] while it is known that one
cannot obtain a ratio better than 1.463 [71]. We will not describe these more complicated
algorithms and instead focus on the high-level approaches that yield some constant
factor approximation.
It is common to use 𝑖 to denote a facility in ℱ and 𝑗 to denote a demand/client.

9.2.1 LP Rounding
The first constant factor approximation for UCFL was via LP rounding by Aardal,
Shmoys and Tardos using a filtering technique of Lin and Vitter. We start
with the LP relaxation. We use a variable 𝑦 𝑖 for 𝑖 ∈ ℱ to indicate whether 𝑖 is
opened or not. We use a variable 𝑥 𝑖,𝑗 to indicate whether 𝑗 is connected to 𝑖
(or assigned to 𝑖). One set of constraints is natural here: each client has to be
assigned/connected to a facility. The other constraint requires that 𝑗 is assigned
to 𝑖 only if 𝑖 is open.
\begin{align*}
\min \quad & \sum_{i \in \mathcal{F}} f_i y_i + \sum_{j \in \mathcal{D}} \sum_{i \in \mathcal{F}} d(i,j)\, x_{i,j} & \\
& \sum_{i} x_{i,j} = 1 & \forall j \in \mathcal{D} \\
& x_{i,j} \le y_i & i \in \mathcal{F},\ j \in \mathcal{D} \\
& x, y \ge 0 &
\end{align*}

Given a feasible solution 𝑥, 𝑦 to the LP the question is how to round. We


note that the LP relaxation does not “know” whether 𝑑 is a metric or not. In fact
when 𝑑 is arbitrary (but non-negative) we obtain the non-metric facility location
problem which is as hard as the Set Cover problem but not much harder — one
can obtain an 𝑂(log 𝑛)-approximation. However, we can obtain a constant factor
when 𝑑 is a metric and the rounding will exploit this property.
Given the fractional solution 𝑥, 𝑦, for each 𝑗 we define 𝛼 𝑗 to be the distance
cost paid for by the LP: therefore $\alpha_j = \sum_{i \in \mathcal{F}} d(i,j)\, x_{i,j}$. Note that the LP cost is
$\sum_i f_i y_i + \sum_j \alpha_j$.

Lemma 9.2. For each 𝑗 and each 𝛿 ∈ (0, 1) there is a total facility value of at least (1 − 𝛿)
in 𝐵(𝑗, 𝛼 𝑗 /𝛿). That is, $\sum_{i \in B(j, \alpha_j/\delta)} y_i \ge 1 - \delta$. In particular $\sum_{i \in B(j, 2\alpha_j)} y_i \ge 1/2$.

Proof. This essentially follows from Markov's inequality or averaging. Note that
$\alpha_j = \sum_i d(i,j) x_{i,j}$ and $\sum_i x_{i,j} = 1$. Suppose $\sum_{i \in B(j, \alpha_j/\delta)} y_i < 1 - \delta$. Since $x_{i,j} \le y_i$
for all 𝑖, 𝑗, we would have $\alpha_j > \delta \cdot \alpha_j/\delta = \alpha_j$, which is impossible. □
We say that two clients 𝑗 and 𝑗′ intersect if there is some 𝑖 ∈ ℱ such that
𝑖 ∈ 𝐵(𝑗, 2𝛼 𝑗 ) ∩ 𝐵(𝑗′, 2𝛼 𝑗′ ). The rounding algorithm is described below.

UCFL-primal-rounding

1. Solve LP and let (𝑥, 𝑦) be a feasible LP solution


2. For each 𝑗 compute $\alpha_j = \sum_i d(i,j)\, x_{i,j}$

3. Renumber clients such that 𝛼 1 ≤ 𝛼 2 ≤ . . . ≤ 𝛼 ℎ where ℎ is number of clients

4. For 𝑗 = 1 to ℎ do

A. If 𝑗 already assigned continue


B. Open cheapest facility 𝑖 in 𝐵(𝑗, 2𝛼 𝑗 ) and assign 𝑗 to 𝑖
C. For each remaining client 𝑗′ > 𝑗 that intersects with 𝑗 , assign 𝑗′ to 𝑖

5. Output the list of open facilities and the client assignment
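A compact Python sketch of the rounding (the LP solution is assumed to be given; ball[j] denotes the set of facilities within distance 2α_j of client j, which is nonempty by Lemma 9.2, and the helper name is ours):

def ucfl_round(alpha, ball, facility_cost):
    # alpha[j]: LP connection cost of client j; ball[j]: set of facilities in B(j, 2*alpha[j])
    order = sorted(alpha, key=lambda j: alpha[j])
    assigned, opened = {}, set()
    for j in order:
        if j in assigned:
            continue
        i = min(ball[j], key=lambda f: facility_cost[f])  # cheapest facility in the ball
        opened.add(i)
        assigned[j] = i
        for jp in order:
            # absorb every remaining client whose ball intersects j's ball
            if jp not in assigned and ball[jp] & ball[j]:
                assigned[jp] = i
    return opened, assigned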

It is not hard to see that every client is assigned to an open facility. The main
issue is to bound the total cost. Let 𝐹 be the total facility opening cost, and let 𝐶
be the total connection cost. We will bound these separately.
Lemma 9.3. $F \le 2\sum_i f_i y_i$.

Proof. Note that a client 𝑗 opens a new facility only if it has not been assigned
when it is considered by the algorithm. Let 𝑗1 , 𝑗2 , . . . , 𝑗 𝑘 be the clients that open

facilities. Let $A_j = \mathcal{F} \cap B(j, 2\alpha_j)$ be the set of facilities in the ball of radius $2\alpha_j$
around 𝑗. From the algorithm and the definition of intersection of clients, we
see that the sets $A_{j_1}, A_{j_2}, \ldots, A_{j_k}$ are pairwise disjoint. The algorithm opens the
cheapest facility in $A_{j_\ell}$ for $1 \le \ell \le k$. Note that $\sum_{i \in A_{j_\ell}} y_i \ge 1/2$ by Lemma 9.2.
Hence the cost of the cheapest facility in $A_{j_\ell}$ is at most $2\sum_{i \in A_{j_\ell}} f_i y_i$ (why?). By
the disjointness of the sets $A_{j_1}, \ldots, A_{j_k}$ we see that the total cost of the facilities
opened is at most $2\sum_i f_i y_i$.

Lemma 9.4. $C \le 6\sum_j \alpha_j$.

Proof. Consider a client 𝑗 that opens a facility 𝑖 in 𝐵(𝑗, 2𝛼 𝑗 ). In this case 𝑗 is
assigned to 𝑖 and 𝑑(𝑖, 𝑗) ≤ 2𝛼 𝑗 . Now consider a client 𝑗 that does not open
a facility. This implies that there was a client 𝑗′ < 𝑗 that opened a facility 𝑖′,
that 𝑗′ and 𝑗 intersect, and that 𝑗 was assigned to 𝑖′. What is 𝑑(𝑖′, 𝑗)? We claim that
𝑑(𝑖′, 𝑗) ≤ 6𝛼 𝑗 . To see this we note that 𝑗′ and 𝑗 intersect because there is some
facility $i \in B(j', 2\alpha_{j'}) \cap B(j, 2\alpha_j)$. By considering the path 𝑗 → 𝑖 → 𝑗′ → 𝑖′, via
the triangle inequality, $d(i', j) \le 2\alpha_j + 2\alpha_{j'} + 2\alpha_{j'} \le 6\alpha_j$ since $\alpha_{j'} \le \alpha_j$. Thus the
distance traveled by each client 𝑗 to its assigned facility is at most 6𝛼 𝑗 . The
lemma follows. □
The two preceding lemmas give us the following which implies that the
algorithm yields a 6-approximation.

Theorem 9.3. 𝐶 + 𝐹 ≤ 6 OPT𝐿𝑃 .

It should be clear to the reader that the algorithm and analysis are not
optimized for the approximation ratio. The goal here was to simply outline the
basic scheme that led to the first constant factor approximation.

9.2.2 Primal-Dual
Jain and Vazirani developed an elegant and influential primal-dual algorithm
for UCFL [91]. It was influential since it allowed new algorithms for 𝑘-median
and several generalizations of UCFL in a clean way. Moreover the primal-dual
algorithm is simple and efficient to implement. On the other hand we should
mention that one advantage of having an LP solution is that it gives an explicit
lower bound on the optimum value while a primal-dual method yields a lower
bound via a feasible dual which may not be optimal. We need some background
to describe the primal-dual method in approximation.
Complementary slackness: To understand primal-dual we need some basic
background in complementary slackness. Suppose we have a primal LP (P) of
the form min 𝑐𝑥 s.t. 𝐴𝑥 ≥ 𝑏, 𝑥 ≥ 0 which we intentionally wrote in this standard

form as a covering LP. Its dual (D) is a packing LP max $b^{T} y$ s.t. $A^{T} y \le c$, 𝑦 ≥ 0.
We will assume that both primal and dual are feasible and hence the optimum
values are finite, and via strong duality we know that the optimum values are
the same.
Definition 9.4. A feasible solution 𝑥 to (P) and a feasible solution 𝑦 to (D) satisfy the
primal complementary slackness condition with respect to each other if the following is
true: for each 𝑖, 𝑥 𝑖 = 0 or the corresponding dual constraint is tight, that is $\sum_j A_{j,i}\, y_j = c_i$.
They satisfy the dual complementary slackness condition if the following is true: for each
𝑗, 𝑦 𝑗 = 0 or the corresponding primal constraint is tight, that is $\sum_i A_{j,i}\, x_i = b_j$.
One of the consequences of the duality theory of LP is the following.


Theorem 9.5. Suppose (P) and (D) are a primal and dual pair of LPs that both have
finite optima. A feasible solution 𝑥 to (P) and a feasible solution 𝑦 to (D) satisfy the
primal-dual complementary slackness properties with respect to each other if and only
if they are both optimum solutions to the respective LPs.
We illustrate the use of complementary slackness in the context of approxi-
mation via Vertex Cover. Recall the primal covering LP and we also write the
dual.

Õ
min 𝑤𝑣 𝑥𝑣
𝑣∈𝑉
𝑥𝑢 + 𝑥𝑣 ≥ 1 ∀𝑒 = (𝑢, 𝑣) ∈ 𝐸
𝑥𝑣 ≥ 0 ∀𝑣 ∈ 𝑉

Õ
max 𝑦𝑒
𝑒∈𝐸
Õ
𝑦𝑒 ≤ 𝑤𝑣 ∀𝑣 ∈ 𝑉
𝑒∈𝛿(𝑣)
𝑦𝑒 ≥ 0, ∀𝑒 ∈ 𝐸

Recall that we described a simple rounding algorithm: given a feasible
primal solution 𝑥, we let 𝑆 = {𝑣 | 𝑥 𝑣 ≥ 1/2} and showed that (i) 𝑆 is a vertex
cover for the given graph 𝐺 and (ii) $w(S) \le 2\sum_v w_v x_v$. Now suppose we have an
optimum solution 𝑥∗ to the primal rather than an arbitrary feasible solution. We
optimum solution 𝑥 ∗ to the primal rather than an arbitrary feasible solution. We
can prove an interesting and stronger statement via complementary slackness.

Lemma 9.5. Let $x^*$ be an optimum solution to the primal covering LP. Then $S = \{v \mid x^*_v > 0\}$ is a feasible vertex cover for 𝐺 and moreover $w(S) \le 2\sum_v w_v x^*_v$.

Proof. It is easy to see that 𝑆 is a vertex cover via the same argument that we
have seen before. How do we bound the cost now that we may be rounding
$x^*_v$ to 1 even though $x^*_v$ may be tiny? Let $y^*$ be any optimum solution to the
dual; one exists (why?). Via strong duality we have $\sum_v w_v x^*_v = \sum_e y^*_e$. Via
primal complementary slackness we have the property that if $x^*_v > 0$ then
$\sum_{e \in \delta(v)} y^*_e = w_v$. Hence
$$w(S) = \sum_{v: x^*_v > 0} w_v = \sum_{v: x^*_v > 0} \sum_{e \in \delta(v)} y^*_e.$$
Interchanging the order of summation we obtain that
$$w(S) = \sum_{v: x^*_v > 0} \sum_{e \in \delta(v)} y^*_e \le 2\sum_{e \in E} y^*_e = 2\sum_v w_v x^*_v$$
where we use the fact that an edge 𝑒 has only two end points in the inequality. □

Primal-dual for approximating Vertex Cover: We will first study a primal-


dual approximation algorithm for the Vertex Cover problem — this algorithm
due to Bar-Yehuda and Even [19] can perhaps be considered the first primal-dual
algorithm in approximation. Primal-dual is a classical method in optimization
and is applicable in both continuous and discrete settings. The basic idea, in the
context of LPs (the method applies more generally), is to obtain a solution 𝑥 to a
primal LP and a solution 𝑦 to the dual LP together and certify the optimality of
each solution via complementary slackness. It is beyond the scope of these notes
to give a proper treatment. One could argue that understanding the setting
of approximation is easier than in the classical setting where one seeks exact
algorithms, since our goals are more modest.
Typically one starts with one of 𝑥, 𝑦 being feasible and the other infeasible,
and evolve them over time. In discrete optimization, this method is successfully
applied to LPs that are known to be integer polytopes. Examples include
shortest paths, matchings, and others. In approximation the LP relaxations are
typically not integral. In such a setting the goal is to produce a primal and
dual pair 𝑥, 𝑦 where 𝑥 is an integral feasible solution, and 𝑦 is fractional feasible
solution. The goal is to approximately bound the value of 𝑥 via the dual 𝑦,
and for this purpose we will enforce only the primal complementary slackness
condition for the pair (𝑥, 𝑦). To make the algorithm and analysis manageable,
the primal-dual algorithm is typically done in a simple but clever fashion —
there have been several surprisingly strong and powerful approximation results
via this approach.

We illustrate this in the context of Vertex Cover first. It is useful to interpret


the dual as we did already in the context of the dual-fitting technique for Set
Cover. We think of the edges 𝑒 ∈ 𝐸 as agents that wish to be covered by the
vertices in 𝐺 at minimum cost. The dual variable 𝑦 𝑒 can be thought of as the
amount that edge 𝑒 is willing to pay to be covered. The dual packing constraint
$\sum_{e \in \delta(v)} y_e \le w_v$ says that for any vertex 𝑣, the total payments of all edges incident
to 𝑣 cannot exceed its weight. This can be understood game theoretically as
follows. The set of edges 𝛿(𝑣) can together pay 𝑤 𝑣 and get covered, and hence
we cannot charge them more. The dual objective is to maximize the total
payment that can be extracted from the edges subject to these natural and simple
constraints. With this interpretation in mind we wish to produce a feasible
dual (payments) and a corresponding feasible integral primal (vertex cover).
The basic scheme is to start with an infeasible primal 𝑥 = 0 and a feasible dual
𝑦 = 0 and increase 𝑦 while maintaining feasibility; during the process we will
maintain primal complementary slackness which means that if a dual constraint
for a vertex 𝑣 becomes tight we set 𝑥 𝑣 = 1. Note that we are producing an integer
primal solution in this process. How should we increase 𝑦 values? We will do it
in the naive fashion which is to uniformly increase 𝑦 𝑒 for all 𝑒 that are not already
part of a tight constraint (and hence not covered yet).

VC-primal-dual(𝐺 = (𝑉 , 𝐸), 𝑤 : 𝑉 → ℝ+ )

1. 𝑥 ← 0, 𝑦 ← 0 // initialization: primal infeasible, dual feasible

2. 𝑈 ← 𝐸 // uncovered edges that are active

3. While (𝑈 ≠ ∅) do
A. Increase 𝑦 𝑒 uniformly for each 𝑒 ∈ 𝑈 until the constraint $\sum_{e \in \delta(a)} y_e = w_a$ becomes tight for
some vertex 𝑎
B. Set 𝑥 𝑎 = 1 // Maintain primal complementary slackness
C. 𝑈 ← 𝑈 \ 𝛿(𝑎) // Remove all edges covered by 𝑎

4. Output integer solution 𝑥 and dual certificate 𝑦
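The following Python sketch implements an equivalent sequential variant of this primal-dual algorithm (raising the dual of one uncovered edge at a time as much as possible, rather than raising all active duals in lockstep); it maintains dual feasibility and primal complementary slackness, so the argument of Lemma 9.6 applies to it as well.

def vc_primal_dual(n, edges, w):
    # w: list of vertex weights; edges: list of pairs (u, v); exact tightness test
    # below assumes integer (or exactly representable) weights
    slack = list(w)          # remaining slack w_v minus duals of edges incident to v
    y = {}                   # dual values
    cover = set()
    for (u, v) in edges:
        if u in cover or v in cover:
            continue         # edge already covered
        y[(u, v)] = min(slack[u], slack[v])
        slack[u] -= y[(u, v)]
        slack[v] -= y[(u, v)]
        if slack[u] == 0:
            cover.add(u)     # dual constraint at u is tight: add u to the cover
        if slack[v] == 0:
            cover.add(v)
    return cover, y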

Remark 9.2. Note that when checking whether a vertex 𝑣 is tight we count the
payments from 𝑒 ∈ 𝛿(𝑣) even though some of them are no longer active.
Lemma 9.6. At the end of the algorithm 𝑥 is a feasible vertex cover for 𝐺 and
$\sum_v w_v x_v \le 2\,\mathrm{OPT}$.

Proof. By induction on the iterations one can prove that (i) 𝑦 remains dual

feasible throughout (ii) 𝑎𝑏 ∈ 𝑈 at the start of an iteration iff 𝑒 = 𝑎𝑏 is not covered


yet (iii) each iteration adds at least one more vertex and hence the algorithm
terminates in ≤ 𝑛 iterations and outputs a feasible vertex cover. The main issue
is the cost of 𝑥.
For this we note that the algorithm maintains primal complementary slack-
ness. That is, 𝑥 𝑣 = 0 or if 𝑥 𝑣 = 1 then 𝑒∈𝛿(𝑣) 𝑦 𝑒 = 𝑤 𝑣 . Thus, we have
Í

Õ Õ Õ Õ
𝑤𝑣 𝑥𝑣 = 𝑦𝑒 ≤ 2 𝑦 𝑒 ≤ 2 OPT𝐿𝑃 .
𝑣 𝑣:𝑥 𝑣 >0 𝑒∈𝛿(𝑣) 𝑒

We used the fact that 𝑒 has at most two end points in the first inequality and
the fact that 𝑦 is dual feasible in the second inequality. In terms of payment
what this says is that edge 𝑢𝑣’s payment of 𝑦𝑢𝑣 can be used to pay for opening 𝑢
and 𝑣 while the dual pays only once. 
As the reader can see, the algorithm is very simple to implement.
Exercise 9.5. Describe an example to show that the primal-dual algorithm's
worst case performance is 2. Describe an example to show that the dual value
constructed by the algorithm is ≈ OPT/2. Are these two parts the same?
Remark 9.3. The algorithm generalizes to give an 𝑓 -approximation for Set Cover
where 𝑓 is the maximum frequency of any element. There are examples showing
that the performance of this algorithm in the worst-case can indeed be a factor
of 𝑓 . We saw earlier that the integrality gap of the LP is at most 1 + ln 𝑑 where
𝑑 is the maximum set size. There is no contradiction here since the specific
primal-dual algorithm that we developed need not achieve the tight integrality
gap.

Primal-Dual for UCFL: Now we consider a primal-dual algorithm for UCFL.


Recall the primal LP that we discussed previously. Now we describe the dual
LP written below. The dual has two types of variables. For each client 𝑗 there is
a variable 𝛼 𝑗 corresponding to the primal constraint that each client 𝑗 needs to
be connected to a facility. For each facility 𝑖 and client 𝑗 there is a variable 𝛽 𝑖,𝑗
corresponding to the constraint that 𝑦 𝑖 ≥ 𝑥 𝑖,𝑗 .

\begin{align*}
\max \quad & \sum_{j \in \mathcal{D}} \alpha_j & \\
& \sum_{j} \beta_{i,j} \le f_i & \forall i \in \mathcal{F} \\
& \alpha_j - \beta_{i,j} \le d(i,j) & i \in \mathcal{F},\ j \in \mathcal{D} \\
& \alpha, \beta \ge 0 &
\end{align*}

It is important to interpret the dual variables. There is a similarity to


Set Cover since Non-Metric Facility Location is a generalization, and the LP
formulation does not distinguish between metric and non-metric facility location
— it is only in the rounding that we can take advantage of metric properties. The
variable 𝛼 𝑗 can be interpreted as the amount of payment client 𝑗 is willing to
make. This comes in two parts — the payment to travel to a facility which it
cannot share with any other clients, and the payment it is willing to make to open
a facility which it can share with other clients. The variable 𝛽 𝑖,𝑗 corresponds to
the amount client 𝑗 is willing to pay to facility 𝑖 to open it. (In Set Cover there is
no need to distinguish between 𝛼 𝑗 and 𝛽 𝑖,𝑗 since there are no distances (or they
can be assumed to be 0 or ∞).) The first set of constraints in the dual say that
for any facility 𝑖, the total payments from all the clients ($\sum_j \beta_{i,j}$) cannot exceed the
cost 𝑓𝑖 . The second set of constraints specify that 𝛼 𝑗 − 𝛽 𝑖,𝑗 is at most 𝑑(𝑖, 𝑗). One
way to understand this is that if 𝛼 𝑗 < 𝑑(𝑖, 𝑗) then client 𝑗 will not even be able to
reach 𝑖 and hence will not contribute to opening 𝑖. Via this interpretation it is
convenient to assume that 𝛽 𝑖,𝑗 = max{0, 𝛼 𝑗 − 𝑑(𝑖, 𝑗)}.
The primal-dual algorithm for UCFL will have a growth stage that is similar
to what we saw for Vertex Cover. We increase 𝛼 𝑗 for each “uncovered” client 𝑗
uniformly. A facility 𝑖 will receive payment 𝛽 𝑖,𝑗 = max{0, 𝛼 𝑗 − 𝑑(𝑖, 𝑗)} from 𝑗. To
maintain dual feasibility, as soon as the constraint for facility 𝑖 becomes tight
we open facility 𝑖 (in the primal we set 𝑦 𝑖 = 1); note that any client 𝑗 such that
𝛽 𝑖,𝑗 > 0 cannot increase its 𝛼 𝑗 any further and hence will stop growing since it is
connected to an open facility. This process is very similar to that in Set Cover.
The main issue is that we will get a weak approximation (a factor of |ℱ |) since
a client can contribute payments to a large number of facilities. Note that the
process so far has not taken advantage of the fact that we have a metric facility
location problem. Therefore, in the second phase we will close some facilities
which means that a client may need to get connected to a facility that it did not
contribute to — however we will use the metric property to show that a client
does not need to travel too far to reach a facility that remains open.
With the above discussion in place, we describe the two phase primal-dual
algorithm below. The algorithm also creates a bipartite graph 𝐺 with vertex set
ℱ ∪ 𝒟 and initially it has no edges.

JV-primal-dual((ℱ ∪ 𝒟, 𝑑), 𝑓𝑖 , 𝑖 ∈ ℱ )

1. 𝛼 𝑗 = 0 for all 𝑗 ∈ 𝒟 , 𝛽 𝑖,𝑗 ← 0 for all 𝑖, 𝑗 // initialize dual values to 0

2. 𝐴 ← 𝒟 // active clients that are unconnected

3. 𝑂 ← ∅ // temporarily opened facilities

4. While (𝐴 ≠ ∅) do // Growth phase

A. Increase α_j uniformly for each j ∈ A, while maintaining the invariant
   β_{i,j} = max{0, α_j − d(i, j)}, until one of the following conditions holds:
   (i) α_j = d(i, j) for some j ∈ A and i ∈ O, or (ii) Σ_j β_{i,j} = f_i for some
   i ∈ ℱ \ O
B. If (i) happens then add edge (i, j) to G and set A ← A − {j} // j is
connected to the already open facility i ∈ O
C. Else If (ii) happens then
1. 𝑂 ← 𝑂 ∪ {𝑖} // temporarily open 𝑖 that became tight
2. for each j ∈ 𝒟 such that β_{i,j} > 0 add edge (i, j) to G // note: clients
not in A may also get edges to i
3. 𝐴 ← 𝐴 \ {𝑗 : 𝛽(𝑖, 𝑗) > 0} // make active clients connected to 𝑖 inactive

5. Create graph 𝐻 with vertex set 𝑂 and edge set 𝑄 where (𝑖, 𝑖 0) ∈ 𝑄 iff
there exists client 𝑗 such that 𝛽(𝑖, 𝑗) > 0 and 𝛽(𝑖 0 , 𝑗) > 0

6. 𝑂 0 is any maximal independent set in 𝐻

7. Close facilities in 𝑂 \ 𝑂 0 // Pruning phase

8. Assign each client 𝑗 to nearest facility in 𝑂 0
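
To make the two phases concrete, here is a small Python sketch of the algorithm above, written as an event-driven simulation. The names jv_ucfl and tight_time are ours; tie-breaking and floating-point issues are glossed over, only the special edges are tracked (the witness edges are needed only for the analysis), and no attempt is made at efficiency.

import math

def jv_ucfl(F, D, f, d):
    # F, D: lists of facility/client ids; f[i]: opening cost; d[i][j]: metric distance
    alpha = {j: 0.0 for j in D}          # dual variables alpha_j
    active = set(D)                      # clients that are still growing
    opened = []                          # temporarily opened facilities, in order
    special = {}                         # facility -> set of clients with beta_{ij} > 0
    t = 0.0
    EPS = 1e-12

    def tight_time(i):
        # earliest time T >= t at which sum_j max(0, alpha_j - d(i,j)) reaches f_i,
        # where alpha_j = T for active clients and is frozen for inactive ones
        paid = sum(max(0.0, alpha[j] - d[i][j]) for j in D if j not in active)
        paid += sum(max(0.0, t - d[i][j]) for j in active)
        if paid >= f[i] - EPS:
            return t
        rate = sum(1 for j in active if d[i][j] <= t)
        cur = t
        for b in sorted(d[i][j] for j in active if d[i][j] > t):
            if rate > 0 and paid + rate * (b - cur) >= f[i]:
                return cur + (f[i] - paid) / rate
            paid += rate * (b - cur); cur = b; rate += 1
        return cur + (f[i] - paid) / rate if rate > 0 else math.inf

    while active:
        # event (i): an active client reaches an already opened facility
        t1, ev1 = math.inf, None
        for j in active:
            for i in opened:
                if d[i][j] < t1:
                    t1, ev1 = d[i][j], (i, j)
        # event (ii): a not-yet-opened facility becomes tight
        t2, ev2 = math.inf, None
        for i in F:
            if i not in opened:
                T = tight_time(i)
                if T < t2:
                    t2, ev2 = T, i
        t = min(t1, t2)
        for j in active:                 # all active duals grow uniformly up to time t
            alpha[j] = t
        if t1 <= t2:                     # connect a client to an open facility
            active.discard(ev1[1])
        else:                            # temporarily open the tight facility
            i = ev2
            opened.append(i)
            special[i] = {j for j in D if alpha[j] - d[i][j] > EPS}
            active -= {j for j in active if alpha[j] >= d[i][j] - EPS}

    # pruning: keep a maximal independent set in the conflict graph H on opened
    # facilities (two facilities conflict if some client has special edges to both)
    kept = []
    for i in opened:
        if all(not (special[i] & special[i2]) for i2 in kept):
            kept.append(i)
    assign = {j: min(kept, key=lambda i: d[i][j]) for j in D}
    return kept, assign, alpha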

Example: The example in Figure 9.1 illustrates the need for the pruning phase.
There are 2n clients and n facilities, and the opening costs of the facilities are
n + 2, except for the first one which has an opening cost of n + 1. The first group
of clients, shown at the top of the figure, have a connection cost of 1 to each
facility. The second group of clients have the following property: d(i_h, j′_h) = 1
and d(i_ℓ, j′_h) = 3 when ℓ ≠ h. The rest of the distances are induced by these. One
can verify that in the growth phase all the facilities will be opened. However, the
total dual value after the growth phase is 5n − 1 while the total cost of the opened
facilities is Θ(n²), and hence pruning is necessary to obtain a good approximation.

Figure 9.1: Example to illustrate the need for the pruning phase.

We will do a high-level analysis of the algorithm skipping a few fine details.


A formal treatment with full details can be found in [152].
The algorithm maintains the property that 𝛼 𝑗 and 𝛽 𝑖,𝑗 variables are dual
feasible. Consider the graph 𝐺 created by the algorithm. We call an edge
(𝑖, 𝑗) ∈ 𝐺 a witness edge for 𝑗 when 𝑗 is removed from 𝐴 (it happens exactly once
for each 𝑗). We call an edge (𝑖, 𝑗) ∈ 𝐺 special if 𝛽(𝑖, 𝑗) > 0 which means that 𝑗
paid to temporarily open 𝑖. We remark that a client 𝑗 may have a special edge to
𝑖 even though (𝑖, 𝑗) is not its witness edge — this can happen if 𝑖 is temporarily
opened after 𝑗 was already removed from 𝐴 (due to other clients contributing to
opening 𝑖 later). One can associate a notion of time with the progression of the
algorithm since dual variables are monotonically increased. This can be used to
order events.
A basic property maintained by the algorithm is the following.

Claim 9.2.1. If (𝑖, 𝑗) is an edge in 𝐺 then 𝛼 𝑗 − 𝛽 𝑖,𝑗 = 𝑑(𝑖, 𝑗) and hence 𝛼 𝑗 ≥ 𝑑(𝑖, 𝑗).

Proof. The algorithm adds an edge (𝑖, 𝑗) only if (𝑖, 𝑗) is a witness edge for 𝑗 or if
𝛽(𝑖, 𝑗) > 0 (in which case (𝑖, 𝑗) is special). The algorithm maintain the invariant
𝛽 𝑖,𝑗 = max{0, 𝛼 𝑗 − 𝑑(𝑖, 𝑗)} and hence if 𝛽 𝑖,𝑗 > 0 the claim is clear. If 𝛽 𝑖,𝑗 = 0 then
(𝑖, 𝑗) is a witness edge for 𝑗 and case (i) happens when 𝑗 is removed from 𝐴 and
in this case 𝛼 𝑗 = 𝑑(𝑖, 𝑗). 
Analysis: We upper bound the cost of opening the facilities in 𝑂 0 and the
connection cost of the clients.

We leave the proof of the following lemma as an exercise.


Lemma 9.7. For each facility i ∈ O we have Σ_{j : (i,j) is special} β_{i,j} = f_i.

Since a client j can pay towards multiple facilities, which we cannot afford, the
pruning phase removes facilities so that no client j is connected to two facilities
in O′ with special edges (otherwise those two facilities would have an edge in H).
We say that a client j is directly connected to a facility i ∈ O′ if (i, j) is a special
edge. For i ∈ O′ let Z_i be the set of clients directly connected to i. We call all
such clients directly connected clients, and the rest of the clients are called
indirectly connected. We let 𝒟1 = ∪_{i∈O′} Z_i be the set of all directly connected
clients and let 𝒟2 be the set of all indirectly connected clients. By the pruning
rule we have the property that a client j is directly connected to at most one
facility in O′. We show that each facility in O′ can be paid for by its directly
connected clients.
Lemma 9.8. For each i ∈ O′, Σ_{j∈Z_i} β_{i,j} = f_i.

Proof. This follows from Lemma 9.7 and the fact that if i ∈ O′ then every client j
with a special edge (i, j) must be directly connected to i. □
From Claim 9.2.1 we see that if 𝑗 is directly connected to 𝑖 then 𝛼 𝑗 −𝛽 𝑖,𝑗 = 𝑑(𝑖, 𝑗),
and hence 𝑗 can pay its connection cost to 𝑖 and its contribution to opening
𝑖. What about indirectly connected clients? The next lemma bounds their
connection cost.
Lemma 9.9. Suppose j ∈ 𝒟2, that is, j is an indirectly connected client. Let i be its
closest facility in O′; then d(i, j) ≤ 3α_j.
Proof. Let (𝑖, 𝑗) be the witness edge for 𝑗. Note that 𝑖 ∈ 𝑂. Since 𝑗 is an indirectly
connected client there is no facility 𝑖 0 ∈ 𝑂 0 such that (𝑖 0 , 𝑗) is a special edge. Since
𝑖 ∉ 𝑂 0 it must be because 𝑖 was closed in the pruning phase and hence there
must be a facility 𝑖 0 ∈ 𝑂 0 such that (𝑖, 𝑖 0) is an edge in 𝐻 (otherwise 𝑂 0 would
not be a maximal independent set). Therefore there is some client 𝑗 0 ≠ 𝑗 such
that (𝑖 0 , 𝑗 0) and (𝑖, 𝑗 0) are both special edges. We claim that 𝛼 𝑗 ≥ 𝛼 𝑗0 . Assuming
this claim we see via triangle inequality and Claim 9.2.1 that,
𝑑(𝑖 0 , 𝑗) ≤ 𝑑(𝑖, 𝑗) + 𝑑(𝑖, 𝑗 0) + 𝑑(𝑖 0 , 𝑗 0) ≤ 𝛼 𝑗 + 2𝛼 𝑗0 ≤ 3𝛼 𝑗 .
Since i′ ∈ O′, the nearest facility to j in O′ is within distance 3α_j.
We now prove the claim that 𝛼 𝑗 ≥ 𝛼 𝑗0 . Let 𝑡 = 𝛼 𝑗 be the time when 𝑗 connects
to facility 𝑖 as its witness. Consider two cases. In the first case 𝑑(𝑖, 𝑗 0) ≤ 𝑡 which
means that 𝑗 0 had already reached 𝑖 at or before 𝑡; in this case 𝛼 𝑗0 ≤ 𝑡 since 𝑖 was
open at 𝑡. In the second case 𝑑(𝑖, 𝑗 0) > 𝑡; this means that 𝑗 0 reached 𝑖 strictly after
𝑡. Since 𝑖 was already open at 𝑡, 𝑗 0 would not pay to open 𝑖 which implies that
𝛽(𝑖, 𝑗 0) = 0 but then (𝑖, 𝑗 0) would not be a special edge and hence this case cannot
arise. This finishes the proof of the claim. 

With the preceding two lemmas in place we can bound the total cost of
opening facilities in 𝑂 0 and connecting clients to them. We will provide a refined
statement that turns out to be useful in some applications.

Theorem 9.6.

    Σ_{j∈𝒟} d(O′, j) + 3 Σ_{i∈O′} f_i ≤ 3 Σ_{j∈𝒟} α_j ≤ 3 OPT_LP.

In particular the algorithm yields a 3-approximation.

Proof. Consider the directly connected clients 𝒟1. We have 𝒟1 = ⊎_{i∈O′} Z_i where
Z_i are the clients directly connected to i. Via Lemma 9.8 and Claim 9.2.1,

    Σ_{j∈𝒟1} α_j = Σ_{i∈O′} Σ_{j∈Z_i} α_j
                 = Σ_{i∈O′} Σ_{j∈Z_i} (d(i, j) + β_{i,j})
                 = Σ_{i∈O′} ( f_i + Σ_{j∈Z_i} d(i, j) )
                 ≥ Σ_{i∈O′} f_i + Σ_{j∈𝒟1} d(O′, j).

For indirectly connected clients, via Lemma 9.9, we have 3 Σ_{j∈𝒟2} α_j ≥
Σ_{j∈𝒟2} d(O′, j). Thus

    3 Σ_{j∈𝒟} α_j = 3 Σ_{j∈𝒟1} α_j + 3 Σ_{j∈𝒟2} α_j
                  ≥ 3 Σ_{i∈O′} f_i + 3 Σ_{j∈𝒟1} d(O′, j) + Σ_{j∈𝒟2} d(O′, j)
                  ≥ 3 Σ_{i∈O′} f_i + Σ_{j∈𝒟} d(O′, j).  □


The algorithm is easy and efficient to implement. One of the main advantages
of the stronger property that we saw in the theorem is that it leads to a nice
algorithm for the 𝑘-median problem; we refer the reader to Chapter 25 in [152]
for a detailed description. In addition the flexibility of the primal-dual algorithm
has led to algorithms for several other variants; see [20] for one such example.

9.2.3 Local Search


Local search has been shown to be a very effective heuristic for facility location
and clustering problems, and there is an extensive literature on this topic.
The first paper that proved constant factor approximation bounds for UCFL via
local search is by Korupolu, Plaxton and Rajaraman [108], and it provided a useful
template for many subsequent papers. We refer the reader to Chapter 9 in [155]
for a local search analysis for UCFL.

9.3 𝒌-Median
𝑘-Median has been extensively studied in approximation algorithms due to its
simplicity and connection to UCFL. The first constant factor approximation was
obtained in [35] via LP rounding. We will consider a slight generalization of
𝑘-Median where the medians are to be selected from the facility set ℱ . We
describe the LP which is closely related to that for UCFL which we have already
seen. The variables are the same: 𝑦 𝑖 indicates whether a center is opened at
location 𝑖 ∈ ℱ and 𝑥 𝑖,𝑗 indicates whether client 𝑗 is connected to facility 𝑖. The
objective and constraints change since the problem requires one to open at most
𝑘 facilities but there is no cost to opening them.

    min Σ_{j∈𝒟} Σ_{i∈ℱ} d(i, j) x_{i,j}
        Σ_{i} x_{i,j} = 1          ∀j ∈ 𝒟
        x_{i,j} ≤ y_i              ∀i ∈ ℱ, j ∈ 𝒟
        Σ_{i} y_i ≤ k
        x, y ≥ 0

Rounding of the LP for k-Median is not as simple as it is for UCFL. This is
mainly because one needs a global argument. To understand this, consider
the following example. Suppose we have 𝑘 + 1 points where the distance between
each pair is 1 (uniform metric space). Then the optimum integer solution has cost
1 since we can place 𝑘 medians at 𝑘 points and the remaining point has to travel
a distance of 1. Now consider a fractional solution where 𝑦 𝑖 = (1 − 1/(𝑘 + 1)) for
each point. The cost of this fractional solution is also 1, however, each client now
pays a fractional cost of 1/(𝑘 + 1). Any rounding algorithm will make at least
one of the clients pay a cost of 1 which is much larger than its fractional cost;

thus the analysis cannot be based on preserving the cost of each client within a
constant factor of its fractional cost.
There are several LP rounding algorithms known. An advantage of the LP-based
approach is that it leads to a constant factor approximation for the Matroid
Median problem, which is a nice and powerful generalization of the k-Median
problem; here there is a matroid defined over the facilities and the constraint is
that the set of facilities chosen must be independent in the given matroid. One
can write a natural LP relaxation for this problem and prove that the LP has a
constant integrality gap by appealing to matroid intersection! It showcases the
power of classical results in combinatorial optimization. We refer the reader to
[109, 147].

9.3.1 Local Search


Local search for center based clustering problems is perhaps one of the most
natural heuristics. In particular we will consider the 𝑝-swap heuristic. Given
the current set of 𝑘 centers 𝑆, the 𝑝-swap heuristic will consider swapping out
up to p centers from S with p new centers. It is easy to see that this local search
algorithm can be implemented in n^{O(p)} time per iteration. When p = 1
we simply refer to the algorithm as (basic) local search. We will ignore the
convergence time. As we saw for Max Cut, one can use standard tricks to make
the algorithm run in polynomial time with a loss of a (1 + 𝑜(1))-factor in the
approximation bound guaranteed by the worst-case local optimum. Thus the
main focus will be on the quality of the local optimum. The following is a very
nice result.

Theorem 9.7 (Arya et al. [11]). For any fixed 𝑝 ≥ 1 the 𝑝-swap local search heuristic
has a tight worst-case approximation ratio of (3 + 2/𝑝) for 𝑘-Median. In particular the
basic local search algorithm yields a 5-approximation.
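
To make the 1-swap heuristic concrete, here is a minimal Python sketch (the names kmedian_cost and local_search_1swap are ours): it performs single swaps as long as the cost strictly decreases. As discussed above, a practical variant would stop once the relative improvement is too small, so that the number of iterations is polynomial.

def kmedian_cost(S, clients, d):
    # cost(S): each client pays its distance to the nearest center in S
    return sum(min(d[j][i] for i in S) for j in clients)

def local_search_1swap(facilities, clients, d, k):
    # d[j][i]: distance from client j to candidate center i (assumed to be a metric)
    S = set(list(facilities)[:k])        # any initial set of k centers
    improved = True
    while improved:
        improved = False
        for r in list(S):
            for fstar in facilities:
                if fstar in S:
                    continue
                T = (S - {r}) | {fstar}  # swap r out and fstar in
                if kmedian_cost(T, clients, d) < kmedian_cost(S, clients, d):
                    S, improved = T, True
                    break
            if improved:
                break
    return S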

Here we give a proof/analysis of the preceding theorem for 𝑝 = 1, following


the simplified analysis presented in [74]. See also [155] and the notes from CMU.
Given any set of centers S we define cost(S) = Σ_{j∈𝒟} d(j, S) to be the sum of
the distances of the clients to the centers. Let 𝑆 be a local optimum and let 𝑆∗
be some fixed optimum solution to the given 𝑘-Median instance. To compare
cost(𝑆) with cost(𝑆 ∗ ) the key idea is to set up a clever set of potential swaps
between the centers in 𝑆 and centers in 𝑆∗ . That is, we consider a swap pair
(𝑟, 𝑓 ) where 𝑟 ∈ 𝑆 and 𝑓 ∈ 𝑆 ∗ . Since 𝑆 is a local optimum it must be the case that
cost(𝑆 − 𝑟 + 𝑓 ) ≤ cost(𝑆). The analysis upper bounds the potential increase in the
cost in some interesting fashion and sums up the resulting series of inequalities.
This may seem magical, and indeed it is not obvious why the analysis proceeds

in this fashion. The short answer is that the analysis ideas required a series of
developments with the somewhat easier case of UCFL coming first.
We set up some notation. Let 𝜙 : 𝒟 → 𝑆 be the mapping of clients to
facilities in 𝑆 based on nearest distance. Similarly let 𝜙∗ : 𝒟 → 𝑆 ∗ the mapping
to facilities in the optimum solution 𝑆∗ . Thus 𝑗 connects to facility 𝜙(𝑗) in the
local optimum and to 𝜙 ∗ (𝑗) in the optimum solution. We also let 𝑁(𝑖) denote
the set of all clients assigned to a facility 𝑖 ∈ 𝑆 and let 𝑁 ∗ (𝑖) denote the set of all
clients assigned to a facility 𝑖 ∈ 𝑆 ∗ . Let 𝐴 𝑗 = 𝑑(𝑗, 𝑆) and 𝑂 𝑗 = 𝑑(𝑗, 𝑆∗ ) be the cost
paid by 𝑗 in local optimum and optimal solutions respectively. To reinforce the
notation we express cost(𝑆) as follows.
    cost(S) = Σ_{j∈𝒟} A_j = Σ_{i∈S} Σ_{j∈N(i)} d(j, i).

Similarly,

    cost(S*) = Σ_{j∈𝒟} O_j = Σ_{i∈S*} Σ_{j∈N*(i)} d(j, i).

Setting up the swap pairs: We create a set of pairs 𝒫 as follows. We will


assume without loss of generality that |𝑆| = |𝑆∗ | = 𝑘. For technical convenience
we also assume 𝑆 ∩ 𝑆 ∗ = ∅; we can always create dummy centers that are
co-located to make this assumption. Consider the mapping 𝜌 : 𝑆 ∗ → 𝑆 where
each 𝑖 ∈ 𝑆 ∗ is mapped to its closest center in 𝑆; hence 𝜌(𝑖) for 𝑖 ∈ 𝑆 ∗ is the closest
center in 𝑆 to it. Let 𝑅1 be the set of centers in 𝑆 that have exactly one center
in 𝑆 ∗ mapped to them. Let 𝑅0 be the set of centers in 𝑆 with no centers of 𝑆∗
mapped to them. This means that for each 𝑖 ∈ 𝑆 \ (𝑅 0 ∪ 𝑅1 ) there are two or
more centers mapped to them. Let 𝑆1∗ ⊆ 𝑆 ∗ be the centers mapped to 𝑅 1 . See
Figure 9.2. By a simple averaging argument we have the following claim.

Claim 9.3.1. 2|𝑅0 | ≥ |𝑆 ∗ \ 𝑆1∗ |.

We create a set of pairs 𝒫 as follows. There will be exactly 𝑘 pairs. For each
𝑓∗ ∈ 𝑆1∗ we add the pair (𝑟, 𝑓 ∗ ) where 𝜌( 𝑓 ∗ ) = 𝑟. For each 𝑓 ∗ ∈ 𝑆∗ \ 𝑆1∗ we add a
pair (𝑟, 𝑓 ∗ ) where 𝑟 is any arbitrary center from 𝑅0 — however we make sure
that a center 𝑟 ∈ 𝑅 0 is in at most two pairs in 𝒫; this is possible because of Claim
9.3.1.
The pairs satisfy the following property.

Claim 9.3.2. If (𝑟, 𝑓 ∗ ) ∈ 𝒫 then for any facility 𝑓ˆ∗ ≠ 𝑓 ∗ , 𝜌( 𝑓ˆ∗ ) ≠ 𝑟.

The intuition for the pairs is as follows. If ρ⁻¹(r) = {f*} then we are essentially
forced to consider the pair (r, f*) since r could be the only center near f*, with
all other centers from S very far away. When considering the swap (r, f*) we

Figure 9.2: Mapping between 𝑆 ∗ and 𝑆 with each 𝑓 ∗ ∈ 𝑆 ∗ mapped to its closest
center in 𝑆.

can move the clients connecting to 𝑟 to 𝑓 ∗ . On the other hand if |𝜌−1 (𝑟)| > 1 then
𝑟 is close to several centers in 𝑆 ∗ and may be serving many clients. Thus we do
not consider such an 𝑟 in the swap pairs.
The main technical claim about the swap pairs is the following.

Lemma 9.10. Let (r, f*) be a pair in 𝒫. Then

    0 ≤ cost(S + f* − r) − cost(S) ≤ Σ_{j∈N*(f*)} (O_j − A_j) + Σ_{j∈N(r)} 2O_j.

We defer the proof of the lemma for now and use it to show that cost(S) ≤
5 cost(S*). We sum over all pairs (r, f*) ∈ 𝒫 and note that each f* ∈ S* occurs in
exactly one pair and each r ∈ S occurs in at most two pairs. Note that O_j − A_j
can be a negative number while O_j is a non-negative number.

    0 ≤ Σ_{(r,f*)∈𝒫} ( Σ_{j∈N*(f*)} (O_j − A_j) + Σ_{j∈N(r)} 2O_j )
      ≤ Σ_{f*∈S*} Σ_{j∈N*(f*)} (O_j − A_j) + 2 Σ_{r∈S} Σ_{j∈N(r)} 2O_j
      ≤ cost(S*) − cost(S) + 4 cost(S*).

This implies the desired inequality.

Figure 9.3: Two cases in the proof of Lemma 9.10, for the swap pair (r, f*). In
the figure on the left the client j ∈ N*(f*) is assigned to f*. In the figure on the
right the client j ∈ N(r) \ N*(f*) is assigned to r′ = ρ(f̂*).

Proof of Lemma 9.10. Since 𝑆 is a local optimum swapping 𝑟 with 𝑓 ∗ cannot


improve the cost and hence we obtain cost(𝑆 + 𝑓 ∗ − 𝑟) − cost(𝑆) ≥ 0. We focus
on the more interesting inequality. To bound the increase in cost by removing
𝑟 and adding 𝑓 ∗ to 𝑆, we consider a specific assignment of the clients. Any
client 𝑗 ∈ 𝑁 ∗ ( 𝑓 ∗ ) is assigned to 𝑓 ∗ (even if it is suboptimal to do so). See Figure
9.3. For such a client 𝑗 the change in cost is 𝑂 𝑗 − 𝐴 𝑗 . Now consider any client
𝑗 ∈ 𝑁(𝑟) \ 𝑁 ∗ ( 𝑓 ∗ ); since 𝑟 is no longer available we need to assign it to another

facility. Which one? Let 𝜙 ∗ (𝑗) = 𝑓ˆ∗ be the facility that 𝑗 is assigned to in the
optimum solution. Note that 𝑓ˆ∗ ≠ 𝑓 ∗ . We assign 𝑗 to 𝑟 0 = 𝜌( 𝑓ˆ∗ ); from Claim 9.3.2,
𝜌( 𝑓ˆ∗ ) ≠ 𝑟 and hence 𝑟 0 ∈ 𝑆 − 𝑟 + 𝑓 ∗ . See Figure 9.3. The change in the cost for
such a client 𝑗 is 𝑑(𝑗, 𝑟 0) − 𝑑(𝑗, 𝑟). We bound it as follows

𝑑(𝑗, 𝑟 0) − 𝑑(𝑗, 𝑟) ≤ 𝑑(𝑗, 𝑓ˆ∗ ) + 𝑑( 𝑓ˆ∗ , 𝑟 0) − 𝑑(𝑗, 𝑟) (via triangle inequality)


≤ 𝑑(𝑗, 𝑓ˆ∗ ) + 𝑑( 𝑓ˆ∗ , 𝑟) − 𝑑(𝑗, 𝑟) (since 𝑟 0 is closest to 𝑓ˆ∗ in 𝑆)
≤ 𝑑(𝑗, 𝑓ˆ∗ ) + 𝑑(𝑗, 𝑓ˆ∗ ) (via triangle inequality)
= 2𝑂 𝑗 .

Every other client is assigned to its existing center in 𝑆. Thus the total
increase in the cost is obtained as
    Σ_{j∈N*(f*)} (O_j − A_j) + Σ_{j∈N(r)\N*(f*)} 2O_j ≤ Σ_{j∈N*(f*)} (O_j − A_j) + Σ_{j∈N(r)} 2O_j.


See [11] for a description of the tight example. The example in the conference
version of the paper is buggy.

9.4 𝒌-Means
The 𝑘-Means problem is very popular in practice for a variety of reasons. In terms
of center-based clustering the goal is to choose 𝑘 centers 𝐶 = {𝑐1 , 𝑐2 , . . . , 𝑐 𝑘 }
to minimize Σ_p d(p, C)². In the discrete setting one can obtain constant factor
approximation algorithms via several techniques that follow the approach of
𝑘-Median. We note that the squared distance does not satisfy triangle inequality,
however, it satisfies a relaxed triangle inequality and this is sufficient to generalize
certain LP rounding and local search techniques.
In practice the continuous version is popular for clustering applications.
The input points are in the Euclidean space ℝ^d where d is typically large.
Let X be the set of input points where each x ∈ X is now a d-dimensional
vector. The centers are now allowed to be anywhere in the ambient space. This is
called Euclidean k-Means. Here the squared distance actually helps in a certain
sense. For instance, if k = 1 then the optimum center is simply
(1/|X|) Σ_{x∈X} x — in other words we take the "average". One can see this
by considering the problem of finding the center as an optimization problem:
min_{y∈ℝ^d} Σ_{x∈X} ‖x − y‖₂² = min_{y∈ℝ^d} Σ_{x∈X} Σ_{i=1}^{d} (x_i − y_i)². One can
optimize in each dimension separately, and the optimum in dimension i is
y*_i = (1/|X|) Σ_{x∈X} x_i. Surprisingly, hardness results for Euclidean
k-Means were only established in the last decade. NP-Hardness even when d = 2
is established in [118], and APX-hardness in high dimensions was shown in
[13].

9.5 Lloyd’s algorithm, 𝑫 2 -sampling and 𝒌-Means ++


Llyod’s algorithm is a very well-known and widely used heuristic for the
Euclidean 𝑘-Means problem. It can viewed as a local search algorithm with
an alternating optimization flavor. The algorithm starts with a set of centers
𝑐 1 , 𝑐2 , . . . , 𝑐 𝑘 which are typically chosen randomly in some fashion. The centers
define clusters 𝐶1 , 𝐶2 , . . . , 𝐶 𝑘 based on assigning each point to its nearest center.
That is 𝐶 𝑖 is the set of all points in 𝑋 that are closest to 𝑐 𝑖 (ties broken in some
fashion). Once we have the clusters, via the observation above, one can find the
best center for that cluster by taking the mean of the points in the cluster. That
is, for each 𝐶 𝑖 we find a new center 𝑐 0𝑖 = |𝐶1 | 𝑥∈𝐶 𝑖 𝑥 (if 𝐶 𝑖 is empty we simply
Í
𝑖
set 𝑐 0𝑖 = 𝑐 𝑖 ). Thus we have a new set of centers and we repeat the process until
convergence or some time limit. It is clear that the cost can only improve by
recomputing the centers since we know the optimum center for 𝑘 = 1 is obtained
by using the average of the points.

Lloyds-𝑘-Means(𝑋 , 𝑘 )

1. Seeding: Pick 𝑘 centers 𝑐 1 , 𝑐 2 , . . . , 𝑐 𝑘

2. repeat

A. Find clusters C_1, C_2, ..., C_k where C_i = {x ∈ X | c_i is the closest center to x}

B. cost = Σ_{i=1}^{k} Σ_{x∈C_i} d(x, c_i)².

C. For i = 1 to k do c_i = (1/|C_i|) Σ_{x∈C_i} x

3. Until cost improvement is too small

4. Output clusters 𝐶1 , 𝐶2 , . . . , 𝐶 𝑘
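
The following numpy sketch (the function name lloyd is ours) implements the alternating assignment/re-centering steps above, stopping when the relative improvement in cost becomes small.

import numpy as np

def lloyd(X, centers, iters=100, tol=1e-9):
    # X: (n, d) array of points; centers: (k, d) array of initial centers
    C = np.array(centers, dtype=float)
    prev_cost = np.inf
    for _ in range(iters):
        # assignment step: each point is assigned to its nearest center
        dist2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # (n, k)
        labels = dist2.argmin(axis=1)
        cost = dist2[np.arange(len(X)), labels].sum()
        # re-centering step: each center moves to the mean of its cluster
        for i in range(C.shape[0]):
            pts = X[labels == i]
            if len(pts) > 0:             # keep the old center if the cluster is empty
                C[i] = pts.mean(axis=0)
        if prev_cost - cost < tol * max(prev_cost, 1.0):
            break
        prev_cost = cost
    return C, labels, cost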

There are two issues with the algorithm. The first issue is that the algorithm
can, in the worst-case, run for an exponential number of iterations. This issue
is common for many local search algorithms and as we discussed, it can be

overcome by stopping when cost improvement is too small in a relative sense.


The second issue is the more significant one. The algorithm can get stuck in a
local optimum which can be arbitrarily bad when compared to the optimum
solution. See figure below for a simple example.

Figure 9.4: Example demonstrating that a local optimum for Lloyd’s algorithm
can be arbitrarily bad compared to the optimum clustering. The green clusters
are the optimum ones and the red ones are the local optimum.

𝑫 2 -sampling and 𝒌-Means ++: To overcome the bad local optima it is common
to run the algorithm with random starting centers. Arthur and Vassilvitskii
[151] suggested a specific random sampling scheme to initialize the centers that
is closely related to independent work in [130]. This is called 𝐷 2 sampling.

𝐷 2 -sampling-𝑘-Means ++(𝑋 , 𝑘 )

1. 𝑆 = {𝑐 1 } where 𝑐 1 is chosen uniformly from 𝑋

2. for 𝑖 = 2 to 𝑘 do

A. Choose c_i randomly where P[c_i = x] ∝ d(x, S)²


B. 𝑆 ← 𝑆 ∪ {𝑐 𝑖 }

3. Output 𝑆
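
A minimal numpy sketch of D²-sampling (the function name d2_sample is ours); the key point is that the array of squared distances to the current centers can be maintained incrementally. Its output can be passed as the initial centers to a Lloyd's-style routine, which is exactly k-Means++.

import numpy as np

def d2_sample(X, k, rng=None):
    # first center uniform; each subsequent center x is picked with
    # probability proportional to d(x, S)^2
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)       # current d(x, S)^2 values
    for _ in range(1, k):
        probs = d2 / d2.sum() if d2.sum() > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, p=probs)
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - centers[-1]) ** 2).sum(axis=1))
    return np.array(centers)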

k-Means++ is Lloyd's algorithm initialized with the k centers obtained from D²-sampling.
Theorem 9.8 ([151]). Let 𝑆 be the output of Lloyd’s algorithm initialized with 𝐷 2
sampling. Then E[cost(𝑆)] ≤ 8(ln 𝑘 + 2) OPT. Moreover there are examples showing
that it is no better than 2 ln 𝑘 competitive.

The analysis establishes that the seeding already creates a good approxi-
mation, so in a sense the local search is only refining the initial approximation.
[4, 6] show that if one uses 𝑂(𝑘) centers, initialized according to 𝐷 2 sampling,
then the local optimum will yield a constant factor approximation with constant
probability; note that this is a bicriteria approximation where the number of
centers is a constant factor more than 𝑘 and the cost is being compared with
respect to the optimum cost with 𝑘 centers. The authors also show that there is a
subset of 𝑘 centers from the output of the algorithm that yields a constant factor
approximation. One can then run a discrete optimization algorithm using the
centers. Another interesting result based on 𝐷 2 -sampling ideas yields a PTAS
˜ 2
but the running time is of the form 𝑂(𝑛𝑑2𝑂(𝑘 /𝜖) ) [92]. See [15] for a scalable
version of 𝑘-Means ++.

9.6 Bibliographic Notes


Chapter 10

Introduction to Network Design

(Parts of this chapter are based on previous scribed lecture notes by Nitish
Korula and Sungjin Im.)
Network Design is a broad topic that deals with finding a subgraph 𝐻 of a
given graph 𝐺 = (𝑉 , 𝐸) of minimum cost while satisfying certain requirements.
𝐺 represents an existing network or a constraint over where one can build. The
subgraph 𝐻 is what we want to select/build. Many natural problems can be
viewed this way. For instance the minimum spanning tree (MST) can be viewed
as follows: given an undirected graph 𝐺 = (𝑉 , 𝐸) with edge costs 𝑐 : 𝐸 → ℝ+ ,
find the cheapest connected spanning (spanning means that all vertices are
included) subgraph of 𝐺. The fact that a minimal solution is a tree is clear, but
the point is that the motivation does not explicitly mention the requirement that
the output be a tree.
Connectivity problems are a large part of network design. As we already saw
MST is the most basic one and can be solved in polynomial time. The Steiner Tree
problem is a generalization where we are given a subset 𝑆 of terminals in an edge-
weighted graph 𝐺 = (𝑉 , 𝐸) and the goal is to find a cheapest connected subgraph
that contains all terminals. This is NP-Hard. Traveling Salesman Problem (TSP)
and its variants can also be viewed as network design problems. Network design
is heavily motivated by real-world problems in telecommunication networks and
those problems combine aspects of connectivity and routing and in this context
there are several problems related to buy-at-bulk network design, fixed-charge
flow problems etc.
Graph theory plays an important role in most network algorithmic questions.
The complexity and nature of the problems vary substantially based on whether
the graph is undirected or directed. To illustrate this consider the Directed Steiner
Tree problem. Here 𝐺 = (𝑉 , 𝐸) is a directed graph with non-negative edge/arc
weights, and we are given a root 𝑟 and a set of terminals 𝑆 ⊆ 𝑉. The goal is to


find a cheapest subgraph 𝐻 of 𝐺 such that 𝑟 has a path to each terminal 𝑡 ∈ 𝑆.


Note that when 𝑆 = 𝑉 the problem is the minimum-cost arborescence problem
and is solvable in polynomial-time. One can see that Directed Steiner Tree is
a generalization of the (undirected) Steiner Tree problem. While Steiner Tree
problem admits a constant factor approximation, it is easy to show that Directed
Steiner Tree is at least as hard as Set Cover. This immediately implies that it is
hard to approximate to within an Ω(log |𝑆|) factor. In fact, substantial technical
work has shown that it is in fact harder than Set Cover [76], while we only have
a quasi-polynomial time algorithm that gives a poly-logarithmic approximation
[34]. In even slightly more general settings, the directed graph problems get
much harder when compared to their undirected graph counterparts. This
phenomenon is generally true in many graph-related problems, and hence the
literature mostly focuses on undirected graphs. We refer the reader to some
surveys on network design [72, 107, 129].
This chapter will focus on two basic problems which are extensively studied
and describe some simple approximation algorithms for them. Pointers are
provided for sophisticated results including some very recent ones.

10.1 The Steiner Tree Problem


In the Steiner Tree problem, the input is a graph 𝐺(𝑉 , 𝐸), together with a set
of terminals 𝑆 ⊆ 𝑉, and a cost 𝑐(𝑒) for each edge 𝑒 ∈ 𝐸. The goal is to find a
minimum-cost tree that connects all terminals, where the cost of a subgraph is
the sum of the costs of its edges.
The Steiner Tree problem is NP-Hard, and also APX-Hard [22]. The latter
means that there is a constant 𝛿 > 1 such that it is NP-Hard to approximate the
solution to within a ratio of less than 𝛿; it is currently known that it is hard to
approximate the Steiner Tree problem to within a ratio of 95/94 [42].1
Remark 10.1. If |𝑆| = 2 (that is, there are only 2 terminals), an optimal Steiner
Tree is simply a shortest path between these 2 terminals. If 𝑆 = 𝑉 (that is, all
vertices are terminals), an optimal solution is simply a minimum spanning tree
of the input graph. In both these cases, the problem can be solved exactly in
polynomial time.
Remark 10.2. There is a c^k · poly(n)-time algorithm where k is the number of
terminals. Can you figure it out?

1Variants of the Steiner Tree problem, named after Jakob Steiner, have been studied by Fermat,
Weber, and others for centuries. The front cover of the course textbook contains a reproduction of
a letter from Gauss to Schumacher on a Steiner tree question.

Definition 10.1. Given a connected graph G = (V, E) with edge costs, the metric
completion of G is a complete graph H = (V, E′) such that for each u, v ∈ V, the cost of
edge uv in H is the cost of the shortest path in G from u to v. The graph H with these
edge costs is a metric on V, because the edge costs satisfy the triangle inequality:
for all u, v, w, cost(uv) ≤ cost(uw) + cost(wv).

Figure 10.1: On the left, a graph. On the right, its metric completion, with new
edges and modified edge costs in red.

Observation 10.2. To solve the Steiner Tree problem on a graph 𝐺, it suffices to solve
it on the metric completion of 𝐺.

We now look at two approximation algorithms for the Steiner Tree problem.

10.1.1 The MST Algorithm


The following algorithm, with an approximation ratio of (2 − 2/|𝑆|) is due to
[148]:

SteinerMST(𝐺(𝑉 , 𝐸), 𝑆 ⊆ 𝑉):


Let 𝐻(𝑉 , 𝐸0) ← metric completion of 𝐺.
Let 𝑇 ← MST of 𝐻[𝑆].
Output 𝑇.

(Here, we use the notation 𝐻[𝑆] to denote the subgraph of 𝐻 induced by the set
of terminals 𝑆.)
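
A Python sketch of SteinerMST (the function name steiner_mst is ours): the metric completion restricted to the terminals is computed by running Dijkstra from each terminal, and Prim's algorithm is then run on H[S]. Each chosen terminal pair stands for a shortest path in G; the union of these paths (after deleting redundant edges) gives the Steiner tree.

import heapq, math

def steiner_mst(n, edges, terminals):
    # n vertices 0..n-1, edges: list of (u, v, cost), terminals: list of vertices
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append((v, c)); adj[v].append((u, c))

    def dijkstra(s):                      # shortest paths from s in G
        dist = [math.inf] * n
        dist[s] = 0
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > dist[u]:
                continue
            for v, c in adj[u]:
                if du + c < dist[v]:
                    dist[v] = du + c
                    heapq.heappush(pq, (dist[v], v))
        return dist

    dist = {t: dijkstra(t) for t in terminals}   # metric completion on terminals
    # Prim's algorithm on H[S]
    S = list(terminals)
    in_tree = {S[0]}
    tree, cost = [], 0.0
    best = {t: (dist[S[0]][t], S[0]) for t in S}
    while len(in_tree) < len(S):
        t = min((x for x in S if x not in in_tree), key=lambda x: best[x][0])
        c, p = best[t]
        tree.append((p, t)); cost += c
        in_tree.add(t)
        for x in S:
            if x not in in_tree and dist[t][x] < best[x][0]:
                best[x] = (dist[t][x], t)
    return tree, cost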

The following lemma is central to the analysis of the algorithm SteinerMST.

Lemma 10.1. For any instance I of Steiner Tree, let H denote the metric completion
of the graph, and S the set of terminals. There exists a spanning tree in H[S] (the
subgraph induced by the terminals) of cost at most 2(1 − 1/|S|) OPT, where OPT is the
cost of an optimal solution to instance I.

Figure 10.2: Illustrating the MST Heuristic for Steiner Tree (panels: Graph G,
Metric Completion H, H[S], Output Tree T).

Before we prove the lemma, we note that if there exists some spanning tree
in H[S] of cost at most 2(1 − 1/|S|) OPT, then the minimum spanning tree has at most
this cost. Therefore, Lemma 10.1 implies that the algorithm SteinerMST is a
2(1 − 1/|S|)-approximation for the Steiner Tree problem.

Proof of Lemma 10.1. Let T* denote an optimal solution in H to the given
instance, with cost c(T*). Double all the edges of T* to obtain an Eulerian graph,
and fix an Eulerian tour W of this graph. See Fig 10.3. Now, shortcut edges of
W to obtain a tour W′ of the vertices of T* in which each vertex is visited exactly
once. Again, shortcut edges of W′ to eliminate all non-terminals; this gives a
walk W″ that visits each terminal exactly once.

Figure 10.3: Doubling the edges of T* and shortcutting gives a low-cost spanning
tree on the terminals. Top: the optimal tree T* and an Eulerian walk W of the
doubled tree. Bottom: blue edges show the shortcut tour W′ and red edges show
the shortcut walk W″ on the terminals.

It is easy to see that c(W″) ≤ c(W′) ≤ c(W) = 2c(T*), where the inequalities
follow from the fact that by shortcutting, we can only decrease the length of the
walk. (Recall that we are working in the metric completion H.) Now, delete the
heaviest edge of W″ to obtain a path through all the terminals in S, of cost at
most (1 − 1/|S|) c(W″). This path is a spanning tree of the terminals, and contains
only terminals; therefore, there exists a spanning tree in H[S] of cost at most
2(1 − 1/|S|) c(T*). □

A tight example: The following example (Fig. 10.4 below) shows that this analysis
is tight; there are instances of Steiner Tree where the SteinerMST algorithm
finds a tree of cost 2(1 − 1/|S|) OPT. Here, each pair of terminals is connected by an
edge of cost 2, and each terminal is connected to the central non-terminal by an
edge of cost 1. The optimal tree is a star containing the central non-terminal,
with edges to all the terminals; it has cost |S|. However, the only trees in H[S]
are formed by taking |S| − 1 edges of cost 2; they have cost 2(|S| − 1).

Figure 10.4: A tight example for the SteinerMST algorithm (panels: the graph G,
not all edges shown; H[S], not all edges shown; an MST of H[S]).

10.1.2 The Greedy/Online Algorithm


We now describe another simple algorithm for the Steiner Tree problem, due to
[89].

GreedySteiner(𝐺(𝑉 , 𝐸), 𝑆 ⊆ 𝑉):


Let {𝑠 1 , 𝑠2 , . . . 𝑠 |𝑆| } be an arbitrary ordering of the terminals.
Let 𝑇 ← {𝑠 1 }
For (𝑖 from 2 to |𝑆|):
Let 𝑃𝑖 be the shortest path in 𝐺 from 𝑠 𝑖 to 𝑇.
Add 𝑃𝑖 to 𝑇.
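
A Python sketch of GreedySteiner (the function name greedy_steiner is ours): each new terminal is connected by a shortest path computed in the graph where already-bought edges have cost 0.

import heapq, math

def greedy_steiner(n, edges, terminals):
    # n vertices 0..n-1, edges: list of (u, v, cost), terminals: list in arrival order
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append((v, c)); adj[v].append((u, c))
    bought = set()                        # edges already paid for (as frozensets)
    tree_nodes = {terminals[0]}
    total = 0.0
    for s in terminals[1:]:
        # Dijkstra from s, where bought edges cost 0
        dist = [math.inf] * n
        par = [None] * n
        dist[s] = 0
        pq = [(0.0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > dist[u]:
                continue
            for v, c in adj[u]:
                w = 0.0 if frozenset((u, v)) in bought else c
                if du + w < dist[v]:
                    dist[v], par[v] = du + w, u
                    heapq.heappush(pq, (dist[v], v))
        target = min(tree_nodes, key=lambda x: dist[x])
        total += dist[target]             # c(i): the cost paid to connect this terminal
        x = target                        # buy the edges on the path back to s
        while x != s:
            bought.add(frozenset((x, par[x])))
            tree_nodes.add(x)
            x = par[x]
        tree_nodes.add(s)
    return bought, total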

GreedySteiner is a ⌈log₂ |S|⌉-approximation algorithm; here, we prove a
slightly weaker result.

Theorem 10.3. The algorithm GreedySteiner has an approximation ratio of 2H_{|S|} ≈
2 ln |S|, where H_i = Σ_{j=1}^{i} 1/j denotes the i'th harmonic number.

Note that this is an online algorithm; terminals are considered in an arbitrary


order, and when a terminal is considered, it is immediately connected to the
existing tree. Thus, even if the algorithm could not see the entire input at once, but
instead terminals were revealed one at a time and the algorithm had to produce
a Steiner tree at each stage, the algorithm GreedySteiner outputs a tree of cost
no more than 𝑂(log |𝑆|) times the cost of the optimal tree.

To prove Theorem 10.3, we introduce some notation. Let c(i) denote the cost
of the path P_i used in the i'th iteration to connect the terminal s_i to the already
existing tree. Clearly, the total cost of the tree is Σ_{i=1}^{|S|} c(i). Now, let
{i_1, i_2, ..., i_{|S|}} be a permutation of {1, 2, ..., |S|} such that c(i_1) ≥ c(i_2) ≥ ... ≥ c(i_{|S|}).
(That is, relabel the terminals in decreasing order of the cost paid to connect them to
the tree that exists when they are considered by the algorithm.)

Claim 10.1.1. For all 𝑗, the cost 𝑐(𝑖 𝑗 ) is at most 2 OPT/𝑗, where OPT is the cost of an
optimal solution to the given instance.

Proof. Suppose by way of contradiction this were not true; since 𝑠 𝑖 𝑗 is the terminal
with 𝑗th highest cost of connection, there must be 𝑗 terminals that each pay more
than 2 OPT/𝑗 to connect to the tree that exists when they are considered. Let
𝑆0 = {𝑠 𝑖1 , 𝑠 𝑖2 , . . . 𝑠 𝑖 𝑗 } denote this set of terminals.
We argue that no two terminals in 𝑆0 ∪ {𝑠 1 } are within distance 2 OPT/𝑗 of
each other. If some pair 𝑥, 𝑦 were within this distance, one of these terminals
(say 𝑦) must be considered later by the algorithm than the other. But then the
cost of connecting 𝑦 to the already existing tree (which includes 𝑥) must be at
most 2 OPT/𝑗, and we have a contradiction.
Therefore, the minimum distance between any two terminals in 𝑆0 ∪ {𝑠 1 }
must be greater than 2 OPT/𝑗. Since there must be 𝑗 edges in any MST of these
terminals, an MST must have cost greater than 2 OPT. But the MST of a subset
of terminals cannot have cost more than 2 OPT, exactly as argued in the proof of
Lemma 10.1. Therefore, we obtain a contradiction. 
Given this claim, it is easy to prove Theorem 10.3:

    Σ_{i=1}^{|S|} c(i) = Σ_{j=1}^{|S|} c(i_j) ≤ Σ_{j=1}^{|S|} 2 OPT/j = 2 OPT Σ_{j=1}^{|S|} 1/j = 2H_{|S|} · OPT.

Question 10.2. Give an example of a graph and an ordering of terminals such


that the output of the Greedy algorithm is Ω(log |𝑆|) OPT.

Remark 10.3. We emphasize again that the analysis above holds for every ordering
of the terminals. A natural variant might be to adaptively order the terminals so
that in each iteration 𝑖 , the algorithm picks the terminal 𝑠 𝑖 to be the one closest
to the already existing tree 𝑇 built in the first 𝑖 iterations. Do you see that this
is equivalent to using the MST Heuristic with Prim’s algorithm for MST? This
illustrates the need to be careful in the design and analysis of heuristics.

10.1.3 LP Relaxation
A natural LP relaxation for the Steiner Tree problem is the following. For each
edge e ∈ E we have an indicator variable x_e to decide if we choose to include
e in our solution. The chosen edges should ensure that no two terminals are
separated. We write this via a constraint Σ_{e∈δ(A)} x_e ≥ 1 for any set A ⊂ V such
that A contains a terminal and V \ A contains a terminal.

    min Σ_{e∈E} c_e x_e
        Σ_{e∈δ(A)} x_e ≥ 1       A ∩ S ≠ ∅, (V − A) ∩ S ≠ ∅
        x_e ≥ 0
Note that the preceding LP has an exponential number of constraints.
However, there is a polynomial-time separation oracle. Given 𝑥 it is feasible for
the LP iff the 𝑠-𝑡 cut value is at least 1 between any two terminals 𝑠, 𝑡 ∈ 𝑆 with
edge capacities given by 𝑥. How good is this LP relaxation? We will see later
that there is a 2(1 − 1/|𝑆|)-approximation via this LP. Interestingly the LP has an
integrality gap of 2(1 − 1/|𝑆|) even if 𝑆 = 𝑉 in which case we want to solve the
MST problem! Despite the weakness of this cut based LP for these simple cases,
we will see later that it generalizes nicely for higher connectivity problems and
one can derive a 2-approximation even for those much more difficult problems.

10.1.4 Other Results on Steiner Trees


The 2-approximation algorithm using the MST Heuristic is not the best approxi-
mation algorithm for the Steiner Tree problem currently known. Some other
results on this problem are listed below.

1. The first algorithm to obtain a ratio better than 2 was due to
Alexander Zelikovsky [160]; the approximation ratio of this algorithm was
11/6 ≈ 1.83. This was improved to 1 + (ln 3)/2 ≈ 1.55 [135], which is based on
a local-search-based improvement starting with the MST heuristic, and
follows the original approach of Zelikovsky.

2. Byrka et al. gave an algorithm with an approximation ratio of ln 4 + ε ≈ 1.39
[26], which is currently the best known for this problem. This was originally
based on a combination of techniques; subsequently there is an LP-based
proof [65] that achieves the same approximation for the so-called
Hypergraphic LP relaxation.

3. The bidirected cut LP relaxation for the Steiner Tree problem was proposed by [52];
it has an integrality gap of at most 2(1 − 1/|S|), but it is conjectured that
the gap is smaller. No algorithm is currently known that exploits this
LP relaxation to obtain an approximation ratio better than that of the
SteinerMST algorithm. Though the true integrality gap is not known,
there are examples that show it is at least 6/5 = 1.2 [153].

4. For many applications, the vertices can be modeled as points in the plane,
where the distance between them is simply the Euclidean distance. The
MST-based algorithm performs fairly well on such instances; it has an
approximation ratio of 2/√3 ≈ 1.15 [51]. An example which achieves
this bound is three points at the corners of an equilateral triangle, say
of side-length 1; the MST heuristic outputs a tree of cost 2 (two sides of
the triangle) while the optimum solution is to connect the three points
to a Steiner vertex at the circumcenter of the triangle. One can
do better still for instances in the plane (or in any Euclidean space of
small dimension); for any ε > 0, there is a (1 + ε)-approximation algorithm
that runs in polynomial time [10]. Such an approximation scheme is also
known for planar graphs [23] and more generally bounded-genus graphs.

10.2 The Traveling Salesperson Problem (TSP)


10.2.1 TSP in Undirected Graphs
In the Traveling Salesperson Problem (TSP), we are given an undirected graph
𝐺 = (𝑉 , 𝐸) and cost 𝑐(𝑒) > 0 for each edge 𝑒 ∈ 𝐸. Our goal is to find a Hamiltonian
cycle with minimum cost. A cycle is said to be Hamiltonian if it visits every
vertex in 𝑉 exactly once.
TSP is known to be NP-Hard. Moreover, we cannot hope to find a good
approximation algorithm for it unless 𝑃 = 𝑁𝑃. This is because if one can give a
good approximation solution to TSP in polynomial time, then we can exactly
solve the NP-Complete Hamiltonian cycle problem (HAM) in polynomial time,
which is not possible unless 𝑃 = 𝑁𝑃. Recall that HAM is a decision problem:
given a graph 𝐺 = (𝑉 , 𝐸), does 𝐺 have a Hamiltonian cycle?

Theorem 10.4 ([136]). Let 𝛼 : ℕ → ℕ be a polynomial-time computable function.


Unless 𝑃 = 𝑁𝑃 there is no polynomial-time algorithm that on every instance 𝐼 of TSP
outputs a solution of cost at most 𝛼(|𝐼 |) · OPT(𝐼).

Proof. For the sake of contradiction, suppose we have an approximation algo-


rithm 𝒜 for TSP with an approximation ratio 𝛼(|𝐼 |). We show a contradiction
by showing that using 𝒜, we can exactly solve HAM in polynomial time. Let
𝐺 = (𝑉 , 𝐸) be the given instance of HAM. We create a new graph 𝐻 = (𝑉 , 𝐸0)
with cost 𝑐(𝑒) for each 𝑒 ∈ 𝐸0 such that 𝑐(𝑒) = 1 if 𝑒 ∈ 𝐸, otherwise 𝑐(𝑒) = 𝐵,
where 𝐵 = 𝑛𝛼(𝑛) + 2 and 𝑛 = |𝑉 |. Note that this is a polynomial-time reduction
since 𝛼 is a polynomial-time computable function.
We observe that if 𝐺 has a Hamiltonian cycle, OPT = 𝑛, otherwise OPT ≥
𝑛 − 1 + 𝐵 ≥ 𝑛𝛼(𝑛) + 1. (Here, OPT denotes the cost of an optimal TSP solution
in 𝐻.) Note that there is a “gap” between when 𝐺 has a Hamiltonian cycle and
when it does not. Thus, if 𝒜 has an approximation ratio of 𝛼(𝑛), we can tell
whether 𝐺 has a Hamiltonian cycle or not: Simply run 𝒜 on the graph 𝐻; if
𝒜 returns a TSP tour in 𝐻 of cost at most 𝛼(𝑛)𝑛 output that 𝐺 has a Hamilton
cycle, otherwise output that 𝐺 has no Hamilton cycle. We leave it as an exercise
to formally verify that this would solve HAM in polynomial time. 

Since we cannot even approximate the general TSP problem, we consider


more tractable variants.

• Metric-TSP: In Metric-TSP, the instance is a complete graph 𝐺 = (𝑉 , 𝐸)


with cost 𝑐(𝑒) on 𝑒 ∈ 𝐸, where 𝑐 satisfies the triangle inequality, i.e.
𝑐(𝑢𝑤) ≤ 𝑐(𝑢𝑣) + 𝑐(𝑣𝑤) for any 𝑢, 𝑣, 𝑤 ∈ 𝑉.

• TSP-R: TSP with repetitions of vertices allowed. The input is a graph


𝐺 = (𝑉 , 𝐸) with non-negative edge costs as in TSP. Now we seek a
minimum-cost walk that visits each vertex at least once and returns to the
starting vertex.

Exercise 10.1. Show that an 𝛼-approximation for Metric-TSP implies an 𝛼-


approximation for TSP-R and vice-versa.

We focus on Metric-TSP for the rest of this section. We first consider a natural
greedy approach, the Nearest Neighbor Heuristic (NNH).

Nearest Neighbor Heuristic(𝐺(𝑉 , 𝐸), 𝑐 : 𝐸 → ℛ + ):


Start at an arbitrary vertex 𝑠,
While (there are unvisited vertices)
From the current vertex 𝑢, go to the nearest unvisited vertex 𝑣.
Return to 𝑠.
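
A minimal sketch of the Nearest Neighbor Heuristic (the function name nearest_neighbor_tour is ours), assuming the metric is given as a distance matrix.

def nearest_neighbor_tour(d, start=0):
    # d[u][v]: symmetric metric distance; returns the tour and its cost
    n = len(d)
    tour, visited = [start], {start}
    while len(tour) < n:
        u = tour[-1]
        v = min((x for x in range(n) if x not in visited), key=lambda x: d[u][x])
        tour.append(v); visited.add(v)
    cost = sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, cost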

Exercise 10.2. 1. Prove that NNH is an 𝑂(log 𝑛)-approximation algorithm.


(Hint: Think back to the proof of the 2𝐻|𝑆| -approximation for the Greedy
Steiner Tree Algorithm.)

2. NNH is not an 𝑂(1)-approximation algorithm; can you find an example


to show this? In fact one can show a lower bound of Ω(log 𝑛) on the
approximation-ratio achieved by NNH.

There are constant-factor approximation algorithms for TSP; we now consider


an MST-based algorithm. See Fig 10.5.

TSP-MST(𝐺(𝑉 , 𝐸), 𝑐 : 𝐸 → ℛ + ):
Compute an MST 𝑇 of 𝐺.
Obtain an Eulerian graph 𝐻 = 2𝑇 by doubling edges of 𝑇
An Eulerian tour of 2𝑇 gives a tour in 𝐺.
Obtain a Hamiltonian cycle by shortcutting the tour.

Figure 10.5: MST Based Heuristic
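
A Python sketch of TSP-MST (the function name tsp_mst is ours): the Eulerian tour of the doubled MST is shortcut by listing the vertices in a DFS preorder of the MST.

def tsp_mst(d):
    # d[u][v]: metric distance matrix
    n = len(d)
    # Prim's algorithm for an MST of the complete graph
    in_tree = {0}
    children = {v: [] for v in range(n)}
    best = {v: (d[0][v], 0) for v in range(1, n)}
    while len(in_tree) < n:
        v = min((x for x in range(n) if x not in in_tree), key=lambda x: best[x][0])
        c, p = best[v]
        children[p].append(v)
        in_tree.add(v)
        for x in range(n):
            if x not in in_tree and d[v][x] < best[x][0]:
                best[x] = (d[v][x], v)
    # preorder DFS = shortcut of the Eulerian tour of the doubled MST
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    cost = sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, cost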

Theorem 10.5. The MST heuristic (TSP-MST) is a 2-approximation algorithm.

Proof. We have c(T) = Σ_{e∈E(T)} c(e) ≤ OPT, since we can get a spanning tree in
G by removing any edge from the optimal Hamiltonian cycle, and T is an MST.
Thus c(H) = 2c(T) ≤ 2 OPT. Also, shortcutting only decreases the cost. □
We observe that the loss of a factor 2 in the approximation ratio is due to
doubling edges; we did this in order to obtain an Eulerian tour. But any graph

in which all vertices have even degree is Eulerian, so one can still get an Eulerian
tour by adding edges only between odd degree vertices in 𝑇. Christofides
Heuristic [43] exploits this and improves the approximation ratio from 2 to 3/2.
See Fig 10.6 for a snapshot.

Figure 10.6: Christofides Heuristic

Christofides Heuristic(𝐺(𝑉 , 𝐸), 𝑐 : 𝐸 → ℛ + ):


Compute an MST 𝑇 of 𝐺.
Let 𝑆 be the vertices of odd degree in 𝑇. (Note: |𝑆| is even)
Find a minimum cost matching 𝑀 on 𝑆 in 𝐺
Add 𝑀 to 𝑇 to obtain an Eulerian graph 𝐻.
Compute an Eulerian tour of 𝐻.
Obtain a Hamilton cycle by shortcutting the tour.

Theorem 10.6. Christofides Heuristic is a 1.5-approximation algorithm.


Proof. The main part of the proof is to show that 𝑐(𝑀) ≤ .5 OPT. Suppose
that 𝑐(𝑀) ≤ .5 OPT. Then, since the solution of Christofides Heuristic is
obtained by shortcutting the Eulerian tour on 𝐻, its cost is no more than
c(H) = c(T) + c(M) ≤ 1.5 OPT. (Refer to the proof of Theorem 10.5 for the fact
that c(T) ≤ OPT.) Therefore we focus on proving that c(M) ≤ .5 OPT.
Let 𝐹 ∗ be an optimal tour in 𝐺 of cost OPT; since we have a metric-instance we
can assume without loss of generality that 𝐹 ∗ is a Hamiltonian cycle. We obtain
a Hamiltonian cycle 𝐹𝑆∗ in the graph 𝐺[𝑆] by short-cutting the portions of 𝐹 ∗

that touch the vertices 𝑉 \ 𝑆. By the metric-condition, 𝑐(𝐹𝑆∗ ) ≤ 𝑐(𝐹 ∗ ) = OPT. Let
𝑆 = {𝑣 1 , 𝑣2 , . . . , 𝑣 |𝑆| }. Without loss of generality 𝐹𝑆∗ visits the vertices of 𝑆 in the
order 𝑣1 , 𝑣2 , . . . , 𝑣 |𝑆| . Recall that |𝑆| is even. Let 𝑀1 = {𝑣 1 𝑣 2 , 𝑣3 𝑣 4 , ...𝑣 |𝑆|−1 𝑣 |𝑆| }
and 𝑀2 = {𝑣2 𝑣3 , 𝑣4 𝑣5 , ...𝑣 |𝑆| 𝑣 1 }. Note that both 𝑀1 and 𝑀2 are matchings,
and 𝑐(𝑀1 ) + 𝑐(𝑀2 ) = 𝑐(𝐹𝑆∗ ) ≤ OPT. We can assume without loss of generality
that 𝑐(𝑀1 ) ≤ 𝑐(𝑀2 ). Then we have 𝑐(𝑀1 ) ≤ .5 OPT. Also we know that
𝑐(𝑀) ≤ 𝑐(𝑀1 ), since 𝑀 is a minimum cost matching on 𝑆 in 𝐺[𝑆]. Hence we
have 𝑐(𝑀) ≤ 𝑐(𝑀1 ) ≤ .5 OPT, which completes the proof. 

10.2.2 LP Relaxation
We describe a well-known LP relaxation for TSP called the Subtour-Elimination
LP and sometimes also called the Held-Karp LP relaxation although the formu-
lation was first given by Dantzig, Fulkerson and Johnson [47]. The LP relaxation
has a variable 𝑥 𝑒 for each edge 𝑒 ∈ 𝐸. Note that the TSP solution is a Hamilton
Cycle of least cost. A Hamilton cycle can be viewed as a connected subgraph of 𝐺
with degree 2 at each vertex. Thus we write the degree constraints and also the
cut constraints.

    min Σ_{e∈E} c_e x_e
        Σ_{e∈δ(v)} x_e = 2        ∀v ∈ V
        Σ_{e∈δ(S)} x_e ≥ 2        ∅ ⊊ S ⊊ V
        x_e ∈ [0, 1]              ∀e ∈ E

The relaxation is not useful for a general graph since we saw that TSP is not
approximable. To obtain a relaxation for Metric-TSP we apply the above to the
metric completion of the graph 𝐺.
Another alternative is to consider the following LP which views the problem
as finding a connected Eulerian multigraph in the underlying graph G. In other
words we are allowed to take an integer number of copies of each edge with the
constraint that the degree of each vertex is even and the graph is connected. It
is not easy to write the even degree condition since we do not have an a priori
bound. Instead one can write the following simpler LP, and interestingly one
can show that its optimum value is the same as that of the preceding relaxation
(when applied to the metric completion).

    min Σ_{e∈E} c_e x_e
        Σ_{e∈δ(S)} x_e ≥ 2        ∅ ⊊ S ⊊ V
        x_e ≥ 0                   ∀e ∈ E

Wolsey showed that the 3/2-approximation of Christofides can be analyzed


with respect to the LP above. Hence the integrality gap of the LP is at most 3/2
for Metric-TSP. Is it better? There is a well-known example which shows that the
gap is at least 4/3. The 4/3 conjecture states that the worst-case integrality gap
is at most 4/3. This has been an unsolved problem for many decades, and it is
only very recently that the 3/2 barrier was broken.
Remarks:

1. In practice, local search heuristics are widely used and they perform
extremely well. A popular heuristic 2-Opt is to swap pairs from 𝑥 𝑦, 𝑧𝑤 to
𝑥𝑧, 𝑦𝑤 or 𝑥𝑤, 𝑦𝑧, if it improves the tour.

2. It was a major open problem to improve the approximation ratio of 3/2 for
Metric-TSP; it is conjectured that the Held-Karp LP relaxation [81] gives a
ratio of 4/3. In a breakthrough, Oveis-Gharan, Saberi and Singh [64] obtained
a (3/2 − δ)-approximation for some small but fixed δ > 0 for the important
special case where c(e) = 1 for each edge e (called Graphic-TSP). Very
recently the 3/2 ratio was finally broken for the general case [99].

10.2.3 TSP in Directed Graphs


In this subsection, we consider TSP in directed graphs. As in undirected TSP, we
need to relax the problem conditions to get any positive result. Again, allowing
each vertex to be visited multiple times is equivalent to imposing the asymmetric
triangle inequality 𝑐(𝑢, 𝑤) ≤ 𝑐(𝑢, 𝑣) + 𝑐(𝑣, 𝑤) for all 𝑢, 𝑣, 𝑤. This is called the
asymmetric TSP (ATSP) problem. We are given a directed graph 𝐺 = (𝑉 , 𝐴)
with cost 𝑐(𝑎) > 0 for each arc 𝑎 ∈ 𝐴 and our goal is to find a closed walk visiting
all vertices. Note that we are allowed to visit each vertex multiple times, as we
are looking for a walk, not a cycle. For an example of a valid Hamiltonian walk,
see Fig 10.7.
The MST-based heuristic for the undirected case has no meaningful generalization
to the directed setting. This is because costs on edges are not symmetric.
Hence, we need another approach. The Cycle Shrinking Algorithm repeatedly
finds a min-cost cycle cover and shrinks cycles, combining the cycle covers

Figure 10.7: A directed graph and a valid Hamiltonian walk

found. Recall that a cycle cover is a collection of disjoint cycles covering all
vertices. It is known that finding a minimum-cost cycle cover can be done in
polynomial time (see Homework 0). The Cycle Shrinking Algorithm achieves a
log2 𝑛 approximation ratio.

Cycle Shrinking Algorithm(𝐺(𝑉 , 𝐴), 𝑐 : 𝐴 → ℛ + ):


Transform 𝐺 s.t. 𝐺 is complete and satisfies 𝑐(𝑢, 𝑣) + 𝑐(𝑣, 𝑤) ≥ 𝑐(𝑢, 𝑤) for ∀𝑢, 𝑣, 𝑤
If |𝑉 | = 1 output the trivial cycle consisting of the single node
Find a minimum cost cycle cover with cycles 𝐶1 , . . . , 𝐶 𝑘
From each 𝐶 𝑖 pick an arbitrary proxy node 𝑣 𝑖
Recursively solve problem on 𝐺[{𝑣1 , . . . , 𝑣 𝑘 }] to obtain a solution 𝐶
C′ = C ∪ C_1 ∪ C_2 ∪ ... ∪ C_k is an Eulerian graph.
Shortcut C′ to obtain a cycle on V and output it.

For a snapshot of the Cycle Shrinking Algorithm, see Fig 10.8.

Figure 10.8: A snapshot of Cycle Shrinking Algorithm. To the left, a cycle cover
𝒞1 . In the center, blue vertices indicate proxy nodes, and a cycle cover 𝒞2 is
found on the proxy nodes. To the right, pink vertices are new proxy nodes, and
a cycle cover 𝒞3 is found on the new proxy nodes.
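
A Python sketch of the Cycle Shrinking Algorithm (the function name cycle_shrinking_atsp is ours), assuming scipy is available: a minimum-cost cycle cover is obtained by solving an assignment problem in which a vertex may not be its own successor. The input matrix is assumed to already satisfy the asymmetric triangle inequality.

import numpy as np
from scipy.optimize import linear_sum_assignment

def cycle_shrinking_atsp(d):
    # d: n x n asymmetric distance matrix with d[i][i] = 0
    nodes = list(range(len(d)))

    def solve(sub):
        if len(sub) == 1:
            return [sub[0]]
        m = len(sub)
        cost = np.array([[d[sub[i]][sub[j]] for j in range(m)] for i in range(m)], dtype=float)
        big = cost.sum() + 1.0
        np.fill_diagonal(cost, big)              # forbid self-loops in the cycle cover
        rows, cols = linear_sum_assignment(cost) # min-cost cycle cover as an assignment
        succ = {sub[r]: sub[c] for r, c in zip(rows, cols)}
        # extract the cycles of the cover; the first vertex of each cycle is its proxy
        cycles, seen = [], set()
        for v in sub:
            if v in seen:
                continue
            cyc, x = [], v
            while x not in seen:
                seen.add(x); cyc.append(x); x = succ[x]
            cycles.append(cyc)
        proxies = [c[0] for c in cycles]
        order = solve(proxies)                   # recursive tour on the proxy nodes
        # splice each cycle into the tour at its proxy (a shortcut of the Eulerian graph)
        tour = []
        for p in order:
            cyc = next(c for c in cycles if c[0] == p)
            tour.extend(cyc)
        return tour

    tour = solve(nodes)
    cost = sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))
    return tour, cost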

Lemma 10.3. Let the cost of edges in 𝐺 satisfy the asymmetric triangle inequality.
Then for any 𝑆 ⊆ 𝑉, the cost of an optimal TSP tour in 𝐺[𝑆] is at most the cost of an
optimal TSP tour in 𝐺.

Proof. Since G satisfies the triangle inequality there is an optimal TSP tour
in G that is a Hamiltonian cycle C. Given any S ⊆ V the cycle C can be shortcut
to produce another cycle C′ that visits only S and whose cost is at most the cost
of C. □
Lemma 10.4. The cost of a min-cost cycle-cover is at most the cost of an optimal TSP
tour.

Proof. An optimal TSP tour is a cycle cover. 


Theorem 10.7. The Cycle Shrinking Algorithm is a log2 𝑛-approximation for ATSP.

Proof. We prove the above by induction on 𝑛 the number of nodes in 𝐺. It is


easy to see that the algorithm finds an optimal solution if 𝑛 ≤ 2. The main
observation is that the number of cycles in the cycle-cover is at most b𝑛/2c; this
follows from the fact that each cycle in the cover has to have at least 2 nodes
and they are disjoint. Thus 𝑘 ≤ b𝑛/2c. Let OPT(𝑆) denote the cost of an optimal
solution in 𝐺[𝑆]. From Lemma 10.3 we have that OPT(𝑆) ≤ OPT(𝑉) = OPT for
all 𝑆 ⊆ 𝑉. The algorithm recurses on the proxy nodes 𝑆 = {𝑣 1 , . . . , 𝑣 𝑘 }. Note
that |𝑆| < 𝑛, and by induction, the cost of the cycle 𝐶 output by the recursive
call is at most (log2 |𝑆|) OPT(𝑆) ≤ (log2 |𝑆|) OPT.
The algorithm outputs 𝐶 0 whose cost is at most the cost of 𝐶 plus the
cost of the cycle-cover computed in 𝐺. The cost of the cycle cover is at most
OPT (Lemma 10.4). Hence the cost of 𝐶 0 is at most (log2 |𝑆|) OPT + OPT ≤
(log2 𝑛/2) OPT + OPT ≤ (log2 𝑛) OPT; this finishes the inductive proof. 

10.2.4 LP Relaxation
The LP relaxation for ATSP is given below. For each arc 𝑒 ∈ 𝐸 we have a variable
𝑥 𝑒 . We view the problem as finding a connected Eulerian multi-graph in the
support of 𝐺. That is, we can choose each edge 𝑒 an integer number of times.
We impose Eulerian constraint at each vertex by requiring the in-degree to be
equal to the out-degree. We impose connectivity constraint by ensuring that at
least one arc leaves each set of vertices 𝑆 which is not 𝑉 or ∅.

    min Σ_{e∈E} c_e x_e
        Σ_{e∈δ⁺(v)} x_e − Σ_{e∈δ⁻(v)} x_e = 0     ∀v ∈ V
        Σ_{e∈δ⁺(S)} x_e ≥ 1                       ∅ ⊊ S ⊊ V
        x_e ≥ 0                                    ∀e ∈ E

Remarks:

1. It has remained an open problem for more than 25 years whether there
exists a constant factor approximation for ATSP. Asadpour et al [12] have
obtained an 𝑂(log 𝑛/log log 𝑛)-approximation for ATSP using some very
novel ideas and a well-known LP relaxation.

2. There is now an 𝑂(1)-approximation for ATSP with initial breakthrough


work by Svensson for a special case [144] and then followed up by Svensson,
Tarnawski and Vegh [146] . The current best constant is 22 + 𝜖 due to [150].
The algorithm is highly non-trivial and is based on the LP relaxation.
Chapter 11

Steiner Forest Problem

We discuss a primal-dual based 2-approximation for the Steiner Forest problem1.


In the Steiner Forest problem there is a graph 𝐺 = (𝑉 , 𝐸) where each edge 𝑒 ∈ 𝐸
has a cost 𝑐 𝑒 ∈ ℝ. We are given 𝑘 pairs of vertices (𝑠 1 , 𝑡1 ), (𝑠 2 , 𝑡2 ), . . . (𝑠 𝑘 , 𝑡 𝑘 ) ∈
𝑉 × 𝑉. The goal is to find the minimum cost set of edges 𝐹 such that in the graph
(𝑉 , 𝐹) the vertices 𝑠 𝑖 and 𝑡 𝑖 are in the same connected component for 1 ≤ 𝑖 ≤ 𝑘.
We refer to the set {𝑠 1 , . . . , 𝑠 𝑘 , 𝑡1 , . . . , 𝑡 𝑘 } as terminals. Notice that the graph
(𝑉 , 𝐹) can contain multiple connected components.
One can see that Steiner Forest generalizes the Steiner Tree problem. It is,
however, much less easy to obtain a constant factor approximation for Steiner
Forest. A natural greedy heuristic, similar to that for the online Steiner Tree
problem, is the following. We order the pairs in some arbitrary fashion, say
as 1 to k without loss of generality. We maintain a forest F of edges that we
have already bought. When considering pair i we find a shortest path P from
s_i to t_i in the graph in which we reduce the cost of each edge in F to 0 (since
we already paid for them). We add the new edges in P to F. Although simple
to describe, the algorithm is not straightforward to analyze. In fact the best
bound we have on the performance of this algorithm is O(log² k) (note that the
corresponding bound for Steiner Tree is O(log k)), while the lower bound
on its performance is only Ω(log k). The analysis of the upper bound requires
appealing to certain extremal results [14], and closing the gap between the upper
and lower bounds on the analysis of this simplest of greedy algorithms is a very
interesting open problem.
The first constant factor approximation for Steiner Forest is due to the
influential work of Agrawal, Klein and Ravi [5], who gave a primal-dual 2-
approximation which is still the best known. Their primal-dual algorithm has
since been generalized for a wide variety of network design problems via the
1Parts of this chapter are based on past scribed lecture notes by Ben Moseley from 2009.


work of Goemans and Williamson (see their survey [66]) and several others.
Steiner Forest has the advantage that one can visualize the algorithm and
analysis more easily when compared to the more abstract settings that we will
see shortly.
We now describe an integer programming formulation for the problem. In
the IP we will have a variable 𝑥 𝑒 for each edge 𝑒 ∈ 𝐸 such that 𝑥 𝑒 is 1 if and only
if 𝑒 is part of the solution. Let the set 𝒮 be the collection of all sets 𝑆 ⊂ 𝑉 such
that |𝑆 ∩ {𝑠 𝑖 , 𝑡 𝑖 }| = 1 for some 1 ≤ 𝑖 ≤ 𝑘. For a set 𝑆 ⊂ 𝑉 let 𝛿(𝑆) denote the set
of edges crossing the cut (𝑆, 𝑉 \ 𝑆). The IP can be written as the following.

    min Σ_{e∈E} c_e x_e
    such that Σ_{e∈δ(S)} x_e ≥ 1     ∀S ∈ 𝒮
              x_e ∈ {0, 1}           ∀e ∈ E

We can obtain an LP-relaxation by changing the constraint that 𝑥 𝑒 ∈ {0, 1} to


𝑥 𝑒 ≥ 0. The dual of the LP-relaxation can be written as the following.

    max Σ_{S∈𝒮} y_S
    such that Σ_{S: e∈δ(S)} y_S ≤ c_e     ∀e ∈ E
              y_S ≥ 0                     ∀S ∈ 𝒮

Before we continue, some definitions will be stated which will help to define
our algorithm for the problem.

Definition 11.1. Given a set of edges X ⊆ E, a set S ∈ 𝒮 is violated with respect to
X if δ(S) ∩ X = ∅. In other words, no edge of X crosses the cut (S, V \ S), so S still
separates some pair even after the edge set X is included.

Definition 11.2. Given a set of edges 𝑋 ⊆ 𝐸, a set 𝑆 ∈ 𝒮 is minimally violated with


respect to 𝑋 if 𝑆 is violated with respect to 𝑋 and there is no 𝑆0 ⊂ 𝑆 that is also violated
with respect to 𝑋.

Next we show that any two minimally violated sets are disjoint.

Claim 11.0.1. ∀𝑋 ⊆ 𝐸 if 𝑆 and 𝑆0 are minimally violated sets then 𝑆 ∩ 𝑆0 = ∅, i.e. 𝑆


and 𝑆0 are disjoint.

In fact we will prove the following claim which implies the preceding one.

Claim 11.0.2. Let 𝑋 ⊆ 𝐸. The minimally violated sets with respect to 𝑋 are the connected components of the graph (𝑉, 𝑋) that are violated with respect to 𝑋.

Proof. Consider a minimal violated set 𝑆. We may assume the set 𝑆 contains 𝑠 𝑖 but not 𝑡 𝑖 for some 𝑖. If 𝑆 is not a connected component of (𝑉, 𝑋) then there must be some connected component 𝑆′ of 𝐺[𝑆] that contains 𝑠 𝑖 . But 𝛿 𝑋 (𝑆′) = ∅, and hence 𝑆′ is violated; this contradicts the fact that 𝑆 is minimal. Therefore, if a set 𝑆 is a minimal violated set then it must be connected in the graph (𝑉, 𝑋).
Now suppose that 𝑆 is a connected component of (𝑉, 𝑋); it is easy to see that no proper subset of 𝑆 can be violated since some edge of 𝑋 will cross any such set. Thus, if 𝑆 is a violated set then it is a minimal violated set.
Thus, the minimal violated sets with respect to 𝑋 are the connected components of the graph (𝑉, 𝑋) that are themselves violated sets. It follows that any two distinct minimal violated sets are disjoint. □
The primal-dual algorithm for Steiner Forest is described below.

SteinerForest:
𝐹←∅
while 𝐹 is not feasible
Let 𝐶1 , 𝐶2 , . . . , 𝐶 ℎ be minimally violated sets with respect to 𝐹
Raise 𝑦𝐶 𝑖 for 1 ≤ 𝑖 ≤ ℎ uniformly until some edge 𝑒 becomes tight
𝐹←𝐹+𝑒
𝑥𝑒 = 1
Output 𝐹0 = {𝑒 ∈ 𝐹 | 𝐹 − 𝑒 is not feasible}
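The growth phase can be made concrete with a small amount of bookkeeping. The Python sketch below is our own illustration (not from the notes): for every unbought edge it tracks the total dual load ∑_{𝑆: 𝑒∈𝛿(𝑆)} 𝑦_𝑆 accumulated so far, raises the duals of all active components uniformly until some edge becomes tight, and finally prunes exactly as in the last line of the algorithm. It ignores ties and floating-point issues and assumes a feasible instance.

    def steiner_forest_primal_dual(n, edges, pairs):
        """edges: dict {frozenset((u, v)): cost}; pairs: list of (s_i, t_i)."""

        def find_fn(F):
            parent = list(range(n))
            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x
            for e in F:
                u, v = tuple(e)
                parent[find(u)] = find(v)
            return find

        def active_components(F):
            find = find_fn(F)
            act = set()
            for s, t in pairs:
                if find(s) != find(t):
                    act.add(find(s)); act.add(find(t))
            return act, find

        def feasible(F):
            find = find_fn(F)
            return all(find(s) == find(t) for s, t in pairs)

        F = set()
        load = {e: 0.0 for e in edges}      # load[e] = sum of y_S over S with e in delta(S)
        act, find = active_components(F)
        while act:
            best_e, best_delta = None, None
            for e, c in edges.items():
                if e in F:
                    continue
                u, v = tuple(e)
                a = (find(u) in act) + (find(v) in act)
                if a == 0 or find(u) == find(v):
                    continue                 # this edge receives no dual growth now
                delta = (c - load[e]) / a
                if best_delta is None or delta < best_delta:
                    best_e, best_delta = e, delta
            for e in edges:                  # raise all active duals uniformly by best_delta
                if e in F:
                    continue
                u, v = tuple(e)
                if find(u) != find(v):
                    load[e] += best_delta * ((find(u) in act) + (find(v) in act))
            F.add(best_e)                    # best_e is now tight
            act, find = active_components(F)
        # prune: keep only edges whose removal destroys feasibility
        return {e for e in F if not feasible(F - {e})}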

The first thing to notice about the algorithm above is that it is closely related
to our solution to the Vertex Cover problem, however, there are two main
differences. In the Vertex Cover algorithm we raised the dual variables for all uncovered edges uniformly; in this algorithm we are more careful about which dual variables are raised, choosing to only raise the variables
which correspond to the minimally violated sets. Unlike the case of Steiner
Tree, in Steiner Forest, there can be non-trivial connected components that are
not violated and hence become inactive. A temporarily inactive component may
become part of an active component later if an active component merges with
it. The other main difference is that when we finally output the solution, we
prune 𝐹 to get 𝐹0. This is done for technical reasons, but the intuition is that we
should include no edge in the solution which is not needed to obtain a feasible
solution. To understand this algorithm, there is a non-trivial example in the
textbook [152] that demonstrates the algorithm’s finer points.

Lemma 11.1. At the end of the algorithm, 𝐹0 and y are primal and dual feasible solutions,
respectively.

Proof. In each iteration of the while loop, only the dual variables corresponding
to connected components were raised. Therefore, no edge that is contained
within the same component can become tight, and, therefore, 𝐹 is acyclic. To see
that none of the dual constraints are violated, observe that when a constraint
becomes tight (that is, it holds with equality), the corresponding edge 𝑒 is added
to 𝐹. Subsequently, since 𝑒 is contained in some connected component of 𝐹, no
set 𝑆 with 𝑒 ∈ 𝛿(𝑆) ever has 𝑦𝑆 raised. Therefore, the constraint for 𝑒 cannot be
violated, and so y is dual feasible.
As long as 𝐹 is not feasible, the while loop will not terminate, and there are
some minimal violated sets that can have their dual variables raised. Therefore,
at the end of the algorithm 𝐹 is feasible. Moreover, since 𝐹 is acyclic (it is a
forest), there is a unique 𝑠 𝑖 -𝑡 𝑖 path in 𝐹 for each 1 ≤ 𝑖 ≤ 𝑘. Thus, each edge on a
𝑠 𝑖 -𝑡 𝑖 path is not redundant and is not deleted when pruning 𝐹 to get 𝐹0. 
Theorem 11.3. The primal-dual algorithm for Steiner Forest gives a 2-approximation.

Proof. Let 𝐹′ be the output from our algorithm. To prove this theorem, we want to show that 𝑐(𝐹′) ≤ 2 ∑_{𝑆∈𝒮} 𝑦_𝑆 where 𝑦 is the feasible dual constructed by the algorithm. It follows from this that the algorithm is in fact a 2-approximation.
First, we know that 𝑐(𝐹′) = ∑_{𝑒∈𝐹′} 𝑐_𝑒 = ∑_{𝑒∈𝐹′} ∑_{𝑆∈𝒮: 𝑒∈𝛿(𝑆)} 𝑦_𝑆 because every edge picked is tight. Let deg_{𝐹′}(𝑆) denote the number of edges of 𝐹′ that cross the cut (𝑆, 𝑉 ∖ 𝑆). It can be seen that ∑_{𝑒∈𝐹′} ∑_{𝑆∈𝒮: 𝑒∈𝛿(𝑆)} 𝑦_𝑆 = ∑_{𝑆∈𝒮} 𝑦_𝑆 deg_{𝐹′}(𝑆).
Let 𝐴_𝑖 contain the minimally violated sets in iteration 𝑖 and let Δ_𝑖 denote the amount of dual growth in the 𝑖-th iteration. Say that our algorithm runs for 𝛼 iterations. We can then rewrite ∑_{𝑆∈𝒮} 𝑦_𝑆 deg_{𝐹′}(𝑆) as the double summation ∑_{𝑖=1}^{𝛼} ∑_{𝑆∈𝐴_𝑖} Δ_𝑖 deg_{𝐹′}(𝑆). In the next lemma it will be shown for any iteration 𝑖 that ∑_{𝑆∈𝐴_𝑖} deg_{𝐹′}(𝑆) ≤ 2|𝐴_𝑖|. Knowing this we can prove the theorem:

∑_{𝑆∈𝒮} 𝑦_𝑆 deg_{𝐹′}(𝑆) = ∑_{𝑆∈𝒮} ∑_{𝑖: 𝑆∈𝐴_𝑖} Δ_𝑖 deg_{𝐹′}(𝑆) = ∑_{𝑖=1}^{𝛼} ∑_{𝑆∈𝐴_𝑖} Δ_𝑖 deg_{𝐹′}(𝑆) ≤ ∑_{𝑖=1}^{𝛼} Δ_𝑖 · 2|𝐴_𝑖| ≤ 2 ∑_{𝑆∈𝒮} 𝑦_𝑆. □


Now we show the lemma used in the previous theorem. It is in this lemma
that we use the fact that we prune 𝐹 to get 𝐹0.
We need a simple claim.

Lemma 11.2. Let 𝑇 = (𝑉, 𝐸) be a tree/forest and let 𝑍 ⊆ 𝑉 be a subset of the nodes such that every leaf of 𝑇 is in 𝑍. Then ∑_{𝑣∈𝑍} deg_𝑇(𝑣) ≤ 2|𝑍| − 2 where deg_𝑇(𝑣) is the degree of 𝑣 in 𝑇.

Proof. We will prove it for a tree. We have ∑_{𝑣∈𝑉} deg_𝑇(𝑣) = 2|𝑉| − 2 since a tree has |𝑉| − 1 edges. Every node 𝑢 ∈ 𝑉 − 𝑍 has degree at least 2 since it is not a leaf. Thus ∑_{𝑢∈𝑉−𝑍} deg_𝑇(𝑢) ≥ 2|𝑉 − 𝑍|. Thus,

∑_{𝑣∈𝑍} deg_𝑇(𝑣) ≤ 2|𝑉| − 2 − 2|𝑉 − 𝑍| ≤ 2|𝑍| − 2. □


Lemma 11.3. For any iteration 𝑖 of our algorithm, ∑_{𝑆∈𝐴_𝑖} deg_{𝐹′}(𝑆) ≤ 2|𝐴_𝑖| − 2.

Proof. Consider the graph (𝑉 , 𝐹0), and fix an iteration 𝑖. In this graph, contract
each set 𝑆 active in iteration 𝑖 to a single node (call such a node an active node),
and each inactive set to a single node. Let the resulting graph be denoted by 𝐻.
We know that 𝐹 is a forest and we have contracted connected subsets of vertices
in 𝐹; as 𝐹0 ⊆ 𝐹, we conclude that 𝐻 is also a forest.
Claim 11.0.3. Every leaf of 𝐻 is an active node.

Proof. If not, consider leaf node 𝑣 of 𝐻 which is an inactive node and let 𝑒 ∈ 𝐹0
be the edge incident to it. We claim that 𝐹0 − 𝑒 is feasible which would contradict
the minimality of 𝐹0. To see this, if 𝑥, 𝑦 are two nodes in 𝐻 where 𝑣 ≠ 𝑥, 𝑣 ≠ 𝑦
then 𝑥 and 𝑦 are connected also in 𝐻 − 𝑒 since 𝑣 is a leaf. Thus the utility of 𝑒 is
to connect 𝑣 to other nodes in 𝐻 but if this is the case 𝑣 would be an active node
at the start of the iteration which is not the case. 
The degree in 𝐻 of an active node corresponding to violated set 𝑆 is deg𝐹0 (𝑆).
Now we apply Lemma 11.2. 
Chapter 12

Primal Dual for Constrained Forest Problems

We previously saw a primal-dual based 2-approximation for the Steiner Forest


problem. The algorithm can be generalized to a much wider class of problems
that involve finding a min-cost forest in an undirected edge-weighted graph that
needs to satisfy some constraints. The resulting machinery is more abstract and requires more advanced tools. We start with some problems, all of which are
NP-Hard.
Point to point connection problem: Given edge-weighted graph 𝐺 = (𝑉 , 𝐸)
and two disjoint sets 𝑋 = {𝑠1 , . . . , 𝑠 𝑘 } and 𝑌 = {𝑡1 , . . . , 𝑡 𝑘 } of terminals find the
min-cost forest in 𝐺 such that each connected component contains same number
of terminals from 𝑋 and 𝑌.
Lower Capacitated Tree Problem: Given 𝐺 = (𝑉 , 𝐸), 𝑐 : 𝐸 → ℝ+ and a 𝑘 ∈ ℤ+
find a set 𝐸0 ⊆ 𝐸 of minimum cost such that every connected component in
(𝑉 , 𝐸0) has at least 𝑘 edges.
Connectivity Augmentation: Given an undirected 𝑘-edge connected graph
𝐺 = (𝑉 , 𝐸) and a set of edges 𝐸 𝑎𝑢 𝑔 ⊆ 𝑉 × 𝑉 − 𝐸, find a set 𝐸0 ⊆ 𝐸 𝑎𝑢 𝑔 of minimum
cost such that 𝐺 = (𝑉 , 𝐸 ∪ 𝐸0) is (𝑘 + 1)-edge connected.
Steiner Connectivity Augmentation: Let 𝐺 = (𝑉, 𝐸) be an edge-weighted graph and let (𝑠_1, 𝑡_1), . . . , (𝑠_ℎ, 𝑡_ℎ) be ℎ pairs such that each pair is 𝑘-edge-connected in 𝐺. Given a set of edges 𝐸_{𝑎𝑢𝑔} ⊆ 𝑉 × 𝑉 − 𝐸, find a set 𝐸′ ⊆ 𝐸_{𝑎𝑢𝑔} of minimum cost such that in 𝐺′ = (𝑉, 𝐸 ∪ 𝐸′) each pair (𝑠_𝑖, 𝑡_𝑖) is (𝑘 + 1)-edge connected.
Each of the preceding problems can be cast as a special case of the following
abstract problem. Given an edge-weighted graph 𝐺 = (𝑉 , 𝐸) and a function


𝑓 : 2𝑉 → {0, 1}, find a min-cost subset of edges 𝐸0 such that |𝛿 𝐸0 (𝑆)| ≥ 𝑓 (𝑆) for
each 𝑆 ⊆ 𝑉. We use the notation 𝛿 𝐹 (𝑆) to denote the edges from an edge set 𝐹
that cross the set/cut 𝑆. Alternatively we want a min-cost subset of edges 𝐸0
such that each set 𝑆 ∈ 𝒮 is crossed by an edge of 𝐸0 where 𝒮 = {𝑆 | 𝑓 (𝑆) = 1}. It
is easy to observe that a minimal solution to this abstract problem is a forest since
any cut needs to be covered at most once. This formulation is too general since
𝑓 may be completely arbitrary. The goal is to find a sufficiently general class
that captures interesting problems while still being tractable. The advantage of
{0, 1} functions is precisely because the minimal solutions are forests. We will
later consider integer valued functions.
Definition 12.1. Given a graph 𝐺 = (𝑉 , 𝐸) and an integer valued function 𝑓 : 2𝑉 → ℤ
we say that a subset of edges 𝐹 is feasible for 𝑓 or covers 𝑓 iff |𝛿 𝐹 (𝑆)| ≥ 𝑓 (𝑆) for all
𝑆 ⊆ 𝑉.
Remark 12.1. Even though it may seem natural to restrict attention to requirement
functions that only have non-negative entries, we will see later that the flexibility
of negative requirements is useful.
Given a network design problem Π in an undirected graph 𝐺 = (𝑉 , 𝐸) and
an integer valued function 𝑓 : 2𝑉 → ℤ+ we say that the requirement function of
Π is 𝑓 if covering 𝑓 is equivalent to satisfying the constraints of Π.
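For tiny instances one can check these definitions by brute force. The snippet below (our own, exponential in |𝑉| and only meant for sanity checks; all helper names are ours) lists the violated sets of a requirement function 𝑓 given as a Python callable, and hence decides whether an edge set covers 𝑓.

    from itertools import combinations

    def cut_edges(F, S):
        """Edges of F with exactly one endpoint in S, i.e. delta_F(S)."""
        return [e for e in F if len(set(e) & S) == 1]

    def violated_sets(V, F, f):
        """All S with |delta_F(S)| < f(S); brute force over proper subsets of V."""
        out = []
        for r in range(1, len(V)):
            for S in combinations(sorted(V), r):
                S = frozenset(S)
                if len(cut_edges(F, S)) < f(S):
                    out.append(S)
        return out

    def covers(V, F, f):
        return not violated_sets(V, F, f)

    # Example: Steiner Forest requirement for the pairs {(0,1), (2,3)}.
    pairs = [(0, 1), (2, 3)]
    f = lambda S: int(any((s in S) != (t in S) for s, t in pairs))
    V = {0, 1, 2, 3}
    F = [frozenset((0, 1))]
    print(covers(V, F, f))   # False: e.g. {2} is still violated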

12.1 Classes of Functions and Setup


Here we consider classes of requirement functions 𝑓 : 2𝑉 → ℤ and their
relationship. Even though we define more generally, for this chapter the focus
will be on {0, 1} functions.
Definition 12.2. 𝑓 is maximal if for all disjoint 𝐴 and 𝐵 we have 𝑓 (𝐴 ∪ 𝐵) ≤
max{ 𝑓 (𝐴), 𝑓 (𝐵)}.
Exercise 12.1. Prove that the requirement function of Steiner Forest is maximal.
Definition 12.3. 𝑓 is proper if it is symmetric, maximal and 𝑓 (𝑉) = 0.
Exercise 12.2. Prove that the requirement function of Steiner Forest is proper.
Exercise 12.3. Prove that the requirement function Steiner Connectivity Aug-
mentation is proper.
Definition 12.4. 𝑓 is downward monotone if 𝑓 (𝐴) ≤ 𝑓 (𝐵) for all ∅ ≠ 𝐵 ⊂ 𝐴.
Exercise 12.4. Prove that the requirement function of the Lower Capacitated
Tree problem is downward monotone.

A very general class is the one given below.

Definition 12.5. 𝑓 is skew-supermodular (also called weakly-supermodular) if


for all 𝐴 and 𝐵 one of the following conditions hold,

1. 𝑓 (𝐴) + 𝑓 (𝐵) ≤ 𝑓 (𝐴 ∪ 𝐵) + 𝑓 (𝐴 ∩ 𝐵)

2. 𝑓 (𝐴) + 𝑓 (𝐵) ≤ 𝑓 (𝐴 − 𝐵) + 𝑓 (𝐵 − 𝐴)

A specialization of skew-supermodular for {0, 1} functions will be the focus


of this chapter.

Definition 12.6. A {0, 1} valued 𝑓 is uncrossable if for all 𝐴 and 𝐵 such that 𝑓(𝐴) = 𝑓(𝐵) = 1, one of the following conditions holds:

1. 𝑓 (𝐴 ∪ 𝐵) = 1 and 𝑓 (𝐴 ∩ 𝐵) = 1

2. 𝑓 (𝐴 − 𝐵) = 1 and 𝑓 (𝐵 − 𝐴) = 1.

Claim 12.1.1. If 𝑓 is downward monotone then it is skew-supermodular.

Proof. Since 𝑓 is downward monotone, 𝐴 − 𝐵 ⊂ 𝐴 and 𝐵 − 𝐴 ⊂ 𝐵 we get:

𝑓 (𝐴) ≤ 𝑓 (𝐴 − 𝐵)
𝑓 (𝐵) ≤ 𝑓 (𝐵 − 𝐴)

and hence the second condition of skew-supermodularity always holds. 


Lemma 12.1. If 𝑓 is proper then it is skew-supermodular.

Proof. Consider two sets 𝐴, 𝐵. By considering 𝐴 as disjoint union of 𝐴 − 𝐵


and 𝐴 ∩ 𝐵 we have (𝑖) 𝑓 (𝐴) ≤ max{ 𝑓 (𝐴 − 𝐵), 𝑓 (𝐴 ∩ 𝐵)}. Similarly (ii) 𝑓 (𝐵) ≤
max{ 𝑓 (𝐵 − 𝐴), 𝑓 (𝐴 ∩ 𝐵)}.
Now we apply symmetry of 𝑓 and note that 𝑓(𝐴) = 𝑓(𝑉 − 𝐴). Write 𝑉 − 𝐴 as the disjoint union of 𝐵 − 𝐴 and 𝑉 − (𝐴 ∪ 𝐵). Hence (iii) 𝑓(𝐴) = 𝑓(𝑉 − 𝐴) ≤ max{𝑓(𝐵 − 𝐴), 𝑓(𝑉 − (𝐴 ∪ 𝐵))} = max{𝑓(𝐵 − 𝐴), 𝑓(𝐴 ∪ 𝐵)} where we used symmetry of 𝑓 in the second equality. Similarly, (iv) 𝑓(𝐵) ≤ max{𝑓(𝐴 − 𝐵), 𝑓(𝐴 ∪ 𝐵)}.
Summing up the four inequalities and replacing max{𝑥, 𝑦} by (𝑥 + 𝑦)/2 we
obtain

2( 𝑓 (𝐴) + 𝑓 (𝐵)) ≤ 𝑓 (𝐴 − 𝐵) + 𝑓 (𝐵 − 𝐴) + 𝑓 (𝐴 ∩ 𝐵) + 𝑓 (𝐴 ∪ 𝐵)

which implies that 𝑓 (𝐴) + 𝑓 (𝐵) ≤ 𝑓 (𝐴 − 𝐵) + 𝑓 (𝐵 − 𝐴) or 𝑓 (𝐴) + 𝑓 (𝐵) ≤ 𝑓 (𝐴 ∩


𝐵) + 𝑓 (𝐴 ∪ 𝐵). 

Definition 12.7. Let 𝐺 = (𝑉 , 𝐸) and 𝑓 : 2𝑉 → ℤ. For each 𝑋 ⊆ 𝐸, the residual


requirement function 𝑓𝑋 : 2𝑉 → ℤ is defined as:

𝑓𝑋 (𝐴) = 𝑓 (𝐴) − |𝛿 𝑋 (𝐴)|.

Exercise 12.5. Let 𝑔 : 2𝑉 → ℝ be a symmetric submodular function. Prove that


𝑔 satisfies posi-modularity:

𝑔(𝐴) + 𝑔(𝐵) ≥ 𝑔(𝐴 − 𝐵) + 𝑔(𝐵 − 𝐴) ∀𝐴, 𝐵 ⊆ 𝑉.

Lemma 12.2. If 𝑓 is skew-supermodular then 𝑓𝑋 is also skew-supermodular for any


𝑋 ⊆ 𝐸.

Proof. The function |𝛿 𝑋 (.)| is submodular. Hence

|𝛿 𝑋 (𝐴)| + |𝛿 𝑋 (𝐵)| ≥ |𝛿 𝑋 (𝐴 ∩ 𝐵)| + |𝛿 𝑋 (𝐵 ∪ 𝐴)| ∀𝐴, 𝐵 ⊆ 𝑉.

The function is also symmetric and hence also satisfies posi-modularity. There-
fore,
|𝛿 𝑋 (𝐴)| + |𝛿 𝑋 (𝐵)| ≥ |𝛿 𝑋 (𝐴 − 𝐵)| + |𝛿 𝑋 (𝐵 − 𝐴)| ∀𝐴, 𝐵 ⊆ 𝑉.
Now consider the function 𝑓𝑋 and let 𝐴, 𝐵 be any two subsets of 𝑉. Suppose 𝑓(𝐴) + 𝑓(𝐵) ≤ 𝑓(𝐴 ∪ 𝐵) + 𝑓(𝐴 ∩ 𝐵). Then

𝑓𝑋(𝐴) + 𝑓𝑋(𝐵) = 𝑓(𝐴) − |𝛿_𝑋(𝐴)| + 𝑓(𝐵) − |𝛿_𝑋(𝐵)|
             ≤ 𝑓(𝐴 ∪ 𝐵) + 𝑓(𝐴 ∩ 𝐵) − (|𝛿_𝑋(𝐴)| + |𝛿_𝑋(𝐵)|)
             ≤ 𝑓(𝐴 ∪ 𝐵) + 𝑓(𝐴 ∩ 𝐵) − (|𝛿_𝑋(𝐴 ∩ 𝐵)| + |𝛿_𝑋(𝐴 ∪ 𝐵)|)
             = 𝑓𝑋(𝐴 ∪ 𝐵) + 𝑓𝑋(𝐴 ∩ 𝐵).

Similarly, if 𝑓(𝐴) + 𝑓(𝐵) ≤ 𝑓(𝐴 − 𝐵) + 𝑓(𝐵 − 𝐴) we can use posi-modularity of |𝛿_𝑋(·)| to argue that 𝑓𝑋(𝐴) + 𝑓𝑋(𝐵) ≤ 𝑓𝑋(𝐴 − 𝐵) + 𝑓𝑋(𝐵 − 𝐴). We note that
|𝛿 𝑋 | is both submodular and posi-modular which allows us to use the right
inequality. 
Remark 12.2. Allowing negative requirements in the definition of skew-supermodularity allows a clean proof when subtracting |𝛿_𝑋(·)|.
In the case of {0, 1} functions we make the following claim and leave the proof as an exercise; it is similar to the one above.

Lemma 12.3. Let 𝑓 : 2^𝑉 → {0, 1} be an uncrossable function and let 𝑋 ⊆ 𝐸. Let ℎ : 2^𝑉 → {0, 1} be the residual function where ℎ(𝑆) = 1 iff 𝑓(𝑆) = 1 and |𝛿_𝑋(𝑆)| = 0. Then ℎ is uncrossable.

Definition 12.8. Let 𝑓 be a {0, 1} requirement function over 𝑉 and let 𝑋 ⊆ 𝐸. A set
𝑆 is violated with respect to 𝑋 if 𝑓𝑋 (𝑆) = 1 (in other words 𝑆 is not yet covered by
the edge set 𝑋). A set 𝑆 is a minimal violated set if there is no 𝑆0 ( 𝑆 such that 𝑆0 is
violated.

Lemma 12.4. Let 𝑓 be an uncrossable function and 𝑋 ⊆ 𝐸. Then the minimal violated sets of 𝑓 with respect to 𝑋 are disjoint. That is, if 𝐴, 𝐵 are minimal violated sets then 𝐴 = 𝐵 or 𝐴 ∩ 𝐵 = ∅.

Proof. Since 𝑓𝑋 is also uncrossable it suffices to consider minimal violated sets 𝐴, 𝐵 with respect to 𝑓 (hence the empty set of edges). Suppose the property does not hold. Then we can assume that 𝐴, 𝐵 properly cross, that is, 𝐴 − 𝐵, 𝐴 ∩ 𝐵, 𝐵 − 𝐴 are all non-empty. We have 𝑓(𝐴) = 𝑓(𝐵) = 1 since both are violated. Since 𝑓 is uncrossable, we have 𝑓(𝐴 − 𝐵) = 𝑓(𝐵 − 𝐴) = 1 or 𝑓(𝐴 ∩ 𝐵) = 𝑓(𝐴 ∪ 𝐵) = 1. In both cases we see that we violate minimality of 𝐴, 𝐵. □

12.2 A Primal-Dual Algorithm for Covering Uncrossable Functions
In this section we consider the following problem. Given a graph 𝐺 = (𝑉, 𝐸) with non-negative edge weights 𝑐 : 𝐸 → ℝ+ and an uncrossable function 𝑓 : 2^𝑉 → {0, 1}, find a min-cost set of edges 𝐹 ⊆ 𝐸 such that |𝛿_𝐹(𝑆)| ≥ 𝑓(𝑆) for all 𝑆. An important computational issue is how 𝑓 is specified. A natural model is the value oracle model, where we have access to 𝑓(𝑆) via an oracle. Given
𝑆 ⊆ 𝑉, the oracle returns 𝑓 (𝑆). However this is not sufficient in the general
context of uncrossable functions as we see from the following example. Fix
some set 𝐴. Define the function 𝑓𝐴 where 𝑓𝐴 (𝑆) = 1 iff 𝑆 = 𝐴. It is easy to see
that there is a feasible solution iff 𝛿 𝐸 (𝐴) ≠ ∅. How do we even verify that there
is a feasible solution via a value oracle? In general it may take an exponential
number of queries to find 𝐴. Hence we need more. We will assume that there is
an oracle that given 𝐺 and 𝑋 ⊆ 𝐸 outputs the set of all minimal violated sets of
𝑓𝑋 . This will typically be easy to ensure for specific functions of interest. We
note that for some special class of functions such as proper functions, a value
oracle suffices to find the minimal violated sets.
We write the primal and dual LPs for covering 𝑓 via edges of a given graph
𝐺 = (𝑉 , 𝐸).

min ∑_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
such that ∑_{𝑒∈𝛿(𝑆)} 𝑥_𝑒 ≥ 𝑓(𝑆)      ∀𝑆 ⊆ 𝑉
          𝑥_𝑒 ≥ 0                    ∀𝑒 ∈ 𝐸

max ∑_{𝑆∈𝒮} 𝑦_𝑆
such that ∑_{𝑆: 𝑒∈𝛿(𝑆)} 𝑦_𝑆 ≤ 𝑐_𝑒      ∀𝑒 ∈ 𝐸
          𝑦_𝑆 ≥ 0                      ∀𝑆 ⊆ 𝑉

The primal-dual algorithm is similar to the one for Steiner Forest with a
growth phase and reverse-delete phase.

CoverUncrossableFunc(𝐺 = (𝑉, 𝐸), 𝑓):
    𝐹 ← ∅
    while 𝐹 is not feasible
        Let 𝐶_1, 𝐶_2, . . . , 𝐶_ℓ be the minimal violated sets of 𝑓 with respect to 𝐹
        Raise 𝑦_{𝐶_𝑖} for 1 ≤ 𝑖 ≤ ℓ uniformly until some edge 𝑒 becomes tight
        𝐹 ← 𝐹 + 𝑒
        𝑥_𝑒 = 1
    Let 𝐹 = {𝑒_1, 𝑒_2, . . . , 𝑒_𝑡} where 𝑒_𝑖 is the edge added to 𝐹 in iteration 𝑖
    𝐹′ ← 𝐹
    for 𝑖 = 𝑡 down to 1 do
        if (𝐹′ − 𝑒_𝑖) is feasible then 𝐹′ = 𝐹′ − 𝑒_𝑖
    Output 𝐹′
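The algorithm can be written generically once we assume the oracle discussed above. The Python sketch below (ours, not from the notes) is parameterized by a function min_violated(F) that returns the minimal violated sets of 𝑓 with respect to F; the growth phase mirrors the Steiner Forest sketch and the reverse-delete phase follows the order in which edges were added. Floating-point and tie-breaking issues are ignored and the instance is assumed feasible.

    def cover_uncrossable(edges, costs, min_violated):
        """edges: list of frozenset((u, v)); costs: dict edge -> cost.
        min_violated(F) must return the minimal violated sets of f with respect
        to edge set F, as frozensets of vertices (the oracle assumed in the text)."""
        load = {e: 0.0 for e in edges}
        F, order = set(), []
        C = min_violated(F)
        while C:
            def crossings(e):                 # number of active sets loading edge e
                return sum(1 for S in C if len(set(e) & S) == 1)
            delta, tight = min(((costs[e] - load[e]) / crossings(e), e)
                               for e in edges if e not in F and crossings(e) > 0,
                               key=lambda t: t[0])
            for e in edges:                   # uniform dual growth by delta
                if e not in F:
                    load[e] += delta * crossings(e)
            F.add(tight)
            order.append(tight)
            C = min_violated(F)
        Fp = set(F)
        for e in reversed(order):             # reverse delete
            if not min_violated(Fp - {e}):    # still feasible without e
                Fp.discard(e)
        return Fp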

Analysis: We can prove the following by induction on the iterations and omit
the routine details.
Lemma 12.5. The output of the algorithm, 𝐹0, is a feasible solution that covers 𝑓 .
Assuming oracle access to finding the minimal violated sets in each iteration, the
algorithm can be implemented in polynomial time.
The preceding lemma shows that the algorithm correctly outputs a feasible
solution. The main part is to analyze the cost of the solution.
Theorem 12.9. Let 𝐹′ be the output of the algorithm. Then 𝑐(𝐹′) ≤ 2 ∑_𝑆 𝑓(𝑆)𝑦_𝑆 and hence 𝑐(𝐹′) ≤ 2 OPT.

The cost analysis is based on the following key structural lemma.

Lemma 12.6. Let 𝐺 = (𝑉, 𝐸) be a graph, let 𝑓 : 2^𝑉 → {0, 1} be an uncrossable function and let 𝒞 be the set of minimal violated sets of 𝑓 with respect to ∅. Let 𝐹 ⊆ 𝐸 be any minimal feasible solution that covers 𝑓. Then ∑_{𝐶∈𝒞} deg_𝐹(𝐶) ≤ 2|𝒞| where deg_𝐹(𝐶) = |𝛿(𝐶) ∩ 𝐹| is the number of edges of 𝐹 crossing 𝐶.

We defer the proof of the preceding lemma to the following subsection


and finish the analysis of the 2-approximation. This is similar to the one for
Steiner Forest. Let 𝑡 be the number of iterations of the algorithm. Let 𝒞𝑖 be the
minimal violated sets in iteration 𝑖 of the algorithm and let Δ𝑖 be the amount by
which the duals grow in iteration 𝑖. We call any 𝐶 ∈ 𝒞𝑖 active in iteration 𝑖. We
have ∑_{𝑒∈𝐹′} 𝑐_𝑒 = ∑_{𝑒∈𝐹′} ∑_{𝑆: 𝑒∈𝛿(𝑆)} 𝑦_𝑆 since we add edges to 𝐹 only when the dual constraint is tight and 𝐹′ ⊆ 𝐹. We also observe that the duals are grown only for sets 𝑆 ⊆ 𝑉 where 𝑓(𝑆) = 1 and hence ∑_𝑆 𝑓(𝑆)𝑦_𝑆 = ∑_𝑆 𝑦_𝑆 for the dual solution created by the algorithm.
created by the algorithm.

𝑐(𝐹′) = ∑_{𝑒∈𝐹′} ∑_{𝑆: 𝑒∈𝛿(𝑆)} 𝑦_𝑆 = ∑_𝑆 𝑓(𝑆)𝑦_𝑆 deg_{𝐹′}(𝑆) = ∑_𝑆 deg_{𝐹′}(𝑆) ∑_{𝑖: 𝑆∈𝒞_𝑖} Δ_𝑖 = ∑_{𝑖=1}^{𝑡} Δ_𝑖 ∑_{𝑆∈𝒞_𝑖} deg_{𝐹′}(𝑆).

To complete the analysis we need the following lemma which can be seen as
a corollary of Lemma 12.6.

Lemma 12.7. Consider any iteration 𝑖 of the algorithm. Then ∑_{𝐶∈𝒞_𝑖} deg_{𝐹′}(𝐶) ≤ 2|𝒞_𝑖|.

Proof. Consider iteration 𝑖. Let 𝑋 = {𝑒_1, 𝑒_2, . . . , 𝑒_{𝑖−1}} be the set of edges added by the algorithm in the growth phase prior to iteration 𝑖. Thus 𝒞_𝑖 is the set of minimal violated sets with respect to 𝑋. Consider the graph 𝐺′ = (𝑉, 𝐸 ∖ 𝑋) and the function 𝑓_𝑋. We observe that 𝐹′′ = 𝐹′ ∖ 𝑋 is a minimal feasible solution to covering 𝑓_𝑋 in 𝐺′ — this is because of the reverse delete process. Moreover we claim that deg_{𝐹′}(𝐶) = deg_{𝐹′′}(𝐶) for any 𝐶 ∈ 𝒞_𝑖 (why?). We can now apply Lemma 12.6 to the function 𝑓_𝑋 (which is uncrossable), 𝐺′ and 𝐹′′ to obtain:

∑_{𝐶∈𝒞_𝑖} deg_{𝐹′}(𝐶) = ∑_{𝐶∈𝒞_𝑖} deg_{𝐹′′}(𝐶) ≤ 2|𝒞_𝑖|. □


With the preceding lemma in place we have

𝑐(𝐹′) = ∑_{𝑖=1}^{𝑡} Δ_𝑖 ∑_{𝑆∈𝒞_𝑖} deg_{𝐹′}(𝑆) ≤ ∑_{𝑖=1}^{𝑡} Δ_𝑖 · 2|𝒞_𝑖| = 2 ∑_𝑆 𝑓(𝑆)𝑦_𝑆.

12.2.1 Proof of Lemma 12.6


Since 𝐹 is a minimal feasible solution to cover 𝑓 , for every 𝑒 ∈ 𝐹 there must be a
set 𝑆 𝑒 such that 𝑓 (𝑆 𝑒 ) = 1 and 𝛿 𝐹 (𝑆 𝑒 ) = {𝑒}; this is the reason we cannot remove
𝑒 from 𝐹 and maintain feasibility. We call such a set 𝑆 𝑒 a witness set for 𝑒. Clearly
a witness set 𝑆 𝑒 is a violated set.

Claim 12.2.1. Let 𝑆 𝑒 be a witness set for 𝑒 ∈ 𝐹. Then for any minimal violated set 𝐶
we have 𝐶 ⊆ 𝑆 𝑒 or 𝐶 ∩ 𝑆 𝑒 = ∅.

Proof. If 𝐶 crosses 𝑆 𝑒 then either 𝐶 − 𝑆 𝑒 or 𝐶 ∩ 𝑆 𝑒 would also be violated (since


𝑓 is uncrossable) contradicting minimality of 𝐶. 
Note that there can be many witness sets for the same edge, however, the
same set cannot be a witness set for two different edges.

Definition 12.10. Given a minimal feasible solution 𝐹 we call a family of sets 𝒮 a


witness family for 𝐹 if there is a bijection ℎ : 𝐹 → 𝒮 such that ℎ(𝑒) is a witness set for 𝑒.

Given 𝐹 we can construct a witness family by considering each edge 𝑒 ∈ 𝐹


and picking an arbitrary witness set for 𝑒 and adding it to the collection.

Definition 12.11. A family 𝒮 of finite sets is laminar if no two sets 𝐴, 𝐵 ∈ 𝒮 cross.

We have seen that there is a witness family for 𝐹 since it is minimal. We can
obtain a special witness family starting with an arbitrary witness family.

Lemma 12.8. There is a witness family for 𝐹 which is laminar.

Proof. The process is based on uncrossing. Suppose we start with an arbitrary witness family 𝒮 and it is not laminar. Let 𝑆_{𝑒_1}, 𝑆_{𝑒_2} be the witness sets for 𝑒_1, 𝑒_2 ∈ 𝐹 such that 𝑆_{𝑒_1}, 𝑆_{𝑒_2} cross.

Claim 12.2.2. One of the following holds: (i) (𝒮 ∖ {𝑆_{𝑒_1}, 𝑆_{𝑒_2}}) ∪ {𝑆_{𝑒_1} − 𝑆_{𝑒_2}, 𝑆_{𝑒_2} − 𝑆_{𝑒_1}} is a witness family for 𝐹, (ii) (𝒮 ∖ {𝑆_{𝑒_1}, 𝑆_{𝑒_2}}) ∪ {𝑆_{𝑒_1} ∩ 𝑆_{𝑒_2}, 𝑆_{𝑒_1} ∪ 𝑆_{𝑒_2}} is a witness family for 𝐹.
The new sets don’t cross each other but what about other sets in the family?
One can argue that the number of crossings can only decrease. More formally,
suppose we have three sets 𝐴, 𝐵, 𝐷 and say 𝐷 crosses 𝑝 sets from 𝐴, 𝐵 (𝑝 is one of
{0, 1, 2}). If we uncross 𝐴, 𝐵 (that is replace 𝐴, 𝐵 by 𝐴 − 𝐵, 𝐵 − 𝐴 or 𝐴 ∩ 𝐵, 𝐴 ∪ 𝐵)
then the number of sets that 𝐷 crosses after uncrossing cannot increase. We
leave this claim as an exercise. This means that repeated uncrossing of two
crossing sets will eventually lead to a laminar family.
Now we prove the claim. 𝑆_{𝑒_1} and 𝑆_{𝑒_2} are violated sets, which means that 𝑓(𝑆_{𝑒_1}) = 1 and 𝑓(𝑆_{𝑒_2}) = 1. Since 𝑓 is uncrossable, say 𝑓(𝑆_{𝑒_1} − 𝑆_{𝑒_2}) = 1 and 𝑓(𝑆_{𝑒_2} − 𝑆_{𝑒_1}) = 1. The only edges from 𝐹 that can cross 𝑆_{𝑒_1} − 𝑆_{𝑒_2} and 𝑆_{𝑒_2} − 𝑆_{𝑒_1} are


𝑒1 and 𝑒2 — if there is another edge 𝑒 that crosses one of them then it would also
cross one of 𝑆_{𝑒_1} and 𝑆_{𝑒_2} and hence they would not be witness sets. Since 𝐹 is a feasible solution, 𝑆_{𝑒_1} − 𝑆_{𝑒_2} and 𝑆_{𝑒_2} − 𝑆_{𝑒_1} are both crossed by some edge from 𝑌 = {𝑒_1, 𝑒_2}. We claim that each of them is crossed by exactly one of them. If this is the case then they are valid witness sets for the two edges 𝑒_1, 𝑒_2. To see why exactly one of them crosses each set we use submodularity/posi-modularity:

2 ≤ |𝛿_𝑌(𝑆_{𝑒_1} − 𝑆_{𝑒_2})| + |𝛿_𝑌(𝑆_{𝑒_2} − 𝑆_{𝑒_1})| ≤ |𝛿_𝑌(𝑆_{𝑒_1})| + |𝛿_𝑌(𝑆_{𝑒_2})| = 2.

The first inequality is since each is covered and the second inequality is because
of posi-modularity of |𝛿𝑌 (·)|.
The other case when 𝑓 (𝑆 𝑒1 ∩ 𝑆 𝑒2 ) = 1 and 𝑓 (𝑆 𝑒1 ∪ 𝑆 𝑒2 ) = 1 can also be handled
with a similar argument. 
Let 𝒮 be a laminar witness family for 𝐹. We create a rooted tree 𝑃 from
𝒮 as follows. To do this we first add 𝑉 to the family. For each set 𝑆 ∈ 𝒮 we
add a vertex 𝑣 𝑆 . For any 𝑆, 𝑇 such that 𝑇 is the smallest set in 𝒮 containing 𝑆
we add the edge (𝑣 𝑆 , 𝑣𝑇 ) which makes 𝑣𝑇 the parent of 𝑣 𝑆 . Since we added 𝑉
to the family we obtain a tree since every maximal set in the original family
becomes a child of 𝑣𝑉 . Note that 𝑃 has |𝐹| edges and |𝐹| + 1 nodes and each
𝑒 ∈ 𝐹 corresponds to the edge (𝑣 𝑆𝑒 , 𝑣𝑇 ) of 𝑃 where 𝑣𝑇 is the parent of 𝑣 𝑆𝑒 in 𝑃.
We keep this bijection between 𝐹 and edges of 𝑃 in mind for later.
See Figure 12.1.
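The construction of 𝑃 from a laminar family is straightforward to code; the sketch below (ours, a toy illustration) adds 𝑉 as the root and assigns each set the smallest family member properly containing it as its parent.

    def laminar_to_tree(V, family):
        """family: list of frozensets forming a laminar family over ground set V.
        Returns parent: dict mapping each set to the smallest set in the family
        (with V added) that properly contains it; V itself is the root."""
        V = frozenset(V)
        sets = sorted(set(family) | {V}, key=len)    # smaller sets first
        parent = {}
        for i, S in enumerate(sets):
            for T in sets[i + 1:]:                   # candidate supersets, by size
                if S < T:                            # proper containment
                    parent[S] = T
                    break
        return parent

    # toy example
    fam = [frozenset({1}), frozenset({2}), frozenset({1, 2}), frozenset({4})]
    print(laminar_to_tree({1, 2, 3, 4}, fam))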
Consider any minimal violated set 𝐶 ∈ 𝒞. We observe that 𝐶 cannot cross
any set in 𝒮 since it is a witness family. For each 𝐶 we associate the minimal
set 𝑆 ∈ 𝒮 such that 𝐶 ⊆ 𝑆. We call 𝑣_𝑆 an active node of 𝑃 if there is a 𝐶 ∈ 𝒞
associated with it. Note that (i) not all nodes of 𝑃 may be active (ii) every 𝐶 is
associated with exactly one active node of 𝑃 (iii) multiple sets from 𝒞 can be
associated with the same active node of 𝑃.

Lemma 12.9. Let 𝑃𝑎 be the active nodes of 𝑃. Then |𝑃𝑎 | ≤ |𝒞| and every leaf of 𝑃 other
than potentially the root is an active node.

Proof. Since each 𝐶 is associated with an active node we obtain |𝑃𝑎 | ≤ |𝒞|. A
leaf of 𝑃 corresponds to a minimal set from 𝒮 or the root. If 𝑆 𝑒 is a minimal set
from 𝒮 then 𝑆 𝑒 is a violated set and hence must contain a minimal violated set
𝐶. But then 𝑆 𝑒 is active because of 𝐶. Consider 𝑣𝑉 . If it is a leaf then it has only
one child which is the unique maximal witness set 𝑆. The root can be inactive
since the function 𝑓 is not necessarily symmetric. (However, if 𝑓 is symmetric
then 𝑉 − 𝑆 would also be a violated set and hence contain a minimal violated
set from 𝒞.) 

Figure 12.1: Laminar witness family for a minimal solution 𝐹 shown as red
edges. Sets in green are the minimal violated sets and the sets in black are the
witness sets. The root set 𝑉 is not drawn.

Lemma 12.10. Let 𝑣_𝑇 be an active node in 𝑃. Let 𝒞′ ⊆ 𝒞 be the set of all minimal violated sets associated with 𝑣_𝑇. Then ∑_{𝐶∈𝒞′} deg_𝐹(𝐶) ≤ deg_𝑃(𝑣_𝑇).

Proof. Let 𝑌 = ∪𝐶∈𝒞0 𝛿 𝐹 (𝐶) be the set of edges incident to the sets in 𝒞 0. Consider
any 𝐶 ∈ 𝒞 0. 𝐶 ⊂ 𝑇 and 𝐶 is also disjoint from the children of 𝑇. Consider an
edge 𝑒 ∈ 𝛿 𝐹 (𝐶). If 𝑒 crosses 𝑇 then 𝑇 is the witness set for 𝑒 (since only one edge
from 𝐹 can cross 𝑇 for which it is the witness) and we charge 𝑒 to the parent edge
of 𝑇. If 𝑒 does not cross 𝑇 then the witness set 𝑆 𝑒 must be a child of 𝑇. Since
only one edge can cross each child of 𝑇 we can charge 𝑒 to one child of 𝑇. Note
that no edge 𝑒 ∈ 𝑌 can be incident to two set 𝐶1 , 𝐶2 ∈ 𝒞 0 since both one end
point of 𝑒 must be contained in a child of 𝑇 (assuming 𝑒 does not cross 𝑇) and
both 𝐶1 , 𝐶2 are contained in 𝑇 and disjoint from the children of 𝑇. See figure.
Therefore, we can charge 𝐶∈𝒞0 deg𝐹0 (𝐶) to the number of children of 𝑇 plus the
Í
parent edge of 𝑇, which is deg𝑃 (𝑣𝑇 ). 
Now we are ready to finish the proof. From Lemma 12.10 and the fact that each 𝐶 ∈ 𝒞 is associated to some active node, we have

∑_{𝐶∈𝒞} deg_𝐹(𝐶) ≤ ∑_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣).

To bound ∑_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) we had observed that in the tree 𝑃 every leaf except perhaps the root node is an active node. Suppose the root is an active node or it is not a leaf. Then from Lemma 11.2, ∑_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) ≤ 2|𝑃_𝑎| − 2. Suppose the root is a leaf and inactive. Again from Lemma 11.2, where we consider 𝑍 = 𝑃_𝑎 ∪ {𝑣_𝑉}, we have ∑_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) + 1 ≤ 2(|𝑃_𝑎| + 1) − 2 = 2|𝑃_𝑎|. Hence ∑_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) ≤ 2|𝑃_𝑎| − 1. Thus, in both cases we see that ∑_{𝑣∈𝑃_𝑎} deg_𝑃(𝑣) ≤ 2|𝑃_𝑎|. Finally we note that |𝑃_𝑎| ≤ |𝒞| and hence putting together we have

∑_{𝐶∈𝒞} deg_𝐹(𝐶) ≤ 2|𝑃_𝑎| ≤ 2|𝒞|

as desired.
Bibliographic Notes: The primal-dual algorithm for uncrossable function is
from the paper of Williamson, Goemans, Mihail and Vazirani [157]. The proof in
the paper assumes that ℎ is symmetric without explicitly stating it; for symmetric
functions one obtains a slight advantage over 2. See Williamson’s thesis [156]
where he gives the proof for both symmetric and general uncrossable functions.
The survey by Goemans and Williamson [67] describes the many applications of
the primal-dual method in network design.
Chapter 13

Survivable Network Design Problem

In this chapter we consider the Survivable Network Design Problem. The input is an undirected graph 𝐺 = (𝑉, 𝐸) with edge-weights 𝑐 : 𝐸 → ℝ+ and integer requirements 𝑟(𝑢𝑣) for each pair of vertices 𝑢𝑣. We write 𝑢𝑣 instead of (𝑢, 𝑣) to indicate that the requirement function is for unordered pairs (alternatively, 𝑟(𝑢, 𝑣) = 𝑟(𝑣, 𝑢) for all 𝑢, 𝑣). The goal is to find a min-cost subgraph 𝐻 = (𝑉, 𝐹) of 𝐺 such that the connectivity between 𝑢 and 𝑣 in 𝐻 is at least 𝑟(𝑢𝑣). We obtain two versions of the problem: EC-SNDP if
the connectivity requirement is edge-connectivity and VC-SNDP for vertex
connectivity. It turns out that EC-SNDP is much more tractable than VC-SNDP
and we will focus on EC-SNDP.

Figure 13.1: Example of EC-SNDP. Requirement only for three pairs. A feasible
solution shown in the second figure as red edges. In this example the paths
for each pair are also vertex disjoint even though the requirement is only for
edge-disjointness.

For EC-SNDP there is a seminal work of Jain based on iterated rounding


that yields a 2-approximation as a special case of a more general problem.


Prior to his work there was an augmentation based approach that yields 2𝑘
and 2𝐻 𝑘 approximations where 𝑘 = max𝑢𝑣 𝑟(𝑢𝑣) is the maximum connectivity
requirement. Despite being superseded by Jain’s result in terms of the ratio, the
augmentation approach is important for various reasons and we will discuss
both.
We first consider the LP relaxation for EC-SNDP. We do this by setting up the requirement function 𝑓 : 2^𝑉 → ℤ where we let 𝑓(𝑆) = max_{𝑢∈𝑆, 𝑣∈𝑉−𝑆} 𝑟(𝑢𝑣). The goal is to find a min-cost subgraph 𝐻 of 𝐺 such that |𝛿_𝐻(𝑆)| ≥ 𝑓(𝑆) for all 𝑆.
Claim 13.0.1. The requirement function 𝑓 that captures EC-SNDP is proper and hence
skew-supermodular.
Proof. It is easy to see 𝑓 is symmetric. Consider disjoint sets 𝐴, 𝐵. Suppose
𝑓 (𝐴 ∪ 𝐵) = 𝑘 which means that there is some 𝑠 ∈ 𝐴 ∪ 𝐵 and 𝑡 ∈ 𝑉 − (𝐴 ∪ 𝐵)
such that 𝑟(𝑠𝑡) = 𝑘. If 𝑠 ∈ 𝐴 then 𝑓 (𝐴) ≥ 𝑘 and if 𝑠 ∈ 𝐵 then 𝑓 (𝐵) ≥ 𝑘. Therefore
max{ 𝑓 (𝐴), 𝑓 (𝐵)} ≥ 𝑘 = 𝑓 (𝐴 ∪ 𝐵) since 𝑠 ∈ 𝐴 or 𝑠 ∈ 𝐵. 
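For concreteness, the requirement function can be evaluated directly; the small Python helper below (our own illustration, with hypothetical names) computes 𝑓(𝑆) for a single set 𝑆 and sanity-checks maximality on a toy example.

    def sndp_requirement(r):
        """r: dict with r[frozenset((u, v))] = connectivity requirement r(uv).
        Returns f where f(S) = max over pairs separated by S of r(uv)."""
        def f(S):
            best = 0
            for pair, req in r.items():
                u, v = tuple(pair)
                if (u in S) != (v in S):      # the pair is separated by the cut S
                    best = max(best, req)
            return best
        return f

    # small check of maximality on disjoint A, B
    r = {frozenset((0, 3)): 2, frozenset((1, 4)): 1}
    f = sndp_requirement(r)
    A, B = {0}, {1}
    assert f(A | B) <= max(f(A), f(B))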

13.1 Augmentation approach


The augmentation approach for EC-SNDP is based on iteratively increasing the
connectivity of pairs from 1 to 𝑘 where 𝑘 = max𝑢𝑣 𝑟(𝑢𝑣). In fact this works for
any proper function 𝑓 : 2𝑉 → ℤ and we will work in this generality rather than
focus only on EC-SNDP.
Claim 13.1.1. Let 𝑓 be a proper function and let 𝑝 be an integer. Then the truncated function 𝑔_𝑝 : 2^𝑉 → ℤ defined as 𝑔_𝑝(𝑆) = min{𝑝, 𝑓(𝑆)} is proper.
Proof. Exercise. 
Lemma 13.1. Let 𝐺 = (𝑉, 𝐸) be a graph, let 𝑓 : 2^𝑉 → ℤ be a proper function and let 𝑝 ≥ 0 be a non-negative integer. Let 𝑋 ⊆ 𝐸 be a set of edges such that |𝛿_𝑋(𝑆)| ≥ 𝑔_𝑝(𝑆) for all 𝑆 ⊆ 𝑉. Consider the function ℎ_{𝑝+1} : 2^𝑉 → {0, 1} where ℎ_{𝑝+1}(𝑆) = 1 iff 𝑓(𝑆) ≥ 𝑝 + 1 and
|𝛿 𝑋 (𝑆)| = 𝑝. Then ℎ 𝑝+1 is uncrossable and symmetric.
Proof. Consider the function 𝑔_{𝑝+1} which is proper and hence also skew-supermodular. For notational convenience we use ℎ for ℎ_{𝑝+1}. Suppose ℎ(𝐴) = ℎ(𝐵) = 1. This implies 𝑔_{𝑝+1}(𝐴) = 𝑔_{𝑝+1}(𝐵) = 𝑝 + 1 and |𝛿_𝑋(𝐴)| = |𝛿_𝑋(𝐵)| = 𝑝. Since 𝑔_{𝑝+1} is skew-supermodular, the first case is when 𝑔_{𝑝+1}(𝐴) + 𝑔_{𝑝+1}(𝐵) ≤ 𝑔_{𝑝+1}(𝐴 ∪ 𝐵) + 𝑔_{𝑝+1}(𝐴 ∩ 𝐵). This implies that 𝑔_{𝑝+1}(𝐴 ∪ 𝐵) = 𝑔_{𝑝+1}(𝐴 ∩ 𝐵) = 𝑝 + 1. By submodularity of |𝛿_𝑋| we have |𝛿_𝑋(𝐴)| + |𝛿_𝑋(𝐵)| ≥ |𝛿_𝑋(𝐴 ∩ 𝐵)| + |𝛿_𝑋(𝐴 ∪ 𝐵)|, and by feasibility of 𝑋 for 𝑔_𝑝 we have |𝛿_𝑋(𝐴 ∩ 𝐵)|, |𝛿_𝑋(𝐴 ∪ 𝐵)| ≥ 𝑝; together these give |𝛿_𝑋(𝐴 ∩ 𝐵)| = |𝛿_𝑋(𝐴 ∪ 𝐵)| = 𝑝. This implies that ℎ(𝐴 ∩ 𝐵) = ℎ(𝐴 ∪ 𝐵) = 1. Similarly, if 𝑔_{𝑝+1}(𝐴) + 𝑔_{𝑝+1}(𝐵) ≤ 𝑔_{𝑝+1}(𝐴 − 𝐵) + 𝑔_{𝑝+1}(𝐵 − 𝐴) we can argue that ℎ(𝐴 − 𝐵) = ℎ(𝐵 − 𝐴) = 1 via posi-modularity of |𝛿_𝑋|. □

Exercise 13.1. Let 𝐺 = (𝑉, 𝐸) be a graph and let 𝑓 : 2^𝑉 → ℤ be a proper function. Suppose 𝐹 is a feasible cover for 𝑔_𝑝. Let ℎ_{𝑝+1} be the residual uncrossable function that arises from 𝑔_𝑝 as in the preceding lemma. Let 𝐹′ ⊆ 𝐸 ∖ 𝐹 be a feasible cover for ℎ_{𝑝+1} in the graph 𝐺′ = (𝑉, 𝐸 ∖ 𝐹). Then 𝐹 ∪ 𝐹′ is a feasible cover for 𝑔_{𝑝+1}.

Lemma 13.2. Let 𝑓 be the requirement function of an instance of EC-SNDP in


𝐺 = (𝑉 , 𝐸) and let 𝑝 be an integer and let 𝑋 ⊆ 𝐸 be a set of edges. There is a polynomial
time algorithm to find the minimal violated sets of 𝑔𝑝 with respect to 𝑋.

Proof. For each pair of nodes (𝑠, 𝑡) find a source-minimal 𝑠-𝑡 mincut 𝑆 in the graph 𝐻 = (𝑉, 𝑋) and a sink-minimal mincut 𝑇 via maxflow (see the footnote below). If |𝛿_𝑋(𝑆)| < min{𝑝, 𝑟(𝑠𝑡)} then 𝑆 is a violated set. We compute all such minimal cuts over all
pairs of vertices and take the minimal sets in this collection. We leave it as an
exercise to check that the minimal violated sets of 𝑔𝑝 are the minimal sets in this
collection and will be disjoint. 
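The footnote's recipe can be implemented with any maxflow routine; the sketch below (ours) uses networkx, models each undirected edge of 𝑋 as two unit-capacity arcs, and returns the source-minimal mincut as the set of vertices reachable from 𝑠 in the residual graph. The driver that collects candidate violated sets of 𝑔_𝑝 is our own simplification and assumes a simple graph.

    import networkx as nx

    def source_minimal_mincut(X, s, t, nodes):
        """X: iterable of undirected edges (u, v), each with unit capacity.
        Returns (cut_value, S) where S is the source-minimal s-t mincut."""
        D = nx.DiGraph()
        D.add_nodes_from(nodes)
        for u, v in X:
            D.add_edge(u, v, capacity=1)
            D.add_edge(v, u, capacity=1)
        value, flow = nx.maximum_flow(D, s, t)
        # residual reachability from s
        R = nx.DiGraph()
        R.add_nodes_from(nodes)
        for u, v, data in D.edges(data=True):
            if data['capacity'] - flow[u][v] > 0:
                R.add_edge(u, v)
            if flow[u][v] > 0:
                R.add_edge(v, u)
        S = set(nx.descendants(R, s)) | {s}
        return value, S

    def violated_sets_gp(X, pairs_req, p, nodes):
        """Candidate violated sets of g_p = min(p, f) with respect to X.
        pairs_req: dict {(s, t): r(st)}."""
        cands = set()
        for (s, t), req in pairs_req.items():
            val, S = source_minimal_mincut(X, s, t, nodes)
            if val < min(p, req):
                cands.add(frozenset(S))
        # the minimal violated sets are the inclusion-minimal candidates
        return [S for S in cands if not any(T < S for T in cands)]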
Corollary 13.1. Let 𝑓 be the requirement function of an instance of EC-SNDP in
𝐺 = (𝑉, 𝐸) and let 𝑝 be an integer. Let 𝑋 be a set of edges such that 𝑋 is feasible to cover 𝑔_𝑝. In the graph 𝐺′ = (𝑉, 𝐸 ∖ 𝑋), for any 𝐹 ⊆ (𝐸 ∖ 𝑋) the minimal violated sets of ℎ_{𝑝+1} with respect to 𝐹 can be computed in polynomial time.

Proof. The minimal violated sets of ℎ_{𝑝+1} with respect to 𝐹 are the same as the
minimal violated sets of 𝑔𝑝+1 with respect to 𝑋 ∪ 𝐹. 
Footnote: The source-minimal 𝑠-𝑡 mincut in a directed/undirected graph is unique via submodularity
and can be found by computing 𝑠-𝑡 maxflow and finding the reachable set from 𝑠 in the residual
graph. Similarly sink minimal set.

Augmentation-Algorithm(𝐺 = (𝑉 , 𝐸), 𝑓 )

1. If 𝐸 does not cover 𝑓 output “infeasible”

2. 𝑘 = max_𝑆 𝑓(𝑆) is the maximum requirement

3. 𝐴 ← ∅

4. for ( 𝑝 = 1 to 𝑘 ) do

A. 𝐺0 = (𝑉 , 𝐸 \ 𝐴)
B. Let 𝑔𝑝 be the function defined as 𝑔𝑝 (𝑆) = min{ 𝑓 (𝑆), 𝑝}
C. Let ℎ 𝑝 be the uncrossable function where ℎ 𝑝 (𝑆) = 1 iff 𝑔𝑝 (𝑆) > |𝛿 𝐴 (𝑆)|
D. Find 𝐴0 ⊆ 𝐸 \ 𝐴 that covers ℎ 𝑝 in 𝐺0
E. 𝐴 ← 𝐴 ∪ 𝐴0

5. Output 𝐴
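The augmentation loop itself is short once the subroutines exist. The following Python sketch (ours) strings together the earlier sketches — the uncrossable-cover routine and the violated-set oracle — and is only meant to show the control flow, not to be an optimized or verified implementation.

    def augmentation_ec_sndp(nodes, edges, costs, pairs_req):
        """edges: list of frozenset((u, v)); pairs_req: dict {(s, t): r(st)}.
        Sketch of the 2k-approximation driver; relies on cover_uncrossable and
        violated_sets_gp from the earlier sketches."""
        k = max(pairs_req.values())
        A = set()
        for p in range(1, k + 1):
            remaining = [e for e in edges if e not in A]

            def min_violated(F, _A=set(A), _p=p):
                # minimal violated sets of h_p, i.e. of g_p with respect to A ∪ F
                X = [tuple(e) for e in (_A | set(F))]
                return violated_sets_gp(X, pairs_req, _p, nodes)

            A |= cover_uncrossable(remaining,
                                   {e: costs[e] for e in remaining},
                                   min_violated)
        return A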


Figure 13.2: Example to illustrate the augmentation approach. Top picture


shows a set of edges that connect all pairs with connectivity requirement at least
1. Second picture shows the residual graph in which one needs to solve the
augmentation problem. Note that 𝑠 2 and 𝑡2 are isolated vertices in the residual
graph, however, the cuts induced by them are already satisfied by the edges
chosen in the first iteration.

Theorem 13.2. The augmentation algorithm yields a 2𝑘-approximation for EC-SNDP


where 𝑘 is the maximum connectivity requirement.

Proof. We sketch the proof. The algorithm has 𝑘 iterations and in each iteration
it uses a black box algorithm to cover an uncrossable function. We saw a primal-
dual 2-approximation for this problem. We observe that if 𝐹 ∗ is an optimum
solution to the given instance then in each iteration 𝐹 ∗ \ 𝐴 is a feasible solution
to the covering problem in that iteration. Thus the cost paid by the algorithm
in each iteration can be bound by 2𝑐(𝐹 ∗ ) and hence the total cost is at most
2𝑘 OPT. The preceding lemmas argue that the primal-dual algorithm can be
implemented in polynomial time. 
Remark 13.1. A different algorithm that is based on augmentation in reverse
yields a 2𝐻 𝑘 approximation where 𝐻 𝑘 is the 𝑘’th harmonic number. We refer
the reader to [66].

13.2 Iterated rounding based 2-approximation


In this section we describe the seminal result of Jain [90] who obtained a 2-
approximation for EC-SNDP via iterated rounding. He proved a more general
polyhedral result. Consider the problem of covering a skew-supermodular
function 𝑓 : 2𝑉 → ℤ by the edges of a graph 𝐺 = (𝑉 , 𝐸). The natural cut
covering LP relaxation for the problem is given below.

min ∑_{𝑒∈𝐸} 𝑐(𝑒)𝑥_𝑒
    ∑_{𝑒∈𝛿(𝑆)} 𝑥_𝑒 ≥ 𝑓(𝑆)      𝑆 ⊂ 𝑉
    𝑥_𝑒 ∈ [0, 1]               𝑒 ∈ 𝐸

Note that upper bound constraints 𝑥 𝑒 ≤ 1 are necessary in the general setting
when 𝑓 is integer valued since we can only take one copy of an edge. The key
structural theorem of Jain is the following.

Theorem 13.3. Let 𝑥 be a basic feasible solution to the LP relaxation. Then there is
some edge 𝑒 ∈ 𝐸 such that 𝑥 𝑒 = 0 or 𝑥 𝑒 ≥ 1/2.

With the above in place, and the observation that the residual function of a skew-supermodular function is again skew-supermodular, one obtains an iterative rounding algorithm.

Cover-Skew-Supermodular(𝐺, 𝑓 )

1. If 𝐸 does not cover 𝑓 output “infeasible”

2. 𝐴 ← ∅, 𝑔 = 𝑓

3. While 𝐴 is not a feasible solution do

A. Find an optimum basic feasible solution 𝑥 to cover 𝑔 in 𝐺0 = (𝑉 , 𝐸 \ 𝐴).


B. If there is some 𝑒 such that 𝑥 𝑒 = 0 then 𝐸 ← 𝐸 − {𝑒}
C. Else If there is some 𝑒 such that 𝑥 𝑒 ≥ 1/2 then
1. 𝐴 = 𝐴 ∪ {𝑒}
2. 𝑔 = 𝑓𝐴 (recall 𝑓𝐴 (𝑆) = 𝑓 (𝑆) − |𝛿 𝐴 (𝑆)| )

4. Output 𝐴
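For intuition, the loop can be simulated on small instances where the set family is listed explicitly. The Python sketch below (ours) uses scipy's LP solver; note that it does not certify that the returned optimum is a basic feasible solution (the 'highs' method typically returns a vertex), so the guarantee of Theorem 13.3 is only heuristic here, and the instance is assumed feasible.

    import numpy as np
    from scipy.optimize import linprog

    def iterated_rounding(edges, costs, sets, req):
        """edges: list of frozenset((u, v)); sets: list of frozensets of vertices;
        req[i] is the requirement of sets[i].  Returns the chosen edge set A."""
        A = set()

        def crossing(e, S):
            return len(set(e) & S) == 1

        def residual():
            return [r - sum(crossing(e, S) for e in A) for S, r in zip(sets, req)]

        while any(r > 0 for r in residual()):
            free = [e for e in edges if e not in A]
            resid = residual()
            # LP: min c.x  s.t.  sum_{e in delta(S)} x_e >= resid(S),  0 <= x_e <= 1
            c = np.array([costs[e] for e in free])
            A_ub = np.array([[-1.0 if crossing(e, S) else 0.0 for e in free]
                             for S in sets])
            b_ub = np.array([-max(r, 0) for r in resid])
            res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                          bounds=[(0, 1)] * len(free), method='highs')
            x = res.x
            # fix one edge with x_e >= 1/2 (Jain's theorem guarantees such an edge
            # exists at a basic feasible solution); edges with x_e = 0 are simply
            # not selected and the LP is re-solved.
            half = [e for e, xe in zip(free, x) if xe >= 0.5 - 1e-9]
            if not half:
                raise RuntimeError("no edge with x_e >= 1/2; LP optimum may not be a vertex")
            A.add(half[0])
        return A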

Corollary 13.4. The integrality gap of the cut LP is at most 2 for any skew-supermodular
function 𝑓 .
Proof. We consider the iterative rounding algorithm and prove the result via
induction on 𝑚 the number of edges in 𝐺. The base case of 𝑚 = 0 is trivial since
the function has to be 0.
Let 𝑥* be an optimum basic feasible solution to the LP relaxation. We have ∑_{𝑒∈𝐸} 𝑐_𝑒 𝑥*_𝑒 ≤ OPT. We can assume without loss of generality that 𝑓 is not trivial
in the sense that 𝑓 (𝑆) ≥ 1 for at least some set 𝑆, otherwise 𝑥 = 0 is optimal
and there is nothing to prove. By Theorem 13.3, there is an edge 𝑒˜ ∈ 𝐸 such
that 𝑥 ∗𝑒˜ = 0 or 𝑥 ∗𝑒˜ ≥ 1/2. Let 𝐸0 = 𝐸 \ 𝑒˜ and 𝐺0 = (𝑉 , 𝐸0). In the former case we
can discard 𝑒˜ and the current LP solution restricted to 𝐸0 is a feasible fractional
solution and we obtain the desired result via induction since we have one less
edge.
The more interesting case is when 𝑥 ∗𝑒˜ ≥ 1/2. The algorithm includes 𝑒˜ and
recurses on 𝐺0 and the residual function 𝑔 : 2𝑉 → ℤ where 𝑔(𝑆) = 𝑓 (𝑆) − |𝛿 𝑒˜ (𝑆)|.
Note that 𝑔 is skew-supermodular. We observe that 𝐴0 ⊆ 𝐸0 is a feasible solution
to cover 𝑔 in 𝐺0 iff 𝐴0 ∪ { 𝑒˜ } is a feasible solution to cover 𝑓 in 𝐺. Furthermore,
we also observe that the fractional solution 𝑥 0 obtained by restricting 𝑥 to 𝐸0
is a feasible fractional solution to the LP relaxation to cover 𝑔 in 𝐺0. Thus,
by induction, there is a solution 𝐴′ ⊆ 𝐸′ such that 𝑐(𝐴′) ≤ 2 ∑_{𝑒∈𝐸′} 𝑐(𝑒)𝑥*_𝑒. The algorithm outputs 𝐴 = 𝐴′ ∪ {𝑒̃} which is feasible to cover 𝑓 in 𝐺. We have

𝑐(𝐴) = 𝑐(𝐴′) + 𝑐(𝑒̃) ≤ 𝑐(𝐴′) + 2𝑐(𝑒̃)𝑥*_{𝑒̃} ≤ 2 ∑_{𝑒∈𝐸′} 𝑐(𝑒)𝑥*_𝑒 + 2𝑐(𝑒̃)𝑥*_{𝑒̃} = 2 ∑_{𝑒∈𝐸} 𝑐(𝑒)𝑥*_𝑒.

We used the fact that 𝑥 ∗𝑒˜ ≥ 1/2 to upper bound 𝑐(˜𝑒 ) by 2𝑐(˜𝑒 )𝑥 ∗𝑒˜ . 
2-approximation for EC-SNDP : We had already seen that the requirement
function for EC-SNDP is skew-supermodular. To apply Theorem 13.3 and obtain
a 2-approximation for EC-SNDP we need to argue that the LP relaxation can
be solved efficiently. We observe that the LP relaxation at the top level can be
solved efficiently via maxflow. We need to check that in the graph 𝐺 with edge
capacities given by the fractional solution 𝑥 the min-cut between every pair of
vertices (𝑠, 𝑡) is at least 𝑟(𝑠, 𝑡). Note that the algorithm is iterative. As we proceed
the function 𝑔 = 𝑓𝐴 where 𝑓 is the original requirement function and the 𝐴 is
the set of edges already chosen.

Exercise 13.2. Prove that there is an efficient separation oracle for each step of
the iterative rounding algorithm when 𝑓 is the requirement function for a given
EC-SNDP instance.

We now prove Theorem 13.3. The proof consists of two steps. The first step
is a characterization of basic feasible solutions via laminar tight sets. The second
step is a counting argument.

13.2.1 Basic feasible solutions and laminar family of tight sets


Let 𝑥 be a basic feasible solution to the LP relaxation. We are done if there is any
edge 𝑒 such that 𝑥 𝑒 = 0 or 𝑥 𝑒 = 1. Hence the interesting case is when 𝑥 is fully
fractional, that is, 𝑥 𝑒 ∈ (0, 1) for every 𝑒 ∈ 𝐸.

Definition 13.5. A set 𝑆 ⊆ 𝑉 is tight with respect to 𝑥 if 𝑥(𝛿(𝑆)) = 𝑓 (𝑆).

The LP relaxation is of the form 𝐴𝑥 ≥ 𝑏, 𝑥 ∈ [0, 1]𝑚 . We number the edges as


𝑒1 , 𝑒2 , . . . , 𝑒 𝑚 arbitrarily. Note that each row of 𝐴 corresponds to a set 𝑆 and the
non-zero entries in the row corresponding to 𝑆 are precisely for edges in 𝛿(𝑆). For
notational convenience we use 𝜒𝑆 to denote the 𝑚-dimensional row vector where
𝜒𝑆 (𝑖) = 0 if 𝑒 𝑖 ∉ 𝛿(𝑆) and 𝜒𝑆 (𝑖) = 1 if 𝑒 𝑖 ∈ 𝛿(𝑆). By the rank lemma, if 𝑥 is a basic
feasible solution that is fully fractional, then there are 𝑚 tight sets 𝑆1 , 𝑆2 , . . . , 𝑆𝑚
such that the vectors 𝜒𝑆1 , 𝜒𝑆2 , . . . , 𝜒𝑆𝑚 are linearly independent in ℝ 𝑚 . In other
words 𝑥 is the unique solution to the system 𝜒𝑆𝑇𝑖 𝑥 = 𝑓 (𝑆 𝑖 ), 1 ≤ 𝑖 ≤ 𝑚. Note
that for a given basic feasible solution 𝑥 there can be many such bases. A key
technical lemma is that one can choose a nice one.

Lemma 13.3. Let 𝑥 be a basic feasible solution to the cut covering LP relaxation of
a skew-supermodular function 𝑓 where 𝑥 𝑒 ∈ (0, 1) for all 𝑒. Then there is a laminar
family ℒ of tight sets 𝑆1 , 𝑆2 , . . . , 𝑆𝑚 such that 𝑥 is the unique solution to the system
𝜒𝑆𝑇𝑖 𝑥 = 𝑓 (𝑆 𝑖 ).
Figure 13.3: Laminar family of tight sets.

We need an auxiliary uncrossing lemma.

Lemma 13.4. Suppose 𝐴 and 𝐵 are two tight sets with respect to 𝑥 such that 𝐴, 𝐵 cross.
Then one of the following holds:

• 𝐴 ∩ 𝐵, 𝐴 ∪ 𝐵 are tight and 𝜒𝐴 + 𝜒𝐵 = 𝜒𝐴∪𝐵 + 𝜒𝐴∩𝐵 .

• 𝐴 − 𝐵, 𝐵 − 𝐴 are tight and 𝜒𝐴 + 𝜒𝐵 = 𝜒𝐴−𝐵 + 𝜒𝐵−𝐴 .

Proof. Since 𝑓 is skew-supermodular 𝑓 (𝐴) + 𝑓 (𝐵) ≤ 𝑓 (𝐴 ∩ 𝐵) + 𝑓 (𝐴 ∪ 𝐵) or


𝑓 (𝐴) + 𝑓 (𝐵) ≤ 𝑓 (𝐴 − 𝐵) + 𝑓 (𝐵 − 𝐴). We will consider the first case.
𝐴, 𝐵 are tight, hence 𝑥(𝛿(𝐴)) = 𝑓 (𝐴) and 𝑥(𝛿(𝐵)) = 𝑓 (𝐵). Moreover the
function ℎ(𝑆) = 𝑥(𝛿(𝑆)) is submodular (recall that the cut capacity function
in an undirected graph is symmetric submodular). Thus 𝑥(𝛿(𝐴)) + 𝑥(𝛿(𝐵)) ≥
𝑥(𝛿(𝐴∪𝐵))+𝑥(𝛿(𝐴∩𝐵)). We also have by feasibility of 𝑥 that 𝑥(𝛿(𝐴∪𝐵)) ≥ 𝑓 (𝐴∪𝐵)
and 𝑥(𝛿(𝐴 ∩ 𝐵)) ≥ 𝑓 (𝐴 ∩ 𝐵). Putting together we have

𝑥(𝛿(𝐴))+𝑥(𝛿(𝐵)) = 𝑓 (𝐴)+ 𝑓 (𝐵) ≤ 𝑓 (𝐴∩𝐵)+ 𝑓 (𝐴∪𝐵) ≤ 𝑥(𝛿(𝐴∪𝐵))+𝑥(𝛿(𝐴∩𝐵)) ≤ 𝑥(𝛿(𝐴))+𝑥(𝛿(𝐵)).

This implies that 𝑥(𝛿(𝐴 ∪ 𝐵)) = 𝑓 (𝐴 ∪ 𝐵) and 𝑥(𝛿(𝐴 ∩ 𝐵)) = 𝑓 (𝐴 ∩ 𝐵). Thus
both 𝐴 ∩ 𝐵 and 𝐴 ∪ 𝐵 are tight. Moreover we observe that 𝑥(𝛿(𝐴)) + 𝑥(𝛿(𝐵)) =
𝑥(𝛿(𝐴 ∪ 𝐵)) + 𝑥(𝛿(𝐴 ∩ 𝐵)) + 2𝑥(𝐸(𝐴 − 𝐵, 𝐵 − 𝐴)) where 𝐸(𝐴 − 𝐵, 𝐵 − 𝐴) is the
set of edges between 𝐴 − 𝐵 and 𝐵 − 𝐴. From the above tightness we see that
𝑥(𝛿(𝐴)) + 𝑥(𝛿(𝐵)) = 𝑥(𝛿(𝐴 ∪ 𝐵)) + 𝑥(𝛿(𝐴 ∩ 𝐵)), and since 𝑥 is fully fractional it
means that 𝐸(𝐴 − 𝐵, 𝐵 − 𝐴) = ∅. This implies that 𝜒𝐴 + 𝜒𝐵 = 𝜒𝐴∪𝐵 + 𝜒𝐴∩𝐵 (why?).
The second case is similar where we use posimodularity of the cut function.

Proof of Lemma 13.3. One natural way to proceed is as follows. We start with
tight sets 𝒮 = {𝑆1 , 𝑆2 , . . . , 𝑆𝑚 } such that 𝑥 is characterized as the unique solution

of the equations implied by these sets. If the family {𝑆1 , 𝑆2 , . . . , 𝑆𝑚 } is laminar


we are done. Otherwise we pick two crossing sets, say 𝑆_1, 𝑆_2 without loss of generality, and uncross them using Lemma 13.4. We get a new family 𝒮′ with 𝑚 tight sets, and the number of crossings in the new family goes down by at least one (as we saw previously in the proof of Lemma 12.8). Suppose we replace 𝑆_1, 𝑆_2 by 𝑆_1 ∩ 𝑆_2, 𝑆_1 ∪ 𝑆_2. The technical issue is to argue linear independence of the vectors in the new family. This is where we need the property 𝜒_{𝑆_1} + 𝜒_{𝑆_2} = 𝜒_{𝑆_1∩𝑆_2} + 𝜒_{𝑆_1∪𝑆_2}.
Although natural, the linear algebraic argument turns out to be a bit tedious.
Instead we use a slick argument. Let ℒ be a maximal laminar family of 𝑥-tight sets such that the vectors 𝜒_𝑆, 𝑆 ∈ ℒ are linearly independent. If |ℒ| = 𝑚 then we are done because we have 𝑚 linearly independent vectors that together span ℝ^𝑚.
Suppose |ℒ| < 𝑚. Then there must be a tight set 𝑆 such that 𝜒𝑆 is not spanned by
the vectors in ℒ. Choose a tight set 𝑆 that is not spanned and crosses the fewest
number of sets from ℒ. Since ℒ is maximal, there must be some set 𝑇 ∈ ℒ such
that 𝑆, 𝑇 cross (otherwise we can add 𝑆 to ℒ). Here we use Lemma 13.4 and
consider two cases. Suppose 𝑆 ∩ 𝑇, 𝑆 ∪ 𝑇 are tight. Note that 𝑆 ∩ 𝑇, 𝑆 ∪ 𝑇 cross
fewer sets in ℒ than 𝑆 does. By the choice of 𝑆, it must be the case that both
𝑆 ∩ 𝑇 and 𝑆 ∪ 𝑇 are spanned by ℒ. However, we have 𝜒𝑆 + 𝜒𝑇 = 𝜒𝑆∩𝑇 + 𝜒𝑆∪𝑇
which implies that 𝜒𝑆 is also spanned, a contradiction. The proof for the other
case when 𝑆 − 𝑇 and 𝑇 − 𝑆 are tight is similar. Thus we have |ℒ| = 𝑚 and this is
the desired family. 

13.2.2 Counting argument


The second key ingredient in the proof is a counting argument that exploits
Lemma 13.3. An easier counting argument shows that there is an edge with
𝑥 𝑒 ≥ 1/3 in any basic feasible solution. The tight bound of 1/2 is more delicate
and Jain’s original proof is perhaps a bit hard to understand (see [152]). The
argument has been subsequently refined and a “fractional token” based analysis
[126] was developed and this is the proof in [155]. The token based analysis is
flexible and powerful in iterated rounding based algorithms. In an attempt to
make the proof even more transparent, the author of this notes developed yet
another proof in [41]. We describe that proof below.
The proof is via contradiction where we assume that 0 < 𝑥_𝑒 < 1/2 for each 𝑒 ∈ 𝐸. We call the two nodes incident to an edge the endpoints of the edge. We say that an endpoint 𝑢 belongs to a set 𝑆 ∈ ℒ if 𝑆 is the minimal set from ℒ that contains 𝑢.
We consider the simplest setting where ℒ is a collection of disjoint sets, in
other words, all sets are maximal. In this case the counting argument is easy. Let
𝑚 = |𝐸| = |ℒ|. For each 𝑆 ∈ ℒ, 𝑓(𝑆) ≥ 1 and 𝑥(𝛿(𝑆)) = 𝑓(𝑆). If we assume that 𝑥_𝑒 < 1/2 for each 𝑒, we have |𝛿(𝑆)| ≥ 3 which implies that each 𝑆 contains at least
3 distinct endpoints. Thus, the 𝑚 disjoint sets require a total of 3𝑚 endpoints. However the total number of endpoints is at most 2𝑚 since there are 𝑚 edges, leading to a contradiction.

Figure 13.4: Easy case of the counting argument.

Now we consider a second setting where the forest associated with ℒ has 𝑘
leaves and ℎ internal nodes but each internal node has at least two children. In
this case, following Jain, we can easily prove a weaker statement that 𝑥 𝑒 ≥ 1/3
for some edge 𝑒. If not, then each leaf set 𝑆 must have four edges leaving it
and hence the total number of endpoints must be at least 4𝑘. However, if each
internal node has at least two children, we have ℎ < 𝑘 and since ℎ + 𝑘 = 𝑚 we
have 𝑘 > 𝑚/2. This implies that there must be at least 4𝑘 > 2𝑚 endpoints since
the leaf sets are disjoint. But 𝑚 edges can have at most 2𝑚 endpoints. Our
assumption on each internal node having at least two children is obviously a
restriction. So far we have not used the fact that the vectors 𝜒𝑆 , 𝑆 ∈ ℒ are linearly
independent. We can handle the general case to prove 𝑥 𝑒 ≥ 1/3 by using the
following lemma.

Lemma 13.5. Suppose 𝐶 is a unique child of 𝑆. Then there must be at least two
endpoints in 𝑆 that belong to 𝑆.

Proof. If there is no endpoint that belongs to 𝑆 then 𝛿(𝑆) = 𝛿(𝐶) but then 𝜒𝑆 and
𝜒𝐶 are linearly dependent. Suppose there is exactly one endpoint that belongs
to 𝑆 and let it be the endpoint of edge 𝑒. But then 𝑥(𝛿(𝑆)) = 𝑥(𝛿(𝐶)) + 𝑥 𝑒 or
𝑥(𝛿(𝑆)) = 𝑥(𝛿(𝐶)) − 𝑥 𝑒 . Both cases are not possible because 𝑥(𝛿(𝑆)) = 𝑓 (𝑆) and
𝑥(𝛿(𝐶)) = 𝑓 (𝐶) where 𝑓 (𝑆) and 𝑓 (𝐶) are positive integers while 𝑥 𝑒 ∈ (0, 1). Thus
there are at least two end points that belong to 𝑆. 
Using the preceding lemma we prove that 𝑥 𝑒 ≥ 1/3 for some edge 𝑒. Let 𝑘
be the number of leaves in ℒ and ℎ be the number of internal nodes with at
least two children and let ℓ be the number of internal nodes with exactly one
child. We again have ℎ < 𝑘 and we also have 𝑘 + ℎ + ℓ = 𝑚. Each leaf has at
CHAPTER 13. SURVIVABLE NETWORK DESIGN PROBLEM 168

least four endpoints. Each internal node with exactly one child has at least two
end points which means the total number of endpoints is at least 4𝑘 + 2ℓ . But
4𝑘 + 2ℓ = 2𝑘 + 2𝑘 + 2ℓ > 2𝑘 + 2ℎ + 2ℓ > 2𝑚 and there are only 2𝑚 endpoints
for 𝑚 edges. In other words, we can ignore the internal nodes with exactly one
child since there are two endpoints in such a node/set and we can effectively
charge one edge to such a node.
We now come to the more delicate argument to prove the tight bound that 𝑥_𝑒 ≥ 1/2 for some edge 𝑒. We describe an invariant that effectively reduces the argument to the case where we can assume that ℒ is a collection of leaves. This
is encapsulated in the lemma below which requires some notation. Let 𝛼(𝑆) be
the number of sets of ℒ contained in 𝑆 including 𝑆 itself. Let 𝛽(𝑆) be the number
of edges whose both endpoints lie inside 𝑆. Recall that 𝑓 (𝑆) is the requirement of
𝑆.

Lemma 13.6. For all 𝑆 ∈ ℒ, 𝑓 (𝑆) ≥ 𝛼(𝑆) − 𝛽(𝑆).

Assuming that the lemma is true we can do an easy counting argument.


Let 𝑅_1, 𝑅_2, . . . , 𝑅_ℎ be the maximal sets in ℒ (the roots of the forest). Note that ∑_{𝑖=1}^{ℎ} 𝛼(𝑅_𝑖) = |ℒ| = 𝑚. Applying the claim to each 𝑅_𝑖 and summing up,

∑_{𝑖=1}^{ℎ} 𝑓(𝑅_𝑖) ≥ ∑_{𝑖=1}^{ℎ} 𝛼(𝑅_𝑖) − ∑_{𝑖=1}^{ℎ} 𝛽(𝑅_𝑖) ≥ 𝑚 − ∑_{𝑖=1}^{ℎ} 𝛽(𝑅_𝑖).


Note that ∑_{𝑖=1}^{ℎ} 𝑓(𝑅_𝑖) is the total requirement of the maximal sets, and 𝑚 − ∑_{𝑖=1}^{ℎ} 𝛽(𝑅_𝑖) is the total number of edges that cross the sets 𝑅_1, . . . , 𝑅_ℎ. Let 𝐸′ be the set of edges crossing these maximal sets. Now we are back to the setting with ℎ disjoint sets and the edges 𝐸′ with ∑_{𝑖=1}^{ℎ} 𝑓(𝑅_𝑖) ≥ |𝐸′|. This easily leads to a contradiction as before if we assume that 𝑥_𝑒 < 1/2 for all 𝑒 ∈ 𝐸′. Formally, each set 𝑅_𝑖 requires more than 2𝑓(𝑅_𝑖) edges crossing it from 𝐸′ and therefore 𝑅_𝑖 contains at least 2𝑓(𝑅_𝑖) + 1 endpoints of edges from 𝐸′. Since 𝑅_1, . . . , 𝑅_ℎ are disjoint, the total number of endpoints is at least 2∑_𝑖 𝑓(𝑅_𝑖) + ℎ which is strictly more than 2|𝐸′|.

Proof of Lemma 13.6. Thus, it remains to prove the claim which we do by induc-
tively starting at the leaves of the forest for ℒ.
Case 1: 𝑆 is a leaf node. We have 𝑓 (𝑆) ≥ 1 while 𝛼(𝑆) = 1 and 𝛽(𝑆) = 0 which
verifies the claim.
(Figure 13.5: 𝑆 is an internal node with several children, showing the different types of edges that play a role; 𝑝 refers to the parent set 𝑆, 𝑐 refers to a child set and 𝑜 refers to outside.)

Case 2: 𝑆 is an internal node with 𝑘 children 𝐶_1, 𝐶_2, . . . , 𝐶_𝑘. See Figure 13.5 for the different types of edges that are relevant. 𝐸_{𝑐𝑐} is the set of edges with endpoints in two different children of 𝑆. 𝐸_{𝑐𝑝} is the set of edges that cross exactly one child but do not cross 𝑆. 𝐸_{𝑝𝑜} is the set of edges that cross 𝑆 but do not cross

any of the children. 𝐸 𝑐𝑜 is the set of edges that cross both a child and 𝑆. This
notation is borrowed from [155].
Let 𝛾(𝑆) be the number of edges whose both endpoints belong to 𝑆 but not
to any child of 𝑆. Note that 𝛾(𝑆) = |𝐸 𝑐𝑐 | + |𝐸 𝑐𝑝 |.
Then,

𝛽(𝑆) = 𝛾(𝑆) + ∑_{𝑖=1}^{𝑘} 𝛽(𝐶_𝑖)
     ≥ 𝛾(𝑆) + ∑_{𝑖=1}^{𝑘} 𝛼(𝐶_𝑖) − ∑_{𝑖=1}^{𝑘} 𝑓(𝐶_𝑖)        (13.1)
     = 𝛾(𝑆) + 𝛼(𝑆) − 1 − ∑_{𝑖=1}^{𝑘} 𝑓(𝐶_𝑖)

(13.1) follows by applying the inductive hypothesis to each child. From the
preceding inequality, to prove that 𝛽(𝑆) ≥ 𝛼(𝑆) − 𝑓 (𝑆) (the claim for 𝑆), it suffices
to show the following inequality.

𝛾(𝑆) ≥ ∑_{𝑖=1}^{𝑘} 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1.        (13.2)

The right hand side of the above inequality can be written as:

∑_{𝑖=1}^{𝑘} 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 = ∑_{𝑒∈𝐸_{𝑐𝑐}} 2𝑥_𝑒 + ∑_{𝑒∈𝐸_{𝑐𝑝}} 𝑥_𝑒 − ∑_{𝑒∈𝐸_{𝑝𝑜}} 𝑥_𝑒 + 1.        (13.3)

We consider two subcases.


Case 2.1: 𝛾(𝑆) = 0. This implies that 𝐸_{𝑐𝑐} and 𝐸_{𝑐𝑝} are empty. Since 𝜒_{𝛿(𝑆)} is linearly independent from 𝜒_{𝛿(𝐶_1)}, . . . , 𝜒_{𝛿(𝐶_𝑘)}, we must have that 𝐸_{𝑝𝑜} is not empty and hence ∑_{𝑒∈𝐸_{𝑝𝑜}} 𝑥_𝑒 > 0. Therefore, in this case,

∑_{𝑖=1}^{𝑘} 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 = ∑_{𝑒∈𝐸_{𝑐𝑐}} 2𝑥_𝑒 + ∑_{𝑒∈𝐸_{𝑐𝑝}} 𝑥_𝑒 − ∑_{𝑒∈𝐸_{𝑝𝑜}} 𝑥_𝑒 + 1 = − ∑_{𝑒∈𝐸_{𝑝𝑜}} 𝑥_𝑒 + 1 < 1.

Since the left hand side is an integer, it follows that ∑_{𝑖=1}^{𝑘} 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 ≤ 0 = 𝛾(𝑆).
Case 2.2: 𝛾(𝑆) ≥ 1. Recall that 𝛾(𝑆) = |𝐸_{𝑐𝑐}| + |𝐸_{𝑐𝑝}|. We have

∑_{𝑖=1}^{𝑘} 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 = ∑_{𝑒∈𝐸_{𝑐𝑐}} 2𝑥_𝑒 + ∑_{𝑒∈𝐸_{𝑐𝑝}} 𝑥_𝑒 − ∑_{𝑒∈𝐸_{𝑝𝑜}} 𝑥_𝑒 + 1 ≤ ∑_{𝑒∈𝐸_{𝑐𝑐}} 2𝑥_𝑒 + ∑_{𝑒∈𝐸_{𝑐𝑝}} 𝑥_𝑒 + 1.

By our assumption that 𝑥_𝑒 < 1/2 for each 𝑒, we have ∑_{𝑒∈𝐸_{𝑐𝑐}} 2𝑥_𝑒 < |𝐸_{𝑐𝑐}| if |𝐸_{𝑐𝑐}| > 0, and similarly ∑_{𝑒∈𝐸_{𝑐𝑝}} 𝑥_𝑒 < |𝐸_{𝑐𝑝}|/2 if |𝐸_{𝑐𝑝}| > 0. Since 𝛾(𝑆) = |𝐸_{𝑐𝑐}| + |𝐸_{𝑐𝑝}| ≥ 1 we conclude that

∑_{𝑒∈𝐸_{𝑐𝑐}} 2𝑥_𝑒 + ∑_{𝑒∈𝐸_{𝑐𝑝}} 𝑥_𝑒 < 𝛾(𝑆).

Putting together we have

∑_{𝑖=1}^{𝑘} 𝑓(𝐶_𝑖) − 𝑓(𝑆) + 1 ≤ ∑_{𝑒∈𝐸_{𝑐𝑐}} 2𝑥_𝑒 + ∑_{𝑒∈𝐸_{𝑐𝑝}} 𝑥_𝑒 + 1 < 𝛾(𝑆) + 1,

and since the left hand side is an integer it is at most 𝛾(𝑆), as desired. □
Tightness of the analysis: The LP relaxation has an integrality gap of 2 even
for the MST problem. Let 𝐺 be the cycle on 𝑛 vertices with all edge costs equal
to 1. Then setting 𝑥 𝑒 = 1/2 on each edge is feasible and the cost is 𝑛/2 while the
MST cost is 𝑛 − 1. Note that the optimum fractional solution here is 1/2-integral.
However, there are more involved examples (see Jain’s paper or [152]) based on
the Petersen graph where the optimum basic feasible solution is not half-integral
while there are one or more edges with fractional value at least 1/2. Jain’s
iterated rounding algorithm is an unusual algorithm in that the output of the
algorithm may not have any discernible structure until it is completely done.

Running time: The strength of the iterated rounding approach is the remark-
able approximation guarantees it delivers for various problems. The weakness
is the high running time which is due to two reasons. First, one needs a basic
feasible solution for the LP — this is typically much more expensive than finding
an approximately good feasible solution. Second, the algorithm requires com-
puting an LP solution many times. Finding faster algorithms with comparable
approximation guarantees is an open research area.
Chapter 14

Introduction to Cut and Partitioning Problems

Graph cut and partitioning problems such as the well-known 𝑠-𝑡 minimum cut problem play a fundamental role in combinatorial optimization. Many natural
cut problems that go beyond the 𝑠-𝑡 cut problem are NP-Hard and there has
been extensive work on approximation algorithms and heuristics since they
arise in many applications. In addition to algorithms, the structural results that
capture approximate relationships between flows and cuts (called flow-cut gaps),
and the connections to the theory of metric embeddings as well as graph theory
have led to many beautiful and important results.

14.1 𝒔-𝒕 mincut via LP Rounding and Maxflow-Mincut


Let 𝐺 = (𝑉 , 𝐸) be a directed graph with edge costs 𝑐 : 𝐸 → ℝ+ . Let 𝑠, 𝑡 ∈ 𝑉 be
distinct vertices. The 𝑠-𝑡 mincut problem is to find the cheapest set of edges
𝐸0 ⊆ 𝐸 such that there is no 𝑠-𝑡 path in 𝐺 − 𝐸0. An 𝑠-𝑡 cut is often also defined as
𝛿^+(𝑆) for some 𝑆 ⊂ 𝑉 where 𝑠 ∈ 𝑆, 𝑡 ∈ 𝑉 − 𝑆. Suppose 𝐸′ is an 𝑠-𝑡 cut. Let 𝑆 be the set of nodes reachable from 𝑠 in 𝐺 − 𝐸′; then 𝛿^+(𝑆) ⊆ 𝐸′ and moreover 𝛿^+(𝑆) is an 𝑠-𝑡 cut. Thus, it suffices to focus on such limited types of cuts, however in some more general settings it is useful to keep these notions separate.
It is well-known that an 𝑠-𝑡 mincut can be computed efficiently via 𝑠-𝑡 maximum flow, which also establishes the maxflow-mincut theorem. This is a
fundamental theorem in combinatorial optimization with many direct and
indirect applications.

Theorem 14.1. Let 𝐺 = (𝑉 , 𝐸) be a directed graph with rational edge capacities


𝑐 : 𝐸 → ℚ+ and let 𝑠, 𝑡 ∈ 𝑉 be distinct vertices. Then the 𝑠-𝑡 maximum flow value in 𝐺


is equal to the 𝑠-𝑡 minimum cut value and both can be computed in strongly polynomial
time. Further, if 𝑐 is integer valued then there exists an integer-valued maximum flow.
The proof of the preceding theorem is typically established via the augment-
ing path algorithm for computing a maximum flow. Here we take a different
approach to finding an 𝑠-𝑡 cut via an LP relaxation whose dual can be seen as
the maxflow LP.
Suppose we want to find an 𝑠-𝑡 mincut. We can write it as an integer program
as follows. For each edge 𝑒 ∈ 𝐸 we have a boolean variable 𝑥 𝑒 ∈ {0, 1} to indicate
whether we cut 𝑒. The constraint is that for any path 𝑃 ∈ 𝒫𝑠,𝑡 (here 𝒫𝑠,𝑡 is the
set of all 𝑠-𝑡 paths) we must choose at least one edge from 𝑃. This leads to the
following IP.

min ∑_{𝑒∈𝐸} 𝑐(𝑒)𝑥_𝑒
    ∑_{𝑒∈𝑃} 𝑥_𝑒 ≥ 1        𝑃 ∈ 𝒫_{𝑠,𝑡}
    𝑥_𝑒 ∈ {0, 1}           𝑒 ∈ 𝐸.

The LP relaxation is obtained by changing 𝑥 𝑒 ∈ {0, 1} to 𝑥 𝑒 ≥ 0 since we can


omit the constraints 𝑥 𝑒 ≤ 1. We note that the LP has an exponential number of
constraints, however, we have an efficient separation oracle since it corresponds
to computing the shortest 𝑠-𝑡 path. The LP can be viewed as assigning lengths
to the edges such that the shortest path between 𝑠 and 𝑡 according to the lengths
is at least 1. This is a fractional relaxation of the cut.
Rounding the LP: We will prove that the LP relaxation can be rounded without
any loss! The rounding algorithm is described below.

Theta-Rounding(𝐺, 𝑠, 𝑡 )

1. Solve LP to obtain fractional solution 𝑦

2. For each 𝑣 ∈ 𝑉 let 𝑑 𝑦 (𝑠, 𝑣) be the shortest path distance from 𝑠 to 𝑣


according to edge lengths 𝑦 𝑒 .

3. Pick 𝜃 uniformly at random from (0, 1)

4. Output 𝐸0 = 𝛿 + (𝐵(𝑠, 𝜃)) where 𝐵(𝑠, 𝜃) = {𝑣 | 𝑑 𝑦 (𝑠, 𝑣) ≤ 𝜃} is the ball of


radius 𝜃 around 𝑠

It is easy to see that the algorithm outputs a valid 𝑠-𝑡 cut since 𝑑_𝑦(𝑠, 𝑡) ≥ 1 by feasibility of the LP solution 𝑦 and hence 𝑡 ∉ 𝐵(𝑠, 𝜃) for any 𝜃 < 1.
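The derandomized rounding is easy to implement once a feasible fractional solution 𝑦 is available (e.g., from an LP solver). The sketch below (ours, with hypothetical attribute names) uses networkx Dijkstra under the 𝑦-lengths and tries every distance value as the threshold 𝜃, returning the cheapest resulting cut.

    import networkx as nx

    def theta_round(G, s, t, y):
        """G: nx.DiGraph with attribute 'cost' on arcs; y: dict {(u, v): LP length}
        with dist_y(s, t) >= 1.  Returns the cheapest cut delta^+(B(s, theta))."""
        H = nx.DiGraph()
        H.add_nodes_from(G.nodes())
        for (u, v) in G.edges():
            H.add_edge(u, v, weight=y[(u, v)])
        dist = nx.single_source_dijkstra_path_length(H, s, weight='weight')
        dt = dist.get(t, float('inf'))
        best_cut, best_cost = None, float('inf')
        # only thresholds strictly below d_y(s, t) >= 1 give valid s-t cuts
        for theta in sorted(set(d for d in dist.values() if d < dt)):
            B = {v for v, d in dist.items() if d <= theta}
            cut = [(u, v) for (u, v) in G.edges() if u in B and v not in B]
            cost = sum(G[u][v]['cost'] for (u, v) in cut)
            if cost < best_cost:
                best_cut, best_cost = cut, cost
        return best_cost, best_cut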

Lemma 14.1. Let 𝑒 = (𝑢, 𝑣) be an edge. P[𝑒 is cut by algorithm] ≤ 𝑦(𝑢, 𝑣).
Proof. An edge 𝑒 = (𝑢, 𝑣) is cut iff 𝑑 𝑦 (𝑠, 𝑢) ≤ 𝜃 < 𝑑 𝑦 (𝑠, 𝑣). Hence the edge is not
cut if 𝑑 𝑦 (𝑠, 𝑣) ≤ 𝑑 𝑦 (𝑠, 𝑢). If 𝑑 𝑦 (𝑠, 𝑣) > 𝑑 𝑦 (𝑠, 𝑢) we have 𝑑 𝑦 (𝑠, 𝑣)−𝑑 𝑦 (𝑠, 𝑢) ≤ 𝑦(𝑢, 𝑣).
Since 𝜃 is chosen uniformly at random from (0, 1) the probability that 𝜃 lies in
the interval [𝑑 𝑦 (𝑠, 𝑢), 𝑑 𝑦 (𝑠, 𝑣)] is at most 𝑦(𝑢, 𝑣). 
Corollary 14.2. The expected cost of the cut output by the algorithm is at most
∑_{𝑒} 𝑐(𝑒) 𝑦_𝑒.

The preceding corollary shows that there is an integral cut whose cost is at
most that of the LP relaxation which implies that the LP relaxation yields an
optimum solution. The algorithm can be easily derandomized by trying “all
possible values of 𝜃”. What does this mean? Once we have 𝑦 we compute the
shortest path distances from 𝑠 to each vertex 𝑣. We can think of these distances
as producing a line embedding where we place 𝑠 at 0 and each vertex 𝑣 at 𝑑_𝑦(𝑠, 𝑣).
The only interesting choices for 𝜃 are given by the 𝑛 values of 𝑑_𝑦(𝑠, 𝑣); one
can try each of them with the corresponding cut and output the cheapest one. It is
guaranteed to cost at most ∑_{𝑒} 𝑐(𝑒) 𝑦_𝑒.
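A minimal sketch of the derandomized rounding in Python, assuming the LP
solution 𝑦 is available as a dictionary of edge lengths and using networkx for
the shortest path computation (the function and attribute names here are
illustrative, not from the notes):

import networkx as nx

def theta_rounding(G, s, t, y):
    # G: nx.DiGraph with edge attribute 'cost'; y: dict edge -> LP length
    H = G.copy()
    nx.set_edge_attributes(H, y, name="length")
    dist = nx.single_source_dijkstra_path_length(H, s, weight="length")
    best_cut, best_cost = None, float("inf")
    # only the n values d_y(s, v) matter as choices of theta
    for theta in sorted(set(dist.values())):
        if theta >= dist.get(t, float("inf")):
            break  # the ball must not swallow t
        ball = {v for v in G.nodes if dist.get(v, float("inf")) <= theta}
        cut = [(u, v) for (u, v) in G.edges if u in ball and v not in ball]
        cost = sum(G[u][v]["cost"] for (u, v) in cut)
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return best_cut, best_cost

By Corollary 14.2 the cheapest of these cuts costs at most the LP value.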
What is the dual LP? We write it down below and you can verify that it is
the path version of the maxflow!

    max  ∑_{𝑃∈𝒫_{𝑠,𝑡}} 𝑧_𝑃
    s.t. ∑_{𝑃: 𝑒∈𝑃} 𝑧_𝑃 ≤ 𝑐(𝑒)      for all 𝑒 ∈ 𝐸
         𝑧_𝑃 ≥ 0                     for all 𝑃 ∈ 𝒫_{𝑠,𝑡}.
Thus, we have seen a proof of the maxflow-mincut theorem via LP rounding
of a relaxation for the 𝑠-𝑡 cut problem.
A compact LP via distance variables: The path based LP relaxation for the 𝑠-𝑡
mincut problem is natural and easy to formulate. We can also express shortest
path constraints via distance variables. We first write a bigger LP than necessary
via variables 𝑑(𝑢, 𝑣) for all ordered pairs of vertices (hence there are 𝑛² variables).
We need triangle inequality constraints to enforce that 𝑑(𝑢, 𝑣) values respect
shortest path distances.

    min  ∑_{(𝑢,𝑣)∈𝐸} 𝑐(𝑢, 𝑣) 𝑑(𝑢, 𝑣)
    s.t. 𝑑(𝑢, 𝑣) + 𝑑(𝑣, 𝑤) − 𝑑(𝑢, 𝑤) ≥ 0      for all 𝑢, 𝑣, 𝑤 ∈ 𝑉
         𝑑(𝑠, 𝑡) ≥ 1
         𝑑(𝑢, 𝑣) ≥ 0                            for all (𝑢, 𝑣) ∈ 𝑉 × 𝑉

Although the preceding LP is wasteful in some ways it is quite generic and


can be used for many cut problems where we are interested in distances between
multiple pairs of vertices.
Now we consider a more compact LP formulation. We have two types of
variables, 𝑥(𝑢, 𝑣) for each edge (𝑢, 𝑣) ∈ 𝐸 and 𝑑𝑣 variables for each 𝑣 ∈ 𝑉 to
indicate distances from 𝑠.

    min  ∑_{(𝑢,𝑣)∈𝐸} 𝑐(𝑢, 𝑣) 𝑥(𝑢, 𝑣)
    s.t. 𝑑_𝑣 ≤ 𝑑_𝑢 + 𝑥(𝑢, 𝑣)      for all (𝑢, 𝑣) ∈ 𝐸
         𝑑_𝑠 = 0
         𝑑_𝑡 ≥ 1
         𝑑_𝑣 ≥ 0                    for all 𝑣 ∈ 𝑉
         𝑥(𝑢, 𝑣) ≥ 0                for all (𝑢, 𝑣) ∈ 𝐸

(The constraint 𝑑_𝑠 = 0 anchors the distances at the source; without it one could set every 𝑑_𝑣 = 1 and every 𝑥(𝑢, 𝑣) = 0.)

Exercise 14.1. Write the dual of the above LP and see it as the standard edge-
based flow formulation for 𝑠-𝑡 maximum flow.
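For concreteness, here is a small sketch of how the compact LP above might be
set up with an off-the-shelf solver (scipy's linprog). The encoding, the function
name, and the input format (node list, directed edge list, cost dictionary) are
illustrative assumptions, not part of the notes:

from scipy.optimize import linprog

def st_cut_lp(nodes, edges, cost, s, t):
    # variable order: x_e for each edge, then d_v for each vertex
    m, n = len(edges), len(nodes)
    xi = {e: i for i, e in enumerate(edges)}        # index of x_e
    di = {v: m + j for j, v in enumerate(nodes)}    # index of d_v
    c = [cost[e] for e in edges] + [0.0] * n
    A_ub, b_ub = [], []
    for (u, v) in edges:                            # d_v <= d_u + x(u, v)
        row = [0.0] * (m + n)
        row[di[v]], row[di[u]], row[xi[(u, v)]] = 1.0, -1.0, -1.0
        A_ub.append(row); b_ub.append(0.0)
    row = [0.0] * (m + n); row[di[t]] = -1.0        # d_t >= 1
    A_ub.append(row); b_ub.append(-1.0)
    bounds = [(0, None)] * (m + n)
    bounds[di[s]] = (0, 0)                          # d_s = 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.fun, {e: res.x[xi[e]] for e in edges}

By the rounding argument above, the optimal value of this LP equals the 𝑠-𝑡 mincut.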

14.2 A Catalog of Cut and Partitioning Problems


Here we list a few prominent problems.
Multiway Cut: Input is an undirected graph 𝐺 = (𝑉 , 𝐸) and edge weights
𝑤 : 𝐸 → ℝ+ and a set of 𝑘 terminals {𝑠 1 , 𝑠2 , . . . , 𝑠 𝑘 } ⊆ 𝑉. The goal is to remove
a minimum weight subset of edges such that there is no path from 𝑠 𝑖 to 𝑠 𝑗 for
any 𝑖 ≠ 𝑗. This problem is NP-Hard even for 𝑘 = 3. Node-weighted Multiway
Cut is the generalization to the setting when 𝐺 has node-weights instead of
edge-weights. Directed Multiway Cut is the version where 𝐺 is a directed
graph and the goal is to remove a minimum weight set of edges so that there
is no 𝑠 𝑖 -𝑠 𝑗 path for any 𝑖 ≠ 𝑗. Note that the directed version generalizes the
node-weighted undirected problem as well.
𝒌-Cut: In 𝑘-Cut the input is a graph 𝐺 and integer 𝑘. The goal is to remove
a minimum weight set of edges such that the resulting graph has at least 𝑘
non-trivial components. The problem is NP-Hard when 𝑘 is part of the input but
admits a polynomial time algorithm for any fixed 𝑘. A common generalization
of 𝑘-Cut and Multiway Cut is the so-called Steiner 𝒌-Cut: here the input is a
graph 𝐺, a set of terminals 𝑆 ⊆ 𝑉 and an integer 𝑘 where 𝑘 ≤ |𝑆|. The goal
is to remove a minimum-weight subset of edges such that there are at least 𝑘
components that each contain a terminal.

Multicut: The input is an edge-weighted graph 𝐺 = (𝑉 , 𝐸) and 𝑘 source-sink


pairs (𝑠 1 , 𝑡1 ), (𝑠 2 , 𝑡2 ), . . . , (𝑠 𝑘 , 𝑡 𝑘 ) and the goal is to remove a minimum-weight set
of edges so that there is no path from 𝑠 𝑖 to 𝑡 𝑖 for all 𝑖 ∈ [𝑘]. Note that Multiway
Cut is a special case. Multicut also naturally generalizes to the node-weighted
setting and to directed graphs.
Sparsest Cut: We discuss the more general version first called the Non-uniform
Sparsest Cut. The input is an edge-capacitated graph 𝐺 = (𝑉 , 𝐸) and 𝑘 source-
sink pairs (𝑠 1 , 𝑡1 ), . . . , (𝑠 𝑘 , 𝑡 𝑘 ) and associated non-negative scalar demand values
𝐷1 , 𝐷2 , . . . , 𝐷 𝑘 . The goal is to find a subset of edges 𝐸0 ⊆ 𝐸 such that the ratio
𝑐(𝐸0)/𝐷(𝐸0) is minimized, where 𝑐(𝐸0) is the total capacity of edges in 𝐸0
and 𝐷(𝐸0) is the total demand of pairs separated by removing 𝐸0. In undirected
graphs one can see that this ratio is also minimized by taking 𝐸0 = 𝛿(𝑆) for a connected
component 𝑆 of 𝐺 − 𝐸0, and hence one can alternatively phrase the problem as
min_{𝑆⊆𝑉} 𝑐(𝛿(𝑆))/𝐷(𝛿(𝑆)). In the
Uniform Sparsest Cut problem we associate 𝐷(𝑢, 𝑣) = 1 for every pair of vertices
and hence one wants to find min_{𝑆⊆𝑉} 𝑐(𝛿(𝑆))/(|𝑆||𝑉 − 𝑆|). This is closely related (to within
a factor of 2) to the problem of finding the expansion of a graph where the
objective is min_{𝑆⊆𝑉, |𝑆|≤|𝑉|/2} 𝑐(𝛿(𝑆))/|𝑆|. Other closely related variants are to find the
conductance and sparsest cut for product multicommodity flow instances where
the demands are induced by vertex weights (that is, 𝐷(𝑢, 𝑣) = 𝜋(𝑢)𝜋(𝑣) where
𝜋 : 𝑉 → ℝ+).
Sparsest Cut can be generalized to node-weighted settings and to directed
graphs. In directed graphs there is a substantial difference when considering
the non-uniform versus the uniform settings because the latter can be thought
of a symmetric version. We will not detail the issues here.
Minimum Bisection and Balanced Partitioning: The input is an undirected
edge-weighted graph 𝐺. In Minimum Bisection the goal is to partition 𝑉 into
𝑉1 , 𝑉2 where ⌊|𝑉|/2⌋ ≤ |𝑉1 |, |𝑉2 | ≤ ⌈|𝑉|/2⌉ so as to minimize the weight of the
edges crossing the partition. In Balanced Partition the sizes of the two parts
can be approximately equal — there is a balance parameter 𝛼 ∈ (0, 1/2] and
the goal is to partition 𝑉 into 𝑉1 , 𝑉2 such that 𝛼|𝑉 | ≤ |𝑉1 |, |𝑉2 | ≤ (1 − 𝛼)|𝑉 |.
These problems are partially motivated by parallel and distributed computation
where a graph representing some computation is recursively decomposed into
several pieces while minimizing the communication required between the pieces
(captured by the edge weights). In this context partitioning into 𝑘 given pieces
to minimize the number of edges while each piece has size roughly |𝑉 |/𝑘 is also
considered as Balanced 𝑘-Partition.
Hypergraphs and Submodular functions: One can generalize several edge-
weighted problems to node-weighted problems in a natural fashion but in some
cases it is useful to consider other ways to model. In this context hypergraphs

come in handy and have their own intrinsic appeal. Recall that a hypergraph
𝐻 = (𝑉 , 𝐸) consists of a vertex set 𝑉 and a set of hyperedges 𝐸 where each 𝑒 ∈ 𝐸
is a subset of vertices; thus 𝑒 ⊆ 𝑉. The rank of a hypergraph is the maximum
cardinality of a hyperedge, typically denoted by 𝑟. Graphs are rank 2 hypergraphs.
One can typically reduce a hypergraph cut problem to a node-weighted cut
problem and vice-versa with some distinctions based on the specific problem at
hand. Finally some of the problems can be naturally lifted to the setting where
we consider an abstract submodular set function defined over the vertex set
𝑉; that is we are given 𝑓 : 2𝑉 → ℝ+ rather than a graph or a hypergraph and
the goal is to partition 𝑉 where the cost of the partition is now measured with
respect to 𝑓 .
Chapter 15

Multiway Cut

The Multiway Cut problem is the following1. Given a graph 𝐺 = (𝑉 , 𝐸) with edge


weights 𝑤 : 𝐸 → ℝ+ , and 𝑘 terminal vertices {𝑠 1 , 𝑠2 , . . . , 𝑠 𝑘 }, remove a minimum
weight set of edges such that there is no 𝑠 𝑖 -𝑠 𝑗 path left for any 𝑖 ≠ 𝑗. We phrase it
in this long-winded way since the definition then naturally generalizes to other
settings. In the case of undirected graphs it is also useful to view the problem as a
partition problem. Consider a feasible solution to the given instance; it consists of
a set of edges whose removal leaves at least 𝑘 components and no two terminals
are in the same connected component. One can verify that in a minimal solution,
if the original graph is connected, then we can assume that there will be exactly 𝑘
components with each terminal contained in a separate one. Thus, an alternative
view is to find a partition of 𝑉 into 𝑘 sets 𝑉1 , 𝑉2 , . . . , 𝑉𝑘 such that 𝑠 𝑖 ∈ 𝑉𝑖 for
1 ≤ 𝑖 ≤ 𝑘 and the goal is to minimize ∑_𝑖 𝑤(𝛿(𝑉𝑖 ))/2, where the factor of 2 is
because any edge crossing the partition is counted exactly twice in ∑_𝑖 𝑤(𝛿(𝑉𝑖 )).
When 𝑘 = 2 we can solve the problem in polynomial time since it is the same
problem as finding an 𝑠 1 -𝑠 2 minimum cut in 𝐺. The problem is known to be
NP-Hard and APX-Hard even when 𝑘 = 3 [46]. In this chapter we first consider a
simple combinatorial heuristic that yields a 2(1 − 1/𝑘)-approximation, followed
by a distance based LP relaxation that also yields a 2(1 − 1/𝑘)-approximation
with a matching integrality gap. We then describe a geometric relaxation that
yields the current best approximation ratio of 1.2965 [139] (the best known lower
bound on the integrality gap for this relaxation is 1.20016 [21]). We will also
discuss node-weighted and directed versions in the last section.
1In the literature the problem is also referred to as the Multiterminal Cut problem.


15.1 Isolating Cut Heuristic


Let 𝑆 = {𝑠 1 , 𝑠2 , . . . , 𝑠 𝑘 } be the given set of terminals. A feasible solution to a given
instance of Multiway Cut separates each terminal 𝑠 𝑖 from the terminals 𝑆 − {𝑠 𝑖 }.
One can find the cheapest cut separating 𝑠 𝑖 from 𝑆 − {𝑠 𝑖 } via a minimum-cut
computation (we shrink 𝑆 − {𝑠 𝑖 } to a single vertex 𝑡 and compute an 𝑠 𝑖 -𝑡 minimum
cut). Taking the union of such cuts for each 𝑖 ∈ [𝑘] clearly yields a feasible
solution. It turns out that this simple heuristic is not too bad.

IsolatingCut(𝐺 = (𝑉 , 𝐸), 𝑆 = {𝑠 1 , . . . , 𝑠 𝑘 })

1. For each 𝑖 ∈ [𝑘] do

   A. Let 𝐸 𝑖 be a minimum weight cut separating 𝑠 𝑖 from 𝑆 − {𝑠 𝑖 } in 𝐺

2. Without loss of generality 𝑤(𝐸1 ) ≤ 𝑤(𝐸2 ) ≤ . . . ≤ 𝑤(𝐸 𝑘 ) (otherwise reindex)

3. Output ∪_{𝑖=1}^{𝑘−1} 𝐸 𝑖

We leave the following as an easy exercise.


Lemma 15.1. The algorithm outputs a feasible solution.
Theorem 15.1. The Isolating Cut heuristic yields a 2(1 − 1/𝑘)-approximation for
Multiway Cut.
Proof. Recall the partition view of the Multiway Cut problem. Let 𝐸∗ be an
optimum and minimal solution and let 𝑉1 , 𝑉2 , . . . , 𝑉𝑘 be the components of
𝐺 − 𝐸∗ such that 𝑠 𝑖 ∈ 𝑉𝑖 for 𝑖 ∈ [𝑘]. Each edge 𝑒 ∈ 𝐸∗ crosses the partition and
since it has exactly two end points, we see that ∑_{𝑖=1}^{𝑘} 𝑤(𝛿(𝑉𝑖 )) = 2𝑤(𝐸∗).
We claim that for each 𝑖 ∈ [𝑘], 𝑤(𝐸 𝑖 ) ≤ 𝑤(𝛿(𝑉𝑖 )). This follows from the
fact that 𝛿(𝑉𝑖 ) is a cut that separates 𝑠 𝑖 from 𝑆 − {𝑠 𝑖 }. Therefore ∑_{𝑖=1}^{𝑘} 𝑤(𝐸 𝑖 ) ≤
∑_{𝑖=1}^{𝑘} 𝑤(𝛿(𝑉𝑖 )) = 2𝑤(𝐸∗). Since 𝑤(𝐸 𝑘 ) ≥ 𝑤(𝐸 𝑖 ) for all 𝑖 ∈ [𝑘], we have
∑_{𝑖=1}^{𝑘−1} 𝑤(𝐸 𝑖 ) ≤ (1 − 1/𝑘) ∑_{𝑖=1}^{𝑘} 𝑤(𝐸 𝑖 ), and hence
∑_{𝑖=1}^{𝑘−1} 𝑤(𝐸 𝑖 ) ≤ 2(1 − 1/𝑘)𝑤(𝐸∗) = 2(1 − 1/𝑘) OPT. □
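A sketch of the heuristic using networkx maximum-flow based minimum cuts.
The graph is assumed to be an undirected nx.Graph with an edge attribute
'capacity'; the merged sink and the helper name are made up for illustration:

import networkx as nx

def isolating_cut(G, terminals):
    cuts = []
    for s in terminals:
        H = G.copy()
        H.add_node("super_t")
        for t in terminals:                 # merge S - {s} into one sink
            if t != s:
                H.add_edge(t, "super_t", capacity=float("inf"))
        value, (S_side, _) = nx.minimum_cut(H, s, "super_t", capacity="capacity")
        cut_edges = {frozenset((u, v)) for (u, v) in G.edges
                     if (u in S_side) != (v in S_side)}
        cuts.append((value, cut_edges))
    cuts.sort(key=lambda p: p[0])
    # drop the heaviest isolating cut, as in step 3 of the pseudocode
    return set().union(*(ce for _, ce in cuts[:-1]))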
Exercise 15.1. The analysis of the algorithm is tight. Find an example to
demonstrate this.

15.2 Distance based LP Relaxation


We describe an LP relaxation based on distance/length variables that generalizes
the relaxation for 𝑠-𝑡 cut that we saw previously. For each edge 𝑒 there is a

length variable 𝑥 𝑒 indicating whether 𝑒 is cut or not. We require that the length
of any path 𝑝 connecting 𝑠 𝑖 and 𝑠 𝑗 , 𝑖 ≠ 𝑗, should be at least 1.

    min  ∑_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    s.t. ∑_{𝑒∈𝑝} 𝑥_𝑒 ≥ 1      for all 𝑝 ∈ 𝒫_{𝑠_𝑖,𝑠_𝑗}, 𝑖 ≠ 𝑗
         𝑥_𝑒 ≥ 0               for all 𝑒 ∈ 𝐸
The preceding LP can be solved in polynomial time via the Ellipsoid method
since the separation oracle is the shortest path problem. Alternatively, one can
write a compact LP. We focus on rounding the LP and establishing its integrality
gap.

BallCut(𝐺 = (𝑉 , 𝐸), 𝑆 ⊆ 𝑉 )

1. Solve the distance based LP relaxation to obtain solution 𝑥

2. Let 𝑑_𝑥 be the metric induced on 𝑉 by edge lengths 𝑥

3. Pick 𝜃 uniformly at random from (0, 1/2)

4. Output ∪_{𝑖=1}^{𝑘} 𝛿(𝐵_𝑑(𝑠 𝑖 , 𝜃)) where 𝐵_𝑑(𝑠 𝑖 , 𝜃) is the ball of radius 𝜃 around 𝑠 𝑖

Lemma 15.2. The algorithm outputs a feasible solution.

Proof. Consider a terminal 𝑠 𝑖 . 𝐵 𝑑 (𝑠 𝑖 , 𝜃) does not contain any other terminal


𝑠 𝑗 , 𝑗 ≠ 𝑖 since 𝜃 < 1/2 and 𝑑(𝑠 𝑖 , 𝑠 𝑗 ) ≥ 1. Thus removing the edges 𝛿(𝐵 𝑑 (𝑠 𝑖 , 𝜃))
disconnects 𝑠 𝑖 from every other terminal. Since this holds for each 𝑠 𝑖 taking
the union of the edges ∪𝑖 𝛿(𝐵 𝑑 (𝑠 𝑖 , 𝜃)) disconnects each terminal from all other
terminals. 
Consider the open balls of radius 1/2 around the terminals, that is, 𝐵 𝑑 (𝑠 𝑖 , 1/2)
for 𝑖 ∈ [𝑘]. We observe that they are disjoint for if 𝑣 ∈ 𝐵(𝑠 𝑖 , 1/2) ∩ 𝐵(𝑠 𝑗 , 1/2)
with 𝑖 ≠ 𝑗 then 𝑑(𝑠 𝑖 , 𝑠 𝑗 ) < 1 which would violate the LP constraint. Thus the
algorithm is essentially running an 𝑠-𝑡 cut type algorithm in the disjoint balls but
since we only have half the radius we lose a factor of 2. In fact, as we will see in
a remark later, the analysis holds even if each 𝑠 𝑖 chose its own 𝜃 𝑖 independently
— the correlated choice is not exploited here — but a correlated choice is quite
useful when considering other cut problems including Directed Multiway Cut later
in the chapter.

Lemma 15.3. Let 𝑒 = (𝑢, 𝑣). P[𝑒 is cut] ≤ 2𝑥 𝑒 .



Proof. There are several cases to consider but the main one is the one where both
𝑢, 𝑣 ∈ 𝐵 𝑑 (𝑠 𝑖 , 1/2) for some 𝑠 𝑖 . Suppose this is the case. Then only 𝑠 𝑖 can cut the
edge (𝑢, 𝑣). It is easy to see that P[𝑒 is cut] = 2|𝑑(𝑠 𝑖 , 𝑢) − 𝑑(𝑠 𝑖 , 𝑣)| since we pick
𝜃 uniformly at random from (0, 1/2). But |𝑑(𝑠 𝑖 , 𝑢) − 𝑑(𝑠 𝑖 , 𝑣)| ≤ 𝑥_𝑒 by triangle
inequality and hence we obtain the desired claim.
Now we consider the case when 𝑢 ∈ 𝐵(𝑠 𝑖 , 1/2) and 𝑣 ∈ 𝐵(𝑠 𝑗 , 1/2) where 𝑖 ≠ 𝑗.
Let 𝛼 = 1/2 − 𝑑(𝑠 𝑖 , 𝑢) and let 𝛽 = 1/2 − 𝑑(𝑠 𝑗 , 𝑣). We observe that 𝛼 + 𝛽 ≤ 𝑥_𝑒 for
otherwise 𝑑(𝑠 𝑖 , 𝑢) + 𝑥_𝑒 + 𝑑(𝑠 𝑗 , 𝑣) < 1, which would contradict 𝑑(𝑠 𝑖 , 𝑠 𝑗 ) ≥ 1. We
see that 𝑒 is cut iff 𝜃 lies in the interval [𝑑(𝑠 𝑖 , 𝑢), 1/2) or it lies in the interval
[𝑑(𝑠 𝑗 , 𝑣), 1/2). Thus 𝑒 is cut with probability 2 max(𝛼, 𝛽) ≤ 2(𝛼 + 𝛽) ≤ 2𝑥_𝑒 .
There are two other cases. One is when both 𝑢, 𝑣 are outside every half-ball.
In this case the edge 𝑒 is not cut. The other is when 𝑢 ∈ 𝐵 𝑑 (𝑠 𝑖 , 1/2) for some 𝑖
and 𝑣 is not inside any ball. The analysis here is similar to the second case and
we leave it as an exercise. □
Thus the expected cost of the cut is at most 2 ∑_{𝑒} 𝑐_𝑒 𝑥_𝑒 ≤ 2 OPT_{𝐿𝑃} ≤ 2 OPT.
One can improve and obtain a 2(1 − 1/𝑘)-approximation by saving on one
terminal as we did in the preceding section.
Exercise 15.2. Modify the algorithm to obtain a 2(1 − 1/𝑘)-approximation with
respect to the LP relaxation.
Exercise 15.3. Consider a variant of the algorithm where each 𝑠 𝑖 picks an
independent 𝜃𝑖 ∈ (0, 1/2); we output the cut ∪𝑖 𝛿(𝐵 𝑑 (𝑠 𝑖 , 𝜃𝑖 )). Prove that this also
yields a 2-approximation (can be improved to 2(1 − 1/𝑘)-approximation).
Integrality gap: The analysis is tight as shown by the following integrality gap
example. Consider a star with center 𝑟 and 𝑘 leaves 𝑠 1 , 𝑠2 , . . . , 𝑠 𝑘 which are the
terminals. All edges have cost 1. Then it is easy to see that a feasible integral
solution consists of removing 𝑘 − 1 edges from the star. Hence OPT = 𝑘 − 1. On
the other hand, setting 𝑥 𝑒 = 1/2 for each edge is a feasible fractional solution of
cost 𝑘/2. Hence the integrality gap is 2(1 − 1/𝑘).

15.3 A Partitioning View and Geometric Relaxation


Section to be filled in later. For now see the notes from 2018: https://courses.engr.
illinois.edu/cs583/sp2018/Notes/multiwaycut-ckr.pdf.

15.4 Node-weighted and Directed Multiway Cut


Section to be filled in later. For now see the following paper, which has very short and
elementary proofs of 2-approximations for both problems; both bounds are tight.

http://chekuri.cs.illinois.edu/papers/dir-multiway-cut-soda.pdf.
Chapter 16

Multicut

In the Multicut problem, we are given a graph 𝐺 = (𝑉 , 𝐸), a capacity function


that assigns a capacity 𝑐 𝑒 to each edge 𝑒, and a set of pairs (𝑠 1 , 𝑡1 ), ..., (𝑠 𝑘 , 𝑡 𝑘 ). The
Multicut problem asks for a minimum capacity set of edges 𝐹 ⊆ 𝐸 such that
removing the edges in 𝐹 disconnects 𝑠 𝑖 and 𝑡 𝑖 , for all 𝑖. Note that the Multicut
problem generalizes the Multiway Cut problem that we saw previously. We
describe an 𝑂(log 𝑘) approximation algorithm for Multicut.
We start by describing an LP formulation for the problem. For each edge
𝑒, we have a variable 𝑑 𝑒 . We interpret each variable 𝑑 𝑒 as a distance label for
the edge. Let 𝒫𝑠 𝑖 ,𝑡 𝑖 denote the set of all paths between 𝑠 𝑖 and 𝑡 𝑖 . We have the
following LP for the problem:

    min  ∑_{𝑒∈𝐸} 𝑐_𝑒 𝑑_𝑒
    s.t. ∑_{𝑒∈𝑝} 𝑑_𝑒 ≥ 1      for all 𝑝 ∈ 𝒫_{𝑠_𝑖,𝑡_𝑖}, 1 ≤ 𝑖 ≤ 𝑘
         𝑑_𝑒 ≥ 0               for all 𝑒 ∈ 𝐸
The LP assigns distance labels to edges so that, on each path 𝑝 between 𝑠 𝑖
and 𝑡 𝑖 , the distance labels of the edges on 𝑝 sum up to at least one. Note that,
even though the LP can have exponentially many constraints, we can solve the
LP in polynomial time using the ellipsoid method and the following separation
oracle. Given distance labels 𝑑 𝑒 , we set the length of each edge to 𝑑 𝑒 and, for
each pair (𝑠 𝑖 , 𝑡 𝑖 ), we compute the length of the shortest path between 𝑠 𝑖 and 𝑡 𝑖
and check whether it is at least one. If the shortest path between 𝑠 𝑖 and 𝑡 𝑖 has
length smaller than one, we have a violated constraint. Conversely, if all shortest
paths have length at least one, the distance labels define a feasible solution.
We also consider the dual of the previous LP. For each path 𝑝 between any
pair (𝑠 𝑖 , 𝑡 𝑖 ) we have a dual variable 𝑓𝑝 . We interpret each variable 𝑓𝑝 as the


amount of flow between 𝑠 𝑖 and 𝑡 𝑖 that is routed along the path 𝑝. We have the
following dual LP:
    max  ∑_{𝑖=1}^{𝑘} ∑_{𝑝∈𝒫_{𝑠_𝑖,𝑡_𝑖}} 𝑓_𝑝
    s.t. ∑_{𝑝: 𝑒∈𝑝} 𝑓_𝑝 ≤ 𝑐_𝑒      for all 𝑒 ∈ 𝐸(𝐺)
         𝑓_𝑝 ≥ 0                    for all 𝑝 ∈ 𝒫_{𝑠_1,𝑡_1} ∪ ... ∪ 𝒫_{𝑠_𝑘,𝑡_𝑘}


The dual is an LP formulation for the Maximum Throughput Multicommod-
ity Flow problem. In the Maximum Throughput Multicommodity Flow problem,
we have 𝑘 different commodities. For each 𝑖, we want to route commodity 𝑖
from the source 𝑠 𝑖 to the destination 𝑡 𝑖 . Each commodity must satisfy flow
conservation at each vertex other than its source and its destination. Additionally,
the total flow routed on each edge must not exceed the capacity of the edge. The
goal is to maximize the sum of the commodities routed.
The dual LP tries to assign an amount of flow 𝑓𝑝 to each path 𝑝 so that the
total flow on each edge is at most the capacity of the edge (the flow conservation
constraints are automatically satisfied). Note that the endpoints of the path 𝑝
determine which kind of commodity is routed along the path.

Exercise 16.1. Write the Multicut LP and its dual in a compact form with
polynomially many constraints.

16.1 Upper Bound on the Integrality Gap


In this section, we will show that the integrality gap of the LP is 𝑂(log 𝑘) using
a randomized rounding algorithm due to Calinescu, Karloff, and Rabani [27].
The first algorithm that achieved an 𝑂(log 𝑘)-approximation for Multicut is
due to Garg, Vazirani, and Yannakakis [63] (see [152] and [155]), and it is based
on the region growing technique introduced by Leighton and Rao [113]. The
reason that we choose to present the randomized rounding algorithm is due to
its future application for metric embeddings.
Let 𝐵 𝑑 (𝑣, 𝑟) denote the ball of radius 𝑟 centered at the vertex 𝑣 in the metric
induced by the distance labels 𝑑 𝑒 .

CKR-RandomPartition:
    Solve the LP to get the distance labels 𝑑_𝑒
    Pick 𝜃 uniformly at random from [0, 1/2)
    Pick a random permutation 𝜎 on {1, 2, ..., 𝑘}
    for 𝑖 = 1 to 𝑘
        𝑉_{𝜎(𝑖)} = 𝐵_𝑑(𝑠_{𝜎(𝑖)}, 𝜃) \ ∪_{𝑗<𝑖} 𝑉_{𝜎(𝑗)}
    Output ∪_{𝑖=1}^{𝑘} 𝛿(𝑉_𝑖)
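A compact sketch of the rounding step in Python. The metric dist is assumed
to be precomputed from the LP labels (e.g. by all-pairs Dijkstra); the function
and variable names are illustrative:

import random

def ckr_round(edges, terminals, dist):
    # edges: list of pairs (u, v); dist[(s, v)]: distance under the LP labels
    vertices = {u for e in edges for u in e}
    theta = random.uniform(0.0, 0.5)
    order = list(range(len(terminals)))
    random.shuffle(order)                      # random permutation sigma
    label = {}                                 # vertex -> index of the block V_i it joins
    for i in order:
        s_i = terminals[i]
        for v in vertices:
            if v not in label and dist[(s_i, v)] <= theta:
                label[v] = i                   # B(s_i, theta) minus earlier blocks
    # an edge is cut if its endpoints end up in different blocks (or one in no block)
    return [(u, v) for (u, v) in edges if label.get(u) != label.get(v)]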

Lemma 16.1. CKR-RandomPartition correctly outputs a feasible multicut for the given
instance.

Proof. Let 𝐹 be the set of edges output by the algorithm. Suppose 𝐹 is not a
feasible multicut. Then there exists a pair of vertices (𝑠 𝑖 , 𝑡 𝑖 ) such that there is a
path between 𝑠 𝑖 and 𝑡 𝑖 in 𝐺 − 𝐹. Therefore there exists a 𝑗 such that 𝑉𝑗 contains
𝑠 𝑖 and 𝑡 𝑖 . Since 𝑉𝑗 ⊆ 𝐵 𝑑 (𝑠 𝑗 , 𝜃), both 𝑠 𝑖 and 𝑡 𝑖 are contained in the ball of radius 𝜃
centered at 𝑠 𝑗 . Consequently, the distance between 𝑠 𝑗 and 𝑠 𝑖 is at most 𝜃 and the
distance between 𝑠 𝑗 and 𝑡 𝑖 is at most 𝜃. By the triangle inequality, the distance
between 𝑠 𝑖 and 𝑡 𝑖 is at most 2𝜃. Since 𝜃 is smaller than 1/2, it follows that the
distance between 𝑠 𝑖 and 𝑡 𝑖 is smaller than one. This contradicts the fact that the
distance labels 𝑑 𝑒 are a feasible solution for the LP. Therefore 𝐹 is a multicut, as
desired. 
Lemma 16.2. The probability that an edge 𝑒 is cut is at most 2𝐻 𝑘 𝑑 𝑒 , where 𝐻 𝑘 is the
𝑘-th harmonic number and 𝑑 𝑒 is the distance label of the edge 𝑒.

Proof. Fix an edge 𝑒 = (𝑢, 𝑣). Let:

𝐿 𝑖 = min{𝑑(𝑠 𝑖 , 𝑢), 𝑑(𝑠 𝑖 , 𝑣)}

𝑅 𝑖 = max{𝑑(𝑠 𝑖 , 𝑢), 𝑑(𝑠 𝑖 , 𝑣)}


We may assume without loss of generality that 𝐿1 ≤ 𝐿2 ≤ ... ≤ 𝐿 𝑘 (by reindexing
the pairs as needed). See Fig 16.1.
Let 𝐴 𝑖 be the event that the edge 𝑒 is cut first by 𝑠 𝑖 . More precisely, 𝐴 𝑖 is the
event that |𝑉𝑖 ∩ {𝑢, 𝑣}| = 1 and |𝑉𝑗 ∩ {𝑢, 𝑣}| = 0 for all 𝑗 such that 𝜎(𝑗) < 𝜎(𝑖).
Note that |𝑉𝑖 ∩ {𝑢, 𝑣}| = 1 simply says that 𝑠 𝑖 cuts the edge 𝑒. If 𝑠 𝑖 is the first to
cut the edge 𝑒, for all 𝑗 that come before 𝑖 in the permutation, neither 𝑢 nor 𝑣
can be in 𝑉𝑗 (if only one of 𝑢 and 𝑣 is in 𝑉𝑗 , 𝑠 𝑗 cuts the edge 𝑒; if both 𝑢 and 𝑣 are
in 𝑉𝑗 , 𝑠 𝑖 cannot cut the edge 𝑒).
Note that the event that the edge 𝑒 is cut is the union of the disjoint events
𝐴1 , ..., 𝐴 𝑘 . Therefore we have:

Figure 16.1: For a fixed edge 𝑒 = (𝑢, 𝑣) we renumber the pairs such that
𝐿1 ≤ 𝐿2 ≤ ... ≤ 𝐿 𝑘 .

    P[𝑒 is cut] = ∑_{𝑖} P[𝐴 𝑖 ].

Let us fix 𝑟 ∈ [0, 1/2) and consider P[𝐴 𝑖 | 𝜃 = 𝑟]. Note that 𝑠 𝑖 cuts the edge 𝑒
only if one of 𝑢, 𝑣 is inside the ball of radius 𝑟 centered at 𝑠 𝑖 and the other is
outside the ball. Differently said, 𝑠 𝑖 cuts the edge only if 𝑟 ∈ [𝐿 𝑖 , 𝑅 𝑖 ):

P[𝐴 𝑖 | 𝜃 = 𝑟] = 0 if 𝑟 ∉ [𝐿 𝑖 , 𝑅 𝑖 )

Now suppose that 𝑟 ∈ [𝐿 𝑖 , 𝑅 𝑖 ). Let us fix 𝑗 < 𝑖 and suppose 𝑗 comes before
𝑖 in the permutation (that is, 𝜎(𝑗) < 𝜎(𝑖)). Recall that, since 𝑗 < 𝑖, we have
𝐿 𝑗 ≤ 𝐿 𝑖 ≤ 𝑟. Therefore at least one of 𝑢, 𝑣 is inside the ball of radius 𝑟 centered at
𝑠 𝑗 . Consequently, 𝑠 𝑖 cannot be the first to cut the edge 𝑒. Therefore 𝑠 𝑖 is the first
to cut the edge 𝑒 only if 𝜎(𝑖) < 𝜎(𝑗) for all 𝑗 < 𝑖. See Fig 16.2. Since 𝜎 is a random
permutation, 𝑖 appears before 𝑗 for all 𝑗 < 𝑖 with probability 1/𝑖. Therefore we
have:

    P[𝐴 𝑖 | 𝜃 = 𝑟] ≤ 1/𝑖      if 𝑟 ∈ [𝐿 𝑖 , 𝑅 𝑖 )

Figure 16.2: If 𝜎(𝑗) < 𝜎(𝑖), 𝑠 𝑖 cannot be the first to cut the edge 𝑒 = (𝑢, 𝑣). On the
left 𝑠 𝑗 also cuts the edge. On the right 𝑠 𝑗 captures both end points and therefore
𝑠 𝑖 cannot cut it.

Since 𝜃 was selected uniformly at random from the interval [0, 1/2), and
independently from 𝜎, we have:

    P[𝐴 𝑖 ] ≤ (1/𝑖) · P[𝜃 ∈ [𝐿 𝑖 , 𝑅 𝑖 )] ≤ (2/𝑖) · (𝑅 𝑖 − 𝐿 𝑖 )
By the triangle inequality, 𝑅 𝑖 ≤ 𝐿 𝑖 + 𝑑_𝑒 . Therefore:

    P[𝐴 𝑖 ] ≤ 2𝑑_𝑒 / 𝑖
Consequently,

    P[𝑒 is cut] = ∑_{𝑖} P[𝐴 𝑖 ] ≤ 2𝐻 𝑘 𝑑_𝑒 .

Corollary 16.1. The integrality gap of the Multicut LP is 𝑂(log 𝑘).

Proof. Let 𝐹 be the set of edges outputted by the Randomized Rounding algo-
rithm. For each edge 𝑒, let 𝜒𝑒 be an indicator random variable equal to 1 if and
only if the edge 𝑒 is in 𝐹. As we have already seen,

E 𝜒𝑒 = P[𝜒𝑒 = 1] ≤ 2𝐻 𝑘 𝑑 𝑒
Let 𝑐(𝐹) be a random variable equal to the total capacity of the edges in 𝐹. We
have:

    E[𝑐(𝐹)] = E[∑_{𝑒} 𝑐_𝑒 𝜒_𝑒 ] = ∑_{𝑒} 𝑐_𝑒 P[𝜒_𝑒 = 1] ≤ 2𝐻 𝑘 ∑_{𝑒} 𝑐_𝑒 𝑑_𝑒 = 2𝐻 𝑘 OPT_{LP}

Consequently, there exists a set of edges 𝐹 such that the total capacity of the
edges in 𝐹 is at most 2𝐻 𝑘 OPTLP . Therefore OPT ≤ 2𝐻 𝑘 OPTLP , as desired. 

Corollary 16.2. The algorithm achieves an 𝑂(log 𝑘)-approximation (in expectation)


for the Multicut problem.
Proof. As we have already seen,

E 𝑐(𝐹) ≤ 2𝐻 𝑘 OPTLP
where 𝐹 is the set of edges output by the algorithm and 𝑐(𝐹) is the total capacity
of the edges in 𝐹. Since OPT𝐿𝑃 ≤ OPT,

E 𝑐(𝐹) ≤ 2𝐻 𝑘 OPT = 𝑂(log 𝑘)OPT



Remark 16.1. The expected cost analysis can be used to obtain, via repetition, a
randomized algorithm that outputs an 𝑂(log 𝑘)-approximation
with high probability. The algorithm can also be derandomized but it is not
straightforward. As we remarked there is an alternative deterministic 𝑂(log 𝑘)-
approximation algorithm via region growing.
Flow-Cut Gap: Recall that when 𝑘 = 1 we have the well-known maxflow-
mincut theorem. The integrality gap of the standard LP for Multicut is the same
as the relative gap between flow and cut when 𝑘 is arbitrary. The upper bound
on the integrality gap gives an upper bound on the flow-cut gap.
Corollary 16.3. We have:
 
    max_{m.c. flow 𝑓} |𝑓| ≤ min_{multicut 𝐶} |𝐶| ≤ 𝑂(log 𝑘) · max_{m.c. flow 𝑓} |𝑓|

where | 𝑓 | represents the value of the multicommodity flow 𝑓 , and |𝐶 | represents the
capacity of the multicut 𝐶.
Proof. Let OPTLP denote the total capacity of an optimal (fractional) solution
for the Multicut LP. Let OPT𝑑𝑢𝑎𝑙 denote the flow value of an optimal solution
for the dual LP. Since OPTLP is a lower bound on the capacity of the minimum
(integral) multicut, we have:

    max_{m.c. flow 𝑓} |𝑓| = OPT_{𝑑𝑢𝑎𝑙} = OPT_{LP} ≤ min_{multicut 𝐶} |𝐶|

As we have already seen, we have:


 
    min_{multicut 𝐶} |𝐶| ≤ 2𝐻 𝑘 OPT_{LP} = 2𝐻 𝑘 OPT_{𝑑𝑢𝑎𝑙} = 2𝐻 𝑘 · max_{m.c. flow 𝑓} |𝑓|

16.2 Lower Bound on the Integrality Gap


In this section, we will show that the integrality gap of the LP is Ω(log 𝑘). That
is, we will give a Multicut instance for which the LP gap is Ω(log 𝑘). Let’s start
by looking at expander graphs and their properties.

16.2.1 Expander Graphs


Definition 16.4. A graph 𝐺 = (𝑉 , 𝐸) is an 𝛼-edge-expander if, for any subset 𝑆 of at
most |𝑉 |/2 vertices, the number of edges crossing the cut (𝑆, 𝑉\𝑆) is at least 𝛼|𝑆|.

Note that the complete graph 𝐾 𝑛 is a (|𝑉 |/2)-edge-expander. However, the more
interesting expander graphs are also sparse. Cycles and grids are examples of
graphs that are very poor expanders.

Figure 16.3: The top half of the cycle has |𝑉 |/2 vertices and only two edges
crossing the cut. The left half of the grid has roughly |𝑉 |/2 vertices and only
√|𝑉 | edges crossing the cut.

Definition 16.5. A graph 𝐺 is 𝑑-regular if every vertex in 𝐺 has degree 𝑑.

Note that 2-regular graphs consist of a collection of edge disjoint cycles and
therefore they have poor expansion. However, for any 𝑑 ≥ 3, there exist 𝑑-regular
graphs that are very good expanders.

Theorem 16.6. For every 𝑑 ≥ 3 there exists an infinite family of 𝑑-regular 1-edge-
expanders.

We will only need the following special case of the previous theorem.

Theorem 16.7. There exists a universal constant 𝛼 > 0 and an integer 𝑛0 such that,
for all even integers 𝑛 ≥ 𝑛0 , there exists an 𝑛-vertex, 3-regular 𝛼-edge-expander.

Proof Idea. The easiest way to prove this theorem is using the probabilistic
method. The proof itself is beyond the scope of this lecture1. The proof idea is
the following.
Let’s fix an even integer 𝑛. We will generate a 3-regular random graph 𝐺 by
selecting three random perfect matchings on the vertex set {1, 2, ..., 𝑛} (recall
that a perfect matching is a set of edges such that every vertex is incident to
exactly one of these edges). We select a random perfect matching as follows.
We maintain a list of vertices that have not been matched so far. While there is
at least one vertex that is not matched, we select a pair of distinct vertices 𝑢, 𝑣
uniformly at random from all possible pairs of unmatched vertices. We add the
edge (𝑢, 𝑣) to our matching and we remove 𝑢 and 𝑣 from the list. We repeat this
process three times (independently) to get three random matchings. The graph
𝐺 will consist of the edges in these three matchings. Note that 𝐺 is actually a
3-regular multigraph since it might have parallel edges (if the same edge is in at
least two of the matchings). There are two properties of interest: (1) 𝐺 is a simple
graph and (2) 𝐺 is an 𝛼-edge-expander for some constant 𝛼 > 0. If we can show
that 𝐺 has both properties with positive probability, it follows that there exists
a 3-regular 𝛼-edge-expander (if no graph is a 3-regular 𝛼-edge-expander, the
probability that our graph 𝐺 has both properties is equal to 0).
It is not very hard to show that the probability that 𝐺 does not have property
(1) is small. To show that the probability that 𝐺 does not have property (2) is
small, for each set 𝑆 with at most 𝑛/2 vertices, we estimate the expected number
of edges that cross the cut (𝑆, 𝑉\𝑆) (e.g., we can easily show that |𝛿(𝑆)| ≥ |𝑆|/2).
Using tail inequalities (e.g., Chernoff bounds), we can show that the probability
that |𝛿(𝑆)| differs significantly from its expectation is extremely small (i.e., small
enough so that the sum – taken over all sets 𝑆 – of these probabilities is also
small) and we can use the union bound to get the desired result. 
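A tiny sketch of this random construction (for experimentation only; names
are illustrative, and certifying expansion of a sample is a separate, expensive
step):

import random

def random_3_regular_multigraph(n):
    # union of three independent uniformly random perfect matchings on
    # {0, ..., n-1}; n must be even and parallel edges may occur
    assert n % 2 == 0
    edges = []
    for _ in range(3):
        verts = list(range(n))
        random.shuffle(verts)
        edges.extend((verts[i], verts[i + 1]) for i in range(0, n, 2))
    return edges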
Note that explicit constructions of 𝑑-regular expanders are also known.
Margulis [119] gave an infinite family of 8-regular expanders. There are many
explicit constructions by now and it is a very important topic of study — we
refer the reader to the survey on expanders by Hoory, Linial and Wigderson
1A more accurate statement is that the calculations are a bit involved and not terribly interesting
for us.

[86]. The vertex set of a graph 𝐺 𝑛 in Margulis’ construction is ℤ𝑛 × ℤ𝑛 , where


ℤ𝑛 is the set of all integers mod 𝑛. The neighbors of a vertex (𝑥, 𝑦) in 𝐺 𝑛 are
(𝑥 + 𝑦, 𝑦), (𝑥 − 𝑦, 𝑦), (𝑥, 𝑦 + 𝑥), (𝑥, 𝑦 − 𝑥), (𝑥 + 𝑦 + 1, 𝑦), (𝑥 − 𝑦 + 1, 𝑦), (𝑥, 𝑦 + 𝑥 + 1),
and (𝑥, 𝑦 − 𝑥 + 1) (all operations are mod 𝑛). Another example is the following
infinite family of 3-regular expanders. For each prime 𝑝, we have a 3-regular
graph 𝐺 𝑝 . The vertex set of 𝐺 𝑝 is ℤ𝑝 . The neighbors of a vertex 𝑥 in 𝐺 𝑝 are 𝑥 + 1,
𝑥 − 1, and 𝑥 −1 (as before, all operations are mod 𝑝; 𝑥 −1 is the inverse of 𝑥 mod 𝑝,
and we define the inverse of 0 to be 0)2.
We conclude this section with the following observations (they will be very
useful in showing the Ω(log 𝑘) lower bound on the integrality gap of the LP).
Claim 16.2.1. Let 𝐺 be an 𝑛-vertex 𝑑-regular 𝛼-edge-expander, for some constants
𝑑 ≥ 3 and 𝛼 > 0. Then the diameter of 𝐺 is Θ(log 𝑛).
Proof. For any two vertices 𝑢 and 𝑣, let 𝑑𝑖𝑠𝑡(𝑢, 𝑣) denote the length of a shortest
path between 𝑢 and 𝑣 (the length of a path is the number of edges on the path).
Let’s fix a vertex 𝑠. Let 𝐿 𝑖 be the set of all vertices 𝑣 such that 𝑑𝑖𝑠𝑡(𝑠, 𝑣) is at most
𝑖. Now let’s show that (1 + 𝛼/𝑑)|𝐿 𝑖−1 | ≤ |𝐿 𝑖 | ≤ 𝑑|𝐿 𝑖−1 |. Clearly, |𝐿1 | = 𝑑 (since 𝑠
has degree 𝑑). Therefore we may assume that 𝑖 > 1. Every vertex in 𝐿 𝑖 is in 𝐿 𝑖−1
or it has a neighbor in 𝐿 𝑖−1 . Therefore it suffices to bound |𝐿 𝑖 \𝐿 𝑖−1 |.
Note that any vertex in 𝐿 𝑖−1 has at least one neighbor in 𝐿 𝑖−1 . Therefore the
vertices in 𝐿 𝑖−1 have at most (𝑑 − 1)|𝐿 𝑖−1 | neighbors outside of 𝐿 𝑖−1 . Consequently,
|𝐿 𝑖 | ≤ 𝑑|𝐿 𝑖−1 |.
Now one of 𝐿 𝑖−1 , 𝑉\𝐿 𝑖−1 has at most |𝑉 |/2 vertices. Let’s assume without loss
of generality that 𝐿 𝑖−1 has at most |𝑉 |/2 vertices (the other case is symmetric).
Let 𝐴 = 𝐿 𝑖−1 and let 𝐵 be the set of all vertices in 𝑉\𝐿 𝑖−1 that have a neighbor in
𝐿 𝑖−1 (note that |𝐿 𝑖 | = |𝐴| + |𝐵|). Let 𝐹 be the set of all edges that cross the cut
(𝐿 𝑖−1 , 𝑉\𝐿 𝑖−1 ). Now let's look at the bipartite graph 𝐻 = (𝐴, 𝐵, 𝐹). Since 𝐺 is an
𝛼-edge-expander, we have |𝐹| ≥ 𝛼|𝐴|. Moreover, |𝐹| = ∑_{𝑣∈𝐵} 𝑑_𝐻(𝑣), where 𝑑_𝐻(𝑣)
is the degree of 𝑣 in 𝐻. Since 𝑑_𝐻(𝑣) is at most 𝑑, we have 𝛼|𝐴| ≤ |𝐹| ≤ 𝑑|𝐵|.
Therefore we have:

    |𝐿 𝑖 | = |𝐴| + |𝐵| ≥ (1 + 𝛼/𝑑)|𝐴| = (1 + 𝛼/𝑑)|𝐿 𝑖−1 |


It follows by induction that 𝑑(1 + 𝛼/𝑑)^{𝑖−1} ≤ |𝐿 𝑖 | ≤ 𝑑^𝑖 . Therefore 𝑑𝑖𝑠𝑡(𝑠, 𝑣)
is 𝑂(log 𝑛) for all 𝑣 and there exists a vertex 𝑣 such that 𝑑𝑖𝑠𝑡(𝑠, 𝑣) is Ω(log 𝑛).
Since this is true for any 𝑠, it follows that the diameter of 𝐺 is Θ(log 𝑛). 
Claim 16.2.2. Let 𝐺 be an 𝑛-vertex 3-regular 𝛼-edge-expander and let 𝐵(𝑣, 𝑖) be the
set of all vertices 𝑢 such that there is a path between 𝑢 and 𝑣 with at most 𝑖 edges. For
any vertex 𝑣, |𝐵(𝑣, log₃ 𝑛/2)| ≤ √𝑛.
2Note that, unlike Margulis’ construction, this construction is not very explicit since we don’t
know how to generate large primes deterministically.

Proof. Note that 𝐵(𝑣, log₃ 𝑛/2) is the set of all vertices 𝑤 such that 𝑑𝑖𝑠𝑡(𝑣, 𝑤) is
at most log₃ 𝑛/2. As we have seen in the proof of the previous claim, we have
|𝐵(𝑣, log₃ 𝑛/2)| ≤ 3^{log₃ 𝑛/2} = √𝑛. □

16.2.2 The Multicut Instance


Let 𝑛0 , 𝛼 be as in Theorem 16.7. Let 𝑛 ≥ 𝑛0 and let 𝐺 be an 𝑛-vertex 3-regular
𝛼-edge-expander. For each edge 𝑒 in 𝐺, we set the capacity 𝑐 𝑒 to 1. Now let
𝑋 = {(𝑢, 𝑣)|𝑢 ∉ 𝐵(𝑣, log3 𝑛/2)}. The pairs in 𝑋 will be the pairs (𝑠 𝑖 , 𝑡 𝑖 ) that we
want to disconnect. Let (𝐺, 𝑋) be the resulting Multicut instance.

Claim 16.2.3. There exists a feasible fractional solution for (𝐺, 𝑋) of capacity 𝑂(𝑛/log 𝑛).

Proof. Let 𝑑_𝑒 = 2/log₃ 𝑛, for all 𝑒. Note that, since 𝐺 is 3-regular, 𝐺 has 3𝑛/2
edges. Therefore the total capacity of the fractional solution is

    ∑_{𝑒} 𝑑_𝑒 = (3𝑛/2) · (2/log₃ 𝑛) = 3𝑛/log₃ 𝑛

Therefore we only need to show that the solution is feasible. Let (𝑢, 𝑣) be a pair
in 𝑋. Let’s consider a path 𝑝 between 𝑢 and 𝑣. Since 𝑢 is not in 𝐵(𝑣, log3 𝑛/2),
the path 𝑝 has more than log3 𝑛/2 edges (recall that 𝐵(𝑣, 𝑖) is the set of all vertices
𝑢 such that there is a path between 𝑢 and 𝑣 with at most 𝑖 edges). Consequently,

    ∑_{𝑒∈𝑝} 𝑑_𝑒 > (log₃ 𝑛/2) · (2/log₃ 𝑛) = 1


Claim 16.2.4. Any integral solution for (𝐺, 𝑋) has capacity Ω(𝑛).

Proof. Let 𝐹 be an integral solution for (𝐺, 𝑋). Let 𝑉1 , ..., 𝑉ℎ be the connected
components of 𝐺 − 𝐹. Fix an 𝑖 and let 𝑣 be an arbitrary vertex in the connected
component 𝑉𝑖 . Note that, for any 𝑢 in 𝑉𝑖 , there is a path between 𝑣 and 𝑢
with at most log3 𝑛/2 edges (if not, (𝑢, 𝑣) is a pair in 𝑋 which contradicts the
fact that removing the edges in 𝐹 disconnects every pair in 𝑋). Therefore 𝑉𝑖 is
contained in 𝐵(𝑣, log₃ 𝑛/2). It follows from Claim 16.2.2 that |𝑉𝑖 | ≤ √𝑛. Since
𝐺 is an 𝛼-edge-expander and |𝑉𝑖 | ≤ |𝑉 |/2, we have |𝛿(𝑉𝑖 )| ≥ 𝛼|𝑉𝑖 |, for all 𝑖.
Consequently,

    |𝐹| ≥ (1/2) ∑_{𝑖=1}^{ℎ} |𝛿(𝑉𝑖 )| ≥ (𝛼/2) ∑_{𝑖=1}^{ℎ} |𝑉𝑖 | = 𝛼𝑛/2

Therefore 𝐹 has total capacity Ω(𝑛) (recall that every edge has unit capacity). 

Theorem 16.8. The integrality gap of the Multicut LP is Ω(log 𝑘).

Proof. Note that 𝑘 = |𝑋 | = 𝑂(𝑛²). It follows from Claims 16.2.3 and 16.2.4 that the LP
has integrality gap Ω(log 𝑛) = Ω(log 𝑘), as desired. □

Bibliographic Notes
Multicut is closely related to the Sparsest Cut problem. Initial algorithms
for Multicut were based on algorithms for Sparsest Cut. Garg, Vazirani and
Yannakakis [63] then used Leighton and Rao’s region growing argument (as
well as their integrality gap example on expanders for the uniform sparsest
cut problem) [113] to obtain a tight 𝑂(log 𝑘) bound on the integrality gap
for Multicut. The randomized proof that we described is from the work of
Calinescu, Karloff and Rabani [27] on the 0-extension problem; their algorithm
and analysis eventually led to an optimal bound for approximating an arbitrary
metric via random trees [54]. For planar graphs (and more generally any proper
minor closed family of graphs) the integrality gap is 𝑂(1), as shown by Klein,
Plotkin and Rao [104] — the constant depends on the family. There have been
several subsequent refinements of the precise dependence of the constant in the
integrality gap — see [1]. The 𝑂(log 𝑘) bound extends to node-weighted case
and the 𝑂(1) approximation for planar graphs also extends to the node-weighted
case. Multicut is APX-Hard even on trees and in general graphs. Assuming
UGC, the problem is known to be hard to approximate to a super-constant factor
[36]. For some special case of Multicut based on the structure of the demand
graph one can obtain improved approximation ratios [39].
The directed graph version of Multicut turns out to be much more difficult.
The flow-cut gap is known to be Ω̃(𝑛^{1/7}) and the problem is also known to
be hard to approximate to almost polynomial factors; these negative results
are due to Chuzhoy and Khanna [45]. The best known approximation ratio is
min{𝑘, Õ(𝑛^{11/23})} [3]. Very recently Kawarabayashi and Sidiropoulos obtained
a poly-logarithmic approximation for Directed Multicut if 𝐺 is a planar directed
graph [101]. There is a notion of symmetric demands in directed graphs and
for that version of Multicut one can get a poly-logarithmic flow-cut gap and
approximation; see [37, 105]. This is closely connected to the Feedback Arc Set
problem in directed graphs [53, 138].
Chapter 17

Sparsest Cut

Sparsest Cut is a fundamental problem in graph algorithms with many appli-


cations and connections. There are several variants that are considered in the
literature and they are closely related but it is useful to have proper terminology
and understand the similarities and differences.
Non-Uniform Sparsest Cut: We consider the general one first. The input is
a graph 𝐺 = (𝑉 , 𝐸) with non-negative edge capacities 𝑐 : 𝐸 → ℝ+ and a set of
pairs (𝑠1 , 𝑡1 ), ..., (𝑠 𝑘 , 𝑡 𝑘 ) along with non-negative demand values 𝐷1 , 𝐷2 , . . . , 𝐷 𝑘 .
When considering undirected graphs the demand pairs are unordered — by this we
mean that we do not distinguish (𝑠1 , 𝑡1 ) from (𝑡1 , 𝑠1 ). One can also think of the
demand values as “weights” but the demand terminology makes more sense
when considering the dual flow problem. Given a set/cut 𝑆 ⊆ 𝑉 the sparsity of
the cut 𝑆 is defined as the ratio 𝑐(𝛿(𝑆)) / (∑_{𝑖:|𝑆∩{𝑠_𝑖,𝑡_𝑖}|=1} 𝐷_𝑖). The numerator is the capacity of
the cut and the denominator is the total demand of the pairs separated by 𝑆. The
goal is to find the cut 𝑆 with minimum sparsity. In other words we are trying to
find the best “bang per buck” cut: how much capacity do we need to remove
per demand separated? It is sometimes convenient to consider 𝐺 as the supply
graph and the demands as coming from a demand graph 𝐻 = (𝑉 , 𝐹) where
𝐹 represents the pairs and we associate 𝐷 : 𝐹 → ℝ+ to represent the demand
value (alternatively we can also consider multigraphs). With this representation
of the pairs the sparsity of cut 𝑆 is simply 𝑐(𝛿_𝐺(𝑆))/𝐷(𝛿_𝐻(𝑆)); note that 𝛿_𝐺(𝑆) represents the
supply edges crossing 𝑆 and 𝛿_𝐻(𝑆) represents the demand edges crossing the
cut 𝑆.
Remark 17.1. One can define a cut as removing a set of edges. In the case of
sparsest cut in undirected graphs it suffices to restrict attention to cuts of the
form 𝛿(𝑆) for some 𝑆 ⊆ 𝑉. It is a useful exercise to see why there is always a
sparsest cut of that form for any given instance. This is not necessarily true for


directed graphs.
Uniform Sparsest Cut: Very often when people say Sparsest Cut they mean
the uniform version. This is the version in which 𝐷(𝑢, 𝑣) = 1 for each unordered
pair of vertices (𝑢, 𝑣). For these demands the sparsity of a cut 𝑆 is 𝑐(𝛿_𝐺(𝑆))/(|𝑆||𝑉\𝑆|). Alternatively the
demand graph 𝐻 is a complete graph with unit demand values on each edge.
A slight generalization of Uniform Sparsest Cut is obtained by considering
demands induced by weights on vertices (the dual flow instances are called
Product Multicommodity Flow instances). There is a weight function 𝜋 :
𝑉 → ℝ+ on the vertices and the demand 𝐷(𝑢, 𝑣) for pair (𝑢, 𝑣) is set to be
𝜋(𝑢)𝜋(𝑣). Note that if 𝜋(𝑢) = 1 for all 𝑢 then we obtain Uniform Sparsest Cut. If
𝜋(𝑢) ∈ {0, 1} for all 𝑢 then we are focusing our attention on sparsity with respect
to the set 𝑉′ = {𝑣 | 𝜋(𝑣) = 1} since the vertices with 𝜋(𝑢) = 0 play no role.
This may seem unnatural at first but it is closely connected to expansion and
conductance as we will see below.
Expansion: The expansion of a multi-graph 𝐺 = (𝑉 , 𝐸) is defined as min_{𝑆:|𝑆|≤|𝑉|/2} |𝛿(𝑆)|/|𝑆|.
Recall that 𝐺 is an 𝛼-expander if the expansion of 𝐺 is at least 𝛼. A random
3-regular graph is an 𝛼-expander with 𝛼 = Ω(1) with high probability. Thus, to
find an 𝛼-expander one can obtain an efficient randomized algorithm by picking
a random graph and then verifying its expansion. However, checking expansion
is coNP-Hard. Expansion is closely related to Uniform Sparsest Cut. Note that
when |𝑆| ≤ |𝑉 |/2 we have

    (1/|𝑉|) · |𝛿(𝑆)|/|𝑆| ≤ |𝛿(𝑆)|/(|𝑆||𝑉\𝑆|) ≤ (2/|𝑉|) · |𝛿(𝑆)|/|𝑆|.
Thus Expansion and Uniform Sparsest Cut are within a factor of 2 of each other.
Sometimes it is useful to consider expansion with vertex weights 𝑤 : 𝑉 →
ℝ+ . Here the expansion is defined as min_{𝑆:𝑤(𝑆)≤𝑤(𝑉)/2} |𝛿(𝑆)|/𝑤(𝑆). This corresponds
to product multicommodity flow instances where 𝜋(𝑣) = 𝑤(𝑣). The term
Conductance is sometimes used to denote the quantity |𝛿(𝑆)|/vol(𝑆) where vol(𝑆) =
∑_{𝑣∈𝑆} deg(𝑣) (here vol is short for volume). When a graph is regular the definitions
of expansion and conductance are the same but not in the general setting.
Note that we can capture conductance by setting weights on vertices where
𝑤(𝑣) = deg(𝑣).
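Since checking expansion exactly is hard in general, the following brute-force
sketch is only meant to make the definition concrete on tiny graphs (names
are illustrative):

from itertools import combinations

def edge_expansion(vertices, edges):
    # min over nonempty S with |S| <= |V|/2 of |delta(S)| / |S|
    vertices = list(vertices)
    best = float("inf")
    for size in range(1, len(vertices) // 2 + 1):
        for S in combinations(vertices, size):
            S = set(S)
            boundary = sum(1 for (u, v) in edges if (u in S) != (v in S))
            best = min(best, boundary / len(S))
    return best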
Some key applications: Uniform Sparsest Cut is fundamentally interesting
because it helps us directly and indirectly solve the Balanced Separator problem.
In the latter problem we want to partition 𝐺 = (𝑉 , 𝐸) into two pieces 𝐺1 = (𝑉1 , 𝐸1 )
and 𝐺2 = (𝑉2 , 𝐸2 ) where |𝑉1 | and |𝑉2 | are roughly the same size so that we
minimize the number of edges between 𝑉1 and 𝑉2 . One can repeatedly use
a sparse cut routine to get a balanced separator. The other key application is

to certify expansion of a graph. Expander graphs and relatives arise in many


applications and knowing whether a graph is expanding or not is very useful.

17.0.1 LP Relaxation and Maximum Concurrent Flow


How do we write an LP relaxation for Sparsest Cut? This is less obvious than it
is for Multicut and other cut problems where we have explicit terminal pairs
that we wish to separate. We consider the Non-Uniform Sparsest Cut since it is
the most general version. First we will try to develop an integer program. We
will have two sets of variables. For each pair (𝑠 𝑖 , 𝑡 𝑖 ) we will have a variable 𝑦 𝑖
to indicate whether we want to separate the pair. For each edge we will have
a variable 𝑥 𝑒 to indicate whether 𝑒 is cut. If we decide to separate pair 𝑖 then
for every path between 𝑠 𝑖 and 𝑡 𝑖 we should cut at least one edge on the path —
this is similar to relaxations we have seen before. The following captures the
problem:

    min  (∑_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒) / (∑_{𝑖=1}^{𝑘} 𝐷_𝑖 𝑦_𝑖)
    s.t. ∑_{𝑒∈𝑝} 𝑥_𝑒 ≥ 𝑦_𝑖      for all 𝑝 ∈ 𝒫_{𝑠_𝑖,𝑡_𝑖}, 𝑖 ∈ [𝑘]
         𝑥_𝑒 ∈ {0, 1}            for all 𝑒 ∈ 𝐸
         𝑦_𝑖 ∈ {0, 1}            for all 𝑖 ∈ [𝑘]

Note, however, that the objective is a ratio and not linear. It is a standard trick
to obtain an LP relaxation wherein we normalize the denominator in the ratio to
1 and relax the variables to be real-valued. Thus we obtain the following LP
relaxation.

    min  ∑_{𝑒∈𝐸} 𝑐_𝑒 𝑥_𝑒
    s.t. ∑_{𝑖=1}^{𝑘} 𝐷_𝑖 𝑦_𝑖 = 1
         ∑_{𝑒∈𝑝} 𝑥_𝑒 ≥ 𝑦_𝑖      for all 𝑝 ∈ 𝒫_{𝑠_𝑖,𝑡_𝑖}, 𝑖 ∈ [𝑘]
         𝑥_𝑒 ≥ 0                 for all 𝑒 ∈ 𝐸
         𝑦_𝑖 ≥ 0                 for all 𝑖 ∈ [𝑘]

Exercise 17.1. Show that the LP is indeed a relaxation for the Sparsest Cut
problem. Formally, given an integer feasible solution with sparsity 𝜆 find a
feasible solution to the relaxation such that its value is no more than 𝜆.

Now we consider the dual LP.

    max  𝜆
    s.t. ∑_{𝑝∈𝒫_{𝑠_𝑖,𝑡_𝑖}} 𝑦_𝑝 ≥ 𝜆𝐷_𝑖      for all 𝑖 ∈ [𝑘]
         ∑_{𝑖=1}^{𝑘} ∑_{𝑝∈𝒫_{𝑠_𝑖,𝑡_𝑖}, 𝑒∈𝑝} 𝑦_𝑝 ≤ 𝑐_𝑒      for all 𝑒 ∈ 𝐸
         𝑦_𝑝 ≥ 0      for all 𝑝 ∈ 𝒫_{𝑠_𝑖,𝑡_𝑖}, 𝑖 ∈ [𝑘]

The dual LP is a multicommodity flow. It solves the Maximum Concurrent


Multicommodity Flow problem for the given instance. In other words it finds
the largest value of 𝜆 such that there is a feasible multicommodity flow for the
given pairs in which the flow routed for pair (𝑠 𝑖 , 𝑡 𝑖 ) is at least 𝜆𝐷𝑖 . It is called
“concurrent flow” since we need to route all demand pairs to the same factor
which is in contrast to the dual of Multicut which corresponds to the maximum
throughput multicommodity flow (in which some pairs may have zero flow
while others have a lot of flow).

Exercise 17.2. Suppose we have a cut 𝑆 with sparsity 𝑐(𝛿(𝑆)) / (∑_{𝑖:|𝑆∩{𝑠_𝑖,𝑡_𝑖}|=1} 𝐷_𝑖).
Why is the maximum concurrent flow at most the sparsity of 𝑆?

Note that the LP can be solved via the Ellipsoid method. One can also
write a compact LP via distance variables which will help us later to focus on
constraining the metric in other ways.
    min  ∑_{𝑢𝑣∈𝐸} 𝑐(𝑢𝑣) 𝑑(𝑢𝑣)
    s.t. ∑_{𝑖=1}^{𝑘} 𝐷_𝑖 𝑑(𝑠_𝑖 𝑡_𝑖) = 1
         𝑑 is a metric on 𝑉

Flow-cut gap: The flow-cut gap in this context is the following equivalent way
of thinking about the problem. Consider a multicommodity flow instance on 𝐺
with demand pairs (𝑠 1 , 𝑡1 ), . . . , (𝑠 𝑘 , 𝑡 𝑘 ) and demand values 𝐷1 , . . . , 𝐷 𝑘 . Suppose
𝐺 satisfies the cut-condition, that is, for every 𝑆 ⊆ 𝑉 the capacity 𝑐(𝛿(𝑆)) is at least

the demand separated by 𝑆. Can we route all the demand pairs? This is true
when 𝑘 = 1 but is not true in general even for 𝑘 = 3 in undirected graphs. The
question is: what is the maximum value of 𝜆 such that we can route 𝜆𝐷𝑖 for every pair 𝑖?
The worst-case integrality gap of the preceding LP relaxation for Sparsest Cut is
precisely the flow-cut gap. One can ask about the flow-cut gap for all graphs, a
specific class of graphs, for a specific class of demand graphs, a specific class of
supply and demand graphs, and so on.
In these notes we will establish that the flow-cut gap in general undirected
graphs is at most 𝑂(log 𝑘). And there are instances where the gap is Ω(log 𝑘). It
is conjectured that the gap is 𝑂(1) for planar graphs but the best upper bound
we have is 𝑂(√log 𝑛). Resolving the flow-cut gap in planar graphs is a major
open problem.
Remark 17.2. Approximating the Sparsest Cut problem is not the same as
establishing flow-cut gaps. One can obtain improved approximations for Sparsest
Cut via stronger relaxations than the natural LP. Indeed the best approximation
ratio for Sparsest Cut is 𝑂(√log 𝑛) via an SDP relaxation.

17.1 Rounding LP via Connection to Multicut


There are close connections between Sparsest Cut and Multicut. By repeatedly
using a Sparsest Cut routine and a Set Cover style analysis one can prove the following.

Exercise 17.3. Suppose there is an 𝛼(𝑘, 𝑛)-approximation for Non-Uniform


Sparsest Cut. Prove that this implies an 𝑂(𝛼(𝑘, 𝑛) ln 𝑘)-approximation for
Multicut.

Can we prove some form of a converse? That is, can we use an approximation
algorithm for Multicut to obtain an approximation algorithm for Sparsest Cut?
Note that if someone told us the pairs to separate in an optimum solution
to the Sparsest Cut instance then we could use an (approximation) algorithm
for Multicut to separate those pairs. Here we show that one can use the LP
relaxation and obtain an algorithm via the integrality gap that we have already
established for the Multicut LP. We sketch the argument and focus our attention on
the simpler case when 𝐷𝑖 = 1 for all 𝑖 ∈ [𝑘]. We give this argument even though
it does not lead to the optimum ratio, for historical interest, as well as to illustrate
a useful high-level technique that has found applications in other settings.
Identifying the pairs to separate from LP solution: Suppose we solve the LP
and obtain a feasible solution (𝑥, 𝑦). 𝑦 𝑖 indicates the extent to which pair 𝑖 is
separated. Suppose we have an ideal situation where 𝑦 𝑖 ∈ {0, 𝑝} for every 𝑖.
Let 𝐴 = {𝑖 | 𝑦 𝑖 = 𝑝}. We have |𝐴| = 1/𝑝 since ∑_𝑖 𝑦 𝑖 = 1. Then it is intuitively
clear that the LP is separating the pairs in 𝐴. We can then solve the Multicut
clear that the LP is separating the pairs in 𝐴. We can then solve the Multicut
problem for the pairs in 𝐴 and consider the ratio of the cost of the cut to |𝐴|.
How do we argue about this algorithm? We do the following. Consider a
fractional assignment 𝑥 0 : 𝐸 → ℝ+ where 𝑥 0𝑒 = min{1, 𝑥 𝑒 /𝑝}; in other words
we scale each 𝑥 𝑒 by 1/𝑝. Note that 𝑦 𝑖 = 𝑑 𝑥 (𝑠 𝑖 , 𝑡 𝑖 ). Since we scaled up each 𝑥 𝑒
by 1/𝑝 it is not hard to see that 𝑑 𝑥0 (𝑠 𝑖 , 𝑡 𝑖 ) ≥ 1; in other words 𝑥 0 is a feasible
solution to the Multicut instance on 𝐺 for the pairs in 𝐴. The fractional cost of
𝑥 0 is 𝑒 𝑐 𝑒 𝑥 0𝑒 ≤ 𝑒 𝑐 𝑒 𝑥 𝑒 /𝑝. Thus, by the algorithm for Multicut in the previous
Í Í
chapter, we can find a feasible Multicut 𝐸0 ⊆ 𝐸 that separates all pairs in 𝐴 and
𝑐(𝐸0) = 𝑂(log 𝑘) 𝑒 𝑐 𝑒 𝑥 𝑒 /𝑝. What is the sparsity of this cut? It is 𝑐(𝐸0)/|𝐴| which
Í
is 𝑂(log 𝑘) 𝑒 𝑥 𝑒 . Thus the sparsity of the cut is 𝑂(log 𝑘)𝜆 where 𝜆 is the value
Í
of the LP relaxation.
Now we consider the general setting. Recall that ∑_𝑖 𝑦 𝑖 = 1. We partition
the pairs into groups that have similar 𝑦 𝑖 values. For 𝑗 ≥ 0, let 𝐴 𝑗 = {𝑖 | 𝑦 𝑖 ∈
(1/2^{𝑗+1} , 1/2^𝑗 ]}. Thus all pairs in 𝐴 𝑗 have 𝑦 𝑖 values that are within a factor of 2 of
each other.

Claim 17.1.1. There exists a 𝑗 ≤ log₂ 𝑘 such that ∑_{𝑖∈𝐴_𝑗} 𝑦 𝑖 ≥ 1/(2(1 + log₂ 𝑘)) ≥ 1/(4 log₂ 𝑘).

Proof. Consider any 𝑖 such that 𝑖 ∈ 𝐴 𝑗 where 𝑗 > log₂ 𝑘. By definition we
have 𝑦 𝑖 ≤ 1/(2𝑘). Since there are only 𝑘 pairs, ∑_{𝑗>log₂ 𝑘} ∑_{𝑖∈𝐴_𝑗} 𝑦 𝑖 ≤ 𝑘/(2𝑘) ≤ 1/2.
Thus ∑_{𝑗≤log₂ 𝑘} ∑_{𝑖∈𝐴_𝑗} 𝑦 𝑖 ≥ 1/2 and therefore there must be a 𝑗 ≤ log₂ 𝑘 such that
∑_{𝑖∈𝐴_𝑗} 𝑦 𝑖 ≥ 1/(2(1 + log₂ 𝑘)) (there are only so many groups). □

Consider the 𝐴 𝑗 with ∑_{𝑖∈𝐴_𝑗} 𝑦 𝑖 ≥ 1/(4 log₂ 𝑘). For
each 𝑖 ∈ 𝐴 𝑗 we have 1/2^{𝑗+1} ≤ 𝑦 𝑖 ≤ 1/2^𝑗 . Therefore |𝐴 𝑗 | ≥ 2^𝑗 /(4 log₂ 𝑘). The algorithm now separates
the pairs in 𝐴 𝑗 via an algorithm for Multicut.

Claim 17.1.2. Consider the fractional solution 𝑥′ : 𝐸 → [0, 1] where 𝑥′_𝑒 = min{1, 2^{𝑗+1} 𝑥_𝑒 }.
Then 𝑑_{𝑥′}(𝑠 𝑖 , 𝑡 𝑖 ) ≥ 1 for all 𝑖 ∈ 𝐴 𝑗 . Thus 𝑥′ is a feasible fractional solution to the Multicut
LP for separating the pairs in 𝐴 𝑗 .

Via the rounding algorithm in the preceding chapter we have that there is a
set 𝐸′ ⊆ 𝐸 such that 𝐸′ is a feasible multicut for the pairs in 𝐴 𝑗 and 𝑐(𝐸′) =
𝑂(log 𝑘) 2^{𝑗+1} ∑_𝑒 𝑐_𝑒 𝑥_𝑒 . The sparsity of this cut is 𝑐(𝐸′)/|𝐴 𝑗 | = 𝑂(log² 𝑘) ∑_𝑒 𝑐_𝑒 𝑥_𝑒 .
Thus we obtained an 𝑂(log² 𝑘)-approximation for Sparsest Cut when 𝐷𝑖 = 1 for
each pair.
Remark 17.3. When demands are not 1 (or identical) the preceding argument
yields an 𝑂(log 𝑘 log 𝐷) approximation where 𝐷 = ∑_𝑖 𝐷𝑖 with the normalization
that 𝐷𝑖 ≥ 1 for all 𝑖.

17.2 Rounding via ℓ1 embeddings


The optimal rounding of the LP relaxation turns out to go via metric em-
bedding theory and this connection was discovered by Linial, London and
Rabinovich [117] and Aumann and Rabani [AR98]. We need some basics in
metric embeddings to point out the connection and rounding.

17.2.1 A digression through trees


It is instructive to consider the simple setting when 𝐺 is a tree 𝑇 = (𝑉 , 𝐸). In this
case it is easy to find the sparsest cut. For each edge 𝑒 ∈ 𝑇 we can associate a
cut 𝑆 𝑒 which is one side of the two components in 𝑇 − 𝑒. The capacity of the cut
𝛿(𝑆 𝑒 ), by defintion, is 𝑐 𝑒 . Let 𝐷(𝑒) = 𝑖:𝑆𝑒 ∪{𝑠 𝑖 ,𝑡 𝑖 }=1 𝐷𝑖 be the demand separated
Í
by 𝑒. The sparsity of the cut 𝑆 𝑒 is simply 𝑐 𝑒 /𝐷𝑒 . Finding sparsest cut in a tree is
easy from the following exercise.
Exercise 17.4. The sparsest cut in a tree is given by arg min𝑒 𝑐 𝑒 /𝐷𝑒 .
A more interesting exercise is to prove that the LP relaxation gives an
optimum solution on a tree.
Lemma 17.1. Let (𝑥, 𝑦) be a feasible solution to the LP with objective value 𝜆. If 𝐺 is a
tree 𝑇 then there is an edge 𝑒 ∈ 𝑇 such that 𝑐_𝑒 /𝐷_𝑒 ≤ 𝜆.
Proof. We have 𝜆 = (∑_𝑒 𝑐_𝑒 𝑥_𝑒) / (∑_𝑖 𝐷_𝑖 𝑑_𝑥(𝑠_𝑖, 𝑡_𝑖)) where 𝑑_𝑥(𝑠_𝑖, 𝑡_𝑖) is the shortest path distance
between 𝑠_𝑖 and 𝑡_𝑖 . There is a unique path 𝑃_{𝑠_𝑖,𝑡_𝑖} from 𝑠_𝑖 to 𝑡_𝑖 in a tree so
𝑑_𝑥(𝑠_𝑖, 𝑡_𝑖) = ∑_{𝑒∈𝑃_{𝑠_𝑖,𝑡_𝑖}} 𝑥_𝑒 . Thus,

    𝜆 = (∑_𝑒 𝑐_𝑒 𝑥_𝑒) / (∑_𝑖 𝐷_𝑖 𝑑_𝑥(𝑠_𝑖, 𝑡_𝑖))
      = (∑_𝑒 𝑐_𝑒 𝑥_𝑒) / (∑_𝑖 𝐷_𝑖 ∑_{𝑒∈𝑃_{𝑠_𝑖,𝑡_𝑖}} 𝑥_𝑒)
      = (∑_𝑒 𝑐_𝑒 𝑥_𝑒) / (∑_𝑒 𝐷_𝑒 𝑥_𝑒)        (exchanging the order of summation)
      ≥ min_𝑒 𝑐_𝑒 /𝐷_𝑒 .

In the last inequality we are using the simple fact that (𝑎_1 + 𝑎_2 + ... + 𝑎_𝑛)/(𝑏_1 + 𝑏_2 + ... + 𝑏_𝑛) ≥ min_𝑖 𝑎_𝑖/𝑏_𝑖 for
positive 𝑎's and 𝑏's. □
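A sketch of the resulting algorithm on trees using networkx (the input format
— a tree with a 'cost' edge attribute and a list of (s, t, D) triples — and the
function name are illustrative):

import networkx as nx

def tree_sparsest_cut(T, pairs):
    best_edge, best_ratio = None, float("inf")
    for (u, v, data) in list(T.edges(data=True)):
        T.remove_edge(u, v)
        side = nx.node_connected_component(T, u)   # the side S_e containing u
        T.add_edge(u, v, **data)                   # restore the edge
        D_e = sum(D for (s, t, D) in pairs if (s in side) != (t in side))
        if D_e > 0 and data["cost"] / D_e < best_ratio:
            best_edge, best_ratio = (u, v), data["cost"] / D_e
    return best_edge, best_ratio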
What made the proof work for trees? Is there a more general phenomenon
than the fact that trees are pretty simple structures? It turns out that the key fact
is that shortest path distances induced by a tree are ℓ1 metrics, or equivalently lie
in the cut cone.

17.2.2 Cut metrics, line metrics, and ℓ1 metrics


Let (𝑉 , 𝑑) be a finite metric space. We will be interested in two special types of
metrics.

Definition 17.1. Let 𝑉 be a finite set and let 𝑆 ⊆ 𝑉. The metric 𝑑𝑆 associated with the
cut 𝑆 is the following: 𝑑𝑆 (𝑢, 𝑣) = 1 if |𝑆 ∩ {𝑢, 𝑣}| = 1 and 𝑑𝑆 (𝑢, 𝑣) = 0 otherwise.

Definition 17.2. Let (𝑉 , 𝑑) be a finite metric space. The metric 𝑑 is a cut metric if
there is a set 𝑆 ⊂ 𝑉 such that 𝑑 = 𝑑_𝑆 . 𝑑 is in the cut cone (or in the cone of cut metrics)
if there exist non-negative scalars 𝑦_𝑆 , 𝑆 ⊂ 𝑉 such that 𝑑(𝑢, 𝑣) = ∑_{𝑆⊂𝑉} 𝑦_𝑆 𝑑_𝑆 (𝑢, 𝑣) for
all 𝑢, 𝑣 ∈ 𝑉.

Definition 17.3. Let (𝑉 , 𝑑) be a finite metric space. The metric 𝑑 is a line metric if
there is a mapping 𝑓 : 𝑉 → ℝ (the real line) such that 𝑑(𝑢, 𝑣) = | 𝑓 (𝑢) − 𝑓 (𝑣)| for all
𝑢, 𝑣 ∈ 𝑉.

Definition 17.4. Let (𝑉 , 𝑑) be a finite metric space. The metric 𝑑 is an ℓ1 metric if
there is some integer 𝑚 and a mapping 𝑓 : 𝑉 → ℝ^𝑚 (Euclidean space in 𝑚 dimensions)
such that 𝑑(𝑢, 𝑣) = ‖ 𝑓 (𝑢) − 𝑓 (𝑣)‖_1 (the ℓ1 distance) for all 𝑢, 𝑣 ∈ 𝑉.

Claim 17.2.1. A metric (𝑉 , 𝑑) is an ℓ 1 metric iff it is a non-negative combination of


line metrics (in the cone of line metrics).

Proof Sketch. If 𝑑 is an ℓ1 metric then each dimension corresponds to a line metric


and since the ℓ 1 metric is separable over the dimensions it is a non-negative
combination of line metrics. Conversely, any non-negative combination of line
metrics can be made into an ℓ1 metric where each line metric becomes a separate
dimension (scalar multiplication of a line metric is also a line metric). 
Lemma 17.2. 𝑑 is an ℓ1 metric iff 𝑑 is in the cut cone.

Proof. Consider the metric 𝑑_𝑆 . It is easy to see that it is a simple line metric. Map all
vertices in 𝑆 to 0 and all vertices in 𝑉 − 𝑆 to 1. If 𝑑 is in the cut cone then it is
a non-negative combination of the cut metrics, and hence it is a non-negative
combination of line metrics, and hence an ℓ1 metric.
To prove the converse, it suffices to argue that any line metric is in the cut
cone. Let 𝑉 = {𝑣_1 , 𝑣_2 , . . . , 𝑣_𝑛 } and let 𝑑 be a line metric on 𝑉. Without loss of
generality assume that the coordinates of the points corresponding to the line
metric 𝑑 are 𝑥_1 ≤ 𝑥_2 ≤ · · · ≤ 𝑥_𝑛 on the real line. For 1 ≤ 𝑖 < 𝑛 let 𝑆_𝑖 = {𝑣_1 , 𝑣_2 , . . . , 𝑣_𝑖 }.
It is not hard to verify that ∑_{𝑖=1}^{𝑛−1} (𝑥_{𝑖+1} − 𝑥_𝑖) 𝑑_{𝑆_𝑖} = 𝑑. □
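The converse direction is constructive; here is a minimal sketch of the
decomposition of a single line metric into cut metrics (names illustrative):

def line_metric_to_cuts(points):
    # points: dict vertex -> coordinate on the real line
    # returns [(S_i, w_i)] with d(u, v) = sum_i w_i * d_{S_i}(u, v)
    order = sorted(points, key=points.get)          # v_1, ..., v_n by coordinate
    cuts = []
    for i in range(len(order) - 1):
        w = points[order[i + 1]] - points[order[i]]
        if w > 0:
            cuts.append((set(order[: i + 1]), w))   # S_i = {v_1, ..., v_i}
    return cuts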


17.2.3 Brief introduction to metric embeddings


Let (𝑉 , 𝑑) be a finite metric space. Note that any finite metric space can be
viewed as one that is derived from the shortest path metric induced on a graph
with some non-negative edge lengths. Thus if 𝐺 = (𝑉 , 𝐸) is a simple graph
and ℓ : 𝐸 → ℝ+ are some edge-lengths then the metric induced on 𝑉 depends
both on the “topology” of 𝐺 as well as the lengths. Finite metrics can encode
graph structure and hence can be diverse. When trying to round we may want
to work with simpler metric spaces. One way to do this is to embed a given
metric space (𝑉 , 𝑑) into a simpler host metric space (𝑉′, 𝑑′). An embedding is
simply a mapping of 𝑉 to 𝑉′. Note that even though we may be interested
in a finite metric space 𝑉 the host metric space can be continuous/infinite such
as the Euclidean space in some dimension 𝑑. Embeddings typically distort
the distances and thus one wants to find embeddings with small distortion.
Distortion can be measured in an additive or in a relative sense and for our purposes
we will mainly focus on the relative notion of distortion.
Definition 17.5. An embedding of a finite metric space (𝑉 , 𝑑) to a host metric space
(𝑉′, 𝑑′) is a mapping 𝑓 : 𝑉 → 𝑉′. The embedding is an isometric embedding
if 𝑑(𝑢, 𝑣) = 𝑑′( 𝑓 (𝑢), 𝑓 (𝑣)) for all 𝑢, 𝑣 ∈ 𝑉. An embedding is a contraction if
𝑑′( 𝑓 (𝑢), 𝑓 (𝑣)) ≤ 𝑑(𝑢, 𝑣) for all 𝑢, 𝑣 ∈ 𝑉. An embedding is non-contracting if
𝑑′( 𝑓 (𝑢), 𝑓 (𝑣)) ≥ 𝑑(𝑢, 𝑣) for all 𝑢, 𝑣 ∈ 𝑉.
Definition 17.6. Let (𝑉 , 𝑑) and (𝑉′, 𝑑′) be two metric spaces and let 𝑓 : 𝑉 → 𝑉′ be
an embedding. The distortion of 𝑓 is max_{𝑢,𝑣∈𝑉 ,𝑢≠𝑣} max{ 𝑑(𝑢,𝑣)/𝑑′( 𝑓 (𝑢), 𝑓 (𝑣)) , 𝑑′( 𝑓 (𝑢), 𝑓 (𝑣))/𝑑(𝑢,𝑣) }.
(𝑢), 𝑓 (𝑣))
, 𝑑(𝑢,𝑣) }.

Of particular importance are embeddings of finite metric spaces into Eu-


clidean space ℝ 𝑑 where the distance is measured under various norms such
as the ℓ 𝑝 norm for various values of 𝑝. Of particular interest are ℓ 1 , ℓ2 , ℓ∞ . An
embedding of a finite metric space (𝑉 , 𝑑) into ℝ 𝑑 means that we map each 𝑣 to a
point (𝑥_1 , 𝑥_2 , . . . , 𝑥_𝑑 ) and the distance between say 𝑥, 𝑦 is measured as ‖𝑥 − 𝑦‖
for some norm of interest.
The dimension 𝑑 is also important in various applications but in some settings
like with Sparsest Cut the dimension is not important.
Theorem 17.7 (Bourgain). Any 𝑛-point finite metric space can be embedded into ℓ2
(and hence also ℓ1 ) with distortion 𝑂(log 𝑛). Moreover the embedding is a contraction
and can be constructed in randomized polynomial time and embeds points into ℝ^𝑑 where
𝑑 = 𝑂(log² 𝑛).
In fact one can obtain a refined theorem that is useful for Sparsest Cut.

Theorem 17.8 (Bourgain). Let (𝑉 , 𝑑) be an 𝑛-point finite metric space and let 𝑆 ⊆ 𝑉
with |𝑆| = 𝑘. Then there is a randomized polynomial time algorithm to compute an
embedding 𝑓 : 𝑉 → ℝ𝑑 , with 𝑑 = 𝑂(log² 𝑛), such that (i) the embedding is a contraction
(that is, ‖ 𝑓 (𝑢) − 𝑓 (𝑣)‖1 ≤ 𝑑(𝑢, 𝑣) for all 𝑢, 𝑣 ∈ 𝑉) and (ii) for every 𝑢, 𝑣 ∈ 𝑆,
‖ 𝑓 (𝑢) − 𝑓 (𝑣)‖1 ≥ (𝑐/log 𝑘) 𝑑(𝑢, 𝑣) for some universal constant 𝑐.

17.2.4 Utilizing the ℓ1 embedding


We saw that the integrality gap of the LP is 1 on trees since the shortest path
metric on trees is in the cut cone (equivalently ℓ 1 -embeddable). More generally
one can prove that if the shortest path metric on a graph 𝐺 embeds into ℓ 1 with
distortion 𝛼 then the integrality gap of the LP for Sparsest Cut is at most 𝛼. This
will imply an 𝑂(log 𝑛)-integrality gap via Bourgain’s theorem since any 𝑛-point
finite metric embeds into ℓ1 with distortion 𝑂(log 𝑛).

Theorem 17.9. Let 𝐺 = (𝑉 , 𝐸) be a graph. Suppose any finite metric induced by edge
lengths on 𝐸 can be embedded into ℓ 1 with distortion 𝛼. Then the integrality gap of the
LP for Sparsest Cut is at most 𝛼 for any instance on 𝐺.

Proof. Let (𝑥, 𝑦) be a feasible fractional solution and let 𝑑 be the metric induced
by the edge lengths given by 𝑥. Let 𝜆 be the value of the solution and recall that

𝜆 = ( Σ𝑢𝑣∈𝐸 𝑐(𝑢𝑣) 𝑑(𝑢, 𝑣) ) / ( Σ𝑖 𝐷𝑖 𝑑(𝑠𝑖 , 𝑡𝑖 ) ).

Since 𝑑 can be embedded into ℓ1 with distortion at most 𝛼 and any ℓ1 metric
is in the cut cone, there are scalars 𝑧𝑆 ≥ 0, 𝑆 ⊂ 𝑉, such that for all 𝑢, 𝑣,

Σ𝑆⊂𝑉 𝑧𝑆 𝑑𝑆 (𝑢, 𝑣) ≤ 𝑑(𝑢, 𝑣) ≤ 𝛼 Σ𝑆⊂𝑉 𝑧𝑆 𝑑𝑆 (𝑢, 𝑣).

Here we assumed, without loss of generality (by scaling), that the embedding is a
contraction. For a set 𝑆 ⊂ 𝑉 we use Dem(𝛿(𝑆)) = Σ𝑖:|𝑆∩{𝑠𝑖 ,𝑡𝑖 }|=1 𝐷𝑖 to denote the total
demand crossing the cut 𝑆.

𝜆 = ( Σ𝑢𝑣∈𝐸 𝑐(𝑢𝑣) 𝑑(𝑢, 𝑣) ) / ( Σ𝑖 𝐷𝑖 𝑑(𝑠𝑖 , 𝑡𝑖 ) )
  ≥ (1/𝛼) · ( Σ𝑢𝑣∈𝐸 𝑐(𝑢𝑣) Σ𝑆⊂𝑉 𝑧𝑆 𝑑𝑆 (𝑢, 𝑣) ) / ( Σ𝑖 𝐷𝑖 Σ𝑆⊂𝑉 𝑧𝑆 𝑑𝑆 (𝑠𝑖 , 𝑡𝑖 ) )
  = (1/𝛼) · ( Σ𝑆⊂𝑉 𝑧𝑆 𝑐(𝛿(𝑆)) ) / ( Σ𝑆⊂𝑉 𝑧𝑆 Dem(𝛿(𝑆)) )
  ≥ (1/𝛼) · min𝑆⊂𝑉 𝑐(𝛿(𝑆)) / Dem(𝛿(𝑆)).

Thus there is a cut whose sparsity is at most 𝛼 · 𝜆. 



Polynomial-time algorithm: How do we find a sparse cut? The preceding
proof used the embedding into a metric in the cut cone. The proof shows
that one of the cuts with 𝑧𝑆 > 0 has sparsity at most 𝛼 · 𝜆. Recall the proof
that a metric is in the cut cone iff it is ℓ1-embeddable. That argument shows
the following. Suppose we have an ℓ1 embedding into 𝑑 dimensions. Each
dimension corresponds to a line embedding, and each line embedding is in the
cut cone with only 𝑛 − 1 cuts used to express it. Thus, given an ℓ1 embedding
into 𝑑 dimensions with distortion 𝛼, we only need to try the 𝑑(𝑛 − 1) threshold cuts
and one of them is guaranteed to have sparsity at most 𝛼 · 𝜆.
Via Theorem 17.8 we can obtain a randomized 𝑂(log 𝑘)-approximation; the
algorithm is described below.

SparseCutviaEmbedding

1. Solve the LP relaxation to obtain (𝑥, 𝑦) and the metric 𝑑𝑥 on 𝑉

2. Use Theorem 17.8 to obtain a map 𝑓 : 𝑉 → ℝ𝑑

3. For 𝑖 = 1 to 𝑑 do

   A. Let 𝑣𝑗1 , 𝑣𝑗2 , . . . , 𝑣𝑗𝑛 be the sorting of 𝑉 according to coordinate 𝑖

   B. For ℎ = 1 to 𝑛 − 1 let 𝑆𝑖,ℎ = {𝑣𝑗1 , 𝑣𝑗2 , . . . , 𝑣𝑗ℎ }

4. Among all cuts 𝑆𝑖,ℎ with 1 ≤ 𝑖 ≤ 𝑑 and 1 ≤ ℎ ≤ 𝑛 − 1 output the one
with the smallest sparsity.
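The cut-extraction step can be sketched as follows (Python, with hypothetical helper names; it assumes the embedding, the edge capacities, and the demand pairs are given as inputs): for each coordinate we sort the vertices and evaluate the 𝑛 − 1 threshold cuts, returning the sparsest cut found.

# Sketch of steps 3-4 of SparseCutviaEmbedding: given an l1 embedding
# f[v] (a list of d coordinates per vertex), enumerate the d*(n-1)
# threshold cuts and return the one of smallest sparsity.

def sparsity(S, capacities, demands):
    # capacities: dict {(u, v): c(uv)}; demands: list of (s_i, t_i, D_i)
    cut_cap = sum(c for (u, v), c in capacities.items() if (u in S) != (v in S))
    cut_dem = sum(D for (s, t, D) in demands if (s in S) != (t in S))
    return float('inf') if cut_dem == 0 else cut_cap / cut_dem

def best_threshold_cut(f, capacities, demands):
    vertices = list(f)
    d = len(next(iter(f.values())))
    best_val, best_S = float('inf'), None
    for i in range(d):
        order = sorted(vertices, key=lambda v: f[v][i])
        for h in range(1, len(order)):
            S = set(order[:h])
            val = sparsity(S, capacities, demands)
            if val < best_val:
                best_val, best_S = val, S
    return best_S, best_val

# Toy usage: a 4-cycle with unit capacities, one unit demand between 0 and 2,
# and a (hypothetical) 1-dimensional embedding.
caps = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}
dems = [(0, 2, 1.0)]
emb = {0: [0.0], 1: [1.0], 2: [2.0], 3: [1.0]}
print(best_threshold_cut(emb, caps, dems))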

Exercise 17.5. Use the refined guarantee in Theorem 17.8 and the proof outline in
Theorem 17.9 to show that the described algorithm is a randomized 𝑂(log 𝑘)-approximation
algorithm for Sparsest Cut.

17.3 SDP and Spectral Relaxations


To be filled.

Bibliographic Notes
The highly influential paper of Leighton and Rao [113] obtained an 𝑂(log 𝑛)-
approximation and flow-cut gap for Uniform Sparsest Cut and introduced
the region growing argument as well as the lower bound via expanders (an
important influence is the paper of Shahrokhi and Matula [SharokhiM99]). [113]
demonstrated many applications of the divide and conquer approach. There is

a large literature on Sparsest Cut and related problems and we only touched
upon a small part. An outstanding open problem is whether the flow-cut gap
for Non-Uniform Sparsest Cut in planar graphs is 𝑂(1) (this is called the GNRS
conjecture [73] in the more general context of minor-free graphs); Rao, building
on ideas from [104], showed that the gap is 𝑂(√log 𝑛) [134]. No super-constant
lower bound is known for planar graphs. The theory of metric embeddings
has been a fruitful bridge between TCS and mathematics and there are several
surveys and connections from both perspectives.
Bibliography

[1] Ittai Abraham, Cyril Gavoille, Anupam Gupta, Ofer Neiman, and Kunal
Talwar. “Cops, robbers, and threatening skeletons: Padded decomposi-
tion for minor-free graphs”. In: SIAM Journal on Computing 48.3 (2019),
pp. 1120–1145.
[2] Anna Adamaszek, Parinya Chalermsook, Alina Ene, and Andreas Wiese.
“Submodular unsplittable flow on trees”. In: International Conference
on Integer Programming and Combinatorial Optimization. Springer. 2016,
pp. 337–349.
[3] Amit Agarwal, Noga Alon, and Moses S Charikar. “Improved approxi-
mation for directed cut problems”. In: Proceedings of the thirty-ninth annual
ACM symposium on Theory of computing. 2007, pp. 671–680.
[4] Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. “Adaptive sampling
for k-means clustering”. In: Approximation, Randomization, and Combina-
torial Optimization. Algorithms and Techniques. Springer, 2009, pp. 15–
28.
[5] Ajit Agrawal, Philip Klein, and Ramamoorthi Ravi. “When trees collide:
An approximation algorithm for the generalized Steiner problem on
networks”. In: SIAM journal on Computing 24.3 (1995), pp. 440–456.
[6] Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. “Streaming k-means
approximation.” In: NIPS. Vol. 22. 2009, pp. 10–18.
[7] Karhan Akcoglu, James Aspnes, Bhaskar DasGupta, and Ming-Yang Kao.
“Opportunity cost algorithms for combinatorial auctions”. In: Computa-
tional Methods in Decision-Making, Economics and Finance. Springer, 2002,
pp. 455–479.
[8] Matthew Andrews, Julia Chuzhoy, Venkatesan Guruswami, Sanjeev
Khanna, Kunal Talwar, and Lisa Zhang. “Inapproximability of edge-
disjoint paths and low congestion routing on undirected graphs”. In:
Combinatorica 30.5 (2010), pp. 485–520.


[9] Kenneth Appel, Wolfgang Haken, et al. “Every planar map is four
colorable. Part I: Discharging”. In: Illinois Journal of Mathematics 21.3
(1977), pp. 429–490.
[10] Sanjeev Arora. “Polynomial time approximation schemes for Euclidean
traveling salesman and other geometric problems”. In: Journal of the ACM
(JACM) 45.5 (1998), pp. 753–782.
[11] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh
Munagala, and Vinayaka Pandit. “Local search heuristics for k-median
and facility location problems”. In: SIAM Journal on computing 33.3 (2004),
pp. 544–562.
[12] Arash Asadpour, Michel X Goemans, Aleksander Mądry, Shayan Oveis
Gharan, and Amin Saberi. “An O (log n/log log n)-approximation
algorithm for the asymmetric traveling salesman problem”. In: Operations
Research 65.4 (2017). Preliminary version in Proc. of ACM-SIAM SODA,
2010., pp. 1043–1061.
[13] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali
Kemal Sinop. “The Hardness of Approximation of Euclidean k-Means”.
In: 31st International Symposium on Computational Geometry (SoCG 2015). Ed.
by Lars Arge and János Pach. Vol. 34. Leibniz International Proceedings
in Informatics (LIPIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-
Zentrum fuer Informatik, 2015, pp. 754–767. isbn: 978-3-939897-83-5. doi:
10.4230/LIPIcs.SOCG.2015.754. url: http://drops.dagstuhl.de/opus/
volltexte/2015/5117.
[14] Baruch Awerbuch, Yossi Azar, and Yair Bartal. “On-line generalized
Steiner problem”. In: Theoretical Computer Science 324.2-3 (2004), pp. 313–
324.
[15] Bahman Bahmani, Benjamin Moseley, Andrea Vattani, Ravi Kumar, and
Sergei Vassilvitskii. “Scalable K-Means++”. In: Proc. VLDB Endow. 5.7
(Mar. 2012), pp. 622–633. issn: 2150-8097. doi: 10.14778/2180912.2180915.
url: https://doi.org/10.14778/2180912.2180915.
[16] Tanvi Bajpai, Deeparnab Chakrabarty, Chandra Chekuri, and Maryam
Negahbani. “Revisiting Priority 𝑘-Center: Fairness and Outliers”. In:
arXiv preprint arXiv:2103.03337 (2021).
[17] Brenda S Baker. “Approximation algorithms for NP-complete problems
on planar graphs”. In: Journal of the ACM (JACM) 41.1 (1994), pp. 153–180.
[18] Nikhil Bansal, Nitish Korula, Viswanath Nagarajan, and Aravind Srini-
vasan. “Solving packing integer programs via randomized rounding
with alterations”. In: Theory of Computing 8.1 (2012), pp. 533–565.

[19] Reuven Bar-Yehuda and Shimon Even. “A linear-time approximation


algorithm for the weighted vertex cover problem”. In: Journal of Algorithms
2.2 (1981), pp. 198–203.
[20] Yair Bartal, Moses Charikar, and Danny Raz. “Approximating Min-Sum
k-Clustering in Metric Spaces”. In: Proceedings of the Thirty-Third
Annual ACM Symposium on Theory of Computing. STOC ’01. Hersonissos,
Greece: Association for Computing Machinery, 2001, pp. 11–20. isbn:
1581133499. doi: 10.1145/380752.380754. url: https://doi.org/10.1145/
380752.380754.
[21] Kristóf Bérczi, Karthekeyan Chandrasekaran, Tamás Király, and Vivek
Madan. “Improving the integrality gap for multiway cut”. In: Mathematical
Programming 183.1 (2020), pp. 171–193.
[22] Marshall Bern and Paul Plassmann. “The Steiner problem with edge
lengths 1 and 2”. In: Information Processing Letters 32.4 (1989), pp. 171–176.
[23] Glencora Borradaile, Philip Klein, and Claire Mathieu. “An 𝑂(𝑛 log 𝑛)
approximation scheme for Steiner tree in planar graphs”. In: ACM
Transactions on Algorithms (TALG) 5.3 (2009), pp. 1–31.
[24] Simon Bruggmann and Rico Zenklusen. “Submodular maximization
through the lens of linear programming”. In: Mathematics of Operations
Research 44.4 (2019), pp. 1221–1244.
[25] Niv Buchbinder and Moran Feldman. “Submodular functions maximiza-
tion problems”. In: Handbook of Approximation Algorithms and Metaheuris-
tics, Second Edition. Chapman and Hall/CRC, 2018, pp. 753–788.
[26] Jaroslaw Byrka, Fabrizio Grandoni, Thomas Rothvoß, and Laura Sanita.
“An improved LP-based approximation for Steiner tree”. In: Proceedings of
the forty-second ACM symposium on Theory of computing. 2010, pp. 583–592.
[27] G. Calinescu, H. Karloff, and Y. Rabani. “Approximation algorithms
for the 0-extension problem”. In: Proceedings of the twelfth annual ACM-
SIAM symposium on Discrete algorithms. Society for Industrial and Applied
Mathematics Philadelphia, PA, USA. 2001, pp. 8–16.
[28] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrák. “Maxi-
mizing a monotone submodular function subject to a matroid constraint”.
In: SIAM Journal on Computing 40.6 (2011), pp. 1740–1766.
[29] Robert Carr and Santosh Vempala. “Randomized metarounding”. In: Pro-
ceedings of the thirty-second annual ACM symposium on Theory of computing.
2000, pp. 58–62.

[30] Amit Chakrabarti, Chandra Chekuri, Anupam Gupta, and Amit Ku-
mar. “Approximation algorithms for the unsplittable flow problem”. In:
Algorithmica 47.1 (2007), pp. 53–78.
[31] Deeparnab Chakrabarty and Maryam Negahbani. “Generalized center
problems with outliers”. In: ACM Transactions on Algorithms (TALG) 15.3
(2019), pp. 1–14.
[32] Timothy M Chan. “Approximation schemes for 0-1 knapsack”. In: 1st
Symposium on Simplicity in Algorithms (SOSA 2018). Schloss Dagstuhl-
Leibniz-Zentrum fuer Informatik. 2018.
[33] Ashok K Chandra, Daniel S. Hirschberg, and Chak-Kuen Wong. “Ap-
proximate algorithms for some generalized knapsack problems”. In:
Theoretical Computer Science 3.3 (1976), pp. 293–304.
[34] Moses Charikar, Chandra Chekuri, To-Yat Cheung, Zuo Dai, Ashish Goel,
Sudipto Guha, and Ming Li. “Approximation algorithms for directed
Steiner problems”. In: Journal of Algorithms 33.1 (1999), pp. 73–91.
[35] Moses Charikar, Sudipto Guha, Éva Tardos, and David B Shmoys. “A
constant-factor approximation algorithm for the k-median problem”. In:
Journal of Computer and System Sciences 65.1 (2002), pp. 129–149.
[36] Shuchi Chawla, Robert Krauthgamer, Ravi Kumar, Yuval Rabani, and
D Sivakumar. “On the hardness of approximating multicut and sparsest-
cut”. In: computational complexity 15.2 (2006), pp. 94–114.
[37] Chandra Chekuri, Sreeram Kannan, Adnan Raja, and Pramod Viswanath.
“Multicommodity flows and cuts in polymatroidal networks”. In: SIAM
Journal on Computing 44.4 (2015), pp. 912–943.
[38] Chandra Chekuri and Sanjeev Khanna. “A polynomial time approxima-
tion scheme for the multiple knapsack problem”. In: SIAM Journal on
Computing 35.3 (2005), pp. 713–728.
[39] Chandra Chekuri and Vivek Madan. “Approximating multicut and the
demand graph”. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM
Symposium on Discrete Algorithms. SIAM. 2017, pp. 855–874.
[40] Chandra Chekuri and Kent Quanrud. “On approximating (sparse) cover-
ing integer programs”. In: Proceedings of the Thirtieth Annual ACM-SIAM
Symposium on Discrete Algorithms. SIAM. 2019, pp. 1596–1615.
[41] Chandra Chekuri and Thapanapong Rukkanchanunt. “A note on it-
erated rounding for the Survivable Network Design Problem”. In: 1st
Symposium on Simplicity in Algorithms (SOSA 2018). Schloss Dagstuhl-
Leibniz-Zentrum fuer Informatik. 2018.

[42] Miroslav Chlebík and Janka Chlebíková. “Approximation hardness of the


Steiner tree problem on graphs”. In: Scandinavian Workshop on Algorithm
Theory. Springer. 2002, pp. 170–179.
[43] Nicos Christofides. Worst-case analysis of a new heuristic for the travel-
ling salesman problem. Tech. rep. Carnegie-Mellon Univ Pittsburgh Pa
Management Sciences Research Group, 1976.
[44] Julia Chuzhoy, Venkatesan Guruswami, Sanjeev Khanna, and Kunal
Talwar. “Hardness of Routing with Congestion in Directed Graphs”.
In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory
of Computing. STOC ’07. San Diego, California, USA: Association for
Computing Machinery, 2007, pp. 165–178. isbn: 9781595936318. doi: 10.
1145/1250790.1250816. url: https://doi.org/10.1145/1250790.1250816.
[45] Julia Chuzhoy and Sanjeev Khanna. “Polynomial flow-cut gaps and
hardness of directed cut problems”. In: Journal of the ACM (JACM) 56.2
(2009), pp. 1–28.
[46] Elias Dahlhaus, David S. Johnson, Christos H. Papadimitriou, Paul D.
Seymour, and Mihalis Yannakakis. “The complexity of multiterminal
cuts”. In: SIAM Journal on Computing 23.4 (1994), pp. 864–894.
[47] George Dantzig, Ray Fulkerson, and Selmer Johnson. “Solution of a large-
scale traveling-salesman problem”. In: Journal of the operations research
society of America 2.4 (1954), pp. 393–410.
[48] W Fernandez De La Vega and George S. Lueker. “Bin packing can be
solved within 1+ 𝜀 in linear time”. In: Combinatorica 1.4 (1981), pp. 349–
355.
[49] Yefim Dinitz, Naveen Garg, and Michel X Goemans. “On the single-source
unsplittable flow problem”. In: Combinatorica 19.1 (1999), pp. 17–41.
[50] Irit Dinur and Samuel Safra. “On the hardness of approximating mini-
mum vertex cover”. In: Annals of mathematics (2005), pp. 439–485.
[51] D-Z Du and Frank K. Hwang. “A proof of the Gilbert-Pollak conjecture
on the Steiner ratio”. In: Algorithmica 7.1 (1992), pp. 121–135.
[52] Jack Edmonds. “Optimum branchings”. In: Journal of Research of the
National Bureau of Standards, B 71 (1967), pp. 233–240.
[53] Guy Even, Joseph Seffi Naor, Satish Rao, and Baruch Schieber. “Divide-
and-conquer approximation algorithms via spreading metrics”. In: Journal
of the ACM (JACM) 47.4 (2000), pp. 585–616.
[54] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. “A tight bound on
approximating arbitrary metrics by tree metrics”. In: Journal of Computer
and System Sciences 69.3 (2004), pp. 485–497.

[55] Tomás Feder and Daniel Greene. “Optimal algorithms for approximate
clustering”. In: Proceedings of the twentieth annual ACM symposium on
Theory of computing. 1988, pp. 434–444.
[56] Uriel Feige. “A threshold of ln n for approximating set cover”. In: Journal
of the ACM (JACM) 45.4 (1998), pp. 634–652.
[57] Uriel Feige, Vahab S Mirrokni, and Jan Vondrak. “Maximizing non-
monotone submodular functions”. In: SIAM Journal on Computing 40.4
(2011), pp. 1133–1153.
[58] Uriel Feige and Jan Vondrak. “Approximation algorithms for allocation
problems: Improving the factor of 1-1/e”. In: 2006 47th Annual IEEE
Symposium on Foundations of Computer Science (FOCS’06). IEEE. 2006,
pp. 667–676.
[59] Moran Feldman, Joseph Seffi Naor, Roy Schwartz, and Justin Ward. “Im-
proved approximations for k-exchange systems”. In: European Symposium
on Algorithms. Springer. 2011, pp. 784–798.
[60] Moran Feldman, Joseph Seffi Naor, Roy Schwartz, and Justin Ward. “Im-
proved approximations for k-exchange systems”. In: European Symposium
on Algorithms. Springer. 2011, pp. 784–798.
[61] Marshall L Fisher, George L Nemhauser, and Laurence A Wolsey. “An
analysis of approximations for maximizing submodular set functions II”.
In: Polyhedral combinatorics (1978), pp. 73–87.
[62] Michael R Garey and David S Johnson. Computers and intractability.
Vol. 174. Freeman, San Francisco, 1979.
[63] N. Garg, V.V. Vazirani, and M. Yannakakis. “Approximate max-flow
min-(multi) cut theorems and their applications”. In: Proceedings of the
twenty-fifth annual ACM symposium on Theory of computing. ACM New
York, NY, USA. 1993, pp. 698–707.
[64] Shayan Oveis Gharan, Amin Saberi, and Mohit Singh. “A randomized
rounding approach to the traveling salesman problem”. In: 2011 IEEE
52nd Annual Symposium on Foundations of Computer Science. IEEE. 2011,
pp. 550–559.
[65] Michel X Goemans, Neil Olver, Thomas Rothvoß, and Rico Zenklusen.
“Matroids and integrality gaps for hypergraphic steiner tree relaxations”.
In: Proceedings of the forty-fourth annual ACM symposium on Theory of
computing. 2012, pp. 1161–1176.

[66] Michel X Goemans and David P Williamson. “The primal-dual method


for approximation algorithms and its application to network design prob-
lems”. In: Approximation algorithms for NP-hard problems (1997), pp. 144–
191.
[67] Michel X Goemans and David P Williamson. “The primal-dual method
for approximation algorithms and its application to network design prob-
lems”. In: Approximation algorithms for NP-hard problems (1997), pp. 144–
191.
[68] Ronald L Graham. “Bounds for certain multiprocessing anomalies”. In:
Bell system technical journal 45.9 (1966), pp. 1563–1581.
[69] Ronald L. Graham. “Bounds on multiprocessing timing anomalies”. In:
SIAM journal on Applied Mathematics 17.2 (1969), pp. 416–429.
[70] Fabrizio Grandoni, Tobias Mömke, Andreas Wiese, and Hang Zhou.
“A (5/3+ 𝜀)-approximation for unsplittable flow on a path: placing
small tasks into boxes”. In: Proceedings of the 50th Annual ACM SIGACT
Symposium on Theory of Computing. 2018, pp. 607–619.
[71] Sudipto Guha and Samir Khuller. “Greedy strikes back: Improved facility
location algorithms”. In: Journal of algorithms 31.1 (1999), pp. 228–248.
[72] Anupam Gupta and Jochen Könemann. “Approximation algorithms
for network design: A survey”. In: Surveys in Operations Research and
Management Science 16.1 (2011), pp. 3–20.
[73] Anupam Gupta, Ilan Newman, Yuri Rabinovich, and Alistair Sinclair.
“Cuts, trees and 1-embeddings of graphs”. In: Combinatorica 24.2 (2004),
pp. 233–269.
[74] Anupam Gupta and Kanat Tangwongsan. Simpler Analyses of Local Search
Algorithms for Facility Location. 2008. arXiv: 0809.2554 [cs.DS].
[75] Bernhard Haeupler, Barna Saha, and Aravind Srinivasan. “New construc-
tive aspects of the Lovász local lemma”. In: Journal of the ACM (JACM)
58.6 (2011), pp. 1–28.
[76] Eran Halperin and Robert Krauthgamer. “Polylogarithmic inapproxima-
bility”. In: Proceedings of the thirty-fifth annual ACM symposium on Theory
of computing. 2003, pp. 585–594.
[77] Sariel Har-Peled and Manor Mendel. “Fast construction of nets in low-
dimensional metrics and their applications”. In: SIAM Journal on Comput-
ing 35.5 (2006), pp. 1148–1184.
[78] Sariel Har-Peled and Benjamin Raichel. “Net and prune: A linear time
algorithm for euclidean distance problems”. In: Journal of the ACM (JACM)
62.6 (2015), pp. 1–35.

[79] Sariel Har-Peled. Concentration of Random Variables – Chernoff’s Inequality.
Available at https://sarielhp.org/teach/13/b_574_rand_alg/lec/07_chernoff.pdf.
[80] Johan Håstad. “Clique is hard to approximate within 𝑛^(1−𝜀)”.
In: Proceedings of 37th Conference on Foundations of Computer Science. IEEE.
1996, pp. 627–636.
[81] Michael Held and Richard M Karp. “The traveling-salesman problem and
minimum spanning trees”. In: Operations Research 18.6 (1970), pp. 1138–
1162.
[82] Rebecca Hoberg and Thomas Rothvoss. “A logarithmic additive inte-
grality gap for bin packing”. In: Proceedings of the Twenty-Eighth Annual
ACM-SIAM Symposium on Discrete Algorithms. SIAM. 2017, pp. 2616–2625.
[83] Dorit S Hochbaum and David B Shmoys. “A polynomial approximation
scheme for scheduling on uniform processors: Using the dual approxi-
mation approach”. In: SIAM journal on computing 17.3 (1988), pp. 539–
551.
[84] Dorit S Hochbaum and David B Shmoys. “A unified approach to ap-
proximation algorithms for bottleneck problems”. In: Journal of the ACM
(JACM) 33.3 (1986), pp. 533–550.
[85] Ian Holyer. “The NP-completeness of edge-coloring”. In: SIAM Journal
on computing 10.4 (1981), pp. 718–720.
[86] Shlomo Hoory, Nathan Linial, and Avi Wigderson. “Expander graphs
and their applications”. In: Bulletin of the American Mathematical Society
43.4 (2006), pp. 439–561.
[87] Ellis Horowitz and Sartaj Sahni. “Exact and approximate algorithms for
scheduling nonidentical processors”. In: Journal of the ACM (JACM) 23.2
(1976), pp. 317–327.
[88] Wen-Lian Hsu and George L Nemhauser. “Easy and hard bottleneck
location problems”. In: Discrete Applied Mathematics 1.3 (1979), pp. 209–
215.
[89] Makoto Imase and Bernard M Waxman. “Dynamic Steiner tree problem”.
In: SIAM Journal on Discrete Mathematics 4.3 (1991), pp. 369–384.
[90] Kamal Jain. “A factor 2 approximation algorithm for the generalized
Steiner network problem”. In: Combinatorica 21.1 (2001), pp. 39–60.
[91] Kamal Jain and Vijay V Vazirani. “Approximation algorithms for metric
facility location and k-median problems using the primal-dual schema
and Lagrangian relaxation”. In: Journal of the ACM (JACM) 48.2 (2001),
pp. 274–296.

[92] Ragesh Jaiswal, Amit Kumar, and Sandeep Sen. “A simple D 2-sampling
based PTAS for k-means and other clustering problems”. In: Algorithmica
70.1 (2014), pp. 22–46.
[93] Klaus Jansen. “An EPTAS for scheduling jobs on uniform processors:
using an MILP relaxation with a constant number of integral variables”.
In: SIAM Journal on Discrete Mathematics 24.2 (2010), pp. 457–485.
[94] Klaus Jansen. “Parameterized approximation scheme for the multiple
knapsack problem”. In: SIAM Journal on Computing 39.4 (2010), pp. 1392–
1412.
[95] Klaus Jansen and Lars Rohwedder. “A quasi-polynomial approxima-
tion for the restricted assignment problem”. In: International Conference
on Integer Programming and Combinatorial Optimization. Springer. 2017,
pp. 305–316.
[96] David S Johnson. “Approximation algorithms for combinatorial prob-
lems”. In: Journal of computer and system sciences 9.3 (1974), pp. 256–278.
[97] EG Coffman Jr, MR Garey, and DS Johnson. “Approximation algorithms
for bin packing: A survey”. In: Approximation algorithms for NP-hard
problems (1996), pp. 46–93.
[98] George Karakostas. “A better approximation ratio for the vertex cover
problem”. In: ACM Transactions on Algorithms (TALG) 5.4 (2009), p. 41.
[99] Anna R Karlin, Nathan Klein, and Shayan Oveis Gharan. “A (slightly)
improved approximation algorithm for metric TSP”. In: Proceedings of
the 53rd Annual ACM SIGACT Symposium on Theory of Computing. 2021,
pp. 32–45.
[100] Narendra Karmarkar and Richard M Karp. “An efficient approximation
scheme for the one-dimensional bin-packing problem”. In: 23rd Annual
Symposium on Foundations of Computer Science (sfcs 1982). IEEE. 1982,
pp. 312–320.
[101] Ken-ichi Kawarabayashi and Anastasios Sidiropoulos. “Embeddings of
Planar Quasimetrics into Directed ℓ 1 and Polylogarithmic Approximation
for Directed Sparsest-Cut”. In: Proceedigns of IEEE FOCS (2021). To appear.
2021.
[102] H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack
Problems. Springer Nature Book Archives Millennium. Springer, 2004. isbn:
9783540402862. url: https://books.google.com/books?id=u5DB7gck08YC.
[103] Subhash Khot and Oded Regev. “Vertex cover might be hard to approxi-
mate to within 2- 𝜀”. In: Journal of Computer and System Sciences 74.3 (2008),
pp. 335–349.

[104] Philip Klein, Serge A Plotkin, and Satish Rao. “Excluded minors, network
decomposition, and multicommodity flow”. In: Proceedings of the twenty-
fifth annual ACM symposium on Theory of computing. 1993, pp. 682–690.
[105] Philip N Klein, Serge A Plotkin, Satish Rao, and Eva Tardos. “Approx-
imation algorithms for Steiner and directed multicuts”. In: Journal of
Algorithms 22.2 (1997), pp. 241–269.
[106] Stavros G Kolliopoulos and Neal E Young. “Approximation algorithms
for covering/packing integer programs”. In: Journal of Computer and
System Sciences 71.4 (2005), pp. 495–505.
[107] Guy Kortsarz and Zeev Nutov. “Approximating minimum cost con-
nectivity problems”. In: Parameterized complexity and approximation al-
gorithms. Ed. by Erik D. Demaine, MohammadTaghi Hajiaghayi, and
Dániel Marx. Dagstuhl Seminar Proceedings 09511. Dagstuhl, Germany:
Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany, 2010. url:
http://drops.dagstuhl.de/opus/volltexte/2010/2497.
[108] Madhukar R Korupolu, C Greg Plaxton, and Rajmohan Rajaraman.
“Analysis of a local search heuristic for facility location problems”. In:
Journal of algorithms 37.1 (2000), pp. 146–188.
[109] Ravishankar Krishnaswamy, Amit Kumar, Viswanath Nagarajan, Yo-
gish Sabharwal, and Barna Saha. “The matroid median problem”. In:
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete
Algorithms. SIAM. 2011, pp. 1117–1130.
[110] Lap Chi Lau, Ramamoorthi Ravi, and Mohit Singh. Iterative methods in
combinatorial optimization. Vol. 46. Cambridge University Press, 2011.
[111] Eugene L Lawler. “Fast approximation algorithms for knapsack prob-
lems”. In: Mathematics of Operations Research 4.4 (1979), pp. 339–356.
[112] Jon Lee, Maxim Sviridenko, and Jan Vondrák. “Matroid matching: the
power of local search”. In: SIAM Journal on Computing 42.1 (2013), pp. 357–
379.
[113] T. Leighton and S. Rao. “Multicommodity max-flow min-cut theorems
and their use in designing approximation algorithms”. In: Journal of the
ACM (JACM) 46.6 (1999). Conference version is from 1988., pp. 787–832.
[114] Tom Leighton, Satish Rao, and Aravind Srinivasan. “Multicommodity
flow and circuit switching”. In: Proceedings of the Thirty-First Hawaii
International Conference on System Sciences. Vol. 7. IEEE. 1998, pp. 459–465.
[115] Jan Karel Lenstra, David B Shmoys, and Éva Tardos. “Approximation
algorithms for scheduling unrelated parallel machines”. In: Mathematical
programming 46.1 (1990), pp. 259–271.

[116] Shi Li. “A 1.488 approximation algorithm for the uncapacitated facility
location problem”. In: Information and Computation 222 (2013), pp. 45–58.
[117] Nathan Linial, Eran London, and Yuri Rabinovich. “The geometry of
graphs and some of its algorithmic applications”. In: Combinatorica 15.2
(1995), pp. 215–245.
[118] Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. “The
planar k-means problem is NP-hard”. In: Theoretical Computer Science 442
(2012), pp. 13–21.
[119] G.A. Margulis. “Explicit constructions of expanders”. In: Problemy Peredaci
Informacii 9.4 (1973), pp. 71–80.
[120] Julián Mestre. “Greedy in approximation algorithms”. In: European Sym-
posium on Algorithms. Springer. 2006, pp. 528–539.
[121] Michael Mitzenmacher and Eli Upfal. Probability and computing: Random-
ization and probabilistic techniques in algorithms and data analysis. Cambridge
university press, 2017.
[122] Sarah Morell and Martin Skutella. “Single source unsplittable flows with
arc-wise lower and upper bounds”. In: Mathematical Programming (2021),
pp. 1–20.
[123] Robin A Moser and Gábor Tardos. “A constructive proof of the general
Lovász local lemma”. In: Journal of the ACM (JACM) 57.2 (2010), pp. 1–15.
[124] Dana Moshkovitz. “The Projection Games Conjecture and the NP-
Hardness of ln 𝑛-Approximating Set-Cover”. In: Theory of Computing
11.1 (2015), pp. 221–235.
[125] Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. Cam-
bridge university press, 1995.
[126] Viswanath Nagarajan, R Ravi, and Mohit Singh. “Simpler analysis of LP
extreme points for traveling salesman and survivable network design
problems”. In: Operations Research Letters 38.3 (2010), pp. 156–160.
[127] Viswanath Nagarajan, Baruch Schieber, and Hadas Shachnai. “The Eu-
clidean k-supplier problem”. In: Mathematics of Operations Research 45.1
(2020), pp. 1–14.
[128] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. “An
analysis of approximations for maximizing submodular set functions—I”.
In: Mathematical Programming 14.1 (1978), pp. 265–294.
[129] Zeev Nutov. “Node-connectivity survivable network problems”. In:
Handbook of Approximation Algorithms and Metaheuristics. Chapman and
Hall/CRC, 2018, pp. 233–253.

[130] Rafail Ostrovsky, Yuval Rabani, Leonard J Schulman, and Chaitanya


Swamy. “The effectiveness of Lloyd-type methods for the k-means prob-
lem”. In: Journal of the ACM (JACM) 59.6 (2013), pp. 1–22.
[131] Ján Plesník. “A heuristic for the p-center problems in graphs”. In: Discrete
Applied Mathematics 17.3 (1987), pp. 263–268.
[132] Prabhakar Raghavan. “Probabilistic construction of deterministic algo-
rithms: approximating packing integer programs”. In: Journal of Computer
and System Sciences 37.2 (1988), pp. 130–143.
[133] Prabhakar Raghavan and Clark D Tompson. “Randomized rounding:
a technique for provably good algorithms and algorithmic proofs”. In:
Combinatorica 7.4 (1987), pp. 365–374.
[134] Satish Rao. “Small distortion and volume preserving embeddings for
planar and Euclidean metrics”. In: Proceedings of the fifteenth annual
symposium on Computational geometry. 1999, pp. 300–306.
[135] Gabriel Robins and Alexander Zelikovsky. “Tighter bounds for graph
Steiner tree approximation”. In: SIAM Journal on Discrete Mathematics 19.1
(2005), pp. 122–134.
[136] Sartaj Sahni and Teofilo Gonzalez. “P-complete approximation problems”.
In: Journal of the ACM (JACM) 23.3 (1976), pp. 555–565.
[137] Alexander Schrijver. Combinatorial optimization: polyhedra and efficiency.
Vol. 24. Springer Science & Business Media, 2003.
[138] Paul D. Seymour. “Packing directed circuits fractionally”. In: Combinator-
ica 15.2 (1995), pp. 281–288.
[139] Ankit Sharma and Jan Vondrák. “Multiway cut, pairwise realizable
distributions, and descending thresholds”. In: Proceedings of the forty-sixth
annual ACM symposium on theory of computing. 2014, pp. 724–733.
[140] David B Shmoys and Éva Tardos. “An approximation algorithm for the
generalized assignment problem”. In: Mathematical programming 62.1
(1993), pp. 461–474.
[141] Martin Skutella. “A note on the ring loading problem”. In: SIAM Journal
on Discrete Mathematics 30.1 (2016), pp. 327–342.
[142] Aravind Srinivasan. “An extension of the Lovász Local Lemma, and its
applications to integer programming”. In: SIAM Journal on Computing
36.3 (2006), pp. 609–634.
[143] Larry Stockmeyer. “Planar 3-colorability is Polynomial Complete”. In:
SIGACT News 5.3 (July 1973), pp. 19–25. issn: 0163-5700. doi: 10.1145/
1008293.1008294. url: http://doi.acm.org/10.1145/1008293.1008294.

[144] Ola Svensson. “Approximating ATSP by relaxing connectivity”. In: 2015


IEEE 56th Annual Symposium on Foundations of Computer Science. IEEE.
2015, pp. 1–19.
[145] Ola Svensson. “Santa claus schedules jobs on unrelated machines”. In:
SIAM Journal on Computing 41.5 (2012), pp. 1318–1341.
[146] Ola Svensson, Jakub Tarnawski, and László A Végh. “A constant-factor ap-
proximation algorithm for the asymmetric traveling salesman problem”.
In: Journal of the ACM (JACM) 67.6 (2020), pp. 1–53.
[147] Chaitanya Swamy. “Improved approximation algorithms for matroid
and knapsack median problems and applications”. In: ACM Transactions
on Algorithms (TALG) 12.4 (2016), pp. 1–22.
[148] Hiromitsu Takahashi and A Matsuyama. “An approximate solution for
the Steiner problem in graphs”. In: Math. Jap. 24.6 (1980), pp. 573–577.
[149] Robin Thomas. “An update on the four-color theorem”. In: Notices of the
AMS 45.7 (1998), pp. 848–859.
[150] Vera Traub and Jens Vygen. “An improved approximation algorithm for
ATSP”. In: Proceedings of the 52nd Annual ACM SIGACT Symposium on
Theory of Computing. 2020, pp. 1–13.
[151] Sergei Vassilvitskii and David Arthur. “k-means++: The advantages
of careful seeding”. In: Proceedings of the eighteenth annual ACM-SIAM
symposium on Discrete algorithms. 2006, pp. 1027–1035.
[152] Vijay V Vazirani. Approximation algorithms. Springer Science & Business
Media, 2013.
[153] Robert Vicari. “Simplex based Steiner tree instances yield large integrality
gaps for the bidirected cut relaxation”. In: arXiv preprint arXiv:2002.07912
(2020).
[154] Douglas Brent West et al. Introduction to graph theory. Vol. 2. Prentice hall
Upper Saddle River, 2001.
[155] David P Williamson and David B Shmoys. The design of approximation
algorithms. Cambridge university press, 2011.
[156] David P. Williamson. “On the design of approximation algorithms for a
class of graphs problems”. PhD thesis. Cambridge, MA: MIT, 1993.
[157] David P. Williamson, Michel X. Goemans, Milena Mihail, and Vijay V.
Vazirani. “A Primal-Dual Approximation Algorithm for Generalized
Steiner Network Problems”. In: Combinatorica 15 (1995), pp. 435–454. doi: http://dx.
doi.org/10.1007/BF01299747.

[158] Laurence A Wolsey. “An analysis of the greedy algorithm for the sub-
modular set covering problem”. In: Combinatorica 2.4 (1982), pp. 385–
393.
[159] Yuli Ye and Allan Borodin. “Elimination graphs”. In: ACM Transactions
on Algorithms (TALG) 8.2 (2012), pp. 1–23.
[160] Alexander Z Zelikovsky. “An 11/6-approximation algorithm for the
network Steiner problem”. In: Algorithmica 9.5 (1993), pp. 463–470.
Appendix A

Basic Feasible Solutions to LPs


and the Rank Lemma

We discuss the rank lemma about vertex solutions for linear programs. Recall
that a polyhedron in ℝ𝑛 is defined as the intersection of a finite collection of half
spaces. Without loss of generality we can assume that it is defined by a system
of inequalities of the form 𝐴𝑥 ≤ 𝑏 where 𝐴 is an 𝑚 × 𝑛 matrix and 𝑏 is an 𝑚 × 1
vector. A polyhedron 𝑃 is bounded if 𝑃 is contained in a ball of finite radius around
the origin. A polytope in ℝ𝑛 is defined as the convex hull of a finite collection
of points. A fundamental theorem about linear programming states that any
bounded polyhedron is a polytope. If the polyhedron is not bounded then it can
be expressed as the Minkowski sum of a polytope and a cone.
A bounded polyhedron 𝑃 in ℝ𝑛 defined by a system 𝐴𝑥 ≤ 𝑏 must necessarily
have 𝑚 ≥ 𝑛. A point 𝑦 ∈ 𝑃 is a basic feasible solution or a vertex solution of the
system if it is the unique solution to a sub-system 𝐴′𝑥 = 𝑏′ where 𝐴′ consists of 𝑛
of the inequalities of 𝐴 and the rank of 𝐴′ is equal to 𝑛. The inequalities in
𝐴′ are said to be tight at 𝑦. Note that there may be many other inequalities in
𝐴𝑥 ≤ 𝑏 that are tight at 𝑦, and in general there may be many different rank-𝑛
sub-matrices that give rise to the same basic feasible solution 𝑦.

Lemma A.1. Suppose 𝑦 is a basic feasible solution of a system 𝐴𝑥 ≤ 𝑏, ℓ ≤ 𝑥 ≤ 𝑢
where 𝐴 is an 𝑚 × 𝑛 matrix and ℓ and 𝑢 are vectors defining lower and upper bounds
on the variables 𝑥 ∈ ℝ𝑛 . Let 𝑆 = {𝑖 : ℓ𝑖 < 𝑦𝑖 < 𝑢𝑖 } be the set of indices of “fractional”
variables in 𝑦. Then |𝑆| ≤ rank(𝐴) ≤ 𝑚. In particular the number of fractional
variables in 𝑦 is at most the number of “non-trivial” constraints (those defined
by 𝐴).

An extension of the previous lemma is often useful when the system defining
the polyhedron has equality constraints.


Corollary A.1. Suppose 𝑦 is a basic feasible solution of a system 𝐴𝑥 ≤ 𝑏, 𝐶𝑥 = 𝑑, ℓ ≤
𝑥 ≤ 𝑢 where 𝐴 is an 𝑚 × 𝑛 matrix, 𝐶 is an 𝑚′ × 𝑛 matrix, and ℓ and 𝑢 are vectors defining
lower and upper bounds on the variables 𝑥 ∈ ℝ𝑛 . Let 𝑆 = {𝑖 : ℓ𝑖 < 𝑦𝑖 < 𝑢𝑖 } be the set
of indices of “fractional” variables in 𝑦. Then |𝑆| ≤ rank(𝐴, 𝐶) ≤ 𝑚 + 𝑚′.

A special case of the preceding corollary is called the rank lemma in [110].
The lemmas are a simple consequence of the definition of basic feasible
solution. We will focus on the proof of Lemma A.1. It is interesting only when
rank(𝐴) or 𝑚 is smaller than 𝑛; otherwise the claim is trivial. Before we prove
it formally we observe some simple corollaries. Suppose we have a system
𝐴𝑥 ≤ 𝑏, 𝑥 ≥ 0 where 𝑚 < 𝑛. Then the number of non-zero variables in a basic
feasible solution is at most 𝑚. Similarly if the system is 𝐴𝑥 ≤ 𝑏, 𝑥 ∈ [0, 1]𝑛
then the number of non-integer variables in 𝑦 is at most 𝑚. For example in the
knapsack LP we have 𝑚 = 1 and hence in any basic feasible solution there can
only be one fractional variable.
Now for the proof. We consider the system 𝐴𝑥 ≤ 𝑏, −𝑥 ≤ −ℓ , 𝑥 ≤ 𝑢 as a
single system 𝐶𝑥 ≤ 𝑑 which has 𝑚 + 2𝑛 inequalities. Since 𝑦 is a basic feasible
solution to this system, from the definition, it is the unique solution of a sub-system
𝐶′𝑥 = 𝑑′ where 𝐶′ is an 𝑛 × 𝑛 full-rank matrix. How many rows of 𝐶′ can come
from 𝐴? At most rank(𝐴) ≤ 𝑚 rows. It means that the rest of the rows of 𝐶′
come from the other set of inequalities −𝑥 ≤ −ℓ or 𝑥 ≤ 𝑢. There are at least
𝑛 − rank(𝐴) such rows, each of which is tight at 𝑦. Thus at least 𝑛 − rank(𝐴)
variables in 𝑦 are at their lower or upper bounds and hence there can be only
rank(𝐴) fractional variables in 𝑦.
See [110] for iterated rounding based methodology for exact and approxima-
tion algorithms. The whole methodology relies on properties of basic feasible
solutions to LP relaxations of combinatorial optimization problems.

A.0.1 Some Examples


We give some examples to illustrate the utility of the rank lemma in the context
of LP relaxations that arise in approximation algorithms.
Knapsack: The natural LP relaxation for this problem is of the form max Σ𝑖 𝑤𝑖 𝑥𝑖
subject to 𝑥 ∈ [0, 1]𝑛 and Σ𝑖 𝑠𝑖 𝑥𝑖 ≤ 1, consisting of a single non-trivial constraint. A basic
feasible solution has at most 1 fractional variable. See Chapter 3.
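To see the rank lemma at work on the Knapsack LP, here is a small illustrative sketch (our own code, not from the text): the familiar greedy fractional solution, obtained by sorting items by value density and filling the knapsack, is optimal for this LP and has at most one fractional coordinate, matching the guarantee for basic feasible solutions.

# Fractional Knapsack LP: max sum_i w_i x_i subject to sum_i s_i x_i <= 1
# and x in [0,1]^n. The greedy-by-density solution has at most one
# fractional coordinate, as the rank lemma (m = 1) promises for a vertex.

def fractional_knapsack(w, s, capacity=1.0):
    n = len(w)
    x = [0.0] * n
    remaining = capacity
    # take items in non-increasing order of value per unit size
    for i in sorted(range(n), key=lambda i: w[i] / s[i], reverse=True):
        take = min(1.0, remaining / s[i])
        x[i] = take
        remaining -= take * s[i]
        if remaining <= 0:
            break
    return x

x = fractional_knapsack(w=[10, 7, 4], s=[0.6, 0.5, 0.3])
print(x)                                   # e.g. [1.0, 0.8, 0.0]
assert sum(1 for xi in x if 0.0 < xi < 1.0) <= 1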
Packing Integer Programs (PIPs): The LP relaxation is of the form max 𝑤𝑥
subject to 𝐴𝑥 ≤ 𝑏, 𝑥 ∈ [0, 1]𝑛 where 𝐴 is an 𝑚 × 𝑛 non-negative matrix. See
Chapter 4. When 𝑚 = 1 we have the Knapsack problem and hence the general
problem is sometimes referred to as the 𝑚-dimensional Knapsack problem,
especially when 𝑚 is a fixed constant. A basic feasible solution for the LP has
at most 𝑚 fractional variables. When 𝑚 is a fixed constant one can exploit this,
after guessing the big items, to obtain a PTAS.
Generalized Assignment: See Chapter 6.

A.0.2 Connection to Carathéodory’s Theorem

Suppose we have 𝑛 points 𝑃 = {𝑝1 , 𝑝2 , . . . , 𝑝𝑛 } in 𝑑-dimensional Euclidean space
ℝ𝑑 . A point 𝑝 ∈ ℝ𝑑 is in the convex hull of 𝑃 iff 𝑝 is a convex combination of
points in 𝑃. Formally, this means that there exist scalars 𝜆1 , . . . , 𝜆𝑛 ≥ 0 such that
Σ𝑖 𝜆𝑖 = 1 and 𝑝 = Σ𝑖 𝜆𝑖 𝑝𝑖 (note that this is a vector sum). Carathéodory’s
theorem states that if 𝑝 is in the convex hull of 𝑃 then there is a subset 𝑃′ ⊆ 𝑃 such
that 𝑝 is in the convex hull of 𝑃′ and |𝑃′ | ≤ 𝑑 + 1. One can prove Carathéodory’s
theorem directly but it is helpful to see it also as a consequence of the rank lemma.
Consider the system of inequalities 𝜆𝑖 ≥ 0, 1 ≤ 𝑖 ≤ 𝑛, Σ𝑖 𝜆𝑖 = 1, Σ𝑖 𝜆𝑖 𝑝𝑖 = 𝑝
where the sub-system Σ𝑖 𝜆𝑖 𝑝𝑖 = 𝑝 consists of 𝑑 equalities. This system of inequalities
in the variables 𝜆1 , . . . , 𝜆𝑛 is feasible by assumption (since 𝑝 is in the convex hull
of 𝑃). If we take any basic feasible solution of this system then at most 𝑑 + 1 of the
𝜆𝑖 are non-zero by the rank lemma.
One implication of Carathéodory’s theorem in the context of combinatorial
optimization is the following. Suppose we have a polytope 𝑃 which is an LP
relaxation of some combinatorial problem and let 𝑥 ∈ 𝑃 be any feasible point.
Then 𝑥 can be written as a convex combination of at most 𝑛 + 1 vertices of 𝑃 where
𝑛 is the number of variables. Moreover, via the Ellipsoid method, one can find such
a convex combination efficiently as long as one can optimize over 𝑃 efficiently.
As an example suppose 𝐺 = (𝑉 , 𝐸) is a graph and 𝑃 is the spanning tree polytope
of 𝐺 (the vertices of 𝑃 are the characteristic vectors of spanning trees), which lives in ℝ𝑚
(𝑚 = |𝐸|). Then any fractional spanning tree 𝑥 ∈ 𝑃 can be written as a convex
combination of at most 𝑚 + 1 spanning trees.
In the context of approximation 𝑃 is typically a relaxation of some hard
combinatorial optimization problem. In such a case the vertices of 𝑃 do not
correspond to structures we are interested in. For example we can consider
the minimum Steiner tree problem in a graph 𝐺 = (𝑉 , 𝐸) with terminal set
𝑆 ⊆ 𝑉. There are several LP relaxations but perhaps the simplest one is the cut
relaxation which has an integrality gap of 2. In such a case a feasible point 𝑥 ∈ 𝑃
cannot be decomposed into convex combination of Steiner trees. However it can
be shown that 2𝑥 dominates a convex combination of Steiner trees and such a
convex combination can be found efficiently. It requires more technical work to
precisely formalize this and we refer the reader to the work of Carr and Vempala
[29] — you can also find a few applications of such decompositions in the same
paper and it is a simple yet powerful tool to keep in mind.
Appendix B

Probabilistic Inequalities

The course will rely heavily on probabilistic methods. We will mostly rely
on discrete probability spaces. We will keep the discussion high-level where
possible and use certain results in a black-box fashion.
Let Ω be a finite set. A probability measure 𝑝 assigns a non-negative number
𝑝(𝜔) to each 𝜔 ∈ Ω such that Σ𝜔∈Ω 𝑝(𝜔) = 1. The tuple (Ω, 𝑝) defines a discrete
probability space; an event in this space is any subset 𝐴 ⊆ Ω and the probability
of an event is simply 𝑝(𝐴) = Σ𝜔∈𝐴 𝑝(𝜔). When Ω is a continuous space such as
the interval [0, 1] things get trickier and we need to talk about measure spaces
and 𝜎-algebras over Ω; we can only assign probability to certain subsets of Ω. We
will not go into details since we will not need any formal machinery for what we
do in this course.
An important definition is that of a random variable. We will focus only
on real-valued random variables in this course. A random variable 𝑋 in a
probability space is a function 𝑋 : Ω → ℝ. In the discrete setting the expectation
of 𝑋, denoted by E[𝑋], is defined as Σ𝜔∈Ω 𝑝(𝜔)𝑋(𝜔). For continuous spaces
E[𝑋] = ∫ 𝑋(𝜔) 𝑑𝑝(𝜔) with an appropriate definition of the integral. The variance
of 𝑋, denoted by Var[𝑋] or 𝜎²𝑋 , is defined as E[(𝑋 − E[𝑋])²]. The standard
deviation is 𝜎𝑋 , the square root of the variance.

Theorem B.1 (Markov’s Inequality). Let 𝑋 be a non-negative random variable such
that E[𝑋] is finite. Then for any 𝑡 > 0, P[𝑋 ≥ 𝑡] ≤ E[𝑋]/𝑡.

Proof. The proof is in some sense obvious, especially in the discrete case. Here
is a sketch. Define a new random variable 𝑌 where 𝑌(𝜔) = 𝑋(𝜔) if 𝑋(𝜔) < 𝑡
and 𝑌(𝜔) = 𝑡 if 𝑋(𝜔) ≥ 𝑡. 𝑌 is non-negative and 𝑌 ≤ 𝑋 point-wise and hence


E[𝑌] ≤ E[𝑋]. We also see that:

E[𝑋] ≥ E[𝑌] = Σ𝜔:𝑋(𝜔)<𝑡 𝑋(𝜔)𝑝(𝜔) + Σ𝜔:𝑋(𝜔)≥𝑡 𝑡 𝑝(𝜔)
           ≥ 𝑡 Σ𝜔:𝑋(𝜔)≥𝑡 𝑝(𝜔)   (since 𝑋 is non-negative)
           ≥ 𝑡 P[𝑋 ≥ 𝑡].
The continuous case follows by replacing sums by integrals. 
Markov’s inequality is tight given only these assumptions (non-negativity and a
finite expectation); it is useful to construct an example showing this, and one is
sketched below. The more information we have about a random variable, the better
we can bound its deviation from its expectation.
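A minimal sketch of such a tight example (Python, purely illustrative): a random variable that equals 𝑡 with probability 𝜇/𝑡 and 0 otherwise has expectation 𝜇 and satisfies P[𝑋 ≥ 𝑡] = 𝜇/𝑡, so Markov's bound holds with equality.

# Tight example for Markov: X = t with probability mu/t and X = 0 otherwise.
mu, t = 1.0, 10.0
p_hit = mu / t              # P[X >= t]
expectation = t * p_hit     # E[X] = mu
markov_bound = expectation / t
print(p_hit, markov_bound)  # both equal 0.1, so the inequality is tight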
Theorem B.2 (Chebyshev’s Inequality). Let 𝑋 be a random variable such that E[𝑋] and
Var[𝑋] are finite. Then P[|𝑋 | ≥ 𝑡] ≤ E[𝑋²]/𝑡² and P[|𝑋 − E[𝑋]| ≥ 𝑡𝜎𝑋 ] ≤ 1/𝑡².

Proof. Consider the non-negative random variable 𝑌 = 𝑋². Then P[|𝑋 | ≥ 𝑡] = P[𝑌 ≥
𝑡²] and we apply Markov’s inequality to the latter. The second inequality follows
similarly by considering 𝑌 = (𝑋 − E[𝑋])². 
Chernoff-Hoeffding Bounds: We will use, several times, various forms of the
Chernoff-Hoeffding bounds, which apply to a random variable that is a finite
sum of bounded and independent random variables. There are several versions of
these bounds. First we state a general bound that is applicable to non-negative
random variables and is dimension-free in that it depends only on the expectation
rather than on the number of variables.
Theorem B.3 (Chernoff-Hoeffding). Let 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 be independent binary
random variables and let 𝑎1 , 𝑎2 , . . . , 𝑎𝑛 be coefficients in [0, 1]. Let 𝑋 = Σ𝑖 𝑎𝑖 𝑋𝑖 . Then

• For any 𝜇 ≥ E[𝑋] and any 𝛿 > 0, P[𝑋 > (1 + 𝛿)𝜇] ≤ ( 𝑒^𝛿 / (1 + 𝛿)^(1+𝛿) )^𝜇 .

• For any 𝜇 ≤ E[𝑋] and any 𝛿 > 0, P[𝑋 < (1 − 𝛿)𝜇] ≤ 𝑒^(−𝜇𝛿²/2) .
The following corollary bounds the deviation from the mean in both direc-
tions.
Corollary B.4. Under the conditions of Theorem B.3, the following hold:

• If 𝛿 > 2𝑒 − 1, then P[𝑋 ≥ (1 + 𝛿)𝜇] ≤ 2^(−(1+𝛿)𝜇) .

• For any 𝑈 there is a constant 𝑐(𝑈) such that for 0 < 𝛿 < 𝑈, P[𝑋 ≥ (1 + 𝛿)𝜇] ≤
𝑒^(−𝑐(𝑈)𝛿²𝜇) . In particular, combining with the lower tail bound,

P[|𝑋 − 𝜇| ≥ 𝛿𝜇] ≤ 2𝑒^(−𝑐(𝑈)𝛿²𝜇) .

We refer the reader to the standard books on randomized algorithms [125]


and [121] for the derivation of the above bounds.
If we are interested only in the upper tail we also have the following bounds
which show the dependence of 𝜇 on 𝑛 to obtain an inverse polynomial probability.

Corollary B.5. Under the conditions of Theorem B.3, there is a universal constant
𝛼 such that for any 𝜇 ≥ max{1, E[𝑋]}, sufficiently large 𝑛, and 𝑐 ≥ 1,
P[𝑋 > 𝛼𝑐 (ln 𝑛 / ln ln 𝑛) · 𝜇] ≤ 1/𝑛^𝑐 . Similarly, there is a constant 𝛼 such that for any 𝜖 > 0,
P[𝑋 ≥ (1 + 𝜖)𝜇 + 𝛼𝑐 log 𝑛/𝜖] ≤ 1/𝑛^𝑐 .

Remark B.1. If the 𝑋𝑖 are in the range [0, 𝑏] for some 𝑏 not equal to 1 one can
scale them appropriately and then use the standard bounds.
Sometimes we need to deal with random variables that are in the range
[−1, 1]. Consider the setting where 𝑋 = Σ𝑖 𝑋𝑖 where for each 𝑖, 𝑋𝑖 ∈ [−1, 1]
and E[𝑋𝑖 ] = 0, and the 𝑋𝑖 are independent. In this case E[𝑋] = 0 and we can no
longer expect a dimension-free bound. Suppose each 𝑋𝑖 is 1 with probability 1/2
and −1 with probability 1/2. Then 𝑋 = Σ𝑖 𝑋𝑖 corresponds to a one-dimensional
random walk and, even though the expected value is 0, the standard deviation of
𝑋 is Θ(√𝑛). One can show that P[|𝑋 | ≥ 𝑡√𝑛] ≤ 2𝑒^(−𝑡²/2) . For these settings we
can use the following bounds.

Theorem B.6. Let 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 be independent random variables such that for each
𝑖, 𝑋𝑖 ∈ [𝑎𝑖 , 𝑏𝑖 ]. Let 𝑋 = Σ𝑖 𝑋𝑖 and let 𝜇 = E[𝑋]. Then

P[|𝑋 − 𝜇| ≥ 𝑡] ≤ 2 exp( −2𝑡² / Σ𝑖 (𝑏𝑖 − 𝑎𝑖 )² ).

In particular if 𝑏𝑖 − 𝑎𝑖 ≤ 1 for all 𝑖 then

P[|𝑋 − 𝜇| ≥ 𝑡] ≤ 2𝑒^(−2𝑡²/𝑛) .

Note that Var[𝑋] = Σ𝑖 Var[𝑋𝑖 ]. One can also show a bound of the following
form (Bernstein’s inequality):

P[|𝑋 − 𝜇| ≥ 𝑡] ≤ 2 exp( −𝑡² / (2(𝜎²𝑋 + 𝑀𝑡/3)) )

where |𝑋𝑖 | ≤ 𝑀 for all 𝑖.
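As a quick numerical sanity check of these bounds (a sketch; the simulation parameters are arbitrary), the following compares the empirical tail of the ±1 random walk discussed above with the bound 2𝑒^(−𝑡²/(2𝑛)) obtained from Theorem B.6 with 𝑏𝑖 − 𝑎𝑖 = 2.

# Empirical check of the bound for a +/-1 random walk: with X_i uniform in
# {-1, +1}, Theorem B.6 (b_i - a_i = 2) gives P[|X| >= t] <= 2 exp(-t^2/(2n)).

import math
import random

random.seed(0)
n, trials = 400, 5000
threshold = 3.0 * math.sqrt(n)   # three standard deviations

count = 0
for _ in range(trials):
    walk = sum(random.choice((-1, 1)) for _ in range(n))
    if abs(walk) >= threshold:
        count += 1

empirical = count / trials
bound = 2 * math.exp(-threshold ** 2 / (2 * n))
print(f"empirical tail ~ {empirical:.4f}, Hoeffding bound = {bound:.4f}")
# The empirical frequency (roughly 0.003) sits below the bound (about 0.022).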


Remark B.2. Compare the Chebyshev bound to the Chernoff-Hoeffding bounds
for the same variance.
Sariel Har-Peled maintains a cheat sheet of Chernoff bounds and also has an
interesting derivation. See his notes [79].

Statistical Estimators, Reducing Variance and Boosting: Randomized algorithms
compute a function 𝑓 of the input. In many cases they produce an
unbiased estimator, via a random variable 𝑋, for the function value; that is,
the algorithm has the property that E[𝑋] is the desired value. Note that
the randomness is internal to the algorithm and not part of the input (we can
also consider randomness in the input). Having an unbiased estimator by itself is
often not enough. We will typically also try to evaluate Var[𝑋] so that we can use
Chebyshev’s inequality. One way to reduce the variance of the estimate is to run the
algorithm ℎ times in parallel (with separate random bits), get estimators 𝑋1 , 𝑋2 , . . . , 𝑋ℎ ,
and use 𝑋 = (1/ℎ) Σ𝑖 𝑋𝑖 as the final estimator. Note that Var[𝑋] = (1/ℎ²) Σ𝑖 Var[𝑋𝑖 ] since
the 𝑋𝑖 are independent; when the runs are identical copies this is Var[𝑋1 ]/ℎ.
Thus the variance has been reduced by a factor of ℎ.
A different approach is to use the median value of 𝑋1 , 𝑋2 , . . . , 𝑋ℎ as the final
estimator. We can then use Chernoff-Hoeffding bounds to get a much better
dependence on ℎ. In fact both approaches can be combined.
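The two ideas in the preceding paragraph can be sketched as follows (an illustration with a toy estimator of our own; it does not correspond to any specific algorithm in the text): averaging ℎ independent runs shrinks the variance by a factor of ℎ, and taking the median of several averaged copies boosts the success probability via Chernoff-Hoeffding.

# Boosting a randomized estimator: averaging reduces variance, and the
# median of independent averages concentrates sharply.

import random
import statistics

def noisy_estimate(true_value):
    # toy unbiased estimator with large variance (stand-in for a randomized
    # algorithm's output)
    return true_value + random.uniform(-10, 10)

def mean_estimator(true_value, h):
    # average of h independent runs: variance drops by a factor of h
    return sum(noisy_estimate(true_value) for _ in range(h)) / h

def median_of_means(true_value, h, k):
    # median of k independent averaged estimates: by Chernoff-Hoeffding, the
    # probability that the median is far from the truth decays exponentially in k
    return statistics.median(mean_estimator(true_value, h) for _ in range(k))

random.seed(0)
print(noisy_estimate(42.0))          # a crude single estimate
print(mean_estimator(42.0, 100))     # typically much closer
print(median_of_means(42.0, 100, 9))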
