CS510 Notes 17: Approximation Algorithms
Contents
1 Hard Problems and Approximation Algorithms
1.1 Preliminaries and examples of hard problems
1 Hard Problems and Approximation Algorithms
A large number of optimization problems are known to be NP-hard, which, as we know from complexity theory, means they cannot be solved in polynomial time unless P = NP (which is widely believed to be false). We nonetheless need to solve these hard problems, as they appear in many practical scenarios.
We make the following remarks about how to approach hard problems.
Super-polynomial time heuristics: Some algorithms are only barely super-polynomial and run reasonably fast in most practical cases. For example, in class we discussed a dynamic programming based solution for the Knapsack problem and proved that its runtime is pseudo-polynomial. For most practical cases this is good enough. But there aren't many problems for which we can find pseudo-polynomial time algorithms.
Probabilistic Analysis: We can also stop focusing on worst-case runtime analysis and instead ask for the average-case runtime of the algorithm. For some problems there are very few instances on which an algorithm takes an exponential amount of time. For example, for the Hamiltonian cycle problem there is an algorithm that will find a Hamiltonian cycle in almost every Hamiltonian graph. Such results are usually derived by assuming a probability distribution on the input instances and then showing that the algorithm solves the problem with high probability.
It is, however, not easy to come up with a reasonable probability distribution on the input instances.
Approximation Algorithms: In this strategy we drop the requirement that the algorithm finds the optimal solution. Rather, we relax the requirement by asking for a feasible solution that is in some sense “pretty close” to the optimal solution. For a large body of NP-hard problems there are polynomial time algorithms that find solutions that are nearly optimal (only slightly sub-optimal).
A maximization problem is: given I ∈ I, find a solution s ∈ S(I) such that f(s) is maximum, i.e.,
∀s′ ∈ S(I), f(s) ≥ f(s′).
We will sometimes refer to the value of the optimal solution as OPT(I) = f(s).
A minimization problem is defined analogously. Note that the optimal solution need not be unique, but it always exists.
In this part of the course we will consider NP-hard optimization problems and assume that P ≠ NP. This basically means that for none of the problems we consider is there a polynomial time algorithm that finds an optimal solution for all instances. Examples of hard problems that we discussed in class are maximum independent set in a graph, maximum clique in a graph, Hamiltonian cycle in a graph, minimum set cover, set packing, minimum vertex cover in a graph, graph coloring, the knapsack problem, and optimal scheduling on parallel machines. There are thousands of problems that are NP-hard; we will define only those problems that we will study.
Definition 1. An approximation algorithm A for an optimization problem P is a polynomial time algorithm such that, given an input instance I of P, it outputs a solution s′ ∈ S(I). Sometimes we will denote by A(I) either the value of the solution f(s′) or the solution s′ itself (it will be clear from the context).
We need to extend this definition to be able to compare the value A(I) with OPT(I); so far it only asks for a feasible solution. For example, for the maximum independent set problem in a graph G, since every single vertex forms an independent set of G, the algorithm that arbitrarily outputs one vertex of G satisfies this definition, which clearly is a trivial solution. We need to be able to measure the goodness of approximation algorithms by asking for certain performance guarantees.
(Recall that a k-absolute approximation algorithm is one that guarantees |A(I) − OPT(I)| ≤ k on every instance I.) Equivalently, for a minimization problem it means that f(s′) ≤ f(s) + k. A small additive error is clearly the best kind of guarantee one can hope for when approximating an NP-hard minimization problem.
[Figure: a graph G on 8 vertices and a coloring of it with 6 colors.]
Theorem 5 (Euler’s formula). A planar graph with n ≥ 3 vertices and m edges has m ≤ 3n − 6.
An immediate corollary of the above fact, using the handshaking lemma, is that every planar graph has a vertex of degree at most 5.
[Figure: the non-planar graphs K5 and K3,3.]
Proof. The proof is by induction on |V|; the base case |V| ≤ 5 is trivial. Let v be a vertex such that deg(v) ≤ 5; by the above corollary such a vertex must exist. Consider the graph G − v. Since G is planar, G − v is also planar, and by induction G − v is 6-colorable. Consider a 6-coloring of G − v and add v back. The neighbors of v, N(v), are already colored, but the number of colors used on N(v) is at most 5, so one of the 6 colors is still available and legitimate for v; we use that available color for v and thus extend the coloring to the whole of G.
Note that the above constructive proof readily gives us a recursive algorithm for 6-coloring a planar graph (a sketch in code is given below). Consider the following algorithm, Approx-Planar-Color, as our approximation algorithm for coloring planar graphs.
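The recursion in the proof translates directly into code. Below is a minimal Python sketch (ours, not part of the notes); it assumes the input graph is planar and is given as a dictionary mapping each vertex to the set of its neighbours.

def six_color(adj):
    # adj: dict mapping each vertex to the set of its neighbours (planar, undirected).
    # Returns a dict mapping each vertex to a color in {0, ..., 5}.
    if len(adj) <= 6:
        return {v: i for i, v in enumerate(adj)}          # trivially 6-colorable
    # By the corollary of Euler's formula, some vertex has degree at most 5.
    v = min(adj, key=lambda u: len(adj[u]))
    rest = {u: adj[u] - {v} for u in adj if u != v}       # the (still planar) graph G - v
    coloring = six_color(rest)
    used = {coloring[u] for u in adj[v]}                  # at most 5 colors appear on N(v)
    coloring[v] = next(c for c in range(6) if c not in used)
    return coloring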
Algorithm Approx-Planar-Color(G)
if G is bipartite then ▷ Easy to check with a BFS
Color G with the obvious 2 coloring
else
Color G with the 6 coloring as in Theorem 8
Proof. If G is not bipartite, then the minimum number of colors needed to color G is at least 3 (OPT(G) ≥ 3), while Approx-Planar-Color uses at most 6 colors (Approx-Planar-Color(G) ≤ 6); hence the statement follows.
Actually, it is well known that any planar graph is 5-colorable, and there is an easy algorithm for 5-coloring planar graphs.
Theorem 10. Any planar graph is 5-colorable.
Proof. Please check out its proof online; I may include it in an appendix.
Such an absolute approximation algorithm merely exploits the fact that the range of possible optimal values is very small in order to obtain a tight absolute approximation guarantee.
Not many hard problems have an absolute approximation algorithm. Typically such impossibility results use the technique of scaling. The broad idea is as follows: we first scale up certain parameter(s) associated with the instance; secondly, we show that if there were an absolute approximation algorithm for the scaled-up instance, then its solution could be rescaled to obtain an optimal solution for the original instance. But this would imply an efficient algorithm that solves the optimization problem exactly, which by our assumption of P ≠ NP is not possible.
Theorem. If P ≠ NP, then there is no k-absolute approximation algorithm for the maximum independent set (MIS) problem.
Proof. Assume the contrary, that is, assume that there is a k-absolute approximation algorithm A for MIS. For any G = (V, E), let G′ be made of k + 1 disjoint copies of G (there are no edges between them). It is easy to see that a MIS of G′ is composed of one MIS in each copy of G. This implies that OPT(G′) = (k + 1)·OPT(G).
[Figure: G′ consists of k + 1 disjoint copies of G.]
Now run A on G′; it will return an independent set of size at least OPT(G′) − k (by its quality guarantee), which is composed of independent sets in the copies of G. By the pigeonhole principle, at least one copy of G must contain an independent set of size at least
(OPT(G′) − k)/(k + 1) = ((k + 1)·OPT(G) − k)/(k + 1) = OPT(G) − k/(k + 1) > OPT(G) − 1,
and since the size of an independent set is an integer, that copy contains an independent set of size at least OPT(G).
Hence we can find the optimum solution in G in polynomial time, which by our as-
sumption (P ̸= N P ), is not possible.
A feasible solution to the problem is a subset U′ ⊆ U such that Σ_{i∈U′} wi ≤ C. Our goal is to maximize f(U′) = Σ_{i∈U′} vi.
Informally, we would like to pack some items of different sizes into a knapsack of fixed capacity so as to maximize the total profit from the packed items.
This problem is NP-hard. In fact, there is no k-absolute approximation algorithm for the knapsack problem.
Theorem 12. If P ̸= N P , then there is no k-absolute approximation algorithm for
the knapsack problem.
Proof. Suppose there is a k-absolute approximation algorithm A for the knapsack problem. Given an instance I of the knapsack problem (where all sizes and profits are integers), let (2k)I be the instance in which everything remains the same as in I except that the profits are scaled up by a factor of 2k, i.e. p′i = 2k · pi. It is easy to see that OPT((2k)I) = 2k·OPT(I): the capacity and the weights are the same, so we can take the same items as in an optimal solution to I, and the only difference is that the value of that solution in (2k)I is 2k times OPT(I).
Running algorithm A on (2k)I, by its performance guarantee, gives a solution such that A((2k)I) ≥ OPT((2k)I) − k. Dividing the value of each selected item in this solution by 2k we get a solution s′ to the original instance I whose value is at least (OPT((2k)I) − k)/(2k) = OPT(I) − 1/2.
Since by our assumption I has integer weights and values, the maximum achievable value OPT(I) is also an integer; hence f(s′) ≥ OPT(I) − 1/2 must in fact equal OPT(I). This clearly gives us a polynomial time algorithm to solve optimally any integer instance of the knapsack problem, contradicting our assumption that P ≠ NP.
such that
∀ I ∈ I, A(I) ≥ OPT(I)/α.
In other words, if s′ is the solution given by A and s is the optimal solution, then f(s′) ≥ f(s)/α.
For a minimization problem it is defined analogously, except that the requirement is f(s′) ≤ α·f(s). We call such algorithms α-approximate algorithms.
[Figure: an example graph on vertices A–G, with a partition of the vertices into two sides; the cut edges between the two sides are highlighted.]
The maximum cut problem is to partition V into two subsets S and S̄ = V \ S such that the number of edges in the cut [S, S̄] is maximum.
The problem is NP-hard; we give an approximation algorithm for it. The algorithm is very simple: it iterates over all the vertices in an arbitrary order and places each vertex in one of two sets A and B = V \ A, both initially empty. In each step, if the current vertex v has more neighbors in A, then it places v in B, and vice versa. If v has no neighbors in either set, or has an equal number of neighbors in both, then it places v in either A or B arbitrarily. A short code sketch of this procedure is given below.
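Here is a minimal Python sketch of the greedy procedure just described (ours, not from the notes); the graph is assumed to be given as a list of vertices and a list of edges.

def greedy_max_cut(vertices, edges):
    # vertices: iterable of vertices; edges: iterable of (u, v) pairs.
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    A, B = set(), set()
    for v in vertices:                                    # arbitrary order
        in_a = sum(1 for u in adj[v] if u in A)
        in_b = sum(1 for u in adj[v] if u in B)
        # place v opposite the side holding more of its neighbours (ties go to A),
        # so at least half of v's already-placed edges become cut edges
        (B if in_a > in_b else A).add(v)
    return A, B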
The main difficulty with max cut is that it is not easy to come up with good upper bounds on the value of the optimal cut.
Theorem 14. The above greedy algorithm is a 2-approximation algorithm for max cut.
Proof. We prove that in every iteration the number of cut edges (among the vertices already assigned to A and B) is at least as large as the number of uncut edges. Initially both of these numbers are 0, and at every step some new edges become cut and some become uncut, i.e., some non-negative integers are added to both counts. By the greedy choice, the number of newly cut edges is at least the number of newly uncut edges, so the number of cut edges always remains at least as large as the number of uncut edges.
The optimal solution can include at most all the edges of G, i.e. OPT ≤ |E|. Our algorithm guarantees that the number of cut edges is at least the number of uncut edges, i.e. f(s′) ≥ |E| − f(s′), which gives f(s′) ≥ |E|/2 ≥ OPT/2. This proves the statement.
[Figure: a set cover instance with universe U and sets S1, . . . , S6 (left), and a cover of U by some of the sets (right).]
This is a very general optimization problem that models, for example, the following scenario. Suppose we have m application software packages with different capabilities, and U is the set of n capabilities we must have in our system. We want to choose the smallest (least cost) set of packages that together meet our requirement specs.
Consider the following simple greedy algorithm for this problem. It iterates over sets until all elements of U are covered: while there is an uncovered element, choose a set Si from S that covers the largest number of yet-uncovered elements.
Algorithm greedy-set-cover(U, S)
X←U ▷ Yet uncovered elements
C←∅
while X ̸= ∅ do
Select an Si ∈ S that maximizes |Si ∩ X| ▷ Cover most elements
C ← C ∪ Si
X ← X \ Si
return C
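The pseudocode above translates almost line by line into Python; the following sketch (ours, not from the notes) assumes U is a set and S is a list of sets whose union is U (otherwise the loop would not terminate).

def greedy_set_cover(U, S):
    X = set(U)                                   # yet-uncovered elements
    C = []                                       # the sets chosen so far
    while X:
        Si = max(S, key=lambda s: len(s & X))    # covers the most uncovered elements
        C.append(Si)
        X -= Si
    return C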
[Figure: a small set cover instance with universe U and sets S1, S2, S3.]
Theorem. The greedy-set-cover algorithm is a (ln n)-approximation algorithm for set cover.
Proof. Let f(s) = OPT = k, i.e. there exist k sets in S that cover all elements of U. By the pigeonhole principle, this immediately implies that there exists a set Si ∈ S that covers at least n/k elements. Hence the first set that our algorithm picks has at least n/k elements. Let n1 be the number of elements that remain uncovered after the first set is picked. We get that n1 ≤ n − n/k = n(1 − 1/k). Again by the pigeonhole principle, the k sets of the optimal cover still cover the n1 uncovered elements, so one of the sets in S must contain at least n1/k of them. Our greedy algorithm picks a set covering at least that many, so the number of remaining uncovered elements is at most
n2 ≤ n1 − n1/k = n1(1 − 1/k) ≤ n(1 − 1/k)².
In general, after the i-th iteration, the number of remaining uncovered elements satisfies ni ≤ n(1 − 1/k)^i.
Let us see after how many iterations this number drops below 1 (at which point we have a cover); this bounds the number of sets we pick. We need the smallest i for which
ni ≤ n(1 − 1/k)^i < 1.
Using the fact that (1 − 1/k)^k < e^{−1} (i.e. (1 − x)^{1/x} ≈ e^{−1}), we have n(1 − 1/k)^i < n·e^{−i/k}, so it suffices that
e^{−i/k} ≤ 1/n, i.e. i/k ≥ ln n, i.e. i ≥ k·ln n.
This implies that the number of sets we choose is at most ⌈k·ln n⌉ = ⌈(ln n)·f(s)⌉; hence this algorithm is (ln n)-approximate.
This analysis is tight, in the sense that there are instances on which greedy-set-cover indeed selects a cover of size Ω(log n)·OPT.
[Figure: the tight example for greedy-set-cover. The universe U is partitioned into two rows R1, R2 and into columns C1, . . . , Ct with |Ci| = 2^i, split evenly between the rows. Then |R1| = |R2| = Σ_{i=1}^{t} 2^{i−1} = 2^t − 1 and n = |U| = Σ_{i=1}^{t} |Ci| = Σ_{i=1}^{t} 2^i = 2^{t+1} − 2. The optimal cover is {R1, R2}, of size 2, while greedy picks Ct, Ct−1, . . . , C1, a cover of size t = Θ(log n).]
Consider the following instance of the set cover problem (which is exactly the vertex cover problem for a given graph G = (V, E)):
U = {e1, . . . , em}, the set of edges in G that we want to cover.
S = {E1, E2, . . . , En}, where each Ei ⊆ U is the set of edges incident on vi, i.e. Ei = {e : e is incident on vi}. One can think of Ei as the set of edges covered by vi.
A natural greedy algorithm for the vertex cover problem would be as follows: while there is an uncovered edge, choose one of its two endpoints to cover it. Clearly one of the two endpoints is in an optimal cover, but we can be unfortunate and choose the wrong one at every step. Consider the following graph.
Figure 8: A star graph with center v0 and leaves v1, . . . , v8. The optimal vertex cover is {v0}, while to cover each edge we might choose all the other vertices.
But we can modify this greedy algorithm to improve the approximation guarantee. The modified algorithm proceeds as follows: while there is an uncovered edge e = (u, v), add both of its endpoints to the cover.
Algorithm vertex-cover(G)
C←∅
while E ̸= ∅ do
pick any {u, v} ∈ E
C ← C ∪ {u, v}
Remove all edges incident to either u or v
return C
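For completeness, a short Python sketch of this algorithm (ours, not from the notes); the graph is given simply as an iterable of edges.

def vertex_cover_2approx(edges):
    # Repeatedly pick a still-uncovered edge and add BOTH endpoints to the cover.
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:    # the edge (u, v) is not yet covered
            cover.add(u)
            cover.add(v)
    return cover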
Theorem 17. The greedy algorithm that repeatedly picks both endpoints of a remaining uncovered edge is 2-approximate.
Proof. The edges picked by the algorithm are pairwise vertex-disjoint (a matching), since once an edge is picked both of its endpoints are covered. For each picked edge e = (u, v), the optimal solution must include either u or v, while this algorithm picks both. Hence, if s′ is the solution produced by this algorithm and s is the optimal solution, then f(s′) ≤ 2f(s).
Figure 9: A tight example on vertices a, b, c, d. To cover the edge (a, b) we choose both a and b, and to cover (c, d) we choose both c and d, while an optimal cover is {a, d}.
Remark 18. The best known algorithm for vertex cover has an approximation guarantee of 2 − O(log log n / log n), while the best known lower bound is 4/3. Closing this gap is an open problem.
[Figure: jobs p1, . . . , p6 with their processing times, three machines m1, m2, m3, and schedules with their makespans indicated.]
3.4.1 List scheduling algorithm
We give a simple greedy algorithm for this problem. The algorithm iterates over the jobs and assigns each pi to a machine that currently has the lowest load, i.e. it assigns each job to a least loaded machine; a short code sketch follows.
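A minimal Python sketch of list scheduling (ours, not from the notes), keeping the machine loads in a heap so that a least loaded machine can be found quickly.

import heapq

def list_schedule(times, k):
    # times: job processing times in arrival order; k: number of machines.
    # Returns (makespan, assignment) where assignment[j] is the machine of job j.
    loads = [(0, m) for m in range(k)]           # (current load, machine id)
    heapq.heapify(loads)
    assignment = []
    for t in times:
        load, m = heapq.heappop(loads)           # a least loaded machine
        assignment.append(m)
        heapq.heappush(loads, (load + t, m))
    makespan = max(load for load, _ in loads)
    return makespan, assignment

On the example discussed next, list_schedule([2, 3, 4, 6, 2, 2], 3) yields makespan 8 while list_schedule([6, 4, 3, 2, 2, 2], 3) yields 7, matching Figures 11 and 12.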
Consider the following example of 6 jobs and 3 machines, illustrated in Figure 10. If the order of the jobs is 2, 3, 4, 6, 2, 2, the above greedy algorithm will schedule them as in Figure 11, for a total makespan of 8 (T1 = 8). This is clearly not optimal: if the job order were 6, 4, 3, 2, 2, 2, the schedule would be as in Figure 12, with a makespan of 7, which is optimal.
Figure 10: 6 jobs with times 2, 3, 4, 6, 2, 2. Figure 11: Schedule for order 2, 3, 4, 6, 2, 2 (M1: 2, 6; M2: 3, 2; M3: 4, 2). Figure 12: Schedule for order 6, 4, 3, 2, 2, 2 (M1: 6; M2: 4, 2; M3: 3, 2, 2).
Theorem. The list scheduling algorithm is a (2 − 1/k)-approximation, where k is the number of machines.
Proof. First we need to derive a lower bound on OPT(I) for a general instance I.
We can lower bound OPT(I) by considering the total processing time Σ_i ti. By the pigeonhole principle, one of the k machines must do at least (Σ_i ti)/k amount of the total work (if every machine did less than a 1/k fraction of the total work, then the total work done would be less than Σ_i ti). So we get that
OPT(I) ≥ (Σ_i ti)/k.
We also have the following obvious lower bound on OPT(I): the machine to which the most time-consuming job (of length tmax) is assigned takes at least tmax time to finish, so the makespan is at least tmax. We get that
∀ pi, OPT(I) ≥ ti (in particular, OPT(I) ≥ tmax).
WLOG assume that, under the above algorithm, machine m1 has the maximum load; let this load be cmax = T1. Let pi be the last job placed on machine m1. At the time pi was assigned to m1, by design the load of m1 was the minimum across all machines. Let L1 be the load of m1 at that time. Since pi is the last job placed on m1, we get that L1 = T1 − ti. Since m1 was the least loaded machine at that time, all other machines must then also have had load at least T1 − ti, and so their final loads are at least T1 − ti as well.
Adding up the final loads of all the machines we get that
Σ_j Tj ≥ k(T1 − ti) + ti.
The sum Σ_j Tj is exactly the total processing time of all the jobs (since every job is assigned to exactly one machine). Combining this with our first lower bound we get that
k·OPT(I) ≥ Σ_i ti = Σ_j Tj ≥ k(T1 − ti) + ti.
Dividing by k and rearranging, T1 ≤ OPT(I) + ti(1 − 1/k) ≤ OPT(I) + OPT(I)(1 − 1/k) = (2 − 1/k)·OPT(I), using the second lower bound OPT(I) ≥ ti. Hence the approximation ratio of the above list scheduling algorithm is 2 − 1/k, where k is the number of machines.
We give an example to show that the above approximation guarantee is tight for this
algorithm.
[Figure: the tight example. Left: the output of list scheduling, where the k(k − 1) unit jobs are spread over the k machines and the job of length k is then added to one of them (makespan 2k − 1). Right: an optimal schedule with makespan k.]
Let n = k(k − 1) + 1, let the first n − 1 jobs have runtime 1, and let the last job have runtime k. In other words, for 1 ≤ i ≤ n − 1, ti = 1, while tn = k. It is easy to see that OPT(I) for this I is k (assign pn to m1 and distribute the remaining k(k − 1) unit jobs equally among the remaining k − 1 machines). Think about what our algorithm does in this case: it balances the first n − 1 jobs among the k machines (k − 1 unit jobs per machine) and then assigns the giant job to one of the machines, resulting in a makespan of (k − 1) + k = 2k − 1. This achieves equality in the above upper bound.
4 The TSP Problem
Recall that, given a complete graph G on n vertices with edge weights w : E → R, a tsp tour is a Hamiltonian cycle in G. The tsp problem is to find a minimum length tsp tour in G.
[Figure: K5 with edge weights, and three tsp tours of lengths 15, 11, and 9.]
(Recall the construction: G′ is the complete graph on the vertices of G in which the edges of G get weight 1 and the non-edges get weight αn + 1.) Now, if there is a Hamiltonian cycle in G, then the same cycle is a tsp tour T in G′. Note that T uses only edges (from G) of weight 1 and is of length n, so any α-approximate tour in G′ has length at most αn and therefore also uses only weight-1 edges, i.e. it is a Hamiltonian cycle in G. If there is no Hamiltonian cycle in G, then any tsp tour T in G′ must use an edge of weight αn + 1 and thus has length greater than αn. Hence an α-approximation algorithm for tsp would let us decide whether G has a Hamiltonian cycle.
[Figure: left, a Hamiltonian cycle in G (shown in blue) and the corresponding tsp tour in G′ of length 5 = n; right, a graph G with no Hamiltonian cycle, in which any tsp tour of G′ must use an edge of weight αn + 1.]
Figure 15: The triangle inequality w(x, z) ≤ w(x, y) + w(y, z): the direct distance is no longer than the distance via an intermediate point y.
[Figure: two weighted copies of K5; the left one is not a metric-tsp instance, the right one is a metric-tsp instance.]
[Figure: the graph G′ built from G, with the edges of G given weight 1 and the non-edges given weight 2.]
Now, if there is a Hamiltonian cycle in G, then the same cycle is a tsp tour in G′ which uses only edges (from G) of weight 1 and is of length n. If there is no Hamiltonian cycle in G, then any tsp tour in G′ must use an edge of weight 2, and the length of the tour is greater than n. Therefore, G has a Hamiltonian cycle if and only if G′ has a tsp tour of length k = n.
4.3 2-approximation for Metric TSP
We observe a simple lower bound on metric-tsp:
Theorem 21. If C is a ham-cycle in G and T∗ is an mst of G, then w(T∗) ≤ w(C). (Deleting any edge of C gives a spanning tree of weight at most w(C), and the mst weighs no more.)
Figure 16: A Hamiltonian cycle C can be obtained by adding an edge to the mst T∗.
Before we use this to give a 2-approximation algorithm for tsp, let us recall Eulerian graphs. An Euler circuit is a closed walk in a graph G containing every edge of G. An Euler path is a walk in G containing every edge of G.
Theorem 22. A connected graph G has an Euler circuit if and only if every vertex has even degree.
Theorem 23. A connected graph G has an Euler path if and only if it has exactly two vertices of odd degree.
We use a spanning tree T to find a tsp tour C on a metric-tsp instance G. Suppose each edge in T is duplicated, i.e. can be used twice; call the resulting multigraph T(∗). Consider some vertex s to be the root of T.
Let L be an Euler tour of T(∗) starting from s (the root). We list the vertices in the order they appear on L, including repetitions. The length of L is w(L) = Σ_{e∈L} w(e).
Let C∗ be an optimal tsp tour in G. L is not itself a tsp tour, since a tour must visit each vertex only once (except the first). To convert L into a tsp tour C, we remove duplicate vertices, while retaining the first and last vertex, by short-circuiting L. We do this by traversing L and keeping only the first occurrence of each vertex.
[Figure: a metric-tsp instance, an mst T of it, and T rooted at c with its edges duplicated; below, the corresponding Euler tour and its short-circuiting.]
When a vertex is about to be revisited, we skip it and simply move on to the next vertex of L. Only the repeated root at the very end is retained, to complete the cycle C.
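The short-circuiting step is easy to code; the following Python sketch (ours, not from the notes) takes the Euler tour as a list of vertices that starts and ends at the root.

def short_circuit(euler_tour):
    seen, tour = set(), []
    for v in euler_tour:
        if v not in seen:                        # keep only the first occurrence of each vertex
            seen.add(v)
            tour.append(v)
    tour.append(euler_tour[0])                   # repeat the root once more to close the cycle C
    return tour

For instance, short_circuit(['d', 'c', 'e', 'c', 'a', 'b', 'a', 'c', 'd']) returns ['d', 'c', 'e', 'a', 'b', 'd'].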
The algorithm can be summarized as follows:
Algorithm double-tree-tsp(G)
T ← mst(G) ▷ e.g. Kruskal algorithm
T (∗) ← duplicate edges of T ▷ every vertex has even degree
L ← Euler tour of T (∗) ▷ Fleury or Hierholzer algorithm
C ← short-circuit(L)
return C
[Figure: the Euler tour L2 = d, c, e, c, a, b, a, c, d of the doubled tree, and the short-circuited tour C2.]
The runtime of each step in the above algorithm is polynomial: Kruskal's algorithm takes O(|E| log n), duplicating the edges takes O(n), and an Euler tour can be obtained in O(|E|²) time using Fleury's algorithm or in O(|E|) time using Hierholzer's algorithm. Finally, short-circuiting takes O(n). Therefore, the overall runtime is clearly polynomial.
Theorem 24. The double-tree-tsp is a 2-approximation for metric-tsp.
Proof. Since the edges of T were duplicated to obtain the Euler tour L, w(L) = 2w(T). Then, since w(T) ≤ w(C∗) (Theorem 21), we get w(L) ≤ 2w(C∗). Since edges are only short-cut (never lengthened) during short-circuiting, and the edge weights of G form a metric, w(C) ≤ w(L). Hence w(C) ≤ w(L) = 2w(T) ≤ 2w(C∗).
[Figure: short-circuiting replaces a sub-walk x, y, z of L by the single edge (x, z); by the triangle inequality w(x, z) ≤ w(x, y) + w(y, z), so C is no longer than L.]
In double-tree-tsp we duplicated every edge of T just so that every vertex would have an even degree. The interesting question now is: is there a less costly way to make the degrees of the relevant vertices even? We will see that we can indeed do so in Christofides' algorithm, which gives us a 1.5-approximation for metric-tsp.
[Figure: an illustration of the steps of Christofides' algorithm on a small example with vertices d, e, f, g, h.]
Algorithm Christofides-Algo-tsp(G)
T ← mst(G) ▷ e.g. Kruskal's algorithm
G′ ← subgraph of G induced by the odd-degree vertices of T
M ← Min-Cost-Perfect-Matching(G′) ▷ Micali & Vazirani's algorithm, O(n^2.5)
H ← M ∪ T ▷ every vertex has even degree in H
L ← Euler tour of H ▷ Fleury or Hierholzer algorithm
C ← short-circuit(L)
return C
Theorem. Christofides-Algo-tsp is a 1.5-approximation for metric-tsp.
Proof. The length of the Euler tour L on H is w(L) = w(T) + w(M). Since w(C) ≤ w(L) and, by Theorem 21, w(T) ≤ w(C∗), we get w(C) ≤ w(C∗) + w(M).
Next we show that, given a ham-cycle C″ in G = (V, E) and a subset U ⊆ V with |U| even, a min-cost perfect matching M on U satisfies w(M) ≤ (1/2)·w(C″). Let C′ be the cycle on U obtained by short-circuiting C″ past the vertices not in U; by the triangle inequality, w(C′) ≤ w(C″). The edges of C′ decompose into two perfect matchings on U (take alternate edges), and the cheaper of the two has weight at most (1/2)·w(C′). Since M is a min-cost perfect matching on U, w(M) ≤ (1/2)·w(C′) ≤ (1/2)·w(C″).
Applying this with C″ = C∗ (the optimal tour) and U the set of odd-degree vertices of T (this set has even size, by the handshaking lemma), we get w(M) ≤ (1/2)·w(C∗). Combining this with w(C) ≤ w(C∗) + w(M), we conclude that w(C) ≤ (1 + 1/2)·w(C∗).
In the following we will assume that all weights and values are integers.
Algorithm GreedyByRatio(U, W, V, C)
if Σ_{i=1}^{n} wi ≤ C then ▷ If all items fit in the sack, then take all
S ← U
return S
Re-index the items by density so that v1/w1 ≥ v2/w2 ≥ . . . ≥ vn/wn ▷ sort by vi/wi
Weight ← 0 ▷ the total weight collected so far
Value ← 0 ▷ the total value collected so far
S ← ∅ ▷ Initially the knapsack is empty
for i = 1 to n do
if Weight + wi ≤ C then
S ← S ∪ {ai}
Value ← Value + vi
Weight ← Weight + wi
return S
It turns out that this algorithm too can be arbitrarily bad. Consider the following instance: W = {1, C} and V = {2, C}, so the items have value-to-weight ratios 2 and 1. We will pick item 1 first, as its ratio is 2, but then there is no more capacity for the second item, while the optimal solution clearly is to take item 2. The ratio vi/wi is called the density of an item. The problem is that density is not necessarily a good measure of profitability: in the above example the denser item blocks the more profitable item.
We can fix this with an extremely simple trick. We also run another simple greedy algorithm that chooses just the first item that the GreedyByRatio algorithm misses, and we return the better of the two solutions. Following is pseudocode for this algorithm.
Algorithm ModifiedGreedyByRatio(U, W, V, C)
if Σ_{i=1}^{n} wi ≤ C then ▷ If all items fit in the sack, then take all
S ← U
return S
Re-index the items by density so that v1/w1 ≥ v2/w2 ≥ . . . ≥ vn/wn ▷ sort by vi/wi
Weight ← 0 ▷ the total weight collected so far
Value ← 0 ▷ the total value collected so far
S ← ∅ ▷ Initially the knapsack is empty
for i = 1 to n do
if Weight + wi ≤ C then
S ← S ∪ {ai}
Value ← Value + vi
Weight ← Weight + wi
Let ak+1 be the first item that was not packed above ▷ so the first k items were packed
if Value ≥ vk+1 then
return S
else
return {ak+1}
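The same algorithm as a compact Python sketch (ours, not from the notes): items are (value, weight) pairs and the function returns the better of the greedy packing and the first item that the greedy pass rejects.

def modified_greedy_by_ratio(items, C):
    # items: list of (value, weight) pairs with positive weights; C: knapsack capacity.
    items = [(v, w) for v, w in items if w <= C]          # items heavier than C can never be packed
    if sum(w for _, w in items) <= C:
        return list(items)                                # everything fits
    order = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    S, weight, first_rejected = [], 0, None
    for v, w in order:
        if weight + w <= C:
            S.append((v, w))
            weight += w
        elif first_rejected is None:
            first_rejected = (v, w)                       # the item a_{k+1} of the analysis below
    if sum(v for v, _ in S) >= first_rejected[0]:
        return S                                          # the greedy packing wins
    return [first_rejected]                               # the single rejected item wins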
We get that
Σ_{i=1}^{k} vi + c · v_{k+1} ≥ OPT,   (1)
where c = (C − (w1 + w2 + . . . + wk)) / w_{k+1} < 1 is the fraction of item a_{k+1} that would exactly fill the remaining capacity. Indeed, if we were allowed fractional packing, this packing would consume all the capacity of the knapsack and would use the capacity optimally, since we always selected the items with the largest density (largest value per unit capacity).
Actually, we are going to refer to this fact later (a few times), so let us give it a name.
Lemma 28. Σ_{i=1}^{k} vi + c · v_{k+1} ≥ OPT, where c = (C − (w1 + . . . + wk))/w_{k+1}.
As an immediate corollary we get that Σ_{i=1}^{k+1} vi ≥ OPT, as c < 1.
Now our approximation guarantee follows. Since f(s′) = max{Σ_{i=1}^{k} vi, v_{k+1}}, if Σ_{i=1}^{k} vi < OPT/2, then v_{k+1} ≥ OPT/2, because otherwise, using the fact that c < 1, we would get a contradiction to (1). In either case f(s′) ≥ OPT/2, so ModifiedGreedyByRatio is a 2-approximation.
This analysis is tight. Consider the following instance, on which the performance guarantee is matched. Let U = {a1, a2, a3}, V = {1 + ϵ/2, 1, 1}, W = {1 + ϵ/3, 1, 1}, and C = 2. The densities are {(1 + ϵ/2)/(1 + ϵ/3), 1, 1}. The above greedy algorithm will choose only a1 (so S = {a1}), and since v1 > v2 it returns S, while the optimal solution clearly is {a2, a3}. Thus f(s′) = 1 + ϵ/2, while f(s) = 2 = 2f(s′) − ϵ, so the ratio is arbitrarily close to 2 for sufficiently small ϵ.
Note that the runtime of this algorithm is O(n log n) for sorting (each of the n divisions takes time proportional to log(P · C), where P is the sum of all values). So, by sacrificing quality of solution, we brought the runtime down to O(n log n) from the pseudo-polynomial O(nC) of the dynamic programming algorithm.
6 Polynomial Time Approximation Scheme
How good are the approximation algorithms we have seen so far? Consider the 2-approximate algorithm for vertex cover: while this is much better than applying the more general set cover approximation algorithm, the solution we give can still be 100% more than the optimal. The set cover solution is even worse: O(log n) times the optimal.
An important fact about approximation algorithms and approximability is that, while all NP-complete problems are equivalent in terms of polynomial time solvability, if P ≠ NP they differ substantially in terms of approximability. For instance, we could find a 1-absolute approximation algorithm for planar graph coloring, while we proved that the independent set problem cannot have a k-absolute approximation algorithm.
While we didn't cover any inapproximability results for relative approximation algorithms, it can be shown that the O(log n)-approximation algorithm for set cover is essentially the best possible (assuming P ≠ NP). Similarly, as we remarked earlier, vertex cover doesn't admit an approximation ratio better than 4/3. Inapproximability results are known for many other NP-complete problems too. Such results are part of a fascinating research area called hardness of approximation, or lower bounds on approximability.
In the next section we will discuss algorithms that achieve any desired approximation guarantee: the user inputs, in addition to the problem instance, the desired precision, and the algorithm guarantees to output a result within that error bound.
Definition 29. A Polynomial Time Approximation Scheme (PTAS) is an approximation algorithm that takes as input, in addition to the problem instance, a parameter ϵ > 0, and produces a solution that is (1 ± ϵ)-approximate. The running time is polynomial in the size of the problem instance (generally n); its dependence on ϵ can, however, be exponential.
(1 ± ϵ)-approximate means that for a minimization problem the value of the solution is at most (1 + ϵ)·OPT, and for a maximization problem it is at least (1 − ϵ)·OPT. The runtime of such an algorithm could be, for example, O(2^{1/ϵ} n³) or O(n^{1/ϵ}).
Definition 30. A Fully Polynomial Time Approximation Scheme (FPTAS) is a PTAS whose running time is polynomial in both n and 1/ϵ.
6.1 PTAS for the Knapsack problem
First we make a few observations about the ModifiedGreedyByRatio algorithm and Lemma 28. Above we gave an instance where this algorithm achieves only about 1/2 of the optimal solution. We argue that in some cases the solution produced by the algorithm is not very bad; in fact, we identify when the solution can be bad.
Lemma 31. If there is an ϵ, 0 < ϵ < 1/2, such that wi ≤ ϵC for every item, then the ModifiedGreedyByRatio algorithm gives a (1 − ϵ)-approximation.
Proof. First, since we sorted the items by value-to-weight ratio, we have that
∀ 1 ≤ i ≤ k + 1, vi/wi ≥ v_{k+1}/w_{k+1} =⇒ vi ≥ (v_{k+1}/w_{k+1}) · wi.
Adding up all these inequalities we get that
v1 + v2 + . . . + v_{k+1} ≥ (v_{k+1}/w_{k+1}) · (w1 + w2 + . . . + w_{k+1}),
i.e.
v_{k+1} ≤ w_{k+1} · (v1 + v2 + . . . + v_{k+1}) / (w1 + w2 + . . . + w_{k+1}).
By the definition of k (a_{k+1} is the first item that the algorithm rejected, which is the more important fact here), we have that w1 + w2 + . . . + w_{k+1} > C. Plugging this into the above inequality we get that
v_{k+1} ≤ (w_{k+1}/C) · (v1 + v2 + . . . + v_{k+1}).
Using wi ≤ ϵC for every item, and in particular w_{k+1} ≤ ϵC, the above inequality gives
v_{k+1} ≤ ϵ · (v1 + v2 + . . . + v_{k+1}).
Now, if v1 + v2 + . . . + vk ≥ (1 − ϵ)·OPT, then we are done (we have a (1 − ϵ)-approximation). If v1 + v2 + . . . + vk < (1 − ϵ)·OPT, then the above inequality gives v_{k+1} ≤ ϵ·OPT (just substitute into the bound for v_{k+1}). Combining these two would imply that v1 + v2 + . . . + vk + v_{k+1} < (1 − ϵ)·OPT + ϵ·OPT = OPT, contradicting (the corollary of) Lemma 28. Hence v1 + . . . + vk ≥ (1 − ϵ)·OPT, and therefore f(s′) ≥ (1 − ϵ)·OPT.
Lemma 32. If there is an ϵ, 0 < ϵ < 1/2, such that vi ≤ ϵ·OPT for every item, then the ModifiedGreedyByRatio algorithm gives a (1 − ϵ)-approximation.
We will next design a PTAS for the knapsack problem (using ideas from the above two lemmas). First we state a simple but nonetheless very useful fact.
Fact 33. In any optimal solution with total value OPT, and for any 0 < ϵ < 1, there are at most ⌈1/ϵ⌉ items with value at least ϵ·OPT.
The above fact and lemma give us the following idea for designing a PTAS. We will first try to guess the heavy items (those with value larger than ϵ·OPT) in the optimal solution (by the above fact there aren't too many); then, for the remaining items, we will use our old ModifiedGreedyByRatio algorithm, which by the lemma will give us quite a good solution (because the remaining items have values at most ϵ·OPT). The problem is how to guess the heavy items (actually, since we don't know OPT, we can't even define heavy items). All we know is a bound on their number, but that's good enough information: since there can only be ⌈1/ϵ⌉ of them, we will try all subsets of U of size at most ⌈1/ϵ⌉. There are at most n^{⌈1/ϵ⌉+1} such subsets.
Here is the algorithm. For a set S ⊆ U we define w(S) = Σ_{i∈S} wi and v(S) = Σ_{i∈S} vi (the total weight and value of the items in S).
Algorithm KnapsackPTAS(U, W, V, C, ϵ)
h ← ⌈1/ϵ⌉
currentMax ← 0
for each H ⊆ U such that |H| ≤ h and w(H) ≤ C do ▷ pack H in the knapsack, if possible
Let vm be the minimum value of any item in H
Let H′ be the set of items in U \ H with value greater than vm
Run ModifiedGreedyByRatio on U \ (H ∪ H′) with capacity C − w(H)
Let S be the solution returned
if currentMax < v(H) + v(S) then
currentMax ← v(H) + v(S)
We can easily keep track of the best solution too, by keeping the current best pair of
H and S.
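Here is a Python sketch of KnapsackPTAS (ours, not from the notes); it reuses the modified_greedy_by_ratio function sketched earlier and assumes the items are distinct (value, weight) pairs.

import itertools
import math

def knapsack_ptas(items, C, eps):
    h = math.ceil(1 / eps)
    best_val, best_sol = 0, []
    for size in range(h + 1):
        for H in itertools.combinations(items, size):     # guess the set of heavy items
            wH = sum(w for _, w in H)
            if wH > C:
                continue                                  # H itself must fit in the knapsack
            vmin = min((v for v, _ in H), default=float('inf'))
            # discard items more valuable than the cheapest guessed heavy item (the set H')
            rest = [it for it in items if it not in H and it[0] <= vmin]
            S = modified_greedy_by_ratio(rest, C - wH)
            total = sum(v for v, _ in H) + sum(v for v, _ in S)
            if total > best_val:
                best_val, best_sol = total, list(H) + list(S)
    return best_sol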
Theorem 34. The above algorithm is a PTAS for the knapsack problem, with runtime O(n^{⌈1/ϵ⌉+1}).
Proof. There are O(n^{h+1}) subsets of U of size at most h. For each subset we do a linear amount of work and then call the ModifiedGreedyByRatio algorithm. Note that we do the sorting only once, but that is immaterial as it is dominated by the O(n^{h+1}) term. Since O(n^{h+1}) = O(n^{⌈1/ϵ⌉+1}), the runtime is polynomial in n (and exponential in 1/ϵ).
For the approximation ratio, note that since we are iterating over all subsets of size at most h, in one of these iterations we will actually consider the correct set H, namely the heavy items of the optimal solution (by the fact above it cannot contain more than h items of value more than ϵ·OPT). Consider that iteration, and let U′ = U \ (H ∪ H′). Let OPT′ be the value of an optimal packing of items from U′ in the remaining capacity, so that OPT = v(H) + OPT′. In this iteration we removed (as H′) only items of value larger than those in H, which cannot be part of the optimal solution's remaining part (OPT′). Since every item in U′ has vi ≤ ϵ·OPT, by the above lemma we can get a solution of value at least (1 − ϵ)·OPT′. Hence our solution has value at least v(H) + (1 − ϵ)·OPT′ ≥ (1 − ϵ)·OPT (note that v(H) could be 0: the optimal solution might not include any heavy item).
7 FPTAS for the Knapsack problem
We first develop a dynamic programming solution to solve the knapsack problem, which
we use to develop a FPTAS for the problem.
As we discussed earlier, we can find OPT(n, C) (and the corresponding solution), which is our goal, exactly in time O(nC). This is not polynomial in the size of the input, since unless we write C in unary it can be described using only O(log C) bits.
But can we use this solution to design an FPTAS? Earlier we saw that if all weights are not very large (≤ ϵC) then we get a (1 − ϵ)-approximation. So we could scale down all the weights and the capacity C, solve the problem exactly using this algorithm, and then scale the solution back up. There is a problem, though: during the scaling we might end up with an infeasible solution (violating the capacity constraint), which is a hard constraint. In the following we give another dynamic programming formulation which is more scaling friendly. That solution scales all the values instead (and uses the second lemma, for the case when all values are small). Here we can scale with care so as not to end up with an infeasible solution, since value is a soft constraint (it is in our objective function, and we are allowed to make a bounded error there).
This formulation will be easy to use as a subroutine for an FPTAS. Recall that all of the values are integers. In the previous dynamic programming solution we essentially answered the question "what is the maximum value that we can gain if the capacity is c?". Here we turn this question around: "what is the minimum weight that we need if we want to gain a value of v?". Let us define OPT(i, v) to be the smallest capacity needed to achieve the target value v from the sub-instance U = {a1, . . . , ai}, W = {w1, . . . , wi} and V = {v1, . . . , vi}. The maximum target one can achieve is at most P = Σ_i vi, so we have to compute OPT(i, v) for each 0 ≤ i ≤ n and 0 ≤ v ≤ P. Let vm be the maximum value of any item; then P ≤ n·vm, so the total number of subproblems is at most O(n · n·vm).
Remark 35. Remember that none of these subproblems is our actual goal, but if we have solutions to all of them, then we can compute a solution to our problem instance: the answer for our instance is the largest v such that OPT(n, v) ≤ C.
It is easy to see that the following recurrence can be used to solve these subproblems.
OPT(i, v) =
  0, if v = 0;
  ∞, if i = 0 and v > 0;
  OPT(i − 1, v), if i ≥ 1 and 1 ≤ v < vi;
  min{ OPT(i − 1, v), OPT(i − 1, v − vi) + wi }, if i ≥ 1 and v ≥ vi.
This recurrence might not be clear immediately so think about it until it is. Try
to formally prove it, it just uses the definition of OP T (i, v). You may consult your
textbook for a formal proof.
As usual this immediately gives us a dynamic programming algorithm: a bottom-up iterative procedure that runs in O(n²·vm) time and, in view of the above remark, yields the optimal value for our instance. Note that this too is a pseudo-polynomial time algorithm, but as we will see below we have achieved a lot.
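A bottom-up Python sketch of this dynamic program (ours, not from the notes); it tabulates the minimum weight needed to reach each achievable value and reads off the answer as described in Remark 35.

def min_weight_table(values, weights):
    # table[i][v] = minimum total weight of a subset of the first i items whose
    #               total value is exactly v (float('inf') if no such subset exists)
    n, P = len(values), sum(values)
    INF = float('inf')
    table = [[INF] * (P + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = 0                          # value 0 needs no capacity
    for i in range(1, n + 1):
        vi, wi = values[i - 1], weights[i - 1]
        for v in range(1, P + 1):
            table[i][v] = table[i - 1][v]        # option 1: skip item i
            if v >= vi:                          # option 2: take item i
                table[i][v] = min(table[i][v], table[i - 1][v - vi] + wi)
    return table

def knapsack_best_value(values, weights, C):
    # per Remark 35: the answer is the largest v with OPT(n, v) <= C
    last = min_weight_table(values, weights)[-1]
    return max(v for v in range(len(last)) if last[v] <= C)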
If the values are large, then we will scale them down, as we don’t have to deal
with them exactly, since we are only seeking an approximate solution.
We are going to scale down all the values so they are not too large and also round them
so they are integral. This scaling and rounding will introduce some error, because the
algorithm may prefer some subset over another erroneously as the exact values are
unknown. We, therefore, want to do scaling and rounding such that the error is small
(bounded). Since we want to get a (1 − ϵ)-approximation we want the error to be no
more than ϵ · OP T .
Let b = (ϵ/n)·OPT. Suppose we change every value vi to vi′, where vi′ is the smallest integer such that vi ≤ vi′·b; the number vi′ satisfying this is given by
vi′ = ⌈vi / b⌉.
So all new values are at most ⌈n/ϵ⌉, which is good news, as now the total (scaled) profit cannot be too large: it is at most Σ_i vi′ ≤ n·v′m, where v′m ≤ ⌈n/ϵ⌉ is the largest scaled value. Running the scaling-friendly dynamic programming with the values vi′ gives an optimal solution for the scaled instance in time O(n · n·v′m) = O(n²·v′m) = O(n³ · 1/ϵ) (polynomial in n and 1/ϵ).
3. Let us see what error we incur by using vi′ instead of vi in the dynamic programming run. Let S′ be the solution returned by this dynamic programming algorithm with the values vi′. Let S be the optimal solution to the original instance, i.e. OPT = Σ_{i∈S} vi. Since we didn't change the weights wi or the capacity C, we have w(S′) ≤ C. We will show that v(S′) ≥ (1 − ϵ)·v(S). Note that since S′ is optimal w.r.t. the values vi′, we have that v′(S′) ≥ v′(S) (as S is a feasible solution in the scaled-down version too). One last thing and the rest is calculation: by construction we have
vi/b ≤ vi′ ≤ vi/b + 1.
Hence we get
OPT = Σ_{i∈S} vi
    ≤ Σ_{i∈S} b·vi′        (since vi ≤ b·vi′)
    = b · Σ_{i∈S} vi′
    = b · v′(S)
    ≤ b · v′(S′)
    = b · Σ_{i∈S′} vi′
    ≤ b · Σ_{i∈S′} (vi/b + 1)
    = Σ_{i∈S′} vi + b·|S′|
    ≤ v(S′) + n·b
    = v(S′) + ϵ·OPT.
Hence v(S ′ ) ≥ (1−ϵ)·OP T , therefore, S ′ , the solution given by the scaling friendly
dynamic programming with scaled down values is at least (1 − ϵ)-approximate.
Here is a big problem: how do we know the value of OPT, which we used throughout in the form of b? Actually, we don't know the value of OPT, but we know a good lower bound on it, namely OPT ≥ vm (assuming all items have weight at most C, since heavier items can be discarded in pre-processing). So we take b = (ϵ/n)·vm and do all the calculations with this new b. All of the above three points go through with this new b; in the third point, at the end of the chain of inequalities, we get OPT ≤ v(S′) + ϵ·vm, and plugging in vm ≤ OPT there gives OPT ≤ v(S′) + ϵ·OPT, i.e. v(S′) ≥ (1 − ϵ)·OPT. A sketch of the resulting algorithm follows.
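Putting the pieces together, here is a sketch of the resulting FPTAS in Python (ours, not from the notes); it reuses the min_weight_table function sketched above and uses b = (ϵ/n)·vm, as in the discussion.

import math

def knapsack_fptas(values, weights, C, eps):
    items = [(v, w) for v, w in zip(values, weights) if w <= C]   # pre-processing: drop oversized items
    if not items:
        return []
    vs, ws = [v for v, _ in items], [w for _, w in items]
    n, vmax = len(items), max(vs)
    b = eps * vmax / n                                            # scaling factor b = (eps/n) * vm
    scaled = [math.ceil(v / b) for v in vs]                       # v_i' = ceil(v_i / b)
    table = min_weight_table(scaled, ws)                          # exact DP on the scaled values
    best_v = max(v for v in range(len(table[-1])) if table[-1][v] <= C)
    chosen, v = [], best_v                                        # trace back the chosen items
    for i in range(n, 0, -1):
        if v >= scaled[i - 1] and table[i][v] == table[i - 1][v - scaled[i - 1]] + ws[i - 1]:
            chosen.append(i - 1)
            v -= scaled[i - 1]
    return [(vs[i], ws[i]) for i in chosen]                       # a (1 - eps)-approximate packing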