ADM Lecture Notes
Mathematics
lecture notes, summer term 2022
Prof. Dr. Yann Disser
Contents
1 Algorithms and their Analysis
1.1 Example 1: The Celebrity Problem
1.2 Running Times of Algorithms
1.3 Asymptotic Growth
1.4 Example 2: Matrix Multiplication
2 Searching and Sorting
3 Basic Graph Theory
4 Minimum Spanning Trees
5 Shortest Paths
5.1 Dijkstra's Algorithm
5.2 The Algorithm of Bellman-Ford(-Moore)
6 Network Flows
6.1 The Ford-Fulkerson Method
6.2 The Edmonds-Karp Algorithm
6.3 Path Decomposition
7 Matchings
7.1 Bipartite Matchings
8 Complexity
8.1 NP-completeness
8.2 Important NP-complete problems
1 Algorithms and their Analysis
This lecture is intended as an introduction to the mathematical design and analysis of algorithms. Ab-
stractly, an algorithm is nothing more than a formal specification of a systematic way to solve a com-
putational problem. For example, at school we learn a basic algorithm to multiply any pair of numbers
using only basic multiplications and additions, e.g.,
  7 3 · 5 8
  3 5          (7 · 5, shifted two places)
+   1 5        (3 · 5, shifted one place)
+   5 6        (7 · 8, shifted one place)
+     2 4      (3 · 8)
= 4 2 3 4
This method uses 4 elementary multiplications in this example, and n^2 multiplications in general for two numbers with n digits each. It is a natural question whether we can make do with fewer multiplications. Indeed, we can compute the same result using only 3 elementary multiplications, as follows:
  7 3 · 5 8
  3 5          (7 · 5, shifted two places)
+   3 5        (7 · 5, shifted one place)
+   2 4        (3 · 8, shifted one place)
+     2 4      (3 · 8)
+   1 2        ((7 − 3) · (8 − 5), shifted one place)
= 4 2 3 4
This method can be generalized to only take roughly n^1.58 multiplications, and even better algorithms
are known that take fewer than n^1.01 multiplications (for very large values of n).
In this lecture, we will be concerned with algorithmic questions of this type. In particular, we will
investigate how to systematically solve algorithmic problems and compare different algorithms in terms
of efficiency of computation.
Algorithmic problems and solutions permeate all aspects of our modern lives. Here is a (very) short list
of questions that we commonly delegate to algorithms:
• How to search large collections of data like the Internet?
• How to find a shortest route in a transportation network?
• How to assign and route delivery trucks to ship online orders?
• How to schedule lectures in order to minimize overlaps?
• How to position cellular network towers for good coverage?
• How to lay out large buildings to allow for fast evacuation?
• How to render complex 3D-scenes to a 2D-display?
• How to distinguish photos of cats and dogs?
To illustrate our approach, we will begin by informally considering a simple problem. We will later
formalize our analysis and apply it to more challenging problems.
1.1 Example 1: The Celebrity Problem
Consider a group P of n people where each person p ∈ P knows a subset P_p ⊆ P of the group. In this setting, we call a person p ∈ P a celebrity if everybody knows them, but they know nobody, i.e., if p ∈ P_{p′} for all p′ ∈ P \ {p} and P_p = ∅. Our problem is to determine whether there is a celebrity in P and, if yes, who the celebrity is (obviously, there can be at most one celebrity in P). To do this, we may ask
questions of the form “Does a ∈ P know b ∈ P ?”. How can we systematically solve our problem using
the smallest number of questions possible?
We can solve the problem simply by asking, for every ordered pair (a, b) of distinct persons, whether a knows b, and checking explicitly whether there is a celebrity. This naive algorithm takes n · (n − 1) questions. Can we do better?
Observe that even if we knew that either a ∈ P is the celebrity or there is none, we would still need at least 2(n − 1) questions to check whether a is the celebrity. This means that every algorithm needs at least 2(n − 1) questions. On the other hand, if we are lucky and guess a ∈ P correctly, these 2(n − 1) questions may already suffice.
However, in the theory of algorithms it is customary to demand a guarantee regarding the efficiency
of an algorithm. This means that we want to bound the maximum number of questions our algorithm
may need in the worst case. Throughout the lecture, we will focus on this perspective and analyze the
worst-case performance of algorithms. So can we do better than asking all n · (n − 1) potential questions
in the worst case?
Mathematical induction motivates an important paradigm of algorithm design: recursion. The idea is
simple: Assume we already knew an algorithm A_{n−1} to solve our problem efficiently for groups of at most n − 1 people. We can apply this algorithm on a subset P′ = P \ {p} for some arbitrarily chosen person p ∈ P to efficiently check whether P′ has a celebrity. As described above, we can invest 2(n − 1) additional questions to check whether p is the celebrity in P. If not and if P′ has a celebrity p′, the answers to the two questions involving p and p′, which we already asked, tell us whether p′ is a celebrity in P. If not, there is no celebrity.
We trivially know the best-possible algorithm A_1 for groups of size 1. We can therefore recursively define A_i to use A_{i−1} for i ∈ {2, . . . , n} as described above. It is easy to see that we still ask all possible questions: If f(i) denotes the number of questions asked by algorithm A_i for i ∈ {1, . . . , n}, we obtain
f(n) = f(n − 1) + 2(n − 1)
     = f(n − 2) + 2(n − 2) + 2(n − 1)
     = . . .
     = 2 · Σ_{i=1}^{n−1} (n − i)
     = n(n − 1).
The main source of questions in our algorithm comes from the fact that we need to explicitly check
whether the person p ∈ P arbitrarily excluded from P is the celebrity. We can save a lot of questions by
ensuring that we exclude a person that cannot be the celebrity. To do this, we can simply ask a single
question for any pair (a, b) of distinct persons. Either a knows b, then a cannot be the celebrity; or a
does not know b, then b cannot be the celebrity. In either case, we can make sure to exclude a person p
that cannot be the celebrity.
Our algorithm can formally be stated using pseudocode:
Algorithm: FINDCELEBRITY(P)
input: set of people P
output: celebrity p* ∈ P if one exists, otherwise ∅
if |P| = 1 :
return unique person p ∈ P
take any {a, b} ⊆ P
if “Does a know b?” :
p ← a
else
p ← b
p′ ← FINDCELEBRITY(P \ {p})
if p′ ≠ ∅ and “Does p know p′?” and not “Does p′ know p?” :
return p′
else
return ∅
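As an illustration, the following Python sketch mirrors FINDCELEBRITY. The callback knows(a, b), which stands in for the question “Does a know b?”, and the example acquaintance relation below are assumptions of this sketch, not part of the formal model.

def find_celebrity(people, knows):
    """Return the celebrity among `people`, or None if there is none.

    `knows(a, b)` answers the question "Does a know b?"; as in FINDCELEBRITY,
    it is asked at most 3(n - 1) times.
    """
    people = list(people)
    if len(people) == 1:
        return people[0]
    a, b = people[0], people[1]
    # One question identifies a person p that cannot be the celebrity.
    p = a if knows(a, b) else b
    candidate = find_celebrity([q for q in people if q != p], knows)
    # Two further questions check whether the candidate survives adding p back.
    if candidate is not None and knows(p, candidate) and not knows(candidate, p):
        return candidate
    return None

# Example: person 2 is known by everybody and knows nobody.
acquaintances = {0: {1, 2}, 1: {2}, 2: set()}
print(find_celebrity([0, 1, 2], lambda a, b: b in acquaintances[a]))  # -> 2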
If f(n) denotes the number of questions asked by FINDCELEBRITY for a group of n people, we obtain the recurrence
f(n) = 0 if n = 1, and f(n) = 1 + f(n − 1) + 2 if n > 1.
To solve such recurrences, we always proceed in the same manner: We first guess the solution by telescoping, i.e.,
f(n) = 3 + f(n − 1) = 3 + 3 + f(n − 2) = · · · = 3(n − 1) + f(1) = 3(n − 1).
We can then formally prove our hypothesis that the number of questions asked is f(n) = 3(n − 1) by induction. The induction basis holds since f(1) = 0. Now assume that f(i − 1) = 3(i − 2) holds. Then, f(i) = 1 + f(i − 1) + 2 = 3(i − 2) + 3 = 3(i − 1), which completes the induction.
(Illustration: the number of questions n(n − 1), 3(n − 1), and 2(n − 1) as a function of the group size n.)
1.2 Running Times of Algorithms
In the previous section we measured the efficiency of our algorithms by counting the number of questions
they ask relative to the size of the group we are given. To generalize this to arbitrary problems and
algorithms, we first have to provide formal definitions of the notion of a problem, of the size of a given
instance of such a problem, and what it means to solve a problem.
Definition 1.1. An algorithmic problem is given by a tuple (I, (S_I)_{I∈I}) of a set of instances/inputs I and a family of sets of solutions (S_I)_{I∈I}. Every instance I ∈ I is associated with an input size |I|.
Example. The set of instances of the multiplication problem we considered initially are all pairs of
natural numbers, the size of an instance is given by the number of digits of the larger number. An
instance of the celebrity problem is a set P together with the sets Pp for all p ∈ P , and the size of an
instance is the cardinality of P .
Definition 1.2. An algorithm A solves an algorithmic problem (I , (S I ) I∈I ) if, given any instance I ∈ I , it
terminates after a finite number of steps and returns a solution A(I) ∈ S I .
Of course, we can no longer count the number of “questions” when speaking about problems other than
the celebrity problem. Instead, we count the number of elementary operations like accessing variables,
performing simple arithmetic, comparing, copying and storing numbers or words, evaluating conditional
statements, etc. We adopt the unit cost model where we assume that each elementary operation takes one
time step, independent of the size of the numbers involved. Additionally, we assume that we can iterate
over a set X in |X| steps and that we can infer |X| in one step. The unit cost model significantly simplifies our mathematical analysis of algorithms and allows us to focus on their qualitative behavior, without being distracted by technical details that depend on how an algorithm is implemented in practice.
Definition 1.3. The (worst-case) running time of an algorithm A for an algorithmic problem (I, (S_I)_{I∈I}) is given by the function f : N → N with
f(n) := max_{I∈I, |I|=n} (number of elementary operations that A performs on input I).
1.3 Asymptotic Growth
We mentioned that we will compare algorithms according to the asymptotic growth of their running
times for large input sizes n. Asymptotically, functions that only differ by constant factors or offsets
exhibit a similar growth, as we saw above in the illustration for the polynomials 3(n − 1) and 2(n − 1)
compared with n(n − 1) = n2 − n. In addition, since we already made the simplifying assumption that all
elementary operations have the same cost, we should really not distinguish between running times that
are only a constant factor apart. The following examples illustrate the irrelevance of constant factors
when comparing the asymptotic growth of functions:
(Illustration: the growth of e^n, 10 · n^2, 100 · n, 10 · √n, and 100 · ln(n) as n increases.)
Intuitively, it seems reasonable to say that n^2 “grows faster than” n and √n “grows faster than” ln(n). These examples also indicate that we can focus on the fastest growing term when analyzing the asymptotic growth of a function. In other words, n^2 + 10n + 100√n essentially behaves like n^2 for large n. We can capture this intuition formally by observing that n^2 < n^2 + 10n + 100√n < 2n^2 for n > 29. Together with our earlier observation that we can ignore constant factors, this indicates that n^2 “grows similarly fast as” n^2 + 10n + 100√n.
We now introduce notation that formally captures this intuition. We begin by formalizing the set of
functions that “grow no faster than” a function g .
Definition 1.4. The set of functions that asymptotically grow no faster than g : N → R≥0 is defined as
O(g) := { f : N → R≥0 | ∃c > 0, n_0 ∈ N ∀n ≥ n_0 : f(n) ≤ c · g(n) }.
Example. As intended, we have n^2 + 10n + 100√n ∈ O(n^2), e.g., by choosing c = 2 and n_0 = 29.
We can now easily extend this notation to other asymptotic relationships between functions.
Definition 1.5. Let g : N → R≥0. We define the set of functions that asymptotically grow
(i) at least as fast as g: Ω(g) := { f : N → R≥0 | g ∈ O(f) },
(ii) exactly as fast as g: Θ(g) := O(g) ∩ Ω(g),
(iii) strictly slower than g: o(g) := { f : N → R≥0 | ∀c > 0 ∃n_0 ∈ N ∀n ≥ n_0 : f(n) ≤ c · g(n) },
(iv) strictly faster than g: ω(g) := { f : N → R≥0 | g ∈ o(f) }.
The symbols O, Ω, Θ, o, ω are called (Bachmann-)Landau symbols after their creators, or (big-)O notation.
Remark 1.6. It is often convenient to use Landau symbols within arithmetic expressions and derivations,
e.g.,
n^2 + 4n = n^2 + O(n) = n^2 + o(n^2).
In this context, we interpret operations and relations element-wise, i.e., f + Ξ(g) := { f + h | h ∈ Ξ(g)}, and we write f = Ξ(g) for f ∈ Ξ(g) and Ξ(g) = Ξ′(h) for Ξ(g) ⊆ Ξ′(h), where Ξ, Ξ′ denote arbitrary (expressions containing) Landau symbols.
Proposition 1.7. Let f , g, h: N → R≥0 . The Landau symbols Ξ ∈ {O, Ω, Θ, o, ω} have the following
properties:
(iii) For all constants a, b > 1, we have Ξ(log_a n) = Ξ(log_b n), which justifies the notation Ξ(log n).
(v) For all constants 0 < a < b and 1 < α < β, we have n^a = o(n^b) and α^n = o(β^n).
Equipped with this notation, we can now formally state our result for the celebrity problem. Recall
that every algorithm needs at least 2(n − 1) = Ω(n) steps, and our algorithm FINDCELEBRITY only takes
3(n − 1) = O(n) steps. This means that we have found a tight bound, which we can express using
Θ-notation:
Theorem 1.8. The best possible running time for solving the celebrity problem is Θ(n).
1.4 Example 2: Matrix Multiplication
Similarly to our initial example of multiplying two numbers, we can ask how to efficiently compute a
matrix product C = A · B of two n × n matrices A and B with n = 2^k for k ∈ N. Note that we can add rows and columns of 0's to bring A and B into this form, increasing both dimensions by a factor of at most 2.
Of course, we can explicitly compute C = (c_{ij}) via
c_{ij} = Σ_{ℓ=1}^{n} a_{iℓ} · b_{ℓj},
which takes n multiplications and n − 1 additions, i.e., Θ(n) operations, per entry, for a total of Θ(n^3)
operations. Can we do better?
Consider the following recursive idea: We subdivide the matrices into four parts
A = (A_{1,1} A_{1,2} ; A_{2,1} A_{2,2}),   B = (B_{1,1} B_{1,2} ; B_{2,1} B_{2,2}),   C = (C_{1,1} C_{1,2} ; C_{2,1} C_{2,2}),
each of dimension n/2 × n/2, and compute C_{i,j} = A_{i,1} · B_{1,j} + A_{i,2} · B_{2,j} for i, j ∈ {1, 2}.
This takes 8 multiplications of matrices of roughly half dimension. If f(n) denotes the number of elementary multiplications we need for matrices of dimension n, we get
f(n) = 8 · f(n/2) for n > 1 and f(1) = 1, i.e., f(n) = 8^{log_2 n} = n^3,
where we used that two 1 × 1 matrices can be multiplied with a single elementary multiplication. This means that we have not improved our running time yet.
However, we can save one matrix multiplication by computing the auxiliary matrices
M_1 = (A_{1,1} + A_{2,2}) · (B_{1,1} + B_{2,2}),
M_2 = (A_{2,1} + A_{2,2}) · B_{1,1},
M_3 = A_{1,1} · (B_{1,2} − B_{2,2}),
M_4 = A_{2,2} · (B_{2,1} − B_{1,1}),
M_5 = (A_{1,1} + A_{1,2}) · B_{2,2},
M_6 = (A_{2,1} − A_{1,1}) · (B_{1,1} + B_{1,2}),
M_7 = (A_{1,2} − A_{2,2}) · (B_{2,1} + B_{2,2}),
and then
C_{1,1} = M_1 + M_4 − M_5 + M_7,
C_{1,2} = M_3 + M_5,
C_{2,1} = M_2 + M_4,
C_{2,2} = M_1 − M_2 + M_3 + M_6.
We can easily see that the number of multiplications dominates our running time, by observing that the number of matrix additions in each recursive level is bounded by a constant times the number of matrix multiplications. We get the following result.
Theorem 1.9. Two n × n matrices can be multiplied in time O(n^{log_2 7}) ⊆ O(n^{2.81}).
Remark 1.10. The above algorithm is called Strassen's algorithm. The best known matrix multiplication algorithm has running time O(n^{2.373}) and was found by François Le Gall in 2014.
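For illustration, the recursion can be written down directly. The following Python sketch (matrices as nested lists, n a power of two; the helper names block, add and sub are our own) computes the seven products M_1, . . . , M_7 and recombines them as above.

def strassen(A, B):
    """Multiply two n x n matrices (n a power of two) using 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def block(M, i, j):  # extract the h x h block at block position (i, j)
        return [row[j * h:(j + 1) * h] for row in M[i * h:(i + 1) * h]]

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = block(A, 0, 0), block(A, 0, 1), block(A, 1, 0), block(A, 1, 1)
    B11, B12, B21, B22 = block(B, 0, 0), block(B, 0, 1), block(B, 1, 0), block(B, 1, 1)

    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))

    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    # Reassemble the four blocks into one matrix.
    return [r1 + r2 for r1, r2 in zip(C11, C12)] + [r1 + r2 for r1, r2 in zip(C21, C22)]

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # -> [[19, 22], [43, 50]]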
2 Searching and Sorting
In this chapter we consider the problem of sorting a collection of elements and of finding a specific
element in an already sorted collection. To that end, suppose that we are given as input a list of elements
L = (L_1, . . . , L_n) ∈ X^n over a ground set X, together with a total order on the elements of X. We assume
that comparing two elements is an elementary operation and we use n as a measure for the input size.
We first consider the search problem, where we want to determine whether element x occurs in the list
L = (L1 , . . . , L n ) and, if yes, find an index i ∈ {1, . . . , n} with L i = x . Of course, we can always solve this
problem simply by explicitly comparing each element of L with x .
Algorithm: LINEARSEARCH(L, x)
input: list L = (L1 , . . . , L n ), element x ∈ X
output: index min({i ∈ {1, . . . , n} | L_i = x} ∪ {∞})
for i ← 1, . . . , n :
if L i = x :
return i
return ∞
In general, this algorithm obviously has linear running time Θ(n) in the worst-case, and we cannot hope
to do better.
However, if we know that L is sorted, i.e., L1 ≤ L2 ≤ · · · ≤ L n , we can improve on this significantly. The
idea is simple: If we compare first to the element (roughly) in the middle of the list, we can discard half
of the list, depending on the result of the comparison. We can then repeat the same idea recursively on
the remaining half. This algorithm is commonly called binary search.
The following pseudocode shows an iterative implementation of binary search. Note that, in case x does
not occur in L , the algorithm below outputs the index corresponding to the position at which x would
need to be inserted in order to maintain a sorted list. This convention will be useful later.
Algorithm: BINARYSEARCH(L, x)
input: sorted list L = (L1 , . . . , L n ), element x ∈ X
output: index min({i ∈ {1, . . . , n} | L_i ≥ x} ∪ {n + 1})
l, r ← 1, n
while l ≤ r :
m ← ⌊(l + r)/2⌋
if L m < x :
l ← m+1
else
r ← m−1
return l
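For illustration, here is a Python version of the iterative pseudocode with 0-based indices; the standard library function bisect.bisect_left computes the same index.

def binary_search(L, x):
    """Return the smallest 0-based index i with L[i] >= x, or len(L) if no such index exists."""
    lo, hi = 0, len(L) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if L[mid] < x:          # x must lie to the right of mid
            lo = mid + 1
        else:                   # L[mid] >= x, so the answer is mid or to its left
            hi = mid - 1
    return lo

data = [2, 3, 5, 7, 11]
print(binary_search(data, 5))   # -> 2 (data[2] == 5)
print(binary_search(data, 6))   # -> 3 (6 would be inserted at index 3)
print(binary_search(data, 13))  # -> 5 (len(data), i.e. "n + 1" in 1-based terms)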
Theorem 2.1. BINARYSEARCH solves the search problem for sorted lists in time Θ(log n).
Proof. To show correctness, let i* := min({i ∈ {1, . . . , n} | L_i ≥ x} ∪ {n + 1}) be the index we need to find. We claim that the following invariant holds throughout the course of the algorithm: i* ∈ {l, . . . , r + 1}. This is true initially, and it is easy to see that it remains true whenever l or r are updated (since L is sorted). The algorithm terminates, since at least the index m is removed from the set {l, . . . , r + 1} in each step. In the last iteration, either r = m and l is set to l = m + 1, or l = m and r is set to r = m − 1, i.e., l = r + 1 in either case. By our invariant, we know that at that point i* ∈ {l}, hence the algorithm returns the correct result.
The claimed running time follows from the fact that the number of iterations of the loop lies between ⌊log_2 n⌋ and ⌊log_2 n⌋ + 1 and all other operations are elementary.
Note that binary search can also be implemented elegantly using recursion:
Algorithm: BINARYSEARCH’(L, x)
input: sorted list L = (L1 , . . . , L n ), element x ∈ X
output: index min({i ∈ {1, . . . , n} | L_i ≥ x} ∪ {n + 1})
if n = 0 :
return 1
else
m ← ⌊(n + 1)/2⌋
if L m < x :
return m + BINARYSEARCH’(L m+1,...,n , x)
else
return BINARYSEARCH’(L1,...,m−1 , x)
In the previous section we have seen why it can be useful to sort collections of data. Suppose, again,
that we are given this data in the form of a list L = (L1 , . . . , L n ). How can we sort L efficiently?
Sorting Problem
input: list L = (L1 , . . . , L n ) ∈ X n with X totally ordered
problem: reorder L such that L i ≤ L j for all i ≤ j
First observe that L is sorted if and only if L i ≤ L i+1 for all i ∈ {1, . . . , n−1}, i.e., if all pairs of consecutive
elements are in the correct relative order. This observation suggests the following naive algorithm: Go
over the list repeatedly from front to back and swap pairs of consecutive elements that are in the wrong
relative order until no such pairs are found anymore. This algorithm is called bubble sort, since it makes
elements “bubble” towards the end of the list (cf. Figure 2.1). In particular, after the i -th pass over the
list, the last i elements have reached their correct positions. This means that we need at most n − 1
passes over the list and we may stop each pass one element sooner than the previous one. This allows
us to slightly improve the algorithm:
Algorithm: BUBBLESORT(L)
input: list L = (L1 , . . . , L n )
output: sorted version of L
for i ← 1, . . . , n − 1 :
for j ← 1, . . . , n − i :
if L j > L j+1 :
(L j+1 , L j ) ← (L j , L j+1 )
return L
Theorem 2.2. BUBBLESORT solves the sorting problem in time Θ(n^2).
Proof. Correctness follows from the fact that, after the i-th iteration of the outer loop, element L_{n−i+1} is at the correct position.
We can obtain a lower bound on the running time by only considering the first ⌊n/2⌋ iterations of the outer loop and, for each such iteration, only the first ⌊n/2⌋ iterations of the inner loop. This gives a total of ⌊n/2⌋^2 = Ω(n^2) iterations and all other operations are elementary. Similarly, we obtain an upper bound by pretending that both loops always run n times. Even with this generous estimation, we have a total of n^2 = O(n^2) elementary operations.
Since the actual number of elementary operations must lie between both bounds, we obtain the claimed
running time.
We observed that in each iteration of BUBBLESORT one more element is guaranteed to have reached its
final position in the list. Since this property of the algorithm is sufficient to sort the list in n−1 iterations,
we can further improve the algorithm by concentrating on finding and repositioning the i -th element (in
sorted order) in iteration i , while forgoing any other swapping operations. The resulting algorithm is
called selection sort, since we select one item in each iteration (cf. Figure 2.1):
Algorithm: SELECTIONSORT(L)
input: list L = (L1 , . . . , L n )
output: sorted version of L
for i ← 1, . . . , n − 1 :
j ← arg min_{ℓ∈{i,...,n}} L_ℓ
(L i , L j ) ← (L j , L i )
return L
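A minimal Python sketch of SELECTIONSORT (0-based indices):

def selection_sort(L):
    """Sort L in place by selecting the minimum of the unsorted suffix in each round."""
    n = len(L)
    for i in range(n - 1):
        j = min(range(i, n), key=lambda k: L[k])  # index of the smallest remaining item
        L[i], L[j] = L[j], L[i]                   # a single move (swap) per iteration
    return L

print(selection_sort([8, 3, 5, 2, 6, 4, 7, 1]))  # -> [1, 2, 3, 4, 5, 6, 7, 8]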
While selection sort only performs a linear number of element moves, it still needs a quadratic number
of comparisons.
Theorem 2.3. SELECTIONSORT solves the sorting problem in time Θ(n^2).
Proof. By definition of SELECTIONSORT, the first i items are in their correct positions at the end of iteration i. This implies that the list is correctly sorted upon termination of the algorithm. Since it takes linear
time in the worst case to locate the smallest item in a list, an analogous estimation as for BUBBLESORT
still yields a quadratic running time.
Figure 2.1: Illustration of the basic sorting algorithms. The grey sublists are guaranteed to be sorted; the
elements changing position in each iteration are marked in red.
Yet another natural sorting algorithm that we often use when playing card games is insertion sort. The
idea of this algorithm is to add elements one at a time to a presorted sublist. We start with a sublist
containing the first element and insert each successive element at its correct relative position to keep the
sublist sorted (cf. Figure 2.1). We already know a good algorithm to find this correct relative position,
namely BINARYSEARCH. This reduces the number of element comparisons to O(n log n). However, in order to make room for the element inserted in each iteration, we may need to shift the positions of all other elements in the sorted sublist, so that we still get a quadratic number of moves in the worst case.
Algorithm: INSERTIONSORT(L)
input: list L = (L1 , . . . , L n )
output: sorted version of L
for i ← 2, . . . , n :
j ← BINARYSEARCH(L_{1,...,i−1}, L_i)
(L_j, L_{j+1}, . . . , L_i) ← (L_i, L_j, . . . , L_{i−1})
return L
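A Python sketch of INSERTIONSORT that uses the standard bisect module in place of BINARYSEARCH (0-based indices):

from bisect import bisect_left

def insertion_sort(L):
    """Sort L in place; each element is placed via binary search on the sorted prefix."""
    for i in range(1, len(L)):
        x = L[i]
        j = bisect_left(L, x, 0, i)   # insertion point in L[0:i], O(log i) comparisons
        L[j + 1:i + 1] = L[j:i]       # shifting still costs up to i element moves
        L[j] = x
    return L

print(insertion_sort([8, 3, 5, 2, 6, 4, 7, 1]))  # -> [1, 2, 3, 4, 5, 6, 7, 8]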
Theorem 2.4. INSERTIONSORT solves the sorting problem in time Θ(n^2).
Proof. Again, INSERTIONSORT maintains the invariant that the first i items are in their correct order at the end of iteration i, which implies correctness of the algorithm.
It is obvious that INSERTIONSORT has running time O(n^2), since the time needed for binary search on a list of at most n − 1 numbers is O(log n) (by Theorem 2.1), and the time to relocate at most n numbers is O(n). To see the lower bound, consider a list initially in decreasing order and consider only the iterations for i ≥ ⌊n/2⌋. In each of these ⌈n/2⌉ iterations the time needed for BINARYSEARCH is Ω(log⌊n/2⌋) = Ω(log n) (by Theorem 2.1) and the time to relocate exactly i ≥ ⌊n/2⌋ elements is Ω(⌊n/2⌋) = Ω(n). This means that the time needed in the worst case is at least ⌈n/2⌉ · (Ω(log n) + Ω(n)) = Ω(n^2). Overall, the (worst-case) running time thus is Θ(n^2), as claimed.
2.3 Mergesort
There are several sorting algorithms that asymptotically have a better running time than Θ(n2 ). We will
see a simple recursive idea that uses the paradigm of Divide and Conquer. The name of this paradigm
expresses the general idea of splitting problem instances into two (roughly) equal parts, solving each of
the parts separately, and then combining results. The solution to the following recurrence illustrates why
this approach is promising to achieve subquadratic running times, provided that combining results takes
linear time.
Proposition 2.5. Let f : N → N with f(1) = 1 and
f(n) = f(⌈n/2⌉) + f(⌊n/2⌋) + n for n > 1.    (2.1)
Then, f(n) = Θ(n log n).
Proof. First, consider the case that n = 2^k for some k ∈ N. By telescoping, we obtain
f(n) = 2 · f(n/2) + n = 2 · 2 · f(n/4) + n + n = · · · = 2^{log_2 n} + n · log_2 n = n · (1 + log_2 n).    (2.2)
We can prove this by induction: Equation (2.2) trivially holds for n = 1, since f(1) = 1 · (1 + 0) = 1. Inductively, if (2.2) holds for n/2, we obtain
f(n) = 2 · f(n/2) + n = 2 · (n/2) · (1 + log_2(n/2)) + n = n · (1 + log_2 n),
which proves (2.2).
Now, consider the remaining case that 2^{k−1} < n < 2^k for k = ⌈log_2 n⌉. By (2.1), f is monotone and we have
f(n) < f(2^k) = 2^k · (1 + k) = O(n log n),
f(n) > f(2^{k−1}) = 2^{k−1} · k = Ω(n log n),
which implies f(n) = Θ(n log n).
With this motivation, merge sort is the most natural divide and conquer algorithm: We simply sort both
halves of the list recursively, and then combine results in linear time (cf. Figure 2.2).
Algorithm: MERGESORT(L)
input: list L = (L1 , . . . , L n )
output: sorted version of L
if n > 1 :
A ← MERGESORT(L_{1,...,⌊n/2⌋})
B ← MERGESORT(L_{⌊n/2⌋+1,...,n})
for i ← 1, . . . , n :
if A = ∅ or (A ≠ ∅ and B ≠ ∅ and B_1 < A_1) :
rename A, B to B, A
L_i ← A_1 ; A ← A_{2,...,|A|}
return L
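The same divide-and-conquer idea in Python; this sketch returns a new sorted list instead of renaming A and B as in the pseudocode.

def merge_sort(L):
    """Return a sorted copy of L."""
    n = len(L)
    if n <= 1:
        return list(L)
    A = merge_sort(L[:n // 2])
    B = merge_sort(L[n // 2:])
    merged, i, j = [], 0, 0
    while i < len(A) or j < len(B):   # merge the two sorted halves in linear time
        if j == len(B) or (i < len(A) and A[i] <= B[j]):
            merged.append(A[i]); i += 1
        else:
            merged.append(B[j]); j += 1
    return merged

print(merge_sort([8, 3, 5, 2, 6, 4, 7, 1]))  # -> [1, 2, 3, 4, 5, 6, 7, 8]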
Figure 2.2: Illustration of MERGESORT. The grey sublists are guaranteed to be sorted. The number of
recursive levels to sort 8 numbers is 3 = log2 8, as expected.
Theorem 2.6. MERGESORT solves the sorting problem in time Θ(n log n).
Proof. Correctness of MERGESORT for lists of length n follows inductively if we assume correctness for
lists of length up to dn/2e.
To bound the running time, we need to bound the total number of iterations of the loop over all levels of the recursion for n > 1 and the number of times that MERGESORT is invoked for n = 1. This number is given by equation (2.1), and thus an application of Proposition 2.5 concludes the proof.
3 Basic Graph Theory
Many algorithmic problems arise on structures formed by entities and their relationships, e.g., social
and data networks, road maps and other infrastructures, processes and dependencies. We introduce the
abstract notion of a graph to formally capture the underlying structure of these settings, independently
of problem specific features.
Definition 3.1. A graph is a pair G = (V, E) of a set of vertices V and a set of edges E with either E ⊆ { {u, v} | u, v ∈ V, u ≠ v } (undirected graph) or E ⊆ { (u, v) | u, v ∈ V, u ≠ v } (directed graph).
The edges of a directed graph are called arcs. In order to unify the following definitions, we write [u, v] ≡ {u, v} for undirected and [u, v] ≡ (u, v) for directed graphs.
Remark 3.2. Our definition of graphs demands that there is at most one edge between every pair of vertices, and that there cannot be an edge between a vertex and itself. This significantly simplifies our notation. Relaxing these requirements leads to what are usually called multigraphs and graphs with self-loops in the literature.
We can visualize graphs very naturally by drawing them in the plane, such that vertices correspond to
points in R2 and edges are lines or arrows between the corresponding points. We may label vertices or
edges with additional information associated with the graph. For example, for V = {v1 , . . . , v10 } and
E1 = {{v2 , v1 }, {v1 , v3 }, {v4 , v1 }, {v2 , v3 }, {v4 , v2 }, {v3 , v4 }, {v7 , v6 }, {v8 , v6 }, {v8 , v9 }, {v9 , v6 }, {v8 , v10 }},
E2 = {(v2 , v1 ), (v1 , v3 ), (v4 , v1 ), (v2 , v3 ), (v4 , v2 ), (v3 , v4 ), (v7 , v6 ), (v8 , v6 ), (v9 , v6 ), (v8 , v10 ), (v9 , v8 ), (v10 , v8 )},
the undirected graph G1 = (V, E1 ) and the directed graph G2 = (V, E2 ) can be drawn as follows:
(Drawings of the undirected graph G_1 and the directed graph G_2 on the vertices v_1, . . . , v_{10} omitted.)
Note that such a drawing is not unique, and not all graphs can be embedded in the plane without edges
crossing. Graphs for which this is possible are called planar.
For convenience, we often identify graphs with their set of vertices or edges, in particular in conjunction
with set notation. In particular, for a graph G = (V, E), we write G ± v as a shorthand for (V ± {v }, E)
and G ± e for (V, E ± {e}). In addition, we treat tuples (in particular, arcs) like sets of size two when
applying set operations. The following definition provides two different notions of subgraphs.
Figure 3.1: Illustration of Definition 3.4 for an undirected (left) and a directed graph (right). Vertex labels
specify degrees. The neighborhood of the highlighted vertex is indicated in blue. Edges of the
dashed cut induced by the red vertices are highlighted.
We introduce further notation regarding the relationships between (sets of) vertices and edges in a graph
(cf. Figure 3.1).
• The set of edges incident to u ∈ V is δ_G(u) := { [u, v] ∈ E | v ∈ V }, and the set of edges incident to U ⊆ V is δ_G(U) := { [u, v] ∈ E | u ∈ U, v ∈ V \ U }. For directed graphs, we write δ_G^+(U) := δ_G(U) and δ_G^−(U) := δ_G(V \ U) to distinguish outgoing and incoming arcs.
• A cut is a set of edges S ⊆ E for which there exists ∅ ≠ U ⊂ V such that S = δ_G(U). We say that U induces S.
In the above notations, we omit the index whenever the graph is clear from the context.
One of the central structural features of a graph is encoded in the way that local relationships extend
into global connections between vertices. These connections are captured by the following definition
(cf. Figure 3.2).
Figure 3.2: Illustration of connectivity in undirected (left) and directed (right) graphs. Vertex colors indi-
cate connected components. Dotted edges form a closed walk, a path, a cycle, and a walk
(from left to right).
Definition 3.5. A walk from v_0 to v_ℓ in a graph G = (V, E) is an alternating sequence
W = (v_0, e_1, v_1, e_2, v_2, . . . , e_ℓ, v_ℓ)
with e_i = [v_{i−1}, v_i] ∈ E for i ∈ {1, . . . , ℓ}. We often identify a walk with the sequence of its edges or vertices, or with the subgraph (∪_{i=0}^{ℓ} {v_i}, ∪_{i=1}^{ℓ} {e_i}). We say
• W has length |W| := ℓ,
• W is closed if v_0 = v_ℓ,
• W is a path if no vertex appears more than once in it,
• W is a cycle if it is closed, has length ℓ ≥ 1, and no vertex or edge appears more than once in (e_1, v_1, e_2, . . . , e_ℓ, v_ℓ). We call G acyclic if it does not contain any cycles.
We note the following important relationship between walks and cuts in a graph.
Observation 3.6. Let G = (V, E) be a graph and let U ⊂ V . Then, every walk from u ∈ U to v ∈ V \ U
in G contains an edge of δ(U).
We capture the intuitive fact that a walk is nothing else than a path with detours. Note that, in directed
graphs, closed walks of length two are cycles as well, but this is not the case in undirected graphs.
Definition 3.7. For two walks W = (u_0, . . . , u_ℓ), W′ = (v_0, . . . , v_{ℓ′}) with u_ℓ = v_0, we write W ⊕ W′ := (u_0, . . . , u_ℓ = v_0, . . . , v_{ℓ′}). If C is a closed walk starting (and ending) at u_ℓ = v_0, we say that W ⊕ C ⊕ W′ is a union of (W ⊕ W′) and C.
Proposition 3.8. Let W be a walk from u ∈ V to v ∈ V in a graph G = (V, E). Then, W is a union of a u-v-path and a set of cycles and of closed walks of length two.
Proof. We proceed by induction over the length ℓ of the walk W. For ℓ ≤ 1, the claim holds since W must be a path. Now consider a walk W = (v_0, e_1, . . . , e_ℓ, v_ℓ) with ℓ ≥ 2 and assume that the claim holds for walks of length at most ℓ − 1. If W is a path, a cycle, or a closed walk of length two, the claim trivially holds. Otherwise, we must have ℓ > 2 and we can choose i ∈ {0, . . . , ℓ−1} and j ∈ {i+1, . . . , ℓ} such that v_i = v_j and (i, j) ≠ (0, ℓ). By induction, (v_i, e_{i+1}, . . . , v_j = v_i) must be the union of the (trivial) v_i-v_j-path (v_i) and a set C of cycles and closed walks of length two. Also, the walk (v_0, e_1, . . . , v_i, e_{j+1}, . . . , v_ℓ) must be the union of a u-v-path p and a set C′ of cycles and closed walks of length two. But then, W is the union of the u-v-path p and the set C ∪ C′ of cycles and closed walks of length two.
Using the notion of walks, we can make the global connectivity of vertices more precise (cf. Figure 3.2).
Definition 3.9. A vertex v ∈ V is reachable from a vertex u ∈ V in a graph G = (V, E) if there exists a
path from u to v . The graph G is (strongly) connected if every vertex v ∈ V is reachable from every other
vertex u ∈ V . The connected component of u ∈ V is the inclusion maximal set Cu ⊆ V such that u ∈ Cu
and G[Cu ] is strongly connected, i.e., Cu := { v ∈ V | u, v are reachable from each other}.
Observation 3.10. Every connected component C of an undirected graph induces an empty cut δ(C) = ∅.
This is not true for directed graphs.
We often encounter graphs that have all possible edges, which justifies the following terminology.
Definition 3.11. The complement of a graph G = (V, E) is the graph Ḡ = (V, Ē) with Ē = { [u, v] ∉ E | u, v ∈ V }. We say that G is complete if it is the complement of the empty graph, i.e., if Ḡ = (V, ∅).
We now turn to our first basic graph algorithms. In order to efficiently operate on the relational structure
defined by a graph, it makes sense to assume that we can efficiently enumerate the edges incident to a
given vertex. More specifically, we assume that accessing all deg(v ) edges incident to a vertex v requires
deg(v ) elementary operations.
This assumption allows us to efficiently traverse a graph, i.e., to systematically visit all of its vertices
by using edge-traversals. Graph traversals lie at the heart of many algorithms operating on graphs. For
example, the problems of deciding whether a graph is connected or of finding a specific vertex of a graph
(e.g., the exit of a maze) can be reduced to traversing the graph systematically.
There are two obvious approaches for graph traversal: depth-first search and breadth-first search. Depth-
first search (DFS) pursues a single path for as long as possible and only backtracks when there are no
more unvisited vertices in the neighborhood. Intuitively, this is the algorithm by which most people
would explore a maze if we allow them to leave markings on the walls. Breadth-first search (BFS) visits
a vertex closest to the starting point in each step. Intuitively, this is the algorithm most people would use
when exploring a maze as a team. Figure 3.3 gives an example for both algorithms.
Observe that the following formulations of both algorithms are extremely similar, and only differ in
their usage of the lists of vertices S and Q whose edges have not been explored yet. In DFS we always
extract the vertex from S that has been inserted most recently, while in BFS we extract the vertex that
has been inserted the earliest. We assume that both these operations are elementary. A data structure
that efficiently supports the last-in-first-out (LIFO) scheme of DFS is called a stack, a data structure that
supports first-in-first-out (FIFO) scheme of BFS is called a queue.
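Since the two traversals only differ in how the container of pending vertices is used, both can be sketched in Python with a single function. The dictionary-based graph representation and the bookkeeping details below are assumptions of this sketch.

from collections import deque

def traverse(G, r, mode="bfs"):
    """Visit all vertices reachable from r in G (dict: vertex -> list of neighbors).

    Returns the parents and depths computed by the traversal; `mode` decides whether
    the container is used as a stack (LIFO, DFS) or as a queue (FIFO, BFS).
    """
    parent, depth = {r: r}, {r: 0}
    container = deque([r])                 # plays the role of S (DFS) or Q (BFS)
    while container:
        u = container.pop() if mode == "dfs" else container.popleft()
        for v in G[u]:
            if v not in parent:            # v has not been visited yet
                parent[v], depth[v] = u, depth[u] + 1
                container.append(v)        # mark as visited upon insertion
    return parent, depth

G = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(traverse(G, 0, mode="bfs")[1])  # depths from vertex 0: {0: 0, 1: 1, 2: 1, 3: 2}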
Figure 3.3: Execution of DFS (left) and BFS (right) on an undirected graph. Numbers and colors specify
the order in which vertices are visited; arrows indicate parents. Note that the outcome of the
algorithms depends on the ordering of the neighborhoods.
Lemma 3.12. DFS(G, r) and BFS(G, r) visit every vertex of G reachable from r exactly once.
Proof. First observe that every vertex is visited at most once, since only unvisited vertices are ever in-
serted into S or Q and they are immediately marked as visited after being inserted.
It remains to show that, for every ℓ ∈ N, every vertex of every r-v-path p of length ℓ in G is eventually visited. We show this by induction over ℓ. For ℓ = 0, the only possible path is p = (r), and the statement is trivial. Now assume the statement holds for paths of length ℓ − 1 and consider any path p = (v_0 = r, . . . , v_{ℓ−1}, v_ℓ) in G. By induction, v_{ℓ−1} is eventually visited, which means that it gets added to S or Q. Since the algorithm only terminates once S or Q is empty, v_{ℓ−1} will be extracted from S or Q at some point. Since v_ℓ ∈ Γ(v_{ℓ−1}), either v_ℓ is already visited at this point, or it will be visited.
In addition, we establish the meaning of the parents and depths computed by DFS and BFS.
Proposition 3.13. Let (p_v)_{v∈V} and (d_v)_{v∈V} be the parents and depths computed by DFS(G, r) or BFS(G, r). Then, for every vertex v reachable from r in G, the sequence (v_0 = r, v_1, . . . , v_{d_v} = v) with v_{i−1} = p_{v_i} for i ∈ {1, . . . , d_v} is an r-v-path.
Proof. We prove the statement for all vertices v reachable from r by induction over d_v. For d_v = 0 we have v = r and the statement is trivial. Now consider a vertex v ∈ V with d_v ≥ 1 and assume that the statement holds for all v′ ∈ V with d_{v′} < d_v. By Lemma 3.12, every vertex reachable from r is visited exactly once, hence the parents and depths do not change once they are set by DFS or BFS. Since d_v is set to d_{p_v} + 1, we thus know that d_{p_v} < d_v and hence, by induction, (v_0 = r, . . . , v_{d_v−1} = p_v) with v_{i−1} = p_{v_i} for i ∈ {1, . . . , d_v − 1} is an r-p_v-path. In addition, every vertex v_i of this path has d_{v_i} = i by induction, which means that the path cannot contain v. But then (v_0, . . . , v_{d_v−1}, v_{d_v} = v) with v_{i−1} = p_{v_i} for i ∈ {1, . . . , d_v} is a path, as claimed.
Before we can bound the running times of DFS and BFS, we have to specify a definition of the input
size and what we consider to be elementary operations. Obviously, the number of objects involved in
the description of a graph G = (V, E) is Θ(n + m), where n := |V | and m := |E|. In many applications it
makes sense to account for the contributions of n and m to the running time separately, since the costs of
visiting vertices and traversing edges may differ significantly (e.g., crossing an intersection vs. traveling
along a segment of a highway). From now on, whenever we are dealing with an algorithmic problem
that has a graph as its input, we will therefore follow the convention that running times are expressed
as functions of n and m. Throughout, we will use n and m to denote the number of vertices and edges of
the input graph, respectively. In terms of elementary operations, we assume that, given u ∈ V , we have
direct access to the sets δ(u) and Γ (u), i.e., in particular, we have access to deg(u) and can iterate over
all e ∈ δ(u) and v ∈ Γ (u) in time Θ(deg(u)).
It is now easy to see that both DFS and BFS can be used to solve the connectivity problem on undirected
graphs as efficiently as possible.
Proposition 3.14. DFS or BFS can be used to determine whether an undirected graph is connected in
time Θ(n + m).
Proof. To solve the connectivity problem for a given undirected graph G = (V, E), we simply invoke DFS
or BFS on any vertex v ∈ V and check afterwards whether all vertices have been visited. This algorithm
is correct by Lemma 3.12, since every reachable vertex is visited. The lemma also guarantees that every vertex is visited at most once, which means that the inner loop iterates at most Σ_{v∈V} deg(v) = 2m times
in total. Overall, the running time is bounded by O(n + m). To prove the matching lower bound, observe
that we may need to inspect all vertices and edges in order to conclude that a graph is not connected.
The algorithm described in the proof of Proposition 3.14 can be repeated on unvisited vertices in or-
der to visit all vertices of the graph. The corresponding algorithms are called DFS-TRAVERSAL and
BFS-TRAVERSAL. For example, these algorithms allow us to determine all connected components of an
undirected graph, if we keep track of the sets of vertices that are visited in each invocation of DFS/BFS.
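For example, the following Python sketch determines all connected components of an undirected graph by repeated BFS from unvisited vertices, in the spirit of BFS-TRAVERSAL (the dictionary-based graph representation is again our own convention):

from collections import deque

def connected_components(G):
    """Return the connected components of an undirected graph G (dict: vertex -> neighbors)."""
    seen, components = set(), []
    for r in G:
        if r in seen:
            continue
        seen.add(r)
        component, queue = [], deque([r])
        while queue:                      # one BFS per previously unvisited vertex
            u = queue.popleft()
            component.append(u)
            for v in G[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(component)
    return components

G = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
print(connected_components(G))  # -> [[0, 1], [2, 3], [4]]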
Observe that Proposition 3.14 does not hold for directed graphs, and neither does its application to determining all strongly connected components. However, it is still true that DFS-TRAVERSAL and BFS-TRAVERSAL systematically visit all vertices, even in directed graphs.
Figure 3.4: Execution of DFS0 on the example of Figure 3.3. Again, numbers and colors specify the order in
which vertices are visited, arrows indicate parents, and the outcome depends on the ordering
of the neighborhoods.
Theorem 3.15. DFS-TRAVERSAL and BFS-TRAVERSAL visit every vertex exactly once and have running time
Θ(n + m).
Proof. The statement follows from the fact that vertices are inserted exactly once into S or Q and that Σ_{v∈V} deg(v) ≤ 2m (m for directed graphs), as in the proofs of Lemma 3.12 and Proposition 3.14.
Corollary 3.16. Whether an undirected graph contains a cycle can be decided in time Θ(n + m).
Proof. We simply execute either algorithm and check whether we ever encounter a vertex v ∈ Γ(u) \ {p_u} that is already visited during DFS or BFS. The running time follows from Theorem 3.15.
So what are the advantages and disadvantages of DFS and BFS, and when should we prefer one over the other? Both algorithms have many applications where the specific order in which they visit vertices is important (see Theorems 3.18 and 3.25 below). In terms of pure graph traversal, DFS has the advantage that it can be implemented very elegantly using recursion (note that the traversal order of this recursive implementation DFS′ differs from the iterative implementation above, see Figure 3.4):
Algorithm: DFS′(G, u)
input: graph G = (V, E), vertex u ∈ V
visit u
for each v ∈ Γ(u) :
if v not visited :
p_v ← u; d_v ← d_u + 1
DFS′(G, v)
In addition, DFS often profits from “locality”, i.e., the fact that vertices that are visited consecutively
tend to be close to each other. This not only is an advantage when implementing DFS in a physical
system where we actually have to travel through a graph-like environment, but also for implementations
on computing systems that benefit from memory locality, i.e., architectures with a memory hierarchy
(e.g., hard drive, main memory, caches, registers) where memory is transferred block-wise from level to
level. On the other hand, BFS can be parallelized more easily. This is important in physical implementa-
tions where we have multiple agents traversing the environment, or for implementations on multi-core
systems.
A crucial feature of the order in which BFS visits vertices lies in the fact that it prefers short paths and
that each vertex gets its smallest possible depth assigned to it. To make this precise, we introduce the
following notion.
Definition 3.17. The distance d(u, v ) ∈ N ∪ {∞} between vertices u and v of a graph G is the shortest
length among all u-v -paths in G , or ∞ if no such path exists. An (unweighted) shortest path in G is a
u-v -path P with |P| = d(u, v ).
With this, we can show that BFS computes shortest paths to all reachable vertices. Observe that this is
not true for DFS.
Theorem 3.18. BFS(G, r) computes shortest paths from r to all vertices v reachable from r in G in
time Θ(m).
Proof. Using Proposition 3.13, it remains to show that the depths computed by BFS satisfy dv = d(r, v )
for all vertices v reachable from r in G . We show this claim by induction on d(r, v ), and, additionally,
that v is added to Q before any vertex v 0 with d(r, v 0 ) > d(r, v ). The case d(r, v ) = 0 is trivial, since it
implies v = r . Now consider v ∈ V with d(r, v ) ≥ 1 and assume that both claims hold for all vertices
of distance at most d(r, v ) − 1 from r . By Lemma 3.12, every vertex reachable from r is visited exactly
once, hence the parents and depths do not change once they are set by BFS. Let (r, . . . , u, v ) be any
r -v -path of length d(r, v ) in G . Then, d(r, u) ≤ d(r, v ) − 1 (even equality holds) and, by induction,
du = d(r, u) < d(r, v ). Consider the iteration of the outer loop in which pv is extracted from Q.
At this point, either pv = u, or u was not yet extracted, since otherwise v would already have been
visited. Since Q is first-in-first-out, this means pv was added to Q no later than u. Hence, by induction,
d(r, pv ) ≤ d(r, u) and d pv = d(r, pv ), which means that BFS sets dv = d pv +1 = d(r, pv )+1 ≤ d(r, u)+1 ≤
d(r, v ). Because v ∈ Γ (pv ), we also have d(r, v ) ≤ d(r, pv ) + 1 = dv , thus equality holds.
Since, by induction, pv was added to Q before any vertex v 0 with d(r, v 0 ) > d(r, pv ), and Q is first-in-
first-out, we know that only vertices v 0 with d(r, v 0 ) ≤ d(r, pv ) + 1 = d(r, v ) can have been added to Q
until pv is extracted and v is added.
The running time follows from the fact that exactly those vertices reachable from r are added to Q exactly once, by Lemma 3.12. This implies that the number of iterations of the inner loop is at most Σ_{v∈V} deg(v) = O(m), and at least Ω(m) for connected graphs.
3.2 Trees
A particularly important class of graphs are trees. Abstractly, trees capture hierarchical structures and
branching processes.
Definition 3.19. A forest is an acyclic, undirected graph. A tree is a connected forest. The vertices of a
forest are called nodes and nodes of degree at most one are called leaves.
Trees naturally arise in many contexts. They are extremal graphs in multiple senses. For example, they are the sparsest possible connected graphs: they have the fewest edges and the fewest paths that a connected graph can have. The following theorem gives various characterizations of trees.
(Figure 3.5: a rooted tree with root r; the marked node v has depth 2, T_v is the subtree rooted at v, and the tree has height 4.)
Theorem 3.20. Let G = (V, E) be an undirected graph with n ≥ 1 vertices and m edges. The following are equivalent:
(i) G is a tree.
(ii) G is connected and m = n − 1.
(iii) G is acyclic and m = n − 1.
(iv) G is connected and, for every e ∈ E, the graph G − e has exactly two connected components.
(v) G is acyclic and, for every pair of distinct vertices u, v with {u, v} ∉ E, the graph G + {u, v} contains exactly one cycle.
(vi) Every pair of vertices u, v ∈ V is connected by exactly one u-v-path in G.
For inductive proofs over trees, the following property is essential. It allows us to remove a leaf and apply induction to the resulting subtree.
Proposition 3.21. Every tree with n nodes has at least min{n, 2} leaves.
Proof. We use induction over n. The statement is trivial for n = 1. Now consider any tree T = (V, E) with
n ≥ 2 and assume the statement holds for trees with fewer than n nodes. By Theorem 3.20, removing
any edge e ∈ E yields a graph with two connected components C_1 and C_2. Then, T[C_1] and T[C_2] are still trees and, by induction, they have at least min{|C_1|, 2} and min{|C_2|, 2} leaves, respectively. Since only the endpoints of e gain an edge when e is added back, and a node of degree one is still a leaf, at least one node of C_1 and one node of C_2 must be leaves of T.
While trees are undirected by definition, they often describe hierarchical structures where edges corre-
spond to asymmetrical relationships between a predecessor and one or many successors. Rather than
orienting the edges of such a tree, it is sufficient to specify a root node at the top of the hierarchy to
implicitly assign an orientation to all edges (cf. Figure 3.5).
Definition 3.22. A rooted tree is a tree T = (V, E) together with a distinguished root r ∈ V . The depth of
a node v ∈ V is d(r, v ) and the height of T is maxv ∈V d(r, v ). The parent pv of a node v ∈ V \ {r} is the
unique node in Γ (v ) with d(r, pv ) < d(r, v ), and the children of u ∈ V are the nodes in { v ∈ V | pv = u}.
The subtree Tv of T rooted at v ∈ V is the subgraph of T induced by the set of vertices whose unique
paths to the root contain v .
For example, DFS and BFS naturally induce a rooted tree in the following sense (see Figure 3.6).
Proposition 3.23. The parents p_v computed by DFS(G, r) (or DFS′(G, r)) and BFS(G, r) uniquely determine trees (V, { {v, p_v} | v ∈ V \ {r}}) rooted at r that contain all vertices reachable from r. These trees are called the DFS-tree resp. BFS-tree of G rooted at r.
Figure 3.6: The DFS-, DFS′- and BFS-trees for the example of Figures 3.3 and 3.4.
Proof. Without loss of generality, assume that all vertices are reachable from r. Then, all vertices v ∈ V \ {r} have p_v ≠ v after execution of DFS/BFS. We need to show that the undirected graph T = (V, E_T) with E_T = { {v, p_v} | v ∈ V \ {r}} is a tree. We have |E_T| = n − 1, thus, by Theorem 3.20 it suffices to
show that T is acyclic.
Let (dv )v ∈V be the depths computed by DFS or BFS. Observe that every edge {u, v } ∈ E T satisfies
|du − dv | = 1 by definition of either algorithm, i.e., du 6= dv . Furthermore, every node v of T has at most
one neighbor u = pv in T with du < dv . For the sake of contradiction, assume that there was a cycle C
in T and let v be the vertex of C that maximizes dv . But then the two neighbors of v in C must have
smaller depth, which is a contradiction.
With this, we are ready to give an example where the order in which depth-first search visits vertices is
crucial. We start by relating strong connectivity to the order in which vertices are visited by DFS’. Note
that the following lemma holds just the same for DFS and BFS.
Lemma 3.24. Let G = (V, E) be a directed graph and r ∈ V. Then, G is strongly connected if and only if DFS′(G, r) visits all vertices and from every vertex v ∈ V \ {r} some vertex v′ can be reached that was visited before v by DFS′.
Proof. First observe that reachability is transitive by Proposition 3.8. This means that G is strongly
connected if and only if every vertex can be reached from r , and r can be reached from every vertex.
⇒: If G is strongly connected, by the above observation and since Lemma 3.12 applies to DFS’ as well,
DFS’ visits all vertices of G . Also, since we can reach r from every vertex and r is visited first, the second
part of the claim holds.
⇐: Note that if DFS’ visits all vertices of G , then every vertex can be reached from r (Proposition 3.13).
It remains to show that r can be reached from every vertex. Consider the vertices in the order v1 , . . . , vn
in which they are visited. We show by induction on i that r can be reached from vi . This trivially holds
for i = 1 since v1 = r . For i ≥ 2, we know that there is an index j < i such that v j is reachable from vi .
By induction, r can be reached from v j . By transitivity, r can be reached from vi .
We can turn this lemma into an efficient algorithm to check whether a directed graph is strongly con-
nected. Crucially, we will see that DFS’ traversal order allows to keep track of the reachable vertices
visited earliest whenever a vertex is being visited.
Theorem 3.25. The problem of determining whether a directed graph is strongly connected can be
solved in time Θ(n + m).
Proof. Consider an execution of DFS′(G, r) on a directed graph G = (V, E) for r ∈ V and let T denote the resulting DFS-tree. We define two times for each vertex u: The time t_u is the position of u in the order in which vertices are visited by DFS′, and t_u^reach is the smallest time t_v among all vertices v ∈ V_u ∪ Γ_G(V_u), where T_u = (V_u, E_u) denotes the subtree of T rooted at u. We can adapt DFS′ to compute these times as follows.
Algorithm: STRONGCONNECTION(G, u)
input: graph G = (V, E), vertex u ∈ V, global time t
t_u ← t; t ← t + 1
visit u
t_u^reach ← t_u
for each v ∈ Γ(u) :
if v not visited :
STRONGCONNECTION(G, v)
t_u^reach ← min{t_u^reach, t_v^reach}
else
t_u^reach ← min{t_u^reach, t_v}
Clearly, if t_v^reach is correctly computed for all v ∈ V with t_v > t_u, then t_u^reach is correctly computed as well. Hence, by induction, STRONGCONNECTION correctly computes t_u^reach for all u ∈ V. Note that it is crucial for this inductive argument that we use the recursive implementation DFS′, and that it is not clear how to compute t_u^reach using BFS.
Now, observe that all vertices in Γ(V_u) are visited before u, since T_u contains all vertices reachable from u that are not visited before u. This means that t_u^reach < t_u if and only if from u we can reach a vertex visited before u. By Lemma 3.24, we only need to check whether all vertices are visited and t_u^reach < t_u for all u ∈ V \ {r} to decide whether a graph G is strongly connected. Overall, the computation takes time Θ(n + m), which is best possible, since we may need to inspect every vertex and every arc to conclude that G is not strongly connected.
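A Python sketch of this computation (the dictionary-based graph representation is our convention; for large graphs, the recursion depth may exceed Python's default limit, just as for any recursive DFS′):

def is_strongly_connected(G, r):
    """Decide strong connectivity of a directed graph G (dict: vertex -> successors).

    t[u] is the visiting time of u and t_reach[u] the smallest visiting time among
    all vertices in the subtree of u and its out-neighborhood, as in STRONGCONNECTION.
    """
    t, t_reach = {}, {}

    def dfs(u):
        t[u] = t_reach[u] = len(t)                      # next free time step
        for v in G[u]:
            if v not in t:
                dfs(v)                                  # recursion mirrors DFS'
                t_reach[u] = min(t_reach[u], t_reach[v])
            else:
                t_reach[u] = min(t_reach[u], t[v])

    dfs(r)
    return len(t) == len(G) and all(t_reach[u] < t[u] for u in G if u != r)

G = {0: [1], 1: [2], 2: [0, 3], 3: [1]}
print(is_strongly_connected(G, 0))  # -> True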
Finally, we illustrate the usefulness of trees when carrying out formal arguments, by showing that merge-
sort has a best-possible running time. As an exercise, try to carry out the following proof without using
the notion of a tree.
Theorem 3.26. Every (comparison-based) sorting algorithm has running time Ω(n log n).
Proof. Consider the decision tree of any sorting algorithm for a list L of mutually distinct numbers. Every
node of the tree corresponds to a comparison during the course of the algorithm, and every path in the
tree to a possible sequence of comparisons. Depending on the outcomes of each comparison, the result
of the algorithm, i.e., the resulting permutation of L , can be different. Since there are n! different total
orders of the elements in L that must yield different permutations in the end, the decision tree must have
at least n! leaves. The number of comparisons that the algorithm needs in the worst case is equal to the
height h of the tree. By definition, every node of the tree has at most two children, hence the number of
leaves of the tree is at most 2h . To allow for n! leaves, the tree thus needs height
h ≥ log_2(n!) = log_2(∏_{i=1}^{n} i) = Σ_{i=1}^{n} log_2(i) ≥ (n/2) · log_2(n/2) = Ω(n log n).
4 Minimum Spanning Trees
In this chapter we focus on the problem of connecting a set of vertices as cheaply as possible by choosing
a subset of the available edges. Formally, we are looking for a spanning tree of an undirected graph.
Definition 4.1. A spanning tree of an undirected graph G = (V, E) is a tree T ⊆ G with set of nodes V .
Observe that we have already seen a way of computing such a spanning tree.
Proposition 4.2. If the input is an undirected, connected graph, DFS and BFS compute a spanning tree
in time Θ(m).
Proof. This follows immediately from Lemma 3.12 and Proposition 3.23.
Corollary 4.3. An undirected graph is connected if and only if it contains a spanning tree.
Proof. This follows by Proposition 4.2 and the fact that every graph that contains a spanning tree is
connected.
In many applications we are not just looking for some spanning tree of a graph, but the best spanning
tree in terms of a cost function associated with the edges of a graph. For example, consider the problem
of connecting all major cities of a country by optical high-speed data connections. The cost of connecting
cities can vary significantly due to their relative geographic locations and other factors. We will often
encounter settings where edges have costs/utilities associated with them, and we introduce the abstract
notion of weighted graphs to model such settings.
Definition 4.4. An (edge-)weighted graph is a graph G = (V, E) together with a function c : E → R. The weight of a set of edges E′ ⊆ E or the corresponding subgraph of G is given by c(E′) := Σ_{e∈E′} c(e).
A spanning tree of minimum weight is simply called minimum spanning tree from now on. The al-
gorithms that we will see in this chapter rely on the following properties of minimum spanning trees
(cf. Figure 4.1).
Observation 4.5. Let G = (V, E) be an undirected, connected graph and T = (V, E_T) be a minimum spanning tree of G with respect to edge weights c : E → R≥0. Then, the following statements hold.
(i) Let e ∈ E_T and let C be one of the two components of T − e (Theorem 3.20 (iv)). Then c(e) = min_{e′∈δ_G(C)} c(e′), i.e., e has minimum weight in the (unique) cut δ_G(C) induced by e.
(ii) Let e ∈ E \ E_T and let K be the (unique) cycle in T + e (Theorem 3.20 (v)). Then c(e) = max_{e′∈K} c(e′), i.e., e has maximum weight on the (unique) cycle K induced by e.
Figure 4.1: Example of a minimum spanning tree (red). Every edge of the tree induces a unique cut in
the underlying graph, and is cheapest on this cut. Every edge not in the tree induces a unique
cycle, and is most expensive on this cycle.
Proof. If there was an edge e′ with c(e′) < c(e) in the cut induced by e ∈ E_T, we could obtain a spanning tree of smaller weight than T by replacing e by e′. Similarly, if there was an edge e′ with c(e′) > c(e) on the cycle induced by e ∈ E \ E_T, we could obtain a spanning tree of smaller weight by replacing e′ by e. Uniqueness of the induced cut and cycle follows by Theorem 3.20 (vi).
So how can we find a tree with these properties? A natural approach to solve the MST problem is to
consider edges one after the other and include or exclude edges from our solution as described below.1
Algorithm: MST-SCHEME(G, c)
input: undirected, connected graph G = (V, E), weights c : E → R≥0
output: minimum spanning tree
T ← (V, ∅); G′ ← (V, E)
while T ≠ G′ :
apply one of the following:
rule 1 : (add lightest edge over a cut)
C ← any component of T
e ← arg min_{e′∈δ_{G′}(C)} c(e′)
T ← T + e
rule 2 : (remove heaviest edge on a cycle)
K ← any cycle in G′
e ← arg max_{e′∈K\T} c(e′)
G′ ← G′ − e
return T
We show that every algorithm that proceeds according to the above scheme is guaranteed to eventually
compute a minimum spanning tree.
1 For simplicity, we use arg min_{x∈X} f(x) to denote a (single) element x ∈ X with f(x) = min_{x′∈X} f(x′), and similarly for arg max. We use an arbitrary but fixed total order of X for tie-breaking in case |{ x ∈ X | f(x) = min_{x′∈X} f(x′) }| > 1.
Figure 4.2: Some intermediate steps of an execution of KRUSKAL on the example of Figure 4.1. Red edges
are included in T and grey edges are excluded from G 0 , in terms of MST-SCHEME.
Theorem 4.6. Every implementation of the MST-SCHEME computes a minimum spanning tree.
Proof. We first argue that, as long as T ≠ G′, at least one of the two rules can be applied. Observe that T ⊆ G′ throughout, since we only consider edges of G′ in rule 1 and only edges outside of T in rule 2. If T ≠ G′, we can therefore find e = {u, v} ∈ G′ \ T. If u and v lie in different connected components C_u ≠ C_v of T, we can apply rule 1 with C = C_u, since e ∈ δ_{G′}(C_u) ≠ ∅. If u and v lie in the same connected component of T, then T + e contains a cycle and we can apply rule 2.
We now prove that the scheme yields a minimum spanning tree. To that end, we show that it maintains the invariant that G contains a minimum spanning tree that uses all edges in T and no edge outside G′. Since we have T = G′ in the end, this concludes the proof.
Let T* be the minimum spanning tree guaranteed by our invariant before an application of rule 1 for some connected component C of T and some e = {u, v} ∈ arg min_{e′∈δ_{G′}(C)} c(e′). If e ∈ T*, the invariant is obviously maintained. Otherwise, the (unique) u-v-path P in T* must use another edge e* ∈ δ_{G′}(C) (by Observation 3.6). By definition of C, we have e* ∉ T and, by our choice of e, we have c(e) ≤ c(e*). Additionally, u and v lie in different connected components of T* − e*, since P is unique by Theorem 3.20 (vi). Therefore the spanning tree T* − e* + e satisfies our invariant.
Now let T* be the minimum spanning tree guaranteed by our invariant before an application of rule 2 for some cycle K in G′ and some e = {u, v} ∈ arg max_{e′∈K\T} c(e′). If e ∉ T*, the invariant is obviously maintained. Otherwise, u and v are in different components C_u ≠ C_v of T* − e (by Theorem 3.20 (vi)). By Observation 3.6, the u-v-path K − e must contain an edge e′ ∈ δ_{G′}(C_u) \ {e}. By definition of C_u, we have that δ_{T*}(C_u) = {e}, thus e′ ∉ T*. By our choice of e, we have c(e′) ≤ c(e) and e ∉ T. Therefore the spanning tree T* − e + e′ satisfies our invariant.
There are many ways of implementing the general scheme outlined above, differing by the order in
which edges are considered. We will see two straight-forward approaches in the following.
A natural order in which to consider the edges is by increasing weights. The edge with lowest weight
can safely be included in the spanning tree, since it is a lightest edge on every cut it is part of. Every
subsequent edge that does not close a cycle with the edges we already included in the spanning tree
cannot be the unique edge of maximum weight on any cycle in the graph G 0 of the MST-SCHEME. This
means that the scheme can never exclude this edge, so it must eventually be included. But then we can
safely include it immediately when we consider it.
Analogously, we can consider edges by decreasing weights and discard edges that do not disconnect our
solution, since these edges must be heaviest on a cycle. The two corresponding algorithms are listed
below.
Algorithm: KRUSKAL(G, c)
input: undir., connected graph G = (V, E), weights c : E → R≥0
output: minimum spanning tree
T ← (V, ∅)
(e1, . . . , em) ← sort E s.t. c(e1) ≤ c(e2) ≤ . . .
for e ← e1, e2, . . . , em:
    if T + e is acyclic:
        T ← T + e        (rule 1)
    else
        discard e        (rule 2)
return T

Algorithm: KRUSKAL-DUAL(G, c)
input: undir., connected graph G = (V, E), weights c : E → R≥0
output: minimum spanning tree
T ← G
(e1, . . . , em) ← sort E s.t. c(e1) ≤ c(e2) ≤ . . .
for e ← em, . . . , e2, e1:
    if T − e is connected:
        T ← T − e        (rule 2)
    else
        keep e           (rule 1)
return T
An example for Kruskal’s algorithm is given in Figure 4.2. Correctness of both algorithms immediately
follows from Theorem 4.6. Their running times depend on a specific implementation.
Proof. The running time is dominated by sorting the edges and checking whether T ± e_i is
acyclic/connected for i ∈ {1, . . . , m}. Sorting can be done in time O(m log m) by using mergesort
(Theorem 2.6). Observe that O(m log m) = O(m log n) since m < n². Checking whether T ± e_i is
acyclic/connected can be done in time O(n + m) using DFS or BFS (Propositions 3.14 and 3.16). Over-
all, this gives a running time of O(m log n) + m · O(n + m) = O(m²), since n ≤ m + 1 for connected
graphs.
We can improve this running time for KRUSKAL by storing which component every vertex belongs to. To
do this, we define variables (comp_v ∈ V)_{v∈V} that are initially set to comp_v = v for all v ∈ V. This
allows us to check in constant time whether T + e is acyclic for e = {u, v}, simply by checking whether
comp_u ≠ comp_v. Whenever we add an edge e = {u, v} to T, we can merge the components of u and v
by going over all vertices and setting comp_{u′} ← comp_u for all u′ ∈ { v′ ∈ V | comp_{v′} = comp_v }. Each such
update takes linear time, and we can merge components at most n − 1 times until a single component
remains. We obtain a total running time of O(m log n + n²).
We can further improve the implementation to achieve the best-possible running time.
Theorem 4.8. KRUSKAL can be implemented to find a minimum spanning tree in time Θ(m log n).
Proof. Observe that the method to store components described above represents each component by a
rooted tree of height one if we interpret the variables (compv ∈ V )v ∈V as the parents of each vertex. We
can improve the running time of merging components if we allow trees to have larger heights. The idea
is to balance the running time for merging components with the running time for checking whether two
vertices share the same component, which essentially corresponds to the time needed to determine the
component of a vertex. We begin by reformulating the algorithm:
Algorithm: KRUSKAL(G, c) (union-find)
input: undir., connected graph G = (V, E), weights c : E → R≥0
output: minimum spanning tree
T ← (V, ∅)
for v ∈ V:
    comp_v ← v; h_v ← 0
(e1, . . . , em) ← sort E s.t. c(e1) ≤ c(e2) ≤ . . .
for e = {u, v} ← e1, e2, . . . , em:
    C_u ← FIND(u)
    C_v ← FIND(v)
    if C_u ≠ C_v:
        UNION(u, v)
        T ← T + e
return T

Algorithm: FIND(u)
while comp_u ≠ u:
    u ← comp_u
return u

Algorithm: UNION(u, v)
u′ ← FIND(u)
v′ ← FIND(v)
if h_{u′} < h_{v′}:
    (u′, v′) ← (v′, u′)
comp_{v′} ← u′
h_{u′} ← max{h_{u′}, h_{v′} + 1}
We can make the merge operation (UNION) cheaper by simply attaching the root of one of the trees
to the other root. Both determining whether two vertices lie in the same component and merging two
components then amount to finding the roots of the corresponding trees (FIND). This needs time linear
in the heights of the trees. To make the process efficient, we thus need a way to limit the heights of the
trees representing components.
A simple trick to achieve this is to make sure that we always attach trees of smaller heights to trees of
larger heights. If we do this, the height of a tree can only increase if it is merged with a tree of the same
height. With this, a simple induction over the height shows that every tree of height h contains at least
2^h nodes. Since no tree can contain more than all vertices of G, we have 2^h ≤ n and thus h ≤ log₂ n. The
running time becomes O(m log n) + m · O(log n) = O(m log n), as claimed.
The lower bound of Ω(m log n) follows from Theorem 3.26 and m ≥ n − 1 (since G is connected).
Remark 4.9. A data structure that supports the operations described in the proof of Theorem 4.8 is called
a union-find data structure.
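For illustration, the following is a minimal Python sketch of KRUSKAL combined with the union-by-height structure from the proof of Theorem 4.8; the function and variable names, and the edge representation, are our own choices.

def kruskal(n, edges):
    """Minimal union-find sketch of KRUSKAL.

    n      -- number of vertices, labeled 0, ..., n-1
    edges  -- list of (weight, u, v) tuples of an undirected, connected graph
    Returns a list of (u, v, weight) tree edges of a minimum spanning tree.
    """
    comp = list(range(n))   # parent pointers; comp[v] == v means v is a root
    height = [0] * n        # height bound of the tree rooted at v

    def find(u):
        while comp[u] != u:
            u = comp[u]
        return u

    def union(u, v):
        u, v = find(u), find(v)
        if height[u] < height[v]:          # attach the lower tree ...
            u, v = v, u
        comp[v] = u                        # ... below the root of the higher one
        height[u] = max(height[u], height[v] + 1)

    tree = []
    for w, u, v in sorted(edges):          # consider edges by increasing weight
        if find(u) != find(v):             # T + e stays acyclic (rule 1)
            union(u, v)
            tree.append((u, v, w))
    return tree

For example, kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3)]) returns three edges of total weight 6 and discards the edge of weight 4.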
While KRUSKAL takes the globally lightest edges first, we can take a local approach by growing a connected
component starting with a single vertex. In every step, the connected component C induces a cut δG (C)
and it is safe to include an edge of minimum weight on this cut and repeat with the same reasoning. The
resulting algorithm is Prim’s algorithm. If we simultaneously grow all connected components, we obtain
Borůvka’s algorithm.
Figure 4.3: Some intermediate steps of an execution of PRIM on the example of Figure 4.1. Red edges are
included in T and grey edges are excluded from G 0 , in terms of MST-SCHEME.
An example for Prim’s algorithm is given in Figure 4.3. Again, correctness follows immediately from
Theorem 4.6, and running times depend on a specific implementation. Note that consistent tie-breaking
is essential for BORŮVKA (see footnote 1 on page 30).
Proposition 4.10. A naive implementation of PRIM or BORŮVKA computes a minimum spanning tree in
time Θ(nm).
Proof. We can obtain the claimed running time by using DFS(T ∪ δ_G(T), u) or BFS(T ∪ δ_G(T), u) to find
an edge of minimum weight in the cut δ_G(T). Each such search runs in time O(m), and after each search we
grow a connected component. This means that we are done after n − 1 steps and obtain the claimed
running time.
We can improve this running time for PRIM by using a data structure that keeps track of the best edge
to reach every vertex in V \ T from T. With this, finding an edge of minimum weight in δ_G(T) can be
accomplished in time O(n). Since we need to do this at most n − 1 times, the total running time for
finding edges of minimum weight is O(n²). Every edge may be the best edge to reach a vertex in V \ T
from T, so the total cost for updating our data structure is O(m). Overall, we get an improved running
time of O(n²) + O(m) = O(n²).
The data structure we described above is a so-called priority queue. It supports the following operations:
• INSERT(S): Inserts all elements in S into the queue and assigns value ∞ to them.
• VALUE(s): Returns the value assigned to element s.
• DECREASEVALUE(s, x): The value assigned to s is decreased to x .
• EXTRACTMIN: Returns an object of lowest value and removes it from the queue.
We can use a priority queue to store the smallest weight of an edge to reach every vertex v ∉ V_T from a
vertex in V_T. This leads to the following reformulation of PRIM:
Algorithm: PRIM(G, c) (priority queue)
input: undirected, connected graph G = (V, E), weights c : E → R≥0
output: minimum spanning tree
take any u ∈ V
T = (V_T, E_T) ← ({u}, ∅)
INSERT(V \ {u})
while V_T ≠ V:
    for v ∈ Γ_G(u) \ Γ_T(u):
        if c({u, v}) < VALUE(v):
            e_v ← {u, v}
            DECREASEVALUE(v, c(e_v))
    u ← EXTRACTMIN
    T ← T + u + e_u
return T
We use the following fact about efficient priority queue implementations without proof.
Theorem 4.11. It is possible to implement a priority queue that takes time O(m + n log n) to insert O(n)
elements and perform O(m) other operations.
With this, we obtain an even better running time for Prim’s algorithm.
Theorem 4.12. PRIM can be implemented to find a minimum spanning tree in time O(m + n log n).
Proof. For every vertex, PRIM has to iterate over all neighbors, which means that the for-loop iterates at
most 2m times overall. Consequently, we insert n elements into the priority queue and perform O(m)
other operations on it. The running time thus follows immediately by Theorem 4.11.
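For illustration, here is a minimal Python sketch of this priority-queue-based variant of PRIM, using a binary heap (the standard module heapq); note that a binary heap only gives O(m log n), while the bound of Theorem 4.12 requires a queue as in Theorem 4.11 (e.g., Fibonacci heaps). Names and the adjacency-list representation are our own choices.

import heapq

def prim(n, adj):
    """Sketch of PRIM with a binary heap.

    n    -- number of vertices 0, ..., n-1
    adj  -- adjacency lists: adj[u] is a list of (v, weight) pairs
    Returns a list of (u, v, weight) tree edges of a minimum spanning tree.
    """
    in_tree = [False] * n
    best = [float("inf")] * n          # VALUE: cheapest known edge into v
    best_edge = [None] * n
    tree = []
    heap = [(0, 0)]                    # start growing from vertex 0
    best[0] = 0
    while heap:
        _, u = heapq.heappop(heap)     # EXTRACTMIN (stale entries are skipped)
        if in_tree[u]:
            continue
        in_tree[u] = True
        if best_edge[u] is not None:
            tree.append(best_edge[u])
        for v, w in adj[u]:
            if not in_tree[v] and w < best[v]:
                best[v] = w            # DECREASEVALUE, realized by re-inserting
                best_edge[v] = (u, v, w)
                heapq.heappush(heap, (w, v))
    return tree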
Remark 4.13. Observe that this running time is better than the best running time achievable with
Kruskal’s algorithm. The best known running time of O(m · α(m)) for the minimum spanning tree
problem was achieved by Chazelle. Here the inverse Ackermann function α(m) is an extremely slowly
growing function (e.g., α(9876!) = 5).
5 Shortest Paths
We now turn to the problem of finding short paths in a directed graph. This problem has obvious
applications for routing and navigation, but it also appears naturally as a subproblem in many other
places (we will see one example in Chapter 6).
We already showed that BFS computes a path with the minimum number of arcs from a given root
(Theorem 3.18). It is more challenging to find a good path in a directed graph G = (V, E) with respect
to given arc weights c : E → R. We will make this notion more precise in the following. For convenience,
we write c(u, v) := c((u, v)) and set c(u, u) = 0 and c(u, v) = ∞ if (u, v) ∉ E. For a walk W in G, we let
c(W) := ∑_{e∈W} c(e).
Definition 5.1. Let G = (V, E) be a directed graph and c : E → R. We let W_uv denote the set of all walks
from u to v in G and define

d_{c,G}^(k)(u, v) := min{ ∞, min{ c(W) | W = (v_0 = u, e_1, . . . , v_{k′} = v) ∈ W_uv with k′ ≤ k } },
d_{c,G}(u, v) := lim_{k→∞} d_{c,G}^(k)(u, v).

We drop the index G and write d_c(u, v) := d_{c,G}(u, v) and d_c^(k)(u, v) := d_{c,G}^(k)(u, v) when the graph G is
clear from the context.
If P is a u-v -path in G with c(P) = dc (u, v ), we say that P is a shortest path in G with respect to c . If K is
a cycle in G with c(K) < 0, we say that K is a negative cycle in G induced by c .
We formally show the intuitive fact that, if there are no negative cycles, detours can only increase the
length of a walk.
Proposition 5.2. Let G = (V, E) be a directed graph and let c : E → R not induce negative cycles. Then,
for all u ∈ V and all v ∈ V reachable from u, there exists a shortest u-v -path in G with respect to c and
it holds that dc (u, v ) = dc(n−1) (u, v ).
Proof. For any k ≥ n, let W be a walk from u to v in G with at most k arcs and such that c(W) =
d_c^(k)(u, v). By Proposition 3.8, we can decompose W into a u-v-path P and a set of cycles K (note that,
in directed graphs, closed walks of length two are cycles). Since c does not induce negative cycles, we
have c(K) ≥ 0 for all K ∈ K and thus c(W) = c(P) + ∑_{K∈K} c(K) ≥ c(P). Since P has at most k arcs, we
have c(P) ≥ d_c^(k)(u, v) = c(W). Together this implies that c(P) = d_c^(k)(u, v).
Now, P is a path and thus has at most n − 1 arcs, which means that d_c^(n−1)(u, v) ≤ c(P) = d_c^(k)(u, v),
and thus d_c^(n−1)(u, v) = d_c^(k)(u, v). Since our argument holds for all k ≥ n, we conclude that d_c(u, v) =
d_c^(n−1)(u, v) and P is a shortest u-v-path in G.
A key feature of shortest paths is that they have optimum substructure, i.e., any part of a shortest path is
again a shortest path. We prove the following generalization of this statement.
Proposition 5.3. Let G = (V, E) be a directed graph and c : E → R. If, for some k ∈ N, we have
c(W) = d_c^(k)(v_0, v_{k′}) for a walk W = (v_0, e_1, . . . , v_{k′}) in G with k′ ≤ k, then c(W_ij) = d_c^(k−k′+j−i)(v_i, v_j) for
every W_ij = (v_i, e_{i+1}, . . . , v_j) with 0 ≤ i < j ≤ k′.
Proof. By definition, if c(W_ij) ≠ d_c^(k−k′+j−i)(v_i, v_j), then there is a walk W′_ij from v_i to v_j with at most
k − k′ + j − i arcs in G and with c(W′_ij) < c(W_ij). But then the walk W′ := (v_0, . . . , v_i) ⊕ W′_ij ⊕ (v_j, . . . , v_{k′})
has at most k arcs and c(W′) < c(W), contradicting c(W) = d_c^(k)(v_0, v_{k′}).
Corollary 5.4. Let G = (V, E) be a directed graph and let c : E → R not induce negative cycles. For all
s ∈ V and v ∈ V \ {s} reachable from s, there exists a vertex u ∈ V \ {v } and a shortest s-u-path Psu in
G − v such that Psu ⊕ (u, v ) is a shortest s-v -path in G .
Corollary 5.4 yields an easy way to determine shortest paths once we have computed dc (u, v ) for all
u, v ∈ V : For every v ∈ V reachable from s ∈ V , the predecessor of v on a shortest s-v -path is u ∈ V
with v ∈ Γ (u) and dc (s, v ) = dc (s, u) + c(u, v ). We can obtain a shortest s-v -path by backtracking from v
along these predecessors until we reach s. Note that we have to deal differently with arcs of weight 0.
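A minimal Python sketch of this backtracking could look as follows; it assumes exact arithmetic and no zero-weight cycles, and the names are our own choices.

def extract_path(dist, c, adj_in, s, v):
    """Backtrack a shortest s-v-path from precomputed distances.

    dist    -- dist[u] = d_c(s, u) for all u
    c       -- dict mapping arcs (u, v) to their weight
    adj_in  -- adj_in[v] lists all u with (u, v) in E
    Returns a shortest s-v-path as a list of vertices.
    """
    path = [v]
    while v != s:
        # a predecessor u on a shortest path satisfies d(s, v) = d(s, u) + c(u, v)
        v = next(u for u in adj_in[v] if dist[u] + c[(u, v)] == dist[v])
        path.append(v)
    return path[::-1]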
We are now ready to consider the algorithmic problem of finding a shortest path between vertices s
and t . Observe that, in the worst case, the shortest path to vertex t visits all other vertices, which means
that we have to compute shortest paths to all v ∈ V on the way. It therefore makes sense to reformulate
the problem and ask for shortest paths to all vertices reachable from s to begin with. As described above,
it suffices to compute the values dc (s, v ) for all v ∈ V , since we can find the corresponding paths via
Corollary 5.4.
Corollary 5.4 suggests a greedy approach to computing the shortest paths from s ∈ V if all arc weights
are positive: We start with the set S = {s}. Let v ∉ S be the closest vertex to s with respect to c. The
shortest s-v -path P must lie in G[S ∪ {v }], since every path that leaves S elsewhere cannot be shorter
than P already up to this point, and it can only be even longer overall, since arc weights are positive.
This means that if we keep track of all shortest paths to vertices in S , we can find the closest vertex to s
and add it to S in each step. We will see that this simple algorithm, listed below, still works if arc weights
are non-negative.
Algorithm: DIJKSTRA(G, c, s)
input: directed graph G = (V, E), weights c : E → R≥0, vertex s ∈ V
output: distances d_sv = d_c(s, v) for all v ∈ V
d_sv ← 0 if v = s, and d_sv ← ∞ otherwise
R ← V                                      (INSERT)
while R ≠ ∅:
    u ← arg min_{v∈R} d_sv
    R ← R \ {u}                            (EXTRACTMIN)
    for v ∈ Γ(u) ∩ R:
        d_sv ← min{d_sv, d_su + c(u, v)}   (DECREASEVALUE)
return (d_sv)_{v∈V}
We now give a formal argument for the correctness of the above formulation of Dijkstra’s algorithm.
Proof. Denote S := V \ R and Gv := G[S ∪ {v }] for v ∈ V throughout the algorithm. We show that
DIJKSTRA maintains the invariant that for all v ∈ V we have (i) ds v = dc,Gv (s, v ), and (ii) if v ∈ S ,
then ds v = dc (s, v ). This invariant trivially holds in the beginning, and it implies correctness when the
algorithm terminates with S = V after n iterations. To prove the invariant, consider the beginning of an
iteration where vertex u ∈ R is extracted from R.
Let P be a shortest s-u-path in G and let (w, w′) be the first arc along P that lies in the cut δ(S) (this
arc exists by Observation 3.6). Since c is non-negative, Proposition 5.3 implies that c(P) ≥ d_c(s, w′) =
d_{c,G_{w′}}(s, w′). Our invariant (i) guarantees d_{c,G_{w′}}(s, w′) = d_{sw′}. Finally, by choice of u, we have d_{sw′} ≥
min_{v∈R} d_sv = d_su. Using invariant (i) again, we conclude that d_c(s, u) = c(P) ≥ d_su = d_{c,G_u}(s, u). Since
d_c(s, u) ≤ d_{c,G_u}(s, u) by definition, equality holds and invariant (ii) is maintained when u is added to S.
Now, by invariants (i) and (ii), we have d_{c,G_w}(s, w) = d_c(s, w) for all w ∈ S, i.e., there is a shortest s-w-
path that lies entirely in G_w and, in particular, does not contain u. Consequently, for every v ∈ Γ(u) ∩ R
there is a shortest s-v-path in G_uv := G[S ∪ {u, v}] that either does not go via u or uses the arc from u
to v. By Corollary 5.4 and invariant (i), we obtain d_{c,G_uv}(s, v) = min{ d_{c,G_v}(s, v), d_su + c(u, v) },
which means that we update the values d_sv for v ∈ Γ(u) ∩ R such that invariant (i) is maintained when u
is added to S.
The similarity between DIJKSTRA and PRIM should be obvious: while PRIM next adds the edge leading to a
vertex closest to the current set S, DIJKSTRA next adds the arc leading to a vertex closest to s. Otherwise, both
algorithms are identical, which means that we can use a very similar implementation based on a priority
queue.
Theorem 5.6. DIJKSTRA can be implemented with a running time of O(m + n log n).
Proof. We can implement R as a priority queue. We initially insert n vertices into R and perform at most
n + m other operations on it. By Theorem 4.11, this can be accomplished with a total running time of
O(m + n log n).
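For illustration, the following Python sketch implements DIJKSTRA with a binary heap (heapq); this gives O((n + m) log n) rather than the O(m + n log n) of Theorem 5.6, which requires a more sophisticated priority queue. The representation and names are our own choices.

import heapq

def dijkstra(n, adj, s):
    """Sketch of DIJKSTRA with a binary heap.

    n    -- number of vertices 0, ..., n-1
    adj  -- adj[u] is a list of (v, weight) pairs with non-negative weights
    s    -- source vertex
    Returns the list of distances d_sv.
    """
    dist = [float("inf")] * n
    dist[s] = 0
    done = [False] * n
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)        # EXTRACTMIN
        if done[u]:
            continue                      # skip stale queue entries
        done[u] = True
        for v, w in adj[u]:
            if d + w < dist[v]:           # DECREASEVALUE via re-insertion
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist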
While Dijkstra's algorithm is conceptually simple, its efficient implementation is quite involved. Addi-
tionally, Dijkstra's algorithm cannot cope with negative arc weights. We can circumvent both these issues
by investigating the interdependency of the values d_c^(k)(u, v).
Proof. The statement is trivial for k = 1. For k ≥ 2, let W = (v_0 = u, . . . , v_{k′} = v) be a walk with k′ ≤ k
and c(W) = d_c^(k)(u, v). Then, by Proposition 5.3, c(W′) = d_c^(k−1)(u, v_{k′−1}) for W′ = (v_0, . . . , v_{k′−1}). Hence,
d_c^(k)(u, v) = c(W) = c(W′) + c(v_{k′−1}, v) = d_c^(k−1)(u, v_{k′−1}) + c(v_{k′−1}, v). It remains to show that there is
no vertex w with d_c^(k−1)(u, w) + c(w, v) < d_c^(k)(u, v). This is the case since, otherwise, we could construct
a walk from u to v with at most k arcs that is shorter than W.
Lemma 5.7 immediately suggests a recursive approach to calculating the values d_c^(k)(u, v) for all k ∈
{1, . . . , n − 1}, listed below. Recall that it suffices to compute the values d_c^(n−1)(s, v) for all v ∈ V by
Proposition 5.2, assuming that c does not induce negative cycles. However, it is easy to see that in this
way we may consider all n^(n−1) possible sequences of n vertices starting at s in the process (for a complete
graph), which leads to a horrendous running time.
The problem of the recursive approach is that we repeatedly recompute the same values (e.g., d_sv^(k)
is computed up to n^(n−1−k) times). This is a typical issue arising with recursive implementations that ab-
solutely needs to be avoided! A typical way to repair such an implementation without much effort uses
memoization, i.e., the storage of precomputed partial solutions. The main disadvantage of memoization
from a theoretical perspective lies in the difficulty to analyze a memoization solution (not a big issue in
the simple implementation below). From a practical perspective, implementations relying on memoiza-
tion often suffer from a lack of locality, i.e., data accesses are not grouped well together according to the
corresponding locations in system memory.
Algorithm: RECURSIVESP(G, c, s)
input: directed graph G = (V, E), weights c : E → R, vertex s ∈ V
output: distances d_sv = d_c(s, v) for all v ∈ V
for v ∈ V:
    d_sv ← REC(v, max{1, n − 1})
return (d_sv)_{v∈V}

Algorithm: MEMOIZATIONSP(G, c, s)
input: directed graph G = (V, E), weights c : E → R, vertex s ∈ V
output: distances d_sv = d_c(s, v) for all v ∈ V
d_sv^(k) ← c(s, v) if k = 1, and d_sv^(k) ← −∞ otherwise
for v ∈ V:
    d_sv^(n−1) ← MEMO(v, max{1, n − 1})
return (d_sv^(n−1))_{v∈V}
A much more systematic paradigm is dynamic programming, where we compute partial solutions in a
bottom-up fashion (listing below). Dynamic programs (DPs) are easier to analyze and to verify and
can be extremely efficient in practice. As a rule of thumb, problems that admit optimum substructure
lend themselves to dynamic programming solutions. Conceptually, it is often easiest to start with a
recursive solution, reformulate it to use memoization, and finally reformulate it again using dynamic
programming. In a nutshell, the description of every (!) dynamic program consists of the following
ingredients:
(i) DP table and interpretation of the meaning of its entries (here: Definition 5.1)
(ii) dependencies between entries of the DP table (here: Lemma 5.7)
(iii) an order of computation of the entries that respects dependencies (here: by increasing values of k)
(iv) extraction of the final solution from the DP table (here: Proposition 5.2)
Algorithm: DYNAMICSP(G, c, s)
input: directed graph G = (V, E), weights c : E → R, vertex s ∈ V
output: distances d_sv = d_c(s, v) for all v ∈ V
d_sv^(0) ← 0 if v = s, and d_sv^(0) ← ∞ otherwise
for k ← 1, 2, . . . , n − 1:
    for v ∈ V:
        d_sv^(k) ← min_{u∈V} { d_su^(k−1) + c(u, v) }
return (d_sv^(n−1))_{v∈V}
Indeed, DYNAMICSP is easy to analyze: its running time is Θ(n³), which is drastically better than the
purely recursive algorithm above.
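For concreteness, a direct Python transcription of DYNAMICSP might look as follows (a sketch; missing arcs are treated as having weight ∞, matching the convention c(u, v) = ∞ for (u, v) ∉ E, and the representation is our own choice).

def dynamic_sp(n, c, s):
    """Sketch of DYNAMICSP: bottom-up computation of the table d^(k)_sv.

    n -- number of vertices 0, ..., n-1
    c -- dict mapping arcs (u, v) to weights; missing arcs count as infinity
    s -- source vertex
    """
    INF = float("inf")

    def weight(u, v):
        if u == v:
            return 0                   # c(u, u) = 0 by convention
        return c.get((u, v), INF)

    d = [0 if v == s else INF for v in range(n)]       # row k = 0 of the table
    for k in range(1, n):                               # rows k = 1, ..., n-1
        d = [min(d[u] + weight(u, v) for u in range(n)) for v in range(n)]
    return d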
We can further improve the algorithm by avoiding the iteration over pairs u, v ∈ V with v ∉ Γ(u), and we
can reduce the number of variables (i.e., the memory requirement) of the algorithm if we do not insist
on keeping track of the best walks for every number of arcs. These changes lead us to the algorithm of
Bellman-Ford (concurrently proposed by Moore):
Algorithm: BELLMANFORD(MOORE)(G, c, s)
input: directed graph G = (V, E), weights c : E → R, vertex s ∈ V
output: distances d_sv = d_c(s, v) for all v ∈ V
d_sv ← 0 if v = s, and d_sv ← ∞ otherwise
for k ← 1, 2, . . . , n − 1:
    for (u, v) ∈ E:
        d_sv ← min{d_sv, d_su + c(u, v)}
return (d_sv)_{v∈V}
To show the following result, we mainly need to argue that using a single value d_sv instead of
d_sv^(0), d_sv^(1), . . . , d_sv^(n−1) does not cause problems. Note that (for m ∈ ω(log n)) the running time is worse
than for Dijkstra's algorithm; however, BELLMAN-FORD(-MOORE) is easy to implement (which also means
that the hidden constant factor in the running time is small) and is very efficient in practice. Addition-
ally, BELLMANFORD(MOORE) can work with negative arc weights, as long as there are no negative cycles,
in which case shortest paths may not exist.
Theorem 5.8. BELLMANFORD(MOORE) solves the shortest path problem in time Θ(nm) on directed graphs
without negative cycles.
Proof. We show that the algorithm maintains the invariant that, after the k-th iteration of the outer loop,
ds v ≤ dc(k) (s, v ) for all v ∈ V and that an s-v -path of length at most ds v exists if ds v < ∞ (with an
arbitrary number of arcs). The invariant obviously holds in the beginning, and it implies that the result
of the algorithm is correct, since dc (s, v ) = dc(n−1) (s, v ) by Proposition 5.2. It remains to show that the
invariant is maintained.
Consider an update of ds v for arc (u, v ) in iteration k of the outer loop. If the value of ds v changes, then a
walk of the claimed length exists via u. Since there are no negative cycles, we can use Proposition 3.8 to
obtain a path of length at most ds v . By Lemma 5.7, there is a vertex w ∈ V with dc(k) (s, v ) = dc(k−1) (s, w) +
c(w, v ). By our invariant at the end of iteration k − 1, we have dsw ≤ dc(k−1) (s, w). In the iteration of the
inner loop for (w, v ), we set
ds v ← min{ds v , dsw + c(w, v )} ≤ dsw + c(w, v ) ≤ dc(k−1) (s, w) + c(w, v ) = dc(k) (s, v ),
hence our invariant is maintained. The running time of the algorithm is obvious.
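A direct Python transcription of BELLMANFORD(MOORE) might look as follows (a sketch; the arc-list representation and names are our own choices).

def bellman_ford(n, arcs, s):
    """Sketch of BELLMANFORD(MOORE).

    n     -- number of vertices 0, ..., n-1
    arcs  -- list of (u, v, weight) triples; negative weights are allowed,
             but the graph must not contain negative cycles
    s     -- source vertex
    Returns the list of distances d_sv.
    """
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0
    for _ in range(n - 1):                 # n - 1 rounds suffice (Proposition 5.2)
        for u, v, w in arcs:
            if dist[u] + w < dist[v]:      # relax arc (u, v)
                dist[v] = dist[u] + w
    return dist

One additional round in which some value still decreases would indicate the presence of a negative cycle.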
6 Network Flows
In this chapter we consider network flows as an abstract framework for throughput maximization in
graphs. This framework captures flows of many types, including traffic flows, resource flows, data flows,
currency flows, and many more. In addition, structural results for network flows are applicable in other
domains, as we will see in Chapter 7. We start by introducing the basic framework.
Definition 6.1. A network is a 4-tuple (G, µ, s, t) where G = (V, E) is a directed graph, µ : E → R>0 are
arc capacities, s ∈ V is the source, and t ∈ V is the sink. To simplify notation, we additionally require that
{(u, v), (v, u)} ⊈ E for all u, v ∈ V.
Definition 6.2. A function f : E → R≥0 is an s- t -flow in the network (G = (V, E), µ, s, t) if it satisfies:
(iii) (positivity) | f | ≥ 0,
With this, we are ready to formally state the algorithmic problem that we will be concerned with in this
chapter.
For the remainder of this chapter, we fix a network (G = (V, E), µ, s, t), an s-t-flow f in this network, and
a flow f⋆ of maximum value. We will see later that such a flow always exists (Corollary 6.14) and say
that f⋆ is a maximum flow.
The following notion will play a crucial role to bound the value of a maximum flow.
Definition 6.3. An s-t-cut is a cut δ+(S) with s ∈ S ⊆ V \ {t}. Its capacity is µ(S) := ∑_{e∈δ+(S)} µ(e).
Clearly, flow conservation implies |f| = −ex_f(s). We now show a stronger statement.
Figure 6.1: From left to right: network (G, µ, s, t) with arcs labeled by µ, graph G with arcs labeled by a
flow f of value | f | = 5, residual graph G f with arcs labeled by residual capacities.
Proof. For T := V \ S and using capacity constraints (CC) and flow conservation (FC), we obtain

|f| = ex_f(t)
    = ∑_{e∈δ−(t)} f(e) − ∑_{e∈δ+(t)} f(e)
    = ∑_{v∈T} ( ∑_{e∈δ−(v)} f(e) − ∑_{e∈δ+(v)} f(e) )                      (FC)
    = ∑_{v∈T} ∑_{e∈δ−(v)} f(e) − ∑_{v∈T} ∑_{e∈δ+(v)} f(e)
    = ∑_{v∈T} ∑_{e∈δ−(v)\T²} f(e) − ∑_{v∈T} ∑_{e∈δ+(v)\T²} f(e)
    = ∑_{e∈δ−(T)} f(e) − ∑_{e∈δ+(T)} f(e)
    = ∑_{e∈δ+(S)} f(e) − ∑_{e∈δ−(S)} f(e)
    ≤ ∑_{e∈δ+(S)} f(e)
    ≤ ∑_{e∈δ+(S)} µ(e).                                                    (CC)
A flow may not saturate all arcs, i.e., there may be arcs e ∈ E with f (e) < µ(e), which means that there
might be potential for increasing the flow value. To do this, we may need to reroute some of the flow
(cf. Figure 6.1 (center)). It makes sense to consider the graph that encodes all potential flow increases
and changes in order to use graph theoretic tools in the design of algorithms. This residual graph is
defined below (cf. Figure 6.1 (right)).
Definition 6.5. For every e = (u, v ) ∈ V ×V , we define the reverse arc of e as ē := (v , u) and set Ḡ = (V, Ē)
with Ē := { ē | e ∈ E}. The residual capacity of e ∈ E ∪ Ē with respect to f is defined by
µ_f(e) := µ(e) − f(e) if e ∈ E, and µ_f(e) := f(ē) if e ∈ Ē.
Remark 6.6. Observe that residual capacities are well-defined since E ∩ Ē = ∅ by our additional re-
quirement for networks that arcs may not be present in both directions. Without this requirement, the
residual graph needs to be defined as a multigraph (see Remark 3.2), which would complicate notation.
Alternatively, we can avoid multigraphs if we subdivide each arc in E by inserting an additional vertex,
which ensures that E ∩ Ē = ∅. However, this increases the number of vertices to Θ(m), which distorts the
running times we achieve below.
We can now make the notion precise that a flow has the potential to be increased.
Theorem 6.8. The flow f is a maximum flow if and only if there is no augmenting path in G f .
Proof. If there is an s-t-path P in G_f, we can increase flow values along this path by min_{e∈P} µ_f(e) without
violating capacity constraints (or flow conservation) – hence, f cannot be a maximum flow.
If there is no s-t-path in G_f, then the set S ⊆ V of all vertices reachable from s in G_f induces an empty
cut δ+_{G_f}(S) = ∅ in G_f. For every arc e ∈ δ+_G(S) ∪ δ+_Ḡ(S) we thus have µ_f(e) = 0. With this, we obtain

0 = ∑_{e∈δ+_G(S)} µ_f(e) + ∑_{e∈δ+_Ḡ(S)} µ_f(e)
  = ∑_{e∈δ+_G(S)} µ_f(e) + ∑_{e∈δ−_G(S)} µ_f(ē)
  = ∑_{e∈δ+_G(S)} ( µ(e) − f(e) ) + ∑_{e∈δ−_G(S)} f(e),

and therefore

|f| = ∑_{e∈δ+_G(S)} f(e) − ∑_{e∈δ−_G(S)} f(e) = ∑_{e∈δ+_G(S)} µ(e).

By Proposition 6.4 there cannot be any s-t-flow of larger value. It follows that f is a maximum s-t-
flow.
Theorem 6.8 suggests a straightforward algorithm to find a maximum flow: while there is an augment-
ing path P in G_f, augment the flow by the largest amount possible along P. This algorithm is called the
Ford-Fulkerson method, where the designation as a method is due to the fact that the algorithm can be
implemented in different ways, since it does not specify which path to choose in each step.
Figure 6.2: Left: a network where FORDFULKERSON needs Ω(|f⋆|) iterations. Right: a network where
FORDFULKERSON may not terminate (for r = (√5 − 1)/2).
Algorithm: FORDFULKERSON(G, µ, s, t)
input: network (G = (V, E), µ, s, t)
output: maximum s- t -flow
f ←0
while ∃ any s- t -path P in G f :
δ ← mine∈P µ f (e)
AUGMENT(P, δ)
return f
Algorithm: AUGMENT(P, δ)
for e ∈ P :
if e ∈ E :
f (e) ← f (e) + δ
else
f (ē) ← f (ē) − δ
For integral capacities (i.e., µ ∈ N), we can show that this method always works, independent of which
augmenting path we choose in each step. Note that each augmenting path can be found in time O(m)
by using DFS/BFS, so that the following theorem yields a running time of O(|f⋆| · m).
Theorem 6.9. Every implementation of FORDFULKERSON finds a maximum flow in O(|f⋆|) augmentations
for a network with integral capacities. For some choices of augmenting paths, FORDFULKERSON needs
Ω(|f⋆|) augmentations.
Proof. By Theorem 6.8, the algorithm is correct if it terminates. Since all capacities are integral, we
increase the flow value by at least 1 in each iteration. The total number of iterations can thus be at
most |f⋆|. For the lower bound, consider the example in Figure 6.2 (left). This network has |f⋆| = 2x
and needs 2x augmentations if we decide to augment along paths of length 3 in each iteration.
Corollary 6.10. Every network with integral capacities admits an integral maximum flow.
In contrast, it can be shown that, in general, FORDFULKERSON does not even terminate if augmenting
paths are chosen in an unfortunate way (see Figure 6.2 (right)). The proof of the following statement is
left as an exercise.
Proposition 6.11. FORDFULKERSON may not always terminate for real-valued capacities.
6.2 The Edmonds-Karp Algorithm
Inspecting the examples in Figure 6.2, a natural idea to improve the Ford-Fulkerson method is to ensure
that short augmenting paths are favored. The corresponding algorithm that always takes a shortest path
in G f is called the Edmonds-Karp algorithm.
Algorithm: EDMONDSKARP(G, µ, s, t)
input: network (G = (V, E), µ, s, t)
output: maximum s- t -flow
f ←0
while ∃s- t -path in G f :
P ← shortest s- t -path in G f (unweighted)
δ ← mine∈P µ f (e)
AUGMENT(P, δ)
return f
The following lemma establishes that EDMONDSKARP makes progress in some sense, between increasing
flow along an arc and rerouting it later.
Lemma 6.12. Let P_1, P_2, . . . denote the augmenting paths chosen by EDMONDSKARP in this order. Then,
for every two such paths P_k, P_ℓ with k ≤ ℓ we have |P_ℓ| ≥ |P_k| + 2 · min{1, m_kℓ}.
Proof. We may assume that there is no path P_i with i ∈ {k+1, . . . , ℓ−1} that contains an arc e with ē ∈ P_ℓ.
Otherwise, by induction on ℓ − k, we immediately have |P_ℓ| ≥ |P_i| + 2 ≥ |P_k| + 2 ≥ |P_k| + 2 · min{1, m_kℓ}.
Let Ē_kℓ := { e ∈ E ∪ Ē | e, ē ∈ P_k ∪ P_ℓ } and let f_k denote the flow before augmenting along P_k in the k-th
iteration. We consider the graphs H_k = (V, E_k) := (V, P_k \ Ē_kℓ), H_ℓ = (V, E_ℓ) := (V, P_ℓ \ Ē_kℓ), and H_kℓ :=
(V, (P_k ∪ P_ℓ) \ Ē_kℓ). Since the paths P_{k+1}, . . . , P_{ℓ−1} do not contain any reverse arcs to arcs in P_ℓ, we have
P_ℓ \ Ē_kℓ ⊆ G_{f_k} and hence H_k, H_ℓ, H_kℓ ⊆ G_{f_k}. Moreover, we have |δ+_{H_k}(v)| + |δ+_{H_ℓ}(v)| = |δ−_{H_k}(v)| + |δ−_{H_ℓ}(v)|
for all v ∈ V \ {s, t}. This allows us to construct a new walk W_1 in H_kℓ as follows: Start at s and
repeatedly take an unused arc in H_k or H_ℓ, until we finally reach t. Because of |δ+_{H_k}(s)| + |δ+_{H_ℓ}(s)| = 2
and |δ−_{H_k}(s)| + |δ−_{H_ℓ}(s)| = 0, we can repeat this process once to obtain an additional walk W_2 in H_kℓ using
different arcs of H_k and H_ℓ (in H_kℓ some arcs may be used twice). By Proposition 3.8, we can obtain two
s-t-paths Q_1 ⊆ W_1 and Q_2 ⊆ W_2 in H_kℓ ⊆ G_{f_k}. Since P_k is an (unweighted) shortest s-t-path in G_{f_k}, we
must have

2|P_k| ≤ |Q_1| + |Q_2| ≤ |W_1| + |W_2| ≤ |E_k| + |E_ℓ| = |P_k| + |P_ℓ| − 2m_kℓ ≤ |P_k| + |P_ℓ| − 2 · min{1, m_kℓ}.
With this, we can bound the running time of the Edmonds-Karp algorithm even for general capacities.
Theorem 6.13. EDMONDSKARP can be implemented to find a maximum flow in time O(m²n).
Proof. Consider an arc e ∈ E ∪ Ē and all augmenting paths P_1, P_2, . . . , P_k that saturate arc e. For i ∈
{1, . . . , k − 1}, between the augmentations along P_i and P_{i+1} there must have been an augmenting path P
utilizing ē. According to Lemma 6.12 we have
|P_{i+1}| ≥ |P| + 2 ≥ |P_i| + 4.
With |P_k| ≤ n − 1 and |P_1| ≥ 1, this implies 4(k − 1) + 1 ≤ n − 1 and thus k = O(n).
Since every augmenting path saturates at least one arc and there are at most O(n) augmenting paths
saturating the same arc, the total number of augmenting paths can be at most |E ∪ Ē| · O(n) = O(mn).
We can find (unweighted) shortest paths using BFS in time O(m) (Theorem 3.18). Overall this gives a
running time of O(m²n).
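A compact Python sketch of EDMONDSKARP on a capacity matrix could look as follows; residual capacities of arcs and reverse arcs are stored together in a single matrix, in the spirit of Definition 6.5. The representation and names are our own choices.

from collections import deque

def edmonds_karp(cap, s, t):
    """Sketch of EDMONDSKARP.

    cap  -- n x n matrix; cap[u][v] > 0 iff (u, v) is an arc with that capacity
    s, t -- source and sink
    Returns the value of a maximum s-t-flow.
    """
    n = len(cap)
    res = [row[:] for row in cap]          # residual capacities µ_f
    value = 0
    while True:
        # BFS for a shortest augmenting s-t-path in the residual graph
        parent = [None] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] is None:
            u = queue.popleft()
            for v in range(n):
                if res[u][v] > 0 and parent[v] is None:
                    parent[v] = u
                    queue.append(v)
        if parent[t] is None:              # no augmenting path: flow is maximum
            return value
        # bottleneck residual capacity delta along the path
        delta, v = float("inf"), t
        while v != s:
            u = parent[v]
            delta = min(delta, res[u][v])
            v = u
        # AUGMENT(P, delta)
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= delta
            res[v][u] += delta
            v = u
        value += delta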
In particular, we have found an algorithmic existence proof of maximum flows in finite networks.
Observe that Theorem 6.8 provides a straightforward way of proving that a given flow is not a maximum
flow. Conversely, by Proposition 6.4, we can prove that a flow is a maximum flow by finding a cut whose
capacity is equal to the flow value. The following max-flow min-cut theorem guarantees that we can
always find such a cut, and thus provides an elegant way of proving maximality. This structural insight
is the most fundamental result for network flows.
Theorem 6.15 (max-flow min-cut). The value of a maximum s-t-flow is equal to the minimum capacity
over all s-t-cuts, i.e.,
|f⋆| = min_{S⊆V\{t} : s∈S} µ(S).
Conversely, by Theorem 6.8, there is no augmenting path in G_{f⋆}. We can therefore let S ⊆ V \ {t}
denote the set of vertices reachable from s in G_{f⋆}. Then, δ+_{G_{f⋆}}(S) = ∅. Hence, µ_{f⋆}(e) = 0 for all
e ∈ δ+_G(S) ∪ δ+_Ḡ(S). With Proposition 6.4, we obtain

|f⋆| = ∑_{e∈δ+_G(S)} f⋆(e) − ∑_{e∈δ−_G(S)} f⋆(e)
     = ∑_{e∈δ+_G(S)} f⋆(e) − ∑_{e∈δ−_G(S)} µ_{f⋆}(ē)
     = ∑_{e∈δ+_G(S)} ( µ(e) − µ_{f⋆}(e) ) − ∑_{e∈δ+_Ḡ(S)} µ_{f⋆}(e)
     = ∑_{e∈δ+_G(S)} µ(e)
     ≥ ∑_{e∈δ+_G(S⋆)} µ(e).
Another crucial and intuitive property of flows is that they can always be decomposed in the following
sense.
Theorem 6.16 (path decomposition). For every s-t-flow f there exists a family F = P ∪ K with |F| ≤ |E|
of s-t-paths P and cycles K and weights c : F → R>0 with f(e) = ∑_{F∈F : e∈F} c(F) for all e ∈ E and
|f| = ∑_{P∈P} c(P).
Proof. We prove the statement by induction over |E|. For |E| = 0 or f = 0 the statement trivially holds
with F = ∅. Otherwise, we let W = (v_0, e_1, v_1, . . . , e_k, v_k) be a walk of maximum length with f(e_i) > 0
for all i ∈ {1, . . . , k} and such that v_0, . . . , v_{k−1} are all distinct. Then, k ≥ 1 since f ≠ 0. By flow conservation and
since W is maximal, we either have v_k ∈ {v_0, . . . , v_{k−1}}, or v_0 = s and v_k = t. Accordingly, W contains a
walk F that is a cycle or an s-t-path. Let e_min ∈ arg min_{e∈F} f(e). We define G′ = (V, E′) := G − e_min and a
flow f′ : E′ → R≥0 in G′ with

f′(e) := f(e) − f(e_min) if e ∈ F, and f′(e) := f(e) otherwise.

Let F′ = P′ ∪ K′ be the family of paths and cycles that we obtain by induction for G′ and f′, and let
c′ : F′ → R>0 be the corresponding weights. We set F := F′ ∪ {F}, c(F) := f(e_min), and c(F′) := c′(F′)
for F′ ∈ F′. By induction, we have |F| = |F′| + 1 ≤ |E′| + 1 = |E|. Let δ_e, δ_F ∈ {0, 1} such that δ_e = 1 if
and only if e ∈ F and δ_F = 1 if and only if F is a path. With this, we obtain

f(e) = f′(e) + δ_e · f(e_min) = ∑_{F′∈F′ : e∈F′} c′(F′) + δ_e · c(F) = ∑_{F∈F : e∈F} c(F),
|f| = |f′| + δ_F · f(e_min) = ∑_{P′∈P′} c′(P′) + δ_F · c(F) = ∑_{P∈P} c(P).
An important consequence of the above structural insights is Menger’s theorem, which states that the
number of disjoint paths equals the size of a separating set. This result holds for both vertices and edges
of both directed and undirected graphs.
Theorem 6.17 (Menger, edge version). Let G = (V, E) be a graph and s, t ∈ V . The maximum num-
ber pmax of edge-disjoint s- t -paths in G is equal to the minimum size kmin of a set E 0 ⊆ E of edges for
which t is not reachable from s in (V, E \ E 0 ).
Proof. First consider directed graphs. We obviously have k_min ≥ p_max, since we have to remove at
least p_max edges to destroy a fixed set of disjoint paths. We introduce arc capacities µ = 1 for every arc
and let f⋆ be an integral (by Corollary 6.10) maximum flow in the network (G, µ, s, t). By Theorem 6.16,
we can decompose f⋆ into a set P of s-t-paths and a set K of cycles, each with weight 1. It follows that
no arc e can be part of more than one path in P, since 1 = µ(e) ≥ f⋆(e) = ∑_{F∈P∪K : e∈F} c(F). Hence, the
maximum flow value is |f⋆| = |P| ≤ p_max. For every s-t-cut δ+(S) we have |δ+(S)| ≥ k_min, since we could
remove the arcs of the cut to destroy all s-t-paths. By Theorem 6.15, we thus have |f⋆| = |δ+(S⋆)| ≥ k_min
for an s-t-cut δ+(S⋆) of minimum capacity. With |f⋆| ≤ p_max it follows that k_min ≤ p_max.
For undirected graphs, we can replace every edge e = {u, v } by arcs (u, x e ), (x e , ye ), ( ye , v ), ( ye , u), (v , x e ),
where we introduced additional vertices x e , ye (see Figure 6.3). In the resulting directed graph we can
still destroy reachability between u and v by removing a single arc. On the other hand, edge-disjoint
paths in the original undirected graph still correspond to edge-disjoint paths in the directed graph.
Figure 6.3: From left to right: Transformation for undirected graphs in the proof of Theorem 6.17.
Figure 6.4: From left to right: Transformation in the proof of Theorem 6.18.
Theorem 6.18 (Menger, vertex version). Let G = (V, E) be a graph and t ∈ V be not adjacent to s ∈ V .
The maximum number pmax of vertex-disjoint s- t -paths in G is equal to the minimum size kmin of a set
U ⊆ V \ {s, t} of vertices for which t is not reachable from s in G[V \ U].
Proof. First consider directed graphs. We replace every vertex v ∈ V in G by an arc (vin , vout ) and replace
every arc (u, v ) by the arc (uout , vin ) and call the resulting graph G 0 (see Figure 6.4). Now, there is a
canonical bijection between s- t -paths in G and sout - t in -paths in G 0 . In particular, two s- t -paths in G are
vertex disjoint if and only if the corresponding paths in G 0 are arc-disjoint. Furthermore, s and t can be
separated in G by deleting k vertices if and only if sout and t in can be separated in G 0 by removing k arcs,
since it is always sufficient to remove arcs of the form (vin , vout ) in G 0 , which corresponds to removing
the vertex v in G . The statement thus follows for graph G by applying Theorem 6.17 to G 0 .
We can reduce the statement for undirected graphs to directed graphs with the same construction as in
the proof of Theorem 6.17. Note that we need to require that s and t are not adjacent for kmin to be
well-defined.
7 Matchings
We now turn to simple assignment problems where objects have to be paired together while respecting
compatibilities between objects. One example of such a pairing problem is kidney exchange programs,
where people in need of an organ who have somebody willing to donate that is biologically incompatible
need to find other people to perform cross-exchanges of organs. Another example is matchmaking
in sports or gaming, where teams or individuals have to be paired up against each other. Pairings of this
type can be formally modeled as follows.
Definition 7.1. A matching in an undirected graph G = (V, E) is a set M ⊆ E of pairwise disjoint edges.
A vertex v ∈ V \ ⋃_{e∈M} e is (M-)exposed and we write v ∉ M. A vertex v ∈ ⋃_{e∈M} e is (M-)covered and we
write v ∈ M. The matching M is
The algorithmic problem pairing the largest number of objects can now be expressed as the problem
of finding a maximum matching. This problem is sometimes also referred to as maximum cardinality
matching problem, to emphasize the difference to the maximum weight matching problem on weighted
graphs.
Maximum Matching Problem
input: undirected graph G = (V, E)
problem: find maximum matching M ⊆ E of G
We can define a notion of augmenting paths to obtain an optimality criterion very similar to Theorem 6.8
for network flows (see Figure 7.1).
Figure 7.1: From left to right: maximal matching, augmenting path, maximum matching.
Figure 7.2: Left: bipartite graph G . Right: flow network NG induced by G (µ = 1).
In this lecture, we consider the maximum matching problem only on a restricted class of graphs whose
vertices can be split into two sets such that every edge runs between these sets. These graphs arise when-
ever we are pairing objects of two types into mixed pairs, e.g., when assigning applicants to positions,
men to women, operators to machines, etc.
We can reduce the problem of finding a maximum matching in a bipartite graph to the maximum flow
problem on the following graph (see Figure 7.2).
This construction allows us to rely on algorithms for computing maximum flows, rather than using
Theorem 7.3 directly.
Theorem 7.6. In bipartite graphs, the maximum matching problem can be solved in time O(nm).
Proof. Observe that there is a bijection between matchings in a bipartite graph G and integral s- t -flows
in NG , such that every matching M is uniquely associated with a flow f with |M | = | f |. This implies that
we can find a maximum matching in G by finding a maximum s- t -flow in NG . We can accomplish the
latter using the Ford-Fulkerson method, as shown below.
Figure 7.3: A bipartite graph with a minimum vertex cover of size three (black vertices) and a maximum
matching of size three (thick edges). Observe that the vertices that are not part of the vertex
cover violate the Hall condition.
Algorithm: FORDFULKERSONMATCHING(G)
input: bipartite graph G = (U ∪ V, E)
output: maximum cardinality matching M ⊆ E in G
(H, µ, s, t) ← NG
f ← FORDFULKERSON(H, µ, s, t)
return {{u, v } ∈ E|(u, v ) ∈ U × V ∧ f ((u, v )) = 1}
Since the network N_G has unit capacities, Theorem 6.9 implies that the above algorithm finds a maximum
cardinality matching in time O(|f⋆| m), where |f⋆| denotes the value of a maximum s-t-flow in N_G. The
cut induced by the set {s} has capacity |U| ≤ n in H, hence we have |f⋆| ≤ n (by Proposition 6.4), and
the claimed running time of O(nm) follows.
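The reduction can also be implemented directly: searching one augmenting path per vertex of U by depth-first search corresponds to one Ford-Fulkerson augmentation in N_G. A minimal Python sketch (representation and names are our own choices) could look as follows.

def bipartite_matching(U, adj):
    """Sketch of maximum bipartite matching via augmenting paths.

    U    -- the vertices of one side of the bipartition
    adj  -- adj[u] lists the neighbors of u on the other side
    Returns a dict mapping every matched vertex v to its partner in U.
    """
    match = {}                       # match[v] = u for every matched edge {u, v}

    def augment(u, visited):
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # v is exposed, or its current partner can be re-matched elsewhere
            if v not in match or augment(match[v], visited):
                match[v] = u
                return True
        return False

    for u in U:
        augment(u, set())            # one augmentation per vertex of U
    return match

Each call to augment takes time O(m), so this sketch also runs in time O(nm).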
In addition to this algorithmic result, we can translate structural results from the theory of maximum
flows to maximum matchings. This allows us to express the size of a maximum matching with respect to
the following combinatorial object (see Figure 7.3).
Definition 7.7. A vertex cover of an undirected graph G = (V, E) is a set X ⊆ V with ⋃_{v∈X} δ(v) = E. A
vertex cover is minimum if it has the smallest cardinality among all vertex covers.
We have already seen that the existence of an augmenting path proves that a matching is not maximum
(Theorem 7.3). Conversely, we can prove that a matching is maximum by finding an s-t-cut of minimum
cardinality in the flow network. This can be expressed more concisely via the following theorem.
Theorem 7.8 (Kőnig). In a bipartite graph G , the cardinality of a maximum matching is equal to the
cardinality of a minimum vertex cover.
Proof. Consider the flow network NG = (H, µ, s, t) induced by G . The cardinality of a maximum matching
in G is equal to the maximum number of vertex disjoint s- t -paths in H . On the other hand, the cardinality
of a minimum vertex cover corresponds to the number of vertices (other than s or t ) that we need to
remove from H to destroy all s- t -paths. The statement of the theorem thus follows from Menger’s
Theorem for vertex-disjoint paths (Theorem 6.18).
With this, we can derive a characterization of bipartite graphs that admit perfect matchings. We first give
a characterization that focuses on one of the parts of the bipartite graph (see Figure 7.3 for an example).
A symmetrical characterization will then follow directly.
Theorem 7.9 (Hall). A bipartite graph G = (U ∪ V, E) has a matching covering all vertices of U if and
only if
|Γ(U′)| ≥ |U′| for all U′ ⊆ U.
Proof. Clearly, if there is a matching covering all vertices of U , then the Hall condition |Γ (U 0 )| ≥ |U 0 |
holds for all U 0 ⊆ U .
Now, suppose that there is no matching covering all vertices of U and let M ? denote a maximum matching
in G . Then, |M ? | < |U| and König’s Theorem (Theorem 7.8) yields that the cardinality of a minimum
vertex cover X ? is also bounded by |X ? | < |U|. Let U 0 ∪ V 0 := X ? with U 0 ⊆ U and V 0 ⊆ V . By definition
of X ? , we have that X ? needs to cover all edges in δ(U \ U 0 ), hence Γ (U \ U 0 ) ⊆ V 0 . It follows that
|Γ(U \ U′)| ≤ |V′| = |X⋆| − |U′| < |U| − |U′| = |U \ U′|, i.e., the Hall condition is violated for U \ U′.
Corollary 7.10 (“Marriage” Theorem). A bipartite graph G = (U ∪ V, E) has a perfect matching if and
only if |U| = |V | and |Γ (U 0 )| ≥ |U 0 | for all U 0 ⊆ U .
8 Complexity
In the previous chapters we developed efficient algorithms for various algorithmic problems. In many
cases we repeatedly improved the running times of our algorithms, and for most problems even more
efficient algorithms are known (see lecture “Combinatorial Optimization”). Ultimately, the goal is to find
an “optimal” algorithm with best-possible running time for each problem. Unfortunately, proving lower
bounds on achievable running times has proved extremely challenging, and very few super-linear lower
bounds are known beyond the Ω(n log n) lower bound for sorting (Theorem 3.26).
This lack of theoretical lower bounds is particularly problematic for problems where the best currently
known solutions are extremely inefficient, since we do not know whether efficient algorithms are impos-
sible or have simply not been found yet. We typically distinguish between efficiently solvable problems
that admit polynomial running times and problems that require an exponential running time (in the
worst-case). While a strict separation of problems according to their inherent difficulty (or “complex-
ity”) still eludes the mathematical community, complexity theory has developed an entire hierarchy of
complexity classes to classify and compare algorithmic problems by difficulty. This hierarchy hinges on
a fundamental conjecture relating problems with efficient solutions to problems where solutions can
efficiently be verified.
We will now give an introduction into the fundamental concepts of complexity theory. These will provide
us with a way of showing that a specific algorithmic problem is very unlikely to admit an efficient
solution. To do this, it will prove convenient to restrict ourselves to problems that only have two possible
solutions. In the following, the binary representation of an instance is defined by a (given) bijective map
b : I → {0, 1}∗ .
Definition 8.1. An algorithmic problem (I, (S_I)_{I∈I}) is called a decision problem if ∅ ⊂ S_I ⊂ {'yes', 'no'} for
all I ∈ I. The input size n := |I| of an instance I ∈ I of a decision problem is defined to be equal to the
number of bits in the binary representation of I.
MST Problem
input: undirected graph G = (V, E), weights c : E → R≥0, number C ∈ R≥0
problem: Is there a spanning tree T of G with ∑_{e∈T} c(e) ≤ C?
VERTEXCOVER Problem
input: undirected graph G = (V, E), integer k ∈ N
problem: Is there a vertex cover X ⊆ V in G with |X | ≤ k?
MATCHING Problem
input: undirected graph G = (V, E), integer k ∈ N
problem: Is there a matching M ⊆ E in G with |M | ≥ k?
GENERALIZEDCHESS Problem
input: chess pieces on an n × n chessboard
problem: Can the active player guarantee victory?
Remark 8.2. Informally, we distinguish between decision problems, where we simply need to decide
feasibility (e.g., is there a path of length at most L ?), optimization problems, where we need to find the
optimum value (e.g., what is the length of a shortest path?), and search problems, where we need to
find an optimal object (e.g., find a shortest path!). We can often obtain a solution for the optimization
problem from a solution for the decision problem by binary search for the best feasible value, and we can
obtain a solution for the search problem from a solution for the optimization problem by backtracking
or dynamic programming. In that sense, if we are only interested in the question whether a problem is
polynomially solvable, we can often restrict ourselves to the decision problem without loss of generality.
We can now introduce our first complexity class that contains all efficiently solvable problems.
Definition 8.3. The complexity class P consists of all decision problems that can be solved in polynomial
time.
Example. All problems we have considered in the previous chapters can be formulated as decision
problems. For example, we can ask whether a spanning tree / path / flow / matching of a given weight
/ length / value / cardinality exists. As we have seen, all these problems are in P.
The second complexity class we will be interested in consists of all problems for which it can be proven
efficiently that the solution for an instance is ‘yes’ (we do not require this for the ‘no’-case). To formally
define this class, we first need to make the notion of efficient proofs precise.
Definition 8.5. The complexity class NP is the set of all decision problems that can be polynomially
verified.
Example. The VERTEXCOVER problem is in NP. To see this, let I be the set of all instances of VERTEXCOVER
and let C = 2^N. There is a polynomial algorithm that, given a set c ∈ C, a graph G = (V = {v_1, . . . , v_n}, E)
and an integer k ∈ N, decides whether X_c := { v_i | i ∈ c } is a vertex cover in G of size at most k: Obviously,
such an algorithm exists, since we can check in linear time whether every edge of G intersects X_c and
whether |X_c| ≤ k. Now, let Π′ = (I′, (S′_{I′})_{I′∈I′}) with I′ = C × I be the decision problem such that, for all
I = (G, k) ∈ I, we have S′_{(c,I)} = {'yes'} if and only if X_c is a vertex cover of size at most k in G. By
definition, for I = (G, k) ∈ I there is a c ∈ C with S′_{(c,I)} = {'yes'} if and only if G has a vertex cover of size
at most k. Hence, Π′ certifies Π. Moreover, we have |c| ≤ |I| and Π′ ∈ P, hence VERTEXCOVER can be
polynomially verified and VERTEXCOVER ∈ NP.
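A sketch of such a verifier in Python (the function name and input encoding are our own illustration) might look as follows.

def verify_vertex_cover(edges, k, certificate):
    """Polynomial verifier for VERTEXCOVER as in the example above.

    edges       -- list of edges, given as 2-element tuples (u, v)
    k           -- bound on the size of the vertex cover
    certificate -- a set of vertices, the proposed vertex cover X_c
    Returns True iff the certificate proves that the instance is a 'yes'-instance.
    """
    if len(certificate) > k:
        return False
    # every edge must intersect the certificate (linear time)
    return all(u in certificate or v in certificate for (u, v) in edges)

# Example: a path on 4 vertices has a vertex cover of size 2.
print(verify_vertex_cover([(1, 2), (2, 3), (3, 4)], 2, {2, 3}))   # True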
Remark 8.6. Almost always the certificate that proves inclusion of a problem in the class NP is simply
given by a solution to the corresponding search problem. This is true in the example above, where a
certificate is simply a vertex cover of the required size, and it will be true in all proofs below.
It is immediate that every problem in P is also in NP, since we can simply reuse its polynomial time
algorithm to verify (both ’yes’ and ’no’) instances. We will see other problems below that fall into NP but
are believed to lie outside P.
Proof. This follows by definition, since every decision problem verifies itself (e.g., with C = {∅}).
Remark 8.8. The name of the class NP stands for “nondeterministic polynomial”, because it contains
exactly those decision problems that can be solved in polynomial time by a nondeterministic algorithm
that is allowed to “guess” what to do in every step and always guesses correctly. Intuitively, such an
algorithm can simply guess the polynomial certificate of a “yes”-instance, and, conversely, we can use
the outcomes of its (polynomially many) guesses as a certificate.
Remark 8.9. Note that there are (decision) problems that provably lie outside the class NP (for example
GENERALIZEDCHESS). For a discussion of this domain of the complexity landscape, we refer to a class on
complexity theory. Also note that there even are undecidable problems, for which no algorithm can exist
at all that always terminates and yields the correct answer!
The distinction between P and NP lies at the heart of complexity theory. We will see how to show that a
problem is “hardest possible” within NP, which is considered a (very) strong indication for a problem not
being polynomial time solvable. However, it is still an open problem to prove (or disprove) that P ≠ NP.
In fact, this problem is part of the famous list of the Millennium Problems published by the Clay Institute:
• Yang-Mills and Mass Gap
• Riemann Hypothesis
• P vs NP Problem
• Navier-Stokes Equation
• Hodge Conjecture
• Poincaré Conjecture (the only solved problem of this list)
• Birch and Swinnerton-Dyer Conjecture
8.1 NP-completeness
We have seen examples where we were able to solve an algorithmic problem by reformulating it as a
special case of another problem, e.g., we solved the maximum bipartite matching problem by formulating
it as a flow problem. If this can be done with little overhead, it is justified to consider the former problem
to not be harder than the latter. Since we are only interested in distinguishing polynomial time solvable
problems from problems that cannot be solved in polynomial time, we can allow a polynomial effort for
the transformation between problems. We now make this precise.
Definition 8.10. We say that an algorithmic problem Π = (I, (S_I)_{I∈I}) is polynomial time reducible to
problem Π′ = (I′, (S′_{I′})_{I′∈I′}) if there is a function R : I → I′ that satisfies S_I = S′_{R(I)} for all I ∈ I and can
be computed in time polynomial in |I|.
This notion gives us a partial ordering by difficulty between problems outside of P. In particular, we can
introduce the class of problems that are “hardest” within the class NP.
Definition 8.12. An algorithmic problem Π is NP-hard if all Π0 ∈ NP are polynomial time reducible to Π.
If, additionally, Π ∈ NP, then Π is NP-complete.
The above definition seems very restrictive, and it is not clear a priori that any NP-complete problems
exist. Surprisingly, it turns out that there are a lot of NP-complete problems. The first problem for which
NP-completeness was established is the following:
Definition 8.13. Let X = {x_i}_{i=1,...,n} be a set of variables. We call Λ = { x, x̄ | x ∈ X } the set of literals
over X and C = λ_1 ∨ λ_2 ∨ · · · ∨ λ_k with λ_i ∈ Λ for i ∈ {1, . . . , k} a clause over X of size |C| = k. A CNF
(conjunctive normal form) formula is a Boolean formula of the form

C_1 ∧ C_2 ∧ · · · ∧ C_m,

that is given by a set of clauses C = {C_i}_{i=1,...,m} over a set of variables X. An assignment for a CNF formula
is a function α : X → {0, 1}, and α is called satisfying if ∑_{λ∈C} α(λ) ≥ 1 for all C ∈ C, with α(λ) := 1 − α(λ̄)
for λ ∈ Λ \ X.
(x 1 ∨ x̄ 2 ∨ x 3 ) ∧ (x̄ 1 ∨ x̄ 3 ) ∧ (x 1 ∨ x 2 ∨ x̄ 3 )
is unsatisfiable.
It was a breakthrough and the birth of complexity theory when Cook (1971) showed that every prob-
lem in NP can be reduced to deciding whether a SAT formula admits a satisfying assignment. Roughly
speaking, Cook showed that the outcome of an algorithm can be encoded as a single Boolean formula.
Corollary 8.16. There is no polynomial time algorithm for SAT, unless P = NP.
Knowing an NP-complete problem gives us a much simpler way of establishing NP-completeness for
other problems: Instead of showing that all problems in NP can be reduced to the problem, it suffices to
show that the NP-complete problem reduces to it.
This simple oberservation has allowed to establish many (thousands of) additional NP-hard problems.
The fact that not a single polynomial time algorithm has been found for any of these problems is seen as
a strong indication that NP 6= P, because of the following fact.
58
8.2 Important NP-complete problems
We will now prove NP-completeness for some of the historically most important decision problems. Every
proof of NP-completeness for a decision problem Π0 = (I 0 , (S I0 0 ) I 0 ∈I 0 ) that is based on Obeservation 8.17
needs to address the following aspects:
(i) Proof that Π0 ∈ NP.
(ii) Decision problem Π = (I , (S I ) I∈I ) that we are reducing from.
(iii) Construction of an instance I 0 ∈ I 0 for every instance I ∈ I .
(iv) Equivalence of I 0 and I , i.e., S I 0 = {‘yes’} if and only if S I = {‘yes’}.
(v) Proof that our construction can be carried out in time polynomial in |I|.
We first consider two important variants of SAT that are often useful for reductions.
3SAT Problem
input: instance C = {Ci }i=1,...,m , X = {x i }i=1,...,n of SAT with |C| = 3 for all C ∈ C
problem: Is there a satisfying assignment?
Proof. From SAT ∈ NP it follows that 3SAT ∈ NP, since every instance of 3SAT is an instance of SAT.
Wk
We reduce SAT to 3SAT by replacing every clause C = i=1 λi of length k 6= 3 by an equivalent 3SAT
formula Z of polynomial length (in k). To this end, we introduce new variables { yi }i=1,...,k+1 . If C = λ1 ,
we set
Z = (λ1 ∨ y1 ∨ y2 ) ∧ (λ1 ∨ ȳ1 ∨ y2 ) ∧ (λ1 ∨ y1 ∨ ȳ2 ) ∧ (λ1 ∨ ȳ1 ∨ ȳ2 ),
and, if C = λ1 ∨ λ2 , we set
Z = (λ1 ∨ λ2 ∨ y1 ) ∧ (λ1 ∨ λ2 ∨ ȳ1 ).
It is easy to see that in either case Z is satisfied by an assignment α: X → {0, 1} if and only if α(λi ) = 1
for some i ∈ {1, k}.
If k > 3, we set
Now, Z can be satisfied by setting α(λi ) = 1 for any i ∈ {1, . . . , k} and α( y j ) = 1 for all j ≤ i − 2 and
α( y j ) = 0 for all j ≥ i − 1. On the other hand, we claim that there is no satisfying assignment α with
α(λi ) = 0 for all i ∈ {1, . . . , k}. For the sake of contradiction, assume such an assignment exists. Then
the first clause of Z forces α( y1 ) = 1, which means that the second clause forces α( y2 ) = 1, etc. The last
clause cannot be satisfied anymore.
We can use this variant of satisfiability to obtain NP-hardness for various fundamental graph problems.
As a starting point, we consider the following problem.
Definition 8.20. A set of vertices X ⊆ V is stable (or independent) in an undirected graph G = (V, E) if
G[X ] = (X , ;).
STABLESET Problem
input: undirected graph G = (V, E), integer k ∈ N
problem: Is there a stable set X ⊆ V in G with |X | ≥ k?
59
Theorem 8.21. STABLESET is NP-complete.
Proof. It is easy to check in polynomial time whether a given set of vertices is stable. This means that
we can use the stable set X of size |X | ≥ k as a polynomial certificate for a ‘yes’-instance of STABLESET.
Hence STABLESET ∈ NP.
V m W3
We reduce 3SAT to STABLESET. Let Z = i=1 j=1 λi j be a 3S AT formula. We construct a graph
G = (V, E) with 3m vertices that has a stable set of size m if and only if Z is satisfiable. We let
V = {1, . . . , m} × {1, 2, 3} and introduce edges {(i, j), (i 0 , j 0 )} ∈ E for every λi j = λ̄i 0 j 0 , as well as the
edges {(i, 1), (i, 2)}, {(i, 2), (i, 3)}, {(i, 3), (i, 1)} ∈ E . Observe that our construction can be carried out in
polynomial time.
Suppose there is a satisfying assignment α for Z and choose ji such that α(λi ji ) = 1 for all i ∈ {1, . . . , m}.
Then, by construction, {(i, ji )}i∈{1,...,m} is a stable set of size m in G .
Conversely, let X be a stable set of size m in G . Then, |{(i, 1), (i, 2), (i, 3)} ∩ X | ≤ 1 for all
i ∈ {1, . . . , m}, since {(i, 1), (i, 2)}, {(i, 2), (i, 3)}, {(i, 3), (i, 1)} ∈ E . Since |X | = m, this implies
|{(i, 1), (i, 2), (i, 3)} ∩ X | = 1 for all i ∈ {1, . . . , m}. We can find an assignment α with α(λi j ) = 1 for
all (i, j) ∈ X , since no two pairs (i, j), (i 0 , j 0 ) corresponding to literals λi j = λ̄i 0 j 0 can be in X by construc-
tion of G . This way, at least one literal of each clause is set to 1, and hence α is a satisfying assignment
for Z .
We have already seen that VERTEXCOVER is polynomial time solvable on bipartite graphs by Königs Theo-
rem (Theorem 7.8 and Theorem 7.6). In contrast, the problem turns out to be NP-complete on general
graphs (even though the matching problem remains polynomial time solvable, see lecture “Combinatorial
Optimization”).
Proof. It is easy to check in polynomial time whether a given set X ⊆ V is a vertex cover, hence
VERTEXCOVER ∈ NP. We can trivially reduce STABLESET to VERTEXCOVER, since X is a stable set in a
graph G = (V, E) if and only if V − X is a vertex cover, i.e., there is a stable set of size at least k if and
only if there is a vertex cover of size at most n − k.
Another important covering problem where three disjoint sets have to be covered by selecting triples is
the following:
3DMATCHING Problem
input: disjoint sets A, B, C , triples T ⊆ A × B × C S
problem: Is there a set of disjoint triples M ⊆ T with t∈M t = A ∪ B ∪ C ?
t = A ∪ B ∪ C and that
S
Proof. Obviously, 3DMATCHING ∈ NP, since we can efficiently check whether t∈M
all triples in M are disjoint for given M ⊆ T .
We reduce 3SAT to 3DMATCHING. Let Z be 3SAT formula over variables X = {x 1 , . . . , x n } and clauses
C = {C1 , . . . , Cm }, and let k ∈ N be the smallest number such that no literal appears more than k times in
Z.
We describe sets A, B, C and triples T⊆ A × B × C that permit a disjoint covering of A ∪ B ∪ C if and only
if Z is satisfiable. We let A := x i j , x̄ i j x i ∈ X , j ∈ {0, . . . , k − 1} .
60
For every variable x i ∈ X and every j ∈ {0, . . . , k − 1}, we introduce the elements bi j ∈ B and ci j ∈ C and
the triples (x i j , bi j , ci j ), (x̄ i j , bi j , ci, j+1 mod k ) ⊆ T . These will be the only triples containing bi j , ci j , hence
every solution of 3DMATCHING must select exactly k of these triples. Depending on the selected triples,
either all elements x i j or all elements x̄ i j are not covered, and we interpret the corresponding choices as
setting α(x i ) = 1 an α(x i ) = 0, respectively.
For every clause C` ∈ C we introduce the elements d` ∈ B and e` ∈ C , as well as three triples with these
two elements: For every literal λ ∈ C` with λ = x i we introduce one of the triples (x i j , d` , e` ), such that
every element x i j is contained in at most one such triple (this is possible by definition of k). Similarly, for
every literal λ = x̄ i we introduce one of the triples (x̄ i j , d` , e` ), such that every element x i j is contained
in at most one such triple. The choice of one of the three triples of a clause in a solution corresponds to
the decision which literal should satisfy the clause. Observe that, since every element of A may only be
contained in exactly one triple in a solution to 3DMATCHING, this choice of a literal needs to be consistent
with the variable assignment.
An assignment α satisfies Z if and only if the corresponding selection of triples leaves kn − m elements
of A uncovered. To complete the construction, we introduce additional pairs (pi , qi ) ∈ B, C for i ∈
{1, . . . , kn − m} that ensure that all elements can be covered. For every element a ∈ A and each of these
pairs, we introduce all triples (a, pi , qi ) for i ∈ {1, . . . , 2k − m}. Every solution to 3DMATCHING must cover
all elements pi , qi , which can be accomplished exactly by coving kn − m arbitrary elements of A.
Overall, the constructed instance of 3DMATCHING has a solution (i.e., the solution to the decision problem
is {‘yes’}) if and only if Z has a satisfying assignment α: Each such assignment uniquely fixes a choice
of triples, and vice-versa. Furthermore, our construction can be carried out in polynomial time, since
|A| = |B| = |C| = 2kn and |T | = 2kn + 3m + (kn − m) · 2kn = O(m2 n2 ).
Finally, many optimization problems can be expressed as variants of the following problem. Intuitively,
the complexity of this problem comes from its rigidity: we have to achieve an exact sum of numbers,
and a solution that is off the target sum by very little may still be completely different from any exact
solution.
SUBSETSUM Problem
input: numbers A = (ai ∈ N)i=1,...,n , integer PK ∈N
problem: Is there a subset S ⊆ {1, . . . , n} with i∈S ai = K ?
ai = K for a
P
Proof. Obviously SUBSETSUM ∈ NP since we can check in polynomial time whether i∈I
given set S .
We reduce 3DMATCHING to SUBSETSUM. Let (A, B, C, T ) be an instance of 3DMATCHING with
{u0 , . . . , um−1 } := A ∪ B ∪ C and {T1 , T2 , . . . , Tn } := T . We construct an instance of SUBSETSUM with
X
a j := (n + 1)i ,
ui ∈T j
m−1
X
K := (n + 1)i .
i=0
Represented in basis (n + 1), the number a j has a 1 exactly in the digits that correspond to the elements
in the triple T j , all other digits are equal to 0. In this representation, all digits of K are exactly 1. Since
there are only n different numbers a j , there cannot be a carry-over in any digit with respect to basis
61
(n + 1) when summing a subset of these numbers. Hence, every solution to our SUBSETSUM instance
corresponds to a solution to the given 3DMATCHING instance, since all elements need to be covered and
no element may be covered more than once. In the same way, every solution to the 3DMATCHING instance
corresponds to a solution to the SUBSETSUM instance. Every number has O(m) digits in basis (n + 1),
thus its binary representation has O(m log n) bits. This means that our construction can be carried out
in polynomial time.
62