Part 2 - Graph Algorithms and Data Structures
Tim Roughgarden
© 2018 by Tim Roughgarden
All rights reserved. No portion of this book may be reproduced in any form
without permission from the publisher, except as permitted by U. S. copyright
law.
First Edition
Contents
Preface
Index
Preface
find your own favorites. There are also several books that, unlike
these books, cater to programmers looking for ready-made algorithm
implementations in a specific programming language. Many such
implementations are freely available on the Web as well.
Additional Resources
These books are based on online courses that are currently running on
the Coursera and edX platforms. I’ve made several resources available
to help you replicate as much of the online course experience as you
like.
Videos. If you’re more in the mood to watch and listen than
to read, check out the YouTube video playlists available at
www.algorithmsilluminated.org. These videos cover all the topics in
this book series, as well as additional advanced topics. I hope they
exude a contagious enthusiasm for algorithms that, alas, is impossible
to replicate fully on the printed page.
Quizzes. How can you know if you’re truly absorbing the concepts
in this book? Quizzes with solutions and explanations are scattered
throughout the text; when you encounter one, I encourage you to
pause and think about the answer before reading on.
End-of-chapter problems. At the end of each chapter you’ll find
several relatively straightforward questions for testing your
understanding, followed by harder and more open-ended challenge problems.
Hints or solutions to most of these problems (as indicated by an “(H)”
or “(S),” respectively) are included at the end of the book.
Readers can interact with me and each other about the end-of-chapter
problems through the book’s discussion forum (see below).
Programming problems. Most of the chapters conclude with a
suggested programming project whose goal is to help you develop a
detailed understanding of an algorithm by creating your own working
implementation of it. Data sets, along with test cases and their
solutions, can be found at www.algorithmsilluminated.org.
Discussion forums. A big reason for the success of online courses
is the opportunities they provide for participants to help each other
understand the course material and debug programs through
discussion forums. Readers of these books have the same opportunity, via
the forums available at www.algorithmsilluminated.org.
Acknowledgments
These books would not exist without the passion and hunger supplied
by the hundreds of thousands of participants in my algorithms courses
over the years. I am particularly grateful to those who supplied
detailed feedback on an earlier draft of this book: Tonya Blust, Yuan
Cao, Jim Humelsine, Vladimir Kokshenev, Bayram Kuliyev, Patrick
Monkelban, and Daniel Zingaro.
Tim Roughgarden
London, United Kingdom
July 2018
Chapter 7
This short chapter explains what graphs are, what they are good
for, and the most common ways to represent them in a computer
program. The next two chapters cover a number of famous and useful
algorithms for reasoning about graphs.
When you hear the word “graph,” you probably think about an x-axis,
a y-axis, and so on (Figure 7.1(a)). To an algorithms person, a graph
can also mean a representation of the relationships between pairs of
objects (Figure 7.1(b)).
Figure 7.1: (a) A graph (to most of the world). (b) A graph (in algorithms).
2 Graphs: The Basics
the vertices (singular: vertex) or the nodes of the graph.¹ The pairwise
relationships translate to the edges of the graph. We usually denote
the vertex and edge sets of a graph by V and E, respectively, and
sometimes write G = (V, E) to mean the graph G with vertices V
and edges E.
There are two flavors of graphs, directed and undirected. Both
types are important and ubiquitous in applications, so you should know
about both of them. In an undirected graph, each edge corresponds to
an unordered pair {v, w} of vertices, which are called the endpoints
of the edge (Figure 7.2(a)). In an undirected graph, an edge with
endpoints v and w can be denoted by (v, w) or by (w, v)—there is no
difference between the two. In a directed graph, each edge (v, w) is an
ordered pair, with the edge traveling from the first vertex v (called
the tail) to the second w (the head); see Figure 7.2(b).²
Figure 7.2: Graphs with four vertices (s, v, w, t) and five edges:
(a) undirected; (b) directed. The edges of undirected and directed
graphs are unordered and ordered vertex pairs, respectively.
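To make the directed/undirected distinction concrete, here is a minimal Python sketch built on one plausible encoding (an assumption, not the book's): an undirected edge is a `frozenset`, so (v, w) and (w, v) are the same object, while a directed edge is a tuple (tail, head). The helper names and the five-edge list are hypothetical and are not meant to reproduce the edges drawn in Figure 7.2.

```python
# Sketch: one way to model G = (V, E) in plain Python.
# Undirected edge = unordered pair (frozenset); directed edge = ordered pair (tuple).

def undirected_edge(v, w):
    """Return the unordered pair {v, w}; argument order is irrelevant."""
    return frozenset((v, w))

def directed_edge(tail, head):
    """Return the ordered pair (tail, head), traveling from tail to head."""
    return (tail, head)

V = {"s", "v", "w", "t"}

# Hypothetical list of five vertex pairs (not taken from Figure 7.2).
pairs = [("s", "v"), ("s", "w"), ("v", "w"), ("v", "t"), ("w", "t")]

E_undirected = {undirected_edge(a, b) for a, b in pairs}
E_directed = {directed_edge(a, b) for a, b in pairs}

# For undirected edges, (v, s) denotes the same edge as (s, v)...
print(undirected_edge("v", "s") in E_undirected)  # True
# ...but for directed edges, (v, s) is a different edge from (s, v).
print(directed_edge("v", "s") in E_directed)  # False
```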
Graphs are a fundamental concept, and they show up all the time in
computer science, biology, sociology, economics, and so on. Here are
a few of the countless examples.
¹ Having two names for the same thing can be annoying, but both terms are
in widespread use and you should be familiar with them. For the most part, we’ll
stick with “vertices” throughout this book series.
² Directed edges are sometimes called arcs, but we won’t use this terminology
in this book series.
7.3 Measuring the Size of a Graph
As in Part 1, in this book we’ll analyze the running time of different
algorithms as a function of the input size. When the input is a single
array, as for a sorting algorithm, there is an obvious way to define the
“input size,” namely the array’s length. When the input involves a graph,
we must specify exactly how the graph is represented and what we
mean by its “size.”
quantities.
The next quiz asks you to think about how the number m of edges
in an undirected graph can depend on the number n of vertices. For
this question, we’ll assume that there’s at most one undirected edge
between each pair of vertices—no “parallel edges” are allowed. We’ll
also assume that the graph is “connected.” We’ll define this concept
formally in Section 8.3; intuitively, it means that the graph is “in
one piece,” with no way to break it into two parts without any edges
crossing between the parts. The graphs in Figures 7.1(b) and 7.2(a)
are connected, while the graph in Figure 7.3 is not.
Quiz 7.1
Consider an undirected graph with n vertices and no parallel
edges. Assume that the graph is connected, meaning “in
one piece.” What are the minimum and maximum numbers
of edges, respectively, that the graph could have?
³ For a finite set S, |S| denotes the number of elements in S.
a) n − 1 and n(n − 1)/2
b) n − 1 and n²
c) n and 2ⁿ
d) n and nⁿ
and at most n(n − 1)/2. To see why the lower bound is correct,
consider a graph G = (V, E). As a thought experiment, imagine
building up G one edge at a time, starting from the graph with
vertices V and no edges. Initially, before any edges are added, each
of the n vertices is completely isolated, so the graph trivially has n
distinct “pieces.” Adding an edge (v, w) has the effect of fusing the
piece containing v with the piece containing w (Figure 7.4). Thus,
each edge addition decreases the number of pieces by at most 1.⁷ To
get down to a single piece from n pieces, you need to add at least n − 1
edges. There are plenty of connected graphs that have n vertices and
only n − 1 edges; these are called trees (Figure 7.5).
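The piece-counting thought experiment can be simulated directly. The sketch below rests on assumptions of our own: vertices are labeled 0 to n − 1, and the hypothetical helper `count_pieces` tracks pieces with a simple union-find structure, a bookkeeping technique the text does not use at this point. Adding the n − 1 edges of a path brings four pieces down to one, and no single edge ever removes more than one piece.

```python
from itertools import combinations

def count_pieces(n, edges):
    """Add edges one at a time to n isolated vertices 0..n-1 and
    return the number of pieces (connected parts) that remain."""
    parent = list(range(n))  # each vertex starts as its own piece

    def find(x):
        # Walk up to the piece's representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pieces = n
    for v, w in edges:
        rv, rw = find(v), find(w)
        if rv != rw:          # the edge joins two different pieces
            parent[rv] = rw   # fuse them into one
            pieces -= 1       # so the count drops by exactly 1
        # otherwise the edge lies inside one piece and the count is unchanged
    return pieces

n = 4
tree = [(0, 1), (1, 2), (2, 3)]             # n - 1 = 3 edges
complete = list(combinations(range(n), 2))  # n(n - 1)/2 = 6 edges
print(count_pieces(n, []))                  # 4
print(count_pieces(n, tree[:2]))            # 2
print(count_pieces(n, tree))                # 1
print(len(complete), count_pieces(n, complete))  # 6 1
```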
Figure 7.4: Adding a new edge fuses the pieces containing its endpoints
into a single piece. In this example, the number of different pieces drops
from three to two.
Figure 7.5: Two connected undirected graphs with four vertices and three
edges.
Figure 7.6: The complete graph on four vertices has $\binom{4}{2} = 6$ edges.
There is more than one way to encode a graph for use in an algorithm.
In this book series, we’ll work primarily with the “adjacency list”
representation of a graph (Section 7.4.1), but you should also be
aware of the “adjacency matrix” representation (Section 7.4.2).
⁸ $\binom{n}{2}$ is pronounced “n choose 2,” and is also sometimes referred to as a
“binomial coefficient.” To see why the number of ways to choose an unordered pair
of distinct objects from a set of n objects is n(n − 1)/2, think about choosing the
first object (from the n options) and then a second, distinct object (from the n − 1
remaining options). The n(n − 1) resulting outcomes produce each pair (x, y) of
objects twice (once with x first and y second, once with y first and x second), so
there must be n(n − 1)/2 pairs in all.
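The counting argument in this footnote can be checked by brute force for small n. In this sketch, the hypothetical function `unordered_pairs` generates all n(n − 1) ordered outcomes with `itertools.permutations`, collapses each one to an unordered pair, and the loop compares the result against n(n − 1)/2 and the standard library's `math.comb` (available in Python 3.8 and later).

```python
from itertools import permutations
from math import comb

def unordered_pairs(n):
    """Count unordered pairs of distinct objects via the ordered-pair argument."""
    ordered = list(permutations(range(n), 2))  # first choice, then a distinct second
    assert len(ordered) == n * (n - 1)
    # Each unordered pair {x, y} arises twice: as (x, y) and as (y, x).
    unordered = {frozenset(p) for p in ordered}
    assert 2 * len(unordered) == len(ordered)
    return len(unordered)

for n in range(2, 8):
    assert unordered_pairs(n) == n * (n - 1) // 2 == comb(n, 2)
print(unordered_pairs(4))  # 6
```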
a) Θ(n)
b) Θ(m)
c) Θ(m + n)
d) Θ(n²)
Thus, an adjacency matrix maintains one bit for each pair of vertices,
which keeps track of whether or not the edge is present (Figure 7.7).
         1  2  3  4
    1  ( 0  1  0  0 )
    2  ( 1  0  1  1 )
    3  ( 0  1  0  1 )
    4  ( 0  1  1  0 )
Figure 7.7: The adjacency matrix of a graph maintains one bit for each
vertex pair, indicating whether or not there is an edge connecting the two
vertices.
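As a sketch of how such a matrix might be built from an edge list, the hypothetical helper below assumes vertices labeled 1 to n, stored at indices 0 to n − 1. The four edges are read off the matrix in Figure 7.7, and the code reproduces that matrix.

```python
def adjacency_matrix(n, edges):
    """Build the n x n 0/1 adjacency matrix of an undirected graph
    whose vertices are labeled 1..n (stored at indices 0..n-1)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1  # undirected: the edge is recorded in both orientations
    return A

# Edges read off the matrix in Figure 7.7.
edges = [(1, 2), (2, 3), (2, 4), (3, 4)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
# [0, 1, 0, 0]
# [1, 0, 1, 1]
# [0, 1, 0, 1]
# [0, 1, 1, 0]
```

The matrix always uses n² entries, however few edges the graph has.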
It’s easy to add bells and whistles to the adjacency matrix repre-
sentation of a graph:
• Parallel edges. If a graph can have multiple edges with the same
pair of endpoints, then A_ij can be defined as the number of
edges with endpoints i and j.
where “edge (i, j)” now refers to the edge directed from i to j.
Every undirected graph has a symmetric adjacency matrix, while
a directed graph usually has an asymmetric adjacency matrix.
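The following sketch illustrates both extensions with a hypothetical `count_matrix` helper. Entries count edges, so a parallel edge shows up as a 2. In directed mode only the (tail, head) entry is incremented, which typically breaks symmetry. Vertex labels 0 to n − 1 and the absence of self-loops are assumptions of the sketch.

```python
def count_matrix(n, edges, directed=False):
    """Adjacency matrix whose entry A[i][j] counts the edges from i to j
    (vertices 0..n-1, no self-loops assumed). Parallel edges raise the
    count; an undirected edge is recorded in both orientations."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1
        if not directed:
            A[j][i] += 1
    return A

def is_symmetric(A):
    """True if A[i][j] == A[j][i] for every pair of indices."""
    return all(A[i][j] == A[j][i] for i in range(len(A)) for j in range(len(A)))

# Hypothetical edge list with two parallel edges between vertices 0 and 1.
edges = [(0, 1), (0, 1), (1, 2)]
U = count_matrix(3, edges)                 # undirected interpretation
D = count_matrix(3, edges, directed=True)  # directed interpretation
print(U[0][1], U[1][0])                    # 2 2
print(D[0][1], D[1][0])                    # 2 0
print(is_symmetric(U), is_symmetric(D))    # True False
```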
Quiz 7.3
How much space does the adjacency matrix of a graph
require, as a function of the number n of vertices and the
number m of edges?
a) Θ(n)
b) Θ(m)
c) Θ(m + n)
d) Θ(n²)
the size of this graph, but a conservative lower bound on the number
of vertices is 10 billion, or 10¹⁰. Storing and reading through an array
of this length already requires significant computational resources,
but it is well within the limits of what modern computers can do. The
size of the adjacency matrix of this graph, however, is proportional
to 100 quintillion (10²⁰). This is way too big to store or process with
today’s technology. But the Web graph is sparse: the average number
of outgoing edges from a vertex is well under 100. The memory
requirements of the adjacency list representation of the Web graph
are therefore proportional to 10¹² (a trillion). This may be too big
for your laptop, but it’s within the capabilities of state-of-the-art
data-processing systems.¹²
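The back-of-the-envelope comparison can be reproduced in a few lines. The inputs are the text's own figures: n = 10¹⁰ vertices and, as an upper bound, 100 outgoing edges per vertex. The variable names are ours, and the counts are numbers of stored entries, not bytes.

```python
n = 10**10            # conservative lower bound on Web-graph vertices (from the text)
avg_out_degree = 100  # the text says the average is well under 100; used as an upper bound

matrix_entries = n * n                 # adjacency matrix: one entry per vertex pair
list_entries = n + n * avg_out_degree  # adjacency list: vertex array plus at most one entry per edge

print(f"{matrix_entries:.0e}")  # 1e+20
print(f"{list_entries:.2e}")    # 1.01e+12
```

The matrix is about 10⁸ times larger, which is the gap between “way too big” and “within the capabilities of state-of-the-art systems.”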
exactly the same amount of space, namely Θ(m). The final scorecard
is:
    vertex array                                 Θ(n)
    edge array                                   Θ(m)
    pointers from edges to endpoints             Θ(m)
  + pointers from vertices to incident edges     Θ(m)
    total                                        Θ(m + n).
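A minimal Python sketch of this layout for an undirected graph follows. The `Edge` and `Vertex` classes and the `build_adjacency_list` function are hypothetical names, and using list indices as “pointers” is an assumption of the sketch. Each edge stores its two endpoints, and each vertex stores the indices of its incident edges, so the incidence lists contain 2m entries in total.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    first: int   # "pointer" (vertex-array index) to one endpoint
    second: int  # "pointer" to the other endpoint

@dataclass
class Vertex:
    incident: list = field(default_factory=list)  # edge-array indices of incident edges

def build_adjacency_list(n, pairs):
    """Adjacency-list representation of an undirected graph on vertices 0..n-1."""
    vertices = [Vertex() for _ in range(n)]  # vertex array: Θ(n)
    edges = []                               # edge array: Θ(m)
    for v, w in pairs:
        edges.append(Edge(v, w))             # two endpoint pointers per edge: Θ(m)
        idx = len(edges) - 1
        vertices[v].incident.append(idx)     # each edge is listed at both endpoints,
        vertices[w].incident.append(idx)     # so 2m incidence pointers: Θ(m)
    return vertices, edges

vertices, edges = build_adjacency_list(4, [(0, 1), (1, 2), (1, 3), (2, 3)])
print([v.incident for v in vertices])  # [[0], [0, 1, 2], [1, 3], [2, 3]]
```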
The Upshot
¹⁴ If the graph is connected, then m ≥ n − 1 (by Quiz 7.1), and we could
write Θ(m) in place of Θ(m + n).
¹⁵ This waste can be reduced by using tricks for storing and manipulating sparse
matrices, meaning matrices with lots of zeroes. For instance, both Matlab and
Python’s SciPy package support sparse matrix representations.
Problems
a) n(n − 1)/2
b) n²/2
c) n(n − 1)
d) n²
a) m
b) m + n
c) 2m
d) n²
¹⁶ The abbreviation “i.e.” stands for id est, and means “that is.”
a) Θ(1)
b) Θ(k)
c) Θ(n)
d) Θ(m)
a) Θ(1)
b) Θ(k)
c) Θ(n)
d) Θ(m)
Chapter 8
This chapter is all about fundamental primitives for graph search and
their applications. One very cool aspect of this material is that all the
algorithms that we’ll cover are blazingly fast (linear time with small
constants), and it can be quite tricky to understand why they work!
The culmination of this chapter—computing the strongly connected
components of a directed graph with only two passes of depth-first
search (Section 8.6)—vividly illustrates how fast algorithms often
require deep insight into the problem structure.
We begin with an overview section (Section 8.1), which covers some
reasons why you should care about graph search, a general strategy for
searching a graph without doing any redundant work, and a high-level
introduction to the two most important search strategies, breadth-
first search (BFS) and depth-first search (DFS). Sections 8.2 and 8.3
describe BFS in more detail, including applications to computing
shortest paths and the connected components of an undirected graph.
Sections 8.4 and 8.5 drill down on DFS and how to use it to compute
a topological ordering of a directed acyclic graph (equivalently, to
sequence tasks while respecting precedence constraints). Section 8.6
uses DFS to compute the strongly connected components of a directed
graph in linear time. Section 8.7 explains how this fast graph primitive
can be used to explore the structure of the Web.
8.1 Overview
16 Graph Search and Its Applications
Figure 8.1: A snippet of the movie network, showing that Jon Hamm’s
Bacon number is at most 2.
¹ https://oracleofbacon.org/
² The Bacon number is a riff on the older concept of the Erdős number, named
after the famous mathematician Paul Erdős, which measures the number of
degrees of separation from Erdős in the co-authorship graph (in which vertices
are researchers, and there is an edge between each pair of researchers who have
co-authored a paper).
³ There are also lots of other two-hop paths between Bacon and Hamm in the
movie network.
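A vertex's Bacon number is its fewest-hop distance from Kevin Bacon, which is exactly what breadth-first search (covered in Section 8.2) computes. The sketch below assumes an undirected graph stored as a dictionary of adjacency lists. The toy movie network, including “Actor X” and “Actor Y,” is hypothetical and does not reflect real film credits.

```python
from collections import deque

def bfs_distances(graph, source):
    """Breadth-first search from source in an undirected graph given as a
    dict of adjacency lists; returns the fewest-hop distance to every
    reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()              # explore vertices in order of discovery
        for w in graph[v]:
            if w not in dist:            # each vertex is discovered once, at its
                dist[w] = dist[v] + 1    # shortest distance from the source
                queue.append(w)
    return dist

# Hypothetical toy movie network: an edge means "appeared in a movie together".
movie_network = {
    "Kevin Bacon": ["Actor X"],
    "Actor X": ["Kevin Bacon", "Jon Hamm", "Actor Y"],
    "Jon Hamm": ["Actor X"],
    "Actor Y": ["Actor X"],
}
print(bfs_distances(movie_network, "Kevin Bacon")["Jon Hamm"])  # 2
```

Each vertex is enqueued at most once and each adjacency list is scanned once, so the running time is linear in the size of the graph.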
For-Free Primitives