Analysis of Algorithms I: Depth-First Search and Topological Sort
Xi Chen
Columbia University
Introduction
We discuss the second strategy commonly used in the generic
algorithm for reachability described in the last class:
set $R = \{s\}$
while there is an edge from $R$ to $V \setminus R$ do
    let $(u, v) \in E$ be such an edge with $u \in R$, $v \in V \setminus R$
    set $R = R \cup \{v\}$ and $v.\pi = u$
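The loop above can be sketched in Python as follows. This is only an illustration under assumptions that are not in the lecture: the graph is a dict `adj` mapping every vertex to its adjacency list, and the names `reachable` and `parent` (standing in for the π pointers) are hypothetical. The sketch picks an arbitrary edge leaving R; a concrete strategy such as DFS fixes which edge is picked.

```python
def reachable(adj, s):
    """Return (R, parent): the vertices reachable from s and their pi pointers."""
    R = {s}
    parent = {s: None}
    while True:
        # Find any edge (u, v) with u in R and v outside R.
        edge = next(((u, v) for u in R for v in adj[u] if v not in R), None)
        if edge is None:          # no edge leaves R: we are done
            return R, parent
        u, v = edge
        R.add(v)                  # R = R ∪ {v}
        parent[v] = u             # v.pi = u


# Hypothetical example graph: d can reach s, but s cannot reach d.
adj = {"s": ["a", "b"], "a": ["c"], "b": [], "c": [], "d": ["s"]}
R, parent = reachable(adj, "s")
```

Rescanning all of R on every iteration keeps the sketch close to the pseudocode but makes it slow; it is meant to mirror the loop, not to be efficient.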
In DFS, each vertex v has one of the following three colors:
1 White: not discovered yet
2 Gray: discovered but not finished yet
3 Black: finished
DFS-Visit (G, u), where $u \in V$ satisfies u.color = white:
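A minimal Python sketch of such a procedure, assuming the graph is a dict `adj` of adjacency lists and that colors and π pointers live in the dicts `color` and `parent` (all hypothetical names, not the lecture's pseudocode). The example graph is also an assumption, chosen so that the call from u discovers u, v, y and x in that order.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


def dfs_visit(adj, u, color, parent):
    """Explore everything reachable from the white vertex u via white vertices."""
    color[u] = GRAY                  # u is just discovered
    for v in adj[u]:
        if color[v] == WHITE:        # v has not been discovered yet
            parent[v] = u            # v.pi = u
            dfs_visit(adj, v, color, parent)
    color[u] = BLACK                 # u is finished


# Hypothetical example graph (adjacency lists).
adj = {"u": ["v", "x"], "v": ["y"], "y": ["x"], "x": ["v"],
       "w": ["y", "z"], "z": ["z"]}
color = {v: WHITE for v in adj}
parent = {v: None for v in adj}
dfs_visit(adj, "u", color, parent)
```

After this single call, w and z are still white because neither is reachable from u.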
It is clear that we change the color of u from white to gray at the
beginning of DFS-Visit (G , u) (we say u is just discovered), and
change it again to black at the end (we say u is finished). But why
do we name this procedure DFS-Visit instead of DFS? In most
of the applications of DFS, we need to keep calling DFS-Visit until
we have discovered all vertices of G . And we reserve DFS for
the latter procedure that makes calls to DFS-Visit. (Comparison to
BFS: One application of BFS is to compute the shortest-path
distances from a given source vertex $s \in V$. To this end, it suffices
to make one call BFS (G, s). But to use BFS to compute the
connected components of an undirected graph, one also needs
to keep calling BFS until all vertices are discovered.)
In DFS-Visit (G, u), we enumerate the vertices $v \in \mathrm{adj}(u)$ that are not
discovered yet (clearly it is better to use the list representation
here, just like in BFS), and make a recursive call DFS-Visit (G, v) to
explore each such v. Upon the termination of DFS-Visit (G, u), we use
$E_\pi$ to denote the following set of edges:
$E_\pi = \{ (v.\pi, v) : v \ne u \text{ was discovered during DFS-Visit}(G, u) \}$
Using an argument similar to the one from the last class, it is easy
to show that $E_\pi \subseteq E$ (why?) has no cycle (why?) and thus is a tree
rooted at u. We call it the depth-first tree formed by DFS-Visit (G, u).
Check Figure 22.4 on Page 605 in the textbook. DFS-Visit (G , u)
discovers u first, followed by v , y and x. When x is discovered, it
has no white neighbor in adj(x), so we are finished with x: we change
it from gray to black and backtrack to y, the vertex that
discovered x. Similarly, none of y, v and u has any white neighbor
left in its adjacency list, and we are done. For this example,
$E_\pi = \{ (u, v), (v, y), (y, x) \}$
In most of the applications, we start with a graph G = (V, E),
directed or undirected; set v.color = white and $v.\pi = \text{nil}$ for all
$v \in V$; and keep calling DFS-Visit until every vertex is discovered:
DFS (G ), where G = (V , E ) is either undirected or directed:
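A minimal Python sketch of this outer procedure, under the assumption that the graph is a dict `adj` of adjacency lists; the names `dfs`, `dfs_visit`, `color`, `parent` and `forest_edges` and the example graph are hypothetical. The driver initializes every vertex and then calls the visit routine once for each vertex that is still white.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


def dfs_visit(adj, u, color, parent):
    color[u] = GRAY                      # discovered
    for v in adj[u]:
        if color[v] == WHITE:
            parent[v] = u
            dfs_visit(adj, v, color, parent)
    color[u] = BLACK                     # finished


def dfs(adj):
    """Run DFS over the whole graph; return final colors and pi pointers."""
    color = {v: WHITE for v in adj}      # initialization
    parent = {v: None for v in adj}
    for u in adj:                        # keep calling dfs_visit
        if color[u] == WHITE:
            dfs_visit(adj, u, color, parent)
    return color, parent


# Hypothetical example graph (adjacency lists).
adj = {"u": ["v", "x"], "v": ["y"], "y": ["x"], "x": ["v"],
       "w": ["y", "z"], "z": ["z"]}
color, parent = dfs(adj)
# Edges (v.pi, v) for every vertex v whose pi pointer is not nil.
forest_edges = {(parent[v], v) for v in adj if parent[v] is not None}
```

In this example the loop starts two visits, one from u and one from w, so the resulting forest has two trees.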
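Upon termination of DFS (G), it is clear that we have discovered
every vertex $v \in V$ and v.color = black (because DFS-Visit (G, u)
changes u to black by the end, and we never touch a vertex again
once it is black). It is also easy to check that the edges
$E_\pi = \{ (v.\pi, v) : v \in V \text{ with } v.\pi \ne \text{nil} \}$
form a forest, the depth-first forest, whose trees are the depth-first
trees built by the individual calls to DFS-Visit.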
DFS-Visit (G , u):
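A minimal Python sketch of DFS with timestamps, assuming a dict `adj` of adjacency lists and a counter that is incremented at every color change. The names `dfs_with_times`, `d`, `f` and `parent` and the example graph are hypothetical, and the line layout does not follow the lecture's pseudocode.

```python
def dfs_with_times(adj):
    """Return discovery times d, finishing times f, and pi pointers."""
    color = {v: "white" for v in adj}
    parent = {v: None for v in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1                 # white -> gray
        d[u] = time
        color[u] = "gray"
        for v in adj[u]:
            if color[v] == "white":
                parent[v] = u
                visit(v)
        time += 1                 # gray -> black
        f[u] = time
        color[u] = "black"

    for u in adj:
        if color[u] == "white":
            visit(u)
    return d, f, parent


# Hypothetical example graph (adjacency lists).
adj = {"u": ["v", "x"], "v": ["y"], "y": ["x"], "x": ["v"],
       "w": ["y", "z"], "z": ["z"]}
d, f, parent = dfs_with_times(adj)
```

Every vertex receives two distinct timestamps, so with six vertices the counter ends at 12.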
It is clear that the time counter increases by one every time we
change the color of a vertex (from white to gray or from gray to
black). We use u.d and u.f to record the two timestamps when
DFS discovers and finishes with u, respectively. Before discussing
properties of DFS, what is the total running time of DFS (G)? Its
initialization of course costs $\Theta(n)$, where n = |V|.
During the execution of DFS (G), we make exactly one call
DFS-Visit (G, u) for each $u \in V$ because it is only in DFS-Visit
(G, u) that we change the color of u from white to gray and from
gray to black (and never touch it again). The running time of
DFS-Visit (G, u), except those recursive calls made on line 6, is
$\Theta(1) + |\mathrm{adj}(u)|$.
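Adding the initialization cost to the per-call cost over all vertices gives the total running time. The derivation below is a sketch; the symbol $m = |E|$ is introduced here for convenience and is an assumption, not notation from this section.

```latex
\[
  \Theta(n) + \sum_{u \in V} \bigl( \Theta(1) + |\mathrm{adj}(u)| \bigr)
  = \Theta(n) + \Theta(n) + \sum_{u \in V} |\mathrm{adj}(u)|
  = \Theta(n + m),
\]
\[
  \text{since } \sum_{u \in V} |\mathrm{adj}(u)| = m \text{ for a directed graph and } 2m \text{ for an undirected graph.}
\]
```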
Basic properties of DFS (G ): (Prove them by yourself.)
But why is DFS useful? To describe its first application, we need
to prove the following key theorem. Let u, v be two vertices in G.
We say u is an ancestor of v (or v is a descendant of u) if u, v lie
in the same tree of the depth-first forest and u is an ancestor of v
in the tree. We say u and v are unrelated if u is not an ancestor of
v and v is not an ancestor of u (they may still lie in the same tree).
Theorem
For any u and v, v is a descendant of u if and only if at the time u.d
there is a white path from u to v, i.e., a path consisting entirely of
white vertices.
The White-Path theorem is very powerful. For example, it implies
that when we make a call DFS-Visit (G , u) in the for-loop of DFS
(G ), the tree rooted at u in the depth-first forest consists of
exactly the vertices $v \in V$ such that there is a white path from u
to v at the time u.d (or equivalently, at the beginning of DFS-Visit
(G , u)). To prove it, we need the following Parenthesis lemma:
Theorem
For any u and v, if u is an ancestor of v, then [v.d, v.f] is
contained entirely within [u.d, u.f]:
$u.d \le v.d < v.f \le u.f$, with equality only when u = v.
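The nesting of intervals can be checked mechanically on a small run. The sketch below assumes a dict `adj` of adjacency lists and recomputes the timestamps itself; `dfs_times`, `proper_ancestors` and the example graph are hypothetical names and data. It is a sanity check on one input, not a proof.

```python
def dfs_times(adj):
    """Return discovery times d, finishing times f, and pi pointers."""
    d, f, parent = {}, {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        for v in adj[u]:
            if v not in d:            # v has no discovery time: still white
                parent[v] = u
                visit(v)
        time += 1
        f[u] = time

    for u in adj:
        if u not in d:
            parent[u] = None          # u is the root of a new tree
            visit(u)
    return d, f, parent


def proper_ancestors(parent, v):
    """Yield the proper ancestors of v by following pi pointers."""
    u = parent[v]
    while u is not None:
        yield u
        u = parent[u]


# Hypothetical example graph (adjacency lists).
adj = {"u": ["v", "x"], "v": ["y"], "y": ["x"], "x": ["v"],
       "w": ["y", "z"], "z": ["z"]}
d, f, parent = dfs_times(adj)
# For every proper ancestor u of v, [v.d, v.f] must sit strictly inside [u.d, u.f].
nested = all(d[u] < d[v] < f[v] < f[u]
             for v in adj for u in proper_ancestors(parent, v))
```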
We demonstrate this lemma with Figure 22.5. The proof is
relatively straightforward, using the two properties mentioned
earlier, and can be found in the textbook.
We now use it to prove the White-Path theorem. We first show
that if v is a descendant of u in the forest, then at time u.d there
is a white path from u to v. To this end, it suffices to show that at
the time u.d, every vertex in the subtree rooted at u in the forest is
white. (This clearly holds for u itself; check the statement
of the theorem carefully.) For each proper descendant v of u, we have
u.d < v.d by the Parenthesis lemma, so v has not been discovered at
the time u.d and is still white.
The other direction is slightly more difficult: If at the time u.d,
there is a white path from u to v, then v is a descendant of u. We
assume for contradiction that $u, u_1, \ldots, u_k, v$ is a white path from
u to v at the time u.d for some $k \ge 0$, but v is not a descendant
of u. Without loss of generality, we may assume that $u_1, \ldots, u_k$
are all descendants of u; otherwise we can choose v to be the closest
vertex to u along this path that is not a descendant of u. Denote
the predecessor of v on the path by w for convenience (so $w = u_k$ if
$k \ge 1$, and w = u if k = 0).
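By the Parenthesis lemma, we have
$w.f \le u.f$
because w is a descendant of u (possibly u itself). Since v is white
at the time u.d, we have u.d < v.d. Since $v \in \mathrm{adj}(w)$, v must be
discovered before w is finished, so v.d < w.f. Altogether
$u.d < v.d < w.f \le u.f$, so v is discovered while u is gray, which makes
v a descendant of u, a contradiction.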
Now we describe the first application of DFS: Topological Sort.
The input is a directed acyclic graph (DAG) G = (V, E). Here
acyclic means that there is no directed cycle in G. We need to find a
topological sort (i.e., a permutation) of the n = |V| vertices such
that for any edge $(u, v) \in E$, u appears before v in the sort. (It is
clear that if G has a cycle then no topological sort exists.) For
example, in task scheduling, the vertices are tasks and an edge
$(u, v) \in E$ means that task u must be done before v. A
topological sort of the vertices then gives a feasible order of the
tasks that violates none of the requirements in E.
Topological sort of a DAG G = (V , E ):
1 call DFS (G )
2 as each vertex is finished, insert it onto the front of a list
3 return the linked list of vertices
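The three steps above can be sketched in Python as follows. The sketch assumes the input is a DAG given as a dict `adj` of adjacency lists (it does not check for cycles) and uses `collections.deque` as a stand-in for the linked list; `topological_sort` and the task names are hypothetical.

```python
from collections import deque


def topological_sort(adj):
    """Return the vertices of a DAG in an order compatible with its edges."""
    color = {v: "white" for v in adj}
    order = deque()

    def visit(u):
        color[u] = "gray"
        for v in adj[u]:
            if color[v] == "white":
                visit(v)
        color[u] = "black"
        order.appendleft(u)       # u is finished: insert at the front

    for u in adj:                 # step 1: run DFS over the whole graph
        if color[u] == "white":
            visit(u)
    return list(order)


# Hypothetical task graph: an edge (a, b) means a must be done before b.
tasks = {"shirt": ["tie", "belt"], "tie": ["jacket"], "belt": ["jacket"],
         "trousers": ["belt", "shoes"], "socks": ["shoes"],
         "shoes": [], "jacket": []}
order = topological_sort(tasks)
position = {t: i for i, t in enumerate(order)}
```

A DAG usually has several valid orders; which one this sketch returns depends on the iteration order of `adj` and of each adjacency list.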
To prove the correctness of the algorithm, note that the list orders
the vertices by decreasing finishing time. It therefore suffices to
show that given any DAG G = (V, E), every edge $(u, v) \in E$ satisfies
u.f > v.f.
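Given a directed graph G = (V, E) (for now we do not require G
to be acyclic) and its depth-first forest, we say $(u, v) \in E$ is a
1 Tree edge: if (u, v) is an edge of the depth-first forest
2 Back edge: if v is an ancestor of u
3 Forward edge: if v is a proper descendant of u and (u, v) is not a tree edge
4 Cross edge: if u and v are unrelated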
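Which type an edge $(u, v) \in E$ is depends on the color of v when
DFS explores the edge (u, v) in DFS-Visit (G, u) (as it goes through adj(u)):
1 White: (u, v) is a tree edge
2 Gray: (u, v) is a back edge
3 Black: (u, v) is a forward edge if u.d < v.d, and a cross edge if u.d > v.d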
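The color test can be sketched in Python as follows, using the standard rule that a white v gives a tree edge, a gray v a back edge, and a black v a forward or cross edge depending on the discovery times. The sketch assumes a dict `adj` of adjacency lists; `classify_edges`, `kind` and the example graph are hypothetical. Only discovery times are recorded, since they are all the comparison needs.

```python
def classify_edges(adj):
    """Map every edge (u, v) to 'tree', 'back', 'forward' or 'cross'."""
    color = {v: "white" for v in adj}
    d = {}                            # discovery order of each vertex
    kind = {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        color[u] = "gray"
        for v in adj[u]:
            if color[v] == "white":
                kind[(u, v)] = "tree"
                visit(v)
            elif color[v] == "gray":  # v is an ancestor of u still being explored
                kind[(u, v)] = "back"
            elif d[u] < d[v]:         # v is black and was discovered after u
                kind[(u, v)] = "forward"
            else:                     # v is black and was discovered before u
                kind[(u, v)] = "cross"
        color[u] = "black"

    for u in adj:
        if color[u] == "white":
            visit(u)
    return kind


# Hypothetical example graph (adjacency lists).
adj = {"u": ["v", "x"], "v": ["y"], "y": ["x"], "x": ["v"],
       "w": ["y", "z"], "z": ["z"]}
kind = classify_edges(adj)
```

Note that the self-loop at z is reported as a back edge, because z is gray when its own adjacency list is scanned.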
To summarize, for tree / forward / cross edges we have u.f > v .f ,
while for back edges we have u.f < v .f . (Keep this in mind
because we will use it again in the next application of DFS:
strongly connected components.) Now we prove the correctness of
the Topological Sort algorithm. Let G be a DAG and consider
an edge $(u, v) \in E$. To show u.f > v.f, we go through the four cases.
If (u, v) is a tree / forward / cross edge, we have already shown that
u.f > v.f. The correctness follows if we can show that a DAG
has no back edge. This is immediate because a back edge implies a
cycle (why?), which contradicts the DAG assumption.
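The link between back edges and cycles also gives a simple acyclicity test: run DFS and report whether some edge leads to a gray vertex. The sketch below assumes a dict `adj` of adjacency lists; `has_back_edge` and both example graphs are hypothetical. Using it to certify that a graph is a DAG relies on the standard converse fact, not proved in this section, that DFS on a directed graph containing a cycle always finds a back edge.

```python
def has_back_edge(adj):
    """Return True if DFS on the directed graph adj encounters a back edge."""
    color = {v: "white" for v in adj}

    def visit(u):
        color[u] = "gray"
        for v in adj[u]:
            if color[v] == "gray":            # edge to a gray vertex: back edge
                return True
            if color[v] == "white" and visit(v):
                return True
        color[u] = "black"
        return False

    # The color test is evaluated lazily, so vertices blackened by an
    # earlier visit are skipped.
    return any(visit(u) for u in adj if color[u] == "white")


# Hypothetical examples: a DAG and a graph with the cycle a -> b -> c -> a.
dag = {"a": ["b", "c"], "b": ["c"], "c": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
dag_result = has_back_edge(dag)
cyclic_result = has_back_edge(cyclic)
```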