
A Note on (Parallel) Depth- and Breadth-First Search by Arc Elimination
Jesper Larsson Träff
Vienna University of Technology
Faculty of Informatics, Institute of Information Systems
Research Group Parallel Computing
Favoritenstrasse 16/184-5, 1040 Vienna, Austria
email: [email protected]
arXiv:1305.1222v3 [cs.DS] 12 Nov 2013

May 6th, 2013

Abstract
This note recapitulates an algorithmic observation for ordered Depth-First Search (DFS)
in directed graphs that immediately leads to a parallel algorithm with linear speed-up for
a range of processors for non-sparse graphs. The note extends the approach to ordered
Breadth-First Search (BFS). With p processors, both DFS and BFS algorithms run in
O(m/p + n) time steps on a shared-memory parallel machine allowing concurrent reading
of locations, e.g., a CREW PRAM, and have linear speed-up for p ≤ m/n. Both algorithms
need n synchronization steps.

1 Introduction
Depth- and Breadth-First Search are elementary graph traversal procedures with simple, sequential
algorithms [2, 7, 8]. Both procedures pose problems for parallel implementation: the ordered
Depth-First Search (DFS) problem is P-complete [5], and therefore unlikely to admit a
polylogarithmically fast parallel algorithm using only polynomial resources, whereas for
Breadth-First Search (BFS), no work-optimal, polylogarithmically fast parallel algorithm is known.
This note re-presents the simple, work-optimal, linear-time parallel algorithm for Depth-First
Search by Varman and Doshi [9] (also Vishkin, personal communication; see also [3], which
describes a number of further applications). The algorithm can give linear speed-up for graphs
that are not too sparse. The note extends the basic observation to Breadth-First Search, for
which an algorithm with similar properties is given. The idea is simple: instead of examining
arcs in the “forwards” direction, as in standard, textbook formulations of DFS and BFS [2],
incoming, “backwards” arcs are used to eliminate arcs that are no longer relevant for the search.
Whereas the standard algorithms have conflicts (BFS) and/or dependencies (DFS) that hamper
parallelization, arc elimination can be performed fully in parallel.
Let G = (V, E) be a directed graph with n = |V| vertices and m = |E| arcs (directed edges).
Vertices are assumed to be numbered consecutively, such that V = {0, ..., n − 1}. Arcs are
ordered pairs of vertices, with ⟨u, v⟩, u, v ∈ V, denoting the arc directed from source u to
target v.
It will be assumed that the input graph G is given as an n-element array ADJ of adjacency
arrays. For each vertex u ∈ V, ADJ[u].outdeg stores the out-degree of u, and the target vertex
v_i of the ith arc ⟨u, v_i⟩ for 0 ≤ i < ADJ[u].outdeg is stored in ADJ[u].out[i].

Depth- and Breadth-First Search are procedures for graph traversal starting from a given
start vertex s ∈ V. Both procedures assign traversal numbers to the vertices, indicating the
order in which they are reached. Breadth-First Search additionally computes for each vertex its
distance (shortest path in number of traversed arcs) from the start vertex. Both procedures also
compute the search tree, which will be represented by a parent pointer. For each vertex u ∈ V,
these computed values will be stored in ADJ[u].traversal, ADJ[u].distance and ADJ[u].parent
(for u ≠ s), respectively.
The search procedures will modify the input graph by eliminating arcs, and both maintain
the following invariant.
Invariant 1 A vertex v is called visited when it has been assigned its (Depth- or Breadth-First
Search) traversal number. Each visited vertex v ∈ V will have no incoming arcs, that is, there
will be no arc ⟨u, v⟩ for any u ∈ V.
In order to maintain Invariant 1 the procedures eliminate incoming arcs when a vertex is
being visited. To do this efficiently, each vertex v ∈ V needs an array storing the vertices u ∈ V
for which there is an arc ⟨u, v⟩, as well as the index i of v in the array ADJ[u].out such that
v = ADJ[u].out[i]. The arrays ADJ[v].in for each v ∈ V shall store the pairs (u, i) representing
the incoming arcs of v in this fashion.
To eliminate the arc ⟨u, v⟩ from the adjacency array of u, links (indices) to next and previous
non-removed vertices in the array are maintained, imposing a doubly linked list on each of the
adjacency arrays. The operation eliminate(G, u, i) removes the ith vertex in ADJ[u].out by
linking it out of the doubly linked list. The adjacency array itself is not changed. Next and
previous indices are maintained in the arrays ADJ[u].next and ADJ[u].prev; ADJ[u].first shall
index the first non-eliminated vertex in ADJ[u].out.
Algorithm 1 shows how to compute the array of incoming arcs and the pointers for the
doubly linked adjacency lists. A for-construct indicates sequential execution for all values in
some index set (in some order), whereas the par-construct indicates that the computations for
each element in the index set can be performed in parallel by the available processors. All
processors are assumed to have access to the same memory, and concurrent reading is allowed.
Synchronization is implied at the end of each par-construct.

Algorithm 1 Computing incoming arcs for each v ∈ V and doubly linked adjacency lists.
par u ∈ V do
    ADJ[u].indeg ← 0
    ADJ[u].first ← 0
end par
for u ∈ V do
    par 0 ≤ i < ADJ[u].outdeg do
        v ← ADJ[u].out[i]
        d ← ADJ[v].indeg
        ADJ[v].in[d] ← (u, i) {Add incoming arc ⟨u, v⟩ to v}
        ADJ[v].indeg ← d + 1
        ADJ[u].next[i] ← i + 1
        ADJ[u].prev[i] ← i − 1
    end par
end for
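As a rough illustration, Algorithm 1 can be sketched sequentially in Python. This is only a sketch under stated assumptions: the names Vertex, build, inc, nxt and prv are hypothetical (inc stands in for ADJ[v].in, since in is a Python keyword), the par-loop is run as an ordinary loop, and degrees are taken from list lengths instead of separate outdeg/indeg fields.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    out: list                                 # targets of outgoing arcs, in fixed order
    inc: list = field(default_factory=list)   # pairs (u, i) with adj[u].out[i] == this vertex
    nxt: list = field(default_factory=list)   # index of next live arc; len(out) marks the end
    prv: list = field(default_factory=list)   # index of previous live arc; -1 marks the start
    first: int = 0                            # index of first live (non-eliminated) arc


def build(out_lists):
    """Sequential stand-in for Algorithm 1 (the paper's par-loop becomes a plain loop)."""
    adj = [Vertex(out=list(targets)) for targets in out_lists]
    for u, vert in enumerate(adj):
        for i, v in enumerate(vert.out):
            adj[v].inc.append((u, i))         # record incoming arc <u, v> at v
            vert.nxt.append(i + 1)
            vert.prv.append(i - 1)
    return adj


# Hypothetical example graph with arcs 0->1, 0->2, 1->2, 2->0.
adj = build([[1, 2], [2], [0]])
print(adj[2].inc)   # incoming arcs of vertex 2 as (source, index) pairs
```

Appending to inc plays the role of the indeg counter d in the pseudocode; len(adj[v].inc) is the in-degree once build has finished.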

Lemma 1 Algorithm 1 computes the array of (u, i) vertex-index pairs representing the incoming
arcs for all vertices v ∈ V. It also initializes the doubly linked lists over the adjacency arrays
ADJ[u].out. The algorithm runs in O(m/p + n) time steps with p processors using O(m + n)
additional space.

Proof: In each sequential iteration over the set of vertices, incoming arcs are added to different
target vertices. For each u ∈ V this can therefore be done in parallel by the p available
processors in O(⌈d(u)/p⌉) time steps, where d(u) is the out-degree of vertex u, provided that all
processors can read the start address of the array. The total time is
O(n + Σ_{u∈V} d(u)/p) = O(m/p + n). ✷

The ADJ[u].first indices for each u ∈ V will be maintained such that
ADJ[u].first < ADJ[u].outdeg indicates a non-empty list of non-eliminated arcs out of u.
The eliminate operation is straightforward.
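One possible reading of the eliminate operation, for the adjacency array of a single vertex, is sketched below in Python. The class OutList and its method names are hypothetical. The sketch assumes the initialization of Algorithm 1 (next[i] = i + 1, prev[i] = i − 1, first = 0) and that each arc is eliminated at most once, which holds in the search procedures because an arc is eliminated only when its target is visited.

```python
class OutList:
    """Adjacency array of one vertex with a doubly linked list over live arcs."""

    def __init__(self, targets):
        self.out = list(targets)
        n = len(self.out)
        self.nxt = [i + 1 for i in range(n)]   # nxt == n means "no next live arc"
        self.prv = [i - 1 for i in range(n)]   # prv == -1 means "no previous live arc"
        self.first = 0                         # first == n means the list is empty

    def eliminate(self, i):
        """Unlink entry i in O(1); the array self.out itself is left unchanged."""
        p, n = self.prv[i], self.nxt[i]
        if p >= 0:
            self.nxt[p] = n
        else:
            self.first = n                     # entry i was the first live arc
        if n < len(self.out):
            self.prv[n] = p

    def live(self):
        """Yield the targets of the arcs that have not been eliminated, in order."""
        i = self.first
        while i < len(self.out):
            yield self.out[i]
            i = self.nxt[i]


lst = OutList([10, 11, 12, 13])
lst.eliminate(0)
lst.eliminate(2)
print(list(lst.live()))   # [11, 13]
```

Because only index links change, positions stored as (u, i) pairs in the incoming-arc arrays stay valid after any number of eliminations.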

2 Depth-First Search
We can now present the parallel Depth-First Search algorithm. The DFS procedure is called
with a start vertex s ∈ V and its DFS number a, and computes a DFS tree with reachable
vertices numbered successively in DFS order starting from a. Each recursive call visits a new
vertex, assigns it a DFS number, establishes Invariant 1 by eliminating, in parallel, all arcs into
the vertex, and then recursively DFS numbers the subtree from the first non-eliminated arc
⟨s, v⟩ out of s; outgoing arcs are always considered in the fixed order as given in the adjacency
array representation of G. The recursion traverses the vertices in G reachable from s in DFS
order. The algorithm is given in detail as Algorithm 2, and is essentially as described by Varman
and Doshi [9].

Algorithm 2 Recursive, parallel Depth-First Search from start vertex s ∈ V. Vertices that are
reachable from s will be assigned successive DFS numbers starting from a.
Procedure DFS(s, G, a):
par 0 ≤ i < ADJ[s].indeg do
    (u, j) ← ADJ[s].in[i]
    eliminate(G, u, j)
end par
ADJ[s].traversal ← a {Vertex s now visited}
a ← a + 1
while ADJ[s].first < ADJ[s].outdeg do {As long as there are un-eliminated arcs}
    i ← ADJ[s].first
    v ← ADJ[s].out[i]
    ADJ[v].parent ← s
    a ← DFS(v, G, a)
end while
return a
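A sequential Python sketch of Algorithm 2 may help in tracing how arc elimination drives the recursion. It is not a parallel implementation: the par-loop runs as an ordinary loop, and the names prepare, eliminate, dfs and the dictionary layout of g are assumptions made for illustration. The sketch also assumes a graph without parallel arcs, so that the eliminations for one visited vertex touch distinct adjacency arrays.

```python
def prepare(out_lists):
    """Sequential stand-in for Algorithm 1: incoming arcs plus list links."""
    n = len(out_lists)
    g = {
        "out": [list(t) for t in out_lists],
        "inc": [[] for _ in range(n)],                        # (u, i) pairs per target
        "nxt": [[i + 1 for i in range(len(t))] for t in out_lists],
        "prv": [[i - 1 for i in range(len(t))] for t in out_lists],
        "first": [0] * n,
        "traversal": [None] * n,
        "parent": [None] * n,
    }
    for u, targets in enumerate(g["out"]):
        for i, v in enumerate(targets):
            g["inc"][v].append((u, i))
    return g


def eliminate(g, u, i):
    """Unlink the i-th arc out of u from u's doubly linked list of live arcs."""
    p, n = g["prv"][u][i], g["nxt"][u][i]
    if p >= 0:
        g["nxt"][u][p] = n
    else:
        g["first"][u] = n
    if n < len(g["out"][u]):
        g["prv"][u][n] = p


def dfs(g, s, a=0):
    """Number the vertices reachable from s from a onwards; return the next free number."""
    for u, j in g["inc"][s]:          # par-loop in the paper
        eliminate(g, u, j)
    g["traversal"][s] = a             # s is now visited and has no incoming arcs
    a += 1
    while g["first"][s] < len(g["out"][s]):
        v = g["out"][s][g["first"][s]]
        g["parent"][v] = s
        a = dfs(g, v, a)              # visiting v eliminates <s, v>, so first advances
    return a


# Hypothetical example graph with arcs 0->1, 0->3, 1->2, 2->0, 2->3.
g = prepare([[1, 3], [2], [0, 3], []])
dfs(g, 0)
print(g["traversal"], g["parent"])    # [0, 1, 2, 3] [None, 0, 1, 2]
```

Python's default recursion limit makes the recursive form unsuitable for deep graphs; an explicit stack would be needed there, which the sketch leaves out to stay close to the pseudocode.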

Proposition 1 Algorithm 2 computes an ordered Depth-First Search numbering and tree in
O(m/p + n) time steps using p processors.

Proof: By Invariant 1, once a vertex v is visited, it will never be considered again, since
all arcs into v will have been eliminated. Therefore, each vertex in G that is reachable from s
will be visited once. The time complexity is immediate: when a vertex is visited, the incoming
arcs are eliminated in parallel. Since ADJ[u].first will for each vertex u be the index of the
first adjacent vertex v where the arc ⟨u, v⟩ has not been eliminated, the order in which vertices
are visited is the same as in standard DFS procedures, from which the correctness follows. ✷

Figure 1: A sample graph G = (V, E) and the DFS traversal as per Algorithm 2 starting
from the topmost node. Arcs are examined in counter-clockwise order, starting from lower left.
Node labels are the DFS numbers, and tree edges are indicated as heavy, undirected edges.
Arcs disappear as they are being eliminated, leaving at the end the heavy DFS tree.

The algorithm also computes a DFS tree by setting parent pointers for the visited vertices.
Note that it can easily be extended to classify arcs into backward, forward, tree, and cross arcs,
as is sometimes required of a DFS traversal, without changing the time bounds. An example
execution of the algorithm is given in Figure 1.

3 Breadth-First Search
The arc elimination idea can also be used for parallel Breadth-First Search, as shown in
Algorithm 3. The algorithm has the same structure as standard, “forwards” BFS [2], but performs
parallel arc elimination as new, unexplored vertices are added to the queue for the next level.
This ensures that each reachable vertex is explored once.

Proposition 2 Algorithm 3 computes an ordered Breadth-First Search numbering and tree in
O(m/p + n) time steps using p processors.

Proof: Again by Invariant 1, once a vertex has been visited it will never be considered again,
and therefore arc elimination is performed once for each reachable vertex. From this the time
bound follows. For each vertex in Q for some level, the un-eliminated arcs are considered in
the order determined by the representation of G, and as each unvisited vertex is put into the
queue Q′ for the next level, all its incoming arcs are eliminated. This in particular ensures that
there are no arcs between vertices in Q′. As for standard BFS, all vertices in Q before the start
of an iteration of the innermost repeat loop have the same distance to the source vertex, from
which correctness follows. ✷

It is especially worth noticing that there are no arcs between nodes in Q′, the queue being
filled for the next iteration. An example execution of the algorithm is given in Figure 2. The
important property of BFS is that vertices are explored in least recently visited order; it is
therefore, also in the arc elimination algorithm, possible to dispense with the explicit next level
queue Q′ and do with only a single repeat-loop [8].

Algorithm 3 Parallel Breadth-First Search from start vertex s ∈ V. Vertices that are reachable
from s will be assigned BFS numbers starting from a; also the distance from s (in smallest
number of arcs) will be computed.
Procedure BFS(s, G, a):
par 0 ≤ i < ADJ[s].indeg do
    (u, j) ← ADJ[s].in[i]
    eliminate(G, u, j)
end par
l ← 0
ADJ[s].traversal ← a
ADJ[s].distance ← l
Q.enque(s) {Start vertex s visited}
repeat
    l ← l + 1 {Next level}
    Q′ ← ∅
    repeat
        u ← Q.deque()
        while ADJ[u].first < ADJ[u].outdeg do {As long as there are un-eliminated arcs}
            i ← ADJ[u].first
            v ← ADJ[u].out[i]
            par 0 ≤ j < ADJ[v].indeg do
                (w, k) ← ADJ[v].in[j]
                eliminate(G, w, k)
            end par
            a ← a + 1 {Next vertex}
            ADJ[v].traversal ← a
            ADJ[v].distance ← l
            ADJ[v].parent ← u
            Q′.enque(v) {Vertex v has now been visited, enque for next level}
        end while
    until Q = ∅
    Q ← Q′
until Q = ∅

Figure 2: The sample graph G = (V, E) and the BFS traversal as per Algorithm 3 starting
from the topmost node. Arcs are examined in counter-clockwise order, starting from lower left.
Node labels are the computed BFS numbers, and tree edges are indicated as heavy, undirected
edges. Arcs disappear as they are being eliminated, leaving at the end the heavy BFS tree.
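The level-by-level structure of Algorithm 3 can be traced with the following sequential Python sketch. As before, this is an illustration under assumptions and not a parallel implementation: the par-loops run as ordinary loops, the names prepare, eliminate, bfs and the dictionary layout of g are hypothetical, and the graph is assumed to have no parallel arcs. The next-level queue is created afresh for every level.

```python
from collections import deque


def prepare(out_lists):
    """Sequential stand-in for Algorithm 1: incoming arcs plus list links."""
    n = len(out_lists)
    g = {
        "out": [list(t) for t in out_lists],
        "inc": [[] for _ in range(n)],                        # (u, i) pairs per target
        "nxt": [[i + 1 for i in range(len(t))] for t in out_lists],
        "prv": [[i - 1 for i in range(len(t))] for t in out_lists],
        "first": [0] * n,
        "traversal": [None] * n,
        "distance": [None] * n,
        "parent": [None] * n,
    }
    for u, targets in enumerate(g["out"]):
        for i, v in enumerate(targets):
            g["inc"][v].append((u, i))
    return g


def eliminate(g, u, i):
    """Unlink the i-th arc out of u from u's doubly linked list of live arcs."""
    p, n = g["prv"][u][i], g["nxt"][u][i]
    if p >= 0:
        g["nxt"][u][p] = n
    else:
        g["first"][u] = n
    if n < len(g["out"][u]):
        g["prv"][u][n] = p


def bfs(g, s, a=0):
    """Assign BFS numbers (starting from a), distances and parents from s."""
    for u, j in g["inc"][s]:                      # par-loop in the paper
        eliminate(g, u, j)
    level = 0
    g["traversal"][s], g["distance"][s] = a, level
    queue = deque([s])
    while queue:
        level += 1
        next_queue = deque()                      # Q' is emptied for every level
        while queue:
            u = queue.popleft()
            while g["first"][u] < len(g["out"][u]):
                v = g["out"][u][g["first"][u]]
                for w, k in g["inc"][v]:          # par-loop; removes <u, v> as well
                    eliminate(g, w, k)
                a += 1
                g["traversal"][v] = a
                g["distance"][v] = level
                g["parent"][v] = u
                next_queue.append(v)
        queue = next_queue


# Hypothetical example graph with arcs 0->1, 0->3, 1->2, 2->0, 2->3.
g = prepare([[1, 3], [2], [0, 3], []])
bfs(g, 0)
print(g["traversal"], g["distance"], g["parent"])
# [0, 1, 3, 2] [0, 1, 2, 1] [None, 0, 1, 0]
```

No visited flag is needed: a vertex still reachable through a live arc is unvisited by Invariant 1, since all arcs into a visited vertex are gone.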

4 Discussion
The time bounds for both Depth- and Breadth-First Search algorithms guarantee linear speed-up
when p ≤ m/n, or equivalently m ≥ pn; that is, good speed-up is possible for graphs
with average degree larger than the number of processors. The algorithms presented here
are complementary to the standard, textbook, “forwards” procedures for DFS and BFS [2].
Standard DFS where arcs are examined only in the forwards direction has no parallelism; in
contrast, the algorithm given here can perform the arc elimination fully in parallel. Typical,
parallel Breadth-First Search algorithms exploit parallelism mostly by considering active vertices
in the queue for each level in parallel, see, e.g., [1, 4, 6]. Although the forward edges can
also be explored in parallel, compaction or other data structure operations are necessary for
resolving/avoiding update conflicts and maintaining the queue for the next level. The parallel
running time of such algorithms will typically be bounded by the diameter of the graph, but
either at the cost of more work incurred by data structure operations, or by requiring stronger,
atomic operations. The arc elimination approach does not require either of these means (data
structures, compaction, atomic operations), and has exploitable parallelism independent of the
BFS structure of the graphs, as long as the total number of edges m satisfies m ≥ np.
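The condition p ≤ m/n can be made concrete with a small calculation. The Python sketch below evaluates the step bound m/p + n with the constants hidden in the O-notation dropped, so the numbers are illustrative only; steps and speedup are hypothetical helper names, and the graph sizes are made up. For p ≤ m/n we have n ≤ m/p, hence m/p + n ≤ 2m/p and the ratio to the one-processor bound is at least p/2; for larger p the n term dominates and the ratio stays below m/n + 1.

```python
from fractions import Fraction


def steps(m, n, p):
    """Step bound m/p + n with constant factors dropped (illustrative only)."""
    return Fraction(m, p) + n


def speedup(m, n, p):
    """Ratio of the one-processor bound to the p-processor bound."""
    return steps(m, n, 1) / steps(m, n, p)


m, n = 1_000_000, 1_000          # made-up sizes; m/n = 1000
for p in (10, 100, 1_000, 1_000_000):
    print(p, float(speedup(m, n, p)))
```

With these sizes the ratio at p = 1000 is 500.5, and at p = 1,000,000 it is exactly 1000, so adding processors beyond m/n gains at most a factor of two in this model.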

References
[1] D. A. Bader and K. Madduri. Designing multithreaded algorithms for breadth-first search
and st-connectivity on the Cray MTA-2. In International Conference on Parallel Processing
(ICPP), pages 523–530, 2006.

[2] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT
Press, third edition, 2009.

[3] J. A. Edwards and U. Vishkin. Better speedups using simpler parallel programming for
graph connectivity and biconnectivity. In International Workshop on Programming Models
and Applications for Multicores and Manycores (PMAM), pages 103–114, 2012.

[4] C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm
(or how to cope with the nondeterminism of reducers). In 22nd Annual ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), pages 303–314, 2010.

[5] J. H. Reif. Depth-first search is inherently sequential. Information Processing Letters,
20:229–234, 1985.

[6] J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared
memory. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
(PPoPP), pages 135–146, 2013.

[7] R. E. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing,
1(2):146–160, 1972.

[8] R. E. Tarjan. Data Structures and Network Algorithms. Society for Industrial and Applied
Mathematics (SIAM), 1983.

[9] P. J. Varman and K. Doshi. Improved parallel algorithms for the depth-first search and
monotone circuit value problems. In 15th ACM Conference on Computer Science, pages
175–182, 1987.
