Parallel DFS and BFS

The document discusses parallel implementations of depth-first search (DFS) and best-first search (BFS) algorithms. For parallel DFS, the key issues are how to distribute the search space among processors and how idle processors obtain new work. Dynamic load balancing schemes like asynchronous round robin are proposed. For parallel BFS, a centralized approach where processors share a global queue is discussed, but this suffers from termination detection and contention issues with high queue access overhead.


Parallel Depth-First Search

and
Parallel Best-First Search
Depth-First Search (DFS)

 Depth-first search is the process of searching a graph in such a way that the search moves forward until it reaches a vertex whose neighbors have all been examined. At this point it backtracks a minimum distance and continues in a new direction.

 In other words, in a depth-first search, if v is the vertex being searched from (the starting vertex), (v, w) is the edge being examined, and w is unvisited, then w will be the next vertex searched from (the new starting point). The DFS is called recursively.
Cont...

 However, if v is the starting vertex, (v, w) is the edge being examined, and w is already visited, then v remains the vertex being searched from, and another vertex adjacent to v that has not been visited is chosen as the vertex to be examined.
 This indicates that in the DFS strategy the vertices are examined in order of
decreasing depth in the tree.
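The recursive DFS described above can be sketched as follows (a minimal Python sketch, assuming the graph is given as an adjacency-list dictionary):

```python
def dfs(graph, v, visited=None):
    """Recursive depth-first search: from vertex v, follow an edge (v, w) to an
    unvisited w and search from w; backtrack when all neighbors are visited."""
    if visited is None:
        visited = []
    visited.append(v)
    for w in graph[v]:
        if w not in visited:
            dfs(graph, w, visited)   # w becomes the new starting point
    return visited

# Example: a small graph as an adjacency list
graph = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}
print(dfs(graph, 'a'))  # ['a', 'b', 'd', 'c']
```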
Parallel Depth-First Search

 The parallel depth-first search discussed here is applicable to SIMD and MIMD architectures. However, it is not recommended to implement a parallel DFS on SIMD computers, because of the following two problems:
 Since all processors execute identical instructions, all must be in the same stage of the search tree. It is possible that only some of the processors are busy while others are idle, which reduces the overall execution rate compared with MIMD computers.
 Due to the architectural constraints of SIMD computers, load balancing must be performed globally at the beginning of the search or at the end of each stage of the search, so that busy processors can share work with the idle processors.
Parallel Depth-First Search

• The critical issue in parallel depth-first search algorithms is the distribution of the search space among the processors.
• Because of the unstructured nature of tree search, static partitioning leads to load imbalance; the search space must therefore be distributed dynamically (dynamic load balancing).
Important Parameters of Parallel DFS

• Two characteristics of parallel DFS are critical to determining its performance:
• the first is the method for splitting work at a processor, and
• the second is the scheme to determine the donor processor when a processor becomes idle.

• Work-Splitting Strategies
• When work is transferred, the donor's stack is split into two stacks, one of
which is sent to the recipient.
• In other words, some of the nodes (that is, alternatives) are removed from
the donor's stack and added to the recipient's stack.
• If too little work is sent, the recipient quickly becomes idle; if too much,
the donor becomes idle.
• Ideally, the stack is split into two equal pieces such that the size of the
search space represented by each stack is the same. Such a split is
called a half-split.
Cont...

• Work-Splitting Strategies
• It is difficult to get a good estimate of the size of the tree rooted at an
unexpanded alternative in the stack.
• However, the alternatives near the bottom of the stack (that is, close to
the initial node) tend to have bigger trees rooted at them, and alternatives
near the top of the stack tend to have small trees rooted at them.
• To avoid sending very small amounts of work, nodes beyond a specified
stack depth are not given away. This depth is called the cutoff depth.

• Some possible strategies for splitting the search space are:
(1) send nodes near the bottom of the stack,
(2) send nodes near the cutoff depth, and
(3) send half the nodes between the bottom of the stack and the cutoff depth.
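Strategy (3) can be sketched as follows (a simplified Python sketch, assuming the stack is a list of unexpanded alternatives with the bottom at index 0; donating every other eligible node is one crude way to approximate a half-split):

```python
def split_stack(stack, cutoff_depth):
    """Split a donor's stack for work transfer (strategy 3): share half of the
    alternatives between the bottom of the stack and the cutoff depth.
    Nodes beyond the cutoff depth are never given away."""
    eligible = stack[:cutoff_depth]                 # nodes shallower than the cutoff
    donated = eligible[0::2]                        # every other eligible node
    kept = eligible[1::2] + stack[cutoff_depth:]    # the rest stays with the donor
    return kept, donated

stack = ['n0', 'n1', 'n2', 'n3', 'n4', 'n5']        # bottom (n0) to top (n5)
kept, donated = split_stack(stack, cutoff_depth=4)
print(donated)  # ['n0', 'n2']
print(kept)     # ['n1', 'n3', 'n4', 'n5']
```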
Cont...

Load-Balancing Schemes:
• Asynchronous Round Robin (ARR)
• each processor maintains an independent variable, target.
• Whenever a processor runs out of work, it uses target as the label of a
donor processor and attempts to get work from it.
• The value of target is incremented (modulo p) each time a work request
is sent.
• The initial value of target at each processor is set to ((label + 1) modulo
p) where label is the local processor label.
• Here, work requests are generated independently by each processor.
• However, it is possible for two or more processors to request work from
the same donor at nearly the same time.
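A minimal sketch of the ARR target bookkeeping (Python, with illustrative class and method names):

```python
class ARRWorker:
    """Asynchronous round robin (sketch): each processor keeps its own target
    pointer, initialized to (label + 1) mod p, and advances it (modulo p)
    after every work request."""
    def __init__(self, label, p):
        self.p = p
        self.target = (label + 1) % p        # initial value per the scheme

    def next_donor(self):
        donor = self.target                  # processor to request work from
        self.target = (self.target + 1) % self.p
        return donor

w = ARRWorker(label=2, p=4)
print([w.next_donor() for _ in range(4)])  # [3, 0, 1, 2]
```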
Cont...

Load-Balancing Schemes:
• Global Round Robin (GRR)
• It uses a single global variable called target.
• This variable can be stored in a globally accessible space in shared
address space machines or at a designated processor in message
passing machines.
• Whenever a processor needs work, it requests and receives the value of target, either by locking, reading, and unlocking it on shared-address-space machines, or by sending a request message to the designated processor (say P0) on message-passing machines.
• The value of target is incremented (modulo p) before responding to the
next request.
• The recipient processor then attempts to get work from a donor processor
whose label is the value of target.
• GRR ensures that successive work requests are distributed evenly over
all processors.
• A drawback of this scheme is the contention for access to target.
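A sketch of the shared-address-space variant of GRR (Python; the lock models the lock/read/unlock access described above, and the class name is illustrative):

```python
import threading

class GlobalTarget:
    """Global round robin (sketch): a single shared target variable, accessed
    by locking, reading, incrementing (modulo p), and unlocking."""
    def __init__(self, p):
        self.p = p
        self.target = 0
        self.lock = threading.Lock()

    def next_donor(self):
        with self.lock:                      # the contention point of GRR
            donor = self.target
            self.target = (self.target + 1) % self.p
            return donor

grr = GlobalTarget(p=4)
print([grr.next_donor() for _ in range(5)])  # [0, 1, 2, 3, 0]
```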
Cont...

Load-Balancing Schemes:
• Random Polling (RP)
• It is the simplest load-balancing scheme.
• When a processor becomes idle, it randomly selects a donor.
• Each processor is selected as a donor with equal probability, ensuring
that work requests are evenly distributed.
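A sketch of donor selection under random polling (Python; the function name is illustrative, and the sketch assumes a processor never polls itself):

```python
import random

def random_donor(my_label, p):
    """Random polling (sketch): an idle processor picks a donor uniformly at
    random from the other p - 1 processors."""
    donor = random.randrange(p)
    while donor == my_label:                 # avoid requesting work from ourselves
        donor = random.randrange(p)
    return donor

print(random_donor(0, 4) in (1, 2, 3))  # True
```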
Best-First Search

• Best-first search is a way of combining the advantages of both depth-first and breadth-first search into a single method.
• It uses a heuristic function to direct the traversal of the search tree. Smaller heuristic values are assigned to more promising nodes.
• The promise of a vertex v is estimated numerically by a heuristic evaluation function f(v), which may depend on the description of v, the description of the goal, the information gathered by the search up to this point, and any extra knowledge about the problem domain.
Cont...

• The vertex selected for consideration is the one having the best
value of this evaluation function.
• If the selected vertex is a solution, we can quit; otherwise, all the newly generated vertices are added to the set of vertices generated so far for the next step of examination.
• For example, this value may be the associated cost of the element at that level with respect to the objective function.
• The main disadvantage of BFS is its memory requirement, which
is linear in the size of the search space explored. For problems
with a large search space tree, providing the required memory
becomes a problem.
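The sequential algorithm can be sketched with a priority queue as the open list (a minimal Python sketch; the toy successor and heuristic functions below are invented for illustration):

```python
import heapq

def best_first_search(start, goal, successors, h):
    """Sequential best-first search: repeatedly remove the vertex with the
    smallest heuristic value f(v) from the open list and expand it."""
    open_list = [(h(start), start)]
    visited = {start}
    while open_list:
        f, v = heapq.heappop(open_list)      # most promising vertex
        if v == goal:
            return v
        for w in successors(v):
            if w not in visited:
                visited.add(w)
                heapq.heappush(open_list, (h(w), w))
    return None

# Toy example: search integers toward 7; h is the distance to the goal
found = best_first_search(0, 7, lambda v: [v + 1, v + 2], lambda v: abs(7 - v))
print(found)  # 7
```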
Execution of Best-First Search
Parallel Best-First Search

• In the sequential best-first search algorithm, the most promising node is removed from the open list and expanded, and the newly generated nodes are added to the open list (assuming the open list is implemented as a priority queue).
• Parallelism in a best-first search can be introduced by expanding the vertices
in parallel.
• Centralized strategy:
• Each processor gets work from a single global open list or queue.
• Suppose p processors are available.
• At each step, instead of expanding only the single vertex with the best value of the evaluation function, the p vertices with the p best evaluation values are selected for expansion.
• After each iteration, each processor places the vertices it generated on the queue.
• The new vertices are evaluated and placed on the queue for the next step of the examination.
• The locking operation is used here to serialize queue access by various
processors.
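A minimal sketch of the centralized strategy (Python threads standing in for processors; the goal, successor rule, and heuristic are invented for illustration, no closed list is kept, and the simplistic empty-queue termination check is deliberately naive):

```python
import heapq
import threading

GOAL = 10
open_list = [(GOAL - 0, 0)]                  # (heuristic value, vertex); smaller is better
lock = threading.Lock()
expanded = []

def worker():
    while True:
        with lock:                           # serialize access to the shared queue
            if not open_list:
                return                       # naive termination check
            f, v = heapq.heappop(open_list)
            expanded.append(v)
            if v == GOAL:
                return
            for w in (v + 1, v + 2):         # generate successors
                if w <= GOAL:
                    heapq.heappush(open_list, (GOAL - w, w))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(GOAL in expanded)  # True
```

Because every queue access is inside the lock, the threads make progress one at a time at the queue, which is exactly the contention the next slides analyze.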
Cont...

• There are two problems with the centralized approach:

1. The termination criterion of sequential BFS fails for parallel BFS.
• Since at any moment, p nodes from the open list are being
expanded, it is possible that one of the nodes may be a solution that
does not correspond to the best goal node (or the path found is not
the shortest path).
• This is because the remaining p - 1 nodes may lead to search spaces
containing better goal nodes.
• Therefore, if the cost of a solution found by a processor is c, then this
solution is not guaranteed to correspond to the best goal node until
the cost of nodes being searched at other processors is known to be
at least c.
• The termination criterion must be modified to ensure that termination
occurs only after the best solution has been found.
Cont...

2. Since the open list is accessed for each node expansion, it must be easily
accessible to all processors, which can severely limit performance.
• Even on shared-address-space architectures, contention for the open
list limits speedup.
• Let t_exp be the average time to expand a single node, and t_access the average time to access the open list for a single node expansion.
• If there are n nodes to be expanded by both the sequential and parallel formulations (assuming that they do an equal amount of work), then the sequential run time is given by n(t_access + t_exp).
• Assume that it is impossible to parallelize the expansion of individual nodes. Then the parallel run time will be at least n·t_access, because the open list must be accessed at least once for each node expanded.
• Hence, an upper bound on the speedup is (t_access + t_exp)/t_access.
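A numerical illustration of this bound (the values of t_access and t_exp below are assumed for illustration):

```python
# With n node expansions, sequential time is n * (t_access + t_exp) and the
# parallel time is at least n * t_access, regardless of the number of
# processors, so speedup <= (t_access + t_exp) / t_access.
t_access, t_exp = 1.0, 9.0                   # illustrative timings
n = 1000
sequential_time = n * (t_access + t_exp)     # 10000.0
parallel_time_lower_bound = n * t_access     # 1000.0
speedup_bound = sequential_time / parallel_time_lower_bound
print(speedup_bound)  # 10.0
```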
Cont...

• One way to avoid the contention due to a centralized open list is to let each
processor have a local open list.
• Initially, the search space is statically divided among the processors by
expanding some nodes and distributing them to the local open lists of
various processors.
• All the processors then select and expand nodes simultaneously.
Cont...

• Consider a scenario where processors do not communicate with each other.
• In this case, some processors might explore parts of the search space that
would not be explored by the sequential algorithm.
• This leads to a high search overhead factor and poor speedup.
• Consequently, the processors must communicate among themselves to
minimize unnecessary search.
• The use of a distributed open list trades off communication and computation: decreasing communication between distributed open lists increases the search overhead factor, while decreasing the search overhead factor through increased communication increases the communication overhead.
Communication strategies for Parallel Best-First
Tree Search

• A communication strategy allows state-space nodes to be exchanged between open lists on different processors.
• The objective of a communication strategy is to ensure that nodes with good heuristic values are distributed evenly among processors.

• Three communication strategies:
• Random
• Ring
• Blackboard
Cont...

• Random communication strategy:
• Each processor periodically sends some of its best nodes to the open list of a randomly selected processor.
• This strategy ensures that if a processor stores a good part of the search space, the others get part of it.
• If nodes are transferred frequently, the search overhead
factor can be made very small; otherwise it can become quite
large.
• The communication cost determines the best node transfer
frequency.
• If the communication cost is low, it is best to communicate
after every node expansion.
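A sketch of one transfer under the random strategy (Python; open lists are heaps of (heuristic value, node) pairs, and the function name is illustrative):

```python
import heapq
import random

def send_best_nodes(open_lists, sender, k=2):
    """Random communication strategy (sketch): the sender moves its k best
    nodes to the open list of a randomly chosen other processor."""
    p = len(open_lists)
    dest = random.choice([i for i in range(p) if i != sender])
    for _ in range(min(k, len(open_lists[sender]))):
        node = heapq.heappop(open_lists[sender])   # best node first
        heapq.heappush(open_lists[dest], node)
    return dest

lists = [[(1, 'a'), (2, 'b'), (5, 'c')], []]       # valid heaps, p = 2
dest = send_best_nodes(lists, sender=0, k=2)       # moves (1,'a') and (2,'b')
print(dest)  # 1
```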
Cont...

• Ring communication strategy:
• The processors are mapped onto a virtual ring.
• Each processor periodically exchanges some of its best nodes with the open lists of its neighbors in the ring.
• This strategy can be implemented on message-passing as well as shared-address-space machines, with the processors organized into a logical ring.
• As before, the cost of communication determines the node transfer frequency.

• Unless the search space is highly uniform, the search overhead factor of this scheme is very high, because it takes a long time to distribute good nodes from one processor to all other processors.
Cont...

• A message-passing implementation of parallel best-first search using the ring communication strategy.
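One round of the ring strategy might be sketched as follows (Python; each processor passes its best nodes to its successor in the ring, open lists are heaps of (heuristic value, node) pairs, and the function name is illustrative):

```python
import heapq

def ring_exchange(open_lists, k=1):
    """Ring strategy (sketch): each processor i passes its k best nodes to
    its ring neighbor (i + 1) mod p. Good nodes therefore need up to p - 1
    rounds to travel around the ring, which is why this scheme is slow to
    spread good nodes."""
    p = len(open_lists)
    outgoing = []
    for i in range(p):
        take = min(k, len(open_lists[i]))
        outgoing.append([heapq.heappop(open_lists[i]) for _ in range(take)])
    for i in range(p):
        for node in outgoing[i]:
            heapq.heappush(open_lists[(i + 1) % p], node)

lists = [[(1, 'a')], [(2, 'b')], []]
ring_exchange(lists)
print(lists)  # [[], [(1, 'a')], [(2, 'b')]]
```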
Cont...

• Blackboard communication strategy:
• There is a shared blackboard through which nodes are switched among processors as follows:
• After selecting the best node from its local open list, a processor
expands the node only if its heuristic value is within a tolerable limit of
the best node on the blackboard.
• If the selected node is much better than the best node on the
blackboard, the processor sends some of its best nodes to the
blackboard before expanding the current node.
• If the selected node is much worse than the best node on the
blackboard, the processor retrieves some good nodes from the
blackboard and reselects a node for expansion.
• The blackboard strategy is suited only to shared-address-space computers, because the value of the best node on the blackboard has to be checked after each node expansion.
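The blackboard check can be sketched as follows (Python; the tolerance value, names, and heap representation of (heuristic value, node) pairs are illustrative, and the sketch assumes a nonempty blackboard):

```python
import heapq

def select_node(local_open, blackboard, tolerance=2):
    """Blackboard strategy (sketch): expand the local best node only if its
    heuristic value is within `tolerance` of the blackboard's best node;
    otherwise exchange nodes with the blackboard first."""
    best = heapq.heappop(local_open)
    bb_best = blackboard[0][0]               # blackboard's best heuristic value
    if best[0] > bb_best + tolerance:        # much worse: fetch good nodes back
        heapq.heappush(local_open, best)
        heapq.heappush(local_open, heapq.heappop(blackboard))
        return heapq.heappop(local_open)     # reselect a node for expansion
    if best[0] < bb_best - tolerance and local_open:  # much better: share first
        heapq.heappush(blackboard, heapq.heappop(local_open))
    return best                              # then expand the selected node

local = [(1, 'a'), (4, 'b')]
blackboard = [(10, 'z')]
node = select_node(local, blackboard)
print(node)  # (1, 'a'); (4, 'b') was donated to the blackboard first
```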
Cont...

• An implementation of parallel best-first search using the blackboard communication strategy.
