Parallel DFS and BFS, and Parallel Best-First Search
Depth-First Search (DFS)
In DFS, when the edge (v, w) being examined leads to an unvisited vertex w, the search moves to w, which becomes the new vertex being searched from. However, if w is already visited, v remains the vertex being searched from, and another unvisited vertex adjacent to v is chosen as the next vertex to be examined.
This means that the DFS strategy always examines the most recently generated, and hence deepest, unexplored vertex first: the search descends as far as possible before backtracking.
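To make the strategy concrete, here is a minimal sketch of DFS with an explicit stack. This is an illustrative Python fragment only; the adjacency-list graph representation and the names used are assumptions, not part of the original slides.

```python
# A minimal DFS sketch: the deepest (most recently generated)
# unexplored vertex is always examined first.
def dfs(graph, start):
    visited = set()
    stack = [start]            # vertices awaiting examination
    order = []
    while stack:
        v = stack.pop()        # deepest unexplored vertex
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push unvisited neighbours; they will be examined before
        # any shallower alternatives left on the stack.
        for w in graph.get(v, []):
            if w not in visited:
                stack.append(w)
    return order

# Usage on a small hypothetical graph:
g = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}
print(dfs(g, 'a'))             # e.g. ['a', 'c', 'b', 'd']
```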
Parallel Depth-First Search
• The critical issue in parallel depth-first search algorithms is the distribution of the search space among the processors.
• Because tree search is unstructured, subtree sizes cannot be predicted in advance, so a static partitioning of the search space leads to load imbalance; this motivates dynamic load balancing.
Dynamic Load Balancing
Important Parameters of Parallel DFS
• Work-Splitting Strategies
• When work is transferred, the donor's stack is split into two stacks, one of
which is sent to the recipient.
• In other words, some of the nodes (that is, alternatives) are removed from
the donor's stack and added to the recipient's stack.
• If too little work is sent, the recipient quickly becomes idle; if too much,
the donor becomes idle.
• Ideally, the stack is split into two equal pieces such that the size of the
search space represented by each stack is the same. Such a split is
called a half-split.
• It is difficult to get a good estimate of the size of the tree rooted at an
unexpanded alternative in the stack.
• However, alternatives near the bottom of the stack (that is, close to the initial node) tend to have larger trees rooted at them, while alternatives near the top of the stack tend to have smaller trees rooted at them.
• To avoid sending very small amounts of work, nodes beyond a specified
stack depth are not given away. This depth is called the cutoff depth.
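As an illustration only, the following sketch splits a donor stack by handing roughly every other eligible node to the recipient, while nodes beyond the cutoff depth stay with the donor. Representing the stack as (node, depth) pairs and the CUTOFF_DEPTH value are assumptions made for the example; real schemes split the untried alternatives at each stack level.

```python
CUTOFF_DEPTH = 10   # assumed value; tuned per application

def split_stack(stack):
    """stack: list of (node, depth) pairs, bottom of the stack first."""
    donor, recipient = [], []
    give = True
    for node, depth in stack:
        if depth >= CUTOFF_DEPTH:
            donor.append((node, depth))        # too deep: never given away
        elif give:
            recipient.append((node, depth))    # roughly half of eligible nodes
            give = False
        else:
            donor.append((node, depth))
            give = True
    return donor, recipient
```

Alternating between donor and recipient gives each side a mix of deep and shallow alternatives, which approximates a half-split when subtree sizes are unknown.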
Load-Balancing Schemes:
• Asynchronous Round Robin (ARR)
• each processor maintains an independent variable, target.
• Whenever a processor runs out of work, it uses target as the label of a
donor processor and attempts to get work from it.
• The value of target is incremented (modulo p) each time a work request
is sent.
• The initial value of target at each processor is set to ((label + 1) modulo
p) where label is the local processor label.
• Here, work requests are generated independently by each processor.
• However, it is possible for two or more processors to request work from
the same donor at nearly the same time.
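A minimal sketch of ARR target selection under these rules; the class and method names are illustrative, not from the slides.

```python
# Asynchronous Round Robin: each processor keeps a private `target`.
class ARRWorker:
    def __init__(self, label, p):
        self.label = label
        self.p = p
        self.target = (label + 1) % p              # initial value of target

    def next_donor(self):
        donor = self.target
        self.target = (self.target + 1) % self.p   # increment (modulo p) per request
        return donor
```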
• Global Round Robin (GRR)
• It uses a single global variable called target.
• This variable can be stored in a globally accessible space in shared
address space machines or at a designated processor in message
passing machines.
• Whenever a processor needs work, it requests and receives the value of target: on shared-address-space machines by locking, reading, and unlocking the variable, and on message-passing machines by sending a request message to the designated processor (say P0).
• The value of target is incremented (modulo p) before responding to the
next request.
• The recipient processor then attempts to get work from a donor processor
whose label is the value of target.
• GRR ensures that successive work requests are distributed evenly over
all processors.
• A drawback of this scheme is the contention for access to target.
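A sketch of GRR on a shared-address-space machine, assuming the shared target is protected by a lock; that lock is exactly the contention point noted above. Names are illustrative.

```python
import threading

# Global Round Robin: a single global `target` behind a lock.
class GRRCounter:
    def __init__(self, p):
        self.p = p
        self.target = 0
        self.lock = threading.Lock()

    def next_donor(self):
        with self.lock:                            # lock, read, increment, unlock
            donor = self.target
            self.target = (self.target + 1) % self.p
        return donor
```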
• Random Polling (RP)
• It is the simplest load-balancing scheme.
• When a processor becomes idle, it randomly selects a donor.
• Each processor is selected as a donor with equal probability, ensuring
that work requests are evenly distributed.
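A sketch of random polling donor selection (assumes p > 1; the function name is illustrative):

```python
import random

# Random Polling: an idle processor selects a donor uniformly at
# random among the other processors.
def next_donor(my_label, p):
    donor = random.randrange(p)
    while donor == my_label:          # never request work from oneself
        donor = random.randrange(p)
    return donor
```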
Best-First Search
• Best-first search ranks unexplored vertices with a heuristic evaluation function; the vertex selected for consideration is the one having the best value of this evaluation function.
• If the selected vertex is a solution, we can quit; otherwise, it is expanded and the newly generated vertices are added to the set of vertices generated so far for the next step of examination.
• For example, this value may be the cost of the partial solution represented by the vertex with respect to the objective function.
• The main disadvantage of best-first search is its memory requirement, which is linear in the size of the search space explored. For problems with a large search-space tree, providing the required memory becomes a problem.
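Putting the above together, here is a minimal sketch of sequential best-first search with the open list kept as a priority queue. The callables expand, evaluate, and is_goal are hypothetical problem-specific assumptions, and lower evaluation values are assumed better.

```python
import heapq
import itertools

def best_first_search(start, expand, evaluate, is_goal):
    tie = itertools.count()                    # tie-breaker for equal values
    open_list = [(evaluate(start), next(tie), start)]
    while open_list:
        _, _, v = heapq.heappop(open_list)     # vertex with the best value
        if is_goal(v):
            return v
        for w in expand(v):                    # generate and evaluate successors
            heapq.heappush(open_list, (evaluate(w), next(tie), w))
    return None
```

Note how the open list only grows as the search proceeds, which is the memory disadvantage described above.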
Execution of Best-First Search
[Figures: step-by-step execution of best-first search]
Parallel Best-First Search
• In the sequential best-first search algorithm, the most promising node is removed from the open list and expanded, and the newly generated nodes are added to the open list (the open list is typically implemented as a priority queue).
• Parallelism in a best-first search can be introduced by expanding the vertices
in parallel.
• Centralized strategy:
• Each processor gets work from a single global open list or queue.
• Suppose p processors are available.
• At each step, instead of expanding the single vertex with the best value of the evaluation function, the p vertices with the p best values are considered for expansion.
• After each expansion, each processor evaluates the vertices it has generated and places them on the queue for the next step of the examination.
• A locking operation is used to serialize access to the queue by the various processors.
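A sketch of this centralized strategy using threads and a lock-protected global open list. Termination detection is deliberately simplified, and expand, evaluate, and is_goal remain hypothetical callables as before.

```python
import heapq
import itertools
import threading

def parallel_best_first(start, expand, evaluate, is_goal, p=4):
    tie = itertools.count()                           # tie-breaker for equal values
    open_list = [(evaluate(start), next(tie), start)] # shared global open list
    lock = threading.Lock()
    result = {}

    def worker():
        while 'solution' not in result:
            with lock:                                # serialized queue access
                if not open_list:
                    return                            # simplified termination test
                _, _, v = heapq.heappop(open_list)
            if is_goal(v):
                result['solution'] = v
                return
            children = [(evaluate(w), w) for w in expand(v)]
            with lock:                                # place new vertices on the queue
                for val, w in children:
                    heapq.heappush(open_list, (val, next(tie), w))

    threads = [threading.Thread(target=worker) for _ in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.get('solution')
```

Every pop and every push goes through the same lock, which is the serialization the next bullets analyze.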
• Moreover, since the open list is accessed for each node expansion, it must be easily accessible to all processors, which can severely limit performance.
• Even on shared-address-space architectures, contention for the open
list limits speedup.
• Let t_exp be the average time to expand a single node, and t_access be the average time to access the open list for a single node expansion.
• If there are n nodes to be expanded by both the sequential and parallel formulations (assuming that they do an equal amount of work), then the sequential run time is given by n(t_access + t_exp).
• Assume that it is impossible to parallelize the expansion of individual nodes. Then the parallel run time will be at least n·t_access, because the open list must be accessed at least once for each node expanded.
• Hence, an upper bound on the speedup is (t_access + t_exp)/t_access.
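• As a hypothetical illustration of this bound: if t_exp = 9·t_access, the speedup can never exceed (t_access + 9·t_access)/t_access = 10, regardless of the number of processors.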
• One way to avoid the contention due to a centralized open list is to let each
processor have a local open list.
• Initially, the search space is statically divided among the processors by
expanding some nodes and distributing them to the local open lists of
various processors.
• All the processors then select and expand nodes simultaneously.
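A sketch of this initial static division: the root is expanded breadth-first until there is at least one node per processor, and the frontier is dealt round-robin into p local open lists. The callables expand and evaluate are hypothetical, and round-robin dealing is just one simple choice among many.

```python
import heapq

def distribute(start, expand, evaluate, p):
    frontier = [start]
    # Grow the frontier breadth-first until there is >= one node per processor.
    while frontier and len(frontier) < p:
        v = frontier.pop(0)
        frontier.extend(expand(v))
    local_open = [[] for _ in range(p)]
    for i, v in enumerate(frontier):
        # The index i doubles as a tie-breaker for equal evaluations.
        heapq.heappush(local_open[i % p], (evaluate(v), i, v))
    return local_open
```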