UNIT-3 Concurrent and Parallel Programming:: Some Simple Computations
2. Parallel Prefix Computation. With the same assumptions as in the preceding paragraph, a
parallel prefix computation is defined as simultaneously evaluating all of the prefixes of the
expression x0 ⊕ x1 ⊕ . . . ⊕ xn–1; i.e., x0, x0 ⊕ x1, x0 ⊕ x1 ⊕ x2, . . . , x0 ⊕ x1 ⊕ . . . ⊕ xn–1.
Note that the ith prefix expression is si = x0 ⊕ x1 ⊕ . . . ⊕ xi.
The graph representing the prefix computation on a uniprocessor is similar to Fig. 3.1, but with
the intermediate values also output.
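On a uniprocessor, all n prefixes can be obtained in a single left-to-right sweep. A minimal Python sketch (the function name, and the use of + as a stand-in for the generic operation ⊕, are our own):

```python
from operator import add

def prefixes(x, op=add):
    """Return all prefixes s_i = x0 (+) x1 (+) ... (+) xi of the list x."""
    out = [x[0]]
    acc = x[0]
    for v in x[1:]:
        acc = op(acc, v)   # fold the next element into the running prefix
        out.append(acc)
    return out
```

For example, `prefixes([1, 2, 3, 4])` yields `[1, 3, 6, 10]`, and any associative operation such as `max` can be substituted for `add`.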
such that it gets to the destination as quickly as possible. The problem becomes more challenging
when multiple packets reside at different processors, each with its own destination. In this case,
the packet routes may interfere with one another as they go through common intermediate
processors. When each processor has at most one packet to send and one packet to receive, the
packet routing problem is called one-to-one communication or 1–1 routing.
5. Sorting: Rather than sorting a set of records, each with a key and data elements, we focus on
sorting a set of keys for simplicity. Our sorting problem is thus defined as: Given a list of n keys
x0, x1, . . . , xn–1, and a total order ≤ on key values, rearrange the n keys as xi0, xi1, . . . , xin–1,
such that xi0 ≤ xi1 ≤ . . . ≤ xin–1. We consider only sorting the keys in nondescending order.
SOME SIMPLE ARCHITECTURES
In this section, we define four simple parallel architectures:
Linear Array: Figure 3.2 shows a linear array of nine processors, numbered 0 to 8. The
diameter of a p-processor linear array, defined as the longest of the shortest distances between
pairs of processors, is D = p – 1. The (maximum) node degree, defined as the largest number of
links or communication channels associated with a processor, is d = 2. The ring variant, also
shown in Fig. 3.2, has the same node degree of 2 but a smaller diameter of D = ⌊p/2⌋.
Fig 3.2: A linear array of nine processors and its ring variant.
Binary Tree: Figure 3.3 shows a binary tree of nine processors. This binary tree is balanced in
that the leaf levels differ by at most 1. If all leaf levels are identical and every nonleaf processor
has two children, the binary tree is said to be complete. The diameter of a p-processor complete
binary tree is 2 log2(p + 1) – 2. More generally, the diameter of a p-processor balanced binary
tree architecture is 2⌈log2 p⌉ or 2⌈log2 p⌉ – 1, depending on the placement of leaf nodes at the
last level. Unlike the linear array, several different p-processor binary tree architectures may
exist. This is usually not a problem, as we almost always deal with complete binary trees. The
(maximum) node degree in a binary tree is d = 3.
2D Mesh: Figure 3.4 shows a square 2D mesh of nine processors. The diameter of a p-processor
square mesh is D = 2√p – 2. More generally, the mesh does not have to be square. The diameter
of a p-processor r × (p/r) mesh is D = r + p/r – 2. Again, multiple 2D meshes may exist for the
same number p of processors, e.g., 2 × 8 or 4 × 4. Square meshes are usually preferred because
they minimize the diameter. The torus variant, also shown in Fig. 3.4, has end-around or
wraparound links for rows and columns. The node degree for both meshes and tori is d = 4, but a
p-processor r × (p/r) torus has a smaller diameter of D = ⌊r/2⌋ + ⌊p/(2r)⌋.
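The diameter formulas above can be checked with a small helper, treating an r × c mesh with c = p/r (the function names are our own):

```python
def mesh_diameter(r, c):
    # farthest pair: opposite corners, (r - 1) + (c - 1) hops
    return (r - 1) + (c - 1)

def torus_diameter(r, c):
    # wraparound links halve the worst-case distance in each dimension
    return r // 2 + c // 2
```

For the nine-processor square mesh of Fig. 3.4 (r = c = 3) this gives a diameter of 4, and 2 for its torus variant.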
In the 2D mesh of Fig. 3.4, Processor 0 can send/receive data directly to/from P1 and P3.
However, it has to go through an intermediary to send/receive data to/from P4, say. In a shared-
memory multiprocessor, every piece of data is directly accessible to every processor (we assume
that each processor can simultaneously send/receive data over all of its p – 1 links). The diameter
D = 1 of a complete graph is an indicator of this direct access. The node
Fig 3.4: A 2D mesh of nine processors and its torus variant.
values, say in P2 and P6, the propagation would have been faster. In the worst case, p – 1
communication steps (each involving sending a processor's value to both neighbors), and the
same number of three-way comparison steps, are needed. This is the best one can hope for,
given that the diameter of a p-processor linear array is D = p – 1 (diameter-based lower bound).
Fig 3.6: Maximum-finding on a linear array of nine processors
For a general semigroup computation, the processor at the left end of the array (the one with no
left neighbor) becomes active and sends its data value to the right (initially, all processors are
dormant or inactive). On receiving a value from its left neighbor, a processor becomes active,
applies the semigroup operation ⊕ to the value received from the left and its own data value,
sends the result to the right, and becomes inactive again. This wave of activity propagates to the
right, until the rightmost processor obtains the desired result. The computation result is then
propagated leftward to all processors. In all, 2p – 2 communication steps are needed.
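The wave of activity can be simulated sequentially; each loop iteration below stands for one communication step of the array (the function name is our own):

```python
def semigroup_linear_array(values, op):
    """Simulate a semigroup computation on a p-processor linear array.

    The leftmost processor starts; each processor combines the value
    received from its left neighbour with its own value and passes the
    result on. Returns the result and the number of communication steps."""
    p = len(values)
    carried = values[0]
    steps = 0
    for i in range(1, p):              # wave moving right: p - 1 steps
        carried = op(carried, values[i])
        steps += 1
    steps += p - 1                     # broadcast the result back left
    return carried, steps
```

For a five-processor array computing the maximum, this returns the result after 2p – 2 = 8 steps.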
Parallel Prefix Computation. Let us assume that we want the ith prefix result to be obtained at
the ith processor, 0 ≤ i ≤ p – 1. The general semigroup algorithm described in the preceding
paragraph in fact performs a semigroup computation first and then does a broadcast of the final
value to all processors. Thus, we already have an algorithm for parallel prefix computation that
takes p – 1 communication/combining steps. A variant of the parallel prefix computation, in
which Processor i ends up with the prefix result up to the (i – 1)th value, is sometimes useful.
This diminished prefix computation can be performed just as easily if each processor holds onto
the value received from the left rather than the one it sends to the right. The diminished prefix
sum results for the example of Fig. 3.7 would be 0, 5, 7, 15, 21, 24, 31, 40, 41.
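A quick sequential check of the diminished prefix sums (the nine input values below are inferred from the results quoted for Fig. 3.7 and are therefore an assumption):

```python
def diminished_prefix_sums(x):
    """Processor i ends up with x0 + ... + x(i-1); processor 0 gets 0,
    the identity of +, since nothing arrives from its left."""
    out, acc = [], 0
    for v in x:
        out.append(acc)   # the value received from the left, kept as-is
        acc += v          # what would be sent to the right
    return out

# assumed Fig. 3.7 inputs
print(diminished_prefix_sums([5, 2, 8, 6, 3, 7, 9, 1, 4]))
```

which prints [0, 5, 7, 15, 21, 24, 31, 40, 41], matching the values quoted above.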
Thus far, we have assumed that each processor holds a single data item. Extension of the
semigroup and parallel prefix algorithms to the case where each processor initially holds several
data items is straightforward. Figure 3.8 shows a parallel prefix sum computation with each
processor initially holding two data items. The algorithm consists of each processor doing a
prefix computation on its own data set of size n/p (this takes n/p – 1 combining steps), then doing
a diminished parallel prefix computation on the linear array as above (p – 1
communication/combining steps), and finally combining the local prefix result from this last
computation with the locally computed prefixes (n/p combining steps). In all, 2n/p + p – 2
combining steps and p – 1 communication steps are required.
Fig 3.8: Computing prefix sums on a linear array with two items per processor.
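The three-phase algorithm (local prefixes, diminished prefix over the block totals, local fix-up) can be sketched as follows; the function name and block layout are our own:

```python
from itertools import accumulate

def prefix_sums_blocked(x, p):
    """Prefix sums with n/p items per processor (n divisible by p)."""
    n = len(x)
    blocks = [x[i * n // p:(i + 1) * n // p] for i in range(p)]
    local = [list(accumulate(b)) for b in blocks]   # n/p - 1 steps each
    offset, offsets = 0, []
    for b in local:                # diminished prefix over block totals
        offsets.append(offset)
        offset += b[-1]
    # each processor adds its offset to its local prefixes: n/p steps
    return [v + off for b, off in zip(local, offsets) for v in b]
```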
originating at different processors can flow rightward and leftward in lockstep, without ever
interfering with each other.
Broadcasting. If Processor i wants to broadcast a value a to all processors, it sends an rbcast(a)
(read r-broadcast) message to its right neighbor and an lbcast(a) message to its left neighbor.
Any processor receiving an rbcast(a) message, simply copies the value a and forwards the
message to its right neighbor (if any). Similarly, receiving an lbcast(a) message causes a to be
copied locally and the message forwarded to the left neighbor. The worst-case number of
communication steps for broadcasting is p – 1.
Sorting. We consider two versions of sorting on a linear array: with and without I/O. Figure 3.9
depicts a linear-array sorting algorithm when p keys are input, one at a time, from the left end.
Each processor, on receiving a key value from the left, compares the received value with the
value stored in its local register. The smaller of the two values is kept in the local register and the
larger value is passed on to the right. Once all p inputs have been received, we must allow p – 1
additional communication cycles for the key values that are in transit to settle into their
respective positions in the linear array. If the sorted list is to be output from the left, the output
phase can start immediately after the last key value has been received. In this case, an array half
the size of the input list would be adequate and we effectively have zero-time sorting, i.e., the
total sorting time is equal to the I/O time.
If the key values are already in place, one per processor, then an algorithm known as odd–even
transposition can be used for sorting. A total of p steps are required. In an odd-numbered step,
odd-numbered processors compare values with their even-numbered right neighbors. The two
processors exchange their values if they are out of order. Similarly, in an even-numbered step,
even-numbered processors compare–exchange values with their right neighbors (see Fig. 3.10).
In the worst case, the largest key value resides in Processor 0 and must move all the way to the
other end of the array. This needs p – 1 right moves. One step must be added because no
movement occurs in the first step. Of course, one could use even–odd transposition, but this will
not affect the worst-case time complexity of the algorithm for our nine-processor linear array.
Fig 3.9: Sorting on a linear array with the keys input sequentially from the left.
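The odd–even transposition procedure described above can be sketched sequentially in Python; whether the first step pairs even or odd positions does not affect correctness, only which of the p steps is idle:

```python
def odd_even_transposition_sort(a):
    """p synchronous steps; alternating steps compare-exchange the pairs
    (0,1), (2,3), ... and (1,2), (3,4), ... respectively."""
    a = list(a)
    p = len(a)
    for step in range(p):
        start = step % 2              # alternate the pairing each step
        for i in range(start, p - 1, 2):
            if a[i] > a[i + 1]:       # out of order: exchange
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```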
Our next example computation is important not only because it is a very useful building block in
many applications, but also because it demonstrates how a problem that seems hopelessly
sequential can be efficiently parallelized.
The problem will be presented in terms of a linear linked list of size p, but in practice it often
arises in the context of graphs of the types found in image processing and computer vision
applications. Many graph-theoretic problems deal with (directed) paths between various pairs of
nodes. Such a path essentially consists of a sequence of nodes, each “pointing” to the next node
on the path; thus, a directed path can be viewed as a linear linked list.
The problem of list ranking can be defined as follows: Given a linear linked list of the type
shown in Fig. 3.12, rank the list elements in terms of the distance from each to the terminal
element. The terminal element is thus ranked 0, the one pointing to it 1, and so forth. In a list of
length p, each element's rank will be a unique integer between 0 and p – 1.
Fig 3.12: Example linked list and the ranks of its elements.
A sequential algorithm for list ranking requires Θ(p) time. Basically, the list must be traversed
once to determine the distance of each element from the head, storing the results in the linked list
itself or in a separate integer vector. This first pass can also yield the length of the list (six in the
example of Fig. 3.12). A second pass, through the list, or the vector of p intermediate results,
then suffices to compute all of the ranks.
The list ranking problem for the example linked list of Fig. 3.12 may be approached with the
PRAM input and output data structures depicted in Fig. 3.13. The info and next vectors are given,
as is the head pointer (in our example, head = 2). The rank vector must be filled with the unique
element ranks at the termination of the algorithm.
The parallel solution method for this problem is known as pointer jumping:
Repeatedly make each element point to the successor of its successor (i.e., make the pointer jump
over the current successor) until all elements end up pointing to the terminal node, keeping track
of the number of list elements that have been skipped over. If the original list is not to be
modified, a copy can be made in the PRAM’s shared memory in constant time before the
algorithm is applied.
Processor j, 0 ≤ j < p, will be responsible for computing rank[j]. The invariant of the list ranking
algorithm given below is that initially and after each iteration, the partial computed rank of each
element is the difference between its rank and the rank of its successor. With the difference
between the rank of a list element and the rank of its successor available, the rank of an element
can be determined as soon as the rank of its successor becomes known. Again, a doubling
process takes place. Initially, only the rank of the terminal element (the only node that points to
itself) is known. In successive iterations of the algorithm, the ranks
Fig 3.13: PRAM data structures representing a linked list and the ranking results.
of two elements, then four elements, then eight elements, and so forth become known, until the
ranks of all elements have been determined.
PRAM list ranking algorithm (via pointer jumping)
Processor j, 0 ≤ j < p, do  {initialize the partial ranks}
   if next[j] = j then rank[j] := 0 else rank[j] := 1 endif
while rank[next[head]] ≠ 0 Processor j, 0 ≤ j < p, do
   rank[j] := rank[j] + rank[next[j]]
   next[j] := next[next[j]]
endwhile
Figure 3.14 shows the intermediate values in the vectors rank (numbers within boxes) and next
(arrows) as the above list ranking algorithm is applied to the example list. Because the number of
elements that are skipped doubles with each iteration, the number of iterations, and thus the
running time of the algorithm, is logarithmic in p.
Fig 3.14: Element ranks initially and after each of the three iterations
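The pointer-jumping algorithm translates almost line for line into Python. The example list below is hypothetical (it is not the one in Fig. 3.13); element j points to next_[j], and the terminal element points to itself:

```python
def list_rank(next_, head):
    """PRAM list ranking by pointer jumping, simulated sequentially.
    Each while-iteration plays the role of one parallel step."""
    p = len(next_)
    next_ = list(next_)
    rank = [0 if next_[j] == j else 1 for j in range(p)]
    while rank[next_[head]] != 0:
        # all "processors" read the old vectors, then write the new ones
        rank, next_ = ([rank[j] + rank[next_[j]] for j in range(p)],
                       [next_[next_[j]] for j in range(p)])
    return rank

# hypothetical 6-element list: 2 -> 4 -> 0 -> 5 -> 1 -> 3 (terminal)
print(list_rank([5, 3, 4, 3, 0, 1], head=2))
```

This prints [3, 1, 5, 0, 4, 2] after three pointer-jumping iterations, i.e., logarithmically many in p = 6.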
Parallel Algorithm-Introduction
An algorithm is a sequence of steps that take inputs from the user and, after some computation,
produce an output. A parallel algorithm is an algorithm that can execute several instructions
simultaneously on different processing devices and then combine all the individual outputs to
produce the final result.
Concurrent Processing
The easy availability of computers along with the growth of Internet has changed the way we
store and process data. We are living in a day and age where data is available in abundance.
Every day we deal with huge volumes of data that require complex computing and that too, in
quick time. Sometimes, we need to fetch data from similar or interrelated events that occur
simultaneously. This is where we require concurrent processing, which can divide a complex
task and process it on multiple systems to produce the output in quick time.
Concurrent processing is essential where the task involves processing a huge bulk of complex
data. Examples include − accessing large databases, aircraft testing, astronomical calculations,
atomic and nuclear physics, biomedical analysis, economic planning, image processing, robotics,
weather forecasting, web-based services, etc.
What is Parallelism?
Parallelism is the process of processing several sets of instructions simultaneously. It reduces the
total computational time. Parallelism can be implemented by using parallel computers, i.e.,
computers with many processors. Parallel computers require parallel algorithms, programming
languages, compilers, and operating systems that support multitasking.
In this tutorial, we will discuss only parallel algorithms. Before moving further, let us first
discuss algorithms and their types.
What is an Algorithm?
An algorithm is a sequence of instructions followed to solve a problem. While designing an
algorithm, we should consider the architecture of the computer on which the algorithm will be
executed. As per the architecture, there are two types of computers −
• Sequential computer
• Parallel computer
Depending on the architecture of computers, we have two types of algorithms −
• Sequential Algorithm − An algorithm in which some consecutive steps of instructions are
executed in a chronological order to solve a problem.
• Parallel Algorithm − An algorithm in which the problem is divided into sub-problems that are
executed in parallel to get individual outputs, which are later combined to get the final output.
It is not easy to divide a large problem into sub-problems. Sub-problems may have data
dependency among them. Therefore, the processors have to communicate with each other to
solve the problem.
It has been found that the time needed by the processors in communicating with each other is
more than the actual processing time. So, while designing a parallel algorithm, proper CPU
utilization should be considered to get an efficient algorithm.
To design an algorithm properly, we must have a clear idea of the basic model of computation in
a parallel computer.
Model of Computation
Both sequential and parallel computers operate on a set (stream) of instructions called an
algorithm. These sets of instructions instruct the computer about what it has to do in each step.
Depending on the instruction stream and data stream, computers can be classified into four
categories −
• Single Instruction stream, Single Data stream (SISD) computers
• Single Instruction stream, Multiple Data stream (SIMD) computers
• Multiple Instruction stream, Single Data stream (MISD) computers
• Multiple Instruction stream, Multiple Data stream (MIMD) computers
SISD Computers
SISD computers contain one control unit, one processing unit, and one memory unit.
In this type of computer, the processor receives a single stream of instructions from the control
unit and operates on a single stream of data from the memory unit. During computation, at each
step, the processor receives one instruction from the control unit and operates on a single data
received from the memory unit.
SIMD Computers:
SIMD computers contain one control unit, multiple processing units, and shared memory or
interconnection network.
Each of the processing units has its own local memory unit to store both data and instructions. In
SIMD computers, processors need to communicate among themselves. This is done by shared
memory or by an interconnection network.
While some of the processors execute a set of instructions, the remaining processors wait for
their next set of instructions. Instructions from the control unit decide which processors will be
active (execute instructions) or inactive (wait for the next instruction).
MISD Computers
As the name suggests, MISD computers contain multiple control units, multiple processing units,
and one common memory unit.
MIMD Computers
MIMD computers have multiple control units, multiple processing units, and a shared
memory or interconnection network.
Fig 3.18: Instruction and data stream
Here, each processor has its own control unit, local memory unit, and arithmetic and logic
unit. They receive different sets of instructions from their respective control units and
operate on different sets of data.
➢ Multicomputer − When all the processors are very close to one another (e.g., in
the same room).
➢ Distributed system − When all the processors are far away from one another (e.g.,
in different cities).
Parallel Algorithm-Structure
To apply any algorithm properly, it is very important that you select a proper data structure. It is
because a particular operation performed on a data structure may take more time as compared to
the same operation performed on another data structure.
Example − To access the ith element in a set by using an array, it may take constant time, but by
using a linked list, the time required to perform the same operation may grow linearly with i.
Therefore, the selection of a data structure must be done considering the architecture and the type
of operations to be performed.
The following data structures are commonly used in parallel programming −
• Linked List
• Hypercube Network
Linked List
A linked list is a data structure having zero or more nodes connected by pointers. Nodes may or
may not occupy consecutive memory locations. Each node has two or three parts − one data part
that stores the data, and one or two link fields that store the address of the previous or next node.
The first node's address is stored in an external pointer called head. The last node, known as tail,
generally does not contain any address.
There are three types of linked lists −
• Singly Linked List
• Doubly Linked List
• Circular Linked List
Singly Linked List
A node of a singly linked list contains data and the address of the next node. An external pointer
called head stores the address of the first node.
Hypercube Network
Hypercube architecture is helpful for those parallel algorithms where each task has to
communicate with other tasks. Hypercube topology can easily embed other topologies such as
ring and mesh. It is also known as an n-cube, where n is the number of dimensions. A hypercube
can be constructed recursively.
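The recursive structure is visible in the node labels: an n-cube has 2^n nodes labeled by n-bit strings, and two nodes are adjacent exactly when their labels differ in one bit. A small sketch (the function name is our own):

```python
def hypercube_neighbors(node, n):
    """Neighbors of a node in an n-cube: flip each of the n label bits."""
    return [node ^ (1 << d) for d in range(n)]
```

Each node therefore has degree n, and routing by fixing one differing bit per hop gives the diameter n. For node 0 of a 3-cube the neighbors are 1, 2, and 4.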
Parallel Algorithm-Matrix Multiplication
A matrix is a set of numerical and non-numerical data arranged in a fixed number of rows and
columns. Matrix multiplication is an important computation in parallel processing. Here we will
discuss the implementation of matrix multiplication on various communication networks like
mesh and hypercube. Mesh and hypercube have higher network connectivity, so they allow
faster algorithms than other networks like the ring network.
Mesh Network
A topology where a set of nodes forms a p-dimensional grid is called a mesh topology. Here, all
the edges are parallel to the grid and all the adjacent nodes can communicate among themselves.
Total number of nodes= (number of nodes in row) × (number of nodes in column)
A mesh network can be evaluated using the following factors−
• Diameter
• Bisection width
Diameter − In a mesh network, the longest distance between two nodes is its diameter. A p-
dimensional mesh network having k^p nodes has a diameter of p(k – 1).
Bisection width − Bisection width is the minimum number of edges needed to be removed from
a network to divide the mesh network into two halves.
We have considered a 2D mesh network SIMD model having wraparound connections. We will
design an algorithm to multiply two n × n arrays using n² processors in a particular amount of
time.
Matrices A and B have elements aij and bij respectively. Processing element PEij holds aij and
bij. Arrange the matrices A and B in such a way that every processor has a pair of elements to
multiply. The elements of matrix A will move in the left direction and the elements of matrix B
will move in the upward direction. These changes in the positions of the elements present each
processing element, PE, with a new pair of values to multiply.
Algorithm
Procedure MatrixMulti
Begin
   for k = 1 to n-1 step 1
      for all Pij where i and j range from 1 to n
         rotate a in left direction
         rotate b in the upward direction
         c = c + a × b
End
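A sequential simulation of this rotate-and-accumulate scheme is given below. The initial "arrangement" of A and B mentioned above is made explicit here as the standard alignment of Cannon's algorithm (row i of a rotated left by i, column j of b rotated up by j), which we assume is what the text intends; after it, n multiply-accumulate steps yield C = A × B:

```python
def mesh_matmul(A, B):
    """Simulate n x n mesh matrix multiplication with wraparound shifts."""
    n = len(A)
    # initial alignment: PEij starts with a[i][(j+i) % n] and b[(i+j) % n][j]
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] += a[i][j] * b[i][j]   # local multiply-accumulate
        # rotate a one step left along rows, b one step up along columns
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C
```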
Hypercube Network
A hypercube is an n-dimensional construct where edges are perpendicular among themselves
and are of the same length. An n-dimensional hypercube is also known as an n-cube or an n-
dimensional cube.
Features of a Hypercube with 2^k nodes
• Diameter = k
Block Matrix
A block matrix or partitioned matrix is a matrix where each element itself represents an
individual matrix. These individual sections are known as blocks or sub-matrices.
Example
In Figure (a), X is a block matrix where A, B, C, D are matrices themselves. Figure (f) shows the
total matrix.
Parallel Algorithm-Sorting
Sorting is a process of arranging elements in a group in a particular order, i.e., ascending order,
descending order, alphabetic order, etc. Here we will discuss the following −
• Enumeration Sort
• Odd-Even Transposition Sort
• Parallel Merge Sort
• Hyper Quick Sort
Sorting a list of elements is a very common operation. A sequential sorting algorithm may not be
efficient enough when we have to sort a huge volume of data. Therefore, parallel algorithms are
used in sorting.
Enumeration Sort
Enumeration sort is a method of arranging all the elements in a list by finding the final position
of each element in a sorted list. It is done by comparing each element with all other elements and
finding the number of elements having smaller value.
Therefore, for any two elements, ai and aj, exactly one of the following cases must be true −
ai < aj, ai > aj, or ai = aj
Algorithm
procedure ENUM_SORTING(n)
begin
   for each process P1,j do
      C[j] := 0;
   for each process Pi,j do
      if (A[i] < A[j]) or (A[i] = A[j] and i < j) then
         C[j] := 1;
      else
         C[j] := 0;
end ENUM_SORTING
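An equivalent sequential rendering in Python: the position of each element is the number of elements that precede it in sorted order, with ties broken by index. On n² processors every one of these comparisons is independent, so the counting takes a single parallel step:

```python
def enumeration_sort(a):
    """Place each element directly at its final sorted position."""
    n = len(a)
    result = [None] * n
    for j in range(n):
        # count elements ranked before a[j]; index breaks ties
        pos = sum(1 for i in range(n)
                  if a[i] < a[j] or (a[i] == a[j] and i < j))
        result[pos] = a[j]
    return result
```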
Odd-Even Transposition Sort
begin
   id := process's label
   for i := 1 to n do
   begin
      if i is odd and id is odd then
         compare-exchange_min(id + 1);
      else
         compare-exchange_max(id - 1);
      if i is even and id is even then
         compare-exchange_min(id + 1);
      else
         compare-exchange_max(id - 1);
   endfor
end ODD-EVEN_PAR
Parallel Merge Sort
procedure parallelmergesort(id, n, data, newdata)
begin
   data = sequentialmergesort(data)
   newdata = data
end
Hyper quick sort is an implementation of quick sort on a hypercube. Its steps are as follows −
begin
   id := process's label;
      send B2 to the process along the ith communication link;
      C := subsequence received along the ith communication link;
      B := B1 U C;
   else
      send B1 to the process along the ith communication link;
      C := subsequence received along the ith communication link;
      B := B2 U C;
   endif
end HYPERQUICKSORT
In the divide and conquer approach, the problem is divided into several small sub-problems.
Then the sub-problems are solved recursively and combined to get the solution of the original
problem.
The divide and conquer approach involves the following steps at each level −
Divide − The original problem is divided into sub-problems.
Conquer − The sub-problems are solved recursively.
Combine − The solutions of the sub-problems are combined to get the solution of the
original problem.
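Merge sort is the classic instance of these three steps; in a parallel setting the two recursive calls can run on different processors. A sequential Python sketch:

```python
def merge_sort(a):
    """Divide the list in half, sort each half recursively, merge results."""
    if len(a) <= 1:
        return list(a)              # base case: already sorted
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # combine step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```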
Pseudocode
Binarysearch(a,b,low,high)
Depth-First Search
• Consider a node (root) that is not visited previously and mark it visited. Visit the first
adjacent successor node and mark it visited.
• If all the successor nodes of the considered node are already visited or it doesn't have
any more successor nodes, return to its parent node.
Pseudocode
Let v be the vertex where the search starts in Graph G.
DFS(G, v)
   Stack S := {};
   push S, v;
   while S is not empty do
      u := pop S;
      if (not visited[u]) then
         visited[u] := true;
         for each unvisited neighbour w of u
            push S, w;
      endif
   endwhile
END DFS()
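The same traversal in Python, with the graph given as an adjacency-list dictionary (a representation we assume for illustration):

```python
def dfs(graph, v):
    """Iterative depth-first search with an explicit stack."""
    visited, order, stack = set(), [], [v]
    while stack:
        u = stack.pop()                # pop S
        if u not in visited:
            visited.add(u)             # mark u visited
            order.append(u)
            for w in graph[u]:
                if w not in visited:
                    stack.append(w)    # push S, w
    return order
```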
Breadth-First Search
Breadth-First Search (or BFS) is an algorithm for searching a tree or an undirected graph data
structure. Here, we start with a node and then visit all the adjacent nodes in the same level and
then move to the adjacent successor node in the next level. This is also known as level-by-level
search.
Steps of Breadth-First Search
• Start with the root node, mark it visited.
• As the root node has no node in the same level, go to the next level. Visit all adjacent
nodes and mark them visited.
• Go to the next level and visit all the unvisited adjacent nodes. Continue this process until
all the nodes are visited.
Pseudocode
Let v be the vertex where the search starts in Graph G.
BFS(G, v)
   Queue Q := {};
   insert Q, v;
   while Q is not empty do
      u := delete Q;
      if (not visited[u]) then
         visited[u] := true;
         for each unvisited neighbor w of u
            insert Q, w;
      endif
   endwhile
END BFS()
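The Python version differs from depth-first search only in replacing the stack with a FIFO queue, which is what produces the level-by-level order:

```python
from collections import deque

def bfs(graph, v):
    """Level-by-level traversal using a FIFO queue."""
    visited, order, q = {v}, [], deque([v])
    while q:
        u = q.popleft()               # delete Q
        order.append(u)
        for w in graph[u]:
            if w not in visited:
                visited.add(w)        # mark before enqueueing
                q.append(w)           # insert Q, w
    return order
```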
Best-First Search
Best-First Search is an algorithm that traverses a graph to reach a target in the shortest possible
path. Unlike BFS and DFS, Best-First Search follows an evaluation function to determine which
node is the most appropriate to traverse next.
Steps of Best-First Search
• Start with the root node, mark it visited.
• Find the next appropriate node and mark it visited.
• Go to the next level and find the appropriate node and mark it visited. Continue this
process until the target is reached.
Pseudocode
BFS(m)
   Insert(m.StartNode)
   Until PriorityQueue is empty
      c ← PriorityQueue.DeleteMin
      If c is the goal
         Exit
      Else
         Foreach neighbor n of c
            If n "Unvisited"
               Mark n "Visited"
               Insert(n)
         Mark c "Examined"
End procedure
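With a binary heap as the priority queue, greedy best-first search looks as follows; the evaluation function h is supplied by the caller (its exact form depends on the problem and is an assumption here):

```python
import heapq

def best_first_search(graph, start, goal, h):
    """Expand the frontier node with the smallest h-value first."""
    visited = set()
    pq = [(h(start), start)]
    while pq:
        _, c = heapq.heappop(pq)      # PriorityQueue.DeleteMin
        if c == goal:
            return True
        if c in visited:
            continue
        visited.add(c)                # mark c "Examined"
        for n in graph[c]:
            if n not in visited:      # if n "Unvisited"
                heapq.heappush(pq, (h(n), n))
    return False
```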
Graph Algorithm
A graph is an abstract notation used to represent the connection between pairs of objects. A
graph consists of −
• Vertices − Interconnected objects in a graph are called vertices. Vertices are also known as
nodes.
• Directed graph − In a directed graph, edges have direction, i.e., edges go from one vertex
to another.
Graph Coloring
Graph coloring is a method to assign colors to the vertices of a graph so that no two adjacent
vertices have the same color. Some graph coloring problems are−
• Vertex coloring − A way of coloring the vertices of a graph so that no two adjacent
vertices share the same color.
• Edge Coloring − It is the method of assigning a color to each edge so that no two adjacent
edges have the same color.
• Face coloring − It assigns a color to each face or region of a planar graph so that no two
faces that share a common boundary have the same color.
Chromatic Number
Chromatic number is the minimum number of colors required to color a graph. For example, the
chromatic number of the following graph is 3.
• Now to assign a particular color to a vertex, determine whether that color is already
assigned to the adjacent vertices or not.
• If a processor detects the same color in the adjacent vertices, it sets its value in the array to 0.
After making n² comparisons, if any element of the array is 1, then it is a valid coloring.
end
ok=ΣStatus
end
Some possible spanning trees of the above graph are shown below −
Among all the above spanning trees, figure (d) is the minimum spanning tree. The concept of
minimum cost spanning tree is applied in the travelling salesman problem, designing electronic
circuits, designing efficient networks, and designing efficient routing algorithms.
To implement the minimum cost spanning tree, the following two methods are used −
• Prim's Algorithm
• Kruskal's Algorithm
Prim's Algorithm
Prim’s algorithm is a greedy algorithm, which helps us find the minimum spanning tree for a
weighted undirected graph. It selects a vertex first and finds an edge with the lowest weight
incident on that vertex.
Steps of Prim's Algorithm
• Select any vertex, say v1, of Graph G.
• Select an edge, say e1, of G such that e1 = v1 v2 and v1 ≠ v2 and e1 has minimum
weight among the edges incident on v1 in graph G.
• Now, following step 2, select the minimum weighted edge incident on v2. Continue
this till n – 1 edges have been chosen. Here n is the number of vertices.
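A Python sketch of Prim's algorithm using a binary heap of candidate edges; the adjacency format (weight-first tuples) is our own choice:

```python
import heapq

def prim_mst(graph, start):
    """graph: dict vertex -> list of (weight, neighbor).
    Returns the total weight and edge list of a minimum spanning tree."""
    in_tree, edges, total = {start}, [], 0
    pq = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(pq)    # lightest edge leaving the tree
        if v in in_tree:
            continue
        in_tree.add(v)
        edges.append((u, v, w))
        total += w
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(pq, (w2, v, x))
    return total, edges
```

For a triangle with edge weights ab = 1, bc = 2, ac = 3, the tree {ab, bc} of total weight 3 is chosen.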
Kruskal's Algorithm
Kruskal's algorithm is a greedy algorithm, which helps us find the minimum spanning tree
for a connected weighted graph, adding increasing cost arcs at each step. It is a minimum-
spanning-tree algorithm that finds an edge of the least possible weight that connects any two
trees in the forest.
Steps of Kruskal’s Algorithm
• Select an edge of minimum weight, say e1, of Graph G such that e1 is not a loop. Select
the next minimum weighted edge connected to e1.
• Continue this till n – 1 edges have been chosen. Here n is the number of vertices.
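A Python sketch of Kruskal's algorithm; the union-find (disjoint-set) structure used to detect whether an edge would close a loop is a standard implementation choice, not something prescribed by the text:

```python
def kruskal_mst(n, edge_list):
    """edge_list: (weight, u, v) triples over vertices 0..n-1.
    Returns the total weight and chosen edges of a minimum spanning tree."""
    parent = list(range(n))

    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edge_list):  # edges in increasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                   # skip edges that would form a loop
            parent[ru] = rv
            total += w
            chosen.append((u, v, w))
    return total, chosen
```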
The minimum spanning tree of the above graph is −
• Find all unlabeled vertices adjacent to the vertex labeled i. If no vertices are
connected to the vertex S, then vertex D is not connected to S. If there are vertices
connected to S, label them i + 1.
• If D is labeled, then go to step 4; else go to step 2 to increase i = i + 1. Stop after the
length of the shortest path is found.