Parallelizing Dijkstra’s Algorithm
by
Mengqing He
A Starred Paper
Master of Science in
Computer Science
December, 2020
Abstract
Dijkstra’s algorithm is an algorithm for finding the shortest paths between nodes in a graph. The algorithm, published in 1959 by Dutch computer scientist Edsger W. Dijkstra, can be applied to a weighted graph. Dijkstra’s original algorithm runtime is a quadratic function of the number of vertices.
In this paper, I investigate the parallel formulation of Dijkstra’s algorithm and its speedup against the sequential one. The parallel formulation is implemented with the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). The results indicate that the MPI and OpenMP implementations perform significantly better than the sequential one for larger input data scales, and that smaller numbers of processors/threads give the fastest results for the MPI and OpenMP implementations. However, the results also show that the average speedup achieved by parallelization is not satisfactory, so the parallel implementation of Dijkstra’s algorithm may not be the best option.
Acknowledgement
I would like to thank my advisor Dr. Meichsner for offering a lot of valuable help and suggestions on my paper. Without her help, I could not have finished this paper smoothly. I would also like to thank the committee members Dr. Anda and Dr. Julstrom for sharing their valuable time.
1.1 Introduction
A graph consists of a set of vertices or nodes, together with a set of unordered pairs of
these vertices for an undirected graph or a set of ordered pairs for a directed graph [1]. These
pairs are known as edges, arcs, or lines for an undirected graph and as arrows, directed edges,
directed arcs, or directed lines for a directed graph. Graphs are implemented as data structures by
the adjacency list or the adjacency matrix. In this paper, we consider an undirected graph with non-negative weights. Figure 1(a) is an undirected graph with non-negative weights. Figure 1(b) is an adjacency list representation of the undirected graph in Figure 1(a). Similarly, Figure 1(c) is an adjacency matrix representation of the same graph.
Figure 1(a). The undirected graph G with 7 vertices, 12 edges, and non-negative weights.
Suppose we have a given weighted graph G = (V, E, w), where V is the set of vertices in this graph, E is the set of edges connecting the vertices, and w is the set of weights of these edges. The single source shortest paths problem is to find the shortest paths from a vertex s ∈ V to all other vertices in V [2]. A shortest path from vertex s to vertex v is a minimum-weight path. Depending on the application, edge weights may represent time, cost, penalty, loss, or any other quantity that accumulates along a path. Dijkstra’s algorithm is a greedy algorithm that is used in optimization problems. The algorithm makes the optimal choice at each step as it attempts to find the overall optimal way to solve the entire problem. Dijkstra’s algorithm incrementally finds the shortest paths from s to the other vertices of G. It always chooses the closest unvisited vertex as the next vertex to process.
There are several variants of Dijkstra’s algorithm [3]; the original variant finds the shortest path between two specific vertices, but a more common variant fixes a single vertex as the source vertex and finds shortest paths from the source to all other vertices in the graph.
Dijkstra’s algorithm can solve the single source shortest path problem on a graph: for a given source vertex in the graph, the algorithm finds the shortest path between that vertex and every other vertex. The solution to the shortest path problem is not unique. If several paths exist from the source vertex to a specific vertex, Dijkstra’s algorithm will choose one path arbitrarily. In particular, the choice depends on the order in which we traverse the vertices in each iteration.
/* Algorithm 1 (main loop shown). V: the set of vertices; w: the edge weights;
 * s: source vertex; d[v]: current shortest distance from s to v.
 */
 7. while (S ≠ V) do
 9.    find a vertex u ∈ (V − S) such that d[u] is minimal;
10.    S := S ∪ {u};
11.    for each v ∈ (V − S) do
12.       d[v] := min{d[v], d[u] + w(u, v)};
From the pseudocode, the dominant cost lies in lines 7 through 12. In the graph (V, E, w), V is the set of all vertices and E is the set of all edges. The outer loop at line 7 runs O(|V|) times. Line 9, which selects the best vertex, costs O(|V|) per iteration. The inner loop at line 11 performs O(|E|/|V|) work on average per iteration. The total time complexity is therefore O(|V| · (|V| + |E|/|V|)) = O(|V|² + |E|) = O(|V|²) = O(n²).
1.3 Description
The main feature of Dijkstra’s algorithm is to expand outward layer by layer (the breadth-first-search idea) from the source vertex until it reaches the end vertex.
When calculating the shortest paths in the graph G, we specify the starting vertex s (that is, we start from the source vertex s). In addition, two sets S and U are introduced. The role of S is to record the vertices for which the shortest path has been found, together with the corresponding shortest path lengths. The set U records the vertices for which the shortest path has not yet been found, together with the current distance from each of these vertices to the source vertex s. Initially, S contains only the source vertex s, and U contains all vertices other than s; the distance of a vertex in U is the length of the currently known path from the source vertex to it. Then, the vertex in U with the shortest distance is found and added to S, and the vertices and corresponding distances in U are updated. This operation is repeated until all the vertices have been processed.
(1) Initially, S contains only the starting vertex s; U contains the other vertices and their distances, where the distance of a vertex in U is the weight of the edge from the starting vertex s to it. For example, the distance of a vertex v in U is ∞ if s and v are not adjacent.
(2) Select the vertex u with the shortest distance from U and add u to S; meanwhile, remove u from U.
(3) Update the distance from each vertex in U to the source vertex. The distances of the vertices in U are updated because in the previous step u was the vertex with the shortest path, so the distances of other vertices may improve through u; for example, the distance (s, v) may be greater than the distance (s, u) + (u, v).
(4) Repeat steps 2 and 3 until all the vertices have been traversed. A minimal C sketch of this procedure follows.
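The following is a minimal, self-contained C sketch of the four steps above. It is illustrative only; the paper's full sequential implementation appears in Appendix A, and the 4-vertex example graph and all names here are assumptions made for the example.

#include <stdio.h>
#include <limits.h>

#define V 4            /* illustrative: number of vertices */
#define INF INT_MAX    /* sentinel meaning "no edge" */

/* visited[] plays the role of S; the unvisited vertices form U. */
void dijkstra(const int g[V][V], int src, int dist[V]) {
    int visited[V] = {0};
    for (int i = 0; i < V; i++)
        dist[i] = g[src][i];              /* step 1: initial distances */
    dist[src] = 0;
    visited[src] = 1;
    for (int step = 1; step < V; step++) {
        int u = -1, best = INF;
        for (int i = 0; i < V; i++)       /* step 2: closest vertex in U */
            if (!visited[i] && dist[i] < best) { best = dist[i]; u = i; }
        if (u == -1) break;               /* remaining vertices unreachable */
        visited[u] = 1;                   /* move u from U to S */
        for (int v = 0; v < V; v++)       /* step 3: update distances via u */
            if (!visited[v] && g[u][v] != INF && dist[u] + g[u][v] < dist[v])
                dist[v] = dist[u] + g[u][v];
    }                                     /* step 4: repeat until U is empty */
}

int main(void) {
    /* tiny assumed example graph, not the one from Figure 2 */
    const int g[V][V] = {
        {0, 2, INF, 6},
        {2, 0, 3, INF},
        {INF, 3, 0, 1},
        {6, INF, 1, 0}
    };
    int dist[V];
    dijkstra(g, 0, dist);
    for (int i = 0; i < V; i++)
        printf("dist[%d] = %d\n", i, dist[i]);
    return 0;
}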
Simply looking at the above theory may be difficult and easy to misunderstand. The algorithm can be illustrated by the example in Figure 2(a). We would like to compute the distances from source vertex D to all other vertices.
1) Choose vertex D
S = {D (0)}
S is the set of vertices whose shortest path has been calculated.
U is the set of vertices whose shortest path has not been calculated.
2) Choose vertex C
After the previous operation, the distance from vertex C to source vertex D is the shortest in U. Therefore, C is added to S and we update the distances of the vertices in U. Taking the vertex F as an example, the distance from F to D was previously ∞; but after adding C to S, the distance from F to D becomes 9 = (F, C) + (C, D).
S = {D (0), C (3)}
3) Choose vertex E
After the previous operation, the distance from the vertex E to the source vertex D is the shortest in U. Therefore, E is added to S and we update the distances of vertices in U. For example, the distance from F to D is 9; but after adding E to S, the distance from F to D is 6 = (F, E) + (E, D).
4) Choose vertex F
5) Choose vertex G
U = {A (22), B (13)}
6) Choose vertex B
U = {A (22)}
7) Choose vertex A
U = {}
At this point, the shortest distances from the source vertex D to all other vertices have been calculated.
In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem. It provides concurrency: multiple tasks can be performed at the same time. From scientific computing to applications in data mining and transaction processing, parallel computing has made a huge impact in various fields. The cost advantages of parallelism and the performance requirements of applications make parallel computing increasingly attractive.
There are two main forms of data exchange between parallel tasks: accessing a shared data space and exchanging messages. In a shared-address-space platform, there is a common data space that is accessible by all processors. Processors interact by modifying data objects stored in this space. Memory in such a platform can be local (processor-specific) or global (common to all processors). If it takes the same amount of time for a processor to access any memory word (global or local) in the system, the platform is classified as a uniform memory access (UMA) multiprocessor. On the other hand, if it takes longer to access some memory words than others, the platform is called a non-uniform memory access (NUMA) multiprocessor. Figures 3(a) and (b) illustrate the UMA platform, and Figure 3(c) illustrates the NUMA platform. In Figure 3(b), accessing words stored in the cache is faster than accessing locations in memory. However, we still classify it as a UMA architecture. The reason is that all current microprocessors have a cache hierarchy; if cache access time were considered, even a single-processor system could not be called UMA. Therefore, we define NUMA and UMA architectures based on memory access time, not cache access time. The existence of a global memory space makes programming such platforms easier. Programmers do not need to make read-only interactions explicit, because they are coded in the same way as in a serial program.
OpenMP stands for Open Multi-Processing. OpenMP is an API that can be used with Fortran, C, and C++ for programming shared-address-space machines. All OpenMP programs begin as a single process called the master thread. When the master thread reaches a parallel region, it creates multiple threads to execute the parallel code enclosed in the region. When the threads complete the parallel region, they synchronize and terminate, leaving only the master thread. We illustrate the OpenMP programming model with the aid of a simple program below.
OpenMP directives in C and C++ are based on #pragma compiler directives; a directive takes the form #pragma omp directive [clause list].
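As a minimal illustration of this model (a sketch, not a program from the paper; the output text is illustrative), the following complete C program creates a team of four threads, each of which prints its id:

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* The parallel directive creates a team of threads; the num_threads
       clause requests four of them. */
    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();   /* unique id within the team */
        printf("hello from thread %d\n", id);
    }                                    /* implicit barrier: the team joins here */
    return 0;
}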
OpenMP programs execute serially until they encounter the parallel directive. This directive is responsible for creating a group of threads. The exact number of threads can be specified in the directive, set using an environment variable, or set at runtime using OpenMP functions. The main thread that encounters the parallel directive becomes the master of this group of threads and is assigned the thread id 0 within the group. Each thread created by this directive executes the structured block specified by the parallel directive. The clause list is used to specify conditional parallelization (if), the number of threads (num_threads), and data handling (private, firstprivate, shared).
The message-passing programming paradigm, in its logical view, consists of p processes, each with its own exclusive address space. Each data element must belong to one of
the partitions of the space; hence, data must be explicitly partitioned and placed. On such platforms, all interactions between processes are accomplished using messages, hence the name message passing. This exchange of messages is used to transfer data and work, and to synchronize actions among the processes. In its most general form, message passing requires the cooperation of both the process that has the data and the process that wants to access the data. Most message-passing programs are written using the single program multiple data (SPMD) model.
MPI (Message Passing Interface) is a message-passing standard designed by a group of researchers from academia and industry to function on a wide variety of parallel computing architectures. The standard defines the syntax and semantics of a core of library routines useful to a wide range of users writing portable message-passing programs in C/C++ and Fortran. MPI’s goals are high performance, scalability, and portability. The MPI interface is meant to provide essential virtual topology, synchronization, and communication functionality between a set of processes (that have been mapped to computer instances) in a language-independent way. MPI library functions include, but are not limited to, point-to-point send and receive operations, collective operations such as broadcast and reduce, and communicator operations such as splitting communicators.
According to Algorithm 1, Dijkstra’s algorithm is iterative. Each iteration adds a new vertex to the computed set. Since the value of d[v] for a vertex v may change every time a new vertex u is added to S, it is hard to select more than one vertex at a time, so it is not easy to perform different iterations of the while loop in parallel. However, each iteration can be performed in parallel as follows.
Let p be the number of processes, and let n be the number of vertices in the graph. The
set V is partitioned into p subsets using the 1-D block mapping. Each subset has n/p consecutive
vertices, and the work associated with each subset is assigned to a different process. Let Vᵢ be the
subset of vertices assigned to process Pᵢ for i = 0, 1, …, p - 1. Each process Pᵢ stores the part of
the array d that corresponds to Vᵢ (Figure 4.a). Each process Pᵢ computes dᵢ[u] = min{dᵢ[v] | v ∈ (V − S) ∩ Vᵢ} during each iteration of the while loop. The global minimum is then obtained over all dᵢ[u] by using the all-to-one reduction operation and is stored in process P₀.
holds the new vertex u, which will be inserted into S. Process P₀ broadcasts u to all processes by
using one-to-all broadcast. The process Pᵢ responsible for vertex u marks u as belonging to set S.
Finally, each process updates the values of d[v] for its local vertices.
Figure 4. The partitioning of the distance array d and the adjacency matrix A among p processes
[2].
When a new vertex u is inserted into S, the values of d[v] for v ∈ (V − S) must be updated. The process responsible for v must know the weight of the edge (u, v). Hence, each process Pᵢ needs to store the columns of the weighted adjacency matrix corresponding to the set Vᵢ of vertices assigned to it. This corresponds to a 1-D block mapping of the matrix. The space required to store the relevant part of the adjacency matrix at each process is Θ(n²/p). Figure 4.b illustrates this partitioning of the adjacency matrix.
The computation performed by a process to minimize and update the values of d[v] during each iteration is Θ(n/p). The communication performed in each iteration is due to the all-to-one reduction and the one-to-all broadcast. For a p-process message-passing parallel computer, a one-to-all broadcast of one word takes time Θ(log p). Finding the global minimum of one word at each iteration also takes Θ(log p). The parallel run time of this formulation is given by
T_P = Θ(n²/p) + Θ(n log p)
Equation 1 [2]
Since the sequential run time is W = Θ (n²), the speedup and efficiency are as follows:
S = Θ(n²) / (Θ(n²/p) + Θ(n log p))
E = 1 / (1 + Θ((p log p)/n))
Equation 2 [2]
Dijkstra’s algorithm can use only p = O(n/log n) processes. Furthermore, the isoefficiency function due to communication is Θ(p² log² p). Since n must grow at least as fast as p in this formulation, the isoefficiency function due to concurrency is Θ(p²). Thus, the overall isoefficiency function is Θ(p² log² p).
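To see what these formulas predict, the following illustrative C sketch (not part of the paper's experiments) evaluates the theoretical speedup from Equations 1 and 2 for n = 1024 and the processor counts used later in the experiments, treating the Θ terms as exact, which is an assumption made only for illustration:

#include <stdio.h>
#include <math.h>

int main(void) {
    double n = 1024.0;
    int procs[] = {2, 4, 8, 16, 32, 64, 128};
    for (int i = 0; i < 7; i++) {
        double p = procs[i];
        /* Equation 1 with the Theta terms treated as exact: */
        double tp = n * n / p + n * log2(p);
        printf("p = %3d  predicted speedup = %6.2f\n", procs[i], n * n / tp);
    }
    return 0;
}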
3.1 Technologies
In this part, we discuss which technologies were chosen for the implementation and the decisions made.
The message-passing API must be available on all systems on which the program is run, and it must support the required communication operations. MPI matches these requirements, so I did not seriously consider alternatives when choosing it. The MPI implementation is free, easily available, has C bindings, and I already know it.
The chosen shared-address-space API must allow a lot of control over the tasks that are performed concurrently. It must also be available for free. OpenMP provides a layer on top of native threads to facilitate various thread-related tasks. Using the directives provided by OpenMP, the programmer does not need to initialize attribute objects, set parameters for threads, or partition the iteration space by hand. This facility is especially useful when the underlying problem has a static or regular task graph. In the context of various applications, the overhead associated with automatically generating threaded code from directives has been shown to be minimal.
3.1.3 Language
I chose C due to the availability of a relatively stable MPI implementation for message passing and mature OpenMP support. There are three implementations: Dijkstra’s algorithm implemented sequentially, in MPI, and in OpenMP; the three implementations are run in the same system environment.
After the technologies to use were determined, I wrote a Java program to generate test data. This program generates a 2D array with random numbers in the range of 1 to 15, which represents the input graph for Dijkstra’s algorithm. The weights are created with the Random class in Java. Assume G is a 2D array; G[i][j] represents the weight from vertex i to vertex j. If the two vertices have no direct connection, the weight is set to 9999999; otherwise it is a random number between 1 and 15. If i = j, the entry refers to the vertex itself, so we set G[i][j] = 0. Because the weights are randomly generated and small, the calculated distances will not be too large. This does not affect our experimental goals, because I only need to compare results from different programs that use the same data. I run the same set of data on the sequential, MPI, and OpenMP implementations of Dijkstra’s algorithm and compare the time consumed. In total, six data sets are used as input adjacency matrices: graphs with 8, 64, 256, 512, 1024, and 2048 vertices. Here is an example of the 8-vertex matrix; an illustrative C equivalent of the generator follows the matrix.
0 2 9999999 3 4 3 9999999 3
2 0 8 8 9 9999999 7 7
3 8 6 0 7 3 9 7
4 9 7 7 0 9999999 9999999 4
3 7 2 7 4 3 9999999 0
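The paper's generator is written in Java; for illustration, here is a C sketch of an equivalent generator. The connection probability is an assumption, since the paper does not state one; the zero diagonal, the 9999999 sentinel, and the 1-to-15 weight range follow the description above.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 8               /* illustrative: the 8-vertex case shown above */
#define INF 9999999       /* sentinel for "no direct connection" */

int main(void) {
    static int G[N][N];
    srand((unsigned)time(NULL));
    for (int i = 0; i < N; i++) {
        G[i][i] = 0;      /* a vertex to itself */
        for (int j = 0; j < i; j++) {
            /* assumed connection probability: roughly 3 out of 4 pairs */
            int w = (rand() % 4 == 0) ? INF : 1 + rand() % 15;
            G[i][j] = G[j][i] = w;   /* undirected graph: keep symmetric */
        }
    }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            printf("%d%c", G[i][j], j == N - 1 ? '\n' : ' ');
    return 0;
}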
3.3 Algorithm
Details on how the algorithms were implemented are given in the sections below. The complete source code for the implementations described can be found in Appendix A. Pseudocode describing the implementations in simplified form is provided here.
/* wgt: points to locally stored portion of the weight adjacency matrix of the graph;
 * lengths: points to a vector that will store the distance of the shortest paths from
 *          the source to the locally stored vertices;
 */
The main computational loop of Dijkstra's parallel single-source algorithm executes three steps. First, each process finds the locally stored vertex in Vo with the smallest distance from the source. Second, the processes together determine the overall vertex with the smallest distance and include it in Vc. Third, each process updates its distance array to reflect the fact that Vc contains a new vertex.
The first step is to scan the vertices stored locally in Vo to determine the vertex v with the smallest value of lengths[v]. The result is stored in the array lminpair. Specifically, lminpair[0] stores the distance of the vertex, and lminpair[1] stores the vertex itself. The following steps clarify why this storage scheme is used. The next step is to compute the vertex with the overall smallest distance to the source. We can find this global minimum by performing a reduction over the local minima of all processes.
However, in addition to the smallest distance, we also need to know the specific vertex that achieves it. Therefore, the appropriate reduction operation is MPI_MINLOC, which returns both the minimum value and the index value associated with the minimum. Because of MPI_MINLOC, we use the two-element array lminpair to store the distance and the vertex that attains that distance. In addition, all processes need the result of the reduction operation to perform the third step, so we use the MPI_Allreduce operation to perform the reduction. The result of the reduction operation is returned in the gminpair array. We can then perform the third and last step of each iteration by scanning the local vertices belonging to Vo and updating their shortest distances in the lengths array.
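A minimal sketch of this reduction step is shown below, assuming the local minimum and its vertex have already been computed; the my_min_* names are illustrative, while lminpair and gminpair follow the description above:

int lminpair[2], gminpair[2];
lminpair[0] = my_min_distance;   /* assumed: this process's smallest distance */
lminpair[1] = my_min_vertex;     /* assumed: the vertex attaining it */
/* MPI_2INT pairs an int value with an int index, as MPI_MINLOC expects. */
MPI_Allreduce(lminpair, gminpair, 1, MPI_2INT, MPI_MINLOC, comm);
/* gminpair[0] now holds the global minimum distance on every process,
   and gminpair[1] holds the vertex that attains it. */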
In our MPI program, we assign n/p consecutive columns of W to each process and use MPI_MINLOC to select the minimum-distance vertex in each iteration. Recall that when the MPI_MINLOC operation is applied to pairs (a, i) and (a, j) with equal values, it returns the pair with the smaller index. Therefore, among vertices that are equally close to the source, selection is biased toward the vertices with the smallest indices. This can lead to load imbalance, because vertices stored in lower-ranked processes tend to be included in Vc faster than vertices in higher-ranked processes (especially when many vertices in Vo have the same smallest distance to the source). Consequently, in higher-ranked processes, the remaining Vo will be larger, and the overall computation is slowed down.
One way to solve this problem is to use a cyclic distribution of the columns of W. In this scheme, process i gets every p-th vertex starting from vertex i, so each process still holds n/p vertices, but the indexes of these vertices span almost the entire graph. Even though MPI_MINLOC still preferentially selects the vertex with the smallest index, the selected vertices are now spread evenly across the processes, so no single process accumulates the remaining work.
An OpenMP parallel region applies to a structured block, that is, a block of statements with one point of entry at the top and one point of exit at the bottom. We can find the computationally intensive loops in Dijkstra’s sequential algorithm and make their iterations run in parallel, keeping the overall structure similar to the sequential one. Compared with the MPI implementation, OpenMP requires fewer lines of code.
The function omp_set_num_threads sets the default number of threads that will be created upon encountering the next parallel directive. We use this function in the main function.
The omp_get_num_threads function returns the number of threads in the current team, and omp_get_thread_num returns a unique thread id for each thread in a team. These ids range from 0 to the number of threads minus one.
The critical directive ensures that at any point in the execution of the program, only one thread is inside the critical section. Naming critical regions allows different threads to execute different code while being protected from each other.
For synchronization, OpenMP provides a barrier directive. On encountering this directive, all threads in a team wait until every other thread in the team reaches the same point. The single directive specifies a structured block that is executed by only one thread: on encountering the single block, the first thread enters the block, and all the other threads proceed past it.
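The following complete C program is an illustrative sketch (not the paper's code) that combines these directives in the same pattern the OpenMP implementation uses: each thread finds a local minimum over its share of an array, a critical section merges it into the global minimum, and a barrier ensures the global minimum is final before it is read.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int a[8] = {7, 3, 9, 1, 8, 5, 2, 6};
    int md = 1 << 30;                 /* shared global minimum */
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int i, my_md = 1 << 30;       /* private per-thread minimum */
        #pragma omp for
        for (i = 0; i < 8; i++)
            if (a[i] < my_md)
                my_md = a[i];
        /* Only one thread at a time merges its local minimum. */
        #pragma omp critical
        {
            if (my_md < md)
                md = my_md;
        }
        /* Wait until every thread has merged before reading md. */
        #pragma omp barrier
        #pragma omp single
        printf("minimum = %d\n", md); /* executed by one thread only */
    }
    return 0;
}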
Table 1 contains the primary results of running Dijkstra’s algorithm in the sequential, MPI, and OpenMP programs. The code is run in the same system environment, and the input data are generated by the Java Random class. In this paper, six data sets are used as input adjacency matrices: graphs with 8, 64, 256, 512, 1024, and 2048 vertices. After running the code, we obtain the results as durations in seconds. For the OpenMP and MPI parallel computations, 2, 4, 8, 16, 32, 64, 128, 256, and 512 processors/threads (the number of vertices must be larger than the number of processors) are used to run the code.
Table 2. The best execution time in seconds for the three implementations (the numbers in brackets indicate how many threads/processors were used).
[Plot: best execution time in seconds (log scale) versus number of vertices (8 to 2048) for the seq, OpenMP, and MPI implementations.]
Figure 5. Best execution time for each implementation at different data sets.
Figure 5 shows the best execution time for the sequential, OpenMP, and MPI implementations at different numbers of vertices. We can see that performance is better when using OpenMP and MPI to run the algorithm. For a small number of vertices, more time can be spent on parallelization and synchronization than on executing the code itself, so when the number of vertices is less than 512, the superiority of parallelization over the sequential program is not obvious. We can expect the execution time of MPI and OpenMP to be significantly better than the sequential one for higher numbers of vertices.
Another result is that the best execution time for MPI is slightly better than for OpenMP. A likely reason involves cache behavior in shared-address-space systems: when a processor needs to read data that another processor has written, its cache must be updated. When multiple processors read and write data on the same cache line, the cache needs to be updated continuously, which means that the cache is never fully effective because it must be constantly refreshed. This can have a big impact on the performance of algorithms on systems with a shared address space. In contrast, distributed-memory systems that use message passing have a separate cache for each processor, which is not invalidated or updated directly by other processors. Therefore, cache effects penalize the message-passing implementation less.
The following table shows the computation time of OpenMP and MPI when the number of threads/processors varies.
Table 3. The results for the OpenMP and MPI implementations with different numbers of threads/processors.
[Plot: execution time in seconds (log scale) versus number of threads/processors (2 to 128) for OpenMP and MPI.]
Figure 6. Execution time for different numbers of threads/processors for the OpenMP and MPI implementations with 1024 graph vertices.
[Plot: execution time in seconds (log scale) versus number of threads/processors (2 to 128) for OpenMP and MPI.]
Figure 7. Execution time for different numbers of threads/processors for the OpenMP and MPI implementations with 2048 graph vertices.
From the results above, both the OpenMP and MPI implementations are better than the sequential one for Dijkstra’s algorithm. However, from Figure 6 and Figure 7, we observe that smaller numbers of processors/threads give the fastest results. For instance, with 1024 vertices, the best numbers of processors for the MPI implementation are 2, 4, and 8; for OpenMP, the numbers are the same. When the number of vertices is 2048, we reach a similar conclusion. This is likely because each added process/thread causes extra communication costs. As the number of processors/threads increases, these communication costs have a significant impact. The scaling of the MPI implementation of Dijkstra’s algorithm is especially poor compared to the OpenMP one, and increasing the number of processors worsens the slowdown. The parallel performance is likely poor because it is dominated by the communication time, in particular the time taken by the MPI_Allreduce in each iteration.
Based on Equation 2 and the measured execution times, we compute the speedup in each condition. The following table compares the theoretical speedup and the experimental speedup for the MPI implementation with a 1024-vertex graph input.
Table 4. The comparison of theoretical and experimental speedup for the MPI implementation with 1024 graph vertices and different numbers of processors.
[Plot: speedup (log scale) versus number of processors (2 to 128), comparing theoretical and experimental speedup.]
Figure 8. The comparison of theoretical speedup and experimental speedup for the MPI implementation with 1024 graph vertices and different numbers of processors.
Speedup is a measure that captures the relative benefit of solving a problem in parallel. It is defined as the ratio of the time taken to solve a problem using the best sequential algorithm to the time required to solve the same problem on a parallel computer with p identical processing elements. If speedup could maintain linear growth with the number of processors, adding machines would shorten the required time proportionally. However, such speedup is very difficult to achieve: as the number of machines increases, communication losses grow, and the compute nodes themselves may differ in speed (skew among the workers). For example, the total time spent by the algorithm is usually determined by the slowest machine; if the time required by each computer differs, the faster machines must wait for the slowest one.
There may be many reasons why the parallel execution of Dijkstra’s algorithm has not reached the expected speedup. One reason may be that the code used is inefficient. In the experiment, the speedup also decreases, perhaps because the communication latency outweighs the benefit of using more processors. We should consider all the information needed to evaluate the performance of a parallel algorithm on a specific architecture with specific input data.
We introduced, designed, and implemented parallel Dijkstra’s algorithm in this paper. The main conclusions are:
• The performance of Dijkstra’s algorithm is better when using OpenMP and MPI than the sequential implementation for larger inputs.
• For parallelized Dijkstra’s algorithm, the best execution time for MPI is slightly better than for OpenMP.
• For both the OpenMP and MPI implementations, smaller numbers of processors/threads give the fastest results.
• The experimental speedup for the 1024-vertex input does not grow linearly with the number of processors and falls short of the theoretical speedup.
Future improvements are possible; particularly, the use of a priority queue to replace the array is an area that allows for much further work. A priority queue is a data structure in which each element additionally has a “priority” associated with it. For a min-priority queue, the minimum element has the highest priority, and it will be served before an element with lower priority. A min-priority queue can be implemented with a binary heap or a Fibonacci heap, where |V| is the number of vertices and |E| is the number of edges in a graph. If Dijkstra’s algorithm uses an array to scan all the vertices directly, it costs time O(|V|²). For sparse graphs, where the number of edges is much smaller than |V|², we can implement the input graph as an adjacency list instead of an adjacency matrix and use a binary heap or Fibonacci heap as the priority queue. The heap-based algorithm works as follows:
(1) Add the source vertex to the heap and adjust the heap;
(2) Extract the minimum vertex u from the top of the heap and adjust the heap;
(3) Deal with the vertices that are adjacent to u: if a vertex is in the queue, update its distance and adjust the position of the element in the heap; if a vertex is not in the queue yet, insert it into the heap;
(4) If the obtained u is the end point, end the algorithm; otherwise, repeat steps 2 and 3.
The complexity of using a binary heap is O((|E| + |V|) log |V|); the Fibonacci heap improves this bound to O(|E| + |V| log |V|). A minimal sketch of the binary-heap approach follows.
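For illustration, here is a minimal C sketch of heap-based Dijkstra on an adjacency list (not from the paper; all names are illustrative). Instead of adjusting an element's position in the heap as in step 3, it inserts duplicate entries and skips stale ones, a common simplification that preserves the O((|E| + |V|) log |V|) bound.

#include <stdio.h>
#include <stdlib.h>

typedef struct Edge { int to, w; struct Edge *next; } Edge;
typedef struct { int dist, v; } Item;

static Item heap[1 << 20];   /* generous bound: at most one entry per relaxation */
static int hn = 0;

static void push(int dist, int v) {
    int i = hn++;
    heap[i].dist = dist; heap[i].v = v;
    while (i > 0 && heap[(i - 1) / 2].dist > heap[i].dist) {  /* sift up */
        Item t = heap[i]; heap[i] = heap[(i - 1) / 2]; heap[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}

static Item pop(void) {
    Item top = heap[0];
    heap[0] = heap[--hn];
    int i = 0;
    for (;;) {                                                /* sift down */
        int l = 2 * i + 1, r = 2 * i + 2, m = i;
        if (l < hn && heap[l].dist < heap[m].dist) m = l;
        if (r < hn && heap[r].dist < heap[m].dist) m = r;
        if (m == i) break;
        Item t = heap[i]; heap[i] = heap[m]; heap[m] = t;
        i = m;
    }
    return top;
}

void dijkstra_heap(int n, Edge **adj, int src, int *dist) {
    for (int i = 0; i < n; i++) dist[i] = 1 << 30;
    dist[src] = 0;
    push(0, src);                        /* step 1: source into the heap */
    while (hn > 0) {
        Item it = pop();                 /* step 2: extract minimum u */
        if (it.dist > dist[it.v]) continue;   /* stale duplicate, skip */
        for (Edge *e = adj[it.v]; e; e = e->next)   /* step 3: relax */
            if (it.dist + e->w < dist[e->to]) {
                dist[e->to] = it.dist + e->w;
                push(dist[e->to], e->to);
            }
    }
}

int main(void) {
    /* tiny assumed graph: 0-1 (w=2), 1-2 (w=3), 0-2 (w=7) */
    Edge e01 = {1, 2, NULL}, e02 = {2, 7, &e01};
    Edge e10 = {0, 2, NULL}, e12 = {2, 3, &e10};
    Edge e20 = {0, 7, NULL}, e21 = {1, 3, &e20};
    Edge *adj[3] = {&e02, &e12, &e21};
    int dist[3];
    dijkstra_heap(3, adj, 0, dist);
    for (int i = 0; i < 3; i++)
        printf("dist[%d] = %d\n", i, dist[i]);
    return 0;
}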
References
[2] A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd
ed. Addison Wesley, 2003.
Appendix
Selected Code
/* Sequential Dijkstra's algorithm (reconstructed). This program runs
 * single-source Dijkstra's algorithm for a graph represented by an
 * adjacency matrix, using the source vertex as input. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define N 2048
#define SOURCE 1
#define MAXINT 9999999

int weight[N][N];

void dijkstra(int graph[N][N], int source);

int main(int argc, char **argv) {
    int i, j;
    char fn[255];
    FILE *fp;
    struct timeval tv;
    struct timezone tz;
    double start, end;

    gettimeofday(&tv, &tz);
    start = tv.tv_sec + tv.tv_usec / 1000000.0;

    /* Read the adjacency matrix of the input graph. */
    strcpy(fn, "input2048.txt");
    fp = fopen(fn, "r");
    if (fp == NULL) {
        printf("Cannot open input file.\n");
        exit(1);
    }
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            fscanf(fp, "%d", &weight[i][j]);
    fclose(fp);

    dijkstra(weight, SOURCE);

    gettimeofday(&tv, &tz);
    end = tv.tv_sec + tv.tv_usec / 1000000.0;
    printf("time cost is %lf seconds\n", end - start);
    return 0;
}

void dijkstra(int graph[N][N], int source) {
    /* This array holds the shortest distance from source to other vertices. */
    int distance[N];
    int visited[N];
    int i, count, nextNode, minDistance;

    for (i = 0; i < N; i++) {
        distance[i] = graph[source][i];
        visited[i] = 0;
    }
    visited[source] = 1;
    count = 1;

    while (count < N) {
        /* Pick the minimum distance vertex from the set of vertices that
           is not processed. */
        minDistance = MAXINT;
        nextNode = -1;
        for (i = 0; i < N; i++) {
            if (!visited[i] && distance[i] < minDistance) {
                minDistance = distance[i];
                nextNode = i;
            }
        }
        if (nextNode == -1)   /* remaining vertices are unreachable */
            break;
        visited[nextNode] = 1;
        count++;
        /* Relax the distances of the unvisited vertices through nextNode. */
        for (i = 0; i < N; i++) {
            if (!visited[i] && minDistance + graph[nextNode][i] < distance[i])
                distance[i] = minDistance + graph[nextNode][i];
        }
    }
    /* (Optionally print distance[] here.) */
}
/* MPIdijk.c (reconstructed)
 * Parallel Dijkstra's algorithm using MPI. The columns of the weight
 * adjacency matrix are distributed among the processes in 1-D blocks. */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <sys/time.h>
#include "mpi.h"

#define N 2048
#define SOURCE 1
#define MAXINT 9999999

/* @param n: the number of vertices;
 * @param source: the source vertex;
 * @param wgt: points to locally stored portion of the weight adjacency matrix of the graph;
 * @param lengths: points to a vector that will store the distance of the
 *                 shortest paths from the source to the locally stored vertices;
 */
void SingleSource(int n, int source, int *wgt, int *lengths, MPI_Comm comm) {
    int i, j;
    int nlocal;      /* The number of vertices stored locally */
    int *marker;     /* Marks the vertices whose shortest path is not final yet */
    int firstvtx;    /* The index number of the first vertex that is stored locally */
    int lastvtx;     /* The index number of the last vertex that is stored locally */
    int u, udist;
    int lminpair[2], gminpair[2];
    int npes, myrank;

    MPI_Comm_size(comm, &npes);
    MPI_Comm_rank(comm, &myrank);
    nlocal = n / npes;
    firstvtx = myrank * nlocal;
    lastvtx = firstvtx + nlocal - 1;

    /* Set the initial distances from source to all the other vertices. */
    for (j = 0; j < nlocal; j++)
        lengths[j] = wgt[source * nlocal + j];

    /* This array is used to indicate if the shortest path to a vertex has
       been found or not: if marker[v] is one, it has not been found yet. */
    marker = (int *)malloc(nlocal * sizeof(int));
    for (j = 0; j < nlocal; j++)
        marker[j] = 1;

    /* The process that stores the source vertex marks it as being seen. */
    if (source >= firstvtx && source <= lastvtx)
        marker[source - firstvtx] = 0;

    /* The main loop of Dijkstra's algorithm: n - 1 iterations. */
    for (i = 1; i < n; i++) {
        /* Step 1: Find the local vertex that is at the smallest distance from source. */
        lminpair[0] = MAXINT;
        lminpair[1] = -1;
        for (j = 0; j < nlocal; j++) {
            if (marker[j] && lengths[j] < lminpair[0]) {
                lminpair[0] = lengths[j];
                lminpair[1] = firstvtx + j;
            }
        }
        /* Step 2: Compute the global minimum vertex and its distance. */
        MPI_Allreduce(lminpair, gminpair, 1, MPI_2INT, MPI_MINLOC, comm);
        udist = gminpair[0];
        u = gminpair[1];

        /* The process that stores the minimum vertex marks it as being seen. */
        if (u == lminpair[1])
            marker[u - firstvtx] = 0;

        /* Step 3: Update the distances, given that u has been inserted. */
        for (j = 0; j < nlocal; j++) {
            if (marker[j] && udist + wgt[u * nlocal + j] < lengths[j])
                lengths[j] = udist + wgt[u * nlocal + j];
        }
    }
    free(marker);
}

int main(int argc, char *argv[]) {
    int i, j, k;
    int npes, myrank, nlocal;
    int *localWeight, *localDistance, *sendbuf = NULL;
    static int weight[N][N];
    static int distance[N];   /* gathered result at the root process */
    char fn[255];
    FILE *fp;
    struct timeval tv;
    struct timezone tz;
    double start, end;

    gettimeofday(&tv, &tz);
    start = tv.tv_sec + tv.tv_usec / 1000000.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    nlocal = N / npes;

    /* Allocate local weight and local distance arrays for each process. */
    localWeight = (int *)malloc(N * nlocal * sizeof(int));
    localDistance = (int *)malloc(nlocal * sizeof(int));

    /* Open input file, read adjacency matrix and prepare for sendbuf. */
    if (myrank == SOURCE) {
        sendbuf = (int *)malloc(N * N * sizeof(int));
        strcpy(fn, "input2048.txt");
        fp = fopen(fn, "r");
        if (fp == NULL) {
            printf("Cannot open input file.\n");
            exit(1);
        }
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                fscanf(fp, "%d", &weight[i][j]);
        fclose(fp);
        /* Pack columns so process k receives columns k*nlocal .. (k+1)*nlocal-1. */
        for (k = 0; k < npes; ++k)
            for (i = 0; i < N; ++i)
                for (j = 0; j < nlocal; ++j)
                    sendbuf[k*N*nlocal + i*nlocal + j] = weight[i][k*nlocal + j];
    }

    /* Distribute data. */
    MPI_Scatter(sendbuf, N * nlocal, MPI_INT, localWeight, N * nlocal, MPI_INT,
                SOURCE, MPI_COMM_WORLD);

    SingleSource(N, SOURCE, localWeight, localDistance, MPI_COMM_WORLD);

    /* Collect the distance vector at the root process. */
    MPI_Gather(localDistance, nlocal, MPI_INT, distance, nlocal, MPI_INT,
               SOURCE, MPI_COMM_WORLD);

    if (myrank == SOURCE) {
        gettimeofday(&tv, &tz);
        end = tv.tv_sec + tv.tv_usec / 1000000.0;
        printf("time cost is %lf seconds\n", end - start);
        /* (Optionally print distance[] here.) */
    }

    free(localWeight);
    free(localDistance);
    if (myrank == SOURCE)
        free(sendbuf);
    MPI_Finalize();
    return 0;
}
/* OpenMP implementation of Dijkstra's algorithm (reconstructed). */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <omp.h>

#define N 2048
#define SOURCE 1
#define MAXINT 9999999

int graph[N][N];

void dijkstra(int graph[N][N], int source);

/* This program runs single-source Dijkstra's algorithm. Given the adjacency
   matrix of a graph, it computes the shortest distance from SOURCE to all
   other vertices. */
int main(int argc, char **argv) {
    int i, j;
    char fn[255];
    FILE *fp;
    int threads;
    struct timeval tv;
    struct timezone tz;
    double start, end;

    /* Read the number of threads to use from standard input. */
    scanf("%d", &threads);
    omp_set_num_threads(threads);

    gettimeofday(&tv, &tz);
    start = tv.tv_sec + tv.tv_usec / 1000000.0;

    strcpy(fn, "input2048.txt");
    fp = fopen(fn, "r");
    if (fp == NULL) {
        printf("Cannot open input file.\n");
        exit(1);
    }
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            fscanf(fp, "%d", &graph[i][j]);
    fclose(fp);

    dijkstra(graph, SOURCE);

    gettimeofday(&tv, &tz);
    end = tv.tv_sec + tv.tv_usec / 1000000.0;
    printf("time cost is %lf seconds\n", end - start);
    return 0;
}

void dijkstra(int graph[N][N], int source) {
    int visited[N];
    int i;
    int md;          /* global minimum distance in the current iteration */
    int distance[N]; /* This array holds the shortest distance from source to other vertices. */
    int mv;          /* global vertex at the minimum distance */
    int my_first;    /* The first vertex that is stored in one thread locally. */
    int my_id;       /* thread id */
    int my_last;     /* The last vertex that is stored in one thread locally. */
    int my_md;       /* minimum distance found by this thread */
    int my_mv;       /* local vertex that is at the minimum distance from the source */
    int my_step;     /* iteration counter */
    int nth;         /* number of threads */

    for (i = 0; i < N; i++) {
        visited[i] = 0;
        distance[i] = graph[source][i];
    }
    visited[source] = 1;

    # pragma omp parallel private ( my_first, my_id, my_last, my_md, my_mv, my_step ) \
                          shared ( distance, graph, md, mv, nth, visited )
    {
        my_id = omp_get_thread_num ( );
        nth = omp_get_num_threads ( );
        my_first = (my_id * N) / nth;
        my_last = ((my_id + 1) * N) / nth - 1;

        for (my_step = 1; my_step < N; my_step++) {
            /* One thread resets the global minimum; the implied barrier of
               the single directive keeps the others from racing ahead. */
            # pragma omp single
            {
                md = MAXINT;
                mv = -1;
            }
            /* Each thread scans its own block for the closest unvisited vertex. */
            int k;
            my_md = MAXINT;
            my_mv = -1;
            for (k = my_first; k <= my_last; k++) {
                if (!visited[k] && distance[k] < my_md) {
                    my_md = distance[k];
                    my_mv = k;
                }
            }
            /* Merge the local minima into the global minimum one thread at a time. */
            # pragma omp critical
            {
                if (my_md < md) {
                    md = my_md;
                    mv = my_mv;
                }
            }
            /* Each thread leaving the critical region will wait here until all
               other threads in this section reach the same point, so mv is final. */
            # pragma omp barrier
            # pragma omp single
            {
                if (mv != -1)
                    visited[mv] = 1;
            }
            if (mv != -1) {
                int j;
                for (j = my_first; j <= my_last; j++) {
                    if (!visited[j] && md + graph[mv][j] < distance[j])
                        distance[j] = md + graph[mv][j];
                }
            }
            # pragma omp barrier
        }
    }
    /* (Optionally print distance[] here.) */
}