Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 48 (2015) 5 – 13

International Conference on Intelligent Computing, Communication & Convergence (ICCC-2015)
Conference Organized by Interscience Institute of Management and Technology, Bhubaneswar, Odisha, India

Implementation of K-shortest path algorithm in GPU using CUDA


Avadhesh Pratap Singh (a), Dhirendra Pratap Singh (b)
(a) M. Tech Scholar, MANIT Bhopal, MP, India
(b) Assistant Professor, MANIT Bhopal, MP, India

Abstract

The k-shortest path problem is a generalization of the shortest path problem. It arises in fields such as sequence alignment in molecular bioinformatics, robot motion planning, and path finding in gene networks, where the speed of path calculation plays a vital role. A parallel implementation is one of the best ways to meet the requirements of these applications. We develop a GPU-based parallel algorithm that finds the k shortest paths in a large directed graph with positive edge weights. Repetition of vertices is not allowed in the calculated paths. The implemented algorithm calculates the k shortest paths between a given pair of vertices of a graph with n nodes and m edges. The approach is based on Yen’s algorithm for finding the k shortest loopless paths. We implemented our algorithms on an Nvidia GPU using the Compute Unified Device Architecture (CUDA). This paper presents a comparative analysis of CPU-based and GPU-based implementations of Yen’s algorithm. Our approach achieves a 6x speedup over the serial algorithm.
© 2015 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of scientific committee of International Conference on Computer, Communication and Convergence (ICCC 2015)

Keywords: Compute Unified Device Architecture (CUDA), Graphics Processing Unit (GPU), Shortest Path Algorithm, Parallel Algorithm.

1. Introduction

Let G = (V, E) be a directed graph, where V is a set of n nodes and E is a set of m arcs. We assume m > n throughout the paper to avoid trivial complications. Each edge of G is associated with a positive weight w. Finding the shortest path from a node p to a node q is a classical and much-studied problem in graph algorithms, and many sequential and parallel methods have been implemented for it. The most popular algorithm for this problem is Dijkstra’s algorithm [1]; parallel implementations of it have been given by various researchers [2-9]. The k-shortest path problem generalizes the shortest path problem: it calculates k shortest paths in increasing order of weight. The problem is broadly divided into two categories. The first allows paths with repeated vertices; the second does not allow any node to repeat within a path. We are concerned with the second category on graphs with positive edge weights, where the single shortest path never repeats a vertex. Figure 1 illustrates the difference between the k-shortest path problem with and without repetition of vertices. In Figure 1 the three shortest simple paths (without repetition of vertices) are (s,a,b,t), (s,c,t) and (s,d,t), with lengths 6, 20 and 21 respectively, whereas the three shortest paths that may repeat vertices are (s,a,b,t), (s,a,b,a,b,t) and (s,a,b,a,b,a,b,t), with lengths 6, 8 and 10.

1877-0509 © 2015 The Authors. Published by Elsevier B.V. doi:10.1016/j.procs.2015.04.103

Fig. 1. The difference between paths with and without repetition of vertices

The k-shortest path problem in which repetition of vertices is allowed is the easier of the two, and various algorithms [10,11,12,13] exist for it. The most recent improvement is due to Eppstein [10], whose algorithm achieves the optimal time of O(m + n log n + k). It computes an implicit representation of the paths, from which each path can be extracted in O(n) additional time.
Finding the k shortest paths without repetition of vertices, known as the simple k-shortest paths problem, has proved more challenging. The problem was first examined by Hoffman and Pavley [14], but all early attempts [15,16,17,18] to solve it have exponential time complexity. The first algorithm with polynomial time complexity for this problem is Yen’s algorithm [19,20] (generalized by Lawler [21]). With modern data structures, the time complexity of Yen’s algorithm is O(kn(m + n log n)). For undirected graphs, Katoh et al. [22] improve on Yen’s algorithm, reaching O(k(m + n log n)) time. For weighted directed graphs, the worst-case time complexity of Yen’s algorithm for simple k-shortest paths remains unbeaten.
There are various implementations of Yen’s algorithm [23,24] with new data structures, but none improves on its worst-case complexity for directed graphs. Kumar and Ghosh [25] designed a CREW PRAM algorithm for the all-pairs version of the k-shortest path problem, based on the transitive closure algorithm for computing all-pairs shortest paths. Ruppert [26] developed a CREW PRAM algorithm that finds the k shortest paths to a given destination node from every node of an edge-weighted directed graph; it is based on Eppstein’s algorithm. Guerriero and Musmanno [27] developed parallel asynchronous algorithms that calculate the k shortest paths from a single source to all other nodes of a directed graph. They implemented their algorithms on a nonuniform memory access multiprocessor, exploiting the parallelism in sequential label-correcting methods. The algorithms of Ruppert and of Guerriero and Musmanno calculate k shortest paths that may repeat vertices.
The k-shortest path problem appears in many real-world applications, which are discussed and listed by Eppstein [10]. Examples include robot motion planning, network routing, optimization problems such as length-limited Huffman coding, sequence alignment in molecular bioinformatics, multiple object tracking, and multiple path finding in gene networks [28]. Many of these applications need the k shortest paths to be calculated quickly. To reduce the running time, the k-shortest path algorithm can be implemented in parallel. A parallel implementation on a GPU using CUDA improves the processing time for large graphs.
In this paper we propose parallel algorithms to find the simple k shortest paths and compare them with a serial implementation. Both the serial and the parallel versions are based on Yen’s algorithm.
This paper is organized in seven sections. Section 2 describes the GPU and CUDA programming. Section 3 explains the graph representation. Section 4 describes the data structure used to represent multiple paths. Section 5 discusses the serial algorithm for obtaining the k shortest paths and its parallel implementation. Section 6 presents the results and performance analysis of the parallel algorithm on various large graphs. Section 7 concludes the paper.

2. GPU and CUDA overview

The GPU architecture is best suited to the data-parallel approach, in particular to algorithms with high arithmetic intensity and regular data access patterns [29]. NVIDIA, a leading GPU manufacturer, introduced the Compute Unified Device Architecture (CUDA) to simplify GPU programming by means of a high-level application programming interface.
A GPU is essentially a collection of multiple streaming processors. In stream processing, each thread executes the same instruction on its own element of a stream of data. A CUDA program is composed of two parts: host (CPU) code that launches kernels, and device (GPU) code that implements the kernels [30]. The host code is serial code that runs on the CPU; the device code runs on the GPU, in every thread, to maximize GPU thread utilization. From the programmer’s point of view, the CUDA programming model is a collection of threads running in parallel.

Fig. 2. CUDA Programming Architecture

An Nvidia GPU has multiple multiprocessors, each containing numerous processing elements called cores. Each multiprocessor can access data from the memory hierarchy provided by the GPU. Nvidia GPUs provide several kinds of memory: fast private register memory, global memory, shared memory, constant memory and texture memory. Register memory is private to each thread. Global, constant and texture memory are accessible to all threads in a grid [30]. Shared memory is local to the threads of a block. Constant memory and texture memory are read-only memories that reside in the DRAM of the GPU device.
On the CUDA platform, a set of instructions (a kernel) is executed by every thread on the GPU device. Threads are divided into groups called blocks; the threads of a block run together on one multiprocessor. A grid is a collection of blocks, which are distributed over the multiprocessors. The group of threads that a multiprocessor executes together is called a warp, and its size is fixed. Each thread, block and grid has a unique ID assigned by CUDA, which a thread uses to select the data on which its instructions operate. Figure 2 shows the CUDA programming model. In our GPU programming approach, only one common copy of the data is stored in global memory, and it is the same for all blocks running under a grid. To work on different data simultaneously, multiple GPUs are needed.
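The way a thread turns its IDs into a data index can be illustrated with a small serial Python emulation. This is only a sketch under stated assumptions: it models a one-dimensional launch, uses the standard CUDA idiom of block index times block size plus thread index, and the function names `global_thread_ids` and `blocks_needed` are hypothetical helpers, not part of the paper or of CUDA.

```python
def global_thread_ids(grid_dim, block_dim):
    """List the flat IDs that a 1-D launch of grid_dim blocks of block_dim threads covers."""
    ids = []
    for block_idx in range(grid_dim):          # every block in the grid
        for thread_idx in range(block_dim):    # every thread inside that block
            # Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x in a kernel.
            ids.append(block_idx * block_dim + thread_idx)
    return ids


def blocks_needed(num_items, block_dim):
    """Smallest number of blocks whose threads cover num_items elements."""
    return (num_items + block_dim - 1) // block_dim


num_edges = 10                                   # e.g. one thread per edge
block_dim = 4                                    # threads per block (assumed value)
grid_dim = blocks_needed(num_edges, block_dim)
ids = global_thread_ids(grid_dim, block_dim)
# The last block is only partly used, so a kernel must guard with id < num_edges.
active = [i for i in ids if i < num_edges]
print(grid_dim, len(ids), len(active))
```

Because the launch size is rounded up to a whole number of blocks, the bounds check on the computed ID is what keeps the surplus threads from touching memory past the end of the edge or node arrays.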

3. Graph Representation

The graph representation strongly influences the running time of the algorithm. Conventionally there are two ways to represent a graph: the adjacency matrix and the adjacency list. An adjacency matrix wastes a lot of memory for sparse graphs, for which the adjacency list is the better representation. On a GPU, however, CUDA accesses memory as arrays, and the varying sizes of the per-vertex edge lists make a plain adjacency list difficult to use. Harish and Narayanan [5] and Singh and Khare [9] give a modified adjacency list representation. Following [9], we use three arrays Va, Ea and Ew. The vertices of the graph G(V, E, W) are represented by the array Va, where Va[i] stores the index in Ea at which the adjacency list of vertex i starts. Ea has size equal to the number of edges and stores the destination vertex of each edge: the entries of Ea from index Va[i] up to, but not including, index Va[i+1] are the vertices adjacent to vertex i. Ew stores the weight of the edge at the corresponding position of Ea.

Fig. 3. Graph Representation

Following [9], in the preprocessing step the edges of each adjacency list, that is, the entries in the range from Va[i] to Va[i+1] for every vertex i, are stored in ascending order of their weight in Ew. This sorted order is useful in the shortest path calculation. Figure 3 shows the representation of the graph in CUDA. With this representation the out-degree of each vertex is easy to calculate: the out-degree of node i equals Va[i+1] - Va[i]. We add one more array, Es, of size |E|, which stores the start node of each edge. This representation is easily accessible on the GPU and helps to reduce the memory access time of the algorithm. All of these arrays are initialized during preprocessing of the graph.
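The four arrays can be built from an edge list as in the following Python sketch. It is an illustration rather than the authors' preprocessing code: the function name `build_graph_arrays` and the toy graph are hypothetical, and Va is given one extra trailing entry (size |V|+1) so that Va[i+1] - Va[i] is also defined for the last vertex, which is an assumption not stated in the text.

```python
def build_graph_arrays(num_nodes, edges):
    """Build Va, Ea, Ew, Es from a list of (start, end, weight) edges."""
    # Group edges by start node and sort each group by ascending weight.
    ordered = sorted(edges, key=lambda e: (e[0], e[2]))
    # Count the out-degree of every node, then prefix-sum into start offsets.
    Va = [0] * (num_nodes + 1)
    for start, _, _ in ordered:
        Va[start + 1] += 1
    for i in range(num_nodes):
        Va[i + 1] += Va[i]
    Ea = [end for _, end, _ in ordered]        # destination of each edge
    Ew = [weight for _, _, weight in ordered]  # weight of each edge
    Es = [start for start, _, _ in ordered]    # start node of each edge
    return Va, Ea, Ew, Es


# Toy directed graph with 4 nodes (assumed data, not from the paper).
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5), (2, 3, 8)]
Va, Ea, Ew, Es = build_graph_arrays(4, edges)
out_degree = [Va[i + 1] - Va[i] for i in range(4)]
print(Va, Ea, Ew, Es, out_degree)
```

Storing Es alongside Ea lets an edge-parallel kernel recover both endpoints of edge number id directly from the thread ID, without searching Va.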

4. Data structures to represent the Path

Let Z be an array whose elements store the information related to each calculated shortest path. We use two such arrays, one for the result path set (Zresult) and another for the rest path set (Zrest). The size of Zresult equals K, the number of shortest paths requested. If fewer than K paths have been calculated, the remaining entries of the array are nil.
Each array element stores the following information about the path P that it represents:
- A pointer to an array Path that stores the shortest path between the two nodes. Each element of Path stores two values: a node and its weight from the source.
- The index of the parent path from which P was generated (this field is negative if P is the first path or does not depend on any previously calculated path).
- The index, within the parent path, of the diversion node from which the new path was calculated (this field is negative if the source node is the diversion node).
- The length of the path.

This data structure reduces the number of calls to the shortest path algorithm compared with Yen’s original algorithm. The diversion node ensures that Zrest contains no duplicate paths, because each new path is calculated only after removing edges from the graph from the diversion node index onwards. The data structure also tracks the parent path from which each new path was generated, so there is no need to compare paths with one another to find their common edges.
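A path record with the four fields listed above might look like the following Python sketch. The class name `PathRecord`, the helper `make_record` and the example paths are hypothetical; the paper only names the fields, so the concrete layout here is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PathRecord:
    path: List[Tuple[int, float]]  # (node, weight from the source) per path element
    parent: int                    # index of the parent path, negative if none
    diversion: int                 # diversion node index in the parent path
    length: float                  # total length of the path


def make_record(nodes, edge_weights, parent=-1, diversion=-1):
    """Build a record, accumulating the weight from the source along the path."""
    dist = 0.0
    path = [(nodes[0], 0.0)]
    for node, weight in zip(nodes[1:], edge_weights):
        dist += weight
        path.append((node, dist))
    return PathRecord(path, parent, diversion, dist)


K = 3
z_result: List[Optional[PathRecord]] = [None] * K   # unused entries stay nil
z_result[0] = make_record([0, 2, 1, 3], [1.0, 2.0, 5.0])
# A second path that leaves the first one at index 1 of its node sequence.
z_result[1] = make_record([0, 2, 3], [1.0, 8.0], parent=0, diversion=1)
print(z_result[0].length, z_result[1].length)
```

Keeping the cumulative weight in every path element means the cost of a shared prefix up to the diversion node can be read off directly instead of being summed again.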

5. Algorithm and its parallel implementation

5.1. Modified Yen’s algorithm using Dijkstra’s algorithm

Dijkstra’s algorithm is used for single-source shortest path calculation [1]. There are various parallel implementations of Dijkstra’s algorithm [2,4,5,6,7,8,9], several of them on the GPU using CUDA. With some modification, Dijkstra’s algorithm can be used to calculate the shortest path between two vertices. Yen’s algorithm internally uses shortest path calculations, for which we use the parallel Dijkstra’s algorithm of [9]. Yen’s algorithm calculates the k shortest simple paths in a directed weighted graph. It can also be applied to undirected graphs, but there it unnecessarily calculates many shortest paths, which makes it more time-consuming. Katoh et al. [22] give an efficient algorithm for undirected graphs.

Algorithm 1: Modified Yen’s algorithm (Graph G (V, E, W), Source node s, Destination node t, K_no)
Create two lists: Zresult for the result path set and Zrest for the rest path set
Begin
[1] Calculate the first shortest path using Dijkstra’s algorithm.
[2] Add the first shortest path to Zresult.
[3] for k = 2 to K_no
[4] for each edge (vi, vi+1) of the (k-1)th path (s, v1, v2, v3, ..., vn) of Zresult
[5] Remove from the graph the edge leaving vi in every path of Zresult that shares the sub-path from s to vi
[6] Apply the shortest path algorithm from vi to t
[7] Combine the path from s to vi with the new path from vi to t
[8] Store the calculated path in Zrest
[9] End of for
[10] Take the minimum path from Zrest and store it in Zresult.
[11] End of for
End

Yen’s algorithm first calculates the shortest path between the given pair of vertices and stores it in Zresult. For each subsequent shortest path, it takes the previously calculated path (s, v1, v2, v3, ..., vn) from Zresult and processes each of its edges (vi, vi+1) in turn: it checks every path in the result path set, and wherever a path shares the sub-path from s to vi, it removes that path’s edge leaving vi from the graph. It then calculates the shortest path between vi and t. After this calculation, the root path (s to vi) and the spur path (vi to t) are combined and stored in Zrest. Once all candidate paths derived from the previously calculated path have been computed, the minimum path in Zrest is moved to Zresult. The whole process is repeated until the size of Zresult equals k. In the modified Yen’s algorithm we use Dijkstra’s algorithm for the internal shortest path calculation between the given pair of vertices. The new data structure makes it easy to modify the graph in order to calculate a new shortest path. Through its parent path field, it also helps to identify the edges that paths in Zresult have in common, and it reduces memory use. The modified Yen’s algorithm is given in Algorithm 1.
Various modifications and improvements to the serial Yen’s algorithm have been made previously [21,23,24], both in the data structures and in the calculation of new paths. We likewise use a new data structure to store the path information.
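For reference, the root path and spur path procedure described above can be sketched serially in Python. This is the textbook form of Yen’s algorithm, not the authors’ modified version: it uses a binary-heap Dijkstra instead of the parallel kernels, compares root paths directly instead of using the parent path field, and also bans the root path nodes during the spur search so that every result is simple. The function names and the example graph are assumptions; the edge weights were chosen only so that the three simple paths have the lengths 6, 20 and 21 quoted for Figure 1.

```python
import heapq


def dijkstra(adj, src, dst, banned_edges=frozenset(), banned_nodes=frozenset()):
    """Shortest src->dst path avoiding banned edges and nodes, as (cost, path) or None."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:                       # destination settled: stop early
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in adj.get(u, ()):
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None


def path_cost(adj, path):
    """Sum of edge weights along path (assumes no parallel edges)."""
    return sum(dict(adj[u])[v] for u, v in zip(path, path[1:]))


def yen_k_shortest(adj, s, t, k):
    first = dijkstra(adj, s, t)
    if first is None:
        return []
    result, rest = [first], []             # play the roles of Zresult and Zrest
    while len(result) < k:
        last = result[-1][1]
        for i in range(len(last) - 1):
            root = last[:i + 1]            # root path s..vi
            # Ban the edge leaving vi in every result path sharing this root.
            banned_edges = {(p[i], p[i + 1]) for _, p in result
                            if len(p) > i + 1 and p[:i + 1] == root}
            spur = dijkstra(adj, root[-1], t, banned_edges, set(root[:-1]))
            if spur is None:
                continue
            cand = (path_cost(adj, root) + spur[0], root[:-1] + spur[1])
            if cand not in rest and cand not in result:
                heapq.heappush(rest, cand)
        if not rest:
            break                          # no more paths exist
        result.append(heapq.heappop(rest))
    return result


adj = {"s": [("a", 2), ("c", 10), ("d", 10)], "a": [("b", 1)],
       "b": [("a", 1), ("t", 3)], "c": [("t", 10)], "d": [("t", 11)]}
paths = yen_k_shortest(adj, "s", "t", 3)
print(paths)
```

Each outer iteration issues one shortest path query per edge of the previous path, which is why speeding up the individual queries, as the parallel implementation does, shortens the whole computation.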

5.2. Implementation in CUDA

The CUDA implementation of Yen’s algorithm uses the parallel Dijkstra’s algorithm. The simple SSSP Dijkstra’s algorithm of [9] calculates the path weight from the source to every node. In this paper we additionally construct the shortest path tree from the source to the destination, with some minor modifications to [9]. Algorithm 2 defines the modified version of the parallel Dijkstra’s algorithm, which also calculates, for each node, the edge responsible for that node’s path weight.

Algorithm 2: Dijkastra_SSSP (Node_weight, Mask, res_node, Edge_start, Edge, Weight, S_node, D_node)
Begin
[1] while (thre < infinite and Mask[D_node] != 1) do
[2] thre = infinite
[3] CAL_THRES(Node, Node_weight, Edge, Weight, Mask, thre, infinite) for all nodes of the graph in parallel
[4] REL_NODE(Node, Node_weight, Edge, Weight, Mask, thre) for all nodes of the graph in parallel
[5] End while
[6] FIND_RESPONSIBLE(Edge_start, res_node, Edge, Weight, Node_weight) for all edges of the graph in parallel
End

The Dijkastra_SSSP algorithm follows the same process as [9], with two changes: a modified loop condition and a new step that calculates the responsible edges. The loop condition (step 1 of Algorithm 2) now also verifies whether the destination node has been relaxed. Once the destination node has been relaxed, the loop stops and execution moves to step 6. The algorithm makes four kernel calls. The INITIATE, CAL_THRES and REL_NODE kernels have the same functionality as INITIALIZATION, THRESHOLD and RELAX in [9], respectively. The fourth kernel, FIND_RESPONSIBLE, is defined in Algorithm 3.
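The effect of the extra loop condition can be seen in a serial Python sketch that works on the Va, Ea and Ew arrays and stops as soon as the destination is marked in Mask. It is a plain array-based Dijkstra that settles one node per iteration, not the threshold-based CAL_THRES and REL_NODE kernels of [9], whose details are not given in this paper; the function name and the toy arrays are assumptions.

```python
INFINITE = float("inf")


def dijkstra_until_destination(Va, Ea, Ew, s_node, d_node):
    """Array-based Dijkstra that stops once d_node is settled (Mask[d_node] == 1)."""
    n = len(Va) - 1                       # Va is assumed to have |V|+1 entries
    node_weight = [INFINITE] * n
    mask = [0] * n                        # 1 once a node's weight is final
    node_weight[s_node] = 0
    while mask[d_node] != 1:
        # Select the unsettled node with the smallest tentative weight.
        u, best = -1, INFINITE
        for i in range(n):
            if mask[i] == 0 and node_weight[i] < best:
                u, best = i, node_weight[i]
        if u == -1:
            break                         # destination is unreachable
        mask[u] = 1
        # Relax every outgoing edge of u.
        for e in range(Va[u], Va[u + 1]):
            v = Ea[e]
            if node_weight[u] + Ew[e] < node_weight[v]:
                node_weight[v] = node_weight[u] + Ew[e]
    return node_weight, mask


# Toy graph: edges 0->2 (1), 0->1 (4), 1->3 (5), 2->1 (2), 2->3 (8).
Va, Ea, Ew = [0, 2, 3, 5, 5], [2, 1, 3, 1, 3], [1, 4, 5, 2, 8]
node_weight, mask = dijkstra_until_destination(Va, Ea, Ew, 0, 1)
print(node_weight, mask)
```

In this run node 3 is never settled because the search ends when node 1 is reached, which is the saving that the destination check is meant to provide in a two-vertex query.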

Algorithm 3: FIND_RESPONSIBLE (Edge_start, res_node, edge, weight, Node_weight)
Begin
[1] id = getThreadID
[2] if (Node_weight[Edge_start[id]] + weight[id] == Node_weight[edge[id]])
[3] then res_node[edge[id]] = Edge_start[id]
[4] End if
End

The FIND_RESPONSIBLE kernel launches |E| threads, one per edge, to find the edge responsible for each node’s weight. For each node i, the responsible edge is found by examining all of its incoming edges: if the weight of an incoming edge plus the weight of that edge’s start node equals the weight of node i, then that edge is responsible for the weight of node i.
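A serial Python emulation of this kernel, followed by the walk back from the destination that recovers the path, is sketched below. The loop over edge IDs stands in for the |E| GPU threads. The guard against infinite start weights and the helper `rebuild_path` are additions for the sketch and do not appear in Algorithm 3; the array contents are assumed toy data.

```python
INFINITE = float("inf")


def find_responsible(Es, Ea, Ew, node_weight, res_node):
    """For each edge, record its start node if the edge explains the end node's weight."""
    for eid in range(len(Ea)):            # one GPU thread per edge in the paper
        u, v = Es[eid], Ea[eid]
        # Skip unreached start nodes so that inf + w == inf is not a false match.
        if node_weight[u] != INFINITE and node_weight[u] + Ew[eid] == node_weight[v]:
            res_node[v] = u


def rebuild_path(res_node, s_node, d_node):
    """Follow responsible start nodes back from d_node to s_node."""
    path = [d_node]
    while path[-1] != s_node:
        path.append(res_node[path[-1]])
    return path[::-1]


# Toy graph: edges 0->2 (1), 0->1 (4), 1->3 (5), 2->1 (2), 2->3 (8).
Es, Ea, Ew = [0, 0, 1, 2, 2], [2, 1, 3, 1, 3], [1, 4, 5, 2, 8]
node_weight = [0, 3, 1, 8]                # shortest weights from node 0
res_node = [-1] * 4
find_responsible(Es, Ea, Ew, node_weight, res_node)
path = rebuild_path(res_node, 0, 3)
print(res_node, path)
```

When several incoming edges satisfy the equality, any of them yields a valid shortest path, so concurrent writes to the same res_node entry by different threads are harmless. With floating-point weights the exact equality test relies on Node_weight having been produced by the same additions.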
Our implementation of Yen’s algorithm uses Dijkastra_SSSP for the shortest path calculations: steps 1 and 6 of Algorithm 1 use the parallel shortest path calculation between two nodes. Algorithm 4 defines the parallel implementation of Yen’s algorithm. Edge_start, Edge and Weight are initialized during preprocessing of the graph.

Algorithm 4: Yen_parallel (Graph G (V, E, W), S_node, D_node, K)
Create an array Node_weight of size |V|, a Boolean array Mask of size |V|, an array res_node of size |V|, a variable infinite with a very large number assigned to it, and a variable thre to store the threshold value. Create two arrays, Zresult and Zrest. Zresult has size K and stores the resulting k shortest paths; Zrest stores the remaining candidate paths.
Begin
[1] INITIATE(Node_weight, Mask, S_node) for all nodes of the graph in parallel
[2] thre= 0
[3] Dijkastra_SSSP (Node_weight, Mask, res_node, Edge_start, Edge, Weight, S_node, D_node)
[4] Add the first shortest path to Zresult at the first index, and initialize its diversion node and parent path values to 0 and -1.
[5] for k=2 to K
[6] for each edge index m, from the diversion node to D_node, of the (k-1)th path of Zresult
[7] n = Zresult[k-1]
[8] while the parent path of n is non-negative and m == the diversion node of n
[9] remove edge (vm, vm+1) from graph
[10] n = Zresult[parent path of n]
[11] End of while
[12] Dijkastra_SSSP (Node_weight, Mask, res_node, Edge_start, Edge, Weight, m, D_node)
[13] Store the calculated path in Zrest.
[14] End of for
[15] find minimum path weight value from Zrest, store it in path_w and store its index in min.
[16] if(path_w< infinite)
[17] Zresult [k]=Zrest[min] and set Zrest to NULL.
[18] else
[19] exit (there is no more path)
[20] End of for
End

This parallel approach is limited by the GPU model, in which all kernels running under a single grid share one common copy of the data in global memory. Because of this limitation we can calculate only a single shortest path at a time.

6. Performance Analysis

The performance of the modified Yen’s algorithm is analyzed on graphs from the Stanford Large Network Dataset Collection [31]. This library provides web graphs, computer network graphs, citation graphs and road network graphs, which have been verified and tested under various parameters. We use directed graphs with randomly assigned edge weights; the weights are assigned during preprocessing. The graphs range from 10k to 65k nodes and from 20k to 1.5 million edges. They are considered sparse because their average degree is very low.

6.1. Experimental setup

The performance was evaluated on the single setup described in Table 1.


Table 1. Experimental Setup
Specification         Version / Detail
CUDA                  Version 5.0
Nvidia GPU            Tesla C2075
Compute Capability    2.1
Cores                 448
Multiprocessors       14
CPU Processor         2 x Intel hex-core (6 cores), 2.8 GHz
RAM                   24 GB
GPU Memory            4 GB
OS                    Windows 7
Visual Studio         Professional 2010

6.2. Results

This section presents the results of the parallel implementation of Yen’s algorithm. The parallel implementation is compared with a serial implementation of Yen’s algorithm that uses the new data structure developed by Ernesto et al. [24]. The time to calculate k paths between two specified nodes depends on the number of edges in the (k-1)th path, which also affects the number of paths stored in the rest path set. We implemented the shortest path calculation in parallel, which reduces the time of each individual shortest path calculation. The selection of the shortest path from the rest path list is also implemented in parallel. In a denser graph more parallelism is achieved, because the shortest path can be calculated more quickly.
In the result figures the x-axis shows the number of nodes in the graph and the y-axis shows the time in seconds. We analyzed the algorithm for different values of K.

Fig. 4. Yen's algorithm timing graph(serial vs. parallel) (a) k=100 (b) k=200 (c) k=300

Figures 4(a), 4(b) and 4(c) show the results of the serial Yen’s algorithm (serial_yen) and the parallel implementation (parallel_yen) on the specified setup. The results show the comparative analysis on a graph with 62k nodes and 1.4 million edges. For k=100, parallel_yen shows a 6.5x speedup on this graph. The average degree of the directed graphs used is 3 to 5. As the value of K increases, the timing behaviour remains almost the same for all values of K. The graph with 22k nodes takes less time because its average degree is 6 to 7. We therefore conclude that as the density of the graph increases, the time to calculate K paths is reduced.

7. Conclusion

In this paper we have designed and implemented Yen’s algorithm efficiently using the parallel Dijkstra’s algorithm. This is the first implementation of a k-shortest path algorithm on a GPU. We use a new data structure that is well suited to the GPU and also reduces the running time of the algorithm. With the new data structure, the number of internal shortest path calls is reduced, and it is easy to identify how the graph must be temporarily modified to obtain a new shortest path from a previous one. The result is a shortest path tree with the corresponding path weights. We tested our algorithm on different graphs with various values of K and obtained a 6x speedup over the serial implementation of Yen’s algorithm.
For further performance gains, the algorithm can be run on multiple GPUs at a time: a differently modified graph can be assigned to each GPU, so that multiple shortest paths are found simultaneously. Using multiple GPUs, more parallelism can be gained. The GPU memory hierarchy can also be used to reduce memory access time.

References

1. Dijkstra E. A note on two problems in connection with graphs. Numerical Mathematics. 1959;1:395–412.
2. Papaefthymiou M, Rodrigue J. Implementing parallel shortest paths algorithms. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1994; pp. 59-68.
3. Fetterer A, Shekhar S. A performance analysis of hierarchical shortest path algorithms, Ninth IEEE International Conference on Tools with Artificial Intelligence, IEEE, 1997.
4. Crobak JR, Berry JW, Madduri K, Bader DA. Advanced shortest paths algorithms on a massively-multithreaded architecture, Parallel and Distributed Processing Symposium, IEEE, 2007.
5. Harish P, Narayanan PJ. Accelerating large graph algorithms on the GPU using CUDA, in High Performance Computing – HiPC 2007, Aluru S, Parashar M. et al. (Eds.), Springer Berlin Heidelberg 2007; pp. 197-208.
6. Tang Y, Zhang Y, Chen H. A parallel shortest path algorithm based on graph-partitioning and iterative correcting, 10th IEEE International Conference on High Performance Computing and Communications, IEEE, 2008.
7. Martín PJ, Torres R, Gavilanes A. CUDA Solutions for the SSSP Problem, in Computational Science – ICCS 2009, G. Allen et al. (Eds.), Springer-Verlag Berlin, Heidelberg, 2009; pp. 904–913.
8. Kumar S, Misra A, Tomar RS. A Modified Parallel Approach to Single Source Shortest Path Problem for Massively Dense Graphs Using CUDA, Int. Conf. on Computer & Comm. Tech. (ICCCT), IEEE, 2011.
9. Singh DP, Khare N. Parallel Implementation of the Single Source Shortest Path Algorithm on CPU–GPU Based Hybrid System, International Journal of Computer Science and Information Security, September 2013; Vol. 11, No. 9.
10. Eppstein D. Finding the k shortest paths. SIAM Journal on Computing 1998;28:652–673.
11. Fox BL. k-th shortest paths and applications to the probabilistic networks. In ORSA/TIMS National Mtg, Bull. Operations Research Soc. of America, 1975; volume 23, page B263.
12. Martins EQV. An algorithm for ranking paths that may contain cycles. European J. Operational Research, 1984;18:123-130.
13. Azevedo JA. An algorithm for the ranking of shortest paths. European J. Operational Research, 1993;69:97-106.
14. Hoffman R, Pavley RR. A method for the solution of the nth best path problem. Journal of the Association for Computing Machinery 1959;6:506–515.
15. Clarke S, Krikorian A, Rausan J. Computing the N Best Loopless Paths in a Network, J. of SIAM, December 1963; Vol. 11, No. 4, pp. 1096-1102.
16. Bock F, Kantner H, Haynes J. An Algorithm (The r-th Best Path Algorithm) for Finding and Ranking Paths Through a Network, Research Report, Armour Research Foundation, Chicago, Illinois, November 15, 1957.
17. Pollack M. The kth Best Route Through a Network, Opns. Res., 1961; Vol. 9, No. 4, pp. 578.
18. Sakarovitch M. The k Shortest Routes and the k Shortest Chains in a Graph, Opns. Res. Center, University of California, Berkeley, Report ORC-32, October 1966.
19. Yen JY. Finding the K shortest loopless paths in a network, Mgmt Sci 17, 1971; 712–716.
20. Yen JY. Another algorithm for finding the K shortest loopless network paths, 41st Mtg Oper Res Soc of Am, Bull Oper Res Soc of Am 20, 1972; B/185.
21. Lawler EL. A procedure for computing the k best solutions to discrete optimization problems and its application to the shortest path problem. Management Science, 1972;18:401–405.
22. Katoh N, Ibaraki T, Mine H. An efficient algorithm for k shortest simple paths. Networks, 1982;12:411–427.
23. Hadjiconstantinou E, Christofides N. An efficient implementation of an algorithm for finding k shortest simple paths. Networks, 1999;34(2):88–101.
24. Ernesto QVM, Marta MBP. A new Implementation of Yen’s ranking loopless paths algorithm, 2002.
25. Kumar N, Ghosh RK. Parallel algorithm for finding first k shortest paths. Computer Science and Informatics: Journal of the Computer Society of India, 1994;24(3):21–28.
26. Ruppert E. Finding the k shortest path in parallel, Algorithmica, 2000;28:242-54.
27. Guerriero E, Musmanno R. Parallel Asynchronous Algorithms for the K Shortest Paths Problem, Journal of Optimization Theory and Applications, January 2000; vol 104, No. 1, pp 91-108.
28. Shih YK, Parthasarathy S. A single source k-shortest paths algorithm to infer regulatory pathways in a gene network, ISMB 2012; Vol. 28, pages i49–i58.
29. Nickolls J, Buck I, Garland M, Skadron K. Scalable Parallel Programming with CUDA, ACM Queue, 2008; vol. 6, no. 2, pp. 40-53.
30. NVIDIA Corporation. CUDA C programming guide 2013; http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf.
31. Jure L. Stanford Large Network Dataset Collection, Stanford University, http://snap.stanford.edu/data/.
