CUDA Based Minimum Spanning Tree

This document presents Prim's minimum spanning tree algorithm written in CUDA to improve performance. The GPU implementation is estimated to be 10x or more faster than the CPU implementation for a small number of nodes stored in adjacency matrix format. The test platform is the following: CPU: Intel i7 M640 2.8 GHz; GPU: NVS 3100 w/ 16 CUDA cores; Memory: 8 GB DDR3.

Uploaded by

ChengyaoTan

ECE572 CUDA Project: Minimum Spanning Trees

Guangxin Wang, Christopher King

December 5, 2011

1. Introduction
General-purpose computing on GPUs (GPGPU) uses the large number of cores in a GPU to
speed up computation, and it has become a trend in computation acceleration. CUDA is one
approach to GPGPU: a parallel computing architecture developed by NVIDIA and
specifically designed to use the multiple cores of a GPU to improve performance via
parallel computation. CUDA is also a programming model. For instance, CUDA can
be programmed in C/C++ with a few extensions; in CUDA C/C++, the GPU is
treated as a compute device that can execute a large number of parallel computations
and that operates as a co-processor to the CPU.

2. Example of CUDA
By comparing the calculation time of a minimum spanning tree (MST), this example shows
the improvement of CUDA C over sequential C.
The running environment is shown below:

CPU         Intel i7 M640 @ 2.8 GHz
Graphics    NVS 3100m w/ 16 CUDA cores
Memory      8 GB DDR3

2.1 Minimum Spanning Tree & Prim's Algorithm

A graph is a set of vertices (here A, B, C, D and E) and the edges that interconnect them; each edge
carries a weight. A minimum spanning tree is a subset of the edges of the
graph such that there is a path from any node to any other node and the sum of the weights
of the edges is minimal.
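The definition above can be checked mechanically: a set of n - 1 edges that contains no cycle connects all n nodes and is therefore a spanning tree. The following is a minimal sketch of such a check in C; the names `edge`, `find_root` and `spanning_tree_weight` are illustrative and do not come from the report, and edge weights are assumed to be non-negative so that -1 can signal failure.

```c
#define MAX_N 100

typedef struct { int u, v, weight; } edge;

/* Union-find lookup with path halving. */
static int find_root(int parent[], int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Returns the total weight if the m edges form a spanning tree of the
   nodes 0..n-1, or -1 if they do not (wrong edge count or a cycle). */
int spanning_tree_weight(int n, const edge *edges, int m)
{
    int parent[MAX_N];
    int i, total = 0;

    if (n < 1 || n > MAX_N || m != n - 1)
        return -1;
    for (i = 0; i < n; ++i)
        parent[i] = i;
    for (i = 0; i < m; ++i) {
        int a = find_root(parent, edges[i].u);
        int b = find_root(parent, edges[i].v);
        if (a == b)
            return -1;          /* this edge would close a cycle */
        parent[a] = b;
        total += edges[i].weight;
    }
    return total;
}
```

A minimum spanning tree is then the edge set for which this function returns the smallest value among all sets it accepts.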

Prim's algorithm, developed by Robert C. Prim, is one of the classic algorithms for MST. This
algorithm builds the minimum spanning tree by iteratively adding nodes to a working
tree:
Step 1: Start with a tree which contains only one node.
Step 2: Identify a node (outside the tree) which is closest to the tree, add the
minimum weight edge from that node to some node in the tree, and
incorporate the additional node as a part of the tree.
Step 3: If there are fewer than (n - 1) edges in the tree, repeat Step 2.
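The three steps can be sketched as a sequential C function on an adjacency matrix. This is an illustrative baseline under stated assumptions, not the authors' measured CPU code: `prim_total`, `in_tree` and `parent` are hypothetical names, a matrix entry of 0 is taken to mean "no edge", and all weights are assumed to be below `INF`.

```c
#define MAX_N 100
#define INF   100000   /* stands in for infinity */

/* Prim's algorithm on an adjacency matrix; dist[i][j] == 0 means no edge.
   Fills parent[i] with the tree node that i attaches to (parent[0] = -1)
   and returns the total weight, or -1 if the graph is not connected. */
int prim_total(int n, int dist[][MAX_N], int parent[])
{
    char in_tree[MAX_N] = {0};
    int d[MAX_N];               /* d[i]: distance from node i to the tree */
    int i, s, total = 0;

    for (i = 0; i < n; ++i) {
        d[i] = INF;
        parent[i] = -1;
    }

    /* Step 1: start with a tree that contains only node 0. */
    int target = 0;
    in_tree[0] = 1;

    /* Step 3: repeat until the tree has n - 1 edges. */
    for (s = 1; s < n; ++s) {
        /* Refresh distances using the node that just joined the tree. */
        for (i = 0; i < n; ++i)
            if (!in_tree[i] && dist[target][i] != 0 && d[i] > dist[target][i]) {
                d[i] = dist[target][i];
                parent[i] = target;
            }
        /* Step 2: pick the closest node outside the tree. */
        int min = -1;
        for (i = 0; i < n; ++i)
            if (!in_tree[i] && (min == -1 || d[i] < d[min]))
                min = i;
        if (d[min] == INF)
            return -1;          /* the remaining nodes are unreachable */
        in_tree[min] = 1;
        total += d[min];
        target = min;
    }
    return total;
}
```

For a 4-node graph with edges 0-1 (weight 1), 0-2 (4), 1-2 (2), 1-3 (5) and 2-3 (3), the function returns 6 and attaches node 3 to node 2. The distance refresh loop is the part with independent iterations, which is why it is the natural candidate for a GPU kernel.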

2.2 CUDA Implementation

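#include <stdio.h>
#include <cuda_runtime.h>

#define MAX_N   512      // maximum number of nodes (covers the 500-node test)
#define THREADS 512      // threads per block: a power of two >= MAX_N
#define INF     100000   // stands in for infinity

typedef struct {
    int d;               // distance from node v to the tree
    int v;               // node index, -1 if the slot holds no candidate
} node;

int n;
// The number of nodes in the graph
int dist[MAX_N][MAX_N];
// the distance between node i and node j (0 means no edge)
int d[MAX_N];
// the distance between node i and the minimum spanning tree
int w[MAX_N];
// the tree node that node i attaches to

// One thread per node i: after node target has joined the tree, lower d[i]
// if the edge target-i is shorter than the best distance known so far.
__global__ void MST(int target, int n, const int *dist, int *d, int *w)
{
    int i = threadIdx.x;
    if (i < n) {
        int edge = dist[target * MAX_N + i];
        if ((edge != 0) && (d[i] > edge)) {
            d[i] = edge;
            w[i] = target;
        }
    }
}

// Find the node outside the tree with the smallest d[] by a pairwise
// reduction in shared memory, then mark it as part of the tree.
// inclde[i] is 1 if the node i is already in the minimum spanning tree; 0 otherwise
__global__ void findMin(int n, const int *d, char *inclde, node *result)
{
    __shared__ node nd[THREADS];
    int i = threadIdx.x;

    nd[i].d = INF + 1;   // padding slots lose against every real node
    nd[i].v = -1;
    if (i < n && !inclde[i]) {
        nd[i].d = d[i];
        nd[i].v = i;
    }
    __syncthreads();

    int h;
    for (h = 1; h < THREADS; h *= 2) {
        // thread i keeps the smaller of slots i and i + h
        if (i % (2 * h) == 0 && nd[i + h].d < nd[i].d)
            nd[i] = nd[i + h];
        __syncthreads();
    }

    if (i == 0) {
        *result = nd[0];
        if (nd[0].v >= 0)
            inclde[nd[0].v] = 1;
    }
}

int main(int argc, char *argv[])
{
    FILE *f = fopen("dist.txt", "r");
    if (f == NULL) {
        fprintf(stderr, "Cannot open dist.txt\n");
        return 1;
    }
    if (fscanf(f, "%d", &n) != 1 || n < 1 || n > MAX_N) {
        fprintf(stderr, "dist.txt: bad node count\n");
        return 1;
    }
    int i, j;
    for (i = 0; i < n; ++i)
        for (j = 0; j < n; ++j)
            if (fscanf(f, "%d", &dist[i][j]) != 1) {
                fprintf(stderr, "dist.txt: matrix is incomplete\n");
                return 1;
            }
    fclose(f);

    // Initialise d with infinity
    for (i = 0; i < n; ++i)
        d[i] = INF;

    // Copy the graph and the distance array to the GPU
    int *dev_dist, *dev_d, *dev_w;
    char *dev_inclde;
    node *dev_min;
    cudaMalloc((void **)&dev_dist, sizeof(dist));
    cudaMalloc((void **)&dev_d, sizeof(d));
    cudaMalloc((void **)&dev_w, sizeof(w));
    cudaMalloc((void **)&dev_inclde, MAX_N);
    cudaMalloc((void **)&dev_min, sizeof(node));
    cudaMemcpy(dev_dist, dist, sizeof(dist), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_d, d, sizeof(d), cudaMemcpyHostToDevice);
    cudaMemset(dev_w, 0, sizeof(w));

    // Mark all nodes as NOT being in the minimum spanning tree
    cudaMemset(dev_inclde, 0, MAX_N);

    // Add the first node to the tree
    printf("Adding node %c\n", 0 + 'A');
    cudaMemset(dev_inclde, 1, 1);   // inclde[0] = 1
    MST<<<1, THREADS>>>(0, n, dev_dist, dev_d, dev_w);

    int total = 0;
    int s; // tree size
    for (s = 1; s < n; ++s) {
        // Find the node with the smallest distance to the tree
        // (assumes the graph is connected)
        node nd1;
        findMin<<<1, THREADS>>>(n, dev_d, dev_inclde, dev_min);
        if (cudaMemcpy(&nd1, dev_min, sizeof(node),
                       cudaMemcpyDeviceToHost) != cudaSuccess || nd1.v < 0) {
            fprintf(stderr, "CUDA error while searching for the minimum\n");
            return 1;
        }
        int min = nd1.v;
        cudaMemcpy(&w[min], dev_w + min, sizeof(int), cudaMemcpyDeviceToHost);

        printf("Adding edge %c-%c\n", w[min] + 'A', min + 'A');
        total += nd1.d;

        // Update the distances of the remaining nodes to the tree
        MST<<<1, THREADS>>>(min, n, dev_dist, dev_d, dev_w);
    }
    printf("Total distance: %d\n", total);

    cudaFree(dev_dist);
    cudaFree(dev_d);
    cudaFree(dev_w);
    cudaFree(dev_inclde);
    cudaFree(dev_min);
    return 0;
}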
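The pairwise minimum reduction used in the kernel code (compare slot i with slot i + 2^h, double the stride, repeat) can be traced on the CPU. The sketch below replaces the threads with an inner loop, so each outer pass corresponds to one step between two `__syncthreads()` calls; `reduce_min_index` and `best` are illustrative names that do not appear in the report.

```c
/* Simulates a stride-doubling minimum reduction over val[0..m-1].
   In each pass, "thread" i (a multiple of 2 * stride) compares the
   candidates held in slots i and i + stride and keeps the smaller.
   Returns the index of the smallest value (lowest index on ties),
   or -1 if m is out of range. */
int reduce_min_index(const int *val, int m)
{
    int best[128];              /* best[i]: index of the minimum in slot i */
    int i, stride;

    if (m < 1 || m > 128)
        return -1;
    for (i = 0; i < m; ++i)
        best[i] = i;
    for (stride = 1; stride < m; stride *= 2)
        for (i = 0; i + stride < m; i += 2 * stride)
            if (val[best[i + stride]] < val[best[i]])
                best[i] = best[i + stride];
    return best[0];
}
```

For the values 7, 3, 9, 1, 4 the passes leave index 3 (value 1) in slot 0 after three steps, where a linear scan would need four comparisons in sequence. On a GPU all comparisons of one pass run at the same time, so the minimum of m candidates takes about log2(m) steps.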

2.3 Simulation Results

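The 10-node sample input file is shown below; the larger sample files were created with a
random number generator.

[Table: adjacency matrix of the 10-node sample graph, nodes labelled A, B, C, ... The cell layout is lost in this copy; the surviving weights, in reading order, are 10 10 / 15 13 15 / 10 18 / 10 13 / 15 10 20 / 15 17 / 16 15 12.]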
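The report does not show how the larger sample files were generated. As a hedged sketch, the functions below produce a file in the layout that the program reads from dist.txt: the node count n, followed by an n x n matrix of integer weights. The names `fill_random_graph` and `write_graph`, the choice of a complete graph, and the weight range passed in by the caller are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_N 512

/* Fills dist with a random symmetric matrix: 0 on the diagonal and a
   weight in [lo, hi] for every pair of distinct nodes. */
void fill_random_graph(int n, int dist[][MAX_N], int lo, int hi, unsigned seed)
{
    int i, j;
    srand(seed);
    for (i = 0; i < n; ++i) {
        dist[i][i] = 0;
        for (j = i + 1; j < n; ++j)
            dist[i][j] = dist[j][i] = lo + rand() % (hi - lo + 1);
    }
}

/* Writes n followed by the n x n matrix, one row per line. */
void write_graph(FILE *f, int n, int dist[][MAX_N])
{
    int i, j;
    fprintf(f, "%d\n", n);
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j)
            fprintf(f, "%d ", dist[i][j]);
        fprintf(f, "\n");
    }
}
```

Keeping the matrix symmetric matters because the graph is undirected: the weight of edge i-j must equal the weight of edge j-i, otherwise the result depends on which endpoint joins the tree first.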

The running times are listed below:

Number of nodes    CUDA (ms)    C (ms)
(not given)        0.186        0.198
100                0.479        1.61
200                0.658        2.87
500                1.11         4.21
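The speedup for each row is the sequential time divided by the CUDA time. The small sketch below computes it from the measured values in the table; the type `timing` and the functions `speedup` and `print_speedups` are illustrative names, and the node count of the first row is left as "?" because it is missing in the source.

```c
#include <stdio.h>

typedef struct {
    const char *nodes;   /* node count as printed in the table */
    double cuda_ms;      /* CUDA running time */
    double c_ms;         /* sequential C running time */
} timing;

/* Rows copied from the running-time table. */
static const timing rows[] = {
    { "?",   0.186, 0.198 },
    { "100", 0.479, 1.61  },
    { "200", 0.658, 2.87  },
    { "500", 1.11,  4.21  },
};

/* Speedup of CUDA over sequential C: t_C / t_CUDA. */
double speedup(const timing *t)
{
    return t->c_ms / t->cuda_ms;
}

/* Prints one line per table row. */
void print_speedups(void)
{
    size_t i;
    for (i = 0; i < sizeof(rows) / sizeof(rows[0]); ++i)
        printf("%-5s %.2fx\n", rows[i].nodes, speedup(&rows[i]));
}
```

The ratios come out to roughly 1.06, 3.36, 4.36 and 3.79, so the measured gain on this hardware is about 3x to 4x for 100 nodes and above.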


3. Conclusion
The MST computation begins from one node, so there is no obvious speedup of CUDA over
sequential C on the small sample. However, CUDA can process the different nodes
in parallel, and as the results show, the larger samples benefit far more: the
CUDA version is roughly 3 to 4 times faster on the 100- to 500-node samples. Consequently, CUDA provides higher performance than the CPU
in parallel computation. CUDA also shows the potential of general-purpose
computing on GPUs. GPGPU, including CUDA, is likely to become a trend for
high-throughput computation on parallel problems in the future.
