CUDA Based Minimum Spanning Tree

This document presents Prim's minimum spanning tree algorithm written in CUDA to improve performance. For graphs stored in adjacency-matrix format, the measured GPU running times are about 3.4 to 4.4 times faster than the CPU implementation for 100 to 500 nodes, and about the same for 10 nodes. The test platform is: CPU: Intel i7 M640 2.8 GHz; GPU: NVS 3100M with 16 CUDA cores; memory: 8 GB DDR3.

Uploaded by

ChengyaoTan

ECE572 CUDA Project: Minimum Spanning Trees

Guangxin Wang, Christopher King


December 5, 2011

1. Introduction
GPGPU (general-purpose computing on GPUs) uses the large number of cores in a GPU to
speed up computation, and it has become a major trend in computational acceleration.
CUDA, one approach to GPGPU, is a parallel computing architecture developed by NVIDIA
that is specifically designed to use the many cores of a GPU to improve performance
through parallel computation. CUDA also comes with a programming model: it can be
programmed in C/C++ with a few extensions (CUDA C/C++), in which the GPU is treated as
a compute device that executes a large number of parallel threads and operates as a
co-processor to the CPU.

2. Example of CUDA
This example compares the time needed to compute a minimum spanning tree (MST) in
CUDA C and in sequential C, to show the improvement that CUDA provides.
The running environment is shown below:
CPU:       Intel i7 M640 @ 2.8 GHz
Graphics:  NVS 3100M with 16 CUDA cores
Memory:    8 GB DDR3

2.1 Minimum Spanning Tree & Prim's Algorithm

Consider a graph: a set of vertices (A, B, C, D and E) and the edges that interconnect
them, where each edge has its own weight. A minimum spanning tree is a subset of the
edges of the graph such that there is a path from any node to any other node and the sum
of the edge weights is minimal. For a graph with n nodes, such a tree has exactly n - 1 edges.
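To make the definition concrete, the following C sketch checks the "spanning tree" part of it: given a list of edges, it tests that there are n - 1 of them and that they contain no cycle (which together imply that every node is reachable from every other), and returns their total weight, the quantity an MST minimizes. It does not prove minimality. It uses a union-find structure, a different technique from Prim's algorithm chosen only for this check; the function names and the MAXNODES limit are hypothetical.

```c
#include <assert.h>

#define MAXNODES 64   /* hypothetical size limit for this sketch */

/* Union-find "find" with path halving: returns the representative of x. */
static int find_root(int parent[], int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Returns the total weight of the m edges (eu[k], ev[k]) with weights
   weight[k] if they form a spanning tree of the n-node graph (nodes
   0..n-1, i.e. A = 0, B = 1, ...), or -1 otherwise. */
int spanning_tree_weight(int n, int m, const int eu[], const int ev[],
                         const int weight[])
{
    int parent[MAXNODES];
    assert(n >= 1 && n <= MAXNODES);
    if (m != n - 1)          /* a tree on n nodes has exactly n - 1 edges */
        return -1;
    for (int i = 0; i < n; ++i)
        parent[i] = i;
    int total = 0;
    for (int k = 0; k < m; ++k) {
        assert(eu[k] >= 0 && eu[k] < n && ev[k] >= 0 && ev[k] < n);
        int a = find_root(parent, eu[k]);
        int b = find_root(parent, ev[k]);
        if (a == b)          /* endpoints already connected: a cycle */
            return -1;
        parent[a] = b;       /* merge the two components */
        total += weight[k];
    }
    return total;            /* n - 1 edges and no cycle: connected */
}
```

For example, in a triangle A-B (1), B-C (2), A-C (3), the edges A-B and B-C form a spanning tree of weight 3, while A-B alone does not span C.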

Prim's algorithm, published by Robert C. Prim, is one of the classic algorithms for finding
an MST. It builds the minimum spanning tree by iteratively adding nodes to a working
tree:
Step 1: Start with a tree that contains only one node.
Step 2: Identify the node outside the tree that is closest to the tree, add the
minimum-weight edge from that node to the tree, and make that node part of
the tree.
Step 3: If the tree has fewer than n - 1 edges, repeat Step 2.
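The three steps above can be sketched sequentially in C on an adjacency matrix, which runs in O(n^2) time and is the kind of CPU baseline the report measures against. This is our own sketch, not the authors' sequential code: the function name prim_mst, the row-major matrix layout (0 meaning "no edge") and the returned parent array are assumptions, while INF = 100000 and the 100-node limit mirror the constants in the report's program.

```c
#include <assert.h>

#define INF  100000   /* "infinity": larger than any edge weight */
#define MAXN 100      /* size limit, matching the report's 100-element arrays */

/* Prim's algorithm on an n x n adjacency matrix stored row-major in dist
   (dist[i*n + j] == 0 means "no edge"). Writes the tree neighbour of each
   node to w[] (w[0] = -1 for the start node) and returns the total weight,
   or -1 if the graph is not connected. */
int prim_mst(int n, const int *dist, int *w)
{
    int d[MAXN];          /* distance from node i to the tree */
    char inclde[MAXN];    /* 1 once node i is in the tree */
    assert(n >= 1 && n <= MAXN);

    for (int i = 0; i < n; ++i) {
        d[i] = INF;
        w[i] = -1;
        inclde[i] = 0;
    }
    int target = 0, total = 0;
    inclde[0] = 1;                      /* Step 1: the tree is {node 0} */

    for (int s = 1; s < n; ++s) {       /* Step 3: until n - 1 edges */
        /* Update distances through the node that joined last. */
        for (int i = 0; i < n; ++i) {
            int c = dist[target * n + i];
            if (!inclde[i] && c != 0 && c < d[i]) {
                d[i] = c;
                w[i] = target;
            }
        }
        /* Step 2: pick the outside node closest to the tree. */
        int min = -1;
        for (int i = 0; i < n; ++i)
            if (!inclde[i] && (min == -1 || d[i] < d[min]))
                min = i;
        if (d[min] == INF)
            return -1;                  /* remaining nodes are unreachable */
        inclde[min] = 1;
        total += d[min];
        target = min;
    }
    return total;
}
```

On the 4-node graph A-B (1), A-C (4), B-C (2), B-D (6), C-D (3), the sketch picks A-B, B-C and C-D, for a total weight of 6.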

2.2 CUDA Implementation


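#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define MAXN    1024     // sanity limit on the number of nodes
#define INF     100000   // "infinity": must exceed every edge weight
#define THREADS 256      // threads per block (must be a power of two)

// Relaxation step, one thread per node i: after node `target` has joined
// the tree, node i's distance to the tree may shrink to dist[target][i].
// dist is the n x n adjacency matrix in row-major order (0 = no edge).
__global__ void MST(const int *dist, const char *inclde, int *d, int *w,
                    int n, int target)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || inclde[i])
        return;
    int c = dist[target * n + i];
    if (c != 0 && c < d[i]) {
        d[i] = c;        // the distance between node i and the tree
        w[i] = target;   // the tree node that gives this distance
    }
}

// Selection step, run as a single block: each thread finds the closest
// node outside the tree among the nodes it strides over, then the partial
// minima are combined by a pairwise tree reduction in shared memory.
// Writes the index of the closest node to *result, or -1 if no node
// outside the tree is reachable.
__global__ void findMin(const int *d, const char *inclde, int n, int *result)
{
    __shared__ int bestD[THREADS];
    __shared__ int bestI[THREADS];
    int t = threadIdx.x;

    int myD = INF, myI = -1;
    for (int i = t; i < n; i += blockDim.x)
        if (!inclde[i] && d[i] < myD) {
            myD = d[i];
            myI = i;
        }
    bestD[t] = myD;
    bestI[t] = myI;
    __syncthreads();

    // log2(THREADS) rounds: thread t keeps the smaller of its own value
    // and the value held by thread t + s.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (t < s && bestD[t + s] < bestD[t]) {
            bestD[t] = bestD[t + s];
            bestI[t] = bestI[t + s];
        }
        __syncthreads();
    }
    if (t == 0)
        *result = bestI[0];
}

int main(void)
{
    // Input: the number of nodes n, then the n x n distance matrix.
    FILE *f = fopen("dist.txt", "r");
    if (f == NULL) {
        perror("dist.txt");
        return 1;
    }
    int n;
    if (fscanf(f, "%d", &n) != 1 || n < 1 || n > MAXN) {
        fprintf(stderr, "invalid node count\n");
        return 1;
    }
    size_t bytes = (size_t)n * n * sizeof(int);
    int *dist = (int *)malloc(bytes);
    for (int i = 0; i < n * n; ++i)
        if (fscanf(f, "%d", &dist[i]) != 1) {
            fprintf(stderr, "invalid distance matrix\n");
            return 1;
        }
    fclose(f);

    // Initialise d with infinity and mark all nodes as NOT being in the
    // minimum spanning tree, except the first node, which starts the tree.
    int *d = (int *)malloc(n * sizeof(int));
    int *w = (int *)malloc(n * sizeof(int));
    char *inclde = (char *)calloc(n, 1);
    for (int i = 0; i < n; ++i) {
        d[i] = INF;
        w[i] = -1;
    }
    inclde[0] = 1;
    printf("Adding node %c\n", 0 + 'A');   // letter labels assume n <= 26

    // Copy the graph and the working arrays to the GPU
    // (error checking of the CUDA calls is omitted for brevity).
    int *dev_dist, *dev_d, *dev_w, *dev_next;
    char *dev_inclde;
    cudaMalloc(&dev_dist, bytes);
    cudaMalloc(&dev_d, n * sizeof(int));
    cudaMalloc(&dev_w, n * sizeof(int));
    cudaMalloc(&dev_inclde, n);
    cudaMalloc(&dev_next, sizeof(int));
    cudaMemcpy(dev_dist, dist, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_d, d, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_w, w, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_inclde, inclde, n, cudaMemcpyHostToDevice);

    int blocks = (n + THREADS - 1) / THREADS;
    MST<<<blocks, THREADS>>>(dev_dist, dev_inclde, dev_d, dev_w, n, 0);

    int total = 0;
    for (int s = 1; s < n; ++s) {                 // s = tree size
        // Find the node with the smallest distance to the tree.
        int next, dnext, wnext;
        findMin<<<1, THREADS>>>(dev_d, dev_inclde, n, dev_next);
        cudaMemcpy(&next, dev_next, sizeof(int), cudaMemcpyDeviceToHost);
        if (next < 0) {
            fprintf(stderr, "graph is not connected\n");
            break;
        }
        cudaMemcpy(&dnext, dev_d + next, sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(&wnext, dev_w + next, sizeof(int), cudaMemcpyDeviceToHost);
        printf("Adding edge %c-%c\n", wnext + 'A', next + 'A');
        total += dnext;

        // Add the node to the tree, then update all distances in parallel.
        const char one = 1;
        cudaMemcpy(dev_inclde + next, &one, 1, cudaMemcpyHostToDevice);
        MST<<<blocks, THREADS>>>(dev_dist, dev_inclde, dev_d, dev_w, n, next);
    }
    printf("Total distance: %d\n", total);

    cudaFree(dev_dist);
    cudaFree(dev_d);
    cudaFree(dev_w);
    cudaFree(dev_inclde);
    cudaFree(dev_next);
    free(dist);
    free(d);
    free(w);
    free(inclde);
    return 0;
}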
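Finding the node closest to the tree in parallel is a min-reduction: each thread holds a partial minimum, and pairs of partial minima are merged in rounds until one value remains, so p values need about log2(p) rounds. The C sketch below simulates that pattern on the host, under the assumption of the stride-halving variant (the interleaved, stride-doubling variant gives the same result). The function name reduce_min is hypothetical. Running the "threads" of one round sequentially is equivalent to running them in parallel, because in each round thread t writes only slot t < stride and reads slot t + stride, which no thread writes in that round; the end of each outer iteration plays the role of __syncthreads().

```c
#include <assert.h>

#define INF 100000   /* "infinity", as in the report's program */

/* Host-side simulation of a shared-memory min-reduction. val[] and idx[]
   stand in for the per-thread partial results (distance, node index);
   p must be a power of two. Returns the index carrying the minimum. */
int reduce_min(int p, int val[], int idx[])
{
    assert(p > 0 && (p & (p - 1)) == 0);
    for (int stride = p / 2; stride > 0; stride /= 2)   /* one round */
        for (int t = 0; t < stride; ++t)                 /* "threads" t < stride */
            if (val[t + stride] < val[t]) {
                val[t] = val[t + stride];
                idx[t] = idx[t + stride];
            }
    return idx[0];
}
```

With eight partial minima {7, 3, 9, INF, 5, 3, 8, 1}, three rounds leave the value 1 and its index 7 in slot 0.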

2.3 Simulation Results


The 10-node sample input is shown below; the larger sample files were created with a
random number generator.
[Figure: the 10-node sample input graph (nodes labeled A, B, C, ...), with edge weights between 10 and 20]
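The report does not describe its random number generator. The following C sketch shows one way to produce larger inputs in the layout the program reads from dist.txt: the node count, then an n x n symmetric matrix with 0 for "no edge". The function name write_random_graph, the always-present chain edges (i-1, i) that keep the graph connected, the density parameter and the uniform weights in 1..max_weight are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes a random undirected graph with n nodes in the dist.txt layout.
   Edges (i-1, i) are always present so the graph is connected; every other
   pair gets an edge with probability `density`. Weights are drawn from
   1..max_weight. Returns 0 on success, -1 on bad input or allocation failure. */
int write_random_graph(FILE *out, int n, int max_weight, double density,
                       unsigned seed)
{
    if (n < 1 || max_weight < 1)
        return -1;
    assert(density >= 0.0 && density <= 1.0);
    int *m = calloc((size_t)n * n, sizeof *m);   /* all zeros: no edges */
    if (m == NULL)
        return -1;
    srand(seed);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (j == i + 1 || rand() < density * ((double)RAND_MAX + 1.0)) {
                int wgt = 1 + rand() % max_weight;
                m[i * n + j] = m[j * n + i] = wgt;   /* keep it symmetric */
            }
    fprintf(out, "%d\n", n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j)
            fprintf(out, "%s%d", j ? " " : "", m[i * n + j]);
        fputc('\n', out);
    }
    free(m);
    return 0;
}
```

For instance, calling write_random_graph on a FILE opened for writing on dist.txt with n = 500 would produce a 500-node input; the seed makes a run repeatable on the same C library.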

Running times:

Number of nodes   CUDA (ms)   C (ms)
10                0.186       0.198
100               0.479       1.61
200               0.658       2.87
500               1.11        4.21
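The speedup of CUDA over sequential C for each graph size is the ratio of the two measured times. Computed from the running times above:

```latex
\begin{aligned}
S(n) &= T_{\mathrm{C}}(n) / T_{\mathrm{CUDA}}(n) \\
S(10) &= 0.198 / 0.186 \approx 1.06 \\
S(100) &= 1.61 / 0.479 \approx 3.36 \\
S(200) &= 2.87 / 0.658 \approx 4.36 \\
S(500) &= 4.21 / 1.11 \approx 3.79
\end{aligned}
```

The speedup rises sharply between 10 and 100 nodes and stays between roughly 3.4 and 4.4 up to 500 nodes.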


3. Conclusion
The MST begins from a single node, so there is no obvious speedup of CUDA over
sequential C for the small sample. However, CUDA can update the distances of different
nodes in parallel, and as the running times show, the speedup generally grows with the
size of the sample, although it levels off between 200 and 500 nodes. Consequently,
CUDA provides higher performance than the CPU for this parallel computation. CUDA also
shows the potential of general-purpose computing on GPUs, and GPGPU, including CUDA,
is likely to become a major approach to high-throughput computation on parallel problems.
