
Graph Coloring using Multi-threading

Ramit Sawhney (319/CO/14), Puneet Mathur (312/CO/14)


Department of Computer Engineering
Netaji Subhas Institute of Technology

1. Introduction

Graphs are used to model many kinds of relations and
processes in physical, biological, and information systems.
A graph consists of nodes and edges, which lets us model such
systems as networks and represent a wide range of practical
problems. In computer science, graphs represent communication
networks, data organization, computational devices, and data
flow. Graph problems arise frequently in practical
applications, including data analysis, data mining,
engineering, and computational science. When these
applications are large in scale, solutions must be obtained
on parallel computing platforms to ensure a response fast
enough for real-time applications. With the rise of
multi-core and multiprocessor systems, this hardware
capability can be exploited through parallelism.
High performance and scalability are hard to achieve for
graph algorithms because their runtime is dominated by
memory access time rather than processing time. Data locality
is poor because the memory access patterns of most graph
algorithms are unpredictable. Thus, while concurrency may be
abundant, it is often fine-grained, and we must ensure
synchronization between individual threads. A more effective
mechanism than relying on caches is thread-level parallelism,
which maintains multiple threads per core to tolerate memory
latency and aims to reduce the overheads of synchronization
and context switching.
In this paper, we explore one such problem that plays a
central role in computer science and models many real-world
problems. Graph coloring (distance-1), in simple terms,
deals with assigning colors to the vertices of a graph such
that no two adjacent vertices have the same color, while
minimizing the number of colors used. Applications of graph
coloring include discovering concurrency in parallel
computing, where a coloring identifies subtasks that can be
carried out simultaneously. Finding a coloring of an
arbitrary graph with the minimum number of colors is an
NP-hard problem. Our aim is therefore to explore methods,
parallelize known sequential algorithms, and consider
approximation algorithms for this problem, while exploring
suitable architectures and methods for synchronization and
interprocess communication, and analyzing shared memory and
explicit message passing as two different mechanisms.
We first explore the sequential algorithm using backtracking,
followed by a parallel algorithm for shared-memory
architectures using barrier synchronization, along with
parallel iterative and recursive dataflow algorithms. The
performance of these algorithms on architectures such as the
Intel Xeon is then studied using data and results from
ongoing research. The special case of two colors, bipartite
graphs, is studied in depth: its sequential algorithm, the
scope for parallel algorithms, and its applications.
Finally, we give an overview of the applications, future
scope, and conclusions of this study.

2. Background and Terminology

1. Chromatic Number
A proper coloring using at most k colors is called a
k-coloring. The minimum number of colors needed to
color a graph G is called its chromatic number and
is denoted by χ(G). The chromatic polynomial counts
the number of ways a graph can be colored using no
more than a given number of colors.
2. Problem Definition
A graph G is a pair (V, E) of a set of vertices V and a
set of edges E. The edges are unordered pairs of the
form {i, j} where i, j ∈ V. Two vertices i and j are said
to be adjacent if and only if {i, j} ∈ E, and
non-adjacent otherwise.

The degree of a vertex v is the number of vertices
adjacent to v and is denoted by deg(v).
The maximum and minimum degrees in a graph G are
denoted by Δ(G) and δ(G) respectively. An independent
set in a graph is a set of vertices that are mutually
non-adjacent, that is, there is no edge between any
pair of vertices in the set.

The Graph Coloring Problem:

A vertex coloring of a simple graph, or simply a
coloring for short, is an assignment of colors to the
vertices such that no two adjacent vertices are
assigned the same color. Alternatively, a coloring is
a partition of the vertex set into a collection of
vertex-disjoint independent sets.
Each independent set in such a partition is called a
color class. The graph-coloring problem is then to
find a vertex coloring for a simple graph using the
minimum number of colors possible.
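
The definition above can be checked mechanically. The following is a minimal sketch, not part of the original study: the helper names isProperColoring and colorsUsed and the edge-list representation are assumptions made for illustration.

```cpp
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: the graph is an edge list over vertices 0..n-1,
// and color[v] holds the color assigned to vertex v.
bool isProperColoring(const std::vector<std::pair<int, int>>& edges,
                      const std::vector<int>& color) {
    for (const auto& e : edges)
        if (color[e.first] == color[e.second]) return false;  // adjacent, same color
    return true;
}

// Number of distinct colors used, i.e. the number of color classes.
int colorsUsed(const std::vector<int>& color) {
    return static_cast<int>(std::set<int>(color.begin(), color.end()).size());
}
```

For a triangle, any proper coloring needs three colors, so an assignment such as {0, 1, 0} is rejected by isProperColoring.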

3. Algorithms

The problem is known to be NP-hard to solve optimally.
It has also been shown that, for every x > 0, the
problem remains NP-hard to approximate to within a factor
of n^(1-x), where n is the number of vertices in the graph.

1. Polynomial Time
For k = 2, the problem is the same as
determining whether the graph is bipartite,
which takes time linear in the size of the
graph, O(|V| + |E|), using breadth-first search
or depth-first search. If the graph is planar
and has low branch-width, then it can be solved
in polynomial time using dynamic programming.

2. The general solution


Brute-force search for a k-coloring considers
each of the k^n assignments of k colors to n
vertices and checks whether the assignment is a
proper coloring. This procedure is impractical
for all but very small problems, as the running
time grows exponentially with the size of the
input.
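
As an illustration of why this is impractical, the following minimal sketch enumerates the assignments one by one by treating the color vector as a base-k counter. The function name bruteForceKColoring and the adjacency-list representation are assumptions for this example, not the authors' code.

```cpp
#include <vector>

// adj[v] lists the neighbors of v. Returns true and fills `color` if some
// assignment of colors 0..k-1 is proper; tries up to k^n assignments.
bool bruteForceKColoring(const std::vector<std::vector<int>>& adj, int k,
                         std::vector<int>& color) {
    const int n = static_cast<int>(adj.size());
    if (k <= 0) return n == 0;
    color.assign(n, 0);
    while (true) {
        // Check whether the current assignment is a proper coloring.
        bool proper = true;
        for (int v = 0; v < n && proper; ++v)
            for (int w : adj[v])
                if (color[v] == color[w]) { proper = false; break; }
        if (proper) return true;
        // Advance to the next assignment, treating `color` as a base-k counter.
        int i = 0;
        while (i < n && ++color[i] == k) color[i++] = 0;
        if (i == n) return false;  // counter wrapped around: all k^n tried
    }
}
```

Even for n = 30 and k = 3 this loop may visit on the order of 3^30 assignments, which is why the heuristics discussed next matter in practice.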

Research shows that, using dynamic
programming and a bound on the number of
maximal independent sets, k-colorability can be
decided with time and space complexity in
O(2.4423^n). Faster algorithms are known for
3- and 4-colorability, but these still take a
large amount of time because of their
exponential nature.

3. Greedy Coloring
Greedy coloring considers the vertices in a
sequence <v_1, v_2, ..., v_n> and assigns to v_i
the smallest available color not used by v_i's
neighbors among v_1, ..., v_{i-1}, using a new
color when needed.

Greedy colorings may or may not result in an
optimal solution. The maximum number of colors
that this greedy algorithm can use, over a
vertex ordering chosen to maximize this number,
is called the Grundy number of the graph.

Sequential Greedy Algorithm

1. algorithm GreedyColoringSequential(G(V, E))
2.   Initialize data structures  // adjacency list/matrix
3.   for each v in V do
4.     for each w in adj(v) do
5.       forbiddenColors[color[w]] <- v
6.     c <- min {i > 0 : forbiddenColors[i] not equal to v}
7.     color[v] <- c  // smallest permissible color

The search for a color in Line 6 terminates after at
most d(v) + 1 attempts, where d(v) = |adj(v)| is the
degree of vertex v. Therefore, the work done while
coloring a vertex v is proportional to its degree and
independent of the size of the array forbiddenColors.
Thus the time complexity of Greedy is O(|V| + |E|),
or simply O(|E|) if the graph is connected. With good
vertex ordering techniques, the number of colors
used by Greedy is in fact often near-optimal for
practical graph structures, and it is always bounded
by B + 1, where B is the maximum back degree, i.e.
the maximum number of already colored neighbors of a
vertex.
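
A minimal sketch of the forbiddenColors technique follows, with the vertex order passed in explicitly so that the effect of ordering can be observed. It is an illustration under stated assumptions, not the authors' implementation: the names greedyColoring and numColors are hypothetical, and colors start at 0 here rather than 1 as in the pseudocode.

```cpp
#include <algorithm>
#include <vector>

// Colors the vertices in the given order; adj[v] lists the neighbors of v.
std::vector<int> greedyColoring(const std::vector<std::vector<int>>& adj,
                                const std::vector<int>& order) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> color(n, -1);
    // forbidden[c] == v means color c is used by a neighbor of the current
    // vertex v. Stamping with v avoids resetting the array between vertices.
    std::vector<int> forbidden(n + 1, -1);
    for (int v : order) {
        for (int w : adj[v])
            if (color[w] != -1) forbidden[color[w]] = v;
        int c = 0;
        while (forbidden[c] == v) ++c;  // at most deg(v) + 1 probes
        color[v] = c;
    }
    return color;
}

// Number of colors used, assuming colors are 0, 1, ..., max.
int numColors(const std::vector<int>& color) {
    return color.empty() ? 0 : *std::max_element(color.begin(), color.end()) + 1;
}
```

On the path 0-1-2-3, the order 0, 1, 2, 3 uses two colors, while the order 0, 3, 1, 2 forces a third color on vertex 2, which shows how the ordering affects the result.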

4. Parallelizing the algorithm


Because the problem is expensive to solve,
parallelizing the algorithm can yield a
significant speedup. Although the algorithm is
difficult to parallelize because of its
sequential nature, we use a shared-memory
programming model to achieve this.
A correct parallelization must avoid data
hazards (WAR, RAW, WAW), which is done with
synchronization techniques such as the locks
provided by the various thread APIs. Sections of
the code may still be sequential, so the speedup
can be estimated using Amdahl's law. We therefore
also try to minimize the sequential bottleneck,
as it places an upper bound on the speedup
obtainable from parallelization. The speedup is
also limited by parallelization overheads such
as interprocess communication and context
switching. We discuss the impact of each of
these overheads at a later point, after
presenting the parallel algorithms.

5. The parallel algorithm

Input: p, the number of threads
Uniform random partitioning of V into V1, V2, ..., Vp
m <- maximum degree of the graph  // vertices are inherently ordered by their vertex ids
procedure ParallelGraphColoring(G = (V, E))
  for all threads Ti, i in {1, ..., p} do
    Identify boundary vertices in Vi
    Initialise TotalColors[m + 1] = {0, 1, ..., m}
    for each v in Vi such that v is an internal vertex in Vi do
      color(v) <- min {TotalColors \ color(adjacent(v))}
    end for
    for each v in Vi such that v is a boundary vertex in Vi do
      Ai <- adj(v)
      Ai <- Ai ∪ {v}
      Lock all vertices in Ai in increasing order of vertex ids
      color(v) <- min {TotalColors \ color(adjacent(v))}
      Unlock all vertices in Ai
    end for
  end for
end procedure

For the parallel algorithm, the graph is preprocessed: its
vertices are partitioned into p blocks V1, V2, ..., Vp, where p
is the number of threads.
The vertices in each block can be categorized as:
1. Internal vertices: These vertices have all their
neighboring vertices in the same block.

2. Boundary vertices: These vertices have at least one
neighbor belonging to another block.

Each thread is responsible for the proper coloring of the
vertices in its partition.
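
The classification into internal and boundary vertices can be sketched as follows. This is an illustrative helper under assumed names: findBoundaryVertices and the array part, where part[v] is the block that owns vertex v, are not taken from the original study.

```cpp
#include <cstddef>
#include <vector>

// adj[v] lists the neighbors of v; part[v] is the block (thread) owning v.
// Returns isBoundary[v] == true when v has a neighbor in another block.
std::vector<bool> findBoundaryVertices(const std::vector<std::vector<int>>& adj,
                                       const std::vector<int>& part) {
    std::vector<bool> isBoundary(adj.size(), false);
    for (std::size_t v = 0; v < adj.size(); ++v)
        for (int w : adj[v])
            if (part[w] != part[v]) { isBoundary[v] = true; break; }
    return isBoundary;
}
```

On the path 0-1-2-3 split into blocks {0, 1} and {2, 3}, only vertices 1 and 2 are boundary vertices, so only they need locking.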

We use locks to ensure synchronization between the threads.
Synchronization is required wherever threads share vertices;
apart from that, the threads can run independently. With the
use of locks, coloring a vertex becomes a critical section,
and a thread can enter the critical section only when it has
acquired the lock.

Thus we have effectively reduced the problem to a critical
section problem that requires synchronization between its
threads. Here, each vertex has a lock. A thread wishing to
color a boundary vertex must first obtain the locks of that
vertex and of its neighboring vertices.

Deadlock:
As each thread must acquire locks in order to color vertices, a
deadlock may arise. Each vertex can be regarded as one resource,
so the resources here are of the single-instance type. With only
single instances of each resource, we could use a simple
resource-allocation graph algorithm that detects a cycle in the
graph of all resources; the time taken for this is O(|V|^2).
However, we simply prevent deadlock rather than detect and
recover from it. Deadlock prevention is done by eliminating one
of the four necessary conditions for deadlock. A global ordering
of vertices is maintained, and threads acquire locks in that
order. This ensures that the circular wait condition can never
hold.

Elimination of Circular wait for Deadlock prevention

Let F be a function that assigns a unique number to each
resource Ri. Each thread must request resources only in
increasing order of this enumeration: after acquiring Ri, a
thread may request Rj only if F(Rj) > F(Ri). In the
algorithm above, F is simply the vertex id. A proof by
contradiction shows that this ordering prevents deadlock: a
circular wait would require F(R1) < F(R2) < ... < F(R1),
which is impossible.
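
The parallel procedure with ordered lock acquisition can be sketched with standard C++ threads as follows. This is a minimal illustration under stated assumptions, not the implementation measured in the study: the function name parallelColoring is hypothetical, the partition part[v] is passed in rather than drawn at random, and one std::mutex per vertex is used.

```cpp
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// adj[v]: neighbors of v; part[v]: block owning v (0..p-1); p: thread count.
std::vector<int> parallelColoring(const std::vector<std::vector<int>>& adj,
                                  const std::vector<int>& part, int p) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> color(n, -1);
    std::vector<std::mutex> locks(n);  // one lock per vertex

    // Smallest color not used by any already colored neighbor of v.
    auto smallestFree = [&](int v) {
        std::vector<bool> used(adj[v].size() + 1, false);
        for (int w : adj[v]) {
            int c = color[w];
            if (c >= 0 && c < static_cast<int>(used.size())) used[c] = true;
        }
        int c = 0;
        while (used[c]) ++c;
        return c;
    };

    auto work = [&](int t) {
        std::vector<int> internal, boundary;
        for (int v = 0; v < n; ++v) {
            if (part[v] != t) continue;
            bool isBoundary = false;
            for (int w : adj[v])
                if (part[w] != t) { isBoundary = true; break; }
            (isBoundary ? boundary : internal).push_back(v);
        }
        // Internal vertices touch only this thread's data: no locking needed.
        for (int v : internal) color[v] = smallestFree(v);
        for (int v : boundary) {
            std::vector<int> need(adj[v]);
            need.push_back(v);
            std::sort(need.begin(), need.end());  // global order: vertex id
            need.erase(std::unique(need.begin(), need.end()), need.end());
            for (int u : need) locks[u].lock();
            color[v] = smallestFree(v);
            for (int u : need) locks[u].unlock();
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < p; ++t) threads.emplace_back(work, t);
    for (auto& th : threads) th.join();
    return color;
}
```

Because every thread sorts the vertex ids before locking, no thread can hold a higher-numbered lock while waiting for a lower-numbered one, which is exactly the circular-wait elimination described above.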

5. Practical results by parallelism and scalability

1. Hardware and experimental setup


The results from a study on a 24-core Intel Xeon
(X5675) machine running at a 3.07 GHz core frequency
and supporting 6 hardware threads per core are used.
The time taken for graph partitioning is included in
the measurements; the time taken for input is
neglected.

2. Performance comparison graphs


Dataset graphs from the Live Journal and Orkut
communities in SNAP are used to demonstrate the
speedup on real-world graphs.
As the performance of algorithms depends on a
multitude of parameters, we consider only two
significant parameters for the evaluation:
1. Time taken
2. Number of colors used
The number of threads is varied from 1 to 1000 to see how the
parallel algorithm fares against the sequential algorithm.

The speedup of a parallel algorithm is calculated as the ratio
of the time taken by the sequential algorithm to that taken by
the parallel algorithm. The speedup thus obtained is 2.42 and
3.79 for the Live Journal and Orkut datasets respectively.

The decrease in performance at high thread counts is explained
by the overheads due to synchronization. One possible reason is
that the architecture supports only 24 × 6 = 144 (approximately
150) hardware threads. Another reason is the long waiting chains
formed due to locking.

6. Synchronization, Overheads and Threading

Multithreading
Multithreading splits a program into several streams of
execution, known as threads, that can communicate and interfere
with each other. Multithreaded execution interleaves the threads
in an unpredictable order, which may require explicit
synchronization so that the interleaving does not violate the
intent of the program.
The speedup of a program depends on how much of its code can be
parallelized.
POSIX threads (Pthreads) are one of the most common ways to
implement such parallelizable code.

Synchronization
In the graph-coloring problem, each vertex is an object
shared by all the threads, so we require coordination between
the threads to ensure that the requirements of the critical
section problem are not violated.
Synchronization can be achieved in many ways, such as
semaphores, locks, hardware primitives, and monitors.
The algorithm above uses locks to achieve synchronization.
Mutexes and spinlocks are the commonly used kinds.
On single-core systems we prefer mutexes, which put waiting
threads to sleep and wake them up later, over busy waiting,
which wastes CPU time. On multi-core systems, spinlocks may
prove more efficient, as busy waiting on one core still allows
execution on the other cores and in some cases costs less than
sleeping and waking threads up.

Overheads
The theoretical speedup for completely parallelizable code
is very optimistic and does not match the speedup seen in
practice. The major reason for this is the overhead required to
achieve parallel and concurrent execution. With barriers, tasks
synchronize at the end of each timestep, and the slowest task
determines the overall speed. A more efficient approach, used
above, is to share a global array between the threads instead.
Minimizing synchronization overheads, such as the time spent
busy waiting or sleeping and waking threads, is important and
makes the program more scalable. Synchronization overheads tend
to grow with the number of threads; a program scales well only
if these overheads grow more slowly than the number of threads.

7. Bipartite Graphs: An important special case


A bipartite graph is a graph whose vertices can be divided
into two disjoint sets U and V such that every edge connects a
vertex in U to one in V. The two sets U and V may be thought of
as a coloring of the graph with two colors.
Bipartite graphs are an important field of study because they
occur commonly, for example as:
1. Trees
2. Cycle graphs with an even number of vertices
3. Hypercube graphs, partial cubes, and median graphs
Testing a graph for bipartiteness is a special case of the
distance-1 graph coloring problem with k = 2: a graph is
bipartite exactly when its chromatic number is at most 2. While
the above parallel algorithm has a good speedup, testing
bipartiteness is already solvable in linear time using DFS.

Applications of bipartite matching and bipartite graphs


1. Maximum matching in problems such as stable marriage
2. Modern coding theory
3. Levi graphs in geometry
4. Petri nets, used to study concurrent systems

A C++ implementation for linear-time testing of bipartiteness
is also demonstrated.
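
Independently of the authors' implementation, a linear-time bipartiteness test can be sketched as a 2-coloring by breadth-first search. The function name isBipartite and the adjacency-list input are assumptions made for this example; BFS is used here, and DFS would work equally well.

```cpp
#include <queue>
#include <vector>

// adj[v] lists the neighbors of v. Returns true and fills `side` with 0/1
// labels (the two color classes) if the graph is bipartite.
bool isBipartite(const std::vector<std::vector<int>>& adj, std::vector<int>& side) {
    const int n = static_cast<int>(adj.size());
    side.assign(n, -1);
    for (int s = 0; s < n; ++s) {  // restart in every connected component
        if (side[s] != -1) continue;
        side[s] = 0;
        std::queue<int> q;
        q.push(s);
        while (!q.empty()) {
            int v = q.front();
            q.pop();
            for (int w : adj[v]) {
                if (side[w] == -1) {
                    side[w] = 1 - side[v];  // opposite side from v
                    q.push(w);
                } else if (side[w] == side[v]) {
                    return false;  // an edge inside one side: odd cycle
                }
            }
        }
    }
    return true;
}
```

Each vertex is enqueued once and each edge is examined twice, which gives the O(|V| + |E|) running time.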

8. Applications
The graph-coloring problem has many applications, which
motivates speeding up the generalized distance-1 coloring.
Some of the prominent applications include:
1. Map coloring: Geographical map coloring, with an
emphasis on the Four Color Theorem, which states that four
colors suffice to color any planar map.
2. Bipartiteness testing.
3. Register allocation: In compiler optimization
(reordering of code, prevention of hazards, static scheduling),
the register allocation process of assigning a large number of
target program variables to a small number of CPU registers
is also a graph coloring problem.
4. Scheduling: Graph coloring is used to model scheduling
problems.
9. Conclusions and Future Scope
Massively multithreaded algorithms provide a speedup over
the sequential algorithms and are thus more useful for the
NP-hard graph-coloring problem, which has numerous
applications. However, as the degree of multithreading of the
above algorithm increases, the synchronization overheads
(interprocess communication, context switching, and locking,
since the problem resembles a critical section problem) cause
the gain from parallelism to be outweighed by the cost of
keeping the parallel execution hazard-free.
Some general conclusions include:
1. Simultaneous multithreading provides an effective way to
tolerate latency.
2. Synchronization overheads can become heavy as the number of
threads exceeds ~150 and can nullify the effect of parallelism,
leading to a speedup factor of less than unity.
3. On multi-core systems such as the one above, where many
locks can be held by different cores at once, the time overhead
of busy waiting on spinlocks is less than that of putting
threads to sleep and waking them up, in contrast to single-core
systems, where mutexes are preferred.
This line of research has considerable future scope. Some of
the most prominent directions in which it can be extended are:
1. Comparing performance metrics against an interprocess
communication model using interconnection networks such as
meshes, tori, and the Illiac mesh, to study how the speedup
varies with network parameters compared to the shared-memory
model.
2. Although the distance-1 problem is the most common, the above
algorithms and speedup calculations can be extended to
distance-n problems.
3. More approximation techniques, parallelization, and reduction
of sequential bottlenecks can be studied and added.
4. More methods for scheduling the threads and for defining a
vertex partition ordering that improves the heuristics can be
included.
5. Finer-grained locking can be used to study the issue of long
waiting chains, which cause performance to deteriorate when the
number of threads exceeds 150.

10. Bibliography and References


[1] Erik G. Boman, Doruk Bozdag, Umit Catalyurek, Assefaw H. Gebremedhin, and Fredrik Manne. A
scalable parallel graph coloring algorithm for distributed memory computers. In Euro-Par 2005 Parallel
Processing, pages 241-251. Springer, 2005.

[2] Umit V. Catalyurek, John Feo, Assefaw Hadish Gebremedhin, Mahantesh Halappanavar, and
Alex Pothen. Graph coloring algorithms for multi-core and massively multithreaded architectures.
Parallel Computing, 38(10-11):576-594, 2012.

[3] Assefaw H. Gebremedhin, Fredrik Manne, and Tom Woods. Speeding up parallel graph coloring.
In Applied Parallel Computing. State of the Art in Scientific Computing, pages 1079-1088. Springer,
2006.

[4] Assefaw Hadish Gebremedhin and Fredrik Manne. Scalable parallel graph coloring algorithms.
Concurrency - Practice and Experience, 12(12):1131-1146, 2000.

[5] Mark T. Jones and Paul E. Plassmann. A parallel graph coloring heuristic. SIAM J. Scientific
Computing, 14(3):654-669, 1993.

[6] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection.
http://snap.stanford.edu/data, June 2014.

[7] Md Mostofa Ali Patwary, Assefaw H. Gebremedhin, and Alex Pothen. New multithreaded ordering
and coloring algorithms for multicore architectures. In Euro-Par 2011 Parallel Processing, pages
250-262. Springer, 2011.

[8] wikipedia.org

[9] stackoverflow.com

[10] geeksforgeeks.org

[11] T. F. Coleman and J. J. More. Estimation of sparse Hessian matrices and graph coloring
problems. Mathematical Programming, 28:243-270, 1984.

C++ Code for Sequential Greedy Graph Coloring
#include <iostream>
#include <vector>

using namespace std;

class Graph
{
    int V;                    // number of vertices
    vector<vector<int>> adj;  // adjacency lists

public:
    Graph(int V) : V(V), adj(V) {}
    void addEdge(int v, int w);
    void greedyColoring();
};

// Adds an undirected edge between v and w.
void Graph::addEdge(int v, int w)
{
    adj[v].push_back(w);
    adj[w].push_back(v);
}

// Assigns to each vertex the smallest color not used by its colored neighbors.
void Graph::greedyColoring()
{
    vector<int> result(V, -1);           // -1 means no color is assigned yet
    vector<bool> unavailable(V, false);  // colors taken by neighbors of u

    for (int u = 0; u < V; u++)
    {
        // Mark the colors of already colored neighbors as unavailable.
        for (int w : adj[u])
            if (result[w] != -1)
                unavailable[result[w]] = true;

        // Find the first available color.
        int cr = 0;
        while (cr < V && unavailable[cr])
            cr++;
        result[u] = cr;  // assign the found color

        // Reset the marks for the next vertex.
        for (int w : adj[u])
            if (result[w] != -1)
                unavailable[result[w]] = false;
    }

    for (int u = 0; u < V; u++)
        cout << "Vertex " << u << " ---> Color " << result[u] << endl;
}

int main()
{
    Graph g1(5);
    // initialise the graph as required, add code here as g1.addEdge(v, w)
    g1.greedyColoring();
    return 0;
}
