Graph Partitioning
and Graph Clustering
10th DIMACS Implementation Challenge Workshop
February 13–14, 2012
Georgia Institute of Technology
Atlanta, GA
David A. Bader
Henning Meyerhenke
Peter Sanders
Dorothea Wagner
Editors
EDITORIAL COMMITTEE
Dennis DeTurck, Managing Editor
Michael Loss, Kailash Misra, Martin J. Strauss
Copying and reprinting. Material in this book may be reproduced by any means for edu-
cational and scientific purposes without fee or permission with the exception of reproduction by
services that collect fees for delivery of documents and provided that the customary acknowledg-
ment of the source is given. This consent does not extend to other kinds of copying for general
distribution, for advertising or promotional purposes, or for resale. Requests for permission for
commercial use of material should be addressed to the Acquisitions Department, American Math-
ematical Society, 201 Charles Street, Providence, Rhode Island 02904-2294, USA. Requests can
also be made by e-mail to [email protected].
Excluded from these provisions is material in articles for which the author holds copyright. In
such cases, requests for permission to use or reprint should be addressed directly to the author(s).
(Copyright ownership is indicated in the notice in the lower right-hand corner of the first page of
each article.)
© 2013 by the American Mathematical Society. All rights reserved.
The American Mathematical Society retains all rights
except those granted to the United States Government.
Copyright of individual articles may revert to the public domain 28 years
after publication. Contact the AMS for copyright status of individual articles.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
Contents
Preface
David A. Bader, Henning Meyerhenke, Peter Sanders,
and Dorothea Wagner vii
High Quality Graph Partitioning
Peter Sanders and Christian Schulz 1
Abusing a Hypergraph Partitioner for Unweighted Graph Partitioning
B. O. Fagginger Auer and R. H. Bisseling 19
Parallel Partitioning with Zoltan: Is Hypergraph Partitioning Worth It?
Sivasankaran Rajamanickam and Erik G. Boman 37
UMPa: A Multi-objective, multi-level partitioner for communication
minimization
Ümit V. Çatalyürek, Mehmet Deveci, Kamer Kaya,
and Bora Uçar 53
Shape Optimizing Load Balancing for MPI-Parallel Adaptive Numerical
Simulations
Henning Meyerhenke 67
Graph Partitioning for Scalable Distributed Graph Computations
Aydin Buluç and Kamesh Madduri 83
Using Graph Partitioning for Efficient Network Modularity Optimization
Hristo Djidjev and Melih Onus 103
Modularity Maximization in Networks by Variable Neighborhood Search
Daniel Aloise, Gilles Caporossi, Pierre Hansen, Leo Liberti,
Sylvain Perron, and Manuel Ruiz 113
Network Clustering via Clique Relaxations: A Community Based Approach
Anurag Verma and Sergiy Butenko 129
Identifying Base Clusters and Their Application to Maximizing Modularity
Sriram Srinivasan, Tanmoy Chakraborty,
and Sanjukta Bhowmick 141
Complete Hierarchical Cut-Clustering: A Case Study on Expansion and
Modularity
Michael Hamann, Tanja Hartmann, and Dorothea Wagner 157
Preface
1 http://dimacs.rutgers.edu/Challenges/
2. Key Results
The main results of the 10th DIMACS Implementation Challenge include:
• Extension of a file format used by several graph partitioning and graph
clustering libraries for graphs, their geometry, and partitions. Formats
are described on the challenge website.5
• Collection and online archival5 of a common testbed of input instances
and generators (including their description) from different categories for
evaluating graph partitioning and graph clustering algorithms. For the
actual challenge, a core subset of the testbed was chosen.
• Definition of a new combination of measures to assess the quality of a
clustering.
• Definition of a measure to assess the work an implementation performs in a
parallel setting. This measure is used to normalize sequential and parallel
implementations to a common baseline.
• Experimental evaluation of state-of-the-art implementations of graph par-
titioning and graph clustering codes on the core input families.
• A nondiscriminatory way to assign scores to solvers that takes both run-
ning time and solution quality into account.
• Discussion of directions for further research in the areas of graph parti-
tioning and graph clustering.
• The paper Benchmarks for Network Analysis, which was invited as a con-
tribution to the Encyclopedia of Social Network Analysis and Mining.
The primary location of information regarding the 10th DIMACS Implementation
Challenge is the website http://www.cc.gatech.edu/dimacs10/.
2 Santo Fortunato, Community detection in graphs, Physics Reports 486 (2010), no. 3–5, 75–174.
3 Satu E. Schaeffer, Graph clustering, Computer Science Review 1 (2007), no. 1, 27–64.
4 K. Schloegel, G. Karypis, and V. Kumar, Graph partitioning for high-performance scientific simulations, Sourcebook of Parallel Computing (Jack Dongarra, Ian Foster, Geoffrey Fox, William Gropp, Ken Kennedy, Linda Torczon, and Andy White, eds.), Morgan Kaufmann Publishers, 2003, pp. 491–541.
5 http://www.cc.gatech.edu/dimacs10/downloads.shtml
3. Challenge Description
3.1. Data Sets. The collection of benchmark inputs of the 10th DIMACS
Implementation Challenge includes both synthetic and real-world data. All graphs
are undirected. Formerly directed instances were symmetrized by making every
directed edge undirected. While this procedure necessarily loses information in a
number of real-world applications, it appeared to be necessary since most existing
software libraries can handle undirected graphs only. Directed graphs (or unsym-
metric matrices) are left for further work.
Synthetic graphs in the collection include random graphs (Erdős-Rényi, R-
MAT, random geometric graphs using the unit disk model), Delaunay triangula-
tions, and graphs that mimic meshes from dynamic numerical simulations. Real-
world inputs consist of co-author and citation networks, road networks, numerical
simulation meshes, web graphs, social networks, computational task graphs, and
graphs from adapting voting districts (redistricting).
For the actual challenge two subsets were chosen, one for graph partitioning
and one for graph clustering. The first one (for graph partitioning) contained 18
graphs, which had to be partitioned into 5 different numbers of parts each, yielding
90 problem instances. The second one (for graph clustering) contained 30 graphs.
Due to the choice of objective functions for graph clustering, no restriction on the
number of parts or their size was necessary in this category.
3.2. Categories. One of the main goals of the challenge was to compare dif-
ferent techniques and algorithmic approaches. Therefore participants were invited
to join different challenge competitions aimed at assessing the performance and
solution quality of different implementations. Let G = (V, E, ω) be an undirected
graph with edge weight function ω.
3.2.1. Graph Partitioning. Here the task was to compute a partition Π of the
vertex set V into k parts, each of size at most (1 + ε)⌈|V|/k⌉. The two objective
functions used to assess the partitioning quality were the edge cut (EC, the total
number of edges with endpoints in different parts) and the maximum communication
volume (CV). For each part p, CV sums over all vertices v in p the number of parts
adjacent to v but different from p; the final result is the maximum of this sum
over all parts.
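Written out, with Γ(v) denoting the set of neighbors of v and Vp the vertex set of part p, the two measures read as follows; this is our transcription of the definitions above, not a quotation from the challenge rules:

    EC(Π) = |{{u, v} ∈ E : Π(u) ≠ Π(v)}|,    CV(Π) = max_p Σ_{v∈Vp} |Π(Γ(v)) \ {p}|.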
For each instance result (EC and CV results were counted as one instance each),
the solvers with the first six ranks received a descending number of points (10, 6,
4, 3, 2, 1), a scoring system borrowed from former Formula 1 rules.
Three groups submitted solutions to the graph partitioning competition. Only
one of the submitted solvers is a graph partitioner by nature, the other two are
actually hypergraph partitioners. Both hypergraph partitioners use multilevel re-
cursive bisection. While their quality, in particular for the communication volume,
is generally not bad, the vast majority of best ranked solutions (139 out of 170) are
held by the graph partitioner KaPa.
3.2.2. Graph Clustering. The clustering challenge was divided into two separate
competitions with different optimization criteria. For the first competition the
objective modularity had to be optimized. Modularity has been a very popular
measure in recent years, in particular in the field of community detection. It
follows the intra-cluster-density vs. inter-cluster-sparsity paradigm.
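For reference, the standard modularity of Newman and Girvan, which the surviving text of this preface does not restate: for a clustering C = {C1, . . . , Cl} of G, with E(Ci) the set of edges inside cluster Ci,

    Q(C) = Σ_{i=1..l} [ |E(Ci)| / |E| − ( Σ_{v∈Ci} deg(v) / (2|E|) )² ],

so that clusterings with dense clusters and few inter-cluster edges score close to 1.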
3.3. URL to Resources. The main website of the 10th DIMACS Implemen-
tation Challenge can be found at its permanent location http://www.cc.gatech.edu/dimacs10/. The following subdirectories contain:
• archive/data/: Testbed instances archived for long-term access.
• talks/: Slides of the talks presented at the workshop.
• papers/: Papers on which the workshop talks are based.
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11700

High Quality Graph Partitioning
Peter Sanders and Christian Schulz
1. Introduction
Problems of graph partitioning arise in various areas of computer science, engineering,
and related fields, for example in route planning, community detection in
social networks, and high performance computing. In many of these applications
large graphs need to be partitioned such that there are few edges between blocks
(the elements of the partition). For example, when you process a graph in parallel
on k processors you often want to partition the graph into k blocks of about equal
size so that there is as little interaction as possible between the blocks. In this
paper we focus on a version of the problem that constrains the maximum block size
to (1 + ε) times the average block size and tries to minimize the total cut size, i.e.,
the number of edges that run between blocks. It is well known that this problem
is NP-complete [5] and that there is no approximation algorithm with a constant
ratio factor for general graphs [5]. Therefore mostly heuristic algorithms are used in
practice. A successful heuristic for partitioning large graphs is the multilevel graph
partitioning (MGP) approach depicted in Figure 1 where the graph is recursively
contracted to achieve smaller graphs which should reflect the same structure as the
input graph. After applying an initial partitioning algorithm to the smallest graph,
the contraction is undone and, at each level, a local refinement method is used to
improve the partitioning induced by the coarser level.
Although several successful multilevel partitioners have been developed in the
last 13 years, we had the impression that certain aspects of the method are not
well understood. We therefore have built our own graph partitioner KaPPa [13]
Figure 1. The multilevel graph partitioning (MGP) scheme: the input graph is coarsened in the contraction phase (match, contract), an initial partitioning is computed on the smallest graph, and the refinement phase (uncontract, local improvement) carries the partition back to the input graph, yielding the output partition.
2. Preliminaries
2.1. Basic concepts. Consider an undirected graph G = (V, E, c, ω) with
edge weights ω : E → R>0, node weights c : V → R≥0, n = |V|, and m = |E|.
We extend c and ω to sets, i.e., c(V′) := Σ_{v∈V′} c(v) and ω(E′) := Σ_{e∈E′} ω(e).
Γ(v) := {u : {v, u} ∈ E} denotes the neighbors of v. We are looking for blocks
of nodes V1, . . . , Vk that partition V, i.e., V1 ∪ · · · ∪ Vk = V and Vi ∩ Vj = ∅ for
i ≠ j. The balancing constraint demands that ∀i ∈ {1..k} : c(Vi) ≤ Lmax :=
(1 + ε)c(V)/k + max_{v∈V} c(v) for some parameter ε. The last term in this equation
arises because each node is atomic and therefore a deviation of the heaviest node
has to be allowed. The objective is to minimize the total cut Σ_{i<j} ω(Eij), where
Eij := {{u, v} ∈ E : u ∈ Vi, v ∈ Vj}. A clustering is also a partition of the nodes;
however, k is usually not given in advance and the balance constraint is removed. A
vertex v ∈ Vi that has a neighbor w ∈ Vj, i ≠ j, is a boundary vertex. An abstract
view of the partitioned graph is the so-called quotient graph, where vertices represent
blocks and edges are induced by connectivity between blocks. Given two clusterings
C1 and C2, the overlay clustering is the clustering where each block corresponds to a
connected component of the graph Gℰ = (V, E \ ℰ), where ℰ is the union of the cut
edges of C1 and C2, i.e., of all edges that run between blocks in C1 or C2. We will need
the overlay clustering to define a combine operation on partitions in Section 5.
By default, our initial inputs will have unit edge and node weights. However, even
those will be translated into weighted problems in the course of the algorithm.
A matching M ⊆ E is a set of edges that do not share any common nodes,
i.e., the graph (V, M ) has maximum degree one. Contracting an edge {u, v} means
to replace the nodes u and v by a new node x connected to the former neighbors
of u and v. We set c(x) = c(u) + c(v) so the weight of a node at each level is
the number of nodes it is representing in the original graph. If replacing edges of
the form {u, w},{v, w} would generate two parallel edges {x, w}, we insert a single
edge with ω({x, w}) = ω({u, w}) + ω({v, w}). Uncontracting an edge e undoes its
contraction. In order to avoid tedious notation, G will denote the current state of
the graph before and after a (un)contraction unless we explicitly want to refer to
different states of the graph. The multilevel approach to graph partitioning consists
of three main phases. In the contraction (coarsening) phase, we iteratively iden-
tify matchings M ⊆ E and contract the edges in M . Contraction should quickly
reduce the size of the input and each computed level should reflect the structure
of the input network. Contraction is stopped when the graph is small enough to
be directly partitioned using some expensive other algorithm. In the refinement
(or uncoarsening) phase, the matchings are iteratively uncontracted. After uncon-
tracting a matching, a refinement algorithm moves nodes between blocks in order
to improve the cut size or balance.
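To make the contraction step concrete, here is a minimal C++ sketch of contracting one matching, assuming an adjacency-map representation with node weights c and edge weights ω; the data structures and names are our own, not those of any particular partitioner:

#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Graph {
    std::vector<int64_t> nodeWeight;                    // c(v)
    std::vector<std::unordered_map<int, int64_t>> adj;  // adj[v][u] = omega({v,u})
};

// Contract all edges {u,v} of a matching. coarseOf[v] records the coarse
// node representing v, which is needed later to uncontract.
Graph contractMatching(const Graph& g,
                       const std::vector<std::pair<int, int>>& matching,
                       std::vector<int>& coarseOf) {
    const int n = (int)g.nodeWeight.size();
    coarseOf.assign(n, -1);
    int nc = 0;
    for (auto [u, v] : matching) coarseOf[u] = coarseOf[v] = nc++;       // merged pairs
    for (int v = 0; v < n; ++v) if (coarseOf[v] < 0) coarseOf[v] = nc++; // unmatched nodes

    Graph coarse;
    coarse.nodeWeight.assign(nc, 0);
    coarse.adj.assign(nc, {});
    for (int v = 0; v < n; ++v) {
        coarse.nodeWeight[coarseOf[v]] += g.nodeWeight[v];  // c(x) = c(u) + c(v)
        for (auto [u, w] : g.adj[v]) {
            if (coarseOf[u] != coarseOf[v])
                coarse.adj[coarseOf[v]][coarseOf[u]] += w;  // parallel edges: weights add up
        }
    }
    return coarse;
}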
3. Related Work
There has been a huge amount of research on graph partitioning, so we
refer the reader to [26] for more material on multilevel graph partitioning and to
[15] for more material on genetic approaches for graph partitioning. All general-purpose
methods that are able to obtain good partitions for large real-world graphs
are based on the multilevel principle outlined in Section 2. Well-known software
packages based on this approach include Jostle [26], Metis [14], and Scotch [20].
KaSPar [19] is a graph partitioner based on the central idea to (un)contract only
a single edge between two levels. KaPPa [13] is a "classical" matching-based MGP
algorithm designed for scalable parallel execution. MQI [16] and Improve [2] are
flow-based methods for improving graph cuts when cut quality is measured by
quotient-style metrics such as expansion or conductance. This approach is only
feasible for k = 2. Improve uses several minimum cut computations to improve
the quotient cut score of a proposed partition. Soper et al. [23] provided the first
algorithm that combined an evolutionary search algorithm with a multilevel graph
partitioner. Here crossover and mutation operators have been used to compute edge
biases, which yield hints for the underlying multilevel graph partitioner. Benlic et
al. [4] provided a multilevel memetic algorithm for balanced graph partitioning.
This approach is able to compute many entries in Walshaw’s Benchmark Archive
[23] for the case ε = 0. Very recently an algorithm called PUNCH [8] has been
introduced. This approach is not based on the multilevel principle. However, it
creates a coarse version of the graph based on the notion of natural cuts. Natural
cuts are relatively sparse cuts close to denser areas. They are discovered by finding
minimum cuts between carefully chosen regions of the graph. They introduced an
evolutionary algorithm which is similar to Soper et al. [23], i.e. using a combine
operator that computes edge biases yielding hints for the underlying graph parti-
tioner. Experiments indicate that the algorithm computes very good partitions for
road networks. For instances without a natural structure, natural cuts are not very
helpful.
[Figure: construction of the flow problem for refinement between two blocks V1 and V2 of G: a region B around the cut, with boundaries ∂1B and ∂2B to which the source s and the sink t are attached.]
5. KaFFPa Evolutionary
We now describe the techniques used in KaFFPaE. The general idea behind
evolutionary algorithms (EA) is to use mechanisms which are highly inspired by
biological evolution such as selection, mutation, recombination and survival of the
fittest. An EA starts with a population of individuals (in our case partitions of the
graph) and evolves the population into different populations over several rounds.
In each round, the EA uses a selection rule based on the fitness of the individuals
(in our case the edge cut) of the population to select good individuals and combine
them to obtain improved offspring. Note that we can use the cut as a fitness function
since our partitioner almost always generates partitions that are within the given
balance constraint. Our algorithm generates only one offspring per generation.
[Figure: the combine operation in the multilevel scheme, with coarsening on one side and uncoarsening on the other; levels are marked according to whether the graph is partitioned or not yet partitioned.]
Figure 4. At the far left, a graph G with two partitions, the dark
and the light line, is shown. Cut edges are not eligible for the
matching algorithm. Contraction is done until no matchable edge
is left. The best of the two given partitions is used as initial par-
tition.
the coarsening phase, i.e. they are not contracted during the coarsening phase. In
other words these edges are not eligible for the matching algorithm used during
the coarsening phase and therefore are not part of any matching computed. An
illustration of this can be found in Figure 4.
The stopping criterion for the multi-level partitioner is modified such that it
stops when no contractable edge is left. Note that the coarsest graph is now exactly
the same as the quotient graph Q of the overlay clustering of P and C of G (see
Figure 5). Hence vertices of the coarsest graph correspond to the connected components
of Gℰ = (V, E \ ℰ), and the weight of the edges between vertices corresponds
to the sum of the edge weights running between those connected components in
G. As soon as the coarsening phase is stopped, we apply the partition P to the
coarsest graph and use this as initial partitioning. This is possible since we did
not contract any cut edge of P. Note that due to the specialized coarsening phase
and this specialized initial partitioning we obtain a high quality initial solution on
a very coarse graph which is usually not discovered by conventional partitioning
algorithms. Since our refinement algorithms guarantee no worsening of the input
partition and use random tie breaking we can assure nondecreasing partition qual-
ity. Note that the refinement algorithms can effectively exchange good parts of
the solution on the coarse levels by moving only a few vertices. Figure 5 gives an
example.
When the offspring is generated we have to decide which solution should be
evicted from the current population. We evict the solution that is most similar
to the offspring among those individuals in the population that have a cut worse
or equal than the offspring itself. The difference of two individuals is defined as
the size of the symmetric difference between their sets of cut edges. This ensures
some diversity in the population and hence makes the evolutionary algorithm more
effective.
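A minimal sketch of this eviction rule, assuming every individual stores its cut edges as a sorted vector of edge identifiers; the representation and names are our own, not necessarily those of KaFFPaE:

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Difference of two individuals: the size of the symmetric difference of
// their sets of cut edges. Each cut set is a sorted vector of edge IDs.
std::size_t partitionDistance(const std::vector<int64_t>& cutA,
                              const std::vector<int64_t>& cutB) {
    std::vector<int64_t> symDiff;
    std::set_symmetric_difference(cutA.begin(), cutA.end(),
                                  cutB.begin(), cutB.end(),
                                  std::back_inserter(symDiff));
    return symDiff.size();
}

// Among all individuals whose cut is no better than the offspring's,
// evict the one most similar to the offspring (smallest distance).
// Returns -1 if every individual has a strictly better cut.
int selectVictim(const std::vector<std::vector<int64_t>>& populationCuts,
                 const std::vector<int64_t>& populationCutValue,
                 const std::vector<int64_t>& offspringCut,
                 int64_t offspringCutValue) {
    int victim = -1;
    std::size_t best = SIZE_MAX;
    for (std::size_t i = 0; i < populationCuts.size(); ++i) {
        if (populationCutValue[i] < offspringCutValue) continue; // strictly better: keep
        std::size_t d = partitionDistance(populationCuts[i], offspringCut);
        if (d < best) { best = d; victim = (int)i; }
    }
    return victim;
}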
5.1.1. Classical Combine using Tournament Selection. This instantiation of the
combine framework corresponds to a classical evolutionary combine operator C1 .
That means it takes two individuals P1 , P2 of the population and performs the
combine step described above. In this case P corresponds to the partition having
the smaller cut and C corresponds to the partition having the larger cut. Random
tie breaking is used if both parents have the same cut. The selection process is
based on the tournament selection rule [18], i.e. P1 is the fittest out of two random
individuals R1 , R2 from the population. The same is done to select P2 . Note that in
contrast to previous methods the generated offspring will have a cut smaller or equal
to the cut of P. Due to the fact that our multi-level algorithms are randomized,
a combine operation performed twice using the same parents can yield different
offspring.
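The tournament rule itself is short in code; a sketch under the same assumptions, with the edge cut as fitness (smaller is better):

#include <cstdint>
#include <random>
#include <vector>

// Two-way tournament selection [18]: draw two individuals uniformly at
// random and return the fitter one (here: the smaller edge cut).
int tournamentSelect(const std::vector<int64_t>& cutValue, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, cutValue.size() - 1);
    std::size_t r1 = pick(rng), r2 = pick(rng);
    return cutValue[r1] <= cutValue[r2] ? (int)r1 : (int)r2;
}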
5.1.2. Cross Combine / (Transduction). In this instantiation of the combine
framework C2, the clustering C corresponds to a partition of G. But instead of
choosing an individual from the population we create a new individual in the following
way. We choose k′ uniformly at random in [k/4, 4k] and ε′ uniformly at
random in [ε, 4ε]. We then use KaFFPa to create a k′-partition of G fulfilling the
balance constraint max c(Vi) ≤ (1 + ε′)c(V)/k′. In general larger imbalances reduce
the cut of a partition, which then yields good clusterings for our crossover. To the
best of our knowledge there has been no genetic algorithm that performs combine
operations combining individuals from different search spaces.
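A sketch of the random parameter choice for this operator; we read the garbled interval in the text as [ε, 4ε], consistent with the ε glyphs dropped elsewhere in this extraction:

#include <random>
#include <utility>

// Draw the cross-combine parameters: k' uniformly from [k/4, 4k] and
// eps' uniformly from [eps, 4*eps]. Assumes k >= 4.
std::pair<int, double> crossCombineParams(int k, double eps, std::mt19937& rng) {
    std::uniform_int_distribution<int> kDist(k / 4, 4 * k);
    std::uniform_real_distribution<double> epsDist(eps, 4.0 * eps);
    return {kDist(rng), epsDist(rng)};
}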
5.1.3. Natural Cuts. Delling et al. [8] introduced the notion of natural cuts as
a preprocessing technique for the partitioning of road networks. The preprocessing
technique is able to find relatively sparse cuts close to denser areas. We use the
computation of natural cuts to provide another combine operator, i.e. combining
a k-partition with a clustering generated by the computation of natural cuts. We
closely follow their description: The computation of natural cuts works in rounds.
Each round picks a center vertex v and grows a breadth-first search (BFS) tree.
The BFS is stopped as soon as the weight of the tree, i.e. the sum of the vertex
weights of the tree, reaches αU , for some parameters α and U . The set of the
M2 works quite similarly, with the small difference that the input partition is not
used as the initial partition of the coarsest graph. That means we obtain very good
coarse graphs, but we cannot assure that the final individual has a higher quality
than the input individual. In both cases the resulting offspring is inserted into the
population using the eviction strategy described in Section 5.1.
6. Experiments
Implementation. We have implemented the algorithm described above using
C++. Overall, our program (including KaFFPa and KaFFPaE) consists of about
22 500 lines of code. We use three configurations of KaFFPa: KaFFPaStrong,
Algorithm 3 All PEs perform the same operations using different random seeds.
procedure locallyEvolve
    estimate population size S
    while time left
        if elapsed time < t_total/f then
            create individual and insert into local population
        else
            flip coin c with corresponding probabilities
            if c shows head then
                perform a mutation operation
            else
                perform a combine operation
            insert offspring into population if possible
        communicate according to communication protocol
graph            n       m
Random Geometric Graphs
rgg16            2^16    ≈342 K
rgg17            2^17    ≈729 K
Delaunay
delaunay16       2^16    ≈197 K
delaunay17       2^17    ≈393 K
Kronecker G500
kron simple 16   2^16    ≈2 M
kron simple 17   2^17    ≈5 M
Numerical
adaptive         ≈6 M    ≈14 M
channel          ≈5 M    ≈42 M
venturi          ≈4 M    ≈8 M
packing          ≈2 M    ≈17 M
2D Frames
hugetrace-00000  ≈5 M    ≈7 M
hugetric-00000   ≈6 M    ≈9 M
Sparse Matrices
af shell9        ≈500 K  ≈9 M
thermal2         ≈1 M    ≈4 M
Coauthor Networks
coAutCiteseer    ≈227 K  ≈814 K
coAutDBLP        ≈299 K  ≈978 K
Social Networks
cnr              ≈326 K  ≈3 M
caidaRouterLvl   ≈192 K  ≈609 K
Road Networks
luxembourg       ≈144 K  ≈120 K
belgium          ≈1 M    ≈2 M
netherlands      ≈2 M    ≈2 M
italy            ≈7 M    ≈7 M
great-britain    ≈8 M    ≈8 M
germany          ≈12 M   ≈12 M
asia             ≈12 M   ≈13 M
europe           ≈51 M   ≈54 M
we want to obtain minimal cut values for k ∈ {2, 4, 8, 16, 32, 64} and balance
parameters ε ∈ {0, 0.01, 0.03, 0.05}. We focus on ε ∈ {1%, 3%, 5%} since KaFFPaE
(more precisely KaFFPa) is not made for the case ε = 0. We run KaFFPaE with
a time limit of two hours using 16 PEs (two nodes of the cluster) per graph, k,
and ε. On the eight largest graphs of the archive we gave KaFFPaE eight hours per
graph, k, and ε. KaFFPaE computed 300 partitions which are better than previous
best partitions reported there: 91 for ε = 1%, 103 for ε = 3%, and 106 for ε = 5%.
Moreover, it reproduced equally sized cuts in 170 of the 312 remaining cases. When
only considering the 15 largest graphs and (1 + ε) ∈ {1.03, 1.05}, we are able to
reproduce or improve the current result in 224 out of 240 cases. Overall our systems (including
graph   Bbest  Bavg  tavg [m]   Kbest  Kavg  tavg [m]
lux.    79     79    60.1       81     83    0.1
bel.    307    307   60.5       320    326   0.9
net.    191    193   60.6       207    217   1.2
ita.    200    200   64.3       205    210   3.9
gb.     363    365   63.0       381    395   6.5
ger.    473    475   65.3       482    499   11.3
asia.   47     47    67.6       52     55    6.4
eur.    526    527   131.5      550    590   76.1
Solver         Points        Solver         Points
KaFFPaFast     1372          KaFFPaFast     1680
Metis          1265          KaFFPaEco      1305
KaFFPaEco      1174          KaFFPaE        1145
KaFFPaE        1134          KaFFPaStrong   1106
KaFFPaStrong   1085          UMPa [6]       782
UMPa [6]       624           Mondrian [11]  462
Scotch         361
Mondrian [11]  225
process after one day of computation or after one hundred repetitions yielding un-
balanced partitions. The resulting partition was used for both parts of the challenge,
i.e. optimizing for edge cut and optimizing for maximum communication volume.
The runtime of each iteration was added if more than one iteration was needed to
obtain a feasible partition. KaFFPaE was given four nodes of machine A and a time
limit of eight hours for each instance. When computing partitions for the objective
function maximum communication volume we altered the fitness function to this
objective. This ensures that individuals having a better maximum communication
volume are more often selected for a combine operation. Using this methodology
KaFFPaStrong, KaFFPaEco, KaFFPaFast, KaFFPaE, Metis and Scotch were able
to solve 136, 150, 170, 130, 146 and 110 instances respectively. The resulting points
achieved in the Pareto challenge can be found in Table 4 (see [1] for a description on
how points are computed for the challenges). Note that KaFFPaFast gained more
points than KaFFPaEco, KaFFPaStrong and KaFFPaE. Since it is much faster
than the other KaFFPa configurations it is almost never dominated by them and
therefore scores a lot of points in this particular challenge. For some instances the
partitions produced by Metis always exceeded the balance constraint by exactly one
vertex. We assume that a small modification of Metis would increase the number
of instances solved and most probably also the score achieved.
Quality Challenge. Our quality submission KaPa (Karlsruhe Partitioners) assembles
the best of the partitions obtained by our partitioners in the Pareto challenge.
Solver Points
KaPa 1574
UMPa [6] 1066
Mondrian [11] 616
References
[1] Competition rules and objective functions for the 10th DIMACS Implementation Challenge
on Graph Partitioning and Graph Clustering, http://www.cc.gatech.edu/dimacs10/data/
dimacs10-rules.pdf.
[2] Reid Andersen and Kevin J. Lang, An algorithm for improving graph partitions, Proceedings
of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York,
2008, pp. 651–660. MR2487634
[3] David Bader, Henning Meyerhenke, Peter Sanders, and Dorothea Wagner. 10th DIMACS Im-
plementation Challenge - Graph Partitioning and Graph Clustering, http://www.cc.gatech.
edu/dimacs10/.
[4] Una Benlic and Jin-Kao Hao. A multilevel memetic approach for improving graph k-
partitions. In 22nd Intl. Conf. Tools with Artificial Intelligence, pages 121–128, 2010.
[5] Thang Nguyen Bui and Curt Jones, Finding good approximate vertex and edge partitions is
NP-hard, Inform. Process. Lett. 42 (1992), no. 3, 153–159, DOI 10.1016/0020-0190(92)90140-
Q. MR1168771 (93h:68111)
[6] Ümit V. Çatalyürek, Mehmet Deveci, Kamer Kaya, and Bora Uçar. UMPa: A multi-objective,
multi-level partitioner for communication minimization. In 10th DIMACS Impl. Challenge
Workshop: Graph Partitioning and Graph Clustering. Georgia Institute of Technology, At-
lanta, GA, February 13-14, 2012.
[7] Kenneth A. De Jong, Evolutionary computation: a unified approach, A Bradford Book, MIT
Press, Cambridge, MA, 2006. MR2234532 (2007b:68003)
[8] Daniel Delling, Andrew V. Goldberg, Ilya Razenshteyn, and Renato F. Werneck. Graph
Partitioning with Natural Cuts. In 25th IPDPS. IEEE Computer Society, 2011.
[9] Benjamin Doerr and Mahmoud Fouz, Asymptotically optimal randomized rumor spreading,
Automata, languages and programming. Part II, Lecture Notes in Comput. Sci., vol. 6756,
Springer, Heidelberg, 2011, pp. 502–513, DOI 10.1007/978-3-642-22012-8 40. MR2852451
[10] Doratha E. Drake and Stefan Hougardy, A simple approximation algorithm for the weighted
matching problem, Inform. Process. Lett. 85 (2003), no. 4, 211–213, DOI 10.1016/S0020-
0190(02)00393-9. MR1950496 (2003m:68185)
[11] B. O. Fagginger Auer and R. H. Bisseling. Abusing a hypergraph partitioner for unweighted
graph partitioning. In 10th DIMACS Impl. Challenge Workshop: Graph Partitioning and
Graph Clustering. Georgia Institute of Technology, Atlanta, GA, February 13-14, 2012.
[12] C. M. Fiduccia and R. M. Mattheyses. A Linear-Time Heuristic for Improving Network
Partitions. In 19th Conference on Design Automation, pages 175–181, 1982.
[13] M. Holtgrewe, P. Sanders, and C. Schulz. Engineering a Scalable High Quality Graph Parti-
tioner. In 24th IEEE International Parallel and Distributed Processing Symposium, 2010.
[14] George Karypis and Vipin Kumar, Parallel multilevel k-way partitioning scheme for irregular
graphs, SIAM Rev. 41 (1999), no. 2, 278–300 (electronic), DOI 10.1137/S0036144598334138.
MR1684545 (2000d:68117)
[15] Jin Kim, Inwook Hwang, Yong-Hyuk Kim, and Byung Ro Moon. Genetic approaches for
graph partitioning: a survey. In GECCO, pages 473–480. ACM, 2011.
[16] Kevin Lang and Satish Rao, A flow-based method for improving the expansion or conductance
of graph cuts, Integer programming and combinatorial optimization, Lecture Notes in Com-
put. Sci., vol. 3064, Springer, Berlin, 2004, pp. 325–337, DOI 10.1007/978-3-540-25960-2 25.
MR2144596 (2005m:05181)
[17] J. Maue and P. Sanders. Engineering algorithms for approximate weighted matching. In 6th
Workshop on Exp. Alg. (WEA), volume 4525 of LNCS, pages 242–255. Springer, 2007.
[18] Brad L. Miller and David E. Goldberg, Genetic algorithms, tournament selection, and the
effects of noise, Complex Systems 9 (1995), no. 3, 193–212. MR1390121 (97c:68136)
[19] Vitaly Osipov and Peter Sanders, n-level graph partitioning, Algorithms—ESA 2010. Part
I, Lecture Notes in Comput. Sci., vol. 6346, Springer, Berlin, 2010, pp. 278–289, DOI
10.1007/978-3-642-15775-2 24. MR2762861
[20] F. Pellegrini. Scotch home page. http://www.labri.fr/pelegrin/scotch.
[21] P. Sanders and C. Schulz. Distributed Evolutionary Graph Partitioning. 12th Workshop on
Algorithm Engineering and Experimentation, 2011.
[22] Peter Sanders and Christian Schulz, Engineering multilevel graph partitioning algorithms,
Algorithms—ESA 2011, Lecture Notes in Comput. Sci., vol. 6942, Springer, Heidelberg, 2011,
pp. 469–480, DOI 10.1007/978-3-642-23719-5 40. MR2893224 (2012k:68259)
[23] A. J. Soper, C. Walshaw, and M. Cross, A combined evolutionary search and multilevel
optimisation approach to graph-partitioning, J. Global Optim. 29 (2004), no. 2, 225–241,
DOI 10.1023/B:JOGO.0000042115.44455.f3. MR2092958 (2005k:05228)
[24] Chris Walshaw, Multilevel refinement for combinatorial optimisation problems, Ann. Oper.
Res. 131 (2004), 325–372, DOI 10.1023/B:ANOR.0000039525.80601.15. MR2095810
[25] C. Walshaw and M. Cross, Mesh partitioning: a multilevel balancing and refine-
ment algorithm, SIAM J. Sci. Comput. 22 (2000), no. 1, 63–80 (electronic), DOI
10.1137/S1064827598337373. MR1769526 (2001b:65153)
[26] C. Walshaw and M. Cross. JOSTLE: Parallel Multilevel Graph-Partitioning Software – An
Overview. In F. Magoules, editor, Mesh Partitioning Techniques and Domain Decomposition
Techniques, pages 27–58. Civil-Comp Ltd., 2007. (Invited chapter).
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11707

Abusing a Hypergraph Partitioner for Unweighted Graph Partitioning
B. O. Fagginger Auer and R. H. Bisseling
1. Introduction
In this paper, we use the Mondriaan matrix partitioner [22] to partition the
graphs from the 10th DIMACS challenge on graph partitioning and clustering [1].
In this way, we can compare Mondriaan’s performance as a graph partitioner with
the performance of the state-of-the-art partitioners participating in the challenge.
An undirected graph G is a pair (V, E), with vertices V , and edges E that are
of the form {u, v} for u, v ∈ V with possibly u = v. For vertices v ∈ V , we denote
the set of all of v’s neighbours by
Vv := {u ∈ V | {u, v} ∈ E}.
Note that vertex v is a neighbour of itself precisely when the self-edge {v, v} ∈ E.
Hypergraphs are a generalisation of undirected graphs, where edges can contain
an arbitrary number of vertices. A hypergraph G is a pair (V, N ), with vertices V,
and nets (or hyperedges) N ; nets are subsets of V that can contain any number of
vertices.
Let ε > 0, k ∈ N, and G = (V, E) be an undirected graph. Then a valid solution
to the graph partitioning problem for partitioning G into k parts with imbalance ε
is a partitioning Π : V → {1, . . . , k} of the graph's vertices into k parts, each part
Π⁻¹({i}) containing at most

(1.1)    |Π⁻¹({i})| ≤ (1 + ε) |V|/k,    (1 ≤ i ≤ k)

vertices.
2010 Mathematics Subject Classification. Primary 05C65, 05C70; Secondary 05C85.
Key words and phrases. Hypergraphs, graph partitioning, edge cut, communication volume.
To measure the quality of a valid partitioning we use two different metrics. The
communication volume metric¹ [1] is defined by

(1.2)    CV(Π) := max_{1≤i≤k} Σ_{v∈V, Π(v)=i} |Π(Vv) \ {Π(v)}|.

For each vertex v, we determine the number π(v) := |Π(Vv) \ {Π(v)}| of different
parts in which v has neighbours, except its own part Π(v). Then, the communication
volume is given by the maximum over i of the sum of all π(v) for vertices v
belonging to part i.
The edge-cut metric [1], defined as

(1.3)    EC(Π) := |{{u, v} ∈ E | Π(u) ≠ Π(v)}|,

measures the number of edges between different parts of the partitioning Π.
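As a concrete reading of these two definitions, a small sketch that computes both metrics from an adjacency list and a part assignment; the data layout is our own choice:

#include <algorithm>
#include <cstdint>
#include <vector>

struct Metrics { int64_t edgeCut; int64_t commVolume; };

// Evaluate EC (1.3) and CV (1.2) for an undirected graph in adjacency-list
// form (each edge {u,v} stored in both adj[u] and adj[v]); part[v] in [0,k).
Metrics evaluate(const std::vector<std::vector<int>>& adj,
                 const std::vector<int>& part, int k) {
    int64_t ec = 0;
    std::vector<int64_t> volume(k, 0);  // volume[i] = sum of pi(v) over v with part[v] == i
    std::vector<char> mark(k, 0);       // scratch: parts already counted for the current v
    for (int v = 0; v < (int)adj.size(); ++v) {
        const int pv = part[v];
        std::vector<int> marked;
        for (int u : adj[v]) {
            if (part[u] == pv) continue;
            if (u > v) ++ec;            // count each cut edge exactly once
            if (!mark[part[u]]) { mark[part[u]] = 1; marked.push_back(part[u]); }
        }
        volume[pv] += (int64_t)marked.size();   // pi(v): parts other than pv with neighbours
        for (int p : marked) mark[p] = 0;       // reset scratch for the next vertex
    }
    return { ec, *std::max_element(volume.begin(), volume.end()) };
}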
2. Mondriaan
2.1. Mondriaan sparse matrix partitioner. The Mondriaan partitioner
has been designed to partition the matrix and the vectors for a parallel sparse
matrix–vector multiplication, where a sparse matrix A is multiplied by a dense
input vector v to give a dense output vector u = A v as the result. First, the
matrix partitioning algorithm is executed to minimise the total communication
volume LV(Π) of the partitioning, defined below, and then the vector partitioning
algorithm is executed with the aim of balancing the communication among the
processors. The matrix partitioning itself does not aim to achieve such balance,
but it is not biased in favour of any processor part either.
¹ We forgo custom edge and vertex weights and assume they are all equal to one, because none of the graphs we consider carry such weights (see Section 3).
Name         Ref.   V                   N
Column-net   [7]    {r1, . . . , rm}    {{ri | 1 ≤ i ≤ m, aij ≠ 0} | 1 ≤ j ≤ n}
Row-net      [7]    {c1, . . . , cn}    {{cj | 1 ≤ j ≤ n, aij ≠ 0} | 1 ≤ i ≤ m}
Fine-grain   [9]    {vij | aij ≠ 0}     {{vij | 1 ≤ i ≤ m, aij ≠ 0} | 1 ≤ j ≤ n} (column nets)
                                        ∪ {{vij | 1 ≤ j ≤ n, aij ≠ 0} | 1 ≤ i ≤ m} (row nets)
We will now translate the DIMACS partitioning problems from Section 1 to the
hypergraph partitioning problem that Mondriaan is designed to solve, by creating
a suitable hypergraph G, encoded as a sparse matrix A in the row-net model.
we can provide an upper bound, which we can use to limit CV(Π). We need to
choose the rows of A, corresponding to nets in the row-net hypergraph G = (V, N),
such that (2.3) and (2.2) are in agreement.
For a net n ∈ N, we have that n ⊆ V is simply a collection of vertices
of G, so |Π(n)| in (2.2) equals the number of different parts in which the vertices
of n are contained. In (2.3) we count, for a vertex v ∈ V, all parts in which v
has a neighbour, except Π(v). Note that this number equals |Π(Vv) \ {Π(v)}| =
|Π(Vv ∪ {v})| − 1.
Hence, we should pick N := {Vv ∪ {v} | v ∈ V} as the set of nets, for (2.3)
and (2.2) to agree. In the row-net matrix model, this corresponds to letting A be a
matrix with a row for every vertex v ∈ V, filled with nonzeros a_vv and a_uv for all
u ∈ Vv \ {v}. Then, for this hypergraph G, we have by (2.3) that CV(Π) ≤ LV(Π).
Note that since the communication volume is defined as a maximum, we also have
that k CV(Π) ≥ LV(Π).
Theorem 2.1. Let G = (V, E) be a given graph, k ∈ N, and ε > 0. Let A be
the |V| × |V| matrix with entries

    a_uv := 1 if {u, v} ∈ E or u = v, and a_uv := 0 otherwise,

for u, v ∈ V, and let G = (V, N) be the hypergraph corresponding to A in the row-net
model with vertex weights ζ(v) = 1 for all v ∈ V.
Then, for every partitioning Π : V → {1, . . . , k}, we have that Π satisfies (1.1)
if and only if Π satisfies (2.1), and

(2.4)    (1/k) LV(Π) ≤ CV(Π) ≤ LV(Π).
2.4. Minimising edge cut. We will now follow the same procedure as in
Section 2.3 to construct a matrix A such that minimising (2.2) subject to (2.1) is
equivalent to minimising (1.3) subject to (1.1).
As in Section 2.3, the columns of A should correspond to the vertices V of G
to ensure that (2.1) is equivalent to (1.1).
Equation (1.3) simply counts all of G's edges that contain vertices belonging
to two parts of the partitioning Π. Since every edge contains vertices belonging to
at most two different parts, choosing the edges themselves as the nets, N := E,
lets the metric (2.2) count precisely the cut edges; in the row-net matrix model
this corresponds to the |E| × |V| edge-vertex incidence matrix of G.

Theorem 2.2. Let G = (V, E) be a given graph, k ∈ N, and ε > 0. Let A be
the |E| × |V| matrix with entries

    a_ev := 1 if v ∈ e, and a_ev := 0 otherwise,

for e ∈ E, v ∈ V, and let G = (V, N) be the hypergraph corresponding to A in the
row-net model with vertex weights ζ(v) = 1 for all v ∈ V.
Then, for every partitioning Π : V → {1, . . . , k}, we have that Π satisfies (1.1)
if and only if Π satisfies (2.1), and

(2.5)    EC(Π) = LV(Π).
With Theorem 2.1 and Theorem 2.2, we know how to translate a given graph G
to a hypergraph that Mondriaan can partition to obtain solutions to the DIMACS
partitioning challenges.
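Both constructions are easy to emit in coordinate form; a sketch with our own minimal types (Mondriaan's native input is a sparse-matrix file, and these triplets carry exactly the nonzero patterns the two theorems prescribe):

#include <utility>
#include <vector>

using Triplets = std::vector<std::pair<int, int>>; // (row, column) of the unit nonzeros

// Theorem 2.1: the |V| x |V| matrix with a_vv = 1 and a_uv = a_vu = 1 for
// {u,v} in E; row v encodes the net V_v union {v}, so partitioning the
// columns under the lambda-1 metric bounds CV as in (2.4).
Triplets cvMatrix(int n, const std::vector<std::pair<int, int>>& edges) {
    Triplets t;
    for (int v = 0; v < n; ++v) t.push_back({v, v});
    for (auto [u, v] : edges) { t.push_back({u, v}); t.push_back({v, u}); }
    return t;
}

// Theorem 2.2: the |E| x |V| edge-vertex incidence matrix; row e encodes the
// net e, so the lambda-1 volume equals the edge cut as in (2.5).
Triplets ecMatrix(const std::vector<std::pair<int, int>>& edges) {
    Triplets t;
    for (int e = 0; e < (int)edges.size(); ++e) {
        t.push_back({e, edges[e].first});
        t.push_back({e, edges[e].second});
    }
    return t;
}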
3. Results
We measure Mondriaan’s performance as a graph partitioner by partitioning
graphs from the walshaw/ [20] category, as well as a subset of the specified par-
titioning instances of the DIMACS challenge test bed [1], see Tables 3 and 4.
This is done by converting the graphs to matrices, as described by Theorem 2.1
and Theorem 2.2, and partitioning these matrices with Mondriaan 3.11, using the
onedimcol splitting strategy (since the matrices represent row-net hypergraphs)
with the lambda1 metric (cf. (2.2)). The imbalance is set to ε = 0.03, the number
of parts k is chosen from {2, 4, . . . , 1024}, and we measure the communication vol-
umes and edge cuts over 16 runs of the Mondriaan partitioner (as Mondriaan uses
random tie-breaking). All results were recorded on a dual quad-core AMD Opteron
2378 system with 32GiB of main memory and they can be found in Tables 5–8 and
Figures 2 and 3. None of the graphs from Table 3 or 4 contain self-edges, edge
weights, or vertex weights. Therefore, the values recorded in Tables 5–8 satisfy ei-
ther (1.2) or (1.3) (which both assume unit weights), and can directly be compared
to the results of other DIMACS challenge participants.
Tables 5 and 6 contain the lowest communication volumes and edge cuts ob-
tained by Mondriaan in 16 runs for the graphs from Table 3. The strange dip in the
communication volume for finan512 in Table 5 for k = 32 parts can be explained
by the fact that the graph finan512 consists exactly of 32 densely connected parts
with few connections between them, see the visualisation of this graph in [11], such
that there is a natural partitioning with very low communication volume in this
case.
To determine how well Mondriaan performs as a graph partitioner, we have also
partitioned the graphs from Tables 3 and 4 using METIS 5.0.2 [14] and Scotch 5.1.12
[18]. For METIS we used the high-quality PartGraphKway option, while Scotch
was invoked using graphPart with the QUALITY and SAFETY strategies enabled.
We furthermore compare the results from Table 6 to the lowest known edge cuts
Table 3. Graphs from the walshaw/ [20] category, with |V| vertices and |E| edges.

G |V | |E| G |V | |E|
add20 2,395 7,462 bcsstk30 28,924 1,007,284
data 2,851 15,093 bcsstk31 35,588 572,914
3elt 4,720 13,722 fe pwt 36,519 144,794
uk 4,824 6,837 bcsstk32 44,609 985,046
add32 4,960 9,462 fe body 45,087 163,734
bcsstk33 8,738 291,583 t60k 60,005 89,440
whitaker3 9,800 28,989 wing 62,032 121,544
crack 10,240 30,380 brack2 62,631 366,559
wing nodal 10,937 75,488 finan512 74,752 261,120
fe 4elt2 11,143 32,818 fe tooth 78,136 452,591
vibrobox 12,328 165,250 fe rotor 99,617 662,431
bcsstk29 13,992 302,748 598a 110,971 741,934
4elt 15,606 45,878 fe ocean 143,437 409,593
fe sphere 16,386 49,152 144 144,649 1,074,393
cti 16,840 48,232 wave 156,317 1,059,331
memplus 17,758 54,196 m14b 214,765 1,679,018
cs4 22,499 43,858 auto 448,695 3,314,611
Table 4. Graphs from the DIMACS challenge test bed [1]; this numbering is used in Tables 7 and 8.

G |V | |E|
1 delaunay n15 32,768 98,274
2 kron g500-simple-logn17 131,072 5,113,985
3 coAuthorsCiteseer 227,320 814,134
4 rgg n 2 18 s0 262,144 1,547,283
5 auto 448,695 3,314,611
6 G3 circuit 1,585,478 3,037,674
7 kkt power 2,063,494 6,482,320
8 M6 3,501,776 10,501,936
9 AS365 3,799,275 11,368,076
10 NLR 4,163,763 12,487,976
11 hugetric-00000 5,824,554 8,733,523
12 great-britain.osm 7,733,822 8,156,517
13 asia.osm 11,950,757 12,711,603
14 hugebubbles-00010 19,458,087 29,179,764
with 3% imbalance for graphs from the walshaw/ category, available from http://staffweb.cms.gre.ac.uk/~wc06/partition/ [20]. These data were retrieved on
May 8, 2012 and include results from the KaFFPa partitioner, contributed by
Sanders and Schulz [19], who also participated in the DIMACS challenge. Results
for graphs from the DIMACS challenge, Tables 7 and 8, are given for the number
of parts k specified in the challenge partitioning instances, for a single run of the
Mondriaan, METIS, and Scotch partitioners.
Table 5. Lowest communication volumes CV(Π), (1.2), obtained by Mondriaan in 16 runs for the graphs from Table 3, partitioned into k parts with imbalance ε = 0.03.

G 2 4 8 16 32 64
add20 74 101 118 141 159 -
data 63 84 80 78 65 -
3elt 45 65 59 65 53 49
uk 19 27 36 33 31 24
add32 9 21 29 24 20 22
bcsstk33 454 667 719 630 547 449
whitaker3 64 130 104 98 77 60
crack 95 97 123 100 78 64
wing nodal 453 593 523 423 362 256
fe 4elt2 66 94 97 85 69 60
vibrobox 996 1,080 966 887 663 482
bcsstk29 180 366 360 336 252 220
4elt 70 90 86 89 88 71
fe sphere 193 213 178 139 107 83
cti 268 526 496 379 295 200
memplus 2,519 1,689 1,069 720 572 514
cs4 319 492 409 311 228 161
bcsstk30 283 637 611 689 601 559
bcsstk31 358 492 498 490 451 400
fe pwt 120 122 133 145 148 132
bcsstk32 491 573 733 671 561 442
fe body 109 143 173 171 145 133
t60k 71 141 154 139 129 96
wing 705 854 759 594 451 324
brack2 231 650 761 635 562 458
finan512 75 76 137 141 84 165
fe tooth 1,238 1,269 1,282 1,066 844 703
fe rotor 549 1,437 1,258 1,138 944 749
598a 647 1,400 1,415 1,432 1,064 871
fe ocean 269 797 1,002 1,000 867 647
144 1,660 2,499 2,047 1,613 1,346 1,184
wave 2,366 2,986 2,755 2,138 1,640 1,222
m14b 921 2,111 2,086 2,016 1,524 1,171
auto 2,526 4,518 4,456 3,982 3,028 2,388
Table 6. Lowest edge cuts EC(Π), (1.3), obtained by Mondriaan in 16 runs for the graphs from Table 3, partitioned into k parts with imbalance ε = 0.03.

G 2 4 8 16 32 64
add20 680 1,197 1,776 2,247 2,561 -
data 195 408 676 1,233 2,006 -
3elt 87 206 368 639 1,078 1,966
uk 20 43 98 177 299 529
add32 21 86 167 247 441 700
bcsstk33 10,068 21,993 37,054 58,188 82,102 114,483
whitaker3 126 385 692 1,172 1,825 2,769
crack 186 372 716 1,169 1,851 2,788
wing nodal 1,703 3,694 5,845 8,963 12,870 17,458
fe 4elt2 130 350 616 1,091 1,770 2,760
vibrobox 10,310 19,401 28,690 37,038 45,877 53,560
bcsstk29 2,846 8,508 16,714 25,954 39,508 59,873
4elt 137 335 543 1,040 1,724 2,896
fe sphere 404 822 1,258 1,972 2,857 4,082
cti 318 934 1,786 2,887 4,302 6,027
memplus 5,507 9,666 12,147 14,077 15,737 17,698
cs4 389 1,042 1,654 2,411 3,407 4,639
bcsstk30 6,324 16,698 35,046 77,589 123,766 186,084
bcsstk31 2,677 7,731 14,299 25,212 40,641 65,893
fe pwt 347 720 1,435 2,855 5,888 9,146
bcsstk32 4,779 9,146 23,040 41,214 66,606 102,977
fe body 271 668 1,153 2,011 3,450 5,614
t60k 77 227 506 952 1,592 2,483
wing 845 1,832 2,843 4,451 6,558 8,929
brack2 690 2,905 7,314 12,181 19,100 28,509
finan512 162 324 891 1,539 2,592 10,593
fe tooth 3,991 7,434 12,736 19,709 27,670 38,477
fe rotor 1,970 7,716 13,643 22,304 34,515 50,540
598a 2,434 8,170 16,736 27,895 43,192 63,056
fe ocean 317 1,772 4,316 8,457 13,936 21,522
144 6,628 16,822 27,629 41,947 62,157 86,647
wave 8,883 18,949 32,025 47,835 69,236 94,099
m14b 3,862 13,464 26,962 46,430 73,177 107,293
auto 9,973 27,297 49,087 83,505 132,998 191,429
Table 8. Edge cut, (1.3), for graphs from Table 4, divided into k
parts with imbalance ε = 0.03 for one run of Mondriaan, METIS,
and Scotch. The numbering of the graphs is given by Table 4.
which is equal to 0.98 in Table 9 for X = {graphs from Table 3}. If the value from
(3.1) is smaller than 1, Mondriaan outperforms METIS, while METIS outperforms
Mondriaan if it is larger than 1. We use this quality measure instead of simply
calculating the average of all CV(Π_G^{Mon})/CV(Π_G^{MET}) ratios, because it gives us a
symmetric comparison of all partitioners, in the following sense:

    κ_{Mon,MET}(X) = 1/κ_{MET,Mon}(X).
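Equation (3.1) itself falls outside this excerpt, but the symmetry property above pins down its likely shape: a geometric mean of the per-graph ratios satisfies it exactly. The following minimal Python sketch assumes (3.1) is that geometric mean; the function name and dict-based inputs are illustrative, not from the paper.

import math

def kappa(cv_a, cv_b):
    # Symmetric quality measure comparing partitioners A and B.
    # cv_a, cv_b: dicts mapping graph name -> communication volume
    # achieved by A and B on that graph (assumed input format).
    graphs = sorted(set(cv_a) & set(cv_b))
    # Geometric mean of the per-graph ratios CV_A(G)/CV_B(G); this
    # choice guarantees kappa(a, b) == 1 / kappa(b, a).
    log_sum = sum(math.log(cv_a[g] / cv_b[g]) for g in graphs)
    return math.exp(log_sum / len(graphs))

A value below 1 then means A outperforms B on the graph set, matching the interpretation in the text.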
Scotch is unable to optimise for the communication volume metric directly and
therefore it is not surprising that Scotch is outperformed by both Mondriaan and
METIS in this metric. Surprisingly, Mondriaan outperforms Scotch in terms of
edge cut for the graphs from Table 4. The more extreme results for the graphs
from Table 4 could be caused by the fact that they have been recorded for a single
run of the partitioners, while the results for graphs from Table 3 are the best in
16 runs. METIS yields lower average communication volumes and edge cuts than
both Mondriaan and Scotch in almost all DIMACS cases.
If we compare the edge cuts for graphs from Table 3 to the best-known results
from [20], we find that Mondriaan’s, METIS’, and Scotch’s best edge cuts obtained
in 16 runs are on average 13%, 10%, and 10% larger, respectively, than those from
[20].
[Figure 2. Partitioning time in seconds versus the number of graph edges (log-log axes, 10^3 to 10^8 edges), for Mondriaan, METIS, and Scotch, each partitioning into 64 and 512 parts.]
[Figure: panel for kkt_power; horizontal axis is the number of parts k, from 2 to 512.]
and Scotch is 12× faster. Note that only six (large) matrices are partitioned into
512 parts.
In the absence of self-edges, the number of nonzeros in the matrices from Theorem 2.1 and Theorem 2.2 equals 2|E| + |V| and 2|E|, respectively. However, the matrix sizes are equal to |V| × |V| and |E| × |V|, respectively. Therefore, the number of nonzeros in matrices from Theorem 2.2 is smaller, but the larger number of nets (typically |E| > |V|, e.g. rgg_n_2_18_s0) will lead to increased memory requirements for the edge-cut matrices.
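As a rough worked example (the figures here are assumptions for illustration, at the approximate scale of rgg_n_2_18_s0 with |V| = 2^18 ≈ 2.6 × 10^5 and |E| ≈ 1.5 × 10^6): the matrix from Theorem 2.1 would have 2|E| + |V| ≈ 3.3 × 10^6 nonzeros in a |V| × |V| matrix, while the matrix from Theorem 2.2 would have only 2|E| ≈ 3.0 × 10^6 nonzeros but roughly six times as many rows (nets), |E| versus |V|.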
We have also investigated Mondriaan’s communication volume imbalance, de-
fined for a partitioning Π of G into k parts as
(3.2)    CV(Π) / (LV(Π)/k) − 1.
This equation measures the imbalance in communication volume and can be com-
pared to the factor for vertex imbalance in (1.1). We plot (3.2) for a selec-
tion of graphs in Figure 3, where we see that the deviation of the communica-
tion volume CV(Π) from perfect balance, i.e. from LV(Π)/k, is very small com-
pared to the theoretical upper bound of k − 1 (via (2.4)), for all graphs except
kron_g500-simple-logn17. This means that for most graphs, at most a factor of
2–3 in communication volume per processor can still be gained by improving the
communication balance. Therefore, as the number of parts increases, the different
parts of the partitionings generated by Mondriaan are not only balanced in terms
of vertices, cf. (1.1), but also in terms of communication volume.
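As a concrete reading of (3.2), the following minimal Python sketch assumes CV(Π) is the maximum per-part communication volume and LV(Π) the sum over all parts, which matches the perfect-balance target LV(Π)/k and the upper bound k − 1 mentioned above.

def cv_imbalance(part_volumes):
    # part_volumes: list of per-part communication volumes (assumed input).
    k = len(part_volumes)
    cv = max(part_volumes)      # CV(Pi): volume of the bottleneck part
    lv = sum(part_volumes)      # LV(Pi): total communication volume
    return cv / (lv / k) - 1.0  # (3.2); 0 means perfect balance

If all volume were concentrated in one part, cv would equal lv and the expression would reach its upper bound k − 1.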
4. Conclusion
We have shown that it is possible to use the Mondriaan matrix partitioner as
a graph partitioner by constructing appropriate matrices of a given graph for ei-
ther the communication volume or edge-cut metric. Mondriaan’s performance was
measured by partitioning graphs from the 10th DIMACS challenge on graph parti-
tioning and clustering with Mondriaan, METIS, and Scotch, as well as comparing
obtained edge cuts with the best known results from [20]: here Mondriaan’s best
edge cut in 16 runs was, on average, 13% higher than the best known. Mondriaan
is competitive in terms of partitioning quality (METIS’ and Scotch’s best edge cuts
are, on average, 10% higher than the best known), but it is an order of magnitude
slower (Figure 2). METIS is the overall winner, both in quality and performance.
In conclusion, it is possible to perform graph partitioning with a hypergraph par-
titioner, but graph partitioners are much faster.
To our surprise, the partitionings generated by Mondriaan are reasonably bal-
anced in terms of communication volume, as shown in Figure 3, even though Mon-
driaan does not perform explicit communication volume balancing during matrix
partitioning. We attribute the observed balancing to the fact that the Mondriaan
algorithm performs random tie-breaking, without any preference for a specific part
of the partitioning.
Fortunately, for the given test set of the DIMACS challenge, we did not need
to consider edge weights. However, for Mondriaan to be useful as a graph partitioner
for weighted graphs as well, we have to extend it to take hypergraph net
weights into account in the (λ − 1)-metric, (2.2). We intend to add this feature in
a future version of Mondriaan.
References
[1] D. A. Bader, P. Sanders, D. Wagner, H. Meyerhenke, B. Hendrickson, D. S. Johnson, C. Wal-
shaw, and T. G. Mattson, 10th DIMACS implementation challenge - graph partitioning and
graph clustering, 2012; http://www.cc.gatech.edu/dimacs10.
[2] Rob H. Bisseling, Parallel scientific computation: A structured approach using BSP and MPI,
Oxford University Press, Oxford, 2004. MR2059580
[3] Rob H. Bisseling, Bas O. Fagginger Auer, A. N. Yzelman, Tristan van Leeuwen, and Ümit
V. Çatalyürek, Two-dimensional approaches to sparse matrix partitioning, Combinatorial
scientific computing, Chapman & Hall/CRC Comput. Sci. Ser., CRC Press, Boca Raton, FL,
2012, pp. 321–349, DOI 10.1201/b11644-13. MR2952757
[4] Rob H. Bisseling and Wouter Meesen, Communication balancing in parallel sparse
matrix-vector multiplication, Electron. Trans. Numer. Anal. 21 (2005), 47–65 (electronic).
MR2195104 (2007c:65040)
[5] T. Bui and C. Jones, A heuristic for reducing fill-in in sparse matrix factorization, Pro-
ceedings Sixth SIAM Conference on Parallel Processing for Scientific Computing, SIAM,
Philadelphia, PA, 1993, pp. 445–452.
[6] A. E. Caldwell, A. B. Kahng, and I. L. Markov, Improved algorithms for hypergraph bipar-
titioning, Proceedings Asia and South Pacific Design Automation Conference, ACM Press,
New York, 2000, pp. 661–666. DOI 10.1145/368434.368864.
[7] Ü. V. Çatalyürek and C. Aykanat, Hypergraph-partitioning-based decomposition for parallel
sparse-matrix vector multiplication, IEEE Transactions on Parallel and Distributed Systems
10 (1999), no. 7, 673–693. DOI 10.1109/71.780863.
[8] Ü. V. Çatalyürek and C. Aykanat, PaToH: A multilevel hypergraph partitioning tool, version 3.0, Bilkent University, Department of Computer Engineering, Ankara, 06533 Turkey. PaToH is available at http://bmi.osu.edu/~umit/software.htm, 1999.
[9] Ü. V. Çatalyürek and C. Aykanat, A fine-grain hypergraph model for 2D decomposition of sparse matrices, Proceedings Eighth International Workshop on Solving Irregularly Structured Problems in Parallel (Irregular 2001), IEEE Press, Los Alamitos, CA, 2001, p. 118.
[10] C. Chevalier and F. Pellegrini, PT-Scotch: a tool for efficient parallel graph ordering, Parallel
Comput. 34 (2008), no. 6-8, 318–331, DOI 10.1016/j.parco.2007.12.001. MR2428880
[11] Timothy A. Davis and Yifan Hu, The University of Florida sparse matrix collection,
ACM Trans. Math. Software 38 (2011), no. 1, Art. 1, 25pp, DOI 10.1145/2049662.2049663.
MR2865011 (2012k:65051)
[12] K. D. Devine, E. G. Boman, R. T. Heaphy, R. H. Bisseling, and U. V. Catalyurek,
Parallel hypergraph partitioning for scientific computing, Proceedings IEEE International
Parallel and Distributed Processing Symposium 2006, IEEE Press, p. 102, 2006. DOI
10.1109/IPDPS.2006.1639359.
[13] Bruce Hendrickson and Robert Leland, An improved spectral graph partitioning algorithm
for mapping parallel computations, SIAM J. Sci. Comput. 16 (1995), no. 2, 452–469, DOI
10.1137/0916028. MR1317066 (96b:68140)
[14] George Karypis and Vipin Kumar, A fast and high quality multilevel scheme for partition-
ing irregular graphs, SIAM J. Sci. Comput. 20 (1998), no. 1, 359–392 (electronic), DOI
10.1137/S1064827595287997. MR1639073 (99f:68158)
[15] G. Karypis and V. Kumar, Multilevel k-way hypergraph partitioning, Proceedings 36th ACM/IEEE Conference on Design Automation, ACM Press, New York, 1999, pp. 343–348.
[16] G. Karypis and V. Kumar, Parallel multilevel k-way partitioning scheme for irregular graphs, SIAM Review 41 (1999), no. 2, 278–300. DOI 10.1145/309847.309954.
[17] B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell
System Technical Journal 49 (2) (1970), 291–307.
[18] F. Pellegrini and J. Roman, Scotch: A software package for static mapping by dual recursive
bipartitioning of process and architecture graphs, Proceedings High Performance Comput-
ing and Networking Europe, Lecture Notes in Computer Science, vol. 1067, Springer, 1996,
pp. 493–498. DOI 10.1007/3-540-61142-8_588.
[19] Peter Sanders and Christian Schulz, Engineering multilevel graph partitioning algorithms,
Algorithms—ESA 2011, Lecture Notes in Comput. Sci., vol. 6942, Springer, Heidelberg, 2011,
pp. 469–480, DOI 10.1007/978-3-642-23719-5_40. MR2893224 (2012k:68259)
[20] A. J. Soper, C. Walshaw, and M. Cross, A combined evolutionary search and multilevel
optimisation approach to graph-partitioning, J. Global Optim. 29 (2004), no. 2, 225–241,
DOI 10.1023/B:JOGO.0000042115.44455.f3. MR2092958 (2005k:05228)
[21] A. Trifunović and W. J. Knottenbelt, Parallel multilevel algorithms for hypergraph parti-
tioning, Journal of Parallel and Distributed Computing 68 (2008), no. 5, 563–581. DOI
10.1016/j.jpdc.2007.11.002.
[22] Brendan Vastenhouw and Rob H. Bisseling, A two-dimensional data distribution method for
parallel sparse matrix-vector multiplication, SIAM Rev. 47 (2005), no. 1, 67–95 (electronic),
DOI 10.1137/S0036144502409019. MR2149102 (2006a:65070)
[23] C. Walshaw and M. Cross, JOSTLE: Parallel Multilevel Graph-Partitioning Software
– An Overview, Mesh Partitioning Techniques and Domain Decomposition Techniques
(F. Magoules, ed.), Civil-Comp Ltd., 2007, pp. 27–58.
[24] A. N. Yzelman and Rob H. Bisseling, Cache-oblivious sparse matrix-vector multiplication by
using sparse matrix partitioning methods, SIAM J. Sci. Comput. 31 (2009), no. 4, 3128–3154,
DOI 10.1137/080733243. MR2529783 (2011a:65111)
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11711

Parallel Partitioning with Zoltan: Is Hypergraph Partitioning Worth It?
Sivasankaran Rajamanickam and Erik G. Boman
1. Introduction
Graph partitioning is a well studied problem in combinatorial scientific com-
puting. An important application is the mapping of data and/or tasks on a parallel
computer, where the goals are to balance the load and to minimize communi-
cation [12]. There are several variations of graph partitioning, but they are all
NP-hard problems. Fortunately, good heuristic algorithms exist. Naturally, there
is a trade-off between run-time and solution quality. In parallel computing, parti-
tioning may be performed either once (static partitioning) or many times (dynamic
load balancing). In the latter case, it is crucial that the partitioning itself is fast.
Furthermore, the rapid growth of problem sizes in scientific computing dictates
that partitioning algorithms must be scalable. The multilevel approach developed
in the 1990s [3, 11, 17] provides a good compromise between run-time (complexity)
and quality. Software packages based on this approach (Chaco [13], Metis [14],
and Scotch [19]) have been extremely successful. Even today, all the major paral-
lel software packages for partitioning in scientific computing (ParMetis [15], PT-
Scotch [20], and Zoltan [8, 9]) use variations of the multilevel graph partitioning
algorithm.
The 10th DIMACS implementation challenge offers an opportunity to evaluate
the current (2012) state-of-the-art in partitioning software. This is a daunting task,
as there are several variations of the partitioning problem (e.g., objectives), several
software codes, and a large number of data sets. In this paper we limit the scope
in the following ways: We only consider parallel software since our focus is high-
performance computing. We focus on the Zoltan toolkit since its partitioner can
be used to minimize either the edge cut (graph partitioning) or the communication
volume (hypergraph partitioning). We include some baseline comparisons with
ParMetis, since that is the most widely used parallel partitioning software. We
limit the experiments to a subset of the DIMACS graphs. One may view this paper
as a follow-up to the 2006 paper that introduced the Zoltan PHG partitioner [9].
Contributions: We compare graph and hypergraph partitioners for both symmetric and unsymmetric inputs and obtain results that are quite different from those in [4]. For nonsymmetric matrices we see a big difference in communication volume (orders of magnitude), while there is virtually no difference among the partitioners for symmetric matrices. We exercise Zoltan PHG on a larger number of processors than before (up to 1024). We present results for the impact of partitioning on an iterative solver. We also include results for the maximum communication volume, which is important in practice but not an objective directly modeled by any current partitioner.
where λ(v, Π) denotes the number of parts that v or any of its neighbors belong to,
with respect to the partition Π.
We then obtain the following two metrics:
(4)    CVmax(G, Π) = max_p comm(π_p)
(5)    CVsum(G, Π) = Σ_p comm(π_p)
where λ(e, Π) is the number of distinct parts that contain any vertex in e.
While graphs are restricted to structurally symmetric problems (undirected
graphs), hypergraphs make no such assumption. Furthermore, the number of ver-
tices and hyperedges may differ, making the model suitable for rectangular matrices.
The key advantage of the hypergraph model is that the hyperedge (λ − 1) cut (CV)
accurately models the total communication volume. This was first observed in [4]
in the context of sparse matrix-vector multiplication. The limitations of the graph
model were described in detail in [12]. This realization led to a shift from the
graph model to the hypergraph model. Today, many partitioning packages use the
hypergraph model: PaToH [4], hMetis [16], Mondriaan [21], and Zoltan-PHG [9].
Hypergraphs are often used to represent sparse matrices. For example, using
row-based storage (CSR), each row becomes a vertex and each column becomes a
hyperedge. Other hypergraph models exist: in the “fine-grain” model, each non-
zero is a vertex [5]. For the DIMACS challenge, all input is symmetric and given
as undirected graphs. Given a graph G(V, E), we will use the following derived
hypergraph H(V, E′): for each vertex v ∈ V, create a hyperedge e_v ∈ E′ that
contains v and all its neighbors. In this case, it is easy to see that CV(H, Π) =
CVsum (G, Π). Thus, we do not need to distinguish between communication volume
in the graph and hypergraph models.
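The derived-hypergraph construction above is small enough to sketch directly; the following dict-based Python fragment is purely illustrative (real partitioners use compressed storage such as CSR).

def derived_hypergraph(adj):
    # adj: dict mapping each vertex to the set of its neighbors in the
    # undirected graph G. One hyperedge per vertex v, containing v and
    # all its neighbors, so that CV(H, Pi) = CV_sum(G, Pi).
    return {v: {v} | set(nbrs) for v, nbrs in adj.items()}

# Tiny example: the path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(derived_hypergraph(adj))
# hyperedge of b is {a, b, c}; of a is {a, b}; of c is {b, c}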
2.3. Relevance of the Metrics. Most partitioners minimize either the total
edge cut (EC) or the total communication volume (CV-sum). A main reason for
this choice is that algorithms for these metrics are well developed. Less work has
been done to minimize the maximum communication volume (CV-max), though
in a parallel computing setting this may be more relevant as it corresponds to the
maximum communication for any one process.
In order to compare the three metrics and how they correspond to the actual
performance we use conjugate gradient (CG) iteration (from the Belos package [1])
as a test case. We used the matrices from the UF sparse matrix collection group of
the DIMACS challenge. As the goal is to compare the matrix-vector multiply time in the CG iteration, we used no preconditioner, since the performance characteristics would differ depending on the preconditioner. With no preconditioner, and with some of these problems being ill-conditioned, the CG iteration might not converge at all, so we report the solve time for 1000 iterations. We compare four different row-based partitionings (on 12 processors): natural (block) partitioning, random partitioning, graph partitioning with ParMetis, and hypergraph partitioning with Zoltan's hypergraph partitioner.
reorder the matrix, so the convergence of CG is not affected. The results are shown
in Table 1. As expected, random partitioning is worst since it just balances the
load but has high communication. In all but one case, we see that both graph
and hypergraph partitioning beat the simple natural (block) partitioning (which is
the default in Trilinos). For the audikw1 test matrix, the time is cut to less than
half. For these symmetric problems, the difference between graph and hypergraph
partitioning is very small in terms of real performance gains. We will show in Sec-
tion 4.2 that the partitioners actually differ in terms of the measured performance
metrics for three of the problems shown in Table 1. However, the differences in the metrics do not translate into a measurable real performance gain in the time for the matrix-vector multiply.
Table 1. Solve time (seconds) for 1000 iterations of CG for dif-
ferent row partitioning options.
[Figure: Zoltan PHG's 2D data layout, in which the hypergraph's vertices and hyperedges are distributed over a grid of processors P0 through P5.]
partitioner (PHG) was developed [9] and added to Zoltan. While PHG was de-
signed for hypergraph partitioning, it can also be used for graph partitioning but
it is not optimized for this use case. (Note: “PHG” now stands for Parallel Hyper-
graph and Graph partitioner.) Zoltan also supports other combinatorial problems
such as graph ordering and graph coloring [2].
Zoltan PHG is a parallel multilevel partitioner, consisting of the usual coars-
ening, initial partitioning, and refinement phases. The algorithm is similar to the
serial partitioners PaToH [4], hMetis [16] and Mondriaan [21], but Zoltan PHG is
parallel (based on MPI), so it can run on both shared-memory and distributed-memory
systems. Note that Zoltan can partition data into k parts using p processes, where
k ≠ p is allowed. Neither k nor p need be powers of two. We briefly describe the algorithm in
Zoltan PHG, with emphasis on the parallel computing aspects. For further details
on PHG, we refer to [9]. The basic algorithm remains the same, though several
improvements have been made over the years.
similar and therefore more likely to be in the same partition in a good partition-
ing. Catalyurek and Aykanat [4] suggested a heavy-connectivity matching, which
measures a similarity metric between pairs of vertices. Their preferred similarity
metric, which was also adopted by hMETIS [16] and Mondriaan [21], is known as
the inner product or simply, heavy connectivity. The inner product between two
vertices is defined as the Euclidean inner product between their binary hyperedge
incidence vectors, that is, the number of hyperedges they have in common. (Edge
weights can be incorporated in a straight-forward way.) Zoltan PHG also uses the
heavy-connectivity (inner-product) metric in the coarsening. Originally only pairs
of vertices were merged (matched) but later vertex aggregation (clustering) that
allows more than two vertices to be merged was made the default as it produces
slightly better results.
Previous work has shown that greedy strategies work well in practice, so op-
timal matching based on similarity scores (inner products) is not necessary. The
sequential greedy algorithm works as follows. Pick a (random) unmatched vertex
v. For each unmatched neighbor vertex u, compute the inner product ⟨v, u⟩.
Select the vertex with the highest non-zero inner product value and match it with
v. Repeat until all vertices have been considered. If we consider the hypergraph as
a sparse matrix A, we essentially need to compute the matrix product AT A. We
can use the sparsity of A to compute only entries of AT A that may be nonzero.
Since we use a greedy strategy, we save work and compute only a subset of the
nonzero entries in AT A. This strategy has been used (successfully) in several serial
partitioners.
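A serial sketch of this greedy inner-product matching is given below. It is SciPy-based and illustrative only; Zoltan's actual implementation is the parallel 2D algorithm described next, and the dense score vector computed here is exactly what the candidate rounds avoid.

import numpy as np
from scipy.sparse import csr_matrix

def greedy_inner_product_matching(A):
    # A: sparse vertex-by-hyperedge incidence matrix (CSR, 0/1 entries);
    # <v, u> = number of hyperedges shared by v and u = (A @ A.T)[v, u].
    n = A.shape[0]
    match = [-1] * n
    for v in np.random.permutation(n):   # visit vertices in random order
        if match[v] != -1:
            continue
        # Row v of A @ A.T, computed on demand using A's sparsity instead
        # of forming the whole (possibly fairly dense) product at once.
        scores = (A.getrow(v) @ A.T).toarray().ravel()
        scores[v] = 0                    # no self-match
        scores[[u for u in range(n) if match[u] != -1]] = 0
        u = int(np.argmax(scores))
        if scores[u] > 0:                # heaviest-connectivity neighbor
            match[v], match[u] = u, v
    return match

# Tiny example: three vertices, two hyperedges.
A = csr_matrix(np.array([[1, 0], [1, 1], [0, 1]]))
print(greedy_inner_product_matching(A))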
With Zoltan’s 2D data layout, this fairly simple algorithm becomes much more
complicated. Each processor knows about only a subset of the vertices and the
hyperedges. Computing the inner products requires communication. Even if A
is typically very sparse, AT A may be fairly dense. Therefore we cannot compute
all of AT A at once, but instead compute parts of it in separate rounds. In each
round, each processor selects a (random) subset of its vertices that we call candi-
dates. These candidates are broadcast to all other processors in the processor row.
This requires horizontal communication in the 2D layout. Each processor then
computes the inner products between its local vertices and the external candidates
received. Note that these inner products are only partial inner products; vertical
communication along processor columns is required to obtain the full (global) inner
products. One could let a single processor within a column accumulate these full
inner products, but this processor may run out of memory. So to improve load
balance, we accumulate inner products in a distributed way, where each processor
is responsible for a subset of the vertices.
At this point, the potential matches in a processor column are sent to the
master row of processors (row 0). The master row first greedily decides the best
local vertex for each candidate. These local vertices are then locked, meaning they
can match only to the desired candidate (in this round). This locking prevents
conflicts between candidates, which could otherwise occur when the same local
vertex is the best match for several candidates. Horizontal communication along
the master row is used to find the best global match for each candidate. Due to
our locking scheme, the desired vertex for each match is guaranteed to be available
so no conflicts arise between vertices. The full algorithm is given in [9].
4. Experiments
4.1. Software, Platform, and Data. Our primary goal is to study the be-
havior of Zoltan PHG as a graph and a hypergraph partitioner, using different
objectives and a range of data sets. We use Zoltan 3.5 (Trilinos 10.8) and ParMetis
4.0 as a reference for all the tests. The compute platform was mainly Hopper, a
Cray XE6 at NERSC. Hopper has 6,384 compute nodes, each with 24 cores (two
12-core AMD MagnyCours) and 32 GB of memory. The graphs for the tests are
from five test families of the DIMACS collection that are relevant to the compu-
tational problems we have encountered at Sandia. Within each family, we selected
some of the largest graphs that were not too similar. In addition we picked four
other graphs, two each from the street networks and clustering instances (which
also happened to be road networks), to compile our diverse 22 test problems.
The graphs are partitioned into 16, 64, 256, and 1024 parts. In the paral-
lel computing context, this covers everything from a multicore workstation to a
medium-sized parallel computer. Except where stated otherwise, the partitioner
had the same number of MPI ranks as the target number of parts.
Zoltan uses randomization, so results may vary slightly from run to run. How-
ever, for large graphs, the random variation is relatively small. Due to limited
compute time on Hopper, each partitioning test was run only once. Even with the
randomization, it is fair to draw conclusions based on several data sets, though one
should be cautious about overinterpreting any single data point.
4.2. Zoltan vs. ParMetis. In this section, we compare Zoltan’s graph and
hypergraph partitioning with ParMetis’s graph partitioning. We partition the
graphs into 256 parts with 256 MPI processes. The performance profile of the
three metrics – total edge cut (EC), the maximum communication volume (CV-
max) and the total communication volume (CV-sum) – for the 22 matrices is shown
in Figure 2.
The main advantage of the hypergraph partitioners is the ability to handle un-
symmetric problems and to reduce the communication volume for such problems
directly (without symmetrizing the problems). However, all the 22 problems used
for the comparisons in Figure 2 are symmetric problems from the DIMACS chal-
lenge set. We take this opportunity to compare graph and hypergraph partitioners
even for symmetric problems.
In terms of the edge cut metric ParMetis does better than Zoltan for 20 of
the matrices and Zoltan’s graph model does better for just two matrices. However,
Zoltan’s graph model is within 15% of ParMetis’s edge cuts for 82% of the problems
(see Figure 2(a)). The four problems that cause trouble for Zoltan’s graph model
are the problems from the street networks and clustering instances.
In terms of the CV-sum metric, Zoltan's partitioning with the hypergraph
model does better than Zoltan's graph model in all instances; it is better
than ParMetis for 33% of the problems, and is within 6% of ParMetis's CV-
sum for another 44% of the problems (see Figure 2(c)). Again the
street networks and the clustering instances are the ones that cause problems for
the hypergraph partitioning. In terms of the CV-max metric Zoltan’s hypergraph
partitioning is better than the other two methods for 27% of the problems, and
within 15% of the CV-max for another 42% of the problems (see Figure 2(b)).
From our results, we can see that even for symmetric problems hypergraph
partitioners can perform nearly as well as (or even better than) the graph parti-
tioners depending on the problems and the metrics one cares about. We also note
that three of these 22 instances (af shell10, audikw1 and G3 circuit) come from the
same problems we used in Section 2.3; Zoltan does better on one problem and
ParMetis does better on the other two in terms of the CV-max metric. In
terms of the EC metric, ParMetis does better for all of these problems. However, as
we can see from Table 1 the actual solution time is slightly better when we use
the hypergraph partitioning for the three problems irrespective of which method is
better in terms of the metrics we compute. To be precise, we should again note
that the differences in actual solve time between graph and hypergraph partitioning
are minor for those three problems. We would like to emphasize that we are not
able to observe any difference in the performance of the actual application when
the difference in the metrics is a small percentage. We study the characteristics of
Zoltan’s graph and hypergraph partitioning in the rest of this paper.
4.3. Zoltan Graph vs. Hypergraph model. We did more extensive exper-
iments on the symmetric problems with the graph and hypergraph partitioning of
Zoltan. For each of the test problems, we compute the three metrics (EC, CV-max, CV-sum) for 16 and 1024 parts. All the experiments use the same number of MPI processes as parts. Figure 3 shows the three metrics for hypergraph partitioning normalized to the graph partitioner results for both 16 and 1024 parts. The results show that based on the EC metric, Zoltan's graph partitioning is the best for most problems. In terms of the CV-sum metric the hypergraph partitioning fares better. Neither of the algorithms optimizes the CV-max metric, and as expected the results are mixed for this metric. The results for 64 and 256 parts were not different from the results presented in Figure 3 and are not presented here.
Figure 4 shows the change in the partitioning quality with respect to the three
metrics for both graph and hypergraph partitionings for two problems – cage15 and
hugetrace-0020. The metrics are normalized with respect to the values for the 16
parts case in these figures. These results are for the “good” problems and from
the results we can see why we call these problems the “good” problems – EC and
CV-sum go up by a factor of 3.5 to 4.5 when going from 16 parts to 1024 parts. In
contrast, we also show the change in the metrics for one problem each from the street networks and the clustering set (road_central and asia.osm) in Figure 5. Note that for some of these problems the metrics scale so similarly that the lines overlap in the figure. This second set of problems is challenging for both our graph and hypergraph partitioners, as EC and CV-max go up by a factor of 60–70 when going from 16 to 1024 parts (for road_central). The changes in these values are mainly due to the structure of the graphs.
4.4. Zoltan scalability. Many of Zoltan’s users use Zoltan within their par-
allel applications dynamically, where the number of parts equals the number of
MPI processes. As a result it is important for Zoltan to have a scalable parallel
hypergraph partitioner. We have made several improvements within Zoltan over
the past few years, and we evaluate our parallel scalability for the DIMACS problem
instances in this section. Note that having a parallel hypergraph partitioner also
enables us to solve large problems that do not fit into the memory of a compute
node. However, we were able to partition all the DIMACS instances except the
matrix europe.osm with 16 cores. We omit the europe.osm matrix and three small
matrices from the Walshaw group that get partitioned within two seconds even with
16 cores, from these tests. The scalability results for the rest of the 18 matrices are
shown in Figure 6. We normalize the time for all the runs with the time to compute
16 parts. Note that even though the matrix size remains the same, this is not a
traditional strong scaling test as the number of parts increases linearly with the
number of MPI processes. Since the work for the partitioner grows, it is unclear
what “perfect scaling” would be, but we believe this is a reasonable experiment as
it reflects a typical use case.
Even with the increase in the amount of work for large matrices like cage15
and hugebubbles-0020 we see performance improvements as we go to 1024 MPI
processes. However, for smaller problems like the auto or m14b the performance
remains flat (or degrades) as we go from 256 MPI processes to 1024 MPI processes.
The scalability of Zoltan’s graph partitioners is shown in Figure 7. We see
that the graph partitioner tends to scale well for most problems. Surprisingly, the
PHG hypergraph partitioner is faster than our graph partitioner in terms of actual
execution time for several of the problems. This may in part be due to the fact that
there are only n hyperedges in the hypergraph model compared to m edges in the
graph model. Recall that PHG treats graphs as hypergraphs, without exploiting
the special structure.
by 4% when partitioning into 1024 parts with just 24 MPI processes instead of using
the 1024 MPI processes. This confirms our conjecture that using fewer cores (MPI
processes) gives higher quality results, and raises the possibility of using shared-
memory techniques to improve the quality in the future.
result to the original unsymmetric problem. The difference in terms of the commu-
nication volume is presumed to be small. We compare hypergraph partitioning on
A against graph partitioning on the symmetrized graph/matrix. We measure the
communication volume on the original nonsymmetric problem, since this typically
corresponds to the communication cost for the user, and we show order-of-magnitude
differences. For these experiments we partitioned the matrix rows, but results for
column partitioning were similar.
These experiments were run on a 12-core workstation. We ran Zoltan on 12
MPI processes and partitioned into 12 parts. The test matrices were taken from the
UF collection [7], and vary in their degree of symmetry from 0 to 95%. We see from
Table 2 that hypergraph partitioning directly on A gives communication volume at
least one order of magnitude smaller than graph partitioning on the symmetrized
version in half the test cases. This is substantially different from the 30 − 38%
average reduction observed in [4]. We arranged the matrices in decreasing degree
of symmetry. Observe that hypergraph partitioning performs relatively better on
the highly nonsymmetric matrices. Also note that there is essentially no difference
in quality between Zoltan PHG as a graph partitioner and ParMetis for these cases.
We conjecture the difference is negligible because the error made in the model by
symmetrizing the matrix is far greater than differences in the implementation.
Note that some of the problems in the 22 symmetric test problems were origi-
nally unsymmetric problems (like citeseer and DBLP data) but were symmetrized
for graph partitioning. We do not have the unsymmetric versions of these problems
so we could not use those here.
Table 2. Comparison of communication volume (CV-sum) for
nonsymmetric and the corresponding symmetrized matrices. PHG
was used as a hypergraph partitioner on A and as a graph partitioner
on A_sym ≡ A + A^T.
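For concreteness, the symmetrization named in the caption can be sketched as below (SciPy-based and illustrative; only the nonzero pattern matters to the partitioner).

from scipy.sparse import csr_matrix

def symmetrize(A):
    # A_sym = A + A^T as a structural (pattern) union: the graph
    # partitioner runs on A_sym, while communication volume is still
    # measured on the original nonsymmetric A.
    A_sym = (A + A.T).tocsr()
    A_sym.data[:] = 1   # keep the pattern only, discard summed values
    return A_sym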
5. Conclusions
We have evaluated the parallel performance of Zoltan PHG, both as a graph and
hypergraph partitioner on test graphs from the DIMACS challenge data set. We
also made comparisons to ParMetis, a popular graph partitioner. We observed that
ParMetis consistently obtained best edge cut (EC), as we expected. Surprisingly,
ParMetis also obtained lower communication volume (CV) in a lot of the symmetric
problems. This raises the question: Is hypergraph partitioning worth it? A key
advantage of hypergraph partitioning is that it accurately minimizes communication
volume [4, 12]. It appears that the superiority of the hypergraph model is not
reflected in current software. We believe that one reason Zoltan PHG does relatively
poorly on undirected graphs is that symmetry is not preserved during coarsening,
unlike in graph partitioners. Future research should consider hypergraph partitioners
with symmetric coarsening, to try to combine the best of both methods.
We further showed that hypergraph partitioners are superior to graph parti-
tioners on nonsymmetric data. The reduction in communication volume can be one
or two orders of magnitude. This is a much larger difference than previously ob-
served. This may in part be due to the selection of data sets, which included some
new areas such as weblink matrices. A common approach today is to symmetrize
a nonsymmetric matrix and partition A + AT . We demonstrated this is often a
poor approach, and with the availability of the PHG parallel hypergraph parti-
tioner in Zoltan, we believe many applications could benefit from using hypergraph
partitioners without any symmetrization.
Our results confirm that it is important to use a hypergraph partitioner on
directed graphs (nonsymmetric matrices). However, for naturally undirected graphs
(symmetric matrices), graph partitioners perform better. If a single partitioner for
all cases is desired, then Zoltan-PHG is a reasonable universal partitioner.
Acknowledgements
We thank Karen Devine for helpful discussions.
References
[1] E. Bavier, M. Hoemmen, S. Rajamanickam, and H. Thornquist, Amesos2 and Belos: Direct
and iterative solvers for large sparse linear systems, Scientific Programming 20 (2012), no. 3,
241–255.
[2] E. G. Boman, U. V. Catalyurek, C. Chevalier, and K. D. Devine, The Zoltan and Isorropia
parallel toolkits for combinatorial scientific computing: Partitioning, ordering and coloring,
Scientific Programming 20 (2012), no. 2.
[3] T. Bui and C. Jones, A heuristic for reducing fill in sparse matrix factorization, Proc. 6th
SIAM Conf. Parallel Processing for Scientific Computing, SIAM, 1993, pp. 445–452.
[4] Ü. Çatalyürek and C. Aykanat, Hypergraph-partitioning-based decomposition for parallel
sparse-matrix vector multiplication, IEEE Trans. Parallel Dist. Systems 10 (1999), no. 7,
673–693.
[5] Ü. Çatalyürek and C. Aykanat, A fine-grain hypergraph model for 2D decomposition of sparse matrices, Proc. IPDPS 8th Int’l Workshop on Solving Irregularly Structured Problems in Parallel (Irregular 2001), April 2001.
[6] Ü. Çatalyürek and C. Aykanat, A hypergraph-partitioning approach for coarse-grain decomposition, Proc. Supercomputing 2001, ACM, 2001.
[7] Timothy A. Davis and Yifan Hu, The University of Florida sparse matrix collection,
ACM Trans. Math. Software 38 (2011), no. 1, Art. 1, 25, DOI 10.1145/2049662.2049663.
MR2865011 (2012k:65051)
[8] Karen Devine, Erik Boman, Robert Heaphy, Bruce Hendrickson, and Courtenay Vaughan,
Zoltan data management services for parallel dynamic applications, Computing in Science
and Engineering 4 (2002), no. 2, 90–97.
[9] K.D. Devine, E.G. Boman, R.T. Heaphy, R.H. Bisseling, and U.V. Catalyurek, Parallel hy-
pergraph partitioning for scientific computing, Proc. of 20th International Parallel and Dis-
tributed Processing Symposium (IPDPS’06), IEEE, 2006.
[10] C. M. Fiduccia and R. M. Mattheyses, A linear-time heuristic for improving network parti-
tions, Proc. 19th IEEE Design Automation Conf., 1982, pp. 175–181.
[11] B. Hendrickson and R. Leland, A multilevel algorithm for partitioning graphs, Proc. Super-
computing ’95, ACM, December 1995.
[12] Bruce Hendrickson and Tamara G. Kolda, Graph partitioning models for parallel comput-
ing, Parallel Comput. 26 (2000), no. 12, 1519–1534, DOI 10.1016/S0167-8191(00)00048-X.
MR1786938
[13] Bruce Hendrickson and Robert Leland, The Chaco user’s guide, version 1.0, Tech. Report
SAND93-2339, Sandia National Laboratories, 1993.
[14] G. Karypis and V. Kumar, METIS: Unstructured graph partitioning and sparse matrix
ordering system, Tech. report, Dept. Computer Science, University of Minnesota, 1995,
http://www.cs.umn.edu/~karypis/metis.
[15] G. Karypis and V. Kumar, ParMetis: Parallel graph partitioning and sparse matrix ordering library, Tech. Report 97-060, Dept. Computer Science, University of Minnesota, 1997, http://www.cs.umn.edu/~metis.
[16] George Karypis, Rajat Aggarwal, Vipin Kumar, and Shashi Shekhar, Multilevel hypergraph
partitioning: Applications in VLSI domain, Proc. 34th Design Automation Conf., ACM, 1997,
pp. 526–529.
[17] George Karypis and Vipin Kumar, A fast and high quality multilevel scheme for partition-
ing irregular graphs, SIAM J. Sci. Comput. 20 (1998), no. 1, 359–392 (electronic), DOI
10.1137/S1064827595287997. MR1639073 (99f:68158)
[18] B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell
System Technical Journal 49 (1970), 291–307.
[19] F. Pellegrini, SCOTCH 3.4 user’s guide, Research Rep. RR-1264-01, LaBRI, Nov. 2001.
[20] F. Pellegrini, PT-SCOTCH 5.1 user’s guide, Research rep., LaBRI, 2008.
[21] Brendan Vastenhouw and Rob H. Bisseling, A two-dimensional data distribution method for
parallel sparse matrix-vector multiplication, SIAM Rev. 47 (2005), no. 1, 67–95 (electronic),
DOI 10.1137/S0036144502409019. MR2149102 (2006a:65070)
[22] Andy Yoo, Allison H. Baker, Roger Pearce, and Van Emden Henson, A scalable eigensolver
for large scale-free graphs using 2d graph partitioning, Proceedings of 2011 International
Conference for High Performance Computing, Networking, Storage and Analysis (New York,
NY, USA), SC ’11, ACM, 2011, pp. 63:1–63:11.
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11704

UMPa: A Multi-objective, Multi-level Partitioner for Communication Minimization
Ümit V. Çatalyürek, Mehmet Deveci, Kamer Kaya, and Bora Uçar
1. Introduction
In parallel computing, the problem of distributing communicating tasks among
the available processing units is important. To solve this problem, several graph
and hypergraph models have been proposed [6, 7, 9, 12, 20]. These models transform
the problem at hand to a balanced partitioning problem. The balance restriction
on part weights in conventional partitioning corresponds to the load balance in
the parallel environment, and the minimized objective function corresponds to the
total communication volume between processing units. Both criteria are crucial
in practice for obtaining short execution times, using less power, and utilizing the
computation and communication resources better.
In addition to the total data transfer, other communication metrics have been
investigated before, e.g., the total number of messages sent [19], or the maximum vol-
ume of messages sent and/or received by a processor [4, 19]. Even with perfect
load balancing and minimized total data transfer, there can be a bottleneck pro-
cessing unit which participates in most of the data transfers. This can create a
problem especially for data intensive applications where reducing the amount of
data transferred by the bottleneck processing unit can improve the total execution
time significantly.
In this work, given a task graph, our main objective is distributing its tasks
evenly and minimizing the maximum amount of data sent by a processing unit.
Previous studies addressing different communication cost metrics (such as [4, 19])
work in two phases. In the first phase, the total volume of communication is
reduced, and in the second phase the other metrics are addressed. We propose a
directed hypergraph model and partition the related hypergraph with a multi-level
approach and a novel K-way refinement heuristic. While minimizing the primary
objective function, our refinement heuristic also takes the maximum data sent and
received by a processing unit and the total amount of data transfer into account
by employing a tie-breaking scheme. Therefore, our approach is different from the
existing studies in that the objective functions are minimized all at the same time.
The organization of the paper is as follows. In Section 2, the background mate-
rial on graph and hypergraph partitioning is given. Section 2.3 shows the differences
of the graph and hypergraph models and describes the proposed directed hyper-
graph model. In Section 3, we present our multi-level, multi-objective partitioning
tool UMPa (pronounced as “Oompa”). Section 4 presents the experimental results,
and Section 5 concludes the paper.
2. Background
2.1. Hypergraph partitioning. A hypergraph H = (V, N ) is defined as a set
of vertices V and a set of nets (hyperedges) N among those vertices. A net n ∈ N
is a subset of vertices and the vertices in n are called its pins. The number of pins
of a net is called its size, and the degree of a vertex is equal to the number of
nets it is connected to. In this paper, we will use pins[n] and nets[v] to represent
the pin set of a net n and the set of nets vertex v is connected to, respectively.
The vertices can be associated with weights, denoted with w[·], and the nets can
be associated with costs, denoted with c[·].
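In code, these definitions map onto very small data structures. The dict-based Python sketch below mirrors the pins[·] and nets[·] notation; it is illustrative only and says nothing about UMPa's internal storage.

# pins[n]: pin list of net n; nets[v]: nets that vertex v is connected to.
pins = {"n1": ["v1", "v2", "v3"], "n2": ["v2", "v4"]}
nets = {}
for n, ps in pins.items():
    for v in ps:
        nets.setdefault(v, set()).add(n)
w = {v: 1 for v in nets}   # vertex weights w[.]
c = {n: 1 for n in pins}   # net costs c[.]
# size of n1 = len(pins["n1"]) = 3; degree of v2 = len(nets["v2"]) = 2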
A K-way partition of a hypergraph H is denoted as Π = {V1, V2, . . . , VK}, where
• parts are pairwise disjoint, i.e., Vk ∩ Vℓ = ∅ for all 1 ≤ k < ℓ ≤ K,
• each part Vk is a nonempty subset of V, i.e., Vk ⊆ V and Vk ≠ ∅ for 1 ≤ k ≤ K,
• the union of the K parts is equal to V, i.e., ∪_{k=1}^{K} Vk = V.
Let Wk denote the total vertex weight in Vk (i.e., Wk = Σ_{v∈Vk} w[v]) and Wavg
denote the weight of each part when the total vertex weight is equally distributed
(i.e., Wavg = (Σ_{v∈V} w[v])/K). If each part Vk ∈ Π satisfies the balance criterion
Wk ≤ (1 + ε) Wavg for 1 ≤ k ≤ K, the partition is said to be balanced, where ε is
the allowed imbalance ratio.
The objective is the connectivity-1 cutsize, χ(Π) = Σ_{n∈N} c[n](λn − 1), where λn
is the number of parts connected by net n. In this metric, each cut net n contributes
c[n](λn − 1) to the cutsize. The hypergraph partitioning problem can be defined as
the task of finding a balanced partition Π with K parts such that χ(Π) is minimized.
This problem is also NP-hard [16].
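Using a dict representation like the one sketched above, the connectivity-1 cutsize can be computed directly; again, this is an illustrative fragment, not UMPa code.

def cutsize(pins, c, part):
    # pins: net -> iterable of pins; c: net -> cost c[n];
    # part: vertex -> part id. Returns chi(Pi) = sum_n c[n]*(lambda_n - 1).
    total = 0
    for n, ps in pins.items():
        lam = len({part[v] for v in ps})  # lambda_n: parts net n connects
        total += c[n] * (lam - 1)         # an uncut net (lam = 1) adds 0
    return total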
2.2. K-way partitioning and multi-level framework. Arguably, the multi-
level approach [3] is the most successful heuristic for the hypergraph partitioning
problem. Although it was first proposed for recursive-bisection-based graph
partitioning, it also works well for hypergraphs [2, 5, 7, 13, 17]. In the multi-level
approach, a given hypergraph is coarsened to a much smaller one, a partition is
obtained on the smallest hypergraph, and that partition is projected to the
original hypergraph. These three phases will be called the coarsening, initial par-
titioning, and uncoarsening phases, respectively. The coarsening and uncoarsening
phases have multiple levels. In a coarsening level, similar vertices are merged to
make the hypergraph smaller. In the corresponding uncoarsening level, the merged
vertices are split, and the partition of the coarser hypergraph is refined for the finer
one.
Most of the multi-level partitioning tools used in practice are based on recursive
bisection. In recursive bisection, the multi-level approach is used to partition a given
hypergraph into two. Each of these parts is further partitioned into two recursively
until K parts are obtained in total. Hence, to partition a hypergraph into K = 2k ,
the recursive bisection approach uses K − 1 coarsening, initial partitioning, and
uncoarsening phases.
Several successful clustering heuristics have been proposed to coarsen a hypergraph.
Although their similarity metrics aim to reduce the cutsize, they cannot find an
optimal solution, since the problem is NP-hard. Hence, an optimal partition of
the coarser hypergraph may not be optimal for the finer one. To obtain better
partitions, iterative-improvement-based heuristics are used to refine the coarser
hypergraph's partition after projecting it onto the finer one. In practice, Kernighan-Lin (KL) [15] and
Fiduccia-Mattheyses (FM) [11] based refinement heuristics that depend on vertex
swaps and moves between two parts are used.
2.3. Task graph and communication volume metrics. Let A = (T , C)
be a task graph where T is the set of tasks to be executed, and C is the set of
communications between pairs of tasks. We assume that the execution time of
each task may differ, hence each task t ∈ T is associated with an execution time
exec(t). Each task ti ∈ T sends a different amount of data, data(ti), to each tj
such that (ti, tj) ∈ C. The communications between tasks may be uni-directional;
that is, (ti, tj) ∈ C does not imply (tj, ti) ∈ C. In our parallel setting, we assume
the owner-computes rule, and hence each task of A is executed by the processing unit
to which it is assigned. Let Ti be the set of tasks assigned to processing unit
Pi. Since it is desirable to distribute the tasks evenly, the computational load
Σ_{t∈Ti} exec(t) should be almost the same for each Pi. In addition to that, two
heavily communicating tasks should be assigned to the same processing unit since
less data transfer over the network is needed in this case. The total amount of data
transfer throughout the execution of the tasks is called the total communication
volume (totV ). Note that when a task t ∈ Ti needs to send data to a set of tasks
This is also the default metric in PaToH [8], a well-known hypergraph partitioner.
In each level ℓ, we start with a finer hypergraph Hℓ and obtain a coarser one Hℓ+1.
If VC ⊂ Vℓ is a subset of vertices deemed to be clustered, we create the cluster
vertex u ∈ Vℓ+1 where nets[u] = ∪_{v∈VC} nets[v]. We also update the pin sets of the
nets in nets[u] accordingly.
Since we need the direction, i.e., source vertex information for each net to
minimize maxSV and maxSRV , we always store the source vertex of a net n ∈ N
as the first pin in pins[n]. To maintain this information, when a cluster vertex u
is formed in the coarsening phase, we put u at the head of pins[n] for each net n
whose source vertex is in the cluster.
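One clustering step that maintains this source-first invariant can be sketched as follows; the function and variable names are illustrative, not UMPa's.

def form_cluster_vertex(nets, pins, cluster, u):
    # nets: vertex -> set of nets; pins: net -> pin list whose first
    # entry is the net's source vertex (directed hypergraph).
    # Merge the vertices in `cluster` into the new cluster vertex u.
    nets[u] = set().union(*(nets[v] for v in cluster))
    for n in nets[u]:
        source_clustered = pins[n][0] in cluster
        rest = [p for p in pins[n] if p not in cluster]
        # u goes to the head of pins[n] iff the source was clustered,
        # preserving the source-first invariant; otherwise the original
        # source stays first and u is appended.
        pins[n] = [u] + rest if source_clustered else rest + [u]
    for v in cluster:
        del nets[v]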
3.3. Initial partitioning phase. To obtain an initial partition for the coars-
est hypergraph, we use PaToH [8], which is proven to produce high quality par-
titions with respect to the total communication volume metric [7]. We execute
PaToH ten times and get the best partition according to the maxSV metric. We
have several reasons to use PaToH. First, although our main objective is minimizing
maxSV , since we also take totV into account, it is better to start with an initial
partition having a good total communication volume. Second, since totV is the
sum of the send volumes of all parts, as we observed in our preliminary experi-
ments, minimizing it may also be good for both maxSV and maxSRV . Also, as
stated in [2], using recursive bisection and FM-based improvement heuristics for
partitioning the coarsest hypergraph is favorable due to small net sizes and high
vertex degrees.
3.4. K-way refinement of communication volume metrics. In an un-
coarsening level, which corresponds to the ℓth coarsening level, we project the
partition Πℓ+1 obtained for Hℓ+1 to Hℓ. Then, we refine it by using a novel K-way
refinement heuristic, which is described below.
refinement heuristic which is described below.
Given a partition Π, let a vertex be a boundary vertex if it is in the pin set
of at least one cutnet. Let Λ(n, p) = |pins[n] ∩ Vp | be the number of pins of net
n in part p, and part[u] be the current part of u. The proposed heuristic runs in
multiple passes where in a pass it visits each boundary vertex u and either leaves
it in part[u], or moves it to another part according to some move selection policy.
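For concreteness, Λ and the boundary predicate can be computed as in the following sketch (hypothetical structures; pins[] and part[] as above):

#include <vector>

// Sketch: Lambda[n][p] = |pins[n] ∩ V_p|; a vertex is a boundary vertex
// if it is a pin of at least one cutnet (a net with pins in >= 2 parts).
void computeLambdaAndBoundary(const std::vector<std::vector<int>>& pins,
                              const std::vector<int>& part, int K,
                              std::vector<std::vector<int>>& Lambda,
                              std::vector<char>& isBoundary) {
  Lambda.assign(pins.size(), std::vector<int>(K, 0));
  isBoundary.assign(part.size(), 0);
  for (std::size_t n = 0; n < pins.size(); ++n) {
    int connectivity = 0;
    for (int v : pins[n])
      if (Lambda[n][part[v]]++ == 0) ++connectivity;
    if (connectivity >= 2)              // n is a cutnet
      for (int v : pins[n]) isBoundary[v] = 1;
  }
}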
Algorithm 1 shows a pass of the proposed refinement heuristic. For each visited
boundary vertex u and for each available part p other than part[u], the heuristic
computes how the communication metrics are affected when u is moved to p. This
is accomplished in three steps. First, u is removed from part[u], and the leave gains
on the send/receive volumes of the parts are computed (after line 1). Second, u
is put into a candidate part p and the arrival losses on the send/receive volumes
are computed (after line 2). Last, the maximum send, maximum send-receive, and
total volumes are computed for this move (after line 4).
3.4.1. Move selection policy and tie-breaking scheme. Our move selection policy
given in Algorithm 2 favors the moves with the maximum gain on maxSV and never
allows a move with negative gain on the same metric. To take other metrics into
account, we use a tie-breaking scheme which is enabled when two different moves of
a vertex u have the same maxSV gain. In this case, the move with maxSRV gain
is selected as the best move. If the gains on maxSRV are also equal then the move
with maximum gain on totV is selected. We do not allow a vertex move without
a positive gain on any of the communication metrics. As the experimental results
show, this move selection policy and tie-breaking scheme have a positive impact on
all the metrics.
Algorithm 2: MoveSelect
Data: moveSV, moveSRV, moveV, p,
bestMaxSV, bestMaxSRV, bestTotV, bestPart
  select ← 0
  if moveSV < bestMaxSV then
      select ← 1                                      Main objective
1 else if moveSV = bestMaxSV then
      if moveSRV < bestMaxSRV then
          select ← 1                                  First tie break
2     else if moveSRV = bestMaxSRV and moveV < bestTotV then
          select ← 1                                  Second tie break
  if select = 1 then
      bestMaxSV ← moveSV
      bestMaxSRV ← moveSRV
      bestTotV ← moveV
      bestPart ← p
Figure 1 shows a sample graph with 8 vertices and 13 edges partitioned into
3 parts. Assume that this is a partial illustration of the boundary vertices, and
that no move violates the balance criterion. Each row in the table contains a possible
vertex move and the resulting changes in the communication volume metrics. In the initial
configuration, maxSV = 6, maxSRV = 9, and totV = 12. If we move v3 from
part V2 to part V3 , we reduce all metrics by 1. On the other hand, if
we move v3 to V1 , we decrease maxSV and maxSRV , but totV does not change.
In this case, since its gain on totV is better, the tie-breaking scheme favors the
move of v3 to V3 . Moreover, the moves v4 to V1 , v6 to V3 , and v7 to V3 are further
examples where the tie-breaking scheme is used. Note that we allow all the moves in
the first 13 rows of the table, including the two moves of v3 just discussed. However,
we do not allow the ones in the last three rows.
3.4.2. Implementation details. During the gain computations, the heuristic uses
the connectivity information between nets and parts stored in data structures λ and
Λ. These structures are constructed after the initial partitioning phase and then
maintained throughout the uncoarsening phase. Since the connectivity information changes
after each vertex move, when a vertex u is moved, we visit the nets of u and update
the data structures accordingly. Also, when new vertices become boundary
vertices, they are inserted into the boundary array and visited in the same pass.
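A sketch of this incremental update (same illustrative structures as above; lambda[n] denotes the connectivity of net n):

#include <vector>

// Sketch: maintain Lambda (and the derived connectivity lambda) when
// vertex u moves from part src to part dst; nets[u] lists u's nets.
void moveVertex(int u, int src, int dst, std::vector<int>& part,
                const std::vector<std::vector<int>>& nets,
                std::vector<std::vector<int>>& Lambda,
                std::vector<int>& lambda) {
  for (int n : nets[u]) {
    if (--Lambda[n][src] == 0) --lambda[n];  // n no longer touches src
    if (Lambda[n][dst]++ == 0) ++lambda[n];  // n now touches dst
  }
  part[u] = dst;
}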
4. Experimental results
UMPa is tested on a computer with two quad-core 2.27 GHz Intel Xeon CPUs
and 48 GB of main memory. It is implemented in C++ and compiled with g++ version
4.5.2.
To obtain our data set, we used several graphs from the testbed of the 10th DIMACS
Implementation Challenge [10]. We remove relatively small graphs containing
fewer than $10^4$ vertices, as well as extremely large ones. There are 123 graphs
in our data set, from 10 graph classes. For each graph, we execute UMPa and the
other algorithms 10 times. The results in the tables are the averages of these 10
executions.
To see the effect of UMPa’s K-way partitioning structure and its tie-breaking
scheme, we compare it with two different refinement approaches and PaToH. The
first approach partitions the hypergraph into K parts with PaToH’s recursive
bisection scheme and refines the result using the proposed K-way refinement algorithm
without employing the tie-breaking scheme. The second approach uses UMPa,
again without tie breaking. To remove tie breaking, we remove the else statements
at the lines labeled 1 and 2 in Algorithm 2.
Table 1 gives the average performance of all these approaches normalized with
respect to PaToH’s performance. Without tie breaking, refining PaToH’s output
reduces the maximum send volume by 8%. However, it increases the maximum
send-receive and total volumes by 5% and 3%, respectively. Hence, we do not
suggest using the refinement heuristic alone, without tie breaking. On the
other hand, if it is used within the multi-level structure of UMPa, we obtain better
results even without a tie-breaking scheme.
Table 1 shows that UMPa’s multi-level structure helps to obtain 17% and 7%
smaller volumes than PaToH’s partitions in terms of maxSV and maxSRV , respectively.
But since PaToH minimizes the total communication volume, there is a 6%
overhead on totV . Considering the 17% reduction on maxSV , this overhead is
acceptable. However, we can still reduce all the communication metrics by a further
9%–10% by employing the proposed tie-breaking scheme. For K = 4, this leads to
a 34% better maximum send volume, which is impressive since even the total
communication volume is 16% less compared with PaToH. In fact, for all K values,
UMPa manages to reduce maxSV and maxSRV on average. The percentage of
improvement decreases with increasing K. This is to be expected since when K
is large, the total volume is distributed over more parts, and the maximum
send or send-receive volume is smaller. Still, on average, the reductions on
maxSV , maxSRV , and totV are 26%, 16%, and 4%, respectively.
Tables 2 and 3 show the performance of PaToH and UMPa in terms of the
communication metrics and time. There are 20 graphs in each table, selected from
the 10 graph classes in the DIMACS testbed. For each graph class, we select the two
graphs (displayed consecutively in the tables) for which UMPa obtains the best and
worst improvements on maxSV . The numbers given in the tables are averages of
10 different executions. For all experiments with K = 16 parts, as Table 2 shows,
UMPa obtains a better maxSV value than PaToH on average. When K = 4, 64, and
256, PaToH obtains a better average maxSV for only 16, 4, and 1 graphs out of
123, respectively.
There are some instances in the tables for which UMPa improves maxSV sig-
nificantly. For example, for graph ut2010 in Table 2, the maxSV value is reduced
from 1506 to 330 with approximately 78% improvement. Furthermore, for the same
graph, the improvements on maxSRV and totV are 75% and 67%, respectively.
Table 2. Communication metrics and running times of PaToH and UMPa for K = 16 (averages of 10 executions).

                          PaToH                               UMPa
Graph             maxSV   maxSRV     totV     Time    maxSV   maxSRV     totV     Time
coPapersDBLP     62,174  139,600  673,302    91.45   53,619  117,907  842,954   145.47
as-22july06       1,506    5,063   12,956     0.63    1,144    3,986   13,162     2.70
road_central        500      999    3,926   112.64      279      576    2,810    27.85
smallworld       12,043   24,020  188,269     3.09   10,920   21,844  174,645    19.27
delaunay_n14        119      235    1,500     0.19      115      236    1,529     0.88
delaunay_n17        351      706    4,100     1.09      322      655    4,237     2.54
hugetrace-00010   2,113    4,225   25,809    93.99    2,070    4,144   28,572    43.39
hugetric-00020    1,660    3,320   20,479    60.96    1,601    3,202   22,019    29.51
venturiLevel3     1,774    3,548   19,020    27.41    1,640    3,282   20,394    16.01
adaptive          2,483    4,967   27,715    54.00    2,345    4,692   29,444    29.33
rgg_n_2_15_s0       146      293    1,519     0.34      119      254    1,492     1.03
rgg_n_2_21_s0     1,697    3,387   19,627    37.86    1,560    3,215   20,220    16.66
tn2010            2,010    3,666   13,473     1.26    1,684    3,895   56,780     1.54
ut2010            1,506    2,673    3,977     0.43      330      677    1,303     0.82
af_shell9         1,643    3,287   17,306    14.83    1,621    3,242   18,430     8.64
audikw1          15,119   29,280  145,976   161.23   11,900   24,182  159,640    77.16
asia.osm             63      125      409    40.43       30       62      323     7.67
belgium.osm         141      281    1,420     4.80    120.6      243    1,406     1.96
memplus             986    7,138    7,958     0.23      686    3,726   10,082     0.72
t60k                155      310    1,792     0.29    148.5      297    1,890     0.99
When K = 256 (Table 3), for the graph memplus, UMPa obtains approximately
50% improvement on maxSV and maxSRV . Although totV increases by 26% at the
same time, this is acceptable considering the improvements on the first two metrics.
Table 4 shows the relative performance of UMPa in terms of execution time
with respect to PaToH. As expected, due to the complexity of the K-way refinement
heuristic, UMPa is slower than PaToH, especially when the number of parts is large.
Table 3. Communication metrics and running times of PaToH and UMPa for K = 256 (averages of 10 executions).

                            PaToH                                  UMPa
Graph               maxSV   maxSRV       totV     Time     maxSV   maxSRV       totV      Time
coPapersCiteseer    7,854   16,765    577,278   224.09     5,448   11,615    579,979    658.21
coPapersDBLP       14,568   34,381  1,410,966   143.97    10,629   23,740  1,371,425   1038.86
as-22july06         1,555    7,128     28,246     1.01       617    4,543     33,347     12.62
smallworld          1,045    2,078    232,255     4.55       877    1,751    208,860     36.24
delaunay_n20          301      600     57,089    17.98       279      566     58,454     68.85
delaunay_n21          420      844     80,603    35.01       398      813     83,234    107.35
hugetrace-00000       407      814     74,563    55.51       415      831     80,176    123.66
hugetric-00010        502    1,004     91,318    92.45       477      955     97,263    167.69
adaptive              753    1,505    143,856    96.60       735    1,472    152,859    224.30
venturiLevel3         568    1,137    107,920    49.97       564    1,132    114,119    132.02
rgg_n_2_22_s0         799    1,589    145,902   151.30       724    1,495    147,331    249.23
rgg_n_2_23_s0       1,232    2,432    219,404   347.32     1,062    2,168    221,454    446.78
ri2010              3,206    5,989    281,638     0.72     2,777    5,782    279,941      8.66
tx2010              5,139    9,230    124,033     8.47     3,011    7,534    117,960     15.55
af_shell10            898    1,792    174,624    89.90       885    1,769    184,330    158.04
audikw1             4,318    8,299    680,590   322.57     3,865    7,607    692,714    822.73
asia.osm               72      146      4,535    72.37        66      135      4,484     18.79
great-britain.osm     104      209     11,829    50.52        82      168     11,797     25.51
finan512              199      420     36,023     2.75       192      437     36,827     27.70
memplus             1,860    7,982     15,785     0.49       946    4,318     19,945      8.25
Table 4. Relative execution time of UMPa with respect to PaToH.

K                4     16     64    256   Avg.
Relative time  1.02   1.29   2.01   5.76   1.98
References
[1] C. J. Alpert and A. B. Kahng, Recent directions in netlist partitioning: A survey, Integration,
the VLSI Journal 19 (1995), no. 1–2, 1–81.
[2] Cevdet Aykanat, B. Barla Cambazoglu, and Bora Uçar, Multi-level direct k-way hypergraph
partitioning with multiple constraints and fixed vertices, Journal of Parallel and Distributed
Computing 68 (2008), no. 5, 609–625.
[3] S. T. Barnard and H. D. Simon, Fast multilevel implementation of recursive spectral bisec-
tion for partitioning unstructured problems, Concurrency: Practice and Experience 6 (1994),
no. 2, 67–95.
[4] Rob H. Bisseling and Wouter Meesen, Communication balancing in parallel sparse
matrix-vector multiplication, Electron. Trans. Numer. Anal. 21 (2005), 47–65 (electronic).
MR2195104 (2007c:65040)
[5] T. N. Bui and C. Jones, A heuristic for reducing fill-in in sparse matrix factorization, Proc. 6th
SIAM Conf. Parallel Processing for Scientific Computing, SIAM, 1993, pp. 445–452.
[6] Ü. V. Çatalyürek and C. Aykanat, A hypergraph model for mapping repeated sparse matrix-
vector product computations onto multicomputers, Proc. International Conference on High
Performance Computing, December 1995.
[7] Ü. V. Çatalyürek and C. Aykanat, Hypergraph-partitioning based decomposition for parallel
sparse-matrix vector multiplication, IEEE Transactions on Parallel and Distributed Systems 10 (1999), no. 7, 673–693.
[8] Ü. V. Çatalyürek and C. Aykanat, PaToH: A multilevel hypergraph partitioning tool, version
3.0, Bilkent University, Department of Computer Engineering, Ankara, 06533 Turkey. PaToH
is available at http://bmi.osu.edu/~umit/software.htm, 1999.
[9] Ümit V. Çatalyürek, Cevdet Aykanat, and Bora Uçar, On two-dimensional sparse matrix
partitioning: models, methods, and a recipe, SIAM J. Sci. Comput. 32 (2010), no. 2, 656–
683, DOI 10.1137/080737770. MR2609335 (2011g:05176)
[10] 10th DIMACS implementation challenge: Graph partitioning and graph clustering, 2011,
http://www.cc.gatech.edu/dimacs10/.
[11] C. M. Fiduccia and R. M. Mattheyses, A linear-time heuristic for improving network parti-
tions, Proc. 19th Design Automation Conference, 1982, pp. 175–181.
[12] Bruce Hendrickson and Tamara G. Kolda, Graph partitioning models for parallel comput-
ing, Parallel Comput. 26 (2000), no. 12, 1519–1534, DOI 10.1016/S0167-8191(00)00048-X.
MR1786938
[13] Bruce Hendrickson and Robert Leland, A multilevel algorithm for partitioning graphs, Proc.
Supercomputing (New York, NY, USA), ACM, 1995.
[14] George Karypis, Multilevel hypergraph partitioning, Multilevel optimization in VLSICAD,
Comb. Optim., vol. 14, Kluwer Acad. Publ., Dordrecht, 2003, pp. 125–154. MR2021997
[15] B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, The
Bell System Technical Journal 49 (1970), no. 2, 291–307.
[16] Thomas Lengauer, Combinatorial algorithms for integrated circuit layout, Applicable Theory
in Computer Science, John Wiley & Sons Ltd., Chichester, 1990. With a foreword by Bryan
Preas. MR1071382 (91h:68089)
[17] Aleksandar Trifunovic and William Knottenbelt, Parkway 2.0: A parallel multilevel hyper-
graph partitioning tool, Proc. ISCIS, LNCS, vol. 3280, Springer Berlin / Heidelberg, 2004,
pp. 789–800.
[18] Bora Uçar and Cevdet Aykanat, Minimizing communication cost in fine-grain partitioning of
sparse matrices, Computer and Information Sciences - ISCIS 2003 (A. Yazici and C. Şener,
eds.), Lecture Notes in Computer Science, vol. 2869, Springer Berlin / Heidelberg, 2003,
pp. 926–933.
[19] Bora Uçar and Cevdet Aykanat, Encapsulating multiple communication-cost metrics in par-
titioning sparse rectangular matrices for parallel matrix-vector multiplies, SIAM J. Sci. Com-
put. 25 (2004), no. 6, 1837–1859 (electronic), DOI 10.1137/S1064827502410463. MR2086821
(2005g:05092)
[20] C. Walshaw, M. G. Everett, and M. Cross, Parallel dynamic graph partitioning for adaptive
unstructured meshes, Journal of Parallel Distributed Computing 47 (1997), 102–108.
Table 5. The best maximum send volumes for UMPa for the
challenge instances. X means UMPa failed to obtain a partition
with the desired imbalance value.
Parts
Graph 2 4 8 16 32 64 128 256 512 1,024
as365                  1,080        790        590        421      316
asia.osm                  41         46         60         92       93
auto                   2,044      1,497      1,070        733      501
coAuthorsCiteseer     10,066      7,773      5,313      3,216    2,006
delaunay_n15             189        154        121         90       70
er-fact1.5-scale23 5,707,503  3,933,216  2,091,986  1,154,276  622,913
g3_circuit             1,266      1,630      1,151        938      536
great-britain.osm        134        114         92         78       58
hugebubbles-00010      3,012      1,948      1,522        822      609
hugetric-00000         1,274      2,206      1,117        804      458
kkt_power              6,162      6,069      4,508      3,078    2,088
kron_g500-logn17      36,656     53,381     55,314     49,657   42,272
kron_g500-logn21     459,454    351,785    245,355    168,870        X
m6                     1,487      2,034      1,427        762      568
nlpkkt160             71,708     55,235     49,700     36,483   25,107
nlr                    2,380      1,563        847        623      447
rgg_n_2_18_s0            516        431        330        248      195
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11699
Shape Optimizing Load Balancing for MPI-Parallel Adaptive Numerical Simulations

Henning Meyerhenke
Abstract. Load balancing is important for the efficient execution of numer-
ical simulations on parallel computers. In particular when the simulation do-
main changes over time, the mapping of computational tasks to processors
needs to be modified accordingly. Most state-of-the-art libraries addressing
this problem are based on graph repartitioning with a parallel variant of the
Kernighan-Lin (KL) heuristic. The KL approach has a number of drawbacks,
including the optimized metric and solutions with undesirable properties.
Here we further explore the promising diffusion-based multilevel graph
partitioning algorithm DibaP. We describe the evolution of the algorithm and
report on its MPI implementation PDibaP for parallelism with distributed
memory. PDibaP is targeted at small to medium scale parallelism with dozens
of processors. The presented experiments use graph sequences that imitate
adaptive numerical simulations. They demonstrate the applicability and qual-
ity of PDibaP for load balancing by repartitioning on this scale. Compared to
the faster ParMETIS, PDibaP’s solutions often have partitions with fewer ex-
ternal edges and a smaller communication volume in an underlying numerical
simulation.
1. Introduction
Numerical simulations are very important tools in science and engineering for
the analysis of physical processes modeled by partial differential equations (PDEs).
To make the PDEs solvable on a computer, they are discretized within the sim-
ulation domain, e. g., by the finite element method (FEM). Such a discretization
yields a mesh, which can be regarded as a graph with geometric (and possibly other)
information. Application areas of such simulations are fluid dynamics, structural
mechanics, nuclear physics, and many others [10].
The solutions of discretized PDEs are usually computed by iterative numerical
solvers, which have become classical applications for parallel computers. For effi-
ciency reasons the computational tasks, represented by the mesh elements, must
be distributed onto the processors evenly. Moreover, neighboring elements of the
mesh need to exchange their values in every iteration to update their own value.
implementation of PDibaP has been improved and adapted for MPI-parallel repar-
titioning. With this implementation we perform various repartitioning experiments
with benchmark graph sequences. These experiments are the first using PDibaP
for repartitioning and show the suitability of the disturbed diffusive approach.
The average quality of the partitions computed by PDibaP is clearly better than
that of the state-of-the-art repartitioners ParMETIS and parallel Jostle, while
PDibaP’s migration volume is usually comparable. It is important to note that
PDibaP’s improvement concerning the partition quality for the graph sequences is
even higher than in the case of static partitioning.
2. Related Work
We give a short introduction to the state-of-the-art of practical graph repar-
titioning algorithms and libraries which only require the adjacency information
about the graph and no additional problem-related information. For a broader
overview the reader is referred to Schloegel et al. [31]. The most recent advances
in graph partitioning are probably best covered in their entirety by the proceedings
volume [2] of which the present article is a part.
local improvement phase, two algorithms are used. On the coarse hierarchy levels,
a diffusive scheme takes care of balancing the subdomain sizes. Since this might af-
fect the partition quality negatively, a refinement algorithm is employed on the finer
levels. It aims at edge-cut minimization by profitable swaps of boundary vertices.
To address the load balancing problem in parallel applications, distributed
versions of the partitioners METIS, Jostle, and Scotch [6, 34, 39] have been
developed. Also, the tools Parkway [36], a parallel hypergraph partitioner, and
Zoltan [5], a suite of load balancing algorithms with focus on hypergraph parti-
tioning, need to be mentioned although they concentrate (mostly) on hypergraphs.
An efficient parallelization of the KL/FM heuristic that these parallel (hyper)graph
partitioners use is complex due to inherently sequential parts in this heuristic. For
example, one needs to ensure that during the KL/FM improvement no two neigh-
boring vertices change their partition simultaneously and destroy data consistency.
A coloring of the graph’s vertices is used by the parallel libraries ParMETIS [32]
and KaPPa [14] for this purpose.
After the load vectors have been computed this way independently for all k
parts, each vertex v is assigned to the partition it has obtained the highest load
from. This completes one TruncCons iteration, which can be repeated several
times (the total number is denoted by Λ subsequently) to facilitate sufficiently
large movements of the parts.
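A minimal sketch of one such TruncCons iteration under simplifying assumptions (unweighted edges, a fixed diffusion constant alpha, dense per-part load vectors; not the PDibaP implementation): ψ FOS steps are performed per part, followed by the argmax assignment.

#include <vector>

// Simplified sketch: psi FOS steps on the load vector of every part,
// then each vertex is assigned to the part from which it obtained the
// highest load. adj[v] lists the neighbors of v.
std::vector<int> truncConsIteration(const std::vector<std::vector<int>>& adj,
                                    std::vector<std::vector<double>> w,
                                    int psi, double alpha) {
  const std::size_t n = adj.size();
  for (auto& wp : w) {                          // independently for all parts
    for (int step = 0; step < psi; ++step) {
      std::vector<double> next(wp);
      for (std::size_t v = 0; v < n; ++v)
        for (int u : adj[v])
          next[v] += alpha * (wp[u] - wp[v]);   // one FOS load exchange
      wp.swap(next);
    }
  }
  std::vector<int> part(n, 0);
  for (std::size_t v = 0; v < n; ++v)           // argmax over the parts
    for (std::size_t p = 1; p < w.size(); ++p)
      if (w[p][v] > w[part[v]][v]) part[v] = static_cast<int>(p);
  return part;
}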
A vertex with the same amount of load as all its neighbors does not change its
load in the next FOS iteration. Due to the choice of initial loads, there are many
such inactive vertices in the beginning. In fact, only vertices incident to the cut
edges of the part under consideration are active initially. In principle each new
FOS iteration adds a new layer of active vertices, similar to BFS frontiers.
1 In general L represents the whole graph. Yet, sparsifying the matrix in certain areas (also
called partial graph coarsening) is possible and leads to a significant acceleration without sacrificing
partitioning quality considerably [24]. While the influence of partial graph coarsening on the
partitioning quality is low, the solutions of the linear systems become distorted and more difficult
to analyze. Moreover, the programming overhead is immense. As the next section introduces
a simpler and faster way of diffusive partitioning, we do not consider partial graph coarsening
further here.
We keep track of which vertices are active and which are not. Thereby, it is possible to skip
the inactive vertices when performing the local FOS calculations.
In our implementation the size of the matrix M for which we compute a matrix-
vector product locally in each iteration is not changed. Instead, inner products
involving inactive rows are not computed as we know their respective result does
not change in the current iteration. That way the computational effort is restricted
to areas close to the partition boundaries.
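Under the same simplifying assumptions as above, the following fragment sketches this optimization (adjacency lists instead of the actual matrix M; the tolerance is illustrative): a row is recomputed only if the vertex or one of its neighbors is active, and a vertex whose load changed becomes active.

#include <cmath>
#include <vector>

// Sketch of one local FOS step that skips inactive rows: the inner
// product of an untouched row is not computed, since its result cannot
// change in the current iteration.
void fosStepActive(const std::vector<std::vector<int>>& adj,
                   std::vector<double>& w, std::vector<char>& active,
                   double alpha) {
  std::vector<double> next(w);
  std::vector<char> nextActive(active);
  for (std::size_t v = 0; v < adj.size(); ++v) {
    bool touched = active[v];
    for (int u : adj[v]) touched = touched || active[u];
    if (!touched) continue;               // row skipped entirely
    for (int u : adj[v]) next[v] += alpha * (w[u] - w[v]);
    if (std::fabs(next[v] - w[v]) > 1e-12) nextActive[v] = 1;
  }
  w.swap(next);
  active.swap(nextActive);
}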
the current solution is interpolated to the next level until the process stops at
the input level. Sometimes the matching algorithm has hardly coarsened a level.
This happens for example to avoid star-like subgraphs with strongly varying vertex
degrees. Limited coarsening results in two very similar adjacent levels. Local
improvement with TruncCons on both of these levels would result in similar
solutions with an unnecessary running time investment. That is why in such a case
TruncCons is skipped on the finer level of the two.
For static partitioning, which is still an ongoing effort, edges in the cut be-
tween parts on different processors should be considered as matching edges as well.
Otherwise, the multilevel hierarchy contains only a few levels after which no more
edges are found for the matching. The development and/or integration of such a
more general matching is part of future work.
next larger candidate value is chosen as threshold. Another problem could be the
scheduled order in which migration takes place. It could happen that a processor
needs to move a number of vertices that it is about to obtain only by a later move. To
address this, we employ a conservative approach and rather move too few vertices
than too many. As compensation, the whole procedure is repeated iteratively
until a balanced partition is found.
5. Experiments
Here we present some of our experimental results comparing our PDibaP im-
plementation to the KL/FM-based load balancers ParMETIS and parallel Jostle.
5.1. Benchmark Data. Our benchmark set comprises two types of graph
sequences. The first one consists of three smaller graph sequences with 51 frames
each, having between approximately 1M and 3M vertices. The second
group contains two larger sequences of 36 frames each. Each frame in this group
has approximately 4.5M to 16M vertices. These sequences result in 50 and 35
repartitioning steps, respectively. We choose to (re)partition the smaller sequences
into k = 36 and k = 60 parts, while the larger ones are divided into k = 60 and
k = 84 parts. These values have been chosen as multiples of 12 because one of our
main test machines has 12 cores per node.
All graphs of these five sequences have a two-dimensional geometry and have
been generated to resemble adaptive numerical simulations such as those occurring
in computational fluid dynamics. A visual impression of some of the data (in smaller
versions) is available in previous work [24, p. 562f.]. The graph of frame i + 1 in
a given sequence is obtained from the graph of frame i by changes restricted to
local areas. As an example, some areas are coarsened, whereas others are refined.
These changes are in most cases due to the movement of an object in the simulation
domain and often result in unbalanced subdomain sizes. For more details the reader
is referred to Marquardt and Schamberger [20], who have provided the generator
for the sequence data.2 Some of these frames are also part of the archive of the
10th DIMACS Implementation Challenge [1].
5.2. Hardware and Software Settings. We have conducted our experi-
ments on a cluster with 60 Fujitsu RX200S6 nodes, each having two Intel Xeon X5650
processors at 2.66 GHz (resulting in 12 compute cores per node). Moreover, each node
has 36 GB of main memory. The interconnect is 4x SDR InfiniBand (HCA PCI-e),
and the operating system is CentOS 5.4. PDibaP is implemented in C/C++.
PDibaP as well as ParMETIS and parallel Jostle have been compiled with In-
tel C/C++ compiler 11.1 and MVAPICH2 1.5.1 as MPI library. The number of
MPI processes always equals the number of parts k in the partition to be computed.
The main parameters controlling the running time and quality of the DibaP
algorithm are the number of iterations in the (re)partitioning algorithms Bubble-
FOS/C and TruncCons. For our experiments we perform 3 iterations within
Bubble-FOS/C, with one AssignPartition and one ComputeCenters operation,
respectively. The faster local approach TruncCons is used on all multilevel hierar-
chy levels with graph sizes above 12,000 vertices. For TruncCons, the parameter
settings Λ = 9 and ψ = 14 for the outer and inner iteration, respectively. These
2 Some of the input data can be downloaded from the website http://www.upb.de/cs/henningm/graph.html.
settings provide a good trade-off between running time and quality. The allowed
imbalance is set to the default value of 3% for all tools.
5.3. Results. In addition to the graph partitioning metrics edge-cut and com-
munication volume (of the underlying application based on the computed partition),
we are also interested in migration costs. These costs result from data changing
their processor after repartitioning. We count the number of vertices that change
their subdomain from one frame to the next as a measure of these costs. One could
also assign cost weights to the partitioning objectives and the migration volume to
evaluate the linear combination of both. Since these weights depend both on the
underlying application and the parallel architecture, we have not pursued this here.
We compare PDibaP to the state-of-the-art repartitioning tools ParMETIS and
parallel Jostle. Both competitors are mainly based on the vertex-exchanging KL
heuristic for local improvement. The load balancing toolkit Zoltan [5], whose in-
tegrated KL/FM partitioner is based on the hypergraph concept, is not included in
the detailed presentation. Our experiments with it indicate that it is not as suitable
for our benchmark set of FEM graphs, in particular because it yields disconnected
parts which propagate and worsen in the course of the sequence. We conclude that
currently the dedicated graph (as opposed to hypergraph) partitioners seem more
suitable for this problem type.
The partitioning quality is measured in our experiments by the edge cut (EC,
a summation norm) and the maximum communication volume (CVmax ). CVmax
is the sum of the maximum incoming communication volume and the maximum
outgoing communication volume, taken over all parts, respectively.
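In formulas, with $\mathrm{vol}_{\mathrm{in}}(p)$ and $\mathrm{vol}_{\mathrm{out}}(p)$ denoting the incoming and outgoing communication volume of part $p$, this reads

$$ CV_{\max} = \max_{p} \mathrm{vol}_{\mathrm{in}}(p) + \max_{p} \mathrm{vol}_{\mathrm{out}}(p). $$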
The values are displayed in Table 1, averaged over the whole sequence and aggregated by the
different k. Very similar results are obtained for the geometric mean in nearly
all cases, which is why we do not show these data separately. The migration costs
are recorded in both norms and shown for each sequence (again aggregated) in
Table 2. Missing values for parallel Jostle (—) indicate program crashes on the
corresponding instance(s).
The aggregated graph partitioning metrics show that PDibaP is able to compute
the best partitions consistently. PDibaP’s advantage is largest for the communication
volume. With about 12–19% over parallel Jostle and about 34–53%
over ParMETIS, these improvements are clearly higher than the approximately 7%
obtained for static partitioning [23], which is due to the fact that parallel KL
(re)partitioners often compute worse solutions than their serial counterparts for
static partitioning.
The results for the migration volume are not consistent. All tools have a similar
number of best values. The fact that ParMETIS is competitive is slightly
surprising when compared to previous results [22], where it fared worse. Also
unexpectedly, PDibaP shows significantly higher migration costs for the instance
biggerbubbles. Our experiments indicate that PDibaP has a more constant migration
volume, while the values for parallel Jostle and ParMETIS show a higher
amplitude. It depends on the instance which strategy pays off. This behavior is
[Table fragment: results for hugetric at k = 60 / k = 84: 0.68 / 0.64, 2.41∗ / 4.68∗, 55.36 / 62.37.]
growing running times. Therefore, one can conclude that PDibaP is more suitable
for simulations with a small number of processors.
We would like to stress that a high repartitioning quality is often very important.
Usually, the most time-consuming parts of numerical simulations are the
numerical solvers. Hence, a reduced communication volume provided by an excellent
partitioning can pay off unless the repartitioning time is extremely high.
Nevertheless, a further acceleration of shape-optimizing load balancing is of utmost
importance. Minutes for each repartitioning step might be problematic for
some targeted applications.
6. Conclusions
With this work we have demonstrated that the shape-optimizing repartitioning
algorithm DibaP based on disturbed diffusion can be a good alternative to tra-
ditional KL-based methods for balancing the load in parallel adaptive numerical
simulations. In particular, the parallel implementation PDibaP is very suitable
for simulations of small to medium scale, i. e., when the number of vertices and
edges in the dynamic graphs is on the order of several million. While PDibaP is
still significantly slower than the state-of-the-art, it usually computes considerably
better solutions w. r. t. edge cut and communication volume. In situations where
the quality of the load balancing phase is more important than its running time –
e. g., when the computation time between the load balancing phases is relatively
high – the use of PDibaP is expected to pay off.
As part of future work, we aim at an improved multilevel process and faster par-
titioning methods. It would also be worthwhile to investigate if Bubble-FOS/C
and TruncCons can be further adapted algorithmically, for example to reduce the
dependence on k in the running time.
References
[1] David Bader, Henning Meyerhenke, Peter Sanders, and Dorothea Wagner, 10th DIMACS
implementation challenge, http://www.cc.gatech.edu/dimacs10/, 2012.
[2] David Bader, Henning Meyerhenke, Peter Sanders, and Dorothea Wagner (eds.), Proceed-
ings of the 10th DIMACS implementation challenge, Contemporary Mathematics, American
Mathematical Society, 2012.
[3] N. A. Baker, D. Sept, M. J. Holst, and J. A. McCammon, The adaptive multilevel finite
element solution of the Poisson-Boltzmann equation on massively parallel computers, IBM
J. of Research and Development 45 (2001), no. 3.4, 427 –438.
[4] U. Catalyurek and C. Aykanat, Hypergraph-partitioning-based decomposition for parallel
sparse-matrix vector multiplication, IEEE Transactions on Parallel and Distributed System
10 (1999), no. 7, 673–693.
[5] Umit V. Catalyurek, Erik G. Boman, Karen D. Devine, Doruk Bozdağ, Robert T. Heaphy, and
Lee Ann Riesen, A repartitioning hypergraph model for dynamic load balancing, J. Parallel
Distrib. Comput. 69 (2009), no. 8, 711–724.
[6] C. Chevalier and F. Pellegrini, PT-Scotch: a tool for efficient parallel graph ordering, Parallel
Comput. 34 (2008), no. 6-8, 318–331, DOI 10.1016/j.parco.2007.12.001. MR2428880
[7] G. Cybenko, Dynamic load balancing for distributed memory multiprocessors, Parallel and
Distributed Computing 7 (1989), 279–301.
[8] R. Diekmann, R. Preis, F. Schlimbach, and C. Walshaw, Shape-optimized mesh partitioning
and load balancing for parallel adaptive FEM, Parallel Computing 26 (2000), 1555–1581.
[9] C. M. Fiduccia and R. M. Mattheyses, A linear-time heuristic for improving network par-
titions, Proceedings of the 19th Conference on Design automation (DAC’82), IEEE Press,
1982, pp. 175–181.
[10] G. Fox, R. Williams, and P. Messina, Parallel computing works!, Morgan Kaufmann, 1994.
[11] Leo Grady and Eric L. Schwartz, Isoperimetric graph partitioning for image segmentation,
IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006), no. 3, 469–475.
[12] B. Hendrickson and R. Leland, A multi-level algorithm for partitioning graphs, Proceedings
Supercomputing ’95, ACM Press, 1995, p. 28 (CD).
[13] Bruce Hendrickson and Tamara G. Kolda, Graph partitioning models for parallel comput-
ing, Parallel Comput. 26 (2000), no. 12, 1519–1534, DOI 10.1016/S0167-8191(00)00048-X.
MR1786938
[14] Manuel Holtgrewe, Peter Sanders, and Christian Schulz, Engineering a scalable high quality
graph partitioner, IPDPS, IEEE, 2010, pp. 1–12.
[15] Y. F. Hu and R. J. Blake, An improved diffusion algorithm for dynamic load balancing, Par-
allel Comput. 25 (1999), no. 4, 417–444, DOI 10.1016/S0167-8191(99)00002-2. MR1684706
[16] George Karypis and Vipin Kumar, MeTiS: A Software Package for Partitioning Unstructured
Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices,
Version 4.0, Univ. of Minnesota, Minneapolis, MN, 1998.
[17] George Karypis and Vipin Kumar, Multilevel k-way partitioning scheme for irregular graphs, Journal of Parallel and
Distributed Computing 48 (1998), no. 1, 96–129.
[18] B. W. Kernighan and S. Lin, An efficient heuristic for partitioning graphs, Bell Systems
Technical Journal 49 (1970), 291–308.
[19] Stuart P. Lloyd, Least squares quantization in PCM, IEEE Trans. Inform. Theory 28 (1982),
no. 2, 129–137, DOI 10.1109/TIT.1982.1056489. MR651807 (84a:94012)
[20] O. Marquardt and S. Schamberger, Open benchmarks for load balancing heuristics in parallel
adaptive finite element computations, Proceedings of the International Conference on Parallel
and Distributed Processing Techniques and Applications, (PDPTA’05), CSREA Press, 2005,
pp. 685–691.
[21] H. Meyerhenke and S. Schamberger, Balancing parallel adaptive FEM computations by solv-
ing systems of linear equations, Proceedings of the 11th International Euro-Par Conference,
Lecture Notes in Computer Science, vol. 3648, Springer-Verlag, 2005, pp. 209–219.
[22] Henning Meyerhenke, Dynamic load balancing for parallel numerical simulations based on
repartitioning with disturbed diffusion, Proc. Internatl. Conference on Parallel and Dis-
tributed Systems (ICPADS’09), IEEE Computer Society, 2009, pp. 150–157.
[23] Henning Meyerhenke, Burkhard Monien, and Thomas Sauerwald, A new diffusion-based mul-
tilevel algorithm for computing graph partitions, Journal of Parallel and Distributed Com-
puting 69 (2009), no. 9, 750–761, Best Paper Awards and Panel Summary: IPDPS 2008.
[24] Henning Meyerhenke, Burkhard Monien, and Stefan Schamberger, Graph partitioning and
disturbed diffusion, Parallel Computing 35 (2009), no. 10–11, 544–569.
[25] Vitaly Osipov and Peter Sanders, n-level graph partitioning, Algorithms—ESA 2010. Part
I, Lecture Notes in Comput. Sci., vol. 6346, Springer, Berlin, 2010, pp. 278–289, DOI
10.1007/978-3-642-15775-2 24. MR2762861
[26] François Pellegrini, A parallelisable multi-level banded diffusion scheme for computing bal-
anced partitions with smooth boundaries, Proc. 13th International Euro-Par Conference,
LNCS, vol. 4641, Springer-Verlag, 2007, pp. 195–204.
[27] Peter Sanders and Christian Schulz, Engineering multilevel graph partitioning algorithms,
Algorithms—ESA 2011, Lecture Notes in Comput. Sci., vol. 6942, Springer, Heidelberg, 2011,
pp. 469–480, DOI 10.1007/978-3-642-23719-5 40. MR2893224 (2012k:68259)
[28] Peter Sanders and Christian Schulz, Distributed evolutionary graph partitioning, Meeting on
Algorithm Engineering & Experiments (ALENEX’12), SIAM, 2012.
[29] Stefan Schamberger, Shape optimized graph partitioning, Ph.D. thesis, Universität Paderborn,
2006.
[30] K. Schloegel, G. Karypis, and V. Kumar, Wavefront diffusion and LMSR: Algorithms for
dynamic repartitioning of adaptive meshes, IEEE Transactions on Parallel and Distributed
Systems 12 (2001), no. 5, 451–466.
[31] K. Schloegel, G. Karypis, and V. Kumar, Graph partitioning for high performance scientific simulations, The Sourcebook of
Parallel Computing, Morgan Kaufmann, 2003, pp. 491–541.
[32] Kirk Schloegel, George Karypis, and Vipin Kumar, Multilevel diffusion schemes for reparti-
tioning of adaptive meshes, Journal of Parallel and Distributed Computing 47 (1997), no. 2,
109–124.
[33] Kirk Schloegel, George Karypis, and Vipin Kumar, A unified algorithm for load-balancing adaptive scientific simulations, Proceedings
of Supercomputing 2000, IEEE Computer Society, 2000, p. 59 (CD).
[34] Kirk Schloegel, George Karypis, and Vipin Kumar, Parallel static and dynamic multi-constraint graph partitioning, Concurrency and
Computation: Practice and Experience 14 (2002), no. 3, 219–240.
[35] Klaus Stüben, An introduction to algebraic multigrid, Multigrid (U. Trottenberg, C. W.
Oosterlee, and A. Schüller, eds.), Academic Press, 2000, Appendix A, pp. 413–532.
[36] Aleksandar Trifunović and William J. Knottenbelt, Parallel multilevel algorithms for hyper-
graph partitioning, J. Parallel Distrib. Comput. 68 (2008), no. 5, 563–581.
[37] Denis Vanderstraeten, R. Keunings, and Charbel Farhat, Beyond conventional mesh par-
titioning algorithms and the minimum edge cut criterion: Impact on realistic applications,
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing
(PPSC’95), SIAM, 1995, pp. 611–614.
[38] C. Walshaw and M. Cross, Mesh partitioning: a multilevel balancing and refine-
ment algorithm, SIAM J. Sci. Comput. 22 (2000), no. 1, 63–80 (electronic), DOI
10.1137/S1064827598337373. MR1769526 (2001b:65153)
[39] C. Walshaw and M. Cross, Parallel optimisation algorithms for multilevel mesh partition-
ing, Parallel Comput. 26 (2000), no. 12, 1635–1660, DOI 10.1016/S0167-8191(00)00046-6.
MR1786940
[40] C. Xu and F. C. M. Lau, Load balancing in parallel computers, Kluwer, 1997.
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11709
Graph Partitioning for Scalable Distributed Graph Computations

Aydın Buluç and Kamesh Madduri

1. Introduction
Graph partitioning is an essential preprocessing step for distributed graph com-
putations. The cost of fine-grained remote memory references is extremely high in
the case of distributed memory systems, and so one usually restructures both the graph
layout and the algorithm in order to mitigate or avoid inter-node communication.
The goal of this work is to characterize the impact of common graph partitioning
strategies that minimize edge cut on the parallel performance of graph algorithms
on current supercomputers. We use breadth-first search (BFS) as our driving ex-
ample, as it is representative of communication-intensive graph computations. It
is also frequently used as a subroutine for more sophisticated algorithms such as
finding connected components, spanning forests, testing for bipartiteness, maxi-
mum flows [10], and computing betweenness centrality on unweighted graphs [1].
BFS has recently been chosen as the first representative benchmark for ranking
supercomputers based on their performance on data intensive applications [5].
of the graph (i.e., assigning m/p edges to each processor). The edge cut is also
O(m) with random partitioning. While it can be shown that low-diameter real-
world graphs do not have sparse separators [8], constants matter in practice, and
any decrease in the communication volume, albeit not an asymptotic one, may translate
into reduced execution times for graph problems that are typically communication-
bound.
We outline the communication costs incurred in 2D-partitioned BFS in this
section. 2D-partitioned BFS also captures 1D-partitioned BFS as a degenerate
case. We first distinguish different ways of aggregating edges in the local discovery
phase of the BFS approach:
(1) No aggregation at all; local duplicates are not pruned before the fold.
(2) Local aggregation at the current frontier only. Our simulations in Section 7.1
follow this assumption.
(3) Local aggregation over all (current and past) locally discovered vertices by
keeping a persistent bitmask (see the sketch below). We implement this optimization for gathering
parallel execution results in Section 7.2.
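As an illustration of option (3), the following sketch (a hypothetical helper, not the code used in the experiments) keeps a persistent bitmask over local vertex identifiers:

#include <cstdint>
#include <vector>

// Sketch of option (3): a persistent bitmask over local vertex ids that
// prunes every vertex already discovered in any earlier BFS iteration.
struct DiscoveredFilter {
  std::vector<std::uint64_t> bits;
  explicit DiscoveredFilter(std::size_t nLocal)
      : bits((nLocal + 63) / 64, 0) {}
  // Returns true the first time v is seen; false for all duplicates.
  bool firstVisit(std::size_t v) {
    std::uint64_t mask = std::uint64_t(1) << (v & 63);
    if (bits[v >> 6] & mask) return false;
    bits[v >> 6] |= mask;
    return true;
  }
};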
Consider the expand phase. If the adjacencies of a single vertex v are shared
among $\lambda_+ \le p_c$ processors, then its owner will need to send the vertex to $\lambda_+ - 1$
neighbors. Since each vertex is in the pruned frontier once, the total communication
volume for the expand phases over all iterations is equal to the communication
volume of the same phase in 2D sparse-matrix vector multiplication (SpMV) [4].
Each iteration of BFS is a sparse-matrix sparse-vector multiplication of the form
$A^T \times F_k$. Hence, the column-net hypergraph model of $A^T$ accurately captures the
cumulative communication volume of the BFS expand steps, when used with the
connectivity-1 metric.
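Written out, the cumulative expand volume is exactly the connectivity−1 cost of that hypergraph,

$$ \mathrm{vol}_{\mathrm{expand}} = \sum_{v \in V} \bigl(\lambda_+(v) - 1\bigr), $$

where $\lambda_+(v)$ is the number of processors sharing the adjacency of $v$.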
The communication cost of the fold phase is complicated to analyze due to the
space-time partitioning of edges in the graph in a BFS execution. We can annotate
every edge in the graph using two integers: the partition the edge belongs to, and
the BFS phase in which the edge is traversed (remember that each edge is traversed
exactly once).
The communication volume due to a vertex v in the fold phase is at most
$|\mathrm{RemoteAdj}^-(v)|$, which is realized when every $e \in \mathrm{RemoteAdj}^-(v)$ has a distinct
space-time partitioning label, i.e., no two edges are traversed by the same remote
process during the same iteration. The edgecut of the partitioned graph is the set
of all edges whose end vertices belong to different partitions. The size of the
edgecut is equal to $\sum_{v \in V} |\mathrm{RemoteAdj}^-(v)|$, giving an upper bound for the overall
communication volume due to fold phases.
Another upper bound is $O(\mathrm{diameter} \cdot (\lambda_- - 1))$, which might be lower than
the edgecut. Here, $\lambda_- \le p_r$ is the number of processors among which $\mathrm{Adj}^-(v)$ is
partitioned, and diameter gives the maximum number of BFS iterations. Consequently,
the communication volume due to discovering vertex v, $\mathrm{comm}(v)$, obeys
the inequality $\mathrm{comm}(v) \le \min\{\mathrm{diameter} \cdot (\lambda_- - 1),\ |\mathrm{RemoteAdj}^-(v)|\}$.
In the above example, this value is $\min(8, 6) = 6$.
Figure 2 shows the space-time edge partitioning of $\mathrm{Adj}^-(v)$ per BFS step.
In the first step, the communication volume is 1, as the red processor discovers
v through the edge (h, v) and sends it to the black processor for marking. In the
second step, both green and black processors discover v and communication volume
is 1 from green to black. Continuing this way, we see that the actual aggregate
communication in the fold phase of v is 5.
Although we report the total communication volume over the course of BFS
iterations, we are most concerned with the MaxVolume metric. It is a better approx-
imation for the time spent on remote communication, since the slowest processor
in each phase determines the overall time spent in communication.
5. Experimental Setup
Our parallel BFS implementation is level-synchronous, and so it is primarily
meant to be applied to low-diameter graphs. However, to quantify the impact of
barrier synchronization and load balance on the overall execution time, we run our
implementations on several graphs, both low- and high-diameter.
We categorize the following DIMACS Challenge instances as low diameter: the
synthetic Kronecker graphs (kron_g500-simple-logn and kron_g500-logn families),
Erdős-Rényi graphs (er-fact1.5 family), web crawls (eu-2005 and others),
citation networks (citationCiteseer and others), and co-authorship networks
(coAuthorsDBLP and others). Some of the high-diameter graphs that we report
performance results on include hugebubbles-00020, graphs from the delaunay
family, road networks (road_central), and random geometric graphs.
Most of the DIMACS test graphs are small enough to fit in the main memory of
a single machine, and so we are able to get baseline serial performance numbers for
comparison. We are currently using serial partitioning software to generate vertex
partitions and vertex reorderings, and this has been a limitation for scaling to larger
graphs. However, the performance trends with DIMACS graphs still provide some
interesting insights.
We use the k-way multilevel partitioning scheme in Metis (v5.0.2) with the
default command-line parameters to generate balanced vertex partitions (in terms
of the number of vertices per partition) minimizing total edge cut. We relabel
vertices and distribute edges to multiple processes based on these vertex partitions.
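As a sketch of this step (the standard Metis 5 C API with default options; error handling and the CSR construction are omitted):

#include <metis.h>
#include <vector>

// Sketch: balanced K-way vertex partition minimizing total edge cut,
// using Metis with default options, as in the experimental setup.
// xadj/adjncy hold the graph in CSR form.
std::vector<idx_t> partitionWithMetis(std::vector<idx_t>& xadj,
                                      std::vector<idx_t>& adjncy,
                                      idx_t nparts) {
  idx_t nvtxs = static_cast<idx_t>(xadj.size()) - 1;
  idx_t ncon = 1;                      // one balance constraint
  idx_t objval;                        // resulting edge cut
  std::vector<idx_t> part(nvtxs);
  METIS_PartGraphKway(&nvtxs, &ncon, xadj.data(), adjncy.data(),
                      nullptr, nullptr, nullptr,   // unit weights
                      &nparts, nullptr, nullptr,   // default targets
                      nullptr, &objval, part.data());
  return part;
}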
Similarly, we use PaToH’s column-wise and checkerboard partitioning schemes to
partition the sparse adjacency matrix corresponding to the graph. While we report
communication volume statistics related to checkerboard partitioning, we are still
unable to use these partitions for reordering, since PaToH edge partitions are not
necessarily aligned.
We report parallel execution times on Hopper, a 6392-node Cray XE6 system
located at Lawrence Berkeley National Laboratory. Each node of this system con-
tains two twelve-core 2.1 GHz AMD Opteron Magny-Cours processors. There are
eight DDR3 1333-MHz memory channels per node, and the observed memory band-
width with the STREAM [9] benchmark is 49.4 GB/s. The main memory capacity
of each node is 32 GB, of which 30 GB is usable by applications. A pair of com-
pute nodes share a Gemini network chip, and these network chips are connected
to form a 3D torus (of dimensions 17 × 8 × 24). The observed MPI point-to-point
bandwidth for large messages between two nodes that do not share a network chip
is 5.9 GB/s. Further, the measured MPI latency for point-to-point communication
is 1.4 microseconds, and the cost of a global barrier is about 8 microseconds. The
maximum injection bandwidth per node is 20 GB/s.
We use the GNU C compiler (v4.6.1) for compiling our BFS implementation.
For inter-node communication, we use Cray’s MPI implementation (v5.3.3), which
is based on MPICH2. We report performance results up to 256-way MPI pro-
cess/task concurrency in this study. In all experiments, we use four MPI tasks
per node, with every task constrained to six cores to avoid any imbalances due to
Non-Uniform Memory Access (NUMA) effects. We did not explore multithreading
within a node in the current study. This may be another potential source of load
imbalance, and we will quantify this in future work. More details on multithreading
within a node can be found in our prior work on parallel BFS [2].
To compare performance across multiple systems using a rate analogous to
the commonly-used floating point operations per second, we normalize the serial
and parallel execution times by the number of edges visited in a BFS traversal
and present a Traversed Edges Per Second (TEPS) rate. For an undirected graph
with a single connected component, the BFS algorithm would visit every edge in
the component twice. We only consider traversal execution times from vertices
that appear in the largest connected component in the graph (all the DIMACS
test instances we used have one large component), compute the mean search time
(harmonic mean of TEPS) using at least 20 randomly chosen source vertices for
each benchmark graph, and normalize the time by the cumulative number of edges
visited to get the TEPS rate.
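As a small illustration of this metric, one way to perform the aggregation, matching the harmonic-mean convention just described (a sketch; the BfsRun record and its fields are placeholders for whatever timing infrastructure is actually used):

#include <cstdint>
#include <vector>

// One BFS run: elapsed wall-clock seconds and number of edges traversed.
struct BfsRun { double seconds; std::uint64_t edges_visited; };

// Harmonic mean of per-run TEPS: HM = n / sum(1/TEPS_i),
// where TEPS_i = edges_i / seconds_i for run i.
double harmonic_mean_teps(const std::vector<BfsRun>& runs) {
    double inv_sum = 0.0;
    for (const BfsRun& r : runs)
        inv_sum += r.seconds / static_cast<double>(r.edges_visited); // 1/TEPS_i
    return static_cast<double>(runs.size()) / inv_sum;
}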
We consider both the cases of spread-out and contiguous ranks on Hopper, and
microbenchmark Allgatherv and Alltoallv operations by varying processor grid con-
figurations. We benchmark each collective at 400, 1600, and 6400 process counts.
For each process count, we use a square √p × √p grid, a tall skinny (2√p) × (√p/2) grid, and a short fat (√p/2) × (2√p) grid, making a total of nine different process
configurations for each of the four cases: Allgatherv spread-out, Alltoallv spread-
out, Allgatherv packed, Alltoallv packed. We perform linear regression on mean
inverse bandwidth (measured in microseconds per megabyte) achieved among all sub-
communicators when all subcommunicators work simultaneously. This mimics the
actual BFS scenario. We report the mean as opposed to minimum, because the
algorithm does not require explicit synchronization across subcommunicators.
In each run, we determine constants a, b, c that minimize the sum of squared errors, SS_res = Σ (y_obsd − y_est)², between the observed inverse bandwidth and the inverse bandwidth estimated via the equation β(p_r, p_c) = a·p_r + b·p_c + c. The results are summarized in Table 1. If the observed t-value of any of the constants is below the critical t-value, we force its value to zero and rerun the linear regression. We have considered other relationships that are linear in the coefficients, such as power series and logarithmic dependencies, but the observed t-values were significantly below the critical t-value for those hypotheses, hence not supporting them. We also list r², the coefficient of determination, which gives the fraction (between 0.0 and 1.0) of the total variation in β that can be explained by its linear dependence on p_r and p_c. Although one can obtain higher r² scores by using higher-order functions, we opt for linear regression in accordance with Occam's razor, because it adequately explains the underlying data in this case.
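A minimal sketch of such a fit, solving the 3 × 3 normal equations directly (the Obs record is illustrative; the measurements come from the microbenchmarks described above, and the t-value screening step is not shown):

#include <array>
#include <vector>

// One observation: process grid (pr x pc) and measured inverse bandwidth y
// in microseconds per megabyte.
struct Obs { double pr, pc, y; };

// Least-squares fit of beta(pr, pc) = a*pr + b*pc + c by solving the
// 3x3 normal equations (X^T X) w = X^T y with Gaussian elimination.
std::array<double, 3> fit_beta(const std::vector<Obs>& obs) {
    double A[3][4] = {};                       // augmented matrix [X^T X | X^T y]
    for (const Obs& o : obs) {
        const double x[3] = {o.pr, o.pc, 1.0};
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j) A[i][j] += x[i] * x[j];
            A[i][3] += x[i] * o.y;
        }
    }
    for (int k = 0; k < 3; ++k) {              // forward elimination (no pivoting)
        for (int i = k + 1; i < 3; ++i) {
            const double f = A[i][k] / A[k][k];
            for (int j = k; j < 4; ++j) A[i][j] -= f * A[k][j];
        }
    }
    std::array<double, 3> w{};                 // back substitution: w = {a, b, c}
    for (int i = 2; i >= 0; --i) {
        double s = A[i][3];
        for (int j = i + 1; j < 3; ++j) s -= A[i][j] * w[j];
        w[i] = s / A[i][i];
    }
    return w;
}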
We see that both the subcommunicator size (the number of processes in each
subcommunicator) and the total number of subcommunicators affect the perfor-
mance in a statistically significant way. The linear regression analysis shows that
the number of subcommunicators has a stronger effect on the performance than the subcommunicator size for the Allgatherv operation on spread-out ranks (0.0700 vs. 0.0148). For the Alltoallv operation on spread-out ranks, however, their effects are
comparable (0.0428 vs 0.0475). Increasing the number of subcommunicators in-
creases both the contention and the physical distance between participating pro-
cesses. Subcommunicator size does not change the distance between the participants in a communicator, nor the contention, but it can potentially increase the
Table 2. Communication volume as a percentage of the number of edges, for natural (N), random (R), and PaToH (P) vertex orderings with 1D partitioning (process grids p × 1).

Graph                  p = 4×1                 p = 16×1                  p = 64×1
                    N      R      P        N       R       P        N        R       P
coPapersCiteseer   4.7%  14.7%   1.9%     8.7%   47.9%   3.4%    10.8%   102.5%   4.8%
coAuthorsCiteseer 37.6%  79.9%   5.9%    59.3%  143.9%  11.3%    68.7%   180.3%  15.6%
citationCiteseer  64.8%  75.0%   7.8%   125.0%  139.0%  16.9%   164.9%   176.1%  29.0%
coPapersDBLP       7.6%  18.4%   3.7%    15.7%   58.2%   7.6%    21.0%   118.8%  11.7%
coAuthorsDBLP     45.2%  81.3%  10.9%    74.9%  148.9%  19.8%    90.1%   182.5%  27.2%
eu-2005            5.3%  23.2%   0.3%     8.7%   63.8%   1.9%    12.3%   107.4%   7.2%
kronecker-logn18   7.7%   7.6%   6.3%    22.7%   23.1%  19.5%    47.5%    53.4%  45.0%
delaunay_n20      52.4% 123.7%   0.2%    59.3%  178.0%   0.6%    60.6%   194.4%   1.4%
rgg_n_2_20_s0      0.2%  85.5%   0.1%     0.6%  160.1%   0.3%     2.5%   188.9%   0.6%
The reported communication volume for the expand phase is exact, in the sense that a processor receives a vertex v only if it owns one of the edges in Adj+(v) and is not the owner of v itself. We count a vertex as one word of communication. In contrast, in the fold phase, the discovered vertices are sent as (parent, vertex id) pairs, resulting in two words of communication per discovered edge. This is why values in Table 2 sometimes exceed 100% (i.e., more total communication than the number of edges) but are always less than 200%. For these simulations, we report the maximum-to-average imbalance ratios shown in the following table.
Maximum-to-average imbalance per process for natural (N), random (R), and PaToH (P) orderings with 1D partitioning (process grids p × 1).

Graph                p = 4×1             p = 16×1             p = 64×1
                   N     R     P       N     R     P       N     R      P
coPapersCiteseer  1.46  1.01  1.23    1.81  1.02  1.76    2.36  1.07   2.44
coAuthorsCiteseer 1.77  1.02  1.55    2.41  1.06  2.06    2.99  1.21   2.86
citationCiteseer  1.16  1.02  1.39    1.33  1.07  2.17    1.53  1.21   2.93
coPapersDBLP      1.56  1.01  1.22    1.99  1.02  1.86    2.40  1.05   2.41
coAuthorsDBLP     1.84  1.01  1.39    2.58  1.05  1.85    3.27  1.13   2.43
eu-2005           1.37  1.10  1.05    3.22  1.28  3.77    7.35  1.73   9.36
kronecker-logn18  1.04  1.06  1.56    1.22  1.16  1.57    1.63  1.42   1.92
delaunay_n20      2.36  1.03  1.71    3.72  1.13  3.90    6.72  1.36   8.42
rgg_n_2_20_s0     2.03  1.03  2.11    4.70  1.13  6.00    9.51  1.49  13.34
Random relabeling of the vertices results in partitions that are load-balanced per iteration.
The maximum occurs for the eu-2005 matrix on 64 processors with 1D partitioning,
but even in this case, the maximum (1.73×) is less than twice the average. In
contrast, both natural and PaToH orderings suffer from imbalances, especially for
higher processor counts.
To highlight the problems with minimizing total (hence average) communica-
tion as opposed to the maximum, we plot communication volume scaling in Figure 4
for the Kronecker graph we study. The plots show that even though PaToH achieves
the lowest average communication volume per processor, its maximum communica-
tion volume per processor is even higher than the random case. This partly explains
the computation times reported in Section 7.2, since the maximum communication
per processor is a better approximation for the overall execution time.
Edge count imbalances for different partitioning strategies can be found in the
Appendix. Although they are typically low, they only represent the load imbalance
due to the number of edges owned by each processor, and not the number of edges
traversed per iteration.
We report parallel execution time on Hopper for two different parallel concur-
rencies, p = 16 and p = 256. Tables 6 and 7 give the serial performance rates (with
natural ordering) as well as the relative speedup with different reorderings, for sev-
eral benchmark graphs. There is a 3.5× variation in serial performance rates, with
the skewed-degree graphs showing the highest performance and the high-diameter graphs road_central and hugebubbles-00020 the lowest performance. For the
parallel runs, we report speedup over the serial code with the natural ordering.
Interestingly, the random-ordering variants perform best in all of the low-diameter
graph cases. The performance is better than PaToH- and Metis-partitioned vari-
ants in all cases. The table also gives the impact of checkerboard partitioning on
the running time. There is a moderate improvement for the random variant, but
the checkerboard scheme is slower for the rest of the schemes. The variation in rel-
ative speedup across graphs is also surprising. The synthetic low-diameter graphs
demonstrate the best speedup overall. However, the speedups for the real-world low-
diameter graphs are 1.5× lower, and the relative speedups for the high-diameter
graphs are extremely low.
Figure 6 gives a breakdown of the average parallel BFS execution and inter-node
communication times for 16-processor parallel runs, and provides insight into the
reason behind varying relative speedup numbers. For all the low-diameter graphs,
at this parallel concurrency, execution time is dominated by local computation. The
local discovery and local update steps account for up to 95% of the total time, and
communication times are negligible. Comparing the computational time of random
ordering vs. Metis reordering, we find that BFS on the Metis-reordered graph
is significantly slower. The first reason is that Metis partitions are highly unbal-
anced in terms of the number of edges per partition for this graph, and so we can
expect a certain amount of imbalance in local computation. The second reason is
a bit more subtle. Partitioning the graph to minimize edge cut does not guarantee
that the local computation steps will be balanced, even if the number of edges per
process is balanced. The per-iteration work is dependent on the number of ver-
tices in the current frontier and their distribution among processes. Randomizing
vertex identifiers destroys any inherent locality, but also improves local computa-
tion load balance. The partitioning tools reduce edge cut and enhance locality, but
also seem to worsen load balance, especially for skewed degree distribution graphs.
The PaToH-generated 1D partitions are much more balanced in terms of number
of edges per process (in comparison to the Metis partitions for Kronecker graphs),
but the average BFS execution still suffers from local computation load imbalance.
Next, consider the web crawl eu-2005. The local computation balance even after randomization is not as good as for the synthetic graphs. One reason might be that the graph diameter is larger than that of the Kronecker graphs. 2D partitioning after random-
ization only worsens the load balance. The communication time for the fold step is
somewhat lower for Metis and PaToH partitions compared to random partitions,
but the times are not proportional to the savings projected in Table 4. This de-
serves further investigation. coPapersCiteseer shows trends similar to eu-2005.
Note that the communication time savings going from 1D to 2D partitioning are
different in both cases.
The tables also indicate that the level-synchronous approach performs ex-
tremely poorly on high-diameter graphs, and this is due to a combination of reasons.
There is load imbalance in the local computation phase, and this is much more ap-
parent after Metis and PaToH reorderings. For some of the level-synchronous
phases, there may not be sufficient work per phase to keep all 16/256 processes
busy. The barrier synchronization overhead is also extremely high. For instance,
observe the cost of the expand step with 1D partitioning for road_central in Fig-
ure 6. This should ideally be zero, because there is no data exchanged in expand
for 1D partitioning. Yet, multiple barrier synchronizations of a few microseconds
turn out to be a significant cost.
Table 7 gives the parallel speedup achieved with different reorderings at 256-way
parallel concurrency. The Erdős-Rényi graph gives the highest parallel speedup for
all the partitioning schemes, and it serves as an indicator of the speedup achieved
with good computational load balance. The speedup for real-world graphs is up
to 5× lower than this value, indicating the severity of the load imbalance problem.
One more reason for the poor parallel speedup may be that these graphs are smaller
than the Erdős-Rényi graph. The communication cost increases in comparison to
the 16-node case, but the computational cost comprises 80% of the execution time.
The gist of these performance results is that for level-synchronous BFS, partitioning
has a considerable effect on the computational load balance, in addition to altering
the communication cost. On current supercomputers, the computational imbalance
seems to be the bigger of the two costs to account for, particularly at low process
concurrencies.
As highlighted in the previous section, partitioners balance the load with respect to the overall execution, that is, the number of edges owned by each processor, not the number of edges traversed per BFS iteration. Figure 7 shows the actual
imbalance that happens in practice due to the level-synchronous nature of the BFS
algorithm. Even though PaToH limits the overall edge count imbalance to 3%,
the actual per iteration load imbalances are severe. In contrast, random vertex
numbering yields very good load balance across MPI processes and BFS steps.
Acknowledgments
We thank Bora Uçar for fruitful discussions and his insightful feedback on
partitioning.
References
[1] U. Brandes, A faster algorithm for betweenness centrality, J. Mathematical Sociology 25
(2001), no. 2, 163–177.
[2] A. Buluç and K. Madduri, Parallel breadth-first search on distributed memory systems, Proc.
ACM/IEEE Conference on Supercomputing, 2011.
[3] Ü.V. Çatalyürek and C. Aykanat, PaToH: Partitioning tool for hypergraphs, 2011.
[4] Ümit V. Çatalyürek, Cevdet Aykanat, and Bora Uçar, On two-dimensional sparse matrix
partitioning: models, methods, and a recipe, SIAM J. Sci. Comput. 32 (2010), no. 2, 656–
683, DOI 10.1137/080737770. MR2609335 (2011g:05176)
[5] The Graph 500 List, http://www.graph500.org, last accessed May 2012.
[6] D. Gregor and A. Lumsdaine, The Parallel BGL: A Generic Library for Distributed Graph
Computations, Proc. Workshop on Parallel/High-Performance Object-Oriented Scientific
Computing (POOSC’05), 2005.
[7] G. Karypis and V. Kumar, Multilevel k-way partitioning scheme for irregular graphs, Journal
of Parallel and Distributed Computing 48 (1998), no. 1, 96–129.
[8] Richard J. Lipton, Donald J. Rose, and Robert Endre Tarjan, Generalized nested dissec-
tion, SIAM J. Numer. Anal. 16 (1979), no. 2, 346–358, DOI 10.1137/0716027. MR526496
(80d:65041)
[9] J.D. McCalpin, Memory bandwidth and machine balance in current high performance com-
puters, IEEE Tech. Comm. Comput. Arch. Newslett, 1995.
[10] Yossi Shiloach and Uzi Vishkin, An O(n² log n) parallel MAX-FLOW algorithm, J. Algorithms
3 (1982), no. 2, 128–146, DOI 10.1016/0196-6774(82)90013-X. MR657270 (83e:68045)
[11] A. Yoo, E. Chow, K. Henderson, W. McLendon, B. Hendrickson, and Ü. V. Çatalyürek, A
scalable distributed parallel breadth-first search algorithm on BlueGene/L, Proc. ACM/IEEE
Conf. on High Performance Computing (SC2005), November 2005.
Appendix: Edge count imbalance (maximum/average number of edges per process) for natural (N), random (R), and PaToH (P) orderings with 1D partitioning (process grids p × 1).

Graph                p = 4×1             p = 16×1             p = 64×1
                   N     R     P       N     R     P       N     R     P
coPapersCiteseer  2.11  1.00  1.00    2.72  1.02  1.00    3.14  1.06  1.00
coAuthorsDBLP     1.90  1.00  1.00    2.60  1.03  1.00    3.40  1.04  1.00
eu-2005           1.05  1.01  1.01    1.50  1.05  1.02    2.40  1.06  1.02
kronecker-logn18  1.03  1.02  1.01    1.10  1.08  1.02    1.29  1.21  1.02
rgg_n_2_20_s0     1.01  1.00  1.03    1.02  1.00  1.02    1.02  1.00  1.02
delaunay_n20      1.00  1.00  1.02    1.00  1.00  1.02    1.00  1.00  1.02
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11713

Using Graph Partitioning for Modularity Optimization

Hristo Djidjev and Melih Onus
1. Introduction
One way to extract information about the structure of a network or a graph
and the relationships between its nodes is to divide it into communities, groups of
nodes with denser links within the same group and sparser links between nodes in
different groups. For instance, in a citation network, papers on related topics form
communities and, in social networks, communities may define groups of people with
similar interests.
The intuitive notion of communities given above is too vague as it is not specific
about how dense the in-group links and how sparse the between-group links should
be. There are several formal definitions of communities, but the most popular
currently is the one based on the modularity of a partition. Modularity [31, 35]
is a measure of community quality of a partition of a network and measures the
difference between the fraction of the links with endpoints in the same set of the
partition and the expected fraction of that number in a network with a random
104 HRISTO DJIDJEV AND MELIH ONUS
placement of the links. Formally, let G = (V, E) be the graph representing the network and P = {V_1, . . . , V_k}, k ≥ 1, be a partition of V, i.e., such that V_1 ∪ · · · ∪ V_k = V and V_i ∩ V_j = ∅ for i ≠ j. We refer to the sets V_i as communities. Let 𝒢 be a random graph on the same set of nodes as G. Then the modularity of P with respect to 𝒢 is defined as

(1.1)    mod(P, G, 𝒢) = (1/m) Σ_{i=1}^{k} ( |E(V_i)| − E(V_i, 𝒢) ),

where m is the number of the edges of G, E(V_i) denotes the set of all edges of G both of whose endpoints are in V_i, and E(V_i, 𝒢) denotes the expected number of edges in 𝒢 with endpoints in V_i.
There are two choices that have been most often used for the random graph 𝒢. The random graph model G(n, p) of Erdős–Rényi [17] defines equal edge probabilities between all pairs of nodes. If n is the number of the nodes of G, m is the number of the edges, and p is chosen as 2m/n², then the expected number of edges of G(n, p) is m. The alternative and more often used choice for a random graph in the definition of the modularity is based on the Chung and Lu model [10]. In that graph model, the expected node degrees match the node degrees of G. It defines an edge in 𝒢 between nodes v and w of G with probability d(v)d(w)/(2m), where by d(x) we denote the degree of node x.
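For concreteness, a sketch (not the authors' implementation) that evaluates definition (1.1) for an unweighted graph under the Chung–Lu model, using the identity E(V_i, 𝒢) = (Σ_{v∈V_i} d(v))² / (4m) that follows from the edge probabilities above:

#include <cstdint>
#include <utility>
#include <vector>

// Sketch: modularity of a partition under the Chung-Lu null model.
// edges lists each undirected edge once; comm[v] is the community of v.
double modularity(const std::vector<std::pair<std::int32_t, std::int32_t>>& edges,
                  const std::vector<std::int32_t>& comm, std::int32_t ncomm) {
    const double m = static_cast<double>(edges.size());
    std::vector<double> deg(comm.size(), 0.0), dsum(ncomm, 0.0);
    for (auto [u, v] : edges) { deg[u] += 1.0; deg[v] += 1.0; }
    for (std::size_t v = 0; v < comm.size(); ++v) dsum[comm[v]] += deg[v];
    double internal = 0.0;  // |E(V_i)| summed over all communities
    for (auto [u, v] : edges)
        if (comm[u] == comm[v]) internal += 1.0;
    double expected = 0.0;  // E(V_i, G) = dsum_i^2 / (4m), summed over i
    for (std::int32_t c = 0; c < ncomm; ++c)
        expected += dsum[c] * dsum[c] / (4.0 * m);
    return (internal - expected) / m;  // eq. (1.1)
}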
By the definition of modularity, a higher modularity indicates a larger fraction of in-community edges and, hence, a community structure of higher quality. Hence, the community detection problem can be formally defined as a modularity optimization problem: given a graph G, find a partition of the nodes of G with maximum modularity. The maximum value of the modularity for a given G over the set of all partitions is called the modularity of G, which we will denote by mod(G, 𝒢). The modularity optimization problem has been shown to be NP-hard [9].
Hence, polynomial algorithms for finding an exact solution are unlikely, and
various researchers have tried to construct heuristic algorithms for solving the mod-
ularity optimization problem. Clauset, Newman and Moore [11] construct an agglomerative algorithm that starts with a partition where each node represents a separate community and iteratively merges pairs of communities in order of maximum modularity gain, thereby building a dendrogram of the graph. They also construct a data structure that makes the search for the best pair to merge very efficient. Guimerà and Amaral [22] use simulated annealing in a procedure that iteratively updates an initial partitioning aiming at increasing modularity. Simulated annealing is used in order to try to avoid converging to a local optimum.
Another physics-based approach is employed by Reichardt and Bornholdt [38], who
simulate spin glass energy minimization for finding a community structure defined
as the configuration of minimum energy. White and Smyth [43] and Newman [33]
use a spectral approach by computing the eigenvector of the modularity matrix
defined as the adjacency matrix of the input graph, appropriately updated to take
into account the contribution of the random graph probabilities.
In this paper we describe a community detection method that reduces modularity optimization to the problem of finding a minimum weighted cut, a problem that can be solved efficiently by using methods and tools developed for graph partitioning. Our approach was originally reported in [12–14], where we
have compared our methods against the algorithms from [11, 22, 33, 38] on artifi-
cial graphs and showed that our algorithm is comparable in accuracy with the most
accurate of these algorithms, while its scalability is significantly higher. In this pa-
per we will first review the reduction from modularity optimization to minimum
weighted cut, and then describe briefly how the resulting minimum cut problem can
be solved by modifying the Metis graph partitioning code. Then we will compare
the performance of the resulting algorithm against another algorithm described in
this volume, using data sets from the 10th DIMACS Implementation Challenge
collection.
By the definition (1.1), maximizing the modularity over all partitions P gives

m · mod(G, 𝒢) = − min_P { − Σ_{i=1}^{k} ( |E(V_i)| − E(V_i, 𝒢) ) }

(2.1)          = − min_P { ( |E| − Σ_{i=1}^{k} |E(V_i)| ) − ( |E| − Σ_{i=1}^{k} E(V_i, 𝒢) ) }.
The first term |E| − Σ_{i=1}^{k} |E(V_i)| of (2.1) is the number of the edges that connect nodes from different sets of the partition. A cut of a graph is
generally defined as a set C of edges whose removal divides the nodes of the graph
into two or more sets such that all edges in C connect nodes from different sets.
We extend this definition so that C = ∅ is also considered a valid cut, although it
corresponds to a partition of a single set containing all the nodes of the graph. (The
reason is that such partitions are also allowed in the definition of the modularity
and, in fact, are essential as they correspond to a graph with a modularity structure
of a single community.) We denote cut(P, G) = E(G) \ ∪_{i=1}^{k} E(V_i).
The second term |E| − Σ_{i=1}^{k} E(V_i, 𝒢) of (2.1), which we denote by Ecut(P, G, 𝒢), corresponds to the expected value of the cut size of P in 𝒢. The assumption that we make about the random graph model is that it preserves the expected number of the edges; hence |E| is equal to the expected number of edges of 𝒢. The two random graph models that we consider in this paper, the Erdős–Rényi and the Chung–Lu models, have this property, as we show below.
Hence,

m · mod(G, 𝒢) = − min_P { |cut(P, G)| − Ecut(P, G, 𝒢) },

which shows that the modularity optimization problem is equivalent to the problem of finding a partition that minimizes the difference of the two cuts. In order to merge the two cuts into a cut of a single graph, we define a new graph as follows. We define a complete graph G′ with the same vertices as G and a weight on each edge (v, w) defined by

(2.2)    wt(v, w) = 1 − p(v, w) if (v, w) ∈ E(G),  and  wt(v, w) = −p(v, w) otherwise,

where p(v, w) denotes the probability of the edge (v, w) in the random graph model 𝒢.
For the Chung–Lu model we have p(v, w) = d(v)d(w)/(2m), which gives for the expected number of edges of 𝒢

(1/2) Σ_{(v,w)∈V×V} p(v, w) = (1/(4m)) Σ_{(v,w)∈V×V} d(v) d(w) = (1/(4m)) Σ_{v∈V} d(v) Σ_{w∈V} d(w) = (2m)²/(4m) = m.
The above approach can be generalized in a straightforward manner to graphs with positively weighted edges. To this end, in definition (1.1), m is replaced with the sum M of all edge weights, |E(V_i)| with the sum of the weights of all edges between nodes in V_i, and the expected number of edges in 𝒢 corresponding to E(V_i) with the expected weight of those edges. Finally, the random graph model 𝒢 is replaced by a complete graph with weighted edges. For instance, for the Erdős–Rényi model, the probability p(v, w) of an edge between nodes v and w is replaced by the weight wt(v, w), which is defined as wt(v, w) = 2M/n², and in the Chung–Lu model the weight is defined as D(v)D(w)/(2M), where D(v) denotes the sum of the weights of all edges in G incident with v.
ND16, p.210], can be reduced to it. Hence one has to look for approximation or heuristic-based algorithms for the modularity optimization problem.
While different versions of the minimum cut problem have been widely researched from a theoretical point of view, much less attention has been paid to their implementation.
implementation. The graph partitioning (GP) problem, which is related to the
minimum cut problem, has received much greater attention from practitioners and
very efficient implementations have been developed. The reason is that GP has
important applications such as load balancing for high-performance computing and
VLSI circuit design. For that reason, we are using a GP tool as a basis of our
weighted minimum cut implementation, thereby solving the modularity optimiza-
tion problem.
The GP problem asks, given a graph and an integer k, to find a partitioning
of the vertices of the graph into equally sized (within difference at most one) sets
such that the number (or weight) of the edges between different sets is minimized.
Hence, GP is similar to the minimum cut problem, with the following differences:
(i) in GP the sizes of the parts have to be balanced, while in minimum cut they
can be arbitrary; (ii) in GP the number of the parts is an input variable given
by the user, while in minimum cut and modularity optimization it is subject to
optimization.
Any graph partitioning tool can be chosen as a basis for implementing (after
appropriate modifications) the modularity optimization algorithm. The specific
GP tool that we chose for our implementation is Metis [23, 24]. The reason is that
Metis is considered an efficient and accurate tool for graph partitioning and that it
is publicly available as a source code.
Metis uses a multilevel strategy to find a good solution in a scalable manner.
This type of multilevel strategy involves three phases: coarsening, partitioning, and
uncoarsening. In the coarsening phase the size of the original graph is reduced in
several stages, where at each stage connected subsets of nodes are contracted into
single nodes, reducing as a result the number of the nodes of the graph roughly
by half. The coarsening continues until the size of the resulting graph becomes
reasonably small, say about 100 nodes. The final small graph is partitioned during
the partitioning phase using some existing partitioning algorithms. In particular,
Metis uses a graph-growing heuristic where one constructs one set of the partition
by starting with a randomly selected node and then adding nodes to it in a breadth-
first manner. The uncoarsening phase involves projecting the found solution from smaller graphs to larger ones, and refining the solution after each projection. This refinement step is one of the most important for the quality of the final partition, and also one of the most sensitive. Metis implements it using the Kernighan-Lin algorithm. That
algorithm computes for each node a quantity called gain that is equal to the change
in the size (weight) of the cut if that node is moved from its current part to the other
one. Then nodes with maximum gains are exchanged between partitions, making
sure the balance between the sizes of the parts is maintained and also avoiding
some local minima by allowing a certain number of negative-gain node swaps. See
[23, 24] for more details about the implementation of Metis.
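A sketch of that gain computation for a two-way partition (the CSR-style Graph type here is illustrative, not Metis's internal representation):

#include <cstdint>
#include <vector>

// CSR-like graph with weighted edges.
struct Graph {
    std::vector<std::int64_t> xadj;   // offsets into adjncy/adjwgt
    std::vector<std::int32_t> adjncy; // neighbor ids
    std::vector<double>       adjwgt; // edge weights
};

// Kernighan-Lin-style gain of vertex v: the decrease in cut weight if v is
// moved from its current part to the other one (2-way partition side[]).
double kl_gain(const Graph& g, const std::vector<int>& side, std::int32_t v) {
    double external = 0.0, internal = 0.0;
    for (std::int64_t e = g.xadj[v]; e < g.xadj[v + 1]; ++e) {
        if (side[g.adjncy[e]] != side[v]) external += g.adjwgt[e];
        else                              internal += g.adjwgt[e];
    }
    return external - internal; // positive gain means the move shrinks the cut
}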
The modifications to Metis that need to be made are of two types: first, ones
that take care of the above mentioned difference between GP and minimum cut
problems and, second, ones aiming at improving the efficiency of the algorithm.
Specifically, the minimum cut problem that we need to solve is on a complete
Licensed to Penn St Univ, University Park. Prepared on Mon Jul 8 20:46:56 EDT 2013 for download from IP 130.203.136.75.
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms
108 HRISTO DJIDJEV AND MELIH ONUS
graph G′, whose number of edges is of order Ω(n²), where n is the number of the
nodes of G, while the number of the edges of the original graph G is typically of
order O(n). We will briefly discuss these two types of modifications below.
Removing the GP restriction of balanced part sizes is easy; in Metis we just have to omit checking the balance of the partition. Finding the right number of parts can be done in the following recursive way. We divide the original graph into two parts using the algorithm described above. If both parts are non-empty, we recurse on each part; if one of the parts is empty (and the other contains all nodes), we are done. The latter case corresponds to the situation where the current graph (or subgraph) contains a single community.
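In outline (a sketch under the assumption that a modified, balance-free bisection routine is available; the bisect routine here is hypothetical):

#include <utility>
#include <vector>

using VertexSet = std::vector<int>;

// Hypothetical: two-way split of `sub` minimizing the modularity cut;
// returns one empty side when the subgraph is a single community.
std::pair<VertexSet, VertexSet> bisect(const VertexSet& sub);

// Recursively determine the communities, and hence their number.
void find_communities(const VertexSet& sub, std::vector<VertexSet>& out) {
    auto [left, right] = bisect(sub);
    if (left.empty() || right.empty()) {  // one side empty: a single community
        out.push_back(sub);
        return;
    }
    find_communities(left, out);
    find_communities(right, out);
}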
The final issue we discuss is how to avoid the necessity of working explicitly with the edges of G′ that are not in G and, as a result, to avoid the Ω(n²) bound on the running time. The idea is to use explicitly in the algorithm only the edges of G, while handling the other ones implicitly by correcting the computed values in constant time. For instance, suppose that we have a partition P of the nodes of G into two sets of sizes n_1 and n_2, respectively, and we have computed the weight of the corresponding cut in G, say w_G. Our goal is to evaluate the corresponding cut in G′. Assume that the random graph model is G(n, p). Then the weight of the cut corresponding to P in G′ is w_{G′} = w_G − n_1 n_2 p by formula (2.2). Hence it takes O(1) time to compute w_{G′} knowing w_G. In a similar way one can compute the weight of the cut in G′ in the case of the Chung–Lu model; see [14] for details.
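In code the correction is a constant-time one-liner (a sketch for the G(n, p) case):

// Cut weight in the complete graph G' from the cut weight w_G measured on the
// sparse graph G alone: each of the n1*n2 cross pairs contributes -p, and the
// actual cut edges have already contributed 1 each (formula (2.2)).
double cut_in_gprime(double w_G, long long n1, long long n2, double p) {
    return w_G - static_cast<double>(n1) * static_cast<double>(n2) * p;
}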
While the Challenge website already contains ranking results for several of the algorithms [39], including ours, those algorithms were not run on the same computer. Furthermore, the data on the website has not been converted into an easy-to-read format. Therefore, we believe it is worth including in this paper a direct comparison of our algorithm against a top-performing algorithm from the challenge.
Ovelgönne and Geyer-Schulz's algorithm [37], ranked as number one in the Challenge, exploits the idea of ensemble learning. It learns weak graph clusterings and uses them to find a strong graph clustering. Table 1 compares the performance of our algorithm with that algorithm.
The test graphs in our experiments are the Co-author and Citation Networks
and the Clustering Instances datasets of the DIMACS Challenge testbed [40], [2],
[4], [6], [8], [7], [42], [1], [32], [25], [29], [19], [26], [28], [16], [30], [41], [27],
[36], [34], [5], [21], [15], [20], and [3]. All experiments have been run on an
Intel(R) Core(TM) i3 CPU M370 2.40 GHz notebook computer with 4 GB of memory.
For each experiment, the table shows the average running time and the modularity of the partition for each of the algorithms. The results show that the modularity of the clusterings our algorithm finds is 7% lower on average, but our algorithm is 48 times faster on average. For one instance (kron-g500-logn16), our algorithm is 390 times faster.
One of the reasons that the modularities of our partitions are lower than the modularities produced by the code of [37] is that our algorithm is based on a version of Metis that is known to perform poorly on power-law graphs, and hence our algorithm inherits the same weakness. Most of the networks in the testbed have power-law or otherwise non-uniform degree distributions, which may explain some of the results. There is a newer version of Metis that is claimed to partition power-law graphs successfully, and it could be used for a new implementation of our algorithm.
5. Conclusion
We proved in this paper that the modularity optimization problem is equivalent
to the problem of finding a minimum cut of a complete graph with real edge weights.
We also showed that the resulting minimum cut problem can be solved based on
existing software for graph partitioning. Our implementation was based on Metis,
but we believe most other high-quality graph partitioning tools can be used for the
same purpose. Of particular interest will be using a parallel partitioner as this will
yield a parallel code for community detection.
References
[1] L. A. Adamic and N. Glance, The political blogosphere and the 2004 us election, WWW-2005
Workshop on the Weblogging Ecosystem (2005).
[2] Albert-László Barabási and Réka Albert, Emergence of scaling in random networks, Science
286 (1999), no. 5439, 509–512, DOI 10.1126/science.286.5439.509. MR2091634
[3] Alex Arenas, http://deim.urv.cat/~aarenas/data/welcome.htm.
[4] D. Baird and R.E. Ulanowicz, The seasonal dynamics of the chesapeake bay ecosystem, Ecol.
Monogr. 59 (1989), 329–364.
[5] M. Boguñá, R. Pastor-Satorras, A. Diaz-Guilera, and A. Arenas, Pgp network, Physical Re-
view E 70 (2004).
[6] Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna, Ubicrawler: A scal-
able fully distributed web crawler, Software: Practice & Experience 34 (2004), no. 8, 711–726.
[7] Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna, Layered label propagation:
A multiresolution coordinate-free ordering for compressing social networks, Proceedings of
the 20th international conference on World Wide Web, ACM Press, 2011.
[8] Paolo Boldi and Sebastiano Vigna, The WebGraph framework I: Compression techniques,
Proc. of the Thirteenth International World Wide Web Conference (WWW 2004) (Manhat-
tan, USA), ACM Press, 2004, pp. 595–601.
[9] Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Görke, Martin Hoefer, Zoran Nikoloski,
and Dorothea Wagner, On modularity clustering, IEEE Trans. Knowl. Data Eng. 20 (2008),
no. 2, 172–188.
[10] Fan Chung and Linyuan Lu, Connected components in random graphs with given expected de-
gree sequences, Ann. Comb. 6 (2002), no. 2, 125–145, DOI 10.1007/PL00012580. MR1955514
(2003k:05123)
[11] Aaron Clauset, M. E. J. Newman, and Cristopher Moore, Finding community structure in
very large networks, Physical Review E 70 (2004), 066111.
[12] H. Djidjev, A fast multilevel algorithm for graph clustering and community detection, Al-
gorithms and Models for the Web-Graph, Lecture Notes in Computer Science, vol. 4936,
2008.
[13] Hristo Djidjev and Melih Onus, A scalable multilevel algorithm for community structure de-
tection, WAW’06, Tech. Report LA-UR-06-6261, Los Alamos National Laboratory, September
2006.
[14] Hristo N. Djidjev and Melih Onus, Scalable and accurate graph clustering and commu-
nity structure detection, IEEE Transactions on Parallel and Distributed Systems 99 (2012),
no. PrePrints.
[15] J. Duch and A. Arenas, C. elegans metabolic network, Physical Review E 72 (2005).
[16] J. Duch and A. Arenas, Condensed matter collaborations 2003, Phys. Rev. E 72 (2005).
[17] P. Erdős and A. Rényi, On random graphs. I, Publ. Math. Debrecen 6 (1959), 290–297.
MR0120167 (22 #10924)
[18] Michael R. Garey and David S. Johnson, Computers and intractability, W. H. Freeman and
Co., San Francisco, Calif., 1979. A guide to the theory of NP-completeness; A Series of Books
in the Mathematical Sciences. MR519066 (80g:68056)
[19] M. Girvan and M. E. J. Newman, Community structure in social and biological net-
works, Proc. Natl. Acad. Sci. USA 99 (2002), no. 12, 7821–7826 (electronic), DOI
10.1073/pnas.122653799. MR1908073
[20] P. Gleiser and L. Danon, Jazz musicians network, Adv. Complex Syst. 565 (2003).
[21] R. Guimerà, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas, E-mail network urv,
Physical Review E 68 (2003).
[22] Roger Guimerà and Luis A. Nunes Amaral, Functional cartography of complex metabolic
networks, Nature 433 (2005), 895.
[23] George Karypis and Vipin Kumar, Multilevel graph partitioning schemes., International Con-
ference on Parallel Processing, 1995, pp. 113–122.
[24] George Karypis and Vipin Kumar, A fast and high quality multilevel scheme for partition-
ing irregular graphs, SIAM J. Sci. Comput. 20 (1998), no. 1, 359–392 (electronic), DOI
10.1137/S1064827595287997. MR1639073 (99f:68158)
[25] D. E. Knuth, Les miserables: coappearance network of characters in the novel les miserables,
The Stanford GraphBase: A Platform for Combinatorial Computing (1993).
[26] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Dolphin
social network, Behavioral Ecology and Sociobiology 54 (2003), 396–405.
[27] M. E. J. Newman, The structure of scientific collaboration networks, Proc. Natl. Acad. Sci.
USA 98 (2001), no. 2, 404–409 (electronic), DOI 10.1073/pnas.021544898. MR1812610
[28] M. E. J. Newman, Condensed matter collaborations 2005, Proc. Natl. Acad. Sci. USA 98
(2001), 404–409.
[29] M. E. J. Newman, High-energy theory collaborations, Proc. Natl. Acad. Sci. USA 98 (2001),
404–409.
[30] M. E. J. Newman, The structure of scientific collaboration networks, Proc. Natl. Acad. Sci.
USA 98 (2001), no. 2, 404–409 (electronic), DOI 10.1073/pnas.021544898. MR1812610
[31] M. E. J. Newman, Mixing patterns in networks, Phys. Rev. E (3) 67 (2003), no. 2, 026126,
13, DOI 10.1103/PhysRevE.67.026126. MR1975193 (2004f:91126)
[32] M. E. J. Newman, Coauthorships in network science, Phys. Rev. E 74 (2006).
[33] M. E. J. Newman, Finding community structure in networks using the eigenvectors of ma-
trices, Phys. Rev. E (3) 74 (2006), no. 3, 036104, 19, DOI 10.1103/PhysRevE.74.036104.
MR2282139 (2007j:82115)
[34] M. E. J. Newman, Finding community structure in networks using the eigenvectors of ma-
trices, Phys. Rev. E (3) 74 (2006), no. 3, 036104, 19, DOI 10.1103/PhysRevE.74.036104.
MR2282139 (2007j:82115)
[35] M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks,
Physical Review E 69 (2004), 026113.
[36] Mark Newman, Internet: a symmetrized snapshot of the structure of the internet at the level
of autonomous systems, The University of Oregon Route Views Project (2006).
[37] Michael Ovelgönne and Andreas Geyer-Schulz, A divisive clustering technique for maximizing the modularity, In: 10th DIMACS Implementation Challenge (Atlanta, Georgia), 2012.
[38] Jörg Reichardt and Stefan Bornholdt, Statistical mechanics of community detection, Phys.
Rev. E (3) 74 (2006), no. 1, 016110, 14, DOI 10.1103/PhysRevE.74.016110. MR2276596
(2007h:82089)
[39] Tenth DIMACS implementation challenge results, http://www.cc.gatech.edu/dimacs10/
results, accessed: 1/9/2012.
[40] D. Watts and S. Strogatz, Collective dynamics of small-world networks, Nature (1998).
[41] D. Watts and S. Strogatz, Neural network, Nature 393 (1998), 440–442.
[42] D. Watts and S. Strogatz, Power grid, Nature 393 (1998), 440–442.
[43] S. White and P. Smyth, A spectral clustering approach to finding communities in graph,
Proceedings of the SIAM International Conference on Data Mining, 2005.
Information Sciences, Los Alamos National Laboratory, Los Alamos, New Mexico 87545
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11705

Modularity Maximization in Networks by Variable Neighborhood Search

D. Aloise, G. Caporossi, P. Hansen, L. Liberti, S. Perron, and M. Ruiz
1. Introduction
Clustering is an important chapter of data analysis and data mining with nu-
merous applications in natural and social sciences as well as in engineering and
medicine. It aims at solving the following general problem: given a set of entities,
find subsets, or clusters, which are homogeneous and/or well-separated. As the con-
cepts of homogeneity and of separation can be made precise in many ways, there are
a large variety of clustering problems [HJ, JMF, KR, M]. These problems in turn
are solved by exact algorithms or, more often and particularly for large data sets,
by heuristics, of which there are frequently a large variety. An exact algorithm pro-
vides, hopefully in reasonable computing time, an optimal solution together with a
where aC is the fraction of all edges that lie within module C and eC is the expected
value of the same quantity in a graph in which the vertices have the same expected
degrees but edges are placed at random. A maximum value of Q near 0 indicates that the network considered is close to a random one (barring fluctuations), while a maximum value of Q near 1 indicates strong community structure. Observe
that maximizing modularity gives an optimal partition together with the optimal
number of modules.
Let the vertex weight function be defined as:

ω(v) = Σ_{{u,v}∈E} ω({u, v})                        if {v, v} ∉ E,
ω(v) = Σ_{{u,v}∈E, u≠v} ω({u, v}) + 2ω({v, v})      if {v, v} ∈ E.
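A small sketch of this definition (assuming adjacency lists in which a self-loop {v, v} is stored once):

#include <utility>
#include <vector>

// omega(v): sum of the weights of the edges incident with v, with a
// self-loop counted twice, matching the case distinction above.
double vertex_weight(const std::vector<std::vector<std::pair<int, double>>>& adj,
                     int v) {
    double w = 0.0;
    for (auto [u, wt] : adj[v])
        w += (u == v) ? 2.0 * wt : wt;
    return w;
}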
components of x (i.e., one component is set from 0 to 1 and the other goes from 1 to 0). A local search or improving heuristic consists of choosing an initial solution x, and then moving to the best neighbor x′ ∈ N(x) in the case f(x′) > f(x). If no such neighbor exists, the heuristic stops; otherwise it is iterated.
If many local maxima exist for a problem, the range of values they span may be large. Moreover, the globally optimal value f(x∗) may differ substantially from the average value of a local maximum, or even from the best such value among
many, obtained by some simple randomized heuristic. In order to escape from
local maxima and, more precisely, the mountains of which they are the top, VNS
exploits the idea of neighborhood change. In fact, VNS relies upon the following
observations:
Fact 1: A local maximum with respect to one neighborhood structure is not
necessarily so for another;
Fact 2: A global maximum is a local maximum with respect to all possible
neighborhood structures;
Fact 3: For many problems local maxima with respect to one or several neigh-
borhoods are relatively close to each other.
Let us denote by N_t (t = 1, . . . , t_max) a finite set of pre-selected neighborhood structures, and by N_t(x) the set of solutions in the t-th neighborhood of x. We call x a local maximum with respect to N_t if there is no solution x′ ∈ N_t(x) such that f(x′) > f(x).
In the VNS framework, the neighborhoods used correspond to various types
of moves, or perturbations, of the current solution, and are problem specific. The
current best solution x found is the center of the search. When looking for a better one, a solution x′ is drawn at random in an increasingly far neighborhood and a local ascent is performed from x′, leading to another local maximum x″. If f(x″) ≤ f(x), x″ is ignored and one chooses a new neighbor solution x′ in a further neighborhood of x. If, otherwise, f(x″) > f(x), the search is re-centered around x″, restarting with the closest neighborhood. If all neighborhoods of x have been explored without success, one begins again with the closest one to x, until a stopping condition (e.g., maximum CPU time) is satisfied.
As the size of neighborhoods tends to increase with their distance from the
current best solution x, close-by neighborhoods are explored more thoroughly than
far away ones. This strategy takes advantage of the three Facts 1–3 mentioned
above. Indeed it is often observed that most or all local maxima of combinatorial
problems are concentrated in a small part of the solution space. Thus, finding a
first local maximum x implies that some important information has been obtained:
to get a better, near-optimal solution, one should first explore its vicinity.
The algorithm proposed in this work has two main components: (i) an improve-
ment heuristic, and (ii) exploration of different types of neighborhoods for getting
out of local maxima. They are used within a variable neighborhood decomposition
search framework [HMP] which explores the structure of the problem concentrating
on small parts of it. The basic components as well as the decomposition framework
are described in the next sections.
executes in near linear time (in fact, each iteration of label propagation executes
in time proportional to m), while one round of merging pairs of communities can
execute in O(m log n) [SC].
Label propagation is a similarity-based technique in which the label of a vertex
is propagated to adjacent vertices according to their proximity. Label propaga-
tion algorithms for clustering problems assume that the label of a node corresponds to its incumbent cluster index. Then, at each label propagation step, each ver-
tex is sequentially evaluated for label updating according to a propagation rule.
In [BC], Barber and Clark propose a label propagation algorithm, called LPA, for modularity maximization. Their label updating rule for vertex v is (see [BC] for details):

(4)    ℓ_v ← argmax_ℓ Σ_{u=1}^{n} B_{uv} δ(ℓ_u, ℓ),

where ℓ_v is the label of vertex v, B_{uv} = ω({u, v}) − ω(u)ω(v)/(2m), and δ(i, j) is the Kronecker delta.
Moreover, the authors prove that the candidate labels for ℓ_v in eq. (4) can be confined to the labels of the vertices adjacent to v and an unused label. We decided to use this fact in order to speed up LPA. Let us consider a vertex v∗ ∈ C, where C is a module of the current partition, and let us suppose that the modules to which its adjacent vertices belong have not changed since the last evaluation of v∗. In this case, v∗ can be discarded for evaluation, since no value in eq. (4) has changed since the last evaluation of v∗. With that in mind, we decided to iterate over “labels” instead of over the vertices of the graph.
We used LPAm+ modified as follows. A list L of all labels is initialized with the cluster indices of the current partition. Then, from L, we proceed by picking a label ℓ ∈ L until L is empty. Each time a label ℓ is removed from L, we evaluate by means of eq. (4) all its vertices for label updating. If the label of a vertex is updated, yielding an improvement in the modularity value of the current partition, the old and the new labels of that vertex, denoted ℓ_old and ℓ_new, are inserted in L. Moreover, the labels of vertices which are connected to a node with label equal to either ℓ_old or ℓ_new are also inserted in L. This modification induces a considerable algorithmic speedup, since only a few labels need to be evaluated as the algorithm proceeds.
We then tested this modified LPAm+, and proceeded to improve it based on
empirical observations. In the final version, whenever a vertex relabeling yields an
improvement, the old and the new labels of that vertex are added to L but only
together with the labels of vertices which are adjacent to the relabeled vertex. This
version was selected to be used in our experiments due to its benefits in terms of
computing times and modularity maximization.
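A sketch of this label work-list (the containers and the helpers vertices_of and try_relabel, which applies the eq. (4) update and reports whether it improved modularity, are assumed to exist elsewhere):

#include <deque>
#include <unordered_set>
#include <vector>

extern std::vector<int> label;                    // label[v]: cluster of v
extern std::vector<std::vector<int>> adj;         // adjacency lists
std::vector<int> vertices_of(int l);              // assumed given
bool try_relabel(int v, int& old_l, int& new_l);  // eq. (4); true if improving

// Work-list over labels: pop a label, re-evaluate its vertices, and on an
// improving relabel re-enqueue the old and new labels together with the
// labels of the vertices adjacent to the relabeled vertex (final version).
void propagate(std::deque<int>& L) {
    std::unordered_set<int> inL(L.begin(), L.end());
    while (!L.empty()) {
        int l = L.front(); L.pop_front(); inL.erase(l);
        for (int v : vertices_of(l)) {
            int old_l, new_l;
            if (!try_relabel(v, old_l, new_l)) continue;
            std::vector<int> touched = {old_l, new_l};
            for (int u : adj[v]) touched.push_back(label[u]);
            for (int t : touched)
                if (inL.insert(t).second) L.push_back(t);
        }
    }
}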
(3) NEIGHBOR: relabels each vertex of a cluster to one of the labels of its
neighbors or to an unused label.
(4) EDGE: puts two linked vertices assigned to different clusters into one randomly chosen neighboring cluster.
(5) FUSION: merges two or more clusters into a single one.
(6) REDISTRIBUTION: destroys a cluster and spreads each one of its vertices to a randomly chosen neighboring cluster.
1   Algorithm VNDS(P)
2       Construct a random solution x;
3       x ← LPAm+(x, P);
4       s ← 1;
5       while stopping condition not satisfied do
6           Construct a subproblem S from x with a randomly selected cluster
            and s − 1 neighboring clusters;
7           Select randomly α ∈ {singleton, division, neighbor, fusion, redistribution};
8           x′ ← shaking(x, α, S);
9           x′ ← LPAm+(x′, S);
10          if cost(x′) > cost(x) then
11              x ← LPAm(x′, P);
12              s ← 1;
13          else
14              s ← s + 1;
15              if s > min{MAX_SIZE, #clusters(x)} then
16                  s ← 1;
17              end
18          end
19      end
20      return x

Algorithm 1: Pseudo-code of the decomposition heuristic.
The algorithm VNDS starts with a random solution for an input problem P in line 2. Then, in line 3, this solution is improved by applying our implementation of LPAm+. Note that LPAm+ receives two input parameters: (i) the solution to be improved, and (ii) the space in which an improvement will be searched. In line 3, the local search is applied in the whole problem space P, which means that
all vertices are tested for label updating, and all clusters are considered for merging.
In line 4, the variable s which controls the current decomposition size is set to 1.
The central part of the algorithm VNDS consists of the loop executed in lines 5-19 until a stopping criterion is met (this can be the number of non-improving iterations for the Pareto Challenge or the maximum allowed CPU time for the Quality Challenge). This loop starts in line 6 by constructing a subproblem from a randomly selected cluster and s − 1 neighboring clusters. Then, in line 7, a neighborhood α is randomly selected for perturbing the incumbent solution x. Our algorithm allows choosing α by specifying a probability distribution on the neighborhoods. Thus, the most successful neighborhoods are more often selected. The shaking routine is actually performed in line 8 in the chosen neighborhood α and in the search space defined by subproblem S. In the following, the improving heuristic LPAm+ is applied over x′ in line 9, only in the current subproblem S. If the new solution x′ is better than x in line 10, a faster version of the improving heuristic, denoted LPAm, is applied in line 11 over x′ in the whole problem P. In this version, the improving heuristic does not evaluate merging clusters. The resulting solution of the LPAm application is assigned to x in line 11, and s is reset to 1 in line 12. Otherwise, if x′ is not better than x, the size of the decomposition is increased by one in line 14. This value is reset to 1 in line 16 if it exceeds the minimum between a given parameter MAX_SIZE and the number of clusters (i.e., #clusters(x)) in the current solution x (line 15). Finally, a solution x is returned by the algorithm in line 20.
4. Experimental Results
The algorithms were implemented in C++ and compiled with gcc 4.5.2. Limited computational experiments allowed us to set the parameters of the VNDS algorithm as follows:
• MAX_SIZE = 15
• Probability distribution for selecting α is drawn with:
– 30% of chances of selecting SINGLETON
– 30% of chances of selecting DIVISION
– 28% of chances of selecting NEIGHBOR
– 5% of chances of selecting EDGE
– 4% of chances of selecting FUSION
We note that VNDS finds the optimal solutions of instances karate, chesapeake, dolphins, lesmis, polbooks, adjnoun, football, and jazz. Except for instance adjnoun, where the optimal solution is found in 2 out of 5 runs, the optimal solutions of the aforementioned instances are obtained in all runs.
4.2. Results for Pareto Challenge. The results presented in this section and in the following one refer to the final modularity instances of the 10th DIMACS Implementation Challenge. For this section in particular, results are presented both in terms of modularity values and CPU times, which are the two performance dimensions evaluated in the Pareto challenge. Computational experiments were performed on an Intel X3353 with a 2.66 GHz clock and 24 GB of RAM.
4.3. Results for Quality Challenge. Since the amount of work needed to compute a solution is not taken into consideration in this challenge, the VNDS algorithm was allowed to run for a longer period of time than before, the CPU time limit being the only stopping condition. In this set of experiments, the instances were split into two categories: the algorithm was allowed to run for 1800 seconds (30 minutes) on instances in category Qu1, and for 18000 seconds (5 hours) on instances in category Qu2. Furthermore, in order to overcome memory limitations, VNDS was executed on an Intel Westmere-EP X5650 with a 2.66 GHz clock and 48 GB of RAM for the largest instances. This allowed the algorithm to obtain solutions for instances kron_g500-simple-logn20, cage15, and uk-2002.
Table 3 presents the computational results obtained in 10 independent runs of algorithm VNDS. We chose to present here the same results submitted to the DIMACS Implementation Challenge. The first column gives the category of the instance indicated in the second column. The third and fourth columns report the best modularity value obtained and its corresponding number of clusters. Finally, the last column shows the rank of this solution among the 15 participating teams.
5. Conclusion
Several integer programming approaches and numerous heuristics have been applied to modularity maximization, due mostly to the physics and computer science research communities. We have applied the variable neighborhood search metaheuristic to this problem, and it proves to be very effective. For problems with known optimum values, the heuristic always found an optimal solution at least once. In the DIMACS Implementation Challenge, the best known solution was provided for 11 out of 30 instances. Overall, the proposed algorithm obtained the second prize in the modularity Quality challenge and fifth place in the Pareto challenge.
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11702

Network Clustering via Clique Relaxations: A Community Based Approach
Anurag Verma and Sergiy Butenko
1. Introduction
Network (graph) based data mining is an emerging field that studies network representations of data sets generated by an underlying complex system in order to draw meaningful conclusions about the system's properties. In a network representation of a complex system, the network's nodes typically denote the system's entities, while the edges between nodes represent a certain kind of similarity or relationship between the entities. Network clustering, which aims to partition a network into clusters of similar elements, is an important task frequently arising in this context. The form of each cluster in the partitioning is commonly specified through a predefined graph structure. Since a cluster is typically understood as a "tightly knit" group of elements, the graph-theoretic concept of a clique, a subset of nodes inducing a complete subgraph, is a natural formalization of a cluster and has been used in many applications. It yields a partitioning into clusters with the highest level of cohesiveness one can hope for.
However, in many applications modeling clusters as cliques is excessively restrictive, since a highly cohesive structure may fail to be identified as a cluster due to the mere absence of a few edges. In real-life data sets this is of critical importance, because some edges could be missing either naturally or due to erroneous data collection. Moreover, given that networks arising in many important applications tend to be very large in the number of nodes and very sparse in edge density, clique clustering usually results in a meaninglessly large number of clusters in such situations. In addition, computing large cliques and good clique partitions are computationally challenging problems, as
finding a maximum clique and a minimum clique partition in a graph are classical NP-hard
problems [8].
To circumvent these issues, researchers in several applied fields, such as social network analysis and computational biology, have defined and studied structures that relax some of the properties of cliques, and hence are called clique relaxations. Some of the popular clique relaxations include s-plexes, which require each vertex to be connected to all but s other vertices [11]; s-clubs, which require the diameter of the induced subgraph to be at most s [2]; and γ-quasi-cliques, which require the density of the induced subgraph to be at least γ [1]. It should be noted that a 1-plex, a 1-club, and a 1-quasi-clique each trivially represent a clique. By relaxing the properties of a clique, namely the degree, diameter, and density, these clique relaxations capture clusters that are strongly, but not completely, connected. However, like the clique model, these clique relaxations still suffer from the drawback of being computationally expensive.
In 1983, Seidman [10] introduced the concept of a k-core, which restricts the minimum number k of direct links a node must have with the rest of the cluster. Using k-cores to model clusters in a graph has considerable computational advantages over the other clique relaxation models mentioned above. Indeed, the problem of finding the largest k-core can be solved in polynomial time by recursively removing vertices of degree less than k. As a result, the k-core model has gained significant popularity as a network clustering tool in a wide range of applications. In particular, k-core clustering has been used as a tool to visualize very large scale networks [4], to identify highly interconnected subsystems of the stock market [9], and to detect molecular complexes and predict protein functions [3, 5]. On the downside, the size of a k-core may be much larger than k, creating the possibility of a low level of cohesion within the resulting cluster. Because of this, a k-core itself may not be a good model of a cluster; however, it has been observed that k-cores tend to contain other, more cohesive clique relaxation structures, such as s-plexes, and hence computing a k-core can be used as a scale-reduction step while detecting other structures [6].
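This peeling procedure is simple enough to state precisely; the following is a minimal sketch (ours, not the authors' code), assuming the graph is given as a dictionary mapping each vertex to the set of its neighbors.

from collections import deque

def k_core(adj, k):
    # Recursively remove vertices of degree less than k; what survives
    # is the (possibly empty) k-core. Runs in O(|V| + |E|).
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set(v for v, d in degree.items() if d < k)
    queue = deque(removed)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    removed.add(u)
                    queue.append(u)
    return set(adj) - removed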
Most recently, the authors of the current paper proposed yet another clique relaxation model of a cluster, referred to as k-community, that aims to retain the positive properties of k-cores while ensuring a higher level of cohesion [12]. More specifically, a k-community is a connected subgraph such that the endpoints of every edge have at least k common neighbors within the subgraph. The k-community structure has proven extremely effective in reducing the scale of very large, sparse instances of the maximum clique problem [12]. This paper explores the potential of the k-community structure as a network clustering tool. Even though the proposed clustering algorithm does not aim to optimize any quantitative measure of clustering quality, the results of numerical experiments show that, with some exceptions, it performs quite well with respect to most such measures available in the literature.
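The definition suggests a natural edge analogue of the k-core peeling above. The sketch below reflects our reading of the definition, not necessarily the implementation used in [12]: it repeatedly deletes edges whose endpoints have fewer than k common surviving neighbors and returns the connected components of what remains.

def k_communities(adj, k):
    # Work on a copy; vertices are assumed comparable (e.g., integers).
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            for u in list(adj[v]):
                if v < u and len(adj[v] & adj[u]) < k:
                    adj[v].discard(u)   # endpoints share < k neighbors:
                    adj[u].discard(v)   # the edge cannot survive
                    changed = True
    # Each connected component of the surviving edges is a k-community.
    seen, comps = set(), []
    for v in adj:
        if v in seen or not adj[v]:
            continue
        comp, stack = set(), [v]
        while stack:
            w = stack.pop()
            if w not in comp:
                comp.add(w)
                stack.extend(adj[w] - comp)
        seen |= comp
        comps.append(comp)
    return comps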
The remainder of this paper is organized as follows. Section 2 provides the necessary
background information. Section 3 outlines the proposed network clustering algorithm.
Section 4 reports the results of numerical experiments on several benchmark instances,
and Section 5 concludes the paper.
2. Background
In this paper, a network is described by a simple (i.e., with no self-loops or multi-edges) undirected graph G = (V, E) with the set V = {1, 2, . . . , n} of nodes and the set E of edges. We call an unordered pair of nodes u and v such that {u, v} ∈ E adjacent. A path of length l in a graph is a sequence of distinct vertices in which consecutive vertices are joined by an edge, using l edges in total.
3. Clustering Algorithm
The algorithm described in this section is based on the idea of finding k-communities for large k and placing them in different clusters. To this end, we identify the largest k such that the k-community of G is non-empty, and place all the k-communities formed in distinct clusters. Once this has been done, all the nodes that have been placed in clusters are removed from G, and the whole procedure is repeated until either k becomes small (reaches a lower bound l provided by the user) or no vertices are left to cluster. If any vertex is left to cluster, we attach it to the cluster that contains the most neighbors of that vertex. This basic procedure is described in Algorithm 2.
In this algorithm, we stop when k becomes small enough that a k-community becomes meaningless. For example, any set of vertices that induces a tree forms a 0-community. While in some cases this might be the best possible option (the original graph is a forest), for most clustering instances we would like the vertices in a cluster to share more than just one edge with the remaining nodes. For this paper, the lower bound l was set to 1 in Algorithm 2.
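Under the same assumptions as before, Algorithm 2 as described in the text might be sketched as follows, reusing the hypothetical k_communities routine above; the final loop implements the most-neighbors attachment rule.

def k_community_clustering(adj, l=1):
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    clusters = []
    while remaining:
        # Scan downward for the largest k >= l with a non-empty k-community.
        hi = max(len(nbrs) for nbrs in remaining.values())
        comps = []
        for k in range(hi, l - 1, -1):
            comps = k_communities(remaining, k)
            if comps:
                break
        if not comps:
            break                        # k has reached the lower bound l
        clusters.extend(comps)
        placed = set().union(*comps)     # remove clustered vertices from G
        for v in placed:
            del remaining[v]
        for v in remaining:
            remaining[v] -= placed
    # Attach each leftover vertex to the cluster holding most of its neighbors.
    for v in remaining:
        best = max(clusters, key=lambda c: len(adj[v] & c), default=None)
        if best is None:
            clusters.append({v})
        else:
            best.add(v)
    return clusters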
It should be noted that the clustering provided by Algorithm 2 does not aim to optimize any given criterion such as modularity, performance, average isolated inter-cluster conductance (aixc), average isolated inter-cluster expansion (aixe), or minimum intra-cluster density (mid), as described in the DIMACS 2011 challenge [7].
4. Computational Results
In this section we provide computational results obtained by using the k-community and k-core clusterings on the graph sets provided in the DIMACS 2011 challenge [7]. The computational results were obtained on a desktop machine (Intel Xeon E5620 @ 2.4 GHz, 16 cores, 12 GB RAM). All computations, except for the final steps of attaching leftover vertices to already formed clusters and the local search, used only one core. The local search and the attachment of leftover vertices were parallelized using OpenMP with 16 threads.
Table 1 presents the modularity and number of clusters found by Algorithm 2 using the k-core and k-community clusterings for 27 graphs. For each graph, the higher of the two modularities found by the two methods is highlighted in bold. It can be seen that k-community clustering is better on about half of the instances (14 of the 27 graphs tested). A closer look shows that k-community based clustering significantly outperforms k-core clustering (difference in modularity of more than 0.2) in 5 of those 14 instances, while it is significantly outperformed by k-core clustering only once among the remaining 13 instances. Some noteworthy examples are the preferentialAttachment, smallworld, luxembourg.osm, and belgium.osm graphs, where almost all nodes in the graph are identified as 4-, 6-, 1-, and 1-cores, respectively, and placed in one huge cluster by the k-core clustering. On the other hand, the k-community clustering is able to identify a more meaningful clustering. The examples provided in Figure 1 point to some potential reasons why k-cores are not able to cluster these graphs as well as k-communities do.
In addition, Table 1 also reports the time taken by the two approaches on each of the graphs. It can be seen that our approach scales well, with instances of up to 5 million vertices solved in reasonable time on a desktop machine.
Table 2 presents the modularity and number of clusters found by Algorithm 3 using the k-core and k-community clusterings for the same 27 graphs. It can be seen that k-community based clustering outperforms k-core based clustering on 19 of the 27 instances. On average, the improvement in modularity was 0.099 for the k-core based clustering and 0.122 for the k-community based clustering. The time required for clustering increases, but is still within reasonable limits. A user can decide for or against using the enhancements depending on the trade-off between the extra time required and the increase in modularity.
Table 3 presents the modularity, coverage, mirror coverage, performance, average isolated inter-cluster conductance, average isolated inter-cluster expansion, and minimum intra-cluster density of the clusterings found by the basic Algorithm 2 and the enhanced Algorithm 3 using the k-community based approach. For each graph, the table highlights the best modularity, performance, average isolated inter-cluster conductance, average isolated inter-cluster expansion, and minimum intra-cluster density entries in the respective columns. It can be noted that while the enhanced Algorithm 3 increases the modularity, it has an adverse effect on the other clustering measures. This is an important observation, suggesting that modularity maximization should not be used as the sole measure of good clustering.
5. Conclusion
This paper introduces k-community clustering, which can be thought of as something between k-core clustering and clique partitioning. The use of the polynomially computable k-community not only provides a faster approach, but also a more effective clustering method, able to identify cohesive structures that need not be cliques. k-Community clustering also provides advantages over k-core clustering due to the more cohesive nature of a k-community. As our computational results show, both k-cores and k-communities perform well on certain graphs, but the k-community approach outperforms the k-core approach in general.
Acknowledgements
This work was partially supported by NSF award OISE-0853804 and AFOSR Award
FA8651-12-2-0011.
References
[1] James Abello, Mauricio G. C. Resende, and Sandra Sudarsky, Massive quasi-clique detection, LATIN 2002: Theoretical Informatics (Cancun), Lecture Notes in Comput. Sci., vol. 2286, Springer, Berlin, 2002, pp. 598–612, DOI 10.1007/3-540-45995-2_51. MR1966153
[2] Richard D. Alba, A graph-theoretic definition of a sociometric clique, J. Mathematical Sociology 3 (1973), 113–126. MR0395938 (52 #16729)
[3] M. Altaf-Ul-Amin, K. Nishikata, T. Koma, T. Miyasato, Y. Shinbo, M. Arifuzzaman, C. Wada, and M. Maeda et al., Prediction of protein functions based on k-cores of protein-protein interaction networks and amino acid sequences, Genome Informatics 14 (2003), 498–499.
[4] J. I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani, k-core decomposition: a tool for the visualization of large scale networks, CoRR abs/cs/0504107 (2005).
[5] G. D. Bader and C. W. V. Hogue, An automated method for finding molecular complexes in large protein interaction networks, BMC Bioinformatics 4 (2003), no. 2.
[6] Balabhaskar Balasundaram, Sergiy Butenko, and Illya V. Hicks, Clique relaxations in social network analysis: the maximum k-plex problem, Oper. Res. 59 (2011), no. 1, 133–142, DOI 10.1287/opre.1100.0851. MR2814224 (2012e:91241)
[7] DIMACS, 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering, http://www.cc.gatech.edu/dimacs10/, 2011.
[8] Michael R. Garey and David S. Johnson, Computers and intractability: A guide to the theory of NP-completeness, W. H. Freeman and Co., San Francisco, Calif., 1979. MR519066 (80g:68056)
[9] J. Idicula, Highly interconnected subsystems of the stock market, NET Institute Working Paper No. 04-17, 2004. Available at SSRN: http://ssrn.com/abstract=634681.
[10] Stephen B. Seidman, Network structure and minimum degree, Social Networks 5 (1983), no. 3, 269–287, DOI 10.1016/0378-8733(83)90028-X. MR721295 (85e:05115)
[11] Stephen B. Seidman and Brian L. Foster, A graph-theoretic generalization of the clique concept, J. Math. Sociol. 6 (1978/79), no. 1, 139–154, DOI 10.1080/0022250X.1978.9989883. MR506325 (80j:92014)
[12] A. Verma and S. Butenko, Maximum clique problem on very large scale sparse networks.
Industrial & Systems Engineering, 3131 TAMU, Texas A&M University, College Station, Texas
E-mail address: [email protected]

Industrial & Systems Engineering, 3131 TAMU, Texas A&M University, College Station, Texas
E-mail address: [email protected]
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11708

Identifying Base Clusters and Their Application to Maximizing Modularity
Sriram Srinivasan, Tanmoy Chakraborty, and Sanjukta Bhowmick
1. Introduction
Many complex networks, such as those arising in biology [V], the social sciences [B1], and epidemiology [B3], exhibit community structure; that is, there exists a natural division of the vertices into groups that are tightly connected within themselves and sparsely connected across the groups. Identifying such naturally occurring communities is an important operation in analyzing complex networks. A popular method for obtaining good communities is to optimize the modularity of the network: the higher the modularity, generally the better the division into communities. Therefore, many community detection algorithms are designed with the objective of improving modularity.
There are several issues with using modularity as a metric for community detection. Finding the maximum modularity is an NP-complete problem [B5], and therefore, as with other combinatorial optimization problems, the ordering of the vertices in the network can significantly affect the results. Although high modularity values often indicate good divisions into communities, the highest modularity value need not reflect the best community division, as in examples exhibiting the resolution limit [G1]. Similarly, a near-optimal modularity does not necessarily mean
that the division is also near-optimal. Nevertheless, this metric has been effective in finding communities in networks where there exists an inherent and strong community structure, the key proposition being that the network can be naturally divided into distinct communities. Most community detection algorithms based on modularity optimization, however, do not contain any mechanism to ascertain whether the network indeed has such a community structure. This is a "chicken-and-egg" problem, because in order to discover communities, we first have to make sure that they exist.
In this paper, we propose a solution to this problem by introducing the concept of base clusters in communities. Base clusters consist of sets of vertices that form the kernel of each community in the network, and are groups of vertices that are always assigned to the same community, independent of the modularity maximization algorithm employed or the order in which the vertices are processed.
A naive but effective method for identifying these base clusters would be to execute different community detection methods under different vertex orderings and then compare the groupings to find vertices that are always assigned to the same cluster. This approach has been implemented in [O] as part of their ensemble learning, and recently in [L1], where the resulting groups are called consensus clusters. However, this technique is expensive, because it requires executing all the algorithms in the set, and the effect of a bad permutation may persist over several of the methods. We propose an orthogonal method of finding base clusters that is based only on the topology of the network.
In addition to indicating whether a network indeed possesses community structure, base clusters can also be used as a preprocessing step for modularity maximization. First the base clusters are identified and assigned to the same community, because they are guaranteed to be in the same group, and then modularity maximization is applied to the smaller network. Combining base clusters as an initial step helps bias the network to move towards the correct community division and generally also increases modularity. In this paper, we study the effect of preprocessing using base clustering on two agglomerative modularity maximization methods: (i) the method proposed by Clauset et al. in [C] (henceforth referred to as the CNM method), and (ii) the method proposed by Blondel et al. in [B2] (henceforth referred to as the Louvain method). Both methods are based on a greedy approach of combining, at each step, the pair of vertices that leads to the largest increase in modularity.
The remainder of this paper is arranged as follows. In Section 2, we provide a brief overview of the network terminology used in this paper, short descriptions and a comparison of the CNM and Louvain methods, and a discussion of a few other preprocessing algorithms for modularity maximization. In Section 3, we present our main contribution: an algorithm to find base clusters in networks. In Section 4, we present experimental results of using base clusters as a preprocessing step to modularity maximization and discuss the effectiveness of this technique. We conclude in Section 5 with a discussion of future research.
2. Background
Terminology: A network (or graph) G = (V, E) consists of a set of vertices V and a set of edges E. An edge e ∈ E is defined by two vertices {u, v}, which are called its endpoints. A vertex u is a neighbor of v if they share an edge. The degree of a vertex u is the number of its neighbors. A path of length l in a graph is a sequence of distinct vertices in which consecutive vertices are joined by an edge, using l edges in total.
times to get a consensus of the smaller clusters. Once these clusters are obtained, the base algorithm is executed over the entire graph. Note that all three of these preprocessing steps are variations of finding a kernel community, like our proposed method of base clusters. Our method differs in that we estimate the base clusters based only on the topology of the network, and instead of presupposing the existence of communities, we use base clusters to estimate whether a network indeed has good community structure. Our clusters should ideally be invariant for a given network, because they are based neither on random selections, as in the seeded method and the random walk, nor on the effect of an underlying algorithm, as in the case of the ensemble learning method. However, this is not always possible in practice, and the benefits and issues of the base clustering method are discussed in Section 3.
Other works, not on preprocessing but dealing with core communities, include a study of statistically significant communities obtained by perturbing the connectivity of the network and then comparing the change in community structure, by Karrer et al. [K1], and a recent work by Fortunato et al. [L1] that looks at the consensus communities found by different community detection algorithms on synthetically generated networks with varying degrees of community structure.
members of cliques may not always fall in the same community. In the example, vertices (2,3,4,5) form a clique, but the partition of the six vertices as ({1}, {2,3,4,5}, {6}) gives a negative modularity of −0.06. This is because, even though the vertices in the clique are tightly connected amongst themselves, each subgroup (2,3) and (4,5) also has a strong connection to an external community. For example, (2,3) has two edges to the external vertex (1) and also two edges to the internal vertex (4). Thus, after (2,3) is combined, it is equally likely to combine with (1), with (4), or with (5).
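The quoted value can be checked directly. Assuming the edge set implied by the text (the six clique edges plus 1-2, 1-3, 4-6, and 5-6, so m = 10; the accompanying figure itself did not survive extraction), the standard modularity of the partition ({1}, {2,3,4,5}, {6}) is

Q = \sum_c \left[ \frac{e_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]
  = \left( \frac{6}{10} - \left( \frac{16}{20} \right)^2 \right) - \left( \frac{2}{20} \right)^2 - \left( \frac{2}{20} \right)^2
  = 0.6 - 0.64 - 0.01 - 0.01 = -0.06,

where e_c is the number of edges inside cluster c and d_c is the total degree of its vertices.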
Ideally, each subgroup within a base community should have more internal connections than external ones, to resist the pull from vertices outside the group. But it is expensive to find groups of vertices that satisfy this condition. We therefore temporize and look for clusters where the number of internal connections is considerably greater than the number of external connections. In the results presented in this paper, we set the parameters such that the number of external connections is less than half the number of internal connections. However, unless the network has extremely well-defined communities, even this condition does not always hold.
To accommodate base clusters with more external edges, we note that having more external edges is not necessarily bad, so long as the external connections are to different communities. In this way the "pull" from other communities is reduced, even though there are more outside connections. Figure 2b gives an example where the network is partitioned such that, despite having more external edges, the "pull" is dissipated amongst different communities. The problem, however, is that we have not yet grouped the vertices into communities. Therefore, we do not know which of the edges point to the same community.
We use the community diameter to estimate the kernels. We define a community to have diameter d if the maximum shortest-path distance between two vertices in that community is d. We assume that consensus communities have diameters of at least 2. Then, if a base cluster is composed of a core vertex and its distance-1 neighbors, the neighbors of neighbors, i.e., vertices at distance 2 from the core vertex, are the first ones that can be on the edge of the community. We identify base clusters such that these distance-2 vertices exert less pull on the distance-1 neighbors, as follows:
We compute the fill-in of the vertices in the network and identify the ones with low fill-in (generally 0-2). We form a temporary base community C composed of such a vertex v and its neighbors. If the number of internal connections of each vertex in C is more than twice the number of external (to the core) connections, then C is designated as a base community.
Otherwise, we consider the set N of the distance-2 neighbors of v that are not elements of C. The edges incident to N can be classified as follows: (i) one endpoint connected to a vertex in community C (type X); (ii) both endpoints connected to vertices in N (type Y); and (iii) one endpoint connected to a vertex that is neither in C nor in N (type Z). A vertex in C is considered suitable for the base cluster if it (a) has fewer edges of type X than of type Y, and (b) has fewer edges of types X and Y together than of type Z. Condition (a) ensures that the distance-2 neighbors do not have significantly more connections to the vertices in the base cluster, which would pull them out, and condition (b) ensures that the set of external vertices has a larger "pull" from external communities other than C, and therefore it is likely that they will not exert as much "pull" on the vertices within C.
It is possible for a vertex to be designated to multiple base clusters. If a vertex has such multiple affiliations to several communities, we remove it. A side effect of removing these vertices is that the size of the base clusters now depends on the vertex ordering, and the base clusters also become smaller. However, this procedure reduces the ambiguity of the clusters, so we apply it in the current version of the algorithm. The pseudocode of our heuristic is shown in Algorithm 1.
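The first stage of this heuristic can be made concrete as below. This is our own sketch: it assumes the usual sparse-elimination notion of fill-in (the number of edges missing among a vertex's neighbors; the paper's formal definition falls in a part of the text lost here), and it omits the distance-2 (type X/Y/Z) refinement.

def fill_in(adj, v):
    # Edges that would have to be added to turn N(v) into a clique;
    # a low value means v sits in a nearly complete neighborhood.
    nbrs = list(adj[v])
    missing = 0
    for i in range(len(nbrs)):
        for j in range(i + 1, len(nbrs)):
            if nbrs[j] not in adj[nbrs[i]]:
                missing += 1
    return missing

def passes_first_test(adj, v):
    # Temporary base community C = {v} plus its neighbors; C is accepted
    # outright if every member has more than twice as many internal
    # connections as external ones.
    C = {v} | adj[v]
    for u in C:
        internal = len(adj[u] & C)
        external = len(adj[u] - C)
        if internal <= 2 * external:
            return False
    return True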
Our algorithm focuses on finding the innermost kernel of the consensus communities, and as such the size of the base clusters is likely to be considerably smaller than those found by the other preprocessing methods discussed in Section 2. However, recall that the primary objective of our algorithm is to check whether community structure exists in the network at all. In this respect, we are more successful than the other methods, because our algorithm will not return any base community if there is no community in the network of diameter larger than two. For example, our method returns zero base clusters for the Delaunay meshes, which ideally do not have community structure. Our method also returns zero base clusters for the Football graph (the network of American college football) [G1]. This is interesting because Football is known to have distinct communities. However, the diameters of the communities are in most cases at most two, and the lowest fill-in of the vertices is more than 10. Due to the absence of tight kernels, our algorithm cannot find any base clusters. The ratio of the number of vertices in base clusters to the total number of vertices provides an estimate of the strength of the communities in the network.
are: (i) Karate (network of members of Zachary's karate club) [Z] (V=34, E=78); (ii) Jazz (network of jazz musicians) [G2] (V=198, E=2742); (iii) PolBooks (network of books about USA politics) [K2] (V=105, E=441); (iv) Celegans (metabolic network of C. elegans) [D2] (V=453, E=2025); (v) Dolphins (social network of dolphins) [L3] (V=62, E=159); (vi) Email (network of e-mail interchanges between members of the University Rovira i Virgili) [G3] (V=1133, E=5451); (vii) Power (topology of the power grid of the western states of the USA) [W] (V=4941, E=6594); and (viii) PGP (component of the network of users of the Pretty-Good-Privacy algorithm) [B4] (V=10680, E=24316).
Algorithm Implementation. Although our underlying modularity maximization methods, CNM and Louvain, are extensively used in the network community, the available codes do not include provisions for preprocessing. We also could not find any easy-to-modify open source code that implements both methods. Therefore, to include the preprocessing step and to ensure a fair comparison, we implemented the methods (in STL C++) along with the additional preprocessing for finding base clusters. The primary purpose of the code was to understand how using base clusters affects modularity maximization. Therefore, although the results match the original versions, performance issues, such as execution time, are not optimized in our implementation. We anticipate developing a faster version of the algorithm in the future. Here we highlight some of the special characteristics of our implementation.
Unlike most other implementations, which use adjacency lists, we use a compressed sparse row (CSR) structure to store the network. CSR is a standard format for storing sparse matrices. We chose this storage because in future versions we plan to use matrix operations on the network. Additionally, even though the networks are undirected, we store both directions of each edge (i.e., {v,w} as well as {w,v}). This is done to accommodate directed networks when required. These features make the implementation slower than other versions of the algorithm. However, we are building towards general software, not just an algorithm for base clusters. In this set of experiments, time was used only to compare the different methods against each other in the same environment.
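As a sketch of the storage scheme just described (ours, with hypothetical names), an undirected edge list can be packed into CSR with both directions of every edge stored explicitly:

def to_csr(n, edges):
    # xadj[v] .. xadj[v+1] delimits the neighbors of v inside adjncy.
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1                 # both directions are stored
    xadj = [0] * (n + 1)
    for v in range(n):
        xadj[v + 1] = xadj[v] + deg[v]
    adjncy = [0] * xadj[n]
    pos = xadj[:n]                  # next free slot per vertex
    for u, v in edges:
        adjncy[pos[u]] = v; pos[u] += 1
        adjncy[pos[v]] = u; pos[v] += 1
    return xadj, adjncy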
In the CNM code we implemented a heap, as is popularly done, to find the highest change in modularity. However, as the iterations progressed, the heap continued to collect obsolete values associated with edges whose endpoints had merged into the same or different communities. The solution was either to recreate the heap after each iteration, or to verify the highest value in the heap against the value stored in the network and continue until a valid value was obtained. Both these options are computationally expensive. We implemented a compromise where the heap is recreated only if a certain number of misses (the top of the heap not being a valid value) is encountered. We set this value to 2.
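The compromise can be captured in a few lines. In this sketch (the names are ours, not the authors'), heap entries are (-delta_q, edge) pairs, current maps each live edge to its up-to-date modularity gain, and the caller rebuilds the heap whenever rebuild comes back True.

import heapq

def pop_best(heap, current, miss_limit=2):
    misses = 0
    while heap:
        neg_dq, edge = heapq.heappop(heap)
        if current.get(edge) == -neg_dq:   # entry is still valid
            return edge, -neg_dq, False
        misses += 1                        # stale: endpoints already merged
        if misses >= miss_limit:           # too many misses: ask for a rebuild
            return None, None, True
    return None, None, False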
In the Louvain implementation provided by the authors, there is a function for generating a random permutation of the vertices. The random permutation is not an essential part of the algorithm itself as described in [B2], but rather, we think, is included to ameliorate the effect of vertex perturbations. However, in our experiments we specifically want to see the effect of vertex permutations and compare their effects across the CNM and Louvain methods and their variations using base clusters. Therefore, we did not include the random permutation in our Louvain implementation. The Louvain method also recreates a compressed network at the end of each outer loop. This process reduces the running time significantly, as the subsequent operations are executed on a much smaller network. In our code, we use the final community allocation of the vertices to identify which of them are compressed into a supernode, but retain the original network. Consequently, our execution times for the larger networks are substantially slower compared to the code provided by the authors.
Empirical Results. We applied 60 permutations to each of the networks in the test suite. The permutation orders were created using a random number generator. For each permutation we applied the CNM and the Louvain methods, as well as both methods after finding and combining the base clusters. The statistics of the modularity obtained by these four methods are given in Tables 1 and 2.
We see that, in general, using base clustering increases the average modularity value as well as the highest one. There are a few exceptions, such as the average for Jazz and the maximum for Power under CNM, and the average for Email and Celegans and the maximum for PGP under Louvain. In general, the improvement is higher for CNM than for the Louvain method. We believe that this is due to the backtracking feature of the Louvain algorithm. We also compare the standard deviations of the values across the different perturbations. The range of values in Louvain is not as affected by using base clusters as that of CNM. This phenomenon once again points to the backtracking feature of the Louvain method, which automatically allows the process to adjust from any initial position to a good community partition. This leads us to conclude that the base clustering preprocessing would be most effective when the underlying algorithm does not contain self-adjusting steps.
The last column in Table 1 gives the percentage of vertices in the base clusters relative to the total number of vertices. We see that, compared to the other networks in the set, the percentage is rather low (9%) for the Power network, which indicates poor community structure and also matches our observations in Figure 1. The PGP network also has a low percentage (4%) of base cluster vertices, but since the network is large we only sampled 10% of the total vertices for fill-in. If the sample percentage is adjusted, the percentage of base clusters can go up to 40%.
The last column in Table 2 compares the best known modularity values obtained using other preprocessing methods. The ensemble strategy is denoted by (E), the random walk strategy by (R), and for networks where preprocessing was not used we tabulated the best known values listed in [N1], denoted by (O, for other). For networks with well-defined community structures (Karate and Jazz), base clustering can come very close to the highest modularity, but not so for the others. We believe this is because (i) base clusters try to find the kernels of the communities and are therefore independent of modularity, and (ii) the base clusters are of much smaller size.
Figure 3 plots the change in modularity over all the permutations of the Dolphin and the Power networks. In the Dolphin network we can see that using base clusters gives a significant boost to the CNM method. Also observe that although, in general, the Louvain methods can produce higher modularity, there exist certain cases where the CNM-with-base-communities method is equivalent to the Louvain method. This points to the importance of obtaining good permutations for a given algorithm and also indicates that the Dolphin network possesses community structure. In contrast, the values in the Power network are well separated. As we know, the Power network does not have as strong a community structure, so perhaps the separation of values by the two algorithms is an indication of that. We plan to investigate this phenomenon further in the future.
Table 3 compares the difference in community structure between the original methods and the methods with base cluster preprocessing using the Rand Index. Most of the values (with an exception for Email) are quite high (over 77%). However, the values are generally higher for the Louvain method, once again reflecting the effect of self-adjustment. Table 4 gives the average time (in seconds) to compute the original methods, the original methods with preprocessing, and the preprocessing alone. The codes were compiled with GNU g++, and the experiments were run on a Xeon dual-core processor with a 2.7 GHz clock speed and 32 GB of RAM. We see that in some cases preprocessing helps reduce the overall agglomeration time; however, finding the base clusters is generally as expensive as our current implementation of modularity maximization. But since the base clusters depend only on the network topology, finding them can be a one-time operation. After that, we can reuse the clusters for any underlying algorithm. Although not implemented in this paper, this technique can help make base cluster preprocessing more cost effective.
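For reference, the Rand Index [R1] used in Table 3 measures the fraction of vertex pairs on which two clusterings agree: with a the number of pairs placed in the same cluster by both methods and b the number placed in different clusters by both,

R = \frac{a + b}{\binom{n}{2}},

so the values above 77% mean that over three quarters of all vertex pairs are treated identically with and without preprocessing.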
It would also be instructive to compare how good our base cluster algorithm is at finding kernels of the consensus communities. However, to analyze this we would have to compute the consensus communities themselves, such as by comparing the common groups over multiple perturbations. This is possible for small networks, but not for large ones like PGP, because as the number of vertices grows it is important to check a large number of perturbations (as close to n! as possible) to cover as much of the search space as possible. In this paper we have computed the consensus communities for Jazz, Dolphin, and Power. Jazz has 86% of its vertices in the three highest consensus communities (our base clusters found 26%), and Dolphin has 74% of its vertices in the three highest consensus communities (our base clusters found 22%). These numbers are encouraging because we are only looking at the kernel, not the entire community, and on inspecting the base clusters obtained, we found that in most cases they indeed belong to the same consensus cluster. However, there are some false positives, in that if the nodes of two clusters are closely attached they can appear as one base cluster. This happens for some permutations in Jazz and Dolphins, and those are the ones where the modularity is not as high. For example, out of the 53 vertices in Jazz designated to be in the base communities, 5 were false positives. We found that the Louvain method is less forgiving of the false positives than the CNM method. In order to reduce the chances of false positives for the Louvain method, we only used cluster sizes ranging from 2 to 4. In the future we plan to further modify the base cluster identification algorithm to reduce these false positives.
The Power network has just 1% of its vertices in the largest three consensus communities, yet by our method we were able to find 9% of the nodes. On inspection we found that this happened because the base cluster method picked up many of the smaller communities that were built around a vertex with low fill-in. Once again, we need more stringent conditions in our algorithm to avoid picking up very small communities.
References
[B1] A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek, Evolution of the social network of scientific collaborations, Phys. A 311 (2002), no. 3-4, 590–614, DOI 10.1016/S0378-4371(02)00736-7. MR1943379
[B2] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of community hierarchies in large networks, J. Stat. Mech. (10) (2008).
[B3] Statistical mechanics of complex networks, Lecture Notes in Physics, vol. 625, Springer-Verlag, Berlin, 2003. Selected papers from the 18th Conference held in Sitges, June 10–14, 2002; edited by R. Pastor-Satorras, M. Rubi, and A. Diaz-Guilera. MR2179067 (2007e:82002)
[B4] M. Boguñá, R. Pastor-Satorras, A. Diaz-Guilera, and A. Arenas, Phys. Rev. E 70, 056122 (2004).
[B5] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, On modularity clustering, IEEE Transactions on Knowledge and Data Engineering 20 (2008), no. 2, 172–188.
[C] A. Clauset, M. E. J. Newman, and C. Moore, Finding community structure in very large networks, Phys. Rev. E 70 (2004), no. 6, 066111.
[D1] DIMACS testbed, http://www.cc.gatech.edu/dimacs10/downloads.shtml.
[D2] J. Duch and A. Arenas, Community identification using extremal optimization, Phys. Rev. E 72 (2005), 027104.
[G1] M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99 (2002), no. 12, 7821–7826 (electronic), DOI 10.1073/pnas.122653799. MR1908073
[G2] P. Gleiser and L. Danon, Adv. Complex Syst. 6, 565 (2003).
[G3] R. Guimerà, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas, Phys. Rev. E 68, 065103(R) (2003).
[G4] B. H. Good, Y.-A. de Montjoye, and A. Clauset, Performance of modularity maximization in practical contexts, Phys. Rev. E (3) 81 (2010), no. 4, 046106, DOI 10.1103/PhysRevE.81.046106. MR2736215 (2011g:05279)
[K1] B. Karrer, E. Levina, and M. E. J. Newman, Robustness of community structure in networks, Phys. Rev. E 77 (2008), no. 4.
[K2] V. Krebs, Books on US politics, http://www.orgnet.com/.
[L1] A. Lancichinetti and S. Fortunato, Consensus clustering in complex networks, Scientific Reports 2 (2012).
[L2] D. Lai, H. Lu, and C. Nardini, Enhanced modularity-based community detection by random walk network preprocessing, Phys. Rev. E 81, 066118 (2010).
[L3] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology 54, 396–405 (2003).
[N1] M. C. V. Nascimento and L. S. Pitsoulis, Community detection by modularity maximization using GRASP with path relinking, 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering (2012).
[N2] M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2004), no. 2, 026113.
[O] M. Ovelgönne and A. Geyer-Schulz, An ensemble learning strategy for graph clustering, 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering (2012).
[P] M. A. Porter, J.-P. Onnela, and P. J. Mucha, Communities in networks, Notices Amer. Math. Soc. 56 (2009), no. 9, 1082–1097. MR2568495
[R1] W. Rand, Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc. 66 (1971), no. 336, 846–850.
[R2] J. Riedy, D. A. Bader, K. Jiang, P. Pande, and R. Sharma, Detecting communities from given seeds in social networks, Technical Report, http://hdl.handle.net/1853/36980.
[V] K. Voevodski, S. H. Teng, and Y. Xia, Finding local communities in protein networks, BMC Bioinformatics 10 (2009), no. 10, 297.
[W] D. J. Watts and S. H. Strogatz, Nature 393, 440–442 (1998).
[Z] W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33 (1977), 452–473.
Complete Hierarchical Cut-Clustering: A Case Study
Michael Hamann, Tanja Hartmann, and Dorothea Wagner
1. Introduction
The aim of graph clustering is to identify subgraphs of high internal connectivity that are only sparsely interconnected. This vague notion has led to countless attempts to formalize properties that characterize a set of good clusters. The resulting variety of different quality measures still affects the design of algorithms, although for many measures the sufficiency of the underlying properties has not yet been examined or has even been disproven. That is, a good clustering according to a non-sufficient quality measure might still be implausible with respect to the given graph structure. For example, de Montgolfier et al. [4] showed that the asymptotic modularity of grids is 1, which is maximal since modularity ranges within [−0.5, 1]. However, by intuition the uniform structure of grids does not support any meaningful clustering, and thus a clustering of high modularity cannot be plausible either. Furthermore, common quality measures are generally hard to optimize, so heuristics are often used in practice.
2010 Mathematics Subject Classification. Primary 05C85, 05C75; Secondary 05C21, 05C40.
This work was partially supported by the DFG under grant WA 654/15 within the Priority
Programme “Algorithm Engineering”.
1 The inter-cluster expansion considered in this work is defined slightly differently from the common inter-cluster expansion. The latter normalizes by the number of vertices on the smaller cut side, while we count the vertices on the side that does not induce the cluster.
The cut-clustering algorithm returns only fine clusterings with low modularity values if there are no other plausible clusterings supported by the graph structure. Based on this result, we propose modularity applied to cut-clusterings as a good measure of how well a graph can be clustered.
2. Preliminaries
Throughout this work we consider a simple, undirected, weighted graph G = (V, E, c) with vertex set V, edge set E and a non-negative edge cost function c. In unweighted graphs we assign cost 1 to each edge. We denote the number of vertices (edges) by n := |V| (m := |E|) and the costs of a set $E' \subseteq E$ by $c(E') := \sum_{e \in E'} c(e)$. Whenever we consider the degree deg(v) of v ∈ V, we implicitly mean the sum of the costs of all edges incident to v. For S, T ⊂ V with S ∩ T = ∅, we write c(S, T) for the costs of all edges having one endpoint in S and one in T. If S, T induce a cut in G, i.e., S ∪ T = V, then c(S, T) describes the costs of this cut.
Our understanding of a clustering Ω(G) is a partition of the vertex set V into subsets C1, . . . , Ck, which define vertex-induced subgraphs, called clusters. A cluster is called trivial if it corresponds to a connected component. We consider a vertex that forms a non-trivial singleton cluster as unclustered. A clustering is trivial if it consists of trivial clusters or if k = n, i.e., all vertices are unclustered. A hierarchy of clusterings is a sequence Ω1(G) ≤ · · · ≤ Ωr(G) such that Ωi(G) ≤ Ωj(G) implies that each cluster in Ωi(G) is a subset of a cluster in Ωj(G). We say Ωi(G) ≤ Ωj(G) are hierarchically nested.
2.1. Quality Measures. A quality measure for clusterings is a mapping to
real numbers. Depending on the measure, either high or low values correspond to
high quality. In this work we consider three quality measures, modularity, intra-
cluster expansion and inter-cluster expansion. The former two indicate high quality
by high values. Inter-cluster expansion indicates good quality by low values.
Modularity was first introduced by Newman and Girvan [11] and is based on
the total edge costs covered by clusters. The values range between −0.5 and 1
and express the significance of a given clustering compared to a random clustering.
Formally, the modularity M(Ω) of a clustering Ω is defined as
$$\mathcal{M}(\Omega) := \sum_{C \in \Omega} \frac{c(E_C)}{c(E)} \;-\; \sum_{C \in \Omega} \frac{\big(\sum_{v \in C} \deg(v)\big)^2}{4\,c(E)^2},$$
where E_C denotes the set of edges with both endpoints in C.
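Read literally, this formula translates into a few lines of code. The following sketch (ours, assuming a networkx-style graph with a 'weight' edge attribute and ignoring self-loops) computes M(Ω) for a clustering given as a collection of vertex sets:

    def modularity(G, clustering, weight="weight"):
        # M(Omega) = sum_C c(E_C)/c(E) - sum_C (sum_{v in C} deg(v))^2 / (4 c(E)^2),
        # where deg(v) is the sum of the costs of the edges incident to v.
        c_E = sum(d.get(weight, 1) for _, _, d in G.edges(data=True))
        coverage = 0.0
        degree_penalty = 0.0
        for C in clustering:
            C = set(C)
            coverage += sum(d.get(weight, 1)
                            for u, v, d in G.edges(data=True) if u in C and v in C)
            deg_sum = sum(d.get(weight, 1)
                          for u in C for _, _, d in G.edges(u, data=True))
            degree_penalty += deg_sum ** 2
        return coverage / c_E - degree_penalty / (4 * c_E ** 2)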
Algorithm 1: CutC
Input: graph Gα = (Vα, Eα, cα) with artificial sink t ∈ Vα
 1  Ω ← ∅
 2  while ∃ u ∈ Vα \ {t} do
 3      Cu ← community of u in Gα w.r.t. t
 4      r(Cu) ← u
 5      forall Ci ∈ Ω do
 6          if r(Ci) ∈ Cu then Ω ← Ω \ {Ci}
 7      Ω ← Ω ∪ {Cu}; Vα ← Vα \ Cu
 8  return Ω
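Since the community of u w.r.t. t is simply the u-side of a minimum u-t cut in Gα, CutC can be prototyped directly on top of a max-flow routine. The following sketch is our reading of the pseudocode (not the authors' implementation), using networkx and representing Gα as a symmetric directed graph; the augmentation of G by a sink t connected to every vertex with cost α follows Flake et al., and the sink name is an arbitrary sentinel that must not collide with a vertex of G:

    import networkx as nx

    def cut_c(G, alpha, weight="weight"):
        # Build the augmented graph G_alpha with 'capacity' attributes.
        t = "__artificial_sink__"
        Ga = nx.DiGraph()
        for u, v, d in G.edges(data=True):
            c = d.get(weight, 1)
            Ga.add_edge(u, v, capacity=c)
            Ga.add_edge(v, u, capacity=c)
        for v in G.nodes():
            Ga.add_edge(v, t, capacity=alpha)
            Ga.add_edge(t, v, capacity=alpha)

        clusters = []  # pairs (representative r(C), cluster C)
        remaining = set(G.nodes())
        while remaining:
            u = next(iter(remaining))
            # Community of u w.r.t. t: source side of a minimum u-t cut (line 3).
            _, (source_side, _) = nx.minimum_cut(Ga, u, t)
            C = source_side - {t}
            # Drop earlier clusters whose representative is swallowed (lines 5-6).
            clusters = [(r, D) for (r, D) in clusters if r not in C]
            clusters.append((u, C))
            remaining -= C
        return [C for _, C in clusters]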
[Figure 1: Schematic of a complete hierarchy of cut-clusterings: parameter values α0 > α1 > · · · > αmax with associated clusterings Ω0 ≤ Ω1 ≤ · · · ≤ Ωmax.]
Otherwise, there would exist a set S ⊆ C \{r(C)} such that c(S, C \S) < c(S, V \C).
This implies that the cut (C \ S, V \ (C \ S)) is a cheaper r(C)-t-cut in Gα than
the cut (C, V \ C), which induces the cluster C. This contradicts the fact that C
is the community of r(C) in Gα . The costs of these cuts in Gα are
$$\begin{aligned}
c(C \setminus S,\, V \setminus (C \setminus S)) + \alpha|C \setminus S|
&= c(C \setminus S, S) + c(C \setminus S, V \setminus C) + \alpha|C \setminus S| \\
&< c(S, V \setminus C) + c(C \setminus S, V \setminus C) + \alpha|C| \\
&= c(C, V \setminus C) + \alpha|C|.
\end{aligned}$$
With similar arguments Flake et al. have further proven that the parameter
value α that was used to construct the augmented graph Gα constitutes a guarantee
on intra-cluster expansion and inter-cluster expansion:
Ψ(Ω) ≥ α ≥ Φ(Ω).
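The inter-cluster expansion Φ(Ω) in the sense of footnote 1 is cheap to evaluate. A sketch (ours, taking the worst, i.e. maximum, value over all clusters, which matches the per-cluster guarantee of Flake et al.):

    def inter_cluster_expansion(G, clustering, weight="weight"):
        # Normalize the cut of each cluster C by the number of vertices
        # on the side that does not induce the cluster, |V \ C|.
        n = G.number_of_nodes()
        phi = 0.0
        for C in clustering:
            C = set(C)
            cut = sum(d.get(weight, 1) for u, v, d in G.edges(data=True)
                      if (u in C) != (v in C))
            if n > len(C):
                phi = max(phi, cut / (n - len(C)))
        return phi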
Applying CutC iteratively with decreasing parameter values yields a hierarchy
of at most n different clusterings (cp. Fig. 1). This is due to a further nesting
property of communities, which is proven by Gallo et al. [6] as well as Flake et al. [5]:
Let C1 denote the community of a fixed vertex u in Gα1 and C2 the community of u in Gα2. Then C1 ⊆ C2 if α1 ≥ α2. The hierarchy is bounded by two trivial clusterings, which we already know in advance: the clustering at the top consists of the connected components of G and is returned by CutC for αmax = 0, while the clustering at the bottom consists of singletons and results if we choose α0 equal to the maximum edge cost in G.
The crucial point in the construction of such a clustering hierarchy, however, is the choice of α. If we choose the next value too close to a previous one, we get a clustering we already know, which implies unnecessary effort. If we choose the next value too far from any previous value, we possibly miss a meaningful clustering. In our experiments we thus use a simple parametric search approach that is guaranteed to return a complete hierarchy; for a detailed description of this approach see [8]. In order to find all different levels in the hierarchy, this approach constructs the breakpoints in the continuous parameter range between consecutive levels. That is, each clustering Ωi is assigned to an interval [αi, αi−1) on which CutC returns Ωi. The breakpoint αi marks the border to the next higher clustering Ωi+1, whereas αi−1 is the breakpoint between Ωi and the previous level Ωi−1. Thus the guarantee on expansion given by the parameter can be extended to
Ψ(Ωi) ≥ αi−1 > αi ≥ Φ(Ωi)
for each cut-clustering Ωi in the complete hierarchy. We call [αi, αi−1) the guarantee interval of Ωi.
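A hedged sketch of such a search: the exact breakpoint construction of [8] is replaced here by a plain numeric bisection of the parameter range down to a tolerance eps, with cut_c as sketched above:

    def complete_hierarchy(G, alpha_zero, eps=1e-6):
        # alpha_zero: maximum edge cost, yielding the singleton clustering.
        def levels(alpha):
            return frozenset(frozenset(C) for C in cut_c(G, alpha))

        hierarchy = {0.0: levels(0.0), alpha_zero: levels(alpha_zero)}

        def search(lo, hi):
            # Invariant: the clusterings at lo and hi differ; look for
            # further levels (and approximate breakpoints) in between.
            if hi - lo <= eps:
                return
            mid = (lo + hi) / 2.0
            omega = levels(mid)
            if omega != hierarchy[lo]:
                hierarchy[mid] = omega
                search(lo, mid)
            if omega != hierarchy[hi]:
                hierarchy.setdefault(mid, omega)
                search(mid, hi)

        search(0.0, alpha_zero)
        return hierarchy  # maps alpha values to clusterings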
Table 1. Sizes of the test instances (n = |V|, m = |E|).

graph                 n      m       graph                 n      m
karate                34     78      dolphins              62     159
lesmis                77     254     polbooks              105    441
adjnoun               112    425     football              115    613
jazz                  198    2742    celegansneural        297    2148
celegans metabolic    453    2025    delaunay n10          1024   3056
email                 1133   5451    polblogs              1490   16715
netscience            1589   2742    delaunay n11          2048   6127
bo cluster            2114   2203    data                  2851   15093
delaunay n12          4096   12264   dokuwiki org          4416   12914
power                 4941   6594    hep-th                8361   15751
PGPgiantcompo         10680  24316   astro-ph              16706  121251
cond-mat              16726  47594   as-22july06           22963  48436
cond-mat-2003         31163  120029  rgg n 2 15 s0         32768  160240
cond-mat-2005         40421  175691  G n pin pout          100000 501198
3. Experimental Study
The experiments in this work aim at two questions. The first question asks
how much more information the given guarantee on expansion provides, compared
to a trivial intra-cluster expansion bound that is easy to compute. Recall that
computing the intra-cluster expansion of a clustering is NP-hard, and thus, bounds
give at least an idea of the true values. Since we are nevertheless interested in the
actual intra-cluster expansion of cut-clusterings, we consider a further, non-trivial
lower bound, which is more costly to compute but also more precise than the trivial
bound. Finally we also look at the inter-cluster expansion, which can be efficiently
computed for a clustering. The second question focuses on the modularity values
that can be reached by cut-clusterings, and the plausibility of these values with
respect to the graph structure.
For our experiments we use real world instances as well as generated instances.
Most instances are taken from the testbed of the 10th DIMACS Implementation
Challenge [1], which provides benchmark instances for partitioning and clustering.
Additionally, we consider the protein interaction network bo cluster published by
Jeong et al. [9], a snapshot of the linked wiki pages at www.dokuwiki.org, which
we gathered ourselves, and 275 snapshots of the email-communication network of the Department of Informatics at KIT [2]. The latter have between roughly 200 and 400 vertices. The sizes of the remaining instances are listed in Table 1. Our analysis considers only one cut-clustering per instance, namely the cut-clustering with the best modularity value among all clusterings in the complete hierarchy. For the sake of better readability, the results for the snapshots of the email network are depicted separately from the remaining instances in the following figures. Furthermore, the instances are ordered by decreasing number of unclustered vertices in the cut-clusterings, which corresponds to an increasing order by coarseness. The instances, respectively their clusterings, are associated with points on the x-axis.
[Figure 2: Expansion bounds of the cut-clusterings: guarantee intervals together with the bounds A(Ω), B(Ω), and Bu(Ω), for the DIMACS instances (top) and the 275 snapshots of the email network of the Department of Informatics at KIT (bottom). Instances whose cut-clustering consists of singletons are marked accordingly.]
independent instances ignoring the edges between the clusters, the resulting bound A(Ω) potentially lies above the guarantee interval, which is also confirmed by our experiment (cp. Figure 2). That is, most of the cut-clusterings are even better than guaranteed. Besides, by reaching the upper bound Bu(Ω) in some further cases, the bound A(Ω) increases the fraction of instances for which we know the exact intra-cluster expansion to 20%.
allows single vertices to move in order to further increase modularity. Note that computing a modularity-optimal clustering is NP-hard [3].
Since high modularity values are known to be misleading in some cases, we further establish a plausibility check by testing whether the clusters of the reference clusterings satisfy the significance-property, which guarantees that they are clearly indicated by the graph structure. Recall that the clusters of the cut-clusterings have this property by construction. Figure 3 shows the percentage of significant clusters, i.e., clusters with the significance-property, in the reference clusterings. To also get a better idea of the structure of the cut-clusterings, we present the percentage of unclustered vertices in these clusterings. Unclustered vertices may occur due to the strict behavior of the cut-clustering algorithm, which is necessary in order to guarantee the significance-property. Note that, in contrast, none of the reference clusterings contains unclustered vertices. As a last piece of structural information on the clusterings, Figure 3 depicts the cluster sizes in terms of whisker bars.
With this information at hand we can say the following: In some cases the modularity of the cut-clusterings is quite low; however, it increases with the fraction of clustered vertices and the size of the clusters. It also reaches very high values, in particular for the snapshots of the email network and the "netscience" instance. This is rather unexpected behavior, since the cut-clustering algorithm is not designed to optimize modularity. We further observe a gap between the modularity values of many cut-clusterings and those of the corresponding reference clusterings. We conjecture that this is caused more by an implausibility of the modularity values of the reference clusterings than by an implausibility of the cut-clusterings. Our conjecture is based on the observation that the more significant the clusters in the reference clustering are, the closer the reference's modularity comes to the modularity of the cut-clustering, suggesting that the cut-clusterings are more trustworthy.
Furthermore, the Delaunay triangulations and the snapshots of the email network are nice examples that vividly reveal the meaningfulness and plausibility of the cut-clusterings. The latter consider emails that were sent at most 72 hours ago. In contrast to other email networks, which consider a longer period of time, this makes the snapshots very sparse and stresses recent communication links, which yields clear clusters of people who recently worked together. Thus, we would expect any feasible clustering approach to return meaningful non-trivial clusters. This is exactly what the cut-clustering algorithm as well as the modularity-based greedy approach do. In contrast, the Delaunay triangulations, generated from random points in the plane, are quite uniform structures. By intuition, significant clusters are rare therein. The cut-clustering algorithm confirms our intuition by leaving all vertices unclustered. This explains the low modularity values of these clusterings and indicates that the underlying graphs cannot be clustered well. The modularity-based reference clusterings, however, contradict this intuition, as they consist of large clusters containing at least 20 vertices.
[Figure 3: Percentage of unclustered vertices and cluster sizes (whisker bars) of the cut-clusterings, percentage of significant clusters in the reference clusterings, and modularity values of both, for the DIMACS instances (top) and the 275 snapshots of the email network of the Department of Informatics at KIT (bottom).]
[Figure 4: Guarantee intervals of the cut-clusterings and expansion bounds of the modularity-based reference clusterings, for the DIMACS instances (top) and the 275 snapshots of the email network of the Department of Informatics at KIT (bottom). Starred instances are those where the upper bound for the modularity-based clustering drops below the best lower bound for the cut-clustering; instances whose cut-clustering consists of singletons are marked accordingly.]
Figure 4 compares the guarantee interval and the alternative non-trivial lower bound A(Ω) for the cut-clusterings (already seen in Section 3.1) to the bounds for the modularity-based clusterings. For the snapshots of the email network we omit depicting A(Ω) for the cut-clusterings.

We observe that the trivial lower bound B(Ω) stays clearly below the guarantee, and compared to the trivial bound for the cut-clusterings in Section 3.1 (cp. Figure 2) this behavior is even more evident. For the instances other than the snapshots of the email network, the values of B(Ω) are so low that we omit depicting them.

In contrast, the alternative non-trivial lower bound A(Ω) for the modularity-based clusterings often exceeds the guarantee interval, particularly for the snapshots. Nevertheless, it rarely reaches the corresponding bound for the cut-clusterings; for 85% of the instances it stays below the best lower bound for the cut-clustering. Thus, with respect to the lower bounds, there is no evidence that the intra-cluster expansion of the modularity-based clusterings surpasses that of the cut-clusterings. The upper bound Φ(Ω), which drops below the best lower bound for the cut-clusterings in 23% of the cases, even proves a lower intra-cluster expansion for these clusterings. The corresponding instances in the upper part of Figure 4 are marked by a star.
4. Conclusion
In this work we examined the behavior of the hierarchical cut-clustering algorithm of Flake et al. [5] in the light of expansion and modularity. Cut-clusterings are worth studying since, in contrast to the results of other clustering approaches, they provide a guaranteed intra-cluster and inter-cluster expansion and are clearly indicated by the graph structure. The latter materializes in the significance-property, which says that each set of vertices in a cluster is at least as strongly connected to the remaining vertices in the cluster as to the vertices outside the cluster.

Our experiments document that the given guarantee on intra-cluster expansion provides deeper insight than a trivial bound that is easy to compute. The true intra-cluster and inter-cluster expansion turned out to be even better than guaranteed. An analogous analysis of the expansion of modularity-based clusterings gave no evidence that modularity-based clusterings surpass cut-clusterings in terms of intra-cluster expansion. On the contrary, around one fourth of the considered modularity-based clusterings could be proven to be worse than the cut-clusterings.

Within the modularity analysis we revealed that, although it is not designed to optimize modularity, the hierarchical cut-clustering algorithm fairly reliably finds clusterings of good modularity whenever such clusterings are structurally indicated. Otherwise, if no good clustering is clearly indicated, the cut-clustering algorithm returns only clusterings of low modularity. This confirms the high trustworthiness of the cut-clustering algorithm and justifies the use of modularity applied to cut-clusterings as a feasible measure of how well a graph can be clustered.
Acknowledgements. We thank Markus Völker for technical support and Ignaz
Rutter for proofreading and helpful suggestions on the structure of this paper. We
further thank the anonymous reviewer for the thoughtful comments.
References
[1] 10th DIMACS Implementation Challenge – Graph Partitioning and Graph Clustering, 2011,
http://www.cc.gatech.edu/dimacs10/.
[2] Dynamic network of email communication at the Department of Informatics at Karlsruhe
Institute of Technology (KIT), 2011, Data collected, compiled and provided by Robert Görke
and Martin Holzer of ITI Wagner and by Olaf Hopp, Johannes Theuerkorn and Klaus
Scheibenberger of ATIS, all at KIT. i11www.iti.kit.edu/projects/spp1307/emaildata.
[3] Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Görke, Martin Höfer, Zoran Nikoloski,
and Dorothea Wagner, On Modularity Clustering, IEEE Transactions on Knowledge and Data
Engineering 20 (2008), no. 2, 172–188.
[4] Fabien de Montgolfier, Mauricio Soto, and Laurent Viennot, Asymptotic Modularity of Some
Graph Classes, Proceedings of the 22nd International Symposium on Algorithms and Com-
putation (ISAAC’11), 2011, pp. 435–444.
[5] Gary William Flake, Robert E. Tarjan, and Kostas Tsioutsiouliklis, Graph clustering and
minimum cut trees, Internet Math. 1 (2004), no. 4, 385–408. MR2119992 (2005m:05210)
[6] Giorgio Gallo, Michael D. Grigoriadis, and Robert E. Tarjan, A fast parametric maxi-
mum flow algorithm and applications, SIAM J. Comput. 18 (1989), no. 1, 30–55, DOI
10.1137/0218003. MR978165 (90b:68038)
[7] R. E. Gomory and T. C. Hu, Multi-terminal network flows, J. Soc. Indust. Appl. Math. 9
(1961), 551–570. MR0135624 (24 #B1671)
[8] Michael Hamann, Complete hierarchical Cut-Clustering: An Analysis of Guarantee and Qual-
ity, Bachelor’s thesis, Department of Informatics, Karlsruhe Institute of Technology (KIT),
2011, http://i11www.iti.uni-karlsruhe.de/teaching/theses/finished.
[9] Hawoong Jeong, Sean P. Mason, Albert-László Barabási, and Zoltan N. Oltvai, Lethality and
Centrality in Protein Networks, Nature 411 (2001), 41–42.
[10] David Lisowski, Modularity-basiertes Clustern von dynamischen Graphen im Offline-Fall,
Master’s thesis, Department of Informatics, Karlsruhe Institute of Technology (KIT), 2011,
http://i11www.iti.uni-karlsruhe.de/teaching/theses/finished.
[11] Mark E. J. Newman and Michelle Girvan, Finding and evaluating community structure in
networks, Physical Review E 69 (2004), no. 026113, 1–16.
[12] Randolf Rotta and Andreas Noack, Multilevel local search algorithms for modularity clus-
tering, ACM J. Exp. Algorithmics 16 (2011), Paper 2.3, 27, DOI 10.1145/1963190.1970376.
MR2831090 (2012g:90232)
A Divisive Clustering Technique for Maximizing the Modularity
Ümit V. Çatalyürek, Kamer Kaya, Johannes Langguth, and Bora Uçar
1. Introduction
Clustering graphs into disjoint vertex sets is a fundamental challenge in many
areas of science [3, 16, 22, 23]. It has become a central tool in network analysis.
With the recent rise in the availability of data on large scale real-world networks,
the need for fast algorithms capable of clustering such instances accurately has
increased significantly.
There is no generally accepted notion of what constitutes a good clustering,
and in many cases the quality of a clustering is application specific. However, there
are several widely accepted measurements for clustering quality called clustering
indices. Among the most widespread clustering indices are expansion, conductance,
and modularity. In the following, we will focus on modularity. See [23] for a
discussion of the former two indices.
Modularity was proposed in [32] to analyze networks, and has recently grown in
popularity as a clustering index [15, 18–20, 27, 37]. In addition, several heuristics
based on greedy agglomeration [11, 29] and other approaches [30, 34] have been
proposed for the problem. Although it was shown in [7] that these provide no
approximation guarantee, for small real world instances the solutions produced by
these heuristics are usually within a very small factor of the optimum.
In general there are two algorithmic approaches to community detection which
are commonly known as agglomerative and divisive (see [28] for a short survey
of general techniques). Agglomerative approaches start with every vertex in a
separate cluster and successively merge clusters until the clustering can no longer be improved.
2. Background
2.1. Preliminaries. In the following, G = (V, E, ω) is a weighted undirected graph with ω : E → R+ as the weight function. A clustering C = {C1, . . . , CK} is a partition of the vertex set V. Each Ci is called a cluster. We use G(Ck) to denote the subgraph induced by the vertices in Ck, that is, G(Ck) = (Ck, (Ck × Ck) ∩ E, ω). We define the weight of a vertex as the sum of the weights of its incident edges, $\psi(v) = \sum_{u \in V,\, \{u,v\} \in E} \omega(u,v)$, and we use ψ(C) to denote the sum of the weights of all vertices in a cluster C. The sum of edge weights between two vertex sets U and T will be denoted by ω(U, T), that is, $\omega(U,T) = \sum_{\{u,v\} \in (U \times T) \cap E} \omega(u,v)$. The sum of the weights of all edges is denoted by ω(E), and the sum of the weights of the edges whose endpoints are both in the same cluster C is denoted by ω(C). Furthermore, by cut(C) we denote the sum of the weights of all edges having their endpoints in two different clusters of C.
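These quantities translate directly into code. A small sketch (ours, assuming a networkx graph whose edges carry a 'weight' attribute):

    def psi(G, v, weight="weight"):
        # psi(v): sum of the weights of the edges incident to v.
        return sum(d.get(weight, 1) for _, _, d in G.edges(v, data=True))

    def omega_between(G, U, T, weight="weight"):
        # omega(U, T): sum of the weights of edges with one endpoint
        # in U and the other in T (U and T assumed disjoint).
        U, T = set(U), set(T)
        return sum(d.get(weight, 1) for u, v, d in G.edges(data=True)
                   if (u in U and v in T) or (u in T and v in U))

    def cut_weight(G, clustering, weight="weight"):
        # cut(C): total weight of edges whose endpoints lie in
        # different clusters of the clustering C.
        home = {v: i for i, C in enumerate(clustering) for v in C}
        return sum(d.get(weight, 1) for u, v, d in G.edges(data=True)
                   if home[u] != home[v])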
3. Algorithms
We follow the divisive approach to devise an algorithm for obtaining a clustering
with high modularity. The main motivation for choosing this approach is that for a
clustering C with two clusters, the coverage is just 1 − cut(C)/ω(E) and the second
term in (2.2) is minimized when clusters have equal weights. In other words, in
splitting a graph into two clusters so as to maximize the modularity, heuristics
for the NP-complete minimum bisection problem should be helpful (a more formal
discussion is given by Brandes et al. [7, Section 4.1]). We can therefore harness
the power and efficiency of the existing graph and hypergraph (bi-)partitioning
routines such as MeTiS [25], PaToH [10], and Scotch [33] in a divisive approach
to clustering for modularity.
Algorithm 1 outlines the proposed approach for clustering a given graph. Initially, all the vertices are in a single cluster. At every step, the heaviest cluster, say Ck, is selected and split into two (by the subroutine Bisect), if |Ck| > 2.
[Figure 1: An example graph with a suboptimal clustering C = {C1, C2}, where C1 = {a, b} and C2 = {u, v, w, x, y}.]
If the bisection is acceptable, that is, if the bisection improves the modularity (see line 6 of Algorithm 1), the cluster Ck is replaced by the two clusters resulting from the bisection. If not, the cluster Ck remains as is. The algorithm then proceeds to another step to pick the heaviest cluster. The clustering C found during the bisections is then refined by the subroutine RefineClusters, which starts just after the bisections.
The computational core of the algorithm is the Bisect routine. This routine
accepts a graph and splits that graph into two clusters using existing tools that are
used for the graph/hypergraph bisection problem. We have instrumented the code
in such a way that one can use MeTiS, PaToH, or Scotch quite effectively at this
point.
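Under these conventions, the main loop described above might be prototyped as follows (a sketch with our own names: bisect stands for the MeTiS/PaToH/Scotch-backed Bisect routine, modularity for the score p, and cluster_weight for ψ; a cluster whose split is rejected is frozen, which is one way to read "remains as is"):

    def divisive_clustering(G, bisect, modularity, cluster_weight):
        clusters = [frozenset(G.nodes())]
        frozen = set()  # clusters we no longer try to split
        while True:
            candidates = [C for C in clusters if C not in frozen and len(C) > 2]
            if not candidates:
                break
            Ck = max(candidates, key=cluster_weight)  # heaviest cluster
            A, B = bisect(G, Ck)
            trial = [C for C in clusters if C != Ck] + [frozenset(A), frozenset(B)]
            if modularity(G, trial) > modularity(G, clusters):
                clusters = trial  # accept the split
            else:
                frozen.add(Ck)    # keep Ck as is; do not bisect it again
        return clusters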
Unfortunately, there is no guarantee that it is sufficient to stop bisecting a cluster as soon as a split on it reduces the modularity score. As finding a bipartition of maximum modularity is NP-hard [7], it is possible that a Bisect step which reduces modularity can be followed by a second Bisect step that increases it beyond its original value. As an example, consider the graph in Fig. 1, which shows a clustering, albeit a suboptimal one, that we will call C, where C = {C1, C2}. This clustering has the modularity score
$$p(C) = \frac{5}{10} - \frac{(3+4)^2 + (2+3+3+3+2)^2}{4 \times 10^2} = -\frac{18}{400}.$$
Since a trivial clustering {V} has modularity p({V}) = 0, we can easily see that the clustering C reduces the modularity to a negative value. Now, consider the clustering C′ = {C1, C21, C22}, which is obtained via a bipartition of C2 as shown in Fig. 2. Clustering C′ has the modularity
$$p(C') = \frac{4}{10} - \frac{(3+4)^2 + (2+3+3)^2 + (3+2)^2}{4 \times 10^2} = \frac{22}{400}.$$
Thus, clustering C′ has higher modularity than the initial trivial clustering {V}.
Of course, this effect is due to the suboptimal clustering C. However, since the
bipartitioning algorithm provides no approximation guarantee, we cannot preclude
this. Therefore, not bisecting a cluster anymore when a Bisect operation on it
reduces the modularity score has its drawbacks.
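As a quick sanity check, the two modularity values of the example can be recomputed exactly with rational arithmetic:

    from fractions import Fraction

    def p(covered, degree_sums, total=10):
        # covered: weight of intra-cluster edges; degree_sums: per-cluster
        # sums of vertex degrees; total: c(E) = 10 for the example graph.
        return (Fraction(covered, total)
                - sum(Fraction(s * s, 4 * total * total) for s in degree_sums))

    print(p(5, [7, 13]))     # -9/200  (= -18/400, the score of C)
    print(p(4, [7, 8, 5]))   # 11/200  (=  22/400, the score of C')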
3.1. The bisection heuristic. Our bisection heuristic is of the form shown
in Algorithm 2 whose behavior is determined by a set of four parameters: a, imb,
b, and e. The first one, a, chooses which algorithm to use as a bisector. We have
integrated MeTiS, PaToH, and Scotch as bisectors. The bisection heuristics in
PaToH and Scotch accept a parameter imb that defines the allowable imbalance
between the part weights. We modified a few functions in the MeTiS 4.0 library to make the bisection heuristics accept the parameter imb. The other parameters are used as follows: the bisection heuristic (Algorithm 2) applies the bisector b times, refines each bisection e times, and chooses the bisection with the best modularity.

[Figure 2: The clustering C′ = {C1, C21, C22} obtained by bisecting the cluster C2 of the example graph in Figure 1.]
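In Python-like form, the heuristic then reads roughly as follows (our sketch; bisector, refine, and bipartition_modularity stand in for the integrated tools and helpers of Algorithm 2, which is not reproduced here):

    def best_bisection(G, Ck, bisector, refine, bipartition_modularity,
                       b=1, e=0, imb=0.2):
        # Apply the chosen bisector b times, refine each bisection e times,
        # and keep the bisection with the best modularity.
        best, best_score = None, float("-inf")
        for _ in range(b):
            A, B = bisector(G, Ck, imbalance=imb)
            for _ in range(e):
                A, B = refine(G, A, B)
            score = bipartition_modularity(G, A, B)
            if score > best_score:
                best, best_score = (A, B), score
        return best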
3.2. Refining the clustering. The last ingredient of the proposed clustering algorithm is RefineClusters(G, K, C, p). It aims to improve the clustering found during the bisections. Unlike the RefineBisection algorithm, this algorithm visits the vertices in random order. At each vertex v, the gain values associated with moving v from its current cluster to every other cluster are computed. Among all those moves, the most beneficial one is performed, if doing so increases the modularity score. If not, the vertex v remains in its own cluster. We repeat this process several times (we use m as a parameter to control the number of such passes). The time complexity of a pass is O(|V|K + |E|) for a K-way clustering of a graph with |V| vertices and |E| edges.
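A single pass of this kind could be sketched as follows (our reading; move_gain is an assumed helper returning the modularity gain of moving v to cluster k):

    import random

    def refine_clusters(G, membership, num_clusters, move_gain, passes=5):
        # membership: vertex -> cluster index. Each pass visits the vertices
        # in random order and performs the best strictly improving move.
        for _ in range(passes):
            order = list(G.nodes())
            random.shuffle(order)
            for v in order:
                current = membership[v]
                gains = {k: move_gain(G, membership, v, k)
                         for k in range(num_clusters) if k != current}
                if gains:
                    k_best = max(gains, key=gains.get)
                    if gains[k_best] > 0:
                        membership[v] = k_best
        return membership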
4. Experiments
We perform a series of experiments to measure the effect of the various param-
eters on the modularity scores of the solutions produced by the algorithm, and to
evaluate overall performance of the approach.
To this end, we use a set of 29 popular test instances which have been used
in the past to study the modularity scores achieved by clustering algorithms. The
instances are from various resources [1, 2, 4–6, 21, 31, 35, 36] and are available at
http://www.cc.gatech.edu/dimacs10/.
We first test the algorithm using the standard parameter combination. It con-
sists of m=5 refinement rounds at the end and a bipartition parameter of b=1. No
refinements are performed during the algorithm (e=0). The results using PaToH,
MeTiS, and Scotch partitioning are shown in Table 1 below.
As expected, the results for each instance are very close, with a maximum difference of less than 0.04. All partitioners provide good results, with PaToH delivering somewhat higher modularity scores, while MeTiS consistently yielded slightly inferior results. The same was true in preliminary versions of the experiments described below; thus, MeTiS was not considered in the following experiments.
The slightly higher scores of PaToH can be explained by the fact that, unlike Scotch, it uses randomization. Even though this is not intended by the algorithm design, when performing multiple partitioning runs during the Bisect routine, the randomized nature of PaToH gives it a slightly higher chance to find a superior solution, which is generally kept by the algorithm.
In the next experiment, we investigate the effect of the refinement algorithm
RefineClusters on the final result. Table 2 shows the refined modularity scores
using a maximum of m = 5 refinement steps at the end of Algorithm 1 for PaToH
and Scotch partitioning, as opposed to the unrefined results (m = 0). On aver-
age, the effect of the clustering refinement step (RefineClusters) amounts to
an improvement of about 0.01 for Scotch and 0.0042 for PaToH. Our preliminary
experiments showed that increasing the number of refinement steps beyond m = 5
improves the end result only marginally in both cases. Although the improvement for Scotch is slightly larger, it is not sufficient to close the gap between the unrefined results for PaToH and Scotch. Since the computational cost of the refinement heuristic is low, we continue to use it in the following experiments.
Furthermore, we investigate the influence of the number of repetitions of the Bisector step on the modularity score by increasing the parameter b from 1 to 5. Results are shown in Table 3, where we observe a slight positive effect for b = 5 compared to b = 1. It is interesting to note that even though PaToH is randomized, selecting the best out of 5 bisections has almost no effect. This is partially because the RefineClusters operation finds the same improvements. Due to the refinement, the total effect can even be negative, since a different clustering might be obtained.
Table 1. Modularity scores for the standard parameter combination (m = 5 final refinement rounds, b = 1, e = 0).

Instance                  Vertices  Edges     PaToH   Scotch  MeTiS
adjnoun 112 425 0.2977 0.2972 0.2876
as-22july06 22963 48436 0.6711 0.6578 0.6486
astro-ph 16706 121251 0.7340 0.7238 0.7169
caidaRouterLevel 192244 609066 0.8659 0.8540 0.8495
celegans metabolic 453 2025 0.4436 0.4407 0.4446
celegansneural 297 2148 0.4871 0.4939 0.4754
chesapeake 39 170 0.2595 0.2624 0.2595
citationCiteseer 268495 1156647 0.8175 0.8119 0.8039
cnr-2000 325557 2738969 0.9116 0.9026 0.8819
coAuthorsCiteseer 227320 814134 0.8982 0.8838 0.8853
coAuthorsDBLP 299067 977676 0.8294 0.8140 0.8117
cond-mat 16726 47594 0.8456 0.8343 0.8309
cond-mat-2003 31163 120029 0.7674 0.7556 0.7504
cond-mat-2005 40421 175691 0.7331 0.7170 0.7152
dolphins 62 159 0.5276 0.5265 0.5246
email 1133 5451 0.5776 0.5748 0.5627
football 115 613 0.6046 0.6046 0.6019
G n pin pout 100000 501198 0.4913 0.4740 0.4825
hep-th 8361 15751 0.8504 0.8409 0.8342
jazz 198 2742 0.4450 0.4451 0.4447
karate 34 78 0.4198 0.4198 0.3843
lesmis 77 254 0.5658 0.5649 0.5656
netscience 1589 2742 0.9593 0.9559 0.9533
PGPgiantcompo 10680 24316 0.8831 0.8734 0.8687
polblogs 1490 16715 0.4257 0.4257 0.4257
polbooks 105 441 0.5269 0.5269 0.4895
power 4941 6594 0.9398 0.9386 0.9343
preferentialAttachment 100000 499985 0.3066 0.2815 0.2995
smallworld 100000 499998 0.7846 0.7451 0.7489
Average 0.6507 0.6430 0.6373
Table 2. Modularity scores before (m = 0) and after (m = 5) the final refinement.

                          PaToH                          Scotch
Instance                  Unrefined  Refined  Improv.   Unrefined  Refined  Improv.
adjnoun 0.2945 0.2977 0.0033 0.2946 0.2972 0.0026
as-22july06 0.6683 0.6711 0.0028 0.6524 0.6578 0.0054
astro-ph 0.7295 0.7340 0.0046 0.7183 0.7238 0.0055
caidaRouterLevel 0.8641 0.8659 0.0019 0.8506 0.8540 0.0035
celegans metabolic 0.4318 0.4436 0.0118 0.4343 0.4407 0.0064
celegansneural 0.4855 0.4871 0.0016 0.4905 0.4939 0.0034
chesapeake 0.2495 0.2595 0.0100 0.2624 0.2624 0.0000
citationCiteseer 0.8160 0.8175 0.0015 0.8094 0.8119 0.0025
cnr-2000 0.9116 0.9116 0.0000 0.8981 0.9026 0.0045
coAuthorsCiteseer 0.8976 0.8982 0.0005 0.8826 0.8838 0.0012
coAuthorsDBLP 0.8281 0.8294 0.0013 0.8115 0.8140 0.0025
cond-mat 0.8443 0.8456 0.0013 0.8329 0.8343 0.0013
cond-mat-2003 0.7651 0.7674 0.0023 0.7507 0.7556 0.0049
cond-mat-2005 0.7293 0.7331 0.0038 0.7084 0.7170 0.0086
dolphins 0.5155 0.5276 0.0121 0.5265 0.5265 0.0000
email 0.5733 0.5776 0.0043 0.5629 0.5748 0.0120
football 0.6009 0.6046 0.0037 0.6009 0.6046 0.0037
G n pin pout 0.4565 0.4913 0.0347 0.3571 0.4740 0.1169
hep-th 0.8494 0.8504 0.0010 0.8392 0.8409 0.0016
jazz 0.4330 0.4450 0.0120 0.4289 0.4451 0.0162
karate 0.4188 0.4198 0.0010 0.4188 0.4198 0.0010
lesmis 0.5658 0.5658 0.0000 0.5540 0.5649 0.0108
netscience 0.9593 0.9593 0.0000 0.9559 0.9559 0.0000
PGPgiantcompo 0.8830 0.8831 0.0001 0.8726 0.8734 0.0008
polblogs 0.4257 0.4257 0.0000 0.4247 0.4257 0.0010
polbooks 0.5266 0.5269 0.0004 0.5242 0.5269 0.0027
power 0.9394 0.9398 0.0003 0.9384 0.9386 0.0002
preferentialAttachment 0.3013 0.3066 0.0053 0.2461 0.2815 0.0353
smallworld 0.7838 0.7846 0.0008 0.7061 0.7451 0.0390
Average 0.6465 0.6507 0.0042 0.6329 0.6430 0.0101
and the algorithms come quite close, deviating by only 0.00047 from the optimum values on average. The instance lesmis is a weighted graph and was treated as such here; therefore the modularity score obtained is higher than the unweighted optimum computed in [8]. It is included here for the sake of completeness, but it is not considered in the aggregated results.

For larger instances, obtaining optimum values is computationally infeasible. Thus, the scores given here represent the best values found by other clustering algorithms. Our algorithm surpasses those in 9 out of 13 instances, and its average modularity score surpasses the best reported values by 0.01. Naturally, most clustering algorithms will be quite close in such a comparison, which makes the difference quite significant.
Summing up, we conclude that the optimum configuration for our algorithm uses PaToH for partitioning with RefineClusters at m = 5. For the bipartition parameter, b = 5 gives a further slight improvement (cp. Table 3).
Table 3. Modularity scores for b = 1 and b = 5 repetitions of the Bisector step.

                          PaToH                        Scotch
Instance                  b=1     b=5     Difference   b=1     b=5     Difference
adjnoun 0.2977 0.2990 0.0012 0.2972 0.2999 0.0027
as-22july06 0.6711 0.6722 0.0011 0.6578 0.6503 -0.0075
astro-ph 0.7340 0.7353 0.0012 0.7238 0.7261 0.0023
caidaRouterLevel 0.8659 0.8677 0.0018 0.8540 0.8576 0.0035
celegans metabolic 0.4436 0.4454 0.0017 0.4407 0.4467 0.0060
celegansneural 0.4871 0.4945 0.0074 0.4939 0.4942 0.0004
chesapeake 0.2595 0.2624 0.0029 0.2624 0.2624 0.0000
citationCiteseer 0.8175 0.8166 -0.0009 0.8119 0.8141 0.0022
cnr-2000 0.9116 0.9119 0.0003 0.9026 0.9052 0.0026
coAuthorsCiteseer 0.8982 0.8994 0.0012 0.8838 0.8872 0.0033
coAuthorsDBLP 0.8294 0.8306 0.0011 0.8140 0.8180 0.0040
cond-mat 0.8456 0.8469 0.0013 0.8343 0.8378 0.0035
cond-mat-2003 0.7674 0.7692 0.0018 0.7556 0.7593 0.0037
cond-mat-2005 0.7331 0.7338 0.0007 0.7170 0.7248 0.0078
dolphins 0.5276 0.5265 -0.0011 0.5265 0.5265 0.0000
email 0.5776 0.5768 -0.0008 0.5748 0.5770 0.0022
football 0.6046 0.6046 0.0000 0.6046 0.6046 0.0000
G n pin pout 0.4913 0.4915 0.0002 0.4740 0.4844 0.0104
hep-th 0.8504 0.8506 0.0002 0.8409 0.8425 0.0017
jazz 0.4450 0.4450 0.0000 0.4451 0.4451 0.0000
karate 0.4198 0.4198 0.0000 0.4198 0.4198 0.0000
lesmis 0.5658 0.5658 0.0000 0.5649 0.5649 0.0000
netscience 0.9593 0.9593 0.0000 0.9559 0.9591 0.0032
PGPgiantcompo 0.8831 0.8834 0.0004 0.8734 0.8797 0.0063
polblogs 0.4257 0.4257 0.0000 0.4257 0.4257 0.0000
polbooks 0.5269 0.5269 0.0000 0.5269 0.5269 0.0000
power 0.9398 0.9397 -0.0001 0.9386 0.9398 0.0012
preferentialAttachment 0.3066 0.3065 -0.0001 0.2815 0.2887 0.0073
smallworld 0.7846 0.7850 0.0004 0.7451 0.7504 0.0053
Average 0.6507 0.6514 0.0008 0.6430 0.6455 0.0025
5. Conclusion
We have presented a new algorithm for finding graph clusterings of high mod-
ularity. It follows a divisive approach by applying recursive bipartition to clusters.
In addition, it makes use of a standard refinement heuristic. It can be implemented
efficiently by making use of established partitioning software.
We experimentally established that the best modularity scores can be obtained
by choosing the best out of multiple partitionings during the bipartitioning step and
applying the refinement heuristic at the end of the algorithm. The modularity scores
obtained in this manner surpass those of previously known clustering algorithms.
A possible variant of the proposed algorithm that can be further studied would
accept bipartitions of inferior modularity for a limited number of recursion steps,
thereby alleviating the problem described in Section 3.
Table 4. Modularity scores with e = 0 and e = 5 refinement rounds during the bisections.

                          PaToH                   Scotch
Instance                  e=0     e=5     Diff.   e=0     e=5     Diff.
adjnoun 0.2977 0.3014 0.0037 0.2972 0.2941 -0.0031
as-22july06 0.6711 0.6653 -0.0058 0.6578 0.6581 0.0003
astro-ph 0.7340 0.7283 -0.0058 0.7238 0.7204 -0.0034
caidaRouterLevel 0.8659 0.8627 -0.0033 0.8540 0.8483 -0.0058
celegans metabolic 0.4436 0.4430 -0.0007 0.4407 0.4433 0.0026
celegansneural 0.4871 0.4945 0.0074 0.4939 0.4944 0.0005
chesapeake 0.2595 0.2658 0.0063 0.2624 0.2658 0.0034
citationCiteseer 0.8175 0.8145 -0.0030 0.8119 0.8088 -0.0031
cnr-2000 0.9116 0.9050 -0.0066 0.9026 0.9019 -0.0007
coAuthorsCiteseer 0.8982 0.8971 -0.0011 0.8838 0.8829 -0.0009
coAuthorsDBLP 0.8294 0.8276 -0.0018 0.8140 0.8106 -0.0033
cond-mat 0.8456 0.8424 -0.0031 0.8343 0.8333 -0.0010
cond-mat-2003 0.7674 0.7643 -0.0031 0.7556 0.7532 -0.0023
cond-mat-2005 0.7331 0.7309 -0.0022 0.7170 0.7142 -0.0028
dolphins 0.5276 0.5265 -0.0011 0.5265 0.5265 0.0000
email 0.5776 0.5748 -0.0028 0.5748 0.5647 -0.0101
football 0.6046 0.6032 -0.0013 0.6046 0.6032 -0.0013
G n pin pout 0.4913 0.4921 0.0009 0.4740 0.4872 0.0132
hep-th 0.8504 0.8472 -0.0031 0.8409 0.8412 0.0003
jazz 0.4450 0.4451 0.0001 0.4451 0.4271 -0.0181
karate 0.4198 0.4198 0.0000 0.4198 0.4198 0.0000
lesmis 0.5658 0.5658 0.0000 0.5649 0.5652 0.0003
netscience 0.9593 0.9551 -0.0042 0.9559 0.9558 -0.0001
PGPgiantcompo 0.8831 0.8791 -0.0040 0.8734 0.8732 -0.0002
polblogs 0.4257 0.4257 0.0000 0.4257 0.4257 0.0000
polbooks 0.5269 0.5108 -0.0161 0.5269 0.5108 -0.0161
power 0.9398 0.9373 -0.0024 0.9386 0.9346 -0.0040
preferentialAttachment 0.3066 0.3058 -0.0008 0.2815 0.2952 0.0137
smallworld 0.7846 0.7851 0.0005 0.7451 0.7857 0.0406
Average 0.6507 0.6488 -0.0018 0.6430 0.6429 0.0001
Acknowledgment
This work was supported in parts by the DOE grant DE-FC02-06ER2775 and
by the NSF grants CNS-0643969, OCI-0904809, and OCI-0904802.
References
[1] A. Arenas, Network data sets, available at http://deim.urv.cat/~aarenas/data/welcome.htm, October 2011.
[2] Albert-László Barabási and Réka Albert, Emergence of scaling in random networks, Science
286 (1999), no. 5439, 509–512, DOI 10.1126/science.286.5439.509. MR2091634
[3] M. Bern and D. Eppstein, Approximation algorithms for geometric problems, Approximation
Algorithms for NP-Hard Problems (D. S. Hochbaum, ed.), PWS Publishing Co., Boston, MA,
USA, 1997, pp. 296–345.
[4] P. Boldi, B. Codenotti, M. Santini, and S. Vigna, Ubicrawler: A scalable fully distributed web
crawler, Software: Practice & Experience 34 (2004), no. 8, 711–726.
[5] P. Boldi, M. Rosa, M. Santini, and S. Vigna, Layered label propagation: A multiresolution
coordinate-free ordering for compressing social networks, Proceedings of the 20th interna-
tional conference on World Wide Web, ACM Press, 2011.
[6] P. Boldi and S. Vigna, The WebGraph framework I: Compression techniques, Proc. of the
Thirteenth International World Wide Web Conference (WWW 2004) (Manhattan, USA),
ACM Press, 2004, pp. 595–601.
[7] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, On
finding graph clusterings with maximum modularity, Graph-theoretic concepts in computer
science, Lecture Notes in Comput. Sci., vol. 4769, Springer, Berlin, 2007, pp. 121–132, DOI
10.1007/978-3-540-74839-7 12. MR2428570 (2009j:05215)
[8] S. Cafieri, P. Hansen, and L. Liberti, Locally optimal heuristic for modularity maximization
of networks, Phys. Rev. E 83 (2011), 056105.
[9] Ü. V. Çatalyürek and C. Aykanat, Hypergraph-partitioning-based decomposition for parallel
sparse-matrix vector multiplication, IEEE Transactions on Parallel and Distributed Systems
10 (1999), no. 7, 673–693.
[10] Ü. V. Çatalyürek and C. Aykanat, PaToH: A multilevel hypergraph partitioning tool, version 3.0, Bilkent University, Department of Computer Engineering, Ankara, 06533 Turkey, 1999. PaToH is available at http://bmi.osu.edu/~umit/software.htm.
[11] A. Clauset, M. E. J. Newman, and C. Moore, Finding community structure in very large
networks, Phys. Rev. E 70 (2004), 066111.
[12] B. Dasgupta and D. Desai, On the complexity of Newman’s finding approach for biological
and social networks, arXiv:1102.0969v1, 2011.
[13] D. Delling, R. Görke, C. Schulz, and D. Wagner, Orca reduction and contraction graph cluster-
ing, Proceedings of the 5th International Conference on Algorithmic Aspects in Information
and Management (Berlin, Heidelberg), AAIM ’09, Springer-Verlag, 2009, pp. 152–165.
[14] H. N. Djidjev and M. Onuş, Scalable and accurate graph clustering and community structure
detection, IEEE Transactions on Parallel and Distributed Systems, 99, Preprints, (2012).
[15] J. Duch and A. Arenas, Community detection in complex networks using extremal optimiza-
tion, Phys. Rev. E 72 (2005), 027104.
[16] B. Everitt, Cluster analysis, 2nd ed., Social Science Research Council Reviews of Current
Research, vol. 11, Heinemann Educational Books, London, 1980. MR592781 (82a:62082)
[17] C. M. Fiduccia and R. M. Mattheyses, A linear-time heuristic for improving network parti-
tions, DAC ’82: Proceedings of the 19th Conference on Design Automation (Piscataway, NJ,
USA), IEEE Press, 1982, pp. 175–181.
[18] P. F. Fine, E. Di Paolo, and A. Philippides, Spatially constrained networks and the evolution
of modular control systems, From Animals to Animats 9: Proceedings of the Ninth Inter-
national Conference on Simulation of Adaptive Behavior (S. Nolfi, G. Baldassarre, R. Cal-
abretta, J. Hallam, D. Marocco, O. Miglino, J. A. Meyer, and D. Parisi, eds.), Springer Verlag,
2006, pp. 546–557.
[19] S. Fortunato and M. Barthélemy, Resolution limit in community detection, Proceedings of
the National Academy of Sciences 104 (2007), no. 1, 36–41.
An Ensemble Learning Strategy for Graph Clustering
Michael Ovelgönne and Andreas Geyer-Schulz
1. Introduction
Graph clustering, i.e. the identification of cohesive submodules or ’natural’
groups in graphs, is an important technique in several domains. The identification
of functional groups in metabolic networks [GA05] and the identification of social
groups in friendship networks are two popular application areas of graph clustering.
Here we define graph clustering as the task of simultaneously detecting the num-
ber of submodules in a graph and detecting the submodules themselves. In contrast,
we use the term graph partitioning for the problem of identifying a parametrized
number of partitions, where additional restrictions usually apply (typically, that all
submodules are of roughly equal size). Two recent review articles on graph cluster-
ing by Schaeffer [Sch07] and Fortunato [For10] provide a good overview on graph
clustering techniques as well as on related topics like evaluating and benchmarking
clustering methods.
Graph clustering by optimizing an explicit objective function became popular
with the introduction of the modularity measure [NG04]. Subsequently, a number
of variations of modularity [MRC05, LZW+ 08] have been proposed to address
shortcomings of modularity such as its resolution limit [FB07]. The identification
of a graph clustering by finding a graph partition with maximal modularity is NP-
hard [BDG+ 08]. Therefore, finding clusterings of a problem instance with more
than a few hundred vertices has to rely on good heuristics. A large number of
modularity optimization heuristics have been proposed in recent years, but most of
them have poor optimization quality.
The objective of this contribution is to present a new graph clustering scheme,
called the Core Groups Graph Clustering (CGGC) scheme, which is able to find
high-quality clusterings by using an ensemble learning approach. In [OGS10] we
presented an algorithm called RG+ for maximizing the modularity of a graph par-
tition via an intermediate step of first identifying core groups of vertices. The RG+
algorithm was able to outperform all previously published heuristics in terms of
optimization quality. This paper deals with a generalization of this optimization
approach.
The paper is organized as follows. First, we briefly discuss
ensemble learning in Section 2. Then, we introduce the CGGC scheme in Section
3 and modularity maximization algorithms in Section 4. In Section 5, we evaluate
the performance of the CGGC scheme using modularity maximization algorithms
within the scheme. We discuss the scheme from the viewpoint of global analysis in
Section 6. Finally, a short conclusion follows in Section 7.
2. Ensemble Learning
Ensemble based systems have been used in decision making for quite some time.
Ensemble learning is a paradigm in machine learning, where several intermediate
classifiers (called weak or base classifiers) are generated and combined to finally get
a single classifier. The algorithms used to compute the weak classifiers are called
weak learners. An important insight is that even if a weak learner has only slightly
better accuracy than random choice, a strong classifier can be created by combining
several classifiers created by this weak learner [Sch90]. For a good introduction
to this topic, see the review article by Polikar [Pol06].
Two examples of ensemble learning strategies are bagging and boosting. A bag-
ging algorithm for supervised classification trains several classifiers from bootstraps
of the training data. The combined classifier is computed by simple majority voting
of the ensemble of base classifiers, i.e. a data item gets the label the majority of
base classifiers assigns to that data item. A simple boosting algorithm (following
[Pol06]) works with classifiers trained from three subsets of the training data. The
first dataset is a random subset of the training data of arbitrary size. The second
dataset is created so that the classifier trained with the first dataset classifies half
of the data items correctly and the other half incorrectly. The third dataset consists of
the data items the classifiers trained by the first and the second dataset disagree
on. The strong classifier is the majority vote of the three classifiers.
Another ensemble learning strategy called Stacked Generalization has been
proposed by Wolpert [Wol92]. This strategy is based on the assumption that some
data points are more likely to be misclassified than others, because they are near
to the boundary that separates different classes of data points. First, an ensemble
of classifiers is trained. Then, using the output of the classifiers a second level of
classifiers is trained with the outputs of the ensemble of classifiers. In other words,
the second level of classifiers learns for which input a first level classifier is correct
or how to combine the “guesses” of the first level classifiers.
An ensemble learning strategy for clustering was first used by Fred and Jain
[FJ05], who called this approach evidence accumulation. They worked on
clustering data points in a Euclidean space. Initially, the data points are clustered
several times based on their distance and by means of an algorithm like k-means.
The ensemble of generated clusterings is used to create a new distance matrix
called the co-association matrix. The new similarity between two data points is the
fraction of partitions that assign both data points to the same cluster. Then, the
data points are clustered on the basis of the co-association matrix.
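To make this step concrete, the following minimal Python sketch (an illustration of the idea, not code from [FJ05]) builds the co-association matrix from an ensemble of label vectors:

    import numpy as np

    def co_association(labelings):
        """Co-association matrix of an ensemble of clusterings.

        labelings: list of equal-length integer label sequences, one cluster
        label per data point. Entry (i, j) of the result is the fraction of
        clusterings that place points i and j in the same cluster.
        """
        n = len(labelings[0])
        co = np.zeros((n, n))
        for labels in labelings:
            labels = np.asarray(labels)
            # same[i, j] is 1 exactly when i and j share a cluster label
            co += (labels[:, None] == labels[None, :]).astype(float)
        return co / len(labelings)

    # Example: three clusterings of five points; points 0 and 1 are always
    # co-clustered, so their co-association is 1.0
    ensemble = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]
    print(co_association(ensemble))

The data points can then be re-clustered with any distance-based method applied to one minus this matrix.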
induced by P̂ . To create the induced graph, all vertices in a cluster in P̂ are merged
to one vertex in Ĝ. Accordingly, Ĝ has as many vertices as there are clusters in P̂ .
An edge (v, w) ∈ Ê has as its weight the combined weight of all edges in G that
connect vertices in the clusters represented by v and w. Then, the clustering of Ĝ
would have to be projected back to G to get a clustering of the original graph.
Agglomerative hierarchical optimization schemes often show the best scalability
for clustering algorithms as they usually make local decisions. A partial explanation
is that the number of partitions of n nodes in k classes grows as a Stirling number
of the second kind, $S(n,k) = \frac{1}{k!} \sum_{r=0}^{k} (-1)^{r} \binom{k}{r} (k-r)^{n}$, and that this implies that
growth of the search space is smaller in the bottom-up direction than in the top-
down direction [Boc74, p. 110]. For the example shown in Figure 1, we have
10 partitions (5 objects in 4 clusters) for the first bottom-up decision versus 15
partitions (5 objects in 2 clusters) for the first top-down decision.
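These counts can be verified directly against the closed form above; a small Python check (an illustration, not part of the original text):

    from math import comb, factorial

    def stirling2(n, k):
        """Stirling number of the second kind via inclusion-exclusion."""
        return sum((-1) ** r * comb(k, r) * (k - r) ** n
                   for r in range(k + 1)) // factorial(k)

    print(stirling2(5, 4))  # 10 partitions of 5 objects into 4 clusters
    print(stirling2(5, 2))  # 15 partitions of 5 objects into 2 clusters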
While using only local information increases the scalability, it is a source of
globally poor decisions, too. Extracting the overlap of an ensemble of clusterings
provides a more global view. Figure 1 shows the complete merge lattice of an
example graph of 5 vertices. An agglomerative hierarchical algorithm always starts
with the partition into singletons (shown at the bottom) and merges in some way
the clusters until only one cluster containing all vertices remains (shown at the
top). Every merge decision means going one level up in the lattice. Restarting the
search at the maximal overlap of several partitions in an ensemble means to go back
to a point in the lattice from which all of the partitions in this ensemble can be
reached. If we restart the search for a good partition from this point, we will most
probably be able to reach other good partitions than those in the ensemble, too. In
fact, reaching other good or even better partitions than those in the ensemble will
be easier than starting from singletons as poor cluster assignments in the ensemble
have been leveled out.
3.1. The Iterated Approach. Wolpert [Wol92] discussed the problem that
some data points are harder to assign to the correct cluster than others. Data
points at the natural border of two clusters are harder to assign than those inside.
For the specific case of modularity maximization with agglomerative hierarchical
algorithms, we discussed the influence of prior merge decisions on all later merges
in [OGS12]. Often, the order of the merge operations influences which side of the
border a vertex is assigned to. Node 3 in Figure 1 is an example of this effect.
With the help of the maximal overlaps of the CGGC scheme we try to separate
the cores of the clusters from their boundaries. The harder decisions, namely which
clusters contain the vertices at the boundaries, are deferred until the knowledge of
the cores provides additional information. This idea of separating cores and boundaries can
be iterated in the following way (subsequently denoted as the CGGCi scheme):
(1) Set P best to the partition into singletons and set Ĝ to G
(2) Create a set S of k (fairly) good partitions of Ĝ with base algorithm Ainitial
(3) Identify the partition P̂ of the maximal overlaps in S
(4) If P̂ is a better partition than P best , set P best = P̂ , create the graph Ĝ
induced by P̂ and go back to step 2
(5) Use base algorithm Afinal to search for a good partition of Ĝ
(6) Project partition of Ĝ back to G
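The following Python sketch mirrors these six steps under simplifying assumptions: partitions are label lists over a fixed vertex set, the base algorithms and the quality measure (e.g. modularity) are supplied as callables, and working directly with labels over the original vertices stands in for the contract-and-project bookkeeping; all helper names are illustrative, not taken from the paper's implementation.

    def maximal_overlap(partitions):
        """Core groups of an ensemble: two vertices share a core group exactly
        when every partition in the ensemble places them in the same cluster."""
        ids = {}
        return [ids.setdefault(key, len(ids)) for key in zip(*partitions)]

    def cggci(G, base_initial, base_final, quality, k, n):
        """Sketch of the CGGCi scheme. base_*(G, start) return a label list
        refining the start partition; quality(G, labels) scores a partition;
        n is the number of vertices of G."""
        best = list(range(n))                              # step 1: singletons
        while True:
            S = [base_initial(G, best) for _ in range(k)]  # step 2: ensemble
            cores = maximal_overlap(S)                     # step 3: max overlap
            if quality(G, cores) <= quality(G, best):      # step 4: improved?
                break
            best = cores                                   # restart from cores
        return base_final(G, best)                         # steps 5 and 6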
[Figure 1: the complete merge lattice of an example graph with 5 vertices; level k holds the S(5, k) partitions into k clusters (S(5,1) = 1, S(5,2) = 15, S(5,3) = 25), with the local maxima (1 2 3)(4 5) and (1 2)(3 4 5) and the saddle partition (1 2) 3 (4 5) marked.]
In every new clustering P best some more vertices or groups of vertices have been
merged or rearranged. So, every new clustering is likely to provide more accurate
information on the structure of the graph for the succeeding iterations.
4.1. Randomized Greedy (RG). Newman [New04] proposed the first algorithm for identifying clusterings by maximizing modularity. The hierarchical agglomerative algorithm starts with a partition into singletons and merges in each step the pair of clusters that causes the maximal increase in modularity. The result is the cut of the dendrogram with the maximal modularity. This algorithm is slow, as it considers merging every pair of adjacent clusters in every step. The complete search over all adjacent pairs also leads to an unbalanced merge
process. Some clusters grow faster than others and the size difference is a bias for
later merge decisions. Large clusters are merged with many small clusters in their
neighborhood, whether this is good from a global perspective or not [OGS12].
The randomized greedy algorithm [OGS10] is a fast agglomerative hierarchical
algorithm that has a very similar structure to Newman’s algorithm but does not
suffer from an unbalanced merge process. This algorithm selects in every step
a small sample of k vertices and determines the best merge involving one of the
vertices in the sample (see Algorithm 1). Because of the sampling, the algorithm can
be implemented quite efficiently and has a complexity of O(m ln n) (see [OGS10]).
join(nextjoin);
joinList ← joinList + nextjoin;
clusters ← extractClustersFromJoins(joinList);
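Since only the tail of Algorithm 1 is reproduced above, the following self-contained Python sketch illustrates the sampled merge process the algorithm describes. The modularity bookkeeping follows the standard agglomerative formulas for simple graphs; none of the tuned data structures behind the O(m ln n) bound are reproduced here.

    import random
    from collections import defaultdict

    def randomized_greedy(edges, n, k=4, seed=0):
        """Randomized greedy agglomerative modularity clustering (sketch).

        edges: (u, v) pairs over vertices 0..n-1, no self-loops; k: sample size.
        Each step draws k live clusters and executes the best merge touching
        the sample. Returns (labels, Q) for the best cut of the dendrogram.
        """
        rng = random.Random(seed)
        m = float(len(edges))
        adj = defaultdict(lambda: defaultdict(float))  # inter-cluster weights
        deg = defaultdict(float)                       # total degree per cluster
        for u, v in edges:
            adj[u][v] += 1.0
            adj[v][u] += 1.0
            deg[u] += 1.0
            deg[v] += 1.0
        members = {c: [c] for c in range(n)}
        Q = -sum(d * d for d in deg.values()) / (4.0 * m * m)  # singletons
        best_Q, best = Q, list(range(n))
        while len(members) > 1:
            sample = rng.sample(sorted(members), min(k, len(members)))
            candidates = [(adj[c][d] / m - deg[c] * deg[d] / (2.0 * m * m), c, d)
                          for c in sample for d in adj[c]]
            if not candidates:               # sampled clusters have no neighbours
                break
            dQ, c, d = max(candidates)       # best merge touching the sample
            for nb, w in list(adj[d].items()):   # rewire d's edges to c
                del adj[nb][d]
                if nb != c:
                    adj[c][nb] += w
                    adj[nb][c] += w
            del adj[d]
            deg[c] += deg.pop(d)
            members[c] += members.pop(d)
            Q += dQ
            if Q > best_Q:                   # remember the best cut seen so far
                best_Q = Q
                best = [0] * n
                for cid, vs in members.items():
                    for v in vs:
                        best[v] = cid
        return best, best_Q

For example, randomized_greedy([(0,1), (1,2), (0,2), (2,3), (3,4), (4,5), (3,5)], 6) should typically recover the two triangles as clusters.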
classifiers and for the final clustering starting from the maximal overlap of these
partitions. To obtain a standardized naming of all CGGC scheme algorithms
in this article, we will denote this algorithm as CGGCRG in the following.
the original vertices are summed up and give the new edge weights between the
new vertices. Then, the algorithm returns to the first phase and moves the new
vertices between clusters.
Noack and Rotta [NR09] experimentally investigated a framework of hierar-
chical agglomerative modularity optimization algorithms. While most algorithms
only use the modularity increase as the priority criterion, they analyzed several
other priority criteria that weight modularity increase in some way. Furthermore,
they considered merging more than one pair of vertices in every step and locally
refining the intermediate partitions regularly during the merging process (multi-
level refinement). With the best configuration of their framework Noack and Rotta
achieve significantly better results than Blondel et al. [BGLL08] at the price of a
much higher runtime.
Another well-performing algorithm is the MOME algorithm by Zhu et al.
[ZWM+ 08]. In a first phase, the coarsening phase, the algorithm recursively cre-
ates a set of graphs. Starting with the input graph, each vertex of the graph will
be merged with the neighbor that yields the maximal increase in modularity. If the
modularity delta is negative for all neighbors, the vertex will be left as it is. The
resulting graph will be recursively processed until the graph cannot be contracted
any further. Subsequently, in the uncoarsening phase, the set of successively col-
lapsed graphs will be expanded while the clustering gets refined by moving vertices
between neighboring clusters.
Many other algorithms have been proposed. For practical use, and in particular
for use within the CGGC scheme, most of them are of no interest due to their inferior
performance in terms of modularity maximization or runtime efficiency. Among these
algorithms are several spectral algorithms ([WS05], [New06], [RZ07], [RZ08])
and algorithms based on generic meta heuristics like iterated tabu search [MLR06],
simulated annealing [MAnD05], or mean field annealing [LH07]. Formulations of
modularity maximization as an integer linear program (e.g. [AK08], [BDG+ 07])
allow finding an optimal solution without enumerating all possible partitions. How-
ever, processing networks with as few as 100 vertices is already a major problem
for current computers.
4.3.1. Refinement. The results of most modularity maximization algorithms
can be improved by a local vertex mover strategy. Noack and Rotta [NR09] sur-
veyed the performance of several strategies inspired by the famous Kernighan-Lin
algorithm [KL70]. We apply the fast greedy vertex movement strategy to the results
of all evaluated algorithms, because all other strategies scale much worse without
providing significant improvements in quality. The fast greedy vertex mover
strategy sweeps iteratively over the set of vertices as long as moving a vertex to
one of its neighboring clusters improves modularity.
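A minimal sketch of such a fast greedy vertex mover, using the standard single-vertex modularity-gain formula (an illustration, not the evaluated implementation):

    from collections import defaultdict

    def greedy_vertex_mover(adj, labels):
        """Sweep over all vertices, moving each to the neighbouring cluster
        with the largest positive modularity gain, until a sweep makes no move.

        adj: dict vertex -> dict neighbour -> edge weight (undirected, simple);
        labels: list with one cluster label per vertex, modified in place.
        """
        m = sum(sum(ws.values()) for ws in adj.values()) / 2.0
        deg = {v: sum(ws.values()) for v, ws in adj.items()}
        cdeg = defaultdict(float)            # total degree per cluster
        for v in adj:
            cdeg[labels[v]] += deg[v]
        improved = True
        while improved:
            improved = False
            for v in adj:
                a = labels[v]
                w_to = defaultdict(float)    # weight from v into each cluster
                for u, w in adj[v].items():
                    w_to[labels[u]] += w
                # score of leaving v where it is (a's degree taken without v)
                base = w_to[a] / m - deg[v] * (cdeg[a] - deg[v]) / (2.0 * m * m)
                best_gain, best_c = 0.0, a
                for b, w_vb in w_to.items():
                    if b == a:
                        continue
                    gain = (w_vb / m - deg[v] * cdeg[b] / (2.0 * m * m)) - base
                    if gain > best_gain:
                        best_gain, best_c = gain, b
                if best_c != a:              # execute the best improving move
                    cdeg[a] -= deg[v]
                    cdeg[best_c] += deg[v]
                    labels[v] = best_c
                    improved = True
        return labels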
5. Evaluation
The clustering scheme is evaluated by means of real-world and artificial net-
works from the testbed of the 10th DIMACS implementation challenge on graph
partitioning and graph clustering. Memory complexity is a bigger issue than time
complexity for our algorithms and we had to omit the two largest datasets from
the category Clustering Instances because of insufficient main memory. We also
omitted the small networks with less than 400 vertices where many algorithms are
able to find the optimal partitions [OGS10].
[Figure 2: modularity as a function of ensemble size for CGGCLP.]
Before conducting the evaluation, we first determined the best choice for
the number of partitions in the ensembles. The results of our tests (see Figure 3)
show that the ensemble size should be roughly ln n for all algorithms but CGGCLP.
When using LP as the base algorithm, the quality improves with increasing ensemble
size for the iterated scheme but decreases sharply for the non-iterated scheme
(see Figure 2). This seems to be a result of the weak learning quality of LP. A
larger ensemble size results in more and smaller core groups in the maximal overlap
partition. LP is not able to find a good clustering from finer decompositions when
not iteratively applied as in the CGGCi scheme.
The results in Table 2 show the average optimization quality and therefore the
quality we can expect when using the algorithm in a practical context. In Table
1 we show the limits of the scheme, i.e., the best optimization quality we were
able to achieve when the scheme is given much time.
While the iterated CGGCi scheme does not provide much improvement com-
pared to the non-iterated scheme when used with the RG algorithm (CGGCiRG
vs. CGGCRG ), its improvement for the LP algorithm is significant (CGGCiLP
vs. CGGCLP). There is still a difference between CGGCiRG and CGGCiLP.
But for most networks, CGGCiLP achieves better results than the standalone RG
algorithm, which was shown to be quite competitive among non-CGGC scheme
algorithms [OGS10].
A notable result is that the LP algorithm performs extremely badly on the pref-
erentialAttachment network (pref.Attach.). This network is the result of a random
network generation process where edges are iteratively added to the network, and
the probability that an edge is attached to a vertex depends on the current degree
of that vertex. The average modularity for the standalone LP on the preferentialAt-
tachment network is extremely low, as the algorithm identified a community structure
in only 1 of 100 test runs. In all other cases the identified clusterings were partitions
into singletons. Therefore, using LP within the CGGC scheme failed as well.
[Figure 3: modularity Q as a function of ensemble size for CGGCiRG, CGGCiLP, and CGGCRG on the PGPgiantcompo (top) and caidaRouterLevel (bottom) networks.]
[Figure 4: modularity and number of core groups per iteration for CGGCiRG (top) and CGGCiLP (bottom).]
cluster (the supremum of the lattice) at level k = 1. For a partition with k clusters
we have k(k − 1)/2 merge choices.
In each iteration of the CGGCi scheme, the search starts at some partition
P at level kP and goes up in the lattice to identify several new local maxima
(the partitions in the ensemble S). For example, the algorithm starts twice at the
partition 1 2 3 4 5 at level 5 in Figure 1 and reaches the two local maxima (1 2
3)(4 5) and (1 2)(3 4 5) at level 2. Then the algorithm goes down in the lattice to
the maximal overlap partition P̂ at a level kP̂ ≤ kP . In the example, this is the
partition (1 2) 3 (4 5) at level 3. In the worst case, when the ensemble of partitions
S created starting at P does not agree on any vertex, the maximal overlap is again
P and the core groups search stops. Otherwise, when the ensemble agrees on how
to merge at least one vertex, a new core groups partition is identified at a level
kP̂ < kP .
If a local optimum P has been reached by a hill-climbing method, all partitions
that have been visited on the way through the merge lattice to P have a lower
objective function value than the local optimum. As can be seen from the merge
lattice given in Figure 1, there are usually many paths to get from the bottom
partition on level n to any other partition.
A path in the merge lattice can be identified by an ordered set of partitions. Let
FPi denote the set of all paths that connect the singleton partition to the partition
Pi , let Ω denote all partitions of a set of vertices V , and S be a set of partitions.
Then, P(S) = {P ∈ Ω | ∀Pi ∈ S ∃D ∈ FPi : P ∈ D} is the set of all partitions that
are included in at least one path to each partition in S. In other words, P(S) is the
set of all branch points from which all partitions in S can be reached. P(S) always
contains at least the singleton partition which all paths share as the starting point.
The maximal overlap P̂ of the ensemble of partitions in S is the partition in P(S)
with the minimal number of clusters. That means, P̂ is the latest point from where
a hierarchical agglomerative algorithm can reach all partitions in the ensemble. We
see that the core groups partition of the maximal overlap is a special partition as
it is a branching point in the merge path of the ensemble S.
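As an illustration, with partitions encoded as label lists and the maximal_overlap sketch from Section 3.1, the branch point of the two local maxima of Figure 1 is recovered directly:

    # The two local maxima from Figure 1, as cluster labels for vertices 1..5
    P1 = [0, 0, 0, 1, 1]              # (1 2 3)(4 5)
    P2 = [0, 0, 1, 1, 1]              # (1 2)(3 4 5)
    print(maximal_overlap([P1, P2]))  # [0, 0, 1, 2, 2], the saddle (1 2) 3 (4 5)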
For a moment, we put the merge path discussion aside and discuss Morse theory,
which originates from the work of Morse on the topology of manifolds [Mor34].
Although the theory was originally developed for continuous function spaces,
and we are dealing with discrete optimization, Morse theory provides a suitable
means to understand the topology of high-dimensional non-linear functions. In
the following, we assume that the discrete points (the partitions of a graph) are
embedded in a continuous space in such a way that the critical points (maxima,
Figure 5. Graph (a) and respective level line (b). The levels
marked with * are critical levels with Karush-Kuhn-Tucker points.
[Figure 6: level lines of an example function, with local maxima, saddle points, and a local minimum marked.]
different local optima can be reached depending on the direction the randomized
gradient algorithm follows. In contrast, gradient algorithms starting at points in
the interior of basins of attraction lead to one local maximum, even if they are
randomized.
Table 4 compares the properties of strict critical points for at least 2-times
continuously differentiable spaces with the properties of critical points in the merge
lattice of agglomerative hierarchical modularity clustering algorithms. Note that
saddle points are characterized as split points of algorithm paths to critical points.
In Figure 1 such a path split occurs at the partition (1 2) 3 (4 5) with two paths
leading to the two local maxima (1 2 3)(4 5) and (1 2)(3 4 5).
Thus, the core groups partitions correspond to saddle points: in the path
space of a graph, the core groups are branch points where the join-paths to local
maxima separate. As the core groups partitions correspond to saddle points, they
are good starting points for randomized greedy algorithms. For other classifiers, e.g.
the label propagation algorithm, core groups partitions work well as long as the
classifiers reach points in different basins of attraction, which is a weaker condition
than the requirement of reaching a local maximum. Obviously, in order for a core
group to be a good restart point in the CGGC scheme, local optima other than
those used to create the core groups must also be reachable from it. The rugged
mountain saddle shown in Figure 7 is a familiar example of such a branch point
in R3. By iteratively identifying core groups of increasing modularity, we identify
saddle points that lead to higher and higher local maxima.
In summary, through the theoretical considerations of this section (and sup-
ported by the evaluation in Section 5) our explanation for the high optimization
quality of the CGGC scheme is:
• The operation of forming core groups partitions from sets of locally (al-
most) maximal partitions identifies (some) critical points on the merge
lattice of partitions.
• Core group partitions are good points for restarting randomized greedy al-
gorithms, because a core groups partition is a branch point (saddle point)
in the search space where different basins of attraction meet.
7. Conclusion
In this paper we have shown that learning several weak classifiers has a number
of advantages for graph clustering. The maximal overlap of several weak classifiers
is a good restart point for further search. Depending on the viewpoint, this approach
can be regarded as a way to make the 'easy' decisions, on which pairs of vertices
belong together, first, and to defer the 'harder' decisions until the unambiguous
ones have been made. When looking at the search space, maximal overlaps seem
to be capable of identifying those critical points from which especially randomized
gradient algorithms can find good local maxima.
As it turned out, when using the CGGCi scheme, the choice of base algorithm
has no major impact on the clustering quality. This is an important insight: with
the core groups scheme, the base algorithm(s) can be selected based on other
considerations. For example, for most algorithms developed so far for modularity
References
[AK08] G. Agarwal and D. Kempe, Modularity-maximizing graph communities via mathemat-
ical programming, Eur. Phys. J. B 66 (2008), no. 3, 409–418, DOI 10.1140/epjb/e2008-
00425-1. MR2465245 (2009k:91130)
[BDG+ 07] Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Görke, Martin Hoefer, Zoran
Nikoloski, and Dorothea Wagner, On finding graph clusterings with maximum modu-
larity, Graph-theoretic concepts in computer science, Lecture Notes in Comput. Sci.,
vol. 4769, Springer, Berlin, 2007, pp. 121–132, DOI 10.1007/978-3-540-74839-7 12.
MR2428570 (2009j:05215)
[BDG+ 08] Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Görke, Martin Hoefer, Zoran
Nikoloski, and Dorothea Wagner, On modularity clustering, IEEE Transactions on
Knowledge and Data Engineering 20 (2008), no. 2, 172–188.
[BGLL08] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre,
Fast unfolding of communities in large networks, Journal of Statistical Mechanics:
Theory and Experiment 2008 (2008), no. 10, P10008.
[Boc74] Hans Hermann Bock, Automatische Klassifikation, Vandenhoeck & Ruprecht,
Göttingen, 1974. Theoretische und praktische Methoden zur Gruppierung und
Strukturierung von Daten (Cluster-Analyse); Studia Mathematica/Mathematische
Lehrbücher, Band XXIV. MR0405723 (53 #9515)
[FB07] Santo Fortunato and Marc Barthélemy, Resolution limit in community detection,
Proceedings of the National Academy of Sciences of the United States of America
104 (2007), no. 1, 36–41.
[FJ05] Ana L. N. Fred and Anil K. Jain, Combining multiple clusterings using evidence
accumulation, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005), 835–850.
[For10] Santo Fortunato, Community detection in graphs, Phys. Rep. 486 (2010), no. 3-5,
75–174, DOI 10.1016/j.physrep.2009.11.002. MR2580414 (2011d:05337)
[GA05] R. Guimerà and L. A. N. Amaral, Functional cartography of complex metabolic networks,
Nature 433 (2005), 895–900.
[JJT00] Hubertus Th. Jongen, Peter Jonker, and Frank Twilt, Nonlinear optimization in finite
dimensions, Nonconvex Optimization and its Applications, vol. 47, Kluwer Academic
Publishers, Dordrecht, 2000. Morse theory, Chebyshev approximation, transversality,
flows, parametric aspects. MR1794354 (2001i:90002)
[KL70] B.W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs,
The Bell System Technical Journal 49 (1970), no. 1, 291–307.
[LH07] S. Lehmann and L.K. Hansen, Deterministic modularity optimization, The European
Physical Journal B - Condensed Matter and Complex Systems 60 (2007), no. 1, 83–88.
[LZW+ 08] Zhenping Li, Shihua Zhang, Rui-Sheng Wang, Xiang-Sun Zhang, and Luonan Chen,
Quantitative function for community detection, Physical Review E 77 (2008), no. 3,
036109.
[MAnD05] A. Medus, G. Acuña, and C.O. Dorso, Detection of community structures in networks
via global optimization, Physica A: Statistical Mechanics and its Applications 358
(2005), no. 2-4, 593–604.
[MLR06] Alfonsas Misevicius, Antanas Lenkevicius, and Dalius Rubliauskas, Iterated tabu
search: an improvement to standard tabu search, Information Technology and Control
35 (2006), 187–197.
[Mor34] Marston Morse, The calculus of variations in the large, Colloquium Publications of
the American Mathematical Society, vol. 18, American Mathematical Society, New
York, 1934.
[MRC05] Stefanie Muff, Francesco Rao, and Amedeo Caflisch, Local modularity measure for
network clusterizations, Physical Review E 72 (2005), no. 5, 056107.
[New04] Mark E. J. Newman, Fast algorithm for detecting community structure in networks,
Physical Review E 69 (2004), no. 6, 066133.
[New06] , Modularity and community structure in networks, Proceedings of the Na-
tional Academy of Sciences of the United States of America 103 (2006), no. 23,
8577–8582.
[NG04] Mark E. J. Newman and Michelle Girvan, Finding and evaluating community struc-
ture in networks, Physical Review E 69 (2004), no. 2, 026113.
[NR09] Andreas Noack and Randolf Rotta, Multi-level algorithms for modularity clustering,
Proceedings of the 8th International Symposium on Experimental Algorithms, Lecture
Notes in Computer Science, vol. 5526, Springer Berlin / Heidelberg, 2009, pp. 257–268.
[OGS10] Michael Ovelgönne and Andreas Geyer-Schulz, Cluster cores and modularity maxi-
mization, ICDMW ’10. IEEE International Conference on Data Mining Workshops,
2010, pp. 1204–1213.
[OGS12] Michael Ovelgönne and Andreas Geyer-Schulz, A comparison of agglomerative hi-
erarchical algorithms for modularity clustering, Challenges at the Interface of Data
Analysis, Computer Science, and Optimization, Studies in Classification, Data Anal-
ysis, and Knowledge Organization, Springer Berlin Heidelberg, 2012, pp. 225–232.
[Pol06] R. Polikar, Ensemble based systems in decision making, IEEE Circuits and Systems
Magazine 6 (2006), no. 3, 21–45.
[RAK07] Usha Nandini Raghavan, Réka Albert, and Soundar Kumara, Near linear time algo-
rithm to detect community structures in large-scale networks, Physical Review E 76
(2007), no. 3, 036106.
[RZ07] Jianhua Ruan and Weixiong Zhang, An efficient spectral algorithm for network com-
munity discovery and its applications to biological and social networks, ICDM 2007,
Seventh IEEE International Conference on Data Mining, 2007, pp. 643–648.
[RZ08] Jianhua Ruan and Weixiong Zhang, Identifying network communities with a high
resolution, Physical Review E 77 (2008), 016104.
[Sch90] Robert E. Schapire, The strength of weak learnability, Machine Learning 5 (1990),
197–227.
[Sch07] Satu Elisa Schaeffer, Graph clustering, Computer Science Review 1 (2007), no. 1,
27–64.
[Wol92] David H. Wolpert, Stacked generalization, Neural Networks 5 (1992), no. 2, 241–259.
[WS05] S. White and P. Smyth, A spectral clustering approach to finding communities in
graphs, Proceedings of the Fifth SIAM International Conference on Data Mining,
SIAM, 2005, pp. 274–285.
[ZWM+ 08] Zhemin Zhu, Chen Wang, Li Ma, Yue Pan, and Zhiming Ding, Scalable community
discovery of large networks, WAIM ’08: Proceedings of the 2008 International Con-
ference on Web-Age Information Management, 2008, pp. 381–388.
Contemporary Mathematics
Volume 588, 2013
http://dx.doi.org/10.1090/conm/588/11703

Parallel Community Detection for Massive Graphs
E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A. Bader
1. Communities in Graphs
Graph-structured data inundates daily electronic life. Its volume outstrips the
capabilities of nearly all analysis tools. The Facebook friendship network has over
845 million users [9]. Twitter boasts over 140 million new messages each day [34],
and the NYSE processes over 300 million trades each month [25]. Applications
of analysis range from database optimization to marketing to regulatory monitor-
ing. Global graph analysis kernels at this scale tax current hardware and software
architectures due to the size and structure of typical inputs.
One such useful analysis kernel finds smaller communities, subgraphs that lo-
cally optimize some connectivity criterion, within these massive graphs. We extend
the boundary of current complex graph analysis by presenting the first algorithm
for detecting communities that scales to graphs of practical size: over 100 million
vertices and over three billion edges, in less than 500 seconds on a shared-memory
parallel architecture with 256 GiB of memory.
community graph is halved with each iteration, our algorithm requires O(|E| ·
log |V |) operations, where |V | is the number of vertices in the input graph. If
the graph is a star, only two vertices are contracted per step and our algorithm
requires O(|E| · |V |) operations. This matches experience with the sequential CNM
algorithm [35].
We score an edge {i, j} by the negation of the change from old to new, or
φ(Si ) + φ(Sj ) − φ(Si ∪ Sj ). We again track the edge multiplicity in the edge weight
and the volume of the subgraph in the vertex weight.
3.1. Graph representation. We use the same core data structure as our
earlier work [30, 31] and represent a weighted, undirected graph with an array
of triples (i, j, w) for edges between vertices i and j with i ≠ j. We accumulate
repeated edges by adding their weights. The sums of weights for self-loops, i = j,
are stored in a |V|-long array. To save space, we store each edge only once, similar
to storing only one triangle of a symmetric matrix.
Unlike our initial work, however, the array of triples is kept in buckets defined
by the first index i, and we hash the order of i and j rather than storing the strictly
lower triangle. If i and j both are even or odd, then the indices are stored such
that i < j, otherwise i > j. This scatters the edges associated with high-degree
vertices across different source vertex buckets.
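The parity rule for choosing an edge's source bucket can be captured in a few lines; this is an illustrative reconstruction, not the actual implementation:

    def bucket_edge(i, j):
        """Return (src, dst) for edge {i, j}: with equal parity the smaller
        index is the source (i < j), with mixed parity the larger one (i > j).
        This scatters a high-degree vertex's edges over many source buckets."""
        if (i % 2) == (j % 2):
            return (min(i, j), max(i, j))
        return (max(i, j), min(i, j))

    # Edges incident to hub vertex 6 land in several different buckets:
    print(bucket_edge(6, 2), bucket_edge(6, 9), bucket_edge(6, 1))
    # -> (2, 6) (9, 6) (6, 1)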
The buckets need not be sequential. We store both beginning and ending in-
dices into the edge array for each vertex. In a traditional sparse matrix compressed
format, the entries adjacent to vertex i + 1 would follow those adjacent to i. Per-
mitting non-sequential buckets reduces synchronization within graph contraction.
Licensed to Penn St Univ, University Park. Prepared on Mon Jul 8 20:46:56 EDT 2013 for download from IP 130.203.136.75.
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms
PARALLEL COMMUNITY DETECTION FOR MASSIVE GRAPHS 213
Storing both i and j enables direct parallelization across the entire edge array. Be-
cause edges are stored only once, edge {i, j} can appear in the bucket for either i
or j but not both.
A graph with |V | vertices and |E| non-self, unique edges requires space for
3|V | + 3|E| 64-bit integers plus a few additional scalars to store |V |, |E|, and other
book-keeping data. Section 3.4 describes cutting some space by using 32-bit integers
for some vertex information.
Our improved matching’s performance gains over our original method are mar-
ginal on the Cray XMT but drastic on Intel-based platforms using OpenMP. The
original method followed potentially long chains of pointers, an expensive operation
on Intel-based platforms. Scoring and matching together require |E| + 4|V | 64-bit
integers plus an additional |V | locks on OpenMP platforms.
4. Parallel Performance
We evaluate parallel performance on two different threaded hardware architec-
tures, the Cray XMT2 and an Intel-based server. We highlight two graphs, one real
and one artificial, from the Implementation Challenge to demonstrate scaling and
investigate performance properties. Each experiment is run three times to capture
some of the variability in platforms and in our non-deterministic algorithm. Our
current implementation achieves speed-ups of up to 13× on a four processor, 40-
physical-core Intel-based platform. The Cray XMT2 single-processor times are too
slow to evaluate speed-ups on that platform.
Graph                       |V|           |E|
uk-2002                     18 520 486    261 787 258
kron_g500-simple-logn20      1 048 576     44 619 402
4.3. Time and parallel speed-up. Figure 1 shows the execution time as a
function of the number of allocated OpenMP threads or Cray XMT processors,
separated by platform and graph. Figure 2 translates the time into speed-up
against the best single-thread
execution time on the Intel-based platform. The execution times on a single XMT2
processor are too large to permit speed-up studies on these graphs. The results
are the best of three runs maximizing modularity with our parallel variant of the
Clauset, Newman, and Moore heuristic until the communities contain at least half
the edges in the graph. Because fewer possible contractions decrease the conduc-
tance, minimizing conductance requires three to five times as many contraction
steps and a proportionally longer time.
Maximizing modularity on the 105 million vertex, 3.3 billion edge uk-2007-05
requires from 496 seconds to 592 seconds using all 80 hardware threads of the Intel
E7-8870 platform. The same task on the Cray XMT2 requires from 2 388 seconds
to 2 466 seconds.
[Figure 1: execution time in seconds versus number of threads/processors for uk-2002 and kron_g500-simple-logn20.]
5. Related Work
Graph partitioning, graph clustering, and community detection are tightly re-
lated topics. A recent survey by Fortunato [12] covers many aspects of community
detection with an emphasis on modularity maximization. Nearly all existing work
of which we know is sequential; many algorithms target specific contraction edge
scoring or vertex move mechanisms [14]. Our previous work [30, 31] established
and extended the first
parallel agglomerative algorithm for community detection and provided results on
the Cray XMT. Prior modularity-maximizing algorithms sequentially maintain and
update priority queues [8], and we replace the queue with a weighted graph match-
ing. Separately from this work, Fagginger Auer and Bisseling developed a similar
[Figure 2: parallel speed-up, up to 13×, versus number of threads/processors for uk-2002 and kron_g500-simple-logn20.]
6. Observations
Our algorithm and implementation, the first published parallel algorithm for
agglomerative community detection, extracts communities with apparently high
modularity or low conductance in a reasonable length of time.
[Figure: number of communities, coverage, and mirror coverage for the cnm, mb, and cond scoring variants on uk-2002 and kron_g500-simple-logn20, on the Intel E7-8870 and Cray XMT2 platforms.]
Finding modularity-maximizing communities in a graph with 105 million vertices and over 3.3 billion
edges requires a little over eight minutes on a four processor, Intel E7-8870-based
server. Our implementation can optimize with respect to different local optimiza-
tion criteria, and its modularity results are comparable to a state-of-the-art se-
quential implementation. By altering termination criteria, our implementation can
examine some trade-offs between optimization quality and performance. As a twist
to established sequential algorithms for agglomerative community detection, our
parallel algorithm takes a novel and naturally parallel approach to agglomeration
with maximum weighted matchings. That difference appears to reduce differences
between the CNM and MB edge scoring methods. The algorithm is simpler than ex-
isting sequential algorithms and opens new directions for improvement. Separating
scoring, choosing, and merging edges may lead to improved metrics and solutions.
Our implementation is publicly available1 .
Outside of the edge scoring, our algorithm relies on well-known primitives that
exist for many execution models. Much of the algorithm can be expressed through
sparse matrix operations, which may lead to explicitly distributed memory imple-
mentations through the Combinatorial BLAS [7] or possibly cloud-based imple-
mentations through environments like Pregel [19]. The performance trade-offs for
graph algorithms between these different environments and architectures remain
poorly understood.
Besides experiments with massive real-world data sets, future work includes the
extension of the algorithm to a streaming scenario. In such a scenario, the graph
changes over time without an explicit start or end. This extension has immediate
uses in many social network applications but requires algorithmic changes to avoid
costly recomputations on large parts of the graph.
Acknowledgments
We thank PNNL and the Swiss National Supercomputing Centre for providing
access to Cray XMT systems. We also thank reviewers of previous work inside Oracle
and the anonymous reviewers of this work.
References
[1] R. Andersen and K. Lang, Communities from seed sets, Proc. of the 15th Int’l Conf. on World
Wide Web, ACM, 2006, p. 232.
[2] D.A. Bader and J. McCloskey, Modularity and graph algorithms, Presented at UMBC, Sep-
tember 2009.
[3] D.A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, Competition rules and objec-
tive functions for the 10th DIMACS Implementation Challenge on graph partitioning and
graph clustering, http://www.cc.gatech.edu/dimacs10/data/dimacs10-rules.pdf, Septem-
ber 2011.
[4] J.W. Berry., B. Hendrickson, R.A. LaViolette, and C.A. Phillips, Tolerating the community
detection resolution limit with edge weighting, CoRR abs/0903.1072 (2009).
[5] Béla Bollobás, Modern graph theory, Graduate Texts in Mathematics, vol. 184, Springer-
Verlag, New York, 1998. MR1633290 (99h:05001)
[6] Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Görke, Martin Hoefer, Zoran Nikoloski,
and Dorothea Wagner, On modularity clustering, IEEE Trans. Knowledge and Data Engi-
neering 20 (2008), no. 2, 172–188.
[7] Aydın Buluç and John R Gilbert, The Combinatorial BLAS: design, implementation, and
applications, International Journal of High Performance Computing Applications 25 (2011),
no. 4, 496–509.
1 http://www.cc.gatech.edu/~jriedy/community-detection/
[8] A. Clauset, M.E.J. Newman, and C. Moore, Finding community structure in very large net-
works, Physical Review E 70 (2004), no. 6, 66111.
[9] Facebook, Fact sheet, February 2012, http://newsroom.fb.com/content/default.aspx?
NewsAreaId=22.
[10] B. O. Fagginger Auer and R. H. Bisseling, Graph coarsening and clustering on the GPU, Tech.
report, 10th DIMACS Implementation Challenge - Graph Partitioning and Graph Clustering,
Atlanta, GA, February 2012.
[11] S. Fortunato and M. Barthélemy, Resolution limit in community detection, Proc. of the
National Academy of Sciences 104 (2007), no. 1, 36–41.
[12] Santo Fortunato, Community detection in graphs, Phys. Rep. 486 (2010), no. 3-5, 75–174,
DOI 10.1016/j.physrep.2009.11.002. MR2580414 (2011d:05337)
[13] Joachim Gehweiler and Henning Meyerhenke, A distributed diffusive heuristic for cluster-
ing a virtual P2P supercomputer, Proc. 7th High-Performance Grid Computing Workshop
(HGCW’10) in conjunction with 24th Intl. Parallel and Distributed Processing Symposium
(IPDPS’10), IEEE Computer Society, 2010.
[14] Robert Görke, Andrea Schumm, and Dorothea Wagner, Experiments on density-constrained
graph clustering, Proc. Algorithm Engineering and Experiments (ALENEX12), 2012.
[15] Jaap-Henk Hoepman, Simple distributed weighted matchings, CoRR cs.DC/0410047 (2004).
[16] P. Konecny, Introducing the Cray XMT, Proc. Cray User Group meeting (CUG 2007) (Seattle,
WA), CUG Proceedings, May 2007.
[17] Andrea Lancichinetti and Santo Fortunato, Limits of modularity maximization in community
detection, Phys. Rev. E 84 (2011), 066122.
[18] S. Lozano, J. Duch, and A. Arenas, Analysis of large social datasets by community detection,
The European Physical Journal - Special Topics 143 (2007), 257–259.
[19] Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty
Leiser, and Grzegorz Czajkowski, Pregel: a system for large-scale graph processing, Proceed-
ings of the 2010 international conference on Management of data (New York, NY, USA),
SIGMOD ’10, ACM, 2010, pp. 135–146.
[20] Fredrik Manne and Rob Bisseling, A parallel approximation algorithm for the weighted maxi-
mum matching problem, Parallel Processing and Applied Mathematics (Roman Wyrzykowski,
Jack Dongarra, Konrad Karczewski, and Jerzy Wasniewski, eds.), Lecture Notes in Computer
Science, vol. 4967, Springer Berlin / Heidelberg, 2008, pp. 708–717.
[21] M.E.J. Newman, Modularity and community structure in networks, Proc. of the National
Academy of Sciences 103 (2006), no. 23, 8577–8582.
[22] M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks,
Phys. Rev. E 69 (2004), no. 2, 026113.
[23] Andreas Noack and Randolf Rotta, Multi-level algorithms for modularity clustering, Exper-
imental Algorithms (Jan Vahrenhold, ed.), Lecture Notes in Computer Science, vol. 5526,
Springer Berlin / Heidelberg, 2009, pp. 257–268.
[24] Mark B. Novick, Fast parallel algorithms for the modular decomposition, Tech. report, Cornell
University, Ithaca, NY, USA, 1989.
[25] NYSE Euronext, Consolidated volume in NYSE listed issues, 2010 – current, March
2011, http://www.nyxdata.com/nysedata/asp/factbook/viewer_edition.asp?mode=table&
key=3139&category=3.
[26] OpenMP Architecture Review Board, OpenMP application program interface; version 3.0,
May 2008.
[27] STACS 99, Lecture Notes in Computer Science, vol. 1563, Springer-Verlag, Berlin, 1999.
Edited by Christoph Meinel and Sophie Tison. MR1734032 (2000h:68028)
[28] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, Defining and identifying
communities in networks, Proc. of the National Academy of Sciences 101 (2004), no. 9, 2658.
[29] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, Hierarchical
organization of modularity in metabolic networks, Science 297 (2002), no. 5586, 1551–1555.
[30] E. Jason Riedy, David A. Bader, and Henning Meyerhenke, Scalable multi-threaded commu-
nity detection in social networks, Workshop on Multithreaded Architectures and Applications
(MTAAP) (Shanghai, China), May 2012.
[31] E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A. Bader, Parallel community
detection for massive graphs, Proceedings of the 9th International Conference on Parallel
Processing and Applied Mathematics (Torun, Poland), September 2011.
[32] , Parallel community detection for massive graphs, Tech. report, 10th DIMACS Im-
plementation Challenge - Graph Partitioning and Graph Clustering, Atlanta, GA, February
2012.
[33] C. Seshadhri, Tamara G. Kolda, and Ali Pinar, Community structure and scale-free collec-
tions of Erdös-Rényi graphs, CoRR abs/1112.3644 (2011).
[34] Twitter, Inc., Happy birthday Twitter!, March 2011, http://blog.twitter.com/2011/03/
happy-birthday-twitter.html.
[35] Ken Wakita and Toshiyuki Tsurumi, Finding community structure in mega-scale social net-
works, CoRR abs/cs/0702048 (2007).
[36] Dennis M. Wilkinson and Bernardo A. Huberman, A method for finding communities of re-
lated genes, Proceedings of the National Academy of Sciences of the United States of America
101 (2004), no. Suppl 1, 5241–5248.
[37] Yuzhou Zhang, Jianyong Wang, Yi Wang, and Lizhu Zhou, Parallel community detection on
large networks with propinquity dynamics, Proceedings of the 15th ACM SIGKDD interna-
tional conference on Knowledge discovery and data mining (New York, NY, USA), KDD ’09,
ACM, 2009, pp. 997–1006.
Licensed to Penn St Univ, University Park. Prepared on Mon Jul 8 20:46:56 EDT 2013 for download from IP 130.203.136.75.
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms
Licensed to Penn St Univ, University Park. Prepared on Mon Jul 8 20:46:56 EDT 2013 for download from IP 130.203.136.75.
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms
Graph Coarsening and Clustering on the GPU

B. O. Fagginger Auer and R. H. Bisseling

http://dx.doi.org/10.1090/conm/588/11706

2010 Mathematics Subject Classification. Primary 68R10, 68W10; Secondary 91C20, 05C70.
Key words and phrases. Graphs, GPU, shared-memory parallel, clustering.
This research was performed on hardware from NWO project NWO-M 612.071.305.
1. Introduction
We present a fine-grained shared-memory parallel algorithm for graph coars-
ening and apply this algorithm in the context of graph clustering to obtain a fast
greedy heuristic for maximising modularity in weighted undirected graphs. This is
a follow-up to [8], which was concerned with generating weighted graph matchings
on the GPU, in an effort to use the parallel processing power offered by multi-core
CPUs and GPUs for discrete computing tasks, such as partitioning and clustering
of graphs and hypergraphs. Like the generation of graph matchings, graph coarsening
is an essential component of both graph partitioning [4, 9, 12] and multi-level clustering
[22], and it therefore forms a logical continuation of the research done in [8].
Our contribution is a parallel greedy clustering algorithm that scales well with
the number of available processor cores and generates clusterings of reasonable
quality in very little time. We have tested this algorithm (see Section 5) against a
large set of clustering problems from the 10th DIMACS challenge on graph
partitioning and clustering [1], so that its performance can be compared directly
with the state-of-the-art clustering algorithms participating in this challenge.
An undirected graph G is a pair (V, E), with vertices V , and edges E that are
of the form {u, v} for u, v ∈ V with possibly u = v. Edges can be provided with
weights ω : E → R>0 , in which case we call G a weighted undirected graph. For
vertices v ∈ V , we denote the set of all of v’s neighbours by
V_v := {u ∈ V | {u, v} ∈ E} \ {v}.
Licensed to Penn St Univ, University Park. Prepared on Mon Jul 8 20:46:56 EDT 2013 for download from IP 130.203.136.75.
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms
224 B. O. FAGGINGER AUER AND R. H. BISSELING
A clustering C of G is a partition of the vertex set V into non-empty, disjoint
clusters. Note that the number of clusters is not fixed beforehand: there can be
a single large cluster, or as many clusters as there are vertices, or any number of
clusters in between. A quality measure for clusterings, modularity, was introduced
in [16]; we will use it to judge the quality of the generated clusterings.
Let G = (V, E, ω) be a weighted undirected graph. We define the weight ζ(v)
of a vertex v ∈ V in terms of the weights of the edges incident to this vertex as
\[
(1.1)\qquad
\zeta(v) :=
\begin{cases}
\displaystyle\sum_{\{u,v\} \in E} \omega(\{u,v\}) & \text{if } \{v,v\} \notin E,\\[1.5ex]
\displaystyle\sum_{\substack{\{u,v\} \in E \\ u \neq v}} \omega(\{u,v\}) + 2\,\omega(\{v,v\}) & \text{if } \{v,v\} \in E.
\end{cases}
\]
The modularity of a clustering C of G is then defined, following [16], as
\[
(1.2)\qquad
\mathrm{mod}(\mathcal{C}) :=
\frac{\displaystyle\sum_{C \in \mathcal{C}} \sum_{\substack{\{u,v\} \in E \\ u,v \in C}} \omega(\{u,v\})}{\displaystyle\sum_{e \in E} \omega(e)}
- \frac{\displaystyle\sum_{C \in \mathcal{C}} \Bigl(\sum_{v \in C} \zeta(v)\Bigr)^{2}}{4\,\Bigl(\displaystyle\sum_{e \in E} \omega(e)\Bigr)^{2}},
\]
which is bounded by −1/2 ≤ mod(C) ≤ 1, as we show in the appendix.
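As a small illustrative example (ours, not taken from the original text): consider the path graph with V = {1, 2, 3}, edges {1, 2} and {2, 3} of weight 1, and clustering C = {{1, 2}, {3}}. By (1.1), ζ(1) = ζ(3) = 1 and ζ(2) = 2; the only internal edge is {1, 2}, and the total edge weight is 2, so (1.2) gives
\[
\mathrm{mod}(\mathcal{C}) = \frac{1}{2} - \frac{(1+2)^2 + 1^2}{4 \cdot 2^2} = \frac{1}{2} - \frac{10}{16} = -\frac{1}{8},
\]
which indeed lies in the interval [−1/2, 1].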
Finding a clustering C which maximises mod(C) is NP-hard: ascertaining whether
there exists a clustering that has at least a fixed modularity is strongly
NP-complete [3, Theorem 4.4]. Hence, to find clusterings of high modularity
in reasonable time, we need to resort to heuristic algorithms.
Many different clustering heuristics have been developed, for which we would like
to refer the reader to the overview in [19, Section 5] and the references contained
therein: there are heuristics based on spectral methods, maximum flow, graph
bisection, betweenness, Markov chains, and random walks. The clustering method
we present belongs to the category of greedy agglomerative heuristics [2, 5, 15, 17,
22]. Our overall approach is similar to the parallel clustering algorithm discussed
by Riedy et al. in [18] and a detailed comparison is included in Section 5.
2. Clustering
We will now rewrite (1.2) in a more convenient form. Let C ∈ C be a cluster
and define the weight of the cluster as ζ(C) := Σ_{v∈C} ζ(v), the set of all internal
edges as int(C) := {{u, v} ∈ E | u, v ∈ C}, and the set of all external edges as
ext(C) := {{u, v} ∈ E | u ∈ C, v ∉ C}. For two clusters C, C′, let
cut(C, C′) := {{u, v} ∈ E | u ∈ C, v ∈ C′} be the set of edges cut between them,
and for a set of edges F ⊆ E, let ω(F) := Σ_{e∈F} ω(e). Writing Ω := ω(E) for the
total edge weight, (1.2) can be rewritten as
\[
(2.1)\qquad
\mathrm{mod}(\mathcal{C}) = \frac{1}{4\,\Omega^2} \sum_{C \in \mathcal{C}} \bigl( 4\,\Omega\,\omega(\mathrm{int}(C)) - \zeta(C)^2 \bigr).
\]
This way of looking at the modularity is useful for reformulating the agglomerative
heuristic in terms of graph coarsening, as we will see in Section 2.1.
For this purpose, we also need to determine what effect the merging of two
clusters has on the clustering's modularity. Let C be a clustering and C, C′ ∈ C. If
we merge C and C′ into one cluster C ∪ C′, then the clustering C′ := (C \ {C, C′}) ∪
{C ∪ C′} we obtain has modularity (see the appendix)
\[
(2.2)\qquad
\mathrm{mod}(\mathcal{C}') = \mathrm{mod}(\mathcal{C}) + \frac{1}{2\,\Omega^2} \bigl( 2\,\Omega\,\omega(\mathrm{cut}(C, C')) - \zeta(C)\,\zeta(C') \bigr),
\]
and the new cluster has weight
\[
(2.3)\qquad
\zeta(C \cup C') = \sum_{v \in C} \zeta(v) + \sum_{v \in C'} \zeta(v) = \zeta(C) + \zeta(C').
\]
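Continuing the small path-graph example from the introduction (again ours): merging C = {1, 2} and C′ = {3}, we have Ω = 2, ω(cut(C, C′)) = ω({2, 3}) = 1, ζ(C) = 3, and ζ(C′) = 1, so (2.2) and (2.3) give
\[
\mathrm{mod}(\mathcal{C}') = -\frac{1}{8} + \frac{1}{2 \cdot 2^2}\,(2 \cdot 2 \cdot 1 - 3 \cdot 1) = 0,
\qquad
\zeta(C \cup C') = 3 + 1 = 4,
\]
in agreement with evaluating (1.2) directly for the single-cluster clustering {V}, whose modularity is 2/2 − 4²/16 = 0.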
After i coarsening steps, all vertices of G that were merged together into a single
vertex in G^i via π^0, . . . , π^{i−1} are considered as a single cluster. (In particular,
for G^0 = G each vertex of the original graph is a separate cluster.)
From (2.3) we know that weights ζ(·) of merged clusters should be summed,
while for calculating the modularity, (2.1), and the change in modularity due to
merging, (2.2), we only need the total edge weight ω(cut(·, ·)) of the collection of
edges between two clusters, not of individual edges. Hence, when merging two
clusters, we can safely merge the edges in G^i that are mapped to a single edge in
G^{i+1} by π^i, provided we sum their edge weights. This means that the merging
of clusters in G^i to obtain G^{i+1} corresponds precisely to coarsening the graph
G^i to G^{i+1}. Furthermore, weighted matching in the graph of all current clusters
corresponds to a weighted matching in G^i where we consider edges {u^i, v^i} ∈ E^i to
have weight 2 Ω ω^i({u^i, v^i}) − ζ^i(u^i) ζ^i(v^i) during matching. This entire procedure
is outlined in Algorithm 1, where we use a map μ : V → N to indicate matchings
M ⊆ E by letting μ(u) = μ(v) ⟺ {u, v} ∈ M for vertices u, v ∈ V.
3. Coarsening
Graph coarsening is the merging of vertices in a graph to obtain a coarser
version of the graph. Doing this recursively, we obtain a sequence of increasingly
coarse approximations of the original graph. Such a multilevel view of the graph
is useful for graph partitioning [4, 9, 12], but can also be used for clustering [22].
Let G = (V, E, ω, ζ) be an undirected graph with edge weights ω and vertex
weights ζ. A coarsening of G is a map π : V → V′ together with a graph G′ =
(V′, E′, ω′, ζ′) satisfying the following properties:
(1) π(V) = V′,
(2) π(E) = {{π(u), π(v)} | {u, v} ∈ E} = E′,
(3) for v′ ∈ V′,
\[
(3.1)\qquad \zeta'(v') = \sum_{\substack{v \in V \\ \pi(v) = v'}} \zeta(v),
\]
(4) and for e′ ∈ E′,
\[
(3.2)\qquad \omega'(e') = \sum_{\substack{\{u,v\} \in E \\ \{\pi(u),\pi(v)\} = e'}} \omega(\{u,v\}).
\]
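As a concrete sequential reference for these properties, the following C++ sketch (type and function names are our own, not those of the paper's implementation) builds the coarse graph from a projection π by accumulating vertex and edge weights according to (3.1) and (3.2).

#include <map>
#include <utility>
#include <vector>

// A weighted graph: vertex weights zeta and an edge list with weights omega.
struct Graph {
    std::vector<double> zeta;                    // zeta(v) for v = 0, ..., |V|-1
    std::map<std::pair<int, int>, double> omega; // edge {u,v} with u <= v -> weight
};

// Coarsen G under a projection pi : V -> V' with coarse vertices 0..nCoarse-1,
// following (3.1) and (3.2): coarse vertex weights are sums of fine vertex
// weights, and fine edges projecting to the same coarse edge have their
// weights summed (edges inside a merged group become self-loops).
Graph coarsen(const Graph& G, const std::vector<int>& pi, int nCoarse) {
    Graph Gc;
    Gc.zeta.assign(nCoarse, 0.0);
    for (std::size_t v = 0; v < G.zeta.size(); ++v)
        Gc.zeta[pi[v]] += G.zeta[v];                      // property (3.1)
    for (const auto& e : G.omega) {
        int u = pi[e.first.first], v = pi[e.first.second];
        if (u > v) std::swap(u, v);
        Gc.omega[std::make_pair(u, v)] += e.second;       // property (3.2)
    }
    return Gc;
}

A production implementation would of course use the flat extended-neighbour-list representation of Section 4 rather than a std::map, but the accumulation pattern is the same.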
In a star graph, all edges share the centre vertex, so any matching contains at
most one of them, and many coarsening steps are needed to reduce
the graph to a single vertex. Hence, star graphs increase the number of coarsening
iterations at line 5 of Algorithm 1 that we need to perform, which increases running
time and has an adverse effect on parallelisation, because of the few matches that
can actually be made in each iteration.
A way to remedy this problem is to identify vertices with the same neighbours
and match these pairwise, see Figure 2(b) [7, 10]. When maximising clustering
modularity, however, this is not a good idea: for clusters C, C′ ∈ C without any
edges between them, cut(C, C′) = ∅, and merging C and C′ will change the modularity
by −ζ(C) ζ(C′)/(2 Ω²) ≤ 0.
Because of this, we will use the strategy from Figure 2(c), and merge multiple
outlying vertices, referred to as satellites from now on, to the centre of the star
simultaneously. To do so, however, we need to be able to identify star centres and
satellites in the graph.
As the defining characteristic of the centre of a star is its high degree, we will
use the vertex degrees to measure to what extent a vertex is a centre or a satellite.
We propose, for vertices v ∈ V, to let
\[
(3.3)\qquad \mathrm{cp}(v) := \frac{\deg(v)^2}{\displaystyle\sum_{u \in V_v} \deg(u)}.
\]
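The quantity cp(v) of (3.3) can be computed in one pass over the neighbour lists. A minimal C++ sketch (our naming; the threshold separating centres from satellites is a free parameter of this sketch, not a value from the paper):

#include <vector>

// cp(v) = deg(v)^2 / (sum of deg(u) over neighbours u of v), as in (3.3),
// computed from plain adjacency lists. A high value marks a star centre,
// a low value a satellite.
std::vector<bool> markSatellites(const std::vector<std::vector<int> >& adj,
                                 double threshold) {
    std::vector<bool> satellite(adj.size(), false);
    for (std::size_t v = 0; v < adj.size(); ++v) {
        double degv = static_cast<double>(adj[v].size()), sum = 0.0;
        for (std::size_t j = 0; j < adj[v].size(); ++j)
            sum += static_cast<double>(adj[adj[v][j]].size());
        if (sum > 0.0 && degv * degv / sum < threshold)
            satellite[v] = true;
    }
    return satellite;
}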
4. Parallel implementation
In this section, we will demonstrate how the different parts of the clustering
algorithm can be implemented in a style that is suitable for the GPU.
To make the description of the algorithm more explicit, we will need to deviate
from some of the graph definitions of the introduction. First of all, we consider
arrays in memory as ordered lists, and suppose that the vertices of the graph
G = (V, E, ω, ζ) to be coarsened are given by V = (1, 2, . . . , |V|). We index such
lists with parentheses, e.g. V(2) = 2, and denote their length by |V|. Instead of
storing the edges E and edge weights ω of a graph explicitly, we will store for each
vertex v ∈ V the set of all its neighbours V_v, and include the edge weights ω in this
list. We will refer to these sets as extended neighbour lists and denote them by V_v^ω
for v ∈ V.
Let us consider a small example: a graph with 3 vertices and edges {1, 2}
and {1, 3} with edge weights ω({1, 2}) = 4 and ω({1, 3}) = 5. Then, for the
parallel coarsening algorithm we consider this graph as V = (1, 2, 3), together with
V_1^ω = ((2, 4), (3, 5)) (since there are two edges originating from vertex 1, one going
to vertex 2, and one going to vertex 3), V_2^ω = ((1, 4)) (as ω({1, 2}) = 4), and
V_3^ω = ((1, 5)) (as ω({1, 3}) = 5).
In memory, such neighbour lists are stored as an array of indices and weights
(in the small example, ((2, 4), (3, 5), (1, 4), (1, 5))), with for each vertex a range in
this array (in the small example the range (1, 2) for vertex 1, (3, 3) for vertex 2, and
(4, 4) for vertex 3). Note that we can extract all edges together with their weights ω
directly from the extended neighbour lists. Hence, (V, E, ω, ζ) and (V, {V_v^ω | v ∈ V}, ζ)
are equivalent descriptions of G.
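In code, an extended neighbour list is a compressed adjacency structure. The following C++ sketch (our own naming, using 0-based instead of the 1-based indexing above) encodes the three-vertex example:

#include <vector>

// Extended neighbour lists in compressed form: the neighbours and edge
// weights of vertex v occupy positions first[v] .. first[v+1]-1 of nbr and
// weight.
struct NeighbourLists {
    std::vector<int>    first;  // |V|+1 range delimiters
    std::vector<int>    nbr;    // neighbour indices
    std::vector<double> weight; // corresponding edge weights omega
};

// The three-vertex example above: edges {0,1} of weight 4 and {0,2} of weight 5.
NeighbourLists example() {
    NeighbourLists g;
    g.first  = {0, 2, 3, 4};  // vertex 0 owns entries 0..1, vertex 1 entry 2, vertex 2 entry 3
    g.nbr    = {1, 2, 0, 0};  // V_0 = ((1,4),(2,5)), V_1 = ((0,4)), V_2 = ((0,5))
    g.weight = {4.0, 5.0, 4.0, 5.0};
    return g;
}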
We will now discuss the parallel coarsening algorithm described by Algorithm
2, in which the parallel_* functions are slight adaptations of those available in
the Thrust template library [11]. The for . . . parallel do construct indicates a
for-loop of which each iteration can be executed in parallel, independently of all other
iterations.
We start with an undirected weighted graph G with vertices V = (1, 2, . . . , |V |),
vertex weights ζ, and edges E with edge weights ω encoded in the extended neigh-
bour lists as discussed above. A given map μ : V → N indicates which vertices
should be merged to form the coarse graph.
Algorithm 2 starts by creating an ordered list ρ of all the vertices V, and sorting
ρ according to μ. The function parallel_sort_by_key(a, b) sorts b in increasing
order and applies the same sorting permutation to a, and does so in parallel. Consider
for example a graph with 12 vertices and a given μ:
ρ 1 2 3 4 5 6 7 8 9 10 11 12
μ 9 2 3 22 9 9 22 2 3 3 2 4
Then applying parallel_sort_by_key will yield
ρ 2 8 11 3 9 10 12 1 5 6 4 7
μ 2 2 2 3 3 3 4 9 9 9 22 22
We then apply the function parallel_adjacent_not_equal(a), which sets a(1) to 1
and, for 1 < i ≤ |a|, sets a(i) to 1 if a(i) ≠ a(i − 1) and to 0 otherwise. This yields
ρ 2 8 11 3 9 10 12 1 5 6 4 7
μ 1 0 0 1 0 0 1 1 0 0 1 0
Now we know where each group of vertices of G that needs to be merged together
starts. We will store these numbers in the 'inverse' of the projection map π, such
that we know, for each coarse vertex v′, which vertices v in the original graph are
coarsened to v′. The function parallel_copy_index_if_nonzero(a) picks out the
indices 1 ≤ i ≤ |a| for which a(i) ≠ 0 and stores these consecutively, in parallel,
in a list, in this case π⁻¹.
ρ 2 8 11 3 9 10 12 1 5 6 4 7
μ 1 0 0 1 0 0 1 1 0 0 1 0
π⁻¹ 1 4 7 8 11
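These primitives map directly onto the Thrust template library [11]. The sketch below (our adaptation, 0-based where the text above is 1-based) reproduces the twelve-vertex example using thrust::sort_by_key, a shifted thrust::transform for the adjacent-not-equal flags, and thrust::copy_if over a counting iterator for the index compaction:

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/sort.h>
#include <thrust/transform.h>

int main() {
    const int n = 12;
    const int mu0[12] = {9, 2, 3, 22, 9, 9, 22, 2, 3, 3, 2, 4};
    thrust::device_vector<int> mu(mu0, mu0 + n);
    thrust::device_vector<int> rho(thrust::counting_iterator<int>(0),
                                   thrust::counting_iterator<int>(n));

    // parallel_sort_by_key: sort the keys mu, permuting rho along with them.
    thrust::sort_by_key(mu.begin(), mu.end(), rho.begin());

    // parallel_adjacent_not_equal: flag(0) = 1, flag(i) = (mu(i) != mu(i-1)).
    thrust::device_vector<int> flag(n);
    flag[0] = 1;
    thrust::transform(mu.begin() + 1, mu.end(), mu.begin(),
                      flag.begin() + 1, thrust::not_equal_to<int>());

    // parallel_copy_index_if_nonzero: compact the indices of all nonzero flags.
    thrust::device_vector<int> piInv(n);
    int k = thrust::copy_if(thrust::counting_iterator<int>(0),
                            thrust::counting_iterator<int>(n),
                            flag.begin(), piInv.begin(),
                            thrust::identity<int>()) - piInv.begin();
    piInv.resize(k);  // contents: (0, 3, 6, 7, 10), the 0-based form of (1, 4, 7, 8, 11)
    return 0;
}

The remaining parallel_* primitives of Algorithm 2 can be composed from the same building blocks.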
To keep track of the clustering, we store a cluster index κ^i(v) ∈ N for each
vertex of the original graph, such that for i = 0, 1, . . ., our clustering will be
C^i = {{v ∈ V | κ^i(v) = k} | k ∈ N} (i.e. vertices u and v belong to the same
cluster precisely when κ^i(u) = κ^i(v)). Initially we assign all vertices to a different
cluster by letting κ^0(v) ← v for all v ∈ V. After coarsening, the clustering is
updated at line 11 by setting κ^{i+1}(v) ← π^i(κ^i(v)). We do this in parallel using
c ← parallel_gather(a, b), which sets c(i) ← b(a(i)) for 1 ≤ i ≤ |a| = |c|.
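With Thrust, the update κ^{i+1}(v) ← π^i(κ^i(v)) is a single gather; a sketch in our notation:

#include <thrust/device_vector.h>
#include <thrust/gather.h>

// kappaNext(v) = pi(kappa(v)) for all v in parallel: kappa acts as the gather
// map into pi, which is exactly the parallel_gather(kappa, pi) call of the text.
void updateClustering(const thrust::device_vector<int>& kappa,
                      const thrust::device_vector<int>& pi,
                      thrust::device_vector<int>& kappaNext) {
    thrust::gather(kappa.begin(), kappa.end(), pi.begin(), kappaNext.begin());
}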
Note that unlike [17, 22], we do not employ a local refinement strategy such as
Kernighan–Lin [13] to improve the quality of the obtained clustering from Algo-
rithm 1, because such an algorithm does not lend itself well to parallelisation. This
is primarily caused by the fact that exchanging a single vertex between two clusters
changes the total weight of both clusters, leading to a change in the modularity gain
of all vertices in both clusters. A parallel implementation of the Kernighan–Lin
algorithm for clustering is therefore even more difficult than for graph partitioning
[9, 12], where exchanging vertices only affects the vertex’s neighbours. Remedying
this is an interesting avenue for further research.
To improve the performance of Algorithm 1 further, we make use of two addi-
tional observations. We found during our clustering experiments that the modular-
ity would first increase as the coarsening progressed and then would decrease after
a peak value was obtained, as is also visible in [16, Figures 6 and 9]. Hence, we
stop Algorithm 1 once the current modularity drops below 95% of the highest
modularity encountered thus far (the margin permits small fluctuations).
The second optimisation makes use of the fact that we do not perform un-
coarsening steps in Algorithm 1 (although with the data generated by Algorithm 2
this is certainly possible), which makes it unnecessary to store the entire hierarchy
G^0, G^1, G^2, . . . in memory. Therefore, we only store two graphs, G^0 and G^1, and
coarsen G^0 to G^1 as before, but then we coarsen G^1 back into G^0 instead of into a
new graph G^2, alternating between G^0 and G^1 as we coarsen the graph further.
5. Results
Algorithm 1 was implemented using NVIDIA’s Compute Unified Device Ar-
chitecture (CUDA) language together with the Thrust template library [11] on
the GPU and using Intel’s Threading Building Blocks (TBB) library on the CPU.
The experiments were performed on a computer equipped with two quad-core 2.4
GHz Intel Xeon E5620 processors with hyperthreading (we use 16 threads), 24 GiB
RAM, and an NVIDIA Tesla C2075 with 5375 MiB global memory. All source
code for the algorithms, together with the scripts required to generate the benchmark
data, has been released under the GNU General Public Licence and is freely
available from https://github.com/BasFaggingerAuer/Multicore-Clustering.
It is important to note that the clustering times listed in Tables 1 and 2 and in
Figure 3 do include data transfer times from CPU to GPU, but not data transfer from
hard disk to CPU memory. On average, 5.5% of the total running time is spent on
CPU–GPU data transfer. The recorded time and modularity are averaged over 16
runs, because of the use of random numbers in the matching algorithm [8]. These
are generated using the TEA-4 algorithm [21] to improve performance.
The modularities of the clusterings generated by the CPU implementation are
generally a little higher (e.g. for eu-2005) than those generated by the GPU. The
difference between the two implementations is caused by the matching stage of Algorithm
coarsen the graph as much as possible, even if including some edges {u, v} ∈
E for which 2 Ω ω({u, v}) − ζ(u) ζ(v) < 0 will decrease the modularity. This
yields a fast algorithm, but has an adverse effect on the obtained modularity.
For the CPU implementation, we only include edges {u, v} ∈ E which satisfy
2 Ω ω({u, v}) − ζ(u) ζ(v) ≥ 0 in the matching, such that the modularity can only be
increased by each matching stage. This yields higher modularity clusterings, but
will slow down the algorithm if only a few modularity-increasing edges are available
(if there are none, we perform a single matching round where we consider all edges).
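The edge-acceptance test that separates the two variants follows directly from (2.2); a minimal C++ sketch (our naming):

#include <vector>

// By (2.2), merging the endpoints of an edge {u,v} does not decrease the
// modularity precisely when 2*Omega*omega(u,v) - zeta(u)*zeta(v) >= 0.
// The GPU variant matches any edge; the CPU variant only matches edges
// passing this test.
bool increasesModularity(double Omega, double omega_uv,
                         const std::vector<double>& zeta, int u, int v) {
    return 2.0 * Omega * omega_uv - zeta[u] * zeta[v] >= 0.0;
}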
Comparing Table 1 with modularities from [17, Table 1] for karate (0.412),
jazz (0.444), email (0.572), and PGPgiantcompo (0.880), we see that Algorithm
1 generates clusterings of lesser modularity. We attribute this to the absence of a
local refinement strategy in Algorithm 1, as noted in Section 4.1. The modularity
of the clusterings of irregular graphs from the kronecker/ categories is an order of
magnitude smaller than those of graphs from other categories. We are uncertain
about what causes this behaviour.
Algorithm 1 is fast: for the road_central graph with 14 million vertices and 17
million edges, the GPU generates a clustering with modularity 0.996 in 4.6 seconds,
while for uk-2002, with 19 million vertices and 262 million edges, the CPU generates
a clustering with modularity 0.974 in 30 seconds.
In particular, for the clustering of nearly regular graphs (i.e. graphs where the
ratio max_{v∈V} deg(v) / min_{v∈V} deg(v) is small) such as street networks, the
high bandwidth of the GPU enables us to find high-quality clusterings in very
little time (Table 2). Furthermore, Figure 3(a)
suggests that in practice, Algorithm 1 scales linearly with the number of edges of
the graph, while Figure 3(b) shows that the parallel performance of the algorithm
scales reasonably with the number of available cores, increasingly so as the size
of the graph increases. Note that with dual quad-core processors, we have eight
physical cores available, which explains the smaller increase in performance when
the number of threads is extended beyond eight via hyperthreading.
Figure 3. (a) Clustering time of Algorithm 1 plotted against the number of
graph edges |E| (both axes logarithmic, |E| ranging from 10 to 10^9).
(b) Parallel scaling of Algorithm 1 with the number of CPU threads (1 to 16),
with one curve per graph size from 2^15 to 2^24 edges and a linear-speedup
reference.
From Figure 3(a), we see that while the GPU performs well for large (|E| ≥ 10^6),
nearly regular graphs, the CPU handles small and irregular graphs better. This
can be explained by the GPU setup time that becomes dominant for small graphs,
and by the fact that for large irregular graphs, vertices with a higher-than-average
degree keep one of the threads occupied, while the threads treating the other, low-
degree, vertices are already done, leading to a low GPU occupancy (i.e. a situation
where only a single one of the 32 threads in a warp is still doing actual work). On the CPU,
varying vertex degrees are a much smaller problem because threads are not launched
in warps: they can immediately start working on a new vertex, without having to
wait for other threads to finish. This results in better performance for the CPU on
irregular graphs.
The most costly per-vertex operation is compress_neighbours, used during
coarsening. We therefore expect the GPU to spend more time, for irregular graphs,
on coarsening than on matching. For the regular graph asia (GPU 3.4× faster),
the GPU (CPU) spends 68% (52%) of the total time on matching and 16% (41%) on
coarsening. For the irregular graph eu-2005 (CPU 4.7× faster), the GPU (CPU)
spends 29% (39%) on matching and 70% (57%) on coarsening, so coarsening indeed
becomes the bottleneck for the GPU when the graph is irregular.
The effectiveness of merging unmatched satellites can also be illustrated using
these graphs: for asia the number of coarsenings performed in Algorithm 1 is
reduced from 47 to 37 (1.1× speedup), while for eu-2005 it is reduced from 10,343
to 25 (55× speedup), with similar modularities. This explains the good speedup of
our algorithm over [18] in Table 3 for eu-2005, while we do not obtain a speedup
for belgium.

Table 1. For graphs G = (V, E), this table lists the average modularities
mod_{1,2}, (1.2), of clusterings of G generated in an average time of t_{1,2}
seconds by the CUDA (subscript 1) and TBB (subscript 2) implementations of
Algorithm 1. The '%_1' column indicates the percentage of time spent on
CPU–GPU data transfer. Results are averaged over 16 runs. A '-' indicates
that the test system ran out of memory in one of the runs. This table lists
graphs from the clustering/ category of the 10th DIMACS challenge [1].
In the remainder of this section, we will compare our method to the existing
clustering heuristic developed by Riedy et al. [18]. We use the same global greedy
hard because of the different test systems that have been used: we (t_1 and t_2) used
two quad-core 2.4 GHz Intel Xeon E5620 processors with a Tesla C2050, while the
algorithm from [18] used four ten-core 2.4 GHz Intel Xeon E7-8870 processors (t_O)
and a Cray XMT2 (t_X).
6. Conclusion
In this paper we have presented a fine-grained shared-memory parallel algo-
rithm for graph coarsening, Algorithm 2, suitable for both multi-core CPUs and
GPUs. Through a greedy agglomerative clustering heuristic, Algorithm 1, we try
to find graph clusterings of high modularity to measure the performance of this
coarsening method. Our parallel clustering algorithm scales well for large graphs
if the number of threads is increased, Figure 3(b), and can generate clusterings of
reasonable quality in very little time, requiring 4.6 seconds to generate a modularity
0.996 clustering of a graph with 14 million vertices and 17 million edges.
An interesting direction for future research would be the development of a
local refinement method for clustering that scales well with the number of available
processing cores, and can be implemented efficiently on GPUs. This would greatly
benefit the quality of the generated clusterings.
7. Appendix
7.1. Reformulating modularity. Our first observation is that for every clus-
ter C ∈ C, by (1.1):
(7.1) ζ(C) = 2 ω(int(C)) + ω(ext(C)).
As
\[
\mathrm{ext}(C) = \{\{u,v\} \in E \mid u \in C,\ v \notin C\} = \bigcup_{\substack{C' \in \mathcal{C} \\ C' \neq C}} \mathrm{cut}(C, C'),
\]
every external edge of C is cut between C and exactly one other cluster. Combining
(2.1) with (7.1) to eliminate ω(int(C)) yields
\[
(7.2)\qquad
4\,\Omega^2\,\mathrm{mod}(\mathcal{C}) = \sum_{C \in \mathcal{C}} \bigl( \zeta(C)\,(2\,\Omega - \zeta(C)) - 2\,\Omega\,\omega(\mathrm{ext}(C)) \bigr).
\]
Now let C, C′ ∈ C, and let C′′ := C ∪ C′ be the cluster obtained by
merging C and C′.
Then ζ(C′′) = ζ(C) + ζ(C′) by (2.3). Furthermore, as cut(C, C′) = ext(C) ∩
ext(C′), we have that
\[
(7.3)\qquad
\omega(\mathrm{ext}(C'')) = \omega(\mathrm{ext}(C)) + \omega(\mathrm{ext}(C')) - 2\,\omega(\mathrm{cut}(C, C')).
\]
Using this, together with (7.2), we find that
\begin{align*}
4\,\Omega^2 (\mathrm{mod}(\mathcal{C}') - \mathrm{mod}(\mathcal{C}))
&= -\zeta(C)\,(2\,\Omega - \zeta(C)) + 2\,\Omega\,\omega(\mathrm{ext}(C)) \\
&\quad - \zeta(C')\,(2\,\Omega - \zeta(C')) + 2\,\Omega\,\omega(\mathrm{ext}(C')) \\
&\quad + \zeta(C'')\,(2\,\Omega - \zeta(C'')) - 2\,\Omega\,\omega(\mathrm{ext}(C'')) \\
&\overset{(7.3)}{=} -\zeta(C)\,(2\,\Omega - \zeta(C)) + 2\,\Omega\,\omega(\mathrm{ext}(C)) \\
&\quad - \zeta(C')\,(2\,\Omega - \zeta(C')) + 2\,\Omega\,\omega(\mathrm{ext}(C')) \\
&\quad + (\zeta(C) + \zeta(C'))\,(2\,\Omega - (\zeta(C) + \zeta(C'))) \\
&\quad - 2\,\Omega\,\bigl( \omega(\mathrm{ext}(C)) + \omega(\mathrm{ext}(C')) - 2\,\omega(\mathrm{cut}(C, C')) \bigr) \\
&= 4\,\Omega\,\omega(\mathrm{cut}(C, C')) - 2\,\zeta(C)\,\zeta(C').
\end{align*}
So merging clusters C and C′ of C to obtain a clustering C′ leads to a change
in modularity given by (2.2).
From (1.2),
\[
\mathrm{mod}(\mathcal{C}) \le
\frac{\displaystyle\sum_{C \in \mathcal{C}} \sum_{\substack{\{u,v\} \in E \\ u,v \in C}} \omega(\{u,v\})}{\displaystyle\sum_{e \in E} \omega(e)} - 0
\le \frac{\displaystyle\sum_{\substack{\{u,v\} \in E \\ u,v \in V}} \omega(\{u,v\})}{\displaystyle\sum_{e \in E} \omega(e)} = 1,
\]
which shows one of the inequalities. For the other inequality, note that for every
C ∈ C we have 0 ≤ ω(int(C)) ≤ Ω − ω(ext(C)), and therefore
\begin{align*}
\mathrm{mod}(\mathcal{C})
&= \frac{1}{4\,\Omega^2} \sum_{C \in \mathcal{C}} \bigl( 4\,\Omega\,\omega(\mathrm{int}(C)) - \zeta(C)^2 \bigr) \\
&\overset{(7.1)}{=} \frac{1}{4\,\Omega^2} \sum_{C \in \mathcal{C}} \bigl( 4\,\Omega\,\omega(\mathrm{int}(C)) - 4\,\omega(\mathrm{int}(C))^2
   - 4\,\omega(\mathrm{int}(C))\,\omega(\mathrm{ext}(C)) - \omega(\mathrm{ext}(C))^2 \bigr) \\
&= \frac{1}{4\,\Omega^2} \sum_{C \in \mathcal{C}} \Bigl( 4\,\omega(\mathrm{int}(C)) \bigl[ \Omega - \omega(\mathrm{ext}(C)) - \omega(\mathrm{int}(C)) \bigr]
   - \omega(\mathrm{ext}(C))^2 \Bigr) \\
&\ge \frac{1}{4\,\Omega^2} \sum_{C \in \mathcal{C}} \bigl( 0 - \omega(\mathrm{ext}(C))^2 \bigr)
 = - \sum_{C \in \mathcal{C}} \left( \frac{\omega(\mathrm{ext}(C))}{2\,\Omega} \right)^{2}.
\end{align*}
Enumerate C = {C_1, . . . , C_k} and define x_i := ω(ext(C_i))/(2 Ω) for 1 ≤ i ≤ k to obtain
a vector x ∈ R^k. Note that 0 ≤ x_i ≤ 1/2 (as 0 ≤ ω(ext(C_i)) ≤ Ω) for 1 ≤
i ≤ k, and because every external edge connects precisely two clusters, we have
Σ_{i=1}^{k} ω(ext(C_i)) ≤ 2 Ω, so Σ_{i=1}^{k} x_i ≤ 1. By the above, we know that
\[
\mathrm{mod}(\mathcal{C}) \ge -\|x\|_2^2,
\]
hence we need to find an upper bound on ‖x‖_2^2 for x ∈ [0, 1/2]^k satisfying Σ_{i=1}^{k} x_i ≤
1. For all k ≥ 2, this upper bound equals ‖(1/2, 1/2, 0, . . . , 0)‖_2^2 = 1/2: indeed,
‖x‖_2^2 = Σ_{i=1}^{k} x_i^2 ≤ (max_i x_i) Σ_{i=1}^{k} x_i ≤ 1/2. Hence mod(C) ≥ −1/2.
The proof is completed by noting that for a single cluster, mod({V}) = 0 ≥ −1/2.
Acknowledgements
We would like to thank Fredrik Manne for his insights into parallel matching and
coarsening, and the Little Green Machine project, http://littlegreenmachine.org/,
for permitting us to use their hardware under project NWO-M 612.071.305.
References
[1] D. A. Bader, P. Sanders, D. Wagner, H. Meyerhenke, B. Hendrickson, D. S. Johnson, C. Wal-
shaw, and T. G. Mattson, 10th DIMACS implementation challenge - graph partitioning and
graph clustering, 2012. http://www.cc.gatech.edu/dimacs10
[2] H. Bisgin, N. Agarwal, and X. Xu, Does similarity breed connection? An investigation in
Blogcatalog and Last.fm communities, Proc. of SocialCom/PASSAT '10, 2010, pp. 570–575.
DOI 10.1109/SocialCom.2010.90.
[3] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, and D. Wagner,
On modularity clustering, IEEE Trans. Knowledge and Data Engineering 20 (2008), no. 2,
172–188. DOI 10.1109/TKDE.2007.190689.
[4] T. Bui and C. Jones, A heuristic for reducing fill-in in sparse matrix factorization, Proc.
Sixth SIAM Conference on Parallel Processing for Scientific Computing (Philadelphia, PA,
USA), SIAM, 1993, pp. 445–452.
[5] A. Clauset, M. E. J. Newman, and C. Moore, Finding community structure in very large
networks, Phys. Rev. E 70 (2004), 066111. DOI 10.1103/PhysRevE.70.066111.
[6] T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Trans.
Math. Software 38 (2011), no. 1, Art. 1, 25pp, DOI 10.1145/2049662.2049663. MR2865011
(2012k:65051)
[7] I. S. Duff and J. K. Reid, Exploiting zeros on the diagonal in the direct solution of indefinite
sparse symmetric linear systems, ACM Trans. Math. Software 22 (1996), no. 2, 227–257,
DOI 10.1145/229473.229480. MR1408491 (97c:65085)
[8] B. O. Fagginger Auer and R. H. Bisseling, A GPU algorithm for greedy graph matching, Proc.
FMC II, LNCS, vol. 7174, Springer Berlin / Heidelberg, 2012, pp. 108–119.
DOI 10.1007/978-3-642-30397-5_10.
[9] B. Hendrickson and R. Leland, A multilevel algorithm for partitioning graphs, Proc. Super-
computing ’95 (New York, NY, USA), ACM, 1995. DOI 10.1145/224170.224228.
[10] B. Hendrickson and E. Rothberg, Improving the run time and quality of nested dissection
ordering, SIAM J. Sci. Comput. 20 (1998), no. 2, 468–489, DOI 10.1137/S1064827596300656.
MR1642639 (99d:65142)
[11] J. Hoberock and N. Bell, Thrust: A parallel template library, 2010, Version 1.3.0.
[12] G. Karypis and V. Kumar, Analysis of multilevel graph partitioning, Proc. Supercomputing
’95 (New York, NY, USA), ACM, 1995, p. 29. DOI 10.1145/224170.224229.
[13] B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell
System Technical Journal 49 (1970), 291–307.
[14] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, Statistical properties of commu-
nity structure in large social and information networks, Proc. WWW ’08 (New York, NY,
USA), ACM, 2008, pp. 695–704. DOI 10.1145/1367497.1367591.
[15] M. E. J. Newman, Fast algorithm for detecting community structure in networks, Phys. Rev.
E 69 (2004), 066133. DOI 10.1103/PhysRevE.69.066133.
[16] M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks,
Phys. Rev. E 69 (2004), 026113. DOI 10.1103/PhysRevE.69.026113.
[17] M. Ovelgönne, A. Geyer-Schulz, and M. Stein, Randomized greedy modularity optimization
for group detection in huge social networks, Proc. SNA-KDD ’10 (Washington, DC, USA),
ACM, 2010.
[18] E. J. Riedy, H. Meyerhenke, D. Ediger, and D. A. Bader, Parallel community detection for
massive graphs, Proc. PPAM11 (Torun, Poland), LNCS, vol. 7203, Springer, 2012, pp. 286–296.
DOI 10.1007/978-3-642-31464-3_29.
[19] S. E. Schaeffer, Graph clustering, Computer Science Review 1 (2007), no. 1, 27–64. DOI
10.1016/j.cosrev.2007.05.001.
[20] B. Vastenhouw and R. H. Bisseling, A two-dimensional data distribution method for parallel
sparse matrix-vector multiplication, SIAM Rev. 47 (2005), no. 1, 67–95 (electronic), DOI
10.1137/S0036144502409019. MR2149102 (2006a:65070)
[21] F. Zafar, M. Olano, and A. Curtis, GPU random numbers via the tiny encryption algorithm,
Proc. HPG10 (Saarbrucken, Germany), Eurographics Association, 2010, pp. 133–141.
[22] Z. Zhu, C. Wang, L. Ma, Y. Pan, and Z. Ding, Scalable community discovery of large networks,
Proc. WAIM ’08, 2008, pp. 381–388. DOI 10.1109/WAIM.2008.13.
Graph partitioning and graph clustering are ubiquitous subtasks in many applications where
graphs play an important role. Generally speaking, both techniques aim at the identification
of vertex subsets with many internal and few external edges. To name only a few, problems
addressed by graph partitioning and graph clustering algorithms are:
• What are the communities within an (online) social network?
• How do I speed up a numerical simulation by mapping it efficiently onto a parallel
computer?
• How must components be organized on a computer chip such that they can communicate
efficiently with each other?
• What are the segments of a digital image?
• Which functions are certain genes (most likely) responsible for?
The 10th DIMACS Implementation Challenge Workshop was devoted to determining
realistic performance of algorithms where worst case analysis is overly pessimistic and
probabilistic models are too unrealistic. Articles in the volume describe and analyze various
experimental data with the goal of getting insight into realistic algorithm performance in
situations where analysis fails.