
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 108C (2017) 1793–1802

International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland

Parallel Modularity Clustering

Alexandre Fender (1,2,3), Nahid Emad (2,3), Serge Petiton (2,4), and Maxim Naumov (1)

(1) Nvidia Corporation, USA
(2) Maison de la Simulation, France
(3) LI-PaRAD, University of Versailles, France
(4) University of Lille I, Sciences & Technologies, France
Abstract

In this paper we develop a parallel approach for computing the modularity clustering often used to identify and analyse communities in social networks. We show that modularity can be approximated by looking at the largest eigenpairs of the weighted graph adjacency matrix that has been perturbed by a rank one update. Also, we generalize this formulation to identify multiple clusters at once. We develop a fast parallel implementation for it that takes advantage of the Lanczos eigenvalue solver and k-means algorithm on the GPU. Finally, we highlight the performance and quality of our approach versus existing state-of-the-art techniques.

© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the International Conference on Computational Science

Keywords: modularity, assortativity coefficient, spectral, clustering, community detection, graphs, parallel algorithms, Lanczos, k-means, CUDA, GPU
1 Introduction

The graph clustering technique can often be used to identify and analyse communities in social networks, among many other applications. The graph is usually split into clusters based on a particular metric, such as infomap [6, 19], minimum balanced cut [12, 13] or modularity [5, 17]. In this paper we focus on the latter metric, which measures how well a given clustering applies to a particular graph versus a random graph [15, 16]. We note that modularity is an important metric and has been used in practice to study the epidemics of different diseases [15]. Also, we point out that this metric is closely related to the assortativity coefficient [14] as well as the algebraic connectivity of graphs [7, 8], with a comprehensive scientific literature review given in [17, 21].

As the main contributions of this paper, we extend the modularity theory by providing an algebraic formulation that works with weighted graphs and identifies multiple clusters at once. Also, we develop a novel parallel implementation of the algorithm on the GPU that outperforms earlier state-of-the-art approaches.

First, we develop an algebraic formulation and show how to set up an eigenvalue problem and how to use the k-means algorithm to transform the eigenvectors corresponding to its largest eigenvalues into a discrete assignment of nodes into clusters. We closely follow the framework developed for spectral clustering and partitioning in [13], which allows us to reuse many of its
1877-0509 © 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the International Conference on Computational Science
10.1016/j.procs.2017.05.198

features. We note that in our approach the number of fixed clusters p is arbitrary. It does not
need to be a power of two, i.e. p = 2^k, as when repeated bisection is used k times.
Also, we analyse the effects of using single and double precision to solve the problem. The latter is sometimes faster overall, because the lower precision can require more iterations to converge. Moreover, we outline how the proposed approach could be modified to find an adaptive number of clusters. In particular, we show that the clustering information could be derived from a smaller, the same, or a larger number of eigenvectors, with the first case often trading lower quality for higher performance.
Finally, in our experiments we compare the clustering obtained by the modularity approach
developed in this paper to previous work. We comment on the quality and performance tradeoffs
when they are applied to large social network graphs that often have power law-like distribution
of edges per node. Also, we highlight the performance obtained by our novel parallel approach
on the GPU. For example, it can find 7 clusters with a modularity score over 0.5 in about 0.8
seconds for the hollywood-2009 network, which has over a hundred million undirected edges.

2 Graph Clustering
Let a graph G = (V, E) be defined by its vertex V and edge E sets. The vertex set V = {1, ..., n}
represents n nodes in a graph, with each node identified by a unique integer number i ∈ V . The
edge set E = {wi1,j1 , ..., wim,jm } represents m weighted edges in a graph, with each undirected edge identified by its weight wi,j ∈ E, where wi,j ≥ 0.
Let the weighted adjacency matrix A = [ai,j ] of a graph G = (V, E) be defined through its elements ai,j = wi,j if there is an edge connecting i to j, and 0 otherwise. Notice that the matrix A is symmetric because the graph is assumed to be undirected and therefore wi,j ≡ wj,i . Also, assume that we do not include self-edges, i.e. diagonal elements, in the definition of the weighted adjacency matrix A.
In graph clustering we are often interested in finding a partitioning of the vertices V into disjoint sets Sk ⊆ V such that ∪_{k=1}^p Sk = V. Notice that we can equivalently express this partitioning as a function c(i) = k specifying the assignment of nodes i ∈ V into clusters k = 1, ..., p.
In the following discussion, let |.| denote the cardinality (number of elements) of a set and di denote the degree (number of edges) of the vertex i ∈ V. Also, let us define the volume of a node vi = Σ_{j=1}^n ai,j and the volume of a set of vertices

vol(V) = Σ_{i=1}^n vi = Σ_{i=1}^n Σ_{j=1}^n ai,j = 2ω    (1)

Notice that for unweighted graphs ai,j = 1 and therefore vi = di and 2ω = 2m.
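As a small illustration of these definitions (our own toy graph, not one from the paper), the following computes the node volumes and checks that vol(V) = 2ω = 2m for an unweighted graph:

```python
import numpy as np

# Hypothetical example graph: two triangles {0,1,2} and {3,4,5} joined by
# the single edge (2,3). Unweighted, undirected, no self-edges.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

v = A.sum(axis=1)       # node volumes v_i = sum_j a_ij (here equal to degrees d_i)
vol_V = v.sum()         # vol(V) = 2*omega, by equation (1)
m = int(A.sum() / 2)    # number of undirected edges

print(v)                # [2. 2. 3. 3. 2. 2.]
print(vol_V, 2 * m)     # 14.0 14  (unweighted graph: 2*omega == 2*m)
```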

3 Modularity
An intuitive way to identify structure in a graph is to assume that similar vertices are connected
by more edges in the current graph than if they were randomly connected. The modularity
measures the difference between how well vertices are assigned into clusters for the current
graph G = (V, E), when compared to a random graph R = (V, F ) [16, 17].
The reference random graph R = (V, F ) is constructed with the same set of vertices, but a different set of edges than the current graph. The set of edges F of the random graph is


constructed such that the number of edges |F | = |E| = m and degree di of each vertex is the
same, but the edges themselves are rewired randomly between vertices in V .
Notice that every broken edge generates two edge ends that are available for rewiring. Then, the weighted probability that a particular edge end connects to some edge end at node i is vi /2ω. Therefore, the probability that nodes i and j become connected during the rewiring is (vi vj )/2ω.
The modularity is the difference between existing edges and the probabilities of edges in
random graph across all nodes that belong to a given set of clusters.

Definition 1. Let G = (V, E) be a graph and c(i) be an assignment of nodes into clusters. Then, the modularity Q can be expressed as

Q = (1/2ω) Σ_{i=1}^n Σ_{j=1}^n (ai,j − vi vj /2ω) δc(i),c(j) , where δc(i),c(j) = 1 if c(i) = c(j) and 0 otherwise    (2)

The above definition can be reduced to the special case in [16, 17] if we choose to ignore the edge weights during rewiring or work with unweighted graphs, in which case vi vj /2ω = di dj /2m.

Lemma 1. The modularity Q is bounded, as shown in [5]:

−1/2 ≤ Q ≤ 1    (3)
Let us now define the modularity matrix, state its properties and show its relationship to
modularity metric.

Definition 2. Let the volume vector vT = [v1 , ..., vn ]. Then the modularity matrix can be written as

B = A − (1/2ω) vvT    (4)
Lemma 2. The modularity and adjacency matrices have the following properties

Be = 0, Ae = v, vT e = eT Ae = 2ω (5)

where e = [1, ..., 1]T .


Proof. The latter two follow from (1). The former follows from Be = Ae − (vT e/2ω)v = v − v = 0.

Notice that the modularity matrix B is symmetric indefinite. Also, using Lemma 2 we may
conclude that it is singular, with an eigenvalue 0 and corresponding eigenvector e = [1, ..., 1]T .
Let us now define a tall matrix U = [ui,k ], that can be interpreted as a set of vectors
U = [u1 , ..., up ] where each vector uk corresponds to a cluster Sk for k = 1, ..., p, with elements
ui,k = 1 if c(i) = k and 0 otherwise.

Theorem 1. Let the matrix U = [u1 , ..., up ] be specified as above. Then,

Q = (1/2ω) Tr(UT BU)    (6)

Proof. Notice that

Q = (1/2ω) Σ_{k=1}^p Σ_{i : c(i)=k} Σ_{j : c(j)=k} (ai,j − vi vj /2ω) = (1/2ω) Σ_{k=1}^p ukT B uk = (1/2ω) Tr(UT BU)    (7)

with elements of U constrained to be in the set C = {0, 1}.
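The identities above can be checked numerically on a toy graph of our own (two triangles joined by one edge): Be = 0 from Lemma 2, and the trace formula (6) agreeing with the double sum in Definition 1:

```python
import numpy as np

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by edge (2,3).
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
n = A.shape[0]
v = A.sum(axis=1)
two_omega = v.sum()
B = A - np.outer(v, v) / two_omega                   # modularity matrix (4)

e = np.ones(n)
print(np.allclose(B @ e, 0), np.allclose(A @ e, v))  # Lemma 2: True True

# Assign the two triangles to two clusters: c(i) = k.
c = np.array([0, 0, 0, 1, 1, 1])
U = np.zeros((n, 2))
U[np.arange(n), c] = 1.0                             # u_ik = 1 iff c(i) = k

# Theorem 1: Q = Tr(U^T B U) / (2*omega) ...
Q_trace = np.trace(U.T @ B @ U) / two_omega
# ... equals the double sum in Definition 1.
delta = (c[:, None] == c[None, :])
Q_sum = (B * delta).sum() / two_omega
print(round(Q_trace, 6), np.isclose(Q_trace, Q_sum))  # 0.357143 True
```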

Notice that ultimately we are interested in finding the cluster assignment c that achieves the maximum modularity

max_c Q = (1/2ω) max_{U∈C} Tr(UT BU)    (8)
The exact solution to the modularity maximization problem stated in (8) is NP-complete
[5]. However, we can find an approximation by relaxing the requirement that elements of matrix
U take discrete values [12, 13].
Notice that UT U = D, where D = [dk,k ] is a p×p diagonal matrix with elements dk,k = |Sk |. Then, introducing the auxiliary matrix Ũ = U D−1/2 ∈ Rn×p , we can start by looking for

max_{ŨT Ũ = I} Tr(ŨT B Ũ)    (9)

Notice that by the Courant-Fischer theorem [9] this maximum is achieved by the largest
eigenpairs of the modularity matrix. Now, we still need to convert the real values obtained in
(9) back into the discrete assignment into clusters.
Since we are working in multiple dimensions, it is natural to use the distance between points
as a metric of how to group them. In this case, if we interpret each row of the matrix U as a
point in a p-dimensional space then it becomes natural to use a clustering algorithm, such as
k-means [1, 11] to identify the p distinct partitions. We are not aware of a theoretical result
guaranteeing that the obtained approximate solution will closely match the optimal discrete
solution, but in practice we often do obtain a good approximation.

4 Algorithm
The outline of the modularity clustering technique is described in Alg. 1.

Algorithm 1 Modularity Clustering


1: Let G = (V, E) be an input graph and A be its weighted adjacency matrix.
2: Let p be the number of desired clusters.
3: Set the modularity matrix B = A − (1/2ω) vvT .
4: Find p largest eigenpairs BU = U Σ, where Σ = diag(λ1 , ..., λp ).
5: Scale eigenvectors U by row or by column (optional).
6: Run clustering algorithm, such as k-means, on points defined by rows of U .
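A minimal CPU sketch of Alg. 1 in NumPy for small dense graphs is given below; it substitutes numpy.linalg.eigh for the Lanczos solver and a tiny deterministic Lloyd-style k-means for the paper's GPU k-means, so it illustrates the structure of the algorithm rather than the actual implementation. The toy graph (two triangles joined by one edge) is our own example.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Tiny deterministic k-means: farthest-point init + Lloyd updates."""
    C = [X[0]]
    for _ in range(k - 1):                    # next centroid = farthest point
        d = np.min([np.linalg.norm(X - c, axis=1) for c in C], axis=0)
        C.append(X[int(np.argmax(d))])
    C = np.array(C)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels

def modularity_clustering(A, p):
    """Steps 3-6 of Alg. 1 (dense illustration)."""
    v = A.sum(axis=1)
    two_omega = v.sum()
    B = A - np.outer(v, v) / two_omega        # step 3: modularity matrix
    _, W = np.linalg.eigh(B)                  # step 4: eigh sorts ascending,
    U = W[:, -p:]                             # so last p columns = largest eigenpairs
    return kmeans(U, p)                       # step 6: cluster the rows of U

def modularity(A, labels):
    v = A.sum(axis=1)
    two_omega = v.sum()
    delta = labels[:, None] == labels[None, :]
    return ((A - np.outer(v, v) / two_omega) * delta).sum() / two_omega

# Example: two triangles {0,1,2} and {3,4,5} joined by edge (2,3).
A = np.array([[0,1,1,0,0,0], [1,0,1,0,0,0], [1,1,0,1,0,0],
              [0,0,1,0,1,1], [0,0,0,1,0,1], [0,0,0,1,1,0]], dtype=float)
labels = modularity_clustering(A, 2)
print(labels, round(modularity(A, labels), 3))   # the triangles split; Q = 5/14 ≈ 0.357
```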

Notice that the general outline of the modularity clustering closely resembles the spectral
clustering in [13]. The main difference is that in the former case we use the modularity matrix
B and find its largest eigenpairs, while in the latter case we use the Laplacian matrix L and find
its smallest eigenpairs. The properties of modularity and Laplacian matrices are also different,
requiring a different choice of eigenvalue problem solvers.


4.1 Eigenvalue Problem


In our numerical experiments we have found that the eigenvalue solver is the most critical part
of the algorithm because the accuracy of the solution of the eigenvalue problem has a significant
impact on the quality of the obtained clustering. Insufficient accuracy usually results in a poor approximation to the original discrete problem, while an outright failure of the method causes the entire algorithm to fail.
The eigenvalue solver is also the most time-consuming part of the computation, as shown
in Fig. 1a. Therefore, we chose to use the implicitly restarted Lanczos method [3, 20], which
is one of the most efficient eigenvalue solvers for finding the largest eigenvalues of symmetric
problems. Notice that the most time consuming part of Lanczos is the sparse matrix-vector
multiplication (csrmv) with remaining time consumed by BLAS operations [23], as shown in
Fig. 1b.

Figure 1: Profiles. (a) Profiling of the modularity algorithm; (b) profiling of the Lanczos eigensolver.

4.2 Clustering Problem


Let us now find an approximation to the discrete problem (8) based on the solution of the real-valued optimization problem (9). Let us interpret each row of U as a point xi in p-dimensional space. Then, we can find sets Sk for k = 1, ..., p, each with a centroid (point in the center) yk , such that

min_{Sk} Σ_{k=1}^p Σ_{i∈Sk} ||xi − yk ||₂²

The exact solution of this problem is NP-complete, but we can find an approximation to it using many variations of the k-means clustering algorithm [1, 11], which will define the assignment of nodes into clusters.
Notice that the number of partitions identified by the clustering algorithm does not neces-
sarily need to match the number of computed eigenvectors. In fact it can be chosen adaptively
based on modularity score [22] or x-means algorithm [18].
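The alternating structure of k-means can be sketched as follows: assign each point xi to its nearest centroid, then move each centroid yk to the mean of its set. Both steps are non-increasing in the objective, so Lloyd's algorithm descends monotonically; the random data below is an arbitrary example of ours.

```python
import numpy as np

def lloyd(X, k, iters=10):
    """Lloyd's k-means; returns labels and the objective after each sweep."""
    Y = X[:k].copy()                                   # naive deterministic init
    history = []
    for _ in range(iters):
        d = ((X[:, None, :] - Y[None]) ** 2).sum(-1)   # squared distances
        labels = d.argmin(axis=1)                      # assignment step
        for j in range(k):                             # update step: centroid =
            if np.any(labels == j):                    # mean of assigned points
                Y[j] = X[labels == j].mean(axis=0)
        obj = sum(((X[labels == j] - Y[j]) ** 2).sum() for j in range(k))
        history.append(obj)
    return labels, history

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
labels, history = lloyd(X, 3)
print(all(a >= b - 1e-12 for a, b in zip(history, history[1:])))  # True: monotone descent
```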

4.3 Parallelism and Energy Efficiency


The performance and scalability are often the primary concerns for a graph clustering algorithm
because the analysis of the graph structure is usually on the critical path of complex data
analytics. The modularity approach we propose is limited by memory bandwidth, which is
higher on the GPU than on the CPU. Therefore, taking advantage of the parallelism available
on the GPU is critical for the successful application of the algorithm in practice.
In our implementation all building blocks in Alg. 1, including Lanczos and k-means, are
implemented on the GPU. Also, all data structures, including the adjacency matrix A, are
stored in the GPU memory. The action of the matrix B on a vector is computed implicitly
using sparse matrix-vector multiplication (csrmv) with A and rank-one update with v.
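This implicit action of B can be sketched on the CPU with SciPy, standing in for the paper's GPU csrmv; the small graph below is our own example, and eigsh (ARPACK's implicitly restarted Lanczos) plays the role of the eigensolver.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, eigsh

# Hypothetical example graph: two triangles joined by edge (2,3).
A_dense = np.array([[0,1,1,0,0,0], [1,0,1,0,0,0], [1,1,0,1,0,0],
                    [0,0,1,0,1,1], [0,0,0,1,0,1], [0,0,0,1,1,0]], dtype=float)
A = csr_matrix(A_dense)
v = np.asarray(A.sum(axis=1)).ravel()
two_omega = v.sum()

# y = B x computed without forming B = A - v v^T / (2*omega) explicitly:
# one sparse mat-vec (csrmv) plus a rank-one correction.
def matvec(x):
    return A @ x - (v @ x / two_omega) * v

B = LinearOperator(shape=A.shape, matvec=matvec, dtype=np.float64)
lam, U = eigsh(B, k=2, which='LA')        # two largest (algebraic) eigenpairs

# Same result as the explicitly formed dense modularity matrix:
B_dense = A_dense - np.outer(v, v) / two_omega
print(np.allclose(sorted(lam), sorted(np.linalg.eigvalsh(B_dense)[-2:])))  # True
```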
We can expect that using GPU would also be advantageous from the energy efficiency
perspective. While we do not measure the energy consumed by the algorithm directly, in broad


terms we can relate it to the difference in TDP¹ and time consumed by the algorithm on
different hardware platforms. For instance, in the next section we will perform experiments on
Nvidia Titan X (Pascal) GPU and Intel Core i7-3930K CPU with 250 and 130 Watts TDP,
respectively. Also, we will show that our algorithm on the GPU outperforms the state-of-the-art
implementation on the CPU by ∼ 3× on average. Since the ratio between the speedup and
ratio of TDP (250/130 ∼ 2×) on these platforms is 3/2 > 1, we can in general expect to achieve
a better power efficiency on the GPU.

5 Numerical Experiments
Let us now study the performance and quality of the clustering obtained by the proposed
modularity algorithm Alg. 1 on a relevant sample of graphs from the DIMACS10, LAW and SNAP
graph collections [24], shown in Tab. 1.
In the modularity algorithm, we let the stopping criterion for the Lanczos solver be based on the norm of the residual of the largest eigenpair ||r1 ||2 = ||Bu1 − λ1 u1 ||2 ≤ 10−3 and a maximum of 800 iterations (with a restart every 20 iterations), while for the k-means we let it be based on the scaled error difference |ℓt − ℓt−1 |/n < 10−2 and a maximum of 20 iterations.
Also, all numerical experiments are performed on a workstation with Ubuntu 14.04 operating
system, gcc 4.8.4 compiler, CUDA Toolkit 8.0 software and Intel Core i7-3930K CPU 3.2 GHz
and Nvidia Titan X (Pascal) GPU hardware. The performance of the algorithms was always
measured across multiple runs to ensure consistency.

5.1 Clustering and Effects of Precision


First, let the number of clusters into which we would like to partition the graph be fixed. For
instance, suppose that we have decided to partition the graph into 7 clusters. We show the
corresponding time, number of iterations and modularity score in Tab. 1.

Name               n = |V|      m = |E|       Rand      | 64 bit: Mod  T     It  | 32 bit: Mod  T     It
preferentialA...     100,000      499,985     0.00006   |  0.147       82    92  |  0.108       80    140
caidaRouterLevel     192,244      609,066    -0.00008   |  0.397       74    92  |  0.233       59    141
coAuthorsDBLP        299,067      977,676    -0.00042   |  0.392       62    44  |  0.297       45    81
citationCiteseer     268,495    1,156,647    -0.00011   |  0.417       108   81  |  0.417       64    92
coPapersDBLP         540,486   15,245,729     0.00000   |  0.326       318   80  |  0.201       514   188
coPapersCiteseer     434,102   16,036,720    -0.00001   |  0.319       168   56  |  0.092       1206  681
as-Skitter         1,696,415   22,190,596    -0.00002   |  0.407       1104  104 |  0.223       2001  230
hollywood-2009     1,139,905  113,891,327    -0.00136   |  0.544       796   69  |  0.187       973   116

Table 1: The modularity (Mod), time (T) in milliseconds and # of iterations (It) achieved for 64 and 32 bit precision, when splitting the graph into 7 clusters. The column (Rand) contains the modularity score resulting from random cluster assignments.

First, notice that the modularity algorithm is robust and converged to the solution on all
networks of interest. Also, notice that the computed modularity score remained in the interval
[−0.5, 1] as predicted by the theory for all the problems. Moreover, the modularity score
¹ Thermal Design Power (TDP) measures the average power a processor dissipates when operating with all cores active. The real energy usage may be different and may change depending on the hardware generation.


computed by random assignment of nodes into clusters was approximately 0 for all the networks, as expected. It is an important baseline for comparing attained modularity scores.
Second, notice the difference in behaviour of the algorithm when the computation is per-
formed using single (32 bit) and double (64 bit) floating point arithmetic. In particular, notice
that the total time to the solution can be significantly better in 64 bit than in 32 bit precision as
shown in the time column of Tab. 1. Indeed, single precision can result in unwanted perturba-
tions during the computation of the Krylov subspace by the Lanczos eigenvalue solver. Those
perturbations can impact the number of iterations and the overall quality of the approximation.
Therefore, we have found that using 64 bit precision is a safer option.

5.2 Adaptive Clustering


In the previous experiments we have kept the number of eigenpairs and k-means clusters the
same as suggested by the theory developed in earlier sections. Next, we investigate what
happens when we decouple these parameters. On one hand, we would expect that by selecting
more k-means clusters than eigenpairs we would trade lower quality for higher performance. On
the other hand, we could interpret selecting fewer k-means clusters than eigenpairs as filtering
the noise in the data and perhaps obtaining a better solution.
In our experiments we have indeed found that for most networks it is possible to maximize
the modularity by varying the number of k-means clusters independently of the number of
eigenpairs, as shown in Fig. 2a. In these experiments we have computed 2 to 8 eigenpairs, and
afterwards we have continued to increase only the number of k-means clusters. Notice that the
plot demonstrates how the choice of the number of eigenpairs impacts the modularity score.
Based on the formulation of the modularity problem one can expect that the best case scenario
would be to compute a number of eigenpairs equal to the number of clusters. This is mostly,
but not always, true, with the exceptions often due to loss of orthogonality or low quality
approximation of the last computed eigenvectors.
We also see that it is possible to compute fewer eigenpairs for an equivalent clustering quality
as shown in Fig. 2a. Indeed, modularity values seem to follow a trend set by the number of
clusters and the selected number of eigenpairs is secondary. Hence, given a constant number
of eigenpairs it is possible to maximize modularity by only increasing the number of k-means
clusters.

Figure 2: Varying the # of clusters. (a) Comparing the impact of varying the # of clusters used for assignment for different # of computed eigenvectors; (b) the modularity achieved when changing the # of clusters for the citationCiteseer network in 64 bit precision.


The above experiments lead us to propose the following method for discovering an approx-
imation to the natural number of clusters, which has also been proposed for small networks in
[22]. We propose computing as many eigenpairs as clusters up to a fixed point, such as 7, and
afterwards continuing to increase the number of k-means clusters only, while keeping track of
modularity score, as shown in Fig. 2b. Since the plotted modularity score curve has a Gaussian shape, it is straightforward to detect that its maximum is at 17 clusters on the x-axis. A similar trend can be seen for several other networks in our experiments. Moreover, we also found that it is better to overestimate than to underestimate the number of clusters.
Also, notice in Fig. 2b that when we increase the number of clusters by 10×, from 2 to 20, the time to compute them only increases by about 20%, from 95 ms to 120 ms. The plotted time line has a very low slope with respect to the x-axis because the number of computed eigenpairs does not increase past 7 in this experiment. Hence, the time growth shown in the figure only reflects the additional time spent in the k-means step.
Using this technique we were able to detect the best clustering for all of the networks in
Tab. 1. The resulting number of clusters and the modularity score found by our method are
shown in Tab. 2.
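The procedure above can be sketched as follows (a minimal dense CPU illustration with a hypothetical toy graph and helper k-means, not the paper's GPU code): compute a fixed number of eigenpairs once, then sweep only the number of k-means clusters and keep the assignment with the best modularity score.

```python
import numpy as np

def kmeans(X, k, iters=50):
    C = [X[0]]                                  # deterministic farthest-point init
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in C], axis=0)
        C.append(X[int(np.argmax(d))])
    C = np.array(C)
    for _ in range(iters):                      # Lloyd updates
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels

def modularity(A, labels):
    v = A.sum(axis=1); two_omega = v.sum()
    delta = labels[:, None] == labels[None, :]
    return ((A - np.outer(v, v) / two_omega) * delta).sum() / two_omega

# Toy graph: two triangles joined by edge (2,3); natural cluster count is 2.
A = np.array([[0,1,1,0,0,0], [1,0,1,0,0,0], [1,1,0,1,0,0],
              [0,0,1,0,1,1], [0,0,0,1,0,1], [0,0,0,1,1,0]], dtype=float)
v = A.sum(axis=1)
B = A - np.outer(v, v) / v.sum()

p = 2                                           # eigenpairs computed once
_, W = np.linalg.eigh(B)
U = W[:, -p:]

# Sweep only the number of k-means clusters, tracking the modularity score.
scores = {k: modularity(A, kmeans(U, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))         # 2 0.357
```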
In general, our algorithm can very quickly compute modularity for many clusters with only
limited memory requirements. For example, we computed 53 clusters in half a second for
the coPapersCiteseer network with 16 million edges. Also, it takes only 0.8 seconds to find a
clustering with a modularity score over 0.5 for hollywood-2009, which has 1,139,905 vertices and 113,891,327 edges.

5.3 Related Work


The earlier work on modularity was based on an implementation of a greedy algorithm on the
GPU [2]. The performance of our approach versus it is plotted in Fig. 3a. Our implementation achieves speedups of up to 8× compared to previous results, even when compensating for a bandwidth difference of up to ∼3× due to the difference in hardware resources². Also, our approach allows us to handle networks with up to a hundred million edges in less than a second. However, as a tradeoff, in some large cases our approach obtains a lower modularity score; see Tab. 2a.
(a) Large data sets in [2]                       (b) Small data sets in [2]

Matrix             Clu   Mod    RMod             Matrix     Clu   Mod    RMod
preferentialAtt... 7     0.147  0.214            karate     3     0.390  0.363
caidaRouterLevel   11    0.397  0.768            dolphins   5     0.509  0.453
coAuthorsDBLP      7     0.392  0.748            lesmis     11    0.250  0.444
citationCiteseer   17    0.506  0.643            adjnoun    5     0.255  0.247
coPapersDBLP       73    0.540  0.640            polbooks   3     0.504  0.437
coPapersCiteseer   53    0.636  0.746            football   7     0.575  0.412

Table 2: Modularity (Mod) for a given # of clusters (Clu) vs. reference results (RMod) in [2]

On the other hand, for the small cases the modularity algorithm we propose often attains a
better modularity score than the reference results in [2]. In fact its score is better in 5 out of 6
² We do not have access to the corresponding code and are forced to make the comparisons with the results obtained on the Tesla C2075 GPU in [2]. Since in both algorithms the execution time is limited by memory bandwidth, we estimate a factor of ∼3× as the baseline performance difference between the Tesla C2075 with 144 GB/s and the Titan X with 337 GB/s bandwidth.


Figure 3: The speedup and relative quality when compared to the reference results. (a) Large data sets on GPU in [2]; (b) large data sets on CPU in [10].

cases considered in the study, as shown in Tab. 2b. Since we implement different algorithms for
computing modularity, it is not completely surprising that their behavior varies on different data
sets. Unfortunately, we could not identify any particular trends that would tell us when one
algorithm would be better than the other in terms of quality. However, we always outperform
the reference approach on large cases.
Another, more recent, work on modularity developed a hierarchical algorithm for computing it on the CPU [10]. We have experimented with this algorithm by computing 7 clusters in 64-bit precision and using all CPU cores available on the machine. The performance of our approach versus these results is plotted in Fig. 3b. Notice that on average our algorithm outperforms the hierarchical approach by about 3×, with similar quality tradeoffs.

6 Conclusion and Future Work


In this paper we extended modularity theory by developing an algebraic formulation that works with weighted graphs and identifies multiple clusters at once. We have also implemented a parallel variant of the technique that computes multiple eigenpairs with the implicitly restarted Lanczos algorithm and performs multidimensional k-means clustering on the obtained eigenvectors on the GPU.
This approach allowed us to achieve speedups of up to 8× over previous state-of-the-art results, even after compensating for a hardware bandwidth difference of up to 3×. It also allowed us to handle networks with up to a hundred million edges in less than a second.
In our experiments on real networks we have shown that modularity benefits from a more accurate and stable approximation of the eigenpairs, often requiring the use of 64-bit floating point arithmetic. We have also shown that, to compute l clusters for the original graph more quickly at the expense of lower quality, we may use l k-means centroids while computing only k ≪ l eigenvectors. Moreover, we have used this observation to develop a technique to detect and adaptively select the natural number of clusters in a graph.
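The adaptive selection can be sketched as follows: the k-dimensional embedding is computed once, k-means is rerun for each candidate number of centroids l, and the labeling with the highest modularity Q = Σ_c [W_c/(2m) - (D_c/(2m))²] is kept. The helper names below are hypothetical, not taken from the paper:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def modularity_score(A, labels):
    """Newman-Girvan modularity Q of a labeling for a symmetric
    (weighted) adjacency A: Q = sum over clusters c of
    [ W_c/(2m) - (D_c/(2m))^2 ], with W_c the intra-cluster edge
    weight (counting both directions) and D_c the cluster degree sum."""
    d = np.asarray(A.sum(axis=1)).ravel()
    two_m = d.sum()
    q = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        w_in = A[idx][:, idx].sum()   # intra-cluster weight, both directions
        q += w_in / two_m - (d[idx].sum() / two_m) ** 2
    return q

def select_num_clusters(A, vecs, candidates, seed=0):
    """Reuse one fixed k-dimensional embedding `vecs` and sweep the
    number of k-means centroids l, keeping the best-scoring labeling."""
    best_q, best_labels = float('-inf'), None
    for l in candidates:
        _, labels = kmeans2(vecs, l, minit='++', seed=seed)
        q = modularity_score(A, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels
```

Since the eigensolver is the expensive step, sweeping l this way costs only repeated k-means runs in a small k-dimensional space.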
In the future, we plan to investigate the impact of different graph features as well as eigensolver parameters on modularity computation.


7 Acknowledgements
The authors would like to acknowledge Steven Dalton, Joe Eaton, Alex Fit-Florea and Michael
Garland for their useful comments and suggestions.

References
[1] D. Arthur and S. Vassilvitskii, K-means++: The Advantages of Careful Seeding, Proc. 18th Annual
ACM-SIAM Symposium on Discrete algorithms, pp. 1027-1035, 2007.
[2] B. O. F. Auer, GPU Acceleration of Graph Matching, Clustering and Partitioning, Ph.D. Thesis,
Utrecht University, 2013.
[3] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst, Templates for the solution of
Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, PA, 2000.
[4] N. Bell and M. Garland, Implementing Sparse Matrix-Vector Multiplication on Throughput-
Oriented Processors, Proc. SC09, 2009.
[5] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski and D. Wagner, On
Modularity Clustering, IEEE Trans. Knowledge and Data Engineering, Vol. 20, pp. 172-188, 2008.
[6] W. M. Campbell, C. K. Dagli, and C. J. Weinstein, Social Network Analysis with Content and
Graphs, Lincoln Lab. Journal, Vol. 20, 2013.
[7] W. E. Donath and A. J. Hoffman, Lower Bounds for the Partitioning of Graphs, IBM Journal of
Research and Development, Vol. 17, pp. 420-425, 1973.
[8] M. Fiedler, Algebraic Connectivity of Graphs, Czechoslovak Mathematical Journal, Vol. 23, pp.
298-305, 1973.
[9] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, NY, 1999.
[10] D. LaSalle and G. Karypis, Multi-threaded Modularity Based Graph Clustering Using the Multilevel Paradigm, J. Parallel Distrib. Comput., Vol. 76, pp. 66-80, 2015.
[11] S. P. Lloyd, Least Squares Quantization in PCM, IEEE Trans. Information Theory, Vol. 28, pp. 129-137, 1982.
[12] U. von Luxburg, A Tutorial on Spectral Clustering, Technical Report No. TR-149, Max Planck
Institute, 2007.
[13] M. Naumov and T. Moon, Parallel Spectral Graph Partitioning, NVIDIA Technical Report, NVR-
2016-001, 2016.
[14] M. E. J. Newman, Assortative Mixing in Networks, Phys. Rev. Lett., Vol. 89, 208701, 2002.
[15] M. E. J. Newman, The Structure and Function of Complex Networks, SIAM Review, Vol. 45, pp.
167-256, 2003.
[16] M. E. J. Newman and M. Girvan, Finding and Evaluating Community Structure in Networks, Phys. Rev. E, Vol. 69, 026113, 2004.
[17] M. E. J. Newman, Networks: An Introduction, Oxford University Press, New York, NY, 2010.
[18] D. Pelleg and A. Moore, X-means: Extending K-means with Efficient Estimation of the Number
of Clusters, Proc. 17th Int. Conf. on Machine Learning, pp. 727-734, 2000.
[19] M. Rosvall and C. T. Bergstrom, Maps of Random Walks on Complex Networks Reveal Community
Structure, Proc. Natl. Acad. Sci. USA, Vol. 105, pp. 1118-1123, 2008.
[20] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, PA, 2nd Ed., 2003.
[21] M. Chen, K. Kuzmin and B. K. Szymanski, Community Detection via Maximization of Modularity and Its Variants, IEEE Trans. Computational Social Systems, Vol. 1, pp. 46-65, 2014.
[22] S. White and P. Smyth, A Spectral Approach to Finding Communities in Graphs, SIAM Conf.
Data Mining, 2005.
[23] Nvidia, CUDA Toolkit, http://developer.nvidia.com/cuda-downloads
[24] The University of Florida Sparse Matrix Collection, http://www.cise.ufl.edu/research/sparse/matrices
