A Modified Distance Dynamics Model for Improvement of Community Detection
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2018.DOI
ABSTRACT Community detection is a key technique for identifying the intrinsic community structures
of complex networks. The distance dynamics model has been proven effective in finding communities
with arbitrary size and shape and identifying outliers. However, to simulate distance dynamics, the model
requires manual parameter specification and is sensitive to the cohesion threshold parameter, which is
difficult to determine. Furthermore, it has difficulty handling rough outliers and ignores hubs (nodes that
bridge communities). In this paper, we propose a robust distance dynamics model, namely, Attractor++,
which uses a dynamic membership degree. In Attractor++, the dynamic membership degree is used to
determine the influence of exclusive neighbors on the distance instead of setting the cohesion threshold.
Additionally, because the original model is inefficient and inaccurate in handling outliers and identifying hubs, we
design an outlier optimization model that is based on triangle adjacency. By using optimization rules, a
postprocessing method further judges whether a singleton node should be merged into the same community
as its triangles or regarded as a hub or an outlier. Extensive experiments on both real-world and synthetic
networks demonstrate that our algorithm more accurately identifies nodes that have special roles (hubs and
outliers) and more effectively identifies community structures.
INDEX TERMS community detection, complex network, distance dynamics model, membership function.
VOLUME 4, 2016 1
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2018.2877235, IEEE Access
Tao Meng et al.: A Modified Distance Dynamics Model for Improvement of Community Detection
Lately, one of the most successful community detection methods, namely, Attractor, which is a distance
dynamics model, was proposed by Shao et al. [15]. Unlike the traditional algorithms [8], [12], [14], [16],
Attractor provides an intuitive way to analyze the community structure of a network. This model views the
entire graph as an adaptive global dynamical system and simulates the synchronization dynamics over time.
The process of the traditional distance dynamics model involves the following stages: First, each edge is
associated with an initial distance. Then, in a sequential process, each distance gradually shrinks or
stretches via interaction with its local topological structure. Finally, all distances converge to 0 or 1.
As a result, all communities and outliers are naturally obtained by removing the edges whose distances are
equal to 1. The traditional model has several attractive benefits, such as intuitive community detection,
small community detection, and anomaly detection. However, the traditional distance dynamics model has
several limitations.
• Extremely sensitive parameter settings. In the traditional distance dynamics model, the global cohesion
parameter, which is denoted as λ, is used to determine the positive or negative interaction influence on the
distances for exclusive neighbors. Typically, a lower value of λ yields larger communities whereas a higher
value of λ produces more communities. However, different networks have different local structures and may
require different parameter settings. Thus, it is difficult to find a proper value of the cohesion parameter
λ for a specified network. In some cases, minor changes to parameter λ may cause great differences in the
resulting community structure.
• Unreasonable influence from exclusive neighbors. During the local dynamic interaction process, the
structures of the communities are constantly changing as new distances converge. In the traditional distance
dynamics model, once the underlying influence of exclusive neighbors on the distances has been determined by
the cohesion parameter λ, the influence does not change during the entire dynamic interaction process. Yet an
exclusive neighbor that has a positive influence on the distance at time step 0 may have a negative
influence on the distance at time step t (by then, the exclusive neighbor may have moved far away from the
corresponding node).
• Poor quality of anomaly detection. In the process of synchronization dynamics, the traditional model easily
produces many rough outliers, especially in a large-scale, high-density, or noisy network. Many outliers that
are identified by the traditional distance dynamics model belong to a community in the ground truth of the
network. Consider the typical email-enron network as an example: The network consists of 1133 nodes and 5451
edges. By using the traditional distance dynamics model with parameter λ=0.5, we identify 359 classes,
namely, 300 outliers and 59 communities. With this model, many other real-world networks exhibit similar
behavior.
• Unable to identify the differences between outliers and hubs. In fact, some of the sparsely connected nodes
in a network may not be outliers but hubs. In addition to detecting communities and outliers, identifying
nodes with special roles, such as hubs, is a challenging task in determining the structure of a complex
network, as hubs play important roles in many real complex networks [12]. For instance, hubs in epidemiology
networks can be core nodes for spreading diseases; in collaboration networks, hubs can be core nodes for
sharing ideas.
To describe the hubs more clearly, let us consider a simple example; see Figure 1. By using the traditional
distance dynamics model, we identify 2 communities and 2 outliers. All nodes with the same color belong to
the same community and the two red nodes are the outliers. Our method, namely, Attractor++, identifies two
communities, namely, {1, 2, 3, 4, 5, 6} and {9, 10, 11, 12, 13, 14}, and identifies node 7 as an outlier and
node 8 as a hub.
FIGURE 1: Running example.
A robust distance dynamics model should be able to overcome the above limitations. We propose a robust
distance dynamics model for community detection. To overcome the parameter-sensitivity problem, the dynamic
membership degree is introduced to determine the influence of an exclusive neighbor on the distance.
Furthermore, the dynamic influences from exclusive neighbors can also be easily determined by our algorithm.
The membership degree is a dynamic function that is based on the characteristics of the communities during
the local dynamic interaction process in real time. To overcome the rough-outlier problem, an outlier
optimization rule is proposed for further judging whether an outlier should be merged into a community based
on the adjacent triangle. To overcome the unidentified-hub problem, another outlier optimization rule is
developed for further judging whether an outlier should be regarded as a hub based on the connected triangle.
We summarize the main contributions of this paper as follows.
• A robust distance dynamics model. Based on a dynamic membership degree, we propose a distance dynamics
model that has improved robustness. The dynamic membership degree replaces the traditional cohesion
parameter λ. It is a similarity index that measures the similarity between nodes and communities. The
experimental results demonstrate the effectiveness and
accuracy of finding communities via the robust distance dynamics model.
• An outlier optimization model. To further judge whether each outlier should be merged into the same
community as its triangles or classified as a hub, we design two outlier optimization rules that help
identify vertices that have special roles (i.e., hubs and outliers) and integrate them into the distance
dynamics model.
• Robust algorithm: Attractor++. A robust community detection algorithm, namely, Attractor++, is proposed. It
is based on a robust distance dynamics model and an outlier optimization model. Experimental results on
artificial and real-world networks demonstrate that Attractor++ is more robust and efficient in overcoming
the above limitations than the original algorithm.
The remainder of this paper is organized as follows: Related works are discussed in Section II. Section III
presents our robust model and corresponding community detection algorithm. An extensive experimental
evaluation is presented in Section IV. Finally, Section V presents the conclusions of this paper.
II. RELATED WORKS
Community detection has been studied for decades in many fields. Recently, scholars have proposed many
algorithms for detecting community structures in complex networks, particularly in computer science [2]. We
review related works under three headings: community detection algorithms, dynamic methods and the distance
dynamics model, and the dynamic membership degree.
Community detection algorithms. Currently, the most widely used and practical community detection algorithms
can be divided into four categories: graph-partitioning algorithms, modularity-based algorithms,
density-based algorithms and dynamics algorithms. Community detection was first modeled as a graph
partitioning problem. Hence, graph-partitioning algorithms [8], [9] are natural choices for community
detection. However, these algorithms rely on a prespecified number of communities k, which limits their
applicability to real-world networks. Since the community structures are highly complex, it is often
expensive or impossible to obtain the number of communities in many real-world networks. For modularity-based
algorithms, many researchers devoted their efforts to improving the effectiveness of community detection. One
typical method is to optimize the modularity measure [10], which is widely used to evaluate the community
structure of a network from a global perspective. Unfortunately, although modularity-based algorithms are
effective in many applications [10], [11], they are difficult to apply to large-scale networks due to their
high time complexity, and they suffer from the resolution limit problem, in which communities below a certain
size cannot be resolved. Due to the resolution limit of modularity-based algorithms, density-based algorithms
have attracted wide research interest [12], [17]. One of the most successful density-based algorithms,
namely, SCAN [12], not only detects meaningful communities but also identifies hubs and outliers. To overcome
the problems of parameter sensitivity and exhaustive similarity evaluations in SCAN, two parameter-free
methods, namely, SHRINK [18] and SkeletonClu [17], have been proposed, and SCAN++ [13] and pSCAN [19] have
been proposed to reduce the time complexity.
Dynamic method and distance dynamic model. Dynamic algorithms that support additional community semantics are
another research area. Dynamic-process-based methods are important in the field of complex network community
discovery. Typical dynamic methods include label propagation [16], random walk [20], and distance dynamics
[15]. Owing to the simplicity of its procedure, the label propagation method can detect communities in almost
linear time; however, it has poor stability due to the randomness in the label propagation process [2].
Random-walk-based methods are routinely used for community detection from the global perspective [20], [21].
However, the quality of the detected communities heavily depends on the choice of the seed node. Recently,
inspired by synchronization clustering [22], Shao et al. [15] considered the problem of community detection
from a new point of view: distance dynamics. Unfortunately, this method has several problems, which were
analyzed in Section I. To overcome the sensitivity of parameter λ, E-Attractor [18] was recently proposed. It
improves the stability of Attractor by employing Ego-Leader to replace cohesion parameter λ in the dynamic
interaction process. By using Ego-Leader, the underlying influence of exclusive neighbors can be determined
by identifying the top-k neighbors. However, it still has difficulty determining the globally optimal value
of k and lacks an automated way to find a satisfactory value of k. Moreover, clustering based on the global
parameter settings cannot always describe the intrinsic community structure accurately and easily produces
many rough outliers. In addition, Fan et al. [23] proposed a semisupervised community detection method that
integrates the prior information into the distance dynamics model to improve the accuracy of community
detection. Although this approach is novel, it does not consider these problems. To the best of our
knowledge, ours is the first work to solve these problems systematically.
Dynamic membership degree. The dynamic membership degree is essentially a dynamic membership function. Dynamic
membership functions have been extensively studied [24], [25] and are widely used in fuzzy systems to
describe the system dynamics [26]. Nepusz et al. [27] define a numerical membership degree and develop an
algorithm for determining the optimal membership degree that is able to identify outlier vertices that do not
belong to any of the communities, which are called bridge vertices, and quantify the centrality of a vertex
with respect to its dominant community. Kundu et al. [28] proposed a community detection algorithm that
identifies fuzzy-rough communities in which a single node can belong to many groups with various membership
degrees. The method performs well when the network contains overlapping communities. Recently, Luo
et al. [29] used various dynamic membership functions to describe the dynamics of the process of community
formation and achieved satisfactory results. Therefore, evidence increasingly supports that dynamic
membership functions have substantial advantages in describing system dynamics.
Motivated by the advantageous properties of dynamic membership functions, we replace cohesion parameter λ by
a dynamic membership degree to simulate the distance dynamics more accurately. In this paper, we integrate
the dynamic membership degree into the distance dynamics model and propose a robust distance dynamics model
that has no parameters. We combine the original network topology with the membership degree to modify the
distance model, which can substantially reduce the number of time steps needed for the distances between
nodes to converge and improve the accuracy of our algorithm.
III. PROPOSED METHOD: ATTRACTOR++
A. PRELIMINARIES
Before introducing our method, we present the basic notions and related definitions that we use throughout
the paper. In this paper, we focus on an undirected graph G = (V, E, W), where V, E and W denote the node
set, edge set, and edge weight set, respectively. The distance between two nodes depends on their shared
neighbors. Thus, prior to computing distances, the structural neighbors of a node are defined. The structural
neighbors of a node are its adjacent nodes and the node itself.
Definition 1 (Neighbors of Node u). In an undirected graph G = (V, E, W), the neighbors of node u, which are
denoted by N(u), are defined as follows:
$$N(u) = \{ v \in V \mid \{u, v\} \in E \} \cup \{u\}. \tag{1}$$
The distance between adjacent nodes is computed according to the common nodes in the structural
neighborhoods. This measurement is called the Jaccard distance and is defined as follows:
Definition 2 (Jaccard Distance). In an unweighted undirected graph G = (V, E), the Jaccard distance between
node u and node v is defined as:
$$d(u, v) = 1 - \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}. \tag{2}$$
In the above equation, |∗| denotes the number of nodes in set ∗ and N(∗) denotes the neighbors of node ∗. The
Jaccard distance is a score that varies from 0 to 1 and indicates how well the neighborhoods of the two
nodes match. When two adjacent nodes share few common neighbors, their Jaccard distance is large.
For a weighted undirected graph, because each edge has a different weight, the equation for the Jaccard
distance is different:
$$d(u, v) = 1 - \frac{\sum_{x \in N(u) \cap N(v)} \left( w(u, x) + w(v, x) \right)}{\sum_{\{x, y\} \in E;\ x, y \in N(u) \cup N(v)} w(x, y)}. \tag{3}$$
B. DYNAMIC MEMBERSHIP DEGREE CONSTRUCTION
To systematically address the limitations of the traditional distance dynamics model and enable efficient
community detection, we propose a new metric, namely, the dynamic membership degree, for measuring the
similarity between an exclusive neighbor node and core nodes and border nodes. Specifically, if an exclusive
neighbor node has a stronger membership degree to the core nodes than to the border nodes, the exclusive
neighbor node will have a positive influence on the distance. Moreover, because the sets of core nodes and
border nodes change over time, the membership degree is dynamic. The key to the dynamic membership degree is
that each community in a graph consists of a set of core nodes and the border nodes that are associated with
these core nodes. Thus, the dynamic membership degree can replace traditional cohesion parameter λ to
determine the influence of an exclusive neighbor on the distance. To compute the membership degree of an
exclusive neighbor node, we define the core nodes of the community that is associated with a node as follows:
Definition 3 (Core Nodes). For any arbitrary node u, the core nodes C(u) are defined as follows: First, the
node u and its neighbors are considered core nodes if they have nonzero similarity degree with node u.
Second, for a node that is not a neighbor of node u to become a core node, it must have a distance of 0 from
node u or any other core node.
These core nodes are more likely to cluster with node u. The core membership degree of exclusive neighbor
node x to the community that is associated with node u is computed as follows:
$$CM(x, u) = \frac{|T(x) \cap C(u)|}{|T(x)|}, \tag{4}$$
where T(x) is the set of neighbors of the exclusive neighbor node x, excluding x itself, and C(u) is the set
of core nodes that are associated with node u.
Definition 4 (Border Nodes). For any arbitrary node u and exclusive neighbor node x such that u ≠ x, the
border nodes B(u) are defined as follows: First, node x and its neighbors are considered border nodes if they
are not core nodes that are associated with node u. Second, for a node that is not a core node that is
associated with node u to become a border node, it must have a distance of 0 from node x or any other border
node.
The border nodes are those nodes that have a small probability of clustering with node u. The border
membership degree of exclusive neighbor node x is computed as follows:
$$BM(x, u) = \frac{|T(x) \cap B(u)|}{|T(x)|}, \tag{5}$$
where T(x) is the set of neighbors of exclusive neighbor node x, excluding x itself, and B(u) is the set of
border nodes that are associated with node u.
After computing both the core membership degree and the border membership degree, we can easily determine the
positive or negative influence of exclusive neighbors on the
FIGURE 3: Three distinct interaction patterns. (a) Example graph. (b) Influence from directly linked nodes.
(c) Influence from common neighbors. (d) Influence from exclusive neighbors.
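To make the quantities behind these interaction patterns concrete, the following is a minimal Python sketch of the structural neighborhood (1), the unweighted Jaccard distance (2), and the ratio shared by the core and border membership degrees (4) and (5). The function names, the dictionary-of-sets graph representation, and the toy graph are assumptions made for illustration. The core and border sets are supplied by hand here, whereas in the model they change as the distances converge.

```python
def structural_neighbors(adj, u):
    """N(u) from Eq. (1): the adjacent nodes of u plus u itself."""
    return set(adj[u]) | {u}


def jaccard_distance(adj, u, v):
    """Eq. (2): 1 - |N(u) & N(v)| / |N(u) | N(v)| for an unweighted graph."""
    nu, nv = structural_neighbors(adj, u), structural_neighbors(adj, v)
    return 1.0 - len(nu & nv) / len(nu | nv)


def membership_degree(adj, x, group):
    """Eqs. (4)/(5): fraction of T(x), the neighbors of x without x, inside `group`."""
    t_x = set(adj[x]) - {x}
    return len(t_x & set(group)) / len(t_x) if t_x else 0.0


# Hypothetical toy graph (not from the paper): two triangles joined by edge 3-4.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

d_12 = jaccard_distance(adj, 1, 2)  # identical neighborhoods {1, 2, 3}
d_34 = jaccard_distance(adj, 3, 4)  # only the two end nodes are shared

# Hand-picked sets standing in for C(u) and B(u); assumed for illustration only.
cm = membership_degree(adj, 3, group={1, 2})
bm = membership_degree(adj, 3, group={4, 5, 6})
```

On this toy graph, nodes 1 and 2 have identical structural neighborhoods, so their distance is 0, while the bridging edge between nodes 3 and 4 shares only its two end nodes out of six and gets distance 2/3.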
In Equation (8), CM(x, u) and BM(x, u) are localized similarity indices for assessing the similarity between
the exclusive neighbor and the core nodes and border nodes, respectively. The function σ(x, u) not only
characterizes the degree of influence of the exclusive neighbor x on the distance but also indicates the
direction (positive or negative) of this influence. Based on the function σ(x, u), the robust pattern of
influence by exclusive neighbors, which is called REI, is defined as follows:
$$REI = -\sum_{x \in EN(u)} \frac{f(1 - d(x, u)) \cdot \sigma(x, u)}{\deg(u)} - \sum_{y \in EN(v)} \frac{f(1 - d(y, v)) \cdot \sigma(y, v)}{\deg(v)}, \tag{9}$$
where EN(u) and EN(v) are the sets of exclusive neighbors of nodes u and v, respectively, and are expressed as
EN(u) = N(u) − (N(u) ∩ N(v)) and EN(v) = N(v) − (N(u) ∩ N(v)).
As a result, we obtain the robust distance dynamics model by considering the three interaction patterns
together. The distance d(u, v, t + 1) between nodes u and v over time is defined as follows:
$$d(u, v, t + 1) = d(u, v, t) + DI(u, v, t) + CI(u, v, t) + REI(u, v, t), \tag{10}$$
where d(u, v, t + 1) is the new distance at time step t + 1 and DI(u, v, t), CI(u, v, t) and REI(u, v, t)
indicate the influences of directly connected end nodes, common neighbors, and exclusive neighbors,
respectively, on the distance d(u, v, t) at time step t.
Finally, as time evolves, all distances will converge, and all communities and outliers can be easily
identified by removing the edges whose distances are equal to 1.
D. OUTLIER OPTIMIZATION MODEL
In a network, an outlier has few links in its neighbor set [13]. Therefore, we exploit the structure
connectivity of neighbors to refine the outliers that are identified via the distance dynamics model. The
concept of structure connectivity has been widely used in community detection [12]. Contrary to traditional
methods [12], [13], we take the triangles instead of the edges as the basic indicator of a strong relation in
the graph. The main reason we choose the triangle structure connectivity is that real-world graphs such as
social networks contain more triangles than random graphs and have many triangles in a community [30].
Moreover, triangles are known as fundamental building blocks of a network. In a network, a triangle implies a
strong tie among three nodes or the existence of a common node between the other two nodes. In this section,
we introduce two outlier optimization rules: triangle adjacency and triangle connectivity. Based on these
optimization rules, we propose an outlier postprocessing method. If the triangles of an outlier satisfy
triangle adjacency and all adjacent triangles belong to the same community, this outlier should cluster with
its triangles. For that, we define triangle adjacency.
Definition 6 (Triangle adjacency). Two triangles, namely, ∆_o and ∆_c, in G = (V, E) are adjacent if and only
if ∆_o and ∆_c share a common edge, which is denoted by ∆_o ∩ ∆_c = e(x, y) ∈ E(G).
For the graph G in Figure 4(a), ∆_{5,14,15} and ∆_{5,8,15} are adjacent as they share a common edge: e(5, 15).
Based on triangle adjacency, we propose the first optimization rule.
Optimization Rule 1. Given the set C of clusters in a graph G, a vertex u that is not in any cluster in C is
not an outlier vertex if its triangles are only adjacent to other triangles that are in the same community,
which is denoted as C_i. In this case, the outlier u belongs to community C_i (u ∈ C_i).
To describe the rule more clearly, we consider a simple example. In Figure 4(a), a simple social network is
illustrated. By using the robust distance dynamics model, we find 3 classes and 2 outliers, where all nodes
with the same color belong to the same community and the two green nodes are the outliers. The triangles of
outlier 5, namely, ∆_{5,14,15} and ∆_{5,8,15}, are adjacent to triangles ∆_{11,14,15} and ∆_{8,10,15},
respectively. Furthermore, triangles ∆_{11,14,15} and ∆_{8,10,15} both belong to the blue community. Hence,
optimization rule 1 is satisfied. Therefore, node 5 is not an outlier and should be merged into the blue
community. Unlike node 5, node 22 does not satisfy optimization rule 1. Therefore, node 22 is an outlier and
should not be merged into the purple community.
In many real-world networks, there are typically hub nodes [12] that bridge many clusters but do not belong to
any cluster. Outliers are the nodes that are neither cluster members nor hubs. Each outlier is only weakly
associated with a cluster. If the triangles of an outlier are connected and belong to different
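Definition 6 and Optimization Rule 1 can be sketched in Python as follows. This is one possible reading of the rule, not the authors' implementation: a singleton node u is assigned to a community only if every triangle that is adjacent to one of its triangles, and does not itself contain u, lies entirely inside that one community. The function names, the `community_of` mapping, and the toy graph are assumptions. The toy graph contains only the four triangles named in the text for Figure 4(a), plus a hypothetical pendant node 22, since the full figure is not reproduced here.

```python
from itertools import combinations


def triangles_of(adj, u):
    """All triangles containing u, each as a frozenset of three nodes."""
    return {frozenset((u, a, b))
            for a, b in combinations(sorted(adj[u]), 2) if b in adj[a]}


def adjacent_triangles(adj, tri):
    """Triangles that share an edge with `tri` (Definition 6)."""
    found = set()
    for a, b in combinations(sorted(tri), 2):
        for c in adj[a] & adj[b]:          # common neighbors close a triangle on edge (a, b)
            if c not in tri:
                found.add(frozenset((a, b, c)))
    return found


def rule1_community(adj, community_of, u):
    """Community that Rule 1 would merge u into, or None if the rule does not apply."""
    labels = set()
    for tri in triangles_of(adj, u):
        for other in adjacent_triangles(adj, tri):
            if u in other:
                continue                   # only triangles that do not contain u count
            owners = {community_of.get(n) for n in other}
            if len(owners) != 1 or None in owners:
                return None                # adjacent triangle is not inside one community
            labels |= owners
    return labels.pop() if len(labels) == 1 else None


# Assumed toy graph: triangles (5,14,15), (5,8,15), (11,14,15), (8,10,15) and pendant node 22.
edges = [(5, 14), (5, 15), (14, 15), (5, 8), (8, 15),
         (11, 14), (11, 15), (8, 10), (10, 15), (10, 22)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

community_of = {n: "blue" for n in (8, 10, 11, 14, 15)}  # nodes 5 and 22 are unassigned

merged_5 = rule1_community(adj, community_of, 5)
merged_22 = rule1_community(adj, community_of, 22)
```

In this sketch node 5 is merged into the blue community because both triangles adjacent to its own triangles are blue, while the pendant node 22 has no triangles at all and stays a singleton.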
communities. It is the most widely used measure of community quality when the ground truth is known. The NMI
is normalized to a real number that ranges from 0 to 1. If the detected communities are completely
independent of the real communities, then NMI=0; if the detected communities are identical to the real
communities, then NMI=1.
F-measure: We also use the F-measure [33] to quantify the quality of the identified communities. The F-measure
is a commonly used criterion for community detection algorithms when the community ground truth is known. It
combines recall and precision into a real number between zero and one. The higher the F-measure value, the
better the algorithm performs.
ARI: The Adjusted Rand Index (ARI [34]) is selected as the third metric for all algorithms. ARI measures the
similarity between two clustering results (the agreement on whether to put two nodes in the same cluster or
in different clusters). ARI is at most 1, where 1 indicates that the two clustering results are identical. If
the detected communities agree with the ground truth no better than chance, then ARI is close to 0, and it
can be slightly negative.
B. SYNTHETIC NETWORKS
1) Network Generation
To evaluate the performance of the selected algorithms and their sensitivity to community structure, we
investigated the results on synthetic networks that were generated with the LFR benchmark, in which node
degree and community size follow power-law distributions. By varying the parameters of the LFR benchmark, we
can analyze the performances of the algorithms in detail. In these experiments, we generate eight synthetic
networks with ground-truth information. The values of the parameters for the generated networks are listed in
Table 2.
TABLE 2: Synthetic networks and parameters for the LFR benchmark.
Networks  N      C    k   kmax  µ     Edges
LFR1      200    4    15  18    0.25  3208
LFR2      800    8    15  18    0.25  12453
LFR3      2000   20   20  25    0.2   35268
LFR4      4000   50   20  25    0.2   79011
LFR5      10000  80   10  12    0.15  107056
LFR6      20000  120  10  12    0.15  216717
LFR7      40000  180  12  15    0.10  453054
LFR8      60000  250  12  15    0.10  674128
To make the synthetic networks more consistent with the real-world networks, we generate eight networks with
various numbers of communities. By setting parameters k, kmax and µ, we ensure that the synthetic networks
differ in average degree, maximum node degree and the fraction of noise edges in each community.
2) Community Detection Performance
We evaluate the community detection performances of various algorithms on LFR synthetic networks. Then, the
average values of NMI, F-measure, ARI and running time are calculated. The experimental results are shown in
Figure 5.
Figure 5(a) displays the NMI results of various algorithms on LFR synthetic networks, from which we make the
following observations: (1) The five community detection algorithms yield satisfactory results and the
average value of NMI exceeds 0.6. (2) Comparing the five algorithms, we find that Attractor++ and E-Attractor
offer better efficiency and stability, followed by Attractor and Louvain; the LPA algorithm performs worst.
(3) Focusing on the Attractor, E-Attractor and Attractor++ algorithms, we find that Attractor++ and
E-Attractor are more stable than Attractor on most LFR networks and E-Attractor has very similar performance
to Attractor++.
FIGURE 5: Community detection performances of various algorithms on LFR networks. (a) NMI. (b) F-measure.
(c) ARI. (d) Running time.
Figure 5(b) displays the F-measure of various algorithms on LFR synthetic networks, from which we make the
following observations: (1) On the high-noise networks (LFR1 to LFR4, mixing parameter µ ≥ 0.2), the F-measure
fluctuation is substantial for all five algorithms, of which Attractor++ and E-Attractor perform best,
followed by Attractor and Louvain, and LPA performs worst. (2) On the low-noise networks (LFR5 to LFR8, mixing
parameter µ < 0.2), the performances of the five algorithms are similar.
Figure 5(c) displays the ARI values of the five algorithms on LFR synthetic networks, from which we make the
following observations: (1) For the ARI, the differences among
Tao Meng et al.: A Modified Distance Dynamics Model for Improvement of Community Detection
the algorithms are unstable among the LFR networks. (2) Comparing the five algorithms, we find that Attractor++ [...] the following observations: (1) Comparing the LPA and Louvain algorithms, we observe that the running time of LPA is [...]

[Figure: Number of Outliers (O#); percentage labels in extraction order: 36%, 52%, 71%, 63%; 51%, 46%, 34%, 27%]
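As a concrete illustration of the ARI values compared in this section, the following minimal sketch computes the index from two label lists using the standard pair-counting form. The function name `adjusted_rand_index` and the toy label lists are illustrative assumptions, not code from the experiments; the implementation used in the paper may differ.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting Adjusted Rand Index between two labelings of the same nodes."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same nodes")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two nodes: the labelings trivially agree

    # Node pairs placed together in both labelings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Node pairs placed together in each labeling separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs  # agreement expected by chance
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put all nodes in one cluster
    return (together_both - expected) / (maximum - expected)


# Toy ground truth with two communities of three nodes each (hypothetical data).
truth = [0, 0, 0, 1, 1, 1]
same_up_to_renaming = [1, 1, 1, 0, 0, 0]
print(adjusted_rand_index(truth, same_up_to_renaming))        # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))        # approximately -0.5
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 0, 0]))        # 0.0
```

In this form, identical partitions give 1 regardless of how the communities are named, while a partition that agrees with the ground truth no better than chance gives a value near 0, which can also be negative.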
Attractor++ algorithms, the performance of E-Attractor is more stable than that of Attractor, and that of Attractor++ is more stable than that of E-Attractor.

Figure 7(c) displays the ARI values of the algorithms on real-world networks, from which we make the following observations: (1) In terms of average ARI, LPA performs significantly worse than the other four algorithms. (2) Comparing the five algorithms, we find that Attractor++ has the highest efficiency and stability on most real-world networks.

Figure 7(d) displays the running times of the algorithms on real-world networks, from which we make the following observations: (1) The average running time of the LPA algorithm is slightly shorter than that of the Louvain algorithm. (2) The running times of the LPA and Louvain algorithms are a few orders of magnitude shorter than those of the Attractor, E-Attractor and Attractor++ algorithms. (3) The running times of the Attractor, E-Attractor and Attractor++ algorithms are very similar. Specifically, Attractor is slightly faster than E-Attractor and E-Attractor is slightly faster than Attractor++.

In summary, Attractor++, E-Attractor, Attractor and Louvain outperform LPA on both the sparse real-world networks (karate, adjnoun and DBLP) and the high-density real-world networks.

[Figure panels: (a) NMI vs. networks, (b) F-measure vs. networks; axis label: Running Time (ms)]

[...] dynamics model typically detects many outliers, e.g., on the polblogs and DBLP networks. By using our proposed outlier postprocessing method, the number of outliers that are identified by Attractor++ can be substantially reduced and the accuracy of outlier identification enhanced. For example, on the polbooks network, the outlier optimization percentage reaches 50%; on the adjnoun network, all outliers are optimized; and on the DBLP network, the percentage exceeds 60%. Considering all real-world networks, the distance dynamics model tends to produce many rough outliers, so the outlier optimization step is necessary for the Attractor++ algorithm. Moreover, Figure 8 demonstrates the effectiveness of the proposed outlier postprocessing method.

[Figure: Number of Outliers (O#); labels in extraction order: 100%, 0%, 50%, No Outliers, 64%]
[...]tractor and E-Attractor algorithms.

D. TIME STEPS
In the distance dynamics model, the dynamics of each distance is simulated according to the three interaction patterns. Before all distances in the network converge (either 1 or 0),
slows as the number of time steps increases, whereas the convergence speed of Attractor++ increases. For example, after one time step, nearly 42% of the distances have converged when the Attractor algorithm is used, compared to only 23% of the distances for the Attractor++ algorithm. However, after six time steps, nearly 99% of the distances have converged with the Attractor++ algorithm, compared to nearly 98% of the distances with the Attractor algorithm. The main reason is that Attractor adopts a global parameter setting to determine the underlying influence of exclusive neighbors on the distance, but the structures of the communities are constantly changing with the convergence of new distances. In contrast, Attractor++ adopts the dynamic membership degree to determine the underlying influence of exclusive neighbors on the distance. A similar result can easily be obtained from Figure 10; it is not discussed due to space limitations.

[...] consists of 62 vertices and 159 undirected edges. There are two communities in the dataset but no class label information. Each node in the network represents a dolphin that lives in New Zealand. If two dolphins are in contact frequently, [...]

FIGURE 11: Case study on the dolphins network using the Attractor algorithm.

[...] nodes are real outliers. For the other 4 outliers, we can use the outlier postprocessing algorithm to further optimize the results.

[Figure residue: network drawing annotated "2 classes, 8 outliers (green nodes, 7 leaf)"; network drawing with panel (a) Ground truth and community label C2]
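The outlier postprocessing step mentioned above decides whether a singleton node is merged into the community of its triangles, treated as a hub, or kept as an outlier. The sketch below is only one plausible reading of that idea, under loudly stated assumptions: the rule (merge when all triangles fall in one community, hub when they span several, outlier when there are none), the function names `build_adjacency` and `classify_singleton`, and the toy graph are all hypothetical and are not the optimization rules defined by the authors.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def classify_singleton(node, adjacency, community_of):
    """Hypothetical triangle-based rule for a node left outside every community.

    A triangle counts for community C when the node's two other corners are
    adjacent to each other and both belong to C.
    """
    votes = Counter()
    for u, v in combinations(sorted(adjacency[node]), 2):
        if v in adjacency[u]:
            cu, cv = community_of.get(u), community_of.get(v)
            if cu is not None and cu == cv:
                votes[cu] += 1
    if not votes:
        return ("outlier", None)          # no triangle into any community
    if len(votes) == 1:
        return ("merge", next(iter(votes)))  # all triangles in one community
    return ("hub", None)                  # triangles into several communities


# Toy graph (assumed data): two triangle communities A and B plus three singletons.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (6, 0), (6, 1),                  # node 6 closes a triangle inside A
         (7, 0), (7, 1), (7, 3), (7, 4),  # node 7 closes triangles in A and B
         (8, 0)]                          # node 8 hangs off a single edge
adjacency = build_adjacency(edges)
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

for singleton in (6, 7, 8):
    print(singleton, classify_singleton(singleton, adjacency, community_of))
```

On this toy graph, node 6 would be merged into community A, node 7 would be labelled a hub, and node 8 would remain an outlier, which mirrors the three outcomes the postprocessing method distinguishes.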
The second network is the Books About US Politics network, which is referred to as the polbooks network and consists of 105 nodes and 441 edges. There are three communities in the network and ground-truth information is available. Each node in the network represents a book about US politics. An edge between two books indicates that they are often purchased together by customers. Figure 13(a) shows the ground truth of the polbooks network, which covers 3 clusters. Figure 13(b) shows the detection results that were obtained by the Attractor algorithm, which identified 4 communities and 6 outliers (red dashed circle). Figure 13(c) shows the detection results that were obtained by the Attractor++ algorithm, which identified 3 communities and 1 hub (red dashed circle). Comparing Figure 13(b) to Figure 13(a) and Figure 13(c) to Figure 13(a), Attractor++ performs better in identifying ground-truth communities.

Based on the above two case studies, we make the following remarks: (1) Our algorithm, namely, Attractor++, can effectively identify vertices that have special roles (hubs and outliers). (2) Our robust distance dynamics model, which is based on the dynamic membership degree, is effective on various networks.

V. CONCLUSIONS
In this paper, we have presented the novel concept of dynamic membership degree. It enables us to avoid strong dependence on the cohesion parameter λ. Thus, we can conveniently identify high-quality communities. Based on this concept, a robust distance dynamics model has been developed, along with a robust community detection algorithm: Attractor++. Moreover, to improve the accuracy of outlier node identification, we further propose two optimization rules for judging whether an outlier should be merged into the same community as its triangles or be classified as a hub. We conduct extensive experiments on both synthetic and real-world networks, and the results demonstrate the effectiveness and efficiency of the proposed algorithm.

However, complex networks in the real world change dynamically over time and their community structures are dynamically updated. In the face of dynamic networks with complex changes, designing dynamic community discovery algorithms that are based on distance dynamics models requires further study. In addition, multiobjective optimization, game theory, statistics and other theories can be used in dynamic community discovery scenarios to design better-performing dynamic community discovery algorithms.

ACKNOWLEDGMENT
Tao Meng and the co-authors thank the National Super Computing Center of Changsha, located in Hunan province of China, for providing the experimental equipment.

REFERENCES
[1] J. Duch, A. Arenas, Community detection in complex networks using extremal optimization, Physical Review E 72 (2) (2005) 027104.
[2] S. Fortunato, Community detection in graphs, Physics Reports 486 (3-5) (2010) 75–174.
[3] S. Fortunato, M. Barthelemy, Resolution limit in community detection, Proceedings of the National Academy of Sciences 104 (1) (2007) 36–41.
[4] A. Lancichinetti, S. Fortunato, Community detection algorithms: a comparative analysis, Physical Review E 80 (5) (2009) 056117.
[5] W. Cui, Y. Xiao, H. Wang, W. Wang, Local search of communities in large graphs, in: Proceedings of the 2014 ACM SIGMOD international conference on Management of data, ACM, 2014, pp. 991–1002.
[6] A. R. Benson, D. F. Gleich, J. Leskovec, Higher-order organization of complex networks, Science 353 (6295) (2016) 163–166.
[7] L. Chen, J. Zhang, L. Cai, Z. Deng, Fast community detection based on distance dynamics, Tsinghua Science and Technology 22 (6) (2017) 564–585.
[8] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 888–905.
[9] L. Wang, Y. Xiao, B. Shao, H. Wang, How to partition a billion-node graph, in: Data Engineering (ICDE), 2014 IEEE 30th International Conference on, IEEE, 2014, pp. 568–579.
[10] M. E. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences 103 (23) (2006) 8577–8582.
[11] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (10) (2008) P10008.
[12] X. Xu, N. Yuruk, Z. Feng, T. A. Schweiger, Scan: a structural clustering algorithm for networks, in: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2007, pp. 824–833.
[13] H. Shiokawa, Y. Fujiwara, M. Onizuka, Scan++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs, Proceedings of the VLDB Endowment 8 (11) (2015) 1178–1189.
[14] M. Rosvall, C. T. Bergstrom, Maps of random walks on complex networks reveal community structure, Proceedings of the National Academy of Sciences 105 (4) (2008) 1118–1123.
[15] J. Shao, Z. Han, Q. Yang, T. Zhou, Community detection based on distance dynamics, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2015, pp. 1075–1084.
[16] U. N. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Physical Review E 76 (3) (2007) 036106.
[17] J. Huang, H. Sun, Q. Song, H. Deng, J. Han, Revealing density-based clustering structure from the core-connected tree of a network, IEEE Transactions on Knowledge and Data Engineering 25 (8) (2013) 1876–1889.
[18] J. Huang, H. Sun, J. Han, H. Deng, Y. Sun, Y. Liu, Shrink: a structural clustering algorithm for detecting hierarchical communities in networks, in: Proceedings of the 19th ACM international conference on Information and knowledge management, ACM, 2010, pp. 219–228.
[19] L. Chang, W. Li, L. Qin, W. Zhang, S. Yang, pscan: Fast and exact structural graph clustering, IEEE Transactions on Knowledge and Data Engineering 29 (2) (2017) 387–401.
[20] P. Pons, M. Latapy, Computing communities in large networks using random walks, in: International symposium on computer and information sciences, Springer, 2005, pp. 284–293.
[21] Y. Wu, R. Jin, J. Li, X. Zhang, Robust local community detection: on free rider effect and its elimination, Proceedings of the VLDB Endowment 8 (7) (2015) 798–809.
[22] C. Böhm, C. Plant, J. Shao, Q. Yang, Clustering by synchronization, in: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2010, pp. 583–592.
[23] L. Fan, S. Xu, D. Liu, Y. Ru, Semi-supervised community detection based on distance dynamics, IEEE Access.
[24] J. Virant, N. Zimic, Attention to time in fuzzy logic, Fuzzy Sets and Systems 82 (1) (1996) 39–49.
[25] S. Wu, M. J. Er, Dynamic fuzzy neural networks-a novel approach to function approximation, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 30 (2) (2000) 358–364.
[26] M. Cerrada, J. Aguilar, E. Colina, A. Titli, Dynamical membership functions: an approach for adaptive fuzzy modelling, Fuzzy Sets and Systems 152 (3) (2005) 513–533.
[27] T. Nepusz, A. Petróczi, L. Négyessy, F. Bazsó, Fuzzy communities and the concept of bridgeness in complex networks, Physical Review E 77 (1) (2008) 016107.
[28] S. Kundu, S. K. Pal, Fuzzy-rough community in social networks, Pattern Recognition Letters 67 (2015) 145–152.
[29] W. Luo, D. Zhang, H. Jiang, L. Ni, Y. Hu, Local community detection with the dynamic membership function, IEEE Transactions on Fuzzy Systems.
[30] A. Prat-Pérez, D. Dominguez-Sal, J. M. Brunat, J.-L. Larriba-Pey, Shaping communities out of triangles, in: Proceedings of the 21st ACM international conference on Information and knowledge management, ACM, 2012, pp. 1677–1681.
[31] C. L. Jun, Z. Jing, C. Lei, H. T. Qin, Enhanced distance dynamics model for community detection via ego-leader, KSII Transactions on Internet & Information Systems 12 (5).
[32] L. Danon, A. Díaz-Guilera, J. Duch, A. Arenas, Comparing community structure identification, Journal of Statistical Mechanics 2005 (09) (2005) 09008.
[33] W. M. Rand, Objective criteria for the evaluation of clustering methods, Publications of the American Statistical Association 66 (336) (1971) 846–850.
[34] C. D. Manning, P. Raghavan, H. Schütze, Introduction to information retrieval, Journal of the American Society for Information Science & Technology 43 (3) (2008) 824–825.