A Modified Distance Dynamics Model For Improvement

This paper proposes a modified distance dynamics model called Attractor++ to improve community detection. The traditional distance dynamics model requires manual parameter specification and has difficulty handling outliers and identifying hubs. Attractor++ uses a dynamic membership degree instead of a cohesion threshold to determine neighbor influence. It also includes an outlier optimization model based on triangle adjacency and a postprocessing method to better identify hubs, outliers, and community structures.


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2018.2877235, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2018.DOI

A Modified Distance Dynamics Model for Improvement of Community Detection
TAO MENG1, LIJUN CAI1, TINGQIN HE1, LEI CHEN2, ZIYUN DENG3, WEIPING DING4,5, ZEHONG CAO5
1 College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China (e-mail: [email protected])
2 College of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
3 Department of Economics and Trade, Changsha Commerce and Tourism College, Changsha 410082, China
4 School of Computer Science and Technology, Nantong University, Nantong 226019, China
5 Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW 2007, Australia
Corresponding author: Lijun Cai (e-mail: [email protected]).
This work was supported by the National Natural Science Foundation of China (61472127, 61272395); the Natural Science Foundation of Hunan Province (No. 2017JJ5064); the Social Science Foundation of Hunan Province (No. 16ZDA07); and the Open Project of State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body (No. 31715010).

ABSTRACT Community detection is a key technique for identifying the intrinsic community structures
of complex networks. The distance dynamics model has been proven effective in finding communities
with arbitrary size and shape and identifying outliers. However, to simulate distance dynamics, the model
requires manual parameter specification and is sensitive to the cohesion threshold parameter, which is
difficult to determine. Furthermore, it has difficulty handling rough outliers and ignores hubs (nodes that
bridge communities). In this paper, we propose a robust distance dynamics model, namely, Attractor++,
which uses a dynamic membership degree. In Attractor++, the dynamic membership degree is used to
determine the influence of exclusive neighbors on the distance instead of setting the cohesion threshold.
Additionally, considering its inefficiency and low accuracy in handling outliers and identifying hubs, we
design an outlier optimization model that is based on triangle adjacency. By using optimization rules, a
postprocessing method further judges whether a singleton node should be merged into the same community
as its triangles or regarded as a hub or an outlier. Extensive experiments on both real-world and synthetic
networks demonstrate that our algorithm more accurately identifies nodes that have special roles (hubs and
outliers) and more effectively identifies community structures.

INDEX TERMS community detection, complex network, distance dynamics model, membership function.

I. INTRODUCTION
Many complex systems in the real world can be viewed as complex networks [1], such as social networks, sensor networks, collaboration networks, biological networks and other types of complex networks. In recent years, the discovery of community structures in complex networks has gradually become an active research field. Community detection plays an important role in complex network analysis because it provides comprehensive insight into the organizational structure, functional behavior, and evolutionary dynamics of the network [2], [3]. The main objective of community detection is to group similar nodes into the same community while partitioning dissimilar nodes into different communities, where a community can be regarded as a group of nodes with high-density links within the group and relatively low-density links with nodes in external groups [4]. Understanding the community structure of a network is an important and practically useful problem. The development of algorithms for detecting communities in networks has attracted the interest of physicists, sociologists, and especially computer scientists [5]–[7].
Up to now, many community detection algorithms have been developed to reveal hidden community structures. They mainly include graph-partitioning methods [8], [9], modularity-based methods [10], [11], density-based methods [12], [13], and dynamic methods [14], [15]. Most existing community detection algorithms use a greedy optimization metric to evaluate community structure from various points of view, and each algorithm has its own advantages and limitations. Apart from algorithms based on user-defined metrics, how can we identify the communities in a real-world network in an intuitive way?

VOLUME 4, 2016 1

2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Tao Meng et al.: A Modified Distance Dynamics Model for Improvement of Community Detection

Lately, one of the most successful community detection methods, namely, Attractor, which is a distance dynamics model, was proposed by Shao et al. [15]. Unlike the traditional algorithms [8], [12], [14], [16], Attractor provides an intuitive way to analyze the community structure of a network. This model views the entire graph as an adaptive global dynamical system and simulates the synchronization dynamics over time. The process of the traditional distance dynamics model involves the following stages: First, each edge is associated with an initial distance. Then, in a sequential process, each distance gradually shrinks or stretches via interaction with its local topological structure. Finally, all distances converge to 0 or 1. As a result, all communities and outliers are naturally obtained by removing the edges whose distances are equal to 1. The traditional model has several attractive benefits, such as intuitive community detection, small community detection, and anomaly detection. However, the traditional distance dynamics model has several limitations.
• Extremely sensitive parameter settings. In the traditional distance dynamics model, the global cohesion parameter, which is denoted as λ, is used to determine the positive or negative interaction influence on the distances for exclusive neighbors. Typically, a lower value of λ yields larger communities whereas a higher value of λ produces more communities. However, different networks have different local structures and may require different parameter settings. Thus, it is difficult to find a proper value of the cohesion parameter λ for a specified network. In some cases, minor changes to parameter λ may cause great differences in the resulting community structure.
• Unreasonable influence from exclusive neighbors. During the local dynamic interaction process, the structures of the communities are constantly changing as new distances converge. In the traditional distance dynamics model, once the underlying influence of exclusive neighbors on the distances has been determined by the cohesion parameter λ, the influence does not change during the entire dynamic interaction process. However, even if an exclusive neighbor has a positive influence on the distance at time step 0, it may have a negative influence on the distance at time step t (the exclusive neighbor may have been moving far away from the corresponding node at time step t).
• Poor quality of anomaly detection. In the process of synchronization dynamics, the traditional model easily produces many rough outliers, especially in a large-scale, high-density, or noisy network. Many outliers that are identified by the traditional distance dynamics model belong to a community in the ground truth of the network. Consider the typical email-enron network as an example: The network consists of 1133 nodes and 5451 edges. By using the traditional distance dynamics model with parameter λ=0.5, we identify 359 classes, namely, 300 outliers and 59 communities. Under this model, many other real-world networks exhibit similar behavior.
• Unable to identify the differences between outliers and hubs. In fact, some of the sparsely connected nodes in a network may not be outliers but hubs. In addition to detecting communities and outliers, identifying nodes with special roles, such as hubs, is a challenging task in determining the structure of a complex network, as hubs play important roles in many real complex networks [12]. For instance, hubs in epidemiology networks can be core nodes for spreading diseases; in collaboration networks, hubs can be core nodes for sharing ideas.
To describe the hubs more clearly, let us consider a simple example; see Figure 1. By using the traditional distance dynamics model, we identify 2 communities and 2 outliers. All nodes with the same color belong to the same community and the two red nodes are the outliers. Our method, namely, Attractor++, identifies two communities, namely, {1, 2, 3, 4, 5, 6} and {9, 10, 11, 12, 13, 14}, and identifies node 7 as an outlier and node 8 as a hub.
FIGURE 1: Running example.
A robust distance dynamics model should be able to overcome the above limitations. We propose a robust distance dynamics model for community detection. To overcome the parameter-sensitivity problem, the dynamic membership degree is introduced to determine the influence of an exclusive neighbor on the distance. Furthermore, the dynamic influences from exclusive neighbors can also be easily determined by our algorithm. The membership degree is a dynamic function that is based on the characteristics of the communities during the local dynamic interaction process in real time. To overcome the rough-outlier problem, an outlier optimization rule is proposed for further judging whether an outlier should be merged into a community based on the adjacent triangle. To overcome the unidentified-hub problem, another outlier optimization rule is developed for further judging whether an outlier should be regarded as a hub based on the connected triangle. We summarize the main contributions of this paper as follows.
• A robust distance dynamics model. Based on a dynamic membership degree, we propose a distance dynamics model that has improved robustness. The dynamic membership degree is used in place of the traditional cohesion parameter λ. The dynamic membership degree is a similarity index that is used to measure the similarity between nodes and communities. The experimental results demonstrate the effectiveness and

accuracy of finding communities via the robust distance dynamics model.
• An outlier optimization model. To further judge whether each outlier should be merged into the same community as its triangles or classified as a hub, we design two outlier optimization rules that help identify vertices that have special roles (i.e., hubs and outliers) and integrate them into the distance dynamics model.
• Robust algorithm: Attractor++. A robust community detection algorithm, namely, Attractor++, is proposed. It is based on a robust distance dynamics model and an outlier optimization model. Experimental results on artificial and real-world networks demonstrate that Attractor++ is more robust and efficient in overcoming the above limitations than the original algorithm.
The remainder of this paper is organized as follows: Related works are discussed in Section II. Section III presents our robust model and corresponding community detection algorithm. An extensive experimental evaluation is presented in Section IV. Finally, Section V presents the conclusions of this paper.
II. RELATED WORKS
Community detection has been studied for decades in many fields. Recently, scholars have proposed many algorithms for detecting community structures in complex networks, particularly in computer science [2]. We review related works, which are organized according to the community detection algorithms, dynamic method and distance dynamics model, and dynamic membership degree.
Community detection algorithms. Currently, the most widely used and practical community detection algorithms can be divided into four categories: graph-partitioning algorithms, modularity-based algorithms, density-based algorithms and dynamics algorithms. In graph-partitioning algorithms, community detection was first modeled as a graph partitioning problem. Hence, graph-partitioning algorithms [8], [9] are natural choices for community detection. However, these algorithms rely on a prespecified number of communities k, which renders them not highly applicable to real-world networks. Since the community structures are highly complex, it is often expensive or impossible to obtain the number of communities in many real-world networks. For modularity-based algorithms, many researchers devoted their efforts to improving the effectiveness of community detection. One typical method is to optimize the modularity measure [10], which is widely used to evaluate the community structure of a network from a global perspective. Unfortunately, although modularity-based algorithms are effective in many applications [10], [11], they are difficult to apply to large-scale networks due to their high time complexity, and they also suffer from the resolution limit problem. Due to the resolution limit of modularity-based algorithms, density-based algorithms have attracted wide research interest [12], [17]. One of the most successful density-based algorithms, namely, SCAN [12], not only detects meaningful communities but also identifies hubs and outliers. To overcome the problems of parameter sensitivity and exhaustive similarity evaluations in SCAN, two parameter-free methods, namely, SHRINK [18] and SkeletonClu [17], have been proposed, and SCAN++ [13] and pSCAN [19] have been proposed to reduce the time complexity.
Dynamic method and distance dynamics model. Dynamic algorithms that support additional community semantics are another research area. Dynamic-process-based methods are important in the field of complex network community discovery. Typical dynamic methods include label propagation [16], random walk [20], and distance dynamics [15]. Owing to the simplicity of its procedure, the label propagation method can detect communities in almost linear time; however, it has poor stability due to the randomness in the label propagation process [2]. Random-walk-based methods are routinely used for community detection from the global perspective [20], [21]. However, the quality of the detected communities heavily depends on the choice of the seed node. Recently, inspired by synchronization clustering [22], Shao et al. [15] considered the problem of community detection from a new point of view: distance dynamics. Unfortunately, this method has several problems, which were analyzed in Section I. To overcome the sensitivity of parameter λ, E-Attractor [18] was recently proposed. It improves the stability of Attractor by employing Ego-Leader to replace cohesion parameter λ in the dynamic interaction process. By using Ego-Leader, the underlying influence of exclusive neighbors can be determined by identifying the top-k neighbors. However, it still has difficulty determining the globally optimal value of k and lacks an automated way to find a satisfactory value of k. Moreover, clustering based on the global parameter settings cannot always describe the intrinsic community structure accurately and easily produces many rough outliers. In addition, Fan et al. [23] proposed a semisupervised community detection method that integrates the prior information into the distance dynamics model to improve the accuracy of community detection. Although this approach is novel, it does not consider these problems. To the best of our knowledge, ours is the first work to solve these problems systematically.
Dynamic membership degree. The dynamic membership degree is essentially a dynamic membership function. Dynamic membership functions have been extensively studied [24], [25] and are widely used in fuzzy systems to describe the system dynamics [26]. Nepusz et al. [27] define a numerical membership degree and develop an algorithm for determining the optimal membership degree that is able to identify outlier vertices that do not belong to any of the communities, which are called bridge vertices, and quantify the centrality of a vertex with respect to its dominant community. Kundu et al. [28] proposed a community detection algorithm that identifies fuzzy-rough communities in which a single node can belong to many groups with various membership degrees. The method performs well when the network contains overlapping communities. Recently, Luo

et al. [29] used various dynamic membership functions to describe the dynamics of the process of community formation and achieved satisfactory results. Therefore, evidence increasingly supports that dynamic membership functions have substantial advantages in describing system dynamics.
Motivated by the advantageous properties of dynamic membership functions, we replace cohesion parameter λ by a dynamic membership degree to simulate the distance dynamics more accurately. In this paper, we integrate the dynamic membership degree into the distance dynamics model and propose a robust distance dynamics model that has no parameters. We combine the original network topology with the membership degree to modify the distance model, which can substantially reduce the number of time steps needed for the distances between nodes to converge and improve the accuracy of our algorithm.
III. PROPOSED METHOD: ATTRACTOR++
A. PRELIMINARIES
Before introducing our method, we present the basic notions and related definitions that we use throughout the paper. In this paper, we focus on an undirected graph G = (V, E, W), where V, E and W denote the node set, edge set, and edge weight set, respectively. The distance between two nodes depends on their shared neighbors. Thus, prior to computing distances, the structural neighbors of a node are defined. The structural neighbors of a node are its adjacent nodes and the node itself.
Definition 1 (Neighbors of Node u). In an undirected graph G = (V, E, W), the neighbors of node u, which are denoted by N(u), are defined as follows:
$$N(u) = \{\, v \in V \mid \{u, v\} \in E \,\} \cup \{u\}. \tag{1}$$
The distance between adjacent nodes is computed according to the common nodes in the structural neighborhoods. This measurement is called the Jaccard distance and is defined as follows:
Definition 2 (Jaccard Distance). In an unweighted undirected graph G = (V, E), the Jaccard distance between node u and node v is defined as:
$$d(u, v) = 1 - \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}. \tag{2}$$
In the above equation, |∗| denotes the number of nodes in set ∗ and N(∗) denotes the neighbors of node ∗. The Jaccard distance is a score that varies from 0 to 1 and indicates how closely the neighborhoods of the two nodes match. When two adjacent nodes share few common neighbors, their Jaccard distance is large.
For a weighted undirected graph, because each edge has a different weight, the equation for the Jaccard distance is different:
$$d(u, v) = 1 - \frac{\sum_{x \in N(u) \cap N(v)} \left( w(u, x) + w(v, x) \right)}{\sum_{\{x, y\} \in E;\; x, y \in N(u) \cup N(v)} w(x, y)}. \tag{3}$$
B. DYNAMIC MEMBERSHIP DEGREE CONSTRUCTION
To systematically address the limitations of the traditional distance dynamics model and enable efficient community detection, we propose a new metric, namely, the dynamic membership degree, for measuring the similarity between an exclusive neighbor node and the core nodes and border nodes. Specifically, if an exclusive neighbor node has a stronger membership degree to the core nodes than to the border nodes, the exclusive neighbor node will have a positive influence on the distance. Moreover, because the sets of core nodes and border nodes change over time, the membership degree is dynamic. The key to the dynamic membership degree is that each community in a graph consists of a set of core nodes and the border nodes that are associated with these core nodes. Thus, the dynamic membership degree can replace the traditional cohesion parameter λ to determine the influence of an exclusive neighbor on the distance. To compute the membership degree of an exclusive neighbor node, we define the core nodes of the community that is associated with a node as follows:
Definition 3 (Core Nodes). For any arbitrary node u, the core nodes C(u) are defined as follows: First, the node u and its neighbors are considered core nodes if they have a nonzero similarity degree with node u. Second, for a node that is not a neighbor of node u to become a core node, it must have a distance of 0 from node u or any other core node.
These core nodes are more likely to cluster with node u. The core membership degree of exclusive neighbor node x to the community that is associated with node u is computed as follows:
$$CM(x, u) = \frac{|T(x) \cap C(u)|}{|T(x)|}, \tag{4}$$
where T(x) is the set of neighbors of the exclusive neighbor node x, which does not include the node x itself, and C(u) is the set of core nodes that are associated with node u.
Definition 4 (Border Nodes). For any arbitrary node u and exclusive neighbor node x such that u ≠ x, the border nodes B(u) are defined as follows: First, node x and its neighbors are considered border nodes if they are not core nodes that are associated with node u. Second, for a node that is not a core node that is associated with node u to become a border node, it must have a distance of 0 from node x or any other border node.
The border nodes are those nodes that have a small probability of clustering with node u. The border membership degree of exclusive neighbor node x is computed as follows:
$$BM(x, u) = \frac{|T(x) \cap B(u)|}{|T(x)|}, \tag{5}$$
where T(x) is the set of neighbors of exclusive neighbor node x, which does not include the node x itself, and B(u) is the set of border nodes that are associated with node u.
After computing both the core membership degree and the border membership degree, we can easily determine the positive or negative influence of exclusive neighbors on the

distances. To illustrate the dynamic membership degree more clearly, Figure 2 shows two examples.
FIGURE 2: Illustration of the dynamic membership degree. (a) Membership degree for exclusive neighbor node 1 on the edge e(4,5). (b) Membership degree for exclusive neighbor node 6 on the edge e(7,9).
Consider the graph G in Figure 2(a). Node 4 and node 5 are indirectly connected and node 1 is an exclusive neighbor of node 5 on edge e(4, 5). The core nodes (circled by a red dotted line) of node 5 are nodes 3, 4, 5, 6 and 7, according to Definition 3. The border nodes (circled by a blue dotted line) of node 5 are nodes 1 and 2, according to Definition 4. Since CM(1, 5) = 2/3 and BM(1, 5) = 1/3, node 1 is more similar to the core nodes than to the border nodes. Therefore, exclusive neighbor node 1 will have a positive influence and reduce the distance on edge e(4, 5). For comparison, consider the graph G in Figure 2(b). Node 7 and node 9 are indirectly connected nodes and node 6 is an exclusive neighbor of node 9 on edge e(7, 9). The core nodes (circled by a red dotted line) of node 9 are nodes 7, 8, 9, 10 and 12, according to Definition 3. The border nodes (circled by a blue dotted line) of node 9 are nodes 1, 4, 5 and 6, according to Definition 4. Since CM(6, 9) = 1/4 and BM(6, 9) = 3/4, node 6 is more similar to the border nodes than to the core nodes. Therefore, exclusive neighbor node 6 will have a negative influence and increase the distance on edge e(7, 9).
Furthermore, as time evolves, the node sets C(u) and B(u) change as distances converge (to 0 or 1). Thus, CM(x, u) and BM(x, u) are dynamic membership degree functions. For instance, after many time steps, the distance on edge e(4, 1), e(4, 2), e(3, 2) or e(6, 1) may converge to 0 in Figure 2(a). As a result, node 1 or node 2 may join the core nodes of node 5 and may be removed from the border nodes of node 1.
C. ROBUST DISTANCE DYNAMICS MODEL
In the traditional distance dynamics model, three interaction patterns (DI, CI, and EI) are designed for simulating the distance dynamics. However, a naive cohesion parameter λ is used to determine the positive or negative influence of an exclusive neighbor on the distance in interaction pattern EI. A poor choice of parameter λ may lead to bad results. Hence, we use the notions of core nodes and border nodes and the properties of the dynamic membership degree that are discussed above to improve the EI pattern. Because the traditional DI and CI patterns do not use cohesion parameter λ, these two patterns are unchanged in our robust model.
Robust Pattern 1: Influence from directly linked nodes. In the first interaction pattern (Figure 3(b)), the distance d(u, v) is influenced by two directly linked nodes: u and v. In this scenario, one node attracts another to move toward itself, thereby leading to a decrease in the distance d(u, v). Formally, the influence from the directly linked nodes, which is denoted as DI, is defined as follows:
$$DI = -\left( \frac{f(1 - d(u, v))}{\deg(u)} + \frac{f(1 - d(u, v))}{\deg(v)} \right). \tag{6}$$
In the pattern DI, f(∗) is a coupling function (sin(∗) is used in Attractor) and deg(∗) indicates the degree of node ∗. The DI is a score that varies from -1 to 0 and indicates the degree of influence on the distance from the directly linked nodes. The more similar two directly linked nodes are, the stronger their influence on each other, and vice versa.
Robust Pattern 2: Influence from common neighbors. In the second interaction pattern (Figure 3(c)), the distance d(u, v) is influenced by the common neighbors. The common neighbors have links with both nodes u and v and are denoted as CN = (N(u) − u) ∩ (N(v) − v). In this scenario, each common neighbor attracts both node u and node v to move toward itself, thereby resulting in a decrease in the distance d(u, v). Formally, the influence of the common nodes, which is denoted as CI, is defined as follows:
$$CI = -\sum_{x \in CN} \left( \frac{f(1 - d(x, u)) \cdot (1 - d(x, v))}{\deg(u)} \right) - \sum_{x \in CN} \left( \frac{f(1 - d(x, v)) \cdot (1 - d(x, u))}{\deg(v)} \right). \tag{7}$$
In the pattern CI, for any common neighbor x, the CI is a score that varies from -1 to 0 and indicates the degree of influence on the distance from common neighbor x. When common neighbor x shares many neighbors with node u and node v, the influence becomes large, and vice versa.
Robust Pattern 3: Influence from exclusive neighbors. Unlike in the DI and CI patterns, where directly linked nodes or common nodes can only exert a positive influence on the distance, in the EI pattern exclusive neighbors can exert a positive or negative influence on the distance (Figure 3(d)); otherwise, all distances would converge to 0. To avoid this problem, instead of using a cohesion parameter λ to determine the underlying influence, we focus on the dynamic membership degree. For edge e(u, v), node x is an exclusive neighbor of node u and we calculate the values of CM(x, u) and BM(x, u). If CM(x, u) ≥ BM(x, u), the relationship between x and u is very close and results in a decrease of the distance d(u, v). Similarly, if CM(x, u) < BM(x, u), the relationship between x and u is not close and results in an increase of the distance d(u, v). To determine the positive or negative influence of an exclusive neighbor on distance d(u, v), the dynamic membership degree is used, which is defined as:
$$\sigma(x, u) = \begin{cases} 1 - d(x, v), & CM(x, u) - BM(x, u) \ge 0, \\ d(x, v) - 1, & \text{otherwise.} \end{cases} \tag{8}$$
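As a minimal sketch of Eqs. (4), (5), and (8), the following Python snippet recomputes the core and border membership degrees for the two examples of Figure 2 and derives the sign of the exclusive-neighbor influence. The function names and the placeholder distance are assumptions made for illustration. In particular, the neighbor sets T(1) = {2, 4, 6} and T(6) = {1, 4, 5, 7} are hypothetical: the text does not list the edges of Figure 2, so these sets were chosen only to be consistent with the stated values 2/3, 1/3 and 1/4, 3/4.

```python
from fractions import Fraction


def membership(t_x, group):
    """Eqs. (4)/(5): share of x's neighbors T(x) (x excluded) that lie in `group`."""
    return Fraction(len(t_x & group), len(t_x))


def sigma(d_xv, cm, bm):
    """Eq. (8): signed influence of exclusive neighbor x, given d(x, v)."""
    return (1 - d_xv) if cm - bm >= 0 else (d_xv - 1)


# Figure 2(a): core and border nodes of node 5 as stated in the text.
core_5, border_5 = {3, 4, 5, 6, 7}, {1, 2}
t_1 = {2, 4, 6}  # hypothetical T(1), consistent with CM = 2/3 and BM = 1/3
cm_a, bm_a = membership(t_1, core_5), membership(t_1, border_5)

# Figure 2(b): core and border nodes of node 9 as stated in the text.
core_9, border_9 = {7, 8, 9, 10, 12}, {1, 4, 5, 6}
t_6 = {1, 4, 5, 7}  # hypothetical T(6), consistent with CM = 1/4 and BM = 3/4
cm_b, bm_b = membership(t_6, core_9), membership(t_6, border_9)

d_xv = Fraction(1, 2)  # placeholder distance d(x, v), not taken from the paper
print(cm_a, bm_a, sigma(d_xv, cm_a, bm_a))  # sigma >= 0: the edge distance shrinks
print(cm_b, bm_b, sigma(d_xv, cm_b, bm_b))  # sigma < 0: the edge distance grows
```

Exact fractions are used here only so that the comparison CM ≥ BM is not affected by floating-point rounding; a floating-point implementation would follow the same logic.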
VOLUME 4, 2016 5

2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Tao Meng et al.: A Modified Distance Dynamics Model for Improvement of Community Detection

FIGURE 3: Three distinct interaction patterns. Panels: (a) Example graph; (b) Influence from directly linked nodes; (c) Influence from common neighbors; (d) Influence from exclusive neighbors.

In Equation 8, CM(x, u) and BM(x, u) are localized similarity indices for assessing the similarity between the exclusive neighbor and the core nodes and border nodes, respectively. The function σ(x, u) not only characterizes the degree of influence of the exclusive neighbor x on the distance but also indicates the direction (positive or negative) of this influence. Based on the function σ(x, u), the robust pattern of influence by exclusive neighbors, which is called REI, is defined as follows:

\[ REI = -\sum_{x \in EN(u)} \left( \frac{f(1 - d(x, u)) \cdot \sigma(x, u)}{\deg(u)} \right) - \sum_{y \in EN(v)} \left( \frac{f(1 - d(y, v)) \cdot \sigma(y, v)}{\deg(v)} \right), \tag{9} \]

where EN(u) and EN(v) are the node sets of exclusive neighbors of nodes u and v, respectively, and are expressed as EN(u) = N(u) − (N(u) ∩ N(v)) and EN(v) = N(v) − (N(u) ∩ N(v)).

As a result, we obtain the robust distance dynamics model by considering the three interaction patterns together. The distance dynamics d(u, v, t + 1) between nodes u and v over time is defined as follows:

\[ d(u, v, t + 1) = d(u, v, t) + DI(u, v, t) + CI(u, v, t) + REI(u, v, t), \tag{10} \]

where d(u, v, t + 1) is the new distance at time step t + 1 and DI(u, v, t), CI(u, v, t) and REI(u, v, t) indicate the influences of the directly connected end nodes, common neighbors, and exclusive neighbors, respectively, on the distance d(u, v, t) at time step t.

Finally, as time evolves, all distances will converge, and all communities and outliers can be easily identified by removing the edges with distances equal to 1.

D. OUTLIER OPTIMIZATION MODEL
In a network, an outlier has few links in its neighbor set [13]. Therefore, we try to exploit the structure connectivity of neighbors to further optimize the accuracy of outliers, which are identified via the distance dynamics model. The concept of structure connectivity has been widely used in community detection [12]. Contrary to traditional methods [12], [13], we take the triangles instead of the edges as the basic indicator of a strong relation in the graph. The main reason we choose the triangle structure connectivity is that real-world graphs such as social networks contain more triangles than random graphs and have many triangles in a community [30]. Moreover, triangles are known as fundamental building blocks of a network. In a network, a triangle implies a strong tie among three nodes or the existence of a common node between the other two nodes. In this section, we introduce two outlier optimization rules: triangle adjacency and triangle connectivity. Based on these optimization rules, we propose an outlier postprocessing method. If the triangles of an outlier satisfy triangle adjacency and all adjacent triangles belong to the same community, this outlier should cluster with its triangles. For that, we define triangle adjacency.

Definition 6 (Triangle adjacency). Two triangles, namely, ∆_o and ∆_c, in G = (V, E) are adjacent if and only if ∆_o and ∆_c share a common edge, which is denoted by ∆_o ∩ ∆_c = e(x, y) ∈ E(G).

For the graph G in Figure 4(a), ∆_{5,14,15} and ∆_{5,8,15} are adjacent as they share a common edge: e(5, 15). Based on triangle adjacency, we propose the first optimization rule.

Optimization Rule 1. Given the set C of clusters in a graph G, a vertex u that is not in any cluster in C is not an outlier vertex if its triangles are only adjacent to other triangles that are in the same community, which is denoted as C_i. In this case, the outlier u belongs to community C_i (u ∈ C_i).

To describe the rule more clearly, we consider a simple example. In Figure 4(a), a simple social network is illustrated. By using the robust distance dynamics model, we find 3 classes and 2 outliers, where all nodes with the same color belong to the same community and the two green nodes are the outliers. The triangles of outlier 5, namely, ∆_{5,14,15} and ∆_{5,8,15}, are adjacent to triangles ∆_{11,14,15} and ∆_{8,10,15}, respectively. Furthermore, triangles ∆_{11,14,15} and ∆_{8,10,15} both belong to the blue community. Hence, optimization rule 1 is satisfied. Therefore, node 5 is not an outlier and should be merged into the blue community. Unlike node 5, node 22 does not satisfy optimization rule 1. Therefore, node 22 is an outlier and should not be merged into the purple community.

In many real-world networks, there are typically hub nodes [12] that bridge many clusters but do not belong to any cluster. Outliers are nodes that are neither cluster members nor hubs. Each outlier is only weakly associated with a cluster. If the triangles of an outlier are connected and belong to different
communities, this node is not an outlier; it is a hub. For that, we define triangle connectivity.

FIGURE 4: Illustration of the optimization model. Panels: (a) Triangle Adjacent; (b) Triangle Connected.

Definition 10 (Triangle connectivity). Two triangles, namely, ∆_o and ∆_c, in G = (V, E) are connected if and only if ∆_o and ∆_c share only one common node, which is denoted by ∆_o ∩ ∆_c = u ∈ V(G).

For the graph G in Figure 4(b), ∆_{5,8,15} and ∆_{5,17,18} are connected as they share a common node: node 5. Based on triangle connectivity, we propose the second optimization rule.

Optimization Rule 2. Given the set C of clusters in a graph G, a vertex u that is not in any cluster in C is not an outlier vertex if its triangles are connected to other triangles that are in different communities, which are denoted as C_i and C_j. In this case, node u is not an outlier, but a hub.

To describe the rule more clearly, let us consider a simple example. In Figure 4(b), a simple social network is illustrated. By using the robust distance dynamics model, we identify 3 classes and 2 outliers, where all nodes with the same color belong to the same community and the two green nodes are the outliers. Triangles ∆_{5,17,18} and ∆_{5,8,15} of outlier 5 are connected to triangles ∆_{17,19,20} and ∆_{10,11,15}, respectively. However, triangles ∆_{17,19,20} and ∆_{10,11,15} belong to the purple community and the blue community, respectively. Hence, optimization rule 2 is satisfied. Therefore, node 5 is not an outlier; it is a hub. Unlike node 5, node 22 does not satisfy optimization rule 2 and should be identified as an outlier. Since outliers have little or no influence on the clustering of a network, they should be isolated as noise.

Based on the two optimization rules, we propose an outlier postprocessing algorithm for further optimizing the outliers that were identified by the distance dynamics model, to enhance the accuracy of outlier identification. Before we describe the postprocessing algorithm, we make three important remarks: First, when an outlier is a leaf node, we do not change it. Second, when neighbors of an outlier are also outliers, they will be excluded from the neighbor set and not considered in the two optimization rules. Finally, in this paper, we consider a partition that has fewer than two nodes to be an outlier.

The outlier postprocessing algorithm is presented as Algorithm 1. First, the postprocessing method checks if the singleton node satisfies the triangle adjacency condition in Definition 6 (line 8). If the singleton node satisfies triangle adjacency, the method merges the singleton node into the community via optimization rule 1 (line 9). After that, the postprocessing method classifies the singleton nodes that do not belong to any community as either hubs or outliers. If a singleton node satisfies the definition of triangle connectivity, it is regarded as a hub according to optimization rule 2 (line 11); otherwise, it is regarded as an outlier (line 13).

Algorithm 1: Outlier postprocessing
 1: Input: Rough communities C_R, and outliers O_R;
 2: Output: Final communities C, hubs H, and outliers O;
 3: Procedure: Outlier_Optimization(C_R, O_R);
 4: // Initialization.
 5: Set C = C_R, H = ∅, O = ∅;
 6: // Handling Outliers.
 7: for each outlier o ∈ O_R do
 8:     if o satisfies optimization rule 1 then
 9:         merge o into the community in C that contains its adjacent triangles;
10:     else if o satisfies optimization rule 2 then
11:         label o as a hub and add it to node set H;
12:     else
13:         label o as an outlier and add it to node set O;
14:     end if
15: end for
16: Return: C, H, O;

E. THE ATTRACTOR++ ALGORITHM
In this section, we discuss the main components of our algorithm. The proposed algorithm, namely, Attractor++, consists of three stages: the distance initialization stage, the dynamic clustering stage and the cluster refinement stage. The output of the dynamic clustering method is the input of the cluster refinement method. In the dynamic clustering stage, Attractor++ roughly clusters the specified graph. At this stage, the communities and outliers have been roughly identified. After identifying the candidate clusters and outliers, Attractor++ refines the clusters by isolating hubs and outliers in the cluster refinement stage.

The pseudocode of our proposed method, namely, Attractor++, is given in Algorithm 2. First, Attractor++ runs the distance initialization stage (lines 5-8). At the initial time (t = 0), without any interaction, all the edges are associated with an initial distance via the Jaccard-distance function, which is expressed in Eq. 2 and Eq. 3 (line 7). Then, the dynamic clustering stage begins (lines 10-31); it relies on the three interaction patterns (DI, CI, and REI), and the distances among nodes change gradually as time evolves (lines 11-24). Because the dynamic membership degree is used to determine the underlying influences, nodes that share the same community tend to synchronize and the distances between these nodes decrease. By contrast, nodes that are in different communities will separate and the distances between these nodes will increase. As a result, all distances will converge to either 0 or 1 and the communities and outliers will be roughly identified by removing all edges
with distances of 1 (lines 26-30). Finally, the cluster refinement process is executed (line 33). After classifying all singleton nodes as community members, hubs, or outliers, Attractor++ terminates the community detection procedure.

Algorithm 2: Attractor++
 1: Input: An undirected Graph G = (V, E, W);
 2: Output: Final communities C, hubs H, and outliers O;
 3: Procedure: Attractor++(G);
 4: // Stage 1: Distance Initialization
 5: Set C_R = ∅, O_R = ∅, C = ∅, H = ∅, O = ∅;
 6: for each edge e = (u, v) ∈ E do
 7:     compute the distance d(u, v, t_0) via Eq. (2) or Eq. (3);
 8: end for
 9: // Stage 2: Dynamic Clustering
10: while any edge has not converged to 0 or 1 do
11:     for each edge e = (u, v) ∈ E do
12:         if 0 < d(u, v, t_i) < 1 then
13:             compute DI(u, v, t_i) via Eq. (6);
14:             compute CI(u, v, t_i) via Eq. (7);
15:             compute REI(u, v, t_i) via Eq. (9);
16:             update distance d(u, v, t_{i+1}) via Eq. (10);
17:         end if
18:         if d(u, v, t_{i+1}) ≤ 0 then
19:             d(u, v, t_{i+1}) = 0;
20:         end if
21:         if d(u, v, t_{i+1}) ≥ 1 then
22:             d(u, v, t_{i+1}) = 1;
23:         end if
24:     end for
25: end while
26: for each edge e = (u, v) ∈ E do
27:     if d(u, v, t_{i+1}) = 1 then
28:         remove edge e from graph G;
29:     end if
30: end for
31: the communities C_R and outliers O_R are roughly identified.
32: // Stage 3: Cluster Refinement
33: use Algorithm 1, Outlier_Optimization(C_R, O_R), to obtain C, H, and O.
34: Return: C, H, O;

F. COMPLEXITY ANALYSIS
In this section, we analyze the computational complexity of algorithm Attractor++. Given a graph with |V| nodes and |E| edges, Attractor++ finds all communities, hubs and outliers without any parameter settings. For the time complexity analysis, Attractor++ is divided into three parts: initialization, dynamic clustering and cluster refinement. First, each edge is associated with an initial distance; thus, the computation time is O(|E|). After that, for dynamic clustering, Attractor++ must compute the corresponding core membership degrees and border membership degrees for exclusive neighbors at each time step. Thus, the time complexity of this process is O(T · K · |E|), where T is the number of time steps and K is the average number of exclusive neighbors of two linked nodes. Finally, the outliers must be further optimized. The time complexity of this step is O(D · |O|), where D is the average degree of the graph and |O| is the number of outliers that are identified in the dynamic clustering phase. In total, the time complexity of Attractor++ is O(|E| + T · K · |E| + D · |O|).

IV. EXPERIMENTS
In this section, we perform extensive experiments to evaluate the effectiveness and efficiency of the proposed algorithm using a variety of synthetic and real-world networks.

A. EXPERIMENTAL SETUP
Baseline. To evaluate the performance of Attractor++, we compare it to four representative community detection algorithms. All comparison algorithms are listed in Table 1, of which the Louvain algorithm has been recognized as a state-of-the-art community detection algorithm, the LPA algorithm shows linear time complexity for community detection, and the E-Attractor algorithm is a state-of-the-art algorithm that was extended from Attractor and provides a parameter-insensitive distance dynamics model that is based on Ego-Leader. In addition, the Attractor algorithm, as the native algorithm that is based on the traditional distance dynamics model, is indispensable. For all community detection algorithms, unless otherwise stated, the recommended default parameter values are used to obtain the best experimental results.

TABLE 1: Comparison Algorithms.
Algorithm          Type                         Implementation
Louvain [11]       modularity-based algorithm   Python
LPA [16]           dynamic algorithm            Python
Attractor [15]     dynamic algorithm            Python
E-Attractor [31]   dynamic algorithm            Python
Attractor++        dynamic algorithm            Python

Experimental Platform. To simulate the performances of all algorithms on both real and synthetic graphs, we rented a high-performance server (IBM x3650 m4) from the National Super Computing Center of Changsha, which is located in Hunan province, China. The server comprises one CPU with 8 cores (Intel Xeon Processor E5-2603) and 16 GB main memory. All algorithms are run on the high-performance server using the Windows Server 2012 operating system. The Attractor, E-Attractor and Attractor++ algorithms are implemented in Python. For the other two algorithms, we have downloaded the Python implementations from the official websites of the corresponding authors.

Evaluation Metrics. To extensively compare the community detection algorithms in terms of effectiveness, we adopt the following three quality measures:

NMI: The first metric is the normalized mutual information (NMI [32]), which is based on information theory and compares the similarity between the memberships of two
communities. It has been most widely used to measure the quality of a community when the ground truth is known. The NMI provides a real number that ranges from 0 to 1 via normalization. If the detected communities are completely independent of the real communities, then NMI=0; if the detected communities are identical to the real communities, then NMI=1.

F-measure: We also use F-measure [33] to quantify the performances of the identified communities. F-measure is a commonly used criterion for community detection algorithms when the community ground truth is known. F-measure provides a real number that is between zero and one and combines recall and precision. A poorly performing community detection algorithm should be associated with a low F-measure. The higher the F-measure value, the better the algorithm performs.

ARI: The Adjusted Rand Index (ARI [34]) is selected as the third metric for all algorithms. ARI measures the similarity between two clustering results (the agreement on whether to put two nodes in the same cluster or in different clusters). ARI has a value that is between 0 and 1, where 1 indicates that the two clustering results are identical. If the detected communities are poor, then ARI=0.

B. SYNTHETIC NETWORKS
1) Network Generation
To evaluate the performance and the sensitivity to community structure of the selected algorithms, we investigated the results on synthetic networks that were generated by the Lancichinetti Fortunato Radicchi (LFR) benchmark. The network generation model, namely, LFR(N, C, k, k_max, µ, ...), has five important parameters: N is the number of nodes in the network, C is the number of communities, k is the average degree of the nodes, k_max is the maximum degree of the nodes, and µ is the mixing parameter, which indicates the proportion of a node's neighbors that reside in other communities. Typically, the larger the mixing parameter of a network is, the more difficult it is to identify the intrinsic communities. Furthermore, the average degree and community size follow power-law distributions. By varying the parameters of the LFR benchmark, we can analyze the performances of the algorithms in detail. In these experiments, we generate eight synthetic networks with ground-truth information. The values of the parameters for the generated networks are listed in Table 2.

TABLE 2: Synthetic networks and parameters for the LFR benchmark.
Network   N       C     k    k_max   µ      Edges
LFR1      200     4     15   18      0.25   3208
LFR2      800     8     15   18      0.25   12453
LFR3      2000    20    20   25      0.2    35268
LFR4      4000    50    20   25      0.2    79011
LFR5      10000   80    10   12      0.15   107056
LFR6      20000   120   10   12      0.15   216717
LFR7      40000   180   12   15      0.10   453054
LFR8      60000   250   12   15      0.10   674128

To make the synthetic networks more consistent with the real-world networks, we generate eight networks with various numbers of communities. By setting parameters k, k_max and µ, we ensure that all synthetic networks have different average degrees, maximum degrees of nodes and noise edges in each community.

2) Community Detection Performance
We evaluate the community detection performances of various algorithms on LFR synthetic networks. Then, the average values of NMI, F-measure, ARI and running time are calculated. The experimental results are shown in Figure 5.

FIGURE 5: Community detection performances of various algorithms on LFR networks. Panels: (a) NMI vs. networks; (b) F-measure vs. networks; (c) ARI vs. networks; (d) running time (ms) vs. networks.

Figure 5(a) displays the NMI results of various algorithms on LFR synthetic networks, from which we make the following observations: (1) The five community detection algorithms yield satisfactory results and the average value of NMI exceeds 0.6. (2) Comparing the five algorithms, we find that Attractor++ and E-Attractor offer better efficiency and stability, followed by Attractor and Louvain; the LPA algorithm performs worst. (3) Focusing on the Attractor, E-Attractor and Attractor++ algorithms, we find that Attractor++ and E-Attractor are more stable than Attractor on most LFR networks and E-Attractor has very similar performance to Attractor++.

Figure 5(b) displays the F-measure of various algorithms on LFR synthetic networks, from which we make the following observations: (1) On the high-noise networks (LFR1 to LFR4, mixing parameter µ ≥ 0.2), the F-measure fluctuation is substantial for all five algorithms, of which Attractor++ and E-Attractor perform best, followed by Attractor and Louvain, and LPA performs worst. (2) On the low-noise networks (LFR5 to LFR8, mixing parameter µ < 0.2), the performances of the five algorithms are similar.

Figure 5(c) displays the ARI values of the five algorithms on LFR synthetic networks, from which we make the following observations: (1) For the ARI, the differences among the five algorithms are substantial and the performances of the algorithms are unstable among the LFR networks. (2) Comparing the five algorithms, we find that Attractor++ offers higher efficiency and stability.
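To make the ARI comparison concrete, the following is a minimal pure-Python sketch of the standard pair-counting form of the Adjusted Rand Index. It is an assumption that this standard form matches the definition in [34]; the paper does not spell the formula out, and the function name `adjusted_rand_index` and the list-of-labels input format are hypothetical. Note that this standard form can return slightly negative values when the agreement is worse than chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat partitions given as label lists.

    labels_true[i] and labels_pred[i] are the ground-truth and detected
    community labels of node i (illustrative input format).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same nodes")
    n = len(labels_true)

    # Contingency counts: nodes per (true community, detected community) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of node pairs placed together in both / in each partition.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two nodes: nothing to disagree on

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case: both partitions are all-in-one or all-singletons.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Identical partitions up to a relabeling of the communities.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# One detected community is split in two relative to the ground truth.
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Because the index only counts node pairs, it is invariant to how the community labels are named, which is why the first call treats the swapped labels as a perfect match.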


Figure 5(d) displays the running times of the five algorithms on the LFR synthetic networks, from which we make the following observations: (1) Comparing the LPA and Louvain algorithms, we observe that the running time of LPA is 3 to 10 times shorter than that of Louvain on average and a few orders of magnitude shorter than that of Attractor. (2) The running times of the Attractor, E-Attractor and Attractor++ algorithms are very similar. Attractor is slightly faster than E-Attractor and E-Attractor is slightly faster than Attractor++.

In summary, for synthetic networks, Attractor++ performs well in identifying ground-truth communities and is more robust to noise than the other algorithms.

3) Outlier Optimization Performance
In the following, we compare the clustering accuracies of Attractor++ and Attractor on various synthetic networks. Figure 6 presents the outlier optimization results on the eight LFR networks. In Figure 6, the "before" column lists the number of outliers (#O) that are identified by Attractor without the outlier postprocessing, the "after" column lists the number of outliers (#O) that are detected by Attractor++ with the outlier postprocessing, and the red number indicates the percentage reduction of #O. As shown in Figure 6, Attractor (the distance dynamics model) easily finds many outliers: the numbers of identified outliers exceed 450 on the LFR-7 and LFR-8 networks. By using the outlier postprocessing, the number of identified outliers can be substantially reduced and the accuracy of outlier identification enhanced. For example, on the four high-density and noisy networks (LFR-1 to LFR-4, parameter µ ≥ 0.2), the outlier optimization percentages (reducing #O) are very large: 71%, 63%, 36% and 52%, respectively. On the four low-density and low-noise networks (LFR-5 to LFR-8), the outlier optimization percentages are slightly lower: 51%, 27%, 46%, and 34%, respectively. Considering all eight LFR networks, we find that the distance dynamics model faces the drawback of easily producing many rough outliers and the outlier optimization step is highly necessary for the Attractor++ algorithm. Moreover, according to Figure 6, our proposed outlier postprocessing has a very good performance, which further demonstrates the effectiveness of outlier postprocessing.

FIGURE 6: Outlier optimization performances on LFR networks. Each panel plots the number of outliers (#O) before and after postprocessing.

C. REAL-WORLD NETWORKS
1) Network Description
To evaluate the performance and efficiency, it is necessary to conduct experiments on real-world networks. We compare the performances of algorithms in terms of the accuracy and speed on networks with accurate ground-truth communities. Six commonly used real-world networks are considered in the experiments; the characteristics of the networks are listed in Table 3. These networks are selected because they are very well-known and contain the real structures of communities, which can be used to evaluate the results of each algorithm with a desirable accuracy. Moreover, the six selected real-world networks vary in terms of graph density: polbooks, football and polblogs are dense networks, whereas karate, adjnoun and DBLP are sparse networks. All selected real-world networks are publicly available from the UCI network data repository (https://networkdata.ics.uci.edu/index.php) and the Stanford large network dataset collection (http://snap.stanford.edu/data/).

TABLE 3: Characteristics of commonly used real-world networks.
Network    Nodes    Edges     Average degree   Communities
karate     34       78        4.6              2
polbooks   105      441       8.4              3
adjnoun    112      425       7.6              2
football   115      613       10.7             12
polblogs   1490     19090     22.4             2
DBLP       317080   1049866   6.6              13477

2) Community Detection Performance
We evaluate the community detection performances of various algorithms on real-world networks. Then, the average values of NMI, F-measure, ARI and running time are calculated. The experimental results are shown in Figure 7.

Figure 7(a) displays the NMI results of the algorithms on real-world networks, from which we make the following observations: (1) In terms of NMI, the five algorithms yield different results. Of particular interest, on football and polblogs, which are high-density networks, Attractor++ obtains the best NMI, followed by E-Attractor and Attractor, and Louvain and LPA perform worst. (2) Focusing on the Attractor, E-Attractor and Attractor++ algorithms, we find that Attractor++ is more stable than Attractor and E-Attractor on most real-world networks and E-Attractor is more stable than Attractor.

Figure 7(b) displays the F-measure of the algorithms on real-world networks, from which we make the following observations: (1) In terms of F-measure, the differences among the five algorithms are significant, where Attractor++ outperforms the other four algorithms in terms of average F-measure. (2) Focusing on the Attractor, E-Attractor and
Attractor++ algorithms, the performance of E-Attractor is more stable than that of Attractor and that of Attractor++ is more stable than that of E-Attractor.

Figure 7(c) displays the ARI values of the algorithms on real-world networks, from which we make the following observations: (1) In terms of average ARI, LPA performs significantly worse than the other four algorithms. (2) Comparing the five algorithms, we find that Attractor++ has the highest efficiency and stability on most real-world networks.

Figure 7(d) displays the running times of the algorithms on real-world networks, from which we make the following observations: (1) The average running time of the LPA algorithm is slightly shorter than that of the Louvain algorithm. (2) The running times of the LPA and Louvain algorithms are a few orders of magnitude shorter than those of the Attractor, E-Attractor and Attractor++ algorithms. (3) The running times of the Attractor, E-Attractor and Attractor++ algorithms are very similar. Specifically, Attractor is slightly faster than E-Attractor and E-Attractor is slightly faster than Attractor++.

In summary, Attractor++, E-Attractor, Attractor and Louvain outperform LPA on both the sparse real-world networks (karate, adjnoun and DBLP) and the high-density real-world networks (polbooks, football and polblogs). In addition, on some real-world networks, Attractor++ outperforms the Attractor and E-Attractor algorithms.

FIGURE 7: Community detection performances of various algorithms on real-world networks. Panels: (a) NMI vs. networks; (b) F-measure vs. networks; (c) ARI vs. networks; (d) running time (ms) vs. networks.

3) Outlier Optimization Performance
In this subsection, we compare Attractor++ with Attractor in terms of clustering accuracy on various real-world networks. Figure 8 presents the outlier optimization results on the six real-world networks.

In Figure 8, the "before" column lists the numbers of outliers (#O) that are detected by Attractor without the outlier postprocessing, the "after" column lists the numbers of outliers (#O) that are identified by Attractor++ with the outlier postprocessing, and the red number indicates the percentage by which #O is reduced. According to Figure 8, the distance dynamics model typically detects many outliers, e.g., on the polblogs and DBLP networks. By using our proposed outlier postprocessing method, the number of outliers that are identified by Attractor++ can be substantially reduced and the accuracy of outlier identification enhanced. For example, on the polbooks network, the outlier optimization percentage reaches 50%; on the adjnoun network, all outliers are optimized; and on the DBLP network, the percentage exceeds 60%. Considering all real-world networks, the distance dynamics model faces the drawback of easily producing many rough outliers and the outlier optimization step is highly necessary for the Attractor++ algorithm. Moreover, Figure 8 demonstrates the effectiveness of the proposed outlier postprocessing method.

FIGURE 8: Outlier optimization performances on real-world networks. Each panel plots the number of outliers (#O) before and after postprocessing.

D. TIME STEPS
In the distance dynamics model, the dynamics of each distance is simulated according to the three interaction patterns. Before all distances in the network converge (to either 0 or 1), the entire interaction process needs to go through multiple time steps. In this experiment, we compare the number of time steps of our proposed algorithm, namely, Attractor++, with that of the Attractor algorithm on two real-world networks (polblogs and DBLP).

FIGURE 9: Comparison of the number of time steps for convergence on the polblogs network.

Figure 9 shows the convergence ratio of the edges at each time step on the polblogs network. The green dashed line in Figure 9 indicates the time steps when at least 99% of the edges have converged. Attractor++ is much faster than the original algorithm, namely, Attractor. For the Attractor
algorithm, it takes at least 9 time steps for 99% of the distances to converge, and all distances converge after 34 time steps. However, for the Attractor++ algorithm, all distances converge after 17 time steps, and it takes only 6 time steps for 99% of the distances to converge. Although the total number of time steps of Attractor++ is much smaller than that of Attractor, the convergence speed of Attractor++ is slower than that of Attractor at the early time steps. The convergence speed of Attractor, however, decreases as the number of time steps increases, whereas the convergence speed of Attractor++ increases. For example, after one time step, nearly 42% of the distances have converged when the Attractor algorithm is used, compared to only 23% of the distances for the Attractor++ algorithm. After six time steps, nearly 99% of the distances have converged with the Attractor++ algorithm, compared to nearly 98% of the distances with the Attractor algorithm. The main reason is that Attractor adopts a global parameter setting to determine the underlying influence of exclusive neighbors on the distance, but the structures of the communities are constantly changing with the convergence of new distances. In contrast, Attractor++ adopts the dynamic membership degree to determine the underlying influence of exclusive neighbors on the distance. A similar result can easily be obtained from Figure 10; it is not discussed due to space limitations.

FIGURE 10: Comparison of the number of time steps on the DBLP network.

E. CASE STUDIES
To evaluate the effectiveness of our robust distance dynamics model, we select two well-known real-world networks, namely, dolphins (without class labels) and polbooks (with ground truth), for case studies. These real-world networks are publicly available from the UCI data repository at https://networkdata.ics.uci.edu/index.php.

The first network is the dolphins social network, which consists of 62 vertices and 159 undirected edges. There are two communities in the dataset but no class label information. Each node in the network represents a dolphin that lives in New Zealand. If two dolphins are in contact frequently, there is an edge between their two nodes. Figure 11 shows the detection results that were obtained by the Attractor algorithm, which identified 2 communities and 11 outliers. In Figure 11, all nodes that are the same color belong to the same community and the 11 green nodes are outliers. Of the 11 outliers that were identified by the Attractor algorithm, 7 are leaf nodes and we cannot intuitively decide whether these nodes are real outliers. For the other 4 outliers, we can use the outlier postprocessing algorithm to further optimize the results.

FIGURE 11: Case study on the dolphins network using the Attractor algorithm. [2 classes, 11 outliers (green nodes)]

FIGURE 12: Case study on the dolphins network using the Attractor++ algorithm. [2 classes, 8 outliers (green nodes), 7 of them leaf nodes]

Figure 12 shows the detection results that are obtained by the Attractor++ algorithm, which identifies 2 communities and 8 outliers. According to Figure 12, the 3 outliers (nodes 36, 44 and 55) in the red dashed circle are optimized, and the other 8 outliers are filtered out in the optimization process and remain outliers. Specifically, nodes 4, 11, 12, 22, 31, 35 and 48 are leaf nodes and do not satisfy the optimization rules. Nodes 36 and 44 should be merged into the light-blue community because all the triangles of these two nodes are only adjacent to the light-blue community. Similar to nodes 36 and 44, node 55 should be merged into the light-red community. Because node 39 is not in any of the triangles, it is an outlier.

FIGURE 13: Case study on the polbooks network. Panels: (a) ground truth, (b) Attractor, (c) Attractor++.
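The outlier decisions described for the dolphins network (leaf nodes stay outliers, a node that lies in no triangle stays an outlier, and a node whose triangles are all adjacent to a single community is merged into it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `postprocess_outlier`, the data layout, the toy graph and, in particular, the "hub" branch for triangles that reach several communities are assumptions made for illustration only.

```python
from itertools import combinations


def postprocess_outlier(node, adjacency, community_of):
    """Classify one outlier as a community label, "hub" or "outlier".

    adjacency    -- dict: node -> set of neighbours (undirected graph)
    community_of -- dict: node -> community label (outliers are absent)
    """
    neighbours = adjacency[node]
    if len(neighbours) < 2:
        return "outlier"  # a leaf node cannot belong to any triangle

    in_triangle = False
    touched = set()  # communities of the other vertices in node's triangles
    for u, v in combinations(neighbours, 2):
        if v in adjacency[u]:  # node, u and v form a triangle
            in_triangle = True
            touched.update(community_of[w] for w in (u, v) if w in community_of)

    if not in_triangle or not touched:
        return "outlier"
    if len(touched) == 1:
        return next(iter(touched))
    return "hub"  # assumption: triangles reach more than one community


# Toy graph (hypothetical, not the dolphins data).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4),
         (7, 1), (7, 2), (8, 3), (9, 1), (9, 4),
         (10, 2), (10, 3), (10, 4), (10, 5)]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)
community_of = {1: "blue", 2: "blue", 3: "blue", 4: "red", 5: "red", 6: "red"}

for outlier in (7, 8, 9, 10):
    print(outlier, postprocess_outlier(outlier, adjacency, community_of))
```

In this toy graph, node 7 is merged into the blue community, node 8 is a leaf and node 9 lies in no triangle (both remain outliers), and node 10 is labeled a hub under the stated assumption.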


The second network is the Books About US Politics network, which is referred to as the polbooks network and consists of 105 nodes and 441 edges. There are three communities in the network and ground-truth information is available. Each node in the network represents a book about US politics. An edge between two books indicates that they are often purchased together by customers. Figure 13(a) shows the ground truth of the polbooks network, which covers 3 clusters. Figure 13(b) shows the detection results that were obtained by the Attractor algorithm, which identified 4 communities and 6 outliers (red dashed circle). Figure 13(c) shows the detection results that were obtained by the Attractor++ algorithm, which identified 3 communities and 1 hub (red dashed circle). Comparing Figure 13(b) and Figure 13(c) with the ground truth in Figure 13(a), Attractor++ performs better in identifying ground-truth communities.

Based on the above two case studies, we make the following remarks: (1) Our algorithm, namely, Attractor++, can effectively identify vertices that have special roles (hubs and outliers). (2) Our robust distance dynamics model, which is based on the dynamic membership degree, is effective on various networks.
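A comparison against ground truth such as the one made for polbooks can also be quantified. The sketch below computes the adjusted Rand index (ARI), one of the metrics reported in Figure 7, from two label sequences using the standard pair-counting formula. The function name and the toy labelings are assumptions for illustration; they are not the polbooks labels, and how outliers and hubs should be labeled before scoring is left open here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """ARI between two labelings given as equal-length sequences."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no vertex pairs to compare

    # Pair counts inside contingency cells, ground-truth classes and clusters.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form one cluster
    return (sum_cells - expected) / (maximum - expected)


# Toy labelings (hypothetical): three true classes of three vertices each.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
same_up_to_renaming = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
one_vertex_split_off = [0, 0, 0, 1, 1, 3, 2, 2, 2]

print(adjusted_rand_index(truth, same_up_to_renaming))
print(adjusted_rand_index(truth, one_vertex_split_off))
```

A partition that matches the ground truth up to a renaming of labels scores 1, while splitting one vertex into an extra cluster lowers the score, which mirrors the qualitative difference between a 3-community and a 4-community result.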
V. CONCLUSIONS
In this paper, we have presented the novel concept of dynamic membership degree. It enables us to avoid strong dependence on the cohesion parameter λ. Thus, we can conveniently identify high-quality communities. Based on this concept, a robust distance dynamics model has been developed, along with a robust community detection algorithm: Attractor++. Moreover, to improve the accuracy of outlier node identification, we further propose two optimization rules for judging whether an outlier should be merged into the same community as its triangles or be classified as a hub. We conduct extensive experiments on both synthetic and real-world networks, and the results demonstrate the effectiveness and efficiency of the proposed algorithm.

However, complex networks in the real world change dynamically over time and their community structures are dynamically updated. In the face of dynamic networks with complex changes, designing dynamic community discovery algorithms that are based on distance dynamics models requires further study. In addition, multiobjective optimization, game theory, statistics and other theories can be used in dynamic community discovery scenarios to design better-performing dynamic community discovery algorithms.

ACKNOWLEDGMENT
Tao Meng and the other authors thank the National Super Computing Center of Changsha, located in Hunan Province, China, for providing the experimental equipment.