FOCS Fast Overlapped Community Search
Abstract—Discovery of natural groups of similarly functioning individuals is a key task in analysis of real world networks. Also, overlap
between community pairs is commonplace in large social and biological graphs, in particular. In fact, overlaps between communities
are known to be denser than the non-overlapped regions of the communities. However, most of the existing algorithms that detect
overlapping communities assume that the communities are denser than their surrounding regions, and falsely identify overlaps as
communities. Further, many of these algorithms are computationally demanding and thus, do not scale reasonably with varying
network sizes. In this article, we propose Fast Overlapped Community Search (FOCS), an algorithm that accounts for local
connectedness in order to identify overlapped communities. FOCS is shown to be linear in the number of edges and nodes. It
additionally gains in speed via simultaneous selection of multiple near-best communities, rather than merely the best, at each iteration.
FOCS outperforms some popular overlapped community finding algorithms in terms of computational time without compromising
quality.
Index Terms—Overlapping community search, social network, local heuristic, complex network
1 INTRODUCTION
school/college classmates. In Fig. 1, for example, for the node marked a in the overlapped region of the two communities A and B,
only a very small fraction of its neighbors, i.e., only five of 16, are in community B. LPAs, in this case, will assign only label A
to node a, resulting in two disjoint communities. The DEMON algorithm [28] extracts the local network of each node, applies a label
propagation algorithm to each of them, and finally takes the union of the obtained communities to get the overlapped community
structure. The algorithm, however, suffers from the same limitation as an LPA.
Local spectral clustering based methods have also found application in overlapped community detection [29], [30]. These methods
usually require an upper bound on the number of communities as input. They usually first approximately embed the graph in
$d \ll n$ dimensions (where $n$ is the number of nodes) using spectral clustering. Following this, the points in the low-dimensional
space are clustered using simpler existing clustering methods. However, computation of eigenvalues/eigenvectors for spectral
clustering is computationally expensive. Efforts to parallelize the computation in MapReduce in [30] still show limited scalability.
In [22], [31], the problem of overlapped community detection in social networks has been addressed using a game theoretic
framework, where the dynamics of community formation have been captured as a strategic game. Here, each node, a selfish agent in
disguise, selects the communities to join or leave, based on its definition of utility. Utility is usually a combination of gain
and loss functions. In [22], for example, increase in modularity has been formulated as the gain function, whereas the number of
communities a node joins is the input parameter to the loss function. There are other methods that solve the community detection
problem for social networks based on a cost-benefit trade-off [23]. They mostly add or remove nodes iteratively from a community,
or merge communities, in order to improve the benefits and reduce the costs incurred by a node. Many of these approaches restrict
the number of communities a node participates in [18], [21], [22], [20], [32], which is not the case in real networks [14].
Although the aforementioned methods are simple and fast, they mostly find disjoint clusters. The ones that find overlapped clusters
are mostly computationally demanding, and still restrictive. This makes them inapplicable to large scale real networks. FOCS, on
the other hand, is a fast algorithm that evolves on the basis of some locally computed scores to discover overlapped communities.
It scales well over large social networks. It additionally gains in speed via simultaneous selection of multiple near-best
communities rather than merely the best. This helps to save a number of iterations. Moreover, the communities detected by the
method are not limited to a particular hierarchical level, but rather are inclusive of all meaningful communities in the given
network. Furthermore, the method is deterministic, i.e., the results are not dependent on the sequence in which the nodes are
considered. This is a problem in [9], [21], [22], [23], [25], [31].
3 METHOD
3.1 Problem Definition
We are given an undirected, unweighted graph $G(V, E)$. The graph is assumed to be simple (without self-loops or parallel edges).
The problem of community detection is to find a family of subgraphs $S = \{S_i \mid S_i \subseteq V\}$ such that any node $v_j$ in
a subgraph $S_i$ is more connected in the subgraph $S_i$ than in another subgraph $S_{j'}$. Here,
$S_{j'} = (S_k \mid v_j \notin S_k \wedge S_k \in S)$ is any subgraph in family $S$ not containing node $v_j$. Each subgraph
$S_i \in S$ is a community.
For each node $v_j$, $\forall j \in \{1, 2, \ldots, |V|\}$, let $S(v_j) = \{S_i \mid v_j \in S_i \wedge S_i \in S\}$ be the
collection of communities containing node $v_j$. Further, let $S'(v_j) = S \setminus S(v_j)$ be the collection of communities not
containing node $v_j$. If each node $v_j$ belongs to exactly one community, or to no community at all, i.e., $|S(v_j)| \le 1$, then
the clustering is called disjoint; otherwise it is called overlapped. The FOCS algorithm, proposed in this paper, explores
overlapped clusters in a given graph.
3.2 Connectedness
As has already been mentioned in Section 3.1, each node $v_j$, $\forall j \in \{1, 2, \ldots, |V|\}$, is more connected to any
community in $S(v_j)$ than to any of the communities in $S'(v_j)$. Consequently, we say that $v_j$ is equally well connected to all
the communities in $S(v_j)$. This gives the working principle of FOCS.
Let $N(v_j)$ be the set of neighbors of a node $v_j \in V$, i.e.,
$N(v_j) = \{v_k \mid (v_j, v_k) \in E\}.$ (1)
Now, let $N_i(v_j)$ be the within community neighborhood of node $v_j$, defined for community $S_i \in S(v_j)$ as follows:
$N_i(v_j) = \{v_k \mid (v_j, v_k) \in E \wedge v_k \in S_i\}.$ (2)
FOCS defines the connectedness of a node with respect to its community as the ratio of the size of its within community
neighborhood to the size of the community minus 1. An individual, thus, is considered to be well connected within its community if
it has connections to most of the nodes in the community (apart from itself). The community connectedness score $\tilde{z}_{ij}$,
thus, assigned to each node $v_j$ in each community $S_i \in S$ is
$\tilde{z}_{ij} = \frac{|N_i(v_j)|}{|S_i| - 1}.$ (3)
Further, to ensure that a node in any community has at least $K$ neighbors within the community [33], Equation (3) has been modified
to define the community connectedness score $z_{ij}$ as follows:
$z_{ij} = \begin{cases} \frac{|N_i(v_j)| - K + 1}{|S_i| - K}, & \text{if } |N_i(v_j)| > K, \\ 0, & \text{otherwise.} \end{cases}$ (4)
Reasonably, if $K$ is assigned a very large value, small but dense communities will be missed. On the other hand, a very small value
of $K$ allows discovery of sparser large communities and insignificant small communities. It is found that the algorithm is not
sensitive to low values of $K$ and performs consistently well over networks of varying sizes with $K = 2$. See Fig. 2 for the
variation in statistics of detected communities when FOCS is applied to the Amazon network with increasing values of $K$, and OVL
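As a minimal sketch (not the authors' implementation), the neighborhood sets of Equations (1) and (2) and the connectedness scores of Equations (3) and (4) could be computed as follows. The function names and the edge-list input format are assumptions made for illustration, and the threshold test follows Equation (4) as printed, i.e., a strict comparison with K.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def build_neighbors(edges: Iterable[Tuple[Node, Node]]) -> Dict[Node, Set[Node]]:
    """Equation (1): neighbor set N(v) for every node of a simple undirected graph."""
    neighbors: Dict[Node, Set[Node]] = {}
    for u, v in edges:
        if u == v:
            continue  # the graph is assumed simple, so self-loops are skipped
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def within_neighborhood(node: Node, community: Set[Node],
                        neighbors: Dict[Node, Set[Node]]) -> Set[Node]:
    """Equation (2): neighbors of `node` that lie inside `community`."""
    return neighbors.get(node, set()) & community


def raw_connectedness(node: Node, community: Set[Node],
                      neighbors: Dict[Node, Set[Node]]) -> float:
    """Equation (3): |N_i(v_j)| / (|S_i| - 1); assumes |S_i| >= 2."""
    return len(within_neighborhood(node, community, neighbors)) / (len(community) - 1)


def community_connectedness(node: Node, community: Set[Node],
                            neighbors: Dict[Node, Set[Node]], k: int = 2) -> float:
    """Equation (4): (|N_i(v_j)| - K + 1) / (|S_i| - K) if |N_i(v_j)| > K, else 0."""
    within = len(within_neighborhood(node, community, neighbors))
    if within <= k:
        return 0.0
    return (within - k + 1) / (len(community) - k)


# Hypothetical toy graph: node 1 has three neighbors inside the community
# {1, 2, 3, 4, 5} and one neighbor (node 6) outside it.
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5), (1, 6)]
toy_neighbors = build_neighbors(toy_edges)
toy_community = {1, 2, 3, 4, 5}
score_node_1 = community_connectedness(1, toy_community, toy_neighbors, k=2)
```

With K = 1 the expression in `community_connectedness` reduces to the ratio in `raw_connectedness`, which is a quick way to check the two formulas against each other.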
BANDYOPADHYAY ET AL.: FOCS: FAST OVERLAPPED COMMUNITY SEARCH 2977
TABLE 1
Status of a Node in a Community on Basis of Neighborhood Connectedness and Community Connectedness Scores
Fig. 6. Distribution of sum of number of neighbors in multiple communities for nodes of corresponding networks.
Amazon. It is the undirected network of Amazon product co-purchasing. Here, the product categories are hierarchically nested and
thus the corresponding network inherently organizes into an overlapping community structure. The products in the same ground-truth
community share a common function [34].
DBLP. It is the scientific collaboration network of DBLP computer science, where two authors are connected if they have published
at least one paper together. Here, publication venues, i.e., journals and conferences, are used as proxies for ground-truth
research communities. In such a network, members are related to each other through areas of research, and thus a highly overlapping
community structure is naturally observed [34].
YouTube. It is a video-sharing web site. Users can form friendships with each other and thus YouTube also depicts a social network.
The ground-truth communities considered are groups explicitly formed by users [34].
LiveJournal. It is a free on-line blogging community where users declare friendship with each other. Ground-truth communities are
groups explicitly created by users based on common interest topics, affiliations, and geographical regions. Other users in the
network then join some of these communities. Communities belong to one of the categories: culture, entertainment, expression,
fandom, life/style, life/support, gaming, sports, student life and technology [34].
Orkut. It is a free on-line social network where users form friendships with each other. Ground-truth communities are defined on a
basis similar to that of LiveJournal [34].
Yeast PPIN. The yeast interaction network is collected and combined from three different sources: Y2H-Union containing 2,930
interactions [41], 2,770 interactions from [42], and only the positive examples, i.e., the top 58 interactions, from [43].
Redundancies and self-loops are removed, resulting in a network of 2,705 interactions among 1,966 proteins. The set of protein
complexes considered as the true community set is CYC2008, collected from [44]. From the complexes in CYC2008, the proteins that
are not in the interaction dataset are removed, followed by elimination of complexes containing two or fewer protein subunits.
Following the filtration process, 137 out of 408 original complexes remain.
Human PPIN. The human protein interactome is the PCDq dataset collected from the Results of computational analysis section [45]. It
provides both the interaction network and the complexes (both experimentally verified and computationally predicted using DBClus).
The complete interaction network with 32,198 interactions among 9,268 proteins is used. The human protein complex dataset contains
1,078 complexes consisting of 3,759 proteins. Among these, only complexes that belong to either category I or category II are
considered. These are the complexes with a high number of experimentally verified proteins (for details see [45]). Further complexes of
Fig. 7. Distribution of standard deviation of number of neighbors in multiple communities for nodes of corresponding networks.
2982 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 11, NOVEMBER 2015
TABLE 3
Dataset Statistics
D: average degree, Dmax: maximum degree, C: number of communities, S: average community size, Smax: maximum community size, M: average number of
communities per node, Mmax: maximum number of communities any node participated in, F: fraction of nodes that participated in at least one community,
F2+: fraction of nodes that participated in more than one community. K denotes a thousand and m denotes a million.
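The statistics named in the legend above can be derived directly from an edge list and a list of ground-truth communities. The following is a minimal sketch under stated assumptions: the function name `dataset_statistics` is hypothetical, the node set is taken to be the nodes that appear in at least one edge, both inputs are non-empty, and M is averaged over all such nodes (the legend does not say whether nodes without a community are included).

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable


def dataset_statistics(edges: Iterable[Tuple[Node, Node]],
                       communities: List[Set[Node]]) -> Dict[str, float]:
    """Compute D, Dmax, C, S, Smax, M, Mmax, F and F2+ as named in the Table 3 legend."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)  # nodes that appear in at least one edge

    membership: Counter = Counter()  # number of communities each node belongs to
    for community in communities:
        for node in community:
            membership[node] += 1

    sizes = [len(community) for community in communities]
    per_node = [membership[v] for v in degree]  # 0 for nodes in no community

    return {
        "D": sum(degree.values()) / n,
        "Dmax": max(degree.values()),
        "C": len(communities),
        "S": sum(sizes) / len(sizes),
        "Smax": max(sizes),
        "M": sum(per_node) / n,
        "Mmax": max(per_node),
        "F": sum(1 for m in per_node if m >= 1) / n,
        "F2+": sum(1 for m in per_node if m >= 2) / n,
    }


# Hypothetical toy input: four nodes, five edges, two communities sharing node 3.
toy_stats = dataset_statistics(
    edges=[(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)],
    communities=[{1, 2, 3}, {3, 4}],
)
```

In the toy input only node 3 belongs to two communities, so F2+ is one quarter while F is one.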
size 2 or less are filtered out, resulting in a total of 1,221 complexes formed of 4,325 proteins.
The size of the networks ranges from hundreds of thousands to millions of nodes and on the order of a hundred million edges. The
number of ground-truth communities, community sizes and average node membership for the communities also range over a large scale.
Table 3 provides the specifics.
Protein complexes are coherent groups of proteins that bind at the same time and place to perform a particular function. A single
protein is known to often bind with multiple sets of proteins at different times and locations for different functions, thereby
resulting in overlapped complexes in the protein interactome. The protein interactomes available to date, though incomplete, are
expected to closely follow the complete interactome structure. The interactomes collected are the most complete available.
Table 4 reports the execution time taken by the various algorithms on the considered networks. Table 5 summarizes the results, with
each cell representing the NMI between the detected and the ground-truth communities.
FOCS is compared with seven widely used overlapped community detection algorithms, namely Greedy clique expansion (GCE) [10], MOSES
[18], OSLOM [38], COPRA [21], SLPA [25], Link Communities (LinkComm) [16] and BigClam [19]. Greedy clique expansion expands cliques
greedily to include edges such that within community edge density is improved. MOSES employs stochastic block model based community
detection. OSLOM finds communities based on the difference between the modularity of a candidate community and that of the same set
of nodes in a randomly generated network. In COPRA, each node updates its belonging coefficient and decides on its set of community
labels by averaging those of its neighbors in synchronous fashion. SLPA propagates community labels between nodes such that a
listener node receives and saves the most probable label among those sent by its neighbors, where each neighboring node sends a
label with probability proportional to its occurrence frequency in memory over multiple iterations. Link Communities, on the other
hand, performs agglomerative hierarchical clustering where similarity between nodes is a function of the commonalities in their
respective neighborhoods. BigClam employs a non-negative matrix factorization method along with block stochastic gradient descent
to optimize the model likelihood of explaining the links in the network based on the communities the nodes participate in.
The original implementations have been used for each of the listed algorithms. Further, they have been executed with their
parameters set to the default values, except for GCE, where the minimum cluster size is changed to 3 instead of 4. Additionally, for
LinkComm, SLPA, and COPRA, the communities of size 2 or less are filtered out. COPRA also requires one to set the maximum number of
communities a node can participate in as an input parameter, v, which was tested for values starting from 2, increasing by 1 each
time until the results became worse. The results reported are those with v set to values that yielded a community structure closely
matching the number of ground-truth communities for the different networks. Similarly, results from SLPA depend heavily on the
probability threshold parameter, r, which was tested for $r \in [0.01, 0.5]$ and chosen such that the number of output communities
was close to that reported
TABLE 4
Comparison of Time Taken in Detection of Communities by FOCS and by Seven of the Existing Algorithms
Each cell reports #Communities/Time Taken.
Networks | MOSES | GCE | OSLOM | COPRA | SLPA | LinkComm | BigClam | FOCS
Amazon | 30.2K/160 s | 25.9K/10 s | 18.7K/711 s | 8.4K/1183 s | 30.5K/456 s | 61.5K/14 s | 151K/1.18 h | 20.9K/2 s
DBLP | 46.4K/273 s | 22.6K/16 s | 22.2K/21 m | 14.9K/180 s | 22.2K/578 s | 78.4K/34 s | 39.6K/33 m | 24.2K/2 s
YouTube | 8K/1.9 h | - | - | 12K/238 s | 39.9K/104 m | 5.1K/1.5 h | 8K/1.4 h | 7K/52 s
LiveJournal | - | - | - | - | - | - | - | 0.2M/312 s
Orkut | - | - | - | - | - | - | - | 0.2M/48 m
Yeast PPIN | 76/18 s | 92/0 s | 74/3 s | 86/0 s | 243/1 s | 159/0 s | 137/1 s | 32/0 s
Human PPIN | 106/30 s | 284/4 s | 206/60 s | 26/1 s | 337/3 s | 436/1 s | 1,078/57 s | 114/0 s
The blanks in the table denote that the method was allowed to run for 4 hours before any result was generated, after which it was terminated. h, m, and s denote
hour(s), minute(s) and second(s) respectively. K denotes a thousand and M denotes a million.
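Because Table 4 mixes hours, minutes and seconds, comparing runtimes requires a common unit first. The sketch below is only an illustration: the helper names `to_seconds` and `speedup_over` are hypothetical, and the dictionary copies the Amazon row of Table 4. Rows whose baseline time is reported as 0 s (the PPIN networks) cannot be compared this way, since the ratio would divide by zero.

```python
from typing import Dict

# Seconds per unit, following the table footnote (h, m, s).
UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0}


def to_seconds(cell: str) -> float:
    """Convert a time such as '1.18 h' or '160 s' to seconds."""
    value, unit = cell.split()
    return float(value) * UNIT_SECONDS[unit]


def speedup_over(times: Dict[str, str], baseline: str = "FOCS") -> Dict[str, float]:
    """Ratio of each algorithm's runtime to the baseline's runtime."""
    base = to_seconds(times[baseline])
    return {
        name: to_seconds(cell) / base
        for name, cell in times.items()
        if name != baseline
    }


# Time entries of the Amazon row of Table 4.
amazon_times = {
    "MOSES": "160 s", "GCE": "10 s", "OSLOM": "711 s", "COPRA": "1183 s",
    "SLPA": "456 s", "LinkComm": "14 s", "BigClam": "1.18 h", "FOCS": "2 s",
}
amazon_speedup = speedup_over(amazon_times)
```

On the Amazon row this gives, for example, a ratio of 5 for GCE and 80 for MOSES relative to FOCS.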
TABLE 5
Comparison of NMI between Ground-Truth Communities and Communities Detected by FOCS and by Seven of the Existing Algorithms
Each cell reports NMI.
Networks | MOSES | GCE | OSLOM | COPRA | SLPA | LinkComm | BigClam | FOCS
Amazon | 0.2239 | 0.2164 | 0.1851 | 0.2076 | 0.1208 | 0.2558 | 0.2421 | 0.2075
DBLP | 0.153 | 0.1374 | 0.1276 | 0.1484 | 0.1191 | 0.2112 | 0.1448 | 0.2135
YouTube | 0.0127 | - | - | 0.0150 | 0.0025 | 0.0161 | 0.0008 | 0.0225
LiveJournal | - | - | - | - | - | - | - | 0.0307
Orkut | - | - | - | - | - | - | - | 0.0611
Yeast PPIN | 0.1064 | 0.1322 | 0.0481 | 0.1236 | 0.0502 | 0.1148 | 0.004 | 0.1284
Human PPIN | 0.0793 | 0.0481 | 0.0744 | 0.2510 | 0.1305 | 0.1106 | 0.0328 | 0.2471
The blanks in the table denote that the method was allowed to run for 4 hours before any result was generated, after which it was terminated.
for the corresponding ground-truth communities. Tables 4 and 5 report results for COPRA with v set to 9, 4, 3, 2, and 2 for Amazon,
DBLP, YouTube, Yeast PPIN, and Human PPIN, respectively. Results reported for SLPA are those with r set to 0.01, 0.05, 0.01, 0.5,
and 0.05 for these datasets, respectively.
For BigClam, either the number of communities or a range of numbers of communities to be tested is required as input. We tested
with the number of communities equal to that in the ground-truth communities for the networks, as well as with a range encompassing
the outputs from the other algorithms. The number of communities which yielded the best result was noted, and the time shown in
Table 4 is for the run with this exact number of communities as the input parameter.
Blanks in Tables 4 and 5 show that GCE and OSLOM could not produce results for the YouTube dataset within four hours, while none of
the methods except FOCS scaled to the LiveJournal and Orkut datasets in the same time. COPRA and SLPA, however, faced memory
limitations much earlier for LiveJournal and Orkut. Other algorithms, including LFM [9], DEMON [28], and the game-theoretic
algorithm [22], are excluded from the comparison as they could not produce results even after four hours of execution for any of
the social network datasets. The game-theoretic algorithm, in contradiction to its claim, does not converge.
The results depict a significant gain in terms of execution time as compared to the other algorithms. Interestingly, it does not
come at the cost of performance. For all networks except Amazon and the PPIN networks, FOCS outperforms the other methods.
Communities serving as ground-truth for Amazon have very high overlap (about 91 percent of nodes participate in two or more
communities, as can be seen in Table 3). Thus, NMI values for Amazon mostly favor methods that yield a very high number of
overlapping communities. LinkComm, though efficient in detecting most of the communities correctly, does not scale well with
increasing network sizes. BigClam performs well with the input number of communities set equal to that in the ground-truth
communities, except in the case of the DBLP network. It performs competitively only for the Amazon dataset, but requires a large
amount of time. COPRA produces strong results for the PPIN networks, which mostly have disjoint communities, with very few nodes
participating in overlaps between communities. Fig. 8 shows the runtime of FOCS versus the number of edges, m, as compared to all
the seven existing algorithms considered. The figure depicts run time for five of the social network datasets considered, with an
inline figure depicting runtime for Amazon, DBLP and YouTube only. None of the algorithms except FOCS scales well to the two largest
social network datasets considered, within the given time and memory constraints.
Fig. 8. Runtime of FOCS compared to seven of the existing algorithms for five of the social network datasets [34] with increasing number of edges, m.
5 CONCLUSION
Social networks are complex and large. Fast Overlapped Community Search (FOCS) explores communities rapidly by selecting only those
where all nodes are locally well connected. The community connectedness and neighborhood connectedness scores, which are computed
for each node throughout the algorithm, reflect real world community properties. These make the algorithm applicable to real
networks of varying sizes. Users are free to set the maximum allowed overlap between any two communities, and the minimum number of
neighbors that a node should have, to determine its membership in any community.
One of the limitations of FOCS is that the maximum number of communities that can be detected by this method is equal to the number
of nodes in a network. In social networks, however, as can be seen in Orkut [14], the number of communities can in fact be double
the number of nodes. This happens when a node is allowed to create multiple communities. We intend to address this issue in our
future work. Further, we would like to extend the method to work with weighted and/or directed networks, dynamic networks, etc.
APPENDIX A
Theorem A.1. Given OVL as the maximum allowed overlap between any two communities for a network, the average number of communities
per node $l$ is maximally bounded by $\frac{1}{1 - \mathrm{OVL}}$.
Proof. We are given an undirected, unweighted network

region while the other did not. Thus, number of nodes in each community becomes $n - k - 1$, as a node reduces, and number of nodes
in the overlapped region between the pair is reduced by exactly one. This results in an overlap of
$\frac{n - k - 1 - 1}{n - k - 1} = \frac{n - k - 2}{n - k - 1}$.
On the other hand, $O_{maxmin}(k + 1) = \frac{n - (k + 1) - 1}{n - (k + 1)} = \frac{n - k - 2}{n - k - 1}$, thereby showing that
[12] T. Evans, “Clique graphs and overlapping communities,” J. Statistical Mech.: Theory Experiment, vol. 2010, no. 12, p. P12037, 2010.
[13] N. Mishra, R. Schreiber, I. Stanton, and R. E. Tarjan, “Finding strongly knit clusters in social networks,” Internet Math., vol. 5, no. 1/2, pp. 155–174, 2008.
[14] J. Yang and J. Leskovec, “Structure and overlaps of communities in networks,” CoRR, vol. abs/1205.6228, 2012.
[15] S. L. Feld, “The focused organization of social ties,” Am. J. Sociol., vol. 86, pp. 1015–1035, 1981.
[16] Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann, “Link communities reveal multiscale complexity in networks,” Nature, vol. 466, no. 7307, pp. 761–764, 2010.
[17] T. Evans and R. Lambiotte, “Line graphs, link partitions, and overlapping communities,” Phys. Rev. E, vol. 80, no. 1, p. 016105, 2009.
[18] A. McDaid and N. Hurley, “Detecting highly overlapping communities with model-based overlapping seed expansion,” in Proc. Int. Conf. Adv. Social Netw. Anal. Mining, 2010, pp. 112–119.
[19] J. Yang and J. Leskovec, “Overlapping community detection at scale: A nonnegative matrix factorization approach,” in Proc. 6th ACM Int. Conf. Web Search Data Mining, 2013, pp. 587–596.
[20] R. Narayanam and Y. Narahari, “A game theory inspired, decentralized, local information based algorithm for community detection in social graphs,” in Proc. 21st Int. Conf. Pattern Recognition, 2012, pp. 1072–1075.
[21] S. Gregory, “Finding overlapping communities in networks by label propagation,” New J. Phys., vol. 12, no. 10, p. 103018, 2010.
[22] W. Chen, Z. Liu, X. Sun, and Y. Wang, “A Game-theoretic framework to identify overlapping communities in social networks,” Data Mining Knowl. Discovery, vol. 21, no. 2, pp. 224–240, 2010.
[23] J. Baumes, M. Goldberg, and M. Magdon-Ismail, “Efficient identification of overlapping communities,” in Proc. IEEE Int. Conf. Intell. Security Informat., 2005, pp. 27–36.
[24] J. Xie and B. K. Szymanski, “Community detection using a neighborhood strength driven label propagation algorithm,” in Proc. IEEE Network Sci. Workshop, 2011, pp. 188–195.
[25] J. Xie and B. K. Szymanski, “Towards linear time overlapping community detection in social networks,” in Proc. 16th Pacific-Asia Conf. Adv. Knowl. Discovery Data Mining, 2012, pp. 25–36.
[26] J. Xie and B. K. Szymanski, “LabelRank: A stabilized label propagation algorithm for community detection in networks,” in Proc. IEEE 2nd Netw. Sci. Workshop, 2013, pp. 138–143.
[27] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Phys. Rev. E, vol. 76, no. 3, p. 036106, 2007.
[28] M. Coscia, G. Rossetti, F. Giannotti, and D. Pedreschi, “Demon: A local-first discovery method for overlapping communities,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 615–623.
[29] M. Magdon-Ismail and J. Purnell, “SSDE-cluster: Fast overlapping clustering of networks using sampled spectral distance embedding and gmms,” in Proc. IEEE 3rd Int. Conf. Privacy, Security, Risk Trust IEEE 3rd Int. Conf. Social Comput., 2011, pp. 756–759.
[30] S. Tsironis, M. Sozio, M. Vazirgiannis, and L.-E. Poltechnique, “Accurate spectral clustering for community detection in MapReduce,” Frontiers of Network Analysis: Methods, Models, and Applications. Lake Tahoe, NIPS Workshop, 2013.
[31] H. Alvari, S. Hashemi, and A. Hamzeh, “Discovering overlapping communities in social networks: A novel Game-theoretic approach,” AI Commun., vol. 26, no. 2, pp. 161–177, 2013.
[32] F. Bonchi, A. Gionis, and A. Ukkonen, “Overlapping correlation clustering,” in Proc. IEEE 11th Int. Conf. Data Mining, 2011, pp. 51–60.
[33] S. B. Seidman, “Network structure and minimum degree,” Social Netw., vol. 5, no. 3, pp. 269–287, 1983.
[34] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” in Proc. ACM SIGKDD Workshop Mining Data Semantics, 2012, p. 3.
[35] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An open source software for exploring and manipulating networks,” in Proc. Int. AAAI Conf. Weblogs Social Media, 2009, vol. 8, pp. 361–362.
[36] J. Baumes, M. K. Goldberg, M. S. Krishnamoorthy, M. Magdon-Ismail, and N. Preston, “Finding communities by clustering a graph into overlapping subgraphs,” in Proc. IADIS Int. Conf. Appl. Comput., 2005, vol. 5, pp. 97–104.
[37] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, “The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations,” Behavioral Ecol. Sociobiol., vol. 54, no. 4, pp. 396–405, 2003.
[38] A. Lancichinetti, F. Radicchi, J. J. Ramasco, and S. Fortunato, “Finding statistically significant communities in networks,” PloS One, vol. 6, no. 4, p. e18961, 2011.
[39] S. Fortunato and M. Barthelemy, “Resolution limit in community detection,” Proc. Nat. Acad. Sci., vol. 104, no. 1, pp. 36–41, 2007.
[40] J. Leskovec, K. J. Lang, and M. Mahoney, “Empirical comparison of algorithms for network community detection,” in Proc. 19th Int. Conf. World Wide Web, 2010, pp. 631–640.
[41] H. Yu, P. Braun, M. A. Yıldırım, I. Lemmens, K. Venkatesan, J. Sahalie, T. Hirozane-Kishikawa, F. Gebreab, N. Li, N. Simonis, T. Hao, J. F. Rual, A. Dricot, A. Vazquez, R. R. Murray, C. Simon, L. Tardivo, S. Tam, N. Svrzikapa, C. Fan, A. S. de Smet, A. Motyl, M. E. Hudson, J. Park, X. Xin, M. E. Cusick, T. Moore, C. Boone, M. Snyder, F. P. Roth, A. L. Barabasi, J. Tavernier, D. E. Hill, and M. Vidal, “High-quality binary protein interaction map of the yeast interactome network,” Science, vol. 322, no. 5898, pp. 104–110, 2008.
[42] K. Tarassov, V. Messier, C. R. Landry, S. Radinovic, M. M. S. Molina, I. Shames, Y. Malitskaya, J. Vogel, H. Bussey, and S. W. Michnick, “An in vivo map of the yeast protein interactome,” Science, vol. 320, no. 5882, pp. 1465–1470, 2008.
[43] J. P. Miller, R. S. Lo, A. Ben-Hur, C. Desmarais, I. Stagljar, W. S. Noble, and S. Fields, “Large-scale identification of yeast integral membrane protein interactions,” Proc. Nat. Acad. Sci. United States of America, vol. 102, no. 34, pp. 12123–12128, 2005.
[44] S. Pu, J. Wong, B. Turner, E. Cho, and S. J. Wodak, “Up-to-date catalogues of yeast protein complexes,” Nucleic Acids Res., vol. 37, no. 3, pp. 825–831, 2009.
[45] S. Kikugawa, K. Nishikata, K. Murakami, Y. Sato, M. Suzuki, M. Altaf-Ul-Amin, S. Kanaya, and T. Imanishi, “PCDq: Human protein complex database with quality index which summarizes different levels of evidences of protein complexes predicted from H-invitational Protein-protein interactions integrative dataset,” BMC Syst. Biol., vol. 6, no. Suppl 2, p. S7, 2012.

Sanghamitra Bandyopadhyay received the PhD degree in computer science from ISI. She is currently a professor at the Indian Statistical Institute, Kolkata, India. She has authored/co-authored more than 250 technical articles and published five authored and edited books. Her research interests include computational biology and bioinformatics, soft and evolutionary computation, pattern recognition, and data mining. She is a fellow of NASI and INAE, India, and received several prestigious awards including the Humboldt Fellowship from Germany, ICTP Senior Associate, Trieste, Italy, and the Shanti Swarup Bhatnagar Prize in Engineering Science. She is a senior member of the IEEE.

Garisha Chowdhary received the BTech and MTech degrees in computer science from Biju Pattnaik University of Technology and Jadavpur University, respectively. Currently, she is a senior research fellow in the Machine Intelligence Unit of the Indian Statistical Institute, Kolkata, India. Her research interests include machine learning and complex network analysis.

Debarka Sengupta received the BTech and PhD degrees in computer science and engineering from West Bengal University of Technology and Jadavpur University, respectively. He was in the Machine Intelligence Unit of the Indian Statistical Institute as a research fellow from March 2009 to March 2013. Currently, he is a postdoctoral fellow in the Computational and Systems Biology Group, Genome Institute of Singapore. His research interests include computational biology, functional genomics, and machine learning.