Analysis of Large Graph Partitioning and Frequent Subgraph Mining on Graph Data
Abstract: Graph mining has attracted much attention due to the explosive growth of graph databases. A graph database is a type of database that consists of either a single large graph or a number of relatively small graphs. Applications that produce graph databases include biological networks, the semantic web, and behavioural modelling. Frequent subgraph mining plays an essential role in data mining, with the objective of extracting knowledge in the form of repeated structures. Many efficient subgraph mining algorithms have been developed over the last two decades, yet most do not scale to so-called "large-scale graph data". Many problems are so large or complex that it is impractical or impossible to solve them on a single computer, especially with limited memory. Scalable parallel computing algorithms therefore hold the key to solving the problem in this context. Various algorithms and parallel frameworks are discussed for graph partitioning, for frequent subgraph mining based on apriori and pattern-growth approaches, and for large-scale graph processing. The central objective of this paper is to initiate research and development on identifying frequent subgraphs and on strategies for graph data centres that bring in parallel frameworks to achieve memory scalability, partitioning, load balancing, granularity, and technical enhancement for future generations.
Keywords: graph partitioning; frequent subgraph mining; apriori; pattern growth; parallel framework.
• A parallel graph partitioning algorithm can take advantage of the substantially larger amount of memory available in a parallel implementation to partition very large graphs.
Parallel graph partitioning [23] is crucial for achieving good results in such an environment, particularly in the context of adaptive graph partitioning, where the graph is already distributed among the processors but must be repartitioned because of the dynamic nature of the underlying computation. In such cases, gathering the graph onto a single processor for repartitioning would create a serious bottleneck that would adversely affect the scalability of the overall application.
Further work on parallel graph partitioning has concentrated on geometric and spectral schemes by Barnard and Simon [24] and on multilevel partitioning schemes by Karypis and Kumar [25, 26]. Geometric graph partitioning algorithms [27, 28] tend to be relatively easy to parallelize, whereas spectral and multilevel partitioners are harder to parallelize. Their parallel asymptotic run times are equivalent to that of performing a parallel matrix-vector multiplication on a randomly partitioned matrix when the input graph is not well distributed across the processors. If the graph is first partitioned and then distributed across the processors accordingly, the parallel asymptotic run times of spectral and multilevel partitioners drop to that of performing a parallel matrix-vector multiplication on a well-partitioned matrix. In essence, performing these partitioning schemes efficiently in parallel requires a good partitioning of the input graph. Static graph partitioning cannot provide such a good-quality distribution of the input graph, whereas adaptive graph partitioning can provide a high-quality input partitioning with a low edge cut. For this reason, parallel adaptive graph partitioners [29, 30] tend to run considerably faster than static partitioners.
Since the run time of most parallel geometric partitioning schemes is largely unaffected by the initial distribution of the graph, they can be used to bootstrap the partitioning algorithm: a rough partitioning of the input graph is first computed by a fast geometric approach, and this partitioning is used to redistribute the graph before parallel multilevel or spectral partitioning is performed. Use of this "boot-strapping" approach significantly increases the parallel efficiency of the more accurate partitioning scheme by providing it with data locality [23].
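The boot-strapping idea can be illustrated with a deliberately small, single-process sketch: a cheap one-dimensional geometric split stands in for the fast rough partitioner, and a greedy boundary-refinement pass stands in for the more accurate multilevel or spectral stage. Both heuristics and all names here are illustrative assumptions, not the algorithms surveyed in [23].

```python
# Toy illustration of "boot-strapping" a partitioner: a cheap geometric split
# gives a rough partition, then a more careful refinement pass improves it.
# This is a single-process sketch; a real parallel partitioner would perform
# the redistribution and refinement across processors.

def rough_geometric_partition(coords, k):
    """Rough partition: sort vertices by x-coordinate and cut into k slabs."""
    order = sorted(range(len(coords)), key=lambda v: coords[v][0])
    part = [0] * len(coords)
    slab = max(1, len(order) // k)
    for rank, v in enumerate(order):
        part[v] = min(rank // slab, k - 1)
    return part

def refine(adj, part, k, passes=3):
    """Greedy refinement: move each vertex to the block where most of its
    neighbours live (a crude stand-in for multilevel refinement)."""
    for _ in range(passes):
        for v, neigh in adj.items():
            counts = [0] * k
            for u in neigh:
                counts[part[u]] += 1
            best = max(range(k), key=lambda b: counts[b])
            if counts[best] > counts[part[v]]:
                part[v] = best
    return part

if __name__ == "__main__":
    # 6 vertices on a line, forming two natural clusters {0,1,2} and {3,4,5}.
    coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    part = rough_geometric_partition(coords, k=2)   # fast, rough
    part = refine(adj, part, k=2)                   # slower, more accurate
    print(part)                                     # -> [0, 0, 0, 1, 1, 1]
```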
C. Dynamic Graph Partitioning
Currently, a great deal of research is being carried out on dynamic graph partitioning, driven by real-world applications and ever-growing graphs. Dynamic graph partitioning cannot be done in the memory of a single machine, because the number of nodes/vertices in such graphs grows dramatically (to billions of nodes). For that reason it relies on a distributed memory system, which makes it possible to place and process the graph across several machines. Partitioning a dynamic graph requires clustering, load balancing, and local heuristic methods in order to obtain good results. Several authors have presented work in this direction: how to partition a billion-node graph [31], a streaming graph partitioning method [32], parallel graph partitioning for complex networks [33], and the Spinner technique for scalable graph partitioning in the cloud [34]. Figure 3 illustrates a distributed memory system and label propagation.
• Distributed graphs: A distributed memory system is the right setting for online query processing over a billion-node graph. To organize a graph on a distributed memory system, the graph has to be divided into multiple partitions and each partition stored on one machine (without any overlap). Network communication is necessary for accessing non-local partitions of the graph. Hence, how the graph is partitioned can have a major impact on load balancing and communication [31].
• Label propagation (LP): A local variant of LP works as follows (a short code sketch is given below). First, assign a unique label id to each vertex. Next, update the vertex labels iteratively: at every iteration, a vertex adopts the label that is most common in its neighborhood as its own label. The process terminates when labels no longer change. Vertices that carry the same label belong to the same partition.
There are two reasons to adopt label propagation for partitioning [31]:
1) The label propagation mechanism is lightweight. It does not produce intermediate results, and it does not need to sort or index the data as many existing graph partitioning algorithms do.
2) Label propagation is able to discover inherent community structures in real networks: given the existence of locally densely connected substructures, a label tends to propagate within such structures. Since most real-life networks exhibit clear community structures, a partitioning algorithm based on label propagation may divide the graph into meaningful partitions. Compared to maximal matching, LP is more semantic-aware and is a better coarsening scheme.
Figure 3. (a) A graph on 3 machines {m1, m2, m3}; each machine holds 4 vertices, and the partitions are {A, B, C, D}, {E, F, G, H}, {I, J, K, L}. (b) Coarsened by maximal match. (c) Coarsened by LP.
D. Classification of graph partitioning algorithms
The main challenges of partitioning graph data are: (i) quality of the graph partitioning, (ii) the multilevel paradigm, and (iii) load balancing. Graph partitioning algorithms use different approaches to address these challenges. Figure 4 shows how the different algorithms are classified based on how the graph partitioning approaches are implemented.
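Returning to the label propagation scheme outlined in Section C, the following is a minimal single-machine sketch. It is illustrative only, not the distributed algorithm of [31], and it uses synchronous updates with a smallest-label tie-break purely so that the output is deterministic.

```python
from collections import Counter

def label_propagation_partition(adj, max_iters=100):
    """adj: {vertex: [neighbours]}. Every vertex starts with its own label;
    in each (synchronous) round a vertex adopts the most common label among
    its neighbours, ties broken by the smallest label. Vertices sharing a
    final label form one partition."""
    labels = {v: v for v in adj}                        # unique initial labels
    for _ in range(max_iters):
        new_labels = {}
        for v, neigh in adj.items():
            if not neigh:
                new_labels[v] = labels[v]
                continue
            counts = Counter(labels[u] for u in neigh)
            top = max(counts.values())
            new_labels[v] = min(l for l, c in counts.items() if c == top)
        if new_labels == labels:                        # labels no longer change
            break
        labels = new_labels
    parts = {}
    for v, lab in labels.items():                       # group vertices by label
        parts.setdefault(lab, []).append(v)
    return list(parts.values())

if __name__ == "__main__":
    # Two densely connected groups joined by a single edge D-E.
    adj = {
        "A": ["B", "C", "D"], "B": ["A", "C", "D"],
        "C": ["A", "B", "D"], "D": ["A", "B", "C", "E"],
        "E": ["D", "F", "G", "H"], "F": ["E", "G", "H"],
        "G": ["E", "F", "H"], "H": ["E", "F", "G"],
    }
    print(label_propagation_partition(adj))
    # -> [['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H']]
```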
SUP(G_sub) = ( Σ_{i=1}^{n} G_i ) / n

where n is the total number of graphs in the graph dataset [55].
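Reading G_i as an indicator of whether the i-th graph contains the candidate subgraph, the support is simply the fraction of database graphs that contain it. A minimal sketch of this computation is shown below using NetworkX's subgraph-isomorphism matcher; the use of NetworkX and the function name `support` are illustration choices assumed here, not part of [55].

```python
import networkx as nx
from networkx.algorithms import isomorphism

def support(pattern, graph_db):
    """Fraction of graphs in graph_db that contain `pattern` as a subgraph."""
    hits = 0
    for g in graph_db:
        # Node-induced subgraph matching; sufficient for this toy example.
        matcher = isomorphism.GraphMatcher(g, pattern)
        if matcher.subgraph_is_isomorphic():
            hits += 1
    return hits / len(graph_db)

if __name__ == "__main__":
    triangle = nx.cycle_graph(3)                         # candidate subgraph
    db = [nx.complete_graph(4), nx.path_graph(5), nx.cycle_graph(6)]
    print(support(triangle, db))                         # 1 of 3 graphs -> 0.333...
```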
Algorithm Nature of the graph Search Strategy Isomorphism test Nature of output Reference
WARMR Static Breadth First Approximate Complete Dehaspe et al. (1999) [60]
AGM Static Breadth First Exact Complete Inokuchi et al. (2000) [61]
FSG Static Breadth First Exact Incomplete Kuramochi and Karypis (2001) [59]
FARMER Static Breadth First Approximate Complete Nijssen and Kok (2002) [62]
FFSM Static Depth First Exact Complete Huan et al. (2003) [63]
HSIGRAM Static Breadth First Adjustable Complete Jiang et al. (2004) [64]
GREW Static Greedy Exact Incomplete Kuramochi and Karypis (2004) [65]
SPIN Static Depth First Exact Incomplete Huan et al. (2004) [66]
Dynamic GREW Dynamic Depth First Exact Incomplete Kuramochi et al. (2006) [67]
ISG Static Breadth First Exact Complete Thomas et al. (2009) [68]
MUSE Static Depth First Exact Complete Li et al. (2009) [69]
Weighted MUSE Static Depth First Exact Complete Jamil et al. (2011) [70]
MUSE-P Static Depth First Exact Complete Li et al. (2010) [71]
UGRAP Static Depth First Exact Complete Papapetrou et al. (2011) [72]
testbed for comparing an integrated discovery system based on SUBDUE to other non-structural systems.
Borgelt and Berthold proposed the MoFa algorithm [75], which stands for Molecular Fragment Miner. The MoFa algorithm keeps the embedding lists of all previously found subgraphs, and the extension function, which extends a subgraph by an edge, is restricted to these embedding lists. Isomorphism tests can therefore be done inexpensively by testing whether an embedding list can be extended in the same way. The algorithm also uses structural pruning and background knowledge to reduce support computation, and it removes duplicates using standard isomorphism testing.
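The embedding-list idea can be sketched generically: for every discovered pattern we store where it occurs, so support is read straight off the list and extensions only examine the neighbourhood of recorded occurrences. The sketch below is an illustrative simplification (vertex-labelled graphs, forward extensions only) and does not reproduce MoFa's actual data structures; `Embedding`, `graph_db` and all parameter names are assumed.

```python
from collections import namedtuple

# One occurrence of a pattern: the database graph it lies in and the map
# from pattern vertices to that graph's vertices.
Embedding = namedtuple("Embedding", ["graph_id", "mapping"])

def support_from_embeddings(embeddings, db_size):
    """Support is the fraction of database graphs holding at least one
    embedding, so it can be read off the embedding list directly."""
    return len({e.graph_id for e in embeddings}) / db_size

def extend_embeddings(embeddings, graph_db, from_pv, new_pv, wanted_label):
    """Grow each stored embedding by one edge: attach a new pattern vertex
    `new_pv` with label `wanted_label` to the image of pattern vertex
    `from_pv`. Embeddings that cannot be grown are dropped; the surviving
    list is the occurrence list of the extended pattern, so no fresh
    isomorphism search over whole graphs is needed."""
    grown = []
    for e in embeddings:
        adj, labels = graph_db[e.graph_id]        # (adjacency dict, vertex-label dict)
        anchor = e.mapping[from_pv]
        used = set(e.mapping.values())
        for u in adj[anchor]:
            if u not in used and labels[u] == wanted_label:
                grown.append(Embedding(e.graph_id, {**e.mapping, new_pv: u}))
    return grown
```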
Algorithm Nature of graphs Search method Isomorphism test Nature of output Reference
GBI Static Greedy Exact Complete Yoshida et al. (1994) [73]
SUBDUE Static Greedy Approximate Complete Cook and Holder (1994) [74]
MoFa Static Depth First Exact Complete Borgelt and Berthold (2002) [75]
gSpan Static Depth First Exact Complete Yan and Han (2002) [76]
CloseGraph Static Depth First Exact Incomplete Yan and Han (2003) [77]
GASTON Static Depth First Exact Complete Nijssen and Kok (2004) [78]
HybridGMiner Static Depth First Exact Complete Meinl et al. (2004) [79]
SEUS Static Depth First Exact Complete Gudes et al. (2006) [80]
MSPAN Static Depth First Exact Complete Li et al. (2009) [81]
FCPMiner Static Depth First Exact Complete Ke et al. (2009) [82]
RING Static Depth First Exact Complete Zhang et al. (2009) [83]
JPMiner Static Depth First Exact Incomplete Liu et al. (2009) [84]
GraphSig Static Depth First Exact Complete Ranu et al. (2009) [85]
TSP Dynamic Depth First Exact Incomplete Hsieh and Li (2010) [86]
RP-GD Static Depth First Exact Incomplete Li et al. (2011) [87]
RP-FP Static Depth First Exact Incomplete Li et al. (2011) [87]
real-life and large synthetic datasets validate the effectiveness of MIRAGE for mining frequent subgraphs from large graph datasets.
Bhuiyan et al. (2014) proposed a frequent subgraph mining algorithm called FSM-H [114], which handles real-world graph data that grows both in size and quantity. FSM-H is a distributed frequent subgraph mining method built on a MapReduce-based framework. The framework consists of three phases: a data partition phase, a preparation phase, and a mining phase. In the data partition phase, FSM-H creates the partitions of the input data and removes infrequent edges from the input graphs. The preparation and mining phases perform the actual mining task. FSM-H generates the complete set of frequent subgraphs for a given minimum support threshold, and it is efficient because it applies the optimizations that the latest FSM algorithms adopt. Experiments with real-life and large synthetic datasets validate the effectiveness of FSM-H for mining frequent subgraphs from large graph datasets.
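The three-phase organisation described above can be pictured as a generic, level-wise filter-and-extend loop. The sketch below uses plain Python functions to stand in for mappers and reducers; it is not FSM-H's implementation, and every name (`partition_data`, `contains`, `extend`, `min_support`) is an assumption for illustration.

```python
from collections import Counter
from itertools import chain

def partition_data(graph_db, num_partitions):
    """Data partition phase: split the database into roughly equal chunks
    (infrequent-edge removal would also happen here)."""
    return [graph_db[i::num_partitions] for i in range(num_partitions)]

def map_phase(partition, candidates, contains):
    """Each 'mapper' counts, on its own partition, how many graphs contain
    each candidate subgraph. Candidates must be hashable."""
    counts = Counter()
    for g in partition:
        for c in candidates:
            if contains(g, c):
                counts[c] += 1
    return counts

def reduce_phase(partial_counts, min_support, db_size):
    """The 'reducer' sums partial counts and keeps candidates whose global
    support reaches the minimum support threshold."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {c for c, n in total.items() if n / db_size >= min_support}

def mine(graph_db, seed_candidates, extend, contains, min_support, num_partitions=4):
    """Mining phase: filter candidates by support, extend the survivors by
    one edge, and repeat until nothing new is frequent."""
    partitions = partition_data(graph_db, num_partitions)
    frequent, candidates = set(), set(seed_candidates)
    while candidates:
        partial = [map_phase(p, candidates, contains) for p in partitions]
        survivors = reduce_phase(partial, min_support, len(graph_db))
        frequent |= survivors
        candidates = set(chain.from_iterable(extend(s) for s in survivors)) - frequent
    return frequent
```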
VI. CONCLUSION AND FUTURE DIRECTIONS

In this paper, we have described a range of graph partitioning and frequent subgraph mining algorithms. Based on the different graph partitioning algorithms, we have classified graph partitioning approaches into static, dynamic, and parallel implementations. In the case of frequent subgraph mining, the large-graph setting covers multiple sets of small graph data, dynamic graph data, and uncertain graph data, and the algorithms have been classified according to the apriori and pattern-growth approaches. To handle large-scale graphs, we presented various graph processing techniques, and parallel frameworks for frequent subgraph mining were discussed in detail.
The future directions for identifying frequent subgraphs are:
1) For handling large graph data, very few methodologies exist for FSM. By adopting graph partitioning algorithms, a large graph can be decomposed into a set of smaller graphs, and either apriori-based or pattern-growth algorithms can then be applied to the smaller graphs to identify frequent subgraphs.
2) The GraphLab, Giraph and GraphX parallel frameworks provide good results in comparison with other frameworks, so one of these frameworks can be adopted for identifying frequent subgraphs in large graph data.

VII. REFERENCES

[1] D. Chakrabarti and C. Faloutsos, “Graph mining: Laws, generators, and algorithms,” ACM Computing Surveys, vol. 38, no. 1, Article 2, Jun. 2006, doi:10.1145/1132952.1132954.
[2] C. Wang and S. Parthasarathy, “Parallel Algorithms for Mining Frequent Structural Motifs in Scientific Data,” Proc. ACM International Conference on Supercomputing (ICS 04), Jun. 2004, pp. 31-40, doi:10.1145/1006209.1006215.
[3] J. R. Punin, M. Krishnamoorthy, and M. J. Zaki, “LOGML-Log Markup Language for Web Usage Mining,” WEBKDD Workshop: Mining Log Data across All Customers Touch Points, 2001, pp. 88-112.
[4] H. Mannila, H. Toivonen and I. Verkamo, “Discovering Frequent Episodes in Sequences,” Proc. IEEE International Conference on Knowledge Discovery and Data Mining (ICKDD 95), 1995, pp. 210-215.
[5] M. Deshpande, M. Kuramochi and G. Karypis, “Frequent sub-structure based approaches for classifying chemical compounds,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 8, 2005, pp. 1036-1050.
[6] M. A. Srinuvasu, P. Padmaja and Y. Dharmateja, “Subgraph relative frequency approach for extracting interesting substructures from molecular data,” International Journal of Computer Engineering & Technology, vol. 4, no. 4, 2013, pp. 400-411.
[7] H. Hu, X. Yan, Y. Huang, J. Han and X. J. Zhou, “Mining coherent dense subgraphs across massive biological networks for functional discovery,” Bioinformatics, vol. 21, no. 1, 2005, pp. 213-221.
[8] G. Ciriello and C. Guerra, “A review on models and algorithms for motif discovery in protein-protein interaction networks,” Briefings in Functional Genomics & Proteomics, vol. 7, no. 2, 2008, pp. 147-156.
[9] J. Huan, W. Wang, J. Prins and J. Yang, “Spin: Mining maximal frequent subgraphs from graph databases,” UNC Technical Report TR04-018, 2004.
[10] P. Raghavan, “Social Networks on the Web and in the Enterprise,” Proc. First Asia-Pacific Conference on Web Intelligence, 2001, pp. 58-60.
[11] C. Bichot and P. Siarry, “Graph Partitioning,” Wiley, New York, 2011.
[12] D. J. Cook and L. B. Holder, “Mining Graph Data,” Wiley, New Jersey, 2007.
[13] M. R. Garey and D. S. Johnson, “Computers and Intractability: A Guide to the Theory of NP-Completeness,” W. H. Freeman & Co., New York, NY, USA, 1990.
[14] R. Preis and R. Diekmann, “PARTY - a software library for graph partitioning,” Advances in Computational Mechanics with Parallel and Distributed Processing, Civil-Comp Press, 1997, pp. 63-71.
[15] S. T. Barnard and H. D. Simon, “A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems,” Concurrency: Practice and Experience, vol. 6, 1994, pp. 101-117.
[16] B. Hendrickson and R. Leland, “A multilevel algorithm for partitioning graphs,” Proc. ACM/IEEE Conference on Supercomputing, 1995, pp. 28-28.
[17] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” The Bell System Technical Journal, vol. 49, no. 2, 1970, pp. 291-307.
[18] C. M. Fiduccia and R. M. Mattheyses, “A linear time heuristic for improving network partitions,” Proc. IEEE Design Automation Conference, 1982, pp. 175-181.
[19] B. Monien, R. Preis and R. Diekmann, “Quality matching and local improvement for multilevel graph-partitioning,” Parallel Computing, vol. 26, no. 12, 2000, pp. 1609-1634.
[20] C. Chevalier and I. Safro, “Comparison of Coarsening Schemes for Multilevel Graph Partitioning,” Proc. International Conference on Learning and Intelligent Optimization, 2009, pp. 191-205.
[21] I. Safro, P. Sanders and C. Schulz, “Advanced coarsening schemes for graph partitioning,” Proc. International Symposium on Experimental Algorithms (SEA’12), 2012, pp. 369-380.
[22] A. Grama, G. Karypis and V. Kumar, “Introduction to Parallel Computing,” Addison-Wesley, 2nd edition, 2003.
[23] K. Schloegel, G. Karypis and V. Kumar, “Graph partitioning for high-performance scientific simulations,” in Sourcebook of Parallel Computing, J. Dongarra, I. Foster, G. Fox, W. Gropp, K. Kennedy, L. Torczon and A. White (Eds.), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003, pp. 491-541.
[24] S. T. Barnard and H. Simon, “A parallel implementation of multilevel recursive spectral bisection for application to adaptive unstructured meshes,” Proc. SIAM Conference on Parallel Processing for Scientific Computing, 1995, pp. 627-632.
[25] G. Karypis and V. Kumar, “A parallel algorithm for multilevel graph partitioning and sparse matrix ordering,” Journal of Parallel and Distributed Computing, vol. 48, no. 1, 1998, pp. 71-95.
[26] G. Karypis and V. Kumar, “Parallel multilevel k-way partitioning scheme for irregular graphs,” SIAM Review, vol. 41, no. 2, 1999, pp. 278-300.
[27] G. L. Miller, S. H. Teng and S. A. Vavasis, “A unified geometric approach to graph separators,” Proc. Annual Symposium on Foundations of Computer Science, 1991, pp. 538-547.
[28] H. Heath and P. Raghavan, “A Cartesian parallel nested dissection algorithm,” SIAM Journal on Matrix Analysis and Applications, vol. 16, no. 1, 1995, pp. 235-253.
[29] K. Schloegel, G. Karypis and V. Kumar, “Wavefront division and LMSR: algorithms for dynamic repartitioning of adaptive meshes,” Technical Report TR 98-034, Dept. of Computer Science and Engineering, Univ. of Minnesota, 1998.
[30] C. Walshaw, M. Cross and M. Everett, “Parallel dynamic graph partitioning for adaptive unstructured meshes,” Journal of Parallel and Distributed Computing, vol. 47, no. 2, 1997, pp. 102-108.
[31] L. Wang, Y. Xiao, B. Shao and H. Wang, “How to partition a billion-node graph,” Proc. IEEE 30th International Conference on Data Engineering (ICDE), 2014, pp. 568-579.
[32] C. E. Tsourakakis, C. Gkantsidis, B. Radunovic and M. Vojnovic, “FENNEL: Streaming Graph Partitioning for Massive Scale Graphs,” Technical Report MSR-TR-2012-113, 2012.
[33] H. Meyerhenke, P. Sanders and C. Schulz, “Parallel Graph Partitioning for Complex Networks,” CoRR, abs/1404.4797, 2014.
[34] C. Martella, D. Logotheti and G. Siganos, “Spinner: Scalable Graph Partitioning for the Cloud,” arXiv:1404.3861v1, 2014.
[35] B. Hendrickson and R. Leland, “A multilevel algorithm for partitioning graphs,” Technical Report SAND93-1301, Sandia National Laboratories, 1993.
[36] G. Karypis and V. Kumar, “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs,” SIAM Journal on Scientific Computing, vol. 20, no. 1, 1998, pp. 359-392.
[37] G. Karypis and V. Kumar, “hMeTiS 1.5: A hypergraph partitioning package,” Technical report, Dept. of Computer Science and Engineering, Univ. of Minnesota, 1998.
[38] G. Karypis and V. Kumar, “MeTiS 4.0: Unstructured graph partitioning and sparse matrix ordering system,” Technical report, Dept. of Computer Science and Engineering, Univ. of Minnesota, 1998.
[39] G. Karypis and V. Kumar, “Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs,” Proc. ACM/IEEE Supercomputing ’96 Conference, 1996.
[40] G. Karypis, K. Schloegel and V. Kumar, “ParMeTiS: Parallel graph partitioning and sparse matrix ordering library,” Technical report, Dept. of Computer Science and Engineering, University of Minnesota, 1997.
[41] P. Sanders and C. Schulz, “Engineering Multilevel Graph Partitioning Algorithms,” Proc. European Symposium on Algorithms, 2011, pp. 469-480.
[42] P. Sanders and C. Schulz, “Think Locally, Act Globally: Highly Balanced Graph Partitioning,” Experimental Algorithms, Lecture Notes in Computer Science, vol. 7933, 2013, pp. 164-175.
[43] M. Holtgrewe, P. Sanders and C. Schulz, “Engineering a Scalable High Quality Graph Partitioner,” Proc. IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-12.
[44] C. Chevalier and F. Pellegrini, “Improvement of the Efficiency of Genetic Algorithms for Scalable Parallel Graph Partitioning in a Multi-level Framework,” Euro-Par 2006 Parallel Processing, Lecture Notes in Computer Science, vol. 4128, 2006, pp. 243-252.
[45] C. Chevalier and F. Pellegrini, “PT-Scotch: A tool for efficient parallel graph ordering,” Parallel Computing, vol. 34, no. 6, 2008, pp. 318-331.
[46] F. Pellegrini and J. Roman, “SCOTCH: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs,” HPCN-Europe, Springer LNCS 1067, 1996, pp. 493-498.
[47] C. Walshaw and M. Cross, “JOSTLE: Parallel Multilevel Graph-Partitioning Software - An Overview,” in Mesh Partitioning Techniques and Domain Decomposition Techniques, Civil-Comp Ltd., 2007, pp. 27-58.
[48] R. Preis and R. Diekmann, “PARTY - a software library for graph partitioning,” Advances in Computational Mechanics with Parallel and Distributed Processing, Civil-Comp Press, 1997, pp. 63-71.
[49] H. Meyerhenke, B. Monien and T. Sauerwald, “A new diffusion-based multilevel algorithm for computing graph partitions,” Journal of Parallel and Distributed Computing, vol. 69, no. 9, 2009, pp. 750-761.
[50] A. Trifunovic and W. J. Knottenbelt, “Parallel Multilevel Algorithms for Hypergraph Partitioning,” Journal of Parallel and Distributed Computing, vol. 68, no. 5, 2008, pp. 563-581.
[51] U. V. Catalyurek, E. G. Boman, K. D. Devine, D. Bozdag, R. T. Heaphy and L. A. Riesen, “A repartitioning hypergraph model for dynamic load balancing,” Journal of Parallel and Distributed Computing, vol. 69, no. 8, 2009, pp. 711-724.
[52] U. V. Catalyurek and C. Aykanat, “PaToH: Partitioning Tool for Hypergraphs,” 2011.
[53] C. C. Aggarwal and H. Wang (Eds.), “Managing and Mining Graph Data,” Advances in Database Systems, vol. 40, Springer, 2010.
[54] S. Aluru, “Handbook of Computational Molecular Biology,” Chapman and Hall/CRC, 2006.
[55] S. U. Rehman, S. Asghar, Y. Zhuang and S. Fong, “Performance evaluation of frequent subgraph discovery techniques,” Mathematical Problems in Engineering, 2014, pp. 1-5.
[56] A. Inokuchi, T. Washio and H. Motoda, “An apriori-based algorithm for mining frequent substructures from graph data,” Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 00), 2000, pp. 13-23.
[57] K. Lakshmi and T. Meyyappan, “A comparative study of frequent subgraph mining algorithms,” International Journal of Information Technology Convergence and Services (IJITCS), vol. 2, no. 2, 2012, pp. 23-39.
[58] K. Lakshmi and T. Meyyappan, “Frequent subgraph mining algorithms - a survey and framework for classification,” Proc. Conference on Innovations in Theoretical Computer Science (ITCS 12), 2012, pp. 189-202.
[59] M. Kuramochi and G. Karypis, “Frequent Subgraph Discovery,” Proc. IEEE International Conference on Data Mining (ICDM 01), 2001, pp. 313-320.
[60] L. Dehaspe and H. Toivonen, “Discovery of Frequent Datalog Patterns,” Data Mining and Knowledge Discovery, 1999, pp. 7-36.
[61] A. Inokuchi, T. Washio and H. Motoda, “An apriori-based algorithm for mining frequent substructures from graph data,” Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 00), 2000, pp. 13-23.
[62] S. Nijssen and J. Kok, “Faster association rules for multiple relations,” Proc. International Joint Conference on Artificial Intelligence (IJCAI’01), 2001, pp. 891-896.
[63] J. Huan, W. Wang and J. Prins, “Efficient mining of frequent subgraphs in the presence of isomorphism,” UNC Computer Science Technical Report TR03-021, 2003.
[64] M. Kuramochi and G. Karypis, “Finding Frequent Patterns in a Large Sparse Graph,” Proc. 4th SIAM International Conference on Data Mining (SDM 2004), USA, 2004.
[65] M. Kuramochi and G. Karypis, “GREW: A Scalable Frequent Subgraph Discovery Algorithm,” Proc. International Conference on Data Mining (ICDM’04), Brighton, 2004, pp. 439-442.
[66] J. Huan, W. Wang, J. Prins and J. Yang, “Spin: Mining maximal frequent subgraphs from graph databases,” UNC Technical Report TR04-018, 2004.
[67] B. Wackersreuther, P. Wackersreuther, A. Oswald, C. Bohm and K. M. Borgwardt, “Frequent subgraph discovery in dynamic networks,” Proc. Eighth Workshop on Mining and Learning with Graphs (MLG 10), 2010, pp. 155-162, doi:10.1145/1830252.1830272.
[68] L. Thomas, S. Valluri and K. Karlapalem, “ISG: Itemset based subgraph mining,” Technical Report, Center for Data Engineering, IIIT, Hyderabad, 2009.
[69] Z. Zou and J. Li, “Mining Frequent Subgraph Patterns from Uncertain Graph Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, 2010, pp. 1203-1218, doi:10.1109/TKDE.2010.80.
[70] S. Jamil, A. Khan, Z. Halim and A. R. Baig, “Weighted MUSE for Frequent Subgraph Pattern Finding in Uncertain DBLP Data,” Proc. International Conference on Internet Technology and Applications (iTAP), 16-18 Aug. 2011, pp. 1-6.
[71] Z. Zou, H. Gao and J. Li, “Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics,” Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 10), 2010, pp. 633-642, doi:10.1145/1835804.1835885.
[72] O. Papapetrou, E. Loannou and D. Skoutas, “Efficient Discovery of Frequent Subgraph Patterns in Uncertain Graph Databases,” Proc. International Conference on Extending Database Technology (EDBT/ICDT ’11), A. Ailamaki, S. Amer-Yahia, J. Pate, T. Risch, P. Senellart and J. Stoyanovich (Eds.), ACM, New York, NY, USA, 2011, pp. 355-366, doi:10.1145/1951365.1951408.
[73] K. Yoshida, H. Motoda and N. Indurkhya, “Graph-based Induction as a Unified Learning Framework,” Journal of Applied Intelligence, 1994, pp. 297-328.
[74] L. B. Holder, D. J. Cook and S. Djoko, “Substructure Discovery in the Subdue System,” Proc. AAAI’94 Workshop on Knowledge Discovery in Databases (KDD’94), WA, 1994, pp. 169-180.
[75] C. Borgelt and M. R. Berthold, “Mining molecular fragments: Finding relevant substructures of molecules,” Proc. IEEE International Conference on Data Mining (ICDM 02), IEEE Press, 2002, pp. 51-58, doi:10.1109/ICDM.2002.1183885.
[76] X. Yan and J. Han, “gSpan: Graph-based substructure pattern mining,” Proc. IEEE International Conference on Data Mining (ICDM 02), IEEE Press, 2002, pp. 721-724, doi:10.1109/ICDM.2002.1184038.
[77] X. Yan and J. Han, “CloseGraph: Mining closed frequent graph patterns,” Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 03), Aug. 2003, pp. 286-295, doi:10.1145/956750.956784.
[78] S. Nijssen and J. Kok, “A quickstart in frequent structure mining can make a difference,” Proc. ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 04), Aug. 2004, pp. 647-652, doi:10.1145/1014052.1014134.
[79] T. Meinl and M. R. Berthold, “Hybrid fragment mining with MoFa and FSG,” Proc. IEEE International Conference on Systems, Man and Cybernetics, 2004, pp. 4559-4564, doi:10.1109/ICSMC.2004.1401250.
[80] E. Gudes, E. Shimony and N. Vanetik, “Discovering Frequent Graph Patterns Using Disjoint Paths,” IEEE Transactions on Knowledge and Data Engineering, 2006, pp. 1441-1456, doi:10.1109/TKDE.2006.173.
[81] Y. Li, Q. Lin, G. Zhong, D. Duan, Y. Jin and W. Bi, “A directed labeled graph frequent pattern mining algorithm based on minimum code,” Proc. International Conference on Multimedia and Ubiquitous Engineering (ICMUE ’09), 2009, pp. 353-359, doi:10.1109/MUE.2009.67.
[82] Y. Ke, J. Cheng and J. X. Yu, “Efficient Discovery of Frequent Correlated Subgraph Pairs,” Proc. IEEE International Conference on Data Mining (ICDM ’09), IEEE Press, 2009, pp. 239-248.
[83] S. Zhang, J. Yang and S. Li, “RING: An Integrated Method for Frequent Representative Subgraph Mining,” Proc. International Conference on Data Mining (ICDM ’09), 2009, pp. 1082-1087, doi:10.1109/ICDM.2009.96.
[84] Y. Liu, J. Li and H. Gao, “JPMiner: Mining Frequent Jump Patterns from Graph Databases,” Proc. International Conference on Fuzzy Systems and Knowledge Discovery (ICFSKD ’09), 2009, pp. 114-118.
[85] S. Ranu and A. K. Singh, “GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases,” Proc. IEEE International Conference on Data Engineering (ICDE ’09), IEEE Press, 2009, pp. 844-855, doi:10.1109/ICDE.2009.133.
[86] H. P. Hsieh and C. T. Li, “Mining Temporal Subgraph Patterns in Heterogeneous Information Networks,” Proc. IEEE International Conference on Social Computing / International Conference on Privacy, Security, Risk and Trust, 2010, pp. 282-287, doi:10.1109/SocialCom.2010.47.
[87] J. Li, Y. Liu and H. Gao, “Efficient algorithms for summarizing graph patterns,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 9, 2011, pp. 1388-1405, doi:10.1109/TKDE.2010.48.
[88] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Proc. Symposium on Operating System Design and Implementation, 2004, pp. 137-150.
[89] Apache Hadoop. http://hadoop.apache.org/.
[90] Hadoop Distributed File System. http://hadoop.apache.org/hdfs/.
[91] C. Olston, B. Reed, U. Srivastava, R. Kumar and A. Tomkins, “Pig Latin: a Not-So-Foreign Language for Data Processing,” Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), 2008, pp. 1099-1110, doi:10.1145/1376616.1376726.
[92] U. Kang, C. E. Tsourakakis and C. Faloutsos, “PEGASUS: A peta-scale graph mining system - Implementation and observations,” Proc. IEEE International Conference on Data Mining, 2009, pp. 229-238, doi:10.1109/ICDM.2009.14.
[93] Apache Mahout. http://mahout.apache.org/.
[94] Y. Bu, B. Howe, M. Balazinska and M. D. Ernst, “HaLoop: Efficient iterative data processing on large clusters,” Proc. International Conference on Very Large Databases, 2010, pp. 285-296.
[95] Y. Zhang, Q. Gao, L. Gao and C. Wang, “iMapReduce: A distributed computing framework for iterative computation,” DataCloud, 2011.
[96] R. Chen, X. Weng, B. He and M. Yang, “Large graph processing in the cloud,” Proc. International Conference on Management of Data, 2010, pp. 1123-1126.
[97] J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S. H. Bae, J. Qiu and G. Fox, “Twister: a runtime for iterative MapReduce,” Proc. ACM International Symposium on High Performance Distributed Computing (HPDC ’10), ACM, New York, NY, USA, 2010, pp. 810-818.
[98] Open MPI. http://www.open-mpi.org/.
[99] MPICH2. http://www.mcs.anl.gov/research/projects/mpich2/.
[100] pyMPI. http://pympi.sourceforge.net/.
[101] OCaml MPI. http://forge.ocamlcore.org/projects/ocamlmpi/.
[102] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser and G. Czajkowski, “Pregel: A System for Large-Scale Graph Processing,” Proc. ACM SIGMOD International Conference on Management of Data, 2011, pp. 155-166.
[103] Apache Incubator Giraph. http://incubator.apache.org/giraph//.
[104] GoldenOrb. http://www.raveldata.com/goldenorb/.
[105] Phoebus. https://github.com/xslogic/phoebus.
[106] Apache Hama. http://incubator.apache.org/hama/.