
Volume 6, No. 7, September-October 2015
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
Analysis of Large Graph Partitioning and Frequent Subgraph Mining on Graph Data

Appala Srinuvasu Muttipati
Research Scholar, Computer Science and Engineering
GITAM University, Andhra Pradesh, India

Dr. Poosapati Padmaja
Associate Professor, Information Technology
GITAM University, Andhra Pradesh, India

Abstract: Graph mining has attracted much attention due to the explosive growth of graph databases. A graph database consists of either a single large graph or a number of relatively small graphs. Applications that produce graph databases include biological networks, the semantic web and behavioural modelling. Frequent subgraph mining plays an essential role in data mining, with the objective of extracting knowledge in the form of repeated structures. Many efficient subgraph mining algorithms have been proposed in the last two decades, yet most do not scale to the so-called "large-scale graph data". Many problems are so large or complex that it is impractical or impossible to solve them on a single computer, especially given limited memory, and scalable parallel algorithms hold the key role in this context. This paper discusses various algorithms and parallel frameworks for graph partitioning, for frequent subgraph mining based on the apriori and pattern growth approaches, and for large-scale graph processing. Its central objective is to initiate research and development on identifying frequent subgraphs and on strategies for graph data centres that bring parallel frameworks to bear on memory scalability, partitioning, load balancing, granularity, and technical enhancement for future generations.

Keywords: graph partitioning; frequent subgraph mining; apriori; pattern growth; parallel framework.

I. INTRODUCTION

Graph-based representations of real-world problems have become pervasive owing to their simplicity and effectiveness in finding solutions. Graphs form one of the newer domains in the field of knowledge discovery and data mining (i.e., graph mining). Graph mining [1] is one of the most investigated and fastest-growing research topics in the data mining domain. Its applications include bioinformatics pattern mining [2], network link analysis [3], financial data analysis [4], chemical data analysis [5, 6], drug detection, biological networks [7], protein structure interaction [8, 9] and social networking [10]. Long-established data mining techniques such as pattern matching, classification, clustering and frequent subgraph discovery have been extended to the graph setting.

Nowadays graphs have grown enormously: gigabytes, terabytes and even petabytes of data are generated every day and processed online through social networking sites and biological networks. Such data can be modelled as a graph in which nodes represent users and edges represent the relations between them. Likewise, search engines manage huge amounts of data captured from the internet; the most common way to model it as a graph is to treat websites as nodes and hyperlinks (URLs) as edges.

The main motive of this paper is to partition a large graph and identify frequent subgraphs through mining techniques. Many efficient graph partitioning [11] and frequent subgraph mining [12] algorithms exist that find frequent subgraphs/patterns in a single graph or in multiple sets of graphs. However, some of today's frequent subgraph mining source data may not fit on a single machine's hard drive, and the exponential nature of the solution space compounds the problem. Scalable parallel algorithms therefore hold a key role in addressing frequent subgraph mining in the context of large-scale graph data.

The rest of the paper is organized as follows. Section II presents a review of various graph partitioning methods and related software tools. Section III describes existing frequent subgraph mining algorithms, based on the apriori and pattern growth approaches, which find frequent subgraphs in a single graph or in multiple sets of graphs. Section IV deals with graph processing frameworks that help process large-scale graphs. Section V discusses parallel frameworks for finding frequent subgraphs in large-scale graphs. After discussing future research directions, we conclude in Section VI.

II. GRAPH PARTITIONING

A graph G is a pair (N, E), where N is a non-empty set of vertices and E ⊆ N × N is a non-empty set of edges, such that every edge e ∈ E relates a pair of vertices (N1, N2). The graph partitioning (GP) problem is NP-complete [13]. GP can be viewed as a min-cut problem: dividing a graph into smaller blocks or pieces. It can be done in two ways, edge-based partitioning and vertex-based partitioning; Figure 1 shows an example of each.
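As a small illustration of the edge-cut objective that edge-based partitioning minimizes, the following sketch counts the edges crossing a two-way partition (the toy graph and block assignment are hypothetical, not taken from the paper):

```python
# Edge cut of a partition: the number of edges whose two endpoints
# lie in different blocks.
def edge_cut(edges, block):
    """edges: iterable of (u, v) pairs; block: dict vertex -> block id."""
    return sum(1 for u, v in edges if block[u] != block[v])

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]
block = {1: 0, 2: 0, 3: 1, 4: 1}      # blocks {1, 2} and {3, 4}
print(edge_cut(edges, block))          # edges (2,3), (4,1), (2,4) are cut -> 3
```

A good partitioner searches for a balanced block assignment that drives this count as low as possible.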

© 2015-19, IJARCS All Rights Reserved 29


Appala Srinuvasu Muttipati et al, International Journal of Advanced Research in Computer Science, 6 (7), September–October, 2015, 29-40

Figure 1. (a) Graph partitioning with edge cut; (b) graph partitioning with vertex cut

Consequently, it is in general not feasible to compute optimal partitionings for graphs of interesting size in a realistic amount of time. This, combined with the practical significance of the problem, has led to the development of numerous heuristic approaches. Partitioning heuristics divide into global and local search methods. Global methods are sometimes called construction heuristics because they take the graph description as input and generate a balanced partition. Local search methods are called improvement heuristics: they take a graph together with a balanced partition as input and try to improve the partition. Every partitioning algorithm has to include a global search method. Optimal methods produce optimal results but have exponential run-time behaviour. The heuristic methods Linear, Scattered and Random depend exclusively on the node positions in the given node list [14].

A. Static Graph Partitioning

Hendrickson and Leland were inspired by the work of Stephen and Horst [15] on a multilevel approach to computing the eigenvector needed for a spectral partitioning algorithm, which analyzes the numerical problems of transferring eigenvectors between levels. Hendrickson and Leland [16] solved this problem by transferring partitions between levels using a multilevel graph partitioning algorithm. The algorithm contains three divisions. In the first division, a sequence of graphs is constructed by collapsing together selected vertices of the input graph to form a related coarser graph; the newly constructed graph then serves as the input graph for another round of coarsening, and so on, until an adequately small graph is obtained. In the second division, the spectral computation is performed on the coarsest of these graphs, which is very fast. In the third division, the coarse partition is projected back through the sequence of graphs and periodically improved with a local refinement algorithm; for this local improvement phase the authors use a variant of the popular algorithm originally devised by Kernighan and Lin [17] and later improved by Fiduccia and Mattheyses [18], although other methods could serve equally well. Figure 2 gives a detailed construction of the multilevel graph partitioning phases, and Table I lists different methods for coarsening (contraction), initial partitioning, and refinement.

Figure 2. Multilevel Graph Partitioning

The two essential parts of the multilevel approach [19-21] are the coarsening strategy and the local improvement method. A coarsening approach must satisfy the following requirements:
• The matching algorithm has to be very fast, so that it is more time-efficient than standard partitioning methods applied directly to the initial graph.
• Since edges of high weight usually connect dense areas of the graph, the algorithm should compute a matching of high edge weight, so that such edges are contracted into the coarse graph rather than being cut.
• To speed up the coarsening process and coarsen the whole graph uniformly, the matchings on each level should have high cardinality. The maximum reduction is achieved by halving the number of vertices on each level, which is only possible when a complete matching can be found.

Table I: Different methods for the multilevel phases

Coarsening: Random Matching (RM), Heavy Edge Matching (HEM), Modified Heavy Edge Matching (MHEM), Light Edge Matching (LEM), Heavy Clique Matching (HCM).
Partitioning: Coordinate sorting, Geometric partitioning, Spectral Bisection (SB), Recursive Bisection (RB), Coordinate Nested Dissection (CND), Graph Growing Partition (GGP), Greedy Graph Growing Partitioning (GGGP).
Uncoarsening: Greedy Refinement (GR), Kernighan-Lin Refinement (KLR), Boundary Greedy Refinement (BGR), Boundary Kernighan-Lin Refinement (BKLR), combination of BGR and BKLR, Hill Climbing, Helpful-Set, Simulated annealing, Tabu search, Reactive search optimization.

B. Parallel Graph Partitioning

A parallel implementation of graph partitioning [22] is necessary for several reasons:
• Large graphs cannot be partitioned by a serial implementation because a single machine's memory is often insufficient; such problems can now be solved on massively parallel machines and workstation clusters.
• A parallel graph partitioning algorithm can take advantage of the substantially larger amount of memory available in a parallel implementation to partition very large graphs.
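The heavy edge matching (HEM) strategy listed in Table I can be sketched as follows (a minimal illustration on a hypothetical weighted graph; production tools such as MeTiS implement far more refined variants):

```python
import random

# One coarsening step via heavy edge matching (HEM), sketched:
# visit the vertices in random order and match each unmatched
# vertex to the unmatched neighbour joined by the heaviest edge.
def heavy_edge_matching(adj, seed=0):
    """adj: dict u -> {v: weight}. Returns dict vertex -> mate (itself if unmatched)."""
    order = list(adj)
    random.Random(seed).shuffle(order)
    mate = {}
    for u in order:
        if u in mate:
            continue
        free = [(w, v) for v, w in adj[u].items() if v not in mate and v != u]
        if free:
            _, v = max(free)         # heaviest edge to a still-free neighbour
            mate[u], mate[v] = v, u
        else:
            mate[u] = u              # no free neighbour: u stays single

    return mate

adj = {
    'a': {'b': 5, 'c': 1},
    'b': {'a': 5, 'd': 2},
    'c': {'a': 1, 'd': 1},
    'd': {'b': 2, 'c': 1},
}
m = heavy_edge_matching(adj)
print(m)    # a valid matching: every matched pair is mutual
```

Contracting each matched pair into a single vertex (summing the weights of parallel edges) yields the coarser graph for the next level, which is why heavy edges are preferred: they are absorbed rather than cut.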
Parallel graph partitioning [23] is also crucial for achieving good results in the context of adaptive graph partitioning, where the graph is already distributed among processors but must be repartitioned because of the dynamic nature of the underlying computation. In such cases, collecting the graph on a single processor for repartitioning would create a serious bottleneck that would adversely impact the scalability of the overall application.

Further work on parallel graph partitioning concentrated on geometric and spectral schemes by Stephen and Simon [24], and on multilevel partitioning schemes by Karypis and Kumar [25, 26]. Geometric graph partitioning algorithms [27, 28] tend to be relatively easy to parallelize, whereas spectral and multilevel partitioners are harder to parallelize: when the input graph is not well distributed across the processors, their parallel asymptotic run times are equivalent to performing a parallel matrix-vector multiplication on a randomly partitioned matrix. If the graph is first partitioned and then distributed across the processors accordingly, the parallel asymptotic run times of spectral and multilevel partitioners drop to that of a parallel matrix-vector multiplication on a well-partitioned matrix. In short, performing these partitioning schemes efficiently in parallel requires a good partitioning of the input graph. Static graph partitioning cannot provide a good initial distribution of the input graph, whereas adaptive graph partitioning can provide a high-quality one with a low edge cut. For this reason, parallel adaptive graph partitioners [29, 30] tend to run considerably faster than static partitioners.

Since the run time of most parallel geometric partitioning schemes is hardly affected by the initial distribution of the graph, they can be used to bootstrap the partitioning: a rough partitioning of the input graph is first computed by a fast geometric approach and used to redistribute the graph before performing parallel multilevel or spectral partitioning. This "boot-strapping" approach can significantly increase the parallel efficiency of the more accurate partitioning scheme by providing it with data locality [23].

C. Dynamic Graph Partitioning

Currently, much research addresses dynamic graph partitioning, driven by real-world applications and ever-growing graphs. Dynamic graph partitioning cannot be done in the memory of a single machine because the number of nodes/vertices in the graph grows dramatically (up to billions of nodes). It therefore relies on a distributed memory system, which helps place the graph across several machines for processing. Partitioning a dynamic graph requires clustering, load balancing and local heuristic methods to obtain good results. Several such approaches have been presented: how to partition a billion-node graph [31], a streaming graph partitioning method [32], parallel graph partitioning for complex networks [33], and Spinner, a scalable graph partitioning technique for the cloud [34]. Figure 3 describes the distributed memory system and label propagation.

• Distributed graphs. A distributed memory system is the right communication model for online query processing over a billion-node graph. To organize a graph on a distributed memory system, one has to divide the graph into multiple partitions and store each partition on one machine (without any overlap). Network communication is required for accessing non-local partitions of the graph. Hence, how the graph is partitioned can have a major impact on load balancing and communication [31].
• Label Propagation (LP). A local variant of LP works as follows. First, assign a unique label id to each vertex. Next, update the vertex labels iteratively: at every iteration, a vertex takes the label that is most common in its neighborhood as its own label. The process terminates when labels no longer change. Vertices that have the same label belong to the same partition.

There are two reasons to adopt label propagation for partitioning [31]:
1) The label propagation mechanism is lightweight. It does not produce intermediate results, and it does not require sorting or indexing the data as many existing graph partitioning algorithms do.
2) Label propagation can discover the inherent community structures of real networks: given the presence of local, closely connected substructures, a label tends to propagate within such structures. Since most real-life networks exhibit clear community structures, a partitioning algorithm based on label propagation may divide the graph into meaningful partitions. Compared to maximal matching, LP is more semantics-aware and is a better coarsening scheme.

Figure 3. (a) A graph on 3 machines {m1, m2, m3}; each machine carries 4 vertices, and the partitions are {A, B, C, D}, {E, F, G, H}, {I, J, K, L}. (b) Coarsened by maximal match. (c) Coarsened by LP.

D. Classification of graph partitioning algorithms

The main challenges of partitioning graph data are: (i) quality of the partitioning, (ii) the multilevel paradigm and (iii) load balancing. Graph partitioning algorithms use different approaches to address these challenges. Figure 4 shows how different algorithms are classified based on the implementation of the graph partitioning approaches.
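The label propagation procedure described above can be sketched as follows (a minimal synchronous variant on a hypothetical toy graph; production systems run this in a distributed, vertex-centric fashion):

```python
from collections import Counter

# Synchronous label propagation partitioning: every vertex starts
# with a unique label and repeatedly adopts the label most common
# among its neighbours (ties broken by the smallest label).
def label_propagation(adj, max_iter=50):
    """adj: dict vertex -> list of neighbours. Returns dict vertex -> label."""
    labels = {v: v for v in adj}                     # unique label per vertex
    for _ in range(max_iter):
        new = {}
        for v in adj:
            counts = Counter(labels[u] for u in adj[v])
            if not counts:                           # isolated vertex
                new[v] = labels[v]
                continue
            top = max(counts.values())
            new[v] = min(l for l, c in counts.items() if c == top)
        if new == labels:                            # labels stable: stop
            break
        labels = new
    return labels

# Two triangles joined by the single bridge edge (c, d):
adj = {
    'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'],
    'd': ['c', 'e', 'f'], 'e': ['d', 'f'], 'f': ['d', 'e'],
}
labels = label_propagation(adj)
print(labels)   # -> {'a': 'a', 'b': 'a', 'c': 'a', 'd': 'c', 'e': 'c', 'f': 'c'}
```

The two properties claimed above are visible here: no intermediate sorting or indexing is needed, and the bridge edge (c, d) ends up separating the two densely connected communities.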

Figure 4. Classification of Graph Partitioning Methods

E. Software tools

Hendrickson and Leland [35] were the first to release a graph partitioning tool, Chaco, and to implement multilevel graph partitioning. Adopting the multilevel approach, Karypis and Kumar [36-39] proposed a parallel multilevel k-way partitioning scheme, fast and high-quality multilevel schemes, a parallel algorithm for multilevel graph partitioning, and the MeTiS and hMeTiS tools, which produce good partitions. Karypis, Schloegel and Kumar [40] developed the ParMeTiS tool for parallel graph partitioning. More recently, Sanders and Schulz [41] developed the KaHIP tool, in which Karlsruhe Fast Flow Partitioning (KaFFPa) [42] is a multilevel graph partitioning algorithm that uses a novel local improvement algorithm based on max-flow and min-cut computations and more localized FM searches, and, on the other hand, uses more sophisticated global search strategies transferred from multi-grid linear solvers. KaFFPa Evolutionary [43] is a distributed evolutionary algorithm for the graph partitioning problem. Pellegrini developed Scotch [44-46], which uses recursive multilevel bisection and incorporates sequential and parallel graph partitioning methods. Chris Walshaw developed Jostle [47], a well-known sequential and parallel graph partitioning tool. Party [48], developed by Robert Preis, implements algorithms such as Bubble/Shape optimization and Helpful Sets. Meyerhenke developed the software packages DibaP and PDibaP [49]. Tools that mainly focus on hypergraph partitioning are Parkway [50] by Trifunovic, Zoltan [51] by Devine et al., and PaToH [52] by Catalyurek et al., which produces high-quality partitionings.

III. FREQUENT SUBGRAPH MINING

The problem of frequent subgraph mining [53] is to find frequent subgraphs over a collection of graphs. Frequent subgraph mining delivers meaningfully structured information such as hot web access patterns, common protein structures, and patterns in computational molecular biology [54]. It can also be used in fraud detection to catch similar fraudulent transaction patterns among millions of electronic payments. Furthermore, a graph is a general data structure that covers almost all previously well-researched frequent patterns, so it can unify their mining into the same framework. Frequent subgraph mining has therefore raised great interest.

The problem can be stated precisely as follows. Given an input dataset D of n graphs, frequent subgraph mining aims to find the subgraphs whose support meets a predetermined threshold. The support of a subgraph Gsub, denoted sup(Gsub), is given as

    sup(Gsub) = (1/n) Σ_{i=1}^{n} G_i

where n is the total number of graphs in the dataset and each G_i indicates whether Gsub occurs in the i-th graph [55].

A. Apriori based approach

Apriori-based frequent substructure mining algorithms share similar characteristics with apriori-based frequent itemset mining algorithms [56]. The search for frequent graphs starts with graphs of small "size" and proceeds bottom-up. Each iteration increases the size by one, starting out from the newly discovered frequent substructures: new substructures are generated by joining two similar but slightly different frequent subgraphs discovered earlier, and the frequency of the newly formed graphs is then checked. Apriori-based algorithms incur considerable overhead when two size-k frequent substructures are joined to generate size-(k+1) graph candidates [57, 58].

Apriori-based algorithms suffer two additional costs:
• Costly subgraph isomorphism testing. Since subgraph isomorphism is an NP-complete problem, no polynomial algorithm is known for it. Testing false candidates (false tests or false searches) therefore degrades performance considerably.
• Costly candidate generation. Generating size-(k+1) subgraph candidates from size-k frequent subgraphs is more complicated and costly than for itemsets, as observed by Kuramochi and Karypis [59].

Apriori-based algorithms include WARMR [60], AGM [61], FSG [62], FARMER [63], FFSM [64], HSIGRAM [65], GREW [66], SPIN [67], Dynamic GREW [68], ISG [69], MUSE [70], Weighted MUSE [71], MUSE-P [72] and UGRAP [73]. A few of these algorithms are discussed below; Table II gives an overview of the remaining apriori-based algorithms.

FFSM [64] is a subgraph mining algorithm by Huan et al. FFSM uses a vertical search scheme within an algebraic graph framework and a restricted join operation to generate candidates, and it stores embeddings to avoid explicit subgraph isomorphism testing. Experiments on synthetic and real datasets showed that FFSM achieves a substantial performance improvement over state-of-the-art subgraph mining algorithms.

HSIGRAM [65] uses an adjacency matrix representation of the graph and iterative merging for subgraph generation. The aim of HSIGRAM is to find the maximal independent set of a graph constructed from the embeddings of a frequent subgraph, so as to evaluate its frequency.
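The support measure and the level-wise apriori search can be illustrated with a deliberately simplified sketch in which each graph is reduced to a set of labelled edges and "occurrence" to edge-set containment, sidestepping the subgraph isomorphism test discussed above (all data here is hypothetical):

```python
from itertools import combinations

# Each database graph is modelled as a frozenset of labelled edges;
# a pattern "occurs" in a graph when its edges are a subset of the
# graph's edges. This stands in for real subgraph isomorphism.
db = [
    frozenset({('A', 'B'), ('B', 'C'), ('C', 'A')}),
    frozenset({('A', 'B'), ('B', 'C')}),
    frozenset({('A', 'B'), ('C', 'D')}),
]

def support(pattern, db):
    """sup(P) = |{G_i : P occurs in G_i}| / n, as in the text."""
    return sum(1 for g in db if pattern <= g) / len(db)

def apriori_fsm(db, min_sup):
    # Level 1: single frequent edges.
    edges = {e for g in db for e in g}
    level = {frozenset({e}) for e in edges
             if support(frozenset({e}), db) >= min_sup}
    frequent = set(level)
    while level:
        # Join two size-k patterns differing in one edge -> size k+1.
        k = len(next(iter(level)))
        cand = {p | q for p, q in combinations(level, 2) if len(p | q) == k + 1}
        level = {c for c in cand if support(c, db) >= min_sup}
        frequent |= level
    return frequent

for p in sorted(apriori_fsm(db, min_sup=2/3), key=len):
    print(sorted(p), support(p, db))
```

With min_sup = 2/3, the frequent patterns are {(A,B)} (support 1.0), {(B,C)} and {(A,B),(B,C)} (support 2/3 each); the infrequent edges (C,A) and (C,D) are pruned at level 1, so no candidate containing them is ever generated.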


Table II: Comparison of FSM algorithms based on the apriori approach

Algorithm     | Nature of graph | Search strategy | Isomorphism test | Nature of output | Reference
WARMR         | Static          | Breadth-first   | Approximate      | Complete         | Dehaspe et al. (1999) [60]
AGM           | Static          | Breadth-first   | Exact            | Complete         | Inokuchi et al. (2000) [61]
FSG           | Static          | Breadth-first   | Exact            | Incomplete       | Kuramochi and Karypis (2001) [59]
FARMER        | Static          | Breadth-first   | Approximate      | Complete         | Nijssen and Kok (2002) [62]
FFSM          | Static          | Depth-first     | Exact            | Complete         | Huan et al. (2003) [63]
HSIGRAM       | Static          | Breadth-first   | Adjustable       | Complete         | Jiang et al. (2004) [64]
GREW          | Static          | Greedy          | Exact            | Incomplete       | Kuramochi and Karypis (2004) [65]
SPIN          | Static          | Depth-first     | Exact            | Incomplete       | Huan et al. (2004) [66]
Dynamic GREW  | Dynamic         | Depth-first     | Exact            | Incomplete       | Kuramochi et al. (2006) [67]
ISG           | Static          | Breadth-first   | Exact            | Complete         | Thomas et al. (2009) [68]
MUSE          | Static          | Depth-first     | Exact            | Complete         | Li et al. (2009) [69]
Weighted MUSE | Static          | Depth-first     | Exact            | Complete         | Jamil et al. (2011) [70]
MUSE-P        | Static          | Depth-first     | Exact            | Complete         | Li et al. (2010) [71]
UGRAP         | Static          | Depth-first     | Exact            | Complete         | Papapetrou et al. (2011) [72]

Zou et al. (2010) proposed an algorithm for mining frequent subgraph patterns from uncertain graph data. In many real applications, graph data is subject to uncertainty because of the incompleteness and imprecision of the data. Mining such uncertain graph data is semantically different from, and computationally more challenging than, mining conventional exact graph data. A novel model of uncertain graphs is presented, and the frequent subgraph pattern mining problem is formalized by introducing a new measure called expected support. An approximate mining algorithm, Mining Uncertain Subgraph patterns (MUSE) [70], is proposed to find a set of approximately frequent subgraph patterns by allowing an error tolerance on the expected supports of discovered patterns. The algorithm uses efficient methods to determine whether a subgraph pattern can be output, and a new pruning method to reduce the complexity of examining subgraph patterns. Analytical and experimental results showed that the algorithm is efficient, accurate, and scalable for large uncertain graph databases.

Khan et al. (2011) proposed Weighted MUSE [71], modifying MUSE by assigning a weight factor w ∈ (0, 1) to the edges of the embeddings included in each identified frequent subgraph pattern.

Frequent subgraph mining on uncertain graphs has also been investigated under probabilistic semantics [72]. Specifically, a measure called the ϕ-frequent probability is introduced to evaluate the degree of recurrence of subgraphs; the goal is to quickly find all subgraphs whose frequent probability is sufficiently high. Extensive experiments on real uncertain graph data verify that the algorithm is efficient and that the mining results are of very high quality.

Papapetrou et al. (2011) proposed a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The algorithm relies on the apriori property to enumerate candidate subgraph patterns efficiently; the index is then used to reduce the number of comparisons required for computing the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination that further increase the efficiency of the method. An evaluation of the approach on three real-world datasets and on synthetic uncertain graph databases exhibits significant cost savings with respect to the state-of-the-art approach [73].

B. Pattern growth approach

To avoid the limitations of apriori algorithms, pattern growth algorithms have been developed, most of which adopt a depth-first search strategy. A pattern growth mining algorithm extends a frequent graph by adding a new edge in every possible position. A possible issue with edge extension is that the same graph can be discovered many times. The gSpan algorithm solves this problem by introducing a rightmost extension technique, where extensions take place only on the rightmost path. A rightmost path is the straight path from the starting vertex V0 to the last vertex Vn, according to a depth-first search on the graph.

Pattern growth approaches over multiple small graphs include the following algorithms: RP-FP and RP-GD by Jianzhong, Yong and Hong [88]; JPMiner by Yong, Jianzhong and Hong [85]; MSPAN by Yuhua et al. [82]; HybridGMiner [79]; PATH [80]; SEUS [81]; FCPMiner by Yiping, James and Jeffrey [83]; RING by Shijie, Jiong and Shirong [84]; GraphSig by Sayan and Ambuj [86]; and TSP by Hsun-Ping and Cheng-Te [87]; see [55] for more details. Four well-known algorithms are discussed below, and Table III gives an overview of the remaining pattern growth algorithms.

Holder et al. (1994) proposed substructure discovery in the SUBDUE system. SUBDUE [74] uses the minimum description length principle to discover substructures that compress the database and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. Optional background knowledge guides SUBDUE towards appropriate substructures for a particular domain, and inexact graph matching allows a controlled amount of deviation in the instances of a substructure concept. The large amount of structural information that can be added to non-structural data collections on physical phenomena provides a large


testbed for comparing an integrated discovery system based on SUBDUE to other non-structural systems.

Borgelt and Berthold proposed the MoFa algorithm [75], the Molecular Fragment Miner. MoFa keeps the embedding lists of previously found subgraphs; when a subgraph is extended with an edge, the extension function is restricted to these embedding lists. Isomorphism tests can then be done inexpensively by testing whether an embedding list can be refined in the same way. The algorithm also uses structural pruning and background knowledge to reduce support computation, and removes duplicates using standard isomorphism testing.
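The depth-first pattern growth strategy with duplicate detection can be sketched under a simplified edge-set model of graphs (a sorted edge tuple stands in for a real canonical form such as gSpan's minimum DFS code; all names and data here are hypothetical):

```python
# Depth-first pattern growth: grow a frequent pattern one edge at a
# time, pruning branches whose canonical form was already expanded.
db = [
    frozenset({('A', 'B'), ('B', 'C'), ('C', 'A')}),
    frozenset({('A', 'B'), ('B', 'C')}),
    frozenset({('A', 'B'), ('C', 'D')}),
]

def support(pattern):
    return sum(1 for g in db if pattern <= g) / len(db)

def grow(pattern, min_sup, seen, out):
    canon = tuple(sorted(pattern))      # canonical form of this pattern
    if canon in seen:                   # duplicate: already expanded
        return
    seen.add(canon)
    out.add(pattern)
    all_edges = {e for g in db for e in g}
    for e in all_edges - pattern:       # try every one-edge extension
        ext = pattern | {e}
        if support(ext) >= min_sup:     # recurse only on frequent extensions
            grow(ext, min_sup, seen, out)

def pattern_growth(min_sup):
    seen, out = set(), set()
    for e in {e for g in db for e in g}:
        p = frozenset({e})
        if support(p) >= min_sup:
            grow(p, min_sup, seen, out)
    return out

print(sorted(tuple(sorted(p)) for p in pattern_growth(2/3)))
```

Unlike the apriori join, no candidate set is materialized per level: each frequent pattern is extended in place, and the `seen` check plays the role of the rightmost-extension rule in preventing the same pattern from being expanded twice.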

Table III: Comparison of FSM algorithms based on the pattern growth approach

Algorithm    | Nature of graphs | Search method | Isomorphism test | Nature of output | Reference
GBI          | Static           | Greedy        | Exact            | Complete         | Yoshida et al. (1994) [73]
SUBDUE       | Static           | Greedy        | Approximate      | Complete         | Cook and Holder (1994) [74]
MoFa         | Static           | Depth-first   | Exact            | Complete         | Borgelt and Berthold (2002) [75]
gSpan        | Static           | Depth-first   | Exact            | Complete         | Yan and Han (2002) [76]
CloseGraph   | Static           | Depth-first   | Exact            | Incomplete       | Yan and Han (2003) [77]
GASTON       | Static           | Depth-first   | Exact            | Complete         | Nijssen and Kok (2004) [78]
HybridGMiner | Static           | Depth-first   | Exact            | Complete         | Meinl et al. (2004) [79]
SEUS         | Static           | Depth-first   | Exact            | Complete         | Gudes et al. (2006) [80]
MSPAN        | Static           | Depth-first   | Exact            | Complete         | Li et al. (2009) [81]
FCPMiner     | Static           | Depth-first   | Exact            | Complete         | Ke et al. (2009) [82]
RING         | Static           | Depth-first   | Exact            | Complete         | Zhang et al. (2009) [83]
JPMiner      | Static           | Depth-first   | Exact            | Incomplete       | Liu et al. (2009) [84]
GraphSig     | Static           | Depth-first   | Exact            | Complete         | Ranu et al. (2009) [85]
TSP          | Dynamic          | Depth-first   | Exact            | Incomplete       | Li and Hsieh (2010) [86]
RP-GD        | Static           | Depth-first   | Exact            | Incomplete       | Li et al. (2011) [87]
RP-FP        | Static           | Depth-first   | Exact            | Incomplete       | Li et al. (2011) [87]

Yan and Han proposed the CloseGraph [77] algorithm for mining closed frequent graph patterns; it is an extension of the gSpan algorithm. A graph G is closed in a database if there exists no proper supergraph of G that has the same support as G. CloseGraph is built around several interesting pruning methods. The performance study shows that CloseGraph not only dramatically reduces the number of unnecessary subgraphs generated, but also significantly increases the efficiency of mining, particularly in the presence of large graph patterns.

Nijssen and Kok presented the GASTON [78] algorithm, short for GrAph/Sequence/Tree extractiON. GASTON combines frequent path, subtree, and subgraph mining, exploiting the observation that most frequent substructures in molecular datasets are free trees. The algorithm splits the frequent subgraph mining procedure into phases: first paths, then subtrees, and finally general subgraphs, so that subgraphs are generated only when needed. GASTON therefore performs best when the graphs are mostly trees or paths, because the most expensive subgraph isomorphism testing is confined to the final subgraph mining phase. GASTON keeps all embeddings in order to generate only new subgraphs that actually appear, thus avoiding unnecessary isomorphism detection; it can compute the frequency of a subgraph either with isomorphism tests or with embedding lists.

C. Classification of Frequent Subgraph Mining

Three main challenges of the subgraph generation process are: (i) isomorphic subgraphs, (ii) infrequent subgraphs and (iii) subgraphs that do not exist in the graph database. Frequent subgraph mining algorithms use different approaches to remove or reduce these challenges. Figure 5 classifies frequent subgraph mining algorithms into apriori-based and pattern-growth-based approaches, grouping similar approaches according to the nature of their input graph data.

Figure 5. Classification of Frequent Subgraph Mining Algorithms

IV. PARALLEL FRAMEWORKS FOR LARGE SCALE GRAPHS

This section describes implementations for processing and generating large graph data sets with parallel and distributed algorithms on a cluster.

A. MapReduce
MapReduce [88] is a programming framework for processing massive amounts of data in a distributed fashion. It offers users a simple interface for programming distributed algorithms, and it handles all the details of data distribution, replication, fault tolerance, and load balancing. A typical MapReduce task consists of three abstract functions: map, shuffle, and reduce. The "map" function reads the raw data and processes it into (key, value) pairs. The "shuffle" function sends the output of the map phase across the network to the reducers so that pairs with the same key are grouped together. The "reduce" function processes the (key, value) pairs sharing a key and outputs another set of (key, value) pairs. An iterative MapReduce program runs several MapReduce tasks, feeding the output of the current task as the input of the next.
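The map, shuffle, and reduce phases described above can be mimicked in memory with a short sketch (an illustration of the dataflow, not Hadoop code), here counting edge labels in a toy graph dataset:

```python
# In-memory illustration of the map -> shuffle -> reduce flow:
# counting edge labels in a tiny edge-record dataset.
from collections import defaultdict

records = ["a-b:x", "b-c:y", "p-q:x"]   # toy edge records "u-v:label"

# Map: each record emits one or more (key, value) pairs.
def map_fn(record):
    label = record.split(":")[1]
    return [(label, 1)]

# Shuffle: group values by key (the framework does this in real MapReduce).
groups = defaultdict(list)
for rec in records:
    for k, v in map_fn(rec):
        groups[k].append(v)

# Reduce: fold each key's values into a final (key, value) pair.
counts = {k: sum(vs) for k, vs in groups.items()}
print(counts)  # {'x': 2, 'y': 1}
```

An iterative MapReduce program would simply feed `counts` (serialized) back in as the `records` of the next task.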
B. Hadoop
Hadoop [89] is an open-source implementation of the MapReduce programming model, written in Java. It uses the Hadoop Distributed File System (HDFS) [90] as its file system. Several packages run on top of Hadoop, including Pig [91], a high-level language for Hadoop, and HBase, a column-oriented data store. Owing to its simplicity, scalability, and fault tolerance, big graph mining with Hadoop has attracted significant attention in the research community. Examples of such systems are Pegasus [92], a peta-scale graph mining system introduced by Kang et al. in 2009; Mahout [93]; HaLoop [94], introduced by Bu et al. in 2010; iMapReduce [95]; Surfer [96], introduced by Chen et al. in 2010; and Twister [97], introduced by Ekanayake et al. in 2010.

C. Message Passing Interface
The Message Passing Interface (MPI) [98] is a library specification for message passing: a standard Application Program Interface (API) for writing message-passing applications. Its objective is to provide a widely used standard for message-passing programs, and the interface strives to be practical, portable, efficient, and flexible. MPI was designed for the distributed-memory architectures that became increasingly popular during the 1980s and 1990s. As architectures evolved, shared-memory machines were combined over networks, creating hybrid distributed-memory/shared-memory systems; today MPI runs on virtually any hardware platform: distributed memory, shared memory, and hybrid. The programmer is responsible for correctly identifying parallelism and implementing parallel algorithms using MPI constructs. There are several implementations of MPI [99-101] that can be used to realize parallel message-passing graph algorithms in various programming languages. MPI offers very low-level communication primitives that provide no consistency or fault tolerance, so programmers must build another level of abstraction on their own, which makes programming harder than in bulk synchronous message-passing systems.
D. Bulk Synchronous Parallel
Leslie Valiant developed the Bulk Synchronous Parallel (BSP) model at Harvard University in 1990. Between 1990 and 1992, Valiant and McColl worked out ideas for a distributed-memory BSP programming model, and between 1992 and 1997 McColl led a large research team at Oxford that developed BSP programming libraries, languages, and tools, as well as many massively parallel BSP algorithms. The BSP model covers both message passing and collective communication. A BSP program consists of a sequence of supersteps, each comprising three phases: local computation, process communication, and barrier synchronization. BSP programming enables high-performance parallel computing algorithms for a wide range of scientific problems.

Pregel [102] presented the first bulk synchronous distributed message-passing framework, on which many graph processing systems have drawn. Several frameworks are based on Pregel, including Giraph [103], GoldenOrb [104], Phoebus [105], Hama [106], JPregel [107], and Bagel [108]. Giraph is the best known and most advanced of these systems; it adds several features beyond the basic Pregel model, including master computation, shared aggregators, edge-oriented input, out-of-core computation, and more. In 2012, Apache Giraph was released as an open-source counterpart to Pregel: an iterative graph processing system built for high scalability. Giraph can run as a typical Hadoop job on a Hadoop cluster. The Giraph model lends itself to distributed implementation because it does not prescribe any order of execution within a superstep, and all communication is from superstep S to superstep S+1. During program execution, graph vertices are partitioned and assigned to workers; the default mechanism is hash partitioning, but custom partitioning is also supported.
E. Other Systems
Krepska et al. (2010) proposed HipG [109], a framework in which each vertex is a Java object and computation proceeds sequentially starting from a particular vertex. The code is written as if the graph resided on a single machine, and reads and writes to vertices residing on other machines are translated into RPC calls. HipG therefore incurs significant RPC overhead when executing algorithms, such as PageRank, that compute a value for every vertex in the graph. Zaharia et al. (2010) developed Spark [110], a general cluster computing system whose API is designed to express generic iterative computations; as a result, programming graph algorithms on Spark requires significantly more coding effort than on a dedicated graph processing system. GraphX [111] is Apache Spark's API for graphs and graph-parallel computation. The need for intuitive, scalable tools for graph computation has led to new graph-parallel systems such as Pregel and GraphLab [112], which are designed to execute graph algorithms efficiently. Unfortunately, these systems do not address the challenges of graph construction and transformation, and they offer limited fault tolerance and support for interactive analysis.

V. PARALLEL FRAMEWORKS ON FSM

Bhuiyan et al. (2013) presented MIRAGE [113], a novel iterative MapReduce framework for frequent subgraph mining. MIRAGE is complete, as it returns all the frequent subgraphs for a given user-defined support, and it is efficient, as it applies all the optimizations that the latest FSM algorithms adopt. Experiments with


real-life and large synthetic datasets validate the effectiveness of MIRAGE for mining frequent subgraphs from large graph datasets.

Bhuiyan et al. (2014) proposed FSM-H [114], a frequent subgraph mining algorithm that handles real-world graph data growing both in size and in quantity. FSM-H is a distributed frequent subgraph mining method built over a MapReduce-based framework, which consists of three phases: a data partition phase, a preparation phase, and a mining phase. In the data partition phase, FSM-H creates partitions of the input data and omits infrequent edges from the input graphs; the preparation and mining phases perform the actual mining task. FSM-H generates the complete set of frequent subgraphs for a given minimum support threshold, and it is efficient because it applies all the optimizations that the latest FSM algorithms adopt. Experiments with real-life and large synthetic datasets validate the effectiveness of FSM-H for mining frequent subgraphs from large graph datasets.
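The edge-pruning step of a data partition phase can be sketched as follows (an illustration of the general idea, not FSM-H's actual implementation): an edge whose label is infrequent across the database can never appear in a frequent subgraph, so such edges are dropped before mining starts.

```python
# Sketch of infrequent-edge pruning during data partitioning: count each
# edge label once per graph, then drop edges whose label falls below the
# minimum support across the database.
from collections import Counter

# Toy database: each graph is a list of (u, v, edge_label) triples.
db = [
    [("a", "b", "C-C"), ("b", "c", "C-N")],
    [("p", "q", "C-C"), ("q", "r", "C-O")],
    [("x", "y", "C-C")],
]
min_support = 2   # a label must occur in at least 2 graphs to survive

freq = Counter()
for g in db:
    for label in {e[2] for e in g}:   # count each label once per graph
        freq[label] += 1

pruned = [[e for e in g if freq[e[2]] >= min_support] for g in db]
print(pruned)  # only the 'C-C' edges survive; the other labels occur once
```

The pruned partitions are then distributed to workers, which run the preparation and mining phases on data that is guaranteed to contain only frequent edges.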
pp. 213-221.
VI. CONCLUSION AND FUTURE DIRECTIONS

In this paper, we have described a range of graph partitioning and frequent subgraph mining algorithms. Based on the different graph partitioning algorithms, we classified graph partitioning approaches into static, dynamic, and parallel implementations. For frequent subgraph mining, we considered a single large graph, collections of small graphs, dynamic graph data, and uncertain graph data, and classified the algorithms into apriori-based and pattern-growth approaches. To handle large-scale graphs, we presented various graph processing techniques and discussed parallel frameworks for frequent subgraph mining in detail.

The future directions for identifying frequent subgraphs are:
1) For handling large graph data, very few FSM methodologies exist. By adopting graph partitioning algorithms, a large graph can be decomposed into a set of subgraphs, and either apriori-based or pattern-growth algorithms can then be applied to the smaller graphs to identify frequent subgraphs.
2) The GraphLab, Giraph, and GraphX parallel frameworks give good results compared with other frameworks, so one of them can be adopted for identifying frequent subgraphs in large graph data.
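Future direction 1 can be skeletonized as follows (illustrative only: the round-robin partitioner is a stand-in for a real partitioner such as a METIS-style multilevel scheme, and the induced subgraphs would be handed to any single-machine FSM routine):

```python
# Skeleton of the decompose-then-mine idea: split the vertex set of a large
# graph into k parts, extract each induced subgraph, and feed the pieces to
# an apriori-based or pattern-growth miner.
def partition_vertices(adj, k):
    """Round-robin assignment of vertices to k parts (placeholder partitioner)."""
    parts = [set() for _ in range(k)]
    for i, v in enumerate(sorted(adj)):
        parts[i % k].add(v)
    return parts

def induced_edges(adj, part):
    """Edges of the subgraph induced by one vertex partition."""
    return [(u, v) for u in part for v in adj[u] if v in part and u < v]

adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}   # a toy 4-cycle
parts = partition_vertices(adj, 2)
subgraphs = [induced_edges(adj, p) for p in parts]
print(subgraphs)  # each piece can now be mined independently
```

Note that a real decomposition must also account for frequent patterns that straddle partition boundaries, which is precisely where the quality of the partitioning algorithm matters.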
VII. REFERENCES

[1] D. Chakrabarti and C. Faloutsos, "Graph mining: Laws, generators, and algorithms," ACM Computing Surveys, vol. 38, no. 1, article 2, Jun. 2006, doi:10.1145/1132952.1132954.
[2] C. Wang and S. Parthasarathy, "Parallel Algorithms for Mining Frequent Structural Motifs in Scientific Data," Proc. ACM International Conference on Supercomputing (ICS 04), Jun. 2004, pp. 31-40, doi:10.1145/1006209.1006215.
[3] J. R. Punin, M. Krishnamoorthy, and M. J. Zaki, "LOGML - Log Markup Language for Web Usage Mining," WEBKDD Workshop: Mining Log Data across All Customers Touch Points, 2001, pp. 88-112.
[4] H. Mannila, H. Toivonen and I. Verkamo, "Discovering Frequent Episodes in Sequences," Proc. International Conference on Knowledge Discovery and Data Mining (KDD 95), 1995, pp. 210-215.
[5] M. Deshpande, M. Kuramochi and G. Karypis, "Frequent sub-structure based approaches for classifying chemical compounds," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 8, 2005, pp. 1036-1050.
[6] M. A. Srinuvasu, P. Padmaja and Y. Dharmateja, "Subgraph relative frequency approach for extracting interesting substructures from molecular data," International Journal of Computer Engineering & Technology, 2013, vol. 4, no. 4, pp. 400-411.
[7] H. Hu, X. Yan, Y. Huang, J. Han and X. J. Zhou, "Mining coherent dense subgraphs across massive biological networks for functional discovery," Bioinformatics, vol. 21, no. 1, 2005, pp. 213-221.
[8] G. Ciriello and C. Guerra, "A review on models and algorithms for motif discovery in protein-protein interaction networks," Briefings in Functional Genomics & Proteomics, vol. 7, no. 2, 2008, pp. 147-156.
[9] J. Huan, W. Wang, J. Prins, and J. Yang, "Spin: Mining maximal frequent subgraphs from graph databases," UNC Technical Report TR04-018, 2004.
[10] P. Raghavan, "Social Networks on the Web and in the Enterprise," Proc. First Asia-Pacific Conference on Web Intelligence, 2001, pp. 58-60.
[11] C. Bichot and P. Siarry, "Graph Partitioning," Wiley, New York, 2011.
[12] D. J. Cook and L. B. Holder, "Mining Graph Data," Wiley, New Jersey, 2007.
[13] M. R. Garey and D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., New York, NY, USA, 1990.
[14] R. Preis and R. Diekmann, "PARTY - a software library for graph partitioning," Advances in Computational Mechanics with Parallel and Distributed Processing, Civil-Comp Press, 1997, pp. 63-71.
[15] S. T. Barnard and H. D. Simon, "A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems," Concurrency: Practice and Experience, 1994, vol. 6, pp. 101-117.
[16] B. Hendrickson and R. Leland, "A multilevel algorithm for partitioning graphs," Proc. ACM/IEEE Conference on Supercomputing, 1995.
[17] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," The Bell System Technical Journal, 1970, vol. 49, pp. 291-307.
[18] C. M. Fiduccia and R. M. Mattheyses, "A linear time heuristic for improving network partitions," Proc. IEEE Design Automation Conference, 1982, pp. 175-181.
[19] B. Monien, R. Preis and R. Diekmann, "Quality matching and local improvement for multilevel graph-partitioning," Parallel Computing, vol. 26, no. 12, 2000, pp. 1609-1634.


[20] C. Chevalier and I. Safro, "Comparison of Coarsening Schemes for Multilevel Graph Partitioning," Proc. International Conference on Learning and Intelligent Optimization, 2009, pp. 191-205.
[21] I. Safro, P. Sanders and C. Schulz, "Advanced coarsening schemes for graph partitioning," Proc. International Symposium on Experimental Algorithms (SEA '12), 2012, pp. 369-380.
[22] A. Grama, G. Karypis and V. Kumar, "Introduction to Parallel Computing," Addison-Wesley, 2nd edition, 2003.
[23] K. Schloegel, G. Karypis and V. Kumar, "Graph partitioning for high-performance scientific simulations," in Sourcebook of Parallel Computing, J. Dongarra et al. (Eds.), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003, pp. 491-541.
[24] S. T. Barnard and H. Simon, "A parallel implementation of multilevel recursive spectral bisection for application to adaptive unstructured meshes," Proc. SIAM Conference on Parallel Processing for Scientific Computing, 1995, pp. 627-632.
[25] G. Karypis and V. Kumar, "A parallel algorithm for multilevel graph partitioning and sparse matrix ordering," Journal of Parallel and Distributed Computing, vol. 48, no. 1, 1998, pp. 71-95.
[26] G. Karypis and V. Kumar, "Parallel multilevel k-way partitioning scheme for irregular graphs," SIAM Review, vol. 41, no. 2, 1999, pp. 278-300.
[27] G. L. Miller, S. H. Teng and S. A. Vavasis, "A unified geometric approach to graph separators," Proc. Annual Symposium on Foundations of Computer Science, 1991, pp. 538-547.
[28] M. Heath and P. Raghavan, "A Cartesian parallel nested dissection algorithm," SIAM Journal on Matrix Analysis and Applications, vol. 16, no. 1, 1995, pp. 235-253.
[29] K. Schloegel, G. Karypis and V. Kumar, "Wavefront diffusion and LMSR: algorithms for dynamic repartitioning of adaptive meshes," Technical Report TR 98-034, Dept. of Computer Science and Engineering, Univ. of Minnesota, 1998.
[30] C. Walshaw, M. Cross and M. Everett, "Parallel dynamic graph partitioning for adaptive unstructured meshes," Journal of Parallel and Distributed Computing, vol. 47, no. 2, 1997, pp. 102-108.
[31] L. Wang, Y. Xiao, B. Shao and H. Wang, "How to partition a billion-node graph," Proc. IEEE 30th International Conference on Data Engineering (ICDE), 2014, pp. 568-579.
[32] C. E. Tsourakakis, C. Gkantsidis, B. Radunovic and M. Vojnovic, "FENNEL: Streaming Graph Partitioning for Massive Scale Graphs," Technical Report MSR-TR-2012-113, 2012.
[33] H. Meyerhenke, P. Sanders and C. Schulz, "Parallel Graph Partitioning for Complex Networks," CoRR, abs/1404.4797, 2014.
[34] C. Martella, D. Logothetis and G. Siganos, "Spinner: Scalable Graph Partitioning for the Cloud," arXiv:1404.3861v1, 2014.
[35] B. Hendrickson and R. Leland, "A multilevel algorithm for partitioning graphs," Technical Report SAND93-1301, Sandia National Laboratories, 1993.
[36] G. Karypis and V. Kumar, "A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs," SIAM Journal on Scientific Computing, vol. 20, no. 1, 1998, pp. 359-392.
[37] G. Karypis and V. Kumar, "hMeTiS 1.5: A hypergraph partitioning package," Technical report, Dept. of Computer Science and Engineering, Univ. of Minnesota, 1998.
[38] G. Karypis and V. Kumar, "MeTiS 4.0: Unstructured graph partitioning and sparse matrix ordering system," Technical report, Dept. of Computer Science and Engineering, Univ. of Minnesota, 1998.
[39] G. Karypis and V. Kumar, "Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs," Proc. ACM/IEEE Supercomputing '96 Conference, 1996.
[40] G. Karypis, K. Schloegel and V. Kumar, "ParMeTiS: Parallel graph partitioning and sparse matrix ordering library," Technical report, Dept. of Computer Science and Engineering, Univ. of Minnesota, 1997.
[41] P. Sanders and C. Schulz, "Engineering Multilevel Graph Partitioning Algorithms," Proc. European Symposium on Algorithms, 2011, pp. 469-480.
[42] P. Sanders and C. Schulz, "Think Locally, Act Globally: Highly Balanced Graph Partitioning," Experimental Algorithms, Lecture Notes in Computer Science, vol. 7933, 2013, pp. 164-175.
[43] M. Holtgrewe, P. Sanders and C. Schulz, "Engineering a Scalable High Quality Graph Partitioner," Proc. IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-12.
[44] C. Chevalier and F. Pellegrini, "Improvement of the Efficiency of Genetic Algorithms for Scalable Parallel Graph Partitioning in a Multi-level Framework," Euro-Par 2006 Parallel Processing, Lecture Notes in Computer Science, vol. 4128, 2006, pp. 243-252.
[45] C. Chevalier and F. Pellegrini, "PT-Scotch: A tool for efficient parallel graph ordering," Parallel Computing, vol. 34, no. 6, 2008, pp. 318-331.
[46] F. Pellegrini and J. Roman, "SCOTCH: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs," HPCN-Europe, Springer LNCS 1067, 1996, pp. 493-498.
[47] C. Walshaw and M. Cross, "JOSTLE: Parallel Multilevel Graph-Partitioning Software - An Overview," in Mesh Partitioning Techniques and Domain Decomposition Techniques, Civil-Comp Ltd., 2007, pp. 27-58.


[48] R. Preis and R. Diekmann, "PARTY - a software library for graph partitioning," Advances in Computational Mechanics with Parallel and Distributed Processing, Civil-Comp Press, 1997, pp. 63-71.
[49] H. Meyerhenke, B. Monien and T. Sauerwald, "A new diffusion-based multilevel algorithm for computing graph partitions," Journal of Parallel and Distributed Computing, vol. 69, no. 9, 2009, pp. 750-761.
[50] A. Trifunovic and W. J. Knottenbelt, "Parallel Multilevel Algorithms for Hypergraph Partitioning," Journal of Parallel and Distributed Computing, vol. 68, no. 5, 2008, pp. 563-581.
[51] U. V. Catalyurek, E. G. Boman, K. D. Devine, D. Bozdag, R. T. Heaphy and L. A. Riesen, "A repartitioning hypergraph model for dynamic load balancing," Journal of Parallel and Distributed Computing, vol. 69, no. 8, 2009, pp. 711-724.
[52] U. V. Catalyurek and C. Aykanat, "PaToH: Partitioning Tool for Hypergraphs," 2011.
[53] C. C. Aggarwal and H. Wang (Eds.), "Managing and Mining Graph Data," volume 40 of Advances in Database Systems, Springer, 2010.
[54] S. Aluru, "Handbook of Computational Molecular Biology," Chapman and Hall/CRC, 2006.
[55] S. U. Rehman, S. Asghar, Y. Zhuang and S. Fong, "Performance evaluation of frequent subgraph discovery techniques," Mathematical Problems in Engineering, 2014, pp. 1-5.
[56] A. Inokuchi, T. Washio and H. Motoda, "An apriori-based algorithm for mining frequent substructures from graph data," Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 00), 2000, pp. 13-23.
[57] K. Lakshmi and T. Meyyappan, "A comparative study of frequent subgraph mining algorithms," International Journal of Information Technology Convergence and Services (IJITCS), vol. 2, no. 2, 2012, pp. 23-39.
[58] K. Lakshmi and T. Meyyappan, "Frequent subgraph mining algorithms - a survey and framework for classification," Proc. Conference on Innovations in Theoretical Computer Science (ITCS 12), 2012, pp. 189-202.
[59] M. Kuramochi and G. Karypis, "Frequent Subgraph Discovery," Proc. IEEE International Conference on Data Mining (ICDM 01), 2001, pp. 313-320.
[60] L. Dehaspe and H. Toivonen, "Discovery of Frequent Datalog Patterns," Data Mining and Knowledge Discovery, 1999, pp. 7-36.
[61] A. Inokuchi, T. Washio and H. Motoda, "An apriori-based algorithm for mining frequent substructures from graph data," Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 00), 2000, pp. 13-23.
[62] S. Nijssen and J. Kok, "Faster association rules for multiple relations," Proc. International Joint Conference on Artificial Intelligence (IJCAI '01), 2001, pp. 891-896.
[63] J. Huan, W. Wang and J. Prins, "Efficient mining of frequent subgraphs in the presence of isomorphism," UNC Computer Science Technical Report TR03-021, 2003.
[64] M. Kuramochi and G. Karypis, "Finding Frequent Patterns in a Large Sparse Graph," Proc. 4th SIAM International Conference on Data Mining (SDM 2004), USA, 2004.
[65] M. Kuramochi and G. Karypis, "GREW: A Scalable Frequent Subgraph Discovery Algorithm," Proc. International Conference on Data Mining (ICDM '04), Brighton, 2004, pp. 439-442.
[66] J. Huan, W. Wang, J. Prins and J. Yang, "Spin: Mining maximal frequent subgraphs from graph databases," UNC Technical Report TR04-018, 2004.
[67] B. Wackersreuther, P. Wackersreuther, A. Oswald, C. Bohm and K. M. Borgwardt, "Frequent subgraph discovery in dynamic networks," Proc. Eighth Workshop on Mining and Learning with Graphs (MLG 10), 2010, pp. 155-162, doi:10.1145/1830252.1830272.
[68] L. Thomas, S. Valluri and K. Karlapalem, "ISG: Itemset based subgraph mining," Technical Report, Center for Data Engineering, IIIT, Hyderabad, 2009.
[69] Z. Zou and J. Li, "Mining Frequent Subgraph Patterns from Uncertain Graph Data," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, May 2010, pp. 1203-1218, doi:10.1109/TKDE.2010.80.
[70] S. Jamil, A. Khan, Z. Halim and A. R. Baig, "Weighted MUSE for Frequent Subgraph Pattern Finding in Uncertain DBLP Data," Proc. International Conference on Internet Technology and Applications (iTAP), 16-18 Aug. 2011, pp. 1-6.
[71] Z. Zou, H. Gao and J. Li, "Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics," Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 10), 2010, pp. 633-642, doi:10.1145/1835804.1835885.
[72] O. Papapetrou, E. Ioannou and D. Skoutas, "Efficient Discovery of Frequent Subgraph Patterns in Uncertain Graph Databases," Proc. International Conference on Extending Database Technology (EDBT/ICDT '11), ACM, New York, NY, USA, 2011, pp. 355-366, doi:10.1145/1951365.1951408.
[73] K. Yoshida, H. Motoda and N. Indurkhya, "Graph-based Induction as a Unified Learning Framework," Journal of Applied Intelligence, pp. 297-328.
[74] L. B. Holder, D. J. Cook and S. Djoko, "Substructure Discovery in the Subdue System," Proc. AAAI '94 Workshop on Knowledge Discovery in Databases (KDD '94), WA, 1994, pp. 169-180.


[75] C. Borgelt and M. R. Berthold, "Mining molecular fragments: Finding relevant substructures of molecules," Proc. IEEE International Conference on Data Mining (ICDM 02), IEEE Press, 2002, pp. 51-58, doi:10.1109/ICDM.2002.1183885.
[76] X. Yan and J. Han, "gSpan: Graph-based substructure pattern mining," Proc. IEEE International Conference on Data Mining (ICDM 02), IEEE Press, 2002, pp. 721-724, doi:10.1109/ICDM.2002.1184038.
[77] X. Yan and J. Han, "CloseGraph: Mining closed frequent graph patterns," Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 03), Aug. 2003, pp. 286-295, doi:10.1145/956750.956784.
[78] S. Nijssen and J. Kok, "A quickstart in frequent structure mining can make a difference," Proc. ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 04), Aug. 2004, pp. 647-652, doi:10.1145/1014052.1014134.
[79] T. Meinl and M. R. Berthold, "Hybrid fragment mining with MoFa and FSG," Proc. IEEE International Conference on Systems, Man and Cybernetics, 2004, pp. 4559-4564, doi:10.1109/ICSMC.2004.1401250.
[80] E. Gudes, E. Shimony and N. Vanetik, "Discovering Frequent Graph Patterns Using Disjoint Paths," IEEE Transactions on Knowledge and Data Engineering, 2006, pp. 1441-1456, doi:10.1109/TKDE.2006.173.
[81] Y. Li, Q. Lin, G. Zhong, D. Duan, Y. Jin and W. Bi, "A directed labeled graph frequent pattern mining algorithm based on minimum code," Proc. International Conference on Multimedia and Ubiquitous Engineering (ICMUE '09), 2009, pp. 353-359, doi:10.1109/MUE.2009.67.
[82] Y. Ke, J. Cheng and J. X. Yu, "Efficient Discovery of Frequent Correlated Subgraph Pairs," Proc. IEEE International Conference on Data Mining (ICDM '09), IEEE Press, 2009, pp. 239-248.
[83] S. Zhang, J. Yang and S. Li, "RING: An Integrated Method for Frequent Representative Subgraph Mining," Proc. International Conference on Data Mining (ICDM '09), 2009, pp. 1082-1087, doi:10.1109/ICDM.2009.96.
[84] Y. Liu, J. Li and H. Gao, "JPMiner: Mining Frequent Jump Patterns from Graph Databases," Proc. International Conference on Fuzzy Systems and Knowledge Discovery (ICFSKD '09), 2009, pp. 114-118.
[85] S. Ranu and A. K. Singh, "GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases," Proc. IEEE International Conference on Data Engineering (ICDE '09), IEEE Press, 2009, pp. 844-855, doi:10.1109/ICDE.2009.133.
[86] H. P. Hsieh and C. T. Li, "Mining Temporal Subgraph Patterns in Heterogeneous Information Networks," Proc. IEEE International Conference on Social Computing / International Conference on Privacy, Security, Risk and Trust, 2010, pp. 282-287, doi:10.1109/SocialCom.2010.47.
[87] J. Li, Y. Liu and H. Gao, "Efficient algorithms for summarizing graph patterns," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 9, 2011, pp. 1388-1405, doi:10.1109/TKDE.2010.48.
[88] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Proc. Symposium on Operating System Design and Implementation, 2004, pp. 137-150.
[89] Apache Hadoop. http://hadoop.apache.org/.
[90] Hadoop Distributed File System. http://hadoop.apache.org/hdfs/.
[91] C. Olston, B. Reed, U. Srivastava, R. Kumar and A. Tomkins, "Pig Latin: a Not-So-Foreign Language for Data Processing," Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), 2008, pp. 1099-1110, doi:10.1145/1376616.1376726.
[92] U. Kang, C. E. Tsourakakis and C. Faloutsos, "PEGASUS: A peta-scale graph mining system - Implementation and observations," Proc. IEEE International Conference on Data Mining, 2009, pp. 229-238, doi:10.1109/ICDM.2009.14.
[93] Apache Mahout. http://mahout.apache.org/.
[94] Y. Bu, B. Howe, M. Balazinska and M. D. Ernst, "HaLoop: Efficient iterative data processing on large clusters," Proc. International Conference on Very Large Databases, 2010, pp. 285-296.
[95] Y. Zhang, Q. Gao, L. Gao and C. Wang, "iMapReduce: A distributed computing framework for iterative computation," DataCloud, 2011.
[96] R. Chen, X. Weng, B. He and M. Yang, "Large graph processing in the cloud," Proc. International Conference on Management of Data, 2010, pp. 1123-1126.
[97] J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S. H. Bae, J. Qiu and G. Fox, "Twister: a runtime for iterative MapReduce," Proc. ACM International Symposium on High Performance Distributed Computing (HPDC '10), New York, NY, USA, 2010, pp. 810-818.
[98] Open MPI. http://www.open-mpi.org/.
[99] MPICH2. http://www.mcs.anl.gov/research/projects/mpich2/.
[100] pyMPI. http://pympi.sourceforge.net/.
[101] OCaml MPI. http://forge.ocamlcore.org/projects/ocamlmpi/.
[102] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser and G. Czajkowski, "Pregel: A System for Large-Scale Graph Processing," Proc. ACM SIGMOD International Conference on Management of Data, 2010, pp. 135-146.
[103] Apache Incubator Giraph. http://incubator.apache.org/giraph/.
[104] GoldenOrb. http://www.raveldata.com/goldenorb/.
[105] Phoebus. https://github.com/xslogic/phoebus.
[106] Apache Hama. http://incubator.apache.org/hama/.


[107] JPregel. http://kowshik.github.com/JPregel/.
[108] Bagel Programming Guide. https://github.com/mesos/spark/wiki/BagelProgramming-Guide/.
[109] E. Krepska, T. Kielmann, W. Fokkink and H. Bal, "A high-level framework for distributed processing of large-scale graphs," Proc. International Conference on Distributed Computing and Networking, 2010, pp. 1123-1126.
[110] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker and I. Stoica, "Spark: cluster computing with working sets," Proc. USENIX Conference on Hot Topics in Cloud Computing (HotCloud '10), USENIX Association, Berkeley, CA, USA, 2010, pp. 10-10.
[111] GraphX. http://spark.apache.org/graphx/.
[112] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin and J. M. Hellerstein, "GraphLab: A New Framework for Parallel Machine Learning," CoRR, abs/1006.4990, 2010.
[113] M. A. Bhuiyan and M. A. Hasan, "MIRAGE: An iterative MapReduce based frequent subgraph mining algorithm," arXiv:1307.5894v1 [cs.DB], 22 Jul 2013.
[114] M. A. Bhuiyan and M. A. Hasan, "FSM-H: Frequent subgraph mining algorithm in Hadoop," Proc. IEEE International Congress on Big Data (ICBD 14), 2014, pp. 9-16, doi:10.1109/BigData.Congress.2014.12.

Short Bio Data for the Authors

Appala Srinuvasu Muttipati is a Research Scholar in Computer Science and Engineering at GITAM University, Visakhapatnam, Andhra Pradesh, India. He received his M.Tech degree in Computer Science and Technology from GITAM University in 2011 and his MCA degree from Andhra University in 2008. His current research areas include data mining and graph mining.

Padmaja Poosapati is an Associate Professor in the Department of Information Technology at GITAM University, Visakhapatnam, Andhra Pradesh, India. She received her Master's degree in Computer Science and Engineering from Andhra University in 1999 and her PhD degree in Computer Science and Engineering from Andhra University in 2010. Her current research interests include clustering and classification in data mining and graph mining.