Graph Pattern Mining, Search and OLAP

Xifeng Yan
November 21, 2012

1 Graph Pattern Mining


Graph patterns have become increasingly important in analyzing complex
structures in many domains, such as information networks, social networks,
and computer security. They can be used to index, search, classify, and
cluster graphs, and to predict interactions and functions in graphs.
Frequent Graph Patterns. Among the various kinds of graph patterns,
frequent subgraphs are the most basic patterns that can be discovered in a
collection of graphs. There are two problem settings: multiple graphs and a
single graph.

Multiple Graphs: Given a set of graphs $D = \{G_1, G_2, \dots, G_n\}$, a
graph $g$ is frequent if $g$ is a subgraph of at least $s$ graphs in $D$,
where $s$ is a user-specified threshold, $1 \le s \le n$.

Single Graph: Given a graph $G$, a graph $g$ is frequent if $g$ has more
than $s$ (disjoint) embeddings in $G$, where $s$ is a user-specified
threshold, $1 \le s \le |V(G)|$.

Existing studies mostly focus on the multiple-graph scenario. With some
modifications, the mining methodology can be extended to the single-graph
scenario [30]. Washio and Motoda [56] conducted a survey on graph-based data
mining. Holder et al. [21] proposed SUBDUE, which discovers subgraph
patterns based on minimum description length and background knowledge. The
most popular graph pattern mining algorithms adopt either an Apriori-based
or a pattern-growth approach.
In an Apriori-based approach, the search for frequent subgraphs starts with
graphs of small size and proceeds in a bottom-up manner. At each iteration,
the size of newly discovered frequent subgraphs is increased by one node or
edge. New candidates are generated by joining two similar but slightly
different frequent subgraphs that have already been discovered, and the
frequency of each newly formed graph is then checked. Typical Apriori-based
frequent graph pattern mining algorithms include AGM [23], FSG [29], and an
edge-disjoint path-join algorithm [55].
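The level-wise loop can be sketched in a few lines of Python. The sketch
rests on a simplifying assumption that real miners such as AGM and FSG do not
make: every node label occurs at most once per graph, so a graph is just a
set of undirected labeled edges and the subgraph test reduces to set
containment. Under that assumption, two frequent k-edge patterns that share
k-1 edges are joined into a (k+1)-edge candidate, candidates with an
infrequent connected sub-pattern are pruned, and the survivors have their
support counted. The helper names (`make_graph`, `is_connected`, `support`,
`apriori_mine`) are illustrative only.

```python
from itertools import combinations

def make_graph(*pairs):
    """A graph as a frozenset of undirected edges between uniquely labeled nodes."""
    return frozenset(frozenset(p) for p in pairs)

def is_connected(edges):
    """True if the edge set forms one connected component."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def support(pattern, graphs):
    """Number of graphs that contain every edge of the pattern."""
    return sum(1 for g in graphs if pattern <= g)

def apriori_mine(graphs, min_sup):
    """Level-wise mining of frequent connected edge sets, one edge per level."""
    edges = set().union(*graphs)
    level = {frozenset([e]) for e in edges
             if support(frozenset([e]), graphs) >= min_sup}
    frequent = {p: support(p, graphs) for p in level}
    k = 1
    while level:
        candidates = set()
        for p, q in combinations(level, 2):
            c = p | q  # join: p and q must share exactly k - 1 edges
            if len(c) != k + 1 or not is_connected(c):
                continue
            # Apriori pruning: every connected k-edge sub-pattern must be frequent.
            if all(c - {e} in level for e in c if is_connected(c - {e})):
                candidates.add(c)
        level = {c for c in candidates if support(c, graphs) >= min_sup}
        frequent.update((c, support(c, graphs)) for c in level)
        k += 1
    return frequent
```

Joining only patterns that already passed the support test is what keeps the
candidate set small: any connected pattern with at least two edges can be
rebuilt from two of its connected one-edge-smaller sub-patterns, so nothing
frequent is lost.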
In a pattern-growth approach, a frequent graph is extended directly by
adding a new node or edge in every possible position. A potential problem
with this extension approach is that the same graph can be discovered many
times. The gSpan [59] algorithm solves this problem by introducing a
rightmost extension technique, in which extensions take place only on the
rightmost path. Many other algorithms adopt a similar strategy, including
MoFa [4], FFSM [22], and Gaston [42].
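A minimal sketch of pattern growth shows the duplication problem directly.
It assumes unique node labels (so an edge set is its own canonical form),
grows each frequent pattern by one adjacent edge at a time, and counts how
often it regenerates a pattern it has already visited. Duplicates are
removed here with a set of visited patterns, which is a simple stand-in and
not gSpan's rightmost extension with DFS codes. The function
`pattern_growth` and its `stats` counters are hypothetical.

```python
def make_graph(*pairs):
    """A graph as a frozenset of undirected edges between uniquely labeled nodes."""
    return frozenset(frozenset(p) for p in pairs)

def pattern_growth(graphs, min_sup):
    """Depth-first pattern growth that discards patterns it has already visited."""
    edges = set().union(*graphs)
    visited = set()  # every pattern generated so far, frequent or not
    frequent = {}
    stats = {"generated": 0, "duplicates": 0}

    def support(p):
        return sum(1 for g in graphs if p <= g)

    def visit(p):
        s = support(p)
        if s >= min_sup:
            frequent[p] = s
            grow(p)

    def grow(pattern):
        nodes = set().union(*pattern)
        for e in edges - pattern:
            if not (e & nodes):  # extend only at a node already in the pattern
                continue
            child = pattern | {e}
            stats["generated"] += 1
            if child in visited:  # same graph, reached by another extension order
                stats["duplicates"] += 1
                continue
            visited.add(child)
            visit(child)

    for e in edges:
        seed = frozenset([e])
        visited.add(seed)
        visit(seed)
    return frequent, stats
```

For example, with the graphs {A-B, B-C, C-A}, {A-B, B-C}, and {A-B, B-D} and
a minimum support of 2, the pattern A-B-C is generated twice, once by
extending A-B and once by extending B-C. Rightmost extension avoids this
work by generating each pattern from a single canonical parent instead of
filtering it afterward.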
Graph Patterns with Constraints. Constraint-based graph pattern mining finds
frequent graph patterns that satisfy user-specified constraints such as
degree, density, frequency, and size. Mining closed graph patterns was
studied in [60]. The goal is to reduce the number of graph patterns by
removing subgraph patterns that can be derived from other patterns (a
frequent pattern is closed if none of its proper supergraphs has the same
support). Techniques were developed for pushing constraints as deep as
possible into the mining process [65].
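Closedness can be checked after the fact with a short filter. Because any
super-pattern with the same support as a frequent pattern is itself
frequent, it is enough to compare patterns within the frequent set. This
sketch assumes patterns are edge sets over uniquely labeled nodes (so
"super-pattern" means proper superset) and that the whole frequent set fits
in memory; CloseGraph [60] instead prunes non-closed branches during mining.
`closed_patterns` is a hypothetical name.

```python
def make_graph(*pairs):
    """A pattern as a frozenset of undirected edges between uniquely labeled nodes."""
    return frozenset(frozenset(p) for p in pairs)

def closed_patterns(frequent):
    """Keep the patterns that have no proper super-pattern with equal support.

    `frequent` maps each pattern to its support.
    """
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and s == t for q, t in frequent.items())
    }
```

With supports A-B: 3, B-C: 2, and A-B-C: 2, only A-B and A-B-C are kept; the
support of B-C can be recovered from A-B-C.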
Approximate Graph Patterns. Due to the complexity of isomorphism testing
and the inelastic pattern definition, frequent subgraphs are not able to
capture approximate graph patterns. In [28], a proximity pattern is defined
as a set of labels that co-occur frequently in neighborhoods. It relaxes the
rigid structural constraint of frequent subgraphs while introducing
connectivity into frequent itemsets. Empirical results show that it not only
finds interesting patterns that are ignored by existing approaches, but also
achieves high performance for finding proximity patterns in large-scale
graphs.
Because the set of frequent graph patterns can be exponentially large, it is
necessary to discover the most representative ones. Random sampling
techniques have been developed to sample the pattern space uniformly [18].
By doing so, the mining time can be significantly reduced, and fewer similar
patterns are reported.
Discriminative Graph Patterns. Discriminative graph pattern mining aims to
find significant graph patterns that distinguish between two sets of
graphs, for example graphs with different class labels. The discovered
discriminative graph patterns can be used as features for classification.
Ting and Bailey [52] proposed an algorithm for mining minimal contrast
subgraphs, which capture the structural differences between any two
collections of graphs. LEAP [58] is a general approach that leverages
structural proximity and frequency association to quickly skip parts of the
pattern search space and find discriminative graph patterns with respect to
an objective function given by the user.
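To make "discriminative" concrete, the sketch below scores each candidate
pattern by the absolute difference between its relative frequency in a
positive and in a negative graph set. Both this score and the brute-force
scoring of every candidate are assumptions chosen for illustration: LEAP
[58] accepts a user-given objective function and avoids scoring every
pattern by leaping through the search space. Graphs and patterns are
assumed to be edge sets over uniquely labeled nodes, and the function names
are hypothetical.

```python
def make_graph(*pairs):
    """A graph as a frozenset of undirected edges between uniquely labeled nodes."""
    return frozenset(frozenset(p) for p in pairs)

def frequency(pattern, graphs):
    """Fraction of graphs that contain every edge of the pattern."""
    return sum(1 for g in graphs if pattern <= g) / len(graphs)

def rank_discriminative(candidates, positives, negatives):
    """Sort candidates by |freq in positives - freq in negatives|, highest first."""
    scored = [(abs(frequency(c, positives) - frequency(c, negatives)), c)
              for c in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable for ties
    return scored
```

A pattern that is equally common in both sets, such as an edge present in
every graph, scores 0 and is useless as a classification feature, however
frequent it is.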

2 Graph Search
The development of scalable methods for analyzing large graph data sets,
including graphs built from knowledge bases and social networks, poses great
challenges. At the core of many graph analysis applications lies a common
and critical problem: how to search graphs efficiently. There are two
problem settings: multiple graphs and a single graph.

Multiple Graphs: Given a set of graphs $D = \{G_1, G_2, \dots, G_n\}$ and a
query graph $g$, graph search returns the answer set
$D_g = \{G \mid M(g, G) = 1, G \in D\}$, where $M$ is a Boolean function.
$M$ could be a function testing graph isomorphism (full graph search),
subgraph isomorphism (subgraph search), approximate match (full graph
similarity search), or subgraph approximate match (subgraph similarity
search).

Single Graph: Given a graph G and a query graph g, find all the
embeddings of g in G.

For graph search, it is inefficient to perform a sequential scan on a graph
database and check each graph to find the answers to a query graph. A
sequential scan is costly because one has to not only access the whole
graph database but also check (sub)graph isomorphism for every graph, and
subgraph isomorphism is known to be NP-complete [9]. Ullmann's backtracking
method [54], VF2 [11], and SwiftIndex [45] are popular programs for subgraph
isomorphism checking, but even with them verification remains expensive.
Therefore, high-performance graph indexing is needed to quickly prune
graphs, or regions of a graph, that obviously violate the query requirement.
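For reference, the following is a plain backtracking matcher in the spirit
of Ullmann's method [54]. It maps query nodes one at a time to unused data
nodes with the same label and enough neighbors, and backtracks as soon as an
already-mapped query neighbor has no corresponding data edge. It omits the
refinement and state-pruning rules that make Ullmann's algorithm and VF2
[11] fast, so its worst-case running time is exponential. The graph format
(`{node: label}` plus `{node: set of neighbors}`, undirected, non-induced
embeddings) and the name `subgraph_embeddings` are assumptions.

```python
def subgraph_embeddings(q_labels, q_adj, g_labels, g_adj):
    """Enumerate injective, label- and edge-preserving maps from query to data graph."""
    order = sorted(q_labels, key=lambda u: -len(q_adj[u]))  # high degree first
    mapping, used, results = {}, set(), []

    def feasible(u, v):
        if q_labels[u] != g_labels[v] or v in used or len(g_adj[v]) < len(q_adj[u]):
            return False
        # Every already-mapped query neighbor of u must map to a neighbor of v.
        return all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping)

    def backtrack(i):
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        for v in g_labels:
            if feasible(u, v):
                mapping[u] = v
                used.add(v)
                backtrack(i + 1)
                del mapping[u]
                used.discard(v)

    backtrack(0)
    return results
```

In the multiple-graph setting, $G$ belongs to the answer set $D_g$ exactly
when this function returns at least one embedding; in the single-graph
setting, the returned list is the set of all embeddings.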
The problem of graph search has been addressed in different domains, since
it is a critical problem in many applications. In content-based image
retrieval, [44] represented each graph as a vector of features and indexed
graphs in a high-dimensional space using R-trees. [47] indexed graphs by a
signature computed from the eigenvalues of adjacency matrices. Instead of
casting a graph into vector form, [3] proposed a metric indexing scheme that
organizes graphs hierarchically according to their mutual distances.
In semistructured/XML databases, query languages built on path expressions
have become popular. Efficient indexing techniques for path expressions were
initially introduced in DataGuide [16] and 1-index [38]. A(k)-index [26]
proposes k-bisimilarity to exploit the local similarity existing in
semistructured databases. Index Fabric [10] represents every path in a tree
as a string and stores it in a Patricia trie.
For more complicated graph queries, Shasha et al. [46] (GraphGrep) extended
the path-based technique to full-scale graph retrieval. GraphGrep is an
example of a feature-based graph indexing technique. Let $F$ be a feature
set for a given graph database $D$. For any feature $f \in F$, $D_f$ is the
set of graphs containing $f$, $D_f = \{G \mid f \subseteq G, G \in D\}$.
Graph query processing has three steps: (1) Search, which enumerates all the
features in a query graph $Q$ to compute the candidate answer set
$C_Q = \bigcap_{f} D_f$ ($f \subseteq Q$ and $f \in F$); each graph in $C_Q$
contains all of $Q$'s features, so $D_Q$ is a subset of $C_Q$. (2) Fetching,
which retrieves the graphs in the candidate answer set from disk. (3)
Verification, which checks the graphs in the candidate answer set to verify
whether they really satisfy the query, pruning false positives.
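The search, fetch, and verify pipeline can be illustrated with a toy
in-memory index. As an assumption for brevity, the features are only the
unordered label pairs of edges (paths of length 1) rather than GraphGrep's
longer label paths, and a plain backtracking test stands in for a real
verifier. The candidate set is the intersection of the $D_f$ lists for the
query's features, and verification removes false positives. All names
(`edge_features`, `build_index`, `contains`, `query`) are hypothetical.

```python
def edge_features(labels, adj):
    """Features used here: unordered label pairs of edges (paths of length 1)."""
    return {tuple(sorted((labels[u], labels[v]))) for u in adj for v in adj[u]}

def contains(q, g):
    """Verification: backtracking test for a label-preserving subgraph embedding."""
    (ql, qa), (gl, ga) = q, g
    order = list(ql)
    mapping = {}

    def extend(i):
        if i == len(order):
            return True
        u = order[i]
        for v in gl:
            if (gl[v] == ql[u] and v not in mapping.values()
                    and all(mapping[w] in ga[v] for w in qa[u] if w in mapping)):
                mapping[u] = v
                if extend(i + 1):
                    return True
                del mapping[u]
        return False

    return extend(0)

def build_index(database):
    """Inverted index: feature -> ids of graphs containing it (the D_f sets)."""
    index = {}
    for gid, (labels, adj) in database.items():
        for f in edge_features(labels, adj):
            index.setdefault(f, set()).add(gid)
    return index

def query(database, index, q):
    """Return (candidate set C_Q, verified answer set D_Q)."""
    # (1) Search: intersect D_f over all features f of the query.
    candidates = set(database)
    for f in edge_features(*q):
        candidates &= index.get(f, set())
    # (2) Fetching is a dictionary lookup here; (3) verification drops false positives.
    answers = {gid for gid in candidates if contains(q, database[gid])}
    return candidates, answers
```

For example, a path A-B-C-A on four nodes contains all three label pairs of
the triangle A-B-C, so it survives filtering but is rejected by
verification.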
gIndex [61] introduces a pattern-based indexing technique that facilitates
graph search in graph databases with thousands of instances; similar
techniques can also be applied to indexing single massive graphs. The idea
is to precompute features from a graph database and build indices based on
these features. Various kinds of features could be used, including
node/edge labels, paths, trees, and subgraph patterns. gIndex is a subgraph
pattern-based approach, while GraphGrep is a path-based approach. FG-index
[7] also builds its index using frequent subgraphs; however, it directly
answers frequent graph queries without verification.
Zhao et al. [63] analyzed the effectiveness and efficiency of paths, trees,
and graphs as indexing features from three aspects: feature size, feature
selection cost, and pruning power. Like paths and graphs, tree features can
be used effectively and efficiently as indexing features for graph
databases. GString [25] combines three basic structures for graph search:
paths, stars, and cycles.
GCoding [66] is another tree-based graph indexing approach. For each node
u, it extracts a level-n path tree, which consists of all n-step simple
paths from u in a graph. The node is then encoded with eigenvalues derived
from this local tree structure. If a query graph Q is a subgraph of a graph
G, then for each vertex u in Q there must exist a corresponding vertex u' in
G such that the local structure around u in Q is preserved around u' in G.
There is a partial order relationship between the eigenvalues of these two
local structures. Based on this property, GCoding can quickly prune graphs
that violate the order.
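The pruning idea can be sketched with NumPy under two substitutions: the
local structure of a vertex is taken to be the subgraph induced by its
n-hop neighborhood instead of GCoding's level-n path tree, and only the
largest adjacency eigenvalue is used as the vertex code. The rule stays
sound for this variant: an embedding maps the n-hop neighborhood of u in Q
into a subgraph of the n-hop neighborhood of u' in G, and the largest
adjacency eigenvalue cannot increase when passing to a subgraph. So if no
vertex of G with u's label has a code at least as large as u's, G can be
pruned without an isomorphism test. The graph format and the names `ball`,
`vertex_code`, and `may_contain` are hypothetical.

```python
from collections import deque

import numpy as np

def ball(adj, u, n):
    """Nodes within n hops of u (breadth-first search)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == n:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return list(dist)

def vertex_code(adj, u, n=1):
    """Largest adjacency eigenvalue of the subgraph induced by u's n-hop ball."""
    nodes = ball(adj, u, n)
    pos = {x: i for i, x in enumerate(nodes)}
    a = np.zeros((len(nodes), len(nodes)))
    for x in nodes:
        for y in adj[x]:
            if y in pos:
                a[pos[x], pos[y]] = 1.0
    return float(np.linalg.eigvalsh(a)[-1])  # eigvalsh sorts ascending

def may_contain(q, g, n=1, eps=1e-9):
    """Necessary (not sufficient) condition for Q being a subgraph of G."""
    (ql, qa), (gl, ga) = q, g
    g_codes = [(gl[v], vertex_code(ga, v, n)) for v in gl]
    for u in ql:
        cu = vertex_code(qa, u, n)
        if not any(lab == ql[u] and cv >= cu - eps for lab, cv in g_codes):
            return False  # no vertex of G can host u, so G is pruned
    return True
```

A triangle query is pruned against a three-node path, whose vertex codes are
at most √2, while every vertex of the triangle has code 2. Graphs that pass
the check still need verification.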
Closure-Tree [19] organizes graphs into a tree-based index structure using
graph closures as the bounding boxes.

3 Graph Similarity Search


A common problem in graph search is: what if there is no match, or very few
matches, for a given query graph? In this situation, a subsequent query
refinement process has to be undertaken in order to find the structures of
interest. Unfortunately, it is often too time-consuming for a user to refine
the query manually. One solution is to ask the system to find graphs that
approximately contain the query graph. This similarity search problem has
been studied in various fields.
There have been numerous studies on inexact graph search in large graphs.
Tong et al. [53] proposed best-effort pattern matching, which aims to
maintain the shape of the query. Tian et al. [51] proposed an approximate
subgraph search tool, called TALE, with efficient indexing. Mongiovì et al.
introduced a set-cover-based inexact subgraph matching technique, called
SIGMA [39]. Both techniques use edge misses to measure the quality of a
match. There are other works on inexact subgraph matching. An incomplete
list (see [15] for a survey) includes homomorphism-based subgraph matching
[13], belief-propagation-based network alignment [2, 14], an
edge-edit-distance-based subgraph indexing technique [62], subgraph matching
in billion-node graphs [48], regular-expression-based graph pattern matching
[1], schema [36] and unbalanced ontology matching [64], and a
graph-partition-based subgraph identification scheme [5].
NESS [27] introduces a relaxed, computationally efficient definition of
approximate graph matching by replacing strict subgraph isomorphism checking
with proximity checking. Under this new measure, it was proved that subgraph
similarity search is NP-hard, while graph similarity match is polynomial. An
information propagation model is applied that converts a large network into
a set of multidimensional vectors, for which sophisticated indexing and
similarity search algorithms are available. NESS is appropriate for graphs
with low automorphism and high noise, which are common in many social and
information networks.
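The vector representation can be sketched as follows, loosely following the
neighborhood-vector idea of NESS [27]. Each node's vector maps a label l to
the sum of alpha^d over nodes labeled l at distance d, 1 <= d <= h, and a
query node u is matched to a target node v at a cost that counts only the
label weight u needs but v lacks. An exact embedding therefore costs 0,
because distances can only shrink when the query is mapped into the target
and alpha < 1. The parameter values (h = 2, alpha = 0.5), the graph format,
and the simple per-node best-match step are assumptions for illustration;
the published method adds indexing and an iterative refinement of the whole
match. Function names are hypothetical.

```python
from collections import deque

def neighborhood_vector(labels, adj, u, h=2, alpha=0.5):
    """Label -> sum of alpha**d(u, v) over nodes v with 1 <= d(u, v) <= h."""
    dist = {u: 0}
    queue = deque([u])
    vec = {}
    while queue:
        x = queue.popleft()
        if dist[x] == h:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                vec[labels[y]] = vec.get(labels[y], 0.0) + alpha ** dist[y]
                queue.append(y)
    return vec

def match_cost(rq, rg):
    """Penalize only label weight the query node needs but the target lacks."""
    return sum(max(w - rg.get(lab, 0.0), 0.0) for lab, w in rq.items())

def best_matches(q, g, h=2, alpha=0.5):
    """For each query node, the (cost, node) of its cheapest same-label target."""
    (ql, qa), (gl, ga) = q, g
    g_vecs = {v: neighborhood_vector(gl, ga, v, h, alpha) for v in gl}
    result = {}
    for u in ql:
        rq = neighborhood_vector(ql, qa, u, h, alpha)
        cands = [(match_cost(rq, g_vecs[v]), v) for v in gl if gl[v] == ql[u]]
        if cands:
            result[u] = min(cands, key=lambda c: c[0])
    return result
```

Because each vector is a fixed-length summary over labels, the vectors can
be stored and searched with standard multidimensional indexes, which is the
point of the conversion described above.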
There are several studies on simulation- and bisimulation-based graph
pattern matching, e.g., [37, 12, 34], which define subgraph matching as a
relation between query nodes and target nodes.
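Graph simulation replaces the one-to-one mapping of subgraph isomorphism
with a relation: a data node v can simulate a query node u if their labels
agree and, for every query edge u -> u2, some successor of v can simulate
u2. The maximal such relation can be computed by starting from label
matches and repeatedly discarding nodes that violate the edge condition
until nothing changes. The sketch below is this naive fixpoint; the cited
works use faster algorithms and refined variants. Directed graphs in
adjacency-set form and the name `graph_simulation` are assumptions.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_succ):
    """Maximal graph simulation of a directed query in a directed data graph.

    Returns {query node: set of data nodes}; an empty set for any query node
    means the query has no match under simulation semantics.
    """
    sim = {u: {v for v in g_labels if g_labels[v] == lab}
           for u, lab in q_labels.items()}
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:  # query edge u -> u2
            # v may simulate u only if some successor of v can simulate u2.
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim
```

Because simulation is a relation rather than a one-to-one map, a two-node
query cycle A -> B -> A also matches a longer A -> B -> A -> B cycle in the
data, and the fixpoint is reached in polynomial time.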

4 Graph Query Language


In the area of graph databases, a few graph query languages have been
proposed to query and manage graph data. GraphLog [8] represents both data
and queries as graphs. Edges in queries represent edges or paths in the
database, indicating a regular-expression kind of query. In terms of
expressive power, GraphLog was shown to be equivalent to stratified linear
Datalog. Graph query languages were also introduced with object-oriented
data models in GOOD [33], GraphDB [17], and GOQL [31].
GraphQL [20] is a graph query language that treats graphs as the basic
unit. It has an algebraic system similar to that of SQL, but the algebraic
operators are defined directly on graphs. [40] proposed ego-centric pattern
census queries, where a given structural pattern is searched for in every
node's neighborhood and the counts are reported or used in further analysis.
This kind of analysis is useful in opinion leader identification, node
classification, link prediction, and role identification. The same work
developed an SQL-based declarative language and a series of efficient query
evaluation algorithms for it.
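As a toy instance of an ego-centric census, the sketch below counts, for
every node, the triangles that pass through it, which equals the number of
edges among the node's neighbors in its 1-hop ego network. The fixed
triangle pattern, the 1-hop radius, the undirected adjacency-set format, and
the function name are illustrative assumptions; the SQL-based language in
[40] is far more general.

```python
def ego_triangle_census(adj):
    """For each node, count triangles through it (edges among its neighbors)."""
    counts = {}
    for u, nbrs in adj.items():
        # Each triangle through u corresponds to one edge between two neighbors of u;
        # v < w counts every neighbor pair once (node ids must be comparable).
        counts[u] = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
    return counts
```

The per-node counts can then be joined with other node attributes, which is
how such a census feeds tasks like role identification.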

5 Graph OLAP
Graph OLAP aims to provide a model for performing composite structure and
information analysis in heterogeneous networks. For example, in the context
of network intrusions, apart from the topological structure encoded in the
underlying network, multidimensional attributes are often specified and
associated with nodes and edges, e.g., security software installed on
computers, defense strategies, and access policies, forming so-called
multidimensional networks. While studies on contemporary networks have been
around for decades [41], and a plethora of algorithms and systems have been
devised for multidimensional analysis in relational databases [24], none has
taken both aspects into account in the multidimensional network scenario.
Graph OLAP was developed to fill this technology gap in multidimensional
networks.
Graph OLAP performs discovery-driven OLAP operations for fast and accurate
knowledge discovery through structure discovery, network summarization,
aggregation, correlation, clustering, and classification. The concept of
Graph OLAP was first introduced in [6]. Two kinds of OLAP were defined:
Informational OLAP (I-OLAP) and Topological OLAP (T-OLAP). For roll-up in
I-OLAP, the characterizing feature is that snapshots are just different
observations of the same underlying network; thus, when they are all grouped
into one cell in the cube, it is like overlaying multiple pieces of
information without changing the objects whose interactions are being
looked at. For roll-up in T-OLAP, the reorganization happens inside
individual networks. Here, merging is performed internally, which zooms out
the user's focus to a generalized set of objects, and the new graph formed
by such shrinking might greatly alter the original network's topological
structure.
Tian et al. [50] introduced two operations to summarize graphs, a key step
in T-OLAP. The first operation, called SNAP, produces a summary graph by
grouping nodes based on user-selected node attributes and relationships. The
second operation, called k-SNAP, further allows users to control the
resolution of summaries and provides drill-down and roll-up abilities to
navigate through summaries with different resolutions. Zhao et al. [43]
discussed how to efficiently compute T-OLAP using graph cubing techniques.
They implemented Graph Cube by combining special characteristics of
multidimensional networks with existing, well-studied data cube techniques.
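A T-OLAP roll-up in the Graph Cube sense can be sketched as attribute-based
aggregation: nodes that agree on the chosen dimensions collapse into one
aggregate node, and edges are counted between the resulting groups.
Enumerating every subset of dimensions gives one aggregate network per
cuboid. This naive version materializes all cuboids and uses counts as the
only measure, both assumptions for illustration; it implements neither the
grouping refinement of SNAP/k-SNAP nor the optimizations of [43]. The data
format and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def aggregate_network(node_attrs, edges, dims):
    """Roll a multidimensional network up to the chosen attribute dimensions.

    Returns (node weights, edge weights): the number of original nodes in each
    group and the number of original edges between (or within) groups.
    """
    key = {v: tuple(attrs[d] for d in dims) for v, attrs in node_attrs.items()}
    node_weight = Counter(key.values())
    edge_weight = Counter()
    for u, v in edges:
        a, b = sorted((key[u], key[v]))  # undirected: order the group pair
        edge_weight[(a, b)] += 1
    return node_weight, edge_weight

def graph_cube(node_attrs, edges, all_dims):
    """Materialize every cuboid: one aggregate network per subset of dimensions."""
    return {dims: aggregate_network(node_attrs, edges, dims)
            for r in range(len(all_dims) + 1)
            for dims in combinations(all_dims, r)}
```

The empty subset of dimensions yields the apex cuboid, a single aggregate
node whose self-edge weight is the total number of edges.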
In addition to graph summarization, another important operation in graph
OLAP is similarity search. Large-scale heterogeneous information networks
consist of multi-typed, interconnected objects, so it is important to
provide similarity measures in such networks. Intuitively, two objects are
similar if they are linked by many paths in the network. However, the
different semantic meanings behind paths should be taken into
consideration. [57] studied similarity search among objects of the same type
in heterogeneous networks and introduced the concept of meta path-based
similarity, where a meta path is a path consisting of a sequence of
relations defined between different object types (i.e., structural paths at
the meta level). Meta path-based similarity turns out to be more meaningful
in many scenarios than random-walk-based similarity measures.
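For a symmetric meta path P, PathSim [57] scores two objects as
$s(x, y) = \frac{2\,|\{p_{x \rightsquigarrow y}\}|}{|\{p_{x \rightsquigarrow x}\}| + |\{p_{y \rightsquigarrow y}\}|}$,
counting path instances that follow P. The sketch assumes the meta path
Author-Paper-Author over an unweighted authorship relation, so the number of
path instances between two authors is simply the number of papers they
share; longer meta paths would need products of relation matrices.
`pathsim_apa` is a hypothetical name.

```python
def pathsim_apa(author_papers):
    """PathSim under the assumed meta path Author-Paper-Author.

    `author_papers` maps each author to the set of papers they wrote.
    """
    def paths(x, y):
        # Number of A-P-A path instances from x to y = number of shared papers.
        return len(author_papers[x] & author_papers[y])

    def sim(x, y):
        denom = paths(x, x) + paths(y, y)
        return 2 * paths(x, y) / denom if denom else 0.0

    return sim
```

With author a writing 4 papers, b writing 2 of them, and a prolific author c
writing 20 papers that include all 4 of a's, PathSim rates b (2/3) above c
(1/3) even though c shares more papers with a, favoring peers of comparable
visibility.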

6 Vertex Programming
Vertex programming is adopted in several leading distributed graph computing
platforms for clusters, such as Pregel [35] and GraphLab [32]. They can be
implemented using the bulk synchronous parallel model or asynchronous
models. Vertex programming is suitable for graph algorithms that can be
modified to store computation states in vertices, where these states can be
distributed and shared with multiple vertices. Pregel and GraphLab have
demonstrated their success in computing shortest paths, random walks,
clustering, and belief propagation, which can support many machine learning
algorithms. However, it is unknown whether an effective implementation of
subgraph isomorphism exists using vertex programming. [49] proposed passing
partial matches among machines in order to find a complete match. One can
also implement a centralized algorithm that collects partial matches from
different machines and assembles them on a central machine. Both algorithms
have pros and cons. They are not compatible with vertex programming and need
a special daemon process on the machines to coordinate the assembly of
partial results. Our approximate graph search algorithms that use message
passing between vertices, e.g., NESS [27], are suitable for vertex
programming. NESS uses a vector representation of graphs. The neighborhood
information of each vertex is computed by propagating the labels of its
neighbors with distance weighting and is encoded in each vertex; that is,
the structure of a vertex's neighbors is encoded by their distance to that
vertex. The best matches of each vertex can be further passed to its
neighbors to find the best match for the entire vertex set. When the number
of distinct labels in a graph is high, NESS will likely find a good match in
terms of subgraph isomorphism.
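To illustrate the vertex-centric style, the following single-machine sketch
simulates bulk synchronous supersteps for single-source shortest paths, one
of the computations mentioned above. Each vertex keeps its tentative
distance as state, reacts only to incoming messages, and forwards improved
distances to its out-neighbors; the boundary between supersteps acts as the
barrier. It is not the Pregel or GraphLab API; the dictionary-based graph
format and the name `pregel_sssp` are assumptions.

```python
import math

def pregel_sssp(adj, source):
    """Single-source shortest paths written as a vertex program run in supersteps.

    `adj` maps each vertex to {neighbor: edge_weight}. The run stops when no
    messages are in flight, i.e., every vertex has effectively voted to halt.
    """
    dist = {v: math.inf for v in adj}          # per-vertex state
    inbox = {v: [] for v in adj}
    inbox[source].append(0)
    supersteps = 0
    while any(inbox.values()):
        outbox = {v: [] for v in adj}
        for v, msgs in inbox.items():          # conceptually parallel per vertex
            if msgs and min(msgs) < dist[v]:
                dist[v] = min(msgs)
                for w, weight in adj[v].items():
                    outbox[w].append(dist[v] + weight)
        inbox = outbox                         # barrier: deliver next superstep
        supersteps += 1
    return dist, supersteps
```

A vertex whose distance does not improve sends nothing, which is what lets
the computation terminate once distances stabilize.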

References
[1] P. Barcelo, L. Libkin, and J. L. Reutter. Querying Graph Patterns.
PODS, 2011.

[2] M. Bayati, M. Gerritsen, D. F. Gleich, A. Saberi, and Y. Wang.
Algorithms for Large, Sparse Network Alignment Problems. ICDM, 2009.

[3] S. Beretti, A. Bimbo, and E. Vicario. Efficient matching and indexing of
graph models in content based retrieval. IEEE Trans. on Pattern Analysis and
Machine Intelligence, 23:1089-1105, 2001.

[4] C. Borgelt and M. Berthold. Mining molecular fragments: Finding relevant
substructures of molecules. In Proc. of 2002 Int. Conf. on Data Mining
(ICDM'02), pages 211-218, 2002.

[5] M. Brocheler, A. Pugliese, and V. S. Subrahmanian. COSI: Cloud Oriented
Subgraph Identification in Massive Social Networks. ASONAM, 2010.

[6] C. Chen, X. Yan, F. Zhu, J. Han, and P. S. Yu. Graph OLAP: Towards
online analytical processing on graphs. In Proc. 2008 Int. Conf. on Data
Mining, 2008.

[7] J. Cheng, Y. Ke, W. Ng, and A. Lu. FG-Index: Towards Verification-Free
Query Processing on Graph Databases. SIGMOD, 2007.

[8] M. Consens and A. Mendelzon. GraphLog: a visual formalism for real life
recursion. In PODS, 1990.

[9] S. Cook. The complexity of theorem-proving procedures. In Proc. of the
3rd ACM Symp. on Theory of Computing (STOC'71), pages 151-158, 1971.

[10] B. Cooper, N. Sample, M. Franklin, G. Hjaltason, and M. Shadmon. A Fast
Index for Semistructured Data. VLDB, 2001.

[11] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph
Isomorphism Algorithm for Matching Large Graphs. IEEE Trans. Pattern Anal.
and Machine Int., 2004.

[12] W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph Pattern
Matching: From Intractable to Polynomial Time. PVLDB, 2010.

[13] W. Fan, J. Li, S. Ma, H. Wang, and Y. Wu. Graph Homomorphism Revisited
for Graph Matching. PVLDB, 2010.

[14] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient Belief Propagation
for Early Vision. Int. J. Comput. Vision, 70(1), 2006.

[15] B. Gallagher. Matching Structure and Semantics: A Survey on
Graph-Based Pattern Matching. AAAI FS, 2006.

[16] R. Goldman and J. Widom. DataGuides: Enabling query formulation and
optimization in semistructured databases. In Proc. of 1997 Int. Conf. on
Very Large Data Bases (VLDB'97), pages 436-445, 1997.

[17] R. H. Güting. GraphDB: Modeling and querying graphs in databases. In
VLDB, pages 297-308, 1994.

[18] M. A. Hasan and M. J. Zaki. Output space sampling for graph patterns.
Proc. of the VLDB Endowment (35th Int. Conf. on Very Large Data Bases),
2(1):730-741, 2009.

[19] H. He and A. Singh. Closure-Tree: An Index Structure for Graph Queries.
ICDE, 2006.

[20] H. He and A. Singh. Graphs-at-a-time: query language and access methods
for graph databases. In Proc. of the 2008 ACM SIGMOD Int. Conf. on
Management of Data, SIGMOD'08, pages 405-418, 2008.

[21] L. B. Holder, D. J. Cook, and S. Djoko. Substructure Discovery in the
Subdue System. KDD, 1994.

[22] J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs
in the presence of isomorphism. In Proc. of 2003 Int. Conf. on Data Mining
(ICDM'03), pages 549-552, 2003.

[23] A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for
mining frequent substructures from graph data. In Proc. of 2000 European
Symp. Principle of Data Mining and Knowledge Discovery (PKDD'00), pages
13-23, 2000.

[24] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart,
M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational
aggregation operator generalizing group-by, cross-tab, and sub-totals. Data
Min. Knowl. Discov., 1(1):29-53, 1997.

[25] H. Jiang, H. Wang, P. Yu, and S. Zhou. GString: A Novel Approach for
Efficient Search in Graph Databases. ICDE, 2007.

[26] R. Kaushik, P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local
similarity for efficient indexing of paths in graph structured data. In
Proc. of 2002 Int. Conf. on Data Engineering (ICDE'02), pages 129-140, 2002.

[27] A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao.
Neighborhood Based Fast Graph Search in Large Networks. SIGMOD, 2011.

[28] A. Khan, X. Yan, and K.-L. Wu. Towards Proximity Pattern Mining in
Large Graphs. SIGMOD, 2010.

[29] M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. ICDM, 2001.

[30] M. Kuramochi and G. Karypis. Finding frequent patterns in a large
sparse graph. Data Mining and Knowledge Discovery, 11(3):243-271, 2005.

[31] L. Sheng, Z. M. Ozsoyoglu, and G. Ozsoyoglu. A graph query language
and its query processing. In ICDE, 1999.

[32] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and
J. Hellerstein. Distributed GraphLab: a framework for machine learning and
data mining in the cloud. Proc. VLDB Endow., 5(8):716-727, 2012.

[33] M. Gyssens, J. Paredaens, and D. van Gucht. A graph-oriented object
database model. In PODS, pages 417-424, 1990.

[34] S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo. Capturing Topology in Graph
Pattern Matching. PVLDB, 2012.

[35] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn,
N. Leiser, and G. Czajkowski. Pregel: A System for Large-Scale Graph
Processing. SIGMOD, 2010.

[36] S. Melnik, H. G.-Molina, and E. Rahm. Similarity Flooding: A Versatile
Graph Matching Algorithm and its Application to Schema Matching. ICDE, 2002.

[37] R. Milner. Communication and Concurrency. Prentice Hall, 1989.

[38] T. Milo and D. Suciu. Index structures for path expressions. Lecture
Notes in Computer Science, 1540:277-295, 1999.

[39] M. Mongiovì, R. Di Natale, R. Giugno, A. Pulvirenti, A. Ferro, and
R. Sharan. SIGMA: A Set-Cover-Based Inexact Graph Matching Algorithm.
J. Bioinfo. and Comp. Bio., 2010.

[40] W. Moustafa, A. Deshpande, and L. Getoor. Ego-centric graph pattern
census. In ICDE, 2012.

[41] M. Newman. Networks: An Introduction. Oxford University Press, 2010.

[42] S. Nijssen and J. Kok. A quickstart in frequent structure mining can
make a difference. In Proc. of 2004 ACM Int. Conf. on Knowledge Discovery in
Data Mining (KDD'04), pages 647-652, 2004.

[43] P. Zhao, X. Li, D. Xin, and J. Han. Graph cube: On warehousing and OLAP
multidimensional networks. In SIGMOD, 2011.

[44] E. Petrakis and C. Faloutsos. Similarity searching in medical image
databases. Knowledge and Data Engineering, 9(3):435-447, 1997.

[45] H. Shang, Y. Zhang, X. Lin, and J. Yu. Taming Verification Hardness: An
Efficient Algorithm for Testing Subgraph Isomorphism. PVLDB, 2008.

[46] D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and Applications
of Tree and Graph Searching. PODS, 2002.

[47] A. Shokoufandeh, S. Dickinson, K. Siddiqi, and S. Zucker. Indexing
using a spectral encoding of topological structure. In Proc. of IEEE Int.
Conf. on Computer Vision and Pattern Recognition (CVPR'99), pages
2491-2497, 1999.

[48] Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient Subgraph
Matching on Billion Node Graphs. PVLDB, 2012.

[49] Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient subgraph
matching on billion node graphs. Proc. VLDB Endow., 5(9):788-799, 2012.

[50] Y. Tian, R. A. Hankins, and J. M. Patel. Efficient Aggregation for
Graph Summarization. SIGMOD, 2008.

[51] Y. Tian and J. M. Patel. TALE: A Tool for Approximate Large Graph
Matching. ICDE, 2008.

[52] R. Ting and J. Bailey. Mining minimal contrast subgraph patterns. In
Proc. of 2006 SIAM Int. Conf. on Data Mining (SDM'06), 2006.

[53] H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad. Fast
Best-Effort Pattern Matching in Large Attributed Graphs. KDD, 2007.

[54] J. R. Ullmann. An Algorithm for Subgraph Isomorphism. J. ACM, 1976.

[55] N. Vanetik, E. Gudes, and S. Shimony. Computing frequent graph patterns
from semistructured data. In Proc. of 2002 Int. Conf. on Data Mining
(ICDM'02), pages 458-465, 2002.

[56] T. Washio and H. Motoda. State of the art of graph-based data mining.
SIGKDD Explorations, 5:59-68, 2003.

[57] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. PathSim: Meta path-based
top-k similarity search in heterogeneous information networks. In Proc. of
2011 Int. Conf. on Very Large Data Bases (VLDB'11), 2011.

[58] X. Yan, H. Cheng, P. S. Yu, and J. Han. Mining significant graph
patterns by leap search. In Proc. of 2008 ACM-SIGMOD Int. Conf. on
Management of Data (SIGMOD'08), pages 433-444, 2008.

[59] X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining.
ICDM, 2002.

[60] X. Yan and J. Han. CloseGraph: Mining closed frequent graph patterns.
In Proc. of 2003 Int. Conf. on Knowledge Discovery and Data Mining
(KDD'03), pages 286-295, 2003.

[61] X. Yan, P. S. Yu, and J. Han. Graph Indexing: A Frequent
Structure-Based Approach. SIGMOD, 2004.

[62] S. Zhang, J. Yang, and W. Jin. SAPPER: Subgraph Indexing and
Approximate Matching in Large Graphs. PVLDB, 2010.

[63] P. Zhao, J. Yu, and P. Yu. Graph Indexing: Tree + Delta >= Graph.
VLDB, 2007.

[64] Q. Zhong, H. Li, J. Li, G. Xie, J. Tang, L. Zhou, and Y. Pan. A Gauss
Function Based Approach for Unbalanced Ontology Matching. SIGMOD, 2009.

[65] F. Zhu, X. Yan, J. Han, and P. S. Yu. gPrune: a constraint pushing
framework for graph pattern mining. In Proc. of the 11th Pacific-Asia Conf.
on Advances in Knowledge Discovery and Data Mining, pages 388-400, 2007.

[66] L. Zou, L. Chen, J. Yu, and Y. Lu. A Novel Spectral Coding in a Large
Graph Database. EDBT, 2008.
