Graph Pattern Mining, Search and OLAP

Xifeng Yan
November 21, 2012

1 Graph Pattern Mining

Graph patterns have become increasingly important for analyzing complex
structures in many domains, such as information networks, social networks,
and computer security. They can be used to index, search, classify, and
cluster graphs, and to predict interactions and functions in them.

Frequent Graph Pattern. Among the various kinds of graph patterns, frequent
subgraphs are the most basic patterns that can be discovered in a collection
of graphs. There are two problem settings: multiple graphs and a single graph.

Multiple Graphs: Given a set of graphs, $D = \{G_1, G_2, \ldots, G_n\}$, a
graph g is frequent if g is a subgraph of at least s graphs in D, where s
is a user-specified threshold, $1 \le s \le n$.

Single Graph: Given a graph G, a graph g is frequent if g has more than s
(disjoint) embeddings in G, where s is a user-specified threshold,
$1 \le s \le |V(G)|$.
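
As a minimal sketch of the multiple-graphs definition, the following Python
code counts the support of a pattern by brute force. The graph encoding
(a label dictionary plus a set of undirected edges) and the helper names
make_graph, is_subgraph, and support are assumptions made for illustration;
the subgraph test is non-induced and exponential, so it is only suitable
for toy inputs.

```python
from itertools import permutations

def make_graph(labels, edge_list):
    """Graph = (node -> label dict, set of undirected edges as frozensets)."""
    return labels, {frozenset(e) for e in edge_list}

def is_subgraph(pattern, graph):
    """Brute-force test: can `pattern` be embedded in `graph`?"""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_nodes = list(p_labels)
    # Try every injective assignment of pattern nodes to graph nodes.
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[u] == g_labels[mapping[u]] for u in p_nodes)
        edges_ok = all(
            frozenset(mapping[u] for u in edge) in g_edges for edge in p_edges
        )
        if labels_ok and edges_ok:
            return True
    return False

def support(pattern, database):
    """Number of graphs in the database that contain the pattern."""
    return sum(is_subgraph(pattern, g) for g in database)

# Hypothetical toy database of three labeled graphs.
database = [
    make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
    make_graph({1: "A", 2: "B"}, [(1, 2)]),
    make_graph({1: "A", 2: "C"}, [(1, 2)]),
]
pattern = make_graph({"x": "A", "y": "B"}, [("x", "y")])

s = 2  # user-specified threshold
print(support(pattern, database) >= s)  # True: the A-B edge occurs in 2 graphs
```

A practical miner would replace is_subgraph with a dedicated subgraph
isomorphism algorithm and avoid rescanning the database for every candidate.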

Existing studies focus mostly on the multiple-graphs scenario. With some
modifications, the mining methodology can be extended to the single-graph
scenario [30]. Washio and Motoda [56] conducted a survey on graph-based
data mining. Holder et al. [21] proposed SUBDUE to do subgraph pattern
discovery based on minimum description length and background knowledge.
The most popular graph pattern mining algorithms adopt either an
Apriori-based or a pattern-growth approach.
In an Apriori-based approach, the search for frequent subgraphs starts
with graphs of small size, and proceeds in a bottom-up manner. At each
iteration, the size of newly discovered frequent subgraphs is increased by
one node or edge. The new candidates are generated by joining two similar
but slightly different frequent subgraphs that were discovered already. The
frequency of the newly formed graphs is then checked. Typical Apriori-based
frequent graph pattern mining algorithms include AGM [23], FSG [29], and
an edge-disjoint path-join algorithm [55].
In a pattern-growth approach, a frequent graph is extended directly by
adding a new node or edge in every possible position. A potential problem
with this extension approach is that the same graph can be discovered many
times. The gSpan [59] algorithm solves this problem by introducing a
rightmost extension technique, in which extensions take place only on the
rightmost path. Many other algorithms adopt a similar strategy, including
MoFa [4], FFSM [22], and Gaston [42].
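
The duplicate-discovery problem can be seen on a tiny example. The sketch
below extends an unlabeled 3-node path by one edge in every possible
position and then groups the results by a brute-force canonical form. It
does not implement gSpan's rightmost extension; the function names
extensions and canonical, and the encoding of a pattern as a set of (u, v)
pairs with u < v, are assumptions made for illustration.

```python
from itertools import permutations

def extensions(edges, n):
    """All one-edge extensions of a connected pattern on nodes 0..n-1."""
    result = []
    # Close a cycle: add an edge between two existing, non-adjacent nodes.
    for u in range(n):
        for v in range(u + 1, n):
            if (u, v) not in edges:
                result.append((edges | {(u, v)}, n))
    # Grow outward: attach a new node (numbered n) to each existing node.
    for u in range(n):
        result.append((edges | {(u, n)}, n + 1))
    return result

def canonical(edges, n):
    """Brute-force canonical form: the smallest sorted edge list over all relabelings."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(
            sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        )
        if best is None or relabeled < best:
            best = relabeled
    return best

path3 = frozenset({(0, 1), (1, 2)})          # the path 0 - 1 - 2
raw = extensions(path3, 3)                    # every possible one-edge extension
distinct = {(n, canonical(e, n)) for e, n in raw}

print(len(raw), len(distinct))  # 4 3
```

Four extensions are generated, but only three are distinct up to
isomorphism (a triangle, a 4-node path, and a star), because attaching the
new node to either end of the path yields the same graph. Canonical forms
computed by enumeration are exponential in the pattern size, which is why
practical miners restrict where extensions may occur instead.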
Graph Patterns with Constraints. Constraint-based graph pattern mining
finds frequent graph patterns that satisfy user-specified constraints on
properties such as degree, density, frequency, and size. Mining closed graph
patterns was studied in [60]; the goal is to reduce the number of graph
patterns by removing subgraph patterns that can be derived from other
patterns. Techniques were also developed for pushing constraints as deep as
possible into the mining process [65].
Approximate Graph Patterns. Due to the complexity of isomorphism testing
and the inelastic pattern definition, frequent subgraphs are not able to
capture approximate graph patterns. In [28], a proximity pattern is defined
as a set of labels that co-occur frequently in neighborhoods. It relaxes the
rigid structure constraint of frequent subgraphs, while introducing
connectivity to frequent itemsets. Empirical results show that this approach
not only finds interesting patterns that are ignored by existing approaches,
but also achieves high performance for finding proximity patterns in
large-scale graphs.
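
A much-simplified sketch of the idea of labels co-occurring in
neighborhoods follows. It counts, for every label set of a given size, the
number of nodes whose h-hop neighborhood contains all labels in the set.
This is an illustrative simplification, not the model of [28]; the function
names and the parameters hops, size, and min_support are assumptions.

```python
from collections import deque
from itertools import combinations

def neighborhood_labels(adj, labels, source, hops):
    """Labels of all nodes within `hops` steps of `source` (found by BFS)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == hops:
            continue
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return {labels[v] for v in seen}

def frequent_label_sets(adj, labels, hops, size, min_support):
    """Label sets of the given size that co-occur in >= min_support neighborhoods."""
    counts = {}
    for v in adj:
        nearby = sorted(neighborhood_labels(adj, labels, v, hops))
        for combo in combinations(nearby, size):
            counts[combo] = counts.get(combo, 0) + 1
    return {c: n for c, n in counts.items() if n >= min_support}

# Hypothetical path graph 0 - 1 - 2 - 3 with labels a, b, c, a.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "a", 1: "b", 2: "c", 3: "a"}

result = frequent_label_sets(adj, labels, hops=1, size=2, min_support=3)
print(result)  # {('a', 'b'): 3, ('a', 'c'): 3}
```

Note that the pair (a, c) is counted in the neighborhood of node 1 even
though nodes 0 and 2 are not adjacent, which is the sense in which the
rigid structure constraint is relaxed.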
Because the set of frequent graph patterns is exponential in size, it is
necessary to discover the most representative ones. Random sampling
techniques have been developed to sample the pattern space uniformly [18].
By doing so, the mining time can be significantly reduced, and so can the
number of similar patterns.
Discriminative Graph Patterns. Discriminative graph pattern mining aims to
find significant graph patterns that can tell the difference between two
sets of graphs, for example graphs with different class labels. The
discovered discriminative graph patterns can be used as features for
classification. Ting and Bailey [52] proposed an algorithm for mining
minimal contrast subgraphs, which capture the structural differences
between any two collections of graphs. LEAP [58] is a general approach that
leverages structural proximity and frequency association to quickly skip
parts of the pattern search space and find discriminative graph patterns
with respect to an objective function given by the user.
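
To illustrate what a user-given objective function might look like, the
sketch below scores candidate patterns by the difference between their
frequencies in two sets of graphs. This is one simple choice of objective,
not the one used in [52] or [58]. It assumes, for brevity, that all graphs
share a common vertex vocabulary, so each graph is a set of edges and
subgraph containment reduces to set inclusion; all names are hypothetical.

```python
def frequency(pattern, graphs):
    """Fraction of graphs (sets of edges) that contain every edge of the pattern."""
    return sum(pattern <= g for g in graphs) / len(graphs)

def discriminative_score(pattern, positives, negatives):
    """Frequency difference: high when the pattern is typical of the positives only."""
    return frequency(pattern, positives) - frequency(pattern, negatives)

# Hypothetical graphs over shared vertex names.
positives = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "d")},
]
negatives = [
    {("a", "b")},
    {("c", "d")},
]
candidates = [
    frozenset({("a", "b")}),
    frozenset({("a", "b"), ("b", "c")}),
]

ranked = sorted(
    candidates,
    key=lambda p: discriminative_score(p, positives, negatives),
    reverse=True,
)
print(sorted(ranked[0]))  # [('a', 'b'), ('b', 'c')]
```

The single edge a-b is frequent in both sets and therefore scores lower
than the two-edge pattern, which occurs only in the positive graphs.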

2 Graph Search
The development of scalable methods for analyzing large graph data sets,
including graphs built from knowledge bases and social networks, poses great
challenges. At the core of many graph analysis applications lies a common
and critical problem: how to search graphs efficiently. There are two
problem settings: multiple graphs and a single graph.

Multiple Graphs: Given a set of graphs, $D = \{G_1, G_2, \ldots, G_n\}$, and
a query graph g, graph search returns an answer set
$D_g = \{G \mid M(g, G) = 1, G \in D\}$, where M is a boolean function. M
could be a function testing graph isomorphism (full graph search), subgraph
isomorphism (subgraph search), approximate match (full graph similarity
search), or subgraph approximate match (subgraph similarity search).

Single Graph: Given a graph G and a query graph g, find all the
embeddings of g in G.

For graph search, it is inefficient to perform a sequential scan on a graph
database and check each graph to find answers to a query graph. A sequential
scan is costly because one has to not only access the whole graph database
but also check (sub)graph isomorphism, and subgraph isomorphism is known to
be an NP-complete problem [9]. Ullmann's backtracking method [54], VF2 [11],
and SwiftIndex [45] are popular methods for subgraph isomorphism checking.
High-performance graph indexing is therefore needed to quickly prune graphs
or regions of a graph that obviously violate the query requirement.
The problem of graph search has been addressed in different domains
since it is a critical problem in many applications. In content-based image
retrieval, [44] represented each graph as a vector of features and indexed
graphs in a high dimensional space using R-trees. [47] indexed graphs by
a signature computed from the eigenvalues of adjacency matrices. Instead
of casting a graph to a vector form, [3] proposed a metric indexing scheme
which organizes graphs hierarchically according to their mutual distances.
In semistructured/XML databases, query languages built on path expressions
have become popular. Efficient indexing techniques for path expressions
were initially introduced in DataGuide [16] and 1-index [38]. A(k)-index [26]
proposes k-bisimilarity to exploit local similarity existing in semistructured
databases. Index Fabric [10] represents every path in a tree as a string and
stores it in a Patricia trie.
For more complicated graph queries, Shasha et al. [46] (GraphGrep) extended
the path-based technique to do full-scale graph retrieval. GraphGrep is an
example of feature-based graph indexing techniques. Let F be a feature set
for a given graph database D. For any feature $f \in F$, $D_f$ is the set of
graphs containing f, $D_f = \{G \mid f \subseteq G, G \in D\}$. The graph
query processing has three steps: (1) Search, which enumerates all the
features in a query graph, Q, to compute the candidate query answer set,
$C_Q = \bigcap_{f} D_f$ ($f \subseteq Q$ and $f \in F$); each graph in $C_Q$
contains all of Q's features. Therefore, $D_Q$ is a subset of $C_Q$.
(2) Fetching, which retrieves the graphs in the candidate answer set from
disk. (3) Verification, which checks the graphs in the candidate answer set
to verify whether they really satisfy the query, pruning false positives.
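
The search step can be sketched with an inverted index from features to
graph identifiers. For brevity, the features here are the unordered label
pairs of edges (paths of length one) rather than GraphGrep's longer paths
or gIndex's subgraph patterns; the function names and the toy database are
assumptions, and verification is delegated to a caller-supplied predicate.

```python
def edge_label_features(graph):
    """Features of a graph: the unordered label pairs of its edges."""
    labels, edges = graph
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def build_index(database):
    """Inverted index: feature f -> D_f, the ids of the graphs containing f."""
    index = {}
    for gid, graph in database.items():
        for f in edge_label_features(graph):
            index.setdefault(f, set()).add(gid)
    return index

def candidate_set(query, index, all_ids):
    """C_Q: the intersection of D_f over the indexed features of the query."""
    result = set(all_ids)
    for f in edge_label_features(query):
        if f in index:          # only features in F take part in the filter
            result &= index[f]
    return result

def search(query, database, index, verify):
    """Filter with the index, then verify each candidate (fetching is implicit)."""
    c_q = candidate_set(query, index, database)
    return {gid for gid in c_q if verify(query, database[gid])}

# Hypothetical database; each graph is (node -> label, list of edges).
database = {
    "g1": ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
    "g2": ({1: "A", 2: "B", 3: "B", 4: "C"}, [(1, 2), (3, 4)]),
    "g3": ({1: "A", 2: "C"}, [(1, 2)]),
}
query = ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])   # the path A - B - C

index = build_index(database)
print(sorted(candidate_set(query, index, database)))  # ['g1', 'g2']
```

Graph g2 survives the filter because it contains both an A-B edge and a
B-C edge, yet it does not contain the path A-B-C. It is a false positive
that only the verification step, with a real subgraph isomorphism test
passed as verify, can remove.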
gIndex [61] introduces a pattern-based indexing technique that facilitates
graph search in graph databases with thousands of instances. Similar
techniques can also be applied to indexing single massive graphs. The
idea is to precompute features from a graph database and build indices based
on these features. Various kinds of features could be used, including
node/edge labels, paths, trees, and subgraph patterns. gIndex is a subgraph
pattern-based approach, while GraphGrep is a path-based approach. FG-index
[7] also builds its index using frequent subgraphs; however, it answers
frequent graph queries directly, without verification.
Zhao et al. [63] analyzed the effectiveness and efficiency of paths, trees,
and graphs as indexing features from three aspects: feature size, feature
selection cost, and pruning power. Like paths and graphs, tree features can
be effectively and efficiently used as indexing features for graph databases.
GString [25] combines three basic structures together: path, star, and cycle
for graph search.
GCoding [66] is another tree-based graph indexing approach. For each
node u, it extracts a level-n path tree, which consists of all n-step simple
paths from u in a graph. The node is then encoded with eigenvalues derived
from this local tree structure. If a query graph Q is a subgraph of a graph
G, then for each vertex u in Q there must exist a corresponding vertex u' in
G such that the local structure around u in Q is preserved around u' in G.
There is a partial order relationship between the eigenvalues of these
two local structures. Based on this property, GCoding can quickly prune
graphs that violate the order.
Closure-Tree [19] organizes graphs into a tree-based index structure using
graph closures as the bounding boxes.

3 Graph Similarity Search

A common problem in graph search is: what if there is no match or very
few matches for a given query graph? In this situation, a subsequent query
refinement process has to be carried out in order to find the structures of
interest.
Unfortunately, it is often too time-consuming for a user to manually refine the
query. One solution is to ask the system to find graphs that approximately
contain the query graph. This similarity search problem has been studied in
various fields.
There have been numerous studies on inexact graph search in large graphs.
Tong et al. [53] proposed best-effort pattern matching, which aims to
maintain the shape of the query. Tian and Patel [51] proposed an approximate
subgraph search tool, called TALE, with efficient indexing. Mongiovì et al.
introduced a set-cover-based inexact subgraph matching technique, called
SIGMA [39]. Both of these techniques use edge misses to measure the quality
of a match. There are other works on inexact subgraph matching. An
incomplete list (see [15] for a survey) includes homomorphism-based
subgraph matching [13], belief-propagation-based network alignment [2, 14],
an edge-edit-distance-based subgraph indexing technique [62], subgraph
matching in billion-node graphs [48], regular-expression-based graph pattern
matching [1], schema matching [36], unbalanced ontology matching [64], and a
graph-partition-based subgraph identification scheme [5].
NESS [27] introduces a relaxed, computationally effective definition of
approximate graph matching by changing the strict subgraph isomorphism
checking to proximity checking. Under this new measure, it was proved
that subgraph similarity search is NP-hard, while graph similarity match
is polynomial. NESS applies an information propagation model that converts
a large network into a set of multidimensional vectors, for which
sophisticated indexing and similarity search algorithms are available. NESS
is appropriate for graphs with low automorphism and high noise, which are
common in many social and information networks.
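
The following sketch shows, in the spirit of such a propagation model, how
a node's neighborhood might be turned into a label vector. Each label
within h hops contributes alpha raised to its distance, and a query node is
compared with a target node by the label strength the target lacks. The
exact weighting and cost used by NESS may differ; the parameter names hops
and alpha and both function names are assumptions.

```python
from collections import deque

def neighborhood_vector(adj, labels, source, hops, alpha):
    """Map label -> sum of alpha**distance over nodes within `hops` of source."""
    dist = {source: 0}
    queue = deque([source])
    vector = {}
    while queue:
        node = queue.popleft()
        d = dist[node]
        if d > 0:  # the source's own label is not part of its neighborhood
            vector[labels[node]] = vector.get(labels[node], 0.0) + alpha ** d
        if d == hops:
            continue
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = d + 1
                queue.append(nxt)
    return vector

def mismatch_cost(query_vec, target_vec):
    """Total label strength that the query expects but the target lacks."""
    return sum(
        max(0.0, w - target_vec.get(label, 0.0)) for label, w in query_vec.items()
    )

# Hypothetical target graph: the path 0(a) - 1(b) - 2(c).
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "a", 1: "b", 2: "c"}

target_vec = neighborhood_vector(adj, labels, 0, hops=2, alpha=0.5)
query_vec = {"b": 0.5, "d": 0.5}   # a query node with neighbors labeled b and d

print(target_vec)                            # {'b': 0.5, 'c': 0.25}
print(mismatch_cost(query_vec, target_vec))  # 0.5
```

Because every node becomes a vector, candidate matches can be pruned with
ordinary vector comparisons instead of isomorphism tests.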
There are several studies on simulation and bisimulation-based graph
pattern matching, e.g., [37, 12, 34], which define subgraph matching as a
relation among the query nodes and target nodes.
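
A minimal sketch of plain graph simulation on directed, node-labeled graphs
is given below. It computes, by fixpoint refinement, the largest relation
in which a data node v may match a query node u only if the labels agree
and every query edge leaving u is mirrored by some data edge leaving v.
This is the textbook notion rather than the specific variants of
[37, 12, 34], and the function name graph_simulation is an assumption.

```python
def graph_simulation(q_adj, q_labels, g_adj, g_labels):
    """Return sim[u]: the data nodes that simulate query node u."""
    # Start from all label-compatible pairs, then remove violations.
    sim = {
        u: {v for v in g_adj if g_labels[v] == q_labels[u]} for u in q_adj
    }
    changed = True
    while changed:
        changed = False
        for u in q_adj:
            for v in list(sim[u]):
                # For each query edge u -> u2, v needs an edge v -> v2
                # where v2 is still a candidate for u2.
                if any(not (sim[u2] & set(g_adj[v])) for u2 in q_adj[u]):
                    sim[u].discard(v)
                    changed = True
    return sim

# Hypothetical query: qa(A) -> qb(B).
q_adj = {"qa": ["qb"], "qb": []}
q_labels = {"qa": "A", "qb": "B"}

# Hypothetical data graph: 1(A) -> 2(B), plus isolated 3(A) and 4(B).
g_adj = {1: [2], 2: [], 3: [], 4: []}
g_labels = {1: "A", 2: "B", 3: "A", 4: "B"}

sim = graph_simulation(q_adj, q_labels, g_adj, g_labels)
print(sim)  # {'qa': {1}, 'qb': {2, 4}}
```

The query matches the data graph when every sim[u] is non-empty. Unlike
subgraph isomorphism, the relation is computed in polynomial time and need
not be one-to-one.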

4 Graph Query Language

In the area of graph databases, a few graph query languages have been
proposed to query and manage graph data. GraphLog [8] represents both
data and queries as graphs. Edges in queries represent edges or paths in the
database, indicating a regular-expression kind of query. In terms of
expressive power, GraphLog was shown to be equivalent to stratified linear
Datalog. Graph query languages were also introduced with object-oriented
data models in GOOD [33], GraphDB [17], and GOQL [31].
GraphQL [20] is a new graph query language that treats graphs as the
basic unit. It has an algebraic system similar to SQL, but the algebraic
operators are defined directly on graphs. Moustafa et al. [40] proposed
ego-centric pattern census queries, where a given structural pattern is
searched for in every node's neighborhood and the counts are reported or
used in further analysis. This kind of analysis is useful in opinion leader
identification, node classification, link prediction, and role
identification. The authors developed an SQL-based declarative language and
a series of efficient query evaluation algorithms for it.
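
As a small illustration of an ego-centric census, the sketch below counts,
for every node, the triangles that include it, that is, the edges among its
direct neighbors. It is written in plain Python rather than in the
declarative language of [40], and the function name triangle_census is an
assumption.

```python
from itertools import combinations

def triangle_census(adj):
    """For each node, count the triangles that include it."""
    return {
        v: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        for v, nbrs in adj.items()
    }

# Hypothetical graph: triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

print(triangle_census(adj))  # {0: 1, 1: 1, 2: 1, 3: 0}
```

The per-node counts can then serve as features, for example for node
classification or role identification.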

5 Graph OLAP
Graph OLAP aims to provide a model to perform composite structure and
information analysis in heterogeneous networks. For example, in network
intrusion analysis, apart from the topological structures encoded in the
underlying network, multidimensional attributes are often specified and
associated with nodes and edges, e.g., security software installed on
computers, defense strategies, and access policies, forming so-called
multidimensional networks. While studies on contemporary networks have been
around for decades [41], and a plethora of algorithms and systems have been
devised for multidimensional analysis in relational databases [24], none has
taken both aspects into account in the multidimensional network scenario.
Graph OLAP is the technique developed to fill this technology gap in
multidimensional networks.
Graph OLAP performs discovery-driven OLAP operations for fast and
accurate knowledge discovery, through structure discovery, network
summarization, aggregation, correlation, clustering, and classification.
The concept of Graph OLAP was first introduced in [6]. Two kinds of OLAP
were defined: Informational OLAP (abbr. I-OLAP) and Topological OLAP (abbr.
T-OLAP). For roll-up in I-OLAP, the characterizing feature is that snapshots
are just different observations of the same underlying network, and thus
when they are all grouped into one cell in the cube, it is like overlaying
multiple pieces of information, without changing the objects whose
interactions are being looked at. For roll-up in T-OLAP, the reorganization
instead happens inside individual networks. Here, merging is performed
internally, which zooms the user's focus out to a generalized set of
objects, and a new graph formed by such shrinking might greatly alter the
original network's topological structure.
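
The difference between the two roll-ups can be sketched on weighted edge
lists. The I-OLAP style roll-up below overlays snapshots of the same node
set by summing edge weights, while the T-OLAP style roll-up merges nodes
into groups and aggregates the edges between groups. Summation as the
aggregate, the snapshot data, and all names are assumptions made for
illustration.

```python
from collections import Counter

def informational_rollup(snapshots):
    """Overlay snapshots of one network: same nodes, edge weights summed."""
    total = Counter()
    for snapshot in snapshots:
        total.update(snapshot)      # adds the weights edge by edge
    return dict(total)

def topological_rollup(edges, group_of):
    """Merge nodes into groups and aggregate edge weights between groups."""
    merged = Counter()
    for (u, v), weight in edges.items():
        key = tuple(sorted((group_of[u], group_of[v])))
        merged[key] += weight
    return dict(merged)

# Hypothetical collaboration snapshots for two years.
year1 = {("ann", "bob"): 1, ("bob", "cat"): 2}
year2 = {("ann", "bob"): 3}

overlay = informational_rollup([year1, year2])
print(overlay)  # {('ann', 'bob'): 4, ('bob', 'cat'): 2}

group_of = {"ann": "g1", "bob": "g1", "cat": "g2"}
print(topological_rollup(overlay, group_of))
# {('g1', 'g1'): 4, ('g1', 'g2'): 2}
```

After the informational roll-up the nodes are unchanged, whereas the
topological roll-up produces a smaller graph in which an edge inside a
group has become a self-loop on that group.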
Tian et al. [50] introduced two operations for summarizing graphs, a key
step in T-OLAP. The first operation, called SNAP, produces a summary graph
by grouping nodes based on user-selected node attributes and relationships.
The second operation, called k-SNAP, further allows users to control the
resolution of summaries and provides drill-down and roll-up abilities to
navigate through summaries with different resolutions. Zhao et al. [43]
discussed how to efficiently compute T-OLAP using graph cubing techniques,
implementing Graph Cube by combining special characteristics of
multidimensional networks with existing, well-studied data cube techniques.
In addition to graph summarization, another important operation in
graph OLAP is similarity search. Because large-scale heterogeneous
information networks consist of multi-typed, interconnected objects, it is
important to provide similarity measures in such networks. Intuitively, two
objects are similar if they are linked by many paths in the network.
However, the different semantic meanings behind paths should be taken into
consideration. Sun et al. [57] studied similarity search defined among
objects of the same type in heterogeneous networks, and introduced the
concept of meta path-based similarity, where a meta path is a path
consisting of a sequence of relations defined between different object types
(i.e., structural paths at the meta level). Meta-path similarity turns out
to be more meaningful in many scenarios than random-walk-based similarity
measures.
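
A minimal sketch of meta path-based similarity follows, assuming the
symmetric meta path author-paper-author and the PathSim-style
normalization of [57]: twice the number of path instances between x and y,
divided by the number of path instances from x to itself plus those from y
to itself. The incidence matrix and function names are hypothetical.

```python
def commuting_matrix(w):
    """M = W * W^T: M[x][y] counts the meta-path instances between x and y."""
    n = len(w)
    return [
        [sum(a * b for a, b in zip(w[x], w[y])) for y in range(n)]
        for x in range(n)
    ]

def pathsim(m, x, y):
    """Normalized meta-path similarity between objects x and y."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0

# Hypothetical author-by-paper incidence matrix (3 authors, 4 papers).
w = [
    [1, 1, 0, 0],   # author 0 wrote papers 0 and 1
    [1, 1, 1, 0],   # author 1 wrote papers 0, 1, and 2
    [0, 0, 1, 1],   # author 2 wrote papers 2 and 3
]
m = commuting_matrix(w)

print(pathsim(m, 0, 1))  # 0.8
print(pathsim(m, 1, 2))  # 0.4
print(pathsim(m, 0, 2))  # 0.0
```

The normalization keeps highly connected objects from dominating the
ranking: a raw path count alone would favor prolific authors regardless of
how much of their work is actually shared.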

6 Vertex Programming
Vertex programming is adopted in several leading distributed graph
computing platforms for clusters, such as Pregel [35] and GraphLab [32].
They can be implemented using the bulk synchronous parallel model or
asynchronous models. Vertex programming is suitable for graph algorithms
that can be modified to store computation states in vertices, where these
states can be distributed and shared with multiple vertices. Pregel and
GraphLab have demonstrated their success in the computation of shortest
paths, random walks, clustering, and belief propagation, which can support
many machine learning algorithms. However, it is unknown whether an
effective implementation of subgraph isomorphism exists using vertex
programming. Sun et al. [49] proposed passing partial matches between
machines in order to find a complete match. One can also implement a
centralized algorithm that collects partial matches from different machines
and assembles them on a central machine. Both algorithms have pros and
cons. They are not compatible with vertex programming and need a special
daemon process on the machines to coordinate the assembly of partial
results. Our approximate graph search algorithms that use message passing
between vertices, e.g., NESS [27], are suitable for vertex programming.
NESS uses a vector representation of graphs. The neighborhood information
of each vertex is computed by propagating the labels of its neighbors with
distance weighting, and is encoded in each vertex. The best matches of
each vertex can be further passed to its neighbors to find the best match of
the entire vertex set. The structure of a vertex's neighbors is encoded with
their distance to that vertex. When the number of distinct labels in a
graph is high, NESS will likely find a good match in terms of subgraph
isomorphism.
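
To make the vertex-programming model concrete, the sketch below simulates
synchronous supersteps on a single machine for single-source shortest
paths. Each vertex keeps its best known distance as state, and a vertex
whose state improves sends updated distances to its neighbors. It imitates
the programming model only and uses no Pregel or GraphLab API; the function
name and graph encoding are assumptions, and edge weights are assumed to be
non-negative.

```python
import math

def vertex_program_sssp(adj, source):
    """Shortest-path distances computed in synchronous supersteps.

    adj maps each vertex to a list of (neighbor, edge_weight) pairs.
    """
    state = {v: math.inf for v in adj}     # per-vertex state: best distance
    inbox = {source: [0.0]}                # messages for the next superstep
    while inbox:                           # halt when no messages are in flight
        outbox = {}
        for v, messages in inbox.items():  # only vertices with mail are active
            best = min(messages)
            if best < state[v]:
                state[v] = best
                # Tell each neighbor the distance reachable through v.
                for nbr, weight in adj[v]:
                    outbox.setdefault(nbr, []).append(best + weight)
        inbox = outbox
    return state

# Hypothetical weighted, directed graph.
adj = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0)],
    "c": [("d", 1.0)],
    "d": [],
}

print(vertex_program_sssp(adj, "a"))
# {'a': 0.0, 'b': 1.0, 'c': 3.0, 'd': 4.0}
```

All computation state lives in the vertices and all communication happens
through messages along edges, which is what makes the algorithm easy to
distribute.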

References
[1] P. Barcelo, L. Libkin, and J. L. Reutter. Querying Graph Patterns.
PODS, 2011.

[2] M. Bayati, M. Gerritsen, D. F. Gleich, A. Saberi, and Y. Wang.
Algorithms for Large, Sparse Network Alignment Problems. ICDM, 2009.

[3] S. Beretti, A. Bimbo, and E. Vicario. Efficient matching and indexing
of graph models in content based retrieval. IEEE Trans. on Pattern
Analysis and Machine Intelligence, 23:1089-1105, 2001.

[4] C. Borgelt and M. Berthold. Mining molecular fragments: Finding
relevant substructures of molecules. In Proc. of 2002 Int. Conf. on Data
Mining (ICDM'02), pages 211-218, 2002.

[5] M. Brocheler, A. Pugliese, and V. S. Subrahmanian. COSI: Cloud
Oriented Subgraph Identification in Massive Social Networks. ASONAM,
2010.

[6] C. Chen, X. Yan, F. Zhu, J. Han, and P. S. Yu. Graph OLAP: Towards
online analytical processing on graphs. In Proc. 2008 Int. Conf. on Data
Mining, 2008.

[7] J. Cheng, Y. Ke, W. Ng, and A. Lu. FG-Index: Towards
Verification-Free Query Processing on Graph Databases. SIGMOD, 2007.

[8] M. Consens and A. Mendelzon. GraphLog: a visual formalism for real
life recursion. In PODS, 1990.

[9] S. Cook. The complexity of theorem-proving procedures. In Proc. of the
3rd ACM Symp. on Theory of Computing (STOC'71), pages 151-158, 1971.

[10] B. Cooper, N. Sample, M. Franklin, G. Hjaltason, and M. Shadmon. A
Fast Index for Semistructured Data. VLDB, 2001.

[11] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph
Isomorphism Algorithm for Matching Large Graphs. IEEE Trans. Pattern
Anal. and Machine Int., 2004.

[12] W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph Pattern
Matching: From Intractable to Polynomial Time. PVLDB, 2010.

[13] W. Fan, J. Li, S. Ma, H. Wang, and Y. Wu. Graph Homomorphism
Revisited for Graph Matching. PVLDB, 2010.

[14] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient Belief Propagation
for Early Vision. Int. J. Comput. Vision, 70(1), 2006.

[15] B. Gallagher. Matching Structure and Semantics: A Survey on
Graph-Based Pattern Matching. AAAI FS., 2006.

[16] R. Goldman and J. Widom. DataGuides: Enabling query formulation
and optimization in semistructured databases. In Proc. of 1997 Int.
Conf. on Very Large Data Bases (VLDB'97), pages 436-445, 1997.

[17] R. H. Guting. GraphDB: Modeling and querying graphs in databases.
In VLDB, pages 297-308, 1994.

[18] M. A. Hasan and M. J. Zaki. Output space sampling for graph patterns.
Proc. of the VLDB Endowment (35th Int. Conf. on Very Large Data
Bases), 2(1):730-741, 2009.

[19] H. He and A. Singh. Closure-Tree: An Index Structure for Graph
Queries. ICDE, 2006.

[20] H. He and A. Singh. Graphs-at-a-time: query language and access
methods for graph databases. In Proc. of the 2008 ACM SIGMOD Int. Conf.
on Management of Data (SIGMOD'08), pages 405-418, 2008.

[21] L. B. Holder, D. J. Cook, and S. Djoko. Substructure Discovery in the
Subdue System. KDD, 1994.

[22] J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraph
in the presence of isomorphism. In Proc. of 2003 Int. Conf. on Data
Mining (ICDM'03), pages 549-552, 2003.

[23] A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm
for mining frequent substructures from graph data. In Proc. of 2000
European Symp. Principle of Data Mining and Knowledge Discovery
(PKDD'00), pages 13-23, 2000.

[24] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart,
M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational
aggregation operator generalizing group-by, cross-tab, and sub-totals.
Data Min. Knowl. Discov., 1(1):29-53, 1997.

[25] H. Jiang, H. Wang, P. Yu, and S. Zhou. GString: A Novel Approach
for Efficient Search in Graph Databases. ICDE, 2007.

[26] R. Kaushik, P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local
similarity for efficient indexing of paths in graph structured data. In
Proc. of 2002 Int. Conf. on Data Engineering (ICDE'02), pages 129-140,
2002.

[27] A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao.
Neighborhood Based Fast Graph Search in Large Networks. SIGMOD, 2011.

[28] A. Khan, X. Yan, and K.-L. Wu. Towards Proximity Pattern Mining in
Large Graphs. SIGMOD, 2010.

[29] M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. ICDM,
2001.

[30] M. Kuramochi and G. Karypis. Finding frequent patterns in a large
sparse graph. Data Mining and Knowledge Discovery, 11(3):243-271,
2005.

[31] L. Sheng, Z. M. Ozsoyoglu, and G. Ozsoyoglu. A graph query language
and its query processing. In ICDE, 1999.

[32] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and
J. Hellerstein. Distributed GraphLab: a framework for machine learning
and data mining in the cloud. Proc. VLDB Endow., 5(8):716-727, 2012.

[33] M. Gyssens, J. Paredaens, and D. van Gucht. A graph-oriented object
database model. In PODS, pages 417-424, 1990.

[34] S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo. Capturing Topology in
Graph Pattern Matching. PVLDB, 2012.

[35] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn,
N. Leiser, and G. Czajkowski. PREGEL: A System for Large-Scale
Graph Processing. SIGMOD, 2010.

[36] S. Melnik, H. G.-Molina, and E. Rahm. Similarity Flooding: A Versatile
Graph Matching Algorithm and its Application to Schema Matching.
ICDE, 2002.

[37] R. Milner. Communication and Concurrency. Prentice Hall, 1989.

[38] T. Milo and D. Suciu. Index structures for path expressions. Lecture
Notes in Computer Science, 1540:277-295, 1999.

[39] M. Mongiovì, R. Di Natale, R. Giugno, A. Pulvirenti, A. Ferro, and
R. Sharan. SIGMA: A Set-Cover-Based Inexact Graph Matching
Algorithm. J. Bioinfo. and Comp. Bio., 2010.

[40] W. Moustafa, A. Deshpande, and L. Getoor. Ego-centric graph pattern
census. In ICDE, 2012.

[41] M. Newman. Networks: An Introduction. Oxford University Press, 2010.

[42] S. Nijssen and J. Kok. A quickstart in frequent structure mining can
make a difference. In Proc. of 2004 ACM Int. Conf. on Knowledge
Discovery in Data Mining (KDD'04), pages 647-652, 2004.

[43] P. Zhao, X. Li, D. Xin, and J. Han. Graph cube: On warehousing and
OLAP multidimensional networks. In SIGMOD, 2011.

[44] E. Petrakis and C. Faloutsos. Similarity searching in medical image
databases. Knowledge and Data Engineering, 9(3):435-447, 1997.

[45] H. Shang, Y. Zhang, X. Lin, and J. Yu. Taming Verification Hardness:
An Efficient Algorithm for Testing Subgraph Isomorphism. PVLDB,
2008.

[46] D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and
Applications of Tree and Graph Searching. PODS, 2002.

[47] A. Shokoufandeh, S. Dickinson, K. Siddiqi, and S. Zucker. Indexing
using a spectral encoding of topological structure. In Proc. of IEEE Int.
Conf. on Computer Vision and Pattern Recognition (CVPR'99), pages
2491-2497, 1999.

[48] Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient Subgraph
Matching on Billion Node Graphs. PVLDB, 2012.

[49] Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient subgraph
matching on billion node graphs. Proc. VLDB Endow., 5(9):788-799,
2012.

[50] Y. Tian, R. A. Hankins, and J. M. Patel. Efficient Aggregation for
Graph Summarization. SIGMOD, 2008.

[51] Y. Tian and J. M. Patel. TALE: A Tool for Approximate Large Graph
Matching. ICDE, 2008.

[52] R. Ting and J. Bailey. Mining minimal contrast subgraph patterns. In
Proc. of 2006 SIAM Int. Conf. on Data Mining (SDM'06), 2006.

[53] H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad. Fast
Best-Effort Pattern Matching in Large Attributed Graphs. KDD, 2007.

[54] J. R. Ullmann. An Algorithm for Subgraph Isomorphism. J. ACM,
1976.

[55] N. Vanetik, E. Gudes, and S. Shimony. Computing frequent graph
patterns from semistructured data. In Proc. of 2002 Int. Conf. on Data
Mining (ICDM'02), pages 458-465, 2002.

[56] T. Washio and H. Motoda. State of the art of graph-based data mining.
SIGKDD Explorations, 5:59-68, 2003.

[57] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. PathSim: Meta path-based
top-k similarity search in heterogeneous information networks. In Proc.
of 2011 Int. Conf. on Very Large Data Bases (VLDB'11), 2011.

[58] X. Yan, H. Cheng, P. S. Yu, and J. Han. Mining significant graph
patterns by leap search. In Proc. of 2008 ACM-SIGMOD Int. Conf. on
Management of Data (SIGMOD'08), pages 433-444, 2008.

[59] X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining.
ICDM, 2002.

[60] X. Yan and J. Han. CloseGraph: Mining closed frequent graph patterns.
In Proc. of 2003 Int. Conf. on Knowledge Discovery and Data Mining
(KDD'03), pages 286-295, 2003.

[61] X. Yan, P. S. Yu, and J. Han. Graph Indexing: A Frequent
Structure-Based Approach. SIGMOD, 2004.

[62] S. Zhang, J. Yang, and W. Jin. SAPPER: Subgraph Indexing and
Approximate Matching in Large Graphs. PVLDB, 2010.

[63] P. Zhao, J. Yu, and P. Yu. Graph Indexing: Tree + Delta >= Graph.
VLDB, 2007.

[64] Q. Zhong, H. Li, J. Li, G. Xie, J. Tang, L. Zhou, and Y. Pan. A
Gauss Function Based Approach for Unbalanced Ontology Matching.
SIGMOD, 2009.

[65] F. Zhu, X. Yan, J. Han, and P. S. Yu. gPrune: a constraint pushing
framework for graph pattern mining. In Proc. of the 11th Pacific-Asia
Conf. on Advances in Knowledge Discovery and Data Mining, pages
388-400, 2007.

[66] L. Zou, L. Chen, J. Yu, and Y. Lu. A Novel Spectral Coding in a Large
Graph Database. EDBT, 2008.

Logical Reasoning To Reasoning Studies
10 pages
Adaptive XML Tree Classification On Evolving Data Streams
No ratings yet
Adaptive XML Tree Classification On Evolving Data Streams
16 pages
An Introduction To Graph Data: IBM T. J. Watson Research Center Hawthorne, NY 10532
No ratings yet
An Introduction To Graph Data: IBM T. J. Watson Research Center Hawthorne, NY 10532
11 pages
Efficient Densest Subgraphs Discovery in Large Dynamic Graphs by Greedy Approximation
No ratings yet
Efficient Densest Subgraphs Discovery in Large Dynamic Graphs by Greedy Approximation
11 pages
Neural Subgraph Counting With Wasserstein Estimator - 副本
No ratings yet
Neural Subgraph Counting With Wasserstein Estimator - 副本
16 pages
IEEE - Finding Patterns in Three Dimensional Graphs Algorith
No ratings yet
IEEE - Finding Patterns in Three Dimensional Graphs Algorith
19 pages
ANF: A Fast and Scalable Tool For Data Mining in Massive Graphs
No ratings yet
ANF: A Fast and Scalable Tool For Data Mining in Massive Graphs
10 pages
1 s2.0 S0306437924000590 Main
No ratings yet
1 s2.0 S0306437924000590 Main
15 pages
Generic Pattern Mining
No ratings yet
Generic Pattern Mining
17 pages
An Updown Directed Acyclic Graph Approach For Sequential Pattern Mining
No ratings yet
An Updown Directed Acyclic Graph Approach For Sequential Pattern Mining
67 pages
Graph Databases: Their Power and Limitations
No ratings yet
Graph Databases: Their Power and Limitations
12 pages
Graph Theory and Patterns
No ratings yet
Graph Theory and Patterns
4 pages
Efficient Frequent Connected Induced Subgraph Mining in Graphs of Bounded Tree-Width
No ratings yet
Efficient Frequent Connected Induced Subgraph Mining in Graphs of Bounded Tree-Width
16 pages
Apriori Based Novel Frequent Itemset Mining Mechanism: Issn No
No ratings yet
Apriori Based Novel Frequent Itemset Mining Mechanism: Issn No
8 pages
Analysis of Large Graph Partitioning and Frequent Subgraph Mining On Graph Data
No ratings yet
Analysis of Large Graph Partitioning and Frequent Subgraph Mining On Graph Data
12 pages