An Introduction to Graph Data Management
historical overview of its main development, and study the main current
systems that implement them.
Table of Contents
2.5 The Property Graph Data Model
3 Graph Database Query Languages
3.1 Essential Graph Queries
3.2 Graph Analytical Queries
3.3 Graph Query Languages in Practice
4 Graph Database Management Systems
4.1 Graph Databases
4.2 RDF Database Systems
4.3 Graph Processing Frameworks
Introduction
It has long been recognized that graphs are a natural way to represent
information and knowledge. In fact, graph database (abbreviated graph-db) models
have a long history, dating back at least to the 1980s. But it is only recently that
several technological developments have made it possible to turn this abstract
idea into reality. Powerful hardware to store and process graphs, powerful sensors
to record information directly, and powerful machines to analyze and
visualize graphs, among other factors, have given rise to the current flourishing
of the area of graph data management.
Graph database models. In this lecture we will introduce the notion of graph-db
model. As is well known, a data model can be characterized by three basic compo-
nents, namely data structures, query and transformation language, and integrity
constraints. Following this definition, a graph-db model is a model where data
structures for the schema and/or instances are modeled as graphs (or general-
izations of them), where the data manipulation is expressed by graph-oriented
operations (i.e. a graph query language), and appropriate integrity constraints
can be defined over the graph structure.
The main characteristic of a graph database is that the data are conceptually
modeled and presented to the user as a graph; that is, the data structures (data
and/or schema) are represented by graphs, or by data structures generalizing
the notion of graph (e.g. hypergraphs or hypernodes). One of the main features
of a graph structure is the ease with which it models unstructured data. Therefore,
in graph-db models the separation between schema and data (instances) is less
marked than in the classical relational model.
Regarding data manipulation and querying, it is expressed by graph trans-
Graph database systems. There are two categories of graph database systems:
graph databases and graph processing frameworks. The former are database
systems, much like the relational ones, which aim at storing and querying graph
data. The latter are frameworks that batch-process big graphs, putting the emphasis
on throughput and hence taking advantage of multiple machines. These systems
provide two perspectives on storing and querying graph data, each with
its own goals.
goal is to serve as complementary material for the lecture. It is important to note
that, on the one hand, historical references are mentioned to help the student, rather
than to establish historical facts. On the other hand, the material is not novel and
we relied heavily on our own previous surveys on the subject, sometimes copying
whole paragraphs from them verbatim.
– Allow for a natural modeling of data when it has graph structure. Graph
structures become visible to the user and they allow a natural way of handling
application data. Graphs have the advantage of being able to keep all
the information about an entity in a single node and show related information
by arcs connected to it. Graph objects (like paths, neighborhoods) may
be first-class citizens.
– Queries can refer directly and explicitly to this graph structure. Associated
with graphs are specific graph operations in the query language algebra,
such as finding shortest paths, determining certain subgraphs, and so forth.
Explicit graphs and graph operations allow users to express a query at a
high level of abstraction. Another advantage is that full knowledge of the
structure is not required to express meaningful queries.
– Implementation-wise, graph databases may provide special graph storage
structures, and efficient graph algorithms available for realizing specific graph
operations over the data.
simple notion of relation, which together with its associated algebra and logic,
made the relational model a primary model for database research. In particular,
its standard query and transformation language, SQL, became a paradigmatic
language for querying. It popularized the concept of abstraction levels by intro-
ducing a separation between the physical and logical levels. Gradually the focus
shifted to modeling data as seen by applications and users (that is, tables) [146].
The differences between graph db-models and the relational db-model are manifold.
The relational model is geared towards simple record-type data, where the
data structure is known in advance (airline reservations, accounting, inventories,
etc.). The schema is fixed, which makes it difficult to extend these databases.
It is not easy to integrate different schemas, nor is it automatable. The query
language cannot explore the underlying graph of relationships among the data,
such as paths, neighborhoods, patterns.
Semantic db-models [152] focus on the incorporation of richer and more ex-
pressive semantics into the database, from a user’s viewpoint. Database designers
can represent objects and their relations in a natural and clear manner (similar
to the way users view an application) by using high-level abstraction concepts
such as aggregation, classification and instantiation, sub- and super-classing,
attribute inheritance and hierarchies [146]. In general, the extra semantics sup-
port database design and evolution [114]. A well-known and successful case is
the entity-relationship model [78], which has become a basis for the early stages
of database design. Semantic db-models are relevant to graph db-model research
because the semantic db-models reason about the graph-like structure generated
by the relationships between the modeled entities.
Object-oriented (O-O) db-models [122] are designed to address the weaknesses
of the relational model in data-intensive domains (knowledge bases, engineering
applications). This research is motivated by applications involving complex data
objects and complex object interactions, such as CAD/CAM software, com-
puter graphics and information retrieval. According to the O-O programming
paradigm on which these models are based, they represent data as a collection
of objects that are organized into classes, and have complex values and methods.
O-O db-models are related to graph db-models in their explicit or implicit use
of graph structures in definitions [131, 58, 108]. Nevertheless, there are important
differences in how the two approaches model the
world. O-O db-models view the world as a set of complex objects having a certain
state (data), where interaction is via method passing. On the other hand,
graph db-models view the world as a network of relations, emphasizing data
interconnection, and the properties of these relations. O-O db-models focus on
object dynamics, their values and methods. Graph db-models focus instead on
the interconnection, while maintaining the structural and semantic complexity
of the data.
Semistructured db-models [72, 53] were motivated by the increased existence
of semistructured data (also called unstructured data), data exchange, and data
browsing mainly on the Web [72]. In semistructured data, the structure is ir-
regular, implicit and partial; the schema does not restrict the data, it only de-
scribes it, a feature that allows extensible data exchanges; the schema is large
and constantly evolving; the data is self-describing, as it contains schema in-
formation [53]. Among the most representative original models are OEM [149],
Lorel [54], UnQL [73], ACeDB [168] and Strudel [99]. Many of these ideas can
be seen in current semistructured formats such as XML or JSON. Generally,
semistructured data is represented using a tree-like structure. However, cycles
between data nodes are possible, which leads to graph-like structures like in
graph db-models. Some authors characterize semistructured data as rooted di-
rected connected graphs [73].
The origins of graph databases can be dated at least to the nineties, when much
of the theory was developed. Probably due to the lack of hardware support to manage
big graphs, this line of research declined for a while until a few years ago, when
a second wave of research was initiated.
The first wave. In an early approach, facing the failure of contemporary systems
to take into account the semantics of a database, a semantic network to store
data about the database was proposed [158]. An implicit graph structure
for the data itself was presented in the Functional Data Model [166], whose goal
was to provide a “conceptually natural” database interface. A different approach
proposed the Logical Data Model (LDM) [126], an explicit graph db-model
intended to generalize the relational, hierarchical and network models.
Later, [125] proposed a graph db-model for representing complex structures of
knowledge called G-Base.
In the late eighties an object-oriented db-model based on a graph structure,
called O2, was introduced [128]. Along the same lines, GOOD [108] is an influential
graph-oriented object model, intended to be a theoretical basis for a
system in which manipulation as well as representation are transparently graph-
based. Among the subsequent developments based on GOOD are: GMOD [58]
that proposes a number of concepts for graph-oriented database user interfaces;
Gram [57] which is an explicit graph db-model for hypertext data; PaMaL [100]
which extends GOOD with explicit representation of tuples and sets; GOAL [113]
that introduces the notion of association nodes; G-Log [151] which proposed a
declarative query language for graphs; and GDM [112] that incorporates repre-
sentation of n-ary symmetric relationships.
There were proposals that used generalizations of graphs for data modeling
purposes [130]. The Hypermodel [128] (which we will develop in more detail)
was a model based on nested graphs on which subsequent work was developed
[153, 129]. The same idea was used for modeling multi-scaled networks [136] and
genome data [102].
Another generalization of graphs, hypergraphs, gave rise to another family of
models. GROOVY [131] is an object-oriented db-model based on hypergraphs.
This generalization was used in other contexts: query and visualization in the
Hy+ system [86]; modeling of data instances and access to them [175]; represen-
tation of user state and browsing [171];
There are several other proposals that deal with graph data models. Güting
proposed GraphDB [155] intended for modeling and querying graphs in object-oriented
databases and motivated by managing information in transport networks.
Database Graph Views [106] proposed an abstraction mechanism to define
and manipulate graphs stored in either relational, object-oriented or file systems.
The project GRAS [121] uses attributed graphs for modeling complex informa-
tion from software engineering projects. The well known OEM [149] model aims
at providing integrated access to heterogeneous information sources, focusing on
information exchange.
Another important line of development has to do with data representation
models and the World Wide Web. Among them are data exchange models like
XML [69], metadata representation models like RDF [123] and ontology repre-
sentation models like OWL [141].
The second wave. We are witnessing a second wave of development in graph
data management, focused, on the one hand, on practical systems and, on
the other, on theoretical analyses, particularly of graph query languages. We
will review the former in Section 4, concentrating on database systems, and will
leave the latter out of this lecture. The interested reader will find Barceló’s
tutorial [65] helpful on the subject.
and entities.
undefined concept to one defined by multiple complex relations. Note that hypergraphs
can be modeled by hypernodes by (i) encapsulating the contents of
each undirected hyperedge within a further hypernode and (ii) replacing each
directed hyperedge by two hypernodes related by a labeled edge. In contrast,
multi-level nesting provided by hypernodes cannot be easily captured by hyper-
graphs.
Next, we will present these data structures and show a paradigmatic example
of each. We will also present the most popular graph data structures today, RDF
for the Web, and the property graph model for graph databases.
The most basic data structure for graph database models is a directed graph
with nodes and edges labeled by some vocabulary. A good example is Gram
[57], a graph db-model motivated by hypertext querying.
A schema in Gram is a directed labeled multigraph, where each node is labeled
with a symbol called a type, which has an associated domain of values. In the
same way, each edge is assigned a label representing a relation between types
(see example in Figure 1). A feature of Gram is the use of regular expressions for
the explicit definition of paths, called walks. A walk is an alternating sequence
of nodes and edges; combined with other walks, it forms other special
objects called hyperwalks.
For querying the model (particularly path-like queries), an algebraic language
based on regular expressions is proposed. For this purpose a hyperwalk algebra
is defined, which presents unary operations (projection, selection, renaming) and
binary operations (join, concatenation, set operations), all closed under the set
of hyperwalks.
Fig. 2. GROOVY. At the schema level (left), we model an object PERSON as a
hypergraph that relates the attributes NAME, LASTNAME and PARENTS. Note
the value functional dependency (VFD) NAME, LASTNAME → PARENTS, logically
represented by the directed hyperedge ({NAME, LASTNAME}, {PARENTS}). This
VFD asserts that NAME and LASTNAME uniquely determine the set of PARENTS.
2.2 The Hypergraph Data Model
as mappings and records. A key feature is its inherent ability to encapsulate
information.
The hypernode model, which we will use as an example, was introduced in [130].
It defines the model and a declarative logic-based language structured as a sequence
of instructions (hypernode programs), used for querying and updating
hypernodes. A more elaborate version [153] includes the notion of schema and
type checking, introduced via the idea of types (primitive and complex), that are
also represented by nested graphs (see an example in Figure 3). It also includes
a rule-based query language called Hyperlog, which can support both querying
and browsing with derivations as well as database updates, and is intractable
in the general case. A third version of the model [129] discusses a set of con-
considered as a labeled graph, called an RDF Graph (see example in Figure 4),
although formally it is not a graph [111].
SPARQL [154] is the standard query language for RDF proposed by the
W3C. It is able to express complex graph patterns by means of a collection of
triple patterns whose solutions can be combined and restricted by using several
operators (i.e. AND, UNION, OPTIONAL, and FILTER). The latest version of
the language, SPARQL 1.1 [98], includes explicit operators to express negation
of graph patterns, arbitrary length path matching (i.e. reachability), aggregate
operators (e.g. COUNT), subqueries, and query federation.
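To make the notion of triple patterns concrete, the following is a minimal Python sketch of how a conjunction (AND) of triple patterns could be evaluated against a set of triples. It is only an illustration, not a description of how SPARQL engines work: the sample data, the function names match_triple and evaluate, and the convention that variables are strings starting with "?" are all assumptions introduced here.

```python
# Minimal sketch: evaluating a conjunction of triple patterns over a set of
# triples. Variables are strings starting with "?"; all names and data are
# hypothetical and chosen only for illustration.

def match_triple(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # a constant in the pattern does not match
    return result


def evaluate(patterns, triples):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return bindings


TRIPLES = {
    ("ana", "knows", "bob"),
    ("bob", "knows", "eva"),
    ("ana", "firstName", "Ana"),
    ("bob", "firstName", "Bob"),
    ("eva", "firstName", "Eva"),
}

# "First names of the people that ana knows."
QUERY = [("ana", "knows", "?y"), ("?y", "firstName", "?n")]
names = sorted(b["?n"] for b in evaluate(QUERY, TRIPLES))
```

Each pattern is joined with the bindings produced so far, which mirrors the way shared variables connect triple patterns into a graph pattern.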
to provide schema-based restrictions. Additionally, each (directed) edge has a
unique identifier and one or more labels.
Property graphs are used extensively in computing as they are more expres-
sive than the simplified mathematical objects studied in theory. However, note
that expressiveness is defined by ease of use, not by the limits of what can be
modeled [157]. In fact, the property graph model can express other types of
graph models by simply abandoning or adding particular bits and pieces [156].
There is no standard query language for property graphs, although some proposals
are available. Blueprints [11] was one of the first libraries created for
the property graph data model. Blueprints is analogous to JDBC, but for
graph databases. Gremlin [7] is a functional graph query language which allows
expressing complex graph traversals and mutation operations over property
graphs. Neo4j [30] provides Cypher [15], a declarative query language for property
graphs. The syntax of Cypher, which resembles SQL through its MATCH-WHERE-RETURN
expressions, makes it easy to express graph patterns and path queries. PQL [36]
is a path query language for property graphs which was derived from Lorel. PQL
is based on a mapping from path patterns to nested relational algebra extended
with transitive closure.
Graph database systems address two main kinds of query workloads: (i) low-latency
online graph query processing (e.g., social network transactions), and
(ii) high-throughput offline graph analytics (e.g., PageRank computation). The
former are the focus of graph databases whereas the latter are the speciality of
graph processing frameworks.
In this section we will present examples of both types of graph queries. Additionally,
we present examples of queries expressed in three different graph query
languages.
In this section we present a set of essential queries that are supported by different
graph query languages.
such that $a \xrightarrow{p} b$ is an edge in the data graph.
Pattern query languages combine basic patterns using algebraic structures
(union, difference, filtering, etc.), giving rise to complex and expressive expressions.
Pattern matching has attracted a great deal of attention in database theory
[182, 165, 64, 79, 96, 94], data mining [174, 180, 181], bioinformatics [173], the se-
mantic Web [76], social networks [61], and user-interfaces [58]. Today it is at the
core of most graph query languages.
Adjacency queries These queries are special cases of the most elementary
patterns. The primary notion in this type of query is node/edge adjacency.
Two nodes are adjacent (or neighbors) when there is an edge between them. Similarly,
two edges are adjacent when they share a common node. Think of these
queries as $a \xrightarrow{p} ?x$ or $a \xrightarrow{?x} b$, etc.
This class includes the following types of operations:
– basic node/edge adjacency [124]: tests whether two nodes (edges) are adja-
cent.
– k-neighborhood of a node [148]: for a node v, the k-neighborhood of v is
the set K of all nodes that are reachable from v via a path of at most k edges (or
“hops”). For instance, the k-neighborhood of v where k = 1 is the set of
nodes containing v and the immediate neighbors of v.
– k-hops [91]: returns all the nodes that are at a distance of k edges from the
root node. Note that a k-neighborhood query can be expressed as a composition
of k-hops queries (1-hops ∪ ··· ∪ k-hops), with duplicates removed.
Some application areas include spatial databases [160], molecular biology [162],
information retrieval (for web ranking using hubs and authorities) [77], semantic
Web [104], and recommendation systems (to obtain a particular user’s neighbor-
hood with similar interest) [91].
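The k-hops and k-neighborhood operations listed above can be sketched with a bounded breadth-first search. The Python code below is a minimal illustration under stated assumptions: the graph is a plain adjacency dictionary, "distance" is read as shortest distance in number of edges, and the function names and sample data are hypothetical.

```python
from collections import deque


def hop_distances(adj, source, k):
    """Shortest distances (in edges) from `source`, exploring at most k hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def k_hops(adj, source, k):
    """Nodes whose shortest distance from `source` is exactly k edges."""
    return {n for n, d in hop_distances(adj, source, k).items() if d == k}


def k_neighborhood(adj, source, k):
    """`source` together with every node reachable within at most k edges."""
    return set(hop_distances(adj, source, k))


# Hypothetical sample graph.
ADJ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
two_hops = k_hops(ADJ, "a", 2)
two_neighborhood = k_neighborhood(ADJ, "a", 2)
```

With this reading, the 2-neighborhood of "a" equals the union of its 1-hops and 2-hops results plus "a" itself, which matches the composition noted in the list above.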
Reachability queries (connectivity) These types of queries are characterized
by path or traversal problems. The problem of reachability tests whether two
given nodes are connected by a path. A reachability query may involve the
generation of a boolean result, a set of nodes, a single possible solution path, or
a set of possible paths. From the literature we identify the following reachability
queries:
– Dual single-sink problem [182]: to find all the nodes that can reach a given
sink node v.
– Fixed-length paths: find paths that contain a fixed number of nodes and
edges.
– Simple paths [142]: find paths that do not use any node or edge
more than once.
– Regular path queries [142, 55, 133, 90, 70, 169, 95, 132]: find paths whose
nodes and edges satisfy given restrictions (expressed as regular expressions).
– Conjunctive regular path queries [177, 74, 63]: a query that combines con-
junctive queries and regular path queries.
– Shortest path [62]: compute the quickest/shortest route between two nodes.
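As an illustration of regular path queries, the following Python sketch evaluates a path expression by searching pairs (graph node, automaton state), a common textbook technique often described as a product construction. It is not taken from any of the cited works. The automaton is written by hand rather than compiled from a regular expression, and the graph encoding, the function name and the sample data are assumptions.

```python
from collections import deque


def regular_path_query(edges, source, start, accepting, delta):
    """Nodes reachable from `source` along a path whose edge labels are
    accepted by the automaton (start state, accepting states, transitions)."""
    first = (source, start)
    seen = {first}
    queue = deque([first])
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, target in edges.get(node, ()):
            for next_state in delta.get((state, label), ()):
                pair = (target, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Hypothetical labeled graph: node -> list of (edge label, target node).
EDGES = {
    "james": [("knows", "ana")],
    "ana": [("knows", "bob"), ("likes", "post1")],
    "bob": [("likes", "post2")],
}

# Hand-built automaton for the expression knows+ likes.
KNOWS_PLUS_LIKES = {(0, "knows"): {1}, (1, "knows"): {1}, (1, "likes"): {2}}
posts = regular_path_query(EDGES, "james", 0, {2}, KNOWS_PLUS_LIKES)

# Hand-built automaton for knows* (plain reachability along "knows").
KNOWS_STAR = {(0, "knows"): {0}}
reachable = regular_path_query(EDGES, "james", 0, {0}, KNOWS_STAR)
```

Because each (node, state) pair is visited at most once, the search terminates even on cyclic graphs.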
Summarization queries These types of queries do not consult the graph structure;
instead they are based on special operators that summarize the
query results, normally returning a single value. Aggregate functions (e.g., average,
count, maximum, etc.) are included in this group. Summarization queries
can be used to answer the following queries:
There are several important algorithms for graph analysis and mining (see [56]
for an extensive review). However, there exists no standard categorization of such
algorithms. Next, we briefly review some categories and graph analytical algo-
rithms described in articles that compare graph processing frameworks.
In 2010, the authors of [91] presented a very complete categorization by divid-
ing graph operations into: traversals (shortest path and k-hops), graph analysis
(hop-plot, diameter, eccentricity, density and clustering coefficient), components
(connected components, bridges and cohesion), communities (dendrogram, max-
flow min-cut and clustering), centrality measures (degree centrality, closeness
centrality and betweenness centrality), pattern matching (graph and subgraph
matching), graph anonymization (k-degree and k-neighborhood anonymization)
and other operations (PageRank and structural equivalence).
In an empirical evaluation of graph-processing algorithms [105], the graph
processing algorithms were categorized into five groups: general statistics, graph
traversal, connected components, community detection, and graph evolution.
The general statistics (STATS) [105] algorithm computes the number of vertices
and edges, and the average of the local clustering coefficient of all vertices. Graph
traversals are characterized by their random memory access pattern and data-driven
computation [81]. Breadth-first search (BFS) is a widely used algorithm
in graph traversals, which is often a building block for more complex algorithms,
such as item search, distance calculation, diameter calculation, shortest path
and longest path. Connected component (CONN) is an algorithm for extracting
groups of vertices that can reach each other via graph edges. Community
detection (CD) is important for social network applications, as users of these
networks tend to form communities, that is, groups whose constituent nodes
form more relationships within the group than with nodes outside the group.
For graph evolution (EVO) [105], an accurate algorithm can not only predict
how a graph structure will evolve over time, but can also help to prepare for
these changes (for example data size increase).
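As a small illustration of how BFS serves as a building block, the sketch below uses it to extract connected components of an undirected graph. It is a single-machine Python sketch for exposition only, not the implementation evaluated in [105]; the function name and the sample graph are assumptions.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group the nodes of an undirected graph into connected components."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical sample graph with an isolated vertex (6).
components = connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)])
```

Graph processing frameworks typically compute the same result in a distributed fashion, which this sketch does not attempt to model.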
In a comparison study [184] of parallel processing systems, the authors consid-
ered three graph algorithms commonly used in network analysis studies, namely
PageRank, Shortest Path and Triangle Counting. PageRank and its variants
such as Personalized PageRank are effective methods for link prediction based
on finding structural similarities between nodes in a network. The clustering
coefficient of a vertex expresses how likely its neighbors are to be
connected to one another. The (global) clustering coefficient is based on triplets
of nodes, where a triplet consists of three nodes that are connected by either
two (open) or three (closed) undirected ties. The global clustering coefficient
is therefore defined as the number of closed triplets divided by the total
number of connected triplets of vertices. For this reason, the method of calculating
the global clustering coefficient is also often called triangle counting [176]. The
characteristic path length is the average shortest path length in a network [176],
which measures the average degree of separation between nodes in a network
(or network component). The clustering coefficient and the characteristic path
length are important network measures that are often used to determine the
type of a network (e.g., random, small-world and scale-free).
Fig. 6. A sample property graph database which contains social network data.
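The definition of the global clustering coefficient as closed triplets over all connected triplets can be written down directly. The following Python sketch does so for a small undirected graph given as an edge list; the function names and the sample graph are assumptions, and the quadratic enumeration of neighbor pairs is chosen for clarity rather than efficiency.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def global_clustering_coefficient(edges):
    """Number of closed triplets divided by the number of connected triplets."""
    adj = build_adjacency(edges)
    closed = total = 0
    for center, neighbors in adj.items():
        for a, b in combinations(neighbors, 2):
            total += 1  # connected triplet a - center - b
            if b in adj[a]:
                closed += 1  # the triplet is closed by the edge a - b
    return closed / total if total else 0.0


# Hypothetical sample: a triangle (1, 2, 3) with a pendant vertex 4.
coefficient = global_clustering_coefficient([(1, 2), (2, 3), (1, 3), (3, 4)])
```

In the sample graph there are five connected triplets, three of which are closed (one per triangle corner), so the coefficient is 3/5.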
In a recent experimental comparison of Pregel-like graph processing systems,
the authors [109] considered four categories of graph algorithms: random walk,
sequential traversal, parallel traversal, and graph mutation. Random walk algo-
rithms perform computations on all vertices based on the random walk model.
Examples of algorithms in this category are PageRank and HITS. The cate-
gory of sequential traversal includes algorithms like single-source shortest path,
breadth-first search, depth-first search and reachability. Algorithms like weakly
connected components, label propagation and graph clustering are in the group
of parallel traversal. Finally, the category of graph mutation contains algorithms
for computing the minimum spanning tree, graph coarsening and graph aggre-
gation.
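As an example of the random walk category, PageRank can be sketched as a simple power iteration. The Python code below is a single-machine illustration, not the vertex-centric formulation used by Pregel-like systems; the damping factor 0.85, the fixed iteration count, the uniform treatment of dangling nodes and the sample graph are all assumptions made for the sketch.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling nodes spread their rank uniformly."""
    nodes = set(out_links) | {t for ts in out_links.values() for t in ts}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared by all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in out_links.items():
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical sample: a cycle a -> b -> c -> a, plus d pointing to a.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]})
```

In a vertex-centric framework, the inner loop corresponds to each vertex sending its rank share along its outgoing edges in every superstep.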
In this section, we will present a brief description of three graph query languages
(SPARQL, G-SPARQL and Cypher) by presenting their syntax and main fea-
tures. Additionally, we compare the expressivity of the three languages by pre-
senting several types of queries in natural language, and showing how they are
expressed (if possible) in each query language.
The queries discussed in this section will be based on a graph dataset similar
to the property graph shown in Figure 6. The sample property graph uses
nodes to represent Persons, Tags and Posts, which are common entities in the
social network use-case. Two persons can be related by using the relationship
(edge) “knows”. Persons can be related to Tags and Posts via the relationships
“hasInterest” and “likes” respectively. A Post can be associated with a Tag via
the “hasTag” relation. Persons, Tags and Posts can contain different properties.
The edges of type “knows” include the property “year” to register the year when
the relationship was created.
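To fix ideas, a property graph of this kind can be sketched in a few lines of Python. The classes, method names, identifiers and property values below are hypothetical and only loosely follow the description above; they do not correspond to the API of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: str
    label: str
    source: str
    target: str
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, edge_id, label, source, target, **properties):
        self.edges[edge_id] = Edge(edge_id, label, source, target, properties)

    def match_edges(self, label, **conditions):
        """Edges with the given label whose properties equal `conditions`."""
        return [e for e in self.edges.values()
                if e.label == label
                and all(e.properties.get(k) == v for k, v in conditions.items())]


# Hypothetical social network fragment.
graph = PropertyGraph()
graph.add_node("p1", "Person", firstName="James")
graph.add_node("p2", "Person", firstName="Thomas")
graph.add_node("t1", "Tag", name="Queen")
graph.add_edge("e1", "knows", "p1", "p2", year=2012)
graph.add_edge("e2", "hasInterest", "p1", "t1")
known_in_2012 = [e.edge_id for e in graph.match_edges("knows", year=2012)]
```

Both nodes and edges carry a label and a set of key-value properties, which is the feature that distinguishes property graphs from plain labeled graphs.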
Pattern Matching Queries A pattern matching query is based on the defini-
tion of a graph pattern and the objective is to find subgraphs (in the database
graph) satisfying the graph pattern. We consider several types of pattern match-
ing queries depending on the complexity of the graph pattern.
– Single node graph patterns This type of query looks for nodes having a given
attribute or a condition over an attribute.
Example: return the persons whose attribute first name is “James”.
## SPARQL 1.0 and SPARQL 1.1
PREFIX sn: <http://www.socialnetwork.org/>
SELECT ?X
FROM <http://www.socialnetwork.org>
WHERE { ?X sn:firstName "James" }
## G-SPARQL
SELECT ?X
WHERE { ?X @firstName "James" }
## CYPHER
MATCH (person:Person)
WHERE person.firstName="James"
RETURN person
– Single graph patterns A single graph pattern consists of a single structure
node-edge-node where variables are allowed in any part of the structure. A single
graph pattern is oriented to evaluate adjacency between nodes.
Example: return the pairs of persons related by the “knows” relationship.
## SPARQL 1.0 and SPARQL 1.1
PREFIX sn: <http://www.socialnetwork.org/>
SELECT ?X ?Y
WHERE { ?X sn:knows ?Y }
## G-SPARQL
SELECT ?X ?Y
WHERE { ?X knows ?Y }
## CYPHER
MATCH (person1:Person)-[:knows]->(person2:Person)
RETURN person1, person2
## G-SPARQL
SELECT ?N
WHERE { ?X @type "Person" . ?X @firstName ?N .
?X knows ?Y . ?Y @firstName "Thomas" }
## CYPHER
MATCH (person:Person)-[:knows]-(thomas:Person)
WHERE thomas.firstName="Thomas"
RETURN person.firstName
“Queen” or “U2”.
SELECT ?X
WHERE { ?X sn:type sn:Person . ?X sn:hasInterest ?T . ?T sn:type sn:Tag .
?T sn:name ?N . FILTER ( ?N = "Queen" || ?N = "U2" ) }
## G-SPARQL
SELECT ?X
WHERE { ?X @type "Person" . ?X hasInterest ?T . ?T @type "Tag" .
?T @name ?N . FILTER ( ?N = "Queen" || ?N = "U2" ) }
## CYPHER
MATCH (person:Person)-[:hasInterest]->(tag:Tag)
WHERE tag.name="Queen" OR tag.name="U2"
RETURN DISTINCT person
## SPARQL 1.0
PREFIX sn: <http://www.socialnetwork.org/>
SELECT ?N
WHERE {
?X sn:type sn:Person .
{ ?X sn:firstName ?N . OPTIONAL { ?X sn:likes ?P } }
FILTER (!bound(?P))
}
## SPARQL 1.1
PREFIX sn: <http://www.socialnetwork.org/>
SELECT ?N
WHERE {
{ ?X sn:type sn:Person . ?X sn:firstName ?N }
MINUS
{ ?X sn:likes ?P }
}
## G-SPARQL
SELECT ?N
WHERE {
?X @type "Person" .
{ ?X @firstName ?N . OPTIONAL { ?X likes ?P } }
FILTER (!bound(?P))
}
## CYPHER
MATCH (person:Person)
WHERE NOT((person)-[:likes]->(:Post))
RETURN person.firstName
## G-SPARQL
SELECT ?X
WHERE {
?X @type "Person" . ?X @age ?A . FILTER (?A > 18 && ?A < 30)
}
## CYPHER
MATCH (person:Person)
WHERE person.age>18 and person.age<30
RETURN person
Fixed-length path queries A fixed-length path query is a special type of
graph pattern which represents a traversal from a source node to a target node,
by including a fixed number of nodes and edges.
Example: Find the names of people at distance 2 from “James” by following
“knows” links.
## G-SPARQL
SELECT ?N
WHERE { ?X @type "Person" . ?X @firstName "James" .
?X knows ?Z . ?Z knows ?Y . ?Y @firstName ?N .
FILTER (!(?Y = ?X || ?Y = ?Z)) }
## CYPHER
MATCH
(james:Person{firstName:"James"})-[:knows*2..2]-(person:Person)
WHERE
NOT(person=james) AND NOT((person)-[:knows]-(james))
RETURN
DISTINCT person.firstName
## SPARQL 1.1
PREFIX sn: <http://www.socialnetwork.org/>
SELECT ?N
WHERE { ?X sn:type sn:Person . ?X sn:firstName "James" .
?X sn:knows* ?Y .
?Y sn:firstName ?N }
## G-SPARQL
SELECT ?N
WHERE { ?X @type "Person" . ?X @firstName "James" .
?X knows* ?Y .
?Y @firstName ?N }
## CYPHER
MATCH (james:Person)-[:knows*]-(reachablePerson:Person)
WHERE james.firstName="James"
RETURN DISTINCT reachablePerson
## G-SPARQL
SELECT ?N ??P
WHERE { ?X @type "Person" . ?X @firstName "James" .
?X ??P(knows*) ?Y .
?Y @firstName ?N }
## CYPHER
MATCH path = (james:Person)-[:knows*]-(reachablePerson:Person)
WHERE james.firstName="James"
RETURN DISTINCT reachablePerson, path
Note that these queries will return one path per “reachable” person but not
Regular path queries with path-length restrictions Some languages allow restricting
the length of the paths returned by a regular path query.
G-SPARQL allows expressions of the form Length(??P, <3), which
filters the path variable ??P with the length condition <3. In Cypher, path-length
restrictions can be defined in the recursive relation, for example [:knows*1..3].
SPARQL 1.1 is not able to express path-length restrictions.
Example: return the paths of length 5-10, along the “knows” relation, that
connect ’James’ and ’Axel’.
## G-SPARQL
SELECT ??P
WHERE { ?X @type "Person" . ?X @firstName "James" .
?X ??P(knows*) ?Y . ?Y @firstName "Axel" .
FILTERPATH(Length(??P, >=5)) . FILTERPATH(Length(??P, <=10)) }
## CYPHER
MATCH path = (james:Person)-[:knows*5..10]-(axel:Person)
WHERE james.firstName="James" AND axel.firstName="Axel"
RETURN path
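The effect of a path-length restriction can be illustrated outside any query language. The Python sketch below enumerates paths between two nodes whose number of edges lies within given bounds. It is only an approximation of what the queries above compute: it assumes simple paths (no repeated nodes), whereas each language defines its own path semantics, and the function name and the sample data are hypothetical.

```python
def paths_with_length(adj, source, target, min_len, max_len):
    """Simple paths from `source` to `target` with min_len..max_len edges."""
    results = []

    def extend(path):
        length = len(path) - 1  # number of edges in the current path
        if path[-1] == target and length >= min_len:
            results.append(list(path))
        if length == max_len:
            return  # the upper bound prunes the search
        for nxt in adj.get(path[-1], ()):
            if nxt not in path:  # keep the path simple
                path.append(nxt)
                extend(path)
                path.pop()

    extend([source])
    return results


# Hypothetical "knows" relation, stored as an adjacency dictionary.
KNOWS = {"james": ["a", "b"], "a": ["axel"], "b": ["c"], "c": ["axel"]}
short_paths = paths_with_length(KNOWS, "james", "axel", 1, 2)
all_paths = paths_with_length(KNOWS, "james", "axel", 1, 3)
```

The upper bound keeps the enumeration finite and small, which is one practical reason why languages expose such restrictions.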
Regular path queries with value-based restrictions These queries introduce
value-based restrictions over the paths returned by a regular
path query, i.e. value conditions over the attributes of nodes and edges belonging
to the resulting paths.
SPARQL 1.1 does not support this kind of restriction. G-SPARQL
allows value-based restrictions over specific nodes and edges (i.e.
AtLeastNode, AtMostNode, AllNodes, AtLeastEdge, AtMostEdge, AllEdges). In
the case of Cypher, this type of restriction can be defined in the WHERE clause.
Example: find the first name of people that can be reached from “James”
by following relations “knows” created during the year 2012 (assume that each
relation “knows” contains an attribute “year” of creation).
## G-SPARQL
SELECT ?N
WHERE { ?X sn:firstName "James" .
?X ??P(sn:knows+) ?Y .
?Y sn:firstName ?N .
FILTERPATH(AllEdges(??P, @year "2012")) }
## CYPHER
MATCH path = (james:Person)-[:KNOWS*]-(other:Person)
WHERE james.firstName="James" AND all(r IN relationships(path) WHERE r.year=2012)
RETURN DISTINCT other.firstName
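A value-based restriction that must hold on every edge of a path amounts to searching the subgraph that contains only the qualifying edges. The Python sketch below illustrates this for the "year" attribute of the example; the data and the name reachable_with_edge_filter are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical toy data: (person, person, year in which the "knows" edge was created).
KNOWS = [
    ("James", "Ana", 2012),
    ("Ana", "Axel", 2012),
    ("Axel", "Bea", 2013),
    ("James", "Carl", 2011),
]


def reachable_with_edge_filter(edges, start, edge_ok):
    """Nodes reachable from start using only edges whose year satisfies edge_ok."""
    adj = {}
    for a, b, year in edges:
        if edge_ok(year):  # keep only edges that pass the value condition
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(sorted(reachable_with_edge_filter(KNOWS, "James", lambda year: year == 2012)))
# ['Ana', 'Axel']
```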
Regular path queries with structural restrictions. This type of query adds
structural restrictions to the paths returned by a regular path query, i.e.
conditions on the edges incident to the nodes of the resulting paths.
Of the languages considered here, only Cypher supports this kind of restriction.
It can be defined in the WHERE clause by using the function nodes(path), which
returns the nodes occurring in the given path.
Example: find the first name of people that can be reached from “James” by
following relations “knows”, such that each person in the sequence also
knows “James”.
## CYPHER
MATCH path = (james:Person)-[:KNOWS*]-(other:Person)
WHERE james.firstName="James"
AND all(node IN nodes(path) WHERE node = james OR (node)-[:KNOWS]-(james))
RETURN DISTINCT other.firstName
Shortest path queries. A shortest path query computes the shortest
route between two nodes in the graph. Most languages provide ad-hoc functions
to compute shortest paths. In some cases the shortest path can be
computed by combining a reachability query with aggregate operators (e.g. COUNT
+ MIN), although this requires that the reachability query returns a set of paths.
SPARQL 1.1 cannot express shortest path queries. G-SPARQL and
Cypher provide special predicates to return the shortest path.
Example: Return the shortest path between “James” and “Axel” by following
the relation “knows”.
## G-SPARQL
SELECT ?*P
WHERE { ?X @firstName "James" .
?X ?*P(knows+) ?Y .
?Y @firstName "Axel"}
## CYPHER
MATCH (james:Person{firstName:"James"}),(axel:Person{firstName:"Axel"})
MATCH path=shortestPath((james)-[:knows*]-(axel))
RETURN path
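For an unweighted graph, a shortest path can be found with a breadth-first search that records, for each visited node, the node it was reached from. The Python sketch below shows the idea on a made-up graph. It does not describe how G-SPARQL or Cypher evaluate their shortest-path predicates, and the name shortest_path is an assumption introduced here.

```python
from collections import deque

# Hypothetical toy "knows" graph as an undirected adjacency map.
ADJ = {
    "James": {"Ana", "Bea"},
    "Ana": {"James", "Axel"},
    "Bea": {"James", "Carl"},
    "Carl": {"Bea", "Axel"},
    "Axel": {"Ana", "Carl"},
    "Dora": set(),
}


def shortest_path(adj, source, target):
    """Return one shortest path (list of nodes) or None if target is unreachable."""
    parent = {source: None}  # maps each visited node to its BFS predecessor
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk the predecessors back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


path = shortest_path(ADJ, "James", "Axel")
print(path, len(path) - 1)  # ['James', 'Ana', 'Axel'] 2
```

The length of the path, len(path) - 1, corresponds to what length(path) reports in the Cypher examples.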
Aggregate queries and grouping. Aggregate queries are based on special
operators, not tied to the data model, that summarize or operate on
the query results. Common aggregate operators include COUNT, SUM, AVG,
MIN, and MAX. Aggregate operators are very useful for computing
information about nodes and edges in the graph, for example the degree of a node
(i.e. by counting the neighbors of the node) or the length of the shortest path
between two nodes.
SPARQL 1.0 and G-SPARQL do not support aggregate operators. SPARQL
1.1 and Cypher define all the common aggregate operators. Additionally, Cypher
includes special operators for paths; for example, length(path) returns
the length of the given path.
Example: Return the number of friends of “James”
## SPARQL 1.1
PREFIX sn: <http://www.socialnetwork.org/>
SELECT (COUNT(?Y) AS ?Friends)
WHERE { ?X sn:type sn:Person . ?X sn:firstName "James" . ?X sn:knows ?Y }
## CYPHER
MATCH (james:Person)-[:FRIENDS]->(friend:Person)
WHERE james.firstName="James"
RETURN count(friend)
Example: Return the length of the shortest path between “James” and “Axel”.
## CYPHER
MATCH (james:Person{firstName:"James"}),(axel:Person{firstName:"Axel"})
MATCH path=shortestPath((james)-[:knows*]-(axel))
RETURN length(path)
## CYPHER
MATCH (person:Person)
RETURN person, length((person)-[:knows]-())
Group conditions. Some query languages allow filtering the groups according
to a given condition. This is the function of the HAVING operator in SQL.
Example: for each person having 100 friends or more, return their first name
and number of friends.
## SPARQL 1.1
PREFIX sn: <http://www.socialnetwork.org/>
SELECT ?N (COUNT(?F) AS ?FriendsNumber)
WHERE { ?X sn:type sn:Person . ?X sn:firstName ?N . ?X sn:knows ?F }
GROUP BY ?X ?N
HAVING (COUNT(?F) >= 100)
## CYPHER
MATCH (person:Person)
WITH person, length((person)-[:knows]-()) AS friendsNumber
WHERE friendsNumber>=100
RETURN person.firstName, friendsNumber
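Grouping followed by a group condition can be pictured as two steps: count per group, then keep the groups that pass the threshold. The Python sketch below shows these two steps on a made-up edge list; the names friend_counts and having_at_least are assumptions, and a small threshold is used instead of 100 so that the toy data produces a result.

```python
from collections import Counter

# Hypothetical toy "knows" edges, treated as undirected friendships.
KNOWS = [("James", "Ana"), ("James", "Bea"), ("James", "Carl"), ("Ana", "Bea")]


def friend_counts(edges):
    """GROUP BY person with COUNT: number of friends (degree) of each person."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def having_at_least(counts, threshold):
    """HAVING-style filter: keep only groups whose count reaches the threshold."""
    return {person: n for person, n in counts.items() if n >= threshold}


print(having_at_least(friend_counts(KNOWS), 2))
# {'James': 3, 'Ana': 2, 'Bea': 2}
```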
Most of the above operators are supported by SPARQL 1.1 and Cypher.
Example: Return the five youngest distinct friends of “James”.
## SPARQL 1.1
PREFIX sn: <http://www.socialnetwork.org/>
SELECT DISTINCT ?F ?N
WHERE { ?X sn:type sn:Person . ?X sn:firstName "James" .
?X sn:knows ?F . ?F sn:firstName ?N . ?F sn:birthday ?B }
ORDER BY DESC(?B)
LIMIT 5
## CYPHER
MATCH (james:Person)-[:knows]-(friend:Person)
WHERE james.firstName="James"
RETURN DISTINCT friend
ORDER BY friend.birthday DESC
LIMIT 5
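Sorting and limiting can be sketched in Python as a sort on the birthday followed by a slice. The youngest people are those with the most recent birthdays, so the order is descending. The data and the function name youngest are assumptions made for illustration.

```python
from datetime import date

# Hypothetical friends of "James" with their birthdays.
FRIENDS = {
    "Ana": date(1990, 5, 1),
    "Bea": date(1985, 1, 20),
    "Carl": date(1999, 12, 31),
    "Dora": date(1978, 7, 4),
}


def youngest(friends, k):
    """Return the names of the k youngest friends (most recent birthday first)."""
    ranked = sorted(friends.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


print(youngest(FRIENDS, 2))  # ['Carl', 'Ana']
```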
The systems for graph data management can be classified into two main categories:
graph databases and graph processing frameworks. Although both groups address
similar problems, they take different approaches to storing and querying graph
data, each with its own advantages and disadvantages.
Graph databases aim at persistent management of graph data, supporting
transactional storage of and access to graph data on a persistent medium. In this
sense, they provide efficient single-node solutions with limited scalability.
Graph processing frameworks, on the other hand, aim to provide batch processing
and analysis of large graphs, often in a distributed environment with multiple
machines. These solutions usually process the graph in memory, with different
parts of the graph managed by distinct, distributed nodes.
Closely related to graph databases are the systems for managing RDF data.
These systems, called RDF Triple Stores or RDF databases, are specifically
designed to store collections of RDF triples, to support the standard SPARQL
query language, and possibly to allow some kind of inference via semantic rules.
Although Triple Stores are based on the RDF graph data model, they are
specialized databases with their own characteristics. Therefore, we will study them
separately.
and graph data [33].
There are several papers comparing the features [92, 59, 71, 170, 140] and
performance [172, 81, 66, 117] of graph databases. In Table 1, we present a general
view of the main features of current graph databases, including data model,
storage model and query facilities. Next, we briefly describe the systems we consider
most relevant.
AllegroGraph [3] is one of the precursors in the current generation of graph
databases. Although it was born as a graph database, its current development
is oriented to meet the Semantic Web standards (i.e., RDF/S, SPARQL and
OWL). Additionally, AllegroGraph provides special features for GeoTemporal
Reasoning and Social Network Analysis.
[Table 1. Main features of current graph databases. Columns: simple graph, hypergraph, nested graph, property graph; native, non-native; single-node, distributed; API, query language, graph algorithms. Rows: AllegroGraph, ArangoDB, Bitsy, Cayley, FlockDB, GraphBase, Graphd, Horton, HyperGraphDB, IBM System G, imGraph, InfiniteGraph, InfoGrid, Neo4j, OrientDB, Sparksee/DEX, Titan, Trinity, TurboGraph.]
a native disk-based storage manager for graphs, and a framework for graph
traversals.
Trinity [164, 44] implements a general purpose graph engine over a distributed
memory cloud. Trinity implements a globally addressable distributed
memory storage, and provides a random access abstraction for large graph
computation. Hence, it supports both online graph query processing and offline graph
analytics. Its query language, called TSL, allows users to declare data schemas
and communication protocols.
Additionally, we can find multiple tools related to graph databases, including
small systems (G-Store [19], redis graph [37], vertexdb [47]), in-development
databases (CloudGraph [14], Orly [35], StigDB [41], JCoreDB Graph [29], Weaver
[49]), graph libraries (Filament [17]), academic developments (SGDB [82],
SylvaDB [?]), in-memory graph analysis tools (e.g., Cytoscape) and graph
visualization tools (e.g., JUNG, IGraph, GraphViz, Gephi and NodeXL).
To support the development of graph applications, TinkerPop [7]
provides several tools for graph databases: Blueprints provides a common API
for the property graph data model; Pipes provides a data flow framework to
join Blueprints with specific graph databases; Frames exposes the elements of
Blueprints graphs as Java objects (i.e., implements an object-graph mapping);
Gremlin is a domain-specific language designed for traversing graphs; and Rexster
exposes any Blueprints graph through a RESTful interface.
An RDF database (also called Triple Store) is a specialized graph database for
managing RDF data. RDF defines a data model based on expressions of the form
subject-predicate-object (SPO) called RDF triples. Therefore, an RDF dataset
is composed of a large collection of RDF triples which implicitly form a graph.
SPARQL is the standard query language for RDF databases. It is a declarative
language which allows expressing several types of graph patterns. Its most
recent version (SPARQL 1.1) supports advanced features like property paths,
aggregate functions and subqueries.
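To make the triple model concrete, the Python sketch below stores a few subject-predicate-object triples in a list and answers a basic graph pattern by chaining triple-pattern lookups, with None playing the role of a variable. It is a toy illustration only: the data, the prefixed names and the functions match and first_names_known_by are assumptions, and real triple stores rely on indexes and query optimization rather than list scans.

```python
# Hypothetical RDF dataset as a list of (subject, predicate, object) triples.
TRIPLES = [
    ("sn:james", "sn:firstName", "James"),
    ("sn:axel", "sn:firstName", "Axel"),
    ("sn:bea", "sn:firstName", "Bea"),
    ("sn:james", "sn:knows", "sn:axel"),
    ("sn:axel", "sn:knows", "sn:bea"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a triple pattern; None acts as a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def first_names_known_by(triples, first_name):
    """Evaluate { ?X firstName <name> . ?X knows ?Y . ?Y firstName ?N } for ?N."""
    names = []
    for x, _, _ in match(triples, p="sn:firstName", o=first_name):
        for _, _, y in match(triples, s=x, p="sn:knows"):
            for _, _, n in match(triples, s=y, p="sn:firstName"):
                names.append(n)
    return names


print(first_names_known_by(TRIPLES, "James"))  # ['Axel']
```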
There are several works comparing RDF databases [144, 167, 97, 89]. Similar
to graph databases, RDF databases can also be classified into native and
non-native RDF databases. Examples of native RDF databases are Jena [85],
RDF-3X [147], 4store [2] and TripleBit [183]. Among the non-native RDF databases
we can mention OpenLink Virtuoso [48], Sesame [38] and DB2RDF [68], which
are implemented on top of relational database systems.
Being more specific about native storage approaches, the RDF databases can
be classified into four categories [183]: triples table, property table, column store
[6], Stratosphere [42] and Pegasus [118] have been adapted for graph processing
due to their facilities for batch data processing. Most of these systems are
based on the MapReduce programming model and implemented on top of the
Hadoop platform, the open source version of MapReduce. By exploiting
data-parallelism, these systems are highly scalable and support a range of
fault-tolerance strategies. Though these systems improve the performance of iterative
queries, users still need to “think” of their analytical graph queries as MapReduce
jobs. In fact, naively expressing graph computation and graph algorithms in these
data-parallel abstractions can be challenging [179]. Additionally, these systems
cannot take advantage of the characteristics of graph-structured data and often
result in complex job chains and excessive data movement when implementing
(Gather, Apply, Scatter), which is similar to, but fundamentally different from,
the BSP model. In the GAS model, a vertex accumulates information about its
neighbourhood in the Gather phase, applies the accumulated value in the Apply
phase, and updates its adjacent vertices and edges and activates its neighbouring
vertices in the Scatter phase. Another key difference is that GraphLab partitions
graphs using vertex cuts rather than edge cuts. Consequently, each edge is
assigned to a unique machine, while vertices are replicated in the caches of remote
machines. Besides graph processing, it also supports various machine learning
algorithms.
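The three GAS phases can be illustrated with a single-machine Python sketch that labels every vertex with the smallest vertex identifier in its connected component. This is a simulation under stated assumptions and does not use the GraphLab API: the function name gas_connected_components, the adjacency-map input and the sequential scheduling of active vertices are all choices made here for illustration.

```python
def gas_connected_components(adj):
    """Label each vertex with the smallest vertex id of its connected component.

    adj maps every vertex to the set of its neighbours (undirected graph).
    """
    value = {v: v for v in adj}  # initial label: the vertex's own id
    active = set(adj)            # every vertex starts active
    while active:
        next_active = set()
        for v in sorted(active):
            # Gather: accumulate information from the neighbourhood.
            gathered = min((value[u] for u in adj[v]), default=value[v])
            # Apply: combine the accumulated value with the vertex's own value.
            new_value = min(value[v], gathered)
            if new_value != value[v]:
                value[v] = new_value
                # Scatter: activate the neighbours so they see the new value.
                next_active.update(adj[v])
        active = next_active
    return value


# Hypothetical graph with two components, {1, 2, 3} and {4, 5}.
GRAPH = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
print(gas_connected_components(GRAPH))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Only vertices whose neighbourhood changed are re-activated, which is the main practical difference from recomputing every vertex in every round.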
There are several works comparing graph processing frameworks. For in-
stance, the first evaluation study of modern big data frameworks, including
References
1. 3store. http://sourceforge.net/projects/threestore/
2. 4store. http://www.4store.org
3. AllegroGraph. http://www.franz.com/agraph/allegrograph/
4. Apache Giraph. http://giraph.apache.org
5. Apache Hadoop. https://hadoop.apache.org
6. Apache Hadoop NextGen MapReduce (YARN).
http://hadoop.apache.org/docs/current/hadoop-yarn/
7. Apache TinkerPop - An open source graph computing framework.
http://tinkerpop.incubator.apache.org
8. ArangoDB. http://www.arangodb.org
9. Bitsy. https://bitbucket.org/lambdazen/bitsy/wiki/Home
10. Blazegraph. http://www.blazegraph.com/bigdata
11. Blueprints. https://github.com/tinkerpop/blueprints/wiki
12. BrightstarDB - A native RDF database for the .NET platform.
http://brightstardb.com
13. Cayley graph database. https://github.com/google/cayley
14. Cloudgraph. http://www.cloudgraph.com/
15. Cypher - Graph Query Language. http://neo4j.com/developer/cypher-query-language/
16. DEX. http://www.sparsity-technologies.com/dex
17. Filament - Graph Management Toolkits. http://filament.sourceforge.net/
18. FlockDB. https://github.com/twitter/flockdb/
19. G-Store. http://g-store.sourceforge.net/
20. Giraph. https://github.com/aching/Giraph
21. GPS: A Graph Processing System. http://infolab.stanford.edu/gps/
22. Graphbase. http://graphbase.net/
23. Graphd. http://wiki.freebase.com/wiki/Graphd
24. GraphX - Apache API for graphs and graph-parallel computation.
https://spark.apache.org/graphx/
25. HyperGraphDB - A Graph Database. http://www.hypergraphdb.org/
26. IBM System G. http://systemg.research.ibm.com
27. InfiniteGraph. http://infinitegraph.com/
28. InfoGrid - The Internet Graph Database. http://infogrid.org/
29. jCoreDB Graph. https://sites.google.com/site/jcoredb/
30. Neo4j. http://neo4j.org/
31. Ontotext GraphDB. http://www.ontotext.com/products/ontotext-graphdb/
32. OQGraph. https://mariadb.com/kb/en/mariadb/oqgraph-storage-engine/
33. Oracle spatial and graph. http://www.oracle.com/technetwork/database/options/spatialandgraph/
34. OrientDB - Multi-Model NoSQL Database. http://orientdb.com
European Conference on Hypertext Technology (ECHT). pp. 201–211. ACM (Nov
- Dec 1992)
58. Andries, M., Gemis, M., Paredaens, J., Thyssens, I., den Bussche, J.V.: Concepts
for Graph-Oriented Object Manipulation. In: Proceedings of the 3rd International
Conference on Extending Database Technology (EDBT). LNCS, vol. 580, pp. 21–
38. Springer (March 1992)
59. Angles, R.: A Comparison of Current Graph Database Models. In: Proceedings of
the 2012 IEEE 28th International Conference on Data Engineering Workshops.
pp. 171–177. IEEE Computer Society (2012)
60. Angles, R., Gutierrez, C.: Survey of graph database models. ACM Computing
Surveys (CSUR) 40(1), 1–39 (2008)
61. Anyanwu, K., Sheth, A.: P-Queries: enabling querying for semantic associations
76. Carroll, J.: Matching RDF Graphs. In: Proceedings of the International Semantic
Web Conference (ISWC) (2002)
77. Chang, C.S., Chen, A.L.P.: Supporting conceptual and neighborhood queries
on the world wide web. IEEE Transactions on Systems, Man, and Cybernetics
(TSMC) 28(2), 300–308 (1998)
78. Chen, P.P.S.: The Entity-Relationship Model - Toward a Unified View of Data.
ACM Transactions on Database Systems (TODS) 1(1), 9–36 (1976)
79. Cheng, J., Yu, J.X., Ding, B., Yu, P.S., Wang, H.: Fast graph pattern matching.
In: Proc. of the 2008 IEEE 24th International Conference on Data Engineering
(ICDE). pp. 913–922. IEEE Computer Society (2008)
80. Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An Efficient SQL-based RDF
Querying Scheme. In: Proc. VLDB Endow. pp. 1216–1227 (2005)
81. Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking Traversal Operations over
processing. In: Proceedings of the IEEE International Conference on Big Data.
pp. 60–67. IEEE (2013)
94. Fan, W., Li, J., Luo, J., Tan, Z., Wang, X., Wu, Y.: Incremental graph pat-
tern matching. In: Proc. of the International Conference on Management of data
(SIGMOD). ACM Press (2011)
95. Fan, W., Li, J., Ma, S., Tang, N., Wu, Y.: Adding regular expressions to graph
reachability and pattern queries. In: Proc. of the IEEE 27th International Con-
ference on Data Engineering (ICDE). pp. 39–50 (2011)
96. Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from
intractable to polynomial time. Proc. VLDB Endowment 3, 264–275 (September
2010)
97. Faye, D.C., Cure, O., Blin, G.: A survey of RDF storage approaches. In: ARIMA
Database Systems (PODS). pp. 417–424. ACM Press (1990)
109. Han, M., Daudjee, K., Ammar, K., Özsu, M.T., Wang, X., Jin, T.: An Experimen-
tal Comparison of Pregel-like Graph Processing Systems. Proc. VLDB Endow.
7(12), 1047–1058 (Aug 2014)
110. Han, W.S., Lee, S., Park, K., Lee, J.H., Kim, M.S., Kim, J., Yu, H.: TurboGraph:
A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC. In:
Proceedings of the 19th ACM International Conference on Knowledge Discovery
and Data Mining. pp. 77–85. ACM, New York, NY, USA (2013)
111. Hayes, J., Gutierrez, C.: Bipartite Graphs as Intermediate Model for RDF. In:
Proceedings of the 3rd International Semantic Web Conference (ISWC). pp. 47–
61. No. 3298 in LNCS, Springer-Verlag (Nov 2004)
112. Hidders, J.: Typing Graph-Manipulation Operations. In: Proceedings of the 9th
International Conference on Database Theory (ICDT). pp. 394–409. Springer-
Verlag (2002)
113. Hidders, J., Paredaens, J.: GOAL, A Graph-Based Object and Association Lan-
guage. Advances in Database Systems: Implementations and Applications, CISM
pp. 247–265 (Sept 1993)
114. Hull, R., King, R.: Semantic Database Modeling: Survey, Applications, and Re-
search Issues. ACM Computing Surveys 19(3), 201–260 (1987)
115. Iordanov, B.: HyperGraphDB: a generalized graph database. In: Proceedings of
the International Conference on Web-age Information Management (WAIM). pp.
25–36. Springer-Verlag (2010)
116. Jouili, S., Reynaga, A.: imGraph: A Distributed In-Memory Graph Database.
In: Proc. of the International Conference on Social Computing (SocialCom). pp.
732–737 (2013)
117. Jouili, S., Vansteenberghe, V.: An empirical comparison of graph databases. In:
Social Computing (SocialCom), 2013 International Conference on. pp. 708–715
(Sept 2013)
118. Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: A Peta-Scale Graph Min-
ing System Implementation and Observations. In: Proceedings of the 9th IEEE
International Conference on Data Mining. pp. 229–238. IEEE Computer Society,
Washington, DC, USA (2009)
119. Khan, A., Elnikety, S.: Systems for Big-Graphs. In: Proc. of the 40th International
Conference on Very Large Data Bases (VLDB) (2014)
120. Khayyat, Z., Awara, K., Alonazi, A., Jamjoom, H., Williams, D., Kalnis, P.:
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing.
In: Proceedings of the 8th ACM European Conference on Computer Systems. pp.
169–182. ACM, New York, NY, USA (2013)
121. Kiesel, N., Schurr, A., Westfechtel, B.: GRAS: A Graph-Oriented Software Engi-
neering Database System. In: IPSEN Book. pp. 397–425 (1996)
122. Kim, W.: Object-Oriented Databases: Definition and Research Directions. IEEE
Transactions on Knowledge and Data Engineering (TKDE) 2(3), 327–341 (1990)
123. Klyne, G., Carroll, J.: Resource Description Framework (RDF) Concepts and
Abstract Syntax. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
(February 2004)
124. Kowalik, L.: Adjacency queries in dynamic sparse graphs. Information Processing
Letters 102, 191–195 (May 2007)
125. Kunii, H.S.: DBMS with Graph Data Model for Knowledge Handling. In: Pro-
ceedings of the 1987 Fall Joint Computer Conference on Exploring technology:
today and tomorrow. pp. 138–142. IEEE Computer Society Press (1987)
126. Kuper, G.M., Vardi, M.Y.: A New Approach to Database Logic. In: Proceedings
of the 3rd Symposium on Principles of Database Systems (PODS). pp. 86–96.
ACM Press (April 1984)
127. Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: Large-scale Graph Computation
on Just a PC. In: Proceedings of the 10th USENIX Conference on Operating
Systems Design and Implementation. pp. 31–46. USENIX Association, Berkeley,
CA, USA (2012)
128. Lécluse, C., Richard, P., Vélez, F.: O2, an Object-Oriented Data Model. In: Pro-
ceedings of the ACM International Conference on Management of Data (SIG-
MOD). pp. 424–433. ACM Press (June 1988)
129. Levene, M., Loizou, G.: A Graph-Based Data Model and its Ramifications. IEEE
Transactions on Knowledge and Data Engineering (TKDE) 7(5), 809–823 (1995)
130. Levene, M., Poulovassilis, A.: The Hypernode Model and its Associated Query
143. Meyer, S.M., Degener, J., Giannandrea, J., Michener, B.: Optimizing Schema-
last Tuple-store Queries in Graphd. In: Proceedings of the ACM International
Conference on Management of Data (SIGMOD). pp. 1047–1056. ACM, New York,
NY, USA (2010)
144. Schmidt, M., Hornung, T., Küchlin, N., Lausen, G., Pinkel, C.: An
Experimental Comparison of RDF Data Management
Approaches in a SPARQL Benchmark Scenario. In: Proc. of the 7th International
Semantic Web Conference (ISWC). pp. 82–97. Springer-Verlag (2008)
145. Morari, A., Castellana, V., Villa, O., Tumeo, A., Weaver, J., Haglin, D., Choud-
hury, S., Feo, J.: Scaling semantic graph databases in size and performance. IEEE
Micro 34(4), 16–26 (July 2014)
146. Navathe, S.B.: Evolution of Data Modeling for Databases. Communications of
the ACM 35(9), 112–123 (1992)
147. Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF
data. The VLDB Journal 19(1), 91–113 (2010)
148. Papadopoulos, A.N., Manolopoulos, Y.: Nearest Neighbor Search - A Database
Perspective. Series in Computer Science, Springer (2005)
149. Papakonstantinou, Y., Garcia-Molina, H., Widom, J.: Object Exchange across
Heterogeneous Information Sources. In: Proceedings of the 11th International
Conference on Data Engineering (ICDE). pp. 251–260. IEEE Computer Society
(1995)
150. Paredaens, J., Kuijpers, B.: Data Models and Query Languages for Spatial
Databases. Data & Knowledge Engineering (DKE) 25(1-2), 29–53 (1998)
151. Paredaens, J., Peelman, P., Tanca, L.: G-Log: A Graph-Based Query Language.
IEEE Transactions on Knowledge and Data Engineering (TKDE) 7(3), 436–453
(1995)
152. Peckham, J., Maryanski, F.J.: Semantic Data Models. ACM Computing Surveys
20(3), 153–189 (1988)
153. Poulovassilis, A., Levene, M.: A Nested-Graph Model for the Representation and
Manipulation of Complex Objects. ACM Transactions on Information Systems
(TOIS) 12(1), 35–68 (1994)
154. Prud’hommeaux, E., Seaborne, A.: SPARQL Query Language for RDF.
W3C Recommendation. http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
(January 15 2008)
155. Güting, R.H.: GraphDB: Modeling and Querying Graphs in Databases.
In: Proceedings of 20th International Conference on Very Large Data Bases
(VLDB). pp. 297–308. Morgan Kaufmann (1994)
156. Rodriguez, M.A., Neubauer, P.: Constructions from dots and lines. Bul. Am. Soc.
Info. Sci. Tech. 36(6), 35–41 (2010)
157. Rodriguez, M.A.: Mapping Semantic Networks to Undirected Networks. Interna-
tional Journal of Applied Mathematics and Computer Sciences 5(1), 30–42 (2009)
158. Roussopoulos, N., Mylopoulos, J.: Using Semantic Networks for Database Man-
agement. In: Proceedings of the International Conference on Very Large Data
Bases (VLDB). pp. 144–172. ACM (Sept 1975)
159. Salihoglu, S., Widom, J.: GPS: A Graph Processing System. In: Proceedings of the
25th International Conference on Scientific and Statistical Database Management.
pp. 22:1–22:12. ACM, New York, NY, USA (2013)
160. Samet, H.: Modern Database Systems: The Object Model, Interoperability and
Beyond, chap. Spatial data structures, pp. 361–385. Addison Wesley - ACM Press
(1995)
161. Sarwat, M., Elnikety, S., He, Y., Mokbel, M.F.: Horton+: A Distributed System
for Processing Declarative Reachability Queries over Partitioned Graphs. Proc.
VLDB Endow. 6(14), 1918–1929 (Sep 2013)
162. Seidl, T., Kriegel, H.P.: A 3D molecular surface representation supporting
neighborhood queries. In: Proc. of the 3rd Conference on Intelligent Systems for
Molecular Biology (ISMB). pp. 240–258. Springer (1995)
163. Shang, Z., Yu, J.X.: Catch the wind: Graph workload balancing on cloud. In:
30th International Conference on Data Engineering. pp. 553–564. IEEE Computer
Society (2013)
164. Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory
cloud. In: Proceedings of the ACM International Conference on Management of
Data (SIGMOD). pp. 505–516. ACM, New York, NY, USA (2013)
165. Shasha, D., Wang, J.T.L., Giugno, R.: Algorithmics and Applications of Tree
and Graph Searching. In: Proceedings of the 21st Symposium on Principles of
Statistical Database Management. pp. 421–422. IEEE (June 2004)
181. Xu, A., Lei, H.: LCGMiner: Levelwise Closed Graph Pattern Mining from Large
Databases. In: Proceedings of the 16th International Conference on Scientific and
Statistical Database Management (SSDBM’04). p. 421. IEEE (June 2004)
182. Yannakakis, M.: Graph-Theoretic Methods in Database Theory. In: Proceedings
of the 9th Symposium on Principles of Database Systems (PODS). pp. 230–242.
ACM Press (1990)
183. Yuan, P., Liu, P., Wu, B., Jin, H., Zhang, W., Liu, L.: TripleBit: A Fast and
Compact System for Large Scale RDF Data. Proc. VLDB Endow. 6(7), 517–528
(May 2013)
184. Zhao, Y., Yoshigoe, K., Xie, M., Zhou, S., Seker, R., Bian, J.: Evaluation and
analysis of distributed graph-parallel processing frameworks. Journal of Cyber