Implement - Graph Databases
Graph Databases
• Graph databases allow you to store entities and relationships
between these entities
• Entities are also known as nodes, which have properties.
• A node is an instance of an object in the application.
• Relationships are known as edges, which can also have properties.
• Edges have directional significance; nodes are organized by
relationships which allow you to find interesting patterns
between the nodes.
• The organization of the graph lets the data be stored once
and then interpreted in different ways based on relationships.
What Is a Graph Database?
• Nodes are entities that have properties, such as name. The Martin node
is actually a node whose name property is set to Martin.
• Edges have types, such as likes, author, and so on.
• These properties let us organize the nodes; for example, the nodes
Martin and Pramod have an edge connecting them with a
relationship type of friend.
• Edges can have multiple properties.
• We can assign a property of since on the friend relationship type
between Martin and Pramod.
• Relationship types have directional significance; the friend
relationship type is bidirectional but likes is not.
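The node, edge, and property model described above can be pictured with a small in-memory sketch. This is only an illustration in plain Java, not the API of any graph database; the class names (PropertyGraphSketch, Node, Edge), the helper methods, and the since value of 2005 are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory property graph; hypothetical names, not a real graph database API.
class PropertyGraphSketch {
    static class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Edge> outgoing = new ArrayList<>();
        final List<Edge> incoming = new ArrayList<>();
    }

    static class Edge {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Edge(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Creates a node whose name property is set to the given value.
    static Node node(String name) {
        Node n = new Node();
        n.properties.put("name", name);
        return n;
    }

    // Creates a directed edge and registers it on both of its end points.
    static Edge connect(Node start, Node end, String type) {
        Edge e = new Edge(start, end, type);
        start.outgoing.add(e);
        end.incoming.add(e);
        return e;
    }

    public static void main(String[] args) {
        Node martin = node("Martin");
        Node pramod = node("Pramod");
        // friend is bidirectional, so it is modeled as one edge in each direction.
        // The since value is a placeholder, not taken from the source.
        connect(martin, pramod, "friend").properties.put("since", 2005);
        connect(pramod, martin, "friend").properties.put("since", 2005);
        System.out.println(martin.outgoing.get(0).end.properties.get("name")); // Pramod
    }
}
```

Keeping both an outgoing and an incoming list on each node is one simple way to make direction explicit; a one-way relationship such as likes would be stored as a single edge.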
• Once a graph of nodes and edges is created, we can query
the graph in many ways, such as “get all nodes
employed by Big Co that like NoSQL Distilled.”
• A query on the graph is also known as traversing the
graph.
• An advantage of graph databases is that we can
change the traversing requirements without having to
change the nodes or edges.
• If we want to “get all nodes that like NoSQL Distilled,” we can
do so without having to change the existing data or the
model of the database, because we can traverse the
graph any way we like.
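The point that new queries need no change to the stored data can be sketched as follows. This is a minimal illustration in plain Java over a fixed edge list, assuming made-up people and an edge type named employee; it is not how a graph database stores or traverses data internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Two different questions answered over the same, unchanged set of edges.
class TraversalQuerySketch {
    // Each edge is {startNode, relationshipType, endNode}. Illustrative data only.
    static final List<String[]> EDGES = Arrays.asList(
        new String[] {"Anna", "employee", "BigCo"},
        new String[] {"Barbara", "employee", "BigCo"},
        new String[] {"Carol", "employee", "BigCo"},
        new String[] {"Anna", "likes", "NoSQL Distilled"},
        new String[] {"Barbara", "likes", "NoSQL Distilled"},
        new String[] {"Dawn", "likes", "NoSQL Distilled"});

    // All start nodes that have an edge of the given type pointing at target.
    static Set<String> sources(String type, String target) {
        Set<String> result = new TreeSet<>();
        for (String[] e : EDGES) {
            if (e[1].equals(type) && e[2].equals(target)) {
                result.add(e[0]);
            }
        }
        return result;
    }

    // "Get all nodes employed by a company that like a book."
    static Set<String> employedByAndLikes(String company, String book) {
        Set<String> result = sources("employee", company);
        result.retainAll(sources("likes", book));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(employedByAndLikes("BigCo", "NoSQL Distilled")); // [Anna, Barbara]
        // A different question, same data and same model:
        System.out.println(sources("likes", "NoSQL Distilled")); // [Anna, Barbara, Dawn]
    }
}
```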
• In graph databases, traversing the joins or
relationships is very fast, because relationships are
persisted rather than calculated at query time.
• Nodes can have different types of relationships
between them, allowing you to both represent
relationships between the domain entities and to
have secondary relationships for things like
category, path, time-trees, quad-trees for spatial
indexing, or linked lists for sorted access.
• Since there is no limit to the number and kind of
relationships a node can have, they can all be
represented in the same graph database.
Features
• There are many graph databases available,
such as
– Neo4J
– Infinite Graph
– OrientDB
– FlockDB (a special case: a graph database that only
supports single-depth relationships or adjacency lists,
where you cannot traverse more than one level deep
for relationships).
• Creating a graph is as simple as creating two nodes
and then creating a relationship.
• Let’s create two nodes, Martin and Pramod:
Node martin = graphDb.createNode();
martin.setProperty("name", "Martin");
Node pramod = graphDb.createNode();
pramod.setProperty("name", "Pramod");
• We have assigned the name property of the two
nodes the values of Martin and Pramod.
• Once we have more than one node, we can create a
relationship. Because direction matters, a two-way
friendship needs a relationship in each direction:
martin.createRelationshipTo(pramod, FRIEND);
pramod.createRelationshipTo(martin, FRIEND);
• Relationships are first-class citizens in graph databases;
most of the value of graph databases is derived from
the relationships.
• Relationships don’t only have a type, a start node, and
an end node, but can have properties of their own.
• Using these properties on the relationships, we can
add intelligence to the relationship, for example:
– since when did they become friends
– what is the distance between the nodes
– what aspects are shared between the nodes.
• These properties on the relationships can be used to
query the graph.
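As a rough illustration of querying by a relationship property, the sketch below filters friend edges by their since value. It is plain Java with hypothetical names (RelationshipPropertySketch, Friendship, friendsSince), and the years are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of edges that carry a "since" property; not a real graph database API.
class RelationshipPropertySketch {
    static class Friendship {
        final String from;
        final String to;
        final int since; // year the friendship started (made-up values below)

        Friendship(String from, String to, int since) {
            this.from = from;
            this.to = to;
            this.since = since;
        }
    }

    // Friends of person whose friend edge has a since value no later than year.
    static List<String> friendsSince(List<Friendship> edges, String person, int year) {
        List<String> result = new ArrayList<>();
        for (Friendship f : edges) {
            if (f.from.equals(person) && f.since <= year) {
                result.add(f.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Friendship> edges = Arrays.asList(
            new Friendship("Martin", "Pramod", 2005),
            new Friendship("Martin", "Barbara", 2011));
        System.out.println(friendsSince(edges, "Martin", 2008)); // [Pramod]
    }
}
```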
• Since most of the power from the graph
databases comes from the relationships and
their properties, a lot of thought and design
work is needed to model the relationships in
the domain that we are trying to work with.
• Adding new relationship types is easy;
changing existing nodes and their
relationships is similar to data migration,
because these changes will have to be done
on each node and each relationship in the
existing data.
1. Consistency
• Since graph databases operate on
connected nodes, most graph database
solutions do not support distributing
the nodes on different servers.
• Some graph databases support node
distribution across a cluster of servers, such as
Infinite Graph.
• Within a single server, data is always consistent,
especially in Neo4J, which is fully ACID-compliant.
• When running Neo4J in a cluster, a write to the
master is eventually synchronized to the slaves,
while slaves are always available for read.
• Writes to slaves are allowed and are immediately
synchronized to the master; other slaves will not
be synchronized immediately, though—they will
have to wait for the data to propagate from the
master.
• Graph databases ensure consistency through
transactions.
• They do not allow dangling relationships: The
start node and end node always have to exist,
and nodes can only be deleted if they don’t
have any relationships attached to them.
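The no-dangling-relationships rule can be sketched as two checks: a relationship can only be created between existing nodes, and a node can only be deleted once nothing is attached to it. The class below is a hypothetical plain-Java model of that rule (the names and the exception types are assumptions), not the behavior or API of any particular product.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy graph that refuses operations which would leave a dangling relationship.
class NoDanglingSketch {
    private final Set<String> nodes = new HashSet<>();
    private final List<String[]> relationships = new ArrayList<>(); // {start, end}

    void addNode(String id) {
        nodes.add(id);
    }

    boolean hasNode(String id) {
        return nodes.contains(id);
    }

    // Both the start node and the end node must already exist.
    void relate(String start, String end) {
        if (!nodes.contains(start) || !nodes.contains(end)) {
            throw new IllegalArgumentException("start and end node must exist");
        }
        relationships.add(new String[] {start, end});
    }

    void unrelate(String start, String end) {
        relationships.removeIf(r -> r[0].equals(start) && r[1].equals(end));
    }

    // A node can only be deleted when no relationship starts or ends at it.
    void deleteNode(String id) {
        for (String[] r : relationships) {
            if (r[0].equals(id) || r[1].equals(id)) {
                throw new IllegalStateException("node still has relationships: " + id);
            }
        }
        nodes.remove(id);
    }

    public static void main(String[] args) {
        NoDanglingSketch graph = new NoDanglingSketch();
        graph.addNode("Martin");
        graph.addNode("Pramod");
        graph.relate("Martin", "Pramod");
        graph.unrelate("Martin", "Pramod"); // remove the relationship first
        graph.deleteNode("Martin");         // now the delete is allowed
        System.out.println(graph.hasNode("Martin")); // false
    }
}
```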
2. Transactions
• Neo4J is ACID-compliant. Before changing any
nodes or adding any relationships to existing
nodes, we have to start a transaction.
• Without wrapping operations in transactions,
we will get a NotInTransactionException.
• Read operations can be done without
initiating a transaction.
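Transaction transaction = database.beginTx();
try {
    // Writes must happen inside the transaction.
    Node node = database.createNode();
    node.setProperty("name", "NoSQL Distilled");
    node.setProperty("published", "2012");
    // Mark the transaction as successful so that finish() commits it.
    transaction.success();
} finally {
    // Commits if success() was called; otherwise rolls back.
    transaction.finish();
}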
• In the above code, we started a transaction on the
database, then created a node and set properties on it.
• We marked the transaction as success and finally
completed it by calling finish.
• A transaction has to be marked as success, otherwise
Neo4J assumes that it was a failure and rolls it back
when finish is issued.
• Setting success without issuing finish also does not
commit the data to the database.
• This way of managing transactions has to be
remembered when developing, as it differs from the
standard way of doing transactions in an RDBMS.
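The success and finish rule described above can be modeled with a toy transaction that buffers writes and applies them only when finish is called after success. This is a sketch of the rule as stated in these slides, written in plain Java with hypothetical names (TxSketch, Tx, store); it is not Neo4J's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the success()/finish() protocol; not Neo4J code.
class TxSketch {
    final Map<String, String> store = new HashMap<>();

    class Tx {
        private final Map<String, String> pending = new HashMap<>();
        private boolean markedSuccess = false;

        // Writes are buffered until the transaction finishes.
        void put(String key, String value) {
            pending.put(key, value);
        }

        // Only marks the transaction; nothing is committed yet.
        void success() {
            markedSuccess = true;
        }

        // Commits the buffered writes if success() was called, otherwise discards them.
        void finish() {
            if (markedSuccess) {
                store.putAll(pending);
            }
            pending.clear();
        }
    }

    Tx beginTx() {
        return new Tx();
    }

    public static void main(String[] args) {
        TxSketch db = new TxSketch();
        Tx tx = db.beginTx();
        try {
            tx.put("name", "NoSQL Distilled");
            tx.success();
        } finally {
            tx.finish();
        }
        System.out.println(db.store.get("name")); // NoSQL Distilled
    }
}
```

The try/finally shape mirrors the slide's example: finish always runs, and whether it commits depends solely on whether success was reached first.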
3. Availability
• Neo4J, as of version 1.8, achieves high
availability by providing for replicated slaves.
• These slaves can also handle writes: When they
are written to, they synchronize the write to
the current master, and the write is committed
first at the master and then at the slave.
• Other slaves will eventually get the update.
• Other graph databases, such as Infinite Graph
and FlockDB, provide for distributed storage of
the nodes.
• Neo4J uses Apache ZooKeeper to keep track
of the last transaction IDs persisted on each
slave node and the current master node.
• Once a server starts up, it communicates with
ZooKeeper and finds out which server is the
master.
• If the server is the first one to join the cluster, it
becomes the master; when a master goes
down, the cluster elects a master from the
available nodes, thus providing high availability.
4. Query Features
• Graph databases are supported by query
languages such as
– Gremlin: a domain-specific language for
traversing graphs; it can traverse all graph databases
that implement the Blueprints property graph.
– Cypher: Neo4J’s own query language for
querying the graph.
• Outside these query languages, Neo4J allows
you to query the graph for properties of the
nodes, traverse the graph, or navigate the
nodes’ relationships using language bindings.
• Properties of a node can be indexed using the
indexing service.
• Similarly, properties of relationships or edges
can be indexed, so a node or edge can be
found by the value.
• Indexes should be queried to find the starting
node to begin a traversal.
• We can index the nodes as they are added to the database, or
we can index all the nodes later by iterating over them. We
first need to create an index for the nodes using the
IndexManager:
Index<Node> nodeIndex = graphDb.index().forNodes("nodes");
• When new nodes are created, they can be added to the index.
Transaction transaction = graphDb.beginTx();
try {
    Index<Node> nodeIndex = graphDb.index().forNodes("nodes");
    nodeIndex.add(martin, "name", martin.getProperty("name"));
    nodeIndex.add(pramod, "name", pramod.getProperty("name"));
    transaction.success();
} finally {
    transaction.finish();
}
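Conceptually, an index of this kind maps a property key and value to the nodes that carry it, which is what makes it a cheap way to find a starting node. The sketch below models that idea in plain Java; NameIndexSketch and its methods are hypothetical, nodes are represented by string ids, and the getSingle behavior shown (null when nothing matches, an exception when more than one node matches) is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Toy property index: key -> value -> node ids. Not a real indexing service.
class NameIndexSketch {
    private final Map<String, Map<Object, List<String>>> index = new HashMap<>();

    // Registers nodeId under the given property key and value.
    void add(String nodeId, String key, Object value) {
        index.computeIfAbsent(key, k -> new HashMap<>())
             .computeIfAbsent(value, v -> new ArrayList<>())
             .add(nodeId);
    }

    // Returns every node id indexed under key = value (possibly empty).
    List<String> get(String key, Object value) {
        Map<Object, List<String>> byValue = index.get(key);
        if (byValue == null) {
            return Collections.emptyList();
        }
        List<String> hits = byValue.get(value);
        return hits == null ? Collections.<String>emptyList() : hits;
    }

    // Returns the only hit, null if there is none, and fails if there are several.
    String getSingle(String key, Object value) {
        List<String> hits = get(key, value);
        if (hits.size() > 1) {
            throw new NoSuchElementException("more than one node for " + key + "=" + value);
        }
        return hits.isEmpty() ? null : hits.get(0);
    }

    public static void main(String[] args) {
        NameIndexSketch nodeIndex = new NameIndexSketch();
        nodeIndex.add("node-1", "name", "Martin");
        nodeIndex.add("node-2", "name", "Pramod");
        System.out.println(nodeIndex.getSingle("name", "Martin"));  // node-1
        System.out.println(nodeIndex.getSingle("name", "Barbara")); // null
    }
}
```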
• Once the nodes are indexed, we can search
them using the indexed property.
• If we search for the node with the name of
Barbara, we would query the index for the
property of name to have a value of Barbara.
Node node = nodeIndex.get("name", "Barbara").getSingle();
• We get the node whose name is Martin; given
the node, we can get all its relationships.
Node martin = nodeIndex.get("name", "Martin").getSingle();
Iterable<Relationship> allRelationships = martin.getRelationships();
• We can get either INCOMING or OUTGOING
relationships.
Iterable<Relationship> incomingRelations = martin.getRelationships(Direction.INCOMING);
• We can also apply directional filters on the queries when
querying for a relationship.
• If we want to find all people who like NoSQL Distilled, we can
find the NoSQL Distilled node and then get its relationships
with Direction.INCOMING.
• At this point we can also add the type of relationship to the
query filter, since we are looking only for nodes that LIKE
NoSQL Distilled.
Node nosqlDistilled = nodeIndex.get("name", "NoSQL Distilled").getSingle();
Iterable<Relationship> relationships = nosqlDistilled.getRelationships(INCOMING, LIKES);
for (Relationship relationship : relationships) {
    // The start node of an incoming LIKES relationship is the node that likes the book.
    likesNoSQLDistilled.add(relationship.getStartNode());
}
• Graph databases are really powerful when you want to
traverse the graphs at any depth and specify a starting
node for the traversal.
• This is especially useful when you are trying to find nodes
that are related to the starting node at more than one
level down.
• As the depth of the graph increases, it makes more sense
to traverse the relationships by using a Traverser where
you can specify that you are looking for INCOMING,
OUTGOING, or BOTH types of relationships.
• You can also make the traverser go top-down or sideways
on the graph by using Order values of BREADTH_FIRST or
DEPTH_FIRST.
• To find all the nodes at any depth that are related as a
FRIEND with Barbara:
Node barbara = nodeIndex.get("name", "Barbara").getSingle();
Traverser friendsTraverser = barbara.traverse(Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    EdgeType.FRIEND,
    Direction.OUTGOING);
• The friendsTraverser provides us with a way to find all the nodes
that are related to Barbara where the relationship type is
FRIEND. The nodes can be at any depth—friend of a
friend at any level— allowing you to explore tree
structures.
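What such a traverser does can be approximated by a breadth-first walk over outgoing friend edges that visits each node once and returns everything except the start node. The sketch below is plain Java over an adjacency map, not the Traverser API; the class and method names are hypothetical and the friendship data is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Breadth-first traversal of outgoing friend edges, to any depth.
class FriendTraversalSketch {
    // friendOf maps a person to the people they have an outgoing friend edge to.
    static List<String> friendsAtAnyDepth(Map<String, List<String>> friendOf, String start) {
        List<String> found = new ArrayList<>();   // all but the start node
        Set<String> visited = new HashSet<>();    // guards against cycles
        Deque<String> queue = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String friend : friendOf.getOrDefault(current, Collections.<String>emptyList())) {
                if (visited.add(friend)) { // true only the first time a node is seen
                    found.add(friend);
                    queue.add(friend);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        Map<String, List<String>> friendOf = new HashMap<>();
        friendOf.put("Barbara", Arrays.asList("Carol", "Elizabeth"));
        friendOf.put("Carol", Arrays.asList("Dawn"));
        friendOf.put("Dawn", Arrays.asList("Jill"));
        friendOf.put("Elizabeth", Arrays.asList("Jill"));
        friendOf.put("Jill", Arrays.asList("Barbara")); // cycle back to the start
        System.out.println(friendsAtAnyDepth(friendOf, "Barbara"));
        // [Carol, Elizabeth, Dawn, Jill]
    }
}
```

Swapping the queue for a stack would give a depth-first order instead, which corresponds to the DEPTH_FIRST choice mentioned earlier.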
• One of the good features of graph databases is finding paths
between two nodes—determining if there are multiple paths,
finding all of the paths or the shortest path.
• Example:
– Barbara is connected to Jill by two distinct paths; to find all these paths
and the distance between Barbara and Jill along those different paths,