Nosql Module5

A graph database is a NoSQL database that stores data as nodes (entities) and edges (relationships), allowing for efficient handling of complex interconnections. It supports powerful querying capabilities and maintains strict consistency and ACID compliance, making it suitable for use cases like social networks, routing services, recommendation systems, and fraud detection. However, it is not ideal for massive unstructured data or write-heavy workloads due to performance degradation during updates.

Uploaded by bharadwajvarun35
Copyright © All Rights Reserved

NOSQL DATABASE

Module 5

What is a graph database?

A graph database is a type of NoSQL database that is designed to handle data with complex
relationships and interconnections. In a graph database, data is stored as nodes and edges,
where nodes represent entities and edges represent the relationships between those entities.

Graph databases allow you to store entities and relationships between these entities. Entities
are also known as nodes, which have properties.
● A node is an instance of an object in the application.
● Relationships are known as edges, and they can also have properties.
● Edges have directional significance; nodes are organized by relationships, which helps find patterns between them.

The organization of the graph lets the data be stored once and then interpreted in different
ways based on relationships.

For example, Martin is a node whose name property has the value “Martin”.

Nodes can have different types:
● Person – Anna, Barbara, Martin, etc.
● Book – Refactoring, NoSQL Distilled.
● Company – BigCo.

Edges represent relationships.
For example, nodes Martin and Pramod have an edge connecting them with a relationship
type of “friend”.
Relationship types have directional significance: the friend relationship type is bidirectional,
but likes is not. When Dawn likes NoSQL Distilled, it does not automatically mean NoSQL
Distilled likes Dawn.
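To make the direction rule concrete, here is a minimal plain-Java sketch, assuming a hypothetical `DirectedGraph` class (not part of Neo4J) that stores every relationship as a directed, typed edge. A mutual relationship such as FRIEND is stored as one edge in each direction, while LIKES is stored only one way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Neo4J API: relationships are stored as directed, typed edges.
class DirectedGraph {
    static final class Edge {
        final String from;
        final String type;
        final String to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    // A one-way relationship such as LIKES is a single directed edge.
    void relate(String from, String type, String to) {
        edges.add(new Edge(from, type, to));
    }

    // A mutual relationship such as FRIEND is stored as one edge in each direction.
    void relateBothWays(String a, String type, String b) {
        relate(a, type, b);
        relate(b, type, a);
    }

    // True only if an edge of this type exists in exactly this direction.
    boolean has(String from, String type, String to) {
        for (Edge e : edges) {
            if (e.from.equals(from) && e.type.equals(type) && e.to.equals(to)) {
                return true;
            }
        }
        return false;
    }
}
```

With `relateBothWays("Martin", "FRIEND", "Pramod")` and `relate("Dawn", "LIKES", "NoSQL Distilled")`, the friendship can be found from either side, but asking whether NoSQL Distilled LIKES Dawn returns false.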

A query on the graph is also known as traversing the graph. An advantage of graph
databases is that we can change the traversal requirements without having to change the
nodes or edges.
For example, if we want to “get all nodes that like NoSQL Distilled,” we can do so without
changing the existing data or the model of the database, because we can traverse the graph any
way we like.
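The following sketch illustrates the idea with a hypothetical in-memory `TraversalGraph` class (not a Neo4J API): each node keeps its incoming and outgoing edges, and each question is simply a different walk over the same stored data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory graph, not a real database API.
class TraversalGraph {
    private final Map<String, List<String[]>> out = new HashMap<>(); // node -> {type, target}
    private final Map<String, List<String[]>> in = new HashMap<>();  // node -> {type, source}

    void relate(String from, String type, String to) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(new String[] {type, to});
        in.computeIfAbsent(to, k -> new ArrayList<>()).add(new String[] {type, from});
    }

    // Follows edges of one type, in one direction, starting from a single node.
    private static List<String> follow(Map<String, List<String[]>> adjacency, String start, String type) {
        List<String> result = new ArrayList<>();
        for (String[] edge : adjacency.getOrDefault(start, List.of())) {
            if (edge[0].equals(type)) {
                result.add(edge[1]);
            }
        }
        return result;
    }

    // "Get all nodes that like NoSQL Distilled": walk incoming LIKES edges of the book node.
    List<String> whoLikes(String item) {
        return follow(in, item, "LIKES");
    }

    // A different question over the same, unchanged data: walk outgoing LIKES edges of a person.
    List<String> likedBy(String person) {
        return follow(out, person, "LIKES");
    }
}
```

Adding a new question only means adding a new traversal method; the stored nodes and edges stay exactly as they were.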

Nodes can have different types of relationships, such as:
● Primary relationships: emp, likes, etc.
● Secondary relationships: category, time-based, etc.
In graph databases, traversing joins or relationships is very fast, because the relationships
between nodes are not calculated at query time; they are persisted as relationships when they
are created.
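A rough illustration of the difference, assuming a hypothetical `JoinVersusAdjacency` helper and made-up data (including an `OtherCo` company): the relational-style method matches rows at query time, while the graph-style method reads a relationship list that was stored with the node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contrast, not a real database API.
class JoinVersusAdjacency {
    // Relational style: (employee, company) rows. The match is computed at query time,
    // here by a full scan (a real RDBMS would usually use an index, but still per query).
    static List<String> employeesByJoin(List<String[]> empRows, String company) {
        List<String> result = new ArrayList<>();
        for (String[] row : empRows) {
            if (row[1].equals(company)) {
                result.add(row[0]);
            }
        }
        return result;
    }

    // Graph style: each company node already holds its persisted EMP relationships,
    // so the query only reads that node's own list.
    static List<String> employeesByAdjacency(Map<String, List<String>> empEdges, String company) {
        return empEdges.getOrDefault(company, List.of());
    }
}
```

Both methods return the same answer; the point is that the join's work grows with the whole table, while the adjacency lookup depends only on the starting node's relationships. This is an intuition for why traversals are fast, not a benchmark.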

Explain how relationships and properties are represented in a graph, with a neat diagram.

Relationships connect nodes and define “how nodes are related”.


In Neo4J, creating a graph is as simple as creating two nodes and then creating a relationship.
Let’s create two nodes, Martin and Pramod:

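// graphDb is the embedded GraphDatabaseService; in Neo4J these calls
// have to run inside a transaction (see Transactions below).
Node martin = graphDb.createNode();
martin.setProperty("name", "Martin");
Node pramod = graphDb.createNode();
pramod.setProperty("name", "Pramod");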

We have assigned the values Martin and Pramod to the name property of the two nodes. Once we
have more than one node, we can create a relationship:

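// FRIEND is a RelationshipType constant (for example, an enum implementing RelationshipType).
// Relationships are directed, so a mutual friendship is created in both directions.
martin.createRelationshipTo(pramod, FRIEND);
pramod.createRelationshipTo(martin, FRIEND);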

A relationship has:
➔ A type (e.g., friend, emp_of).
➔ A start node and an end node.
➔ Properties of its own.

Directionality of relationships matters. For example, a user “likes” a product, but the product
does not “like” the user.
This distinction between incoming and outgoing relationships helps build a rich and
meaningful domain model.

Relationships can store metadata using properties. These properties add more context and
intelligence to the relationships, for example timestamps or distances.

● The “friend” relationship connects person nodes and has properties such as:
“since”: the year the friendship began.

● The “employee_of” relationship connects person nodes to the company node and has properties such as:
“hired_year”: the year the person joined the company.
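The parts of a relationship described above (type, start node, end node, and its own properties) can be sketched as a plain Java class. `PropertyRelationship` and its `with` method are hypothetical names, not Neo4J classes, and the years used below are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java model of a relationship, not the Neo4J API.
class PropertyRelationship {
    final String type;
    final String startNode;
    final String endNode;
    final Map<String, Object> properties = new HashMap<>();

    PropertyRelationship(String type, String startNode, String endNode) {
        this.type = type;
        this.startNode = startNode;
        this.endNode = endNode;
    }

    // Stores metadata such as "since" or "hired_year" on the relationship itself.
    PropertyRelationship with(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    // Cypher-like rendering, e.g. (Martin)-[:FRIEND {since=2005}]->(Pramod)
    @Override
    public String toString() {
        return "(" + startNode + ")-[:" + type + " " + properties + "]->(" + endNode + ")";
    }
}
```

For example, `new PropertyRelationship("EMPLOYEE_OF", "Anna", "BigCo").with("hired_year", 2010)` keeps the hiring year on the edge rather than on either node.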

Adding or modifying relationships is easier in graph databases than in traditional relational
models.
Powerful queries such as “Who became friends in 2005?” or “What items do Anna and Barbara
share?” can be answered easily by traversing the graph.
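A minimal sketch of how these two questions could be answered by filtering relationships on their type and properties. The `RelationshipQueries` class, the sample data, and the reading of “share” as “both people LIKE the item” are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory sketch, not a database API.
class RelationshipQueries {
    static final class Rel {
        final String type;
        final String from;
        final String to;
        final int year; // the "since" property on FRIEND; unused (0) for LIKES

        Rel(String type, String from, String to, int year) {
            this.type = type;
            this.from = from;
            this.to = to;
            this.year = year;
        }
    }

    // "Who became friends in <year>?" filters FRIEND relationships on their property.
    static List<String> friendsSince(List<Rel> rels, int year) {
        List<String> pairs = new ArrayList<>();
        for (Rel r : rels) {
            if (r.type.equals("FRIEND") && r.year == year) {
                pairs.add(r.from + "-" + r.to);
            }
        }
        return pairs;
    }

    // "What items do A and B share?" intersects the items each person LIKES.
    static Set<String> sharedItems(List<Rel> rels, String a, String b) {
        Set<String> itemsOfA = new LinkedHashSet<>();
        Set<String> itemsOfB = new LinkedHashSet<>();
        for (Rel r : rels) {
            if (!r.type.equals("LIKES")) {
                continue;
            }
            if (r.from.equals(a)) itemsOfA.add(r.to);
            if (r.from.equals(b)) itemsOfB.add(r.to);
        }
        itemsOfA.retainAll(itemsOfB); // set intersection
        return itemsOfA;
    }
}
```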

Features of Graph Databases

1. Consistency:
Graph databases do not allow dangling relationships: the start node and the end node of every
relationship must exist.
For example, a relationship cannot point to a non-existent node, and a node cannot be deleted
while it still has relationships attached to it.
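The two integrity rules can be sketched with a hypothetical `ConsistentGraph` class (plain Java, not Neo4J code). It rejects violations immediately; a real database may check these rules at a different point, for example when the transaction commits.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "no dangling relationships" rules, not Neo4J code.
class ConsistentGraph {
    private final Set<String> nodes = new HashSet<>();
    private final List<String[]> relationships = new ArrayList<>(); // {start, type, end}

    void createNode(String name) {
        nodes.add(name);
    }

    // Rule 1: both ends of a relationship must already exist.
    void createRelationship(String start, String type, String end) {
        if (!nodes.contains(start) || !nodes.contains(end)) {
            throw new IllegalStateException("relationship would point to a non-existent node");
        }
        relationships.add(new String[] {start, type, end});
    }

    // Rule 2: a node that still has relationships cannot be deleted.
    void deleteNode(String name) {
        for (String[] r : relationships) {
            if (r[0].equals(name) || r[2].equals(name)) {
                throw new IllegalStateException(name + " still has relationships");
            }
        }
        nodes.remove(name);
    }

    void deleteRelationship(String start, String type, String end) {
        relationships.removeIf(r -> r[0].equals(start) && r[1].equals(type) && r[2].equals(end));
    }

    boolean hasNode(String name) {
        return nodes.contains(name);
    }
}
```

To delete a connected node, its relationships are removed first and the node afterwards.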

On a single server, graph databases like Neo4J are fully ACID compliant.
When Neo4J operates in cluster mode:
● Writes are directed to the master node and are eventually synchronised to the slave nodes.
● Reads are allowed from slave nodes.
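A simplified sketch of this behaviour, assuming a hypothetical `MasterSlaveStore` class rather than Neo4J's real replication protocol: the master appends each write to a log, and a slave only sees the write after it synchronizes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of master-to-slave synchronisation, not Neo4J's protocol.
class MasterSlaveStore {
    static final class Slave {
        final Map<String, String> data = new HashMap<>();
        int appliedUpTo = 0; // number of master log entries already applied

        // Reads are served from the slave's own, possibly stale, copy.
        String read(String key) {
            return data.get(key);
        }
    }

    private final Map<String, String> masterData = new HashMap<>();
    private final List<String[]> log = new ArrayList<>(); // committed writes, in order

    // All writes are directed to the master.
    void write(String key, String value) {
        masterData.put(key, value);
        log.add(new String[] {key, value});
    }

    // Eventual synchronisation: the slave replays the log entries it has not applied yet.
    void synchronize(Slave slave) {
        while (slave.appliedUpTo < log.size()) {
            String[] entry = log.get(slave.appliedUpTo++);
            slave.data.put(entry[0], entry[1]);
        }
    }
}
```

Until `synchronize` runs, a read from the slave returns the old value, which is what “eventually synchronised” means in practice.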

2.​ Transactions
Transactions in Neo4J are ACID compliant.
In Neo4J, any change must occur within a transaction block. If no transaction has been started,
write operations throw a NotInTransactionException.

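Transaction transaction = database.beginTx();   // database is the GraphDatabaseService
try {
    Node node = database.createNode();
    node.setProperty("name", "NoSQL Distilled");
    node.setProperty("published", "2012");
    transaction.success();                       // mark the transaction as successful
} finally {
    // Commits if success() was called, otherwise rolls back.
    // (Neo4J 1.x API; later versions replaced finish() with close().)
    transaction.finish();
}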

● In the code above, we started a transaction on the database, then created a node and
set properties on it.
● We marked the transaction as successful and finally completed it with finish().
● A transaction has to be marked as successful; otherwise Neo4J assumes that it failed and
rolls it back when finish() is called.
● Calling success() without calling finish() also does not commit the data to the database.
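These rules can be modelled with a hypothetical `SketchTransaction` class (not the Neo4J implementation) that buffers changes and decides between commit and rollback only when `finish()` is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the success()/finish() rules, not the Neo4J implementation.
class SketchTransaction {
    private final List<String> committed;            // stands in for the database
    private final List<String> pending = new ArrayList<>();
    private boolean markedSuccess = false;
    private boolean finished = false;

    SketchTransaction(List<String> database) {
        this.committed = database;
    }

    // Changes are buffered until finish().
    void write(String change) {
        if (finished) {
            throw new IllegalStateException("transaction already finished");
        }
        pending.add(change);
    }

    // Only marks the transaction; nothing is committed yet.
    void success() {
        markedSuccess = true;
    }

    // Commits the buffered changes if success() was called; otherwise discards them (rollback).
    void finish() {
        if (markedSuccess) {
            committed.addAll(pending);
        }
        pending.clear();
        finished = true;
    }
}
```

A transaction that writes and calls `success()` leaves the database unchanged until `finish()`; one that calls `finish()` without `success()` leaves it unchanged for good.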

3.​ Availability
Neo4J, as of version 1.8, achieves high availability by providing replicated slaves. These slaves
can also handle writes: when a slave is written to, it synchronizes the write with the current
master, and the write is committed first at the master and then at the slave.

Neo4J uses Apache ZooKeeper to keep track of the last transaction ID persisted on each
slave node and of the current master node.
Once a server starts up, it communicates with ZooKeeper and finds out which server is the
master.
If the server is the first one to join the cluster, it becomes the master; when a master goes
down, the cluster elects a master from the available nodes, thus providing high availability.
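The coordination role played by ZooKeeper can be sketched with a hypothetical `ClusterRegistry` class; it does not use the ZooKeeper or Neo4J APIs. Choosing the surviving server with the highest persisted transaction ID as the new master is an assumption of this sketch, since the text above only says that a master is elected from the available nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical coordinator sketch, not the ZooKeeper or Neo4J API.
class ClusterRegistry {
    private final Map<String, Long> lastTxId = new LinkedHashMap<>(); // server -> last persisted tx ID
    private String master;

    // A starting server registers itself and learns who the master is.
    String join(String server, long lastPersistedTx) {
        lastTxId.put(server, lastPersistedTx);
        if (master == null) {
            master = server; // the first server to join becomes the master
        }
        return master;
    }

    void recordCommit(String server, long txId) {
        lastTxId.put(server, txId);
    }

    // When the master fails, elect the most up-to-date remaining server (assumed policy).
    void fail(String server) {
        lastTxId.remove(server);
        if (server.equals(master)) {
            master = null;
            long best = -1;
            for (Map.Entry<String, Long> e : lastTxId.entrySet()) {
                if (e.getValue() > best) {
                    best = e.getValue();
                    master = e.getKey();
                }
            }
        }
    }

    String master() {
        return master;
    }
}
```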

4. Query features
Graph databases are supported by query languages such as Gremlin. Gremlin is a
domain-specific language for traversing graphs; it can traverse any graph database that
implements the Blueprints property graph interface.

Neo4J also has the Cypher query language for querying the graph.
Neo4J allows you to query the graph for properties of the nodes, traverse the graph, or
navigate the node relationships using language bindings.

Properties of a node can be indexed using the indexing service. Similarly, properties of
relationships or edges can be indexed.

1) We create an index for the nodes using the IndexManager.

Index<Node> nodeIndex = graphDb.index().forNodes("nodes");

2) We index the nodes on their name property. Neo4J uses Lucene [Lucene] as its indexing
service.

Transaction transaction = graphDb.beginTx();
try {
    Index<Node> nodeIndex = graphDb.index().forNodes("nodes");   // the same "nodes" index as in step 1
    nodeIndex.add(martin, "name", martin.getProperty("name"));     // index Martin under name
    nodeIndex.add(pramod, "name", pramod.getProperty("name"));     // index Pramod under name
    transaction.success();
} finally {
    transaction.finish();
}

3) Adding nodes to the index is done inside a transaction. Once the nodes are indexed, we can
search for them using the indexed property.

For example, to find the node named Barbara, we query the index for the name property with the
value Barbara:
Node node = nodeIndex.get("name", "Barbara").getSingle();
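To show what such an index lookup does, here is a hypothetical in-memory `PropertyIndex` class. It is not Lucene and not the Neo4J `Index` API; it simply maps a (key, value) pair to the nodes carrying that value, so a lookup avoids scanning every node. Its `getSingle` only imitates the idea of returning at most one hit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property index, not Lucene or the Neo4J Index API.
class PropertyIndex<N> {
    private final Map<String, List<N>> entries = new HashMap<>();

    // Combines key and value into one lookup slot, e.g. "name=Barbara".
    private static String slot(String key, Object value) {
        return key + "=" + value;
    }

    void add(N node, String key, Object value) {
        entries.computeIfAbsent(slot(key, value), k -> new ArrayList<>()).add(node);
    }

    // All nodes indexed under this key/value pair (possibly none).
    List<N> get(String key, Object value) {
        return entries.getOrDefault(slot(key, value), List.of());
    }

    // null when nothing matches, an error when more than one node matches.
    N getSingle(String key, Object value) {
        List<N> hits = get(key, value);
        if (hits.size() > 1) {
            throw new IllegalStateException("more than one hit for " + slot(key, value));
        }
        return hits.isEmpty() ? null : hits.get(0);
    }
}
```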

4) Given a node, we can retrieve its relationships.

For example, we get the node whose name is Martin and then all of its relationships:
Node martin = nodeIndex.get("name", "Martin").getSingle();
Iterable<Relationship> allRelationships = martin.getRelationships();

We can also restrict the result to INCOMING or OUTGOING relationships:
Iterable<Relationship> incomingRelations = martin.getRelationships(Direction.INCOMING);
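The effect of the direction argument can be sketched with a hypothetical `DirectionalNode` class (plain Java, not Neo4J). Each relationship is recorded as outgoing on its start node and incoming on its end node, and the `Dir` enum only imitates the idea of `Direction`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Neo4J API.
class DirectionalNode {
    enum Dir { INCOMING, OUTGOING, BOTH }

    final String name;
    private final List<String> outgoing = new ArrayList<>(); // entries like "LIKES->NoSQL Distilled"
    private final List<String> incoming = new ArrayList<>(); // entries like "FRIEND<-Pramod"

    DirectionalNode(String name) {
        this.name = name;
    }

    // Records the relationship on both ends: outgoing here, incoming on the other node.
    void relateTo(DirectionalNode other, String type) {
        outgoing.add(type + "->" + other.name);
        other.incoming.add(type + "<-" + name);
    }

    // BOTH returns outgoing relationships first, then incoming ones.
    List<String> relationships(Dir direction) {
        List<String> result = new ArrayList<>();
        if (direction != Dir.INCOMING) {
            result.addAll(outgoing);
        }
        if (direction != Dir.OUTGOING) {
            result.addAll(incoming);
        }
        return result;
    }
}
```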

5) We can make the traverser go top-down or sideways on the graph by using the Order values
DEPTH_FIRST or BREADTH_FIRST, respectively. The traversal has to start at some node.

For example, we find all the nodes at any depth that are related to Barbara by a FRIEND
relationship:

Node barbara = nodeIndex.get("name", "Barbara").getSingle();
Traverser friendsTraverser = barbara.traverse(Order.BREADTH_FIRST, // visit nodes level by level
        StopEvaluator.END_OF_GRAPH,              // do not stop at any particular depth
        ReturnableEvaluator.ALL_BUT_START_NODE,  // return every node reached except Barbara
        EdgeType.FRIEND,                         // EdgeType: an enum implementing RelationshipType
        Direction.OUTGOING);                     // follow FRIEND only in the outgoing direction
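The traversal above can be approximated in plain Java, assuming a hypothetical `FriendTraversal` class rather than the Neo4J traverser: it follows outgoing FRIEND edges to any depth and returns every node reached except the start node. Taking the next node from the front of the frontier gives breadth-first order; taking it from the back gives depth-first order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a FRIEND traversal, not the Neo4J Traverser API.
class FriendTraversal {
    private final Map<String, List<String>> friends = new HashMap<>(); // outgoing FRIEND edges

    void friend(String from, String to) {
        friends.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Visits every node reachable from start; visit order depends on breadthFirst.
    List<String> traverse(String start, boolean breadthFirst) {
        Set<String> visited = new LinkedHashSet<>(); // keeps visit order, prevents cycles
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            // Breadth-first uses the deque as a queue, depth-first as a stack.
            String node = breadthFirst ? frontier.pollFirst() : frontier.pollLast();
            if (!visited.add(node)) {
                continue; // already visited
            }
            for (String next : friends.getOrDefault(node, List.of())) {
                if (!visited.contains(next)) {
                    frontier.addLast(next);
                }
            }
        }
        visited.remove(start); // like ALL_BUT_START_NODE
        return new ArrayList<>(visited);
    }
}
```

With Barbara → Anna, Barbara → Martin, Anna → Dawn, and Martin → Pramod, breadth-first order visits both of Barbara's direct friends before going deeper, while depth-first order finishes one branch before starting the next.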

Neo4J also provides the Cypher query language to query the graph. In the Cypher syntax used
here, a query needs a node to START from.
The start node can be identified by its node ID, a list of node IDs, or index lookups.
Cypher uses:
- the MATCH keyword to match patterns in relationships;
- the WHERE keyword to filter on properties of a node or relationship;
- the RETURN keyword to specify what the query returns: nodes, relationships, or fields on the
nodes or relationships.
With these, we can find which nodes are connected to each other.
For example, we find all nodes connected to Barbara, through either incoming or outgoing
relationships, by using --:

START barbara = node:nodeIndex(name = "Barbara")
MATCH (barbara)--(connected_node)
RETURN connected_node

For directional significance, we can use
MATCH (barbara)<--(connected_node) for incoming relationships or
MATCH (barbara)-->(connected_node) for outgoing relationships.

Matching can also be restricted to specific relationship types using the :RELATIONSHIP_TYPE
convention, returning only the required fields or nodes.

We can query for relationships where a particular relationship property exists. We can also
filter on the properties of relationships and query if a property exists or not.

START barbara = node:nodeIndex(name = "Barbara")
MATCH (barbara)-[relation]->(related_node)
WHERE type(relation) = 'FRIEND' AND relation.share
RETURN related_node.name, relation.since
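What this query asks for can be spelled out as a plain-Java filter. This is a hypothetical `FriendShareQuery` sketch that mirrors the MATCH, WHERE, and RETURN clauses; it is not how Cypher executes queries, and the `share` and `since` values in the test data are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the query's meaning, not how Cypher executes it.
class FriendShareQuery {
    static final class Rel {
        final String type;
        final String start;
        final String end;
        final Map<String, ?> props;

        Rel(String type, String start, String end, Map<String, ?> props) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.props = props;
        }
    }

    static List<String> run(List<Rel> rels, String startName) {
        List<String> rows = new ArrayList<>();
        for (Rel r : rels) {
            // MATCH (barbara)-[relation]->(related_node): outgoing relationships of the start node
            boolean matchesPattern = r.start.equals(startName);
            // WHERE type(relation) = 'FRIEND' AND relation.share
            boolean passesWhere = r.type.equals("FRIEND") && Boolean.TRUE.equals(r.props.get("share"));
            if (matchesPattern && passesWhere) {
                // RETURN related_node.name, relation.since
                rows.add(r.end + ", " + r.props.get("since"));
            }
        }
        return rows;
    }
}
```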

5.​ Scaling
Scaling a graph database is challenging because graph databases are “relationship-oriented”
rather than aggregate-oriented. Three main techniques for scaling graph databases are:

(a)​Scaling by adding more RAM.
●​ Add sufficient RAM to a single server so that the entire dataset fits into memory.
●​ Modern machines can support large amounts of RAM, making this viable for many
use cases.
●​ This approach does not work for extremely large datasets that exceed realistic
memory capacity.

(b) Master-slave replication (read scaling)
● Use a master-slave architecture where writes go to a master server and reads are handled
by slave servers.
●​ Increases availability, as slaves can continue handling reads even if the master fails.
●​ This technique does not scale writes effectively, as all write operations still go to the
master.
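A hypothetical `ReadWriteRouter` (not part of Neo4J) makes the trade-off visible: writes always go to the single master, while reads rotate round-robin across the slaves and keep working when the master is down.

```java
import java.util.List;

// Hypothetical request router for a master-slave setup, not part of Neo4J.
class ReadWriteRouter {
    private final String master;
    private final List<String> slaves;
    private boolean masterUp = true;
    private int next = 0; // index of the slave that serves the next read

    ReadWriteRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    void setMasterUp(boolean up) {
        masterUp = up;
    }

    // Writes do not scale: every write goes to the one master.
    String routeWrite() {
        if (!masterUp) {
            throw new IllegalStateException("no master available for writes");
        }
        return master;
    }

    // Reads scale with the number of slaves and do not depend on the master.
    String routeRead() {
        String server = slaves.get(next);
        next = (next + 1) % slaves.size();
        return server;
    }
}
```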

(c) Application-level sharding
● Use domain-specific knowledge to split and distribute nodes across different servers.
●​ Nodes related to “N.A” can be stored on one server, while nodes related to “Asia” are
stored on another.
●​ Reduces the load on a single server.
●​ Allows for horizontal scaling by distributing data logically across servers.
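Application-level sharding means the application, not the database, decides where each node lives. A minimal sketch, assuming a hypothetical `RegionSharding` class with made-up server names:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-side shard map; the database does not do this routing itself.
class RegionSharding {
    private final Map<String, String> serverForRegion = new HashMap<>();

    // Domain knowledge: all nodes of one region are stored on one server.
    void assign(String region, String server) {
        serverForRegion.put(region, server);
    }

    // The application looks up the server before reading or writing a node.
    String serverFor(String nodeRegion) {
        String server = serverForRegion.get(nodeRegion);
        if (server == null) {
            throw new IllegalArgumentException("no shard for region " + nodeRegion);
        }
        return server;
    }
}
```

A relationship between two nodes on different servers then has to be followed by the application across servers, so the split works best when most traversals stay within one region.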

Each of these techniques addresses different scaling challenges in graph databases, allowing
them to handle larger datasets while maintaining performance and availability.

List and explain use cases where graph databases are a good fit.

1) Connected data: Graph databases are ideal for modeling social relationships, such as
friendships, professional networks, or collaborations.
They allow efficient traversal of relationships, such as finding mutual friends or collaborators.
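Finding mutual friends is a set intersection over FRIEND relationships. A minimal sketch, assuming a hypothetical in-memory `MutualFriends` class that stores each friendship in both directions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory sketch, not a database API.
class MutualFriends {
    private final Map<String, Set<String>> friends = new HashMap<>();

    // FRIEND is treated as mutual, so it is recorded on both people.
    void befriend(String a, String b) {
        friends.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // People who are friends of both a and b, in alphabetical order.
    Set<String> mutual(String a, String b) {
        Set<String> result = new TreeSet<>(friends.getOrDefault(a, Set.of()));
        result.retainAll(friends.getOrDefault(b, Set.of()));
        return result;
    }
}
```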

2) Routing, dispatch, and location-based services: Graph databases can model locations as
nodes and the connections between them as relationships, for routing, logistics, and
location-based recommendations.
For example, a graph database can find the shortest path between two locations.
Applications can recommend nearby restaurants, shops, or points of interest.
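Shortest-path search over locations can be sketched with Dijkstra's algorithm, treating locations as nodes and roads as relationships with a distance property. The `ShortestPath` class and the location names are hypothetical; this is not a specific database's routing function.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical Dijkstra sketch over an in-memory road graph.
class ShortestPath {
    private static final class Step {
        final String node;
        final int dist;

        Step(String node, int dist) {
            this.node = node;
            this.dist = dist;
        }
    }

    private final Map<String, Map<String, Integer>> roads = new HashMap<>();

    // A two-way road whose "distance" property is stored on the relationship.
    void road(String a, String b, int distance) {
        roads.computeIfAbsent(a, k -> new HashMap<>()).put(b, distance);
        roads.computeIfAbsent(b, k -> new HashMap<>()).put(a, distance);
    }

    // Returns the shortest total distance from 'from' to 'to', or -1 if unreachable.
    int distance(String from, String to) {
        Map<String, Integer> best = new HashMap<>();
        PriorityQueue<Step> queue = new PriorityQueue<>(Comparator.comparingInt((Step s) -> s.dist));
        best.put(from, 0);
        queue.add(new Step(from, 0));
        while (!queue.isEmpty()) {
            Step step = queue.poll();
            if (step.dist > best.get(step.node)) {
                continue; // stale entry: a shorter route was already found
            }
            if (step.node.equals(to)) {
                return step.dist; // first time the target is polled, its distance is final
            }
            for (Map.Entry<String, Integer> e : roads.getOrDefault(step.node, Map.of()).entrySet()) {
                int candidate = step.dist + e.getValue();
                if (candidate < best.getOrDefault(e.getKey(), Integer.MAX_VALUE)) {
                    best.put(e.getKey(), candidate);
                    queue.add(new Step(e.getKey(), candidate));
                }
            }
        }
        return -1;
    }
}
```

With roads A–B (4), A–C (1), C–B (2), and B–D (5), the shortest route from A to D goes through C and B, for a total of 8.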

3) Recommendation systems/engines: Recommendations based on behaviour patterns, such as
“customers who bought X also bought Y”.
Suggesting complementary items that are frequently bought together.
An interesting side effect of using graph databases for recommendations is that as the data
size grows, the number of nodes and relationships available to make recommendations
quickly increases.
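The “customers who bought X also bought Y” pattern is a two-hop traversal: from item X to the customers who bought it, then to the other items those customers bought. A minimal sketch, assuming a hypothetical `AlsoBought` class and made-up purchases:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory recommendation sketch, not a product API.
class AlsoBought {
    private final Map<String, Set<String>> boughtBy = new HashMap<>();  // item -> customers
    private final Map<String, Set<String>> purchases = new HashMap<>(); // customer -> items

    // Records a BOUGHT relationship on both the customer and the item.
    void bought(String customer, String item) {
        boughtBy.computeIfAbsent(item, k -> new HashSet<>()).add(customer);
        purchases.computeIfAbsent(customer, k -> new HashSet<>()).add(item);
    }

    // Other items bought by X's customers, most frequent first, ties in alphabetical order.
    List<String> alsoBought(String item) {
        Map<String, Integer> counts = new HashMap<>();
        for (String customer : boughtBy.getOrDefault(item, Set.of())) {
            for (String other : purchases.get(customer)) {
                if (!other.equals(item)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // higher count first
            return byCount != 0 ? byCount : a.compareTo(b);              // then alphabetical
        });
        return ranked;
    }
}
```

Each new purchase adds a relationship that future recommendations can use, which matches the side effect described above.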

4) Fraud detection: Fraud detection relies on identifying unusual or suspicious patterns in
relationships within the data. Visualizing relationships helps uncover hidden connections.

5) Healthcare and bioinformatics: Graph databases can represent relationships between
medical entities, such as patients, diseases, treatments, and research findings.

When not to use Graph Databases?

Graph databases are a poor fit for systems dealing with massive unstructured data, such as text,
images, or videos, without a strong need for relationships (e.g., data lakes). Graph databases
struggle to shard and scale unstructured datasets efficiently because of their inherently
relationship-oriented design.

Writes in graph databases often require re-indexing or updating relationships, which can
degrade performance. Write-heavy workloads are better suited for other NoSQL databases
like Cassandra or DynamoDB.

When you want to update all or a subset of entities—for example, in an analytics solution
where all entities may need to be updated with a changed property—graph databases may not
be optimal since changing a property on all the nodes is not a straightforward operation.
