NoSQL Module 5
A graph database is a type of NoSQL database that is designed to handle data with complex
relationships and interconnections. In a graph database, data is stored as nodes and edges,
where nodes represent entities and edges represent the relationships between those entities.
Graph databases allow you to store entities and relationships between these entities. Entities
are also known as nodes, which have properties.
● A node is an instance of an object in the application.
● Relations are known as edges, which can also have properties.
● Edges have directional significance; nodes are organized by relationships, which helps
to find patterns.
The organization of the graph lets the data be stored once and then interpreted in different
ways based on relationships.
Edges represent Relationships.
For example, nodes Martin and Pramod have an edge connecting them with a relationship
type of “friend”.
Relationship types have directional significance; the friend relationship type is bidirectional
but likes is not. When Dawn likes NoSQL Distilled, it does not automatically mean NoSQL
Distilled likes Dawn.
A query on the graph is also known as traversing the graph. An advantage of graph
databases is that we can change the traversing requirements without having to change the
nodes or edges.
For example, if we want to “get all nodes that like NoSQL Distilled,” we can do so without
having to change the existing data or the model of the database, because we can traverse the
graph any way we like.
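The idea that the same stored edges can be traversed in either direction can be sketched in plain Java. This is a minimal in-memory model, not the Neo4J API: the class LikesGraph, its methods, and the sample data (Martin liking the book) are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model for illustration; this is not the Neo4J API.
class LikesGraph {
    // One directed edge: from --type--> to
    static final class Edge {
        final String from;
        final String type;
        final String to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void relate(String from, String type, String to) {
        edges.add(new Edge(from, type, to));
    }

    // Traverse edges of the given type backwards: which nodes point at target?
    List<String> incoming(String type, String target) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type.equals(type) && e.to.equals(target)) {
                result.add(e.from);
            }
        }
        return result;
    }

    // Traverse edges of the given type forwards: which nodes does source point at?
    List<String> outgoing(String type, String source) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type.equals(type) && e.from.equals(source)) {
                result.add(e.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LikesGraph g = new LikesGraph();
        g.relate("Dawn", "LIKES", "NoSQL Distilled");
        g.relate("Martin", "LIKES", "NoSQL Distilled"); // assumed sample data
        g.relate("Martin", "FRIEND", "Pramod");
        System.out.println(g.incoming("LIKES", "NoSQL Distilled")); // [Dawn, Martin]
        System.out.println(g.outgoing("LIKES", "NoSQL Distilled")); // []
    }
}
```

The stored edges never change; only the direction of the lookup does, which is why a new traversal requirement does not force a change to the data model. The empty second result also shows that likes is directional.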
Explain how relationships and properties are represented in a graph, with a neat
diagram.
Suppose we have created two nodes and assigned their name property the values Martin and
Pramod. Once we have more than one node, we can create a relationship:
martin.createRelationshipTo(pramod, FRIEND);
pramod.createRelationshipTo(martin, FRIEND);
A relationship has:
➔ A type (e.g., friend, emp_of).
➔ A start node and an end node.
➔ Properties of its own.
Relationships can store metadata using properties. These properties add more context and
intelligence to the relationships.
Example: timestamps, distance.
● “employee_of” connects people nodes to the company node, with properties like
“hired_year”, which represents the year the person joined the company.
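A relationship with a type, a start node, an end node, and its own properties could be modeled roughly as follows. This is a sketch in plain Java, not the Neo4J API; the class names, the company name ExampleCorp, and the year 2005 are hypothetical values chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node: a bag of properties, always carrying a name.
class PropertyNode {
    final Map<String, Object> properties = new HashMap<>();

    PropertyNode(String name) {
        properties.put("name", name);
    }
}

// Hypothetical relationship: type + start node + end node + own properties.
class PropertyRelationship {
    final String type;
    final PropertyNode start;
    final PropertyNode end;
    final Map<String, Object> properties = new HashMap<>();

    PropertyRelationship(PropertyNode start, String type, PropertyNode end) {
        this.start = start;
        this.type = type;
        this.end = end;
    }

    // Attach metadata to the relationship itself and allow chaining.
    PropertyRelationship with(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    // Check whether a given property exists on this relationship.
    boolean has(String key) {
        return properties.containsKey(key);
    }
}

class EmploymentDemo {
    public static void main(String[] args) {
        PropertyNode martin = new PropertyNode("Martin");
        PropertyNode company = new PropertyNode("ExampleCorp"); // assumed name
        PropertyRelationship job =
            new PropertyRelationship(martin, "employee_of", company)
                .with("hired_year", 2005); // assumed value
        System.out.println(job.type + " hired_year=" + job.properties.get("hired_year"));
        System.out.println("has distance? " + job.has("distance"));
    }
}
```

Keeping hired_year on the relationship rather than on either node reflects that the year describes the connection between the person and the company, not the person or the company alone.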
1. Consistency
On a single server, graph databases like Neo4J are fully ACID-compliant.
When Neo4J runs in cluster mode:
● Writes are directed to the master node, and are eventually synchronised to the slave
nodes.
● Reads are allowed from slave nodes.
2. Transactions
Transactions in Neo4J are ACID-compliant.
In Neo4J, any changes must occur within a transaction block. If no transaction is started,
operations throw a NotInTransactionException.
● In a typical transaction block, we start a transaction on the database, then create a node
and set properties on it.
● We mark the transaction as a success and finally complete it by calling finish.
● A transaction has to be marked as a success; otherwise Neo4J assumes that it was a
failure and rolls it back when finish is issued.
● Setting success without issuing finish also does not commit the data to the database.
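The success/finish rules in the list above can be illustrated with a small simulation. This is not Neo4J code: SketchStore and SketchTx are hypothetical classes that only mimic the described behavior, namely that data is committed only when success has been marked and finish is then issued.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database; holds only committed node names.
class SketchStore {
    final List<String> committed = new ArrayList<>();

    SketchTx beginTx() {
        return new SketchTx(this);
    }
}

// Hypothetical transaction that mimics the success/finish rules described above.
class SketchTx {
    private final SketchStore store;
    private final List<String> pending = new ArrayList<>();
    private boolean success = false;
    private boolean finished = false;

    SketchTx(SketchStore store) {
        this.store = store;
    }

    void createNode(String name) {
        if (finished) {
            throw new IllegalStateException("transaction already finished");
        }
        pending.add(name); // buffered until finish
    }

    // Marks the transaction as successful; commits nothing by itself.
    void success() {
        success = true;
    }

    // Commits if success was marked, otherwise discards (rolls back).
    void finish() {
        if (!finished && success) {
            store.committed.addAll(pending);
        }
        pending.clear();
        finished = true;
    }
}

class TransactionDemo {
    public static void main(String[] args) {
        SketchStore store = new SketchStore();
        SketchTx tx = store.beginTx();
        try {
            tx.createNode("NoSQL Distilled");
            tx.success();
        } finally {
            tx.finish(); // always issued, even if an exception occurred
        }
        System.out.println(store.committed); // [NoSQL Distilled]
    }
}
```

Placing finish in a finally block means that an exception thrown before success leaves the flag unset, so the buffered changes are discarded rather than committed.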
3. Availability
Neo4J, as of version 1.8, achieves high availability by providing for replicated slaves. These
slaves can also handle writes: when a slave is written to, it synchronizes the write to the
current master, and the write is committed first at the master and then at the slave.
Neo4J uses Apache ZooKeeper to keep track of the last transaction IDs persisted
on each slave node and the current master node.
Once a server starts up, it communicates with ZooKeeper and finds out which server is the
master.
If the server is the first one to join the cluster, it becomes the master; when a master goes
down, the cluster elects a master from the available nodes, thus providing high availability.
4. Query Features
Neo4J also has the Cypher query language for querying the graph.
Neo4J allows you to query the graph for properties of the nodes, traverse the graph, or
navigate the node relationships using language bindings.
Properties of a node can be indexed using the indexing service. Similarly, properties of
relationships or edges can be indexed.
2) We can perform indexing of the nodes for the name property. Neo4J uses Lucene [Lucene]
as its indexing service.
3) Adding nodes to the index is done inside the context of a transaction. Once the nodes are
indexed, we can search them using the indexed property.
For example, to search for the node with the name Barbara, we query the index for
the name property with a value of Barbara.
Node node = nodeIndex.get("name", "Barbara").getSingle();
5) We can make the traverser go top-down or sideways on the graph by using Order values of
DEPTH_FIRST or BREADTH_FIRST, respectively. The traversal has to start at some node.
For example, we try to find all the nodes at any depth that are related as a FRIEND with
Barbara.
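The difference between the two traversal orders can be sketched in plain Java. This is not the Neo4J Traverser API: FriendTraversalSketch, its Order enum, and the friend data below are assumptions made for illustration. The sketch follows outgoing FRIEND edges from a start node to any depth and returns every reachable node except the start.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names mirror the text but this is not Neo4J code.
class FriendTraversalSketch {
    enum Order { BREADTH_FIRST, DEPTH_FIRST }

    // Outgoing FRIEND edges: node -> friends, in insertion order
    private final Map<String, List<String>> friends = new HashMap<>();

    void addFriend(String from, String to) {
        friends.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        friends.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Visits every node reachable from start and returns them in visit order,
    // excluding the start node itself.
    List<String> traverse(String start, Order order) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.addLast(start);
        while (!frontier.isEmpty()) {
            // Taking from the front (queue) goes sideways, level by level;
            // taking from the back (stack) goes down one branch first.
            String current = order == Order.BREADTH_FIRST
                ? frontier.pollFirst()
                : frontier.pollLast();
            if (!seen.add(current)) {
                continue; // already visited via another path
            }
            if (!current.equals(start)) {
                visited.add(current);
            }
            for (String next : friends.getOrDefault(current, new ArrayList<>())) {
                if (!seen.contains(next)) {
                    frontier.addLast(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        FriendTraversalSketch g = new FriendTraversalSketch();
        g.addFriend("Barbara", "Carol");     // assumed sample data
        g.addFriend("Barbara", "Elizabeth");
        g.addFriend("Carol", "Dawn");
        g.addFriend("Elizabeth", "Jill");
        System.out.println(g.traverse("Barbara", Order.BREADTH_FIRST));
        // [Carol, Elizabeth, Dawn, Jill]
        System.out.println(g.traverse("Barbara", Order.DEPTH_FIRST));
        // [Elizabeth, Jill, Carol, Dawn]
    }
}
```

Both orders reach the same set of nodes; only the visiting sequence differs, which matters when a traversal is cut off early or results are consumed lazily.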
Neo4J also provides the Cypher query language to query the graph. Cypher needs a node to
START the query.
The start node can be identified by its node ID, a list of node IDs, or index lookups.
Cypher uses:
- the MATCH keyword for matching patterns in relationships;
- the WHERE keyword for filtering on the properties of a node or relationship;
- the RETURN keyword for specifying what gets returned by the query: nodes,
relationships, or fields on the nodes or relationships.
We can find which nodes are connected to each other.
For example, we find all nodes connected to Barbara, either incoming or outgoing, by using the
-- pattern in the MATCH clause.
We can query for relationships where a particular relationship property exists. We can also
filter on the properties of relationships and query if a property exists or not.
5. Scaling
Scaling a graph database is challenging because graph databases are “relationship-oriented”
rather than aggregate-oriented. The three main techniques for scaling a graph database are:
(a) Scaling by adding more RAM.
● Add sufficient RAM to a single server so that the entire dataset fits into memory.
● Modern machines can support large amounts of RAM, making this viable for many
use cases.
● This approach does not work for extremely large datasets that exceed realistic
memory capacity.
Each of these techniques addresses different scaling challenges in graph databases, allowing
them to handle larger datasets while maintaining performance and availability.
List and explain the use cases that graph databases support.
1) Connected data: Graph databases are ideal for modeling social relationships, such as
friendships, professional networks, or collaborations. They support efficient traversal of
relationships, such as finding mutual friends or collaborations.
2) Routing, dispatch and location-based services: Graph databases can model locations as
nodes and the routes between them as relationships, for routing, logistics and location-based
recommendations.
For example, a graph database can find the shortest path between two locations.
Applications can recommend restaurants, shops or points of interest nearby.
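Both use cases above reduce to simple graph operations, sketched here in plain Java. This is an illustrative in-memory model, not a graph database API: the class UseCaseGraphSketch, its method names, and the sample friendships and locations are all assumptions. Mutual friends are computed as the intersection of two neighbor sets, and the shortest path is found with a breadth-first search that counts hops (edge weights such as distance are not modeled).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of two use cases on an undirected graph.
class UseCaseGraphSketch {
    // node -> sorted set of neighbours (sorted for deterministic output)
    private final Map<String, Set<String>> neighbours = new HashMap<>();

    void connect(String a, String b) {
        neighbours.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        neighbours.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    private Set<String> neighboursOf(String node) {
        return neighbours.getOrDefault(node, new TreeSet<>());
    }

    // Connected data: nodes adjacent to both a and b, e.g. mutual friends.
    Set<String> common(String a, String b) {
        Set<String> result = new TreeSet<>(neighboursOf(a));
        result.retainAll(neighboursOf(b));
        return result;
    }

    // Routing: breadth-first search for a path with the fewest hops.
    // Returns an empty list when no path exists.
    List<String> shortestPath(String from, String to) {
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(from, null); // the start has no predecessor
        queue.addLast(from);
        while (!queue.isEmpty()) {
            String current = queue.pollFirst();
            if (current.equals(to)) {
                break;
            }
            for (String next : neighboursOf(current)) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.addLast(next);
                }
            }
        }
        if (!cameFrom.containsKey(to)) {
            return new ArrayList<>();
        }
        LinkedList<String> path = new LinkedList<>();
        for (String at = to; at != null; at = cameFrom.get(at)) {
            path.addFirst(at); // walk predecessors back to the start
        }
        return path;
    }

    public static void main(String[] args) {
        UseCaseGraphSketch social = new UseCaseGraphSketch();
        social.connect("Martin", "Pramod");   // assumed sample data
        social.connect("Martin", "Barbara");
        social.connect("Pramod", "Barbara");
        social.connect("Pramod", "Carol");
        System.out.println(social.common("Martin", "Pramod")); // [Barbara]

        UseCaseGraphSketch roads = new UseCaseGraphSketch();
        roads.connect("Depot", "North");      // assumed sample data
        roads.connect("North", "Customer");
        roads.connect("Depot", "South");
        roads.connect("South", "East");
        roads.connect("East", "Customer");
        System.out.println(roads.shortestPath("Depot", "Customer"));
        // [Depot, North, Customer]
    }
}
```

Real routing systems usually weight edges by distance or travel time and use an algorithm such as Dijkstra's; hop-count BFS is used here only to keep the sketch short.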
Graph databases are a poor fit for systems dealing with massive unstructured data like text,
images, or videos without a strong need for relationships (e.g., data lakes). They struggle to
shard and scale such datasets efficiently because of their inherently relationship-oriented design.
Writes in graph databases often require re-indexing or updating relationships, which can
degrade performance. Write-heavy workloads are often better suited to other NoSQL databases
like Cassandra or DynamoDB.
When you want to update all or a subset of entities—for example, in an analytics solution
where all entities may need to be updated with a changed property—graph databases may not
be optimal since changing a property on all the nodes is not a straightforward operation.