Module 5 Nosql

This document provides an overview of graph databases, detailing their components (nodes and edges), organizational advantages, querying methods, and performance benefits. It compares graph databases with relational databases, highlighting their flexibility and efficiency in handling interconnected data, and discusses their applications in various domains. Additionally, it covers transaction management, consistency, availability, and scaling challenges, along with suitable use cases and scenarios where graph databases may not be ideal.
NOSQL Database 21CS745

MODULE 5

Graph Databases

A graph database is a specialized database optimized for storing and querying data
represented as graphs. It consists of two main components: nodes (entities)
and edges (relationships), along with their associated properties.

1. Components of a Graph Database

Nodes

 Nodes represent entities in the graph.

 Each node can have properties stored as key-value pairs, providing descriptive
information.

 Nodes can be thought of as the fundamental building blocks of the graph structure.

Edges

 Edges define relationships between nodes.

 Relationships are directional and can carry meaning based on their direction. For
instance, a "likes" relationship implies one-way affinity, while a "friend" relationship
might be bidirectional.

 Edges can also have properties, enabling richer metadata to be stored about the
relationship.
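
The node and edge model above can be sketched as a small in-memory structure. This is only an illustration under stated assumptions: the class PropertyGraphSketch and its methods are hypothetical and are not part of any graph database API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory property graph; all names here are hypothetical.
class PropertyGraphSketch {
    // A node is an entity with key-value properties.
    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
    }

    // An edge is a typed, directed relationship that can carry its own properties.
    static final class Edge {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Edge(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Node createNode(String name) {
        Node node = new Node();
        node.properties.put("name", name);
        nodes.add(node);
        return node;
    }

    Edge relate(Node start, Node end, String type) {
        Edge edge = new Edge(start, end, type);
        edges.add(edge);
        return edge;
    }

    // True only if an edge of this type runs from 'start' to 'end' (direction matters).
    boolean isRelated(Node start, Node end, String type) {
        for (Edge e : edges) {
            if (e.start == start && e.end == end && e.type.equals(type)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PropertyGraphSketch graph = new PropertyGraphSketch();
        Node barbara = graph.createNode("Barbara");
        Node jill = graph.createNode("Jill");
        Edge likes = graph.relate(barbara, jill, "LIKES");
        likes.properties.put("since", 2012); // metadata stored on the relationship
        System.out.println(graph.isRelated(barbara, jill, "LIKES")); // true
        System.out.println(graph.isRelated(jill, barbara, "LIKES")); // false
    }
}
```

A bidirectional "friend" relationship could be modeled here either as two edges or by checking both directions when querying.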

2. Organizational Advantages

Graph databases allow for flexible organization of data. The relationships between nodes are
explicitly stored, enabling the discovery of complex patterns. This explicit storage facilitates
efficient queries without the need for extensive computation or schema changes.

 Nodes and edges are stored once, but they can be interpreted and queried in various
ways.

 This flexibility supports evolving data models, unlike rigid schemas in relational
databases.

3. Querying and Traversal

 Querying in graph databases is achieved by traversing the graph. Traversal refers to
navigating through nodes and edges based on specific criteria.

 Queries can be designed to explore direct or indirect relationships, such as identifying
all connected nodes of a particular type.

Koustav Biswas, Dept. Of CSE, DSATM

 Graph traversal is efficient because relationships are stored persistently rather than
being computed dynamically during a query.
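
As a minimal sketch of traversal, the following breadth-first walk follows stored adjacency lists up to a given depth. It assumes the graph fits in a plain Map; the class name, method name, and the sample friends other than Barbara and Jill are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating traversal over stored relationships.
class TraversalSketch {
    // Returns every node reachable from 'start' in at most maxDepth hops, excluding 'start'.
    static Set<String> reachableWithin(Map<String, List<String>> adjacency,
                                       String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                // Neighbors are looked up directly; nothing is recomputed per query.
                for (String neighbor : adjacency.getOrDefault(node, List.of())) {
                    if (visited.add(neighbor)) {
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = Map.of(
            "Barbara", List.of("Jill", "Carol"),
            "Jill", List.of("Dawn"),
            "Dawn", List.of("Elizabeth"));
        System.out.println(reachableWithin(friends, "Barbara", 1)); // direct friends
        System.out.println(reachableWithin(friends, "Barbara", 2)); // plus friends-of-friends
    }
}
```

Raising maxDepth widens the query from direct to indirect relationships without changing how the data is stored.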

4. Comparisons with Relational Databases

 Schema Adaptability: Relational databases require schema changes and data
migration to accommodate new relationships. Graph databases do not, as relationships
are part of the inherent structure.

 Join Operations: Relational databases use joins to connect data across tables, which
can be slow for complex queries. In graph databases, relationships are explicitly
stored, making traversal fast and efficient.

 Model Flexibility: Graph databases are not limited to a single type of relationship.
Nodes can have diverse and numerous connections, allowing for richer
representations of complex domains.

5. Applications

 Graph databases are well-suited for domains with intricate, interconnected data, such
as social networks, recommendation systems, fraud detection, knowledge graphs, and
supply chain management.

 They enable secondary relationships, hierarchical structures, spatial indexing, and
temporal data to coexist in a single database structure.

6. Performance Benefits

 Persisted relationships in graph databases lead to faster traversal compared to
calculating relationships dynamically in relational databases.

 The ability to handle dynamic and varied relationships efficiently is a significant
advantage over traditional systems.

Features of Graph Databases

Graph databases, exemplified by tools such as Neo4j, OrientDB, and FlockDB, offer a robust
way to model and analyze interconnected data.

1. Consistency in Graph Databases

Consistency in graph databases is crucial due to their reliance on tightly interconnected nodes
and relationships.

Single-Server Consistency

 Most graph databases do not distribute nodes across multiple servers, focusing instead
on maintaining data consistency within a single server.

 Neo4j, for instance, is fully ACID-compliant, ensuring strong consistency:

     Atomicity: All parts of a transaction are completed or none at all.

     Consistency: Transactions maintain the integrity of relationships and properties.

     Isolation: Concurrent transactions do not interfere with one another.

     Durability: Once a transaction is committed, the data is permanently stored.

Cluster Consistency

 Some solutions, like InfiniteGraph, support node distribution across a server cluster.
Neo4j also supports clustering with specific behaviors:

     A master-slave architecture ensures that writes to the master node are
    eventually synchronized to slave nodes.

     Slave nodes are always available for reads, even if data propagation is
    delayed.

     Write operations on a slave node are synchronized to the master immediately, but
    other slaves only update when the master propagates the data.

 This approach gives eventual consistency across the cluster while maintaining
ACID compliance for single-server operations.

Dangling Relationships

 Graph databases ensure that relationships remain valid:

     Start and end nodes must exist for a relationship to be created.

     Nodes cannot be deleted if they have active relationships, preventing dangling
    references.
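
The two rules above can be sketched with a toy in-memory store. This is an assumption-laden illustration, not how any particular database enforces them; IntegritySketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical store that refuses operations which would leave dangling relationships.
class IntegritySketch {
    private final Set<String> nodes = new HashSet<>();
    // Number of relationships attached to each node, as start or as end.
    private final Map<String, Integer> degree = new HashMap<>();

    void createNode(String id) {
        nodes.add(id);
    }

    // Rule 1: both endpoints must already exist.
    void createRelationship(String start, String end) {
        if (!nodes.contains(start) || !nodes.contains(end)) {
            throw new IllegalArgumentException("both start and end nodes must exist");
        }
        degree.merge(start, 1, Integer::sum);
        degree.merge(end, 1, Integer::sum);
    }

    // Rule 2: a node that still has relationships cannot be deleted.
    void deleteNode(String id) {
        if (degree.getOrDefault(id, 0) > 0) {
            throw new IllegalStateException("node still has relationships: " + id);
        }
        nodes.remove(id);
    }

    boolean contains(String id) {
        return nodes.contains(id);
    }

    public static void main(String[] args) {
        IntegritySketch graph = new IntegritySketch();
        graph.createNode("Barbara");
        graph.createNode("Jill");
        graph.createRelationship("Barbara", "Jill");
        try {
            graph.deleteNode("Barbara");
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```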

2. Transactions in Graph Databases

Transactions are a cornerstone of maintaining consistency and integrity in graph
databases. Neo4j provides robust transactional support, emphasizing its ACID compliance.
Neo4j's design prioritizes availability, scalability, and query efficiency. With features
like ACID compliance, high availability clusters, and graph-specific query capabilities, it
suits use cases requiring rich, interconnected data representations.

Here's how it works:

Key Features of Transactions

1. Mandatory for Modifications:

 Write operations like adding nodes or creating relationships require
transactions.

 Without transactions, Neo4j throws a NotInTransactionException.

 Read operations can be performed without transactions.

2. Transaction Workflow:

 A transaction is initiated using beginTx().

 Operations (e.g., creating nodes, setting properties) are performed within the
transaction.

 The transaction is marked successful using success().

 Finally, the transaction is completed using finish().

Example Code for a Transaction

    // Neo4j 1.x embedded Java API; 'database' is assumed to be a GraphDatabaseService.
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    Transaction transaction = database.beginTx(); // Start a transaction
    try {
        Node node = database.createNode(); // Create a new node
        node.setProperty("name", "NoSQL Distilled"); // Set properties
        node.setProperty("published", "2012");
        transaction.success(); // Mark the transaction as successful
    } finally {
        transaction.finish(); // Complete the transaction (commit or roll back)
    }

Key Points to Remember

 If a transaction is not marked as successful, Neo4j assumes a failure and rolls back
the changes when finish() is called.

 Merely marking a transaction as successful without finishing it does not commit the
changes.

 This two-step pattern (success() followed by finish()) differs from the usual RDBMS
approach, where a single commit or rollback call ends the transaction.
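
The success/finish behavior described above can be modeled with a toy class. This is a sketch of the semantics only, not the Neo4j implementation; TransactionSketch, write(), and committed() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the success()/finish() semantics; not the Neo4j API.
class TransactionSketch {
    private final List<String> committed = new ArrayList<>(); // state visible after commit
    private final List<String> pending = new ArrayList<>();   // writes in the open transaction
    private boolean markedSuccessful = false;

    void write(String value) {
        pending.add(value);
    }

    // Marking success alone commits nothing.
    void success() {
        markedSuccessful = true;
    }

    // finish() commits only if success() was called; otherwise it rolls back.
    void finish() {
        if (markedSuccessful) {
            committed.addAll(pending);
        }
        pending.clear();
        markedSuccessful = false;
    }

    List<String> committed() {
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        TransactionSketch tx = new TransactionSketch();
        tx.write("node-1");
        tx.finish();                         // no success(): rolled back
        System.out.println(tx.committed());  // []
        tx.write("node-2");
        tx.success();
        System.out.println(tx.committed());  // [] (not finished yet)
        tx.finish();
        System.out.println(tx.committed());  // [node-2]
    }
}
```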

Advantages of Transaction Management

 Ensures data integrity even in cases of failure or interruption.

 Provides a clear mechanism for handling success and failure, offering developers
fine-grained control.

 Prevents partial or inconsistent updates to the graph.

3. Availability

 Neo4j supports high availability via replicated slaves:

     Slaves can handle both reads and writes.

     Write synchronization:

         Slaves sync writes with the master first.

         Other slaves are updated eventually.

 Uses Apache ZooKeeper for cluster coordination:

     Tracks transaction IDs.

     Identifies the master node.

     Elects a new master when the current one fails.

4. Query Features

1. Query Languages:

 Cypher: Neo4j's declarative query language.

 Gremlin: A domain-specific language for traversing graphs, usable with any graph
database that implements the Blueprints property graph interface.

2. Indexing and Searching:

 Nodes and relationships can be indexed using Lucene.

 Indexed properties are used to locate starting nodes for traversals.

Node barbara = nodeIndex.get("name", "Barbara").getSingle();
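
To illustrate what an index lookup provides, here is a minimal exact-match index built on a HashMap. It stands in for the Lucene-backed index mentioned above and makes no claim about how Lucene works internally; NodeIndexSketch and its methods are hypothetical, and it assumes at most one node per key/value pair.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical exact-match index: property key -> property value -> node id.
class NodeIndexSketch {
    private final Map<String, Map<Object, Long>> index = new HashMap<>();

    void add(long nodeId, String key, Object value) {
        index.computeIfAbsent(key, k -> new HashMap<>()).put(value, nodeId);
    }

    // Returns the node id, or null when nothing is indexed under that key/value.
    Long getSingle(String key, Object value) {
        Map<Object, Long> byValue = index.get(key);
        return byValue == null ? null : byValue.get(value);
    }

    public static void main(String[] args) {
        NodeIndexSketch nodeIndex = new NodeIndexSketch();
        nodeIndex.add(1L, "name", "Barbara");
        nodeIndex.add(2L, "name", "Jill");
        // The id found here would be the starting node for a traversal.
        System.out.println(nodeIndex.getSingle("name", "Barbara")); // 1
    }
}
```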

3. Traversals:

 Traversers explore relationships at various depths:

     Direction filters: INCOMING, OUTGOING, BOTH.

     Order strategies: BREADTH_FIRST, DEPTH_FIRST.

    // EdgeType.FRIEND is an application-defined relationship type.
    Traverser friendsTraverser = barbara.traverse(
        Order.BREADTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        ReturnableEvaluator.ALL_BUT_START_NODE,
        EdgeType.FRIEND,
        Direction.OUTGOING
    );
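
The effect of the direction filters can be sketched independently of any database API. The enum below mirrors the three direction names from the text, but DirectionSketch, Edge, and neighbors() are hypothetical, and the edge list is a stand-in for stored relationships.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of INCOMING / OUTGOING / BOTH filtering.
class DirectionSketch {
    enum Direction { INCOMING, OUTGOING, BOTH }

    // A directed relationship from 'start' to 'end'.
    static final class Edge {
        final String start;
        final String end;

        Edge(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Lists the nodes adjacent to 'node' when edges are followed in the given direction.
    static List<String> neighbors(List<Edge> edges, String node, Direction direction) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (direction != Direction.INCOMING && e.start.equals(node)) {
                result.add(e.end);   // edge leaves 'node'
            }
            if (direction != Direction.OUTGOING && e.end.equals(node)) {
                result.add(e.start); // edge arrives at 'node'
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Edge> friend = List.of(new Edge("Barbara", "Jill"), new Edge("Carol", "Barbara"));
        System.out.println(neighbors(friend, "Barbara", Direction.OUTGOING)); // [Jill]
        System.out.println(neighbors(friend, "Barbara", Direction.INCOMING)); // [Carol]
        System.out.println(neighbors(friend, "Barbara", Direction.BOTH));     // [Jill, Carol]
    }
}
```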

4. Pathfinding:

 Identify paths between nodes:

     Shortest path using algorithms like Dijkstra.

     All paths:

    PathFinder<Path> finder = GraphAlgoFactory.allPaths(
        Traversal.expanderForTypes(FRIEND, Direction.OUTGOING), MAX_DEPTH
    );
    Iterable<Path> paths = finder.findAllPaths(barbara, jill);
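
For the shortest-path case, the following is a from-scratch sketch of Dijkstra's algorithm over weighted edges. It does not use GraphAlgoFactory; ShortestPathSketch, the map-based graph layout, and the sample place names are assumptions made for illustration, and edge weights must be non-negative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-alone Dijkstra; 'graph' maps node -> (neighbor -> edge weight).
class ShortestPathSketch {
    // Returns the minimum total weight from 'source' to every reachable node.
    static Map<String, Double> dijkstra(Map<String, Map<String, Double>> graph, String source) {
        Map<String, Double> dist = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue =
            new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
        queue.add(Map.entry(source, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> current = queue.poll();
            String node = current.getKey();
            if (dist.containsKey(node)) {
                continue; // already settled via a path that was no longer
            }
            dist.put(node, current.getValue());
            for (Map.Entry<String, Double> edge : graph.getOrDefault(node, Map.of()).entrySet()) {
                if (!dist.containsKey(edge.getKey())) {
                    queue.add(Map.entry(edge.getKey(), current.getValue() + edge.getValue()));
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> roads = Map.of(
            "Depot", Map.of("A", 4.0, "B", 1.0),
            "B", Map.of("A", 2.0));
        // Going Depot -> B -> A (1 + 2) beats the direct Depot -> A edge (4).
        System.out.println(dijkstra(roads, "Depot").get("A")); // 3.0
    }
}
```

The weight could be any relationship property, such as distance between two locations.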

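5. Cypher Query Language Syntax:

    START <starting node>
    MATCH <relationship patterns>
    WHERE <conditions>
    RETURN <results>
    ORDER BY <ordering>
    SKIP <records>
    LIMIT <result limit>

The START clause belongs to the early Cypher syntax shown here; in later Neo4j versions it
was deprecated, and queries typically begin with MATCH.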

Graph Databases Use Cases

 Powerful for multi-level traversals, like finding friends-of-friends.

 Applications include social networks, recommendation systems, and path analysis.

5. Scaling

1. Challenges in Scaling Graph Databases:

 Graph databases are relationship-oriented, making sharding difficult because
related nodes often need to be stored on the same server for performance.

 Traversing a graph across servers leads to significant performance
degradation.

2. Common Scaling Techniques:

 Increase RAM:

     Store the working set of nodes and relationships in memory.

     Suitable only if the dataset fits within a realistic amount of RAM.

 Read Scaling with Master-Slave Architecture:

     Use a master node for writes and multiple slave nodes for reads.

     Writing once and reading from many servers is a proven technique in MySQL
    clusters; it helps with availability and read scaling.

     Practical for datasets that cannot fit into a single machine's memory
    but are small enough to replicate across machines.

 Sharding Using Domain Knowledge:

     Perform application-level sharding based on domain-specific criteria.

     For example:

         Nodes related to North America stored on one server.

         Nodes related to Asia stored on another server.

     The application must be aware that the nodes are stored on physically separate
    databases.
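
Application-level sharding by region can be sketched as a simple routing table held by the application. RegionShardRouter, its methods, and the server names are hypothetical; a real deployment would also have to handle relationships that span shards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical router: the application, not the database, decides where a node lives.
class RegionShardRouter {
    private final Map<String, String> serverByRegion = new HashMap<>();

    void assign(String region, String server) {
        serverByRegion.put(region, server);
    }

    String serverFor(String region) {
        String server = serverByRegion.get(region);
        if (server == null) {
            throw new IllegalArgumentException("no shard for region: " + region);
        }
        return server;
    }

    // A relationship whose endpoints live on different servers cannot be traversed locally.
    boolean crossesShards(String regionA, String regionB) {
        return !serverFor(regionA).equals(serverFor(regionB));
    }

    public static void main(String[] args) {
        RegionShardRouter router = new RegionShardRouter();
        router.assign("North America", "graph-server-1"); // server names are made up
        router.assign("Asia", "graph-server-2");
        System.out.println(router.serverFor("Asia"));
        System.out.println(router.crossesShards("North America", "Asia")); // true
    }
}
```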

Suitable Use Cases

1. Connected Data:

 Ideal for social networks or domains with rich interconnections.

 Examples:

     Employees and their collaboration on projects.

     Relationships between entities across domains (e.g., social, spatial,
    commerce).

2. Routing, Dispatch, and Location-Based Services:

 Nodes represent locations or addresses.

 Relationships may include properties like distance.

 Applications:

     Optimize deliveries by minimizing distances.

     Recommend nearby places of interest (e.g., restaurants).

     Notify users about points of sale when nearby.

3. Recommendation Engines:

 Use nodes and relationships to make personalized recommendations:

     "Your friends also bought this."

     "Other visitors to this location liked these activities."

 Graphs can also help in:

     Identifying patterns (e.g., products often bought together).

     Fraud detection through relationship pattern analysis.

 Advantages:

     Recommendations improve as data grows (more nodes and
    relationships).
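
A "your friends also bought this" recommendation can be sketched as a two-hop walk: from the user to friends, then from friends to purchases. The class, method, and item names below are hypothetical, and the ranking rule (most friends first, ties alphabetical) is one arbitrary choice among many.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical two-hop recommendation over FRIEND and BOUGHT relationships.
class RecommendationSketch {
    static List<String> recommend(String user,
                                  Map<String, Set<String>> friends,
                                  Map<String, Set<String>> bought) {
        Set<String> owned = bought.getOrDefault(user, Set.of());
        Map<String, Integer> votes = new HashMap<>();
        for (String friend : friends.getOrDefault(user, Set.of())) {
            for (String item : bought.getOrDefault(friend, Set.of())) {
                if (!owned.contains(item)) {
                    votes.merge(item, 1, Integer::sum); // one vote per friend who bought it
                }
            }
        }
        List<String> ranked = new ArrayList<>(votes.keySet());
        // Most-bought first; ties broken alphabetically for a stable order.
        ranked.sort((a, b) -> {
            int byVotes = Integer.compare(votes.get(b), votes.get(a));
            return byVotes != 0 ? byVotes : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = Map.of("Barbara", Set.of("Jill", "Carol"));
        Map<String, Set<String>> bought = Map.of(
            "Barbara", Set.of("item-C"),
            "Jill", Set.of("item-A", "item-B"),
            "Carol", Set.of("item-B", "item-C"));
        System.out.println(recommend("Barbara", friends, bought)); // [item-B, item-A]
    }
}
```

Adding more friends or purchases only adds edges to walk, which is the sense in which recommendations improve as the graph grows.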

When Not to Use

 Mass Updates:

     Not suitable for operations requiring updates to all or many entities (e.g.,
    analytics requiring global property changes).

 Global Graph Operations:

     Some databases may struggle with handling large amounts of data or
    performing global operations across the graph.

[Figure: Graph Database]

-----------------------------------------END OF MODULE 5---------------------------------------------
