NoSQL 4
| Aspect | Key-Value Database | Column Family Database | Document Database |
|---|---|---|---|
| Use Cases | Simple lookups and fast retrieval by key (e.g., caching, session stores). | Large-scale, distributed applications where data is mostly read by column (e.g., analytics, time-series data). | Complex, hierarchical data with flexible structures (e.g., content management, user profiles). |
| Scalability | Highly scalable, horizontal scaling is the default method. | Highly scalable, optimized for horizontal scaling. | Scalable, though horizontal scaling may require more effort compared to key-value stores. |
| Example Systems | Redis, Riak, DynamoDB | Apache Cassandra, HBase, Google Bigtable | MongoDB, CouchDB, RavenDB |
| ACID Properties | Provides tunable consistency, but does not fully support ACID. | Often does not fully support ACID; typically BASE (Basically Available, Soft state, Eventually consistent). | May support ACID transactions (like in MongoDB), but generally provides eventual consistency. |
| Data Storage | Data is stored in a flat key-value format, with minimal structure. | Data is stored in columns, grouped into families, which allows efficient storage of data that is often accessed together. | Data is stored as documents, usually in a binary format like BSON (in MongoDB) or as JSON. |
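To make the Data Storage comparison concrete, here is a minimal Python sketch of how one user record might be laid out under each of the three models. The record, its field names, and the family names (`profile`, `activity`) are invented for illustration. Plain dictionaries stand in for the real storage engines, which add encoding, indexing, and persistence on top.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value_store = {
    "user:42": json.dumps({"name": "Asha", "city": "Pune"}),
}

# Column family: row key -> column family -> column -> value.
# Columns that are usually read together live in the same family.
column_family_store = {
    "user:42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2024-01-05", "logins": 17},
    },
}

# Document: a nested, self-describing record that can be queried by field.
document_store = [
    {"_id": 42, "name": "Asha", "address": {"city": "Pune"}, "tags": ["admin"]},
]


def read_family(row_key, family):
    """Fetch only one column family of a row, leaving the others untouched."""
    return column_family_store[row_key][family]


def find_by_city(city):
    """Query inside documents by a nested field."""
    return [doc for doc in document_store if doc["address"]["city"] == city]
```

The key-value store can only hand back the whole value for `user:42`, while the other two layouts allow reading part of a record (one family) or filtering on a nested field.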
1. Consistency
Consistency ensures that a database is always in a valid state. In other words, it
ensures that data adheres to the rules and constraints defined within the system
(e.g., data integrity, foreign key constraints, and business logic).
Graph Databases:
o Most graph databases are ACID-compliant, ensuring strong consistency
during transactions. They maintain consistency within the graph
structure and preserve relationships during updates.
o In distributed deployments, graph databases may use eventual
consistency (depending on configuration), but they typically provide
strong consistency in a single-node setup.
Relational Databases:
o In distributed setups (e.g., sharded databases), consistency may be
only eventual, but distributed relational databases like Google
Spanner or CockroachDB aim for strong consistency.
2. Transactions
A transaction is a sequence of operations that are treated as a single unit. A database
ensures that all operations within a transaction are completed successfully (commit)
or rolled back (rollback) in case of failure.
Graph Databases:
o ACID Transactions: Most graph databases (like Neo4j) support ACID
transactions, meaning that the changes made during a transaction
are atomic, consistent, isolated, and durable.
o Support for Complex Relationships: Transactions involving multiple
nodes and edges are treated as a single atomic unit, which is essential
when modifying deeply connected data.
Relational Databases:
o ACID Transactions: Relational databases follow the ACID properties.
Transactions ensure that the database maintains integrity during
operations like inserts, updates, and deletes.
o Transaction Isolation: Relational databases use isolation levels (e.g.,
Read Committed, Serializable) to control how much concurrent
transactions can see of each other's uncommitted or in-progress changes.
3. Availability
Availability refers to the ability of a database to remain operational and accessible,
even in the face of failures. In distributed systems, it means that the database can
serve read and write requests, even if some components fail.
Graph Databases:
o High Availability: Many graph databases support replication and
distributed architecture to ensure availability. For example, Neo4j
offers clustering with automatic failover to keep the system available if
a node fails.
o In certain configurations, graph databases offer eventual consistency to
ensure high availability, allowing updates to propagate across nodes
asynchronously.
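The trade-off between availability and consistency under asynchronous propagation can be sketched with a toy primary/follower pair in Python. The class and method names here are hypothetical and do not correspond to any graph database's API. The point is only that a write is acknowledged before the follower has seen it, so a follower read may be briefly stale.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Toy primary/follower pair with asynchronous replication."""

    def __init__(self):
        self.primary = Replica()
        self.follower = Replica()
        self.pending = deque()  # writes not yet applied to the follower

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_follower(self, key):
        # May return stale data until replication catches up.
        return self.follower.data.get(key)

    def replicate(self):
        # Apply queued writes to the follower in their original order.
        while self.pending:
            key, value = self.pending.popleft()
            self.follower.data[key] = value
```

After `write("x", 1)`, a follower read returns `None` until `replicate()` runs; afterwards both copies agree, which is the "eventual" part of eventual consistency.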
4. Scaling
Scaling refers to a system's ability to handle increased loads by adding more
resources, either vertically (scaling up) or horizontally (scaling out).
Graph Databases:
o Horizontal Scaling: Many graph databases support horizontal scaling,
meaning they can spread data or read traffic across multiple nodes to
handle larger workloads. Examples include Neo4j's Causal Clustering
and Amazon Neptune, both of which add replicas to serve
high-throughput applications.
o Distributed Graph Databases: As relationships are key to graph data,
distributing them across multiple nodes must be done carefully to
minimize cross-node operations, which can affect performance.
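One rough way to see why placement matters is to count the edges whose two endpoints end up on different machines. The Python sketch below does this for two made-up placements of the same small graph; the node names, machine numbers, and function name are assumptions for illustration only.

```python
def count_cross_node_edges(edges, placement):
    """Count edges whose endpoints are stored on different machines."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]

# Two hypothetical ways of assigning the same nodes to machines 0 and 1.
by_community = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
alternating = {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0}

cut_by_community = count_cross_node_edges(edges, by_community)
cut_alternating = count_cross_node_edges(edges, alternating)
```

Keeping the tightly linked nodes A, B, and C together leaves one cross-machine edge, while the alternating placement leaves four, so a traversal would need far more network hops.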
Summary Table:
Aspect Graph Databases Relational Databases
1. Properties of Graphs
a) Type of Graph
Undirected Graph: In this graph, edges have no direction. If there is an edge
between nodes A and B, you can traverse from A to B and from B to A.
Directed Graph (Digraph): Edges have direction. An edge from node A to B is
different from an edge from B to A.
Weighted Graph: Each edge in the graph has a weight, often representing
costs, distances, or capacities.
Unweighted Graph: Edges do not have any associated weight; they only
indicate a connection.
Cyclic Graph: Contains at least one cycle, meaning there’s a path that starts
and ends at the same node.
Acyclic Graph: Does not contain any cycles. A Directed Acyclic Graph (DAG) is
a directed graph with no cycles.
Connected Graph: A graph is connected if there is a path between every pair
of nodes.
Disconnected Graph: A graph is disconnected if at least one pair of nodes is
not connected by a path.
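The graph types above map onto small differences in how a graph is stored. As a minimal Python sketch (the function names and edge-tuple format are assumptions), an adjacency dictionary can represent directed or undirected, weighted or unweighted graphs, and a breadth-first search can test whether an undirected graph is connected.

```python
from collections import defaultdict, deque


def build_adjacency(edges, directed=False):
    """Return {node: {neighbor: weight}}; unweighted edges get weight 1."""
    adj = defaultdict(dict)
    for edge in edges:
        u, v = edge[0], edge[1]
        weight = edge[2] if len(edge) == 3 else 1
        adj[u][v] = weight
        if directed:
            adj.setdefault(v, {})  # make sure v still appears as a node
        else:
            adj[v][u] = weight  # undirected: store the edge both ways
    return dict(adj)


def is_connected(adj):
    """True if every node is reachable from an arbitrary start (undirected graphs)."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(adj)
```

For example, `build_adjacency([("A", "B", 5)], directed=True)` stores the edge only from A to B, while the default undirected form stores it in both directions.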
b) Graph Density
Sparse Graph: A graph is considered sparse if the number of edges is much
less than the maximum possible number of edges.
Dense Graph: A graph is dense if the number of edges is close to the
maximum possible number of edges.
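Density can be computed as the ratio of actual edges to the maximum possible number of edges. The Python sketch below assumes a simple graph (no self-loops, no parallel edges), where the maximum is n(n-1)/2 for undirected graphs and n(n-1) for directed graphs; the function name is an illustrative choice.

```python
def density(num_nodes, num_edges, directed=False):
    """Fraction of possible edges that are present in a simple graph."""
    if num_nodes < 2:
        return 0.0
    max_edges = num_nodes * (num_nodes - 1)
    if not directed:
        max_edges //= 2  # each undirected pair is counted once
    return num_edges / max_edges
```

A value near 0 indicates a sparse graph and a value near 1 a dense one; where to draw the line between the two is a judgment call that depends on the application.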
c) Degree of a Graph
Degree of a node refers to the number of edges connected to it.
o In-degree: The number of edges directed towards a node (relevant for
directed graphs).
o Out-degree: The number of edges directed away from a node (relevant
for directed graphs).
d) Graph Connectivity
Strongly Connected Graph (directed graphs only): There is a directed path
from every node to every other node.
Weakly Connected Graph: A directed graph that becomes connected when edge
directions are ignored, even if some nodes cannot reach others along directed paths.
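The difference between the two kinds of connectivity can be checked directly. This Python sketch assumes a non-empty node list and edges given as (from, to) pairs. Strong connectivity is tested by searching once along the edges and once against them; weak connectivity by searching with directions ignored.

```python
def reachable(adj, start):
    """Set of nodes reachable from start by following edges in adj."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def is_strongly_connected(nodes, edges):
    forward = {n: set() for n in nodes}
    backward = {n: set() for n in nodes}
    for u, v in edges:
        forward[u].add(v)
        backward[v].add(u)
    start = next(iter(nodes))
    # start must reach every node, and every node must reach start.
    everyone = set(nodes)
    return reachable(forward, start) == everyone and reachable(backward, start) == everyone


def is_weakly_connected(nodes, edges):
    undirected = {n: set() for n in nodes}
    for u, v in edges:
        undirected[u].add(v)
        undirected[v].add(u)
    return reachable(undirected, next(iter(nodes))) == set(nodes)
```

A chain A -> B -> C is weakly but not strongly connected, because C cannot reach A; adding the edge C -> A makes it strongly connected.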
e) Planarity
Planar Graph: A graph that can be drawn on a plane without any of its edges
crossing.
Non-Planar Graph: A graph that cannot be drawn in a plane without edge
intersections.
f) Subgraph
A subgraph is a graph formed from a subset of the nodes and edges of the original
graph.
2. Properties of Nodes
a) Node Degree
Degree: The number of edges connected to a node.
o Undirected Graph: The degree is simply the count of edges.
o Directed Graph: In-degree (edges coming in) and out-degree (edges
going out).
o Weighted Graph: The weighted degree (also called strength) is the sum
of the weights of the edges connected to the node.
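The three notions of degree can be computed from an edge list in a few lines. In this Python sketch the function names and the tuple formats `(u, v)` and `(u, v, weight)` are assumptions made for illustration.

```python
from collections import Counter


def degrees(edges):
    """Undirected degree: each edge counts once for both endpoints."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def in_out_degrees(edges):
    """Directed degrees: the edge (u, v) points from u to v."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def weighted_degrees(weighted_edges):
    """Weighted degree: sum of the weights of the edges touching each node."""
    strength = Counter()
    for u, v, w in weighted_edges:
        strength[u] += w
        strength[v] += w
    return strength
```

Because `Counter` returns 0 for missing keys, a node with no outgoing edges reports an out-degree of 0 without special handling.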
b) Centrality
Centrality measures are used to determine the importance of a node within a graph.
Degree Centrality: The number of edges connected to a node. Nodes with
higher degrees are considered more central.
Betweenness Centrality: Measures how often a node acts as a bridge along
the shortest path between two other nodes.
Closeness Centrality: Measures how close a node is to all other nodes in the
graph.
Eigenvector Centrality: A measure of the influence of a node in a network,
based on the number and quality of connections.
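Two of these measures are simple enough to sketch in plain Python. The code below assumes an unweighted, undirected graph with at least two nodes, stored as `{node: set_of_neighbors}`. It normalizes degree by n - 1 and defines closeness as the number of reachable nodes divided by the sum of shortest-path lengths to them, which is one common convention among several.

```python
from collections import deque


def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def closeness_centrality(adj, node):
    """Reachable nodes divided by the sum of BFS distances to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# A star graph: the hub touches every other node.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
```

In the star graph the hub scores 1.0 on both measures, while each outer node has degree centrality 1/3 and closeness 3/5, matching the intuition that the hub is the most central node.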
c) Node Clustering (Community Detection)
Clustering Coefficient: A measure of the degree to which nodes in a graph
tend to cluster together. It measures the likelihood that two neighbors of a
node are connected to each other.
Community: A set of nodes that are more densely connected to each other
than to other nodes in the graph. Identifying communities helps in network
analysis (e.g., social network groups).
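The local clustering coefficient of a node can be computed by checking every pair of its neighbors. This is a minimal Python sketch for an undirected graph stored as `{node: set_of_neighbors}`; the sample graph and function name are invented for illustration.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


# A triangle A-B-C with a pendant node D attached to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

Node B scores 1.0 because its two neighbors are linked to each other, while node A scores 1/3 because only one of its three neighbor pairs (B and C) is connected.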
d) Node Connectivity
Articulation Node (Cut Vertex): A node whose removal would disconnect the
graph or increase the number of connected components.
Isolated Node: A node with no edges connected to it.
Leaf Node: A node with only one edge connected to it, often found in tree-like
structures.
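These node roles can be found with a direct, unoptimized check. The Python sketch below removes each node in turn and recounts the connected components, which takes O(V * (V + E)) time. Linear-time algorithms for articulation points exist (for example Tarjan's), but this version follows the definition more literally. The function names and sample graph are assumptions.

```python
def count_components(adj, removed=None):
    """Number of connected components, ignoring the `removed` node if given."""
    seen = set()
    components = 0
    for start in adj:
        if start == removed or start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adj[node]:
                if neighbor != removed and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


def articulation_nodes(adj):
    """Nodes whose removal increases the number of connected components."""
    base = count_components(adj)
    return {n for n in adj if count_components(adj, removed=n) > base}


def isolated_nodes(adj):
    return {n for n, neighbors in adj.items() if len(neighbors) == 0}


def leaf_nodes(adj):
    return {n for n, neighbors in adj.items() if len(neighbors) == 1}


# Path A - B - C plus an isolated node X.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "X": set()}
```

In this graph B is the only articulation node, A and C are leaves, and X is isolated.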
e) Node Types in Special Graphs
Source Node: In a directed graph, a node with only outgoing edges (in-degree
is 0).
Sink Node: In a directed graph, a node with only incoming edges (out-degree
is 0).
Root Node: In tree-like structures, the top node from which all other nodes
are descended.
f) Node Labeling
Nodes can be labeled or given attributes that help to identify or categorize them. This
is especially important in weighted or attributed graphs, where nodes may hold extra
information (e.g., user IDs, product IDs, or labels like “active” or “inactive”).
4. Structural Properties
Eulerian Path/Circuit: A path or circuit that visits every edge in the graph
exactly once. A connected undirected graph has an Eulerian circuit exactly
when every vertex has even degree, and an Eulerian path when zero or two
vertices have odd degree.
Hamiltonian Path/Circuit: A path or circuit that visits every vertex exactly
once. Deciding whether a Hamiltonian path exists is NP-complete, so no
polynomial-time algorithm is known for the general case.
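Unlike the Hamiltonian case, the Eulerian conditions are cheap to test: count the odd-degree vertices and check that all edges lie in one connected component. The Python sketch below assumes an undirected graph without self-loops, stored as `{node: [neighbors]}`; the function names are illustrative.

```python
def _edges_in_one_component(adj):
    """True if all nodes that have at least one edge are mutually reachable."""
    active = [n for n in adj if adj[n]]
    if not active:
        return True  # no edges at all
    seen = {active[0]}
    stack = [active[0]]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == set(active)


def odd_degree_count(adj):
    return sum(1 for n in adj if len(adj[n]) % 2 == 1)


def has_eulerian_circuit(adj):
    return _edges_in_one_component(adj) and odd_degree_count(adj) == 0


def has_eulerian_path(adj):
    return _edges_in_one_component(adj) and odd_degree_count(adj) in (0, 2)


triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
```

The triangle has an Eulerian circuit because every vertex has degree 2. The three-node path has only an Eulerian path, since its two endpoints have odd degree.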
Graph Databases
In graph databases, data is represented in the form of graphs, where:
Nodes represent entities (e.g., people, products, places).
Edges represent the relationships between these entities (e.g., a person "likes"
a post, or a product "belongs to" a category).
Properties can be added to both nodes and edges to store extra information
(like names, dates, costs).
Now, let's understand the different types of graphs that can exist in a graph database,
but from a database perspective.
2. Flow Network
Database View: A flow network is a special type of directed graph where each
edge has a capacity (like how much "flow" can go through it). This is used
when you need to track resources, like goods, data, or money, moving from
one node to another.
Example: Imagine a system where packages are moving between different
warehouses. Each warehouse connection (edge) has a limit on how many
packages can pass through it.
o Graph Database Example:
    (WarehouseA)-[:SHIPS {capacity: 100}]->(WarehouseB)
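A typical question for a flow network is how much can move from one node to another in total. Below is a minimal Python sketch of the Edmonds-Karp maximum-flow algorithm, chosen here as one standard method. WarehouseC, WarehouseD, and all capacities other than the 100 on the A-to-B edge are hypothetical additions, and the function assumes capacities are given as `{from: {to: capacity}}`.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly push flow along a shortest augmenting path."""
    # Residual capacities, including zero-capacity reverse edges.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    total = 0
    while True:
        # BFS for a path from source to sink that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, spare in residual[u].items():
                if spare > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Walk back from the sink to collect the path, then push the bottleneck.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck


shipments = {
    "WarehouseA": {"WarehouseB": 100, "WarehouseC": 50},
    "WarehouseB": {"WarehouseD": 60},
    "WarehouseC": {"WarehouseD": 70},
    "WarehouseD": {},
}
```

Here at most 110 packages can flow from WarehouseA to WarehouseD: 60 through WarehouseB (limited by the B-to-D edge) and 50 through WarehouseC (limited by the A-to-C edge).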
3. Bipartite Graph
Database View: A bipartite graph has two types of nodes. Edges exist only
between nodes of different types, never between two nodes of the same type.
This is useful when you have two distinct sets of entities that are connected
to each other in some way.
Example: In a job portal, one set of nodes represents workers, and another set
represents jobs. A worker can be assigned to a job, but workers are never
linked directly to other workers, nor jobs to other jobs.
o Graph Database Example:
    (WorkerA)-[:ASSIGNED_TO]->(Job1)
    (WorkerB)-[:ASSIGNED_TO]->(Job2)
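Whether a graph is bipartite can be tested by trying to split its nodes into two sets with a breadth-first search. The Python sketch below treats the assignments as undirected edges and uses a hypothetical function name, `two_color`. It returns the split if one exists and `None` if some edge joins two nodes of the same set.

```python
from collections import deque


def two_color(adj):
    """Return {node: 0 or 1} if the undirected graph is bipartite, else None."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in color:
                    color[neighbor] = 1 - color[node]  # put it in the other set
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    return None  # an edge joins two nodes of the same set
    return color


assignments = {
    "WorkerA": {"Job1"},
    "WorkerB": {"Job2"},
    "Job1": {"WorkerA"},
    "Job2": {"WorkerB"},
}
```

For the worker/job data the function places workers in one set and jobs in the other. Adding an edge between two workers who share a job would create an odd cycle and make it return `None`.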
4. Multigraph
Database View: A multigraph is a graph where there can be multiple edges
between the same two nodes, each edge representing a different
relationship.
Example: Imagine a social media platform where a user can interact with a
post in different ways, such as liking it, commenting on it, or sharing it.
These different interactions are represented by multiple edges between the
same user and the same post.
o Graph Database Example:
    (UserA)-[:LIKES]->(Post1)
    (UserA)-[:SHARES]->(Post1)
    (UserA)-[:COMMENTS]->(Post1)
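In code, the key difference from a simple graph is that a pair of nodes maps to a list of edges rather than to a single edge. The small Python class below is a hypothetical in-memory sketch, not the storage format of any particular graph database.

```python
from collections import defaultdict


class MultiGraph:
    """Directed multigraph: any number of typed edges between the same two nodes."""

    def __init__(self):
        self._edges = defaultdict(list)  # (source, target) -> [relationship types]

    def add_edge(self, source, rel_type, target):
        self._edges[(source, target)].append(rel_type)

    def relationships(self, source, target):
        """All relationship types from source to target, in insertion order."""
        return list(self._edges.get((source, target), []))


g = MultiGraph()
g.add_edge("UserA", "LIKES", "Post1")
g.add_edge("UserA", "SHARES", "Post1")
g.add_edge("UserA", "COMMENTS", "Post1")
```

Calling `g.relationships("UserA", "Post1")` returns all three relationship types, whereas a simple graph could record only that the two nodes are connected.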
5. Weighted Graph
Database View: In a weighted graph, each edge has a weight or value that
represents something like cost, distance, or time. This is used when you need
to find the shortest path or the most efficient route between nodes.
Example: In a navigation system, roads between cities are represented by
edges, and each edge has a weight that represents the distance or travel time
between cities.
o Graph Database Example:
    (CityA)-[:CONNECTED {distance: 150}]->(CityB)
    (CityB)-[:CONNECTED {distance: 100}]->(CityC)
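Shortest-path queries over such weights are commonly answered with Dijkstra's algorithm, which requires non-negative edge weights. The Python sketch below runs it on the city example. The direct CityA-to-CityC road with distance 300 is a hypothetical addition so that the algorithm has two routes to compare.

```python
import heapq


def shortest_distances(adj, source):
    """Dijkstra's algorithm; adj maps node -> {neighbor: non-negative weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, weight in adj.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


roads = {
    "CityA": {"CityB": 150, "CityC": 300},
    "CityB": {"CityC": 100},
    "CityC": {},
}
```

From CityA, the route through CityB reaches CityC in 250, which beats the assumed direct road of 300.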
🔒 2. Transactions
Definition: A transaction is a group of database operations that either all succeed or
all fail. This all-or-nothing property is called atomicity.
🌍 Real-World Example: Online Shopping Cart
Let’s say you’re checking out your shopping cart on Flipkart:
Deduct item from inventory.
Apply discount coupon.
Deduct money from your wallet.
Generate invoice.
If even one step fails (for example, the wallet doesn’t have enough money):
All other operations must be rolled back.
The order should not be placed partially.
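The all-or-nothing behaviour can be tried out with Python's built-in `sqlite3` module. SQLite is used here only because it ships with Python and supports rollback; it is a relational engine, not a column-family store. The table names, column names, and the `checkout` helper are all made up for this sketch, which covers just two of the four checkout steps.

```python
import sqlite3


def checkout(conn, owner, item, price):
    """Deduct stock and wallet balance together; undo both if either step fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE inventory SET stock = stock - 1 WHERE item = ?", (item,)
            )
            balance = conn.execute(
                "SELECT balance FROM wallet WHERE owner = ?", (owner,)
            ).fetchone()[0]
            if balance < price:
                raise ValueError("insufficient wallet balance")
            conn.execute(
                "UPDATE wallet SET balance = balance - ? WHERE owner = ?",
                (price, owner),
            )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE wallet (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('book', 5)")
conn.execute("INSERT INTO wallet VALUES ('asha', 100)")
conn.commit()
```

If `checkout(conn, "asha", "book", 500)` is called, the inventory update has already run when the balance check fails, yet the stock is still 5 afterwards because the whole transaction is rolled back.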
🧠 In Column-Family DB:
Column-family databases like Cassandra do not support full ACID transactions the way
SQL databases do, but:
Batch operations that target a single partition are applied atomically and in isolation.
Lightweight transactions (LWT) support compare-and-set operations (for example,
INSERT ... IF NOT EXISTS) to prevent race conditions.
✅ Good for use-cases like updating user profile data or logging an event.
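The compare-and-set idea behind lightweight transactions can be illustrated with a small in-memory Python class. This is only a sketch of the semantics: a real cluster has to reach agreement among replicas, whereas this stand-in uses a single lock. The class and method names are hypothetical and are not part of any Cassandra driver.

```python
import threading


class CompareAndSetStore:
    """In-memory stand-in for a store that offers conditional writes."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def insert_if_absent(self, key, value):
        """Succeeds only if the key does not exist yet."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def update_if_equals(self, key, expected, new_value):
        """Succeeds only if the current value still matches `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new_value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)
```

If two clients race to claim the same username with `insert_if_absent`, exactly one call returns `True`, which is the kind of race condition a conditional write is meant to prevent.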
📈 3. Scaling
Definition: Scaling is the ability of a database to handle more users, data, or traffic
by increasing system resources.
🌍 Real-World Example: Netflix User Activity
During peak hours (evening), millions of users are watching different shows.
Netflix needs to:
o Store user preferences
o Track watch history
o Record likes/dislikes
All these happen in real-time and across multiple countries.
🧠 In Column-Family DB:
Column-family databases like Cassandra support horizontal scaling:
o Add more nodes to handle more data or traffic.
o Data is spread across nodes using consistent hashing of the partition key.
o Nodes can be added without taking the cluster offline.
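Consistent hashing is what keeps data movement small when a node is added. Below is a minimal Python sketch of a hash ring with virtual nodes. The class name, the choice of SHA-256, and the default of 64 virtual nodes per server are assumptions for illustration and do not describe Cassandra's actual partitioner.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._tokens = []  # sorted positions on the ring
        self._owners = {}  # token -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _token(text):
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node claims several positions to even out the load.
        for i in range(self._vnodes):
            token = self._token(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owners[token] = node

    def node_for(self, key):
        """Walk clockwise from the key's position to the next token."""
        index = bisect.bisect(self._tokens, self._token(key)) % len(self._tokens)
        return self._owners[self._tokens[index]]
```

When a fourth node joins a three-node ring, the only keys that change owner are those that now fall just before one of the new node's tokens, and they all move to the new node. Every other key stays where it was, which is why the cluster can keep serving traffic while it grows.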
👍 Ideal for applications like:
Social media feeds
Online games
Real-time analytics