NoSQL M2
• NoSQL databases are increasingly popular due to their capability to run on large
clusters, making them suitable for handling growing data volumes.
• Scaling Options: Instead of scaling up with larger servers, NoSQL allows for
scaling out by distributing databases across clusters of servers.
• Aggregate Orientation: The use of aggregates aligns well with scaling out, serving
as a natural unit for data distribution.
• Data Distribution Benefits: Effective distribution models can enhance data storage
capacity, improve read/write performance, and increase availability during network
issues.
• Distribution Techniques: The two main methods for data distribution are replication (copying the same data across multiple nodes) and sharding (putting different data on different nodes).
• Replication Types: Replication comes in two forms, master-slave and peer-to-peer. The discussion covers single-server setups, master-slave replication, sharding, and peer-to-peer replication.
Sharding
• Sequential Data Access: Aggregates that are likely to be read in sequence can be
arranged to improve processing efficiency, as illustrated by the organization of
data in the Bigtable paper.
• Challenges of Manual Sharding: Historically, sharding has been managed
through application logic, complicating programming and requiring code changes
for rebalancing data across shards.
• Benefits of Auto-Sharding: Many NoSQL databases offer auto-sharding,
simplifying the distribution of data and allowing for more efficient application
development.
• Performance Enhancement: Sharding improves both read and write performance
by horizontally scaling the database, making it valuable for applications with high
write demands.
• Cautions with Sharding: While sharding can enhance performance, it can also
decrease resilience if not implemented carefully, and transitioning from a single-
server to a sharded configuration should be done proactively to avoid issues in
production.
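To make the shard-routing idea concrete, here is a minimal sketch in Python of hash-based shard assignment; the `ShardRouter` class and node names are illustrative assumptions, not any particular database's API.

```python
import hashlib

class ShardRouter:
    """Illustrative hash-based router: one aggregate always maps to one shard."""
    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, aggregate_key: str) -> str:
        # Hash the whole aggregate key so all data for one aggregate
        # lands on the same node (the aggregate as unit of distribution).
        digest = hashlib.md5(aggregate_key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

router = ShardRouter(["node-a", "node-b", "node-c"])
print(router.node_for("customer:42"))  # always the same node for this key
```

This naive modulo scheme also shows why manual rebalancing is painful: adding a fourth node remaps most keys, which is one reason auto-sharding databases typically use consistent hashing or range partitioning instead.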
Master-Slave Replication
• Update Processing: The master is responsible for all data updates, and the slaves
are synchronized with it to ensure they reflect the latest data.
• Read Scalability: Master-slave replication is beneficial for read-intensive
datasets, allowing horizontal scaling by adding more slave nodes to handle
increased read requests.
• Limitations on Write Traffic: The master’s ability to process updates can become
a bottleneck in write-heavy environments, as it handles all write operations.
• Read Resilience: If the master fails, slaves can still process read requests,
providing continuity for read-heavy applications.
• Recovery Speed: In the event of a master failure, a slave can be quickly
promoted to master, facilitating faster recovery.
Master-Slave Replication (3)
• Hot Backup Functionality: The system can function like a single-server setup with a
hot backup, improving resilience without needing complex scaling.
• Master Appointment Methods: Masters can be appointed manually during
configuration or automatically through a cluster election process, which improves uptime.
• Separate Read and Write Paths: To achieve read resilience, applications should
have distinct paths for read and write operations, which may require specialized
database connections (see the sketch after this list).
• Testing for Resilience: It's essential to conduct tests to ensure that reads can occur
even when writes are disabled, verifying the system's read resilience.
• Data Consistency Challenges: A major drawback of master-slave replication is the
potential for inconsistency; different clients might read different values if updates
haven’t fully propagated to the slaves.
• Risk of Data Loss: If the master fails before updates are replicated to the slaves,
those changes can be lost, emphasizing the importance of consistency and recovery
strategies.
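As a minimal sketch of the separate read and write paths mentioned above, the following Python uses plain dicts to stand in for database connections; `MasterSlaveClient` is a hypothetical name, not a real driver API.

```python
import random

class MasterSlaveClient:
    def __init__(self, master, slaves):
        self.master = master    # all writes go here
        self.slaves = slaves    # reads are spread across these

    def write(self, key, value):
        # The master applies every update, so there is a single write order.
        self.master[key] = value

    def read(self, key):
        # Reads keep working even if the master is down.
        return random.choice(self.slaves).get(key)

master, slaves = {}, [{}, {}]
client = MasterSlaveClient(master, slaves)
client.write("user:1", "Martin")
print(client.read("user:1"))  # None: the sketch never replicates, so slaves lag
```

The final read returning None is exactly the inconsistency described above: until the master's update propagates, a slave can serve stale (here, missing) data.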
Peer-to-Peer Replication (1)
• Limitations of Master-Slave Replication: While master-slave replication enhances
read scalability, it does not improve write scalability, and the master remains a
single point of failure.
• Introduction to Peer-to-Peer Replication: Peer-to-peer (P2P) replication eliminates
the master node; all replicas have equal status, and any of them can accept writes.
Peer-to-Peer Replication (2)
• Node Failure Resilience: In a P2P setup, the loss of any single node does not disrupt
access to the data store, enhancing overall data availability.
• Performance Enhancement: Adding more nodes in a P2P cluster can improve
performance without the bottleneck of a single master.
• Consistency Challenges: A major complication with P2P replication is maintaining
consistency, especially during simultaneous writes to the same record, leading to
write-write conflicts.
• Impact of Inconsistent Writes: While read inconsistencies are typically transient,
write inconsistencies can have permanent effects, complicating data integrity.
• Coordinating Writes for Consistency: One approach to handling write inconsistencies
is to coordinate writes across replicas, requiring agreement from a majority of them
(a quorum) before an update is valid, as sketched below.
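A minimal sketch of that majority-agreement rule, with peers simulated as small Python objects rather than real replicas:

```python
class Peer:
    def __init__(self, up=True):
        self.up = up
        self.data = {}

def quorum_write(peers, key, value):
    quorum = len(peers) // 2 + 1           # majority of all replicas
    acks = 0
    for peer in peers:
        if peer.up:                        # a down peer cannot acknowledge
            peer.data[key] = value
            acks += 1
    if acks < quorum:
        raise RuntimeError(f"only {acks} acks, quorum is {quorum}: write rejected")
    return acks

peers = [Peer(), Peer(), Peer(up=False)]
print(quorum_write(peers, "room:101", "booked"))  # 2 of 3 acks, so the write stands
```

With three replicas, any two form a majority, so a single failed peer does not block writes; two failures would.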
Module 2, Chapter 5
Consistency
• Pessimistic Example: Only the first client (Martin) acquires the lock and can
update; the second client (Pramod) must wait until the lock is released.
• Optimistic Approach: Allows conflicts to occur but detects and resolves them
after the fact, often through conditional updates (see the sketch after this list).
• Multiple Servers: With multiple servers, updates can be applied in different orders
on different nodes, leading to discrepancies in the data (e.g., different phone
numbers for the same contact).
• Merging Updates: Conflicting updates may require user intervention to merge,
or automatic handling based on specific rules.
• Safety vs. Liveness: There is a fundamental tradeoff between avoiding errors
(safety) and responding quickly (liveness).
• Deadlocks: Pessimistic approaches can lead to deadlocks, which are challenging
to prevent and debug.
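A minimal sketch of the optimistic, conditional-update approach referenced above, reusing the Martin/Pramod phone-number example; `Record` and its version counter are illustrative, not a specific database's API.

```python
class Record:
    def __init__(self, value):
        self.value, self.version = value, 0

def conditional_update(record, expected_version, new_value):
    # The update only applies if nobody has written since we read.
    if record.version != expected_version:
        return False               # conflict detected; caller must re-read and retry
    record.value = new_value
    record.version += 1
    return True

phone = Record("555-0100")
seen = phone.version                                  # both clients read version 0
print(conditional_update(phone, seen, "555-0199"))    # True: Martin's update wins
print(conditional_update(phone, seen, "555-0142"))    # False: Pramod must retry
```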
1. Update Consistency vs. Read Consistency: A data store can maintain update
consistency, but readers may not always receive consistent data.
2. Logical Consistency:
- Ensures related data items (e.g., order line items and shipping charges) are
consistent.
- Transactions in relational databases prevent read-write conflicts.
3. NoSQL Transactions:
- Claims that NoSQL databases lack transaction support are misleading.
- Aggregate-oriented NoSQL databases allow atomic updates within a single
aggregate, but not across multiple aggregates (see the sketch after this list).
4. Inconsistency Window:
- The time during which inconsistent reads can occur when updates affect multiple
aggregates.
- Example: Amazon's SimpleDB has a short inconsistency window (usually under
one second).
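As a minimal sketch of point 3 above (atomic updates within a single aggregate), the following keeps an order's line items and shipping charge in one document, so one assignment replaces both together; the dict-as-document store is an illustrative stand-in.

```python
store = {
    "order:7": {
        "line_items": [{"sku": "book", "qty": 1, "price": 20.0}],
        "shipping": 5.0,
    }
}

def update_order(store, order_id, new_items, new_shipping):
    current = store[order_id]
    # Build the new aggregate, then swap it in as one atomic replacement.
    store[order_id] = {**current, "line_items": new_items, "shipping": new_shipping}

update_order(store, "order:7", [{"sku": "book", "qty": 2, "price": 40.0}], 8.0)
print(store["order:7"])  # items and shipping changed together, never half-updated
```

Across two aggregates (say, an order and a separate inventory document) no such atomicity is guaranteed, which is where the inconsistency window of point 4 opens.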
(Figure: a read-write conflict in logical consistency)
Read Consistency
5. Replication Consistency:
- Different replicas may return inconsistent data (e.g., hotel room booking).
- Eventually consistent: all nodes will update to the same value eventually.
6. Impact of Replication on Consistency: Replication can extend logical
inconsistency windows, especially if updates happen rapidly.
7. Configurable Consistency Levels: Applications can specify the desired
consistency level per request (weak or strong).
8. User Experience and Inconsistency: Inconsistencies can confuse users, especially
during simultaneous actions (e.g., booking hotel rooms).
9. Read-Your-Writes Consistency: Guarantees that users see their own updates after
writing, often implemented via session consistency (sketched below).
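One way to picture read-your-writes via session consistency is the following sketch, where the session tracks the highest version it has written and skips replicas that have not caught up; the classes are illustrative, not a real driver.

```python
class Replica:
    def __init__(self):
        self.data, self.version = {}, 0

class Session:
    def __init__(self, master, replicas):
        self.master, self.replicas = master, replicas
        self.last_seen = 0                  # highest version this session has written

    def write(self, key, value):
        self.master.data[key] = value
        self.master.version += 1
        self.last_seen = self.master.version

    def read(self, key):
        # Prefer a replica, but only one that is new enough for this session.
        for node in [*self.replicas, self.master]:
            if node.version >= self.last_seen:
                return node.data.get(key)

master, stale = Replica(), Replica()
session = Session(master, [stale])
session.write("profile:martin", "updated bio")
print(session.read("profile:martin"))  # skips the stale replica: "updated bio"
```

Other sessions get no such guarantee; they may still read the stale replica.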
(Figure: replication consistency example)
Relaxing Consistency
• The CAP Theorem: In a distributed system you can achieve at most two of three
properties: Consistency, Availability, and Partition tolerance. Trade-offs are
therefore unavoidable.
- Consistency: All nodes see the same data at the same time.
- Availability: If a node is reachable, it responds to requests.
- Partition Tolerance: The system continues to operate despite communication
breakdowns between nodes.
• Single-Server vs. Cluster Systems: A single-server system is naturally CA
(Consistent and Available) but cannot tolerate partitions. Cluster systems must
tolerate partitions, which forces a compromise between Consistency and
Availability.
• Practical Trade-offs: Systems may allow inconsistent writes to enhance
availability, as with hotel bookings that tolerate some overbooking, or shopping
carts that merge divergent contents (see the sketch after this list).
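A minimal sketch of that last trade-off, merging shopping carts that diverged during a partition; the max-quantity merge rule is just one illustrative choice.

```python
def merge_carts(cart_a, cart_b):
    merged = dict(cart_a)
    for item, qty in cart_b.items():
        # Keep the larger quantity seen on either side of the partition.
        merged[item] = max(merged.get(item, 0), qty)
    return merged

# The same user updated the cart on both sides of a network partition.
side_a = {"book": 1, "pen": 2}
side_b = {"book": 1, "lamp": 1}
print(merge_carts(side_a, side_b))  # {'book': 1, 'pen': 2, 'lamp': 1}
```

The system stayed available on both sides and accepted both writes; the price is a merge step (and possibly user confirmation) instead of strict consistency.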
(Figure: partition tolerance)
Module 2, Chapter 3
Version Stamps
Conditional Updates
• Use version stamps to perform conditional updates, ensuring updates are not
based on stale data.
• Similar mechanisms are used in HTTP with etags for resource updates.
• Compare-and-set (CAS) operations can also be used, comparing a version stamp
before setting the new value.
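To ground the etag comparison, here is a minimal sketch of a server-side conditional update keyed on an etag, in the spirit of HTTP's If-Match header; the handler and resource shape are assumptions for illustration.

```python
import hashlib

resource = {"body": "room 101: free"}

def etag_of(body: str) -> str:
    return hashlib.sha1(body.encode()).hexdigest()[:8]   # version stamp of content

def handle_put(if_match: str, new_body: str) -> int:
    if if_match != etag_of(resource["body"]):
        return 412                        # Precondition Failed: stale version stamp
    resource["body"] = new_body
    return 200

tag = etag_of(resource["body"])               # client first GETs the body and its etag
print(handle_put(tag, "room 101: booked"))    # 200: update applied
print(handle_put(tag, "room 101: cleaning"))  # 412: the etag is now stale
```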
Additional Uses of Version Stamps
Version Histories:
- Requires each node to track the history of its version stamps.
- Clients or server nodes must store and share these histories.
- Inconsistencies can be detected by checking whether one history is an ancestor of
another (sketched below).
- Not commonly used in NoSQL databases.
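A minimal sketch of the ancestor check described above, with histories as simple lists of version stamps:

```python
def is_ancestor(shorter, longer):
    # One history descends from another if the older one is a prefix of it.
    return len(shorter) <= len(longer) and longer[:len(shorter)] == shorter

def in_conflict(a, b):
    return not (is_ancestor(a, b) or is_ancestor(b, a))

base = ["v1", "v2"]
linear = ["v1", "v2", "v3"]     # grew from base: no conflict
fork = ["v1", "v2", "v4"]       # diverged after v2

print(in_conflict(base, linear))  # False: base is an ancestor of linear
print(in_conflict(linear, fork))  # True: neither history contains the other
```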
Vector Stamps:
- A set of counters, one for each node.
- Each node updates its own counter when it performs an internal update.
- Nodes synchronize their vector stamps whenever they communicate.
- A stamp is newer when all of its counters are greater than or equal to those in the
older stamp.
- A write-write conflict is detected when each stamp has some counter greater than
the corresponding one in the other.
- Missing values are treated as 0, which makes adding new nodes easy.
- Vector stamps help spot inconsistencies but do not resolve them (see the sketch
below).
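The comparison rules above translate into a short sketch, with vector stamps as dicts from node name to counter and missing entries treated as 0:

```python
def compare(a: dict, b: dict) -> str:
    nodes = set(a) | set(b)
    a_ge = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    b_ge = all(b.get(n, 0) >= a.get(n, 0) for n in nodes)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a is newer"
    if b_ge:
        return "b is newer"
    return "write-write conflict"   # each stamp is ahead on some counter

print(compare({"blue": 2, "green": 2}, {"blue": 2, "green": 1}))  # a is newer
print(compare({"blue": 2, "green": 1}, {"blue": 1, "green": 2}))  # write-write conflict
```

Detecting the conflict is all the stamps do; resolving it is left to the application.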