
1. Explain the update consistency – update (write-write conflict), read (read-write conflict) with an example and a neat diagram.
Update (Write-Write Conflict): This occurs when two users attempt to update the same data item simultaneously. For example, if Martin and Pramod both try to update a phone number on a company website at the same time, they may use different formats, leading to a write-write conflict. The system must resolve this conflict, often by allowing one update to succeed while the other fails or is queued for later resolution.

Example: Martin and Pramod both update a contact number on a company website. Martin changes it to "123-456-7890," while Pramod updates it to "987-654-3210." If the server processes Martin's update first and Pramod's second, the final stored value is "987-654-3210": Pramod's write silently overwrites Martin's, and Martin's update is lost.

Read (Read-Write Conflict): This happens when one user reads data while another user is updating it. For instance, if Martin reads the old phone number while Pramod is updating it, he may not see the latest information. This can lead to inconsistencies in what users perceive as the current data.

Example: If Martin reads the phone number while Pramod is updating it, he may see the old number
"123-456-7890." This outdated information can lead to incorrect decisions based on stale data.

2. Define Quorums and explain read and write quorum with examples.
A quorum is a subset of nodes in a distributed system that must agree on a read or write
operation to ensure strong consistency.
The concept is crucial when dealing with replicated data across multiple nodes, as it helps
avoid inconsistencies that can arise from concurrent operations.
Write Quorum
A write quorum is the minimum number of nodes that must acknowledge a write
operation for it to be considered successful.
For example, if data is replicated across three nodes (N = 3), a write quorum (W) of 2
means that at least two nodes must confirm the write. This can be expressed as W > N/2,
ensuring that a majority of nodes have the latest data.
If two nodes acknowledge the write while one does not, the system can still maintain
consistency, as the majority has agreed on the new value.
Read Quorum
A read quorum is the minimum number of nodes that must be contacted to ensure that
the most recent write is read.
Continuing with the previous example, if the write quorum is W = 2, then a read quorum
(R) of 2 is also required to guarantee that the latest data is retrieved. This can be expressed
as R + W > N.
If a read operation contacts only one node, it may happen to hit the replica that missed
the write and so read stale data. However, if it contacts two nodes, at least one of them
must hold the latest write (because R + W > N), so it retrieves the most up-to-date information.
Example Scenario
Consider a system with three nodes (A, B, and C) where:
A write operation is performed, and nodes A and B acknowledge the write (W = 2).
For a subsequent read operation, if nodes A and C are contacted (R = 2), the read
will return the latest data, because node A is part of both the write and read sets
(R + W > N guarantees such an overlap).
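A small sketch (illustrative, not from the source) of the same scenario in code: with N = 3, W = 2 and R = 2, a read that contacts any two nodes always overlaps the write set, so picking the reply with the highest version returns the latest value.

    N, W, R = 3, 2, 2
    assert W > N / 2        # the write quorum is a majority
    assert R + W > N        # every read set overlaps every write set

    # Each node stores (version, value); only A and B acknowledged the latest write.
    nodes = {
        "A": (2, "987-654-3210"),
        "B": (2, "987-654-3210"),
        "C": (1, "123-456-7890"),   # stale replica
    }

    def quorum_read(contacted):
        # Return the value carried by the highest version among the contacted nodes.
        replies = [nodes[name] for name in contacted]
        return max(replies)[1]       # tuples compare by version first

    print(quorum_read(["A", "C"]))   # '987-654-3210' - the stale copy on C is outvoted by A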

3. Define Version Stamps. List and explain the approaches through which version stamps
can be constructed for single source models.
Version stamps are mechanisms used to track changes in data records, ensuring that updates
are based on the most current information. They help prevent conflicts in multi-user
environments by indicating the version of a record at any given time. When a record is
updated, its version stamp changes, allowing systems to verify whether the data being
modified is up-to-date.
Approaches to Construct Version Stamps for Single Source Models
1. Counter-Based Version Stamps:
• Each time a record is updated, a counter is incremented.
• This approach is straightforward and allows easy comparison of versions; a
higher counter indicates a more recent update.
• However, it requires a single authoritative source to manage the counter to
avoid duplication [1].
2. GUID (Globally Unique Identifier):
• A GUID is a large random number that is unique across different systems.
• It can be generated by any node, eliminating the risk of duplication.
• The downside is that GUIDs are large and cannot be directly compared for
recency, making it difficult to determine which version is newer [1].
3. Content Hashing:
• This method involves creating a hash of the contents of the resource.
• A sufficiently large hash key size can ensure global uniqueness and can be
generated by anyone.
• While deterministic (the same content will always produce the same hash), it
cannot be directly compared for recency [1].
4. Timestamp-Based Version Stamps:
• This approach uses the timestamp of the last update to indicate the version.
• Timestamps are relatively short and can be directly compared to determine
which version is more recent.
• However, it requires synchronized clocks across multiple machines to avoid
issues with data corruption due to clock discrepancies [1].
5. Composite Version Stamps:
• A combination of the above methods can be used to create a composite version
stamp.
• For example, using both a counter and a content hash can help in identifying
conflicts while still allowing recency comparison.
• This method is particularly useful in systems that require high availability and
consistency, such as peer-to-peer replication systems.
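The five approaches can be illustrated with a short sketch (assumed, not taken from the source), each producing a stamp for the same record:

    import hashlib
    import time
    import uuid

    record = {"name": "Martin", "phone": "123-456-7890"}

    # 1. Counter: incremented by the single authoritative source on every update.
    counter_stamp = 42 + 1

    # 2. GUID: globally unique, but says nothing about which version is newer.
    guid_stamp = uuid.uuid4()

    # 3. Content hash: deterministic for identical content, also not orderable by recency.
    content_stamp = hashlib.sha256(repr(sorted(record.items())).encode()).hexdigest()

    # 4. Timestamp: short and orderable, but relies on reasonably synchronized clocks.
    time_stamp = time.time()

    # 5. Composite: e.g. counter plus content hash, so versions can be ordered
    #    and differing contents still detected.
    composite_stamp = (counter_stamp, content_stamp)

    print(counter_stamp, guid_stamp, content_stamp[:12], time_stamp, composite_stamp)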

4. Explain map-reduce with an example.


Map-Reduce is a programming model designed for processing large datasets by distributing the work
across multiple machines in a cluster. It consists of two main functions: the Map function and
the Reduce function. Here’s a breakdown of how it works, along with an example.
How Map-Reduce Works
• Map Function: The map function reads records from a dataset and emits key-value pairs. For instance, if we have a list of orders, the map function might extract details like product name and quantity, emitting pairs such as (product_name, quantity) for each order.
• Shuffle and Sort: After the map phase, the framework groups all emitted key-value pairs by key. This means all values associated with the same key are collected together, preparing them for the reduce phase.
• Reduce Function: The reduce function takes these grouped key-value pairs and combines them to produce a final result. For example, if the key is a product name, the reduce function could sum the quantities to find the total number of orders for that product, as shown in the sketch below.
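The whole flow can be shown with a minimal in-process sketch (assumed; a real framework such as Hadoop distributes these steps across machines). Map emits (product, quantity) pairs, the shuffle groups them by key, and reduce sums each group:

    from collections import defaultdict

    orders = [
        {"product": "Product A", "quantity": 2},
        {"product": "Product B", "quantity": 1},
        {"product": "Product A", "quantity": 3},
        {"product": "Product C", "quantity": 4},
    ]

    def map_fn(order):
        # Emit one (key, value) pair per order record.
        yield (order["product"], order["quantity"])

    def reduce_fn(key, values):
        # Combine all values for one key into a single result.
        return (key, sum(values))

    # Map phase
    pairs = [pair for order in orders for pair in map_fn(order)]

    # Shuffle and sort: group every value under its key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce phase
    results = [reduce_fn(key, values) for key, values in sorted(groups.items())]
    print(results)   # [('Product A', 5), ('Product B', 1), ('Product C', 4)]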
5. Explain the partitioning and combining stages with examples.

Partitioning Stage
Definition:
Partitioning is the process of dividing the output of the map function into different
segments or partitions. Each partition contains key-value pairs that will be sent to a specific
reducer. The goal is to ensure that all data for the same key is grouped together in one
partition so it can be processed by a single reducer.
Example:
Consider a scenario where we have the following key-value pairs emitted by the map
function:
(Product A, 2)
(Product B, 1)
(Product A, 3)
(Product C, 4)
If we have two reducers, the partitioning might look like this:
Reducer 1: (Product A, 2), (Product A, 3)
Reducer 2: (Product B, 1), (Product C, 4)
Here, all entries for Product A are sent to Reducer 1, while Products B and C go to Reducer
2. This allows each reducer to work on its own set of keys in parallel, improving processing
speed.
Combining Stage
Definition:
The combining stage is an optional step that occurs before the data is sent to the reducers.
A combiner function can be used to combine all values for the same key within each
partition. This helps reduce the amount of data that needs to be transferred across the
network, making the process more efficient.
Example:
Using the same key-value pairs from the previous example, if we apply a combiner function
that sums the quantities for each product, the output before sending to the reducers
might look like this:
(Product A, 5) // Combined from (Product A, 2) and (Product A, 3)
(Product B, 1)
(Product C, 4)
This means that instead of sending multiple entries for Product A to the reducer, we only
send a single entry with the total quantity. This reduces the amount of data transferred
and speeds up the overall process.
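Both stages can be sketched in a few lines (an assumed illustration, not framework code). The combiner pre-sums values per key inside the mapper's output, and the partitioner then routes each combined pair to one of two reducers; note that a hash-based partitioner may assign keys to reducers differently from the hand-picked split above.

    from collections import defaultdict

    pairs = [("Product A", 2), ("Product B", 1), ("Product A", 3), ("Product C", 4)]

    # Combining stage: sum the values for each key before anything leaves the mapper,
    # so Product A travels as a single (Product A, 5) entry instead of two entries.
    combined = defaultdict(int)
    for key, value in pairs:
        combined[key] += value

    # Partitioning stage: decide which reducer receives each key.
    # (Python randomizes string hashes per run, so the key-to-reducer mapping may vary.)
    NUM_REDUCERS = 2
    partitions = defaultdict(list)
    for key, value in combined.items():
        partitions[hash(key) % NUM_REDUCERS].append((key, value))

    for reducer_id, data in sorted(partitions.items()):
        print(f"Reducer {reducer_id}: {data}")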

6. Explain the two stages of map-reduce with a neat diagram.

As described above, the first stage is the Map stage, in which each mapper reads input records and emits key-value pairs; the second is the Reduce stage, in which each reducer receives all pairs sharing the same key (after the shuffle/sort step) and aggregates them into the final result.
7. What are key-value stores? List some popular key-value databases.

Key-value stores are a type of NoSQL database that uses a simple data model to store data as a
collection of key-value pairs. Each key is unique and acts as an identifier for the associated value, which
can be a simple data type or a more complex data structure. This model is akin to a hash table, where
the key is the index and the value is the data being stored.
Popular Key-Value Databases
• Redis: An in-memory data structure store, often used as a database, cache, and message
broker. It supports various data structures such as strings, hashes, lists, sets, and more.
• Amazon DynamoDB: A fully managed NoSQL database service that provides fast and
predictable performance with seamless scalability. It is designed for high availability and
durability.
• Riak: A distributed NoSQL database that offers high availability, fault tolerance, and scalability.
It is designed to handle large amounts of data across many servers.
• Cassandra: While primarily a wide-column store, it can also function as a key-value store. It is
known for its high availability and scalability, making it suitable for handling large datasets
across multiple nodes.
• Berkeley DB: A high-performance embedded database that provides a key-value store
interface. It is often used in applications requiring fast data access.
• LevelDB: A fast key-value storage library written at Google that provides an ordered mapping
from string keys to string values. It is designed for high performance and efficiency.
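As a hedged usage example, the key-value model boils down to set, get and delete on a unique key. The sketch below uses the redis-py client and assumes a Redis server is reachable on localhost:6379:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    r.set("user:1001:phone", "123-456-7890")   # write: key -> value
    print(r.get("user:1001:phone"))            # read the value back by its key
    r.delete("user:1001:phone")                # remove the pair
    print(r.get("user:1001:phone"))            # None - the key no longer exists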
