Replication

Presented by Shivang Kumar


Introduction
Replication means keeping a copy of the same data on
multiple machines that are connected via a network. There
are several reasons why you might want to replicate data:
1. To keep data geographically close to your users (and
thus reduce latency)
2. To allow the system to continue working even if some of
its parts have failed (and thus increase availability)
3. To scale out the number of machines that can serve read
queries (and thus increase read throughput)
Leader-Based Replication
• Replication Basics:
⚬ Database replicas store copies of the data.
⚬ Synchronization is crucial for consistency.
• Leader-Based Replication:
⚬ One replica is the leader; others are followers.
⚬ Clients write to the leader, which updates its storage.
⚬ Leader sends changes to followers for synchronisation.
• Operations:
⚬ Reads can be from any replica.
⚬ Writes are only accepted by the leader.
• Examples:
⚬ Common in relational databases
(PostgreSQL, MySQL, Oracle).
⚬ Used in nonrelational databases
(MongoDB, RethinkDB), message
brokers (Kafka), and more.

Figure: Leader-based (master–slave) replication.
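The write path described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any of the listed databases implement it; the class names `Replica` and `Leader` and the dict-based storage are assumptions made for the example.

```python
class Replica:
    """A replica holding a copy of the data (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value

    def read(self, key):
        # Reads can be served by any replica.
        return self.data.get(key)


class Leader(Replica):
    """The only replica that accepts writes from clients."""

    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers
        self.log = []  # ordered list of changes sent to followers

    def write(self, key, value):
        change = (key, value)
        self.apply(change)          # update the leader's own storage
        self.log.append(change)     # record the change in order
        for follower in self.followers:
            follower.apply(change)  # followers apply changes in log order


followers = [Replica("follower-1"), Replica("follower-2")]
leader = Leader("leader", followers)
leader.write("user:1", "Alice")
```

After the write, `followers[0].read("user:1")` returns the same value as a read from the leader, because every change flows through the leader's log in a single order.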
Synchronous vs. Asynchronous
• Configurability: Replication can occur synchronously or
asynchronously.
• Communication Flow: The figure illustrates synchronous and
asynchronous communication between the leader and its followers.
• Synchronous Replication: The leader waits for the follower to
confirm the write, so that follower is guaranteed an up-to-date copy.
• Asynchronous Replication: The leader sends the change but doesn't
wait for the follower’s response, so the follower may apply it later.
• Replication Lag: Asynchronous followers may fall behind the leader,
so reads from them can return stale data.
In the example, the replication to follower
1 is synchronous: the leader waits until
follower 1 has confirmed that it received
the write before reporting success to the
user, and before making the write visible
to other clients. The replication to follower
2 is asynchronous: the leader sends the
message, but doesn’t wait for a response
from the follower.
Figure: Leader-based replication with one synchronous and one
asynchronous follower.
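The difference between the two followers can be illustrated with a small sketch. It assumes a hypothetical `Follower` class with an `inbox` queue standing in for the network, and it simplifies the leader by applying the write locally before waiting; a real system would also handle timeouts and failed acknowledgements.

```python
from collections import deque


class Follower:
    """Hypothetical follower; `inbox` models messages still in flight."""

    def __init__(self):
        self.data = {}
        self.inbox = deque()

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledgement sent back to the leader

    def drain(self):
        # Apply every change that has been sent but not yet processed.
        while self.inbox:
            self.apply(*self.inbox.popleft())


class Leader:
    def __init__(self, sync_follower, async_follower):
        self.data = {}
        self.sync_follower = sync_follower
        self.async_follower = async_follower

    def write(self, key, value):
        self.data[key] = value
        # Asynchronous: send the change and do not wait for a response.
        self.async_follower.inbox.append((key, value))
        # Synchronous: wait for the acknowledgement before reporting success.
        acked = self.sync_follower.apply(key, value)
        return "ok" if acked else "failed"


follower_1, follower_2 = Follower(), Follower()
leader = Leader(sync_follower=follower_1, async_follower=follower_2)
status = leader.write("x", 1)
stale_read = follower_2.data.get("x")  # follower 2 has not applied the write yet
follower_2.drain()                     # later, follower 2 catches up
```

Between the write and the call to `drain()`, follower 2 returns stale data while follower 1 is already up to date.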
Setting Up New Followers
• Snapshot Creation: Consistent snapshot of leader's database taken
without locking the entire database.
• Copy and Connect: Snapshot copied to new follower; follower
connects to leader for data changes.
• Catch-Up Process: Follower processes backlog, catching up to
leader.
• Automated or Manual: Setting up followers varies, from automated
processes to manual workflows.
Handling Node Outages
• Follower Recovery: After a crash or network interruption, a follower
uses its log to find the last change it processed, then requests the
changes it missed from the leader.
• Leader Failure (Failover): This tricky process involves promoting a
follower, reconfiguring clients, and transitioning other followers.
• Failover Detection: Timeout-based detection; leader assumed dead
if no response for a specified period.
• Failover Challenges: Potential issues include conflicting writes, split
brain scenarios, and determining the right timeout.
• Manual vs. Automatic Failover: Operations teams may prefer
manual failover for better control.
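Timeout-based detection and follower promotion can be sketched as below. The 30-second timeout, the function names, and the rule of promoting the follower with the highest log position are assumptions for illustration; real systems typically use an election or a controller node and must also deal with split brain.

```python
def leader_is_dead(last_heartbeat, now, timeout=30.0):
    """Timeout-based detection: no heartbeat within `timeout` seconds."""
    return (now - last_heartbeat) > timeout


def choose_new_leader(follower_positions):
    """Pick the follower with the highest replication log position."""
    return max(follower_positions, key=follower_positions.get)


# Hypothetical cluster state: follower name -> last applied log position.
follower_positions = {"follower-1": 120, "follower-2": 135, "follower-3": 128}

new_leader = None
if leader_is_dead(last_heartbeat=100.0, now=140.0):
    new_leader = choose_new_leader(follower_positions)
```

Choosing the timeout is the trade-off mentioned above: a long timeout delays recovery, while a short one can trigger unnecessary failovers during a load spike or network glitch.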
Implementation of Replication Logs
• Replication Methods: Overview of statement-based, write-ahead
log shipping, logical (row-based) log replication, and trigger-based
replication.
• Compatibility Challenges: Write-ahead log shipping closely tied to
storage engine, limiting software version flexibility.
• Logical Log Advantages: Decoupled from storage engine, allowing
backward compatibility and different software versions.
• Trigger-Based Replication: Application-layer approach for flexibility
but with higher overhead and potential limitations.
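To make the contrast between statement-based and logical (row-based) logs concrete, the sketch below replays a nondeterministic statement on two nodes. The slides do not spell out this problem; it is a commonly cited difficulty with statement-based replication, and the dict "databases" and differently seeded random generators are assumptions that stand in for separate machines.

```python
import random


def run_statement(db, rng):
    # Models a statement such as "INSERT ... VALUES (RAND())".
    db["row"] = rng.random()


# Statement-based: each replica re-executes the statement itself.
leader_db, follower_db = {}, {}
run_statement(leader_db, random.Random(1))
run_statement(follower_db, random.Random(2))  # a different node, different result
statement_based_diverged = leader_db != follower_db

# Logical (row-based): the leader logs the resulting row values instead.
logical_log = [("row", leader_db["row"])]
row_based_follower_db = {}
for key, value in logical_log:
    row_based_follower_db[key] = value
```

With the logical log the follower stores exactly the value the leader produced, so the two copies stay identical regardless of how the value was computed.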
Problems with Replication Lag
• Read-Scaling Architecture: Using asynchronous replication for
read-heavy workloads with many followers.
• Eventual Consistency: Replication lag can lead to temporary
inconsistencies between leader and followers.
• Challenges with Replication Lag: Three highlighted problems:
"Reading Your Own Writes," "Cross-Device Consistency," and
"Monotonic Reads."
• Solutions for Read-After-Write Consistency: Various techniques,
including leader reads, time-based decisions, and timestamp
tracking.
Reading Your Own Writes
• User Data Submission: Users can submit data like comments or
records in applications.
• Asynchronous Replication Challenge: Viewing data shortly after
writing may lead to perceived data loss.
• Eventual Consistency: Term coined by Douglas Terry, popularized
by Werner Vogels; a common goal for NoSQL projects.
• Read-After-Write Consistency: Ensures users always see updates
they submitted upon page reload.
• Implementation Techniques: Various methods, e.g., reading from
leader when user might have modified data.
In this situation, we need read-after-write consistency, also known as
read-your-writes consistency. This guarantees that users will always
see any updates they submit if they reload the page. It makes no
promises about other users: their updates may only be visible later.
However, it reassures the user that their input has been saved
correctly.
Figure: A user makes a write, followed by a read from a stale replica.
To prevent this anomaly, we need read-after-write consistency.
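One of the time-based techniques mentioned above can be sketched as a small read router: for a fixed window after a user's last write, that user's reads go to the leader. The `ReadRouter` class, the 60-second window, and the replica labels are assumptions for illustration; the window would need to exceed the replication lag actually observed.

```python
class ReadRouter:
    """Hypothetical router that sends recent writers' reads to the leader."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.last_write = {}  # user_id -> timestamp of that user's last write

    def record_write(self, user_id, now):
        self.last_write[user_id] = now

    def choose_replica(self, user_id, now):
        last = self.last_write.get(user_id)
        if last is not None and now - last < self.window:
            return "leader"    # the user may have modified data recently
        return "follower"      # safe to read from a possibly lagging replica


router = ReadRouter()
router.record_write("user-1234", now=1000.0)
soon_after = router.choose_replica("user-1234", now=1010.0)
much_later = router.choose_replica("user-1234", now=1100.0)
other_user = router.choose_replica("user-2345", now=1010.0)
```

Only the user who wrote is routed to the leader, which matches the guarantee described above: it makes no promises about what other users see.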
Monotonic Reads
• Avoiding Time Reversal: Users might observe time moving
backward when reading from different replicas.
• Monotonic Reads Guarantee: Ensures users do not read older data
after previously reading newer data.
• Replica Selection: Consistent replica selection for each user,
possibly based on a hash of the user ID.
For example, the figure shows user 2345 making the same query twice,
first to a follower with little lag, then to a follower with greater
lag. The first query returns a comment that was recently added by user
1234, but the second query doesn’t return anything because the lagging
follower has not yet picked up that write.
Figure: A user first reads from a fresh replica, then from a stale
replica. Time appears to go backward. To prevent this anomaly, we need
monotonic reads.
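Consistent replica selection based on a hash of the user ID can be sketched as follows. The replica names and the `replica_for_user` function are hypothetical; `hashlib` is used rather than Python's built-in `hash()` because the latter is randomized per process for strings.

```python
import hashlib

REPLICAS = ["follower-1", "follower-2", "follower-3"]


def replica_for_user(user_id, replicas=REPLICAS):
    """Map a user ID to the same replica on every call."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


first_read = replica_for_user(2345)
second_read = replica_for_user(2345)
```

Because both reads by user 2345 land on the same follower, the second read cannot return older data than the first. If that follower fails, the user's queries would have to be rerouted to another replica, which this sketch does not handle.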
Solutions for Replication Lag
• Considering Lag Impact: Understanding the impact of replication
lag on application behaviour.
• Stronger Guarantees: Designing systems for stronger guarantees,
e.g., read-after-write, when necessary.
• Application Complexity: Challenges in dealing with replication
issues in application code.
• Transaction Importance: Transactions as a way for databases to
provide stronger guarantees.
• Single-Node Transactions: Long established on single nodes,
transactions have been abandoned by many distributed databases as
too costly; this trade-off deserves a more nuanced view.
Thank You
