Dynamo: Amazon's Highly Available Key-Value Store

Dynamo is Amazon's key-value data store designed to be highly available and scalable. It sacrifices consistency to achieve high availability. It uses consistent hashing to partition data, vector clocks to track data versions, quorums and hinted handoff to maintain availability during failures, and Merkle trees to synchronize replicas. Dynamo guarantees service level agreements through tunable parameters like replication factor and quorum sizes.

Uploaded by

lourdes_chang
Copyright
© Attribution Non-Commercial (BY-NC)

Dynamo: Amazon’s Highly Available Key-value Store

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati,
Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall,
and Werner Vogels

Slide modifications made by: Lourdes Chang


Motivation
 Build a distributed storage system:
 Scale
 Simple: key-value
 Highly available (sacrifice consistency)
 Guarantee Service Level Agreements (SLA)
System Assumptions and Requirements
 Query Model - simple read and write operations to
a data item that is uniquely identified by a key.
 ACID Properties - Atomicity, Consistency, Isolation,
Durability
 Dynamo targets applications that operate with weaker
consistency (the “C” in ACID) if this results in high
availability.
 Efficiency - latency requirements which are in general
measured at the 99.9th percentile of the distribution
 Other Assumptions - operation environment is assumed
to be non-hostile and there are no security related requirements
such as authentication and authorization.
Service Level Agreements (SLA)
 An application can deliver
its functionality in
a bounded time
 Every dependency in the
platform needs to deliver
its functionality with even
tighter bounds.
 Example
 service guaranteeing that
it will provide a response
within 300ms for 99.9% of
its requests for a peak
client load of 500 requests
per second.
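A minimal sketch of how the example SLA above could be checked against measured latencies. The function names and the sample data are illustrative assumptions; the nearest-rank percentile is one of several common definitions and is not taken from the Dynamo paper.

```python
import math
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids float rounding at boundaries such as 99.9%.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sla(latencies_ms, bound_ms=300.0, pct=99.9):
    """True if the pct-th percentile latency is within the bound."""
    return percentile(latencies_ms, pct) <= bound_ms


# Hypothetical measurements: 9,990 fast requests and 10 slow ones.
latencies = [20.0] * 9990 + [450.0] * 10
print(meets_sla(latencies))            # exactly 99.9% of requests are fast
print(meets_sla(latencies + [450.0]))  # one more slow request tips the balance
```

Measuring at the 99.9th percentile rather than the mean means that a small number of slow requests is enough to violate the SLA, which is why every dependency needs tighter bounds than the service itself.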
Design Considerations
 Sacrifice strong consistency for availability
 Conflict resolution is executed during read
instead of write, i.e. “always writeable”

 Other principles
 Incremental scalability.
 Symmetry
 Decentralization
 Heterogeneity
System architecture
 Partitioning
 High Availability for writes
 Handling temporary failures
 Recovering from permanent failures
Partition
 Motivation
 One of the key design requirements for
Dynamo is that it must scale incrementally.
 This requires a mechanism to dynamically
partition the data over the set of nodes (i.e.,
storage hosts).
Partition (Cont’d)
 Consistent hashing: the output
range of a hash function is treated as
a fixed circular space or “ring”.
 “Virtual Nodes”: Each node can
be responsible for more than one
virtual node.
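A minimal sketch of a consistent-hashing ring with virtual nodes. The class name, the `node#i` labelling of virtual nodes, and the count of 64 virtual nodes per host are assumptions made for illustration. MD5 is used only to get a 128-bit ring, and this is not Dynamo's implementation.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Hash a string to a point on the ring (MD5 gives a 128-bit circular space)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")


class Ring:
    def __init__(self, vnodes_per_node: int = 64):
        self.vnodes_per_node = vnodes_per_node
        self._positions = []  # sorted positions of all virtual nodes
        self._owner = {}      # position -> physical node name

    def add_node(self, node: str) -> None:
        # One physical node takes several positions (its virtual nodes).
        for i in range(self.vnodes_per_node):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def lookup(self, key: str) -> str:
        # Walk clockwise from the key's position to the next virtual node,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = Ring()
for name in ("A", "B", "C"):
    ring.add_node(name)

keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add_node("D")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved, all to", {after[k] for k in moved})
```

Adding node D only moves the keys that D now owns; every other key stays where it was. That property is what makes the scheme incrementally scalable.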
Replication
 Dynamo replicates its data
on multiple hosts
 “preference list”: The list of
nodes that is responsible
for storing a particular key.
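A minimal sketch of building a preference list on a ring: walk clockwise from the key and collect the first n distinct physical hosts, skipping further virtual nodes of a host already chosen. The helper names, the example key, and the eight virtual nodes per host are illustrative assumptions.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Hash a string to a point on a 128-bit ring."""
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")


def build_ring(nodes, vnodes_per_node=8):
    """Return a sorted list of (position, physical_node) pairs."""
    return sorted(
        (ring_position(f"{node}#{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )


def preference_list(ring, key, n):
    """First n distinct physical nodes met while walking clockwise from the key."""
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, ring_position(key))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:  # skip extra virtual nodes of the same host
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen


ring = build_ring(["A", "B", "C", "D", "E"])
print(preference_list(ring, "cart:42", 3))
```

Skipping repeated hosts matters because replicas placed on two virtual nodes of the same machine would be lost together when that machine fails.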
Data Versioning
 A put() call may return to its caller before the
update has been applied at all the replicas
 A get() call may return many versions of the
same object.
 Challenge: an object having distinct version sub-histories,
which the system will need to reconcile in the future.
 Solution: uses vector clocks in order to capture causality
between different versions of the same object.
Vector Clock
 A vector clock is a list of (node, counter)
pairs.
 Every version of every object is associated
with one vector clock.
 If the counters on the first object’s clock are
less than or equal to the corresponding counters in the
second clock, then the first is an ancestor of
the second and can be forgotten.
Vector clock example
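A minimal sketch of the ancestor rule, using a scenario with three nodes. The node names Sx, Sy and Sz and the helper functions are illustrative assumptions, not Dynamo's API. A clock is modelled as a dict from node name to counter, with missing nodes counting as 0.

```python
def increment(clock, node):
    """Return a copy of the clock with the node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if every counter in b is <= the matching counter in a (b is an ancestor of a, or equal)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge(a, b):
    """Element-wise maximum, used when a client reconciles divergent versions."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def prune(clocks):
    """Drop every clock that is a strict ancestor of another clock in the list."""
    return [c for c in clocks if not any(w != c and descends(w, c) for w in clocks)]


d1 = increment({}, "Sx")   # first write, handled by Sx
d2 = increment(d1, "Sx")   # second write, also handled by Sx
d3 = increment(d2, "Sy")   # one client updates d2 through Sy
d4 = increment(d2, "Sz")   # another client updates d2 through Sz

print(descends(d2, d1))               # d1 is an ancestor of d2
print(descends(d3, d4), descends(d4, d3))  # neither descends from the other
print(prune([d1, d2, d3, d4]))        # only the two conflicting versions remain

d5 = increment(merge(d3, d4), "Sx")   # reconciled by a client, written through Sx
print(d5)
```

Versions d3 and d4 are concurrent, so a get() returns both and the client reconciles them; the merged clock d5 then supersedes both.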
Execution of get() and put() operations
 Two strategies to select a node:
1. Route its request through a generic
load balancer that will select a node
based on load information.
2. Use a partition-aware client library that
routes requests directly to the
appropriate coordinator nodes.
Quorum
 R/W is the minimum number of nodes that
must participate in a successful read/write
operation.
 Setting R + W > N yields a quorum-like
system.
 R and W are usually configured to be less
than N (the number of replicas of each key), to provide better latency
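A minimal sketch of why R + W > N matters: a brute-force check that every possible read set shares at least one replica with every possible write set. The function name is an illustrative assumption.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every set of r replicas intersects every set of w replicas out of n."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3), (3, 2, 1)]:
    print((n, r, w), "R + W > N:", r + w > n, "overlap:", quorums_always_overlap(n, r, w))
```

When the sets always overlap, a read contacts at least one replica that took part in the latest successful write. Lower values of R or W reduce latency because fewer replicas must respond, at the cost of that overlap.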
Hinted handoff
 Assume N = 3. When A
is temporarily down or
unreachable during a
write, send replica to D.
 D is hinted that the
replica belongs to A and
will deliver it to A when A
recovers.
 Again: “always writeable”
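A minimal sketch of hinted handoff with N = 3, matching the A and D example above. The `Node` class and the function names are illustrative assumptions; failure detection and networking are reduced to a boolean flag.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}  # key -> value for replicas this node owns
        self.hints = []  # (intended_owner_name, key, value) held for others


def write(key, value, preference_list, fallbacks):
    """Write to each replica; divert to a healthy fallback, with a hint, when one is down."""
    spare = iter(node for node in fallbacks if node.up)
    for target in preference_list:
        if target.up:
            target.store[key] = value
        else:
            stand_in = next(spare)  # raises StopIteration if no fallback is left
            stand_in.hints.append((target.name, key, value))


def deliver_hints(holder, nodes_by_name):
    """Hand hinted replicas back to owners that have recovered; keep the rest."""
    remaining = []
    for owner_name, key, value in holder.hints:
        owner = nodes_by_name[owner_name]
        if owner.up:
            owner.store[key] = value
        else:
            remaining.append((owner_name, key, value))
    holder.hints = remaining


a, b, c, d = (Node(name) for name in "ABCD")
nodes = {node.name: node for node in (a, b, c, d)}

a.up = False
write("cart:42", "v1", preference_list=[a, b, c], fallbacks=[d])
print(d.hints)  # D holds the replica on A's behalf

a.up = True
deliver_hints(d, nodes)
print(a.store, d.hints)
```

The write succeeds while A is down because D stands in for it, which is what keeps the store writeable during temporary failures.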
Replica synchronization
 Structure of Merkle tree:
 a hash tree where leaves are hashes of the values of
individual keys.
 Parent nodes higher in the tree are hashes of their
respective children.
 Advantage of Merkle tree:
 Each branch of the tree can be checked independently
without requiring nodes to download the entire tree.
 Help in reducing the amount of data that needs to be
transferred while checking for inconsistencies among
replicas.
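A minimal sketch of comparing two replicas with Merkle trees. It assumes both replicas hold the same sorted list of keys for a key range, so that leaf i refers to the same key on both sides. SHA-256 and the function names are illustrative choices.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(values):
    """Return the tree as a list of levels, leaves first and the root last."""
    level = [h(value.encode()) for value in values]
    levels = [level]
    while len(level) > 1:
        # A parent is the hash of its children (a lone last child is hashed alone).
        level = [
            h(level[i] + (level[i + 1] if i + 1 < len(level) else b""))
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def differing_leaves(a_levels, b_levels):
    """Descend from the root, following only branches whose hashes differ."""
    suspects = [0]
    for depth in range(len(a_levels) - 1, 0, -1):
        next_suspects = []
        for i in suspects:
            if a_levels[depth][i] != b_levels[depth][i]:
                for child in (2 * i, 2 * i + 1):
                    if child < len(a_levels[depth - 1]):
                        next_suspects.append(child)
        suspects = next_suspects
    return [i for i in suspects if a_levels[0][i] != b_levels[0][i]]


replica_a = [f"value-{i}" for i in range(8)]
replica_b = list(replica_a)
replica_b[5] = "stale"

print(differing_leaves(build_levels(replica_a), build_levels(replica_b)))
```

If the roots match, the comparison stops immediately. Otherwise only the branches on the path to a differing leaf are examined, so the data exchanged grows with the number of differences rather than the size of the key range.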
Summary of techniques used in Dynamo and their advantages

Problem                            | Technique                                      | Advantage
-----------------------------------+------------------------------------------------+---------------------------------------------
Partitioning                       | Consistent Hashing                             | Incremental Scalability
High Availability for writes       | Vector clocks with reconciliation during reads | Version size is decoupled from update rates.
Handling temporary failures        | Quorum and hinted handoff                      | Provides high availability and durability guarantee when some of the replicas are not available.
Recovering from permanent failures | Anti-entropy using Merkle trees                | Synchronizes divergent replicas in the background.
Implementation
 Java
 Local persistence component allows for
different storage engines to be plugged in:
 Berkeley Database (BDB) Transactional Data
Store: objects of tens of kilobytes
 MySQL: object of > tens of kilobytes
 BDB Java Edition, etc.
Performance
 Guarantee Service Level
Agreements (SLA)
 the latencies exhibit a clear
diurnal pattern (it follows the incoming
request rate)
 write operations always
result in disk access.
 affected by several factors
such as variability in request
load, object sizes, and
locality patterns
Balance

 out-of-balance
 If the node’s request load deviates from the average load by
a value more than a certain threshold (15%)
 Imbalance ratio decreases with increasing load
 under high loads, a large number of popular keys are accessed
and the load is evenly distributed
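A minimal sketch of the out-of-balance test, taking the imbalance ratio to be the fraction of nodes that are out of balance. The function names and the request counts are illustrative assumptions.

```python
def out_of_balance(loads, threshold=0.15):
    """Nodes whose request load deviates from the average by more than threshold (a fraction of the average)."""
    mean = sum(loads.values()) / len(loads)
    return sorted(
        node for node, load in loads.items() if abs(load - mean) > threshold * mean
    )


def imbalance_ratio(loads, threshold=0.15):
    """Fraction of nodes that are out of balance."""
    return len(out_of_balance(loads, threshold)) / len(loads)


# Hypothetical requests per second on four nodes (average is 110).
loads = {"A": 100, "B": 105, "C": 95, "D": 140}
print(out_of_balance(loads))
print(imbalance_ratio(loads))
```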
Partitioning and placement of key
 Each node is assigned T
tokens
 Tokens of all nodes are
ordered according to
their values in the hash
space
 Every two consecutive
tokens define a range
 Ranges vary in size
Partitioning and placement of key (cont’d)
 An alternative strategy divides the hash space into
Q equally sized partitions
 The primary advantages of
this strategy are:
1. decoupling of partitioning
and partition placement,
2. enabling the possibility of
changing the placement
scheme at runtime.
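A minimal sketch of what decoupling partitioning from placement means. The mapping from key to partition is a fixed arithmetic function, while the mapping from partition to node is a separate table that can change. The value Q = 16, the node names, and the explicit placement table are illustrative simplifications, not Dynamo's mechanism.

```python
import hashlib

Q = 16              # hypothetical number of equal-sized partitions
RING_SIZE = 2**128  # size of the MD5 hash space


def partition_of(key, q=Q):
    """Partitioning: which of the q equal slices of the hash space the key falls in."""
    pos = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
    return pos * q // RING_SIZE


def node_for(key, placement, q=Q):
    """Placement: look up which node currently hosts the key's partition."""
    return placement[partition_of(key, q)]


placement = {p: f"node-{p % 4}" for p in range(Q)}
p = partition_of("cart:42")
print(p, node_for("cart:42", placement))

# Changing the placement at runtime leaves the partitioning untouched.
placement[p] = "node-new"
print(partition_of("cart:42"), node_for("cart:42", placement))
```

Because partition boundaries never move, a partition can be relocated or archived as a unit without rescanning keys.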
Partitioning and placement of key (cont’d)
 A third strategy also divides the hash space into
Q equally sized partitions
 each node is assigned Q/S
tokens where S is the
number of nodes in the
system.
 When a node leaves the
system, its tokens are
randomly distributed to the
remaining nodes
 when a node joins the
system it "steals" tokens
from nodes in the system
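A minimal sketch of membership changes when each node holds about Q/S partitions. The round-robin initial assignment, the rule of stealing from the most loaded node, and the function names are illustrative assumptions. The purely random redistribution on leave follows the slide's wording and does not by itself restore an exact Q/S split.

```python
import random
from collections import Counter


def initial_assignment(q, nodes):
    """Deal q partitions round-robin so each node holds about q/S of them."""
    return {p: nodes[p % len(nodes)] for p in range(q)}


def join(assignment, new_node, rng):
    """The newcomer steals partitions from the most loaded nodes until it holds about Q/S."""
    node_count = len(set(assignment.values()) | {new_node})
    for _ in range(len(assignment) // node_count):
        load = Counter(assignment.values())
        load.pop(new_node, None)
        donor = max(sorted(load), key=load.get)  # ties broken by name
        victim = rng.choice(sorted(p for p, n in assignment.items() if n == donor))
        assignment[victim] = new_node


def leave(assignment, node, rng):
    """Hand the departing node's partitions to randomly chosen remaining nodes."""
    remaining = sorted(set(assignment.values()) - {node})
    for p, owner in list(assignment.items()):
        if owner == node:
            assignment[p] = rng.choice(remaining)


rng = random.Random(0)
assignment = initial_assignment(12, ["A", "B", "C"])
join(assignment, "D", rng)
print(Counter(assignment.values()))
leave(assignment, "D", rng)
print(Counter(assignment.values()))
```

Only whole partitions change hands, so a join or leave moves a bounded, known set of key ranges.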
Conclusion
 Dynamo is a highly available and scalable
data store, used for storing state of a number
of core services of Amazon.com’s e-commerce
platform.
 Dynamo has been successful in handling
server failures, data center failures and
network partitions.
Conclusion (Cont’d)
 Dynamo is incrementally scalable and allows
service owners to scale up and down based
on their current request load.
 Dynamo allows service owners to customize
their storage system to meet their desired
performance, durability and consistency SLAs
by allowing them to tune the parameters N,
R, and W.
