Amazon DynamoDB - Wikipedia
Developer(s): Amazon.com
Initial release: January 2012[1]
Written in: Java
Operating system: Cross-platform
Available in: English
Type: Document-oriented database · Key–value database
License: Proprietary
Website: aws.amazon.com/dynamodb/ (http://aws.amazon.com/dynamodb/)
History
Werner Vogels, CTO at Amazon.com, provided a motivation for the project in his 2012
announcement.[3] Amazon began as a decentralized network of services. Originally, services had
direct access to each other's databases. When this became a bottleneck on engineering operations,
services moved away from this direct access pattern in favor of public-facing APIs. Still,
third-party relational database management systems struggled to handle Amazon's client base. This
culminated during the 2004[4][5] holiday season, when several technologies failed under high traffic.
Traditional databases often split data into smaller pieces to save space, but combining those
pieces during searches can make queries slower. Many of Amazon's services demanded mostly
primary-key reads on their data, and with speed a top priority, putting these pieces together was
extremely taxing.[6]
Willing to compromise on storage efficiency, Amazon responded with Dynamo: a highly available
key–value store built for internal use.[3] Dynamo, it seemed, was everything their engineers
needed, but adoption lagged. Amazon's developers instead opted for "just works" design patterns
with S3 and SimpleDB. While these systems had noticeable design flaws, they did not demand the
overhead of provisioning hardware or of scaling and re-partitioning data. Amazon's next iteration of
NoSQL technology, DynamoDB, automated these database management operations.
Overview
DynamoDB organizes data into tables, which are similar to
spreadsheets. Each table contains items (rows), and each item
is made up of attributes (columns). Each item has a unique
identifier called a primary key, which helps locate it within the
table.
DynamoDB Items
An Item in DynamoDB is a set of attributes that can be uniquely identified in a Table. An Attribute
is an atomic data entity that in itself is a Key-Value pair. The Key is always of String type, while the
value can be of one of multiple data types.
An Item is uniquely identified in a Table using a subset of its attributes called Keys.[7]
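
As a rough illustration, an item can be pictured as a mapping from attribute names (always strings) to typed values. The sketch below borrows the type tags of DynamoDB's low-level JSON notation ("S" for string, "N" for number); the attribute names and values are invented for the example, and the helper attribute_types is hypothetical.

```python
# Hypothetical item: every attribute name is a string, every value carries a type tag.
item = {
    "Artist": {"S": "No One You Know"},
    "SongTitle": {"S": "Call Me Today"},
    "Year": {"N": "2015"},  # in this notation, numbers are written as strings
}


def attribute_types(item):
    """Return a mapping of attribute name -> type tag for one item."""
    return {name: next(iter(value)) for name, value in item.items()}
```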
Keys In DynamoDB
A Primary Key is a set of attributes that uniquely identifies items in a DynamoDB Table. Creation
of a DynamoDB Table requires definition of a Primary Key. Each item in a DynamoDB Table is
required to have all of the attributes that constitute the Primary Key, and no two items in a Table
can have the same Primary Key. Primary Keys in DynamoDB can consist of either one or two
attributes.
When a Primary Key is made up of only one attribute, it is called a Partition Key. Partition Keys
determine the physical location of the associated item. In this case, no two items in a table can
have the same Partition Key.
When a Primary Key is made up of two attributes, the first one is called a "Partition Key" and the
second is called a "Sort Key". As before, the Partition Key decides the physical location of the data,
but the Sort Key then decides the relative logical position of the associated item's record inside that
physical location. In this case, two items in a Table can have the same Partition Key, but no two
items in a partition can have the same Sort Key. In other words, a given combination of Partition
Key and Sort Key is guaranteed to have at most one item associated with it in a DynamoDB
Table.[7]
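
The uniqueness rules above can be sketched with a toy in-memory table. This is only a model of the key semantics, not of DynamoDB's storage engine; the class name Table and its methods are assumptions made for the example.

```python
from bisect import insort


class Table:
    """Toy table with a composite primary key (partition key + sort key)."""

    def __init__(self):
        self._sort_keys = {}  # partition key -> sorted list of sort keys
        self._items = {}      # (partition key, sort key) -> item attributes

    def put(self, pk, sk, attributes):
        key = (pk, sk)
        if key not in self._items:
            insort(self._sort_keys.setdefault(pk, []), sk)
        # A second put with the same (pk, sk) replaces the existing item.
        self._items[key] = attributes

    def get(self, pk, sk):
        return self._items.get((pk, sk))

    def query(self, pk):
        """Return all items sharing one partition key, ordered by sort key."""
        return [self._items[(pk, sk)] for sk in self._sort_keys.get(pk, [])]
```

Two items may share a partition key as long as their sort keys differ; writing the same pair twice leaves a single item.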
DynamoDB Indices
The Primary Key of a Table serves as its default, or primary, index.
In addition, a DynamoDB Table can have Secondary Indices. A Secondary Index is defined on
attributes other than the Partition Key and Sort Key combination used by the Primary Index.
When a Secondary Index has the same Partition Key as the Primary Index but a different Sort Key, it is
called a Local Secondary Index.
When the Primary Index and the Secondary Index have different Partition Keys, the Secondary Index is
known as a Global Secondary Index.[7]
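
The distinction drawn above comes down to a comparison of partition keys. A minimal sketch follows; the function classify_index and the example attribute names are hypothetical.

```python
def classify_index(table_partition_key, index_partition_key):
    """Classify a secondary index by comparing its partition key with the table's."""
    if index_partition_key == table_partition_key:
        return "local"   # same partition key, different sort key
    return "global"      # different partition key


# Hypothetical table keyed on (Artist, SongTitle):
by_album = classify_index("Artist", "Artist")  # index keyed on (Artist, AlbumTitle)
by_genre = classify_index("Artist", "Genre")   # index keyed on (Genre, AlbumTitle)
```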
Development considerations
Access Patterns
To optimize DynamoDB performance, developers must carefully plan and analyze access patterns
when designing their database schema. This involves creating tables and configuring indices
accordingly.[11][12][13][14]
When querying non-indexed attributes, DynamoDB performs a full table scan, resulting in
significant overhead. Even when using the "filter" operation, DynamoDB scans the entire table
before applying filters.[12]
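
The cost difference can be made concrete with a small in-memory model that counts how many items each access path reads. The function names and sample data are assumptions for illustration and do not correspond to DynamoDB API calls.

```python
from collections import defaultdict


def build_partitions(items, key):
    """Group items by their partition key value."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key]].append(item)
    return partitions


def query(partitions, key_value):
    """Keyed access: read only the matching partition. Returns (matches, items_read)."""
    matches = partitions.get(key_value, [])
    return matches, len(matches)


def scan_with_filter(partitions, predicate):
    """Scan: read every item, then apply the filter. Returns (matches, items_read)."""
    read = [item for part in partitions.values() for item in part]
    return [item for item in read if predicate(item)], len(read)


orders = [
    {"customer": "c1", "status": "shipped"},
    {"customer": "c1", "status": "pending"},
    {"customer": "c2", "status": "shipped"},
    {"customer": "c3", "status": "pending"},
]
partitions = build_partitions(orders, "customer")
```

Here a query for customer "c1" reads two items, while filtering on the non-key attribute status reads all four before discarding the ones that do not match.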
Joins in DynamoDB
Amazon DynamoDB does not natively support join operations, as it is a NoSQL database optimized
for single-table, high-performance access patterns. However, join-like operations can be achieved
through integrations with external tools such as Amazon EMR, Amazon Athena, and Apache
Spark. These tools process DynamoDB data outside the database, allowing SQL-style joins for
analytical and batch workloads. While these methods expand DynamoDB's querying capabilities,
they introduce additional complexity and latency, making them unsuitable for real-time or
transactional use cases. DynamoDB's architecture is designed to avoid joins by encouraging
denormalization and single-table schemas.[15][16][17][18]
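
A brief sketch of what denormalization means in practice: instead of combining two lookups at read time, the needed attribute is copied into the item when it is written. The data and function names are invented for the example, and plain dictionaries stand in for tables.

```python
# Normalized layout: answering "order plus customer name" takes two lookups.
customers = {"c1": {"name": "Ada"}}
orders = {"o1": {"customer_id": "c1", "total": 30}}


def order_with_customer_joined(order_id):
    """Application-side join: read the order, then read its customer."""
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    return {**order, "customer_name": customer["name"]}


# Denormalized layout: the customer name is duplicated into the order item,
# so a single keyed read returns everything the caller needs.
denormalized_orders = {
    "o1": {"customer_id": "c1", "customer_name": "Ada", "total": 30},
}


def order_with_customer_denormalized(order_id):
    return denormalized_orders[order_id]
```

The trade-off is that the duplicated attribute must be kept up to date wherever it was copied.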
System architecture
Data structures
DynamoDB uses hashing and B-trees to manage data. Upon
entry, data is first distributed into different partitions by
hashing on the partition key. Each partition can store up to
10 GB of data and, by default, handle 1,000 write capacity units
(WCU) and 3,000 read capacity units (RCU).[19] One RCU
represents one strongly consistent read per second or two
eventually consistent reads per second for items up to 4 KB in
size.[20] One WCU represents one write per second for an item
up to 1 KB in size.
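
The capacity-unit definitions above translate into simple arithmetic. The sketch below assumes that 1 KB is 1,024 bytes and that items larger than the 4 KB or 1 KB step are rounded up to the next whole step; the function names are hypothetical.

```python
import math

READ_STEP_BYTES = 4 * 1024   # one RCU covers a read of up to 4 KB
WRITE_STEP_BYTES = 1024      # one WCU covers a write of up to 1 KB


def read_capacity_units(item_size_bytes, strongly_consistent=True):
    """RCUs needed to read one item of the given size once per second."""
    units = max(1, math.ceil(item_size_bytes / READ_STEP_BYTES))
    # An eventually consistent read costs half as much as a strongly consistent one.
    return units if strongly_consistent else units / 2


def write_capacity_units(item_size_bytes):
    """WCUs needed to write one item of the given size once per second."""
    return max(1, math.ceil(item_size_bytes / WRITE_STEP_BYTES))
```

Under these assumptions, a 6 KB item needs 2 RCUs for a strongly consistent read, 1 RCU for an eventually consistent read, and 6 WCUs for a write.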
Each partition is replicated across three nodes, one of which is designated the "leader node". All
write operations travel first through the leader node before propagating, which makes writes
consistent in DynamoDB. To maintain its status, the leader sends a "heartbeat" to every other node
every 1.5 seconds. Should another node stop receiving heartbeats, it can initiate a new leader election.
DynamoDB uses the Paxos algorithm to elect leaders.
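
The heartbeat mechanism can be sketched from a follower's point of view. This models only failure detection, not the Paxos election itself. The 1.5-second interval comes from the text; the timeout of two missed heartbeats and the class name Follower are assumptions.

```python
HEARTBEAT_INTERVAL = 1.5                 # seconds, as described above
ELECTION_TIMEOUT = 2 * HEARTBEAT_INTERVAL  # assumed: tolerate two missed heartbeats


class Follower:
    """Tracks the leader's heartbeats and decides when to call an election."""

    def __init__(self, now=0.0, timeout=ELECTION_TIMEOUT):
        self.timeout = timeout
        self.last_heartbeat = now

    def on_heartbeat(self, now):
        """Record the arrival time of a heartbeat from the leader."""
        self.last_heartbeat = now

    def should_start_election(self, now):
        """True once the leader has been silent for longer than the timeout."""
        return now - self.last_heartbeat > self.timeout
```

Timestamps are passed in explicitly so the behavior can be checked without waiting on a real clock.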
Amazon engineers originally avoided Dynamo due to engineering overheads like provisioning and
managing partitions and nodes.[6] In response, the DynamoDB team built a service it calls
AutoAdmin to manage a database.[21] AutoAdmin replaces a node when it stops responding by
copying data from another node. When a partition exceeds any of its three thresholds (RCU, WCU,
or 10 GB), AutoAdmin automatically adds partitions to further segment the data.[19]
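
The split condition described above is a check against three limits. In this sketch the limits are the default figures quoted earlier in the section, 10 GB is taken as 10 × 1024³ bytes, and the function exceeded_thresholds is hypothetical; how AutoAdmin actually measures and reacts is not covered here.

```python
MAX_PARTITION_BYTES = 10 * 1024**3  # 10 GB
MAX_PARTITION_RCU = 3000
MAX_PARTITION_WCU = 1000


def exceeded_thresholds(size_bytes, rcu, wcu):
    """Return the names of the partition limits that are currently exceeded."""
    exceeded = []
    if size_bytes > MAX_PARTITION_BYTES:
        exceeded.append("size")
    if rcu > MAX_PARTITION_RCU:
        exceeded.append("rcu")
    if wcu > MAX_PARTITION_WCU:
        exceeded.append("wcu")
    return exceeded


def needs_split(size_bytes, rcu, wcu):
    """A partition is a candidate for splitting if any single limit is exceeded."""
    return bool(exceeded_thresholds(size_bytes, rcu, wcu))
```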
Just like indexing systems in the relational model, DynamoDB demands that any updates to a table
be reflected in each of the table's indices. DynamoDB handles this using a service it calls the "log
propagator", which subscribes to the replication logs in each node and sends additional Put,
Update, and Delete requests to indices as necessary.[21] Because indices result in substantial
performance hits for write requests, DynamoDB caps the number of indices per table; for example, a
table can have at most five Local Secondary Indices.[22]
Query execution
Suppose that a DynamoDB user issues a write operation (a Put, Update, or Delete). While a typical
relational system would convert the SQL query to relational algebra and run optimization
algorithms, DynamoDB skips both processes and gets right to work.[21] The request arrives at the
DynamoDB request router, which authenticates the request ("Is the request coming from where/whom it
claims to be?") and checks for authorization ("Does the user submitting the request have the
requisite permissions?"). Assuming these checks pass, the system hashes the request's partition key
to arrive at the appropriate partition. There are three nodes within, each with a copy of the
partition's data. The system first writes to the leader node, then writes to a second node, then
sends a "success" message, and finally continues propagating to the third node. Writes are
consistent because they always travel first through the leader node.
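
The write path can be sketched as follows. The hash function (SHA-256), the modulo placement, and all names are assumptions made for illustration; the text does not specify how DynamoDB maps a hash value to a partition.

```python
import hashlib


def partition_for(partition_key, partition_count):
    """Map a partition key to a partition number by hashing (assumed scheme)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


class Partition:
    """Three replicas of one partition; replica 0 plays the leader."""

    def __init__(self):
        self.replicas = [{}, {}, {}]

    def write(self, key, value):
        """Write to the leader and one follower, then acknowledge."""
        self.replicas[0][key] = value  # leader first
        self.replicas[1][key] = value  # then a second node
        return "success"               # acknowledged before the third node has the data

    def finish_propagation(self, key):
        """Bring the third replica up to date after the acknowledgement."""
        self.replicas[2][key] = self.replicas[0][key]
```

Between write and finish_propagation the third replica is briefly behind the other two, which is the window in which a read from that node can return older data.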
Finally, the log propagator propagates the change to all indices. For each index, it grabs that
index's primary key value from the item, then performs the same write on that index without log
propagation. If the operation is an Update to a preexisting item, the updated attribute may serve as
a primary key for an index, and thus the B-tree for that index must update as well. B-trees only
handle insert, delete, and read operations, so in practice, when the log propagator receives an
Update operation, it issues both a Delete operation and a Put operation to all indices.
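
The translation of a base-table change into index operations might look like the sketch below. The function index_operations and the item layout are hypothetical; the point is only that an Update appears to the index as a Delete of the old entry followed by a Put of the new one.

```python
def index_operations(old_item, new_item, index_key):
    """Translate one base-table change into operations for a single index.

    old_item is None for a fresh Put; new_item is None for a Delete.
    """
    ops = []
    if old_item is not None and index_key in old_item:
        ops.append(("delete", old_item[index_key]))
    if new_item is not None and index_key in new_item:
        ops.append(("put", new_item[index_key]))
    return ops


# Updating the indexed attribute "Genre" removes the old entry and adds the new one.
update_ops = index_operations(
    {"Song": "s1", "Genre": "rock"},
    {"Song": "s1", "Genre": "jazz"},
    "Genre",
)
```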
Now suppose that a DynamoDB user issues a Get operation. The request router proceeds as before
with authentication and authorization. Next, as above, we hash our partition key to arrive at the
appropriate partition. Now, we encounter a problem: with three nodes in eventual consistency with
one another, how can we decide which to investigate? DynamoDB offers the user two options when
issuing a read: strongly consistent and eventually consistent. A strongly consistent read visits the leader node. But
the consistency-availability trade-off rears its head again here: in read-heavy systems, always
reading from the leader can overwhelm a single node and reduce availability.
The second option, an eventually consistent read, selects a random node. In practice, this is where
DynamoDB trades consistency for availability. If we take this route, what are the odds of an
inconsistency? We'd need a write operation to return "success" and begin propagating to the third
node, but not finish. We'd also need our Get to target this third node. This means a 1-in-3 chance of
inconsistency within the write operation's propagation window. How long is this window? Any
number of catastrophes could cause a node to fall behind, but in the vast majority of cases, the
third node is up-to-date within milliseconds of the leader.
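
The 1-in-3 figure follows from picking one of three replicas uniformly at random while exactly one of them is behind. A small sketch of that arithmetic, with version numbers standing in for replica state (the function name and the version scheme are assumptions):

```python
from fractions import Fraction


def stale_read_probability(replica_versions, latest_version):
    """Chance that a uniformly random replica returns an outdated version."""
    stale = sum(1 for version in replica_versions if version < latest_version)
    return Fraction(stale, len(replica_versions))


# During the propagation window: leader and second node hold version 2,
# the third node still holds version 1.
during_window = stale_read_probability([2, 2, 1], latest_version=2)

# Once propagation completes, no replica is behind.
after_window = stale_read_probability([2, 2, 2], latest_version=2)
```

A strongly consistent read avoids the question entirely by always going to the leader, at the cost of concentrating read traffic on one node.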
See also
Amazon Aurora
Amazon DocumentDB
Amazon Redshift
Amazon Relational Database Service
ScyllaDB (DynamoDB compatible)
Comparison of relational database management systems
References
1. "Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications - All Things Distributed" (https://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html). www.allthingsdistributed.com. 2012-01-18.
2. "What is Amazon DynamoDB?" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html).
3. Vogels, Werner (2012-01-18). "Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications" (http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html). All Things Distributed blog. Retrieved 2012-01-21.
4. "How Amazon's DynamoDB helped reinvent databases" (https://www.networkworld.com/article/939814/how-amazon-s-dynamodb-helped-reinvent-databases.html). Network World. Retrieved 2023-11-30.
5. Brockmeier, Joe (2012-01-18). "Amazon Takes Another Pass at NoSQL with DynamoDB" (https://readwrite.com/amazon-enters-the-nosql-market/). ReadWrite. Retrieved 2023-11-30.
6. DeCandia, Giuseppe; Hastorun, Deniz; Jampani, Madan; Kakulapati, Gunavardhan; Lakshman, Avinash; Pilchin, Alex; Sivasubramanian, Swaminathan; Vosshall, Peter; Vogels, Werner (October 2007). "Dynamo: Amazon's Highly Available Key–value Store". SIGOPS Oper. Syst. Rev. 41 (6): 205–220. doi:10.1145/1323293.1294281 (https://doi.org/10.1145%2F1323293.1294281). ISSN 0163-5980 (https://search.worldcat.org/issn/0163-5980).
7. "Core components of Amazon DynamoDB - Amazon DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html). docs.aws.amazon.com. Retrieved 2023-05-28.
8. "Supported data types and naming rules in Amazon DynamoDB - Amazon DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html). docs.aws.amazon.com. Retrieved 2023-05-28.
9. Amazon DynamoDB - The Definitive Guide: Explore enterprise-ready, serverless NoSQL with predictable, scalable performance. ISBN 9781803248325.
10. "Troubleshooting latency issues in Amazon DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TroubleshootingLatency.html).
11. "Core components of Amazon DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html).
12. "Scanning tables in DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html).
13. "Improving data access with secondary indexes in DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html).
14. "Querying tables in DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html).
15. Serverless Programming Cookbook: Practical solutions to building serverless applications using Java and AWS. ISBN 9781788621533.
16. Amazon DynamoDB - The Definitive Guide: Explore enterprise-ready, serverless NoSQL with predictable, scalable performance. ISBN 9781803248325.
17. "Unsuitable workloads" (https://docs.aws.amazon.com/whitepapers/latest/best-practices-for-migrating-from-rdbms-to-dynamodb/unsuitable-workloads.html).
18. "Querying data in DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Querying.html).
19. Gunasekara, Archie (2016-06-27). "A Deep Dive into DynamoDB Partitions" (https://shinesolutions.com/2016/06/27/a-deep-dive-into-dynamodb-partitions/). Shine Solutions Group. Retrieved 2019-08-03.
20. "Amazon DynamoDB Developer Guide" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html). AWS. 2012-08-10. Retrieved 2019-07-18.
21. AWS re:Invent 2018: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) (https://www.youtube.com/watch?v=yvBR71D0nAQ). 2018-11-27. Retrieved 2019-08-03.
22. "Service, account, and table quotas in Amazon DynamoDB - Amazon DynamoDB" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ServiceQuotas.html#limits-secondary-indexes). docs.aws.amazon.com. Retrieved 2024-01-09.
External links
Official website (http://aws.amazon.com/dynamodb/)
Video: AWS re:Invent 2019: [REPEAT 1] Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) (https://www.youtube.com/watch?v=6yqfmXiZTlM)