Selecting the Right Database Service

Building applications is all about data collection and management. If you design an e-commerce application, you want to show available inventory catalog data to customers and collect purchase data as they make a transaction. Similarly, if you are running an autonomous vehicle application, you need to collect vehicle sensor data and act on that data in real time. As of now, you have learned about networking, storage, and compute in previous chapters. In this chapter, you will learn about the choices of database services available in AWS to complete the core architecture tech stack.

With so many choices at your disposal, it is easy to get analysis paralysis. So, in this chapter, we will walk through the available options and help you make sure that you are using the right tool for the job.

In this chapter, you will learn the following topics:

• A brief history of databases and data-driven innovation trends
• Database consistency models
• Database usage models
• AWS relational and non-relational database services
• Choosing the right tool for the job
• Migrating databases to AWS

By the end of this chapter, you will have learned about the different AWS database service offerings and how to choose a suitable database for your workload.


A brief history of databases

Relational databases have been around since the 1970s. In a relational database, data is organized into tables, and rows in tables are associated with other rows in other tables by using the column values in each row as relationship keys. Another important feature of relational databases is that they normally use Structured Query Language (SQL) to access, insert, update, and delete records. SQL was created by IBM researchers Raymond Boyce and Donald Chamberlin in the 1970s. Relational databases and SQL have served us well for decades.

As the internet's popularity increased in the 1990s, we started hitting scalability limits with relational databases. Additionally, a wider variety of data types started cropping up. Relational Database Management Systems (RDBMSs) were struggling to keep up with these new demands. This spurred the development of new designs, and we got the term NoSQL databases. As confusing as the term is, it does convey the idea that these databases can deal with data that is not structured, and can do so at scale. The term NoSQL was popularized by Eric Evans and Johan Oskarsson to describe databases that were not relational.

How you intend to store and query data drives the choice between the two models. Let's see an example of making a choice between a relational and non-relational database. Suppose you are building a banking application: you need every customer's bank account to always be consistent and to roll back in case of any error. In such a scenario, you want to use a relational database. For a relational database, if some information is not available, then you are forced to store null or some other value. Now take an example of a user profile store, where users may or may not provide optional details such as their city, country, occupation, and education. In such cases, you want to use a non-relational database and store only the information provided by the user, without adding null values where the user doesn't provide details (unlike in relational databases).
The following tables show a relational and a non-relational representation of the same user data.

A relational table:

First Name    Last Name    City       Country
Maverick      Doe          Seattle    USA
Goose         Henske       NULL       NULL
John          Gayle        London     NULL

The same data in a non-relational store:

First Name    Last Name    City       Country
Maverick      Doe          Seattle    USA
Goose         Henske
John          Gayle        London

Here, users provide their first name, last name, city, and country. You can see that only Maverick provided their full information, while the other users gave partial details. In a relational database, you must store the missing information as a NULL value across all columns, while in a non-relational database, the missing attributes simply don't exist for that record.

It is nothing short of amazing what has occurred since then. Hundreds of new offerings have been developed, each trying to solve a different problem. In this environment, deciding on the best service or product to solve your problem becomes complicated. You must consider not only your current requirements and workloads but also whether your choice of database will cover your future requirements and new demands. With so much data being generated, it is natural that much of today's innovation is driven by data. Let's look in detail at how data is driving innovation.


Data-driven innovation trends

Since high-speed internet became available in the last decade, more and more data is getting generated. Before we dive into the database services themselves, let's look at a few trends that give perspective on data:

• The surge of data: Our current era is witnessing an enormous surge in data generation. Managing the vast amount of data originating from your business applications is essential. However, the exponential growth primarily stems from the data produced by network-connected devices. These devices, including but not limited to mobile phones, connected vehicles, smart homes, wearable technologies, household appliances, security systems, industrial equipment, machinery, and electronic gadgets, constantly generate real-time data. Notably, over one-third of mobile sign-ups on cellular networks result from built-in cellular connections in most modern cars. In addition, applications generate real-time data, such as purchase data from e-commerce sites, user behavior from mobile apps, and social media posts and interactions.
• Microservices change analytics requirements - The move to microservices is revolutionizing organizations' data and analytics requirements. Rather than developing monolithic applications, companies are shifting towards a microservices architecture that divides applications into smaller, independent services so teams can ship changes faster. Microservices enable developers to break down their applications into smaller components, each of which can use the data store best suited to its needs. Analytics must be incorporated into every aspect of the business, rather than just being an after-the-fact activity. Monitoring the organization's operations in real time is critical to fuel innovation and quick decision-making, whether through human intervention or automation.
• DevOps driving fast changes - DevOps teams rely on automated development tools to facilitate continuous software development, deployment, and enhancement. DevOps emphasizes effective communication, collaboration, and integration, which increases the rate of change and change management, enabling businesses to adapt to evolving market needs and stay ahead of the competition.

Now that you have seen the trends the industry is adopting, let's learn some database basics, starting with the database consistency models.


Database consistency models

In the context of databases, ensuring transaction data consistency involves restricting any database transaction's ability to modify data in unauthorized ways. When data is written to the database, it must conform to all defined rules and constraints. This stringent process ensures that data integrity is maintained and that the information stored in the database is accurate and trustworthy. Currently, there are two popular data consistency models. We'll discuss these models in the following subsections.


The ACID model

When database sizes were measured in megabytes, we could have stringent requirements that enforced strict consistency. Since storage has become exponentially cheaper, databases can be much bigger, often measured in terabytes and even petabytes. For this reason, making databases strictly consistent at scale has become harder and more expensive. The ACID model enforces strict consistency and stands for the following:

• Atomicity: For an operation to be considered atomic, it should ensure that the transactions within the operation either all succeed or all fail. If one of the transactions fails, all operations should fail and be rolled back. Could you imagine what would happen if you withdrew money at an ATM and your account was debited but no cash was dispensed?
• Consistency: The database must remain structurally sound and consistent after completing each transaction.
• Isolation: Transactions are isolated and don't contend with each other. Access to data from multiple users is moderated to avoid contention. Isolation guarantees that two transactions cannot interfere with each other's intermediate state.
• Durability: After a transaction is completed, any changes a transaction makes should be durable and permanent, even in the event of a failure such as a power failure.

The alternative to ACID is the BASE model, which we will describe next. If performance were not a consideration, using the ACID model would always be the right choice. BASE only came into the picture because the ACID model could not scale in many instances, especially with internet applications that serve a worldwide client base.
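To make atomicity concrete, here is a minimal sketch of a funds transfer using Python's built-in sqlite3 module (the table, account names, and amounts are illustrative assumptions, not tied to any particular product):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 1000), ("bob", 0)])

def transfer(conn, src, dst, amount):
    # Both UPDATEs run in one transaction: they commit together or roll back together
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
    except ValueError:
        pass  # the failed transfer left no partial changes behind

transfer(conn, "alice", "bob", 250)   # succeeds
transfer(conn, "alice", "bob", 5000)  # fails and rolls back atomically
print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {'alice': 750, 'bob': 250}

Because the debit and the credit are wrapped in a single transaction, no reader can ever observe money that has left one account without arriving in the other.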


The BASE model

ACID was taken as the law of the land for many years, but a new model emerged with the advent of bigger-scale projects and implementations. In many instances, the ACID model is more pessimistic than required, and it's too safe at the expense of scalability and performance. The BASE model relaxes some ACID requirements, such as data freshness, immediate consistency, and accuracy, to gain other benefits, such as scalability and availability.

BASE stands for Basic Availability, Soft-state, and Eventual consistency. Let's explore what this means further:

• Basic availability: The database is available for the majority of the time (but not necessarily all of the time). The system does not enforce immediate consistency of data replication when writing a record.
• Soft-state: The state of the system may change over time as replicas converge, even without new input. Copies of the data are kept on multiple nodes to provide fault tolerance. As sales come in and get written into the system, different readers may see different results: some may read the latest numbers, and others may be a few milliseconds behind and not have the latest updates. In this case, the readers will have different results, but if they rerun the query soon after, they will likely get the updated values. For many applications, this trade-off of getting the results fast versus being entirely up to date may be acceptable.
• Eventual consistency: The data exhibits consistency eventually, though perhaps not until the data is retrieved at a later point.

The BASE model's requirements are looser than the ACID model's, and a direct one-for-one mapping between the two does not exist. The BASE consistency model is mainly used in aggregate databases (including wide-column databases), key-value databases, and document databases.

Let's look at the database usage models, which are a crucial differentiator when storing your data.


Database usage models

In any database, two operations, data ingestion and data retrieval, will always be present.

On the ingestion side, the data will be ingested in two different ways. It will either be a full data load or a change data capture (CDC) set, which is changes in existing data or brand-new data. But what drives your choice of database is not the fact that these two operations are present but rather how heavily your workload leans on each of them.

Two technologies have been the standards for addressing these questions for many years: online transaction processing (OLTP) systems and online analytics processing (OLAP) systems. The first question that needs to be answered is: is it more important for the database to perform during data ingestion or retrieval? In other words, does the workload need to be read-heavy or write-heavy?


Online transaction processing (OLTP)

OLTP systems are optimized for frequent, small write operations (such as inserts and updates) and short queries, with less emphasis on complex analytics. The relevant performance measure is the number of transactions executed in a given time (usually seconds). Data is typically stored using a schema that has been normalized, usually using the 3rd normal form (3NF). Before moving on, let's quickly discuss 3NF. 3NF is a state that a relational database schema design can possess.

A table using 3NF will reduce data duplication, minimize data anomalies, and guarantee referential integrity. The form was defined by Edgar F. Codd, the inventor of the relational model for database management.

A database relation (for example, a database table) meets the 3NF standard if each table's columns only depend on the table's primary key. Let's look at an example of a table that fails to meet this standard: an employee table that, among other columns, contains the employee's supervisor's name as well as the supervisor's phone number. A supervisor can undoubtedly have more than one employee under supervision, so the supervisor's phone number would be duplicated in every row for those employees, and it depends on the supervisor rather than on the employee. To resolve this issue, we could add a supervisor table, put the supervisor's name and phone number in the supervisor table, and remove the phone number from the employee table.
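Here is a minimal sketch of that normalization as SQL DDL, run through Python's sqlite3 module (all table and column names are illustrative assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fails 3NF: supervisor_phone depends on the supervisor, not on the employee key
-- CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
--                        supervisor_name TEXT, supervisor_phone TEXT);

-- 3NF: every non-key column depends only on its own table's primary key
CREATE TABLE supervisor (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    phone TEXT NOT NULL
);

CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    supervisor_id INTEGER REFERENCES supervisor(id)  -- relationship key
);
""")

Now the supervisor's phone number is stored exactly once, no matter how many employees report to that supervisor.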


Online analytics processing (OLAP)

Conversely, OLAP databases do not process many transactions. Once data is ingested, it is usually not modified, and retrieval is often performed using a query language, most commonly SQL. Queries in an OLAP environment are often complex and involve subqueries and aggregations. In the context of OLAP systems, the performance of queries is the relevant measure. An OLAP database typically contains historical data aggregated and stored in multi-dimensional schemas (typically using the star schema).

For example, a bank might handle millions of daily transactions in an OLTP system, storing deposits, withdrawals, and transfers as they happen, while an OLAP system aggregates that data for daily reporting and, once aggregated, for more extended reporting periods.


As you have learned about the database consistency models and usage models, you must be wondering which model fits which usage pattern. Broadly, ACID is the natural fit for OLTP, while BASE can be applied for OLAP.

Let's go further and learn about the various kinds of database services available in AWS and how to choose between them.

AWS relational and non-relational database services

AWS offers a broad range of database services that are purpose-built for every major use case. These services are battle-tested and provide deep functionality, so you get the high availability, performance, reliability, and security required by production workloads.

AWS provides relational databases for transactional applications, such as Amazon RDS and Amazon Aurora, non-relational databases like Amazon DynamoDB for internet-scale applications, an in-memory data store called Amazon ElastiCache for caching and real-time workloads, and a graph database, Amazon Neptune, for developing applications with highly connected data. Migrating your existing databases to AWS is made simple and cost-effective with the AWS Database Migration Service. Each of these database services is so vast that going into detail would warrant a book for each service itself.

There has been an explosion of innovation in the non-relational database space, but relational databases served us well for many years without our needing any other type of database. A relational database is probably still the default choice for many workloads. So, let's analyze the different relational options that AWS offers us.


Amazon Relational Database Service (RDS)

Given what we said in the previous section, it is not surprising that Amazon has a robust lineup of relational database offerings. It is, of course, possible to install your database on an EC2 instance and manage it yourself. Unless you have an unusual requirement that demands full control of the host, however, once you consider all the costs, including system administration costs, you will most likely be better off and save money using Amazon RDS.

Amazon RDS was designed by AWS to simplify the management of crucial transactional applications by providing an easy-to-use platform for setting up, operating, and scaling a relational database in the cloud. With RDS, laborious administrative tasks such as hardware provisioning, database setup, patching, and backups are automated. RDS offers instance types optimized for memory, performance, or I/O, and supports six well-known database engines: Amazon Aurora (compatible with MySQL and PostgreSQL), MySQL, PostgreSQL, MariaDB, SQL Server, and Oracle.

If you want more control of your database at the OS level, AWS has now launched Amazon RDS Custom. It provisions all AWS resources in your account, enabling full access to the underlying Amazon EC2 resources and database environment. You can install third-party and packaged applications directly onto the database instance, as you would in a self-managed environment, while retaining the automation that RDS traditionally provides.

• Community (Postgres, MySQL, and MariaDB): AWS offers RDS with three different open-source engines: PostgreSQL, MySQL, and MariaDB.
• Amazon Aurora (Postgres and MySQL): As you can see, Postgres and MySQL are here, as they are in the community offering, but running them within the Aurora wrapper can add many benefits. AWS started offering the MySQL-compatible service in 2014 and added the Postgres version in 2017. Some of these benefits are as follows:

a. Threefold performance increase over the vanilla PostgreSQL version
b. Fivefold performance increase over the vanilla MySQL version
c. Automatic six-way replication across availability zones to improve availability and fault tolerance

• Commercial (Oracle and SQL Server): Many organizations still run Oracle workloads, so AWS offers RDS with the commercial Oracle and SQL Server engines. Keep in mind that, on top of the cost of this service, there will be a licensing cost associated with using this service, which otherwise might not be present if you use a community edition.


RDS is a fully managed database service offered by AWS. Let's look at the key attributes that make your database more resilient and performant.

Multi-AZ deployments - Multi-AZ deployments in RDS provide improved availability and du-
rability for database instances, making them an ideal choice for production database workloads.
With Multi-AZ DB instances, RDS synchronously replicates data to a standby instance in a dif-
ferent Availability Zone (AZ) for enhanced resilience. You can change your environment from
Single-AZ to Multi-AZ at any time. Each AZ runs on its own distinct, independent infrastructure
and is built to be highly dependable.

In the event of an infrastructure failure, RDS initiates an automatic failover to the standby instance,
allowing you to resume database operations as soon as the failover is complete. Additionally, the
endpoint for your DB instance remains the same after a failover, eliminating manual adminis-
trative intervention and enabling your application to resume database operations seamlessly.

Read replicas - RDS makes it easy to create read replicas of your database and automatically keeps them in sync with the primary database (for MySQL, PostgreSQL, and MariaDB engines). Read replicas are helpful for both read scaling and disaster recovery use cases. You can add read replicas to handle read workloads, so your master database doesn't become overloaded with read requests. Depending on the database engine, you may be able to position your read replica in a different region than your master, providing you with the option of having a read location closer to your users. A read replica can also be promoted in case of an issue with the master, ensuring you have coverage in the event of a disaster.

While both Multi-AZ deployments and read replicas can be used independently, they can also be used together to provide even greater availability and performance for your database. In this case, you would create a Multi-AZ deployment for your primary database and then create one or more read replicas from it. This combines the failover capabilities of Multi-AZ deployments with the performance improvements provided by read replicas.
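As a rough sketch of how these two features are enabled through the API, the following boto3 calls create a Multi-AZ primary and then attach a read replica (the identifiers, instance class, and credentials are placeholder assumptions):

import boto3

rds = boto3.client("rds")  # assumes credentials and a default region are configured

# Primary instance with Multi-AZ: RDS maintains a synchronous standby in another AZ
rds.create_db_instance(
    DBInstanceIdentifier="orders-db",
    Engine="postgres",
    DBInstanceClass="db.m6g.large",
    AllocatedStorage=100,                 # GiB
    MasterUsername="dbadmin",
    MasterUserPassword="change-me",       # in practice, use AWS Secrets Manager
    MultiAZ=True,
)

# Asynchronously replicated read replica to offload read traffic
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="orders-db-replica-1",
    SourceDBInstanceIdentifier="orders-db",
)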

Automated backup - With RDS, a scheduled backup is automatically performed once a day during a backup window that you define. You can set the retention period for your backups, which can be up to 35 days. While automated backups are available for 35 days, you can retain longer backups using the manual snapshots feature provided by RDS. RDS keeps multiple copies of your backup in each AZ where you have an instance deployed to ensure their durability and availability. During the automatic backup window, storage I/O may be briefly suspended for Single-AZ deployments, which can cause a brief period of elevated latency. However, no I/O suspension occurs for Multi-AZ deployments, because the backup is taken from the standby, which helps achieve high performance if your application is time-sensitive and needs to be always on.

Database snapshots - You can manually create backups of your instance stored in Amazon S3,
which are retained until you decide to remove them. You can use a database snapshot to create a
new instance whenever needed. Even though database snapshots function as complete backups,
you are charged only for incremental storage usage.

Data storage - Amazon RDS supports the most demanding database applications by utilizing Amazon Elastic Block Store (Amazon EBS) volumes for database and log storage. There are two SSD-backed storage options to choose from: a cost-effective general-purpose option and a high-performance provisioned IOPS option. RDS automatically stripes across multiple EBS volumes to improve performance based on the requested storage amount.

Scalability - You can often scale your RDS database compute and storage resources with minimal or no downtime to match your performance and price requirements. You may want to scale your database instance up or down, including scaling up to handle a higher load, scaling down to preserve resources when you have a lower load, or both to control costs if you have regular periods of high and low usage.

Monitoring - RDS offers a set of 15-18 monitoring metrics that are automatically available for you. Through Amazon CloudWatch, you can monitor crucial aspects such as CPU utilization, memory usage, storage, and latency. You can view the metrics in individual or multiple graphs or integrate them into your existing monitoring tool. Additionally, RDS provides Enhanced Monitoring, which offers access to more than 50 additional metrics. By enabling Enhanced Monitoring, you can specify the granularity at which you want metrics collected, down to intervals of one second. Enhanced Monitoring is available for all six database engines supported by RDS.

Amazon RDS Performance Insights is a performance monitoring tool for Amazon RDS databases. It allows you to monitor the performance of your databases in real time and provides insights and recommendations for improving the performance of your applications. With Performance Insights, you can view a graphical representation of your database's performance over time, identify potential performance bottlenecks or issues, and take action to resolve them.

Performance Insights also provides recommendations for improving the performance of your database, for example, by surfacing the SQL queries and wait events that contribute most to database load so that you can tune them to improve the overall performance of your application.

Security - Controlling network access to your database is made simple with RDS. You can run your database instances in Amazon Virtual Private Cloud (Amazon VPC) to isolate them and control access with firewall settings. Additionally, most RDS engine types offer encryption at rest, and all engines support encryption in transit. RDS offers a wide range of compliance readiness, including HIPAA eligibility.

You can learn more about RDS by visiting the AWS page: https://aws.amazon.com/rds/.

As you have learned about RDS, let’s dive deeper into AWS cloud-native databases with Amazon
Aurora.


Amazon Aurora

Amazon Aurora is a relational database service that blends the availability and rapidity of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. Aurora is built with full compatibility with the MySQL and PostgreSQL engines, enabling applications written for those engines to run largely unchanged, and it integrates with the broader AWS ecosystem so that you can construct serverless and machine learning (ML)-driven applications around it. Aurora is fully managed and automates time-intensive administration tasks, including hardware provisioning, database setup, patching, and backups. It provides commercial-grade databases' reliability, availability, and security while costing only a fraction of the price.

Amazon Aurora has many key features that have been added to expand the service’s capabilities
since it launched in 2014. Let’s review some of these key features:


• Amazon Aurora Serverless - This is an on-demand, auto-scaling configuration of Aurora that can automatically start up, shut down, and adjust its capacity based on the needs of your application. Amazon Aurora Serverless v2 scales almost instantly to accommodate hundreds of thousands of transactions in a fraction of a second, adjusting capacity in fine-grained increments to ensure the right resources for your application. You won't have to manage the database capacity, and you'll only pay for the resources your application consumes. Compared to provisioning capacity for peak load, you could save up to 90% of your database cost with Amazon Aurora Serverless.
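A minimal boto3 sketch of provisioning an Aurora Serverless v2 cluster follows; the cluster name, credentials, and capacity range are placeholder assumptions:

import boto3

rds = boto3.client("rds")

# The cluster defines the serverless capacity range, in Aurora capacity units (ACUs)
rds.create_db_cluster(
    DBClusterIdentifier="app-cluster",
    Engine="aurora-postgresql",
    MasterUsername="dbadmin",
    MasterUserPassword="change-me",
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 16},
)

# Instances in a Serverless v2 cluster use the special 'db.serverless' class
rds.create_db_instance(
    DBInstanceIdentifier="app-cluster-writer",
    DBClusterIdentifier="app-cluster",
    Engine="aurora-postgresql",
    DBInstanceClass="db.serverless",
)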
• Global Database - An Aurora Global Database spans your database across multiple AWS regions, allowing for faster local reads and rapid disaster recovery. Global Database utilizes storage-based replication to replicate your database across various regions, typically resulting in less than one second of replication lag. By utilizing a secondary region, you can have a backup option in case of a regional outage or degradation and can quickly recover. Additionally, it takes less than one minute to promote a database in the secondary region to full read/write capability.

• Encryption - With Amazon Aurora, you can encrypt your databases by using keys you
create and manage through AWS Key Management Service (AWS KMS). When you use
Amazon Aurora encryption, data stored on the underlying storage and automated backups,
snapshots, and replicas within the same cluster are encrypted. Amazon Aurora secures
data in transit using SSL (AES-256).
• Automatic, continuous, incremental backups and point-in-time restore - Amazon Aurora can restore your database to any second within your retention period; this capability is known as point-in-time restore. Backups are stored in Amazon Simple Storage Service (Amazon S3). They are incremental, continuous, and automatic, and they have no impact on database performance.
• Multi-AZ Deployments with Aurora Replicas - In the event of an instance failure, Amazon
Aurora leverages Amazon RDS Multi-AZ technology to perform an automated failover to
one of the up to 15 Amazon Aurora Replicas you have established across three AZs. If you
have not provisioned any Amazon Aurora Replicas, in the event of a failure, Amazon RDS
will automatically attempt to create a new Amazon Aurora DB instance for you.
• Compute scaling - You can scale the provisioned instances powering your deployment up or down as your needs change. The process of compute scaling typically takes only a few minutes to complete.


• Storage auto-scaling - Amazon Aurora automatically scales the size of your database volume as your storage needs grow, so you don't have to provision excess storage to handle the future growth of your database.


• Fault-tolerant and self-healing storage - Each 10 GB chunk of your database volume
is replicated six times across three Availability Zones, making Amazon Aurora storage
fault-tolerant. It can handle the loss of up to two data copies without affecting write
availability, and up to three copies without affecting read availability. Amazon Aurora
storage is also self-healing, continuously scanning data blocks and disks for errors and
replacing them automatically.
• Network isolation - Amazon Aurora operates within Amazon VPC, providing you with the ability to segregate your database within your own virtual network and to use firewall settings to control network access to your DB instances.



• Monitoring and metrics - Amazon Aurora offers a range of monitoring and performance
tools to help you keep your database instances running smoothly. You can use Amazon
CloudWatch metrics at no additional cost to monitor over 20 key operational metrics, such
as compute, memory, storage, query throughput, cache hit ratio, and active connections.
If you need more detailed insights, you can use Enhanced Monitoring to gather metrics
from the operating system instance that your database runs on. Additionally, you can use
Amazon RDS Performance Insights, a powerful database monitoring tool that provides an
easy-to-understand dashboard for visualizing database load and detecting performance
problems, so you can take corrective action quickly.
• Governance - Database activity is logged across your AWS infrastructure, providing you with oversight over storage, analysis, and corrective actions. You can help ensure your organization remains compliant with regulations because the platform enables you to capture and unify user activity and API usage across AWS Regions and accounts in a centralized, controlled environment, which can help you avoid penalties.
• Amazon Aurora machine learning - With Amazon Aurora machine learning, you can
incorporate machine learning predictions into your applications through SQL program-
ming language, eliminating the need to acquire separate tools or possess prior machine
learning experience. It offers a straightforward, optimized, and secure integration between
Aurora and AWS ML services, eliminating the need to create custom integrations or move
data between them.

Enterprise use cases for Amazon Aurora span multiple industries. Here are examples of some of the most common use cases and what Aurora brings to each:

• Revamp corporate applications - Ensure high availability and performance for enterprise applications, including CRM, ERP, supply chain, and billing applications.
• Build a Software-as-a-Service (SaaS) application - Operate with flexible instance and storage scaling to support dependable, high-performing, multi-tenant SaaS applications. Amazon Aurora is a good choice for building a SaaS application, as it provides the scalability, performance, availability, and security features that are essential for successful SaaS applications. Amazon Aurora automatically creates and maintains multiple replicas of your data, ensuring that your SaaS application is always available, even in the event of an outage or failure.

Amazon Aurora provides several security features, such as encryption at rest and in transit,
to help protect your data and ensure compliance with industry standards and regulations.
• Deploy globally distributed applications - Achieve multi-region scalability and resilience for internet-scale applications, such as mobile games, social media apps, and online services. To meet scaling requirements, databases are often split across multiple instances, but this can lead to over-provisioning or under-provisioning, resulting in increased costs or limited scalability. Aurora Serverless solves this problem by automatically scaling the capacity of multiple instances to match the application's demand.
• Variable and unpredictable workloads - If you run an infrequently used application or one with unpredictable peaks, provisioning for peak capacity means paying for unused resources. Aurora Serverless is a good fit in this case: you will not need to manage or upsize your servers manually. You need only set a min/max capacity unit setting and allow Aurora to scale to meet the load.

Amazon RDS Proxy works with both RDS and Aurora. It improves application availability and scalability by enabling applications to share and pool connections established with the database. Let's learn more details about RDS Proxy.


Amazon RDS Proxy

Amazon RDS Proxy is a service that acts as a database proxy for Amazon Relational Database Service (RDS). It is fully managed by AWS and helps to increase the scalability and resilience of your database-backed applications. Its main benefits are as follows:
• Improved scalability: RDS Proxy automatically scales to handle a large number of con-
current connections, making it easier for your application to scale.
• Better resilience to database failures: RDS Proxy can automatically failover to a standby
replica if the primary database instance becomes unavailable, reducing downtime and
improving availability.
• Enhanced security: RDS Proxy can authenticate and authorize incoming connections,
helping to prevent unauthorized access to your database. It can also encrypt data in transit,
providing an extra security measure for your data.

Consider a typical deployment in which RDS Proxy sits between the application and the database, and database passwords are managed by AWS Secrets Manager. Aurora is deployed across two Availability Zones (AZs) to achieve high availability, where AZ1 hosts Aurora's primary database while AZ2 has the Aurora read replica.
With RDS Proxy, when a failover happens, application connections are preserved. Only transactions that are actively sending or processing data will be impacted. During failover, the proxy holds incoming requests and routes them to the new primary database once it becomes available.
Amazon RDS Proxy is a useful tool for improving the performance, availability, and security of
your database-powered applications. It is fully managed, so you don’t have to worry about the
underlying infrastructure, and it can help make your applications more scalable, resilient, and
secure. You can learn more about RDS Proxy by visiting the AWS page: https://aws.amazon.
com/rds/proxy/.

High availability and performance are a database's most essential and tricky parts, but this problem can be tackled in an intelligent way using machine learning. Let's look at a newer capability, Amazon DevOps Guru, which helps with database performance issues using ML.


Amazon DevOps Guru for RDS

DevOps Guru for Amazon RDS is a recently introduced capability that uses machine learning to automatically identify and troubleshoot performance and operational problems related to relational databases. It can detect issues such as resource overutilization or problematic SQL queries and provide recommendations for resolution, helping developers address these issues quickly. DevOps Guru for Amazon RDS utilizes machine learning models to deliver these insights and suggestions.

DevOps Guru for RDS aids in quickly resolving operational problems related to databases by notifying developers immediately via Amazon Simple Notification Service (SNS) and Amazon EventBridge when issues arise. It also provides diagnostic information, as well as intelligent remediation recommendations and details on the extent of the issue.

AWS keeps adding innovations to Amazon RDS as a core service. Recently, AWS launched Amazon RDS instances on its Graviton2 chip, which helps it offer lower prices with better performance, and with Amazon RDS on AWS Outposts, you can run your database near your workload in an on-premises environment. You can learn more about Amazon DevOps Guru by visiting the AWS page: https://aws.amazon.com/devops-guru/.

Besides SQL databases, Amazon’s well-known NoSQL databases are very popular. Let’s learn
about some NoSQL databases.


AWS NoSQL databases

When it comes to NoSQL databases, you need to understand two main categories: key-value and document. A key-value database suits workloads that need high throughput, low-latency reads and writes, and endless scale, while a document database stores documents and allows fast querying on any attribute. Document and key-value databases are close cousins. Both store a value against a key; the difference is that in a document database, the value stored (the document) is transparent to the database and, therefore, can be indexed to assist retrieval. In the case of a key-value database, the value is opaque and will not be scannable, indexed, or visible until the value is retrieved by specifying the key. Retrieving a value without using the key would require a full table scan. Content is stored as a simple value, and access is through three simple API operations:

• Retrieve the value for a key.
• Insert or update a value using a key reference.
• Delete a value using the key reference.

Values are typically stored as Binary Large Objects (BLOBs); the data store keeps the value without regard for its content. Data in key-value database records is accessed using the key (in rare instances, using a secondary index). Relationships between records are neither supported nor a primary consideration. Key-value stores often use the hash table pattern to store the keys. No column-type relationships exist, which keeps the implementation details simple. Starting in the key-value category, AWS provides Amazon DynamoDB. Let's learn more about DynamoDB.


Amazon DynamoDB

DynamoDB is a fully managed, multi-Region, multi-active database that delivers exceptional performance, with single-digit-millisecond latency, at any scale. It is capable of handling more than 10 trillion daily requests, with the ability to support peaks of over 20 million requests per second, making it an ideal choice for internet-scale applications. DynamoDB offers built-in security, backup and restore features, and in-memory caching. One of the unique features of DynamoDB is its elastic scaling, which allows for seamless growth as the number of users and required I/O throughput increase. You pay only for the storage and I/O throughput you provision, or, on a pay-per-request basis, for what you actually use. DynamoDB can scale to support millions of users making thousands of concurrent requests every second. In addition, it provides fine-grained access control and support for end-to-end encryption to ensure data security. DynamoDB's key features include the following:

• Fully managed
• Supports multi-Region deployment
• Multi-master deployment
• Fine-grained identity and access control
• Seamless integration with IAM security
• In-memory caching for fast retrieval
• Supports ACID transactions
• Encrypts all data by default

DynamoDB provides the option of on-demand backups for archiving data to meet regulatory requirements. Additionally, you can enable continuous backups for point-in-time recovery, allowing restoration to any point in the last 35 days with per-second granularity. All backups are automatically encrypted, cataloged, and easily discoverable, which helps meet regulatory requirements.

DynamoDB is built for high availability and durability. All writes are persisted on SSD disks and automatically replicated across multiple Availability Zones in an AWS Region.
DynamoDB Accelerator (DAX) is a managed, highly available, in-memory cache for DynamoDB that delivers fast read performance even at high request rates. DAX eliminates the need for you to manage cache invalidation, data population, or cluster management, and delivers microsecond latency by doing all the heavy lifting required to add in-memory acceleration to your DynamoDB tables.


DynamoDB is a table-based database. While creating a table, you specify three main components:

1. Keys: First, a partition key to uniquely identify and distribute items; optionally, second, a sort key to sort and retrieve a batch of data in a given range. For example, transaction ID can be your primary key, and transaction date-time can be the sort key.
2. WCU: Write capacity units define at what rate you want to write your data into DynamoDB.
3. RCU: Read capacity units define at what rate you want to read from your given DynamoDB table.

Under the hood, DynamoDB stores the table's data across partitions, and the provisioned capacity of the table is equally distributed across all partitions.
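Here is a minimal boto3 sketch that creates such a table, using the transaction example above (the table and attribute names, and the WCU/RCU values, are illustrative assumptions):

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="transactions",
    AttributeDefinitions=[
        {"AttributeName": "transaction_id", "AttributeType": "S"},
        {"AttributeName": "transaction_time", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "transaction_id", "KeyType": "HASH"},    # partition key
        {"AttributeName": "transaction_time", "KeyType": "RANGE"}, # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)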

Tables sit at the top level of the database, and within a table, you have items. Each item is a collection of attributes, and the key can consist of one attribute or two. As more items are added to the table in DynamoDB, it becomes apparent that attributes can differ between items, and each item can have a unique set of attributes. Additionally, the primary key can combine a partition key with a sort key, which is useful for establishing one-to-many relationships and facilitating range queries.

Sometimes you need to query data using the primary key, and sometimes you need to query by other attributes. For the latter, DynamoDB provides two kinds of indexes:

• Local Secondary Index (LSI) - An LSI uses the same partition key as the base table with an alternate sort key. For example, for an orders table, you could create an LSI with the same primary key (order_ID) and a different secondary key (fulfilled). Now your queries can efficiently retrieve orders by fulfillment status within a given order partition. An LSI is stored in the same partition as the item in the table, ensuring consistency: whenever an item is updated, the corresponding LSI is also updated and acknowledged. While the index shares the base table's partition key, it can have a different sort key than the parent table. In the index, you can choose to project just the keys or additional attributes, and the LSI's data counts toward the partition storage of the original table.


• Global Secondary Index (GSI) - A GSI is an extension of the concept of indexes, allowing for more complex queries with various attributes as query criteria. When the existing keys are not sufficient, a GSI effectively creates a parallel or secondary table with a partition key that is different from the original table's and an alternate sort key. When creating a GSI, you must specify the expected workload in read and write capacity units. Similar to an LSI, you can choose to project just the keys or any additional attributes you want to be returned with the query.

The following contrasts summarize the two index types:

• An LSI can be created during table creation only, while a GSI can be created at any time, even after table creation.
• An LSI shares WCU/RCU with the main table, so you must have enough read/write capacity to accommodate the LSI's needs; a GSI's WCU/RCU is independent of the table, so it can scale without impacting the main table.
• An LSI's size is limited to a maximum of 10 GB, because it stays in sync with the primary table partition, which is limited to 10 GB per partition; as a GSI is independent of the main table, it has no size limits.
• You can create a maximum of 5 LSIs but up to 20 GSIs.
• As LSIs are tied to the main table, they offer "strong consistency," which means reads always see the most updated data; GSIs offer "eventual consistency," which means there may be a slight lag in data updates, so a read can briefly return stale data.


So, you may ask when to use an LSI and when to use a GSI. An LSI is the right choice when you want to query data based on an alternate sort key within the same partition key as the base table.

An LSI fits such single-partition scenarios. A GSI, on the other hand, allows you to query data based on attributes that are not part of the primary key or sort key of the base table. It's useful when you want to perform ad hoc queries on different attributes of the data or when you need to support multiple access patterns. A GSI can be used to scale read queries beyond the capacity of the base table and can also be used to query data across partitions.

In general, if your data size is small enough, and you only need to query data based on a different
sort key within the same partition key, you should use an LSI. If your data size is larger, or you
need to query data based on attributes that are not part of the primary key or sort key, you should
use a GSI. However, keep in mind that a GSI comes with some additional cost and complexity in
terms of provisioned throughput, index maintenance, and eventual consistency.

If an item collection’s data size exceeds 10 GB, the only option is to use a GSI as an LSI limits the
data size in a particular partition. If eventual consistency is acceptable for your use case, a GSI
can be used as it is suitable for 99% of scenarios.
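As a sketch, the following boto3 call adds a GSI to the transactions table used earlier, so the table can be queried by a non-key attribute (the index and attribute names are illustrative assumptions):

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.update_table(
    TableName="transactions",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "customer-index",
            "KeySchema": [{"AttributeName": "customer_id", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},  # copy all attributes into the index
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    }],
)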

DynamoDB is very useful in designing serverless, event-driven architectures. You can capture item-level data changes, for example, from PutItem, UpdateItem, and DeleteItem calls, by using DynamoDB Streams. You can learn more about Amazon DynamoDB by visiting the AWS page: https://aws.amazon.com/dynamodb/.

You may want a more sophisticated database for storing JSON documents with indexing and fast search. Let's learn about the AWS document database offering.


Amazon DocumentDB

In this section, let's talk about Amazon DocumentDB, "a fully managed MongoDB-compatible database service designed from the ground up to be fast, scalable, and highly available". DocumentDB is a purpose-built document database engineered for the cloud that provides millions of requests per second with millisecond latency and can scale out to 15 read replicas in minutes. It is compatible with MongoDB 3.6/4.0 and managed by AWS, which means no hardware provisioning is needed, and you get auto-patching, quick setup, good security, and automatic backups.

If you look at the evolution of document databases, what was the need for them? Developers wanted a database that matched the JSON-based data modeling within applications. Using JSON in the application and then trying to map JSON to relational databases introduced friction and complication.
245

Object Relational Mappers (ORMs) were created to help with this friction, but there were complications with performance and functionality. A crop of document databases popped up to solve this problem. Their common characteristics include the following:

• Data is stored in documents that are in a JSON-like format, and these documents are first-class objects. Whereas in other databases documents are stored as values or data types, in DocumentDB, documents are the key design point of the database.
• Flexible document structures are a natural fit for structured and semi-structured data. Additionally, powerful indexing capabilities make querying such documents much faster.
• There is no need for an ORM layer to translate data between your application and the database.
• Aggregation capabilities compute across documents, making it easier to extract insights from data.
• A flexible schema lets developers evolve applications quickly over time and build applications faster.

In the case of a document database, each structured or semi-structured data value is called a document. It is typically stored using Extensible Markup Language (XML), JavaScript Object Notation (JSON), or Binary JavaScript Object Notation (BSON) format types.

Typical document database use cases include the following:

• Content management systems
• E-commerce applications
• Analytics
• Blogging applications

Document databases are not the best choice when you have the following:

• Requirements for complex queries or table joins


Compared to traditional relational databases, Amazon DocumentDB offers several advantages, such as:

• On-demand instance pricing: You can pay by the hour without any upfront fees or long-term commitments, which eliminates the complexity of planning and purchasing database capacity ahead of your needs and makes the service inexpensive for development and testing.
• Compatibility with MongoDB 3.x and 4.x: Amazon DocumentDB supports MongoDB 3.6 drivers and tools, allowing customers to use their existing applications, drivers, and tools with little or no change. By implementing the MongoDB API on a distributed, fault-tolerant, self-healing storage system, Amazon DocumentDB offers the performance, scalability, and availability necessary for operating mission-critical MongoDB workloads at scale.
• Migration support: You can use the AWS Database Migration Service (DMS) to migrate MongoDB databases from on-premises, or on Amazon EC2, to Amazon DocumentDB at no additional cost (for up to six months per instance) with minimal downtime. DMS allows you to migrate from a MongoDB replica set or a sharded cluster to Amazon DocumentDB.
• Flexible schema: Documents in a collection do not all need to have the same structure, which is helpful when the data being stored has a complex or hierarchical structure, or when the data being stored is subject to frequent changes.
• High performance: DocumentDB is designed for high performance and can be well suited for applications that require fast read and write access to data.
• Scalability: DocumentDB is designed to be horizontally scalable, which means that it can be easily expanded to support large amounts of data and a high number of concurrent users.
• Easy querying: DocumentDB provides a rich query language that makes it easy to retrieve and manipulate data within the database.

DocumentDB has recently introduced new features that allow for ACID transactions across mul-
tiple documents, statements, collections, or databases. You can learn more about Amazon Docu-
mentDB by visiting the AWS page: https://aws.amazon.com/documentdb/.

If you want sub-millisecond performance, you need your data in memory. Let's learn about in-memory databases.


Studies have shown that if your site is slow, even for just a few seconds, you will lose customers. A
slow site results in 90% of your customers leaving the site. 57% of those customers will purchase

Business News Daily (https://www.businessnewsdaily.com/15160-slow-retail-websites-


lose-customers.html). Additionally, you will lose 53% of your mobile users if a page load takes
longer than 3 seconds (https://www.business.com/articles/website-page-speed-affects-
behavior/).

These are demanding times for services, and to keep up with demand, you need to ensure that users aren't kept waiting to purchase your services, products, and offerings, so that your business can continue growing.

Applications and databases have changed dramatically, not only in the past 50 years but also just in the past 10. Where you might once have had a few thousand users who could wait a few seconds for a page, you may now have millions of users who expect instant responses. This has forced the application and database world to rethink how data is stored and accessed. It's essential to use the right tool for the job. In-memory data stores are used when there is a need for maximum performance and throughput: by caching data in memory, they increase performance by taking load off the backing store.
In-memory databases, or IMDBs for short, usually store the entire dataset in main memory. Contrast this with databases that use a machine's RAM for optimization but do not store all the data simultaneously in primary memory and instead rely on disk storage. IMDBs generally perform better than disk-optimized databases because disk access is slower than direct memory access. In-memory operations are more straightforward and can be performed using fewer CPU cycles. In-memory data access also eliminates seek time when querying the data, which enables faster and more predictable performance. When comparing performance, in-memory operations are usually measured in nanoseconds, whereas operations that require disk access are usually measured in milliseconds. So, in-memory operations are usually about a million times faster than operations needing disk access.

Some use cases of in-memory databases are real-time analytics, chat apps, gaming leaderboards, session stores, and caching.

Based on your data access pattern, you can use either lazy caching or write-through. In lazy caching, the cache engine checks whether the data is in the cache and, if not, gets it from the database and keeps it in the cache to serve future requests. Lazy caching is also called the cache-aside pattern. In the write-through pattern, writes go through the caching engine, which updates the cache and persists the data to the database, so subsequent reads can be served from the cache; if data is not available in the cache, it is fetched from the database and the cache is refreshed. The Amazon ElastiCache service is an AWS-provided cache database. Let's learn more details about it.
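A minimal sketch of the lazy caching (cache-aside) pattern, using the open-source redis-py client with SQLite standing in for the backing database (all names and the TTL are illustrative assumptions):

import json
import redis
import sqlite3

cache = redis.Redis(host="localhost", port=6379)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, profile TEXT)")
db.execute("INSERT INTO users VALUES ('u1', '{\"name\": \"Maverick\"}')")

def get_user(user_id):
    # 1. Check the cache first
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round trip
    # 2. Cache miss: read from the database and populate the cache
    row = db.execute("SELECT profile FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    cache.set(f"user:{user_id}", row[0], ex=300)  # keep for 5 minutes
    return json.loads(row[0])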


Amazon ElastiCache

Amazon ElastiCache is a cloud-based web service that enables users to deploy and manage an
in-memory cache with ease. By storing frequently accessed data in memory, in-memory caches
can enhance application performance, enabling faster data access compared to retrieving data
from a slower backing store such as a disk-based database. ElastiCache offers support for two
popular open-source in-memory cache engines: Memcached and Redis. Both engines are well
known for their reliability, scalability, and performance.

With ElastiCache, you can quickly and easily set up, manage, and scale an in-memory cache in the cloud. The service handles tasks such as provisioning resources, configuring and patching the cache clusters, and monitoring the health of the cache environment, allowing you to focus on developing and deploying your application.

In addition, ElastiCache integrates seamlessly with other AWS services, making it easy to place the cache in front of data stores such as Amazon RDS or DynamoDB to improve the performance and scalability of your overall application architecture.

Amazon ElastiCache is a popular choice among developers who build data-intensive applications. In-memory data storage boosts applications' performance by retrieving data directly from memory.


Since ElastiCache offers two engines, how do you choose between them? One consideration is whether your organization is already a user of Memcached. If your organization has already committed to Memcached, it is likely not worth switching. This may change in the future, but as of this writing, Redis continues to gain supporters. Here is a comparison of the features and capabilities of the two:

• Redis supports rich data structures such as lists, sets, sorted sets, and hashes, while Memcached stores simple string values.
• Redis offers snapshotting for persistence, replication, and pub/sub messaging; Memcached offers none of these.
• Memcached is multithreaded and can take advantage of multiple cores; Redis is primarily single-threaded.
• Redis supports transactions and Lua scripting.

These are the key differences between Redis and Memcached. You can choose the right engine by validating your options against your requirements, and you can learn more about Amazon ElastiCache by visiting the AWS page: https://aws.amazon.com/elasticache/.

AWS recently launched Amazon MemoryDB for Redis due to Redis's popularity. It is a durable, in-memory database service that provides ultra-fast performance and is compatible with Redis. Data is stored in memory, allowing for microsecond reads and single-digit-millisecond write latency, as well as high throughput. You can learn more about Amazon MemoryDB by visiting the AWS page: https://aws.amazon.com/memorydb/.

Data is also getting more complicated, with many-to-many relationships and several layers in between, such as social networks connecting people through their common likes. Let's look at graph databases, which solve this problem.


Graph databases

Graph databases are data stores that treat relationships between records as first-class citizens. In traditional databases, relationships are often an afterthought. In the case of relational databases, relationships are implicit and manifest themselves as foreign key relationships. In graph databases, relationships are explicit, stored objects; these relationships are called edges.

For certain use cases, graph databases offer much better data retrieval performance than traditional databases. As you can imagine, graph databases are particularly suited for use cases that place heavy importance on relationships among entities. Following an edge from one node to another takes roughly constant time. With graph databases, it is not uncommon to be able to traverse millions of edges per second.

Graph databases can handle nodes with many edges regardless of the dataset’s number of nodes.
You only need a pattern and an initial node to traverse a graph database. Graph databases can
easily navigate the adjacent edges and nodes around an initial starting node while caching and
aggregating data from the visited nodes and edges. As an example of a pattern and a starting
point, you might have a database that contains ancestry information. In this case, the starting
point might be you, and the pattern might be a parent.
These are the two main components of a graph database:

• Nodes: Nodes are the elements that represent entities. Each node can have properties, attributes, or key-value pairs. Nodes can be given tags, which constitute roles in the domain. Node labels can be employed to assign metadata (such as indices or constraints) to the nodes.
• Edges: Edges represent the relationships between two nodes. An edge has a direction, a type, a start node, and an end node. Like a node, an edge can also have properties. In some situations, an edge can have quantitative properties, such as weights, costs, and distances. Two nodes can share any number or type of edges without a performance penalty.



One popular property graph framework is the open-source Apache TinkerPop project. It provides an imperative traversal language, called Gremlin, that can be used to write traversals on property graphs, and many open-source and vendor implementations support it. If you like navigating a graph step by step, the Gremlin traversal language could be a favorable option, as it offers a method to walk through property graphs. You might also like openCypher, an open-source declarative query language for graphs, as it provides a familiar SQL-like structure to compose queries for graph data.
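As a small sketch of what a Gremlin traversal looks like from Python (via the gremlinpython driver; the endpoint, labels, and property names are illustrative assumptions), here is the ancestry example from above, starting at one person node and following parent edges:

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Any TinkerPop-compatible endpoint works the same way (Neptune exposes wss://...:8182/gremlin)
conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Starting node + pattern: find a person, then walk outgoing 'parent' edges
parents = g.V().has("person", "name", "Maverick").out("parent").values("name").toList()
print(parents)

conn.close()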

The other popular graph model is the Resource Description Framework (RDF), standardized by the W3C in a set of Semantic Web standards. Like the property graph, RDF represents a labeled, directed multi-graph, but it uses the concept of triples, subject, predicate, and object, to encode the graph. Now let's look at Amazon Neptune, which is Amazon's graph database service.


Amazon Neptune

Amazon Neptune is a managed service for graph databases, which use nodes, edges, and properties to store data. This model is well suited to represent the complex relationships found in many types of data, such as the relationships between people in a social network or the interactions between different products on an e-commerce website.

Neptune supports the property graph and W3C's RDF standards, making it easy to integrate with other systems and tools that support these standards. Neptune also supports the Gremlin query language, which is powerful and easy to use, making it straightforward to perform complex graph traversals and data manipulation operations on the data stored in the database.

In addition, Neptune is highly scalable and available, with the ability to support billions of ver-
tices and edges in a single graph. It is also fully managed, which means that Amazon takes care
of the underlying infrastructure and performs tasks such as provisioning, patching, and backup
and recovery, allowing you to focus on building and using your application. You can learn more
about Amazon Neptune by visiting the AWS page: https://aws.amazon.com/neptune/.

Another purpose-built category is the time-series database. Let's learn more about it.


Time-series databases

A time-series database (TSDB) is a database optimized for storing and analyzing data points collected over time, where it is not only important to track what happened but just as important to track when it happened. The unit of measure to use for the time depends on the use case. For some applications, it might be enough to know on what day the event happened. But for other applications, it might be required to capture the exact millisecond. Common use cases include the following:

• Performance monitoring
• Networking and infrastructure applications
• Adtech and clickstream processing
• Event-driven applications
• Financial applications
• Log analysis
• Industrial telemetry data for equipment maintenance
• Other analytics projects

Time-series data differs from other data types and requires different optimization techniques. A time-series workload typically needs to support the following:

• Millions of inserts from disparate sources, potentially per second
• Summarization of data for downstream analytics
• Access to individual events

Time-series data also has unique properties (which might not be present with other data types): records arrive in time order, are rarely updated after the fact, and are usually queried over a time window. RDBMSes can store this data, but they are not optimized to process, store, and analyze this type of data at scale. If this matches your workload, a time-series database is probably an excellent solution to your problem.



Amazon Timestream

Amazon Timestream is AWS's purpose-built, serverless time-series database service. Devices and applications generate events that need to be tracked and measured, sometimes with real-time requirements. Timestream provides a purpose-built query processing engine that can make heads or tails of this stream of events, and it has features that can automate query rollups, retention, tiering, and data compression. Like other serverless offerings, it scales up or down depending on how much data is coming into the streams. Also, because it's serverless and fully managed, tasks such as provisioning, patching, and scaling are not the responsibility of the DevOps team, allowing them to focus on more important tasks.

Timestream can store multiple measures per record with its multi-measure records feature, instead of one measure per table row, making it easier to migrate existing data. With scheduled queries, Timestream can automatically and periodically run queries and store the results in a separate table. You can learn more about Amazon Timestream by visiting the AWS page: https://aws.amazon.com/timestream/.
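A minimal boto3 sketch of ingesting one time-series record follows; the database, table, dimension, and measure names are illustrative assumptions:

import time
import boto3

tsw = boto3.client("timestream-write")

tsw.write_records(
    DatabaseName="iot",
    TableName="sensor_readings",
    Records=[{
        "Dimensions": [{"Name": "device_id", "Value": "sensor-42"}],
        "MeasureName": "temperature_c",
        "MeasureValue": "21.7",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # event time in epoch milliseconds
        "TimeUnit": "MILLISECONDS",
    }],
)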

Sometimes you need an immutable, verifiable record of transactions; ledger databases serve this purpose. Let's learn about ledger databases.


Ledger databases

A ledger database (LDB) provides an immutable, cryptographically verifiable, and transparent transaction log orchestrated by a central authority:

• LDB immutability: Imagine you deposit $1,000 in your bank. You see on your phone that the deposit is reflected in your balance, and you expect that the record of it can never be altered after the fact. In other words, only inserts are allowed, and updates cannot be performed.
• LDB transparency: In this context, transparency refers to the ability to track changes to the data over time. The change history, at a minimum, should include who changed the data, when the data was changed, and what the value of the data was before it was changed.
255

• LDB cryptographic verifiability: How can we ensure that our transaction will be immutable? When the transaction is recorded, the entire transaction data is hashed. In simple terms, the string of data that forms the transaction is whittled down into a smaller string of unique characters. Whenever the transaction is hashed, it needs to match that string. In the ledger, the hash comprises the transaction data and appends the previous transaction's hash. Doing this ensures that the entire chain of transactions is valid. If someone tried to enter another transaction in between, it would invalidate the hash, and it would be detected that the foreign transaction was added via an unauthorized method.
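The chained-hash idea is easy to demonstrate. Here is a minimal sketch in Python using SHA-256 (a toy illustration of the concept, not how any particular ledger product stores data):

import hashlib
import json

def append_entry(ledger, transaction):
    # Each entry's hash covers the transaction data plus the previous entry's hash
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    payload = json.dumps(transaction, sort_keys=True) + prev_hash
    ledger.append({
        "tx": transaction,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),
    })

def verify(ledger):
    # Recompute every hash in order; tampering anywhere breaks the chain
    prev_hash = "0" * 64
    for entry in ledger:
        payload = json.dumps(entry["tx"], sort_keys=True) + prev_hash
        if entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True

ledger = []
append_entry(ledger, {"account": "alice", "credit": 1000})
append_entry(ledger, {"account": "alice", "debit": 250})
print(verify(ledger))             # True
ledger[0]["tx"]["credit"] = 9999  # tamper with history...
print(verify(ledger))             # False: the altered entry invalidates the chain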

Going back to the bank example, the ledger records all credits and debits related to the bank account. It can then be followed from a point in history, allowing us to calculate the current account balance. With immutability and cryptographic verification, we can trust that history. With other methods, such as RDBMSes, all transactions could be changed or erased.


Amazon Quantum Ledger Database (QLDB)

Amazon QLDB is a fully managed service that provides a centralized trusted authority to manage an immutable, transparent, and cryptographically verifiable ledger. It records each application data value change and manages a complete, verifiable history of changes over time. QLDB is useful for keeping track of the history of transactions that need high availability and reliability. Some examples that need this level of reliability are as follows:

• Financial credits and debits

Amazon QLDB offers various blockchain services, such as anonymous data sharing and smart
contracts, while still using a centrally trusted transaction log.

QLDB is designed to act as your system of record or source of truth. When you write to QLDB, your transaction is committed to an append-only journal. QLDB allows only append-only interactions, and it treats all interactions, such as reads, inserts, updates, and deletes, like a transaction, cataloging everything sequentially in this journal.

Once the transaction is committed to the journal, it is immediately materialized into tables and
indexes. QLDB provides a current state table and indexed history as a default when you create a
new ledger. Leveraging these allows customers to, as the names suggest, view the current state of the data and its full, indexed history. You can learn more about Amazon QLDB by visiting the AWS page: https://aws.amazon.com/qldb/.
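As a rough illustration, the sketch below uses the open-source pyqldb driver to append a record through a transaction. The ledger name, table name, and document fields are hypothetical, and the ledger and table are assumed to already exist.

```python
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="bank-ledger")  # hypothetical ledger

def record_deposit(txn):
    # PartiQL statement; the write is appended to the journal first and then
    # materialized into the table and its indexes.
    txn.execute_statement(
        "INSERT INTO Transactions ?",
        {"accountId": "acct-1", "type": "credit", "amount": 1000},
    )

driver.execute_lambda(record_deposit)
```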

Sometimes you need a database for large-scale applications that need fast read and write performance, which a wide-column store can achieve. Let's learn more about it.


Wide-column databases

Wide-column databases can sometimes be referred to as column family databases. A wide-column
database is a NoSQL database that can store petabyte-scale amounts of data. Its architecture relies
on persistent, sparse matrix, multi-dimensional mapping using a tabular format. Wide-column
databases are generally not relational.

When is it a good idea to use a wide-column database? The following use cases are a good fit:

• Geolocation data
• User preferences
• Reporting
• Logging applications
• Many inserts, but not many updates
• Low latency requirements

Conversely, a wide-column database is usually a poor fit for workloads that rely on joins or ad hoc queries:

• Heavy requirement for joins


• High-level aggregations
• Requirements change frequently

Apache Cassandra is probably the most popular wide-column store implementation today. Its
architecture allows deployments without single points of failure. It can be deployed across clus-
ters and data centers.

Amazon Keyspaces (formerly Amazon Managed Apache Cassandra Service, or Amazon MCS) is a
fully managed service that allows users to deploy Cassandra workloads. Let’s learn more about it.


Amazon Keyspaces

Amazon Keyspaces (formerly known as Amazon Managed Apache Cassandra Service) is a fully managed, scalable, and
highly available NoSQL database service. NoSQL databases are a type of database that does not
use the traditional table-based relational database model and is well suited for applications that
require fast, scalable access to large amounts of data.

Keyspaces is based on Apache Cassandra, an open-source NoSQL database that is widely used for
applications that require high performance, scalability, and availability. Keyspaces provides the capabilities of Cassandra without the operational burden of managing clusters, along with integration with other AWS services.

Keyspaces supports both table and Cassandra Query Language (CQL) APIs, making it easy to
migrate existing Cassandra applications to Keyspaces. It also provides built-in security features,
such as encryption at rest and network isolation, using Amazon VPC and integrates seamlessly
with other AWS services, such as Amazon EMR and Amazon SageMaker.

Servers are automatically spun up or brought down, and, as such, users are only charged for the
servers Cassandra is using at any one time. Since AWS manages it, users of the service never have to provision, patch, or manage servers themselves. You interact with Keyspaces through the CQL shell, which is called cqlsh. With cqlsh, you can create tables, insert data into the tables, and
access the data via queries, among other operations.

Keyspaces supports the Cassandra CQL API. Because of this, current code and drivers developed for Cassandra will work without code changes. Using Amazon Keyspaces instead of just Apache Cassandra is as easy as modifying your database endpoint to point to an Amazon Keyspaces service table.
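The sketch below shows, under stated assumptions, what pointing the open-source cassandra-driver at a Keyspaces endpoint can look like. The credentials, certificate file, and keyspace/table names are placeholders, and your Region's endpoint will differ.

```python
import ssl
from datetime import datetime, timezone

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

# TLS is required by Keyspaces; the root certificate file is downloadable
# from the AWS documentation.
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
ssl_context.load_verify_locations("sf-class2-root.crt")
ssl_context.verify_mode = ssl.CERT_REQUIRED

# Service-specific credentials generated for an IAM user (placeholders here).
auth = PlainTextAuthProvider(username="my-user-at-123", password="my-password")

cluster = Cluster(
    ["cassandra.us-east-1.amazonaws.com"],  # regional Keyspaces endpoint
    ssl_context=ssl_context,
    auth_provider=auth,
    port=9142,
)
session = cluster.connect()

# Assumes a table demo_ks.events(device_id text, ts timestamp, reading double,
# PRIMARY KEY ((device_id), ts)): a partition key plus a clustering column.
session.execute(
    "INSERT INTO demo_ks.events (device_id, ts, reading) VALUES (%s, %s, %s)",
    ("sensor-42", datetime.now(timezone.utc), 21.7),
)
```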

In addition to Keyspaces being wide-column, the major difference from DynamoDB is that it
supports composite partition keys and multiple clustering keys, which are not available in Dy-
namoDB. However, DynamoDB has better connectivity with other AWS services, such as Athena,
Kinesis, and Elasticsearch.

Keyspaces provides an SLA for 99.99% availability within an AWS Region. Encryption is enabled
by default for tables, and tables are replicated three times in multiple AWS Availability Zones to
ensure high availability. You can create continuous backups of tables with hundreds of terabytes
of data with no effect on your application’s performance, and recover data to any point in time
within the last 35 days. You can learn more about Amazon Keyspaces by visiting the AWS page:
https://aws.amazon.com/keyspaces/.


In the new world of cloud-born applications, one database rarely fits every workload. Modern organizations will not only use multiple types of databases for multiple applications, but often multiple databases within a single application. To run these databases, you can choose the following three options available in AWS based on your workload.


Managed databases

Managing and scaling databases in a legacy infrastructure, whether on-premises or self-managed in the cloud (on EC2), can be a tedious, time-consuming, and costly process. You have to worry about operational challenges such as the following:

• Backup issues, as creating backups can be time-consuming and laborious
• Performance and availability issues
• Scalability issues, such as capacity planning and scaling clusters for computing and storage
• Security and compliance issues, such as network isolation, encryption, and compliance
programs, including PCI, HIPAA, FedRAMP, ISO, and SOC

Instead of dealing with the challenges mentioned above, you would rather spend your time in-
novating and creating new applications instead of managing infrastructure. With AWS-managed
databases, you can avoid the need to over- or under-provision infrastructure to accommodate fluctuating demand, and you can also avoid upfront costs such as software licensing, hardware refresh, and maintenance resources. AWS manages
everything for you, so you can focus on innovation and application development, rather than
infrastructure management. You won't need to worry about administrative tasks such as server provisioning, patching, and backups. AWS continuously monitors your clusters to ensure your workloads are running with self-healing storage and automated scaling, allowing you to focus on higher-value tasks such as schema design and query optimization.



Purpose-built databases

Let's look at the next option: purpose-built databases. With these, you no longer need to force your application's data model to accommodate a decade-old relational database. Purpose-built databases help you to achieve maximum output and performance as per the nature of your application.


In the 60s and 70s, mainframes were the primary means of building applications, but by the 80s, client-server architectures had become the norm.
Applications became more distributed, but the underlying data model remained mostly struc-
tured, and the database often functioned as a monolith. With the advent of the internet in the
90s, three-tier application architecture emerged. Although client and application code became
more distributed, the underlying data model continued to be mainly structured, and the database
remained a monolith. For nearly three decades, developers typically built applications against
a single database. And that is an interesting data point because if you have been in the industry
for a while, you often bump into folks whose mental model is, “Hey, I’ve been building apps for
a long time, and it’s always against this one database.”

This has changed over the last decade as applications are built in the cloud. Microservices have now extended to databases, providing developers with the ability to break down larger applications into smaller, specialized services that cater to specific access patterns, rather than relying on a single monolithic database with a single storage and compute engine that struggles to handle every access pattern. Modern applications also demand lower latency and the ability to handle millions of transactions per second with many concurrent users. As a result, data management systems have evolved to include specialized storage and compute engines, each making different trade-offs between functionality, performance, and scale.

Plus, what we've seen over the last few years is that more and more companies are hiring technical talent in-house to take advantage of the enormous wave of technological innovation that the cloud has unleashed. These teams increasingly build applications as microservices, where they compose the different elements together using the right tool for the job, with purpose-built databases powering the most demanding workloads in the cloud.

Many factors contribute to the performance, scale, and availability requirements of modern apps:

• Users: Modern applications can reach a global audience overnight, allowing businesses to touch millions of new customers.
• Data volume: Applications now routinely generate and consume terabyte- and even petabyte-scale data.
• Locality: Businesses serve geographically distributed users, which complicates the architectures of their solutions/products.
• Performance: Users expect consistently fast, low-latency responses from their applications.
• Request rate: As businesses deliver richer experiences in more markets, they need their apps and databases to handle unprecedented levels of throughput.
• Access/scale: There are billions of smartphones worldwide, and businesses connect smartphones, cars, manufacturing equipment, and more, so that tens of billions of devices are connected to the cloud.



• Economics: While businesses could once make large upfront infrastructure investments and hope they'll succeed, that model is unrealistic in 2023. Instead, they have to hedge their success by only paying for what they use, without capping how much they can grow.

Taken together, these factors mean that modern applications have wildly different database requirements, which are more advanced and nuanced than simply running everything in a relational database.

Developers address these requirements by breaking complex applications into smaller components and choosing the most appropriate tool for each job. The best tool for a given task often varies by use case, leading developers to build highly distributed applications
using multiple specialized databases.

Now that you have learned about the different types of AWS databases, let's go into more detail about
moving on from legacy databases.


Moving on from legacy databases

Numerous legacy applications have been developed on conventional databases, and customers have had to grapple with database providers that are expensive, proprietary, and impose punishing licensing terms and frequent audits. Oracle, for instance, announced that it would double licensing fees if its software is run on AWS or Microsoft Azure. As a result, customers are attempting
to switch as soon as possible to open-source databases such as MySQL, PostgreSQL, and MariaDB.

Customers who are migrating to open-source databases are seeking to strike a balance between the low cost of open-source databases and the performance and availability of commercial-grade databases. Achieving the same level of performance on open-source databases as on commercial databases has traditionally required significant effort.
AWS introduced Amazon Aurora, a cloud-native relational database that is compatible with MySQL
and PostgreSQL to address this need. Aurora aims to provide a balance between the performance
and availability of high-end commercial databases and the simplicity and cost-effectiveness of
open-source databases. It boasts 5 times better performance than standard MySQL and 3 times
better performance than standard PostgreSQL, while maintaining the security, availability, and
reliability of commercial-grade databases, all at a fraction of the cost. Additionally, customers
can migrate their workloads to other AWS services, such as DynamoDB, to achieve application
scalability.

These are the many AWS database services that you have learned about, so let's put them together and learn how to choose
the right database.


Choosing the right tool for the job

In the previous sections, you learned how to classify databases and the different database services that AWS provides. In a nutshell, you learned about the following database services under
different categories:



When you think about this collection of databases, you may think, "Oh, no. You don't need that many databases. I have a relational database, and it can take care of all this for you." But Swiss Army knives are hardly the best solution for anything other than the most straightforward task. If you want the right tool for the right job that gives you the expected performance and scalability, you need purpose-built databases. No single tool rules the world, and using the right tool for the right job helps you spend less money, be more productive, and improve the customer experience.

Consider focusing on common database categories to choose the right database instead of brows-
ing through hundreds of different databases. One such category is ‘relational,’ which many people
are familiar with. Suppose you have a workload where strong consistency is crucial, where you know in advance the kinds of questions that will be asked of the data and require consistent answers. In that case, a relational database is an excellent choice.

Popular options for this category include Amazon Aurora, Amazon RDS, open-source engines
like PostgreSQL, MySQL, and MariaDB, as well as RDS commercial engines such as SQL Server
and Oracle Database.

AWS has developed several purpose-built non-relational databases to facilitate the evolution of application development. For instance, in the key-value category, Amazon DynamoDB is a database that provides optimal performance for running key-value pairs at single-digit millisecond latency at any scale. If your use case calls for storing and querying data in the same document model used in your application code, then Amazon DocumentDB is a strong fit. In the past, relational vendors added an XML data type to become an XML database. However, this approach had limitations, and purpose-built document databases working with JSON have replaced XML databases. Amazon DocumentDB, launched in January 2019, is an excellent example of such a database.
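To illustrate the key-value pattern, here is a minimal boto3 sketch against DynamoDB. The table name (user_sessions) and its session_id partition key are hypothetical.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("user_sessions")  # hypothetical table

# Write and then read a key-value item by its partition key.
table.put_item(Item={"session_id": "abc-123", "user": "jdoe", "cart_items": 3})
response = table.get_item(Key={"session_id": "abc-123"})
print(response.get("Item"))
```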

If your application requires faster response times than single-digit millisecond latency, consider an
in-memory database and cache that can access data in microseconds. Amazon ElastiCache offers managed Redis and Memcached engines, making it possible to retrieve data rapidly for real-time processing use cases such as messaging, and real-time geospatial data such as drive distance.
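Here is a small sketch using the open-source redis-py client against a placeholder ElastiCache endpoint. Raw execute_command calls are used for the GEO operations to stay agnostic to redis-py version differences.

```python
import redis

# Placeholder ElastiCache for Redis endpoint.
r = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)

# Fast key-value access, with a one-hour expiry on the key.
r.set("session:abc-123", "jdoe", ex=3600)
print(r.get("session:abc-123"))

# Real-time geospatial data: store two driver positions and compute the
# distance between them in kilometers.
r.execute_command("GEOADD", "drivers", -122.4194, 37.7749, "driver_a")
r.execute_command("GEOADD", "drivers", -122.2711, 37.8044, "driver_b")
print(r.execute_command("GEODIST", "drivers", "driver_a", "driver_b", "km"))
```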

Suppose you have large datasets with many connections between them. For instance, a sports company may want to link its athletes with their followers and provide personalized recommendations based on the interests of millions of users. Managing all these connections and providing fast queries can be challenging with traditional relational databases. In this case, you can use Amazon Neptune, a purpose-built graph database designed to store and navigate highly connected data.
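As a sketch of the kind of traversal a graph database makes cheap, the following uses the open-source gremlinpython client against a placeholder Neptune endpoint. The vertex IDs and the "follows" edge label are hypothetical.

```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Placeholder Neptune cluster endpoint.
conn = DriverRemoteConnection(
    "wss://my-neptune.cluster-abc.us-east-1.neptune.amazonaws.com:8182/gremlin",
    "g",
)
g = traversal().withRemote(conn)

# "Fans who follow the same athletes as user-1 also follow...":
# a few hops that would require several expensive joins in an RDBMS.
recommendations = (
    g.V("user-1").out("follows")   # athletes user-1 follows
     .in_("follows")               # other fans of those athletes
     .out("follows")               # athletes those fans follow
     .dedup().limit(10).values("name").toList()
)
print(recommendations)
conn.close()
```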

In a time-series database, time is not just a timestamp or a data type that you might use in a relational database. Instead, a time-series database's core feature is that the primary axis of the data model is time. Amazon Timestream, covered earlier in this chapter, is an example of a purpose-built time-series database that provides fast and scalable querying of
time-series data.

Amazon QLDB is a fully managed ledger database service. A ledger is a type of database that is
used to store and track transactions and is typically characterized by its immutability and the
ability to append data in sequential order.
Immutability means that once data is written to the ledger, it cannot be changed, and new data can only be appended in sequential order.
A wide-column database is an excellent choice for applications that require fast data processing at scale, such as equipment maintenance, fleet management, and route optimization. Amazon Keyspaces for Apache Cassandra provides a wide-column
database option that allows you to develop applications that can handle thousands of requests
per second with practically unlimited throughput and storage.

To choose well, write down the requirements of the problem you are trying to solve. Some of the questions the requirements should answer are as follows:

• How much data will be stored, and how fast will it grow?
• How quickly does the data need to be accessed?
• Does the data have a stable schema, or does its shape vary from record to record?
• What consistency and transaction guarantees does the application need?
Why are these questions important? Because their answers will steer you toward either SQL or NoSQL solutions.
In instances where there is a lot of data and it needs to be accessed quickly, NoSQL databases
might be a better solution. SQL vendors realize this and are constantly trying to improve their
offerings to better compete with NoSQL, including adopting techniques from the NoSQL world.
For example, Aurora is a SQL service, and it now offers Aurora Serverless, taking a page out of
the NoSQL playbook.

As services get better, the line between NoSQL and SQL databases keeps on blurring, making the choice harder. If you are still unsure, you might want to draw up a Proof of Concept using a couple of options to determine which option best fits your requirements.
Another reason to choose NoSQL might be its ability to create schema-less databases. However, tread carefully. Not having a schema might come at a high price.

You can end up with data whose structure becomes too variable and creates more problems than it solves. Just because we can create
databases without a schema in a NoSQL environment, we should not forgo validation checks
before creating a record. If possible, a validation scheme should be implemented, even when
using a NoSQL option.
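As one way to keep schema-less data in check, here is a sketch of application-side validation using the open-source jsonschema package before a record is written. The schema and field names are illustrative.

```python
from jsonschema import ValidationError, validate

# Illustrative schema: two required fields, an optional city, nothing else.
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "last_name": {"type": "string"},
        "city": {"type": "string"},
    },
    "required": ["first_name", "last_name"],
    "additionalProperties": False,
}

def safe_insert(record: dict) -> bool:
    try:
        validate(instance=record, schema=USER_SCHEMA)
    except ValidationError as err:
        print(f"rejected: {err.message}")
        return False
    # ...write the validated record to the NoSQL store here...
    return True

safe_insert({"first_name": "Ada", "last_name": "Lovelace"})  # accepted
safe_insert({"first_name": "Ada", "country": 42})            # rejected
```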

It is true that going schema-less increases implementation agility during the data ingestion phase.
However, it increases complexity during the data access phase. So, make your choice by weighing the required trade-off between data context and data performance.


Migrating databases to AWS

If you are finding it difficult and expensive to maintain your relational databases as they scale, consider switching
to a managed database service such as Amazon RDS or Amazon Aurora. With these services, you
can migrate your workloads and applications without the need to redesign your application, and
you can continue to utilize your current database skills.

Consider moving to a managed relational database if:

• Your database is currently hosted on-premises or in EC2.


• You want to reduce the burden of database administration and allocate DBA resources
to application-centric work.
• You prefer not to rearchitect your application and wish to use the same skill sets in the
cloud.
• You need a straightforward path to a managed service in the cloud for database workloads.
• You require improved performance, availability, scalability, and security.

Self-managed databases like Oracle, SQL Server, MySQL, PostgreSQL, and MariaDB can be mi-
grated to Amazon RDS using the lift and shift approach. For better performance and availability,
MySQL and PostgreSQL databases can be moved to Amazon Aurora, which offers 3-5 times better
throughput. Non-relational databases like MongoDB and Redis are popularly used for document
and in-memory databases in use cases like content management, personalization, mobile apps,

maintain non-relational databases at scale, organizations can move to a managed database ser-
vice like Amazon DocumentDB for self-managed MongoDB databases or Amazon ElastiCache for

to manage the databases without rearchitecting the application and enable the same DB skill sets
to be leveraged while migrating workloads and applications.

Now that you understand the different database choices, the next question is how to migrate your databases to AWS. The following options are available:
• Self-service - For many migrations, the self-service path using the Database Migration Service (DMS) and Schema Conversion Tool (SCT) offers the tools necessary to execute the migration; with over 250,000 migrations completed through DMS, customers have successfully migrated their instances to AWS. Using DMS, you can make homogeneous migrations from your legacy database service to a managed service on AWS, such as from Oracle to Amazon RDS for Oracle. Heterogeneous migrations are also possible, such as converting from SQL Server to Amazon Aurora. The Schema Conversion Tool (SCT) assesses the source compatibility and recommends the best target engine.
• Commercially licensed to open-source or AWS databases - Many customers are
looking to move away from the licensing costs of commercial database vendors and avoid
vendor lock-in. Most of these migrations have been from Oracle and SQL Server to open-
source databases and Aurora, but there are use cases for migrating to NoSQL databases
as well. For example, an online store may have started on a commercial or open-source
database but now is growing so fast that it would need a NoSQL database like DynamoDB
to scale to millions of transactions per minute. Refactoring, however, typically requires
application changes and takes more time to migrate than the other migration methods.
AWS provides a Database Freedom program to assist with such migration. You can learn
more about the AWS Database Freedom program by visiting the AWS page: https://aws.
amazon.com/solutions/databasemigrations/database-freedom/.
• MySQL Database Migrations - Standard MySQL import and export tools can be used
for MySQL database migrations to Amazon Aurora. Additionally, you can create a new
Amazon Aurora database from an Amazon RDS for MySQL database snapshot with ease.
Migration operations based on DB snapshots typically take less than an hour, although
the duration may vary depending on the amount and format of data being migrated.
• PostgreSQL Database Migrations - For PostgreSQL database migrations, standard Post-
greSQL import and export tools such as pg_dump and pg_restore can be used with Amazon
Aurora. Amazon Aurora also supports snapshot imports from Amazon RDS for PostgreSQL,
and replication with AWS DMS. (A short pg_dump/pg_restore sketch follows this list.)

• The AWS Data Lab is a service that helps customers choose their platform and understand
the differences between self-managed and managed services. It involves a 4-day intensive
engagement between the customer and AWS database service teams, supported by AWS
solutions architecture resources, to create an actionable deliverable that accelerates the
customer’s use and success with database services. Customers work directly with Data
Lab architects and each service’s product managers and engineers. At the end of a Lab, the
customer will have a working prototype of a solution that they can put into production
at an accelerated rate.

A Data Lab is a mutual commitment between a customer and AWS. Each party dedicates
key personnel for an intensive joint engagement, where potential solutions will be evaluated and architected. The teams work together over the Lab's four days to create usable deliverables to enable the customer to accelerate the deployment of
large AWS projects. After the Lab, the teams remain in communication until the projects
are successfully implemented.
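As a rough sketch of the pg_dump/pg_restore path mentioned above (kept in Python, like the other examples, via subprocess), with placeholder connection strings:

```python
import subprocess

# Placeholder connection strings for the source database and the Aurora
# PostgreSQL cluster endpoint.
SOURCE = "postgresql://admin@onprem-db:5432/appdb"
TARGET = "postgresql://admin@my-aurora.cluster-abc.us-east-1.rds.amazonaws.com:5432/appdb"

# Dump the source database in the custom archive format...
subprocess.run(["pg_dump", "--format=custom", "--file=appdb.dump", SOURCE], check=True)

# ...and restore the archive into the Aurora cluster.
subprocess.run(["pg_restore", "--no-owner", "--dbname=" + TARGET, "appdb.dump"], check=True)
```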

In addition to the above, AWS has an extensive Partner Network of consulting and software vendor
partners who can provide expertise and tools to migrate your data to AWS.


Summary

In this chapter, you learned about many of the database options available in AWS. You started by revisiting a brief history of databases and innovation trends led by data. After that, you explored database consistency and usage models. You further explored different types of databases and when it's appropriate to use each one. There are multiple database choices available in AWS, and you learned about making a choice to use the right
database service for your workload.

In the next chapter, you will learn about AWS’s services for cloud security and monitoring.
