AWS Developer Associate
Databases on AWS
Learning Objectives
By the end of this lesson, you will be able to:
Identify the different types of databases offered by AWS
Present an overview of the features and benefits of Amazon RDS
Create a table using the DynamoDB console
Delineate the concepts in DynamoDB
List the aspects of Amazon ElastiCache
Introduction to Databases
What Is a Database?
A database is a collection of individual data items stored in a highly structured manner.
Provides the ability to store a large amount of information
Facilitates quick access to information
Allows users to share information at different locations
Ensures data security
AWS offers both relational and non-relational databases.
Relational Databases
A Relational Database is a group of data items having pre-defined relationships with each other.
These items are arranged into a set of tables with rows and columns.
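[Figure: example relational schema with four related tables]
Table 1: ratings (id int, rating int, user_id int, movie_id int)
Table 2: users (id int, first_name varchar, last_name varchar, email varchar)
Table 3: movies (id int, name varchar, description text)
Table 4: tags (id int, tag varchar, user_id int, movie_id int)
Relationships: ratings.user_id and tags.user_id reference users.id; ratings.movie_id and tags.movie_id reference movies.id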
Features of Relational Databases
SQL: is the primary interface
Data integrity: is enforced by a set of constraints
Transactions: result in a COMMIT or a ROLLBACK
ACID compliance: ensures data integrity
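The features above can be tried out locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a managed relational engine; the table layout follows the schema figure, while the CHECK constraint on rating, the sample rows, and the add_rating helper are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database; it stands in for any relational engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT);
CREATE TABLE ratings (
    id       INTEGER PRIMARY KEY,
    rating   INTEGER CHECK (rating BETWEEN 1 AND 5),   -- assumed rating scale
    user_id  INTEGER NOT NULL REFERENCES users(id),
    movie_id INTEGER NOT NULL REFERENCES movies(id)
);
""")


def add_rating(user_id, movie_id, rating):
    """Insert a rating; COMMIT on success, ROLLBACK if a constraint fails."""
    try:
        with conn:  # the connection context manager commits or rolls back
            conn.execute(
                "INSERT INTO ratings (rating, user_id, movie_id) VALUES (?, ?, ?)",
                (rating, user_id, movie_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


with conn:
    conn.execute("INSERT INTO users (id, first_name, last_name, email) "
                 "VALUES (1, 'Ada', 'Lovelace', 'ada@example.com')")
    conn.execute("INSERT INTO movies (id, name, description) "
                 "VALUES (1, 'The Kid', 'Silent comedy-drama')")

ok = add_rating(user_id=1, movie_id=1, rating=5)        # valid row, committed
rejected = add_rating(user_id=99, movie_id=1, rating=5)  # no such user, rolled back
count = conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0]
```

The second insert violates the foreign-key constraint, so the transaction is rolled back and only one rating remains in the table.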
AWS Relational Databases
Here are some Relational Database Engines that Amazon RDS offers:
Amazon Aurora
Oracle
Microsoft SQL Server
MariaDB
Key-Value Databases
A Key-Value Database is a type of Non-relational Database. To store data, it uses a collection of key-value pairs in which the key acts as a unique identifier.
[Figure: Products table]
Primary key: Partition key (Product ID) and Sort key (Type)
Attributes: Schema is defined per item
Items:
1, Book ID: Odyssey, Homer, 1871
2, Album ID: 6 Partitas, Bach
2, Album ID: Track ID: Partita No. 1
3, Movie ID: The Kid, Chaplin, Drama, Comedy
Example of data stored as key-value pairs in DynamoDB
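To make the idea concrete, here is a small sketch of a key-value table held in a plain Python dictionary. It is not DynamoDB code; the composite key (Product ID, Type) follows the example above, while the attribute names (Title, Author, and so on) and the helper functions are hypothetical.

```python
# A table modeled as a dict keyed by (partition key, sort key).
products = {}


def put_item(table, product_id, item_type, **attributes):
    """Store an item under its composite primary key; attributes vary per item."""
    table[(product_id, item_type)] = {
        "ProductID": product_id,
        "Type": item_type,
        **attributes,
    }


def get_item(table, product_id, item_type):
    """Direct lookup by primary key; returns None when the key is absent."""
    return table.get((product_id, item_type))


# Each item carries its own set of attributes (schema is defined per item).
put_item(products, 1, "Book ID", Title="Odyssey", Author="Homer", Year=1871)
put_item(products, 2, "Album ID", Title="6 Partitas", Composer="Bach")
put_item(products, 3, "Movie ID", Title="The Kid", Director="Chaplin",
         Genres=["Drama", "Comedy"])

book = get_item(products, 1, "Book ID")
```

Because every read goes through the key, a lookup does not depend on how many other items the table holds.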
Use Cases of Key-Value Databases
Session Store
Session data is always queried by a primary key. Hence, a fast key-value store is an ideal fit for session data.
Shopping Cart
Key-Value Databases are capable of scaling to large amounts of data and high volumes of state changes.
AWS In-Memory Databases
An In-Memory Database is a type of purpose-built database that primarily depends on memory
for data storage.
In-Memory Databases are ideal for applications that need microsecond response times.
[Figure: Application, Master Server, and four RAM-resident data partitions (Data Partition 1 to 4)]
Use Cases of In-Memory Databases
Real-Time Bidding
In-Memory Databases can ingest, process, and analyze real-time data with sub-millisecond latency.
Gaming Leaderboards
In-Memory Databases can quickly deliver sorting results and update the leaderboard in real-time.
Caching
The primary purpose of a cache is to facilitate increased data retrieval performance.
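One common way to use a cache is the cache-aside pattern: look in the cache first, and fall back to the database only on a miss. The sketch below illustrates the pattern with an in-process dictionary; TTLCache, cached_lookup, and slow_db are hypothetical names, and a real deployment would talk to a cache server rather than a local dict.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def cached_lookup(cache, key, load_from_db):
    """Cache-aside read: a None result from the cache is treated as a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value


db_calls = []


def slow_db(key):
    """Stand-in for a database query; records how often it is called."""
    db_calls.append(key)
    return f"row-for-{key}"


cache = TTLCache(ttl_seconds=60)
first = cached_lookup(cache, "user:1", slow_db)   # miss: goes to the database
second = cached_lookup(cache, "user:1", slow_db)  # hit: served from memory
```

Only the first lookup reaches the database; the second is answered from memory until the entry expires.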
AWS In-Memory Databases
Amazon ElastiCache for Redis
A blazing fast in-memory data store that provides sub-millisecond latency to power Internet-scale, real-time applications
Amazon ElastiCache for Memcached
A Memcached-compatible in-memory key-value store service that can be used as a cache or a data store
Amazon RDS
Amazon RDS is a managed relational database service.
• Allows CPU, memory, IOPS, and storage to be scaled independently
• Looks after software patching, updates, backups, recovery, and automatic failure detection
• Facilitates creating backups automatically or manually via snapshots
• Can run a primary instance with a synchronized standby instance to provide high availability and failover
It is mainly used to manage the data of e-commerce platforms, gaming software, apps, and websites.
Benefits of Amazon RDS
Availability of MySQL, PostgreSQL, Oracle, and SQL Server engines
Need for payment only during use
Ease in handling of patching, backups, and replication
Simple and fast scaling
Simple and fast deployment
Fast and predictable performance
Amazon RDS Database Engines
Amazon Aurora
PostgreSQL
MySQL
MariaDB
Microsoft SQL Server
Oracle
Amazon Aurora
Amazon Aurora is a Relational Database fully managed by Amazon RDS.
Compatibility with MySQL and PostgreSQL
High speed: Up to 5X faster performance than MySQL and 3X faster
performance than PostgreSQL
Applicability for cross-region Read Replica
High availability, durability, and security
Cost-effective
An Amazon Aurora storage volume is made up of 10 GB logical blocks and grows automatically as required, up to 128 TiB on current engine versions (64 TiB on older ones).
Crash Recovery
Traditional Databases
• Replay logs since the last checkpoint
• Generally take five minutes between checkpoints
• MySQL replays with a single thread; the number of disk accesses is very high
Amazon Aurora
• Performs redo of records on demand, as part of a disk read
• Performs parallel, distributed, and asynchronous operations
• Does not replay on startup of the server
Use Cases of Amazon RDS
Web and Mobile Applications
• Amazon RDS is the perfect fit for highly demanding applications as it
provides a high throughput, massive storage scalability, and high
availability.
• The absence of licensing constraints best suits the variable usage
pattern of these applications.
E-Commerce Applications
• Amazon RDS is a flexible, secure, highly scalable, and low-cost database solution that is well suited to small and large e-commerce businesses.
• It helps satisfy PCI compliance and builds a superior customer
experience, without the hassle of managing the underlying
database.
Mobile and Online Games
• Amazon RDS efficiently manages the database by taking care of the
provisioning, scaling, and monitoring of database servers.
• It provides familiar database engines whose capacity can be increased rapidly to meet user demand.
Database Instances
A DB Instance is an isolated database environment running in the cloud.
It is a basic building block of RDS.
The computation and memory capacity of a DB Instance is determined by its DB Instance class, which is selected as per need.
Every DB Instance can host multiple user-created databases or a single Oracle database with multiple schemas.
Every DB Instance runs on a DB engine.
By default, an account can have up to 40 DB Instances.
Backup and Restore in Amazon RDS
[Figure: Data Flow Diagram during Backup and Restore. VPC A contains RDS Instance R1 and EC2 Instance A; VPC B contains RDS Instance R2 and EC2 Instance B; an S3 Bucket sits between the two VPCs]
Amazon RDS offers automated backups, point-in-time restores, and database snapshots.
Amazon RDS takes automated backups of DB Instances, based on the specified backup retention period.
The backup retention period can be set between one and 35 days.
Backups can also be created manually via snapshots.
When a DB Instance is deleted, its automated backups are also deleted, but manual snapshots are retained.
Multi-AZ Deployments in Amazon RDS
When a Multi-AZ DB Instance is provisioned, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ).
Benefits
Enhanced durability
Increased availability
Protected database performance
Automatic failover
Failover Conditions
Amazon RDS automatically switches from the primary DB Instance to the standby replica in another Availability Zone whenever one or more of the following conditions occur:
Failure of the primary DB Instance
Outage of an Availability Zone
Software patching of the DB Instance's operating system
Change of the DB Instance's server type
The normal failover time is 60–120 seconds. This may be exceeded in case of a heavy recovery process.
[Figure: Failover Conditions. Multi-AZ failover across Availability Zone A and Availability Zone B; labels: application servers, database failure, primary, standby, new standby]
Read Replicas in Amazon RDS
Read Replicas are one or more read-only copies of a DB Instance that handle high-volume read traffic.
• Any Amazon RDS activity that is initiated runs only in the currently selected region.
• Amazon RDS provides high availability and failover support for DB Instances by maintaining a synchronous standby replica in Multi-AZ deployments.
• Amazon RDS keeps standby replicas in different Availability Zones synchronized with the primary.
• Read Replicas, in contrast, are updated through asynchronous replication.
[Figure: application servers send read/write traffic to the primary database server; asynchronous replication feeds a read replica that serves read-only traffic from a BI/reporting application server]
Costs of Amazon RDS
Amazon RDS follows a pay-for-what-you-use model. The table below lists the billing procedure for various parameters:
Parameters: Billing procedure
DB Instance hours: Based on the instance class; a partial hour of DB Instance use is billed as a full hour
Storage (per GB per month): Provisioned storage capacity that is scaled within the month is billed pro rata
I/O requests per month: Total number of storage I/O requests
Data transfer: Internet data transfer in and out of your DB Instance
Assisted Practice
RDS Database Instance
Problem Statement: Create an RDS database instance. Duration: 15 mins
Assisted Practice: Guidelines
Steps to create an RDS Database Instance:
1. Go to the AWS Management Console and click on “RDS”.
2. Select the database engine.
3. Fill in the required details.
4. Click on “Launch DB Instance”.
5. Install the 64-bit WAMP server and navigate to its installation path in the command prompt.
6. Enter the endpoint, username, port, and password to connect the WAMP server's MySQL client to the RDS instance.
7. Once the connection is established, perform CRUD operations on the database.
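The console steps above can also be scripted. The sketch below only assembles the request parameters for the boto3 RDS create_db_instance call and validates the backup retention period against the 1 to 35 day range mentioned in this lesson; the identifier, instance class, credentials, and both helper functions are placeholder assumptions, and launch() needs boto3 plus valid AWS credentials, so it is defined but not called here.

```python
def build_db_instance_params(identifier, engine, instance_class, username, password,
                             storage_gb=20, multi_az=False, backup_retention_days=7):
    """Assemble keyword arguments for the RDS CreateDBInstance API."""
    if not 1 <= backup_retention_days <= 35:
        raise ValueError("backup retention must be between 1 and 35 days")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": instance_class,
        "MasterUsername": username,
        "MasterUserPassword": password,
        "AllocatedStorage": storage_gb,          # in GiB
        "MultiAZ": multi_az,                      # True creates a standby in another AZ
        "BackupRetentionPeriod": backup_retention_days,
    }


def launch(params):
    """Send the request to AWS (requires boto3 and configured credentials)."""
    import boto3  # imported lazily so the sketch can be read and tested offline
    return boto3.client("rds").create_db_instance(**params)


# Placeholder values for illustration only.
params = build_db_instance_params(
    identifier="lesson-db",
    engine="mysql",
    instance_class="db.t3.micro",
    username="admin",
    password="change-me-please",
)
```

Keeping parameter construction separate from the API call makes it possible to review the request before anything is provisioned.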
Amazon DynamoDB
Difference Between SQL and NoSQL Databases
Characteristics: SQL versus NoSQL
Workloads
SQL: Ad hoc queries, data warehousing, OLAP
NoSQL: Web-scale applications
Data model
SQL: Well-defined schema where data is normalized into tables, rows, and columns
NoSQL: Schema-less with a primary key; manages structured or semi-structured data
Data access
SQL: SQL
NoSQL: AWS Management Console or AWS CLI; performs ad hoc tasks
Performance
SQL: Optimized for storage
NoSQL: Optimized for compute
Scaling
SQL: Vertical scaling
NoSQL: Horizontal scaling
Amazon DynamoDB
DynamoDB is a fully managed NoSQL database that supports key-value and document data.
It is used by systems that require single-digit millisecond read latency.
Each record (row) in a table is known as an item. A TTL (Time to Live) can be set to automatically delete items from the table once they expire.
Operations such as create, insert, update, query, scan, and delete are performed on the table via the appropriate API.
For faster performance and data durability, the table data is stored on SSDs and spread across many servers in different Availability Zones.
Use Cases of Amazon DynamoDB
Ad tech Gaming
Retail Banking and Finance
Use Cases of
Amazon
DynamoDB
Media and Entertainment Software and Internet
Read Consistency in DynamoDB
DynamoDB supports both Eventually Consistent Reads and Strongly Consistent Reads.
Eventually Consistent Read
The response might not reflect the result of a recently completed write, so stale data may be returned.
If the read request is repeated after a short time, the response returns the latest data.
Strongly Consistent Read
The response is returned with the most up-to-date data, reflecting the updates from all
prior successful write operations.
Strongly Consistent Read might not be available if there is a network delay or outage.
Amazon DynamoDB Global Tables
Amazon DynamoDB global tables act as a complete solution to deploy a multi-region, multi-active database, without the need for building and maintaining a replication solution.
The AWS Regions where the table is to be available can be specified.
DynamoDB executes all the tasks needed to create identical tables in the
specified regions and distributes ongoing data changes to all of them.
How DynamoDB works
1. Create table
2. Add and query items
3. Monitor and manage table
Benefits of Amazon DynamoDB Global Tables
Is a perfect fit for massively scaled
applications with globally dispersed users
Promotes fast application performance
Provides automatic multi-active replication
to AWS Regions globally
Delivers low-latency data access to users,
irrespective of their location
Amazon DynamoDB Pricing
The cost for using DynamoDB depends on the charges for reading, writing, and storing data
in DynamoDB tables, and for optional features, if any.
DynamoDB has two capacity modes that have specific billing options.
On-demand capacity mode: Charges for the data reads and writes the application performs on the tables
Provisioned capacity mode: Charges according to the number of reads and writes per second specified by the user
DynamoDB Use Case: Duolingo
Duolingo is a popular language-learning website and mobile app that delivers lessons for
80 languages. Duolingo uses DynamoDB to store around 31 billion items.
DynamoDB fits the requirements for Duolingo owing to its
scalability and performance.
Assisted Practice
DynamoDB
Problem Statement: Create a table using the DynamoDB Console. Duration: 15 mins
Assisted Practice: Guidelines
Steps to create a table using the DynamoDB Console:
1. Go to the AWS Management Console and select the DynamoDB service.
2. Click on Create table and enter the table name and primary key.
3. Now select Items and click on Create item to insert data in the table.
4. If the data is inserted successfully, you can read it from the dashboard.
5. If you want to remove an item from the table, click on remove.
6. If you want to delete the table, click on Delete table.
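The same table can be defined programmatically. The following sketch builds the request for the boto3 DynamoDB create_table call without sending it; the table name, key names, and the build_create_table_params helper are assumptions chosen to match the Products example earlier in the lesson.

```python
def build_create_table_params(table_name, partition_key, sort_key=None,
                              on_demand=True, read_capacity=5, write_capacity=5):
    """Build CreateTable arguments.

    partition_key and sort_key are (attribute_name, attribute_type) tuples,
    where the type is 'S' (string), 'N' (number) or 'B' (binary).
    """
    key_schema = [{"AttributeName": partition_key[0], "KeyType": "HASH"}]
    attribute_definitions = [
        {"AttributeName": partition_key[0], "AttributeType": partition_key[1]}
    ]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key[0], "KeyType": "RANGE"})
        attribute_definitions.append(
            {"AttributeName": sort_key[0], "AttributeType": sort_key[1]}
        )

    params = {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
    }
    if on_demand:
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_capacity,
            "WriteCapacityUnits": write_capacity,
        }
    return params


params = build_create_table_params("Products", ("ProductID", "N"), ("Type", "S"))
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("dynamodb").create_table(**params)
```

Only key attributes are declared up front; all other attributes are supplied per item when data is written.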
DynamoDB Concepts
Indexes
An index is a data structure that allows the user to perform fast queries on specific attributes of a table.
DynamoDB supports two types of indexes.
01 Local Secondary Index
02 Global Secondary Index
Scan vs Query API Call
Scan API
Scans the whole table to look for elements that match the criteria.
Query API
Performs a direct lookup in a selected partition. The lookup is based on the partition (hash) key.
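The cost difference between the two calls can be illustrated with a toy model. The sketch below is not the DynamoDB API; TinyTable is a hypothetical in-memory structure that counts how many items each access path has to examine.

```python
from collections import defaultdict


class TinyTable:
    """Toy table: items are grouped by partition key, as in a hash-partitioned store."""

    def __init__(self):
        self._partitions = defaultdict(list)
        self.items_examined = 0  # counter used to compare access paths

    def put(self, partition_key, item):
        self._partitions[partition_key].append(item)

    def scan(self, predicate):
        """Examine every item in every partition, then filter."""
        results = []
        for items in self._partitions.values():
            for item in items:
                self.items_examined += 1
                if predicate(item):
                    results.append(item)
        return results

    def query(self, partition_key, predicate=lambda item: True):
        """Go straight to one partition and examine only its items."""
        results = []
        for item in self._partitions.get(partition_key, []):
            self.items_examined += 1
            if predicate(item):
                results.append(item)
        return results


table = TinyTable()
for user in range(100):
    for n in range(10):
        table.put(f"user#{user}", {"user": f"user#{user}", "order": n})

table.items_examined = 0
scan_hits = table.scan(lambda item: item["user"] == "user#7")
scan_cost = table.items_examined      # every item in the table was read

table.items_examined = 0
query_hits = table.query("user#7")
query_cost = table.items_examined     # only one partition was read
```

Both calls return the same ten items, but the scan reads all 1,000 items while the query reads only the ten in the selected partition.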
DynamoDB APIs
There are three planes in DynamoDB API.
Control Plane
Data Plane
DynamoDB Streams
Control Plane
The Control Plane lets you create and manage DynamoDB tables.
Operations:
CreateTable
DescribeTable
UpdateTable
DeleteTable
ListTables
DescribeLimits
Data Plane
The Data Plane lets you perform CRUD actions on data in a table.
Creating data
Reading data
Updating data
Deleting data
Throughput Capacity
Throughput capacity is the amount of read and write activity per second that a table can support.
Read and Write capacities
A Read Capacity unit represents one Strongly Consistent Read per second,
or two Eventually Consistent Reads per second, for an item up to 4KB in size.
A Write Capacity unit represents one write per second for an item up to
1KB in size.
Note
Specify the capacity requirement for Read and Write activity
while creating the table.
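The capacity-unit definitions translate into simple arithmetic. The sketch below estimates the units to provision from item size and request rate, rounding item sizes up to the next 4 KB (reads) or 1 KB (writes) boundary; the function names are hypothetical, and transactional requests, which cost more, are left out.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """Estimate RCUs: one unit covers one strongly consistent read of up to 4 KB."""
    units_per_read = max(1, math.ceil(item_size_bytes / 4096))
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half as much
    return math.ceil(total)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate WCUs: one unit covers one write of up to 1 KB."""
    return max(1, math.ceil(item_size_bytes / 1024)) * writes_per_second


# A 6 KB item needs two 4 KB read blocks; a 1.5 KB item needs two 1 KB write blocks.
strong = read_capacity_units(6 * 1024, reads_per_second=10)
eventual = read_capacity_units(6 * 1024, reads_per_second=10, strongly_consistent=False)
writes = write_capacity_units(1536, writes_per_second=10)
```

For ten reads per second of a 6 KB item this gives 20 units with strong consistency and 10 with eventual consistency; ten writes per second of a 1.5 KB item need 20 write units.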
DynamoDB On-Demand Capacity
DynamoDB On-Demand Capacity is a flexible billing option that requires no capacity planning. The user need not specify the Read and Write Capacity.
On-demand is preferable when:
New tables with unknown workloads must be created.
The application traffic is unpredictable.
Pay-per-use pricing is preferred.
Note
On-demand mode can be chosen either while creating the
table, or later, using the Capacity tab.
DynamoDB Accelerator
DynamoDB Accelerator (DAX) is a caching service, which is:
Fully managed
Highly available
Up to 10 times faster
An in-memory cache
DynamoDB Transactions
DynamoDB transactions help developers operate on multiple items in a single request.
Help the developer implement business logic that requires multiple, all-or-nothing operations across one or more tables
Provide atomicity, consistency, isolation, and durability (ACID) across tables
Extend scale and performance to a broader set of workloads
Offer multiple read and write options to meet different
application requirements
Working of DynamoDB Transactions
TransactWriteItems API
Is a batch operation that contains a write set, with one or more PutItem, UpdateItem, and DeleteItem operations. It can optionally check prerequisite conditions that must be satisfied before an update is made.
Idempotency
It is an optional feature that prevents application errors if the same transaction is submitted multiple times due to connection time-outs or network errors.
Error Handling for Writing
A write transaction fails if a condition expression is not met or if more than one action in the same TransactWriteItems request targets the same item.
TransactGetItems API
Is a batch operation that contains a read set with one or more GetItem operations. If it is issued on an item that is part of an active write transaction, the read transaction is cancelled. It can include up to 100 unique items (originally 25) and 4 MB of data.
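The write rules described above can be sketched as a request builder. The code below assembles the arguments for the boto3 transact_write_items call, rejects two actions that target the same item, and attaches a ClientRequestToken for idempotency. The table, key, and attribute names and the helper itself are hypothetical, and max_actions is a parameter because the per-transaction item limit is a service quota rather than a fixed constant.

```python
import uuid


def build_transact_write(actions, client_request_token=None, max_actions=100):
    """Build a TransactWriteItems request.

    actions is a list of (operation, table_name, key, extra) tuples, where
    operation is 'Put', 'Update', 'Delete' or 'ConditionCheck', key uses the
    low-level attribute format, and extra holds the remaining parameters.
    """
    if not 1 <= len(actions) <= max_actions:
        raise ValueError(f"a transaction needs between 1 and {max_actions} actions")

    seen, items = set(), []
    for operation, table_name, key, extra in actions:
        target = (table_name, tuple(sorted((k, repr(v)) for k, v in key.items())))
        if target in seen:
            raise ValueError("two actions in one transaction target the same item")
        seen.add(target)

        body = {"TableName": table_name, **extra}
        if operation == "Put":
            body["Item"] = {**key, **body.get("Item", {})}  # Put takes a full item
        else:
            body["Key"] = key
        items.append({operation: body})

    return {
        "TransactItems": items,
        # Resending the same token lets the service recognize a retry.
        "ClientRequestToken": client_request_token or str(uuid.uuid4()),
    }


amount = {":amt": {"N": "25"}}
request = build_transact_write(
    [
        ("Update", "Accounts", {"AccountId": {"S": "alice"}}, {
            "UpdateExpression": "SET Balance = Balance - :amt",
            "ConditionExpression": "Balance >= :amt",
            "ExpressionAttributeValues": amount,
        }),
        ("Update", "Accounts", {"AccountId": {"S": "bob"}}, {
            "UpdateExpression": "SET Balance = Balance + :amt",
            "ExpressionAttributeValues": amount,
        }),
    ],
    client_request_token="transfer-0001",
)
# With boto3 and credentials: boto3.client("dynamodb").transact_write_items(**request)
```

Either both balance updates are applied or neither is, and a retry after a time-out can reuse the same token instead of risking a double transfer.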
DynamoDB Transactions
Within a transaction, a conflict can occur during concurrent item-level requests on the same item.
The scenarios when transactional conflicts could occur are:
A request (put, update, delete) for an item conflicts with an ongoing TransactWriteItems request that includes the same item
A TransactWriteItems request conflicts with an ongoing TransactWriteItems request for the same item
A TransactGetItems request conflicts with an ongoing TransactWriteItems request for the same item
DynamoDB Time To Live
Amazon DynamoDB Time to Live (TTL) supports defining a per-item timestamp. This helps to
determine when an item is no longer needed.
TTL Features
Removes user or sensor data after one year of inactivity
in an application
Archives expired items to an Amazon S3 data lake via
Amazon DynamoDB Streams and AWS Lambda
Retains sensitive data for a certain amount of time, based
on contractual or regulatory obligations
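A TTL timestamp is stored on the item as a number holding the expiry time in Unix epoch seconds. The sketch below shows how such an attribute could be computed; the attribute name expires_at and both helper functions are assumptions, since the attribute used for TTL is chosen per table.

```python
import time

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def with_ttl(item, ttl_seconds, now=None, attribute="expires_at"):
    """Return a copy of item carrying an expiry timestamp in epoch seconds."""
    now = time.time() if now is None else now
    return {**item, attribute: int(now) + int(ttl_seconds)}


def is_expired(item, now=None, attribute="expires_at"):
    """True once the item's expiry time has been reached."""
    now = time.time() if now is None else now
    return attribute in item and item[attribute] <= int(now)


# Fixed 'now' so the example is reproducible.
session = with_ttl({"user_id": "u-1", "last_seen": "login"},
                   ONE_YEAR_SECONDS, now=1_700_000_000)
```

Expired items are removed by a background process, so an application that must never see stale items can additionally filter on the timestamp, as is_expired does.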
DynamoDB Streams
DynamoDB Streams capture item-level changes made to a table. They can be used, for example, to replicate the data from one table to another in a different region.
APIs used for data transfer are:
ListStreams: Retrieves a list of stream descriptors for the current account and endpoint
DescribeStream: Retrieves detailed information about a given stream
GetShardIterator: Retrieves a shard iterator
GetRecords: Retrieves the stream records within a given shard
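Once records have been fetched from a shard, replication comes down to replaying each change against the target. The sketch below applies stream records to a dictionary that stands in for the replica table. The record fields (eventName, Keys, NewImage) follow the stream record format, the sample records are invented, and NewImage is only present when the stream is configured to include new item images.

```python
def apply_stream_records(records, replica):
    """Replay INSERT, MODIFY and REMOVE events against a dict-based replica."""
    for record in records:
        change = record["dynamodb"]
        # Use the item's key attributes as a hashable dictionary key.
        key = tuple(sorted((name, repr(value)) for name, value in change["Keys"].items()))
        if record["eventName"] in ("INSERT", "MODIFY"):
            replica[key] = change["NewImage"]
        elif record["eventName"] == "REMOVE":
            replica.pop(key, None)
    return replica


# Invented sample records, in the order they would appear within one shard.
records = [
    {"eventName": "INSERT", "dynamodb": {
        "Keys": {"id": {"S": "1"}},
        "NewImage": {"id": {"S": "1"}, "score": {"N": "10"}}}},
    {"eventName": "MODIFY", "dynamodb": {
        "Keys": {"id": {"S": "1"}},
        "NewImage": {"id": {"S": "1"}, "score": {"N": "25"}}}},
    {"eventName": "INSERT", "dynamodb": {
        "Keys": {"id": {"S": "2"}},
        "NewImage": {"id": {"S": "2"}, "score": {"N": "5"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "2"}}}},
]

replica = apply_stream_records(records, {})
```

Applying the records in order matters: the replica ends up with item 1 at its latest value and without item 2.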
Routing Policies
Routing Policies are used to route the traffic based on the geographic location
from where the DNS query has originated.
Fast and consistent performance
Fully manageable
Fine-grained access control
Highly scalable
Event-driven programming
Flexible in nature
Amazon ElastiCache
ElastiCache is an AWS in-memory data store and cache environment. It is used to cache results and reduce overhead and latency on the database.
It is a web service that improves the performance of web applications.
It helps to set up, manage, and scale a distributed in-memory cache environment in the cloud.
It supports two open-source in-memory engines: Redis and Memcached.
Popular Use Cases of ElastiCache: Adtech
Ad serving
Real-time bidding
ID lookup
Session tracking
User profile management
Popular Use Cases of ElastiCache: IoT
Tracking state
Real-time notification
Metadata and readings from millions of devices
Popular Use Cases of ElastiCache: Gaming
Recording game details
Leaderboards
Session information
Usage history
Logs
Popular Use Cases of ElastiCache: Mobile and Web
Storing user profile
Session details
Personalization setting
Entity-specific metadata
Amazon ElastiCache: Redis
Redis is an in-memory data structure store used as a database, cache, and message broker.
It is single-threaded, and its Read Replicas are synced asynchronously.
A collection of one to six Redis nodes is called a shard.
It uses one to 15 shards when cluster mode is enabled and only one shard when it is disabled.
It stores backups in Amazon S3, with a retention period of 0 to 35 days.
Amazon ElastiCache for Redis: Benefits
Monitoring and management: Simplified administrative tasks
Enhanced Redis Engine: Reliable and efficient open source Redis
Security and compliance: Compliant data protection and help
Scalability: Adjustable usage, based on the needs
Amazon ElastiCache: Memcached
Memcached is used to speed up dynamic, data-driven websites. Hence, it is called a distributed memory caching system.
Memcached is simple to use and is multi-threaded.
Memcached cluster can have a maximum of 100 nodes in a region.
Memcached supports both horizontal and vertical scaling.
Memcached is fast and is well established.
Benefits of Amazon ElastiCache for Memcached
Extreme Performance
By utilizing an end-to-end optimized stack running on customer nodes, it provides blazing fast performance.
Secure and Hardened
It continuously monitors your nodes and applies the necessary patches to keep your environment safe.
Memcached Compatible
It is compliant with Memcached, so popular tools in use today will work seamlessly with the service.
Benefits of Amazon ElastiCache for Memcached
Easily Scalable
It includes sharding to scale the in-memory cache up to 20 nodes and 12.7 TB per cluster.
Fully Managed
You no longer need to perform management tasks, as it monitors your cluster to keep your workloads up and running.
Auto-Discovery
It saves users' time by simplifying the way an application connects to a Memcached cluster.
Amazon ElastiCache Costs
ElastiCache offers usage-based pricing following a free trial. It provides storage space for one snapshot free of charge for each active ElastiCache for Redis cluster.
Shown below is a list of node pricing options supported by ElastiCache:
On-demand nodes: A user pays for memory capacity by the hour that a node runs.
Reserved nodes: A user can choose an all-upfront payment, no upfront payment, or a partial upfront payment with low hourly charges for each reserved node.
Note
Additional backup storage for snapshots is charged at $0.085 per GB every month.
Memcached versus Redis
Description
Memcached: Is an in-memory key-value store, originally intended for caching
Redis: Is an in-memory data structure store, used as a database, cache, and message broker
Replication
Memcached: Does not support replication
Redis: Supports master-slave replication
Storage type
Memcached: Stores variables in memory and retrieves information directly from the server instead of the DB
Redis: Is like a database that resides in memory
Read and Write speed
Memcached: Good at handling high-traffic websites
Redis: Handles neither high read traffic nor heavy writes well
Key length
Memcached: Has a maximum of 250 bytes
Redis: Has a maximum of 512 MB
Ideal for
Memcached: Caching relatively small and static data such as HTML code fragments
Redis: Session cache, full page cache (FPC), queues, counting, and more
Key Takeaways
There are three types of databases offered by AWS—
Relational, Key-Value, and In-Memory Databases.
Amazon RDS is a web service that helps to set up, operate,
and scale a relational database in the AWS Cloud.
Amazon DynamoDB is a fully-managed NoSQL database
service that provides high speed and seamless scalability.
There are three planes in DynamoDB API—Control Plane,
Data Plane, and DynamoDB Streams.
Amazon ElastiCache is used to cache results and reduce the
overhead and latency on the database.
Storing Application Data in MySQL DB using Amazon RDS
Problem Statement:
You are asked to demonstrate joining multiple VPCs together using a Peering Connection and PrivateLink
Tools required:
WAMP Server, AWS RDS, Visual Studio Code
Expected Deliverables:
Screenshots of every step