
AWS Databases

Relational, NoSQL, In-Memory, Data warehouse, Specialized

Chandra Lingam
Cloud Wave LLC

Copyright © 2019 ChandraMohan Lingam. All Rights Reserved.


AWS Databases

Note: Not a complete list



Standard Features – AWS Databases

[Figure: a database inside My VPC with a Security Group, encryption, Multi-AZ, and database backup to S3.]



DynamoDB Global Table - Multi-Region, Multi-Master

[Figure: a client in North America and a client in South America each get low latency access to a database in their own region (My VPC Region 1, My VPC Region 2); the two databases are multi-master, multi-region replicas.]



Benefits
• Wide selection of database engines
• Fully managed
• VPC Network Isolation
• Encryption at rest using KMS
• Encryption in transit
• Automated Backup
• Highly Durable and Available – Replicated across multiple devices
in Availability Zone, Region
• Multi-Region, Multi-Master (some products) – Low latency access
and disaster recovery



AWS Portfolio of Databases (1 of 2)

Service: RDS - Relational Database Service
Type of Database: Relational Database. Choice of database engines - Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, SQL Server
Uses: Traditional applications, ERP, CRM, ecommerce

Service: Redshift
Type of Database: Petabyte scale Data warehouse, Massively Parallel, Columnar Storage, integrates with S3 data lake
Uses: business intelligence, analytics, SQL to explore data lake

Service: DynamoDB, Cassandra, DocumentDB
Type of Database: NoSQL Database. Key-value storage, document store, consistent single digit millisecond latency at any scale
Uses: high traffic web applications, ecommerce, gaming systems

Service: ElastiCache
Type of Database: In-memory database - Memcached, Redis. Sub-millisecond latency
Uses: Caching, user session, gaming leaderboards, geospatial applications

AWS Portfolio of Databases (2 of 2)

Service: Neptune
Type of Database: Graph Database – optimized for highly connected datasets and querying relationships
Uses: Social networks, recommendation engines

Service: Timestream
Type of Database: Timeseries Database – optimized for storing and querying high volume timeseries data at 1/10th the cost of relational databases
Uses: IoT applications, Industrial telemetry, DevOps

Service: Quantum Ledger Database
Type of Database: Ledger Database – transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority
Uses: Systems of record, supply chain, banking transactions

Service: Elasticsearch
Type of Database: Search database – store, analyze and correlate logs from disparate applications and systems
Uses: search, infrastructure and application monitoring, Security info and event management



Database Migration

AWS Database Migration Service (DMS)


One-time data replication
Continuous data replication from on-premises to AWS
(and reverse)
Homogeneous and Heterogeneous replication



Summary

"The broadest selection of purpose-built databases for all your application needs"

"By picking the best database to solve a specific problem or a group of problems, you can break away from restrictive one-size-fits-all monolithic databases"

Reference: https://aws.amazon.com/products/databases/



Relational Database Service (RDS)



Relational Database

[Figure: User, Movie, and Actor tables, linked through the User-Movie Rating and Movie-Actor tables.]



Relational Database
General Purpose – Design a schema for any need

Rigid Schema – difficult to change

SQL – Flexible Querying System

Complex System

Scaling Challenges

Amazon Relational Database Service (RDS)
Automates time-consuming administrative tasks (hardware,
installation, patching, backup)

Production ready database in minutes

Push button scaling (CPU, Memory, Storage)

Six popular database engines: Aurora, MySQL, PostgreSQL, MariaDB, Oracle, SQL Server

Amazon Relational Database Service (RDS)

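[Figure: the user or application performs Create, Read, Update, Delete operations against the Primary in AZ 1, which replicates synchronously (SYNC) to the Standby in AZ 2; each instance has its own EBS storage.]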
Multi-AZ Configuration
• Connect using DNS Name
• RDS maintains mapping between DNS Name and Primary Instance
• After failover, DNS is updated to point to new primary

Failover
[Figure: the client connects through the DNS name mydb.a1b2yz.us-west-2.rds.amazonaws.com; on failover the Standby in AZ 2 takes over as Primary and the DNS name follows it.]



RDS - Backup and Snapshot

[Figure: automated backups of the Primary/Standby deployment are stored in S3; snapshots are user initiated.]

Automated Backup
• Configurable retention of up to 35 days
• Last restorable time – typically within the last 5 minutes
• Point-in-time restore up to a specified second (to a new instance)

Snapshot
• User initiated
• Snapshot is kept until explicitly deleted
• Suitable for long term retention
• Copy to another region

RDS – Read Replica
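[Figure: one client reads and writes against the Primary; the Primary replicates asynchronously (ASYNC) to a Read Replica, which serves reads for another client. The Primary also has a Standby.]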

Read Replica
• Offload Read traffic from primary instance
• Data can be stale
• One or more read replicas (depending on DB engine)

RDS Patching

"Amazon RDS will make sure that the relational database software powering your deployment stays up-to-date with the latest patches."

You can specify a maintenance window that RDS can use for patching systems



RDS – Scaling CPU and Memory

• Specify desired CPU and Memory configuration and RDS takes care of scaling
• Completes in a few minutes (needs to spin up new instances)
• RDS performs failover during compute scaling (interruption to client for the duration of failover)

Scaling can be scheduled for the next maintenance window or applied immediately

RDS – Storage Scaling

• Storage can be scaled without interruption (zero-downtime)
• SQL Server up to 16 TB
• Aurora up to 64 TB
• MySQL, MariaDB, PostgreSQL, Oracle up to 32 TB

Scaling can be scheduled for the next maintenance window or applied immediately



RDS – Deployment

[Figure: a VPC spanning AZ 1, AZ 2, and AZ 3. Each AZ has a Public Subnet hosting a Web server and a Private Subnet; the RDS Primary and RDS Standby run in the private subnets of two of the AZs.]



RDS – Network Security
• Deploy RDS in Private Subnet (unless your requirement
is a publicly accessible RDS instance)
• Configure RDS Security Group to allow access from
Web Server or Application Server Security Groups
• Assign a subnet in all Availability Zones to the DB
Subnet Group
• In case of an extended AZ outage or some other issue, RDS may
choose to launch a replacement standby instance in a
different AZ
• Connect from on-premises using AWS Direct Connect
or VPN

RDS – Permissions and Encryption

• IAM for Control Plane Access – who can create, manage, delete RDS database instances
• DB Specific User for Data Plane access – who can connect to the database, run SQL
• Optional encryption at rest using AWS Key Management Service (KMS)
• Optional encrypted connection support using SSL/TLS



RDS – Customization, Optimization
• You can customize RDS database instance and fine
tune using DB Parameter Groups
• RDS provides best practice guidance by analyzing
configuration and usage metrics
• Use Reserved Instances for long term use (1 to 3-year
terms) at substantial discount
• To prevent configuration drift, you can use AWS Config
to record and audit changes to the DB instance
• For monitoring, you can use CloudWatch



Amazon Aurora and Aurora Serverless



Traditional Relational Database Engine

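[Figure: Multi-AZ Configuration. The user or application performs Create, Read, Update, Delete operations against the Primary in AZ 1, which replicates synchronously (SYNC) to the Standby in AZ 2; each instance has its own EBS storage.]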



Amazon Aurora

[Figure: a Primary (Read, Write) and Read Replicas 1 through 15 (Read) share the Aurora Storage Subsystem, which holds six copies of the data (1 to 6) spread across AZ 1, AZ 2, and AZ 3.]



Aurora vs other Relational Databases
• Storage Subsystem that automatically maintains six copies
of data across three availability zones
• Any changes made by the Primary instance are replicated
automatically
• Low latency Read Replica instances (lag time often in
single digit milliseconds)
• When the Primary fails:
  • A Read Replica is promoted as the new primary (typically under 60 seconds)
  • If there is no Read Replica, a new replacement primary is launched

Aurora Features
• MySQL and PostgreSQL Compatibility Modes
• Up to five times faster than standard MySQL database
• Up to three times faster than standard PostgreSQL
database
• Security, Availability, Reliability of commercial databases
at 1/10th cost
• Support for up to 15 low latency read replicas
• Global Database - Multi-Region Replication (fast local
access, disaster recovery) for globally distributed
applications

Aurora
Cluster Endpoint
• Points to Current Primary Instance
• Suitable for Writes and Reads
mydbcluster.cluster-123456789012.us-east-1.rds.amazonaws.com:3306
Reader Endpoint
• Points to Read Replicas
• Suitable for Reads
• Multiple Read Replicas are load balanced at connection level
mydbcluster.cluster-ro-123456789012.us-east-1.rds.amazonaws.com:3306
Instance Endpoint
• Points to Individual Aurora Instance
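As a minimal sketch of how an application might use the endpoints above: send writes to the cluster endpoint and read-only work to the reader endpoint. The host names are the sample values from this slide, and `pick_endpoint` is a hypothetical helper, not part of any AWS SDK.

```python
from typing import Tuple

# Sample endpoints copied from the slide above (not real clusters).
CLUSTER_ENDPOINT = "mydbcluster.cluster-123456789012.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mydbcluster.cluster-ro-123456789012.us-east-1.rds.amazonaws.com"
PORT = 3306


def pick_endpoint(read_only: bool) -> Tuple[str, int]:
    """Return (host, port): the reader endpoint for read-only work,
    the cluster endpoint for anything that may write."""
    host = READER_ENDPOINT if read_only else CLUSTER_ENDPOINT
    return host, PORT


# A reporting query can tolerate replica lag, so it goes to the reader endpoint.
report_host, report_port = pick_endpoint(read_only=True)
# An order update must reach the current primary.
write_host, write_port = pick_endpoint(read_only=False)
```

Because the reader endpoint balances at connection level, a long-lived connection stays on one replica; opening new connections is what spreads the load.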
Amazon Aurora Serverless
[Figure: the Client connects through the Aurora Proxy Fleet to a Primary (Read/Write) taken from the Aurora Server Warm Pool; the Primary uses the Aurora Storage Subsystem with six copies of the data (1 to 6).]



Aurora Serverless
• Storage and Processing are separate – scale down to
zero processing and pay only for storage
• Automatic Pause and Resume – Configurable period of
inactivity after which DB Cluster is Paused
• Default is 5 minutes
• When paused, you are charged only for Storage
• Automatically Resumes when new database connections are
requested



Aurora Serverless
• Aurora Serverless - Suitable for use cases that are
intermittent or unpredictable
• Specify Minimum, Maximum Aurora Capacity Units
(ACU)
• 1 ACU is ~2 GB of Memory with corresponding
CPU/Network
• Pricing 1 ACU is $0.06 per hour + Storage + I/O
• Aurora Serverless automatically scales up and down
based on load
• Scaling is rapid – uses a pool of warm resources
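The ACU pricing above can be turned into a rough compute estimate. This is only a sketch: it uses the $0.06 per ACU-hour figure quoted on this slide (actual prices vary by region and change over time), it ignores storage and I/O charges, and the workload numbers are made up for illustration.

```python
ACU_PRICE_PER_HOUR = 0.06  # USD per ACU-hour, the figure quoted on the slide
ACU_MEMORY_GB = 2          # approximate memory per ACU, as stated on the slide


def compute_cost(acus: float, active_hours: float) -> float:
    """Compute-only cost in USD; storage and I/O are billed separately."""
    return acus * active_hours * ACU_PRICE_PER_HOUR


# Hypothetical workload: 4 ACUs (about 8 GB of memory), active 6 hours a day
# for 30 days and paused the rest of the time.
monthly = compute_cost(acus=4, active_hours=6 * 30)
print(f"{monthly:.2f}")  # prints 43.20
```

The same 4 ACUs running around the clock for 30 days would be 4 × 720 × 0.06 = 172.80 USD, which is why pausing matters for intermittent workloads.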
NoSQL Databases
DynamoDB, Cassandra, DocumentDB



DynamoDB
• Key-value NoSQL datastore
• Flexible schema - only primary key needs to be defined
– all columns/attributes are flexible
• Consistent performance at any scale – single digit
millisecond



Example: Movie Data

Data Sample:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.html

Primary Key

Simple – Single attribute

Composite – Two attributes (partition key, sort key)
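The two kinds of primary key can be written down in the shape DynamoDB's CreateTable API uses for its KeySchema, where `HASH` marks the partition key and `RANGE` marks the sort key. The sketch below only builds the dictionaries; `primary_key_of` is a hypothetical helper for illustration, and the attribute names follow the movie example.

```python
# Simple primary key: a single attribute, the partition key.
simple_key_schema = [
    {"AttributeName": "title", "KeyType": "HASH"},   # partition key
]

# Composite primary key: partition key plus sort key.
composite_key_schema = [
    {"AttributeName": "year", "KeyType": "HASH"},    # partition key
    {"AttributeName": "title", "KeyType": "RANGE"},  # sort key
]


def primary_key_of(item, key_schema):
    """Extract the attribute values that identify an item under a key schema."""
    return tuple(item[entry["AttributeName"]] for entry in key_schema)


movie = {"year": 2006, "title": "Cars"}
print(primary_key_of(movie, simple_key_schema))     # ('Cars',)
print(primary_key_of(movie, composite_key_schema))  # (2006, 'Cars')
```

With the composite key, two movies may share a year or share a title, but not both.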



Simple Primary Key - title
[Figure: each item is identified by its title alone (Avatar, Citizen Kane, Jurassic Park, Titanic, Up, X-Men) and carries its own Data.]



Composite Primary Key – year, title
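[Figure: the partition key year (1965, 1970, 1980, 2002, 2006, 2019) selects a group of items, and the sort key title orders the items within that group. Year 1980: Airplane!, Flash Gordon, Star Wars V, SuperMan II, The Shining. Year 2006: 300, Cars, Casino Royale, The Departed, The Prestige. Each item carries its own Data.]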



Scale-Out Processing
[Figure: a Hash Function maps the Partition Key to a node; the DynamoDB database is made up of Partition 1, Partition 2, ... Partition n.]
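The idea of hashing a partition key to a partition can be sketched in a few lines. DynamoDB's internal hash function is not public, so MD5 and the modulo step here are stand-ins chosen for illustration, and `partition_for` is a hypothetical name.

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index in [0, num_partitions).

    MD5 is an assumption used only to get a stable, well-spread hash;
    it is not what DynamoDB uses internally.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same key always maps to the same partition.
for user_id in ("101", "102", "103", "104"):
    print(user_id, "->", partition_for(user_id, num_partitions=3))
```

Every item with the same partition key value lands on the same partition, which is why the choice of partition key decides how evenly data and traffic are spread.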
Game Score Table

UserID GameTitle Country Other attributes


101 G1 USA
101 G2 USA
102 G1 USA
102 G2 USA
103 G1 USA
103 G2 USA
104 G1 USA
104 G2 USA



Game Score Table – UserID, GameTitle
UserID

Partition 1 Partition 2 Partition 3


UserID Title UserID Title UserID Title
101 G1 102 G1 103 G1
101 G2 102 G2 103 G2
104 G1
104 G2



Game Score Table – GameTitle, UserID
GameTitle

Partition 1 Partition 2 Partition 3


Title UserID Title UserID Title UserID
G1 101 G2 101
G1 102 G2 102
G1 103 G2 103
G1 104 G2 104



Game Score Table – Country, UserID
Country

Partition 1 Partition 2 Partition 3


Country UserID Country UserID Country UserID
USA 101
USA 102
USA 103
USA 104
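The three Game Score layouts above differ only in which attribute is the partition key. Since items with the same partition key value always share a partition, counting items per key value shows how well each choice can spread. A small sketch using the eight rows of the Game Score table (the helper name is hypothetical):

```python
from collections import Counter

# (UserID, GameTitle, Country) rows from the Game Score table.
ROWS = [(user, game, "USA") for user in (101, 102, 103, 104) for game in ("G1", "G2")]
COLUMNS = {"UserID": 0, "GameTitle": 1, "Country": 2}


def items_per_key_value(partition_key: str) -> Counter:
    """Count how many items share each value of the chosen partition key."""
    index = COLUMNS[partition_key]
    return Counter(row[index] for row in ROWS)


for name in COLUMNS:
    groups = items_per_key_value(name)
    print(name, "distinct values:", len(groups), "largest group:", max(groups.values()))
```

UserID yields four groups of two items, GameTitle two groups of four, and Country a single group of eight, so Country as the partition key would place every item on one partition.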



DynamoDB Features
• Automatic replication of data across multiple-availability
zones in a region
• Global Tables – multi-master, multi-region replication – Fast
local access across different regions
• ACID Transaction Support
• Point-in-Time Recovery – Automated Continuous Backup
(35 days retention)
• On-Demand Backup/Snapshot for long term retention
• Automatic deletion of expired items – Time To Live
• Limits – Item size cannot exceed 400 KB

DynamoDB Global Table - Multi-Region, Multi-Master
[Figure: a client in North America and a client in South America each get low latency access to a database in their own region (My VPC Region 1, My VPC Region 2); the two databases are multi-master, multi-region replicas.]



Transactions
DynamoDB supports ACID Transactions - Atomicity, Consistency, Isolation, Durability

Transactions are useful when you want to insert, delete or update multiple items as a single logical operation

"DynamoDB provides native, server-side support for transactions, simplifying the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables"
https://aws.amazon.com/dynamodb/features/
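To make "multiple items as a single logical operation" concrete, here is a sketch of a request in the shape accepted by DynamoDB's TransactWriteItems operation. It only builds the request dictionary and does not call AWS. The table name `GameScores`, the key `UserID`, and the `Points` attribute are assumptions made up for this example.

```python
def build_transfer_request(table, from_user, to_user, points):
    """Build a TransactWriteItems request that moves points between two items.

    Both updates succeed together or not at all; the condition on the first
    update cancels the whole transaction if the sender has too few points.
    """
    amount = {":p": {"N": str(points)}}
    names = {"#pts": "Points"}  # placeholder name avoids reserved-word clashes
    return {
        "TransactItems": [
            {
                "Update": {
                    "TableName": table,
                    "Key": {"UserID": {"N": str(from_user)}},
                    "UpdateExpression": "SET #pts = #pts - :p",
                    "ConditionExpression": "#pts >= :p",
                    "ExpressionAttributeNames": names,
                    "ExpressionAttributeValues": amount,
                }
            },
            {
                "Update": {
                    "TableName": table,
                    "Key": {"UserID": {"N": str(to_user)}},
                    "UpdateExpression": "SET #pts = #pts + :p",
                    "ExpressionAttributeNames": names,
                    "ExpressionAttributeValues": amount,
                }
            },
        ]
    }


request = build_transfer_request("GameScores", from_user=101, to_user=102, points=10)
```

With boto3 installed and credentials configured, a request like this would be sent with `boto3.client("dynamodb").transact_write_items(**request)`.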



Cassandra, DocumentDB



Amazon Managed Cassandra
AWS managed open source Apache Cassandra

Move Cassandra workloads to AWS Cloud

Performance benefits are comparable to DynamoDB

AWS recommends Cassandra for: industrial equipment data collection, and other use cases that require high performance and a large number of columns

Cassandra versus DynamoDB
• A DynamoDB primary key is made up of a single-attribute
partition key and an optional single-attribute sort key.
Cassandra supports multi-column partition and sort keys
• DynamoDB max item size is 400 KB – Cassandra has a
theoretical limit of 2 GB per column. However, general
practice is not to exceed a few MB.
• Cassandra also supports a large number of columns –
DynamoDB also supports a large number of attributes,
but is constrained by the 400 KB size limit per item



Amazon DocumentDB

"Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads."

DocumentDB emulates the MongoDB API; it is not a true port of the open source code. Currently, MongoDB and DocumentDB are drifting apart in direction.



ElastiCache
In-memory data store



Amazon ElastiCache

• In-memory datastore with sub-millisecond latency
• Ideal for frequently read data: reduces read traffic going
to the database; can also buffer high-frequency writes and
periodically reconcile with the backend database
Uses: Product reviews and rating, Caching, Session Management,
Gaming leaderboards, geospatial applications
• Deploy in your VPC – Network isolation and security
• Choice of engines: Memcached, Redis
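One common way to reduce read traffic to the database is the cache-aside pattern: check the cache first, and only query the database on a miss. The sketch below uses an in-process dictionary with expiry as a stand-in for ElastiCache; `TTLCache`, `get_product_reviews`, and the fake database function are all hypothetical names for illustration.

```python
import time


class TTLCache:
    """Stand-in for a cache such as ElastiCache: a dict whose entries expire."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)


def get_product_reviews(product_id, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    cached = cache.get(product_id)
    if cached is not None:
        return cached
    value = load_from_db(product_id)
    cache.set(product_id, value)
    return value


db_calls = []


def fake_db(product_id):
    db_calls.append(product_id)  # record each trip to the "database"
    return ["review for " + product_id]


cache = TTLCache(ttl_seconds=60)
get_product_reviews("p1", cache, fake_db)  # miss: loads from the database
get_product_reviews("p1", cache, fake_db)  # hit: served from the cache
print(len(db_calls))  # prints 1
```

The expiry bounds how stale a cached value can get; a shorter TTL trades more database reads for fresher data.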



Game Leader Boards (READs/WRITEs)
[Figure: Players use the Application, which (1) authenticates the user against UserProfile in DynamoDB, (2) keeps the user session in UserSession in ElastiCache, (3) updates the Game Leader Board in ElastiCache during game play, and (4) periodically writes scores from the cache to GameScores in DynamoDB.]

Memcached Features

• Key-value store
• Scales up to 20 nodes and 12.7 TB
• Sub-millisecond latency



Redis Features

• In-memory datastore with advanced data structures: Strings, Lists, Sorted Sets, Hash, Bit Arrays
Sorted Sets can be used to easily build game leaderboards – keep a list of
players sorted by rank.
• Built-in commands for Geospatial data
  • Distance between two places or persons
  • Find all places within a given distance from a point
• Sub-millisecond latency
• Scales up to 250 nodes and 170 TB
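To show why a Sorted Set suits a leaderboard, here is an in-process stand-in that mimics the two operations a leaderboard needs: set a member's score (as Redis ZADD does) and read the top entries by score (as ZREVRANGE with scores does). This is not a Redis client; the class, the tie-breaking rule, and the player names are assumptions for illustration.

```python
class Leaderboard:
    """In-process stand-in for a Redis Sorted Set (member -> score)."""

    def __init__(self):
        self.scores = {}

    def add(self, player, score):
        # Like ZADD: insert the member, or overwrite its score if it exists.
        self.scores[player] = score

    def top(self, n):
        # Like reading the highest-scoring members with their scores.
        # Ties are broken by player name here, a choice made for this sketch.
        ranked = sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


board = Leaderboard()
board.add("alice", 50)
board.add("bob", 70)
board.add("carol", 60)
board.add("alice", 90)  # alice improves her score
print(board.top(2))  # [('alice', 90), ('bob', 70)]
```

A real Sorted Set keeps members ordered as scores change, so reading the top of the board does not require re-sorting the way this stand-in does.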



Redis High Availability Features

• Pub-Sub and Messaging
For example, high performance chat rooms, server to server
communication, social media feeds
• Read Replica across multiple Availability Zones
• Detects primary node failure and automatically promotes
replica as primary
• Backup, Restore
• Export to another region
• Lua scripting support

Amazon Redshift



Data Warehouse - Redshift
• Petabyte scale Massively Parallel Relational Database
• Cluster consists of a Leader Node and multiple Compute
Nodes
Available Storage = Storage per Compute Node × Number of
Compute Nodes
• Columnar Storage
• Targeted Data Compression
• Powerful SQL based Analytics
• With Redshift Spectrum - query can span tables in
Redshift and files stored in S3 Data Lake