
AWS Databases

Uploaded by

Kaustubh Negi
Copyright
© All Rights Reserved

Amazon RDS

AWS RDS Overview

● RDS stands for Relational Database Service


● It’s a managed database service for databases that use SQL as a query language.
● It allows you to create databases in the cloud that are managed by AWS:
○ Postgres
○ MySQL
○ MariaDB
○ Oracle
○ Microsoft SQL Server
○ Aurora (AWS Proprietary database)
Advantages of using RDS versus deploying a DB on EC2
● RDS is a managed service:
○ Automated provisioning, OS patching
○ Continuous backups and restore to a specific timestamp (Point in Time Restore)
○ Monitoring dashboards
○ Read replicas for improved read performance
○ Multi AZ setup for DR (Disaster Recovery)
○ Maintenance windows for upgrades
○ Scaling capability (vertical and horizontal)
○ Storage backed by EBS (gp2 or io1)
● BUT you can’t SSH into your instances
RDS Solution Architecture - EC2
RDS – Storage Auto Scaling
● Helps you increase storage on your RDS DB instance dynamically
● When RDS detects you are running out of free database storage, it scales automatically
● Avoids manually scaling your database storage
● You have to set a Maximum Storage Threshold (maximum limit for DB storage)
● Storage is automatically modified if:
○ Free storage is less than 10% of allocated storage
○ Low-storage lasts at least 5 minutes
○ 6 hours have passed since last modification
● Useful for applications with unpredictable workloads
● Supports all RDS database engines (MariaDB, MySQL, PostgreSQL, SQL Server, Oracle)
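The three trigger conditions can be read as a single boolean check. The sketch below is only an illustration of that logic, not how RDS implements it; the function name, its parameters, and the extra comparison against the Maximum Storage Threshold are assumptions made for the example.

```python
from datetime import datetime, timedelta


def should_autoscale(allocated_gib, free_gib, low_storage_minutes,
                     last_modification, now, max_threshold_gib):
    """Return True when every condition from the list above holds.

    All parameter names are hypothetical; RDS evaluates this internally.
    """
    # Free storage is less than 10% of allocated storage.
    low_on_space = free_gib < 0.10 * allocated_gib
    # The low-storage condition has lasted at least 5 minutes.
    sustained = low_storage_minutes >= 5
    # At least 6 hours have passed since the last storage modification.
    cooled_down = now - last_modification >= timedelta(hours=6)
    # Assumption: scaling stops once the Maximum Storage Threshold is reached.
    below_cap = allocated_gib < max_threshold_gib
    return low_on_space and sustained and cooled_down and below_cap


now = datetime(2024, 1, 1, 12, 0)
print(should_autoscale(100, 8, 10, now - timedelta(hours=7), now, 500))
```

With 8 GiB free out of 100 GiB, ten minutes of low storage, and seven hours since the last change, every condition is met; shortening the gap to one hour makes the check fail.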


RDS Read Replicas for read scalability
● Up to 5 Read Replicas
● Within AZ, Cross AZ or Cross Region
● Replication is ASYNC, so reads are eventually consistent
● Replicas can be promoted to their own DB
● Applications must update the connection string to leverage read replicas
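Because the application has to choose which connection string to use, a small routing helper is one way to picture the change. This is a minimal sketch: the host names are hypothetical placeholders, and the "SELECT goes to a replica" rule is a simplifying assumption rather than an RDS feature.

```python
import itertools

# Hypothetical endpoints; real ones come from your RDS instances.
WRITER_ENDPOINT = "mydb.example.internal"
READER_ENDPOINTS = [
    "mydb-replica-1.example.internal",
    "mydb-replica-2.example.internal",
]

# Rotate through the replicas so reads are spread across them.
_reader_cycle = itertools.cycle(READER_ENDPOINTS)


def endpoint_for(sql: str) -> str:
    """Pick a host for a statement: SELECTs go to a replica, the rest to the writer."""
    words = sql.split()
    if words and words[0].upper() == "SELECT":
        return next(_reader_cycle)
    return WRITER_ENDPOINT


print(endpoint_for("INSERT INTO orders VALUES (1)"))
print(endpoint_for("SELECT * FROM orders"))
```

Since replication is asynchronous, a read that must see a write made a moment earlier is safer against the writer endpoint than against a replica.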
RDS Read Replicas – Network Cost
● In AWS there’s a network cost when data goes from one AZ to another
● For RDS Read Replicas within the same region, you don’t pay that fee
Amazon Aurora
● Aurora is a proprietary technology from AWS (not open sourced)
● PostgreSQL and MySQL are both supported as Aurora DB engines
● Aurora is “AWS cloud optimized” and claims a 5x performance improvement over MySQL on RDS, and over 3x the performance of Postgres on RDS
● Aurora storage automatically grows in increments of 10 GB, up to 64 TB
● Aurora costs more than RDS (20% more) but is more efficient
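The storage growth rule is simple arithmetic. The sketch below uses only the figures quoted in the list (10 GB steps, 64 TB ceiling); the function name and the choice of 1 TB = 1024 GB are assumptions made for the example.

```python
import math

INCREMENT_GB = 10
MAX_GB = 64 * 1024  # 64 TB, assuming 1 TB = 1024 GB


def allocated_storage_gb(data_gb: float) -> int:
    """Round the data size up to the next 10 GB step, capped at the 64 TB limit."""
    if data_gb > MAX_GB:
        raise ValueError("data size exceeds the 64 TB limit quoted above")
    steps = max(1, math.ceil(data_gb / INCREMENT_GB))
    return min(steps * INCREMENT_GB, MAX_GB)


print(allocated_storage_gb(23))
```

Under these assumptions, 23 GB of data occupies three 10 GB steps.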

NOTE: Not in the free tier


Aurora DB Cluster
Features of Aurora
● Automatic fail-over
● Backup and Recovery
● Isolation and security
● Industry compliance
● Push-button scaling
● Automated Patching with Zero Downtime
● Advanced Monitoring
● Routine Maintenance
● Backtrack: restore data at any point of time without using backups
Aurora Replicas - Auto Scaling
Aurora – Custom Endpoints
● Define a subset of Aurora Instances as a Custom Endpoint
● Example: Run analytical queries on specific replicas
● The Reader Endpoint is generally not used after defining Custom Endpoints
Aurora Serverless
● Automated database instantiation and auto-scaling based on actual usage
● Good for infrequent, intermittent or unpredictable workloads
● No capacity planning needed
● Pay per second, can be more cost-effective
Backups
RDS
● Automated backups:
○ Daily full backup of the database (during the backup window)
○ Transaction logs are backed up by RDS every 5 minutes
○ => ability to restore to any point in time (from oldest backup to 5 minutes ago)
○ 1 to 35 days of retention, set 0 to disable automated backups
● Manual DB Snapshots
○ Manually triggered by the user
○ Retention of backup for as long as you want
● Trick: in a stopped RDS database, you will still pay for storage. If you plan on stopping it for a long time, you should snapshot & restore instead
Aurora
● Automated backups
○ 1 to 35 days (cannot be disabled)
○ Point-in-time recovery in that timeframe
● Manual DB Snapshots
○ Manually triggered by the user
○ Retention of backup for as long as you want
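The point-in-time restore window for RDS automated backups can be worked out from the retention period and the 5-minute transaction log interval. The following is a rough sketch: the function names are hypothetical, and it assumes the oldest backup is exactly as old as the retention period.

```python
from datetime import datetime, timedelta


def restore_window(now, retention_days):
    """Return (earliest, latest) restorable times, or None if backups are disabled."""
    if retention_days == 0:
        return None  # retention of 0 disables automated backups on RDS
    if not 1 <= retention_days <= 35:
        raise ValueError("retention must be between 1 and 35 days")
    # Assumption: the oldest backup is exactly retention_days old.
    earliest = now - timedelta(days=retention_days)
    # Transaction logs are shipped every 5 minutes, so the newest point lags by that much.
    latest = now - timedelta(minutes=5)
    return earliest, latest


def can_restore_to(target, now, retention_days):
    """Check whether a target timestamp falls inside the restore window."""
    window = restore_window(now, retention_days)
    return window is not None and window[0] <= target <= window[1]


now = datetime(2024, 1, 10, 12, 0)
print(can_restore_to(now - timedelta(days=3), now, retention_days=7))
```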
Amazon ElastiCache Overview
● Just as RDS provides managed relational databases, ElastiCache provides managed Redis or Memcached
● Caches are in-memory databases with high performance and low latency
● Helps reduce the load on databases for read-intensive workloads
● AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups
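One common way a cache takes read load off a database is the cache-aside pattern: look in the cache first and query the database only on a miss. The sketch below uses a plain dictionary as a stand-in for Redis or Memcached; the class name, the loader callback, and the TTL value are all assumptions for illustration.

```python
import time


class CacheAside:
    """Cache-aside reads: serve from memory, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db   # hypothetical function that queries the database
        self._ttl = ttl_seconds
        self._store = {}            # key -> (value, expiry time)

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]         # cache hit: the database is not touched
        value = self._load(key)     # cache miss: read from the database
        self._store[key] = (value, now + self._ttl)
        return value


db_calls = []


def slow_lookup(key):
    db_calls.append(key)            # record every trip to the "database"
    return key.upper()


cache = CacheAside(slow_lookup)
cache.get("user:1")
cache.get("user:1")
print(len(db_calls))
```

The second `get` is served from memory, so the loader runs only once for the two reads.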
ElastiCache Solution Architecture - Cache
ElastiCache – Redis vs Memcached
DynamoDB
● Fully managed, highly available with replication across 3 AZs
● NoSQL database, not a relational database
● Scales to massive workloads, distributed “serverless” database
● Millions of requests per second, trillions of rows, 100s of TB of storage
● Fast and consistent in performance
● Single-digit millisecond latency – low latency retrieval
● Integrated with IAM for security, authorization and administration
● Low cost and auto scaling capabilities
● Standard & Infrequent Access (IA) Table Class
DynamoDB Accelerator - DAX
● Fully managed in-memory cache for DynamoDB
● 10x performance improvement (from single-digit millisecond latency to microsecond latency) when accessing your DynamoDB tables
● Secure, highly scalable & highly available
● Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases
DynamoDB – Global Tables
● Make a DynamoDB table accessible with low latency in multiple regions
● Active-Active replication (read/write to any AWS Region)
Redshift Overview
● Redshift is based on PostgreSQL, but it’s not used for OLTP
● It’s OLAP – online analytical processing (analytics and data warehousing)
● Load data once every hour, not every second
● 10x better performance than other data warehouses, scale to PBs of data
● Columnar storage of data (instead of row based)
● Massively Parallel Query Execution (MPP), highly available
● Pay as you go based on the instances provisioned
● Has a SQL interface for performing the queries
● BI tools such as Amazon QuickSight or Tableau integrate with it
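The difference between row-based and columnar storage is easiest to see on a tiny dataset. This is only a toy illustration with made-up order data; it says nothing about how Redshift lays out data on disk.

```python
# Row-based layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "eu", "amount": 20.0},
    {"order_id": 2, "region": "us", "amount": 35.5},
    {"order_id": 3, "region": "eu", "amount": 12.5},
]

# Columnar layout: all values of one field are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_from_rows(records):
    """Every whole record is visited even though only one field is needed."""
    return sum(record["amount"] for record in records)


def total_from_columns(cols):
    """Only the 'amount' column is read; the other columns are never touched."""
    return sum(cols["amount"])


print(total_from_rows(rows), total_from_columns(columns))
```

Both functions give the same total, but the columnar version reads a single list, which is why analytical queries that aggregate a few columns over many rows favor this layout.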
Amazon EMR
● EMR stands for “Elastic MapReduce”
● EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
● The clusters can be made of hundreds of EC2 instances
● Also supports Apache Spark, HBase, Presto, Flink…
● EMR takes care of all the provisioning and configuration
● Auto-scaling and integrated with Spot instances
● Use cases: data processing, machine learning, web indexing, big data…
Amazon Athena
● Serverless query service to analyze data stored in Amazon S3
● Uses standard SQL language to query the files
● Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
● Pricing: $5.00 per TB of data scanned
● Use compressed or columnar data for cost-savings (less scan)
● Use cases: business intelligence, analytics, reporting, and analyzing & querying VPC Flow Logs, ELB Logs, CloudTrail trails, etc.
● Exam Tip: to analyze data in S3 using serverless SQL, use Athena
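The per-TB pricing above makes the saving from scanning less data easy to estimate. The sketch below uses the $5.00 per TB figure from the list; the function name, the binary definition of a terabyte, and the "one quarter of the data" scenario are assumptions for the example.

```python
PRICE_PER_TB_USD = 5.00     # figure quoted in the list above
BYTES_PER_TB = 1024 ** 4    # assumption: 1 TB counted as 1024^4 bytes


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the cost of one query from the amount of data it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = query_cost_usd(BYTES_PER_TB)          # scanning 1 TB of raw files
reduced_scan = query_cost_usd(BYTES_PER_TB // 4)  # hypothetical: only a quarter is read
print(full_scan, reduced_scan)
```

If a compressed or columnar format lets the same query read a quarter of the bytes, the estimated cost drops by the same factor.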


Amazon QuickSight
● Serverless machine learning-powered business intelligence service to create
interactive dashboards
● Fast, automatically scalable, embeddable, with per-session pricing
● Use cases:
○ Business analytics
○ Building visualizations
○ Perform ad-hoc analysis
○ Get business insights using data
● Integrated with RDS, Aurora, Athena, Redshift, S3…
DocumentDB
● Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
● DocumentDB is the same for MongoDB (which is a NoSQL database)
● MongoDB is used to store, query, and index JSON data
● Similar “deployment concepts” as Aurora
● Fully managed, highly available with replication across 3 AZs
● DocumentDB storage automatically grows in increments of 10 GB, up to 64 TB
● Automatically scales to workloads with millions of requests per second
Amazon Neptune
● Fully managed graph database
● A popular graph dataset would be a social network
○ Users have friends
○ Posts have comments
○ Comments have likes from users
○ Users share and like posts…
● Highly available across 3 AZs, with up to 15 read replicas
● Build and run applications working with highly connected datasets, optimized for complex and hard queries
● Can store up to billions of relations and query the graph with millisecond latency
● Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
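The social network example can be modelled as a small graph to show the kind of question a graph database is built for. This sketch uses a plain dictionary as a stand-in and invented user names; it is not Neptune's API, where the same traversal would be written in a graph query language.

```python
# Hypothetical friendship graph: each user maps to the set of their friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, user):
    """Return people two hops away who are not already direct friends."""
    direct = graph.get(user, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    # Remove the user and anyone they already know.
    return candidates - direct - {user}


print(sorted(friends_of_friends(friends, "alice")))
```

This "friends of friends" traversal is the basis of a simple recommendation: it follows relationships rather than joining tables.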
Amazon QLDB
● QLDB stands for “Quantum Ledger Database”
● A ledger is a book recording financial transactions
● Fully managed, serverless, highly available, replication across 3 AZs
● Used to review history of all the changes made to your application data over time
● Immutable system: no entry can be removed or modified, cryptographically verifiable
● 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
● Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
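To give an intuition for "immutable and cryptographically verifiable", the sketch below chains entries together with SHA-256 hashes so that any later change is detectable. It is a generic hash-chain illustration with hypothetical function names and sample data, not a description of how QLDB is implemented.

```python
import hashlib
import json


def entry_hash(previous_hash, data):
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps({"prev": previous_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(journal, data):
    """Add an entry that is linked to the current end of the journal."""
    previous_hash = journal[-1]["hash"] if journal else ""
    journal.append({
        "data": data,
        "prev": previous_hash,
        "hash": entry_hash(previous_hash, data),
    })


def verify(journal):
    """Recompute every hash; any edited or removed entry breaks the chain."""
    previous_hash = ""
    for entry in journal:
        expected = entry_hash(previous_hash, entry["data"])
        if entry["prev"] != previous_hash or entry["hash"] != expected:
            return False
        previous_hash = entry["hash"]
    return True


journal = []
append(journal, {"account": "A", "delta": 100})
append(journal, {"account": "A", "delta": -40})
print(verify(journal))
```

Changing the `delta` of the first entry afterwards makes `verify` return False, because the stored hash no longer matches the recomputed one.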
Amazon Managed Blockchain
● Blockchain makes it possible to build applications where multiple parties can
execute transactions without the need for a trusted, central authority.
● Amazon Managed Blockchain is a managed service to:
○ Join public blockchain networks
○ Or create your own scalable private network
● Compatible with the frameworks Hyperledger Fabric & Ethereum


AWS Glue
● Managed extract, transform, and load (ETL) service
● Useful to prepare and transform data for analytics
● Fully serverless service
● AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.
● AWS Glue can run your extract, transform, and load (ETL) jobs as new data arrives.
● For example, you can configure AWS Glue to initiate your ETL jobs to run as soon as new data becomes available in Amazon Simple Storage Service (S3).
DMS – Database Migration Service
● Quickly and securely migrate databases to AWS; the service is resilient and self-healing
● The source database remains available during the migration
● Supports:
○ Homogeneous migrations: e.g., Oracle to Oracle
○ Heterogeneous migrations: e.g., Microsoft SQL Server to Aurora
Databases & Analytics Summary in AWS
● Relational Databases - OLTP: RDS & Aurora (SQL)
● Differences between Multi-AZ, Read Replicas, Multi-Region
● In-memory Database: ElastiCache
● Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
● Warehouse - OLAP: Redshift (SQL)
● Hadoop Cluster: EMR
● Athena: query data on Amazon S3 (serverless & SQL)
● QuickSight: dashboards on your data (serverless)
● DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
● Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
● Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
● Glue: Managed ETL (Extract Transform Load) and Data Catalog service
● Database Migration: DMS
● Neptune: graph database
