AWS Certified Developer Study Content

The document outlines the AWS Certified Developer Associate Course (DVA-C02), covering various AWS services and concepts such as EC2, IAM, S3, and serverless applications. It provides an overview of AWS's global infrastructure, including regions and availability zones, and emphasizes the importance of security practices like IAM policies and multi-factor authentication. Additionally, it highlights the diverse use cases for AWS across industries and the significance of understanding EC2 for cloud computing.

Table of Contents

• Getting Started with AWS


• AWS Identity & Access Management (AWS IAM)
• Amazon EC2 – Basics
• Amazon EC2 – Instance Storage
• High Availability & Scalability
• RDS, Aurora, & ElastiCache
• Amazon Route 53
• Amazon VPC – Basics
• Amazon S3
• AWS CLI, SDK, IAM Roles & Policies
• Amazon S3 – Advanced
• Amazon S3 – Security
• Amazon CloudFront
• Containers on AWS
• AWS Elastic Beanstalk
• AWS CloudFormation
• AWS Integration & Messaging
• AWS Monitoring, Troubleshooting & Audit
• AWS Lambda
• Amazon DynamoDB
• Amazon API Gateway
• AWS CICD
• AWS Serverless Application Model (SAM)
• AWS Cloud Development Kit (CDK)
• Amazon Cognito
• Other Serverless
• Advanced Identity in AWS
• AWS Security & Encryption
• Other Services
• Exam Preparation
• Congratulations
AWS Certified Developer Associate Course (DVA-C02)
What’s AWS?
• AWS (Amazon Web Services) is a Cloud Provider
• They provide you with servers and services that you can use on
demand and scale easily

• AWS has revolutionized IT over time


• AWS powers some of the biggest websites in the world
• Amazon.com
• Netflix
What we’ll learn in this course

Amazon EC2, Amazon ECR, Amazon ECS, AWS Elastic Beanstalk, AWS Lambda, Elastic Load Balancing, Amazon CloudFront, Amazon Kinesis, Amazon Route 53, Amazon S3,
Amazon RDS, Amazon Aurora, Amazon DynamoDB, Amazon ElastiCache, Amazon SQS, Amazon SNS, AWS Step Functions, Auto Scaling, Amazon API Gateway, Amazon SES, Amazon Cognito,
IAM, Amazon CloudWatch, Amazon EC2 Systems Manager, AWS CloudFormation, AWS CloudTrail, AWS CodeCommit, AWS CodeBuild, AWS CodeDeploy, AWS CodePipeline, AWS X-Ray, AWS KMS
Navigating the AWS spaghetti bowl
Getting started with AWS
AWS Cloud History
• 2002: Internally launched
• 2003: Amazon infrastructure is one of their core strengths; idea to market
• 2004: Launched publicly with SQS
• 2006: Re-launched publicly with SQS, S3 & EC2
• 2007: Launched in Europe
AWS Cloud Number Facts
• In 2019, AWS had $35.02
billion in annual revenue
• AWS accounts for 47% of the
market in 2019 (Microsoft is
2nd with 22%)
• Pioneer and Leader of the
AWS Cloud Market for the
9th consecutive year
• Over 1,000,000 active users

Gartner Magic Quadrant


AWS Cloud Use Cases
• AWS enables you to build sophisticated, scalable applications
• Applicable to a diverse set of industries
• Use cases include
• Enterprise IT, Backup & Storage, Big Data analytics
• Website hosting, Mobile & Social Apps
• Gaming
AWS Global Infrastructure
• AWS Regions
• AWS Availability Zones
• AWS Data Centers
• AWS Edge Locations /
Points of Presence

• https://infrastructure.aws/
AWS Regions
• AWS has Regions all around the world
• Names can be us-east-1, eu-west-3…
• A region is a cluster of data centers
• Most AWS services are region-scoped

https://aws.amazon.com/about-aws/global-infrastructure/
How to choose an AWS Region?

If you need to launch a new application, where should you do it?
• Compliance with data governance and legal requirements: data never leaves a region without your explicit permission
• Proximity to customers: reduced latency
• Available services within a Region: new services and new features aren't available in every Region
• Pricing: pricing varies region to region and is transparent in the service pricing page
AWS Availability Zones
• Each region has many availability zones (usually 3, min is 3, max is 6). Example for the Sydney Region (ap-southeast-2):
  • ap-southeast-2a
  • ap-southeast-2b
  • ap-southeast-2c
• Each availability zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity
• They're separate from each other, so that they're isolated from disasters
• They're connected with high bandwidth, ultra-low latency networking
AWS Points of Presence (Edge Locations)
• Amazon has 400+ Points of Presence (400+ Edge Locations & 10+
Regional Caches) in 90+ cities across 40+ countries
• Content is delivered to end users with lower latency

https://aws.amazon.com/cloudfront/features/
Tour of the AWS Console
• AWS has Global Services:
• Identity and Access Management (IAM)
• Route 53 (DNS service)
• CloudFront (Content Delivery Network)
• WAF (Web Application Firewall)
• Most AWS services are Region-scoped:
• Amazon EC2 (Infrastructure as a Service)
• Elastic Beanstalk (Platform as a Service)
• Lambda (Function as a Service)
• Rekognition (Software as a Service)
• Region Table: https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services
AWS Identity & Access
Management (AWS IAM)
IAM: Users & Groups
• IAM = Identity and Access Management, Global service
• Root account created by default, shouldn’t be used or shared
• Users are people within your organization, and can be grouped
• Groups only contain users, not other groups
• Users don't have to belong to a group, and a user can belong to multiple groups
(Diagram: six users (Alice, Bob, Charles, David, Edward, Fred) organized into the Developers, Operations, and Audit Team groups; some users belong to more than one group and some to none)
IAM: Permissions
• Users or Groups can be assigned JSON documents called policies
• These policies define the permissions of the users
• In AWS you apply the least privilege principle: don't give more permissions than a user needs

Example policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:Describe*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "elasticloadbalancing:Describe*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:Describe*"
      ],
      "Resource": "*"
    }
  ]
}
IAM Policies inheritance
(Diagram: users inherit the policies attached to every group they belong to; a policy can also be attached inline to an individual user)
IAM Policies Structure
• Consists of
  • Version: policy language version, always include "2012-10-17"
  • Id: an identifier for the policy (optional)
  • Statement: one or more individual statements (required)
• A statement consists of
  • Sid: an identifier for the statement (optional)
  • Effect: whether the statement allows or denies access (Allow, Deny)
  • Principal: account/user/role to which this policy applies
  • Action: list of actions this policy allows or denies
  • Resource: list of resources to which the actions apply
  • Condition: conditions for when this policy is in effect (optional)
IAM – Password Policy
• Strong passwords = higher security for your account
• In AWS, you can setup a password policy:
• Set a minimum password length
• Require specific character types:
• including uppercase letters
• lowercase letters
• numbers
• non-alphanumeric characters
• Allow all IAM users to change their own passwords
• Require users to change their password after some time (password expiration)
• Prevent password re-use
Multi Factor Authentication - MFA
• Users have access to your account and can possibly change
configurations or delete resources in your AWS account
• You want to protect your Root Accounts and IAM users
• MFA = password you know + security device you own

(Diagram: Alice logs in with her password plus her MFA security device => successful login)
• Main benefit of MFA: if a password is stolen or hacked, the account is not compromised
MFA devices options in AWS
• Virtual MFA device
  • Google Authenticator (phone only), Authy (multi-device)
  • Support for multiple tokens on a single device
• Universal 2nd Factor (U2F) Security Key
  • YubiKey by Yubico (3rd party)
  • Support for multiple root and IAM users using a single security key
• Hardware Key Fob MFA Device
  • Provided by Gemalto (3rd party)
• Hardware Key Fob MFA Device for AWS GovCloud (US)
  • Provided by SurePassID (3rd party)
How can users access AWS ?
• To access AWS, you have three options:
• AWS Management Console (protected by password + MFA)
• AWS Command Line Interface (CLI): protected by access keys
• AWS Software Developer Kit (SDK) - for code: protected by access keys
• Access Keys are generated through the AWS Console
• Users manage their own access keys
• Access Keys are secret, just like a password. Don’t share them
• Access Key ID ~= username
• Secret Access Key ~= password
Example (Fake) Access Keys

• Access key ID: AKIASK4E37PV4983d6C


• Secret Access Key: AZPN3zojWozWCndIjhB0Unh8239a1bzbzO5fqqkZq
• Remember: don't share your access keys
What’s the AWS CLI?
• A tool that enables you to interact with AWS services using commands in
your command-line shell
• Direct access to the public APIs of AWS services
• You can develop scripts to manage your resources
• It’s open-source https://github.com/aws/aws-cli
• Alternative to using AWS Management Console
What’s the AWS SDK?
• AWS Software Development Kit (AWS SDK)
• Language-specific APIs (set of libraries)
• Enables you to access and manage AWS services programmatically
• Embedded within your application
• Supports
  • SDKs (JavaScript, Python, PHP, .NET, Ruby, Java, Go, Node.js, C++)
  • Mobile SDKs (Android, iOS, …)
  • IoT Device SDKs (Embedded C, Arduino, …)
• Example: AWS CLI is built on AWS SDK for Python
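For illustration, a minimal sketch with the AWS SDK for Python (boto3), assuming credentials are already configured (for example via access keys); the output simply reflects whatever exists in your account:

import boto3  # AWS SDK for Python

# Region-scoped client; credentials come from the environment or ~/.aws/credentials
s3 = boto3.client("s3", region_name="us-east-1")

# List the S3 buckets in the account
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])

# Describe the EC2 regions available to the account
ec2 = boto3.client("ec2", region_name="us-east-1")
print([r["RegionName"] for r in ec2.describe_regions()["Regions"]])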
IAM Roles for Services
• Some AWS services will need to perform actions on your behalf
• To do so, we will assign permissions to AWS services with IAM Roles
• Common roles:
  • EC2 Instance Roles
  • Lambda Function Roles
  • Roles for CloudFormation
(Diagram: an EC2 instance (virtual server) assumes an IAM Role to access AWS)
IAM Security Tools
• IAM Credentials Report (account-level)
  • a report that lists all your account's users and the status of their various credentials
• IAM Access Advisor (user-level)
  • Access Advisor shows the service permissions granted to a user and when those services were last accessed
  • You can use this information to revise your policies
IAM Guidelines & Best Practices
• Don’t use the root account except for AWS account setup
• One physical user = One AWS user
• Assign users to groups and assign permissions to groups
• Create a strong password policy
• Use and enforce the use of Multi Factor Authentication (MFA)
• Create and use Roles for giving permissions to AWS services
• Use Access Keys for Programmatic Access (CLI / SDK)
• Audit permissions of your account using IAM Credentials Report & IAM
Access Advisor
• Never share IAM users & Access Keys
Shared Responsibility Model for IAM
AWS:
• Infrastructure (global network security)
• Configuration and vulnerability analysis
• Compliance validation

You:
• Users, Groups, Roles, Policies management and monitoring
• Enable MFA on all accounts
• Rotate all your keys often
• Use IAM tools to apply appropriate permissions
• Analyze access patterns & review permissions
IAM Section – Summary
• Users: mapped to a physical user, has a password for AWS Console
• Groups: contains users only
• Policies: JSON document that outlines permissions for users or groups
• Roles: for EC2 instances or AWS services
• Security: MFA + Password Policy
• AWS CLI: manage your AWS services using the command-line
• AWS SDK: manage your AWS services using a programming language
• Access Keys: access AWS using the CLI or SDK
• Audit: IAM Credential Reports & IAM Access Advisor
Amazon EC2 – Basics
Amazon EC2
• EC2 is one of the most popular AWS offerings
• EC2 = Elastic Compute Cloud = Infrastructure as a Service
• It mainly consists of the capability of:
• Renting virtual machines (EC2)
• Storing data on virtual drives (EBS)
• Distributing load across machines (ELB)
• Scaling the services using an auto-scaling group (ASG)

• Knowing EC2 is fundamental to understand how the Cloud works


EC2 sizing & configuration options
• Operating System (OS): Linux, Windows or Mac OS
• How much compute power & cores (CPU)
• How much random-access memory (RAM)
• How much storage space:
• Network-attached (EBS & EFS)
• hardware (EC2 Instance Store)
• Network card: speed of the card, Public IP address
• Firewall rules: security group
• Bootstrap script (configure at first launch): EC2 User Data
EC2 User Data
• It is possible to bootstrap our instances using an EC2 User Data script
• Bootstrapping means launching commands when a machine starts
• That script is only run once, at the instance's first start
• EC2 user data is used to automate boot tasks such as:
• Installing updates
• Installing software
• Downloading common files from the internet
• Anything you can think of
• The EC2 User Data Script runs with the root user
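As a hedged sketch only (the course launches instances through the console), the same User Data idea with boto3; the AMI ID, key pair name, and security group ID below are hypothetical placeholders:

import boto3

user_data = """#!/bin/bash
# EC2 User Data: runs once as root at first boot
yum update -y
yum install -y httpd
systemctl enable --now httpd
echo "Hello from $(hostname -f)" > /var/www/html/index.html
"""

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # hypothetical Amazon Linux AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                      # hypothetical key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],  # hypothetical security group
    UserData=user_data,                         # boto3 base64-encodes this for you
)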
Hands-On:
Launching an EC2 Instance running Linux
• We’ll be launching our first virtual server using the AWS Console
• We’ll get a first high-level approach to the various parameters
• We’ll see that our web server is launched using EC2 user data
• We’ll learn how to start / stop / terminate our instance.
EC2 Instance Types - Overview
• You can use different types of EC2 instances that are optimised for
different use cases (https://aws.amazon.com/ec2/instance-types/)
• AWS has the following naming convention:

m5.2xlarge

• m: instance class
• 5: generation (AWS improves them over time)
• 2xlarge: size within the instance class
EC2 Instance Types – General Purpose
• Great for a diversity of workloads such as web servers or code repositories
• Balance between:
• Compute
• Memory
• Networking
• In the course, we will be using the t2.micro which is a General Purpose EC2
instance

* this list will evolve over time, please check the AWS website for the latest information
EC2 Instance Types – Compute Optimized
• Great for compute-intensive tasks that require high performance
processors:
• Batch processing workloads
• Media transcoding
• High performance web servers
• High performance computing (HPC)
• Scientific modeling & machine learning
• Dedicated gaming servers

* this list will evolve over time, please check the AWS website for the latest information
EC2 Instance Types – Memory Optimized
• Fast performance for workloads that process large data sets in memory
• Use cases:
• High performance, relational/non-relational databases
• Distributed web scale cache stores
• In-memory databases optimized for BI (business intelligence)
• Applications performing real-time processing of big unstructured data

* this list will evolve over time, please check the AWS website for the latest information
EC2 Instance Types – Storage Optimized
• Great for storage-intensive tasks that require high, sequential read and write
access to large data sets on local storage
• Use cases:
• High frequency online transaction processing (OLTP) systems
• Relational & NoSQL databases
• Cache for in-memory databases (for example, Redis)
• Data warehousing applications
• Distributed file systems

* this list will evolve over time, please check the AWS website for the latest information
EC2 Instance Types: example

Instance | vCPU | Mem (GiB) | Storage | Network Performance | EBS Bandwidth (Mbps)
t2.micro | 1 | 1 | EBS-Only | Low to Moderate | –
t2.xlarge | 4 | 16 | EBS-Only | Moderate | –
c5d.4xlarge | 16 | 32 | 1 x 400 NVMe SSD | Up to 10 Gbps | 4,750
r5.16xlarge | 64 | 512 | EBS-Only | 20 Gbps | 13,600
m5.8xlarge | 32 | 128 | EBS-Only | 10 Gbps | 6,800

t2.micro is part of the AWS free tier (up to 750 hours per month)

Great website: https://instances.vantage.sh


Introduction to Security Groups
• Security Groups are the fundamentals of network security in AWS
• They control how traffic is allowed into or out of our EC2 Instances
(Diagram: the security group filters inbound traffic from the internet to the EC2 instance and outbound traffic from the instance)
• Security groups only contain allow rules
• Security group rules can reference by IP or by security group
Security Groups
Deeper Dive
• Security groups are acting as a “firewall” on EC2 instances
• They regulate:
• Access to Ports
• Authorised IP ranges – IPv4 and IPv6
• Control of inbound network (from other to the instance)
• Control of outbound network (from the instance to other)
Security Groups
Diagram
(Diagram: Security Group 1 filters inbound traffic by IP and port; your computer (IP XX.XX.XX.XX) is authorised on port 22 while another computer is not; outbound, the EC2 instance can reach any IP on any port)
Security Groups
Good to know
• Can be attached to multiple instances
• Locked down to a region / VPC combination
• It lives "outside" the EC2 instance – if traffic is blocked, the EC2 instance won't see it
• It’s good to maintain one separate security group for SSH access
• If your application is not accessible (time out), then it’s a security group issue
• If your application gives a “connection refused“ error, then it’s an application
error or it’s not launched
• All inbound traffic is blocked by default
• All outbound traffic is authorised by default
Referencing other security groups
Diagram
(Diagram: an EC2 instance with Security Group 1 attached has inbound rules authorising Security Group 1 and Security Group 2 on port 123; instances with Security Group 1 or Security Group 2 attached can reach it, while an instance with Security Group 3 attached cannot)
Classic Ports to know
• 22 = SSH (Secure Shell) - log into a Linux instance
• 21 = FTP (File Transfer Protocol) – upload files into a file share
• 22 = SFTP (Secure File Transfer Protocol) – upload files using SSH
• 80 = HTTP – access unsecured websites
• 443 = HTTPS – access secured websites
• 3389 = RDP (Remote Desktop Protocol) – log into a Windows instance
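A hedged boto3 sketch of adding inbound rules for some of these ports to a security group; the group ID and the admin IP are hypothetical placeholders:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # hypothetical security group ID
    IpPermissions=[
        # SSH (22) from a single admin IP only (good practice: keep a separate SG for SSH)
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.10/32", "Description": "admin SSH"}]},
        # HTTP (80) and HTTPS (443) from anywhere
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
)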
SSH Summary Table

OS            | SSH | Putty | EC2 Instance Connect
Mac           | ✓   |       | ✓
Linux         | ✓   |       | ✓
Windows < 10  |     | ✓     | ✓
Windows >= 10 | ✓   | ✓     | ✓
Which Lectures to watch
• Mac / Linux:
• SSH on Mac/Linux lecture

• Windows:
• Putty Lecture
• If Windows 10: SSH on Windows 10 lecture

• All:
• EC2 Instance Connect lecture
SSH troubleshooting
• Students have the most problems with SSH

• If things don’t work…


1. Re-watch the lecture. You may have missed something
2. Read the troubleshooting guide
3. Try EC2 Instance Connect

• If one method works (SSH, Putty or EC2 Instance Connect) you’re


good
• If no method works, that’s okay, the course won’t use SSH much
How to SSH into your EC2 Instance
Linux / Mac OS X
• We'll learn how to SSH into your EC2 instance using Linux / Mac
• SSH is one of the most important functions. It allows you to control a remote machine, all using the command line
(Diagram: your machine connects over the public internet to the EC2 instance's public IP on port 22, SSH)
• We will see how we can configure OpenSSH ~/.ssh/config to facilitate SSHing into our EC2 instances
How to SSH into your EC2 Instance
Windows
• We'll learn how to SSH into your EC2 instance using Windows
• SSH is one of the most important functions. It allows you to control a remote machine, all using the command line
(Diagram: your machine connects over the public internet to the EC2 instance's public IP on port 22, SSH)
• We will configure all the required parameters necessary for doing SSH on Windows using the free tool Putty
EC2 Instance Connect
• Connect to your EC2 instance within your browser
• No need to use your key file that was downloaded
• The “magic” is that a temporary key is uploaded onto EC2 by AWS

• Works only out-of-the-box with Amazon Linux 2

• Need to make sure the port 22 is still opened!


EC2 Instances Purchasing Options
• On-Demand Instances – short workload, predictable pricing, pay by second
• Reserved (1 & 3 years)
• Reserved Instances – long workloads
• Convertible Reserved Instances – long workloads with flexible instances
• Savings Plans (1 & 3 years) – commitment to an amount of usage, long workload
• Spot Instances – short workloads, cheap, can lose instances (less reliable)
• Dedicated Hosts – book an entire physical server, control instance placement
• Dedicated Instances – no other customers will share your hardware
• Capacity Reservations – reserve capacity in a specific AZ for any duration
EC2 On Demand
• Pay for what you use:
• Linux or Windows - billing per second, after the first minute
• All other operating systems - billing per hour
• Has the highest cost but no upfront payment
• No long-term commitment

• Recommended for short-term and un-interrupted workloads, where you can't predict how the application will behave
EC2 Reserved Instances
• Up to 72% discount compared to On-demand
• You reserve a specific instance attributes (Instance Type, Region, Tenancy,
OS)
• Reservation Period – 1 year (+discount) or 3 years (+++discount)
• Payment Options – No Upfront (+), Partial Upfront (++), All Upfront
(+++)
• Reserved Instance’s Scope – Regional or Zonal (reserve capacity in an AZ)
• Recommended for steady-state usage applications (think database)
• You can buy and sell in the Reserved Instance Marketplace

• Convertible Reserved Instance
  • Can change the EC2 instance type, instance family, OS, scope and tenancy
  • Up to 66% discount

Note: the % discounts are different from the video as AWS change them over time – the exact numbers are not needed for the exam. This is just for illustrative purposes :)
EC2 Savings Plans
• Get a discount based on long-term usage (up to 72% - same as RIs)
• Commit to a certain type of usage ($10/hour for 1 or 3 years)
• Usage beyond EC2 Savings Plans is billed at the On-Demand price

• Locked to a specific instance family & AWS region (e.g., M5 in us-east-1)


• Flexible across:
• Instance Size (e.g., m5.xlarge, m5.2xlarge)
• OS (e.g., Linux, Windows)
• Tenancy (Host, Dedicated, Default)
EC2 Spot Instances
• Can get a discount of up to 90% compared to On-demand
• Instances that you can “lose” at any point of time if your max price is less than the
current spot price
• The MOST cost-efficient instances in AWS

• Useful for workloads that are resilient to failure


• Batch jobs
• Data analysis
• Image processing
• Any distributed workloads
• Workloads with a flexible start and end time

• Not suitable for critical jobs or databases


EC2 Dedicated Hosts
• A physical server with EC2 instance capacity fully dedicated to your use
• Allows you to address compliance requirements and use your existing server-bound software licenses (per-socket, per-core, per-VM software licenses)
• Purchasing Options:
• On-demand – pay per second for active Dedicated Host
• Reserved - 1 or 3 years (No Upfront, Partial Upfront, All Upfront)
• The most expensive option

• Useful for software that has a complicated licensing model (BYOL – Bring Your Own License)
• Or for companies that have strong regulatory or compliance needs
EC2 Dedicated Instances
• Instances run on hardware that’s
dedicated to you

• May share hardware with other


instances in same account

• No control over instance placement


(can move hardware after Stop / Start)
EC2 Capacity Reservations
• Reserve On-Demand instances capacity in a specific AZ for any
duration
• You always have access to EC2 capacity when you need it
• No time commitment (create/cancel anytime), no billing discounts
• Combine with Regional Reserved Instances and Savings Plans to benefit
from billing discounts
• You’re charged at On-Demand rate whether you run instances or not

• Suitable for short-term, uninterrupted workloads that need to be in a specific AZ
Which purchasing option is right for me?
• On demand: coming and staying in resort
whenever we like, we pay the full price
• Reserved: like planning ahead and if we plan to
stay for a long time, we may get a good discount.
• Savings Plans: pay a certain amount per hour for
certain period and stay in any room type (e.g.,
King, Suite, Sea View, …)
• Spot instances: the hotel allows people to bid for
the empty rooms and the highest bidder keeps the
rooms. You can get kicked out at any time
• Dedicated Hosts: We book an entire building of
the resort
• Capacity Reservations: you book a room for a period at full price even if you don't stay in it
Price Comparison
Example – m4.large – us-east-1
Price Type | Price (per hour)
On-Demand | $0.10
Spot Instance (Spot Price) | $0.038 - $0.039 (up to 61% off)
Reserved Instance (1 year) | $0.062 (No Upfront) - $0.058 (All Upfront)
Reserved Instance (3 years) | $0.043 (No Upfront) - $0.037 (All Upfront)
EC2 Savings Plan (1 year) | $0.062 (No Upfront) - $0.058 (All Upfront)
Reserved Convertible Instance (1 year) | $0.071 (No Upfront) - $0.066 (All Upfront)
Dedicated Host | On-Demand Price
Dedicated Host Reservation | Up to 70% off
Capacity Reservations | On-Demand Price
Amazon EC2 – Instance Storage
What’s an EBS Volume?
• An EBS (Elastic Block Store) Volume is a network drive you can attach
to your instances while they run
• It allows your instances to persist data, even after their termination
• They can only be mounted to one instance at a time (at the CCP
level)
• They are bound to a specific availability zone

• Analogy: Think of them as a “network USB stick”


• Free tier: 30 GB of free EBS storage of type General Purpose (SSD) or
Magnetic per month
EBS Volume
• It's a network drive (i.e. not a physical drive)
• It uses the network to communicate with the instance, which means there might be a bit of latency
• It can be detached from an EC2 instance and attached to another one quickly

• It’s locked to an Availability Zone (AZ)


• An EBS Volume in us-east-1a cannot be attached to us-east-1b
• To move a volume across, you first need to snapshot it

• Have a provisioned capacity (size in GBs, and IOPS)


• You get billed for all the provisioned capacity
• You can increase the capacity of the drive over time
EBS Volume - Example
(Diagram: in US-EAST-1A, EBS volumes of 10 GB, 100 GB, and 50 GB are attached to instances and a 50 GB volume is unattached; a 10 GB volume lives in US-EAST-1B – volumes stay within their AZ)
EBS – Delete on Termination attribute

• Controls the EBS behaviour when an EC2 instance terminates


• By default, the root EBS volume is deleted (attribute enabled)
• By default, any other attached EBS volume is not deleted (attribute disabled)
• This can be controlled by the AWS console / AWS CLI
• Use case: preserve root volume when instance is terminated
EBS Snapshots
• Make a backup (snapshot) of your EBS volume at a point in time
• Not necessary to detach volume to do snapshot, but recommended
• Can copy snapshots across AZ or Region

(Diagram: a 50 GB EBS volume in US-EAST-1A is snapshotted, and the snapshot is restored as a new 50 GB volume in US-EAST-1B)
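A minimal boto3 sketch of this snapshot / copy workflow, with a placeholder volume ID and illustrative regions:

import boto3

ec2_use1 = boto3.client("ec2", region_name="us-east-1")

# Snapshot a volume in its source AZ
snap = ec2_use1.create_snapshot(
    VolumeId="vol-0123456789abcdef0",   # hypothetical volume ID
    Description="backup before migration",
)

# Copy the snapshot to another region (for cross-AZ, you can restore it directly)
ec2_euw1 = boto3.client("ec2", region_name="eu-west-1")
ec2_euw1.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
    Description="cross-region copy",
)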
EBS Snapshots Features
• EBS Snapshot Archive
  • Move a Snapshot to an "archive tier" that is 75% cheaper
  • Takes within 24 to 72 hours for restoring the archive
• Recycle Bin for EBS Snapshots
  • Setup rules to retain deleted snapshots so you can recover them after an accidental deletion
  • Specify retention (from 1 day to 1 year)
• Fast Snapshot Restore (FSR)
  • Force full initialization of snapshot to have no latency on the first use ($$$)
AMI Overview
• AMI = Amazon Machine Image
• AMI are a customization of an EC2 instance
• You add your own software, configuration, operating system, monitoring…
• Faster boot / configuration time because all your software is pre-packaged
• AMI are built for a specific region (and can be copied across regions)
• You can launch EC2 instances from:
• A Public AMI: AWS provided
• Your own AMI: you make and maintain them yourself
• An AWS Marketplace AMI: an AMI someone else made (and potentially sells)
AMI Process (from an EC2 instance)
• Start an EC2 instance and customize it
• Stop the instance (for data integrity)
• Build an AMI – this will also create EBS snapshots
• Launch instances from other AMIs

Custom AMI
US-EAST-1A US-EAST-1B
Launch
Create AMI from AMI
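A hedged boto3 sketch of the same process, with a placeholder instance ID:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an AMI from a customized (ideally stopped) instance;
# this also creates EBS snapshots behind the scenes
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",   # hypothetical instance ID
    Name="my-web-server-ami-v1",
)

# Later, once the AMI is available, launch new instances from it
ec2.run_instances(
    ImageId=image["ImageId"],
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)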
EC2 Instance Store
• EBS volumes are network drives with good but “limited” performance
• If you need a high-performance hardware disk, use EC2 Instance
Store

• Better I/O performance


• EC2 Instance Store lose their storage if they’re stopped (ephemeral)
• Good for buffer / cache / scratch data / temporary content
• Risk of data loss if hardware fails
• Backups and Replication are your responsibility
• Local EC2 Instance Store = very high IOPS
EBS Volume Types
• EBS Volumes come in 6 types
• gp2 / gp3 (SSD): General purpose SSD volume that balances price and performance for
a wide variety of workloads
• io1 / io2 (SSD): Highest-performance SSD volume for mission-critical low-latency or
high-throughput workloads
• st1 (HDD): Low cost HDD volume designed for frequently accessed, throughput-
intensive workloads
• sc1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads

• EBS Volumes are characterized in Size | Throughput | IOPS (I/O Ops Per Sec)
• When in doubt always consult the AWS documentation – it’s good!
• Only gp2/gp3 and io1/io2 can be used as boot volumes
EBS Volume Types Use cases
General Purpose SSD
• Cost effective storage, low-latency
• System boot volumes, Virtual desktops, Development and test environments
• 1 GiB - 16 TiB
• gp3:
• Baseline of 3,000 IOPS and throughput of 125 MiB/s
• Can increase IOPS up to 16,000 and throughput up to 1000 MiB/s independently
• gp2:
• Small gp2 volumes can burst IOPS to 3,000
• Size of the volume and IOPS are linked, max IOPS is 16,000
• 3 IOPS per GB, means at 5,334 GB we are at the max IOPS
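For example, a boto3 sketch creating a gp3 volume where IOPS and throughput are provisioned independently of size (values are illustrative):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.create_volume(
    AvailabilityZone="us-east-1a",  # EBS volumes are bound to an AZ
    Size=100,                       # GiB
    VolumeType="gp3",
    Iops=6000,                      # above the 3,000 IOPS baseline, independent of size
    Throughput=500,                 # MiB/s, above the 125 MiB/s baseline
)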
EBS Volume Types Use cases
Provisioned IOPS (PIOPS) SSD
• Critical business applications with sustained IOPS performance
• Or applications that need more than 16,000 IOPS
• Great for databases workloads (sensitive to storage perf and consistency)
• io1/io2 (4 GiB - 16 TiB):
• Max PIOPS: 64,000 for Nitro EC2 instances & 32,000 for other
• Can increase PIOPS independently from storage size
• io2 have more durability and more IOPS per GiB (at the same price as io1)
• io2 Block Express (4 GiB – 64 TiB):
• Sub-millisecond latency
• Max PIOPS: 256,000 with an IOPS:GiB ratio of 1,000:1
• Supports EBS Multi-attach
EBS Volume Types Use cases
Hard Disk Drives (HDD)
• Cannot be a boot volume
• 125 GiB to 16 TiB
• Throughput Optimized HDD (st1)
• Big Data, Data Warehouses, Log Processing
• Max throughput 500 MiB/s – max IOPS 500
• Cold HDD (sc1):
• For data that is infrequently accessed
• Scenarios where lowest cost is important
• Max throughput 250 MiB/s – max IOPS 250
EBS – Volume Types Summary

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html#solid-state-drives
EBS Multi-Attach – io1/io2 family
• Attach the same EBS volume to multiple EC2 instances in the same AZ
• Each instance has full read & write permissions to the high-performance volume
• Use case:
  • Achieve higher application availability in clustered Linux applications (ex: Teradata)
  • Applications must manage concurrent write operations
• Up to 16 EC2 Instances at a time
• Must use a file system that's cluster-aware (not XFS, EXT4, etc…)
(Diagram: an io2 volume with Multi-Attach shared by several EC2 instances in Availability Zone 1)
EFS – Elastic File System
• Managed NFS (network file system) that can be mounted on many EC2
• EFS works with EC2 instances in multi-AZ
• Highly available, scalable, expensive (3x gp2), pay per use
(Diagram: EC2 instances in us-east-1a, us-east-1b, and us-east-1c all mount the same EFS file system, with access controlled by a security group)
EFS – Elastic File System
• Use cases: content management, web serving, data sharing, Wordpress
• Uses NFSv4.1 protocol
• Uses security group to control access to EFS
• Compatible with Linux based AMI (not Windows)
• Encryption at rest using KMS

• POSIX file system (~Linux) that has a standard file API


• File system scales automatically, pay-per-use, no capacity planning!
EFS – Performance & Storage Classes
• EFS Scale
• 1000s of concurrent NFS clients, 10 GB+ /s throughput
• Grow to Petabyte-scale network file system, automatically
• Performance Mode (set at EFS creation time)
• General Purpose (default) – latency-sensitive use cases (web server, CMS, etc…)
• Max I/O – higher latency, throughput, highly parallel (big data, media processing)
• Throughput Mode
• Bursting – 1 TB = 50MiB/s + burst of up to 100MiB/s
• Provisioned – set your throughput regardless of storage size, ex: 1 GiB/s for 1 TB storage
• Elastic – automatically scales throughput up or down based on your workloads
• Up to 3GiB/s for reads and 1GiB/s for writes
• Used for unpredictable workloads
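A minimal boto3 sketch creating a file system with these modes (the tag value is illustrative):

import boto3

efs = boto3.client("efs", region_name="us-east-1")
efs.create_file_system(
    PerformanceMode="generalPurpose",  # or "maxIO"; set at creation time only
    ThroughputMode="elastic",          # or "bursting" / "provisioned"
    Encrypted=True,
    Tags=[{"Key": "Name", "Value": "my-shared-fs"}],
)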
EFS – Storage Classes
• Storage Tiers (lifecycle management feature – move file after N days)
  • Standard: for frequently accessed files
  • Infrequent access (EFS-IA): cost to retrieve files, lower price to store. Enable EFS-IA with a Lifecycle Policy (ex: move files with no access for 60 days)
• Availability and durability
  • Standard: Multi-AZ, great for prod
  • One Zone: One AZ, great for dev, backup enabled by default, compatible with IA (EFS One Zone-IA)
• Over 90% in cost savings
(Diagram: within an Amazon EFS file system, a Lifecycle Policy moves files with no access for 60 days from EFS Standard to EFS IA)
EBS vs EFS – Elastic Block Storage
• EBS volumes…
  • can be attached to one instance at a time (except multi-attach io1/io2)
  • are locked at the Availability Zone (AZ) level
  • gp2: IO increases if the disk size increases
  • io1: can increase IO independently
• To migrate an EBS volume across AZ
  • Take a snapshot
  • Restore the snapshot to another AZ
  • EBS backups use IO and you shouldn't run them while your application is handling a lot of traffic
• Root EBS Volumes of instances get terminated by default if the EC2 instance gets terminated (you can disable that)
EBS vs EFS – Elastic File System
• Mounting 100s of instances across AZ
• EFS can share website files (WordPress)
• Only for Linux Instances (POSIX)
• EFS has a higher price point than EBS
• Can leverage EFS-IA for cost savings
• Remember: EFS vs EBS vs Instance Store
(Diagram: Linux instances in two Availability Zones mount the same EFS file system through EFS mount targets)
High Availability & Scalability
Scalability & High Availability
• Scalability means that an application / system can handle greater loads
by adapting.
• There are two kinds of scalability:
• Vertical Scalability
• Horizontal Scalability (= elasticity)
• Scalability is linked but different to High Availability

• Let’s deep dive into the distinction, using a call center as an example
Vertical Scalability
• Vertically scalability means increasing the size
of the instance
• For example, your application runs on a
t2.micro
• Scaling that application vertically means
running it on a t2.large
• Vertical scalability is very common for non
distributed systems, such as a database.
• RDS, ElastiCache are services that can scale
vertically.
• There’s usually a limit to how much you can
vertically scale (hardware limit)
(Analogy: vertical scaling means replacing a junior call-center operator with a senior operator)
Horizontal Scalability

• Horizontal Scalability means increasing the


number of instances / systems for your
application

• Horizontal scaling implies distributed systems.


• This is very common for web applications /
modern applications

• It's easy to horizontally scale thanks to cloud offerings such as Amazon EC2
(Analogy: horizontal scaling means adding more operators)
High Availability

• High Availability usually goes hand in


hand with horizontal scaling
• High availability means running your
application / system in at least 2 data
centers (== Availability Zones)
• The goal of high availability is to survive
a data center loss
(Analogy: high availability means running a first building in New York and a second building in San Francisco)

• The high availability can be passive (for


RDS Multi AZ for example)
• The high availability can be active (for
horizontal scaling)
High Availability & Scalability For EC2
• Vertical Scaling: Increase instance size (= scale up / down)
• From: t2.nano - 0.5G of RAM, 1 vCPU
• To: u-12tb1.metal – 12.3 TB of RAM, 448 vCPUs

• Horizontal Scaling: Increase number of instances (= scale out / in)


• Auto Scaling Group
• Load Balancer

• High Availability: Run instances for the same application across multi AZ
• Auto Scaling Group multi AZ
• Load Balancer multi AZ
What is load balancing?
• Load Balancers are servers that forward traffic to multiple servers (e.g., EC2 instances) downstream

Elastic Load Balancer


EC2 Instance

EC2 Instance

EC2 Instance
Why use a load balancer?
• Spread load across multiple downstream instances
• Expose a single point of access (DNS) to your application
• Seamlessly handle failures of downstream instances
• Do regular health checks to your instances
• Provide SSL termination (HTTPS) for your websites
• Enforce stickiness with cookies
• High availability across zones
• Separate public traffic from private traffic
Why use an Elastic Load Balancer?
• An Elastic Load Balancer is a managed load balancer
• AWS guarantees that it will be working
• AWS takes care of upgrades, maintenance, high availability
• AWS provides only a few configuration knobs

• It costs less to setup your own load balancer but it will be a lot more effort
on your end

• It is integrated with many AWS offerings / services


• EC2, EC2 Auto Scaling Groups, Amazon ECS
• AWS Certificate Manager (ACM), CloudWatch
• Route 53, AWS WAF, AWS Global Accelerator
Health Checks
• Health Checks are crucial for Load Balancers
• They enable the load balancer to know if instances it forwards traffic to
are available to reply to requests
• The health check is done on a port and a route (/health is common)
• If the response is not 200 (OK), then the instance is unhealthy
(Diagram: the Elastic Load Balancer health-checks the EC2 instance with Protocol: HTTP, Port: 4567, Endpoint: /health)
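A hedged boto3 sketch of a target group configured with such a health check; the VPC ID is a hypothetical placeholder:

import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")
elbv2.create_target_group(
    Name="my-app-targets",
    Protocol="HTTP",
    Port=4567,
    VpcId="vpc-0123456789abcdef0",   # hypothetical VPC ID
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/health",
    HealthCheckIntervalSeconds=30,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=2,
    Matcher={"HttpCode": "200"},     # only 200 (OK) counts as healthy
)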


Types of load balancer on AWS
• AWS has 4 kinds of managed Load Balancers
• Classic Load Balancer (v1 - old generation) – 2009 – CLB
• HTTP, HTTPS, TCP, SSL (secure TCP)
• Application Load Balancer (v2 - new generation) – 2016 – ALB
• HTTP, HTTPS, WebSocket
• Network Load Balancer (v2 - new generation) – 2017 – NLB
• TCP, TLS (secure TCP), UDP
• Gateway Load Balancer – 2020 – GWLB
• Operates at layer 3 (Network layer) – IP Protocol

• Overall, it is recommended to use the newer generation load balancers as they


provide more features
• Some load balancers can be setup as internal (private) or external (public) ELBs
Load Balancer Security Groups
(Diagram: users reach the load balancer over HTTPS / HTTP from anywhere; the EC2 instances accept HTTP traffic restricted to the load balancer)
• Load Balancer Security Group: allow HTTP / HTTPS from anywhere
• Application Security Group: allow traffic only from the Load Balancer
Classic Load Balancers (v1)

• Supports TCP (Layer 4), HTTP & HTTPS (Layer 7)
• Health checks are TCP or HTTP based
• Fixed hostname: XXX.region.elb.amazonaws.com
(Diagram: the client connects to a CLB listener, which forwards internally to the EC2 instance)
Application Load Balancer (v2)
• Application Load Balancer is Layer 7 (HTTP)

• Load balancing to multiple HTTP applications across machines


(target groups)
• Load balancing to multiple applications on the same machine
(ex: containers)
• Support for HTTP/2 and WebSocket
• Support redirects (from HTTP to HTTPS for example)
Application Load Balancer (v2)
• Routing tables to different target groups:
• Routing based on path in URL (example.com/users & example.com/posts)
• Routing based on hostname in URL (one.example.com & other.example.com)
• Routing based on Query String, Headers
(example.com/users?id=123&order=false)

• ALB are a great fit for micro services & container-based application
(example: Docker & Amazon ECS)
• Has a port mapping feature to redirect to a dynamic port in ECS
• In comparison, we’d need multiple Classic Load Balancer per application
Application Load Balancer (v2)
HTTP Based Traffic

(Diagram: an external Application Load Balancer routes the /user route over HTTP to a target group for the Users application, and the /search route to a target group for the Search application; each target group has its own health check)
Application Load Balancer (v2)
Target Groups
• EC2 instances (can be managed by an Auto Scaling Group) – HTTP
• ECS tasks (managed by ECS itself) – HTTP
• Lambda functions – HTTP request is translated into a JSON event
• IP Addresses – must be private IPs

• ALB can route to multiple target groups


• Health checks are at the target group level
Application Load Balancer (v2)
Query Strings/Parameters Routing
(Diagram: requests with ?Platform=Mobile are routed to Target Group 1 (AWS, EC2 based); requests with ?Platform=Desktop are routed to Target Group 2 (on-premises, private IP routing))
Application Load Balancer (v2)
Good to Know
• Fixed hostname (XXX.region.elb.amazonaws.com)
• The application servers don’t see the IP of the client directly
• The true IP of the client is inserted in the header X-Forwarded-For
• We can also get Port (X-Forwarded-Port) and proto (X-Forwarded-Proto)

(Diagram: the client (IP 12.34.56.78) connects to the load balancer, which terminates the connection and forwards the request to the EC2 instance from its own private IP; the client IP travels in the X-Forwarded-For header)
Network Load Balancer (v2)
• Network load balancers (Layer 4) allow to:
• Forward TCP & UDP traffic to your instances
• Handle millions of request per seconds
• Less latency ~100 ms (vs 400 ms for ALB)

• NLB has one static IP per AZ, and supports assigning Elastic IP (helpful for whitelisting specific IP)

• NLB are used for extreme performance, TCP or UDP traffic


• Not included in the AWS free tier
Network Load Balancer (v2)
TCP (Layer 4) Based Traffic

(Diagram: an external Network Load Balancer applies TCP rules and forwards traffic to a target group for the Users application over TCP, and to a target group for the Search application over HTTP; each target group has its own health check)
Network Load Balancer – Target Groups
• EC2 instances
• IP Addresses – must be private IPs
• Application Load Balancer
• Health Checks support the TCP, HTTP and HTTPS Protocols
(Diagram: a Network Load Balancer can target a group of EC2 instances, a group of private IP addresses, or an Application Load Balancer)
Gateway Load Balancer
• Deploy, scale, and manage a fleet of 3rd party network virtual appliances in AWS
• Example: Firewalls, Intrusion Detection and Prevention Systems, Deep Packet Inspection Systems, payload manipulation, …
• Operates at Layer 3 (Network Layer) – IP Packets
• Combines the following functions:
  • Transparent Network Gateway – single entry/exit for all traffic
  • Load Balancer – distributes traffic to your virtual appliances
• Uses the GENEVE protocol on port 6081
(Diagram: a route table sends traffic from users to the Gateway Load Balancer, which distributes it to a target group of 3rd party security virtual appliances before it reaches the destination application)
Gateway Load Balancer – Target Groups
• EC2 instances
• IP Addresses – must be private IPs

(Diagram: a Gateway Load Balancer can target a group of EC2 instances or a group of private IP addresses)
Sticky Sessions (Session Affinity)
• It is possible to implement stickiness so that the same client is always redirected to the same instance behind a load balancer
• This works for Classic Load Balancer, Application Load Balancer, and Network Load Balancer
• For both CLB & ALB, the "cookie" used for stickiness has an expiration date you control
• Use case: make sure the user doesn't lose his session data
• Enabling stickiness may bring imbalance to the load over the backend EC2 instances
Sticky Sessions – Cookie Names
• Application-based Cookies
• Custom cookie
• Generated by the target
• Can include any custom attributes required by the application
• Cookie name must be specified individually for each target group
• Don’t use AWSALB, AWSALBAPP, or AWSALBTG (reserved for use by the ELB)
• Application cookie
• Generated by the load balancer
• Cookie name is AWSALBAPP
• Duration-based Cookies
• Cookie generated by the load balancer
• Cookie name is AWSALB for ALB, AWSELB for CLB
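A hedged boto3 sketch enabling duration-based stickiness on an ALB target group (the target group ARN is a placeholder); for application-based cookies you would set stickiness.type to app_cookie and provide stickiness.app_cookie.cookie_name:

import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app/abc123",  # placeholder
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},                 # duration-based (AWSALB cookie)
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},
    ],
)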
Cross-Zone Load Balancing
• With Cross-Zone Load Balancing: each load balancer instance distributes evenly across all registered instances in all AZs (e.g., with 2 instances in AZ 1 and 8 in AZ 2, every instance receives 10% of the traffic)
• Without Cross-Zone Load Balancing: requests are distributed among the instances of the node of the Elastic Load Balancer (e.g., the 2 instances in AZ 1 receive 25% each, while the 8 instances in AZ 2 receive 6.25% each)
Cross-Zone Load Balancing
• Application Load Balancer
• Enabled by default (can be disabled at the Target Group level)
• No charges for inter AZ data

• Network Load Balancer & Gateway Load Balancer


• Disabled by default
• You pay charges ($) for inter AZ data if enabled

• Classic Load Balancer


• Disabled by default
• No charges for inter AZ data if enabled
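For NLB / GWLB, cross-zone load balancing is a load balancer attribute; a hedged boto3 sketch (the ARN is a placeholder):

import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/my-nlb/abc123",  # placeholder
    Attributes=[
        # Disabled by default for NLB & GWLB; enabling it may incur inter-AZ data charges
        {"Key": "load_balancing.cross_zone.enabled", "Value": "true"},
    ],
)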
SSL/TLS - Basics
• An SSL Certificate allows traffic between your clients and your load balancer
to be encrypted in transit (in-flight encryption)

• SSL refers to Secure Sockets Layer, used to encrypt connections


• TLS refers to Transport Layer Security, which is a newer version
• Nowadays, TLS certificates are mainly used, but people still refer to them as SSL

• Public SSL certificates are issued by Certificate Authorities (CA)


• Comodo, Symantec, GoDaddy, GlobalSign, Digicert, Letsencrypt, etc…

• SSL certificates have an expiration date (you set) and must be renewed
Load Balancer - SSL Certificates
(Diagram: users connect to the load balancer over HTTPS, encrypted over the public internet; the load balancer connects to the EC2 instance over HTTP within the private VPC)
• The load balancer uses an X.509 certificate (SSL/TLS server certificate)
• You can manage certificates using ACM (AWS Certificate Manager)
• You can also create and upload your own certificates
• HTTPS listener:
  • You must specify a default certificate
  • You can add an optional list of certs to support multiple domains
  • Clients can use SNI (Server Name Indication) to specify the hostname they reach
  • Ability to specify a security policy to support older versions of SSL / TLS (legacy clients)
SSL – Server Name Indication (SNI)
• SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
• It's a "newer" protocol, and requires the client to indicate the hostname of the target server in the initial SSL handshake
• The server will then find the correct certificate, or return the default one
(Diagram: the client tells the ALB "I would like www.mycorp.com"; the ALB holds SSL certificates for Domain1.example.com and www.mycorp.com, picks the correct one, and routes to the target group for www.mycorp.com)
Note:
• Only works for ALB & NLB (newer generation), CloudFront
• Does not work for CLB (older gen)
Elastic Load Balancers – SSL Certificates
• Classic Load Balancer (v1)
• Support only one SSL certificate
• Must use multiple CLB for multiple hostname with multiple SSL certificates

• Application Load Balancer (v2)


• Supports multiple listeners with multiple SSL certificates
• Uses Server Name Indication (SNI) to make it work

• Network Load Balancer (v2)


• Supports multiple listeners with multiple SSL certificates
• Uses Server Name Indication (SNI) to make it work
Connection Draining
• Feature naming
  • Connection Draining – for CLB
  • Deregistration Delay – for ALB & NLB
• Time to complete "in-flight requests" while the instance is de-registering or unhealthy
• Stops sending new requests to the EC2 instance which is de-registering
• Between 1 to 3600 seconds (default: 300 seconds)
• Can be disabled (set value to 0)
• Set to a low value if your requests are short
(Diagram: the ELB waits for existing connections to the DRAINING EC2 instance to complete, and establishes new connections only to the other instances)
What’s an Auto Scaling Group?
• In real-life, the load on your websites and application can change
• In the cloud, you can create and get rid of servers very quickly

• The goal of an Auto Scaling Group (ASG) is to:


• Scale out (add EC2 instances) to match an increased load
• Scale in (remove EC2 instances) to match a decreased load
• Ensure we have a minimum and a maximum number of EC2 instances running
• Automatically register new instances to a load balancer
• Re-create an EC2 instance in case a previous one is terminated (ex: if unhealthy)

• ASG are free (you only pay for the underlying EC2 instances)
Auto Scaling Group in AWS

(Diagram: an Auto Scaling Group keeps a Minimum Capacity of instances, runs at a Desired Capacity, and can Scale Out as needed up to a Maximum Capacity)
Auto Scaling Group in AWS With Load Balancer
(Diagram: users reach an Elastic Load Balancer, which spreads traffic across the EC2 instances of the Auto Scaling Group; the ELB can check the health of your EC2 instances)
Auto Scaling Group Attributes
• A Launch Template (older "Launch Configurations" are deprecated)
  • AMI + Instance Type
  • EC2 User Data
  • EBS Volumes
  • Security Groups
  • SSH Key Pair
  • IAM Roles for your EC2 Instances
  • Network + Subnets Information
  • Load Balancer Information
• Min Size / Max Size / Initial Capacity
• Scaling Policies
Auto Scaling - CloudWatch Alarms & Scaling
• It is possible to scale an ASG based on CloudWatch alarms
• An alarm monitors a metric (such as Average CPU, or a custom metric)
• Metrics such as Average CPU are computed for the overall ASG instances
• Based on the alarm:
• We can create scale-out policies (increase the number of instances)
• We can create scale-in policies (decrease the number of instances)

(Diagram: a CloudWatch Alarm on an ASG-wide metric triggers a scaling policy that adds or removes EC2 instances in the Auto Scaling Group)
Auto Scaling Groups – Dynamic Scaling Policies
• Target Tracking Scaling
• Most simple and easy to set-up
• Example: I want the average ASG CPU to stay at around 40%
• Simple / Step Scaling
• When a CloudWatch alarm is triggered (example CPU > 70%), then add 2 units
• When a CloudWatch alarm is triggered (example CPU < 30%), then remove 1
• Scheduled Actions
• Anticipate a scaling based on known usage patterns
• Example: increase the min capacity to 10 at 5 pm on Fridays
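A minimal boto3 sketch of a Target Tracking policy that keeps average CPU around 40% (the ASG name is a placeholder):

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",            # hypothetical ASG name
    PolicyName="keep-cpu-around-40",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 40.0,                  # the CloudWatch alarms are created for you
    },
)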
Auto Scaling Groups – Predictive Scaling
• Predictive scaling: continuously forecast load and schedule scaling ahead
Good metrics to scale on
• CPUUtilization: Average CPU utilization across your instances
• RequestCountPerTarget: to make sure the number of requests per EC2 instance is stable
• Average Network In / Out (if your application is network bound)
• Any custom metric (that you push using CloudWatch)
(Diagram: users send requests through an Application Load Balancer to an Auto Scaling group, with RequestCountPerTarget at a target value of 3)
Auto Scaling Groups - Scaling Cooldowns
• After a scaling activity happens, you are in the cooldown period (default 300 seconds)
• During the cooldown period, the ASG will not launch or terminate additional instances (to allow for metrics to stabilize)
(Flowchart: when a scaling action occurs, it is ignored if the default cooldown is still in effect; otherwise an instance is launched or terminated)
• Advice: Use a ready-to-use AMI to reduce configuration time in order to serve requests faster and reduce the cooldown period
Auto Scaling – Instance Refresh
• Goal: update a launch template and then re-create all EC2 instances
• For this we can use the native feature of Instance Refresh
• Setting of minimum healthy percentage
• Specify warm-up time (how long until the instance is ready to use)
(Diagram: the user calls StartInstanceRefresh with a new launch template (updated AMI) and a Min. Healthy Percentage of 60%; the Auto Scaling Group progressively replaces instances running the old launch template with instances running the new one)
RDS, Aurora, & ElastiCache
Amazon RDS Overview
• RDS stands for Relational Database Service
• It's a managed DB service for DBs that use SQL as a query language
• It allows you to create databases in the cloud that are managed by AWS
• Postgres
• MySQL
• MariaDB
• Oracle
• Microsoft SQL Server
• Aurora (AWS Proprietary database)
Advantages of using RDS versus deploying a DB on EC2
• RDS is a managed service:
• Automated provisioning, OS patching
• Continuous backups and restore to specific timestamp (Point in Time Restore)!
• Monitoring dashboards
• Read replicas for improved read performance
• Multi AZ setup for DR (Disaster Recovery)
• Maintenance windows for upgrades
• Scaling capability (vertical and horizontal)
• Storage backed by EBS (gp2 or io1)
• BUT you can’t SSH into your instances
RDS – Storage Auto Scaling
• Helps you increase storage on your RDS DB instance dynamically
• When RDS detects you are running out of free database storage, it scales automatically
• Avoid manually scaling your database storage
• You have to set Maximum Storage Threshold (maximum limit for DB storage)
• Automatically modify storage if:
  • Free storage is less than 10% of allocated storage
  • Low-storage lasts at least 5 minutes
  • 6 hours have passed since last modification
• Useful for applications with unpredictable workloads
• Supports all RDS database engines (MariaDB, MySQL, PostgreSQL, SQL Server, Oracle)
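A hedged boto3 sketch setting the Maximum Storage Threshold on an existing instance (identifier and values are illustrative):

import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.modify_db_instance(
    DBInstanceIdentifier="my-database",   # hypothetical instance identifier
    MaxAllocatedStorage=1000,             # GiB ceiling for storage auto scaling
    ApplyImmediately=True,
)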
RDS Read Replicas for read scalability
• Up to 15 Read Replicas
• Within AZ, Cross AZ or Cross Region
• Replication is ASYNC, so reads are eventually consistent
• Replicas can be promoted to their own DB
• Applications must update the connection string to leverage read replicas
(Diagram: the application writes to the main RDS DB instance and can read from it and from the read replicas, which are kept up to date by ASYNC replication)
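A minimal boto3 sketch creating a read replica (both identifiers are placeholders):

import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="my-database-replica",    # new replica name (placeholder)
    SourceDBInstanceIdentifier="my-database",      # existing source instance (placeholder)
    DBInstanceClass="db.t3.medium",                # the replica can use a different class
)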
RDS Read Replicas – Use Cases
• You have a production database that is taking on normal load
• You want to run a reporting application to run some analytics
• You create a Read Replica to run the new workload there
• The production application is unaffected
• Read replicas are used for SELECT (=read) only kind of statements (not INSERT, UPDATE, DELETE)
(Diagram: the production application reads and writes to the RDS DB instance; the reporting application reads from the read replica, which is updated via ASYNC replication)
RDS Read Replicas – Network Cost
• In AWS there’s a network cost when data goes from one AZ to another
• For RDS Read Replicas within the same region, you don’t pay that fee

(Diagram: a read replica in the same region but a different AZ, e.g. us-east-1a to us-east-1b, replicates for free; a cross-region read replica, e.g. us-east-1a to eu-west-1b, incurs replication traffic charges)
RDS Multi AZ (Disaster Recovery)
• SYNC replication
• One DNS name – automatic app failover to standby
• Increase availability
• Failover in case of loss of AZ, loss of network, instance or storage failure
• No manual intervention in apps
• Not used for scaling
• Note: the Read Replicas can be setup as Multi AZ for Disaster Recovery (DR)
(Diagram: the application reads and writes to the RDS Master DB instance in AZ A through one DNS name; SYNC replication keeps a standby instance in AZ B ready for automatic failover)
RDS – From Single-AZ to Multi-AZ
• Zero downtime operation (no need to stop the DB)
• Just click on "modify" for the database
• The following happens internally:
  • A snapshot is taken
  • A new DB is restored from the snapshot in a new AZ
  • Synchronization is established between the two databases
(Diagram: a DB snapshot of the RDS DB instance is restored as a standby DB in another AZ, then SYNC replication is established between them)
Amazon Aurora
• Aurora is a proprietary technology from AWS (not open sourced)
• Postgres and MySQL are both supported as Aurora DB (that means your
drivers will work as if Aurora was a Postgres or MySQL database)
• Aurora is “AWS cloud optimized” and claims 5x performance improvement
over MySQL on RDS, over 3x the performance of Postgres on RDS
• Aurora storage automatically grows in increments of 10GB, up to 128 TB.
• Aurora can have up to 15 replicas and the replication process is faster than
MySQL (sub 10 ms replica lag)
• Failover in Aurora is instantaneous. It’s HA (High Availability) native.
• Aurora costs more than RDS (20% more) – but is more efficient
Aurora High Availability and Read Scaling
• 6 copies of your data across 3 AZ:
  • 4 copies out of 6 needed for writes
  • 3 copies out of 6 needed for reads
  • Self healing with peer-to-peer replication
  • Storage is striped across 100s of volumes
• One Aurora Instance takes writes (master)
• Automated failover for master in less than 30 seconds
• Master + up to 15 Aurora Read Replicas serve reads
• Support for Cross Region Replication
(Diagram: one writer and several reader instances spread across AZ 1, AZ 2, and AZ 3, all backed by a shared storage volume with replication, self healing, and auto expanding)
Aurora DB Cluster
• Writer Endpoint: points to the master (the instance taking the writes)
• Reader Endpoint: connection load balancing across the Read Replicas (with Auto Scaling)
• Shared storage volume, auto expanding from 10 GB to 128 TB
Features of Aurora
• Automatic fail-over
• Backup and Recovery
• Isolation and security
• Industry compliance
• Push-button scaling
• Automated Patching with Zero Downtime
• Advanced Monitoring
• Routine Maintenance
• Backtrack: restore data at any point of time without using backups
RDS & Aurora Security
• At-rest encryption:
• Database master & replicas encryption using AWS KMS – must be defined at launch time
• If the master is not encrypted, the read replicas cannot be encrypted
• To encrypt an un-encrypted database, go through a DB snapshot & restore as encrypted
• In-flight encryption: TLS-ready by default, use the AWS TLS root certificates client-side
• IAM Authentication: IAM roles to connect to your database (instead of username/pw)
• Security Groups: Control Network access to your RDS / Aurora DB
• No SSH available except on RDS Custom
• Audit Logs can be enabled and sent to CloudWatch Logs for longer retention
Amazon RDS Proxy
• Fully managed database proxy for RDS
• Allows apps (e.g., Lambda functions) to pool and share DB connections established with the database
• Improves database efficiency by reducing the stress on database resources (e.g., CPU, RAM) and minimizing open connections (and timeouts)
• Serverless, autoscaling, highly available (multi-AZ)
• Reduced RDS & Aurora failover time by up to 66%
• Supports RDS (MySQL, PostgreSQL, MariaDB, MS SQL Server) and Aurora (MySQL, PostgreSQL)
• No code changes required for most apps
• Enforce IAM Authentication for DB, and securely store credentials in AWS Secrets Manager
• RDS Proxy is never publicly accessible (must be accessed from within the VPC)
Amazon ElastiCache Overview
• The same way RDS is to get managed Relational Databases…
• ElastiCache is to get managed Redis or Memcached
• Caches are in-memory databases with really high performance, low
latency
• Helps reduce load off of databases for read intensive workloads
• Helps make your application stateless
• AWS takes care of OS maintenance / patching, optimizations, setup,
configuration, monitoring, failure recovery and backups
• Using ElastiCache involves heavy application code changes
ElastiCache Solution Architecture – DB Cache
• The application queries ElastiCache; on a cache hit it uses the cached data, on a cache miss it reads from RDS and writes the result to ElastiCache
• Helps relieve load on RDS
• The cache must have an invalidation strategy to make sure only the most current data is used in there
ElastiCache Solution Architecture – User Session Store
• User logs into any of the application instances
• The application writes the session data into ElastiCache
• The user hits another instance of our application
• The instance retrieves the session data from ElastiCache and the user is already logged in
ElastiCache – Redis vs Memcached
• REDIS
• Multi AZ with Auto-Failover
• Read Replicas to scale reads and have high availability
• Data Durability using AOF persistence
• Backup and restore features
• Supports Sets and Sorted Sets
• MEMCACHED
• Multi-node for partitioning of data (sharding)
• No high availability (replication)
• Non persistent
• No backup and restore
• Multi-threaded architecture
Caching Implementation Considerations
• Read more at: https://aws.amazon.com/caching/implementation-
considerations/

• Is it safe to cache data? Data may be out of date, eventually consistent


• Is caching effective for that data?
• Pattern: data changing slowly, few keys are frequently needed
• Anti patterns: data changing rapidly, all large key space frequently needed
• Is data structured well for caching?
• example: key value caching, or caching of aggregations results

• Which caching design pattern is the most appropriate?


Lazy Loading / Cache-Aside / Lazy Population
• Pros
• Only requested data is cached (the cache isn’t filled up with unused data)
• Node failures are not fatal (just increased latency to warm the cache)
• Cons
• Cache miss penalty that results in 3 round trips, noticeable delay for that request
• Stale data: data can be updated in the database and outdated in the cache
Lazy Loading / Cache-Aside / Lazy Population – Python Pseudocode
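The original slide shows this pseudocode as an image; below is a minimal sketch of the Lazy Loading pattern in Python, assuming a redis-py client pointed at the ElastiCache endpoint and a hypothetical get_user_from_db() helper (neither is defined in the course material).

import json
import redis  # redis-py client (assumption: the app talks to ElastiCache Redis)

cache = redis.Redis(host="my-elasticache-endpoint", port=6379)  # placeholder endpoint

def get_user(user_id):
    # 1. Try the cache first
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)            # cache hit
    # 2. Cache miss: read from the database (hypothetical helper)
    user = get_user_from_db(user_id)
    # 3. Lazily populate the cache for the next reader
    cache.set(f"user:{user_id}", json.dumps(user))
    return user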
Write Through – Add or Update cache when database is updated
• Pros:
• Data in cache is never stale, reads are quick
• Write penalty vs Read penalty (each write requires 2 calls: one to the DB, one to the cache)
• Cons:
• Missing data until it is added / updated in the DB. Mitigation is to implement a Lazy Loading strategy as well
• Cache churn – a lot of the data will never be read
Write-Through – Python Pseudocode
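Again, the slide’s pseudocode is an image; a minimal Python sketch of the same idea, reusing the cache client from above and a hypothetical save_user_to_db() helper:

import json

def update_user(user_id, user):
    # 1. Write to the database first (hypothetical helper)
    save_user_to_db(user_id, user)
    # 2. Then add/update the cache so reads never see stale data
    cache.set(f"user:{user_id}", json.dumps(user))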
Cache Evictions and Time-to-live (TTL)
• Cache eviction can occur in three ways:
• You delete the item explicitly in the cache
• Item is evicted because the memory is full and it’s not recently used (LRU)
• You set an item time-to-live (or TTL)
• TTL are helpful for any kind of data:
• Leaderboards
• Comments
• Activity streams
• TTL can range from a few seconds to hours or days
• If too many evictions happen due to memory, you should scale up or out
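As a small illustration of setting a TTL, assuming the same redis-py client as before and a hypothetical leaderboard key:

# Expire the cached leaderboard entry after 5 minutes (300 seconds)
cache.set(f"leaderboard:{game_id}", json.dumps(scores), ex=300)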
Final words of wisdom
• Lazy Loading / Cache aside is easy to implement and works for many
situations as a foundation, especially on the read side
• Write-through is usually combined with Lazy Loading as targeted for the
queries or workloads that benefit from this optimization
• Setting a TTL is usually not a bad idea, except when you’re using Write-
through. Set it to a sensible value for your application
• Only cache the data that makes sense (user profiles, blogs, etc…)

• Quote: There are only two hard things in Computer Science: cache
invalidation and naming things
Amazon MemoryDB for Redis
• Redis-compatible, durable, in-memory database service
• Ultra-fast performance with over 160 million requests/second
• Durable in-memory data storage with Multi-AZ transactional log
• Scale seamlessly from 10s GBs to 100s TBs of storage
• Use cases: web and mobile apps, online gaming, media streaming, …
• In-Memory Speed: stores data in-memory, across up to hundreds of nodes, for ultra-fast performance
• Multi-AZ Transactional Log: stores data across multiple Availability Zones to provide durability and fast recovery
Amazon Route 53
What is DNS?
• Domain Name System which translates the human friendly hostnames
into the machine IP addresses
• www.google.com => 172.217.18.36
• DNS is the backbone of the Internet
• DNS uses hierarchical naming structure

.com

example.com
www.example.com
api.example.com
DNS Terminologies
• Domain Registrar : Amazon Route 53, GoDaddy, …
• DNS Records: A, AAAA, CNAME, NS, …
• Zone File: contains DNS records
• Name Server : resolves DNS queries (Authoritative or Non-Authoritative)
• Top Level Domain (TLD): .com, .us, .in, .gov, .org, …
• Second Level Domain (SLD): amazon.com, google.com, …
URL
• Example: http://api.www.example.com.
• Protocol: http – Sub Domain: api.www – SLD: example – TLD: .com – Root: “.”
• FQDN (Fully Qualified Domain Name): api.www.example.com.
How DNS Works
• The web browser asks the Local DNS Server (assigned and managed by your company, or assigned dynamically by your ISP) to resolve example.com
• The Local DNS Server asks the Root DNS Server (managed by ICANN), which points it to the .com TLD DNS Server (managed by IANA, a branch of ICANN)
• The .com TLD DNS Server points it to the SLD DNS Server for example.com (managed by the Domain Registrar, e.g., Amazon Registrar, Inc.)
• The SLD DNS Server returns the IP of the web server (e.g., 9.10.11.12); the Local DNS Server caches the answer (with a TTL) and returns it to the browser
Amazon Route 53
• A highly available, scalable, fully managed and Authoritative DNS
• Authoritative = the customer (you) can update the DNS records
• Route 53 is also a Domain Registrar
• Ability to check the health of your resources
• The only AWS service which provides 100% availability SLA
• Why Route 53? 53 is a reference to the traditional DNS port
Route 53 – Records
• How you want to route traffic for a domain
• Each record contains:
• Domain/subdomain Name – e.g., example.com
• Record Type – e.g., A or AAAA
• Value – e.g., 12.34.56.78
• Routing Policy – how Route 53 responds to queries
• TTL – amount of time the record is cached at DNS Resolvers
• Route 53 supports the following DNS record types:
• (must know) A / AAAA / CNAME / NS
• (advanced) CAA / DS / MX / NAPTR / PTR / SOA / TXT / SPF / SRV
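As an illustration of these record fields, a hedged boto3 sketch that upserts an A record; the hosted zone ID, name and IP are placeholders, not values from the course:

import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",                 # hypothetical hosted zone ID
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",                  # create or update the record
            "ResourceRecordSet": {
                "Name": "example.com",           # domain/subdomain name
                "Type": "A",                     # record type
                "TTL": 300,                      # cache time at resolvers
                "ResourceRecords": [{"Value": "12.34.56.78"}],
            },
        }]
    },
)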
Route 53 – Record Types
• A – maps a hostname to IPv4
• AAAA – maps a hostname to IPv6
• CNAME – maps a hostname to another hostname
• The target is a domain name which must have an A or AAAA record
• Can’t create a CNAME record for the top node of a DNS namespace (Zone
Apex)
• Example: you can’t create for example.com, but you can create for
www.example.com
• NS – Name Servers for the Hosted Zone
• Control how traffic is routed for a domain
Route 53 – Hosted Zones
• A container for records that define how to route traffic to a domain and its subdomains
• Public Hosted Zones – contain records that specify how to route traffic on the Internet (public domain names), e.g., application1.mypublicdomain.com
• Private Hosted Zones – contain records that specify how you route traffic within one or more VPCs (private domain names), e.g., application1.company.internal
• You pay $0.50 per month per hosted zone
Route 53 – Public vs. Private Hosted Zones
• Public Hosted Zone: answers queries from clients on the public Internet (e.g., example.com => 54.22.33.44, the public IP of an S3 bucket, CloudFront distribution, EC2 instance or Application Load Balancer)
• Private Hosted Zone: answers queries from within a VPC (e.g., db.example.internal => 10.0.0.35, the private IP of resources such as db.example.internal, webapp.example.internal or api.example.internal)
Route 53 – Records TTL (Time To Live)
• The client caches the DNS answer for the TTL of the record
• High TTL – e.g., 24 hr
• Less traffic on Route 53
• Possibly outdated records
• Low TTL – e.g., 60 sec.
• More traffic on Route 53 ($$)
• Records are outdated for less time
• Easy to change records
• Except for Alias records, TTL is mandatory for each DNS record
CNAME vs Alias
• AWS Resources (Load Balancer, CloudFront...) expose an AWS hostname:
• lb1-1234.us-east-2.elb.amazonaws.com and you want myapp.mydomain.com

• CNAME:
• Points a hostname to any other hostname. (app.mydomain.com => blabla.anything.com)
• ONLY FOR NON ROOT DOMAIN (aka. something.mydomain.com)
• Alias:
• Points a hostname to an AWS Resource (app.mydomain.com => blabla.amazonaws.com)
• Works for ROOT DOMAIN and NON ROOT DOMAIN (aka mydomain.com)
• Free of charge
• Native health check
Route 53 – Alias Records
• Maps a hostname to an AWS resource (e.g., example.com A-Alias => MyALB-123456789.us-east-1.elb.amazonaws.com, whose IP addresses are AWS-managed and might change)
• An extension to DNS functionality
• Automatically recognizes changes in the resource’s IP addresses
• Unlike CNAME, it can be used for the top node of a DNS namespace (Zone Apex), e.g., example.com
• Alias Record is always of type A/AAAA for AWS resources (IPv4 / IPv6)
• You can’t set the TTL
Route 53 – Alias Records Targets
• Elastic Load Balancers
• CloudFront Distributions
• API Gateway
• Elastic Beanstalk environments
• S3 Websites
• VPC Interface Endpoints
• Global Accelerator
• Route 53 record in the same hosted zone
• You cannot set an ALIAS record for an EC2 DNS name
Route 53 – Routing Policies
• Define how Route 53 responds to DNS queries
• Don’t get confused by the word “Routing”
• It’s not the same as Load balancer routing which routes the traffic
• DNS does not route any traffic, it only responds to the DNS queries
• Route 53 Supports the following Routing Policies
• Simple
• Weighted
• Failover
• Latency based
• Geolocation
• Multi-Value Answer
• Geoproximity (using Route 53 Traffic Flow feature)
Routing Policies – Simple
• Typically, route traffic to a single resource
• Can specify multiple values in the same record
• If multiple values are returned, a random one is chosen by the client
• When Alias enabled, specify only one AWS resource
• Can’t be associated with Health Checks
Routing Policies – Weighted
• Control the % of the requests that go to each specific resource
• Assign each record a relative weight:
• traffic (%) = (weight for a specific record) / (sum of all the weights for all records)
• Weights don’t need to sum up to 100
• DNS records must have the same name and type
• Can be associated with Health Checks
• Use cases: load balancing between regions, testing new application versions…
• Assign a weight of 0 to a record to stop sending traffic to a resource
• If all records have weight of 0, then all records will be returned equally
Routing Policies – Latency-based
• Redirect to the resource that has the least latency close to us
• Super helpful when latency for users is a priority
• Latency is based on traffic between users and AWS Regions
• Example: with ALBs in us-east-1 and ap-southeast-1, Germany users may be directed to the US (if that’s the lowest latency)
• Can be associated with Health Checks (has a failover capability)
Route 53 – Health Checks
• HTTP Health Checks are only for public resources
• Health Check => Automated DNS Failover:
1. Health checks that monitor an endpoint (application, server, other AWS resource)
2. Health checks that monitor other health checks (Calculated Health Checks)
3. Health checks that monitor CloudWatch Alarms (full control !!) – e.g., throttles of DynamoDB, alarms on RDS, custom metrics, … (helpful for private resources)
• Health Checks are integrated with CW metrics
Health Checks – Monitor an Endpoint
• About 15 global health checkers will check the endpoint health
• Healthy/Unhealthy Threshold – 3 (default)
• Interval – 30 sec (can set to 10 sec – higher cost)
• Supported protocols: HTTP, HTTPS and TCP
• If > 18% of health checkers report the endpoint is healthy, Route 53 considers it Healthy. Otherwise, it’s Unhealthy
• Ability to choose which locations you want Route 53 to use
• Health Checks pass only when the endpoint responds with the 2xx and 3xx status codes
• Health Checks can be setup to pass / fail based on the text in the first 5120 bytes of the response
• Configure your router/firewall to allow incoming requests from the Route 53 Health Checkers IP address range
Route 53 – Calculated Health Checks
• Combine the results of multiple Health Checks into a single Health Check
• You can use OR, AND, or NOT
• Can monitor up to 256 Child Health Checks
• Specify how many of the health checks need to pass to make the parent pass
• Usage: perform maintenance to your website without causing all health checks to fail
Health Checks – Private Hosted Zones
• Route 53 health checkers are outside the VPC
• They can’t access private endpoints (private VPC or on-premises resource)
• You can create a CloudWatch Metric and associate a CloudWatch Alarm, then create a Health Check that checks the alarm itself
Routing Policies – Failover (Active-Passive)
• Route 53 answers DNS requests with the primary record (e.g., an EC2 instance) as long as its mandatory Health Check passes
• If the primary becomes unhealthy, Route 53 fails over to the secondary record (Disaster Recovery resource)
Routing Policies – Geolocation
• Different from Latency-based!
• This routing is based on user location
• Specify location by Continent, Country or by US State (if there’s overlapping, most precise location selected)
• Should create a “Default” record (in case there’s no match on location)
• Use cases: website localization, restrict content distribution, load balancing, …
• Can be associated with Health Checks
Routing Policies – Geoproximity
• Route traffic to your resources based on the geographic location of users and
resources
• Ability to shift more traffic to resources based on the defined bias
• To change the size of the geographic region, specify bias values:
• To expand (1 to 99) – more traffic to the resource
• To shrink (-1 to -99) – less traffic to the resource

• Resources can be:


• AWS resources (specify AWS region)
• Non-AWS resources (specify Latitude and Longitude)
• You must use Route 53 Traffic Flow to use this feature
Routing Policies – Geoproximity (example)
• With us-west-1 and us-east-1 both at bias 0, traffic is split around the midpoint between the two regions
• Increasing the bias of us-east-1 to 50 shifts the boundary, so more traffic is routed to us-east-1
Route 53 – Traffic flow
• Simplify the process of creating and
maintaining records in large and
complex configurations
• Visual editor to manage complex
routing decision trees
• Configurations can be saved as
Traffic Flow Policy
• Can be applied to different Route 53
Hosted Zones (different domain
names)
• Supports versioning
Routing Policies – IP-based Routing
• Routing is based on clients’ IP addresses
• You provide a list of CIDRs for your clients and the corresponding endpoints/locations (user-IP-to-endpoint mappings)
• Use cases: optimize performance, reduce network costs…
• Example: route end users from a particular ISP to a specific endpoint
• Example CIDR collection: location-1 = 203.0.113.0/24, location-2 = 200.5.4.0/24; example.com resolves to 1.2.3.4 for location-1 and to 5.6.7.8 for location-2
Routing Policies – Multi-Value
• Use when routing traffic to multiple resources
• Route 53 returns multiple values/resources
• Can be associated with Health Checks (return only values for healthy resources)
• Up to 8 healthy records are returned for each Multi-Value query
• Multi-Value is not a substitute for having an ELB
Domain Registar vs. DNS Service
• You buy or register your domain name with a Domain Registrar typically by
paying annual charges (e.g., GoDaddy, Amazon Registrar Inc., …)
• The Domain Registrar usually provides you with a DNS service to manage
your DNS records
• But you can use another DNS service to manage your DNS records
• Example: purchase the domain from GoDaddy and use Route 53 to manage
your DNS records

purchase
example.com manage DNS records

User
Amazon
Route 53
GoDaddy as Registrar & Route 53 as DNS Service
• Example: the domain stephanetheteacher.com is purchased at GoDaddy, and a Public Hosted Zone is created in Amazon Route 53 to manage its DNS records
3rd Party Registrar with Amazon Route 53
• If you buy your domain on a 3rd party registrar, you can still use Route 53 as the DNS Service provider
1. Create a Hosted Zone in Route 53
2. Update NS Records on the 3rd party website to use Route 53 Name Servers
• Domain Registrar != DNS Service
• But every Domain Registrar usually comes with some DNS features
Amazon VPC – Basics
VPC – Crash Course
• VPC is something you should know in depth for the AWS Certified Solutions Architect Associate & AWS Certified SysOps Administrator
• At the AWS Certified Developer level, you should know about:
• VPC, Subnets, Internet Gateways & NAT Gateways
• Security Groups, Network ACL (NACL), VPC Flow Logs
• VPC Peering, VPC Endpoints
• Site to Site VPN & Direct Connect
• I will just give you an overview; expect no more than 1 or 2 questions on this at your exam
• Later in the course, I will be highlighting when VPC concepts are helpful
VPC & Subnets Primer
• VPC: private network to deploy your resources (regional resource)
• Subnets allow you to partition your network inside your VPC (Availability Zone resource)
• A public subnet is a subnet that is accessible from the internet
• A private subnet is a subnet that is not accessible from the internet
• To define access to the internet and between subnets, we use Route Tables
VPC Diagram
• A VPC lives in one Region, spans multiple Availability Zones, and has a CIDR range (e.g., 10.0.0.0/16), with a public subnet and a private subnet in each Availability Zone
Internet Gateway & NAT Gateways
• Internet Gateways (IGW) help our VPC instances connect with the internet
• Public Subnets have a route to the internet gateway
• NAT Gateways (AWS-managed) & NAT Instances (self-managed) allow your instances in your Private Subnets to access the internet while remaining private
Network ACL & Security Groups
• NACL (Network ACL)
• A firewall which controls traffic from and to the subnet
• Can have ALLOW and DENY rules
• Are attached at the Subnet level
• Rules only include IP addresses
• Security Groups
• A firewall that controls traffic to and from an ENI / an EC2 Instance
• Can have only ALLOW rules
• Rules include IP addresses and other security groups
Network ACLs vs Security Groups
• See the comparison at: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Security.html#VPC_Security_Comparison
VPC Flow Logs
• Capture information about IP traffic going into your interfaces:
• VPC Flow Logs
• Subnet Flow Logs
• Elastic Network Interface Flow Logs
• Helps to monitor & troubleshoot connectivity issues. Example:
• Subnets to internet
• Subnets to subnets
• Internet to subnets
• Captures network information from AWS managed interfaces too: Elastic Load
Balancers, ElastiCache, RDS, Aurora, etc…
• VPC Flow logs data can go to S3, CloudWatch Logs, and Kinesis Data Firehose
VPC Peering
• Connect two VPCs privately using AWS’ network
• Make them behave as if they were in the same network
• Must not have overlapping CIDR (IP address range)
• VPC Peering connection is not transitive (must be established for each pair of VPCs that need to communicate with one another, e.g., A–B, A–C and B–C)
VPC Endpoints
• Endpoints allow you to connect to AWS Services using a private network instead of the public www network
• This gives you enhanced security and lower latency to access AWS services
• VPC Endpoint Gateway: S3 & DynamoDB
• VPC Endpoint Interface (ENI): the rest (e.g., CloudWatch)
• Only used within your VPC
Site to Site VPN & Direct Connect
• Site to Site VPN
• Connect an on-premises VPN to AWS
• The connection is automatically encrypted
• Goes over the public internet
• Direct Connect (DX)
• Establish a physical connection between on-premises and AWS
• The connection is private, secure and fast
• Goes over a private network
• Takes at least a month to establish
VPC Closing Comments
• VPC: Virtual Private Cloud
• Subnets: Tied to an AZ, network partition of the VPC
• Internet Gateway: at the VPC level, provide Internet Access
• NAT Gateway / Instances: give internet access to private subnets
• NACL: Stateless, subnet rules for inbound and outbound
• Security Groups: Stateful, operate at the EC2 instance level or ENI
• VPC Peering: Connect two VPC with non overlapping IP ranges, non transitive
• VPC Endpoints: Provide private access to AWS Services within VPC
• VPC Flow Logs: network traffic logs
• Site to Site VPN: VPN over public internet between on-premises DC and AWS
• Direct Connect: direct private connection to AWS
VPC note – AWS Certified Developer
• Don’t stress if you didn’t understand everything in that section
• I will be highlighting in the course the specific VPC features we need

• Feel free to revisit that section after you’re done in the course !

• Moving on :)
Typical 3 tier solution architecture
• Route 53 resolves the domain name and points to an ELB in the public subnet
• The ELB forwards traffic to an Auto Scaling group of EC2 instances spread across multiple Availability Zones (private subnet)
• The instances read/write data in Amazon RDS (Multi AZ) and store/retrieve session data and cached data in ElastiCache (data subnet)
LAMP Stack on EC2
• Linux: OS for EC2 instances
• Apache: Web Server that run on Linux (EC2)
• MySQL: database on RDS
• PHP: Application logic (running on EC2)

• Can add Redis / Memcached (ElastiCache) to include a caching tech


• To store local application data & software: EBS drive (root)
Wordpress on AWS
• EC2 instances in multiple Availability Zones (Multi AZ) mount a shared Amazon EFS file system (through ENIs) to store and serve uploaded images
• A more complete reference architecture: https://aws.amazon.com/blogs/architecture/wordpress-best-practices-on-aws/
Amazon S3
Section introduction
• Amazon S3 is one of the main building blocks of AWS
• It’s advertised as ”infinitely scaling” storage

• Many websites use Amazon S3 as a backbone


• Many AWS services use Amazon S3 as an integration as well

• We’ll have a step-by-step approach to S3


Amazon S3 Use cases
• Backup and storage
• Disaster Recovery
• Archive Nasdaq stores 7 years of
data into S3 Glacier
• Hybrid Cloud storage
• Application hosting
• Media hosting
• Data lakes & big data analytics
Sysco runs analytics on
• Software delivery its data and gain business
insights
• Static website
Amazon S3 - Buckets
• Amazon S3 allows people to store objects (files) in “buckets” (directories)
• Buckets must have a globally unique name (across all regions all accounts)
• Buckets are defined at the region level
• S3 looks like a global service but buckets are created in a region
• Naming convention
• No uppercase, No underscore
• 3-63 characters long
• Not an IP
• Must start with lowercase letter or number
• Must NOT start with the prefix xn-- S3 Bucket
• Must NOT end with the suffix -s3alias
Amazon S3 - Objects
• Objects (files) have a Key
• The key is the FULL path:
• s3://my-bucket/my_file.txt
• s3://my-bucket/my_folder1/another_folder/my_file.txt Object
• The key is composed of prefix + object name
• s3://my-bucket/my_folder1/another_folder/my_file.txt
• There’s no concept of “directories” within buckets
(although the UI will trick you to think otherwise)
• Just keys with very long names that contain slashes (“/”) S3 Bucket
with Objects
Amazon S3 – Objects (cont.)
• Object values are the content of the body:
• Max. Object Size is 5TB (5000GB)
• If uploading more than 5GB, must use “multi-part upload”
• Metadata (list of text key / value pairs – system or user metadata)
• Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle
• Version ID (if versioning is enabled)
Amazon S3 – Security
• User-Based
• IAM Policies – which API calls should be allowed for a specific user from IAM
• Resource-Based
• Bucket Policies – bucket wide rules from the S3 console – allows cross account
• Object Access Control List (ACL) – finer grain (can be disabled)
• Bucket Access Control List (ACL) – less common (can be disabled)
• Note: an IAM principal can access an S3 object if
• The user IAM permissions ALLOW it OR the resource policy ALLOWS it
• AND there’s no explicit DENY
• Encryption: encrypt objects in Amazon S3 using encryption keys
S3 Bucket Policies
• JSON based policies
• Resources: buckets and objects
• Effect: Allow / Deny
• Actions: Set of API to Allow or Deny
• Principal: The account or user to apply the
policy to

• Use S3 bucket for policy to:


• Grant public access to the bucket
• Force objects to be encrypted at upload
• Grant access to another account (Cross
Account)
Example: Public Access - Use Bucket Policy

S3 Bucket Policy
Allows Public Access

Anonymous www website visitor S3 Bucket


Example: User Access to S3 – IAM permissions

IAM Policy

IAM User

S3 Bucket
Example: EC2 instance access - Use IAM Roles

IAM permissions
EC2 Instance Role

EC2 Instance

S3 Bucket
Advanced: Cross-Account Access –
Use Bucket Policy
S3 Bucket Policy
Allows Cross-Account

IAM User
Other AWS account

S3 Bucket
Bucket settings for Block Public Access

• These settings were created to prevent company data leaks


• If you know your bucket should never be public, leave these on
• Can be set at the account level
Amazon S3 – Static Website Hosting
• S3 can host static websites and have them accessible on the Internet
• The website URL will be (depending on the region)
• http://bucket-name.s3-website-aws-region.amazonaws.com OR
• http://bucket-name.s3-website.aws-region.amazonaws.com
• Example: http://demo-bucket.s3-website-us-west-2.amazonaws.com (bucket demo-bucket in us-west-2)
• If you get a 403 Forbidden error, make sure the bucket policy allows public reads!
Amazon S3 - Versioning
• You can version your files in Amazon S3
• It is enabled at the bucket level
• Same key overwrite will change the “version”: 1, 2, 3….
• It is best practice to version your buckets
• Protect against unintended deletes (ability to restore a version)
• Easy roll back to previous version
• Notes:
• Any file that is not versioned prior to enabling versioning will have version “null”
• Suspending versioning does not delete the previous versions
Amazon S3 – Replication (CRR & SRR)
• Must enable Versioning in source and destination buckets
• Cross-Region Replication (CRR)
• Same-Region Replication (SRR)
• Buckets can be in different AWS accounts
• Copying is asynchronous
• Must give proper IAM permissions to S3
• Use cases:
• CRR – compliance, lower latency access, replication across accounts
• SRR – log aggregation, live replication between production and test accounts
Amazon S3 – Replication (Notes)
• After you enable Replication, only new objects are replicated
• Optionally, you can replicate existing objects using S3 Batch Replication
• Replicates existing objects and objects that failed replication

• For DELETE operations


• Can replicate delete markers from source to target (optional setting)
• Deletions with a version ID are not replicated (to avoid malicious deletes)

• There is no “chaining” of replication


• If bucket 1 has replication into bucket 2, which has replication into bucket 3
• Then objects created in bucket 1 are not replicated to bucket 3
S3 Storage Classes
• Amazon S3 Standard - General Purpose
• Amazon S3 Standard-Infrequent Access (IA)
• Amazon S3 One Zone-Infrequent Access
• Amazon S3 Glacier Instant Retrieval
• Amazon S3 Glacier Flexible Retrieval
• Amazon S3 Glacier Deep Archive
• Amazon S3 Intelligent Tiering

• Can move between classes manually or using S3 Lifecycle configurations


S3 Durability and Availability
• Durability:
• High durability (99.999999999%, 11 9’s) of objects across multiple AZ
• If you store 10,000,000 objects with Amazon S3, you can on average expect to
incur a loss of a single object once every 10,000 years
• Same for all storage classes

• Availability:
• Measures how readily available a service is
• Varies depending on storage class
• Example: S3 standard has 99.99% availability = not available 53 minutes a year
S3 Standard – General Purpose
• 99.99% Availability
• Used for frequently accessed data
• Low latency and high throughput
• Sustain 2 concurrent facility failures

• Use Cases: Big Data analytics, mobile & gaming applications, content
distribution…
S3 Storage Classes – Infrequent Access
• For data that is less frequently accessed, but requires rapid access when needed
• Lower cost than S3 Standard

• Amazon S3 Standard-Infrequent Access (S3 Standard-IA)


• 99.9% Availability
• Use cases: Disaster Recovery, backups

• Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)


• High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed
• 99.5% Availability
• Use Cases: Storing secondary backup copies of on-premises data, or data you can recreate
Amazon S3 Glacier Storage Classes
• Low-cost object storage meant for archiving / backup
• Pricing: price for storage + object retrieval cost

• Amazon S3 Glacier Instant Retrieval


• Millisecond retrieval, great for data accessed once a quarter
• Minimum storage duration of 90 days
• Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier):
• Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free
• Minimum storage duration of 90 days
• Amazon S3 Glacier Deep Archive – for long term storage:
• Standard (12 hours), Bulk (48 hours)
• Minimum storage duration of 180 days
S3 Intelligent-Tiering
• Small monthly monitoring and auto-tiering fee
• Moves objects automatically between Access Tiers based on usage
• There are no retrieval charges in S3 Intelligent-Tiering

• Frequent Access tier (automatic): default tier


• Infrequent Access tier (automatic): objects not accessed for 30 days
• Archive Instant Access tier (automatic): objects not accessed for 90 days
• Archive Access tier (optional): configurable from 90 days to 700+ days
• Deep Archive Access tier (optional): config. from 180 days to 700+ days
S3 Storage Classes Comparison
• Durability: 99.999999999% (11 9’s) for all storage classes
• Standard: 99.99% availability, 99.9% SLA, >= 3 AZ, no min. storage duration charge, no min. billable object size, no retrieval fee
• Intelligent-Tiering: 99.9% availability, 99% SLA, >= 3 AZ, no min. storage duration charge, no min. billable object size, no retrieval fee
• Standard-IA: 99.9% availability, 99% SLA, >= 3 AZ, 30-day min. storage duration, 128 KB min. billable object size, per-GB retrieval fee
• One Zone-IA: 99.5% availability, 99% SLA, 1 AZ, 30-day min. storage duration, 128 KB min. billable object size, per-GB retrieval fee
• Glacier Instant Retrieval: 99.9% availability, 99% SLA, >= 3 AZ, 90-day min. storage duration, 128 KB min. billable object size, per-GB retrieval fee
• Glacier Flexible Retrieval: 99.99% availability, 99.9% SLA, >= 3 AZ, 90-day min. storage duration, 40 KB min. billable object size, per-GB retrieval fee
• Glacier Deep Archive: 99.99% availability, 99.9% SLA, >= 3 AZ, 180-day min. storage duration, 40 KB min. billable object size, per-GB retrieval fee
https://aws.amazon.com/s3/storage-classes/
S3 Storage Classes – Price Comparison (example: us-east-1)
• Storage cost (per GB per month): Standard $0.023, Intelligent-Tiering $0.0025 - $0.023, Standard-IA $0.0125, One Zone-IA $0.01, Glacier Instant Retrieval $0.004, Glacier Flexible Retrieval $0.0036, Glacier Deep Archive $0.00099
• Retrieval cost (per 1000 requests): Standard & Intelligent-Tiering GET $0.0004 / POST $0.005; Standard-IA & One Zone-IA GET $0.001 / POST $0.01; Glacier Instant Retrieval GET $0.01 / POST $0.02; Glacier Flexible Retrieval GET $0.0004 / POST $0.03, plus Expedited $10, Standard $0.05, Bulk free; Glacier Deep Archive GET $0.0004 / POST $0.05, plus Standard $0.10, Bulk $0.025
• Retrieval time: instantaneous for all classes except Glacier Flexible Retrieval (Expedited 1 – 5 mins, Standard 3 – 5 hours, Bulk 5 – 12 hours) and Glacier Deep Archive (Standard 12 hours, Bulk 48 hours)
• Monitoring cost (per 1000 objects): Intelligent-Tiering $0.0025
https://aws.amazon.com/s3/pricing/
AWS CLI, SDK, IAM Roles &
Policies
EC2 Instance Metadata (IMDS)
• AWS EC2 Instance Metadata (IMDS) is powerful but one of the least known
features to developers
• It allows AWS EC2 instances to ”learn about themselves” without using an
IAM Role for that purpose.
• The URL is http://169.254.169.254/latest/meta-data
• You can retrieve the IAM Role name from the metadata, but you CANNOT
retrieve the IAM Policy.
• Metadata = Info about the EC2 instance
• Userdata = launch script of the EC2 instance

• Let’s practice and see what we can do with it!


IMDSv2 vs. IMDSv1
• IMDSv1 is accessing http://169.254.169.254/latest/meta-data directly
• IMDSv2 is more secure and is done in two steps:
1. Get Session Token (limited validity) – using headers & PUT
2. Use Session Token in IMDSv2 calls – using headers
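A minimal sketch of the two IMDSv2 steps in Python, assuming the requests library is available on the instance (not part of the course material):

import requests  # assumption: installed on the EC2 instance

# Step 1 – get a session token (PUT with a TTL header)
token = requests.put(
    "http://169.254.169.254/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
).text

# Step 2 – use the token on subsequent metadata calls
role_name = requests.get(
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
    headers={"X-aws-ec2-metadata-token": token},
).text
print(role_name)  # prints the IAM Role name attached to the instance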
MFA with CLI
• To use MFA with the CLI, you must create a temporary session
• To do so, you must run the STS GetSessionToken API call
• aws sts get-session-token --serial-number arn-of-the-mfa-device --token-code code-from-token --duration-seconds 3600
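The same call can be made from the SDK; a hedged boto3 sketch, with a placeholder MFA device ARN and token code:

import boto3

sts = boto3.client("sts")
response = sts.get_session_token(
    SerialNumber="arn:aws:iam::123456789012:mfa/my-user",  # hypothetical MFA device ARN
    TokenCode="123456",                                    # code from the MFA device
    DurationSeconds=3600,
)
# Temporary AccessKeyId / SecretAccessKey / SessionToken for the session
creds = response["Credentials"]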
AWS SDK Overview
• What if you want to perform actions on AWS directly from your applications
code ? (without using the CLI).
• You can use an SDK (software development kit) !
• Official SDKs are…
• Java
• .NET
• Node.js
• PHP
• Python (named boto3 / botocore)
• Go
• Ruby
• C++
AWS SDK Overview
• We have to use the AWS SDK when coding against AWS Services such as DynamoDB
• Fun fact… the AWS CLI uses the Python SDK (boto3)
• The exam expects you to know when you should use an SDK
• We’ll practice the AWS SDK when we get to the Lambda functions
• Good to know: if you don’t specify or configure a default region, then us-east-1 will be chosen by default
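A minimal boto3 (Python SDK) sketch listing buckets, with the region passed explicitly rather than relying on the default:

import boto3

# If no region is configured anywhere, the SDK would fall back to us-east-1
s3 = boto3.client("s3", region_name="us-east-1")

for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])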
AWS Limits (Quotas)
• API Rate Limits
• DescribeInstances API for EC2 has a limit of 100 calls per seconds
• GetObject on S3 has a limit of 5500 GET per second per prefix
• For Intermittent Errors: implement Exponential Backoff
• For Consistent Errors: request an API throttling limit increase

• Service Quotas (Service Limits)


• Running On-Demand Standard Instances: 1152 vCPU
• You can request a service limit increase by opening a ticket
• You can request a service quota increase by using the Service Quotas API
Exponential Backoff (any AWS service)
• If you get ThrottlingException intermittently, use exponential backoff
• Retry mechanism already included in AWS SDK API calls
• Must implement yourself if using the AWS API as-is or in specific cases
• Must only implement the retries on 5xx server errors and throttling
• Do not implement on the 4xx client errors
• Each retry waits longer than the previous one (e.g., 1s, 2s, 4s, 8s, …), as in the sketch below
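A minimal sketch of exponential backoff with jitter in Python; is_retryable() is a hypothetical helper that would check for throttling / 5xx server errors, not a real AWS function:

import random
import time

def call_with_backoff(api_call, max_retries=5):
    for attempt in range(max_retries):
        try:
            return api_call()
        except Exception as error:
            # Only retry throttling / 5xx server errors, never 4xx client errors
            if not is_retryable(error):        # hypothetical helper
                raise
            # Wait 1s, 2s, 4s, 8s, ... plus a small random jitter
            time.sleep((2 ** attempt) + random.random())
    return api_call()  # final attempt, let any exception propagate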
AWS CLI Credentials Provider Chain
• The CLI will look for credentials in this order

1. Command line options – --region, --output, and --profile


2. Environment variables – AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
and AWS_SESSION_TOKEN
3. CLI credentials file –aws configure
~/.aws/credentials on Linux / Mac & C:\Users\user\.aws\credentials on Windows
4. CLI configuration file – aws configure
~/.aws/config on Linux / macOS & C:\Users\USERNAME\.aws\config on Windows
5. Container credentials – for ECS tasks
6. Instance profile credentials – for EC2 Instance Profiles
AWS SDK Default Credentials Provider Chain
• The Java SDK (example) will look for credentials in this order

1. Java system proper ties – aws.accessKeyId and aws.secretKey


2. Environment variables –
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
3. The default credential profiles file – ex at: ~/.aws/credentials, shared by
many SDK
4. Amazon ECS container credentials – for ECS containers
5. Instance profile credentials– used on EC2 instances
AWS Credentials Scenario
• An application deployed on an EC2 instance is using environment variables
with credentials from an IAM user to call the Amazon S3 API.
• The IAM user has S3FullAccess permissions.
• The application only uses one S3 bucket, so according to best practices:
• An IAM Role & EC2 Instance Profile was created for the EC2 instance
• The Role was assigned the minimum permissions to access that one S3 bucket

• The IAM Instance Profile was assigned to the EC2 instance, but it still had access to all S3 buckets. Why?
• Because the credentials chain still gives priority to the environment variables
AWS Credentials Best Practices
• Overall, NEVER EVER STORE AWS CREDENTIALS IN YOUR CODE
• Best practice is for credentials to be inherited from the credentials chain

• If working within AWS, use IAM Roles


• => EC2 Instances Roles for EC2 Instances
• => ECS Roles for ECS tasks
• => Lambda Roles for Lambda functions
• If working outside of AWS, use environment variables / named profiles
Signing AWS API requests
• When you call the AWS HTTP API, you sign the request so that AWS
can identify you, using your AWS credentials (access key & secret key)
• Note: some requests to Amazon S3 don’t need to be signed
• If you use the SDK or CLI, the HTTP requests are signed for you

• You should sign an AWS HTTP request using Signature v4 (SigV4)


SigV4 Request examples
• HTTP Header option (signature in the Authorization header)
• Query String option, ex: S3 pre-signed URLs (signature in the X-Amz-Signature query parameter)
Amazon S3 – Advanced
Amazon S3 – Moving between Storage Classes
• You can transition objects between storage classes (Standard, Standard IA, Intelligent Tiering, One-Zone IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive)
• For infrequently accessed objects, move them to Standard IA
• For archive objects that you don’t need fast access to, move them to Glacier or Glacier Deep Archive
• Moving objects can be automated using Lifecycle Rules
Amazon S3 – Lifecycle Rules
• Transition Actions – configure objects to transition to another storage class
• Move objects to Standard IA class 60 days after creation
• Move to Glacier for archiving after 6 months
• Expiration actions – configure objects to expire (delete) after some time
• Access log files can be set to delete after 365 days
• Can be used to delete old versions of files (if versioning is enabled)
• Can be used to delete incomplete Multi-Part uploads
• Rules can be created for a certain prefix (example: s3://mybucket/mp3/*)
• Rules can be created for certain object Tags (example: Department: Finance) – see the example rule below
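A hedged boto3 sketch of such a rule (the bucket name, prefix and day values are illustrative placeholders, not values from the course):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-mp3",
            "Filter": {"Prefix": "mp3/"},            # rule scoped to a prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},   # transition action
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},             # expiration action
        }]
    },
)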
Amazon S3 – Lifecycle Rules (Scenario 1)
• Your application on EC2 creates images thumbnails after profile
photos are uploaded to Amazon S3. These thumbnails can be easily
recreated, and only need to be kept for 60 days. The source images
should be able to be immediately retrieved for these 60 days, and
afterwards, the user can wait up to 6 hours. How would you design
this?

• S3 source images can be on Standard, with a lifecycle configuration to


transition them to Glacier after 60 days
• S3 thumbnails can be on One-Zone IA, with a lifecycle configuration to
expire them (delete them) after 60 days
Amazon S3 – Lifecycle Rules (Scenario 2)
• A rule in your company states that you should be able to recover your
deleted S3 objects immediately for 30 days, although this may happen
rarely. After this time, and for up to 365 days, deleted objects should
be recoverable within 48 hours.

• Enable S3 Versioning in order to have object versions, so that “deleted


objects” are in fact hidden by a “delete marker” and can be recovered
• Transition the “noncurrent versions” of the object to Standard IA
• Transition afterwards the “noncurrent versions” to Glacier Deep Archive
Amazon S3 Analytics – Storage Class Analysis
• Help you decide when to transition objects to the right storage class
• Recommendations for Standard and Standard IA
• Does NOT work for One-Zone IA or Glacier
• Report (.csv, with fields such as Date, StorageClass, ObjectAge) is updated daily
• 24 to 48 hours to start seeing data analysis
• Good first step to put together Lifecycle Rules (or improve them)!
S3 Event Notifications
• S3:ObjectCreated, S3:ObjectRemoved, S3:ObjectRestore, S3:Replication…
• Object name filtering possible (*.jpg)
• Use case: generate thumbnails of images uploaded to S3
• Can create as many “S3 events” as desired
• Destinations: SNS, SQS, Lambda functions
• S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer
S3 Event Notifications – IAM Permissions

SNS

SNS Resource (Access) Policy


events

Amazon S3 SQS
SQS Resource (Access) Policy

Lambda Function
Lambda Resource Policy
S3 Event Notifications
with Amazon EventBridge
events All events rules Over 18
AWS services
as destinations
Amazon S3 Amazon
bucket EventBridge

• Advanced filtering options with JSON rules (metadata, object size,


name...)
• Multiple Destinations – ex Step Functions, Kinesis Streams / Firehose…
• EventBridge Capabilities – Archive, Replay Events, Reliable delivery
S3 – Baseline Performance
• Amazon S3 automatically scales to high request rates, latency 100-200 ms
• Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or
5,500 GET/HEAD requests per second per prefix in a bucket.
• There are no limits to the number of prefixes in a bucket.
• Example (object path => prefix):
• bucket/folder1/sub1/file => /folder1/sub1/
• bucket/folder1/sub2/file => /folder1/sub2/
• bucket/1/file => /1/
• bucket/2/file => /2/
• If you spread reads across all four prefixes evenly, you can achieve 22,000
requests per second for GET and HEAD
S3 Performance
• Multi-Part upload:
• Recommended for files > 100MB, must use for files > 5GB
• Can help parallelize uploads (speed up transfers) – the file is divided into parts that are uploaded in parallel (see the sketch below)
• S3 Transfer Acceleration
• Increase transfer speed by transferring the file to an AWS edge location which will forward the data to the S3 bucket in the target region (e.g., a file in the USA uploaded to a bucket in Australia travels the public www only to the nearby edge location, then the private AWS network)
• Compatible with multi-part upload
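A minimal boto3 sketch that enables multi-part, parallel uploads through TransferConfig (the file and bucket names are placeholders):

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multi-part upload above 100 MB, with up to 10 parallel part uploads
config = TransferConfig(multipart_threshold=100 * 1024 * 1024, max_concurrency=10)

s3.upload_file("big-file.zip", "my-bucket", "big-file.zip", Config=config)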
S3 Performance – S3 Byte-Range Fetches
• Parallelize GETs by requesting specific byte ranges
• Better resilience in case of failures
• Can be used to speed up downloads (request the parts in parallel)
• Can be used to retrieve only partial data (for example the head of a file, by requesting only the first XX bytes)
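A minimal boto3 sketch of a byte-range fetch (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

# Fetch only the first 1024 bytes of the object (e.g., a file header)
response = s3.get_object(Bucket="my-bucket", Key="big-file.csv", Range="bytes=0-1023")
header_bytes = response["Body"].read()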
S3 Select & Glacier Select
• Retrieve less data using SQL by performing server-side filtering
• Can filter by rows & columns (simple SQL statements)
• Less network transfer, less CPU cost client-side
• https://aws.amazon.com/blogs/aws/s3-glacier-select/
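A hedged boto3 sketch of S3 Select against a CSV object (bucket, key and query are illustrative only):

import boto3

s3 = boto3.client("s3")

response = s3.select_object_content(
    Bucket="my-bucket",
    Key="data.csv",
    ExpressionType="SQL",
    Expression="SELECT s.name, s.city FROM S3Object s WHERE s.city = 'Paris'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},  # use the header row
    OutputSerialization={"CSV": {}},
)

# The result comes back as an event stream; print the filtered records
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())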
S3 User-Defined Object Metadata & S3 Object Tags
• S3 User-Defined Object Metadata
• When uploading an object, you can also assign metadata
• Name-value (key-value) pairs (e.g., Content-Length, Content-Type, x-amz-meta-origin)
• User-defined metadata names must begin with "x-amz-meta-”
• Amazon S3 stores user-defined metadata keys in lowercase
• Metadata can be retrieved while retrieving the object
• S3 Object Tags
• Key-value pairs for objects in Amazon S3 (e.g., Project=Blue, PHI=True)
• Useful for fine-grained permissions (only access specific objects with specific tags)
• Useful for analytics purposes (using S3 Analytics to group by tags)
• You cannot search the object metadata or object tags
• Instead, you must use an external DB as a search index such as DynamoDB
Amazon S3 – Security
Amazon S3 – Object Encryption
• You can encrypt objects in S3 buckets using one of 4 methods

• Server-Side Encryption (SSE)


• Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) – Enabled by
Default
• Encrypts S3 objects using keys handled, managed, and owned by AWS
• Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)
• Leverage AWS Key Management Service (AWS KMS) to manage encryption keys
• Server-Side Encryption with Customer-Provided Keys (SSE-C)
• When you want to manage your own encryption keys
• Client-Side Encryption

• It’s important to understand which ones are for which situation for the exam
Amazon S3 Encryption – SSE-S3
• Encryption using keys handled, managed, and owned by AWS
• Object is encrypted server-side
• Encryption type is AES-256
• Must set header "x-amz-server-side-encryption": "AES256"
• Enabled by default for new buckets & new objects
Amazon S3 Encryption – SSE-KMS
• Encryption using keys handled and managed by AWS KMS (Key Management Service)
• KMS advantages: user control + audit key usage using CloudTrail
• Object is encrypted server side
• Must set header "x-amz-server-side-encryption": "aws:kms"
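A hedged boto3 sketch showing how the SSE-S3 and SSE-KMS headers map to put_object parameters (bucket, keys and the KMS alias are placeholders):

import boto3

s3 = boto3.client("s3")

# SSE-S3: sets the x-amz-server-side-encryption: AES256 header
s3.put_object(Bucket="my-bucket", Key="file1.txt", Body=b"hello", ServerSideEncryption="AES256")

# SSE-KMS: sets the aws:kms header, optionally naming a specific KMS key
s3.put_object(
    Bucket="my-bucket",
    Key="file2.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-key",  # hypothetical KMS key alias
)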
SSE-KMS Limitation
• If you use SSE-KMS, you may be impacted S3 Bucket KMS Key
by the KMS limits API call
• When you upload, it calls the
GenerateDataKey KMS API
Upload / download
• When you download, it calls the Decrypt SSE-KMS
KMS API
• Count towards the KMS quota per second
(5500, 10000, 30000 req/s based on region)
Users
• You can request a quota increase using the
Service Quotas Console
Amazon S3 Encryption – SSE-C
• Server-Side Encryption using keys fully managed by the customer outside of AWS
• Amazon S3 does NOT store the encryption key you provide
• HTTPS must be used
• Encryption key must be provided in HTTP headers, for every HTTP request made
Amazon S3 Encryption – Client-Side Encryption
• Use client libraries such as Amazon S3 Client-Side Encryption Library
• Clients must encrypt data themselves before sending to Amazon S3
• Clients must decrypt data themselves when retrieving from Amazon S3
• Customer fully manages the keys and encryption cycle
Amazon S3 – Encryption in transit (SSL/TLS)
• Encryption in flight is also called SSL/TLS

• Amazon S3 exposes two endpoints:


• HTTP Endpoint – non encrypted
• HTTPS Endpoint – encryption in flight

• HTTPS is recommended
• HTTPS is mandatory for SSE-C
• Most clients would use the HTTPS endpoint by default
Amazon S3 – Force Encryption in Transit (aws:SecureTransport)
• A bucket policy can deny any request made over plain HTTP (aws:SecureTransport = false) while allowing HTTPS requests, as in the sketch below
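A hedged sketch of such a bucket policy applied with boto3 (the bucket name is a placeholder; the policy shown in the course slide is an image and is not reproduced verbatim):

import boto3, json

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        # Deny any request that is not made over HTTPS
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }]
}

s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))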
Amazon S3 – Default Encryption vs. Bucket Policies
• SSE-S3 encryption is automatically applied to new objects stored in S3 bucket
• Optionally, you can “force encryption” using a bucket policy and refuse any API call
to PUT an S3 object without encryption headers (SSE-KMS or SSE-C)

• Note: Bucket Policies are evaluated before “Default Encryption”


What is CORS?
• Cross-Origin Resource Sharing (CORS)
• Origin = scheme (protocol) + host (domain) + por t
• example: https://www.example.com (implied port is 443 for HTTPS, 80 for HTTP)
• Web Browser based mechanism to allow requests to other origins while
visiting the main origin
• Same origin: http://example.com/app1 & http://example.com/app2
• Different origins: http://www.example.com & http://other.example.com
• The requests won’t be fulfilled unless the other origin allows for the
requests, using CORS Headers (example: Access-Control-Allow-Origin)
What is CORS?
OPTIONS /
Host: www.other.com
Origin: https://www.example.com
Preflight Request

Access-Control-Allow-Origin: https://www.example.com
Access-Control-Allow-Methods: GET, PUT, DELETE
HTTPS Request
Preflight Response
Web Browser
Web Server Web Server
(Origin) GET / (Cross-Origin)
https://www.example.com Host: www.other.com https://www.other.com
Origin: https://www.example.com
CORS Headers received already by the Origin
The Web Browser can make requests
Amazon S3 – CORS
• If a client makes a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
• It’s a popular exam question
• You can allow for a specific origin or for * (all origins)
• Example: a static website hosted in my-bucket-html loads images from my-bucket-assets; the assets bucket must return Access-Control-Allow-Origin for the website’s origin (see the sketch below)
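A hedged boto3 sketch of the corresponding CORS configuration on the assets bucket (bucket and origin names reuse the example above):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="my-bucket-assets",
    CORSConfiguration={
        "CORSRules": [{
            # Only allow GETs coming from the website bucket's origin
            "AllowedOrigins": ["http://my-bucket-html.s3-website.us-west-2.amazonaws.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]
    },
)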
Amazon S3 – MFA Delete
• MFA (Multi-Factor Authentication) – force users to generate a code on a
device (usually a mobile phone or hardware) before doing important
operations on S3
• MFA will be required to:
• Permanently delete an object version Google Authenticator
• Suspend Versioning on the bucket
• MFA won’t be required to:
• Enable Versioning
• List deleted versions MFA Hardware Device
• To use MFA Delete, Versioning must be enabled on the bucket
• Only the bucket owner (root account) can enable/disable MFA Delete
S3 Access Logs
• For audit purposes, you may want to log all access to S3 buckets
• Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
• That data can be analyzed using data analysis tools…
• The target logging bucket must be in the same AWS region
• The log format is at: https://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
S3 Access Logs: Warning
• Do not set your logging bucket to be the monitored bucket
• It will create a logging loop, and your bucket will grow exponentially
• Do not try this at home :)
Amazon S3 – Pre-Signed URLs
• Generate pre-signed URLs using the S3 Console, AWS CLI or SDK
• URL Expiration
• S3 Console – 1 min up to 720 mins (12 hours)
• AWS CLI – configure expiration with the --expires-in parameter in seconds (default 3600 secs, max. 604800 secs ~ 168 hours)
• Users given a pre-signed URL inherit the permissions of the user that generated the URL for GET / PUT
• Examples:
• Allow only logged-in users to download a premium video from your S3 bucket
• Allow an ever-changing list of users to download files by generating URLs dynamically
• Allow temporarily a user to upload a file to a precise location in your S3 bucket
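A minimal boto3 sketch generating a pre-signed GET URL (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

# URL valid for one hour; it inherits the permissions of the credentials that signed it
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "premium/video.mp4"},
    ExpiresIn=3600,
)
print(url)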
S3 – Access Points
• Access Points simplify security management for S3 Buckets
• Example: a Finance Access Point grants R/W to the /finance prefix, a Sales Access Point grants R/W to the /sales prefix, and an Analytics Access Point grants read access to the entire bucket, while the bucket itself keeps a simple bucket policy
• Each Access Point has:
• its own DNS name (Internet Origin or VPC Origin)
• an access point policy (similar to bucket policy) – manage security at scale
S3 – Access Points – VPC Origin
• We can define the access point to be accessible only from within the VPC
• You must create a VPC Endpoint to access the Access Point (Gateway or Interface Endpoint)
• The VPC Endpoint Policy must allow access to the target bucket and Access Point
S3 Object Lambda
• Use AWS Lambda Functions to change the object before it is retrieved by the caller application
• Only one S3 bucket is needed, on top of which we create an S3 Access Point and S3 Object Lambda Access Points
• Use Cases:
• Redacting personally identifiable information for analytics or non-production environments
• Converting across data formats, such as converting XML to JSON
• Resizing and watermarking images on the fly using caller-specific details, such as the user who requested the object
Amazon CloudFront
Amazon CloudFront
• Content Delivery Network (CDN)
• Improves read performance, content is cached at the edge
• Improves users experience
• 216 Points of Presence globally (edge locations)
• DDoS protection (because worldwide), integration with Shield, AWS Web Application Firewall
• Source: https://aws.amazon.com/cloudfront/features/
CloudFront – Origins
• S3 bucket
• For distributing files and caching them at the edge
• Enhanced security with CloudFront Origin Access Control (OAC)
• OAC is replacing Origin Access Identity (OAI)
• CloudFront can be used as an ingress (to upload files to S3)

• Custom Origin (HTTP)


• Application Load Balancer
• EC2 instance
• S3 website (must first enable the bucket as a static S3 website)
• Any HTTP backend you want
CloudFront at a high level
GET /beach.jpg?size=300x300 HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.example.com
Accept-Encoding: gzip, deflate
• Diagram: the Client sends the HTTP request to a CloudFront Edge Location, which checks its local cache and forwards the request to your Origin (S3 or HTTP) on a miss
CloudFront – S3 as an Origin
• Diagram: Edge Locations (e.g., Los Angeles, Mumbai, São Paulo, Melbourne) serve public www traffic and fetch content from the Origin (S3 bucket) over the private AWS network
• Access to the bucket is secured with Origin Access Control (OAC) + S3 bucket policy
CloudFront vs S3 Cross Region Replication
• CloudFront:
• Global Edge network
• Files are cached for a TTL (maybe a day)
• Great for static content that must be available everywhere
• S3 Cross Region Replication:
• Must be setup for each region you want replication to happen
• Files are updated in near real-time
• Read only
• Great for dynamic content that needs to be available at low-latency in few
regions
CloudFront Caching
• The cache lives at each CloudFront Edge Location
• CloudFront identifies each object in the cache using the Cache Key (see next slide)
• You want to maximize the Cache Hit ratio to minimize requests to the origin
• You can invalidate part of the cache using the CreateInvalidation API
• Cached objects expire based on their TTL
What is CloudFront Cache Key?
• A unique identifier for every object in the cache
• By default, consists of hostname + resource portion of the URL
• If you have an application that serves up content that varies based on user, device, language, location…
• You can add other elements (HTTP headers, cookies, query strings) to the Cache Key using CloudFront Cache Policies
• Example request: GET /content/stories/example-story.html?ref=123abc&split-pages=false HTTP/1.1 (Host: mywebsite.com, User-Agent, Date, Authorization, Cookie: session_id=12344321, …)
• Default Cache Key: mywebsite.com + /content/stories/example-story.html
• Cache Hit: get the object from the Edge Location if it exists; Cache Miss: get the object from the Origin (e.g., EC2 Instance)
CloudFront Policies – Cache Policy
• Cache based on:
• HTTP Headers: None – Whitelist
• Cookies: None – Whitelist – Include All-Except – All
• Query Strings: None – Whitelist – Include All-Except – All
• Control the TTL (0 seconds to 1 year), can be set by the origin using
the Cache-Control header, Expires header…
• Create your own policy or use Predefined Managed Policies
• All HTTP headers, cookies, and query strings that you include in the
Cache Key are automatically included in origin requests
CloudFront Caching – Cache Policy
HTTP Headers
• None:
• Don’t include any headers in the Cache Key (except default)
• Headers are not forwarded (except default)
• Best caching performance
• Whitelist:
• Only specified headers included in the Cache Key
• Specified headers are also forwarded to Origin
• Example request: GET /blogs/myblog.html HTTP/1.1 with Host, User-Agent, Date, Authorization, Keep-Alive and Language: fr-fr headers
CloudFront Cache – Cache Policy
Query Strings
• None
• Don’t include any query strings in the Cache Key
• Query strings are not forwarded
• Whitelist
• Only specified query strings included in the Cache Key
• Only specified query strings are forwarded
• Include All-Except
• Include all query strings in the Cache Key except the specified list
• All query strings are forwarded except the specified list
• All
• Include all query strings in the Cache Key
• All query strings are forwarded
• Worst caching performance
• Example request: GET /image/cat.jpg?border=red&size=large HTTP/1.1
CloudFront Policies – Origin Request Policy
• Specify values that you want to include in origin requests without
including them in the Cache Key (no duplicated cached content)
• You can include:
• HTTP headers: None – Whitelist – All viewer headers options
• Cookies: None – Whitelist – All
• Query Strings: None – Whitelist – All
• Ability to add CloudFront HTTP headers and Custom Headers to an
origin request that were not included in the viewer request
• Create your own policy or use Predefined Managed Policies
Cache Policy vs. Origin Request Policy
• Example: the client request (Client → CloudFront) contains many headers, cookies, and query strings; only a subset is forwarded to the Origin (CloudFront → Origin, e.g., an EC2 Instance) based on the policies
• Cache Policy – Cache Key: hostname (mywebsite.com), resource path (/content/stories/example-story.html), Header: Authorization
• Origin Request Policy (whitelist) – HTTP Headers: User-Agent, Authorization; Cookies: session_id; Query Strings: ref
CloudFront – Cache Invalidations
• In case you update the back-end origin, CloudFront doesn’t know about it and will only get the refreshed content after the TTL has expired
• However, you can force an entire or partial cache refresh (thus bypassing the TTL) by performing a CloudFront Invalidation
• You can invalidate all files (*) or a specific path (/images/*)
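• Illustration: a minimal Python (boto3) sketch of the CreateInvalidation API; the distribution ID is a placeholder:
import time
import boto3

cf = boto3.client("cloudfront")

# Invalidate index.html and everything under /images/
cf.create_invalidation(
    DistributionId="E1ABCDEFGHIJKL",
    InvalidationBatch={
        "Paths": {"Quantity": 2, "Items": ["/index.html", "/images/*"]},
        "CallerReference": str(time.time()),  # any unique string per invalidation
    },
)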
CloudFront – Cache Behaviors
• Route To Multiple Origins
• Configure different settings for a given URL path pattern
• Example: one specific cache behavior for images/*.jpg files on your origin web server
• Route to different kinds of origins/origin groups based on the content type or path pattern
• /images/*
• /api/*
• /* (default cache behavior)
• When adding additional Cache Behaviors, the Default Cache Behavior is always the last to be processed and is always /*
CloudFront – Cache Behaviors – Sign In Page
• Diagram: users authenticate against the /login cache behavior (EC2 Instance origin), which generates Signed Cookies; the default /* behavior then serves content from the S3 Bucket origin using those Signed Cookies
CloudFront – Maximize cache hits by
separating static and dynamic distributions
• Dynamic content (REST, HTTP server): CloudFront → ALB + EC2, cache based on the correct headers and cookies
• Static requests: static content with no headers / session caching rules – required for maximizing cache hits
CloudFront – ALB or EC2 as an origin
• EC2 as an origin: the EC2 instances must be public, with a Security Group allowing the Public IPs of the Edge Locations (list at http://d7uri8nf7uskq.cloudfront.net/tools/list-cloudfront-ips)
• ALB as an origin: the Application Load Balancer must be public, with a Security Group allowing the Public IPs of the Edge Locations; the EC2 instances can be private, with a Security Group allowing the Security Group of the Load Balancer
CloudFront Geo Restriction

• You can restrict who can access your distribution


• Allowlist: Allow your users to access your content only if they're in one of the
countries on a list of approved countries.
• Blocklist: Prevent your users from accessing your content if they're in one of the
countries on a list of banned countries.

• The “country” is determined using a 3rd party Geo-IP database


• Use case: Copyright Laws to control access to content
CloudFront Signed URL / Signed Cookies
• You want to distribute paid shared content to premium users over the world
• We can use CloudFront Signed URL / Cookie. We attach a policy with:
• Includes URL expiration
• Includes IP ranges to access the data from
• Trusted signers (which AWS accounts can create signed URLs)
• How long should the URL be valid for?
• Shared content (movie, music): make it short (a few minutes)
• Private content (private to the user): you can make it last for years

• Signed URL = access to individual files (one signed URL per file)
• Signed Cookies = access to multiple files (one signed cookie for many files)
CloudFront Signed URL Diagram
• Diagram: the client authenticates (authentication + authorization) against your application, which uses the AWS SDK to generate and return a Signed URL; the client then uses the Signed URL against a CloudFront Edge location, which retrieves the object from Amazon S3 via OAC
CloudFront Signed URL vs
S3 Pre-Signed URL
• CloudFront Signed URL: • S3 Pre-Signed URL:
• Allow access to a path, no matter • Issue a request as the person who
the origin pre-signed the URL
• Account wide key-pair, only the root • Uses the IAM key of the signing
can manage it IAM principal
• Can filter by IP, path, date, expiration • Limited lifetime
• Can leverage caching features
CloudFront Signed URL Process
• Two types of signers:
• Either a trusted key group (recommended)
• Can leverage APIs to create and rotate keys (and IAM for API security)
• An AWS Account that contains a CloudFront Key Pair
• Need to manage keys using the root account and the AWS console
• Not recommended because you shouldn’t use the root account for this
• In your CloudFront distribution, create one or more trusted key groups
• You generate your own public / private key
• The private key is used by your applications (e.g. EC2) to sign URLs
• The public key (uploaded) is used by CloudFront to verify URLs
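• Illustration: a hedged Python sketch of generating a Signed URL with botocore's CloudFrontSigner; the key ID, private key file, URL, and expiration date are placeholders, and the third-party rsa package is just one possible signing library:
import datetime
import rsa  # pip install rsa (one way to sign with the private key)
from botocore.signers import CloudFrontSigner

def rsa_signer(message):
    # Private key matching the public key uploaded to the trusted key group
    with open("private_key.pem", "rb") as f:
        private_key = rsa.PrivateKey.load_pkcs1(f.read())
    return rsa.sign(message, private_key, "SHA-1")

signer = CloudFrontSigner("K2JCJMDEHXQW5F", rsa_signer)  # placeholder public key ID

url = signer.generate_presigned_url(
    "https://d1234abcd.cloudfront.net/private/video.mp4",
    date_less_than=datetime.datetime(2026, 1, 1),
)
print(url)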
CloudFront - Pricing
• CloudFront Edge locations are all around the world
• The cost of data out per edge location varies
CloudFront – Price Classes
• You can reduce the number of edge locations for cost reduction
• Three price classes:
1. Price Class All: all regions – best performance
2. Price Class 200: most regions, but excludes the most expensive regions
3. Price Class 100: only the least expensive regions
CloudFront – Multiple Origin
• To route to different kind of origins based on the content type
• Based on path pattern:
• /images/*
• /api/*
• /*
• Example: the /api/* cache behavior routes to an Application Load Balancer origin, the /* cache behavior routes to an S3 Bucket origin
CloudFront – Origin Groups
• To increase high-availability and do failover
• Origin Group: one primary and one secondary origin
• If the primary origin fails, the second one is used
• Diagram: the client request goes through CloudFront to Origin A (Primary Origin); if Origin A responds with an error status code, CloudFront tries the same request against Origin B, which responds with an OK status code
• With S3 origins, combine an Origin Group with S3 Replication for Region-level High Availability
CloudFront – Field Level Encryption
• Protect user sensitive information through application stack
• Adds an additional layer of security along with HTTPS
• Sensitive information encrypted at the edge close to user
• Uses asymmetric encryption
• Usage:
• Specify set of fields in POST requests that you want to be encrypted (up to 10 fields)
• Specify the public key to encrypt them
• Diagram: the client sends a POST /submit over HTTPS; the Edge Location encrypts the specified fields using the Public Key, and the web servers behind the Application Load Balancer decrypt them using the Private Key
CloudFront – Real Time Logs
• Get real-time requests received by CloudFront sent to Kinesis Data Streams
• Monitor, analyze, and take actions based on content delivery performance
• Allows you to choose:
• Sampling Rate – percentage of requests for which you want to receive logs
• Specific fields and specific Cache Behaviors (path patterns)
• Real-time Processing: Users → CloudFront (requests) → Kinesis Data Streams (logs) → Lambda (records)
• Near Real-time Processing: Users → CloudFront (requests) → Kinesis Data Streams (logs) → Kinesis Data Firehose (records)
Containers on AWS
What is Docker?
• Docker is a software development platform to deploy apps
• Apps are packaged in containers that can be run on any OS
• Apps run the same, regardless of where they’re run
• Any machine
• No compatibility issues
• Predictable behavior
• Less work
• Easier to maintain and deploy
• Works with any language, any OS, any technology
• Use cases: microservices architecture, lift-and-shift apps from on-
premises to the AWS cloud, …
Docker on an OS
Server (e.g., EC2 instance)
Where are Docker images stored?
• Docker images are stored in Docker Repositories
• Docker Hub (https://hub.docker.com)
• Public repository
• Find base images for many technologies or OS (e.g., Ubuntu, MySQL, …)
• Amazon ECR (Amazon Elastic Container Registry)
• Private repository
• Public repository (Amazon ECR Public Gallery https://gallery.ecr.aws)
Docker vs. Virtual Machines
• Docker is ”sort of ” a virtualization technology, but not exactly
• Resources are shared with the host => many containers on one server
• VM stack: Infrastructure → Host OS → Hypervisor → Guest OS (VM) → Apps
• Docker stack: Infrastructure → Host OS (EC2 Instance) → Docker Daemon → Apps in containers
Getting Started with Docker
• Write a Dockerfile → Build an image → Push it to a Docker Repository (e.g., Amazon ECR) → Pull the image → Run it as a container
Docker Containers Management on AWS
• Amazon Elastic Container Service (Amazon ECS)
• Amazon’s own container platform
• Amazon Elastic Kubernetes Service (Amazon EKS)
• Amazon’s managed Kubernetes (open source)
• AWS Fargate
• Amazon’s own Serverless container platform
• Works with ECS and with EKS
• Amazon ECR
• Store container images
Amazon ECS - EC2 Launch Type
• ECS = Elastic Container Service
• Launch Docker containers on AWS = Launch ECS Tasks on ECS Clusters
• EC2 Launch Type: you must provision & maintain the infrastructure (the EC2 instances)
• Each EC2 Instance must run the ECS Agent to register in the ECS Cluster
• AWS takes care of starting / stopping containers
Amazon ECS – Fargate Launch Type
• Launch Docker containers on AWS
• You do not provision the infrastructure (no EC2 instances to manage)
• It’s all Serverless!
• You just create task definitions
• AWS just runs ECS Tasks for you based on the CPU / RAM you need
• To scale, just increase the number of tasks. Simple - no more EC2 instances
Amazon ECS – IAM Roles for ECS
• EC2 Instance Profile (EC2 Launch Type only):
• Used by the ECS agent
• Makes API calls to ECS service
• Send container logs to CloudWatch Logs
• Pull Docker image from ECR
• Reference sensitive data in Secrets Manager or SSM Parameter Store
• ECS Task Role:
• Allows each task to have a specific role
• Use different roles for the different ECS Services you run
• Task Role is defined in the task definition
Amazon ECS – Load Balancer Integrations
• Application Load Balancer supported and works for most use cases
• Network Load Balancer recommended only for high throughput / high performance use cases, or to pair it with AWS PrivateLink
• Classic Load Balancer supported but not recommended (no advanced features – no Fargate)
Amazon ECS – Data Volumes (EFS)
• Mount EFS file systems onto ECS tasks
• Works for both EC2 and Fargate launch types
• Tasks running in any AZ will share the same data in the EFS file system
• Fargate + EFS = Serverless
• Use cases: persistent multi-AZ shared storage for your containers
• Note:
• Amazon S3 cannot be mounted as a file system
ECS Service Auto Scaling
• Automatically increase/decrease the desired number of ECS tasks

• Amazon ECS Auto Scaling uses AWS Application Auto Scaling


• ECS Service Average CPU Utilization
• ECS Service Average Memory Utilization - Scale on RAM
• ALB Request Count Per Target – metric coming from the ALB

• Target Tracking – scale based on target value for a specific CloudWatch metric
• Step Scaling – scale based on a specified CloudWatch Alarm
• Scheduled Scaling – scale based on a specified date/time (predictable changes)

• ECS Service Auto Scaling (task level) ≠ EC2 Auto Scaling (EC2 instance level)
• Fargate Auto Scaling is much easier to setup (because Serverless)
EC2 Launch Type – Auto Scaling EC2 Instances
• Accommodate ECS Service Scaling by adding underlying EC2 Instances

• Auto Scaling Group Scaling


• Scale your ASG based on CPU Utilization
• Add EC2 instances over time

• ECS Cluster Capacity Provider


• Used to automatically provision and scale the infrastructure for your ECS Tasks
• Capacity Provider paired with an Auto Scaling Group
• Add EC2 Instances when you’re missing capacity (CPU, RAM…)
ECS Scaling – Service CPU Usage Example
• Diagram: the ECS Service CPU Usage CloudWatch Metric triggers a CloudWatch Alarm, which scales Service A (a new Task 3 is added) and optionally scales the underlying Auto Scaling Group / ECS Capacity Providers
ECS Rolling Updates
• When updating from v1 to v2, we can control how many tasks can be started and stopped, and in which order (ECS Service update screen)
• Minimum Healthy Percent (0-100%): between this threshold and the actual running capacity (100%), ECS is allowed to terminate tasks
• Maximum Percent (100-200%): between the actual running capacity and this threshold, ECS is allowed to create new tasks
ECS Rolling Update – Min 50%, Max 100%
• Starting number of tasks: 4
• Diagram: 2 v1 tasks are terminated (running at 50% capacity), 2 v2 tasks are started, then the remaining 2 v1 tasks are terminated and replaced with 2 v2 tasks
ECS Rolling Update – Min 100%, Max 150%
• Starting number of tasks: 4
• Diagram: 2 new v2 tasks are started (running at 150% capacity), 2 v1 tasks are terminated (back to 100%), 2 more v2 tasks are started, and the remaining v1 tasks are terminated
ECS tasks invoked by EventBridge
• Diagram: a client uploads an object to an S3 Bucket, which sends an event to Amazon EventBridge; a rule runs a new ECS Task (on AWS Fargate in the ECS Cluster) whose ECS Task Role grants access to S3 & DynamoDB; the task gets the object and saves the result to Amazon DynamoDB
ECS tasks invoked by EventBridge Schedule
• Diagram: an Amazon EventBridge rule fires every 1 hour and runs a new ECS Task (on AWS Fargate) whose ECS Task Role grants access to S3, performing batch processing against Amazon S3
ECS – SQS Queue Example
• Diagram: messages arrive in an SQS Queue; the tasks of Service A poll for messages, and ECS Service Auto Scaling adds tasks as the queue grows
ECS – Intercept Stopped Tasks using EventBridge
• Diagram: when containers exit and an ECS Task is stopped, an event matching an EventBridge Event Pattern triggers SNS, which emails the Administrator
Amazon ECS – Task Definitions
• Task definitions are metadata in JSON form to tell ECS how to run a Docker container
• It contains crucial information, such as:
• Image Name
• Port Binding for Container and Host (e.g., Container Port 80 mapped to Host Port 8080)
• Memory and CPU required
• Environment variables
• Networking information
• IAM Role
• Logging configuration (ex: CloudWatch)
• Can define up to 10 containers in a Task Definition
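• Illustration: a minimal Python (boto3) sketch registering a Task Definition; the family, image, ports, role ARN, and sizes are placeholder values:
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="my-web-app",
    taskRoleArn="arn:aws:iam::123456789012:role/MyEcsTaskRole",
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest",
            "cpu": 256,
            "memory": 512,
            "portMappings": [{"containerPort": 80, "hostPort": 8080}],
            "environment": [{"name": "STAGE", "value": "dev"}],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/my-web-app",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "web",
                },
            },
        }
    ],
)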
Amazon ECS – Load Balancing (EC2 Launch Type)
• We get a Dynamic Host Port Mapping if you define only the container port in the task definition
• The ALB finds the right port on your EC2 Instances
• You must allow, on the EC2 instance’s Security Group, any port from the ALB’s Security Group
Amazon ECS – Load Balancing (Fargate)
• Each task has a unique private IP
• Only define the container port (host port is not applicable)
• Example
• ECS ENI Security Group
• Allow port 80 from the ALB
• ALB Security Group
• Allow port 80/443 from the web
Amazon ECS
One IAM Role per Task Definition
• Diagram: Task Definition A (Service A) uses ECS Task A Role to access S3; Task Definition B (Service B) uses ECS Task B Role to access DynamoDB
Amazon ECS – Environment Variables
• Environment Variable
• Hardcoded – e.g., URLs
• SSM Parameter Store – sensitive variables (e.g., API keys, shared configs)
• Secrets Manager – sensitive variables (e.g., DB passwords)
• Environment Files (bulk) – Amazon S3
• Diagram: the Task Definition fetches values from SSM Parameter Store and Secrets Manager, and fetches environment files from an S3 Bucket
Amazon ECS – Data Volumes (Bind Mounts)
• Share data between multiple containers in the same Task Definition
• Works for both EC2 and Fargate tasks
• EC2 Tasks – using EC2 instance storage
• Data are tied to the lifecycle of the EC2 instance
• Fargate Tasks – using ephemeral storage
• Data are tied to the container(s) using them
• 20 GiB – 200 GiB (default 20 GiB)
• Use cases:
• Share ephemeral data between multiple containers (/var/logs/)
• “Sidecar” container pattern, where the “sidecar” container is used to send metrics/logs to other destinations (separation of concerns)
Amazon ECS – Task Placement
• When an ECS task is started with EC2 Launch Type, ECS must determine where to place it, with the constraints of CPU and memory (RAM)
• Similarly, when a service scales in, ECS needs to determine which task to terminate
• You can define:
• Task Placement Strategy
• Task Placement Constraints
• Note: only for ECS Tasks with EC2
Launch Type (Fargate not supported)
Amazon ECS – Task Placement Process
• Task Placement Strategies are a best effort
• When Amazon ECS places a task, it uses the following process to select the appropriate EC2 container instance:
1. Identify the instances that satisfy the CPU, memory, and port requirements
2. Identify the instances that satisfy the Task Placement Constraints
3. Identify the instances that satisfy the Task Placement Strategies
4. Select the instances
Amazon ECS – Task Placement Strategies
• Binpack
• Tasks are placed on the least available amount of CPU and Memory
• Minimizes the number of EC2 instances in use (cost savings)
Amazon ECS – Task Placement Strategies
• Random
• Tasks are placed randomly
Amazon ECS – Task Placement Strategies
• Spread
• Tasks are placed evenly based on the specified value
• Example: instanceId, attribute:ecs.availability-zone, …
• Example: tasks spread evenly across EC2 instances in us-east-1a, us-east-1b, and us-east-1c
Amazon ECS – Task Placement Strategies
• You can mix them together
Amazon ECS – Task Placement Constraints
• distinctInstance
• Tasks are placed on a different EC2 instance

• memberOf
• Tasks are placed on EC2 instances that satisfy a specified expression
• Uses the Cluster Query Language (advanced)
Amazon ECR
• ECR = Elastic Container Registry
• Store and manage Docker images on AWS
• Private and Public repository (Amazon ECR Public Gallery https://gallery.ecr.aws)
• Fully integrated with ECS, backed by Amazon S3
• Access is controlled through IAM (permission errors => policy)
• Supports image vulnerability scanning, versioning, image tags, image lifecycle, …
Amazon ECR – Using AWS CLI
• Login Command
• AWS CLI v2
aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
• Docker Commands
• Push
docker push aws_account_id.dkr.ecr.region.amazonaws.com/demo:latest
• Pull
docker pull aws_account_id.dkr.ecr.region.amazonaws.com/demo:latest
• In case an EC2 instance (or you) can’t pull a Docker image, check IAM permissions
AWS Copilot
• CLI tool to build, release, and operate production-ready containerized apps
• Run your apps on AppRunner, ECS, and Fargate
• Helps you focus on building apps rather than setting up infrastructure
• Provisions all required infrastructure for containerized apps (ECS, VPC, ELB, ECR…)
• Automated deployments with one command using CodePipeline
• Deploy to multiple environments
• Troubleshooting, logs, health status…
• Diagram: use the CLI or YAML to describe the architecture of your applications; AWS Copilot (CLI for containerized applications) provides a well-architected infrastructure setup, a deployment pipeline, a microservices architecture, and effective operations and troubleshooting, deploying to Amazon ECS, AWS Fargate, or AWS App Runner
Amazon EKS Overview
• Amazon EKS = Amazon Elastic Kubernetes Service
• It is a way to launch managed Kubernetes clusters on AWS
• Kubernetes is an open-source system for automatic deployment, scaling and
management of containerized (usually Docker) application
• It’s an alternative to ECS, similar goal but different API
• EKS supports EC2 if you want to deploy worker nodes or Fargate to deploy
serverless containers
• Use case: if your company is already using Kubernetes on-premises or in
another cloud, and wants to migrate to AWS using Kubernetes
• Kubernetes is cloud-agnostic (can be used in any cloud – Azure, GCP…)
• For multiple regions, deploy one EKS cluster per region
• Collect logs and metrics using CloudWatch Container Insights
Amazon EKS - Diagram
• Diagram: an EKS cluster spans 3 Availability Zones in a VPC; public subnets host the load balancer for a public EKS service and the NAT Gateways, private subnets host the EKS Worker Nodes (in an Auto Scaling Group) running EKS Pods, plus the load balancer for a private EKS service
Amazon EKS – Node Types
• Managed Node Groups
• Creates and manages Nodes (EC2 instances) for you
• Nodes are part of an ASG managed by EKS
• Supports On-Demand or Spot Instances

• Self-Managed Nodes
• Nodes created by you and registered to the EKS cluster and managed by an ASG
• You can use prebuilt AMI - Amazon EKS Optimized AMI
• Supports On-Demand or Spot Instances

• AWS Fargate
• No maintenance required; no nodes managed
Amazon EKS – Data Volumes
• Need to specify StorageClass manifest on your EKS cluster
• Leverages a Container Storage Interface (CSI) compliant driver

• Support for…
• Amazon EBS
• Amazon EFS (works with Fargate)
• Amazon FSx for Lustre
• Amazon FSx for NetApp ONTAP
AWS Elastic Beanstalk
Deploying applications in AWS safely and predictably
Typical architecture: Web App 3-tier
• Diagram: Route 53 → ELB (public subnet) → Auto Scaling group of EC2 instances across 3 Availability Zones (private subnet) → ElastiCache to store / retrieve session data and cached data, and Amazon RDS (Multi AZ) to read / write data (data subnet)
Developer problems on AWS
• Managing infrastructure
• Deploying Code
• Configuring all the databases, load balancers, etc
• Scaling concerns

• Most web apps have the same architecture (ALB + ASG)


• All the developers want is for their code to run!
• Possibly, consistently across different applications and environments
Elastic Beanstalk – Overview
• Elastic Beanstalk is a developer centric view of deploying an application
on AWS
• It uses all the components we’ve seen before: EC2, ASG, ELB, RDS, …
• Managed service
• Automatically handles capacity provisioning, load balancing, scaling, application
health monitoring, instance configuration, …
• Just the application code is the responsibility of the developer
• We still have full control over the configuration
• Beanstalk is free but you pay for the underlying instances
Elastic Beanstalk – Components
• Application: collection of Elastic Beanstalk components (environments,
versions, configurations, …)
• Application Version: an iteration of your application code
• Environment
• Collection of AWS resources running an application version (only one application
version at a time)
• Tiers: Web Server Environment Tier & Worker Environment Tier
• You can create multiple environments (dev, test, prod, …)
• Workflow: Create Application → Upload Version → Launch Environment → Manage Environment (then upload and deploy new versions as you iterate)
Elastic Beanstalk – Supported Platforms
• Go
• Java SE
• Java with Tomcat
• .NET Core on Linux
• .NET on Windows Server
• Node.js
• PHP
• Python
• Ruby
• Packer Builder
• Single Container Docker
• Multi-container Docker
• Preconfigured Docker
• If not supported, you can write your custom platform (advanced)
Web Server Tier vs. Worker Tier
• Web Environment (myapp.us-east-1.elasticbeanstalk.com): an ELB in front of EC2 instances (Web Servers) in an Auto Scaling group across Availability Zones
• Worker Environment: EC2 instances (Workers) in an Auto Scaling group pull messages from an SQS Queue
• Scale based on the number of SQS messages
• Can push messages to the SQS queue from another Web Server Tier
Elastic Beanstalk Deployment Modes
• Single Instance (great for dev): one EC2 Instance with an Elastic IP and an RDS Master in a single Availability Zone
• High Availability with Load Balancer (great for prod): an ALB in front of an Auto Scaling Group of EC2 Instances across Availability Zones, with RDS Master + Standby
Beanstalk Deployment Options for Updates
• All at once (deploy all in one go) – fastest, but instances aren’t available to serve
traffic for a bit (downtime)
• Rolling: update a few instances at a time (bucket), and then move onto the next
bucket once the first bucket is healthy
• Rolling with additional batches: like rolling, but spins up new instances to move the
batch (so that the old application is still available)
• Immutable: spins up new instances in a new ASG, deploys version to these instances,
and then swaps all the instances when everything is healthy
• Blue Green: create a new environment and switch over when ready
• Traffic Splitting: canary testing – send a small % of traffic to new deployment
Elastic Beanstalk Deployment
All at once
• Fastest deployment
• Application has downtime
• Great for quick iterations in development environment
• No additional cost
• Diagram: all v1 instances are stopped, then all instances are relaunched with v2
Elastic Beanstalk Deployment
Rolling
• Application is running below capacity
• Can set the bucket size
• Application is running both versions simultaneously
• No additional cost
• Long deployment
• Diagram: with a bucket size of 2, two instances at a time are stopped and updated from v1 to v2
Elastic Beanstalk Deployment
Rolling with additional batches
• Application is running at capacity
• Can set the bucket size
• Application is running both versions simultaneously
• Small additional cost
• Additional batch is removed at the end of the deployment
• Longer deployment
• Good for prod
• Diagram: new instances running v2 are launched as an additional batch, then v1 instances are updated or terminated bucket by bucket
Elastic Beanstalk Deployment
Immutable
• Zero downtime
• New Code is deployed to new instances on a temporary ASG
• High cost, double capacity
• Longest deployment
• Quick rollback in case of failures (just terminate new ASG)
• Great for prod
• Diagram: v2 instances are launched in a temporary ASG; once healthy they are moved into the current ASG and the v1 instances are terminated
Elastic Beanstalk Deployment
Blue / Green
• Not a “direct feature” of Elastic Beanstalk
• Zero downtime and release facility
• Create a new “stage” environment and deploy v2 there
• The new environment (green) can be validated independently and rolled back if there are issues
• Route 53 can be set up using weighted policies to redirect a little bit of traffic to the stage environment
• Using Beanstalk, “swap URLs” when done with the environment test
• Diagram: Amazon Route 53 sends 90% of web traffic to Environment “blue” (v1) and 10% to Environment “green” (v2)
Elastic Beanstalk - Traffic Splitting
• Canary Testing
• New application version is deployed to a temporary ASG with the same capacity
• A small % of traffic is sent to the temporary ASG for a configurable amount of time
• Deployment health is monitored
• If there’s a deployment failure, this triggers an automated rollback (very quick)
• No application downtime
• New instances are migrated from the temporary to the original ASG
• Old application version is then terminated
• Diagram: the ALB sends 90% of traffic to the main ASG (v1) and 10% to the temporary ASG (v2), then the instances are migrated
Elastic Beanstalk Deployment Summary
from AWS Doc
• https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.deploy-existing-version.html
Elastic Beanstalk CLI
• We can install an additional CLI called the “EB cli” which makes working with
Beanstalk from the CLI easier
• Basic commands are:
• eb create
• eb status
• eb health
• eb events
• eb logs
• eb open
• eb deploy
• eb config
• eb terminate
• It’s helpful for your automated deployment pipelines!
Elastic Beanstalk Deployment Process
• Describe dependencies
(requirements.txt for Python, package.json for Node.js)
• Package code as zip, and describe dependencies
• Python: requirements.txt
• Node.js: package.json
• Console: upload zip file (creates new app version), and then deploy
• CLI: create new app version using CLI (uploads zip), and then deploy

• Elastic Beanstalk will deploy the zip on each EC2 instance, resolve
dependencies and start the application
Beanstalk Lifecycle Policy
• Elastic Beanstalk can store at most 1000 application versions
• If you don’t remove old versions, you won’t be able to deploy anymore
• To phase out old application versions, use a lifecycle policy
• Based on time (old versions are removed)
• Based on space (when you have too many versions)
• Versions that are currently used won’t be deleted
• Option not to delete the source bundle in S3 to prevent data loss
Elastic Beanstalk Extensions
• A zip file containing our code must be deployed to Elastic Beanstalk
• All the parameters set in the UI can be configured with code using files
• Requirements:
• in the .ebextensions/ directory in the root of source code
• YAML / JSON format
• .config extensions (example: logging.config)
• Able to modify some default settings using: option_settings
• Ability to add resources such as RDS, ElastiCache, DynamoDB, etc…

• Resources managed by .ebextensions get deleted if the environment goes away


Elastic Beanstalk Under the Hood
• Under the hood, Elastic Beanstalk relies on CloudFormation
• CloudFormation is used to provision other AWS services (we’ll see later)


Elastic Beanstalk CloudFormation

• Use case: you can define CloudFormation resources in your


.ebextensions to provision ElastiCache, an S3 bucket, anything you want!
• Let’s have a sneak peak into it!
Elastic Beanstalk Cloning
• Clone an environment with the exact same configuration
• Useful for deploying a “test” version of your application

• All resources and configuration are preserved:


cloning
• Load Balancer type and configuration
• RDS database type (but the data is not preserved)
• Environment variables

• After cloning an environment, you can change settings


Elastic Beanstalk Migration: Load Balancer
• After creating an Elastic Beanstalk Beanstalk old
environment, you cannot change CLB
the Elastic Load Balancer type
(only the configuration)
• To migrate:
1. create a new environment with the Route 53
same configuration except LB Or CNAME Swap
(can’t clone) Beanstalk new
2. deploy your application onto the ALB
new environment
3. perform a CNAME swap or Route
53 update
RDS with Elastic Beanstalk
• RDS can be provisioned with Beanstalk, which is great for dev / test
• This is not great for prod as the database lifecycle is tied to the
Beanstalk environment lifecycle
• The best for prod is to separately create an RDS database and provide
our EB application with the connection string
• Diagram (Beanstalk with RDS): the RDS database, EC2 instances, and ALB are all provisioned within the same Beanstalk environment
Elastic Beanstalk Migration: Decouple RDS
1. Create a snapshot of RDS DB (as a Beanstalk old
safeguard)
2. Go to the RDS console and protect
the RDS database from deletion
3. Create a new Elastic Beanstalk
environment, without RDS, point your
application to existing RDS Route 53
4. perform a CNAME swap (blue/green) Or CNAME Swap
or Route 53 update, confirm working Beanstalk new

5. Terminate the old environment (RDS


won’t be deleted)
6. Delete CloudFormation stack (in
DELETE_FAILED state)
AWS CloudFormation
Managing your infrastructure as code
Infrastructure as Code
• Currently, we have been doing a lot of manual work
• All this manual work will be very tough to reproduce:
• In another region
• in another AWS account
• Within the same region if everything was deleted

• Wouldn’t it be great, if all our infrastructure was… code?


• That code would be deployed and create / update / delete our
infrastructure
What is CloudFormation
• CloudFormation is a declarative way of outlining your AWS
Infrastructure, for any resources (most of them are supported).
• For example, within a CloudFormation template, you say:
• I want a security group
• I want two EC2 machines using this security group
• I want two Elastic IPs for these EC2 machines
• I want an S3 bucket
• I want a load balancer (ELB) in front of these machines

• Then CloudFormation creates those for you, in the right order, with the
exact configuration that you specify
Benefits of AWS CloudFormation (1/2)
• Infrastructure as code
• No resources are manually created, which is excellent for control
• The code can be version controlled for example using git
• Changes to the infrastructure are reviewed through code

• Cost
• Each resources within the stack is tagged with an identifier so you can easily see how
much a stack costs you
• You can estimate the costs of your resources using the CloudFormation template
• Savings strategy: in Dev, you could automate deletion of stacks at 5 PM and recreate them at 8 AM, safely
Benefits of AWS CloudFormation (2/2)
• Productivity
• Ability to destroy and re-create an infrastructure on the cloud on the fly
• Automated generation of Diagram for your templates!
• Declarative programming (no need to figure out ordering and orchestration)

• Separation of concern: create many stacks for many apps, and many layers. Ex:
• VPC stacks
• Network stacks
• App stacks

• Don’t re-invent the wheel


• Leverage existing templates on the web!
• Leverage the documentation
How CloudFormation Works
• Templates have to be uploaded in S3 and then referenced in
CloudFormation
• To update a template, we can’t edit previous ones. We have to re-
upload a new version of the template to AWS
• Stacks are identified by a name
• Deleting a stack deletes every single artifact that was created by
CloudFormation.
Deploying CloudFormation templates
• Manual way:
• Editing templates in the CloudFormation Designer
• Using the console to input parameters, etc

• Automated way:
• Editing templates in a YAML file
• Using the AWS CLI (Command Line Interface) to deploy the templates
• Recommended way when you fully want to automate your flow
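• Illustration: the automated way sketched with Python (boto3) instead of the AWS CLI; the stack name, template file, and parameter are placeholders:
import boto3

cf = boto3.client("cloudformation")

with open("my-template.yaml") as f:
    template_body = f.read()

cf.create_stack(
    StackName="my-app-stack",
    TemplateBody=template_body,
    Parameters=[{"ParameterKey": "InstanceType", "ParameterValue": "t2.micro"}],
)

# Block until the stack is fully created (or fails and rolls back)
cf.get_waiter("stack_create_complete").wait(StackName="my-app-stack")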
CloudFormation Building Blocks
Templates components (one course section for each):
1. Resources: your AWS resources declared in the template (MANDATORY)
2. Parameters: the dynamic inputs for your template
3. Mappings: the static variables for your template
4. Outputs: References to what has been created
5. Conditionals: List of conditions to perform resource creation
6. Metadata

Templates helpers:
1. References
2. Functions
Note:
This is an introduction to CloudFormation
• It can take over 3 hours to properly learn and master CloudFormation
• This section is meant so you get a good idea of how it works
• We’ll be slightly less hands-on than in other sections

• We’ll learn everything we need to answer questions for the exam


• The exam does not require you to actually write CloudFormation
• The exam expects you to understand how to read CloudFormation
Introductory Example
• We’re going to create a simple EC2 instance.
• Then we’re going to add an Elastic IP to it
• And we’re going to add two security groups to it
EC2 Instance
• For now, forget about the code syntax.
• We’ll look at the structure of the files later on

• We’ll see how in no-time, we are able to get started with CloudFormation!
YAML Crash Course
• YAML and JSON are the languages you can
use for CloudFormation.
• JSON is horrible for CF
• YAML is great in so many ways
• Let’s learn a bit about it!

• Key value Pairs


• Nested objects
• Support Arrays
• Multi line strings
• Can include comments!
What are resources?
• Resources are the core of your CloudFormation template (MANDATORY)
• They represent the different AWS Components that will be created and
configured
• Resources are declared and can reference each other

• AWS figures out creation, updates and deletes of resources for us


• There are over 224 types of resources (!)
• Resource types identifiers are of the form:
AWS::aws-product-name::data-type-name
How do I find
resources documentation?
• I can’t teach you all of the 224 resources, but I can teach you how to
learn how to use them.
• All the resources can be found here:
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-template-resource-type-ref.html
• Then, we just read the docs :)
• Example here (for an EC2 instance):
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-instance.html
Analysis of CloudFormation Template
• Going back to the example of the introductory section, let’s learn why it
was written this way.
• Relevant documentation can be found here:
• http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-instance.html
• http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-security-group.html
• http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-eip.html
FAQ for resources
• Can I create a dynamic amount of resources?
• No, you can’t. Everything in the CloudFormation template has to be declared. You can’t perform code generation there
• Is every AWS Service supported?
• Almost. Only a select few niches are not there yet
• You can work around that using AWS Lambda Custom Resources
What are parameters?
• Parameters are a way to provide inputs to your AWS CloudFormation
template
• They’re important to know about if:
• You want to reuse your templates across the company
• Some inputs can not be determined ahead of time
• Parameters are extremely powerful, controlled, and can prevent errors
from happening in your templates thanks to types.
When should you use a parameter?
• Ask yourself this:
• Is this CloudFormation resource configuration likely to change in the future?
• If so, make it a parameter.

• You won’t have to re-upload a template to change its content :)
Parameters Settings
Parameters can be controlled by all these settings:
• Type:
• String
• Number
• CommaDelimitedList
• List<Type>
• AWS Parameter (to help catch invalid values – match against existing values in the AWS Account)
• Description
• Constraints
• ConstraintDescription (String)
• Min/MaxLength
• Min/MaxValue
• Defaults
• AllowedValues (array)
• AllowedPattern (regexp)
• NoEcho (Boolean)
How to Reference a Parameter
• The Fn::Ref function can be leveraged to reference parameters
• Parameters can be used anywhere in a template.
• The shorthand for this in YAML is !Ref
• The function can also reference other elements within the template
Concept: Pseudo Parameters
• AWS offers us pseudo parameters in any CloudFormation template.
• These can be used at any time and are enabled by default
Reference Value → Example Return Value
AWS::AccountId → 1234567890
AWS::NotificationARNs → [arn:aws:sns:us-east-1:123456789012:MyTopic]
AWS::NoValue → Does not return a value
AWS::Region → us-east-2
AWS::StackId → arn:aws:cloudformation:us-east-1:123456789012:stack/MyStack/1c2fa620-982a-11e3-aff7-50e2416294e0
AWS::StackName → MyStack
What are mappings?
• Mappings are fixed variables within your CloudFormation Template.
• They’re very handy to differentiate between different environments
(dev vs prod), regions (AWS regions), AMI types, etc
• All the values are hardcoded within the template
• Example:
When would you use mappings vs parameters ?
• Mappings are great when you know in advance all the values that can be
taken and that they can be deduced from variables such as
• Region
• Availability Zone
• AWS Account
• Environment (dev vs prod)
• Etc…
• They allow safer control over the template.

• Use parameters when the values are really user specific


Fn::FindInMap
Accessing Mapping Values
• We use Fn::FindInMap to return a named value from a specific key
• !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]
What are outputs?
• The Outputs section declares optional outputs values that we can import into
other stacks (if you export them first)!
• You can also view the outputs in the AWS Console or in using the AWS CLI
• They’re very useful for example if you define a network CloudFormation, and
output the variables such as VPC ID and your Subnet IDs
• It’s the best way to perform some collaboration cross stack, as you let expert
handle their own part of the stack
• You can’t delete a CloudFormation Stack if its outputs are being referenced
by another CloudFormation stack
Outputs Example
• Creating a SSH Security Group as part of one template
• We create an output that references that security group
Cross Stack Reference
• We then create a second template that leverages that security group
• For this, we use the Fn::ImportValue function
• You can’t delete the underlying stack until all the references are deleted
too.
What are conditions used for?
• Conditions are used to control the creation of resources or outputs
based on a condition.
• Conditions can be whatever you want them to be, but common ones
are:
• Environment (dev / test / prod)
• AWS Region
• Any parameter value
• Each condition can reference another condition, parameter value or
mapping
How to define a condition?

• The logical ID is for you to choose. It’s how you name the condition
• The intrinsic function (logical) can be any of the following:
• Fn::And
• Fn::Equals
• Fn::If
• Fn::Not
• Fn::Or
Using a Condition
• Conditions can be applied to resources / outputs / etc…
CloudFormation
Must Know Intrinsic Functions
• Ref
• Fn::GetAtt
• Fn::FindInMap
• Fn::ImportValue
• Fn::Join
• Fn::Sub
• Condition Functions (Fn::If, Fn::Not, Fn::Equals, etc…)
Fn::Ref
• The Fn::Ref function can be leveraged to reference
• Parameters => returns the value of the parameter
• Resources => returns the physical ID of the underlying resource (ex: EC2 ID)
• The shorthand for this in YAML is !Ref
Fn::GetAtt
• Attributes are attached to any resources you create
• To know the attributes of your resources, the best place to look at is
the documentation.
• For example: the AZ of an EC2 machine!
Fn::FindInMap
Accessing Mapping Values
• We use Fn::FindInMap to return a named value from a specific key
• !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]
Fn::ImportValue
• Import values that are exported in other templates
• For this, we use the Fn::ImportValue function
Fn::Join
• Join values with a delimiter

• This creates “a:b:c”


Function Fn::Sub
• Fn::Sub, or !Sub as a shorthand, is used to substitute variables from a
text. It’s a very handy function that will allow you to fully customize your
templates.
• For example, you can combine Fn::Sub with References or AWS Pseudo
variables!
• String must contain ${VariableName} and will substitute them
Condition Functions

• The logical ID is for you to choose. It’s how you name the condition
• The intrinsic function (logical) can be any of the following:
• Fn::And
• Fn::Equals
• Fn::If
• Fn::Not
• Fn::Or
CloudFormation Rollbacks
• Stack Creation Fails:
• Default: everything rolls back (gets deleted). We can look at the log
• Option to disable rollback and troubleshoot what happened

• Stack Update Fails:


• The stack automatically rolls back to the previous known working state
• Ability to see in the log what happened and error messages
CloudFormation Stack Notifications
• Send Stack events to SNS Topic (Email, Lambda, …)
• Enable SNS Integration using Stack Options
• Diagram: a CloudFormation Stack (SNS Integration enabled) sends events to SNS, which can trigger a Lambda Function that filters only ROLLBACK_IN_PROGRESS events and publishes them to another SNS topic that emails the user
ChangeSets
• When you update a stack, you need to know what changes before it
happens for greater confidence
• ChangeSets won’t say if the update will be successful

• Workflow on the original stack: 1. Create Change set → 2. View Change set → 3. (optional) Create additional change sets → 4. Execute Change set
• From: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-changesets.html
Nested stacks
• Nested stacks are stacks as part of other stacks
• They allow you to isolate repeated patterns / common components in
separate stacks and call them from other stacks
• Example:
• Load Balancer configuration that is re-used
• Security Group that is re-used
• Nested stacks are considered best practice
• To update a nested stack, always update the parent (root stack)
CloudFormation – Cross vs Nested Stacks
• Cross Stacks
• Helpful when stacks have different lifecycles
• Use Outputs Export and Fn::ImportValue
• When you need to pass export values to many stacks (VPC Id, etc…)
• Nested Stacks
• Helpful when components must be re-used
• Ex: re-use how to properly configure an Application Load Balancer
• The nested stack is only important to the higher level stack (it’s not shared)
CloudFormation - StackSets
• Create, update, or delete stacks CloudFormation StackSet
Admin Account
across multiple accounts and regions
with a single operation
• Administrator account to create
StackSets
• Trusted accounts to create, update,
delete stack instances from StackSets
• When you update a stack
set, all associated stack instances are
updated throughout all accounts and Account A Account A Account B
us-east-1 ap-south-1 eu-west-2
regions.
CloudFormation Drift
• CloudFormation allows you to create infrastructure
• But it doesn’t protect you against manual configuration changes
• How do we know if our resources have drifted?

• We can use CloudFormation drift!

• Not all resources are supported yet:


https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-stack-drift-resource-list.html
CloudFormation Stack Policies
• During a CloudFormation Stack update, all
update actions are allowed on all resources
(default)

• A Stack Policy is a JSON document that


defines the update actions that are allowed
on specific resources during Stack updates
• Protect resources from unintentional updates
• When you set a Stack Policy, all resources in
the Stack are protected by default
• Specify an explicit ALLOW for the resources Allow updates on all resources
you want to be allowed to be updated except the ProductionDatabase
AWS Integration & Messaging
SQS, SNS & Kinesis
Section Introduction
• When we start deploying multiple applications, they will inevitably need
to communicate with one another
• There are two patterns of application communication

1) Synchronous communications (application to application): e.g., Buying Service → Shipping Service
2) Asynchronous / Event based (application to queue to application): e.g., Buying Service → Queue → Shipping Service
Section Introduction
• Synchronous between applications can be problematic if there are
sudden spikes of traffic
• What if you need to suddenly encode 1000 videos but usually it’s 10?

• In that case, it’s better to decouple your applications,


• using SQS: queue model
• using SNS: pub/sub model
• using Kinesis: real-time streaming model
• These services can scale independently from our application!
Amazon SQS
What’s a queue?
• Diagram: Producers send messages to the SQS Queue; Consumers poll messages from it
Amazon SQS – Standard Queue
• Oldest offering (over 10 years old)
• Fully managed service, used to decouple applications

• Attributes:
• Unlimited throughput, unlimited number of messages in queue
• Default retention of messages: 4 days, maximum of 14 days
• Low latency (<10 ms on publish and receive)
• Limitation of 256KB per message sent

• Can have duplicate messages (at least once delivery, occasionally)


• Can have out of order messages (best effort ordering)
SQS – Producing Messages
• Produced to SQS using the SDK (SendMessage API)
• The message is persisted in SQS until a consumer deletes it
• Message retention: default 4 days, up to 14 days
• Example: send an order to be processed
• Order id
• Customer id
• Any attributes you want
• The message (up to 256 KB) is sent to SQS
• SQS standard: unlimited throughput
SQS – Consuming Messages
• Consumers (running on EC2 instances, servers, or AWS Lambda)…
• Poll SQS for messages (receive up to 10 messages at a time)
• Process the messages (example: insert the message into an RDS database)
• Delete the messages using the DeleteMessage API
• Diagram: the Consumer polls / receives messages from the SQS Queue, inserts them (e.g., into an RDS database), then calls DeleteMessage
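• Illustration: a minimal Python (boto3) sketch of the SendMessage / ReceiveMessage / DeleteMessage flow; the queue URL and message body are placeholders:
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"

# Producer: SendMessage
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": "1234", "customer_id": "567"}',
)

# Consumer: receive up to 10 messages, process them, then delete them
response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
for message in response.get("Messages", []):
    print("processing:", message["Body"])  # e.g., insert into a database
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])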
SQS – Multiple EC2 Instances Consumers
• Consumers receive and process messages in parallel
• At least once delivery
• Best-effort message ordering
• Consumers delete messages after processing them
• We can scale consumers horizontally to improve throughput of processing
SQS with Auto Scaling Group (ASG)
• Diagram: EC2 Instances in an Auto Scaling Group poll the SQS Queue for messages; the CloudWatch Metric – Queue Length (ApproximateNumberOfMessages) triggers a CloudWatch Alarm on breach, which scales the group
SQS to decouple between application tiers
• Diagram: the front-end web app receives requests and calls SendMessage on the SQS Queue; the back-end processing application (video processing) calls ReceiveMessage and inserts the result; both tiers auto-scale independently
Amazon SQS - Security
• Encryption:
• In-flight encryption using HTTPS API
• At-rest encryption using KMS keys
• Client-side encryption if the client wants to perform encryption/decryption itself

• Access Controls: IAM policies to regulate access to the SQS API

• SQS Access Policies (similar to S3 bucket policies)


• Useful for cross-account access to SQS queues
• Useful for allowing other services (SNS, S3…) to write to an SQS queue
SQS Queue Access Policy
• Cross Account Access: an EC2 Instance in account 111122223333 polls for messages from the SQS Queue (queue1) owned by account 444455556666
{
  "Version": "2012-10-17",
  "Statement" : [{
    "Effect": "Allow",
    "Principal": { "AWS": [ "111122223333" ] },
    "Action": [ "sqs:ReceiveMessage" ],
    "Resource": "arn:aws:sqs:us-east-1:444455556666:queue1"
  }]
}
• Publish S3 Event Notifications To SQS Queue: an object uploaded to the S3 Bucket (bucket1) sends a message to the SQS Queue (queue1)
{
  "Version": "2012-10-17",
  "Statement" : [{
    "Effect": "Allow",
    "Principal": { "AWS": "*" },
    "Action": [ "sqs:SendMessage" ],
    "Resource": "arn:aws:sqs:us-east-1:444455556666:queue1",
    "Condition": {
      "ArnLike": { "aws:SourceArn": "arn:aws:s3:*:*:bucket1" },
      "StringEquals": { "aws:SourceAccount": "<bucket1_owner_account_id>" }
    }
  }]
}
SQS – Message Visibility Timeout
• After a message is polled by a consumer, it becomes invisible to other consumers
• By default, the “message visibility timeout” is 30 seconds
• That means the message has 30 seconds to be processed
• After the message visibility timeout is over, the message is “visible” in SQS
• Timeline: the message is returned on the first ReceiveMessage request, not returned on requests made during the visibility timeout, and returned again once the visibility timeout has elapsed
SQS – Message Visibility Timeout
• If a message is not processed within the visibility timeout, it will be processed twice
• A consumer could call the ChangeMessageVisibility API to get more time
• If visibility timeout is high (hours), and consumer crashes, re-processing will take time
• If visibility timeout is too low (seconds), we may get duplicates
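A minimal boto3 sketch of the ChangeMessageVisibility call mentioned above, assuming the (hypothetical) queue contains at least one message:

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"  # hypothetical

msg = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)["Messages"][0]

# Processing will take longer than the visibility timeout:
# extend it to 10 minutes for this message only, so no other consumer receives it
sqs.change_message_visibility(
    QueueUrl=queue_url,
    ReceiptHandle=msg["ReceiptHandle"],
    VisibilityTimeout=600,  # seconds
)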
Amazon SQS – FIFO Queue
• FIFO = First In First Out (ordering of messages in the queue)

Send messages Poll messages


Producer Consumer
4 3 2 1 4 3 2 1

• Limited throughput: 300 msg/s without batching, 3000 msg/s with


• Exactly-once send capability (by removing duplicates)
• Messages are processed in order by the consumer
Amazon SQS – Dead Letter Queue (DLQ)
Dead Letter
• If a consumer fails to process a message within the Queue
SQS Queue
Visibility Timeout…
the message goes back to the queue!
• We can set a threshold of how many times a message can
go back to the queue
• After the MaximumReceives threshold is exceeded, the
message goes into a Dead Letter Queue (DLQ)
• Useful for debugging!
• DLQ of a FIFO queue must also be a FIFO queue
Consumer
• DLQ of a Standard queue must also be a Standard
queue
Failure
• Make sure to process the messages in the DLQ before loop
they expire:
• Good to set a retention of 14 days in the DLQ
SQS DLQ – Redrive to Source
• Feature to help consume Source Dead Letter
messages in the DLQ to SQS Queue Queue
understand what is wrong with Redrive task

them
• When our code is fixed, we can
redrive the messages from the ✅
Manual inspection
DLQ back into the source And debugging
queue (or any other queue) in
batches without writing custom Consumer
code
Amazon SQS – Delay Queue
• Delay a message (consumers don’t see it immediately) up to 15 minutes
• Default is 0 seconds (message is available right away)
• Can set a default at queue level
• Can override the default on send using the DelaySeconds parameter

Send messages Poll messages


Producer Consumer
Amazon SQS - Long Polling
message
• When a consumer requests messages from the
queue, it can optionally “wait” for messages to
arrive if there are none in the queue
• This is called Long Polling
• Long Polling decreases the number of API calls
made to SQS while increasing the efficiency SQS Queue
and reducing the latency of your application
• The wait time can be between 1 sec to 20 sec
(20 sec preferable)
• Long Polling is preferable to Short Polling poll
• Long polling can be enabled at the queue level
or at the API level using
ReceiveMessageWaitTimeSeconds Consumer
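A minimal boto3 sketch of enabling Long Polling at the API level (queue URL is a hypothetical placeholder); it can also be set at the queue level with the ReceiveMessageWaitTimeSeconds attribute:

import boto3

sqs = boto3.client("sqs")

# WaitTimeSeconds > 0 enables long polling for this call:
# it blocks for up to 20 seconds until messages arrive (or returns empty)
response = sqs.receive_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue",  # hypothetical
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,
)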
SQS Extended Client
• Message size limit is 256KB, how to send large messages, e.g. 1GB?
• Using the SQS Extended Client (Java Library)

SQS Queue

Small metadata Small metadata


Producer Consumer
message message

Send large message to S3 Retrieve large message from S3

Amazon S3
bucket
SQS – Must know API
• CreateQueue (MessageRetentionPeriod), DeleteQueue
• PurgeQueue: delete all the messages in queue
• SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage
• MaxNumberOfMessages: default 1, max 10 (for ReceiveMessage API)
• ReceiveMessageWaitTimeSeconds: Long Polling
• ChangeMessageVisibility: change the message timeout

• Batch APIs for SendMessage, DeleteMessage, ChangeMessageVisibility


helps decrease your costs
SQS FIFO – Deduplication
• De-duplication interval is 5 minutes
• Two de-duplication methods:
• Content-based deduplication: will do a SHA-256 hash of the message body
• Explicitly provide a Message Deduplication ID

SHA-256:
b94d27b9934d3e08a52e52d7da7dabfa
Hello world c484efe37a5380ee9088f7ace2efcde9

Producer

Hello world
SQS FIFO – Message Grouping
• If you specify the same value of MessageGroupID in an SQS FIFO queue,
you can only have one consumer, and all the messages are in order
• To get ordering at the level of a subset of messages, specify different values
for MessageGroupID
• Messages that share a common Message Group ID will be in order within the group
• Each Group ID can have a different consumer (parallel processing!)
• Ordering across groups is not guaranteed

A3 A2 A1 Consumer for Group “A”

B4 B3 B2 B1 Consumer for Group “B”

C2 C1 Consumer for Group “C”


SQS FIFO
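A minimal boto3 sketch of sending to a FIFO queue with a Message Group ID and a Message Deduplication ID (queue URL and IDs are hypothetical):

import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo",  # hypothetical FIFO queue
    MessageBody="order 1036 placed",
    MessageGroupId="customer-42",         # ordering is guaranteed within this group
    MessageDeduplicationId="order-1036",  # optional if content-based deduplication is enabled
)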
Amazon SNS
• What if you want to send one message to many receivers?
Direct Email Pub / Sub
Email
integration notification notification

Fraud Fraud
Service Service
Buying Buying
Service Service
Shipping Shipping
Service SNS Topic Service

SQS Queue SQS Queue


Amazon SNS
• The “event producer” only sends message to one SNS topic
• As many “event receivers” (subscriptions) as we want to listen to the SNS topic notifications
• Each subscriber to the topic will get all the messages (note: new feature to filter messages)
• Up to 12,500,000 subscriptions per topic
• 100,000 topics limit
Subscribers

publish SQS Lambda Kinesis Data


Firehose

SNS
HTTP(S) SMS & Emails
Endpoints Mobile Notifications
SNS integrates with a lot of AWS services
• Many AWS services can send data directly to SNS for notifications


CloudWatch Alarms AWS Budgets Lambda

… publish
Auto Scaling Group S3 Bucket DynamoDB
(Notifications) (Events)
SNS

CloudFormation AWS DMS RDS Events
(State Changes) (New Replica)
Amazon SNS – How to publish
• Topic Publish (using the SDK)
• Create a topic
• Create a subscription (or many)
• Publish to the topic

• Direct Publish (for mobile apps SDK)


• Create a platform application
• Create a platform endpoint
• Publish to the platform endpoint
• Works with Google GCM, Apple APNS, Amazon ADM…
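A minimal boto3 sketch of the Topic Publish flow above (create a topic, create a subscription, publish); the SQS queue ARN is a hypothetical placeholder:

import boto3

sns = boto3.client("sns")

topic_arn = sns.create_topic(Name="orders-topic")["TopicArn"]

sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:orders-queue",  # hypothetical subscriber
)

sns.publish(TopicArn=topic_arn, Subject="New order", Message="Order 1036 placed")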
Amazon SNS – Security
• Encryption:
• In-flight encryption using HTTPS API
• At-rest encryption using KMS keys
• Client-side encryption if the client wants to perform encryption/decryption itself

• Access Controls: IAM policies to regulate access to the SNS API

• SNS Access Policies (similar to S3 bucket policies)


• Useful for cross-account access to SNS topics
• Useful for allowing other services ( S3…) to write to an SNS topic
SNS + SQS: Fan Out
SQS Queue
Fraud
Service
Buying
Service
Shipping
SNS Topic Service

SQS Queue

• Push once in SNS, receive in all SQS queues that are subscribers
• Fully decoupled, no data loss
• SQS allows for: data persistence, delayed processing and retries of work
• Ability to add more SQS subscribers over time
• Make sure your SQS queue access policy allows for SNS to write
• Cross-Region Delivery: works with SQS Queues in other regions
Application: S3 Events to multiple queues
• For the same combination of: event type (e.g. object create) and prefix
(e.g. images/) you can only have one S3 Event rule
• If you want to send the same S3 event to many SQS queues, use fan-out
SQS Queues

Fan-out

S3 Object events
created…
SNS Topic
Amazon S3
Lambda Function
Application: SNS to Amazon S3 through
Kinesis Data Firehose
• SNS can send to Kinesis and therefore we can have the following
solutions architecture:

Buying Amazon S3
Service
SNS Topic Kinesis Data
Firehose
Any supported KDF
Destination
Amazon SNS – FIFO Topic
• FIFO = First In First Out (ordering of messages in the topic)

Send messages Receive messages Subscribers


Producer
SQS FIFO
4 3 2 1 4 3 2 1

• Similar features as SQS FIFO:


• Ordering by Message Group ID (all messages in the same group are ordered)
• Deduplication using a Deduplication ID or Content Based Deduplication
• Can only have SQS FIFO queues as subscribers
• Limited throughput (same throughput as SQS FIFO)
SNS FIFO + SQS FIFO: Fan Out
• In case you need fan out + ordering + deduplication

SQS FIFO Queue


Fraud
Service
Buying
Service
Shipping
SNS FIFO Topic Service

SQS FIFO Queue


SNS – Message Filtering
• JSON policy used to filter messages sent to SNS topic’s subscriptions
• If a subscription doesn’t have a filter policy, it receives every message
Filter Policy SQS Queue
State: Placed
(Placed orders)
SQS Queue
(Cancelled orders)
Filter Policy
Buying New transaction State: Cancelled Email Subscription
Service (Cancelled orders)
Order: 1036
Product: Pencil SNS Topic Filter Policy
Qty: 4 State: Declined
SQS Queue
State: Placed
(Declined orders)

SQS Queue
(All)
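A minimal boto3 sketch of attaching a filter policy to a subscription so it only receives cancelled orders (the subscription ARN is a hypothetical placeholder):

import boto3, json

sns = boto3.client("sns")

sns.set_subscription_attributes(
    SubscriptionArn="arn:aws:sns:us-east-1:123456789012:orders-topic:11112222-3333-4444-5555-666677778888",  # hypothetical
    AttributeName="FilterPolicy",
    AttributeValue=json.dumps({"State": ["Cancelled"]}),
)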
Kinesis Overview
• Makes it easy to collect, process, and analyze streaming data in real-time
• Ingest real-time data such as: Application logs, Metrics, Website clickstreams,
IoT telemetry data…

• Kinesis Data Streams: capture, process, and store data streams


• Kinesis Data Firehose: load data streams into AWS data stores
• Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
• Kinesis Video Streams: capture, process, and store video streams
Kinesis Data Streams

Applications Record Apps (KCL, SDK)


Record
Shard 1
Partition Key Partition Key

Client Data Blob Shard 2 Sequence no. Lambda


(up to 1 MB) Data Blob

2 MB/sec (shared)
Kinesis Data
1 MB/sec
SDK, KPL Per shard all consumers Firehose
or 1000 msg/sec
per shard Shard N
OR
Kinesis Agent Kinesis Data
Stream 2 MB/sec (enhanced) Analytics
Per shard per consumer

Producers Kinesis Data Streams Consumers

Can scale # of shards


Kinesis Data Streams
• Retention between 1 day to 365 days
• Ability to reprocess (replay) data
• Once data is inserted in Kinesis, it can’t be deleted (immutability)
• Data that shares the same partition goes to the same shard (ordering)
• Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
• Consumers:
• Write your own: Kinesis Client Library (KCL), AWS SDK
• Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics,
Kinesis Data Streams – Capacity Modes
• Provisioned mode:
• You choose the number of shards provisioned, scale manually or using API
• Each shard gets 1MB/s in (or 1000 records per second)
• Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
• You pay per shard provisioned per hour

• On-demand mode:
• No need to provision or manage the capacity
• Default capacity provisioned (4 MB/s in or 4000 records per second)
• Scales automatically based on observed throughput peak during the last 30 days
• Pay per stream per hour & data in/out per GB
Kinesis Data Streams Security
• Control access / authorization using Region
IAM policies
VPC
• Encryption in flight using HTTPS
endpoints Private subnet
Shard 1
• Encryption at rest using KMS HTTPS HTTPS
Shard 2
• You can implement EC2 Instance VPC Endpoint
encryption/decryption of data on Shard 3
client side (harder)
• VPC Endpoints available for Kinesis to Stream

access within VPC Kinesis Data Stream


• Monitor API calls using CloudTrail
Encryption at rest
Kinesis Producers
• Puts data records into data streams
• Data record consists of:
• Sequence number (unique per partition-key within shard)
• Partition key (must specify while put records into stream)
• Data blob (up to 1 MB)
• Producers:
• AWS SDK: simple producer
• Kinesis Producer Library (KPL): C++, Java, batch, compression, retries
• Kinesis Agent: monitor log files
• Write throughput: 1 MB/sec or 1000 records/sec per shard
• PutRecord API
• Use batching with PutRecords API to reduce costs & increase throughput
Kinesis Producers
Partition Key = Device Id Same partition key = Same shard
111222333 → hash function → 2BD12A930...799FBAD53... → Shard 1
111222333
Device Id Data Blob Shard 2
111222333 (up to 1 MB)

444555666
444555666
Device Id
444555666 Data Blob Shard N
(up to 1 MB)
1 MB/sec
or 1000 records/sec Stream
per shard
IoT Devices Kinesis Data Streams
Use highly distributed partition key to avoid “hot partition”
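A minimal boto3 sketch of a producer using the device id as the partition key, so records from the same device stay ordered on the same shard (the stream name is hypothetical):

import boto3, json

kinesis = boto3.client("kinesis")

kinesis.put_record(
    StreamName="gps-stream",                                # hypothetical
    PartitionKey="truck_42",                                # same key => same shard => ordering
    Data=json.dumps({"lat": 48.85, "lon": 2.35}).encode(),
)
# Use put_records(...) to batch many records, reduce cost and increase throughput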
Kinesis - ProvisionedThroughputExceeded
1 MB/sec or 1000 records/sec
Applications
Shard 1

Client Shard 2

SDK, KPL 2 MB/sec


Shard N
ProvisionedThroughputExceeded
Kinesis Agent
Solution: Stream
• Use highly distributed partition key
Producers Kinesis Data Streams
• Retries with exponential backoff
• Increase shards (scaling)
Kinesis Data Streams Consumers
• Get data records from data streams and process them

• AWS Lambda
• Kinesis Data Analytics
• Kinesis Data Firehose
• Custom Consumer (AWS SDK) – Classic or Enhanced Fan-Out
• Kinesis Client Library (KCL): library to simplify reading from data stream
Kinesis Consumers – Custom Consumer
Shared (Classic) Fan-out Consumer: consumers call GetRecords(); 2 MB/sec per shard shared across all consumers (Consumer Applications A, B, C read from Shards 1..N of the Kinesis Data Stream)
Enhanced Fan-out Consumer: consumers call SubscribeToShard(); 2 MB/sec per consumer per shard (each Consumer Application A, B, C gets its own 2 MB/sec per shard)
Kinesis Consumers Types
Shared (Classic) Fan-out Consumer – pull:
• Low number of consuming applications
• Read throughput: 2 MB/sec per shard across all consumers
• Max. 5 GetRecords API calls/sec
• Latency ~200 ms
• Minimize cost ($)
• Consumers poll data from Kinesis using GetRecords API call
• Returns up to 10 MB (then throttle for 5 seconds) or up to 10000 records
Enhanced Fan-out Consumer – push:
• Multiple consuming applications for the same stream
• 2 MB/sec per consumer per shard
• Latency ~70 ms
• Higher costs ($$$)
• Kinesis pushes data to consumers over HTTP/2 (SubscribeToShard API)
• Soft limit of 5 consumer applications (KCL) per data stream (default)
Kinesis Consumers – AWS Lambda
• Supports Classic & Enhanced fan-out consumers
• Read records in batches
• Can configure batch size and batch window
• If error occurs, Lambda retries until succeeds or data expired
• Can process up to 10 batches per shard simultaneously
(Diagram: Lambda functions get batches of records from each shard of the Kinesis Data Stream, process them, and save the results to DynamoDB)
Kinesis Client Library (KCL)
• A Java library that helps read record from a Kinesis Data Stream with
distributed applications sharing the read workload
• Each shard is to be read by only one KCL instance
• 4 shards = max. 4 KCL instances
• 6 shards = max. 6 KCL instances
• Progress is checkpointed into DynamoDB (needs IAM access)
• Track other workers and share the work amongst shards using DynamoDB
• KCL can run on EC2, Elastic Beanstalk, and on-premises
• Records are read in order at the shard level
• Versions:
• KCL 1.x (supports shared consumer)
• KCL 2.x (supports shared & enhanced fan-out consumer)
KCL Example: 4 shards

Checkpointing progress
Shard 1
KCL app
Running on EC2
Shard 2

Shard 3
KCL app
Running on EC2
Shard 4
Stream Amazon
DynamoDB
Kinesis Data Streams
KCL Example: 4 shards, Scaling KCL App

Shard 1
KCL app

Checkpointing progress
Running on EC2

Shard 2
KCL app
Running on EC2

Shard 3
KCL app
Running on EC2

Shard 4
KCL app
Stream Running on EC2
Kinesis Data Streams Amazon
DynamoDB
KCL Example: 6 shards, Scaling Kinesis

Shard 1
KCL app

Checkpointing progress
Running on EC2
Shard 2

Shard 3 KCL app


Running on EC2
Shard 4
KCL app
Shard 5 Running on EC2

Shard 6
KCL app
Running on EC2
Stream
Kinesis Data Streams Amazon
DynamoDB
KCL Example: 6 shards, Scaling KCL App

Shard 1

Checkpointing progress
Shard 2

Shard 3

Shard 4

Shard 5

Shard 6
Stream
Kinesis Data Streams Amazon
DynamoDB
Kinesis Operation – Shard Splitting
• Used to increase the Stream
capacity (1 MB/s data in per shard)
• Used to divide a “hot shard”
Shard 1
• The old shard is closed and will be Shard 1
Split
deleted once the data is expired Shard 4 (new)
Shard 2
• No automatic scaling (manually Shard 5 (new)
increase/decrease capacity) Shard 3
Shard 3
• Can’t split into more than two Stream
Stream
shards in a single operation Kinesis Data Stream
Kinesis Data Stream

Increase capacity and cost


Kinesis Operation – Merging Shards
• Decrease the Stream capacity and
save costs
• Can be used to group two shards Merge
Shard 1
with low traffic (cold shards) Shard 6
• Old shards are closed and will be Shard 4

deleted once the data is expired Shard 5 Shard 5


• Can’t merge more than two shards
Shard 3 Shard 3
in a single operation
Stream Stream

Kinesis Data Stream Kinesis Data Stream

Decrease capacity and cost


Kinesis Data Firehose 3rd-party Partner Destinations

Lambda
function Datadog
Applications
Kinesis Data
Data Streams transformation AWS Destinations
Amazon S3
Client Record
Up to 1 MB
Amazon Redshift
Amazon Batch writes (COPY through S3)
SDK, KPL CloudWatch
(Logs & Events) Kinesis
Data Firehose Amazon OpenSearch

Kinesis Agent All or Failed data


Custom Destinations
AWS IoT
Producers S3 backup bucket HTTP Endpoint
Kinesis Data Firehose
• Fully Managed Service, no administration, automatic scaling, serverless
• AWS: Redshift / Amazon S3 / OpenSearch
• 3rd party partner: Splunk / MongoDB / DataDog / NewRelic / …
• Custom: send to any HTTP endpoint
• Pay for data going through Firehose
• Near Real Time
• 60 seconds latency minimum for non full batches
• Or minimum 1 MB of data at a time
• Supports many data formats, conversions, transformations, compression
• Supports custom data transformations using AWS Lambda
• Can send failed or all data to a backup S3 bucket
Kinesis Data Streams vs Firehose
Kinesis Data Streams Kinesis Data Firehose

• Streaming service for ingest at scale • Load streaming data into S3 / Redshift /
• Write custom code (producer / OpenSearch / 3rd party / custom HTTP
consumer) • Fully managed
• Real-time (~200 ms) • Near real-time (buffer time min. 60 sec)
• Manage scaling (shard splitting / • Automatic scaling
merging) • No data storage
• Data storage for 1 to 365 days • Doesn’t support replay capability
• Supports replay capability
Kinesis Data Analytics for SQL applications
SQL Kinesis
Statements Data Streams AWS Lambda anywhere

Kinesis Applications anywhere


Data Streams

Kinesis Amazon S3
Data Firehose
Kinesis Amazon Redshift
Kinesis Data Analytics (COPY through S3)
Data Firehose for SQL Applications
Other Firehose destinations…
Sources Sinks

Reference Data in S3
Kinesis Data Analytics (SQL application)
• Real-time analytics on Kinesis Data Streams & Firehose using SQL
• Add reference data from Amazon S3 to enrich streaming data
• Fully managed, no servers to provision
• Automatic scaling
• Pay for actual consumption rate
• Output:
• Kinesis Data Streams: create streams out of the real-time analytics queries
• Kinesis Data Firehose: send analytics query results to destinations
• Use cases:
• Time-series analytics
• Real-time dashboards
• Real-time metrics
Kinesis Data Analytics for Apache Flink
• Use Flink (Java, Scala or SQL) to process and analyze streaming data
(Diagram: Kinesis Data Streams or Amazon MSK as sources for Kinesis Data Analytics for Apache Flink)

• Run any Apache Flink application on a managed cluster on AWS


• provisioning compute resources, parallel computation, automatic scaling
• application backups (implemented as checkpoints and snapshots)
• Use any Apache Flink programming features
• Flink does not read from Firehose (use Kinesis Analytics for SQL instead)
Ordering data into Kinesis
• Imagine you have 100 trucks
(truck_1, truck_2, … truck_100) on
the road sending their GPS positions
regularly into AWS. 1 Kinesis Stream with 3 Shards
• You want to consume the data in
order for each truck, so that you can 2 Shard 1
track their movement accurately.
• How should you send that data into 3 Shard 2
Kinesis?
4 Shard 3

• Answer: send using a “Partition Key” value of the “truck_id” (Partition Key is “truck_id”)
• The same key will always go to the
same shard
Ordering data into SQS
• For SQS standard, there is no ordering.
• For SQS FIFO, if you don’t use a Group ID, messages are consumed in the
order they are sent, with only one consumer

• You want to scale the number of consumers, but you want messages to be “grouped”
when they are related to each other
• Then you use a Group ID (similar to Partition Key in Kinesis)

https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/
Kinesis vs SQS ordering
• Let’s assume 100 trucks, 5 kinesis shards, 1 SQS FIFO
• Kinesis Data Streams:
• On average you’ll have 20 trucks per shard
• Trucks will have their data ordered within each shard
• The maximum amount of consumers in parallel we can have is 5
• Can receive up to 5 MB/s of data
• SQS FIFO
• You only have one SQS FIFO queue
• You will have 100 Group ID
• You can have up to 100 Consumers (due to the 100 Group ID)
• You have up to 300 messages per second (or 3000 if using batching)
SQS vs SNS vs Kinesis
SQS:
• Consumer “pull data”
• Data is deleted after being consumed
• Can have as many workers (consumers) as we want
• No need to provision throughput
• Ordering guarantees only on FIFO queues
• Individual message delay capability
SNS:
• Push data to many subscribers
• Up to 12,500,000 subscribers
• Data is not persisted (lost if not delivered)
• Pub/Sub
• Up to 100,000 topics
• No need to provision throughput
• Integrates with SQS for fan-out architecture pattern
• FIFO capability for SQS FIFO
Kinesis:
• Standard: pull data – 2 MB per shard
• Enhanced fan-out: push data – 2 MB per shard per consumer
• Possibility to replay data
• Meant for real-time big data, analytics and ETL
• Ordering at the shard level
• Data expires after X days
• Provisioned mode or on-demand capacity mode
AWS Monitoring, Troubleshooting
& Audit
CloudWatch, X-Ray and CloudTrail
Why Monitoring is Important
• We know how to deploy applications
• Safely
• Automatically
• Using Infrastructure as Code
• Leveraging the best AWS components!
• Our applications are deployed, and our users don’t care how we did it…
• Our users only care that the application is working!
• Application latency: will it increase over time?
• Application outages: customer experience should not be degraded
• Users contacting the IT department or complaining is not a good outcome
• Troubleshooting and remediation
• Internal monitoring:
• Can we prevent issues before they happen?
• Performance and Cost
• Trends (scaling patterns)
• Learning and Improvement
Monitoring in AWS
• AWS CloudWatch:
• Metrics: Collect and track key metrics
• Logs: Collect, monitor, analyze and store log files
• Events: Send notifications when certain events happen in your AWS
• Alarms: React in real-time to metrics / events
• AWS X-Ray:
• Troubleshooting application performance and errors
• Distributed tracing of microservices
• AWS CloudTrail:
• Internal monitoring of API calls being made
• Audit changes to AWS Resources by your users
AWS CloudWatch Metrics
• CloudWatch provides metrics for every service in AWS
• Metric is a variable to monitor (CPUUtilization, NetworkIn…)
• Metrics belong to namespaces
• Dimension is an attribute of a metric (instance id, environment, etc…).
• Up to 30 dimensions per metric
• Metrics have timestamps
• Can create CloudWatch dashboards of metrics
EC2 Detailed monitoring
• EC2 instances have metrics “every 5 minutes”
• With detailed monitoring (for a cost), you get data “every 1 minute”
• Use detailed monitoring if you want to scale faster for your ASG!

• The AWS Free Tier allows us to have 10 detailed monitoring metrics

• Note: EC2 Memory usage is by default not pushed (must be pushed


from inside the instance as a custom metric)
CloudWatch Custom Metrics
• Possibility to define and send your own custom metrics to CloudWatch
• Example: memory (RAM) usage, disk space, number of logged in users …
• Use API call PutMetricData
• Ability to use dimensions (attributes) to segment metrics
• Instance.id
• Environment.name
• Metric resolution (StorageResolution API parameter – two possible value):
• Standard: 1 minute (60 seconds)
• High Resolution: 1/5/10/30 second(s) – Higher cost
• Important: Accepts metric data points two weeks in the past and two hours in the
future (make sure to configure your EC2 instance time correctly)
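A minimal boto3 sketch of pushing a custom metric with dimensions via PutMetricData (the namespace and values are hypothetical):

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",  # hypothetical custom namespace
    MetricData=[{
        "MetricName": "MemoryUsage",
        "Dimensions": [
            {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
            {"Name": "Environment", "Value": "prod"},
        ],
        "Value": 73.5,
        "Unit": "Percent",
        "StorageResolution": 60,  # 60 = standard resolution, 1 = high resolution (higher cost)
    }],
)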
CloudWatch Logs
• Log groups: arbitrary name, usually representing an application
• Log stream: instances within application / log files / containers
• Can define log expiration policies (never expire, 1 day to 10 years…)
• CloudWatch Logs can send logs to:
• Amazon S3 (exports)
• Kinesis Data Streams
• Kinesis Data Firehose
• AWS Lambda
• OpenSearch
• Logs are encrypted by default
• Can setup KMS-based encryption with your own keys
CloudWatch Logs - Sources
• SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
• Elastic Beanstalk: collection of logs from application
• ECS: collection from containers
• AWS Lambda: collection from function logs
• VPC Flow Logs: VPC specific logs
• API Gateway
• CloudTrail based on filter
• Route53: Log DNS queries
CloudWatch Logs Insights

https://mng.workshop.aws/operations-2022/detect/cwlogs.html
CloudWatch Logs Insights
• Search and analyze log data stored in CloudWatch Logs
• Example: find a specific IP inside a log, count occurrences of
“ERROR” in your logs…
• Provides a purpose-built query language
• Automatically discovers fields from AWS services and JSON log
events
• Fetch desired event fields, filter based on conditions, calculate
aggregate statistics, sort events, limit number of events…
• Can save queries and add them to CloudWatch Dashboards
• Can query multiple Log Groups in different AWS accounts
• It’s a query engine, not a real-time engine
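A minimal boto3 sketch of running a Logs Insights query (counting “ERROR” occurrences) against a hypothetical log group:

import boto3, time

logs = boto3.client("logs")

query_id = logs.start_query(
    logGroupName="/my-app/production",  # hypothetical
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | filter @message like /ERROR/ | stats count() as errors",
)["queryId"]

# In real code, poll until the query status is "Complete" before reading the results
results = logs.get_query_results(queryId=query_id)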
CloudWatch Logs – S3 Export

• Log data can take up to 12 hours to


become available for export
• The API call is CreateExportTask

CloudWatch Logs Amazon S3


• Not near-real time or real-time… use
Logs Subscriptions instead
CloudWatch Logs Subscriptions
• Get a real-time log events from CloudWatch Logs for processing and analysis
• Send to Kinesis Data Streams, Kinesis Data Firehose, or Lambda
• Subscription Filter – filter which logs are events delivered to your destination

real-time OpenSearch
Service

Lambda
near
logs real-time
S3

CloudWatch Logs Subscription Filter Kinesis Data Firehose


KDF KDA EC2 Lambda
Kinesis Data Streams
CloudWatch Logs Aggregation
Multi-Account & Multi Region
ACCOUNT A
REGION 1

CloudWatch Logs Subscription Filter

ACCOUNT B Near
REGION 2 Real Time

CloudWatch Logs Subscription Filter Kinesis Data Streams Kinesis Data Firehose Amazon S3

ACCOUNT B
REGION 3

CloudWatch Logs Subscription Filter


CloudWatch Logs Subscriptions
• Cross-Account Subscription – send log events to resources in a different
AWS account (KDS, KDF)
IAM Role
(Cross-Account)
Account – Sender Account – Recipient
(111111111111) (999999999999)

logs logs

CloudWatch Subscription Subscription Kinesis Data Streams


Logs Filter Destination (RecipientStream)
Destination Destination
Access Policy Access Policy
Can be assumed
IAM Role
allow PutRecord
CloudWatch Logs for EC2
• By default, no logs from your EC2
machine will go to CloudWatch CloudWatch Logs

• You need to run a CloudWatch


agent on EC2 to push the log files
you want
• Make sure IAM permissions are
correct CloudWatch CloudWatch
Logs Agent Logs Agent
• The CloudWatch log agent can be
setup on-premises too On Premise
Server
EC2 Instance
CloudWatch Logs Agent & Unified Agent
• For virtual servers (EC2 instances, on-premise servers…)
• CloudWatch Logs Agent
• Old version of the agent
• Can only send to CloudWatch Logs

• CloudWatch Unified Agent


• Collect additional system-level metrics such as RAM, processes, etc…
• Collect logs to send to CloudWatch Logs
• Centralized configuration using SSM Parameter Store
CloudWatch Unified Agent – Metrics
• Collected directly on your Linux server / EC2 instance

• CPU (active, guest, idle, system, user, steal)


• Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
• RAM (free, inactive, used, total, cached)
• Netstat (number of TCP and UDP connections, net packets, bytes)
• Processes (total, dead, blocked, idle, running, sleep)
• Swap Space (free, used, used %)

• Reminder: out-of-the box metrics for EC2 – disk, CPU, network (high level)
CloudWatch Logs Metric Filter
• CloudWatch Logs can use filter expressions
• For example, find a specific IP inside of a log
• Or count occurrences of “ERROR” in your logs
• Metric filters can be used to trigger alarms
• Filters do not retroactively filter data. Filters only publish the metric data
points for events that happen after the filter was created.
• Ability to specify up to 3 Dimensions for the Metric Filter (optional)

stream
CloudWatch
Logs Agent
EC2 Instance CW Logs Metric Filters CW Alarm SNS
CloudWatch Alarms
• Alarms are used to trigger notifications for any metric
• Various options (sampling, %, max, min, etc…)
• Alarm States:
• OK
• INSUFFICIENT_DATA
• ALARM
• Period:
• Length of time in seconds to evaluate the metric
• High resolution custom metrics: 10 sec, 30 sec or multiples of 60 sec
CloudWatch Alarm Targets
• Stop, Terminate, Reboot, or Recover an EC2 Instance
• Trigger Auto Scaling Action
• Send notification to SNS (from which you can do pretty much anything)

Amazon EC2 EC2 Auto Scaling Amazon SNS


CloudWatch Alarms – Composite Alarms
• CloudWatch Alarms are on a single metric
• Composite Alarms are monitoring the states of multiple other alarms
• AND and OR conditions
• Helpful to reduce “alarm noise” by creating complex composite alarms
Composite Alarm
monitor CPU ALARM

CW Alarm - A trigger
EC2 Instance

monitor IOPS ALARM Amazon SNS

CW Alarm - B
EC2 Instance Recovery
• Status Check:
• Instance status = check the EC2 VM
• System status = check the underlying hardware

monitor alert

EC2 Instance CloudWatch Alarm SNS Topic


StatusCheckFailed_System

EC2 Instance Recovery

• Recovery: Same Private, Public, Elastic IP, metadata, placement group


CloudWatch Alarm: good to know
• Alarms can be created based on CloudWatch Logs Metrics Filters
CloudWatch
Metric Filter

Alert

CW Logs CW Alarm
Amazon SNS

• To test alarms and notifications, set the alarm state to Alarm using CLI
aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value
ALARM --state-reason "testing purposes"
CloudWatch Synthetics Canary
• Configurable script that monitors your APIs, URLs, Websites, …
• Reproduce what your customers do programmatically to find issues before customers are impacted
• Checks the availability and latency of your endpoints and can store load time data and screenshots of the UI
• Integration with CloudWatch Alarms
• Scripts written in Node.js or Python
• Programmatic access to a headless Google Chrome browser
• Can run once or on a regular schedule
(Diagram: a CloudWatch Synthetics Canary monitors the endpoint in us-east-1; on failure a CloudWatch Alarm triggers a Lambda Function that updates the Route 53 DNS record so users are routed to us-west-2)
CloudWatch Synthetics Canary Blueprints
• Heartbeat Monitor – load URL, store screenshot and an HTTP archive file
• API Canary – test basic read and write functions of REST APIs
• Broken Link Checker – check all links inside the URL that you are testing
• Visual Monitoring – compare a screenshot taken during a canary run with a
baseline screenshot
• Canary Recorder – used with CloudWatch Synthetics Recorder (record your
actions on a website and automatically generates a script for that)
• GUI Workflow Builder – verifies that actions can be taken on your webpage (e.g.,
test a webpage with a login form)
Amazon EventBridge
(formerly CloudWatch Events)
• Schedule: Cron jobs (scheduled scripts)
Schedule Every hour Trigger script on Lambda function

• Event Pattern: Event rules to react to a service doing something

IAM Root User Sign in Event SNS Topic with Email Notification

• Trigger Lambda functions, send SQS/SNS messages…


Amazon EventBridge Rules
Example Destinations
Example Source

Compute
JSON
{
"version": "0", Lambda AWS Batch ECS Task
EC2 Instance CodeBuild "id": "6a7e8feb-b491",

Integration
"detail-type": "EC2 Instance
(ex: Start Instance) (ex: failed build) State-change Notification",
Filter events ….
(optional) }
SQS SNS Kinesis Data
Streams

Maintenance Orchestration
S3 Event Trusted Advisor
(ex: upload object) (ex: new Finding) Amazon
EventBridge
Step CodePipeline CodeBuild
Functions
CloudTrail Schedule or Cron
(any API call) (ex: every 4 hours)

SSM EC2 Actions


Amazon EventBridge
AWS Services AWS SaaS Custom
Default Partners
Partner Apps Custom
Event Bus Event Bus Event Bus

• Event buses can be accessed by other AWS accounts using Resource-based Policies

• You can archive events (all/filter) sent to an event bus (indefinitely or set period)
• Ability to replay archived events
Amazon EventBridge – Schema Registry
• EventBridge can analyze the events in
your bus and infer the schema

• The Schema Registry allows you to


generate code for your application, that
will know in advance how data is
structured in the event bus

• Schema can be versioned


Amazon EventBridge – Resource-based Policy
• Manage permissions for a specific Event Bus
• Example: allow/deny events from another AWS account or AWS region
• Use case: aggregate all events from your AWS Organization in a single AWS
account or AWS region

AWS Account AWS Account


(123456789012) (111122223333)

PutEvents

EventBridge Bus Lambda function


(central-event-bus)
Allow events from another AWS account
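A minimal boto3 sketch of the scenario above: the other account sends a custom event to the central event bus (the bus ARN is a hypothetical placeholder; the target bus must allow PutEvents in its resource-based policy):

import boto3, json

events = boto3.client("events")

events.put_events(
    Entries=[{
        "EventBusName": "arn:aws:events:us-east-1:123456789012:event-bus/central-event-bus",  # hypothetical
        "Source": "com.mycompany.orders",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": 1036, "state": "Placed"}),
    }]
)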
EventBridge – Multi-account Aggregation
Account A Account B

state change event event state change

EC2 Instances Event Rule Event Rule EC2 Instances

Central Account

trigger
Event Bus SNS
Event Rule
Resource policy

Account C Account D

state change event event state change

EC2 Instances Event Rule Event Rule EC2 Instances


AWS X-Ray
• Debugging in Production, the good old way:
• Test locally
• Add log statements everywhere
• Re-deploy in production
• Log formats differ across applications, making analytics in CloudWatch hard
• Debugging: monolith “easy”, distributed services “hard”
• No common views of your entire architecture!

• Enter… AWS X-Ray!


AWS X-Ray
Visual analysis of our applications
AWS X-Ray advantages
• Troubleshooting performance (bottlenecks)
• Understand dependencies in a microservice architecture
• Pinpoint service issues
• Review request behavior
• Find errors and exceptions
• Are we meeting time SLA?
• Where I am throttled?
• Identify users that are impacted
X-Ray compatibility
• AWS Lambda
• Elastic Beanstalk
• ECS
• ELB
• API Gateway
• EC2 Instances or any application server (even on premise)
AWS X-Ray Leverages Tracing
• Tracing is an end-to-end way of following a “request”
• Each component dealing with the request adds its own “trace”
• Tracing is made of segments (+ sub segments)
• Annotations can be added to traces to provide extra-information
• Ability to trace:
• Every request
• Sample request (as a % for example or a rate per minute)
• X-Ray Security:
• IAM for authorization
• KMS for encryption at rest
AWS X-Ray – How to enable it?
1) Your code (Java, Python, Go, Node.js, .NET) must import the AWS X-Ray SDK
• Very little code modification needed
• The application SDK will then capture:
• Calls to AWS services
• HTTP / HTTPS requests
• Database calls (MySQL, PostgreSQL, DynamoDB)
• Queue calls (SQS)
2) Install the X-Ray daemon or enable X-Ray AWS Integration
• The X-Ray daemon works as a low-level UDP packet interceptor (Linux / Windows / Mac…)
• AWS Lambda / other AWS services already run the X-Ray daemon for you
• Each application must have the IAM rights to write data to X-Ray
(Diagram: on an EC2 instance, the application code + X-Ray SDK sends traces to the X-Ray daemon running on the machine, which sends a batch to AWS X-Ray every 1 second)
The X-Ray magic
• X-Ray service collects data from all the different services
• Service map is computed from all the segments and traces
• X-Ray is graphical, so even non technical people can help troubleshoot
AWS X-Ray Troubleshooting
• If X-Ray is not working on EC2
• Ensure the EC2 IAM Role has the proper permissions
• Ensure the EC2 instance is running the X-Ray Daemon

• To enable on AWS Lambda:


• Ensure it has an IAM execution role with proper policy
(AWSXRayWriteOnlyAccess)
• Ensure that X-Ray is imported in the code
• Enable Lambda X-Ray Active Tracing
X-Ray Instrumentation in your code
• Instrumentation means measuring a product’s performance, diagnosing errors, and writing trace information
• To instrument your application code, you use the X-Ray SDK
• Many SDKs require only configuration changes
• You can modify your application code to customize and annotate the data that the SDK sends to X-Ray, using interceptors, filters, handlers, middleware…
(The slide shows an example for Node.js & Express; see the sketch below)
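The original slide's Node.js & Express example is an image not reproduced here. Below is a minimal sketch of the same idea in Python with Flask, assuming the aws_xray_sdk package is installed and an X-Ray daemon is reachable:

from flask import Flask
from aws_xray_sdk.core import xray_recorder, patch_all
from aws_xray_sdk.ext.flask.middleware import XRayMiddleware

app = Flask(__name__)

xray_recorder.configure(service="my-web-app")  # name shown in the X-Ray service map
XRayMiddleware(app, xray_recorder)             # creates a segment for every incoming HTTP request
patch_all()                                    # auto-instrument supported libraries (boto3, requests, ...)

@app.route("/")
def home():
    return "Hello from an instrumented app"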
X-Ray Concepts
• Segments: each application / service will send them
• Subsegments: if you need more details in your segment
• Trace: segments collected together to form an end-to-end trace
• Sampling: decrease the amount of requests sent to X-Ray, reduce cost
• Annotations: Key Value pairs used to index traces and use with filters
• Metadata: Key Value pairs, not indexed, not used for searching

• The X-Ray daemon / agent has a config to send traces cross account:
• make sure the IAM permissions are correct – the agent will assume the role
• This allows to have a central account for all your application tracing
X-Ray Sampling Rules
• With sampling rules, you control the amount of data that you record
• You can modify sampling rules without changing your code

• By default, the X-Ray SDK records the first request each second, and
five percent of any additional requests.
• One request per second is the reservoir, which ensures that at least
one trace is recorded each second as long the service is serving
requests.
• Five percent is the rate at which additional requests beyond the
reservoir size are sampled.
X-Ray Custom Sampling Rules
• You can create your own rules with the reservoir and rate
X-Ray Write APIs (used by the X-Ray daemon)
• PutTraceSegments: Uploads segment
documents to AWS X-Ray
• PutTelemetryRecords: Used by the AWS
X-Ray daemon to upload telemetry.
• SegmentsReceivedCount,
SegmentsRejectedCounts,
BackendConnectionErrors…
• GetSamplingRules: Retrieve all sampling
rules (to know what/when to send)
• GetSamplingTargets &
GetSamplingStatisticSummaries: advanced
• The X-Ray daemon needs to have an IAM
policy authorizing the correct API calls to
arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess function correctly
X-Ray Read APIs – continued
• GetServiceGraph: main graph
• BatchGetTraces: Retrieves a list of
traces specified by ID. Each trace is a
collection of segment documents that
originates from a single request.
• GetTraceSummaries: Retrieves IDs
and annotations for traces available for
a specified time frame using an
optional filter. To get the full traces,
pass the trace IDs to BatchGetTraces.
• GetTraceGraph: Retrieves a service
graph for one or more specific trace
IDs.
X-Ray with Elastic Beanstalk ❤

• AWS Elastic Beanstalk platforms include the X-Ray daemon


• You can run the daemon by setting an option in the Elastic Beanstalk console
or with a configuration file (in .ebextensions/xray-daemon.config)

• Make sure to give your instance profile the correct IAM permissions so that
the X-Ray daemon can function correctly
• Then make sure your application code is instrumented with the X-Ray SDK
• Note: The X-Ray daemon is not provided for Multicontainer Docker
ECS + X-Ray integration options ❤

ECS Cluster ECS Cluster Fargate Cluster


X-Ray Container as a Daemon X-Ray Container as a “Side Car” X-Ray Container as a “Side Car”

EC2 EC2 EC2 EC2 ECS Cluster

App App App App


Container Container Container Container Fargate Task Fargate Task
X-Ray X-Ray App App
App App Sidecar Sidecar Container Container
Container Container
X-Ray X-Ray
App App Sidecar Sidecar
Container Container
X-Ray X-Ray
Daemon Daemon X-Ray X-Ray
Container Container Sidecar Sidecar
ECS + X-Ray: Example Task Definition

https://docs.aws.amazon.com/xray/latest/devguide/xray-daemon-ecs.html#xray-daemon-ecs-build
AWS Distro for OpenTelemetry
• Secure, production-ready AWS-supported distribution of the open-source
project OpenTelemetry project
• Provides a single set of APIs, libraries, agents, and collector services
• Collects distributed traces and metrics from your apps
• Collects metadata from your AWS resources and services
• Auto-instrumentation Agents to collect traces without changing your code
• Send traces and metrics to multiple AWS services and partner solutions
• X-Ray, CloudWatch, Prometheus…
• Instrument your apps running on AWS (e.g., EC2, ECS, EKS, Fargate, Lambda) as
well as on-premises
• Migrate from X-Ray to AWS Distro for OpenTelemetry if you want to standardize
with open-source OpenTelemetry APIs or send traces to multiple
destinations simultaneously
AWS Distro for OpenTelemetry
AWS X-Ray

Amazon
CloudWatch

Collect Traces Collect Metrics AWS Resources


AWS Distro for collect data about collect metrics from and Contextual Data
OpenTelemetry the request from each app the request Amazon Managed
collect information about
each app the request passes through AWS resources and metadata Service for
passes through where the app is running Prometheus

Partner Monitoring
Solutions
AWS CloudTrail
• Provides governance, compliance and audit for your AWS Account
• CloudTrail is enabled by default!
• Get an history of events / API calls made within your AWS Account
by:
• Console
• SDK
• CLI
• AWS Services
• Can put logs from CloudTrail into CloudWatch Logs or S3
• A trail can be applied to All Regions (default) or a single Region.
• If a resource is deleted in AWS, investigate CloudTrail first!
CloudTrail Diagram

SDK

CloudWatch Logs
CloudTrail Console
CLI

Console
Inspect & Audit S3 Bucket

IAM Users &


IAM Roles
CloudTrail Events
• Management Events:
• Operations that are performed on resources in your AWS account
• Examples:
• Configuring security (IAM AttachRolePolicy)
• Configuring rules for routing data (Amazon EC2 CreateSubnet)
• Setting up logging (AWS CloudTrail CreateTrail)
• By default, trails are configured to log management events.
• Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)

• Data Events:
• By default, data events are not logged (because high volume operations)
• Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
• AWS Lambda function execution activity (the Invoke API)

• CloudTrail Insights Events:


• See next slide
CloudTrail Insights
• Enable CloudTrail Insights to detect unusual activity in your account:
• inaccurate resource provisioning
• hitting service limits
• Bursts of AWS IAM actions
• Gaps in periodic maintenance activity
• CloudTrail Insights analyzes normal management events to create a baseline
• And then continuously analyzes write events to detect unusual patterns
• Anomalies appear in the CloudTrail console
• Event is sent to Amazon S3
• An EventBridge event is generated (for automation needs)
CloudTrail Console

Continuous analysis generate


Management Events Insights Events S3 Bucket

CloudTrail Insights
EventBridge event
CloudTrail Events Retention
• Events are stored for 90 days in CloudTrail
• To keep events beyond this period, log them to S3 and use Athena

Management Events CloudTrail


Athena
log analyze
Data Events

90 days S3 Bucket
Insights Events Long-term retention
retention
Amazon EventBridge – Intercept API Calls

User

DeleteTable API Call 💥

Log API call event alert

DynamoDB CloudTrail Amazon SNS


(any API call) EventBridge
Amazon EventBridge + CloudTrail

API Call logs event

AssumeRole
IAM CloudTrail EventBridge SNS
User
IAM Role

API Call logs event

AuthorizeSecurityGroupIngress
EC2 CloudTrail EventBridge SNS
edit SG
User Security Group
Inbound Rules
CloudTrail vs CloudWatch vs X-Ray
• CloudTrail:
• Audit API calls made by users / services / AWS console
• Useful to detect unauthorized calls or root cause of changes
• CloudWatch:
• CloudWatch Metrics over time for monitoring
• CloudWatch Logs for storing application log
• CloudWatch Alarms to send notifications in case of unexpected metrics
• X-Ray:
• Automated Trace Analysis & Central Service Map Visualization
• Latency, Errors and Fault analysis
• Request tracking across distributed systems
AWS Lambda
It’s a serverless world
What’s serverless?
• Serverless is a new paradigm in which the developers don’t have to
manage servers anymore…
• They just deploy code
• They just deploy… functions !
• Initially... Serverless == FaaS (Function as a Service)
• Serverless was pioneered by AWS Lambda but now also includes
anything that’s managed: “databases, messaging, storage, etc.”
• Serverless does not mean there are no servers…
it means you just don’t manage / provision / see them
Serverless in AWS
• AWS Lambda
• DynamoDB
• AWS Cognito
• AWS API Gateway
• Amazon S3
• AWS SNS & SQS
• AWS Kinesis Data Firehose
• Aurora Serverless
• Step Functions
• Fargate
(Diagram: users load static content from an S3 bucket, log in with Cognito, and call a REST API on API Gateway, which invokes Lambda, which reads/writes DynamoDB)
Why AWS Lambda
• Virtual Servers in the Cloud
• Limited by RAM and CPU
• Continuously running
Amazon EC2 • Scaling means intervention to add / remove servers

• Virtual functions – no servers to manage!


• Limited by time - short executions
• Run on-demand
Amazon Lambda
• Scaling is automated!
Benefits of AWS Lambda
• Easy Pricing:
• Pay per request and compute time
• Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time

• Integrated with the whole AWS suite of services


• Integrated with many programming languages
• Easy monitoring through AWS CloudWatch
• Easy to get more resources per functions (up to 10GB of RAM!)
• Increasing RAM will also improve CPU and network!
AWS Lambda language support
• Node.js (JavaScript)
• Python
• Java (Java 8 compatible)
• C# (.NET Core)
• Golang
• C# / Powershell
• Ruby
• Custom Runtime API (community supported, example Rust)

• Lambda Container Image


• The container image must implement the Lambda Runtime API
• ECS / Fargate is preferred for running arbitrary Docker images
AWS Lambda Integrations
Main ones

API Gateway Kinesis DynamoDB S3 CloudFront

CloudWatch Events CloudWatch Logs SNS SQS Cognito


EventBridge
Example: Serverless Thumbnail creation
(Diagram: a new image in S3 triggers an AWS Lambda Function that creates a thumbnail, pushes the new thumbnail to S3, and pushes metadata to DynamoDB: image name, image size, creation date, etc…)
Example: Serverless CRON Job

Trigger
Every 1 hour

CloudWatch Events
EventBridge AWS Lambda Function
Perform a task
AWS Lambda Pricing: example
• You can find overall pricing information here:
https://aws.amazon.com/lambda/pricing/
• Pay per calls:
• First 1,000,000 requests are free
• $0.20 per 1 million requests thereafter ($0.0000002 per request)
• Pay per duration: (in increment of 1 ms)
• 400,000 GB-seconds of compute time per month for FREE
• == 400,000 seconds if function is 1GB RAM
• == 3,200,000 seconds if function is 128 MB RAM
• After that $1.00 for 600,000 GB-seconds
• It is usually very cheap to run AWS Lambda so it’s very popular
Lambda – Synchronous Invocations
• Synchronous: CLI, SDK, API Gateway, Application Load Balancer
• Results is returned right away
• Error handling must happen client side (retries, exponential backoff, etc…)
invoke
SDK/CLI Do something
Response

invoke proxy
Client Do something
Response Response
API Gateway
Lambda - Synchronous Invocations - Services
• User Invoked:
• Elastic Load Balancing (Application Load Balancer)
• Amazon API Gateway
• Amazon CloudFront (Lambda@Edge)
• Amazon S3 Batch
• Service Invoked:
• Amazon Cognito
• AWS Step Functions
• Other Services:
• Amazon Lex
• Amazon Alexa
• Amazon Kinesis Data Firehose
Lambda Integration with ALB
• To expose a Lambda function as an HTTP(S) endpoint…
• You can use the Application Load Balancer (or an API Gateway)
• The Lambda function must be registered in a target group

Target Group
HTTP/HTTPS INVOKE SYNC

Client Application Load Balancer


(ALB)
ALB to Lambda: HTTP to JSON
Request Payload for Lambda Funcion

ELB information

HTTP Method & Path


Query String Parameters as Key/Value pairs

Headers as Key/Value pairs

Body (for POST, PUT...) & isBase64Encoded


Lambda to ALB conversions: JSON to HTTP

Response from the Lambda Function

Status Code & Description

Headers as Key/Value pairs


Body & isBase64Encoded
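A minimal sketch (not from the slides) of a Python Lambda function behind an ALB: the event is the JSON request document described above, and the returned dict is converted back into an HTTP response:

def lambda_handler(event, context):
    # Query string parameters arrive as key/value pairs in the event
    name = event.get("queryStringParameters", {}).get("name", "world")
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "isBase64Encoded": False,
        "headers": {"Content-Type": "text/plain"},
        "body": f"Hello, {name}",
    }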
ALB Multi-Header Values
Client
• ALB can support multi header
values (ALB setting) HTTP
• When you enable multi-value http://example.com/path?name=foo&name=bar
headers, HTTP headers and
query string parameters that
ALB
are sent with multiple values
are shown as arrays within the JSON
AWS Lambda event and ”queryStringParameters”: {“name”: [“foo”,”bar”] }
response objects.

Lambda
ALB + Lambda – Permissions

Lambda Resource Policy


Lambda – Asynchronous Invocations
• S3, SNS, CloudWatch Events…
• The events are placed in an Event Queue Lambda Service
retries
• Lambda attempts to retry on errors New file
• 3 tries total
• 1 minute wait after 1st , then 2 minutes wait event read
• Make sure the processing is idempotent (in
case of retries) Event Queue Function
S3 bucket
• If the function is retried, you will see duplicate
logs entries in CloudWatch Logs
• Can define a DLQ (dead-letter queue) – SNS DLQ for
or SQS – for failed processing (need correct failed processing
IAM permissions)
• Asynchronous invocations allow you to speed
up the processing if you don’t need to wait for SQS or SNS
the result (ex: you need 1000 files processed)
Lambda - Asynchronous Invocations - Services
• Amazon Simple Storage Service (S3)
• Amazon Simple Notification Service (SNS)
• Amazon CloudWatch Events / EventBridge
• AWS CodeCommit (CodeCommit Trigger: new branch, new tag, new push)
• AWS CodePipeline (invoke a Lambda function during the pipeline, Lambda must callback)
----- other -----
• Amazon CloudWatch Logs (log processing)
• Amazon Simple Email Service
• AWS CloudFormation
• AWS Config
• AWS IoT
• AWS IoT Events
CloudWatch Events / EventBridge

Trigger AWS Lambda Function


CRON or Rate
Every 1 hour Perform a task
EventBridge Rule

CodePipeline Trigger on AWS Lambda Function


EventBridge Rule State Changes Perform a task
S3 Events Notifications
• S3:ObjectCreated, S3:ObjectRemoved, SQS
S3:ObjectRestore, S3:Replication…
• Object name filtering possible (*.jpg) SNS
• Use case: generate thumbnails of images
uploaded to S3
events
• S3 event notifications typically deliver events
in seconds but can sometimes take a minute
or longer Amazon S3 SQS Lambda Function
• If two writes are made to a single non-
versioned object at the same time, it is async
possible that only a single event notification
will be sent
• If you want to ensure that an event Lambda Function
notification is sent for every successful write,
you can enable versioning on your bucket. DLQ

SQS
Simple S3 Event Pattern – Metadata Sync

Table in RDS
New file event

S3 bucket
Update metadata table

DynamoDB Table
Lambda – Event Source Mapping
• Kinesis Data Streams
Kinesis
• SQS & SQS FIFO queue
• DynamoDB Streams POLL RETURN BATCH

Event Source Mapping


• Common denominator: (internal)
records need to be polled
from the source
INVOKE WITH EVENT BATCH
• Your Lambda function is
invoked synchronously Lambda Function
Streams & Lambda (Kinesis & DynamoDB)
• An event source mapping creates an iterator for each shard, processes items in order
• Start with new items, from the beginning or from timestamp
• Processed items aren't removed from the stream (other consumers can read them)
• Low traffic: use batch window to accumulate records before processing
• You can process multiple batches in parallel
• up to 10 batches per shard
• in-order processing is still guaranteed for each partition key

https://aws.amazon.com/blogs/compute/new-aws-lambda-scaling-controls-for-kinesis-and-dynamodb-event-sources/
Streams & Lambda – Error Handling
• By default, if your function returns an error, the entire batch is
reprocessed until the function succeeds, or the items in the batch
expire.
• To ensure in-order processing, processing for the affected shard is
paused until the error is resolved
• You can configure the event source mapping to:
• discard old events
• restrict the number of retries
• split the batch on error (to work around Lambda timeout issues)
• Discarded events can go to a Destination
Lambda – Event Source Mapping
SQS & SQS FIFO
• Event Source Mapping will SQS
poll SQS (Long Polling)
• Specify batch size (1-10
messages) POLL RETURN BATCH
• Recommended: Set the
queue visibility timeout to
6x the timeout of your Lambda
Lambda function Event Source Mapping
• To use a DLQ
• set-up on the SQS queue,
not Lambda (DLQ for INVOKE WITH EVENT BATCH
Lambda is only for async
invocations)
• Or use a Lambda destination Lambda Function
for failures
Queues & Lambda
• Lambda also supports in-order processing for FIFO (first-in, first-out) queues,
scaling up to the number of active message groups.
• For standard queues, items aren't necessarily processed in order.
• Lambda scales up to process a standard queue as quickly as possible.

• When an error occurs, batches are returned to the queue as individual items
and might be processed in a different grouping than the original batch.
• Occasionally, the event source mapping might receive the same item from
the queue twice, even if no function error occurred.
• Lambda deletes items from the queue after they're processed successfully.
• You can configure the source queue to send items to a dead-letter queue if
they can't be processed.
Lambda Event Mapper Scaling
• Kinesis Data Streams & DynamoDB Streams:
• One Lambda invocation per stream shard
• If you use parallelization, up to 10 batches processed per shard simultaneously
• SQS Standard:
• Lambda adds 60 more instances per minute to scale up
• Up to 1000 batches of messages processed simultaneously
• SQS FIFO:
• Messages with the same GroupID will be processed in order
• The Lambda function scales to the number of active message groups
Lambda – Event and Context Objects
EventBridge Lambda Function
invoke

Event Object Context Object


Lambda – Event and Context Objects
• Event Object
• JSON-formatted document contains data for the function to process
• Contains information from the invoking service (e.g., EventBridge, custom, …)
• Lambda runtime converts the event to an object (e.g., dict type in Python)
• Example: input arguments, invoking service arguments, …

• Context Object
• Provides methods and properties that provide information about the invocation,
function, and runtime environment
• Passed to your function by Lambda at runtime
• Example: aws_request_id, function_name, memory_limit_in_mb, …
Lambda – Event and Context Objects
Access Event & Context Objects using Python
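The slide's Python example is an image; a minimal equivalent sketch:

import json

def lambda_handler(event, context):
    # Event object: JSON document from the invoking service, converted to a Python dict
    print("Event:", json.dumps(event))

    # Context object: information about the invocation, function and runtime environment
    print("Request ID:", context.aws_request_id)
    print("Function name:", context.function_name)
    print("Memory limit (MB):", context.memory_limit_in_mb)
    print("Remaining time (ms):", context.get_remaining_time_in_millis())

    return {"status": "ok"}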
Lambda – Destinations
• Nov 2019: Can configure to send result to a
destination
• Asynchronous invocations - can define destinations
for successful and failed event:
• Amazon SQS
• Amazon SNS
• AWS Lambda
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
• Amazon EventBridge bus
• Note: AWS recommends you use destinations instead of
DLQ now (but both can be used at the same time)

• Event Source mapping: for discarded event batches


• Amazon SQS
• Amazon SNS
• Note: you can send events to a DLQ directly from SQS

https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html
Lambda Execution Role (IAM Role)
• Grants the Lambda function permissions to AWS services / resources
• Sample managed policies for Lambda:
• AWSLambdaBasicExecutionRole – Upload logs to CloudWatch.
• AWSLambdaKinesisExecutionRole – Read from Kinesis
• AWSLambdaDynamoDBExecutionRole – Read from DynamoDB Streams
• AWSLambdaSQSQueueExecutionRole – Read from SQS
• AWSLambdaVPCAccessExecutionRole – Deploy Lambda function in VPC
• AWSXRayDaemonWriteAccess – Upload trace data to X-Ray.

• When you use an event source mapping to invoke your function, Lambda
uses the execution role to read event data.
• Best practice: create one Lambda Execution Role per function
Lambda Resource Based Policies
• Use resource-based policies to give other accounts and AWS services
permission to use your Lambda resources
• Similar to S3 bucket policies for S3 bucket
• An IAM principal can access Lambda:
• if the IAM policy attached to the principal authorizes it (e.g. user access)
• OR if the resource-based policy authorizes (e.g. service access)

• When an AWS service like Amazon S3 calls your Lambda function, the
resource-based policy gives it access.
Lambda Environment Variables
• Environment variable = key / value pair in “String” form
• Adjust the function behavior without updating code
• The environment variables are available to your code
• Lambda Service adds its own system environment variables as well

• Helpful to store secrets (encrypted by KMS)


• Secrets can be encrypted by the Lambda service key, or your own CMK
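A minimal sketch of reading environment variables from Lambda code in Python (ENVIRONMENT_NAME is a hypothetical variable set in the function configuration):

import os

def lambda_handler(event, context):
    environment = os.environ.get("ENVIRONMENT_NAME", "dev")  # hypothetical custom variable
    region = os.environ["AWS_REGION"]                        # injected by the Lambda service
    return {"environment": environment, "region": region}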
Lambda Logging & Monitoring
• CloudWatch Logs:
• AWS Lambda execution logs are stored in AWS CloudWatch Logs
• Make sure your AWS Lambda function has an execution role with an IAM
policy that authorizes writes to CloudWatch Logs
• CloudWatch Metrics:
• AWS Lambda metrics are displayed in AWS CloudWatch Metrics
• Invocations, Durations, Concurrent Executions
• Error count, Success Rates, Throttles
• Async Delivery Failures
• Iterator Age (Kinesis & DynamoDB Streams)
Lambda Tracing with X-Ray
• Enable in Lambda configuration (Active Tracing)
• Runs the X-Ray daemon for you
• Use AWS X-Ray SDK in Code
• Ensure Lambda Function has a correct IAM Execution Role
• The managed policy is called AWSXRayDaemonWriteAccess
• Environment variables to communicate with X-Ray
• _X_AMZN_TRACE_ID: contains the tracing header
• AWS_XRAY_CONTEXT_MISSING: by default, LOG_ERROR
• AWS_XRAY_DAEMON_ADDRESS: the X-Ray Daemon IP_ADDRESS:PORT
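A hedged sketch of using the AWS X-Ray SDK for Python (aws_xray_sdk) inside a function; patch_all() instruments supported libraries such as boto3 so downstream AWS calls show up as subsegments (Active Tracing and the AWSXRayDaemonWriteAccess policy are assumed to be configured):

import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

patch_all()                    # patch boto3/botocore once, outside the handler
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # custom subsegment around a piece of work we want to see in the trace
    with xray_recorder.in_subsegment("list-buckets"):
        buckets = s3.list_buckets()["Buckets"]
    return {"bucket_count": len(buckets)}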
Customization At The Edge
• Many modern applications execute some form of the logic at the edge
• Edge Function:
• A code that you write and attach to CloudFront distributions
• Runs close to your users to minimize latency
• CloudFront provides two types: CloudFront Functions &
Lambda@Edge
• You don’t have to manage any servers, deployed globally

• Use case: customize the CDN content


• Pay only for what you use
• Fully serverless
CloudFront Functions & Lambda@Edge
Use Cases
• Website Security and Privacy
• Dynamic Web Application at the Edge
• Search Engine Optimization (SEO)
• Intelligently Route Across Origins and Data Centers
• Bot Mitigation at the Edge
• Real-time Image Transformation
• A/B Testing
• User Authentication and Authorization
• User Prioritization
• User Tracking and Analytics
CloudFront Functions
• Lightweight functions written in JavaScript
• For high-scale, latency-sensitive CDN customizations
• Sub-ms startup times, millions of requests/second
• Used to change Viewer requests and responses:
• Viewer Request: after CloudFront receives a request from a viewer
• Viewer Response: before CloudFront forwards the response to the viewer
• Native feature of CloudFront (manage code entirely within CloudFront)
(Diagram: Client <-> CloudFront (Viewer Request / Viewer Response) <-> Origin (Origin Request / Origin Response))
Lambda@Edge
• Lambda functions written in NodeJS or Python
• Scales to 1000s of requests/second
• Used to change CloudFront requests and responses:
• Viewer Request – after CloudFront receives a request from a viewer
• Origin Request – before CloudFront forwards the request to the origin
• Origin Response – after CloudFront receives the response from the origin
• Viewer Response – before CloudFront forwards the response to the viewer
• Author your functions in one AWS Region (us-east-1), then CloudFront replicates to its locations
(Diagram: Client <-> CloudFront (Viewer Request / Viewer Response) <-> Origin (Origin Request / Origin Response))
CloudFront Functions vs. Lambda@Edge

                                      CloudFront Functions                        Lambda@Edge
Runtime Support                       JavaScript                                  Node.js, Python
# of Requests                         Millions of requests per second             Thousands of requests per second
CloudFront Triggers                   Viewer Request/Response                     Viewer Request/Response, Origin Request/Response
Max. Execution Time                   < 1 ms                                      5 – 10 seconds
Max. Memory                           2 MB                                        128 MB up to 10 GB
Total Package Size                    10 KB                                       1 MB – 50 MB
Network Access, File System Access    No                                          Yes
Access to the Request Body            No                                          Yes
Pricing                               Free tier available, 1/6th price of @Edge   No free tier, charged per request & duration
CloudFront Functions vs. Lambda@Edge – Use Cases

CloudFront Functions
• Cache key normalization
• Transform request attributes (headers, cookies, query strings, URL) to create an optimal Cache Key
• Header manipulation
• Insert/modify/delete HTTP headers in the request or response
• URL rewrites or redirects
• Request authentication & authorization
• Create and validate user-generated tokens (e.g., JWT) to allow/deny requests

Lambda@Edge
• Longer execution time (several ms)
• Adjustable CPU or memory
• Your code depends on 3rd-party libraries (e.g., AWS SDK to access other AWS services)
• Network access to use external services for processing
• File system access or access to the body of HTTP requests
Lambda by default
• By default, your Lambda function is launched outside your own VPC (in an AWS-owned VPC)
• It can reach the public Internet and public AWS services (e.g., DynamoDB), but it cannot access resources in your VPC (RDS, ElastiCache, internal ELB…)

Lambda in VPC
• You must define the VPC ID, the Subnets and the Security Groups
• Lambda will create an ENI (Elastic Network Interface) in your subnets
• Requires the AWSLambdaVPCAccessExecutionRole managed policy
(Diagram: the Lambda function's ENI sits in a private subnet with a Lambda security group and connects to Amazon RDS, which is protected by its own RDS security group)
Lambda in VPC – Internet Access
• A Lambda function in your VPC does not have internet access
• Deploying a Lambda function in a public subnet does not give it internet access or a public IP
• Deploying a Lambda function in a private subnet gives it internet access if you have a NAT Gateway / Instance
• You can use VPC endpoints to privately access AWS services without a NAT
• Note: Lambda – CloudWatch Logs works even without an endpoint or NAT Gateway
(Diagram: a Lambda function in a private subnet reaches an external API through a NAT Gateway and Internet Gateway, and reaches DynamoDB through a VPC endpoint)
Lambda Function Configuration
• RAM:
• From 128MB to 10GB in 1MB increments
• The more RAM you add, the more vCPU credits you get
• At 1,792 MB, a function has the equivalent of one full vCPU
• After 1,792 MB, you get more than one CPU, and need to use multi-threading in
your code to benefit from it (up to 6 vCPU)
• If your application is CPU-bound (computation heavy), increase RAM

• Timeout: default 3 seconds, maximum is 900 seconds (15 minutes)


Lambda Execution Context
• The execution context is a temporary runtime environment that
initializes any external dependencies of your lambda code
• Great for database connections, HTTP clients, SDK clients…
• The execution context is maintained for some time in anticipation of
another Lambda function invocation
• The next function invocation can “re-use” the context and save time initializing connection objects
• The execution context includes the /tmp directory
Initialize outside the handler
• BAD: the DB connection is established inside the handler, at every function invocation
• GOOD: the DB connection is established once, outside the handler, and re-used across invocations (see the sketch below)
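A sketch of the “GOOD” pattern in Python, assuming a hypothetical TABLE_NAME environment variable: the client and table objects are created once per execution context and re-used across invocations.

import os
import boto3

# Runs once per execution context (cold start), then re-used
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])

def lambda_handler(event, context):
    # Only the per-request work happens inside the handler
    response = table.get_item(Key={"user_id": event["user_id"]})
    return response.get("Item")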
Lambda Functions /tmp space
• If your Lambda function needs to download a big file to work…
• If your Lambda function needs disk space to perform operations…
• You can use the /tmp directory
• Max size is 10GB
• The directory content remains when the execution context is frozen,
providing transient cache that can be used for multiple invocations
(helpful to checkpoint your work)
• For permanent persistence of object (non temporary), use S3
• To encrypt content on /tmp, you must generate KMS Data Keys
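A minimal sketch of using /tmp as a transient cache across invocations of the same execution context (bucket, key, and file names are hypothetical):

import os
import boto3

s3 = boto3.client("s3")
LOCAL_PATH = "/tmp/model.bin"

def lambda_handler(event, context):
    if not os.path.exists(LOCAL_PATH):     # cold start, or a new execution context
        s3.download_file("my-assets-bucket", "models/model.bin", LOCAL_PATH)
    return {"cached_size_bytes": os.path.getsize(LOCAL_PATH)}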
Lambda Layers
• Custom Runtimes
• Ex: C++ https://github.com/awslabs/aws-lambda-cpp
• Ex: Rust https://github.com/awslabs/aws-lambda-rust-runtime
• Externalize Dependencies to re-use them:
(Diagram: instead of one large application package bundling lambda_function_1.py with my_heavy_library_1 and my_heavy_library_2, ship a small package with only the function code and reference the heavy libraries from Lambda Layer 1 and Lambda Layer 2, which other functions can re-use)
Lambda – File Systems Mounting
• Lambda functions can access EFS file systems if they are running in a VPC
• Configure Lambda to mount EFS file systems to a local directory during initialization
• Must leverage EFS Access Points
• Limitations: watch out for the EFS connection limits (one function instance = one connection) and connection burst limits
Lambda – Storage Options

                                Ephemeral Storage /tmp      Lambda Layers                               Amazon S3                              Amazon EFS
Max. Size                       10,240 MB                   5 layers per function, up to 250 MB total   Elastic                                Elastic
Persistence                     Ephemeral                   Durable                                     Durable                                Durable
Content                         Dynamic                     Static                                      Dynamic                                Dynamic
Storage Type                    File System                 Archive                                     Object                                 File System
Operations supported            any File System operation   Immutable                                   Atomic with Versioning                 any File System operation
Pricing                         Included in Lambda          Included in Lambda                          Storage + Requests + Data Transfer     Storage + Data Transfer + Throughput
Sharing/Permissions             Function Only               IAM                                         IAM                                    IAM + NFS
Relative Data Access Speed
from Lambda                     Fastest                     Fastest                                     Fast                                   Very Fast
Shared Across All Invocations   No                          Yes                                         Yes                                    Yes
Lambda Concurrency and Throttling
• Concurrency limit: up to 1000 concurrent executions

• Can set a “reserved concurrency” at the function level (=limit)


• Each invocation over the concurrency limit will trigger a “Throttle”
• Throttle behavior:
• If synchronous invocation => return ThrottleError - 429
• If asynchronous invocation => retry automatically and then go to DLQ
• If you need a higher limit, open a support ticket
Lambda Concurrency Issue
• If you don’t reserve (=limit) concurrency, the following can happen:
(Diagram: many users hitting one Application Load Balancer consume all 1,000 concurrent executions, so invocations coming through API Gateway and the SDK / CLI get throttled even though they have few users)
Concurrency and Asynchronous Invocations
• If the function doesn't have enough concurrency available to process all events, additional requests are throttled
• For throttling errors (429) and system errors (500-series), Lambda returns the event to the queue and attempts to run the function again for up to 6 hours
• The retry interval increases exponentially from 1 second after the first attempt to a maximum of 5 minutes
(Example: a burst of “new file” events from an S3 bucket invoking the function asynchronously)
Cold Starts & Provisioned Concurrency
• Cold Start:
• New instance => code is loaded and code outside the handler run (init)
• If the init is large (code, dependencies, SDK…) this process can take some time.
• First request served by new instances has higher latency than the rest
• Provisioned Concurrency:
• Concurrency is allocated before the function is invoked (in advance)
• So the cold start never happens and all invocations have low latency
• Application Auto Scaling can manage concurrency (schedule or target utilization)
• Note: cold starts in VPC have been dramatically reduced in Oct & Nov 2019
• https://aws.amazon.com/blogs/compute/announcing-improved-vpc-networking-for-aws-lambda-functions/
Reserved and Provisioned Concurrency

https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
Lambda Function Dependencies
• If your Lambda function depends on external libraries:
for example AWS X-Ray SDK, Database Clients, etc…
• You need to install the packages alongside your code and zip it
together
• For Node.js, use npm & “node_modules” directory
• For Python, use pip --target options
• For Java, include the relevant .jar files
• Upload the zip straight to Lambda if less than 50MB, else to S3 first
• Native libraries work: they need to be compiled on Amazon Linux
• AWS SDK comes by default with every Lambda function
Lambda and CloudFormation – inline

• Inline functions are very


simple
• Use the Code.ZipFile
property
• You cannot include function
dependencies with inline
functions
Lambda and CloudFormation – through S3
• You must store the Lambda zip in S3
• You must refer the S3 zip location in
the CloudFormation code
• S3Bucket
• S3Key: full path to zip
• S3ObjectVersion: if versioned bucket
• If you update the code in S3, but
don’t update S3Bucket, S3Key or
S3ObjectVersion, CloudFormation
won’t update your function
Lambda and CloudFormation – through S3 (Multiple accounts)
• Account 1 holds the S3 bucket with the Lambda code; its bucket policy allows the other accounts (Allow Principal: [Account IDs…])
• Accounts 2 and 3 run CloudFormation with an execution role that allows get & list on that S3 bucket
Lambda Container Images
• Deploy Lambda function as container images of up to 10GB from ECR
• Pack complex dependencies, large dependencies in a container
• Base images are available for Python, Node.js, Java, .NET, Go, Ruby
• Can create your own image as long as it implements the Lambda Runtime API
• Test the containers locally using the Lambda Runtime Interface Emulator
• Unified workflow to build apps
(Diagram: application code + dependencies/datasets layered on a base image that implements the Lambda Runtime API; build the image, publish it to Amazon ECR, then deploy it to Lambda)
Lambda Container Images
• Example: build from the base images provided by AWS

# Use an image that implements the Lambda Runtime API
FROM amazon/aws-lambda-nodejs:12

# Copy your application code and files
COPY app.js package*.json ./

# Install the dependencies in the container
RUN npm install

# Function to run when the Lambda function is invoked
CMD [ "app.lambdaHandler" ]
Lambda Container Images – Best Practices
• Strategies for optimizing container images:
• Use AWS-provided Base Images
• Stable, Built on Amazon Linux 2, cached by Lambda service
• Use Multi-Stage Builds
• Build your code in larger preliminary images, copy only the artifacts you need in your final
container image, discard the preliminary steps
• Build from Stable to Frequently Changing
• Make your most frequently occurring changes as late in your Dockerfile as possible
• Use a Single Repository for Functions with Large Layers
• ECR compares each layer of a container image when it is pushed to avoid uploading and
storing duplicates

• Use them to upload large Lambda Functions (up to 10 GB)


AWS Lambda Versions
• When you work on a Lambda function, we work on $LATEST (mutable)
• When we’re ready to publish a Lambda function, we create a version
• Versions are immutable
• Versions have increasing version numbers
• Versions get their own ARN (Amazon Resource Name)
• Version = code + configuration (nothing can be changed - immutable)
• Each version of the lambda function can be accessed
AWS Lambda Aliases
• Aliases are ”pointers” to Lambda function versions
• We can define “dev”, ”test”, “prod” aliases and have them point at different lambda versions
• Aliases are mutable
• Aliases enable Canary deployment by assigning weights to lambda versions
• Aliases enable stable configuration of our event triggers / destinations
• Aliases have their own ARNs
• Aliases cannot reference aliases
(Diagram: DEV Alias -> $LATEST, TEST Alias -> V2, PROD Alias -> 95% V1 / 5% V2)
Lambda & CodeDeploy
• CodeDeploy can help you automate traffic shift for Lambda aliases
• Feature is integrated within the SAM framework
• Linear: grow traffic every N minutes until 100%
• Linear10PercentEvery3Minutes
• Linear10PercentEvery10Minutes
• Canary: try X percent then 100%
• Canary10Percent5Minutes
• Canary10Percent30Minutes
• AllAtOnce: immediate
• Can create Pre & Post Traffic hooks to check the health of the Lambda function
(Diagram: the PROD alias sends 100 – X% of traffic to V1 and X% to V2; CodeDeploy makes X vary over time until X = 100%)
Lambda & CodeDeploy – AppSpec.yml
• Name (required) – the name of the Lambda function to deploy
• Alias (required) – the name of the alias to the Lambda function
• CurrentVersion (required) – the version of the Lambda function traffic currently points to
• TargetVersion (required) – the version of the Lambda function traffic is shifted to
Lambda – Function URL
• Dedicated HTTP(S) endpoint for your Lambda function
• A unique URL endpoint is generated for you (never changes)
• https://<url-id>.lambda-url.<region>.on.aws (dual-stack IPv4 & IPv6)
• Invoke via a web browser, curl, Postman, or any HTTP client
• Access your function URL through the public Internet only
• Doesn’t support PrivateLink (Lambda functions do support)
• Supports Resource-based Policies & CORS configurations
• Can be applied to any function alias or to $LATEST (can’t be applied to other function versions)
• Create and configure using AWS Console or AWS API
• Throttle your function by using Reserved Concurrency
(Example URL: https://yj4xbxeirvacv3xdjp5uyt3j7y0ltzqa.lambda-url.us-east-1.on.aws/)
Lambda – Function URL Security
• Resource-based Policy
• Authorize other accounts / specific CIDR / IAM principals

• Cross-Origin Resource Sharing (CORS)


• If you call your Lambda function URL from a different domain

(Example: users load example.com from CloudFront + S3 static website hosting, and the page calls api.example.com served by a Lambda Function URL – a cross-origin request)
Lambda – Function URL Security
• AuthType NONE – allow public and unauthenticated access
• Resource-based Policy is always in effect (must grant public access)

(Diagram: Internet users invoke the Lambda Function (my-function) in Account A (123456789012); the Resource-based Policy must grant public access)
Lambda – Function URL Security
• AuthType AWS_IAM – IAM is used to authenticate and authorize requests
• Both Principal’s Identity-based Policy & Resource-based Policy are evaluated
• Principal must have lambda:InvokeFunctionUrl permissions
• Same account – Identity-based Policy OR Resource-based Policy as ALLOW
• Cross account – Identity-based Policy AND Resource Based Policy as ALLOW

(Diagram: an IAM Role (my-role) in Account B (444455556666) invokes the Lambda Function (my-function) in Account A (123456789012); both the Identity-based Policy and the Resource-based Policy must allow it)


Lambda and CodeGuru Profiling
• Gain insights into runtime performance of your Lambda functions using CodeGuru Profiler
• CodeGuru creates a Profiler Group for your Lambda function
• Supported for Java and Python runtimes
• Activate from AWS Lambda Console
• When activated, Lambda adds:
• CodeGuru Profiler layer to your function
• Environment variables to your function
• AmazonCodeGuruProfilerAgentAccess policy to your function
(Diagram: the Lambda Function (MyFunction) sends runtime performance insights to the CodeGuru Profiler group aws-lambda-MyFunction)
AWS Lambda Limits to Know - per region
• Execution:
• Memory allocation: 128 MB – 10GB (1 MB increments)
• Maximum execution time: 900 seconds (15 minutes)
• Environment variables (4 KB)
• Disk capacity in the “function container” (in /tmp): 512 MB to 10GB
• Concurrency executions: 1000 (can be increased)
• Deployment:
• Lambda function deployment size (compressed .zip): 50 MB
• Size of uncompressed deployment (code + dependencies): 250 MB
• Can use the /tmp directory to load other files at startup
• Size of environment variables: 4 KB
AWS Lambda Best Practices
• Perform heavy-duty work outside of your function handler
• Connect to databases outside of your function handler
• Initialize the AWS SDK outside of your function handler
• Pull in dependencies or datasets outside of your function handler
• Use environment variables for :
• Database Connection Strings, S3 bucket, etc… don’t put these values in your code
• Passwords, sensitive values… they can be encrypted using KMS
• Minimize your deployment package size to its runtime necessities.
• Break down the function if need be
• Remember the AWS Lambda limits
• Use Layers where necessary
• Avoid using recursive code, never have a Lambda function call itself
Amazon DynamoDB
NoSQL Serverless Database
Traditional Architecture
(Diagram: Clients -> Elastic Load Balancer -> EC2 Instances in an Auto Scaling group (application layer) -> Amazon RDS (MySQL, PostgreSQL, …) as the database layer)

• Traditional applications leverage RDBMS databases


• These databases have the SQL query language
• Strong requirements about how the data should be modeled
• Ability to do query joins, aggregations, complex computations
• Vertical scaling (getting a more powerful CPU / RAM / IO)
• Horizontal scaling (increasing reading capability by adding EC2 / RDS Read Replicas)
NoSQL databases
• NoSQL databases are non-relational databases and are distributed
• NoSQL databases include MongoDB, DynamoDB, …
• NoSQL databases do not support query joins (or just limited support)
• All the data that is needed for a query is present in one row
• NoSQL databases don’t perform aggregations such as “SUM”, “AVG”, …
• NoSQL databases scale horizontally

• There’s no “right or wrong” for NoSQL vs SQL; they just require you to model the data differently and think about user queries differently
Amazon DynamoDB
• Fully managed, highly available with replication across multiple AZs
• NoSQL database - not a relational database
• Scales to massive workloads, distributed database
• Millions of requests per second, trillions of rows, 100s of TB of storage
• Fast and consistent in performance (low latency on retrieval)
• Integrated with IAM for security, authorization and administration
• Enables event driven programming with DynamoDB Streams
• Low cost and auto-scaling capabilities
• Standard & Infrequent Access (IA) Table Class
DynamoDB - Basics
• DynamoDB is made of Tables
• Each table has a Primary Key (must be decided at creation time)
• Each table can have an infinite number of items (= rows)
• Each item has attributes (can be added over time – can be null)
• Maximum size of an item is 400KB
• Data types supported are:
• Scalar Types – String, Number, Binary, Boolean, Null
• Document Types – List, Map
• Set Types – String Set, Number Set, Binary Set
DynamoDB – Primary Keys
• Option 1: Par tition Key (HASH)
• Partition key must be unique for each item
• Partition key must be “diverse” so that the data is distributed
• Example: “User_ID” for a users table

Primary Key Attributes

Partition Key
User_ID First_Name Last_Name Age
7791a3d6-… John William 46
873e0634-… Oliver 24
a80f73a1-… Katie Lucas 31
DynamoDB – Primary Keys
• Option 2: Par tition Key + Sor t Key (HASH + RANGE)
• The combination must be unique for each item
• Data is grouped by partition key
• Example: users-games table, “User_ID” for Partition Key and “Game_ID” for Sort Key

User_ID (Partition Key)   Game_ID (Sort Key)   Score   Result
7791a3d6-…                4421                 92      Win
873e0634-…                1894                 14      Lose
873e0634-…                4521                 77      Win
(the last two rows share the same partition key but have different sort keys)
DynamoDB – Partition Keys (Exercise)
• We’re building a movie database
• What is the best Partition Key to maximize data distribution?
• movie_id
• producer_name
• leader_actor_name
• movie_language

• “movie_id” has the highest cardinality so it’s a good candidate


• “movie_language” doesn’t take many values and may be skewed
towards English so it’s not a great choice for the Partition Key
DynamoDB – Read/Write Capacity Modes
• Control how you manage your table’s capacity (read/write throughput)

• Provisioned Mode (default)


• You specify the number of reads/writes per second
• You need to plan capacity beforehand
• Pay for provisioned read & write capacity units

• On-Demand Mode
• Read/writes automatically scale up/down with your workloads
• No capacity planning needed
• Pay for what you use, more expensive ($$$)

• You can switch between different modes once every 24 hours


R/W Capacity Modes – Provisioned
• Table must have provisioned read and write capacity units
• Read Capacity Units (RCU) – throughput for reads
• Write Capacity Units (WCU) – throughput for writes
• Option to setup auto-scaling of throughput to meet demand
• Throughput can be exceeded temporarily using “Burst Capacity”
• If Burst Capacity has been consumed, you’ll get a
“ProvisionedThroughputExceededException”
• It’s then advised to do an exponential backoff retry
DynamoDB – Write Capacity Units (WCU)
• One Write Capacity Unit (WCU) represents one write per second for an
item up to 1 KB in size
• If the items are larger than 1 KB, more WCUs are consumed

• Example 1: we write 10 items per second, with item size 2 KB
• We need 10 * (2 KB / 1 KB) = 20 WCUs
• Example 2: we write 6 items per second, with item size 4.5 KB
• We need 6 * (5 KB / 1 KB) = 30 WCUs (4.5 KB gets rounded up to 5 KB)
• Example 3: we write 120 items per minute, with item size 2 KB
• We need (120 / 60) * (2 KB / 1 KB) = 4 WCUs
Strongly Consistent Read vs. Eventually Consistent Read
• Eventually Consistent Read (default)
• If we read just after a write, it’s possible we’ll get some stale data because of replication
• Strongly Consistent Read
• If we read just after a write, we will get the correct data
• Set “ConsistentRead” parameter to True in API calls (GetItem, BatchGetItem, Query, Scan)
• Consumes twice the RCU
(Diagram: the application writes to one DynamoDB server, which replicates to the other servers the application may read from)
DynamoDB – Read Capacity Units (RCU)
• One Read Capacity Unit (RCU) represents one Strongly Consistent Read per
second, or two Eventually Consistent Reads per second, for an item up to 4
KB in size
• If the items are larger than 4 KB, more RCUs are consumed

• Example 1: 10 Strongly Consistent Reads per second, with item size 4 KB
• We need 10 * (4 KB / 4 KB) = 10 RCUs
• Example 2: 16 Eventually Consistent Reads per second, with item size 12 KB
• We need (16 / 2) * (12 KB / 4 KB) = 24 RCUs
• Example 3: 10 Strongly Consistent Reads per second, with item size 6 KB
• We need 10 * (8 KB / 4 KB) = 20 RCUs (we must round up 6 KB to 8 KB)
DynamoDB – Partitions Internal
• Data is stored in partitions
• Partition Keys go through a hashing algorithm to know to which partition they go to
• To compute the number of partitions:
• # of partitions (by capacity) = (Total RCUs / 3000) + (Total WCUs / 1000)
• # of partitions (by size) = Total Size / 10 GB
• # of partitions = ceil(max(# of partitions by capacity, # of partitions by size))
• WCUs and RCUs are spread evenly across partitions
(Diagram: the application writes items such as ID_13 and ID_45; the hash of each partition key determines whether the item lands in Partition 1, Partition 2, …)
DynamoDB – Throttling
• If we exceed provisioned RCUs or WCUs, we get
“ProvisionedThroughputExceededException”
• Reasons:
• Hot Keys – one partition key is being read too many times (e.g., popular item)
• Hot Par titions
• Very large items, remember RCU and WCU depends on size of items
• Solutions:
• Exponential backoff when exception is encountered (already in SDK)
• Distribute par tition keys as much as possible
• If RCU issue, we can use DynamoDB Accelerator (DAX)
R/W Capacity Modes – On-Demand
• Read/writes automatically scale up/down with your workloads
• No capacity planning needed (WCU / RCU)
• Unlimited WCU & RCU, no throttle, more expensive
• You’re charged for reads/writes that you use in terms of RRU and
WRU
• Read Request Units (RRU) – throughput for reads (same as RCU)
• Write Request Units (WRU) – throughput for writes (same as WCU)
• 2.5x more expensive than provisioned capacity (use with care)
• Use cases: unknown workloads, unpredictable application traffic, …
DynamoDB – Writing Data
• PutItem
• Creates a new item or fully replace an old item (same Primary Key)
• Consumes WCUs

• UpdateItem
• Edits an existing item’s attributes or adds a new item if it doesn’t exist
• Can be used to implement Atomic Counters – a numeric attribute that’s
unconditionally incremented

• Conditional Writes
• Accept a write/update/delete only if conditions are met, otherwise returns an error
• Helps with concurrent access to items
• No performance impact
DynamoDB – Reading Data
• GetItem
• Read based on Primary key
• Primary Key can be HASH or HASH+RANGE
• Eventually Consistent Read (default)
• Option to use Strongly Consistent Reads (more RCU - might take longer)
• ProjectionExpression can be specified to retrieve only certain attributes
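A hedged boto3 sketch of GetItem with a strongly consistent read and a ProjectionExpression (table, key, and attribute names are hypothetical):

import boto3

table = boto3.resource("dynamodb").Table("Users")

response = table.get_item(
    Key={"User_ID": "7791a3d6"},               # Primary Key (HASH or HASH+RANGE)
    ConsistentRead=True,                        # strongly consistent read (consumes 2x RCU)
    ProjectionExpression="First_Name, Age",     # retrieve only these attributes
)
item = response.get("Item")                     # None if the key does not exist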
DynamoDB – Reading Data (Query)
• Query returns items based on:
• KeyConditionExpression
• Partition Key value (must be = operator) – required
• Sort Key value (=, <, <=, >, >=, Between, Begins with) – optional
• FilterExpression
• Additional filtering after the Query operation (before data returned to you)
• Use only with non-key attributes (does not allow HASH or RANGE attributes)
• Returns:
• The number of items specified in Limit
• Or up to 1 MB of data
• Ability to do pagination on the results
• Can query table, a Local Secondary Index, or a Global Secondary Index
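A hedged boto3 sketch of a Query: the partition key condition is required (equality only), the sort key condition and FilterExpression are optional. Table and attribute names are hypothetical, loosely following the users-games example:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("UserGames")

response = table.query(
    KeyConditionExpression=Key("User_ID").eq("873e0634") & Key("Game_ID").gt(1000),
    FilterExpression=Attr("Result").eq("Win"),   # non-key attribute, applied after the query
    Limit=25,                                    # page size (also bounded by 1 MB of data)
)
items = response["Items"]
# a "LastEvaluatedKey" in the response means there are more pages to paginate through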
DynamoDB – Reading Data (Scan)
• Scan the entire table and then filter out data (inefficient)
• Returns up to 1 MB of data – use pagination to keep on reading
• Consumes a lot of RCU
• Limit impact using Limit or reduce the size of the result and pause
• For faster performance, use Parallel Scan
• Multiple workers scan multiple data segments at the same time
• Increases the throughput and RCU consumed
• Limit the impact of parallel scans just like you would for Scans
• Can use ProjectionExpression & FilterExpression (no changes to
RCU)
DynamoDB – Deleting Data
• DeleteItem
• Delete an individual item
• Ability to perform a conditional delete

• DeleteTable
• Delete a whole table and all its items
• Much quicker deletion than calling DeleteItem on all items
DynamoDB – Batch Operations
• Allows you to save in latency by reducing the number of API calls
• Operations are done in parallel for better efficiency
• Part of a batch can fail; in which case we need to try again for the failed items

• BatchWriteItem
• Up to 25 PutItem and/or DeleteItem in one call
• Up to 16 MB of data written, up to 400 KB of data per item
• Can’t update items (use UpdateItem)
• UnprocessedItems for failed write operations (exponential backoff or add WCU)

• BatchGetItem
• Return items from one or more tables
• Up to 100 items, up to 16 MB of data
• Items are retrieved in parallel to minimize latency
• UnprocessedKeys for failed read operations (exponential backoff or add RCU)
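A hedged sketch of batched writes using boto3's batch_writer helper, which groups PutItem requests into BatchWriteItem calls and re-sends unprocessed items for you (table and attribute names are hypothetical):

import boto3

table = boto3.resource("dynamodb").Table("Products")

with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={"Product_ID": str(i), "Product_Name": f"item-{i}"})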
DynamoDB – PartiQL
• SQL-compatible query language for DynamoDB
• Allows you to select, insert, update, and delete
data in DynamoDB using SQL
• Run queries across multiple DynamoDB tables
• Run PartiQL queries from:
• AWS Management Console
• NoSQL Workbench for DynamoDB
• DynamoDB APIs
• AWS CLI
• AWS SDK
DynamoDB – Conditional Writes
• For PutItem, UpdateItem, DeleteItem, and BatchWriteItem
• You can specify a Condition expression to determine which items should be modified:
• attribute_exists
• attribute_not_exists
• attribute_type
• contains (for string)
• begins_with (for string)
• ProductCategory IN (:cat1, :cat2) and Price between :low and :high
• size (string length)

• Note: Filter Expression filters the results of read queries, while Condition
Expressions are for write operations
Conditional Writes – Example on Update Item

Conditional Writes – Example on Delete Item
• attribute_not_exists
• Only succeeds if the attribute doesn’t exist yet (no value)

• attribute_exists
• Opposite of attribute_not_exists
Conditional Writes –
Do Not Overwrite Elements
• attribute_not_exists(par tition_key)
• Make sure the item isn’t overwritten

• attribute_not_exists(par tition_key) and


attribute_not_exists(sor t_key)
• Make sure the partition / sort key combination is not overwritten
Conditional Writes – Example Complex Condition
Conditional Writes – Example of String Comparisons
• begins_with – check if prefix matches
• contains – check if string is contained in another string

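A hedged boto3 sketch of a conditional write: the PutItem only succeeds if no item with the same partition key already exists (table and attribute names are hypothetical):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Products")

try:
    table.put_item(
        Item={"Product_ID": "759692", "Product_Name": "Jeans", "Price": 40},
        ConditionExpression="attribute_not_exists(Product_ID)",   # do not overwrite
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("Item already exists - write rejected")
    else:
        raise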
DynamoDB – Local Secondary Index (LSI)
• Alternative Sor t Key for your table (same Par tition Key as that of base table)
• The Sort Key consists of one scalar attribute (String, Number, or Binary)
• Up to 5 Local Secondary Indexes per table
• Must be defined at table creation time
• Attribute Projections – can contain some or all the attributes of the base table
(KEYS_ONLY, INCLUDE, ALL)
User_ID (Partition Key)   Game_ID (Sort Key)   Game_TS (LSI Sort Key)    Score   Result
7791a3d6-…                4421                 “2021-03-15T17:43:08”     92      Win
873e0634-…                4521                 “2021-06-20T19:02:32”             Lose
a80f73a1-…                1894                 “2021-02-11T04:11:31”     77      Win
DynamoDB – Global Secondary Index (GSI)
• Alternative Primary Key (HASH or HASH+RANGE) from the base table
• Speed up queries on non-key attributes
• The Index Key consists of scalar attributes (String, Number, or Binary)
• Attribute Projections – some or all the attributes of the base table (KEYS_ONLY, INCLUDE, ALL)
• Must provision RCUs & WCUs for the index
• Can be added/modified after table creation

TABLE (query by “User_ID”)
User_ID (Partition Key)   Game_ID (Sort Key)   Game_TS (Attribute)
7791a3d6-…                4421                 “2021-03-15T17:43:08”
873e0634-…                4521                 “2021-06-20T19:02:32”
a80f73a1-…                1894                 “2021-02-11T04:11:31”

INDEX GSI (query by “Game_ID”)
Game_ID (Partition Key)   Game_TS (Sort Key)        User_ID (Attribute)
4421                      “2021-03-15T17:43:08”     7791a3d6-…
4521                      “2021-06-20T19:02:32”     873e0634-…
1894                      “2021-02-11T04:11:31”     a80f73a1-…
DynamoDB – Indexes and Throttling
• Global Secondary Index (GSI):
• If the writes are throttled on the GSI, then the main table will be throttled!
• Even if the WCU on the main tables are fine
• Choose your GSI partition key carefully!
• Assign your WCU capacity carefully!

• Local Secondary Index (LSI):


• Uses the WCUs and RCUs of the main table
• No special throttling considerations
DynamoDB - PartiQL
• Use a SQL-like syntax to manipulate DynamoDB tables

• Supports some (but not all) statements:


• INSERT
• UPDATE
• SELECT
• DELETE
• It supports Batch operations
DynamoDB – Optimistic Locking
• DynamoDB has a feature called “Conditional Writes”
• A strategy to ensure an item hasn’t changed before you update/delete it
• Each item has an attribute that acts as a version number

(Example: the item starts as First_Name = Michael, Version = 1. Client 1 sends “Update: Name = John only if version = 1” and Client 2 sends “Update: Name = Lisa only if version = 1”; only one of the two conditional updates succeeds and bumps the version to 2 – see the sketch below)
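A hedged Python sketch of the pattern, mirroring the example above (table and attribute names are hypothetical): the update only succeeds if the stored version still matches the one the client read.

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Users")

def rename_user(user_id, new_name, expected_version):
    try:
        table.update_item(
            Key={"User_ID": user_id},
            UpdateExpression="SET First_Name = :n, #ver = :new_ver",
            ConditionExpression="#ver = :expected",          # optimistic lock
            ExpressionAttributeNames={"#ver": "Version"},
            ExpressionAttributeValues={
                ":n": new_name,
                ":expected": expected_version,
                ":new_ver": expected_version + 1,
            },
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            print("Item changed since it was read - re-read and retry")
        else:
            raise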
DynamoDB Accelerator (DAX)
• Fully-managed, highly available, seamless in-memory cache for DynamoDB
• Microseconds latency for cached reads & queries
• Doesn’t require application logic modification (compatible with existing DynamoDB APIs)
• Solves the “Hot Key” problem (too many reads)
• 5 minutes TTL for cache (default)
• Up to 10 nodes in the cluster
• Multi-AZ (3 nodes minimum recommended for production)
• Secure (Encryption at rest with KMS, VPC, IAM, CloudTrail, …)
(Diagram: the application talks to the DAX cluster nodes, which sit in front of the Amazon DynamoDB tables)
DynamoDB Accelerator (DAX) vs. ElastiCache

• Amazon ElastiCache: store aggregation results
• DynamoDB Accelerator (DAX): individual objects cache, Query & Scan cache
(Diagram: the application uses ElastiCache alongside DAX, which fronts Amazon DynamoDB)
DynamoDB Streams
• Ordered stream of item-level modifications (create/update/delete) in a table
• Stream records can be:
• Sent to Kinesis Data Streams
• Read by AWS Lambda
• Read by Kinesis Client Library applications
• Data Retention for up to 24 hours
• Use cases:
• react to changes in real-time (welcome email to users)
• Analytics
• Insert into derivative tables
• Insert into OpenSearch Service
• Implement cross-region replication
DynamoDB Streams
(Diagram: the application creates/updates/deletes items in the DDB Table; the changes flow into DynamoDB Streams, which are processed by a KCL App or Lambda (filtering, transforming, …) for messaging/notifications via Amazon SNS, or forwarded through Kinesis Data Streams and Kinesis Data Firehose for analytics in Amazon Redshift, archiving in Amazon S3, and indexing in OpenSearch Service)
DynamoDB Streams
• Ability to choose the information that will be written to the stream:
• KEYS_ONLY – only the key attributes of the modified item
• NEW_IMAGE – the entire item, as it appears after it was modified
• OLD_IMAGE – the entire item, as it appeared before it was modified
• NEW_AND_OLD_IMAGES – both the new and the old images of the item
• DynamoDB Streams are made of shards, just like Kinesis Data Streams
• You don’t provision shards, this is automated by AWS

• Records are not retroactively populated in a stream after enabling it


DynamoDB Streams & AWS Lambda
• You need to define an Event Source Mapping to read from a DynamoDB Stream
• You need to ensure the Lambda function has the appropriate permissions
• Your Lambda function is invoked synchronously
(Diagram: changes from the table go to DynamoDB Streams; the internal Event Source Mapping polls the stream, returns batches, and invokes the Lambda function synchronously with an event batch)
DynamoDB – Time To Live (TTL)
• Automatically delete items after an expiry timestamp
• Doesn’t consume any WCUs (i.e., no extra cost)
• The TTL attribute must be a “Number” data type with “Unix Epoch timestamp” value
• Expired items deleted within 48 hours of expiration
• Expired items, that haven’t been deleted, appear in reads/queries/scans (if you don’t want them, filter them out)
• Expired items are deleted from both LSIs and GSIs
• A delete operation for each expired item enters the DynamoDB Streams (can help recover expired items)
• Use cases: reduce stored data by keeping only current items, adhere to regulatory obligations, …
(Example: a SessionData table with User_ID, Session_ID, and ExpTime (TTL) attributes; the expiration process scans and expires items whose ExpTime is before the current epoch time, e.g. 1631274971, and the deletion process then scans and deletes the expired items)
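A hedged sketch of writing an item with a TTL attribute: a Number holding a Unix epoch timestamp, here “expire in one hour” (table and attribute names mirror the SessionData example and are hypothetical):

import time
import boto3

table = boto3.resource("dynamodb").Table("SessionData")

table.put_item(
    Item={
        "User_ID": "7791a3d6",
        "Session_ID": "74686572652",
        "ExpTime": int(time.time()) + 3600,   # TTL attribute: epoch seconds
    }
)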
DynamoDB CLI – Good to Know
• --projection-expression: one or more attributes to retrieve
• --filter-expression: filter items before returned to you

• General AWS CLI Pagination options (e.g., DynamoDB, S3, …)


• --page-size: specify that AWS CLI retrieves the full list of items but with a larger
number of API calls instead of one API call (default: 1000 items)
• --max-items: max. number of items to show in the CLI (returns NextToken)
• --star ting-token: specify the last NextToken to retrieve the next set of items
DynamoDB Transactions
• Coordinated, all-or-nothing operations (add/update/delete) to multiple items
across one or more tables
• Provides Atomicity, Consistency, Isolation, and Durability (ACID)
• Read Modes – Eventual Consistency, Strong Consistency, Transactional
• Write Modes – Standard, Transactional
• Consumes 2x WCUs & RCUs
• DynamoDB performs 2 operations for every item (prepare & commit)
• Two operations:
• TransactGetItems – one or more GetItem operations
• TransactWriteItems – one or more PutItem, UpdateItem, and DeleteItem operations
• Use cases: financial transactions, managing orders, multiplayer games, …
DynamoDB Transactions
• A Transaction is written to both tables, or none!
• Example: a single transaction performs an UpdateItem on the AccountBalance table (Account_ID, Balance, Last_Tx_ts) and a PutItem on the BankTransactions table (Tx_ID, Tx_ts, From_Acc, To_Acc, Amount)


DynamoDB Transactions – Capacity Computations
• Important for the exam!
• Example 1: 3 Transactional writes per second, with item size 5 KB
• We need 3 * (5 KB / 1 KB) * 2 (transactional cost) = 30 WCUs
• Example 2: 5 Transactional reads per second, with item size 5 KB
• We need 5 * (8 KB / 4 KB) * 2 (transactional cost) = 20 RCUs (5 KB gets rounded up to the next multiple of 4 KB, i.e. 8 KB)
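A hedged boto3 sketch of TransactWriteItems, loosely mirroring the bank-transfer example above (table, key, and attribute names are hypothetical). Transactions are exposed on the low-level client, so attribute values use the typed format; either both operations are applied or neither is.

import boto3

client = boto3.client("dynamodb")

client.transact_write_items(
    TransactItems=[
        {
            "Update": {
                "TableName": "AccountBalance",
                "Key": {"Account_ID": {"S": "acc_759692"}},
                "UpdateExpression": "SET #bal = #bal - :amount",
                "ExpressionAttributeNames": {"#bal": "Balance"},
                "ExpressionAttributeValues": {":amount": {"N": "45"}},
            }
        },
        {
            "Put": {
                "TableName": "BankTransactions",
                "Item": {
                    "Tx_ID": {"S": "75969242"},
                    "From_Acc": {"S": "acc_759692"},
                    "To_Acc": {"S": "acc_315972"},
                    "Amount": {"N": "45"},
                },
            }
        },
    ]
)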
DynamoDB as Session State Cache
• It’s common to use DynamoDB to store session states

• vs. ElastiCache
• ElastiCache is in-memory, but DynamoDB is serverless
• Both are key/value stores
• vs. EFS
• EFS must be attached to EC2 instances as a network drive
• vs. EBS & Instance Store
• EBS & Instance Store can only be used for local caching, not shared caching
• vs. S3
• S3 is higher latency, and not meant for small objects
DynamoDB Write Sharding
• Imagine we have a voting application with two candidates, candidate A and candidate B
• If Partition Key is “Candidate_ID”, this results into two partitions, which will generate issues (e.g., Hot Partition)
• A strategy that allows better distribution of items evenly across partitions
• Add a suffix to the Partition Key value
• Two methods:
• Sharding Using Random Suffix
• Sharding Using Calculated Suffix
(Example table: Partition Key = Candidate_ID + Random Suffix (e.g., Candidate_A-11, Candidate_B-17, Candidate_B-80, Candidate_A-20), Sort Key = Vote_ts, Attribute = Voter_ID)
DynamoDB – Write Types
• Concurrent Writes: two clients send “Update: value = 1” and “Update: value = 2”; the second write overwrites the first write
• Atomic Writes: two clients send “Update: INCREASE value by 1” and “Update: INCREASE value by 2”; both writes succeed, the value is increased by 3
• Conditional Writes: two clients send “Update: value = 1 only if value = 0” and “Update: value = 2 only if value = 0”; the first write is accepted, the second write fails
• Batch Writes: write/update many items at a time

DynamoDB – Large Objects Pattern
• The application uploads the large object (e.g., 617055.jpg) to an S3 Bucket (media-assets-bucket) and stores only its metadata in DynamoDB; consumers read the metadata, then download the object from S3

Products (Table)
Product_ID   Product_Name   Image_URL
759692       Jeans          https://media-assets-bucket.us-east-1.amazonaws.com/759692.jpg
315972       Coat           https://media-assets-bucket.us-east-1.amazonaws.com/315972.jpg
617055       Shirt          https://media-assets-bucket.us-east-1.amazonaws.com/617055.jpg


DynamoDB – Indexing S3 Objects Metadata
• The application uploads objects to an S3 Bucket, which invokes a Lambda function that stores the object’s metadata in a DynamoDB Table
• A client application (API for objects’ metadata) can then:
- Search by date
- Total storage used by a customer
- List of all objects with certain attributes
- Find all objects uploaded within a date range
DynamoDB Operations
• Table Cleanup
• Option 1: Scan + DeleteItem
• Very slow, consumes RCU & WCU, expensive
• Option 2: Drop Table + Recreate table
• Fast, efficient, cheap
• Copying a DynamoDB Table
• Option 1: Using AWS Data Pipeline
• Option 2: Backup and restore into a new table
• Takes some time
• Option 3: Scan + PutItem or BatchWriteItem
• Write your own code
(Diagram: AWS Data Pipeline launches an Amazon EMR Cluster that reads from the DynamoDB Table and writes to an S3 Bucket, then reads from S3 and writes into the target table)
DynamoDB – Security & Other Features
• Security
• VPC Endpoints available to access DynamoDB without using the Internet
• Access fully controlled by IAM
• Encryption at rest using AWS KMS and in-transit using SSL/TLS
• Backup and Restore feature available
• Point-in-time Recovery (PITR) like RDS
• No performance impact
• Global Tables
• Multi-region, multi-active, fully replicated, high performance
• DynamoDB Local
• Develop and test apps locally without accessing the DynamoDB web service (without Internet)
• AWS Database Migration Service (AWS DMS) can be used to migrate to
DynamoDB (from MongoDB, Oracle, MySQL, S3, …)
DynamoDB – Users Interact with DynamoDB Directly
(Diagram: clients/applications (web & mobile) log in with an identity provider – Amazon Cognito User Pools, Google, Facebook, OpenID Connect – exchange that login for temporary AWS credentials, obtain IAM Role permissions, and use them to access the DynamoDB Table directly)
DynamoDB – Fine-Grained Access Control
• Using Web Identity Federation or
Cognito Identity Pools, each user
gets AWS credentials
• You can assign an IAM Role to
these users with a Condition to
limit their API access to
DynamoDB
• LeadingKeys – limit row-level
access for users on the Primary
Key
• Attributes – limit specific
attributes the user can see
More at: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/specifying-conditions.html
Amazon API Gateway
Build, Deploy and Manage APIs
Example: Building a Serverless API

(Diagram: Client -> REST API on API Gateway -> proxy requests to Lambda -> CRUD on DynamoDB)


AWS API Gateway
• AWS Lambda + API Gateway: No infrastructure to manage
• Support for the WebSocket Protocol
• Handle API versioning (v1, v2…)
• Handle different environments (dev, test, prod…)
• Handle security (Authentication and Authorization)
• Create API keys, handle request throttling
• Swagger / Open API import to quickly define APIs
• Transform and validate requests and responses
• Generate SDK and API specifications
• Cache API responses
API Gateway – Integrations High Level
• Lambda Function
• Invoke Lambda function
• Easy way to expose REST API backed by AWS Lambda
• HTTP
• Expose HTTP endpoints in the backend
• Example: internal HTTP API on premise, Application Load Balancer…
• Why? Add rate limiting, caching, user authentications, API keys, etc…
• AWS Service
• Expose any AWS API through the API Gateway
• Example: start an AWS Step Function workflow, post a message to SQS
• Why? Add authentication, deploy publicly, rate control…
API Gateway – AWS Service Integration
Kinesis Data Streams example

(Diagram: the Client sends requests to API Gateway, which sends records to Kinesis Data Streams; Kinesis Data Firehose then stores .json files in Amazon S3)
API Gateway - Endpoint Types
• Edge-Optimized (default): For global clients
• Requests are routed through the CloudFront Edge locations (improves latency)
• The API Gateway still lives in only one region
• Regional:
• For clients within the same region
• Could manually combine with CloudFront (more control over the caching
strategies and the distribution)
• Private:
• Can only be accessed from your VPC using an interface VPC endpoint (ENI)
• Use a resource policy to define access
API Gateway – Security
• User Authentication through
• IAM Roles (useful for internal applications)
• Cognito (identity for external users – example mobile users)
• Custom Authorizer (your own logic)

• Custom Domain Name HTTPS security through integration with AWS


Certificate Manager (ACM)
• If using Edge-Optimized endpoint, then the certificate must be in us-east-1
• If using Regional endpoint, the certificate must be in the API Gateway region
• Must setup CNAME or A-alias record in Route 53
API Gateway – Deployment Stages
• Making changes in the API Gateway does not mean they’re effective
• You need to make a “deployment” for them to be in effect
• It’s a common source of confusion
• Changes are deployed to “Stages” (as many as you want)
• Use the naming you like for stages (dev, test, prod)
• Each stage has its own configuration parameters
• Stages can be rolled back as a history of deployments is kept
API Gateway – Stages v1 and v2
(Example of an API breaking change: V1 clients keep using https://api.example.com/v1 backed by the v1 Stage and the V1 Lambda, while V2 clients use the new URL https://api.example.com/v2 backed by the v2 Stage and the V2 Lambda)
API Gateway – Stage Variables
• Stage variables are like environment variables for API Gateway
• Use them to change often changing configuration values
• They can be used in:
• Lambda function ARN
• HTTP Endpoint
• Parameter mapping templates
• Use cases:
• Configure HTTP endpoints your stages talk to (dev, test, prod…)
• Pass configuration parameters to AWS Lambda through mapping templates
• Stage variables are passed to the ”context” object in AWS Lambda
• Format: ${stageVariables.variableName}
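A hedged sketch of reading a stage variable from a Lambda function behind a Lambda proxy integration, where API Gateway includes the stage variables in the incoming event ("lambdaAlias" is a hypothetical stage variable name):

def lambda_handler(event, context):
    stage_vars = event.get("stageVariables") or {}
    alias = stage_vars.get("lambdaAlias", "dev")
    return {"statusCode": 200, "body": f"resolved alias: {alias}"}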
API Gateway Stage Variables & Lambda Aliases
• We create a stage variable to indicate the corresponding Lambda alias
• Our API gateway will automatically invoke the right Lambda function!
(Diagram: the Prod Stage points at the PROD alias (95% V1 / 5% V2), the Test Stage at the TEST alias (100% V2), and the Dev Stage at the DEV alias (100% $LATEST); Lambda alias changes require no API Gateway changes)
API Gateway – Canary Deployment
• Possibility to enable canary deployments for any stage (usually prod)
• Choose the % of traffic the canary channel receives
(Diagram: the Client sends 95% of traffic to the Prod Stage (v1) and 5% to the Prod Stage Canary (v2))
• Metrics & Logs are separate (for better monitoring)


• Possibility to override stage variables for canary
• This is blue / green deployment with AWS Lambda & API Gateway
API Gateway - Integration Types
• Integration Type MOCK
• API Gateway returns a response without sending the request to the backend

• Integration Type HTTP / AWS (Lambda & AWS Services)


• you must configure both the integration request and integration response
• Setup data mapping using mapping templates for the request & response

(Diagram: Client -> REST API on API Gateway -> AWS Service Integration to an SQS Queue, using Mapping Templates)
API Gateway - Integration Types
• Integration Type AWS_PROXY (Lambda Proxy):
• incoming request from the client is the input to Lambda
• The function is responsible for the logic of request / response
• No mapping template, headers, query string parameters… are passed as
arguments

Lambda function invocation payload Lambda function expected response


API Gateway - Integration Types
• Integration Type HTTP_PROXY
• No mapping template
• The HTTP request is passed to the backend
• The HTTP response from the backend is forwarded by API Gateway
• Possibility to add HTTP Headers if need be (ex: API key)

(Diagram: the Client's HTTP request is proxied by API Gateway to an Application Load Balancer; API Gateway can optionally add an HTTP header such as API Key: asjdh2j3jh3j…)
Mapping Templates (AWS & HTTP Integration)
• Mapping templates can be used to modify request / responses
• Rename / Modify query string parameters
• Modify body content
• Add headers
• Uses Velocity Template Language (VTL): for loop, if etc…
• Filter output results (remove unnecessary data)
• Content-Type can be set to application/json or application/xml
Mapping Example: JSON to XML with SOAP
• SOAP APIs are XML based, whereas REST APIs are JSON based
(Diagram: the Client sends a RESTful JSON payload to API Gateway, which uses a mapping template to produce an XML payload for the SOAP API)

• In this case, API Gateway should:


• Extract data from the request: either path, payload or header
• Build SOAP message based on request data (mapping template)
• Call SOAP service and receive XML response
• Transform XML response to desired format (like JSON), and respond to the user
Mapping Example: Query String parameters
(Diagram: the Client sends an HTTP request to http://example.com/path?name=foo&other=bar; API Gateway applies a mapping template and passes JSON { “my_variable”: “foo”, “other_variable”: “bar” } to Lambda – you can rename variables and map them to anything you want)
API Gateway - Open API spec
• Common way of defining REST APIs, using API definition as code
• Import existing OpenAPI 3.0 spec to API Gateway
• Method
• Method Request
• Integration Request
• Method Response
• + AWS extensions for API gateway and setup every single option
• Can export current API as OpenAPI spec
• OpenAPI specs can be written in YAML or JSON
• Using OpenAPI we can generate SDK for our applications
REST API – Request Validation
• You can configure API Gateway to perform basic validation of an API
request before proceeding with the integration request
• When the validation fails, API Gateway immediately fails the request
• Returns a 400-error response to the caller
• This reduces unnecessary calls to the backend
• Checks:
• The required request parameters in the URI, query string, and headers of an
incoming request are included and non-blank
• The applicable request payload adheres to the configured JSON Schema
request model of the method
REST API – Request Validation – OpenAPI
• Setup request validation by importing an OpenAPI definitions file
• Enable a params-only validator on all API methods
• Enable the full validator on the POST /validation method (overrides the params-only validator inherited from the API)
• The OpenAPI file also defines the validators themselves


Caching API responses
• Caching reduces the number of calls made to the backend
• Default TTL (time to live) is 300 seconds (min: 0s, max: 3600s)
• Caches are defined per stage
• Possible to override cache settings per method
• Cache encryption option
• Cache capacity between 0.5GB to 237GB
• Cache is expensive, makes sense in production, may not make sense in dev / test
(Diagram: the Client calls API Gateway, which checks the Gateway cache first and only calls the backend on a cache miss)
API Gateway Cache Invalidation
• Able to flush the entire cache (invalidate it) immediately
• Clients can invalidate the cache with the header Cache-Control: max-age=0 (with proper IAM authorization)
• If you don't impose an InvalidateCache policy (or choose the Require authorization check box in the console), any client can invalidate the API cache
API Gateway – Usage Plans & API Keys
• If you want to make an API available as an offering ($) to your customers
• Usage Plan:
• who can access one or more deployed API stages and methods
• how much and how fast they can access them
• uses API keys to identify API clients and meter access
• configure throttling limits and quota limits that are enforced on individual client
• API Keys:
• alphanumeric string values to distribute to your customers
• Ex: WBjHxNtoAb4WPKBC7cGm64CBibIb24b4jt8jJHo9
• Can use with usage plans to control access
• Throttling limits are applied to the API keys
• Quotas limits is the overall number of maximum requests
API Gateway – Correct Order for API keys
• To configure a usage plan
1. Create one or more APIs, configure the methods to require an API key, and
deploy the APIs to stages.
2. Generate or import API keys to distribute to application developers (your
customers) who will be using your API.
3. Create the usage plan with the desired throttle and quota limits.
4. Associate API stages and API keys with the usage plan.

• Callers of the API must supply an assigned API key in the x-api-key header in
requests to the API.
API Gateway – Logging & Tracing
• CloudWatch Logs
• Log contains information about request/response body
• Enable CloudWatch logging at the Stage level (with Log Level - ERROR, DEBUG, INFO)
• Can override settings on a per API basis
(Diagram: the User's request and response pass through API Gateway to the backend; the request and response are logged to CloudWatch Logs)

• X-Ray
• Enable tracing to get extra information about requests in API Gateway
• X-Ray API Gateway + AWS Lambda gives you the full picture
API Gateway – CloudWatch Metrics
• Metrics are by stage, Possibility to enable detailed metrics
• CacheHitCount & CacheMissCount: efficiency of the cache
• Count: The total number API requests in a given period.
• IntegrationLatency: The time between when API Gateway relays a
request to the backend and when it receives a response from the
backend.
• Latency: The time between when API Gateway receives a request from
a client and when it returns a response to the client. The latency
includes the integration latency and other API Gateway overhead.
• 4XXError (client-side) & 5XXError (server-side)
API Gateway Throttling
• Account Limit
• API Gateway throttles requests at 10,000 rps across all APIs
• Soft limit that can be increased upon request
• In case of throttling => 429 Too Many Requests (retriable error)
• Can set Stage limit & Method limits to improve performance
• Or you can define Usage Plans to throttle per customer

• Just like Lambda Concurrency, one API that is overloaded, if not limited,
can cause the other APIs to be throttled
API Gateway - Errors
• 4xx means Client errors
• 400: Bad Request
• 403: Access Denied, WAF filtered
• 429: Quota exceeded, Throttle

• 5xx means Server errors


• 502: Bad Gateway Exception, usually for an incompatible output returned from a
Lambda proxy integration backend and occasionally for out-of-order invocations due to
heavy loads.
• 503: Service Unavailable Exception
• 504: Integration Failure – ex Endpoint Request Timed-out Exception
API Gateway requests time out after 29 second maximum
AWS API Gateway - CORS
• CORS must be enabled when you receive API calls from another
domain.
• The OPTIONS pre-flight request must contain the following headers:
• Access-Control-Allow-Methods
• Access-Control-Allow-Headers
• Access-Control-Allow-Origin
• CORS can be enabled through the console
CORS – Enabled on the API Gateway
(Example: the web browser loads static content from an S3 bucket at the origin https://www.example.com and calls the cross origin https://api.example.com)

Preflight Request
OPTIONS /
Host: api.example.com
Origin: https://www.example.com

Preflight Response
Access-Control-Allow-Origin: https://www.example.com
Access-Control-Allow-Methods: GET, PUT, DELETE

Actual Request
GET /
Host: api.example.com
Origin: https://www.example.com

The CORS headers received previously allowed the origin, so the web browser can now make the requests
API Gateway – Security
IAM Permissions
• Create an IAM policy authorization and attach to User / Role
• Authentication = IAM | Authorization = IAM Policy
• Good to provide access within AWS (EC2, Lambda, IAM users…)
• Leverages “Sig v4” capability where IAM credentials are in headers
(Diagram: the Client calls the REST API with Sig v4; API Gateway performs an IAM policy check before forwarding to the backend)


API Gateway – Resource Policies
• Resource policies (similar
to Lambda Resource
Policy)

• Allow for Cross Account


Access (combined with
IAM Security)
• Allow for a specific source
IP address
• Allow for a VPC Endpoint
API Gateway – Security
Cognito User Pools
• Cognito fully manages user lifecycle, token expires automatically
• API gateway verifies identity automatically from AWS Cognito
• No custom implementation required
• Authentication = Cognito User Pools | Authorization = API Gateway Methods

(Diagram: the client authenticates against Cognito User Pools and retrieves a token; it calls the REST API passing the token, API Gateway evaluates the Cognito token, then forwards the request to the backend)


API Gateway – Security
Lambda Authorizer (formerly Custom Authorizers)
• Token-based authorizer (bearer token) – ex JWT (JSON Web Token) or OAuth
• A request parameter-based Lambda authorizer (headers, query string, stage var)
• Lambda must return an IAM policy for the user, result policy is cached
• Authentication = External | Authorization = Lambda Authorizer
(Diagram: the client authenticates with a 3rd party authentication system and retrieves a token; it sends the request with the bearer token or request params to API Gateway, which invokes the Lambda authorizer with the context + token; the Lambda returns an IAM principal + IAM policy, which is kept in the policy cache, and allowed requests are forwarded to the backend)
API Gateway – Security – Summary
• IAM:
• Great for users / roles already within your AWS account, + resource policy for cross
account
• Handle authentication + authorization
• Leverages Signature v4
• Custom Authorizer :
• Great for 3rd party tokens
• Very flexible in terms of what IAM policy is returned
• Handle Authentication verification + Authorization in the Lambda function
• Pay per Lambda invocation, results are cached
• Cognito User Pool:
• You manage your own user pool (can be backed by Facebook, Google login etc…)
• No need to write any custom code
• Must implement authorization in the backend
API Gateway – HTTP API vs REST API
• HTTP APIs
• Low-latency, cost-effective AWS Lambda proxy, HTTP proxy APIs and private integration (no data mapping)
• Support OIDC and OAuth 2.0 authorization, and built-in support for CORS
• No usage plans and API keys
• REST APIs
• All features (except Native OpenID Connect / OAuth 2.0)

Authorizers                               HTTP API   REST API
AWS Lambda                                ✓          ✓
IAM                                       ✓          ✓
Resource Policies                                    ✓
Amazon Cognito                            ✓*         ✓
Native OpenID Connect / OAuth 2.0 / JWT   ✓

Full list here: https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-vs-rest.html
API Gateway – WebSocket API – Overview
• What’s WebSocket?
• Two-way interactive communication between a user’s browser and a server over a persistent connection
• Server can push information to the client
• This enables stateful application use cases
• WebSocket APIs are often used in real-time applications such as chat applications, collaboration platforms, multiplayer games, and financial trading platforms
• Works with AWS Services (Lambda, DynamoDB) or HTTP endpoints
• Example chat application: clients keep a persistent connection to the WebSocket API (API Gateway), which invokes Lambda functions (onConnect, sendMessage, onDisconnect) backed by Amazon DynamoDB
Connecting to the API
WebSocket URL
wss://[some-uniqueid].execute-api.[region].amazonaws.com/[stage-name]

connect invoke
connectionId connectionId
Clients Lambda function
Amazon API Gateway Amazon DynamoDB
WebSocket API (onConnect)
Client to Server Messaging
ConnectionID is re-used
WebSocket URL
wss://abcdef.execute-api.us-west-1.amazonaws.com/dev

send message
invoke
frames

frames
connectionId connectionId
Clients Lambda function
Amazon API Gateway Amazon DynamoDB
WebSocket API (sendMessage)
Server to Client Messaging
WebSocket URL
wss://abcdef.execute-api.us-west-1.amazonaws.com/dev

send message
invoke
connectionId connectionId
Clients Lambda function
Amazon API Gateway Amazon DynamoDB
WebSocket API (sendMessage)

Connection URL HTTP POST (IAM Sig v4)


callback
Connection URL
wss://abcdef.execute-api.us-west-1.amazonaws.com/dev/@connections/connectionId
Connection URL Operations
Connection URL
wss://abcdef.execute-api.us-west-1.amazonaws.com/dev/@connections/connectionId

Operation   Action
POST        Sends a message from the Server to the connected WS Client
GET         Gets the latest connection status of the connected WS Client
DELETE      Disconnects the connected Client from the WS connection
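• A hedged sketch of server-to-client messaging from a Python backend via the @connections callback (the endpoint URL and connection ID are placeholders; the callback uses the https form of the stage URL):
    import boto3

    connection_id = "abc123"  # placeholder, normally retrieved from storage (e.g., DynamoDB)
    client = boto3.client(
        "apigatewaymanagementapi",
        endpoint_url="https://abcdef.execute-api.us-west-1.amazonaws.com/dev",
    )
    # POST a message to the connected WebSocket client
    client.post_to_connection(
        ConnectionId=connection_id,
        Data=b'{"message": "hello from the server"}',
    )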
API Gateway – WebSocket API – Routing
INCOMING DATA (example)
{
  "service": "chat",
  "action": "join",
  "data": { "room": "room1234" }
}
• Incoming JSON messages are routed to different backend integrations
• If no route matches => sent to $default
• You specify a route selection expression to select the field in the JSON to route on
• Sample expression: $request.body.action
• The result is evaluated against the route keys available in your API Gateway (route key table: $connect, $disconnect, $default, join, quit, delete)
• The route is then connected to the backend you’ve set up through the API Gateway integration
API Gateway - Architecture
• Create a single interface for all the microservices in your company
• Use API endpoints with various resources
• Apply a simple domain name and SSL certificates
• Can apply forwarding and transformation rules at the API Gateway level
• Example: clients reach customer1.example.com / customer2.example.com via Route 53 (Domain Registrar, DNS) → API Gateway routes /service1 to an ELB in front of an ECS Cluster (microservices), /service2 to an ELB in front of EC2 Auto Scaling, and /docs to an S3 Bucket
AWS CICD
CodeCommit, CodePipeline, CodeBuild, CodeDeploy, …
CICD – Introduction
• We have learned how to:
• Create AWS resources, manually (fundamentals)
• Interact with AWS programmatically (AWS CLI)
• Deploy code to AWS using Elastic Beanstalk
• All these manual steps make it very likely for us to make mistakes!

• We would like our code “in a repository” and have it deployed onto AWS
• Automatically
• The right way
• Making sure it’s tested before being deployed
• With possibility to go into different stages (dev, test, staging, prod)
• With manual approval where needed
• To be a proper AWS developer… we need to learn AWS CICD
CICD – Introduction
• This section is all about automating the deployment we’ve done so far
while adding increased safety

• We’ll learn about:


• AWS CodeCommit – storing our code
• AWS CodePipeline – automating our pipeline from code to Elastic Beanstalk
• AWS CodeBuild – building and testing our code
• AWS CodeDeploy – deploying the code to EC2 instances (not Elastic Beanstalk)
• AWS CodeStar – manage software development activities in one place
• AWS CodeArtifact – store, publish, and share software packages
• AWS CodeGuru – automated code reviews using Machine Learning
Continuous Integration (CI)
• Developers push the code to a code repository often (e.g., GitHub, CodeCommit, Bitbucket…)
• A testing / build server checks the code as soon as it’s pushed (CodeBuild, Jenkins CI, …)
• The developer gets feedback about the tests and checks that have passed / failed
• Find bugs early, then fix bugs
• Deliver faster as the code is tested
• Deploy often
• Happier developers, as they’re unblocked
• Flow: the developer pushes code to the Code Repository → the Build Server (build & test) fetches the code and sends the build & test results back
Continuous Delivery (CD)
• Ensures that the software can be released reliably whenever needed
• Ensures deployments happen often and are quick
• Shift away from “one release every 3 months” to ”5 releases a day”
• That usually means automated deployment (e.g., CodeDeploy, Jenkins CD,
Spinnaker, …)

deploy every
push code fetch code passed build

Developer Code Repository Build Server Deployment


(build & test) Server

Application Application
Server v1 Server v2
Technology Stack for CICD

• Code: AWS CodeCommit, GitHub, or any 3rd party Code Repository
• Build & Test: AWS CodeBuild, Jenkins CI, or any 3rd party CI Servers
• Deploy & Provision: AWS Elastic Beanstalk, or AWS CodeDeploy targeting EC2 Instances, On-premises Instances, AWS Lambda, or Amazon ECS
• Orchestrate using: AWS CodePipeline
AWS CodeCommit
• Version control is the ability to understand the various changes that
happened to the code over time (and possibly roll back)
• All these are enabled by using a version control system such as Git
• A Git repository can be synchronized on your computer, but it usually is
uploaded on a central online repository
• Benefits are:
• Collaborate with other developers
• Make sure the code is backed-up somewhere
• Make sure it’s fully viewable and auditable
AWS CodeCommit
Emma John
(Developer) (Developer)
• Git repositories can be expensive
• The industry includes GitHub, GitLab, Bitbucket, …
• And AWS CodeCommit:
• Private Git repositories
• No size limit on repositories (scale seamlessly)
push code
• Fully managed, highly available
• Code only in AWS Cloud account => increased security
and compliance
• Security (encrypted, access control, …)
• Integrated with Jenkins, AWS CodeBuild, and other CI tools
Code Repository
CodeCommit – Security
• Interactions are done using Git (standard)
• Authentication
• SSH Keys – AWS Users can configure SSH keys in their IAM Console
• HTTPS – with AWS CLI Credential helper or Git Credentials for IAM user
• Authorization
• IAM policies to manage users/roles permissions to repositories
• Encryption
• Repositories are automatically encrypted at rest using AWS KMS
• Encrypted in transit (can only use HTTPS or SSH – both secure)
• Cross-account Access
• Do NOT share your SSH keys or your AWS credentials
• Use an IAM Role in your AWS account and use AWS STS (AssumeRole API)
CodeCommit vs. GitHub
                                       CodeCommit                   GitHub
Support Code Review (Pull Requests)    ✓                            ✓
Integration with AWS CodeBuild         ✓                            ✓
Authentication (SSH & HTTPS)           ✓                            ✓
Security                               IAM Users & Roles            GitHub Users
Hosting                                Managed & hosted by AWS      - Hosted by GitHub
                                                                    - GitHub Enterprise: self-hosted on your servers
UI                                     Minimal                      Fully Featured


AWS CodePipeline
• Visual Workflow to orchestrate your CICD
• Source – CodeCommit, ECR, S3, Bitbucket, GitHub
• Build – CodeBuild, Jenkins, CloudBees, TeamCity
• Test – CodeBuild, AWS Device Farm, 3rd party tools, …
• Deploy – CodeDeploy, Elastic Beanstalk, CloudFormation, ECS, S3, …
• Invoke – Lambda, Step Functions
• Consists of stages:
• Each stage can have sequential actions and/or parallel actions
• Example: Build → Test → Deploy → Load Testing → …
• Manual approval can be defined at any stage
Technology Stack for CICD

• Code: AWS CodeCommit, GitHub, or any 3rd party Code Repository
• Build & Test: AWS CodeBuild, Jenkins CI, or any 3rd party CI Servers
• Deploy & Provision: AWS Elastic Beanstalk, or AWS CodeDeploy targeting EC2 Instances, On-premises Instances, AWS Lambda, or Amazon ECS
• Orchestrate using: AWS CodePipeline
CodePipeline – Artifacts
• Each pipeline stage can create artifacts
• Artifacts are stored in an S3 bucket and passed on to the next stage
• Example: the developer pushes code to AWS CodeCommit (Source) → output artifacts go to the S3 bucket → AWS CodeBuild (Build) reads them as input artifacts and produces its own output artifacts → AWS CodeDeploy (Deploy) takes them as input artifacts and deploys
CodePipeline – Troubleshooting
• For CodePipeline Pipeline/Action/Stage Execution State Changes
• Use CloudWatch Events (Amazon EventBridge). Example:
• You can create events for failed pipelines
• You can create events for cancelled stages

• If CodePipeline fails a stage, your pipeline stops, and you can get
information in the console
• If pipeline can’t perform an action, make sure the “IAM Service Role”
attached does have enough IAM permissions (IAM Policy)
• AWS CloudTrail can be used to audit AWS API calls
CodePipeline – Events vs. Webhooks vs. Polling
• Events: an event (e.g., a new commit in CodeCommit) is sent to EventBridge, which triggers CodePipeline
• Webhooks: a script makes an HTTP webhook call that triggers CodePipeline
• Polling: CodePipeline performs regular checks against the source (e.g., GitHub); alternatively, a CodeStar Source Connection (GitHub App) lets GitHub trigger CodePipeline directly
• Note: Events are the default and recommended


CodePipeline – Action Types Constraints for Artifacts
• Owner
• AWS – for AWS services
• 3rd Party – GitHub or Alexa Skills Kit
• Custom – Jenkins
• Action Type
• Source – S3, ECR, GitHub, …
• Build – CodeBuild, Jenkins
• Test – CodeBuild, Device Farm, Jenkins
• Approval – Manual
• Invoke – Lambda, Step Functions
• Deploy – S3, CloudFormation, CodeDeploy, Elastic Beanstalk, OpsWorks, ECS, Service Catalog, …

Owner      Action Type   Provider                     Valid Number of Input Artifacts   Valid Number of Output Artifacts
AWS        Source        S3                           0                                 1
AWS        Source        CodeCommit                   0                                 1
AWS        Source        ECR                          0                                 1
3rd Party  Source        GitHub                       0                                 1
AWS        Build         CodeBuild                    1 to 5                            0 to 5
AWS        Test          CodeBuild                    1 to 5                            0 to 5
AWS        Test          Device Farm                  1                                 0
AWS        Approval      Manual                       0                                 0
AWS        Deploy        S3                           1                                 0
AWS        Deploy        CloudFormation               0 to 10                           0 to 1
AWS        Deploy        CodeDeploy                   1                                 0
AWS        Deploy        Elastic Beanstalk            1                                 0
AWS        Deploy        OpsWorks Stacks              1                                 0
AWS        Deploy        ECS                          1                                 0
AWS        Deploy        Service Catalog              1                                 0
AWS        Invoke        Lambda                       0 to 5                            0 to 5
AWS        Invoke        Step Functions               0 to 1                            0 to 1
3rd Party  Deploy        Alexa Skills Kit             1 to 2                            0
Custom     Build         Jenkins                      0 to 5                            0 to 5
Custom     Test          Jenkins                      0 to 5                            0 to 5
Custom     Any supported category  Specified in Custom Action   0 to 5                  0 to 5
CodePipeline – Manual Approval Stage
• Important: for a Manual Approval action, the Owner is “AWS” and the Action is “Manual”
• Example flow: a new commit in CodeCommit triggers CodeBuild → the Manual Approval stage triggers an email via SNS → an IAM User approves → CodeDeploy deploys
• The approving IAM User needs the appropriate IAM permissions to view the pipeline and submit the approval


AWS CodeBuild
• A fully managed continuous integration (CI) service
• Continuous scaling (no servers to manage or provision – no build queue)
• Compile source code, run tests, produce software packages, …
• Alternative to other build tools (e.g., Jenkins)
• Charged per minute for compute resources (time it takes to complete the builds)
• Leverages Docker under the hood for reproducible builds
• Use prepackaged Docker images or create your own custom Docker image
• Security:
• Integration with KMS for encryption of build artifacts
• IAM for CodeBuild permissions, and VPC for network security
• AWS CloudTrail for API calls logging
AWS CodeBuild
• Source – CodeCommit, S3, Bitbucket, GitHub
• Build instructions: Code file buildspec.yml or insert manually in
Console
• Output logs can be stored in Amazon S3 & CloudWatch Logs
• Use CloudWatch Metrics to monitor build statistics
• Use EventBridge to detect failed builds and trigger notifications
• Use CloudWatch Alarms to notify if you need “thresholds” for failures

• Build Projects can be defined within CodePipeline or CodeBuild


CodeBuild – Supported Environments
• Java
• Ruby
• Python
• Go
• Node.js
• Android
• .NET Core
• PHP
• Docker – extend any environment you like
CodeBuild – How it Works
• CodeBuild fetches the source code + buildspec.yml from the source (e.g., CodeCommit)
• A CodeBuild container (from a prepackaged or custom Docker image) runs the instructions from buildspec.yml
• Reusable pieces can be stored in and retrieved from an S3 bucket cache (optional)
• Build artifacts are stored in an S3 bucket; logs are stored in Amazon S3 and CloudWatch Logs
CodeBuild – buildspec.yml
• buildspec.yml file must be at the root of your code
• env – define environment variables
• variables – plaintext variables
• parameter-store – variables stored in SSM Parameter Store
• secrets-manager – variables stored in AWS Secrets Manager
• phases – specify commands to run:
• install – install dependencies you may need for your build
• pre_build – final commands to execute before build
• Build – actual build commands
• post_build – finishing touches (e.g., zip output)
• ar tifacts – what to upload to S3 (encrypted with KMS)
• cache – files to cache (usually dependencies) to S3 for
future build speedup
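• For example, a minimal buildspec.yml might look like this (a hedged sketch assuming a Node.js project; the SSM parameter name and commands are placeholders):
    version: 0.2
    env:
      variables:
        NODE_ENV: "production"
      parameter-store:
        DB_PASSWORD: "/my-app/db-password"   # hypothetical SSM parameter
    phases:
      install:
        commands:
          - npm ci
      pre_build:
        commands:
          - npm test
      build:
        commands:
          - npm run build
      post_build:
        commands:
          - zip -r app.zip dist/
    artifacts:
      files:
        - app.zip
    cache:
      paths:
        - node_modules/**/*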
CodeBuild – Local Build
• In case of need of deep troubleshooting beyond logs…
• You can run CodeBuild locally on your desktop (after installing Docker)
• For this, leverage the CodeBuild Agent

• https://docs.aws.amazon.com/codebuild/latest/userguide/use-codebuild-
agent.html
CodeBuild – Inside VPC
VPC
• By default, your CodeBuild containers are
launched outside your VPC Private Subnet

• It cannot access resources in a VPC CodeBuild


• You can specify a VPC configuration: CodeBuild
Container
• VPC ID
• Subnet IDs
• Security Group IDs
• Then your build can access resources in your
VPC (e.g., RDS, ElastiCache, EC2, ALB, …) RDS DB
• Use cases: integration tests, data query, Instance
internal load balancers, …
CodePipeline – CloudFormation Integration
• CloudFormation is used to deploy complex infrastructure using an API
• CREATE_UPDATE – create or update an existing stack
• DELETE_ONLY – delete a stack if it exists
deploy to
CodePipeline Production
CREATE_UPDATE

CodeBuild CloudFormation CodeBuild CloudFormation CloudFormation


Build app Deploy Infra & app Test app Delete Test Infra Deploy Prod Infra

DELETE_ONLY

CREATE_UPDATE CloudFormation Stack


HTTP Test Suite
Auto Scaling Group

ALB
AWS CodeDeploy
• Deployment service that automates
application deployment
• Deploy new applications versions to EC2 v1 v2
Instances, On-premises servers, Lambda
functions, ECS Services v1 v2
• Automated Rollback capability in case of
failed deployments, or trigger CloudWatch
Alarm v1 v2

• Gradual deployment control


• A file named appspec.yml defines how the v1 v2
deployment happens
CodeDeploy – EC2/On-premises Platform
• Can deploy to EC2 Instances & on-premises servers
• Perform in-place deployments or blue/green deployments
• Must run the CodeDeploy Agent on the target instances
• Define deployment speed
• AllAtOnce: most downtime
• HalfAtATime: reduced capacity by 50%
• OneAtATime: slowest, lowest availability impact
• Custom: define your %
CodeDeploy – In-Place Deployment
Half At A Time

v1 v2 v2 v2
Half

v1 v2 v2 v2

v1 v1 v1 v2

v1 v1 Other Half v1 v2
CodeDeploy – Blue-Green Deployment
Auto Scaling Group Auto Scaling Group

v1 v1

v1 v1
Application Application
v1 v1
Load Balancer Load Balancer

Auto Scaling Group Auto Scaling Group

v2 v2

v2 v2

v2 Application v2
Load Balancer
CodeDeploy Agent
• The CodeDeploy Agent must be running on the EC2 instances as a prerequisite
• It can be installed and updated automatically if you’re using Systems Manager
• The EC2 Instances must have sufficient IAM permissions to access Amazon S3 to download the application deployment bundles
CodeDeploy – Lambda Platform
• CodeDeploy can help you automate
traffic shift for Lambda aliases Make X vary over time until X = 100%

• Feature is integrated within the SAM PROD Alias


framework
100 – X%
• Linear : grow traffic every N minutes V1
until 100%
• LambdaLinear10PercentEvery3Minutes
• LambdaLinear10PercentEvery10Minutes
• Canary: try X percent then 100% X%
• LambdaCanary10Percent5Minutes CodeDeploy
V2
• LambdaCanary10Percent30Minutes
• AllAtOnce: immediate
CodeDeploy – ECS Platform
CodeDeploy
• CodeDeploy can help you automate
the deployment of a new ECS Task
Definition
• Only Blue/Green Deployments Application Load
• Linear : grow traffic every N minutes Balancer
(required)
until 100%
• ECSLinear10PercentEvery3Minutes
100% - X% ECS Cluster X%
• ECSLinear10PercentEvery10Minutes
• Canary: try X percent then 100% Target Group (Blue) Target Group (Green)
• ECSCanary10Percent5Minutes
• ECSCanary10Percent30Minutes v1 v1 v1 v1 v2 v2 v2 v2
• AllAtOnce: immediate
Blue-Green Deployment
CodeDeploy – Deployment to EC2
• Define how to deploy the application using appspec.yml + a Deployment Strategy
• Will do an in-place update to your fleet of EC2 instances (e.g., half at a time)
• Can use hooks to verify the deployment after each deployment phase (a hedged appspec.yml sketch follows below)
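• A hedged appspec.yml sketch for the EC2/On-premises platform (the destination path and script names are placeholders):
    version: 0.0
    os: linux
    files:
      - source: /
        destination: /var/www/my-app          # placeholder install location
    hooks:
      BeforeInstall:
        - location: scripts/stop_server.sh    # hypothetical lifecycle scripts
          timeout: 300
      AfterInstall:
        - location: scripts/install_dependencies.sh
          timeout: 300
      ApplicationStart:
        - location: scripts/start_server.sh
          timeout: 300
      ValidateService:
        - location: scripts/validate.sh
          timeout: 300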
CodeDeploy – Deploy to an ASG
Blue/Green Deployment
• In-place Deployment
• Updates existing EC2 instances
• Newly created EC2 instances by an Application
ASG will also get automated Load Balancer
deployments
• Blue/Green Deployment
• A new Auto-Scaling Group is created
(settings are copied) EC2 Instances EC2 Instances
(Launch Template v1) (Launch Template v2)
• Choose how long to keep the old Auto Scaling Group
EC2 instances (old ASG)
• Must be using an ELB

AWS CodeDeploy
CodeDeploy – Redeploy & Rollbacks
• Rollback = redeploy a previously deployed revision of your application
• Deployments can be rolled back:
• Automatically – rollback when a deployment fails or rollback when a
CloudWatch Alarm thresholds are met
• Manually
• Disable Rollbacks — do not perform rollbacks for this deployment

• If a roll back happens, CodeDeploy redeploys the last known good


revision as a new deployment (not a restored version)
CodeDeploy – Troubleshooting
• Deployment Error: “InvalidSignatureException – Signature expired: [time] is now earlier than [time]”
• For CodeDeploy to perform its operations, it requires accurate time references
• If the date and time on your EC2 instance are not set correctly (e.g., CodeDeploy at 28/01/2023 @ 5:01pm while the instance thinks it is 14/06/2020 @ 4:30am), they might not match the signature date of your deployment request, which CodeDeploy rejects with an InvalidSignatureException
• Check log files to understand deployment issues
• For Amazon Linux, Ubuntu, and RHEL, log files are stored at /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log
AWS CodeStar
• An integrated solution that groups: GitHub, CodeCommit, CodeBuild,
CodeDeploy, CloudFormation, CodePipeline, CloudWatch, …
• Quickly create “CICD-ready” projects for EC2, Lambda, Elastic Beanstalk
• Supported languages: C#, Go, HTML 5, Java, Node.js, PHP, Python, Ruby
• Issue tracking integration with JIRA / GitHub Issues
• Ability to integrate with Cloud9 to obtain a web IDE (not all regions)
• One dashboard to view all your components
• Free service, pay only for the underlying usage of other services
• Limited Customization
AWS CodeArtifact
• Software packages depend on each other to be built (also called code dependencies), and new ones are created
• Storing and retrieving these dependencies is called artifact management
• Traditionally you would need to set up your own artifact management system
• CodeArtifact is a secure, scalable, and cost-effective artifact management service for software development
• Works with common dependency management tools such as Maven, Gradle, npm, yarn, twine, pip, and NuGet
• Developers and CodeBuild can then retrieve dependencies straight from CodeArtifact
AWS CodeArtifact
VPC
npm
fetch
Public Artifact Repositories AWS CodeArtifact
JavaScript
Domain AWS CodeBuild
pip
NuGet proxy
Repository A Repository B Python

NuGet

Package 1 Package 2 .NET


publish/approve Maven
packages

Java
IT Leader
CodeArtifact – EventBridge Integration
invoke Lambda
Function
Event is created when a Package
version is created, modified, or deleted
activate Step Functions
State Machine

events message
SNS

CodeArtifact EventBridge message


SQS

CodePipeline
start Rebuild & Redeploy
an Application
with the latest
security fixes
CodeCommit CodeBuild CodeDeploy
CodeArtifact – Resource Policy
• Can be used to authorize another
account to access CodeArtifact
• A given principal can either read all the
packages in a repository or none of them

Account B Account A
(222333344555) (123456789012)

read packages

IAM User CodeArtifact


(bob)
Repository
Repository Resource Policy
CodeArtifact – Upstream Repositories
External
• A CodeArtifact repository can have other Repository
CodeArtifact repositories as Upstream External Connection
Repositories
CodeArtifact
• Allows a package manager client to access
the packages that are contained in more Repository A Repository B

than one repository using a single Upstream Upstream


repository endpoint
Repository
(my-repo)
• Up to 10 Upstream Repositories
• Only one external connection
Developer
(npm)
CodeArtifact – External Connection
• An External Connection is a connection between a Public
CodeArtifact Repository and an external/public repository Repository
(e.g., Maven, npm, PyPI, NuGet…)
External Connection
• Allows you to fetch packages that are not already present
in your CodeArtifact Repository CodeArtifact
• A repository has a maximum of 1 external connection
Cached
• Create many repositories for many external connections Repository A
Packages

Upstream
• Example – Connect to npmjs.com
• Configure one CodeArtifact Repository in your domain with an
external connection to npmjs.com Repo B Repo C Repo D …
• Configure all the other repositories with an upstream to it
• Packages fetched from npmjs.com are cached in the Upstream
Repository, rather than fetching and storing them in each
Repository
Developer
(npm)
CodeArtifact – Retention Public
Repository
Lodash
• If a requested package version is found in an Upstream
Repository, a reference to it is retained and is always available
from the Downstream Repository
• The retained package version is not affected by changes to the CodeArtifact
Upstream Repository (deleting it, updating the package, …)
• Intermediate repositories do not keep the package Lodash Repository C
• Example – Fetching Package from npmjs.com (v4.17.20)

• Package Manager connected to Repository A requests the package Upstream


Lodash v4.17.20
Repository B
• The package version is not present in any of the three repositories
• The package version will be fetched from npmjs.com Upstream
• When Lodash 4.17.20 is fetched, it will be retained in: Lodash
• Repository A – the most-downstream repository Repository A
(v4.17.20)
• Repository C – has the external connection to npmjs.com
• The Package version will not be retained in Repository B as that
is an intermediate Repository
Developer
(npm)
CodeArtifact – Domains
• Deduplicated Storage – asset only needs to be Domain Resource-based Policy
stored once in a domain, even if it's available in many
repositories (only pay once for storage) CodeArtifact Domain
• Fast Copying – only metadata record are updated
when you pull packages from an Upstream Account A Account B
CodeArtifact Repository into a Downstream
Repository A Repository B
• Easy Sharing Across Repositories and Teams – all
the assets and metadata in a domain are encrypted
with a single AWS KMS Key Account C Account D

• Apply Policy Across Multiple Repositories – Repository C Repository D


domain administrator can apply policy across the
domain such as: repos sharing
the same package
• Restricting which accounts have access to repositories in Shared Storage
the domain
• Who can configure connections to public repositories to
use as sources of packages
Amazon CodeGuru
• An ML-powered service for automated code reviews and application
performance recommendations
• Provides two functionalities
• CodeGuru Reviewer : automated code reviews for static code analysis (development)
• CodeGuru Profiler : visibility/recommendations about application performance during
runtime (production)

CodeGuru Reviewer CodeGuru Profiler


Detect and optimize Identify performance
Built-in code reviews the expensive lines and cost improvements
with actionable of code pre-prod in production
recommendations

Coding Build & Test Deploy Measure


Amazon CodeGuru Reviewer
• Identify critical issues, security
vulnerabilities, and hard-to-find bugs
• Example: common coding best practices,
resource leaks, security detection, input
validation
• Uses Machine Learning and automated
reasoning
• Hard-learned lessons across millions of
code reviews on 1000s of open-source
and Amazon repositories
• Supports Java and Python
• Integrates with GitHub, Bitbucket, and
AWS CodeCommit

https://aws.amazon.com/codeguru/features/
Amazon CodeGuru Profiler
• Helps understand the runtime behavior of your
application
• Example: identify if your application is consuming
excessive CPU capacity on a logging routine
• Features:
• Identify and remove code inefficiencies
• Improve application performance (e.g., reduce CPU
utilization)
• Decrease compute costs
• Provides heap summary (identify which objects using
up memory)
• Anomaly Detection
• Support applications running on AWS or on-
premise
• Minimal overhead on application

https://aws.amazon.com/codeguru/features/
Amazon CodeGuru – Agent Configuration
• MaxStackDepth – the maximum depth of the stacks in the code that is
represented in the profile
• Example: if CodeGuru Profiler finds a method A, which calls method B, which calls
method C, which calls method D, then the depth is 4
• If the MaxStackDepth is set to 2, then the profiler evaluates A and B
• MemoryUsageLimitPercent – the memory percentage used by the profiler
• MinimumTimeForReportingInMilliseconds – the minimum time between
sending reports (milliseconds)
• ReportingIntervalInMilliseconds – the reporting interval used to report
profiles (milliseconds)
• SamplingIntervalInMilliseconds – the sampling interval that is used to profile
samples (milliseconds)
• Reduce to have a higher sampling rate
AWS Cloud9
• Cloud-based Integrated Development
Environment (IDE)
• Code editor, debugger, terminal in a browser
• Work on your projects from anywhere with
an Internet connection
• Prepackaged with essential tools for popular
programming languages (JavaScript, Python,
PHP, …)
• Share your development environment with
your team (pair programming)
• Fully integrated with AWS SAM & Lambda
to easily build serverless applications https://aws.amazon.com/cloud9/
AWS Serverless Application
Model (SAM)
Taking your Serverless Development to the next level
AWS SAM
• SAM = Serverless Application Model
• Framework for developing and deploying serverless applications
• All the configuration is YAML code
• Generate complex CloudFormation from simple SAM YAML file
• Supports anything from CloudFormation: Outputs, Mappings,
Parameters, Resources…
• Only two commands to deploy to AWS
• SAM can use CodeDeploy to deploy Lambda functions
• SAM can help you to run Lambda, API Gateway, DynamoDB locally
AWS SAM – Recipe
• Transform Header indicates it’s SAM template:
• Transform: 'AWS::Serverless-2016-10-31'
• Write Code
• AWS::Serverless::Function
• AWS::Serverless::Api
• AWS::Serverless::SimpleTable
• Package & Deploy:
• aws cloudformation package / sam package
• aws cloudformation deploy / sam deploy
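• For example, a minimal SAM template might look like this (a hedged sketch; the function name, handler, and path are placeholders):
    AWSTemplateFormatVersion: '2010-09-09'
    Transform: 'AWS::Serverless-2016-10-31'
    Resources:
      MyFunction:
        Type: AWS::Serverless::Function
        Properties:
          Handler: app.lambda_handler
          Runtime: python3.9
          CodeUri: src/
          Events:
            HelloApi:
              Type: Api               # creates an API Gateway endpoint
              Properties:
                Path: /hello
                Method: get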
Deep Dive into SAM Deployment
Build the application locally Package the application Deploy the application
(sam build) (sam package OR (sam deploy OR
aws cloudformation package) aws cloudformation deploy)

transform zip & upload create/execute ChangeSet

SAM Template CloudFormation S3 Bucket


CloudFormation
(YAML) Template
(YAML)
+
+
CloudFormation Stack
Application Code
Application Code

Lambda API Gateway DynamoDB


SAM – CLI Debugging
• Locally build, test, and debug your serverless applications that are defined using AWS SAM templates
• Provides a Lambda-like execution environment locally
• SAM CLI + AWS Toolkits => step through and debug your Lambda function code locally, line by line
• Supported IDEs: AWS Cloud9, Visual Studio Code, JetBrains, PyCharm, IntelliJ, …
• AWS Toolkits: IDE plugins which allow you to build, test, debug, deploy, and invoke Lambda functions built using AWS SAM
SAM Policy Templates
• List of templates to apply permissions to
your Lambda Functions
• Full list available here:
https://docs.aws.amazon.com/serverless-
application-
model/latest/developerguide/serverless-
policy-templates.html#serverless-policy-
template-table
• Important examples:
• S3ReadPolicy: Gives read only permissions to
objects in S3
• SQSPollerPolicy: Allows to poll an SQS queue
• DynamoDBCrudPolicy: CRUD = create read
update delete
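• A hedged sketch of how policy templates are attached to a function in the SAM template (the bucket name and table reference are placeholders):
    MyFunction:
      Type: AWS::Serverless::Function
      Properties:
        Handler: app.lambda_handler
        Runtime: python3.9
        CodeUri: src/
        Policies:
          - S3ReadPolicy:
              BucketName: my-bucket          # placeholder bucket name
          - DynamoDBCrudPolicy:
              TableName: !Ref MyTable        # placeholder table resource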
SAM and CodeDeploy
Lambda Alias
CloudWatch
• SAM framework natively uses Alarm v1
CodeDeploy to update Lambda Trigger
functions deployment
(CICD)
Traffic shifting with alias
• Traffic Shifting feature
CodeDeploy
• Pre and Post traffic hooks v2
features to validate deployment Run tests
(before the traffic shift starts and
after it ends)
Lambda Function
• Easy & automated rollback using Pre-Traffic Hook
CloudWatch Alarms Run tests
Lambda Function
Post-Traffic Hook
SAM and CodeDeploy
• AutoPublishAlias
• Detects when new code is being
deployed
• Creates and publishes an updated
version of that function with the latest
code
• Points the alias to the updated version
of the Lambda function
• DeploymentPreference
• Canary, Linear, AllAtOnce
• Alarms
• Alarms that can trigger a rollback
• Hooks
• Pre and post traffic shifting Lambda
functions to test your deployment
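• A hedged SAM snippet combining these settings (the alarm and hook function references are placeholders):
    MyFunction:
      Type: AWS::Serverless::Function
      Properties:
        Handler: app.lambda_handler
        Runtime: python3.9
        CodeUri: src/
        AutoPublishAlias: live
        DeploymentPreference:
          Type: Canary10Percent5Minutes
          Alarms:
            - !Ref MyErrorsAlarm                       # rollback trigger
          Hooks:
            PreTraffic: !Ref PreTrafficHookFunction    # validation Lambda functions
            PostTraffic: !Ref PostTrafficHookFunction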
SAM – Local Capabilities
Client (Local)
• Locally start AWS Lambda
• sam local start-lambda
• Starts a local endpoint that emulates AWS Lambda
• Can run automated tests against this local endpoint Lambda
(local endpoint)

• Locally Invoke Lambda Function


• sam local invoke Client (Local)
• Invoke Lambda function with payload once and quit sam local invoke
after invocation completes
• Helpful for generating test cases
• If the function make API calls to AWS, make sure Lambda
you are using the correct --profile option Function
SAM – Local Capabilities
Client (Local)
• Locally Start an API Gateway Endpoint
• sam local start-api
• Starts a local HTTP server that hosts all your functions
API Endpoint
• Changes to functions are automatically reloaded (local endpoint)

• Generate AWS Events for Lambda Functions Client (Local)


• sam local generate-event sam local generate-event s3 put
• Generate sample payloads for event sources --bucket <bucket> --key <key> |
sam local invoke -e - <function_logical_id>
• S3, API Gateway, SNS, Kinesis, DynamoDB…
S3 event invoke

Lambda
Function
SAM – Exam Summary
• SAM is built on CloudFormation
• SAM requires the Transform and Resources sections
• Commands to know:
• sam build: fetch dependencies and create local deployment artifacts
• sam package: package and upload to Amazon S3, generate CF template
• sam deploy: deploy to CloudFormation
• SAM Policy templates for easy IAM policy definition
• SAM is integrated with CodeDeploy to do deploy to Lambda aliases
Serverless Application Repository (SAR)
• Managed repository for serverless applications
• The applications are packaged using SAM
• Build and publish applications that can be re-used by organizations
• Can share publicly
• Can share with specific AWS accounts
• This prevents duplicate work; other accounts can go straight to deploying the published application
• Application settings and behaviour can be customized using Environment variables
AWS Cloud Development Kit
AWS Cloud Development Kit (CDK)
• Define your cloud infrastructure using a
familiar language:
• JavaScript/TypeScript, Python, Java, and .NET
• Contains high level components called
constructs
• The code is “compiled” into a
CloudFormation template (JSON/YAML)
• You can therefore deploy infrastructure
and application runtime code together
• Great for Lambda functions
• Great for Docker containers in ECS / EKS
CDK in a diagram
CDK Application Constructs
CloudFormation
CloudFormation
CDK CLI Template

cdk synth

Programming
Languages
CDK vs SAM
• SAM:
• Serverless focused
• Write your template declaratively in JSON or YAML
• Great for quickly getting started with Lambda
• Leverages CloudFormation

• CDK:
• All AWS services
• Write infra in a programming language JavaScript/TypeScript, Python, Java, and
.NET
• Leverages CloudFormation
CDK + SAM
• You can use SAM CLI to locally test your CDK apps
• You must first run cdk synth

CDK Application sam local invoke –t MyCDKStack.template.json myFunction


(MyCDKApp)

Lambda function
(myFunction) cdk synth CloudFormation
SAM CLI
Template
(MyCDKStack.template.json)
CDK Hands-On
• Example app: the Client uploads an image to an S3 Bucket → this triggers an AWS Lambda function → the function analyzes the image with Amazon Rekognition → the results are saved to Amazon DynamoDB


CDK Constructs
• CDK Construct is a component that encapsulates everything CDK
needs to create the final CloudFormation stack
• Can represent a single AWS resource (e.g., S3 bucket) or multiple
related resources (e.g., worker queue with compute)
• AWS Construct Library
• A collection of Constructs included in AWS CDK which contains Constructs for
every AWS resource
• Contains 3 different levels of Constructs available (L1, L2, L3)
• Construct Hub – contains additional Constructs from AWS, 3rd parties,
and open-source CDK community
CDK Constructs – Layer 1 Constructs (L1)
• Can be called CFN Resources which represents all resources directly
available in CloudFormation
• Constructs are periodically generated from CloudFormation Resource
Specification
• Construct names start with Cfn (e.g., CfnBucket)
• You must explicitly configure all resource properties
CDK Constructs – Layer 2 Constructs (L2)
• Represents AWS resources but with a higher level (intent-based API)
• Similar functionality as L1 but with convenient defaults and boilerplate
• You don’t need to know all the details about the resource properties
• Provide methods that make it simpler to work with the resource
(e.g., bucket.addLifeCycleRule())
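• A hedged Python sketch of an L2 construct with its convenient defaults and helper methods (the stack and bucket names are placeholders; assumes aws-cdk-lib v2 bindings):
    from aws_cdk import Stack, Duration, aws_s3 as s3
    from constructs import Construct

    class MyStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs):
            super().__init__(scope, construct_id, **kwargs)
            # L2 construct: sensible defaults, no need to set every CloudFormation property
            bucket = s3.Bucket(self, "MyBucket", versioned=True)
            # Convenience method exposed by the L2 construct
            bucket.add_lifecycle_rule(expiration=Duration.days(90))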
CDK Constructs – Layer 3 Constructs (L3)
• Can be called Patterns, which represents multiple related resources
• Helps you complete common tasks in AWS
• Examples:
• aws-apigateway.LambdaRestApi represents an API Gateway backed by a Lambda
function
• aws-ecs-patterns.ApplicationLoadBalancerFargateService which represents an
architecture that includes a Fargate cluster with Application Load Balancer
CDK – Important Commands to know

Command Description
npm install -g aws-cdk-lib Install the CDK CLI and libraries
cdk init app Create a new CDK project from a specified template
cdk synth Synthesizes and prints the CloudFormation template
cdk bootstrap Deploys the CDK Toolkit staging Stack
cdk deploy Deploy the Stack(s)
cdk diff View differences of local CDK and deployed Stack
cdk destroy Destroy the Stack(s)
CDK – Bootstrapping
User
• The process of provisioning resources for CDK
before you can deploy CDK apps into an cdk bootstrap
AWS environment aws://123456789012/eu-west-1

• AWS Environment = account & region


• CloudFormation Stack called CDKToolkit is created AWS Account (123456789012)

and contains:
AWS Region (eu-west-1)
• S3 Bucket – to store files
• IAM Roles – to grant permissions to perform
deployments CloudFormation Stack
(CDKToolkit)
• You must run the following command for each new
environment: S3 Bucket
• cdk bootstrap aws://<aws_account>/<aws_region>
• Otherwise, you will get an error “Policy contains a IAM Role
statement with one or more invalid principal”
CDK – Testing
• To test CDK apps, use CDK Asser tions Module
combined with popular test frameworks such as
Jest (JavaScript) or Pytest (Python)
• Verify we have specific resources, rules, conditions,
parameters…
• Two types of tests:
• Fine-grained Assertions (common) – test specific Fine-grained Assertions
aspects of the CloudFormation template (e.g., check if a
resource has this property with this value)
• Snapshot Tests – test the synthesized CloudFormation
template against a previously stored baseline template
• To import a template
• Template.fromStack(MyStack) : stack built in CDK Snapshot Test
• Template.fromString(mystring) : stack build outside CDK
Amazon Cognito
Amazon Cognito
• Give users an identity to interact with our web or mobile application
• Cognito User Pools:
• Sign in functionality for app users
• Integrate with API Gateway & Application Load Balancer

• Cognito Identity Pools (Federated Identity):


• Provide AWS credentials to users so they can access AWS resources directly
• Integrate with Cognito User Pools as an identity provider

• Cognito vs IAM: “hundreds of users”, ”mobile users”, “authenticate with SAML”


Cognito User Pools (CUP) – User Features
• Create a serverless database of user for your web & mobile apps
• Simple login: Username (or email) / password combination
• Password reset
• Email & Phone Number Verification
• Multi-factor authentication (MFA)
• Federated Identities: users from Facebook, Google, SAML…
• Feature: block users if their credentials are compromised elsewhere
• Login sends back a JSON Web Token (JWT)
Cognito User Pools (CUP) – Diagram
Returns
JSON Web Token
JWT

Mobile Applications Social Identity Provider


login
Cognito User Pools

Web Applications

Federation through
Third Party Identity Provider (IdP)

Database of users
Cognito User Pools (CUP) - Integrations
• CUP integrates with API Gateway and Application Load Balancer

Cognito User Pools


Authenticate Authenticate
Cognito User Pools
Retrieve token

Evaluate Cognito Token


REST API +
Pass Token Application Load Balancer
+ Listeners & Rules

API Gateway backend Target Group

Backend
Cognito User Pools – Lambda Triggers
• CUP can invoke a Lambda function synchronously on these triggers:
User Pool Flow Operation Description
Authentication Pre Authentication Lambda Trigger Custom validation to accept or deny the sign-in request
Events
Post Authentication Lambda Trigger Event logging for custom analytics
Pre Token Generation Lambda Trigger Augment or suppress token claims
Sign-Up Pre Sign-up Lambda Trigger Custom validation to accept or deny the sign-up
request
Post Confirmation Lambda Trigger Custom welcome messages or event logging for
custom analytics
Migrate User Lambda Trigger Migrate a user from an existing user directory to user
pools
Messages Custom Message Lambda Trigger Advanced customization and localization of messages
Token Creation Pre Token Generation Lambda Trigger Add or remove attributes in Id tokens
https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-identity-pools-working-with-aws-lambda-triggers.html
Cognito User Pools – Hosted Authentication UI
• Cognito has a hosted authentication UI
that you can add to your app to handle
sign-up and sign-in workflows
• Using the hosted UI, you have a
foundation for integration with social
logins, OIDC or SAML
• Can customize with a custom logo and
custom CSS

https://aws.amazon.com/blogs/aws/launch-amazon-cognito-user-pools-general-availability-app-integration-and-federation/
CUP – Hosted UI Custom Domain
• For custom domains, you must create an ACM certificate in us-east-1
• The custom domain must be defined in the “App Integration” section
CUP – Adaptive Authentication
• Block sign-ins or require MFA if the login appears suspicious
• Cognito examines each sign-in attempt and generates a risk score (low, medium, high) for how likely the sign-in request is to be from a malicious attacker
• Users are prompted for a second MFA only when risk is detected (e.g., the legitimate user signs in with a password as usual, while an attacker using the same password is required to complete MFA)
• Risk score is based on different factors such as whether the user has used the same device, location, or IP address
• Checks for compromised credentials, account takeover protection, and phone and email verification
• Integration with CloudWatch Logs (sign-in attempts, risk score, failed challenges…)
Decoding an ID Token: JWT – JSON Web Token
• CUP issues JWT tokens (Base64 encoded): Header, Payload, Signature
• The signature must be verified to ensure the JWT can be trusted
• Libraries can help you verify the validity of JWT tokens issued by Cognito User Pools
• The Payload will contain the user information (sub UUID – the user ID in the Cognito DB – given_name, email, phone_number, attributes, expiry & issued-at timestamps…)
• From the sub UUID, you can retrieve all user details from Cognito / OIDC
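• A hedged Python sketch of reading the payload claims (this only decodes; in real code the signature must still be verified, e.g., with a JWT library against the user pool’s keys):
    import base64, json

    def decode_jwt_payload(token: str) -> dict:
        # A JWT is header.payload.signature - decode only the payload segment here
        payload_b64 = token.split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64 padding
        return json.loads(base64.urlsafe_b64decode(payload_b64))

    # claims = decode_jwt_payload(id_token)
    # claims["sub"], claims["email"], claims["exp"], ...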


Application Load Balancer – Authenticate Users
• Your Application Load Balancer can securely authenticate users
• Offload the work of authenticating users to your load balancer
• Your applications can focus on their business logic
• Authenticate users through:
• Identity Provider (IdP): OpenID Connect (OIDC) compliant
• Cognito User Pools:
• Social IdPs, such as Amazon, Facebook, or Google
• Corporate identities using SAML, LDAP, or Microsoft AD
• Must use an HTTPS listener to set authenticate-oidc &
authenticate-cognito rules
• OnUnauthenticatedRequest – authenticate (default), deny,
allow
Application Load Balancer – Cognito Auth.

1. GET /api/data 3. GET /api/data

Users Application Load Amazon ECS


Balancer
HTTPS listener
Action: authenticate-cognito

2. authenticate

Amazon Cognito
ALB – Auth through Cognito User Pools
• Create Cognito User Pool, Client and
Domain
• Make sure an ID token is returned
• Add the social or Corporate IdP if needed
• Several URL redirections are necessary
• Allow your Cognito User Pool Domain on
your IdP app's callback URL. For example:
• https://domain-
prefix.auth.region.amazoncognito.com/saml2/
idpresponse
• https://user-pool-domain/oauth2/idpresponse
Application Load Balancer – OIDC Auth.
• Flow between the User, the Application Load Balancer, the Identity Provider (Authentication, Token, and User Info endpoints), and the backend (e.g., Amazon ECS):
1. User sends an HTTPS request to the ALB
2. ALB redirects the user to the IdP for authentication
3. The Authentication Endpoint returns an Authorization Grant Code
4. ALB exchanges the Authorization Grant Code at the Token Endpoint
5. The Token Endpoint returns an ID Token + Access Token
6. ALB sends the Access Token to the User Info Endpoint
7. The User Info Endpoint returns the User Claims
8. ALB redirects to the original request
9. The backend (Amazon ECS) returns the response
10. ALB returns the response to the user
ALB – Auth. Through an Identity Provider (IdP)
That is OpenID Connect (OIDC) Compliant
• Configure a Client ID & Client Secret

• Allow redirect from OIDC to your


Application Load Balancer DNS name
(AWS provided) and CNAME (DNS
Alias of your app)
• https://DNS/oauth2/idpresponse
• https://CNAME/oauth2/idpresponse
Cognito Identity Pools (Federated Identities)
• Get identities for “users” so they obtain temporary AWS credentials
• Your identity pool (i.e., identity source) can include:
• Public Providers (Login with Amazon, Facebook, Google, Apple)
• Users in an Amazon Cognito user pool
• OpenID Connect Providers & SAML Identity Providers
• Developer Authenticated Identities (custom login server)
• Cognito Identity Pools allow for unauthenticated (guest) access

• Users can then access AWS services directly or through API


Gateway
• The IAM policies applied to the credentials are defined in Cognito
• They can be customized based on the user_id for fine grained control
Cognito Identity Pools – Diagram
Login and Get Token

Exchange token Cognito Identity Pools Social Identity Provider


Web & Mobile
Applications for temporary
AWS credentials validate

Direct access to AWS

Get temp creds


Cognito
User Pools
Private S3 Bucket DynamoDB Table
Cognito Identity Pools – Diagram with CUP
Login and Get Token

Cognito
Web & Mobile Exchange token Identity Pools
Applications for temporary
AWS credentials validate Cognito
User Pools Internal DB
of users

Direct access to AWS

Get temp creds

Private S3 Bucket DynamoDB Table


Cognito Identity Pools – IAM Roles
• Default IAM roles for authenticated and guest users
• Define rules to choose the role for each user based on the user’s ID
• You can partition your users’ access using policy variables

• IAM credentials are obtained by Cognito Identity Pools through STS


• The roles must have a “trust” policy of Cognito Identity Pools
Cognito Identity Pools – Guest User example
Cognito Identity Pools – Policy variable on S3
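• A hedged illustration of the policy-variable idea (the bucket name is a placeholder; the actual policy shown in the course slide may differ):
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject"],
          "Resource": ["arn:aws:s3:::my-bucket/${cognito-identity.amazonaws.com:sub}/*"]
        }
      ]
    }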
Cognito Identity Pools – DynamoDB
Cognito User Pools vs Identity Pools
• Cognito User Pools (for authentication = identity verification)
• Database of users for your web and mobile application
• Allows to federate logins through Public Social, OIDC, SAML…
• Can customize the hosted UI for authentication (including the logo)
• Has triggers with AWS Lambda during the authentication flow
• Adapt the sign-in experience to different risk levels (MFA, adaptive authentication, etc…)
• Cognito Identity Pools (for authorization = access control)
• Obtain AWS credentials for your users
• Users can login through Public Social, OIDC, SAML & Cognito User Pools
• Users can be unauthenticated (guests)
• Users are mapped to IAM roles & policies, can leverage policy variables
• CUP + CIP = authentication + authorization
Cognito Identity Pools – Diagram with CUP
Login and Get Token

Cognito
Web & Mobile Exchange token Identity Pools
Applications for temporary
AWS credentials validate Cognito
User Pools Internal DB
of users

Direct access to AWS

Get temp creds

Private S3 Bucket DynamoDB Table


Other Serverless
AWS Step Functions
• Model your workflows as state
machines (one per workflow)
• Order fulfillment, Data processing
• Web applications, Any workflow
• Written in JSON
• Visualization of the workflow and
the execution of the workflow, as
well as history
• Start workflow with SDK call, API
Gateway, Event Bridge
(CloudWatch Event)
Step Function – Task States
• Do some work in your state machine
• Invoke one AWS service
• Can invoke a Lambda function
• Run an AWS Batch job
• Run an ECS task and wait for it to complete
• Insert an item into DynamoDB
• Publish a message to SNS, SQS
• Launch another Step Functions workflow…
• Run one Activity
• EC2, Amazon ECS, on-premises (e.g., an app server)
• Activities poll the Step Functions service for work
• Activities send results back to Step Functions
Example – Invoke Lambda Function
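• A hedged sketch of a state machine with a single Task state invoking a Lambda function (the function name is a placeholder):
    {
      "StartAt": "InvokeMyFunction",
      "States": {
        "InvokeMyFunction": {
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": {
            "FunctionName": "my-function",
            "Payload.$": "$"
          },
          "End": true
        }
      }
    }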
Step Function - States
• Choice State - Test for a condition to send to a branch (or default branch)
• Fail or Succeed State - Stop execution with failure or success
• Pass State - Simply pass its input to its output or inject some fixed data,
without performing work.
• Wait State - Provide a delay for a certain amount of time or until a
specified time/date.
• Map State - Dynamically iterate steps.
• Parallel State - Begin parallel branches of execution.
Visual workflow in Step Functions
Error Handling in Step Functions
• Any state can encounter runtime errors for various reasons:
• State machine definition issues (for example, no matching rule in a Choice state)
• Task failures (for example, an exception in a Lambda function)
• Transient issues (for example, network partition events)
• Use Retry (to retry failed state) and Catch (transition to failure path) in the
State Machine to handle the errors instead of inside the Application Code
• Predefined error codes:
• States.ALL : matches any error name
• States.Timeout: Task ran longer than TimeoutSeconds or no heartbeat received
• States.TaskFailed: execution failure
• States.Permissions: insufficient privileges to execute code
• The state may report its own errors
Step Functions – Retry (Task or Parallel State)
• Evaluated from top to bottom
• ErrorEquals: match a specific kind
of error
• IntervalSeconds: initial delay
before retrying
• BackoffRate: multiply the delay
after each retry
• MaxAttempts: default to 3, set to
0 for never retried
• When max attempts are reached,
the Catch kicks in
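• A hedged sketch of a Retry block inside a Task state (the custom error name is a placeholder):
    "Retry": [
      {
        "ErrorEquals": ["CustomError"],
        "IntervalSeconds": 1,
        "BackoffRate": 2.0,
        "MaxAttempts": 2
      },
      {
        "ErrorEquals": ["States.ALL"],
        "IntervalSeconds": 5,
        "BackoffRate": 2.0,
        "MaxAttempts": 3
      }
    ]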
Step Functions – Catch (Task or Parallel State)
• Evaluated from top to bottom
• ErrorEquals: match a specific
kind of error
• Next: State to send to
• ResultPath - A path that
determines what input is sent
to the state specified in the
Next field.
Step Function – ResultPath
• Include the error in the input: ResultPath controls where the error information is inserted, so the output keeps the original input with the error appended (e.g., "ResultPath": "$.error")
Step Functions – Wait for Task Token
• Allows you to pause Step Functions during a Task until a Task Token is
returned
• Task might wait for other AWS services, human approval, 3rd party
integration, call legacy systems…
• Append .waitForTaskToken to the Resource field to tell Step Functions
to wait for the Task Token to be returned

• Task will pause until it receives that Task Token back with a
SendTaskSuccess or SendTaskFailure API call
Step Functions – Wait for Task Token
Step Functions Workflow

Start SQS Queue


(MyQueue)
call SQS with Task Token
Check Credit

Result poll messages

Approved Denied
Lambda

Credit Check
Failed
Send Response Task completed ECS
SendTaskSuccess API call

End 3rd Party


Process Messages
Step Functions – Activity Tasks
Step Functions
• Enables you to have the Task work performed by an
Activity Worker
• Activity Worker apps can be running on EC2, Lambda, 2. Task
mobile device… (input & TaskToken)

• Activity Worker poll for a Task using GetActivityTask API


• After Activity Worker completes its work, it sends a
response of its success/failure using SendTaskSuccess or
SendTaskFailure
• To keep the Task active: 3. Task completed
1. Poll for Task
(output & TaskToken)
• Configure how long a task can wait by setting TimeoutSeconds (SendTaskSuccess)
(GetActivityTask)

• Periodically send a heartbeat from your Activity Worker using


SendTaskHeartBeat within the time you set in
HeartBeatSeconds
• By configuring a long TimeoutSeconds and actively
sending a heartbeat, Activity Task can wait up to 1 year
EC2 Instance
(Activity Worker)
Step Functions – Standard vs. Express
                      Standard (default)                   Express
Max. Duration         Up to 1 year                         Up to 5 minutes
Execution Model       Exactly-once Execution               At-least once (async) / At-most once (sync)
Execution Rate        Over 2,000 / second                  Over 100,000 / second
Execution History     Up to 90 days or using CloudWatch    CloudWatch Logs
Pricing               # of State Transitions               # of executions, duration, and memory consumption
Use cases             Non-idempotent actions               IoT data ingestion, streaming data,
                      (e.g., processing Payments)          mobile app backends, …

• Express Workflows – Asynchronous:
• Doesn’t wait for the Workflow to complete (get results from CloudWatch Logs)
• At-least once execution – you must manage idempotence
• Use when you don’t need an immediate response (e.g., messaging services)
• Express Workflows – Synchronous:
• Waits for the Workflow to complete
• At-most once execution
• Use when you need an immediate response (e.g., orchestrate microservices)
• Can be invoked from API Gateway or a Lambda function
AWS AppSync - Overview
• AppSync is a managed service that uses GraphQL
• GraphQL makes it easy for applications to get exactly the data they need.
• This includes combining data from one or more sources
• NoSQL data stores, Relational databases, HTTP APIs…
• Integrates with DynamoDB, Aurora, OpenSearch & others
• Custom sources with AWS Lambda
• Retrieve data in real-time with WebSocket or MQTT on WebSocket
• For mobile apps: local data access & data synchronization
• It all starts with uploading one GraphQL schema
GraphQL Example
• A GraphQL Schema is uploaded to AppSync
• The client sends a GraphQL query to AppSync
• AppSync uses a DynamoDB Resolver to fetch the requested data from DynamoDB
• The GraphQL response is returned to the client in JSON
AppSync Diagram
DynamoDB

Web apps
GraphQL Schema Aurora
Mobile apps Resolvers

Real-time
OpenSearch
dashboards
AppSync
Offline Sync
Lambda Anything

HTTP Public
CloudWatch HTTP APIs
Metrics & Logs
AppSync – Security
• There are four ways you can authorize applications to interact with your
AWS AppSync GraphQL API:
• API_KEY
• AWS_IAM: IAM users / roles / cross-account access
• OPENID_CONNECT: OpenID Connect provider / JSON Web Token
• AMAZON_COGNITO_USER_POOLS

• For custom domain & HTTPS, use CloudFront in front of AppSync


AWS Amplify
Create mobile and web applications
Amplify Studio Amplify CLI
Visually build a full-stack app, Configure an Amplify backend
both front-end UI and a backend. With a guided CLI workflow

Amplify Libraries Amplify Hosting


Connect your app to existing AWS Host secure, reliable, fast web apps or websites
Services (Cognito, S3 and more) via the AWS content delivery network.
AWS Amplify
• Set of tools to get started with creating mobile
Build
and web applications
• “Elastic Beanstalk for mobile and web
applications” Amplify
Connect Studio & CLI
With Amplify
• Must-have features such as data storage, Frontend Libs DynamoDB
authentication, storage, and machine-learning,
all powered by AWS services
AWS
• Front-end libraries with ready-to-use AppSync
components for React.js, Vue, Javascript, iOS,
Android, Flutter, etc…
Amazon Cognito
• Incorporates AWS best practices to for
reliability, security, scalability
• Build and deploy with the Amplify CLI or Amazon S3
Amplify Studio
Frontend Amplify backend
AWS Amplify – Important Features

AUTHENTICATION DATASTORE
• Leverages Amazon Cognito • Leverages Amazon AppSync and
• User registration, authentication, Amazon DynamoDB
account recovery & other • Work with local data and have
operations
automatic synchronization to the
• Support MFA, Social Sign-in, cloud without complex code
etc…
• Pre-built UI components • Powered by GraphQL
• Fine-grained authorization • Offline and real-time capabilities
• Visual data modeling w/ Amplify Studio
AWS Amplify Hosting
AMPLIFY HOSTING

• Build and Host Modern Web Apps


• CICD (build, test, deploy)
• Pull Request Previews CloudFront
• Custom Domains Build Front End
deploy

• Monitoring
• Redirect and Custom Headers Amplify
(optionally) deploy
• Password protection Source Code
Repository Build Back End
AWS Amplify – End-to-End (E2E) Testing
• Run end-to-end (E2E) tests in the test phase in
Amplify
• Catch regressions before pushing code to
production
• Use the test step to run any test commands at
build time (amplify.yml)
• Integrated with Cypress testing framework
• Allows you to generate UI report for your tests

Build Test (E2E) Deploy


run tests while the run tests while the app
app is being built is deployed (staging)

amplify.yml
Advanced Identity in AWS
AWS STS – Security Token Service
• Allows to grant limited and temporary access to AWS resources (up to 1 hour).
• AssumeRole: Assume roles within your account or cross account
• AssumeRoleWithSAML: return credentials for users logged with SAML
• AssumeRoleWithWebIdentity
• return creds for users logged with an IdP (Facebook Login, Google Login, OIDC compatible…)
• AWS recommends against using this, and using Cognito Identity Pools instead
• GetSessionToken: for MFA, from a user or AWS account root user
• GetFederationToken: obtain temporary creds for a federated user
• GetCallerIdentity: return details about the IAM user or role used in the API call
• DecodeAuthorizationMessage: decode error message when an AWS API is denied
Using STS to Assume a Role
• Define an IAM Role within your account or cross-account
• Define which principals can access this IAM Role
• Use AWS STS (Security Token Service) to retrieve temporary security credentials and impersonate the IAM Role you have access to (AssumeRole API)
• Temporary credentials can be valid between 15 minutes and 1 hour
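• A hedged Python (boto3) sketch of assuming a role and using the temporary credentials (the role ARN and session name are placeholders):
    import boto3

    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/MyCrossAccountRole",
        RoleSessionName="demo-session",
        DurationSeconds=900,   # 15 minutes (up to the role's configured maximum)
    )
    creds = response["Credentials"]
    # Use the temporary credentials to call AWS services as the assumed role
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )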
Cross account access with STS

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_aws-accounts.html
STS with MFA
• Use GetSessionToken from STS
• Appropriate IAM policy using
IAM Conditions
• aws:MultiFactorAuthPresent:true
• Reminder, GetSessionToken
returns:
• Access ID
• Secret Key
• Session Token
• Expiration date
IAM Best Practices – General
• Never use Root Credentials, enable MFA for the Root Account
• Grant Least Privilege
  • Each Group / User / Role should only have the minimum level of permissions it needs
  • Never grant a policy with “*” access to a service
  • Monitor API calls made by a user in CloudTrail (especially Denied ones)
• Never ever ever store IAM key credentials on any machine but a personal computer or on-premises server
• On-premises server best practice is to call STS to obtain temporary security credentials
IAM Best Practices – IAM Roles
• EC2 machines should have their own roles
• Lambda functions should have their own roles
• ECS Tasks should have their own roles
(ECS_ENABLE_TASK_IAM_ROLE=true)
• CodeBuild should have its own service role
• Create a least-privileged role for any service that requires it
• Create a role per application / lambda function (do not reuse roles)
IAM Best Practices – Cross Account Access
• Define an IAM Role for another account to access
• Define which accounts can access this IAM Role
• Use AWS STS (Security Token Service) to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)
• Temporary credentials can be valid between 15 minutes and 1 hour
• (Diagram: same AssumeRole flow as before – a user in the trusted account calls AWS STS, receives temporary security credentials, and acts with the permissions of the role in the other account)
Advanced IAM - Authorization Model
Evaluation of Policies, simplified
1. If there’s an explicit DENY, end decision and DENY
2. If there’s an ALLOW, end decision with ALLOW
3. Else DENY
• (Flowchart: the decision starts at DENY and all applicable policies are evaluated; an explicit Deny ends with a final decision of “deny”, otherwise an Allow ends with “allow”, and if neither is found the final decision is an implicit “deny”)
IAM Policies & S3 Bucket Policies
• IAM Policies are attached to users, roles, groups
• S3 Bucket Policies are attached to buckets
• When evaluating if an IAM Principal can perform an operation X on a bucket, the union of its assigned IAM Policies and S3 Bucket Policies will be evaluated
• (Diagram: IAM Policy + S3 Bucket Policy = total policy evaluated)
Example 1
• IAM Role attached to EC2 instance, authorizes RW to “my_bucket”
• No S3 Bucket Policy attached
• => EC2 instance can read and write to “my_bucket”
Example 2
• IAM Role attached to EC2 instance, authorizes RW to “my_bucket”
• S3 Bucket Policy attached, explicit deny to the IAM Role
• => EC2 instance cannot read and write to “my_bucket”
Example 3
• IAM Role attached to EC2 instance, no S3 bucket permissions
• S3 Bucket Policy attached, explicit RW allow to the IAM Role
• => EC2 instance can read and write to “my_bucket”
Example 4
• IAM Role attached to EC2 instance, explicit deny S3 bucket permissions
• S3 Bucket Policy attached, explicit RW allow to the IAM Role
• => EC2 instance cannot read and write to “my_bucket”
Dynamic Policies with IAM
• How do you assign each user a /home/<user> folder in an S3 bucket?
• Option 1:
• Create an IAM policy allowing georges to have access to /home/georges
• Create an IAM policy allowing sarah to have access to /home/sarah
• Create an IAM policy allowing matt to have access to /home/matt
• … One policy per user!
• This doesn’t scale
• Option 2:
• Create one dynamic policy with IAM
• Leverage the special policy variable ${aws:username}
Dynamic Policy example
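For illustration, a sketch of what such a dynamic policy could look like, attached with boto3 (the bucket name, group name, and policy name are hypothetical):

    import json
    import boto3

    iam = boto3.client("iam")

    # One policy for all users: ${aws:username} resolves to the caller's user name
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": "arn:aws:s3:::my-company-bucket/home/${aws:username}/*",
            }
        ],
    }

    iam.put_group_policy(
        GroupName="developers",              # hypothetical group
        PolicyName="home-folder-access",     # hypothetical policy name
        PolicyDocument=json.dumps(policy),
    )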
Inline vs Managed Policies
• AWS Managed Policy
• Maintained by AWS
• Good for power users and administrators
• Updated in case of new services / new APIs
• Customer Managed Policy
• Best Practice, re-usable, can be applied to many principals
• Version Controlled + rollback, central change management
• Inline
• Strict one-to-one relationship between policy and principal
• Policy is deleted if you delete the IAM principal
Granting a User Permissions to Pass a Role to
an AWS Service
• To configure many AWS services, you must pass an IAM role to the service
(this happens only once during setup)
• The service will later assume the role and perform actions
• Example of passing a role:
• To an EC2 instance
• To a Lambda function
• To an ECS task
• To CodePipeline to allow it to invoke other services

• For this, you need the IAM permission iam:PassRole


• It often comes with iam:GetRole to view the role being passed
IAM PassRole example
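As an illustration, a sketch of an identity-based policy granting iam:PassRole for one specific role (role name, user name, and account ID are hypothetical):

    import json
    import boto3

    iam = boto3.client("iam")

    pass_role_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["iam:GetRole", "iam:PassRole"],
                # Only this specific role can be passed (e.g. to EC2 at launch time)
                "Resource": "arn:aws:iam::123456789012:role/my-ec2-app-role",
            }
        ],
    }

    iam.put_user_policy(
        UserName="deployer",                 # hypothetical user
        PolicyName="allow-pass-ec2-role",    # hypothetical policy name
        PolicyDocument=json.dumps(pass_role_policy),
    )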
Can a role be passed to any service?
• No: Roles can only be passed to what their trust allows
• A trust policy for the role that allows the service to assume the role
What is Microsoft Active Directory (AD)?
• Found on any Windows Server with AD Domain Services
• Database of objects: User Accounts, Computers, Printers, File Shares, Security Groups
• Centralized security management, create accounts, assign permissions
• Objects are organized in trees
• A group of trees is a forest
• (Diagram: a user, e.g. “John” with a password, authenticates against a Domain Controller)
AWS Directory Services
• AWS Managed Microsoft AD
  • Create your own AD in AWS, manage users locally, supports MFA
  • Establish “trust” connections with your on-premises AD
• AD Connector
  • Directory Gateway (proxy) to redirect authentication to the on-premises AD, supports MFA
  • Users are managed on the on-premises AD
• Simple AD
  • AD-compatible managed directory on AWS
  • Cannot be joined with an on-premises AD
AWS Security & Encryption
KMS, Encryption SDK, SSM Parameter Store
Why encryption?
Encryption in flight (SSL)
• Data is encrypted before sending and decrypted after receiving
• SSL certificates help with encryption (HTTPS)
• Encryption in flight ensures no MITM (man in the middle attack) can happen
• (Diagram: you send credentials, e.g. a username and password, over HTTPS; they are SSL-encrypted on the wire and SSL-decrypted by the website running on AWS)
Why encryption?
Server-side encryption at rest
• Data is encrypted after being received by the server
• Data is decrypted before being sent back
• It is stored in an encrypted form thanks to a key (usually a data key)
• The encryption / decryption keys must be managed somewhere, and the server must have access to them
• (Diagram: an object is sent over HTTP/S to an AWS service, e.g. EBS; the service encrypts it with a data key on write and decrypts it with the data key before returning it)
Why encryption?
Client-side encryption
• Data is encrypted by the client and never decrypted by the server
• Data will be decrypted by a receiving client
• The server should not be able to decrypt the data
• Could leverage Envelope Encryption
• (Diagram: a client encrypts the object with a client-side data key, stores it in any store such as FTP or S3, and a receiving client decrypts it with the client-side data key)
AWS KMS (Key Management Service)
• Anytime you hear “encryption” for an AWS service, it’s most likely KMS
• AWS manages encryption keys for us
• Fully integrated with IAM for authorization
• Easy way to control access to your data
• Able to audit KMS Key usage using CloudTrail
• Seamlessly integrated into most AWS services (EBS, S3, RDS, SSM…)
• Never ever store your secrets in plaintext, especially in your code!
• KMS Key Encryption also available through API calls (SDK, CLI)
• Encrypted secrets can be stored in the code / environment variables
KMS Keys Types
• “KMS Key” is the new name of the KMS Customer Master Key (CMK)
• Symmetric (AES-256 keys)
• Single encryption key that is used to Encrypt and Decrypt
• AWS services that are integrated with KMS use Symmetric CMKs
• You never get access to the KMS Key unencrypted (must call KMS API to use)
• Asymmetric (RSA & ECC key pairs)
• Public (Encrypt) and Private Key (Decrypt) pair
• Used for Encrypt/Decrypt, or Sign/Verify operations
• The public key is downloadable, but you can’t access the Private Key unencrypted
• Use case: encryption outside of AWS by users who can’t call the KMS API
AWS KMS (Key Management Service)
• Types of KMS Keys:
  • AWS Owned Keys (free): SSE-S3, SSE-SQS, SSE-DDB (default key)
  • AWS Managed Key: free (aws/service-name, example: aws/rds or aws/ebs)
  • Customer managed keys created in KMS: $1 / month
  • Customer managed keys imported (must be symmetric key): $1 / month
  • + pay for API calls to KMS ($0.03 / 10,000 calls)
• Automatic Key rotation:
  • AWS-managed KMS Key: automatic every 1 year
  • Customer-managed KMS Key: (must be enabled) automatic every 1 year
  • Imported KMS Key: only manual rotation possible using alias
Copying Snapshots across regions
• (Diagram: in Region eu-west-2, an EBS volume and its snapshot are encrypted with KMS Key A; when the snapshot is copied to Region ap-southeast-2, KMS re-encrypts it with KMS Key B, so the copied snapshot and the volume restored from it are encrypted with KMS Key B)
KMS Key Policies
• Control access to KMS keys, “similar” to S3 bucket policies
• Difference: you cannot control access without them
• Default KMS Key Policy:
  • Created if you don’t provide a specific KMS Key Policy
  • Complete access to the key for the root user = entire AWS account
• Custom KMS Key Policy:
  • Define users, roles that can access the KMS key
  • Define who can administer the key
  • Useful for cross-account access of your KMS key
Copying Snapshots across accounts
1. Create a Snapshot, encrypted with your own KMS Key (Customer Managed Key)
2. Attach a KMS Key Policy to authorize cross-account access
3. Share the encrypted snapshot
4. (in target) Create a copy of the Snapshot, encrypt it with a CMK in your account
5. Create a volume from the snapshot
(Diagram: an example KMS Key Policy authorizing the target account)
How does KMS work?
API – Encrypt and Decrypt
• Encrypt API: you send a secret (ex: password, < 4 KB) to KMS; KMS checks IAM permissions, performs the encryption with the CMK, and sends back the encrypted secret
• Decrypt API: you send the encrypted data to KMS; KMS checks IAM permissions, performs the decryption with the CMK, and sends back the decrypted secret (in plaintext)
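A minimal boto3 sketch of the Encrypt / Decrypt APIs (the key alias is hypothetical, and the payload must stay under 4 KB):

    import boto3

    kms = boto3.client("kms")

    # Encrypt a small secret (< 4 KB) under a customer managed key
    encrypted = kms.encrypt(
        KeyId="alias/my-app-key",            # hypothetical key alias
        Plaintext=b"super-secret-password",
    )["CiphertextBlob"]

    # Decrypt it later – for symmetric keys KMS finds the key from the ciphertext
    plaintext = kms.decrypt(CiphertextBlob=encrypted)["Plaintext"]
    print(plaintext)  # b'super-secret-password'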
Envelope Encryption
• KMS Encrypt API call has a limit of 4 KB
• If you want to encrypt >4 KB, we need to use Envelope Encryption
• The main API that will help us is the GenerateDataKey API
• For the exam: anything over 4 KB of data that needs to be encrypted must use Envelope Encryption == GenerateDataKey API
Deep dive into Envelope Encryption
GenerateDataKey API
• (Diagram: the client calls the GenerateDataKey API; KMS checks IAM permissions and generates a data encryption key (DEK), returning both a plaintext copy and a copy encrypted under the CMK. The client encrypts the big file (ex: 10 MB) locally using the plaintext DEK, then stores the encrypted file together with the encrypted DEK as the final envelope file)
Deep dive into Envelope Encryption
Decrypt envelope data
• (Diagram: the client takes the encrypted DEK from the envelope file and calls the Decrypt API; KMS checks IAM permissions and returns the plaintext DEK. The client then decrypts the big file locally using the DEK)
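A client-side sketch of this flow, assuming the third-party `cryptography` package for the local symmetric cipher (the key alias and payload handling are illustrative; this is not the AWS Encryption SDK itself):

    import base64
    import boto3
    from cryptography.fernet import Fernet

    kms = boto3.client("kms")

    # 1) Ask KMS for a data encryption key (plaintext copy + copy encrypted under the CMK)
    dek = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")  # hypothetical alias
    fernet = Fernet(base64.urlsafe_b64encode(dek["Plaintext"]))

    # 2) Encrypt the large payload locally and keep the encrypted DEK next to it (the "envelope")
    big_payload = b"..." * 100_000
    envelope = {"ciphertext": fernet.encrypt(big_payload), "encrypted_dek": dek["CiphertextBlob"]}

    # 3) Later: decrypt the DEK with KMS, then decrypt the payload locally
    plain_dek = kms.decrypt(CiphertextBlob=envelope["encrypted_dek"])["Plaintext"]
    restored = Fernet(base64.urlsafe_b64encode(plain_dek)).decrypt(envelope["ciphertext"])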


Encryption SDK
• The AWS Encryption SDK implements Envelope Encryption for us
• The Encryption SDK also exists as a CLI tool we can install
• Implementations for Java, Python, C, JavaScript
• Feature - Data Key Caching:
  • re-use data keys instead of creating new ones for each encryption
  • Helps with reducing the number of calls to KMS, with a security trade-off
  • Use LocalCryptoMaterialsCache (max age, max bytes, max number of messages)
Encryption SDK – diagram
• The SDK encrypts the data encryption key and stores it (encrypted) as
part of the returned ciphertext.
KMS Symmetric – API Summary
• Encrypt: encrypt up to 4 KB of data through KMS
• GenerateDataKey: generates a unique symmetric data key (DEK)
• returns a plaintext copy of the data key
• AND a copy that is encrypted under the CMK that you specify
• GenerateDataKeyWithoutPlaintext:
• Generate a DEK to use at some point (not immediately)
• DEK that is encrypted under the CMK that you specify (must use Decrypt later)
• Decrypt: decrypt up to 4 KB of data (including Data Encryption Keys)
• GenerateRandom: Returns a random byte string
KMS Request Quotas
• When you exceed a request quota, you get a ThrottlingException:

• To respond, use exponential backoff (backoff and retry)


• For cryptographic operations, they share a quota
• This includes requests made by AWS on your behalf (ex: SSE-KMS)
• For GenerateDataKey, consider using DEK caching from the Encryption SDK
• You can request a Request Quotas increase through API or AWS support
KMS Request Quotas
• The following cryptographic operations share a request quota (per second): Decrypt, Encrypt, GenerateDataKey (symmetric), GenerateDataKeyWithoutPlaintext (symmetric), GenerateRandom, ReEncrypt, Sign (asymmetric), Verify (asymmetric)
• These shared quotas vary with the AWS Region and the type of CMK used in the request; each quota is calculated separately
• Symmetric CMK quota:
  • 5,500 (shared)
  • 10,000 (shared) in the following Regions: us-east-2, ap-southeast-1, ap-southeast-2, ap-northeast-1, eu-central-1, eu-west-2
  • 30,000 (shared) in the following Regions: us-east-1, us-west-2, eu-west-1
• Asymmetric CMK quota:
  • 500 (shared) for RSA CMKs
  • 300 (shared) for Elliptic curve (ECC) CMKs
S3 Bucket Key for SSE-KMS encryption
• New setting to decrease…
  • the number of API calls made to KMS from S3 by 99%
  • the cost of overall KMS encryption with Amazon S3 by 99%
• This leverages data keys
  • An “S3 bucket key” is generated by KMS from the customer master key
  • That bucket key is used to encrypt objects with new data keys
• You will see fewer KMS events in CloudTrail
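A boto3 sketch of enabling SSE-KMS with an S3 Bucket Key as the default bucket encryption (bucket name and key alias are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_encryption(
        Bucket="my-company-bucket",  # hypothetical bucket
        ServerSideEncryptionConfiguration={
            "Rules": [
                {
                    "ApplyServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "aws:kms",
                        "KMSMasterKeyID": "alias/my-app-key",  # hypothetical key alias
                    },
                    "BucketKeyEnabled": True,  # reduces KMS API calls and cost
                }
            ]
        },
    )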
Key Policy – Examples
• (Slides show two example KMS Key Policies: the Default KMS Key Policy, and a policy allowing a Federated User)
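For illustration, a sketch of setting a default-style key policy with boto3 (the account ID and key ID are placeholders; the policy name must be “default”):

    import json
    import boto3

    kms = boto3.client("kms")

    # Default-style key policy: gives the account root full access,
    # which in turn lets IAM policies in the account grant access to the key
    default_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Enable IAM policies",
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
                "Action": "kms:*",
                "Resource": "*",
            }
        ],
    }

    kms.put_key_policy(
        KeyId="1234abcd-12ab-34cd-56ef-1234567890ab",  # hypothetical key ID
        PolicyName="default",
        Policy=json.dumps(default_policy),
    )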


Principal Options in IAM Policies
• AWS Account and Root User

• IAM Roles

• IAM Role Sessions


Principal Options in IAM Policies
• IAM Users

• Federated User Sessions

• AWS Services

• All Principals
CloudHSM
• KMS => AWS manages the software for encryption
• CloudHSM => AWS provisions encryption hardware
• Dedicated Hardware (HSM = Hardware Security Module)
• You manage your own encryption keys entirely (not AWS)
• HSM device is tamper resistant, FIPS 140-2 Level 3 compliance
• Supports both symmetric and asymmetric encryption (SSL/TLS keys)
• No free tier available
• Must use the CloudHSM Client Software
• Redshift supports CloudHSM for database encryption and key management
• Good option to use with SSE-C encryption
CloudHSM Diagram
• (Diagram: AWS manages the hardware; the user manages the keys. The CloudHSM Client connects to AWS CloudHSM over an SSL connection. IAM permissions cover CRUD operations on the HSM cluster, while the CloudHSM client software manages the keys and the users)
CloudHSM – High Availability
• CloudHSM clusters are spread across Multi AZ (HA)
• Great for availability and durability
• (Diagram: the CloudHSM client talks to CloudHSM 1 in Availability Zone 1 and CloudHSM 2 in Availability Zone 2)
CloudHSM – Integration with AWS Services
• Integrates with AWS KMS through the KMS Custom Key Store
• KMS then provides encryption for services such as EBS, S3, RDS…
• Key usage is logged in CloudTrail
CloudHSM vs. KMS
• Tenancy: AWS KMS is multi-tenant; AWS CloudHSM is single-tenant
• Standard: FIPS 140-2 Level 3 for both
• Master Keys: KMS supports AWS Owned, AWS Managed, and Customer Managed CMKs; CloudHSM uses Customer Managed CMKs only
• Key Types: KMS supports symmetric, asymmetric, and digital signing; CloudHSM supports symmetric, asymmetric, and digital signing & hashing
• Key Accessibility: KMS keys are accessible in multiple AWS Regions (but can’t be accessed outside the Region they’re created in); CloudHSM is deployed and managed in a VPC and can be shared across VPCs (VPC Peering)
• Cryptographic Acceleration: none for KMS; CloudHSM offers SSL/TLS acceleration and Oracle TDE acceleration
• Access & Authentication: KMS uses AWS IAM; with CloudHSM you create users and manage their permissions
CloudHSM vs. KMS (continued)
• High Availability: KMS is an AWS managed service; with CloudHSM you add multiple HSMs over different AZs
• Audit Capability: KMS uses CloudTrail and CloudWatch; CloudHSM uses CloudTrail, CloudWatch, and supports MFA
• Free Tier: KMS yes, CloudHSM no
SSM Parameter Store
• Secure storage for configuration and secrets
• Optional Seamless Encryption using KMS
• Serverless, scalable, durable, easy SDK
• Version tracking of configurations / secrets
• Security through IAM
• Notifications with Amazon EventBridge
• Integration with CloudFormation
• (Diagram: applications store plaintext or encrypted configuration in the SSM Parameter Store; the Parameter Store checks IAM permissions and uses AWS KMS as the decryption service for encrypted parameters)
SSM Parameter Store Hierarchy
• /my-department/
  • my-app/
    • dev/
      • db-url
      • db-password
    • prod/
      • db-url
      • db-password
  • other-app/
• /other-department/
• /aws/reference/secretsmanager/secret_ID_in_Secrets_Manager
• /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 (public)
• (Diagram: a Dev Lambda function and a Prod Lambda function each read their own branch using the GetParameters or GetParametersByPath API)
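A boto3 sketch of reading one environment’s branch of this hierarchy (parameter names follow the example above):

    import boto3

    ssm = boto3.client("ssm")

    # Fetch all dev parameters for my-app, decrypting SecureString values
    response = ssm.get_parameters_by_path(
        Path="/my-department/my-app/dev/",
        Recursive=True,
        WithDecryption=True,
    )
    config = {p["Name"].split("/")[-1]: p["Value"] for p in response["Parameters"]}
    # e.g. config == {"db-url": "...", "db-password": "..."}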
Standard and advanced parameter tiers
• Total number of parameters allowed (per AWS account and Region): Standard 10,000 / Advanced 100,000
• Maximum size of a parameter value: Standard 4 KB / Advanced 8 KB
• Parameter policies available: Standard No / Advanced Yes
• Cost: Standard no additional charge / Advanced charges apply
• Storage pricing: Standard free / Advanced $0.05 per advanced parameter per month
Parameters Policies (for advanced parameters)
• Allow to assign a TTL to a parameter (expiration date) to force updating or deleting sensitive data such as passwords
• Can assign multiple policies at a time
• Examples: Expiration (to delete a parameter), ExpirationNotification (EventBridge), NoChangeNotification (EventBridge)
AWS Secrets Manager
• Newer service, meant for storing secrets
• Capability to force rotation of secrets every X days
• Automate generation of secrets on rotation (uses Lambda)
• Integration with Amazon RDS (MySQL, PostgreSQL, Aurora)
• Secrets are encrypted using KMS
• Mostly meant for RDS integration
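A boto3 sketch of reading a secret at runtime (the secret name is hypothetical; RDS secrets are typically stored as a JSON string):

    import json
    import boto3

    secrets = boto3.client("secretsmanager")

    response = secrets.get_secret_value(SecretId="prod/my-app/db-credentials")  # hypothetical secret
    credentials = json.loads(response["SecretString"])
    # e.g. credentials["username"], credentials["password"]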


AWS Secrets Manager – Multi-Region Secrets
• Replicate Secrets across multiple AWS Regions
• Secrets Manager keeps read replicas in sync with the primary Secret
• Ability to promote a read replica Secret to a standalone Secret
• Use cases: multi-region apps, disaster recovery strategies, multi-region DB…
• (Diagram: MySecret-A (primary) in us-east-1 is replicated to MySecret-A (replica) in us-west-2)
Secrets Manager
CloudFormation Integration – RDS & Aurora
• ManageMasterUserPassword – creates the admin secret implicitly
• RDS and Aurora will manage the secret in Secrets Manager and its rotation
• (Diagram: a user creates a CloudFormation stack; CloudFormation creates the RDS DB, creates the secret in Secrets Manager, and configures the username/password from it)
Secrets Manager
CloudFormation - Dynamic Reference
• (Diagram: the secret is generated first, then referenced from the RDS DB instance resource using a dynamic reference, and finally linked to the RDS DB instance)
SSM Parameter Store vs Secrets Manager
• Secrets Manager ($$$):
  • Automatic rotation of secrets with AWS Lambda
  • Lambda function is provided for RDS, Redshift, DocumentDB
  • KMS encryption is mandatory
  • Can integrate with CloudFormation
• SSM Parameter Store ($):
  • Simple API
  • No secret rotation (can enable rotation using Lambda triggered by CW Events)
  • KMS encryption is optional
  • Can integrate with CloudFormation
  • Can pull a Secrets Manager secret using the SSM Parameter Store API
SSM Parameter Store vs. Secrets Manager
Rotation
• AWS Secrets Manager: every 30 days, Secrets Manager invokes a Lambda function (which can be provided for RDS) that changes the password in Amazon RDS and updates the secret value
• SSM Parameter Store: every 30 days, a CloudWatch Events rule invokes your own Lambda function, which changes the RDS password and updates the parameter value in the Parameter Store
CloudWatch Logs - Encryption
• You can encrypt CloudWatch logs with KMS keys
• Encryption is enabled at the log group level, by associating a CMK with a log group, either when you create the log group or after it exists
• You cannot associate a CMK with a log group using the CloudWatch console
• You must use the CloudWatch Logs API:
  • associate-kms-key: if the log group already exists
  • create-log-group: if the log group doesn’t exist yet
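The same two calls in boto3 (the log group names and key ARN are placeholders):

    import boto3

    logs = boto3.client("logs")
    key_arn = "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"  # placeholder

    # If the log group doesn't exist yet: create it encrypted
    logs.create_log_group(logGroupName="/my-app/prod", kmsKeyId=key_arn)

    # If the log group already exists: associate the CMK with it
    logs.associate_kms_key(logGroupName="/my-app/existing", kmsKeyId=key_arn)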
CodeBuild Security
• To access resources in your VPC, make sure you specify a VPC configuration for your CodeBuild project
• Secrets in CodeBuild:
  • Don’t store them as plaintext in environment variables
  • Instead…
    • Environment variables can reference Parameter Store parameters
    • Environment variables can reference Secrets Manager secrets
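A sketch of configuring such environment variables on an existing project via boto3 (the project name, image, and secret/parameter names are hypothetical):

    import boto3

    codebuild = boto3.client("codebuild")

    codebuild.update_project(
        name="my-build-project",  # hypothetical project
        environment={
            "type": "LINUX_CONTAINER",
            "image": "aws/codebuild/standard:7.0",
            "computeType": "BUILD_GENERAL1_SMALL",
            "environmentVariables": [
                # Resolved from SSM Parameter Store at build time, never stored in plaintext
                {"name": "DB_PASSWORD", "value": "/my-app/prod/db-password", "type": "PARAMETER_STORE"},
                # Resolved from Secrets Manager at build time
                {"name": "API_KEY", "value": "prod/my-app/api-key", "type": "SECRETS_MANAGER"},
            ],
        },
    )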
AWS Nitro Enclaves
• Process highly sensitive data in an isolated compute environment
• Personally Identifiable Information (PII), healthcare, financial, …
• Fully isolated virtual machines, hardened, and highly constrained
• Not a container, not persistent storage, no interactive access, no external networking
• Helps reduce the attack surface for sensitive data processing apps
• Cryptographic Attestation – only authorized code can be running in your Enclave
• Only Enclaves can access sensitive data (integration with KMS)
• Use cases: securing private keys, processing credit cards, secure multi-party
computation…
AWS Nitro Enclaves
• (Diagram: 1. Launch a compatible Nitro-based EC2 instance with the ‘EnclaveOptions’ parameter set to ‘true’; 2. Use the Nitro CLI to convert your app to an Enclave Image File (EIF); 3. Using the EIF file as an input, use the Nitro CLI to create an Enclave; 4. The Enclave is a separate virtual machine with its own kernel, memory, and CPU. On the EC2 host, the instance and its Enclave communicate over a secure local channel, and the Nitro Hypervisor isolates enclaves from other instances on the same host)
Other Services
Quick overview of other services that you might get questions on at the exam
Amazon Simple Email Service (Amazon SES)
• Fully managed service to send emails securely, globally and at scale
• Allows inbound/outbound emails
• Reputation dashboard, performance insights, anti-spam feedback
• Provides statistics such as email deliveries, bounces, feedback loop results, email open
• Supports DomainKeys Identified Mail (DKIM) and Sender Policy Framework (SPF)
• Flexible IP deployment: shared, dedicated, and customer-owned IPs
• Send emails from your application using the AWS Console, APIs, or SMTP
• Use cases: transactional, marketing and bulk email communications
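A boto3 sketch of sending a transactional email through the SES API (the addresses must be verified identities; all values below are placeholders):

    import boto3

    ses = boto3.client("ses")

    ses.send_email(
        Source="no-reply@example.com",                       # verified sender identity (placeholder)
        Destination={"ToAddresses": ["user@example.com"]},   # placeholder recipient
        Message={
            "Subject": {"Data": "Your order has shipped"},
            "Body": {"Text": {"Data": "Thanks for your purchase!"}},
        },
    )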
Amazon OpenSearch Service
• Amazon OpenSearch is the successor to Amazon ElasticSearch
• In DynamoDB, queries only exist by primary key or indexes…
• With OpenSearch, you can search any field, even partial matches
• It’s common to use OpenSearch as a complement to another database
• Two modes: managed cluster or serverless cluster
• Does not natively support SQL (can be enabled via a plugin)
• Ingestion from Kinesis Data Firehose, AWS IoT, and CloudWatch Logs
• Security through Cognito & IAM, KMS encryption, TLS
• Comes with OpenSearch Dashboards (visualization)
OpenSearch patterns
DynamoDB
• (Diagram: CRUD operations go to a DynamoDB Table; a DynamoDB Stream triggers a Lambda function that indexes the items into Amazon OpenSearch. The application uses an API to search items in OpenSearch and an API to retrieve the full items from DynamoDB)
OpenSearch patterns
CloudWatch Logs
• Real time: CloudWatch Logs → Subscription Filter → Lambda function (managed by AWS) → Amazon OpenSearch
• Near real time: CloudWatch Logs → Subscription Filter → Kinesis Data Firehose → Amazon OpenSearch
OpenSearch patterns
Kinesis Data Streams & Kinesis Data Firehose
• Near real time: Kinesis Data Streams → Kinesis Data Firehose (with an optional Lambda function for data transformation) → Amazon OpenSearch
• Real time: Kinesis Data Streams → Lambda function → Amazon OpenSearch
Amazon Athena
• Serverless query service to analyze data stored in Amazon S3
• Uses standard SQL language to query the files (built on Presto)
• Supports CSV, JSON, ORC, Avro, and Parquet
• Pricing: $5.00 per TB of data scanned
• Commonly used with Amazon QuickSight for reporting/dashboards
• Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc...
• Exam Tip: analyze data in S3 using serverless SQL, use Athena
• (Diagram: data is loaded into an S3 bucket, Amazon Athena queries and analyzes it, and Amazon QuickSight builds reports & dashboards on top)
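A boto3 sketch of running such a query (the database, table, and result location are hypothetical):

    import time
    import boto3

    athena = boto3.client("athena")

    execution = athena.start_query_execution(
        QueryString="SELECT status_code, COUNT(*) FROM alb_logs GROUP BY status_code",  # hypothetical table
        QueryExecutionContext={"Database": "my_logs_db"},                                # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},               # hypothetical bucket
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query finishes, then read the results
    while athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
        time.sleep(1)
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]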
Amazon Athena – Performance Improvement
• Use columnar data for cost-savings (less scan)
• Apache Parquet or ORC is recommended
• Huge performance improvement
• Use Glue to convert your data to Parquet or ORC
• Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib, zstd…)
• Partition datasets in S3 for easy querying on virtual columns
• s3://yourBucket/pathToTable
/<PARTITION_COLUMN_NAME>=<VALUE>
/<PARTITION_COLUMN_NAME>=<VALUE>
/<PARTITION_COLUMN_NAME>=<VALUE>
/etc…
• Example: s3://athena-examples/flight/parquet/year=1991/month=1/day=1/
• Use larger files (> 128 MB) to minimize overhead
Amazon Athena – Federated Query
• Allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
• Uses Data Source Connectors that run on AWS Lambda to run Federated Queries (e.g., CloudWatch Logs, DynamoDB, RDS, ElastiCache, DocumentDB, HBase in EMR, Redshift, on-premises databases…)
• Store the results back in Amazon S3
Amazon Managed Streaming for Apache Kafka
(Amazon MSK)
• Alternative to Amazon Kinesis
• Fully managed Apache Kafka on AWS
• Allow you to create, update, delete clusters
• MSK creates & manages Kafka brokers nodes & Zookeeper nodes for you
• Deploy the MSK cluster in your VPC, multi-AZ (up to 3 for HA)
• Automatic recovery from common Apache Kafka failures
• Data is stored on EBS volumes for as long as you want
• MSK Serverless
• Run Apache Kafka on MSK without managing the capacity
• MSK automatically provisions resources and scales compute & storage
Apache Kafka at a high level
• (Diagram: producers (your code) take data from sources such as Kinesis, IoT, or RDS and write it to a topic on the MSK cluster; the topic is replicated across Broker 1, Broker 2, and Broker 3. Consumers (your code) poll from the topic and forward the data to destinations such as EMR, S3, SageMaker, Kinesis, RDS, etc.)
Kinesis Data Streams vs. Amazon MSK
• Kinesis Data Streams:
  • 1 MB message size limit
  • Data Streams with Shards
  • Shard Splitting & Merging
  • TLS in-flight encryption
  • KMS at-rest encryption
• Amazon MSK:
  • 1 MB default message size, configurable for higher (ex: 10 MB)
  • Kafka Topics with Partitions
  • Can only add partitions to a topic
  • PLAINTEXT or TLS in-flight encryption
  • KMS at-rest encryption
Amazon MSK Consumers
• Kinesis Data Analytics for Apache Flink
• AWS Glue Streaming ETL Jobs (powered by Apache Spark Streaming)
• Lambda
• Applications running on Amazon EC2, ECS, EKS
AWS Certificate Manager (ACM)
• Lets you easily provision, manage, and deploy SSL/TLS Certificates
• Used to provide in-flight encryption for websites (HTTPS)
• Supports both public and private TLS certificates
• Free of charge for public TLS certificates
• Automatic TLS certificate renewal
• Integrations with (load TLS certificates on):
  • Elastic Load Balancers
  • CloudFront Distributions
  • APIs on API Gateway
• (Diagram: ACM provisions and maintains the TLS certificate on an Application Load Balancer, which terminates HTTPS and forwards HTTP to EC2 instances in an Auto Scaling group)
AWS Private Certificate Authority (CA)
• Managed service that allows you to create private Certificate Authorities (CA), including root and subordinate CAs
• Can issue and deploy end-entity X.509 certificates
• Certificates are trusted only by your Organization (not the public Internet)
• Works with AWS services that are integrated with ACM (e.g., CloudFront, API Gateway, ELB, EKS)
• Use cases: issue certificates for
  • Encrypted TLS communication, cryptographically signing code
  • Authenticating users, computers, API endpoints, and IoT devices
  • Enterprise customers building a Public Key Infrastructure (PKI)
AWS Macie
• Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS
• Macie helps identify and alert you to sensitive data, such as personally identifiable information (PII)
• (Diagram: Macie analyzes S3 buckets to discover sensitive data (PII) and notifies you through Amazon EventBridge integrations)
AWS AppConfig
• Configure, validate, and deploy dynamic configurations
• Deploy dynamic configuration changes to your applications independently of any code deployments
• You don’t need to restart the application
• Feature flags, application tuning, allow/block listing…
• Use with apps on EC2 instances, Lambda, ECS, EKS…
• Gradually deploy the configuration changes and roll back if issues occur
• Validate configuration changes before deployment using:
  • JSON Schema (syntactic check), or
  • Lambda Function – run code to perform validation (semantic check)
• (Diagram: configuration sources include the SSM Parameter Store, SSM Documents, and S3 buckets; EC2 instances poll AppConfig for configuration changes, and a CloudWatch alarm can trigger a rollback)
CloudWatch Evidently
• Safely validate new features by serving them to a specified % of your users
• Reduce risk and identify unintended consequences
• Collect experiment data, analyze using stats, monitor performance
• Launches (= feature flags): enable and disable features for a subset of users
• Experiments (= A/B testing): compare multiple versions of the same feature
• Overrides: pre-define a variation for a specific user (e.g., a beta tester identified by user ID)
• Store evaluation events in CloudWatch Logs or S3
• (Diagram: the developer creates an Evidently project, embeds the code snippet for the feature in the application, and enables the feature for a specified % of users; only that percentage of users – plus any overrides – access the new version)
Exam Review & Tips
State of learning checkpoint
• Let’s look at how far we’ve gone on our learning journey
• https://aws.amazon.com/certification/certified-developer-associate/
Practice makes perfect
• If you’re new to AWS, get some hands-on AWS practice with this course before rushing to the exam
• The exam recommends that you have one or more years of hands-on experience developing and maintaining AWS-based applications
• Practice makes perfect!
• If you feel overwhelmed by the amount of knowledge you just learned, just go through it one more time
Ideas for practicing…!
• Take one of your existing applications
• Try deploying it manually on EC2
• Try deploying it on Elastic Beanstalk and have it scale
• Try creating a CICD pipeline for it
• Try decoupling components using SQS / SNS
• If possible, try running it on AWS Lambda & friends
• Write automation scripts using the CLI / SDK
• Idea 1: Shut down EC2 instances at night / start in the morning
• Idea 2: Automate snapshots of EBS volumes at night
• Idea 3: List all under-utilized EC2 instances (CPU Utilization < 10%)
Proceed by elimination
• Most questions are going to be scenario based
• For all the questions, rule out answers that you know for sure are wrong
• For the remaining answers, understand which one makes the most sense

• There are very few trick questions


• Don’t over-think it
• If a solution seems feasible but highly complicated, it’s probably wrong
Skim the AWS Whitepapers
• You can read some AWS whitepapers here:
  • AWS Security Best Practices
  • AWS Well-Architected Framework
  • Architecting for the Cloud: AWS Best Practices
  • Practicing Continuous Integration and Continuous Delivery on AWS: Accelerating Software Delivery with DevOps
  • Microservices on AWS
  • Serverless Architectures with AWS Lambda
  • Optimizing Enterprise Economics with Serverless Architectures
  • Running Containerized Microservices on AWS
  • Blue/Green Deployments on AWS
• Overall we’ve explored all the most important concepts in the course
• It’s never bad to have a look at the whitepapers you find interesting!
Read each service’s FAQ
• FAQ = Frequently Asked Questions
• Example: https://aws.amazon.com/lambda/faqs/
• FAQs cover a lot of the questions asked at the exam
• They help confirm your understanding of a service
Get into the AWS Community
• Help out and discuss with other people in the course Q&A
• Review questions asked by other people in the Q&A
• Do the practice test in this section
• Read forums online
• Read online blogs
• Attend local meetups and discuss with other AWS engineers
• Watch re:Invent videos on YouTube (AWS conference)
How will the exam work?
• You’ll have to register online at https://www.aws.training/
• The fee for the exam is 150 USD
• Provide one identity document (ID, Passport – details are in the emails sent to you)
• No notes are allowed, no pen is allowed, no speaking
• 65 questions will be asked in 130 minutes
• Use the “Flag” feature to mark questions you want to re-visit
• At the end you can optionally review all the questions / answers
• To pass you need a score of at least 720 out of 1000
• You will know within 5 days if you passed / failed the exam (most of the time less)
• You will know the overall score a few days later (email notification)
• You will not know which answers were right / wrong
• If you fail, you can retake the exam 14 days later
AWS Certification Paths – Architecture
• Solutions Architect: design, develop, and manage cloud infrastructure and assets, work with DevOps to migrate applications to the cloud
• Application Architect: design significant aspects of application architecture including user interface, middleware, and infrastructure, and ensure enterprise-wide scalable, reliable, and manageable systems
• Full paths: https://d1.awsstatic.com/training-and-certification/docs/AWS_certification_paths.pdf
AWS Certification Paths – Operations
• Systems Administrator: install, upgrade, and maintain computer components and software, and integrate automation processes
• Cloud Engineer: implement and operate an organization’s networked computing infrastructure and implement security systems to maintain data safety
AWS Certification Paths – DevOps
• Test Engineer: embed testing and quality best practices for software development from design to release, throughout the product life cycle
• Cloud DevOps Engineer: design, deployment, and operations of large-scale global hybrid cloud computing environments, advocating for end-to-end automated CI/CD DevOps pipelines
• DevSecOps Engineer: accelerate enterprise cloud adoption while enabling rapid and stable delivery of capabilities using CI/CD principles, methodologies, and technologies
AWS Certification Paths – Security
• Cloud Security Engineer: design computer security architecture and develop detailed cyber security designs; develop, execute, and track performance of security measures to protect information
• Cloud Security Architect: design and implement enterprise cloud solutions applying governance to identify, communicate, and minimize business and technical risks
AWS Certification Paths – Data Analytics & Development
• Cloud Data Engineer: automate collection and processing of structured/semi-structured data and monitor data pipeline performance
• Software Development Engineer: develop, construct, and maintain software across platforms and devices
AWS Certification Paths – Networking & AI/ML
• Network Engineer: design and implement computer and information networks, such as local area networks (LAN), wide area networks (WAN), intranets, extranets, etc.
• Machine Learning Engineer: research, build, and design artificial intelligence (AI) systems to automate predictive models, and design machine learning systems, models, and schemes
Congratulations & Next Steps!
Congratulations!
• Congrats on finishing the course!
• I hope you will pass the exam without a hitch :)
• If you passed, I’ll be more than happy to know I’ve helped
  • Post it in the Q&A to help & motivate other students. Share your tips!
  • Post it on LinkedIn and tag me!
• Overall, I hope you learned how to use AWS and that you will be a tremendously good AWS Developer
Next Steps
• We’ve spent a lot of time getting an overview of each service

• Each service on its own deserves its own course and study time

• Find out what services you liked and get specialized in them!

• My personal favorites: AWS Lambda, CloudFormation, EC2 & ECS

• Happy learning!
