AWS Notion
Section I
IAM & CLI
EC2
EC2 Instance Storage
High Availability & Scalability
RDS, Aurora & ElastiCache
Route 53
Section II
Amazon S3
Amazon S3 Security
CloudFront, Global Accelerator & Wavelength
AWS Storage Extras
Decoupling Applications
Containers on AWS
AWS Serverless
Section III
Databases in AWS
Data & Analytics
Machine Learning
AWS Monitoring & Audit
IAM - Advanced
Section IV
AWS Security & Encryption
Networking - VPC
Disaster Recovery & Migrations
Other Services
WhitePapers & Architectures
Other
Section I
IAM & CLI
🗹 IAM Users, Groups and Policies.
Users - people within our organization; they can be grouped. Users don't have
to belong to a group, and one user can belong to multiple groups. Each user
has a unique name and credentials (password or access keys).
Groups can contain only users (not other groups) and are used to organize users based
on job function or role.
Users or Groups can be assigned JSON documents called policies which define
the permissions of the user. In AWS we apply the least privilege principle (don't
give more permissions than a user needs).
Principal (Optional) - Specifies the entity (user, account or role) that the
policy is applied to.
Action - lists the specific actions that the policy allows or denies. Actions are
represented as strings, often in the format "[service]:[action]".
Resource - specifies the AWS resources.
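To make the structure above concrete, here is a minimal sketch (assuming boto3; the policy name, user name and bucket ARNs are placeholders, not values from these notes) that creates an identity-based policy and attaches it to a user. Principal is omitted because identity-based policies don't use it.

import json
import boto3

iam = boto3.client("iam")

# Hypothetical least-privilege policy: read-only access to a single bucket.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

policy = iam.create_policy(
    PolicyName="ExampleS3ReadOnly",
    PolicyDocument=json.dumps(policy_document),
)

# Attach the policy to a user (attaching to a group works the same way).
iam.attach_user_policy(
    UserName="example-user",
    PolicyArn=policy["Policy"]["Arn"],
)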
🗹 IAM MFA.
Access Keys are credentials used to authenticate and authorize API requests to
AWS services. These keys consist of:
1. Access Key ID: Unique identifier that AWS uses to identify the IAM user or
AWS account making the API request.
2. Secret Access Key: Secret key known only to the IAM user or AWS account.
It is used to digitally sign API requests made by the user, ensuring that the
request is authentic and comes from a trusted source.
Access keys are generated through the AWS Console. Users manage their own
access keys.
IAM Credential Report (account-level) - Report that lists all account’s users
and the status of their credentials.
IAM Access Advisor (user-level) - Shows the service permissions granted
to a user and when those services were last accessed. This information can
be used to revise policies.
EC2
🗹 EC2 Basics.
EC2 is an IaaS (Infrastructure as a Service) offering.
When we create an EC2 instance we can configure the OS, CPU, RAM, storage, network card,
security groups and bootstrap script.
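A minimal sketch of launching such an instance with boto3; the AMI ID, key pair and security group are placeholders, and the user-data script is just an illustrative bootstrap example.

import boto3

ec2 = boto3.client("ec2")

# Bootstrap script (EC2 User Data) runs once at first boot, as root.
user_data = """#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable --now httpd
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="example-key",             # placeholder key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],
    UserData=user_data,                # boto3 base64-encodes this for us
)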
🗹 Security Groups.
Security Groups are the fundamentals of network security in AWS. They control
how traffic is allowed INTO or OUT of our EC2 instances.
Security groups only contain allow rules. Rules can reference IP ranges or other
security groups.
Security groups can be attached to multiple instances. They are locked down to
a region/VPC combination.
It’s good practice to maintain one separate security group for SSH access.
If the application is not accessible (times out), it is a security group issue; if the
application returns a “connection refused”, it is an application error.
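A sketch of both rule styles with boto3 (placeholder group IDs and CIDR): one rule allows SSH from an IP range, the other allows HTTP only from instances behind a separate load-balancer security group.

import boto3

ec2 = boto3.client("ec2")

ec2.authorize_security_group_ingress(
    GroupId="sg-app0123456789",  # placeholder: application security group
    IpPermissions=[
        {   # allow SSH from a specific IP range (keep SSH in its own SG in practice)
            "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
            "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
        },
        {   # allow HTTP only from another security group (e.g. the ALB's SG)
            "IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
            "UserIdGroupPairs": [{"GroupId": "sg-alb0123456789"}],
        },
    ],
)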
🗹 EC2 Instance Connect.
EC2 Instance Connect is a feature that allows us to connect to our EC2
instances without needing to manage SSH keys. The user must have the
necessary IAM permissions to use EC2 Instance Connect.
1 year (Discount +)
Payment Options:
No Upfront (Discount +)
Convertible Reserved Instances - can change the EC2 instance type, instance family, OS,
scope and tenancy. Up to 66% discount.
EC2 Savings Plans - locked to a specific instance family & AWS Region, but flexible
across instance size, OS and tenancy.
Dedicated Instances - instances run on hardware that’s dedicated to us. May share
hardware with other instances in the same account.
Purchasing Options:
Reserved: 1 or 3 years
EC2 Placement Groups are a feature that enables us to influence the placement
of Amazon EC2 instances on the underlying hardware. This can be important for
workloads that require high performance, low latency, or other specific
characteristics.
If we receive a capacity error when launching an instance in a placement group
that already has running instances, stop and start all of the instances in the
placement group, and try the launch again. Restarting the instances may migrate
them to hardware that has capacity for all the instances.
Partition Placement Group
Instances are divided into logical partitions, each isolated from the other in
terms of failure. Failures in one partition do not affect the instances in other
partitions. We have more control over where instances are placed, but the
number of partitions and instances per partition can be constrained.
🗹 Elastic Network Interface (ENI).
MAC Address
Security Groups
ENIs can be attached to instances in the same VPC, but must reside in the same
AZ. We can attach a secondary ENI to an instance and use it for failover
scenarios. If the primary instance fails, we can attach the ENI to a standby
instance.
🗹 EC2 Hibernate.
Hibernate is a feature designed to help us manage our EC2 instances more
efficiently by allowing us to pause and resume them. When we hibernate an
instance, the contents of its memory (RAM) are saved to an EBS volume, and the
instance is stopped rather than terminated. We can later resume the instance
from this saved state. The root EBS volume must be encrypted.
Hibernation is useful for workloads that don’t require continuous operation but
need to be quickly resumed.
EC2 Instance Storage
🗹 AMI.
AMIs are customizations of an EC2 instance. We can add our own configuration,
OS, software etc… All our software is pre-packaged, so we have faster boot times.
AMIs are built for a specific region and can be copied across regions.
EC2 instances can be launched from:
AWS Marketplace AMI - an AMI someone else made.
🗹 EBS.
EBS Volume (think of them as a “network USB stick”) is a network drive we can
attach to our instances (cloud only) while they run. It allows our instances to
persist data even after their termination. EBS volumes support live configuration
changes while in production which means that we can modify the volume type,
volume size, and IOPS capacity without service interruptions.
EBS can only be mounted to one instance at a time. They are bound to a specific
AZ.
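A sketch with boto3 (placeholder IDs) that creates a gp3 volume in the instance's AZ, attaches it, and later resizes it live, as described above.

import boto3

ec2 = boto3.client("ec2")

# The volume must be created in the same AZ as the instance it will attach to.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=100,                  # GiB
    VolumeType="gp3",
    Iops=3000,
    Throughput=125,            # MiB/s
)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance in us-east-1a
    Device="/dev/sdf",
)

# Live configuration change: grow the volume without detaching it.
ec2.modify_volume(VolumeId=volume["VolumeId"], Size=200)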
Data Lifecycle Manager (DLM) - automate the creation, retention, and deletion
of snapshots taken to back up our EBS volumes.
Delete on Termination
🗹 EBS Snapshots.
EBS Snapshot Features:
EC2 Instance Stores lose their storage when they are stopped (ephemeral), so we
have a risk of data loss if the hardware fails. Backups and replication are our
responsibility.
EBS Volumes are characterized by size, throughput and IOPS. Only gp2/gp3 and
io1/io2 Block Express can be used as boot volumes.
gp2 / gp3 (SSD) - general purpose SSD volume that balances price and
performance for a wide variety of workloads. Small gp2 volume can burst
IOPS to 3000. Gp3 is cheaper.
io1 / io2 Block Express (SSD) - highest performance SSD volume for
mission-critical low-latency or high-throughput workloads. Supports multi-
attach. It’s great for database workloads.
st1 (HDD) - low cost HDD volume designed for frequently accessed,
throughput-intensive workloads.
sc1 (HDD) - lowest cost HDD volume designed for less frequently accessed
workloads.
🗹 EBS Multi-Attach.
Multi-Attach lets us attach the same EBS volume to multiple EC2 instances in the
same AZ. It is only possible with io1/io2 volume types. Each EC2 instance has full
read and write permissions to the high-performance volume.
It’s limited to 16 EC2 instances at a time and we must use a file system that’s
cluster-aware (not XFS, EXT4, etc).
Use cases:
🗹 EBS Encryption.
All the data in flight moving between the instance and the volume is
encrypted.
Encryption has a minimal impact on latency. EBS encryption leverages keys from
KMS (AES-256).
🗹 RAID Configuration.
RAID (Redundant Array of Independent Disks) is a technology that combines
multiple physical disk drives into a single logical unit to provide data redundancy,
performance improvements, or both.
RAID 0 (Striping) - splits data evenly across two or more disks without any
redundancy. The main goal is to increase performance by allowing read and
write operations to happen in parallel across multiple disks.
It is ideal for applications where performance is critical, but data loss is not a
concern, such as video editing or gaming.
RAID 1 (Mirroring) - duplicates the same data on two or more disks. The goal
here is to provide fault tolerance. If one disk fails, the system can still
operate with the remaining disk(s) containing an exact copy of the data.
🗹 EFS.
EFS is a managed NFS (network file system) that can be mounted on hundreds of
EC2 instances. EFS works with Linux (POSIX file system) EC2 instances in multi-AZ. It
uses the NFSv4.1 protocol.
EFS can handle 1000s of concurrent NFS clients. Grows to petabyte-scale
network file system automatically.
Performance Classes
Performance Mode
Throughput Mode
Bursting Throughput - throughput scales with the size of the file system.
Suitable for most file systems, offering a balance between cost and
performance.
Provisioned Throughput - allows us to specify the desired throughput
level, which can exceed what is available via the burst mode. Best for
applications requiring a consistent and predictable level of throughput.
Storage Classes
Infrequent Access (IA) - for files that are accessed less frequently but
require the same durability and availability as the EFS Standard class. It has
lower price to store, but has retrieval fee.
Archive - for rarely accessed data (a few times each year). It is 50% cheaper.
We can mount our EFS file systems on our on-premises data center servers
when connected to our VPC with DX or VPN.
High Availability & Scalability
Vertical Scalability - increasing the size of the instance (scale up/down).
Horizontal Scalability - increasing the number of instances/systems for our application
(scale out/in). Horizontal scaling implies distributed systems (very common for web
applications).
High Availability means running our application or system in at least 2 AZs. High
availability usually goes hand in hand with horizontal scaling. The main goal of
high availability is to survive a data center loss (disaster).
Scalability vs Elasticity
🗹 ELB.
Load balancers are servers that forward internet traffic to multiple servers (EC2
Instances) downstream.
ELB is a managed load balancer. AWS guarantees that it will be working, takes
care of upgrades, maintenance and high availability. ELB captures the logs and
stores them in the S3 bucket that we specify as compressed files. We can
disable access logging at any time.
Health checks are crucial for Load Balancers. They enable the load balancer to
know if the instances it forwards traffic to are available to reply to requests. The
health check is done on a port and a route (/health is common).
Some load balancers can be set up as internal (private) or external (public) ELBs.
🗹 Application Load Balancer (ALB).
ALB operates at the application layer (Layer 7) of the OSI model, which enables it
to make routing decisions based on content. It supports redirects (for example
from HTTP to HTTPS).
ALB supports target groups, which are logical groups of resources. Each target
group can have different health checks and configurations, allowing for fine-
grained control. ALB can route to multiple target groups and health checks are at
the target group level.
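A sketch of creating such a target group with boto3, assuming a placeholder VPC ID and the /health route mentioned earlier.

import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_target_group(
    Name="app-tg",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",   # placeholder VPC
    TargetType="instance",
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/health",       # route the ALB probes on each target
    HealthCheckIntervalSeconds=30,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=2,
    Matcher={"HttpCode": "200-299"},
)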
Slow Start Mode - allows us to add new targets without overwhelming them with
a flood of requests.
Target Groups:
Lambda functions.
ALBs are a great fit for microservices & container-based applications. They have
a port mapping feature to redirect to a dynamic port in ECS.
Routing Features
The application servers don’t see the IP of the client directly. The true IP of the
client is inserted in the X-Forwarded-For header.
🗹 Network Load Balancer (NLB).
NLB operates at the transport layer (Layer 4). It can handle high volumes of traffic
with ultra-low latency, making it suitable for applications that require extreme
performance, low latency, and TCP/UDP-level traffic handling. Network Load
balancer doesn't have Weighted Target Groups.
NLB has one static IP per AZ and supports assigning Elastic IP (helpful for
whitelisting specific IP).
Target Groups:
EC2 Instances.
ALB.
If we specify targets using an instance ID, traffic is routed to instances using the
primary private IP address specified in the primary network interface for the
instance.
🗹 Gateway Load Balancer (GWLB).
GWLB combines the following functions:
Target Groups:
EC2 Instances.
🗹 ELB Sticky Sessions (Session Affinity).
Enabling stickiness may bring imbalance to the load over the backend EC2
instances.
For ALB the cookie is used for stickiness. There are two types of cookies:
Application-based Cookies
Custom Cookie - it is generated by the target. Can include any custom
attributes required by the application. Cookie name must be specified
individually for each target group (cannot use names AWSALB,
AWSALBAPP or AWSALBTG).
NLB: It is disabled by default and can be enabled via the AWS Management
Console, CLI, or API. We need to pay charges for inter AZ data if enabled.
SSL Certificate allows traffic between our clients and our load balancer to be
encrypted in transit. They have an expiration date and must be renewed. Public
SSL certificates are issued by CA (Certificate Authorities).
TLS (Transport Layer Security) - successor to SSL.
We can manage certificates using ACM and can create and upload our own
certificates alternatively. The load balancer uses an X.509 certificate (SSL/TLS
server certificate).
Server Name Indication (SNI) - solves the problem of loading multiple SSL
certificates onto one web server (to serve multiple websites). It is a newer
protocol and requires the client to indicate the hostname of the target server in
the initial SSL handshake. It only works with ALB, NLB & CloudFront.
ALB & NLB support multiple listeners with multiple SSL certificates and use SNI
to make it work.
Deregistration Delay ensures that in-flight requests are handled properly when
instances are deregistered or terminated. It stops sending new requests to the
EC2 instance which is de-registering.
🗹 ASG.
When EC2 Auto Scaling responds to a scale-out event, it launches one or more
instances. These instances start in the Pending state. If we added
an autoscaling:EC2_INSTANCE_LAUNCHING lifecycle hook to our ASG, the
instances move from the Pending state to the Pending:Wait state. After we
complete the lifecycle action, the instances enter the Pending:Proceed state.
When the instances are fully configured, they are attached to the ASG and they
enter the InService state.
When Amazon EC2 Auto Scaling responds to a scale-in event, it terminates one
or more instances. These instances are detached from the ASG and enter
the Terminating state. If we added
an autoscaling:EC2_INSTANCE_TERMINATING lifecycle hook to our ASG the
instances move from the Terminating state to the Terminating:Wait state. After
we complete the lifecycle action, the instances enter
the Terminating:Proceed state. When the instances are fully terminated, they
enter the Terminated state.
ASG Attributes
Launch Template
Min Size / Max Size / Initial Capacity
Scaling Policies
Simple/Step Scaling
Useful for applications with variable demand where we need more
granular control over scaling actions.
For example: when a CloudWatch alarm is triggered (e.g. average CPU > 70%), add
units; when another alarm is triggered (e.g. average CPU < 30%), then remove a unit.
Scheduled Scaling
Average Network In/Out.
Good practice is to use a ready-to-use AMI to reduce configuration time in order
to serve requests faster and reduce the cooldown period.
Dashboards monitoring.
Vertical and horizontal scaling capability.
RDS Storage Auto Scaling helps us increase storage on our RDS instance
dynamically. When RDS detects we are running out of free database storage, it
scales automatically. We have to set a maximum storage threshold. It is useful for
applications with unpredictable workloads.
🗹 RDS Deployments.
Read Replicas
Read Replicas are a feature that allows us to create one or more read-only
copies (max 15) of our primary database instance. They can be within AZ, cross
AZ or cross region. Applications must update the connection string to leverage
read replicas.
Replication is async, so reads are eventually consistent. Replicas can be
promoted to their own database.
Read replicas are used only for SELECT statements in queries.
In AWS there is a network cost when data goes from one AZ to another. For RDS
Read Replicas within the same region, we don’t pay that fee.
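A sketch of creating one cross-AZ read replica with boto3 (placeholder identifiers); the application would then point its read-only connection string at the replica's endpoint.

import boto3

rds = boto3.client("rds")

replica = rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-replica-1",   # placeholder replica name
    SourceDBInstanceIdentifier="mydb",       # placeholder primary instance
    DBInstanceClass="db.t3.medium",
    AvailabilityZone="us-east-1b",           # different AZ, same region: no transfer fee
)

# SELECT queries go to the replica's own endpoint once it becomes available.
print(replica["DBInstance"]["DBInstanceIdentifier"])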
Multi-AZ
Multi-AZ deployments are designed to enhance the availability and durability of
our database instances. This replication is sync. It is used for failover in case of
loss of AZ, loss of network, or instance or storage failure. It is not used for scaling.
Read replicas can also be set up as Multi-AZ for DR.
A snapshot of the database is taken, then a new standby database is restored from the
snapshot in a new AZ. After that, synchronization is established between the two databases.
Multi-AZ DB Cluster Deployments - high availability and durability by replicating
data across multiple AZs and supporting automatic failover at both the instance
and cluster levels.
RDS Custom is a feature that provides a way to use a custom database engine.
It’s used for Oracle and Microsoft SQL Server with OS and database
customization (access to the underlying database and OS).
We can configure settings, install patches, access underlying EC2 using SSH or
SSM.
🗹 Aurora.
Aurora stores 6 copies of our data across 3 AZ (4 copies out of 6 are needed for
writes and 3 copies out of 6 are needed for reads). It has self healing with peer-
to-peer replication.
Only one Aurora instance takes writes (master). Automated failover for master is
less than 30s. We can have up to 15 Aurora read replicas for reads and they all
support CRR.
If we have an Aurora Replica in the same or a different AZ, when failing over,
Aurora flips the canonical name record (CNAME) for our DB Instance to point at
the healthy replica, which in turn is promoted to become the new primary. Start-
to-finish failover typically completes within 30 seconds.
Aurora Endpoints
Writer Endpoint - used for all write operations to the Aurora database. This
endpoint directs traffic to the primary instance within the Aurora cluster. In the
event of a failure of the master instance, Aurora automatically promotes one of
the replicas to be the new primary instance.
🗹 Aurora Advanced Concepts.
Aurora Serverless - automated database instantiation and auto-scaling based
on actual usage (no management). No capacity planning is needed for it and we
pay per second (can be more cost-effective). Can be used for infrequent or
unpredictable workloads.
Aurora Global - enables a single Aurora database to span multiple AWS regions,
providing low-latency global reads, fast disaster recovery, and cross-region data
replication.
There is 1 primary region for read and write and up to 5 secondary regions for
read-only. We can set up to 16 read replicas per secondary region. Typical
cross-region replication takes less than 1s.
Aurora Machine Learning - enables us to add ML-based predictions to our
applications via SQL. We don’t need to have ML experience. It is used for fraud
detection, ads targeting, sentiment analysis, product recommendations…
🗹 RDS & Aurora Backup & Monitoring.
RDS Performance Insights is a feature that helps us monitor and analyze the
performance of our RDS databases. It offers a higher-level view of how our
database queries and workloads are performing over time.
RDS Backups
Automated backups
With automated backups we have the ability to restore to any point in time (from
the oldest backup to 5 minutes ago).
Manual DB Snapshots
They are manually triggered by the user. Retention of backups can be set for
as long as needed.
Aurora Backups
Restore MySQL RDS from S3:
1. Create a backup of our on-premises database.
2. Store it on S3.
3. Restore the backup file onto a new RDS instance running MySQL.
Restore MySQL Aurora from S3:
1. Create a backup of our on-premises database using Percona XtraBackup.
2. Store the backup file on S3.
3. Restore the backup file onto a new Aurora cluster running MySQL.
It uses copy-on-write protocol - initially, the new database cluster uses the same
data volume as the original database cluster (no copying needed). When
updates are made to the new database cluster data, then additional storage is
allocated and data is copied to be separated.
At-rest encryption - database master & replicas encryption using KMS (must
be defined at launch time). If the master is not encrypted, the read replicas
cannot be encrypted. To encrypt an unencrypted database we should go
through a database snapshot & restore it as encrypted.
In-flight encryption - they are TLS-ready by default and use the TLS root
certificates client-side.
Audit Logs - can be enabled and sent to CloudWatch Logs for longer
retention.
🗹 RDS Proxy.
RDS Proxy is a fully managed database proxy that allows apps to pool and share
database connections established with the database. No code changes are required
for most apps to integrate with RDS Proxy.
It improves database efficiency by reducing the stress on database resources
and minimizes open connections and timeouts. It reduces RDS & Aurora failover
time by up to 66%.
RDS Proxy enforces IAM authentication for the database and securely stores
credentials in AWS Secrets Manager.
🗹 ElastiCache.
Redis - in-memory data structure store that supports data types like strings,
hashes, lists…
minimal operational overhead.
ElastiCache helps reduce load off databases for read intensive workloads. Using
ElastiCache involves heavy application code changes.
Redis vs Memcached
Components:
Application
Cache must have an invalidation strategy to make sure only the most current
data is used in there.
Components:
Application Servers
User logs into any of the application instances. The application writes the
session data into ElastiCache. Then the user hits another instance of our
application and instance retrieves the data and the user is already logged in.
Shard in Redis is a subset of the dataset. In sharding, data is split across multiple
Redis nodes, which helps with horizontal scaling. A shard can have a primary
node and multiple replica nodes to provide redundancy and availability.
Redis supports a persistence mode called Append-Only File (AOF). Every write
operation is logged in an append-only file, ensuring that no data is lost in case of
failure. AOF provides data durability but may slightly degrade performance
because of disk writes. Turning on AOF ensures local data persistence but may
not help with regional failures. To provide resilience on region level we should
use Multi-AZ replication groups.
Cache Security
Redis AUTH - we can set a password/token when we create a Redis cluster. This
is an extra level of security for our cache (on top of security groups). It supports
SSL in-flight encryption.
Lazy Loading (Cache-Aside) - all the read data is cached, data can become
stale in cache.
Write Through - adds or updates data in the cache written to a database (no
stale data).
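A sketch of both strategies in application code, assuming the redis-py client, a placeholder ElastiCache endpoint, and stub database helpers that stand in for real queries.

import json
import redis

cache = redis.Redis(host="my-cluster.xxxxxx.use1.cache.amazonaws.com", port=6379)

def load_user_from_db(user_id):
    # Stub standing in for a real database query.
    return {"id": user_id, "name": "example"}

def save_user_to_db(user_id, user):
    # Stub standing in for a real database write.
    pass

def get_user(user_id):
    """Lazy loading (cache-aside): only read data gets cached, may become stale."""
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)                    # cache hit
    user = load_user_from_db(user_id)                # cache miss -> read from DB
    cache.setex(f"user:{user_id}", 3600, json.dumps(user))  # TTL as invalidation strategy
    return user

def update_user(user_id, user):
    """Write through: update the cache whenever the database is written (no stale data)."""
    save_user_to_db(user_id, user)
    cache.setex(f"user:{user_id}", 3600, json.dumps(user))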
Gaming leaderboards are computationally complex and Redis sorted sets
guarantee both uniqueness and element ordering to solve this problem. Each
time a new element is added, it is ranked in real time, then added in correct
order.
Route 53
🗹 DNS.
DNS Terminologies
🗹 Route 53.
Route 53 is a highly available, scalable, fully managed and authoritative (we can
update the DNS records) DNS. It is also a Domain Registrar and we have ability
to check the health of our resources.
Records define how we want to route traffic for a domain. Route 53 supports the
following DNS record types: A, AAAA, CNAME, NS, CAA, DS, MX, NAPTR, PTR,
SOA, TXT, SPF, SRV.
Domain/subdomain Name (e.g. example.com)
TTL (Time to Live) - amount of time the record is cached at DNS Resolvers.
Record Types
NS - specifies the authoritative DNS servers for the domain. For example,
example.com might have NS records pointing to ns1.example.com and
ns2.example.com.
MX - specifies the mail server responsible for receiving emails for the
domain.
Hosted Zones - a container for records that define how to route traffic to a
domain and its subdomains. We pay $0.50 per month per hosted zone.
Public Hosted Zones - contain records that specify how to route traffic on
the internet.
Private Hosted Zones - contain records that specify how we route traffic
within one or more VPCs.
Routing Traffic to a Website Hosted in S3.
The S3 bucket must have the same name as our domain or subdomain. Also
we must have a registered domain name. We can use Route 53 as our
domain registrar, or we can use a different registrar.
🗹 Route 53 TTL.
TTL is a mechanism that specifies the duration that DNS resolvers should cache
the information for a DNS record before querying the authoritative DNS server
(Route 53) for updated information.
Short TTLs (e.g., 60 seconds) - for DNS records that may change frequently,
such as records for load balancers or failover records. This allows changes to
propagate quickly.
Long TTLs (e.g., 86400 seconds for 24 hours) - for stable records that rarely
change. This reduces the number of DNS queries and can improve performance
and reduce costs.
Except for Alias records, TTL is mandatory for each DNS record.
🗹 CNAME & Alias Records.
AWS Resources expose an AWS hostname and if we want we can map that
hostname to a domain we own.
Alias - specific to AWS Route 53 and provide a more powerful and flexible option
compared to CNAME records. They can map domain names to AWS resources
(app.mydomain.com ⇒ test.amazonaws.com). It works with root domain (apex
domain) and non root domain. It is free of charge.
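A sketch of creating an alias A record at the zone apex with boto3; the hosted zone ID, the ALB DNS name and the ALB's canonical hosted zone ID are placeholders.

import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",        # placeholder: our hosted zone
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "mydomain.com",        # alias works on the apex domain
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": "Z0987654321EXAMPLE",  # placeholder: the ALB's zone ID
                    "DNSName": "my-alb-123456.us-east-1.elb.amazonaws.com",
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    },
)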
Alias Record Targets
HTTP Health Checks are only for public resources. With health checks we get
automated DNS failover. Health checks can be integrated with CloudWatch
metrics.
About 15 global health checkers will check the endpoint health. If more than
18% of health checkers report the endpoint is healthy, Route 53 considers it
Healthy.
They pass only when the endpoint responds with the 2xx and 3xx status
codes. Health checks can be setup to pass/fail based on the text in the first
5120 bytes of the response.
Calculated Health Checks
It combines the results of multiple health checks into a single health check.
We can use OR, AND or NOT operators and can specify how many of the
health checks need to pass to make the parent pass.
Health Checks that monitor CloudWatch Alarms (full control, for private
resources).
Because health checkers are outside the VPC, they can’t access private
endpoints. We can create a CloudWatch Metric and associate a CloudWatch
Alarm, then create a health check that checks the alarm itself.
🗹 Routing Policy - Simple Routing.
It routes traffic to a single resource. It can specify multiple values in the same
record. If multiple values are returned, a random one is chosen by the client.
When Alias is enabled, we can specify only one AWS resource. It cannot be
associated with Health Checks.
🗹 Routing Policy - Weighted Routing.
DNS records must have the same name and type. This routing policy can be
associated with Health Checks.
If all records have weight of 0, then all records will be returned equally.
Use cases: load balancing between regions, testing new application versions …
🗹 Routing Policy - Latency-based Routing.
It redirects to the resource that has the least latency close to us. Latency is
based on traffic between users and AWS regions.
🗹 Routing Policy - Failover Routing.
This routing policy is used to ensure high availability for applications by directing
traffic to a backup site if the primary site becomes unavailable.
🗹 Routing Policy - Geolocation Routing.
This routing is based on the user’s location (continent, country or US state).
We should create a “Default” record (in case there is no match on location). Also,
it can be associated with Health Checks.
🗹 Routing Policy - Geoproximity Routing.
It routes traffic to our resources based on the geographic location of users and
resources. We have ability to shift more traffic to resources based on the defined
bias. Resources can be AWS resources (specified in aws region) and non-AWS
resources (we need to specify latitude and longitude).
To change the size of the geographic region:
We must use Route 53 Traffic Flow to use this feature.
🗹 Routing Policy - Multi-Value.
We buy or register our domain name with a Domain Registrar typically by paying
annual charges. The Domain Registrar usually provides us with a DNS service to
manage our DNS records, but we can use another DNS service to manage our
DNS records.
If we buy a domain on a third party registrar, we can still use Route 53 as the
DNS Service provider. First create a Hosted Zone in Route 53, then update NS
Records on third party website to use Route 53 Name Servers.
When we launch an EC2 instance into a default VPC, AWS provides it with public
and private DNS hostnames that correspond to the public IPv4 and private IPv4
addresses for the instance.
When we launch an instance into a non-default VPC, AWS provides the instance
with a private DNS hostname only. New instances will only be provided with a
public DNS hostname depending on these two DNS attributes: the DNS
resolution and DNS hostnames that we have specified for our VPC and if our
instance has a public IPv4 address.
Section II
Amazon S3
🗹 S3 Basics.
S3 use cases: backup and storage, disaster recovery, archive, hybrid cloud
storage, application hosting, media hosting, data lakes, software delivery, static
website …
Objects have a key. The key is FULL path (for example s3://my-
bucket/folder1/folder2/file.txt). The key is composed of prefix+object name.
There is no concept of “directories” within buckets, just keys with very long
names that contain slashes.
Object values are the content of the body. Max object size is 5TB; if we upload more
than 5GB, we must use “multi-part upload”. Objects can also have metadata (a list of
text key/value pairs), tags (unicode key/value pairs, useful for security/lifecycle)
and a version ID.
🗹 S3 Security.
User-Based
IAM Policies - which API calls should be allowed for a specific user from IAM.
Resource-Based
Bucket Policies - bucket wide rules from the S3 console (allows cross
account).
Object Access Control List (ACL) - finer grain (can be disabled).
There are bucket settings for block public access. These settings were created
to prevent company data leaks. If we know our bucket should never be public,
we should leave these on (can be set at the account level).
Versioning for files can be enabled in Amazon S3. It is enabled at the bucket level.
Overwriting the same key will create a new version.
It’s best practice to version our buckets. It protects us against unintended
deletes (can restore a version) and we can roll back to previous version easily.
Any file that is not versioned prior to enabling versioning will have “null” version.
Suspending versioning does not delete the previous versions.
🗹 S3 Storage Classes.
S3 Standard - used for frequently accessed data. Has low latency and high throughput.
Sustains 2 concurrent facility failures. It’s used for big data analytics, mobile
and gaming applications, content distribution …
IA storage classes are used for data that is less frequently accessed, but
requires rapid access when needed. They have lower cost than S3 Standard.
S3 Standard-IA
It’s used for disaster recovery and backups. Has 99.9% availability.
S3 One Zone-IA
It’s used for storing secondary backup copies of on-premises data, or data we
can recreate. Has high durability in a single AZ; if the AZ is destroyed, data is lost.
S3 Glacier storage classes are low-cost object storage meant for archiving and
backup. Pricing consist of price for storage + object retrieval cost.
S3 Glacier Instant Retrieval - has millisecond retrieval time, great for data accessed once a quarter.
Minimum storage duration is 90 days.
Provisioned capacity ensures that at least three expedited retrievals can be performed every five
minutes and provides up to 150 MB/s of retrieval throughput.
Standard - 12 hours.
Bulk - 48 hours.
S3 Intelligent Tiering
Deep Archive Access tier (optional) - configurable from 180 days to 700+
days.
🗹 IAM Access Analyzer for S3.
IAM Access Analyzer ensures that only intended people have access to our S3
buckets (for example: publicly accessible bucket or bucket shared with other
AWS account).
🗹 S3 Lifecycle Rules.
Lifecycle Rules in S3 are a set of actions that we can configure to manage the
lifecycle of objects in our bucket. These rules enable us to automatically
transition objects to different storage classes or delete them after a specified
period. Lifecycle rules help us reduce costs and simplify storage management.
Transition Actions - used to configure objects to transition to another storage
class.
Expiration Actions - used to configure objects to expire (delete) after some time.
Rules can be created for a certain prefix or for certain objects tags.
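A sketch of one combined rule with boto3 (placeholder bucket and prefix): transition to Standard-IA after 30 days, to Glacier after 90 days, and expire after a year.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},          # rule scoped to a prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},            # delete current versions after a year
            "NoncurrentVersionExpiration": {"NoncurrentDays": 60},
        }]
    },
)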
Our application on EC2 creates image thumbnails after profile photos are
uploaded to S3. These thumbnails can be easily recreated and only need to
be kept for 60 days. The source images should be able to be immediately
retrieved for these 60 days, and afterwards, the user can wait up to 6 hours.
How to design this?
A rule in our company states that we should be able to recover our deleted
S3 objects immediately for 30 days, although this may happen rarely. After
this time, and for up to 365 days, deleted objects should be recoverable
within 48 hours.
🗹 S3 Requester Pays.
In general, buckets owners pay for all S3 storage and data transfer costs
associated with their buckets. With Requester Pays buckets, instead of the
bucket owner, the requester pays the cost of the request and the data download
from the bucket.
🗹 S3 Event Notifications.
S3 event notifications typically deliver events in seconds but can sometimes take
a minute or longer.
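A sketch that routes new .jpg uploads under a prefix to an SQS queue with boto3 (placeholder bucket and queue ARN); the queue's access policy must also allow S3 to send messages.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": "arn:aws:sqs:us-east-1:123456789012:thumbnail-queue",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {"Name": "prefix", "Value": "uploads/"},
                        {"Name": "suffix", "Value": ".jpg"},
                    ]
                }
            },
        }]
    },
)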
🗹 S3 Performance.
Performance Boosts
Multi-Part Upload - recommended for files > 100MB, must be used for files
> 5GB. It can help parallelize uploads.
S3 Select & Glacier Select retrieves less data using SQL by performing server-
side filtering. It can filter by rows and columns. We have less network transfer
and less CPU cost on client-side. We can perform S3 Select to query only the
necessary data inside the CSV files based on the bucket's name and the object's
key.
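A sketch of such a query with boto3, assuming a placeholder bucket and a CSV object with a header row.

import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="example-bucket",
    Key="data/orders.csv",
    ExpressionType="SQL",
    Expression="SELECT s.order_id, s.amount FROM S3Object s WHERE s.status = 'SHIPPED'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; only the filtered rows cross the network.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())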
🗹 S3 Batch Operations.
Use cases:
Modify ACLs
We can use S3 Inventory to get object list and use S3 Select to filter our objects.
🗹 S3 Storage Lens.
With Storage Lens we can analyze and optimize storage across an entire AWS
Organization or AWS Accounts. We can discover anomalies, identify cost
efficiencies and apply data protection best practices across entire AWS
Organization.
It is possible to aggregate data for Organization, specific accounts, regions,
buckets or prefixes.
Default Dashboard - visualizes summarized insights and trends for both free
and advanced metrics. It shows Multi-Region and Multi-Account data. It is
preconfigured by S3 and cannot be deleted but can be disabled.
Detailed Status Code Metrics - provide insights for HTTP status codes.
Free Metrics - automatically available for all customers. Data is available for
queries for 14 days.
Advanced Metrics and Recommendations - paid metrics and features
(advanced metrics, CloudWatch publishing, prefix aggregation). Data is available
for queries for 15 months.
Amazon S3 Security
🗹 S3 Encryption.
SSE with S3-Managed Keys (SSE-S3) - it is enabled by default and
encrypts S3 objects using keys handled, managed and owned by AWS.
SSE with KMS Keys stored in AWS KMS (SSE-KMS) - leverages KMS to
manage encryption keys.
SSE with Customer-Provided Keys (SSE-C) - when we want to manage
our own encryption keys.
We must use HTTPS for this type of encryption. Encryption key must be
provided in HTTP headers, for every HTTP request made.
Client-Side Encryption
Encryption in Transit
Bucket Policies are evaluated before “Default Encryption”.
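The server-side modes above are chosen per upload. A minimal sketch with boto3 (placeholder bucket and KMS key alias), assuming SSE-S3 is the bucket default.

import boto3

s3 = boto3.client("s3")

# SSE-S3 (default): no extra encryption parameters needed.
s3.put_object(Bucket="example-bucket", Key="report-sse-s3.txt", Body=b"hello")

# SSE-KMS: reference a KMS key we control (placeholder alias).
s3.put_object(
    Bucket="example-bucket",
    Key="report-sse-kms.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-key",
)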
🗹 S3 CORS.
CORS - Web Browser based mechanism to allow requests to other origins while
visiting the main origin. The request won’t be fulfilled unless the other origin
allows the requests, using CORS Headers (Access-Control-Allow-Origin).
🗹 S3 MFA Delete.
MFA Delete is a feature designed to add an extra layer of security to protect our
S3 bucket's objects from accidental or malicious deletion. It requires users to
provide two or more forms of verification before they can perform deletion.
To use MFA Delete, Versioning must be enabled on the bucket and only the
bucket owner (root account) can enable/disable MFA Delete.
🗹 S3 Access Logs.
S3 Access Logs are used for audit purposes; we may want to log all access to S3
buckets. Any request made to S3, from any account, authorized or denied, will be
logged into another S3 bucket. Target logging bucket must be in the same AWS
region.
🗹 S3 Pre-Signed URLs.
We can generate pre-signed URLs using the S3 Console, AWS CLI or SDK. Users
given a pre-signed URL inherit the permissions of the user that generated the
URL for GET and PUT methods.
Use cases:
Allow only logged-in users to download a premium video from our S3 bucket.
Allow temporarily a user to upload a file to a precise location in our S3
bucket.
Pre-signed URLs allow users to interact directly with the cloud storage service
rather than routing through our application server. This can reduce latency.
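A sketch with boto3 (placeholder bucket and keys): one URL for a time-limited download and one for an upload to a precise location.

import boto3

s3 = boto3.client("s3")

# Download link for a premium video, valid for 1 hour.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "premium/video.mp4"},
    ExpiresIn=3600,
)

# Upload link to a precise location, valid for 10 minutes.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "example-bucket", "Key": "uploads/user-42/avatar.png"},
    ExpiresIn=600,
)

print(download_url)
print(upload_url)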
S3 Glacier Vault Lock lets us adopt a WORM (Write Once Read Many) model. We
need to create a Vault Lock Policy. It is helpful for compliance and data retention.
S3 Object Lock lets us adopt a WORM model and block an object version
deletion for a specified amount of time.
Retention mode - Compliance
Object versions can’t be overwritten or deleted by any user, including the root
user. Object retention modes can’t be changed and retention periods can’t
be shortened.
Retention mode - Governance
Most users can’t overwrite or delete an object version or alter its lock
settings. Some users have special permissions to change the retention or
delete the objects.
Retention Period - protect the object for a fixed period, it can be extended.
Legal Hold - protects the object indefinitely, independent from the retention period.
🗹 S3 Access Points.
Access Points simplify security management for S3 Buckets. Each Access Point
has its own DNS name (Internet Origin or VPC Origin) and access point policy
(similar to bucket policy).
We can define the access point to be accessible only from within the VPC. To
do that we must create a VPC Endpoint to access the Access Point and the VPC
Endpoint must allow access to the target bucket and Access Point.
S3 Multi-Region Access Points - provide a global endpoint that applications can
use to fulfill requests from S3 buckets located in multiple AWS Regions. We can
use Multi-Region Access Points to build multi-Region applications with the same
simple architecture used in a single Region, and then run those applications
anywhere in the world. They provide built-in network resilience with acceleration
of internet-based requests to Amazon S3. Application requests made to a Multi-
Region Access Point global endpoint use AWS Global Accelerator.
🗹 S3 Object Lambda.
We can use AWS Lambda Functions to change the object before it is retrieved by
the caller application. Only one S3 bucket is needed, on top of which we create
S3 Access Point and S3 Object Lambda Access Points.
Provide access to multiple restricted files (e.g. all of the files for a video in
HLS format or all of the files in the subscribers' area of a website).
CloudFront Origins
S3 bucket
We can restrict who can access our distribution based on geographic location.
The country is determined using a third party Geo-IP database.
Allowlist - allow our users to access our content only if they are in one of the
countries on a list of approved countries.
Blocklist - prevent our users from accessing our content if they are in one of
the countries on a list of banned countries.
In case we update the back-end origin, CloudFront doesn’t know about it and will
only serve the refreshed content after the TTL has expired. We can force an entire
or partial cache refresh by performing a CloudFront Invalidation. We can
invalidate all files or a specific path.
AWS Snow Family are highly-secure, portable devices which collect and
process data at the edge and migrate data into and out of AWS. If it takes more
than a week to transfer over the network, we should use Snowball devices.
This is a physical data transport solution. We can move TBs or PBs of data in or
out of AWS. It’s an alternative to moving data over the network and paying
network fees; instead we pay per data transfer job. Typical use cases are
large data cloud migrations and disaster recovery.
Snowcone
It’s a small, portable, rugged & secure device. It withstands harsh environments.
Typically used for edge computing, storage and data transfer. When
Snowball does not fit we should use Snowcone (we must provide our own
battery and cables). It can be sent back to AWS offline or connect it to
internet and use AWS DataSync to send data.
Snowmobile
It’s a truck which can transfer exabytes of data. Each Snowmobile has 100PB
capacity. If we transfer more than 10PB, it is better than Snowball.
Snowball Edge & Snowcone can also be used in edge computing (processing
data while it’s being created on an edge location). Use cases of edge computing:
preprocessing data, ML, transcoding media streams …
Snowcone & Snowcone SSD ⇒ 2CPUs, 4GB of RAM, can use USB-C.
All can run EC2 Instances & AWS Lambda functions (using AWS IoT Greengrass).
There are long-term deployment options - 1 and 3 years discounted pricing.
Snowcone is better for IoT.
Amazon FSx is a fully managed service that lets us launch third party high-
performance file systems on AWS.
FSx for Windows File Server can be joined to an existing on-premises Active
Directory. This allows the file system to continue using the same AD groups and
policies to restrict access to the shares, folders, and files, just as it did on-
premises.
AWS Storage Gateway is a bridge between on-premises data and cloud data in
S3. This hybrid storage service allows on-premises environments to seamlessly use
the AWS Cloud (disaster recovery, backup & restore, tiered storage).
Types of Storage Gateway:
To access our buckets we must create an IAM role for each File Gateway. Also if
we use SMB we can use its integration with Active Directory for user
authentication.
FSx File Gateway - offers native access to Amazon FSx for Windows File
Server. It has local cache for frequently accessed data.
Use cases: sharing files, public datasets, CRM, ERP (Enterprise Resource
Planing).
DataSync is used to automate and accelerate the replication of data and move
large amount of data to and from on-premises to AWS (need agent) or AWS to
AWS. It can synchronize to S3, EFS and FSx. Replication tasks can be scheduled
hourly, daily or weekly.
File permissions and metadata are preserved (NFS POSIX, SMB…). One agent
task can use 10 Gbps and we can setup a bandwidth limit.
Decoupling Applications
🗹 AWS SQS - Standard Queues.
SQS is a fully managed message queuing service that enables us to decouple
and scale microservices, distributed systems, and serverless applications.
There is no limit on how many messages can be in the queue. Default retention of
messages is 4 days, with a maximum of 14 days. Messages in the SQS queue will
continue to exist even after the EC2 instance has processed it, until we delete
that message. Consumers share the work to read messages and scale
horizontally. It can have duplicate messages.
SQS Security
Encryption - has in-flight encryption using HTTPS, at-rest encryption using KMS
and we can do client-side encryption.
Access Controls - we can configure IAM policies to regulate access to the SQS
API.
SQS Access Policies - similar to S3 bucket policies. Useful for cross-account
access to SQS queues. They allow other services to write to an SQS queue.
If visibility timeout is high (hours), and consumer crashes, re-processing will take
time and if visibility timeout is too low (seconds), we may get duplicates.
Long Polling - when a consumer requests messages from the queue, it can
optionally wait for messages to arrive if there are none in the queue.
It can be enabled at the queue level or at the API level using WaitTimeSeconds.
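A sketch of a long-polling consumer loop with boto3 (placeholder queue URL); each received message is deleted only after it has been processed.

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,        # long polling: wait up to 20s for messages to arrive
    )
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        # Delete only after successful processing, before the visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])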
Topic Publish - using SDK. First create topic, then create a subscription and
publish to the topic.
SNS Message Filtering - assign a filter policy to the topic subscription, and the
subscriber will only receive a message that they are interested in.
SNS has FIFO Topic (ordering of messages in the topic). It has similar features as
SQS FIFO:
In the Fan Out pattern, we use SNS to broadcast messages to multiple SQS
queues. Each queue can then be processed independently by separate
consumers or applications. Need to configure SQS queue access policy to allow
SNS to write.
It’s fully decoupled and we don’t have data loss. We have the ability to add more SQS
subscribers over time. It works with SQS queues in other regions.
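A sketch of the fan-out wiring with boto3 (placeholder names and ARNs); the access-policy step on each queue is noted but not shown.

import json
import boto3

sns = boto3.client("sns")
topic_arn = sns.create_topic(Name="orders-topic")["TopicArn"]

# Each SQS queue subscribes to the topic (placeholder queue ARNs).
for queue_arn in [
    "arn:aws:sqs:us-east-1:123456789012:fulfillment-queue",
    "arn:aws:sqs:us-east-1:123456789012:analytics-queue",
]:
    # Note: each queue's access policy must allow this topic to SendMessage.
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# One publish is delivered to every subscribed queue independently.
sns.publish(TopicArn=topic_arn, Message=json.dumps({"orderId": 42, "status": "CREATED"}))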
For the same combination of event type and prefix we can only have one S3
Event rule. If we want to send the same S3 event to many SQS queues we
should use fan out.
SNS can send to Kinesis and we can have the following solutions
architecture:
Kinesis Data Streams (KDS) - collect and process large streams of data
records in real time.
Kinesis Data Firehose - loads streaming data into data lakes, data stores,
and analytics services.
Kinesis Data Analytics - analyzes streaming data using SQL or Apache Flink.
Key Concepts
Shard - the fundamental unit of capacity within a stream. A stream can have
multiple shards, allowing for scaling up.
Producers - clients or services that write data to the KDS (AWS SDK, KPL,
Kinesis Agent).
KDS Security
We can control access/authorization using IAM policies. Encryption in flight is
done using HTTPS endpoints and at rest using KMS. Also we can implement
encryption/decryption of data on client side.
We can use VPC Endpoints for Kinesis to access within VPC. To monitor API calls
we use CloudTrail.
KDF is a fully managed service that makes it easy to reliably load streaming data
into data lakes, data stores, and analytics services. We pay for data going
through Firehose.
It provides real-time analytics on KDS & KDF using SQL. We can add reference
data from S3 to enrich streaming data. It is fully managed, no servers to
provision and has automatic scaling.
Output:
To ensure ordering for specific types of data in KDS, we can use partition keys.
Events with the same partition key are routed to the same shard, and thus will be read in order.
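A sketch of producing records with boto3 (placeholder stream name); using the device ID as partition key keeps each device's events in order within one shard.

import json
import boto3

kinesis = boto3.client("kinesis")

def send_reading(device_id, temperature):
    kinesis.put_record(
        StreamName="sensor-stream",               # placeholder stream
        Data=json.dumps({"device": device_id, "temp": temperature}).encode(),
        PartitionKey=device_id,   # same key -> same shard -> ordered per device
    )

send_reading("device-001", 21.5)
send_reading("device-001", 21.7)   # lands after the first record for this device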
For SQS FIFO, if we don’t use a Group ID, messages are consumed in the order
they are sent, with only one consumer.
🗹 AWS SWF.
Task - unit of work within an activity or a workflow. Tasks are sent to worker
processes that perform the actual work.
🗹 Amazon MQ.
Containers on AWS
🗹 ECS & Fargate.
ECS [EC2 Launch Type] - we must provision & maintain the infrastructure (EC2
instances). Each EC2 instance must run the ECS Agent to register in the ECS
Cluster. AWS takes care of starting/stopping containers.
EC2 Instance Profile (EC2 Launch Type only) - used by ECS agent to make
API calls to ECS service. We can send container logs to CloudWatch Logs,
pull Docker image from ECR, reference sensitive data in Secrets Manager or
SSM Parameter Store.
ECS Task Role - allows each task to have a specific role. We can use
different roles for the different ECS services we run.
When an event matching the rule occurs, EventBridge triggers ECS task
or other service to process that event.
For example, we can run a scheduled ECS task every day at midnight to
perform batch processing.
ECS emits task state change events (e.g. task stopped, task started)
which are sent to EventBridge.
We can attach data volumes to our EKS cluster. We need to specify StorageClass
manifest on our EKS cluster. It leverages a Container Storage Interface (CSI)
compliant driver.
We have support for EBS, EFS (works with Fargate), FSx for Lustre, FSx for
NetApp ONTAP.
Amazon EKS Connector is a feature that allows us to connect and manage
Kubernetes clusters running outside of AWS (whether on-premises or in other
cloud environments) within the Amazon EKS console.
Amazon EKS Anywhere is a deployment option for Amazon EKS that enables us
to create and operate Kubernetes clusters on our own infrastructure, such as on-
premises data centers or other cloud environments.
EKS Node Types
Managed Node Groups - creates and manages Nodes (EC2 instances) for
us. Nodes are part of an ASG managed by EKS. It supports on-demand or
spot instances.
🗹 EKS Security.
EKS uses IAM to provide authentication to our Kubernetes cluster, but it still
relies on native Kubernetes Role-Based Access Control (RBAC) for authorization.
This means that IAM is only used for the authentication of valid IAM entities. All
permissions for interacting with our Amazon EKS cluster’s Kubernetes API are
managed through the native Kubernetes RBAC system.
Access to our cluster using IAM entities is enabled by the IAM Authenticator for
Kubernetes, which runs on the EKS control plane. The authenticator gets its
configuration information from the aws-auth ConfigMap.
App Runner is a fully managed service that makes it easy to deploy web
applications and APIs at scale. It automatically builds and deploys the web app.
We don’t need infrastructure experience (we start with our source code or container
image).
We can connect to database, cache and message queue services. It has
automatic scaling, load balancer and encryption.
AWS Serverless
🗹 AWS Lambda.
AWS Lambda lets us run virtual functions on-demand (serverless). It has short
execution and automated scaling. Functions get invoked by AWS only when
needed (Event-Driven). Lambda functions are stateless, meaning each
invocation is independent of previous invocations. State can be managed via
external storage like DynamoDB or S3.
It is integrated with the whole AWS suite of services and many programming
languages. We can easily monitor AWS Lambda through AWS CloudWatch.
We pay per request and compute time. 1 million requests per month and 400
000 GB-seconds of compute time per month are free.
Lambda can be used for serverless CRON job (trigger an event at specific time).
Docker images can be run using Lambda. The container image must implement
the Lambda Runtime API. We can’t run Windows containers (only Linux).
AWS Lambda URLs are a feature that allows us to create HTTP endpoints for our
Lambda functions. This makes it easier to build serverless applications that
interact with web clients or other HTTP-based services.
Lambda Limits
☐ Edge Functions.
Origin Response - after CloudFront receives the response from the origin.
Author our functions in one AWS Region (us-east-1), then CloudFront replicates
to its locations.
We must define the VPC ID, the subnets and security groups. Lambda will create
an ENI in our subnets.
We must allow outbound traffic to our Lambda function from within our DB
instance (Public, NAT Gateway, VPC Endpoints). DB instance must have the
required permissions to invoke the Lambda.
🗹 DynamoDB.
DynamoDB is fully managed and highly available NoSQL schemaless database
with replication across 3 AZ. It’s distributed serverless database which can scale
to massive workloads. DynamoDB is fast and consistent in performance (it can
handle millions of requests per second, trillions of rows and 100s of TB of
storage). It’s low cost and has auto scaling capabilities. It can rapidly evolve
schemas. It has single-digit millisecond latency (low latency retrieval).
The more distinct partition key values that our workload accesses, the more
those requests will be spread across the partitioned space.
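A sketch with boto3 and a hypothetical Users table keyed on user_id; spreading requests across many distinct user_id values spreads the load across partitions.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")   # hypothetical table with partition key "user_id"

# Write an item (schemaless: attributes besides the key can vary per item).
table.put_item(Item={"user_id": "u-42", "name": "Ada", "plan": "pro"})

# Single-digit millisecond read by partition key.
resp = table.get_item(Key={"user_id": "u-42"})
print(resp.get("Item"))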
It is used for reacting to changes in real time, real time usage analytics,
implementing cross-region replication, invoke AWS Lambda on changes to our
DynamoDB table…
KDS - Process using AWS Lambda, KDA, KDF, AWS Glue Streaming ETL…
It has 1 year retention and supports a high number of consumers.
Continuous backups using PITR - optionally enabled for the last 35 days.
PITR to any time within the backup window. The recovery process creates a
new table.
Export to S3 (must enable PITR) - works for any point of time in the last 35
days. It doesn’t affect the read capacity of our table. We can perform data
analysis on top of DynamoDB and ETL on top of S3 data before importing
back to DynamoDB.
Amazon API Gateway is a fully managed service for developers to easily create,
publish, maintain, monitor and secure APIs. It is serverless and scalable,
supports RESTful APIs and WebSocket APIs. Has support for security, user
authentication, API keys… We can validate requests and responses (also cache
API responses). We pay only for the API calls we receive and the amount of data
transferred out.
We can monitor API usage, latency, and error rates using CloudWatch.
Lambda Function - can invoke AWS Lambda functions within our account
and start AWS Step Functions state machines.
HTTP - can make HTTP requests to endpoints hosted on AWS services such
as Elastic Beanstalk and EC2, as well as to non-AWS HTTP endpoints
accessible over the public internet.
AWS Services - can be directly integrated with other AWS services. For
example, we can configure an API method to send data directly to Amazon
Kinesis.
Regional - for clients within the same region. Could be manually combined with
CloudFront (more control over the caching strategies and the distribution).
Private - can only be accessed from our VPC using an interface VPC
endpoint (ENI). We need to use a resource policy to define access.
User authentication through IAM roles (for internal apps), Cognito (for external
users) or custom authorizer (our own logic).
Custom Domain Names allows us to use our own domain names for our APIs,
providing a more professional and user-friendly experience. We can map
multiple APIs to different paths on a single domain using base path mappings.
We can break down applications into smaller, loosely coupled services that
are independently deployable and scalable without managing servers. It
scales automatically and we only pay for what we use.
It can be integrated with EC2, ECS, on-premises servers, API Gateway, SQS, etc …
We can use it for implementing a human approval feature.
AWS Step Functions' Map State is a feature designed to iterate over a collection
of items and process each one in parallel or sequentially, depending on how we
configure it. It allows us to specify a JSON array (or list) as input and then iterate
over that array and apply a specified state machine (or sub-state machine) to
each item.
The Map state allows us to configure how many iterations are processed in
parallel. We can set a MaxConcurrency value to limit the number of parallel
executions.
Inline Mode - each item in the array is processed using the same state
machine specified directly within the Map state’s Iterator. State machine logic
defined in the Iterator is directly applied to each item in the array.
{
  "Type": "Map",
  "ItemsPath": "$.items",
  "Iterator": {
    "StartAt": "ProcessItem",
    "States": {
      "ProcessItem": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:REGION:ID:function:ProcessItem",
        "End": true
      }
    }
  },
  "End": true
}
Distributed Mode - the Map state is used in conjunction with a nested state
machine or sub-state machine that is applied to each item in the array. It is suited
for large-scale workloads, where each item is processed as a separate child workflow execution.
{
  "Type": "Map",
  "ItemsPath": "$.items",
  "Iterator": {
    "StartAt": "SubStateMachine",
    "States": {
      "SubStateMachine": {
        "Type": "Task",
        "Resource": "arn:aws:states:::states:startExecution.sync",
        "Parameters": {
          "StateMachineArn": "arn:aws:states:REGION:ID:stateMachine:SubStateMachine",
          "Input.$": "$"
        },
        "End": true
      }
    }
  },
  "End": true
}
🗹 Amazon Cognito.
Cognito gives users an identity to interact with our web or mobile applications.
Cognito User Pools (CUP) - serverless database of users for our web &
mobile apps. It provides sign-in functionality for app users. Integration with
API Gateway & ALB. It provides simple login (username/mail and password
combination), password reset, 2FA, MFA…
Provisioned RDS instance size & EBS Volume type & size.
RDS Custom for access to and customize the underlying instance (Oracle &
SQL Server).
Compatible API for Postgres & MySQL, separation of storage and compute.
Aurora Database Cloning - new cluster from existing one, faster than
restoring a snapshot.
🗹 ElastiCache Summary.
Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding).
Backup/Snapshot/PITR feature.
Used for key/value store, frequent reads, less writes, cache results for DB
queries, store session data for websites, cannot use SQL.
🗹 DynamoDB Summary.
Export to S3 without using RCU within the PITR window, import from S3
without using WCU.
🗹 Amazon Neptune.
Neptune is a fully managed graph database. It is highly available across 3 AZ,
with up to 15 Read Replicas. We can build and run apps working with highly
connected datasets (optimized for these complex and hard queries). It can store
up to billions of relations and query the graph with millisecond latency.
🗹 Amazon DocumentDB.
DocumentDB is “AWS implementation” of MongoDB. It has similar deployment
concepts as Aurora (fully managed, highly available with replication across 3
AZ). Its storage automatically grows in increments of 10GB.
🗹 Amazon QLDB.
QLDB (Quantum Ledger Database) is used to review the history of all the
changes made to our application data over time.
🗹 Amazon Timestream.
Timestream is a fully managed, fast, scalable serverless time series database. It
automatically scales up/down to adjust capacity. Can store and analyze trillions
of events per day. It is 1000s of times faster & 1/10th the cost of relational
databases.
It is compatible with SQL and we can use scheduled queries. Recent data is kept
in memory and historical data kept in a cost-optimized storage. It has encryption
in transit and at rest.
Use cases: IoT apps, operational apps, real-time analytics, etc…
Athena Federated Query - allows us to run SQL queries across data stored in
relational, non-relational, object and custom data sources. It uses Data Source
Connectors that run on AWS Lambda and it stores results back in S3.
Use columnar data for cost-savings (less scan) - Apache Parquet or ORC is
recommended. We can use Glue to convert our data to Parquet or ORC.
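A minimal boto3 sketch of running an Athena query (database, table and bucket names are placeholders):

import boto3

athena = boto3.client("athena")

# Scanning a Parquet table only reads the referenced columns, lowering cost
athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "my_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)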
🗹 Amazon Redshift.
Redshift is based on Postgres, but it’s not used for OLTP (Online Transaction
Processing). It is used for OLAP (Online Analytical Processing - Analytics & Data
Warehousing).
Leader Node - for query planning and aggregating results.
Compute Node - for performing the queries and sending results to the leader.
We must provision the node size in advance and use RI for cost savings.
Redshift Spectrum - feature which lets us query data that is already in S3
without loading it. For this we must have a Redshift cluster available to start the
query.
Redshift has Multi-AZ mode for some clusters. Snapshots are PIT backups of a
cluster, stored internally in S3. They are incremental (only what has changed is
saved). We can restore a snapshot into a new cluster.
EMR helps create Hadoop clusters (Big Data) to analyze and process vast
amounts of data. The clusters can be made of hundreds of EC2 instances. It
takes care of all the provisioning and configuration. It has auto-scaling and it’s
integrated with Spot instances.
EMR supports Apache Spark, HBase, Presto, Flink…
Usage: data processing, log analysis, ML, web indexing, big data …
Transient Cluster - designed for short-lived, one-off jobs. They are created
for specific processing tasks and terminated immediately after the job is
complete.
Purchasing Options
Reserved (min 1 year) - cost savings (EMR will automatically use if available).
☐ AWS Glue.
AWS Glue is a serverless fully managed extract, transform, and load (ETL)
service. It is useful for preparing and transforming data for analytics.
Glue Components
Data Catalog - central repository that stores metadata about datasets in our
AWS environment. Can be used by Athena, Redshift or EMR.
Crawlers - automated processes in AWS Glue that scan and infer the
schema of data stored in various sources.
Glue Jobs - ETL scripts that users create to transform data. These jobs can
be scheduled and monitored within AWS Glue.
Glue Elastic Views - combine and replicate data across multiple data stores
using SQL. No custom code is needed, Glue monitors for changes in the source
data (serverless).
Glue DataBrew - clean and normalize data using pre-built transformation.
Glue Studio - used to create, run and monitor ETL jobs in Glue.
Glue Streaming ETL - run Glue jobs as streaming jobs. Compatible with KDS,
Apache Kafka and MSK.
Glue PySpark Jobs
PySpark provides distributed processing via Apache Spark, which makes Glue
ideal for handling large datasets and complex transformations.
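A minimal Glue PySpark job sketch (database, table and bucket names are placeholders) that reads a Data Catalog table and writes it back to S3 as Parquet:

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read from the Data Catalog (populated by a crawler)
dyf = glueContext.create_dynamic_frame.from_catalog(database="my_db", table_name="raw_events")

# Write back as Parquet for cheaper Athena / Redshift Spectrum scans
glueContext.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/curated/"},
    format="parquet",
)
job.commit()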
With Lake Formation we can discover, cleanse, transform and ingest data. It
combines structured and unstructured data. It has blueprints for S3, RDS,
relational & NoSQL DB, etc. Blueprints can help us migrate data from one place
to the data lake.
We can have fine-grained access controls for our applications at the row and
column level.
MSK creates and manages Kafka broker nodes and Zookeeper nodes for us.
We can deploy the MSK cluster in our VPC and multi-AZ. Data is stored in EBS
volumes for as long as we want.
MSK Serverless - run Apache Kafka on MSK without managing the capacity.
MSK Consumers
AWS Managed Grafana is a fully managed service that makes it easy to deploy,
operate, and scale Grafana for monitoring and observability. Grafana is a popular
open-source analytics and monitoring tool that integrates with a wide variety of
data sources to provide customizable dashboards and visualizations for metrics,
logs, and traces. It scales automatically based on our needs.
Machine Learning
🗹 ML AWS Services.
Rekognition
Finds objects, people, text, scenes in images and videos using ML. Can be
used for facial analysis and facial search to do user verification, people
counting …
Transcribe
Polly
Turns text into speech using deep learning. It allows us to create applications
that talk.
Translate
Lex
Same technology that powers Alexa. It’s used for ASR and NLP, helps in
building chatbots and call center bots.
Connect
Comprehend
Used for NLP, fully managed and serverless service. Used for extracting key
phrases, places, people … Analyzes text using tokenization and parts of
speech, and automatically organizes a collection of text files by topic (see
the sketch after this list).
SageMaker
Fully managed service for developers and data scientists to build ML models.
Forecast
Kendra
Fully managed document search service powered by ML. It extracts answers
from within a documents (text, pdf, HTML …). It can learn from user
interactions/feedback to promote preferred results (Incremental Learning).
Personalize
Textract
Monitron
Panorama
Aims to bring computer vision (CV) capabilities to the edge devices in
industrial environments.
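A minimal boto3 sketch for a couple of the services above (bucket, object name and sample text are placeholders):

import boto3

rekognition = boto3.client("rekognition")
comprehend = boto3.client("comprehend")

# Rekognition - detect labels (objects, scenes) in an image stored in S3
labels = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}},
    MaxLabels=10,
)

# Comprehend - extract key phrases from free text
phrases = comprehend.detect_key_phrases(
    Text="AWS Comprehend is a fully managed NLP service.",
    LanguageCode="en",
)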
CloudWatch Logs is a logging service that monitors, stores, and access log files
from AWS resources, applications, and services. Logs are encrypted by default.
CloudWatch Logs can send logs to: S3, KDS, KDF, AWS Lambda, OpenSearch.
Logs can be collected from: Elastic Beanstalk, ECS, AWS Lambda, CloudTrail,
CloudWatch Log agents (on EC2 instances or on-premises servers), Route53.
CloudWatch Logs Insights - search and analyze log data stored in CloudWatch
Logs. Provides a purpose-built query language. It automatically discovers fields
from AWS services and JSON log events. We can save queries and add them to
CloudWatch Dashboard. It’s a query engine, not a real-time engine.
Log data can take up to 12h to become available for export. It is not near real-
time or real-time, if we want that we should use Logs Subscriptions instead.
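A minimal boto3 sketch of a Logs Insights query (log group name and query string are placeholders); results are polled because the query runs asynchronously:

import time
import boto3

logs = boto3.client("logs")

query_id = logs.start_query(
    logGroupName="/aws/lambda/my-function",
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | sort @timestamp desc | limit 20",
)["queryId"]

# Poll until the query finishes
results = {"status": "Running"}
while results["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    results = logs.get_query_results(queryId=query_id)
print(results["results"])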
By default, no logs from our EC2 instance will go to CloudWatch, we need to run
a CloudWatch Agent on EC2 to push the log files we want (need to make sure
IAM permissions are correct).
CloudWatch Logs Agent - old version of the agent. Can only send to
CloudWatch Logs.
CloudWatch Unified Agent - it collects additional system-level metrics such as
RAM, processes, etc. It collects logs to send them to CloudWatch Logs. It has
centralized configuration using SSM Parameter Store.
☐ CloudWatch Alarms.
CloudWatch Alarms - used to trigger notifications for any metrics. It has various
options (sampling, percentage, max, min, etc). With CloudWatch Alarm we can
stop, terminate, reboot or recover an EC2 instance, trigger auto scaling action or
send notification to SNS.
Alarm States: OK, INSUFFICIENT_DATA, ALARM.
Period - length of time in seconds to evaluate the metrics.
Composite Alarms - monitoring the states of multiple other alarms and we can
use AND and OR conditions to merge them.
There is a status check to check the EC2 and we can define a CloudWatch Alarm
to monitor a specific EC2 instance. In case the alarm is being breached, we can
start an EC2 instance recovery or send an alert to our SNS topic.
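A minimal boto3 sketch of such an alarm (instance ID, region and SNS topic are placeholders); EC2 recovery uses the special arn:aws:automate:…:ec2:recover action ARN:

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="recover-web-server",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[
        "arn:aws:automate:us-east-1:ec2:recover",     # recover the instance
        "arn:aws:sns:us-east-1:123456789012:alerts",  # notify an SNS topic
    ],
)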
🗹 Amazon EventBridge.
We can archive events sent to an event bus (indefinitely or for a set period) and
we can replay archived events.
🗹 CloudWatch Application Insights.
It gives us enhanced visibility into our application health to reduce the time it will
take us to troubleshoot and repair our applications. Findings and alerts are sent
to EventBridge and SSM OpsCenter.
CloudTrail Events
Events are stored for 90 days in CloudTrail. To keep events beyond this period
we should log them to S3 and use Athena.
🗹 AWS Config.
AWS Config helps us with auditing and recording compliance of our AWS
resources. It can record configurations and changes over time. It is possible to
store the configuration data into S3 (analyzed by Athena).
We can use AWS managed config rules or make custom config rules (must be
defined in AWS Lambda). Rules can be evaluated/triggered for each config
change and/or at regular time intervals.
Config Rules do not prevent actions from happening (no deny).
Config Remediations
We can automate remediation of non-compliant resource using SSM Automation
Documents. It is possible to create custom Automation Documents that invokes
Lambda function.
We can set Remediation Retries if the resource is still non-compliant after auto-
remediation.
Config Notifications
We can use EventBridge to trigger notifications when AWS resources are non-
compliant.
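A minimal boto3 sketch (the SNS topic ARN is a placeholder, and the event pattern below is an assumption of the Config compliance-change event shape):

import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="config-noncompliant",
    EventPattern=json.dumps({
        "source": ["aws.config"],
        "detail-type": ["Config Rules Compliance Change"],
        "detail": {"newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}},
    }),
)

# Send matching events to an SNS topic
events.put_targets(
    Rule="config-noncompliant",
    Targets=[{"Id": "sns", "Arn": "arn:aws:sns:us-east-1:123456789012:compliance-alerts"}],
)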
Service Health is the single place to learn about the availability and
operations of AWS services. It offers the possibility to subscribe to an RSS
feed to be notified of interruptions to each service.
Service History - displays the status of AWS services across all regions. It
shows the current operational status and any ongoing issues.
IAM - Advanced
🗹 AWS Organizations.
AWS Organizations is a global service that allows us to manage multiple AWS
accounts. The main account is called the management (formerly master)
account. There is an API which automates AWS account creation.
Cost benefits:
Consolidated Billing across all accounts - we receive a single invoice for all
the accounts in our organization, simplifying the billing process.
Service Control Policies (SCPs) - allows us to control the services and actions
that users and roles can access within the accounts of our organization (whitelist
or blacklist IAM actions).
Can be applied to the OU (organizational unit) or account level (does not apply to
the master account). It is applied to all the users and roles of the account,
including root. SCPs are inherited down the hierarchy. An SCP attached to a root
or OU will affect all accounts and OUs beneath it.
It will not affect service-linked roles (service-linked roles enable other AWS
services to integrate with organizations and cant be restricted).
SCP must have an explicit allow (does not allow anything by default).
aws:SourceIp - restrict the client IP from which the API calls are being made.
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "NotIpAddress": {
        "aws:SourceIp": [
          "192.0.2.0/24",
          "203.0.113.0/24"
        ]
      }
    }
  }
}
aws:RequestedRegion - restrict the region the API calls are made to.
(example - the region list and tag values below are illustrative)
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "StringNotEquals": { "aws:RequestedRegion": ["eu-central-1", "eu-west-1"] }
    }
  }]
}
ec2:ResourceTag - restrict actions to resources that carry a specific tag.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["ec2:StartInstances", "ec2:StopInstances"],
    "Resource": "arn:aws:ec2:us-east-1:12345:instance/*",
    "Condition": {
      "StringEquals": { "ec2:ResourceTag/Project": "DataAnalytics" }
    }
  }]
}
aws:MultiFactorAuthPresent - allow actions only when the request was authenticated with MFA.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ec2:StopInstances",
      "ec2:TerminateInstances"
    ],
    "Resource": ["*"],
    "Condition": {
      "Bool": {
        "aws:MultiFactorAuthPresent": "true"
      }
    }
  }]
}
IAM for S3
s3:ListBucket - applies to arn:aws:s3:::test (bucket level permission).
s3:GetObject, s3:PutObject - apply to arn:aws:s3:::test/* (object level permission).
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": "arn:aws:s3:::test" },
    { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": "arn:aws:s3:::test/*" }
  ]
}
When using a resource-based policy, the principal doesnt have to give up their
permissions.
AWS IAM Identity Center (AWS Single Sign-On (AWS SSO)) is a cloud-based
service that simplifies managing access to multiple AWS accounts in AWS
Organizations and business applications.
AWS Directory Service is a managed service that makes it easy to set up and
run directory services in the AWS cloud or connect your AWS resources with an
existing on-premises Microsoft Active Directory (AD).
Microsoft Active Directory (AD) is a directory service developed by Microsoft
for Windows domain networks. It is a database of objects: user accounts,
computers, printers, file shares, security groups… Objects are organized in trees
and a group of trees is a forest.
AWS Control Tower is a service that provides a way to set up and govern a
secure, multi-account AWS environment based on AWS best practices. It is
designed for organizations that need to manage multiple AWS accounts and
provides a pre-configured environment called a landing zone. The landing zone
includes baseline IAM roles, account structures, and network configurations.
Benefits:
Preventive Guardrail - using SCPs (e.g. restrict regions across all our
accounts).
Section IV
AWS Security & Encryption
☐ KMS.
Key Types
Asymmetric (RSA & ECC Key Pairs) - public (encrypt) and private (decrypt)
key pair. Used for encrypt/decrypt or sign/verify operations. The public key
is downloadable, but we cant access the private key unencrypted. It is used
for encryption outside of AWS by users who cant call the KMS API.
AWS Managed Key - created, managed and used on the customer’s behalf
by AWS. Used by AWS services.
AWS Owned Key - collection of CMKs that an AWS service owns and
manages to use in multiple accounts. AWS can use those to protect
resources in our account (we cant view the keys).
Key rotation is automatic every 1 year for AWS managed keys. For customer-
created CMKs it must be enabled (can be automatic or on-demand), and for
imported KMS keys only manual rotation is possible (using an alias).
KMS Key Policies - control access to KMS keys (similar to S3 bucket policies).
Default KMS Key Policy - created if we dont provide a specific KMS Key
Policy. Root user has complete access to the key.
Custom KMS Key Policy - define users, roles that can access the KMS key
and who can administer the key. Useful for cross-account access of our
KMS key.
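A minimal sketch of setting such a custom key policy with boto3 (account IDs and key ID are placeholders); the statement allowing our own account root is kept so we dont lock ourselves out:

import json
import boto3

kms = boto3.client("kms")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # keep full control for the key-owning account
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {   # allow the other account to use (not administer) the key
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:DescribeKey"],
            "Resource": "*",
        },
    ],
}

kms.put_key_policy(
    KeyId="1234abcd-12ab-34cd-56ef-1234567890ab",
    PolicyName="default",
    Policy=json.dumps(policy),
)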
Sharing a KMS-encrypted AMI across accounts:
1. AMI in source account is encrypted with KMS key from source account.
2. Modify the image attribute to add a launch permission for the target AWS
account.
3. Share the KMS keys used to encrypt the snapshot AMI references with
the target account.
4. The IAM role/user in the target account must have permission to use the
key (DescribeKey, ReEncrypt, CreateGrant, Decrypt).
5. When launching an EC2 instance from AMI, optionally the target account
can specify a new KMS key in its own account to re-encrypt the volumes.
The KMS custom key store feature combines the controls provided by AWS
CloudHSM. We can configure our own CloudHSM cluster and authorize KMS
to use it as a dedicated key store for our keys rather than the default AWS
KMS key store. This is suitable if we want to be able to audit the usage of all
our keys independently of KMS or CloudTrail.
Multi-Region Keys are identical KMS keys in different regions that can be used
interchangeably. They have the same key ID, key material and automatic rotation.
We can encrypt in one region and decrypt in other regions (no need to re-
encrypt or make cross-region API calls).
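A minimal boto3 sketch of creating a multi-Region primary key and replicating it (regions and description are placeholders), assuming the ReplicateKey call is made against the primary key’s region:

import boto3

kms = boto3.client("kms", region_name="us-east-1")

# Create the primary multi-Region key
key = kms.create_key(Description="app-encryption-key", MultiRegion=True)
key_id = key["KeyMetadata"]["KeyId"]

# Replicate it into another region (same key ID and key material)
kms.replicate_key(KeyId=key_id, ReplicaRegion="eu-west-1")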
We can encrypt specific attributes client-side in our Aurora table using the AWS
Encryption SDK. Combined with Aurora Global Tables, the client-side encrypted
data is replicated to other regions. We can protect specific fields even from
database admins.
XKS Proxy is an intermediary service that acts as a bridge, enabling secure key
retrieval and management without requiring direct integration with the external
key store itself.
S3 replication with SSE-KMS encrypted objects:
1. Specify which KMS Key to encrypt the objects within the target bucket.
2. Adapt the KMS Key Policy for the target key.
3. Create an IAM Role with kms:Decrypt for the source KMS Key and kms:Encrypt
for the target KMS Key.
We can use MRK, but they are currently treated as independent keys by S3 (the
object will still be decrypted and then encrypted).
SSM Parameter Store - it is a secure storage for configuration and secrets (API
keys, passwords, configurations …). It is serverless, scalable and durable. Also it
is secure because we control access permissions using IAM.
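A minimal boto3 sketch (parameter name and value are placeholders) storing and reading back a SecureString parameter (encrypted with KMS):

import boto3

ssm = boto3.client("ssm")

# Store an encrypted parameter (uses the default KMS key unless KeyId is given)
ssm.put_parameter(
    Name="/my-app/dev/db-password",
    Value="super-secret",
    Type="SecureString",
    Overwrite=True,
)

# Read it back decrypted (requires IAM access to the parameter and kms:Decrypt)
param = ssm.get_parameter(Name="/my-app/dev/db-password", WithDecryption=True)
print(param["Parameter"]["Value"])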
ASM (AWS Secrets Manager) is a newer service which is used for storing
secrets. It has the capability to force rotation of secrets every X days. We can
automate generation of secrets on rotation using Lambda. All secrets are
encrypted using KMS.
Can be integrated with Amazon RDS (MySQL, Postgres, Aurora). Mostly meant
for RDS integration.
We can replicate secrets across multiple regions. Secrets Manager keeps read
replicas in sync with the primary secret. There is an ability to promote a read
replica secret to a standalone secret.
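A minimal boto3 sketch of enabling rotation on an existing secret (secret name and rotation Lambda ARN are placeholders):

import boto3

secrets = boto3.client("secretsmanager")

# Rotate every 30 days using a rotation Lambda function
secrets.rotate_secret(
    SecretId="prod/my-app/db-credentials",
    RotationLambdaARN="arn:aws:lambda:us-east-1:123456789012:function:rotate-db-secret",
    RotationRules={"AutomaticallyAfterDays": 30},
)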
ACM (AWS Certificate Manager) lets us easily provision, manage and deploy TLS
certificates. Used to provide HTTPS for websites. It supports both public and
private TLS certificates.
Can be integrated with (load TLS certificates on) ELB, CloudFront distributions,
APIs on API Gateway.
There is an option to generate the certificate outside of ACM and then import it.
Because there is no automatic renewal, we must import a new certificate before
expiry. ACM sends daily expiration events, starting 45 days prior to expiration
(number of days can be configured). Those events are appearing in EventBridge.
First, we need to create a custom domain name in API Gateway. ACM can be
integrated with Edge-Optimized and Regional endpoints in API Gateway.
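A minimal boto3 sketch of requesting a public certificate with DNS validation (domain names are placeholders):

import boto3

acm = boto3.client("acm", region_name="us-east-1")  # us-east-1 is required for CloudFront certificates

acm.request_certificate(
    DomainName="example.com",
    SubjectAlternativeNames=["*.example.com"],
    ValidationMethod="DNS",
)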
🗹 AWS WAF.
AWS WAF (Web Application Firewall) - protects our web applications from
common web exploits (Layer 7). It can be deployed on ALB, API Gateway,
CloudFront, AppSync and Cognito User Pool.
On WAF we can define Web ACL (Web Access Control List). Rules can include IP
addresses, HTTP headers, HTTP body or URI strings. It can protect us from SQL
Injection and XSS. There are Rate-based rules which are used for DDoS
protection. A rule group is a reusable set of rules that we can add to a web ACL.
Web ACLs are regional, except for CloudFront.
If we want to get a fixed IP in our application while using WAF with an ALB, we
have to use Global Accelerator to get fixed IP for application and then enable
WAF on our ALB.
AWS Shield Standard - protects against DDoS attack for our website and
applications, its free for all customers.
AWS Shield Advanced - 24/7 premium DDoS protection ($3000 per month).
🗹 Firewall Manager.
🗹 Amazon GuardDuty.
DNS Logs - compromised EC2 instances sending encoded data within DNS
queries.
Optional Features - EKS Audit Logs, RDS & Aurora, EBS, Lambda, S3 Data
Events…
🗹 Amazon Inspector.
EC2 Instances
Lambda functions
Can be integrated with AWS Security Hub and send findings to EventBridge.
Amazon Macie is a fully managed data security and data privacy service that
uses ML and pattern matching to discover and protect our sensitive data in AWS.
Macie helps identify and alert us to sensitive data in our S3 Buckets, such as
personally identifiable information (PII).
Networking - VPC
🗹 CIDR, Private & Public IP.
Subnet Mask - defines how many bits can change in the IP.
Private IP addresses are used within a VPC and are not routable over the
internet. They are used for internal communication between resources within the
same VPC or between different VPCs if they are peered or connected via a VPN.
Public IP addresses are routable over the internet. They are used to allow
resources to communicate with external networks and to be accessible from the
internet.
VPC (Virtual Private Cloud) is a private network where we can deploy our
resources (regional resource). All new AWS accounts have a default VPC. New
EC2 instances are launched into the default VPC if no subnet is specified.
Default VPC has internet connectivity and all EC2 instances inside it have public
IPv4 addresses.
Our VPC CIDR should not overlap with our other networks (e.g. corporate).
Routing Table is a set of rules (routes) that determine where network traffic is
directed. Every subnet in our VPC must be associated with a route table.
🗹 Bastion Hosts.
Bastion Host security group must allow inbound from the internet on port 22
from restricted CIDR, for example the public CIDR of our corporation.
Security Group of the EC2 instances must allow the Security Group of the
Bastion Host, or the private IP of the Bastion Host.
NAT Instance allows EC2 instance in private subnets to connect to the internet.
They must be launched in a public subnet and we must disable EC2
source/destination check setting and have Elastic IP attached to it.
Route Tables must be configured to route traffic from private subnets to the NAT
Instance.
It cant be used by EC2 instance in the same subnet (only from other subnets). It
requires an Internet Gateway.
NAT Gateway is resilient within a single AZ. We must create multiple NAT
Gateways in multiple AZs for fault-tolerance. There is no cross-AZ failover
needed because if an AZ goes down it doesnt need NAT.
NACL (Network ACL) - stateless firewall that controls traffic to and from a
subnet. Rules have a number (1-32766), higher precedence = lower number. The
first rule match will drive the decision.
Security Groups - firewall that controls traffic to and from an EC2 instance. It
can have only ALLOW rules. Rules include IP addresses and other security
groups.
Client → Web Server: The client connects to the web server using an
ephemeral port.
Web Server → Database: The web server connects to the RDS instance over
port 3306 to retrieve or store data.
Response from Web Server → Client: The web server sends the response
back to the client.
NACL rules ensure that the web tier can communicate with the database tier
while limiting traffic to only the necessary ports. The use of a private subnet for
the database ensures it is not directly accessible from the internet, enhancing
security.
🗹 VPC Peering.
VPC Peering connects two VPCs privately using AWS network. It makes two
VPCs behave as if they were in the same network. They must not have
overlapping CIDR (IP address range).
VPC Peering connection is not transitive (must be established for each VPC that
need to communicate with one another).
We must update route tables in each VPC’s subnets to ensure EC2 instances can
communicate with each other. We can create VPC Peering connection between
VPCs in different AWS accounts/regions. Also we can reference a security group
in a peered VPC (works cross accounts in same region).
AWS must use a Virtual Private Gateway (VGW) - the VPN concentrator on
the Amazon side of the VPN connection within the AWS infrastructure. It is a
logical entity that represents a VPN gateway capable of handling multiple
VPN connections simultaneously.
On-premises should use a public internet-routable IP address for its Customer
Gateway (CGW) device, or if it’s behind a NAT device that is enabled for NAT
traversal (NAT-T), it should use the public IP address of the NAT device.
We must enable Route Propagation for VGW in the route table that is
associated with our subnets.
Use cases:
Hybrid environments.
Direct Connect uses Virtual Interface (VIFs) to logically separate different types
of traffic. There are three primary types:
Private VIF: Used to connect to our VPC for accessing resources like EC2
instances, RDS databases, etc. Used when we want a private link between
our on-premises environment and AWS.
Public VIF: Used to access AWS public services (e.g., S3, DynamoDB, etc.)
over a private connection. Public VIFs connect to AWS public endpoints and
can be used to route traffic to multiple AWS Regions.
Transit VIF: This allows us to connect multiple VPCs across different AWS
Regions using a single connection and a transit gateway.
Lead times are often longer than one month to establish a new connection.
DX Encryption
☐ Transit Gateway.
Transit Gateway is used for having transitive peering between thousands of VPC
and on-premises. We need only one single gateway to provide this functionality.
It works with DX Gateway and VPN connections. We must configure route tables
to limit which VPC can talk with other VPC.
We can peer Transit Gateways across regions and share cross-account using
RAM. It supports IP Multicast (not supported by any other AWS service).
Traffic Mirroring - allows us to capture and inspect network traffic in our VPC. It
routes the traffic to security appliances that we manage.
Every IPv6 address in AWS is public and Internet-routable (no private range).
IPv4 cannot be disabled for our VPC and subnets. We can enable IPv6 to operate
in dual-stack mode. Our EC2 instances will get at least a private internal IPv4
and a public IPv6.
Egress-Only Internet Gateway is used for IPv6 only (similar to a NAT Gateway
but for IPv6). It allows outbound connections over IPv6 for instances in our VPC
while preventing the internet from initiating an IPv6 connection to our instances.
We must update the route tables.
We should try to keep as much internet traffic within AWS to minimize costs.
S3 Data Transfer Pricing - inbound data transfer into S3 is free; outbound
transfer to the internet is charged per GB.
Internally, the AWS Network Firewall uses the AWS Gateway Load Balancer.
Rules can be centrally managed cross-account to apply to many VPCs.
Network Firewall Endpoint - a virtual appliance deployed within our VPC that
inspects and filters network traffic based on the firewall rules we configure. It
provides the actual interface through which the firewall processes the traffic. It’s
responsible for applying the policies and rules we define to control incoming
and outgoing network traffic. We can allow, drop or alert for the traffic that
matches the rules. All rule matches can be sent to S3, CloudWatch Logs and
KDF.
It’s typically used in instances that need high throughput and low latency but do
not require the advanced features of ENA or EFA.
RPO (Recovery Point Objective) - indicates the maximum amount of data that
can be lost measured in time. It’s the point in time to which your data must be
recovered after an incident.
RTO (Recovery Time Objective) - maximum acceptable time that it takes to
restore a system or service after a disruption. It reflects how quickly you need to
recover after a failure.
Backup & Restore - most basic DR strategy where data is backed up to AWS
storage services and restored in the event of a disaster. Suitable for non-
critical applications where downtime can be tolerated.
RTO and RPO are typically longer because the restoration process involves
re-provisioning infrastructure and retrieving data.
Multi-Site / Hot Site (Active-Active) - a full-scale copy of the environment
runs in AWS at all times alongside the primary site. This approach offers the
lowest RTO and RPO, potentially near zero, as both sites are live and in sync.
🗹 DMS.
With DMS we can quickly and securely migrate databases to AWS. The source
database remains available during the migration. DMS supports both
homogeneous and heterogeneous migrations. It can migrate data from various
sources like on-premises databases or other AWS services. For heterogeneous
migrations, the AWS Schema Conversion Tool (SCT) is used alongside DMS to
convert the source database schema to match the target schema.
Option 2: Create an Aurora Read Replica from our RDS MySQL and when the
replication lag is 0, promote it as its own DB cluster (can take time and
money).
Option 2: Create an Aurora Read Replica from our RDS Postgres and when
the replication lag is 0, promote it as its own DB cluster (can take time and
money).
Create backup and put it in S3 then import it using the aws_s3 Aurora
extension.
🗹 AWS Backup.
AWS Backup is a fully-managed service to centrally manage and automate
backups across AWS services. It can have on-demand and scheduled backups.
Supports PITR.
Cross-Region Backup - backing up data from one AWS region to another. This is
useful for disaster recovery scenarios where we want to ensure data availability
even if an entire AWS region becomes unavailable due to a disaster or outage.
Backup Vault Lock - enforces a WORM state for all the backups that we store in
our AWS Backup Vault. It is an additional layer of defense to protect our backups
against inadvertent or malicious delete operations. Even the root user cannot
delete backups when it is enabled.
Over Snowball - ideal when dealing with extremely large datasets or when
we have limited or unreliable internet connectivity.
Some customers use VMware Cloud to manage their on-premises Data Center.
They want to extend Data Center capacity to AWS, but keep using the VMware
Cloud software.
VMware Cloud on AWS is a service that allows businesses to run VMware's
virtualization technologies on AWS infrastructure. Essentially, it extends our
VMware environment to the AWS cloud.
All-at-Once (Big Bang) - deploys the new version of the application to all
instances at once. It is simple and fast, with no need for complex infrastructure
setups. It has high risk of downtime if the new version has issues, no easy
rollback. If something goes wrong, the entire system can be affected.
Blue/Green Deployment - a second, identical environment (green) runs the new
version alongside the current one (blue). Traffic is switched to green once it is
verified, and rollback is simply switching traffic back to blue.
Use Cases: Applications requiring zero downtime and where rollback needs to
be swift and reliable.
Canary Deployment - a small portion of the traffic is directed to the new version
while the rest remains on the old version. Gradually, more traffic is shifted to the
new version as confidence in it grows.
Use Cases: Applications with a large user base or critical services where early
feedback and incremental rollout are important.
Shadow Deployment - new version is deployed alongside the old version, but
the new version does not serve production traffic. Instead, it processes a copy of
the production traffic to see how it behaves.
Use Cases: Validating major updates or entirely new services without risking
production stability.
Other Services
🗹 CloudFormation.
CloudFormation is a declarative way of outlining our AWS Infrastructure, for any
resources.
Benefits of CloudFormation
Template is a JSON or YAML formatted text file that describes the resources
AWS CloudFormation will provision in our stack.
CreationPolicy (ResourceSignal) - CloudFormation waits for a configured number
of success signals (sent e.g. with cfn-signal) before marking the resource as
created. The Count/Timeout values below are illustrative.
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Count: 1        # wait for one success signal
        Timeout: PT15M  # give up after 15 minutes
DependsOn attribute specifies that the creation of one resource depends on the
creation of another resource. This is useful when you need to control the order in
which resources are created or updated.
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
  MyBucketPolicy:
    Type: AWS::S3::BucketPolicy
    DependsOn: MyBucket
Resources:
  MyAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
UpdateReplacePolicy / DeletionPolicy - control whether a resource is retained,
snapshotted or deleted when it is replaced during an update or when the stack
is deleted.
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    UpdateReplacePolicy: Retain
🗹 AWS Batch.
AWS Batch is a fully managed batch processing service at any scale (efficiently
runs 100 000s of computing batch jobs on AWS). A batch job is a job with a start
and an end.
Batch jobs are defined as Docker images and run on ECS. Batch will dynamically
launch EC2 instances or Spot Instances and will provision the right amount of
compute/memory.
🗹 AWS SSM.
We need to install the SSM agent onto the systems we control (installed by
default on Amazon Linux AMI). If an instance cant be controlled with SSM, it’s
probably an issue with the SSM agent. With SSM agent we can run commands,
patch and configure our servers.
SSM Session Manager - allows us to start a secure shell session on our EC2
instances and on-premises servers without SSH access, bastion hosts or SSH
keys. For better security we can close port 22 entirely.
SSM Automation - simplifies common maintenance and deployment tasks
(runbooks). It can be triggered manually, by EventBridge, or by AWS Config for
rule remediation.
Health Monitoring - a crucial feature that helps ensure the reliability and
availability of our applications. It involves tracking the status of the
environment's resources and application instances, detecting issues, and taking
appropriate actions to maintain the desired state of the environment. It pushes
metrics to CloudWatch.
🗹 Amazon SES.
SES (Simple Email Service) is a fully managed service used to send email
securely, globally and at scale. It allows inbound and outbound emails. We can
send emails using our application, AWS Console, APIs or SMTP.
🗹 Amazon Pinpoint.
In SNS & SES we manage each message’s audience, content and delivery
schedule ourselves; in Pinpoint we create message templates, delivery
schedules, highly-targeted segments and full campaigns.
Cloud Map is a service that allows us to easily create and manage custom
names for our application resources. It simplifies the process of service
discovery in cloud-based applications by maintaining a dynamic directory of all
our services and their locations.
Cost Anomaly Detection continuously monitors our cost and usage using ML to
detect unusual spends. It learns our unique, historic spend patterns to detect
one-time cost spikes and/or continuous cost increases (we dont need to define
thresholds).
We can monitor AWS services, member accounts, cost allocation tags or cost
categories. It will send us anomaly detection report with root-cause analysis and
we can get notified with alerts using SNS.
Cost Explorer - allows users to visualize, understand, and manage their AWS
costs and usage over time. We can create custom reports that analyze cost and
usage data on high level (total cost and usage across all accounts).
With Cost Explorer we can see recommendations for an optimal Savings Plan.
Also we can forecast usage up to 12 months based on previous usage.
🗹 AWS Trusted Advisor.
Trusted Advisor analyzes our AWS accounts and provides recommendations in
six categories:
Cost optimization.
Performance.
Security.
Fault tolerance.
Service limits.
Operational Excellence.
🗹 Amazon AppFlow.
We dont spend time writing the integrations, we can leverage APIs immediately.
Loose Coupling
Services, not Servers - dont use just EC2, use managed services,
databases, etc.
🗹 WhitePapers Links.
https://aws.amazon.com/blogs/aws/aws-well-architected-framework-updated-
white-papers-tools-and-best-practices/
https://d1.awsstatic.com/whitepapers/aws-disaster-recovery.pdf
Well-Architected framework whitepaper:
https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-
Architected_Framework.pdf
1. WhatsTheTime App
Requirements:
Requirements:
There is a shopping cart and users should not lose their shopping cart.
Architecture
We can use basic architecture with Route 53, ELB (Multi-AZ) and ASG (3 AZ).
Session
We could use sticky sessions to keep the shopping cart on a single EC2
instance, but that skews load balancing and the cart is lost if the instance is
terminated. Using web cookies to store the cart is better for statelessness, but
we have security risks. The best way to save session data is to store it in
ElastiCache.
Database
We can store user data in RDS. Because there are hundreds of users we can use
RDS Read Replicas to scale reads (an alternative is Lazy Loading the data into
ElastiCache).
Availability
A Multi-AZ RDS deployment can be used for disaster recovery.
Security
Requirements:
Serverless architecture.
Users should be able to directly interact with their own folder in S3.
Users can write and read to-dos, but they mostly read them.
The database should scale and have some high read throughput.
We can use DAX as a caching layer, so reads will be cached and DynamoDB will
not need many RCU. Another alternative is to cache responses at API Gateway
level.
Requirements:
We can store our static content in S3 and we can use CloudFront to serve it
globally. To secure it we will add Origin Access Control which will ensure that our
S3 bucket can only be accessed by CloudFront. For this we will add a bucket
policy to only authorize the CloudFront distribution.
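A minimal sketch of such a bucket policy applied with boto3 (bucket name, account ID and distribution ID are placeholders):

import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "cloudfront.amazonaws.com"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-static-site/*",
        # Only requests coming from our CloudFront distribution (OAC) are allowed
        "Condition": {
            "StringEquals": {
                "AWS:SourceArn": "arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE"
            }
        },
    }],
}

s3.put_bucket_policy(Bucket="my-static-site", Policy=json.dumps(policy))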
We will have REST HTTPS which will go through API Gateway, invoking a
Lambda function and query/read from DynamoDB (can use DAX for cache). If we
go global we can use DynamoDB Global Databases.
Welcome Email
For user welcome email we can use DynamoDB Stream which will invoke a
Lambda function (need IAM role) which will trigger SES to send an email.
Thumbnail Generation
When an image is uploaded we need to store it in an S3 bucket (we can again
use OAC with CloudFront, and S3 Transfer Acceleration for uploads). Uploading
to S3 will trigger a Lambda function which generates the thumbnail and saves it
to another S3 bucket (or DynamoDB).
5. Microservices Architecture.
API Gateway, Lambda scale automatically and we can pay per usage.
We can use CloudFront. It will cache software update files at the edge and no
changes are needed to the architecture. Software update files are not dynamic,
they are static (never changing).
EC2 instances arent serverless, but CloudFront is and it will scale for us. We will
save in availability, network bandwidth cost, etc.
Requirements:
Streaming
IoT devices send real-time data to Kinesis Data Streams. Data is periodically sent
to Kinesis Data Firehose, which can trigger AWS Lambda for processing and
then store it in S3 (Ingestion Bucket).
Processing
The data may go through SQS before further processing by another Lambda
function. The processed data is analyzed using Amazon Athena, with results
stored in the Reporting Bucket.
Analytics
The data in the Reporting Bucket or Redshift can be visualized using QuickSight
or further analyzed in Redshift.
BP1 - CloudFront ⇒ Web application delivery at the edge and we are protected
from DDoS common attacks (SYN floods, UDP reflection…).
BP1 - Global Accelerator ⇒ Access our application from the edge. We can
integrate it with Shield for DDoS protection.
Infrastructure layer defense (BP1, BP3, BP6) ⇒ Protects EC2 against high traffic.
EC2 with ASG (BP7) ⇒ Helps in case of sudden traffic surges including a flash
crowd or DDoS attack.
ELB (BP6) ⇒ Scales with the traffic increase and will distribute the traffic to many
EC2 instances.
Detect & filter malicious web requests (BP1, BP2) ⇒ CloudFront cache static
content and serve it from the edge locations, protecting our backend. WAF is
used on top of CloudFront and ALB to filter and block requests based on request
signatures.
Shield Advanced (BP1, BP2, BP6) ⇒ Shield Advanced automatically creates,
evaluates and deploys WAF rules to mitigate layer 7 attacks.
Attack Surface Reduction
Other
Important Ports.
Service Port
FTP 21
SSH 22
SFTP 22
HTTP 80
HTTPS 443
PostgreSQL 5432
MySQL 3306
EC2 Instances
RDS
EBS
Restore from a snapshot - the disk will already be formatted and have data.