Study GUIDE AWS
Categorical encoding
Converting categories into numbers
Ordinal encoding is used when the order matters (e.g., bronze, silver, and gold)
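A minimal Python sketch of ordinal encoding: each category maps to its position in an ordered list, so the numbers keep the bronze < silver < gold ranking. The function name and the lowercase medal list are illustrative assumptions, not an AWS API.

```python
# Ordered categories: position in the list is the encoded value (illustrative).
MEDAL_ORDER = ["bronze", "silver", "gold"]

def ordinal_encode(values, order):
    """Return the rank of each value according to the given ordered categories."""
    rank = {category: index for index, category in enumerate(order)}
    return [rank[value.lower()] for value in values]

print(ordinal_encode(["Gold", "bronze", "Silver"], MEDAL_ORDER))  # [2, 0, 1]
```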
Bag of words
Breaks up text by white space into single words
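A minimal bag-of-words sketch in Python: split on whitespace and count each token. It assumes no other cleaning has been done; in practice you would usually lowercase and remove punctuation first.

```python
from collections import Counter

def bag_of_words(text):
    """Split on whitespace and count how often each token appears."""
    return Counter(text.split())

print(bag_of_words("the cat saw the dog"))
# Counter({'the': 2, 'cat': 1, 'saw': 1, 'dog': 1})
```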
N-grams
Produces groups of n consecutive words
Example:
OSB (orthogonal sparse bigram) transformation, window size = 4
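A sketch of both ideas in Python. `ngrams` returns every run of n consecutive words. `osb` pairs each word with each of the next `window - 1` words and marks the distance between them with that many underscores; this format follows my reading of the Amazon ML OSB description, so treat it as an assumption to verify against the docs.

```python
def ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def osb(text, window=4):
    """Pair each word with each of the next window-1 words; one '_' per step of distance."""
    words = text.split()
    pairs = []
    for i, word in enumerate(words):
        for distance in range(1, window):
            if i + distance < len(words):
                pairs.append(word + "_" * distance + words[i + distance])
    return pairs

print(ngrams("the quick brown fox", 2))   # ['the quick', 'quick brown', 'brown fox']
print(osb("the quick brown fox", 4)[:3])  # ['the_quick', 'the__brown', 'the___fox']
```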
Remove punctuation (cleaning and standardization)
Lowercase transformation (cleaning and standardization)
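Both cleaning steps in one small Python function. It removes only ASCII punctuation (Python's `string.punctuation`), which is an assumption; Unicode punctuation would pass through unchanged.

```python
import string

def clean_text(text):
    """Lowercase the text and strip ASCII punctuation characters."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

print(clean_text("Hello, World! It's AWS."))  # hello world its aws
```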
Binning
Quantile binning
Grouping values into bins that each contain the same number of data points
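A minimal Python sketch of quantile binning: values are ranked and split into `n_bins` groups of (roughly) equal size. With this rank-based simplification, equal values can land in different bins.

```python
def quantile_bins(values, n_bins):
    """Assign each value a bin index so that bins hold (roughly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

print(quantile_bins([50, 10, 40, 20, 30, 60], 3))  # [2, 0, 1, 0, 1, 2]
```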
Data Warehouse
Data Lakes
Use cases
• Send real-time alarms or notifications when certain metrics reach
predefined threshold.
• Stream raw sensor data, then clean, enrich, organize, and transform it before
it lands in a data warehouse or data lake.
Amazon Aurora
• In some cases, Amazon Aurora delivers up to 5x the performance of MySQL.
• Amazon Aurora scales from 10GB to 64TB.
• Instances scale up to 64 vCPUs and 488 GiB of memory.
• It stores 2 copies of your data in each of 3 AZs (6 copies in total).
• The limit of read replicas is 15, and replication is async.
• If you have 100% CPU utilization, you need to scale up (Scaling up means
increasing the instance size).
• If you have a bottleneck in reads, you need to scale out (Scaling out means
adding read replicas).
• Aurora Serverless is an on-demand, auto-scaling configuration for Aurora
in which the database automatically starts up, shuts down, and scales
capacity up or down based on the application's needs.
• Aurora is MultiAZ by default.
Tips
• If you encrypt at rest, all your read nodes are going to be encrypted.
• If you set up a cross-region read replica, make it Multi-AZ, since if it is
disrupted you have to set it up again.
• To delete the cluster, you first need to delete its DB instances (nodes).
• Encryption at rest is turned on by default.
• For failover, replicas are promoted by priority tier: the lower the tier
number, the higher the priority.
• Tier 0 is the highest priority.
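A sketch of the promotion rule in Python, assuming the documented Aurora behavior that replicas are promoted by lowest tier (0 to 15) and that ties within a tier go to the largest instance. `Replica` and `size_rank` are hypothetical stand-ins for real instances and instance sizes.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    tier: int       # promotion tier: 0 (highest priority) to 15
    size_rank: int  # hypothetical: larger number = larger instance class

def failover_target(replicas):
    """Pick the replica with the lowest tier; break ties in favor of the largest instance."""
    return min(replicas, key=lambda r: (r.tier, -r.size_rank))

replicas = [Replica("a", 1, 4), Replica("b", 0, 2), Replica("c", 0, 3)]
print(failover_target(replicas).name)  # c
```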
Amazon CloudFront
• How do origins and behaviors work?
Amazon CloudWatch
• AWS Config is for resource configuration, AWS CloudTrail is to log API calls
and AWS CloudWatch is to measure performance.
• Memory (RAM) utilization is a custom metric: CloudWatch does not collect it
by default, so you need to send it yourself with the AWS SDK or CLI.
• Disk usage is another custom metric.
o https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/put-metric-data.html
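A hedged Python sketch of publishing memory utilization as a custom metric through the PutMetricData API. The client is expected to be a boto3 CloudWatch client (`boto3.client("cloudwatch")`). The namespace `Custom/EC2`, the metric name `MemoryUtilization`, and passing in an already measured percentage are illustrative assumptions. How you measure RAM on the instance is left out.

```python
def build_memory_metric(used_percent, instance_id):
    """Build one MetricData entry for a custom memory-utilization metric."""
    return {
        "MetricName": "MemoryUtilization",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Value": used_percent,
        "Unit": "Percent",
    }

def publish_memory_metric(cloudwatch_client, used_percent, instance_id,
                          namespace="Custom/EC2"):  # hypothetical namespace
    """Send the data point with PutMetricData via the given CloudWatch client."""
    cloudwatch_client.put_metric_data(
        Namespace=namespace,
        MetricData=[build_memory_metric(used_percent, instance_id)],
    )
```

For example, `publish_memory_metric(boto3.client("cloudwatch"), 72.5, instance_id)` sends one data point. The CLI equivalent is `aws cloudwatch put-metric-data` with `--namespace`, `--metric-name`, `--value`, and `--unit`.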
Amazon Cognito
• AWS's preferred service for sign-up, sign-in, and access control for web and mobile apps.
• An identity pool can also handle anonymous users.
Amazon Database Migration Service (DMS)
• Oracle versions 10.2 and later (for versions 10.x), 11g and up to 12.2, and
18c for the Enterprise, Standard, Standard One, and Standard Two editions
Supported endpoints also include Amazon RDS instance databases and Amazon Simple
Storage Service (Amazon S3).
EBS
• EBS is 10x more expensive than S3.
• RAID0 offers no redundancy.
• RAID1 is often called mirroring, because the same data is written to two drives.
• RAID5: one drive can fail without data loss.
• RAID6: two drives can fail without data loss.
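A small Python sketch of the trade-off between the RAID levels above: usable capacity versus how many drives can fail. It assumes identical drives, ignores performance, and uses the usual minimum drive counts for each level.

```python
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4}

def raid_summary(level, drives, drive_size_gb):
    """Return (usable capacity in GB, drives that can fail without data loss)."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID{level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        return drives * drive_size_gb, 0        # striping only, no redundancy
    if level == 1:
        return drive_size_gb, drives - 1        # every drive holds a full copy
    if level == 5:
        return (drives - 1) * drive_size_gb, 1  # one drive's worth of parity
    return (drives - 2) * drive_size_gb, 2      # RAID6: two drives' worth of parity

for level, drives in [(0, 4), (1, 2), (5, 4), (6, 4)]:
    print(f"RAID{level} with {drives} x 500 GB:", raid_summary(level, drives, 500))
```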
Regarding throughput:
• Types of deployment:
o Rolling deployment: Create a new launch configuration with an
updated version and start terminating instances to bring them up
with version 2.
o A/B testing: Very popular for websites; for example, you send 90% of
traffic to Version 1 and 10% to Version 2.
o Blue-green deployment: Create another ELB and a fleet of EC2
instances with version 2, then change the Route 53 DNS record to point
to the “green” deployment.
▪ Really easy rollback.
o Canary deployment: In a canary release, you deploy to just one EC2
instance, then sit back and measure whether everything is working correctly.
Tips
• Types of errors:
o InstanceLimitExceeded: means you have exceeded the number of
EC2 instances you can have of that type, you need to raise the limit
with AWS Support.
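o InsufficientInstanceCapacity: AWS doesn't have enough capacity for the
request right now; try again later, request fewer instances or a different
instance type, or buy Reserved Instances (RIs).
• Once you have created a launch configuration, you can't modify it; create a
new one instead.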
• Scaling based on Amazon SQS:
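As I understand the AWS Auto Scaling documentation, scaling on an SQS queue works by tracking a custom "backlog per instance" metric (visible messages divided by running instances) against a target equal to the acceptable latency divided by the average processing time per message. A minimal Python sketch of that arithmetic; the function names are hypothetical.

```python
import math

def backlog_per_instance(visible_messages, running_instances):
    """Queue backlog each instance is responsible for (the custom metric to publish)."""
    return visible_messages / max(running_instances, 1)

def acceptable_backlog(max_latency_s, seconds_per_message):
    """Messages one instance can hold while still meeting the latency target."""
    return max_latency_s / seconds_per_message

def instances_needed(visible_messages, max_latency_s, seconds_per_message):
    """Instances required so that backlog per instance stays at the target."""
    target = acceptable_backlog(max_latency_s, seconds_per_message)
    return math.ceil(visible_messages / target)

# 1,500 messages, 10 s acceptable latency, 0.1 s per message -> target of 100 per instance.
print(instances_needed(1500, 10, 0.1))  # 15
```

With 10 instances running, the backlog per instance is 150, above the target of 100, so target tracking would scale out toward 15 instances.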
Amazon EFS
• EFS costs about 2x as much as EBS and about 15x as much as S3.
• EFS File Sync Agent.
Amazon Elastic Container Services
• Managed, highly scalable container platforms.
• Types of container services at AWS:
o Amazon ECS
▪ Leverages AWS services like Route53, ALB and CloudWatch.
▪ “Tasks” are instances of containers.
▪ You can use EC2 as provisioned instances
▪ Fargate is a “serverless” option; it provisions compute as
needed.
o Amazon EKS
▪ Runs the Kubernetes (K8s) platform as a managed service.
▪ “Pods” are collections of containers.
Amazon ElastiCache
Shards
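• Each shard supports up to 1,000 records per second (up to 1 MB per second)
for writes.
• Default limit of 500 shards, but you can request an increase to an unlimited
number of shards.
• A data record is the unit of data stored in a stream. It consists of: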
o Sequence number
o Partition key
o Data blob (your payload, up to 1 MB)
• Transient data store: the retention period for data records is 24 hours
(default) up to 7 days.
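A Python sketch for sizing a stream and placing records. Shard count must satisfy every per-shard limit (1 MB/s or 1,000 records/s in, 2 MB/s out). Kinesis hashes the partition key with MD5 into a 128-bit integer and sends the record to the shard whose hash key range contains it; `shard_for_key` assumes the hash range is split evenly across shards, as it is when a stream is first created, which no longer holds after uneven splits or merges.

```python
import hashlib
import math

WRITE_MB_PER_SHARD = 1.0        # per-shard ingest limit (MB/s)
WRITE_RECORDS_PER_SHARD = 1000  # per-shard ingest limit (records/s)
READ_MB_PER_SHARD = 2.0         # per-shard read limit (MB/s), shared by consumers

def shards_needed(write_mb_s, records_s, read_mb_s):
    """Smallest shard count that satisfies every per-shard limit."""
    return max(
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
        1,
    )

def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash key ranges."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hashed * shard_count // 2**128

print(shards_needed(write_mb_s=3.5, records_s=2500, read_mb_s=5))  # 4
```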
• Allows you to generate, store and manage cryptographic keys to protect your
data in AWS.
• KMS is a multi-tenant managed service that runs on shared hardware.
• It is suitable where multi-tenancy is not an issue.
• If there are regulatory requirements (like banking), you need an HSM.
• Symmetric keys, same key to encrypt and decrypt.
• HSM.
• Dedicated HSM instance, hardware is not shared with other tenants, it lives
in your VPC.
• It is FIPS 140-2 Level 3 compliant, which includes tamper-evident
physical security mechanisms.
• It’s suitable for applications which have a contractual or regulatory
requirement (banking, financial, PCI, etc.).
Amazon Neptune
• Fully managed graph database.
Amazon RDS
• RDS Anti-Patterns
• If you want to run data-warehouse-style (analytics) queries, run them
against a read replica, not the master.
• Multi-AZ helps with backups: snapshots are taken from the standby instance,
so they don't affect your master.
• Each read replica has its own endpoint, so you need to connect to it separately.
• Read replicas can be MultiAZ.
• Read replicas can exist in different regions.
• Read replicas use asynchronous replication.
• In order to have read replicas, you need to have automated backups
enabled.
• The limit of read replicas is 5, and replication is async.
• RDS can’t use Systems Manager (SSM) Parameter Store.
Tips
• MySQL: Non-transactional storage engines like MyISAM don’t support
replication; you must use InnoDB (or XtraDB in MariaDB).
• Promoting a read replica is a big deal, so maybe you want to do it manually.
• Aurora PostgreSQL does not support cross-region replicas at present.
Amazon Redshift
• Petabyte-scale, cost-effective data warehouse.
• Redshift is for on-line analytical processing (OLAP).
• Redshift Spectrum adds the ability to query S3 data directly.
Amazon Route53
• Simple routing:
o Returns the record's IP addresses in random order.
• Weighted:
o Example, 20% for one region, 80% to another region.
• For example, if you want to send a tiny portion of your traffic to one
resource and the rest to another resource, you might specify weights of
1 and 255. The resource with a weight of 1 gets 1/256th of the traffic
(1/(1+255)), and the other resource gets 255/256ths (255/(1+255)). You can
gradually change the balance by changing the weights. If you want to stop
sending traffic to a resource, you can change the weight for that record
to 0.
• Latency
o Routes users to the region that gives them the lowest latency.
• Failover:
o Active/passive setup.
• Geolocation:
o Routes based on the geographic location the DNS queries originate from.
• An apex domain is your root domain, for example “inbest.cloud”.
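The weighted-routing arithmetic described above (each record gets weight / sum of weights) as a short Python sketch. `pick_record` only simulates the choice locally with `random.choices`; it is not how Route 53 is called. The all-zero case follows Route 53's documented behavior of treating a group whose weights are all 0 as equally weighted.

```python
import random

def traffic_share(weights):
    """Fraction of DNS responses each record receives: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:  # all weights 0: records are treated as equally weighted
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}

def pick_record(weights, rng=random):
    """Simulate one DNS answer, chosen in proportion to the record weights."""
    shares = traffic_share(weights)
    names = list(shares)
    return rng.choices(names, weights=[shares[n] for n in names], k=1)[0]

shares = traffic_share({"small": 1, "large": 255})
print(shares["small"])  # 0.00390625
```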
Amazon S3
x-amz-server-side-encryption-aws-kms-key-id
• Presigned URLs can be created from the CLI or SDK. The default expiration is
one hour; you can set a different expiration on the command line with --expires-in.
Tips
• AWS Security Token Service (STS) grants users limited and temporary access
to AWS resources. Users can come from three sources:
o Federation (like Active Directory).
o Federation with mobile apps.
o Cross Account Access.
• Federation: combining or joining a list of users in one domain (like IAM) with
a list of users in another domain (like AD).
• Identity Broker: a service that allows you to take an identity from point A and
join (federate) it to point B.
• Identity Store: a service that holds identities, like Facebook or AD.
• Identity.
Tips:
• Q: Which feature can be used to configure console access for users
authenticated by Active Directory?
• A: Federated authentication with STS
• Do tests using the “Web Identity Federation Playground” at
https://web-identity-federation-playground.s3.amazonaws.com/index.html
Amazon VPC
Architecture - One VPC Peered with Two VPCs Using Longest Prefix Match
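VPC route tables send traffic to the most specific (longest-prefix) route that matches the destination, which is what lets one VPC peer with two VPCs whose CIDR blocks overlap. A Python sketch using the standard `ipaddress` module; the route table and the `pcx-...` targets are hypothetical.

```python
import ipaddress

def select_route(route_table, destination):
    """Return the target of the most specific (longest-prefix) route containing destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]

# Hypothetical route table for VPC A, peered with VPC B and VPC C.
routes = {
    "10.0.0.0/16": "local",
    "10.1.0.0/16": "pcx-vpc-b",
    "10.1.0.0/24": "pcx-vpc-c",
}
print(select_route(routes, "10.1.0.25"))  # pcx-vpc-c
print(select_route(routes, "10.1.5.7"))   # pcx-vpc-b
```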
Additional reading
• “AWS re:Invent 2018: AWS Direct Connect: Deep Dive (NET403)”,
https://www.youtube.com/watch?v=DXFooR95BYc.
AWS Cloud Adoption Framework
• The AWS Cloud Adoption Framework (AWS CAF) helps organizations
understand how cloud adoption transforms the way they work, and it
provides structure to identify and address gaps in skills and processes.
Applying the AWS CAF in your organization results in an actionable plan with
defined work streams that can guide your organization’s path to cloud
adoption. This framework leverages our experiences and best practices in
assisting organizations around the world with their cloud adoption journey.
{
  "Statement" : [
    {
      "Effect" : "Allow",
      "Action" : "Update:*",
      "Principal": "*",
      "Resource" : "*"
    },
    {
      "Effect" : "Deny",
      "Action" : "Update:*",
      "Principal": "*",
      "Resource" : "LogicalResourceId/ProductionDatabase"
    }
  ]
}
• Once a stack policy is set, all updates to the stack's resources are denied by
default. You need to explicitly allow updates (here, to all resources) and can then
deny specific ones (in this case, updates to
“LogicalResourceId/ProductionDatabase”).
o https://d0.awsstatic.com/whitepapers/aws-amazon-vpc-connectivity-options.pdf
• Changesets:
o When you need to update a stack, understanding how your changes
will affect running resources before you implement them can help you
update stacks with confidence. Change sets allow you to preview how
proposed changes to a stack might impact your running resources,
for example, whether your changes will delete or replace any critical
resources. AWS CloudFormation makes the changes to your stack
only when you decide to execute the change set, allowing you to
decide whether to proceed with your proposed changes or explore
other changes by creating another change set. You can create and
manage change sets using the AWS CloudFormation console, AWS
CLI, or AWS CloudFormation API.
• Stacksets:
o AWS CloudFormation StackSets extends the functionality of stacks by
enabling you to create, update, or delete stacks across multiple
accounts and regions with a single operation. Using an administrator
account, you define and manage an AWS CloudFormation template,
and use the template as the basis for provisioning stacks into selected
target accounts across specified regions.
Additional reading
• “AWS re:Invent 2017: Deep Dive on AWS CloudFormation”,
https://www.youtube.com/watch?v=01hy48R9Kr8.
AWS CloudTrail
• AWS Config is for resource configuration, AWS CloudTrail is to log API calls
and AWS CloudWatch is to measure performance.
AWS Config
• AWS Config is for resource configuration, AWS CloudTrail is to log API calls
and AWS CloudWatch is to measure performance.
• Compliance checks are triggered periodically or by configuration changes.
• Managed or custom rules.
AWS Direct Connect
• Whenever we enable Direct Connect, it is recommended to use Direct
Connect Gateway to connect multiple regions.
• What is a VIF?
• Types of VIF
o Private virtual interface: A private virtual interface should be used to
access an Amazon VPC using private IP addresses.
o Public virtual interface: A public virtual interface can access all AWS
public services using public IP addresses.
o Transit virtual interface: A transit virtual interface should be used to
access one or more Amazon VPC Transit Gateways associated with
Direct Connect gateways.
• What is a LAG?
Additional reading
• “AWS re:Invent 2018: AWS Direct Connect: Deep Dive (NET403)”,
https://www.youtube.com/watch?v=DXFooR95BYc
AWS Directory Services
AWS DynamoDB
• Managed MultiAZ cross region replicated document database.
• All reads are eventually consistent by default, but you can request strong
consistency in the query.
• Priced on throughput rather than compute.
• You can provision read/write capacity in anticipation of need.
• You can select auto scale capacity based on maximum and minimum.
How do the partition key and sort key work behind the scenes?
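A toy model, not DynamoDB's real implementation: the partition key is hashed to choose a partition (DynamoDB's actual hash function is internal, so MD5 stands in here), and items that share a partition key are kept in sort-key order. That ordering is why a Query can efficiently fetch a sort-key range for a single partition key. `ToyTable` and its methods are hypothetical.

```python
import bisect
import hashlib
from collections import defaultdict

class ToyTable:
    """Toy model: items hashed to partitions by partition key, ordered by sort key."""

    def __init__(self, partition_count=4):
        self.partition_count = partition_count
        # partition index -> partition key -> list of (sort_key, item), sorted by sort_key
        self.partitions = defaultdict(lambda: defaultdict(list))

    def _partition_for(self, partition_key):
        # MD5 is a stand-in; DynamoDB's internal hash function is not public.
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.partition_count

    def put(self, partition_key, sort_key, item):
        """Insert or replace the item, keeping items ordered by sort key."""
        rows = self.partitions[self._partition_for(partition_key)][partition_key]
        keys = [key for key, _ in rows]
        index = bisect.bisect_left(keys, sort_key)
        if index < len(rows) and rows[index][0] == sort_key:
            rows[index] = (sort_key, item)  # same primary key: overwrite
        else:
            rows.insert(index, (sort_key, item))

    def query(self, partition_key, low, high):
        """Return items for one partition key whose sort key is in [low, high]."""
        rows = self.partitions[self._partition_for(partition_key)].get(partition_key, [])
        return [item for key, item in rows if low <= key <= high]

table = ToyTable()
table.put("customer#1", "2024-01-05", {"order": "A"})
table.put("customer#1", "2024-02-10", {"order": "B"})
table.put("customer#2", "2024-01-20", {"order": "C"})
print(table.query("customer#1", "2024-01-01", "2024-01-31"))  # [{'order': 'A'}]
```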
Antipattern:
Correct usage:
Use cases for DynamoDB Streams:
• Many applications can benefit from the ability to capture changes to items
stored in a DynamoDB table, at the point in time when such changes occur.
The following are some example use cases:
o An application in one AWS Region modifies the data in a DynamoDB
table. A second application in another Region reads these data
modifications and writes the data to another table, creating a replica
that stays in sync with the original table.
o A popular mobile app modifies data in a DynamoDB table, at the rate
of thousands of updates per second. Another application captures
and stores data about these updates, providing near-real-time usage
metrics for the mobile app.
o A global multi-player game has a multi-master topology, storing data
in multiple AWS Regions. Each master stays in sync by consuming and
replaying the changes that occur in the remote Regions.
o An application automatically sends notifications to the mobile devices
of all friends in a group as soon as one friend uploads a new picture.
o A new customer adds data to a DynamoDB table. This event invokes
another application that sends a welcome email to the new customer.
• DynamoDB Streams enables solutions such as these, and many others.
DynamoDB Streams captures a time-ordered sequence of item-level
modifications in any DynamoDB table and stores this information in a log for
up to 24 hours. Applications can access this log and view the data items as
they appeared before and after they were modified, in near-real time.
Additional reading
• “AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design
Patterns for DynamoDB (DAT401)”,
https://www.youtube.com/watch?v=HaEPXoXVf2k.
AWS Elastic Beanstalk
• Types of deployments:
AWS Elastic Search
• Access to Kibana can be authenticated through Amazon Cognito.
AWS Identity and Access Management (IAM)
• The maximum number of IAM users per AWS account is 5,000.
• Difference between SCP and IAM Policies:
o SCPs are attached to AWS Organizations organizational units (OUs) and accounts.
o IAM Policies operate at the principal level.
o Even if a principal is allowed to perform a certain action, an attached
SCP policy will override that capability if it’s enforcing a Deny.
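A deliberately simplified Python model of how the two policy types combine, ignoring resources, conditions, and the other policy types AWS evaluates: an explicit Deny in either place wins, and otherwise an action needs an Allow from both the SCP and the IAM policy. The statement dictionaries mimic IAM JSON; the function names are hypothetical.

```python
from fnmatch import fnmatchcase

def _matches(statement, action):
    """True if the statement's Action pattern (with * wildcards) covers the action."""
    return fnmatchcase(action.lower(), statement["Action"].lower())

def is_allowed(action, scp_statements, iam_statements):
    """Any explicit Deny wins; otherwise the SCP AND the IAM policy must both Allow."""
    for statement in scp_statements + iam_statements:
        if statement["Effect"] == "Deny" and _matches(statement, action):
            return False
    scp_allows = any(s["Effect"] == "Allow" and _matches(s, action) for s in scp_statements)
    iam_allows = any(s["Effect"] == "Allow" and _matches(s, action) for s in iam_statements)
    return scp_allows and iam_allows

scp = [{"Effect": "Allow", "Action": "*"},
       {"Effect": "Deny", "Action": "ec2:TerminateInstances"}]
iam = [{"Effect": "Allow", "Action": "ec2:*"}]

print(is_allowed("ec2:StartInstances", scp, iam))      # True
print(is_allowed("ec2:TerminateInstances", scp, iam))  # False: the SCP's Deny wins
print(is_allowed("s3:GetObject", scp, iam))            # False: IAM never allows it
```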
Additional reading
• “AWS re:Invent 2017: IAM Policy Ninja”
https://www.youtube.com/watch?v=aISWoPf_XNE.
AWS Lambda
• If the Lambda is inside a VPC, it can connect to an RDS instance in that VPC
through the appropriate security group; use a NAT Gateway to give it outbound
internet access.
• You can break the process in two: one Lambda inside the VPC queries the RDS
and then invokes a second Lambda outside the VPC.
AWS Managed VPN vs. Software VPN
AWS OpsWorks
• Difference from CloudFormation:
o CloudFormation is JUST for infrastructure.
o OpsWorks is for infrastructure AND application level.
• Check example recipes at https://github.com/aws/opsworks-cookbooks.
• Example of OpsWorks Chef recipe to configure an Apache stack:
o https://github.com/aws/opsworks-cookbooks/blob/release-chef-11.10/apache2/definitions/apache_site.rb
• OpsWorks is a global service, but when creating a stack you must specify a
region, and it will not let you clone the stack to another region. Further
information:
https://docs.aws.amazon.com/opsworks/latest/userguide/workingstacks-cloning.html
AWS Rekognition
• Amazon Rekognition makes it easy to add image and video analysis to your
applications.
o Object, scene and activity detection
o Facial recognition
o Facial analysis
o Pathing
o Unsafe content detection
o Celebrity recognition
o Text in images
What is the AWS Serverless Application Model (AWS SAM)?
The AWS Serverless Application Model (AWS SAM) is an open-source framework
that you can use to build serverless applications on AWS.
You can use AWS SAM to define your serverless applications. AWS SAM consists of
the following components:
AWS SAM template specification. You use this specification to define your serverless
application. It provides you with a simple and clean syntax to describe the functions,
APIs, permissions, configurations, and events that make up a serverless application.
You use an AWS SAM template file to operate on a single, deployable, versioned
entity that's your serverless application. For the full AWS SAM template
specification, see AWS Serverless Application Model Specification.
AWS SAM command line interface (AWS SAM CLI). You use this tool to build
serverless applications that are defined by AWS SAM templates. The CLI provides
commands that enable you to verify that AWS SAM template files are written
according to the specification, invoke Lambda functions locally, step-through debug
Lambda functions, package and deploy serverless applications to the AWS Cloud,
and so on. For details about how to use the AWS SAM CLI, including the full AWS
SAM CLI Command Reference, see AWS SAM CLI.
Additional reading
• “Authoring and Deploying Serverless Applications with AWS SAM”,
https://www.youtube.com/watch?v=MSsMOtLZXKc.
AWS Snowball
• https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html
• If the KMS key material is customer-imported, you need to rotate the keys manually.
o https://docs.aws.amazon.com/kms/latest/developerguide/rotate-keys.html#rotate-keys-manually
• For a Route 53 health check string match, the search string has to appear in
the first 5,120 bytes of the response body.
o https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/health-checks-creating-values.html#health-checks-creating-values-string-matching
• The flow log is still in the process of being created. In some cases, it can
take ten minutes or more after you've created the flow log for the log group
to be created, and for data to be displayed.
• There has been no traffic recorded for your network interfaces yet. The log
group in CloudWatch Logs is only created when traffic is recorded.
o https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-troubleshooting.html
• https://aws.amazon.com/es/blogs/security/how-to-prevent-uploads-of-unencrypted-objects-to-amazon-s3/
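The Route 53 health check string-matching rule noted above, as a small local check in Python. This only mimics the rule (the search string must appear within the first 5,120 bytes of the response body); `string_match_passes` is a hypothetical helper, not a Route 53 API.

```python
STRING_MATCH_WINDOW = 5120  # Route 53 searches only the first 5,120 bytes of the body

def string_match_passes(response_body: bytes, search_string: str) -> bool:
    """True if the search string appears entirely within the first 5,120 bytes."""
    return search_string.encode("utf-8") in response_body[:STRING_MATCH_WINDOW]

print(string_match_passes(b"OK" + b"x" * 6000, "OK"))  # True
print(string_match_passes(b"x" * 6000 + b"OK", "OK"))  # False
```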
Ethernet frames can come in different formats, and the most common format is
the standard Ethernet v2 frame format. It supports 1500 MTU, which is the largest
Ethernet packet size supported over most of the Internet. The maximum supported
MTU for an instance depends on its instance type. All Amazon EC2 instance types
support 1500 MTU, and many current instance sizes support 9001 MTU, or jumbo
frames.
Serverless
iSCSI
In computing, iSCSI is an acronym for Internet Small Computer Systems Interface,
an Internet Protocol (IP)-based storage networking standard for linking data storage
facilities.
Routing
Fault tolerance
• Fault tolerance is the ability of a system to remain in operation even if
some of the components used to build it fail.
• In RDS, MultiAZ is for DRP (disaster recovery planning).
• In a Multi-AZ RDS database, the DNS record for the endpoint is updated
automatically on failover.
Federated authentication
• Difference between methods:
High availability
• Elasticity is the ability to increase or decrease your infrastructure very quickly.
• Read replicas are an excellent mechanism for elasticity.
• Scalability - Longer periods (Ability to grow your infrastructure without any
limits).
• Elasticity - Smaller periods (Ex. autoscaling).
• 99.99% availability = about 52.6 minutes of downtime per year.
• 99.9% availability = about 8.76 hours of downtime per year.
• 99.5% availability = about 1.83 days of downtime per year.
• Load testing is pretty self-explanatory.
• Smoke testing is functional testing.
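The availability figures above come from multiplying the minutes in a year by the allowed failure fraction. A quick Python check, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)

def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for availability in (99.99, 99.9, 99.5):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability}% -> {minutes:.1f} min, {minutes / 60:.2f} h, {minutes / 1440:.2f} days")
```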
Difference between step functions, Simple Workflow Service, SQS and AWS Batch
BGP
Consistency models (ACID & BASE)
• ACID
o Atomic transactions: are all or nothing.
o Consistent: Transactions must be valid.
o Isolated: Transactions can't interfere with one another.
o Durable: Completed transactions must stick around.
• BASE
o Basically available: Values availability even if data is stale.
o Soft state: Might not be instantly consistent across stores.
o Eventual consistency: Will achieve consistency at some point.
Machine Learning
Machine learning cycle
Migrations
Six Common Application Migration Strategies
Organizations usually begin to think about how they will migrate an application
during Phase 2 of the migration process. This is when you determine what is in your
environment and the migration strategy for each application. The six approaches
detailed below are common migration strategies employed and build upon “The 5
R’s” that Gartner outlined in 2011.
You should gain a thorough understanding of which migration strategy will be best
suited for certain portions of your portfolio. It is also important to consider that
while one of the six strategies may be best for migrating certain applications in a
given portfolio, another strategy might work better for moving different applications
in the same portfolio.
You may also find that applications are easier to re-architect once they are already
running in the cloud. This happens partly because your organization will have
developed better skills to do so and partly because the hard part - migrating the
application, data, and traffic - has already been accomplished.
2. Replatform (“lift, tinker and shift”)
This entails making a few cloud optimizations in order to achieve some tangible
benefit without changing the core architecture of the application. For example, you
may be looking to reduce the amount of time you spend managing database
instances by migrating to a managed relational database service such as Amazon
Relational Database Service (RDS), or migrating your application to a fully managed
platform like AWS Elastic Beanstalk.
3. Repurchase (“drop and shop”)
This is a decision to move to a different product and likely means your organization
is willing to change the existing licensing model you have been using. For workloads
that can easily be upgraded to newer versions, this strategy might allow a feature
set upgrade and smoother implementation.
4. Refactor / Re-architect
5. Retire
Identifying IT assets that are no longer useful and can be turned off will help boost
your business case and direct your attention towards maintaining the resources
that are widely used.
6. Retain
You may want to retain portions of your IT portfolio because there are some
applications that you are not ready to migrate and feel more comfortable keeping
them on-premises, or you are not ready to prioritize an application that was
recently upgraded and then make changes to it again.
Additional reading
• “An Overview of the AWS Cloud Adoption Framework”,
https://d1.awsstatic.com/whitepapers/aws_cloud_adoption_framework.pdf.
Security
• NACLs are stateless and support DENY rules while SGs are stateful and have
no DENY rules:
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Security.html
Additional reading
• “AWS Security Best Practices”,
https://d0.awsstatic.com/whitepapers/Security/AWS_Security_Best_Practices.pdf
Well-Architected Framework
Additional reading
• “AWS Well-Architected Framework”,
https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf