AWS Architect
Gartner
AWS vs. on-premises
AWS platform, infrastructure
Regions
AWS Services
Compute
EC2 - AMI
EBS
EBS automatic replication AZ, snapshot backup to S3
EBS types
S3
Snowball
Snowball use
CloudFront
Database
Amazon RDS
Managed relational databases
Amazon RDS key benefits
Amazon DynamoDB
Amazon VPC
VPC connections – Gateway connections
MANAGEMENT TOOLS
Amazon EMR
Web service to process large amounts of data, Apache Hadoop framework
Amazon Kinesis
Streaming data - firehose
APPLICATION SERVICES
Amazon SQS
AMAZON WORKSPACES
DEVOPS:
AWS CodePipeline
AWS CodeCommit
AWS Architecture
Aws Mobile
AWS IoT
AWS Security
Global locations
- Regions: separate geographic areas
- Availability Zones: multiple isolated locations inside a region
- Each Availability Zone is also isolated, but the Availability Zones in a region are connected
through low-latency links.
- Availability Zones are all redundantly connected to multiple tier-1 transit providers.
- You can achieve high availability by deploying your application across multiple Availability Zones.
Redundant instances for each tier (for example, web, application, and database) of an application
should be placed in distinct Availability Zones, thereby creating a multisite solution. At a minimum,
the goal is to have an independent copy of each application stack in two or more Availability
Zones.
EC2
Lambda, runs code on EC2 infrastructure managed by AWS
Auto Scaling
Elastic Load Balancing, load balancing across EC2 instances
Elastic Beanstalk, deploy code
Amazon VPC (Virtual Private Cloud)
AWS Direct Connect
Amazon Route 53, DNS service
Database Service
Amazon RDS, HA, fault tolerance, manage time-consuming administration tasks, backups,
replication, software patching, monitoring, scaling.
Amazon DynamoDB, NoSQL DB
Amazon Redshift, BI
Amazon ElastiCache
Management tools
Amazon CloudWatch
AWS CloudFormation
AWS CloudTrail, records, logs, and audits all AWS API calls
AWS Config
Application Services
What distinguishes object storage from block or file storage is that object storage operates at the application level, while block storage operates at the operating-system level.
Storage Classes:
- General purpose
- Infrequent access
- Archive
Lifecycle policies
Glacier -> cold data
There are no sub-buckets
There is no way to mount a bucket, open an object, or install an OS or a database on it
S3 Objects are auto replicated in multiple devices and facilities within a region.
Every S3 object must be in a Bucket
Bucket names are globally unique and DNS-compliant
Up to 100 buckets per account by default
Each Bucket is created in a specific region
Objects can be up to 5 TB; a single bucket can store an unlimited number of objects.
System metadata and user metadata (tagging an object is optional)
Object key must be unique in a single bucket
S3 Operations:
- Create/delete a bucket
- Write an object
- Read an object
- List keys
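Since bucket names must be globally unique and DNS-compliant, a rough validator can make the naming rules above concrete. This is a simplified sketch (3-63 characters, lowercase letters, digits, hyphens, and dots, starting and ending alphanumeric); S3 applies a few extra restrictions (e.g. no IP-address-style names) not checked here.

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Rough check of S3's DNS-compliant bucket naming rules:
    3-63 characters; lowercase letters, digits, hyphens, and dots;
    must start and end with a letter or digit."""
    if not 3 <= len(name) <= 63:
        return False
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name) is not None

print(is_valid_bucket_name("my-logs-2024"))  # True
print(is_valid_bucket_name("My_Bucket"))     # False: uppercase and underscore
```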
Amazon S3 storage is designed for 99.999999999% durability and 99.99% availability of objects over a given year.
RRS trades durability for lower cost: it offers 99.99% durability.
PUTs of new objects have read-after-write consistency: we can read them right after upload. Overwrites and deletes are eventually consistent while the change replicates across all facilities.
S3 Access Control, by default only the person who creates the Bucket or Object has access
- ACL: read, write or full-control to object or bucket
- Policies: recommended; associated with the bucket instead of the IAM principal, they specify who can access, from where, and for how long.
- IAM
- Query string authentication
S3 Storage Classes:
- S3 Standard: offers high durability, high availability, low latency, and high performance; storage for general purposes.
- S3 Standard-IA: the same as Standard but designed for long-lived, infrequently accessed data; minimum billable size of 128 KB, minimum duration 30 days.
- S3 RRS (Reduced Redundancy Storage): lower durability (99.99%), for data that is easy to reproduce, such as thumbnails.
- S3 Glacier: for data that does not need real-time access; archive and backup purposes. A restore takes 3 to 5 hours and is retrieved to RRS as a copy; the original remains in Glacier. Retrieval of up to 5% of stored data per month is free.
Retrieval policies
Glacier is reachable through the S3 lifecycle API, but it also has its own native API.
S3 Encryption
In flight -> Amazon S3 SSL/TLS: use the API via HTTPS.
At rest -> Amazon S3 SSE (server-side encryption), provided by S3 and KMS using AES-256 (Advanced Encryption Standard), or CSE (client-side encryption) to encrypt data before sending it to S3.
- SSE-S3 (AWS-Managed Keys) -> encryption fully managed by AWS, applied with a check box; a key is assigned to each object and related to a master key that is regenerated monthly.
- SSE-KMS (AWS KMS Keys) -> adds an auditing level on top of SSE-S3: we can see who used the key to access which object, including failed attempts.
- SSE-C (Customer-Provided Key) -> AWS does the encryption/decryption while we control the keys.
- Client-Side Encryption -> encrypt data on site before uploading it to S3; two options for the data encryption keys:
o AWS KMS-managed customer master key
o Client-side master key
S3 Versioning
Helps protect the data against malicious or accidental damage; each version is identified by a version ID per object.
Versioning is enabled at the bucket level; once enabled, it cannot be disabled, only suspended.
- MFA Delete: requires additional authentication to delete an object version or change the versioning state of a bucket. This option can be enabled only by the root account.
- Pre-Signed URLs: share objects temporarily by creating a pre-signed URL. To create one we must provide our security credentials and specify the bucket name, object key, HTTP method, and expiration date and time; the URL only works for the specified duration.
- Multipart Upload: to upload or copy large objects via the Multipart Upload API, with the ability to pause and resume.
o Three steps: initiation, uploading parts, completion or abort.
o Should use Multipart Upload for objects over 100 MB.
o Must use Multipart Upload for objects over 5 GB.
o A lifecycle policy can delete incomplete multipart uploads.
- Range GETs: it is possible to download just a portion of an object.
S3 Logging
Logging is off by default for a bucket; the destination bucket where the logs will be written has to be specified, and it can be the same bucket or a different one.
Logs information:
- Requestor account and IP
- Bucket Name
- Request Time
- Action…GET, PUT…
- Error code, status
S3 Event Notifications
Sent in response to actions taken on S3 objects.
Run workflows, send alerts, or perform other actions in response to changes in objects.
Notifications are set up at the bucket level.
Notification messages can be sent through SNS, SQS, or lambda function.
AMAZON GLACIER
Glacier Archives
Data is stored in archives; an archive can contain up to 40 TB of data, and there is no limit on the number of archives.
Glacier assigns each archive a unique ID; archives are immutable and cannot be modified.
Automatically encrypted.
Glacier Vaults
Glacier vaults are containers for archives; an account can have up to 1,000 vaults.
Control access with vault access policies or IAM policies.
Data retrieval of up to 5% of the data stored is free each month; we can deploy policies on a vault to limit the amount of data retrieved or the retrieval rate, to minimize costs.
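The 5% free retrieval allowance above is simple arithmetic, sketched here under the (historical) pricing model these notes describe; the exact proration rules AWS applied are not modeled.

```python
def free_retrieval_gb(stored_gb: float, monthly_free_pct: float = 5.0) -> float:
    """Free monthly Glacier retrieval allowance: 5% of stored data,
    per the pricing model described in the notes."""
    return stored_gb * monthly_free_pct / 100.0

def billable_retrieval_gb(stored_gb: float, retrieved_gb: float) -> float:
    """Amount retrieved beyond the free allowance (never negative)."""
    return max(0.0, retrieved_gb - free_retrieval_gb(stored_gb))

print(free_retrieval_gb(1000))          # 50.0 -> 50 GB free on 1 TB stored
print(billable_retrieval_gb(1000, 80))  # 30.0 -> 30 GB billable
```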
AMAZON EC2 -AMAZON EBS
Virtual Machine = Instance
Two concepts define a launched instance:
- The amount of virtual hardware dedicated to the instance
- The software loaded on the instance
The first is controlled by the instance type.
Enhanced networking enables Single Root I/O Virtualization (SR-IOV); this results in more Packets Per Second (PPS), lower latency, and lower jitter. Supported by C3, C4, D2, I2, M4, R3.
It needs drivers and a modified instance attribute to work.
Supported only on HVM instances launched in a VPC.
Sources of AMI:
- Published by AWS: for Linux (Ubuntu, Red Hat, Amazon Linux) and Windows Server 2008 and 2012.
- AWS Marketplace
- Generated from current Instances
- Uploaded Virtual Servers: AWS VM import/export service.
- DNS: when an instance is launched AWS generates a name; it persists only while the instance is running, cannot be specified, and cannot be transferred.
- Public IP: cannot be specified, persists only while the instance is running, and cannot be transferred.
- Elastic IP: can be transferred and assigned to an instance
- Private IP Address: Only on VPC
- Elastic Network Interface: Only on VPC
AWS uses public-key cryptography to encrypt and decrypt login information: the public key encrypts, the private key decrypts; together they are called a key pair.
Key pairs can be created by the user. AWS stores the public key while the user keeps the private key.
Public keys are stored in .ssh/authorized_keys; the default user for Amazon Linux instances is ec2-user.
For Windows, it is best to change the local administrator password at first access.
A virtual firewall called a Security Group controls traffic based on port, protocol, and source/destination.
Security groups are associated with VPCs or EC2 instances.
EC2-Classic Security Groups: control incoming traffic only.
VPC Security Groups: control both incoming and outgoing traffic.
Each instance must have at least one security group, and can have more.
A security group is default deny: it denies all traffic that is not allowed by a rule.
When multiple security groups are attached to an instance, they are aggregated and the rules of all of them are applied.
Security groups are stateful and applied at the instance level.
EC2 Lifecycle
EC2 Bootstrapping
Bootstrapping is the process of providing code to be run on an instance at launch.
One of the parameters used when an instance is launched is UserData (it is not encrypted); it is passed to the OS and executed the first time the instance boots.
The script can:
- Apply patches and updates to the OS
- Enroll in a directory service
- Install application software
- Copy a longer script or program to be run on the instance
- Install Chef or Puppet to administer the server
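A minimal sketch of a UserData bootstrap script of the kind listed above, assuming a yum-based Linux AMI (package names and commands are illustrative, not tied to any specific image). The EC2 API expects UserData to be base64-encoded; most SDKs and the console do the encoding for you.

```python
import base64

# Illustrative bootstrap script: patch the OS and install a web server.
user_data_script = """#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable --now httpd
"""

# The raw EC2 API takes UserData as base64; encode and verify round-trip.
encoded = base64.b64encode(user_data_script.encode()).decode()
decoded = base64.b64decode(encoded).decode()
assert decoded == user_data_script
print(decoded.splitlines()[0])  # #!/bin/bash
```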
EC2 VM Import/Export
EC2 Instance Metadata
Holds information related to the instance; it can be queried over HTTP via the metadata tree (http://169.254.169.254/latest/meta-data) and includes the following attributes:
EC2 Managing
EC2 Monitoring
AWS offers a service called CloudWatch that provides monitoring and alerting
EC2 Modify
- Instance Type: can be changed to larger or smaller hardware; the instance has to be stopped, the instance type changed, and the instance restarted.
- Security Groups: if an instance is inside a VPC, security groups can be changed while the instance is running; if the instance is not in a VPC, they cannot be changed after launch.
- Termination protection: can be enabled for an instance to protect it from termination (stopping the machine and removing it from AWS); it locks the termination process, so to terminate the instance we first have to disable termination protection. It works for manual termination from the console, API, or CLI.
EC2 Options
EC2 Pricing Options
- On-Demand Instances: for unpredictable workloads
- Reserved Instances: capacity reservations for predictable workloads, with up to 75% discount depending on the term commitment (1 or 3 years; the longer the commitment, the bigger the discount) and the payment option. Payment methods:
o All Upfront: pay the entire reservation up front, no monthly charges.
o Partial Upfront: pay a portion up front and the rest monthly.
o No Upfront: pay monthly.
All reserved instances, or a subset, can be modified in one or more of the following ways:
o Switch Availability Zones within the same region
o Change between VPC and EC2-Classic
o Change the instance type within the same family (Linux only)
- Spot Instances: for workloads that are not critical and are tolerant to interruption (big data, analytics, media encoding, testing); they offer the greatest discounts. Customers specify the price they are willing to pay; when the customer's bid is above the current Spot price, the customer receives the instances and pays only for the time they run. The instance will run until:
o The customer terminates it.
o The Spot price rises above the customer's bid; AWS notifies the customer two minutes before terminating it.
o There is not enough spare capacity for Spot instances.
Instance store, or ephemeral storage, is provided from the local disks attached to the host; ideal for temporary information such as caches and buffers that change frequently. It is HDD or SSD depending on the instance type, and can be used for temporary data or for data with its own redundancy, such as Hadoop HDFS.
Included in the instance cost.
This data is temporary and will be lost when:
- The underlying disk fails.
- The instance stops (the data remains across a reboot).
- The instance is terminated
AWS EBS
EBS Basics
Persistent block-level storage volumes.
Automatically replicated within their Availability Zone, offering high availability and high durability.
Volume types differ in performance characteristics and price.
An instance can have many EBS volumes attached, but an EBS volume can be attached to only one instance at a time.
EBS Types
Vary in areas such as:
- Underlying hardware
- Performance
- Cost
- EBS Magnetic Volumes: lowest performance and lowest cost, from 1 GB to 1 TB and about 100 IOPS; suited for infrequently accessed data, sequential reads, and workloads where low-cost storage is a requirement. Billed based on space provisioned, not data used.
- EBS General Purpose SSD: cost-effective storage for a wide range of workloads, from 1 GB to 16 TB, with a baseline performance of 3 IOPS per GB capped at 10,000 IOPS or 160 MB/s of throughput; for instance, a 1 TB volume gets 3,000 IOPS, but a 5 TB volume gets the capped value of 10,000 IOPS.
Volumes under 1 TB have bursting behavior: a 500 GB volume has a baseline of 1,500 IOPS, and while it uses fewer IOPS than its baseline it accumulates credits, which it can later spend to burst up to 3,000 IOPS.
Used where the highest disk performance is not critical: system boot volumes, small-to-medium databases, development and test environments.
- EBS Provisioned IOPS SSD: meets I/O-intensive workloads; the most expensive EBS volume per GB but with the highest performance. Sizes from 4 GB to 16 TB; you specify not just the size but also the number of IOPS, up to 30 times the number of GB of the volume or 20,000 IOPS, whichever is lower. The provisioned IOPS are charged whether they are used or not.
Provides predictable, high performance for critical business applications and large database workloads needing more than 10,000 IOPS or 160 MB/s of throughput.
- EBS Throughput-Optimized HDD: low-cost HDD volumes for frequently accessed workloads: big data, data warehouses, and log processing. Up to 16 TB with a max of 500 IOPS and a max throughput of 500 MB/s; cheaper than General Purpose SSD.
- EBS Cold HDD: for less frequently accessed workloads; up to 16 TB with a max of 250 IOPS and 250 MB/s of throughput; cheaper than the other HDD type.
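The General Purpose SSD baseline rule described above (3 IOPS per GB, capped at 10,000) reduces to a one-line formula. The 100-IOPS floor for very small volumes is an AWS rule not stated in the notes, added here for completeness.

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS of a General Purpose SSD (gp2) volume:
    3 IOPS per GB, capped at 10,000, with a floor of 100 IOPS."""
    return max(100, min(3 * size_gb, 10_000))

print(gp2_baseline_iops(1000))  # 3000
print(gp2_baseline_iops(500))   # 1500 (can burst to 3000 on saved credits)
print(gp2_baseline_iops(5000))  # 10000 (capped)
```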
EBS Volume type comparison
VPC Components:
- Subnets
- Route Tables
- DHCP Option sets
- Security Groups
- ACL
VPC Elastic Network Interfaces (ENI)
A virtual network interface to attach to an instance in a VPC; available within the VPC and associated with a subnet when created.
- Can have one public IP address
- Can have multiple private IP addresses; with multiple ENIs an instance can have a network presence in different subnets, with one interface as the primary
- Enables low-budget, high-availability solutions
VPC Endpoints
Connect a VPC to other AWS services without internet access, Direct Connect, NAT, or VPN.
You can create different endpoints to the same service and use multiple route tables for it.
VPC endpoints support communication with S3, among other services.
- To create an endpoint:
o Specify a VPC
o Specify the service prefix: com.amazonaws.<region>.<service>
o Specify a policy: full access, or a custom one that can be changed at any time
o Specify route tables: a route is added to each, with the service as destination and the endpoint as target
- Two types of endpoints, gateway and interface:
o Gateway endpoints add a route-table destination for services such as S3 and DynamoDB
o Interface endpoints provide a direct connection (an ENI) to services such as CodeBuild and SaaS offerings
VPC Peering
A network connection between two VPCs that lets them route traffic to each other as if they were in the same network.
- Can be between VPCs from different accounts within a single region.
- Created via a request/accept protocol.
- Identified by the requester and accepter VPC IDs.
- The peering request expires if not accepted within one week.
- Peering is one-to-one: only one peering connection between any two VPCs, and transitive routing is not supported, only direct.
- Cannot be created between VPCs with matching or overlapping CIDR blocks.
- Cannot be created between VPCs in different regions.
A virtual stateful firewall that controls inbound and outbound traffic; the first level of defense, applied at the instance level.
EC2 instances must be launched into a security group; if none is specified, they are launched into the default security group.
The default security group's rules can be changed, but the default group cannot be deleted.
It allows communication between all resources within the security group, allows all outbound traffic, and denies all other traffic.
Inbound rules must be added explicitly.
Exam important points security group:
- Up to 500 security groups per VPC
- 50 inbound and 50 outbound rules per security group
- Up to 5 security groups can be associated with each network interface
- Can specify allow rules but not deny rules; an important difference between security groups and ACLs
- Separate rules for inbound and outbound traffic
- By default, no inbound traffic is allowed
- By default, new security groups allow all outbound traffic
- Security groups are stateful: responses to allowed inbound traffic are allowed to flow out regardless of outbound rules, and vice versa; an important difference between ACLs and security groups
- Instances associated with the same security group cannot talk to each other unless rules are created to allow it
- Changes can be made and are applied instantly
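The evaluation model in the exam points above (allow rules only, implicit default deny, rules aggregated across all attached groups) can be illustrated with a toy simulation. The rule shape and group names are made up for the example; real security group rules also carry port ranges and protocol variants not modeled here.

```python
import ipaddress
from typing import List, Tuple

# A rule is (protocol, port, source CIDR). Security groups hold allow
# rules only; anything not matched is implicitly denied.
Rule = Tuple[str, int, str]

def source_in_cidr(source: str, cidr: str) -> bool:
    return ipaddress.ip_address(source) in ipaddress.ip_network(cidr)

def is_allowed(groups: List[List[Rule]], protocol: str, port: int,
               source: str) -> bool:
    """Evaluate inbound traffic against all groups attached to an
    instance: rules are aggregated, one match anywhere allows the
    traffic, and no match means the default deny applies."""
    for group in groups:
        for proto, p, cidr in group:
            if proto == protocol and p == port and source_in_cidr(source, cidr):
                return True
    return False

web_sg = [("tcp", 80, "0.0.0.0/0")]        # public web traffic
admin_sg = [("tcp", 22, "10.0.0.0/16")]    # SSH from internal range only
print(is_allowed([web_sg, admin_sg], "tcp", 80, "203.0.113.9"))  # True
print(is_allowed([web_sg, admin_sg], "tcp", 22, "203.0.113.9"))  # False
```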
By default, an EC2 instance in a private subnet is not allowed to communicate with the internet through the IGW.
AWS recommends using a NAT gateway instead of a NAT instance: a NAT gateway provides better availability, higher bandwidth, and less administrative effort.
A NAT instance is an AMI designed to accept traffic from instances within a private subnet, translate the source IP to the public IP address of the NAT instance, and forward the traffic to the IGW.
The NAT instance maintains the state of forwarded traffic.
The string amzn-ami-vpc-nat appears in their AMI names.
To allow instances in a private subnet to access the internet through the IGW you must:
- Create a security group for the NAT.
- Launch the NAT AMI in a public subnet and associate it with the NAT security group.
- Disable the source/destination check attribute of the NAT instance.
- Configure the route table associated with the private subnet to direct internet traffic to the NAT instance ( i-asdfdga ).
- Allocate an EIP and associate it with the NAT instance.
This allows instances in the private subnet to access the internet but prevents inbound traffic initiated on the internet from reaching them.
Once the CGW and VPG are created, we can create the VPN tunnel.
VPC VPN – VPC Virtual Private Network
A routing type must be specified when using a VPN connection: if the CGW supports BGP, configure the connection for dynamic routing; if not, use static routing.
A VPC supports multiple CGWs, each having a VPN connection to a single VPG (many-to-one).
The CGW IP must be unique per region.
A VPN connection consists of two IPsec tunnels for HA.
VPN, VPG and CGW exam points
- VPG is the AWS end of the VPN tunnel.
- CGW is the customer end of the VPN tunnel.
- Initiate the VPN tunnel from the CGW to the VPG.
- VPG supports dynamic and static routing.
- VPN connection consists of two tunnels for higher availability to the VPC
ELB Types
ELB Listeners
An ELB must have at least one listener, which is a process that checks for connection requests.
Every listener is configured with a protocol and port for the front-end connection and a protocol and port for the back-end connections.
Supports protocols operating at two layers of the OSI model:
- Layer 4 (TCP connections)
- Layer 7, the application layer (HTTP and HTTPS)
*To update instances, take each instance out of the ELB, update it, and then put it back into the ELB.
AWS CLOUDWATCH
Monitors services in real time: collect and track metrics, create alarms, send notifications, and make changes to the monitored resources based on rules.
Performs automated actions when a threshold is crossed.
Metrics can be retrieved with a GET request.
Each AWS account is limited to 5,000 alarms, and data is retained for 2 weeks by default; to keep it longer you have to move the logs to S3 or Glacier.
Alarms can execute Auto Scaling policies or send notifications.
- CW Basic Monitoring: monitoring every 5 minutes for a limited number of preselected metrics, at no charge; enabled by default.
- CW Detailed Monitoring: monitoring every minute, allowing data aggregation across AZs within a region; incurs an additional charge and must be enabled.
Using the CloudWatch API, we can PUT custom name-value metrics and then use triggers and actions based on them.
CloudWatch Logs:
You can install an agent on a Linux or Ubuntu EC2 instance, save logs to S3, open logs from CloudTrail, and configure alarms.
IAM is a powerful service that controls how people and programs are allowed to manipulate AWS services.
Components:
- Users
- Groups
- Access control policies
They define who can use AWS services, what can be used, and how.
On-premises Active Directory can be integrated with AD in the cloud, and Amazon Cognito can be used for app authentication.
IAM is only for AWS infrastructure, not for internal server applications.
Authentication Technologies:
IAM Principals
An IAM principal is an entity that is allowed to interact with AWS resources. It can be permanent or temporary, and it can be a human or an application. Three types:
- IAM root user: only for first use, to create the first IAM user that will control everything; the root user is the identity created with the AWS account.
- IAM users: users created by a principal with administrative privileges. They have no expiration; an administrator has to delete them. There can be an IAM user for each company user that needs access, following the principle of least privilege: assign only the policies they need and nothing else.
- IAM roles/temporary security tokens: advanced usage. Roles grant specific permissions to specific actors for a specific period of time; the actor can even be authenticated by an external system. When an actor assumes a role, STS (Security Token Service) delivers a temporary token; a period of time must be specified, between 15 minutes and 36 hours.
Use Cases:
o Amazon EC2 Roles:
Granting permissions to applications running on EC2. Configuring access from an EC2 instance to a resource like S3 would otherwise require creating an IAM user with permissions and storing that user's credentials in the application's configuration files. IAM roles avoid saving those credentials: you assign a role to the EC2 instance and the SDK uses the role to authenticate against S3. Since the temporary token is rotated automatically, it stays secure and the process is transparent to the instance's operation.
o Cross-Account Access:
Grant permissions to users from other AWS accounts (providers, customers); users from other accounts can assume the role instead of being given fixed credentials.
o Federation:
Grant permissions to users authenticated by an external system, via IAM identity providers (IdPs). Two types of IdP: web-based identities like Facebook, Google, etc., for which AWS supports integration with OpenID Connect (OIDC); and internal identities like LDAP, for which AWS supports integration via SAML. A SAML-compliant IdP such as Active Directory Federation Services (ADFS) is used to federate the internal directory with IAM.
IAM Authentication
IAM Authorization
Authorization is the process of specifying what actions a principal can or cannot perform. Once a principal is authenticated, we have to manage its access to the AWS infrastructure.
Authorization is granted by specifying privileges in policies and associating them with principals.
IAM Policies
JSON documents that fully defines permissions to access and manipulate AWS resources.
- Effect: allow or deny
- Service: the service the permission is being configured for
- Resource: the ARN (Amazon Resource Name) of the resource the policy will be applied to; wildcards such as * can be used
- Action: the action that will be allowed or denied on that resource (read, write, ...)
- Condition: limits the actions allowed by the permission; for example, a condition could restrict access to a specific source IP or to a specific time interval
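An illustrative policy document with the elements listed above: effect, action, resource (an ARN with a wildcard), and a source-IP condition. The bucket name and CIDR are made up for the example; building it as a Python dict and serializing to JSON keeps the structure valid.

```python
import json

# Hypothetical policy: allow reading objects in one bucket, but only
# from a specific source IP range. Names are illustrative.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {
            "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
        }
    }]
}

document = json.dumps(policy, indent=2)   # the JSON text IAM would store
parsed = json.loads(document)             # round-trip proves it is valid JSON
print(parsed["Statement"][0]["Effect"])   # Allow
```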
A best practice when the account is new is to create an admin group and an admin user and grant them full access permissions, so you can avoid using the root user.
An actor can also be associated with a policy by assuming a role; it is then provided a temporary security token carrying the policies of that role.
IAM Features
- IAM Multi-Factor Authentication (MFA): a second security factor, either a hardware device or a virtual app like the AWS Virtual MFA app. It double-checks your identity with something you know and something you have. MFA can be assigned to an IAM user account, whether it is a person or an application. It is recommended to enable MFA for the root user.
- IAM Rotating Keys: it is a best practice to rotate the access keys associated with your IAM users. IAM allows two active access keys at a time, so the process can be handled with the console, CLI, or SDK: create a new key, reconfigure applications with it, disable the original key, verify, and then delete it.
- IAM Resolving Multiple Permissions: the permissions that decide whether a principal can perform an action may come from different policies; conflicts are resolved as follows:
o Initially the request is denied by default.
o All policies are evaluated; if any contains a "deny", the request is denied and evaluation stops.
o If no "deny" applies and an "allow" is specified, the request is allowed.
o If there is no explicit "deny" or "allow", the default deny applies.
The only exception: if AssumeRole includes both a role and a policy, the policy cannot grant permissions beyond what the role allows.
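The evaluation order above reduces to a few lines: implicit deny first, explicit deny always wins, explicit allow otherwise. A minimal sketch (the input is just the list of effects gathered from all applicable policies):

```python
def resolve(decisions):
    """Resolve overlapping IAM permissions: start from an implicit
    deny; any explicit 'deny' wins; otherwise an explicit 'allow'
    grants access; otherwise the default deny stands."""
    if "deny" in decisions:
        return "deny"   # explicit deny always wins, evaluation stops
    if "allow" in decisions:
        return "allow"  # explicit allow with no deny anywhere
    return "deny"       # no explicit decision: implicit default deny

print(resolve(["allow", "deny"]))  # deny
print(resolve(["allow"]))          # allow
print(resolve([]))                 # deny
```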
AWS Database - AWS DB
Most organizations split their databases into OLTP and OLAP: one for transactions and the other for the data warehouse (reporting).
NoSQL allows horizontal scalability.
AWS RDS
A service that simplifies setup, operation, and scaling. Amazon offloads common tasks such as backups, patching, scaling, and replication.
RDS exposes a database endpoint to which client software connects and executes SQL. RDS does not provide shell access to DB instances, and it restricts access to certain stored procedures and tables that require advanced privileges. It is compatible with ETL tools.
- RDS DB instances: RDS provides an API for creating and managing DB instances; they can also be created in the console.
o DB instance class: determines CPU and memory resources; can be changed after deployment.
o DB instance storage: performance and size can be selected.
o DB parameter group: acts as a container for engine configuration that can be applied to DB instances; it can be changed, but a reboot is needed.
o DB option group: acts as a container for engine features; empty by default.
- RDS operational benefits: SSH cannot be used to access the RDS instance.
RDS comparison of operational responsibilities
- RDS MySQL
Supports 5.7, 5.6, 5.5, and 5.1. RDS runs the open source Community Edition with InnoDB as the database storage engine. Supports Multi-AZ deployments for HA and read replicas for horizontal scaling.
- RDS PostgreSQL
Supports 9.5, 9.4 and 9.3. Supports MultiAZ deployments for HA and read replicas for
horizontal scaling.
- RDS MariaDB
Supports 10.1.17 and XtraDB storage engine. Supports MultiAZ deployments for HA and
read replicas for horizontal scaling
- RDS Oracle
Supports 11g and 12c, in Standard Edition One, Standard Edition, and Enterprise Edition.
AWS Redshift
Fully managed, petabyte-scale data warehouse: a relational DB designed for OLAP scenarios with large datasets. A columnar OLAP database.
- Uses SQL commands for queries
- Connectivity via ODBC or JDBC
- Based on PostgreSQL
- Manages backups and patching
AWS DynamoDB
Write an unlimited number of items with consistent latency; data traffic is distributed over multiple partitions, and capacity can be adjusted after the table is created. All table data is stored on high-performance SSDs and replicated across multiple AZs within a region.
- DynamoDB Data Model: includes tables, items, and attributes. A table is a collection of items and each item is a collection of attributes; each item has a primary key. There is a limit of 400 KB on item size. Items are key/value pairs, and a key can have multiple values. Applications connect to the DB endpoint over HTTPS, as a web service using JSON.
- DynamoDB Data Types:
o Scalar: string, number, binary, Boolean, null
o Set: (list) String set, number set, binary set
o Document: List and Map.
- DynamoDB Primary Key: specified at table creation and uniquely identifies each item; two types, and this cannot be changed:
o DynamoDB Partition Key: made of one attribute; an unordered hash index on that attribute.
o DynamoDB Partition Key and Sort Key: made of two attributes; the first is the partition key and the second is the sort (range) key. Each item is uniquely identified by the combination of both: the partition key can repeat but the combination cannot.
The primary key must be of a scalar type. Also called hash and range keys.
Distribute requests across the full range of partitions.
- DynamoDB Capacity: read and write capacity must be assigned to the table to handle its workload; DynamoDB then provisions the needed infrastructure based on that capacity. These values can be changed later, to scale up or down.
Each operation consumes capacity units: 1 capacity unit per read of an item of 4 KB or smaller, and 1 capacity unit per write of an item of 1 KB or smaller.
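The capacity-unit arithmetic above can be sketched as two small functions. The half-cost of eventually consistent reads is a DynamoDB rule not stated in the notes, added here for completeness.

```python
import math

def read_capacity_units(item_size_kb: float,
                        strongly_consistent: bool = True) -> float:
    """One RCU covers a strongly consistent read of an item up to 4 KB;
    larger items consume one unit per started 4 KB chunk. Eventually
    consistent reads cost half as much."""
    units = math.ceil(item_size_kb / 4)
    return units if strongly_consistent else units / 2

def write_capacity_units(item_size_kb: float) -> int:
    """One WCU covers a write of an item up to 1 KB."""
    return math.ceil(item_size_kb)

print(read_capacity_units(3))     # 1
print(read_capacity_units(9))     # 3, since ceil(9 / 4) = 3
print(write_capacity_units(2.5))  # 3
```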
- DynamoDB Secondary Indexes: when using hash and range keys, you can optionally define one or more secondary indexes.
o Global Secondary Index: its keys can be different from those of the table; it can be deleted or created at any time and maintains its own provisioned throughput.
o Local Secondary Index: has the same partition key as the table's primary key but a different sort key. It can only be created at table creation time.
A secondary index is updated when an item is modified, and those updates consume capacity units.
- DynamoDB reading and writing data:
o DynamoDB PutItem: creates an item, or overwrites it if the primary key already exists; requires the table name and primary key.
o DynamoDB UpdateItem: updates an item, or creates it if it does not exist; supports atomic counters.
o DynamoDB DeleteItem
o DynamoDB GetItem: retrieves an item by primary key; Query and Scan cover broader reads.
- DynamoDB Eventual Consistency: reads can be eventually consistent or strongly consistent. Since DynamoDB replicates data across availability zones for HA, a read right after a write may not return the latest data because of replication lag, unless a strongly consistent read is requested.
- DynamoDB Batch Operations: for working with large batches of items; perform up to 25 item creates or updates per operation.
- DynamoDB Searching for Items: use Query or Scan, with a maximum output of 1 MB per call; avoid Scan where possible because it performs a full scan of the table and its secondary indexes.
- DynamoDB Scaling and Partitioning: scales horizontally using partitions; each partition is a unit of compute and storage capacity, and items are distributed across partitions. The capacity given to a partition is fixed and cannot be shared; a portion of capacity is kept to handle peak traffic or bursts.
Partitions can be split, and there is no rollback.
- DynamoDB Security: granular access control, integrates with IAM, for mobile use AWS
STS
- DynamoDB Streams: each item change is buffered as a stream record that represents the data modification; records are grouped into shards. A shard lives for a maximum of 24 hours. To build an application that consumes a stream, use the DynamoDB Kinesis adapter.
AWS SQS
A fully managed, highly available message queuing service used to decouple the components of a cloud application; it transmits any volume of data at any level of throughput, storing application messages on reliable, scalable infrastructure.
It acts as a buffer between the application components that receive data and the ones that process it. Work is not lost when resources are insufficient: during peak hours, when instances cannot process all requests, messages can wait in the queue in the meantime.
Messages are delivered at least once; a queue can have multiple readers and writers and can be shared between multiple resources.
This service is NOT FIFO.
AWS SWF
Build applications that coordinate work across distributed components. A task is a logical unit of work performed by a component of the application. SWF coordinates tasks across the application, managing inter-task dependencies and scheduling.
Workers must be implemented to perform the tasks; they can run either on EC2 or on premises.
AWS SNS
The topic owner defines a topic, which determines how publishers, subscribers, and delivery
technologies communicate.
- Route 53 Supported DNS Record Types: records map resources to IP addresses
o Start of Authority (SOA): mandatory in every zone file; a single record that
stores information about the name server, the zone administrator, etc.
o A and AAAA: A maps a name to an IPv4 address, AAAA to an IPv6 address
o CNAME: an alias for a name defined in an A or AAAA record
o MX: routes mail to mail servers; must be defined by an A record, not a CNAME.
o NS: routes traffic to the authoritative DNS servers for the zone
o PTR: the reverse of an A record (maps an IP address back to a name).
o SPF: used by mail servers to combat spam
o TXT: holds arbitrary text information
o SRV: specifies the host and port where a service is located.
o Routing Policies: simple, weighted, latency-based, failover, and geolocation
- Route 53 main functions:
o Domain Registration: lets you register domain names
o DNS Service: translates domain names to IP addresses over UDP; when the
response is larger than 512 bytes it uses TCP. It is an authoritative DNS
service; when a new domain name is registered, a hosted zone is created
automatically. DNS service for existing domains can be transferred to Route 53.
o Health Checking: sends automated requests to an endpoint to check that it is
reachable.
- Route 53 Hosted Zones: a hosted zone is a collection of resource record sets and
has its own metadata and configuration information.
o Public hosted zone: a container that holds information about how to route
traffic on the internet
o Private hosted zone: holds information about how to route traffic inside a
VPC
A CNAME record cannot be used at the zone apex; use a Route 53 alias record instead.
Do not use A records for subdomains
Health checks integrate with CloudWatch; name servers can route traffic away from an
unhealthy node in about 30 seconds, and the new DNS results reach clients within about
60 seconds.
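A weighted routing policy can be sketched as picking a record in proportion to its weight. The record names and weights below are invented for illustration; this is the idea, not Route 53's actual implementation:

```python
# Weighted answer selection in the spirit of Route 53's weighted routing
# policy: each record receives a share of answers proportional to its weight.
import random

def weighted_answer(records, rng=random):
    """records: list of (value, weight) pairs. Pick one proportionally."""
    total = sum(w for _, w in records)
    point = rng.uniform(0, total)
    for value, weight in records:
        point -= weight
        if point <= 0:
            return value
    return records[-1][0]

# 90% of answers go to blue, 10% to green (e.g. a canary deployment).
records = [("blue.example.com", 90), ("green.example.com", 10)]
hits = sum(weighted_answer(records) == "green.example.com" for _ in range(10000))
print(hits)   # roughly 1000 of 10000 answers hit the 10%-weight record
```

The same shape of logic underlies gradual traffic shifting between old and new application stacks.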
AWS ElastiCache
A web service that simplifies the setup and management of distributed in-memory caching
environments.
A high-performance, scalable caching solution; you choose between Memcached and Redis when
launching a cluster. Replicas and failover cache nodes can be configured with the Redis
engine.
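The typical usage pattern is cache-aside with a TTL. A minimal sketch, where a dict stands in for the cache cluster and `load_from_db` is a made-up stand-in for an expensive database read:

```python
# Cache-aside lookup with a TTL: the pattern ElastiCache (Memcached/Redis)
# is commonly used for. The dict below stands in for the cache cluster.
import time

_cache = {}   # key -> (value, expiry_timestamp)

def load_from_db(key):
    return f"row-for-{key}"       # pretend this is an expensive query

def get(key, ttl_seconds=300, now=None):
    now = time.time() if now is None else now
    hit = _cache.get(key)
    if hit is not None and hit[1] > now:
        return hit[0], "hit"       # fresh value already cached
    value = load_from_db(key)      # miss: fetch and repopulate the cache
    _cache[key] = (value, now + ttl_seconds)
    return value, "miss"

print(get("user:1"))   # ('row-for-user:1', 'miss')
print(get("user:1"))   # ('row-for-user:1', 'hit')
```

Repeated reads hit the cache until the TTL lapses, relieving the database of hot-key traffic.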
AWS CloudFront
A CDN service that uses DNS geolocation to deliver content from cached edge locations,
lowering the traffic load on the origin website. CloudFront works with S3 (including S3
static websites), EC2, ELB, and custom web servers, and integrates with Route 53. It
supports content served over HTTP or HTTPS, including dynamic web pages, and supports
streaming over HTTPS and RTMP.
CloudFront concepts:
- CloudFront distributions: identified by a DNS domain name; you can also create a
friendly domain name. Select the minimum, maximum, and default TTLs for objects in
the distribution.
- CloudFront origins: when creating a distribution you must specify the origin (an S3
bucket or an HTTP server) and can define the headers to forward.
- CloudFront Cache Control: once requested and served from an edge location, objects
stay in the cache until they expire or are evicted to make room for more frequently
accessed content. By default objects expire after 24 hours. To update a file, it is
better to publish a second version under a new name than to overwrite or delete it.
CloudFront Advanced Features:
Dynamic content, multiple origins, and cache behaviors: control which requests are served
by which origin and how requests are cached using cache behaviors. A cache behavior
includes:
- The path pattern
- Which origin to forward requests to
- Whether to forward query strings to the origin
- Whether accessing the specified files requires a signed URL
- Whether to require HTTPS access
- The amount of time those files stay in the CloudFront cache (TTL).
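Path-pattern routing to origins can be sketched with simple wildcard matching; first match wins, with `*` as the default behavior. The patterns and origin names below are invented for illustration:

```python
# Choosing which origin serves a request by matching the request path
# against cache-behavior path patterns, as in a CloudFront distribution
# with multiple origins. fnmatch's '*' also matches '/' here.
import fnmatch

behaviors = [
    ("images/*", "s3-static-origin"),    # static assets from S3
    ("api/*",    "elb-dynamic-origin"),  # dynamic requests to an ELB
    ("*",        "default-origin"),      # default behavior catches the rest
]

def pick_origin(path):
    for pattern, origin in behaviors:    # evaluated in order; first match wins
        if fnmatch.fnmatch(path, pattern):
            return origin
    return None

print(pick_origin("images/logo.png"))    # s3-static-origin
print(pick_origin("api/v1/users"))       # elb-dynamic-origin
print(pick_origin("index.html"))         # default-origin
```

This is how one distribution can serve a whole website from several origins with different caching rules.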
CloudFront Whole Website: cache behaviors and multiple origins support serving a whole
website, with different behaviors for different clients.
CloudFront Private Content: allows you to serve private content via:
- Signed URLs: valid only between certain times and, optionally, from certain IP
addresses
- Signed cookies: require authentication via public/private key pairs
- Origin Access Identities (OAI): restrict access to an S3 bucket to a special
CloudFront user associated with the distribution.
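The validity check behind a signed URL's custom policy can be sketched as a time-window plus source-IP test. Times and addresses are invented; the real service additionally verifies an RSA signature, which is omitted here:

```python
# A request is honored only inside a time window and, optionally, only
# from an allowed source IP range, as with CloudFront signed URLs.
import ipaddress

def request_allowed(now, not_before, not_after, source_ip, allowed_cidr=None):
    if not (not_before <= now <= not_after):
        return False                        # outside the validity window
    if allowed_cidr is not None:
        net = ipaddress.ip_network(allowed_cidr)
        if ipaddress.ip_address(source_ip) not in net:
            return False                    # request from a disallowed IP
    return True

print(request_allowed(150, 100, 200, "203.0.113.7", "203.0.113.0/24"))   # True
print(request_allowed(250, 100, 200, "203.0.113.7", "203.0.113.0/24"))   # False
print(request_allowed(150, 100, 200, "198.51.100.9", "203.0.113.0/24"))  # False
```

Signed cookies apply the same policy checks, just delivered through cookies instead of URL parameters.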
CloudFront Use Cases:
- Serving the static assets of popular websites: images, CSS, JavaScript.
- Serving a whole website or web application, both dynamic and static content, with
multiple origins, cache behaviors, and TTLs.
CloudFront is not a good fit for a single point of access or for content served through a
VPN.
AWS Storage Gateway
Caches frequently accessed data on premises while encrypting and storing it in S3 or
Glacier.
Three configurations for Storage Gateway:
- Cached Volumes: expand local storage capacity into S3; all data is stored in S3
while recently read data is retained in local storage. Each volume is limited to
32 TB and a gateway can have up to 32 volumes. Works with snapshots; only
incremental backups are stored, transferred over SSL. Data cannot be accessed
directly through S3, only via the Storage Gateway. Use case: expand local storage
hardware into Amazon S3.
- Stored Volumes: store data on on-premises storage and asynchronously back it up to
S3 as EBS snapshots. Each volume is limited to 16 TB and a gateway can support up
to 32 volumes. Snapshots are incremental. Use case: store backups in the cloud.
- Virtual Tape Library (VTL): archive data in the AWS cloud. A gateway can contain up
to 1,500 virtual tapes; one virtual tape shelf (VTS) is allowed per AWS region.
AWS CloudTrail
Provides visibility into user activity by recording API calls, and tracks changes. Delivers
log files to S3; logs can also be sent to a CloudWatch Logs group, and an SNS notification
can be sent when a log file is delivered.
Types of trail:
- A trail that applies to all regions: creates the same trail in every region and
sends logs to the single S3 bucket defined.
- A trail that applies to one region: specify the bucket that receives events from
that one region.
By default, log files are encrypted using S3 SSE and can be stored as long as you want; S3
lifecycle rules can be defined to archive or delete them. CloudTrail delivers log files
within 15 minutes of an API call and publishes new log files about every 5 minutes. Trails
should be enabled for all AWS accounts, in all regions.
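The lifecycle rules mentioned above can be sketched as a simple age-based decision. The 90/365-day thresholds are invented for illustration:

```python
# Evaluating an S3 lifecycle rule for CloudTrail log files: archive
# objects older than 90 days (e.g. transition to Glacier), delete after 365.
def lifecycle_action(age_days, archive_after=90, delete_after=365):
    if age_days >= delete_after:
        return "delete"
    if age_days >= archive_after:
        return "archive"
    return "keep"

print(lifecycle_action(10))    # keep
print(lifecycle_action(120))   # archive
print(lifecycle_action(400))   # delete
```

In practice S3 applies such rules automatically once they are configured on the trail's bucket.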
AWS Kinesis
A platform for handling massive streaming data that lets you build custom streaming data
applications.
Three services address different real-time streaming needs over a limitless data stream:
- Kinesis Firehose: loads massive volumes of streaming data into AWS. It receives
stream data and stores it in S3, Redshift (data is first sent to S3 and then copied
to Redshift), or Elasticsearch (with optional backup to S3). You create a delivery
stream and choose the destination.
- Kinesis Streams: build custom applications for more complex, real-time analysis of
streaming data; collect and process large amounts of data in real time. Incoming
data is distributed across shards, and a shard can be split into more shards.
- Kinesis Analytics: analyze streaming data in real time with SQL.
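How records land on shards can be sketched as hashing the partition key: the MD5 hash, read as a 128-bit integer, falls into one shard's hash-key range. Two equal ranges are assumed below; the key names are invented:

```python
# Distributing records across Kinesis shards: MD5(partition key) as a
# 128-bit integer selects the shard whose hash-key range contains it.
import hashlib

NUM_SHARDS = 2
RANGE_SIZE = 2 ** 128 // NUM_SHARDS   # each shard owns an equal hash range

def shard_for(partition_key):
    digest = hashlib.md5(partition_key.encode()).digest()
    hash_key = int.from_bytes(digest, "big")
    return min(hash_key // RANGE_SIZE, NUM_SHARDS - 1)

for key in ("sensor-1", "sensor-2", "sensor-3"):
    print(key, "-> shard", shard_for(key))
```

Because the mapping is deterministic, all records for one partition key stay in order on one shard, and splitting a shard just narrows the hash ranges.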
AWS EMR
A fully managed, on-demand Hadoop framework. When launching an EMR cluster you specify:
- The instance type of the nodes in the cluster
- The number of nodes in the cluster
- The version of Hadoop (Apache Hadoop, MapR Hadoop)
- Additional applications (Hive, Pig, Spark, Presto)
Two types of storage:
- EMR HDFS (Hadoop Distributed File System): the standard file system, with data
replicated across instances; can use EC2 instance storage or EBS. Data is not
persistent after cluster shutdown unless EBS is used.
- EMRFS (EMR File System): allows the cluster to store data in S3.
For persistent clusters use HDFS; clusters powered on only for specific tasks and then
shut down are transient clusters, for which EMRFS is preferred.
EMR is a good fit for log processing, clickstream analysis, and life sciences workloads.
AWS Snowball / Import Export
A service to transfer data using physical storage appliances, bypassing the internet: the
appliance is shipped to AWS, where the information is copied into AWS.
- Snowball: an Amazon-provided, shippable storage appliance. Data is protected with
KMS; 50 TB and 80 TB models are supported.
o Import/export between on premises and S3
o Encryption is enforced
o Jobs are managed in the Snowball console
o No need to supply your own hardware
- Import/Export Disk: transfers data directly onto and off of storage devices you
own, using the Amazon high-speed internal network.
o Data can be imported into Glacier and EBS in addition to S3
o Data can be exported from S3
o Encryption is optional
o You buy and maintain your own hardware devices
o Jobs cannot be managed in the Snowball console
o Limit of 16 TB per device
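A quick way to see when shipping an appliance beats the network is to estimate the transfer time. The bandwidth and data size below are invented for illustration:

```python
# Estimating network transfer time, to decide between online transfer
# and shipping a Snowball appliance.
def transfer_days(terabytes, megabits_per_second):
    bits = terabytes * 10**12 * 8                   # decimal TB to bits
    seconds = bits / (megabits_per_second * 10**6)  # ideal sustained rate
    return seconds / 86400

# 50 TB over a dedicated 100 Mbps link:
days = transfer_days(50, 100)
print(round(days, 1))   # ~46.3 days; shipping a 50 TB Snowball is much faster
```

Real links rarely sustain their nominal rate, so the break-even point favors the appliance even more in practice.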
AWS OpsWorks
Configure and operate applications using Chef, regardless of architecture or complexity.
Define package installation, software configuration, and storage. Supports Linux and
Windows servers, on EC2 or on premises.
A server stack is composed of resources such as load balancers, app servers, and DB
servers; OpsWorks helps manage these resources as a group, and the stack can run in a VPC.
To add elements to a stack, you add layers; a layer is a set of resources for a particular
purpose and defines how packages are installed, configured, and deployed.
Lifecycle events run the appropriate actions on specific instances.
Sends metrics to CloudWatch.
- OpsWorks Use Cases: host multi-tier web applications; support continuous integration.
AWS CloudFormation
A service that helps you set up AWS resources (collections of related resources), allowing
you to deploy, modify, and update resources in an orderly and predictable way, with version
control.
Works with templates (JSON) and stacks (related resources managed as a single unit).
To update a stack, create a change set by submitting a modified version of the original
template. To delete a stack but keep some of its resources, a deletion policy must be
specified; otherwise resources are deleted by default. If a resource cannot be deleted, the
stack deletion will not complete.
- CloudFormation Use Cases: quickly launch test environments; reliably replicate
configuration between environments; launch applications in new AWS regions.
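A minimal template with the deletion policy discussed above can be sketched as a Python dict serialized to JSON (the template format the notes mention). The bucket resource and its logical name `LogsBucket` are invented for illustration:

```python
# A minimal CloudFormation-style JSON template: one S3 bucket, with a
# DeletionPolicy so the bucket survives stack deletion.
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal stack: one S3 bucket",
    "Resources": {
        "LogsBucket": {
            "Type": "AWS::S3::Bucket",
            "DeletionPolicy": "Retain",   # keep the bucket when the stack is deleted
        }
    },
}

body = json.dumps(template, indent=2)
print(body)
```

Submitting this body as a stack would create the bucket; deleting the stack would leave the bucket in place because of `DeletionPolicy: Retain`.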
AWS Elastic Beanstalk
The simplest way to get an application up and running on AWS. Developers only upload
application code; the service handles the details of load balancing, provisioning, and auto
scaling, so you can deploy an application without worrying about infrastructure. An
application looks like a folder.
- Elastic Beanstalk Application version: a labeled iteration that points to the S3
object where the code is stored.
- Elastic Beanstalk environment: an application version deployed on AWS resources;
each environment runs only one version at a time.
- Elastic Beanstalk environment configuration: parameters and settings that define
how an environment behaves; when updated, the changes are applied immediately.
The environment tier can be a web server tier for a web app, or a worker tier for
background processing.
Supports Java, Node.js, PHP, Python, Ruby, and Go; for containers it supports Tomcat,
Passenger, Puma, and Docker.
Can be integrated with CloudWatch and SNS.
AWS Config
A managed service that provides an AWS resource inventory, configuration history, and
configuration change notifications to enable security and governance. Discovers existing
and deleted resources, and enables auditing, security analysis, resource change tracking,
and troubleshooting.
When you turn on AWS Config, a discovery task runs and generates a configuration item for
each resource. The configuration recorder maintains historical data for each configuration
item.
A Config rule represents the desired configuration settings for specific AWS resources in
an AWS account; Config monitors whether configurations violate these rules.
Management tasks:
- Config Discovery: discover resources in the account, record their configuration,
and capture changes to those configurations.
- Config Management: use SNS to notify about changes in resource configurations.
- Config Continuous Audit and Compliance: rules designed to help assess compliance
with internal policies and regulatory standards.
- Config Troubleshooting: for operational issues.
- Config Security and Incident Analysis: integrates with CloudTrail.
AWS Security
Shared responsibility model; architecture best practices:
- Design for failure and nothing fails: assume the worst scenario, assume things will
fail, and design for automated recovery from failure.
- Implement Elasticity: the ability to grow to handle increased load, with a scalable
architecture and no drop in performance. Scale vertically and horizontally.
o Scaling Vertically: increase the specifications of an individual resource,
e.g. stop an EC2 instance and resize it.
o Scaling Horizontally: increase the number of resources; the key distinction
is between stateless and stateful architectures.
Stateless applications: a session is created when a user or service
interacts with the app. Stateless applications need no knowledge of
previous interactions and store no session information, so any
request can be served by any instance and no data needs to be shared.
Stateless components: store only a unique session identifier in
lightweight cookies, keeping the state itself elsewhere.
Stateful components: databases are stateful; some low-latency
applications, such as games, run on a single server.
Deployment automation: systems scale without human intervention.
Automate infrastructure: use APIs to automate the deployment process.
Bootstrap instances: every EC2 instance has a single role to play in
the environment (DB, app server, etc.) and can take configuration
actions after it boots. This allows recreating environments with a
few clicks, maintaining control, and reducing human-induced errors.
- Leverage Different Storage Options:
o One size does not fit all: migrate static data from a website to S3 and
publish it via CloudFront; store session information in ElastiCache or
DynamoDB.
- Build Security in Every Layer: apply encryption in transit and at rest.
o Features for defense in depth: VPC, subnets, security groups, routing
controls, WAF, IAM.
o Reduce privileged access: use service accounts with temporary tokens, and
apply a least-privilege policy.
o Security as Code: capture all security policies in a script (a "golden
environment"); create a CloudFormation script with the full security
hardening process for easy deployment. CloudFormation templates can be
imported as products into AWS Service Catalog.
o Real-Time Auditing: services such as AWS Config rules, Amazon Inspector,
and AWS Trusted Advisor monitor for compliance or vulnerabilities; use
CloudWatch Logs and CloudTrail.
- Think Parallel: automate parallelization.
- Loose Coupling Sets You Free: reduce interdependencies; the more loosely system
components are coupled, the larger they scale. Use API Gateway to publish and
manage APIs, and use asynchronous integration (for example with SQS) to loosen the
coupling between services.
- Don't Fear Constraints.
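The stateless-scaling idea described under "Implement Elasticity" can be sketched as a handler that keeps all session state in an external store, so any instance can serve any request. The instance IDs, session IDs, and the `session_store` dict (standing in for DynamoDB or ElastiCache) are invented for illustration:

```python
# A stateless request handler: session data lives in a shared external
# store, so requests for one session can bounce across instances.
session_store = {}   # stands in for a shared store such as DynamoDB

def handle_request(instance_id, session_id, increment):
    """Any instance can serve the request; no local state is needed."""
    state = session_store.get(session_id, {"count": 0})
    state["count"] += increment
    session_store[session_id] = state
    return instance_id, state["count"]

# The same session hits three different instances behind a load balancer:
print(handle_request("i-aaa", "sess-1", 1))   # ('i-aaa', 1)
print(handle_request("i-bbb", "sess-1", 1))   # ('i-bbb', 2)
print(handle_request("i-ccc", "sess-1", 1))   # ('i-ccc', 3)
```

Because no instance holds the session, Auto Scaling can add or remove instances freely without losing user state.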