AWS CSAA Notes
· 18 regions
· us-east, us-west, sa-east, cn-north-1, cn-northwest-1, ap-northeast, ap-southeast, eu-central, eu-west
QuickSight
· BI service
· Pay per session pricing model
AWS Config
· Allows you to record config changes on the resources
· You can setup rules for change in configuration
· Rules can be run automatically or scheduled via lambda
· Pricing is per config rule
AWS WAF
· Web App Firewall
· You can define rules to avoid attacks
· Pricing is per rule and number of web requests
AWS Inspector
· Automated security assessment
· Creates report for security vulnerability, bad practices
Trusted Advisor
· Automated tips on reducing cost or making improvement
AWS Directory
· Information about your organization
Data Pipeline
· Dynamo -> S3
· Mysql Full Copy -> S3, Redshift
· MySql incremental -> S3, Redshift
· S3 -> Mysql, Redshift, Dynamo
· Run JOB on EMR
Auto Scaling
· Mandatory elements of Autoscaling group:
o Minimum size
o Launch Configuration
o Health Checks and desired capacity are optional
S3:
· Returns 200 status code on successful write
· S3 is object based ie. allows file upload
· File size can be 0 to 5TB
· URL: https://s3-region.amazonaws.com/bucket-name
· Web URL: https://bucketname.s3-website.region.amazonaws.com
· If the AWS client doesn’t specify a bucket region, the bucket is created in North Virginia by default
· Read after write consistency for new objects
· Eventual consistency for overwrite / delete operation for existing objects
· Upload is atomic. You won’t get partially updated object (except for multi-part)
· To make sure data is evenly spread across partitions, randomize the key name by prefixing it
with some salt (random text)
· An S3 object consists of: key, value, version ID, metadata, and sub-resources such as torrent and ACL
· Supports encryption
· Deleting bucket prompts to enter bucket name, even if bucket is empty
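The key-salting tip above can be sketched as follows. This is only an illustration: the function name `salted_key`, the SHA-256 hash, and the four-character prefix are assumptions, not part of any AWS API.

```python
import hashlib


def salted_key(key: str, salt_len: int = 4) -> str:
    """Prefix an object key with a short hash-derived salt.

    Deriving the salt from the key itself (rather than from random text)
    keeps the mapping deterministic, so the object can be located again
    from its original name.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:salt_len]}-{key}"


# Sequential names end up under scattered prefixes.
for name in ("2018-01-01/log.txt", "2018-01-02/log.txt"):
    print(salted_key(name))
```

A hash-derived salt is used here instead of truly random text so that a reader can recompute the stored key without keeping a lookup table.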
S3 Storage Tiers:
· S3 Standard: - 99.99% availability, 99.999999999% durability
· S3 - IA (Infrequent Access): For data that’s accessed rarely but when accessed it needs to be
retrieved very fast. It has lower S3 fee, but is charged on retrieval.
· Reduced Redundancy Storage RRS: 99.99% durability, 99.99% availability over given year. This is
for data that is reproducible. Like thumbnails
· Glacier: Very cheap, used for archival only. Takes 3-5 hours to restore an object
S3 Charges based on: storage, requests, data transfer (transfer in is free), storage management
policy
Types of S3 Encryption:
· Client Side encryption
· Server Side encryption:
o With Amazon S3 managed Keys ( SSE - S3)
o KMS (SSE - KMS) (extra charges, provides audit trail)
o Customer provided keys (SSE - C)
· Control access by bucket ACL or bucket policies
You can enable MFA Delete on S3 versioning. It forces MFA both to suspend versioning and to
permanently delete any version of an object.
S3 Versioning:
· Stores all versions, including all writes as well as deletes
· Once enabled, versioning cannot be disabled, only suspended
· Integrates with life cycle rules
S3 Replication:
· Between buckets in same region
· Between buckets in different region
· Between buckets in different account
· Versioning must be enabled for cross-region / cross account replication.
· Once enabled only new objects get replicated not the existing one
· Each object can have at max only one destination
· By default original permissions are replicated, can be overridden to destination bucket owner
· If you delete object, delete marker gets replicated, but if you delete the delete marker, it doesn’t
get replicated
· When object existing before enabling replication, is updated in the source bucket, it doesn’t get
replicated to destination
· Regions must be different. can not use cross region replication over same region
· Deleting individual version, won’t get replicated
· Chaining of replication buckets doesn’t work. It doesn’t replicate content to destination except
first one
S3 Lifecycle Rules:
· Provides following three transition rules:
o Move to Standard - IA
o Move to One zone - IA
o Move to Glacier
· Object must remain in Standard for 30 days
· Object must remain in Standard - IA for at least 30 days
· Object must remain in Glacier for at least 90 days
· Adding life cycle rule, places the tick mark before the bucket name
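A lifecycle rule is ultimately a small configuration document. The sketch below builds one in the dictionary shape used by the S3 lifecycle configuration API and enforces the 30-day minimums listed above. The helper name `build_lifecycle`, the rule ID and the key prefix are hypothetical.

```python
MIN_DAYS_IN_STANDARD = 30      # from the notes above
MIN_DAYS_IN_STANDARD_IA = 30   # from the notes above


def build_lifecycle(ia_days: int, glacier_days: int) -> dict:
    """Build a lifecycle configuration: Standard -> Standard-IA -> Glacier."""
    if ia_days < MIN_DAYS_IN_STANDARD:
        raise ValueError("objects must stay in Standard for at least 30 days")
    if glacier_days - ia_days < MIN_DAYS_IN_STANDARD_IA:
        raise ValueError("objects must stay in Standard-IA for at least 30 days")
    return {
        "Rules": [
            {
                "ID": "archive-logs",            # hypothetical rule name
                "Filter": {"Prefix": "logs/"},   # hypothetical key prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle(ia_days=30, glacier_days=60)
print(config["Rules"][0]["Transitions"])
```

Days are counted from object creation, which is why the Glacier transition must be at least 30 days later than the Standard-IA one.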
S3 Security:
o By default all buckets are private
o Can secure access by:
o Bucket policy: bucket wide
o Access control list (ACL): individual object
o Can be configured to create access logs to same bucket or another account
CloudFront Overview:
o Origin can be S3 bucket, EC2, Elastic Load Balancer, Route 53, or non-AWS system
o Distribution is the name given to CDN, consists of collections of edge locations
o Over 50 edge locations
o Same distribution can have multiple origins
o From cloudfront you can restrict direct access to S3 file
o You can restrict access to cloudfront URL using signed URL or signed Cookies
o To specify multiple origin, use path patterns. You can set it after creation of the distribution.
o You can setup error page
o You can set geo restrictions by specifying whitelisted and blacklisted countries
S3 Accelerate
o You can use cloudfront edges to accelerate uploads to S3.
o Upload URL is of the form: bucketName.s3-accelerate.amazonaws.com
o You can enable it from bucket properties
o S3 web console also shows comparison of upload speeds. It can be slower in some regions
EBS Volumes:
· Max Size = 16 TB
· General purpose SSD: 3 IOPS per GB. Allows bursts up to 3000 IOPS
· Provisioned IOPS SSD: For I/O intensive applications like large DB. Up to 20000 IOPS
· Throughput optimized HDD (ST1): Magnetic storage. Frequently accessed. For large amounts of
sequential data. Big data, data warehousing, log processing. Cannot be a boot volume
· Cold HDD (SC1): Lowest cost, for infrequently accessed data
· Magnetic Storage Standard: Lowest cost per GB + bootable, infrequently accessed data
· One EBS can be mounted to only one EC2.
· EBS volume must be in same AZ as EC2
· You can upgrade EBS volume type on the fly without downtime. (you can not upgrade standard
type EBS ie magnetic)
· Volumes restored from encrypted snapshot are encrypted automatically
· Snapshot of encrypted volumes are encrypted automatically
· Snapshot can be shared only if they are un-encrypted. Snapshot can be shared with other aws
account or made public
· You can create ebs volume from snapshot in another availability zone
· To move EC2 from one region to another, you have to take snapshot and copy that snapshot to
another region. Then create Image of snapshot
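The General Purpose SSD figures above (3 IOPS per GB, bursts up to 3000 IOPS) work out as in the sketch below. It uses only those two numbers; real volumes also have a minimum and a maximum baseline, which are left out here. The function name `gp2_iops` is hypothetical.

```python
from typing import Tuple

IOPS_PER_GB = 3        # baseline, from the notes
BURST_CEILING = 3000   # burst limit, from the notes


def gp2_iops(size_gb: int) -> Tuple[int, int]:
    """Return (baseline_iops, peak_iops) for a General Purpose SSD volume."""
    baseline = IOPS_PER_GB * size_gb
    # Bursting only helps while the baseline is below the burst ceiling.
    peak = max(baseline, BURST_CEILING)
    return baseline, peak


print(gp2_iops(100))    # small volume: low baseline, relies on bursting
print(gp2_iops(2000))   # large volume: baseline already above the burst ceiling
```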
EC2 Launch Instance:
· Root volume can be General Purpose SSD (GP2), Provisioned IOPS(IO1), Magnetic
· Extra volumes can be, GP2, IO1, SC1, ST1, Magnetic
· When EC2 is deleted, root volume gets deleted by default. You can change this configuration in
EC2 launch wizard
· To login key should have permission: 400
· If you enable termination protection, you won’t be able to terminate the instance accidentally. If
you try, the AWS console will ask you to disable termination protection first
· EC2 has following types of status checks:
o System status check: hardware issue. Instance not reachable. Reboot might not fix the issue, you
might have to terminate it.
o Instance status check: This check monitors software and network configuration. If failed, you can
reboot to fix it.
· Can not encrypt root volume
· To launch reserved instance you have following payment options, which affects discounts:
o No upfront
o Partial Upfront
o All upfront
o Light utilization
o Medium Utilization
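The login key permission of 400 mentioned above means "readable by the owner only". A minimal sketch of setting it from Python on a POSIX system follows; the helper name `restrict_key_file` and the temporary `.pem` file are hypothetical stand-ins for a real key pair file.

```python
import os
import stat
import tempfile


def restrict_key_file(path: str) -> None:
    """Make a private key readable by its owner only (same as chmod 400)."""
    os.chmod(path, stat.S_IRUSR)


# Demonstrate on a throwaway file standing in for a downloaded key.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    demo_path = handle.name
restrict_key_file(demo_path)
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
os.remove(demo_path)
```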
Security Groups:
· Set of rules which allows traffic to and from the EC2
· Rule contains source IP and ec2 port number
· Any rule change applies immediately
· Security Groups are stateful. If an inbound request is allowed, the response traffic is automatically allowed out.
· You can not deny traffic. You can only allow it.
ELB
· Classic Load Balancer mostly works at Layer 4, but also supports Layer 7 features such as sticky
sessions and X-Forwarded-For
· ELB errors: with a Classic Load Balancer, if the app stops responding, the LB returns a 504 timeout
· X-Forwarded-For: the server is unaware of the user’s public IP address, as it receives the ELB’s IP.
Hence, the ELB adds an X-Forwarded-For header with the public IP of the user
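On the application side, reading the header described above could look like the sketch below. The function name `client_ip` and the sample addresses are hypothetical; the only assumption about the header is its usual comma-separated form, with the original client first.

```python
from typing import Mapping


def client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """Return the original client IP for a request that came through a load balancer.

    X-Forwarded-For may hold a comma-separated chain ("client, proxy1, proxy2");
    the left-most entry is the address the first proxy saw.
    """
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    # Fall back to the socket peer (the load balancer itself) when the header is absent.
    return first or peer_ip


print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.5"}, peer_ip="10.0.0.5"))
```

The header is supplied by whoever sends the request, so it should only be trusted when the request is known to have arrived through the load balancer.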
Cloudwatch:
· Basic monitoring is free. Detailed is paid
· Doesn’t report memory. It can be done but we need to install reporting agent on ec2
· Doesn’t aggregate across region
· Supports following widgets:
o Line
o Stacked area
o Number
o Text
· For EC2 cloudwatch supports following metrics:
IAM Roles
1. AWS Service Role
2. AWS service-linked role
3. Role for cross-account access
4. Role for identity provider access
Roles are not region specific
You can create Role with AWS Service Type = EC2. When you create EC2 you can specify this role
to EC2. In EC2 description you can see the above role.
The role can be attached to, or replaced on, a running instance
Once the role is set, you can access the allowed resources from AWS CLI without using
credentials
AutoScaling Group
Before you can create autoscaling group, you need to create launch configuration
EFS
· Supports NFSv4
· No pre-provisioning. You pay for the storage you use
· Can scale up to petabytes
· Supports thousands of concurrent NFS connections
· Data is stored multi AZ
· It’s file-based storage accessed over NFS (vs object based for S3)
· To create EFS, choose VPC, AZ and subnets, security groups
· EC2s must be in same security group as EFS
· Use mount -t nfs4 efs_url:/ /target/directory to mount EFS volume
AWS Lambda
· Languages: C#, Go, Java, Node.js, Python
· Triggers: IoT, API Gateway, Alexa Skills Kit, Alexa Smart Home, CloudFront, CloudWatch Events,
CloudWatch Logs, CodeCommit, Cognito Sync Trigger, DynamoDB, S3, SNS
· Cost: $0.2 per million (first million free)
· Max 5 Minutes per execution
· An architecture with many Lambda functions can get complex; use AWS X-Ray to trace and monitor it
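The request pricing above ($0.20 per million, first million free) is simple arithmetic. The sketch below covers the request charge only; Lambda also bills for execution duration, which is not modelled. The function name `lambda_request_cost` is hypothetical.

```python
FREE_REQUESTS = 1_000_000   # first million requests are free (from the notes)
PRICE_PER_MILLION = 0.20    # USD per million requests (from the notes)


def lambda_request_cost(requests: int) -> float:
    """Return the monthly request charge in USD, ignoring duration charges."""
    billable = max(0, requests - FREE_REQUESTS)
    return billable / 1_000_000 * PRICE_PER_MILLION


for count in (500_000, 3_000_000):
    print(count, round(lambda_request_cost(count), 2))
```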
Route53
· EC2, security group doesn’t support IPV6
· All ROOT level domain names are managed by IANA (http://www.iana.org/domains/root/db)
· Domain names are managed by InterNIC (service of ICANN). Each domain becomes part of
WhoIS database
· NS Records are used by Top level Domain servers to direct traffic to the Content DNS server
which contains the authoritative DNS records.
· A Record is mapping of Domain name to IP address
· TTL: Time to live
· CNAME: Is domain to domain name mapping
· Alias record is used to map resources like S3, ELB to domain name
· Naked domain name must be A record, it can not be CNAME record
· Routing policies:
o Failover
o Simple
o Latency based
o Geolocation
o Weighted
· PTR Record is used for reversed domain name mapping
Route53
· Top level domain names (.com, .gov, .co) are managed by IANA (maintained in whois)
· Second level domain name (wu.com, viu.com ) are managed by registrar (like go daddy)
· A record points to IP address. Hence it can not point to load balancer
· CName (canonical name) points to another domain name
· Alias records maps to AWS resources (ELB, S3 Bucket, Cloudfront)
· CName can’t be used for naked domain names (without www). It must be either an A record or an Alias
· Alias record changes are realtime
· ELB doesn’t have public ip address by default
· Alias records are free
· Domain registration: May take up to 3 days
Database backups
• Automated Backup:
o Enabled by default
o Stored in S3.
o Free storage same as the size of DB instance.
o Backups are taken within defined window.
o It can cause elevation in latency during backup
o If DB is deleted, backups are deleted
• Snapshot:
o Manual – user initiated
o Stored even if db is deleted
· Restoring DB creates new DB instance with new endpoint
· Database encryption:
o Encryption of data at rest
o Uses AWS KMS
o Snapshot and backups will also be encrypted
o Can not encrypt existing database. Must create snapshot, then copy the snapshot and enable the
encryption in copy configuration
· You can restore to a specific point in time. It first finds the latest snapshot before that time, then
replays transaction logs up to the time specified
· Multi AZ Database:
o Creates real time backup in multiple availability zone
o It is meant only for Disaster recovery.
o Only if primary DB instance goes down, it will activate instance from another zone
o Available for: SQL Server, MySql, MariaDB, Oracle, PostgreSQL
o Synchronous
· Read Replicas:
o Used for read-intensive apps
o Creates replica of the database
o Can have different AZ or different region all together
o You can create read replica of read replica (but latency will be poor)
o Asynchronous
o Available for SQL Server, MySQL, MariaDB, PostgreSQL
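The point-in-time restore described above (find a snapshot, then run transaction logs up to the target time) can be sketched as a small planning function. This is a toy model, not how RDS is implemented; `plan_restore`, the integer timestamps and the statement strings are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def plan_restore(snapshot_times: Sequence[int],
                 log_entries: Sequence[Tuple[int, str]],
                 target: int) -> Tuple[int, List[str]]:
    """Pick the base snapshot and the log entries to replay for a restore.

    snapshot_times: sorted timestamps of available snapshots.
    log_entries: (timestamp, statement) pairs sorted by timestamp.
    Returns (snapshot_time, statements_to_replay).
    """
    idx = bisect_right(snapshot_times, target)
    if idx == 0:
        raise ValueError("no snapshot exists at or before the target time")
    base = snapshot_times[idx - 1]
    # Replay everything logged after the snapshot, up to and including the target.
    replay = [stmt for ts, stmt in log_entries if base < ts <= target]
    return base, replay


snapshots = [100, 200, 300]
logs = [(150, "INSERT a"), (210, "INSERT b"), (250, "UPDATE b"), (320, "DELETE a")]
print(plan_restore(snapshots, logs, target=260))
```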
DynamoDB
· Stored on SSD Storage
· Spread across 3 different AZ
· Eventually consistent read (default): Consistency across all copies is usually reached within a second.
Best read performance
· Strongly consistent read: Returns a result that reflects all writes that succeeded before the read
· Priced by provisionedReadCapacity, provisionedWriteCapacity and storage
· You can not add, edit secondary index on existing table. It must be created during table creation
· 5 Global, 5 local secondary indexes.
· Types of Indexes:
o Global Secondary Index: Its own primary and sort key. Own provisioned capacity.
o Local secondary Index: Same primary key as base table. Provisioned settings based on base table.
Redshift
· Used for OLAP – warehousing
· Columnar Database
· No need to specify indexes. Redshift deduces them automatically.
· Huge compression of the data
· Can be single node or multi-node (one leader node, up to 128 compute nodes)
· Redshift performs parallel query across all nodes, hence 10x faster
· Data is encrypted at rest using KMS
· In transit it uses SSL
· Priced as per compute node hour, backup, data transfer
· Single AZ
Cache
· Improves performance of read heavy / compute intensive by caching
· Memcached
· Redis: Supports data structures as sorted sets, lists. Supports master slave, read replication
across AZ
Aurora
· MySQL compliant
· 5 times faster than MySql, 1/10th cost of MySQL
· Starts with 10GB, scales to 64 TB
· Maintains 2 copies of the data in each of 3 AZs
· It is self-healing: data blocks and disks are continuously scanned for errors and repaired automatically
· You can have 15 Aurora replicas or 5 MySQL replicas
SQS
· Pull based, 256 KB size messages, 1 min to 14 days retention, default retention is 4 days
· Standard QUEUE:
o Default queue
o Unlimited number of messages per sec
o Makes sure that message is delivered at least once
o Unordered
· FIFO Queue
o Message is delivered exactly once
o Remains available until consumer processes it and deletes it
o Duplicates not allowed
o Supports multiple ordered message groups within single queue
o Limited to 300 transaction per second
· Visibility timeout:
o Time for which message remains invisible after reader reads it. If reader takes more time to
process the message than the visibility timeout, the message will appear again on queue and
some other ec2 will pick it up
o Default is 30 seconds, max 12 hours
· Long polling: Doesn’t return until it gets the message
· Short polling returns immediately even if it doesn’t get the message
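The visibility timeout behaviour above is easier to see in a toy model. The class below is an in-memory stand-in, not the SQS API; `ToyQueue`, its methods and the explicit `now` argument (a fake clock in seconds) are all hypothetical.

```python
from typing import Dict, Optional, Tuple


class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout."""

    def __init__(self, visibility_timeout: int = 30) -> None:
        self.visibility_timeout = visibility_timeout
        self._next_id = 0
        # message id -> (body, time at which the message becomes visible again)
        self._messages: Dict[int, Tuple[str, int]] = {}

    def send(self, body: str) -> int:
        self._next_id += 1
        self._messages[self._next_id] = (body, 0)
        return self._next_id

    def receive(self, now: int) -> Optional[Tuple[int, str]]:
        """Return one visible message and hide it for visibility_timeout seconds."""
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """A consumer deletes the message once processing has finished."""
        self._messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send("resize image 42")
first = queue.receive(now=0)     # consumer A takes the message
hidden = queue.receive(now=10)   # still invisible to consumer B
again = queue.receive(now=31)    # A was too slow, so the message reappears
print(first, hidden, again)
```

Because the message reappears when a consumer is slow, processing should be safe to repeat, or the timeout should be set longer than the slowest expected job.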
SWF
· SWF is webservice which allows co-ordination of work across distributed systems
· Media-processing, web backend, business process workflows can be designed as SWF .
· Tasks can be executable code, webservice call, human action, scripts etc
· AWS SWF maintains the execution state so workers and deciders don’t have to maintain it
· Task is assigned only once and never duplicated
· Retention period is measured in seconds, max is 1 year
· Actors can be:
o Workers: programs that get tasks, process it and return results
o Deciders: program that controls the coordination of tasks ie ordering, concurrency, scheduling
o Workflow starters
· Each workflow run in Domain. You can have multiple domain per account. Domain can’t interact
with each other
SNS
· Sends notification to android, iOS, windows phone, SMS, email, HTTP end point, Lambda
function, SQS
· Push based
API Gateway
· Fully managed
· Low cost
· You can throttle request
Kinesis
· Reads: 5 transactions per second, up to 2 MB per second, per shard
· Writes: 1000 records per second or up to 1 MB per second, per shard
· Retention is 24 hours by default can be increased to 7 days
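The limits above can be turned into a rough shard estimate. The sketch reads them as per-shard figures: 2 MB per second for reads, and 1000 records per second or 1 MB per second for writes. The function name `shards_needed` and the sample workload are hypothetical.

```python
import math

WRITE_RECORDS_PER_SHARD = 1000   # records per second
WRITE_MB_PER_SHARD = 1.0         # MB per second in
READ_MB_PER_SHARD = 2.0          # MB per second out


def shards_needed(records_per_sec: float,
                  write_mb_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Return the smallest shard count that satisfies every per-shard limit."""
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD)
    by_write = math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD)
    by_read = math.ceil(read_mb_per_sec / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_records, by_write, by_read)


print(shards_needed(records_per_sec=2500, write_mb_per_sec=1.5, read_mb_per_sec=5))
```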
VPC
· Default mode for VPC is multi-tenant. You can configure it for dedicated hardware too. But
expensive.
· Whenever VPC is created, it creates default route, security group, network ACL
· Security Group
o Outbound traffic is all allowed by default
o For default security group, all inbound from same security group is allowed
o For custom security group, all inbound is blocked by default.
· NACL
o For default NACL, all inbound and outbound is allowed
o Custom NACL, blocks all inbound and outbound by default
· Subnet:
o To create subnet you have to specify CIDR block, the VPC and the AZ
o Whenever new Subnet is created, by default it gets associated with main route table
o For any subnet you get 5 IPs less than CIDR. As these are reserved for: Network address, VPC
router, DNS, Future use, broadcast reserved address
o You have to create Internet Gateway and attach it to VPC. Only one IG can be assigned to one VPC.
· Route Table:
o you need to specify VPC.
o Then specify Route (Destination and target)
o To allow internet connectivity traffic, use destination as 0.0.0.0/0 (::/0 for IPV6) and target as IG
o Then associate subnet to the route.
· To make subnet public
o Create Internet Gateway, attach it to VPC
o Create Route table with public route (destination = any and target = IG)
o Associate subnet with the route
o Go to the subnet section and enable auto-assign public IP
· To allow out going traffic from private subnet:
o Create NAT instance in the public subnet, by launching ANY EC2 instance and selecting NAT AMI
during launch
o Disable source / destination check on the EC2 instance
o Allow all required traffic in the security group
o In the route table create route for any outgoing request to point to NAT instance
o Associate private subnet with the route
· NAT Gateway
o Does the same work as NAT instance, but it’s fully managed by AWS
o To create NAT gateway, specify public subnet
o Operates only on single AZ. So you will need one NAT gateway per AZ
· NACL (Network Access Control List)
o NACL is a numbered list of rules to Allow or Deny inbound or outbound traffic to a specified host and port
o Whenever you create new NACL, by default it blocks all traffic. But when it is created automatically
as a part of VPC creation, it allows all traffic by default.
o Each subnet must be associated with NACL
o Same NACL can be associated with multiple subnet, but each subnet can be associated with only
one NACL
o NACL is stateless. Ie inbound and outbound rules are applied separately
· Load balancer in VPC
o When you create load balancer, you must specify VPC
o You can specify only one subnet from each availability zone
o You must specify subnets from at-least two AZ for High availability
o Load balancer must be created in a public subnet, else it gives a warning that it won’t be useful
· VPC flow logs
o Creates network traffic logs and push it to cloudwatch
o Can be created at 3 levels:
· VPC
· Subnet
· Network Interface level
o To enable flow logs:
· Go to VPC -> actions -> create flow logs
· Select filter over logs (all / rejected / accepted)
· Create Role to grant access to push data to cloudwatch
· Create log group in cloudwatch and map this group in flow log settings
· You can export cloudwatch log group to S3, stream to lambda, stream to elastic search
o You can not enable flow logs for VPC which are peered to your VPC, but are not under your
account
o You can not tag flow logs
o You can not change settings for flow log
o Following traffic is NOT monitored:
· Communication with AWS DNS
· Windows instance activation
· Traffic to and from 169.254.169.254 for instance metadata
· DHCP traffic
· Traffic to reserved IP address for default VPC router
· VPC endpoint
· You can create VPC endpoint to communicate to other AWS services over private network
o If you have to upload file to S3 from private network, it has to go through NAT instance, which is
in public network. Instead you can create VPC endpoint, which creates entry point to AWS service
like s3. And map it to the private subnet. It gets automatically added in the route table.
· Cleaning up VPC
o Delete EC2
o Delete NAT gateway
o Delete endpoint
o Detach IG (can not be done before Deleting NAT gateway)
o Delete IG
o Go to Actions - > delete VPC
o It will delete subnets, security groups, Network ACL, VPC Attachments, IG, Route Tables, Network
interfaces, VPC Peering connections
· If subnet only routes traffic to VPG, it’s VPN only subnet
· Route tables determine where traffic is directed. They are what lets EC2 instances in different subnets talk to each other
· Minimum size of cidr in vpc is /28
· Max size is /16
· CIDR of VPC can not be changed
· When you create VPC, all subnet can connect each other by default
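The subnet sizing rules above (5 reserved addresses per subnet, CIDR blocks between /16 and /28) can be checked with the standard `ipaddress` module. The function name `usable_hosts` and the sample CIDR blocks are hypothetical, and the sketch is meant for IPv4 blocks only.

```python
import ipaddress

# Network address, VPC router, DNS, future use, broadcast.
AWS_RESERVED_PER_SUBNET = 5


def usable_hosts(cidr: str) -> int:
    """Return how many addresses in a subnet are left for instances."""
    net = ipaddress.ip_network(cidr)
    # Blocks range from /16 (largest) to /28 (smallest).
    if not 16 <= net.prefixlen <= 28:
        raise ValueError("prefix length must be between /16 and /28")
    return net.num_addresses - AWS_RESERVED_PER_SUBNET


for block in ("10.0.1.0/24", "10.0.0.0/28"):
    print(block, usable_hosts(block))
```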
VPC Peering
· Connection between two VPC of same or diff AWS account
· Must be done between same region
· CIDR must be different for these VPCs. It can not have overlapping IP addresses
· Transitive VPC not supported
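The non-overlapping CIDR requirement above can be verified before requesting a peering connection. A minimal sketch using the standard `ipaddress` module; the function name `can_peer` and the sample CIDR blocks are hypothetical.

```python
import ipaddress


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Return True when two VPC CIDR blocks share no addresses."""
    a = ipaddress.ip_network(cidr_a)
    b = ipaddress.ip_network(cidr_b)
    return not a.overlaps(b)


print(can_peer("10.0.0.0/16", "10.1.0.0/16"))     # separate ranges
print(can_peer("10.0.0.0/16", "10.0.128.0/20"))   # second block sits inside the first
```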
Direct Connect
· Dedicated line between on-premises and AWS (no internet)
· Reduce cost for large data transfer
· Increase reliability
· Increase bandwidth
· Available in 1 Gbps and 10Gbps
· Uses VLAN trunking
Security Token Service (STS)
· Grants users temporary access to AWS. User can come from three sources
o Federation (Active Directory)
§ Uses SAML
§ Doesn’t need to be IAM user
o SSO
o Cross account
AWS Workspaces
· It’s VDI (virtual desktop infrastructure). It’s a replacement for a traditional desktop
· It’s available as a bundle of software, compute resources, storage
· User can connect to workspace from any device(PC, Mac, Android tablet, Kindle Fire,
chromebook)
· By default get Admin access
· D:\ is backed up every 12 hours
· You do not need AWS account to login to workspace. Either account created by administrator or
Active directory login can work