DevOps - Module 2 Notes
AWS DevOps
AWS(Amazon Web Services) provides services that help in implementing DevOps methodology.
DevOps Engineering on AWS teaches you how to use the combination of DevOps cultural
philosophies, practices, and tools to increase your organization’s ability to develop, deliver, and
maintain applications and services at high velocity on AWS.
Identity and access management (IAM) is the discipline that enables the right individuals to access
the right resources at the right times for the right reasons. IAM is a framework of business processes,
policies and technologies that facilitates the management of electronic or digital identities. With an
IAM framework in place, information technology (IT) managers can control user access to critical
information within their organizations.
Features of IAM
● Centralised control of your AWS account: You can control the creation, rotation, and
revocation of each user's security credentials. You can also control what data in the AWS
system users can access and how they can access it.
● Shared Access to your AWS account: Users can share the resources for the collaborative
projects.
● Granular permissions: You can grant a user permission to use a particular service
but not other services.
● Identity Federation: An Identity Federation means that we can use Facebook, Active
Directory, LinkedIn, etc with IAM. Users can log in to the AWS Console with the same
username and password as we log in with the Active Directory, Facebook, etc.
● Multi-Factor Authentication: AWS provides multi-factor authentication, requiring a
username, password, and a security code to log in to the AWS Management
Console.
● Permissions based on Organizational groups: Users can be restricted to the AWS access
based on their job duties, for example, admin, developer, etc.
● Networking controls: IAM can also restrict access so that users reach AWS resources only
from within the organization's corporate network.
● Provide temporary access for users/devices and services where necessary: For example,
a mobile app that stores data in an AWS account should use temporary security credentials
rather than long-term access keys.
● Integrates with many different AWS services: IAM is integrated with many different AWS
services.
● Eventually Consistent: The IAM service is eventually consistent, as it achieves high availability
by replicating data across multiple servers within Amazon's data centers around the
world.
● Free to use: AWS IAM is a feature of AWS accounts that is offered at no additional
charge. You are charged only when your IAM users access other AWS services.
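As a small illustration of granular permissions, the sketch below uses boto3 (the AWS SDK for Python) to create an IAM user and attach an inline policy that allows read-only access to a single bucket. The user name, policy name, and bucket name are hypothetical placeholders, not values from these notes.

```python
# Hedged sketch: create an IAM user and grant read-only access to one bucket.
# "demo-user" and "example-bucket" are hypothetical placeholders.
import json
import boto3

iam = boto3.client("iam")

# Centralised control: the user and its credentials are managed in one place.
iam.create_user(UserName="demo-user")

# Granular permission: allow only read access to a single bucket's objects.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

iam.put_user_policy(
    UserName="demo-user",
    PolicyName="ExampleBucketReadOnly",
    PolicyDocument=json.dumps(read_only_policy),
)
```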
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-
leading scalability, data availability, security, and performance. Amazon S3 provides management
features so that you can optimize, organize, and configure access to your data to meet your specific
business, organizational, and compliance requirements.
S3 is a safe place to store files. The files stored in S3 can range from 0 bytes to 5 TB in size. It
has unlimited storage, which means that you can store as much data as you want. Files are stored
in buckets; a bucket is like a folder in S3 that stores the files. S3 uses a universal namespace,
i.e., bucket names must be globally unique. Each bucket gets a DNS address, which is why the bucket
name must be unique in order to generate a unique DNS address.
If you upload a file to an S3 bucket successfully, you will receive an HTTP 200 response code.
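A minimal upload sketch with boto3, assuming a bucket you own with the hypothetical name example-unique-bucket-name already exists; it checks for the HTTP 200 response mentioned above.

```python
# Hedged sketch: upload an object to S3 and check for the HTTP 200 response.
# "example-unique-bucket-name" is a placeholder; bucket names must be globally unique.
import boto3

s3 = boto3.client("s3")

response = s3.put_object(
    Bucket="example-unique-bucket-name",
    Key="hello.txt",                 # the object key (its name in the bucket)
    Body=b"Hello from S3!",          # the object value (the bytes stored)
)

status = response["ResponseMetadata"]["HTTPStatusCode"]
print("Upload succeeded" if status == 200 else f"Unexpected status: {status}")
```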
Amazon S3 Features
● Low cost and Easy to Use − Using Amazon S3, the user can store a large amount of data at very low charges.
● Secure − Amazon S3 supports data transfer over SSL, and the data is encrypted automatically once it is uploaded. The user has complete control over their data by configuring bucket policies using AWS IAM.
● Scalable − With Amazon S3, there is no need to worry about storage capacity. You can store as much data as you have and access it anytime.
● Higher performance − Amazon S3 is integrated with Amazon CloudFront, which distributes content to end users with low latency and provides high data transfer speeds without any minimum usage commitments.
● Integrated with AWS services − Amazon S3 integrates with AWS services including Amazon CloudFront, Amazon CloudWatch, Amazon Kinesis, Amazon RDS, Amazon Route 53, Amazon VPC, AWS Lambda, Amazon EBS, Amazon DynamoDB, etc.
Advantages of Amazon S3
● Create Buckets: Firstly, we create a bucket and provide a name to the bucket. Buckets are
the containers in S3 that store the data. Buckets must have a unique name to generate a
unique DNS address.
● Storing data in buckets: Buckets can be used to store an infinite amount of data. You can
upload as many files as you want into an Amazon S3 bucket, i.e., there is no maximum
limit on the number of files. Each object can contain up to 5 TB of data. Each object is stored
and retrieved using a unique developer-assigned key.
● Download data: You can also download your data from a bucket and can also give
permission to others to download the same data. You can download the data at any time
whenever you want.
● Permissions: You can also grant or deny access to others who want to download or upload
the data from your Amazon S3 bucket. Authentication mechanism keeps the data secure
from unauthorized access.
● Standard interfaces: S3 provides standard REST and SOAP interfaces
that are designed to work with any development toolkit.
● Security: Amazon S3 offers security features that protect your data from
unauthorized access.
S3 Object
● Key: It is simply the name of the object. For example, hello.txt, spreadsheet.xlsx, etc. You
can use the key to retrieve the object.
● Value: It is simply the data that is made up of a sequence of bytes. It is actually data inside
the file.
● Version ID: Together with the key, the version ID uniquely identifies a specific version of an
object. It is a string generated by S3 when you add an object to the S3 bucket.
● Metadata: It is the data about data that you are storing. A set of a name-value pair with
which you can store the information regarding an object. Metadata can be assigned to the
objects in the Amazon S3 bucket.
● Subresources: The subresource mechanism is used to store additional object-specific information.
● Access control information: You can put the permissions individually on your files.
Amazon S3 Concepts
● Buckets
○ A bucket is a container used for storing the objects.
○ Every object is incorporated in a bucket.
○ For example, if the object named photos/tree.jpg is stored in the tree image bucket,
then it can be addressed by using the URL
http://treeimage.s3.amazonaws.com/photos/tree.jpg.
○ A bucket has no limit on the number of objects that it can store. No bucket can exist
inside another bucket.
○ S3 performance remains the same regardless of how many buckets have been
created.
○ The AWS user that creates a bucket owns it, and no other AWS user can own it.
Therefore, we can say that the ownership of a bucket is not transferable.
○ The AWS account that creates a bucket can delete a bucket, but no other AWS user
can delete the bucket.
● Objects
○ Objects are the entities which are stored in an S3 bucket.
○ An object consists of object data and metadata where metadata is a set of name-value
pairs that describes the data.
○ An object consists of some default metadata such as date last modified, and standard
HTTP metadata, such as Content type. Custom metadata can also be specified at the
time of storing an object.
○ It is uniquely identified within a bucket by key and version ID.
● Key
○ A key is a unique identifier for an object.
○ Every object in a bucket is associated with one key.
○ An object can be uniquely identified by using a combination of bucket name, the
key, and optionally version ID.
○ For example, in the URL http://jtp.s3.amazonaws.com/2019-01-31/Amazons3.wsdl,
"jtp" is the bucket name and "2019-01-31/Amazons3.wsdl" is the key.
● Regions
○ You can choose a geographical region in which you want to store the buckets that
you have created.
○ A region is chosen in such a way that it optimizes the latency, minimizes costs or
addresses regulatory requirements.
○ Objects will not leave the region unless you explicitly transfer the objects to another
region.
● Data Consistency Model
Amazon S3 replicates the data to multiple servers to achieve high availability.
Two types of model:
○ Read-after-write consistency for PUTS of new objects.
■ For a PUT request, S3 stores the data across multiple servers to achieve high
availability.
■ A process that stores a new object to S3 can immediately read that
object.
■ A process that stores a new object to S3 will immediately see the new key
listed within the bucket.
■ There is no propagation delay; the changes are reflected immediately.
○ Eventual consistency for overwrite PUTS and DELETES
■ For PUTS that overwrite existing objects and for DELETES, the changes are
reflected eventually; they are not available immediately.
■ If a process replaces an existing object with a new object and you try to read
it immediately, S3 might return the prior data until the change is fully
propagated.
■ If a process deletes an existing object and you immediately try to read it,
S3 might return the deleted data until the change is fully propagated.
■ If a process deletes an existing object and you immediately list all the keys
within the bucket, S3 might still return the deleted key until the change is
fully propagated.
AWS Storage Classes
● S3 Standard
● S3 Standard IA
● S3 one zone-infrequent access
● S3 Glacier
S3 Standard
● Standard storage class stores the data redundantly across multiple devices in multiple
facilities.
● It is designed to sustain the loss of 2 facilities concurrently.
● Standard is the default storage class if no storage class is specified during upload.
● It provides low latency and high throughput performance.
● It is designed for 99.99% availability and 99.999999999% durability.
S3 One Zone-Infrequent Access (One Zone-IA)
● The S3 One Zone-Infrequent Access storage class is used when data is accessed less frequently but
requires rapid access when needed.
● It stores the data in a single availability zone, while other storage classes store the data in a
minimum of three availability zones. Due to this reason, its cost is 20% less than the Standard-
IA storage class.
● It is an optimal choice for less frequently accessed data that does not require the
availability of the Standard or Standard-IA storage class.
● It is a good choice for storing the backup data.
● It is cost-effective storage which is replicated from other AWS region using S3 Cross Region
replication.
● It has the same durability, high performance, and low latency, with a low storage price and
low retrieval fee.
● It is designed for 99.5% availability and 99.999999999% durability of objects in a single
availability zone.
● It provides lifecycle management for the automatic migration of objects to other S3 storage
classes.
● The data can be lost at the time of the destruction of an availability zone as it stores the data
in a single availability zone.
S3 Glacier
● S3 Glacier storage class is the cheapest storage class, but it can be used for archive only.
● You can store any amount of data at a lower cost than other storage classes.
● S3 Glacier provides three types of models:
○ Expedited: In this model, data is retrieved within a few minutes, and it has a very high retrieval fee.
○ Standard: The retrieval time of the standard model is 3 to 5 hours.
○ Bulk: The retrieval time of the bulk model is 5 to 12 hours.
● You can upload the objects directly to the S3 Glacier.
● It is designed for 99.999999999% durability of objects across multiple availability zones.
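The storage class is chosen per object at upload time. The sketch below, reusing the same hypothetical bucket as earlier, uploads one object to S3 Standard (the default), one to One Zone-IA, and one to Glacier.

```python
# Hedged sketch: choosing a storage class per object at upload time.
# "example-unique-bucket-name" is a hypothetical bucket name.
import boto3

s3 = boto3.client("s3")
bucket = "example-unique-bucket-name"

# Default: S3 Standard (no StorageClass argument needed).
s3.put_object(Bucket=bucket, Key="frequent/report.csv", Body=b"...")

# Infrequently accessed, single-AZ copy: One Zone-IA.
s3.put_object(Bucket=bucket, Key="backups/report.csv", Body=b"...",
              StorageClass="ONEZONE_IA")

# Archive-only data: Glacier (a retrieval is required before reading it back).
s3.put_object(Bucket=bucket, Key="archive/report-2019.csv", Body=b"...",
              StorageClass="GLACIER")
```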
AWS CloudFront
Amazon CloudFront is the content delivery network (CDN) service of Amazon. The CloudFront
network has 197 points of presence (PoPs). CloudFront provides low latency and high data transfer
speeds. Content to be distributed can be published in the origin server (S3 for static content, EC2
for dynamic content). Origin servers can be registered with Amazon CloudFront through an API
call/console. This will return a CloudFront.net domain name (e.g., test765.cloudfront.net) that can
be used to distribute content via the Amazon CloudFront service.
CloudFront CDN content is organized into distributions, where content and delivery properties are
configured. The steps to create a distribution are as follows:
● Log in to the AWS CloudFront console
● Choose ”Create Distribution”.
● Specify the distribution properties:
● Content origin— S3 bucket/MediaPackage channel/HTTP server from which CloudFront
gets the files for distribution.
● Access—Which users/groups have access to the content.
● Security—e.g., Users must use HTTPS to access your content.
● Cookie or query-string forwarding—whether you want CloudFront to forward cookies or
query strings to your origin.
● Geo-restrictions—Restrict access in selected geographies.
● Access logs—Create access logs for analysis.
Origin Domain Name: It defines where the origin is coming from. The origin domain name is
jtpbucket.s3.amazonaws.com, in which jtpbucket is the bucket that we created in S3.
Origin Path: There can be multiple origins in a distribution. An origin path is a folder in the S3 bucket.
You can add folders to the S3 bucket and specify one of them as the Origin Path, which means that the origin
comes from that folder rather than from the root of the bucket. Here we leave the Origin Path at its default value.
Origin ID: It is the name of the origin. In our case, the name of the origin is S3-jtpbucket.
Restrict Bucket Access: If you don't want the bucket to be publicly accessible by the S3 URL and
you want that all requests must go through CloudFront, then enable the Restrict Bucket Access
condition.
Origin Access Identity: We do not have any existing identity, so we select Create a new
identity.
Grant Read Permissions on Bucket: Either you can manually update the permissions, or you can have
them updated automatically; here we select Yes, Update Bucket Policy.
● After the Distribution has been created, we get the domain name of the CloudFront
Distribution and we also know the object name that we have placed in the S3 bucket. Now,
the link can be created as given below:
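A hedged sketch of building that link with boto3: it looks up the distribution's domain name and appends the object key. The domain shown in the comment is hypothetical; in practice you copy the real one from the CloudFront console or API, and the sketch assumes at least one distribution already exists.

```python
# Hedged sketch: build the CloudFront URL for an object stored in the S3 origin.
# A real domain looks like "d1234abcd5678.cloudfront.net" (hypothetical here).
import boto3

cloudfront = boto3.client("cloudfront")

# Look up the first distribution's domain name (assumes one already exists).
distributions = cloudfront.list_distributions()["DistributionList"]["Items"]
domain = distributions[0]["DomainName"]

object_key = "hello.txt"   # the object placed in the S3 bucket
print(f"https://{domain}/{object_key}")
```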
Amazon EC2
An AWS user can increase or decrease instance capacity as needed within minutes using the
Amazon EC2 web interface or an application programming interface (API). A developer can code
an application to scale instances automatically with AWS Auto Scaling. A developer can also define
an autoscaling policy and group to manage multiple instances at once.
How EC2 works
To begin using EC2, developers sign up for an account at Amazon's AWS website. They can then
use the AWS Management Console, the AWS Command Line Tools (CLI), or AWS Software
Developer Kits (SDKs) to manage EC2.
A developer then chooses EC2 from the AWS Services dashboard and selects 'launch instance' in the EC2
console. At this point, they select either an existing Amazon Machine Image (AMI) template or create an
AMI containing an operating system, application programs, and configuration settings. The AMI is
then uploaded to Amazon S3 and registered with Amazon EC2, creating an AMI identifier. Once
this has been done, the subscriber can requisition virtual machines on an as-needed basis.
Data only remains on an EC2 instance while it is running, but a developer can use an Amazon Elastic
Block Store volume for an extra level of durability and Amazon S3 for EC2 data backup.
EC2 also offers Amazon CloudWatch which monitors Amazon cloud applications and resources,
allowing users to set alarms, view graphs, and get statistics for AWS data; and AWS Marketplace,
an online store where users can buy and sell software that runs on AWS.
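A minimal launch sketch with boto3, assuming an AMI ID, key pair, and security group that already exist in your account; the IDs shown are placeholders.

```python
# Hedged sketch: launch a single EC2 instance from an AMI.
# The AMI ID, key pair name, and security group ID below are placeholders.
import boto3

ec2 = boto3.client("ec2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # AMI registered with EC2
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",             # existing EC2 key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],
)

instance_id = response["Instances"][0]["InstanceId"]
print("Launched instance:", instance_id)
```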
Benefits
Getting started with EC2 is easy, and because EC2 is controlled by APIs, developers can commission
any number of server instances at the same time to quickly increase or decrease capacity. EC2 allows
for complete control of instances, which makes operating them as simple as if the machine were in-house.
The flexibility of multiple instance types, operating systems, and software packages and the fact
that EC2 is integrated with most AWS Services -- S3, Relational Database Service (RDS), Virtual
Private Cloud (VPC) -- makes it a secure solution for computing, query processing, and cloud
storage.
Challenges
Resource utilization -- developers must manage the number of instances they have to avoid costly
large, long-running instances.
Security -- developers must make sure that public-facing instances are running securely.
Deploying at scale -- running a multitude of instances can result in cluttered environments that are
difficult to manage.
Management of AMI lifecycle -- developers often begin by using default Amazon Machine
Images. As computing needs change, custom configurations will likely be required.
Ongoing maintenance -- Amazon EC2 instances are virtual machines that run in Amazon's cloud.
However, they ultimately run on physical hardware which can fail. AWS alerts developers when an
instance must be moved due to hardware maintenance. This requires ongoing monitoring.
Route53
Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service. It
is designed for developers and corporates to route the end users to Internet applications by
translating human-readable names like www.mydomain.com, into the numeric IP addresses like
192.0.2.1 that computers use to connect to each other.
How to Configure Amazon Route 53?
Following are the steps to configure Route 53.
Step 1 − Sign in to the AWS Management Console and open the Amazon Route 53 console.
Step 2 − Click the create hosted zone option on the top left corner of the
navigation bar.
Step 3 − A form page opens. Provide the required details such as domain
name and comments, then click the Create button.
Step 4 − A hosted zone for the domain will be created. There will be four
DNS endpoints called a delegation set, and these endpoints must be
updated in the domain name's Nameserver settings.
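The same steps can be scripted. A hedged boto3 sketch, assuming you own the example domain mydomain.com; the response contains the four name servers of the delegation set mentioned in Step 4.

```python
# Hedged sketch: create a hosted zone and print its delegation set (name servers).
# "mydomain.com" is the example domain used in these notes.
import uuid
import boto3

route53 = boto3.client("route53")

response = route53.create_hosted_zone(
    Name="mydomain.com",
    CallerReference=str(uuid.uuid4()),   # unique token that makes the call idempotent
    HostedZoneConfig={"Comment": "Hosted zone for mydomain.com"},
)

# These four endpoints must be set in the registrar's Nameserver settings.
for ns in response["DelegationSet"]["NameServers"]:
    print(ns)
```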
Features of Route 53
Databases on AWS
● Amazon Relational Database Service: It supports six commonly used database engines.
● Amazon Aurora: It is a MySQL-compatible relational database with up to five times the
performance of standard MySQL.
● Amazon DynamoDB: It is a fast and flexible NoSQL database service.
● Amazon Redshift: It is a petabyte-scale data warehouse service.
● Amazon Elasticache: It is an in-memory cache service with support for Memcached and
Redis.
● AWS Database Migration Service: It is a service that makes it easy and inexpensive to
migrate your databases to the AWS cloud.
● Relational databases are the databases that most of us are used to. They have been around
since the ‘70s.
● A relational database is like a spreadsheet such as Excel, etc.
● A Database consists of tables. For example, Excel is a spreadsheet that consists of a
workbook, and inside the workbook, you have different sheets, and these sheets are made
up of rows and columns.
Oracle
● It is a very popular relational database.
● It is used by big enterprises but can be used by other businesses as well.
● Oracle is a Relational Database Management System (RDBMS) developed by Oracle.
● It is easy to set up, operate, and scale Oracle deployment in the cloud.
● You can deploy multiple editions of Oracle in minutes with cost-effective and re-sizable
hardware capacity.
● Amazon RDS frees you from time-consuming database administration tasks so that
you can focus on development.
● You can run Oracle under two different licensing models, i.e., "License Included" and
"Bring-Your-Own-License".
Where,
License Included Model: In this model, you do not need to purchase an Oracle license separately;
the Oracle Database software is licensed by AWS. The pricing starts at $0.04 per hour.
Bring-Your-Own-License (BYOL): If you own Oracle Database License, then you can use the
BYOL model to run Oracle database on Amazon RDS. The pricing starts at $0.025 per hour. This
model is used by those customers who already have an existing Oracle license or purchase the new
license to run the Oracle database on Amazon RDS.
MySQL Server
PostgreSQL
Aurora
MariaDB
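A hedged boto3 sketch that provisions a small Oracle instance on Amazon RDS under the License Included model discussed above; the identifier, instance class, credentials, and storage size are placeholders chosen for illustration.

```python
# Hedged sketch: create an Oracle DB instance on Amazon RDS ("License Included").
# Identifier, credentials, and sizes below are placeholders.
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="demo-oracle-db",
    Engine="oracle-se2",                  # Oracle Standard Edition Two
    LicenseModel="license-included",      # or "bring-your-own-license" (BYOL)
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=20,                  # storage in GiB
    MasterUsername="admin",
    MasterUserPassword="ChangeMe123!",    # never hard-code real credentials
)
```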
What is Elasticache?
● Elasticache is a web service used to deploy, operate, and scale an in-memory cache in the
cloud.
● It improves the performance of web applications by allowing you to retrieve information
from fast, managed in-memory cache instead of relying entirely on slower disk-based
databases.
● For example, if you are running an online business, customers continuously ask for the
details of a particular product. Instead of the front end always querying the database for
product information, you can cache the data using Elasticache.
● It is used to improve latency and throughput for many read-heavy application workloads
(such as social networking, gaming, media sharing, and Q&A portals) or compute intensive
workloads (such as a recommendation engine).
● Caching improves application performance by storing critical pieces of data in memory for
low latency access.
● Cached information may include the results of I/O-intensive database queries or the results
of computationally-intensive calculations.
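The pattern described above is usually called cache-aside: check the cache first and fall back to the database only on a miss. A hedged sketch using the redis-py client against a hypothetical ElastiCache Redis endpoint; get_product_from_db is a stand-in for the slower disk-based database query.

```python
# Hedged sketch: cache-aside lookup of product details with Redis.
# The endpoint host below is a hypothetical ElastiCache address.
import json
import redis

cache = redis.Redis(host="my-cache.abc123.ng.0001.use1.cache.amazonaws.com", port=6379)

def get_product_from_db(product_id):
    # Placeholder for the slow, disk-based database query.
    return {"id": product_id, "name": "Digital Radio", "price": 49.99}

def get_product(product_id):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:                       # cache hit: served from memory
        return json.loads(cached)
    product = get_product_from_db(product_id)    # cache miss: go to the database
    cache.setex(key, 300, json.dumps(product))   # keep it cached for 5 minutes
    return product
```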
Types of Elasticache
● Memcached
● Redis
Memcached
Benefits of Memcached
● Sub-millisecond response times
Since Memcached stores the data in the server's main memory, in-memory stores don't have to go
to disk for the data. Therefore, it has a faster response time and also supports millions of operations
per second.
● Simplicity
The design of Memcached is very simple, which makes it powerful and easy to use in application
development. It supports many languages such as Java, Ruby, Python, C, C++, etc.
● Scalability
The architecture of Memcached is distributed and multithreaded, which makes it easy to scale. You can
split the data among a number of nodes, which enables you to scale out capacity by adding new
nodes. Because it is multithreaded, you can also scale up the compute capacity.
● Community
Memcached is supported by a vibrant open-source community.
● Caching
It implements a high-performance in-memory cache, which decreases data access latency,
increases throughput, and eases the load on your back-end system. It serves cached items in less than a
millisecond and also enables you to easily and cost-effectively scale for higher loads.
● Session store
It is commonly used by application developers to store and manage session data for internet-
based applications. It provides sub-millisecond latency and the scale required to manage session
data such as user profiles, credentials, and session state.
Redis
Working of Redis
● Redis keeps its data in memory instead of storing the data on disks or SSDs. Therefore, it
eliminates the need to access the data from disk.
● It avoids seek time delays, and data can be accessed in microseconds.
● It is an open-source in-memory key-value data store that supports data structures such as
sorted sets and lists.
Benefits of Redis
● In-memory datastore
○ Redis stores the data in memory, while databases such as PostgreSQL,
MongoDB, etc. store the data on disk.
○ Because it does not have to read the data from disk, it has a faster response time.
○ It takes less than a millisecond for read and write operations, and it supports
millions of requests per second.
● Flexible data structures & Simplicity
○ It supports a variety of data structures to meet your application needs.
○ It allows you to write fewer lines of code to store, access, and use data in your
applications.
○ For example, if your application's data is stored in a hashmap and you want to
persist it in a data store, you can use the Redis hash data structure to store the data.
If the data store had no hash data structure, you would need to write many
lines of code to convert the data from one format to another.
● Replication and Persistence
○ It provides a primary-replica architecture in which data is replicated to multiple
servers.
○ This improves read performance and enables faster recovery when any server experiences
a failure.
○ It also supports persistence by providing point-in-time backups, i.e., copying the data
set to disk.
● High availability and scalability
○ It lets you build highly available solutions with consistent performance and reliability.
○ Various options are available to adjust your cluster size, such as scale in, scale out,
or scale up. In this way, the cluster size can be changed according to demand.
● Extensibility
○ It is an open-source project supported by a vibrant community.
Memcached vs Redis comparison:

| Feature | Memcached | Redis |
| --- | --- | --- |
| Developer ease of use | Its syntax is simple to understand and use. | Its syntax is simple to understand and use. |
| Advanced data structures | It does not support advanced data structures. | It supports various advanced data structures such as sets, sorted sets, hashes, bit arrays, etc. |
| Multithreaded architecture | It supports a multithreaded architecture, meaning it has multiple processing cores. This allows you to handle multiple operations by scaling up the compute capacity. | It does not support a multithreaded architecture. |
| Snapshots | It does not support snapshots. | Redis also keeps the data on disk as a point-in-time backup to recover from faults. |
| Lua scripting | It does not support Lua scripting. | It allows you to execute Lua scripts, which can boost performance and simplify the application. |
What is DynamoDB?
● Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that
require consistent single-digit millisecond latency at any scale.
● It is a fully managed database that supports both document and key-value data models.
● Its flexible data model and performance make it a great fit for mobile, web, gaming, ad-tech,
IoT, and many other applications.
● Data is stored on SSD storage.
● Data is spread across three geographically distinct data centres.
Because the data is replicated across three geographically distinct data centres, DynamoDB offers two
different types of consistency models:
Eventually Consistent Reads: DynamoDB maintains consistency across all copies of the data, which is
usually achieved within a second. If you read data from a DynamoDB table immediately after a write, the
response might not reflect the most recently completed write operation; if you repeat the read after a
short period, the response returns the latest update. This model gives the best read performance.
Strongly Consistent Reads: A strongly consistent read returns a result that reflects all writes that
received a successful response prior to the read.
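The choice between the two models is made per read request. A hedged sketch, assuming a hypothetical Products table whose partition key is ProductId.

```python
# Hedged sketch: eventually consistent vs strongly consistent reads in DynamoDB.
# "Products" and its key schema are hypothetical.
import boto3

dynamodb = boto3.client("dynamodb")
key = {"ProductId": {"S": "radio-001"}}

# Default: eventually consistent read (best read performance, may lag a recent write).
eventual = dynamodb.get_item(TableName="Products", Key=key)

# Strongly consistent read: reflects all successful writes made before the read.
strong = dynamodb.get_item(TableName="Products", Key=key, ConsistentRead=True)

print(eventual.get("Item"), strong.get("Item"))
```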
DynamoDB throughput capacity depends on the read/write capacity modes for performing
read/write operation on tables.
There are two types of read/write capacity modes:
● Provisioned mode
● On-demand mode
Provisioned mode
● It defines the maximum amount of capacity that an application can use from a specified
table.
● In a provisioned mode, you need to specify the number of reads and writes per second
required by the application.
● If the limit of the provisioned throughput capacity is exceeded, then requests are
throttled.
● A provisioned mode is good for applications that have predictable and consistent traffic.
● The total number of write capacity units required depends on the item size.
● Only 1 write capacity unit is required for an item up to 1 KB in size.
● DynamoDB requires additional write capacity units when the item size is greater than 1 KB. For
example, if an item size is 2 KB, two write capacity units are required to perform 1 write per
second.
● For example, if you create a table with 20 write capacity units, then you can perform 20
writes per second for an item up to 1KB in size.
On-Demand mode
● DynamoDB on-demand mode has a flexible new billing option which is capable of serving
thousands of requests per second without any capacity planning.
● On-Demand mode offers pay-per-request pricing for read and write requests so that you need
to pay only for what you use, thus, making it easy to balance costs and performance.
● In On-Demand mode, DynamoDB accommodates the customer's workload instantly as the
traffic level increases or decreases.
● On-Demand mode supports all the DynamoDB features (such as encryption, point-in-time
recovery, etc.) except auto scaling.
● If you do not perform any read/write, then you just need to pay for data storage only.
● On-Demand mode is useful for applications that have unpredictable traffic whose
volume is difficult to forecast.
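The capacity mode is chosen when the table is created (and can be switched later). A hedged sketch creating the same hypothetical Products table in provisioned mode, with the on-demand alternative shown in a comment.

```python
# Hedged sketch: create a DynamoDB table in provisioned capacity mode.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Products",                                     # hypothetical table
    AttributeDefinitions=[{"AttributeName": "ProductId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "ProductId", "KeyType": "HASH"}],
    # Provisioned mode: e.g. 20 writes/sec for items up to 1 KB each.
    ProvisionedThroughput={"ReadCapacityUnits": 20, "WriteCapacityUnits": 20},
    # For on-demand mode, drop ProvisionedThroughput and instead pass:
    #   BillingMode="PAY_PER_REQUEST",
)
```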
What is Aurora?
Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database engine built for the cloud and offered as part of Amazon RDS.
Aurora Scaling
● Storage starts at 10 GB; when you exceed that, it automatically scales up in 10 GB
increments, up to 64 TB.
● Compute resources can scale up to 32VCPUs and 244 GB of memory.
● It maintains 2 copies of your data in each availability zone, with a minimum of three
availability zones. Therefore, we can say that it maintains 6 copies of your data.
● It is designed to transparently handle the loss of up to two copies of data without affecting
database write availability and up to three copies without affecting read availability. It is
highly redundant.
● It is also self-healing, meaning that data blocks and disks are continuously scanned for errors
and repaired automatically if errors are detected.
Replicas
● Aurora Replicas
● MySQL Read Replicas
Aurora Replicas
● Aurora Replicas are independent endpoints in an Aurora DB cluster that are used for scaling
read operations and increasing availability.
● You can distribute up to 15 Aurora Replicas across the Availability Zones.
● The DB cluster volume is made up of multiple copies of the data, but the data in the cluster
volume is presented as a single, logical volume to the Aurora Replicas in the DB cluster. All the
Aurora Replicas therefore return the same result for a query.
● Aurora Replicas work well for read scaling, not for write operations, as they are fully
dedicated to read operations in the DB cluster; write operations are managed by the
primary instance.
● Aurora Replicas serve as failover targets to increase availability, i.e., if the primary Aurora
instance fails, an Aurora Replica is promoted to become the new primary instance.
● If an Aurora DB cluster does not include any Aurora Replicas, you need to recreate the DB
instance to recover from a failure event. Promoting an Aurora Replica is much faster than
recreating the DB instance.
MySQL Read Replica
What is Redshift?
● Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the
cloud.
● Customers can use the Redshift for just $0.25 per hour with no commitments or upfront costs
and scale to a petabyte or more for $1,000 per terabyte per year.
OLAP
Suppose we want to calculate the net profit for EMEA and Pacific for the Digital Radio product.
This requires pulling a large number of records.
Complex queries are required to fetch these records. Data-warehousing databases use a different
type of architecture, both from a database perspective and at the infrastructure layer.
Redshift Configuration
● Single node
● Multi-node
Multi-node: A multi-node configuration is a cluster that consists of more than one node. It contains two types of nodes:
● Leader Node
It manages client connections and receives queries. A leader node receives the queries
from the client applications, parses the queries, and develops the execution plans. It
coordinates the parallel execution of these plans with the compute nodes, combines
the intermediate results from all the nodes, and then returns the final result to the client
application.
● Compute Node
A compute node executes the execution plans, and the intermediate results are sent to the
leader node for aggregation before being sent back to the client application. A cluster can
have up to 128 compute nodes.
Let's understand the concept of leader nodes and compute nodes through an example.
Redshift warehouse is a collection of computing resources known as nodes, and these nodes are
organized in a group known as a cluster. Each cluster runs in a Redshift Engine which contains one
or more databases.
When you launch a Redshift cluster, it starts with a single node of 160 GB. When you want
to grow, you can add additional nodes to take advantage of parallel processing. A leader
node then manages the multiple nodes: it handles the client connections as well as the compute
nodes, while the data is stored on the compute nodes, which execute the queries.
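A hedged boto3 sketch that creates a small multi-node cluster (the leader node is provisioned automatically in front of the compute nodes); the identifier, node type, and credentials are placeholders.

```python
# Hedged sketch: create a multi-node Redshift cluster.
# Identifier, node type, and credentials below are placeholders.
import boto3

redshift = boto3.client("redshift")

redshift.create_cluster(
    ClusterIdentifier="demo-warehouse",
    ClusterType="multi-node",          # use "single-node" for a single-node cluster
    NodeType="dc2.large",
    NumberOfNodes=3,                   # compute nodes; the leader node is implicit
    DBName="analytics",
    MasterUsername="admin",
    MasterUserPassword="ChangeMe123!",
)
```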
Redshift features
VPC
Architecture of VPC
The outer line represents the region, and the region is us-east-1. Inside the region, we have VPC,
and outside the VPC, we have an internet gateway and virtual private gateway. Internet Gateway
and Virtual Private Gateway are the ways of connecting to the VPC. Both these connections go to
the router in a VPC and then the router directs the traffic to the routing table. Route table will then
direct the traffic to the Network ACL. A Network ACL is a firewall, much like security groups.
Network ACLs are stateless and contain both allow and deny rules. You can also block IP addresses
using your Network ACL. Traffic then passes through the security group, which acts as another layer
of defense in front of the EC2 instance. The VPC has two subnets, i.e., a public and a private subnet.
In a public subnet, an EC2 instance can access the internet, but in a private subnet, an EC2 instance
cannot access the internet on its own. To connect to an instance in the private subnet, you first connect
to an instance in the public subnet and then SSH from there into the private instance. Such intermediate
instances are known as jump boxes. In this way, we can connect an instance in a public subnet to an
instance in a private subnet.
● Launch instances in a subnet of your choosing. We can choose our own subnet addressing.
● We can assign custom IP address ranges in each subnet.
● We can configure route tables between subnets.
● We can create an internet gateway and attach it to our VPC.
● It provides much better security control over your AWS resources.
● We can assign security groups to individual instances.
● We also have subnet network access control lists (ACLs).
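A hedged sketch wiring up the pieces described above: a VPC with a custom CIDR block, a public and a private subnet, an internet gateway, and a route table whose default route points at the gateway. All CIDR ranges are illustrative.

```python
# Hedged sketch: create a VPC with one public and one private subnet.
# All CIDR ranges here are illustrative.
import boto3

ec2 = boto3.client("ec2")

vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

public_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")["Subnet"]["SubnetId"]
private_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24")["Subnet"]["SubnetId"]

# An internet gateway plus a default route makes the first subnet public.
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

rtb_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=rtb_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rtb_id, SubnetId=public_subnet)
```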
VPC Peering
● VPC Peering is a networking connection that allows you to connect one VPC with another
VPC through a direct network route using private IP addresses.
● Instances behave as if they were on the same private network.
● You can peer VPC's with other AWS accounts as well as other VPCs in the same account.
● Peering is done in a star configuration, e.g., one central VPC peered with 4 other VPCs.
● Transitive peering is not supported.
● You can peer between regions. Suppose you have one VPC in one region and other VPC in
another region, then you can peer the VPCs between different regions.
The above figure shows that VPC B is peered with VPC A, so an instance in VPC B can talk to VPC
A. However, VPC B cannot talk to VPC C through VPC A. This is known as non-transitive
peering, i.e., because VPC B and VPC C are not directly linked, they cannot talk to each other.
So, to enable communication between VPC B and VPC C, we need to peer them directly, as shown in the
figure below.
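A hedged sketch that requests and accepts a peering connection between two VPCs in the same account (the VPC IDs are placeholders). Because peering is non-transitive, each pair of VPCs that must communicate needs its own peering connection.

```python
# Hedged sketch: peer VPC B with VPC C directly (IDs are placeholders).
import boto3

ec2 = boto3.client("ec2")

peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0bbbbbbbbbbbbbbbb",        # requester: VPC B
    PeerVpcId="vpc-0cccccccccccccccc",    # accepter: VPC C
)
peering_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# The owner of the accepter VPC must accept the request.
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)

# Routes to the peer's CIDR block still have to be added to each VPC's route tables.
```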
Deployment with EC2
A deployment group is a set of individual EC2 instances that CodeDeploy deploys revisions to.
A deployment group contains individually tagged instances, Amazon EC2 instances in Auto Scaling
groups, or both.
The following diagram shows the major steps in the deployment of application revisions:
These steps include:
1. Create an application and give it a name that uniquely identifies the application revisions
you want to deploy and the compute platform for your application. CodeDeploy uses this
name during a deployment to make sure it is referencing the correct deployment components,
such as the deployment group, deployment configuration, and application revision. For more
information, see Create an application with CodeDeploy.
2. Set up a deployment group by specifying a deployment type and the instances to which you
want to deploy your application revisions. An in-place deployment updates instances with
the latest application revision. A blue/green deployment registers a replacement set of
instances for the deployment group with a load balancer and deregisters the original
instances.
You can specify the tags applied to the instances, the Amazon EC2 Auto Scaling group
names, or both.
If you specify one group of tags in a deployment group, CodeDeploy deploys to instances
that have at least one of the specified tags applied. If you specify two or more tag groups,
CodeDeploy deploys only to the instances that meet the criteria for each of the tag groups.
For more information, see Tagging Instances for Deployments.
In all cases, the instances must be configured to be used in a deployment (that is, they must
be tagged or belong to an Amazon EC2 Auto Scaling group) and have the CodeDeploy agent
installed and running.
We provide you with an AWS CloudFormation template that you can use to quickly set up
an Amazon EC2 instance based on Amazon Linux or Windows Server. We also provide you
with the standalone CodeDeploy agent so that you can install it on Amazon Linux, Ubuntu
Server, Red Hat Enterprise Linux (RHEL), or Windows Server instances. For more
information, see Create a deployment group with CodeDeploy.
You can also specify the following options:
● Amazon SNS notifications. Create triggers that send notifications to subscribers of
an Amazon SNS topic when specified events, such as success or failure events, occur
in deployments and instances. For more information, see Monitoring Deployments
with Amazon SNS Event Notifications.
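A hedged boto3 sketch of steps 1 and 2: create the application, then a deployment group that targets EC2 instances carrying a hypothetical tag. The application name, tag, and service role ARN are placeholders, and the CodeDeploy agent must already be installed and running on the instances.

```python
# Hedged sketch: create a CodeDeploy application and an EC2-tagged deployment group.
# The application name, tag, and service role ARN are placeholders.
import boto3

codedeploy = boto3.client("codedeploy")

codedeploy.create_application(applicationName="demo-app", computePlatform="Server")

codedeploy.create_deployment_group(
    applicationName="demo-app",
    deploymentGroupName="demo-app-prod",
    serviceRoleArn="arn:aws:iam::123456789012:role/CodeDeployServiceRole",
    deploymentConfigName="CodeDeployDefault.OneAtATime",   # in-place, one instance at a time
    ec2TagFilters=[
        {"Key": "Environment", "Value": "Production", "Type": "KEY_AND_VALUE"},
    ],
)
```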
Autoscaling
As the name suggests, auto-scaling allows you to scale your Amazon EC2 instances up or down
automatically as per the instructions set by the user. Parameters like minimum and maximum
number of instances are set by the user. Using this, the number of Amazon EC2 instances you’re
using increases automatically as the demand rises to maintain the performance, and decreases
automatically as the demand decreases to minimize the cost.
Auto Scaling is particularly effective for those applications that fluctuate on hourly, daily, or
weekly usage. Auto Scaling is enabled by Amazon CloudWatch and is available at no extra cost.
AWS CloudWatch can be used to measure CPU utilization, network traffic, etc.
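A hedged sketch that creates an Auto Scaling group from a hypothetical launch template and attaches a CloudWatch-driven target-tracking policy that keeps average CPU utilization around 50%; the template name and subnet IDs are placeholders.

```python
# Hedged sketch: Auto Scaling group plus a CPU-based target-tracking policy.
# Launch template name and subnet IDs are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-template", "Version": "$Latest"},
    MinSize=2,                      # user-defined minimum number of instances
    MaxSize=10,                     # user-defined maximum number of instances
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-0123456789abcdef0,subnet-0fedcba9876543210",
)

# Scale out/in automatically to hold average CPU utilization near 50%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="keep-cpu-at-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 50.0,
    },
)
```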
Elastic Load Balancing
Elastic Load Balancing (ELB) automatically distributes incoming request traffic across multiple
Amazon EC2 instances, resulting in higher fault tolerance. It detects unhealthy instances and
automatically reroutes traffic, in a round-robin manner, to healthy instances until the unhealthy
instances have been restored. However, if we need more complex routing algorithms, then we should
choose other services like Amazon Route 53.
ELB consists of the following three components.
Load Balancer
This includes monitoring and handling the requests incoming through the Internet/intranet and
distributing them to EC2 instances registered with it.
Control Service
This includes automatically scaling the handling capacity in response to incoming traffic by adding
and removing load balancers as required. It also performs health checks on instances.
SSL Termination
ELB provides SSL termination, which saves precious CPU cycles that would otherwise be spent encoding
and decoding SSL within the EC2 instances attached to the ELB. An X.509 certificate must be configured on
the ELB. SSL between the ELB and the EC2 instances is optional; the connection can also be terminated at the ELB.
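A hedged sketch creating an Application Load Balancer, a health-checked target group (used to detect unhealthy instances), and an HTTP listener; the subnet, security group, VPC, and instance IDs are placeholders.

```python
# Hedged sketch: Application Load Balancer + health-checked target group + listener.
# Subnet, security group, VPC, and instance IDs are placeholders.
import boto3

elbv2 = boto3.client("elbv2")

lb = elbv2.create_load_balancer(
    Name="demo-alb",
    Subnets=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
    SecurityGroups=["sg-0123456789abcdef0"],
    Scheme="internet-facing",
    Type="application",
)
lb_arn = lb["LoadBalancers"][0]["LoadBalancerArn"]

tg = elbv2.create_target_group(
    Name="demo-targets",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",
    HealthCheckPath="/health",      # unhealthy instances are taken out of rotation
)
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

elbv2.register_targets(TargetGroupArn=tg_arn, Targets=[{"Id": "i-0123456789abcdef0"}])
elbv2.create_listener(
    LoadBalancerArn=lb_arn,
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
)
```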
Features of ELB
CODESTAR
AWS CodeStar is a cloud-based service for creating, managing, and working with software
development projects on AWS. You can quickly develop, build, and deploy applications on AWS
with an AWS CodeStar project. An AWS CodeStar project creates and integrates AWS services for
your project development toolchain. Depending on your choice of AWS CodeStar project template,
that toolchain might include source control, build, deployment, virtual servers or serverless
resources, and more. AWS CodeStar also manages the permissions required for project users (called
team members).
Operations on CodeStar
● Start new software projects on AWS in minutes using templates for web applications,
web services, and more: AWS CodeStar includes project templates for various project
types and programming languages. Because AWS CodeStar takes care of the setup, all of
your project resources are configured to work together.
● Manage project access for your team: AWS CodeStar provides a central console where
you can assign project team members the roles they need to access tools and resources. These
permissions are applied automatically across all AWS services used in your project, so you
don't need to create or manage complex IAM policies.
● Visualize, operate, and collaborate on your projects in one place: AWS CodeStar
includes a project dashboard that provides an overall view of the project, its toolchain, and
important events. You can monitor the latest project activity, like recent code commits, and
track the status of your code changes, build results, and deployments, all from the same
webpage. You can monitor what's going on in the project from a single dashboard and drill
into problems to investigate.
● Iterate quickly with all the tools you need: AWS CodeStar includes an integrated
development toolchain for your project. Team members push code, and changes are
automatically deployed. Integration with issue tracking allows team members to keep track
of what needs to be done next. You and your team can work together more quickly and
efficiently across all phases of code delivery.
CodeCommit
AWS CodeCommit is a version control service hosted by Amazon Web Services that you can use
to privately store and manage assets (such as documents, source code, and binary files) in the cloud.
CodeCommit is a secure, highly scalable, managed source control service that hosts private Git
repositories. CodeCommit eliminates the need for you to manage your own source control system
or worry about scaling its infrastructure. You can use CodeCommit to store anything from code to
binaries. It supports the standard functionality of Git, so it works seamlessly with your existing Git-
based tools.
● Benefit from a fully managed service hosted by AWS. CodeCommit provides high service
availability and durability and eliminates the administrative overhead of managing your own
hardware and software. There is no hardware to provision and scale and no server software
to install, configure and update.
● Store your code securely. CodeCommit repositories are encrypted at rest as well as in transit.
● Work collaboratively on code. CodeCommit repositories support pull requests, where users
can review and comment on each other's code changes before merging them to branches;
notifications that automatically send emails to users about pull requests and comments; and
more.
● Easily scale your version control projects. CodeCommit repositories can scale up to meet
your development needs. The service can handle repositories with large numbers of files or
branches, large file sizes, and lengthy revision histories.
● Store anything, anytime. CodeCommit has no limit on the size of your repositories or on the
file types you can store.
● Integrate with other AWS and third-party services. CodeCommit keeps your repositories
close to your other production resources in the AWS Cloud, which helps increase the speed
and frequency of your development lifecycle. It is integrated with IAM and can be used with
other AWS services and in parallel with other repositories. For more information, see
Product and service integrations with AWS CodeCommit.
● Easily migrate files from other remote repositories. You can migrate to CodeCommit from
any Git-based repository.
● Use the Git tools you already know. CodeCommit supports Git commands as well as its own
AWS CLI commands and APIs.
1. Use the AWS CLI or the CodeCommit console to create a CodeCommit repository.
2. From your development machine, use Git to run git clone, specifying the name of the
CodeCommit repository. This creates a local repo that connects to the CodeCommit
repository.
3. Use the local repo on your development machine to modify (add, edit, and delete) files, and
then run git add to stage the modified files locally. Run git commit to commit the files
locally, and then run git push to send the files to the CodeCommit repository.
4. Download changes from other users. Run git pull to synchronize the files in the CodeCommit
repository with your local repo. This ensures you're working with the latest version of the
files.
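Step 1 of the workflow above can be done from the SDK as well as the console. A hedged sketch that creates a repository and prints the HTTPS clone URL to use with git clone; the repository name is a placeholder.

```python
# Hedged sketch: create a CodeCommit repository and print its HTTPS clone URL.
# "demo-repo" is a placeholder repository name.
import boto3

codecommit = boto3.client("codecommit")

response = codecommit.create_repository(
    repositoryName="demo-repo",
    repositoryDescription="Sample repository for the Module 2 notes",
)

# Use this URL with `git clone` from your development machine.
print(response["repositoryMetadata"]["cloneUrlHttp"])
```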
Creating a Commit
1. On your local computer, create the file you want to add as the first file to the CodeCommit
repository. A common practice is to create a README.md markdown file that explains the
purpose of this repository to other repository users. If you include a README.md file, the
content of the file is displayed automatically at the bottom of the Code page for your
repository in the CodeCommit console.
2. At the terminal or command line, run the put-file command, specifying:
● The name of the repository where you want to add the first file.
● The name of the branch you want to create as the default branch.
● The local location of the file. The syntax used for this location varies, depending on
your local operating system.
● The name of the file you want to add, including the path where the updated file is
stored in the repository.
● The user name and email you want to associate with this file.
● A commit message that explains why you added this file.
Note - The user name, email address, and commit message are optional but can help other users
know who made the change and why. If you do not supply a user name, CodeCommit defaults to
using your IAM user name or a derivation of your console login as the author name.
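The put-file call described above is available in the AWS CLI and the SDKs. A hedged boto3 sketch that adds a README.md as the first commit on a main branch of the hypothetical demo-repo; the author name and email are placeholders.

```python
# Hedged sketch: create the first commit (a README.md) with put_file.
import boto3

codecommit = boto3.client("codecommit")

response = codecommit.put_file(
    repositoryName="demo-repo",            # repository to add the first file to
    branchName="main",                     # branch to create as the default branch
    filePath="README.md",                  # path of the file inside the repository
    fileContent=b"# Demo repo\nExplains the purpose of this repository.\n",
    commitMessage="Add README describing the repository",   # why the file was added
    name="Jane Developer",                 # optional author name
    email="[email protected]",          # optional author email
)

print("Commit ID:", response["commitId"])
```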
Code Build
AWS CodeBuild is a fully managed build service in the cloud. CodeBuild compiles your source
code, runs unit tests, and produces artifacts that are ready to deploy. CodeBuild eliminates the need
to provision, manage, and scale your own build servers. It provides prepackaged build environments
for popular programming languages and build tools such as Apache Maven, Gradle, and more. You
can also customize build environments in CodeBuild to use your own build tools. CodeBuild scales
automatically to meet peak build requests.
● Fully managed – CodeBuild eliminates the need to set up, patch, update, and manage your
own build servers.
● On-demand – CodeBuild scales on-demand to meet your build needs. You pay only for the
number of build minutes you consume.
● Out of the box – CodeBuild provides preconfigured build environments for the most
popular programming languages. All you need to do is point to your build script to start your
first build.
You can use the AWS CodeBuild or AWS CodePipeline console to run CodeBuild. You can also
automate the running of CodeBuild by using the AWS Command Line Interface (AWS CLI) or the
AWS SDKs.
To run CodeBuild by using the CodeBuild console, AWS CLI, or AWS SDKs, see Run AWS
CodeBuild directly.
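Running CodeBuild from an SDK is a single call once a build project exists. A hedged sketch that starts a build of a hypothetical project and polls its status.

```python
# Hedged sketch: start a CodeBuild build and poll until it finishes.
# "demo-app-build" is a placeholder project name.
import time
import boto3

codebuild = boto3.client("codebuild")

build_id = codebuild.start_build(projectName="demo-app-build")["build"]["id"]

while True:
    build = codebuild.batch_get_builds(ids=[build_id])["builds"][0]
    if build["buildStatus"] != "IN_PROGRESS":
        break
    time.sleep(10)                      # wait before checking again

print("Build finished with status:", build["buildStatus"])
```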
As the following diagram shows, you can add CodeBuild as a build or test action to the build or test
stage of a pipeline in AWS CodePipeline. AWS CodePipeline is a continuous delivery service that
you can use to model, visualize, and automate the steps required to release your code. This includes
building your code. A pipeline is a workflow construct that describes how code changes go through
a release process.
To use CodePipeline to create a pipeline and then add a CodeBuild build or test action, see Use
CodePipeline with CodeBuild. For more information about CodePipeline, see the AWS
CodePipeline User Guide.
The CodeBuild console also provides a way to quickly search for your resources, such as
repositories, build projects, deployment applications, and pipelines. Choose Go to resource or press
the / key, and then enter the name of the resource. Any matches appear in the list. Searches are case
insensitive.
Code Deploy
AWS CodeDeploy automates application deployments. The types of application content it can deploy include:
● Code
● Serverless AWS Lambda functions
● Web and configuration files
● Executables
● Packages
● Scripts
● Multimedia files
CodeDeploy can deploy application content that runs on a server and is stored in Amazon S3
buckets, GitHub repositories, or Bitbucket repositories. CodeDeploy can also deploy a serverless
Lambda function. You do not need to make changes to your existing code before you can use
CodeDeploy.
Code Pipeline
AWS CodePipeline is a continuous delivery service you can use to model, visualize, and automate
the steps required to release your software. You can quickly model and configure the different stages
of a software release process. CodePipeline automates the steps required to release your software
changes continuously.
● Automate your release processes: CodePipeline fully automates your release process from
end to end, starting from your source repository through build, test, and deployment. You
can prevent changes from moving through a pipeline by including a manual approval action
in any stage except a Source stage. You can release when you want, in the way you want,
on the systems of your choice, across one instance or multiple instances.
● Establish a consistent release process: Define a consistent set of steps for every code
change. CodePipeline runs each stage of your release according to your criteria.
● Speed up delivery while improving quality: You can automate your release process to
allow your developers to test and release code incrementally and speed up the release of new
features to your customers.
● Use your favorite tools: You can incorporate your existing source, build, and deployment
tools into your pipeline. For a full list of AWS services and third-party tools currently
supported by CodePipeline, see Product and service integrations with CodePipeline.
● View progress at a glance: You can review real-time status of your pipelines, check the
details of any alerts, retry failed actions, view details about the source revisions used in the
latest pipeline execution in each stage, and manually rerun any pipeline.
● View pipeline history details: You can view details about executions of a pipeline,
including start and end times, run duration, and execution IDs.
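The at-a-glance status and execution history described above can also be read programmatically. A hedged sketch against a hypothetical pipeline name.

```python
# Hedged sketch: inspect pipeline stage status and recent execution history.
# "demo-pipeline" is a placeholder pipeline name.
import boto3

codepipeline = boto3.client("codepipeline")

# Current status of each stage (Source, Build, Deploy, ...).
state = codepipeline.get_pipeline_state(name="demo-pipeline")
for stage in state["stageStates"]:
    print(stage["stageName"], stage.get("latestExecution", {}).get("status"))

# Recent executions with their IDs and statuses.
history = codepipeline.list_pipeline_executions(pipelineName="demo-pipeline", maxResults=5)
for execution in history["pipelineExecutionSummaries"]:
    print(execution["pipelineExecutionId"], execution["status"])
```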
Sample Questions
1. Explain the importance of Identity Access Management (IAM) in AWS. How does it
enhance security and compliance in cloud environments?
2. Compare and contrast Amazon S3 and Glacier in terms of use cases, performance, and
cost. When would you choose one over the other?
3. Discuss how Amazon CloudFront can be integrated with S3 to improve website
performance. What are the benefits of using a CDN?
4. Describe how EC2 instances can be utilized for auto-scaling. What factors should be
considered when setting up auto-scaling policies?
5. Explain the role of Route 53 in AWS. How does it contribute to the overall architecture of
a cloud-based application?
6. Discuss the differences between AWS-managed databases and self-managed databases
on EC2. What are the trade-offs of each approach?
7. Identify the components of AWS VPC and explain how they interact to create a secure
network environment.
8. How do AWS Developer Tools, such as CodeCommit and CodePipeline, facilitate a
Continuous Integration and Continuous Deployment (CI/CD) workflow? Provide an example.
9. Describe the process of deploying an application using CodeDeploy. What are the key
steps and configurations needed to ensure a successful deployment?
10. How can AWS CloudFormation complement the use of EC2 and VPC in managing
infrastructure as code? Discuss the benefits of this approach.