Chapter 4

AWS Storage Services

Amazon Web Services (AWS) offers a variety of cloud storage solutions designed to meet different use
cases, ranging from object storage to block storage and file systems.

4.1 Amazon Simple Storage Service (S3)

Amazon Simple Storage Service (Amazon S3) is one of the most popular and widely used cloud storage
services offered by AWS. It is an object storage service that offers industry-leading scalability, data
availability, security, and performance. Customers of all sizes and industries can use Amazon S3 to store
and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications,
backup and restore, archiving, big data analytics, content storage, disaster recovery, enterprise
applications, and IoT devices. Amazon S3 also provides management features so that you can optimize,
organize, and configure access to your data to meet your specific business, organizational, and
compliance requirements.

4.1.1 Overview of AWS S3

Amazon S3 is designed to store an unlimited amount of data and provides a web interface to upload,
store, and retrieve files (referred to as "objects"). It is highly durable, with 99.999999999% durability, and
is used by businesses for storing vast amounts of unstructured data like documents, images, videos,
backups, and logs.

Core Components of Amazon S3

1. Buckets:
o A bucket is a container for storing objects. Each object is stored in exactly one bucket.
o Buckets are globally unique (the name of the bucket must be unique across all AWS
accounts).
o You can create multiple buckets within your AWS account, and each bucket can store an
unlimited number of objects.
2. Objects:
o An object is the basic unit of storage in S3. Each object consists of:
 Data: The actual content or file you are storing.
 Metadata: Data about the object (e.g., file type, creation date).
 Key: A unique identifier for the object within the bucket (e.g., file name).
o Objects can range in size from 0 bytes to 5 terabytes each.
3. Keys:
o A key is the unique identifier for an object within a bucket. Every object in a bucket must
have a unique key.
4. Regions:
o When you create an S3 bucket, you must choose a geographic region where your data
will be stored. AWS offers multiple regions globally to ensure that data can be stored
close to the end users for reduced latency.
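To make these components concrete, here is a minimal sketch using the AWS SDK for Python (boto3); the bucket name, region, object key, and metadata are hypothetical placeholders:

import boto3

# Create an S3 client in a specific region (hypothetical region shown).
s3 = boto3.client("s3", region_name="ap-south-1")

# Create a bucket; the name must be globally unique across all AWS accounts.
# Outside us-east-1, the region is passed as a LocationConstraint.
s3.create_bucket(
    Bucket="example-unique-bucket-name",
    CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
)

# Store an object; the key "reports/2024/summary.pdf" uniquely identifies it
# within the bucket, and user-defined metadata travels with the object.
s3.put_object(
    Bucket="example-unique-bucket-name",
    Key="reports/2024/summary.pdf",
    Body=b"...file contents...",
    Metadata={"department": "finance"},
)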

Storage Classes in Amazon S3


Amazon S3 offers different storage classes, each optimized for specific use cases:

1. S3 Standard:
o For frequently accessed data. It offers low-latency and high-throughput performance.
o Use Cases: Active data, website hosting, real-time analytics.
2. S3 Intelligent-Tiering:
o Automatically moves data between access tiers (frequent, infrequent, and archive
instant access) as access patterns change, to optimize costs.
o Use Cases: Unpredictable access patterns.
3. S3 Standard-IA (Infrequent Access):
o For data that is less frequently accessed but needs to be retrieved quickly when required.
o Use Cases: Backups, disaster recovery, long-term storage.
4. S3 One Zone-IA:
o Similar to Standard-IA but stores data in a single Availability Zone, reducing costs.
o Use Cases: Data that can be recreated easily and doesn’t require multi-AZ redundancy.
5. S3 Glacier:
o For archival data that is rarely accessed but requires low-cost storage.
o Use Cases: Data archiving, regulatory compliance.
6. S3 Glacier Deep Archive:
o The lowest-cost storage class, designed for data that is rarely accessed and requires
retrieval times in hours.
o Use Cases: Long-term data retention for regulatory compliance.
7. S3 Outposts:
o Extends S3 storage to on-premises environments using AWS Outposts, providing hybrid
cloud storage.
o Use Cases: Hybrid cloud use cases requiring local data storage.

Key Features of Amazon S3

1. Scalability:
o S3 automatically scales to accommodate virtually unlimited data storage. As your data
grows, the service can expand to meet your needs without manual intervention.
2. Durability:
o Amazon S3 offers 11 nines (99.999999999%) durability over a given year. This is
achieved by automatically replicating your data across multiple Availability Zones (AZs).
3. Security:
o Data Encryption: S3 supports encryption at rest and in transit. You can encrypt data
using AWS Key Management Service (KMS) or server-side encryption (SSE) options
like SSE-S3, SSE-KMS, or SSE-C (customer-provided keys).
o Access Control: You can control access to your buckets and objects using IAM policies,
Bucket Policies, and Access Control Lists (ACLs). S3 also supports Bucket
Versioning to keep track of changes and ensure data recovery.
o Access Logging: You can enable access logging to track requests to your S3 resources.
4. Data Lifecycle Management:
o Lifecycle Policies: S3 supports automated lifecycle policies that can move data between
storage classes or delete objects after a specified period.
o Versioning: Enables version control for objects, ensuring that every change is tracked
and previous versions can be restored.
5. Event Notifications:
o S3 can trigger notifications to AWS Lambda, SNS, or SQS based on events like object
creation, deletion, or modification.
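As an illustration of event notifications, the following sketch (with a hypothetical bucket name and SQS queue ARN) subscribes a queue to all object-creation events; note that the queue's own access policy must separately allow S3 to send messages to it:

import boto3

s3 = boto3.client("s3")

# Deliver a message to an SQS queue whenever any object is created in the bucket.
s3.put_bucket_notification_configuration(
    Bucket="example-unique-bucket-name",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:ap-south-1:123456789012:example-queue",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)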

4.1.2 Bucket properties

In Amazon S3, a bucket is a fundamental container used to store objects (files). Every S3 object is stored
in a bucket, and you can configure various bucket properties to control access, storage behavior, and
logging. Here are the key properties of an Amazon S3 bucket:

1. Bucket Name

 Unique: Bucket names must be globally unique across all AWS accounts.
 Naming Rules: They must follow specific guidelines, such as using lowercase letters, numbers,
hyphens, and not starting or ending with a hyphen.
 DNS-Compliant: The bucket name must be DNS-compliant if you plan to use it for static
website hosting.

2. Region

 Storage Location: When creating a bucket, you must select a Region (geographic location)
where the data will be stored.
 Why Region Matters: Data stored in a specific region ensures lower latency for users in that
region and helps comply with data residency requirements (e.g., data sovereignty laws).

3. Access Control

 Bucket Policies: JSON-based policies that define who can access the bucket and what actions
they can perform (e.g., read, write).
 Access Control Lists (ACLs): Provides finer-grained control over access, allowing permissions
on both buckets and individual objects.
 AWS Identity and Access Management (IAM): IAM users and roles can be granted
permissions on S3 resources.
 Block Public Access Settings: Controls whether the bucket or objects within the bucket can be
publicly accessed.

4. Versioning

 Enabled or Disabled: Versioning helps keep multiple versions of an object, allowing you to
recover from accidental deletions or overwrites.
 Use Cases: Can be used for data recovery, auditing, or managing changes over time.
 Versioning States: Once enabled, all versions of an object are stored in the same bucket with
unique version IDs.

5. Logging

 Access Logging: You can configure S3 to log access requests to your bucket. These logs are
stored in a separate S3 bucket and can be used for security auditing or usage analysis.
 Log File Details: Includes information like the requester's IP address, request type, response
status, and more.
6. Lifecycle Policies

 Automated Data Management: Lifecycle policies allow you to automatically transition objects
between different storage classes (e.g., from S3 Standard to Glacier) or delete objects after a
specified period.
 Use Cases: For cost optimization and data retention, e.g., automatically archiving data that hasn’t
been accessed for a certain period.
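A lifecycle policy can be applied programmatically as well as through the console. The sketch below (hypothetical bucket name and prefix) transitions objects under logs/ to Glacier after 90 days and deletes them after one year:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-unique-bucket-name",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},  # applies only to this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": 365},  # delete after one year
            }
        ]
    },
)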

7. Bucket Encryption

 Default Encryption: You can configure the bucket to automatically encrypt objects when they
are uploaded. S3 supports server-side encryption (SSE) using:
o SSE-S3: Uses Amazon S3-managed keys.
o SSE-KMS: Uses AWS Key Management Service (KMS) to manage keys.
o SSE-C: Customer-provided keys for encryption.
 Why Important: Ensures that all objects are encrypted at rest, enhancing security.

8. Transfer Acceleration

 Enable Acceleration: This feature speeds up uploads to your bucket by routing traffic through
Amazon CloudFront’s globally distributed edge locations.
 Use Cases: Useful for customers who need to upload large objects from regions far from the
bucket’s location.

9. Cross-Region Replication (CRR)

 Data Replication: CRR allows you to automatically replicate objects from one S3 bucket to
another, in a different AWS region. This is useful for disaster recovery, data compliance, and
improving latency by storing copies of data closer to users.
 Use Cases: Ensure high availability and durability across multiple regions.

10. Event Notifications

 Trigger Actions: You can set up notifications to trigger events (e.g., an object being uploaded)
and send them to services like AWS Lambda, SNS, or SQS for further processing.
 Use Cases: Automatically process uploaded data (e.g., image resizing, file validation).

11. Object Lock

 Compliance and Protection: S3 Object Lock prevents objects from being deleted or overwritten
for a specified period, which is critical for compliance with regulatory requirements (e.g.,
WORM – Write Once, Read Many).
 Retention Modes:
o Governance Mode: Users with the right permissions can modify or delete locked
objects.
o Compliance Mode: No one can modify or delete locked objects until the retention period
expires.
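Several of these properties can be configured together through the API. The following sketch (hypothetical bucket name) enables versioning (property 4) and default SSE-KMS encryption (property 7); omitting KMSMasterKeyID falls back to the AWS-managed key for S3:

import boto3

s3 = boto3.client("s3")
bucket = "example-unique-bucket-name"

# Property 4: keep every version of every object.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Property 7: encrypt all new objects at rest with SSE-KMS by default.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms"
                    # Without KMSMasterKeyID, the AWS-managed key aws/s3 is used.
                }
            }
        ]
    },
)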

4.1.3 Bucket Policy management


A Bucket Policy in Amazon S3 is a JSON-based access control mechanism that allows you to define
rules and permissions for who can access the objects in a specific S3 bucket. Bucket policies can control
access based on factors like the requester's IP address, the HTTP method used, or other conditions such as
the use of specific AWS services or user roles.

Key Features of S3 Bucket Policy

 Permissions Control: Defines who can access the bucket and what actions they can perform
(e.g., GetObject, PutObject, DeleteObject).
 Flexible Conditions: Supports conditions to control access based on time, IP address, or specific
AWS services.
 Global Scope: The policy applies to all objects within the bucket.
 JSON Format: Bucket policies are written in JSON format, and they allow you to specify
multiple rules within one policy.

Structure of an S3 Bucket Policy

A typical S3 bucket policy includes the following elements:

1. Version: Specifies the policy language version (usually "2012-10-17").


2. Statement: An array of policy statements that define the permissions.
o Effect: Can be Allow or Deny. It defines whether the action is permitted or denied.
o Principal: Specifies the AWS accounts, users, or services that are allowed or denied
access.
o Action: Lists the S3 actions that are allowed or denied, such as s3:GetObject or
s3:PutObject.
o Resource: The S3 bucket and its objects that the policy applies to. It can be the bucket
itself or specific objects within the bucket.
o Condition: (Optional) Defines specific conditions under which the policy will be applied,
such as IP address or HTTP method.

Example of a Simple Bucket Policy

Here is an example of a basic S3 bucket policy that grants read-only access to all objects in a bucket to
everyone (public access):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
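To attach this policy to a bucket programmatically, a sketch like the following can be used (boto3, with the same placeholder bucket name):

import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::your-bucket-name/*",
        }
    ],
}

# The policy document is passed as a JSON string.
s3.put_bucket_policy(Bucket="your-bucket-name", Policy=json.dumps(policy))

Note that if the bucket's Block Public Access settings are enabled (the default for new buckets), S3 will reject a public policy like this one until those settings are relaxed.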
4.2 Glacier

Amazon Glacier is a secure, durable, and low-cost cloud storage service offered by AWS, primarily
designed for data archiving and long-term backup. It is ideal for data that is rarely accessed but needs to
be retained for long periods, such as compliance records, logs, backup copies, and media files. Glacier is
part of Amazon S3, but it focuses on low-cost, long-term storage with retrieval options that are designed
for infrequent access.

4.2.1 Key Features of AWS Glacier

1. Low-Cost Storage:
o Glacier is specifically designed to provide extremely low-cost storage compared to other
storage services like Amazon S3 Standard.
o The pricing is based on storage usage and the retrieval options you choose (e.g., how
quickly you need the data).
2. Durability:
o Glacier provides 11 nines (99.999999999%) durability over a given year, similar to S3,
ensuring that your archived data is safe and available.
3. Data Retrieval Options:
o Glacier offers multiple retrieval options that balance cost and speed. The retrieval options
are:
 Expedited: Retrieve data within 1-5 minutes. It’s the fastest option and is ideal
for critical data.
 Standard: Retrieve data within 3-5 hours, suitable for less urgent access needs.
 Bulk: Retrieve large amounts of data in 5-12 hours, offering the lowest cost but
the longest retrieval time.
4. Integration with Amazon S3:
o Glacier is fully integrated with Amazon S3, and you can use S3’s APIs and management
tools to interact with Glacier.
o S3 Glacier Storage Class: Glacier is part of the S3 family, so data can be archived
directly from Amazon S3 to Glacier via lifecycle policies, or data can be migrated to
Glacier from other storage tiers as needed.
5. Compliance and Security:
o Encryption: Glacier supports server-side encryption for data at rest, ensuring that your
data is securely stored.
o Access Control: You can control access to Glacier data using AWS Identity and Access
Management (IAM), S3 bucket policies, and other access control mechanisms.
o Data Integrity: Glacier automatically checks the integrity of stored data and repairs any
corruption found using checksums.
6. Vaults:
o Glacier stores data in vaults, which are containers for archived data. Each vault can hold
an unlimited amount of data and you can configure access policies and encryption
settings for each vault.
7. Vault Lock:
o Vault Lock is a feature that helps you enforce compliance controls by preventing any
changes to your archive data. Once configured, a vault lock can enforce a "Write Once,
Read Many" (WORM) model, ensuring that data cannot be modified or deleted for a
specified retention period.
8. Lifecycle Policies:
o You can configure S3 Lifecycle policies to automatically transition data from S3 storage
classes (e.g., S3 Standard, S3 Standard-IA) to Glacier after a defined period of inactivity.
This automation helps you manage costs by moving data to lower-cost storage without
manual intervention.
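Because Glacier retrievals (other than Instant Retrieval) are asynchronous, an archived object must be restored before it can be read. A minimal sketch, assuming a placeholder bucket and an object already in the Glacier storage class:

import boto3

s3 = boto3.client("s3")

# Ask S3 to make a temporary copy available for 7 days, using the
# low-cost Bulk tier (5-12 hours).
s3.restore_object(
    Bucket="example-unique-bucket-name",
    Key="archive/2019/records.zip",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Bulk"},
    },
)

# Poll the object's restore status; the Restore header reports
# ongoing-request="false" once the temporary copy is ready to download.
head = s3.head_object(
    Bucket="example-unique-bucket-name",
    Key="archive/2019/records.zip",
)
print(head.get("Restore"))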

4.2.3 S3 Glacier storage classes

Amazon S3 offers a variety of storage classes designed to meet different needs in terms of cost, access
frequency, and data retention. These classes are optimized for different use cases, such as frequent access,
infrequent access, archival storage, or long-term retention. Here’s a detailed breakdown of the S3 storage
classes:

1. Amazon S3 Intelligent-Tiering storage class

Amazon S3 Intelligent-Tiering is the only cloud storage class that delivers automatic storage cost savings
when data access patterns change, without performance impact or operational overhead. The Amazon S3
Intelligent-Tiering storage class is designed to optimize storage costs by automatically moving data to the
most cost-effective access tier when access patterns change. For a small monthly object monitoring
and automation charge, S3 Intelligent-Tiering monitors access patterns and automatically moves
objects that have not been accessed to lower-cost access tiers. Since the launch of S3 Intelligent-Tiering
in 2018, customers have saved $2 billion by adopting S3 Intelligent-Tiering when compared to S3
Standard. S3 Intelligent-Tiering is the ideal storage class for data with unknown, changing, or
unpredictable access patterns, independent of object size or retention period. You can use S3
Intelligent-Tiering as the default storage class for virtually any workload, especially data lakes, data
analytics, new applications, and user-generated content.

Purpose: Designed for data with unpredictable access patterns; automatically moves data between
access tiers as usage changes to optimize costs.
Cost: Similar to S3 Standard, but with an additional cost for monitoring and automation.
Durability: 99.999999999% (11 nines).
Availability: 99.9% availability.
Retrieval Time: Data is accessible instantly, regardless of tier.
Use Cases:

 Data with unpredictable or changing access patterns (e.g., logs, backup files, or any data whose
access patterns are hard to predict).
 Helps optimize costs by automatically moving less frequently accessed data to a lower-cost
storage class.
 Ideal for situations where you don’t know whether the data will be accessed frequently or
infrequently.

2. Amazon S3 Express One Zone

Amazon S3 Express One Zone is a high-performance, single-Availability Zone storage class
purpose-built to deliver consistent single-digit millisecond data access for your most frequently
accessed data and latency-sensitive applications. S3 Express One Zone delivers data access speeds up to
10x faster and request costs up to 50% lower than S3 Standard. While you have always been able to
choose a specific AWS Region to store your S3 data, with S3 Express One Zone you can select a
specific AWS Availability Zone within an AWS Region to store your data. You can choose to
co-locate your storage and compute resources in the same Availability Zone to further optimize
performance, which helps lower compute costs and run workloads faster. With S3 Express One Zone,
data is stored in a different bucket type, an S3 directory bucket, which supports hundreds of
thousands of requests per second.

3. Amazon S3 Glacier storage class

The Amazon S3 Glacier storage classes are purpose-built for data archiving, providing you with the
highest performance, most retrieval flexibility, and the lowest cost archive storage in the cloud. All S3
Glacier storage classes provide virtually unlimited scalability and are designed for 99.999999999% (11
nines) of data durability. The S3 Glacier storage classes deliver options for the fastest access to your
archive data and the lowest-cost archive storage in the cloud.

Purpose: Designed for long-term archival storage, ideal for infrequently accessed data that needs to be
stored for years.

Cost: Very low-cost storage, suitable for data that doesn’t require frequent access.
Durability: 99.999999999% (11 nines).
Availability: 99.99% availability.
Retrieval Time:

 Expedited: 1-5 minutes (for urgent retrieval).


 Standard: 3-5 hours (for less urgent retrieval).
 Bulk: 5-12 hours (for large volumes of data).

Use Cases:

 Archival data, such as backup files, compliance data, or digital media.


 Long-term storage for disaster recovery and regulatory purposes where quick access is not
critical.

4. Amazon S3 Glacier Instant Retrieval storage class

Amazon S3 Glacier Instant Retrieval is an archive storage class that delivers the lowest-cost
storage for long-lived data that is rarely accessed and requires retrieval in milliseconds. With S3
Glacier Instant Retrieval, you can save up to 68% on storage costs compared to using the S3
Standard-Infrequent Access (S3 Standard-IA) storage class, when your data is accessed once per
quarter. S3 Glacier Instant Retrieval delivers the fastest access to archive storage, with the same
throughput and milliseconds access as the S3 Standard and S3 Standard-IA storage classes.

S3 Glacier Instant Retrieval is designed for 99.999999999% (11 nines) of data durability and
99.9% availability by redundantly storing data across multiple physically separated AWS
Availability Zones.
5. Amazon S3 Glacier Flexible Retrieval

 S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost than S3 Glacier
Instant Retrieval, for archive data that is accessed 1-2 times per year and is retrieved
asynchronously.

 For archive data that does not require immediate access but needs the flexibility to retrieve large
sets of data at no cost, such as backup or disaster recovery use cases, S3 Glacier Flexible
Retrieval (formerly S3 Glacier) is the ideal storage class.

 S3 Glacier Flexible Retrieval delivers the most flexible retrieval options that balance cost with
access times ranging from minutes to hours, and with free bulk retrievals.

6. Amazon S3 Glacier Deep Archive

 S3 Glacier Deep Archive is Amazon S3's lowest-cost storage class and supports long-term
retention and digital preservation for data that may be accessed once or twice in a year.

 It is designed for customers, particularly those in highly regulated industries such as financial
services, healthcare, and the public sector, that retain data sets for 7-10 years or longer to meet
regulatory compliance requirements.

 S3 Glacier Deep Archive can also be used for backup and disaster recovery use cases, and is a
cost-effective and easy-to-manage alternative to magnetic tape systems, whether they are
on-premises libraries or off-premises services. S3 Glacier Deep Archive complements Amazon
S3 Glacier, which is ideal for archives where data is regularly retrieved and some of the data
may be needed in minutes. All objects stored in S3 Glacier Deep Archive are replicated and
stored across at least three geographically dispersed Availability Zones, are designed for
99.999999999% durability, and can be restored within 12 hours.

4.2.3.1 Comparison of S3 Storage Classes

| Storage Class | Use Case | Cost | Durability | Availability | Retrieval Time |
| --- | --- | --- | --- | --- | --- |
| S3 Standard | Frequently accessed data | Higher | 99.999999999% (11 nines) | 99.99% | Instant (low latency) |
| S3 Intelligent-Tiering | Unpredictable access patterns | Similar to Standard | 99.999999999% (11 nines) | 99.9% | Instant |
| S3 Standard-IA | Infrequent access but fast retrieval needed | Lower than Standard | 99.999999999% (11 nines) | 99.9% | Instant |
| S3 One Zone-IA | Infrequent access, single-AZ resilience | Lower than Standard-IA | 99.999999999% (11 nines) | 99.5% | Instant |
| S3 Glacier | Archival storage, rarely accessed | Very low | 99.999999999% (11 nines) | 99.99% | Expedited (1-5 min), Standard (3-5 hrs), Bulk (5-12 hrs) |
| S3 Glacier Deep Archive | Lowest-cost archival storage | Lowest | 99.999999999% (11 nines) | 99.9% | Standard (12 hrs), Bulk (48 hrs) |
| S3 Outposts | On-premises data storage with AWS integration | Based on Outposts pricing | 99.999999999% (11 nines) | 99.9% | Instant |
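The storage class is chosen per object at upload time (or changed later via lifecycle rules). A short sketch with placeholder bucket and key names:

import boto3

s3 = boto3.client("s3")

# Let S3 manage tiering automatically for data with unknown access patterns.
s3.put_object(
    Bucket="example-unique-bucket-name",
    Key="data/clickstream.json",
    Body=b"...",
    StorageClass="INTELLIGENT_TIERING",
)

# Send long-term compliance records straight to the cheapest archive tier.
s3.put_object(
    Bucket="example-unique-bucket-name",
    Key="compliance/2015-records.zip",
    Body=b"...",
    StorageClass="DEEP_ARCHIVE",
)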

4.3 EBS

Amazon Elastic Block Store (EBS) is a scalable and high-performance block storage service designed
for use with Amazon EC2 (Elastic Compute Cloud) instances. It provides persistent block-level storage
that can be attached to EC2 instances, making it suitable for a wide range of applications, from databases
to file systems and enterprise applications.

Here’s a detailed overview of Amazon EBS:

Key Features of Amazon EBS

1. Persistent Storage:
o Data stored in EBS volumes persists even when an EC2 instance is stopped or
terminated. This is different from EC2 instance store volumes, which are temporary and
data is lost when the instance is stopped or terminated.
2. Block-Level Storage:
o EBS provides block-level storage, meaning data is stored in fixed-size blocks, which can
be formatted with a file system (e.g., NTFS, ext4) or used as raw storage for databases.
3. Scalability:
o EBS allows you to provision storage of varying sizes and performance characteristics to
meet the specific needs of your application.
4. High Availability and Durability:
o EBS volumes are automatically replicated within an Availability Zone to protect
against hardware failures, and are designed for 99.8%-99.9% durability (io2 volumes
offer 99.999%). They are also backed by AWS's extensive infrastructure, ensuring
high availability and resilience.
5. Snapshot Support:
o You can take point-in-time snapshots of your EBS volumes, which are stored in Amazon
S3. Snapshots are incremental, meaning only changes to the volume are saved, making
them cost-efficient.
6. Encryption:
o EBS supports encryption of data at rest and in transit. You can encrypt volumes using
AWS Key Management Service (KMS) or specify your own encryption keys for better
security.
7. Performance Options:
o EBS offers different volume types optimized for different performance and cost
characteristics, allowing you to choose the best option for your application.

EBS Volume Types

Amazon EBS provides multiple volume types, each designed for different performance characteristics,
from throughput-oriented to latency-sensitive use cases. These are classified into two main categories:
1. SSD-backed (Solid State Drive) Volume Types

These volumes are ideal for transactional workloads, including databases and high-performance
applications.

 General Purpose SSD (gp3)


o Purpose: Offers a balance of price and performance for a wide range of transactional
workloads like virtual desktops, medium-sized databases, and development
environments.
o Performance: Delivers a baseline of 3,000 IOPS (input/output operations
per second), provisionable up to 16,000 IOPS.
o Throughput: Baseline of 125 MB/s, provisionable up to 1,000 MB/s.
o Use Cases: Web servers, small-to-medium-sized databases, dev/test environments.
 Provisioned IOPS SSD (io2 & io2 Block Express)
o Purpose: Designed for applications requiring high-performance, low-latency storage
such as large transactional databases or data warehousing.
o Performance: Provides consistent and high performance, up to 64,000 IOPS for io2 and
256,000 IOPS for io2 Block Express, with sub-millisecond latency.
o Throughput: Up to 1,000 MB/s (up to 4,000 MB/s for io2 Block Express).
o Use Cases: Relational and NoSQL databases, big data analytics, business-critical
applications.
 io2 Block Express (newer version of io2)
o Purpose: A high-performance variant of Provisioned IOPS SSD, ideal for the most I/O-
intensive applications.
o Performance: Up to 256,000 IOPS with consistent low latency. Can scale to multiple
terabytes.
o Use Cases: Extremely high-performance database systems (Oracle, SQL Server),
enterprise applications that require extreme throughput.

2. HDD-backed (Hard Disk Drive) Volume Types

These are suitable for large, throughput-intensive workloads, where cost efficiency is a priority.

 Throughput Optimized HDD (st1)


o Purpose: Designed for large, sequential workloads that require high throughput, such as
big data processing, data warehousing, and log processing.
o Performance: Offers throughput of up to 500 MB/s.
o Use Cases: Data processing applications, large-scale analytics, media workflows.
 Cold HDD (sc1)
o Purpose: For infrequently accessed data with lower throughput requirements. Suitable
for archival storage.
o Performance: Up to 250 MB/s throughput.
o Use Cases: Cold storage of data, backup storage, and log storage.

Key Benefits of EBS

1. High Performance:
o EBS volumes, especially SSD-backed volumes like io2 and gp3, offer high IOPS and
throughput, making them ideal for transactional databases, virtual machines, and real-
time analytics.
2. Scalability:
o You can easily scale your EBS volumes up or down in size, change volume types, or
increase IOPS and throughput without downtime. You can also attach multiple EBS
volumes to a single EC2 instance for greater storage flexibility.
3. Cost-Effective:
o With options like gp3 and sc1, you can choose cost-efficient storage for workloads that
require large storage capacities but with less need for high IOPS and throughput.
4. Snapshots for Backup:
o EBS snapshots provide incremental backups that reduce costs. You can create and
automate snapshots, ensuring that your data is backed up and recoverable in the event of
failure.
5. Durability:
o EBS volumes are replicated within the Availability Zone to prevent data loss in case of a
hardware failure. This makes EBS highly resilient, with durability of up to 99.999% (for
io2 volumes).
6. Security:
o EBS supports encryption for data both at rest and in transit. You can use AWS Key
Management Service (KMS) for managing encryption keys and ensure that your data is
protected at all times.
7. Integration with EC2:
o EBS is fully integrated with Amazon EC2, enabling you to attach multiple volumes to a
running instance and expand storage as your application grows.

Use Cases for Amazon EBS

 Database Storage: EBS is ideal for transactional databases that require high performance, such
as MySQL, Oracle, PostgreSQL, SQL Server, and NoSQL databases like Cassandra and
MongoDB.
 Big Data and Analytics: EBS is perfect for data analytics and processing applications, where
high throughput and large-scale data storage are required.
 Backup and Disaster Recovery: Snapshots allow you to create backups of your data with
minimal overhead and rapid recovery in case of failures.
 File Systems: EBS can be used to host file systems such as ext4, NTFS, and XFS that need
persistent and durable storage.
 Content Management Systems: EBS provides reliable storage for content management systems,
websites, and applications that require scalable and low-latency storage.
 Development and Test Environments: Use EBS volumes to store the source code, build
artifacts, or application configurations for DevOps pipelines.

EBS Volume Types in Detail

Amazon EBS (Elastic Block Store) provides various volume types to suit different performance and cost
requirements. Below is a detailed explanation of the EBS volume types:

1. General Purpose SSD (gp3 and gp2)


gp3 (Recommended option for general workloads)

 Purpose: Next-generation SSD designed to provide consistent performance at a lower cost.


 Performance:
o Baseline: 3,000 IOPS and 125 MB/s throughput.
o Scalable up to: 16,000 IOPS and 1,000 MB/s (independent of volume size).
 Use Cases:
o Boot volumes.
o Medium-scale databases.
o Development and testing environments.
 Cost: 20% cheaper than gp2.

gp2

 Purpose: Older SSD designed for general workloads with variable performance tied to volume
size.
 Performance:
o Baseline IOPS: 3 IOPS per GB (up to 16,000 IOPS for volumes >=5,334 GB).
o Throughput: Up to 250 MB/s.
 Use Cases: Similar to gp3 but less efficient for higher performance needs.
 Note: gp3 is recommended over gp2 for new workloads.

2. Provisioned IOPS SSD (io2 and io1)

io2 (High durability and performance)

 Purpose: Premium SSD for latency-sensitive applications requiring high performance and
durability.
 Performance:
o IOPS: Up to 256,000 (for Nitro-based EC2 instances).
o Throughput: Up to 4,000 MB/s.
o Durability: 99.999% (100x lower annual failure rate than io1).
 Use Cases:
o Critical databases (e.g., Oracle, SAP HANA, SQL Server).
o High-performance analytics.

io1

 Purpose: Older version of provisioned IOPS SSD.


 Performance:
o Similar IOPS and throughput to io2 but with lower durability (99.8%-99.9%).
 Use Cases: Legacy applications.
 Note: io2 is recommended over io1 for new workloads.

3. Throughput Optimized HDD (st1)


 Purpose: Designed for large, sequential workloads requiring high throughput.
 Performance:
o Throughput: Baseline of 40 MB/s per TB, burstable up to 500 MB/s.
o IOPS: Typically lower than SSDs; optimized for sequential throughput rather than
random I/O.
 Use Cases:
o Big data processing.
o Data warehousing.
o Log processing.
 Cost: Lower than SSD volumes.
 Limitations: Not suitable for latency-sensitive applications.

4. Cold HDD (sc1)

 Purpose: Lowest-cost storage for infrequently accessed workloads.


 Performance:
o Throughput: Baseline of 12 MB/s per TB, burstable up to 250 MB/s.
o IOPS: Typically lower; intended for infrequently accessed, sequential workloads.
 Use Cases:
o Archive storage.
o Cold data that is rarely accessed.
 Cost: The cheapest option among EBS volumes.
 Limitations: Not suitable for performance-critical workloads.

5. Magnetic (Standard or EBS Magnetic) [Deprecated]

 Purpose: Legacy option for workloads that don't require high throughput or IOPS.
 Performance:
o IOPS: 40–200.
o Throughput: Limited.
 Use Cases: Rarely used; replaced by st1 or sc1 for cost-efficiency.
EBS Volume Lifecycle

1. Creating Volumes:
o EBS volumes can be created through the AWS Management Console, AWS CLI, or
AWS SDKs. You can select the appropriate volume type and size based on your
application’s needs.
2. Attaching to EC2 Instances:
o EBS volumes can be attached to running or stopped EC2 instances. Once attached, they
are available for mounting, allowing you to create file systems or databases.
3. Taking Snapshots:
o Snapshots provide point-in-time backups of your EBS volumes. Snapshots are
incremental, meaning only changed data is stored, making them efficient in terms of cost.
4. Expanding and Modifying Volumes:
o EBS volumes can be resized to accommodate data growth, and performance
characteristics (such as IOPS or throughput) can be changed without needing to stop the
associated EC2 instance.
5. Detaching and Deleting Volumes:
o Volumes can be detached from EC2 instances and deleted when no longer needed. You
can delete volumes that are no longer required to reduce costs.
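The lifecycle above maps directly onto EC2 API calls. The following sketch (hypothetical Availability Zone, instance ID, and device name) creates a gp3 volume, attaches it, and takes a snapshot:

import boto3

ec2 = boto3.client("ec2")

# 1. Create a 100 GiB gp3 volume with baseline performance.
volume = ec2.create_volume(
    AvailabilityZone="ap-south-1a",
    Size=100,
    VolumeType="gp3",
    Iops=3000,
    Throughput=125,  # MB/s
)
vol_id = volume["VolumeId"]

# Wait until the volume is ready to attach.
ec2.get_waiter("volume_available").wait(VolumeIds=[vol_id])

# 2. Attach it to an EC2 instance in the same Availability Zone.
ec2.attach_volume(
    VolumeId=vol_id,
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)

# 3. Take an incremental point-in-time snapshot (stored in Amazon S3).
ec2.create_snapshot(VolumeId=vol_id, Description="nightly backup")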

4.3.1 EBS vs S3

 Object Storage vs Block Storage vs File Storage: Amazon S3 is object storage; data is stored as
whole objects in buckets and accessed over HTTP(S) APIs, which suits unstructured data,
backups, and static content at virtually unlimited scale. Amazon EBS is block storage; a volume
is attached to a single EC2 instance within one Availability Zone and exposed as a raw block
device, which suits databases, boot volumes, and other low-latency workloads. Amazon EFS
(Section 4.4) is file storage; a shared file system is mounted over NFS by many instances at once.
In short, choose S3 for API-accessed object data, EBS for dedicated instance disks, and EFS for
shared file access.
4.4 Elastic File Storage

Amazon Elastic File System (EFS) is a fully managed, scalable file storage service that provides a
simple, scalable, and elastic NFS (Network File System) for use with Amazon EC2 instances and on-
premises servers. It is designed to be used by applications that require a file system interface and shared
access to data.

Key Features of Amazon EFS


1. Fully Managed:
o EFS is fully managed, meaning AWS handles the infrastructure, patching, and scaling of
the file system, so you can focus on your applications rather than managing the file
system itself.
2. Scalable:
o EFS automatically scales up and down as you add or remove files, without needing to
provision storage in advance. It can handle petabytes of data, and there are no limits to
the number of files you can store.
3. Shared File Storage:
o EFS provides a shared file system, allowing multiple EC2 instances (or other compute
resources) to access the same data simultaneously. This makes it well-suited for
applications requiring a centralized file system or for workloads that need to share data
among different instances.
4. Elastic and Pay-As-You-Go:
o EFS provides elastic storage capacity, meaning you only pay for the storage you use, with
no upfront costs or over-provisioning. As your data storage needs grow, EFS
automatically scales to meet demand.
5. High Availability and Durability:
o EFS is designed for 99.999999999% durability. Data is automatically replicated across
multiple Availability Zones (AZs) within a region, ensuring high availability and fault
tolerance.
6. Low-Latency Access:
o EFS provides low-latency, high-throughput access to file data. It is ideal for applications
that need high performance and can benefit from low-latency file access, such as big data
analytics, web serving, content management, and media processing.
7. Integration with EC2:
o EFS can be easily mounted to EC2 instances using NFS v4.1 or v4.0 protocols, making
it simple to integrate with applications running on EC2.
8. Encryption:
o EFS supports encryption both at rest (while stored) and in transit (when being transferred
between clients and the file system). This ensures that your data is secure.
9. Access Control:
o EFS supports AWS Identity and Access Management (IAM) for controlling access and
permissions. It also integrates with AWS Directory Service for managing user access
using Active Directory credentials.
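A sketch of provisioning EFS with boto3 follows (the subnet and security group IDs are placeholders); once a mount target exists, instances in that subnet mount the file system over NFS:

import time

import boto3

efs = boto3.client("efs")

# Create an encrypted, general-purpose file system.
fs = efs.create_file_system(
    CreationToken="example-shared-fs",  # idempotency token
    PerformanceMode="generalPurpose",
    Encrypted=True,
)
fs_id = fs["FileSystemId"]

# Wait for the file system to become available (EFS has no built-in waiter).
while efs.describe_file_systems(FileSystemId=fs_id)["FileSystems"][0][
    "LifeCycleState"
] != "available":
    time.sleep(5)

# Expose it inside a VPC subnet so EC2 instances there can mount it.
efs.create_mount_target(
    FileSystemId=fs_id,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroups=["sg-0123456789abcdef0"],
)

# On an instance, the file system is then mounted over NFS v4.1, e.g.:
#   mount -t nfs4 <fs_id>.efs.<region>.amazonaws.com:/ /mnt/efs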

4.6 Identity and Access Management

Identity and Access Management (IAM) is a framework of policies, technologies, and techniques
designed to secure and manage access to cloud resources effectively. IAM frameworks allow
administrators to control who can access each digital resource in an organization and what actions they
can take. IAM is central to cloud security, ensuring that only authorized users or systems can access and
perform actions on cloud resources. As a foundational component of cloud security, IAM ensures that:

1. Authentication verifies identities.
2. Authorization ensures proper access controls.
3. Auditing tracks and logs all access activity.

By implementing a robust IAM framework, organizations can secure their cloud environments, reduce
risks, and maintain operational compliance.

Core Concepts of IAM

1. Identity Management

 Users: Represent people or applications needing access to cloud resources. Users can have:
o Login credentials (username/password).
o Access keys for programmatic access.
 Groups: Logical collections of users with similar access needs, simplifying permission
management.
 Roles: Assignable identities with permissions, ideal for temporary or service-level access.
 Service Accounts: Non-human identities used by applications or services to interact with cloud
resources.

2. Authentication

Ensures that users or systems are who they claim to be:

 Credentials such as passwords, API keys, and certificates.


 Multi-Factor Authentication (MFA): Adds an extra layer of security by requiring multiple
forms of verification (e.g., a password + a smartphone-generated code).

3. Authorization

Defines the actions a user or system can perform:

 Policies: Documents (e.g., JSON-based in AWS) that specify permissions (allow/deny) for
specific resources or actions.
 Granular Permissions: Fine-tuned access control, e.g., allowing a user to “read” a storage
bucket but not “write” or “delete.”

4. Federated Access

 Integration with external identity providers (e.g., Active Directory, Okta, Google Workspace)
allows users to use single sign-on (SSO) to access cloud services.
 Simplifies user management for organizations with existing identity solutions.

5. Monitoring and Auditing

 Logs all access and actions performed on resources.


 Enables tracking of who accessed what, when, and how, which is critical for security and
compliance.

In AWS, Identity and Access Management is administered by the root user (administrator) of the
organization's account. Each IAM user represents one person within the organization, and users can be
placed in groups so that all members of a group share the same privileges to services.
4.6.1 IAM Architecture

IAM Architecture has following management activities

User Management: It consists of activities for the control and management of identity life cycles.

Authentication Management: It consists of activities for effectively controlling and managing the
processes that determine which user is trying to access the services and whether that user's identity is
valid.

Authorization Management: It consists of activities for effectively controlling and managing the
processes that determine which services a user is allowed to access, according to the policies made by
the administrator of the organization.

Access Management: It responds to requests made by users wanting to access resources within the
organization.

The IAM architecture figure depicts the lifecycle of user authentication and authorization in cloud
computing and the interconnected services involved in managing identities and securing access to
resources. Its components are explained as follows.

1. Authentication Management Services

 It handles verifying the identity of users, employees, suppliers, customers, and other entities
needing access. The key elements of Authentication Management Services are:
o Authentication mechanisms like usernames, passwords, tokens, or biometrics.
o Federation services, which may enable users from external identity providers to
access systems securely (e.g., Single Sign-On or SSO).
o Identity validation before allowing any access.

2. Access Management Services

 It governs which resources authenticated users can access based on predefined rules and roles.
 Key functions of Access Management Services are:
o Enforcing access control policies to regulate what actions users can perform on systems
and applications.
o Integrating with federated identity systems to extend access control to external users or
organizations.

3. User Management Services

 It provides centralized management of user identities throughout their lifecycle.

 Key features of User Management Services are:
o Authoritative Sources: Stores user identity information such as roles, groups, and
permissions.
o Maintains user lifecycle events like onboarding, role changes, and offboarding.
o Ensures consistent identity data across all connected systems.

4. Authorization Management

 It approves and governs user access to resources based on predefined roles, rules, and the desired
security state.
 Key aspects of Authorization Management are:
o Authorization Models: Define roles, rules, or attribute-based policies determining
resource access.
o Aligns the desired security state with the actual state to ensure compliance.
o Works closely with provisioning services to ensure access rights are implemented
accurately.

5. Provisioning Services

 It automates the assignment of access rights to users based on roles or rules.

o Provisions and de-provisions user accounts, roles, and permissions.
o Synchronizes identity data across systems and applications to ensure appropriate access
levels.
o Operates either manually or via automated workflows.

6. Monitoring and Auditing Services

 It tracks, logs, and reports all user activities related to authentication, authorization, and access.
 Monitoring and Auditing Services have the following key components:
o Monitoring Services: Continuously observe user activities and system performance.
o Auditing Services: Log user actions and resource access for compliance and forensic
purposes.
o Reporting Services: Generate reports for stakeholders to analyze access trends and
ensure compliance with security policies.

User Lifecycle Management

 The figure shows the user lifecycle, which includes the following stages:
o Identity Creation: A user is added to the system through user management services.
o Access Assignment: Roles and permissions are granted using authorization and
provisioning services.
o Activity Monitoring: User actions are tracked to ensure adherence to policies.
o De-provisioning: When users leave or their roles change, access is revoked through de-
provisioning.

Federation

 Federation enables external users or systems to access resources securely using their existing
credentials from external identity providers. This feature simplifies user management for
organizations collaborating with external partners.

Interactions among Components

1. Authentication ensures a user’s identity is verified before accessing the system.


2. Authorization determines what actions the user can perform.
3. Provisioning services grant permissions, ensuring roles and policies align with user activities.
4. Monitoring and Auditing continuously ensure compliance and provide actionable insights.

Role-Based Access Control (RBAC)

Role-Based Access Control (RBAC) is a method of regulating access to resources based on the
roles of individual users within an organization. It is one of the most commonly used access
control models because it simplifies management and enforces the principle of least privilege,
allowing users to access only what they need for their job responsibilities.

Key Concepts in RBAC

1. Roles:
o Represent a collection of permissions.
o Define the tasks a user can perform, e.g., "Database Administrator," "Developer," or
"Viewer."
o A role might include multiple permissions, such as read, write, or delete access to
specific resources.
2. Users:
o Individual entities (people or services) assigned to one or more roles.
o A user's access permissions are determined by the roles assigned to them.
3. Permissions:
o Define the actions that can be performed on resources (e.g., "read," "write," or "delete"
access).
o Assigned to roles, not directly to users.
4. Resources:
o Objects or services that users need access to, such as databases, files, or APIs.
5. Groups (Optional):
o Collections of users with similar responsibilities.
o Can be assigned to roles for simplified access management.
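In AWS IAM, RBAC is typically approximated with groups and managed policies. A minimal sketch follows (the group and user names are hypothetical; the policy ARN is a real AWS-managed policy):

import boto3

iam = boto3.client("iam")

# Role-like group: members of "Developers" may read S3 but nothing more.
iam.create_group(GroupName="Developers")
iam.attach_group_policy(
    GroupName="Developers",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)

# Assign a user to the role by adding them to the group; their effective
# permissions are whatever the group's attached policies grant.
iam.create_user(UserName="alice")
iam.add_user_to_group(GroupName="Developers", UserName="alice")

Because permissions attach to the group rather than the user, changing what "Developers" may do is a single policy update, and offboarding a user is a single remove_user_from_group call, which is exactly the management simplification RBAC promises.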

Some Important Questions

1) Write a short note on Amazon Simple Storage Service (S3)


2) Explain in brief the storage classes in Amazon S3
3) Explain the Bucket properties of S3
4) Enlist the Key features of AWS Glacier
5) Explain S3 Glacier storage classes in detail
6) What is EBS? Explain the different EBS volume types in brief
7) Explain the various storage classes and EBS volume types in brief
8) What is IAM? Explain the architecture of IAM along with concept of RBAC
9) Compare between Object Storage vs Block Storage
10) Compare between S3 vs EBS
11) Write a short note on IAM
