
CP Notes - Storage Services

The document provides an overview of Amazon Elastic Block Store (EBS) and AWS Simple Storage Service (S3), detailing their functionalities, use cases, and different storage classes. It explains the architecture of EBS, including its attachment to EC2 instances, and discusses various S3 storage classes like Standard, Infrequent Access, and Glacier, along with their pricing and durability features. Additionally, it covers S3's capabilities for static website hosting, lifecycle policies, versioning, and transfer acceleration, as well as introducing Amazon SQS for message queuing and AWS Storage Gateway for hybrid storage solutions.

Uploaded by

Nafez Rajha

Elastic Block Store (EBS)

Simple Use-Case: External Hard Disk
An external hard disk can be attached to a workstation and detached whenever
required (portable).

Depending on the use case, you can buy external storage in different
configurations based on size, performance, and other factors.

Workstation
Understanding the Basics
Elastic Block Store (Amazon EBS) is a scalable, high-performance block-storage
service designed for Amazon EC2.

These volumes can be attached to and detached from an EC2 instance.

Storage Volume EC2 Instance


EBS is Network Storage
EBS volumes are attached to an EC2 instance over the network.

Because of this, some latency may be involved.

1 volume attached

Network Mount
EC2 Instance

Elastic Block Store


Availability Zone Specific
You create an EBS volume in a specific Availability Zone, and then attach it to an
instance in that same Availability Zone.

Attaching a volume to an instance in a different Availability Zone is not supported.
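
The same-zone rule can be pictured with a small sketch. This is a toy model in plain Python, not the EC2 API: the `Volume` and `Instance` classes, the `can_attach` helper, and the IDs are all hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    volume_id: str
    availability_zone: str
    size_gb: int


@dataclass(frozen=True)
class Instance:
    instance_id: str
    availability_zone: str


def can_attach(volume: Volume, instance: Instance) -> bool:
    """Return True only when the volume and instance share an Availability Zone."""
    return volume.availability_zone == instance.availability_zone


# Hypothetical IDs, for illustration only.
volume = Volume("vol-example", "us-east-1a", 10)
same_az = Instance("i-example-a", "us-east-1a")
other_az = Instance("i-example-b", "us-east-1b")

print(can_attach(volume, same_az))   # True
print(can_attach(volume, other_az))  # False
```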

1 volume attached

us-east-1a us-east-1a
EC2 and EBS Attachment
Multiple EBS volumes of different sizes can be attached to one EC2 instance.

10 GB

EC2 Instance

100 GB
Instance Store
Temporary Data
Revising EBS Architecture
In this type of architecture, the EBS volumes are mounted via the network to the EC2
instance.

When we stop and start the EC2 instance, the VM might migrate to a different host altogether.
Since EBS is mounted via the network, the data stays reachable from the new host.

Network

knowledge portal
What is Instance Store?

AWS Instance Store provides temporary block storage volumes for use with EC2.

This storage is located on the disks that are physically attached to the host computer.

The size of instance store varies depending on your instance type.

Important Points for Instance Store

Data in instance store is lost in the following situations:

● The underlying disk drive fails.
● The instance stops.
● The instance terminates.

Instance store is included in the cost of the EC2 instance, so it is quite cost-effective.

If you plan to use instance store, make sure you back up your data to central storage
such as S3.
AWS Simple Storage Service (S3)

Unlimited Storage
Use Case - Storage Capacity
We all have some important data that needs to be reliably stored.

Most of us decide to store our data on external hard disk drives as a backup.

Important Folder
Better Approach
In this approach, we decide to replicate the data on two different hard disk drives.

Downsides:

● It is an expensive approach.
● If you keep both disks in the same location, a single incident can destroy both copies.

Important Folder
Best Approach
In this approach, we store the data in the cloud and let the cloud provider take care of backups.

Backups

Important Folder Cloud Storage


Cloud Storage Providers
Multiple cloud storage providers are available.

Depending on your use-case, you can select one among them.

Cloud Storage
Introduction to S3

● AWS Simple Storage Service (S3) is an object storage service designed to store and retrieve any
amount of data from anywhere.

● It is designed for 99.999999999% durability and 99.99% availability.

● What makes AWS S3 so powerful is its associated features:

  ● Versioning
  ● Encryption
  ● Logging
  ● Transfer Acceleration
  ● Cross-Region Replication
  ● Events
  ● Requester Pays
  ● Static Website Hosting
  ● Tagging


S3 Terminology
There are two important terms in AWS S3:

- Buckets
- Objects

Important Note: Bucket names are unique across the entire AWS namespace.
S3 Storage Classes
Cloud Storage is Saviour
Use-Case - Netflix
Netflix offers different subscription plans for different categories of requirements.

Main Aim: Watch the Entertainment Content
Initial Challenge - S3
AWS has millions of active customers.

Each customer might have different requirements for data storage.

Main Aim: Store Data.

S3 Storage Classes
Amazon S3 offers a range of storage classes designed for different use cases.

Storage Class                    Description

S3 Standard                      Offers high durability, availability, and performance
                                 object storage for frequently accessed data.

S3 Standard-Infrequent Access    For data that is accessed less frequently, but requires
                                 rapid access when needed.

Amazon S3 Glacier                Low-cost storage class for data archiving.
AWS S3 Standard

S3 Standard offers high durability, availability, and performance object storage for frequently
accessed data.

Designed for durability of 99.999999999% of objects (eleven nines).

Example:

If you store 10,000 files in S3 (eleven nines durability), you can expect to lose, on average,
one file every ten million years.
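
The arithmetic behind that figure can be checked with a short sketch. It assumes the eleven-nines figure is read as an annual per-object durability, and it uses exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction

DURABILITY = Fraction(99_999_999_999, 100_000_000_000)  # eleven nines
annual_loss_probability = 1 - DURABILITY                 # 1 / 10**11 per object per year

stored_objects = 10_000
expected_losses_per_year = stored_objects * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(years_per_lost_object)  # 10000000
```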

AWS S3 Standard IA
S3 Standard-IA is for data that is accessed less frequently, but requires rapid access when
needed.

Comparing the storage cost of 1 TB of data stored in S3 based on access patterns:

Criteria                                 Amazon S3    Amazon S3 IA

Storage of 1 TB data                     $23.44       $23.50

50% of storage accessed in last 30 days  -            $18.18

0% of storage accessed in last 30 days   -            $12.80
Amazon S3 Glacier

Glacier is meant for archiving and for storing long-term backups.

It is ideal for data that needs to be archived for years and is rarely accessed.

Criteria                 Amazon S3    Glacier

Storage of 1 TB data     $23.44       $4.10
Multiple S3 Storage Classes
Durability vs Availability

● Durability is the percentage (%) likelihood, over a one-year period, that a file stored in S3
will not be lost.

● Availability is the percentage (%) of time, over a one-year period, that a file stored in S3
will be accessible.

Example:

For servers, availability is one of the key metrics, and any minute of downtime is a loss.
However, what happens if a component of the server itself fails and the server goes down?
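
To make the availability number concrete, the sketch below converts an availability percentage into the downtime it allows per year. The 99.99% value is the S3 figure quoted earlier in these notes; the helper name and the 365-day year are assumptions made for illustration.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability_percent: str) -> float:
    """Minutes per year a service may be unavailable at the given availability."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return float(unavailable * MINUTES_PER_YEAR)


print(allowed_downtime_minutes("99.99"))  # 52.56
```

Durability has no equivalent "downtime": a lost object does not come back, which is why the two metrics are quoted separately.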

S3 Intelligent-Tiering
Smart Automated System
Overview of S3 Intelligent Tiering

S3 Intelligent-Tiering is primarily designed to optimize cost by automatically moving
data to the most cost-effective tier.

● 1 TB of data stored in Standard S3 = $23.44
● 1 TB of data stored in Standard IA = $12.80

Organizations store terabytes of data in S3.

It would be useful to have a solution that automatically moves infrequently accessed data to Standard IA.
Overview of S3 Intelligent Tiering
S3 Intelligent-Tiering works by storing data in one of two access tiers:

● Frequent Access Tier (costly)
● Infrequent Access Tier (much cheaper)

[Figure: a smart automation system moves objects between the Frequent Access Tier and the Infrequent Access Tier]
Revising S3 Intelligent Tiering
Amazon S3 monitors access patterns of the objects in S3 Intelligent-Tiering, and moves the
ones that have not been accessed for 30 consecutive days to the infrequent access tier.

If an object in the infrequent access tier is accessed, it is automatically moved back to the
frequent access tier.

A monthly monitoring and automation fee is charged per object.
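
The 30-day rule described above can be sketched as a small decision function. This is a toy model, not how S3 implements tiering: the function name and the tier labels are hypothetical, and only the 30-day threshold comes from the text.

```python
from datetime import date, timedelta

INFREQUENT_AFTER = timedelta(days=30)  # threshold taken from the text above


def access_tier(last_accessed: date, today: date) -> str:
    """Return the tier an object would sit in on `today` under the 30-day rule."""
    if today - last_accessed >= INFREQUENT_AFTER:
        return "INFREQUENT_ACCESS"
    return "FREQUENT_ACCESS"


today = date(2024, 3, 1)
print(access_tier(date(2024, 2, 25), today))  # FREQUENT_ACCESS
print(access_tier(date(2024, 1, 15), today))  # INFREQUENT_ACCESS
print(access_tier(today, today))              # FREQUENT_ACCESS (just accessed again)
```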

S3 Storage Class - One Zone IA
Back again!
Understanding the Basics
Storage classes like S3 Standard and Standard IA store data in a minimum of three Availability Zones.

Because of this, the overall cost of storage is higher with such an architecture.

Availability Zone 1 Availability Zone 2 Availability Zone 3
Overview of One Zone IA
S3 One Zone-IA stores data in a single AZ and costs 20% less than S3 Standard-IA.

It’s a good choice for storing secondary backup copies of on-premises data or easily recreatable
data.

Data will be lost if the Availability Zone is destroyed.

Availability Zone 1
Pricing Comparison

Overview of Pricing comparison between storage classes:

● 1TB of data stored in Standard S3 = $23.44

● 1TB of data stored in Standard IA = $12.80

● 1 TB of data stored in One Zone IA = $10.24
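
As a quick consistency check, the sketch below computes the relative savings from the three prices listed above. The prices are taken from these notes as-is; the helper name is hypothetical.

```python
from decimal import Decimal

# Cost of storing 1 TB, as listed above.
cost_per_tb = {
    "Standard S3": Decimal("23.44"),
    "Standard IA": Decimal("12.80"),
    "One Zone IA": Decimal("10.24"),
}


def percent_cheaper(cheaper: str, baseline: str) -> Decimal:
    """Percentage saved by choosing `cheaper` over `baseline`, to one decimal."""
    saving = 1 - cost_per_tb[cheaper] / cost_per_tb[baseline]
    return (saving * 100).quantize(Decimal("0.1"))


print(percent_cheaper("One Zone IA", "Standard IA"))  # 20.0
print(percent_cheaper("Standard IA", "Standard S3"))  # 45.4
```

The first result matches the "20% less than S3 Standard-IA" figure quoted for One Zone-IA.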

S3 Storage Class - Glacier
Back again!
Overview of Glacier
Amazon S3 Glacier is a low-cost, cloud-archive storage service that provides secure and durable
storage for data archiving and online backup.

Storage Class

Glacier Glacier Deep Archive

Pricing Comparison

Storage Class and Data                          Storage Pricing

1 TB data stored in S3 Standard                 $23.44

1 TB data stored in Standard IA                 $12.80

1 TB data stored in Glacier                     $4.10

1 TB data stored in Glacier Deep Archive        $1.02
Glacier vs Glacier Deep Archive
To keep costs low, Amazon S3 Glacier provides three options for access to archives, from a few
minutes to several hours.

Glacier Deep Archive provides two access options, which range from 12 to 48 hours.

Storage Class              Expedited        Standard           Bulk

Amazon S3 Glacier          1–5 minutes      3–5 hours          5–12 hours

S3 Glacier Deep Archive    Not available    Within 12 hours    Within 48 hours
Important Note

Use Amazon S3 Glacier for archived data that may occasionally need to be restored within minutes
to a few hours.

Use S3 Glacier Deep Archive for long-term backup data that may occasionally
need to be restored within 12 hours.
Static Website Hosting in S3
A Great Feature
Challenges with Static Websites
Traditionally, to host even a basic static website, you need to create and manage the entire server
infrastructure on a cloud provider.

EC2

SSM

ELB
Static Website

Static Website Hosting in S3
With Amazon S3, you can host static websites directly from an S3 bucket.

S3 Bucket
Static Website

Static Vs Dynamic Websites

On a static website, individual webpages include static content. They might also contain
client-side scripts.

A dynamic website relies on server-side processing, including server-side scripts such as PHP, JSP,
or ASP.NET.
S3 Lifecycle Policies
S3 is Awesome
Overview of Lifecycle Policies

Organizations tend to keep terabytes of data in S3. In such cases, cost becomes a primary
factor.

Storing all data directly in AWS S3 Standard is not the best approach. Depending on the
access patterns and criticality of the data, it should be transitioned to an appropriate storage class.

● Store 1 month of logs in Amazon S3 Standard.
● Move logs older than 1 month to S3 Standard-IA.
● Move logs older than 6 months to Glacier.
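
The log example above could be written as a lifecycle configuration along the following lines. This is a sketch that only builds the configuration as a Python dictionary, in the shape the S3 lifecycle API accepts (for example through boto3's `put_bucket_lifecycle_configuration`). The rule ID and the `logs/` prefix are hypothetical, and 30 and 180 days stand in for "1 month" and "6 months".

```python
import json

# Hypothetical rule ID and prefix; the day counts mirror the example above
# (roughly 1 month and 6 months).
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```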

Transition Actions

[Figure: video.mp4 in S3 Standard, moved to S3 Standard-IA after 30 days, then to Glacier after 60 days]
S3 Versioning

Versioning in Object Storage


Challenge 1 - Multiple Objects with the Same Key

[Figure: a.txt is stored in Storage several times under the same key]
Challenge 2 - Accidental Deletion of Objects

Delete File

a.txt

Storage

Versioning in Object Storage

Versioning allows users to keep multiple variants of an object in the same S3 bucket.

You can use versioning to preserve, retrieve, and restore every version of every object stored in
your Amazon S3 bucket.

a.txt a.txt

a.txt a.txt

Storage

Important Pointers for Versioning
Once you enable versioning on a bucket, it can never return to an unversioned state. You can,
however, suspend versioning on that bucket.

The versioning state applies to all (never some) of the objects in that bucket.
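
How versioning addresses both challenges (overwrites under the same key and accidental deletion) can be sketched with a toy in-memory model. This is not the S3 API: the `VersionedBucket` class, its method names, and the `v1`, `v2` style version IDs are hypothetical, and the delete marker is a simplification of what S3 does.

```python
from itertools import count


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}      # key -> list of (version_id, body or None)
        self._ids = count(1)

    def put(self, key, body):
        # Every write under the same key adds a new version.
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        # A delete adds a marker instead of erasing earlier versions.
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("a.txt", "draft")
bucket.put("a.txt", "final")
print(bucket.get("a.txt"))         # final
bucket.delete("a.txt")
print(bucket.get("a.txt", first))  # draft
```

After the delete, a plain `bucket.get("a.txt")` raises `KeyError`, while the older versions remain retrievable by ID.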

S3 Transfer Acceleration
Least Latencies
Overview of S3 Transfer Acceleration
S3 Transfer Acceleration allows users to accelerate data uploads from all over the world to
a centralized S3 bucket.

Transfers are accelerated by routing data to the closest edge location, from where it reaches
the bucket over an optimized network path.

Optimized Network
STA Path

CloudFront Edge S3 Bucket

User

Direct S3 Upload
Edge Locations

Intro to Amazon SQS
Message Queuing Service
Use-Case: Restoring Image Application
Medium Corp is designing an application that will enhance and restore the images that
users submit through the online portal.

Current Architecture

The overall architecture involves two components:

1. Image Gatherer - Takes the Images from the user via Upload button.
2. Image Enhancer - Receives the Image from Image Gatherer.

Send Image

Image Gatherer Image Enhancer

Challenges
Due to the popularity of the application and a huge traffic spike, Medium Corp has decided to
add more image enhancer servers.

server-a 10.77.2.5

server-b 10.66.0.10

server-c 192.168.0.2

server-d 10.66.10.10
Better Architecture
One of the main functions of a message queue service is to take messages from a publisher
and forward them to a consumer.

The queue stores these messages internally.

Send to Queue
Take from Queue

Highly-Available Queue

Introduction to SQS

Amazon SQS is a fast, reliable, scalable, and fully managed message queuing service.

Amazon SQS makes it simple and quite cost-effective to decouple the components of a
specific application.

Send to Queue
Take from Queue

SQS

Tightly Coupled Systems

Components of the system architecture communicate directly with each other and have
a hard dependency on each other.

Producers Consumers

Loosely Coupled System

Components of system architecture that can process the information without being
directly connected.
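
A loosely coupled version of the image application can be sketched with Python's standard-library `queue.Queue` standing in for a managed queue such as SQS. This is an in-process illustration only, not the SQS API; the file names and the three-worker setup are hypothetical.

```python
import queue
import threading

image_queue = queue.Queue()   # stands in for a managed queue such as SQS
enhanced = []
enhanced_lock = threading.Lock()


def image_enhancer():
    """Consumer: pulls work from the queue until it sees the stop signal."""
    while True:
        image = image_queue.get()
        if image is None:
            break
        with enhanced_lock:
            enhanced.append(f"enhanced-{image}")


workers = [threading.Thread(target=image_enhancer) for _ in range(3)]
for worker in workers:
    worker.start()

# Producer (the Image Gatherer): it only needs to know about the queue,
# not the addresses of the individual enhancer servers.
for name in ["cat.png", "dog.png", "tree.png", "car.png"]:
    image_queue.put(name)

for _ in workers:
    image_queue.put(None)     # one stop signal per worker
for worker in workers:
    worker.join()

print(sorted(enhanced))
```

Adding or removing enhancer workers changes nothing on the producer side, which is the point of decoupling through a queue.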

Producers
Consumers

Storage Gateway
Hybrid Storage
Introduction

AWS Storage Gateway is a hybrid storage service that allows on-premises applications to
easily use cloud storage.
Storage Gateway
The Storage Gateway appliance exposes standard storage protocols such as NFS and iSCSI, which the
application connects to in order to store its data.

The other end of the Storage Gateway connects to AWS storage services such as S3, Glacier, and
EBS snapshots.
Storage Gateway Configuration

There are three configurations available:

- Gateway Stored Volume
- Gateway Cached Volume
- Gateway-virtual tape library
Gateway Stored Volumes

Gateway Stored Volume :

Stores primary data locally while asynchronously backing up data to AWS.

Gateway Cached Volumes
Gateway Cached Volume :

Data is stored primarily on AWS S3 with cache of recently read or written data stored
locally in the on-premise server.

Gateway-virtual tape library
Gateway-virtual tape library :

Virtual tapes are stored in S3, with frequently accessed data stored on-premises.
Tape based storage
Tape backup is the practice of periodically copying data from a primary storage device to a tape
cartridge so that data can be recovered if the primary device crashes or fails.

Tape solutions remain the most cost-effective solution to date.
VTS and VTL

CloudWatch Logs
Logging Yet Again!
Challenges with Logging
A server can contain a lot of log files, from system logs to the application logs.

During debugging, it is important to have log files at hand.

By default, this means you need to give server access to anyone who wants to
debug.

file.log
log line 01 - GET request
log line 02 - PUT request
log line 03 - DELETE
log line 04 - PATCH
log line 05 - POST
log line 06 - PUT request

Disadvantage of the Approach
Developers must be given access to the server.

If the server gets terminated, the logs are lost.

No way to set up an alarm on certain conditions or create complex filters.

Better Way
● We create a Central Log Server.

● We push the log files from individual systems to Central Log Server.

Centralized Log Management
There are multiple approaches to log the data centrally.

Many Linux distributions come with a default logging daemon called rsyslog.

Log monitoring solutions such as Splunk or the ELK stack can be used.

Services like AWS CloudWatch Logs also provide basic capabilities.
CloudWatch Logs
CloudWatch Logs can be used to monitor, store, and access your log files from Amazon Elastic
Compute Cloud (Amazon EC2) instances, AWS CloudTrail, Route 53, and other sources.

It is a highly available, scalable service.

Amazon CloudWatch

Steps - CloudWatch Logs
There are three steps required to set up CloudWatch Logs with EC2.

1. Create an IAM role with appropriate policy.
2. Install CloudWatch Logs Agent.
3. Modify the configuration file to associate appropriate log files.
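
Steps 1 and 3 can be sketched as two small documents, built here as Python dictionaries. Both are illustrations under assumptions: the policy is a broad example (a real role would usually scope `Resource` more tightly), the agent configuration assumes the JSON format of the unified CloudWatch agent rather than the older Logs agent, and the file path and log group name are hypothetical.

```python
import json

# Step 1: a policy document the instance role could carry.
log_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
            ],
            "Resource": "*",  # broad on purpose for the sketch
        }
    ],
}

# Step 3: agent configuration naming the log files to ship.
agent_config = {
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/app/file.log",  # hypothetical path
                        "log_group_name": "app-logs",          # hypothetical name
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    }
}

print(json.dumps(agent_config, indent=2))
```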

Amazon CloudWatch

Amazon Elastic File System (EFS)
Network Attached Storage
Overview of Elastic File System
Amazon Elastic File System (Amazon EFS) provides a simple, serverless, set-and-forget
elastic file system for use with AWS Cloud services and on-premises resources.

It is built to scale on demand to petabytes without disrupting applications, growing and
shrinking automatically as you add and remove files.

Mount

Elastic File System EC2

Attachment to Multiple Targets
Multiple compute instances, including Amazon EC2, Amazon ECS, and AWS Lambda,
can access an Amazon EFS file system at the same time, providing a common data source
for workloads.

Elastic File System

Understanding EFS Architecture

To access the file system from an instance inside the VPC, we need to create a mount target in the
VPC.
Network File System
Network File System (NFS) is a networking protocol for distributed file sharing.

EFS uses the Network File System version 4 (NFS v4) protocol

Mount

Client

NFS Server

Pricing Considerations
AWS EFS is expensive when compared to other storage options like EBS and S3.

Consideration                                   Pricing

1 TB EFS with 80% frequently accessed data      $250

1 TB EBS storage                                $102

1 TB of S3 storage                              $24
Important Pointers

If performance is your concern, always prefer EBS.

EFS can even be accessed from an on-premises data center using an AWS Direct Connect or
AWS VPN connection.

With Amazon EFS, you pay only for what you use per month.
Amazon Kinesis
Streaming Data
Basics of Streaming Data.
Streaming data is the continuous flow of data generated by various sources

Sensor 1

Sensor 2 Processing Store

Sensor 3

Examples of Streaming Data
A financial institution tracks changes in the stock market in real time and adjusts its portfolio
accordingly.

A media publisher streams billions of clickstream records from its online properties.
Challenges of Working with Streaming Data
Streaming data processing requires two layers: a storage layer and a processing layer.

The storage layer needs to support record ordering, strong consistency, and replayable reads.

The processing layer is responsible for consuming data from the storage layer, running
computations on that data, and many other tasks.

Storage Layer Processing Layer
Basics of Amazon Kinesis
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you
can get timely insights and react quickly to new information.

Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale

3 entities
There are 3 entities in this kind of use case:

Producer, Stream Store, Consumer
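
The three entities can be sketched with a toy in-memory stream. This is not the Kinesis API: `StreamStore`, `put_record`, and `read_from` are hypothetical names, and the sensor readings are made-up sample data. The sketch only illustrates ordered records and replayable reads.

```python
class StreamStore:
    """Toy append-only stream: ordered records, replayable from any position."""

    def __init__(self):
        self._records = []

    def put_record(self, data):
        self._records.append(data)
        return len(self._records) - 1   # sequence number of the new record

    def read_from(self, sequence_number):
        return list(self._records[sequence_number:])


stream = StreamStore()

# Producers (for example, sensors) append readings.
for reading in [("sensor-1", 20.5), ("sensor-2", 19.0), ("sensor-1", 21.0)]:
    stream.put_record(reading)

# A consumer reads everything, in order, and computes on it.
records = stream.read_from(0)
average = sum(value for _, value in records) / len(records)
print(round(average, 2))                # 20.17

# Reads do not remove data, so another consumer can replay from any point.
print(stream.read_from(1))              # [('sensor-2', 19.0), ('sensor-1', 21.0)]
```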

Amazon Kinesis
Consumers
Producers

AWS Snowball Family

Understanding with Use-Case
Organization A hosts all of its storage infrastructure in a data center.

Total Storage: 500 TB.

They have now decided to use S3 due to the benefits that it provides.

Bandwidth    Transfer Time

100 Mbps     510 days

500 Mbps     101 days

1 Gbps       50 days
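
The transfer times can be reproduced approximately with a short calculation. The sketch assumes a fully utilized link with no protocol overhead and treats 1 TB as 2**40 bytes, which is an assumption chosen because it lands close to the figures in the table. It gives roughly 509, 102, and 51 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(terabytes: float, megabits_per_second: float) -> float:
    """Days needed to move `terabytes` at a sustained link speed."""
    bits = terabytes * 2**40 * 8                    # assumes 1 TB = 2**40 bytes
    seconds = bits / (megabits_per_second * 10**6)
    return seconds / SECONDS_PER_DAY


for mbps in (100, 500, 1000):
    print(mbps, "Mbps:", round(transfer_days(500, mbps), 1), "days")
```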
AWS Snowball Family
Allows customers to accelerate moving offline data or remote storage to the
cloud.

Send Storage Device

AWS Internal Network

AWS Team Customer

Ship Back Device


How the Storage Devices Look

Snowball Edge

Snowmobile
Edge Computing Functionality
These devices can also come with edge computing capabilities.

This means you can run your applications in EC2 instances on the devices so
that you can work in edge environments with limited connectivity.

Process data locally (Image/ Video Processing, Machine Learning etc)


Snowcone
AWS Snowcone is a small, rugged, and secure device offering edge computing,
data storage, and data transfer on the go, in severe environments with little or no
connectivity.

It can be carried in a backpack, on a drone, and so on.

8 TB of usable storage
Snowball Edge
AWS Snowball Edge is a type of Snowball device with on-board storage and
compute power for select AWS capabilities

Available in multiple storage capacities, such as 100 TB and 40 TB.


Snowball Edge
Device and Description

● Snowball Edge Storage Optimized (for data transfer): This option has a 100 TB (80 TB usable)
storage capacity.

● Snowball Edge Storage Optimized (with EC2 compute functionality): This option has up to 80 TB
of usable storage space, 24 vCPUs, and 80 GB of memory for compute functionality.

● Snowball Edge Compute Optimized: Most compute functionality, with 104 vCPUs, 416 GB of
memory, and 28 TB of dedicated NVMe SSD for compute instances.

● Snowball Edge Compute Optimized with GPU: Identical to the Compute Optimized option, except
for an installed GPU.
AWS Snowmobile
AWS Snowmobile moves extremely large amounts of data to AWS.

Transfer up to 100 PB per Snowmobile, a 45-foot-long ruggedized shipping
container pulled by a semi-trailer truck.
AWS OpsHub
AWS OpsHub is a graphical user interface you can use to manage your AWS
Snowball devices.
AWS Backup

Understanding with Use-Case
AWS has lots of services where data can be stored.

For a production environment, data backup is one of the critical tasks.

Taking backups at the individual service level can take a lot of time and require
customization.

RDS DynamoDB S3 EFS


Introducing AWS Backup
AWS Backup is a fully-managed service that allows customers to configure
backup policies in one central place.

AWS Backup

Backup Backup

RDS DynamoDB S3 EFS


Benefits of using AWS Backup

● Easily create backup rules for daily or monthly backups.
● The backup process is automated and runs at a scheduled time.
● Supports cross-Region and cross-account backups.
● AWS Backup can back up on-premises Storage Gateway volumes and VMware virtual machines.
● Supports a retention period that defines how long each backup is stored.
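
The retention period can be pictured with a small sketch that decides which backups have aged out. This is plain Python, not the AWS Backup API: the function name, the 7-day retention, and the dates are hypothetical.

```python
from datetime import date, timedelta


def expired_backups(backup_dates, retention_days, today):
    """Return the backups that are older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in backup_dates if d < cutoff]


today = date(2024, 3, 1)
daily_backups = [today - timedelta(days=n) for n in range(10)]  # last 10 days

print(len(expired_backups(daily_backups, retention_days=7, today=today)))  # 2
```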

