CP Notes - Storage Services
Simple Use-Case: External Hard Disk
An external hard disk can be attached to a workstation and detached whenever
required (it is portable).
Workstation
Understanding the Basics
Elastic Block Store (Amazon EBS) is a scalable, high-performance block-storage
service designed for Amazon EC2.
1 volume attached
Network Mount
EC2 Instance
1 volume attached
us-east-1a us-east-1a
EC2 and EBS Attachment
One EC2 instance can have multiple EBS volumes of different sizes attached
to it.
10 GB
EC2 Instance
100 GB
Instance Store
Temporary Data
Revising EBS Architecture
In this type of architecture, the EBS volumes are mounted via the network to the EC2
instance.
When we stop and start the EC2 instance, the VM might migrate to a different host altogether.
Since the EBS volume is mounted via the network, the data remains available after such a move.
Network
knowledge portal
What is Instance Store?
AWS Instance Store provides temporary block storage volumes for use with EC2.
This storage is located on the disks that are physically attached to the host computer.
Important Points for Instance Store
Instance store is included in the cost of the EC2 instance, so it is quite cost-effective.
If you plan to use instance store, make sure you back up your data to a central storage
service like S3.
AWS Simple Storage Service (S3)
Unlimited Storage
Use Case - Storage Capacity
We all have some important data that needs to be reliably stored.
Most of us decide to store our data on external hard disk drives as a backup.
Important Folder
Better Approach
In this approach, we decide to replicate the data on two different hard disk drives.
Downside:
● It is an expensive approach.
● If you keep both disks in the same location, a single incident can destroy both copies.
Important Folder
Best Approach
In this approach, we store the data in the cloud and let the cloud provider take care of backups.
Backups
Cloud Storage
Introduction to S3
● AWS Simple Storage Service (S3) is an object storage service designed to store and retrieve any
amount of data from anywhere.
● What makes AWS S3 so powerful is its associated features.
- Buckets
- Objects
Important Note: Bucket Names are unique across entire AWS Namespace.
S3 Storage Classes
Cloud Storage is Saviour
Use-Case - Netflix
Netflix offers several subscription plans for different categories of requirements.
Initial Challenge - S3
AWS has millions of active customers.
S3 Storage Classes
Amazon S3 offers a range of storage classes designed for different use cases.
S3 Standard-Infrequent Access: For data that is accessed less frequently, but requires
rapid access when needed.
AWS S3 Standard
S3 Standard offers high durability, availability, and performance object storage for frequently
accessed data.
Example:
If you store 10,000 files in S3 (11 nines of durability), you can expect to lose one file
every ten million years, on average.
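The arithmetic behind this example can be checked with a short sketch. It assumes that "11 nines" means an annual loss probability of 1 in 10^11 per file and that losses are independent; the variable names are illustrative only.

```python
from fractions import Fraction

# Assumption: 11 nines of durability = 99.999999999% per file per year,
# i.e. an annual loss probability of 1 in 10**11 for each file.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)
FILES_STORED = 10_000

# Expected number of files lost per year across the whole collection.
expected_losses_per_year = FILES_STORED * ANNUAL_LOSS_PROBABILITY

# Average number of years between single-file losses.
years_per_lost_file = 1 / expected_losses_per_year

print(f"Expected losses per year: {float(expected_losses_per_year):.0e}")
print(f"Roughly one file lost every {int(years_per_lost_file):,} years")
```

Exact fractions are used instead of floats so that the result is not distorted by rounding at such small probabilities.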
AWS S3 Standard IA
S3 Standard-IA is for data that is accessed less frequently, but requires rapid access when
needed.
Amazon S3 Glacier
Ideal for data that needs to be archived for years and is rarely accessed.
Multiple S3 Storage Classes
Durability vs Availability
● Durability is the percentage (%) likelihood, over a one-year period, that a file stored in S3
will not be lost.
● Availability is the percentage (%) of time, over a one-year period, that a file stored in S3
will be accessible.
Example:
For servers, availability is one of the key metrics, and every minute of downtime is a loss.
However, what happens if a component of the server itself fails and the server goes down?
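To make the availability percentage concrete, the sketch below converts it into allowed downtime per year. The 99.99% and 99.9% figures are illustrative inputs, not values taken from these notes, and the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be unreachable at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


# Illustrative availability targets (assumptions for this example).
for target in (99.9, 99.99):
    print(f"{target}% available -> {allowed_downtime_minutes(target):.2f} minutes down per year")
```

Note that downtime affects availability only. Data that is temporarily unreachable has not been lost, which is why durability is measured separately.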
S3 Intelligent-Tiering
Smart Automated System
Overview of S3 Intelligent Tiering
S3 Intelligent-Tiering works by storing data in one of two access tiers: a frequent access tier and an infrequent access tier.
Revising S3 Intelligent Tiering
Amazon S3 monitors access patterns of the objects in S3 Intelligent-Tiering, and moves the
ones that have not been accessed for 30 consecutive days to the infrequent access tier.
If an object in the infrequent access tier is accessed, it is automatically moved back to the
frequent access tier.
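The tiering rule described above can be modelled with a toy sketch. This is not how S3 is implemented internally; the class, function, and tier names are hypothetical, and only the 30-day threshold comes from the notes.

```python
from dataclasses import dataclass

INFREQUENT_AFTER_DAYS = 30  # threshold stated in the notes


@dataclass
class TrackedObject:
    key: str
    days_since_access: int = 0
    tier: str = "frequent"


def end_of_day(obj: TrackedObject, accessed_today: bool) -> TrackedObject:
    """Apply one day of the monitoring rule to a single object."""
    if accessed_today:
        # Any access resets the counter and returns the object to the frequent tier.
        obj.days_since_access = 0
        obj.tier = "frequent"
    else:
        obj.days_since_access += 1
        if obj.days_since_access >= INFREQUENT_AFTER_DAYS:
            obj.tier = "infrequent"
    return obj


report = TrackedObject("report.pdf")
for _ in range(30):
    end_of_day(report, accessed_today=False)
print(report.tier)  # infrequent, after 30 consecutive days without access
end_of_day(report, accessed_today=True)
print(report.tier)  # frequent again
```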
S3 Storage Class - One Zone IA
Back again!
Understanding the Basics
Storage classes like S3 Standard and S3 Standard-IA store the data in a minimum of 3 availability zones.
Due to this, the overall cost of storage is higher with such an architecture.
Overview of One Zone IA
S3 One Zone-IA stores data in a single AZ and costs 20% less than S3 Standard-IA.
It’s a good choice for storing secondary backup copies of on-premises data or easily recreatable
data.
Availability Zone 1
Pricing Comparison
S3 Storage Class - Glacier
Back again!
Overview of Glacier
Amazon S3 Glacier is a low-cost, cloud-archive storage service that provides secure and durable
storage for data archiving and online backup.
Storage Class
Pricing Comparison
Glacier vs Glacier Deep Archive
To keep costs low, Amazon S3 Glacier provides three options for access to archives, from a few
minutes to several hours.
Glacier Deep Archive provides two access options, which range from 12 to 48 hours.
Important Note
Use Amazon S3 Glacier to archive data that might infrequently need to be restored within minutes
to a few hours.
Use S3 Glacier Deep Archive to archive long-term backup data that might infrequently
need to be restored within 12 hours.
Static Website Hosting in S3
A Great Feature
Challenges with Static Websites
If you want to host a basic static website, you need to create and manage the entire server
infrastructure on a cloud provider.
EC2
SSM
ELB
Static Website
Static Website Hosting in S3
With Amazon S3, you can now host static websites directly through S3.
S3 Bucket
Static Website
Static Vs Dynamic Websites
On a static website, individual webpages include static content. They might also contain
client-side scripts.
A dynamic website relies on server-side processing, including server-side scripts such as PHP, JSP,
or ASP.NET.
S3 Lifecycle Policies
S3 is Awesome
Overview of Lifecycle Policies
Organizations tend to keep terabytes of data in S3. In such cases, cost becomes a primary
factor.
Storing all data directly in AWS S3 Standard is not the best approach. Depending on the
access patterns and criticality of the data, it should be transitioned to an appropriate storage class.
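A lifecycle rule of this kind might look like the sketch below. It only builds the configuration dictionary, so it runs without AWS access. The rule ID, the empty prefix, and the 30-day and 90-day thresholds are illustrative assumptions, not recommendations from these notes; the dictionary shape follows what boto3's `put_bucket_lifecycle_configuration` expects in its `LifecycleConfiguration` argument.

```python
import json


def build_lifecycle_configuration(ia_after_days: int = 30, glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration: Standard -> Standard-IA -> Glacier.

    The day thresholds are hypothetical defaults chosen for illustration.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the Standard-IA transition")
    return {
        "Rules": [
            {
                "ID": "archive-old-objects",   # hypothetical rule name
                "Filter": {"Prefix": ""},      # empty prefix = every object in the bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_configuration()
print(json.dumps(config, indent=2))

# With credentials and a real bucket, the dictionary could be applied like this
# (the bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```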
Transition Actions
S3 Standard-IA
Glacier
S3 Standard
S3 Versioning
Store File
a.txt
a.txt
a.txt
Storage
Challenge 2 - Accidental Deletion of Objects
Delete File
a.txt
Storage
Versioning in Object Storage
Versioning allows users to keep multiple variants of an object in the same S3 bucket.
You can use versioning to preserve, retrieve, and restore every version of every object stored in
your Amazon S3 bucket.
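The idea can be illustrated with a toy in-memory model. This is not the S3 API; the class, its methods, and the `v1`, `v2` style version IDs are hypothetical. It shows how overwrites and deletes add new entries instead of destroying earlier ones.

```python
class VersionedBucket:
    """Toy in-memory model of a versioned bucket (not the real S3 API)."""

    def __init__(self):
        # key -> list of (version_id, body); a body of None acts as a delete marker
        self._versions = {}
        self._next_id = 1

    def put(self, key, body):
        version_id = f"v{self._next_id}"
        self._next_id += 1
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        # A delete appends a marker; older versions stay in place.
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker hides the object.
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("a.txt", "original contents")
bucket.put("a.txt", "overwritten contents")
bucket.delete("a.txt")                 # accidental deletion
print(bucket.get("a.txt", first))      # the first version is still retrievable
```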
a.txt a.txt
a.txt a.txt
Storage
Important Pointers for Versioning
Once you version enable a bucket, it can never return to an unversioned state. You can,
however, suspend versioning on that bucket.
The versioning state applies to all (never some) of the objects in that bucket.
S3 Transfer Acceleration
Least Latencies
Overview of S3 Transfer Acceleration
S3 Transfer Acceleration allows users to accelerate data uploads from all over the world to
a centralized S3 bucket.
The transfers are accelerated by routing data to the closest edge location.
Optimized Network
STA Path
User
Direct S3 Upload
Edge Locations
Intro to Amazon SQS
Message Queuing Service
Use-Case: Restoring Image Application
Medium Corp is designing an application that will enhance and restore the images that
users submit through the online portal.
Current Architecture
1. Image Gatherer - Takes the images from the user via the Upload button.
2. Image Enhancer - Receives the image from the Image Gatherer.
Send Image
Challenges
Due to the popularity of the application and a huge traffic spike, Medium Corp has decided to
add more image enhancer servers.
server-a 10.77.2.5
server-b 10.66.0.10
server-c 192.168.0.2
server-d 10.66.10.10
Better Architecture
One of the main functions of a message queue service is to take a message from a publisher
and forward it to a consumer.
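The pattern can be sketched locally with Python's standard `queue` module standing in for a highly available queue service. The function name, the worker count, and the "enhanced:" prefix are illustrative assumptions. The point is that the gatherer only talks to the queue and never needs the address of any enhancer server.

```python
import queue
import threading


def run_pipeline(images, worker_count=3):
    """Publish images to a queue and let several enhancer workers consume them."""
    jobs = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def enhancer():
        while True:
            image = jobs.get()
            if image is None:          # sentinel: no more work for this worker
                jobs.task_done()
                break
            with results_lock:
                results.append(f"enhanced:{image}")
            jobs.task_done()

    workers = [threading.Thread(target=enhancer) for _ in range(worker_count)]
    for worker in workers:
        worker.start()

    # The publisher only knows about the queue, not about individual workers.
    for image in images:
        jobs.put(image)
    for _ in workers:
        jobs.put(None)

    jobs.join()
    for worker in workers:
        worker.join()
    return sorted(results)


print(run_pipeline(["cat.png", "dog.png", "bird.png"]))
```

Adding or removing enhancer workers changes only `worker_count`; the publishing side stays the same.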
Send to Queue
Take from Queue
Highly-Available Queue
Introduction to SQS
Amazon SQS is a fast, reliable, scalable, and fully managed message queuing service.
Amazon SQS makes it simple and quite cost-effective to decouple the components of an
application.
Send to Queue
Take from Queue
SQS
Tightly Coupled Systems
Components of the system architecture communicate directly with each other and have a
hard dependency on each other.
Producers Consumers
Loosely Coupled System
Components of the system architecture can process information without being
directly connected to each other.
Producers
Consumers
Storage Gateway
Hybrid Storage
Introduction
AWS Storage Gateway is a hybrid storage service that allows on-premises applications to
easily use cloud storage.
Storage Gateway
The Storage Gateway appliance exposes standard storage protocols like NFS and iSCSI, which the
application connects to in order to store its data.
The other end of the Storage Gateway connects to AWS storage services like S3, Glacier, and
EBS snapshots.
Storage Gateway Configuration
Gateway Stored Volumes
Gateway Cached Volumes
Gateway Cached Volume:
Data is stored primarily on AWS S3, with a cache of recently read or written data stored
locally on the on-premises server.
Gateway-virtual tape library
Tape based storage
Tape backup is the practice of periodically copying data from a primary storage device to a tape
cartridge so that the data can be recovered if the primary device crashes or fails.
VTS and VTL
CloudWatch Logs
Logging Yet Again!
Challenges with Logging
A server can contain a lot of log files, from system logs to application logs.
By default, this means you need to give server access to anyone who wants to
debug.
file.log
log line 01 - GET request
log line 02 - PUT request
log line 03 - DELETE
log line 04 - PATCH
log line 05 - POST
log line 06 - PUT request
Disadvantage of the Approach
Developers must be given access to the server.
Better Way
● We create a Central Log Server.
● We push the log files from individual systems to Central Log Server.
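A minimal local sketch of this idea, using Python's standard `logging` module: each server's logger pushes its records to one shared queue, which stands in for the central log server. The server names and helper functions are illustrative assumptions; a real setup would ship logs over the network to a service such as CloudWatch Logs.

```python
import logging
import logging.handlers
import queue

# Stands in for the central log server: every system pushes records here.
central_log_queue = queue.Queue()


def make_server_logger(server_name: str) -> logging.Logger:
    """Create a logger for one server that forwards records to the central queue."""
    logger = logging.getLogger(server_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False           # keep records out of the root logger
    if not logger.handlers:            # avoid adding duplicate handlers on reuse
        logger.addHandler(logging.handlers.QueueHandler(central_log_queue))
    return logger


def drain_central_log() -> list:
    """Read everything collected so far, tagged with the originating server."""
    lines = []
    while not central_log_queue.empty():
        record = central_log_queue.get()
        lines.append(f"{record.name}: {record.getMessage()}")
    return lines


make_server_logger("server-a").info("GET request")
make_server_logger("server-b").info("PUT request")
for line in drain_central_log():
    print(line)
```

Developers who need to debug read from the central store, so they no longer need login access to each individual server.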
Centralized Log Management
There are multiple approaches to log the data centrally.
CloudWatch Logs
CloudWatch Logs can be used to monitor, store, and access your log files from Amazon Elastic
Compute Cloud (Amazon EC2) instances, AWS CloudTrail, Route 53, and other sources.
Amazon CloudWatch
Steps - CloudWatch Logs
There are three steps required to set up CloudWatch Logs with EC2.
Amazon CloudWatch
Amazon Elastic File System (EFS)
Network Attached Storage
Overview of Elastic File System
Amazon Elastic File System (Amazon EFS) provides a simple, serverless, set-and-forget
elastic file system for use with AWS Cloud services and on-premises resources.
Mount
Attachment to Multiple Targets
Multiple compute instances, including Amazon EC2, Amazon ECS, and AWS Lambda,
can access an Amazon EFS file system at the same time, providing a common data source
for workloads.
Understanding EFS Architecture
To access the file system from an instance inside the VPC, we need to create a mount target in the
VPC.
Network File System
Network File System (NFS) is a networking protocol for distributed file sharing.
EFS uses the Network File System version 4 (NFSv4) protocol.
Mount
Client
NFS Server
Pricing Considerations
AWS EFS is expensive when compared to other storage options such as EBS and S3.
Consideration Pricing
1 TB of S3 Storage $24
Important Pointers
EFS can even be accessed from an on-premises data center using an AWS Direct Connect or
AWS VPN connection.
With Amazon EFS, you pay only for what you use per month.
Amazon Kinesis
Streaming Data
Basics of Streaming Data
Streaming data is the continuous flow of data generated by various sources.
Sensor 1
Sensor 3
Examples of Streaming Data
A financial institution tracks changes in the stock market in real time and adjusts its portfolio
accordingly.
A media publisher streams billions of clickstream records from its online properties.
Challenges of Working with Streaming Data
Streaming data processing requires two layers: a storage layer and a processing layer.
The storage layer needs to support record ordering, strong consistency, and replayable reads.
The processing layer is responsible for consuming data from the storage layer, running
computation on that data, and many other tasks.
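The split between the two layers can be illustrated with a toy in-memory sketch. It is not the Kinesis API; the class names and the sensor readings are hypothetical. The storage layer is an append-only list that preserves order and can be re-read from any position, and the processing layer keeps its own read position and computes a running average.

```python
class Stream:
    """Storage layer: an ordered, append-only, replayable log of records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1      # sequence number of the new record

    def read(self, start, limit=100):
        return self._records[start:start + limit]


class AveragingConsumer:
    """Processing layer: consumes records in order and keeps a running average."""

    def __init__(self, stream):
        self._stream = stream
        self.position = 0                  # next sequence number to read
        self.count = 0
        self.total = 0.0

    def poll(self):
        batch = self._stream.read(self.position)
        for value in batch:
            self.total += value
            self.count += 1
        self.position += len(batch)
        return len(batch)

    @property
    def average(self):
        return self.total / self.count if self.count else None


stream = Stream()
for reading in (20.0, 22.0, 24.0):         # hypothetical sensor readings
    stream.append(reading)

consumer = AveragingConsumer(stream)
consumer.poll()
print(consumer.average)

# Replay: a second consumer starting from position 0 sees the same records.
replay = AveragingConsumer(stream)
replay.poll()
print(replay.average)
```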
Basics of Amazon Kinesis
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you
can get timely insights and react quickly to new information.
Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale
3 entities
There are 3 entities in this kind of use case:
Amazon Kinesis
Consumers
Producers
AWS Snowball Family
Understanding with Use-Case
Organization A has hosted all of its storage infrastructure in a data center.
They have now decided to use S3 due to the benefits that it provides.
1 Gbps 50 days
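The relationship between link speed and transfer time can be checked with a short calculation. The notes give only "1 Gbps" and "50 days", so the data volume below (540 TB, in decimal terabytes) is an assumption derived to match those figures at full link utilization; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_terabytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to move the data over a network link.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Gbps = 10**9 bits per second.
    `utilization` is the fraction of the link actually available for the transfer.
    """
    total_bits = data_terabytes * 1e12 * 8
    effective_bits_per_second = link_gbps * 1e9 * utilization
    return total_bits / effective_bits_per_second / SECONDS_PER_DAY


# Assumed data volume (not stated in the notes) that matches "1 Gbps, 50 days".
print(transfer_days(540, link_gbps=1))                    # 50.0
# The same data over a link that is only half available takes twice as long.
print(transfer_days(540, link_gbps=1, utilization=0.5))   # 100.0
```

When the result is measured in weeks or months, shipping the data on a physical device can be faster than sending it over the network.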
AWS Snowball Family
Allows customers to accelerate moving offline data or remote storage to the
cloud.
Snowball Edge
Snowmobile
Edge Computing Functionality
These devices can also come with edge computing capabilities.
This means you can run your applications on EC2 instances on the devices so
that you can work in edge environments with limited connectivity.
8 TB of usable storage
Snowball Edge
AWS Snowball Edge is a type of Snowball device with on-board storage and
compute power for select AWS capabilities
Snowball Edge Storage Optimized (for data transfer): This option has a 100 TB (80 TB usable) storage capacity.
Snowball Edge Storage Optimized (with EC2 compute functionality): This option has up to 80 TB of usable storage space, 24 vCPUs, and 80 GB of memory for compute functionality.
Snowball Edge Compute Optimized: Most compute functionality, with 104 vCPUs, 416 GB of memory, and 28 TB of dedicated NVMe SSD for compute instances.
Snowball Edge Compute Optimized with GPU: Identical to the Compute Optimized option, except for an installed GPU.
AWS Snowmobile
AWS Snowmobile moves extremely large amounts of data to AWS.
Understanding with Use-Case
AWS has lots of services where data can be stored.
Taking backups at the individual service level can take a lot of time and require
customization.
AWS Backup
Backup Backup
kplabs.in/twitter
Be Awesome
kplabs.in/linkedin