Amazon S3: Prepared by
Prepared By:
Aakash Kumar
Ayush Jain
Siddharth Narang
Varun Dhamija
Agenda
• What is Amazon S3?
• Design Requirements
• Concepts
• Getting Started
• Functionalities
• Common Use Scenarios
• Considerations going forward
• Protecting and Managing the Data
• Advantages and Disadvantages
• Advanced Amazon S3 Features
Amazon S3
• S3 stands for Simple Storage Service.
• Amazon S3 is storage for the Internet. It is
designed to make web-scale computing easier
for developers.
• Provided via a web services Interface (REST
and SOAP)
• Based on the same infrastructure Amazon
uses for its global network of websites.
• It is essentially Infrastructure as a Service (IaaS).
Design Requirements
• Secure - Built to provide infrastructure that allows the customer
to maintain full control over who has access to their data.
Customers must also be able to easily secure their data in transit
and at rest.
• Reliable - Store data with up to 99.999999999% durability, with
99.99% availability. There can be no single points of failure. All
failures must be tolerated or repaired by the system without any
downtime.
• Scalable - S3 can scale in terms of storage, request rate, and
users to support an unlimited number of web-scale applications. It
uses scale as an advantage: Adding nodes to the system increases,
not decreases, its availability, speed, throughput, capacity, and
robustness.
Design Requirements
• Fast - Amazon S3 must be fast enough to support high-
performance applications. Server-side latency must be insignificant
relative to Internet latency. Any performance bottlenecks can be
fixed by simply adding nodes to the system.
• Inexpensive - S3 is built from inexpensive commodity hardware
components. All hardware will eventually fail and this must not
affect the overall system. It must be hardware-agnostic, so that
savings can be captured as Amazon continues to drive down
infrastructure costs.
• Simple - Building highly scalable, reliable, fast, and inexpensive
storage is difficult. Doing so in a way that makes it easy to use for
any application anywhere is more difficult. Amazon S3 must do
both.
Basic Concepts
• Amazon S3 stores data as objects within buckets.
– Range of Content
– Large Amount of data
– Sharing via RRS
• Storage for Data Analysis
– Data Analysis
– Share Analysis Data
• Backup, Archiving and Disaster Recovery
– Versioning Capability
– Large Amount of Data
– Aged Data
• Static Website Hosting
– Inexpensive
– Traffic
– Availability and Durability
Considerations going forward
• AWS Account and Security Credentials
AWS Integration
- You can use Amazon S3 alone or in concert with one or
more other Amazon products. The most common
products used with Amazon S3 are:
• Amazon Elastic Compute Cloud
• Amazon Elastic MapReduce
• Amazon Simple Queue Service
• Amazon CloudFront
• Amazon DevPay
Naming Strategy
- Plan your bucket names in advance. The
location of your data in Amazon S3 is a URL,
generally of the form:
http://[bucket-name].S3.amazonaws.com/[key].
The bucket and key names should be descriptive
of the objects. Each bucket is a namespace, so
key names must be unique within one bucket.
Before naming objects in buckets, you should
develop a naming strategy.
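As a rough illustration of the URL form above, the following Python sketch builds an object URL from a bucket name and a key. The helper names (`make_key`, `object_url`), the bucket name `example-photos`, and the key layout are assumptions made for this example only. The host is written in lowercase, since host names are case-insensitive.

```python
from urllib.parse import quote


def make_key(*parts: str) -> str:
    """Join descriptive path parts into one key, e.g. year/category/file."""
    return "/".join(part.strip("/") for part in parts)


def object_url(bucket: str, key: str) -> str:
    """Build a URL of the form http://[bucket-name].s3.amazonaws.com/[key]."""
    # Percent-encode the key but keep "/" so the key's prefixes stay readable.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


# Hypothetical bucket and key, chosen only to show a descriptive naming scheme.
key = make_key("2013", "reports", "summary q1.pdf")
print(object_url("example-photos", key))
```

A scheme like year/category/file keeps keys descriptive and makes clashes within one bucket less likely.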
Pricing
• Charges for using S3 depend on the location of your
buckets.
• You are billed for storage, data transfer in and out,
and the number of requests per month.
• There is no minimum fee for using S3. You pay only
for what you use.
• You can view your current charges almost
immediately on the S3 portal.
• A detailed usage report can also be downloaded.
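The billing model above (storage, transfer in and out, and requests, priced per bucket location) can be sketched as a small calculation. All rates and region names below are made-up placeholders, not Amazon's prices; the point is only the shape of the formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    storage_per_gb: float       # per GB stored per month
    transfer_in_per_gb: float   # per GB transferred in
    transfer_out_per_gb: float  # per GB transferred out
    per_1000_requests: float    # per 1,000 requests


# Hypothetical rates keyed by hypothetical regions (NOT real AWS pricing).
RATES_BY_REGION = {
    "region-a": Rates(0.10, 0.00, 0.12, 0.01),
    "region-b": Rates(0.12, 0.00, 0.15, 0.01),
}


def monthly_cost(region: str, stored_gb: float, transfer_in_gb: float,
                 transfer_out_gb: float, requests: int) -> float:
    """Sum the usage-based charges; with zero usage the bill is zero."""
    r = RATES_BY_REGION[region]
    return (stored_gb * r.storage_per_gb
            + transfer_in_gb * r.transfer_in_per_gb
            + transfer_out_gb * r.transfer_out_per_gb
            + requests / 1000 * r.per_1000_requests)


cost = monthly_cost("region-a", stored_gb=50, transfer_in_gb=10,
                    transfer_out_gb=20, requests=100_000)
print(f"Estimated monthly charge: {cost:.2f}")
```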
Protecting the Data
• Amazon S3 supports multiple access control
mechanisms, as well as encryption for both secure
transit and secure storage on disk.
• With Amazon S3’s data protection features, you can
protect your data from both logical and physical
failures, guarding against data loss from unintended
user actions, application errors, and infrastructure
failures.
• For customers who must comply with regulatory
standards such as PCI and HIPAA, Amazon S3’s data
protection features can be used as part of an overall
strategy to achieve compliance.
Data Security Details
• Different Access Control Mechanisms
- Identity and Access Management (IAM) Policies
- Access Control Lists (ACLs)
- Bucket Policies
- Query String Authentication
• Amazon S3 also provides multiple options for
encryption of data.
• Amazon S3 also supports logging of requests
made against your Amazon S3 resources.
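To make one of the mechanisms above concrete, here is a minimal sketch of a bucket policy document, built as a Python dictionary and printed as JSON. The bucket name `example-bucket`, the `public/` prefix, and the helper name are assumptions for illustration; the code only constructs the document and does not apply it to any bucket.

```python
import json


def public_read_policy(bucket: str, prefix: str = "public/") -> dict:
    """Return a policy document allowing anyone to read objects under a prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPublicReadOfPrefix",
                "Effect": "Allow",
                "Principal": "*",          # any requester
                "Action": "s3:GetObject",  # read access only
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


# Hypothetical bucket name, used only for this example.
policy = public_read_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

A bucket policy is attached to the bucket itself, whereas an IAM policy is attached to a user or group, so which one fits depends on whether access is being described from the data's side or the user's side.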
Data Durability and Reliability
• To increase durability, Amazon S3 synchronously
stores your data across multiple facilities.
• Amazon S3 performs regular, systematic data
integrity checks and is built to be automatically
self-healing.
• Amazon S3 provides further protection via
Versioning. You can use Versioning to preserve,
retrieve, and restore every version of every object
stored in your Amazon S3 bucket.
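The idea behind Versioning (preserve, retrieve, and restore every version) can be illustrated with a toy in-memory model. This is not the S3 API; the class and method names are invented for this sketch, and version numbers here are simple counters rather than S3's version IDs.

```python
from typing import Dict, List, Optional


class VersionedBucket:
    """Toy in-memory model of a versioned bucket (not the S3 API)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[bytes]] = {}

    def put(self, key: str, data: bytes) -> int:
        """Preserve: store a new version, return its 1-based version number."""
        history = self._versions.setdefault(key, [])
        history.append(data)
        return len(history)

    def get(self, key: str, version: Optional[int] = None) -> bytes:
        """Retrieve: the latest version by default, or a specific older one."""
        history = self._versions[key]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{key} has no version {version}")
        return history[version - 1]

    def restore(self, key: str, version: int) -> int:
        """Restore: make an old version current by writing it as a new version."""
        return self.put(key, self.get(key, version))


bucket = VersionedBucket()
bucket.put("report.txt", b"draft")
bucket.put("report.txt", b"final")
bucket.restore("report.txt", 1)  # the draft becomes the current version again
print(bucket.get("report.txt"))
```

Because an overwrite never destroys older data in this model, an unintended overwrite can be undone by restoring the earlier version.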
S3’s Standard Storage
• Amazon S3’s standard storage is:
- Backed with the Amazon S3 Service Level Agreement.
- Designed for 99.999999999% durability and 99.99%
availability of objects over a given year.
- Designed to sustain the concurrent loss of data in two
facilities.
Reduced Redundancy Storage
• Reduced Redundancy Storage (RRS) is a storage option
within Amazon S3 that enables customers to reduce their
costs by storing non-critical, reproducible data at lower
levels of redundancy than Amazon S3’s standard storage. It
provides a cost-effective, highly available solution for
distributing or sharing content that is durably stored
elsewhere, or for storing thumbnails, transcoded media, or
other processed data that can be easily reproduced. The
RRS option stores objects on multiple devices across
multiple facilities, providing 400 times the durability of a
typical disk drive, but does not replicate objects as many
times as standard Amazon S3 storage, and thus is even
more cost effective. It is designed to sustain the loss of data
in a single facility.
• Backed with the Amazon S3 Service Level
Agreement.
• Designed to provide 99.99% durability and
99.99% availability of objects over a given
year. This durability level corresponds to an
average annual expected loss of 0.01% of
objects.
• Designed to sustain the loss of data in a single
facility.
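The durability figures can be turned into expected losses with simple arithmetic: the expected fraction of objects lost per year is 100% minus the durability. The sketch below uses exact fractions to avoid rounding; the function name and the object count of 10,000 are assumptions chosen for illustration.

```python
from fractions import Fraction


def expected_annual_loss(num_objects: int, durability_percent: str) -> Fraction:
    """Expected number of objects lost per year at a given durability."""
    # Pass the percentage as a string so the fraction is exact.
    loss_rate = 1 - Fraction(durability_percent) / 100
    return num_objects * loss_rate


rrs = expected_annual_loss(10_000, "99.99")              # Reduced Redundancy
standard = expected_annual_loss(10_000, "99.999999999")  # standard storage

print(f"RRS: {rrs} object(s) per year on average")
print(f"Standard: one object every {1 / standard} years on average")
```

With 10,000 objects, 99.99% durability corresponds to an average of one lost object per year (0.01% of 10,000), while 99.999999999% durability corresponds to an average of one lost object every 10,000,000 years.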
Amazon Glacier
• Amazon S3 enables you to utilize Amazon Glacier’s extremely low-cost
storage service as a storage option for data archival. Amazon Glacier
stores data for as little as $0.01 per gigabyte per month, and is optimized
for data that is infrequently accessed and for which retrieval times of
several hours are suitable.
• Like Amazon S3’s other storage options (Standard or Reduced Redundancy
Storage), objects stored in Amazon Glacier using Amazon S3’s APIs or
Management Console have an associated user-defined name.
• You can get a real-time list of all of your Amazon S3 object names,
including those stored using the Amazon Glacier option, using the Amazon
S3 LIST API. Objects stored directly in Amazon Glacier using Amazon
Glacier’s APIs cannot be listed in real-time, and have a system-generated
identifier rather than a user-defined name.
• Amazon S3 objects that are stored using the Amazon Glacier option are
only accessible through Amazon S3’s APIs or the Amazon S3 Management
Console.
• Backed with the Amazon S3 Service Level
Agreement.
• Designed for 99.999999999% durability and
99.99% availability of objects over a given
year.
• Designed to sustain the concurrent loss of
data in two facilities.
Managing the Data
• The various data management features offered by
Amazon S3 are:
- Data Lifecycle Management: Lifecycle management of
data refers to how your data is managed and stored from
creation and initial storage to when it’s no longer needed
and deleted.
- Cost Monitoring and Controls: You can use Amazon
CloudWatch to receive billing alerts that help you
monitor the Amazon S3 charges on your bill.
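Data lifecycle management can be pictured as a rule that maps an object's age to an action. The sketch below is a local illustration only, not an S3 lifecycle configuration; the function name and the 30-day and 365-day thresholds are assumptions.

```python
from datetime import date


def lifecycle_action(created: date, today: date,
                     archive_after_days: int = 30,
                     delete_after_days: int = 365) -> str:
    """Decide what a lifecycle rule would do with an object of a given age."""
    age_days = (today - created).days
    if age_days >= delete_after_days:
        return "expire"   # no longer needed: delete
    if age_days >= archive_after_days:
        return "archive"  # aged data: move to a cheaper archival option
    return "keep"         # recent data: leave in its initial storage


created = date(2013, 1, 1)
for today in (date(2013, 1, 10), date(2013, 3, 1), date(2014, 1, 1)):
    print(today, lifecycle_action(created, today))
```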
Advantages of using S3
• Scalability - The amount of storage and bandwidth you
use can scale as needed without any configuration
changes.
• Availability, speed, throughput, capacity, and
robustness are not affected even if you gain 10,000 users
overnight.
• Unlimited storage. You pay as you go.
• Inexpensive, with no capital outlay. Great for startups!
• Data is accessible from any location.
• Since it is based on the Amazon infrastructure, it is
probably more reliable than other cheap data storage
providers.
Disadvantages of S3
• Not user-friendly for beginner-level computer
users. S3 is basically UI-less.
• Trust - Not every business or service is
comfortable storing its data in the 'cloud',
especially those with extremely sensitive and
confidential data, e.g. banking.
• Although its SLA promises 99.9% uptime, in 2008
it had two major outages, in February and July,
bringing down Web 2.0 startups like Twitter.
• Back in 2007, S3 had speed issues when reading
and writing data.
Advanced Amazon S3 Features
• Using Amazon DevPay with Amazon S3
- You can use Amazon DevPay to charge customers who
access the data you store on Amazon S3.
• Requester Pays buckets
- You can configure a bucket so that a customer pays for
the downloads they make.
• Using BitTorrent with Amazon S3
- You can use BitTorrent, an open, peer-to-peer protocol,
to distribute files.
Thank you