Backup and Recovery Approaches Using AWS
June 2016
This paper has been archived.
For the latest technical content about the AWS Cloud,
see the AWS Whitepapers & Guides page:
https://aws.amazon.com/whitepapers
Amazon Web Services – Backup and Recovery Approaches Using AWS June 2016
© 2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Notices
This document is provided for informational purposes only. It represents AWS’s
current product offerings and practices as of the date of issue of this document,
which are subject to change without notice. Customers are responsible for
making their own independent assessment of the information in this document
and any use of AWS’s products or services, each of which is provided “as is”
without warranty of any kind, whether express or implied. This document does
not create any warranties, representations, contractual commitments, conditions
or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities
and liabilities of AWS to its customers are controlled by AWS agreements, and
this document is not part of, nor does it modify, any agreement between AWS
and its customers.
Contents
Abstract
Introduction
Why Use AWS as a Data-Protection Platform?
AWS Storage Services for Data Protection
Amazon S3
Amazon Glacier
AWS Storage Gateway
AWS Transfer Services
Designing a Backup and Recovery Solution
Cloud-Native Infrastructure
EBS Snapshot-Based Protection
Database Backup Approaches
On-Premises to AWS Infrastructure
Hybrid Environments
Backing Up AWS-Based Applications to Your Data Center
Migrating Backup Management to the Cloud for Availability
Example Hybrid Scenario
Archiving Data with AWS
Securing Backup Data in AWS
Conclusion
Contributors
Document Revisions
Abstract
This paper is intended for enterprise solution architects, backup architects, and
IT administrators who are responsible for protecting data in their corporate IT
environments. It discusses production workloads and architectures that can be
implemented using AWS to augment or replace a backup and recovery solution.
These approaches offer lower costs, higher scalability, and more durability to
meet Recovery Time Objective (RTO), Recovery Point Objective (RPO), and
compliance requirements.
Introduction
As the growth of enterprise data accelerates, the task of protecting it becomes
more challenging. Questions about the durability and scalability of backup
methods are commonplace, including this one: How does the cloud help meet my
backup and archival needs?
This paper covers a number of backup architectures (cloud-native applications,
hybrid and on-premises environments) and associated AWS services that can be
used to build scalable and reliable data-protection solutions.
Why Use AWS as a Data-Protection Platform?
Amazon Web Services (AWS) is a secure, high-performance, flexible, cost-
effective, and easy-to-use cloud computing platform. AWS takes care of the
undifferentiated heavy lifting and provides tools and resources you can use to
build scalable backup and recovery solutions.
There are many advantages to using AWS as part of your data protection strategy:
Standards for Attestation Engagements (SSAE) 16, International
Organization for Standardization (ISO) 27001, Payment Card Industry
Data Security Standard (PCI DSS), Health Insurance Portability and
Accountability Act (HIPAA), SEC1, and Federal Risk and Authorization
Management Program (FedRAMP) so you can easily fit the backup solution
into your existing compliance regimen.
Scalability: With AWS, you don’t have to worry about capacity. You can
scale your consumption up or down as your needs change without
administrative overhead.
Lower TCO: The scale of AWS operations drives down service costs and
helps lower the total cost of ownership (TCO) of the storage. AWS passes
these cost savings on to customers in the form of price drops.
Pay-as-you-go pricing: Purchase AWS services as you need them and
only for the period you plan to use them. AWS pricing has no upfront fees,
termination penalties, or long-term contracts.
1 https://aws.amazon.com/about-aws/whats-new/2015/09/amazon-glacier-receives-third-party-compliance-assessment-for-sec-rule-17a-4f-from-cohasset-associates-inc/
Amazon S3
Amazon S3 provides highly secure, scalable object storage.
You can use Amazon S3 to store and retrieve any amount of data, at any time,
from anywhere on the web. Amazon S3 stores data as objects within resources
called buckets. AWS Storage Gateway and many third-party backup solutions can
manage Amazon S3 objects on your behalf. You can store as many objects as you
want in a bucket, and you can write, read, and delete objects in your bucket.
Single objects can be up to 5 TB in size.
Amazon S3 offers a range of storage classes designed for different use cases.
These include:
Amazon S3 Standard for general-purpose storage of frequently
accessed data.
Amazon S3 Standard - Infrequent Access for long-lived, but less
frequently accessed data.
Amazon Glacier for long-term archive.
Amazon S3 also offers lifecycle policies you can configure to manage your data
throughout its lifecycle. After a policy is set, your data will be migrated to the
appropriate storage class without any changes to your application. For more
information, see S3 Storage Classes.
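As an illustration of such a policy, the following Python sketch builds a lifecycle configuration that moves backup objects to Standard - Infrequent Access and then to Amazon Glacier. The rule name, the backups/ prefix, the day counts, and the bucket name in the closing comment are assumptions chosen for the example. The dictionary follows the shape accepted by the current S3 lifecycle API, and the sketch only builds the configuration without calling AWS.

```python
import json


def build_backup_lifecycle(prefix, ia_after_days=30, glacier_after_days=90,
                           expire_after_days=None):
    """Build a lifecycle configuration for objects stored under `prefix`."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("archive transition must come after the infrequent-access transition")
    rule = {
        "ID": "tier-backups",  # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optionally delete objects once the retention period has passed.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


lifecycle = build_backup_lifecycle("backups/", expire_after_days=2555)
print(json.dumps(lifecycle, indent=2))
# With boto3 installed and credentials configured, the result could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-backup-bucket", LifecycleConfiguration=lifecycle)
```

Keeping the rule in a small function makes it easy to review the retention periods with your compliance team before anything is applied to a bucket.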
Amazon Glacier
Amazon Glacier is an extremely low-cost, cloud archive storage service that
provides secure and durable storage for data archiving and online backup. To
keep costs low, Amazon Glacier is optimized for data that is infrequently accessed
and for which retrieval times of several hours are acceptable. With Amazon
Glacier, you can reliably store large or small amounts of data for as little as
$0.007 per gigabyte per month, a significant savings compared to on-premises
solutions. Amazon Glacier is well suited for storage of backup data with long or
indefinite retention requirements and for long-term data archiving. For more
information, see Amazon Glacier.
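To put the quoted price in context, the following Python sketch estimates the monthly storage charge for a given backup size. It uses only the $0.007 per gigabyte per month figure from this paper; the 50 TB data set is an assumption for the example, and retrieval and request charges are deliberately left out.

```python
from decimal import Decimal

# Storage price quoted in this paper (June 2016); current pricing may differ.
GLACIER_PRICE_PER_GB_MONTH = Decimal("0.007")


def monthly_storage_cost(stored_gb, price_per_gb_month=GLACIER_PRICE_PER_GB_MONTH):
    """Return the monthly storage charge in US dollars for `stored_gb` gigabytes."""
    return Decimal(stored_gb) * price_per_gb_month


# 50 TB of backups, counted as 50 * 1024 GB.
print(monthly_storage_cost(50 * 1024))       # 358.400
print(monthly_storage_cost(50 * 1024) * 12)  # 4300.800
```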
In addition to third-party gateways and connectors, you can use AWS options
like AWS Direct Connect, AWS Snowball, AWS Storage Gateway, and Amazon S3
Transfer Acceleration to quickly transfer your data. For more information, see
Cloud Data Migration.
Designing a Backup and Recovery Solution
When you develop a comprehensive strategy for backing up and restoring data,
you must first identify the failure or disaster situations that can occur and their
potential business impact. In some industries, you must consider regulatory
requirements for data security, privacy, and records retention.
You should implement backup processes that will offer the appropriate level of
granularity to meet the RTO and RPO of the business, including:
File-level recovery
Volume-level recovery
Application-level recovery (for example, databases)
Image-level recovery
The following sections describe backup, recovery, and archive approaches based
on the organization of your infrastructure. IT infrastructure can broadly be
categorized as cloud native, on-premises, and hybrid.
Cloud-Native Infrastructure
This scenario describes a workload environment that exists entirely on AWS. As
the following figure shows, it includes web servers, application servers,
monitoring servers, databases, and Active Directory.
If you are running all of your services from AWS, you can leverage many built-in
features to meet your data protection and recovery needs.
Figure 1: AWS Native Scenario
Amazon EBS provides the ability to create snapshots (backups) of any Amazon
EBS volume. It takes a copy of the volume and places it in Amazon S3, where it is
stored redundantly in multiple Availability Zones. The first snapshot is a full copy
of the volume; ongoing snapshots store incremental block-level changes only.
This is a fast and reliable way to restore full volume data. If you only need a
partial restore, you can attach the volume to the running instance under a
different device name, mount it, and then use operating system copy commands
to copy the data from the backup volume to the production volume.
Amazon EBS snapshots can also be copied between AWS regions using the
Amazon EBS snapshot copy capability available in the console or from the
command line, as described in the Amazon Elastic Compute Cloud User Guide.3
You can use this feature to store your backup in another region without having to
manage the underlying replication technology.
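The following Python sketch shows one way to prepare such a copy programmatically. The snapshot ID and the regions are placeholders, and the refusal to copy within the same region is a choice made for this off-site backup example rather than a service restriction. The sketch only builds the request parameters; the comment at the end shows how they could be passed to the boto3 copy_snapshot call.

```python
def cross_region_copy_request(snapshot_id, source_region, destination_region):
    """Return (region_to_call, parameters) for copying a snapshot to another region."""
    if source_region == destination_region:
        raise ValueError("pick a different destination region for an off-site copy")
    params = {
        "SourceRegion": source_region,
        "SourceSnapshotId": snapshot_id,
        "Description": f"Copy of {snapshot_id} from {source_region}",
    }
    # The copy is requested in the destination region, which pulls the snapshot.
    return destination_region, params


region, params = cross_region_copy_request(
    "snap-0123456789abcdef0", "us-west-1", "us-east-1")
print(region, params)
# With boto3: boto3.client("ec2", region_name=region).copy_snapshot(**params)
```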
2 http://aws.amazon.com/ec2/
3 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-copy-snapshot.html
In the Amazon EC2 console, on the Elastic Block Store Volumes page, choose
Create Snapshot from the Actions menu. In the Create Snapshot dialog
box, choose Create to create a snapshot that will be stored in Amazon S3.
Figure 2: Using the EC2 Console to Create a Snapshot
To use the CLI command to create the snapshot, run the following command:
You can schedule and run the aws ec2 create-snapshot commands on a
regular basis to back up the EBS data. The economical pricing of Amazon S3
makes it possible for you to retain many generations of data. And because
snapshots are block-based, you consume space only for data that’s changed after
the initial snapshot was created.
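If you schedule snapshots, you will usually also want to prune old generations. The following Python sketch selects the snapshots that fall outside a retention count. The input uses the SnapshotId and StartTime fields returned by the EC2 DescribeSnapshots call; the sample IDs and the retention count of seven are assumptions for the example.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, keep_latest=7):
    """Return the IDs of all but the `keep_latest` most recent snapshots."""
    if keep_latest < 1:
        raise ValueError("keep at least one snapshot")
    newest_first = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    return [s["SnapshotId"] for s in newest_first[keep_latest:]]


# Ten daily snapshots with made-up IDs; snap-0 is the oldest.
start = datetime(2016, 6, 1)
daily = [{"SnapshotId": f"snap-{i}", "StartTime": start + timedelta(days=i)}
         for i in range(10)]
print(snapshots_to_delete(daily, keep_latest=7))  # ['snap-2', 'snap-1', 'snap-0']
```

Deleting an older snapshot does not invalidate newer ones: Amazon EBS keeps any blocks that the remaining snapshots still reference.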
1. Use the following command to create a volume from the backup snapshot:

   aws ec2 create-volume --region us-west-1 --availability-zone us-west-1b --snapshot-id mysnapshot-id

2. On the Amazon EC2 instance, unmount the existing volume.
   In Linux, use umount. In Windows, take the disk offline by using Disk Management.
3. Use the following command to detach the existing volume from the instance:

   aws ec2 detach-volume --volume-id oldvolume-id --instance-id myec2instance-id

4. Use the following command to attach the volume that was created from the
snapshot:
For this reason, you must quiesce the file system or database in order to make a
clean backup. The way in which you do this depends on your database or file
system.
If possible, put the database into hot backup mode.
Run the Amazon EBS snapshot commands.
Take the database out of hot backup mode or, if using a read replica,
terminate the read replica instance.
The process for a file system is similar, but depends on the capabilities of the
operating system or file system. For example, XFS is a file system that can flush
its data for a consistent backup. For more information, see xfs_freeze.4
If your file system does not support the ability to freeze, you should unmount it,
issue the snapshot command, and then remount the file system. Alternatively,
you can facilitate this process by using a logical volume manager that supports
the freezing of I/O.
Because the snapshot process continues in the background and the creation of
the snapshot is fast to execute and captures a point in time, the volumes you’re
backing up only need to be unmounted for a matter of seconds. Because the
backup window is as small as possible, the outage time is predictable and can be
scheduled.
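The freeze, snapshot, and resume sequence described above can be sketched in Python as follows. The sketch assumes an XFS file system, the xfs_freeze utility, and the AWS CLI on the instance; the mount point and volume ID are placeholders. A context manager is used so that the file system is resumed even if the snapshot request fails. The example at the end performs a dry run that records the commands instead of executing them.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen_filesystem(mount_point, run=subprocess.check_call):
    """Suspend writes to an XFS file system and always resume them afterwards."""
    run(["xfs_freeze", "-f", mount_point])
    try:
        yield
    finally:
        run(["xfs_freeze", "-u", mount_point])


def snapshot_while_frozen(mount_point, volume_id, run=subprocess.check_call):
    """Start an EBS snapshot while the file system is frozen."""
    with frozen_filesystem(mount_point, run):
        run(["aws", "ec2", "create-snapshot", "--volume-id", volume_id,
             "--description", f"Consistent backup of {mount_point}"])


# Dry run: collect the commands instead of executing them.
planned = []
snapshot_while_frozen("/data", "vol-0123456789abcdef0", run=planned.append)
for command in planned:
    print(" ".join(command))
```

Because the snapshot only needs to be initiated while I/O is suspended, the freeze can be released as soon as the create-snapshot call returns.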
4 https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/xfsfreeze.html
There are a number of ways to accomplish this. For example, you can use the
script created by alestic.com5. The memory buffers should be flushed to disk; the
file system I/O to disk should be stopped; and a snapshot should be initiated
simultaneously for all the volumes making up the RAID set. After the snapshot
for the volumes is initiated (usually a second or two), the file system can continue
its operations. The snapshots should be tagged so that you can manage them
collectively during a restore.
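One way to use such tags during a restore is sketched below in Python. The backup-set tag key, the snapshot IDs, and the four-volume RAID set are assumptions for the example. The input uses the SnapshotId and Tags fields returned by the EC2 DescribeSnapshots call. The sketch groups snapshots by backup set and flags any set that does not cover every volume.

```python
from collections import defaultdict

BACKUP_SET_TAG = "backup-set"  # hypothetical tag key shared by one RAID backup


def group_by_backup_set(snapshots):
    """Map each backup-set tag value to the snapshot IDs that carry it."""
    groups = defaultdict(list)
    for snapshot in snapshots:
        tags = {t["Key"]: t["Value"] for t in snapshot.get("Tags", [])}
        if BACKUP_SET_TAG in tags:
            groups[tags[BACKUP_SET_TAG]].append(snapshot["SnapshotId"])
    return dict(groups)


def incomplete_sets(groups, expected_volumes):
    """Return the backup sets that do not cover every volume in the RAID set."""
    return sorted(name for name, ids in groups.items() if len(ids) != expected_volumes)


def _snap(snapshot_id, backup_set):
    return {"SnapshotId": snapshot_id,
            "Tags": [{"Key": BACKUP_SET_TAG, "Value": backup_set}]}


sample = [_snap(f"snap-a{i}", "2016-06-01") for i in range(4)]
sample += [_snap(f"snap-b{i}", "2016-06-02") for i in range(3)]  # one volume missing
sets = group_by_backup_set(sample)
print(sets)
print(incomplete_sets(sets, expected_volumes=4))  # ['2016-06-02']
```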
You can also perform these backups from the logical volume manager or file-
system level. In these cases, using a traditional backup agent enables the data to
be backed up over the network. A number of agent-based backup solutions are
available on the internet and in the AWS Marketplace.6 Remember that agent-
based backup software expects a consistent server name and IP address. As a
result, using these tools with instances deployed in an Amazon virtual private
cloud (VPC)7 is the best way to ensure reliability.
5 https://github.com/alestic/ec2-consistent-snapshot
6 https://aws.amazon.com/marketplace/
7 http://aws.amazon.com/vpc/
Snapshot-Based Protection.”
Using Database Replica Backups
For databases that are built on RAID sets of Amazon EBS volumes, you can
remove the burden of backups on the primary database by creating a read replica
of the database. This is an up-to-date copy of the database that runs on a separate
Amazon EC2 instance. The replica database instance can be created using
multiple disks similar to the source, or the data can be consolidated to a single
EBS volume. You can then use one of the procedures described in “EBS
Snapshot-Based Protection” to snapshot the EBS volumes. This approach is often
used for large databases that are required to run 24/7. When that is the case, the
backup window required is too long and the production database cannot be taken
down for such long periods.
Using Amazon RDS for Backups
Amazon RDS includes features for automating database backups. Amazon RDS
creates a storage volume snapshot of your database instance, backing up the
entire DB instance, not just individual databases.
8 https://aws.amazon.com/rds/
9 http://dev.mysql.com/doc/refman/5.7/en/backup-and-recovery.html
10 http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmbckba.htm#BRADV8003
11 http://msdn.microsoft.com/en-us/library/ms187510.aspx
12 http://www.postgresql.org/docs/9.3/static/backup.html
Amazon RDS provides two different methods for backing up and restoring your
DB instances:
uses these periodic data backups in conjunction with your transaction logs to
enable you to restore your DB instance to any second during your retention
period, up to the LatestRestorableTime (typically within the last five minutes).
To find the latest restorable time for your DB instances, you can use the
DescribeDBInstances API call or look on the Description tab for the
database in the Amazon RDS console.
When you initiate a point-in-time recovery, transaction logs are applied to the
most appropriate daily backup in order to restore your DB instance to the
time you requested.
DB snapshots are user-initiated backups that enable you to back up your
DB instance to a known state as frequently as you like, and then restore to
that state at any time. You can use the Amazon RDS console or the
CreateDBSnapshot API call to create DB snapshots. These snapshots have
unlimited retention. They are kept until you use the console or the
DeleteDBSnapshot API call to explicitly delete them.
When you restore a database to a point in time or from a DB snapshot, a new
database instance with a new endpoint will be created. In this way, you can create
multiple database instances from a specific DB snapshot or point in time.
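The following Python sketch prepares the parameters for a point-in-time restore and checks the requested time against the latest restorable time. The instance identifiers and timestamps are placeholders. The parameter names follow the RDS RestoreDBInstanceToPointInTime API; the sketch builds the request only and does not call AWS.

```python
from datetime import datetime, timezone


def point_in_time_restore_request(source_id, target_id, latest_restorable_time,
                                  restore_time=None):
    """Return the parameters for restoring `source_id` into a new instance."""
    if target_id == source_id:
        raise ValueError("a restore creates a new DB instance; choose a new identifier")
    params = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if restore_time is None:
        # No time given: restore to the most recent point available.
        params["UseLatestRestorableTime"] = True
    elif restore_time > latest_restorable_time:
        raise ValueError("requested time is later than LatestRestorableTime")
    else:
        params["RestoreTime"] = restore_time
    return params


latest = datetime(2016, 6, 1, 12, 0, tzinfo=timezone.utc)
wanted = datetime(2016, 6, 1, 9, 30, tzinfo=timezone.utc)
request = point_in_time_restore_request("orders-db", "orders-db-restored", latest, wanted)
print(request)
# With boto3: boto3.client("rds").restore_db_instance_to_point_in_time(**request)
```

Because the restored instance has a new endpoint, applications must be pointed at it explicitly once the restore has been verified.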
Figure 3: Using an AMI to Back Up and Launch an Instance
When you register an AMI, it is stored in your account using Amazon EBS
snapshots. These snapshots reside in Amazon S3 and are highly durable.
Figure 4: Using the EC2 Console to Create a Machine Image
After you have created an AMI of your Amazon EC2 instance, you can use the
AMI to re-create the instance or launch more copies of the instance. You can also
copy AMIs from one region to another for application migration or disaster
recovery.
[Figure: on-premises environment with colocation hosting, a corporate data center, and a branch office]
By using AWS storage services in this scenario, you can focus on backup and
archiving tasks. You don’t have to worry about storage scaling or infrastructure
capacity to accomplish the backup task.
Amazon S3 and Amazon Glacier are natively API-based and available through the
Internet. This allows backup software vendors to directly integrate their
applications with AWS storage solutions, as shown in the following figure.
Figure 6: Backup Connector to Amazon S3 or Amazon Glacier
In this scenario, backup and archive software directly interfaces with AWS
through the APIs. Because the backup software is AWS-aware, it will back up the
data from the on-premises servers directly to Amazon S3 or Amazon Glacier.
If your existing backup software does not natively support the AWS cloud, you
can use AWS storage gateway products. AWS Storage Gateway13 is a virtual
appliance that provides seamless and secure integration between your data center
and the AWS storage infrastructure. The service allows you to securely store data
in the AWS cloud for scalable and cost-effective storage. Storage Gateway
supports industry-standard storage protocols that work with your existing
applications while securely storing all of your data encrypted in Amazon S3 or
Amazon Glacier.
13 http://aws.amazon.com/storagegateway/
Figure 7: Connecting On-Premises to AWS Storage
AWS Storage Gateway supports the following configurations:
Volume gateways: Volume gateways provide cloud-backed storage volumes
that you can mount as Internet Small Computer System Interface (iSCSI)
devices from your on-premises application servers. The gateway supports the
following volume configurations:
Gateway-cached volumes: You can store your primary data in Amazon
S3 and retain your frequently accessed data locally. Gateway-cached
volumes provide substantial cost savings on primary storage, minimize the
need to scale your storage on premises, and retain low-latency access to
your frequently accessed data.
Gateway-stored volumes: In the event you need low-latency access to
your entire data set, you can configure your on-premises data gateway to
store your primary data locally, and asynchronously back up point-in-time
snapshots of this data to Amazon S3. Gateway-stored volumes provide
durable and inexpensive off-site backups that you can recover locally or
from Amazon EC2.
Gateway-virtual tape library (gateway-VTL): With gateway-VTL, you
can have a limitless collection of virtual tapes. Each virtual tape can be stored
which can be integrated into your backup or archive framework. You can use the
iSCSI disk devices as storage pools for your backup software or the gateway-VTL
to offload tape-based backup or archive directly to Amazon S3 or Amazon
Glacier.
Using this method, your backups and archives are automatically stored off-site
(for compliance purposes) on durable media, eliminating the complexity
and security risks of off-site tape management.
Hybrid Environments
The two infrastructure deployments discussed to this point, cloud-native and on-
premises, can be combined into a hybrid scenario where the workload
environment has on-premises and AWS infrastructure components. Resources,
including web servers, application servers, monitoring servers, databases, Active
Directory, and more are hosted either in the customer data center or AWS.
Applications running in the AWS cloud are connected to applications running on-
premises.
14 http://aws.amazon.com/directconnect/
and consistent latency to upload data to the cloud for the purposes of data
protection and consistent performance and latency for hybrid workloads.
Figure 8: A Hybrid Infrastructure Scenario
highly available backup infrastructure.
To manage the backup data flows, you might also want to create one or more
media servers on Amazon EC2 instances. Media servers near the Amazon EC2
instances will save you money on internet transfer and, when backing up to S3 or
Amazon Glacier, increase overall backup and recovery performance.
Figure 9: Using Gateways in the Hybrid Scenario
Your backup software has agents that back up operating systems, virtual machine
images, data volumes, SQL Server databases, and Oracle databases (using
RMAN). For applications like MySQL that your backup software does not have an
agent for, you might use the mysqldump client utility to create a database dump
file to disk where standard backup agents can then protect the data.
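As a sketch of the mysqldump approach, the following Python code assembles the command line for dumping one database to a dated file in a directory that your backup agent protects. The database name, directory, and date stamp are placeholders, and the choice of --single-transaction assumes InnoDB tables. The sketch builds the argument list only; running it is left to your scheduler.

```python
import posixpath


def mysqldump_command(database, dump_dir, stamp):
    """Return (argv, dump_path) for dumping one database to a file."""
    dump_path = posixpath.join(dump_dir, f"{database}-{stamp}.sql")
    argv = [
        "mysqldump",
        "--single-transaction",  # consistent view of InnoDB tables without locking them
        "--databases", database,
        f"--result-file={dump_path}",
    ]
    return argv, dump_path


argv, dump_path = mysqldump_command("orders", "/backup/mysql", "2016-06-01")
print(" ".join(argv))
# To execute: subprocess.check_call(argv), with credentials supplied through an option file.
```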
To protect this environment, your third-party backup software most likely has a
global catalog server or master server that controls the backup, archive, and
restore activities as well as multiple media servers that are connected to disk-
based storage, Linear Tape-Open (LTO) tape drives, and AWS storage services.
The simplest way to augment your backup solution with AWS storage services is
to take advantage of your backup vendor’s support for Amazon S3 or Amazon
Glacier. We suggest you work with your vendor to understand their integration
and connector options. For a list of backup software vendors who work with
AWS, see our partner directory15.
If your existing backup software does not natively support cloud storage for
backup or archive, you can use a storage gateway device as a bridge
between the backup software and Amazon S3 or Amazon Glacier.
There are many third-party gateway solutions. You can also use AWS Storage
Gateway virtual appliances to bridge this gap because they use generic techniques
such as iSCSI-based volumes and virtual tape libraries (VTLs). This configuration
requires a supported hypervisor (VMware or Microsoft Hyper-V) and local
storage to host the appliance.
15 http://www.aws-partner-directory.com/PartnerDirectory/PartnerSearch?type=ISV
Data durability for long-term integrity
Data security
Ease of recoverability
Low cost
Amazon Glacier provides archives at low cost, native encryption of data at rest, 11
nines of durability, and unlimited capacity.
Amazon S3 Standard - Infrequent Access is a good choice for use cases that
require the quick retrieval of data. Amazon Glacier is a good choice for use cases
where data is infrequently accessed and retrieval times of several hours are
acceptable.
Objects can be tiered into Amazon Glacier either through lifecycle rules in S3 or
the Amazon Glacier API. The Amazon Glacier Vault Lock feature allows you to
easily deploy and enforce compliance controls for individual Amazon Glacier
vaults with a vault lock policy. You can specify controls such as “write once, read
many” (WORM) in a vault lock policy and lock the policy from future edits. For
more information, see Amazon Glacier.
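A WORM-style control can be expressed as a vault lock policy that denies archive deletion until an archive reaches a minimum age. The following Python sketch generates such a policy document. The vault ARN, account number, statement ID, and 365-day retention period are assumptions for the example; the glacier:ArchiveAgeInDays condition key is the one Amazon Glacier provides for this purpose.

```python
import json


def worm_vault_lock_policy(vault_arn, retention_days):
    """Return a vault lock policy (JSON text) that denies early archive deletion."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-before-retention",  # hypothetical statement ID
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": [vault_arn],
            "Condition": {
                "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
            },
        }],
    }
    return json.dumps(policy, indent=2)


policy_text = worm_vault_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault", 365)
print(policy_text)
# With boto3, the lock is started and then confirmed in two steps:
#   lock = boto3.client("glacier").initiate_vault_lock(
#       vaultName="examplevault", policy={"Policy": policy_text})
#   boto3.client("glacier").complete_vault_lock(
#       vaultName="examplevault", lockId=lock["lockId"])
```

Test the policy while the lock is still in progress, because a completed vault lock can no longer be changed.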
data in transit. Amazon Glacier encrypts all data at rest by default. With Amazon
S3, customers can choose server-side encryption for objects at rest by letting
AWS manage the encryption keys, providing their own keys when they upload an
object, or using AWS Key Management Service (AWS KMS)16 integration for the
encryption keys. Alternatively, customers can always encrypt their data before
uploading it to AWS. For more information, see Amazon Web Services: Overview
of Security Processes.
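The three server-side encryption choices for Amazon S3 map to different request parameters. The following Python sketch returns the extra arguments for each choice, using the parameter names accepted by the boto3 put_object call. The mode labels, the KMS key alias, and the bucket and object names in the closing comment are assumptions for the example; no request is sent.

```python
def encryption_arguments(mode, kms_key_id=None, customer_key=None):
    """Return the extra upload arguments for one server-side encryption choice."""
    if mode == "sse-s3":
        # AWS manages the encryption keys.
        return {"ServerSideEncryption": "AES256"}
    if mode == "sse-kms":
        # Keys are managed through AWS KMS; without a key ID the default key is used.
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "sse-c":
        # The customer supplies a 256-bit key with every request.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("customer_key must be 32 bytes")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown encryption mode: {mode}")


print(encryption_arguments("sse-s3"))
print(encryption_arguments("sse-kms", kms_key_id="alias/example-backup-key"))
# With boto3: boto3.client("s3").put_object(
#     Bucket="example-backup-bucket", Key="backups/db.dump", Body=data,
#     **encryption_arguments("sse-s3"))
```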
Conclusion
Gartner has recognized AWS as a leader in public cloud storage services17. AWS is
well positioned to help organizations move their workloads to cloud-based
platforms, the next generation of backup. AWS provides cost-effective and
scalable solutions to help organizations balance their requirements for backup
and archiving. These services integrate well with technologies you are using
today.
Contributors
The following individuals contributed to this paper:
Pawan Agnihotri, Solutions Architect, Amazon Web Services
16 http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html
17 http://www.gartner.com/technology/reprints.do?id=1-1WWKTQ3&ct=140709&st=sb
Document Revisions
Updated May 2016