
Subject Name:- Cloud Computing

Subject Code:- KCS-713


Unit No.:- 2
Lecture No.:- 6
Topic Name: Disaster Recovery

1
Content
1:-Disaster Recovery Planning
– The Recovery Point Objective
– The Recovery Time Objective
2:-Disasters in the Cloud
– Backup Management
– Geographic Redundancy
– Organizational Redundancy
3:- Disaster Management
– Monitoring
– Load Balancer Recovery
– Application Server Recovery
– Database Recovery

2
1. Disaster Recovery Planning
• HOW GOOD IS YOUR DISASTER RECOVERY
PLAN? It’s fully documented and you regularly
test it by running disaster recovery drills, right?
• Disaster recovery is the practice of making a system
capable of surviving unexpected or extraordinary
failures.
• A disaster recovery plan, for example, will help your
IT systems survive a fire in your data center that
destroys all of the servers in that data center and the
systems they support.
3
• Every organization should have a documented
disaster recovery process and should test that process
at least twice each year.
• In reality, even well-disciplined companies tend to
fall short in their disaster recovery planning.

4
• Disaster recovery deals with catastrophic failures
that are extremely unlikely to occur during the
lifetime of a system. If they are reasonably
expected failures, they fall under the auspices of
traditional availability planning.
• Although each single disaster is unexpected over
the lifetime of a system, the possibility of some
disaster occurring over time is reasonably
nonzero.

5
Recovery Point Objective(RPO)
Recovery Time Objective (RTO)

6
Recovery Point Objective (RPO)
• The recovery point objective identifies how much
data you are willing to lose in the event of a
disaster. This value is typically specified in a number
of hours or days of data. For example, if you
determine that it is OK to lose 24 hours of data, you
must make sure that the backups you’ll use for your
disaster recovery plan are never more than 24 hours
old.

7
Recovery Time Objective (RTO)
• The recovery time objective identifies how much
downtime is acceptable in the event of a disaster. If
your RTO is 24 hours, you are saying that up to 24
hours may elapse between the point when your
system first goes offline and the point at which you
are fully operational again.
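To make these two objectives concrete, here is a minimal Python sketch
(the 24-hour values mirror the examples above; the timestamps are
hypothetical) that checks whether a backup still satisfies the RPO and
whether a projected recovery time satisfies the RTO:

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=24)   # maximum tolerable data loss (example value)
RTO = timedelta(hours=24)   # maximum tolerable downtime (example value)

def backup_meets_rpo(last_backup: datetime, now: datetime) -> bool:
    """True if restoring this backup loses no more data than the RPO allows."""
    return now - last_backup <= RPO

def recovery_meets_rto(outage_start: datetime, back_online: datetime) -> bool:
    """True if the system is fully operational again within the RTO."""
    return back_online - outage_start <= RTO

# A backup taken 30 hours ago violates a 24-hour RPO.
now = datetime.now(timezone.utc)
print(backup_meets_rpo(now - timedelta(hours=30), now))   # False
```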

8
2. Disasters in the Cloud
• Assuming unlimited budget and capabilities,
you focus on three key things in disaster
recovery planning:
– Backups and data retention
– Geographic redundancy
– Organizational redundancy

9
• If you can take care of those three items, it’s nearly
certain you can meet most RPO and RTO needs.
• In reality, however, you rarely have an unlimited budget and
capabilities, so you always have to compromise. As a result, the order
of the three items matters. In addition, if your hosting provider is a
less-proven organization, organizational redundancy may be more
important than geographic redundancy.
• Fortunately, the structure of the Amazon cloud makes it
very easy to take care of the first and second items. In
addition, cloud computing in general makes the third item
much easier.

10
Backup Management
• Your ability to recover from a disaster is limited by
the quality and frequency of your backups. In a
traditional IT infrastructure, companies often make
full weekly backups to tape with nightly differentials
and then ship the weekly backups off-site. You can
do much better in the cloud, and do it much more
cheaply, through a layered backup strategy.

11
TABLE. Backup requirements by data type
• Fixed data: Fixed data, such as your operating system and common utilities,
belongs in your AMI (Amazon Machine Image). In the cloud, you don't back up
your AMI, because it has no value beyond the cloud.
• Transient data: File caches and other data that can be lost completely
without impacting the integrity of the system. Because your application state
is not dependent on this data, don't back it up.
• Configuration data: Runtime configuration data necessary to make the system
operate properly in a specific context. This data is not transient, since it
must survive machine restarts. On the other hand, it should be easily
reconfigured from a clean application install. This data should be backed up
semi-regularly.
• Persistent data: Your application state, including critical customer data
such as purchase orders. It changes constantly, and a database engine is the
best tool for managing it. Your database engine should store its state to a
block device, and you should be performing constant backups. Clustering
and/or replication are also critical tools in managing the database.
12
• In disaster recovery, persistent data is generally
the data of greatest concern. We can always rebuild
the operating system, install all the software, and
reconfigure it, but we have no way of manually
rebuilding the persistent data.

13
Fixed data strategy
• If you are fixated on the idea of backing up your
machine images, you can download the images out of
S3 and store them outside of the Amazon cloud. If S3
were to go down and incur data loss or corruption that
had an impact on your AMIs, you would be able to
upload the images from your off-site backups and
reregister them. It’s not a bad idea and it is not a lot
of trouble, but the utility is limited given the
uniqueness of the failure scenario that would make
you turn to those backups.
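If you do keep such off-site copies, a minimal sketch of the download
side, assuming a hypothetical bucket and key prefix for an S3-backed
image bundle, might look like this:

```python
import os
import boto3  # assumes AWS credentials are configured in the environment

s3 = boto3.client("s3")
BUCKET = "my-ami-bucket"          # hypothetical bucket holding the bundled image
PREFIX = "images/webserver-ami/"  # hypothetical key prefix for the bundle parts

# Copy every object in the bundle to local, off-cloud storage.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        local_path = os.path.join("offsite-backup", obj["Key"])
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(BUCKET, obj["Key"], local_path)
```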

14
Configuration data strategy
• A good backup strategy for configuration
information comprises two levels.
• The first level can be either a regular file system
dump to your cloud storage or a file system
snapshot.
• An alternate approach is to check your application
configuration into a source code repository
outside of the cloud and leverage that repository
for recovery from even minor losses.

15
• Here’s what is recommended:
• Create regular—at a minimum, daily—snapshots of
your configuration data.
• Create semi-regular—at least less than your
RPO—file system archives in the form of ZIP or
TAR files and move those archives into Amazon S3.
• On a semi-regular basis—again, at least less than
your RPO—copy your file system archives out of the
Amazon cloud into another cloud or physical hosting
facility.
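As a minimal sketch of the second and third recommendations (the
directory, bucket, and file names are hypothetical):

```python
import datetime
import tarfile
import boto3  # assumes AWS credentials are configured in the environment

CONFIG_DIR = "/etc/myapp"          # hypothetical configuration directory
BUCKET = "myapp-config-backups"    # hypothetical S3 bucket
ARCHIVE = f"config-{datetime.date.today().isoformat()}.tar.gz"

# Build a TAR archive of the configuration data.
with tarfile.open(ARCHIVE, "w:gz") as tar:
    tar.add(CONFIG_DIR, arcname="myapp-config")

# Move the archive into Amazon S3.
boto3.client("s3").upload_file(ARCHIVE, BUCKET, ARCHIVE)

# Separately, and at least as often as your RPO requires, copy the same
# archive out of the Amazon cloud to another cloud or a physical facility.
```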
16
Persistent data strategy (a.k.a. database backups)
• The first line of defense is either multimaster
replication or clustering. A multimaster database
is one in which two master servers execute write
transactions independently and replicate the
transactions to the other master.
• A clustered database environment contains
multiple servers that act as a single logical server.
Under both scenarios, when one goes down, the
system remains operational and consistent.

17
• If multimaster replication or clustering is impractical for your
environment, you can instead perform master-slave replication.
Master-slave replication involves setting up a master
server that handles your write operations and
replicating transactions over to a slave server. Each
time something happens on the master, it replicates to
the slave.
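Whichever form of replication you use, the standby copy only helps if it
keeps up with the master. A minimal sketch of a replication-lag check for
a MySQL-style master-slave pair (the host name, credentials, and the
300-second threshold are hypothetical):

```python
import pymysql  # assumes the PyMySQL package is installed

# Hypothetical connection details for the replication slave.
conn = pymysql.connect(host="db-slave.example.com", user="monitor",
                       password="secret",
                       cursorclass=pymysql.cursors.DictCursor)

with conn.cursor() as cur:
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()

lag = status["Seconds_Behind_Master"] if status else None
if lag is None or lag > 300:
    # The slave is stopped or too far behind to be a useful recovery target.
    print("WARNING: replication slave is not a viable first line of defense")
```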

18
Geographic Redundancy

• You don’t necessarily need to have your application


running actively in all locations, but you need the
ability to bring your application up from the redundant
location in a state that meets your Recovery Point
Objective within a timeframe that meets your
Recovery Time Objective. If you have a 2-hour RTO
with a 24-hour RPO, geographical redundancy
means that your second location can be operational
within two hours of the complete loss of your
primary location using data that is no older than
24 hours.

19
• Amazon provides built-in geographic redundancy in
the form of regions and availability zones. If you
have your instances running in a given availability
zone, you can get them started back up in another
availability zone in the same region without any
effort.
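For example, a minimal boto3 sketch that restarts an instance in a
different availability zone of the same region (the AMI ID, instance
type, and zone names are hypothetical):

```python
import boto3  # assumes AWS credentials are configured in the environment

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",               # hypothetical machine image
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    Placement={"AvailabilityZone": "us-east-1b"},  # the surviving zone
)
print("replacement instance:", response["Instances"][0]["InstanceId"])
```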

20
Organizational Redundancy
• The best approach to organizational redundancy is
to identify another cloud provider and establish a
backup environment with that provider in the
event your first provider fails.
• In particular, we must consider all of the following
concerns:
– Storing your portable backups at your secondary
cloud provider.
– Creating machine images that can operate your
applications in the secondary provider’s virtualized
environment.

21
– Keeping the machine images up to date with
respect to their counterparts with the primary
provider.
– Not all cloud providers and managed service
providers support the same operating systems or
file systems. If your application is dependent on
either, you need to make sure you select a cloud
provider that can support your needs.
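For the first concern (storing portable backups with a secondary
provider), many providers expose an S3-compatible API, so a minimal
sketch might simply upload the same archive twice; the endpoint,
credentials, and bucket names below are hypothetical:

```python
import boto3

ARCHIVE = "config-backup.tar.gz"   # hypothetical backup archive

# Primary copy in Amazon S3.
boto3.client("s3").upload_file(ARCHIVE, "myapp-backups", ARCHIVE)

# Secondary copy at another provider that speaks the S3 protocol.
secondary = boto3.client(
    "s3",
    endpoint_url="https://storage.secondary-provider.example",
    aws_access_key_id="SECONDARY_KEY",
    aws_secret_access_key="SECONDARY_SECRET",
)
secondary.upload_file(ARCHIVE, "myapp-backups", ARCHIVE)
```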

22
Disaster Management
• You are performing your backups and have an
infrastructure in place with all of the appropriate
redundancies.
• To complete the disaster recovery scenario, you need to
recognize when a disaster has happened and have the
tools and processes in place to execute your recovery
plan.
• One of the coolest things about the cloud is that all of
this can be automated.
• You can recover from the loss of Amazon’s U.S. data
centers while you sleep.

23
Monitoring
• Monitoring your cloud infrastructure is extremely important.
• You cannot replace a failing server or execute your disaster
recovery plan if you don’t know that there has been a failure.
• The trick, however, is that your monitoring systems cannot
live in either your primary or secondary cloud provider’s
infrastructure.
• They must be independent of your clouds. If you want to
enable automated disaster recovery, they also need the ability
to manage your EC2 infrastructure from the monitoring site.
• Your primary monitoring objective should be to figure out
what is going to fail before it actually fails.
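A minimal sketch of such an external health check, meant to run on
infrastructure independent of both providers (the URL and the follow-up
action are hypothetical):

```python
import urllib.error
import urllib.request

HEALTH_URL = "https://www.example.com/healthcheck"  # hypothetical endpoint

def site_is_healthy(url: str, timeout: int = 10) -> bool:
    """Return True if the application answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if not site_is_healthy(HEALTH_URL):
    # In a fully automated setup, this is where you would call the EC2 API
    # to execute the recovery plan rather than only raising an alert.
    print("ALERT: primary site unreachable; starting disaster recovery")
```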

24
Load Balancer Recovery
• One of the reasons companies pay absurd amounts of
money for physical load balancers is to greatly reduce
the likelihood of load balancer failure.
• With cloud vendors such as GoGrid— and in the
future, Amazon—you can realize the benefits of
hardware load balancers without incurring the costs.
Under the current AWS offering, you have to use
less-reliable EC2 instances.
• Recovering a load balancer in the cloud, however,
is lightning fast. As a result, the downside of a
failure in your cloud-based load balancer is minor.
25
• Recovering a load balancer is simply a matter of
launching a new load balancer instance from the
AMI and notifying it of the IP addresses of its
application servers. You can further reduce any
downtime by keeping a load balancer running in an
alternative availability zone and then remapping your
static IP address upon the failure of the main load
balancer.
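A minimal boto3 sketch of the remapping approach (the standby instance ID
and the Elastic IP address are hypothetical, and the call shown assumes a
classic-style public address):

```python
import boto3  # assumes AWS credentials are configured in the environment

ec2 = boto3.client("ec2", region_name="us-east-1")

STANDBY_LB = "i-0abcdef1234567890"   # load balancer already running in another zone
STATIC_IP = "203.0.113.10"           # the static (Elastic) IP your DNS points at

# On failure of the main load balancer, remap the static IP to the standby.
ec2.associate_address(InstanceId=STANDBY_LB, PublicIp=STATIC_IP)
```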

26
Application Server Recovery
• The recovery of a failed application server is only
slightly more complex than the recovery of a failed
load balancer.
• Like the failed load balancer, you start up a new
instance from the application server machine image.
• You then pass it configuration information, including
where the database is.
• Once the server is operational, you must notify the
load balancer of the existence of the new server (as
well as deactivate its knowledge of the old one) so
that the new server enters the load-balancing rotation.
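A minimal sketch of those steps follows; the AMI ID, the database address
passed as user data, and the final load balancer notification are all
hypothetical placeholders:

```python
import boto3  # assumes AWS credentials are configured in the environment

ec2 = boto3.client("ec2", region_name="us-east-1")

# Start a replacement application server from its machine image and pass it
# configuration information, such as where the database is.
response = ec2.run_instances(
    ImageId="ami-0fedcba9876543210",                  # hypothetical app server AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    UserData="DB_HOST=db-master.internal.example\n",  # read by the image at boot
)
new_server = response["Instances"][0]

# Finally, register the new server with the load balancer and deactivate the
# failed one; how you do that depends on the load balancer you run.
print("register with load balancer:", new_server["InstanceId"])
```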

27
Database Recovery
• Database recovery is the hardest part of disaster
recovery in the cloud.
• Your disaster recovery algorithm has to identify
where an uncorrupted copy of the database exists.
• This process may involve promoting slaves into
masters, rearranging your backup management, and
reconfiguring application servers.

28
The following process will typically cover all levels of
database failure:
1. Launch a replacement instance in the old instance’s
availability zone and mount its old volume.
2. If the launch fails but the volume is still running,
snapshot the volume and launch a new instance in
any zone, and then create a volume in that zone
based on the snapshot.
3. If the volume from step 1 or the snapshot from step 2
is corrupt, you need to fall back to the replication
slave and promote it to database master.

29
4. If the database slave is not running or is somehow
corrupted, the next step is to launch a
replacement volume from the most recent
database snapshot.
5. If the snapshot is corrupt, go further back in time
until you find a backup that is not corrupt.
• Step 4 typically represents your worst-case
scenario. If you get to 5, there is something wrong
with the way you are doing backups.
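Expressed as code, the process above might look like the following
sketch. Every helper on the `ops` object (launch_with_volume,
promote_slave, and so on) is a hypothetical stand-in for provider-specific
and database-specific calls:

```python
def recover_database(ops, old_volume, slave, snapshots):
    """Walk the recovery options from least to most data loss.

    `ops` is a hypothetical adapter wrapping EC2/EBS and database calls;
    `snapshots` is assumed to be ordered newest-first.
    """
    # 1. Replacement instance in the old zone, reattaching the old volume.
    if ops.launch_with_volume(old_volume):
        return "recovered from the existing volume"

    # 2. Snapshot the surviving volume and rebuild it in another zone.
    if ops.volume_is_healthy(old_volume):
        snap = ops.snapshot_volume(old_volume)
        if ops.launch_with_volume(ops.create_volume_from(snap)):
            return "recovered from a fresh snapshot"

    # 3. Volume and snapshot are corrupt: promote the replication slave.
    if ops.slave_is_consistent(slave):
        ops.promote_slave(slave)
        return "recovered by promoting the slave to master"

    # 4. and 5. Fall back through database snapshots, newest first, until
    # one is not corrupt; reaching this point means your backups need work.
    for snap in snapshots:
        if ops.snapshot_is_consistent(snap):
            if ops.launch_with_volume(ops.create_volume_from(snap)):
                return "recovered from a historical snapshot"

    raise RuntimeError("no usable backup found")
```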
30
Important Questions
Q1. How is disaster recovery done in a cloud computing system?
Q2. Explain Recovery Point Objective and Recovery Time
Objective.
Q3. Assuming unlimited budget and capabilities, what are the three key
things in disaster recovery planning on which you will focus, and why?
Explain.
Q4. Write a technical note on Disaster Management.
Q5. Discuss Disaster Recovery Scenarios with AWS

31
References
• Text Books:-
• 1. Kai Hwang, Geoffrey C. Fox, Jack G.
Dongarra, “Distributed and Cloud Computing,
From Parallel Processing to the Internet of
Things”, Morgan Kaufmann Publishers, 2012.

32
Thank You

33
