
Lecture 10 - Disaster Recovery

The document outlines the key steps in developing a disaster recovery plan (DRP), including: 1. Forming a DRP team to lead the process. 2. Defining business goals and objectives to prioritize recovery. 3. Identifying critical systems, processes, dependencies, and single points of failure. 4. Developing procedures to recover infrastructure, restore operations, and transfer data back to production. 5. Thoroughly testing and refining the plan on a regular basis.

The Disaster Recovery Planning Process
Introduction
• Many companies do not have a disaster recovery plan (DRP), even though there is often a desire for one.
• The level of effort and/or cost required to create a DRP can cause the project to have a low priority relative to other, more immediate projects.
• A DRP is viewed as "nice to have" or "just insurance that will not be used", and not as a critical business component.
Introduction cont…
• That is, until a failure causes a significant outage or loss of data (often at a significant cost to the business).
• It is my opinion that every company could benefit from both a disaster recovery plan and a business continuity plan (BCP).
• In my opinion, investing in a DRP and a BCP is just as important for most businesses.
Introduction cont…
• This presentation describes many areas to address during the creation of a DRP.
• It is not all-inclusive, but is intended to provide insight into the overall process.
• So, now that you have decided to go forward with this type of project, what next?
– Where do you start?
– What needs to be addressed?
– How will you know that the plan really works?
– Do you need to find external expertise for this type of
project?
– If so, exactly what type of expertise is required?
Introduction cont…
• A DRP first needs to be created, then tested and refined, and finally implemented and retested on a periodic basis.
• Most plans will experience some type of failure during their first execution, so it is also important to retest the plan periodically, ideally rotating staff.
• Using experienced consultants to help develop the process, and later to audit and refine it, is usually a sound investment.
Introduction cont…
• They will often identify gaps or ambiguities in the systems, processes, and procedures in use that might otherwise have been missed.
• Find a team that will build plans based on how your business and systems work, and will not try to make your business fit into a predefined template.
• Every business is different, and every system being recovered has its own nuances. (Capturing that knowledge is critical to success.)
Introduction cont…
• When using a DR facilities provider, there are many issues to consider.
• There may be availability issues if the provider has many customers in a single geographic area.
• There may be phone and network bandwidth issues if more than one customer declares a disaster at the same time.
Introduction cont…
• Does the company have multiple hot sites that
you can use? Where are they located?
• How long would it take to assemble a recovery
team at each site?
• How often can you test the plan? And how
long will you have to test the plan?
• What is their DR plan?
• How committed are they to your success?
Where do you start?
• The first step is to create a DR team, which includes:
1. Executive sponsor.
2. DR coordinator.
3. Team leaders (there will be several groups
and possibly subgroups).
4. Team members.
Where do you start? cont…

• These people should be designated as either primary or backup for a position, with every position having more than one person assigned, to minimize people as a single point of failure.
• The goal is to have a team that has the expertise to help develop the various recovery procedures and is committed to the success of the overall effort.
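The primary/backup rule above can be checked mechanically once the roster is written down. The following is a minimal Python sketch, not a prescribed tool: the `Position` structure, the `uncovered_positions` helper, and the names in the roster are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Position:
    """One DR team position with its primary and backup assignees (hypothetical structure)."""
    title: str
    primary: str | None = None
    backups: list[str] = field(default_factory=list)


def uncovered_positions(positions: list[Position]) -> list[str]:
    """Return titles of positions that lack a primary or a backup who is a different person."""
    gaps = []
    for pos in positions:
        # A backup only counts if it is someone other than the primary.
        distinct_backups = {name for name in pos.backups if name != pos.primary}
        if pos.primary is None or not distinct_backups:
            gaps.append(pos.title)
    return gaps


# Placeholder roster; the names are invented for the example.
roster = [
    Position("Executive sponsor", primary="A. Rivera", backups=["B. Chen"]),
    Position("DR coordinator", primary="C. Okafor", backups=[]),
    Position("Network team leader", primary="D. Singh", backups=["D. Singh"]),
]

print(uncovered_positions(roster))  # ['DR coordinator', 'Network team leader']
```

Running a check like this whenever the roster changes makes it harder for a position to drift back to a single person.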
Where do you start? cont…

• The next step is to define business goals.
• The goals should address items such as:
– What functional areas need to be recovered?
– What length of time is acceptable for recovery?
– What amount of data loss is acceptable?
• This often involves prioritization and a cost-benefit analysis to determine the worth of recovery (something that may be premature at this phase of the project).
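One way to make these goals concrete is to record them per functional area and compare them with what a test actually achieved. The sketch below is illustrative only and rests on assumptions: acceptable data loss is expressed as a time window, priority is a simple integer where 1 means "recover first", and the functional areas and figures are invented.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryGoal:
    """Business goal for one functional area (hypothetical structure)."""
    area: str
    max_downtime: timedelta    # acceptable length of time for recovery
    max_data_loss: timedelta   # acceptable data loss, expressed as a time window
    priority: int              # 1 = recover first


def recovery_order(goals):
    """Order areas by priority, then by the tightest acceptable downtime."""
    ranked = sorted(goals, key=lambda g: (g.priority, g.max_downtime))
    return [g.area for g in ranked]


def goal_met(goal, measured_downtime, measured_data_loss):
    """Return True if a test result stayed within both limits of the goal."""
    return (measured_downtime <= goal.max_downtime
            and measured_data_loss <= goal.max_data_loss)


# Invented example figures.
goals = [
    RecoveryGoal("reporting", timedelta(days=3), timedelta(hours=24), priority=2),
    RecoveryGoal("order entry", timedelta(hours=4), timedelta(minutes=15), priority=1),
    RecoveryGoal("payroll", timedelta(days=1), timedelta(hours=24), priority=2),
]

print(recovery_order(goals))  # ['order entry', 'payroll', 'reporting']
print(goal_met(goals[1], timedelta(hours=3), timedelta(minutes=30)))  # False
```

In the second call the downtime limit is met but the data-loss limit is not, so the goal as a whole is not met.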
Understand the business goals and objectives
• To find out what that really entails, you must know:
– What are the critical systems?
– What are the key processes and applications?
– What are the dependencies on other systems?
Understand the business goals and objectives cont…
• This includes:
– Data transfers.
– Manual processes.
– Remote processing.
• Then document these processes, because there is interaction with, and dependencies on, other systems and user interfaces, and because of the sensitivity of the data.
• Once the systems have been identified, attempt to quantify their impact relative to the overall business goals.
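Once dependencies on other systems are documented, they can be recorded as a graph and sorted so that every system is recovered after the systems it depends on. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The system names are hypothetical, and a real inventory would also have to cover manual processes and external parties.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical inventory: each key depends on every system in its set.
dependencies = {
    "order_entry": {"database", "auth_service"},
    "reporting": {"database", "nightly_data_transfer"},
    "nightly_data_transfer": {"database"},
    "auth_service": {"network"},
    "database": {"network", "storage"},
}


def recovery_sequence(deps):
    """Return systems ordered so that each one follows all of its dependencies.

    Raises graphlib.CycleError if the documented dependencies are circular,
    which is itself a finding worth recording in the plan.
    """
    return list(TopologicalSorter(deps).static_order())


sequence = recovery_sequence(dependencies)
print(sequence)
```

Several valid orders usually exist, so the exact printed sequence may vary; what matters is that "network" and "storage" come before "database", and "database" comes before the applications that use it.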
Identify specific requirements

Everyone involved with this effort (including upper management within the company) needs to have a single vision of what success looks like. Without this, you risk wasting time and money on a plan that may be viewed as a failure.
Identify key personnel
• These people may not be part of the DR team, but they are important. (For example, who has the authority to declare a disaster?)
• This list should be maintained both by name and by role; it should be validated and updated frequently.
Identify single points of failure
• The overall goal of this step is to mitigate unnecessary risk.
• The scope of this effort includes people, software, equipment, and infrastructure.
• It is important to look at the "big picture", which includes:
– Impact of the failure.
– Probability of failure.
– Estimated incidents (failures).
– Annualized loss expectancy.
– Cost of mitigation.
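The items above fit together arithmetically. Annualized loss expectancy (ALE) is conventionally computed as the loss from a single incident multiplied by the expected number of incidents per year, and a mitigation is worth considering when the reduction in ALE exceeds its yearly cost. The sketch below applies that convention; the failure points and every figure in it are invented for illustration, and the `FailurePoint` structure is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePoint:
    """One single point of failure with rough, estimated figures (hypothetical)."""
    name: str
    loss_per_incident: float             # impact of one failure, in currency units
    incidents_per_year: float            # estimated incidents per year today
    mitigated_incidents_per_year: float  # estimated incidents per year after mitigation
    annual_mitigation_cost: float

    @property
    def ale(self) -> float:
        """Annualized loss expectancy without mitigation."""
        return self.loss_per_incident * self.incidents_per_year

    @property
    def ale_after_mitigation(self) -> float:
        return self.loss_per_incident * self.mitigated_incidents_per_year

    @property
    def net_annual_benefit(self) -> float:
        """Reduction in ALE minus what the mitigation costs per year."""
        return self.ale - self.ale_after_mitigation - self.annual_mitigation_cost


# Invented example figures.
candidates = [
    FailurePoint("single tape drive", 40_000, 0.5, 0.125, 5_000),
    FailurePoint("one trained operator", 8_000, 0.25, 0.0, 6_000),
]

for point in sorted(candidates, key=lambda p: p.net_annual_benefit, reverse=True):
    print(f"{point.name}: ALE={point.ale:,.0f}, net benefit={point.net_annual_benefit:,.0f}")
```

With these figures the first mitigation pays for itself (net benefit 10,000) and the second does not (net benefit -4,000), although non-financial factors may still justify it.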
Preparing to develop the DRP
• It is important to have a document management system in place to track:
– versions of plans.
– work in progress.
– work that is scheduled but not started.
• This information needs to be backed up and saved in a format that does not rely on the underlying systems being recovered.
• For example, banks such as HSBC use secure e-rooms and external vendors for some of their projects, and they recommend that the plans be archived on portable media, with copies kept by people and in various "safe" locations.
The DRP should address 3 main functional areas
1. Recovery.
2. Restoring / sustaining business operation.
3. Transferring Data back to Production Machines.
1. Recovery
• Once the infrastructure is in place, it will be necessary to recover production data.
• Minimizing holes in the data is very important, especially in a distributed processing environment where one step could depend on the actions of one or more predecessor steps.
• Then identify the action to be taken when data inconsistencies are detected.
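Before deciding what action to take, inconsistencies have to be detected. One possible approach, which the slides do not prescribe, is to compare recovered files against a manifest of checksums captured in production. The sketch below assumes such a manifest exists as a mapping from relative file name to SHA-256 digest; the file names and contents are invented, and the demo builds its own temporary directory so it can run anywhere.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_inconsistencies(manifest: dict, recovered_root: Path) -> dict:
    """Map each problem file to 'missing' or 'checksum mismatch'."""
    problems = {}
    for relative_name, expected_digest in manifest.items():
        candidate = recovered_root / relative_name
        if not candidate.is_file():
            problems[relative_name] = "missing"
        elif sha256_of(candidate) != expected_digest:
            problems[relative_name] = "checksum mismatch"
    return problems


# Demo with invented files: one intact, one altered, one never recovered.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "orders.dat").write_bytes(b"order batch 1")
    (root / "ledger.dat").write_bytes(b"truncated")
    manifest = {
        "orders.dat": hashlib.sha256(b"order batch 1").hexdigest(),
        "ledger.dat": hashlib.sha256(b"ledger batch 1").hexdigest(),
        "audit.dat": hashlib.sha256(b"audit batch 1").hexdigest(),
    }
    problems = find_inconsistencies(manifest, root)

print(problems)  # {'ledger.dat': 'checksum mismatch', 'audit.dat': 'missing'}
```

A checksum comparison only shows that a file differs or is absent. Deciding whether to re-restore, replay predecessor steps, or accept the gap remains a procedure the DRP has to spell out.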
2. Restoring / sustaining business operation:
• All processing requirements and service level agreements need to be defined and documented.
• Dependencies between processes also need to be defined.
• It is important to document the existing process and then build the plan accordingly.
• Note: remember that routine maintenance, including backups, should still be performed at the hot site.
3. Transferring Data back to Production Machines:
• Production will need to shift from the hot site back to a permanent location.
• A process needs to be defined to manage this migration.
• Synchronize the machines to a specific point in time.
• It should also be noted that this is one of the more difficult tasks to test.
The DRP should address 3 main technical areas

1. Hardware Issues.
2. Networking issues.
3. Software issues.
1. Hardware Issues:
• This includes:
1. Machine type.
2. Configuration.
3. Operating system version.
4. Patch level.
• The recommendation here is to plan for the worst case.
• The key to success is to ensure that the DRP machines have at least as much capacity as the production machines that they are replacing.
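The capacity rule can be checked by comparing recorded specifications for each production machine and its DR replacement. This is a minimal sketch: the `MachineSpec` fields are an assumed, simplified notion of "capacity", and the figures are invented.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MachineSpec:
    """Simplified capacity figures for one machine (hypothetical structure)."""
    cpu_cores: int
    memory_gb: int
    storage_gb: int


def capacity_shortfalls(production: MachineSpec, recovery: MachineSpec) -> dict:
    """Return each resource where the DR machine is smaller, with the size of the gap."""
    shortfalls = {}
    for spec_field in fields(MachineSpec):
        gap = getattr(production, spec_field.name) - getattr(recovery, spec_field.name)
        if gap > 0:
            shortfalls[spec_field.name] = gap
    return shortfalls


# Invented example figures.
production = MachineSpec(cpu_cores=16, memory_gb=128, storage_gb=4000)
hot_site = MachineSpec(cpu_cores=16, memory_gb=64, storage_gb=6000)

print(capacity_shortfalls(production, hot_site))  # {'memory_gb': 64}
```

An empty result means the DR machine meets or exceeds production on every recorded figure; it says nothing about machine type, operating system version, or patch level, which still need their own comparison.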
2. Networking issues:
• Is any special type of LAN or VPN software required?
• How do the machines communicate with one another?
• Do applications connect to machines using hostnames or hard-coded IP addresses?
• Are there requirements for connection to an external network?
• All networking requirements and issues need to be identified, documented, and then addressed in the DRP.
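The hostname versus hard-coded IP question can be partly answered by scanning application configuration for IPv4 literals, since those will point at the wrong machines once production moves to a hot site. The sketch below is one rough way to do that with Python's standard `re` and `ipaddress` modules. The sample configuration is invented, and the scan can produce false positives (for example, four-part version numbers) and does not look for IPv6 addresses.

```python
import ipaddress
import re

# Four dot-separated numbers that are not part of a longer dotted sequence.
_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def hard_coded_ips(config_text: str) -> list:
    """Return (line number, address) pairs for valid IPv4 literals in the text."""
    hits = []
    for line_no, line in enumerate(config_text.splitlines(), start=1):
        for match in _CANDIDATE.finditer(line):
            try:
                ipaddress.IPv4Address(match.group())
            except ipaddress.AddressValueError:
                continue  # looks like an address but is out of range
            hits.append((line_no, match.group()))
    return hits


# Invented configuration fragment.
sample = """\
db_host = db01.example.internal
report_target = 10.20.30.40
jdbc_url = jdbc:postgresql://192.168.5.17:5432/orders
build = 999.1.1.1
"""

print(hard_coded_ips(sample))  # [(2, '10.20.30.40'), (3, '192.168.5.17')]
```

Each hit is a candidate for replacement with a hostname, or at least an entry in the list of settings that must be changed during recovery.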
3. Software issues:
• Software includes:
1. Operating system.
2. User-written applications.
3. Third-party software (report writers, GUI products, backup/recovery products, scheduling software).
• Maintain a comprehensive inventory of currently used software.
• Working hardware and an accessible network are worthless if your critical applications are not working!
Creating the procedures that support the plan
• When the DRP is created, nothing should be assumed or left to chance.
• Design the procedures so that a semi-experienced person who may not be familiar with your operations can execute them.
• Detailed test plans should be developed prior to execution and should address all critical functional areas of the DRP.
• Data should be gathered during testing and saved for future review.
Creating the procedures that support the plan cont…
• In the event of problems, that data may help the team determine the root cause so that it can be corrected.
• If everything goes right, it provides the necessary documentation to support an external validation effort of the DRP exercise.
• The key to knowing that everything worked is to know what "everything" is.
• And then to be able to demonstrate that the necessary tasks were completed successfully!
Testing and refining the plan
• A common problem that we see is that plans are developed but never tested, or are tested once and forgotten.
• A plan that is not continuously refined and validated is almost worthless.
• To maximize the chance of success in the event of a real disaster, it is essential that the DRP be executed on a regular basis.
• Specific recovery procedures can generally be tested in-house on a more frequent basis.
Summary
• The DRP is a living document that is refined over several iterations and updated over time.
• No matter how good it is, it will probably fail during its first execution.
• The key is to continue to improve the plan so that it will work if and when it is ever needed.
