Topic 8 - Plan Development


CHAPTER 8

PLAN DEVELOPMENT

Contents

• DRP Strategies
• Design & Development Phase
• Emergency Response (ER) Operations & Components
• Develop ER Procedures
• Alternate Recovery Site
• Site Recovery & Resumption

DRP Strategies
• A strategy that meets the business objectives to continually assess
and develop effective planning, testing, and recovery of critical
services (People, Processes and Technology).
[Figure: Business continuity framework under a common Governance layer, covering Crisis Management Planning, Disaster Recovery Planning, Business Continuity Planning (COOP), Work Area Recovery Planning and Human Resources Planning.]
Source: Accenture 2013.
DRP Strategies-continue
• Disaster is any unplanned occurrence that substantially
disrupts the operation of most or all the Applications and
prevents users from performing their business functions.

• Many disasters (fire, plane crash, flood, earthquake, etc.)
include physical damage to and/or loss of the
computer/networking hardware and/or the IT personnel
supporting the systems.

• Conditions that are not considered a disaster include server
failure, temporary network outages, software bugs and other
software issues.

DRP Strategies-continue
• The common thread is that the disaster is not a planned
event (such as an upgrade) and gets in the way of the users
performing the work they normally do with the application.

• In the case of background or utility applications that have no
direct human users, a disaster prevents the company from
enjoying the benefits of proper execution of all the
applications.

• For example, these types of applications include Internet
transaction servers or fax servers. These supporting systems
are critical to enable general processes to work.

• Furthermore, a disaster is something that does not commonly
occur and for which extraordinary procedures are warranted.

DRP Strategies-continue
• Recovery Time Requirement Terms
▪ Maximum Tolerable Downtime (MTD)
• The maximum time a business can tolerate the absence or
unavailability of a particular business function.
• Different business functions will have different MTDs.
• There is a correlation between the criticality of a business function and its
maximum downtime.
• The higher the criticality, the shorter the maximum tolerable
downtime.
• It consists of two elements, the systems recovery time and the
work recovery time, i.e. MTD = RTO + WRT.

▪ Recovery Time Objective (RTO)
• First segment of the maximum tolerable downtime (MTD).
• The time available to recover disrupted systems and resources.

DRP Strategies-continue
▪ Work Recovery Time (WRT)
• The second segment that comprises the maximum tolerable
downtime (MTD).
• It is the time to get critical business functions back up and
running once the systems (hardware, software, and
configuration) are restored.
• From an IT perspective, once the systems are back up and
running, recovery is complete.
• But from a business function perspective, additional steps
may be required before it's back to business.
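
The relation MTD = RTO + WRT can be checked with simple arithmetic. The sketch below is illustrative only: the function name and the hour values are assumptions, not figures from the source.

```python
from datetime import timedelta


def fits_within_mtd(rto: timedelta, wrt: timedelta, mtd: timedelta) -> bool:
    """Return True when systems recovery plus work recovery fits inside the MTD."""
    return rto + wrt <= mtd


# Hypothetical figures for one business function.
rto = timedelta(hours=4)  # time to restore hardware, software and configuration
wrt = timedelta(hours=2)  # time to get the business function running again
mtd = timedelta(hours=8)  # maximum tolerable downtime for that function

print(fits_within_mtd(rto, wrt, mtd))
```

A function whose RTO and WRT together exceed its MTD needs either a faster recovery strategy or a revised MTD.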

DRP Strategies-continue
▪ Recovery Point Objective (RPO)
• The amount or extent of data loss that can be tolerated by
your critical business systems.
• It is based on current operating procedures and your
estimates of what might happen in the event of a business
disruption.
• It's important to define your RPO in order to ensure your
recovery processes address these timelines.
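
One way to test a backup schedule against an RPO is to estimate the worst-case data loss. The sketch below assumes that the worst case equals the interval between backups plus any delay before the copy is safely off-site; that model, the function names and the sample values are assumptions for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         offsite_delay: timedelta = timedelta(0)) -> timedelta:
    """Assumed model: data written just after a backup is lost until the
    next backup has completed and reached the off-site location."""
    return backup_interval + offsite_delay


def meets_rpo(rpo: timedelta, backup_interval: timedelta,
              offsite_delay: timedelta = timedelta(0)) -> bool:
    """Return True when the worst-case loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, offsite_delay) <= rpo


# Hypothetical schedule: nightly backup, shipped off-site 12 hours later.
print(meets_rpo(timedelta(hours=24), timedelta(hours=24), timedelta(hours=12)))
```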

DRP Strategies-continue

Disaster Recovery Timeframe

DRP Strategies-continue
• RTO & RPO technological impacts
▪ RTO and RPO parameters are used to evaluate possible
disaster recovery solutions based on two different
dimensions: Platform Recovery and Data Recovery.
• Platform Recovery: influenced by RTO
• Data Recovery: influenced by RPO

Source: Accenture 2013.

Copyright © 2013 Accenture. All rights reserved.

DRP Strategies-continue
• Platform Recovery Strategies
▪ Hot Standby
• Description: Computer hardware that is pre-configured with software and business data so that it is ready to accept the production load as soon as the primary server fails. The fail-over is typically through a stretched cluster or load-balancing.
• Indicative RTO: Minutes
• Comment: This option requires high levels of operational attention because it is a fail-over solution. The age of data is dependent on the data restore method.
• Hardware source/coverage: Dedicated
▪ Warm Standby
• Description: Computer hardware that is pre-configured with software (or uses dynamic provisioning). Once a disaster occurs, business data is restored, the network is switched to the backup site, and the server then accepts the production load.
• Indicative RTO: Hours
• Comment: This option has the resources required to recover the system available, but work is required to make them live.
• Hardware source/coverage: Dedicated
▪ Cold Standby (incl. shared risk)
• Description: Computer hardware that requires the necessary software and data to be built or restored before the system would be in a productive state.
• Indicative RTO: Days
• Comment: This option requires a rebuild of the system to recover at the alternate location.
• Hardware source/coverage: Test / Development / Shared Risk
▪ No DR Standby
• Description: No pre-built hardware for disaster recovery.
• Indicative RTO: Weeks
• Comment: This option should at least include DR procedures.
• Hardware source/coverage: Procure on invocation
Source: Accenture 2013.


DRP Strategies-continue
• Data Recovery Strategies
▪ Synchronous Disk Mirroring
• Description: Synchronous replication from one set of disks to another set of disks at an alternate location (often SAN based).
• Minimum RPO: No transactional data loss
• Comment: High I/O applications limit the distance between primary and alternative data centres.
▪ Asynchronous Disk Mirroring
• Description: Asynchronous replication from one set of disks to another set of disks at an alternate location (often SAN based).
• Minimum RPO: Seconds or minutes
• Comment: Can run over very large distances but does not guarantee replication of transactional data.
▪ Disk Copy (Periodic Snapshot)
• Description: Snapshot data replication technologies ensure point-in-time replication of data from one set of disks to another (often SAN based).
• Minimum RPO: Hours (typically up to 24 hours)
• Comment: Implementation may use synchronous or asynchronous mirroring but does not necessarily preserve write order during copy.
▪ Tape Recovery (Regular Backup)
• Description: Regular backup from disk to tape followed by an off-siting process (either inline or duplication or manual transportation).
• Minimum RPO: 12 hours to many days
• Comment: RPO depends on time from backup to off-siting.

Source: Accenture 2013.
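
The indicative RTO and RPO columns of the platform and data recovery tables can be treated as ordered tiers and used to shortlist strategies. The sketch below is a minimal illustration: the tier labels mirror the tables, while the dictionaries and the `candidates` function are hypothetical helpers, not part of the source.

```python
# Tiers ordered from fastest recovery / least data loss to slowest / most.
RTO_TIERS = ["minutes", "hours", "days", "weeks"]
RPO_TIERS = ["zero", "seconds/minutes", "hours", "12 hours to many days"]

# Indicative tier for each strategy, as listed in the tables.
PLATFORM_STRATEGIES = {
    "Hot Standby": "minutes",
    "Warm Standby": "hours",
    "Cold Standby": "days",
    "No DR Standby": "weeks",
}
DATA_STRATEGIES = {
    "Synchronous Disk Mirroring": "zero",
    "Asynchronous Disk Mirroring": "seconds/minutes",
    "Disk Copy": "hours",
    "Tape Recovery": "12 hours to many days",
}


def candidates(strategies, tiers, required_tier):
    """Return the strategies whose indicative tier is at least as good
    as the tier required by the BIA."""
    limit = tiers.index(required_tier)
    return [name for name, tier in strategies.items()
            if tiers.index(tier) <= limit]


# Hypothetical BIA result: RTO of hours, RPO of seconds/minutes.
print(candidates(PLATFORM_STRATEGIES, RTO_TIERS, "hours"))
print(candidates(DATA_STRATEGIES, RPO_TIERS, "seconds/minutes"))
```

The shortlist only filters on the indicative figures; cost and the applicability of each platform/data combination still have to be judged separately.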


DRP Strategies-continue
• Combining Platform and Data Strategies based on BIA results
▪ Platform Recovery Strategy (rows), with indicative RTO:
• Hot Standby: Minutes
• Warm Standby: Hours
• Cold Standby: Days
• No DR Standby: Weeks
▪ Data Recovery Strategy (columns), with indicative RPO:
• Synchronous Disk Mirroring: Zero
• Asynchronous Disk Mirroring: Secs/Mins
• Disk Copy: Hours
• Tape Recovery: 12 Hours to Many Days
▪ Key: each row/column combination is marked either "Not applicable" or "Option available".

Design & Development Phase
• Preparing to develop the DRP
▪ It is important to have a document management system in
place to:
• track versions of plans.
• work in progress.
• work that is scheduled but not started.

▪ This information needs to be backed up and saved in a
format that does not rely on the underlying systems being
recovered.

▪ For example, banks such as HSBC use secure e-rooms and
external vendors for some of their projects, and they
recommend that the plans be archived on portable media,
with copies kept with people and at various "safe" locations.
Design & Development Phase-continue
• The DRP should address 3 main Functional Areas
▪ Recovery
• Once the infrastructure is in place it will be necessary to
recover production data.
• Minimizing gaps in the data is very important, especially in a
distributed processing environment where one step could be
dependent on the actions of one or more predecessor steps.
• Identify the action to be taken when data
inconsistencies are detected.

▪ Restoring / sustaining business operation
• All processing requirements and service level agreements need
to be defined and documented.
• Dependencies between processes also need to be defined.
• It is important to document the existing process and then build
the plan accordingly.
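
Once dependencies between processes are documented, a valid restoration order can be derived from them. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the process names and the `recovery_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def recovery_order(dependencies):
    """Return the processes in an order where every process comes after
    the processes it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical dependency map: each key depends on the processes in its set.
dependencies = {
    "storage": set(),
    "network": set(),
    "database": {"storage"},
    "order_entry": {"database", "network"},
    "reporting": {"database"},
}

print(recovery_order(dependencies))
```

`TopologicalSorter` raises `graphlib.CycleError` if the documented dependencies are circular, which is itself a useful finding during plan development.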
Design & Development Phase-continue
▪ Transferring Data back to Production Machines
• Production will need to shift from a hot site back to a
permanent location.
• A process needs to be defined to manage this migration.
• Synchronize the machines to a specific point in time.

Design & Development Phase-continue
• The DRP should address 3 main Technical Areas
▪ Hardware Issues
• Machine types, configurations and operating system versions.
• Patch level.
• The recommendation here is to plan to the worst case.
• The key to success is to ensure that the DRP machines have at
least as much capacity as the production machines that they are
replacing.
▪ Networking Issues
• Is any special type of LAN or VPN software required?
• How do the machines communicate with one another?
• Do applications connect to machines using hostnames or
hard-coded IP addresses?
• Are there requirements for connection to an external network?
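
Hard-coded IP addresses are a common reason applications fail to start at a recovery site. The sketch below shows one possible way to flag IPv4 literals in configuration text; the function name and the sample configuration are assumptions, and the check is deliberately simple.

```python
import ipaddress
import re

# Anything shaped like four dot-separated numbers is a candidate.
CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_ipv4(text):
    """Return (line number, address) pairs for valid IPv4 literals in text."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CANDIDATE.findall(line):
            try:
                ipaddress.IPv4Address(match)  # rejects octets above 255
            except ipaddress.AddressValueError:
                continue
            found.append((lineno, match))
    return found


# Hypothetical configuration file contents.
config = "db_host = db01.example.internal\ncache = 10.0.4.17\nversion = 1.2.3.400\n"
print(find_hardcoded_ipv4(config))
```

A four-part version string such as 1.2.3.4 would also be reported, so the results need a manual review.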

Design & Development Phase-continue
• The DRP should address 3 main Technical Areas
▪ Software Issues
• Operating system.
• User written applications.
• Third party software (report writers, GUI products,
backup/recovery products, scheduling software).
• A comprehensive inventory of currently used software.
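
A software inventory is most useful when it can be compared against what is installed at the recovery site. The sketch below assumes both inventories are available as simple name-to-version mappings; the package names, versions and the `inventory_gaps` helper are hypothetical.

```python
def inventory_gaps(production, dr_site):
    """Return {name: (production_version, dr_version)} for every item that
    is missing at the DR site (dr_version is None) or at a different version."""
    gaps = {}
    for name, version in production.items():
        dr_version = dr_site.get(name)
        if dr_version != version:
            gaps[name] = (version, dr_version)
    return gaps


# Hypothetical inventories.
production = {"os": "11.4", "report_writer": "7.2", "scheduler": "3.0"}
dr_site = {"os": "11.4", "report_writer": "7.1"}

print(inventory_gaps(production, dr_site))
```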

Emergency Response (ER)
Operations & Components
• Emergency preparedness must be a living and evolving
process.
• Regular reviews and updates account for changing
tenants, situations and threats.
• Recovery efforts are equally important.
• Getting employees back into buildings safely,
communicating restrictions and bringing in qualified
vendors to make repairs will all need to happen quickly.

Emergency Response (ER)
Operations & Components-continue
• Consider all of this when developing your preparedness
plans.
▪ Planning
• Work through many emergency scenarios. The unexpected,
the unheard of, the “it could never happen here” – all should
be considered in the development of emergency
preparedness plans.
▪ Training
• Both classroom and situational training are important to help
those responsible for executing the plan become
knowledgeable, confident and prepared.
▪ Drills
• Bring those plans to life with physical drills involving all
service providers: security, janitorial, engineering, fire
wardens and tenant representatives. Your security team can
help facilitate these drills.
Emergency Response (ER)
Operations & Components-continue
• Consider all of this when developing your preparedness
plans.
▪ Education
• The integrators who install emergency systems need to
actively participate in educating security and management on
the accurate and efficient use of those systems.
▪ Technology
• Utilize technology to help consistently communicate
emergency plans. Tools such as these can be customized to
meet the specific nuances of your property. For example,
develop a training CD or online module that houses floor
plans, evacuation routes and factors such as fire extinguisher
locations.

Emergency Response (ER)
Operations & Components-continue
• Consider all of this when developing your preparedness
plans.
▪ Coordination
• In multi-tenant buildings, some may have their own
emergency plans. Those tenants should be applauded for
their efforts but everyone needs to coordinate plans to
ensure there are no conflicts and to eliminate confusion
during an emergency.
▪ Communication
• Emergency plans should be communicated to anyone within
your building as well as local authorities. Sharing plans in
advance will help ensure smooth execution in an emergency
situation. Your security team will most likely have a
relationship with local emergency services and can serve as
a liaison.

Develop ER Procedures

• Nothing should be assumed or left to chance.
• Design the procedures so that a semi-experienced person
who may not be familiar with your operations can execute
them.
• Detailed test plans should be developed prior to execution
and should address all critical functional areas of the DRP.
• Data should be gathered during testing and saved for
future review.
• If problems occur, that data may help the team
determine the root cause of the problem
so that it can be corrected.

Develop ER Procedures-continue

• If everything goes right, that data provides the necessary
documentation to support an external validation of
the DRP exercise.
• The key to showing that everything worked is to know what "everything" is,
• and then to be able to demonstrate that the necessary
tasks were completed successfully.

Develop ER Procedures-continue
• Testing and refining the plan
▪ A common problem is that plans are developed
but never tested, or are tested once and forgotten.
▪ A plan that is not continuously refined and validated is
almost worthless.
▪ To maximize the chance of success in the event of a
real disaster, it is essential that the DRP be executed on a
regular basis.
▪ Specific recovery procedures can generally be tested
in-house on a more frequent basis.

Develop ER Procedures-continue

Source: www.dosh.gov.my
Alternate Recovery Site

Alternate Recovery Site-continue
• What is Cloud Computing?
▪ Cloud computing is a model for enabling convenient, on-
demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned
and released with minimal management effort or service
provider interaction.

Source: National Institute of Standards and Technology, USA

Alternate Recovery Site-continue
❑ Cloud DR as a service
▪ Migrating entire IT operations, or only the DR solution, to the cloud,
and replicating or moving data to the cloud, brings
significant cost savings and lower recovery times.
▪ Can shrink and grow in response to demand. ‘Replication
Mode’ requires fewer resources and incurs low cost. When
a business disruption occurs, the system enters ‘Failover
Mode’, which requires more resources that scale smoothly
without requiring large upfront investments.
▪ Cloud computing removes the need for matching hardware at the
primary datacenter and the cloud.
▪ Cloud server start-up can be easily automated and
managed.
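
The difference between Replication Mode and Failover Mode can be pictured as a small state model. The sketch below is purely illustrative: the class, its method names and the instance counts are assumptions, and it does not call any real cloud provider API.

```python
from dataclasses import dataclass


@dataclass
class CloudDrSite:
    standby_instances: int   # hypothetical footprint while only replicating
    failover_instances: int  # hypothetical footprint while serving production
    mode: str = "replication"

    def running_instances(self) -> int:
        """Resources in use depend on the current mode."""
        if self.mode == "failover":
            return self.failover_instances
        return self.standby_instances

    def declare_disaster(self) -> None:
        """Scale up to take the production load."""
        self.mode = "failover"

    def resume_primary(self) -> None:
        """Scale back down once the primary site is restored."""
        self.mode = "replication"


site = CloudDrSite(standby_instances=2, failover_instances=10)
site.declare_disaster()
print(site.mode, site.running_instances())
```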

Alternate Recovery Site-continue

Source: Accenture 2013.

Alternate Recovery Site-continue
• New storage and backup functionality provided by the
cloud provides a way to have a remote copy of data
outside the primary datacenter.

[Figure: Cloud BCDR Approach, showing the Primary Site, Secondary Site, IP WAN and DR Cloud.]

Source: Accenture 2013.
Site Recovery & Resumption
• Disaster Recovery Scenarios
▪ Studies of disaster recovery have identified and proposed
the following disaster scenarios:

Source: Accenture 2013.


Site Recovery & Resumption-continue
• IT DR Stack

Source: Accenture 2013.


Site Recovery & Resumption-continue
Facility Recovery
• Subscription Services
• Hot, warm, cold sites
• Reciprocal Agreements
• Others
▪ Redundant/Mirrored site (partial or full)
▪ Outsourcing
▪ Rolling hot site
▪ Prefabricated building
• Offsite facilities should be no less than 20 km away for low- to
medium-criticality environments. Critical operations should have an
offsite facility 80-300 km away.
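
The distance guideline can be checked from site coordinates. The sketch below uses the standard haversine great-circle formula with a mean Earth radius of 6371 km; the function names are assumptions, and the thresholds are the 20 km and 80-300 km figures given above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def offsite_distance_ok(distance_km, critical):
    """Apply the guideline: at least 20 km normally, 80-300 km if critical."""
    if critical:
        return 80 <= distance_km <= 300
    return distance_km >= 20


# Hypothetical coordinates for a primary and an offsite facility.
d = haversine_km(3.139, 101.687, 4.597, 101.090)
print(round(d), offsite_distance_ok(d, critical=True))
```

Straight-line distance is only a first filter; shared hazards such as the same flood plain or power grid still need to be assessed.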

Site Recovery & Resumption-continue
Hardware Recovery
• Technology recovery depends on good configuration
management documentation.

• May include
▪ PCs/Servers
▪ Network equipment
▪ Supplies
▪ Voice and data communications equipment
• Service Level Agreements can play an essential role in
hardware recovery.

Site Recovery & Resumption-continue
Software Recovery
• BIOS Configuration information
• Operating Systems
• Licensing Information
• Configuration Settings
• Applications
• Plans for what to do in the event that the operating
system/applications are no longer available to be
purchased

Site Recovery & Resumption-continue
Personnel Recovery
• Identify essential personnel
▪ The entire staff is not always needed in recovery
operations
• How to handle personnel if the offsite facility is a great distance
away
• Eliminate single points of failure in staffing and ensure backups
are properly trained
• Payroll needs to be included in personnel recovery

Site Recovery & Resumption-continue
Data Recovery
• Data Recovery options are driven by metrics established in the
BIA
▪ Maximum Tolerable Downtime (MTD)
▪ Recovery Time Objective (RTO)
▪ Recovery Point Objective (RPO)

• Backups
• Database Shadowing
• Remote Journaling
• Electronic Vaulting

Site Recovery & Resumption-continue
Data Recovery
• Database Backups
▪ Disk-shadowing
• Mirroring technology
• Updating one or more copies of data at the same time
• Data saved to two media types for redundancy

[Figure: Database with a Master Data Repository and a Shadow Data Repository.]

Site Recovery & Resumption-continue
Data Recovery
• Electronic Vaulting
▪ Copy of modified file is sent to a remote location where an
original backup is stored
▪ Transfers bulk backup information
▪ Batch process of moving data

• Remote Journaling
▪ Moves the journal or transaction log to a remote location, not
the actual files
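
Electronic vaulting and remote journaling can complement each other: a vaulted bulk backup provides the starting point, and the remotely stored journal is replayed on top of it. The sketch below models this with an in-memory dictionary; the `restore` function and the `(operation, key, value)` journal entry format are assumptions for illustration.

```python
def restore(vaulted_backup, journal):
    """Rebuild the current state from the last vaulted backup plus the
    journal entries recorded since that backup."""
    state = dict(vaulted_backup)  # work on a copy of the bulk backup
    for operation, key, value in journal:
        if operation == "set":
            state[key] = value
        elif operation == "delete":
            state.pop(key, None)
    return state


# Hypothetical backup and journal held at the remote location.
backup = {"acct-1": 100, "acct-2": 250}
journal = [
    ("set", "acct-1", 80),
    ("delete", "acct-2", None),
    ("set", "acct-3", 40),
]

print(restore(backup, journal))
```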

Summary

• Have considered
▪ DRP Strategies
▪ Design & Development Phase
▪ Emergency Response (ER) Operations & Components
▪ Develop ER Procedures
▪ Alternate Recovery Site
▪ Site Recovery & Resumption

Recommended Textbooks and References

Recommended textbooks:
[1] Corey Schou, Steven Hernandez (2014). Information Assurance Handbook:
Effective Computer Security and Risk Management Strategies, ISBN-13:
978-0071821650, McGraw Hill.
[2] Disaster Recovery (2011). EC-Council | Press. ISBN-13: 9781435488700,
Cengage Learning.

Recommended reference:
[1] Kim, Michael G. Solomon (2013). Fundamentals of Information Systems
Security (Jones & Bartlett Learning Information Systems Security &
Assurance), 2nd Edition, ISBN-13: 978-1284031621, Jones & Bartlett
Learning.
