Disaster Recovery and Business Continuity: Dr. Pranita Upadhyaya

This document discusses disaster recovery (DR) and business continuity planning (BCP). It covers the importance of DR and BCP due to risks from natural and man-made disasters. Key aspects covered include business impact assessments, BCP documentation, DR planning, classification of disasters, threats to data centers, how BCP and DR support security/availability, industry standards, benefits of planning, and the role of prevention. It also discusses DR center infrastructure, solutions, planning considerations, and site selection factors.


Disaster Recovery and Business Continuity

Dr. Pranita Upadhyaya


Outline

Disaster Recovery and Business Continuity


– Business continuity planning
– Business impact assessment
– BCP documentation
– Nature of disaster
– Disaster recovery planning

DR and BCP motivation
• The 9/11 terrorist attacks on the World Trade Center (WTC)
• Basel II
– An international business standard
– A series of recommendations on banking laws and regulations
• Booming e-commerce, e-banking, and e-government

Disaster aftermaths
• Most companies that experience a major disaster are no longer in business within 5 years! (The US Bureau of Labor)
• Revenue loss
• Damaged brand image
• Customers leave

• What about the public sector?

How Disasters Affect Businesses
• Direct damage to facilities and equipment
• Transportation infrastructure damage
– Delays deliveries, supplies, customers, and employees going to work
• Communications outages
• Utilities outages
Classification of Disasters

Disasters
• Natural
– Thunderstorms
– Tornadoes
– Lightning
– Earthquakes
– Volcanoes
– Tsunami
– Landslides
– Floods, droughts
– Epidemics
• Man-made, non-intentional
– Acts of people
– Technological system failures
– Hazardous materials
– Environmental
– Nuclear
– Aviation, railways
– Fires, collapse
• Man-made, intentional
– Workplace violence
– Civil disobedience (labor riots, political riots)
– Terrorism
– Weapons of mass destruction

9 major threats to Data Center
• Cooling system down
• Power system down
• Radioactive contamination
• Terror (including cyber terror)
• Telecom network cut off
• Huge human resources vacuum
• Earthquake
• Flood
• Fire

How BCP and DRP Support Security
• BCP (Business Continuity Planning) and DRP (Disaster Recovery Planning)
• Security pillars: C-I-A
– Confidentiality
– Integrity
– Availability
• BCP and DRP directly support availability
BCP and DRP Differences and Similarities
• BCP
– Activities required to ensure the continuation of critical business processes in an organization
– Alternate personnel, equipment, and facilities
– Often includes non-IT aspects of business
• DRP
– Assessment, salvage, repair, and eventual restoration of damaged facilities and systems
– Often focuses on IT systems
Industry Standards Supporting BCP and DRP

• ISO 27001: Requirements for Information Security Management Systems. Section 14 addresses business continuity management.
• ISO 27002: Code of Practice for Information Security Management, which includes business continuity management.
Industry Standards Supporting BCP and DRP (cont.)

• NIST 800-34
– Contingency Planning Guide for Information Technology Systems
– Seven-step process for BCP and DRP projects
– From the U.S. National Institute of Standards and Technology
• NFPA 1600
– Standard on Disaster / Emergency Management and Business Continuity Programs
– From the U.S. National Fire Protection Association
Benefits of BCP and DRP Planning
• Reduced risk
• Process improvements
• Improved organizational maturity
• Improved availability and reliability
• Marketplace advantage
The Role of Prevention
• Not prevention of the disaster itself
– Prevention of surprise and disorganized response
• Reduction in impact of a disaster
– Better equipment bracing
– Better fire detection and suppression
– Contingency plans that provide [near] continuous operation of critical business processes
– Prevention of extended periods of downtime
What is Disaster Recovery?
• DR: the planned process of restoring systems, data, and infrastructure required to support key ongoing business operations
• A DR plan: a proactive measure to minimize a company’s downtime during sudden emergencies
• An unforeseen event: fire, flood, earthquake, etc.

Customer site → Emergency event declared → Personnel mobilized to backup DR site → Company systems run from DR site

Benefits from DR center
• Significantly reduces sales, financial, and customer losses during unforeseen interruptions to business operations

• A successful DR plan gives
– Confidence in knowing the key operations can take place at a second site within a set timeframe, even if your office is affected
– Protection against a single point of failure associated with a single site for operations and business data
– The ability to recover valuable company data
– Fully functional office working areas for your evacuated employees during emergencies

Types of DR sites
• Hot standby
– Ideal for: mission-critical applications, high business impact activities
– Pros: almost instant failover, full data integrity, little to no impact to business operations, guaranteed recovery timeframe
– Cons: long setup process, high cost, higher administrative burden
– Average recovery: 10 seconds ~ 2 minutes
• Warm standby
– Ideal for: mission-critical applications, medium-to-high business impact activities
– Pros: fast failover, little data loss, small-to-medium impact to business operations, guaranteed recovery timeframe
– Cons: long setup process, medium-to-high cost, medium administrative burden
– Average recovery: 10 ~ 45 minutes
• Cold standby
– Ideal for: non-mission-critical applications, low business impact activities
– Pros: low initial cost, guaranteed equipment availability
– Cons: unpredictable recovery time, tedious restoration process, potentially large impact to business operations
– Average recovery: 4 hours ~ 2 days
• Offsite data backup
– Ideal for: non-mission-critical applications, very …
– Pros: flexible, inexpensive, secure
– Cons: very long recovery time, must first configure application environment and then restore …
– Average recovery: 18 hours ~ 8 days

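The comparison of DR site types can be read as a lookup: given a tolerable downtime, pick the least expensive site type whose recovery range still fits. The sketch below is only an illustration. It uses the upper end of each "average recovery" range from the slide, and it assumes a cost ordering (offsite backup cheapest, then cold, warm, hot) that the slide implies but does not state numerically. The names `SiteType` and `cheapest_site_for` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteType:
    name: str
    worst_case_recovery_s: float  # upper end of the slide's recovery range, in seconds


# Assumed ordering from least to most expensive (not given numerically on the slide).
SITE_TYPES = [
    SiteType("Offsite data backup", 8 * 86400),  # 18 hours ~ 8 days
    SiteType("Cold standby", 2 * 86400),         # 4 hours ~ 2 days
    SiteType("Warm standby", 45 * 60),           # 10 ~ 45 minutes
    SiteType("Hot standby", 2 * 60),             # 10 seconds ~ 2 minutes
]


def cheapest_site_for(tolerable_downtime_s: float) -> Optional[str]:
    """Return the first (cheapest) site type that recovers within the limit."""
    for site in SITE_TYPES:
        if site.worst_case_recovery_s <= tolerable_downtime_s:
            return site.name
    return None  # no listed site type recovers fast enough


print(cheapest_site_for(3 * 3600))  # 3 hours of tolerable downtime -> Warm standby
```

A return value of `None` signals that even a hot standby would not meet the requirement, so the requirement or the architecture would need revisiting.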
DR components

• DR center infrastructure
• DR solution implementation
• DR planning

DR – infrastructure construction

Data center design considerations
• Operational reliability
• Quick changes, including additions and rapid expansions
• Online status monitoring
• Life cycle management
• Customer access
• Physical security
• Rapid detection, identification and resolution of faults
• Modern data center infrastructure management (DCIM) solution
– provides data center visualization,
– robust reporting and analytics,
– becomes the central source-of-truth for changes being made in the data center

Considerations for DR site selection

• Geographic accessibility from the main center
• Expandability for future demand
• Network capabilities for interconnections (optical fibers)
• Proximity to public utilities (power supply, emergency services, transport, etc.)
• Security
– Natural hazards like flood, seismic activity, and lightning
– Potential man-made hazards (strikes, fire, pollution, etc.)
• Manageability
• Economic feasibility

Engineering Plan & Space design

Critical Building Systems

Case: DR site selection - distance
• US: 40 miles (64 km, outside the area of influence of the same hurricane)
• Japan: on a different tectonic plate, in a different seismic activity zone
• EU: 5~10 km (against bombing attacks)
• Korea: similar to the situation in the EU, usually 30+ km away

• What about Nepal?

DR site selection - distance

[Figure: manageability and disaster responsiveness plotted against distance; where is the optimum point?]

Site evaluation factors: ASSES
• Availability
– Backup, redundancy
– 24*7 operation
• Security (stability)
– Natural disasters
– Potential man-made disasters
• Survivability
– IT resources
• Efficiency (economics)
– Maintenance
– High-quality equipment
• Scalability
– Physical scalability
– Functional scalability

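One way to apply the ASSES factors when comparing candidate sites is a weighted score. The slide only names the factors; the 0 to 10 rating scale, the weights, and the function name `asses_score` below are assumptions made for illustration, not part of the original material.

```python
# The five ASSES factors named on the slide.
ASSES_FACTORS = ("availability", "security", "survivability", "efficiency", "scalability")


def asses_score(ratings, weights=None):
    """Weighted average of per-factor ratings (assumed scale: 0 to 10).

    `ratings` maps each ASSES factor to a number; `weights` is optional and
    defaults to equal weighting (an assumption, the slide gives no weights).
    """
    if weights is None:
        weights = {factor: 1.0 for factor in ASSES_FACTORS}
    missing = [f for f in ASSES_FACTORS if f not in ratings or f not in weights]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    total_weight = sum(weights[f] for f in ASSES_FACTORS)
    return sum(ratings[f] * weights[f] for f in ASSES_FACTORS) / total_weight


# Hypothetical ratings for one candidate site.
site_a = {"availability": 8, "security": 6, "survivability": 7,
          "efficiency": 5, "scalability": 4}
print(asses_score(site_a))  # equal weights: (8 + 6 + 7 + 5 + 4) / 5
```

Scoring several candidate sites with the same weights gives a simple ranking; the weights themselves would have to come from the organization's own priorities.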
General DR plan
• Primary processing location
• Backup processing location
– Mirrors primary processing location
– Can be used for load balancing
• Remote storage and archival
– Tape vaults
– Storage for data files, SaaS library images
– Allows government operations continuity in the event of major disruption

[Figure: Primary, Backup, and Archive locations]

DR Solution implementation

DRS implementation

Phases: Planning → Analyzing → Proceeding & execution
Steps: Define DR requirements → Business impact & system → DR solution → DRP → Implementation methodology → Implementing DRS

• DR requirements
– RPO
– RTO
– RAO
• Detailed DR targets
• BIA, system analysis
– Business impact
– Data
– Customer contact
• DR solution analysis
– Economics
– Manageability
– Technological
– Reference
• DR solution selection
– H/W solution
– S/W solution
• DR planning
– DR process
– DRP test & update

DR requirements
• Identify the functional areas that MUST be recovered during an emergency
• Define the Recovery Time Objective (RTO)
– “How much downtime (if any) can be tolerated?”
• Define the Recovery Point Objective (RPO)
– “How much data (if any) can you afford to lose?”

In addition,
• Define the Recovery Access Objective (RAO), and
• the Recovery Scope Objective (RSO)

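The two questions behind RTO and RPO can be turned into a quick check of a proposed setup. The sketch below assumes periodic backups, so the worst-case data loss equals the backup interval (a failure just before the next backup). That model, the function name `meets_objectives`, and the sample figures are illustrative assumptions, not part of the slides.

```python
from datetime import timedelta


def meets_objectives(backup_interval, estimated_recovery, rpo, rto):
    """Compare a periodic-backup setup against RPO and RTO targets.

    Assumption: with periodic backups, the most data that can be lost is one
    full backup interval.
    """
    worst_case_data_loss = backup_interval
    return {
        "rpo_met": worst_case_data_loss <= rpo,  # how much data can you afford to lose?
        "rto_met": estimated_recovery <= rto,    # how much downtime can be tolerated?
    }


# Hypothetical figures: nightly backups, 2-hour restore, RPO 4 hours, RTO 3 hours.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    estimated_recovery=timedelta(hours=2),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=3),
)
print(result)  # nightly backups miss a 4-hour RPO even though the RTO is met
```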
Recovery Access Objective (RAO)
– Subcomponent of RTO that measures the time it takes for the network to re-establish connectivity of users, customers, and partners with the applications at the alternate site once the primary site has been disrupted
– It identifies the point in time at which the users that were connected to applications and services running on one data center have access to the same applications and services running at an alternate data center.

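RPO/RTO vs. cost

[Figure: timeline around a disaster]
• time t0: critical data is recovered (recovery point)
• time t1: disaster strikes
• time t2: systems recovered and operational
• Recovery point (t0 to t1): how current or fresh is the data after recovery?
– Scale, moving toward the disaster: days, hours, mins, secs
– Solutions in the same order: tape backup, periodic replication, asynchronous replication, synchronous replication
– Cost increases as the recovery point gets closer to the disaster
• Recovery time (t1 to t2): how quickly can systems and data be recovered?
– Scale, moving away from the disaster: secs, mins, hours, days, weeks
– Solutions in the same order: extended cluster, manual migration, tape restore
– Cost increases as the recovery time gets shorter
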
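The timeline above can be written as two differences. This is a small worked example with made-up clock times; the symbols $t_0$, $t_1$, $t_2$ come from the figure, and the comparison against RPO and RTO is the usual reading of those targets rather than something the slide spells out.

```latex
\begin{align*}
\text{achieved recovery point} &= t_1 - t_0, &
\text{achieved recovery time} &= t_2 - t_1, \\
\text{objectives met} &\iff t_1 - t_0 \le \mathrm{RPO}
  \ \text{and}\ t_2 - t_1 \le \mathrm{RTO}. \\[4pt]
\text{Example: } t_0 = 02{:}00,\ t_1 = 14{:}30,\ t_2 = 17{:}00
  &\implies t_1 - t_0 = 12.5\ \text{h}, &
  t_2 - t_1 &= 2.5\ \text{h}.
\end{align*}
```

With an RPO of 4 hours and an RTO of 3 hours, this hypothetical recovery would meet the RTO but miss the RPO.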
DR solutions
• System mirroring (S/W type)
– OS level: HAGEO and GEORM (IBM unix), VVR, Veritas Volume Replicator (HP, SUN unix); DB/file: DBMS, file system
– DBMS level: RRDF (DB2, ORACLE), Symmetric Replication (ORACLE), SharePlex (ORACLE); DB/file: DBMS
• Disk mirroring (H/W type)
– SRDF (EMC), HRC (HITACHI), XRC (IBM); DB/file: all file systems

• HAGEO: High Availability Geographic Cluster
• GeoRM: Geographic Remote Mirroring
• RRDF: Remote Recovery Data Facility
• SRDF: Symmetrix Remote Data Facility
• HRC: Hitachi Remote Copy
• XRC: eXtended Remote Copy

DR solution selection

[Figure: cost (low to high) versus time (minutes, hours, days)]
• Solutions, from highest cost and shortest time to lowest cost and longest time
– Mirroring (copy database)
– Real-time data replication (copy data and database objects)
– Log journaling
– Periodic data replication
– Offsite archive
– Backup tape
• Toward minutes: increasing CAPEX
– DR solution/equipment
– Real-time data replication
– N/W implementation needed
• Toward days: increasing OPEX
– Backup data
– Data consistency

DR solution selection

[Figure: data loss versus recovery time]
• Recovery time bands
– 0~1 hour: continuous availability
– 1~6 hours: high availability
– 6~24 hours: improved availability
– 24~48 hours: traditional availability
• Loss levels: no loss, little loss, loss after backup
• Solutions plotted: SRDF, GDPS/PPRC, PPRC, GDPS/XRC, XRC, electronic journaling, RR/400, IRC, remote tape, remote DASD, SOS
• Legend
– IRC: intermittent remote copy
– SOS: standby operating system
– PPRC: peer-to-peer remote copy
– XRC: extended remote copy
– Electronic journaling: dual transaction logging

Business Continuity Planning

Creating a BCP
• An ongoing process, not a project with a beginning and an end
– Creating, testing, maintaining, and updating
– “Critical” business functions may evolve
• The BCP team must include both business and IT personnel
• Requires the support of senior management

BCP phases
1. Project management & initiation
2. Business Impact Analysis (BIA)
3. Recovery strategies
4. Plan design & development
5. Testing, maintenance, awareness, training
I - Project management & initiation
• Establish need (risk analysis)
• Get management support
• Establish team (functional, technical, BCC – Business Continuity Coordinator)
• Create work plan (scope, goals, methods, timeline)
• Initial report to management
• Obtain management approval to proceed
II - Business Impact Analysis (BIA)
• Goal: obtain formal agreement with senior management on the MTD for each time-critical business resource
– MTD: maximum tolerable downtime, also known as MAO (Maximum Allowable Outage)
• Quantifies loss due to business outage (financial, extra cost of recovery, embarrassment)
• Does not estimate the probability of kinds of incidents, only quantifies the consequences
II - BIA phases
• Choose information gathering methods (surveys, interviews, software tools)
• Select interviewees
• Customize questionnaire
• Analyze information
• Identify time-critical business functions
• Assign MTDs
• Rank critical business functions by MTDs
• Report recovery options
• Obtain management approval
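The "assign MTDs" and "rank critical business functions by MTDs" steps amount to a sort: the shorter the maximum tolerable downtime, the higher the recovery priority. A minimal sketch follows; the business functions and MTD values are invented for illustration, and `rank_by_mtd` is a hypothetical helper name.

```python
from datetime import timedelta


def rank_by_mtd(mtds):
    """Return business function names ordered from shortest to longest MTD."""
    return sorted(mtds, key=lambda name: mtds[name])


# Hypothetical MTDs agreed with senior management during the BIA.
mtds = {
    "Payroll": timedelta(days=3),
    "Online payments": timedelta(hours=1),
    "Customer support desk": timedelta(hours=8),
    "Internal reporting": timedelta(days=7),
}

for rank, name in enumerate(rank_by_mtd(mtds), start=1):
    print(rank, name, mtds[name])
```

Functions with equal MTDs keep their original relative order, since Python's `sorted` is stable.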
III – Recovery strategies
• Recovery strategies are based on MTDs
– Predefined
– Management-approved
• Different technical strategies
– Different costs and benefits
• How to choose?
– Careful cost-benefit analysis
– Driven by business requirements
• Strategies should address recovery of:
– Business operations
– Facilities & supplies
– Users (workers and end-users)
– Network, data center, telecommunications (technical)
– Data (off-site backups of data and applications)
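A cost-benefit comparison driven by MTDs could be sketched as follows: discard strategies that cannot recover within the MTD, then pick the one with the lowest combined cost of the strategy plus the outage loss it leaves. The cost model (strategy cost plus loss per hour times recovery hours, for a single outage), the strategy names, and all figures are assumptions for illustration only.

```python
def choose_strategy(strategies, mtd_hours, loss_per_hour):
    """Pick the feasible strategy with the lowest total cost for one outage.

    Feasible means the recovery time does not exceed the MTD. Total cost is
    modelled (as an assumption) as strategy cost + loss_per_hour * recovery_hours.
    Returns the strategy name, or None if nothing meets the MTD.
    """
    feasible = [s for s in strategies if s["recovery_hours"] <= mtd_hours]
    if not feasible:
        return None
    best = min(feasible, key=lambda s: s["cost"] + loss_per_hour * s["recovery_hours"])
    return best["name"]


# Hypothetical strategies and costs.
strategies = [
    {"name": "hot site", "cost": 500_000, "recovery_hours": 0.05},
    {"name": "warm site", "cost": 200_000, "recovery_hours": 0.75},
    {"name": "cold site", "cost": 50_000, "recovery_hours": 24},
]

print(choose_strategy(strategies, mtd_hours=4, loss_per_hour=100_000))
```

With a 4-hour MTD the cold site is excluded outright, and the remaining choice depends on how expensive each hour of outage is.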
IV – BCP development / implementation
• Detailed plan for recovery
– Business & service recovery plans
– Maintenance
– Awareness & training
– Testing
• Sample plan phases
– Initial disaster response
– Resume critical business operations
– Resume non-critical business operations
– Restoration (return to primary site)
– Interacting with external groups (customers, media, emergency responders)
V – BCP final phase
• Testing
– Until it’s tested, you don’t have a plan
– Testing types: structured walk-through, checklist, simulation, parallel, full interruption
• Maintenance
– Fix problems found in testing
– Implement change management
– Audit and address audit findings
• Awareness / Training
– BCP team is probably the DR team
– BCP training must be ongoing, part of corporate culture
DR planning

Disaster recovery plan
• DRP
– is a subset of BCP (business continuity planning), and
– should include planning for resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure.

Body of DR plan

• Emergency information sheet
– Immediate steps to be taken
– Individuals to be contacted
• Introduction to the plan
– Its purpose, author, organization, scheduled updates
• Communication plan
• Pre-disaster actions
• Instructions for response and recovery
– Step by step, what to do afterwards

Case: DR plan

[Figure: timeline of actions at the main center and the DR center; RTO: 3 hours]
• Main center
– Identify disaster & declare emergency response
– Spread out & redeploy
• DR center
– Identify emergency & make DRS ready
– System recovery: activate system, restore data, recover N/W
– DB & business recovery: recover DB & task, check consistency
– Start DRS
– Resume business

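A plan like this can be sanity-checked by adding up the expected duration of each DR-center step and comparing the total with the 3-hour RTO from the slide. The step durations below are invented, and the sketch assumes the steps run strictly one after another, which the original figure does not confirm.

```python
from datetime import timedelta

RTO = timedelta(hours=3)  # target stated on the slide

# Hypothetical step durations; assumed to run sequentially.
STEPS = [
    ("Identify emergency & make DRS ready", timedelta(minutes=20)),
    ("Activate system", timedelta(minutes=30)),
    ("Restore data", timedelta(minutes=45)),
    ("Recover N/W", timedelta(minutes=15)),
    ("Recover DB & task", timedelta(minutes=30)),
    ("Check consistency", timedelta(minutes=10)),
    ("Start DRS", timedelta(minutes=10)),
]


def total_recovery_time(steps):
    """Sum the step durations, assuming no overlap between steps."""
    return sum((duration for _, duration in steps), timedelta())


def slack(steps, rto):
    """Time left before the RTO is exceeded (negative means the RTO is missed)."""
    return rto - total_recovery_time(steps)


print(total_recovery_time(STEPS), slack(STEPS, RTO))
```

A negative slack in a DRP test would point to steps that need to be shortened or run in parallel.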
