Module 09
INTRODUCTION TO
BUSINESS CONTINUITY
EMC Proven Professional. Copyright © 2012 EMC Corporation. All Rights Reserved. Module 9: Introduction to Business Continuity 1
Module 9: Introduction to Business Continuity
Why Business Continuity (BC)?
• Information is an organization’s most important asset
• Continuous access to information ensures smooth functioning of
business operations
• Cost of unavailability of information to an organization is greater
than ever
Threats to information availability
• Natural disasters: flood, fire, earthquake
• Unplanned occurrences: cybercrime, human error, network and computer failure
• Planned occurrences: upgrades, backup, restore; result in the inaccessibility of information
What is Business Continuity?
Business Continuity
Information Availability
Causes of Information Unavailability
Disaster (<1% of Occurrences)
Natural or man-made disaster
• Flood
• Fire
• Earthquake
• Contamination
Impact of Downtime
• Know the downtime costs (per hour, day, two days, and so on).
Lost Productivity
• Number of employees impacted x hours out x hourly rate
Lost Revenue
• Direct loss
• Compensatory payments
• Lost future revenue
• Billing losses
• Investment losses
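The lost-productivity formula (employees impacted x hours out x hourly rate) can be sketched in a few lines. The function name and all figures below are hypothetical and chosen only to illustrate the arithmetic for outages of one hour, one day, and two days.

```python
def lost_productivity(employees_impacted: int, hours_out: float, hourly_rate: float) -> float:
    """Lost productivity = employees impacted x hours out x hourly rate."""
    return employees_impacted * hours_out * hourly_rate


# Hypothetical figures: 200 employees at an hourly rate of 50.
for hours_out in (1, 24, 48):
    cost = lost_productivity(employees_impacted=200, hours_out=hours_out, hourly_rate=50.0)
    print(f"{hours_out:>2} hours out: {cost:,.2f}")
```

Lost revenue would be added on top of this figure; it is left out here because its components (direct loss, compensatory payments, and so on) have no single formula.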
Impact of Downtime (contd.)
Measuring Information Availability
• Information availability relies on the availability of both physical
and virtual components of a data center.
• Failure of these components might disrupt information availability.
• A failure is the termination of a component’s ability to perform
a required function.
• The component’s ability can be restored by performing an external
corrective action, such as a manual reboot, a repair, or
replacement of the failed component(s).
• Proactive risk analysis, performed as part of the BC planning
process, considers the component failure rate and average repair
time, which are measured by MTBF (Mean Time Between Failure)
and MTTR (Mean Time To Repair).
Measuring Information Availability (contd.)
• MTBF = Mean Time Between Failure
• MTTR = Mean Time To Repair (time to repair, or ‘downtime’)
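One common way to combine the two metrics into a single availability figure is the ratio MTBF / (MTBF + MTTR). That formula is an assumption here (it is not stated in the text above), and the function name and sample values are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is available, assuming
    availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component: fails on average every 999 hours, takes 1 hour to repair.
ia = availability(mtbf_hours=999.0, mttr_hours=1.0)
print(f"Availability: {ia:.3%}")
```

Under this formula, shortening the repair time raises availability just as lengthening the time between failures does.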
Availability Measurement – Levels of ‘9s’ Availability
• Uptime per year is based on the exact timeliness requirements of the service.
• This calculation leads to the number of “9s” representation for availability
metrics.
• The table lists the approximate amount of downtime allowed for a service to
achieve certain levels of 9s availability.
• For example, a service that is said to be “five 9s available” is available for
99.999 percent of the scheduled time in a year (24 × 365 hours).
Uptime (%)   Downtime (%)   Downtime per Year     Downtime per Week
98           2              7.3 days              3 hrs, 22 minutes
99           1              3.65 days             1 hr, 41 minutes
99.8         0.2            17 hrs, 31 minutes    20 minutes, 10 secs
99.9         0.1            8 hrs, 45 minutes     10 minutes, 5 secs
99.99        0.01           52.5 minutes          1 minute
99.999       0.001          5.25 minutes          6 secs
99.9999      0.0001         31.5 secs             0.6 secs
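The downtime columns follow directly from the uptime percentage. As a minimal sketch, assuming a 24 × 365 hour year and a 24 × 7 hour week, the allowed downtime is the period length multiplied by the downtime fraction. The function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365
HOURS_PER_WEEK = 24 * 7


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at the given uptime percentage."""
    return period_hours * (100.0 - uptime_percent) / 100.0


for uptime in (98.0, 99.0, 99.9, 99.999):
    per_year = allowed_downtime_hours(uptime, HOURS_PER_YEAR)
    per_week = allowed_downtime_hours(uptime, HOURS_PER_WEEK)
    print(f"{uptime}% uptime: {per_year:.3f} hours/year, {per_week * 60:.2f} minutes/week")
```

The results match the table up to rounding; for example, 99 percent uptime allows 87.6 hours per year, which is 3.65 days.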
BC Terminologies – 1
BC Terminologies – 2
Recovery-Point Objective (RPO)
• Point-in-time to which systems and data must be recovered after an outage
• Amount of data loss that a business can endure
• Based on the RPO, organizations plan for the frequency with which a backup or replica must be made
  - RPO of 24 hours: Backups are created at an offsite tape library every midnight. Recovery strategy: restore data from the set of last backup tapes.
  - RPO of 6 hours: Backups must be made at least once every 6 hours.
  - RPO of 1 hour: Backup to the remote site every hour. Recovery strategy: recover the database to the point of the last log shipment.
  - RPO in the order of minutes: Mirroring data asynchronously to a remote site.
[Figure: recovery-point objective scale from weeks down to seconds (weeks, days, hours, minutes, seconds), with tape backup, periodic replication, asynchronous replication, and synchronous replication placed along it in that order]
Recovery-Time Objective (RTO)
[Figure: recovery-time objective scale; surviving labels: Global Cluster, Seconds]
• Cold site: a site where operations can be moved in the event of disaster, with minimum IT infrastructure in place, but not activated
• Hot site: a site where operations can be moved in the event of disaster. All equipment is available and running at all times
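The link between RPO and backup frequency can be sketched as a simple check. It assumes, as a simplification, that the worst-case data loss equals the interval between two consecutive backups or replicas, and it ignores the time a copy takes to complete. The function name is hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if copies are made often enough that the worst-case data loss
    (assumed equal to the backup interval) stays within the RPO."""
    return backup_interval <= rpo


nightly = timedelta(hours=24)
print(meets_rpo(nightly, rpo=timedelta(hours=24)))  # nightly backup against a 24-hour RPO
print(meets_rpo(nightly, rpo=timedelta(hours=6)))   # nightly backup against a 6-hour RPO
```

A nightly backup satisfies an RPO of 24 hours but not one of 6 hours, which is why tighter RPOs push organizations toward replication rather than tape backup.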
BC Planning Lifecycle
Establishing Objectives
• Determine BC requirements.
• Estimate the scope and budget to achieve requirements.
• Select a BC team that includes subject matter experts from all areas of the business, whether internal or external.
• Create BC policies.
Analyzing
• Collect information on data profiles, business processes, and infrastructure support.
• Conduct a business impact analysis.
• Identify critical business processes and assign recovery priorities.
• Perform risk analysis for critical functions and create mitigation strategies.
• Perform cost benefit analysis.
• Evaluate options.
Designing and Developing
• Define the team structure and assign individual roles and responsibilities.
• Design data protection strategies and develop infrastructure.
• Develop contingency solution.
Implementing
• Implement risk management and mitigation procedures.
• Prepare the DR sites that can be utilized if a disaster affects the primary data center.
• Implement redundancy for every resource in a data center to avoid single points of failure.
Training, Testing, Assessing, and Maintaining
• Train the employees who are responsible for backup and replication.
• Train employees on emergency response procedures.
• Perform damage-assessment processes and review recovery plans.
• Test the BC plan regularly to evaluate its performance and identify its limitations.
Failure Analysis
• Involves analyzing both physical and virtual infrastructure
components to identify systems that are susceptible to a single
point of failure, and implementing fault-tolerance mechanisms.
• Topics covered:
  - Single Points of Failure
  - Resolving Single Points of Failure
  - Multipathing Software
Failure Analysis: (1) Single Points of Failure
[Figure: single points of failure in a data center; surviving label: Array port]
Failure Analysis: (2) Resolving Single Points of Failure / Fault tolerant
[Figure: fault-tolerant configuration showing a client, clustered servers with NIC teaming and redundant HBAs, a redundant IP network, redundant FC switches, redundant paths, redundant ports, and redundant arrays (production and remote storage arrays)]
• To mitigate a single point of failure, systems are designed with
redundancy, such that the system will fail only if all the
components in the redundancy group fail.
• This ensures that the failure of a single component does not
affect data availability.
• Careful analysis is performed to eliminate every single point of
failure.
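The effect of a redundancy group can be illustrated numerically. The sketch below assumes that components fail independently of each other, which is a simplification (a shared power feed or a site-wide disaster would break it). Under that assumption the group is unavailable only when every member is unavailable at the same time. The function name and availability values are hypothetical.

```python
from math import prod


def redundancy_group_availability(component_availabilities):
    """Availability of a group that fails only if ALL of its components fail,
    assuming the components fail independently."""
    # Probability that every component is down at once.
    all_down = prod(1.0 - a for a in component_availabilities)
    return 1.0 - all_down


single_hba = 0.99  # hypothetical availability of one HBA
dual_hba = redundancy_group_availability([single_hba, single_hba])
print(f"one HBA: {single_hba:.4f}, two redundant HBAs: {dual_hba:.4f}")
```

With these numbers, adding a second component reduces the group's unavailability from 1 percent to roughly 0.01 percent.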
Failure Analysis: (2) Resolving Single Points of Failure / Fault tolerant
• Based on the figure, implementation to resolve single points of
failure includes:
  - Configuration of multiple HBAs to mitigate single HBA failure.
  - Configuration of multiple fabrics to account for a switch failure.
  - Configuration of multiple storage array ports to enhance the storage array’s availability.
  - RAID configuration to ensure continuous operation in the event of disk failure.
  - Implementing a storage array at a remote site to mitigate local site failure.
  - Implementing server (host) clustering, a fault-tolerance mechanism whereby two or more servers in a cluster access the same set of volumes. Clustered servers exchange heartbeats to inform each other about their health. If one of the servers fails, the other server takes up the complete workload.
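The heartbeat-and-takeover behavior of clustered servers can be sketched as follows. This is a toy model rather than any real clustering product: the `ClusterNode` class, the `failover` function, the timeout value, and the integer "workload" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterNode:
    name: str
    last_heartbeat: float  # time of the last heartbeat received, in seconds
    workload: int = 0      # abstract units of work currently hosted


def failover(nodes, now, timeout):
    """Treat nodes whose last heartbeat is older than `timeout` as failed and
    move their workload to the first node that is still alive."""
    alive = [n for n in nodes if now - n.last_heartbeat <= timeout]
    failed = [n for n in nodes if now - n.last_heartbeat > timeout]
    if not alive:
        raise RuntimeError("no surviving node to take over the workload")
    survivor = alive[0]
    for node in failed:
        survivor.workload += node.workload
        node.workload = 0
    return survivor


server_a = ClusterNode("server-a", last_heartbeat=100.0, workload=10)
server_b = ClusterNode("server-b", last_heartbeat=91.0, workload=7)  # heartbeat is stale
survivor = failover([server_a, server_b], now=101.0, timeout=5.0)
print(survivor.name, survivor.workload)
```

Here server-b has been silent for 10 seconds against a 5-second timeout, so server-a takes up the complete workload.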
Failure Analysis: (3) Multipathing Software
• Configuration of multiple paths increases the data availability
through path failover
• Multipathing software provides the functionality to recognize
and utilize alternative I/O paths to data
• Provides load balancing by distributing I/Os to all available, active paths
• Intelligently manages the paths to a device by sending I/O down the optimal path
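Path failover and load balancing can be sketched with a small model. It uses round-robin as the load-balancing policy, which is only one possible choice and an assumption here. The `MultipathDevice` class and the path names are hypothetical and do not represent any vendor's implementation.

```python
from itertools import cycle


class MultipathDevice:
    """Toy model of a device reachable over several I/O paths."""

    def __init__(self, paths):
        self.active = {path: True for path in paths}  # path -> is it usable?
        self._round_robin = cycle(paths)

    def mark_failed(self, path):
        self.active[path] = False

    def next_path(self):
        """Return the next active path in round-robin order, skipping failed ones."""
        for _ in range(len(self.active)):
            path = next(self._round_robin)
            if self.active[path]:
                return path
        raise RuntimeError("all paths to the device have failed")


device = MultipathDevice(["hba0:port0", "hba1:port1"])
first, second = device.next_path(), device.next_path()  # I/O alternates across paths
device.mark_failed("hba0:port0")                        # simulate a path failure
after_failure = [device.next_path() for _ in range(3)]  # I/O fails over to the survivor
print(first, second, after_failure)
```

While both paths are active, I/O is spread across them; once one path fails, all I/O continues on the remaining path without interruption to the caller.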
Business Impact Analysis
• Identifies which business units and processes are essential to
the survival of the business
• BIA includes the following set of tasks:
  - Determine the business areas
  - Identify the key business processes critical to each area's operation
  - Determine the attributes of each business process: applications, databases, hardware, and software
  - Estimate the cost of failure for each business process
  - Calculate the maximum tolerable outage and define the RTO for each business process
• Based on the BIA, businesses can prioritize and implement countermeasures to
mitigate the likelihood of disruptions
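One way to turn BIA output into recovery priorities is to rank business processes by their estimated cost of failure. The sketch below is an assumption about how such a ranking might be done, not a prescribed BIA method. The `BusinessProcess` fields, the tie-breaking rule, and all sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    cost_of_failure_per_hour: float    # estimated cost of failure
    max_tolerable_outage_hours: float  # basis for the RTO


def recovery_priorities(processes):
    """Rank processes so the most costly to lose comes first; among equally
    costly processes, the one with the shortest tolerable outage comes first."""
    return sorted(
        processes,
        key=lambda p: (-p.cost_of_failure_per_hour, p.max_tolerable_outage_hours),
    )


processes = [
    BusinessProcess("payroll", cost_of_failure_per_hour=2_000.0, max_tolerable_outage_hours=48.0),
    BusinessProcess("order entry", cost_of_failure_per_hour=50_000.0, max_tolerable_outage_hours=1.0),
    BusinessProcess("email", cost_of_failure_per_hour=2_000.0, max_tolerable_outage_hours=8.0),
]
ranked = [p.name for p in recovery_priorities(processes)]
print(ranked)
```

The resulting order gives a starting point for assigning countermeasures and RTOs; a real analysis would also weigh dependencies between processes.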
BC Technology Solutions
• After analyzing the business impact of an outage, designing the
appropriate solutions to recover from a failure is the next
important activity
• Solutions that enable BC are:
  - Fault-tolerant configuration, done by implementing redundancies
      Resolving single points of failure
      Multipathing software
  - Backup and replication
      Backup and recovery
      Local replication
      Remote replication
Backup and Replication
Note: Backup and Replication will be discussed in forthcoming modules.
Concept in Practice
• EMC PowerPath
EMC PowerPath
• Multipathing software
• Load-balancing functionality
• Automatic detection and recovery from host-to-array path failures
[Figure: host running a host application over four HBA drivers, connected to a storage network]
Module 9: Summary
• Importance of business continuity
• Impact of information unavailability
• Information availability metrics
• Business impact analysis
• Single points of failure
• Multipathing software