
Module 09

Uploaded by

Ons Hanafi
MODULE – 9

INTRODUCTION TO
BUSINESS CONTINUITY

EMC Proven Professional. Copyright © 2012 EMC Corporation. All Rights Reserved. Module 9: Introduction to Business Continuity 1
Module 9: Introduction to Business Continuity

Upon completion of this module, you should be able to:


• Define business continuity (BC) and information availability (IA)
• Explain the impact of information unavailability
• Describe BC planning process
• Explain business impact analysis (BIA)
• Explain BC technology solutions


Lesson 1: Business Continuity Overview


During this lesson the following topics are covered:
• Business continuity
• Information availability metrics

Why Business Continuity (BC)?
• Information is an organization’s most important asset
• Continuous access to information ensures smooth functioning of
business operations
• Cost of unavailability of information to an organization is greater
than ever
Threats to information availability
• Natural disasters: flood, fire, earthquake
• Unplanned occurrences: cybercrime, human error, network and computer failure
• Planned occurrences: upgrades, backup, and restore, which result in the inaccessibility of information

What is Business Continuity?
Business Continuity

It is a process that prepares for, responds to, and recovers from a system outage that can adversely affect business operations.

• An integrated and enterprise-wide process that includes a set of activities to ensure “information availability”.
• BC involves proactive measures (data protection and security) and reactive countermeasures (disaster recovery and restart) to be invoked in the event of a failure.
• In a virtualized environment, BC solutions need to protect both physical and virtualized resources.
• The goal of a BC solution is to ensure the “information availability” required to conduct vital business operations.

Information Availability

It is the ability of an IT infrastructure to function according to business expectations during its specified time of operation.

• Information availability can be defined with the help of:
  ◦ Accessibility: Information should be accessible to the right user when required
  ◦ Reliability: Information should be reliable and correct in all aspects
  ◦ Timeliness: Defines the time window during which information must be accessible

Causes of Information Unavailability
• Various planned and unplanned incidents result in data unavailability

Disaster (<1% of occurrences)
Natural or man-made disaster
• Flood
• Fire
• Earthquake
• Contamination

Unplanned Outages (20%)
Failure
• Database corruption
• Component (physical and/or virtual) failure
• Human error

Planned Outages (80%): expected and scheduled, but still cause data to be unavailable
Competing workloads
• Backup, reporting
• Data warehouse extracts
• Application and data restore

Impact of Downtime
Know the downtime costs (per hour, day, two days, and so on).

Lost Productivity
• Number of employees impacted x hours out x hourly rate

Lost Revenue
• Direct loss
• Compensatory payments
• Lost future revenue
• Billing losses
• Investment losses

Damaged Reputation
• Customers
• Suppliers
• Financial markets
• Banks
• Business partners

Financial Performance
• Revenue recognition
• Cash flow
• Lost discounts (A/P)
• Payment guarantees
• Credit rating
• Stock price

Other Expenses
• Temporary employees, equipment rental, overtime costs, extra shipping costs, travel expenses, and so on.

Impact of Downtime (contd.)

• Average cost of downtime per hour = average productivity loss per hour + average revenue loss per hour
• Average productivity loss per hour = (total salaries and benefits of all employees per week) / (average number of working hours per week)
• Average revenue loss per hour = (total revenue of an organization per week) / (average number of hours per week that an organization is open for business)

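The three downtime-cost formulas can be sketched in a few lines of Python. This is only an illustration: the function names and the weekly figures in the example are hypothetical and are not taken from the module.

```python
def average_productivity_loss_per_hour(weekly_salaries_and_benefits, working_hours_per_week):
    """Total salaries and benefits per week / average working hours per week."""
    return weekly_salaries_and_benefits / working_hours_per_week


def average_revenue_loss_per_hour(weekly_revenue, open_hours_per_week):
    """Total revenue per week / average hours per week open for business."""
    return weekly_revenue / open_hours_per_week


def average_downtime_cost_per_hour(weekly_salaries_and_benefits, working_hours_per_week,
                                   weekly_revenue, open_hours_per_week):
    """Sum of the productivity loss and the revenue loss per hour."""
    return (average_productivity_loss_per_hour(weekly_salaries_and_benefits, working_hours_per_week)
            + average_revenue_loss_per_hour(weekly_revenue, open_hours_per_week))


# Hypothetical organization: 200,000 in weekly salaries over a 40-hour week,
# 840,000 in weekly revenue while open 24 x 7 (168 hours per week).
downtime_cost = average_downtime_cost_per_hour(200_000, 40, 840_000, 168)
print(downtime_cost)  # 10000.0
```

Multiplying the hourly figure by the length of an outage gives a first estimate of its cost per hour, day, or two days.
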
Measuring Information Availability
• Information availability relies on the availability of both physical
and virtual components of a data center.
• Failure of these components might disrupt information availability.
• A failure is the termination of a component’s ability to perform
a required function.
• The component’s ability can be restored by performing external corrective actions, such as a manual reboot, a repair, or replacement of the failed component(s).
• Proactive risk analysis, performed as part of the BC planning process, considers the component failure rate and average repair time, which are measured by MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair).

Measuring Information Availability (contd.)
[Figure: incident timeline with the labels Incident, Detection, Diagnosis, Repair, Recovery, and Restoration; Response Time and Recovery Time; detection elapsed time and repair time; time to repair or ‘downtime’; time between failures or ‘uptime’.]

• MTBF (Mean Time Between Failures): Average time available for a system or component to perform its normal operations between failures
  MTBF = Total uptime / Number of failures
• MTTR (Mean Time To Repair): Average time required to repair a failed component
  MTTR = Total downtime / Number of failures

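As a small worked example of the two ratios, the sketch below computes MTBF and MTTR for one observation period. The function names and the sample numbers (4 failures, 996 hours up, 4 hours down) are assumptions chosen for illustration.

```python
def mtbf(total_uptime_hours, number_of_failures):
    """MTBF = total uptime / number of failures."""
    if number_of_failures <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return total_uptime_hours / number_of_failures


def mttr(total_downtime_hours, number_of_failures):
    """MTTR = total downtime / number of failures."""
    if number_of_failures <= 0:
        raise ValueError("at least one failure is needed to compute MTTR")
    return total_downtime_hours / number_of_failures


# Hypothetical component: 4 failures in a period with 996 hours of uptime
# and 4 hours of downtime.
example_mtbf = mtbf(996, 4)
example_mttr = mttr(4, 4)
print(example_mtbf, example_mttr)  # 249.0 1.0
```
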
Measuring Information Availability (contd.)

• IA can be expressed in terms of system uptime and downtime and measured as the amount or percentage of system uptime:
  IA = MTBF / (MTBF + MTTR) or IA = uptime / (uptime + downtime)
• System uptime is the period of time during which the system is in an accessible state
• System downtime is the period of time during which the system is not in an accessible state

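The IA ratio is easy to check numerically. The sketch below uses a hypothetical helper name and hypothetical inputs. Because MTBF and MTTR are both totals divided by the same number of failures, the two forms of the formula give the same value.

```python
def information_availability(up, down):
    """IA = up / (up + down); works with MTBF/MTTR or with total uptime/downtime."""
    return up / (up + down)


# Hypothetical component with MTBF = 249 hours and MTTR = 1 hour.
ia_from_means = information_availability(249, 1)

# Same component expressed as totals: 996 hours up, 4 hours down (4 failures).
ia_from_totals = information_availability(996, 4)

print(f"{ia_from_means:.1%}")  # 99.6%
```
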
Availability Measurement – Levels of ‘9s’ Availability
• Uptime per year is based on the exact timeliness requirements of the service.
• This calculation leads to the number of “9s” representation for availability metrics.
• The table lists the approximate amount of downtime allowed for a service to achieve certain levels of 9s availability.
• For example, a service that is said to be “five 9s available” is available for 99.999 percent of the scheduled time in a year (24 × 365).

Uptime (%)   Downtime (%)   Downtime per Year     Downtime per Week
98           2              7.3 days              3 hrs, 22 minutes
99           1              3.65 days             1 hr, 41 minutes
99.8         0.2            17 hrs, 31 minutes    20 minutes, 10 secs
99.9         0.1            8 hrs, 45 minutes     10 minutes, 5 secs
99.99        0.01           52.5 minutes          1 minute
99.999       0.001          5.25 minutes          6 secs
99.9999      0.0001         31.5 secs             0.6 secs

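The downtime figures in the table follow directly from the uptime percentage. The sketch below, with hypothetical function names, assumes a service scheduled 24 × 365 hours per year and 24 × 7 hours per week, as in the example above.

```python
HOURS_PER_YEAR = 24 * 365  # scheduled time in a year, as in the slide
HOURS_PER_WEEK = 24 * 7


def allowed_downtime_hours(uptime_percent, scheduled_hours):
    """Downtime allowed in the scheduled period for a given uptime percentage."""
    return scheduled_hours * (100 - uptime_percent) / 100


for uptime in (98, 99, 99.9, 99.999):
    per_year = allowed_downtime_hours(uptime, HOURS_PER_YEAR)
    per_week = allowed_downtime_hours(uptime, HOURS_PER_WEEK)
    print(f"{uptime}% uptime: {per_year:.4f} h/year, {per_week * 60:.2f} min/week")
```

For 98 percent uptime this gives 175.2 hours per year, which is the 7.3 days shown in the first row.
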

Lesson 2: BC Planning and Technology Solutions


During this lesson the following topics are covered:
• BC terminologies
• BC planning
• Business impact analysis
• Single points of failure
• Multipathing software

BC Terminologies – 1

Disaster recovery
• Coordinated process of restoring systems, data, and infrastructure required to support business operations after a disaster occurs
• Restoring a previous copy of data and applying logs to that copy to bring it to a known point of consistency
• Generally implies use of backup technology

Disaster restart
• Process of restarting business operations with mirrored consistent copies of data and applications
• Generally implies use of replication technologies

BC Terminologies – 2
Recovery-Point Objective (RPO)
• Point-in-time to which systems and data must be recovered after an outage
• Amount of data loss that a business can endure
• Based on the RPO, organizations plan for the frequency with which a backup or replica must be made
  ◦ RPO of 24 hours: Backups are created at an offsite tape library every midnight. The recovery strategy is to restore data from the set of last backup tapes.
  ◦ RPO of 6 hours: Backups must be made at least once every 6 hours.
  ◦ RPO of 1 hour: Backup to the remote site every hour. The recovery strategy is to recover the database to the point of the last log shipment.
  ◦ RPO in the order of minutes: Mirroring data asynchronously to a remote site.
  ◦ RPO of zero: Mirroring data synchronously to a remote site.
[Figure: recovery-point objective scale running from weeks through days, hours, and minutes to seconds, with Tape Backup, Periodic Replication, Asynchronous Replication, and Synchronous Replication placed along it.]

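The link between RPO and copy frequency can be sketched as a simple check. The sketch assumes, as a simplification not stated in the module, that the worst-case data loss equals the interval between two copies (a failure just before the next copy loses everything written since the last one) and that the time needed to create or ship a copy is negligible. The function names are hypothetical.

```python
def worst_case_data_loss_hours(copy_interval_hours):
    """Simplifying assumption: everything written since the last copy is lost."""
    return copy_interval_hours


def meets_rpo(copy_interval_hours, rpo_hours):
    """True if copying at this interval keeps worst-case data loss within the RPO."""
    return worst_case_data_loss_hours(copy_interval_hours) <= rpo_hours


print(meets_rpo(24, 24))  # nightly backup against an RPO of 24 hours -> True
print(meets_rpo(24, 6))   # nightly backup against an RPO of 6 hours  -> False
print(meets_rpo(1, 6))    # hourly copy against an RPO of 6 hours     -> True
```
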
BC Terminologies – 2
Recovery-Time Objective (RTO)
• Time within which systems and applications must be recovered after an outage
• Amount of downtime that a business can endure and survive
• Based on the RTO, organizations plan for recovery strategies to ensure data availability
  ◦ RTO of 72 hours: Restore from tapes available at a cold site.
  ◦ RTO of 12 hours: Restore from tapes available at a hot site.
  ◦ RTO of a few hours: Use disk-based backup technology, which gives faster restore than a tape backup.
  ◦ RTO of a few seconds: Cluster production servers with bidirectional mirroring, enabling the applications to run at both sites simultaneously.
• Cold site: a site where operations can be moved in the event of disaster, with minimum IT infrastructure in place, but not activated
• Hot site: a site where operations can be moved in the event of disaster. All equipment is available and running at all times
[Figure: recovery-time objective scale running from weeks through days, hours, and minutes to seconds, with Tape Restore, Disk Restore, Manual Migration, and Global Cluster placed along it.]

BC Planning Lifecycle

Establishing Objectives
• Determine BC requirements.
• Estimate the scope and budget to achieve requirements.
• Select a BC team that includes subject matter experts from all areas of the business, whether internal or external.
• Create BC policies.

Analyzing
• Collect information on data profiles, business processes, and infrastructure support.
• Conduct a business impact analysis.
• Identify critical business processes and assign recovery priorities.
• Perform risk analysis for critical functions and create mitigation strategies.
• Perform cost benefit analysis.
• Evaluate options.

Designing and Developing
• Define the team structure and assign individual roles and responsibilities.
• Design data protection strategies and develop infrastructure.
• Develop contingency solution.

Implementing
• Implement risk management and mitigation procedures.
• Prepare the DR sites that can be utilized if a disaster affects the primary data center.
• Implement redundancy for every resource in a data center to avoid single points of failure.

Training, Testing, Assessing, and Maintaining
• Train the employees who are responsible for backup and replication.
• Train employees on emergency response procedures.
• Perform damage-assessment processes and review recovery plans.
• Test the BC plan regularly to evaluate its performance and identify its limitations.

Failure Analysis
• Involves analyzing both physical and virtual infrastructure components to identify systems that are susceptible to a single point of failure and implementing fault-tolerance mechanisms.
• Topics covered:
  ◦ Single Point of Failure
  ◦ Resolving Single Points of Failure
  ◦ Multipathing Software

Failure Analysis: (1) Single Points of Failure
Single Points of Failure

A single point of failure refers to the failure of a component of a system that can terminate the availability of the entire system or IT service.

[Figure: data path from Client through IP Switch, Server, and FC Switch to the Array port of the Storage Array.]

• A VM, a hypervisor, or an HBA/NIC on the server, the physical server itself, the IP network, the FC switch, the storage array port, or even the storage array could be a potential single point of failure.
• For example, failure of a hypervisor can affect all the running VMs and the virtual network that are hosted on it.

Failure Analysis: (2) Resolving Single Points of Failure / Fault Tolerance
[Figure: fault-tolerant configuration showing a client, NIC teaming with redundant NICs, a redundant IP network, clustered servers with redundant HBAs, redundant FC switches, redundant paths, redundant ports, and redundant arrays (a production storage array and a remote storage array).]

• To mitigate a single point of failure, systems are designed with redundancy, such that the system will fail only if all the components in the redundancy group fail.
• This ensures that the failure of a single component does not affect data availability.
• Careful analysis is performed to eliminate every single point of failure.

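The effect of a redundancy group on availability can be illustrated numerically. The sketch below is not from the module: it assumes that component failures are independent, and the function names and the 99 percent component availabilities are hypothetical. A chain of single components is available only if every component is available, while a redundancy group is unavailable only if all of its members are.

```python
from math import prod


def series_availability(availabilities):
    """Chain of single points of failure: every component must be available."""
    return prod(availabilities)


def redundant_group_availability(availabilities):
    """Redundancy group: unavailable only if all members fail (independence assumed)."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical path: HBA -> FC switch -> array port, each 99% available.
single_path = series_availability([0.99, 0.99, 0.99])

# The same path with every component duplicated.
redundant_stage = redundant_group_availability([0.99, 0.99])
redundant_path = series_availability([redundant_stage] * 3)

print(f"single path:    {single_path:.4%}")
print(f"redundant path: {redundant_path:.4%}")
```

With these assumed numbers, duplicating each component raises the availability of one stage from 0.99 to 0.9999.
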
Failure Analysis: (2) Resolving Single Points of Failure / Fault Tolerance
• Based on the figure, implementation to resolve single points of failure includes:
  ◦ Configuration of multiple HBAs to mitigate single HBA failure.
  ◦ Configuration of multiple fabrics to account for a switch failure.
  ◦ Configuration of multiple storage array ports to enhance the storage array’s availability.
  ◦ RAID configuration to ensure continuous operation in the event of disk failure.
  ◦ Implementing a storage array at a remote site to mitigate local site failure.
  ◦ Implementing server (host) clustering, a fault-tolerance mechanism whereby two or more servers in a cluster access the same set of volumes. Clustered servers exchange heartbeats to inform each other about their health. If one of the servers fails, the other server takes up the complete workload.

Failure Analysis: (3) Multipathing Software
• Configuration of multiple paths increases the data availability through path failover
• Multipathing software provides the functionality to recognize and utilize alternative I/O paths to data
• Provides load balancing by distributing I/Os to all available, active paths:
  ◦ Improves I/O performance and data path utilization
• Intelligently manages the paths to a device by sending I/O down the optimal path:
  ◦ Based on the load balancing and failover policy setting for the device
• E.g.: Microsoft Multipath I/O (MPIO) is a Microsoft-provided framework that allows storage providers to develop multipath solutions that contain the hardware-specific information needed to optimize connectivity with their storage arrays

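The two behaviors described above, load balancing across active paths and failover when a path fails, can be illustrated with a toy model. This is a conceptual sketch only: the class, its methods, and the path names are hypothetical, it uses a plain round-robin policy, and it does not represent how MPIO or any vendor product is implemented.

```python
class MultipathDevice:
    """Toy model of one device reachable over several I/O paths."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0  # round-robin counter

    def mark_failed(self, path):
        self.failed.add(path)

    def mark_restored(self, path):
        self.failed.discard(path)

    def select_path(self):
        """Pick the next active path in round-robin order, skipping failed paths."""
        active = [p for p in self.paths if p not in self.failed]
        if not active:
            raise RuntimeError("no active path to device")
        path = active[self._next % len(active)]
        self._next += 1
        return path


device = MultipathDevice(["hba0", "hba1"])
before_failure = [device.select_path(), device.select_path()]  # load balanced

device.mark_failed("hba0")  # path failover: I/O continues on the surviving path
after_failure = [device.select_path(), device.select_path()]

print(before_failure, after_failure)
```
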
Business Impact Analysis
• Identifies which business units and processes are essential to the survival of the business
• BIA includes the following set of tasks:
  ◦ Determine the business areas
  ◦ Identify key business processes critical to the operation of each area
  ◦ Determine attributes of the business process: applications, databases, hardware, and software
  ◦ Estimate the cost of failure for each business process
  ◦ Calculate the maximum tolerable outage and define the RTO for each business process
• Based on the BIA, businesses can prioritize and implement countermeasures to mitigate the likelihood of such disruptions

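One way to picture the prioritization step is to rank business processes by their estimated cost of failure. The sketch below is an assumption about how such a ranking might be done, not a method prescribed by the module. The process names, costs, and outage limits are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    cost_of_failure_per_hour: float    # estimated during the BIA
    max_tolerable_outage_hours: float  # upper bound for the RTO


def prioritize(processes):
    """Highest cost of failure first; ties broken by the shorter tolerable outage."""
    return sorted(
        processes,
        key=lambda p: (-p.cost_of_failure_per_hour, p.max_tolerable_outage_hours),
    )


# Hypothetical BIA results.
bia_results = [
    BusinessProcess("payroll", 2_000, 72),
    BusinessProcess("order entry", 50_000, 1),
    BusinessProcess("reporting", 2_000, 24),
]

recovery_order = [p.name for p in prioritize(bia_results)]
print(recovery_order)  # ['order entry', 'reporting', 'payroll']
```
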
BC Technology Solutions
• After analyzing the business impact of an outage, designing the appropriate solutions to recover from a failure is the next important activity
• Solutions that enable BC are:
  ◦ Fault-tolerant configuration (done by implementing redundancies)
    - Resolving single points of failure
    - Multipathing software
  ◦ Backup and replication
    - Backup and recovery
    - Local replication
    - Remote replication

Backup and Replication
Note: Backup and Replication will be discussed in forthcoming modules.

Backup and Recovery
• Backup to tape has been a predominant method to ensure BC
• Frequency of backup is determined based on RPO, RTO, and the frequency of data changes

Local Replication
• Data can be replicated to a separate location within the same storage array.
• The replica is used independently for BC operations.
• Replicas can also be used for restoring operations if data corruption occurs.

Remote Replication
• Data in a storage array can be replicated to another storage array located at a remote site.
• If the storage array is lost due to a disaster, BC operations start from the remote storage array.


Concept in Practice

• EMC PowerPath

EMC PowerPath
• Host-based multipathing software
• Provides path failover and load-balancing functionality
• Automatic detection and recovery from host-to-array path failures
• PowerPath/VE software allows optimizing virtual environments with PowerPath multipathing features
[Figure: a host running the Host Application and PowerPath above four HBA drivers and four HBAs, connected through the storage network to the storage LUNs.]

Module 9: Summary
• Importance of business continuity
• Impact of information unavailability
• Information availability metrics
• Business impact analysis
• Single points of failure
• Multipathing software
