
Chapter 6 BC and DRP V 2

This document discusses business continuity and disaster recovery plans. It defines key terms like business impact analysis, disaster recovery planning, and business continuity planning. It explains that business continuity planning aims to ensure business processes can continue after disruptions, while disaster recovery focuses on restoring IT systems. The document outlines best practices for developing, implementing, and testing business continuity and disaster recovery plans, including performing a business impact analysis, developing recovery strategies, and establishing recovery time and point objectives.

Uploaded by

JoeFSabater
Copyright
© All Rights Reserved

Chapter / Domain 6

Business Continuity and Disaster Recovery


Business continuity concerns the business itself, i.e., recovery of the business processes
so that the business can keep operating and survive as a company.
Disaster recovery concerns IT and is a subset of business continuity. A disaster recovery
plan typically details the process IT personnel will use to restore the computer systems.
It may be included in the business continuity plan or kept as a separate document
altogether, depending on the needs of the business.
Three tasks (Task Statements):
1. Evaluate the adequacy of backup and restore
2. Evaluate the organization's disaster recovery plan
3. Evaluate the organization's business continuity plan
Terms:
BIA - Business Impact Analysis
DRP - Disaster Recovery Planning
BCP (Business Continuity Planning) / BRP (Business Recovery or
Resumption Planning) / COOP (Continuity of Operations Planning)

Business continuity planning takes into consideration:
- Those key operations that are most necessary to the survival of the organization
- The human/material resources supporting them
The business continuity plan includes:
- The disaster recovery plan that is used to recover a facility rendered inoperable,
  including relocating operations into a new location
- The restoration plan that is used to return operations to normality, whether in a
  restored or new facility
Six Rs of business continuity:
1. Readiness - planning
2. Respond - first few hours
3. Recovery - just enough to keep going
4. Resume
5. Repair - at original facility
6. Return - to original facility; always fail back your least critical first.

CISA 2008 by Joe Sabater

Phases of the business continuity planning process:
1. Creation of a business continuity and disaster recovery policy - overall policy, project
management and initiation
2. Business impact analysis (BIA) - the basic foundation of BCP; identification of all
business functions within an organization, followed by assigning a level of importance to
each business function. BIA is the primary tool for gathering this information and
assigning criticality, recovery point objectives and recovery time objectives.
3. Classification of operations and criticality analysis - develop a profile of the resources
required to support critical functions, i.e., hardware (mainframe, data and voice
communications and personal computers), software (vendor supplied, in-house developed,
etc.), documentation (DP, user, procedures), outside support (public networks, DP
services, etc.), facilities (office space, office equipment, etc.) and personnel for each
business unit. Then identify the various recovery strategies (including backup
strategies / systems) based on factors such as cost. Recovery strategies will be based on
short-term, intermediate-term and long-term outages.
4. Development of a business continuity plan and disaster recovery procedures based
on the above
5. Training and awareness program
6. Testing and implementation of the plan
7. Monitoring, including maintenance (updates)
The BCP has to be aligned with the change management process so that the plan is kept up to date.
Risk assessment is the first step in finding the processes most important to the business.
BCP focuses on availability and is primarily the responsibility of senior management.
Business impact analysis (BIA) is a critical step in this process. You need to understand the
organization and its business processes in order to do it properly. Its outputs are the RPO
and RTO.
Different BIA approaches:
(a) Questionnaire; (b) Interview key users; (c) Work group - bring people together to
discuss.
Incident/Crisis Levels:
- Negligible incidents are those causing no perceptible or significant damage, such as
  very brief operating system (OS) crashes with full information recovery or
  momentary power outages with uninterruptible power supply (UPS) backup.
- Minor events are those that, while not negligible, produce no negative material (of
  relative importance) or financial impact.
- Major incidents cause a negative material impact on business processes and may
  affect other systems, departments or even outside clients.
- A crisis is a major incident that can have a serious material (of relative importance)
  impact on the continued functioning of the business and may also adversely impact
  other systems or third parties. The severity of the impact depends on the industry and
  circumstances, but is generally directly proportional to the time elapsed from the
  inception of the incident to its resolution.

The auditor can review past transaction volume to determine the impact on the business if the
system were unavailable.
3 main questions:
1. What are the different business processes?
2. What are the critical information resources that support these processes?
3. What is the critical recovery time for these resources, i.e., how long can you be down
before losses are significant?
Two cost factors associated with the above:
Downtime cost - how much does it cost you if the application/system is down?
Recovery cost - the cost of the strategies used to minimize your downtime.
The sum of these costs should be minimized. Downtime costs increase the longer the outage
lasts, while recovery costs decrease the longer you allow for recovery. The sum is usually a
U-shaped curve, and the lowest total cost is found at the bottom of the U.
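The U curve can be illustrated with a small sketch. The cost model below is an assumption made only for illustration (downtime cost growing linearly with time, recovery cost falling inversely with time), and the function names and figures are hypothetical; real cost curves come out of the BIA.

```python
def total_cost(hours_down, downtime_cost_per_hour, recovery_cost_factor):
    """Combined cost if recovery is allowed to take `hours_down` hours.

    Assumed model: downtime cost rises linearly with time, while the cost
    of the recovery strategy falls as more time is allowed.
    """
    downtime_cost = downtime_cost_per_hour * hours_down
    recovery_cost = recovery_cost_factor / hours_down
    return downtime_cost + recovery_cost


def cheapest_recovery_time(candidate_hours, downtime_cost_per_hour, recovery_cost_factor):
    """Return the candidate recovery time at the bottom of the U curve."""
    return min(
        candidate_hours,
        key=lambda h: total_cost(h, downtime_cost_per_hour, recovery_cost_factor),
    )


# Hypothetical figures: 1,000 per hour of downtime, recovery factor of 100,000.
best = cheapest_recovery_time(range(1, 49), 1_000, 100_000)
print(best, total_cost(best, 1_000, 100_000))
```

With these made-up figures the two costs balance at 10 hours; shorter targets are dominated by recovery cost and longer ones by downtime cost.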
Function / Application Classification Description:
Critical - These functions cannot be performed unless they are replaced by identical
capabilities. Critical applications cannot be replaced by manual methods. Tolerance to
interruption is very low; therefore, the cost of interruption is very high.
Vital - These functions can be performed manually, but only for a brief period of time.
There is a higher tolerance to interruption than with critical systems and, therefore,
somewhat lower costs of interruption, provided that functions are restored within a
certain time frame (usually five days or less).
Sensitive - These functions can be performed manually, at a tolerable cost and for an
extended period of time. While they can be performed manually, it is usually a difficult
process and requires additional staff to perform.
Nonsensitive - These functions may be interrupted for an extended period of time, at little
or no cost to the company, and require little or no catching up when restored.
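The four classes can be read as a simple decision rule. The sketch below is one possible encoding, not an official scheme: the function name and parameters are hypothetical, and the five-day cut-off is taken from the "usually five days or less" guideline above.

```python
def classify_function(manual_workaround_possible, tolerable_manual_days,
                      interruption_cost_negligible):
    """Map a business function to one of the four classes described above."""
    if interruption_cost_negligible:
        # Can be interrupted for a long time at little or no cost.
        return "Nonsensitive"
    if not manual_workaround_possible:
        # Cannot be replaced by manual methods at all.
        return "Critical"
    if tolerable_manual_days <= 5:
        # Manual operation works, but only for a brief period (assumed <= 5 days).
        return "Vital"
    # Manual operation is tolerable for an extended period.
    return "Sensitive"


print(classify_function(False, 0, False))   # Critical
print(classify_function(True, 3, False))    # Vital
print(classify_function(True, 30, False))   # Sensitive
print(classify_function(True, 30, True))    # Nonsensitive
```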
Recovery Point Objective (RPO) - describes the acceptable amount of data loss,
measured in time. It is the point in time to which you must recover data, as defined by
your organization. This is generally a definition of what an organization determines is an
acceptable loss in a distressed situation. If the RPO of a company is 2 hours and the
actual time it takes to get the data back into production is 5 hours, the RPO is still 2
hours. Based on this RPO, the data must be restored to within 2 hours of the disaster.
The primary purpose of mirroring is to meet the RPO.
Transactions made during the RPO period and the interruption need to be re-entered after
recovery (known as catch-up data).
Synchronous - distances are shorter, but there is no data loss (the two systems are synchronized).
Asynchronous - there can be data loss, but the distance can be greater; the systems are not
synchronized and data is transferred at set times or when possible.
Recovery Time Objective (RTO) or maximum tolerable downtime (MTD) - acceptable
downtime for a given application. The lower the RTO, the lower the disaster tolerance.
By definition, the RTO is the duration of time and the service level within which a business
process must be restored after a disaster (or disruption) in order to avoid unacceptable
consequences associated with a break in business continuity. You cannot meet the RTO
unless you have met the RPO.
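The difference between the two objectives can be sketched as follows. The RPO is compared against the age of the last recoverable copy of the data at the moment of the disaster, while the RTO is compared against how long service was down. The function name, the timestamps and the 4-hour RTO are hypothetical; the 2-hour RPO and 5-hour restore time mirror the example above.

```python
from datetime import datetime, timedelta


def check_objectives(last_good_copy, disaster, service_restored, rpo, rto):
    """Compare an incident timeline against the RPO and RTO."""
    data_loss_window = disaster - last_good_copy   # data that must be re-entered
    downtime = service_restored - disaster         # time the service was unavailable
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


result = check_objectives(
    last_good_copy=datetime(2008, 3, 1, 10, 0),
    disaster=datetime(2008, 3, 1, 11, 30),
    service_restored=datetime(2008, 3, 1, 16, 30),  # restore took 5 hours
    rpo=timedelta(hours=2),
    rto=timedelta(hours=4),                         # assumed RTO
)
print(result)
```

In this timeline the RPO is met (1.5 hours of data lost against a 2-hour objective) even though the 5-hour restore misses the assumed RTO, which shows that the restore duration does not change the RPO.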
Interruption window - The time the organization can wait from the point of failure to
the restoration of critical services/applications. After this time, the progressive losses
caused by the interruption are unaffordable.
Service delivery objective (SDO) - Level of services to be reached during the alternate
process mode until the normal situation is restored. This is directly related to the business
needs.
Maximum tolerable outages - Maximum time the organization can support processing
in alternate mode. After this point, different problems may arise, especially if the
alternate SDO is lower than the usual SDO, and the information pending to be updated
can become unmanageable.


Recovery Strategies:
The first approach in a recovery strategy is to see whether built-in resilience can be
implemented (for example, alternative routing and redundancy). A disaster recovery
procedure will address everything not covered by resilience. Selection of a recovery
strategy depends on: (a) Criticality of the business process; (b) Cost; (c) Time to recover;
(d) Security.
Other strategies are based on cost and on RTO and RPO requirements.
BCP (business continuity policy) is the most critical corrective control. The plan is a
corrective control. A recovery strategy is a combination of preventive, detective and
corrective measures.
The selection of an appropriate strategy based on the business impact analysis and
criticality analysis is the next step in developing the BCP and DRP.
Removing the threat and minimizing the risk of occurrence can be addressed through the
implementation of physical and environmental security.
Recovery Alternatives:
Hot sites - can be ready in minutes or hours. They are fully configured with equipment,
network and systems software, which must be compatible with the primary installation being
backed up. The only additional needs are staff, programs, data files and documentation.
The hot site is intended for emergency operations of a limited time period and not for
long-term extended use (which would impair the protection of other subscribers).
Warm sites - don't have computers, but have a basic network and some peripheral
equipment. The assumption behind the warm site concept is that the computer can usually
be obtained quickly for emergency installation and, since the computer is the most
expensive unit, such an arrangement is less costly than a hot site.
Cold sites - have only the basics: the facility (wiring, flooring) and environmental controls.
Activation of the site may take several weeks.
Duplicate information processing sites - dedicated, self-developed recovery sites that
can back up critical applications. They can range in form from a standby hot site to a
reciprocal agreement with another company installation.
Mobile sites - A mobile site is a specially designed trailer that can be quickly transported
to a business location or to an alternate site to provide a ready-conditioned information
processing facility. Good for branch offices.
Reciprocal arrangements - This is a less frequently used method between two or more
organizations with similar equipment or applications. Under the typical agreement,
participants promise to provide computer time to each other when an emergency arises.
Not a good option, because software changes at each company cause incompatibility issues.

Provisions / Questions for Use of Third Party Sites:
- Configurations - Are the vendor's hardware and software configurations adequate to
  meet company needs, since these will vary over time?
- Disaster - Is the definition of disaster broad enough to meet anticipated needs?
- Speed of availability - How soon after a disaster will facilities be available?
- Subscribers per site - Does the agreement limit the number of subscribers per site?
- Subscribers per area - Does the agreement limit the number of subscribers in a
  building or area?
- Preference - Who gets preference if there are common or regional disasters? Is there
  backup for the backup facilities? Is use of the facility exclusive or does the customer
  have to share the available space if multiple customers simultaneously declare a
  disaster? Does the vendor have more than one facility available for subscriber use?
- Insurance - Is there adequate insurance coverage for company employees at the
  backup site? Will existing insurance reimburse those fees?
- Usage period - How long is the facility available for use? Is this period adequate?
  What technical support will the site operator provide? Is this adequate?
- Communications - Are the communications adequate? Are the communication
  connections to the backup site sufficient to permit unlimited communication with the
  alternate site if needed?
- Warranties - What warranties will the vendor make regarding availability of the site
  and the adequacy of the facilities? Are there liability limitations (there usually are)
  and is the company willing to live with them?
- Audit - Is there a right-to-audit clause permitting an audit of the site to evaluate the
  logical, physical and environmental security?
- Testing - What testing rights are included in the contract? Check with the insurance
  company to determine any reduction of premiums that may be forthcoming due to the
  backup site availability.
- Reliability - Can the vendor attest to the reliability of the site(s) being offered?
  Ideally, the vendor should have a UPS, limited subscribers, sound technical
  management, and guarantees of computer hardware and software compatibility.


Key Responsibilities:
Incident response team - responds to incidents and does reporting and investigation
Emergency action team - first responders
Damage assessment team - assesses the extent of the damage
Emergency management team - responsible for coordination of activities in a disaster
Telecommunications recovery methods: The methods of providing telecommunications
continuity are:
Redundancy - Involves providing extra capacity with a plan to use the surplus capacity
should the normal primary transmission capability not be available. In the case of a LAN,
a second cable could be installed through an alternate route for use in the event the
primary cable is damaged. Examples: dynamic routing protocols, extra capacity, etc.
Alternative routing - The method of routing information via an alternate medium such
as copper cable or fiber optics. This involves use of different networks, circuits or end
points should the normal network be unavailable. Example: using an alternative cable
medium, such as copper instead of fiber.
Diverse routing - The method of routing traffic through split cable facilities or duplicate
cable facilities. This can be accomplished with different and/or duplicate cable sheaths.
Long haul network diversity - Many recovery facilities vendors have provided diverse
long-distance network availability utilizing T1 circuits among the major long-distance
carriers. This ensures long-distance access should any one carrier experience a network
failure. Several of the major carriers have now installed automatic re-routing software
and redundant lines that provide instantaneous recovery should a break in their lines
occur.
Last mile circuit protection - Many recovery facilities provide a redundant combination
of local carrier T1s, microwave and/or coaxial cable access to the local communications
loop. This enables the facility to have access during a local carrier communication
disaster. Alternate local carrier routing is also utilized.
Voice recovery - With many service, financial and retail industries dependent on voice
communication, redundant cabling and alternative routing should be provided for voice
communication lines as well as data communication lines.


RAID (which stands for Redundant Array of Inexpensive Disks or, alternatively, Redundant
Array of Independent Disks) is a technology that employs the simultaneous use of two or
more hard disk drives to achieve greater levels of performance, reliability and/or larger
data volume sizes.
"RAID" is now used as an umbrella term for computer data storage schemes that can
divide and replicate data among multiple hard disk drives. RAID's various designs all
involve two key design goals: increased data reliability and increased input/output
performance. When several physical disks are set up to use RAID technology, they are
said to be in a RAID array. This array distributes data across several disks, but the array
is seen by the computer user and operating system as one single disk.
Some RAID levels: Level 0 - striped; Level 1 - mirrored; Level 5 - parity blocks
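How parity blocks give reliability can be shown with a toy example. The sketch below is a simplification and not how any particular controller is implemented: it computes one parity block as the XOR of the data blocks and rebuilds a lost block from the survivors. A real RAID 5 array also rotates the parity block across the disks, which is omitted here; the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three disks, plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuild its block from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields exactly the missing block, which is why the array survives the loss of any single disk.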
Plan Testing:
The test should be scheduled during a time that will minimize disruptions to normal
operations. Weekends are generally a good time to conduct tests. It is important that the
key recovery team members be involved in the test process and allotted the necessary
time to put their full effort into it. The test should address all critical components and
simulate actual prime-time processing conditions, even if it is conducted in off hours.
Test Execution: To perform testing, each of the following test phases should be
completed: Pretest, Test, and Post-Test.
In addition, the following types of tests may be performed:
a) Desk-based evaluation/Paper test - paper walk-through of the plan with major
players.
b) Preparedness test - usually a localized version of a full test, e.g., a simulated system
crash.
c) Full operational test - shutting down a data center, etc.

- Documentation of Results: During every phase of the test, detailed
  documentation of observations, problems and resolutions should be maintained.
- Results Analysis: It is important to have ways to measure the success of the plan
  and test against the stated objectives. Therefore, results must be quantitatively
  gauged as opposed to an evaluation based only on observation.
- Recovery/Continuity plan maintenance: Plans and strategies for business
  continuity should be reviewed and updated on a scheduled basis to reflect
  continuing recognition of changing requirements.


Backup and Restoration pointers:
- The more important the data stored on the computer, the greater the need for
  backing up this data.
- A backup is only as useful as its associated restore strategy.
- Storing the copy near the original is unwise, since many disasters such as fire, flood
  and electrical surges are likely to damage the backup at the same time.
- Automated backup and scheduling should be considered, as manual backups can be
  affected by human error.
- Backups will fail for a wide variety of reasons. A verification or restoration testing
  strategy is an important part of a successful backup plan.
- It is good to store backed-up archives in open/standard formats. This helps with
  recovery in the future when the software used to make the backup is obsolete. It also
  allows different software to be used.
- Grandfather-father-son rotation of media for backup - son is the daily backup, father the
  end-of-week backup, grandfather the end-of-month backup.
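The grandfather-father-son rotation can be sketched as a rule that picks the media set for a given date. This is a minimal illustration under stated assumptions: the function name is hypothetical, the week is assumed to end on Sunday, and an end-of-month backup takes precedence when it falls on the end of a week. Real schedules vary.

```python
import calendar
from datetime import date


def backup_tier(day, week_end=calendar.SUNDAY):
    """Return which media set a backup taken on `day` belongs to."""
    last_day_of_month = calendar.monthrange(day.year, day.month)[1]
    if day.day == last_day_of_month:
        return "grandfather"   # end-of-month backup
    if day.weekday() == week_end:
        return "father"        # end-of-week backup (assumed Sunday)
    return "son"               # ordinary daily backup


for d in (date(2024, 1, 8), date(2024, 1, 7), date(2024, 1, 31)):
    print(d, backup_tier(d))
```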

Typical Questions in Reviewing DRP / BCP:
- Who is responsible for administration or coordination of the plan?
- Is the plan administrator/coordinator responsible for keeping the plan up to date?
- Is there a disaster recovery implementation team (i.e., the first response team
  members who will react to the emergency with immediate action steps)?
- Where is the disaster recovery plan stored?
- What critical systems are covered by the plan?
- What systems are not covered by the plan? Why not?
- What equipment is not covered by the plan? Why not?
- Does the plan operate under any assumptions? What are they?
- Does the plan identify meeting points for the disaster management committee or
  emergency management team to meet and decide if BC should be initiated?
- Are the documented procedures adequate for successful recovery?
- Does the plan address disasters of varying degrees?
- Are telecommunications backups (data and voice line backups) addressed in the plan?
- Where is the backup facility site?
- Does the plan address relocation to a new information processing facility in the event
  that the original center cannot be restored?
- Does the plan include procedures for merging master file data, automated tape
  management system data, etc., into pre-disaster files?

Vendor Contract Review:
- Ensure that the contract is written clearly and is understandable.
- Reexamine and confirm the organization's agreement with the rules that apply to sites
  shared with other subscribers.
- Ensure that insurance coverage ties in with and covers all (or most) expenses of the
  disaster.
- Ensure that tests can be performed at the hot site at regular intervals.
- Review and evaluate communications requirements for the backup site.
- Ensure that the source code escrow agreement is enforceable and is reviewed by a lawyer
  specializing in such contracts.
- Determine the limitation of recourse tolerance in the event of a breached agreement.

A disaster starts when the disaster starts. IT does not declare a disaster.
Not testing your BCP is one of the worst things you can do.
Restore core and business-critical processes.
Insurance: Kinds of paper not covered - cash and securities. Fidelity coverage -
coverage against fraud, e.g., bonding.
People and then data are the most important things.
