Chapter 6 BC and DRP V 2
Auditor can review past transaction volume to determine impact to the business if the
system was unavailable.
Three main questions:
1. What are the different business processes?
2. What are the critical information resources that support these processes?
3. What is the critical recovery time for these resources - how long can you be down
before losses are significant?
Two cost factors associated with the above:
Downtime cost - how much does it cost you if the application/system is down?
Recovery cost - cost of the strategies to minimize your downtime.
The sum of these costs should be minimized. Downtime costs increase the longer the
outage lasts, while recovery costs decrease as the allowed recovery time grows. The sum
is usually a U curve; the lowest total cost is found at the bottom of the U.
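The U curve can be illustrated with a small sketch. The two cost functions and all figures below are made-up assumptions (they are not from the chapter); they only show how adding a rising downtime cost to a falling recovery cost produces a minimum.

```python
# Hypothetical cost model: every figure here is an illustrative assumption.

def downtime_cost(hours: float) -> float:
    """Loss from being down; grows the longer the outage lasts."""
    return 1_000.0 * hours


def recovery_cost(hours: float) -> float:
    """Cost of a strategy that restores service within `hours`;
    faster recovery is assumed to be more expensive."""
    return 36_000.0 / hours


def cheapest_recovery_time(candidates):
    """Return the candidate recovery time with the lowest total cost."""
    return min(candidates, key=lambda h: downtime_cost(h) + recovery_cost(h))


candidate_hours = [1, 2, 4, 6, 8, 12, 24]
best = cheapest_recovery_time(candidate_hours)
total = downtime_cost(best) + recovery_cost(best)
print(best, total)  # 6 12000.0
```

With these assumed figures the total cost falls until 6 hours and rises afterwards, which is the bottom of the U.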
Function / Application Classification Description:
Critical - These functions cannot be performed unless they are replaced by identical
capabilities. Critical applications cannot be replaced by manual methods. Tolerance to
interruption is very low; therefore, cost of interruption is very high.
Vital - These functions can be performed manually, but only for a brief period of time.
There is a higher tolerance to interruption than with critical systems and, therefore,
somewhat lower costs of interruption, provided that functions are restored within a
certain time frame (usually five days or less).
CISA 2008 by Joe Sabater
Sensitive - These functions can be performed manually, at a tolerable cost and for an
extended period of time. While they can be performed manually, it usually is a difficult
process and requires additional staff to perform.
Nonsensitive - These functions may be interrupted for an extended period of time, at little
or no cost to the company, and require little or no catching up when restored.
Recovery Point Objective (RPO) - describes the acceptable amount of data loss
measured in time. It is the point in time to which you must recover data as defined by
your organization. This is generally a definition of what an organization determines is an
acceptable loss in a distressed situation. If the RPO of a company is 2 hours and the
actual time it takes to get the data back into production is 5 hours, the RPO is still 2
hours. Based on this RPO the data must be restored to within 2 hours of the disaster.
The primary purpose of mirroring is to meet the RPO.
Transactions made during the RPO period and the interruption need to be re-entered after
recovery (known as catch-up data).
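A minimal sketch of the RPO check described above, assuming the only inputs are the time of the last good copy of the data and the time of the disaster. The function names and dates are hypothetical. Note that the time taken to restore the data does not appear at all, which matches the point that a 5-hour restore leaves a 2-hour RPO unchanged.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_copy: datetime, disaster: datetime) -> timedelta:
    """Amount of data (measured in time) lost if we recover to the last good copy."""
    return disaster - last_good_copy


def meets_rpo(last_good_copy: datetime, disaster: datetime, rpo: timedelta) -> bool:
    """True if the recoverable data is recent enough to satisfy the RPO."""
    return data_loss_window(last_good_copy, disaster) <= rpo


rpo = timedelta(hours=2)
disaster = datetime(2008, 6, 1, 14, 0)  # illustrative date only

# Copy taken 1.5 hours before the disaster: within the 2-hour RPO.
print(meets_rpo(datetime(2008, 6, 1, 12, 30), disaster, rpo))  # True
# Copy taken 5 hours before the disaster: RPO missed.
print(meets_rpo(datetime(2008, 6, 1, 9, 0), disaster, rpo))    # False
```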
Synchronous - distances are shorter, but there is no data loss (the two systems are
synchronized).
Asynchronous - there can be data loss, but the distance can be greater; the systems are
not synchronized and data is transferred at set times or when possible.
Recovery Time Objective (RTO) or maximum tolerable downtime (MTD) - Acceptable
downtime for a given application. The lower the RTO, the lower the disaster tolerance.
By definition, RTO is the duration of time and the service level within which a business
process must be restored after a disaster (or disruption) in order to avoid unacceptable
consequences associated with a break in business continuity. You can't achieve the RTO
unless you have met the RPO.
Interruption window - The time the organization can wait from the point of failure to
the critical services/applications restoration. After this time, the progressive losses caused
by the interruption are unaffordable.
Service delivery objective (SDO) - Level of services to be reached during the alternate
process mode until the normal situation is restored. This is directly related to the business
needs.
Maximum tolerable outages - Maximum time the organization can support processing
in alternate mode. After this point, different problems may arise, especially if the
alternate SDO is lower than the usual SDO, and the information pending to be updated
can become unmanageable.
Recovery Strategies:
The first approach in a recovery strategy is to see if built-in resilience can be
implemented (for example, alternative routing and redundancy). A disaster recovery
procedure will address everything not covered by resilience. Selection of a recovery
strategy depends on: (a) criticality of the business process; (b) cost; (c) time to
recover; (d) security.
Other strategies are based on cost and on RTO and RPO requirements.
BCP (business continuity policy) is the most critical corrective control. The plan is a
corrective control. A recovery strategy is a combination of preventive, detective and
corrective measures.
The selection of an appropriate strategy based on the business impact analysis and
criticality analysis is the next step for developing BCP and DRP.
Removing the threat and minimizing the risk of occurrence can be addressed through the
implementation of physical and environmental security.
Recovery Alternatives:
Hot sites - can be ready in minutes or hours. They are fully configured with equipment,
network and systems software, which must be compatible with the primary installation
being backed up. The only additional needs are staff, programs, data files and
documentation. The hot site is intended for emergency operations of a limited time period
and not for long-term extended use (which would impair the protection of other
subscribers).
Warm sites - don't have computers, but have basic network and some peripheral
equipment. The assumption behind the warm site concept is that the computer can usually
be obtained quickly for emergency installation and, since the computer is the most
expensive unit, such an arrangement is less costly than a hot site.
Cold sites - have only the basic facility (wiring, flooring) and environmental controls.
Activation of the site may take several weeks.
Duplicate information processing sites - dedicated, self-developed recovery sites that
can back up critical applications. They can range in form from a standby hot site to a
reciprocal agreement with another company installation.
Mobile sites - This is a specially designed trailer that can be quickly transported to a
business location or to an alternate site to provide a ready-conditioned information
processing facility. Good for branch offices.
Reciprocal arrangements - This is a less frequently used method between two or more
organizations with similar equipment or applications. Under the typical agreement,
participants promise to provide computer time to each other when an emergency arises.
Not a good option, because software changes at either company can cause incompatibility
issues.
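The trade-off between recovery alternatives can be sketched as "pick the cheapest site that can be activated within the RTO". This is only an illustration: the activation times and relative costs below are assumptions chosen to be consistent with the descriptions above (hot site in hours, cold site in weeks), and `SiteOption` and `cheapest_site_for` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SiteOption(NamedTuple):
    name: str
    activation_hours: float  # assumed time for the site to become operational
    relative_cost: float     # assumed relative cost, not real pricing


# Illustrative figures only.
OPTIONS = [
    SiteOption("hot site", 4, 100.0),
    SiteOption("warm site", 72, 40.0),
    SiteOption("cold site", 21 * 24, 10.0),
]


def cheapest_site_for(rto_hours: float) -> Optional[SiteOption]:
    """Return the lowest-cost option that can be active within the RTO,
    or None if no listed option is fast enough."""
    feasible = [o for o in OPTIONS if o.activation_hours <= rto_hours]
    return min(feasible, key=lambda o: o.relative_cost) if feasible else None


print(cheapest_site_for(8))    # hot site is the only option fast enough
print(cheapest_site_for(120))  # warm site is cheaper and still fast enough
print(cheapest_site_for(1))    # None: resilience or a duplicate site is needed
```

In practice the selection also weighs criticality and security, so a sketch like this covers only the cost and time-to-recover factors.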
Key Responsibilities:
Incident response team - responds to incidents and handles reporting and investigation
Emergency action team - first responders
Damage assessment team - assesses the extent of the damage
Emergency management team - responsible for coordination of activities in a disaster
Telecommunications recovery methods: The methods of providing telecommunications
continuity are:
Redundancy - Involves providing extra capacity with a plan to use the surplus capacity
should the normal primary transmission capability not be available. In the case of a LAN,
a second cable could be installed through an alternate route for use in the event the
primary cable is damaged. Examples include dynamic routing protocols and extra capacity.
Alternative routing - The method of routing information via an alternate medium such
as copper cable or fiber optics. This involves use of different networks, circuits or end
points should the normal network be unavailable. For example, using an alternative cable
medium such as copper instead of fiber.
Diverse routing - The method of routing traffic through split cable facilities or duplicate
cable facilities. This can be accomplished with different and/or duplicate cable sheaths.
Long haul network diversity - Many recovery facilities vendors have provided diverse
long-distance network availability utilizing T1 circuits among the major long-distance
carriers. This ensures long-distance access should any one carrier experience a network
failure. Several of the major carriers have now installed automatic re-routing software
and redundant lines that provide instantaneous recovery should a break in their lines
occur.
Last mile circuit protection - Many recovery facilities provide a redundant combination
of local carrier T1s, microwave and/or coaxial cable access to the local communications
loop. This enables the facility to have access during a local carrier communication
disaster. Alternate local carrier routing is also utilized.
Voice recovery - With many service, financial and retail industries dependent on voice
communication, redundant cabling and alternative routing should be provided for voice
communication lines as well as data communication lines.
The more important the data stored on the computer, the greater the need to back it
up.
A backup is only as useful as its associated restore strategy.
Storing the copy near the original is unwise, since many disasters such as fire, flood
and electrical surges are likely to cause damage to the backup at the same time.
Backups will fail for a wide variety of reasons. A verification or restoration testing
strategy is an important part of a successful backup plan.
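One simple form of restoration testing is to restore a file from backup and compare its checksum with the original. The sketch below assumes both copies are reachable as local files; `sha256_of` and `backup_matches` are hypothetical helper names, and a real test would also cover application-level checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, restored: Path) -> bool:
    """True if the restored copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

A mismatch means the backup or the restore procedure failed and the copy cannot be relied on.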
Grandfather-father-son rotation of media for backup - the son is the daily backup, the
father the end-of-week backup and the grandfather the end-of-month backup.
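The rotation can be sketched as a function that says which media set a given day's backup belongs to. Two assumptions not stated in the notes: the week is taken to end on Sunday, and the month-end backup takes precedence when both apply. `gfs_tier` is a hypothetical name.

```python
from datetime import date, timedelta


def gfs_tier(day: date) -> str:
    """Classify a day's backup under grandfather-father-son rotation."""
    # Last day of the month: tomorrow falls in a different month.
    if (day + timedelta(days=1)).month != day.month:
        return "grandfather"
    # Assumption: the week ends on Sunday (weekday() == 6).
    if day.weekday() == 6:
        return "father"
    return "son"


print(gfs_tier(date(2008, 1, 28)))  # son (a Monday)
print(gfs_tier(date(2008, 1, 27)))  # father (a Sunday)
print(gfs_tier(date(2008, 1, 31)))  # grandfather (month end)
```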
Does the plan include procedures for merging master file data, automated tape
management system data, etc., into pre-disaster files?
The disaster starts when the disruptive event occurs, not when it is declared. IT does not
declare the disaster.
Not testing your BCP is one of the worst things you can do.
Restore core and business critical processes.
Insurance: Kinds of paper not covered - cash and securities. Fidelity coverage -
coverage against fraud, e.g. bonding.
People and then data are the most important things.