A Business Continuity Management Maturity Model
To cite this article: Rama Lingeswara Tammineedi (2010) Business Continuity Management: A
Standards-Based Approach, Information Security Journal: A Global Perspective, 19:1, 36-50, DOI:
10.1080/19393550903551843
Business Continuity Management: A Standards-Based Approach
Rama Lingeswara Tammineedi
Address correspondence to Rama Lingeswara Tammineedi, Information Risk Management Advisory, TCS Limited, 16-2-752/21/13, Triveni Nagar, Gaddi Annaram, Dilsukhnagar, Hyderabad–500060, India. E-mail: [email protected]

INTRODUCTION
Business enterprises increasingly realize the importance of business continuity management (BCM). The objective of BCM is to ensure the uninterrupted availability of all key business resources required to support critical business activities in the event of business disruption and to expedite a return to “business as usual.” BCM adopts a holistic view and focuses on the concept of continuity of all key processes, whether manual or information technology enabled. According to the BCM Survey conducted in 2009 by the Chartered Management Institute, UK, in conjunction with the Civil Contingencies Secretariat in the Cabinet Office, the following are the five important drivers pushing BCM initiatives in organizations:
• Corporate governance
• Central government
• Existing and potential customers
• Legislation
• Regulators
In the absence of an acceptable BCM standard, different organizations used to follow different BCM approaches to ensure their business continuity. This has often resulted in unreliable and ineffective business continuity plans. The recent publication of the BCM standard, BS 25999, by the British Standards Institution greatly helps organizations adopt a holistic BCM approach. BCM must be fully integrated into the organization as an embedded management process.
This paper intends to provide a conceptual understanding of BCM, from BCM policy to BCM maturity, by describing the steps involved in the implementation of the BCM standard, BS 25999, to ensure business continuity in the event of an outage.

KEY TERMS
A clear conceptual understanding of the following key terms will be useful for people involved in business continuity management.
• Event: An event is a planned occasion with unplanned consequences. Typical examples are meetings of the World Trade Organization (WTO), the World Bank, International Monetary Fund (IMF), SAARC, and Olympic Games.
• Incident: An incident is an occurrence (event) resulting in loss. Typical examples are violent protests, kidnapping, and hostage-taking. BS 25999-2:2007 defines incident as “situation that might be, or could lead to, a business disruption, loss, emergency or crisis.”
• Contingency: A contingency is a specific system’s failure or disruption of operations. Typical examples are ATM system failure or on-line banking system failure. These are typically short-term failures governed by problem and event management procedures.
• Emergency: An emergency is an incident requiring an immediate and significant response. An office fire or bomb threat is a typical example of an emergency. In some cases, a planned local event may require advance preparation and elevated vigilance to avoid an incident. For example, a major sporting or championship event is typically followed by fan violence.
• Disaster: A disaster is an unplanned event usually causing denial of access to premises and resulting in human casualties and great damage to property. Typical examples are flood and fire. A catastrophic failure of information technology services can also represent a disaster. In the words of Brian V. Cummings, an expert in the area of business continuity planning, “emergency is event relevant while disaster is event agnostic.”
• Crisis: A crisis is an occurrence which threatens the integrity, reputation, or survival of an individual or organization. Typical examples are product recall or secret tapes (e.g., Watergate).
• Outage: An outage is an event which causes a significant disruption to, or loss of, key business processes. The concept of an outage has both a time dimension and a business process dimension. An outage is different from other business interruptions such as the one arising from a service or technology failure (e.g., systems downtime, communications link failure) which needs to be restored with the help of a service provider.

THE BCM STANDARD
The British Standard 25999 establishes the process, principles and terminology of business continuity management. The purpose of this Standard is to provide a basis for understanding, developing, and implementing business continuity within an organization and to provide confidence in the organization’s dealings with customers and other organizations.
BS 25999 is written in two parts:
Part 1, the Code of Practice, outlines the standard’s overall objectives, guidance, and recommendations. It is this part that replaced PAS56.
Part 2, the Specifications, details the requirements for a BCM System (BCMS) and will be auditable, enabling organizations to demonstrate compliance with the standard. It is this part against which third party certification will be available.
BS 25999-1 is organized into the following sections:
• Scope and applicability
• Terms and definitions
• Overview of business continuity management (BCM)
• The business continuity management policy
FIGURE 1 Typical BCM organization structure.
executive leadership and facilitation. As an example of a non-BCM crisis, consider a major security breach at a banking/financial company that compromises a large number of customer records. The business is still operating but faces a major crisis in terms of technology, public relations, legal, and financial impact. CMT comprises the heads of departments of business, marketing, and support functions (e.g., service delivery, marketing, IT, risk and compliance, HR, administration/security, finance).

Business Continuity Manager
The primary responsibility of the business continuity (BC) manager is to oversee the development and testing of viable business continuity plans. The other major responsibilities of a BC manager include:
• Ensure that the recovery team members are trained adequately to handle their responsibilities in a disaster scenario.
• Obtain and maintain contact lists of key employees, BCM organization, vendors, partners, and public authorities.
• Maintain copies of all appropriate vendor agreements. Liaise with hardware vendors as per agreements in force.
• Coordinate DR activities in the event of a disaster and ensure high-level adherence to the DR procedures documented in the BC plan.
An organization should designate a suitably qualified senior employee as its BC manager and another senior person as backup BC manager.

Media/PR Manager
The public relations function is critical during a disaster event. The media can make or break even the best efforts by an organization. Only a person with media/journalistic experience should be allowed to act as an official spokesperson. All other employees should be instructed not to speak to the media or any other people in the case of a disaster event. However, employees with technical expertise can assist the official spokesperson in dealing with the media. Media responsibilities include the release of announcements to employees on the status of the recovery effort and the expectations of the enterprise in regard to employee status reporting and assistance.

Damage Assessment and Salvage Team
The damage assessment and salvage team is responsible for determining the state of the original site, trying to salvage any equipment or data that might be salvageable, and mitigating damage at the primary site. This depends on prompt realization of what is salvageable and what is not. Repair and replacement orders will be filled for what is not in operational condition. The duties of this team include:
• assisting in the immediate damage assessment/salvage operation
• preparing inventory of damaged and undamaged items
• salvaging equipment and supplies
• helping in settling property/insurance claims
This team is the first to arrive at the primary (disaster) site. Team members provide preliminary damage assessment information to the BC manager and CMT to enable them to make a decision regarding invocation of the BC plan. This team then conducts a detailed damage assessment and salvage operation and documents the findings.

IT Recovery Team
The IT recovery team plays a major role in restoring the network and IT services and possibly telephony. All technical and logistical activities associated with the restoration of needed network and IT service are carried out by this team. This team ensures the availability and functionality of critical software and other utilities in the restored system environment. At a minimum, IT has a fiduciary responsibility to restore all IT services as soon as possible to a state of business as usual. Beyond that, IT has a responsibility to support the incremental continuity requirements of critical business processes.

Communications Recovery Team
This team is responsible for restoring the voice communications services, including telephone, mobile, fax, and so forth. This team also works closely with the IT recovery team to bring up data network service at the DR site.

Support Team
The support team is responsible for ensuring the availability of the support functions in a disaster scenario. These functions include:
• Building management and facility support at the DR site and repair and restoration of primary site
• Finance, funding, and procurement
• Human resources and personnel tracking
• Travel arrangement for the personnel/food and accommodation facility for relocated personnel
• Telephone forwarding/mail and delivery service rerouting
Ideally, teams would be staffed with the personnel responsible for the same or similar operations under normal conditions.

APPROACH AND METHODOLOGY
Guided by the BCM policy, the BCM organization strives to establish and maintain the business continuity capability of an enterprise. The entire gamut of BCM activities can be discussed in terms of three phases. Figure 2 describes the three phases of business continuity and the important activities of each phase.
The important activities of the above three phases are as follows.

Pre-event Preparation
The following are the pre-event preparation activities.

Site Risk Assessment
Site risk assessment focuses on risks to the physical locations (premises). Physical location is one of the critical resources that facilitate execution of business critical activities. There is a fiduciary responsibility to assess and mitigate site physical and environmental risks. However, risk assessments should not be performed across all business processes but only for those that have high criticality. As such, the BIA becomes a focusing lens to prioritize and invest in business process risk assessment in addition to higher availability business continuity strategies and solutions.
The important areas that should be covered during site risk assessment include:
• Building protection measures (e.g., perimeter, security guards, CCTV, intruder detection system, building construction code, fire-rating of walls, running water pipes, overhead water tank)
• Fire detection and suppression measures (e.g., smoke and fire detection systems, fire suppression systems)
• Neighborhood (e.g., neighboring industries, military areas, hotels, bus station, railway station, airport,
FIGURE 2 The three phases of business continuity.
Figure 3 describes the key elements in a risk assessment. There are different methodologies to carry out risk assessment. When failure modes and effects analysis (FMEA) methodology is used for doing risk assessment, a parameter called risk priority number (RPN) is computed. It is a product of three attributes of risk: severity, likelihood, and nondetectability (Table 1).

TABLE 1 Illustration of three risk scenarios (RPN = Severity x Likelihood x Nondetectability).
Risk #   Severity   Likelihood   Nondetectability   RPN
1        1          5            5                  25
2        5          5            1                  25
3        5          1            5                  25
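The RPN arithmetic in Table 1 can be reproduced with a few lines of Python. This is only a minimal sketch: the `Risk` class, the `rank_risks` helper, and the tie-breaking rule on severity are assumptions introduced for illustration, not part of the FMEA methodology as described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Risk:
    """One row of Table 1; the rating scale is whatever the assessor uses."""
    risk_id: int
    severity: int
    likelihood: int
    nondetectability: int

    @property
    def rpn(self) -> int:
        # RPN is the product of the three attribute ratings.
        return self.severity * self.likelihood * self.nondetectability


# The three scenarios listed in Table 1.
TABLE_1 = [Risk(1, 1, 5, 5), Risk(2, 5, 5, 1), Risk(3, 5, 1, 5)]


def rank_risks(risks: List[Risk]) -> List[Risk]:
    # Highest RPN first. Breaking ties on severity is an assumption made
    # for this sketch, not a rule taken from the text.
    return sorted(risks, key=lambda r: (r.rpn, r.severity), reverse=True)


if __name__ == "__main__":
    for risk in rank_risks(TABLE_1):
        print(f"Risk {risk.risk_id}: RPN = {risk.rpn}, severity = {risk.severity}")
```

All three scenarios in Table 1 yield the same RPN, so the product alone cannot rank them; some secondary criterion, such as the hypothetical severity tie-break used here, is needed to prioritize among them.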
tolerance and recovery time decrease with the need for higher availability. That is, the higher the data availability, the lower the RTO and RPO.
• Analyze the business continuity strategies of business units, their strategy implementation, and requirements of each key business process.
• Identify the resource requirements for those business processes that are to be conducted at a business recovery/alternate site.
• Identify the vital records/critical files needed for recovery by each business unit.
• Validate the BIA results and information with respective process owners.
• Obtain management approval for BIA results.
The traditional approach of interviewing business managers of all functions is tedious and time consuming. A faster alternative would be to consult key service delivery heads and business process owners and develop a process matrix to identify the criticality of the services and processes and their corresponding MTPoD values. These MTPoD values will be used to derive RTOs of the business processes.
The two challenges one may encounter in conducting BIA are as follows:
1. Each business function is approached as a discrete entity rather than as part of an enterprise. Everyone will overstate the importance of their work. Management of individual business functions will give an entirely different answer about their relative importance in the context of a broader disaster impact. This kind of approach will ultimately skew all BIA findings to higher availability and higher cost of strategies and solutions, and lead to a significant and consistent failure of BIA efforts. BIA has to be approached in the context of a sitewide disaster that affects all business functions at the site.
2. BS 25999 expects senior management to be actively involved in BIA. In some organizations, senior management may prefer to understand the ground realities before committing any values for MTPoD/RTO, as they are aware of the financial implications of their decisions. In such cases, it makes sense to break BIA into two parts, Part 1 and Part 2. Part 1 of the BIA has to be conducted with senior management to obtain MTPoD values for all services/products and respective functions supporting the delivery of the services/products. Part 2 of the BIA has to be conducted in a more detailed way with operational management at the department level to identify department specific MTPoD values and RTOs. The department specific MTPoD values given by senior management should be treated as preliminary values and need to be validated by operational management of the respective departments. Any difference in MTPoD values of senior management and operational management needs to be resolved by achieving consensus of opinion.
The challenge relating to RPOs and data recovery is that most organizations overlook important aspects that can lead to recovery delays. Issues include backup process, backups managed by different business units, availability of data backup, reliability of backup media, data serialization, and application processing capacity limitations. Failure to address these can lead to critical failures in data recovery and meeting RTOs. To give an example, in one of the organizations I worked for, the backup administrator failed to notice one day that the backup tool had not backed up critical data. When the backup data was restored later (before the next backup), the data of a few thousand subscribers added during the period were lost.

Business Process Risk Assessment
Business process risk assessment (BPRA) cannot be economically performed across the enterprise. Deriving from the BIA, the BPRA is performed for critical and important business functions identified during BIA. These business functions/processes that support the products and services of an organization are executed by or with the help of resources such as people, premises, technology, information, supplies, and stakeholders. While the site risk assessment focuses on risks to premises, the BPRA evaluates the risks to the other resources (e.g., people, technology, information, supplies, stakeholders) and their impact on the business functions/processes, identifying single points of failure (SPOF) that could lead to a disruption of service. After identifying risks, appropriate countermeasures should be identified, evaluated, and implemented to lower the risks to acceptable levels.
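One way to picture the SPOF identification step of a BPRA is as a scan over a process-to-resource dependency map. The sketch below is illustrative only: the `DEPENDENCIES` data, the process and resource names, and the rule "a resource type with exactly one available instance is a SPOF" are all assumptions made for this example, not a procedure prescribed by BS 25999.

```python
from typing import Dict, List, Tuple

# Hypothetical dependency map: process -> resource type -> available instances.
DEPENDENCIES: Dict[str, Dict[str, List[str]]] = {
    "order processing": {
        "people": ["clerk A", "clerk B"],
        "technology": ["order server"],
        "supplies": ["vendor X", "vendor Y"],
    },
    "payroll": {
        "people": ["payroll officer"],
        "technology": ["HR system", "standby HR system"],
    },
}


def find_spofs(
    dependencies: Dict[str, Dict[str, List[str]]]
) -> List[Tuple[str, str, str]]:
    """Return (process, resource type, instance) for every dependency
    that has no alternate, i.e. exactly one available instance."""
    spofs = []
    for process, resources in dependencies.items():
        for resource_type, instances in resources.items():
            if len(instances) == 1:
                spofs.append((process, resource_type, instances[0]))
    return spofs


if __name__ == "__main__":
    for process, resource_type, instance in find_spofs(DEPENDENCIES):
        print(f"SPOF in '{process}': {resource_type} -> {instance}")
```

Each reported entry is a candidate for a countermeasure, for example cross-training a second person or provisioning a standby system; whether the residual risk is acceptable remains a management judgment.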
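Returning to the undetected backup failure mentioned in the discussion of RPOs, a simple automated check can flag the moment the age of the last good backup exceeds the RPO. The following is a minimal sketch, assuming the backup tool records the time of its last successful run; the function names, the 24-hour RPO, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def data_at_risk_window(last_successful_backup: datetime, now: datetime) -> timedelta:
    # Everything written since the last good backup would be lost on restore.
    return now - last_successful_backup


def rpo_breached(last_successful_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    # True when the exposure window is longer than the agreed RPO.
    return data_at_risk_window(last_successful_backup, now) > rpo


if __name__ == "__main__":
    rpo = timedelta(hours=24)              # hypothetical RPO for the process
    now = datetime(2010, 1, 3, 6, 0)
    last_ok = datetime(2010, 1, 1, 2, 0)   # one nightly run failed silently
    if rpo_breached(last_ok, now, rpo):
        window = data_at_risk_window(last_ok, now)
        print(f"ALERT: last good backup is {window} old, RPO is {rpo}")
```

Running such a check on a schedule, rather than relying on someone to read the backup tool's log, is one possible way to avoid depending on an administrator noticing a failed job.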
Event Management
When disaster strikes, the following activities are carried out to manage the disaster event:

Emergency Response
The main objectives of emergency response are twofold:
i. to ensure safety of people and protection of assets and
ii. to monitor and coordinate emergency response efforts.
Emergency response activities include notification and evacuation of building occupants, notifying public safety authorities (e.g., police, fire), medical treatment, and damage mitigation. These emergency response activities are carried out with the help of a Command Post and Emergency Operations Center (EOC). A command post is set up at the disaster site to provide overall direction and unify all emergency response efforts at the disaster site. The EOC is set up (in case of major disasters) away from the disaster site to monitor and coordinate emergency response efforts at organizational level. The EOC should be represented by top management with the authority and experience necessary to facilitate flexible and innovative decision making required in disaster scenarios. Emergency response procedures/guidelines should be a separate document that is maintained and readily available on site in the case of an emergency. Business continuity plans, on the other hand, are maintained off site.

Coordination with Public Authorities
After initiating emergency response, public safety authorities (e.g., police, fire, Emergency Management Services) should be notified of the disaster. These agencies take control of the command post once they arrive at the disaster scene. A senior employee should be tasked with the responsibility of coordinating with the public authorities.

the first phase, preliminary damage assessment is done to determine what recovery options need to be activated and to facilitate the CMT decision with regard to invocation of the BC plan. In the second phase, the damage assessment activity is a comprehensive evaluation of damage to equipment, facilities, and records. The damage assessment activity is handled by the damage assessment and salvage team. This team should be multidisciplinary in nature and composed of a mechanic, electrician, plumber, medical assistant, and infrastructure/information technology personnel.

Salvage Operations at Primary Site
The insurance companies with which the organization has policies must be notified of the disaster and given an opportunity to investigate the facilities before salvage operations begin at the primary site. The salvage operations deal with salvaging hardware, facilities, and documentation and require specialized skills. Different types of equipment require different salvage treatments. Therefore, the salvage procedures should provide general guidelines. The damage assessment and salvage team should have the contact information of and access to salvage experts. Proper documentation of salvage operations should be maintained to comply with insurance and legal requirements, if any.

Operations from Secondary Site
While salvage operations at the primary site are going on, critical business operations should be resumed at the secondary site by following the predefined disaster recovery procedures. The following are the different types of alternate secondary sites:
i. Cold site: Alternate site devoid of any resources, but equipped with air-conditioning, electrical wiring, uninterruptible power supply (UPS), and communication facilities.
ii. Warm site: Partially equipped alternate site.
iii. Hot site: Alternate site with equipment and resources to recover the critical business functions in case of a disaster.
Insurance Settlement
The records and documentation that are generated during the insurance cost tracking task of the event management phase should be used to submit insurance claims. The best time to do such a review is preceding a peak business period.

depending on their line of business, complexity of processes, and regulatory requirements. Organizations need to weigh the pros and cons of alternative recovery strategies and the costs involved before selecting a recovery option for implementation.
Recovery options*
*Source: Developing Recovery Strategy for Your Business Continuity Plan by Dr. Goh Moh Heng. 2005 (with permission)
enterprise-wide BCM program has been initiated for senior management consideration.
Level 4: Enterprise awakening. Senior management understands the strategic importance of an enterprise-wide BCM program and is committed to it. BCM policy, practices, and processes are being standardized across the enterprise. All critical business functions have been identified, and continuity plans for their protection have been developed, tested, and maintained.
Level 5: Planned growth. All business functions have been identified, and continuity plans for their protection have been developed, tested, and maintained. Senior management has actively participated in crisis management exercises. Regular communication and training programs exist to sustain a high level of business continuity awareness. A multiyear plan has been adopted to ensure organizational resilience and a mature BCM program across the enterprise.
Level 6: Synergistic. Sophisticated business protection strategies are formulated and tested successfully by all business units. Cross-functional coordination has enabled the business units to develop and successfully test upstream and downstream integration of their business continuity plans. Scrupulous adherence to the company’s change control mechanisms and continuous process improvements enable a high state of disaster preparedness, even though the business environment continues to change radically and rapidly.
Levels one through three represent organizations that have not yet completed the necessary program basics (covering senior management commitment, professional support, and governance) needed to launch a sustainable enterprise BCM program. Levels four through six represent the evolutionary path of the maturing enterprise BCM program. Organizations need to maintain the momentum of the BCM program to ensure that they do not fall back from a higher maturity level to a lower maturity level.
Embedding BCM in the organization’s culture is a key requirement of BS 25999. This enables BCM to become part of the organization’s core values and instills confidence in all stakeholders in the ability of the organization to cope with disruptions. Achieving higher BCM maturity levels will result in an enhanced BCM culture.

ACKNOWLEDGEMENTS
The author wishes to thank Brian V. Cummings and P V S Murthy for comments and feedback on an earlier draft of this paper.

REFERENCES
Chartered Management Institute. UK’s 2009 business continuity management survey. Available from: http://www.managers.org.uk/download_1.aspx?id=10:2224&fid=10:2615&file=client_files/user_files/Woodman_31/Research files/BCM09 Final Report 09 March.pdf
Heng, G.-M. (2005). Developing recovery strategy for your business continuity plan.
Business Continuity Maturity Model of Virtual Corporation. (2003). Available from: http://www.virtual-corp.net/