Risk Management: ISEC 4340

The document discusses risk management concepts including risk, threats, vulnerabilities, losses, and the risk management process of threat assessment, vulnerability assessment, impact assessment, and risk mitigation strategy development. It provides examples and definitions for each step of the risk management process.

Risk Management

ISEC 4340
What Is Risk?

• Risk: The likelihood that a loss will occur. Losses occur when a threat exploits a vulnerability.

• Threat: Any activity that represents a possible danger.

• Vulnerability: A weakness.

• Loss: A loss results in a compromise to business functions or assets.

   • Tangible

   • Intangible
Risk management process
• Threat assessment: what could the threat be? It could be, for example, a virus or a hacker.
• Vulnerability assessment: what are the weaknesses?
• Impact assessment: what is the impact?
• Risk mitigation strategy development: how can the risk be mitigated?
Threat Assessment
• Process of formally evaluating the degree of threat to an information
system or enterprise and describing the nature of the threat.
• Threats are the tactics, techniques, and methods used by threat actors that
have the potential to cause harm to an organization's assets.
• Example:
   • Threat: an attacker performs an SQL injection.
   • Vulnerability: an unpatched system (a missing update).
   • Asset: a web server (for example, the college website, reached through its URL).
   • Consequence: customers' private data is stolen.
• The process of threat assessment begins with the initial assessment of a
threat. It is followed by a review of the threat's seriousness and the creation of
plans to address the underlying vulnerability. Finally, there is a follow-up
assessment and plans for mitigation.
Vulnerability assessment
• The vulnerability assessment analyzes how vulnerable, susceptible,
and exposed a business or system is to a particular threat.
• It is useful to know whether a system is vulnerable to a threat that has a
90% chance of occurring, a 50% chance of occurring, or a 1% chance
of occurring. Vulnerability and the likelihood of the event are
closely related, and the results are used as inputs to the impact
assessment.
• A server that is outside the firewall is far more vulnerable to external
attacks than a server that is inside the firewall.
Impact assessment
• The impact assessment analyzes how great or small the impact of a
threat occurrence will be on the business or system.
• An earthquake has an enormous impact on a business that is in or
near the epicenter of the quake; it has a lesser impact on businesses
further from the epicenter.
Risk mitigation strategy development
• You can reduce, avoid, accept, or transfer risks. Each strategy comes
with an associated cost. It’s far more expensive in many cases to
completely avoid a risk than it is to reduce the impact of the risk.
• Most businesses are more likely to build in state-of-the-art fire
suppression systems than to construct a building with absolutely
no flammable materials. The cost of building a completely fireproof
building is far higher than that of installing a high-quality fire suppression system.
• Some risks are worth accepting: we drive cars, we cross busy
intersections on foot, we eat unhealthy food.
What is Risk Management?

• It is a process to:
   • Identify all relevant risks
   • Assess / rank those risks
   • Address the risks in order of priority
   • Monitor risks & report on their management

Risk Management – why do we need it?

• Promotes good management
• May be a legal requirement, depending upon industry or sector
• Resources available are limited – therefore a focused response to Risk Management is needed
Risk Management Elements/Process
• Assess risks
• Identify risks to manage
• Select controls
• Implement and test controls
• Evaluate controls
Risk Identification Process

1. Identify threats
2. Identify vulnerabilities
3. Estimate the likelihood of a threat exploiting a vulnerability
Organization-wide Risk Management

• Managing information system-related security and privacy risk is a complex
undertaking that requires the involvement of the entire organization, from senior
leaders providing the strategic vision and top-level goals and objectives for the
organization, to mid-level leaders planning, executing, and managing projects,
to individuals developing, implementing, operating, and maintaining the systems
supporting the organization’s missions and business functions.
Techniques of Risk Management

Avoidance

Transfer

Mitigation

Acceptance

Residual Risk

Cost-Benefit Analysis
Risk Avoidance
• Risk avoidance is a way for businesses to reduce their level of risk by not
engaging in certain high-risk activities. While it’s impossible to eliminate all
risks, a risk avoidance strategy can help prevent some losses from happening.
• An example: A retailer discontinues collection of personal data such as
customer information, ages and telephone numbers to avoid the risk
that such data would be stolen in an information security incident.
• The key advantage of this technique is that it’s the most successful
method of mitigating risk. You eliminate the possibility of suffering
losses by stopping the threat altogether.
Risk Transfer
• You can transfer all or part of the risk to a third party. The two main
types of transfer are insurance and outsourcing. For example, a
company may choose to transfer the risk of a collection project by
outsourcing the project.
• The advantage here is that you can take some or most of the burden
of a risk and share it with a third party.
Mitigate the Risk

• Risk mitigation is the process of planning for disasters and
having a way to lessen negative impacts.
• Although the principle of risk mitigation is to prepare a business
for all potential risks, a proper risk mitigation plan will weigh the
impact of each risk and prioritize planning around that impact.
Risk mitigation focuses on the inevitability of some disasters
and is used for those situations where a threat cannot be
avoided entirely. Rather than planning to avoid a risk, mitigation
deals with the aftermath of a disaster and the steps that can be
taken prior to the event occurring to reduce adverse and,
potentially, long-term effects.
Risk Mitigation
Acceptance and Residual Risk
• Risk acceptance, also known as risk retention, is choosing to face a risk. It is impossible to
profit in business or enjoy an active life without choosing to take on risk.
• Residual risk: Risk treatments don’t necessarily reduce risks to zero. The risk remaining after
treatment is known as residual risk.
• Residual risk is the level of risk remaining after applying risk controls.
Best Practices for Managing Threats

• Create a security policy.
• Purchase insurance.
• Use access controls.
• Use automation.
• Include input validation.
• Provide training.
• Use antivirus software.
• Protect the boundary.
Risk Analysis
Single Loss Expectancy (SLE)
• Asset Value (AV)
• × Exposure Factor (EF): percentage loss in asset value if a compromise occurs
• = Single Loss Expectancy (SLE): expected loss in case of a compromise

Annualized Loss Expectancy (ALE)
• SLE
• × Annualized Rate of Occurrence (ARO): annual probability (or expected yearly frequency) of a compromise
• = Annualized Loss Expectancy (ALE): expected loss per year from this type of compromise
Risk Analysis calculation

Asset Value (AV): $100,000
Exposure Factor (EF): 80%
Single Loss Expectancy (SLE) = AV × EF: $80,000
Annualized Rate of Occurrence (ARO): 50%
Annualized Loss Expectancy (ALE) = SLE × ARO: $40,000
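
The calculation above can be reproduced with a short Python sketch. The function names are illustrative choices and do not come from the slides; the figures are the ones in the table.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = AV x EF, where EF is a fraction between 0 and 1."""
    return asset_value * exposure_factor


def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x ARO, where ARO is the expected number of occurrences per year."""
    return sle * aro


# Figures from the slide: AV = $100,000, EF = 80%, ARO = 50%
sle = single_loss_expectancy(100_000, 0.80)
ale = annualized_loss_expectancy(sle, 0.50)
print(f"SLE = ${sle:,.0f}")
print(f"ALE = ${ale:,.0f}")
```

An ARO above 1 is also valid in this sketch: an event expected four times a year has an ARO of 4.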


Class Work
• Calculate the SLE for an asset valued at 40,000 OMR with an
exposure factor of 70%. If an attack happens once every quarter, what is the
ALE?
Qualitative Risk Assessment Benefits
• Uses the opinions of experts
• Is easy to complete
• Uses words that are easy to express and understand
Categories of Risks
• There are multiple ways in which risks can be categorized
• The final categories used will depend upon each organization’s or unit’s circumstances
• The goal is to cluster risks into standard, meaningful & actionable groupings
• What follows is one example of a type of categorization
Categories of Risks

• Financial
• Reduction in funding
• Failure to safeguard assets
• Poor cash flow management
• Lack of value for money
• Fraud / theft
• Poor budgeting
Categories of Risks
• Operational
• These risks result from failed or inappropriate policies, procedures,
systems or activities e.g.
• Failure of an IT system
• Poor quality of services delivered
• Lack of succession planning
• Health & Safety risks
• Staff skill levels
• No process to track contractual commitments
Categories of Risks
• Reputational
• Organization engages in activities that could threaten its good name
• Through association with other bodies
• Staff / members acting in a criminal or unethical way
• Poor stakeholder relations
Risk Register

• a) What is it?
• b) Components
• c) How to report on it
Risk Register
• A Risk Register is a management tool used to record relevant details
relating to risks.
• It is a database of information on risks.
• Best kept simple to begin with!
Business Impact Analysis

ISEC 4340
Learning Objective and Key Concepts

Learning Objective
• Perform a business impact analysis.

Key Concepts
• Purpose of BIA
• Critical success factors of BIA
• Steps involved in implementing a BIA
• BIA best practices
What Is a Business Impact Analysis?

• A study used to identify the impact that can result from disruptions in the business

• Focuses on the failure of one or more critical IT functions
Consider the Impact
• The BIA will identify the operational and financial impacts resulting from the
disruption of business functions and processes. Impacts to consider include:
• Lost sales and income
• Delayed sales or income
• Increased expenses (e.g., overtime labor, outsourcing, expediting costs, etc.)
• Regulatory fines
• Contractual penalties or loss of contractual bonuses
• Customer dissatisfaction or defection
• Delay of new business plans
Four purposes of the BIA

• Obtain an understanding of the organization’s most critical objectives
• Inform a management decision on maximum tolerable outage for each function
• Provide the resource information from which an appropriate recovery strategy
can be determined
• Outline dependencies that exist both internally and externally
Understanding impact criticality

• Criticality categories
• Category 1: Critical functions (mission-critical)
• Category 2: Essential functions (vital)
• Category 3: Necessary functions (important)
• Category 4: Desirable functions (minor)
Understanding impact criticality

• Critical functions (mission-critical)

• Mission-critical business processes and functions are those that have
the greatest impact on your company’s operations and the greatest need for
recovery
• An outage of a mission-critical network, system, or application
would cause extreme disruption to the business
Understanding impact criticality
• Essential functions (vital)

• Fall somewhere between mission-critical and important

• Vital systems might include those that interface with mission-critical systems

• Necessary functions (important)

• Systems may include e-mail, Internet access, databases, and other business
tools
Understanding impact criticality
• Desirable functions (minor)

• Deal with small, recurring issues or functions

• Need to be recovered over the longer term

• Cause minor disruptions to the business and can easily be restored
Recovery Time Requirements
• Maximum tolerable downtime (MTD):
• the maximum time a business can tolerate the absence or unavailability of a particular
business function. The higher the criticality, the shorter the MTD is likely to be
• Downtime consists of two elements:
• the systems recovery time and the work recovery time
• Recovery Time Objective (RTO):
• the time available to recover disrupted systems
• Work Recovery Time (WRT):
• the second segment of the MTD
• Recovery Point Objective (RPO):
• the amount or extent of data loss that can be tolerated
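
Since downtime is described as systems recovery time followed by work recovery time, a recovery plan can be checked against the MTD with a small sketch like the one below. The function names and the hour values are hypothetical and chosen only for illustration.

```python
def total_downtime(rto_hours: float, wrt_hours: float) -> float:
    """Downtime = systems recovery time (RTO) + work recovery time (WRT)."""
    return rto_hours + wrt_hours


def fits_within_mtd(rto_hours: float, wrt_hours: float, mtd_hours: float) -> bool:
    """True when the two recovery segments together stay within the MTD."""
    return total_downtime(rto_hours, wrt_hours) <= mtd_hours


# Hypothetical values: 4 hours to recover systems, 2 hours to recover work,
# against a maximum tolerable downtime of 8 hours.
print(total_downtime(4, 2))
print(fits_within_mtd(4, 2, 8))
```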
Recovery Time Objective (RTO)
• The Recovery Time Objective (RTO) is the targeted duration of time, and the
service level, within which a business process must be restored after a disaster
(or disruption) in order to avoid unacceptable consequences associated with a
break in business continuity.
Recovery point objective (RPO)

• A recovery point objective (RPO) is the maximum acceptable amount of data
loss measured in time. It is the age of the files or data in backup storage
required to resume normal operations if a computer system or network
failure occurs.
MTD - MAO
• The maximum tolerable period of disruption (MTPOD) is also known as maximum tolerable
downtime (MTD), maximum tolerable outage (MTO), or maximum allowable outage
(MAO).
Methodological steps for developing a business impact
analysis
Define the boundaries of the BIA
• The starting point prior to the development of the BIA is the identification of the scope.
• Top management should have identified the scope, considering the products and services of
the organization. Several key criteria could be considered to decide the products and services
of the organization that need to be protected to assure continuity; including:
• a) market pressure,
• b) specific company sites,
• c) products and services profitability.

• Once the scope has been established, it is recommended that its boundaries be
outlined and precisely defined in terms of the activity with which they begin and the one
with which they end.
Identify activities that support the scope
• An activity is considered a process or set of processes undertaken by an
organization (or on its behalf) that produces or supports one or more
products or services.

• When the scope is delimited, the organization should identify all the activities
involved in the scope that directly contribute to the generation of its
products and services. A good tool that helps in this step is a flowchart.
Assess Financial and operational impacts
• The next step is to assess the financial and operational impacts that would
affect the organization in the event of a disruption of the activities identified
in the preceding step.

• The financial impact assessment is performed before carrying out the
operational impact assessment.
The financial impact assessment
• This measures the extent and severity of the organization’s financial losses.

• A financial impact assessment is carried out for each activity. The question to
be asked is “What would the magnitude and severity of financial loss be if the
activities were interrupted following a disruption?” The losses are estimated
on a daily basis.
Financial losses for a specific scope.
The second part of the financial impact assessment ranks each impact
by severity level, based on its monetary loss value. The following
scale is recommended:
• Severity level 0: No impact
• Severity level 1: Minor impact
• Severity level 2: Intermediate impact
• Severity level 3: Major impact
Operational Impacts
Identify Critical Activities
• This step identifies the activities that have to be performed in order
to deliver the key products and services, which enable an organization
to meet its most important and time sensitive objectives. The
financial and operational impact rankings assigned in step three
provide a basis for identifying critical activities. An activity is
considered critical if any of the following is true:
Identify critical activities (continued)
• A severity level of 2 or 3 is assigned to its financial impact;
• A ranking of high is assigned to at least three of its operational impacts;
• A ranking of high is assigned to at least two of its operational impacts and a ranking of
highest is assigned to at least one;
• A ranking of highest is assigned to at least two of its operational impacts.
• The critical activities listed in the next figure were obtained by applying the above
selection criteria to the impact rankings of business activities presented in figures two
and three.
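
The four selection criteria above can be expressed as a single predicate. The sketch below is only an illustration: the function name is hypothetical, and the ranking labels other than "high" and "highest" ("low", "medium") are assumptions, since the slides do not list the full operational ranking scale.

```python
def is_critical(financial_severity: int, operational_rankings: list[str]) -> bool:
    """Apply the four criteria: an activity is critical if any one of them holds."""
    high = operational_rankings.count("high")
    highest = operational_rankings.count("highest")
    return (
        financial_severity >= 2            # severity level 2 or 3
        or high >= 3                       # at least three "high" rankings
        or (high >= 2 and highest >= 1)    # two "high" plus one "highest"
        or highest >= 2                    # at least two "highest" rankings
    )


# Hypothetical activities: (financial severity, operational impact rankings)
print(is_critical(1, ["high", "high", "highest", "low"]))
print(is_critical(1, ["high", "medium", "low"]))
```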
Critical Assets
Assess MTPDs and prioritize critical activities:
• “The maximum tolerable period of disruption (MTPD) is the duration after which the viability of the
organization will be irrevocably threatened if product and service delivery cannot be resumed”.

• The estimates of MTPD can be based on either financial or operational impacts. The personnel
responsible for assessing the financial and operational impacts are asked the following question:
“What is the maximum period of time that can be tolerated for this process based on the financial
and operational impact levels?” Imagine that a financial loss of US $25,000 per day becomes
unacceptable once the cumulative loss exceeds US $50,000.

• The MTPD is therefore two days, since the financial losses will exceed US $50,000 if the
disruption continues for a longer period of time. This example assumes that the operational impacts
are insignificant relative to the financial losses.
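
The arithmetic of this example can be sketched as follows. The function name is hypothetical, and the sketch assumes, as the example does, a constant daily loss and negligible operational impacts.

```python
def mtpd_days(daily_loss: int, max_tolerable_loss: int) -> int:
    """Largest whole number of days whose cumulative loss stays within the tolerable limit."""
    if daily_loss <= 0:
        raise ValueError("daily_loss must be positive")
    return max_tolerable_loss // daily_loss


# Figures from the example: US $25,000 lost per day, US $50,000 tolerable in total
print(mtpd_days(25_000, 50_000))  # 2
```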
Assess MTPDs and prioritize critical activities
(continued)
• Usually the analysis requires revising the financial and operational impacts of the disruption
to estimate the MTPD.
• Once the MTPDs are calculated, a priority for their recovery should be established. A critical
activity that has a shorter MTPD compared with another critical activity is assigned a higher
recovery priority.
• Given today’s connectivity and the dependency on information technology, MTPDs
tend to shrink in duration and will probably be close to zero in the near
future.
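
The prioritization rule (a shorter MTPD means a higher recovery priority) amounts to sorting the critical activities by MTPD. The activity names and MTPD values in this sketch are hypothetical.

```python
# Hypothetical critical activities with their MTPDs in days
mtpd_by_activity = {
    "Order processing": 2,
    "Payroll": 7,
    "Customer support": 4,
}

# Sort ascending by MTPD: the shortest MTPD gets recovery priority 1
ranked = sorted(mtpd_by_activity.items(), key=lambda item: item[1])
priorities = {name: rank for rank, (name, _) in enumerate(ranked, start=1)}

for name, rank in priorities.items():
    print(f"{rank}. {name} (MTPD: {mtpd_by_activity[name]} days)")
```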
MTPDs and recovery priorities
Estimate the resources that each critical activity will
require for resumption
• In this step, the organization needs to estimate the resources required for resumption
at the level of each critical activity. Previously, the firm should have identified the
minimum level at which each critical activity needs to be performed upon
resumption.
• The sources that a business can use to determine the minimum levels of performance
acceptable are the contractual agreements and service level agreements for the key
products and services involved in the scope. The minimum resources needed for each
activity can be classified as:
• (a) critical IT systems and applications, and
• (b) critical non IT resources.
• This second category can be subdivided into: ‘physical areas’, ‘human competences’,
‘equipment’ and ‘documents’.
Critical activities and resources needed for
resumption
Determine RTOs for critical activities
• “The recovery time objective (RTO) is the target time set for resumption of product, service
or activity delivery after an incident” (Fullick, 2013). The RTO, which is the length of time
between a disruptive event and the recovery of resources, indicates the time available to
recover disrupted resources. The MTPD value expresses the maximum limit for the RTO
value.

• Exercising the business continuity management arrangements enables the organization to
validate its RTOs and, therefore, to take corrective actions to reduce them. Cross-functional
teams involved with the critical activities are tasked with estimating the RTOs.
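
Because the MTPD is the upper limit for the RTO, estimated RTOs can be validated with a simple comparison. The sketch below uses hypothetical activity names and hour values.

```python
def rto_violations(targets: dict[str, tuple[float, float]]) -> list[str]:
    """Return the activities whose RTO exceeds their MTPD.

    targets maps an activity name to (rto_hours, mtpd_hours).
    """
    return [name for name, (rto, mtpd) in targets.items() if rto > mtpd]


# Hypothetical estimates in hours: (RTO, MTPD)
targets = {
    "Order processing": (24, 48),
    "Payroll": (200, 168),
}
print(rto_violations(targets))
```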
RTOs and RPOs for critical activities
Identify all dependencies relevant to critical activities
• Identify all dependencies relevant to critical activities: in this step the organization has to
“consider all dependencies relevant to the critical activities, including suppliers and
outsource partners” (Alexander, 2009) The critical activities that have been considered
usually have some vital inputs that are provided by some other company processes or by
external suppliers or outsource partners. The internal processes that supply important
inputs to critical activities have also to be considered as critical activities. In the case of
external suppliers and outsource partners, contractual agreements requiring them to have
a BCMS set up and managed should be in place. It is important to bear in mind that every
company is only as a resilient as its weakest link in the supply chain.
Determine recovery point objectives for critical
activities:
• The recovery point objective (RPO) is the amount of data loss, measured in time,
that can be tolerated because of a business disruption. RPO is measured as
the time between the last data backup and the disruptive event. In the BIA
process, RPO is determined for each application by asking the critical activity
owners the following question: “What is the tolerance, in terms of length of
time, to loss of data that may occur between any two backup periods?” The
response to this question indicates the values of RPO. Figure seven gives
an example of RPOs for certain critical activities. In this methodology, the RPO
always has to be less than the RTO.
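
Since RPO is measured as the time between the last backup and the disruptive event, the actual data-loss window of an incident can be compared with the agreed RPO. The function names and timestamps below are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disruption: datetime) -> timedelta:
    """Time between the last data backup and the disruptive event."""
    if disruption < last_backup:
        raise ValueError("disruption must not precede the last backup")
    return disruption - last_backup


def meets_rpo(last_backup: datetime, disruption: datetime, rpo: timedelta) -> bool:
    """True when the data lost falls within the tolerated RPO."""
    return data_loss_window(last_backup, disruption) <= rpo


# Hypothetical incident: backup at 02:00, disruption at 08:00 the same day
backup = datetime(2024, 1, 1, 2, 0)
event = datetime(2024, 1, 1, 8, 0)
print(data_loss_window(backup, event))
print(meets_rpo(backup, event, timedelta(hours=4)))
```

In this hypothetical case six hours of data are lost, so a four-hour RPO would call for more frequent backups.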
Information gathering methods
• Obtaining the information needed for the BIA from relevant areas of the organization can be a complex and
frustrating process. A structured methodological strategy should be developed, considering the magnitude of the
scope. Three methods are recommended in the technical literature (Graham, Kaye, 2006) (Hiles, Barnes, 2001):
• Survey: this method uses a set of questions that are prepared in advance and sent to each activity owner.
A survey can cover a vast number of respondents. However, this method has two main constraints: (1)
The accuracy of the responses becomes a problem if the survey lacks internal consistency and reliability.
(2) Survey responses may not be returned within the time allowed for this purpose.
• Interview: in this method the BIA information is collected by personally interviewing the activity owners. The
questions can be tailored to each particular activity concerned. Although this method is very
accurate and minimizes the possibility of misinterpreting the questions, it is more expensive than the survey
approach and involves the additional effort of planning, scheduling, and conducting the interview.
• Workshop: this method, which uses group dynamic techniques, allows a strategically chosen group of people
to work together to provide the BIA information needed. Because of group dynamics, a large amount of data is
generated in a short period of time with this method. This technique also allows the activity owners to have a
systematic view of the BIA process and to clear up any misunderstanding regarding the BIA process. In addition,
an important side effect associated with this method is the teamwork spirit it helps to create among
owners of critical activities.
• The choice of the appropriate method for gathering BIA information seems to be influenced by its cost,
efficiency, and by the quality of the information. Sometimes the best methodological strategy is to combine
these three techniques.
Business impact analysis project management
• The BIA methodology is based on a task force approach. All the steps of the
methodology are performed by a cross-functional group made up of the owners of
critical activities. To put the methodology into action, someone at the tactical level
with the appropriate support should be appointed as project manager. The project
manager becomes responsible for the resources that have been allocated to the BIA
project.
• Moreover, someone at the strategic level, with appropriate seniority and authority
among other responsibilities, should be accountable for supporting the BIA process
and ensuring that the BIA methodology is implemented in the most effective and
efficient manner. It is important to understand that a BIA is developed within an
organizational context.
• It is highly probable that there will be organizational obstacles that could prevent a
BIA project from accomplishing its goals.
• If external consultants are involved, the project manager should ensure that the
consultants work closely together with the critical activity owners.
BIA Best Practices
• Start with clear objectives
• Maintain focus on objectives
• Use a top-down approach
• Vary data collection methods

BIA Best Practices (cont.)
• Plan interviews and meetings in advance
• Avoid the quick solution
• Use normal project management methods
• Consider the use of technology resources
Cost-Benefit Analysis
ISEC 4340
Performing a Cost-Benefit Analysis
1. Identify the losses you expect before, or without, a countermeasure
2. Identify the losses you expect after implementing the countermeasure
• Calculating projected benefits:
Loss Before Countermeasure - Loss After Countermeasure = Projected Benefits
• Determining the value of the countermeasure:
Projected Benefits - Cost of Countermeasure = Countermeasure Value
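
The two formulas above can be combined in a short sketch. The function names and the dollar figures are hypothetical and chosen only to illustrate the arithmetic; all three amounts should cover the same period (for example, one year).

```python
def projected_benefits(loss_before: float, loss_after: float) -> float:
    """Loss Before Countermeasure - Loss After Countermeasure."""
    return loss_before - loss_after


def countermeasure_value(loss_before: float, loss_after: float, cost: float) -> float:
    """Projected Benefits - Cost of Countermeasure."""
    return projected_benefits(loss_before, loss_after) - cost


# Hypothetical yearly figures: $50,000 expected loss without the countermeasure,
# $10,000 with it, and a countermeasure cost of $15,000.
benefits = projected_benefits(50_000, 10_000)
value = countermeasure_value(50_000, 10_000, 15_000)
print(f"Projected benefits: ${benefits:,}")
print(f"Countermeasure value: ${value:,}")
```

A positive countermeasure value suggests the countermeasure pays for itself; a negative value suggests it costs more than the loss it prevents.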
Cost Benefit Analysis – Class work
• Imagine you have a database server that is hosting a large database. Backups are completed regularly. However,
you conducted a risk assessment that determined that backup copies are not stored offsite. All of the backup
tapes are stored in the same room as the backup server.
• The risk is that a fire could destroy this server and all the backup tapes. By storing a copy of the backup tapes at
an alternate location, you can eliminate the risk.
• You need to identify the value of the data. If this is a primary database for the organization, the value could
easily be in the millions of dollars. For this example, imagine that the value is $1 million. A complete loss before
the control is $1 million.
• An external company can pick up your tapes weekly and store them at an alternate location. The company can
also rotate tapes in and out based on your needs. Imagine the cost for this service is $100 a month.
• If you subscribe to this service, the most you can lose is seven days’ worth of data. If a fire destroys your
building right before the most recent backup tape is picked up, you will lose the last seven days. Imagine that the
value of this week’s worth of data is $10,000.
• Perform the cost-benefit analysis and make a recommendation.
Cost benefit analysis

• Hackers regularly try to attack an online bookselling company, and on average 2.6 such attacks
succeed every year. Each successful attack results in a loss of about $10,000 to the
company. The current firewall is outdated, and a consulting company has suggested replacing
it with a new one.

• Company XYZ has proposed a firewall at a cost of $9,000 and a maintenance cost of $5,000. The
estimated useful life of the firewall is 5 years. The company guarantees that the chance that an
attacker breaks through the firewall is reduced to 30%.

• Conduct a cost-benefit analysis and make a recommendation.


CBA Report Elements
• Recommended countermeasure
• Risk to be mitigated
• Annual projected benefits
• Initial costs
• Annual or recurring costs
• A comparison of the costs and benefits
• Recommendation
Class work
• An investigation by the Information Security department has shown that the
cost of rectifying a website damaged by a hacker is about Rs. 200k per
incident. Available records (over the last ten years) show that such hacking
activity has happened about five times during this period for comparable
businesses. You have been asked to evaluate a security solution consisting of
   • Two application-level firewalls (costing Rs. 20k each),
   • One IPS/IDS appliance (costing Rs. 10k).
• The expected lifetime of the solution is 5 years - the cost is capitalised over 5
years.
• All security systems carry a simplified 20% (of total cost) charge for
‘installation, support, maintenance, and management’ per year.
Asset Value
• Asset Value (AV) – includes the following:
   • cost of buying/developing hardware, software, service
   • cost of installing, maintaining, upgrading hardware, software, service
   • cost to train and re-train personnel
• Exposure – percentage loss that would occur from a given vulnerability being
exploited
Cost Benefit Analysis
• Also known as an economic feasibility study, it is a quantitative decision-making process that:
   • determines the loss in value if the asset remained unprotected
   • determines the cost of protecting an asset
   • helps prioritize actions and spending on security …
Cost Benefit Analysis Example
Net Risk Reduction Benefit (NRRB) = ALE (prior) - ALE (post safeguards) - ACS (annual cost of safeguards)
CBA Example: Determining NRRB
Your organization has decided to centralize anti-virus support
on a server that automatically updates virus signatures on users’ PCs. When
calculating the risk due to viruses, the annualized loss expectancy (ALE) is $145,000. The
cost of this anti-virus countermeasure is estimated to be $24,000 a year, but it
will lower the ALE to $65,000. Is this a cost-effective countermeasure? Why or
why not?
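
One way to work this example is sketched below, assuming the usual definition of the net risk reduction benefit: the ALE before the safeguard, minus the ALE after it, minus the annual cost of the safeguard (ACS). The function name is illustrative.

```python
def net_risk_reduction_benefit(ale_prior: int, ale_post: int, acs: int) -> int:
    """NRRB = ALE(prior) - ALE(post safeguards) - ACS (annual cost of safeguards)."""
    return ale_prior - ale_post - acs


# Figures from the example: ALE falls from $145,000 to $65,000 at a yearly cost of $24,000
nrrb = net_risk_reduction_benefit(145_000, 65_000, 24_000)
print(f"NRRB = ${nrrb:,}")  # NRRB = $56,000
print("Cost-effective" if nrrb > 0 else "Not cost-effective")
```

Under this definition the countermeasure is cost-effective, because the reduction in expected loss ($80,000) exceeds its annual cost ($24,000).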
Quantitative Analysis
A widget manufacturer has installed new network servers, changing its network from
a peer-to-peer (P2P) network to a client/server-based network. The network consists of 200 users who make an
average of $20 an hour, working on 100 workstations. Previously, none of the
workstations involved in the network had anti-virus software installed.
This was because there was no connection to the Internet and the
workstations did not have USB or disk drives, so the risk of
viruses was deemed minimal. One of the new servers provides a broadband
connection to the Internet, which employees can now use to send and receive email
and surf the Internet.
Example: Determining ALE to Occur from Risks (cont.)
• One of the managers read in a trade magazine that other widget companies
have reported an annual 75% chance of viruses infecting their network, and it
may take up to 3 hours to restore the system. A vendor will sell licensed
copies of antivirus software for all servers and the 100 workstations at a cost
of $4,700 per year. The company has asked you to determine the annual loss
that can be expected from viruses and determine if it is beneficial in terms of
cost to purchase licensed copies of anti-virus software.
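
A possible way to work this example is sketched below. It rests on two assumptions that the slides do not state: all 200 users are idle for the full 3 hours of an outage (the worst case), and the anti-virus software removes the loss entirely. The variable names are illustrative.

```python
USERS = 200                # users on the network
HOURLY_WAGE = 20           # average pay in dollars per hour
RESTORE_HOURS = 3          # worst-case time to restore the system
ARO = 0.75                 # reported annual chance of a virus infection
ANTIVIRUS_COST = 4_700     # yearly licence cost in dollars

# Assumption: every user is unproductive for the whole restore time
sle = USERS * HOURLY_WAGE * RESTORE_HOURS
ale = sle * ARO

# Assumption: the anti-virus software prevents the loss entirely
net_saving = ale - ANTIVIRUS_COST

print(f"SLE = ${sle:,}")
print(f"ALE = ${ale:,.0f}")
print(f"Net yearly saving = ${net_saving:,.0f}")
```

Under these assumptions the expected annual loss exceeds the licence cost, so buying the software would be beneficial in terms of cost.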
Other feasibilities

• Organizational feasibility – A firewall may be good from a security point of
view, but it may prevent the free flow of data

• Behavioral feasibility – user acceptance

• Technical feasibility

• Political feasibility
Disaster Recovery Plan
Introduction
• Business continuity planning (BCP) is a methodology used to create and validate a plan for
maintaining continuous business operations before, during, and after disasters and disruptive events.

• Disaster recovery is a part of business continuity and deals with the immediate impact of an event.
Recovering from a server outage, a security breach, or a hurricane all falls into this category.

• Disaster recovery involves stopping the effects of the disaster as quickly as possible and addressing
the immediate aftermath. This might include shutting down systems that have been breached,
evaluating which systems are impacted by a flood or earthquake, and determining the best way to
proceed.

• Once the effects of the disaster or event have been addressed, business continuity activities typically
begin.
Components of Business

• The components include people, process, and technology.
Technology is implemented by people using specific
processes. Technology is only as good as the people who
designed and implemented it, and the processes
developed to utilize it.
People in DR planning

• People are the ones who do the actual planning and implementation of a
disaster plan.
• Every company is different, and therefore, every DR planning process will
have to be different. A small retail outlet’s IT planning for DR will be very
different from a college, hospital, accounting firm, or a manufacturing facility.
• According to a survey completed in 2010, human error is responsible for 40%
of all data loss, as compared to just 29% for hardware or system failures.
People are responsible for designing, implementing, and monitoring
processes intended to safeguard data. However, people make mistakes every
single day.
People in DR planning
• Another key aspect to people in DR planning is that it’s critical to remember
that if a disaster hits your company, people will have a wide variety of
responses. Some people, especially those with emergency preparedness
training, will rise to the occasion and start taking effective action through
leadership roles. As was seen in many natural disaster responses over the
years, people are often without food, shelter, power, or cellular service.
Question

• Considering Modern College of Business and Science:

1. What is the role of lecturers in disaster recovery?

2. What is the role of the IT team in disaster recovery?

3. What is the role of the database team in disaster recovery?


Process in DR planning
• Process in DR planning has two phases: the planning phase and the implementation phase.

• The processes your company uses to run the day-to-day business are key to the long-term success
of the business. These processes are developed (and hopefully documented) in order to manage
the recurring business tasks. Things outside the normal recurring tasks typically are handled as
exceptions until they recur often enough to create a new process, and the cycle continues.

• If your business is suddenly hit by a disaster (fire, flood, earthquake, or chemical spill), your
processes are immediately interrupted. Trying to develop effective processes in the face of an
emergency is usually not at all successful. Having simple, well-tested processes to rely on when
disaster strikes is often the difference between eventual recovery and business failure.
Question:
• Why is it difficult to create an effective process for disaster response?
Technology in DR planning
• Part of the reason for DR planning is to look at your use of technology and
understand which elements are vulnerable to which types of disasters.

• A power outage, for example, impacts all the technology in a building. As we
look at DR planning, we’ll also look at various vulnerabilities of different
technologies and discuss, in broad strokes, strategies, tools, and techniques
that might be helpful to mitigate or avoid some of these risks.
The Cost of Planning Versus the Cost of Failure
• Disasters can result in enormous business losses—financial, investor
confidence, and corporate image. They can also lead to serious legal issues,
especially when more and more private data are being captured, stored, and
transmitted across the public Internet. These losses and legal challenges can
have a small, short-term impact but often, they have a significant, long-term
impact, and in some cases endanger the existence of the company.
The Sony Playstation Incident
The Cost of Planning Versus The Cost of Failure
• Fire is the most common emergency (disaster) companies face. 40-50% of companies that
experience a major fire go out of business because most do not have BC/DR plans in place.

• Despite the high likelihood that a company will go out of business after a disaster, more than
90% of small businesses lack a disaster recovery plan.

• Even though many companies say they understand the need for a disaster recovery plan,
very few actually make it a priority.

• There may be substantial financial and legal implications for failing to plan and for failing to
take reasonable precautions. This can add to a company’s burdens after a disaster strikes.
Types of Disasters
• Threats or hazards come in three basic categories: natural hazards, human-caused hazards, and accidents and
technological hazards.

• Natural hazards include weather problems in both hot and cold climates as well as geological hazards such as
earthquakes, tsunamis, volcanic eruptions, and land shifting.

• Human-caused hazards can be accidental or intentional. Some intentional human-caused hazards fall under the
category of terrorism, and some are less severe and may be “simply” criminal or unethical.

• Human-caused hazards include cyber-attacks, rioting, protests, product tampering, bombs, explosions, and
terrorism, to name a few.

• Accidents and technological hazards include such issues as transportation accidents and failures, infrastructure
failures, and hazardous materials accidents, to name a few.


Disaster Recovery Planning Basics
Some types of disasters that organizations can plan for include:
• Application failure
• Communication failure
• Data center disaster
• Building disaster
• Campus disaster
• Citywide disaster
• Regional disaster
• National disaster
• Multinational disaster
• Having two servers or routers in the same rack leaves your network
vulnerable—the single point of failure could be as simple as someone tripping
and spilling a large cup of coffee on the rack itself.

• You might conscientiously make backups, verify the backups, and store them
securely but leave them on-site. The single point of failure could be as minor
as something falling on the rack holding your tape backups or as major as a
serious fire in the server room or building.
Disaster Recovery Planning Basics
The basic steps in any Disaster Recovery plan include:
• Project initiation
• Risk assessment
• Business impact analysis
• Mitigation strategy development
• Plan development
• Training, testing, and auditing
• Plan maintenance
Project initiation
• Project initiation is one of the most important elements in Disaster Recovery
planning because without full organizational support, the plan will be
incomplete. As an IT professional, there may be limits to what you can do to
create an organization-wide functional DR plan. For example, if the application
server is destroyed and you have data backups, do you also have a way to
access those backups? Do you have a way to allow users to connect to the
application securely? Where are users located? How will business resume? Can
it resume without that application in the near term or not? You will not likely
be able to answer these questions on your own.
Risk assessment
• Risk assessment is the process of sitting down with key members of your company
and looking at the potential risks your company faces. These risks run from ordinary
to extraordinary—from a fire or minor flood in a server room to a catastrophic loss
such as an earthquake or major hurricane and everything in between.
• As an IT professional, you can certainly lend your expertise to this process by helping
define the likely impact to technology components in various types of disasters or
events, but you can’t do it alone. For example, it’s likely that your transportation
manager understands the potential business impact of bad weather around the
country, not just in your local area. Your marketing manager might best understand
the potential business risk of a contaminated product or a Web site breach.
Business impact analysis
• Once you’ve outlined your risks, you need to turn your attention to the
potential impact of these various risks.

• For example, you might determine that your Enterprise Resource Planning or
your Electronic Medical Record application cannot be down. Period. E-mail,
Web servers, and reporting tools, however, can go down, even though both
events would be disruptive. Once you understand these parameters, you can
develop an IT-based strategy to meet the requirements that result from this
analysis.
Mitigation strategy development

• The mitigation strategy might be quite simple for a small company.
Keep critical data backed up to a secure cloud location, keep several
copies of backups off-site, and keep several copies of key information
such as the employee list, phone numbers, emergency service phone
numbers, key suppliers, and customers in a binder off-site in a secure
but accessible location.
Plan development

• After you’ve gone through the analysis steps, you’ll be ready to
develop your plan. As with other types of IT project plans, you’ll want
to outline the methodology you’re going to follow so that you
improve your chance of success and reduce your chances for errors
and gaps. This includes standard processes such as developing
business and technical requirements, defining scope, budget,
timeline, quality metrics, and so forth.
Training, testing, and auditing

• Once the plan has been developed, people need to be trained on how
to implement it. In many cases, scenario-based case studies can be a
good first step. Running through appropriate drills, exercises, and
simulations can be of great help, especially for disasters or events
that rank high on the list of “likely to occur.”
Plan maintenance

• Finally, plan maintenance is the last step in the DR planning process, and in
many companies, it is “last and least.” Without a plan to maintain your
plan, it will become just another project document on a file server or
sitting in a binder on a shelf. If it doesn’t get maintained, updated, and
revalidated from time to time, you’ll find that the plan may be rendered
useless if a disaster does strike. Maintenance doesn’t have to be an
enormous task, but it is one that must be done.
Recovery plan considerations

• The recovery time objective (RTO) describes the target amount of
time a business application can be down, typically measured in hours,
minutes or seconds.

• The recovery point objective (RPO) describes the age of files that
must be recovered from backup storage for normal operations to
resume.
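The two objectives above can be checked against an incident timeline. The following is a minimal sketch, assuming hypothetical targets and timestamps; the function names are illustrative and not part of any standard tool.

```python
from datetime import datetime, timedelta


def meets_rto(outage_start, service_restored, rto):
    """True if the application was down no longer than the RTO."""
    return service_restored - outage_start <= rto


def meets_rpo(last_recoverable_copy, outage_start, rpo):
    """True if the newest recoverable data is no older than the RPO."""
    return outage_start - last_recoverable_copy <= rpo


# Hypothetical incident timeline and targets, for illustration only.
rto = timedelta(hours=4)
rpo = timedelta(hours=1)
last_backup = datetime(2024, 1, 10, 8, 30)
outage_start = datetime(2024, 1, 10, 9, 15)
service_restored = datetime(2024, 1, 10, 12, 45)

rto_ok = meets_rto(outage_start, service_restored, rto)
rpo_ok = meets_rpo(last_backup, outage_start, rpo)
print(f"Downtime {service_restored - outage_start}: RTO met = {rto_ok}")
print(f"Data age {outage_start - last_backup}: RPO met = {rpo_ok}")
```

RTO looks forward from the outage (how long until service returns), while RPO looks backward from it (how old the newest recoverable data is).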
Types of disaster recovery plans
• Virtualized disaster recovery plan - Virtualization provides opportunities to implement disaster
recovery in a more efficient and simpler way. A virtualized environment can spin up new virtual
machine (VM) instances within minutes and provide application recovery. Testing can also be easier
to achieve, but the plan must include the ability to validate that applications can be run in disaster
recovery mode and returned to normal operations within the RPO and RTO.

• Network disaster recovery plan - Developing a plan for recovering a network gets more
complicated as the complexity of the network increases. It is important to detail the step-by-step
recovery procedure, test it properly and keep it updated. Data in this plan will be specific to the
network, such as in its performance and networking staff.
Types of disaster recovery plans
• Cloud disaster recovery plan - Cloud disaster recovery (cloud DR) can range from a file backup in the
cloud to a complete replication. Cloud DR can be space, time and cost-efficient, but maintaining the
disaster recovery plan requires proper management. The manager must know the location of physical
and virtual servers. The plan must address security, which is a common issue in the cloud that can be
alleviated through testing.

• Data center disaster recovery plan - This type of plan focuses exclusively on the data center facility
and infrastructure. An operational risk assessment is a key element in data center DRPs. It analyzes key
components such as building location, power systems and protection, security, and office space. The plan
must address a broad range of possible scenarios.
Disaster recovery plan checklist

• Determine recovery objectives.

• Identify the stakeholders (team).

• Establish communication channels.

• Perform extensive tests.

• Stay up to date.
Example
• The DR plan is for a modern company running 200 physical and virtual
servers in an on-premises data center. The company relies on its production
environment being available 24/7 to customers, which is why its DR
strategy needs to function perfectly with minimal downtime. This company
uses Amazon Web Services (AWS) as its target DR infrastructure in order to
cut costs and improve its RTO and RPO.
Recovery Objectives

RTO: 5 minutes
According to the RTO, production must be shifted from the on-premises
data center to AWS within 5 minutes.

RPO: 0 minutes
The recovery point is near zero because the business cannot
tolerate any data loss. This is why data is continuously
replicated from the on-premises environment to the cloud.

Required Documents:
• Stakeholder Register
• Risk Register
• Communication Plan
Incident Response Plan
Introduction
• Incident response is the process that gets triggered when
something unexpected happens in such a way that business
continuity is threatened.
• Disaster recovery comes into play when an incident is so
severe that the business cannot continue its operations.
Necessary Prerequisites
• Prior to building the incident response program, specific capabilities
must exist. Examples of these capabilities include:
• Access-control processes and restriction of elevated privileges
• Protection from misuse of data in motion, in use, and at rest
• Hardening of hardware, based on established standards
• Understanding and management of vulnerabilities
• Existence of communication and control network protections
(firewalls, etc.)
Incident Response Plan
Incident Response Frameworks
• The National Institute of Standards and Technology (NIST) publishes many
documents for cybersecurity practitioners, including NIST Special Publication
(SP) 800-61, Computer Security Incident Handling Guide.
The elements of NIST 800-61 include the following:
• Organizing a Computer Incident Response Capability
• Handling an Incident
• Identify
• Contain
• Eradicate
• Recover
• Post-incident
Organizing a Computer Incident Response
Capability
• Organizing an incident response program requires that an incident be
defined. Not everything that is unusual is an incident. Prior to
defining anomalies as incidents, these occurrences must be analyzed
and triaged as events.
• Policies and procedures
• The team
• Goals, strategy, and objectives
• The incident response plan
• Tactical procedures
Incident Response Definitions
• Event is an observable occurrence in a system or network.
An example of an event is quarantined e-mail that appears to be
suspicious. A security analyst assesses the e-mail and decides either to
release it to the recipient or eradicate it.
• Adverse Event: Event resulting in negative consequences
System outages, whether malicious or accidental, fall into the adverse
event bucket.
• Incident: Violation of policies
Insider threats that remove data without authorization trigger a full-
fledged incident response.
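The three definitions above can be read as a simple triage rule. The sketch below is one possible encoding; the two boolean flags and the names Severity and classify are illustrative assumptions, not terms from NIST 800-61.

```python
from enum import Enum


class Severity(Enum):
    EVENT = "event"
    ADVERSE_EVENT = "adverse event"
    INCIDENT = "incident"


def classify(negative_consequences, violates_policy):
    """Triage an observed occurrence using the three definitions above."""
    if violates_policy:
        return Severity.INCIDENT
    if negative_consequences:
        return Severity.ADVERSE_EVENT
    return Severity.EVENT


# The three examples from the definitions, encoded as hypothetical flags.
quarantined_email = classify(negative_consequences=False, violates_policy=False)
system_outage = classify(negative_consequences=True, violates_policy=False)
insider_data_removal = classify(negative_consequences=True, violates_policy=True)

print(quarantined_email.value, system_outage.value, insider_data_removal.value)
```

A policy violation is checked first so that an occurrence meeting both conditions is escalated to the highest category.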
The team
• The incident response plan identifies the individuals who make up the
incident response team and their roles.
• Usually, someone from cybersecurity, at the manager
or director level, owns incident response.
• Management: Management owns incident response: It funds, allocates
resources, and controls policy decisions.
• IT support: Not everyone in IT will respond to incidents. Unique events call
for others to participate, based on expertise.
• Legal department: The general counsel’s presence on the extended team or
executive response team is expected. Engaging the legal department earlier
should be expected in certain situations.
• Public affairs and media relations: Large breaches garner media attention,
and involvement of personal information requires disclosure.
• Human resources: This group’s input becomes necessary when employee
involvement is suspected.
How Vulnerabilities Become Risks
• Vulnerabilities represent weaknesses in information systems. Threat
actors seek to uncover and exploit these in a successful attack. Weak
passwords, default accounts with default passwords, and unpatched
systems are examples of vulnerabilities commonly exploited.
• For a risk to be present, a threat and a vulnerability must exist.
Vulnerabilities that no threat actor or scenario would exploit are not a
cybersecurity risk.
• For example, when a threat actor such as a malicious insider exploits a
vulnerability such as default admin credentials, the result is a risk to the
confidentiality, integrity, or availability of customer data.
Detection and Identification of Event
• Incident response begins with the detection and identification of events.
Detection should be deployed based on risks identified and potential attack
patterns of known threats.
• Several tools provide automated detection and identification. Automation is
desirable when it lowers costs, increases efficiency and is more reliable
than manual processes. A significant use case for automation exists when
technology correlates and detects behavior patterns and activity not
always seen easily with the human eye.
• Not all detection requires technology. End users are an example of how the
human element can be very effective, such as noticing phishing e-mails first
when other employees do not observe good e-mail hygiene.
Detection and Identification of Event
• End Point Detection and Response
End point detection and response is a capability used to detect
changes made to end points that are consistent with known indicators of
attack or behavior and inconsistent with baselines of normal behavior.
These solutions act in a front-line detection capacity and are valuable
during containment.
These solutions allow the team to quickly respond to the event and
take appropriate action.
Examples: FireEye Endpoint Security and Symantec Endpoint Protection
Detection and Identification of Event
• Analyzing Traffic
Packet capture helps incident response teams confirm
whether suspected events are real. Organizations implement these
solutions based on the incident response and monitoring strategy.
An example is NetFlow, developed by Cisco, which allows entities to
capture data on the origin, destination, and amount of traffic.
• Security Information and Event Management (SIEM)
SIEM is a set of tools and services offering a holistic view of an organization's
information security. SIEM tools provide real-time visibility across an organization's
information security systems.
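The kind of flow data described under Analyzing Traffic (origin, destination, and amount of traffic) can be summarized with a few lines of code. This is a minimal sketch over hypothetical records; it does not parse real NetFlow exports, which carry more fields.

```python
from collections import defaultdict

# Hypothetical flow records: (source IP, destination IP, bytes transferred).
# Real NetFlow exports carry more fields; this is a simplified stand-in.
flows = [
    ("10.0.0.5", "203.0.113.10", 1_200),
    ("10.0.0.5", "203.0.113.10", 800),
    ("10.0.0.7", "198.51.100.4", 50_000),
    ("10.0.0.5", "198.51.100.4", 300),
]


def bytes_per_pair(records):
    """Sum traffic volume for each (source, destination) pair."""
    totals = defaultdict(int)
    for src, dst, size in records:
        totals[(src, dst)] += size
    return dict(totals)


totals = bytes_per_pair(flows)
# Largest talkers first, so an analyst can spot unusual volumes quickly.
for (src, dst), size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{src} -> {dst}: {size} bytes")
```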
Containment
• Containment comes after identifying an event and concluding that action is
required to limit its impact.
• Containment is about limiting the damage done by attackers. This is
achieved by keeping the attacker away from key assets not yet
compromised. Containing an event or incident requires identifying
indicators of the attack and looking for them in other systems.
• Once a system is suspected of being compromised, it should be isolated.
Some ways to do this include unplugging the network cable, putting the
machine in sleep mode (powering it off causes volatile memory loss and
the loss of forensic evidence), or isolating the machine through changes
to DNS and firewall rules so that it cannot receive data.
Containment
• Denial of Service
Denial of service (DoS) and distributed denial of service (DDoS) attacks aim to shut down
services and disrupt business operations. The attacks target web-facing applications, and
DNS service.
Attempting to contain these attacks involves the following important steps:
1. Assess firewalls, routers, servers, and other affected device logs.
2. Pinpoint how the traffic for the DDoS attack differs from nonthreatening traffic, and
review network traffic looking for DDoS traffic.
3. Block traffic with perimeter devices.
4. Block outbound traffic responding to the DDoS.
5. Blackhole malicious IPs attributed to the attacker.
6. Temporarily disable applications and services affected by the attack.
7. Contact the Internet service provider to confirm whether it sees the attack.
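Steps 2 and 3 above (telling attack traffic apart, then blocking it at the perimeter) can be illustrated with a simple request count per source. This is a minimal sketch under stated assumptions: the threshold and the sample log are hypothetical, and real DDoS traffic often needs richer signals than volume alone.

```python
from collections import Counter

# Hypothetical threshold: requests per source in the sampled window above
# which a source is treated as suspicious. Tune against normal traffic.
REQUESTS_PER_WINDOW_LIMIT = 100


def suspicious_sources(source_ips, limit=REQUESTS_PER_WINDOW_LIMIT):
    """Return source IPs whose request count in the window exceeds the limit."""
    counts = Counter(source_ips)
    return sorted(ip for ip, n in counts.items() if n > limit)


# Hypothetical log sample: one entry per request seen at the perimeter.
window = ["198.51.100.23"] * 450 + ["203.0.113.9"] * 12 + ["192.0.2.44"] * 101

blocklist = suspicious_sources(window)
print("Candidates for perimeter blocking:", blocklist)
```

The output is a list of candidates for review, not an automatic block: a busy legitimate source can exceed a volume threshold too.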
Containment
• Lost Assets
Assets can be misplaced or stolen by end users and employees, and
when these events occur, several questions must be answered.
• Assets can be laptops, tablets, mobile phones, desktops, printers,
hard drives, and other types of removable or portable storage.
Attempting to contain these events involves the following important
steps:
1. Reporting to Policy
2. Wiping the device remotely
Eradication, Recovery, and Post-incident
Review
• Eradication is the process of removing all the remnants of a
cyberattack. This starts once systems known to be compromised can
be taken offline so that eradication can occur. Eradication addresses
removing files and reversing the registry and configuration changes
that malware and attackers made during the attack.
• Once all the affected machines are identified and isolated and
forensic backups are completed, the company can address
weaknesses exploited by the attackers. These vulnerabilities are
patched, and insecure configurations are repaired.
Eradication, Recovery, and Post-incident
Review
• Eradication Techniques.
• Malware Artifacts
Antivirus solutions remove files and fix changes made to
operating systems by malicious software.
Some malware can only be removed by:
1. Taking the machine offline by removing the network cable
2. Booting the machine in safe mode
3. Using a malware removal tool
4. Rebooting the machine and confirming that the infection is gone
Business Continuity Strategy
Introduction
• BCM strategy should be aligned with business and IT strategies to ensure
that regulatory and legal requirements are met. BCM policies and
procedures should incorporate the necessary controls to ensure that data
integrity and privacy are not compromised during recovery efforts.
• While developing a business continuity strategy, focus on the
following:
1. Business processes and operations
2. Users
3. Data center
4. Networks
5. Facilities
6. Supplies
7. Data (off-site storage of backup data and applications)
Introduction
• The following factors pose a large challenge in the choice of appropriate
BCM strategies:
1. Presence in multiple locations
2. Availability of recovery options such as owned, leased, shared or mobile facilities
3. Increasing number of threats, risks and vulnerabilities
4. Complexity of external dependencies on supply chain channel

• Business continuity strategy is based on worst-case scenarios, and the business
continuity team will help build these scenarios based on past incidents and
future predictions. Some businesses propose business recovery strategies
that are different from the rest of the organization.
BUSINESS CONTINUITY OBJECTIVE
• The business continuity objectives are the starting point because they
convey management's attitude and commitment toward the BCM program.
BCM objectives may include the following:
1. Protecting assets.
2. Taking measures to limit loss during disruption.
3. Minimizing business loss and loss of customer goodwill.
4. Improving prompt salvage of assets during disaster.
5. Ensuring orderly evacuation of personnel and moving them to safety.
6. Providing resources for BCM and ensuring proper coordination between BCM
teams by properly structuring them across locations and providing for their
backups in case any of them is not available during a crisis.
7. Reducing response time through planning and exercising.
Recovery options
1. Prevention: It is a good strategy that aims at reducing the chances of
the disaster happening. It consists of deterrent controls that reduce the
likelihood of the threats occurring. Preventive controls safeguard the
vulnerable areas to ward off any threat that occurs and reduce its
impact. Having these measures in place is always more cost-effective
than attempting recovery after the interruption.
Recovery options
• The following are a few types of preventive controls that can be adopted by the enterprise:
a. Ensure security of the facilities: It is an example of a deterrent control that obstructs
unauthorized entry to the installation/facilities by imposing physical access controls
such as guards, biometric access control, and surveillance systems at the location.
b. Personnel procedures: Critical locations can be made restricted zones with entry for
authorized personnel only, and a log of entry by anyone other than authorized personnel
has to be maintained. Identification badges are a good way of identifying personnel and
ensuring that they are confined to their authorized work spaces only.
c. Application controls: Applications help run business processes. Hence proper access control,
antivirus software, encryption algorithms, firewalls for perimeter security, intrusion
detection systems to study anomalous behavior over the network, annual vulnerability
assessments and penetration testing to rule out risk from open ports, and so on may
be deployed as preventive controls.
d. Data storage controls: Off-site storage of backups and a proper
predefined backup policy and procedures for backup, storage, testing,
restoration, and purging after retention dates expire are controls
connected to data storage.
Recovery options
2. Response: In this stage, the first responses to an incident should be
listed. The first response to an incident is to notify the right people.
A point to note is that major recipients of BCM communication are:
– CIOs and CTOs
– IT directors and data center managers
– Security and risk management officers
– Data center architects
– Application owners
• Notification of impending disaster can be given by issuing prior warning
through the appointed communication channels to employees, visitors,
and/or customers on the premises.
• Timely notification can ensure orderly shutdown of machines and systems
and if necessary have an orderly evacuation of premises made in case of
risk to premises. This is one of the first response steps to move to safety all
personnel on the premises and to alert the police, fire service, and
hospitals. This is required only if the interruption is of the nature of an
accident, act of sabotage, or natural calamity. Precise notification
procedures must be documented, and call lists for persons to be contacted
and informed should exist both at primary site and at the backup site to
facilitate mobilization of notification procedures. Notification can be done
using various tools: pager, short message service (SMS), phone, and e-mail.
Recovery options
3. Resumption: It involves resuming only the time-sensitive business
processes, either immediately after the interruption or after the
declared mean time between failures (MTBF).
• Not all operations are fully recovered at this stage. The focus shifts
to the command center once the BCM teams declare the severity of the
disaster and invoke the appropriate plan of action. The resumption
and subsequently the recovery activities are coordinated after this
point.
• The command center is a facility located near the primary facility that
has adequate communication facilities, PCs, printers, fax machines,
and office equipment to support the activities of the team.
Choice for alternate processing sites
• Hot site: A hot site is a fully functional data center with hardware,
software, personnel, and customer data. It is staffed 24/7 and is
ready to be operational within a short span of time. Where RTOs and
RPOs are extremely small, the systems need to be up and running in a
short time. Organizations such as financial institutions, which hold a
lot of customer data and have many customer-facing applications, have
to go for a hot site option.
• Warm site: A warm site is an equipped data center with hardware,
software, network services, and personnel. The element missing here
is customer data. An organization can install additional equipment
and introduce customer data when a disaster occurs.
• Cold site: A cold site is a type of data center that has its own
associated infrastructure, including power, telecommunications,
and environmental controls, designed to support IT systems,
applications, and data, which are installed only when disaster strikes
and the DR plan is activated.
• Mobile site: A mobile site is a portable van or trailer that can be used
as an emergency processing center at the time of disaster. It provides
an excellent alternative to the above three options. After a disaster,
the trailer can move to the site, all essential equipment and supplies
can be loaded onto it, and then connections for power and communication
are added before it can be made functional.
• Mirrored site: A mirrored site is identical in all aspects to the primary
site, right down to the information availability. It is equivalent to
having a redundant site in normal times and is naturally the most
expensive option. At the alternate site (or primary site, if still usable),
the work environment is restored. Communication, networks, and
workstations are set up, and contact with the external world can be
resumed.
• Restoration: It is the process of repairing and restoring the primary
site. At the end of this, the business operations are resumed in
totality from the original site or a completely new site. While the
recovery team is supporting operations from the alternate site,
restoration of the primary site for full functionality is initiated.
Business Continuity and Disaster Recovery Audit
Internal Controls
• Internal controls, stated in the simplest terms, are mechanisms that
ensure proper functioning of processes within the company. Every
system and process within a company exists for some specific
business purpose. The auditor must look for the existence of risks to
those purposes and then ensure that internal controls are in place to
mitigate those risks.
Types of Internal Controls
• Controls can be preventive, detective, or reactive, and they can have
administrative, technical, and physical implementations. Examples of
administrative implementations include items such as policies and
processes. Technical implementations are the tools and software that
logically enforce controls (such as passwords).
• Preventive Controls: Preventive controls stop a bad event from
happening. For example, requiring a user ID and password for access
to a system is a preventive control. It prevents (theoretically)
unauthorized people from accessing the system.
Types of Internal Controls
• Detective Controls: Detective controls record a bad event after it has
happened. For example, logging all activities performed on a system
will allow you to review the logs to look for inappropriate activities
after the event.
• Reactive Controls (aka Corrective Controls): Reactive controls fall
between preventive and detective controls. They do not prevent a
bad event from occurring, but they provide a systematic way to
detect when those bad events have happened and correct the
situation, which is why they are sometimes called corrective controls.
For example, you might have a central antivirus system that detects
whether each user’s PC has the latest signature files installed.
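The detect-then-correct pattern of a reactive control can be sketched using the antivirus example above. This is an illustration only: the inventory, version numbers, and function names are hypothetical, and the automatic update step is an assumption about how such a system might correct the situation.

```python
# Hypothetical latest signature version known to the central antivirus server.
LATEST_SIGNATURE_VERSION = 1042

# Hypothetical inventory: PC name -> installed signature version.
installed = {"pc-finance-01": 1042, "pc-hr-02": 1039, "pc-lab-03": 1042}


def find_outdated(inventory, latest):
    """Detect PCs whose signature files are older than the latest version."""
    return sorted(pc for pc, version in inventory.items() if version < latest)


def push_update(inventory, pcs, latest):
    """Correct the situation by bringing each outdated PC to the latest version."""
    for pc in pcs:
        inventory[pc] = latest


outdated = find_outdated(installed, LATEST_SIGNATURE_VERSION)
push_update(installed, outdated, LATEST_SIGNATURE_VERSION)
remaining = find_outdated(installed, LATEST_SIGNATURE_VERSION)
print("Updated:", outdated, "| still outdated:", remaining)
```

The control does not stop a PC from falling behind, but it finds the gap and closes it, which is what separates a reactive control from a purely detective one.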
Internal Control Examples
Backups and Disaster-Recovery Plans
• If the system or its data were lost, system functionality would be
unavailable, resulting in a loss of your ability to track outstanding
receivables or post new payments.
• What are some internal controls that would mitigate this risk?
1. Back up the system and its data periodically.
2. Ship backup tapes offsite.
3. Document a disaster recovery plan.
Internal audit
• To provide independent assurance to the audit committee (and senior
management) that internal controls are in place at the company and
are functioning effectively.
• To improve the state of internal controls at the company by
promoting internal controls and by helping the company identify
control weaknesses and develop cost-effective solutions for
addressing those weaknesses.
Data Center Auditing Essentials

A data center is a facility that is designed to house an
organization’s critical systems, which comprise computer
hardware, operating systems, and applications.
Test Steps for Auditing Data Centers
• The following topic areas should be addressed during the data center
audit:
• Neighborhood and external risk factors
• Physical access controls
• Environmental controls
• Power and electricity
• Fire suppression
• Data center operations
• Data backup and restore
• Disaster recovery planning
Test Steps for Auditing Data Centers
Neighborhood and External Risk Factors
When auditing a data center facility, you should first evaluate the
environment in which the data center resides. The goal is to identify
high-risk threats. For example, the data center you are auditing may be
in the flight path of a regional airport, in a flood zone, or in a high-crime area.
Test Steps for Auditing Data Centers
Physical Access Controls
Several information security incidents have occurred in which thieves gained
unauthorized access to sensitive information by defeating physical access
control mechanisms.
Therefore, restricting physical access is just as critical as restricting logical
access. In a data center environment, physical access control mechanisms
consist of the following:
• Exterior doors and walls
• Access control procedures
• Physical authentication mechanisms
• Security guards
Test Steps for Auditing Data Centers
Environmental Controls
• Computer systems require specific environmental conditions such as
controlled temperature and humidity. Data centers are designed to provide
this type of controlled environment. When auditing a data center, you
should verify that there is enough HVAC capacity to service the data center
even in the most extreme conditions.
• The IT Auditor needs to review the temperature and humidity logs to verify that
each falls within acceptable ranges over a period of time. In general, data
center temperatures should range from 65 to 70°F (with temperatures
above 85°F damaging computer equipment) and humidity levels should be
between 45 and 55 percent. However, this will vary depending on the
specifications of the equipment.
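A log review of this kind can be sketched as a simple range check. The thresholds below are the general figures quoted above (65 to 70°F, 45 to 55 percent humidity, damage above 85°F). The reading format and the sample values are assumptions for illustration, and real limits should come from the equipment specifications.

```python
TEMP_RANGE_F = (65.0, 70.0)        # general guideline from the text
HUMIDITY_RANGE_PCT = (45.0, 55.0)  # general guideline from the text
DAMAGE_TEMP_F = 85.0               # above this, equipment damage is a concern


def review_readings(readings):
    """Return (timestamp, finding) pairs for every out-of-range reading.

    readings: iterable of (timestamp, temp_f, humidity_pct) tuples (hypothetical format).
    """
    findings = []
    for timestamp, temp_f, humidity_pct in readings:
        if temp_f > DAMAGE_TEMP_F:
            findings.append((timestamp, "temperature above damage threshold"))
        elif not TEMP_RANGE_F[0] <= temp_f <= TEMP_RANGE_F[1]:
            findings.append((timestamp, "temperature out of range"))
        if not HUMIDITY_RANGE_PCT[0] <= humidity_pct <= HUMIDITY_RANGE_PCT[1]:
            findings.append((timestamp, "humidity out of range"))
    return findings


# Made-up sample log, not real measurements.
sample_log = [("08:00", 68.0, 50.0), ("12:00", 72.5, 50.0), ("16:00", 86.0, 58.0)]
for finding in review_readings(sample_log):
    print(finding)
```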
Test Steps for Auditing Data Centers
Environmental Controls
• The Auditor should verify that temperature and humidity alarms notify
data center personnel when either factor falls outside acceptable
ranges. Sensors should be placed in all areas of the data center where
electronic equipment is present. Ensure that sensors are placed in
appropriate locations either by reviewing architecture diagrams or by
touring the facility.
• The Auditor should review the HVAC design to verify that all areas of
the data center are covered appropriately. Determine whether the
air flow within the data center has been modeled to ensure adequate
and efficient coverage.
Test Steps for Auditing Data Centers
Fire Suppression
• Since data centers face a significant risk from fire, they typically have
sophisticated fire suppression systems, generally one of two types:
gas-based systems and water-based systems.
• The Auditor should ensure that fire suppression systems are protecting the
data center from fire. All data centers should have a fire suppression
system to help contain fires. Most systems are gas-based or water-based
and often use multistage processes, in which the first sensor (usually a
smoke sensor) activates the system and a second sensor (usually a heat
sensor) causes a discharge of either water or gas.
• Gas-Based Systems Varieties of gas-based fire suppression systems include
CO2, FM-200, and CEA-410. Gas-based systems are expensive and often
impractical, but their use does not damage electronic equipment.
• Water-Based Systems Water-based systems are less expensive and more common
but can cause damage to computer equipment. To mitigate the risk of
damaging all the computer equipment in a data center or in the extended
area of a fire, fire suppression systems are designed to drop water from
sprinkler heads only at the location of the fire.
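The multistage process described above (a first sensor activates the system, a second sensor triggers the discharge) can be sketched as a small state machine. The stage names and the rule that heat alone does not cause a discharge are assumptions made for this illustration; real systems follow the vendor's design and the applicable fire codes.

```python
from enum import Enum


class Stage(Enum):
    IDLE = "idle"
    ARMED = "armed"            # first sensor (smoke) has activated the system
    DISCHARGED = "discharged"  # second sensor (heat) has released water or gas


def next_stage(stage, smoke_detected, heat_detected):
    """Advance the suppression system by one sensor reading."""
    if stage is Stage.IDLE and smoke_detected:
        return Stage.ARMED
    if stage is Stage.ARMED and heat_detected:
        return Stage.DISCHARGED
    return stage  # otherwise stay in the current stage


stage = Stage.IDLE
for smoke, heat in [(False, False), (True, False), (True, True)]:
    stage = next_stage(stage, smoke, heat)
    print(stage.value)
```

Requiring two independent sensors before discharge reduces the chance that a single false alarm releases water or gas onto the equipment.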
Test Steps for Auditing Data Centers
Data Center Operations Effective data center operations require strict adherence
to formally adopted policies, procedures, and plans. The areas that should be
covered include the following:
• Roles and responsibilities of data center personnel
The Auditor should ensure that roles and responsibilities of data center personnel
are clearly defined.
• Segregation of duties of data center personnel
The Auditor should verify that duties and job functions of data center personnel are
segregated appropriately.
• Facility and equipment maintenance
The Auditor should verify that data center facility-based systems and equipment
are maintained properly by reviewing maintenance logs for critical systems and
equipment.
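A segregation-of-duties review can be sketched as a check for people who hold two duties that should be kept apart. The duty names and the conflict pairs below are hypothetical. Each organization defines its own list of incompatible duties.

```python
# Hypothetical pairs of duties that should not be held by the same person.
CONFLICTING_DUTIES = [
    ("request_change", "approve_change"),
    ("operate_backups", "audit_backups"),
]


def find_sod_conflicts(assignments, conflicts=CONFLICTING_DUTIES):
    """Return (person, duty_a, duty_b) for every person holding a conflicting pair.

    assignments: dict mapping a person's name to the set of duties assigned to them.
    """
    findings = []
    for person, duties in sorted(assignments.items()):
        for duty_a, duty_b in conflicts:
            if duty_a in duties and duty_b in duties:
                findings.append((person, duty_a, duty_b))
    return findings


staff = {
    "alice": {"request_change", "approve_change"},
    "bob": {"operate_backups"},
}
print(find_sod_conflicts(staff))
```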
Test Steps for Auditing Data Centers
Disaster Recovery Planning: The goal of disaster recovery planning is to reconstitute systems efficiently following a disaster, such as a
hurricane or flood.
• The Auditor should ensure that a disaster recovery plan (DRP) exists and is comprehensive and that key employees are aware of their roles
in the event of a disaster. If a disaster strikes your only data center and you don’t have a DRP, the overwhelming odds are that your
organization will suffer a large enough loss to cause bankruptcy. Disaster recovery, therefore, is a serious matter.
• When auditing an organization’s DRP, the Auditor should also interview personnel who participate in disaster recovery.
• The Auditor should verify that the DRP covers all systems and operational areas. It should include a formal schedule outlining the order in
which systems should be restored and detailed step-by-step instructions for restoring critical systems.
• The Auditor should verify that the DRP identifies a critical recovery time period during which business processing must be resumed before
suffering significant or unrecoverable loss. Validate that the plan provides for recovery within that time period.
• Ensure that DRPs are updated and tested regularly.
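Validating that the plan provides for recovery within the critical time period can be sketched as adding up the restore schedule. The sketch assumes that systems are restored one after another in the documented order and that a single critical period applies to all of them. The system names and hour estimates are made up for illustration.

```python
def validate_recovery_plan(restore_steps, critical_period_hours):
    """Walk the restore schedule and flag systems restored after the critical period.

    restore_steps: ordered list of (system, estimated_hours), restored sequentially.
    Returns (total_hours, late), where late lists (system, completion_hour).
    """
    elapsed = 0.0
    late = []
    for system, hours in restore_steps:
        elapsed += hours
        if elapsed > critical_period_hours:
            late.append((system, elapsed))
    return elapsed, late


# Hypothetical schedule and a hypothetical 24-hour critical recovery period.
schedule = [("network", 4), ("database", 10), ("billing app", 6), ("reporting", 8)]
total, late = validate_recovery_plan(schedule, critical_period_hours=24)
print(total, late)
```

A non-empty `late` list means the documented restore order does not fit inside the critical period, which is a finding to raise with the plan owners.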
Test Steps for Auditing Data Centers
Data Backup and Restore: System backup is regularly performed on most
systems. Often, however, restore is tested for the first time when it is
required because of a system corruption or hard-disk failure. Sound backup
and restore procedures are critical for reconstructing systems after a
disruptive event.
• The Auditor should ensure that backup procedures and capacity are
appropriate for respective systems. Backup schedules typically are 1 week
in duration, with full backups normally occurring on weekends and
incremental or differential backups at intervals during the week.
• Verify that systems can be restored from backup media.
• Ensure that backup media can be retrieved promptly from off-site storage
facilities.
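The difference between incremental and differential backups matters most at restore time. The sketch below assumes a full backup on Sunday and one incremental or differential backup on each other day. The text only says "at intervals during the week", so the daily interval is an assumption. An incremental backup holds changes since the previous backup of any kind, while a differential backup holds changes since the last full backup.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def backups_needed(restore_day, mode):
    """List the backups to restore, in order, to recover the state as of restore_day."""
    idx = DAYS.index(restore_day)
    chain = ["Sun full"]
    if idx == 0:
        return chain
    if mode == "incremental":
        # Every incremental since the full backup is required, in order.
        chain += [f"{day} incremental" for day in DAYS[1:idx + 1]]
    elif mode == "differential":
        # Only the most recent differential is required.
        chain.append(f"{DAYS[idx]} differential")
    else:
        raise ValueError("mode must be 'incremental' or 'differential'")
    return chain


print(backups_needed("Wed", "incremental"))
print(backups_needed("Wed", "differential"))
```

When testing restores, the auditor can use a chain like this to confirm that every piece of media the restore depends on is actually retrievable.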
Test Steps for Auditing Data Centers
• The Auditor should determine whether a Business Impact Analysis
(BIA) has been performed on the application to establish backup and
recovery needs. A business impact analysis is the first major task in a
disaster recovery or business continuity planning project. A business
impact analysis helps determine which processes in an organization
are the most important.
Criticality Analysis
• When all of the BIA information has been collected and charted, the
criticality analysis (CA) can be performed. The criticality analysis is a
study of each system and process and a consideration of its impact on
the organization.
Recovery Time Objective (RTO) vs Recovery Point Objective (RPO)
Definition
• RTO: the period from the onset of an outage until the resumption of service.
• RPO: the period for which recent data will be irretrievably lost in a disaster.
Measured
• RTO: measured in hours or days.
• RPO: usually measured in hours or days or minutes.
Other
• RTO: Each process and system in the BIA should have an RTO value.
• RPO: For critical transaction systems, RPO could even be measured in minutes.
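The two objectives can be checked separately. The sketch below assumes that the worst-case data loss equals the interval between backups and that an estimated restore time is available. Both figures and the sample values are hypothetical.

```python
def meets_rpo(backup_interval_hours, rpo_hours):
    """Worst case, everything since the last backup is lost, so the interval must fit the RPO."""
    return backup_interval_hours <= rpo_hours


def meets_rto(estimated_restore_hours, rto_hours):
    """Service must resume within the RTO, counted from the onset of the outage."""
    return estimated_restore_hours <= rto_hours


# Hypothetical system: nightly backups, 6-hour restore, RPO of 4 hours, RTO of 8 hours.
print(meets_rpo(backup_interval_hours=24, rpo_hours=4))   # nightly backups vs a 4-hour RPO
print(meets_rto(estimated_restore_hours=6, rto_hours=8))  # 6-hour restore vs an 8-hour RTO
```

A system can satisfy one objective and fail the other, which is why the BIA should record both values.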
