Hex 2224

Risk Management Guidelines

PRIME For Internal Use Only Document Ver.: 6.1

TABLE OF CONTENTS

1. Risk Governance
   1.1 First Level of Defence – Project Level
   1.2 Second Level of Defence – Account / Vertical Level
   1.3 Third Line of Defence – Enterprise Level
2. Risk management at Project Level
   2.1 Risk Management Planning
   2.2 Execution of Risk Management Plan
       2.2.1 Risk Identification
       2.2.2 Risk Assessment & Prioritization
       2.2.3 Risk Mitigation and Contingency Planning
       2.2.4 Risk Monitoring & Control
       2.2.5 Risk Analysis
3. Risk Management Tools and Techniques
   3.1 Brainstorming
   3.2 Delphi Technique
   3.3 Failure Mode and Effects Analysis (FMEA)
   3.4 Event Tree Analysis (ETA)
   3.5 Fault Tree Analysis (FTA)
   3.6 Bow Tie Analysis (BTA)
   3.7 Bayesian Statistics
APPENDIX A – Some Typical Project Risks
APPENDIX B – Industry Top 10 Risks
APPENDIX C – Tracking Status of Indicators – Examples


1. Risk Governance

Enterprise Risk Management (ERM) enables Hexaware management to govern and manage the enterprise's
approach to risk and to create sustainable value for its stakeholders through the achievement of business
objectives.

At Hexaware, ERM is built on three lines of defence as the first principle of the risk management
framework, as summarized below.

RISK GOVERNANCE FRAMEWORK

First Line of Defence – Project Level
• Has primary responsibility for day-to-day risk management

Second Line of Defence – Account / Engagement Level
• Understands the ERM framework, the Business Unit's risk capacity, and the strategies and structure for managing risk
• Provides oversight, support and monitoring

Third Line of Defence – Enterprise Level
• Board defines the ERM framework and risk appetite and provides oversight
• Monitors the overall effectiveness of the risk governance framework across the organization

At each line of defence, risk governance guidance is defined to support the ERM framework.

1.1 First Level of Defence – Project Level

The first line of defence is at the project level, where the Project Manager must understand the roles and
responsibilities for the associated risks and treat those risks appropriately. The steps taken include:

1.1. Risk Identification


1.2. Risk Assessment & Prioritization
1.3. Risk Mitigation & contingency planning
1.4. Risk monitoring & control
1.5. Risk Analysis

For more details on Project level risk management, please refer to Section 2.0 of this document.


The Account Service Delivery Manager (ASDM), Account Manager (AM) and the Delivery Head (DH)
form the risk management committee. This risk management committee is the first line of defence of the
risk governance framework. This committee is empowered with the responsibility and accountability to
effectively plan, build, run and monitor the project’s day-to-day risk environment. The committee
provides direction regarding risk response (i.e., treatment) for those risks that are outside of the Account /
Vertical risk tolerance.

The Project Manager should identify suitable mitigation / contingency plans and get them approved by
the ASDM, who is a part of the risk management committee at the project level. The Project Manager is
responsible for ensuring that the control activities and other responses that treat risk are enforced and
monitored for compliance.

During monthly project reviews, the ASDM monitors the status of the risks and the remedial actions taken.
This information is then collated with other risk reports for the second-level (Account-level executive risk
committee) and/or third-line risk governance committees (Board risk committee), who are charged with
representing the enterprise's stakeholders with respect to risk issues.

1.2 Second Level of Defence – Account / Vertical Level


The second line of defence is the Account level / Vertical level compliance and risk functions that provide
independent oversight of the risk management activities of the first line of defence.

The responsibilities of these second-line defence functions include participating in the Account / Vertical
risk committees, reviewing risk reports and validating compliance with the risk management framework
requirements, with the objective of ensuring that risks are actively and appropriately managed.

The second level of defence is responsible for the following pertaining to their account / vertical:

• The understanding of the ERM framework


• The Account / vertical risk capacity
• The adequacy of the risk budgets
• The risk appetite and tolerance allocation for each risk category
• The skill and capabilities of its risk resources
• The risk monitoring and reporting activities
• The risk metrics to alert the business of the emergence of risk
• The capability to adjust the account / vertical risk capacity and risk tolerances as the
organization's business objectives change
The Vertical Delivery Head and Vertical Head form the risk management committee at the second
level. Depending upon the size and complexity of the account / vertical, there may be additional members
in the risk committee. These members are the representatives identified by members of the third-level
Enterprise Risk Governance Committee (ERGC).


The second line of defence derives information from first-line management and independently
assesses that risk information. Any suggested improvements to the ERM framework are communicated to
the Enterprise Risk Governance Committee, i.e. the third line of defence. The Enterprise Risk Governance
Committee evaluates the reports from these multiple sources and determines the direction for the
organization.

1.3 Third Line of Defence – Enterprise Level


The third line of defence is the Enterprise Risk Governance Committee (ERGC), which reports
independently to the Executive Council and the CEO and represents the enterprise's stakeholders
on risk issues.

This committee has the responsibility and accountability to provide effective oversight of the enterprise’s
risk profile. This committee ensures that the enterprise’s executive management is effectively governing
and managing the enterprise’s risk environment.

The ERGC periodically reviews the second line of defence's activities and results, including the risk
governance functions involved, to ensure that the ERM arrangements and structures are appropriate and
are discharging their roles and responsibilities completely and accurately. It also verifies the
results of external audits, if any.

The results of these independent reviews are communicated to the executive council and they ensure that
appropriate action is taken to maintain and enhance the ERM effectiveness.


Each business head manages the third-party risks of their respective business unit and reports to the
Enterprise Risk Governance Committee (ERGC), which is chaired by the Chief Operating Officer. The
ERGC meets quarterly to review the status. The COO reports to the Executive Council and the CEO
on the status of the enterprise's risks.


2. Risk management at Project Level


Risk management can be considered a mini project within the main project. Like any other project,
effective handling of risks requires planning, execution of the planned activities and monitoring of those
activities. The risk management process can thus be divided into two major categories, viz. planning and
execution of planned activities, as described below.
a. Risk Management Planning
b. Execution of Risk Management Plan
- Risk Identification
- Risk Assessment & Prioritization
- Risk Mitigation & contingency planning
- Risk monitoring & control
- Risk Analysis

2.1 Risk Management Planning


The purpose of risk management planning is to give risk management the emphasis it requires. The risk
management plan should address issues such as methods of risk identification, risk prioritization, cost-benefit
analysis, thresholds for considering risks for mitigation and further monitoring, thresholds for initiating
contingency actions, frequency of risk re-assessment, etc.

One risk identification technique is Failure Modes & Effects Analysis (FMEA). In this
technique the requirements are taken as the basis for further steps: for each requirement, possible reasons
for failure are identified, and for each reason the associated effects on the whole system are determined.
The analysis of failure modes and effects helps prevent failures or reduce their impact.
Another technique used for risk identification is brainstorming.

Sometimes the cost of a solution may be higher than the actual loss caused by the problem; in such a
scenario it is better to accept the loss than to implement the solution. The same applies to risk
management. A cost-benefit analysis helps decide whether to take mitigation action, to live with the
risk, or to what extent mitigation should be done.

2.2 Execution of Risk Management Plan

2.2.1 Risk Identification


Identifying risks involves discovering risks and communicating the identified risks effectively. A
drill-down process of classification helps achieve granularity in the risk identification process.


Identify the source of risks, i.e. the areas to focus upon to identify risks, e.g. Development
environment, Project constraints, etc. Project characteristics become the driver in identifying the
sources of risks.
After having identified the source, identify the problem area within the source, i.e. the category
within the source; e.g. within Project constraints, 'Consultants' may be a category.
The next step is to identify the subcategories within the category; e.g. within Project constraints
(Source) and 'Consultants' (Category), the subcategories could be Schedule, Facilities, etc.
Finally, questions are raised and any area of concern is identified as a risk. For
example, the question 'Are there areas where required technical skills are lacking?' is raised, and
if the answer is 'Yes' then it is identified as a risk.

A checklist (Risk Identification Checklist) is provided in PRIME. It contains several questions
for each source / category / subcategory. The questions may be answered to determine whether
there is any risk; the response to each question may be Yes, No or NA (Not Applicable).
These are only guidelines, and along similar lines you can identify more questions that may help in
identifying risks. The risk identification checklist should be used at project initiation to conduct a
risk assessment, and a re-assessment should be conducted once every 6 months. A typical 3-tier
structure is given below for reference:
RISK IDENTIFICATION – typical 3-tier structure for a software program development:

Source:        Engineering              Constraints              Environment
Category:      Requirements, Design     Consultants, Customer    Work Environment, Development Process
Sub Category:  (e.g. under Consultants) Schedule, Facilities
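Purely as an illustration (not a PRIME or SmartBase artefact), the sketch below represents the source → category → subcategory drill-down as a nested structure with example checklist questions; the groupings and question texts, apart from the one quoted above, are hypothetical.

```python
# Illustrative sketch of the 3-tier risk identification structure
# (source -> category -> subcategory -> checklist questions).
# Groupings and questions are examples only, not the PRIME checklist format.

risk_checklist = {
    "Constraints": {                              # source
        "Consultants": {                          # category
            "Schedule": ["Are consultant deliverables on the critical path?"],
            "Facilities": ["Do the consultants have the facilities they need?"],
        },
    },
    "Environment": {
        "Development Process": {
            "Technical Skills": [
                "Are there areas where required technical skills are lacking?",
            ],
        },
    },
}

# Walk the hierarchy and raise each question; a 'Yes' answer flags a risk.
for source, categories in risk_checklist.items():
    for category, subcategories in categories.items():
        for subcategory, questions in subcategories.items():
            for question in questions:
                print(f"[{source} / {category} / {subcategory}] {question}")
```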


Five simple rules for writing a RISK STATEMENT

1. Make a complete statement with cause and effect
2. Link the cause and effect with a phrase such as "leading to", "causing" or "resulting in"
3. State the cause as an event or a set of conditions
4. State the effect with respect to a goal or objective
5. Identify and state the risk that lies as far upstream as is practical to manage
Example:

Context 1: A new employee who lacks functional knowledge will inject more defects
Risk Statement 1: Lack of functional knowledge of new employees results in more defects

Context 2: Many employees leaving the project affects the project schedule
Risk Statement 2: Attrition leads to slippage in the project schedule and impacts product quality

It is important to assess and highlight the key application and infrastructure vulnerabilities. These
can be removed or mitigated by tuning the design or reconfiguring the infrastructure. Alternatively,
if the vulnerabilities cannot be eliminated or lessened, the team can defer them, or at least monitor for
their occurrence.

The identified risks are recorded in the "Risks" module in SmartBase / the Risk Management Form, an
Excel file. There may be additional risks, depending upon project characteristics, which need to be
identified in addition to those identified through the checklist. There could also be unforeseen incidents
that were not identified as risks but occurred and had a negative impact on project success. Such
incidents are to be logged in the 'Critical Incidents' sheet of the Risk Management Form.
Any new risk identified as an outcome of reviews or usage of PPM should also be logged in the Risk
Management Form / SmartBase > Risks module.
Projects should use the 'Risks' module in SmartBase for recording the identified risks, the relevant
risk parameter values, mitigation plans and contingency plans.

2.2.2 Risk Assessment & Prioritization


Risk assessment converts the plain-text risk statement into a numeric value. For this, the
probability of occurrence is determined based on the estimated percentage likelihood of occurrence.


The probability values that may be used are:

Probability     Probability Range (SmartBase)
Most Likely     76 to 99%
Probable        51 to 75%
Occasional      26 to 50%
Unlikely        1 to 25%

The quantification of the impact should consider effort loss or effort overrun, loss of billing,
quantified opportunity loss, etc. The impact is measured on a scale of 1 to 10, where 10 is
the maximum. A recommended scale for impact assessment is as follows:

Impact     Impact Range (SmartBase)
Low        1-4
Medium     5-7
High       8-10

In SmartBase, ordinal values are accepted for Probability and Impact and the corresponding numeric
values are used in calculations; for example, High is treated as 9.
The Risk Exposure (RE) is calculated as the product of Probability and Impact.
The Risk Threshold is set at the organization level based on the organizational risk analysis and defaults
to 5.6 in the SmartBase PM Plan. The PM can override this value based on the risk tolerance of the
project.

If the risk exposure value is greater than the risk threshold value, then the method of risk management is
set to Control by default in SmartBase and cannot be changed by the user.
If the method of risk management is chosen as Control or Avoidance, then a mitigation and
contingency plan for the risk is mandatory.
Based on the priority of risks, a risk management method can be selected. The risk
management methods are:
Avoidance: taking actions that will avoid the risk.
Control: taking active steps to minimize the risk.
Acceptance: not taking any action and accepting the risk when it occurs.

The risk management criteria are given below for reference:


Impact →        Low         Medium      High
Probability ↓
Most Likely     Control     Avoidance   Avoidance
Probable        Control     Control     Avoidance
Occasional      Acceptance  Control     Control
Unlikely        Acceptance  Acceptance  Control

The above guideline, with magnitude values, is given below for reference.

Impact →              Low (4)   Medium (7)   High (10)
Probability ↓
Most Likely (0.99)    4         7            9.9
Probable (0.75)       3         5.25         7.5
Occasional (0.5)      2         3.5          5
Unlikely (0.25)       1         1.75         2.5

An organizational guideline is available for the probability and impact of the various categories of risks.
These guidelines should be used while assessing risks. The organizational guideline value for the
exposure threshold that triggers risk mitigation should also be used.

2.2.3 Risk Mitigation and Contingency planning


Risk Mitigation covers efforts taken to reduce either the probability or consequences of a risk. It
is a systematic reduction in the extent of exposure to a risk and/or the likelihood of its occurrence.

The risk mitigation plan identifies how specific risks will be dealt with and the steps that are
required to carry them out. It gives team members a clear sense of the actions that they are
expected to take and provides management with an understanding of what actions are being taken
on their behalf to minimize project risks.
The risk management method, i.e. Avoidance, Control or Acceptance, should be decided for each
identified risk. The method chosen will depend on the cost of the risk and the cost of the mitigation
plans. This management method should be documented in the Risk Management Form.
For each identified risk under Avoidance or Control, mitigation actions are determined and an
owner is assigned who is primarily responsible for taking the mitigation actions and monitoring
the risk. Each risk mitigation activity should be examined for the benefit it provides versus the
cost expended.


Mitigation thresholds are event-driven milestones in the project cycle which define when the
mitigation actions should be started. Examples of mitigation thresholds are:
− Development/Test Environment not setup by a date
− Software/Tool Licenses are not available
− Response from customer not received by a certain date
− Start of a coding phase
− Estimated effort used up for the project has touched 50%
− Work completion status
Mitigation thresholds should be defined for all risks for which mitigation planning is done.
Sometimes it is not possible to mitigate a risk; i.e., it is not possible to incur a cost ahead of an
uncertain event that will either reduce the likelihood of that event occurring or limit the loss
should the event occur. Where mitigation is not possible, a contingency plan can be employed.

Contingency refers to an organized and coordinated set of steps to be taken after the risk occurs.
A contingency plan is nothing more than a plan to solve a problem that may occur but has not
occurred yet.

Contingency actions are also planned for all risks under Avoidance and Control so that when a
risk becomes reality the PM can immediately initiate the action. The advantage of planning contingencies
in advance is that enough time is available to brainstorm and arrive at the best possible alternative.
Ownership of a risk should be given to a person who understands the situation and impact well. It is
also necessary to decide when contingency actions will be initiated.

Projects using SmartBase should record the mitigation / contingency plan for each identified risk in
the "Risks" module in SmartBase.
When a risk occurs, the status of the risk should be changed to 'Occurred' and the risk occurrence
details and actions taken should be recorded in SmartBase.


2.2.4 Risk Monitoring & Control


Because of internal and external factors, risk scenarios keep changing: some risks become more
critical as the project progresses and some become less critical. For example, key developers
leaving the organization is far more critical during the coding phase than in any other phase,
because the impact at that point would be very high. It is therefore
necessary to monitor risks continuously and carry out re-assessment at a predefined frequency.
Further actions against each risk depend upon the current exposure value. The ideal time
for risk re-assessment is during project reviews; however, depending upon the criticality, it may be
necessary to keep stringent control over some risks, resulting in more frequent reviews and re-
assessments.
Each time risk monitoring and review is carried out, it is recommended to reassess the exposure of
each risk. This exposure reassessment helps evaluate the effectiveness of mitigation actions.
Historical data on the status of the risk and the risk exposure should be maintained, and this history
of the risk status should be used to evaluate the effectiveness of the mitigation plans. Each time
risk re-assessment is done, the Risk Management Form should be saved as a new version. Projects
using SmartBase should record the reassessed risk parameters in SmartBase and should ensure
implementation of the mitigation tasks, observation of the occurrence of risks, recording of risk
occurrences and implementation of the contingency plan as part of Risk Monitoring and Control.

2.2.5 Risk Analysis


Risk analysis should be based on relevant measurement data captured during project
execution, as mentioned in the Risk Management Procedure. Analysis is carried out both at the project
level and at the organization level. The objective of risk analysis is to determine how effective the risk
management process is.
Risk profiling can be carried out to assess the risks that are most common in the organization
and to evaluate the effectiveness of mitigation actions.

3. Risk Management Tools and Techniques

Risk assessment techniques can be classified in various ways to assist with understanding their relative
strengths and weaknesses. Some of the techniques are described below, along with the nature of the
assessment they provide and guidance on their applicability to certain situations.

Srl  Technique             Risk            Risk Analysis                             Risk
                           Identification  Impact   Probability   Level of risk     Evaluation
1    Brainstorming         SA              NA       NA            NA                NA
2    Delphi                SA              NA       NA            NA                NA
3    FMEA Analysis         SA              SA       SA            SA                SA
4    Event tree analysis   A               SA       A             A                 NA
5    Fault tree analysis   A               NA       SA            A                 A
6    Bow tie analysis      NA              A        SA            SA                A
7    Bayesian              NA              SA       NA            NA                NA

** SA – Strongly Applicable   A – Applicable   NA – Not Applicable

3.1 Brainstorming

3.1.1 Overview

Brainstorming involves stimulating and encouraging free-flowing conversation amongst a group of
knowledgeable people to identify potential failure modes and associated hazards, risks, criteria for
decisions and/or options for treatment. The term "brainstorming" is often used very loosely to mean any
type of group discussion. However, true brainstorming tries to ensure that people's imagination is triggered
by the thoughts and statements of others in the group.

Effective facilitation is very important in this technique and includes stimulation of the discussion at kick-
off, periodic prompting of the group into other relevant areas and capture of the issues arising from the
discussion.

3.1.2 Use
Brainstorming can be used in conjunction with other risk assessment methods described below or may
stand alone as a technique to encourage imaginative thinking at any stage of the risk management process
and any stage of the life cycle of a system. It may be used for high-level discussions where issues are
identified, for more detailed review or at a detailed level for problems.

Brainstorming relies heavily on imagination. It is therefore particularly useful when identifying
risks of new technology, where there is no data or where novel solutions to problems are needed.

3.1.3 Input
A team of people with knowledge of the organization, systems, processes or applications being
assessed.

3.1.4 Process
Brainstorming may be formal or informal. Formal brainstorming is more structured with participants
prepared in advance and the session has a defined purpose and outcome with a means of evaluating ideas
put forward. Informal brainstorming is less structured and often more ad-hoc.

In a formal process:
• the facilitator prepares thinking prompts and triggers appropriate to the context prior to
the session


• objectives of the session are defined, and rules explained


• the facilitator starts off a train of thought and everyone explores ideas, identifying as
many issues as possible. There is no discussion at this point about whether things should
or should not be on the list or what is meant by a statement, because this tends to
inhibit free-flowing thought. All input is accepted and none is criticized, and the group
moves on quickly to allow ideas to trigger lateral thinking
• the facilitator may set people off on a new track when one direction of thought is
exhausted or discussion deviates too far. The idea, however, is to collect as many diverse
ideas as possible for later analysis.

3.1.5 Output
Outputs depend on the stage of the risk management process at which it is applied; for example, at the
identification stage the outputs might be a list of risks and current controls.

3.1.6 Strengths and limitations


Strengths of brainstorming include:
• it encourages imagination which helps identify new risks and novel solutions
• it involves key stakeholders and hence aids communication overall
• it is relatively quick and easy to set up.

Limitations include:
• participants may lack the skill and knowledge to be effective contributors;
• since it is relatively unstructured, it is difficult to demonstrate that the process has been
comprehensive, e.g. that all potential risks have been identified;
• there may be group dynamics where some people with valuable ideas stay
quiet while others dominate the discussion. This can be overcome by computer
brainstorming, using a chat forum or nominal group technique. Computer brainstorming
can be set up to be anonymous, thus avoiding personal and political issues which may
impede free flow of ideas. In nominal group technique ideas are submitted anonymously
to a moderator and are then discussed by the group.

3.2 Delphi Technique

3.2.1 Overview

The Delphi technique is a procedure for obtaining a reliable consensus from a group of experts. Although the
term is now often used broadly to mean any form of brainstorming, an essential feature of the Delphi
technique, as originally formulated, was that experts expressed their opinions individually and
anonymously while having access to the other experts' views as the process progressed.

3.2.2 Use
The Delphi technique can be applied at any stage of the risk management process or at any phase of a
system life cycle, wherever a consensus of views of experts is needed.

3.2.3 Input


A set of options for which consensus is needed.

3.2.4 Process

Step 1: Choose a Facilitator


The first step is to choose your facilitator. You may wish to take on this role yourself or find a neutral
person within the organization. It is useful to have someone who is familiar with research and data collection.

Step 2: Identify Your Experts


The Delphi technique relies on a panel of experts. This panel may be your project team, including the
customer, or other experts. An expert is "any individual with relevant knowledge and experience of a
particular topic."

Step 3: Define the Problem


What is the problem or issue you are seeking to understand? The experts need to know what problem they
are commenting on, so ensure you provide a precise and comprehensive definition.

Step 4: Round One Questions


Ask general questions to gain a broad understanding of the experts' views on future events. The questions
may go out in the form of a questionnaire or survey. Collate and summarise the responses, removing any
irrelevant material and looking for common viewpoints.

Step 5: Round Two Questions


Based on the answers to the first questions, the next questions should probe deeper into the topic to clarify
specific issues. These questions may also go out in the form of a questionnaire or survey. Again, collate
and summarise the results, removing any irrelevant material and look for the common ground.

Step 6: Round Three Questions


The final questionnaire aims to support decision making. Home in on the areas of agreement: what is it
the experts all agree upon? You may wish to have more than three rounds of questioning
to reach a closer consensus.
Step 7: Act on Your Findings
After this round of questions, your experts will, it is hoped, have reached a consensus and you will have a
view of future events. Analyse the findings and put plans in place to deal with future risks and
opportunities for your project.

3.2.5 Output
Convergence toward consensus on the matter in hand.

3.2.6 Strengths and limitations


Strengths include:
• as views are anonymous, unpopular opinions are more likely to be expressed;
• all views have equal weight, which avoids the problem of dominating personalities;
• achieves ownership of outcomes;
• people do not need to be brought together in one place at one time.


Limitations include:
• it is labour intensive and time consuming;
• participants need to be able to express themselves clearly in writing.

3.3 Failure Mode and Effects Analysis (FMEA)

3.3.1 Overview
Failure modes and effects analysis (FMEA) is a technique used to identify the ways in which components,
systems or processes can fail to fulfil their design intent.

FMEA identifies:
• all potential failure modes of the various parts of a system (a failure mode is what is
observed to fail or to perform incorrectly);
• the effects these failures may have on the system;
• the mechanisms of failure;
• how to avoid the failures, and/or mitigate the effects of the failures on the system.

FMECA extends an FMEA so that each fault mode identified is ranked according to its importance or
criticality. This criticality analysis is usually qualitative or semi-quantitative but may be quantified using
actual failure rates.

3.3.2 Use
There are several applications of FMEA: Design (or product) FMEA which is used for components and
products, System FMEA which is used for systems, Process FMEA which is used for manufacturing and
assembly processes, Service FMEA and Software FMEA.

FMEA may be applied during the design, manufacture or operation of a physical system.

To improve dependability, however, changes are usually more easily implemented at the design stage.
FMEA may also be applied to processes and procedures. For example, it is used to identify potential for
medical error in healthcare systems and failures in maintenance procedures.

FMEA can be used to


• assist in selecting design alternatives with high dependability,
• ensure that all failure modes of systems and processes, and their effects on operational success
have been considered,
• identify human error modes and effects,
• provide a basis for planning testing and maintenance of physical systems,
• improve the design of procedures and processes,
• provide qualitative or quantitative information for analysis techniques such as fault tree analysis


FMEA can provide input to other analyses techniques such as fault tree analysis at either a qualitative or
quantitative level.

Terms and Definitions:


Process/Product Characteristics Purpose of the product or process
Failure Mode How can the product/process fail to function?
Effects Which effects are most severe to customer?
Causes Which causes are most likely to occur?
Controls Ability for current controls to detect causes?
RPN (Risk Priority Number) Which high risk to work on first?
Action Plan Recommended actions

3.3.3 Input
FMEA needs information about the elements of the system in sufficient detail for a meaningful analysis of
the ways in which each element can fail. For a detailed Design FMEA the element may be at the detailed
individual component level, while for a higher-level System FMEA the elements may be defined at a higher
level.

Information may include:


• drawings or a flow chart of the system being analysed and its components, or the steps
• an understanding of the function of each step of a process or component of a system
• details of environmental and other parameters, which may affect operation
• an understanding of the results of failure;
• historical information on failures including failure rate data where available.

3.3.4 Process
The FMEA process is as follows:


[Figure: FMEA worksheet flow – Process/Product Characteristics → Failure Mode → Effect (Severity) → Cause (Occurrence) → Control (Detectability) → Risk Priority Number → Action Plan]

1. Assemble a cross-functional team of people with diverse knowledge about the process, product or
service and customer needs. Functions often included are: design, quality, testing, reliability,
maintenance, purchasing (and suppliers), sales, and customer service.

2. Identify the scope of the FMEA. Is it for concept, system, design, process or service? What are
the boundaries? How detailed should we be? Use flowcharts to identify the scope and to make
sure every team member understands it in detail. (From here on, we’ll use the word “scope” to
mean the system, design, process or service that is the subject of your FMEA.)

3. Fill in the identifying information at the top of your FMEA form.

4. Identify the functions of your scope. Ask, “What is the purpose of this system, design, process or
service? What do our customers expect it to do?” Usually you will break the scope into separate
subsystems, items, parts, assemblies or process steps and identify the function of each.

5. For each function, identify all the ways failure could happen. These are potential failure modes. If
necessary, go back and rewrite the function with more detail to be sure the failure modes show a
loss of that function.

6. For each failure mode, identify all the consequences on the system, related systems, process,
related processes, product, service, customer or regulations. These are potential effects of failure.
Ask, “What does the customer experience because of this failure? What happens when this failure
occurs?”
7. Determine how serious each effect is. This is the severity rating, or S. Severity is usually rated on
a scale from 1 to 10, where 1 is insignificant and 10 is catastrophic. If a failure mode has more
than one effect, write on the FMEA table only the highest severity rating for that failure mode.


Rating Degree of Severity


1 Customer will not notice the adverse effect, or it is insignificant.
2 Customer will probably experience slight annoyance.
3 Customer will experience a slight annoyance because of poor service.
4 Customer dissatisfaction because of poor service.
5 Customer is made uncomfortable or their productivity is reduced by
the continued poor service.
6 Customer complaint because of service issue.
7 High degree of customer dissatisfaction due to loss of being able to
use a portion of the service.
8 Very high degree of customer dissatisfaction due to loss of service.
9 Customer has lost the total use of service.
10 Customer has lost the total use of service and will never return.

8. For each failure mode, determine all the potential root causes. Use tools classified as cause
analysis tools, as well as the best knowledge and experience of the team. List all possible causes
for each failure mode on the FMEA form. The use of cause-and-effect diagrams, Pareto diagrams and
Ishikawa diagrams is recommended.

9. For each cause, determine the occurrence rating, or O. This rating estimates the probability of
failure occurring for that reason during the lifetime of your scope. Occurrence is usually rated on
a scale from 1 to 10, where 1 is extremely unlikely and 10 is inevitable. On the FMEA table, list
the occurrence rating for each cause.

Rating Likelihood of occurrence


1 Likelihood of occurrence is remote.
2 Low failure rate with supporting documentation.
3 Low failure rate without supporting documentation.
4 Occasional failures.
5 Relatively Moderate Failure rate with supporting documentation.
6 Moderate Failure rate without supporting documentation.
7 Relatively High Failure rate with supporting documentation.


8 High Failure rate without supporting documentation.


9 Failure is almost certain based on data.
10 Assured of failure based on data.

10. For each cause, identify current process controls. These are tests, procedures or mechanisms that
you now have in place to keep failures from reaching the customer. These controls might prevent
the cause from happening, reduce the likelihood that it will happen or detect failure after the
cause has already happened but before the customer is affected.

11. For each control, determine the detection rating, or D. This rating estimates how well the controls
can detect either the cause or its failure mode after they have happened but before the customer is
affected. Detection is usually rated on a scale from 1 to 10 (refer to the table below). On the FMEA
table, list the detection rating for each cause.

Rating Ability to Detect


1 Certain that the potential failure will be found or prevented before reaching the next
customer.
2 Almost certain that the potential failure will be detected or prevented before reaching
the next customer.
3 Low likelihood that the potential failure will reach the next customer undetected.
4 Controls may detect or prevent the potential failure from reaching the next customer.
5 Moderate likelihood that the potential failure will reach the next customer.
6 Controls are unlikely to detect or prevent the potential failure from reaching the next
customer.
7 Poor likelihood that the potential failure will be detected or prevented before reaching
the next customer.
8 Very poor likelihood that the potential failure will be detected or prevented before
reaching the next customer.
9 Current controls probably will not even detect the potential failure.
10 Absolute certainty that the current controls will not detect the potential failure.

12. Is this failure mode associated with a critical characteristic? (Critical characteristics are
measurements or indicators that reflect safety or compliance with government regulations and
need special controls.) Usually, critical characteristics have a severity of 9 or 10 and occurrence
and detection ratings above 3.

13. Calculate the risk priority number, or RPN, which equals S × O × D. Also calculate Criticality by
multiplying severity by occurrence, S × O. These numbers provide guidance for ranking potential
failures in the order they should be addressed.


RPN = Severity Rating X Likelihood of Occurrence X Ability to Detect

14. Identify recommended actions and record them in the template along with the responsible person.
These actions may be design or process changes that lower the RPN value. To reduce the RPN
value, the new design change, process change or new control will induce some change in the
severity, likelihood of occurrence or detection efficiency. The values of Severity, Likelihood of
Occurrence and Ability to Detect will therefore change based on the action item implemented, and
this needs to be updated in the green columns of the template.

The RPN value is re-calculated using the above formula and verified against the acceptable range. The
acceptable range of RPN may vary from situation to situation. The team needs to decide whether the new
RPN value is acceptable; if not, the steps above (from step 2 onwards) should be followed again.
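For illustration only, a minimal sketch of the RPN (S × O × D) and criticality (S × O) calculations described in the steps above is given below; the failure modes and ratings are invented, and the ranking simply follows the highest-RPN-first guidance of step 13.

```python
# Minimal sketch of the RPN (S x O x D) and criticality (S x O) calculations
# from the FMEA steps above. Records and ratings are illustrative only.

failure_modes = [
    # (failure mode, severity, occurrence, detection)
    ("Login service unavailable",     8, 4, 3),
    ("Slow response under peak load", 6, 6, 5),
    ("Incorrect tax rounding",        9, 2, 7),
]

scored = []
for mode, severity, occurrence, detection in failure_modes:
    rpn = severity * occurrence * detection    # Risk Priority Number
    criticality = severity * occurrence        # Severity x Occurrence
    scored.append((mode, rpn, criticality))

# Address the highest RPN first, as recommended in step 13.
for mode, rpn, criticality in sorted(scored, key=lambda r: r[1], reverse=True):
    print(f"{mode:35s} RPN={rpn:4d}  Criticality={criticality:3d}")
```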

3.3.5 Output
The primary output of FMEA is a list of failure modes, the failure mechanisms and effects for each
component or step of a system or process (which may include information on the likelihood of failure).
Information is also given on the causes of failure and the consequences to the system. The output from
FMECA includes a rating of importance based on the likelihood that the system will fail, the level of risk
resulting from the failure mode or a combination of the level of risk and detectability of the failure mode.

FMECA can give a quantitative output if suitable failure rate data and quantitative consequences are used.

3.3.6 Strengths and limitations


The strengths of FMEA/FMECA are as follows:
• widely applicable to human, equipment and system failure modes and to hardware,
software and procedures;
• identify component failure modes, their causes and their effects on the system, and
present them in an easily readable format;
• avoid the need for costly equipment modifications in service by identifying problems
early in the design process;
• identify single point failure modes and requirements for redundancy or safety systems;
• provide input to the development monitoring programmes by highlighting key features to
be monitored.

Limitations include:
• they can only be used to identify single failure modes, not combinations of failure modes;
• unless adequately controlled and focused, the studies can be time consuming and costly;
• they can be difficult and tedious for complex multi-layered systems.

3.4 Event Tree Analysis (ETA)


3.4.1 Overview
ETA is a graphical technique for representing the mutually exclusive sequences of events following an
initiating event according to the functioning/not functioning of the various systems designed to mitigate
its consequences (see Figure). It can be applied both qualitatively and quantitatively.

Figure – Example of an event tree

The figure shows simple calculations for a sample event tree when branches are fully independent. By
fanning out like a tree, ETA can represent the aggravating or mitigating events in response to the
initiating event, considering additional systems, functions or barriers.

3.4.2 Use
ETA can be used for modelling, calculating and ranking (from a risk point of view) different accident
scenarios following the initiating event.

ETA can be used at any stage in the life cycle of a product or process. It may be used qualitatively to help
brainstorm potential scenarios and sequences of events following an initiating event and how outcomes
are affected by various treatments, barriers or controls intended to mitigate unwanted outcomes.

The quantitative analysis lends itself to consider the acceptability of controls. It is most often used to
model failures where there are multiple safeguards.

ETA can be used to model initiating events which might bring loss or gain. However, circumstances
where pathways to optimize gain are sought are more often modelled using a decision tree.


3.4.3 Input
Inputs include:
• a list of appropriate initiating events;
• information on treatments, barriers and controls, and their failure probabilities
• understanding of the processes whereby an initial failure escalates.

3.4.4 Process
An event tree starts by selecting an initiating event. This may be an incident such as a dust explosion or a
causal event such as a power failure. Functions or systems which are in place to mitigate outcomes are
then listed in sequence. For each function or system, a line is drawn to represent their success or failure.

A probability of failure can be assigned to each line, with this conditional probability estimated e.g. by
expert judgement or a fault tree analysis. In this way, different pathways from the initiating event are
modelled.

Note that the probabilities on the event tree are conditional probabilities, for example the probability of a
sprinkler functioning is not the probability obtained from tests under normal conditions, but the
probability of functioning under conditions of fire caused by an explosion.

Each path through the tree represents the probability that all the events in that path will occur. Therefore,
the frequency of the outcome is given by the product of the individual conditional probabilities and
the frequency of the initiating event, provided that the various events are independent.
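As a small illustration of the calculation just described (outcome frequency = initiating event frequency × the conditional probabilities along the path, assuming independence), the sketch below enumerates the paths of a two-barrier event tree; the initiating event, barriers and probabilities are invented.

```python
# Illustrative event tree calculation: outcome frequency equals the initiating
# event frequency multiplied by the conditional probabilities along each path.
# The initiating event, barriers and probabilities below are invented.

from itertools import product

initiating_frequency = 0.1        # assumed initiating events per year
barriers = {
    "detection system works": 0.90,   # conditional probability of success
    "containment works":      0.80,
}

for outcomes in product([True, False], repeat=len(barriers)):
    freq = initiating_frequency
    path = []
    for (name, p_success), ok in zip(barriers.items(), outcomes):
        freq *= p_success if ok else (1.0 - p_success)
        path.append(f"{name}: {'success' if ok else 'failure'}")
    print(f"{' -> '.join(path)}  frequency = {freq:.4f}")
```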

3.4.5 Output
Outputs from ETA include the following:
• qualitative descriptions of potential problems as combinations of events producing
various types of problems (range of outcomes) from initiating events;
• quantitative estimates of event frequencies or probabilities and relative importance of
various failure sequences and contributing events;
• lists of recommendations for reducing risks;
• quantitative evaluations of recommendation effectiveness.

3.4.6 Strengths and limitations


Strengths of ETA include the following:
• ETA displays, in a clear diagrammatic way, the potential scenarios following an initiating event and
the influence of the success or failure of mitigating systems or functions
• it accounts for timing, dependence and domino effects that are cumbersome to model in fault trees
• it graphically represents sequences of events which are not possible to represent when using fault
trees.

Limitations include:
• to use ETA as part of a comprehensive assessment, all potential initiating events need to be
identified. This may be done by using another analysis method (e.g. HAZOP, PHA); however,
there is always a potential for missing some important initiating events


• with event trees, only success and failure states of a system are dealt with, and it is difficult to
incorporate delayed success or recovery events;
• any path is conditional on the events that occurred at previous branch points along the path. Many
dependencies along the possible paths are therefore addressed. However, some dependencies,
such as common components, utility systems and operators, may be overlooked if not handled
carefully, which may lead to optimistic estimates of risk.

3.5 Fault Tree Analysis (FTA)


3.5.1 Overview
FTA is a technique for identifying and analysing factors that can contribute to a specified undesired event
(called the “top event”). Causal factors are deductively identified, organized in a logical manner and
represented pictorially in a tree diagram which depicts causal factors and their logical relationship to the
top event.

The factors identified in the tree can be events that are associated with component hardware failures,
human errors or any other pertinent events which lead to the undesired event.

Figure – Example of an FTA

3.5.2 Use
A fault tree may be used qualitatively to identify potential causes and pathways to a failure (the top event)
or quantitatively to calculate the probability of the top event, given knowledge of the probabilities of
causal events. It may be used at the design stage of a system to identify potential causes of failure and
hence to select between different design options. It may be used at the operating phase to identify how
major failures can occur and the relative importance of different pathways to the head event.

A fault tree may also be used to analyse a failure which has occurred, to display diagrammatically how
different events came together to cause the failure.

3.5.3 Inputs
For qualitative analysis, an understanding of the system and the causes of failure is required, as well as a
technical understanding of how the system can fail. Detailed diagrams are useful to aid the analysis.

For quantitative analysis, data on failure rates or the probability of being in a failed state for
all basic events in the fault tree are required.

3.5.4 Process
The steps for developing a fault tree are as follows:
• The top event to be analysed is defined. This may be a failure or may be a broader outcome of that
failure. Where the outcome is analysed, the tree may contain a section relating to mitigation of the
actual failure.
• Starting with the top event, the possible immediate causes or failure modes leading to the top
event are identified.
• Each of these causes/fault modes is analysed to identify how their failure could be caused.
• Stepwise identification of undesirable system operation is followed to successively lower system
levels until further analysis becomes unproductive. In a hardware system this may be the
component failure level. Events and causal factors at the lowest system level analysed are known
as base events.
• Where probabilities can be assigned to base events the probability of the top event may be
calculated. For quantification to be valid it must be able to be shown that, for each gate, all inputs
are both necessary and sufficient to produce the output event. If this is not the case, the fault tree
is not valid for probability analysis but may be a useful tool for displaying causal relationships.

As part of quantification the fault tree may need to be simplified using Boolean algebra to account for
duplicate failure modes.

As well as providing an estimate of the probability of the head event, minimal cut sets, which form
individual separate pathways to the head event, can be identified and their influence on the top event
calculated.

Except for simple fault trees, a software package is needed to properly handle the calculations when
repeated events are present at several places in the fault tree, and to calculate minimal cut sets. Software
tools help ensure consistency, correctness and verifiability.
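To illustrate the quantification step described above, the sketch below combines independent basic-event probabilities through AND/OR gates to estimate the top event probability; the tree structure and probabilities are invented, and real analyses would normally rely on dedicated FTA software as noted.

```python
# Minimal fault tree quantification sketch, assuming independent basic events.
# AND gate: product of input probabilities; OR gate: 1 - product of (1 - p).
# The tree structure and probabilities below are invented for illustration.

def gate_probability(node) -> float:
    if isinstance(node, float):                 # basic event probability
        return node
    gate, children = node
    probs = [gate_probability(child) for child in children]
    if gate == "AND":
        result = 1.0
        for p in probs:
            result *= p
        return result
    if gate == "OR":
        result = 1.0
        for p in probs:
            result *= (1.0 - p)
        return 1.0 - result
    raise ValueError(f"unknown gate {gate!r}")

# Top event occurs if basic event A occurs, OR if basic events B and C both occur.
top_event = ("OR", [
    0.01,                    # basic event A
    ("AND", [0.05, 0.02]),   # basic events B and C
])

print(f"P(top event) = {gate_probability(top_event):.5f}")
```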

3.5.5 Outputs
The outputs from fault tree analysis are as follows:


• a pictorial representation of how the top event can occur which shows interacting pathways where
two or more simultaneous events must occur;
• a list of minimal cut sets (individual pathways to failure) with (where data is available) the
probability that each will occur;
• the probability of the top event.

3.5.6 Strengths and limitations

Strengths of FTA:
• It affords a disciplined approach which is highly systematic, but at the same time sufficiently
flexible to allow analysis of a variety of factors, including human interactions and physical
phenomena.
• The application of the "top-down" approach, implicit in the technique, focuses attention on those
effects of failure which are directly related to the top event.
• FTA is especially useful for analysing systems with many interfaces and interactions.
• The pictorial representation leads to an easy understanding of the system behavior and the factors
included, but as the trees are often large, processing of fault trees may require computer systems.
This feature enables more complex logical relationships to be included (e.g. NAND and NOR)
but also makes the verification of the fault tree difficult.
• Logic analysis of the fault trees and the identification of cut sets is useful in identifying simple
failure pathways in a very complex system, where a combination of events which lead to the top
event could be overlooked.

Limitations include:
• Uncertainties in the probabilities of base events are included in calculations of the probability of
the top event. This can result in high levels of uncertainty where base event failure probabilities
are not known accurately; however, a high degree of confidence is possible in a well understood
system.
• In some situations, causal events are not bound together, and it can be difficult to ascertain
whether all important pathways to the top event are included. For example, including all ignition
sources in an analysis of a fire as a top event. In this situation probability analysis is not possible.
• Fault tree is a static model; time interdependencies are not addressed.
• Fault trees can deal only with binary states (failed / not failed).
• While human error modes can be included in a qualitative fault tree, in general failures of degree
or quality which often characterize human error cannot easily be included;
• A fault tree does not enable domino effects or conditional failures to be included easily.


3.6 Bow Tie Analysis (BTA)

3.6.1 Overview
Bow tie analysis is a simple diagrammatic way of describing and analysing the pathways of a risk from
causes to consequences. It can be a combination of the thinking of a fault tree analysing the cause of an
event (represented by the knot of a bow tie) and an event tree analysing the consequences. However, the
focus of the bow tie is on the barriers between the causes and the risk, and the risk and consequences.
Bow tie diagrams can be constructed starting from fault and event trees but are more often drawn directly
from a brainstorming session.

3.6.2 Use
Bow tie analysis is used to display a risk showing a range of possible causes and consequences. It is used
when the situation does not warrant the complexity of a full fault tree analysis or when the focus is more
on ensuring that there is a barrier or control for each failure pathway. It is useful where there are clear
independent pathways leading to failure.

Bow tie analysis is often easier to understand than fault and event trees, and hence can be a useful
communication tool where analysis is achieved using more complex techniques.

3.6.3 Input
An understanding is required of information on the causes and consequences of a risk and the barriers and
controls which may prevent, mitigate or stimulate it.


3.6.4 Process
The bow tie is drawn as follows:

Step 1: Define top event.

o The top event is the central event at the knot of the bow tie, i.e. the initial consequence of the threats

Step 2: Identify Threats

o Threats are the causes of the top event.

Step 3: Identify Barriers for each threat

o Barriers prevent the threat from leading to the top event.


o Some barriers are dependent on each other or subject to common failures

Step 4: For each barrier, identify escalation factors and controls

o Escalation factors cause the barriers to fail


o Controls prevent the escalation factors from leading to barrier failure

Step 5: Identify Consequences

o Each top event can have several consequences

Step 6: Identify recovery preparedness measures for each consequence

o Recovery preparedness measures prevent the top event leading to the consequence

Step 7: For each recovery preparedness measure, identify escalation factors and controls

o Escalation factors cause the recovery preparedness measure to fail


o Controls prevent the escalation factors from leading to recovery preparedness measure
failure

Step 8: For each Barrier, Recovery Preparedness Measure and Escalation factor control, identify
the Critical controls

Some level of quantification of a bow tie diagram may be possible where pathways are independent, the
probability of a consequence or outcome is known, and a figure can be estimated for the effectiveness of a
control. However, in many situations pathways and barriers are not independent, and controls may be
procedural, so that their effectiveness is unclear. Quantification is often more appropriately carried out
using FTA and ETA.
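As a minimal sketch of such quantification (Python), assuming independent pathways and using invented
threats, threat probabilities and barrier failure probabilities purely to make the arithmetic concrete:

# Minimal bow tie quantification sketch (hypothetical values only).
# Assumes pathways and barriers are independent, which the text notes is
# often not true in practice; FTA/ETA are usually better for quantification.

# Left side: probability that a threat occurs and all its barriers fail.
threats = {
    # threat: (annual probability of threat, [barrier failure probabilities])
    "Phishing attack": (0.30, [0.10, 0.20]),   # email filter, staff training
    "Lost laptop":     (0.05, [0.02]),          # disk encryption
}

p_no_top = 1.0
for name, (p_threat, barrier_fail) in threats.items():
    p_path = p_threat
    for p_fail in barrier_fail:
        p_path *= p_fail
    print(f"{name}: pathway probability = {p_path:.4f}")
    p_no_top *= (1.0 - p_path)

p_top = 1.0 - p_no_top     # top event occurs if any pathway succeeds
print(f"P(top event) = {p_top:.4f}")

# Right side: probability of a consequence given the top event,
# taking a recovery preparedness measure into account.
p_recovery_fails = 0.25    # hypothetical
p_consequence = p_top * p_recovery_fails
print(f"P(consequence) = {p_consequence:.4f}")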


3.6.5 Output
The output is a simple diagram showing main risk pathways and the barriers in place to prevent or
mitigate the undesired consequences or stimulate and promote desired consequences.

Figure - Example bow tie diagram for unwanted consequences

3.6.6 Strengths and limitations


Strengths of bow tie analysis:
• it is simple to understand and gives a clear pictorial representation of the problem;
• it focuses attention on controls which are supposed to be in place for both prevention and
mitigation and their effectiveness;
• it can be used for desirable consequences;
• it does not need a high level of expertise to use.

Limitations include:
• it cannot depict where multiple causes occur simultaneously to cause the consequences
(i.e. where there are AND gates in a fault tree depicting the left-hand side of the bow);
• it may over-simplify complex situations, particularly where quantification is attempted.


3.7 Bayesian Statistics


3.7.1 Overview
Bayesian statistics is attributed to the Reverend Thomas Bayes. Its premise is that any already known
information (the Prior) can be combined with subsequent measurement to establish an updated overall
probability (the Posterior). The general expression of Bayes' Theorem can be written as:

P(X|Y) = {P(X) P(Y|X)} / Σi {P(Ei) P(Y|Ei)}

where
the probability of X is denoted by P(X);
the probability of X on the condition that Y has occurred is denoted by P(X|Y); and
Ei is the ith event.

In its simplest form this reduces to P(A|B) = {P(A) P(B|A)} / P(B).

Bayesian statistics differs from classical statistics in that it does not assume that all distribution
parameters are fixed, but treats parameters as random variables. A Bayesian probability is more easily
understood if it is considered as a person's degree of belief in a certain event, as opposed to the classical
interpretation, which is based upon physical evidence. Because the Bayesian approach is based on a
subjective interpretation of probability, it provides a ready basis for decision thinking and for the
development of Bayesian nets (also called Belief Nets, belief networks or Bayesian networks).

Bayes nets use a graphical model to represent a set of variables and their probabilistic relationships. The
network consists of nodes, each representing a random variable, and arrows linking a parent node to a
child node (a parent node being a variable that directly influences another, child, variable).

3.7.2 Use
In recent years, the use of Bayes' theory and Bayes nets has become widespread, partly because of their intuitive
appeal and because of the availability of software computing tools. Bayes nets have been used on a wide
range of topics: medical diagnosis, image modelling, genetics, speech recognition, economics, space
exploration and in the powerful web search engines used today. They can be valuable in any area where
there is the requirement for finding out about unknown variables through the utilization of structural
relationships and data. Bayes nets can be used to learn causal relationships to give an understanding about
a problem domain and to predict the consequences of intervention.

3.7.3 Input
The inputs are similar to those for a Monte Carlo model. For a Bayes net, examples of the
steps to be taken include the following:
• define system variables;
• define causal links between variables;
• specify conditional and prior probabilities;
• add evidence to net;
• perform belief updating;
• extract posterior beliefs.


3.7.4 Process
Bayes' theory can be applied in a wide variety of ways. This example considers the creation of a Bayes
table where a medical test is used to determine whether a patient has a disease. The belief before taking
the test is that 99 % of the population do not have the disease and 1 % do, i.e. the Prior information. The
accuracy of the test is such that if the person has the disease, the test result is positive 98 % of the time;
if the person does not have the disease, the test result is nevertheless positive 10 % of the time. The Bayes
table provides the following information:

Table – Bayes’ table data

Using Bayes' rule, the product is determined by multiplying the prior by the probability of the test result.
The posterior is found by dividing each product value by the product total. The output shows that a
positive test result raises the probability of having the disease from the prior of 1 % to a posterior of 9 %.
More importantly, even with a positive test result it remains unlikely that the person has the disease.

Examining the equation (0,01 × 0,98) / ((0,01 × 0,98) + (0,99 × 0,1)) shows that the ‘no disease positive
result’ value plays a major role in the posterior values.
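The arithmetic of the Bayes table can be checked in a few lines. The sketch below (Python) uses the same
figures as the worked example: a 1 % prior, a 98 % detection rate and a 10 % false-positive rate.

# Bayes table for the disease-test example: prior x likelihood -> posterior.

prior = {"disease": 0.01, "no disease": 0.99}
p_positive_given = {"disease": 0.98, "no disease": 0.10}

# Product column: prior * probability of a positive test result.
product = {h: prior[h] * p_positive_given[h] for h in prior}
total = sum(product.values())

# Posterior column: normalise the products so they sum to 1.
posterior = {h: product[h] / total for h in product}

for h in posterior:
    print(f"P({h} | positive test) = {posterior[h]:.3f}")
# P(disease | positive test)    ~ 0.090  (prior 1 % -> posterior ~9 %)
# P(no disease | positive test) ~ 0.910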

Consider the following Bayes net:

With the conditional and prior probabilities defined in the following tables, and using the notation that Y
indicates positive and N indicates negative, Y could represent “have disease” as above, or High, while N
could represent Low.

Table – Prior probabilities for nodes A and B

Table – Conditional probabilities for node C with node A and node B defined


Table – Conditional probabilities for node D with node A and node C defined

To determine the posterior probability of P(A|D=N,C=Y), it is necessary to first calculate


P(A,B|D=N,C=Y).

Using Bayes’ rule, the value P(D|A,C) P(C|A,B) P(A) P(B) is determined for each combination as shown
below, and the last column shows the normalized probabilities, which sum to 1, derived as in the previous
example (results rounded).

Table – Posterior probability for nodes A and B with node D and node C defined

Table – Posterior probability for node A with node D and node C defined

This shows that the prior for P(A=N) has increased from 0,1 to a posterior of 0,12, which is only a small
change. On the other hand, P(B=N|D=N,C=Y) has changed from 0,4 to 0,56, which is a more significant change.
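As an illustration of how such posteriors are computed, the sketch below (Python) performs exact
inference by enumeration on the same four-node structure (A and B are parents of C; A and C are parents
of D). The conditional probability tables used here are hypothetical placeholders rather than the values in
the tables above; only the priors P(A=N) = 0,1 and P(B=N) = 0,4 match the text, so the numerical result
will differ from the 0,12 quoted above.

# Exact-inference sketch for the net A, B -> C and A, C -> D.
# CPT values below are hypothetical placeholders, not the document's tables.

from itertools import product

p_a = {"Y": 0.9, "N": 0.1}     # prior for A (P(A=N) = 0.1, as in the text)
p_b = {"Y": 0.6, "N": 0.4}     # prior for B (P(B=N) = 0.4, as in the text)

# Hypothetical CPTs: P(C=Y | A, B) and P(D=Y | A, C)
p_c_yes = {("Y", "Y"): 0.9, ("Y", "N"): 0.6, ("N", "Y"): 0.5, ("N", "N"): 0.2}
p_d_yes = {("Y", "Y"): 0.8, ("Y", "N"): 0.4, ("N", "Y"): 0.3, ("N", "N"): 0.1}

def p_c(c, a, b):
    return p_c_yes[(a, b)] if c == "Y" else 1.0 - p_c_yes[(a, b)]

def p_d(d, a, c):
    return p_d_yes[(a, c)] if d == "Y" else 1.0 - p_d_yes[(a, c)]

# Evidence: D = N and C = Y.  Joint weight for each (A, B) assignment:
weights = {}
for a, b in product("YN", repeat=2):
    weights[(a, b)] = p_a[a] * p_b[b] * p_c("Y", a, b) * p_d("N", a, "Y")

total = sum(weights.values())
posterior_ab = {k: v / total for k, v in weights.items()}   # P(A,B | D=N, C=Y)

# Marginalise out B to obtain P(A=N | D=N, C=Y).
posterior_a_n = sum(v for (a, b), v in posterior_ab.items() if a == "N")
print(f"P(A=N | D=N, C=Y) = {posterior_a_n:.3f}")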

3.7.5 Outputs

The Bayesian approach can be applied to the same extent as classical statistics with a wide range of
outputs, e.g. data analysis to derive point estimators and confidence intervals. Its recent popularity is in
relation to Bayes nets to derive posterior distributions. The graphical output provides an easily understood
model and the data can be readily modified to consider correlations and sensitivity of parameters.

3.7.6 Strengths and limitations


Strengths:
• all that is needed is knowledge of the priors;
• inferential statements are easy to understand;
• Bayes’ rule is all that is required;
• it provides a mechanism for using subjective beliefs in a problem.
Limitations:
• defining all interactions in Bayes nets for complex systems is problematic;
• the Bayesian approach needs knowledge of a multitude of conditional probabilities, which are
generally provided by expert judgment; software tools can only provide answers based on these
assumptions.


APPENDIX A – Some Typical Project Risks

Refer to the corporate risk database in PRIME.

Path: StationH → Webapps → PRIME→ Organization level reports→Corporate Risk Database


APPENDIX B – Industry Top 10 Risks

1. Consultants (Risk Management Project) – Aggressive schedules on fixed budgets will almost certainly
cause a schedule slip and a cost overrun. Appropriate staffing is incomplete early in the project. There is
no time for needed training. The productivity rate needed to meet the schedule is unlikely to be achieved.
Overtime is perceived as a standard procedure to overcome schedule deficiencies. Lack of analysis time may
result in an incomplete understanding of product functional requirements.
2. Requirements (Risk Technical Product) – Poorly defined user requirements almost certainly will
cause existing system requirements to be incomplete. Documentation does not adequately describe
the system components. Interface document is not approved. Domain experts are inaccessible and
unreliable. Detailed requirements must be derived from existing code. Some requirements are
unclear, such as the software reliability analysis and the acceptance criteria. Requirements may
change due to customer turnover.
3. Development Process (Risk Technical Process) – A poorly conceived development process is highly
likely to cause implementation problems. A new methodology is being introduced from the company's
software process improvement initiative. The internally imposed development process is new and
unfamiliar. The Software Development Plan is inappropriately tailored for the size of the project.
Development tools are not fully integrated. Customer file formats and maintenance capabilities are
incompatible with the existing development environment.
4. Project Interfaces (Risk Management Project) – Dependence on external software delivery has a
very good chance of causing a schedule slip. Subcontractor technical performance is below
expectations. There is unproven hardware with poor vendor track record. Subcontractor commercial
methodology conflicts with customer MIL spec methodology. Customer action item response time is
slow. Having difficulty keeping up with changing / increasing demands of customers.
5. Management Process (Risk Management Process) – Poor planning is highly likely to cause an
increase in development risk. Management does not have a picture of how to manage object-oriented
(i.e. iterative) development. Project sizing is inaccurate. Roles and responsibilities are not well
understood. Assignment of system engineers is arbitrary. There is a lack of time and staff for
adequate internal review of products. No true reporting moves up through upper management.
Information appears to be filtered.
6. Development System (Risk Technical Process) – Inexperience with the development system will
probably cause lower productivity in the short term. Nearly all aspects of the development system are
new to the project team. The level of experience with the selected tool suite will place the entire team
on the learning curve. There is no integrated development environment for software, quality
assurance, configuration management, systems engineering, test and the program management office.
System administration support in tools, operating system, networking, recovery and backups is
lacking.


7. Design (Risk Technical Product) – Unproven design will likely cause system performance problems
and inability to meet performance commitments. The protocol suite has not been analyzed for
performance. Delayed inquiry and global query are potential performance problems. As the design
evolves, database response time may be hard to meet. Object-oriented runtime libraries are assumed
to be perfect. Whether state and local backbones of sufficient bandwidth to support image data can be
built is questionable. The number of internal interfaces in the proposed design generates complexity that
must be managed. Progress toward meeting technical performance for the subsystem has not been
demonstrated.
8. Management Methods (Risk Management Process) – Lack of management controls will probably
cause an increase in project risk and a decrease in customer delight. Management controls of
requirements are not in place. Content and organization of monthly reports does not provide insight
into the status of project issues. Risks are poorly addressed and not mitigated. Quality control is a
big factor in project but has not been given high priority by the company (customer perspective).
SQA roles and responsibilities have expanded beyond original scope (company perspective).
9. Work Environment (Risk Technical Process) – The remote location of the project team will, we believe,
make organizational support difficult and cause downtime. Information given to technical and
management people does not reach the project team and must be repeated many times. Project status is
not available through team meetings or distribution of status reports. Issues forwarded to managers via
the weekly status report are not consistently acted on. Lack of communication between software
development teams could cause integration problems.
10. Integration and Test (Risk Technical Product) – An optimistic integration schedule has a better than
even chance of leading to acceptance of an unreliable system. The integration schedule does not allow for
the complexity of the system. The effort to develop tests has been underestimated. The source of the data
needed for testing has not been identified. Some requirements are not testable. Formal testing below the
system level is not required. There is limited time to conduct reliability testing.


APPENDIX C – Tracking Status of Indicators - Examples

Indicator | Status | Formulation
Milestones | 10% behind | (Planned – Claimed) / planned milestones
Unit coding | 10% behind | Planned units versus completed ((1400 – 1250) / 1400)
Hours by phase | 20% growth | Requirements and design each overran hours by 20%
Costs | Same labor rate | Supply of programmers exceeds demand
Software size | 10% growth | Extrapolated from LOCs/units completed to date
Requirement changes | 75% changes | Total of 3200 changes to date for 4300 requirements
Reuse/4GL use | 20% less reuse | Extrapolated from units completed to date
Data size | Same data size | Some negotiating on message content, TBD
Number of defects | 2x expected | Total of 300/140K LOC versus 170/140K LOC to date
CPU utilization | Much higher, 90% | System simulation and performance tests (CP only)
Memory utilization | Higher, now 75% | Per 4GL data efficiency experiments (CP only)
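The formulations above are simple ratios. As an illustration only, the sketch below (Python) computes
three of the indicators from the example figures quoted in the table.

# Indicator tracking sketch using the example figures from the table above.

def fraction_behind(planned, completed):
    # Fraction of planned work not yet completed.
    return (planned - completed) / planned

# Unit coding: 1400 units planned, 1250 completed (table reports ~10 % behind).
print(f"Unit coding behind plan: {fraction_behind(1400, 1250):.1%}")   # 10.7%

# Requirement changes: 3200 changes to date against 4300 requirements (~75 %).
print(f"Requirements changed:    {3200 / 4300:.1%}")                   # 74.4%

# Defects: 300 per 140K LOC found versus 170 per 140K LOC expected (~2x).
print(f"Defects vs expected:     {300 / 170:.1f}x")                    # 1.8x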
