KRI Models and Tools For FMECA and Criticality Analysis: Grant Agreement Nº: 768869 Call Identifier: H2020-FOF-2017
Ares(2019)2255633 - 29/03/2019
Strategies and Predictive Maintenance models wrapped around physical systems for
Zero-unexpected-Breakdowns and increased operating life of Factories
Z-BRE4K
Deliverable D3.3
KRI models and tools for FMECA and criticality analysis
Work Package 3
WP3 – Knowledge and Predictive Modelling
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement nº 768869.
The dissemination of results herein reflects only the author’s view and the European Commission
is not responsible for any use that may be made of the information it contains.
The information contained in this report is subject to change without notice and should not be construed as a
commitment by any members of the Z-BRE4K Consortium. The information is provided without any warranty of any
kind.
This document may not be copied, reproduced, or modified in whole or in part for any purpose without written
permission from the Z-BRE4K Consortium. In addition to such written permission to copy, acknowledgement of the
authors of the document and all applicable portions of the copyright notice must be clearly referenced.
© COPYRIGHT 2017 The Z-BRE4K Consortium.
All rights reserved.
Executive Summary
Following the description of Work Package 3 (WP3), and specifically task T3.3, this deliverable D3.3 pursues one of the objectives of WP3: incorporating risk analysis, Key Risk Indicators (KRIs) and FMECA within the predictive maintenance solution of the Z-BRE4K system. Detailed research on fundamental definitions and FMEA types is provided in section 3, while section 4 presents the various FMECA standards, their characteristics and methodology. Section 5 (Criticality Analysis Tool) and section 6 (Key Risk Indicators) identify failures and potential risks, respectively. Finally, the open-source examples summarized in section 7, which present the Process Failure Mode and Effects Analysis for production processes in the ceramics and knitting industries, are complemented by the conclusion and the next steps to follow.
Revision history
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
SUMMARY
1 INTRODUCTION
2 OBJECTIVES AND SCOPE
3 FMEA
4 FMECA
4.2.2 MIL-STD-1629A
LIST OF FIGURES
Figure 1: The FMEA level hierarchy (adapted from IEC 60812).
Figure 2: Deployment of different types of FMECA in the production stages.
Figure 3: Sample Process FMEA in the Automotive Industry Action Group (AIAG) FMEA-4.
Figure 4: Example of risk calculation by FMEA.
Figure 5: Relation among failure causes, modes and effects.
Figure 6: Relation among failure causes, modes and effects.
Figure 7: Example of three stage system failure sequence.
Figure 8: Event Tree.
Figure 9: Differences among fault tree and event tree analysis.
Figure 10: Fire secure system example (fault tree analysis).
Figure 11: Financial pricing model (event tree analysis).
Figure 12: KRI development.
Figure 13: Key objectives linked with potential critical risks.
Figure 14: Asset type/failure mode and failure cause window.
Figure 15: Failure effect window.
Figure 16: Process Failure Mode & Effect Analysis in production process of Ceramic tiles.
Figure 17: Process Failure Mode & Effect Analysis in production process of Knitting industry.
LIST OF TABLES
Table 1: KRI thresholds.
Table 2: Table of Severity.
Table 3: Table of Detectability.
ABBREVIATIONS
Abbreviation Name
FMECA Failure mode, effects and criticality analysis
FMEA Failure mode and effects analysis
KRI Key Risk Indicator
RPN Risk Priority Number
FM Failure Mode
IATF International Automotive Task Force
WP Work Package
CA Criticality Analysis
NASA National Aeronautics and Space Administration
RCM Reliability-Centred Maintenance
FTA Fault Tree Analysis
SUMMARY
The FMECA, KRI models and criticality analysis task T3.3 deals with failure modes, their respective causes and their immediate/final effects, providing an automated FME(C)A process with the goal of replacing the manual FMECA process. It will analyse a number of Key Risk Indicators (KRIs) in each of the Z-BRE4K pilot cases, applying a specific KRI model and an applicable risk assessment approach. Each risk will be categorized by a Risk Priority Number (RPN) with metrics for both the probability and the severity of each risk, allowing mitigation and contingency actions to lower the probability and the severity, respectively. FMECA, the automated module of Z-BRE4K, will show the ways a machinery system could potentially fail (i.e. its Failure Modes (FMs), their respective causes and their immediate and final effects), using both logic diagrams and fault trees for these analyses in the system's background. The approach will determine the indenture levels in the machinery system, broken down into subsystems, replaceable units, individual parts, etc., where failure effects identified at a lower level may become FMs at a higher level, and FMs at a lower level may become failure causes at a higher level. The IEC 60812 standard will be followed, and the resulting classification will consider effects with local or machinery-system-level scope.
1 INTRODUCTION
Failure Mode and Effects Analysis (FMEA) techniques have been used in industry for more than 40 years. FMEA was strongly promoted by the U.S. automotive industry through its QS-9000 supplier requirements, established in 1996, and through the subsequent global efforts of the International Automotive Task Force (IATF) to build on QS-9000 (and other international quality standards) with the development of ISO/TS 16949. In 2002, a revision of ISO/TS 16949 incorporated ISO 9001:2000 and defined the quality system requirements (and application of ISO 9001) for automotive production and relevant service part organizations. Because FMEAs are team based, several people need to be involved in the process; an effective FMEA cannot be produced by one person alone filling out the FMEA forms.
The development of FMECA is sometimes incorrectly attributed to NASA. In parallel with the space programme developments, the use of FMEA and FMECA was already spreading to civil aviation: in 1967 the Society of Automotive Engineers released the first civil publication to address FMECA. The civil aviation industry now tends to use a combination of FMEA and Fault Tree Analysis in accordance with SAE ARP4761 instead of FMECA, though some helicopter manufacturers continue to use FMECA for civil rotorcraft.
Ford Motor Company 1 began using FMEA in the 1970s after problems experienced with its Pinto
model, and by the 1980s FMEA was gaining broad use in the automotive industry. In Europe, the
International Electrotechnical Commission published IEC 812 (now IEC 60812) in 1985,
addressing both FMEA and FMECA for general use. The British Standards Institute published BS
5760–5 in 1991 for the same purpose.
3 FMEA
The purpose of FMEA is to study the results or effects of item failure on system operation and
to classify each potential failure according to its severity. Potential failure mode is defined as the
manner in which the process could potentially fail to meet the process requirements and/or
design intent. Potential failure modes should be described in physical or technical terms, not as
a symptom noticeable by the customer. Typical failure modes include: bent, cracked, surface too rough, deformed, hole too deep, hole off location, etc. The FMEA is a bottom-up method,
where the system under analysis is first hierarchically divided into components (Figure 1). The
division shall be done in such a way that the failure modes of the components at the bottom
level can be identified. The failure effects of the lower level components constitute the failure
modes of the upper level components.
2Haapanen Pentti, Helminen Atte, 2002. FAILURE MODE AND EFFECTS ANALYSIS OF SOFTWARE-BASED
AUTOMATION SYSTEMS. STUK-YTO-TR 190.
The FMEA 3 team determines, by failure mode analysis, the effect of each failure and identifies
single failure points that are crucial. It may also rank each failure according to the criticality of a
failure effect and its probability of occurring.
▪ Design FMEA: used to analyse products before they are released to manufacturing. A design FMEA focuses on failure modes caused by design deficiencies.
▪ Process FMEA: used to analyse manufacturing and assembly processes. A process FMEA focuses on failure modes caused by process or assembly deficiencies.
An FMEA worksheet typically records:
▪ Item(s).
▪ Function(s).
▪ Failure(s).
▪ Effect(s) of Failure.
▪ Cause(s) of Failure.
▪ Current Control(s).
▪ Recommended Action(s).
▪ Other relevant details.

3 Lipol, L. S., & Haq, J. (2011). Risk analysis method: FMEA/FMECA in the organizations. International Journal of Basic & Applied Sciences, 11(5), 74-82.
4 Potential Failure Mode and Effects Analysis, FMEA Reference Manual (4th Edition). ISBN 9781605341361.
In order to report and collect all the information, the standards provide templates for specifying the potential risk. Figure 3 shows an example of an FMEA template used to meet the specific requirements of a product/process.
Figure 3: Sample Process FMEA in the Automotive Industry Action Group (AIAG) FMEA-4.
Most analyses of this type also include some method to assess the risk associated with the issues identified during the analysis and to prioritize corrective actions; the two most common methods are the Risk Priority Number (RPN) and criticality analysis. For calculating risk with the RPN 5 method, risk has three components (severity, occurrence and detection), which are multiplied to produce the RPN. To use the RPN method to assess risk, the analysis team must:
▪ Rate the severity of each effect of failure.
▪ Rate the likelihood of occurrence for each cause of failure.
▪ Rate the likelihood of prior detection for each cause of failure (i.e. the likelihood of detecting the problem before it reaches the end user or customer).
5Kim, K. O., & Zuo, M. J. (2018). General model for the risk priority number in failure mode and effects analysis.
Reliability Engineering & System Safety, 169, 321–329.
The RPN can then be used to compare issues within the analysis and to prioritize problems for
corrective action. An example of risk calculation is explained with a practical scenario within the
Figure 4.
The first priority will be potential failures 2 and 4, as they have the highest severity ranking. Potential failures 1 and 3 have the same severity ranking of 2, but failure 1 has an occurrence of 10, higher than failure 3, so it should be prioritized next.
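The ranking logic above can be sketched in a few lines of code; the severity, occurrence and detection ratings below are illustrative, not taken from Figure 4.

```python
# Illustrative sketch (not part of the Z-BRE4K toolchain): computing RPN
# values and ranking potential failures, first by severity, then occurrence.

def rpn(severity, occurrence, detection):
    """Risk Priority Number: product of the three 1-10 ratings."""
    return severity * occurrence * detection

# Hypothetical failure modes: (severity, occurrence, detection) ratings.
failures = {
    "failure 1": (2, 10, 3),
    "failure 2": (9, 4, 2),
    "failure 3": (2, 5, 3),
    "failure 4": (9, 3, 4),
}

# Rank by severity first, then by occurrence, as in the example above.
ranked = sorted(failures, key=lambda f: (failures[f][0], failures[f][1]),
                reverse=True)
for name in ranked:
    s, o, d = failures[name]
    print(f"{name}: RPN = {rpn(s, o, d)}")
```

With these ratings, failures 2 and 4 (severity 9) come first, followed by failure 1 (occurrence 10) ahead of failure 3, matching the prioritization described above.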
4 FMECA
Industries commonly use FMECA reports that consist of a system description, ground rules and assumptions, conclusions and recommendations, corrective actions to be followed, and the attached FMECA matrix, which may be in spreadsheet, worksheet or database form. According to a Federal Aviation Administration (FAA) research report for commercial space transportation, “Failure Modes, effects, and Criticality Analysis is an excellent hazard analysis and risk assessment tool, but it suffers from other limitations. This alternative does not consider combined failures or typically include software and human interaction considerations. It also usually provides an optimistic estimate of reliability. Therefore, FMECA should be used in conjunction with other analytical tools when developing reliability estimates”6. Within Z-BRE4K, the automated FMECA module will be developed to provide information on the ways a machinery system could potentially fail, defining FMs, their respective causes and their immediate and final effects.
4.1 Definition
The FMECA is composed of two separate analyses, the Failure Mode and Effects Analysis (FMEA)
and the Criticality Analysis (CA). The FMEA analyses different failure modes and their effects on
the system while the CA classifies or prioritizes their level of importance based on failure rate
and severity of the effect of failure. The ranking process of the CA can be accomplished by
utilizing existing failure data or by a subjective ranking procedure conducted by a team of people
with an understanding of the system.
Although the analysis can be applied to any type of system, the guidance below follows the approach originally developed for C4ISR facilities. The FMECA should be initiated as soon as preliminary design
information is available. The FMECA is a living document that is not only beneficial when used
during the design phase but also during system use. As more information on the system is
available the analysis should be updated in order to provide the most benefit.
6 Research and Development Accomplishments FY 2004 (pdf). Federal Aviation Administration. 2004. Retrieved
2010-03-14.
7 [FMECA standards] http://www.julkari.fi/bitstream/handle/10024/124480/stuk-yto-tr190.pdf?sequence=1
sections 3.2-3.4.
Each potential failure is ranked by the severity of its effect so that corrective actions may be
taken to eliminate or control design risk. High risk items are those items whose failure would
jeopardize the mission or endanger personnel. The techniques presented in this standard may
be applied to any electrical or mechanical equipment or system. Although MIL-STD-1629A has been cancelled, its concepts should be applied during the development phases of all critical systems and equipment, whether military, commercial or industrial 7. A short overview of the standards used 2 is presented in the sub-sections below.
4.2.2 MIL-STD-1629A
MIL-STD-1629A, dated 24 November 1980, was published by the United States Department of Defense. The standard establishes requirements and procedures for performing a failure mode, effects, and criticality analysis. In the standard, FMECA is presented as a means to systematically evaluate and document the potential impacts of each functional or hardware failure on mission success, personnel and system safety, system performance, maintainability and maintenance requirements. Each potential failure is ranked by the severity of its effect so that appropriate corrective actions may be taken to eliminate or control the risks of potential failures. The document details the functional block diagram modelling method and defines severity classifications and criticality numbers. The following sample formats are provided by the standard:
▪ FMEA implementation,
▪ what is an FMEA?
▪ format for documenting product/process FMEA on machinery,
▪ development of a product/process FMEA,
▪ suggested evaluation criteria for severity, detection and occurrence of failure.
4.3 Benefits
The FMECA will highlight single point failures requiring corrective action; aid in developing test
methods and troubleshooting techniques; provide a foundation for qualitative reliability,
maintainability, safety and logistics analyses; provide estimates of system critical failure rates;
provide a quantitative ranking of system and/or subsystem failure modes relative to mission
importance; and identify parts & systems most likely to fail.
Therefore, developing a FMECA during the design phase of a facility minimizes overall costs by identifying single point failures and other areas of concern prior to construction or manufacturing. The FMECA also provides a baseline, and a troubleshooting tool, for identifying corrective actions for a given failure. This information can then be used to perform various other analyses, such as a Fault Tree Analysis or a Reliability-Centred Maintenance (RCM) analysis.
The Fault Tree Analysis is a tool used for identifying multiple-point failures, where more than one condition must occur for a particular failure to take place. This analysis is typically conducted on areas whose failure would cripple the mission or cause serious injury to personnel.
The RCM analysis is a process used to identify maintenance actions that will reduce the probability of failure at the least cost. This includes utilizing monitoring equipment to predict failures and, for some equipment, allowing it to run to failure. The process relies on up-to-date operating performance data compiled from a computerized maintenance system; this data is then fed into a FMECA to rank and identify the failure modes of concern.
4.4 Characteristics
The FMECA should be scheduled and completed concurrently as an integral part of the design
process. Ideally this analysis should begin early in the conceptual phase of a design, when the
design criteria, mission requirements and performance parameters are being developed. To be
effective, the final design should reflect and incorporate the analysis results and
recommendations. However, it is not uncommon to initiate a FMECA after the system is built in
order to assess existing risks using this systematic approach.
Since the FMECA is used to support maintainability, safety and logistics analyses, it is important
to coordinate the analysis to prevent duplication of effort within the same program. The FMECA
is an iterative process. As the design becomes mature, the FMECA must reflect the additional
detail. When changes are made to the design, the FMECA must be performed on the redesigned
sections. This ensures that the potential failure modes of the revised components will be
addressed. The FMECA then becomes an important continuous improvement tool for making
program decisions regarding trade-offs affecting design integrity.
4.5 Redundancy
Redundancy means that there is more than one means of performing a required function. When analysing a system that uses redundancy to maintain function and mitigate the consequences of failure, the FMEA considers the possible sources of failure. The objective is to describe, at a high level, the distribution of systems and components into redundant groups; high-level dependencies and intersections between these groups must be described.
The intended normal operation and the operation after relevant single failures (normally one failure at a time) shall also be specified. When redundancy is employed to reduce system vulnerability and increase uptime, failure rates need to be adjusted before applying the preceding formula. This can be accomplished by using formulas from various sources, depending on the application.
1. The failure mode happens in the present, and it describes the way in which the failure is observed.

Reifer (1979) 8 lists the following software failure mode categories:
▪ Computational.
▪ Logic.
▪ Data I/O.
▪ Data Handling.
▪ Interface.
▪ Data Definition.
▪ Data Base.
▪ Other.
Ristord et al. (2001) 9 give the following list of five general-purpose failure modes at processing unit level:
8 Reifer, D.J., 1979, Software Failure Modes and Effects Analysis. IEEE Transactions on Reliability, R-28, 3, pp. 247–
249.
9 Ristord, L. & Esmenjaud, C., 2001, FMEA Per-ored on the SPINLINE3 Operational System Software as part of the
TIHANGE 1 NIS Refurbishment Safety Case. CNRA/CNSI Workshop 2001–Licensing and Operating Experience of
Computer Based I&C Systems. Ceské Budejovice–September 25–27, 2001.
For each production machine, an indenture level (sub-machines, replaceable units, individual parts, etc.) is defined. To this end, failure effects identified at a lower level may become FMs at a higher level, and FMs at a lower level may become failure causes at a higher level. Our high-level FM classification distinguishes the following main categories:
▪ failure during operation,
▪ failure to operate at prescribed time,
▪ failure to cease operation at prescribed time,
▪ premature operation,
▪ failure due to lower level component.
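The indenture-level rule above (a lower-level effect becomes a higher-level FM, a lower-level FM becomes a higher-level cause) can be sketched as a simple roll-up over a component tree. The component names and failure modes below are hypothetical, not Z-BRE4K identifiers.

```python
# Minimal sketch of propagating failure modes up indenture levels:
# each child's FMs appear as potential failure causes one level up.

class Component:
    def __init__(self, name, failure_modes=None, children=None):
        self.name = name
        self.failure_modes = failure_modes or []   # FMs observed at this level
        self.children = children or []

def roll_up(component):
    """Collect child FMs as failure causes of this component, recursively."""
    causes = []
    for child in component.children:
        roll_up(child)
        # Each child FM is a potential failure cause at this level.
        causes.extend(f"{child.name}: {fm}" for fm in child.failure_modes)
    component.failure_causes = causes
    return component

# Hypothetical three-level indenture structure.
bearing = Component("bearing", ["seized", "excessive play"])
motor = Component("drive motor", ["fails to start"], children=[bearing])
press = Component("hydraulic press", ["failure during operation"],
                  children=[motor])

roll_up(press)
print(press.failure_causes)   # motor FMs, seen as causes at press level
```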
4.8 Methodology
The FMECA is composed of two separate analyses, the FMEA and the Criticality Analysis (CA). The FMEA must be completed before performing the CA, and it provides the analysts with a quantitative ranking of system and/or subsystem failure modes. The Criticality Analysis then allows the analysts to identify reliability- and severity-related concerns with particular components or systems.
Two primary approaches exist. One is the “hardware approach”, which lists individual hardware items and analyses their possible failure modes; according to MIL-STD-1629A, the hardware approach is normally applied part-level up (bottom-up). The other is the “functional approach”, which recognizes that every item is designed to perform a number of functions that can be classified as outputs.
All FMEAs require identifying and understanding the functions of each item being analysed,
regardless of whether the item is a system, subsystem, component, or part. In addition, great
care must be taken to address all interfaces between parts, components, subsystems, and users,
which usually account for more than half of the potential failure modes.
The first step is to calculate the expected failures for each Item. This is the number of failures
estimated to occur based on the reliability/unreliability of the item at a given time. Reliability is
the probability that an item will perform a required function without failure under stated
conditions for a stated period of time. Unreliability is one minus reliability.
The “time” for the calculation is most often the target or useful life of the item. With an exponential distribution, the expected number of failures is calculated by multiplying the failure rate by the time (λt), but it is estimated differently for other distributions. Care must be taken to ensure that the calculations for reliability/unreliability and expected failures are based on the correct failure distributions.
The second step is to identify the mode ratio of unreliability for each potential failure mode. This is the portion of the item's unreliability (in terms of expected failures) attributable to each potential failure mode; in other words, it represents the percentage of all failures of the item that will be due to the failure mode under consideration. The total percentage assigned to all modes must equal 100%. The failure mode ratio of unreliability can be based on reliability growth testing data for the current design, field data and/or test data from a similar design, engineering judgement, or apportionment libraries such as MIL-HDBK-338B.
The third step is to rate the probability of loss that will result from each failure mode that occurs. This is the probability that a failure of the item under analysis will cause a system failure. The fourth and fifth steps are to calculate the mode criticality for each potential failure mode and the item criticality for each item.
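In the MIL-STD-1629A convention, the steps above amount to computing a mode criticality Cm = β · α · λ · t for each failure mode and summing the Cm values to obtain the item criticality Cr. A minimal sketch with illustrative numbers:

```python
# Sketch of the quantitative criticality steps above (illustrative values):
#   step 1: expected failures = lambda * t (exponential distribution)
#   step 2: alpha, the mode ratio of unreliability (must total 100%)
#   step 3: beta, probability that the mode causes system failure (loss)
#   steps 4-5: mode criticality Cm = beta * alpha * lambda * t,
#              item criticality Cr = sum of Cm over the item's modes.

lam = 2e-5        # item failure rate (failures per hour), assumed
t = 5000.0        # operating time (hours), assumed
expected_failures = lam * t                      # step 1

modes = [
    # (failure mode, alpha: mode ratio, beta: probability of loss)
    ("seal leak",      0.6, 0.1),
    ("shaft fracture", 0.4, 1.0),
]
# The mode ratios assigned to all modes must total 100%.
assert abs(sum(a for _, a, _ in modes) - 1.0) < 1e-9

cm = {name: beta * alpha * lam * t for name, alpha, beta in modes}  # steps 2-4
cr = sum(cm.values())                                               # step 5
print(cm, cr)
```

The failure rate, mode ratios and loss probabilities here are assumptions for illustration; in practice they come from field data, test data or apportionment libraries as described above.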
Qualitative Criticality Analysis does not involve the same rigorous calculations as Quantitative
Criticality Analysis. To use Qualitative Criticality Analysis to evaluate risk and prioritize corrective
actions:
The first step is to rate the severity of the potential effects of failure; the severity ranking is determined using the severity scale defined for the FMECA. The second step is to rate the likelihood of occurrence for each potential failure mode; the occurrence ranking is determined using the occurrence scale defined for the FMECA. Failure modes are then compared using a criticality matrix, which identifies severity on the horizontal axis and occurrence on the vertical axis.
A typical failure modes and effects analysis incorporates some method to evaluate the risk associated with the potential problems identified through the analysis. The two most common methods are Risk Priority Numbers (described in sub-section 3.3) and the Criticality Analysis method. The MIL-STD-1629A 11 document describes two types of criticality analysis: qualitative and quantitative (sub-section 4.9).
To use qualitative criticality analysis to evaluate risk and prioritize corrective actions, the analysis
team must a) rate the severity of the potential effects of failure and b) rate the likelihood of
occurrence for each potential failure mode. It is then possible to compare failure modes via a
Criticality Matrix (Figure 6), which identifies severity on the horizontal axis and occurrence on
the vertical axis.
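A criticality matrix of the kind just described can be sketched as a grid keyed by (severity, occurrence); the failure mode names and 1-5 ratings below are hypothetical.

```python
# Illustrative sketch of a qualitative criticality matrix: severity on the
# horizontal axis, occurrence on the vertical axis, each cell listing the
# failure modes that share that (severity, occurrence) pair.

# Hypothetical qualitative ratings on 1-5 scales: (severity, occurrence).
ratings = {
    "FM-A": (5, 2),
    "FM-B": (3, 4),
    "FM-C": (5, 2),
}

matrix = {}
for fm, (sev, occ) in ratings.items():
    matrix.setdefault((sev, occ), []).append(fm)

# Print occurrence rows top-down (high occurrence first), severity
# increasing left to right; "." marks an empty cell.
for occ in range(5, 0, -1):
    row = [",".join(matrix.get((sev, occ), [])) or "." for sev in range(1, 6)]
    print(f"occ={occ}: " + " | ".join(f"{cell:10s}" for cell in row))
```

Modes landing towards the top-right corner (high severity, high occurrence) are the first candidates for corrective action.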
10 Carpitella, S., Certa, A., Izquierdo, J., & La Fata, C. M. (2018). A combined multi-criteria approach to support
FMECA analyses: A real-world case. Reliability Engineering & System Safety, 169, 394–402.
11 Borgovini, R., Pemberton, S., & Rossi, M. (1993). Failure Mode, Effects, and Criticality Analysis (FMECA).
To use quantitative criticality analysis, the analysis team considers the reliability/unreliability for
each item at a given operating time and identifies the portion of the item’s unreliability that can
be attributed to each potential failure mode. For each failure mode, they also rate the
probability that it will result in system failure. The team then uses these factors to calculate a
quantitative criticality value for each potential failure and for each item.
The nature of cascading failures may vary considerably, and it is difficult to provide a common definition or characteristic of such an event that would apply to all possible scenarios. A domino effect is the principal characteristic of cascading failures: an initial event, which has little or no adverse effect, is transmitted downstream, and one of the subsequent failures generates hazardous effects. Cascading failures are considered “low-probability, high-consequence” events. The “classic” cascading failure is characterized by a rapid propagation of failures. However, the cause-effect chain of events leading ultimately to a hazardous situation has to be considered even if the failure propagation is spread over a long period of operation. Moreover, the triggering event may be a permanent or a temporary fault. An important attribute of a cascading failure that therefore requires consideration in the analysis is the time factor.
The prediction and analysis of cascading failures are complex due to their random dynamics, involving continuous and switching operations that suddenly change the system's configuration. In the progression of time, a Failure Mode comes between a Cause and an Effect. Any Effect that itself has an Effect might also be a Failure Mode; in different contexts, a single event may be a Cause, an Effect, and a Failure Mode.
Cascading failures in production systems normally occur as a result of initial disturbance or faults
on various mechanical or electrical elements, closely followed by errors of human operators.
The stability and secure operation of the production lines have a great impact on other related
systems in a factory. It is vital to identify any disturbances on the critical elements in advance
and develop effective protection strategies to alleviate the cascading failures.
A cascading failure is defined as a sequence of component malfunctions that includes at least one triggering component malfunction and the subsequent tripping of other components. Note that a cascading failure does not necessarily lead to a cascading breakdown.
There are a number of ways to construct an event tree. Event trees use binary branching, with two options at each point; each branching point is called a node. Simple event trees tend to be presented at a system level, glossing over detail. Figure 7 is a generic example of how they can be drawn.
12Ferdous, R., Khan, F., Sadiq, R., Amyotte, P., & Veitch, B. (2011). Fault and Event Tree Analyses for Process
Systems Risk Analysis: Uncertainty Handling Formulations. Risk Analysis, 31(1), 86–107.
The diagram shows an initiating event and the subsequent operation or failure of three systems which would normally operate should the event occur. Each system can either operate or fail. Because of the multitude of success/failure combinations across the systems, there are multiple possible final outcomes. The diagram also illustrates the way event trees can be quantified: the initiating event is typically specified as an expected annual frequency, and the success of each system as a probability.
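The quantification just described can be sketched for the three-system case; the initiating frequency and success probabilities below are assumed for illustration, not taken from Figure 7.

```python
# Sketch quantifying a three-system event tree: an initiating event with an
# annual frequency, followed by three systems that each operate or fail with
# a given probability. All numbers are illustrative.

from itertools import product

init_freq = 0.1                  # initiating events per year (assumed)
p_success = [0.99, 0.95, 0.90]   # success probability of systems 1..3

outcomes = {}
for branch in product([True, False], repeat=3):   # True = system operates
    p = 1.0
    for works, ps in zip(branch, p_success):
        p *= ps if works else (1.0 - ps)
    outcomes[branch] = init_freq * p    # annual frequency of this end state

# Frequency of the worst end state: all three systems fail.
print(outcomes[(False, False, False)])
# Sanity check: branch frequencies must sum back to the initiating frequency.
assert abs(sum(outcomes.values()) - init_freq) < 1e-12
```

Each of the eight branches corresponds to one final outcome of the tree, and its annual frequency is simply the initiating frequency multiplied along the branch.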
Fault tree analysis (FTA) is a deductive analysis and logic diagram technique in which logic gates are used to combine lower-level factors. It is also used for tracing all possible contributing factors and branches of events. Normally, the more complex the case, the more extensive the fault tree framework will be. An example showing the different possible pathways is given below.
The structures of fault tree and event tree analysis differ. The general direction of an event tree is from left to right along the horizontal axis, while fault tree graphs are drawn top-down. The layout of fault tree analysis is based on the traditional diagram structure of science, engineering and related subjects; in contrast, the structure of an event tree (Figure 8) easily accommodates categories with long titles and texts.
Although the diagrams of fault tree and event tree analysis (Figure 9) seem similar in some parts, there are differences in their analytical methodology. Both types involve the identification and classification of events and factors, but with opposite focuses on undesired events. The main purpose of fault tree analysis is preventing losses, whereas event tree analysis is suited to mitigating bad outcomes; in other words, fault tree analysis is cause-oriented whereas event tree analysis is consequence-oriented (see Figure 9).
Both techniques have practical uses in different fields. Fault tree analysis suits most
science and engineering domains, in particular safety and reliability engineering, software
engineering, aerospace, energy, chemical processing, pharmaceutical analysis, the design of
diagnostic manuals and aircraft fuel-power design. A fire security system example based on
fault tree analysis is shown below (Figure 10), where P1, P2 and P3 are the probabilities of
the individual pathways.
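A minimal fault-tree evaluation can be sketched as follows. The gate layout and probabilities are hypothetical (they are not taken from Figure 10); basic events are assumed independent, so an AND gate multiplies probabilities while an OR gate combines them as 1 − Π(1 − p).

```python
# Gate primitives for independent basic events.
def and_gate(*probs):
    """All inputs must fail: multiply the probabilities."""
    out = 1.0
    for p in probs:
        out *= p
    return out

def or_gate(*probs):
    """Any input failing suffices: 1 minus the product of survivals."""
    out = 1.0
    for p in probs:
        out *= 1.0 - p
    return 1.0 - out

# Hypothetical top event "fire alarm fails": the detector fails, OR
# both the mains supply AND the backup battery fail.
p_detector = 0.01
p_mains = 0.05
p_battery = 0.10

p_top = or_gate(p_detector, and_gate(p_mains, p_battery))
print(f"P(top event) = {p_top:.5f}")
```

Building the tree bottom-up like this mirrors the deductive, cause-oriented character of FTA: the top-event probability falls out of the probabilities of the basic events.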
Event tree analysis is usually used for financial market analysis, especially topics related
to financial asset pricing and risk analysis. Readers can easily see the probabilities of the
different pathways in a financial model based on an event tree diagram. Figure 11 shows a
financial pricing model for the practical analysis of stock pricing. For simplicity, the
probabilities in this example take only two values, P1 and P2, and the total number of stages
(time periods) is 3. The Expected Value (EV) at each stage is the sum of the corresponding
stock prices weighted by their respective probabilities.
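The two-branch pricing tree just described can be sketched as below. The initial price, the up/down factors and the probabilities P1/P2 are hypothetical; the EV at each stage is the probability-weighted sum over the n + 1 prices reachable after n moves.

```python
from math import comb

# Hypothetical inputs for the two-branch (binomial) pricing tree.
s0 = 100.0          # initial stock price
up, down = 1.10, 0.90
p1, p2 = 0.6, 0.4   # P1 = P(up move), P2 = P(down move); P1 + P2 = 1

# After n stages there are n + 1 possible prices, reached via k up-moves
# (and n - k down-moves); comb(n, k) counts the paths to each price.
for stage in range(1, 4):
    ev = sum(comb(stage, k) * p1**k * p2**(stage - k)
             * s0 * up**k * down**(stage - k)
             for k in range(stage + 1))
    print(f"stage {stage}: EV = {ev:.2f}")
```

By the binomial theorem the stage-n EV collapses to s0 · (P1·up + P2·down)^n, which is a quick way to check the tree sum.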
An indicator becomes «key» when it tracks an especially important risk exposure (key risk),
does so especially well (key indicator), or ideally both. An indicator is a numeric value,
produced by combining measures, that provides business insight. Key Risk Indicators (KRIs)
are critical predictors of unfavourable events that can adversely impact organizations. They
monitor changes in the levels of risk exposure and contribute to the early warning signs that
enable organizations to report risks, prevent crises and mitigate them in time.
KRIs are born out of high-quality data used to track a specific risk. Developing effective KRIs
mandates a thorough understanding of objectives and risk-related events that might affect the
achievement of those objectives. While most organizations monitor KRIs that have developed
over time, it is essential that these are regularly evaluated for efficacy and continuously
monitored to highlight potential risks. Over time they must be augmented with new KRIs as
circumstances change, newer risks emerge and the older KRIs become insufficient.
Once the Risk Management team assesses all its risks and scores their severity according to
probability (or likelihood) and impact, it is possible to extract and isolate the top risks. It is then
possible to define specific data which must be collected regularly to measure the ongoing status
of those risks. For each KRI, upper and lower acceptable risk limits (warning thresholds) are
defined, allowing management to track evolution and trends for each risk and KRI. This
methodology enables the usage of Red, Amber and Green (RAG) limits which are useful since a
“soft” amber limit can trigger an action before reaching the “hard” red limit.
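The RAG logic described above can be expressed as a small classification function. The thresholds below are hypothetical; the point is that the "soft" amber limit fires before the "hard" red limit, giving management time to act.

```python
# Illustrative RAG classification for a KRI (thresholds hypothetical).
def rag_status(value, amber_limit, red_limit):
    """Classify a KRI reading; higher values mean higher risk."""
    if value >= red_limit:
        return "RED"      # hard limit breached: immediate intervention
    if value >= amber_limit:
        return "AMBER"    # soft limit breached: trigger early action
    return "GREEN"        # within acceptable risk limits

print(rag_status(0, amber_limit=1, red_limit=4))   # GREEN
print(rag_status(2, amber_limit=1, red_limit=4))   # AMBER
print(rag_status(5, amber_limit=1, red_limit=4))   # RED
```

In practice each KRI would carry its own pair of limits, reviewed as part of the regular evaluation of the indicator set.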
1. Relevant: the indicator/data helps identify, quantify, monitor or manage risk and/or risk
consequences that are directly associated with key business objectives/KPIs.
2. Measurable: the indicator/data is able to be quantified (a number, percentage, etc.) and
is reasonably precise, comparable over time, and is meaningful without interpretation.
3. Predictive: the indicator/data can predict future problems that management can
preemptively act on.
4. Easy to monitor: the indicator/data should be simple and cost effective to collect, parse,
and report on.
5. Auditable: you should be able to verify your indicator/data, the way you sourced it,
aggregated it, and reported on it.
6. Comparable: it’s important to be able to benchmark your indicator/data, both internally
and to industry standards, so you can verify the indicator thresholds.
▪ Supporting Risk Assessments - KRIs help in adding more detail and information to risk
assessments, making them more reliable and informative to management
▪ Tolerance levels and thresholds - KRIs detail at what level a risk is considered important
for attention or for direct intervention
▪ Trending KRIs - KRIs can help management track trends in risks to the organisation. This
can help to identify areas where greater investment may be needed or where
opportunities might lie.
Lagging - monitor data retrospectively to identify changes in the pattern or trend of risk /
activities. These types of KRIs ensure that the exposure is minimised as soon as practicable to
prevent or reduce further exposure or consequence.
Leading / predictive - are used to signal changes in the likelihood of a risk event. They are more
likely to aid management in acting in advance of risks materialising.
The graphic in Figure 12 represents the four steps needed for KRI development. An effective set
of KRI metrics will provide insight into potential risks that may impact the realisation of
objectives, or may indicate the presence of new opportunities.
Step 1: Identification
Step 2: Selection
Step 3: Reporting
Step 4: Actions
▪ Action plans should be created where KRIs are trending towards the highest threshold,
▪ Target completion dates for actions should be set and included in reporting.
The below diagram (Figure 13) illustrates the identification of four key objectives aligned to
the entity's purpose. Linked to the objectives are several potential critical risks that may
impact one or more of the objectives. KRIs have been mapped to each critical risk to reduce
the likelihood of the risk occurring and to provide Senior Management with information on any
risk that could potentially hinder the achievement of the entity's objectives and strategy.
When determining the thresholds and trigger points for KRIs, consider the following:
KRI                                            Green            Amber             Red
# of key persons' voluntary resignations       <1 per quarter   1–3 per quarter   >3 per quarter
(key persons are those identified as
successors to senior roles)
In the above example, data would be available from HR systems, and action should be taken at
the amber trigger to understand the reasons for leaving; this could be done via methods such
as exit interviews, staff engagement survey results or one-on-one meetings with key persons.
Reviewing a risk event that has affected your entity in the past is a great way to determine
leading indicators that could help identify a similar emerging risk before it happens again.
Consider the root causes of the past event, speak to subject matter experts who managed and
implemented actions to minimise the impact of that risk and understand the availability of data
that could be used to reduce the likelihood of that risk event reoccurring. The closer the KRI is
to the root cause of the risk event, the more likely the KRI will trigger pro-active management
and action.
An effective method for developing KRIs begins by analysing a risk event that has affected the
organization in the past (or present) and then working backwards to pinpoint intermediate and
root cause events that led to the ultimate loss or lost opportunity. The goal is to develop key risk
indicators that provide valuable leading indications that risks may be emerging. The closer the
KRI is to the ultimate root cause of the risk event, the more likely the KRI will provide
management time to proactively take action to respond to the risk event.
Virtually all organizations possess existing risk metrics that have evolved over time. These
metrics should be carefully evaluated for their efficacy and continue to be employed if found to
be valuable in highlighting potential emerging risks. Augmenting these existing KRIs with new
metrics is likely to be required, however.
Another important element in designing effective KRIs involves the assurance that all parties
involved in collecting and aggregating KRI data are clear about definitions of individual data
items to be captured and any conversion or standardization methodology to be utilized. Without
confidence in the uniformity of the KRI measurement approach, aggregated information will lack
robustness and introduce noise into the ultimate decision process.
An important element of any KRI is the quality of the available data used to monitor a specific
risk. Attention must be paid to the source of the information, either internal to the organization
or drawn from an external party. Sources of information are likely to exist that can help inform
the choice of KRIs to be employed. For example, internal data may be available related to prior
risk events that can be informative about potential future exposures. However, internal data is
typically unavailable for many risks—especially those that have not been encountered
previously. And, often risks likely to have a significant impact may arise from external sources,
such as changes in economic conditions, interest rate shifts, or new regulatory requirements or
legislation.
Thus, many organizations discover that relevant KRIs are often based on external data, given
that many root cause events and intermediate events that affect strategies arise from outside
the organization.
External sources such as trade publications and loss registries compiled by independent
information providers may be helpful in identifying potential risks not yet experienced by the
organization. Discussions with key stakeholders such as customers, employees and suppliers
may provide important insights into risks they face that may ultimately create risks for the
organization. A careful understanding of regulatory and legal requirements that
must be fulfilled is likely to be helpful in anticipating potential risks and events that precede
them. KRI data sourced from external and/or independent parties provides the benefit of
objectivity. External/independent parties are not necessarily unaffiliated with the organization,
but are removed from the business unit from which the KRI is measured. Almost certainly, trade-
offs will be required in this area. Those individuals charged with ongoing management of a
particular risk are the least objective source (but at times may be the only available resource for
the data required to produce the KRI in question). A careful validation of external sources is
desirable to enhance confidence in the ultimate effectiveness of the KRI built from that data.
It is unlikely that a single KRI will adequately capture all facets of a developing risk or risk trend.
For this reason, it is helpful to analyse a collection of KRIs simultaneously to help form a better
understanding of the risk being monitored. That said, some KRIs are likely to possess superior
predictive power over other risk metrics and it will be important to weight each piece of
information to reflect its past performance in forecasting a risk event. Some have referred to
this process as assembling a mosaic of information that collectively can best provide the early
warning of potential threats developing over time. Realistically, substantial judgment and
experience must be brought to bear on this process to extract the most meaningful inferences.
As the use of KRIs evolves in an organization, opportunities for making these judgments will
likely yield improvements in KRI performance.
7 Z-BRE4K EXAMPLE
During manufacturing processes, it is important to prioritise issues and to determine the
order and time-frame of the predictive actions to be taken. The FMECA software has been
designed to automate and facilitate the FMEA/FMECA process and to provide flexible data that
can be further used by management and reporting capabilities for carrying out the measures
that address the most serious concerns. Thus, the FMEA/FMECA methodologies have been used to
identify potential failure modes for processes (but also products) before the problems occur,
to assess the risk associated with those failure modes and to provide the Risk Priority
Number (RPN) values for the errors that occur. Design, development, manufacturing, service and
other activities that improve reliability and increase efficiency can be supported by
FMEA/FMECA analysis.
FMEA/FMECA techniques are widely used throughout various industries, e.g. the automotive,
aerospace, medical and other manufacturing industries, where this flexible analysis method can
be performed at various stages of the product life cycle. The effects analysis within Z-BRE4K
considers effects at the local or machinery-system level, with severity levels, a
detectability scale and the probability of failure occurrence, within the plastics, packaging
and automotive industries.
When using FMECA within Z-BRE4K (Figure 14), an end user needs to:
1. define and specify an asset type e.g. product, process of production, assembly, service,
machine, etc.
When defining the asset type, it is important to keep in mind the terms in which the potential
failure is identified,
2. assess the seriousness of the effect of the potential failure mode providing the input of
severity level presented in Table 2,
(Table 2 columns: Ranking Level, Classification, Criteria.)
3. define the level of detectability presented in Table 3 [13], i.e. how easy it is to detect
the potential failure (e.g. through physical tests, mathematical modelling, prototype
testing, feasibility reviews, etc.),
(Table 3 columns: Ranking Level, Classification, Criteria.)
4. define the probability (%) that one of the specific causes/mechanisms will occur.
The above information and its effect on the total system are studied, and the Risk Priority
Number (RPN) is automatically provided and listed from high to low. The RPN will be further
used by management (decision makers) and DSS software applications for the appropriate
corrective steps and actions to be taken (or planned) to minimise the probability of failure
or to minimise its effect.
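The RPN step just described can be sketched as follows. The failure modes and their 1–10 scale values are hypothetical; the RPN is the product of severity, occurrence and detectability, and the list is sorted from high to low for prioritisation.

```python
# Hypothetical failure modes with 1-10 ratings:
# (name, severity, occurrence, detectability)
failure_modes = [
    ("seal wear",         7, 5, 4),
    ("motor overheating", 9, 2, 3),
    ("sensor drift",      4, 6, 7),
]

# RPN = severity x occurrence x detectability; highest RPN first, so
# management sees the most critical failure modes at the top.
ranked = sorted(
    ((name, s * o * d) for name, s, o, d in failure_modes),
    key=lambda item: item[1],
    reverse=True,
)
for name, rpn in ranked:
    print(f"{name}: RPN = {rpn}")
```

Note how a moderate-severity mode can outrank a high-severity one when it occurs often and is hard to detect, which is exactly why the RPN ranking, rather than severity alone, drives the corrective-action order.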
Within the Z-BRE4K FMECA we have presented the Process Failure Mode and Effect Analysis of a
ceramic tiles production process that is publicly available [14]. The company's objective was
to apply quality controls to the final product in order to fulfil quality standards. The
published data were used, the RPN was obtained automatically (Figure 16), and FMECA further
provides information for subsequent use, analysis, and the estimation and evaluation of risks.
Figure 16: Process Failure Mode & Effect Analysis in production process of Ceramic tiles.
Furthermore, a case from the knitting industry [15] was found, and the fabric example was used
to present the Z-BRE4K FMECA analysis as well. The results are presented below in Figure 17.
It can be stated that with a continuous FMECA method, manufacturing process efficiency and
product quality are improved, while the number of defective products decreases, saving rework
cost and time [13].
Figure 17: Process Failure Mode & Effect Analysis in production process of Knitting industry.
8 CONCLUSION
At the conclusion of the FMECA, critical items/failure modes are identified and corrective
action recommendations are made based on the criticality list and/or the Criticality Matrix
generated by the Criticality Analysis. FMEA/FMECA analysis is a flexible process that can be
adapted to meet the particular needs of the industry and/or the organisation.
Utilizing the criticality list, the items with the highest criticality number or RPN receive
attention first. Utilizing the Criticality Matrix (recommended), items in the uppermost
right-hand quadrant receive attention first. Typical recommendations call for design
modifications such as the use of higher-quality components, higher-rated components,
designed-in redundancy or other compensating provisions.
Recommendations cited must be fed back into the design process as early as possible in order
to minimize iterations of the design. The FMECA is most effective when exercised in a proactive
manner to drive design decisions, rather than to respond after the fact.
Future work is to connect the FMECA with the various IDS connectors per use case. After that
the data will be consumed automatically, rather than being input by hand as at present. The
FMECA will also communicate with the DSS software application (D4.2).
9 REFERENCES
[1] Bradley, E. Reliability Engineering: A Life Cycle Approach (1st Edition).
[2] Haapanen, P., & Helminen, A. (2002). Failure Mode and Effects Analysis of Software-Based
Automation Systems. STUK-YTO-TR 190.
[3] Lipol, L. S., & Haq, J. (2011). Risk analysis method: FMEA/FMECA in the organizations.
International Journal of Basic & Applied Sciences, 11(5), 74-82.
[4] Potential Failure Mode and Effects Analysis FMEA Reference Manual (4TH EDITION) ISBN
#9781605341361.
[5] Kim, K. O., & Zuo, M. J. (2018). General model for the risk priority number in failure mode
and effects analysis. Reliability Engineering & System Safety, 169, 321–329
[6] Research and Development Accomplishments FY 2004 (pdf). Federal Aviation Administration.
2004. Retrieved 2010-03-14.
[8] Reifer, D.J., 1979, Software Failure Modes and Effects Analysis. IEEE Transactions on
Reliability, R-28, 3, pp. 247–249.
[9] Ristord, L., & Esmenjaud, C. (2001). FMEA Performed on the SPINLINE3 Operational System
Software as Part of the TIHANGE 1 NIS Refurbishment Safety Case. CNRA/CSNI Workshop on
Licensing and Operating Experience of Computer-Based I&C Systems, České Budějovice,
September 25–27, 2001.
[10] Carpitella, S., Certa, A., Izquierdo, J., & La Fata, C. M. (2018). A combined multi-criteria
approach to support FMECA analyses: A real-world case. Reliability Engineering & System Safety,
169, 394–402.
[11] Borgovini, R., Pemberton, S., & Rossi, M. (1993). Failure Mode, Effects, and Criticality
Analysis (FMECA).
[12] Ferdous, R., Khan, F., Sadiq, R., Amyotte, P., & Veitch, B. (2011). Fault and Event Tree
Analyses for Process Systems Risk Analysis: Uncertainty Handling Formulations. Risk Analysis,
31(1), 86–107.
[13] Tejaskumar S. Parsana and Mihir T. Patel. A Case Study: A Process FMEA Tool to Enhance
Quality and Efficiency of Manufacturing Industry. Bonfring International Journal of Industrial
Engineering and Management Science, Vol. 4, No. 3, August 2014.
[14] Tsarouhas, P. H., & Arampatzaki, D. Application of Failure Modes and Effects Analysis
(FMEA) of a Ceramic Tiles Manufacturing Plant. 1st Olympus International Conference on Supply
Chains, 1–2 October, Katerini, Greece.
[15] Özyazgan, V., & Engin Sagirli, F. Z. (2013). FMEA Analysis and Applications in Knitting
Industry. Tekstil ve Konfeksiyon, 23(3), 228–232.