SIS Book - Chapter 03 - FMECA
FMECA
RAMS Group
Department of Mechanical and Industrial Engineering
NTNU
(Version 0.1)
Learning Objectives
The main learning objectives associated with these slides are to:
I To understand why Failure modes, effects, and criticality analysis
(FMECA) is used
I To understand terminology used in an FMECA
I To learn the steps of an FMECA
I To realize the pros and cons of an FMECA
The slides provide additional information to Chapter 3 in Reliability of
Safety-Critical Systems: Theory and Applications.
DOI:10.1002/9781118776353.
Outline of Presentation
1 Introduction
3 Terminology
4 FMECA procedure
5 FMECA Worksheet
6 Risk Ranking
7 Corrective Actions
What is FMECA?
FMECA – FMEA
Initially, the FMECA was called FMEA (Failure modes and effects analysis).
The C in FMECA indicates that the criticality (or severity) of the various
failure effects is considered and ranked.
Background
I FMECA was one of the first systematic techniques for failure analysis
I FMECA was developed by the U.S. Military. The first guideline was
Military Procedure MIL-P-1629 “Procedures for performing a failure
mode, effects and criticality analysis” dated November 9, 1949
I FMECA is the most widely used reliability analysis technique in the
initial stages of product/system development
I FMECA is usually performed during the conceptual and initial design
phases of the system in order to assure that all potential failure modes
have been considered and the proper provisions have been made to
eliminate these failures
I Assist in selecting design alternatives with high reliability and high safety
potential during the early design phases
I Ensure that all conceivable failure modes and their effects on operational
success of the system have been considered
I List potential failures and identify the severity of their effects
I Develop early criteria for test planning and requirements for test equipment
I Provide historical documentation for future reference to aid in analysis of
field failures and consideration of design changes
I Provide a basis for maintenance planning
I Provide a basis for quantitative reliability and availability analyses.
The FMECA should be initiated early in the design process, where we are
able to have the greatest impact on the equipment reliability. The locked-in
cost versus the total cost of a product is illustrated in the figure:
[Figure: % locked-in costs and % total costs (scale 0 to 100) across the phases
Concept/Feasibility, Design/Development, and Production/Operation. Labels in
the figure: 85%, 12%, 3%, Production (35%), Operation (50%).]
Types of FMECA
I Bottom-up approach
• The bottom-up approach is used when a system concept has been
decided. Each component on the lowest level of indenture is studied
one by one. The bottom-up approach is also called the hardware approach.
The analysis is complete since all components are considered.
I Top-down approach
• The top-down approach is mainly used in an early design phase before
the whole system structure is decided. The analysis is usually function
oriented. The analysis starts with the main system functions and how
these may fail. Functional failures with significant effects are usually
prioritized in the analysis. The analysis will not necessarily be complete.
The top-down approach may also be used on an existing system to focus
on problem areas.
FMECA Standards
I MIL-STD 1629 “Procedures for performing a failure mode and effect analysis”
I IEC 60812 “Procedures for failure mode and effect analysis (FMEA)”
I BS 5760-5 “Guide to failure modes, effects and criticality analysis (FMEA and
FMECA)”
I SAE ARP 5580 “Recommended failure modes and effects analysis (FMEA)
practices for non-automobile applications”
I SAE J1739 “Potential Failure Mode and Effects Analysis in Design (Design
FMEA) and Potential Failure Mode and Effects Analysis in Manufacturing and
Assembly Processes (Process FMEA) and Effects Analysis for Machinery
(Machinery FMEA)”
I SEMATECH (1992) “Failure Modes and Effects Analysis (FMEA): A Guide for
Continuous Improvement for the Semiconductor Equipment Industry”
Definition of Failure
Shutdown valve
A maximum closing time of a shutdown valve may be set to 15 seconds. A failure of
the function occurs when the closing time exceeds 15 seconds.
Failure Attributes
A failure may:
I Develop gradually
I Occur as a sudden event
Fault
In most cases, an item will have a fault after a hardware failure has occurred
– and we say that the item is in a failed state.
Design and installation errors may also prevent the item from performing
its required function. The item has a fault that is not preceded by any
hardware failure and we call this fault a systematic fault.
Error
A failure may originate from an error. When the failure occurs, the item
enters a fault state.
[Figure: actual performance plotted against time, showing the target value,
the acceptable deviation, an error, the failure (event), and the resulting
fault (state).]
Failure Mode
Failure mode: The way a failure is observed on a failed item. [IEC 191-05-22]
A failure mode is the way in which an item could fail to perform its required
function. An item can fail in many different ways – a failure mode is a
description of a possible state of the item after it has failed.
Pump
Performance requirement: The pump must provide an output between 100 and 110
liters per minute.
Associated failure modes may be:
I No output
I Too low output
I Too high output
I Too much fluctuation in output
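The pump example can be sketched as a small check that maps a measured output to one of the listed failure modes. This is only an illustration: the function name and the treatment of zero flow as "No output" are assumptions, and the fluctuation failure mode is left out because the example gives no limit for it.

```python
from typing import Optional

# Performance requirement from the pump example: 100 to 110 liters per minute.
MIN_OUTPUT_LPM = 100.0
MAX_OUTPUT_LPM = 110.0


def pump_failure_mode(output_lpm: float) -> Optional[str]:
    """Return the failure mode for a steady output, or None if the requirement is met."""
    if output_lpm <= 0.0:  # assumption: zero (or negative) flow counts as no output
        return "No output"
    if output_lpm < MIN_OUTPUT_LPM:
        return "Too low output"
    if output_lpm > MAX_OUTPUT_LPM:
        return "Too high output"
    return None


for reading in (0.0, 85.0, 105.0, 120.0):
    print(reading, pump_failure_mode(reading))
```

Each failure mode is tied directly to a violation of the stated performance requirement, which is why a precise requirement is needed before failure modes can be listed.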
Classification of Failures
Special category:
I Common-cause failures (CCFs)
I Causes:
• Random (hardware) faults
• Systematic (“functional”) faults (including software faults)
I Effects:
• Safe failures (typically: untimely activation of function)
• Dangerous failures (typically: function prevented)
• No part/no effect failures (typically: Not associated with the main
function)
I Detectability:
• Detected - revealed by online diagnostics
• Undetected - revealed by functional tests or upon a real demand for
activation
[Figure: system breakdown inside the system boundary. The system is divided
into level 1 subsystems (Subsystem 1, Subsystem 2, and more), each of which is
divided into level 2 subsystems. The levels are the levels of indenture.]
Rules of thumb:
I The analysis should be carried out at as high a level in the system
hierarchy as possible (“screening of subsystems to study in more
detail”)
I If unacceptable consequences are discovered on this level of resolution,
then the particular element (subsystem, sub-subsystem, or component)
should be divided into further detail to identify failure modes and
failure causes on a lower level.
I Starting at too low a level will give a complete analysis, but may
at the same time be a waste of effort and money.
[Table: FMECA worksheet with columns numbered (1) to (12).]
For each system element (subsystem, component) the analyst must consider
all the functions of the elements in all its operational modes, and ask if any
failure of the element may result in any unacceptable system effect. If the
answer is no, then no further analysis of that element is necessary. If the
answer is yes, then the element must be examined further.
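The screening question above can be sketched as a simple filter over the analyst's judgements. The class name, field names, and the example elements are hypothetical and only illustrate the yes/no logic; the judgement itself remains an expert assessment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Judgement:
    """Analyst's answer for one function of an element in one operational mode."""
    function: str
    operational_mode: str
    unacceptable_system_effect: bool


def needs_further_analysis(judgements: List[Judgement]) -> bool:
    """True if any failure of the element may give an unacceptable system effect."""
    return any(j.unacceptable_system_effect for j in judgements)


# Hypothetical screening input, for illustration only.
screening: Dict[str, List[Judgement]] = {
    "Shutdown valve": [
        Judgement("Close on demand", "standby", True),
        Judgement("Stay open", "running", False),
    ],
    "Position indicator lamp": [
        Judgement("Show valve position", "running", False),
    ],
}

to_examine = [name for name, js in screening.items() if needs_further_analysis(js)]
print(to_examine)  # ['Shutdown valve']
```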
We will now discuss the various columns in the FMECA worksheet on the
previous frame.
3. The various operational modes for the element are listed. Examples of
operational modes are: idle, standby, and running. Operational modes
for an airplane include, for example, taxi, take-off, climb, cruise,
descent, approach, flare-out, and roll. In applications where it is not
relevant to distinguish between operational modes, this column may
be omitted.
Rank Description
1-2 Very high probability that the defect will be detected. Verification and/or
controls will almost certainly detect the existence of a deficiency or defect.
3-4 High probability that the defect will be detected. Verification and/or
controls have a good chance of detecting the existence of a deficiency/defect.
5-7 Moderate probability that the defect will be detected. Verification and/or
controls are likely to detect the existence of a deficiency or defect.
8-9 Low probability that the defect will be detected. Verification and/or
controls are not likely to detect the existence of a deficiency or defect.
10 Very low (or zero) probability that the defect will be detected. Verification
and/or controls will not or cannot detect the existence of a deficiency/defect.
7. The effects each failure mode may have on other components in the
same subsystem and on the subsystem as such (local effects) are listed.
8. The effects each failure mode may have on the system (global effects)
are listed. The resulting operational status of the system after the
failure may also be recorded, that is, whether the system is functioning
or not, or is switched over to another operational mode. In some
applications it may be beneficial to consider each category of effects
separately, like: safety effects, environmental effects, production
availability effects, economic effects, and so on.
9. Failure rates for each failure mode are listed. In many cases it is more
suitable to classify the failure rate in rather broad classes. An example
of such a classification is:
1 Very unlikely Once per 1000 years or more seldom
2 Remote Once per 100 years
3 Occasional Once per 10 years
4 Probable Once per year
5 Frequent Once per month or more often
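As a rough sketch, an estimated failure frequency can be mapped to one of these classes. The table gives only one reference frequency per class, so assigning a value to the nearest class on a logarithmic scale is an assumption made here for illustration, as are the function and variable names.

```python
import math
from typing import Tuple

# (rank, label, reference frequency in failures per year), taken from the table.
FREQUENCY_CLASSES = [
    (1, "Very unlikely", 1 / 1000),  # once per 1000 years or more seldom
    (2, "Remote", 1 / 100),          # once per 100 years
    (3, "Occasional", 1 / 10),       # once per 10 years
    (4, "Probable", 1.0),            # once per year
    (5, "Frequent", 12.0),           # once per month or more often
]


def frequency_class(failures_per_year: float) -> Tuple[int, str]:
    """Pick the class whose reference frequency is nearest on a log scale (assumption)."""
    if failures_per_year <= 0:
        raise ValueError("failure frequency must be positive")
    log_f = math.log10(failures_per_year)
    rank, label, _ = min(
        FREQUENCY_CLASSES, key=lambda c: abs(log_f - math.log10(c[2]))
    )
    return rank, label


print(frequency_class(0.2))  # a failure every five years
```

Values below the lowest or above the highest reference frequency fall into class 1 or class 5, which matches the "or more seldom" and "or more often" wording of the table.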
10. The severity of a failure mode is the worst potential (but realistic)
effect of the failure considered on the system level (the global effects).
The following severity classes for health and safety effects are
sometimes adopted:
Rank Description
10 Failure will result in major customer dissatisfaction and cause non-
system operation or non-compliance with government regulations.
8-9 Failure will result in high degree of customer dissatisfaction
and cause non-functionality of system.
6-7 Failure will result in customer dissatisfaction and annoyance
and/or deterioration of part of system performance.
3-5 Failure will result in slight customer annoyance and/or slight
deterioration of part of system performance.
1-2 Failure is of such minor nature that the customer (internal or external)
will probably not detect the failure.
11. Possible actions to correct the failure and restore the function or
prevent serious consequences are listed. Actions that are likely to
reduce the frequency of the failure modes should also be recorded. We
come back to these actions later in the presentation.
12. The last column may be used to record pertinent information not
included in the other columns.
Risk Ranking
The risk related to the various failure modes is often presented either by a:
I Risk matrix, or a
I Risk priority number (RPN)
Risk Matrix
Frequency/consequence  1 Very unlikely  2 Remote  3 Occasional  4 Probable  5 Frequent
Catastrophic
Critical
Major
Minor
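A minimal sketch of how failure modes could be placed in the cells of such a matrix is given below. The failure modes and their assigned consequence and frequency classes are hypothetical, and no acceptance criteria are attached to the cells because none are given here.

```python
from typing import Dict, Iterable, List, Tuple

CONSEQUENCES = ["Catastrophic", "Critical", "Major", "Minor"]
FREQUENCIES = {
    1: "Very unlikely",
    2: "Remote",
    3: "Occasional",
    4: "Probable",
    5: "Frequent",
}

Cell = Tuple[str, int]  # (consequence class, frequency class)


def build_risk_matrix(
    failure_modes: Iterable[Tuple[str, str, int]]
) -> Dict[Cell, List[str]]:
    """Place each (name, consequence, frequency class) entry in its matrix cell."""
    matrix: Dict[Cell, List[str]] = {
        (c, f): [] for c in CONSEQUENCES for f in FREQUENCIES
    }
    for name, consequence, frequency in failure_modes:
        if (consequence, frequency) not in matrix:
            raise ValueError(f"unknown cell for {name!r}")
        matrix[(consequence, frequency)].append(name)
    return matrix


# Hypothetical entries, for illustration only.
matrix = build_risk_matrix([
    ("Pump: no output", "Critical", 3),
    ("Pump: too low output", "Major", 4),
    ("Valve: fails to close", "Catastrophic", 2),
])

for (consequence, frequency), names in matrix.items():
    if names:
        print(consequence, FREQUENCIES[frequency], names)
```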
All ranks are given on a scale from 1 to 10. The risk priority number (RPN) is
defined as
RPN = S × O × D
where S is the severity rank, O the occurrence rank, and D the detection rank.
The smaller the RPN the better, and the larger the worse.
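The RPN calculation can be sketched as follows, reading S, O, and D as the severity, occurrence, and detection ranks. The function name and the three failure modes with their ranks are hypothetical; with ranks from 1 to 10, the RPN lies between 1 and 1000.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Return RPN = S x O x D, with each rank on the 1 to 10 scale."""
    for name, rank in (("S", severity), ("O", occurrence), ("D", detection)):
        if not 1 <= rank <= 10:
            raise ValueError(f"rank {name} must be between 1 and 10, got {rank}")
    return severity * occurrence * detection


# Hypothetical failure modes with (S, O, D) ranks, for illustration only.
failure_modes = {
    "Failure mode A": (8, 3, 5),
    "Failure mode B": (6, 6, 2),
    "Failure mode C": (9, 2, 8),
}

# Rank the failure modes from the largest (worst) RPN to the smallest.
ranking = sorted(
    ((risk_priority_number(*ranks), name) for name, ranks in failure_modes.items()),
    reverse=True,
)
for rpn, name in ranking:
    print(name, rpn)
```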
Limitations of RPN
I How the ranks O, S, and D are defined depends on the application and
the FMECA standard that is used.
I The O, S, D, and the RPN can have different meanings for each FMECA.
I Sharing numbers between companies and groups is very difficult.
– Based on Kmenta (2002)
Review Objectives
The review team studies the FMECA worksheets and the risk matrices
and/or the risk priority numbers (RPN). The main objectives are:
1. To decide whether or not the system is acceptable
2. To identify feasible improvements of the system to reduce the risk.
This may be achieved by:
• Reducing the likelihood of occurrence of the failure
• Reducing the effects of the failure
• Increasing the likelihood that the failure is detected before the system reaches
the end-user.
Selection of Actions
I Design changes
I Engineered safety features
I Safety devices
I Warning devices
I Procedures/training
Reporting of Actions
RPN Reduction
The risk reduction related to a corrective action may be evaluated by comparing
the RPN for the initial and the revised concept. A simple example is given in
the following table.
Initial 7 8 5 280
Revised 5 8 4 160
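The numbers in the table can be checked with a few lines of code. The sketch treats the first three values in each row as the three ranks and the last value as their product; which rank sits in which column is not stated in the table, and the product does not depend on it.

```python
import math

# Rank values from the table rows (column order as given in the table).
initial_ranks = (7, 8, 5)
revised_ranks = (5, 8, 4)

rpn_initial = math.prod(initial_ranks)  # 7 * 8 * 5 = 280
rpn_revised = math.prod(revised_ranks)  # 5 * 8 * 4 = 160

reduction = rpn_initial - rpn_revised
relative_reduction = reduction / rpn_initial

print(f"RPN reduced from {rpn_initial} to {rpn_revised}")
print(f"Absolute reduction: {reduction}")
print(f"Relative reduction: {relative_reduction:.1%}")
```

In this example the corrective action lowers the RPN by 120, or roughly 43% of the initial value.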
Application Areas
FMECA in Design
[Figure: flow diagram with the boxes Design, Get system overview, Perform
FMECA/identify failure modes, Establish failure effects, Determine criticality,
and Revise design.]
Pros:
I FMECA is a very structured and reliable method for evaluating
hardware and systems
I The concept and application are easy to learn, even by a novice
I The approach makes evaluating even complex systems easy to do
Cons:
I The FMECA process may be tedious, time-consuming (and expensive)
I The approach is not suitable for multiple failures
I It is too easy to forget human errors in the analysis
FMEDA Example
Reference: Goble, W.M. and Brombacher, A. Using a failure modes, effects and diagnosis analysis (FMEDA) to measure diagnostic coverage in
programmable electronic systems. DOI:10.1016/S0951-8320(99)00031-9 (Journal of Reliability Engineering and System Safety)