
An Integrated Method for Incorporating Common Cause Failures in System Analysis

Zhihua Tang, University of Virginia, Charlottesville
Joanne Bechta Dugan, University of Virginia, Charlottesville

Key Words: Common Cause Failure, Common Cause Group, Binary Decision Diagram, Failure Process, Unreliability.

SUMMARY & CONCLUSIONS

This paper proposes an integrated method for incorporating common cause failures (CCF) in system analysis. CCF are simultaneous failures of multiple components due to a common cause (CC). CCF tend to increase the joint-failure probabilities and can make a dominant contribution to system unreliability. There are two basic methods for CCF analysis: the explicit method and the implicit method. Both have shortcomings, especially when they are applied to large-scale fault trees. Our method provides accurate and efficient CCF analysis by incorporating the implicit method into the binary decision diagram (BDD) and by applying Markov models in dynamic fault tree (DFT) analysis. It avoids the disadvantages of both the explicit method and the implicit method and is tailored to the widely used DFT model.

1. INTRODUCTION

Acronyms (the singular and plural of an acronym are always spelled the same)

BDD   Binary Decision Diagram
CC    Common Cause
CCF   Common Cause Failure
CCG   Common Cause Group
DFT   Dynamic Fault Tree
TMR   Triple Modular Redundancy
s-    statistical(ly)

Notation

$m$   number of components in a CCG.
$R_m^{(k)}$   the probability of the occurrence of $k$ success events among $m$ components subject to a CCF.
$p$   the reliability of a component.
$P_j$   the probability that a component survives a $j$-component failure process.
$U_{TMR}$   the system unreliability of the TMR system ignoring CCF.
$U_{TMR}^{CCF}$   the system unreliability of the TMR system including CCF.
$\lambda$   failure rate.

CCF are simultaneous failures of multiple components due to a CC. The CC can be a lightning event, a sudden change in environment, multiple errors in maintenance, or a design weakness. Among the various definitions of CCF, a fairly general one is given by Mosleh et al.: "A subset of dependent events in which two or more component fault states exist at the same time, or in a short time interval, and are direct results of a shared cause." (Ref. 1). It is well known that CCF occur most frequently in systems designed with redundancy techniques, which are characterized by the use of statistically-identical (s-identical) components. Many reliability studies have shown that CCF tend to increase the joint-failure probabilities and contribute significantly to the overall unreliability of complex systems (Refs. 2, 3).

Consider a TMR (Triple Modular Redundancy) system (Ref. 4) with three redundant components A, B, and C and a Voter, as shown in Figure 1. The three redundant components all perform the same task, and the Voter selects the correct output from among the three redundant outputs. As long as at least two of the redundant components are operating correctly and the Voter has not failed, the TMR configuration operates correctly.

Figure 1: Triple Modular Redundancy (TMR). [Diagram: Modules 1, 2, and 3 contain components A, B, and C; all three feed the Voter, which produces the Output.]

Taking CCF into account, the total failure probability of component A can be expressed as follows:

$$A_T = A_I + C_{AB} + C_{AC} + C_{ABC}, \tag{1}$$

where
$A_T$ = total failure of component A,
$A_I$ = failure of component A from s-independent causes,

RAMS 2004 - 610 - 0-7803-8215-3/04/$17.00 © 2004 IEEE

Authorized licensed use limited to: b-on: Universidade de Lisboa Reitoria. Downloaded on March 17,2021 at 13:49:40 UTC from IEEE Xplore. Restrictions apply.
$C_{AB}$ = failure of components A and B (but not C) from common causes,
$C_{AC}$ = failure of components A and C (but not B) from common causes,
$C_{ABC}$ = failure of components A, B and C from common causes.

Similar expressions can be developed for components B and C. Thus, assuming that the voter is perfect, the cut sets of the TMR system are $\{A_I, B_I\}$, $\{A_I, C_I\}$, $\{B_I, C_I\}$, $\{C_{AB}\}$, $\{C_{AC}\}$, $\{C_{BC}\}$ and $\{C_{ABC}\}$. If only independence is assumed, only the first three cut sets are involved. Neglecting the common cause cut sets makes the result of the reliability analysis optimistic.

There are two basic approaches for incorporating CCF into system analysis: the explicit and implicit methods (Ref. 5). The explicit method models CCF as shared basic events in the system fault tree, then applies conventional fault tree analysis approaches to the fault tree with CCF basic events. In the implicit method, the fault trees are built without considering CCF, and the algebraic system unreliability expression is then derived in a certain form. That expression is evaluated in a way that correctly includes the contribution of CCF.

Unfortunately, both the explicit and implicit methods are ill-suited to CCF analysis of large-scale fault trees. The explicit method may involve adding a large number of CCF basic events to a fault tree, which is tedious to handle, especially for high levels of redundancy. As for the implicit method, the algebraic expression of system unreliability is usually not easy to derive. This makes the implicit method feasible only for hand-calculating the unreliabilities of relatively small systems.

The method described in this paper combines the implicit method with an efficient Boolean function manipulation method, the BDD. It retains the advantage of the implicit method and makes the implicit method more practical. CCF analysis in the DFT model is also discussed.

2. CCF ANALYSIS REVIEW

2.1 General Assumptions

1) CCF exist only among the s-identically distributed components.
2) Each component is subject to at most one kind of CCF.
3) All components subject to the same CCF are categorized as a common cause group (CCG).
4) Each CCF is associated with a set of failure processes (independent failures and CC failures), which are named by the number of components involved.
5) All failure processes are mutually statistically-independent (s-independent).

2.2 Explicit Method

In the explicit method, any component involved in a CCF is modeled as a set of basic events in the system fault tree. Each basic event represents a type of failure process; that is, the system is modeled with basic events at the failure-process level. Since all failure processes are s-independent, the resultant fault tree can be solved using traditional fault tree analysis techniques. In practice, the system fault tree with explicit CCF modeling can be built by expanding a system fault tree with extra CCF events. Taking the TMR system (shown in Figure 1) as an example, the fault tree and the expanded fault tree are illustrated in Figure 2.

Figure 2: A TMR Fault Tree and Its CCF Expansion

In the expanded fault tree, each component is extended from the single basic event to three basic events. One of them represents the independent failure process; the other two represent the CCF, which are shared with other involved components. After being expanded, the system can be evaluated with either BDD or Markov models.

A major disadvantage of the explicit method is that the number of expanded basic events is subject to combinatorial explosion as the redundancy level increases; hence the costs (time and space) of system analysis increase remarkably. Generally, $\sum_{j=2}^{m} \binom{m}{j}$ extra basic events are needed to explicitly model a system with $m$ redundant components.

2.3 Implicit Method

An implicit method considers CCF during analysis rather than in the modeling stage, so it does not need the tedious expansion of basic events to include CCF. The implicit method consists of three main steps (Ref. 3):

1) For each CCG with $m$ s-identical components, calculate the probabilities of $k$ ($1 \le k \le m$) success events, $R_m^{(k)}$.
2) Determine the system reliability/unreliability expression in terms of the individual-component reliabilities (success probabilities), without considering any CCF. This expression is a linear function of the individual-component reliabilities and is reduced to the "sum of products" form.

3) In any term containing a product of $k$ success probabilities of $k$ basic events belonging to the same CCG of $m$ s-identical components, replace that product with $R_m^{(k)}$.

As an example, the unreliability (without considering CCF) of the TMR system (as shown in Figure 1 and assuming the Voter is perfect) is expressed as:

$$U_{TMR} = 1 - 3p^2 + 2p^3, \tag{2}$$

where $p$ is the reliability of each redundant component.

After taking CCF into account, $p^2$ and $p^3$ are replaced by $R_3^{(2)}$ and $R_3^{(3)}$ respectively. The resulting TMR unreliability including CCF is:

$$U_{TMR}^{CCF} = 1 - 3R_3^{(2)} + 2R_3^{(3)}. \tag{3}$$

The terms $R_m^{(k)}$ were originally derived in Ref. 6. They are calculated using equations (4) and (5), where $P_j$ is the reliability associated with the $j$-component failure process:

$$R_m^{(1)} = \prod_{j=1}^{m} \left[P_j\right]^{\binom{m-1}{j-1}}, \tag{4}$$

$$R_m^{(k)} = \prod_{n=m-k+1}^{m} R_n^{(1)}. \tag{5}$$

Although the implicit method does not need to expand the basic events, in practice its limitations are also obvious:

1) The implicit method is practical only for relatively small systems (e.g., fewer than 10 components). When the system becomes larger, it is very difficult to determine the reliability or unreliability expression and reduce it to the "sum of products" form.
2) The implicit method is not easy to computerize, since it needs the reliability or unreliability expression in a certain form.

3. INCORPORATING THE IMPLICIT METHOD INTO BDD

3.1 BDD

A BDD is a directed acyclic graph representation of a Boolean function. A BDD has two terminal nodes, 0 and 1, encoding the two corresponding constant functions. Each internal node has two edges, called the 0-edge and the 1-edge respectively. A BDD is derived by reducing a binary decision tree, which represents the recursive execution of Shannon's decomposition. The following reduction rules are used in the construction of a BDD from a binary decision tree.

1) Delete all the redundant nodes whose two edges point to the same node.
2) Merge all the isomorphic subgraphs.

With these two reduction rules, a BDD can represent a Boolean function efficiently. For details about BDD, the reader is referred to two papers by Bryant (Refs. 7, 8).

3.2 Incorporating the Implicit Method into BDD

One major inconvenience of the implicit method is the need to derive and manipulate the algebraic Boolean expression of system unreliability. This inconvenience constrains the use of the implicit method to relatively small systems. One way to overcome it is to find an efficient method to derive and manipulate that expression. The BDD is known as one of the most efficient data structures for handling Boolean functions, so it is natural to incorporate the implicit method into the BDD.

Since a static fault tree is essentially a Boolean formula and can be efficiently encoded by means of a BDD, the BDD converted from a static fault tree is also essentially a Boolean formula. The product of the variables (either in the operational state or in the failed state) on the edges along one 1-path (a path from the root node to the 1-terminal node) corresponds to one term in the Boolean formula. After replacing the variables with the corresponding probabilities, the sum of products obtained from all 1-paths is the algebraic Boolean expression of system unreliability. Taking the TMR system as an example, the BDD is constructed as shown in Figure 3, and we can find three 1-paths.

Figure 3: The BDD of example TMR system. [Diagram: root node A, two B nodes, one C node, and terminal nodes 0 and 1.]

From the BDD shown in Figure 3, we can get the Boolean formula corresponding to the example TMR system as:

$$f_{TMR}(A, B, C) = AB + A\bar{B}C + \bar{A}BC. \tag{6}$$

By replacing the variables with the corresponding reliabilities/unreliabilities, we can get the algebraic expression of system unreliability as:

$$U_{TMR} = q_A q_B + q_A p_B q_C + p_A q_B q_C. \tag{7}$$

Note that equation (7) assumes that components A, B and C are s-independent. If components A, B and C are subject to a CC, we need to calculate the joint-probabilities for each 1-path. The system unreliability is expressed as:

$$U_{TMR}^{CCF} = \Pr\{AB\} + \Pr\{A\bar{B}C\} + \Pr\{\bar{A}BC\}. \tag{8}$$

Ref. 9 derived a general formula for calculating the joint-probability of the occurrence of a set of events subject to a CC. Given the size of the CCG, $m$, the size of the event set, $g$, and the number of success events, $k$, the joint-probability is calculated as:

$$\Pr\{E_1 E_2 \cdots E_g\} = \sum_{i=0}^{g-k} \binom{g-k}{i} (-1)^i R_m^{(i+k)}. \tag{9}$$

For example, the joint-probabilities in equation (8) are calculated as follows (taking $R_m^{(0)} = 1$):

$$\Pr\{AB\} = \sum_{i=0}^{2-0} \binom{2-0}{i} (-1)^i R_3^{(i+0)} = 1 - 2R_3^{(1)} + R_3^{(2)}; \tag{10}$$

$$\Pr\{A\bar{B}C\} = \sum_{i=0}^{3-1} \binom{3-1}{i} (-1)^i R_3^{(i+1)} = R_3^{(1)} - 2R_3^{(2)} + R_3^{(3)}; \tag{11}$$

$$\Pr\{\bar{A}BC\} = \sum_{i=0}^{3-1} \binom{3-1}{i} (-1)^i R_3^{(i+1)} = R_3^{(1)} - 2R_3^{(2)} + R_3^{(3)}. \tag{12}$$

Equation (8) is evaluated as the sum of the joint-probabilities (equations 10-12):

$$U_{TMR}^{CCF} = \left(1 - 2R_3^{(1)} + R_3^{(2)}\right) + 2\left(R_3^{(1)} - 2R_3^{(2)} + R_3^{(3)}\right) = 1 - 3R_3^{(2)} + 2R_3^{(3)}. \tag{13}$$

3.3 Algorithm Summary

The algorithm of CCF analysis based on the BDD model is summarized as follows:

1) Build the fault tree model without considering CC events.
2) Convert the fault tree model into a BDD representation.
3) Traverse the BDD and partition the events in each 1-path into different event sets according to the CCG they are involved in.
4) For each event set, use equation (9) to calculate the joint-probability. Multiply all the joint-probabilities and s-independent probabilities in one 1-path to get the unreliability of that 1-path.
5) The sum of all 1-path unreliabilities is the system unreliability with CCF.

4. CCF ANALYSIS IN DFT

DFT models introduce dynamic gates, such as spare gates, sequence gates, priority-AND gates and functional dependency gates (Ref. 10), to capture the sequential dependencies among the system events. Unlike static fault trees, DFT have no explicit Boolean expression of system unreliability. DFT models are usually evaluated by converting the DFT into Markov models.

Incorporating CCF analysis into Markov models is conceptually straightforward: additional transitions are added to represent the occurrences of CCF. This can be done on top of the conventional Markov models, and the resultant Markov models have the same states as the original Markov models. Consider the TMR example. The Markov model that ignores CCF is shown in Figure 4. All components have the same failure rate $\lambda$. In the initial state (state 1), all components are operational. The failure of any one of the three components triggers a transition from the initial state to state 2, which represents a system configuration with two operational components and one failed component. Furthermore, any component failure in state 2 leads to the system failed state (state F), according to the failure criteria of a TMR system.

Figure 4: The TMR Markov model ignoring CCF. [Diagram: state 1 to state 2 at rate $3\lambda$; state 2 to state F at rate $2\lambda$.]

When CCF are taken into consideration, the transitions in Figure 4 represent only the transitions triggered by independent failure processes. Extra transitions may occur due to the failure processes involving the CCF. The corresponding TMR Markov model including CCF is shown in Figure 5.

Figure 5: The TMR Markov model including CCF. [Diagram: states 1, 2 and F, with transition labels $3\lambda^{(1)}$, $2\lambda^{(1)}$, $\lambda^{(2)}$, $3\lambda^{(2)}$ and $\lambda^{(3)}$.]

In the initial state (state 1), the system is operational with three operational components. The possible failure processes include three independent (1-component) failure processes, three 2-component failure processes and one 3-component failure process. The failure rates of these three types of failure process are denoted $\lambda^{(1)}$, $\lambda^{(2)}$ and $\lambda^{(3)}$ respectively. The occurrence of an independent failure process leads to a transition from the initial state to state 2. According to the failure criteria of a TMR system, either a 2-component or a 3-component failure process leads to a transition to the failed state (state F). State 2 represents a 1-out-of-2 system configuration, in which either a 1-component or a 2-component failure process leads to system failure.

Note that due to the existence of "causal failures" in dynamic systems, we need to pay special attention to the causal failures that are subject to CCF. The occurrence of a causal failure will lead to other component failures. Thus both

effects of CCF and causal failure must be carefully examined in order to determine the correct destinations of transitions.

The Markov model including the CCF can be solved as a standard Markov model. This usually involves solving a set of linear ordinary differential equations, which results in a set of state probabilities. The system unreliability is the probability of the system being in any failed state.

ACKNOWLEDGMENTS

We would like to thank NASA Langley Research Center, which funded the work reported in this paper under NASA Contract NAS1-02076.

REFERENCES

1. A. Mosleh et al., "Procedure for Treating Common-Cause Failures in Safety and Reliability Studies," U.S. Nuclear Regulatory Commission, NUREG/CR-4780, Vol. I and II, Washington, DC, 1988.
2. S. Mitra et al., "Common-Mode Failures in Redundant VLSI Systems: A Survey," IEEE Transactions on Reliability, Vol. 49, No. 3, September 2000, pp. 285-295.
3. J. K. Vaurio, "An Implicit Method for Incorporating Common-Cause Failures in System Analysis," IEEE Transactions on Reliability, Vol. 47, No. 2, June 1998, pp. 173-180.
4. B. W. Johnson, "An Introduction to the Design and Analysis of Fault-Tolerant Systems," in D. K. Pradhan, editor, Fault-Tolerant Computer System Design, Prentice Hall, 1996, pp. 1-84.
5. K. N. Fleming, A. Mosleh, "Common-cause data analysis and implications in system modeling," Proc. Int'l Topical Meeting on Probabilistic Safety Methods & Applications, Feb 24 - Mar 1, 1985, Vol. 1, pp. 3/1-3/12.
6. K. C. Chae, G. M. Clark, "System Reliability in the Presence of Common-cause Failures," IEEE Transactions on Reliability, Vol. R-35, April 1986, pp. 32-35.
7. R. Bryant, "Graph based algorithms for Boolean function manipulation," IEEE Transactions on Computers, 35(8), 1986, pp. 677-691.
8. R. E. Bryant, "Symbolic Boolean manipulation with ordered binary-decision diagrams," ACM Computing Surveys, Vol. 24, No. 3, Sept. 1992, pp. 293-318.
9. Z. Tang, Common Cause Failure Analysis and Improved Solution Techniques for Dynamic Fault Trees, Master thesis, University of Virginia, 2002.
10. J. B. Dugan, Salvatore J. Bavuso and Mark A. Boyd, "Dynamic Fault Tree Models for Fault Tolerant Computer Systems," IEEE Transactions on Reliability, Vol. 41, No. 3, September 1992, pp. 363-377.

BIOGRAPHIES

Zhihua Tang
Department of Electrical & Computer Engineering
351 McCormick Road; PO Box 400743
University of Virginia
Charlottesville VA 22904-4743 USA

e-mail: [email protected]

Zhihua Tang received his B.S. degree in Applied Physics from Northeastern University, Shenyang, China, in 1996, the M.S. degree in Computer Science from Shenyang Institute of Computing Technology, Chinese Academy of Science, China, in 1999, and the M.S. degree in Electrical Engineering from University of Virginia, Charlottesville, VA in 2002. He is now a Ph.D. student in the Department of Electrical & Computer Engineering at the University of Virginia. He is a student member of IEEE.

Joanne Bechta Dugan, Ph.D.
Department of Electrical & Computer Engineering
University of Virginia
351 McCormick Road PO Box 400743
Charlottesville, Virginia 22904-4743 USA

e-mail: [email protected]

Joanne Bechta Dugan was awarded the B.A. degree in Mathematics and Computer Science from La Salle University, Philadelphia, PA in 1980, and the M.S. and Ph.D. degrees in Electrical Engineering from Duke University, Durham, NC in 1982 and 1984, respectively. Dr. Dugan is currently Professor of Electrical and Computer Engineering at the University of Virginia. She has performed and directed research on the development and application of techniques for the analysis of computer systems that are designed to tolerate hardware and software faults. Her research interests include hardware and software reliability engineering, fault tolerant computing, and mathematical modeling using dynamic fault trees, Markov models, Petri nets, and simulation. Professor Dugan is a member of Phi Beta Kappa, Eta Kappa Nu, Tau Beta Pi and IEEE; is an IEEE Fellow; was Associate Editor of the IEEE Transactions on Reliability for 10 years; and is currently Associate Editor of the IEEE Transactions on Software Engineering. She is a past winner of both the P.K. McElroy and the Alan O. Plait Awards.
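
As a numerical companion to equations (4), (5), (9) and (13), the following Python sketch evaluates $R_m^{(k)}$, the joint-probability formula, and the TMR unreliability with CCF. It is an illustration only, not the authors' implementation: the function names and the survival probabilities in `P_EXAMPLE` are hypothetical, and $R_m^{(0)} = 1$ is taken as the empty product.

```python
from math import comb, prod


def r_single(m, P):
    """Equation (4): R_m^(1) = product over j of P_j ** C(m-1, j-1).

    P[j] is the probability that a component survives the j-component
    failure process; P[0] is an unused placeholder.
    """
    return prod(P[j] ** comb(m - 1, j - 1) for j in range(1, m + 1))


def r_joint(m, k, P):
    """Equation (5): R_m^(k) = product of R_n^(1) for n = m-k+1 .. m.

    For k = 0 the product is empty, which gives R_m^(0) = 1.
    """
    return prod(r_single(n, P) for n in range(m - k + 1, m + 1))


def joint_probability(m, g, k, P):
    """Equation (9): joint probability of g events from one CCG of size m,
    of which k are success events and g - k are failure events."""
    return sum(comb(g - k, i) * (-1) ** i * r_joint(m, i + k, P)
               for i in range(g - k + 1))


def tmr_unreliability_ccf(P):
    """Equation (8): sum of the joint probabilities of the three 1-paths."""
    pr_two_failed = joint_probability(3, 2, 0, P)         # Pr{AB}
    pr_one_ok_two_failed = joint_probability(3, 3, 1, P)  # Pr{A ~B C} = Pr{~A B C}
    return pr_two_failed + 2 * pr_one_ok_two_failed


# Hypothetical survival probabilities P_1, P_2, P_3 (illustrative values only).
P_EXAMPLE = [None, 0.95, 0.99, 0.995]
U_CCF = tmr_unreliability_ccf(P_EXAMPLE)
# Closed form of equations (3) and (13) for comparison.
U_CLOSED = 1 - 3 * r_joint(3, 2, P_EXAMPLE) + 2 * r_joint(3, 3, P_EXAMPLE)
```

With $P_2 = P_3 = 1$ (no common cause processes), `tmr_unreliability_ccf` reduces to the independent result $1 - 3p^2 + 2p^3$ of equation (2), which is a convenient sanity check.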

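
The 1-paths used in Section 3.2 can be illustrated with a small Python sketch that applies Shannon's decomposition to the TMR failure function and stops expanding as soon as a cofactor becomes constant. This is a simplified stand-in for a BDD package (it does not share isomorphic subgraphs), and the names `tmr_fails`, `one_paths` and `path_unreliability` are hypothetical. A value of `True` means the component has failed.

```python
from itertools import product


def tmr_fails(a, b, c):
    """TMR failure function with a perfect voter: at least two components failed."""
    return (a + b + c) >= 2


def cofactor_values(f, n, assignment):
    """Set of values f takes over all completions of a partial assignment."""
    rest = n - len(assignment)
    return {f(*assignment, *tail) for tail in product((False, True), repeat=rest)}


def one_paths(f, n, assignment=()):
    """Yield partial assignments (in variable order) that reach terminal 1.

    Expansion stops when the cofactor is constant, so the yielded terms
    are mutually disjoint products.
    """
    values = cofactor_values(f, n, assignment)
    if values == {True}:
        yield assignment
    elif values != {False}:
        for failed in (True, False):
            yield from one_paths(f, n, assignment + (failed,))


def path_unreliability(paths, p):
    """Equation (7) under s-independence: q for a failed variable, p otherwise."""
    q = 1.0 - p
    total = 0.0
    for path in paths:
        term = 1.0
        for failed in path:
            term *= q if failed else p
        total += term
    return total


# The three 1-paths correspond to AB, A ~B C and ~A B C in equation (6).
TMR_PATHS = list(one_paths(tmr_fails, 3))
U_INDEPENDENT = path_unreliability(TMR_PATHS, 0.9)
```

Under s-independence the sum over paths agrees with equation (2). With CCF, each path's product would instead be replaced by the joint-probability of equation (9), as described in steps 3 to 5 of Section 3.3.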

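
For the three-state TMR Markov chain of Section 4, the differential equations can be solved in closed form, which the following Python sketch does. It is a sketch under stated assumptions rather than the authors' solver: the function name, the example rates and the time point are hypothetical, and the assignment of the Figure 5 arc labels to transitions (1 to 2 at $3\lambda^{(1)}$, 1 to F at $3\lambda^{(2)} + \lambda^{(3)}$, 2 to F at $2\lambda^{(1)} + \lambda^{(2)}$) is one reading of the figure labels and the accompanying text.

```python
from math import exp


def tmr_markov_unreliability(t, r12, r1f, r2f):
    """Probability of being in state F at time t for the chain 1 -> 2 -> F
    with an optional direct arc 1 -> F.

    Solves dP1/dt = -(r12 + r1f) P1 and dP2/dt = r12 P1 - r2f P2
    with P1(0) = 1 and P2(0) = 0.
    """
    a = r12 + r1f  # total exit rate of state 1
    b = r2f        # total exit rate of state 2
    p1 = exp(-a * t)
    if abs(a - b) < 1e-12:
        p2 = r12 * t * exp(-a * t)  # limiting case a == b
    else:
        p2 = r12 * (exp(-b * t) - exp(-a * t)) / (a - b)
    return 1.0 - p1 - p2


# Hypothetical rates (per hour) and mission time, for illustration only.
LAM1, LAM2, LAM3 = 1e-3, 1e-4, 5e-5
T = 100.0

# Figure 4: independent failures only (3*lambda, then 2*lambda).
U_NO_CCF = tmr_markov_unreliability(T, 3 * LAM1, 0.0, 2 * LAM1)

# Figure 5 (assumed arc assignment, see the lead-in): CCF arcs added.
U_WITH_CCF = tmr_markov_unreliability(
    T, 3 * LAM1, 3 * LAM2 + LAM3, 2 * LAM1 + LAM2
)
```

With the Figure 4 rates, the result equals $1 - 3p^2 + 2p^3$ for $p = e^{-\lambda t}$, matching equation (2). Adding the CCF arcs can only increase the probability of state F, which reflects the observation that ignoring CCF gives an optimistic result.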
