Availability of Systems With Self-Diagnostic Components-Applying
www.elsevier.com/locate/ress
Abstract
Of the many techniques applicable to safety-related analyses, each may suit only some aspects of a system's safety behavior. Conversely, several techniques may fit the analysis of the same risk-related aspect of system behavior, yet they do not always lead to the same results. Rouvroye and Brombacher compared these techniques and concluded that Markov and Enhanced Markov analysis can cover most aspects of a system's safety-related behavior. Following their conclusion, this paper applies the Markov method to the quantitative analyses in Part 6 of the standard IEC 61508. The purpose is to explain in detail the solutions given in the standard, because many of its results are not clearly described and a safety engineer cannot easily trace how they were obtained. In addition, the down time $t_{c1}$ shown in the standard is newly defined, because it is the basis for the average probability of failure on demand of the system architectures and its meaning is not clearly explained. Through the derivation, however, a discrepancy is found in the standard. On this basis, new suggestions are proposed from the results obtained.
© 2003 Elsevier Science Ltd. All rights reserved.
Keywords: IEC 61508; Self-diagnosis; Probability of failure on demand; Markov model
Nomenclature

$A$: steady state availability
FF: failure frequency
$F(t)$: probability of failure of a system
$\lambda$, $\mu$: failure and equivalent repair rates of a channel
MDT: mean down time
MTTR: mean time to restoration
$P_{CC}$: probability of failure caused by common cause
pdf: probability density function
$P_i(t)$, $P_i$: probability and steady state probability of the system in the $i$th state
PFD: average probability of failure on demand for one channel
$PFD_G$: average probability of failure on demand for system architectures
$T_1$: proof-test interval
$t_{c1}$: equivalent mean down time for the undetected failure of a channel
$t_{c2}$: equivalent mean down time for the detected failure of a channel
$t_{CE}$: channel equivalent mean down time for 1oo1, 1oo2, 2oo2, 2oo3 architectures
$t'_{CE}$: channel equivalent mean down time for 1oo2D architecture
$t_{GE}$: voted group equivalent mean down time for 1oo2 and 2oo3 architectures
$t'_{GE}$: voted group equivalent mean down time for 1oo2D architecture
$\lambda_D$: dangerous failure rate ($\lambda_D = \lambda_{DD} + \lambda_{DU}$) of a channel in a subsystem
$\lambda_{DD}$: detected dangerous failure rate of a channel in a subsystem
$\lambda_{DU}$: undetected dangerous failure rate
$\lambda_{SD}$: detected safe failure rate of a channel
$\mu_{DD}$: repair rate of detected dangerous fault in a channel
$\mu_{DU}$: repair rate of undetected dangerous fault in a channel
$\beta$: the fraction of undetected failures that have a common cause (expressed as a fraction in the equations and as a percentage elsewhere)
$\beta_D$: of those failures that are detected by the diagnostic tests, the fraction that have a common cause (expressed as a fraction in the equations and as a percentage elsewhere)

For details, refer to Table B.1 in Annex B of IEC 61508-6 [10].
the Enhanced Markov analysis. The combination of the above two types of Markovian methods is therefore preferable and hence recommended.

The present paper deals with SRSs whose configurations are composed of channels subject to both failures detectable by self-diagnosis and undetectable failures. The Markovian approach is applied to the quantitative SIL analyses and, in most cases, gives satisfactory results. The intention of this paper is to show how many of the probabilistic parameters specified in IEC 61508-6 [10] are obtained for specific system structures and associated conditions. A typical safety engineer cannot easily understand the results for these parameters, because the standard gives no detailed derivation. In particular, the down time $t_{c1}$ is newly defined in this paper, since it underlies the results for the typical system architectures and its meaning is not clearly explained in the standard. After detailed derivation, however, difficulties and a discrepancy are found in some of the examples specified in Part 6 of IEC 61508, and suggestions for resolving them are proposed.

2. Assumptions

In order to describe the state transitions of the systems as clearly as possible, we make the following assumptions:

i. Component failure and repair rates are constant over the life of the system.
ii. The resulting average probability of failure on demand for the subsystem is less than $10^{-1}$, or the resulting probability of failure per hour for the subsystem is less than $10^{-5}$.
iii. The input subsystem comprises the actual sensor(s) and any other components and wiring, up to but not including the component(s) where the signals are first combined by voting or other processing. For example, the configuration for two sensor channels is shown in Annex B of Part 6 of IEC 61508 [10].
iv. The hardware failure rates used as inputs to the calculations and tables are for a single channel of the subsystem. For example, if 2-out-of-3 sensors are used, the failure rate is for a single sensor, and the failure rate of the 2-out-of-3 arrangement is calculated separately.
v. All channels in a voted group have the same failure rate and diagnostic coverage.
vi. The overall hardware failure rate of a channel in a subsystem is the sum of the dangerous and safe failure rates for that channel, which are assumed to be equal.
vii. For each safety function, proof testing and repair are perfect; that is, all failures that remain undetected are assumed to be detected by the proof test.
viii. The proof test interval is at least one order of magnitude greater than the diagnostic test interval.
ix. The demand rate and the expected interval between demands are not considered in this paper. Therefore, SRS failures can be analyzed separately from the demand.
T. Zhang et al. / Reliability Engineering and System Safety 80 (2003) 133–141 135
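Hence

$$
\begin{aligned}
t_a &= \frac{\int_0^{T_1} 2\lambda_{DU}\,t\,[1-\exp(-\lambda_{DU}t)]\exp(-\lambda_{DU}t)\,dt}{\int_0^{T_1} 2\lambda_{DU}\,[1-\exp(-\lambda_{DU}t)]\exp(-\lambda_{DU}t)\,dt} \\
&= \frac{\int_0^{T_1} 2\lambda_{DU}\,t\,[1-\exp(-\lambda_{DU}t)]\exp(-\lambda_{DU}t)\,dt}{[1-\exp(-\lambda_{DU}T_1)]^2} \\
&\approx \frac{2}{3}T_1 - \frac{1}{12}\lambda_{DU}T_1^2 + O(\lambda_{DU}^2T_1^3) \qquad (4)
\end{aligned}
$$

since $\lambda_{DU}T_1 \ll 1$, so that the exponentials can be expanded in powers of $\lambda_{DU}T_1$ (for example, $\exp(-\lambda_{DU}T_1) \approx 1-\lambda_{DU}T_1$). As $\lambda_{DU}T_1 < 0.1$, $t_a$ is a little less than $2T_1/3$ but approaches that value, and $t_{c1}$ can be evaluated from it as given in Eq. (5).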
As for the 1oo2 architecture, $t_{c1}$ for the 1oo2D and 2oo3 architectures is obtained as given in Eq. (5). Numerical examples show that, if $\lambda_{DU}T_1 < 0.1$, $2T_1/3$ is a good approximation to the exact value of $t_a$ for all three architectures.
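A numerical check of how close $t_a$ is to $2T_1/3$ for the 1oo2 case can be sketched as follows. It assumes $t_a$ is the mean failure time of the pair's undetected-fault process conditioned on failure within $[0, T_1]$ (the ratio of integrals above), integrates the numerator with a hand-written composite Simpson rule, and uses the closed form $[1-\exp(-\lambda_{DU}T_1)]^2$ for the denominator. The function names and the parameter values ($T_1$ = 8760 h, $\lambda_{DU}$ from $10^{-7}$ to $10^{-5}$ per hour) are hypothetical illustrations, not values from the paper.

```python
import math


def du_failure_pdf_1oo2(t: float, lam_du: float) -> float:
    """Density of the time at which both channels of a 1oo2 pair have an
    undetected dangerous failure: d/dt [1 - exp(-lam_du * t)]**2."""
    one_minus_e = -math.expm1(-lam_du * t)  # 1 - exp(-lam_du*t), accurate for small arguments
    return 2.0 * lam_du * one_minus_e * (1.0 - one_minus_e)


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [a, b]; n is bumped to the next even number if odd."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def mean_failure_time(lam_du: float, t1: float) -> float:
    """t_a: numerator integrated numerically, denominator in closed form."""
    numerator = simpson(lambda t: t * du_failure_pdf_1oo2(t, lam_du), 0.0, t1)
    denominator = math.expm1(-lam_du * t1) ** 2  # equals [1 - exp(-lam_du*T1)]**2
    return numerator / denominator


if __name__ == "__main__":
    T1 = 8760.0  # hypothetical proof-test interval (one year, in hours)
    for lam_du in (1e-7, 1e-6, 1e-5):  # hypothetical undetected dangerous failure rates (per hour)
        ratio = mean_failure_time(lam_du, T1) / T1
        print(f"lambda_DU*T1 = {lam_du * T1:.5f}  t_a/T1 = {ratio:.5f}  (2/3 = {2 / 3:.5f})")
```

For these values the ratio stays slightly below 2/3, and the shortfall grows roughly like $\lambda_{DU}T_1/12$, so even at $\lambda_{DU}T_1 \approx 0.09$ the gap is under 0.01 of $T_1$.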
Fig. 2. Reliability block diagrams.

For the 1oo2 architecture (see Fig. 2b), the probability of failure due to the undetectable fault is

$$[1-\exp(-\lambda_{DU}t)]^2.$$

Fig. 3. Process for the undetected dangerous fault.

Pairwise state combinations: 00, 01, 10, 11 (State 0: operation state; State 1: failure state).

The resulting system state transition diagram is shown in Fig. 4. This figure gives the following set of differential equations:

$$
\begin{cases}
P_0'(t) = -(\lambda_{DD}+\lambda_{DU})P_0(t) + \mu_{DD}P_1(t) + \mu_{DU}P_2(t),\\
P_1'(t) = \lambda_{DD}P_0(t) - (\lambda_{DU}+\mu_{DD})P_1(t) + \mu_{DU}P_3(t),\\
P_2'(t) = \lambda_{DU}P_0(t) - (\lambda_{DD}+\mu_{DU})P_2(t) + \mu_{DD}P_3(t),\\
P_3'(t) = \lambda_{DU}P_1(t) + \lambda_{DD}P_2(t) - (\mu_{DD}+\mu_{DU})P_3(t).
\end{cases}
$$
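Because each column of the coefficient matrix of these differential equations sums to zero, the steady-state probabilities $P_i$ follow from setting all derivatives to zero and replacing one redundant balance equation by $\sum_i P_i = 1$. The sketch below does this with a small hand-written Gaussian elimination; the function names and all rate values are hypothetical. It also checks an observation of ours that is not stated in the text: since the detected and undetected fault processes in these equations evolve independently, the steady state factorizes into the product of two two-state solutions.

```python
def generator(l_dd, l_du, m_dd, m_du):
    """Coefficient matrix M of dP/dt = M P; row i is the equation for P_i'(t)."""
    return [
        [-(l_dd + l_du), m_dd, m_du, 0.0],
        [l_dd, -(l_du + m_dd), 0.0, m_du],
        [l_du, 0.0, -(l_dd + m_du), m_dd],
        [0.0, l_du, l_dd, -(m_dd + m_du)],
    ]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def steady_state(l_dd, l_du, m_dd, m_du):
    """Set all P_i'(t) = 0 and replace the last (redundant) equation by sum(P) = 1."""
    m = generator(l_dd, l_du, m_dd, m_du)
    a = m[:3] + [[1.0, 1.0, 1.0, 1.0]]
    b = [0.0, 0.0, 0.0, 1.0]
    return solve_linear(a, b)


def product_form(l_dd, l_du, m_dd, m_du):
    """Independent two-state DD and DU processes: each P_i is a product of marginals."""
    q_dd = l_dd / (l_dd + m_dd)  # long-run fraction of time with a detected fault
    q_du = l_du / (l_du + m_du)  # long-run fraction of time with an undetected fault
    return [(1 - q_dd) * (1 - q_du), q_dd * (1 - q_du), (1 - q_dd) * q_du, q_dd * q_du]


if __name__ == "__main__":
    # Hypothetical per-hour rates; m_du = 1/(T1/2 + MTTR) is only an illustrative choice.
    rates = dict(l_dd=4.5e-6, l_du=0.5e-6, m_dd=1 / 8.0, m_du=1 / (8760 / 2 + 8.0))
    for i, (p, q) in enumerate(zip(steady_state(**rates), product_form(**rates))):
        print(f"P{i} = {p:.6e}  (product form {q:.6e})")
```

Solving the linear system directly, rather than relying on the product form, keeps the sketch usable if the transition structure is changed (for example, by coupling the two repair processes).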
and the system failure frequency, FF, is given as [11]

$$FF = \sum_{k\in W}\sum_{j\in F} P_k\,a_{jk}, \qquad (9)$$

where
$F$ is the set of failure states of the system,
$W$ is the set of operating states of the system,
$P_k$ is the probability of the system being in working state $k$, and
$a_{jk}$ is the element of $M$ given in Eq. (6).

System state | System state definition
0 | Two channels are operative (up state)
1 | Only one channel is in operation (up state)
2 | Both channels have failed (down state)

The state transition diagram is therefore easily obtained, as shown in Fig. 5, where two repair teams are available to work on all known failures in the system (see assumption xi).
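Eq. (9) can be sketched for the three-state 1oo2 model with exact rational arithmetic. The transition rates used here (0→1 at $2\lambda$, 1→2 at $\lambda$, 1→0 at $\mu$, 2→1 at $2\mu$, the last reflecting two repair teams) are our reading of Fig. 5, which is not reproduced in the text, and we assume, as in the ODE form $dP/dt = MP$, that $a_{jk}$ is the rate from state $k$ to state $j$. The mean down time is taken as the down-state probability divided by FF. Function names and the numerical rates are hypothetical.

```python
from fractions import Fraction


def one_oo_two_steady_state(lam, mu):
    """Steady-state P0, P1, P2 of the 1oo2 model with two repair teams.

    Assumed transition rates (our reading of Fig. 5, not given in the text):
    0 -> 1 at 2*lam, 1 -> 2 at lam, 1 -> 0 at mu, 2 -> 1 at 2*mu.
    """
    p0 = Fraction(1)
    p1 = p0 * 2 * lam / mu      # flow balance between states 0 and 1
    p2 = p1 * lam / (2 * mu)    # flow balance between states 1 and 2
    total = p0 + p1 + p2
    return [p0 / total, p1 / total, p2 / total]


def failure_frequency(p, rate, working, failed):
    """Eq. (9): FF = sum over k in W, j in F of P_k * a_jk (a_jk = rate from k to j)."""
    return sum(p[k] * rate.get((k, j), 0) for k in working for j in failed)


lam = Fraction(1, 100_000)   # hypothetical channel failure rate (per hour)
mu = Fraction(1, 8)          # hypothetical equivalent repair rate (per hour)
rate = {(0, 1): 2 * lam, (1, 2): lam, (1, 0): mu, (2, 1): 2 * mu}  # key = (from, to)

p = one_oo_two_steady_state(lam, mu)
ff = failure_frequency(p, rate, working={0, 1}, failed={2})
mdt = p[2] / ff  # mean down time = down-state probability / failure frequency

print(float(p[2]), float(lam**2 / (lam + mu) ** 2))  # P2 against lambda^2/(lambda+mu)^2
print(float(mdt), float(1 / (2 * mu)))               # MDT against 1/(2*mu)
```

Using `Fraction` makes the agreement with $P_2 = \lambda^2/(\lambda+\mu)^2$ and $\mathrm{MDT} = 1/(2\mu)$ exact rather than approximate, which is convenient when checking closed forms.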
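$$
t_{GE} = \mathrm{MDT} = \frac{1}{2\mu} = \frac{1}{2}\left[\frac{\lambda_{DU}}{\lambda_D}\left(\frac{T_1}{3}+\mathrm{MTTR}\right)+\frac{\lambda_{DD}}{\lambda_D}\mathrm{MTTR}\right]. \qquad (16)
$$

Hence, the $PFD_G$ for this architecture is

$$
\begin{aligned}
PFD_G &= P_2 + P_{CC} = \frac{\lambda^2}{(\lambda+\mu)^2} + P_{CC} \approx \frac{\lambda^2}{\mu^2} + P_{CC} = \lambda^2 t_{CE}^2 + P_{CC} \\
&= 2\left[(1-\beta_D)\lambda_{DD} + (1-\beta)\lambda_{DU}\right]^2 t_{CE}\,t_{GE} + \beta_D\lambda_{DD}\,\mathrm{MTTR} + \beta\lambda_{DU}\left(\frac{T_1}{2}+\mathrm{MTTR}\right) \qquad (17)
\end{aligned}
$$

by considering the effects of common causes and $\lambda \ll \mu$. Referring to Fig. 2b, the two channels and common cause

4.4. 2oo2 architecture

This system consists of two channels connected in parallel, and the system is in fault whenever either channel fails. See Figs. 1d and 2d for the system block diagrams. From Eq. (12), the $PFD_G$ for this architecture is easily obtained on the basis of the reliability block diagram:

$$PFD_G = 2\lambda_D t_{CE}, \qquad (19)$$

where $t_{CE}$ is given in Eq. (11).

4.5. 2oo3 architecture

The block diagrams of this architecture are shown in Figs. 1e and 2e. Referring to Section 4.2, the Markov states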
$\mathrm{MTTR}/2. \qquad (23)$
Fig. 6. Markov states transition diagram.
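Table 1
Comparisons among equivalent mean down times for the three system architectures

System | Obtained by Markov model | Given in IEC 61508-6
1oo2, 2oo3 | $t_{CE} = (\lambda_{DU}/\lambda_D)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\mathrm{MTTR}$ | $t_{CE} = (\lambda_{DU}/\lambda_D)(T_1/2 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\mathrm{MTTR}$
 | $t_{GE} = \frac{1}{2}\left[(\lambda_{DU}/\lambda_D)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\mathrm{MTTR}\right]$ | $t_{GE} = (\lambda_{DU}/\lambda_D)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\mathrm{MTTR}$
1oo2D | $t'_{CE} = (\lambda_{DU}/\lambda)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda)\mathrm{MTTR}$, $\lambda = \lambda_{DD} + \lambda_{DU} + \lambda_{SD}$ | $t'_{CE} = (\lambda_{DU}/\lambda)(T_1/2 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda)\mathrm{MTTR}$, $\lambda = \lambda_{DD} + \lambda_{DU} + \lambda_{SD}$
 | $t'_{GE} = \dfrac{\lambda_{DU}(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD} + \lambda_{SD})\mathrm{MTTR}}{2(\lambda_{DU} + \lambda_{DD} + \lambda_{SD})}$ | $t'_{GE} = \dfrac{\lambda_{DU}(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD} + \lambda_{SD})\mathrm{MTTR}}{\lambda_{DU} + \lambda_{DD} + \lambda_{SD}}$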
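To gauge the size of the discrepancy in Table 1, the sketch below evaluates both sets of 1oo2/2oo3 expressions for $t_{CE}$ and $t_{GE}$ and inserts them into Eq. (17) for the 1oo2 architecture. All rates, $\beta$ factors, $T_1$ and MTTR are hypothetical illustration values, and the function names are ours.

```python
def t_ce(l_dd, l_du, t1, mttr, markov):
    """Channel equivalent mean down time for 1oo2/2oo3, per Table 1."""
    l_d = l_dd + l_du
    interval_part = t1 / 3 if markov else t1 / 2
    return (l_du / l_d) * (interval_part + mttr) + (l_dd / l_d) * mttr


def t_ge(l_dd, l_du, t1, mttr, markov):
    """Voted group equivalent mean down time for 1oo2/2oo3, per Table 1."""
    l_d = l_dd + l_du
    base = (l_du / l_d) * (t1 / 3 + mttr) + (l_dd / l_d) * mttr
    return base / 2 if markov else base


def pfd_1oo2(l_dd, l_du, beta, beta_d, t1, mttr, markov):
    """Eq. (17): independent double-failure term plus common-cause term P_CC."""
    independent = (1 - beta_d) * l_dd + (1 - beta) * l_du
    tce = t_ce(l_dd, l_du, t1, mttr, markov)
    tge = t_ge(l_dd, l_du, t1, mttr, markov)
    p_cc = beta_d * l_dd * mttr + beta * l_du * (t1 / 2 + mttr)
    return 2 * independent**2 * tce * tge + p_cc


params = dict(l_dd=4.5e-6, l_du=0.5e-6, t1=8760.0, mttr=8.0)  # hypothetical values (per hour, hours)
for markov in (True, False):
    label = "Markov model" if markov else "IEC 61508-6"
    print(label,
          t_ce(**params, markov=markov),
          t_ge(**params, markov=markov),
          pfd_1oo2(**params, beta=0.1, beta_d=0.05, markov=markov))
```

With these numbers the product $t_{CE}t_{GE}$ differs by a factor of about three between the two columns, but the common-cause term dominates Eq. (17), so the two $PFD_G$ values differ by only about 2 percent; the discrepancy matters more when $\beta$ is small.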
equation

$$PFD_G = \frac{1}{T_1}\int_0^{T_1}[1-A(t)]\,dt. \qquad (25)$$

For 1oo1 architecture

Table 3
$PFD_G$ obtained by two different methods

System architecture | $PFD_G$ by Eq. (25) | $PFD_G$ by steady-state values
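Eq. (25) can be illustrated for the simplest case. The sketch assumes a 1oo1 channel with only undetected dangerous failures, no repair before the proof test, and therefore $A(t) = \exp(-\lambda_{DU}t)$ on $[0, T_1]$; it compares the time average from Eq. (25) (by Simpson's rule and in closed form) with $\lambda_{DU}T_1/2$, which is what the steady-state 1oo1 expression $\lambda_{DU}(T_1/2 + \mathrm{MTTR}) + \lambda_{DD}\mathrm{MTTR}$ reduces to when $\lambda_{DD} = 0$ and MTTR is neglected. These simplifications, the function names and the parameter values are ours, not the paper's.

```python
import math


def pfd_eq25(lam_du: float, t1: float, n: int = 2000) -> float:
    """Eq. (25) by composite Simpson's rule, assuming A(t) = exp(-lam_du * t) on [0, T1]."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of sub-intervals")

    def unavailability(t: float) -> float:
        return -math.expm1(-lam_du * t)  # 1 - A(t)

    h = t1 / n
    total = unavailability(0.0) + unavailability(t1)
    total += sum((4 if i % 2 else 2) * unavailability(i * h) for i in range(1, n))
    return total * h / 3 / t1


def pfd_eq25_closed(lam_du: float, t1: float) -> float:
    """Same average in closed form: 1 - (1 - exp(-a)) / a with a = lam_du * T1."""
    a = lam_du * t1
    return 1 + math.expm1(-a) / a


lam_du, T1 = 1e-6, 8760.0  # hypothetical rate (per hour) and proof-test interval (hours)
print(pfd_eq25(lam_du, T1), pfd_eq25_closed(lam_du, T1), lam_du * T1 / 2)
```

The exact time average is $a/2 - a^2/6 + \cdots$ with $a = \lambda_{DU}T_1$, so under these assumptions the steady-state value $\lambda_{DU}T_1/2$ slightly overestimates it.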
understood all other results of average probability of failure on demand of typical architectures.

As discussed in the above sections, there is a discrepancy in the calculation of $t_{CE}$, $t_{GE}$ and $t'_{GE}$ for the 1oo2, 1oo2D and 2oo3 system architectures between the expressions given in the standard IEC 61508-6 and the new ones obtained by the Markov model. These differences are presented in Table 1.

The average probabilities of failure on demand obtained by the Markov model for the 1oo2, 1oo2D and 2oo3 architectures have the same form as those presented in IEC 61508-6. However, the expressions for $t_{CE}$, $t_{GE}$ or $t'_{GE}$ for these three architectures differ from the newly obtained ones, and the standard gives no detailed description of how $t_{CE}$, $t_{GE}$ and $t'_{GE}$ are obtained. The new expressions for $t_{CE}$, $t_{GE}$ and $t'_{GE}$ are therefore suggested for use.

References

[1] IEC 61508. Functional safety of electrical/electronic/programmable electronic safety-related systems, Parts 1–7; October 1998–May 2000.
[2] Karydas DM, Brombacher AC (Guest editors). Special issue: Reliability certification of programmable electronic systems. Reliab Engng Syst Safety 1999;66(2).
[3] Kato E, Sato Y. Safety integrity levels model for IEC 61508: examination of modes of operation. IEICE Trans A 2000;E83-A(5):863–5.
[4] Muta H, Ibe H, Sugiyama E. Safety design of oil reclamation system using IEC 61508. PSAM5: Proceedings of the Fifth International Conference on Probabilistic Safety Assessment and Management, Osaka, Japan; Nov. 27–Dec. 1, 2000. p. 479–84.
[5] Kawahara T, Kushibiki T, et al. Safety-integrity of safety-related systems with human beings. PSAM5: Proceedings of the Fifth International Conference on Probabilistic Safety Assessment and Management, Osaka, Japan; Nov. 27–Dec. 1, 2000. p. 2411–7.
[6] Kato E, Sato Y. Safety integrity levels model for IEC 61508. PSAM5: Proceedings of the Fifth International Conference on Probabilistic Safety Assessment and Management, Osaka, Japan; Nov. 27–Dec. 1, 2000. p. 2787–93.
[7] Misumi Y, Sato Y. Estimation of average hazardous-event-frequency for allocation of safety-integrity levels. Reliab Engng Syst Safety 1999;66:135–44.
[8] ISA-S84.01-1996. Application of safety instrumented systems for process industries. Instrument Society of America, Research Triangle Park; 1996.
[9] Rouvroye JL, Brombacher AC. New quantitative safety standards: different techniques, different results? Reliab Engng Syst Safety 1999;66:121–5.
[10] IEC 61508-6. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 6: Guidelines on the application of IEC 61508-2 and IEC 61508-3; April 2000.
[11] Cao JH, Cheng K. An introduction to mathematics of reliability. Beijing: Publication of Science; 1986. p. 210–30 (in Chinese).