Reliability Engineering and System Safety 80 (2003) 133–141

www.elsevier.com/locate/ress

Availability of systems with self-diagnostic components—applying


Markov model to IEC 61508-6
Tieling Zhang a,*, Wei Long b, Yoshinobu Sato b
a HAL Corporation, 6-21-17-701 Nishikasai, Edogawa-Ku, Tokyo 134-0088, Japan
b Tokyo University of Mercantile Marine, 2-1-6 Etchujima, Koto-Ku, Tokyo 135-8533, Japan
Received 11 December 2000; accepted 19 December 2002

Abstract
Each technique applicable to safety-related analysis is suited to some aspects of a system's safety behavior. Several techniques may fit the same risk-related aspect of system behavior, but they do not always lead to the same results. Rouvroye and Brombacher compared these techniques and indicated that Markov and Enhanced Markov analysis can cover most aspects of a system's safety-related behavior. Following their conclusion, this paper applies the Markov method to the quantitative analysis in Part 6 of the standard IEC 61508. The purpose is to explain in detail the solutions given in the standard, because many results are not clearly described and it is not easy for a safety engineer to find the clue. In addition, the down time $t_{c1}$ shown in the standard is newly defined, because it is the basis for the average probability of failure on demand of the system architectures and its meaning is not clearly explained. Through the derivation, however, a discrepancy is found in the standard. New suggestions are therefore proposed based on the results obtained.
© 2003 Elsevier Science Ltd. All rights reserved.
Keywords: IEC 61508; Self-diagnosis; Probability of failure on demand; Markov model

* Corresponding author.
E-mail addresses: [email protected] (T. Zhang), yoshi@ipc.tosho-u.ac.jp (Y. Sato).
0951-8320/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S0951-8320(03)00004-8

1. Introduction

Recently IEC 61508 [1] was compiled and published as a new international standard. Many studies discussing and applying the standard have been carried out, such as those published in the special issue of Reliability Engineering & System Safety in 1999 [2] and some others [3–6]. The standard is concerned with two frameworks. One is risk reduction with a Safety-Related System (SRS) and the other is the Overall Safety Life-Cycle. To understand the first framework more profoundly, the dependence of the risk reduction on both the Safety Integrity Levels (SILs) of the SRS and the demands from the Equipment Under Control to the SRS has to be clarified. Misumi and Sato [7] studied this point and expressed the mutual relationship mathematically by means of a simple fault tree analysis (FTA). Their research needs to be developed further. The configuration of the SRS, the proof test, the self-diagnostic coverage and the failure rates of components are often utilized in the evaluation of the SILs of SRSs.

The SILs of an SRS need to be evaluated by quantitative analyses as required by the standard IEC 61508 and some others such as ISA-S84.01 [8]. There are many quantitative analysis techniques, such as Markov analysis, reliability block diagrams, hybrid techniques, parts count analysis and FTA. Each of them may cover several aspects of the system behavior concerning safety. At the same time, one aspect of the system's risk-related behavior may be suitably analyzed by several of them, but they do not always lead to the same results. Rouvroye and Brombacher [9] outlined these techniques and compared them with each other. The calculation results obtained by these techniques for the same example showed large differences. Therefore, they pointed out that applying different (quantitative) techniques to practical systems would not always lead to the same and definite results. However, they also clearly wrote that Markov analysis covers most aspects of the quantitative safety evaluation of systems. The aspects that this approach cannot cover are limited to uncertainty and sensitivity analyses. These aspects, however, can be carried out with
134 T. Zhang et al. / Reliability Engineering and System Safety 80 (2003) 133–141

Nomenclature

$A$  steady state availability
FF  failure frequency
$F(t)$  probability of failure of a system
MDT  mean down time
MTTR  mean time to restoration
$P_{CC}$  probability of failure caused by common cause
pdf  probability density function
$P_i(t)$, $P_i$  probability and steady state probability of system in the $i$th state
PFD  average probability of failure on demand for one channel
$\mathrm{PFD}_G$  average probability of failure on demand for system architectures
$T_1$  proof-test interval
$t_{c1}$  equivalent mean down time for the undetected failure of a channel
$t_{c2}$  equivalent mean down time for the detected failure of a channel
$t_{CE}$  channel equivalent mean down time for 1oo1, 1oo2, 2oo2, 2oo3 architectures
$t'_{CE}$  channel equivalent mean down time for 1oo2D architecture
$t_{GE}$  voted group equivalent mean down time for 1oo2 and 2oo3 architectures
$t'_{GE}$  voted group equivalent mean down time for 1oo2D architecture
$\lambda$, $\mu$  failure and equivalent repair rates of a channel
$\lambda_D$  dangerous failure rate ($\lambda_D = \lambda_{DD} + \lambda_{DU}$) of a channel in a subsystem
$\lambda_{DD}$  detected dangerous failure rate of a channel in a subsystem
$\lambda_{DU}$  undetected dangerous failure rate
$\lambda_{SD}$  detected safe failure rate of a channel
$\mu_{DD}$  repair rate of detected dangerous fault in a channel
$\mu_{DU}$  repair rate of undetected dangerous fault in a channel
$\beta$  the fraction of undetected failures that have a common cause (expressed as a fraction in the equations and as a percentage elsewhere)
$\beta_D$  of those failures that are detected by the diagnostic tests, the fraction that have a common cause (expressed as a fraction in the equations and as a percentage elsewhere)

For details, refer to Table B.1 in Annex B of IEC 61508-6 [10].

the Enhanced Markov analysis. So the combination of the above two types of Markovian methods is preferable and hence recommended.

The present paper takes up the SRSs. Their system configurations are composed of channels that include both detectable failures with self-diagnosis and undetectable failures. The Markovian approach is applied to the quantitative SIL analyses. In most cases, it can give satisfactory results. The intention of this paper is to present the clue to the solutions for many probabilistic parameters specified in IEC 61508-6 [10], based on the specific system structures and associated conditions. It is not easy for an ordinary safety engineer to understand the results for these probabilistic parameters because no detailed derivation is given. In particular, the down time $t_{c1}$ is newly defined in this paper, since it is the basis for the typical system architectures and its meaning is not clearly explained in the standard. After detailed derivation, however, difficulties and a discrepancy are found for some specified examples in Part 6 of IEC 61508. Suggestions for solving them are hence proposed.

2. Assumptions

In order to describe the state transitions of systems as clearly as possible, we make the following assumptions:

i. Component failure and repair rates are constant over the life of the system.
ii. The resulting average probability of failure on demand for the subsystem is less than $10^{-1}$, or the resultant probability of failure per hour for the subsystem is less than $10^{-5}$.
iii. The input subsystem comprises the actual sensor(s) and any other components and wiring, up to but not including the component(s) where the signals are first combined by voting or other processing. For example, the configuration for two sensor channels is shown in Annex B of Part 6 of IEC 61508 [10].
iv. The hardware failure rates used as inputs to the calculations and tables are for a single channel of the subsystem. For example, if 2-out-of-3 sensors are used, the failure rate is for a single sensor and the resulting failure rate for the 2-out-of-3 group is calculated separately.
v. All channels in a voted group have the same failure rate and diagnostic coverage rate.
vi. The overall hardware failure rate of a channel in a subsystem is the sum of the dangerous and safe failure rates for that channel. These values are assumed to be equal.
vii. For each safety function, there is perfect proof testing and repair. Namely, all failures that remain undetected are assumed to be detected by the proof test.
viii. The proof test interval is at least one order of magnitude greater than the diagnostic test interval.
ix. The demand rate and expected interval between demands are not considered in this paper. Therefore, we can analyze the SRS failures separately from the demand.
x. For each subsystem, there is a single $T_1$ and MTTR. MTTR is defined to include the time taken to detect a failure. It is at least one order of magnitude less than $T_1$. In this paper, the single assumed value of MTTR for both detected and undetected failures includes the diagnostic test interval but not $T_1$. For undetected failures, the MTTR used in the calculations should not include the diagnostic test interval, since the mean time to restoration is always added to the proof test interval, which is at least one order of magnitude greater than the diagnostic test interval. The error introduced here is not significant.
xi. Multiple repair teams (each assumed to have the same repair rate) are available to work on all known faults in a system.
xii. In a channel, the detected and undetected faults can exist simultaneously, i.e. if one has occurred, the other can occur before the former is repaired.
xiii. Repairs of the detected and undetected faults are viewed as independent for the sake of conservative consideration, though some dependence may be involved in the process.

For other assumptions, refer to Annex B of IEC 61508-6 [10].

3. Meaning of down time, $t_{c1}$

In the standard IEC 61508, system configurations are composed of channels. Each channel includes both failures detectable by self-diagnosis, with rate $\lambda_{DD}$, and undetectable failures, with rate $\lambda_{DU}$. See Figs. 1 and 2 for the physical and reliability block diagrams of five typical system architectures. The failure rates $\lambda_{DD}$ and $\lambda_{DU}$ are assumed to be constant. Hence, the times of occurrence of these two kinds of failures follow exponential distributions. For the repair of a detected dangerous fault, the MTTR is used. An undetectable dangerous fault, however, cannot be detected until the next proof test. It follows the process shown in Fig. 3. In that figure, $t$ is the time of occurrence of the failure and $t_d$ is the duration of the down time.

Fig. 1. Physical block diagrams.

In the standard, $t_{c1}$ is not clearly defined (refer to Fig. 2). However, it is the basis for all results of the average probability of failure on demand of the typical system architectures. If its meaning is not clear, it is not easy for a safety engineer to understand the other solutions. Here, $t_{c1}$ is named the equivalent mean down time for the undetectable fault in a channel. It is defined as follows.

Suppose $t_a$ is the time when the average probability of failure for the undetectable fault in the interval $[0, T_1]$ occurs in a system. Then

$$t_{c1} = T_1 - t_a + \mathrm{MTTR}. \tag{1}$$

For the one-channel architecture, $t_a$ is the mean time of occurrence of the undetected failure, given that it occurs within $[0, T_1]$:

$$t_a = \frac{\int_0^{T_1} t\,\lambda_{DU}\exp(-\lambda_{DU}t)\,dt}{\int_0^{T_1} \lambda_{DU}\exp(-\lambda_{DU}t)\,dt} = \frac{\int_0^{T_1} t\,\lambda_{DU}\exp(-\lambda_{DU}t)\,dt}{1 - \exp(-\lambda_{DU}T_1)} \approx T_1/2 \tag{2}$$

since $\lambda_{DU}T_1 \ll 1$ and $\exp(-\lambda_{DU}T_1) \approx 1 - \lambda_{DU}T_1$. Therefore

$$t_{c1} = T_1/2 + \mathrm{MTTR}. \tag{3}$$

If $\lambda_D T_1 < 0.1$, $T_1/2$ is quite a good approximation to the real value of $t_a$.
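As a numerical illustration of Eq. (2), the following Python sketch evaluates $t_a$ for a single channel from the closed-form integrals and compares it with $T_1/2$. The function name and the parameter values ($\lambda_{DU} = 5 \times 10^{-6}$, $T_1 = 5000$, MTTR = 10) are assumptions chosen only for illustration; this is a sanity check under those assumptions, not a procedure taken from the standard.

```python
import math


def mean_failure_time(lam_du: float, t1: float) -> float:
    """t_a of Eq. (2): mean occurrence time of an undetected failure,
    given that it occurs within the proof-test interval [0, t1]."""
    x = lam_du * t1
    # Closed form of the numerator: integral of t*lam*exp(-lam*t) over [0, t1].
    numerator = (1.0 - (1.0 + x) * math.exp(-x)) / lam_du
    # Denominator 1 - exp(-lam*t1), computed with expm1 for accuracy.
    denominator = -math.expm1(-x)
    return numerator / denominator


# Illustrative values (assumed for this sketch).
lam_du, t1, mttr = 5e-6, 5000.0, 10.0

t_a = mean_failure_time(lam_du, t1)
t_c1_exact = t1 - t_a + mttr   # Eq. (1)
t_c1_approx = t1 / 2 + mttr    # Eq. (3)

print(f"t_a = {t_a:.2f}, T1/2 = {t1 / 2:.2f}")
print(f"t_c1 by Eq. (1) = {t_c1_exact:.2f}, by Eq. (3) = {t_c1_approx:.2f}")
```

For products $\lambda_{DU}T_1$ far smaller than those used here, the subtraction in the numerator loses precision, so a series expansion would be a safer choice in that regime.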
Fig. 2. Reliability block diagrams.

Fig. 3. Process for the undetected dangerous fault.

For the 1oo2 architecture (see Fig. 2b), the probability of failure for the undetectable fault is

$$[1 - \exp(-\lambda_{DU}t)]^2.$$

Hence

$$t_a = \frac{\int_0^{T_1} 2\lambda_{DU}\,t\,[1 - \exp(-\lambda_{DU}t)]\exp(-\lambda_{DU}t)\,dt}{\int_0^{T_1} 2\lambda_{DU}[1 - \exp(-\lambda_{DU}t)]\exp(-\lambda_{DU}t)\,dt} = \frac{\int_0^{T_1} 2\lambda_{DU}\,t\,[1 - \exp(-\lambda_{DU}t)]\exp(-\lambda_{DU}t)\,dt}{[1 - \exp(-\lambda_{DU}T_1)]^2} \approx \frac{2}{3}T_1 - \frac{3}{4}\lambda_{DU}T_1^2 + \frac{7}{12}\lambda_{DU}^2T_1^3 - \cdots \tag{4}$$

since $\lambda_{DU}T_1 \ll 1$ and $\exp(-\lambda_{DU}T_1) \approx 1 - \lambda_{DU}T_1$. As $\lambda_{DU}T_1 < 0.1$, $t_a$ is a little less than $2T_1/3$ but approaches that value, so $t_{c1}$ can be evaluated as

$$t_{c1} = T_1 - t_a + \mathrm{MTTR} = T_1/3 + \mathrm{MTTR}. \tag{5}$$

Similarly to the 1oo2 case, $t_{c1}$ for the 1oo2D and 2oo3 architectures can be obtained as given in Eq. (5). If $\lambda_{DU}T_1 < 0.1$, numerical examples confirm that $2T_1/3$ is quite a good approximation to the real value of $t_a$ for these three architectures.

4. Availability of system architectures by Markov model

4.1. 1oo1 architecture

This system consists of a single channel, where any dangerous failure leads to a failure of the safety function when a demand arises. Its physical and reliability block diagrams are shown in Figs. 1a and 2a. The dangerous failures are divided into two parts, regarded as components c1 and c2. Thus the system is composed of two components connected in series, so that there are four system states as follows:

System state | Component c1 | Component c2
0 | 0 | 0
1 | 0 | 1
2 | 1 | 0
3 | 1 | 1

Component state 0: operation state; component state 1: failure state.

The resulting system states transition diagram is shown in Fig. 4. This figure gives a set of differential equations:

$$\begin{cases}
P_0'(t) = -(\lambda_{DD} + \lambda_{DU})P_0(t) + \mu_{DD}P_1(t) + \mu_{DU}P_2(t), \\
P_1'(t) = \lambda_{DD}P_0(t) - (\lambda_{DU} + \mu_{DD})P_1(t) + \mu_{DU}P_3(t), \\
P_2'(t) = \lambda_{DU}P_0(t) - (\lambda_{DD} + \mu_{DU})P_2(t) + \mu_{DD}P_3(t), \\
P_3'(t) = \lambda_{DU}P_1(t) + \lambda_{DD}P_2(t) - (\mu_{DD} + \mu_{DU})P_3(t).
\end{cases}$$
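The four state equations of the 1oo1 model can also be integrated numerically as a cross-check. The sketch below uses a classical Runge-Kutta scheme in plain Python and compares $P_0(t)$ with the product of two independent two-state availabilities, a form that is assumed here because components c1 and c2 fail and are repaired independently in the transition structure. The function names, step count and rate values are illustrative assumptions, not values prescribed by the standard.

```python
import math


def derivatives(p, l_dd, l_du, m_dd, m_du):
    """Right-hand side of the four state equations of the 1oo1 model."""
    p0, p1, p2, p3 = p
    return [
        -(l_dd + l_du) * p0 + m_dd * p1 + m_du * p2,
        l_dd * p0 - (l_du + m_dd) * p1 + m_du * p3,
        l_du * p0 - (l_dd + m_du) * p2 + m_dd * p3,
        l_du * p1 + l_dd * p2 - (m_dd + m_du) * p3,
    ]


def integrate_rk4(p, t_end, steps, rates):
    """Classical fourth-order Runge-Kutta integration from 0 to t_end."""
    h = t_end / steps
    for _ in range(steps):
        k1 = derivatives(p, *rates)
        k2 = derivatives([x + 0.5 * h * k for x, k in zip(p, k1)], *rates)
        k3 = derivatives([x + 0.5 * h * k for x, k in zip(p, k2)], *rates)
        k4 = derivatives([x + h * k for x, k in zip(p, k3)], *rates)
        p = [x + h * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def two_state_availability(lam, mu, t):
    """Availability of one repairable component that starts in the up state."""
    return (mu + lam * math.exp(-(lam + mu) * t)) / (lam + mu)


# Illustrative rates (assumed): lambda_DD, lambda_DU, mu_DD, mu_DU.
rates = (1e-5, 5e-6, 1 / 10.0, 1 / 2510.0)
t_end = 5000.0

# The system starts with both components working: P(0) = (1, 0, 0, 0).
p_end = integrate_rk4([1.0, 0.0, 0.0, 0.0], t_end, 5000, rates)
p0_product = (two_state_availability(rates[0], rates[2], t_end)
              * two_state_availability(rates[1], rates[3], t_end))

print(f"P0(t_end) from RK4     = {p_end[0]:.8f}")
print(f"P0(t_end) product form = {p0_product:.8f}")
```

The step size of one time unit is small relative to the fastest rate in this example; stiffer parameter sets would need a smaller step or an implicit solver.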
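Fig. 4. Markov states transition diagram for 2-component system.

The differential equations of the 1oo1 model are simply rewritten as

$$P' = MP \tag{6}$$

where

$$P' = \left(P_0'(t), P_1'(t), P_2'(t), P_3'(t)\right)^T, \qquad P = \left(P_0(t), P_1(t), P_2(t), P_3(t)\right)^T$$

and $M$ is given as

$$M = \begin{pmatrix}
-(\lambda_{DD} + \lambda_{DU}) & \mu_{DD} & \mu_{DU} & 0 \\
\lambda_{DD} & -(\lambda_{DU} + \mu_{DD}) & 0 & \mu_{DU} \\
\lambda_{DU} & 0 & -(\lambda_{DD} + \mu_{DU}) & \mu_{DD} \\
0 & \lambda_{DU} & \lambda_{DD} & -(\mu_{DD} + \mu_{DU})
\end{pmatrix}.$$

The system availability $A(t)$ is

$$A(t) = P_0(t) = \frac{1}{(\lambda_{DD} + \mu_{DD})(\lambda_{DU} + \mu_{DU})}\Big\{\mu_{DD}\mu_{DU} + \lambda_{DD}\mu_{DU}\exp[-(\lambda_{DD} + \mu_{DD})t] + \lambda_{DU}\mu_{DD}\exp[-(\lambda_{DU} + \mu_{DU})t] + \lambda_{DD}\lambda_{DU}\exp[-(\lambda_{DD} + \lambda_{DU} + \mu_{DD} + \mu_{DU})t]\Big\}. \tag{7}$$

In general, the system probabilistic parameters at steady state are of interest. At steady state, the system availability is

$$A = \frac{\mu_{DD}\mu_{DU}}{(\lambda_{DD} + \mu_{DD})(\lambda_{DU} + \mu_{DU})} \tag{8}$$

and the system failure frequency, FF, is given as [11]

$$FF = \sum_{k \in W}\sum_{j \in F} P_k a_{jk}, \tag{9}$$

where
$F$ is the set of failure states of the system,
$W$ is the set of operating states of the system,
$P_k$ is the probability of the system being in working state $k$ and
$a_{jk}$ is the element of $M$ given in Eq. (6).

From Eqs. (6) and (9),

$$FF = P_0(\lambda_{DD} + \lambda_{DU}) = (\lambda_{DD} + \lambda_{DU})\frac{\mu_{DD}\mu_{DU}}{(\lambda_{DD} + \mu_{DD})(\lambda_{DU} + \mu_{DU})}. \tag{10}$$

Further,

$$\mathrm{MDT} = \frac{1 - A}{FF} = \frac{\lambda_{DD}\lambda_{DU} + \lambda_{DD}\mu_{DU} + \lambda_{DU}\mu_{DD}}{(\lambda_{DD} + \lambda_{DU})\mu_{DD}\mu_{DU}} \approx \frac{\lambda_{DU}}{\lambda_D}(T_1/2 + \mathrm{MTTR}) + \frac{\lambda_{DD}}{\lambda_D}\,\mathrm{MTTR}$$

as $\lambda_D = \lambda_{DD} + \lambda_{DU}$,

$$\frac{1}{\mu_{DU}} = T_1/2 + \mathrm{MTTR}, \qquad \frac{1}{\mu_{DD}} = \mathrm{MTTR},$$

and $\lambda_{DD}$ and $\lambda_{DU}$ are less than $10^{-5}$. Therefore

$$t_{CE} = \mathrm{MDT} = \frac{\lambda_{DU}}{\lambda_D}(T_1/2 + \mathrm{MTTR}) + \frac{\lambda_{DD}}{\lambda_D}\,\mathrm{MTTR}. \tag{11}$$

For a channel with down time $t_{CE}$, the resulting average probability of failure is

$$\mathrm{PFD} \approx \frac{\lambda_{DD}\lambda_{DU} + \lambda_{DD}\mu_{DU} + \lambda_{DU}\mu_{DD}}{(\lambda_{DD} + \mu_{DD})(\lambda_{DU} + \mu_{DU})} < \lambda_D\left(\frac{\lambda_{DU}}{\lambda_D\mu_{DU}} + \frac{\lambda_{DD}}{\lambda_D\mu_{DD}}\right) = \lambda_D\left[\frac{\lambda_{DU}}{\lambda_D}(T_1/2 + \mathrm{MTTR}) + \frac{\lambda_{DD}}{\lambda_D}\,\mathrm{MTTR}\right] = \lambda_D t_{CE}.$$

Hence, for a 1oo1 architecture,

$$\mathrm{PFD}_G = (\lambda_{DU} + \lambda_{DD})t_{CE} = \lambda_D t_{CE}. \tag{12}$$

4.2. 1oo2 architecture

The architecture includes two channels connected in parallel. See Figs. 1b and 2b for the system block diagrams. Assuming the two channels are identical, the system states can be simply defined as follows:

System state | System state definition
0 | Two channels are operative (up state)
1 | Only one channel is in operation (up state)
2 | Both channels are in fault (down state)

The states transition diagram is then easily obtained as shown in Fig. 5, where two repair teams are available to work on all known failures in the system (see assumption xi).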
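The state definition above counts the number of failed channels. Assuming each working channel fails at rate $\lambda$ and each faulty channel is repaired at rate $\mu$ by its own repair team (assumption xi), the transition-rate matrix can be sketched as a birth-death structure. The Python sketch below builds that matrix for $n$ identical channels and checks that a binomial distribution with $p = \lambda/(\lambda + \mu)$ balances it. The function names and the numerical rates are hypothetical, and the binomial form relies on the channels being independent, which is an assumption of this sketch.

```python
from math import comb


def generator(n, lam, mu):
    """Rate matrix M with M[i][j] = rate from state j into state i,
    where the state index is the number of failed channels."""
    size = n + 1
    m = [[0.0] * size for _ in range(size)]
    for k in range(size):
        fail = (n - k) * lam   # any of the n - k working channels may fail
        repair = k * mu        # one repair team per faulty channel
        if k < n:
            m[k + 1][k] += fail
        if k > 0:
            m[k - 1][k] += repair
        m[k][k] -= fail + repair
    return m


def steady_state(n, lam, mu):
    """Binomial steady-state probabilities for independent channels."""
    p = lam / (lam + mu)
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


# Illustrative rates (assumed for this sketch).
lam, mu = 1.5e-5, 1.8e-3

M = generator(2, lam, mu)
P = steady_state(2, lam, mu)

# At steady state every row of M applied to P should vanish.
residual = max(abs(sum(M[i][j] * P[j] for j in range(3))) for i in range(3))
print("steady-state probabilities:", P)
print("largest balance residual:", residual)
```

Calling `generator(3, lam, mu)` gives the corresponding four-state matrix for three identical channels under the same assumptions.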
Fig. 5. Markov states transition diagram.

This figure stands for the following differential equations:

$$\begin{cases}
P_0'(t) = -2\lambda P_0(t) + \mu P_1(t), \\
P_1'(t) = 2\lambda P_0(t) - (\lambda + \mu)P_1(t) + 2\mu P_2(t), \\
P_2'(t) = \lambda P_1(t) - 2\mu P_2(t).
\end{cases} \tag{13}$$

It is rewritten as

$$\begin{pmatrix} P_0'(t) \\ P_1'(t) \\ P_2'(t) \end{pmatrix} = \begin{pmatrix} -2\lambda & \mu & 0 \\ 2\lambda & -(\lambda + \mu) & 2\mu \\ 0 & \lambda & -2\mu \end{pmatrix}\begin{pmatrix} P_0(t) \\ P_1(t) \\ P_2(t) \end{pmatrix}. \tag{14}$$

As for the 1oo1 architecture, the probabilistic parameters at steady state are of concern here. From Eq. (14),

$$A = P_0(\infty) + P_1(\infty) = \frac{2\lambda\mu + \mu^2}{(\lambda + \mu)^2}.$$

Hence, $FF = \lambda P_1$ according to Eq. (9). Then

$$\mathrm{MDT} = \frac{1 - A}{FF} = \frac{\lambda^2}{(\lambda + \mu)^2}\,\frac{1}{\lambda P_1} = \frac{1}{2\mu}.$$

Referring to Eqs. (5) and (11), the equivalent mean down time of a channel in this system architecture, $t_{CE}$, is

$$t_{CE} = \frac{\lambda_{DU}}{\lambda_D}(T_1/3 + \mathrm{MTTR}) + \frac{\lambda_{DD}}{\lambda_D}\,\mathrm{MTTR}. \tag{15}$$

As $\mu$ is the equivalent repair rate of a channel,

$$\mu = 1/t_{CE} = \left[\frac{\lambda_{DU}}{\lambda_D}(T_1/3 + \mathrm{MTTR}) + \frac{\lambda_{DD}}{\lambda_D}\,\mathrm{MTTR}\right]^{-1}.$$

The equivalent mean down time for this system architecture is obtained as

$$t_{GE} = \mathrm{MDT} = \frac{1}{2\mu} = \frac{1}{2}\left[(\lambda_{DU}/\lambda_D)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\,\mathrm{MTTR}\right]. \tag{16}$$

Hence, the $\mathrm{PFD}_G$ for this architecture is

$$\mathrm{PFD}_G = P_2 + P_{CC} = \frac{\lambda^2}{(\lambda + \mu)^2} + P_{CC} \approx \lambda^2/\mu^2 + P_{CC} = \lambda^2 t_{CE}^2 + P_{CC} = 2[(1 - \beta_D)\lambda_{DD} + (1 - \beta)\lambda_{DU}]^2 t_{CE}t_{GE} + \beta_D\lambda_{DD}\,\mathrm{MTTR} + \beta\lambda_{DU}(T_1/2 + \mathrm{MTTR}) \tag{17}$$

by considering the effects of common causes and $\lambda \ll \mu$. Referring to Fig. 2b, the two channels and the common cause component compose a series system. According to the definitions of $\beta$ and $\beta_D$, the probability of failure for a detectable fault due to common cause is $\beta_D\lambda_{DD}\,\mathrm{MTTR}$ and the probability of failure for an undetectable fault due to common cause is $\beta\lambda_{DU}(T_1/2 + \mathrm{MTTR})$. Here, $(T_1/2 + \mathrm{MTTR})$ is the equivalent mean down time of the undetectable fault in a channel. Hence, we have Eq. (17).

4.3. 1oo2D architecture

The two channels in this architecture are connected in parallel. During normal operation, both channels need to demand the safety function before it can take place. In addition, if the diagnostic tests detect a fault in either channel, the output voting is adapted so that the overall output state follows that given by the other channel. If the diagnostic tests find faults in both channels, or a discrepancy that cannot be allocated to either channel, the output goes to the safe state. In order to detect a discrepancy between the channels, either channel can determine the state of the other via a means independent of the other channel. See Figs. 1c and 2c for the system block diagrams.

Since each component follows the exponential distribution, comparing Fig. 2b and c, the equivalent mean down times for each channel and for the architecture are:

$$t'_{CE} = \frac{\lambda_{DU}(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD} + \lambda_{SD})\,\mathrm{MTTR}}{\lambda_{DU} + \lambda_{DD} + \lambda_{SD}},$$

$$t'_{GE} = \frac{\lambda_{DU}(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD} + \lambda_{SD})\,\mathrm{MTTR}}{2(\lambda_{DU} + \lambda_{DD} + \lambda_{SD})}.$$

The $\mathrm{PFD}_G$ for this architecture is then obtained by referring to Eq. (17) as follows:

$$\mathrm{PFD}_G = 2(1 - \beta)\lambda_{DU}[\lambda_{SD} + (1 - \beta_D)\lambda_{DD} + (1 - \beta)\lambda_{DU}]\,t'_{CE}t'_{GE} + \beta_D\lambda_{DD}\,\mathrm{MTTR} + \beta\lambda_{DU}(T_1/2 + \mathrm{MTTR}). \tag{18}$$

4.4. 2oo2 architecture

This system consists of two channels connected in parallel. The system is in fault whenever either channel fails. See Figs. 1d and 2d for the system block diagrams. From Eq. (12), the $\mathrm{PFD}_G$ for this architecture is easily obtained on the basis of the reliability block diagram:

$$\mathrm{PFD}_G = 2\lambda_D t_{CE}, \tag{19}$$

where $t_{CE}$ is given in Eq. (11).

4.5. 2oo3 architecture

The block diagrams of this architecture are shown in Figs. 1e and 2e. As in Section 4.2, the Markov states
transition diagram for this architecture is given in Fig. 6, where assumption xi applies.

Fig. 6. Markov states transition diagram.

Furthermore, Fig. 6 gives the equations

$$\begin{cases}
P_0'(t) = -3\lambda P_0(t) + \mu P_1(t), \\
P_1'(t) = 3\lambda P_0(t) - (2\lambda + \mu)P_1(t) + 2\mu P_2(t), \\
P_2'(t) = 2\lambda P_1(t) - (\lambda + 2\mu)P_2(t) + 3\mu P_3(t), \\
P_3'(t) = \lambda P_2(t) - 3\mu P_3(t).
\end{cases} \tag{20}$$

We investigate the steady state probabilistic parameters. At steady state, the system unavailability $(1 - A)$ and FF are obtained as

$$(1 - A) = \frac{\lambda^2(\lambda + 3\mu)}{(\lambda + \mu)^3} \tag{21}$$

and

$$FF = 2\lambda P_1 = 2\lambda\,\frac{3\lambda\mu^2}{(\lambda + \mu)^3}$$

based on Eq. (9). Hence, the MDT of the system is given by

$$\mathrm{MDT} = \frac{1 - A}{FF} = \frac{\lambda + 3\mu}{6\mu^2} \tag{22}$$

where $\mu = 1/t_{CE}$ and $t_{CE}$ is given in Eq. (15). Since $\mu \gg \lambda$, $\mathrm{MDT} \approx 1/(2\mu)$. Therefore

$$t_{GE} = \mathrm{MDT} = \left[(\lambda_{DU}/\lambda_D)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\,\mathrm{MTTR}\right]/2. \tag{23}$$

For this architecture, the following is easily obtained by referring to Eqs. (17) and (21):

$$\mathrm{PFD}_G = 6[(1 - \beta_D)\lambda_{DD} + (1 - \beta)\lambda_{DU}]^2 t_{CE}t_{GE} + \beta_D\lambda_{DD}\,\mathrm{MTTR} + \beta\lambda_{DU}(T_1/2 + \mathrm{MTTR}) \tag{24}$$

since $\mu \gg \lambda$.

5. Discussions

5.1. Discrepancies

In Section 4, the equivalent mean down times and average probabilities of failure on demand for five system architectures are obtained by the Markov model. They are the steady state values of the corresponding systems. However, the equivalent mean down times $t_{CE}$, $t_{GE}$ and $t'_{GE}$ for the 1oo2, 1oo2D and 2oo3 architectures shown in Section 4 differ from those described in Annex B of Part 6 of IEC 61508 [10]. In the standard, $t_{CE}$ as expressed in Eq. (11) is used for all of the 1oo1, 1oo2, 1oo2D, 2oo2 and 2oo3 typical architectures. For comparison, the expressions of $t_{CE}$, $t_{GE}$ and $t'_{GE}$ used for these system architectures are listed in Table 1. The standard gives no description of how the expressions of $t_{CE}$, $t_{GE}$, $t'_{GE}$ and the average probability of failure on demand are obtained.

The average probabilities of failure on demand for these three architectures in IEC 61508 have the same forms as those given in Section 4, but $t_{CE}$, $t_{GE}$ and $t'_{GE}$ are different.

In fact, the systems discussed in this paper cannot reach steady state within the proof test interval, because the mean down time of the undetectable fault in a channel is $T_1/2 + \mathrm{MTTR}$ and thus the corresponding equivalent repair rate is much smaller. The average probabilities of failure on demand for the five typical architectures should be calculated by Eq. (25).

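Table 1
Comparisons among equivalent mean down times for the three system architectures

System | $t_{CE}$, $t_{GE}$ and $t'_{GE}$ obtained by Markov model | $t_{CE}$, $t_{GE}$ and $t'_{GE}$ given in IEC 61508-6
1oo2, 2oo3 | $t_{CE} = (\lambda_{DU}/\lambda_D)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\,\mathrm{MTTR}$ | $t_{CE} = (\lambda_{DU}/\lambda_D)(T_1/2 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\,\mathrm{MTTR}$
1oo2, 2oo3 | $t_{GE} = \frac{1}{2}[(\lambda_{DU}/\lambda_D)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\,\mathrm{MTTR}]$ | $t_{GE} = (\lambda_{DU}/\lambda_D)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda_D)\,\mathrm{MTTR}$
1oo2D | $t'_{CE} = (\lambda_{DU}/\lambda)(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda)\,\mathrm{MTTR}$, $\lambda = \lambda_{DD} + \lambda_{DU} + \lambda_{SD}$ | $t'_{CE} = (\lambda_{DU}/\lambda)(T_1/2 + \mathrm{MTTR}) + (\lambda_{DD}/\lambda)\,\mathrm{MTTR}$, $\lambda = \lambda_{DD} + \lambda_{DU} + \lambda_{SD}$
1oo2D | $t'_{GE} = \dfrac{\lambda_{DU}(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD} + \lambda_{SD})\,\mathrm{MTTR}}{2(\lambda_{DU} + \lambda_{DD} + \lambda_{SD})}$ | $t'_{GE} = \dfrac{\lambda_{DU}(T_1/3 + \mathrm{MTTR}) + (\lambda_{DD} + \lambda_{SD})\,\mathrm{MTTR}}{\lambda_{DU} + \lambda_{DD} + \lambda_{SD}}$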
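To make the difference between the two columns of Table 1 concrete, the following Python sketch evaluates the 1oo2/2oo3 expressions side by side. The helper name `t_ce` and its `divisor` argument are hypothetical conveniences of this sketch, and the parameter values ($\lambda_{DD} = 10^{-5}$, $\lambda_{DU} = 5 \times 10^{-6}$, MTTR = 10, $T_1 = 5 \times 10^3$) are illustrative. Common cause failures are left out, i.e. $\beta = \beta_D = 0$ is assumed.

```python
def t_ce(lam_dd, lam_du, t1, mttr, divisor):
    """Weighted mean down time of a channel.

    divisor = 3 gives the T1/3 form, divisor = 2 the T1/2 form of Table 1.
    """
    lam_d = lam_dd + lam_du
    return (lam_du / lam_d) * (t1 / divisor + mttr) + (lam_dd / lam_d) * mttr


# Illustrative parameters (assumed for this sketch).
lam_dd, lam_du, mttr, t1 = 1e-5, 5e-6, 10.0, 5000.0
lam_d = lam_dd + lam_du

# Markov model column: t_CE with T1/3, t_GE is half of t_CE.
t_ce_markov = t_ce(lam_dd, lam_du, t1, mttr, 3)
t_ge_markov = 0.5 * t_ce_markov

# IEC 61508-6 column: t_CE with T1/2, t_GE with T1/3 and no factor 1/2.
t_ce_iec = t_ce(lam_dd, lam_du, t1, mttr, 2)
t_ge_iec = t_ce(lam_dd, lam_du, t1, mttr, 3)

# Leading 1oo2 term 2 * lam_D^2 * t_CE * t_GE with beta = beta_D = 0.
pfd_markov = 2 * lam_d**2 * t_ce_markov * t_ge_markov
pfd_iec = 2 * lam_d**2 * t_ce_iec * t_ge_iec

print(f"Markov model: t_CE = {t_ce_markov:.2f}, t_GE = {t_ge_markov:.2f}, term = {pfd_markov:.3e}")
print(f"IEC 61508-6 : t_CE = {t_ce_iec:.2f}, t_GE = {t_ge_iec:.2f}, term = {pfd_iec:.3e}")
```

With these inputs the IEC column yields the larger down times, so its leading term is the larger of the two.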
The average probability of failure on demand over the proof test interval is

$$\mathrm{PFD}_G = \frac{1}{T_1}\int_0^{T_1}[1 - A(t)]\,dt. \tag{25}$$

For the 1oo1 architecture

$$\mathrm{PFD}_G = \frac{1}{T_1}\int_0^{T_1}[1 - P_0(t)]\,dt = \frac{\lambda_{DD}\lambda_{DU} + \lambda_{DD}\mu_{DU} + \lambda_{DU}\mu_{DD}}{(\lambda_{DD} + \mu_{DD})(\lambda_{DU} + \mu_{DU})} - \frac{1}{(\lambda_{DD} + \mu_{DD})(\lambda_{DU} + \mu_{DU})T_1}\Big\{\frac{\lambda_{DD}\mu_{DU}}{\lambda_{DD} + \mu_{DD}}\left(1 - \exp[-(\lambda_{DD} + \mu_{DD})T_1]\right) + \frac{\lambda_{DU}\mu_{DD}}{\lambda_{DU} + \mu_{DU}}\left(1 - \exp[-(\lambda_{DU} + \mu_{DU})T_1]\right) + \frac{\lambda_{DD}\lambda_{DU}}{\lambda_{DD} + \lambda_{DU} + \mu_{DD} + \mu_{DU}}\left(1 - \exp[-(\lambda_{DD} + \lambda_{DU} + \mu_{DD} + \mu_{DU})T_1]\right)\Big\}. \tag{26}$$

For the 1oo2 architecture

$$\mathrm{PFD}_G = \frac{1}{T_1}\int_0^{T_1}P_2(t)\,dt = \frac{\lambda^2}{(\lambda + \mu)^2} - \frac{2\lambda^2}{(\lambda + \mu)^3T_1}\left(1 - \exp[-(\lambda + \mu)T_1]\right) + \frac{\lambda^2}{(\lambda + \mu)^3T_1}\left(1 - \exp[-2(\lambda + \mu)T_1]\right), \tag{27}$$

where $P_2(t)$ is obtained from Eq. (13).

For the 2oo3 architecture

$$\mathrm{PFD}_G = \frac{1}{T_1}\int_0^{T_1}[P_2(t) + P_3(t)]\,dt = \frac{\lambda^2(\lambda + 3\mu)}{(\lambda + \mu)^3} - \frac{6\lambda^2\mu}{(\lambda + \mu)^4T_1}\left(1 - \exp[-(\lambda + \mu)T_1]\right) - \frac{3\lambda^2(\lambda - \mu)}{2(\lambda + \mu)^4T_1}\left(1 - \exp[-2(\lambda + \mu)T_1]\right) + \frac{2\lambda^3}{3(\lambda + \mu)^4T_1}\left(1 - \exp[-3(\lambda + \mu)T_1]\right), \tag{28}$$

where $P_2(t)$ and $P_3(t)$ are obtained from Eq. (20).

Through the above derivations, it is found that the values of $\mathrm{PFD}_G$ calculated by Eq. (25) and by the system steady state values are different. In the following, the differences are investigated by numerical examples.

Example 1: $\lambda_{DD} = 10^{-5}$; $\lambda_{DU} = 5 \times 10^{-6}$; MTTR = 10; $T_1 = 5 \times 10^3$.

The $\mathrm{PFD}_G$ values are shown in Table 2, where the effects of common cause failures are not considered.

Table 2
$\mathrm{PFD}_G$ calculated by two methods

System architecture | $\mathrm{PFD}_G$ by Eq. (25) | $\mathrm{PFD}_G$ by steady state values
1oo1 | $7.17 \times 10^{-3}$ | $1.25 \times 10^{-2}$
1oo2 | $6.28 \times 10^{-5}$ | $7.08 \times 10^{-5}$
2oo3 | $1.76 \times 10^{-4}$ | $2.11 \times 10^{-4}$

Example 2: $\lambda_{DD} = 10^{-6}$; $\lambda_{DU} = 10^{-6}$; MTTR = 8; $T_1 = 8600$.

See Table 3 for the $\mathrm{PFD}_G$ values calculated by the two different methods, where the effects of common cause failures are not included.

Table 3
$\mathrm{PFD}_G$ obtained by two different methods

System architecture | $\mathrm{PFD}_G$ by Eq. (25) | $\mathrm{PFD}_G$ by steady state values
1oo1 | $2.45 \times 10^{-3}$ | $4.30 \times 10^{-3}$
1oo2 | $6.89 \times 10^{-6}$ | $8.26 \times 10^{-6}$
2oo3 | $1.86 \times 10^{-5}$ | $2.47 \times 10^{-5}$

Comparing the values shown in Tables 2 and 3 leads to the conclusion that the value of $\mathrm{PFD}_G$ obtained from the steady state system values is a little larger than the one calculated by Eq. (25) for the same system architecture, but the two are close to each other. Therefore, it is reasonable to calculate $\mathrm{PFD}_G$ from the steady state system values. Moreover, this is simple to apply in engineering.

5.2. Effects of common cause failures

Common cause failure is an important part in the construction of redundant system architectures. This part and the rest of the redundant structure logically compose a series system; see Fig. 2b, c and e. The effects of common cause failures on the system average probability of failure on demand are then represented by

$$\beta_D\lambda_{DD}\,\mathrm{MTTR} + \beta\lambda_{DU}(T_1/2 + \mathrm{MTTR})$$

for the 1oo2, 1oo2D and 2oo3 system architectures. The contribution of common cause failure to $\mathrm{PFD}_G$ is influenced by the factors $\beta_D$ and $\beta$, which depend on the physical system.

6. Remarks

In IEC 61508-6, the down time $t_{c1}$ is not defined. If its meaning is not clearly known, all other results presented in the standard could be difficult for an ordinary safety engineer to understand, because it is the basis for all typical system architectures. $t_{c1}$ is newly named the equivalent mean down time of the undetected failure in a channel. It is defined as $T_1 - t_a + \mathrm{MTTR}$, where $t_a$ stands for the time when the average probability of failure for the undetectable fault occurs in a system in the interval $[0, T_1]$. As the meaning of $t_{c1}$ is now defined, one can get
a clear understanding of all the other results of average probability of failure on demand of the typical architectures.

As discussed in the above sections, there is a discrepancy in the calculation of $t_{CE}$, $t_{GE}$ and $t'_{GE}$ for the 1oo2, 1oo2D and 2oo3 system architectures between the expressions given in the standard IEC 61508-6 and the new ones obtained by the Markov model. These differences are presented in Table 1.

The average probabilities of failure on demand obtained by the Markov model for the 1oo2, 1oo2D and 2oo3 system architectures have the same forms as those presented in IEC 61508-6. However, the expressions of $t_{CE}$, $t_{GE}$ and $t'_{GE}$ given there for these three architectures differ from the newly obtained ones. The standard gives no detailed description of how $t_{CE}$, $t_{GE}$ and $t'_{GE}$ are obtained. As a result, the new expressions for $t_{CE}$, $t_{GE}$ and $t'_{GE}$ are suggested for application.

References

[1] IEC 61508. Functional safety of electric/electronic/programmable electronic safety-related systems, Parts 1–7; October 1998–May 2000.
[2] Karydas DM, Brombacher AC (Guest editors). Special issue: Reliability certification of programmable electronic systems. Reliab Engng Syst Safety, No. 2; 1999. p. 66.
[3] Kato E, Sato Y. Safety integrity levels model for IEC 61508: examination of modes of operation. IEICE Trans A 2000;E83-A(5):863–5.
[4] Muta H, Ibe H, Sugiyama E. Safety design of oil reclamation system using IEC 61508. PSAM5: Proceedings of the Fifth International Conference on Probabilistic Safety Assessment and Management, Osaka, Japan; Nov. 27–Dec. 1, 2000. p. 479–84.
[5] Kawahara T, Kushibiki T, et al. Safety-integrity of safety-related systems with human beings. PSAM5: Proceedings of the Fifth International Conference on Probabilistic Safety Assessment and Management, Osaka, Japan; Nov. 27–Dec. 1, 2000. p. 2411–7.
[6] Kato E, Sato Y. Safety integrity levels model for IEC 61508. PSAM5: Proceedings of the Fifth International Conference on Probabilistic Safety Assessment and Management, Osaka, Japan; Nov. 27–Dec. 1, 2000. p. 2787–93.
[7] Misumi Y, Sato Y. Estimation of average hazardous-event-frequency for allocation of safety-integrity levels. Reliab Engng Syst Safety 1999;66:135–44.
[8] ISA-S84.01.1996. Application of safety instrumented systems for process industries. Instrument Society of America, Research Triangle Park; 1996.
[9] Rouvroye JL, Brombacher AC. New quantitative safety standards: different techniques, different results? Reliab Engng Syst Safety 1999;66:121–5.
[10] IEC 61508-6. Functional safety of electric/electronic/programmable electronic safety-related systems. Part 6. Guidelines on the application of IEC 61508-2 and IEC 61508-3; April 2000.
[11] Cao JH, Cheng K. An introduction to mathematics of reliability. Beijing: Publication of Science; 1986. p. 2. p. 210–30, in Chinese.
