Availability and Cost Function For Periodically Inspected Maintenance
Abstract
Unavailability and cost rate functions are developed for components whose failures occur randomly but are detected only by
periodic testing or inspections. If a failure occurs between consecutive inspections, the unit remains failed until the next inspection.
Components are renewed by preventive maintenance periodically, or by repair or replacement after a failure, whichever occurs first (age-
replacement). The model takes into account finite repair and maintenance durations as well as costs due to testing, repair, maintenance and
lost production or accidents. For normally operating units the time-related penalty is loss of production. For standby safety equipment it is the
expected cost of an accident that can happen when the component is down due to a dormant failure, repair or maintenance. The objective of
maintenance optimization is to minimize the total cost rate by proper selection of two intervals, one for inspections and one for replacements.
General conditions and techniques are developed for solving optimal test and maintenance intervals, with and without constraints on the
production loss or accident rate. Insights are gained into how the optimal intervals depend on various cost parameters and reliability
characteristics. © 1998 Elsevier Science Ltd.
Keywords: Cost rate; Optimisation; Periodic replacement; Standby system; Testing; Unavailability
1. Introduction

In the past, the economic aspects of preventive and corrective maintenance have been extensively studied for monitored components in which failures are immediately detected and subsequently repaired. A review of many models on such maintenance policies is given by Valdez-Flores and Feldman [1], Dekker [2] and Schabe [3]. Far less attention has been paid to the economics of systems in which failures are dormant and detected only by periodic testing or inspections. Such units are used in non-monitored production lines and they are especially common in industrial safety and protection systems. Both the unavailability models and the cost factors differ considerably from those of monitored components [4–6].
This paper develops unavailability and cost models for periodically inspected and maintained units, extending the earlier works [6,7] in category [T old, F new] with the age-replacement maintenance policy, accounting for finite durations of repairs and maintenance. This is an expanded version of a conference paper [8] with more emphasis on normally operating units and certain special cases. Some of the results are valid also for an alternative block-replacement policy.
The basic assumptions are:

1.1. Age-replacement and block-replacement

The policy defined by assumption 4 means that preventive maintenance (PM) has to be rescheduled when a failure occurs during the first $M-1$ test intervals after any renewal, a fact not always emphasized in the literature. An alternative is the block-replacement policy in which preventive renewals take place at fixed intervals $MT$, independent of whether or not repairs were needed after the previous PM. Even if the focus of this paper is on age-replacement policy, one should notice that the results are approximately valid also for the block-replacement policy when $M = 1$ (renewal at every inspection) and when $M \to \infty$ (no PM), exactly so when $t_R = t_M = 0$. Further, the results are practically the same for these two policies for any $M$ in the region $F(MT) \ll 1$, because then the relative impact of the renewals between PM-episodes is small.

2. The cost function and general results

The objective is to derive an expression for the average cost rate $y(T,M)$ over a long time horizon. It depends on $T$, $M$ and the reliability characteristics of the unit. It is useful to consider the average renewal cycle, beginning with a renewal and ending either with a preventive maintenance after time $MT$ [with probability $R(MT)$] or with a repair [with probability $F(MT)$]. The average length of a renewal cycle is then

$$L(T,M) = (MT + t_M)\,R(MT) + \sum_{k=1}^{M} \int_{kT-T}^{kT} (kT + t_R)\,dF(t)$$

The average total cost per cycle consists of the cost-contributions of inspections, repair, maintenance and lost production (or accidents),

$$C(T,M) = C_T \sum_{k=0}^{M-1} R(kT) + C_R F(MT) + C_M R(MT) + K C_A D(T,M) \tag{5}$$
J.K. Vaurio/Reliability Engineering and Systems Safety 63 (1999) 133–141 135
and the cost rate function is the ratio $y(T,M) = C(T,M)/L(T,M)$,

$$y(T,M) = \frac{C_T \sum_{k=0}^{M-1} R(kT) + C_M R(MT) + C_R F(MT) + K C_A D(T,M)}{T \sum_{k=0}^{M-1} R(kT) + t_M R(MT) + t_R F(MT)} \tag{6}$$

One should notice the special cases: with $M = 1$ the unit is renewed at every test, and with $M = \infty$ there is no preventive maintenance. No special rescheduling of maintenance is involved in these cases.
A practical task is to minimize the cost rate by proper selection of $T$ and $M$. A reasonable approach is to find an optimal $T$ for different values of $M = 1,2,\ldots$, and then select the optimal $M$ and $T$ from these results. Bisection procedures can be useful in this process. It is practical to calculate the average unavailability

$$U_{av}(T,M) = \frac{D(T,M)}{L(T,M)} = 1 - \frac{A(T,M)}{T \sum_{k=0}^{M-1} R(kT) + t_R F(MT) + t_M R(MT)} \tag{7}$$

along with the cost rate function so that it can be easily compared with any limit that might be set by the company policy or by safety authorities.

2.1. General results

First consider the asymptotic behaviour of $y(T,M)$ for increasing $T$ and any fixed $M \ge 1$. Because $\lim_{t\to\infty} R(t) = 0$, $F(\infty) = 1$, and a finite mean time to failure $t_f = \int_0^\infty R(t)\,dt$, Eq. (6) yields

$$y(T,M) \xrightarrow[\text{large } T]{} K C_A - \frac{K C_A t_f - C_T - C_R}{T + t_R} \xrightarrow[T \to \infty]{} K C_A \tag{8}$$

Now one can conclude, for any $M \ge 1$:
a. A value $T$ always exists such that $y(T,M) \le K C_A$; $T = \infty$ for equality if nothing better.
b. If $K C_A t_f > C_T + C_R$ (sufficient condition), there is a finite optimal $T = T_{opt}$ that minimizes $y(T,M)$, and $y(T_{opt},M) < K C_A$. This can be seen from the middle form of Eq. (8), under the stated condition.
c. The optimal $T$ is always larger than a certain limit

$$T_{opt} \ge T_{LL} \equiv \left[ C_T + \frac{\min(C_M, C_R)}{M} \right] \Big/ (K C_A) \tag{9}$$

Proof. From $y(T_{opt},M) \le K C_A$ (result a) Eq. (6) yields

$$K C_A \int_0^{M T_{opt}} R(t)\,dt \ge C_T \sum_{k=0}^{M-1} R(kT_{opt}) + C_M R(MT_{opt}) + C_R F(MT_{opt}) \tag{10}$$

Because $R(t)$ is non-increasing,

$$T \sum_{k=0}^{M-1} R(kT) \ge \int_0^{MT} R(t)\,dt. \tag{11}$$

This on the left hand side of Eq. (10) and the fact

$$C_M R(MT) + C_R F(MT) \ge \min(C_M, C_R) \tag{12}$$

on the right hand side of Eq. (10) yields $T_{opt} K C_A \ge C_T + \min(C_M, C_R) / \sum_{k=0}^{M-1} R(kT_{opt})$. Eq. (9) then follows from $\sum_{k=0}^{M-1} R(kT) \le M$. Consequently, the case $K C_A = 0$ always yields $T_{opt} = \infty$.
d. The necessary conditions for the existence of $T$ with $y(T,M) < K C_A$ are:

$$K C_A t_f > C_T + \min(C_R, C_M), \quad \text{for } M < \infty$$
$$K C_A t_f > C_T + C_R, \quad \text{for } M = \infty$$

These follow rather easily from the inequality Eqs. (10)–(12).
e. For any fixed $M \ge 1$,

$$U_{av}(T,M) \xrightarrow[\text{large } T]{} 1 - \frac{t_f}{T + t_R} \tag{13}$$

This result follows also from Eq. (8) by substitution $K C_A = 1$, $C_T = C_R = 0$.
f. For any fixed $M$, $1 \le M < \infty$,

$$U_{av}(T,M) \xrightarrow[\text{small } T]{} \frac{t_M}{MT + t_M} \tag{14}$$

g. For $M = \infty$,

$$U_{av}(T,\infty) \xrightarrow[\text{small } T]{} \frac{t_R}{t_f + t_R} \tag{15}$$

This is a familiar result for continuously monitored components in renewal theory. Here it follows from

$$T \sum_{k=0}^{M-1} R(kT) \xrightarrow[\text{small } T]{} \int_0^{MT} R(t)\,dt + \frac{1}{2}\, T F(MT) \xrightarrow[M \to \infty,\; T \to 0+]{} t_f \tag{16}$$

h. For any fixed finite $M$, the unavailability is minimum at some value $T > 0$, when $t_M > 0$. This follows from result f, Eq. (14).
i. For $M = \infty$, the unavailability is minimum at $T \to 0+$, and in the limit equals Eq. (15). This follows from Eqs. (7) and (11).
j. When the maintenance interval $T_M = MT$ is held constant in Eq. (7), the unavailability is minimum at $T = 0+$. This is because the denominator in Eq. (7) is now minimum at $T = 0$ (Eqs. (11) and (16)).

2.2. Renewal at every inspection

Since the general equations are rather complex, let us study the case $M = 1$ in some detail. $y(T) = y(T,1)$ from
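As a worked special case, the following is a sketch obtained simply by substituting $M = 1$ into Eqs. (6) and (7). It assumes that the expected uptime per cycle is $A(T,1) = \int_0^T R(t)\,dt$ and that $D = L - A$; this is consistent with Eqs. (7) and (10), but the defining equations are not reproduced in this excerpt, so the expressions should be read as illustrative rather than as the paper's own Eq. (17).

$$y(T,1) = \frac{C_T + C_M R(T) + C_R F(T) + K C_A D(T,1)}{T + t_M R(T) + t_R F(T)}, \qquad U_{av}(T,1) = 1 - \frac{\int_0^T R(t)\,dt}{T + t_M R(T) + t_R F(T)}$$

Under the same assumption the cost rate can be rearranged as

$$y(T,1) = K C_A - \frac{K C_A \int_0^T R(t)\,dt - C_T - C_M R(T) - C_R F(T)}{T + t_M R(T) + t_R F(T)}$$

For large $T$ the integral tends to $t_f$, $R(T) \to 0$ and $F(T) \to 1$, so this form reduces to the middle expression of Eq. (8), which is a useful consistency check.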
Fig. 1. Typical cost rate functions in case $C_R > C_M$ ($K C_A = 30$, $t_f = 40$, $C_R - C_M = 400$, $C_T + C_M = 2000$, 1000 and 400, respectively).

and sufficient condition for $B(T) > 0$ for a finite $T$ is $K C_A/\lambda > C_T + C_R$, independent of $C_M$.
When this condition is satisfied, $T_{opt}$ is obtained by setting the derivative of $y(T)$ of Eq. (17) equal to zero. $T_{opt}$ is then the unique solution of

$$1 - (1 + \lambda \bar{t} + \lambda T)\, e^{-\lambda T} = d \equiv \frac{C_T + C_M}{K C_A/\lambda + C_M - C_R} \tag{24}$$

where $\bar{t}$ is a weighted average of $t_M$ and $t_R$,

$$\bar{t} = \frac{(K C_A/\lambda - C_T - C_R)\, t_M + (C_T + C_M)\, t_R}{K C_A/\lambda + C_M - C_R} \tag{25}$$

Often a good approximate solution is $T_{opt} \approx \sqrt{2(d + \lambda \bar{t})}/\lambda$. [By the way, with an exponential $R(t)$ the case $M = \infty$ also leads to Eqs. (24) and (25) for optimal $T$, with $C_M = 0$, $t_M = 0$.]
Generally, whenever $B(T)$ of Eq. (18) is monotonically increasing (non-decreasing), one can conclude from Eq. (18) that $K C_A t_f > C_T + C_R$ is both a necessary and a sufficient condition for a finite $T_{opt}$. On the condition that $B(T)$ has a derivative

$$\frac{dB(T)}{dT} = [K C_A - (C_R - C_M)\, h(T)]\, R(T) \tag{26}$$

one can draw several additional practical conclusions. The hazard rate $h(t)$ is a commonly used 'input' function in reliability studies. From Eq. (26) one can conclude:
A. If $C_M \ge C_R$, then $B(T)$ is non-decreasing for any $h(t) \ge 0$, and $K C_A t_f > C_T + C_R$ is necessary and sufficient for a finite $T_{opt}$ to exist.
B. When $C_R > C_M$ and $\max_{t>0} h(t) < K C_A/(C_R - C_M)$, then $K C_A t_f > C_T + C_R$ is a necessary and sufficient condition for a finite $T_{opt}$.
C. When $C_R > C_M$ and $\min_{t>0} h(t) \ge K C_A/(C_R - C_M)$, then $B(T) \le 0$ and $T_{opt} = \infty$.
D. When $C_R > C_M$ and the conditions B. and C. are not satisfied: if there is a value $T_1$ such that

$$h(t) < K C_A/(C_R - C_M) \ \text{for } t < T_1, \qquad h(t) > K C_A/(C_R - C_M) \ \text{for } t > T_1, \tag{27}$$

[$h(t)$ does not have to be monotonic], then $B(T)$ has an absolute maximum at $T = T_1$. Then:
If $B(T_1) > 0$ and $dL(T)/dT > 0$ at $T = T_1$, then $T_{opt} < T_1 < \infty$.
If $B(T_1) > 0$ and $dL(T)/dT < 0$ at $T = T_1$, then $T_1 < T_{opt} < \infty$.
These results follow from the sign of $dy(T)/dT$ (Eq. (17)) at $T = T_1$.
With a continuous $h(t)$ one can solve $T_1$ from the equation

$$h(T_1) = \frac{K C_A}{C_R - C_M} \tag{28}$$

easily for the usual forms $h(t) = a t^{b-1}$ ($a > 0$, $b > 1$) or $h(t) = e^{a+bt}$ ($b > 0$). Thus, for an increasing $h(t)$ one often easily finds the finite bounds $T_1$ and $T_{LL}$ (Eq. (9)) for $T_{opt}$.
From these results one can conclude also that the unavailability $U(T,1)$ always has a minimum at a finite $T = T_0$. In the exponential case, $R(t) = e^{-\lambda t}$, $T_0$ is the unique solution of Eq. (24) with $d = 0$, $\bar{t} = t_M$. In this case $T_0$ is independent of $t_R$, and $T_0 < T_{opt}$.
One can imagine a strongly oscillating $h(t)$ such that Eq. (28) could have multiple solutions and $y(T)$ multiple local extreme values. In practice one rarely encounters more complex $h(t)$ than a 'bathtub' curve: decreasing $h(t)$ up to a certain age and later increasing. In such a case Eq. (28) could have at most two solutions, $T_1$ and $T_2$, and $T_{opt}$ would be between these if $B(T_2) > 0$ and $dL(T)/dT > 0$ at $T = T_2$, otherwise $T_{opt} > T_2$.

2.3. Numerical example

Consider the following numerical values: $C_T = 2$, $C_M = 50$, $C_R = 60$, $K = 10^{-6}$, $C_A = 4 \cdot 10^{6}$, $t_M = 0.2$, $t_R = 0.3$, $H(t) = \lambda t + (at)^b$, $\lambda = 3.23 \cdot 10^{-3}$, $a = 9.07 \cdot 10^{-3}$, $b = 4$. The cost unit is $1000, and the time unit 1 week. With these values, $t_f \approx 84.5$, and the condition $K C_A t_f > C_T + C_R$ is clearly satisfied, leading to finite optimal $T$ for any $M$. The described procedure yields the optimization results given in Table 1 for different values of $M$. The overall optimal combination is obviously $M = 7$, $T = 14$. The cost function $y(T,M)$ is rather flat and insensitive to ±50% variations of $T$ and/or $M$ around the optimal values, as illustrated by Fig. 2. Thus, even approximate optimization with uncertain cost data can yield reasonable near-optimal values for $T$ and $M$. This behaviour is evidently due to the smoothness of the hazard rate $h(t)$, compared to that of Fig. 1.
With any given $M$, $y(T,M)$ as a function of $T$ typically has a single minimum. The value of $T$ that minimizes the unavailability $U_{av}(T,M)$ is typically smaller than the one that minimizes $y(T,M)$.

3. Constrained optimization

In some cases a company policy or safety authority can set conditions for the allowable unavailability or the accident rate. The actual optimization problem then is to minimize the cost under the additional constraint

$$K\, U_{av}(T,M) \le R^* \tag{29}$$

where $R^*$ is the allowed limit.
If the absolutely optimal values $T = T_{opt}$ and $M = M_{opt}$ that minimize Eq. (6) fail to satisfy Eq. (29), one has to change one or both to reduce the average unavailability (and increase the cost rate). In numerical optimization studies it is rather straightforward to verify Eq. (29) for any candidate values $T$ and $M$, and limit the admissible values accordingly. For this purpose, it is advisable to list or plot simultaneously both $y(T,M)$ and $U_{av}(T,M)$. If the ratio $R^*/K$ is smaller than the absolutely minimum value of $U_{av}(T,M)$, then there are no acceptable values $T$ and $M$.
Analytically, the optimization problem is now to
minimize $y(T,M)$ under the condition $U_{av}(T,M) = R^*/K$, which defines $T$ as a function of $M$, or vice versa.

3.1. Numerical example

$$t_R \le \frac{R^*}{K - R^*}\, t_f, \qquad R^* \ge K\, \frac{t_R}{t_f + t_R} \tag{31}$$

This sets a condition to the maximum allowed mean repair time, or to the minimum risk constraint that can be satisfied without preventive maintenance. For example, with a non-increasing $h(t)$, $M = \infty$ is always optimal and therefore Eq. (31) relevant.

$${} = \left[ t_R F(T_M) + t_M R(T_M) \right] \Big/ \int_0^{T_M} R(t)\,dt$$
Table 1
Optimal inspection intervals T_opt and the corresponding cost rate and unavailability values

M                  1      2      3      4      5      6      7      8      9     10     20     90
T_opt             66     37     27     21     18     16     14     13     12     11     11     11
y(T_opt,M)     1.347  1.187  1.122  1.098  1.073  1.065  1.060  1.061  1.064  1.067  1.103  1.103
U_av(T_opt,M)  0.129  0.091  0.077  0.065  0.062  0.060  0.055  0.055  0.054  0.050  0.064  0.064
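
Entries of this kind can be checked numerically from Eqs. (6) and (7). The following is a minimal Python sketch, not the author's procedure. It assumes that R(t) = exp(-H(t)) with the generalized Weibull H(t) of the numerical example, that the expected uptime per cycle is the integral of R(t) from 0 to MT, and that D = L - A (both consistent with Eqs. (7) and (10), but not stated explicitly in this excerpt). The function names, the Simpson integration and the integer search grid are illustrative choices. Under these assumptions the M = 7, T = 14 entry should come out close to the tabulated values; agreement for the other columns is not guaranteed.

```python
import math

# Values from the numerical example (cost unit $1000, time unit 1 week).
C_T, C_M, C_R = 2.0, 50.0, 60.0      # test, maintenance and repair costs
K_CA = 1e-6 * 4e6                    # product K * C_A
t_M, t_R = 0.2, 0.3                  # maintenance and repair durations
lam, a, b = 3.23e-3, 9.07e-3, 4      # parameters of H(t) = lam*t + (a*t)**b


def R(t):
    """Reliability R(t) = exp(-H(t)); assumes H is the cumulative hazard."""
    return math.exp(-(lam * t + (a * t) ** b))


def integral_R(upper, n=400):
    """Composite Simpson estimate of the integral of R over [0, upper]; n must be even."""
    h = upper / n
    total = R(0.0) + R(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * R(i * h)
    return total * h / 3.0


def cycle_terms(T, M):
    """Return (sum of R(kT) for k < M, R(MT), F(MT), cycle length L, uptime A)."""
    s = sum(R(k * T) for k in range(M))
    r_end = R(M * T)
    f_end = 1.0 - r_end
    L = T * s + t_M * r_end + t_R * f_end   # denominator of Eq. (6)
    A = integral_R(M * T)                   # assumed expected uptime per cycle
    return s, r_end, f_end, L, A


def cost_rate(T, M):
    """Cost rate y(T, M) of Eq. (6), with the assumed downtime D = L - A."""
    s, r_end, f_end, L, A = cycle_terms(T, M)
    D = L - A
    return (C_T * s + C_M * r_end + C_R * f_end + K_CA * D) / L


def unavailability(T, M):
    """Average unavailability U_av(T, M) of Eq. (7)."""
    _, _, _, L, A = cycle_terms(T, M)
    return 1.0 - A / L


def grid_search(T_values=range(5, 71), M_values=range(1, 11)):
    """Brute-force minimum of the cost rate over integer T and M; returns (y, T, M)."""
    return min((cost_rate(T, M), T, M) for M in M_values for T in T_values)


if __name__ == "__main__":
    print("t_f ~", round(integral_R(400.0, n=4000), 1))
    print("y(14,7) ~", round(cost_rate(14, 7), 3))
    print("U_av(14,7) ~", round(unavailability(14, 7), 3))
    print("grid minimum (y, T, M):", grid_search())
```

Because the cost rate is flat near its minimum, a brute-force search over integer intervals is adequate for a sketch like this; a bisection on the derivative for each M would be the natural refinement.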

Table 2
Optimal test and maintenance intervals based on constrained risk (R*/K)

R*/K = U_av(T̂,M̂)     T̂      M̂     y(T̂,M̂)
0.055                14.0     7     1.060
0.050                12.3     8     1.062
0.040                10.3     9     1.075
0.030                 9.0     9     1.12
0.020                 6.9    10     1.24
0.010                 2.4    20     2.09

$$r = C_T \Big/ \left[ K C_A \int_0^{T_M} R(t)\,dt - (C_R + C_T/2)\, F(T_M) - C_M R(T_M) \right]$$

The dimensionless parameters , and r are usually small compared to unity. The outage times $t_R$ and $t_M$ then have a minor impact on the test interval. The optimal interval increases with an increasing $C_T$ and decreases with an increasing $K C_A$.
On the other hand, if the value of $T$ has been pre-selected, the condition $\partial \tilde{y}(T,T_M)/\partial T_M = 0$ becomes, approximately (when ≪ 1 and $h(T_M)\,T \ll 1$)

$$h(T_M) \int_0^{T_M} R(t)\,dt - F(T_M) = \frac{C_M + K C_A t_M}{C_R - C_M + K C_A (T/2 + t_R - t_M)} \tag{34}$$

Even if analytical solution is rarely possible, this result shows that no finite $T_M$ is optimal for nonincreasing $h(t)$. Eq. (34) also shows that the optimal $T_M = \tilde{T}_M$ depends essentially on a single combined cost parameter defined by the r.h.s. of Eq. (34). $T_M$ increases with increasing $C_M$ and $t_M$, and decreases with increasing $C_R$ and $t_R$. Increasing $K C_A$ and $T$ also typically reduce $\tilde{T}_M$. In practice usually $T \gg t_R$ & $t_M$ so that the outage times $t_R$ and $t_M$ have a minor impact on $T_M$. In case of a generalized Weibull hazard function, $H(t) = \lambda t + (at)^b$, Eq. (34) can be solved analytically if $F(T_M) \approx H(T_M) \ll 1$.
One general way to solve both optimal intervals is to plot or list both sides of Eq. (34) as functions of $T_M$ (with $T$ on the r.h.s. calculated from Eq. (33)), and find the crossing-point. If this leads to solutions with $\tilde{U}_{av}(\tilde{T},\tilde{T}_M) > R^*/K$, one has to resort to the constraint optimization. The condition $U_{av} = R^*/K$ then defines $T$ as a function of $T_M$

$$T = \frac{2}{F(T_M)} \left[ \frac{R^*}{K - R^*} \int_0^{T_M} R(t)\,dt - t_R F(T_M) - t_M R(T_M) \right] \tag{35}$$

and $\tilde{y}$ can be plotted as a function of $T_M$ only. The condition $\partial \tilde{y}/\partial T_M = 0$ does not generally yield easy analytical equations, except in case $t_R = t_M = 0$, or if $M = \infty$. In case $T_M = \infty$, the constraint minimum cost rate is (when Eq. (31) is satisfied)

$$\tilde{y} = r K C_A + (1 - r) \left[ C_R + \frac{1}{2}\, C_T\, \frac{t_f - (1 - r)\, t_R}{r t_f - (1 - r)\, t_R} \right] \Big/ t_f, \tag{36}$$

with $r = R^*/K$.

5. Extensions

The following additional failures and unavailability contributions can be rather easily taken into account:

a = time-independent probability of failure due to a true demand (relevant to standby units). This includes possible design or installation errors that are not discovered by periodic testing or maintenance, only by a true demand.
p = probability of a human error per test, e.g. failure to return to operating state after a test. This error is discovered and corrected at the next test, without additional cost or delay.
d = downtime due to a test; assumption $d \ll T$ is made.

These events and parameters are assumed to have no effect on the other reliability and cost parameters. They do not change the length of the renewal cycle $L(T,M)$ but reduce the expected uptime per cycle:

$$A(T,M) \to \left( 1 - \frac{d}{T} \right) (1 - a)(1 - p) \int_0^{MT} R(t)\,dt \tag{37}$$

With this new $A(T,M)$ Eqs. (3)–(7) are again valid.
It is also possible that some failures are not detected by periodic tests (e.g. reduced capacity), but are removed/corrected by repair and maintenance. Then there is $R_n(t)$ = probability that a failure not detectable by a test does not
occur before time $t$ since the last renewal. Assuming that detectable and non-detectable failures are mutually independent, only the integral in $A(T,M)$ needs to be changed in Eqs. (3), (6) and (7) (and Eq. (37)):

$$\int_0^{MT} R(t)\,dt \to \int_0^{MT} R(t)\, R_n(t)\,dt \tag{38}$$

Again the renewal cycle length $L(T,M)$ remains unchanged, assuming that non-detectable failures do not change $t_R$ or $t_M$.

6. Conclusions

In the economically competitive world it is increasingly important to consider the cost factors associated with safety-related systems in addition to those of production systems. A prerequisite for this is proper modelling of reliability characteristics in combination with the cost factors to facilitate calculation and minimization of the total cost. Minimization can be accomplished by selecting the free parameters, in this case the inspection and maintenance intervals, in an optimal way. This paper is a modest contribution to reach these goals.
General cost rate and unavailability equations were developed for periodically inspected and preventively maintained components under the age-replacement policy. The model includes finite repair and maintenance times and cost contributions due to inspection (or testing), repair, maintenance and loss of production (or accidents). Several necessary and sufficient conditions and solution techniques were developed for optimizing test and maintenance intervals, with and without constraints on the availability (or accident rate). Exact and approximate analytical results were derived to gain insights about the dependencies of optimal intervals on various cost parameters and hazard rates. The optimal intervals increase with increasing test and maintenance costs, while increasing production (or accident) and repair costs tend to reduce the optimal intervals. Repair and maintenance times have a minor impact in typical cases.
The main results are the general cost rate equation and the procedures outlined for solving the optimal intervals, with and without possible constraints on the production loss or accident rate. In addition to purely numerical methods, a practical semi-analytic method is to first solve the optimal $T$ for two extreme cases, $M = 0$ and $M = \infty$, for which some analytical results are available. The global $T_{opt}$ may then be iterated between these values based on Eqs. (33) and (34), for example.
Because the cost rate is a linear function of all cost parameters, the optimal values $T_{opt}$ and $M_{opt}$ depend only on the ratios of the cost parameters. Nevertheless, optimization can be sensitive to these parameters at least if $K C_A t_f$ is nearly equal to $C_T + C_M$ or $C_T + C_R$, or if $h(t)$ changes strongly in the neighbourhood of the optimal $T_M$.
Analytical work in the future may focus on the extensions mentioned in the preceding Section. Models with partially effective inspections or repairs with alternative maintenance policies are another avenue for new developments. With realistic models available for single units there is a wide field open for analysing and optimizing large systems. Refs. [4–6] provide some examples along these lines.

References

[1] Valdez-Flores C, Feldman RM. A survey of preventive maintenance models for stochastically deteriorating single-unit systems. Naval Research Logistics, 1989;36:419–446.
[2] Dekker R. Integrating optimisation, priority setting, planning and combining of maintenance activities. European Journal of Operational Research, 1995;82:225–240.
[3] Schabe H. A new approach to optimal replacement times for complex systems. Microelectronics and Reliability, 1995;35:1125–1130.
[4] Harunuzzaman M, Aldemir T. Optimization of standby safety system maintenance schedules in nuclear power plants. Nuclear Technology, 1996;113:354–367.
[5] Martorell S, Munoz A, Serradell V. An approach to integrating surveillance and maintenance tasks to prevent the dominant failure causes of critical components. Reliability Engineering and System Safety, 1995;50:179–187.
[6] Vaurio JK. Optimisation of test and maintenance intervals based on risk and cost. Reliability Engineering and System Safety, 1995;49:23–36.
[7] Vaurio JK. On time-dependent availability and maintenance optimisation of standby units under various maintenance policies. Reliability Engineering and System Safety, 1997;56:79–89.
[8] Vaurio JK. The cost function for periodically tested standby units with age-replacement maintenance. In: Proc. ESREL'97, Vol 3, 17–20 June, Lisbon, Portugal, Pergamon, 1997:1681–1688.