Availability Optimized PDF
www.elsevier.com/locate/ress
Abstract
This paper studies preventive maintenance (PM) for a multi-component system based on availability, simultaneously considering three actions: mechanical service, repair and replacement. Mechanical service denotes activities such as lubricating, cleaning, checking and adjusting, which alleviate strength degradation. Repair not only slows the rate of degradation but also partly restores the degraded strength. Replacement restores a component to its original condition. Following these definitions, the degradation of components is analyzed from its failure mechanisms, and the reliability improvement from each action is measured by two improvement factors. Using the proposed reliability model, the mean up-time and mean down-time of each component are investigated, and the replacement interval of each component is determined by maximizing availability. The minimum of these intervals is chosen as the PM interval of the system for programming the periodical PM policy. The action for each component at every PM stage is selected by maximizing the system's maintenance benefit. The scheduling proceeds stage by stage and terminates when the extended life of the system reaches its expected life. The complete schedule gives, for each stage, the actions adopted for the components, the availability and the total cost of the system. A multi-component system is used as an example to illustrate the proposed algorithm.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Preventive maintenance; Reliability and availability
0951-8320/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ress.2003.11.011
262 Y.-T. Tsai et al. / Reliability Engineering and System Safety 84 (2004) 261–270
achieving the optimization of PM policy based on specific supporting conditions, such as uniform improvement, maintenance activity and cost, etc. For a system that consists of many subsystems and/or components (SCs), the effectiveness of maintenance depends mainly on both the improvement levels and the maintenance costs of the SCs. This is similar to imperfect maintenance. For imperfect maintenance, Whitaker and Samaniego [9] proposed a method of reliability evaluation. Refs. [10,11] cover different approaches proposed to model imperfect maintenance based on an improvement factor. Considering multiple activities in maintenance, Martorell et al. [12] assumed that PM activities affect component age as a function of the maintenance effectiveness, and suggested some age-dependent models to determine the risk and the associated economic cost. Further, Martorell et al. [13] presented a new reliability model that includes parameters related to surveillance and maintenance effectiveness and to the working conditions of the equipment, both environmental and operational.

To model the effects of maintenance on a multi-component system suitably, this paper combines three typical PM actions as follows.

(1a)-maintenance (mechanical service). This type of action emphasizes keeping a system in normal operating condition. It usually involves few techniques and tools, i.e. the improvement is limited. It improves only the extrinsic state (the deteriorated environment), so that it can tune the SCs to a better condition. Typical activities of this type are, for example, (a) lubricating, (b) adjusting/calibrating the position of, or the load carried by, the mating parts, (c) tightening loose parts, (d) cleaning dust, jams and rust, etc. to maintain the inherent function of parts, and (e) replenishing consumable materials such as oil, water, etc.

(1b)-maintenance (repair). This type of action is mainly adopted for SCs that are expensive and/or difficult to acquire. It generally includes the activities of (1a) plus repairing/replacing simple parts such as springs, seals, belts and bearings. Besides improving the extrinsic condition, it can also recover intrinsic damage. Examples of this type are engine overhaul, reinforcement of engineering structures and surface treatment of moving parts. It usually contains the following activities: (a) disassembly, (b) reassembly of the repaired SCs and/or (c) calibration of the whole function.

(2P)-maintenance (replacement). This type of action replaces the subsystem/component (SC) with a new one. It is frequently adopted for key SCs to avoid serious damage. In addition, SCs that have undergone (1a) and (1b) several times and are no longer worth using may also take this type of action.

While planning the PM schedule according to the defined activities, the maintenance time and the optimization goal of the system affect the contents of the actions adopted. Considering the time at which PM is taken, PM policies can be classified into two kinds, periodical PM and non-periodical PM. The former is more regular, so it is often executed in a general system. The latter is usually more complex and is mostly adopted for specific parts, e.g. key components, because its maintenance interval is not constant. Moreover, the commonly used goals of maintenance optimization are based on either cost minimization or profit maximization [14]. A frequently adopted index of system performance is the availability, which describes the ratio of up and down times of systems. It is as important as costs/profits in many real situations. Therefore, many authors have considered both criteria in developing approaches to search for the optimized maintenance [15–17]. Typically, Borgonovo et al. [18] presented an approach for the evaluation of plant maintenance strategies and operating procedures under economic constraints.

For a complex system, the shut-down loss can be reduced and the effectiveness promoted if its availability can be set or maintained at a certain level. In this paper, availability maximization is adopted as the criterion for scheduling periodical PM. It is used to determine the PM intervals of the SCs of a multi-component system. The three kinds of action mentioned above are considered concurrently at each PM stage. The purpose of the PM strategy is not only to maintain the system life up to its expected life but also to obtain the maximum system benefit by availability optimization. The example analysis demonstrates that a PM policy that considers more than one action is more advantageous than one that adopts only a single action (replacement).

2. Reliability under PM

Before scheduling the PM program, the improvements of the various PM actions to reliability must first be identified. From the viewpoint of strength–stress interference (SSI) theory, reliability degradation means that the strength distribution moves toward the left with time. (1a)-maintenance can slow the movement of the strength distribution because the deteriorated environment is improved, and so it can delay degradation. On the other hand, (1b)- and (2P)-maintenance, besides retaining the function of (1a), can shift the distribution toward the right, i.e. raise the reliability, because the cumulative damage of the system can be removed by these two actions. The effects of the various actions on the strength distribution are shown in Fig. 1.

According to these improvement mechanisms, the improvements of maintenance to the system can be classified into two parts. The former is the recovery of the failed parts of the system, which are restored either by repairing or by replacing. The latter is the improvement to the surviving parts, which are restored by any one of the three actions. Ideally, the reliability of surviving parts can be modeled by using the age reduction model [1,2]. This model proposed the system reliability
as θ = 4000, β = 2.5, t_p = 2000, m_1 = 0.8 and m_2 = 0.5. Under the given parameters, the reliability change under the various PM actions is shown in Fig. 2(a). Furthermore, according to the relationship between reliability and hazard, the hazard function is expressed as [19]

\[ h_j(t) = -\frac{1}{R_j(t)}\,\frac{dR_j(t)}{dt}, \qquad (j-1)\,t_p \le t \le j\,t_p \tag{6} \]

Then, the failure rate of the system on the jth stage can be expressed as

\[ h_j(t) = h_{0,j} + \frac{1}{m_1}\,\frac{\beta}{\theta}\left[\frac{(1/m_1)\,\bigl(t-(j-1)\,t_p\bigr)}{\theta}\right]^{\beta-1} \tag{7} \]

where h_{0,j} is the initial failure rate of the system on the jth stage. Here, the improvement of maintenance to h_{0,j} can be treated in the same way as the improvement to R_{0,j}. The failure rate change in this example is shown in Fig. 2(b).

advance. The parameters include (1) the improvement factors m_1 and m_2; (2) the needed times in maintenance.

3.1. Improvement factors assessment

The improvement factors are associated with both the maintainability of the failure, i.e. repairable or non-repairable, and the sufficiency of maintenance for the individual failure. To model the improvement accurately, a detailed analysis of the reliability characteristics of systems (failure rate contributors) and of the extrinsic effects of the individual maintenance tasks is necessary. Typically, a complex approach for analyzing the dominant failure causes and the corresponding maintenance tasks was introduced by Martorell et al. [21] to identify the most suitable set of tasks, integrating the maintenance tasks with the surveillance requirement. A further discussion of evaluating the factor (named maintenance effectiveness) for mechanical components, in which a statistical method is presented, was reported in Ref. [22]. Recently, a simple numerical calculation based on the maintenance contents was proposed by Tsai et al. [14] to estimate the improvement of simple maintenance.

For a mechanical system, the common failure modes may be, for example, shorts, opens, ruptures, power losses, being out of tolerance and loss of output, etc. The failure modes come mainly from two mechanisms. The first is extrinsic problems of systems, such as poor lubrication or heat dissipation, choking and jamming caused by contamination (dust and dirt), and bad connections or over-pressure due to loose parts, etc. The other is intrinsic damage: for example, repeated cycles of load vibration will result in mechanical fatigue, contact stress will lead to excessive wear of kinetic parts, chemical changes on surfaces will weaken materials (corrosion), and the original performance decays after long use (aging), etc.

According to the intrinsic–extrinsic problems of system failures, (1a) contains five kinds of activity (Section 1), which can only improve the extrinsic conditions of failures. Here, four frequently occurring failures are considered for assessing the improvement factors: (1) fatigue, (2) wear-out, (3) aging and (4) others (such as corrosion, creep, rupture and deformation). The relative probabilities of the failures occurring, p_{f,i}, are first estimated. Next, the possible environmental deteriorations associated with these failures, which may be problems of load, temperature, humidity, dust, etc., are roughly analyzed. After that, the improvement levels of (1a) to the extrinsic factors are evaluated according to the included activities. Here, m_1 is defined as

\[ m_1 = \sum_{i=1}^{4} p_{f,i}\, I_i \tag{8} \]

where I_i, set between 0 and 1, denotes the degree to which the operating environment associated with failure i is restored to its original condition. A further discussion of determining I_i can be found in Ref. [14]. On the other hand, the improvement of maintenance to the failed parts can be measured according to the recovery of the (1b) action for the individual failure under some given tools and equipment. It is defined as

\[ m_2 = \sum_{i=1}^{4} p_{f,i}\, d_i \tag{10} \]

where d_i indicates the percentage of the failures of type i recovered by repairing.

For example, suppose the possible failures of a mechanical component are fatigue and wear-out, with supposed failure probabilities p_{f,i} = (0.6, 0.4, 0, 0). Assume that fatigue is caused by heat stress (dust heap) and vibration (joint loosening), and wear-out by poor lubrication. The restored levels of (1a) to the extrinsic deteriorations of these failures are assessed as I_i = (0.8, 1, *, *). Moreover, the (1b) action may be packing replacement or internal defect processing, and the improvements are supposed to be d_i = (0.9, 0.8, *, *). Then m_1 = 0.6 × 0.8 + 0.4 × 1 = 0.88 and m_2 = 0.6 × 0.9 + 0.4 × 0.8 = 0.86, i.e. (m_1, m_2) = (0.88, 0.86).

3.2. Maintenance times evaluation

The size of the maintenance time affects the effectiveness of systems. It is also a concrete index of the maintainability of systems. To plan a PM policy based on availability, evaluating the needed times in maintenance for
the PM actions is necessary. In general, the maintenance time can be defined as the sum of the durations of the following subtasks:

(1) The access time (t_1): the amount of time required to gain access to the maintained components, i.e. the preparation time of preliminary jobs in maintenance. For example, the removal of maintenance obstacles, such as disassembling panels or covers, belongs to this item.
(2) The inspection or diagnosis time (t_2): the amount of time required to troubleshoot and to determine the cause of degradation or failure. It includes the time for setting up diagnostic instruments and is also referred to as fault isolation time.
(3) The repair and/or replacement time (t_3): only the actual hands-on time to complete the restoration process once the problem has been identified and access to the degraded/failed components obtained. Its size is decided according to the actual job contents in maintenance.
(4) The verification and alignment time (t_4): the time for assembling all the dismantled parts to validate the restoration, plus the time for the alignment check that ensures the unit has been returned to an operational condition.

According to the four time items, the PM time can be reasonably estimated if the possible maintenance jobs are preset accurately in advance. For any SC, it is expressed as

\[ t_a = \sum_{i=1}^{4} t_i \tag{11} \]

On the other hand, the CM time must additionally include the supply delay and the maintenance delay, besides the items considered in PM, since failures occur unexpectedly. Supply delay consists of the total delay time in obtaining the necessary spare parts or components to complete the restoration process. Maintenance delay is the time spent waiting for maintenance resources or facilities. Any delay in waiting for spares, additional personnel, test equipment, and so on, is either supply delay or maintenance delay. Thus, the CM time of SCs can be expressed as

\[ t_b = \sum_{i=1}^{6} t_i \tag{12} \]

where t_5 and t_6 stand for the supply delay and the maintenance delay, respectively. Once the times are evaluated, the PM scheduling proceeds following the constructed reliability model.

4. Maintenance planning

The reliability of a system can often be increased through a PM program. Such a program not only can reduce the unexpected failures of systems but also has a significant impact on the system life. To schedule a PM program by availability, the mathematical expression of availability must first be given.

Normally, availability depends on both reliability and maintainability. A concrete expression of the operational availability uses the mean up-time (MUT) and the mean down-time (MDT) of each cycle. It is defined as [20]

\[ A = \frac{\mathrm{MUT}}{\mathrm{MUT} + \mathrm{MDT}} \tag{13} \]

Considering the periodical replacement problem, the MUT can be expressed as

\[ \mathrm{MUT} = t_p - t_b \int_0^{t_p} h(t)\,dt \tag{14} \]

where t_p, t_a and t_b are the PM interval, the PM time and the CM time on replacement, respectively, and h(t) is the hazard function. The MDT is defined as

\[ \mathrm{MDT} = t_a + t_b \int_0^{t_p} h(t)\,dt \tag{15} \]

Substituting Eqs. (14) and (15) into Eq. (13), it can be rewritten as

\[ A = \frac{t_p - t_b \int_0^{t_p} h(t)\,dt}{t_p + t_a} \tag{16} \]

Subsequently, the PM interval that maximizes the availability can be derived by differentiating Eq. (16) with respect to t_p:

\[ \frac{dA}{dt_p} = 0 \tag{17} \]

The result of the differentiation is

\[ (t_p + t_a)\,h(t_p) - \int_0^{t_p} h(t)\,dt = \frac{t_a}{t_b} \tag{18} \]

Naturally, the optimal t_p satisfies the above equation. For a multi-component system, the t_p (optimal replacement time) of each SC can be derived from Eq. (18) once the related parameters t_a, t_b and h(t) are given. If the SCs were each replaced according to their own t_p, the system's availability would be greatly reduced because the system would be shut down too frequently. To avoid this problem, we choose the minimum among the t_p of the SCs as the PM interval of the system, i.e. the system PM interval T = min (the SCs' t_p). On the other hand, the SCs with t_p > T are taken with (1a) and (1b) at this time.

While scheduling the PM program, two problems arise. The first is whether the other SCs (with t_p > T) need to be maintained at this time. The other is what actions should be adopted for these SCs. Here, the former is decided according to the status of reliability degradation. If the SC's reliability at the next PM stage is less than the set minimum reliability requirement,
i.e. R(2T) < R_min, the SC needs to be maintained at this stage. The latter is decided depending on the results of a maintenance-benefit analysis. The maintenance benefit of any SC on the jth stage is defined as

\[ B_{i,k} = \frac{\int_{t_j}^{\infty} R_{i,j+1}(t)\,dt - \int_{t_j}^{\infty} R_{i,j}(t)\,dt}{C_{i,k}} \tag{19} \]

where the subscripts i and k denote the ith SC and the three actions, respectively. The numerator indicates the extended life of SC_i by action k. The denominator is the corresponding maintenance cost. The action that leads to the maximum maintenance benefit, i.e. B_i^* = max(B_{i,k}), is selected for the SC.

As soon as the actions of the SCs are established, the system availability at any stage can be calculated. It is

\[ A_{s,j} = \frac{\mathrm{MUT}_{s,j}}{\mathrm{MUT}_{s,j} + \mathrm{MDT}_{s,j}} = \frac{T - \sum_{i=1}^{n} t_{b,m} \int_{t_{j-1}}^{t_j} h_{i,j}(t)\,dt}{T + \sum_{i}^{n} t_{i,k,a}} \tag{20} \]

5. Example

5.1. Problem formulation

A mechatronic system consisting of five SCs [(1) control, (2) power, (3) transmission, (4) sensing, and (5) tool] is used as an example to explain the procedure of PM scheduling. The reliabilities of the SCs are formulated using the Weibull function, because the Weibull is among the most useful probability distributions in reliability. The system reliability is modeled by the AGREE method [19], because it is applicable to a general system that can be decomposed into a series of independent SCs. It is defined as

\[ R_s(t) = R_C(t) \prod_{i=1}^{5} \bigl\{ 1 - \alpha_i \,[\,1 - R_i(t)\,] \bigr\} \tag{22} \]

where α_i is the probability of system failure due to failure of the ith SC. In particular, the system combination becomes a series system if α_i = 1. R_C(t) denotes the reliability of the remaining part excluding the five SCs and is modeled by an exponential function with a constant failure rate, i.e. R_C(t) = exp(-λt). It could be regarded as the inherent failure rate of the system and can only be resolved by (1C). Here, it is set to λ = 0.0002.

Table 1
The supposed parameters of the subsystems in the example

SCs   α_i   θ_i    β_i   MTBF   t_p    m_1   t_a (PM)   C_2p
1     0.5   1300   1.8   1155    761   0.8   30         $180
2     0.6   2400   2.5   2127   1278   0.8   50         $240
3     0.6   2600   3.2   2326   1408   0.9   70         $400
4     0.6   3800   3.1   3395   2068   0.8   60         $320
5     0.5   2000   3.1   1787   1066   0.8   80         $260

m_1 = m_2; t_b (CM) = 3t_a; C_1a = 0.3C_2p; C_1b = 0.6C_2p.

Table 3
The PM schedules of the system in the example (t_b = 3t_a)

Stage   Time (h)   PM actions (SC 1 to 5)   Availability   Cost (PM)   Cost (CM)
j = 1    761       3 1 0 0 1                0.88            330        137
j = 2   1522       3 2 3 0 3                0.71            984        339
j = 3   2283       3 3 0 3 1                0.74            818        363
j = 4   3044       3 1 3 0 3                0.73            912        284
j = 5   3805       3 2 0 0 1                0.84            402        239
j = 6   4566       3 3 3 3 3                0.62           1400        510

Average availability = 0.76; Total cost = 4846 (PM) + 1872 (CM) = $6718.
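As a rough numerical check of Eq. (18), the sketch below solves it by bisection for each subsystem, using the θ_i, β_i and t_a values of Table 1 with t_b = 3t_a. It assumes a plain two-parameter Weibull hazard with no initial failure-rate term and no maintenance improvement, which is a simplification made here for illustration and not necessarily the exact model of the paper; the function names are hypothetical. Under that assumption the solved intervals should come out close to the t_p column of Table 1.

```python
def weibull_hazard(t, theta, beta):
    """Weibull hazard h(t) = (beta/theta) * (t/theta)**(beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1.0)


def weibull_cumulative_hazard(t, theta, beta):
    """Integral of h from 0 to t, which is (t/theta)**beta."""
    return (t / theta) ** beta


def optimal_pm_interval(theta, beta, t_a, ratio=3.0, upper=1.0e5, tol=1.0e-6):
    """Solve Eq. (18): (tp + ta) h(tp) - H(tp) = ta/tb, with tb = ratio * ta.

    Assumes beta > 1, so the left-hand side grows monotonically from 0
    and plain bisection on [0, upper] is sufficient.
    """
    target = 1.0 / ratio  # ta / tb

    def g(t_p):
        return ((t_p + t_a) * weibull_hazard(t_p, theta, beta)
                - weibull_cumulative_hazard(t_p, theta, beta)
                - target)

    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# (theta_i, beta_i, t_a) for SC1..SC5, copied from Table 1.
TABLE_1 = [(1300, 1.8, 30), (2400, 2.5, 50), (2600, 3.2, 70),
           (3800, 3.1, 60), (2000, 3.1, 80)]

intervals = [optimal_pm_interval(th, b, ta) for th, b, ta in TABLE_1]
system_interval = min(intervals)  # T = min over the SCs' t_p

print([round(t) for t in intervals], round(system_interval))
```

Bisection is used only because it needs no external library; any bracketing root finder would serve equally well.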
The reliability parameters of the subsystems, θ_i and β_i, can usually be acquired by numerical analysis of experimental data from reliability testing. A rapid way of evaluating these parameters is through their key components. The determination of key components is described in Ref. [6]. Moreover, the maintenance-related factors m_1, m_2, t_a and t_b can be estimated following the previously introduced methods. For example, for the control subsystem, the possible failures may be parts aging or local functions disabled. The possible failure probabilities and the restored levels of (1a) and (1b) are assumed to be p_{f,i} = (0, 0, 0.5, 0.5), I_i = (*, *, 0.7, 0.9) and d_i = (*, *, 0.8, 0.8), respectively, so m_1 = m_2 = 0.8 (Eqs. (8)–(10)). In addition, the needed times in maintenance can also be obtained; they are assumed to be t_a = 30 h and t_b = 90 h (Eqs. (11) and (12)). Similarly, the related factors of the other subsystems can be derived. The supposed related parameters (α_i, θ_i, β_i, t_a, t_b, m_1, m_2, C_1a, C_1b and C_2p) and the maintenance costs in the example are listed in Table 1. Here, t_a and C_2p indicate the times and costs of preventive replacement for the SCs. The scale factor in time is set to f = 3 (i.e. t_b = 3t_a). The expected life of the system is set to T_L = 5000 h. The initial reliabilities of the SCs are all set to R_0 = 0.999. The minimum reliability for judging whether to maintain or not is set to R_min = 0.8.

According to the given parameters, the optimal PM interval of each subsystem can be obtained. They are

\[ t_p = \{761,\ 1278,\ 1408,\ 2068,\ 1066\} \]

The PM interval of the system is therefore T = 761. Next, the maintenance benefits are calculated for choosing the PM actions. According to the given parameters in Table 1, the maintenance benefits of the different PM actions can be calculated by Eq. (19) (Table 2). The optimal PM contents, availabilities and costs of the system are recorded in Table 3. According to the calculated results, the average availability and the total cost of the system are 0.76 and $6718, respectively. For example, SC2 undergoes (1a)- and (1b)-maintenance once each before (2P)-maintenance. The reliability changes of the system and of the corresponding subsystems in the example are shown in Figs. 4 and 5, respectively. Moreover, for judging the validity of the algorithm, we reset the t_p = 500 and 900 h to

Table 2
The maintenance benefits of the subsystems under various PM actions

Stage   Actions   Subsystems
                  1      2      3      4      5
j = 1   1a        2.67   3.18   4.03   0.64   4.15
        1b        3.50   2.11   2.17   0.40   2.43
        2P        3.71   3.12   1.89   1.37   2.89
j = 2   1a        2.67   5.47   5.34   4.93   4.24
        1b        3.50   5.53   4.96   4.11   3.24
        2P        3.71   5.07   5.65   4.68   4.36
j = 3   1a        2.67   4.98   4.03   6.28   4.15
        1b        3.50   4.27   2.17   6.25   2.43
        2P        3.71   5.18   1.89   6.79   2.89
j = 4   1a        2.67   3.18   5.34   0.64   3.24
        1b        3.50   2.11   4.96   0.40   4.24
        2P        3.71   3.12   5.65   1.37   4.36
j = 5   1a        2.67   5.47   4.03   4.93   4.15
        1b        3.50   5.53   2.17   4.11   2.43
        2P        3.71   5.07   1.89   4.68   2.89

Fig. 5. The reliability changing of the subsystems in the example (t_b = 3t_a).
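The selection rule B_i^* = max(B_{i,k}) can be illustrated on the stage j = 1 rows of Table 2. The sketch below picks, for each subsystem, the action with the largest benefit, and then blanks out the subsystems that are not maintained at this stage. The `needs_pm` mask is an assumption read off the zeros in the j = 1 row of Table 3 (interpreted here as "no action"); in the paper that decision comes from the reliability check against R_min, which is not reproduced here. All names are hypothetical.

```python
# Stage j = 1 maintenance benefits for SC1..SC5, copied from Table 2.
BENEFITS_STAGE_1 = {
    "1a": [2.67, 3.18, 4.03, 0.64, 4.15],
    "1b": [3.50, 2.11, 2.17, 0.40, 2.43],
    "2P": [3.71, 3.12, 1.89, 1.37, 2.89],
}


def best_actions(benefits):
    """Return, per subsystem, the action k that maximizes B_{i,k}."""
    n_subsystems = len(next(iter(benefits.values())))
    return [max(benefits, key=lambda action: benefits[action][i])
            for i in range(n_subsystems)]


def stage_plan(benefits, needs_pm):
    """Keep the best action only where the subsystem is maintained."""
    return [action if flag else None
            for action, flag in zip(best_actions(benefits), needs_pm)]


# Assumption: SC3 and SC4 are skipped at stage 1 (zeros in Table 3).
needs_pm = [True, True, False, False, True]

best = best_actions(BENEFITS_STAGE_1)
plan = stage_plan(BENEFITS_STAGE_1, needs_pm)
print(best)
print(plan)
```

With these inputs the maintained subsystems receive 2P, 1a and 1a, which agrees with the entries 3, 1 and 1 in the first row of Table 3 if the codes 1, 2, 3 are read as (1a), (1b), (2P).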
the increase of the scale factor. The results are very consistent with the real case.

The work was supported by a grant from the National Science Council under contract No. NSC 91-2212-E-237-001. The authors would like to thank the reviewers for their valuable suggestions.

Table 6
The PM schedules of the system in the example (t_b = 4t_a)

Stage   Time (h)   PM actions   Availability   Cost (PM)   Cost (CM)

Table 7
The availabilities and costs of the system on different time ratios

Time ratio    Availability   Cost (PM)   Cost (CM)   Total cost
t_b = 2t_a    0.78           5104        1058        $6162
t_b = 3t_a    0.76           4846        1872        $6718
t_b = 4t_a    0.74           4088        2996        $7084
t_b = 5t_a    0.67           4712        4320        $9032
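The trend reported in Table 7 can be checked with a few lines of arithmetic. The sketch below recomputes the total cost of each row as PM cost plus CM cost and confirms that availability falls while CM cost and total cost rise as the ratio t_b/t_a grows. The column meanings (availability, PM cost, CM cost) are an interpretation of the table based on the matching t_b = 3t_a figures in Table 3.

```python
# Rows of Table 7, keyed by the ratio t_b / t_a:
# (average availability, PM cost, CM cost).
TABLE_7 = {
    2: (0.78, 5104, 1058),
    3: (0.76, 4846, 1872),
    4: (0.74, 4088, 2996),
    5: (0.67, 4712, 4320),
}

ratios = sorted(TABLE_7)
availability = [TABLE_7[r][0] for r in ratios]
cm_cost = [TABLE_7[r][2] for r in ratios]
# Total cost per row is the sum of the PM and CM costs.
total_cost = [TABLE_7[r][1] + TABLE_7[r][2] for r in ratios]


def strictly_increasing(values):
    """True if each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


print(total_cost)
print(strictly_increasing(total_cost),
      strictly_increasing(cm_cost),
      strictly_increasing(availability[::-1]))
```

Note that the PM cost alone is not monotone in the ratio; the growth in total cost is driven by the CM cost.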