Practical Considerations in Developing An Instrument-Maintenance Plan
0018-9529/89/0600-0253$01.00 © 1989 IEEE
Section 2 identifies the maintenance time-constraint problem, provides some sources of information for deciding which instruments should be on the PM schedule, contains some sample calculations on time spent for PM, and looks at the HFIR staffing specifically to determine whether, and by how much, the maintenance goals will fall behind schedule with the existing repair staff. Section 3 broadly discusses sensor validation and covers such issues as cause-consequence relations, detection of faulty sensors, use of smart sensors, redundancy, and in-place calibration. Congestion periods, stability, and selected concepts from queuing theory are reviewed in section 4. Section 5 discusses some special maintenance considerations, including instrument dependency on the same power sources, human interaction with and observation of instruments, and signal use in control systems. Section 6 presents alternative approaches to PM that are based on likelihood vs severity of accidents.

on the list, should be added.
5. Instruments which are relied upon by other instruments might need to be on the PM list. These instruments can affect the failure rates of systems that depend on their readings [1].

The HFIR PM schedule was developed through a joint meeting of a representative of the HFIR Operations Division, the HFIR Field Engineer, and the Instrument Foreman for the Instrumentation & Controls (I&C) Division. The Operations Division representative provided input about the instruments most critical to keeping the reactor operating. The Field Engineer advised as to which areas of the HFIR had been most in need of maintenance, and the Instrument Foreman provided general knowledge on the repair rates of instruments in both reactor and non-reactor plants. Where a manufacturer's specification called for recalibration at a given interval (eg, annually), that frequency was used as a lower bound, so that such an instrument might be on a 6- or 12-month PM schedule. Appendix A explains what instruments in the HFIR primary pressure system are on the PM schedule.
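The recalibration rule just described can be sketched as a small function. This is a hypothetical illustration, not the HFIR procedure. It assumes that the only standard PM intervals are 6 and 12 months. It also assumes the planner picks the longest standard interval that is no longer than the manufacturer's recalibration interval, so the instrument is never serviced less often than the manufacturer requires. The names `STANDARD_PM_INTERVALS_MONTHS` and `pm_interval_months` are illustrative only.

```python
# Hypothetical sketch: names and the interval menu are assumptions, not HFIR data.
STANDARD_PM_INTERVALS_MONTHS = (6, 12)  # assumed menu of PM schedules


def pm_interval_months(manufacturer_interval_months: float) -> int:
    """Return the longest standard PM interval that still services the
    instrument at least as often as the manufacturer requires.

    The manufacturer's recalibration frequency is a lower bound on service
    frequency, so its interval is an upper bound on the PM interval.
    """
    allowed = [m for m in STANDARD_PM_INTERVALS_MONTHS
               if m <= manufacturer_interval_months]
    if not allowed:
        # No standard schedule is frequent enough; flag for special handling.
        raise ValueError(
            f"no standard PM interval <= {manufacturer_interval_months} months")
    return max(allowed)


if __name__ == "__main__":
    # Annual spec -> 12-month schedule; 2-year spec is still capped at 12;
    # a 6-month spec stays at 6.
    for spec in (12, 24, 6):
        print(spec, "->", pm_interval_months(spec))
```

Under these assumptions, an annual specification maps to the 12-month schedule and a 2-year specification is still serviced every 12 months. A specification shorter than 6 months raises an error instead of being silently assigned a schedule that is too long.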
were subsequently completed in 1987 August. The best time estimate for PM at the HFIR is approximately 90-110 hours/month.

Experience with the HFIR has shown that most of the instrument breakdown reports were written at night while the reactor was operating. The I&C Division personnel received 3-4 breakdown reports per night, with an average of 3-4 hours of unscheduled maintenance for each instrument report. Thus unscheduled maintenance averaged about 12 hours/night. Many of the breakdown reports were false alarms, in that the reactor operators were concerned about a reading they were receiving from some instrument and wanted it checked, but upon examination the instrument was in working order.

Based on experience with the HFIR and other reactors and treatment plants operated by ORNL, the instrument foreman uses a rule-of-thumb of assigning 250 instruments/technician. This figure is not written in any operating procedures or manual but is based solely on manpower experience. Using some of these average figures as background, it is useful to turn to the specific HFIR unscheduled maintenance and PM schedules from both a supply and demand perspective.

2.3.1 Demand for Maintenance Services

If unscheduled maintenance requires an average of 12 hours/day and the HFIR is kept operating 25 days/month, then approximately 300 working hours would be needed for unscheduled maintenance on a monthly basis. However, the objective of initiating a PM plan is to reduce the number of unanticipated breakdowns. Suppose the PM plan meets this objective, and the unscheduled maintenance requirement is cut in half to 150 working hours/month, viz, an average of 6 hours/day for the 25 days of operation.

If an average of 1.1 hours is spent on each of the 950 instruments on the HFIR PM schedule, then approximately 1045 hours are needed for this work. Distributed over a 12-month period, the 1045 hours amount to approximately 90 hours/month. If we consider special requests for equipment verification prior to experiments or other unique circumstances, then we might want to add a cushion of 10-20 hours/month to handle special PM. Thus we arrive at a figure of approximately 100-110 hours/month for PM, which fits the current best estimates of HFIR PM needs. Adding the 150 hours/month for unscheduled maintenance, 90 hours/month for routine PM, and 20 hours/month for special PM yields approximately 260 hours/month on the demand side.

2.3.2 Supply of Maintenance Services

Prior to 1988, the HFIR had one instrument technician to handle all the maintenance; however, from 1985-1988 the reactor was shut down. The HFIR now has 3 instrument technicians and an engineering technologist assigned to maintenance tasks for the HFIR. Assuming 22 work-days/month and 8 work-hours/workday, each technician is paid for approximately 176 hours/month. If all 3 technicians worked on HFIR maintenance directly, then the work would total approximately 528 hours/month.

The combination of vacation time, holidays, sick leave, and attendance at safety meetings takes up between 30-35% of the technicians' available work-time. If the technicians spend only 70% of their time on maintenance activities, the combined work-time for the 3 is approximately 370 hours/month (528 × 0.7 ≈ 370).

Each technician spends approximately 40% of the work-time on unscheduled maintenance and 60% on PM. Unscheduled maintenance takes 40% of the 370 hours/month, viz, 148 hours/month. That leaves 222 hours/month for PM under the instrument foreman's rule-of-thumb. Table 1 summarizes these results.

These calculations show that unscheduled maintenance is in nearly perfect equilibrium between demand (150 hours/month) and supply (148 hours/month). However, if all 3 technicians perform routine PM, the 222 hours/month supplied for PM appear to exceed the combined routine and special PM demand (110 hours/month) by about 112 hours/month. In fact, one of the technicians is expected to handle changes in equipment required for various experiments at HFIR.

If the HFIR repair staff were limited to 2 technicians, then they would be paid for approximately 352 hours/month and have 70% of that time (approximately 250 hours/month) to spend on actual instrument maintenance. Adding column 1 of table 1 yields 260 hours/month, so there might be a close match between the HFIR maintenance demand and the services supplied by 2 technicians.

Five other considerations also affect the manpower decisions for implementing a successful PM plan.

1. The 90-100 hours/month PM requirement might have left out the extra maintenance amenities that could be afforded with some extra time. For example, the technician might not have time to clean up oil or other materials used in his maintenance/calibration work. A third technician could be justified to allow for extra time to do a more thorough job with each instrument.
2. The instrument foreman's rule-of-thumb of 250 instruments/technician suggests that the HFIR staff would be slightly overworked with only 2 technicians. If the figures used in these calculations have omitted relevant work requirements that are considered in the foreman's rule-of-thumb, then HFIR maintenance requirements might need to include the part-time services of the engineering technologist.
3. Limiting the maintenance to 250 instruments/technician on the HFIR might be too generous: The entire HFIR safety system, with approximately 240 instruments, can be serviced in 3 days. Under the PM procedures written by the field engineer, some of these instruments can be tested/calibrated simultaneously. Thus one large group of instruments on the PM plan actually requires much less time for servicing than the average for the remaining HFIR instruments.
4. Timing is important. The restarted HFIR will run for 25 days and then require a 4-day shutdown for maintenance. Some of the PM activities can be executed only during reactor
256 IEEE TRANSACTIONS ON RELIABILITY, VOL. 38, NO. 2,1989 JUNE
TABLE 1
Demand and Supply of Maintenance Services for HFIR (hours/month)

                           Demand   Supply
Unscheduled maintenance       150      148
Routine PM                     90      222
Special PM                     20      (figured in above)

shutdowns, and it is possible that only 2 technicians, even if they are working overtime, might not complete their scheduled PM tasks during the 4-day shutdown. To analyze this constraint further, the list of PM tasks must be subdivided into: a) tasks that can be completed during reactor operations, and b) tasks that require a shutdown. The I&C Division inventory list of instruments is being updated to include this information.
5. Having additional personnel allows more time for filling out maintenance paperwork. A PM plan is only as good as the input it receives [2]. If a technician completes work on an instrument and fails to report the work, then the PM system acts as if that work has not been completed and omits the work from the cumulative totals. To cut down on manpower spent filing paperwork, a computer terminal has been installed on-site at the HFIR to record maintenance/service and to make the reporting requirements less tedious.

In the past, the HFIR shutdown period could last from 14 hours to 3 days. It generally overlapped with nights or weekends, and the Operations Division could not afford to pay overtime for PM. Because the instrument technician had little time to perform PM, a shutdown was required. The chance that I&C personnel might have 2 working days between the hours of 8:00 am and 4:30 pm was slim. Now the HFIR shutdown is anticipated to last a minimum of 4 days, which is announced in advance and can thus be planned around.

3. INSTRUMENT VALIDATION

The second major task in developing a preventive maintenance (PM) schedule is to look at alternative instrument validation techniques for allocating time. The most reasonable method of validating instruments involves local testing with monitors. For example, the field engineer or instrument technician tests display devices in the High Flux Isotope Reactor (HFIR) by disconnecting the devices from the system, connecting his own test equipment, and then applying a specified signal to see that the devices register the correct value.

Another method of instrument validation comes from determining the cause-consequence relations for the associated component failure. Knowledge of the consequences of component failures assists the planner in distinguishing actual events from faulty-sensor signals. If failure of a given component is known to cause an observable event, then failure to witness this event could indicate that the problem lies with the given sensor's signal rather than with a component failure.

It is often helpful to depict the cause-consequence relations graphically, either in the form of a semantic network or a fault tree. Fault trees impose a more rigid form on the relations by: 1) passing events through Boolean logic, 2) requiring the implications to hold in both directions so that one can move up or down the fault tree, and 3) incorporating restrictions on circular or overlapping branches of the tree. Appendix B contains a partial listing of cause-consequence relations that can be easily depicted in fault trees and are contained in the HFIR quality assurance documentation.

Once the cause-consequence relations of the components have been identified, it should be possible to isolate particular events stemming from abnormal conditions in the reactor. One method of detecting the events is direct observation; another is to rely on alarms or annunciators. These events should then be compared to distinguish the difference in appearance between an abnormal condition and a seemingly abnormal condition caused by sensor failure.

In the HFIR, the operator relies mainly on the instrument readings available in the control room. Only during his once-per-shift equipment inspections will he walk around the plant to check on other instrument readings. Thus, since operators rely more on sensor signals than on actual observations to determine abnormal conditions during reactor operations, it is helpful to discuss more rigorously how sensor failure can be incorporated into conventional risk-assessment studies.

Faulty sensors can lead to 2 patterns of observed failures: false positives and false negatives. In a false-positive pattern, sensor failure can trigger an alarm of some safety system even when the true state of the system is normal. In a false-negative pattern, the sensor can fail to register some abnormal system-condition and give instead the appearance that all components are working properly.

False positives from sensor failure can be incorporated into traditional fault-trees by including another parameter for sensor failure at each step where it can have an impact.

Example 1

The 2 initiating events, A and B, together cause some consequence, C1. A Boolean equation for this relationship is

$$A \wedge B \Rightarrow C_1 \qquad (3\text{-}1)$$

Notation
$\wedge$   Boolean AND operator
$\vee$   Boolean OR operator
A, B   initiating events
C1   a consequence event: An alarm sounds (alarm-trip)
C2   a consequence event: A separate alarm sounds
S1   event: Sensor s1 does not calibrate correctly
S2   event: Sensor s2 calibrates correctly and is working

The terms calibrate and working are a matter of degree.

Some faulty sensor can independently cause the consequence C1, which could be an alarm-trip. By joining an event S1 to the l.h.s. of (3-1) with a Boolean OR gate as shown in
GUTH: PRACTICAL CONSIDERATIONS IN DEVELOPING AN INSTRUMENT-MAINTENANCE PLAN 251
(3-2), an alternative source can be introduced to explain observations of C1 when the true component initiating events, A and B, have not both occurred:

$$(A \wedge B) \vee S_1 \Rightarrow C_1 \qquad (3\text{-}2)$$

Eq. (3-2) can be loosely translated as: "if (both A and B are failures) and/or (sensor S1 fails) then an alarm is tripped, C1". □

Example 2

False negatives can be modeled in a similar fashion with the Boolean AND operator and a sensor-failure parameter. Begin with a causal relation similar to (3-1), except that now we are interested in showing how another consequence, C2, might not be observed even when both A and B have occurred, as shown in (3-3).

$$(A \wedge B) \wedge S_2 \Rightarrow C_2 \qquad (3\text{-}3)$$

That is, (3-3) shows that occurrence of both A and B is necessary but not sufficient to cause C2 to occur. With the addition of S2, the associated sensor must be both calibrated and working properly for the anticipated consequence C2 to occur. Otherwise, even if both A and B occur, the consequence C2 does not occur. □

Discussion of Examples

The numerical values related to S1 & S2 derive from estimates of instrument failure-rates. Any instrument that remained in perfect calibration/repair would have Pr(S2) = 1 and Pr(S1) = 0. However, S1 & S2 refer to 2 distinct relationships. The S1 value might come from one particular instrument and the S2 value from another. If S1 & S2 are both based on the same instrument, then that sensor might be sufficiently far from calibration to cause C1 to occur. In terms of probabilities derived from relative frequencies, an analyst could assign (occur, NOT occur) values of (0.1, 0.9) to S1 and (0.8, 0.2) to S2. For additional explanations, see [3].

On the HFIR, one example of a false-positive signal is the fairly frequent (monthly) spurious trip of an annunciator due to electrical noise. We might know that the intended cause for the annunciator to go off is the joint event (A AND B). When the consequence (annunciator going off) is observed, but not the causes, then a preliminary hypothesis for the observation is sensor failure. The sensor might be picking up electrical noise, or there might be a fault in the electrical system. If a particular annunciator, on the average, goes off 13 times a year, and 2 of the 13 times are spurious, then the probabilities assigned to S1 are (2/13, 11/13).

An example of a false negative on the HFIR has occurred on the resistance-bulb calibration for the cooling tower. On several occasions, 1 of the 4 resistance bulbs has gone out of calibration, usually indicating a low temperature. If the fans are not on, or are not working properly, then information about rising water-temperatures is not correctly conveyed through the uncalibrated resistance-bulb. If, on the average, the resistance bulb was working properly on 275 out of 280 days of operation, then the probabilities assigned to S2 are (275/280, 5/280). This logic implies that in 5/280 trials, the consequence (a signal that the fans are not properly controlled) would not appear even if the true reactor component state were abnormal. □

Few, if any, instruments on the HFIR have failure rates with known or relevant frequencies. Most instrument breakdowns on the HFIR are unique and hence do not lend themselves to probabilistic calculations on failure rates. On the other hand, many instruments on the HFIR need a 2-year calibration cycle. If the instrument is calibrated annually, then the operator can rest assured of the accuracy of the signals coming into the control room. If the routine calibration is delayed to 2 years, then the sensor signals become more questionable towards the latter part of the cycle.

Only one type of instrument on the HFIR, an early design of an operational amplifier using an electro-mechanical chopper, has breakdowns that approach a pattern. At one time the HFIR had about 100 such amplifiers in service. After the breakdown pattern was observed, it was found that they could be replaced by then state-of-the-art integrated circuits for less money than the repair/upkeep on the old amplifiers. Therefore, advances in microelectronics led to cost savings by substitution of another type of instrument rather than repair of the existing type.

Using risk analysis for instrument validation requires collecting data on individual sensor failure-rates or aging processes. In general, the failure rate and aging process depend on the routine PM plan, so some simultaneity-bias enters these figures. The lifetime of an instrument can be extended through routine maintenance/repair, even beyond the lifetime guaranteed by the manufacturer.

One source of statistics on failure or aging rates of instruments is the log-books of repairmen, eg, service hours, nature of the problem, time for repair, frequency of repairs. For HFIR sensors, the data on failure rates are not very complete. The HFIR has been in operation since ca 1965. The first system of collecting and recording information on system repairs (MAINS) went into effect about 1976. The instrument system was changed to MAJIC ca 1986 January. MAJIC had some debugging problems with the entry of data; it did not get back into satisfactory operation until 1986 October. Thus 10 years of data are missing from the first years of HFIR operations. The data on MAINS are available, with some effort, on hard copy. The data on MAJIC for the first 8 months apparently contain some gaps. Thus there exists no consistent data set on HFIR maintenance, and what data do exist might not be readily accessible or reliable, because certain information on repair incidence was not recorded.

What emerges from applying risk analysis to the PM scheduling problem is a perspective of ranking various accidents/events that could occur if the sensors are not properly serviced/maintained. Risk assessment combines probabilities-of-events with their seriousness to arrive at a risk factor. By focusing on these 2 variables, it is possible to re-examine the design of a PM plan with a view toward eliminating all
assumes that the breakdown follows a Weibull or lognormal distribution. Without going into detail on the properties of each distribution, maintenance planners may find it helpful to see figure 1, which shows survivor functions for 3 distributions (schedules).

Schedule A shows a knife-edged distribution around some mean service time, Tm. The pattern implies that the overwhelming majority of individual instruments (of that type) require service at or near time Tm. This distribution corresponds to an instrument with a well-known breakdown time and little deviation from that time.

Schedule B shows an exponential distribution, which might apply to an instrument with a variety of moving parts that can malfunction. Or, the instrument might depend on many ad-
instruments requiring recalibration after being placed in operation on the HFIR are generally accessible, and they have local adjustment capabilities.

Sensors or instruments with built-in compensation capabilities could help to correct sensors that are subject to fluctuations in electrical current, or in air pressure for pneumatic instruments. A potential application at the HFIR is for signals based on other signals, such as heat power, for which a 2% deviation in the temperature and flow probes can lead to an 8% deviation in the heat-power calculation. The related issue is how much you are willing to pay to have your heat-power signal reduced to a maximum variation of 1% instead of 8%.

Another conceptual issue for PM deals with the availability of spare components in inventory and the ability to service the instrument during normal operations. One-of-a-kind instruments generally have no inventory spares, and they often require a reactor shutdown before any maintenance can take place. However, many of the instruments on the safety, servo, and counting channels of the HFIR are tracked in triplicate. As a result, one panel of instruments can be removed from service during normal operations and serviced while the reactor is operating.

Finally, tradeoffs between in-place calibration/repair and removing the instrument and taking it to a shop should be included in calculations for sensor PM requirements. Moreover, some sensors require a system shutdown or the removal of various obstacles before they can be serviced. Thus while actual repair time on a particular instrument might take only 2 hours, it might take a day to remove obstacles, and up to a week before the shop has time to work on the instrument.

4. SEARCH FOR CONGESTION PERIODS

The objective of designing a routine preventive maintenance (PM) plan is to avoid situations in which an engineer or repairman has too many instruments or sensors to service/maintain at a given time. The incidence of congestion periods generally depends on the repair frequency of the sensors under study, the number of instruments or sensors in the study, and the priority for working on the sensors.

PM priority should be given to those sensors and instruments whose failure can cause the most serious, as well as the most frequent, consequences. Safety considerations must prevail over convenience or cost. Consequently, development of a PM plan must consider the effects of various accident-related scenarios stemming from instrument failure. The PM plan might pose questions such as: What is the worst accident that can happen if the instruments are serviced in the current priority ranking? What is the worst event that can occur if the priority ranking is changed? After several iterations the most important sensors should be identifiable.

Time flexibility in repairs and redundancy of sensors are 2 important factors in eliminating congestion periods. Where the sensors are redundant or a sensor can be serviced without affecting operations, congestion periods can usually be alleviated if not eliminated. The redundancy can take the form of either at least 2 identically functioning instruments on-line at a given time, or an inventory of spare parts.

On the High Flux Isotope Reactor (HFIR), the instruments and sensors that form the safety system are tracked in triplicate, and a safety-trip requires 2-out-of-3 to activate. Thus when one of the sensors needs PM or repair, it can be taken out of service while the HFIR is operating. Scheduling-time constraints pose the greatest difficulty on the HFIR for those sensors and instruments that have no redundancy. The non-redundant instruments include such parts as the chemical treatment and deaerator. PM on the sensors associated with the primary pressure-system also requires scheduling during a shutdown, since the pressure-system sensors have no built-in redundancy.

In looking at congestion problems it is helpful to take some techniques from queuing theory. If an instrument needs calibration or service but no repairman is available, then it is added to the waiting queue. The service times for the instruments are random variables. Three important parameters to consider from queuing theory are:

1. The waiting time for repair on each instrument
2. The busy period during which one or more repairmen are busy
3. The queue size (number of instruments in the queue).

Two aspects of PM on equipment distinguish it from other queuing processes:

1. The possibility of PM introduces a simultaneity: the PM required to keep the system working is a function of the amount of PM. This characteristic further implies some control over the unanticipated nature of breakdowns, so that these instances can be controlled or reduced [8].
2. There is a finite population that can potentially break down. Once all of these have broken down and are in the queue for repair, no more can enter the system. For most other queuing applications the effective population is infinite.

When developing a program to ensure some PM objective (eg, all the sensors related to the primary pressure on the HFIR are properly calibrated and in working order), the PM schedule must be integrated with the service requirements for the rest of the instruments. Viewed as a separate plan to achieve some special objective, the PM plan might not reveal congestion periods that would be evident when it is viewed as part of an overall schedule.

Once a preliminary PM schedule is developed, the stability of the plan should be tested by adding some unanticipated failures of instruments. These additional exogenous shocks can show what points in the PM schedule have sufficient flexibility to accommodate unscheduled maintenance. For real PM plans, the number of unanticipated breakdowns in instruments is inversely related to the time spent on PM.

Borrowing some concepts from perturbation theory, the planner could adjust the parameters of the model - mean time for routine PM, mean time for unscheduled maintenance, number of instruments needing special PM, number of instruments needing routine PM, number of instruments in the
1 . Common causes for degradation or failure that defeat notices a particular instrument or sensor malfunction, yet fails
redundancy. Instruments share a common power source, to take corrective action immediately. The observer might con-
shelf/location, circuits, cooling source, etc. The High Flux clude that the instrument is unimportant and need not be
Isotope Reactor (HFIR) has been checked for common reliance calibratedhepaired immediately. Thus the instrument is allow-
by instruments on power source or other attributes. As a general ed to remain unrepaired until such time as: a) a reading from
rule, similar instruments or sensors have been designed to rely that instrument is actually needed, or b) some accident occurs
upon different electrical sources so that a failure in one area whence the instrument is needed to correct the system state.
does not affect the sensors in another. For example, the in- In developing a PM plan, it is helpful to determine the impact
struments in the safety system are tracked in triplicate. Thus of allowing a bad sensor to remain unrepaired.
core-inlet temperature is measured on panels A, B, C , with The spare instruments kept in inventory at the HFIR can
power source and wiring for panel A physically independent fall into this category of disuse and disrepair. The value of
of panels B, C. redundant or spare parts is questionable if they are not known
A power failure to one of the panels could affect all the to be in proper working condition. To resolve this issue, the
instruments on that panel. Each of the panels is connected to inventory of spare instruments has been added to the PM plan.
a separate battery bank that supplies electrical power to the panel Now the I&C Division maintains closer control over the work-
in the event of a utility power failure. The HFIR has backup ing condition of the spares.
generators that require a failure-to-start before relying solely 5. The manner in which signals from a sensor are used by
on the reserve electricity in the battery banks. the system. Some sensor signals feed directly to: a) a control
The HFIR has one important exception to the separate system, b) a recorder, or c) a local display only. Some sensors
power source & circuitry rule: The process systems, which in- that serve only as local gauges or instruments, and which were
clude the cooling-tower temperature-control and the pH treat- placed on the reactor only for convenience, have been left off
ment of primary water, do share a common power source. the PM plan. From a risk-analysis perspective, studying various
Moreover, in order to service a particular instrument in one accident-related scenarios for each instrument left off the PM
of the process systems, it might be necessary to open a circuit schedule could help predict whether such an instrument would
breaker. Although drawings exist that show the relation between be needed in a crisis. However, all instruments on the safety
a particular breaker and its associated instruments, no one has system that are part of any control system and that give output
studied the total impact on the HFIR facility when a family of signals to the control room have been placed on the PM plan.
instruments is taken out of service, for example, when a circuit breaker is opened.
Tylee [9] evaluates the functional redundancy approach to detecting instrument failures in nuclear power plant instrumentation. His real-time method uses a bank of Kalman filters for each instrument to generate optimal estimates of the plant state. By performing consistency checks among the outputs of the appropriate filters, Tylee can identify failed instruments.
2. The number of instruments to be serviced. This number partially determines the manner in which PM is undertaken. The HFIR has 1132 instruments on the I&C Division inventory list, and of this total, approximately 850 are on the programmed PM schedule. The vast number of instruments in a reactor may limit both the individual attention paid to each instrument and the variety of problems that can be checked. Given a long queue of instruments waiting for PM, a reactor instrument technician would likely have less time to spend on individual instruments and might feel pressured to complete his PM tasks. An analogy is preparing meals: the way a person serves a meal to one person differs from the way he would serve meals to an entire family. The method of serving a family, in turn, differs from the method of serving over 1000 employees.
3. The number of instruments on the PM list. This number, more so than just the time constraint, affects a) decisions about purchasing tools for on-site repairs vs shop repairs, as well as b) the number of employees in the PM program.
4. The extent of human interaction with the instruments. A sensor can continue giving bad readings until it is observed by some operator or technician. In one scenario, an observer

6. ATTITUDES TOWARD RISK

Preventive maintenance (PM) is not free. In general, the anticipated benefits of increased PM must be weighed against the costs. Risk assessment commonly assumes that some forms of risk, no matter how intolerable, cannot be completely eliminated. Risk assessment often delivers a list of alternatives that can reduce the probability of some accident/event so that its risk factor is acceptable. Risk factor is:

$F_i = P_i \cdot S_i$

Notation

$F_i$ risk factor of an event
$P_i$ probability of the event
$S_i$ severity of the event

The planner for the PM schedule has an objective function that uses both the severity and the probability of an accident to influence which instruments are on the PM schedule and what their priorities are. The assumptions about risk attitudes help explain why some managers use more PM than others do.
Consider the operation of an engineering system with no PM plan and only one type of accident. The system components are repaired under a bare-bones approach. The cost of an accident is measured by its severity and, like the benefits, is quantified in dollars.
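The risk-factor idea can be turned into a simple ranking rule for the PM schedule. The sketch below assumes the multiplicative form $F_i = P_i \cdot S_i$. It flags events whose risk factor exceeds an acceptable level and reports, for each flagged event, the largest probability that would make its risk acceptable at the same severity. The class and function names (`Event`, `prioritize`, `max_acceptable_probability`) and all numbers in the demo are hypothetical illustrations, not HFIR data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An accident/event that PM on some instrument is meant to prevent."""
    name: str
    probability: float  # P_i, probability of the event
    severity: float     # S_i, severity of the event in dollars

    @property
    def risk_factor(self) -> float:
        # Assumed multiplicative form: F_i = P_i * S_i
        return self.probability * self.severity


def prioritize(events: list[Event], acceptable: float) -> list[Event]:
    """Events whose risk factor exceeds `acceptable`, highest risk first."""
    flagged = [e for e in events if e.risk_factor > acceptable]
    return sorted(flagged, key=lambda e: e.risk_factor, reverse=True)


def max_acceptable_probability(event: Event, acceptable: float) -> float:
    """Largest P_i that brings F_i down to `acceptable` at unchanged severity."""
    return acceptable / event.severity


if __name__ == "__main__":
    # Hypothetical figures for illustration only; not HFIR data.
    events = [
        Event("sensor A", probability=0.02, severity=500_000.0),
        Event("sensor B", probability=0.10, severity=20_000.0),
        Event("sensor C", probability=0.30, severity=1_000.0),
    ]
    for e in prioritize(events, acceptable=1_000.0):
        target = max_acceptable_probability(e, 1_000.0)
        print(f"{e.name}: F = {e.risk_factor:,.0f}, reduce P to <= {target:.4f}")
```

In this sketch, a risk-averse planner would lower the `acceptable` threshold, which puts more instruments on the PM schedule.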
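The equations that define the s-expected utility $E\{U\}$ of running without PM are not reproduced in this excerpt. The sketch below therefore assumes the one-accident form $E\{U\} = p\,U(B-C) + (1-p)\,U(B)$, where $B$ is the benefit, $C$ is the dollar cost (severity) of the accident, and $p$ is its probability. This form reproduces the derivative expressions in (6-3b) if the denominator $U$ there is read as $E\{U\}$. Comparing the two elasticities is then equivalent to inequality (6-4), because $p$, $C$, and $E\{U\} > 0$ cancel. The helper names and the example utility functions ($x^2$, $\sqrt{x}$, $x^{0.9}$) are illustrative assumptions.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def expected_utility_no_pm(u: Fn, benefit: float, cost: float, p: float) -> float:
    """Assumed form: E{U} = p*U(B - C) + (1 - p)*U(B)."""
    return p * u(benefit - cost) + (1.0 - p) * u(benefit)


def elasticity_p(u: Fn, benefit: float, cost: float, p: float) -> float:
    """|dE{U}/dp| * p / E{U} = [U(B) - U(B - C)] * p / E{U}."""
    eu = expected_utility_no_pm(u, benefit, cost, p)
    return (u(benefit) - u(benefit - cost)) * p / eu


def elasticity_c(u: Fn, du: Fn, benefit: float, cost: float, p: float) -> float:
    """|dE{U}/dC| * C / E{U} = p * U'(B - C) * C / E{U}."""
    eu = expected_utility_no_pm(u, benefit, cost, p)
    return p * du(benefit - cost) * cost / eu


def condition_6_4(u: Fn, du: Fn, benefit: float, cost: float) -> bool:
    """Inequality (6-4): [U(B) - U(B - C)] / C > U'(B - C)."""
    return (u(benefit) - u(benefit - cost)) / cost > du(benefit - cost)


def prefers_lottery(u: Fn, p_win: float, prize: float, certain: float) -> bool:
    """True if the lottery (prize with prob p_win, else $0) beats the sure amount."""
    return p_win * u(prize) + (1.0 - p_win) * u(0.0) > u(certain)


# Illustrative utility functions (assumptions, not from the paper).
def u_convex(x: float) -> float:    # prefers risk
    return x * x


def du_convex(x: float) -> float:
    return 2.0 * x


def u_concave(x: float) -> float:   # risk averse
    return math.sqrt(x)


def du_concave(x: float) -> float:
    return 0.5 / math.sqrt(x)


def u_mild(x: float) -> float:      # marginally risk averse
    return x ** 0.9


if __name__ == "__main__":
    B, C, p = 100.0, 60.0, 0.1
    for name, u, du in [("convex", u_convex, du_convex),
                        ("concave", u_concave, du_concave)]:
        likelihood_dominates = elasticity_p(u, B, C, p) > elasticity_c(u, du, B, C, p)
        print(name, likelihood_dominates, condition_6_4(u, du, B, C))
    # 60/40 lottery on $100 versus a sure $60, then versus a sure $50
    print(prefers_lottery(u_mild, 0.6, 100.0, 60.0),
          prefers_lottery(u_mild, 0.6, 100.0, 50.0))
```

With these assumed utilities, the convex case satisfies (6-4) and the concave case does not. The mildly risk-averse $x^{0.9}$ declines the fair lottery but accepts it once the sure payoff drops to $50.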
262 IEEE TRANSACTIONS ON RELIABILITY, VOL. 38, NO. 2,1989 JUNE
When $E\{U\}$ exceeds the s-expected (average) utility from implementing a PM plan, the engineering-system operators undertake a bare-bones approach to PM. Therefore, any change that increases $E\{U\}$ increases the incidence of operations without any PM. Similarly, decreases in $E\{U\}$ increase the benefits from adopting a PM plan.
The statement "The likelihood of an accident has proportionally more impact than the severity of an accident on the decision to run without a PM plan" can be expressed in terms of derivatives:

$[U(B) - U(B-C)]\,p/U > p\,U'(B-C)\,C/U$ (6-3b)

Hence (6-3) holds if

$[U(B) - U(B-C)]/C > U'(B-C)$. (6-4)

Inequality (6-4) holds when U is convex; ie, the planner prefers risk over the certainty equivalent. For example, consider a choice between a 60/40 lottery of receiving $100 or $0 and a certain $60 ($60 = 0.6 × $100 + 0.4 × $0). A risk-averse person will, by definition, choose the $60, while a person who prefers risk will, by definition, prefer the fair lottery. If the odds remain the same but the certain payoff is lowered to an unfair $50, then a marginally risk-averse person might accept the lottery.
Are operators of engineering systems risk averse, or do they prefer risk? The answer most likely varies across industries, because risk-bearing can be a source of profits in the private sector. If the question is limited to nuclear plants, then all personnel associated with managing the plant are risk averse. In addition, regulatory constraints added to the objective function essentially eliminate all gains from operating without a working PM plan. Even if the reactor operated without incident, the administrators would be penalized for failing to take adequate precautions.
Additional inferences can be drawn from this model. For example, increasing the severity of the accident by raising its cost, or increasing the probability of an accident, will lower the s-expected (average) utility from operating without

ACKNOWLEDGMENT

Funding for this research was provided by appointment to the US Department of Energy Laboratory Cooperative Postgraduate Research Training Program administered by Oak Ridge Associated Universities. Don Asquith, Field Engineer for the High Flux Isotope Reactor (HFIR), provided invaluable assistance and most of the information on experiences with the HFIR reported herein. I also thank Bill Zabriske, Charlie Allen, and two anonymous referees for their helpful suggestions.

The primary pressure system on the HFIR comprises the following instruments:

1. Channel A Flux - measured 0 to 150%
2. Channel B Flux - measured 0 to 150%
3. Channel C Flux - measured 0 to 150%
4. FM258 Letdown Cleanup Flow - measured 0 to 200 gpm
5. HICM377 Secondary Flow Control Valve - a demand signal showing 0 to 100% closed for the 36 inch valve
6. HICM377A Secondary Flow Control Valve - a demand signal showing 0 to 100% closed for the 10 inch valve (attached to the inlet temperature controller)
7. FM216 Pressurizer Pump Flow - measured 0 to 200 gpm; this flow is measured after the letdown flow has passed through chemical processing and is returning to the primary system.
8. PM127 Primary Pressure - measured 0 - 1500 psi. This sensor actually measures 3 - 15 lbs, which it then translates to the 0 - 1500 psi scale.
9. PM127A Pressure Control Valve Position - measured 0 to 100% open; a demand signal for the valve to open, not a feedback signal from the valve itself.
10. #1 Primary Flow - measured 0 - 20 000 gpm
11. #2 Primary Flow - measured 0 - 20 000 gpm
12. #3 Primary Flow - measured 0 - 20 000 gpm
13. #1 Inlet Temperature - measured 75 - 200 degrees F
14. #2 Inlet Temperature - measured 75 - 200 degrees F
15. #3 Inlet Temperature - measured 75 - 200 degrees F
GUTH: PRACTICAL CONSIDERATIONS IN DEVELOPING AN INSTRUMENT-MAINTENANCE PLAN 263
16. #1 Outlet Temperature - measured 75 - 200 degrees F
17. #2 Outlet Temperature - measured 75 - 200 degrees F
18. #3 Outlet Temperature - measured 75 - 200 degrees F
19. FM300 Secondary Flow - measured 0 - 25 000 gpm
20. TM310B1 Cooling Tower Inlet Temperature - measured 20 - 120 degrees F; both this signal and the cooling tower outlet temperature are measured by resistance bulbs.
21. TM310A1 Cooling Tower Outlet Temperature - measured 20 - 120 degrees F
22. #1 Rod Position - measured 0 - 27 inches
23. #2 Rod Position - measured 0 - 27 inches
24. #3 Rod Position - measured 0 - 27 inches
25. #4 Rod Position - measured 0 - 27 inches
26. #5 Rod Position - measured 0 - 27 inches

Two other sensors, not presently linked to any other system, are:

FM128 Low Pressure - a local meter; provides no signal
FM104 Backup Pressure Sensor to FM127 - appears as a digital LED in the control room

Notation

psi pounds per square inch, gauge pressure
gpm gallons per minute
F Fahrenheit
lbs pounds

Summary of Which Sensors are Routinely Calibrated and How Often

PM on the 3 flux channels can be separated into PM on the ion chamber and PM on the instrument itself. Each of the three ion chambers is serviced on a 3-year basis, staggered so that one chamber is serviced each year. The PM check takes only about 2 hours; the chambers are readily accessible. The instrument itself is serviced every 6 months, and this service requires between 2 and 3 hours. The HFIR has a total of 9 flux sensors: 3 on the safety, 3 on the servo, and 3 on the counting channels.
The letdown cleanup flow, pressurizer pump flow, and three primary flow sensors are not calibrated. The manufacturer's specifications are taken as true and accurate. The devices are installed and not generally serviced. This practice is particularly true of the three primary flow sensors, which are on the venturi (an hour-glass shaped tube with an orifice for measuring flow), are permanent, and are not calibrated. The secondary flow is measured by a Dall tube (a funnel-shaped tube with an orifice) and is not calibrated.
The core-inlet temperature sensors are routinely PM'd only on the safety side. Each of the three transmitters is serviced annually and requires about 30 minutes for calibration. The resistance bulbs are on a 3-year plan, staggered so that only one bulb is serviced in a given year. Checking the calibration of these bulbs is quite an ordeal. First the primary water must be drained from the system, which requires a minimum of 8 hours. Once the bulb is removed, it is sent to the Standards Office, located in a building about 1.5 miles from the HFIR, to be immersed in a bath. The resistance bulbs rarely if ever show signs of drifting out of calibration. But since these bulbs are part of the safety system, they are serviced just to be sure that they have stayed in calibration.
The core-outlet temperature sensors are not part of the safety system; hence, they are not routinely serviced like the core-inlet temperature sensors. Neither the HICM377 36 inch controller valve nor the HICM377A 10 inch controller valve requires routine PM. Repairs on the valves, if ever needed, are split between Instrumentation & Controls personnel for the top part of the valve (the controller box) and Plant & Equipment personnel for the valve itself.
Primary pressure PM127 is serviced annually. The calibration check takes about 1 hour, but it may take all day before the sensor can be removed and taken to the shop. The PM127A pressure controller valve is serviced annually, and the PM check takes about one hour; the valve is easily accessed in the control room.
The 5 rod-position sensors are not on a routine PM schedule. Because of the way they are locked into place by bolts, they do not drift out of calibration over time. The only part that requires servicing is the pointer, which is visually compared to a yardstick in the subpile room during each restart.
The cooling tower inlet temperature sensor TM310B1 contains one transmitter and one resistance bulb. The cooling tower outlet temperature sensor TM310A1 contains 4 transmitters and 4 resistance bulbs; the sensor reading is the average of the four. The transmitters are serviced annually. Because of the location of the resistance bulbs, it is not economically feasible to service them on a routine basis.
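The service intervals and durations in this summary can be converted into an approximate annual PM workload. The sketch below computes, for each task, units ÷ interval × hours per service. Several inputs are assumptions:
- the "instrument itself" flux PM is counted for 3 instruments at the 2.5-hour midpoint of the stated 2 - 3 hours;
- the resistance-bulb task counts only the stated minimum 8-hour drain;
- PM127 counts only the 1-hour check, not the day that removal may take;
- the cooling tower transmitters are omitted because no duration is given.

The `PMTask` class and the task labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PMTask:
    name: str
    units: int               # identical items serviced under this task
    interval_years: float    # time between services of one unit
    hours_per_service: float

    def annual_hours(self) -> float:
        # Each unit is serviced 1/interval_years times per year.
        return self.units / self.interval_years * self.hours_per_service


TASKS = [
    PMTask("flux ion chambers", 3, 3.0, 2.0),
    PMTask("flux instruments (assumed 3, 2.5 h midpoint)", 3, 0.5, 2.5),
    PMTask("core-inlet transmitters", 3, 1.0, 0.5),
    PMTask("core-inlet resistance bulbs (drain time only)", 3, 3.0, 8.0),
    PMTask("PM127 primary pressure (check only)", 1, 1.0, 1.0),
    PMTask("PM127A pressure control valve", 1, 1.0, 1.0),
]


def total_annual_hours(tasks: list[PMTask]) -> float:
    return sum(t.annual_hours() for t in tasks)


if __name__ == "__main__":
    for t in TASKS:
        print(f"{t.name:48s} {t.annual_hours():5.1f} h/yr")
    print(f"{'total (lower bound)':48s} {total_annual_hours(TASKS):5.1f} h/yr")
```

Because several durations are minimums, the total should be read as a lower bound on the technician time these sensors consume each year.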
reduction in primary cleanup flow
    1. primary cleanup pump fails to start on request
primary recirculating pump seals wear
    reduction in primary cleanup flow
primary cleanup pump fails off during operation
    1. basic motor failure
    2. overload relay tripped
    3. control room switch turned off
    4. local switch turned off
    5. timer relay TR-2 fails after flow switch 217 clears
    6. timer relay TR-2 fails after auto transfer switch #1 goes back to normal
pump bowl leak
    1. basic pump failure
    2. mechanical seal failure
    3. pump vent open
    4. pump drain valve open
bearings and seals fail after extended use
    loss of cooling water
fuel-cladding failure
    1. sufficient flow blocked or diverted away from fuel region
    2. power transients occur
reactivity-control lost or hindered
    1. control plates are jammed
    2. extension tubes are jammed
    3. shock tubes are jammed
    4. tracks are jammed
    5. moderator shifts to alter the core flux distribution
    6. reflector material shifts to alter the core flux distribution
power transients occur
    reactivity-control lost or hindered
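In these cause-consequence relations, some causes are themselves events with causes of their own. For example, fuel-cladding failure can follow from power transients, which can follow from lost reactivity control. The sketch below stores the listing as a mapping from each event to its causes and expands it depth-first. It returns the causes that have no recorded causes of their own, which are the items that PM or inspection would ultimately target. The dictionary and the function name `basic_causes` are illustrative; "basic" here means only "not further explained in this listing."

```python
# Cause-consequence relations transcribed from the listing above:
# each event maps to the causes recorded for it.
CAUSES: dict[str, list[str]] = {
    "reduction in primary cleanup flow": [
        "primary cleanup pump fails to start on request",
    ],
    "primary recirculating pump seals wear": ["reduction in primary cleanup flow"],
    "primary cleanup pump fails off during operation": [
        "basic motor failure",
        "overload relay tripped",
        "control room switch turned off",
        "local switch turned off",
        "timer relay TR-2 fails after flow switch 217 clears",
        "timer relay TR-2 fails after auto transfer switch #1 goes back to normal",
    ],
    "pump bowl leak": [
        "basic pump failure",
        "mechanical seal failure",
        "pump vent open",
        "pump drain valve open",
    ],
    "bearings and seals fail after extended use": ["loss of cooling water"],
    "fuel-cladding failure": [
        "sufficient flow blocked or diverted away from fuel region",
        "power transients occur",
    ],
    "reactivity-control lost or hindered": [
        "control plates are jammed",
        "extension tubes are jammed",
        "shock tubes are jammed",
        "tracks are jammed",
        "moderator shifts to alter the core flux distribution",
        "reflector material shifts to alter the core flux distribution",
    ],
    "power transients occur": ["reactivity-control lost or hindered"],
}


def basic_causes(event: str) -> set[str]:
    """Causes of `event` (found transitively) that have no recorded causes."""
    found: set[str] = set()
    visited: set[str] = set()
    stack = list(CAUSES.get(event, []))
    while stack:
        cause = stack.pop()
        if cause in visited:        # guard against repeated or cyclic entries
            continue
        visited.add(cause)
        if cause in CAUSES:
            stack.extend(CAUSES[cause])  # intermediate event: keep expanding
        else:
            found.add(cause)             # no recorded cause: treat as basic
    return found


if __name__ == "__main__":
    for cause in sorted(basic_causes("fuel-cladding failure")):
        print(cause)
```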
REFERENCES

[1] Winfrid G. Schneeweiss, "The failure of systems with dependent control", IEEE Trans. Reliability, vol R-35, 1986 Dec, pp 512-517.
[2] J. B. Fussell, J. S. Arendt, "System reliability engineering methodology: A discussion on the state of the art", Nuclear Safety, vol 20, 1979 Sep-Oct, pp 541-550.
[3] M. A. S. Guth, "A probabilistic foundation for vagueness and imprecision in fault tree analysis", revision submitted to IEEE Trans. Reliability, 1988 (TR87-042/1).
[4] P. M. Morse, Queues, Inventories and Maintenance, John Wiley & Sons, 1958.
[5] D. N. Khandelwal, Jaydev Sharma, L. M. Ray, "Optimal periodic maintenance policy for machines subject to deterioration and random breakdown", IEEE Trans. Reliability, vol R-28, 1979 Oct, pp 328-330.
[6] S. E. Emoto, R. E. Schafer, "On the specification of repair time requirements", IEEE Trans. Reliability, vol R-29, 1980 Apr, pp 13-16.
[7] John M. Sheppard, "Discussion of 'On the specification of repair time requirements'", IEEE Trans. Reliability, vol R-30, 1981 Apr, pp 36-37.
[8] L. Takacs, Introduction to the Theory of Queues, Oxford University Press, 1962.
[9] J. Louis Tylee, "On-line failure detection in nuclear power plant instrumentation", IEEE Trans. Automatic Control, vol AC-28, 1983 Mar, pp 406-415.

AUTHOR

Dr. A. S. Guth; RJO Enterprises; 116 Oklahoma Avenue; Oak Ridge, Tennessee 37830-8604 USA.
Michael Anthony Stephen Guth was born in Oak Ridge, Tennessee on 1962 August 1. He received his BA (Economics) from Rice University in 1982, his MS (Economics) from California Institute of Technology in 1984, and his PhD (Economics) from the University of Tennessee in 1988. He worked as a system analyst and economist at the NASA Jet Propulsion Laboratory from 1982 - 1984 and as an economist and postgraduate research fellow at Oak Ridge National Laboratory from 1985 - 1988. Since 1988 July he has been a Senior Technical Specialist with RJO Enterprises. His research interests include uncertainty theory, risk evaluation, and mathematical modeling of decision processes. He is a member of the American Economic Association and the Operations Research Society of America.

Manuscript TR87-704 received 1987 October 8; revised 1988 September 1.
IEEE Log Number 24512