Is 4002 Maintainability Engineering
COURSE : M.E
Other Resources
Unit 1
Machinery/equipment must be aligned and levelled, wearing surfaces must be examined and replaced, and oiling
schedules must be laid down at regular intervals. Thus a machine kept in good operating condition through regular
inspection and adjustment will continue to produce quality products for a long time.
The technical meaning of maintenance involves functional checks, servicing, repairing or replacing of
necessary devices, equipment, machinery, building infrastructure, and supporting utilities in industrial, business,
governmental, and residential installations.
Maintenance functions are often referred to as maintenance, repair and overhaul (MRO), and MRO is also used
for maintenance, repair and operations. Over time, the terminology of maintenance and MRO has become increasingly
standardized. The United States Department of Defense uses the following definition:
The performance of tasks required to ensure the continuing airworthiness of an aircraft, including any one
or combination of overhaul, inspection, replacement, defect rectification, and the embodiment of a modification or
a repair.
Estimation of maintenance costs and evaluation of alternatives
Forecasting of spare parts
Assessing the need for equipment replacements and establishing replacement programs when due
Application of scheduling and project management principles to replacement programs
Assessing the maintenance tools required for efficient maintenance of equipment
Assessing the skills required of maintenance personnel
Reviewing personnel transfers to and from maintenance organizations
Assessing and reporting safety hazards associated with maintenance of equipment
Challenges in maintenance:-
“As an asset begins to age, a significant challenge Maintenance Managers face is ensuring they spend the
optimum level of operating expense (OPEX), at the right time, to maintain the highest level of integrity and
reliability that ensures continuous production and the safe operation of the facility.
Ultimately, efficiency requires spending the right amount of money to achieve this level of integrity and
reliability, which can be a difficult balance to strike. Time after time, maintenance teams are found to
overspend their allocated maintenance budget, because they don’t have a clear understanding of the relative
risk associated with the failure of their equipment.
“The key to overcoming this challenge requires not only assessing the risk associated with each element of
the plant, but also determining how these risk rankings come together to form the basis of the planned
maintenance and inspection programs.”
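As an illustration of how risk rankings can be produced, the sketch below scores each asset as likelihood × consequence on 1 to 5 scales and sorts the plant by that score. The scales, the band boundaries in `risk_category` and the asset tags are assumptions made for the example, not values taken from these notes or from any standard.

```python
# Hypothetical risk-ranking sketch: risk score = likelihood x consequence.
from dataclasses import dataclass


@dataclass
class Asset:
    tag: str          # hypothetical equipment tag
    likelihood: int   # assumed scale: 1 (rare) .. 5 (almost certain)
    consequence: int  # assumed scale: 1 (negligible) .. 5 (catastrophic)

    @property
    def risk_score(self) -> int:
        return self.likelihood * self.consequence


def risk_category(score: int) -> str:
    """Map a score to a band; the boundaries are illustrative assumptions."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def rank_assets(assets):
    """Return assets sorted from highest to lowest risk score."""
    return sorted(assets, key=lambda a: a.risk_score, reverse=True)


if __name__ == "__main__":
    plant = [
        Asset("P-101 feed pump", likelihood=4, consequence=4),
        Asset("E-201 heat exchanger", likelihood=2, consequence=5),
        Asset("F-301 ventilation fan", likelihood=3, consequence=1),
    ]
    for a in rank_assets(plant):
        print(a.tag, a.risk_score, risk_category(a.risk_score))
```

In practice each plant defines its own risk matrix; the point is only that the ranked list, not the OEM manual alone, drives where planned maintenance and inspection effort goes first.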
Have a clear and well understood RACI (Responsible, Accountable, Consulted and Informed)
program in place, so that the right level of management is making the decisions for expensive corrective
work.
Aging assets all have repair histories that can be trended. Look at those trends to help determine the
next inspection or planned maintenance.
Ask the manufacturers for a list of other companies that have the same equipment at a similar age
and in similar service, and then contact these companies for additional insight.
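Trending a repair history can be as simple as computing the times between successive failures and fitting a straight line to them. The sketch below does that with a least-squares slope; the failure dates are hypothetical, and a falling slope is read (as an assumption) as a sign that the asset is deteriorating and the next inspection should be brought forward.

```python
from datetime import date


def times_between_failures(failure_dates):
    """Days between consecutive failures, after sorting the dates."""
    ds = sorted(failure_dates)
    return [(b - a).days for a, b in zip(ds, ds[1:])]


def linear_trend(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two intervals to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical repair history for one aging pump.
history = [date(2023, 1, 1), date(2023, 4, 1), date(2023, 6, 10), date(2023, 7, 20)]
intervals = times_between_failures(history)
slope = linear_trend(intervals)
print(intervals, slope)
if slope < 0:
    print("intervals are shrinking: consider bringing the next inspection forward")
```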
Be aware that when you operate outside the stated integrity operating window (IOW), the inspection
and planned maintenance intervals will no longer be applicable. If the plant operates outside of these
IOWs, risk will usually be heightened, and inspection intervals need to be shortened.
“Unexpected rotating or fixed equipment failure can result in significant production losses, or worse,
environmental or human losses for a company. Avoiding this type of failure is therefore at the top of the
priority list for Maintenance Managers.
“A best in class preventative maintenance and inspection program comes down to intervals. But when is the
right time to conduct routine maintenance? Is it when the manufacturer says it’s time, or when the plant
says it’s time?”
“While the maintenance intervals recommended by the OEM (original equipment manufacturer) can
sometimes work, an inspection and preventative maintenance program based on risk should be more
effective, resulting in fewer unexpected failures and an increase in uptime.”
Ensure that the senior leadership of the facility takes ownership of your risk-based program and supports it
through funding and people.
“Planned maintenance programs are expensive and take time - many take as long as 3 - 5 years to design and
implement successfully. If senior management doesn’t have a long-term view and commitment of people
and resources, then the program is set to fail.”
“When a plant puts in place a fully functional and accurate CMMS, along with conducting the proper
criticality and risk assessments, surprises are minimized and many historical failures are not repeated
because the proper timing has been established for inspections and maintenance.”
“The sole role of the CMMS (computerized maintenance management system) is to provide teams with
accurate and up-to-date information, to enable them to make the best decisions possible. Compare two typical sites:
1. The plant with missing, inaccurate and inconsistent data, which forces the maintenance team to
spend a great deal of time on hardcopy reports and spreadsheets. Reporting to management can also
be late, and data can be out of date.
2. The site whose CMMS is accurate, consistently well-populated and maintained, with clear ownership
and accountabilities. In this scenario, the trained maintenance team is wasting no time with manual
reporting or planning, and the CMMS is used to gain helpful insights to make their work more
effective and streamlined, and can be accessed in real-time.
TIPS AND CONSIDERATIONS:
Conduct a site-wide physical asset verification (PAV) to ensure the data in your CMMS is correct.
Give ownership of the program to a single individual with senior management support and funding.
Don’t accept one-off spreadsheets for reports. Always depend on the accuracy of the CMMS, and insist on
using that data.
Invite the wider team to attend user group meetings hosted by the CMMS software provider, to learn about the
latest releases and to share experiences and learnings.
Track success through appropriate KPIs, and always look for ways to improve.
“When a plant does nothing but corrective or reactive maintenance, the maintenance team loses sight of
the bigger picture, which is to perform proper planned maintenance to avoid such breakdowns. These
unexpected shutdowns can often result in blown budgets and missed production targets.
“Frustratingly, it can seem almost impossible to get ahead of the backlog in order to become more
proactive and implement a planned and risk-based preventative maintenance program.
“But the words of Peter Drucker, the famous management consultant and author, ‘What gets measured gets
managed’, are incredibly important when it comes to maintenance. Measurement is often the missing
piece of the puzzle that can slowly but surely help maintenance teams make that change.
“It’s like long-distance running: if you don’t know your time on each run, then you don’t know whether
you’re getting better or worse.
“Continuous performance measurement is absolutely key to knowing whether the team is working on the right
maintenance, following the plan and achieving the budget, and it will help move the team from a predominantly
corrective and reactive maintenance plan to a more planned and proactive one.”
Risk assess all rotating and fixed plant equipment with solid risk-based inspection (RBI) and reliability-centred
maintenance (RCM) programs.
“By doing this, the maintenance team is able to set the inspection and planned maintenance intervals
accordingly.”
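One way to turn an RBI/RCM risk ranking into intervals is a lookup from risk band to a base interval, shortened when the plant has operated outside its IOW, as described earlier. The base intervals and the shortening factor below are invented for illustration; real values come from the site's RBI study, not from this sketch.

```python
# Assumed base intervals per risk band (months); illustrative only.
BASE_INTERVAL_MONTHS = {"high": 6, "medium": 24, "low": 60}


def inspection_interval(category: str, outside_iow: bool, iow_factor: float = 0.5) -> float:
    """Inspection interval in months for a risk band.

    category    -- "high", "medium" or "low" (hypothetical bands)
    outside_iow -- True if the plant has operated outside its integrity operating window
    iow_factor  -- assumed multiplier in (0, 1] used to shorten the interval
    """
    if not 0 < iow_factor <= 1:
        raise ValueError("iow_factor must be in (0, 1]")
    interval = float(BASE_INTERVAL_MONTHS[category])
    return interval * iow_factor if outside_iow else interval


print(inspection_interval("medium", outside_iow=False))  # normal operation
print(inspection_interval("medium", outside_iow=True))   # after an IOW excursion
```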
“All Maintenance Managers know the bad actors - those assets that constantly break down or fail
unexpectedly, resulting in losses.
“For starters, consider tracking preventative versus corrective maintenance work orders, plotted against
downtime. You will know the planned maintenance program is starting to work when the work mix shifts from
corrective maintenance (CM) to preventive maintenance (PM) and the stream factor increases.
“Measuring these KPIs gives the full picture of maintenance effectiveness, and makes it possible to set
challenging but achievable goals for continually improving the team’s performance.
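The two KPIs above can be computed directly from work-order records. In this sketch the PM share is the fraction of work orders typed "PM", and the stream factor is taken (as an assumption; sites define it differently) as days on stream divided by calendar days. The quarterly figures are hypothetical.

```python
from collections import Counter


def pm_ratio(work_order_types):
    """Fraction of work orders that are preventive ("PM") rather than corrective ("CM")."""
    counts = Counter(work_order_types)
    total = counts["PM"] + counts["CM"]
    return counts["PM"] / total if total else 0.0


def stream_factor(operating_days: float, calendar_days: float) -> float:
    """Assumed definition: share of calendar time the unit was on stream."""
    return operating_days / calendar_days


# Hypothetical quarterly data: (work-order types, days on stream out of 90).
quarters = {
    "Q1": (["CM"] * 30 + ["PM"] * 10, 80),
    "Q2": (["CM"] * 20 + ["PM"] * 20, 84),
    "Q3": (["CM"] * 12 + ["PM"] * 28, 88),
}
for quarter, (orders, days_on_stream) in quarters.items():
    print(quarter, round(pm_ratio(orders), 2), round(stream_factor(days_on_stream, 90), 3))
```

Plotting both series per quarter gives the "shift from CM to PM alongside rising stream factor" picture the quote describes.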
Ensure data is being input into the CMMS correctly and consistently
“If there is bad data in the CMMS, then KPI tracking becomes inaccurate. Take the time to prepare the
data behind the KPIs, so that you get the clearest possible view of performance.”
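Preparing the data behind the KPIs usually starts with rejecting or flagging records that would distort them. The sketch checks a work-order record for missing fields, an unknown type and impossible dates; the field names (`asset_id`, `type`, `opened`, `closed`) are hypothetical and would follow whatever schema your CMMS actually uses.

```python
from datetime import date

# Hypothetical required fields for a work-order record.
REQUIRED = ("asset_id", "type", "opened")


def validate_work_order(wo: dict) -> list:
    """Return a list of problems found in one work-order record (empty if clean)."""
    problems = [f"missing {field}" for field in REQUIRED if not wo.get(field)]
    if wo.get("type") and wo["type"] not in ("PM", "CM"):
        problems.append(f"unknown type {wo['type']!r}")
    opened, closed = wo.get("opened"), wo.get("closed")
    if opened and closed and closed < opened:
        problems.append("closed before opened")
    return problems


print(validate_work_order({"asset_id": "P-101", "type": "PM",
                           "opened": date(2024, 1, 5), "closed": date(2024, 1, 6)}))
print(validate_work_order({"type": "XX", "opened": date(2024, 1, 5), "closed": date(2024, 1, 1)}))
```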
“We all know it takes teamwork to effectively manage and run a group of assets, but a huge challenge arises
when maintenance teams work within embedded functional silos (e.g. maintenance, operations,
engineering) that reflect the historical culture of the plant.
“Many functional managers seem uninterested or frustrated when trying to combine forces, or share
responsibility, and the result is siloed teams operating in isolation, which is not beneficial for the overall
business.
Every plant needs a rally point to tear down silos. A reduction in OPEX or an increase in stream time usually
works.
Take the time to understand the issues experienced by both silos when working together previously, and
construct joint sessions to work through those issues together.
Change the way business is done by instituting new lines of communication and inviting members of all silos
to participate in the necessary planning sessions.
And most importantly, get senior level buy-in by instituting a legitimate and generous recognition and rewards
program.
CHALLENGES IN MAINTENANCE
Maintenance managers perform a highly complex range of tasks. Both the productivity and generation of
value within asset intensive organizations depend on effective maintenance management programs (ensuring
the right things are done) as well as efficient maintenance delivery (ensuring things are done right first
time). As the leader of the maintenance organization, maintenance managers directly impact both aspects.
As such their role is complex and diverse, requiring continuous long and short-term trade-offs of production,
asset integrity, human resources, safety and commercial considerations, all of which impact on one or more
functional areas across the organization. The following article discusses some of the challenges maintenance
managers face in performing their day-to-day job.
One of the challenges facing the modern maintenance manager is to increase the operational efficiency of
the organization and reduce unscheduled downtime by implementing maintenance management programs
that appropriately balance preventive, predictive, corrective and replacement options. These options must be
balanced in order to maximize the productive utilization of the assets over the medium to long term, while
ensuring their integrity, at optimum cost.
This requires maintenance managers to have deep knowledge of the equipment and the facilities under their
care, the operational environment and load on these assets, and the manufacturer's recommendations, guarantees,
manuals and instructions. These need to be translated into effective maintenance strategies and detailed task
plans that enable forward planning of tasks such as cleaning, lubrication, inspection and overhaul, together
with the associated parts, services and tools, as well as end-of-life replacement planning and budgeting.
Contingency management
Despite the best forward planning, unplanned equipment failures are an ever-present reality, requiring
maintenance managers to also be prepared for appropriate contingency management when these do
happen. While the incorporation of an effective reactive maintenance response capability forms an integral
part of any well-balanced maintenance program, the challenge for maintenance managers is to identify
appropriate response / execution options and strategies.
In this regard, maintenance managers need to improvise and manage the speed of response, including the
logistical support operations (e.g. specialized equipment and services, long-lead-time parts) necessary to
resolve, as quickly as possible, faults and failures that may occur unexpectedly.
While it may in theory be possible to deploy a perfect technical maintenance program, maintenance
managers face the very real challenge of financial budget constraints that limit their planning options. As a
result, maintenance managers continually need to balance operational expenditures on technology, labour,
parts and service costs, while keeping in mind the impact of these decisions on the long-term productive
value of the installed asset base. This requires maintenance managers to consider all cost types, including
fixed, variable, direct and indirect costs, as well as the corresponding depreciation and tax implications. In
this regard it is critically important for maintenance managers to understand the business context of the
assets under their care, and the commercial impact of maintenance-related decisions and trade-offs, both in
the short term and over the long term.
A particularly challenging area involves the optimization of MRO parts inventory and costs. On the one
hand, it is vital to have all the parts and materials necessary to carry out maintenance tasks, so that urgent
maintenance is not delayed. At the same time, it is important not to carry unnecessary stock, as this ties up
working capital and adds stores infrastructure and labour costs, and over the longer term leads to unnecessary
write-offs due to losses, obsolescence or deterioration.
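A common textbook starting point for this trade-off is the economic order quantity (EOQ), which balances ordering cost against holding cost, combined with a reorder point that covers demand over the supplier lead time. The notes do not prescribe a method; the demand and cost figures below are hypothetical, and EOQ assumes fairly steady demand, so slow-moving critical spares usually need separate treatment.

```python
import math


def eoq(annual_demand: float, order_cost: float, holding_cost_per_unit: float) -> float:
    """Economic order quantity: sqrt(2 * D * S / H)."""
    if min(annual_demand, order_cost, holding_cost_per_unit) <= 0:
        raise ValueError("inputs must be positive")
    return math.sqrt(2 * annual_demand * order_cost / holding_cost_per_unit)


def reorder_point(daily_demand: float, lead_time_days: float, safety_stock: float = 0.0) -> float:
    """Stock level at which a replenishment order should be placed."""
    return daily_demand * lead_time_days + safety_stock


# Hypothetical bearing: 1200 used per year, 50 per order, 3 per unit-year to hold.
print(eoq(1200, 50, 3))
# About 4 used per day, 30-day lead time, 20 units of safety stock.
print(reorder_point(4, 30, safety_stock=20))
```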
Coordination of work teams
While maintenance has a primary focus on physical assets and the associated technological/ technical
elements thereof, maintenance execution is a people driven process that requires the right balance between
process, technology support and leadership. The planning and coordination of maintenance tasks requires a
high degree of people management, communication, negotiation and planning skills.
Knowledge, experience and preparation vary according to the profile of the maintenance organization. Thus,
collaborative work is of vital importance. It is important that the maintenance manager helps facilitate
communication channels and integrated work between different parts of the organization (e.g. engineering,
production, procurement, stores, safety department etc.) as well as within the maintenance organization
itself, including various disciplines, support workshops, service providers etc. Maintenance managers are
also expected to contribute / function as part of the senior management team, responsible for making key
business, operational and financial decisions.
Establishing a collaborative work model, in which information can be shared in real time about completed
tasks, breakdowns, procedures, expiration dates, etc., is one of the essential challenges of modern
maintenance management.
The coordination and monitoring of maintenance work groups, the assignment of tasks, meetings with
vendors and distributors, answering questions and emails, drawing up work and maintenance plans, and
reacting to all kinds of unforeseen situations are just some of the many tasks that every maintenance
manager must face every day.
The organization, prioritization and execution of each of these activities involve the handling of large
amounts of information, immediate access to documents, diagrams and photos, and the ability to share them
with the different members of the work team, facilitating the function of each one, and guaranteeing the
quality and effectiveness of the work. All this makes efficient time management a daily
challenge for the maintenance manager.
TERO TECHNOLOGY
Tero technology is the maintenance of assets in an optimal manner. It is the combination of management,
financial, engineering, and other practices applied to physical assets such as plant, machinery, equipment,
buildings and structures in pursuit of economic life cycle costs.
It is concerned with the reliability and maintainability of physical assets and also takes into account the
processes of installation, commissioning, operation, maintenance, modification and replacement.
Decisions are influenced by feedback on design, performance and costs information throughout the life
cycle of a project.
It can be applied equally to products, as the product of one organization often becomes the asset of another.
MAINTENANCE COSTS
What Are Maintenance Costs?
The term maintenance expense refers to any cost incurred by an individual or business to keep their assets in
good working condition. These costs may be spent for the general maintenance of items like running anti-virus
software on computer systems or they may be used for repairs such as fixing a car or machinery. These expenses are
in addition to the actual purchase price of an asset, so individuals and companies should be able and willing to foot the
bill in order to keep their assets in running order.
How much an individual pays in maintenance expenses depends on the type of asset and how often upkeep is
required and performed. Individuals may incur maintenance costs for homes, automobiles, appliances, and electronics,
while businesses pay for maintenance on their fixed assets—vehicles, equipment, facilities—and their technology.
Keeping up to date with regular maintenance can keep costs down because the asset is serviced on a timely basis.
Neglecting assets and waiting until the last minute to service them may result in higher maintenance costs. If the asset
isn't maintained at all, the owner may have to replace it altogether.
KEY TAKEAWAYS
Maintenance expenses are necessary costs for upkeep—whether it's a car, home, rental apartment, or
condominium.
Neglecting regular maintenance—and not paying expenses for upkeep—may result in higher maintenance
costs and, even worse, replacement costs for the asset itself.
Individuals pay for maintenance on things like homes, automobiles, and appliances, while companies pay for
upkeep on fixed assets and technology.
Special Considerations
Consumers should consider the initial price tag as well as the item's ongoing maintenance expenses when they
purchase an item that requires upkeep. This is why it's always a good idea for any consumer to set some
money aside for maintenance expenses. Failure to do so may result in financial distress when it comes time to
pay for these charges in the future.
Government regulations require landlords to maintain certain safety and living standards. For example, the heat in an
apartment building must meet minimum standards. The infrastructure, such as heating and ventilation, must be
adequately maintained by the landlord. Some of the upkeep and maintenance may fall on the tenant. The rental
agreement should define what expenses are the renter's responsibility.
Condo Fees
Monthly fees are common for people who own condominiums. Condo fees can range from $50 to $1,000 depending
on the property, building, and location. If the building has a concierge, swimming pool, tennis courts, or gym, those
costs are built into the monthly condo fee.
Buyers who want maintenance-free living should consider the monthly fees when calculating their affordability and
the potential mortgage payment for the condominium. If, for example, the mortgage payment is $1,500 per month
while the condo fee is $600 per month, the condo fee represents nearly 30% of the total monthly payments to live
there.
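The percentage in this example is the condo fee divided by the total monthly housing payment, 600 / (1,500 + 600). A small helper makes the arithmetic explicit; the function name is ours, for illustration.

```python
def fee_share(mortgage: float, condo_fee: float) -> float:
    """Condo fee as a fraction of total monthly housing payments."""
    return condo_fee / (mortgage + condo_fee)


print(f"{fee_share(1500, 600):.1%}")  # 28.6%
```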
Plant maintenance services attend to the maintenance of machines and equipment because of their frequent use
and strategic position in the entire production function. A machine is a mechanism consisting of a series of
components, each performing its specific function as part of the whole system or mechanism.
For any machine, some of its parts are fixed while others are replaceable. Such equipment or mechanical
devices and their components require constant and continuous services such as cleaning, lubrication, repair
and replacement so that their operational efficiency can be maintained.
Further, it may be noted that plant maintenance service is not confined to equipment and machines.
Under the wide spectrum of plant maintenance service, the maintenance of buildings, power plant,
material handling equipment, heating and air conditioning equipment, waste disposal systems, washrooms,
water supply, jigs and fixtures, fire-fighting facilities etc. also needs attention. The activity of
plant maintenance service also includes the provision of maintenance equipment and a stock of repair parts
and maintenance materials.
Types/Areas of Maintenance:
The major areas of maintenance are:
(1) Civil Maintenance:
Building construction and maintenance, and maintaining service facilities such as water supply, steam, gas,
compressed air, heating and ventilating, air conditioning, plumbing, carpentry and painting work. Also
included in civil maintenance are fencing, landscaping, lawns, gardening and maintaining drainage and
fire-fighting equipment.
Fig. 34.1 illustrates the various types of maintenance. Basically, maintenance work can be planned or
unplanned. Planned maintenance is maintenance work organized and carried out with foresight, control and
records, to a predetermined plan. Unplanned maintenance is caused by breakdowns that have not been
foreseen.
Planned maintenance involves the inspection of all plant, equipment, machinery and buildings according
to a predetermined schedule in order to overhaul, service, lubricate or repair them before an actual breakdown in
service occurs. The purpose is to reduce machine stoppages due to sudden breakdowns requiring
emergency maintenance.
(i) What is to be maintained i.e. the individual item of the plant and equipment to be maintained.
(ii) The details of how each item is to be maintained i.e., method to be adopted.
(iii) What maintenance resources would be needed i.e., manpower, tools, spares and test equipment etc. to
carry out the maintenance work.
(iv) The frequency of carrying out maintenance inspection.
(v) The method of managing the maintenance operation.
(vi) The method of analysis, rectification and control, which must be decided in advance in order to evaluate the
performance of the maintenance system and identify possible improvements.
It is therefore the duty of the maintenance engineering department in a manufacturing enterprise to ensure that all the
above-mentioned factors are clearly defined.
This will form the basis and structure of a practical maintenance programme which must possess the
essential details regarding the following features:
(i) List of all the machines/equipment, plant item which require maintenance.
(ii) Comprehensive maintenance programme/schedule for each and every item needing maintenance.
(iii) A timetable/programme of maintenance events showing when each task must be carried out.
(iv) A technique for ensuring that the maintenance work listed in the timetable is carried out.
(v) A method of recording the results achieved and thus judging the implementation/effectiveness of the
maintenance programme.
Thus any such programme should be easy to operate and should need minimum manpower and recording
paperwork, but it must clearly indicate the following aspects:
(a) What requires maintenance or what is to be maintained?
(b) When/where it is to be maintained?
(c) How it is to be maintained?
(d) Who will do the maintenance work?
(e) Whether maintenance work is of desired level?
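Points (a) to (d) map naturally onto a simple record per maintenance task, from which a timetable of maintenance events can be generated. This is a minimal sketch with hypothetical items, crews and intervals; point (e), judging whether the work reached the desired level, needs recorded results and is not modelled here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MaintenanceTask:
    item: str           # (a) what is to be maintained
    method: str         # (c) how it is to be maintained
    interval_days: int  # (b) when: how often the task recurs
    crew: str           # (d) who does the work
    last_done: date

    def __post_init__(self):
        if self.interval_days <= 0:
            raise ValueError("interval_days must be positive")

    def due_dates(self, horizon_end):
        """Yield each due date after last_done, up to and including horizon_end."""
        due = self.last_done + timedelta(days=self.interval_days)
        while due <= horizon_end:
            yield due
            due += timedelta(days=self.interval_days)


def timetable(tasks, horizon_end):
    """Date-ordered list of (date, item, method, crew) events within the horizon."""
    return sorted(
        (due, t.item, t.method, t.crew)
        for t in tasks
        for due in t.due_dates(horizon_end)
    )


tasks = [
    MaintenanceTask("Lathe L-1", "lubricate spindle", 30, "mechanical crew", date(2024, 1, 1)),
    MaintenanceTask("Compressor C-2", "replace air filter", 45, "utilities crew", date(2024, 1, 10)),
]
for event in timetable(tasks, date(2024, 3, 31)):
    print(*event)
```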
(ii) Unplanned Maintenance:
It is an operation/activity carried out without any prior planning, and it is generally very urgent in nature.
Such maintenance operations are required in case of heavy and total breakdowns, which may occur
without any prior indication. Such breakdowns are generally harmful to the system and may also cause loss
of human life. In order to deal with such situations, provisions such as prior planning, preparation and
scheduling are made so that maintenance can be provided when they occur.
Thus, in most cases, unplanned maintenance is emergency in nature, because the
recovery time is the most important factor in minimizing the consequences of a serious breakdown. Examples
of such failures or breakdowns are the bursting of boilers or the failure of pipelines carrying
fluids/gases.
Emergency Maintenance:
In reality, emergency maintenance is a special type of unplanned maintenance operation which is performed
with prior planning. It must be implemented immediately in order to avoid the serious consequences of a
heavy breakdown: heavy loss of production, heavy maintenance cost and sometimes even loss of human life.
Thus emergency maintenance may be defined as a sort of unorganized maintenance activity which must be
executed using only the available resources in the minimum possible time. Emergency maintenance is
essential to minimize time delays as well as heavy production losses caused by serious breakdowns or
unpredictable failures.
2. The nature of failure is very serious.
3. No time lag is allowable.
4. Recovery time is given top priority.
5. Generally occurs in high-risk equipment such as boilers (pressure vessels) or turbines, where the risk involved is very high.
6. Implementation is very urgent and corrective in nature with available resources.
7. The delay in implementation may be serious.
Breakdown Maintenance:
1. It may be planned or unplanned.
2. The nature of failure is normal or not very serious.
3. Permissible time lag may be allowed.
4. Maintenance cost is the first priority.
5. Generally occurs in general engineering work, where the delay in timely repair is not very risky.
6. Implementation is not very urgent but corrective in nature.
7. The effect of delay in implementation is not very serious.
Economic Aspects of Maintenance:
The main goal/objective of properly run maintenance department is to make available the plant, equipment
and machinery for productive utilization during the scheduled hours operating to pre-decided standards with
minimum possible waste and minimum total cost involved.
Total cost means the sum of the maintenance labour cost, the material cost and the cost of lost production due to
non-availability of productive equipment/machinery or their reduced operational efficiency through lack of
maintenance.
Maintenance is thus a service which has economic value to the production process. Only when this value is
calculated and expressed in quantitative terms is it possible to compare the cost-effectiveness of various
maintenance policies.
Fig. 34.3 illustrates how the total maintenance cost is optimized by balancing the various direct and
indirect maintenance costs. There are various mathematical relations to evaluate maintenance
performance in numerical terms, either as a single overall factor or as a series of factors.
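The optimum in Fig. 34.3 can be reproduced numerically with an assumed cost model: preventive (direct) cost rising linearly with the amount of planned maintenance, and breakdown cost (repair plus lost production) falling as planned maintenance increases. The exponential breakdown curve and all constants are illustrative assumptions, not data from the notes; the grid search should land next to the closed-form optimum of this particular model.

```python
import math

# Assumed cost model (illustrative only):
#   preventive cost = C_PM * x                 x = planned-maintenance hours per month
#   breakdown cost  = C_BD * F0 * exp(-K * x)  repair + lost production per expected failure
C_PM = 100.0   # cost per planned-maintenance hour
C_BD = 5000.0  # cost per breakdown, including lost production
F0 = 4.0       # expected breakdowns per month with no planned maintenance
K = 0.1        # assumed effectiveness of each planned-maintenance hour


def preventive_cost(x):
    return C_PM * x


def breakdown_cost(x):
    return C_BD * F0 * math.exp(-K * x)


def total_cost(x):
    return preventive_cost(x) + breakdown_cost(x)


def optimal_level(x_max=100.0, step=0.5):
    """Grid search for the maintenance level with the lowest total cost."""
    n = int(x_max / step)
    grid = [i * step for i in range(n + 1)]
    return min(grid, key=total_cost)


best = optimal_level()
print(best, round(total_cost(best), 1))
# Closed form for this model (set the derivative to zero): x* = ln(K * C_BD * F0 / C_PM) / K
print(math.log(K * C_BD * F0 / C_PM) / K)
```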
UNIT II
MAINTENANCE MODELS
Proactive maintenance
Proactive maintenance is the maintenance philosophy that supplants “failure reactive” with “failure proactive” by
activities that avoid the underlying conditions that lead to machine faults and degradation.
Unlike predictive or preventive maintenance, proactive maintenance commissions corrective actions aimed at failure
root causes, not failure symptoms. Its central theme is to extend the life of machinery as opposed to
While reactive maintenance can have a place in a well-rounded maintenance strategy, it shouldn’t be your
go-to for all repairs.
Emergency repairs are usually prioritized at the expense of planned work, which may be postponed or cancelled
completely. This can lead to a maintenance backlog, which is very hard to get on top of once it starts to pile up.
Worse repair or maintenance: a maintenance action which makes the system’s failure rate or actual age
increase, although the system does not break down. Thus, after a worse repair, the system’s operating condition
becomes worse than it was just prior to its failure.
Worst repair or maintenance: a maintenance action which unintentionally makes the system fail or break
down.
Some possible causes of imperfect, worse or worst maintenance that are due to the maintenance performer are:
repairing the wrong part; only partially repairing the faulty part; repairing (partially or completely) the faulty part
but damaging adjacent parts; incorrectly assessing the condition of the unit inspected; and performing the maintenance
action not when called for but at the performer's convenience (so that the timing of maintenance is off schedule).
Various methods have been proposed for modeling imperfect, worse and worst maintenance, and it is useful to
summarize them. This helps in rectifying imperfect, worse and worst maintenance, because these modeling
methods can be utilized in various maintenance and inspection policies.
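The notes do not name a specific modeling method. One widely used family is the virtual-age approach; the sketch below implements the Kijima type I form, V_n = V_{n-1} + q·X_n, where the restoration factor q distinguishes perfect (q = 0), imperfect (0 < q < 1) and minimal (q = 1) repair. Allowing q > 1 to represent worse repair is an extension assumed here for illustration; worst repair (the action itself causing failure) is outside this model. The Weibull parameters and run times are hypothetical.

```python
def kijima_type1_virtual_ages(operating_times, q):
    """Virtual age of a repairable unit after each repair (Kijima type I form).

    V_n = V_{n-1} + q * X_n with V_0 = 0, where X_n is the run time before repair n:
      q = 0      perfect repair ("as good as new")
      0 < q < 1  imperfect repair
      q = 1      minimal repair ("as bad as old")
      q > 1      worse repair (assumed extension): effective age grows faster than run time
    """
    if q < 0:
        raise ValueError("restoration factor q must be non-negative")
    ages, v = [], 0.0
    for x in operating_times:
        v += q * x
        ages.append(v)
    return ages


def weibull_hazard(t, beta=2.5, eta=1000.0):
    """Failure rate at (virtual) age t for an assumed Weibull wear-out item."""
    return (beta / eta) * (t / eta) ** (beta - 1)


runs = [400, 400, 400]  # hypothetical run times in hours
for q in (0.0, 0.5, 1.0, 1.3):
    v = kijima_type1_virtual_ages(runs, q)[-1]
    print(f"q={q}: virtual age {v:.0f} h, failure rate {weibull_hazard(v):.2e} per h")
```

A worse repair therefore shows up exactly as the definition above says: the failure rate after repair is higher, even though the unit is still running.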
MAINTENANCE POLICIES:-
The maintenance policy of a productive system provides specific answers to problems concerning the
selection of the component parts of a system for maintenance, decisions on the specific forms of
maintenance to be used, the choice between internal and external maintenance and, in the case of internal
maintenance, a further choice between centralized and decentralized maintenance.
Moreover, not all items are influenced/controlled by preventive maintenance. For example, if an item shows
time-independent, i.e. negative exponential, failure behaviour, then the cause of failure is external to the item;
hence, no amount of preventive replacement will serve the intended purpose.
Preventive maintenance policy is appropriate for items that wear out with time due to use, i.e. items that
show a normal (wear-out) failure mode.
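The difference between the two failure behaviours is clearest in the hazard (failure) rate. For a negative exponential life the hazard is constant, so an old item is no more likely to fail than a new one and replacing it early gains nothing; for a normally distributed (wear-out) life the hazard rises with age. The mean life and spread below are assumed values.

```python
import math


def exponential_hazard(t, rate):
    """Negative exponential life: the failure rate is the same at every age t."""
    return rate


def normal_hazard(t, mu, sigma):
    """Normal (wear-out) life: h(t) = f(t) / R(t), which rises with age."""
    z = (t - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    reliability = 0.5 * math.erfc(z / math.sqrt(2))  # R(t) = 1 - Phi(z)
    return pdf / reliability


# Assumed mean life 1000 h for both; normal spread 200 h.
for age in (600, 1000, 1400):
    print(age, exponential_hazard(age, 1 / 1000), round(normal_hazard(age, 1000, 200), 5))
```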
Furthermore, such a policy may be useful only if the costs of preventive maintenance are significantly lower
than those of breakdown maintenance replacement, which means that the item should be a simply replaceable item
rather than one requiring a complex replacement operation.
Of course, preventive replacement cannot be rejected outright for a complex part but the “cost-cum-safety”
factors have to be taken into consideration while deciding a maintenance policy.
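The cost condition above is often formalised with the classic age-replacement model (one standard model, not necessarily the one the notes have in mind): replace at age T for cost c_p or on failure for cost c_f, and minimise the long-run cost per unit time C(T) = [c_p·R(T) + c_f·(1 − R(T))] / ∫₀ᵀ R(t) dt. The sketch uses a Weibull life with assumed parameters. With a wear-out item (β = 3) a finite replacement age is best; with β = 1 (negative exponential) the cost rate keeps falling as T grows, so the search ends at the largest candidate, i.e. run to failure.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)^beta); beta = 1 is the negative exponential case."""
    return math.exp(-((t / eta) ** beta))


def age_replacement_cost_rate(T, cp, cf, beta, eta, steps=2000):
    """Long-run cost per unit time when replacing at age T (cost cp) or on failure (cost cf).

    C(T) = [cp*R(T) + cf*(1 - R(T))] / integral_0^T R(t) dt,
    with the integral (expected cycle length) computed by the trapezoidal rule.
    """
    h = T / steps
    cycle_length = sum(
        (weibull_reliability(i * h, beta, eta)
         + weibull_reliability((i + 1) * h, beta, eta)) * h / 2
        for i in range(steps)
    )
    r = weibull_reliability(T, beta, eta)
    return (cp * r + cf * (1 - r)) / cycle_length


def best_replacement_age(cp, cf, beta, eta, candidates):
    """Candidate age with the lowest long-run cost rate."""
    return min(candidates, key=lambda T: age_replacement_cost_rate(T, cp, cf, beta, eta))


ages = range(100, 5001, 100)  # candidate replacement ages in hours (assumed)
print(best_replacement_age(100, 1000, beta=3, eta=1000, candidates=ages))  # wear-out item
print(best_replacement_age(100, 1000, beta=1, eta=1000, candidates=ages))  # exponential item
```

Raising c_p towards c_f in this model pushes the best age upwards, which is the same conclusion the notes reach in words: preventive replacement pays only when it is much cheaper than breakdown replacement.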
Primarily, maintenance policy must answer the questions of the extent of activities and the size of the
maintenance department. As far as the extent of activities is concerned, practices vary across companies. Small
enterprises, for example, use the maintenance department for simple repair and replacement.
A major non-production engineering job in these plants, such as an addition to a building or the construction of a new
building, is handled by outside experts with only token aid from the plant’s maintenance department. The reverse is the case
for large companies/plants, since they have their own more specialized staff for all major non-production
engineering jobs.
With regard to equipment maintenance, two practices are commonly followed. One practice is to have a well-planned
and organized maintenance programme formulated to secure maximum life and utilization of machinery. The
second practice is to adopt a policy of minimum maintenance and maximum wear. This practice is more economical
where the equipment is usually superseded before it wears out.
As far as the size of the maintenance department is concerned, manufacturing plants tend to keep large maintenance
crews in order to solve their breakdown problems at a moment’s notice. It is for the management of the enterprise to
strike a balance between prompt and delayed maintenance services.
PREVENTIVE VS BREAKDOWN MAINTENANCE:-
Preventive maintenance identifies issues before equipment failure or downtime occurs, through routinely scheduled
maintenance. Breakdown maintenance works by running equipment until it breaks down, at which point repairs and
maintenance are performed.
Preventive maintenance operates on a schedule: maintenance tasks are completed at specific intervals, before
downtime events occur, because the goal of preventive maintenance is to maximize the lifespan and runtime
of equipment.
Breakdown maintenance has limited scope because it is not applicable to many pieces of equipment. For example,
it is not a suitable maintenance strategy for anything involved in human safety and health, nor is it a good strategy for
critical or central pieces of equipment.
However, it works well with things that are designed to be used until they’re inoperable. This can include everything
from light bulbs to residential water heaters.
Even though a water heater may be considered a critical piece of equipment, the time spent on PM for a water heater
(which includes disrupting the resident every X months) is probably more intrusive than fixing a broken system every
decade or so. This shows that breakdown maintenance can be applicable to critical pieces of equipment in certain cases,
especially in the property management space.
Preventive maintenance, on the other hand, is a solid maintenance plan for almost all pieces of equipment in a factory
setting. In a residential setting, however, it only makes sense to perform PMs on equipment in non-living areas.
Preventive maintenance (PM) vs. breakdown maintenance (BM)
Definition
  PM: Work that is scheduled based on calendar time, asset runtime, or some other period of time.
  BM: Work that is only performed when a piece of equipment breaks down or has a downtime event.
Cons
  PM: Can be expensive to keep up over the long term; labor intensive due to constant maintenance tasks.
  BM: Cannot be used for many types of equipment, especially safety equipment; requires careful planning and execution to work effectively.
Preventive Maintenance Workflow
Overview
Preventive maintenance, also spelled preventative maintenance, is carried out with the goal of increasing asset lifetime
by preventing excess depreciation and impairment or untimely breakdown. This maintenance includes, but is not
limited to, adjustments, cleaning, lubrication, repairs, and parts replacements.
Due to the unique needs of different assets, the type and amount of preventive maintenance required varies. Because
of this, it can be challenging to establish a successful preventive maintenance program. However, a good rule of
thumb is to start with a time-based PM program.
Calendar-based maintenance
A recurring work order is set up in the computerized maintenance management system (CMMS) and is generated when
a specified time interval is reached.
Usage-based maintenance
Meter readings are logged in the CMMS. When a specified reading (for example, a number of operating hours) is reached, a work order is created for routine
maintenance.
Predictive maintenance
When work order data is logged in the CMMS, maintenance managers can use historical events to predict when an asset is likely to fail
and create specific PMs to prevent such failures from happening again.
Prescriptive maintenance
This is similar to predictive maintenance, but instead of the maintenance manager prescribing PMs alone, machine
learning software assists in prescribing them.
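The calendar-based and usage-based triggers described above reduce to a simple due-date check. The following is a minimal Python sketch, assuming a hypothetical `PMTrigger` record and `is_due` function (neither is a real CMMS API); the predictive and prescriptive approaches are not modeled.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PMTrigger:
    """Hypothetical PM record; field names are illustrative, not a real CMMS schema."""
    asset: str
    interval_days: Optional[int] = None      # calendar-based trigger
    interval_hours: Optional[float] = None   # usage-based (meter) trigger
    last_done_date: Optional[date] = None
    last_done_meter: float = 0.0


def is_due(pm: PMTrigger, today: date, meter_reading: float) -> bool:
    """True if either the calendar interval or the usage interval has been reached."""
    calendar_due = (
        pm.interval_days is not None
        and pm.last_done_date is not None
        and today >= pm.last_done_date + timedelta(days=pm.interval_days)
    )
    usage_due = (
        pm.interval_hours is not None
        and meter_reading - pm.last_done_meter >= pm.interval_hours
    )
    return calendar_due or usage_due


if __name__ == "__main__":
    compressor = PMTrigger("air compressor", interval_days=90, last_done_date=date(2024, 1, 1))
    pump = PMTrigger("coolant pump", interval_hours=500.0, last_done_meter=1200.0)
    print(is_due(compressor, date(2024, 3, 30), 0.0))  # 89 days elapsed
    print(is_due(compressor, date(2024, 3, 31), 0.0))  # 90 days elapsed
    print(is_due(pump, date(2024, 3, 31), 1650.0))      # 450 h since last PM
    print(is_due(pump, date(2024, 3, 31), 1700.0))      # 500 h since last PM
```

A real CMMS would also record the new work order and reset the counters once the PM is done; that bookkeeping is left out here.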
How preventive maintenance decreases downtime
Consider a simple example: your car. Oil changes and regular servicing are part of a preventive
maintenance schedule that ensures your car runs properly and without unexpected failure. If you ignore that
maintenance schedule and miss service intervals, your car will lose value and utility. The same goes for
machinery in manufacturing plants and equipment in facilities.
With a PM schedule in place, maintenance managers can decrease downtime. This schedule is usually automated with
a CMMS that includes PM scheduling software. However, managers must be wary of over-maintaining
assets: beyond a certain point, preventive maintenance costs more than the downtime it prevents.
What about your water systems? Do you have appropriate filtration? Are you running warm water systems that may be
a breeding ground for serious bacterial infections such as Legionnaires’ disease? What about your electrical systems and
the need to ensure that they not only comply with legislation but also do not degrade over time? Doors, stairways, lighting,
and flooring all need periodic inspection and maintenance, too.
The list of what needs to be included in your preventive maintenance plan can be bewildering, but certain
guidelines give you at least a basis to conform to. The American National Standards Institute (ANSI) offers a
good deal of information on preventive maintenance and is a good place to start if you are unsure of the extent of the
program that you need.
Perhaps the greatest benefit is increased safety, especially for a company that owns heavy machinery. The
price of employee safety is never too high, and organizations such as the Occupational Safety and Health
Administration (OSHA) rigorously enforce government policy.
INSPECTION MODELS
The basic purpose of an inspection is to determine the state of the equipment. Once the indicators used to
describe that state (such as bearing wear, gauge readings, and product quality) have been specified, and an
inspection has been made to determine their values, further maintenance action may be taken, depending on
the state identified. The timing of inspections should be influenced by the costs of inspection (which depend
on the indicators used to describe the state of the equipment) and by its benefits, such as the detection and
correction of minor defects before a major breakdown occurs.
1. Inspection frequencies: for equipment that is in continuous operation and subject to breakdown
2. Inspection intervals: for equipment used only in emergency conditions (failure-finding intervals)
3. Condition monitoring (CM) of equipment: optimizing condition-based maintenance (CBM)
decisions.
Equipment breaks down from time to time, requiring materials and tradespeople to repair it. Also,
while the equipment is being repaired, production output is lost. To reduce the number of
breakdowns, we can periodically inspect the equipment and rectify any minor defects that might otherwise
eventually cause a complete breakdown. These inspections cost money in terms of materials, wages, and
production lost during scheduled downtime.
What we want to determine is an inspection policy that will give us the correct balance between the
number of inspections and the resulting output, such that the profit per unit time from the equipment is
maximized over a long period.
A complex system of this kind is depicted in Figure 3.2, which shows that the system can fail for many
reasons, such as a failure caused by component 1, component 2, and so on. Each of these causes of equipment
failure could have its own independent failure distribution. Of course, it need not be a physical
component that causes the equipment to cease functioning; a software problem could well be the
cause (mode) of equipment failure. Clearly, as the frequency or intensity of inspections increases, the
frequency of equipment/system failures is expected to decrease. The challenge is to identify the
optimal frequency/intensity.
3.2.2 Construction of the Model
1. Equipment failures occur according to the exponential distribution with mean time to failure
(MTTF) = 1/λ, where λ is the mean arrival rate of failures. (For example, if the MTTF = 0.5 year, then the
mean number of failures per year = 1/0.5 = 2, i.e., λ = 2.) Note that it is not unreasonable to make this
exponential assumption for complex equipment (Drenick 1960).
2. Repair times are exponentially distributed with a mean time of 1/μ.
3. The inspection policy is to perform n inspections per unit time. Inspection times are exponentially
distributed with a mean time of 1/i.
4. The value of the output in an uninterrupted unit of time has a profit value V (e.g., selling price less
material cost less production cost). That is, V is the profit value per unit time if there are no downtime losses.
5. The average cost of inspection per uninterrupted unit of time is I.
6. The average cost of repairs per uninterrupted unit of time is R. Note that I and R are the costs that
would be incurred if inspection or repair lasted the whole unit of time. Thus, the actual costs of inspection
and repair incurred per unit time will be proportions of I and R, respectively.
7. The breakdown rate of the equipment, λ, is a function of n, the frequency of inspection per unit
time. That is, the breakdowns can be influenced by the number of inspections; therefore, λ ≡ λ(n), as
illustrated in Figure 3.3.
In Figure 3.3, λ(0) is the breakdown rate if no inspection is made, and λ(1) is the breakdown rate if
one inspection is made per unit time. Thus, from the figure, it can be seen that the effect of performing
inspections is to increase the MTTF of the equipment.
8. The objective is to choose n to maximize the expected profit per unit time from operating the
equipment. The basic conflicts are illustrated in Figure 3.4.
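The profit per unit time from operating the equipment will be a function of the number of
inspections. Therefore, denoting profit per unit time by P(n),
P(n) = value of output per unit time if there is no downtime − loss of output value per unit time due to repairs
− loss of output value per unit time due to inspections − cost of repairs per unit time − cost of inspections per unit time.
Loss of output value per unit time due to repairs = (value per uninterrupted unit of time) × (number of repairs per unit time) × (mean time to repair)
$$= \frac{V\lambda(n)}{\mu}$$
Note that λ(n)/μ is the proportion of unit time that the equipment spends being repaired.
Loss of output value per unit time due to inspections = (value per uninterrupted unit of time) × (number of inspections per unit time) × (mean time to inspect)
$$= \frac{Vn}{i}$$
Cost of repairs per unit time = (cost of repairs per uninterrupted unit of time) × (number of repairs per unit time) × (mean time to repair)
$$= \frac{R\lambda(n)}{\mu}$$
Cost of inspections per unit time = (cost of inspection per uninterrupted unit of time) × (number of inspections per unit time) × (mean time to inspect)
$$= \frac{In}{i}$$
Therefore,
$$P(n) = V - \frac{V\lambda(n)}{\mu} - \frac{Vn}{i} - \frac{R\lambda(n)}{\mu} - \frac{In}{i} \qquad (3.1)$$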
Substitution of n = 3 into Equation 3.1 will, of course, give the expected profit per unit time resulting
from this policy. Inserting other values of n into Equation 3.1 gives the expected profit resulting from
other inspection policies. The savings of the optimal policy over other possible policies, including the
policy currently adopted for the equipment, can then be determined.
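To see how Equation 3.1 is used to compare policies, the following minimal Python sketch evaluates P(n) for integer n and picks the best value by trial and error. It assumes a breakdown rate of the form λ(n) = k/(n + 1) and made-up parameter values; neither comes from the text, so the optimum it finds (n = 2 with these numbers) is not the n = 3 of the original example.

```python
# Hypothetical values (unit time = one month); none of these come from the text.
V = 5000.0     # profit value of output per uninterrupted unit time
R = 300.0      # repair cost per uninterrupted unit time
I = 200.0      # inspection cost per uninterrupted unit time
MU = 10.0      # repair rate mu: mean repair time = 1/MU
I_RATE = 20.0  # inspection rate i: mean inspection time = 1/I_RATE
K = 4.0        # breakdown rate with no inspections, lambda(0)


def breakdown_rate(n: int) -> float:
    """Assumed form lambda(n) = K / (n + 1); an illustrative choice, not from the text."""
    return K / (n + 1)


def profit_per_unit_time(n: int) -> float:
    """Equation 3.1: P(n) = V - V*lam/mu - V*n/i - R*lam/mu - I*n/i."""
    lam = breakdown_rate(n)
    return V - V * lam / MU - V * n / I_RATE - R * lam / MU - I * n / I_RATE


def best_inspection_frequency(max_n: int = 20) -> int:
    """Trial-and-error search over integer n = 0..max_n for the largest P(n)."""
    return max(range(max_n + 1), key=profit_per_unit_time)


if __name__ == "__main__":
    for n in range(5):
        print(n, round(profit_per_unit_time(n), 2))
    print("best n:", best_inspection_frequency())
```

Printing P(n) for several n, as in the main block, is the tabular equivalent of comparing the optimal policy with the others.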
MINIMIZATION OF DOWNTIME
3.3.1 Statement of the Problem
The problem of this section is analogous to that of Section 3.2.1: equipment breaks down from time
to time, and to reduce the breakdowns, inspections and consequent minor modifications can be made. The
decision now, however, is to determine the inspection policy that minimizes the total downtime per unit time
incurred due to breakdowns and inspections, rather than to determine the policy that maximizes profit per
unit time. Figure 3.6 illustrates the problem.
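2. The objective is to choose n to minimize total downtime per unit time. The total downtime per unit
time will be a function of the inspection frequency, n, denoted as D(n). Therefore,
$$D(n) = \frac{\lambda(n)}{\mu} + \frac{n}{i} \qquad (3.4)$$
where the first term is the downtime incurred for repairs and the second the downtime incurred for inspections.
Equation 3.4 is a model of the problem relating inspection frequency n to total downtime D(n).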
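The same trial-and-error approach applies to Equation 3.4. This sketch uses the hypothetical assumptions λ(n) = k/(n + 1), μ = 10, i = 20, and k = 4 (unit time = one month), which are illustrative only. Under that assumed form, setting dD/dn = 0 gives n = √(k·i/μ) − 1 ≈ 1.83, and the integer search settles on n = 2.

```python
# Hypothetical values (unit time = one month); none come from the text.
MU = 10.0      # repair rate: mean repair time = 1/MU
I_RATE = 20.0  # inspection rate i: mean inspection time = 1/I_RATE
K = 4.0        # breakdown rate with no inspections, lambda(0)


def breakdown_rate(n: int) -> float:
    """Assumed form lambda(n) = K / (n + 1); illustrative only."""
    return K / (n + 1)


def downtime_per_unit_time(n: int) -> float:
    """Equation 3.4: D(n) = lambda(n)/mu + n/i (repair downtime + inspection downtime)."""
    return breakdown_rate(n) / MU + n / I_RATE


def best_inspection_frequency(max_n: int = 20) -> int:
    """Integer n in 0..max_n with the smallest total downtime per unit time."""
    return min(range(max_n + 1), key=downtime_per_unit_time)


if __name__ == "__main__":
    n_best = best_inspection_frequency()
    print(n_best, round(downtime_per_unit_time(n_best), 4))
```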
REPLACEMENT DECISIONS:-
The goal of this chapter is to present models that can be used to optimize component replacement
decisions. This decision area is of interest because a common approach to improving the reliability of a
system, or of complex equipment, is the preventive replacement of critical components within the system.
Thus, it is necessary to be able to identify which components should be considered for preventive
replacement and which should be left to run until they fail. If a component is a candidate for preventive
replacement, the next question to be answered is: What is the best time to replace it? The primary goal
addressed in this chapter is that of making a system more reliable through preventive replacement. In the
context of the framework of the decision areas addressed in this book, we are addressing column 1 of the
framework, as highlighted in Figure 2.1.
Deterministic problems are those in which the timing and outcome of the replacement action are
assumed to be known with certainty. For example, we may have an item that is not subject to failure but
whose operating cost increases with use. To reduce this operating cost, a replacement can be performed.
After the replacement, the trend in operating cost is known. This deterministic trend in costs is illustrated in
Figure 2.2.
Examples of component replacement problems that can be treated with a deterministic model are
provided in Table 2.1.
Probabilistic problems are those in which the timing and outcome of the replacement action depend
on chance. In the simplest situation, the equipment may be described as being good or failed. The
probability law describing changes from good to failed may be described by the distribution of time between
completion of the replacement action and failure. As described in Appendix 1, the time to failure is a
random variable whose distribution may be termed the equipment’s failure distribution.
Examples of component replacement problems that can be analyzed using a stochastic model are
provided in Table 2.2.
The determination of replacement decisions for probabilistically failing equipment involves decision making
with one main source of uncertainty: it is impossible to predict with certainty when a failure will
occur or, more generally, when the transition from one state of the equipment to another will occur. A further source
of uncertainty is that it may be impossible to determine the state of the equipment (good, failed, or somewhere in
between) unless definite maintenance action, such as inspection, is taken. This aspect of uncertainty is highly relevant
to equipment used in emergency situations, often termed protective devices. An example of such a protective device
is a pressure safety valve in an oil and gas field, which lies dormant, waiting to come into service when an unacceptable
pressure level occurs. Its condition can only be determined through an inspection.
In the probabilistic problems of this chapter, we will assume that there are only two possible conditions of the
equipment, good and failed, and that the condition is always known. This is not unreasonable because, for example,
with continuously operating equipment producing some form of goods, we will soon know when the equipment has
reached the failed state because items may be produced outside specified tolerance limits or the equipment may cease
to function.
In determining when to perform a replacement, we are interested in the sequence of times at which the
replacement actions should take place. Any sequence of times is a replacement policy, but what we are interested in
determining are optimal replacement policies, that is, ones that maximize or minimize some criterion, such as profit,
total cost, or downtime, or ensure that a specified safety or environmental criterion is not exceeded.
In many of the models of component replacement problems presented in this chapter, it will be assumed
(which applies in many cases) that the replacement action returns the equipment to the “as new” condition, thus
continuing to provide exactly the same services as the equipment that has just been replaced when it was new. By
making this assumption, we are implying that various costs, failure distributions, and so on used in the analysis do not
change from one replacement to the next. An exception to this assumption will be problems in which the item being
replaced is not replaced by one that can be considered statistically as good as new.
Throughout this chapter, maintenance actions such as overhaul and repair can be considered to be equivalent
to replacement, provided it is reasonable to assume that such actions also return equipment to the as-new condition. In
practice, this is often a reasonable assumption, and hence the following models can often be used to analyze
overhaul/repair problems. If it is not reasonable to make such an assumption, then the models introduced in Section
2.9.3, along with the model associated with condition-based maintenance in Chapter 3, may help. Section 2.2 addresses
a common deterministic component replacement problem. Stochastic problems are covered in Sections 2.3 through 2.9.
Some equipment operates with excellent efficiency when it is new, but as it ages, its performance deteriorates.
An example is the air filter in an automobile. When new, there is good gasoline consumption, but as the air filter gets
dirty, the gasoline consumption per kilometer increases. The question then is: When in the increasing cost trend is it
economically justifiable to replace the air filter, thus reducing the operating cost of the automobile? In general,
replacements cost money in terms of materials and wages, and a balance is required between the money spent on
replacements and savings obtained by reducing the operating cost. Thus, we wish to determine an optimal replacement
policy that will minimize the sum of operating and replacement costs per unit time.
When dealing with optimization problems, in general, we wish to optimize some measure of performance over
a long period. In many situations, this is equivalent to optimizing the measure of performance per unit time. This
approach is easier to deal with mathematically when compared to developing a model for optimizing a measure of
performance over a finite horizon.
The cost conflicts and associated optimization problems are illustrated in Figure 2.3. It should be stressed that
this class of problem can be called short-term deterministic because the magnitude of the interval between
replacements is weeks or months, rather than years. If the interval between replacements were measured in years, then
the fact that money changes in value over time would need to be taken into account in the analysis.
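Construction of the Model
1. c(t) is the operating cost per unit time at time t after replacement.
2. C_r is the total cost of a replacement.
3. The replacement policy is to perform replacements at intervals of length t_r. The policy is illustrated in
Figure 2.4.
4. The objective is to determine the optimal interval between replacements to minimize the total cost of
operation and replacement per unit time.
The total cost per unit time for a replacement interval t_r is
$$C(t_r) = \frac{\int_0^{t_r} c(t)\,dt + C_r}{t_r} \qquad (2.1)$$
Setting dC(t_r)/dt_r = 0 gives dC(t_r)/dt_r = [c(t_r) − C(t_r)]/t_r = 0, that is, the optimality condition c(t_r) = C(t_r): replace when the current operating cost per unit time equals the average total cost per unit time to date.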
Using the condition c(t_r) = C(t_r) requires that the trend in operating costs be an increasing function, which in practice is
a very reasonable assumption. If that is not the case, and the operating cost of a component becomes lower
as time progresses, then Equation 2.1 needs to be solved using classic calculus (if the cost trend is simple); otherwise, a
numerical solution will be required.
If the trend in operating costs is not continuous but discrete, then the optimal replacement time is when the
next period’s operating cost is equal to or greater than the current average cost per period (operating plus replacement) to that time. In other
words, replace when the marginal operating cost exceeds the average cost to date.
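The discrete replacement rule can be expressed in a few lines. In this minimal Python sketch, C(T) is taken as the replacement cost plus the operating costs of periods 1 to T, divided by T (the discrete form of Equation 2.1); the cost figures and function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def average_costs(op_costs: List[float], replacement_cost: float) -> List[float]:
    """C(T) = (Cr + c_1 + ... + c_T) / T for T = 1..len(op_costs)."""
    return [(replacement_cost + total) / t
            for t, total in enumerate(accumulate(op_costs), start=1)]


def replacement_period(op_costs: List[float], replacement_cost: float) -> Tuple[int, float]:
    """Replace at the end of the first period T whose next-period cost c_{T+1}
    is at least the average cost to date C(T); returns (T, C(T))."""
    avg = average_costs(op_costs, replacement_cost)
    for t in range(1, len(op_costs)):
        if op_costs[t] >= avg[t - 1]:  # op_costs[t] is c_{T+1} when T = t
            return t, avg[t - 1]
    # Rule never triggered within the data: fall back to the lowest average cost.
    best = min(range(len(avg)), key=avg.__getitem__)
    return best + 1, avg[best]


if __name__ == "__main__":
    # Hypothetical, increasing operating costs per period and replacement cost.
    costs = [20.0, 25.0, 35.0, 50.0, 70.0, 95.0]
    print(replacement_period(costs, 100.0))
```

With these made-up figures the average cost falls to 57.5 at T = 4, and the next period's cost (70) exceeds it, so the rule says replace after period 4, which is also where C(T) is smallest.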
Numerical Example
Further comments
In the construction of the model in this section, the time required to perform a replacement has not been included. This
replacement time, T_r, can be accommodated without difficulty. See Figure 2.7 and Equation 2.2 for the appropriate
model:
$$C(t_r) = \frac{\int_0^{t_r} c(t)\,dt + C_r}{t_r + T_r} \qquad (2.2)$$
In practice, it is often not unreasonable to disregard the replacement time because it is usually small when
compared with the interval between the replacements. Any costs, such as production losses incurred due to the
duration of the replacement, need to be incorporated into the cost of the replacement action.
Models have now been developed whereby, for particular assumptions, the optimal interval between
replacements can be obtained. In practice, there may be considerable difficulty in scheduling replacements to occur at
their optimal time, or in obtaining the values of some of the parameters required for the analysis. To further assist the
engineer in deciding on an appropriate replacement policy, it is usually useful to plot the total cost per unit
time curve (Figure 2.8). The advantage of the curve is that, along with giving the optimal value of t_r, it shows the shape
of the total cost curve around the optimum. If the curve is fairly flat around the optimum, it is not critical for
the engineer to schedule the replacements to occur exactly at the optimum, which gives some leeway in scheduling
the work. Thus, in Figure 2.8, choosing a replacement interval t_r anywhere between 3.5 and 6 weeks does not
greatly influence the total cost. Of course, if the total cost curve is not fairly flat around the optimum but rises rapidly
on both sides, then the optimal interval should be adhered to if at all possible.
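The flatness of the total cost curve can be quantified numerically. This sketch assumes, purely for illustration, a linear operating-cost trend c(t) = A + Bt with hypothetical values A = 10, B = 4 (per week) and C_r = 50, for which the total cost per unit time is A + B·t_r/2 + C_r/t_r and the optimum is t_r* = √(2C_r/B) = 5 weeks. It then finds the range of intervals whose cost lies within 5% of the minimum.

```python
import math

# Hypothetical linear operating-cost trend c(t) = A + B*t (an assumption, not from the text).
A = 10.0   # operating cost per week immediately after replacement
B = 4.0    # increase in weekly operating cost per week of use
CR = 50.0  # cost of one replacement


def total_cost_rate(tr: float) -> float:
    """(integral of c(t) from 0 to tr + CR) / tr for c(t) = A + B*t."""
    return A + B * tr / 2 + CR / tr


def optimal_interval() -> float:
    """dC/dtr = B/2 - CR/tr**2 = 0 gives tr* = sqrt(2*CR/B)."""
    return math.sqrt(2 * CR / B)


def near_optimal_band(tolerance: float = 0.05, step: float = 0.01, t_max: float = 20.0):
    """Smallest and largest grid intervals whose cost is within `tolerance` of the minimum."""
    c_min = total_cost_rate(optimal_interval())
    grid = [step * k for k in range(1, int(t_max / step) + 1)]
    ok = [t for t in grid if total_cost_rate(t) <= (1 + tolerance) * c_min]
    return min(ok), max(ok)


if __name__ == "__main__":
    print(optimal_interval(), total_cost_rate(optimal_interval()))
    print(near_optimal_band())
```

With these assumed numbers, any interval from roughly 3.4 to 7.3 weeks costs no more than 5% above the minimum, which is the kind of leeway a flat curve provides.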
If there is uncertainty about the value of a particular parameter required in the analysis (say, we are not sure
what the replacement cost is), then evaluating the total cost curve for various values of the uncertain parameter, and
noting the effect of this variation on the optimal solution, often goes a long way toward deciding what policy should
be adopted and whether the particular parameter is important from a solution viewpoint. For example, changing the value of
C_r in Equation 2.1 may produce curves similar to those in Figure 2.9, which demonstrate, in this instance, that although C_r is
varied, it does not greatly influence the optimal value of t_r. In fact, there is an overlap, which indicates a good
solution independent of the true value of C_r (provided this value is within the bounds specified by the two curves). If
changes in C_r drastically altered the solution in terms of replacement interval and minimal total cost, then it
would be clear that a careful study would be required to identify the true value of C_r to be used when solving the
model. (For example, does C_r include only material and labor costs? Or does it include lost production costs? Or costs
associated with having to use a less efficient plant, overtime, or contractors to make up for losses incurred
as a result of the replacement?) The decision that can be taken (in this case, regarding the interval between
replacements) may remain essentially constant within the range of uncertainty examined by the sensitivity check. This does not
necessarily mean that the true total costs will have more or less the same numerical value within the overlap region.
From a decision-making point of view, however, this does not matter, because it is the interval between replacements
that is under the control of the decision maker. The total costs are a consequence of the decision taken.
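A sensitivity check on C_r can be sketched with a hypothetical linear trend c(t) = A + Bt (A = 10, B = 4; not from the text). The interval is planned with a best-guess C_r = 50, and the sketch reports how much extra cost that choice incurs if the true C_r is actually 40 or 60.

```python
import math

# Hypothetical linear trend c(t) = A + B*t; values are illustrative, not from the text.
A = 10.0
B = 4.0


def total_cost_rate(tr: float, cr: float) -> float:
    """Cost per unit time for replacement interval tr when a replacement costs cr."""
    return A + B * tr / 2 + cr / tr


def optimal_interval(cr: float) -> float:
    """Optimal interval for this linear trend: tr* = sqrt(2*cr/B)."""
    return math.sqrt(2 * cr / B)


def penalty(cr_true: float, tr_used: float) -> float:
    """Fractional extra cost of using tr_used when the true replacement cost is cr_true."""
    best = total_cost_rate(optimal_interval(cr_true), cr_true)
    return total_cost_rate(tr_used, cr_true) / best - 1


if __name__ == "__main__":
    tr_plan = optimal_interval(50.0)  # interval chosen with the best-guess Cr = 50
    for cr in (40.0, 50.0, 60.0):
        print(f"Cr={cr:5.1f}  tr*={optimal_interval(cr):.2f}  "
              f"penalty of tr={tr_plan:.2f}: {penalty(cr, tr_plan):.2%}")
```

Under these assumptions the optimal interval moves between about 4.5 and 5.5 weeks, yet keeping the planned 5-week interval costs less than 1% extra, so C_r would not need to be pinned down precisely for this decision.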
Thus, sensitivity checking gives guidance on what information is important from a decision-making viewpoint
and, consequently, what information should be gathered in a data collection scheme. It also shows that the statement “garbage in =
garbage out,” which is frequently made with reference to the data requirements of quantitative techniques, is
not necessarily correct: its validity depends on the sensitivity of the solution to the particular garbage. Note, therefore, that garbage in does not necessarily equal garbage
out, and so our information requirements for the use of quantitative techniques may not be as severe as is often
claimed.
APPLICATIONS:
What is the economic replacement time for the air filter in an automobile?
The purchase price of an air filter is $80. The automobile driver travels 2,000 km/month. Gasoline costs
$0.75/L. When the air filter is new, then during the first month of operation, the automobile’s performance is 15 km/L;
thus, the first month’s operating cost is $100.00. As the filter ages, there is a deterioration in the number of kilometers
that can be driven using 1 L of gasoline. The deterioration trend is given in Table 2.4.
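The $100.00 first-month figure follows from the data given: 2,000 km ÷ 15 km/L × $0.75/L. The sketch below converts a fuel-economy trend into monthly operating costs c(t). Only the first month's 15 km/L is stated in the text (the rest of Table 2.4 is not reproduced here), and the function names are hypothetical.

```python
# Figures from the example: 2,000 km/month driven, gasoline at $0.75/L.
KM_PER_MONTH = 2000.0
PRICE_PER_LITRE = 0.75


def monthly_operating_cost(km_per_litre: float) -> float:
    """Fuel cost for one month = (km driven x price per litre) / fuel economy."""
    return KM_PER_MONTH * PRICE_PER_LITRE / km_per_litre


def monthly_costs(economy_trend):
    """Convert a month-by-month km/L trend (such as Table 2.4) into operating costs c(t)."""
    return [monthly_operating_cost(e) for e in economy_trend]


if __name__ == "__main__":
    print(monthly_operating_cost(15.0))  # 100.0, the first month with a new filter
```

The resulting list of c(t) values, together with the $80 filter price as the replacement cost, is what the discrete form of Equation 2.1 works on.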
Using Equation 2.1 in discrete form, we obtain Table 2.5, from which we see that the optimal replacement
age is 4 months, and the associated cost per month is $131.88. The associated graph of cost per month versus time is
provided in Figure 2.10, which includes a calculation showing the use of the optimizing criterion c(t_r) = C(t_r) when the
trend in operating cost is discretized.
Therefore, replace at the end of month 4, because the next period’s operations and maintenance cost, c(t = 5), is
greater than the average cost to date ($131.88).
Overhauling a Boiler Plant
The replacement problem we have been discussing is similar to a problem associated with a boiler plant.
Through use, the heat transfer surfaces within the boiler become less efficient, and to increase their efficiency, they
can be cleaned. Cleaning thus increases the rate of heat transfer, and less fuel is required to produce a given amount of
steam. However, due to deterioration of other parts of the boiler plant, the trend in operating cost is not constant after
each cleaning operation (equivalent to a replacement), but follows a trend similar to that of Figure 2.11. Thus, k
illustrated in Figure 2.6 is no longer constant, but varies from replacement to replacement. That is, the trend in
operating cost after each replacement depends on the amount of steam produced up to the date of the replacement. A
detailed study of this problem is given by Davidson (1970), who analyzes it using a dynamic programming model.
MODELS
One of the main tools in the scientific approach to management decision making is that of building an
evaluative model, usually mathematical, whereby a variety of alternative decisions can be assessed. Any model is
simply a representation of the system under study. In the application of quantitative techniques to management
problems, the type of model used is frequently a symbolic model in which the components of the system are
represented by symbols, and the relationships of these components are described by mathematical equations.
To illustrate this model-building approach, we will examine a maintenance stores problem that, although
simplified, will illustrate two of the most important aspects of the use of models: the construction of a model of the
problem being studied and its solution.
A Stores Problem
A stores controller wishes to know how many items to order each time the stock level of an item reaches zero.
The system is illustrated in Figure 1.5.
The conflict in this problem is that the more items the controller orders at any time, the lower the ordering costs
(because fewer orders will have to be placed), but the higher the stockholding costs. These conflicting
costs are illustrated in Figure 1.6.
The stores controller wants to determine the order quantity that minimizes the total cost. This total cost can be
plotted, as shown in Figure 1.6, and used to solve the problem. In this particular case, the total cost is minimized when
the order quantity is at the intersection of the holding cost curve and the ordering cost curve. However, this should not
be generalized; for example, see Figure 1.8. A much more rapid solution to the problem, however, may be
obtained by constructing a mathematical model. The following parameters can be defined:
D = total annual demand
Q = order quantity
C_o = ordering cost per order
C_h = stockholding cost per item per year
Ordering cost per year = (number of orders placed per year, D/Q) × (ordering cost per order)
$$= \frac{D C_o}{Q}$$
Stockholding cost per year = (average number of items in stock, assuming a linear decrease of stock) × (stockholding cost per item per year)
$$= \frac{1}{2} Q C_h$$
Therefore, the total cost per year, which is a function of the order quantity and is denoted C(Q), is
$$C(Q) = \frac{D C_o}{Q} + \frac{Q C_h}{2} \qquad (1.1)$$
Equation 1.1 is a mathematical model of the problem relating order quantity Q
to total cost C(Q).
The stores controller wants the number of items to order that minimizes the total cost, that is, that minimizes the
right-hand side of Equation 1.1. The answer comes from differentiating the equation with respect to Q, the order
quantity, and equating the derivative to zero as follows:
$$\frac{dC(Q)}{dQ} = -\frac{D C_o}{Q^2} + \frac{C_h}{2} = 0 \quad\Rightarrow\quad Q^* = \sqrt{\frac{2 D C_o}{C_h}} \qquad (1.2)$$
Because the values of D, C_o, and C_h are known, their substitution into Equation 1.2 gives Q*, the optimal value of Q.
Strictly speaking, we should check that the value of Q* obtained from Equation 1.2 is a minimum and not a
maximum. The interested reader can check that this is the case by taking the second derivative of C(Q) and noting that
the result is positive. In fact, in this particular case, the optimal order quantity equalizes the average holding and
ordering costs.
From Equation 1.2, we find that by optimizing the order quantity, the total cost per year is minimized, and
its value is
$$C(Q^*) = \sqrt{2 D C_o C_h}$$
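For example, let D = 1000 items, C_o = $5.00, and C_h = $0.25:
$$Q^* = \sqrt{\frac{2 \times 1000 \times 5}{0.25}} = \sqrt{40000} = 200 \text{ items}, \qquad C(Q^*) = \sqrt{2 \times 1000 \times 5 \times 0.25} = \sqrt{2500} = \$50 \text{ per year}$$
Thus, each time the stock level reaches zero, the stores controller should order 200 items to minimize the
total cost per year of ordering and holding stock.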
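Equations 1.1 and 1.2 translate directly into code. This minimal Python sketch (the function names are illustrative) reproduces the worked example.

```python
import math


def eoq(demand: float, order_cost: float, holding_cost: float) -> float:
    """Equation 1.2: Q* = sqrt(2 * D * Co / Ch)."""
    return math.sqrt(2 * demand * order_cost / holding_cost)


def total_annual_cost(q: float, demand: float, order_cost: float, holding_cost: float) -> float:
    """Equation 1.1: C(Q) = D*Co/Q + Q*Ch/2 (ordering cost + holding cost per year)."""
    return demand * order_cost / q + q * holding_cost / 2


if __name__ == "__main__":
    D, CO, CH = 1000.0, 5.00, 0.25  # values from the example in the text
    q_star = eoq(D, CO, CH)
    print(q_star, total_annual_cost(q_star, D, CO, CH))  # 200.0 50.0
```

At Q* the two cost components are equal ($25 each), as noted above.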
Note that various assumptions have been made in the inventory model presented that, in practice, may not
be realistic. For example, no consideration has been given to the possibility of quantity discounts, the possible lead
time between placing an order and its receipt, the fact that demand may not be linear, or the fact that demand may not
be known with certainty. The purpose of the above model is simply to illustrate the construction and solution of a
model for a particular problem. If the reader is interested in the stock control aspects of maintenance stores, see
Nahmias (1997).
In the stores problem of the previous section, two methods for solving a mathematical model were demonstrated:
an analytical procedure and a numerical procedure.
The calculus solution was an illustration of an analytical technique in which no particular set of values of the
control variable (amount of stock to order) was considered, but we proceeded straight to the solution given by
Equation 1.2.
In the numerical procedure, solutions for various values of the control variables were evaluated to identify the best
results, that is, it is a trial-and-error procedure. The graphical solution of Figure 1.6 is equivalent to inserting different
values of Q into the model (Equation 1.1) and plotting the total cost curve to identify the optimal value of Q.
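The numerical (trial-and-error) procedure can be sketched by evaluating Equation 1.1 over a range of integer order quantities, which is the computational counterpart of reading the minimum off the total cost curve. The defaults are the example's D = 1000, C_o = $5.00, and C_h = $0.25; the search range is an arbitrary choice.

```python
def total_annual_cost(q: int, demand: float = 1000.0,
                      order_cost: float = 5.00, holding_cost: float = 0.25) -> float:
    """Equation 1.1 with the example's D, Co, and Ch as defaults."""
    return demand * order_cost / q + q * holding_cost / 2


def best_order_quantity(max_q: int = 1000) -> int:
    """Numerical procedure: evaluate C(Q) for Q = 1..max_q and keep the cheapest."""
    return min(range(1, max_q + 1), key=total_annual_cost)


if __name__ == "__main__":
    for q in (50, 100, 200, 400, 800):
        print(q, total_annual_cost(q))  # points on the total cost curve
    print("best Q:", best_order_quantity())
```

The search arrives at the same Q* = 200 as the analytical solution, but only after evaluating every candidate, which is why analytical procedures are preferred when they are available.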
In general, analytical procedures are preferred to numerical ones, but because of problem complexity, in many
cases, they are impracticable or even impossible to use. In many of the maintenance problems examined in this book,
the solution to the mathematical model will be obtained by using numerical procedures. These are primarily graphical
procedures, but iterative procedures and simulation are also used.
Perhaps one of the main advantages of graphical solutions is that they often enable management to clearly see the
effect of implementing a maintenance policy that deviates from the optimum identified through solving the model.
Also, it may be possible to plot the effects of different maintenance policies together, thus illustrating the relative
effects of the policies. To illustrate this point, Chapter 2 includes the analyses of two different replacement
procedures:
Procedure 1: Perform preventive replacements at fixed intervals of time, irrespective of the age of the item, with failure replacements made as necessary in between.
Procedure 2: Perform a preventive replacement when the item reaches a specified age (usage), with failure replacements made as necessary.
Intuitively, one might feel that procedure 2 would be preferable because it is based on usage of the item (thus
preventing an almost new item from being replaced shortly after its installation following a previous failure, as
would happen with procedure 1).
For these different maintenance policies, which can be adopted for the same equipment, models can be
constructed, as is done in Chapter 2, and, for each policy, the optimal procedure can be determined. However, by
using a graphical solution procedure, the maintenance cost of each policy can be plotted, as illustrated in Figure 1.7,
and the maintenance manager can see exactly the effect of the alternative policies on total cost. It may well be the case
that from a data collection point of view, one policy involves considerably less work than the other, yet they may have
almost the same minimum total cost. This is illustrated in Figure 1.7, in which the minimum total costs are about the
same for procedures 1 and 2.
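The comparison in Figure 1.7 can be reproduced in spirit with a short numerical sketch. It assumes the standard cost-per-unit-time expressions for the two procedures: for procedure 1, C(tp) = [Cp + Cf H(tp)] / tp, where H(tp) is the expected number of failures in an interval of length tp; for procedure 2, C(tp) = [Cp R(tp) + Cf F(tp)] divided by the integral of R(t) from 0 to tp. The Weibull life distribution, the costs Cp and Cf, and the grid are hypothetical, and H(t) is approximated by discretizing the renewal equation.

```python
import math

ETA, BETA = 1000.0, 2.5   # hypothetical Weibull scale (hours) and shape of the item's life
CP, CF = 100.0, 500.0     # hypothetical costs of a preventive and a failure replacement
STEP = 5.0                # grid step (hours) for the numerical integrals


def cdf(t):
    """Weibull failure distribution F(t)."""
    return 1.0 - math.exp(-((t / ETA) ** BETA))


def renewal_function(n_steps):
    """H[n] ~ expected failures in [0, n*STEP], from the discretized renewal equation."""
    F = [cdf(k * STEP) for k in range(n_steps + 1)]
    H = [0.0] * (n_steps + 1)
    for n in range(1, n_steps + 1):
        H[n] = F[n] + sum(H[n - k] * (F[k] - F[k - 1]) for k in range(1, n + 1))
    return H


def block_cost(n, H):
    """Procedure 1: replace at fixed intervals tp = n*STEP and also on failure."""
    return (CP + CF * H[n]) / (n * STEP)


def age_cost(n):
    """Procedure 2: replace when the item reaches age tp = n*STEP, or on failure."""
    reliability = 1.0 - cdf(n * STEP)
    # expected cycle length = integral of R(t) over [0, tp], trapezoidal rule
    cycle = sum((2.0 - cdf(k * STEP) - cdf((k + 1) * STEP)) * STEP / 2.0 for k in range(n))
    return (CP * reliability + CF * (1.0 - reliability)) / cycle


H = renewal_function(400)                        # covers intervals up to 2000 h
grid = range(10, 401, 10)                        # tp = 50, 100, ..., 2000 h
procedure_1 = {n * STEP: block_cost(n, H) for n in grid}
procedure_2 = {n * STEP: age_cost(n) for n in grid}
for name, curve in (("procedure 1", procedure_1), ("procedure 2", procedure_2)):
    tp_best = min(curve, key=curve.get)
    print(f"{name}: optimal tp = {tp_best:.0f} h, cost = {curve[tp_best]:.4f} per hour")
```

Plotting the two dictionaries against tp gives curves of the kind shown in Figure 1.7; how flat each curve is near its minimum indicates how much freedom the manager has in choosing the interval.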
FIGURE 1.7 Comparing the total maintenance costs of two preventive replacement procedures
Of course, for different costs, breakdown distributions, failure and preventive replacement times, and so on,
the minimum total costs and replacement intervals may differ greatly between different replacement policies. The
point is that a graphical illustration of the solutions often assists the manager to determine the policy to be adopted.
Also, such a method of presenting a solution is often more acceptable than a statement such as “policy x is the best,”
which may be presented along with complicated mathematics.
Further comments about the benefits of curve plotting are given in Section 2.2.4 in relation to the problem of
determining the optimal replacement interval for equipment, the operating cost of which increases with use.
One of the developments in numerical procedures made possible by computers is simulation. An application of
this procedure will be illustrated in a problem in Chapter 5, which relates to determining the optimal number of
machines to be installed in a workshop.
The main problem areas in which maintenance decisions can be addressed with such models include:
• Inspection frequencies
• Overhaul intervals, i.e., part of a preventive maintenance policy
• Whether to do repairs, i.e., having a breakdown maintenance policy or not
• Replacement rules for components
• Replacement rules for capital equipment, perhaps taking account of technological changes
• Whether equipment should be modified
• The size of the maintenance crew
• Composition of machines in a workshop
• Rules for the provision of spares
Appendix 7 provides a list of real-world applications of maintenance decision optimization models in different
industries.
Problems within these areas can be classified as being deterministic or probabilistic. Deterministic ones are
those in which the consequences of a maintenance action are assumed to be nonrandom. For example, after an
overhaul, the future trend in operating costs is known. A probabilistic problem is one in which the outcome of the
maintenance action is random. For example, after equipment repair, the time to next failure is uncertain.
To solve any of the previously mentioned problems, there are often many alternative decisions. For example,
for an item subject to sudden failure, we may have to decide whether to replace it while it is in an operating state, or
only upon its failure; whether to replace similar components in groups when only one has failed; and so on. Thus, the
function of the asset management department is, to a large extent, concerned with determining the effect of various
decisions to control the condition of assets on meeting the objectives of the organization.
As indicated previously, many control actions are open to the maintenance manager. The effect of these
actions should not be looked at solely from their effect on the asset management department because the consequences
of such actions may seriously affect other units of the organization, such as production or operations.
To illustrate the possible interactions of the asset management function in other departments, consider the
effect of the decision to perform repairs only and not to do any preventive maintenance, such as overhauls. This
decision may well reduce the budget for asset management, but it may also cause considerable production or
operation downtime. To take account of interactions, sophisticated techniques are frequently required, and this is where the
use of mathematical models can assist the maintenance manager and reduce the tension that often occurs between
maintenance and operations.
Figure 1.8 illustrates the type of approach taken by using a mathematical model to determine the optimal
frequency of overhauling a piece of the plant by balancing the input (maintenance cost) of the maintenance policy
against its output (reduction in downtime).
The above example is very simple and, in practice, we have to consider many factors in the context of even a
single maintenance decision. For example, if the objective of a maintenance decision is to minimize total costs
(lowest cost optimization), the costs of the component or asset, labor, lost production, and perhaps even customer
dissatisfaction from delayed deliveries are all to be considered. Where equipment or component wear-out is a factor,
the lowest possible cost is usually achieved by replacing machine parts late enough to get good service out of them,
but early enough for an acceptable rate of on-the-job failures (to attain a zero rate, we would probably have to replace
parts every day). In another scenario in which availability is to be maximized, we have to get the right balance
between taking equipment out of service for preventive maintenance and suffering outages due to breakdowns. If
safety is the most important factor, we might optimize for the safest possible solution, but with an acceptable effect on
cost. If profit is to be optimized, we would take into account not only cost but also the effect on revenues through
greater customer satisfaction (better profits) or delayed deliveries (lower profits).
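The availability trade-off can be sketched the same way. Assuming an age-based replacement policy, the long-run fraction of downtime is the expected downtime per cycle, Tp R(tp) + Tf F(tp), divided by the expected cycle length, which is the integral of R(t) from 0 to tp plus that downtime. The Python sketch below searches a grid for the tp that minimizes this fraction; the Weibull parameters and the downtimes Tp and Tf are hypothetical.

```python
import math

ETA, BETA = 1000.0, 2.5   # hypothetical Weibull life of the item (hours)
T_P, T_F = 4.0, 40.0      # hypothetical downtime of a preventive and of a failure replacement (hours)


def reliability(t):
    return math.exp(-((t / ETA) ** BETA))


def downtime_fraction(tp, step=1.0):
    """Long-run fraction of time down when the item is replaced at age tp or on failure."""
    n = int(round(tp / step))
    # expected operating time per cycle = integral of R(t) over [0, tp], trapezoidal rule
    uptime = sum((reliability(k * step) + reliability((k + 1) * step)) * step / 2.0 for k in range(n))
    r = reliability(tp)
    down = T_P * r + T_F * (1.0 - r)     # expected downtime per cycle
    return down / (uptime + down)


grid = range(100, 3001, 50)
best_tp = min(grid, key=downtime_fraction)
print(f"optimal tp = {best_tp} h, availability = {1.0 - downtime_fraction(best_tp):.4f}")
```

Replacing the downtimes with costs turns the same code into the lowest-cost criterion, which shows how the choice of objective, not the model structure, drives the answer.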
The example shown in Figure 1.8 should suffice to show that the quantitative approach taken in this book is
concerned with determining appropriate maintenance decisions by studying the mathematical and statistical
relationships between the decisions to be made and the consequences of these decisions. The foregoing comments
about the use of models for analyzing maintenance problems are very brief, but they will be elaborated upon in the
subsequent chapters of this book.
Data are essential inputs for building decision models that support evidence-based asset management. It must be
recognized that mathematical models by themselves do not guarantee that the right decisions will be made if the data
used do not have the required quality. A discussion on data requirements for model creation in the context of
maintenance optimization is presented in Tsang et al. (2006).
When data are unavailable or sparse, creating a model that characterizes the risk of failure can still be
achieved through knowledge elicitation by interviewing the asset’s domain experts. The related methodology, as well
as an illustrative example, is provided in Appendix 5.
UNIT III
MAINTENANCE LOGISTICS
Logistics
Logistics is the integrated design, management, and operation of human, physical, financial, and information
resources, during product, system, or service life time. (The Society of Logistic Engineers, SOLE)
It is a systems engineering discipline that aims to lower a product's life cycle cost and to reduce the demand
for logistics by optimizing the maintenance system, thereby easing product support. Although originally developed for
military purposes, it is also widely used in commercial customer service organisations.
Classification of Logistics
• Supply chain logistics: Supply chain logistics deals with the delivery of inputs from suppliers to the
manufacturing plant and the delivery of finished goods to various demand centers. It deals with raw materials and
components on the input side and finished products on the output side.
• Service response logistics: Service response logistics is the process of coordinating non‐material activities
necessary for the fulfillment of the service in an effective way. Service response logistics has a different focus from
supply chain logistics, in that supply chain logistics focuses on physical supply and distribution of products, whilst
service response logistics emphasizes building responsive organizations, which can respond to customer requests. This
difference in emphasis is illustrated in Figure 20.1.
• Product support logistics: Product support logistics deals with the provisioning, procurement, materials
handling, transportation and distribution, and warehousing of the items and the support infrastructure needed for
carrying out these activities over the life of the product. Figure 20.2 shows the main elements of product support
logistics.
LOGISTICS MANAGEMENT:
Logistics management deals with decision making and this is done at three different levels. In the context of
manufacturing logistics, the three levels are as follows:
• The strategic level deals with decisions that have a long‐lasting effect on the firm. This includes decisions
regarding the number, location, and capacities of warehouses and manufacturing plants.
• The tactical level typically includes decisions that are updated anywhere between once every quarter and once
every year. This embraces purchasing decisions, inventory policies, and transportation strategies.
• The operational level refers to day‐to‐day decisions such as scheduling, routing trucks, and measuring
performance.
Maintenance logistics overlaps with product support logistics when one is dealing with products. Since our
focus is not only on products, but also on other engineered objects such as plants and infrastructures, maintenance
logistics needs to be looked at from the provider perspective – this may be the maintenance department (for in‐house
maintenance) and/or the external service provider (for outsourced maintenance). The key elements of maintenance
logistics for engineered objects are shown in Figure 20.3.
Service Facilities
Carrying out maintenance activities requires several service facilities including workshops to repair failed
items, warehouses and other storage facilities to store materials and spare parts, and so on. Having the tools and
equipment to carry out maintenance is an important issue that needs to be addressed.
Location
Maintenance service facilities may be located in one place, as in the case of plants, or may be distributed over
a wide geographical area to be close to customers in the case of consumer products. In the case of infrastructures,
these facilities have to be distributed due to the nature of the engineered object itself. The facilities may be owned by a
single or several different service providers. In some cases, such as air forces or airlines, these facilities may have a
multi‐echelon structure where different types of maintenance are carried out at different levels.
HUMAN RESOURCES
The maintenance of most objects (products, plants, or infrastructures) is labor intensive. Having the right mix
of skills and the right workforce size are key to ensuring effective maintenance. Maintenance personnel may be
specialized in certain areas (for example, mechanic, electrician, welder, etc.) or multi‐skilled to deal with a particular
item (car, aircraft, ship, air conditioner, turbine, rail, etc.).
Key issues in terms of maintenance human resources include having the right mix of skills supported by
adequate training programs in order to provide the required level of service.
INVENTORIES
Maintenance of an object requires various kinds of physical goods and they can be grouped broadly into two
categories: (i) consumables, such as oil and grease in plants, paint in infrastructures, and so on, and (ii) spare parts,
that is, items (from components to objects and anything in between) that may be bought new from external suppliers or
repaired/reconditioned either in‐house or by an external agent.
Inventory management (of materials and spares) is important, as holding inventories implies capital being tied
up. Not having enough inventories of spare parts and materials may affect the functioning/operation of the object, with
serious consequences in terms of availability and cost. In this section we focus on spare parts from the point of view of
the maintenance service provider.
Characterization of Spare Parts
Spare parts and maintenance are a significant part of most industrial world economies, as illustrated by the
statement given below.
Spare parts and services account for 8% of the annual gross domestic product in the United States. Consumers
and businesses spend more than $700 billion each year on spare parts and services for previously purchased assets,
such as automobiles, aircraft, and industrial machinery. On a global basis, the annual spending on such aftermarket
parts and services totals more than $1.5 trillion.
There are many different ways of characterizing the spare parts used in maintenance, and these include the
following.
1. Repairable parts: Parts that are repaired rather than procured; that is, parts that are technically and
economically repairable. After repair, the part becomes ready for use again.
Non‐Repairable Spares
The primary question that service providers encounter for spare parts planning is how to place the spare part
inventories throughout their service network. Possible options include delivering parts to the field where they are
required, channeling the parts through a central warehouse – a two‐echelon solution, or a three‐echelon solution with a
central distribution center and regional warehouses close to customers. Once the distribution network is in place, the
next issue is the ordering and inventory policies for the different echelons.
Repairable Spares
Given an item design and a repair network, a level of repair analysis (LORA) determines, for each component
in the item, (i) whether it should be discarded or repaired upon failure and (ii) at which echelon in the repair network
this should be done. The objective of the LORA is to minimize the total (variable and fixed) costs.
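For a handful of components, the LORA decision can be illustrated by brute-force enumeration. The sketch below assumes hypothetical annual variable costs for each option and a fixed cost for equipping an echelon for repair that is shared by all components repaired there; this shared fixed cost is what couples the component decisions. Practical LORA models are usually formulated as integer programs, because enumeration grows exponentially with the number of components.

```python
from itertools import product

# Hypothetical annual variable costs (spares, labor, transport) of each option per component.
VARIABLE = {
    "pump":   {"discard": 9000, "repair@site": 4000, "repair@depot": 5000},
    "valve":  {"discard": 1500, "repair@site": 1200, "repair@depot": 1300},
    "sensor": {"discard": 800,  "repair@site": 700,  "repair@depot": 750},
}
# Hypothetical fixed annual cost of equipping an echelon for repair, paid once if any component uses it.
FIXED = {"repair@site": 6000, "repair@depot": 2500}


def total_cost(assignment):
    """Variable costs of the chosen options plus fixed costs of every echelon used for repair."""
    variable = sum(VARIABLE[c][option] for c, option in assignment.items())
    fixed = sum(FIXED[option] for option in set(assignment.values()) if option in FIXED)
    return variable + fixed


def lora_brute_force():
    """Enumerate every discard/repair-location combination and return the cheapest."""
    components = list(VARIABLE)
    best = None
    for options in product(*(VARIABLE[c] for c in components)):
        assignment = dict(zip(components, options))
        cost = total_cost(assignment)
        if best is None or cost < best[1]:
            best = (assignment, cost)
    return best


print(lora_brute_force())
# ({'pump': 'repair@depot', 'valve': 'repair@depot', 'sensor': 'repair@depot'}, 9550)
```

With these numbers, repairing the valve and sensor at the depot pays off only because the depot's fixed cost is already incurred for the pump; evaluated in isolation, discarding them would look cheaper.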
A typical structure of the repair network is to have a single‐ or multi‐echelon system. The details of these
types of network and their operation are discussed later in this chapter.
Other Issues
• Criticality: This is based on the consequences caused by the failure of the part. The unavailability of some
parts may shut down a whole unit or plant, resulting in high losses.
• Specificity: Some parts are custom‐made whilst others are generic and common to many objects.
• Lead time: Many spare parts have a long lead time, especially for custom built items or repairable items that
have to queue for service at a repair facility.
Figure 20.4 is a system characterization of the real world of spare parts management which shows the key
elements and the interactions among them.
The key issues we will discuss include:
There are two main approaches for forecasting the demand for spare parts. The first is the reliability‐based
approach and the second is the black‐box approach based on historical data on spare parts consumption. In some
cases, spare parts demand exhibits patterns that cannot be predicted well using traditional forecasting methods. We
focus on reliability‐based forecasting.
Demand for non‐repairable items depends on several factors, and the factors influencing the demand for
spares are shown in Figure 20.5.
Replacements of components are points (some random and others non‐random) along the time axis. Let N(t)
denote the count of failures and replacements over [0, t); this is the demand for replacement items. The demand is
uncertain and can be characterized in terms of its mean and variance, as shown in Figure 20.6. The mean demand for
an item is given by E[N(t)], often referred to as the mean cumulative function (MCF) of the item.
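When the MCF has no closed form, the mean and variance of N(t) can be estimated by simulation. The Python sketch below assumes that each failed item is replaced immediately by a new, identical item, so that failures form a renewal process; the Weibull parameters are hypothetical. With exponential lifetimes N(t) is Poisson, which gives a simple check: the mean and the variance should both be close to t divided by the mean life.

```python
import random
import statistics


def simulate_demand(t, sample_life, runs=20000, seed=1):
    """Estimate mean and variance of N(t), the number of failure replacements in [0, t)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(runs):
        clock, n = sample_life(rng), 0
        while clock < t:          # each failure before t consumes one spare
            n += 1
            clock += sample_life(rng)
        counts.append(n)
    return statistics.mean(counts), statistics.variance(counts)


def weibull_life(rng):
    """Hypothetical item: Weibull lifetimes with scale 1000 h and shape 2.5."""
    return rng.weibullvariate(1000.0, 2.5)


mean, var = simulate_demand(5000.0, weibull_life)
print(f"E[N(t)] ~ {mean:.2f}, Var[N(t)] ~ {var:.2f}")
```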
One can divide items into three groups, as shown in Figure 20.7: (A) fast‐moving, (B) medium‐moving, and
(C) slow‐moving items based on their reliability (low, intermediate, and high).
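One simple way to operationalize this grouping is to estimate each item's annual demand from its reliability and compare it with thresholds. The sketch below assumes that, in the long run, each installed unit generates about one demand per mean time to failure (the elementary renewal theorem), uses Weibull lifetimes, and takes hypothetical thresholds of 10 and 1 spares per year; none of these values come from the text.

```python
import math


def mttf_weibull(scale, shape):
    """Mean time to failure of a Weibull life distribution."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def classify(items, hours_per_year, fast=10.0, slow=1.0):
    """Group items by approximate annual demand = installed units * hours per year / MTTF.

    The thresholds `fast` and `slow` (spares per year) are hypothetical."""
    groups = {}
    for name, (units, scale, shape) in items.items():
        demand = units * hours_per_year / mttf_weibull(scale, shape)
        if demand >= fast:
            groups[name] = "A (fast-moving)"
        elif demand >= slow:
            groups[name] = "B (medium-moving)"
        else:
            groups[name] = "C (slow-moving)"
    return groups


# Hypothetical fleet data: installed units, Weibull scale (hours), Weibull shape.
ITEMS = {"filter": (50, 2000.0, 1.5), "bearing": (20, 20000.0, 2.0), "gearbox": (4, 60000.0, 3.0)}
print(classify(ITEMS, hours_per_year=4000.0))
```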
NEW ITEM INVENTORY MANAGEMENT
Inventory management involves the selection of suppliers (a tactical decision) and ordering policies. This
section deals with ordering policies and inventory costs.
The framework for ordering policies is given in Figure 20.8. We discuss briefly the various elements of the
framework.
Inventory Level
The inventory level of new parts changes dynamically in an uncertain manner. It decreases when an item is
issued for maintenance activities and increases when an order is received from suppliers.
Ordering Policies
There are three key issues related to ordering policies. For the single‐item case, the first two issues are
important and they deal with (i) when to order and (ii) how much to order, with one or both being the decision
variables. For the multi‐item case, the coordination of times to order is issue (iii), and this is commonly referred to as
joint replenishment.
Several different inventory policies dealing with issues (i) and (ii) have been studied, and the two most
commonly used are as follows:
• Fixed ordering time policy: The inventory is reviewed at ordering times jT, j = 1, 2, ..., and the quantity
ordered may change with time.
• Fixed ordering quantity policy: Here, the quantity ordered each time is the same, and the time between orders
varies from one order to the next.
If the demand for spares is based on the MCF, then the ordering times and quantities for the two policies are
as shown in Figure 20.9.
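Figure 20.9 is not reproduced here, but the logic of the two policies under MCF-based demand can be sketched in Python. The code assumes a hypothetical power-law MCF, M(t) = (t/ETA)^BETA. Under the fixed ordering time policy, each review at jT orders the expected demand since the previous review, M(jT) − M((j − 1)T); under the fixed ordering quantity policy, the j-th order of size Q is placed when the expected cumulative demand reaches jQ. Both rules are illustrative assumptions, not the text's exact formulation.

```python
ETA, BETA = 30.0, 1.8   # hypothetical MCF of the installed base: M(t) = (t / ETA) ** BETA, t in days


def mcf(t):
    """Expected cumulative demand for spares by time t."""
    return (t / ETA) ** BETA


def mcf_inverse(m):
    """Time at which the expected cumulative demand reaches m."""
    return ETA * m ** (1.0 / BETA)


def fixed_time_orders(period, n_orders):
    """Review at t = jT and order the expected demand since the previous review."""
    return [(j * period, mcf(j * period) - mcf((j - 1) * period)) for j in range(1, n_orders + 1)]


def fixed_quantity_orders(quantity, n_orders):
    """Order the same quantity Q each time the expected cumulative demand reaches jQ."""
    return [(mcf_inverse(j * quantity), quantity) for j in range(1, n_orders + 1)]


for t, q in fixed_time_orders(90.0, 4):
    print(f"fixed time:     day {t:6.1f}, quantity {q:5.1f}")
for t, q in fixed_quantity_orders(20.0, 4):
    print(f"fixed quantity: day {t:6.1f}, quantity {q:5.1f}")
```

With BETA greater than 1 the demand rate increases over time, so the fixed-time quantities grow while the intervals between fixed-quantity orders shrink.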
Decision Problems
The fixed ordering time (quantity) policy has a single decision variable T (Q), and its optimal selection requires
a suitable objective function. One that is commonly used is the asymptotic expected total inventory cost per unit time
(over an infinite time horizon). The total cost consists of the following elements:
• Ordering cost: This consists of a fixed administration cost a per order plus a cost proportional to the quantity
ordered Q, and is given by a + bQ, where b is the sale price of a spare item.
• Inventory holding cost: This is given by hy where h is the holding cost per item per unit time and y is the duration
the spare item stays in inventory.
• Shortage cost: This cost may include losses resulting from the downtime due to unavailability of spares.
• Emergency ordering cost: This cost is incurred when the last spare is used before the next regular ordering time
instant.
It is difficult to obtain analytical expressions for these cost elements as failures occur randomly and, as such, the
inventory levels change in an uncertain manner over time.
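Because such analytical expressions are hard to obtain, the cost elements above are often estimated by simulation. The Python sketch below simulates an order-up-to-S policy reviewed every T time units for a single operating item and returns the average cost per unit time. It assumes, hypothetically, that regular orders arrive instantly and that a failure finding no spare triggers an emergency order of cost e that also covers the downtime loss, so the shortage and emergency costs are lumped together.

```python
import random


def simulate_cost(T, S, a, b, h, e, sample_life, cycles=2000, seed=7):
    """Estimate the long-run cost per unit time of reviewing every T and ordering up to S.

    Hypothetical assumptions: one operating item whose failures form a renewal process,
    regular orders arrive instantly, and a failure that finds no spare on the shelf
    triggers an emergency order costing e (covering expediting and downtime losses)."""
    rng = random.Random(seed)
    stock, cost = 0, 0.0
    next_failure = sample_life(rng)
    for k in range(cycles):
        start, end = k * T, (k + 1) * T
        q = S - stock
        if q > 0:
            cost += a + b * q                      # ordering cost a + bQ
        stock = S
        clock = start
        while next_failure < end:
            cost += h * stock * (next_failure - clock)   # holding cost h*y for spares on the shelf
            clock = next_failure
            if stock > 0:
                stock -= 1                         # a spare replaces the failed item
            else:
                cost += e                          # shortage: emergency order
            next_failure += sample_life(rng)
        cost += h * stock * (end - clock)
    return cost / (cycles * T)


def weibull_life(rng):
    """Hypothetical item life: Weibull with scale 200 days and shape 2."""
    return rng.weibullvariate(200.0, 2.0)


for S in range(4):
    print(S, round(simulate_cost(T=90.0, S=S, a=50.0, b=200.0, h=0.5, e=2000.0, sample_life=weibull_life), 2))
```

Running it over a grid of S (and T) values and picking the cheapest is the simulation counterpart of minimizing the asymptotic expected total cost per unit time.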
In this section we consider a periodic inventory policy for an item maintained using the block policy discussed
in Chapter 4. Note that the expected number of spares needed between two preventive maintenance (PM) actions is
given by M(T), where M(t) is the renewal function associated with F(t), the failure distribution for the item. The
quantity ordered is 1 + M(T) at time instants that correspond to PM actions, as shown in Figure 20.11 for the first
cycle. Since one item is used in the PM replacement, the inventory level at the start is M(T). The inventory reduces by
one each time a spare is used, and this occurs in an uncertain manner. However, the mean value of the inventory is
given by I(t) = M(T) − M(t), 0 ≤ t ≤ T, where t = 0 is the time instant of the PM action. The mean inventory profile
of the cycle stock is shown in the figure.
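The quantities in this paragraph can be computed numerically. The sketch below approximates the renewal function by discretizing the renewal equation M(t) = F(t) + integral from 0 to t of M(t − x) dF(x), then returns the order quantity 1 + M(T), the time average of the mean inventory I(t) = M(T) − M(t) over the cycle, and the expected holding cost per cycle, h times the area under I(t). The Weibull example and the holding cost h are hypothetical.

```python
import math


def renewal_function(cdf, horizon, step=1.0):
    """Approximate M(k*step), k = 0..n, by discretizing M(t) = F(t) + int_0^t M(t - x) dF(x)."""
    n_steps = int(round(horizon / step))
    F = [cdf(k * step) for k in range(n_steps + 1)]
    M = [0.0] * (n_steps + 1)
    for n in range(1, n_steps + 1):
        M[n] = F[n] + sum(M[n - k] * (F[k] - F[k - 1]) for k in range(1, n + 1))
    return M


def block_policy_inventory(cdf, T, h, step=1.0):
    """Order 1 + M(T) at each PM; the mean inventory over the cycle is I(t) = M(T) - M(t)."""
    M = renewal_function(cdf, T, step)
    m_T = M[-1]
    profile = [m_T - m for m in M]                     # mean inventory at t = 0, step, ..., T
    area = sum((profile[k] + profile[k + 1]) * step / 2.0 for k in range(len(profile) - 1))
    return {"order_qty": 1.0 + m_T, "avg_inventory": area / T, "holding_cost_per_cycle": h * area}


def weibull_cdf(t):
    """Hypothetical item life: Weibull with scale 200 days and shape 2."""
    return 1.0 - math.exp(-((t / 200.0) ** 2.0))


result = block_policy_inventory(weibull_cdf, T=180.0, h=0.5)
print({key: round(value, 3) for key, value in result.items()})
```

In practice the order quantity 1 + M(T) is rounded to a whole number of spares; the fractional value is kept here to show the mean behavior.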
Since the inventory level changes in an uncertain manner, the total demand for spares over a cycle is a random
variable which can be either less than M(T) or greater than M(T). In the former case, the order quantity needed is less
than M(T). However, in the latter case there is a shortfall.
A periodic inventory policy with maximal level S and safety stock level s is shown in Figure 20.12. Note that
even in this case, the demand over a cycle can exceed S, so that the inventory is depleted and additional spares need to
be ordered as emergency orders.
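If failures can be assumed to occur at a constant rate, the demand over a cycle is Poisson with mean M(T), and S can be chosen to keep the probability of an emergency order below a target. The sketch below makes that Poisson assumption, which the text does not; the mean demand of 4 and the 5% target are hypothetical, and the safety stock is taken as s = S − M(T).

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def order_up_to_level(mean_demand, max_stockout_prob):
    """Smallest S with P(demand over one cycle > S) <= max_stockout_prob."""
    S = 0
    while 1.0 - poisson_cdf(S, mean_demand) > max_stockout_prob:
        S += 1
    return S


m_T = 4.0                      # hypothetical expected demand per cycle, M(T)
S = order_up_to_level(m_T, 0.05)
print(S, S - m_T)              # 8 4.0
```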
The single‐echelon inventory system for a repairable item is shown in Figure 20.4. When an item fails, it is
replaced by a working item from the new or repaired item inventory, if available. Otherwise, the system (for example,
an aircraft) waits until a working item becomes available. The failed item is removed and joins the repair queue. At a
later time it is either scrapped or repaired (and then joins the inventory for repaired items).
• The distribution of the arrival of the failed items to the repair facility: This depends on the number of
engineered objects involved (if the item is an aircraft engine then the arrival distribution depends on the number of
aircraft in the fleet), their intensity of usage, the maintenance policy adopted, and so on.
• The capacity of the repair facility: This capacity determines the service rate of repair, which is an important
parameter of the problem.
• The appropriate measures of performance for the system: The common measures include:
◦ Average fill rate: This is the percentage of parts required for repair that are available from on‐the‐shelf
inventory.
◦ Total (system) backorders: The system‐wide backorders simply represent the sum of expected backorders of
all parts that are used to support the system.
◦ System availability: This is a measure that is both intuitive and directly reflects the customer goal of
generating value through the use of the system.
• The optimal number of spares in the system: This is usually the key decision variable of the problem and is
determined to optimize one of the above measures of system performance.
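A common way to evaluate these measures assumes Poisson failures and a repair facility with enough capacity that failed items never queue (Palm's theorem): the number of items in the repair pipeline is then Poisson with mean equal to the failure rate times the mean repair turnaround time. With s spares, the fill rate is P(X ≤ s − 1) and the expected backorders are E[(X − s)+]. The sketch below computes both; the demand and turnaround figures are hypothetical, and a repair shop with limited capacity would need a queueing model instead.

```python
import math


def poisson_pmfs(mean, upto):
    """[P(X = 0), ..., P(X = upto)] for X ~ Poisson(mean)."""
    probs = [math.exp(-mean)]
    for x in range(1, upto + 1):
        probs.append(probs[-1] * mean / x)
    return probs


def expected_backorders(s, mean):
    """EBO(s) = E[(X - s)+], computed as E[X] - s + E[(s - X)+] to avoid an infinite sum."""
    p = poisson_pmfs(mean, s)
    return mean - s + sum((s - x) * p[x] for x in range(s + 1))


def fill_rate(s, mean):
    """Probability that a demand is met from the shelf: P(X <= s - 1)."""
    return sum(poisson_pmfs(mean, s - 1)) if s > 0 else 0.0


def spares_for_fill_rate(target, mean):
    """Smallest stock level s whose fill rate reaches the target."""
    s = 0
    while fill_rate(s, mean) < target:
        s += 1
    return s


# Hypothetical data: 50 removals per year, mean repair turnaround 0.1 year.
pipeline = 50 * 0.1
for s in range(0, 11, 2):
    print(s, round(fill_rate(s, pipeline), 3), round(expected_backorders(s, pipeline), 3))
print("spares for 95% fill rate:", spares_for_fill_rate(0.95, pipeline))
```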
Consider a two‐echelon inventory system for a repairable item where the system consists of a repair depot and
N operating sites. Each site requires a set of working items and maintains an inventory of spare items. All failed items
are repaired at the repair depot, which also maintains an inventory of spare items. We consider a one‐for‐one
replenishment policy, which is appropriate when the item has high value and is subject to infrequent failures. When an
item fails at a site, three events occur simultaneously:
1. The failed item is replaced with a spare item from the site’s inventory, if one is available; otherwise, there is
a shortage at the site that will last until a replacement arrives from the repair depot.
3. The depot ships a replacement item if it has available inventory; otherwise, the depot places the
replacement request on backorder and will fill it when stock is available. When the failed item arrives at the repair
depot, it enters the repair process; upon completion of the repair process, the item goes into the depot inventory or fills
a backorder if any exist.
The decision problem is the quantity of spare items to be stocked and their locations.
The Multi‐Echelon Technique for Recoverable Item Control (METRIC) was developed in the late 1960s for
the US Air Force by the RAND Corporation. The METRIC model determines, for every item of a system, the optimal
stock level at each of several different bases, which may be different in terms of item demand rates and other
characteristics, and the supporting depot. The objective function is the sum of backorders across all bases.
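The core of the METRIC evaluation can be sketched compactly, assuming all repairs are done at the depot. Depot demand is the sum of the base demand rates; the average delay a base order waits at the depot is the depot's expected backorders divided by that total rate (Little's law); and each base's pipeline is approximated as Poisson with mean equal to its demand rate times its order-and-ship time plus that delay. The sketch evaluates total base backorders and finds the best split of a fixed number of spares by enumeration; the rates, times, and stock budget are hypothetical, and larger problems are usually solved by marginal analysis rather than enumeration.

```python
import math
from itertools import product


def ebo(s, mean):
    """Expected backorders E[(X - s)+] for X ~ Poisson(mean), via E[X] - s + E[(s - X)+]."""
    p, total = math.exp(-mean), 0.0
    for x in range(s + 1):
        total += (s - x) * p
        p *= mean / (x + 1)
    return mean - s + total


def metric_backorders(s0, base_stock, rates, ship_times, depot_repair_time):
    """Sum of expected backorders over all bases for depot stock s0 and the given base stocks."""
    lam0 = sum(rates)
    depot_delay = ebo(s0, lam0 * depot_repair_time) / lam0   # average wait per depot order (Little's law)
    return sum(ebo(s, lam * (o + depot_delay)) for s, lam, o in zip(base_stock, rates, ship_times))


def best_allocation(total_spares, rates, ship_times, depot_repair_time):
    """Try every split (depot, base 1, ..., base N) of total_spares and keep the lowest backorders."""
    best = None
    for alloc in product(range(total_spares + 1), repeat=len(rates) + 1):
        if sum(alloc) != total_spares:
            continue
        value = metric_backorders(alloc[0], alloc[1:], rates, ship_times, depot_repair_time)
        if best is None or value < best[1]:
            best = (alloc, value)
    return best


RATES = [10.0, 5.0]          # hypothetical demands per year at two bases
SHIP = [0.02, 0.02]          # hypothetical order-and-ship times (years)
DEPOT_REPAIR = 0.1           # hypothetical depot repair time (years)
alloc, backorders = best_allocation(6, RATES, SHIP, DEPOT_REPAIR)
print("depot/base stock:", alloc, "expected backorders:", round(backorders, 3))
```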
Delivering after‐sales services is more complex than manufacturing products. When delivering after‐sales
services, firms have to deploy parts, people, and equipment at more locations than they do to make the products. An
after‐sales network has to support all the products a company has made in the past as well as those it currently makes.
As a result, the service network often has to cope with 20 times the number of stock‐keeping units that the
manufacturing function deals with. Businesses also have to train service personnel, who are dispersed all over the
world, in a variety of technical skills. Moreover, after‐sales networks operate in an unpredictable and inconsistent
marketplace because of the unpredictable nature of the demand for product repair.
In addition, companies must design a portfolio of service products, since different customers have different
service needs even though they may own the same product. Those needs also change with time. For example, the
failure of a computer in a nuclear power plant will have a more severe impact than when a computer in a library goes
down. Also, a grounded aircraft means more to an air force during a war than it does during the course of a training
exercise.
The management of service part logistics encompasses planning, fulfillment, and execution of service parts
through activities like demand forecasting, parts distribution, warehouse management, repair of parts, and
collaboration processes with all the relevant parties in the after‐sales service supply chain. In the next section we
present a framework that captures the main elements of service part logistics in the automotive and aerospace
industries.
A framework encompassing the main elements of service part logistics in the automotive and aerospace
industries is shown in Figure 20.13. We focus on the network configuration for delivering spare parts.
The amount of time it takes to restore a failed item is often seen as a key performance indicator, especially in
the aerospace industry, where any part unavailability translates into huge losses. Companies need to design a portfolio
of service products, as each customer segment demands a different level of service.
In general, both automotive and aircraft companies offer three different levels of service, as indicated in
Tables 20.2 and 20.3.
Supply Chain Network
A typical after‐sales supply chain network consists of four entities, namely the parts supplier or original
equipment manufacturer (OEM), the regional logistics center (RLC), the importers or country warehouses, and the
dealers. Three typical configurations that are commonly used are as follows:
• A centralized configuration where parts from suppliers will be stored in the RLC and delivered directly to
the dealers whenever demand arises
• A decentralized configuration where parts from a supplier will be forwarded to the RLC first. The RLC
usually breaks up the large shipments received from suppliers/OEM and then sends the smaller shipments to various
warehouses in other countries in the region. Some of these warehouses are owned by the company whilst others are
outsourced to third party logistics providers.
MAINTENANCE LOGISTICS FOR PLANTS
The turnaround maintenance (TAM) event affects and is affected by many internal and external stakeholders
in a wider supply chain context. In the petrochemical industry, plants feed each other; that is, the product of one plant
is the raw material of another. Also, a large number of plants in a given area (for example, Jubail in Saudi Arabia) will
compete for a limited number of subcontractors. As explained in the previous section, TAM also requires the ordering
of many spare parts, involves long lead times for items from suppliers, and sometimes the assistance of technology
providers is needed. This supply chain view of TAM is depicted in Figure 20.14. This system view requires integrated
TAM planning and coordination involving all stakeholders to secure maximal utilization of resources to benefit the
entire system.
In particular, the timing of TAM for the various plants should take into consideration the interdependence
between them to minimize the disturbance to the whole system. Timing coordination between plants and the sharing
of experiences can benefit an entire industry if the TAM event is viewed in this wider supply chain context.
This coordination of TAM events is also an important issue in the power‐generation industry. Unlike the
petrochemical industry, where an inventory buffer of final products and raw materials may be built ahead of a TAM
event, electricity cannot be stored. Thus, the timing of TAM is crucial to avoid an interruption to the electricity supply.
Many models have been developed in the literature to generate optimal maintenance schedules, taking into
account the interdependence between plants to minimize the adverse effect of plant shutdown on all stakeholders.
In this section we discuss some aspects of rail track maintenance logistics. Maintenance and renewal activities
for rail track require several items such as rails, switches and crossings (S&C), sleepers, and ballast and also
machinery such as welding machines, rail‐grinding machines, and so on. Here, we focus on the logistics for rails and
also provide some information on S&Cs, sleepers, and ballast.
Rail Logistics
Rails may be delivered directly from the plant to the renewal or maintenance site or they may be stocked for
later use. The welding of rails may be done on site or in welding plants located close to rail rolling mills. Rails
may be up to 400 m long and are transported by train. This is the case for track renewal. However, for track
maintenance, shorter rails are often needed and are located at many sites along the track, since defects may occur in
any part of the network. In this case, the replacement short‐length rails are best stocked at discrete locations such as
maintenance depots. Hence, the most flexible logistical solution for the delivery of short‐length rails (up to 27 m) is by
road using flatbed trailers. Other logistics information in terms of lead time, number of suppliers, and relationships
with suppliers is indicated in Table 20.4 based on a study in the European Union.
Maintenance planning is a key element of maintenance management and needs to be done at three levels:
strategic, tactical, and operational. At each level there are several issues that need to be addressed and effective
decision making requires a proper framework that captures the main issues and decisions at that level. At the tactical
level, the key issues include facility capacity planning (for carrying out maintenance actions), manpower needs, and
equipment and tool requirements. At the operational level, the key issue is scheduling and this depends on whether
maintenance is done on site or the failed item is brought to a workshop. Maintenance control is essential to ensure that
the planned maintenance and related activities are carried out properly. This involves monitoring, proper data
collection, and analysis to resolve any problems and guarantee continuous improvement. This chapter deals with these
topics.
MAINTENANCE PLANNING
Tactical‐Level Framework
At the tactical level, the key issues are (i) maintenance load forecasting and (ii) maintenance capacity
planning. Forecasting predicts the future demand for maintenance work considering age and planned workload, and
capacity planning ensures that adequate capacity is available to meet the planned and unplanned maintenance load.
Operational‐Level Framework
Operational‐level planning deals with the day‐to‐day preparation and execution of maintenance work. Key
issues include scheduling, work order planning, and execution.
Tactical‐Level Maintenance Planning
The maintenance load denotes the volume of maintenance work anticipated over time into the future and is made up of
the following two main components:
1. Planned maintenance: This includes all PM (preventive maintenance) work that has been planned and scheduled in
advance.
2. Unplanned maintenance: This includes all CM (corrective maintenance) work due to unforeseen breakdowns and
failures.
The load comprises the manpower, materials (including spares), and facilities (equipment and tools) needed on a
periodic basis (per month, quarter, or year) for the object being maintained. For products and plants, planned
maintenance is determined by the maintenance policies recommended by the OEM (original equipment manufacturer).
For infrastructures, planned maintenance is decided during the design process in the building of the object, taking into
account the anticipated usage and load, and needs to be revised over time based on the history of actual usage and
load. The unplanned maintenance depends on the degradation and failure of the object. For products and plants, the
expected number of CM actions required over a period depends on the age at the start of the period and the reliability
characteristics.
Qualitative Methods for Forecasting
For a newly designed object there are often very limited data to evaluate its reliability or performance over time.
In the case of products, this translates into uncertainty in the form and parameters of the ROCOF (rate of occurrence
of failure). In this case, qualitative (or judgmental) forecasting methods can be used. The two commonly used methods
are as follows:
1. Panel consensus: This generates a forecast based on the average estimates of a group of experts. The idea is that a
panel of people from a variety of positions is able to develop a more reliable forecast than a narrower group. Panel
forecasts are developed through open meetings with free exchange of ideas from all levels of management and
individuals.
2. The Delphi method: This is a group technique in which a panel of experts is questioned individually about their
perceptions of future events. The experts do not meet as a group in order to reduce the possibility that consensus is
reached because of dominant personality factors. Instead, the forecasts and accompanying arguments are summarized
by an outside party and returned to the experts along with further questions. This continues until a consensus is
reached.
The MCF (mean cumulative function) of the object gives the expected number of CM actions as a function of the
period under consideration and is given by the integration of the ROCOF over this period. One can compute the
maintenance load (PM and CM) in each period, and Figure 19.3 is a typical plot of such a forecast.
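The integration step can be sketched numerically. The sketch below assumes, purely for illustration, a power-law ROCOF ρ(t) = (β/η)(t/η)^(β−1) with hypothetical parameters β and η (the notes do not fix a ROCOF form), and computes the expected number of CM actions in each period as the increase of the MCF, M(b) − M(a) = ∫ ρ(t) dt over [a, b]. The function names are illustrative, not from any library.

```python
"""Expected CM actions per period from an assumed ROCOF (illustrative sketch)."""
from typing import Callable, List


def power_law_rocof(beta: float, eta: float) -> Callable[[float], float]:
    """Return an assumed power-law ROCOF rho(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    def rho(t: float) -> float:
        return (beta / eta) * (t / eta) ** (beta - 1)
    return rho


def expected_cm_actions(rho: Callable[[float], float], start: float, end: float,
                        steps: int = 10_000) -> float:
    """Integrate the ROCOF over [start, end] with the midpoint rule.

    The result is the increase of the MCF over the period, i.e. the expected
    number of CM actions in that period.
    """
    width = (end - start) / steps
    return sum(rho(start + (i + 0.5) * width) for i in range(steps)) * width


def cm_load_per_period(rho: Callable[[float], float], age_at_start: float,
                       period_length: float, n_periods: int) -> List[float]:
    """Expected CM actions in each of n_periods consecutive periods."""
    loads = []
    for k in range(n_periods):
        a = age_at_start + k * period_length
        loads.append(expected_cm_actions(rho, a, a + period_length))
    return loads


if __name__ == "__main__":
    # Hypothetical object: beta = 2 (wear-out), eta = 1000 h, new at the start,
    # four planning periods of 500 h each.
    rho = power_law_rocof(beta=2.0, eta=1000.0)
    for k, load in enumerate(cm_load_per_period(rho, 0.0, 500.0, 4), start=1):
        print(f"period {k}: {load:.2f} expected CM actions")
```

For β = 2 the MCF is (t/η)², so the four periods give about 0.25, 0.75, 1.25 and 1.75 expected CM actions: the unplanned load grows with age, which is why the age at the start of the period matters.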
For consumer products, the OEM needs to forecast the maintenance requirements during the warranty period
from sales occurring over time. Time series modeling has been used for forecasting sales over different time periods.
The model is updated as new sales data become available. This is combined with the ROCOF to obtain the unplanned
maintenance load over time.
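A minimal sketch of combining a sales forecast with the MCF to get the warranty CM load per period. It assumes (hypothetically) that units sold in a period start operating at the beginning of that period and are covered for a fixed number of whole periods; the sales figures stand in for the output of the time series model, which is not reproduced here.

```python
"""Warranty CM load from a sales forecast and an MCF (illustrative sketch)."""
from typing import Callable, List


def warranty_cm_load(sales: List[float], mcf: Callable[[float], float],
                     warranty_periods: int, horizon: int) -> List[float]:
    """Expected warranty CM actions in each period 0 .. horizon-1.

    sales[j] is the (forecast) number of units sold in period j. Each cohort is
    assumed to start operating at the beginning of its sales period and to be
    covered for `warranty_periods` whole periods. mcf(age) gives the expected
    number of CM actions per unit up to that age (age measured in periods).
    """
    load = [0.0] * horizon
    for j, sold in enumerate(sales):
        for k in range(j, min(j + warranty_periods, horizon)):
            age_start, age_end = k - j, k - j + 1   # cohort age during period k
            load[k] += sold * (mcf(age_end) - mcf(age_start))
    return load


if __name__ == "__main__":
    def constant_rocof_mcf(age: float) -> float:
        """Assumed constant ROCOF of 0.1 CM actions per unit per period."""
        return 0.1 * age

    forecast = [100, 200, 150]   # hypothetical sales forecast per period
    print(warranty_cm_load(forecast, constant_rocof_mcf, warranty_periods=2, horizon=4))
```

Each cohort contributes to the load only while it is under warranty, so the load in a period is the sum over all cohorts still covered; when a new sales forecast arrives, the function is simply re-run.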
Capacity planning deals with the determination of the maintenance resources needed to meet the maintenance
load on a periodic basis. The resources required can be classified into four categories: (i) human resources, (ii) spare
parts and materials needed, (iii) facilities, equipment, and tools required, and (iv) information (documentation,
manuals, etc.) needed to carry out the maintenance tasks. Here, we focus on human resource capacity planning, and
the spare part issue is discussed in the next chapter.
Due to fluctuations in maintenance load from period to period, human resource capacity planning addresses
the following issues:
• Overtime capacity;
The purpose of maintenance capacity planning is to determine how to satisfy a fluctuating maintenance load
in each period. This is done by determining how much of each possible maintenance capacity (regular time, overtime,
subcontracting) should be planned to meet the maintenance load. Figure 19.4 illustrates this point for the case where
the demand is the human resource needed. In periods 1–6, the planned capacity is PC1, and in the subsequent periods it
is PC2 (>PC1). Note that when the demand exceeds the planned capacity, it needs to be met by either using overtime or
outsourcing some of the maintenance tasks.
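The period-by-period comparison can be sketched as follows. The capacity and demand figures are hypothetical, and the rule that overtime is used first (up to an assumed per-period limit) with the remainder outsourced is an illustrative assumption, not a rule stated in these notes.

```python
"""Split a fluctuating maintenance load against planned capacity (sketch)."""
from typing import List, NamedTuple


class PeriodBalance(NamedTuple):
    period: int
    demand: float
    capacity: float
    overtime: float     # excess covered by overtime (up to the assumed limit)
    outsourced: float   # remaining excess, assumed to be subcontracted
    idle: float         # planned capacity left unused


def balance(demand: List[float], capacity: List[float],
            overtime_limit: float) -> List[PeriodBalance]:
    """Compare the load with the planned (regular-time) capacity period by period."""
    if len(demand) != len(capacity):
        raise ValueError("demand and capacity must cover the same periods")
    rows = []
    for t, (d, c) in enumerate(zip(demand, capacity), start=1):
        excess = max(d - c, 0.0)
        overtime = min(excess, overtime_limit)
        rows.append(PeriodBalance(t, d, c, overtime, excess - overtime, max(c - d, 0.0)))
    return rows


if __name__ == "__main__":
    # Hypothetical labour hours: PC1 = 400 for periods 1-6, PC2 = 500 afterwards.
    capacity = [400.0] * 6 + [500.0] * 4
    demand = [380, 420, 450, 390, 410, 370, 520, 480, 560, 500]
    for row in balance(demand, capacity, overtime_limit=40.0):
        print(row)
```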
An important objective of capacity planning is to minimize the total cost of labor and backlog over the
planning horizon. Many approaches have been proposed for determining the optimal capacity and they can be grouped
broadly into (i) deterministic (when uncertainty is insignificant) and (ii) stochastic (when uncertainty is significant).
Deterministic Approaches:
1. Decision variables: The workforce size, number of workers hired (or fired), number of overtime hours,
number of regular hours, number of hours subcontracted, and number of hours backlogged each period.
2. Objective function: Minimize the total labor cost (regular, overtime, and subcontracted), the total cost of
hiring and firing, and the backlog cost.
3. Constraints: Balance the equation for maintenance load and workforce size between adjacent periods, and
limits on overtime and subcontracting that can be used.
Many different models have been proposed and we present a mixed integer programming model.
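The mixed integer model itself is not reproduced in these notes. Purely as an illustration, one generic aggregate-planning formulation built from the decision variables, objective and constraints listed above could be written as follows; all symbols are assumed notation. D_t is the maintenance load (hours) in period t; W_t the workforce size; H_t and F_t the workers hired and fired; R_t, O_t and S_t the regular, overtime and subcontracted hours; B_t the backlogged hours; h the regular hours per worker; ō and S̄ the overtime and subcontracting limits; the c terms are unit costs; W_0 and B_0 are given.

```latex
\begin{aligned}
\min \quad & \sum_{t=1}^{T} \left( c^{W} W_t + c^{O} O_t + c^{S} S_t + c^{H} H_t + c^{F} F_t + c^{B} B_t \right) \\
\text{s.t.} \quad & W_t = W_{t-1} + H_t - F_t, && t = 1,\dots,T \\
& R_t + O_t + S_t + B_t = D_t + B_{t-1}, && t = 1,\dots,T \\
& R_t \le h\,W_t, \qquad O_t \le \bar{o}\,W_t, \qquad S_t \le \bar{S}, && t = 1,\dots,T \\
& W_t, H_t, F_t \in \mathbb{Z}_{\ge 0}, \qquad R_t, O_t, S_t, B_t \ge 0 .
\end{aligned}
```

In this sketch regular labour is charged on the workforce size (c^W W_t) because workers are paid whether or not their hours are used; the second constraint says that work done plus backlog carried out equals load plus backlog carried in. The integer workforce variables are what make the model mixed integer.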
Stochastic Approaches:
The stochastic approach is seldom used in practice as it involves complex model formulations and simulations
to carry out the analysis and optimization. An alternative approach used in practice is the deterministic approach with a safety
factor: the mean load is inflated to reduce the chance that demand exceeds the planned capacity because of uncertainty in the load,
and the problem is then treated as deterministic. The risk of demand not being met is reduced as the safety factor
increases, but this is achieved at the expense of the capacity being underutilized when demand is below capacity.
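The trade-off can be quantified under an assumption the notes do not make: that the load in a period is normally distributed with known mean and standard deviation. Planned capacity is then mean + k·sd for a safety factor k; the sketch computes the probability of a shortfall and the expected unused capacity, sd·(kΦ(k) + φ(k)), using only Python's standard `statistics.NormalDist`. The load figures are hypothetical.

```python
"""Deterministic capacity with a safety factor, assuming a normal load (sketch)."""
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def planned_capacity(mean_load: float, sd_load: float, k: float) -> float:
    """Inflate the mean load by k standard deviations (the safety factor)."""
    return mean_load + k * sd_load


def shortfall_probability(k: float) -> float:
    """P(load > planned capacity) when the load is normally distributed."""
    return 1.0 - _STD_NORMAL.cdf(k)


def expected_idle_capacity(sd_load: float, k: float) -> float:
    """E[max(capacity - load, 0)] for a normal load: sd * (k*Phi(k) + phi(k))."""
    return sd_load * (k * _STD_NORMAL.cdf(k) + _STD_NORMAL.pdf(k))


if __name__ == "__main__":
    mean, sd = 800.0, 100.0   # hypothetical monthly load in labour hours
    for k in (0.0, 0.5, 1.0, 1.645, 2.0):
        print(f"k={k:<5} capacity={planned_capacity(mean, sd, k):7.1f} h  "
              f"P(shortfall)={shortfall_probability(k):.3f}  "
              f"expected idle={expected_idle_capacity(sd, k):6.1f} h")
```

As k grows, the shortfall probability falls while the expected idle capacity rises, which is exactly the underutilization cost described above.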
At the operational level, maintenance planning has the following main objectives:
3. Optimized utilization of maintenance labor and materials through effectively planned and balanced
schedules.
4. Equitable resource allocation based on understood criteria and the varying business needs of the internal
customers supported.
5. Minimization of labor delay and idle time through effective coordination with the concerned departments,
such as operations and stores.
It involves (i) work order planning and scheduling and (ii) maintenance scheduling.
A work order form (paper or electronic) serves as the vehicle for communicating information related to
specific work requested for maintenance. The work order form must be designed to include two types of information:
• Information needed for planning and scheduling: This includes the requesting department, information about
the item to be maintained (inventory number, location), information about the work requested (description, priority,
etc.), information about resources needed (estimated time, types and trades, spare parts, tools, etc.), information about
methods, safety procedures, and technical information (drawings and manuals).
• Information needed for control: This includes the actual time taken, the spares and materials used, and the
causes and consequences of failures.
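The two kinds of information a work order carries can be sketched as a simple record. The field names below are hypothetical and do not follow any particular CMMS schema; the planning fields are filled in when the order is raised and the control fields when it is closed.

```python
"""A work order record holding planning and control information (sketch)."""
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkOrder:
    # --- information needed for planning and scheduling ---
    requesting_department: str
    inventory_number: str
    location: str
    description: str
    priority: str                        # e.g. "urgent", "normal", "scheduled"
    estimated_hours: float
    trades: List[str] = field(default_factory=list)
    spare_parts: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    safety_procedures: List[str] = field(default_factory=list)
    technical_references: List[str] = field(default_factory=list)  # drawings, manuals
    # --- information needed for control (filled in after execution) ---
    actual_hours: Optional[float] = None
    parts_used: List[str] = field(default_factory=list)
    failure_cause: Optional[str] = None
    failure_consequence: Optional[str] = None

    def close(self, actual_hours: float, parts_used: List[str],
              failure_cause: Optional[str] = None,
              failure_consequence: Optional[str] = None) -> None:
        """Record the control data once the job has been executed."""
        self.actual_hours = actual_hours
        self.parts_used = list(parts_used)
        self.failure_cause = failure_cause
        self.failure_consequence = failure_consequence

    @property
    def is_closed(self) -> bool:
        return self.actual_hours is not None

    def hours_variance(self) -> float:
        """Actual minus estimated hours; useful feedback for future estimates."""
        if self.actual_hours is None:
            raise ValueError("work order has not been closed yet")
        return self.actual_hours - self.estimated_hours


if __name__ == "__main__":
    wo = WorkOrder("Production", "PMP-017", "Line 2", "Replace leaking pump seal",
                   "normal", estimated_hours=3.0, trades=["mechanic"],
                   spare_parts=["mechanical seal"])
    wo.close(actual_hours=4.5, parts_used=["mechanical seal"],
             failure_cause="worn seal faces", failure_consequence="line stopped 2 h")
    print(wo.is_closed, wo.hours_variance())
```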
Planning
Work order planning is the advance preparation of maintenance work so that it can be executed in an efficient
and effective manner at some future date. The maintenance planner conducts a detailed analysis of each job to
determine and describe the work to be performed, the task sequence and methodology, plus the identification of
required resources, including skills, crew size, man‐hours, spare parts and materials, special tools and equipment.
An effective maintenance planner needs:
• Experience and familiarity with the engineered objects that need maintenance. This enables the planner to
estimate maintenance time and other resources and select the best methods.
• Good communication skills, as this job requires coordination with other departments.
• Familiarity with planning tools and techniques and data analysis methods.
The job of a maintenance planner is greatly enhanced by the use of a computerized maintenance management
system (CMMS). Such a system provides timely access to available resources that need to be planned. It also assists in
data collection and analysis and the generation of various reports, and is discussed in the next section.
SCHEDULING
Scheduling is the process by which required resources are allocated to specific jobs at a certain point in time
when the engineered object is available or the job site is accessible. Effective scheduling requires coordination with
production personnel.
Priorities are established in coordination with maintenance customers to ensure that the most urgent jobs are
scheduled first. Most maintenance departments have three or four levels of priorities that are clearly defined, including
time frames for starting the work (e.g., urgent, normal, scheduled).
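A priority scheme of this kind can be sketched as a sort key plus a start deadline per level. The three levels and their time frames below are hypothetical examples, not values from the notes; within a level, the oldest request is taken first.

```python
"""Ordering open jobs by priority level and request time (sketch)."""
from datetime import datetime, timedelta
from typing import List, NamedTuple

# Hypothetical three-level scheme: level -> latest allowed start after the request.
START_WITHIN = {
    "urgent": timedelta(hours=24),
    "normal": timedelta(days=7),
    "scheduled": timedelta(days=30),
}
RANK = {level: rank for rank, level in enumerate(START_WITHIN)}  # urgent = 0


class Job(NamedTuple):
    job_id: str
    priority: str
    requested: datetime


def start_deadline(job: Job) -> datetime:
    """Latest start time implied by the job's priority level."""
    return job.requested + START_WITHIN[job.priority]


def schedule_order(jobs: List[Job]) -> List[Job]:
    """Most urgent level first; within a level, oldest request first."""
    return sorted(jobs, key=lambda j: (RANK[j.priority], j.requested))


if __name__ == "__main__":
    t0 = datetime(2024, 3, 4, 8, 0)
    jobs = [
        Job("WO-3", "scheduled", t0),
        Job("WO-1", "normal", t0 + timedelta(hours=2)),
        Job("WO-2", "urgent", t0 + timedelta(hours=5)),
        Job("WO-4", "normal", t0 - timedelta(days=1)),
    ]
    for job in schedule_order(jobs):
        print(job.job_id, job.priority, "start by", start_deadline(job))
```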
Maintenance schedules are usually prepared for different time frames. A long‐range schedule may cover a
period between three months and one year (for example, a schedule for rail track maintenance). It is usually based on
open work orders, PM work orders, and anticipated CM. The long‐range schedule is usually broken down into weekly
schedules that are, in turn, broken down into daily schedules. These schedules are continuously updated in light of any
changes to original plans.
Execution
Good planning is a prerequisite for good execution. An effective planning function eliminates unnecessary
waste from the work process, so that all materials, tools, support services, and technical information are ready for
technicians to start the job without delay.
As mentioned earlier, the work order system plays a key role in administering, monitoring, approving, and
collecting data about all maintenance jobs. In particular, data are collected about the actual time taken, the spare parts
used, and the cause of the failure in case of CM actions. The approval process for execution of jobs ensures quality
and identifies training needs. This information is crucial for a maintenance control system and is the cornerstone of
continuous improvement.
Maintenance Scheduling:
If planning specifies how to do maintenance‐related jobs, then scheduling specifies when to do them.
Maintenance scheduling deals with the decisions regarding when specific maintenance tasks are to be carried out
(either at the service facility or on site). It needs to take into account various other issues – an important one being the
interaction between maintenance and production/operations departments.
Scheduling Techniques:
There are many techniques that can assist a scheduler in developing effective schedules. Some of these are
graphical in nature and can be very helpful in following up the execution, especially for lengthy jobs. Other techniques
are used to obtain optimal schedules in terms of cost or some other criterion, taking into account the needs of the
operations department, the coordination of the maintenance of similar units, and so on. Two commonly used methods
are outlined below.
In terms of graphical methods, critical path methods (CPMs) are commonly used for large projects with
complex precedence relationships between maintenance tasks, such as turnaround maintenance (TAM). CPM scheduling is a graphical
technique used for illustrating activity sequences, together with each activity’s expected duration, to portray project
execution steps in precedence order. Several commercial software packages are available for this purpose.
Development of a CPM schedule begins by representing the project graphically by a network built up from
circles (nodes) and arrows (directed arcs) which lead up to or emerge from the circles. Usually, the circles represent
activities. Connecting the circles with arrows represents a sequence of activities in which each one is dependent on the
previous one. In other words, the earlier activity must be completed in order to begin the next activity. Graphing out
the job activities and dependencies to develop the network requires good knowledge of the constituent parts of the
project.
The following simple CPM example illustrates how the critical path is determined given a certain number of
activities, their precedence relationships, and their durations. Consider the network shown in Figure 19.5, where the
activities are the nodes and the duration of each activity is shown on the arc out of the node. The arc out of a node
points to its successor activity. One can follow the arrows backwards to find what is required for each task and follow
them forwards to see what task is next.
The critical path is found by calculating the earliest start and finish times for each node, beginning from the
start point and moving forward to the end node. This is called the forward pass. The results are indicated in the upper
part of the table above each node. The backward pass calculates the latest start and finish times for each activity. The
results are indicated in the lower part of the table above each node, as shown in Figure 19.5.
The critical path is then identified from the difference between the latest start time and the earliest start time of each
activity (equivalently, between its latest and earliest finish times). These differences are called the slack times. The critical
path is the path along which the earliest and latest start times are the same, and therefore there is no slack in these
activities: a delay in these activities leads to a delay in the entire project. Activities that have slack time may be delayed
(by up to their slack) without causing a delay in the entire project. Such activities are not on the critical path. The critical
path for this example is then N1‐N2‐N5‐N6‐N7.
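The forward and backward passes can be sketched in a few lines on an activity-on-node network. The example network below is hypothetical (it is not the network of Figure 19.5, whose durations are not reproduced in these notes); the topological ordering uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), and every predecessor is assumed to appear in the `durations` mapping.

```python
"""Critical path method: forward and backward pass on an activity-on-node network."""
from graphlib import TopologicalSorter   # Python 3.9+
from typing import Dict, List


def critical_path(durations: Dict[str, float], predecessors: Dict[str, List[str]]):
    """Return (project_duration, schedule, critical_activities).

    schedule[a] = (ES, EF, LS, LF, slack) for activity a.
    """
    preds = {a: set(predecessors.get(a, [])) for a in durations}
    order = list(TopologicalSorter(preds).static_order())  # precedence order

    # Forward pass: earliest start = largest earliest finish among predecessors.
    es, ef = {}, {}
    for a in order:
        es[a] = max((ef[p] for p in preds[a]), default=0.0)
        ef[a] = es[a] + durations[a]
    project_duration = max(ef.values())

    # Backward pass: latest finish = smallest latest start among successors.
    succs = {a: [] for a in durations}
    for a, ps in preds.items():
        for p in ps:
            succs[p].append(a)
    ls, lf = {}, {}
    for a in reversed(order):
        lf[a] = min((ls[s] for s in succs[a]), default=project_duration)
        ls[a] = lf[a] - durations[a]

    # Slack = LS - ES; zero-slack activities are critical.
    schedule = {a: (es[a], ef[a], ls[a], lf[a], ls[a] - es[a]) for a in order}
    critical = [a for a in order if abs(schedule[a][4]) < 1e-9]
    return project_duration, schedule, critical


if __name__ == "__main__":
    # Hypothetical maintenance job, durations in hours.
    durations = {"A": 3, "B": 4, "C": 2, "D": 5, "E": 1}
    predecessors = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
    total, schedule, critical = critical_path(durations, predecessors)
    print("project duration:", total)
    print("critical activities:", critical)
```

Here the project takes 13 hours, activity C has 2 hours of slack, and A, B, D and E form the critical path. If several paths tie, `critical` lists every zero-slack activity rather than a single path.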
Scheduling of maintenance work for a fleet of objects (buses, airplanes, locomotives, etc.), or a large number
of interdependent plants, leads to complex problems with many constraints arising from the need to coordinate
maintenance timing with the operational requirements of the engineered objects. Finding optimal maintenance
schedules that minimize the overall maintenance cost subject to various constraints may be formulated as a
mathematical programming problem.
--------------------------------------------------
EXTRA
Reliability engineering, maintainability engineering and maintenance (preventive, predictive and corrective) planning
Supply (spare part) support and resource acquisition
Support and test equipment/equipment support
Manpower and personnel
Training and training support
Technical data/publications
Computer resources support
Facilities
Packaging, handling, storage and transportation
Design interface
Decisions are documented in a life cycle sustainment plan (LCSP), a Supportability Strategy, or (most
commonly) an Integrated Logistics Support Plan (ILSP). ILS planning activities coincide with development of the
system acquisition strategy, and the program will be tailored accordingly. A properly executed ILS strategy will
ensure that the requirements for each of the elements of ILS are properly planned, resourced, and implemented. These
actions will enable the system to achieve the operational readiness levels required by the war fighter at the time of
fielding and throughout the life cycle.[2][3] ILS can be also used for civilian projects, as highlighted by the ASD/AIA
ILS Guide.[4]
It is considered common practice within some industries - primarily Defence - for ILS practitioners to take a
leave of absence to undertake an ILS Sabbatical; furthering their knowledge of the logistics engineering disciplines.
ILS Sabbaticals are normally taken in developing nations - allowing the practitioner an insight into sustainment
practices in an environment of limited materiel resources.
ILS is a technique introduced by the US Army to ensure that the supportability of an equipment item is considered
during its design and development. The technique was adopted by the UK MoD in 1993 and made compulsory for the
procurement of the majority of MoD equipment.
Influence on Design. Integrated Logistic Support provides an important means of identifying (as early as
possible) reliability issues and problems, and can initiate system or part design improvements based on reliability,
maintainability, testability, or system availability analysis.
Design of the Support Solution for minimum cost. Ensuring that the Support Solution considers and
integrates the elements considered by ILS. This is discussed fully below.
Initial Support Package. These tasks include calculation of requirements for spare parts, special tools, and
documentation. Quantities required for a specified initial period are calculated, procured, and delivered to support
delivery, installation (in some cases), and operation of the equipment.
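One common way to size initial spares for a specified period is a Poisson model, sketched below under stated assumptions: each unit has a constant failure rate, every failure consumes one spare, and no repaired items return to stock during the period. The stock level is the smallest quantity whose probability of covering demand reaches a chosen protection level; the figures in the usage note are hypothetical.

```python
"""Initial spares quantity for a constant failure rate (Poisson model sketch)."""
import math


def poisson_cdf(s: int, mean: float) -> float:
    """P(N <= s) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, s + 1):
        term *= mean / k
        total += term
    return total


def initial_spares(failure_rate: float, n_units: int, period_hours: float,
                   protection_level: float) -> int:
    """Smallest stock s with P(demand <= s) >= protection_level.

    Assumes a constant failure rate per unit, one spare consumed per failure,
    and no repaired items returned to stock during the initial period.
    """
    if not 0.0 < protection_level < 1.0:
        raise ValueError("protection_level must lie strictly between 0 and 1")
    mean_demand = failure_rate * n_units * period_hours
    s = 0
    while poisson_cdf(s, mean_demand) < protection_level:
        s += 1
    return s
```

For example, 10 installed units with a failure rate of 1e-4 per hour over an initial period of 2,000 hours give a mean demand of 2 spares; `initial_spares(1e-4, 10, 2000, 0.95)` then returns 5, while a 90% protection level needs only 4.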
The ILS management process facilitates specification, design, development, acquisition, test, fielding, and support of
systems.
Maintenance Planning
Maintenance planning begins early in the acquisition process with development of the maintenance concept. It is
conducted to evolve and establish requirements and tasks to be accomplished for achieving, restoring, and maintaining
the operational capability for the life of the system. Maintenance planning also involves Level Of Repair Analysis
(LORA) as a function of the system acquisition process. Maintenance planning will:
Define the actions and support necessary to ensure that the system attains the specified system readiness
objectives with minimum Life Cycle Cost (LCC).
Set up specific criteria for repair, including Built-In Test Equipment (BITE) requirements, testability,
reliability, and maintainability; support equipment requirements; automatic test equipment; and manpower skills
and facility requirements.
State specific maintenance tasks, to be performed on the system.
Define actions and support required for fielding and marketing the system.
Address warranty considerations.
The maintenance concept must ensure prudent use of manpower and resources. When formulating the
maintenance concept, the effect of the proposed work environment on the health and safety of maintenance
personnel must be considered.
Conduct a LORA to optimize the support system, in terms of LCC, readiness objectives, design
for discard, maintenance task distribution, support equipment and ATE, and manpower and personnel
requirements.
Minimize the use of hazardous materials and the generation of waste.
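At its simplest, a LORA compares the life cycle cost of the alternative repair policies for an item. The sketch below reduces each option to a one-off fixed cost plus a cost per failure multiplied by the expected number of failures; real LORA models include many more cost elements, and all option names and figures here are hypothetical.

```python
"""Simplified level-of-repair comparison on life cycle cost (sketch)."""
from typing import Dict, NamedTuple, Tuple


class RepairOption(NamedTuple):
    fixed_cost: float        # one-off: support equipment/ATE, training, facilities
    cost_per_failure: float  # recurring: labour, parts or replacement item, transport


def lora_choice(options: Dict[str, RepairOption],
                expected_failures: float) -> Tuple[str, Dict[str, float]]:
    """Return the cheapest option and the life cycle cost of every option."""
    costs = {name: opt.fixed_cost + opt.cost_per_failure * expected_failures
             for name, opt in options.items()}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    options = {  # hypothetical figures in currency units
        "discard": RepairOption(0.0, 1200.0),
        "repair at intermediate level": RepairOption(40_000.0, 300.0),
        "repair at depot": RepairOption(15_000.0, 550.0),
    }
    for failures in (10, 50, 200):
        best, costs = lora_choice(options, failures)
        print(failures, "failures ->", best)
```

With these figures, discard is cheapest for 10 expected failures, depot repair for 50, and intermediate-level repair for 200: the higher the failure count, the more it pays to invest in repair capability closer to the user.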
SUPPLY SUPPORT
Supply support encompasses all management actions, procedures, and techniques used to determine requirements to:
Handling and Maintenance Equipment.
Tools (hand tools as well as power tools).
Metrology and measurement devices.
Calibration equipment.
Test equipment.
Automatic test equipment.
Support equipment for on- and off-equipment maintenance.
Special inspection equipment and depot maintenance plant equipment, which includes all equipment and tools
required to assemble, disassemble, test, maintain, and support the production and/or depot repair of end items or
components.
This also encompasses planning and acquisition of logistic support for this equipment.
Competencies management
Factory training
Instructor and key personnel training
New equipment training team
Resident training
Sustainment training
User training
HAZMAT disposal and safe procedures training
Embedded training devices, features, and components are designed and built into a specific system to provide training
or assistance in the use of the system. (One example of this is the HELP files of many software programs.) The design,
development, delivery, installation, and logistic support of required embedded training features, mockups, simulators,
and training aids are also included.
TECHNICAL DATA
Technical Data and Technical Publications consist of scientific or technical information necessary to translate system
requirements into discrete engineering and logistic support documentation. Technical data is used in the development
of repair manuals, maintenance manuals, user manuals, and other documents that are used to operate or support the
system. Technical data includes, but may not be limited to:
Technical manuals
Technical and supply bulletins
Transportability guidance technical manuals
Maintenance expenditure limits and calibration procedures
Repair parts and tools lists
Maintenance allocation charts
Corrective maintenance instructions
Preventive maintenance and Predictive maintenance instructions
Drawings/specifications/technical data packages
Software development documentation
Provisioning documentation
Depot maintenance work requirements
Identification lists
Component lists
Product support data
Flight safety critical parts list for aircraft
Lifting and tie down pamphlet/references
Hazardous Material documentation
FACILITIES
The Facilities logistics element is composed of a variety of planning activities, all of which are directed toward
ensuring that all required permanent or semi-permanent operating and support facilities (for instance, training, field
and depot maintenance, storage, operational, and testing) are available concurrently with system fielding. Planning
must be comprehensive and include the need for new construction as well as modifications to existing facilities. It also
includes studies to define and establish impacts on life cycle cost, funding requirements, facility locations and
improvements, space requirements, environmental impacts, duration or frequency of use, safety and health standards
requirements, and security restrictions. Also included are any utility requirements, for both fixed and mobile facilities,
with emphasis on limiting requirements of scarce or unique resources.
DESIGN INTERFACE
Design interface is the relationship of logistics-related design parameters of the system to its projected or actual
support resource requirements. These design parameters are expressed in operational terms rather than as inherent
values and specifically relate to system requirements and support costs of the system. Programs such as "design for
testability" and "design for discard" must be considered during system design. The basic requirements that need to be
considered as part of design interface include:
Reliability
Maintainability
Standardization
Interoperability
Safety
Security
Usability
Environmental and HAZMAT
Privacy, particularly for computer systems
Legal
UNIT IV
MAINTENANCE QUALITY
Introduction
The development of a sound quality control system for maintenance is essential for ensuring high-quality repair,
accurate standards, maximum availability, long equipment life, and efficient production rates. Quality
control as an integrated system has been practiced with more intensity in production and manufacturing operations
than in maintenance. Although the role of maintenance in the long-term profitability of an organization has been
recognized, the issues relating to the quality of maintenance output have not been adequately formulated. Possible
reasons include the following:
secondary system driven by production. This viewpoint has led to the assignment of a low priority to improvement of
maintenance activities.
The quality of maintenance output has a direct link to product quality and to the ability of a company to meet
delivery schedules. In general terms, equipment that is not well maintained, or that is maintained with poor
workmanship, fails periodically, experiences speed losses, or loses precision, and hence tends to produce
defects. More often than not, such equipment drives manufacturing processes out of control. A process that is out of
control or has poor capability produces defective products, which lead to lower profitability and greater customer
dissatisfaction.
A clear organization of the quality control function and the specification of its role (responsibilities) in the
maintenance system should be emphasized by the organization’s top management. The responsibilities include
development of testing and inspection procedures, documentation, follow-up, deficiency analysis, and help in
identifying training needs from the analysis of quality reports.
Maintenance managers and engineers need to be aware of the importance of controlling the quality of
maintenance output. Testing and inspection standards and acceptable quality levels should be established for all
maintenance work. Documentation of maintenance procedures and inspection reports can
provide tremendous opportunities for maintenance quality improvement. These opportunities can be realized by
continuous improvement of the procedures, and the identification of training needs to enhance craft technical skills.
Maintenance activities are not repetitive, so large numbers of observations cannot be collected for statistical
analysis. For such activities, process control techniques provide valuable tools for improving maintenance
processes.
Organizations should strive to tie their maintenance activities to the quality of their products and services.
Also, they should create a focus on their internal customers. This will provide them with direction and goals for
improving their maintenance processes.
Responsibilities of Quality Control (QC):
3. Ensuring that all units are aware of and proficient in maintenance procedures and standards.
4. Maintaining a high level of expertise by keeping up to date with the publications concerning maintenance
procedures and records.
6. Performing deficiency analysis and process improvement studies using various statistical process control
tools.
7. Ensuring that all the technical and management procedures are adhered to by crafts when performing actual
maintenance.
9. Reviewing the quality and availability of materials and spare parts.
10. Performing maintenance audits to assess the current maintenance situation and prescribe remedies for
deficient areas.
11. Establishing certification and authorization of personnel performing highly specialized critical tasks.
12. Developing procedures for new equipment inspections and testing the equipment prior to acceptance from
vendors.
In summary, quality control in maintenance is responsible for ensuring the quality objectives for resources,
procedures, and standards used in the maintenance process are met. In addition, it performs inspection of maintenance
jobs and tests of equipment prior to acceptance or operation.
It is essential to have the QC personnel as independent as possible, and they must not be an extension of the
workforce. Also, they should not perform production inspections, as such inspections can be assigned to production
inspectors or workshop supervisors. Personnel comprising the quality control unit must be highly qualified technicians
or engineers with extensive training in areas such as productivity improvement, statistical process control, process
improvement, techniques for planning and scheduling, and work measurement.
In large organizations such as airline companies, air forces, army units, and railroad companies, it is necessary
to have a quality control division within the maintenance department. This division will report to the maintenance
manager. In medium-size organizations, a small unit will do the job; however, in small-size organizations, one or two
inspectors attached to the manager’s office or the planning unit can perform the function of quality control.
BASIC CONCEPTS OF FMEA AND FMECA
Failure Mode and Effects Analysis (FMEA) and Failure Modes, Effects and Criticality Analysis (FMECA) are
methodologies designed to identify potential failure modes for a product or process, to assess the risk associated with
those failure modes, to rank the issues in terms of importance and to identify and carry out corrective actions to
address the most serious concerns.
Although the purpose, terminology and other details can vary according to type (e.g. Process FMEA, Design FMEA,
etc.), the basic methodology is similar for all. This article presents a brief general overview of FMEA / FMECA
analysis techniques and requirements.
Item(s)
Function(s)
Failure(s)
Effect(s) of Failure
Cause(s) of Failure
Current Control(s)
Recommended Action(s)
Plus other relevant details
Most analyses of this type also include some method to assess the risk associated with the issues identified during the
analysis and to prioritize corrective actions. Two common methods include:
Basic Analysis Procedure for FMEA or FMECA
The basic steps for performing an FMEA/FMECA analysis include:
Criticality Analysis
The MIL-STD-1629A document describes two types of criticality analysis: quantitative and qualitative. To use the
quantitative criticality analysis method, the analysis team must:
Applications and Benefits
The FMEA / FMECA analysis procedure is a tool that has been adapted in many different ways for many
different purposes. It can contribute to improved designs for products and processes, resulting in higher
reliability, better quality, increased safety, enhanced customer satisfaction and reduced costs. The tool can
also be used to establish and optimize maintenance plans for repairable systems and/or contribute to control
plans and other quality assurance procedures. It provides a knowledge base of failure mode and corrective
action information that can be used as a resource in future troubleshooting efforts and as a training tool for
new engineers. In addition, an FMEA or FMECA is often required to comply with safety and quality
requirements, such as ISO 9001, QS 9000, ISO/TS 16949, Six Sigma, FDA Good Manufacturing Practices
(GMPs), Process Safety Management Act (PSM), etc.
ReliaSoft's Xfmea software facilitates analysis, data management and reporting for failure mode and effects
analysis (FMEA) and failure modes, effects and criticality analysis (FMECA). The software supports all
major standards (AIAG FMEA-3, J1739, ARP5580, MIL-STD-1629A, etc.) and provides extensive
customization capabilities for analysis and reporting, allowing you to configure the software to meet your
organization's specific analysis and reporting procedures for all types of FMEA / FMECA.
Two quantitative and one qualitative options exist for FMECA Criticality as identified below:
1. Quantitative
o Mode Criticality = Item Failure Rate x Failure Mode Ratio x Probability of Loss x Operating Time
(life)
o Item Criticality = Sum of Mode Criticalities
2. Qualitative
o Compare failure modes via a Criticality Matrix, which identifies severity on the horizontal axis and
qualitatively derived occurrence on the vertical axis
o Note: Quality-One suggests a qualitative criticality matrix for the Quality-One Three Path Model for
FMEA Development. Severity is on the vertical axis and occurrence is depicted on the horizontal axis. This
is often used as an alternative for the Risk Priority Number (RPN) in FMEA.
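The quantitative option can be sketched in the MIL-STD-1629A form, where the mode criticality is Cm = β·α·λp·t (β the probability of loss, α the failure mode ratio, λp the item failure rate, t the operating time) and the item criticality is the sum of mode criticalities. As far as the standard goes, that sum is taken separately for each severity classification, so the sketch accepts an optional severity filter. The failure modes, values and severity labels in the usage note are hypothetical.

```python
"""Quantitative FMECA criticality in the MIL-STD-1629A style (sketch)."""
from typing import List, NamedTuple, Optional


class FailureMode(NamedTuple):
    name: str
    alpha: float    # failure mode ratio: share of the item's failures in this mode
    beta: float     # failure effect probability (conditional probability of the loss)
    severity: str   # severity classification of the mode's effect


def mode_criticality(mode: FailureMode, failure_rate: float, operating_time: float) -> float:
    """Cm = beta * alpha * lambda_p * t."""
    return mode.beta * mode.alpha * failure_rate * operating_time


def item_criticality(modes: List[FailureMode], failure_rate: float,
                     operating_time: float, severity: Optional[str] = None) -> float:
    """Sum of mode criticalities, optionally restricted to one severity class."""
    total_alpha = sum(m.alpha for m in modes)
    if abs(total_alpha - 1.0) > 1e-6:   # all modes of the item must be listed
        raise ValueError(f"mode ratios sum to {total_alpha}, expected 1.0")
    return sum(mode_criticality(m, failure_rate, operating_time)
               for m in modes if severity is None or m.severity == severity)
```

For example, an item with λp = 20e-6 failures per hour operated for 1,000 hours, with modes "seal leak" (α = 0.6, β = 0.1), "bearing seizure" (α = 0.3, β = 0.5) and "shaft fracture" (α = 0.1, β = 1.0), has mode criticalities of 0.0012, 0.003 and 0.002, for an overall item criticality of 0.0062.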
Design and Development Benefits
Increased reliability
Better quality
Higher safety margins
Decreased development time and re-design
Operations Benefits
More effective Control Plans
Improved Verification and Validation testing requirements
Optimized preventive and predictive maintenance
Reliability growth analysis during product development
Decreased waste and non-value added operations (Lean Operation and Manufacturing)
Cost Benefits
Recognize failure modes in advance (when they are less costly to address)
Minimized warranty costs
Increased sales from customer satisfaction
How to Perform Failure Mode, Effects & Criticality Analysis (FMECA)
The basic assumption when performing FMECA instead of FMEA is the desire to have a more quantitative risk
determination. The FMEA utilizes a more multi-functional team using guidelines to set Severity and Occurrence. The
FMECA is performed by first completing an FMEA process worksheet and then completing the FMECA Criticality
Worksheet.
The general steps for FMECA development are as follows:
FMEA Portion
o Define the system
o Define ground rules and assumptions to help drive the design
o Construct system Boundary Diagrams and Parameter Diagrams
o Identify failure modes
o Analyze failure effects
o Determine causes of the failure modes
o Feed results back into design process
FMECA Portion
o Transfer Information from the FMEA to the FMECA
o Classify the failure effects by severity (change to FMECA severity)
o Perform criticality calculations
o Rank failure mode criticality and determine highest risk items
o Take mitigation actions and document the remaining risk with rationale
o Follow-up on corrective action implementation/effectiveness
FMECA can often become time consuming and therefore available resources and team interest can be an issue as the
process continues. Quality-One has developed the FMECA process below to utilize engineering resources effectively
and ensure the FMECA has been developed thoroughly. The Quality-One approach is as follows:
Step 1: Perform the FMEA
The FMEA is a good starting place for the FMECA. FMEA allows for qualitative, and therefore creative, inputs from
a multi-disciplined engineering team. FMEA provides the first inputs into design change and can jump start the risk
mitigation process. The FMEA information is transferred into the FMECA Criticality Worksheet. The transferred
data from the FMEA worksheet will include:
Item Identification Number
Item / Function
Detailed Function and / or Requirements
Failure Modes and Causes with Mechanisms of Failure
Mission Phase or Operational Mode (DoD specific), often related to the Effects of Failure
Step 2: Determine Severity Level
Next, assign the Severity Level of each Effect of Failure. There are various severity tables to select from. The
following is used in medical and some aerospace activities. The actual descriptions can be altered to fit any product or
process design. There are generally four severity level classifications as follows:
Catastrophic: Could result in death, permanent total disability, loss exceeding $1M, or irreversible severe
environmental damage that violates law or regulation
Major/High Impact: Permanent partial disability, injuries or occupational illness resulting in hospitalization of
3 or more personnel, loss exceeding $200K but less than $1M, or reversible environmental damage causing a
violation of law or regulation
Minor Impact: Could result in injury or occupational illness resulting in one or more lost work day(s), loss
exceeding $10K but less than $200K, or mitigatable environmental damage without violation of law or regulation
where restoration activities can be accomplished
Low Impact: Could result in minor injury or illness not resulting in a lost work day, loss exceeding $2K but less
than $10K, or minimal environmental damage
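The financial-loss thresholds in the four levels above lend themselves to a simple lookup. The sketch below is only an illustration under stated assumptions: the names `Severity`, `severity_from_loss` and `overall_severity` are hypothetical, only the dollar-loss criterion is encoded, and a loss exactly on a boundary (for example exactly $1M) is assumed to fall into the lower class because the table leaves those points undefined. The injury and environmental criteria would be classified separately and combined by taking the most severe level.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Four-level FMECA severity scale from Step 2, ordered low to high."""
    LOW_IMPACT = 1
    MINOR_IMPACT = 2
    MAJOR_HIGH_IMPACT = 3
    CATASTROPHIC = 4


def severity_from_loss(loss_usd: float) -> Optional[Severity]:
    """Classify a failure effect using only the dollar-loss criterion.

    Hypothetical helper: injury, illness and environmental criteria are
    ignored, and boundary values (exactly $1M, $200K, $10K) are assumed to
    fall into the lower class. Losses of $2K or less are below the table
    and return None.
    """
    if loss_usd > 1_000_000:
        return Severity.CATASTROPHIC
    if loss_usd > 200_000:
        return Severity.MAJOR_HIGH_IMPACT
    if loss_usd > 10_000:
        return Severity.MINOR_IMPACT
    if loss_usd > 2_000:
        return Severity.LOW_IMPACT
    return None


def overall_severity(*levels: Optional[Severity]) -> Optional[Severity]:
    """Take the most severe of several criteria (e.g. loss, injury, environment)."""
    known = [level for level in levels if level is not None]
    return max(known) if known else None
```

Using `IntEnum` keeps the levels ordered, so the most severe criterion can be selected with `max`.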
Step 3: Failure Effect Probability
In some applications of FMECA, a Beta value is assigned to the Failure Effect Probability. The FMECA analyst may
also use engineering judgement to determine the Beta value. The Beta / Effect Probability is placed in the FMECA
Criticality Worksheet where:
Actual loss: 1.00
Probable loss: > 0.10 to < 1.00
Possible loss: > 0 to ≤ 0.10
No effect: 0
A failure mode ratio (alpha) is developed by assigning a proportion of the failure mode to each cause. The values
for all causes must sum to 1.00.
Step 4: Probability of Occurrence (Quantitative)
Assign probability values for each Failure Mode, referencing the data source selected. Failure Probability and Failure
Rate data can be found from several sources:
MIL-HDBK-217 is referenced, but any source of failure rate data can be used
RAC databases, Concordia, etc.
If the Failure Mode probability is listed (functional approach) several columns of the FMECA Criticality Worksheet
may be skipped. Criticality (Cr) can be calculated directly. When failure rates for failure modes and contributing
components are desired, detailed failure rates for each component are assigned.
Next, assign the Component Failure Rate (lambda). Failure rates for each component are selected from the
failure rate source document. Where no failure rate is available, the qualitative values from the FMEA are used.
FMEA may also be an alternative method for new or innovative designs.
Operating Time (t) represents the time or cycles the item or component will be expected to live. This is related to the
expected duty cycle requirements.
Step 5: Calculate and Plot Criticality
In FMECA, Criticality is calculated in two ways:
The Modal Criticality (each failure mode all causes) = Cm
The Criticality of the Item (all failure modes summarized) = Cr
Formulas of each are not provided in this explanation but the essence of the elements of the calculation is as follows:
Cm = The product of the following:
o Failure Rate of the Part (lambda)
o Failure Effect Probability (Beta)
o Failure Mode Ratio (alpha)
o Operating Time (units of time or cycles)
Cr = The summation of all the Cm
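The notes leave out the formulas. As a sketch, assuming the conventional form of the quantitative criticality calculation in MIL-STD-1629A, the modal criticality is Cm = β × α × λp × t and the item criticality is Cr = ΣCm over the item's failure modes. Here alpha is treated as the share of the part failure rate attributed to each failure mode, so the alpha values for one item should sum to 1.00. The function names and the valve data are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    beta: float   # failure effect probability from Step 3 (0 to 1)
    alpha: float  # failure mode ratio: share of the part failure rate due to this mode


def modal_criticality(mode: FailureMode, part_failure_rate: float,
                      operating_time: float) -> float:
    """Cm = beta * alpha * lambda_p * t for one failure mode."""
    if not 0.0 <= mode.beta <= 1.0:
        raise ValueError(f"beta out of range for {mode.name}")
    return mode.beta * mode.alpha * part_failure_rate * operating_time


def item_criticality(modes: List[FailureMode], part_failure_rate: float,
                     operating_time: float) -> float:
    """Cr = sum of Cm over all failure modes of the item."""
    if not math.isclose(sum(m.alpha for m in modes), 1.0, abs_tol=1e-9):
        raise ValueError("failure mode ratios (alpha) must sum to 1.00")
    return sum(modal_criticality(m, part_failure_rate, operating_time)
               for m in modes)


# Hypothetical valve: lambda_p = 2e-6 failures/hour, t = 10,000 operating hours.
valve_modes = [
    FailureMode("fails open", beta=1.0, alpha=0.6),
    FailureMode("fails closed", beta=0.5, alpha=0.4),
]
```

For the hypothetical valve, the two failure modes give Cm values of 0.012 and 0.004, so Cr = 0.016. Ranking the modes by Cm gives the ordering used to find the highest-risk items.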
Step 6: Design Feedback and Risk Mitigation
Risk mitigation is a discipline required to reduce possible failure. The identified risk in the criticality matrix is the
substitute for failure and must be treated in the same context as a test failure or customer returned component or item.
FMECA requires a change in risk levels / criticality after mitigation. A defect / defective detection strategy,
commensurate to the risk level, may be required. Acceptable risk management strategy includes the following:
Mitigation actions directed at Highest Severity and Probability combinations
Any risk where mitigation was unsuccessful is a candidate for Mistake Proofing or Quality Control, protecting
the customer / consumer from the potential failure
o Detection methods are chosen first for failure modes and, where possible, for individual causes, so that
defective items cannot be shipped or accepted
Action logs and “risk registers” with revision history are kept for follow-up and closure of each undesirable
risk
Other examples of FMECA mitigation strategies to consider:
Design change. Take a new direction on design technology, change components and/or review duty cycles for
derating.
Selection of a component with a lower lambda (failure rate). This can be expensive unless identified early in
Product Development.
Physical redundancy of the component. This option places the redundant component in a parallel
configuration. Both must fail simultaneously for the failure mode to occur. If a safety concern exists, this option
may require non-identical components.
Software redundancy. The addition of a sensing circuit which can change the state of the product. This option
often reduces the severity of the event by protecting components through duty cycle changes and reducing input
stresses.
Warning system. A placard and / or buzzer / light. This requires action by an operator or analyst to avoid a
failure or the effect of failure.
Detection and removal of the potential failure through testing or inspection. The inspection effectiveness must
match the level of severity and criticality.
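The physical redundancy option can be quantified with the standard parallel-system formula: the redundant group fails only if every component in it has failed, so R_sys = 1 − Π(1 − R_i). This is a minimal sketch assuming independent component failures. Common-cause failures violate exactly that assumption, which is one reason non-identical components may be required where safety is involved. The function names are hypothetical.

```python
from math import prod
from typing import Iterable


def parallel_reliability(reliabilities: Iterable[float]) -> float:
    """Reliability of a parallel (redundant) arrangement.

    The arrangement fails only if every component has failed, so
    R_sys = 1 - product(1 - R_i). Assumes component failures are independent;
    common-cause failures (e.g. identical parts sharing one weakness) break
    this assumption and make the result optimistic.
    """
    unreliabilities = []
    for r in reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability must be between 0 and 1, got {r}")
        unreliabilities.append(1.0 - r)
    if not unreliabilities:
        raise ValueError("at least one component is required")
    return 1.0 - prod(unreliabilities)


def series_reliability(reliabilities: Iterable[float]) -> float:
    """For comparison: a series chain works only if every component works."""
    return prod(reliabilities)
```

Two components of reliability 0.90 each give about 0.99 in parallel, but only 0.81 when both are needed in series.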
Step 7: Perform Maintainability Analysis
Maintainability Analysis looks at the highest risk items and determines which components will fail earliest. The cost
and parts availability are also considered. This analysis can affect the location of the components or items when in the
design phase. Design consideration must be given for quick access when serviceability is required more frequently.
Access panels that are easy to remove permit service of the identified components and items. This can limit the
downtime of important machinery.
A spare parts list is typically created from the maintainability analysis.
Root Cause Analysis (RCA) is defined as a systematic process for identifying the origins of problems and
determining an approach for responding to and solving them. It focuses on preventing problems rather than
simply “putting out fires.” RCA tries to be more scientific about asset failures, going a step beyond troubleshooting.
Overview
When feeling under the weather, it’s perfectly natural to address any pain or discomfort by some sort of first aid
treatment or a superficial remedy. However, if you consult a medical professional, then the approach might be a little
more thorough. You might find yourself being asked a series of specific questions about your condition and might
even go through some laboratory tests to get to the source of your illness.
The same is true for plant and maintenance incidents. While an immediate response is usually required, there is always
value in performing a systematic analysis of possible root causes.
RCA is the process that aims to identify the cause of a particular event. In the plant setting, this event usually refers to
any potential problems that will disrupt standard operations. At a very high level the usual suspects (i.e. usual causes
of problems) can be categorized as:
RCA capitalizes on the analysis of data collected from previous asset failures. It’s important to remember that some
failures can cascade into other failures, creating a greater need for root cause analysis in order to fully understand the
sequence of cause and failure events. Root Cause Analysis typically has 3 goals:
5 Whys
The name of the method pretty much explains the steps: Ask why and ask it again. Asking “Why?” five times usually
gets to the bottom of the problem, but don’t let the name stop you from asking more times. The idea is to drill down to
the details of an event until you are left with the actual root cause.
When executing root cause analysis, one process that is widely used is “The Five Whys”, a method that originated
from Sakichi Toyoda, founder of Toyota Industries, in the 1930s. The idea behind this process is that you should be
able to figure out the root cause of a problem by asking five “why” questions (more or less than five as needed). Here
is a real life example:
1. Why didn’t your car start? – Because the battery was dead.
2. Why was your car battery dead? – Because I left the headlights on last night.
3. Why did you leave the headlights on last night? – Because the headlight warning sensor did not beep when the
car was last exited.
4. Why did the headlight warning sensor malfunction? – It suffered a complete failure.
5. Why did it suffer a complete failure? – Because the part had reached the end of its lifespan.
Using The Five Whys method, you can deduce that the root cause of your car failing to start is a failed headlight
warning sensor (which should beep at you when you exit the car with your lights on). In the case of a sensor such as
this, you can’t really prevent it from failing; you can only replace it in a timely manner so that your car can be used
again right away. However, there may be other repairs you can avoid by keeping up with preventive maintenance.
Another example, involving a faulty mixer analyzed with the 5 Whys, is shown below.
A more visual method for determining root causes is a fault tree diagram. A fault tree diagram starts with the
problem in the topmost block. The immediate causes preceding the problem event are listed and branch out to form
the second layer of the diagram. Each immediate cause branches out to its own prior causes. This process is continued
until the most basic events are identified, which then become your potential root causes.
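A fault tree can be represented in code as a tree of events. The notes describe only the branching of causes; the AND/OR gates below are the usual fault tree convention and are an assumption here, as are the class names and the mixer events, which are invented rather than taken from the notes' figure. The leaves of the tree are the basic events, i.e. the potential root causes, and `occurs` propagates observed basic events up to the top event.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class BasicEvent:
    """A bottom-level event: a candidate root cause."""
    name: str


@dataclass
class Gate:
    """An intermediate or top event whose occurrence depends on its children."""
    name: str
    kind: str  # "AND": all children must occur; "OR": any child is enough
    children: List[Node]


Node = Union[BasicEvent, Gate]


def occurs(node: Node, observed: Set[str]) -> bool:
    """Propagate observed basic events up the tree."""
    if isinstance(node, BasicEvent):
        return node.name in observed
    results = [occurs(child, observed) for child in node.children]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate type {node.kind!r} at {node.name!r}")


def candidate_root_causes(node: Node) -> List[str]:
    """List the basic events (leaves), left to right."""
    if isinstance(node, BasicEvent):
        return [node.name]
    causes: List[str] = []
    for child in node.children:
        causes.extend(candidate_root_causes(child))
    return causes


# Hypothetical mixer tree with invented events.
mixer_stopped = Gate("Mixer stopped", "OR", [
    Gate("Motor failed", "OR", [BasicEvent("bearing seized"),
                                BasicEvent("winding burnt out")]),
    Gate("No power to motor", "AND", [BasicEvent("mains supply lost"),
                                      BasicEvent("backup supply failed")]),
])
```

In this hypothetical tree, a seized bearing alone stops the mixer, while a mains outage does so only if the backup supply also fails.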
The same mixer problem can be represented by the following fault tree diagram:
Fishbone diagram (aka Ishikawa diagram)
Another visual method to identify root causes is by using a fishbone diagram (also known as an Ishikawa diagram,
named after its creator Kaoru Ishikawa). It starts by specifying the problem on the rightmost part of the diagram. The
factors contributing to the main problem are then listed as categories. Specific causes under each category are then
listed down to identify the source of the problem.
Environmental
People
Equipment/material
Procedures
Applying these basic categories as a starting point, the mixer problem can be translated into a fishbone diagram.
Implementing Root Cause Analysis
While RCA methods are very common and well-known to the maintenance community, there can be challenges
to making RCA thrive.
The first step to mastering this process is knowing the methods that are available to conduct RCAs. The next steps are
setting the proper mindset and improving the quality of execution to drive the initiative toward success.
Keep in mind the importance of collecting data accurately and involving the correct groups to analyze that data. To
implement RCA effectively, it should be a repeatable process that is collaboratively executed by the group.
While The Five Whys is a popular RCA method, it is definitely not the only one. You may use one or multiple
methods in the same cycle of RCA. Other strategies for RCA include:
Barrier Analysis
Change Analysis
Causal Factor Tree Analysis
Fishbone Diagram Analysis
Fault Tree Analysis
Failure Mode and Effects Analysis
Pareto Analysis
When Root Cause Analysis is performed, there are six phases in one RCA cycle. The components of asset
failure may include environment, people, equipment, materials, and procedure. Before you carry out RCA, you should
decide which problems are immediate candidates for this analysis. Just a few examples of where root cause analysis is
used include major accidents, everyday incidents, human errors, and manufacturing mistakes. Those that result in the
highest costs to resolve, most downtime, or threats to safety will rise to the top of the list.
The first thing to do in Root Cause Analysis is to list every potential cause leading up to a problem or event.
Place the incident into the context of everything related to the problem. You should also look at a longer time period
than the days leading up to when the incident occurred to create a history of what might have gone wrong and when.
This phase requires complete neutrality, focusing on facts. When you are investigating potential causes, some facts
may not be available if no one saw what happened or evidence was discarded or destroyed. This is when you should
look to secondary sources. You can also construct possible scenarios for how the problem may have occurred.
Phase 2 involves collecting as much data as you can that relates to the potential cause(s) of the problem. This
data may come from your existing CMMS software, other databases, digital files, or printed documents. Ask questions
to clarify information and drill down into every potential cause. This is the phase in which to apply The Five
Whys method.
Phase 3 is to identify everything that contributed to the problem. Make a list of every change or event. If
possible, gather evidence of these changes and the main problem that occurred. There are four types of evidence that
can be gathered: people, paper, physical, and recording evidence. Just a few examples include interviews, activity-
specific paperwork, broken parts, and video footage.
In Phase 4, you should analyze the collected data. Categorize changes or events by how much influence you
have over them. Then decide if each event is unrelated, a correlating factor, a contributing factor, or a root cause. An
unrelated event is one that has no impact or effect on the problem whatsoever. A correlating factor is one that is
statistically related to the problem, but may or may not have a direct impact on the problem. A contributing factor is
an event or condition that directly led to the problem, in full or in part. A root cause is the underlying cause that,
once removed, prevents the problem from recurring. This should help you arrive at one or more root causes. When the
root cause has been identified, more questions can be asked: why are you certain that this is the root cause instead
of something else?
The fifth phase of RCA is to develop a plan for preventing future breakdowns. It’s important to identify
preventive actions that will not only prevent the problem from recurring but also not cause other
problems. You should find and present a solution that is repeatable and applicable to more than one situation.
Be sure to ask, how can the root cause be eliminated or avoided so the issue doesn’t occur again? There are reports
available in a CMMS to help you identify how to prevent the problem. Just a few examples include changes to the
preventive maintenance routine or operator training, new signage or HMI controls, or a change of parts or part
suppliers.
In order to predict the potential for future problems (and hopefully avoid them), you should ask a few questions. What
can be done to prevent the problem from recurring? How will the solution be implemented, and who will implement
it? What are the risks involved in this solution?
Now that you’ve come up with a plan, it’s time to implement that plan. Depending on the type, severity, and
complexity of the problem and the plan to prevent it from happening again, there are a number of areas to take into
consideration. These include, but are not limited to, the people in charge of the assets, the condition and status of the
assets themselves, the processes related to the maintenance of those assets, and any people or processes outside of
asset maintenance that have an impact on the identified problem.
After root cause analysis is complete, maintenance teams should go back and review the actual downtime and cost
impact associated with that problem. This will help determine whether this problem and other similar issues were
worth the effort of RCA.
RELIABILITY CENTERED MAINTENANCE (RCM)
Reliability centered maintenance is a logical methodology derived from research in the aviation sector, and it
uses the failure mode, effect, and criticality analysis (FMECA) tool. RCM is a process used to identify the most
applicable and effective maintenance action(s) to ensure the highest practical standard of operating performance of a
system or a component. The goals of RCM are as follows:
• To determine the most cost-effective and applicable maintenance tasks to minimize the risk and impact of failure on
systems/equipment function.
• To ensure realization of the inherent safety and reliability levels of the equipment.
• To obtain the information necessary for design improvement of those items where their inherent reliability proves to
be inadequate.
• To accomplish these goals at a minimum total cost, including maintenance costs, support costs, and economic
consequences of operational failures.
These goals and specific objectives are clear drivers for the effective maintenance programs that usually result from
the application of RCM methodology.
RCM Principles
RCM has the following key principles that distinguish it from other methodologies for maintenance:
• Preservation of system/equipment function: The focus here is to keep the system performing its function, not to keep
it operating as though it were new. As long as the system performs its function, there is no need for excessive
maintenance, which may itself cause failures in some cases. This principle has led to a reduction in time-based
preventive maintenance in the airline industry that reduced cost and improved the reliability of systems. Redundancy
of function through multiple units of equipment improves functional reliability but increases life cycle cost in terms
of procurement and operating costs.
• Focus on systems: RCM focuses on systems rather than components, since functions are usually driven by systems.
• RCM is reliability centered: It treats failure statistics in relation to age. It seeks to know the conditional probability
of failure at specific ages (the probability that failure will occur in each given operating age bracket).
• Safety and economics are the key criteria: Safety must be ensured first at any cost; followed by costs that result from
the impact on production and operation.
• Design limitations exist: The RCM objective is to maintain the inherent reliability of the equipment design,
recognizing that changes in inherent reliability arise from design rather than maintenance. Maintenance can, at best,
only achieve and maintain the level of reliability designed into a system.
• Feedback is necessary for improvement: RCM recognizes that maintenance feedback can improve on the original
design. In addition, RCM recognizes that a difference often exists between the perceived design life and the intrinsic
or actual design life.
• Failure is any unsatisfactory condition: failure may be either a loss of function (operation ceases) or a loss of
acceptable quality (operation continues).
• Maintenance tasks should be derived based on logic: RCM uses a logic tree to develop and screen maintenance tasks.
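The reliability-centered principle above asks for the conditional probability of failure in each operating age bracket. A simple estimate divides the number of units failing in a bracket by the number still surviving at its start. This sketch assumes complete run-to-failure data (every unit observed until it failed); the function name, bracket width and failure ages are hypothetical.

```python
import math
from typing import List, Sequence


def conditional_failure_probabilities(failure_ages: Sequence[float],
                                      bracket_width: float,
                                      n_brackets: int) -> List[float]:
    """Estimate P(failure in age bracket | survived to the start of that bracket).

    Bracket k covers ages [k * width, (k + 1) * width). Assumes every unit in
    the sample was observed until it failed (no censored or still-running
    units); real maintenance data usually needs a censoring-aware method.
    Brackets with no surviving units return NaN.
    """
    probabilities = []
    for k in range(n_brackets):
        start, end = k * bracket_width, (k + 1) * bracket_width
        at_risk = sum(1 for age in failure_ages if age >= start)
        failed = sum(1 for age in failure_ages if start <= age < end)
        probabilities.append(failed / at_risk if at_risk else math.nan)
    return probabilities


# Hypothetical failure ages (operating hours) for eight identical components.
ages = [150, 420, 480, 910, 1300, 1350, 1420, 1800]
```

For this sample and 500-hour brackets, the estimates are 0.375, 0.2, 0.75 and 1.0, so the conditional probability of failure is clearly not constant with age.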
RCM Methodology
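RCM has a systematic methodology that consists of seven steps. The seven steps are as follows:
1. System selection and information collection.
2. System boundary definition.
3. System description and functional block diagram.
4. System functions and functional failures.
5. Failure mode and effects analysis (FMEA).
6. Logic tree analysis (LTA).
7. Task selection.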
RCM is best presented and implemented at the system level, because functions are best captured at that level. At the
component level, functions and functional failures lack defining significance, while plant-level analysis makes the
whole analysis intractable. The important question at this stage is: which system should be selected?
The following are criteria that guide the selection:
1. Systems with a high number of corrective maintenance tasks during recent years;
2. Systems with a high number of preventive maintenance tasks and/or costs during recent years;
5. Systems contributing significantly toward plant outages/shutdowns (full or partial) during recent years;
Past experience has shown that all of these criteria except criteria 6 and 7 yield more or less the same results. An
indicator of a good selection is that the systems chosen for the RCM program show a significant improvement over
the current situation.
The next task, after selecting a system, is collecting information related to the selected system. A good practice is to
start collecting key information and documents right at the onset of the process. The following are documents that
may be required in a typical RCM study:
• Systems schematic and/or block diagram;
• Equipment design specification and operations manuals (a source of finding design specifications and operating
condition details);
• Other identified sources of information, unique to the plant or organizational structure. Examples include industry
data for similar systems; and
• Current maintenance program used for the system. It is generally recommended not to collect this information
before step 7, in order to avoid biases that may affect the RCM process.
• Defining the system boundary provides exact knowledge of what is and is not included in a system, making sure
that no key system function or equipment is neglected (or overlaps with another system). This is especially important
if two adjacent systems are selected.
• Boundary definition also includes system interfaces (both IN and OUT interfaces) and interactions that establish the
inputs and outputs of a system. An accurate definition of IN and OUT interfaces is a precondition for steps 3 and 4
below.
There are no clear rules for specifying system boundaries; however, as a general guideline, a system has one or two
main functions with a few supporting functions that make up a logical grouping of equipment. However the
boundary is identified, it must be clearly documented as part of a successful process.
The system description step is important and will set the stage for a successful RCM process. The step has the following five elements:
• System description;
• In/out interfaces;
• Equipment history.
This step generally uses a form that documents a baseline characterization of the system, which is later used in
specifying PM tasks.
The fourth step identifies and lists all system functions. As a guide for identifying functions, every out interface
should be captured into a function statement and any internal out interfaces between functional subsystems can be a
source for a function. An important point to note is that these statements are for defining system functions and not the
equipment. With the definition of system functions comes the functional failures. In RCM, the focus is on functions
and functional failures. A functional failure is more than a single statement of loss of function: there may be two or
more loss conditions (e.g., complete paralysis of the plant, or a major or minor loss of functionality).
This distinction is important and will lead to the proper ranking of functions and functional failures.
Figure 11.7 provides a form to document functions and functional failures. The following are examples of correct
and wrong statements of functions:
• Provide 1500 psi safety relief valves (wrong statement because the statement is about equipment);
• Provide for pressure relief above 1500 psi (correct; the focus is on function);
• Provide a 1500 gallon per minute (gpm) centrifugal pump on the discharge side of header 26 (wrong); and
Failure modes and effects analysis (FMEA) is a basic tool used in reliability engineering to assess the impact of
failures. It is a systematic failure analysis technique used to identify the failure modes, their causes, and their
effects on the system function. FMEA rates each potential failure mode and effect based on the following three
factors:
• severity: the seriousness of the effect of the failure;
• occurrence: the probability of the failure occurring; and
• detection: the probability of the failure being detected before the impact of the effect is realized.
Then these three factors are combined in one number called the risk priority number (RPN) to reflect the priority of
the failure modes identified. The risk priority number (RPN) is simply calculated by multiplying the severity rating,
the occurrence probability rating, and the detection probability rating.
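The RPN calculation and the resulting ranking can be sketched as follows. The 1 to 10 rating scales, with a high detection rating meaning the failure is unlikely to be detected, are a common FMEA convention assumed here rather than something stated in the notes; the names and the pump failure modes are hypothetical.

```python
from typing import List, NamedTuple


class ModeRating(NamedTuple):
    mode: str
    severity: int    # 1 (negligible) .. 10 (most severe)
    occurrence: int  # 1 (remote) .. 10 (very likely)
    detection: int   # 1 (almost certain to be detected) .. 10 (undetectable)


def risk_priority_number(r: ModeRating) -> int:
    """RPN = severity x occurrence x detection, each on an assumed 1-10 scale."""
    for value in (r.severity, r.occurrence, r.detection):
        if not 1 <= value <= 10:
            raise ValueError(f"rating {value} for {r.mode!r} is outside 1-10")
    return r.severity * r.occurrence * r.detection


def rank_by_rpn(ratings: List[ModeRating]) -> List[ModeRating]:
    """Highest RPN first, i.e. the order in which to address failure modes."""
    return sorted(ratings, key=risk_priority_number, reverse=True)


# Hypothetical pump failure modes.
pump = [
    ModeRating("seal leak", 7, 5, 4),
    ModeRating("bearing wear", 6, 3, 2),
    ModeRating("coupling misalignment", 4, 6, 7),
]
```

The RPNs are 140, 36 and 168, so coupling misalignment would be addressed first even though its severity rating is the lowest.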
The purpose of the logic tree analysis (LTA) is to prioritize the resources to be committed to each failure mode. The
prioritization is based on the impact of the failure mode. RCM provides a simple and intuitive structure for this
purpose. The structure uses two criteria, i.e., safety and cost, that arise from a full plant outage. The LTA has three
questions that enable a user, with minimal effort, to place each failure mode into one of six categories. Each question
is answered with yes or no only. Each category (also known as a bin) forms a natural segregation of items of
respective importance. The LTA scheme is shown below in Fig. 11.9.
The six classification categories for the failures are A, B, C, D/A, D/B, or D/C. For the priority scheme, A and
B have higher priority than C when it comes to allocation of scarce resources, and A is given higher priority than B. In
summary, the priority for PM tasks goes in the following order:
1. A or D/A;
2. B or D/B; and
3. C or D/C.
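Once each failure mode has been placed in its LTA category, the priority order above can be applied mechanically. This sketch uses hypothetical names and failure modes; it encodes only the ordering stated in the notes (A or D/A, then B or D/B, then C or D/C) and keeps the input order within each class.

```python
from typing import List, Tuple

# Priority from the LTA summary: 1 = A or D/A, 2 = B or D/B, 3 = C or D/C.
LTA_PRIORITY = {"A": 1, "D/A": 1, "B": 2, "D/B": 2, "C": 3, "D/C": 3}


def lta_priority(category: str) -> int:
    """Map an LTA category to its PM priority class (1 is highest)."""
    try:
        return LTA_PRIORITY[category]
    except KeyError:
        raise ValueError(f"unknown LTA category {category!r}") from None


def order_for_pm(failure_modes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (failure mode, LTA category) pairs into PM resource-allocation order.

    sorted() is stable, so modes in the same priority class keep their input order.
    """
    return sorted(failure_modes, key=lambda fm: lta_priority(fm[1]))


# Hypothetical failure modes already binned by the LTA questions.
modes = [
    ("pump seal leak", "C"),
    ("relief valve stuck closed", "D/A"),
    ("motor trip", "B"),
    ("level sensor drift", "D/C"),
    ("feed line rupture", "A"),
]
```

Relying on the stability of `sorted` means the analyst's original ordering is preserved as a tie-breaker within each priority class.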
Task Selection
In this step, RCM methodology allocates PM tasks and resources. This is the stage where the maximum benefit from
RCM may be obtained. The task selection process requires each selected task to be applicable and effective.
Here, “applicable” means that the task should be able to prevent failures, detect failures, or unearth hidden failures,
while “effective” is related to the cost effectiveness of the alternative PM strategies. If no PM task is selected through
the LTA, the only option is to run the equipment to failure. This activity requires contribution from the maintenance
personnel, as their experience is invaluable in the correct selection of the PM task. After the tasks are selected, the set
of all run-to-failure (rtf) tasks is subjected to a final sanity check. The purpose of the check is to critically review all
component failures that are treated as run-to-failure cases to see whether this task is appropriate. If an rtf task fails any
of the following tests or creates a conflict, the PM or the current task is kept. The following are the checks:
• Marginal effectiveness: It is not clear that the rtf costs are significantly less than the current PM costs.
• High-cost failure: While there is no loss of critical function, the failure mode is likely to cause extensive damage to
the component that should be avoided.
• Secondary damage: Similar to the second item, except that there is a high probability of extensive damage to
neighboring components.
• OEM conflict: The original equipment manufacturer recommends a PM task that is not supported by RCM. This is
particularly sensitive if warranty conditions are involved.
• Internal conflict: Maintenance or operation feels strongly about the PM task that is not supported by RCM.
• Regulatory conflict: A regulatory body, such as the Environmental Protection Agency (EPA), has established the PM.
• Insurance conflict: Similar to the above two.
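The final sanity check on run-to-failure candidates reduces to a simple rule: keep the PM (or current) task if any check is raised, otherwise confirm run-to-failure. A minimal sketch, with hypothetical names, that records the seven checks listed above and reports which ones triggered the decision:

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class RtfReview:
    """Sanity-check answers for one run-to-failure candidate (True = check raised)."""
    marginal_effectiveness: bool = False
    high_cost_failure: bool = False
    secondary_damage: bool = False
    oem_conflict: bool = False
    internal_conflict: bool = False
    regulatory_conflict: bool = False
    insurance_conflict: bool = False


def review_rtf(review: RtfReview) -> Tuple[str, List[str]]:
    """Return the decision plus the checks that triggered it.

    Any raised check means the PM (or current) task is kept; run-to-failure
    is confirmed only when every check passes.
    """
    raised = [f.name for f in fields(review) if getattr(review, f.name)]
    decision = "keep PM/current task" if raised else "run to failure"
    return decision, raised
```

Returning the list of raised checks alongside the decision keeps a record of why each rtf candidate was overturned, which is useful when the review is documented.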
RCM IMPLEMENTATION
RCM implementation can be viewed as a process with four stages [1, 2, 4, 7]. Each stage consists of a number of tasks
that must be executed in order to ensure successful implementation. The four stages are as follows:
• Stage 1: Planning and organizing for RCM.
• Stage 2: Analysis and design.
• Stage 3: Scheduling and execution.
• Stage 4: Assessment and improvement.
The following subsections present the process for RCM implementation.
Stage 1 is important for the success of RCM implementation. Top maintenance management must seek
organizational commitment for the RCM project and ensure the needed resources are provided. In some situations, it
may be better to conduct a pilot RCM project. The following must be addressed carefully:
1. Organization: Formation of the RCM team and selection of a facilitator. The facilitator must be knowledgeable about
the RCM process. The team must include experienced people in the areas that will be impacted by the application of
the RCM. The team must use a clear system for reporting progress and challenges.
2. Training: A training program on RCM should be conducted at different levels. Management should be given an
awareness program about RCM and its benefits. A well-structured training program on RCM methodology and
implementation should be provided to the team.
3. Resource availability: Estimate what types of resources are needed and ensure their availability.
4. Manage expectations: Establish a baseline for current performance and the expected benefits from RCM.
5. Schedule: Prepare a schedule for RCM project on a Gantt chart with the necessary resources. A schedule will
facilitate follow-up and monitoring.
6. Change management program: Develop a change management program to mitigate resistance to the RCM project
and ensure buy-in. The program may include awareness, training, and reorganization.
UNIT V
TOTAL PRODUCTIVE MAINTENANCE
Nakajima, who is considered by many in the literature as the father of total productive maintenance (TPM),
defines it [5] as “productive maintenance carried out by all employees through small group activities.” He also
adds, “TPM is equipment maintenance performed on a company wide basis.” The authors define TPM as a management
approach to maintenance that imports total quality management (TQM) philosophy and techniques to maintenance.
TPM focuses on involving all employees in the organization in equipment improvement.
TPM (Total Productive Maintenance) is a holistic approach to equipment maintenance that strives to achieve perfect
production:
No Breakdowns
No Small Stops or Slow Running
No Defects
In addition it values a safe working environment:
No Accidents
TPM emphasizes proactive and preventative maintenance to maximize the operational efficiency of equipment. It
blurs the distinction between the roles of production and maintenance by placing a strong emphasis on empowering
operators to help maintain their equipment.
The implementation of a TPM program creates a shared responsibility for equipment that encourages greater
involvement by plant floor workers. In the right environment this can be very effective in improving productivity
(increasing up time, reducing cycle times, and eliminating defects).
Why Chronic Losses are Neglected
as a team.
o its original operating conditions
Introduction to OEE
OEE (Overall Equipment Effectiveness) is a metric that identifies the percentage of planned production time that is
truly productive. It was developed to support TPM initiatives by accurately tracking progress towards achieving
“perfect production”.
OEE consists of three underlying components, each of which maps to one of the TPM goals set out at the beginning of
this topic, and each of which takes into account a different type of productivity loss.
Factor | TPM Goal | Type of Productivity Loss
Availability | No Stops | Availability takes into account Availability Loss, which includes all events that stop planned production for an appreciable length of time (typically several minutes or longer). Examples include Unplanned Stops (such as breakdowns and other down events) and Planned Stops (such as changeovers).
Performance | No Small Stops or Slow Running | Performance takes into account Performance Loss, which includes all factors that cause production to operate at less than the maximum possible speed when running. Examples include both Slow Cycles and Small Stops.
Quality | No Defects | Quality takes into account Quality Loss, which factors out manufactured pieces that do not meet quality standards, including pieces that require rework. Examples include Production Rejects and Reduced Yield on startup.
OEE | Perfect Production | OEE takes into account all losses (Availability Loss, Performance Loss, and Quality Loss), resulting in a measure of truly productive manufacturing time.
As can be seen from the above table, OEE is tightly coupled to the TPM goals of No Breakdowns (measured by
Availability), No Small Stops or Slow Running (measured by Performance), and No Defects (measured by Quality).
It is extremely important to measure OEE in order to expose and quantify productivity losses, and in order to measure
and track improvements resulting from TPM initiatives.
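The notes do not spell out how Availability, Performance and Quality are calculated. The minimal Python sketch below assumes the commonly used definitions:

- Availability = Run Time / Planned Production Time
- Performance = (Ideal Cycle Time × Total Count) / Run Time
- Quality = Good Count / Total Count

The function name, argument names and shift figures are illustrative assumptions, not data from the source.

```python
def oee(planned_time_min: float, run_time_min: float,
        ideal_cycle_time_min: float, total_count: int, good_count: int) -> dict:
    """Return Availability, Performance, Quality and OEE as fractions (0..1).

    Assumed (commonly used) definitions:
      Availability = run time / planned production time
      Performance  = (ideal cycle time * total count) / run time
      Quality      = good count / total count
    """
    availability = run_time_min / planned_time_min
    performance = (ideal_cycle_time_min * total_count) / run_time_min
    quality = good_count / total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Made-up shift: 480 min planned, 400 min running, ideal cycle time 0.5 min,
# 700 pieces produced, 665 of them good.
result = oee(480, 400, 0.5, 700, 665)
print({name: round(value, 3) for name, value in result.items()})
# {'availability': 0.833, 'performance': 0.875, 'quality': 0.95, 'oee': 0.693}
```

The product reduces to good count × ideal cycle time / planned production time (here 665 × 0.5 / 480 ≈ 0.693). In other words, OEE is the share of planned time spent making good parts at ideal speed.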
Item | Benefit
Stop Time | The accuracy of manual unplanned stop time tracking is typically in the range of 60 to 80% (based on real-world experience across many companies). With automatic Run/Down detection, this accuracy can approach 100%.
Small Stops and Slow Cycles | For most equipment it is impossible to manually track slow cycles and small stops. This means that a great deal of potentially useful information, such as time-based and event-based loss patterns, is not available.
Operator Focus | With automated data collection, the operator spends more time focused directly on the equipment (versus spending time on paperwork).
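The sketch below illustrates what automatic Run/Down detection involves, under two assumptions:

- The machine exposes a boolean "running" signal that is sampled with timestamps.
- Each sampled state holds until the next sample.

The function and type names are hypothetical. Only the signal-to-stop-interval conversion is shown; real data-collection systems do much more.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, bool]   # (timestamp in seconds, is the machine running?)
Stop = Tuple[float, float]    # (stop start, stop end) in seconds


def detect_stops(samples: List[Sample], end_time: float) -> List[Stop]:
    """Convert an ordered run/down signal into stop intervals.

    Assumes each sample's state holds until the next sample, and that the
    last state holds until end_time.
    """
    stops: List[Stop] = []
    stop_start: Optional[float] = None
    for timestamp, running in samples:
        if not running and stop_start is None:
            stop_start = timestamp                 # machine has just gone down
        elif running and stop_start is not None:
            stops.append((stop_start, timestamp))  # machine is running again
            stop_start = None
    if stop_start is not None:                     # still down at end of window
        stops.append((stop_start, end_time))
    return stops


signal = [(0, True), (120, False), (150, True), (900, False), (1500, True)]
print(detect_stops(signal, end_time=1800))  # [(120, 150), (900, 1500)]
```

The first stop lasts only 30 seconds. Such events are rarely written down by hand, which is the point made in the Small Stops and Slow Cycles row.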
1. Track OEE (including Availability, Performance, and Quality) for the target equipment for one month. Make
sure to compile the results by shift.
2. Review every shift result, keeping track of the best individual result for Availability, Performance, and
Quality across all shifts (i.e. the highest Availability score across all shifts, the highest Performance score
across all shifts, etc.).
3. Multiply the best individual results together to calculate a “Best of the Best” OEE score.
This newly calculated “Best of the Best” OEE score represents the stretch goal – derived from the best results actually
achieved across the month for Availability, Performance, and Quality.
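The three "Best of the Best" steps can be sketched in a few lines of Python. The sketch assumes per-shift results are stored as fractions in dictionaries. The function name and the shift figures are made up for illustration.

```python
from typing import Dict, List


def best_of_best_oee(shifts: List[Dict[str, float]]) -> float:
    """Multiply the best Availability, Performance and Quality seen in any shift."""
    best_availability = max(s["availability"] for s in shifts)
    best_performance = max(s["performance"] for s in shifts)
    best_quality = max(s["quality"] for s in shifts)
    return best_availability * best_performance * best_quality


# Made-up results for three shifts (fractions, not percentages).
shift_results = [
    {"availability": 0.90, "performance": 0.80, "quality": 0.97},
    {"availability": 0.85, "performance": 0.88, "quality": 0.95},
    {"availability": 0.88, "performance": 0.84, "quality": 0.99},
]
print(round(best_of_best_oee(shift_results), 3))  # 0.784
```

No single shift reached this value: the individual shift OEEs here are about 0.698, 0.711 and 0.732. That is what makes it a stretch goal while still grounding it in results actually achieved.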
Six Big Losses | OEE Category | Examples | Comments
Setup and Adjustments | Availability Loss | Setup/Changeover, Material Shortage, Operator Shortage, Major Adjustment, Warm-Up Time | This loss is often addressed through setup time reduction programs such as SMED (Single-Minute Exchange of Die).
Small Stops | Performance Loss | Component Jam, Minor Adjustment, Sensor Blocked, Delivery Blocked, Cleaning/Checking | Typically only includes stops that are less than five minutes and that do not require maintenance personnel.
Slow Running | Performance Loss | Incorrect Setting, Equipment Wear, Alignment Problem | Anything that keeps the equipment from running at its theoretical maximum speed.
Reduced Yield | Quality Loss | Scrap, Rework | Rejects during warm-up, startup or other early production.
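The sketch below shows how recorded stop events might be booked to the stop-related rows of the Six Big Losses. It uses the five-minute, no-maintenance-personnel rule from the Small Stops row. It assumes that any other unplanned stop is booked as "Breakdowns", one of the six loss categories; its row is not reproduced in the table above. The class and function names are hypothetical. Slow Running and Reduced Yield are not stop events, so they are left out.

```python
from dataclasses import dataclass
from typing import Tuple

# Threshold quoted in the Small Stops row; many plants tune this value.
SMALL_STOP_THRESHOLD_MIN = 5.0


@dataclass
class StopEvent:
    duration_min: float
    needs_maintenance: bool      # did maintenance personnel have to intervene?
    is_changeover: bool = False  # planned setup/changeover rather than a failure


def classify_stop(event: StopEvent) -> Tuple[str, str]:
    """Return (loss, OEE category) for one stop event.

    Assumption: any non-changeover stop that is long or needs maintenance
    personnel is booked as a breakdown (Availability Loss).
    """
    if event.is_changeover:
        return ("Setup and Adjustments", "Availability Loss")
    if event.duration_min < SMALL_STOP_THRESHOLD_MIN and not event.needs_maintenance:
        return ("Small Stops", "Performance Loss")
    return ("Breakdowns", "Availability Loss")


print(classify_stop(StopEvent(duration_min=2.0, needs_maintenance=False)))
# ('Small Stops', 'Performance Loss')
```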
In its 1971 definition of TPM, the JIPE (Japan Institute of Plant Engineers) stated that TPM seeks the following five key goals:
• maximize OEE, which includes availability, process efficiency, and product quality;
• take a systematic approach to reliability, maintainability, and life cycle costs (LCC);
• involve operations, materials management, maintenance, engineering, and administration in equipment management;
• involve all levels of management and workers; and
• improve equipment performance through small group activities and team performance.
• Autonomous maintenance: Equipment operators are the focal point of TPM activities. Although most operators
understand what their equipment does, few understand the underlying mechanisms of how it does it. The term
"autonomous maintenance" describes the maintenance-related activities performed by operators, as well as the
self-directed study that characterizes the other equipment improvement activities. Operators perform cleaning,
inspection, lubrication, adjustments, minor component change-outs and other light maintenance tasks that require
some training and instruction but not comprehensive craftsman skills. Over time, the operator learns to diagnose
equipment problems before they become serious.
• Equipment Management: In TPM, whenever equipment performs at a level below what is required, the performance
loss is recorded and monitored. These losses can be grouped into six categories: breakdowns, setup and adjustments,
idling and minor stoppages, reduced speed, defects, and yield losses. Breakdowns and setups cause downtime and
reduce availability; idling, minor stoppages and reduced speed lengthen the effective cycle time and reduce
performance; defects and yield losses reduce quality. OEE is the key TPM performance measure and is the product of
availability, performance (cycle-time efficiency) and quality rate. Operators and maintainers are trained to identify
problems related to OEE and to perform root cause analyses in teams to investigate the losses.
• Systematic Planning and Continuous Improvement: Within the maintenance department, the TPM methodology
encourages the development of systematic planning and control of preventive and corrective maintenance, and fully
supports the autonomous activities performed by the operator. In plants where the basic operating and maintaining
environment has been improved to the point of diminishing returns, active maintenance prevention activities are
undertaken, as described earlier in the sections on designing for maintainability. Throughout, there should be a strong
emphasis on improving operator and maintainer skills. Spending on training is customarily on the order of 5–8% of
the labor budget.
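The Equipment Management item above describes OEE as the product of availability, performance and quality rate. One common way to see why the product is meaningful is to let each factor remove one group of the six losses from the planned production time. This time-based breakdown is an assumption; these notes do not define the intermediate times.

```latex
\begin{aligned}
A &= \frac{T_{\text{run}}}{T_{\text{planned}}}, \qquad
P = \frac{T_{\text{net}}}{T_{\text{run}}}, \qquad
Q = \frac{T_{\text{good}}}{T_{\text{net}}} \\[4pt]
\text{OEE} &= A \cdot P \cdot Q
 = \frac{T_{\text{run}}}{T_{\text{planned}}} \cdot \frac{T_{\text{net}}}{T_{\text{run}}} \cdot \frac{T_{\text{good}}}{T_{\text{net}}}
 = \frac{T_{\text{good}}}{T_{\text{planned}}}
\end{aligned}
```

The symbols are defined as follows:

- T_run is planned production time minus breakdown and setup/adjustment losses.
- T_net is run time minus idling, minor stoppage and reduced-speed losses.
- T_good is net run time minus the time spent producing defects and yield losses.

The intermediate terms cancel, so OEE is the fraction of planned production time that is fully productive.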
Autonomous Maintenance
The benefits of involving the operators in the success of TPM cannot be overemphasized. A pragmatic way of
achieving this is by using a systematic, data-based approach to skill transfer. Skill transfer is the process of moving
tasks requiring lower skills from the exclusive domain of one work group to a shared task zone. Under this policy, an
operator who has been properly trained and certified can perform a mechanic’s task and vice versa. This partnership
between operations and maintenance integrates maintenance and operation/manufacturing and has many benefits that
include the following:
• Operators and mechanics become multi-skilled, which leads to job enrichment and improved flexibility of workers.
• The involvement of operators in routine maintenance builds a sense of responsibility, pride, and ownership.
• Delay times are reduced and productivity is increased.
• Teamwork between operations and maintenance is promoted.
Traditional total productive maintenance was developed by Seiichi Nakajima of Japan. His work on the subject led to
the TPM process in the late 1960s and early 1970s. Nippon Denso (now Denso), a company that made parts for Toyota,
was one of the first organizations to implement a TPM program, and its program became an internationally accepted
benchmark for how to implement TPM. Incorporating lean manufacturing techniques, TPM is built on eight pillars based
on the 5-S system. The 5-S system is an organizational method based around five Japanese words and their meaning:
seiri (organize), seiton (orderliness), seiso (cleanliness), seiketsu (standardize) and shitsuke (sustain).
The eight pillars of total productive maintenance focus on proactive and preventive techniques to help improve
equipment reliability. The eight pillars are: autonomous maintenance; focused improvement (kaizen); planned
maintenance; quality management; early equipment management; training and education; safety, health and
environment; and TPM in administration. Let's break down each pillar below.
1. Autonomous maintenance: Autonomous maintenance means ensuring your operators are fully trained in
routine maintenance such as cleaning, lubricating and inspecting, and placing that responsibility solely in
their hands. This gives machine operators a feeling of ownership of their equipment and increases their
knowledge of the particular piece of equipment. It also helps keep the machinery clean and lubricated,
helps identify issues before they become failures, and frees up maintenance staff for higher-level tasks.
Implementing autonomous maintenance involves cleaning the machine to a "baseline" standard that the
operator must maintain. This includes training the operator on technical skills for conducting a routine
inspection based on the machine's manual. Once trained, the operator sets his or her own autonomous
inspection schedule. Standardization ensures everyone follows the same procedures and processes.
2. Focused improvement: Focused improvement is based around the Japanese term "kaizen," meaning
"improvement." In manufacturing, kaizen requires improving functions and processes continually. Focused
improvement looks at the process as a whole and brainstorms ideas for how to improve it. Getting small teams
in the mindset of proactively working together to implement regular, incremental improvements to processes
pertaining to equipment operation is key for TPM. Diversifying team members allows for the identification of
recurring problems through cross-functional brainstorming. It also combines input from across the company
so teams can see how processes affect different departments.
In addition, focused improvement increases efficiency by reducing product defects and the number of
processes while enhancing safety by analyzing the risks of each individual action. Finally, focused
improvement ensures improvements are standardized, making them repeatable and sustainable.
3. Planned maintenance: Planned maintenance involves studying metrics like failure rates and historical
downtime and then scheduling maintenance tasks based around these predicted or measured failure rates or
downtime periods. In other words, since there is a specific time to perform maintenance on equipment, you
can schedule maintenance around the time when equipment is idle or producing at low capacity, rarely
interrupting production.
Additionally, planned maintenance lets you build up inventory ahead of scheduled maintenance. Since you
know when each piece of equipment is scheduled for maintenance, this buffer stock mitigates any drop in
production caused by the maintenance work.
Taking this proactive approach greatly reduces the amount of unplanned downtime by allowing for most
maintenance to be planned for times when machinery is not scheduled for production. It also lets you plan
inventory more thoroughly by giving you the ability to better control parts that are prone to wear and failure.
Other benefits include a gradual decrease in breakdowns, leading to increased uptime, and a reduction in capital
investment in equipment, since the existing equipment is used to its maximum potential.
4. Quality maintenance: All the maintenance planning and strategizing in the world is for naught if the
quality of the maintenance being performed is inadequate. The quality maintenance pillar focuses on building
error detection and prevention into the production process. It does this by using root cause
analysis (specifically the "5 Whys") to identify and eliminate recurring sources of defects. By proactively
detecting the source of errors or defects, processes become more reliable, producing products to the right
specifications the first time.
Possibly the biggest benefit of quality maintenance is that it prevents defective products from moving down the
line, where they could lead to a lot of rework. With targeted quality maintenance, quality issues are addressed and
permanent countermeasures are put in place, minimizing or completely eliminating defects and the downtime
related to defective products.
5. Early equipment management: The TPM pillar of early equipment management takes the practical
knowledge and overall understanding of manufacturing equipment acquired through total productive
maintenance and uses it to improve the design of new equipment. Designing equipment with the input of
people who use it most allows suppliers to improve maintainability and the way in which the machine
operates in future designs.
When discussing the design of equipment, it's important to talk about things like the ease of cleaning and
lubrication, accessibility of parts, ergonomically placing controls in a way that is comfortable for the operator,
how changeovers occur and safety features. Taking this approach increases efficiency even more because new
equipment already meets the desired specifications and has fewer startup issues, therefore reaching planned
performance levels quicker.
6. Training and education: Lack of knowledge about equipment can derail a TPM program. Training and
education apply to operators, managers and maintenance personnel. They are intended to ensure everyone is
on the same page with the TPM process and to address any knowledge gaps so TPM goals are achievable.
This is where operators learn skills to proactively maintain equipment and identify emerging problems. The
maintenance team learns how to implement a proactive and preventive maintenance schedule, and managers
become well-versed in TPM principles, employee development and coaching. Using tools like single-point
lessons posted on or near equipment can further help train operators on operating procedures.
7. Safety, health and environment: Maintaining a safe working environment means employees can perform
their tasks in a safe place without health risks. It's important to produce an environment that makes production
more efficient, but it should not be at the risk of an employee's safety and health. To achieve this, any
solutions introduced in the TPM process should always consider safety, health and the environment.
Aside from the obvious benefits, when employees come to work in a safe environment each day, their attitude
tends to be better, since they don't have to worry about this significant aspect. This can noticeably increase
productivity. Safety should receive particular attention during the early equipment management stage of the
TPM process.
8. TPM in administration: A good TPM program is only as good as the sum of its parts. Total productive
maintenance should look beyond the plant floor by addressing and eliminating areas of waste in administrative
functions. This means supporting production by improving things like order processing, procurement and
scheduling. Administrative functions are often the first step in the entire manufacturing process, so it's
important they are streamlined and waste-free. For example, if order-processing procedures become more
streamlined, then material gets to the plant floor quicker and with fewer errors, eliminating potential
downtime while missing parts are tracked down.
Now that you have an understanding of the foundation (5-S system) and pillars on which the TPM process is built,
let's take a look at how to implement a TPM program. This is generally done in five steps: identifying a pilot area,
restoring equipment to prime operating condition, measuring OEE, addressing and reducing major losses, and
implementing planned maintenance.
Step 1: Identify a Pilot Area
There are three common ways to choose the pilot equipment:
• What's the easiest to improve? Selecting the equipment that is easiest to improve gives you the chance for
immediate, positive results; however, it doesn't test the TPM process as strongly as the other two options.
• Where's the bottleneck? Choosing equipment where production is clearly being held up gives you an
immediate increase in total output and a quick payback. The downside is that using this equipment as a pilot
means putting a critical asset forward as the example, with the risk of it being offline longer than you would like.
• What's the most problematic? Fixing the equipment that gives operators the most trouble will be well received,
strengthening support for the TPM program. However, it doesn't give as much immediate payback as the
bottleneck approach, and a long-unsolved problem may not yield a quick result, which can lead to loss of interest.
If this is your first time implementing a TPM program, your best choice is typically the first approach: the equipment
that is easiest to improve. If you have some experience with total productive maintenance, you may choose to correct
the bottleneck instead. The risk is manageable because you can build up temporary stock or inventory beforehand, so
that downtime on the critical asset can be tolerated.
Include employees across all aspects of your business (operators, maintenance personnel, managers and
administration) in the pilot selection process. It's a good idea to use a visual like a project board where you can post
progress for all to see.
Step 2: Restore Equipment to Prime Operating Condition
The concept of restoring equipment to prime operating condition revolves around the 5-S system and autonomous
maintenance. First, TPM participants should learn to keep equipment continuously in its original condition using the
5-S system: organize, cleanliness, orderliness, standardize and sustain. This might include:
• Photographing the area and the current state of the equipment and posting the photos to your project board.
• Clearing the area by removing unused tools, debris and anything else that can be considered waste.
• Organizing the tools and components you use regularly (a shadow board with tool outlines is a popular option).
• Photographing the improved equipment and surrounding area and posting the photos to the project board.
• Creating a standardized 5-S work process to maintain continuity.
• Auditing the process with decreasing frequency (first daily, then weekly, etc.) to ensure the 5-S process is being
followed, and updating the process to keep it current and relevant.
Once you've established a baseline state of the equipment, you can implement the autonomous maintenance program
by training operators on how to clean equipment while inspecting it for wear and abnormalities. Creating an
autonomous maintenance program also means developing a standardized way to clean, inspect and lubricate
equipment correctly. Items to address during the planning period for the autonomous maintenance program include:
• Identifying and documenting inspection points, including parts that are subject to wear.
• Increasing visibility where possible to help with inspection while the machine is running (for example, replacing
opaque guarding with transparent guarding).
• Identifying and clearly labeling set points with their corresponding settings (labels showing the settings are
usually placed directly on the equipment).
• Identifying all lubrication points and scheduling lubrication during changeovers or planned downtime (consider
relocating hard-to-reach lubrication points that require stopping the machine to the outside of the equipment).
• Training operators to recognize emerging or potential issues so they can report them to the line supervisor.
• Auditing the process with decreasing frequency to ensure the checklist is being followed.
Step 3: Measure OEE
Since the biggest equipment losses result from unplanned downtime, it's important to categorize every unplanned
stoppage event. This gives you a more accurate picture of where stoppages occur. Include an "unknown" or
"unallocated" category for stoppage time with unknown causes.
It's recommended that you gather data for a minimum of two weeks to get an accurate representation of the unplanned
stoppage time and a clear picture of how small stops and slow cycles impact production. Below is a simplified
example of a top 5 loss chart. Each loss is categorized and is in descending order from the loss that causes the most
downtime to the loss that causes the least.
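The data behind a top 5 loss chart can be produced with a short Python sketch. It assumes the stop log is a list of (reason, minutes) pairs, with an empty reason meaning no cause was recorded. The function name and the stop reasons are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_losses(events: List[Tuple[str, float]], n: int = 5) -> List[Tuple[str, float]]:
    """Total unplanned stop minutes per reason and return the n largest, descending.

    Events without a recorded reason are booked to "Unallocated".
    """
    totals: Dict[str, float] = defaultdict(float)
    for reason, minutes in events:
        totals[reason or "Unallocated"] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up stop log: (reason, minutes); "" means no reason was recorded.
stop_log = [
    ("Jam at infeed", 12.0), ("Material shortage", 45.0), ("", 20.0),
    ("Jam at infeed", 18.0), ("Sensor blocked", 6.0), ("Motor fault", 50.0),
    ("Material shortage", 15.0), ("Label misfeed", 4.0),
]
print(top_losses(stop_log))
# [('Material shortage', 60.0), ('Motor fault', 50.0), ('Jam at infeed', 30.0), ('Unallocated', 20.0), ('Sensor blocked', 6.0)]
```

Keeping "Unallocated" as an explicit bucket shows how much stop time still lacks a cause. A large value there is itself a signal to improve stop reason tracking.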
Step 4: Address and Reduce Major Losses
• Select a loss based on OEE and stoppage time data. This should be the biggest source of unplanned stoppage time.
• Look into the symptoms of the problem(s). Collect detailed information on symptoms, such as observations,
physical evidence and photographs. Using a fishbone diagram to track symptoms and record information while
you're at the equipment is strongly recommended.
• With your team, identify potential causes of the problem(s), check them against the evidence you've gathered,
and brainstorm the most effective ways to solve the issue.
• Once the fix has been implemented, restart production and observe how effective the fix is over time. If it
resolves the issue, document the change and move on to the next cause of stoppage time. If not, gather more
information and hold another brainstorming session.
Step 5: Implement Planned Maintenance
Start by establishing proactive maintenance intervals. These intervals are not set in stone and can be updated as
needed. For components subject to wear or predictable failure, establish the current wear level and then a baseline
replacement interval. Once these have been determined, you can create a proactive replacement schedule for all
wear- and failure-prone components. When doing this, use "run time" as opposed to "calendar time." Finally,
develop a standardized process for creating work orders based on the planned maintenance schedule.
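The run-time-based schedule can be illustrated with a small Python sketch. Each component is assumed to carry a baseline replacement interval and a run-hour counter. The sketch lists work orders for components that will reach their interval within a chosen horizon of machine run time. The class, function and component names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    interval_run_hours: float            # baseline replacement interval (run time)
    run_hours_since_replacement: float   # accumulated machine run time, not calendar time


def due_work_orders(components: List[Component], horizon_run_hours: float) -> List[str]:
    """Return work-order texts for components that reach their replacement
    interval within the next `horizon_run_hours` of machine run time."""
    orders = []
    for c in components:
        remaining = c.interval_run_hours - c.run_hours_since_replacement
        if remaining <= horizon_run_hours:
            orders.append(f"Replace {c.name} (remaining: {max(remaining, 0):.0f} run h)")
    return orders


# Made-up components and run-hour counters.
components = [
    Component("Drive belt", 2000, 1950),
    Component("Cutter blade", 500, 120),
    Component("Gripper pads", 800, 820),   # already overdue
]
for order in due_work_orders(components, horizon_run_hours=100):
    print(order)
# Replace Drive belt (remaining: 50 run h)
# Replace Gripper pads (remaining: 0 run h)
```

Counting run hours rather than calendar days means a machine that sits idle does not accumulate "wear" on paper, so its parts are not replaced early.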
You can optimize maintenance intervals by designing a feedback system. Things like log sheets for each wear- and
failure-prone component where operators can record replacement information and component condition at the time of
replacement will be key. Additionally, conduct monthly planned maintenance audits to verify the maintenance
schedule is being followed and the component logs are being kept up to date. Review the logs' information to see if
adjustments to the maintenance schedule need to be made.
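The sketch below shows how component log sheets might feed the monthly review. It assumes each entry records the component name and one of three condition codes ("good", "worn", "failed") noted at replacement. The decision rule is a hypothetical example, not a rule from these notes:

- Any failure suggests shortening the interval.
- Parts that are always still good when replaced suggest lengthening it.

The output only flags components for review; the decision remains with the people reviewing the logs.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One log-sheet entry: (component name, condition recorded at replacement).
# The condition codes "good", "worn" and "failed" are assumed for illustration.
LogEntry = Tuple[str, str]


def review_intervals(log: List[LogEntry]) -> Dict[str, str]:
    """Suggest a review action for each component based on its replacement log."""
    conditions: Dict[str, Counter] = {}
    for name, condition in log:
        conditions.setdefault(name, Counter())[condition] += 1
    suggestions = {}
    for name, counts in conditions.items():
        if counts["failed"] > 0:
            # Failing before planned replacement: interval may be too long.
            suggestions[name] = "review: consider shortening the interval"
        elif counts["good"] == sum(counts.values()):
            # Always still good when replaced: interval may be too short.
            suggestions[name] = "review: consider lengthening the interval"
        else:
            suggestions[name] = "keep the interval"
    return suggestions


log_sheet = [
    ("Drive belt", "worn"), ("Drive belt", "worn"),
    ("Cutter blade", "worn"), ("Cutter blade", "failed"),
    ("Gripper pads", "good"), ("Gripper pads", "good"),
]
print(review_intervals(log_sheet))
```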
Quality maintenance should be introduced to the TPM process when significant issues about quality are
being raised by customers or employees.
The best time to use early equipment management is when new equipment is in the design phase or is being
installed.
Safety, health and environment should always be at the forefront of any process or program design. Use it in
tandem with the five-step implementation process.
TPM in administration should be addressed before you implement the final version of your planned
maintenance schedule. Issues in administration like work order delays, processing problems and part
procurement greatly delay the rest of the production process.