Is 4002 Maintainability Engineering

This document provides information about maintainability engineering. It discusses key concepts like maintenance, maintainability, and challenges in maintenance. The objectives of planned maintenance are to minimize breakdowns, keep plants in good working condition at lowest cost, ensure availability of machines, and meet production and safety requirements. Managing costs of aging assets, determining correct inspection and maintenance times, and getting the most from computerized maintenance systems are some challenges. Regular maintenance is important to operate machines at their best efficiency for long periods.

Uploaded by

raj Kumar

BHARATH NIKETAN ENGINEERING COLLEGE

Aundipatty – 625 536, Theni District.

COURSE : M.E

DEPARTMENT : INDUSTRIAL SAFETY ENGINEERING

SUBJECT CODE : IS 5091

SUBJECT NAME : MAINTAINABILITY ENGINEERING

SEMESTER & YEAR : 01 & 01

Other Resources

4. Mohamed Ben-Daya, Uday Kumar and D. N. Prabhakar Murthy, “Introduction to Maintenance Engineering: Modelling, Optimization and Management”, Wiley.
5. Salih O. Duffuaa and A. Raouf, “Planning and Control of Maintenance Systems: Modelling and Analysis”, Springer.

UNIT I MAINTENANCE CONCEPT


The word ‘maintenance’ does not simply mean repairs. Maintenance means keeping equipment in working order, not
only repairing it when it breaks down. It must be a regular and methodical process, with the emphasis on upkeep
rather than on repair.

Machinery and equipment must be lined and levelled, wearing surfaces must be examined and replaced, and oiling
schedules must be laid down and followed at regular intervals. A machine in good operating condition that is
subjected to regular inspection and adjustment will continue to produce quality products for a long time.

The technical meaning of maintenance involves functional checks, servicing, repairing or replacing of
necessary devices, equipment, machinery, building infrastructure, and supporting utilities in industrial, business,
governmental, and residential installations.

Maintenance functions are often referred to as maintenance, repair and overhaul (MRO); MRO is also used
to mean maintenance, repair and operations. Over time, the terminology of maintenance and MRO has become
more standardized. The United States Department of Defense uses the following definitions:

• Any activity, such as tests, measurements, replacements, adjustments, and repairs, intended to retain or
restore a functional unit in or to a specified state in which the unit can perform its required functions.
• All action taken to retain material in a serviceable condition or to restore it to serviceability. It
includes inspections, testing, servicing, classification as to serviceability, repair, rebuilding, and
reclamation.
• All supply and repair action taken to keep a force in condition to carry out its mission.
• The routine recurring work required to keep a facility (plant, building, structure, ground facility,
utility system, or other real property) in such condition that it may be continuously used, at its original or
designed capacity and efficiency for its intended purpose.

Maintenance is strictly connected to the utilization stage of the product or technical system, in which the concept
of maintainability must be included. In this scenario, maintainability is considered as the ability of an item, under
stated conditions of use, to be retained in or restored to a state in which it can perform its required functions, using
prescribed procedures and resources.

In some domains, such as aircraft maintenance, the terms maintenance, repair and overhaul also include inspection,
rebuilding, alteration and the supply of spare parts, accessories, raw materials, adhesives, sealants, coatings
and consumables for aircraft maintenance at the utilization stage. In international civil aviation, maintenance means:

The performance of tasks required to ensure the continuing airworthiness of an aircraft, including any one
or combination of overhaul, inspection, replacement, defect rectification, and the embodiment of a modification or
a repair.

Need for maintenance


A maintenance engineer should possess significant knowledge of statistics, probability and logistics, as well as
the fundamentals of the operation of the equipment and machinery he or she is responsible for. A
maintenance engineer should also possess strong interpersonal, communication, and management skills, as well as the
ability to make decisions quickly.

Typical responsibilities include:


• Assuring optimization of the maintenance organization structure
• Analysis of repetitive equipment failures
• Estimation of maintenance costs and evaluation of alternatives
• Forecasting of spare parts
• Assessing the need for equipment replacements and establishing replacement programs when due
• Application of scheduling and project management principles to replacement programs
• Assessing the tools and skills required for efficient maintenance of equipment
• Assessing required skills for maintenance personnel
• Reviewing personnel transfers to and from maintenance organizations
• Assessing and reporting safety hazards associated with maintenance of equipment

Objectives of Planned Maintenance Activity


To achieve minimum breakdown and to keep the plant in good working condition at the lowest possible cost.
To ensure the availability of machines and services in an optimum working condition.
To keep machines and other facilities in a condition to be used to achieve the maximum profit without any
interruption.
To keep the time schedule of delivery to customers.
To meet the availability requirements for critical equipment.
To keep the maintenance costs as low as possible for non-critical equipment.
To ensure effective and trained supervision.
To meet the quality requirements of the product.
To increase the profits of production systems.
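
The availability objectives above can be made measurable. The sketch below uses the standard definition of inherent availability, MTBF / (MTBF + MTTR), which is not stated in these notes; the function name and the figures are illustrative assumptions.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected fraction of time an item is operable: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical machine: fails on average every 450 h and takes 50 h to restore.
availability = inherent_availability(mtbf_hours=450.0, mttr_hours=50.0)
print(f"Inherent availability: {availability:.1%}")  # 90.0%
```

Shortening repair time (MTTR) or lengthening the time between failures (MTBF) both raise the figure, which is why planned maintenance targets both.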

Challenges in maintenance:-

Challenge 1: Managing the costs of aging assets

“As an asset begins to age, a significant challenge Maintenance Managers face is ensuring they spend the
optimum level of operating expenditure (OPEX), at the right time, to maintain the highest level of integrity and
reliability that ensures continuous production and the safe operation of the facility.

“Ultimately, efficiency requires spending the right amount of money to achieve this level of integrity
and reliability, which can be a difficult balance to achieve. Time after time, maintenance teams are found
to overspend their allocated maintenance budget because they don’t have a clear
understanding of the relative risk associated with the failure of their equipment.

“Overcoming this challenge requires not only assessing the risk associated with each element of
the plant, but also determining how these risk rankings come together to form the basis of the planned
maintenance and inspection programs.”

TIPS AND CONSIDERATIONS:

• Have a clear and well-understood RACI (Responsible, Accountable, Consulted and Informed)
program in place, so that the right level of management is making the decisions for expensive corrective
work.

• Aging assets all have repair histories that can be trended. Look at those trends to help determine the
next inspection or planned maintenance.

• Ask the manufacturers for a list of other companies that have the same equipment at a similar age
and in similar service, and then contact these companies for additional insight.

• Be aware that when the plant operates outside the stated integrity operating window (IOW), the inspection
and planned maintenance intervals no longer apply. Outside these IOWs, risk is usually heightened,
and inspection intervals need to be shortened.
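
As a minimal sketch of the risk ranking mentioned under Challenge 1, assets can be scored as likelihood of failure multiplied by consequence of failure and sorted. The 1 to 5 scales, the asset names and the scores are assumptions for illustration only; a real program would take them from the plant's own risk assessment.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    likelihood: int   # assumed scale: 1 (rare) to 5 (almost certain)
    consequence: int  # assumed scale: 1 (minor) to 5 (catastrophic)

    @property
    def risk_score(self) -> int:
        # Simple risk matrix product; other weightings are possible.
        return self.likelihood * self.consequence


def rank_by_risk(assets):
    """Return assets ordered from highest to lowest risk score."""
    return sorted(assets, key=lambda a: a.risk_score, reverse=True)


# Hypothetical assets and scores.
assets = [
    Asset("office HVAC unit", likelihood=3, consequence=1),
    Asset("cooling water pump", likelihood=4, consequence=3),
    Asset("storage tank", likelihood=2, consequence=5),
]

for asset in rank_by_risk(assets):
    print(f"{asset.name}: {asset.risk_score}")
```

The highest-ranked assets are the candidates for the shortest inspection intervals and the first share of the maintenance budget.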

Challenge 2: Knowing when to carry out inspections and maintenance

“Unexpected rotating or fixed equipment failure can result in significant production losses, or worse,
environmental or human losses for a company. Avoiding this type of failure is therefore at the top of the
priority list for Maintenance Managers.

“A best in class preventative maintenance and inspection program comes down to intervals. But when is the
right time to conduct routine maintenance? Is it when the manufacturer says it’s time, or when the plant
says it’s time?”

TIPS AND CONSIDERATIONS:

• Plan the maintenance based on risk of equipment failure

“While the recommended maintenance intervals provided by the OEM (original equipment
manufacturer) can sometimes work, an inspection and preventative maintenance program based on
risk should be more effective, resulting in a drop in unexpected failures and an increase in
uptime.”

• Ensure that the senior leaders of the facility take ownership and support your risk-based
program through funding and people.

“Planned maintenance programs are expensive and take time; many take as long as 3 to 5 years to design and
implement successfully. If senior management doesn’t have a long-term view and commitment of people
and resources, then the program is set to fail.”

• Invest time and effort to have an accurate and functional CMMS in place

“When a plant puts in place a fully functional and accurate CMMS, along with conducting the proper
criticality and risk assessments, surprises are minimized and many historical failures are not repeated
because the proper timing has been established for the inspections and maintenance.”

Challenge 3: Getting the most out of your CMMS

“The sole role of the CMMS (computerized maintenance management system) is to provide teams with
accurate and up-to-date information, to enable them to make the best decisions possible.”

There are typically two extremes of CMMS users:

1. The plant with missing, inaccurate and inconsistent data, which forces the maintenance team to
spend a great deal of time on hardcopy reports and spreadsheets. Reporting to management can also
be late, and data can be out of date.
2. The site whose CMMS is accurate, consistently well-populated and maintained, with clear ownership
and accountabilities. In this scenario, the trained maintenance team wastes no time on manual
reporting or planning; the CMMS can be accessed in real time and is used to gain insights that
make the team's work more effective and streamlined.

TIPS AND CONSIDERATIONS:

• Conduct a site-wide physical asset verification (PAV) to ensure the data in your CMMS is correct.

• Give ownership of the program to a single individual with senior management support and funding.

• Don’t accept one-off spreadsheets for reports. Always depend on the accuracy of the CMMS, and insist on
using that data.

• Invite the wider team to attend user group meetings hosted by the CMMS software provider, to learn about the
latest releases and to share experiences and lessons learned.

• Track success through appropriate KPIs, and always look for ways to improve.

Challenge 4: Shifting the balance from corrective to planned maintenance

“When a plant does nothing but corrective or reactive maintenance, the maintenance teams lose sight of
the bigger picture, which is to perform proper planned maintenance to avoid such breakdowns. These
unexpected shutdowns can often result in blown budgets and missed production targets.

“Frustratingly, it can seem almost impossible to get ahead of the backlog in order to become more
proactive and implement a planned and risk-based preventative maintenance program.

“But the words of Peter Drucker, the famous management consultant and author - ‘What gets measured gets
managed’ - are incredibly important when it comes to maintenance. Measurement is often the missing
piece of the puzzle that can slowly but surely help maintenance teams make that change.

“It’s like running long distance - if you don’t know what your time is each time, then you don’t know if
you’re getting better or worse.

“Continuous performance measurement is absolutely key to knowing whether the team is working on the right
maintenance, following the plan, and achieving the budget, and will help to move from a predominantly
corrective and reactive maintenance plan to a more planned and proactive one.”

TIPS AND CONSIDERATIONS:

• Risk assess all rotating and fixed plant equipment with solid risk-based inspection (RBI) and
reliability-centred maintenance (RCM) programs

“By doing this, the maintenance team is able to set the inspection and planned maintenance intervals
accordingly.”

• Track and trend all repair histories

“All Maintenance Managers know the bad actors: those assets that constantly break down or fail
unexpectedly, resulting in losses.”

• Focus on the most important maintenance KPIs (key performance indicators)

“For starters, consider tracking preventative versus corrective maintenance work orders, plotted against
downtime. The maintenance team will know the planned maintenance program is starting to work when the
balance of work shifts from corrective maintenance (CM) to preventative maintenance (PM) and stream factors increase.

“By measuring these KPIs, the team gets the full picture of maintenance effectiveness and can set challenging but
achievable goals to continue improving its performance.”


• Ensure data is being input into the CMMS correctly and consistently

“If the data in the CMMS is bad, then tracking of the KPIs becomes inaccurate. Take the time to prepare the
data behind the KPIs, so that the clearest possible view of performance is obtained.”
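
The PM versus CM indicator described above can be computed from a work order export. This is a sketch only: the "type" field and the "PM"/"CM" codes are assumed names, not the schema of any particular CMMS, and the monthly figures are invented.

```python
from collections import Counter


def planned_share(work_orders):
    """Return the fraction of PM/CM work orders that are preventive ("PM")."""
    counts = Counter(order["type"] for order in work_orders)
    total = counts["PM"] + counts["CM"]
    if total == 0:
        raise ValueError("no PM or CM work orders to measure")
    return counts["PM"] / total


# Hypothetical work orders for two months.
january = [{"type": "PM"}] * 3 + [{"type": "CM"}] * 7
june = [{"type": "PM"}] * 6 + [{"type": "CM"}] * 4

print(f"January planned share: {planned_share(january):.0%}")  # 30%
print(f"June planned share: {planned_share(june):.0%}")        # 60%
```

A rising planned share, read alongside downtime for the same periods, suggests the program is moving from reactive to planned work.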

Challenge 5: Creating a culture of teamwork and shared responsibility

“We all know it takes teamwork to effectively manage and run a group of assets, but a huge challenge
arises when maintenance teams work within embedded functional silos (e.g. maintenance, operations,
engineering) that reflect the historical culture of the plant.

“Many functional managers seem uninterested or frustrated when trying to combine forces or share
responsibility, and the result is siloed teams operating in isolation, which is not beneficial for the overall
business.”

TIPS AND CONSIDERATIONS:

• Every plant needs a rallying point to tear down silos. A reduction in OPEX or an increase in stream time usually
works.

• Take the time to understand the issues experienced by both silos when working together previously, and
construct joint sessions to work through those issues together.

• Change the way business is done by instituting new lines of communication and inviting members of all silos
to participate in necessary planning sessions.

• And most importantly, get senior-level buy-in by instituting a legitimate and generous recognition and rewards
program.

CHALLENGES IN MAINTENANCE
Maintenance managers perform a highly complex range of tasks. Both the productivity and generation of
value within asset intensive organizations depend on effective maintenance management programs (ensuring
the right things are done) as well as efficient maintenance delivery (ensuring things are done right first
time). As the leader of the maintenance organization, maintenance managers directly impact both aspects.
As such their role is complex and diverse, requiring continuous long and short-term trade-offs of production,
asset integrity, human resources, safety and commercial considerations, all of which impact on one or more
functional areas across the organization. This section discusses some of the challenges maintenance
managers face in performing their day-to-day job.

Forecast And Prediction

One of the challenges facing the modern maintenance manager is to increase the operational efficiency of
the organization and reduce unscheduled downtime by implementing maintenance management programs
that appropriately balance preventive, predictive, corrective and replacement options. Balancing these
options is required in order to maximize the productive utilization of the assets over the medium to long
term, while ensuring their integrity, at optimum cost.

This requires maintenance managers to have deep knowledge of the equipment and the facilities under their
care, the operational environment and load on these assets, manufacturer's recommendations, guarantees,
manuals and instructions. These need to be translated into effective maintenance strategies and detailed task
plans that enable forward planning of tasks such as cleaning, lubrication, inspection, overhaul etc., together
with the associated parts, services, tools etc. as well as end-of-life replacement planning and budgeting.

Contingency management

Despite the best forward planning, unplanned equipment failures are an ever-present reality, requiring
maintenance managers to also be prepared for appropriate contingency management when these do
happen. While the incorporation of an effective reactive maintenance response capability forms an integral
part of any well-balanced maintenance program, the challenge for maintenance managers is to identify
appropriate response / execution options and strategies.

In this regard, maintenance managers need to improvise and manage the speed of response including
logistical support operations (e.g. specialized equipment and services, long delivery parts etc.) necessary to
solve, as quickly as possible, faults and failures that may occur unexpectedly.

Cost planning and control

While it may in theory be possible to deploy a perfect technical maintenance program, maintenance
managers face the very real challenge of financial budget constraints that limit their planning options. As a
result, maintenance managers continually need to balance operational expenditures on technology, labour,
parts and service costs, while keeping in mind the impact of these decisions on the long-term productive
value of the installed asset base. This requires maintenance managers to consider all cost types, including
fixed, variable, direct and indirect costs as well as the corresponding depreciation and tax implications. In
this regard it is critically important for maintenance managers to understand the business context of the
assets under their care, and the commercial impact of maintenance related decisions and trade-offs, both in
the short as well as over the long term.

A particularly challenging area involves the optimization of MRO parts inventory and costs. On the one
hand, it is vital to have all the parts and materials necessary to carry out maintenance tasks, so that
urgent maintenance is not delayed. At the same time, it is also important not to carry unnecessary stock, as this
negatively impacts operating capital, stores infrastructure and labour costs, and over the longer term leads
to unnecessary write-offs due to losses, obsolescence or deterioration.
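
One common way to reason about this trade-off, offered here as an illustration and not as a method prescribed by these notes, is the classic economic order quantity (EOQ) model combined with a reorder point. All figures below are hypothetical, and the model assumes steady demand and fixed costs.

```python
import math


def economic_order_quantity(annual_demand, order_cost, holding_cost_per_unit):
    """Classic EOQ: order size that balances ordering cost against holding cost."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost_per_unit)


def reorder_point(daily_demand, lead_time_days, safety_stock=0):
    """Stock level at which a new order should be placed."""
    return daily_demand * lead_time_days + safety_stock


# Hypothetical spare part: 400 used per year, 25 per order placed, 2 per unit per year to hold.
print(economic_order_quantity(annual_demand=400, order_cost=25, holding_cost_per_unit=2))  # 100.0

# Hypothetical usage of 2 per day, 10-day delivery lead time, 5 kept as safety stock.
print(reorder_point(daily_demand=2, lead_time_days=10, safety_stock=5))  # 25
```

Long-delivery or critical parts usually justify a larger safety stock than the formula alone suggests, because the cost of a stock-out is lost production and not just a late order.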

Coordination of work teams

While maintenance has a primary focus on physical assets and the associated technological/ technical
elements thereof, maintenance execution is a people driven process that requires the right balance between
process, technology support and leadership. The planning and coordination of maintenance tasks requires a
high degree of people management, communication, negotiation and planning skills.

Knowledge, experience and preparation vary according to the profile of the maintenance organization. Thus,
collaborative work is of vital importance. It is important that the maintenance manager helps facilitate
communication channels and integrated work between different parts of the organization (e.g. engineering,
production, procurement, stores, safety department etc.) as well as within the maintenance organization
itself, including various disciplines, support workshops, service providers etc. Maintenance managers are
also expected to contribute / function as part of the senior management team, responsible for making key
business, operational and financial decisions.

Establishing a collaborative work model, in which information can be shared in real time about completed
tasks, breakdowns, procedures, expiration dates, etc., is one of the essential challenges of modern
maintenance management.

Effective time management

The coordination and monitoring of the maintenance work groups, the assignment of tasks, meetings with
vendors and distributors, answering questions, doubts and emails, drawing up work and maintenance
plans, and even reacting to all kinds of unforeseen situations are just some of the many tasks that
every maintenance manager must face every day.

The organization, prioritization and execution of each of these activities involve the handling of large
amounts of information, immediate access to documents, diagrams and photos, and the ability to share them
with the different members of the work team, facilitating the function of each one and guaranteeing the
quality and effectiveness of the work. All this makes the efficient management of time a daily
challenge for the maintenance manager.

TEROTECHNOLOGY
Terotechnology is the maintenance of assets in an optimal manner. It is the combination of management,
financial, engineering, and other practices applied to physical assets such as plant, machinery, equipment,
buildings and structures in pursuit of economic life cycle costs.
It is concerned with the reliability and maintainability of physical assets and also takes into account the
processes of installation, commissioning, operation, maintenance, modification and replacement.

Decisions are influenced by feedback on design, performance and costs information throughout the life
cycle of a project.
It can be applied equally to products, as the product of one organization often becomes the asset of another.
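
The idea of pursuing economic life cycle costs can be sketched as a present-value calculation: purchase price, plus discounted yearly operating and maintenance costs, plus discounted disposal cost. The function name, the cost breakdown and all figures are assumptions made for illustration; the notes do not give a formula.

```python
def life_cycle_cost(acquisition, annual_costs, disposal, discount_rate):
    """Present value of owning an asset over len(annual_costs) years."""
    # Yearly operating and maintenance costs, discounted from the end of each year.
    pv_running = sum(
        cost / (1 + discount_rate) ** year
        for year, cost in enumerate(annual_costs, start=1)
    )
    # Disposal is assumed to occur at the end of the final year.
    pv_disposal = disposal / (1 + discount_rate) ** len(annual_costs)
    return acquisition + pv_running + pv_disposal


# Hypothetical machine with three years of rising upkeep costs.
running = [10_000, 12_000, 15_000]

undiscounted = life_cycle_cost(100_000, running, disposal=5_000, discount_rate=0.0)
print(f"{undiscounted:,.0f}")  # 142,000

discounted = life_cycle_cost(100_000, running, disposal=5_000, discount_rate=0.08)
print(f"{discounted:,.0f}")
```

Comparing this figure for two candidate machines, and not only their purchase prices, is the kind of decision terotechnology is meant to support.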

MAINTENANCE COSTS
What Are Maintenance Costs?

The term maintenance expense refers to any cost incurred by an individual or business to keep their assets in
good working condition. These costs may cover general upkeep, such as running anti-virus
software on computer systems, or repairs, such as fixing a car or machinery. These expenses are
in addition to the actual purchase price of an asset, so individuals and companies should be able and willing to foot the
bill in order to keep their assets in running order.

Understanding Maintenance Costs


Consumers who purchase assets should expect to pay maintenance expenses at some point in the future if they
want to use them over a period of time. As mentioned earlier, these costs are incurred in order to keep an individual or
company's assets in good working order.

How much an individual pays in maintenance expenses depends on the type of asset and how often upkeep is
required and performed. Individuals may incur maintenance costs for homes, automobiles, appliances, and electronics,
while businesses pay for maintenance on their fixed assets—vehicles, equipment, facilities—and their technology.

Keeping up to date with regular maintenance can keep costs down because the asset is serviced on a timely basis.
Neglecting assets and waiting until the last minute to service them may result in higher maintenance costs. If the asset
isn't maintained at all, the owner may have to replace it altogether.

KEY TAKEAWAYS

• Maintenance expenses are necessary costs for upkeep, whether for a car, home, rental apartment, or
condominium.
• Neglecting regular maintenance, and not paying expenses for upkeep, may result in higher maintenance
costs and, even worse, replacement costs for the asset itself.
• Individuals pay for maintenance on things like homes, automobiles, and appliances, while companies pay for
upkeep on fixed assets and technology.

Special Considerations

Consumers should consider the initial price tag as well as the item's ongoing maintenance expenses when they
purchase an item that requires upkeep. This is why it's always a good idea for any consumer to set some
money aside for maintenance expenses. Failure to do so may result in financial distress when it comes time to
pay for these charges in the future.

Types of Maintenance Expenses


As mentioned above, maintenance expenses depend on the type of asset held. Maintenance expenses for
homes include lawn care, plumbing, electrical, and roof repairs as well as replacement of worn-out appliances.
Homeowners must also pay premiums for hazard insurance. This expense protects the owner from damage to
the home from natural events like severe storms, fires, tornados, and earthquakes.

Landlords and Tenants


Most of the maintenance expenses for a rental property are the landlord's responsibility. The costs of snow removal,
sewage, trash pickup and lawn care, as well as upkeep of sidewalks, windows and any other exterior items, fall to the
landlord. If the apartment or rental home is furnished, any replacement or repair of the furniture is the landlord’s
responsibility. Cleaning or replacement of any carpeting, as well as painting, is also paid for by the landlord.

Government regulations require landlords to maintain certain safety and living standards. For example, the heat in an
apartment building must meet minimum standards. The infrastructure, such as heating and ventilation, must be
adequately maintained by the landlord. Some of the upkeep and maintenance may fall on the tenant. The rental
agreement should define what expenses are the renter's responsibility.

Condo Fees
Monthly fees are common for people who own condominiums. Condo fees can range from $50 to $1,000 depending
on the property, building, and location. If the building has a concierge, swimming pool, tennis courts, or gym, those
costs are built into the monthly condo fee.

Buyers who want maintenance-free living should consider the monthly fees when calculating their affordability and
the potential mortgage payment for the condominium. If, for example, the mortgage payment is $1,500 per month
while the condo fee is $600 per month, the condo fee represents nearly 30% of the total monthly payments to live
there.
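
The "nearly 30%" figure above can be checked with a short calculation using the amounts from the example; the function name is an illustrative assumption.

```python
def fee_share(mortgage_payment, condo_fee):
    """Fraction of the total monthly payment that goes to the condo fee."""
    return condo_fee / (mortgage_payment + condo_fee)


# Figures from the example: 1,500 mortgage payment and 600 condo fee per month.
print(f"{fee_share(1500, 600):.1%}")  # 28.6%
```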

Example of Maintenance Expenses


Owning a vehicle requires regular maintenance: oil changes, tire rotation, engine flushes. Car owners can enjoy their
vehicles by keeping up to date with these expenses and paying for them on a timely and regular basis. People who don't
maintain their vehicles or wait too long may have to pay more in maintenance and may even have to pay
the replacement cost for a new vehicle.

SCOPE OF MAINTENANCE DEPARTMENT

Plant maintenance services attend to the maintenance of machines and equipment, which is needed because of
their frequent use and strategic position in the entire production function. A machine is
a mechanism consisting of a series of sequential components, each performing a specific function
that is part of the whole system or mechanism.

For any machine some of its parts are fixed while others are replaceable. Such equipment or mechanical
devices and their components require constant and continuous services such as cleaning, lubrication, repair
and replacements etc. so that their operational efficiency can be maintained.

Further, it may be noted that plant maintenance service is not confined to equipment and machines.

Under the wide spectrum of the plant maintenance service, the maintenance of buildings, power plant,
material handling equipment, heating and air conditioning equipment, waste disposal systems,
washrooms, water supply, jigs and fixtures, fire-fighting facilities etc. also needs attention. The activity of the
plant maintenance service also includes the provision of maintenance equipment and stock of repair parts
and maintenance materials.

Maintenance covers two broad categories of function as listed below:

(A) Primary Functions:


(i) Equipment inspection, cleaning and lubrication.
(ii) Alterations to existing equipment and buildings.
(iii) Maintenance of existing plant buildings and grounds.
(iv) Maintenance of existing plant and equipment.
(v) New installations of equipment and buildings.
(vi) Generation and distribution of utilities.

(B) Secondary functions:


(i) Property accounting.
(ii) Insurance administration against theft and fire etc.
(iii) Store keeping for maintenance purposes.
(iv) Plant protection against fire etc.
(v) Pollution and noise control.
(vi) Waste disposal.
(vii) Salvage.
(viii) Providing caretaker services.
(ix) Any other services concerning maintenance as delegated by plant management.

Types/Areas of Maintenance:
The major areas of maintenance are:
(1) Civil Maintenance:
Building construction and maintenance, maintaining service facilities such as water supply, steam, gas,
compressed air, heating and ventilating, air conditioning, plumbing, carpentry and painting work. Also
included in civil maintenance are fencing, landscaping, lawns, gardening and maintaining drainage and
fire-fighting equipment etc.

(2) Electrical Maintenance:
Maintaining electrical equipment such as generators, transformers, motors, switchgear, electrical
installations, lighting, fans and control panels etc.

(3) Mechanical Maintenance:
Maintaining machines and equipment, transport vehicles, compressors, furnaces, steam generators and
material handling equipment. Lubrication of machines is also part of mechanical maintenance work.

Fig. 34.1 illustrates the various types of maintenance. Basically, maintenance work can be planned or
unplanned. Planned maintenance is maintenance work organized and carried out with foresight, control and
records, to a predetermined plan. Unplanned maintenance is caused by breakdowns that have not been
foreseen.

(i) Planned Maintenance:


Planned maintenance is also known as scheduled maintenance or productive maintenance. The breakdown of a
machine or piece of equipment does not occur in a planned manner, but maintenance work can be planned well in
advance.

Planned maintenance involves the inspection of all plant, equipment, machinery and buildings according
to a pre-determined schedule in order to overhaul, service, lubricate or repair them before an actual breakdown in
service occurs. The purpose is to reduce machine stoppages due to sudden breakdowns requiring
emergency maintenance.

Features of Planned Maintenance:


(i) A maintenance work which is well organized and pre-planned.
(ii) It is carried out with prior planning, foresightedness controls and records.
(iii) It is carried out in a scientific way.
(iv) It covers comprehensive planning as well as the execution part for any job concerning maintenance.
(v) It is generally carried out according to some pre-defined maintenance programme.
(vi) It is applicable to any type of maintenance work such as corrective, preventive as well as replacement
work.
Basic Requirements of Planned Maintenance:
(i) A well-defined maintenance policy is to be followed.
(ii) The maintenance policy is planned in advance and then applied.
(iii) The maintenance work is to conform to the pre-decided maintenance plan.
(iv) Historical and statistical records which are compiled and maintained provide guidelines for future
maintenance policy.
In view of tough competition, manufacturing organizations want to manufacture products at the most
economical cost. The maintenance plan must therefore be well conceived and organized to achieve this basic
aim of any manufacturing enterprise. Not only the maintenance and production functions but also the
economic aspects should be taken into consideration.
Thus whenever a policy of planned maintenance is practised, the planning is mainly of a financial nature. It
has to be ensured that sufficient funds are available for providing the manpower, machines and other inputs
required for planned maintenance.

The following factors help in deciding the planned maintenance:

(i) What is to be maintained i.e. the individual item of the plant and equipment to be maintained.
(ii) The details of how each item is to be maintained i.e., method to be adopted.
(iii) What maintenance resources would be needed i.e., manpower, tools, spares and test equipment etc. to
carry out the maintenance work.
(iv) The frequency of carrying out maintenance inspection.
(v) The method of managing the maintenance operation.
(vi) The method of analysis, rectification and control, which must be pre-decided in order to evaluate the
performance of the maintenance system and to improve it where possible.
So it is the duty of the maintenance engineering department in a manufacturing enterprise to define all the
above-mentioned factors clearly.
This will form the basis and structure of a practical maintenance programme which must possess the
essential details regarding the following features:
(i) List of all the machines/equipment, plant item which require maintenance.
(ii) Comprehensive maintenance programme/schedule for each and every item needing maintenance.
(iii) A time table/programme of maintenance events when each work must be carried out.
(iv) A technique of ensuring that the maintenance work listed in the time table is carried out.
(v) A method of recording the results achieved and thus judging the implementation/effectiveness of the
maintenance programme.
Thus any such programme should be easy to operate and should need minimum manpower and paperwork
for recording, etc. But it must indicate the following aspects clearly:
(a) What requires maintenance or what is to be maintained?
(b) When/where it is to be maintained?
(c) How it is to be maintained?
(d) Who will do the maintenance work?
(e) Whether maintenance work is of desired level?
(ii) Unplanned Maintenance:
It is an operation/activity carried out without any prior planning. Generally it is very urgent in nature.

Maintenance operations of this type are required in the case of heavy and total breakdowns, which may
occur without any prior indication. Such breakdowns are generally harmful to the system and may even
cause loss of human life. In order to deal with such unwanted situations, provisions are made to provide
maintenance with prior planning, preparation and scheduling, etc.

Thus in most cases unplanned maintenance is of an emergency nature, because the recovery time is the
most important factor in minimizing the consequences of a serious breakdown. Examples of such failures or
breakdowns are the bursting of boilers or the failure of pipelines carrying
fluids/gases.

Emergency Maintenance:
In reality emergency maintenance is a special type of unplanned maintenance operation which is performed
with prior planning. It is necessary to implement immediately in order to avoid serious consequences of a
heavy breakdown. Heavy loss of production, heavy maintenance cost and sometimes even loss of human life
are the serious consequences.

Thus emergency maintenance may be defined as a sort of unorganized maintenance activity which should be
executed using only the available resources in the minimum possible time. Emergency maintenance is
essential to minimize the time delays as well as the heavy production losses caused by serious breakdowns or
unpredictable failures.

Difference between Breakdown and Emergency Maintenance


Emergency Maintenance:
1. It is always unplanned.

2. The nature of failure is very serious.
3. No time lag is allowable.
4. Recovery time is given top priority.
5. Generally occurs in pressure vessels such as boilers or turbines where the risk involved is very high.
6. Implementation is very urgent and corrective in nature with available resources.
7. The delay in implementation may be serious.
Breakdown Maintenance:
1. It may be planned or unplanned.
2. The nature of failure is normal or not very serious.
3. Permissible time lag may be allowed.
4. Maintenance cost is the first priority.
5. Generally occurs in general engineering work, where the delay in timely repair is not very risky.
6. Implementation is not very urgent but corrective in nature.
7. The effect of delay in implementation is not very serious.
Economic Aspects of Maintenance:
The main goal/objective of a properly run maintenance department is to make the plant, equipment
and machinery available for productive utilization during the scheduled hours, operating to pre-decided
standards with the minimum possible waste and the minimum total cost.

The total cost is the sum of the maintenance labour cost, the material cost, and the cost of production lost
due to non-availability of productive equipment/machinery or their reduced operational efficiency due to
lack of maintenance.

Maintenance is thus a service which has economic value to the production process. When this value is
calculated and expressed in quantitative terms, then only the comparison of cost effectiveness of various
maintenance policies is possible.

Fig. 34.3 illustrates the total maintenance cost, which has been optimized by equating various direct as well
as indirect maintenance costs. There are various mathematical relations to evaluate maintenance
performance in numerical terms, either as a single overall factor or as a series of factors.

A simple and very important efficiency index may be expressed by

E = K / (m·Tm + n·t + C·W)

where
E = maintenance efficiency index.
K = a constant such that the value of the expression is 100 for the base year.
m = total cost of maintenance in the base year.
n = total cost of lost time due to maintenance in the base year.
C = total cost of waste material (scrap) in the base year.
Tm = total cost of maintenance expressed as a percentage of the replacement value of the plant and
equipment.
t = downtime due to maintenance operations expressed as a percentage of the scheduled production hours.
W = material wastage due to maintenance operations expressed as a percentage of total output at that stage
of the process.
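
As an illustration of how this index could be computed, the following minimal Python sketch evaluates E from the quantities defined above. The function names and all figures are hypothetical and chosen only for the example; K is calibrated so that the base year gives E = 100, as the definition of K requires.

```python
def index_denominator(m, Tm, n, t, C, W):
    """Weighted sum m*Tm + n*t + C*W used in the efficiency index."""
    return m * Tm + n * t + C * W


def calibrate_K(m, Tm, n, t, C, W, base_value=100.0):
    """Choose K so that the index equals base_value in the base year."""
    return base_value * index_denominator(m, Tm, n, t, C, W)


def efficiency_index(K, m, Tm, n, t, C, W):
    """Maintenance efficiency index E = K / (m*Tm + n*t + C*W)."""
    return K / index_denominator(m, Tm, n, t, C, W)


# Hypothetical base-year cost weights (m, n, C) and percentages (Tm, t, W).
m, n, C = 50_000.0, 20_000.0, 10_000.0
K = calibrate_K(m, Tm=4.0, n=n, t=5.0, C=C, W=2.0)

base_year = efficiency_index(K, m, 4.0, n, 5.0, C, 2.0)
# A later year with lower maintenance cost, downtime and wastage percentages.
later_year = efficiency_index(K, m, 3.5, n, 4.0, C, 1.5)
print(round(base_year, 2), round(later_year, 2))
```

Because the percentages appear in the denominator, a year with lower maintenance cost, downtime or wastage yields an index above 100, which is how the index signals improvement relative to the base year.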
In view of the above discussion it can be said that the extent of maintenance should be such that it does not
become uneconomical. For this purpose the cost of breakdown and the cost of maintenance are examined. It
is a fact that when the cost of maintenance increases, the cost of breakdown and repairs decreases.
As shown in Fig. 34.3 after a certain stage any increase in the maintenance expenditure becomes
uneconomical. Fig. 34.3 shows the point of optimum availability of plant and equipment in desired operating
condition and the optimum maintenance cost.

UNIT II

MAINTENANCE MODELS

Proactive maintenance

Proactive maintenance is the maintenance philosophy that supplants “failure reactive” with “failure proactive” by
activities that avoid the underlying conditions that lead to machine faults and degradation.
Unlike predictive or preventive maintenance, proactive maintenance commissions corrective actions aimed at failure
root causes, not failure symptoms. Its central theme is to extend the life of machinery as opposed to

1. making repairs when often nothing is wrong,


2. accommodating failure as routine or normal, or
3. detecting impending failure conditions followed by remediation.
Proactive maintenance depends on rigorous machine inspection and condition monitoring. In mechanical machinery it
seeks to detect and eradicate failure root causes such as wrong lubricant, degraded lubricant, contaminated lubricant,
botched repair, misalignment, unbalance and operator error.
Reactive Maintenance
Reactive maintenance (also known as breakdown maintenance) refers to repairs that are done when
equipment has already broken down, in order to restore the equipment to its normal operating condition.

While reactive maintenance can have a place in a well-rounded maintenance strategy, it shouldn’t be your
go-to for all repairs.

Advantages of reactive maintenance


Generally speaking, it takes less time and money to do nothing than it does to do something, and this holds true when
it comes to reactive maintenance. There is no initial cost associated with reactive maintenance, and it requires far less
planning than preventive maintenance, for instance. But this is a very shortsighted approach, and relying exclusively
on reactive maintenance in your facility is not sustainable for the long term.
Disadvantages of reactive maintenance
1) More expensive
Unexpected downtime during production runs can result in late orders, damaged reputations and impacted revenue. On
top of that, the unpredictable nature of reactive maintenance means that labour and spare parts may not be readily
available so organizations can end up paying a premium for emergency parts shipping, travel time, and after-hours
support.
2) Shorter asset life expectancy
Reactive maintenance does not keep systems running in optimal “as new” condition. In a lot of cases, you’re doing
just enough to get a machine up and running again, and over time, systems that have been patched again and again
deteriorate faster and don’t maximize their initial capital cost investment.
3) Safety issues
When work is scheduled, technicians have time to review the standard procedures and safety requirements to complete
the job correctly. Technicians tend to take more risks when maintenance work is reactive because they’re under
pressure to get systems running without delay.
4) Inefficient use of time
While planned maintenance can be included in a production schedule, reactive maintenance tends to catch you
unawares, and technicians spend time running around looking for the correct manuals and schematics, ordering the
right parts and trying to diagnose and fix the issue.
5) Bad for backlog

Emergency repairs are usually prioritized at the expense of planned work, which may be pushed or cancelled
completely. This can lead to maintenance backlog which is really hard to get on top of once it starts to pile up.

6) Higher energy costs


When equipment is not properly maintained, it uses more energy. Doing simple things like greasing moving parts or
changing filters can reduce energy consumption by 15%.

IMPERFECT REPAIR OR IMPERFECT MAINTENANCE: A maintenance action that does not make a
system as good as new, but makes it younger. Usually, it is assumed that imperfect maintenance restores the
system operating state to somewhere between as good as new and as bad as old. Clearly, imperfect repair
(maintenance) is a general repair (maintenance) which can include two extreme cases: minimal and perfect
repair (maintenance). Engine tune-up is an example of imperfect maintenance because an engine tune-up
may not make an engine as good as new but its performance might be greatly improved.

Worse repair or maintenance: a maintenance action which makes the system failure rate or actual age
increase, although the system does not break down. Thus, upon worse repair the system’s operating
condition becomes worse than it was just prior to its failure.
Worst repair or maintenance: a maintenance action which unintentionally makes the system fail or break
down.
Some possible causes of imperfect, worse or worst maintenance attributable to the maintenance performer
are:
(i) Repairing the wrong part.
(ii) Only partially repairing the faulty part.
(iii) Repairing (partially or completely) the faulty part but damaging adjacent parts.
(iv) Incorrectly assessing the condition of the unit inspected.
(v) Performing the maintenance action not when called for but at the performer’s convenience (the timing
of maintenance is off schedule).

Various methods have been proposed for modeling imperfect, worse and worst maintenance. It is useful to
summarize these methods, because they can be applied in various maintenance and inspection policies and
so help to rectify imperfect, worse and worst maintenance.

Imperfect maintenance models for various policies:


Age-dependent PM policy:
In the age-dependent PM (preventive maintenance) model, a unit is preventively maintained at a
predetermined age T, or repaired at failure, whichever comes first. For this policy there are various
imperfect maintenance models, depending on whether PM, CM (corrective maintenance), or both are
imperfect. One of the pioneering imperfect maintenance models for the age-dependent PM
policy is due to Nakagawa (1979a). He considers three age-dependent PM models with imperfect PM and
perfect or minimal repair at failure. He derives the expected maintenance cost rate and discusses the optimal
maintenance policies in terms of the PM interval time T.
Periodic PM policy:
In the periodic PM policy, a unit is preventively maintained at fixed time intervals and repaired at
intervening failures.
Failure limit policy: This policy assumes that PM is performed only when the failure rate or reliability of a
system reaches a predetermined level.
Sequential PM policy: Under a sequential PM policy, PM is done at intervals x_k, where x_k ≤ x_(k-1) for
k = 2, 3, . . . This policy is very practical because most systems need maintenance more frequently
as their age increases.
Repair limit policy: When a system fails, the repair cost is estimated and repair is undertaken if the
estimated cost is less than a predetermined limit; otherwise, the system is replaced. This is called the repair
cost limit policy.
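
The repair cost limit rule can be written as a one-line decision. The short Python sketch below is only an illustration; the function name, the non-negativity check and the cost figures are assumptions made for the example.

```python
def repair_or_replace(estimated_repair_cost, repair_cost_limit):
    """Repair cost limit policy: repair if the estimate is below the limit,
    otherwise replace the system."""
    if estimated_repair_cost < 0 or repair_cost_limit < 0:
        raise ValueError("costs must be non-negative")
    if estimated_repair_cost < repair_cost_limit:
        return "repair"
    return "replace"


# Hypothetical failures of the same system with a limit of 5000 cost units.
decisions = [repair_or_replace(cost, 5000) for cost in (1200, 4999, 5000, 8000)]
print(decisions)
```

Note that an estimate exactly equal to the limit leads to replacement here, because the policy as stated repairs only when the estimate is strictly less than the limit.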

MAINTENANCE POLICIES:-

The maintenance policy of a productive system provides specific answers to problems concerned with the
selection of specific component parts of a system for maintenance, the decision regarding the specific forms of
maintenance to be used, a choice between internal and external maintenance and a further choice between centralized
and decentralized maintenance in the case of internal maintenance.

Moreover, not all items are influenced/controlled by preventive maintenance. For example, if an item shows
time-independent, i.e. negative exponential, failure behaviour, then the cause of failure is external to the item;
hence, no amount of preventive replacement is going to serve the intended purpose.

Preventive maintenance policy is appropriate for items that wear out with time due to use, i.e. for items that
show a normal failure mode.

Furthermore, such a policy may be useful only if the costs of preventive maintenance are significantly lower
than those of the breakdown maintenance replacement which means that the item should be simply replaceable item
and not a complex one for replacement operation.

Of course, preventive replacement cannot be rejected outright for a complex part but the “cost-cum-safety”
factors have to be taken into consideration while deciding a maintenance policy.

Primarily, maintenance policy must answer the questions of the extent of activities and the size of the
maintenance department. As far as the extent of activities is concerned, practices vary across companies. Small
enterprises, for example, use the maintenance department for simple repair and replacement.

A major non-production engineering job in these plants, such as an addition to or construction of a new building,
is handled by outside experts with only token aid from the plant’s maintenance department. The reverse is the case
when large companies/plants are involved, since they have their own more specialized staff for all major non-
production engineering jobs.

With regard to equipment maintenance two practices are commonly followed. One practice is to have a well-
planned and organized maintenance programme formulated to secure maximum life and utilization of machinery. The
second practice is to adopt a policy of minimum maintenance and maximum wear. This practice is more economical in
view of the fact that the equipment is usually superseded before it wears out.

As far as the size of the maintenance department is concerned, manufacturing enterprises tend to have large
maintenance crews in order to solve their breakdown problems at a moment’s notice. It is for the management of the
enterprise to strike a balance between prompt and delayed maintenance services to be provided.

Procedure to Select Effective Maintenance Policy:


The following procedure would be helpful in selection of an appropriate maintenance policy:

(1) First identify the policies which are effective.


(2) Then decide the most desirable policy.
The choice is related with minimum cost and safety criteria.

PREVENTIVE VS BREAKDOWN MAINTENANCE:-

Preventive maintenance identifies any issues before equipment failure or downtime, through routinely scheduled
maintenance. Breakdown maintenance works by running equipment until it breaks down, in which case repairs and
maintenance are performed.

Preventive maintenance operates based on a schedule, where maintenance tasks are completed at specific intervals
prior to downtime events. This is because the goal of preventive maintenance is to maximize the lifespan and runtime
of equipment.

Breakdown maintenance is somewhat specific because it’s not applicable to many pieces of equipment. For example,
it is not a suitable maintenance strategy for anything involved in human safety and health, nor is it a good strategy for
critical or central pieces of equipment.
However, it works well with things that are designed to be used until they’re inoperable. This can include everything
from light bulbs to residential water heaters.

Even though a water heater may be considered a critical piece of equipment, the time spent performing PM on a
water heater, which includes disrupting the resident every X months, is probably more intrusive than fixing a broken
system every decade or so. This shows that breakdown maintenance is applicable for critical pieces of equipment in
certain cases, especially in the property management space.

Preventive maintenance, on the other hand, is a solid maintenance plan for almost all pieces of equipment in a factory
setting. In a residential setting, however, it only makes sense to perform PMs on equipment in non-living areas.

Differences between preventive and breakdown maintenance

Definition
Preventive maintenance: Preventive maintenance (PM) is work that is scheduled based on calendar time, asset
runtime, or some other period of time.
Breakdown maintenance: Breakdown maintenance (BM) is work that is only performed when a piece of equipment
breaks down or has a downtime event.

Trigger
Preventive maintenance: Time
Breakdown maintenance: Downtime event

Cost
Preventive maintenance: Low
Breakdown maintenance: Low

Cost Savings
Preventive maintenance: 12% to 18% [1]
Breakdown maintenance: Dependent on equipment and breakdown maintenance plan

Resources Needed
Preventive maintenance:
-Maintenance software for scheduling
-Maintenance scheduler (for larger organizations)
-Preventive maintenance checklists
Breakdown maintenance:
-Maintenance software for downtime triggers
-Necessary replacement equipment

Pros
Preventive maintenance:
-Extends the lifetime of assets
-Optimizes planning of maintenance and resources
Breakdown maintenance:
-Lowers overall costs of non-critical manufacturing equipment
-Minimizes preventive maintenance costs on nonessential equipment

Cons
Preventive maintenance:
-Can be expensive to keep up over the long term
-Labor intensive due to constant maintenance tasks
Breakdown maintenance:
-Can’t be used for many types of equipment, especially safety equipment
-Requires careful planning and execution to work effectively

Use Case
Preventive maintenance: An organization wants to decrease unplanned downtime and emergency maintenance but
does not have a large maintenance budget. As a solution, they implement a PM program for select assets. Work
orders are scheduled for inspections, lubrication, filter replacements, and parts replacements based on
recommendations from OEMs.
Breakdown maintenance: An organization wants to lower the cost of constantly replacing a variety of light bulbs in
a facility. Instead of replacing them at designated intervals, the organization decides to adopt a breakdown
maintenance plan, only replacing light bulbs when they are completely burned out. This saves time and reduces the
overall cost of buying light bulbs as the necessary amount of spares is lower.

Preventive Maintenance Workflow

Overview
Preventive maintenance, also spelled preventative maintenance, is carried out with the goal of increasing asset lifetime
by preventing excess depreciation and impairment or untimely breakdown. This maintenance includes, but is not
limited to, adjustments, cleaning, lubrication, repairs, and parts replacements.

Due to the unique needs of different assets, the type and amount of preventive maintenance required varies. Because
of this, it can be challenging to establish a successful preventive maintenance program. However, a good rule of
thumb is to start with a time-based PM program.


Types of preventive maintenance


Any maintenance that is not reactive maintenance is preventive maintenance. And there are many different types of
preventive maintenance that require different types of technology and expertise.

Four common types of preventive maintenance (PM) include:

Calendar-based maintenance
A recurring work order is scheduled for when a specified time interval is reached in the computerized maintenance
management system (CMMS).
Usage-based maintenance
Meter readings are used and logged in the CMMS. When a specific unit is reached, a work order is created for routine
maintenance.
Predictive maintenance
When work order data is logged in the CMMS, maintenance managers can predict when an asset will crash based on
historical events and create specific PMs to prevent them from happening again.

Prescriptive maintenance
This is similar to predictive maintenance, but instead of only the maintenance manager prescribing PMs, machine
learning software assists them.
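
The first two types reduce to simple trigger rules. The Python sketch below shows one way such rules might be expressed; the function names and figures are hypothetical and do not correspond to any particular CMMS product.

```python
from datetime import date, timedelta


def calendar_pm_due(last_done: date, interval_days: int, today: date) -> bool:
    """Calendar-based trigger: due once the time interval has elapsed."""
    return today >= last_done + timedelta(days=interval_days)


def usage_pm_due(meter_reading: float, reading_at_last_service: float,
                 interval_units: float) -> bool:
    """Usage-based trigger: due once the meter has advanced by the interval
    (for example running hours or kilometres)."""
    return meter_reading - reading_at_last_service >= interval_units


# Hypothetical asset: serviced on 1 January 2024 with a 30-day interval,
# and at 1000 running hours with a 250-hour interval.
due_by_calendar = calendar_pm_due(date(2024, 1, 1), 30, date(2024, 1, 31))
due_by_usage = usage_pm_due(1249.0, 1000.0, 250.0)
print(due_by_calendar, due_by_usage)
```

In practice a work order would be raised when either rule returns true, which is why many programmes combine a calendar interval with a meter interval for the same asset.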
How preventive maintenance decreases downtime
Think about it in simple terms such as with your car. Oil changes and regular servicing are part of a preventive
maintenance schedule that ensures your car runs properly and without unexpected failure. If you ignore that
maintenance schedule and miss service intervals, your car will depreciate in value and utility. The same goes for
machinery in manufacturing plants and equipment in facilities.

With a PM schedule in place, maintenance managers can decrease downtime. This schedule is usually automated with
a CMMS that comes with PM scheduling software. However, managers are always cautious of over-maintaining
assets. There’s a point where preventive maintenance starts costing too much in relation to the amount of downtime it
prevents.

Examples of preventive maintenance


Some aspects of a solid preventive maintenance program are obvious. Production line equipment should be suitably
maintained to prevent breakdown, and infrastructure elements such as heating, ventilation, and air conditioning
(HVAC) should be routinely inspected, cleaned, and updated as required. However, there may be other systems that
also need routine maintenance to prevent failure.

How about your water systems? Do you have appropriate filtration? Are you running warm water systems that may be
a breeding area for serious bacterial infections such as Legionnaires’ disease? How about your electrical systems and
the need to ensure that they not only comply with legislation but do not degrade over time? Doors, stairways, lighting,
and flooring all need periodic inspection and maintenance, too.

The list of what needs to be included in your preventive maintenance plan can be bewildering, but there are certain
guidelines that give you at least a basis to conform to. The American National Standards Institute (ANSI) carries a
lot of information on preventive maintenance and is a good place to start if you are unsure as to the extent of the
program that you need.

Benefits of preventive maintenance


There are more benefits of implementing a preventive maintenance program than merely reducing the amount of
unplanned downtime. Other benefits include:

Perhaps the greatest benefit is increased safety, especially for a company that owns heavy machinery. The
price of employee safety is never too high and organizations such as the Occupational Safety and Health
Administration (OSHA) rigorously enforce government policy.

INSPECTION MODELS

The basic purpose behind an inspection is to determine the state of the equipment. Once indicators,
such as bearing wear, gauge readings, and quality of the product, which are used to describe the state, have
been specified, and the inspection made to determine the values of these indicators, some further
maintenance action may be taken, depending on the state identified. When the inspection should take place
ought to be influenced by the costs of the inspection (which will be related to the indicators used to describe
the state of the equipment) and the benefits of the inspection, such as detection and correction of minor
defects before major breakdown occurs.

Three classes of inspection problems are examined in this chapter:

1. Inspection frequencies: for equipment that is in continuous operation and subject to breakdown
2. Inspection intervals: for equipment used only in emergency conditions (failure-finding intervals)
3. Condition monitoring (CM) of equipment: optimizing condition-based maintenance (CBM)
decisions.

3.2 OPTIMAL INSPECTION FREQUENCY:

MAXIMIZATION OF PROFIT

3.2.1 Statement of the problem

Equipment breaks down from time to time, requiring materials and tradespeople to repair it. Also,
while the equipment is being repaired, there is a loss in production output. To reduce the number of
breakdowns, we can periodically inspect the equipment and rectify any minor defects that may otherwise
eventually cause complete breakdown. These inspections cost money in terms of materials, wages, and loss
of production due to scheduled downtime.

What we want to determine is an inspection policy that will give us the correct balance between the
number of inspections and the resulting output, such that the profit per unit time from the equipment is
maximized over a long period.

Such a system is depicted in Figure 3.2, in which it is seen that the complex system can fail for many
reasons, such as that caused by component 1, component 2, and so on. Each of these causes of equipment
failure could have its own independent failure distribution. Of course, it does not need to be a physical
component that causes the equipment to cease functioning; it could well be a software problem that is the
cause (mode) of equipment failure. Clearly, as the frequency or intensity of inspections increases, there is an
expectation that the frequency of equipment/system failures will be reduced. The challenge is to identify the
optimal frequency/intensity.

3.2.2 Construction of the model

1. Equipment failures occur according to the exponential distribution with mean time to failure
(MTTF) = 1/λ, where λ is the mean arrival rate of failures. (For example, if the MTTF = 0.5 year, then the
mean number of failures per year = 1/0.5 = 2, i.e., λ = 2.) Note that it is not unreasonable to make this
exponential assumption for complex equipment (Drenick 1960).

2. Repair times are exponentially distributed with a mean time of 1/μ.

3. The inspection policy is to perform n inspections per unit time. Inspection times are exponentially
distributed with a mean time of 1/i.

4. The value of the output in an uninterrupted unit of time has a profit value V (e.g., selling price less
material cost less production cost). That is, V is the profit value per unit time if there are no downtime losses.

5. The average cost of inspection per uninterrupted unit of time is I.

6. The average cost of repairs per uninterrupted unit of time is R. Note that I and R are the costs that
would be incurred if inspection or repair lasted the whole unit of time. Thus, the actual costs of inspection
and repair incurred per unit time will be proportions of I and R, respectively.

7. The breakdown rate of the equipment, λ, is a function of n, the frequency of inspection per unit
time. That is, the breakdowns can be influenced by the number of inspections; therefore, λ ≡ λ(n), as
illustrated in Figure 3.3.

In Figure 3.3, λ(0) is the breakdown rate if no inspection is made, and λ(1) is the breakdown rate if
one inspection is made per unit time. Thus, from the figure, it can be seen that the effect of performing
inspections is to increase the MTTF of the equipment.

8. The objective is to choose n to maximize the expected profit per unit time from operating the
equipment. The basic conflicts are illustrated in Figure 3.4.

The profit per unit time from operating the equipment will be a function of the number of
inspections. Therefore, denoting profit per unit time by P(n),

P(n) = value of output per uninterrupted unit of time
     - output value lost due to repairs per unit time
     - output value lost due to inspections per unit time
     - cost of repairs per unit time
     - cost of inspections per unit time

Output value lost due to repairs per unit time
     = value of output per uninterrupted unit of time × number of repairs per unit time × mean time to repair
     = Vλ(n)/μ

Note that λ(n)/μ is the proportion of unit time that a job spends being repaired.

Output value lost due to inspections per unit time
     = value of output per uninterrupted unit of time × number of inspections per unit time × mean time to inspect
     = Vn/i

Cost of repairs per unit time
     = cost of repairs per uninterrupted unit of time × number of repairs per unit time × mean time to repair
     = Rλ(n)/μ

By the same reasoning,

Cost of inspections per unit time
     = cost of inspection per uninterrupted unit of time × number of inspections per unit time × mean time to inspect
     = In/i

Collecting the terms gives the model relating inspection frequency n to profit per unit time:

P(n) = V - Vλ(n)/μ - Vn/i - Rλ(n)/μ - In/i          (3.1)

Substitution of n = 3 into Equation 3.1 will, of course, give the expected profit per unit time resulting
from this policy. Insertion of other values of n into Equation 3.1 will give the expected profit resulting from
other inspection policies. Comparisons can be made with the savings of the optimal policy over other
possibilities, and over the policy currently adopted for the equipment.
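
To make the comparison of inspection policies concrete, the following Python sketch evaluates the profit per unit time, built from the five terms listed in the construction of the model, for a range of inspection frequencies. The form λ(n) = k/n and every numerical value are assumptions chosen purely for illustration; they are not taken from this text, and a real study would estimate λ(n) from failure data.

```python
def breakdown_rate(n, k=3.0):
    """Assumed breakdown rate lambda(n) = k / n (hypothetical form, n > 0)."""
    return k / n


def profit_per_unit_time(n, V, R, I, mu, i, k=3.0):
    """P(n) = V - V*lam/mu - V*n/i - R*lam/mu - I*n/i."""
    lam = breakdown_rate(n, k)
    repair_fraction = lam / mu    # share of unit time spent under repair
    inspect_fraction = n / i      # share of unit time spent under inspection
    return (V
            - V * repair_fraction
            - V * inspect_fraction
            - R * repair_fraction
            - I * inspect_fraction)


def best_inspection_frequency(V, R, I, mu, i, k=3.0, n_max=50):
    """Search integer frequencies 1..n_max for the highest profit."""
    return max(range(1, n_max + 1),
               key=lambda n: profit_per_unit_time(n, V, R, I, mu, i, k))


# Hypothetical data, unit of time = one month:
# mean repair time 1/mu = 1/30 month, mean inspection time 1/i = 1/90 month.
params = dict(V=30_000.0, R=250.0, I=125.0, mu=30.0, i=90.0, k=3.0)
n_best = best_inspection_frequency(**params)
print(n_best, round(profit_per_unit_time(n_best, **params), 2))
```

A direct search over integer values of n is used instead of a closed-form optimum so that the same code works if a different λ(n) is substituted in `breakdown_rate`.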

3.3 OPTIMAL INSPECTION FREQUENCY:

MINIMIZATION OF DOWNTIME

3.3.1 Statement of the problem

The problem of this section is analogous to that of Section 3.2.1: equipment breaks down from time
to time, and to reduce the breakdowns, inspections and consequent minor modifications can be made. The
decision now, however, is to determine the inspection policy that minimizes the total downtime per unit time
incurred due to breakdowns and inspections, rather than to determine the policy that maximizes profit per
unit time. Figure 3.6 illustrates the problem.

3.3.2 Construction of the model

1. f(t), λ(n), n, 1/μ, and 1/i are defined in Section 3.2.2.

2. The objective is to choose n to minimize total downtime per unit time. The total downtime per unit
time will be a function of the inspection frequency, n, denoted as D(n). Therefore,

D(n) = downtime incurred due to repairs per unit time + downtime incurred due to inspections per unit time

= λ(n)/μ + n/i     (3.4)

Equation 3.4 is a model of the problem relating inspection frequency n to total downtime D(n).
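
Equation 3.4 can be explored with a short sketch. The form of λ(n) is not given in this section, so the code assumes λ(n) = k/n, in which case D(n) = k/(nμ) + n/i has the closed-form minimiser n* = sqrt(k·(1/μ)/(1/i)). The function names and the numerical values are hypothetical.

```python
import math


def downtime(n, k, mean_repair, mean_inspect):
    """D(n) = lambda(n)/mu + n/i, with mean_repair = 1/mu and mean_inspect = 1/i.

    ASSUMPTION: lambda(n) = k / n (breakdowns fall inversely with inspections).
    """
    return (k / n) * mean_repair + n * mean_inspect


def best_frequency(k, mean_repair, mean_inspect):
    """Minimiser of D(n) under the assumed lambda(n) = k / n."""
    return math.sqrt(k * mean_repair / mean_inspect)


# Hypothetical values for illustration only.
k, mean_repair, mean_inspect = 3.0, 0.033, 0.011

n_star = best_frequency(k, mean_repair, mean_inspect)
print(f"n* = {n_star:.2f} inspections per unit time")
for n in (2, 3, 4):
    print(f"D({n}) = {downtime(n, k, mean_repair, mean_inspect):.4f}")
```

If λ(n) takes a different form, the closed-form step no longer applies, but D(n) can still be evaluated over a range of n and the minimum read off.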

REPLACEMENT DECISIONS:-
The goal of this chapter is to present models that can be used to optimize component replacement
decisions. The interest in this decision area is because a common approach to improving the reliability of a
system, or complex equipment, is through preventive replacement of critical components within the system.
Thus, it is necessary to be able to identify which components should be considered for preventive
replacement, and which should be left to run until they fail. If the component is a candidate for preventive
replacement, then the subsequent question to be answered is: What is the best time? The primary goal
addressed in this chapter is that of making a system more reliable through preventive replacement. In the
context of the framework of the decision areas addressed in this book, we are addressing column 1 of the
framework, as highlighted in Figure 2.1.

Replacement problems (and maintenance problems in general) can be classified as either
deterministic or probabilistic (stochastic).

Deterministic problems are those in which the timing and outcome of the replacement action are
assumed to be known with certainty. For example, we may have an item that is not subject to failure but
whose operating cost increases with use. To reduce this operating cost, a replacement can be performed.
After the replacement, the trend in operation cost is known. This deterministic trend in costs is illustrated in
Figure 2.2.

Examples of component replacement problems that can be treated with a deterministic model are
provided in Table 2.1.

Probabilistic problems are those in which the timing and outcome of the replacement action depend
on chance. In the simplest situation, the equipment may be described as being good or failed. The
probability law describing changes from good to failed may be described by the distribution of time between
completion of the replacement action and failure. As described in Appendix 1, the time to failure is a
random variable whose distribution may be termed the equipment’s failure distribution.

Examples of component replacement problems that can be analyzed using a stochastic model are
provided in Table 2.2.
The determination of replacement decisions for probabilistically failing equipment involves a problem of
decision making with one main source of uncertainty: it is impossible to predict with certainty when a failure will
occur or, more generally, when the transition from one state of the equipment to another will occur. A further source
of uncertainty is that it may be impossible to determine the state of equipment, whether good, failed, or somewhere in
between, unless definite maintenance action is taken, such as inspection. This aspect of uncertainty is highly relevant
to equipment used in emergency situations, often termed protective devices. An example of such a protective device
is a pressure safety valve in an oil and gas field: it lies dormant, waiting to come into service when an unacceptable
pressure level occurs, so its condition can only be determined through an inspection.

In the probabilistic problems of this chapter, we will assume that there are only two possible conditions of the
equipment, good and failed, and that the condition is always known. This is not unreasonable because, for example,
with continuously operating equipment producing some form of goods, we will soon know when the equipment has
reached the failed state because items may be produced outside specified tolerance limits or the equipment may cease
to function.

In determining when to perform a replacement, we are interested in the sequence of times at which the
replacement actions should take place. Any sequence of times is a replacement policy, but what we are interested in
determining are optimal replacement policies, that is, ones that maximize or minimize some criterion, such as profit,
total cost, or downtime, or that ensure a specified safety or environmental criterion is not exceeded.
In many of the models of component replacement problems presented in this chapter, it will be assumed
(which applies in many cases) that the replacement action returns the equipment to the “as new” condition, thus
continuing to provide exactly the same services as the equipment that has just been replaced when it was new. By
making this assumption, we are implying that various costs, failure distributions, and so on used in the analysis do not
change from one replacement to the next. An exception to this assumption will be problems in which the item being
replaced is not replaced by one that can be considered statistically as good as new.

Throughout this chapter, maintenance actions such as overhaul and repair can be considered to be equivalent
to replacement, provided it is reasonable to assume that such actions also return equipment to the as-new condition. In
practice, this is often a reasonable assumption, and hence the following models can often be used to analyze
overhaul/repair problems. If it is not reasonable to make such an assumption, then the models introduced in Section
2.9.3, along with the model associated with condition-based maintenance in Chapter 3, may help.

Section 2.2 addresses a common deterministic component replacement problem. Stochastic problems are covered in
Sections 2.3 through 2.9.

OPTIMAL REPLACEMENT TIMES FOR EQUIPMENT WHOSE OPERATING COST INCREASES WITH USE

Statement of the problem

Some equipment operates with excellent efficiency when it is new, but as it ages, its performance deteriorates.
An example is the air filter in an automobile. When the filter is new, gasoline consumption is good, but as the air
filter gets dirty, the gasoline consumption per kilometer increases. The question then is: When in the increasing cost
trend is it economically justifiable to replace the air filter, thus reducing the operating cost of the automobile? In
general, replacements cost money in terms of materials and wages, and a balance is required between the money spent on
replacements and savings obtained by reducing the operating cost. Thus, we wish to determine an optimal replacement
policy that will minimize the sum of operating and replacement costs per unit time.

When dealing with optimization problems, in general, we wish to optimize some measure of performance over
a long period. In many situations, this is equivalent to optimizing the measure of performance per unit time. This
approach is easier to deal with mathematically when compared to developing a model for optimizing a measure of
performance over a finite horizon.

The cost conflicts and associated optimization problems are illustrated in Figure 2.3. It should be stressed that
this class of problem can be called short-term deterministic because the magnitude of the interval between
replacements is weeks or months, rather than years. If the interval between replacements was measured in years, then
the fact that money changes in value over time would need to be taken into account in the analysis.

Construction of the model

1. c(t) is the operating cost per unit time at time t after replacement.

2. Cr is the total cost of a replacement.

3. The replacement policy is to perform replacements at intervals of length tr. The policy is illustrated in
Figure 2.4.

4. The objective is to determine the optimal interval between replacements to minimize the total cost of
operation and replacement per unit time.
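
Equation 2.1 itself is not reproduced in these notes. From the definitions above, a natural reconstruction (offered here as an assumption, not as a quotation of the original) is the total cost incurred in one replacement cycle divided by the cycle length. Setting the derivative to zero then gives the optimality condition relating c(tr) and C(tr):

```latex
C(t_r) = \frac{\displaystyle\int_0^{t_r} c(t)\,dt + C_r}{t_r}

\frac{dC(t_r)}{dt_r}
  = \frac{c(t_r)\,t_r - \left(\displaystyle\int_0^{t_r} c(t)\,dt + C_r\right)}{t_r^{2}} = 0
\quad\Longleftrightarrow\quad
c(t_r) = C(t_r)
```

In words, under this reconstruction the optimal moment to replace is when the current operating cost per unit time equals the average total cost per unit time.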

Using the equation c(tr) = C(tr) requires that the trend in operating costs be an increasing function, which in
practice is a very reasonable assumption. If that is not the case and the operating cost of a component becomes lower
as time progresses, then Equation 2.1 needs to be solved using classic calculus (if the cost trend is simple); otherwise,
a numerical solution will be required.

If the trend in operating costs is not continuous, but discrete, then the optimal replacement time is when the
next period’s operating cost is equal to or greater than the current average total cost to that time. In other
words, replace when the marginal operating cost is greater than the average cost to date.

Numerical Example

Further comments

In the construction of the model in this section, the time required to carry out a replacement has not been included.
This replacement time, Tr, can be accommodated without difficulty by dividing the total cost per cycle by the full
cycle length, tr + Tr, rather than by tr alone (see Figure 2.7 and Equation 2.2).

In practice, it is often not unreasonable to disregard the replacement time because it is usually small when
compared with the interval between the replacements. Any costs, such as production losses incurred due to the
duration of the replacement, need to be incorporated into the cost of the replacement action.

Models have now been developed whereby, for particular assumptions, the optimal interval between
replacements can be obtained. In practice, there may be considerable difficulty in scheduling replacements to occur at
their optimal time, or in obtaining the values of some of the parameters required for the analysis. To further assist the
engineer in deciding what an appropriate replacement policy should be, it is usually useful to plot the total cost/unit
time curve (Figure 2.8). The advantage of the curve is that, along with giving the optimal value of tr, it shows the form
of the total cost around the optimum. If the curve is fairly flat around the optimum, it is not really very important that
the engineer should plan for the replacements to occur exactly at the optimum, thus giving some leeway in scheduling
the work. Thus, in Figure 2.8, a replacement interval (tr) with a value somewhere between 3.5 and 6 weeks does not
greatly influence the total cost. Of course, if the total cost curve is not fairly flat around the optimum but rising rapidly
on both sides, then the optimal interval should be adhered to if at all possible.

If there is uncertainty about the value of a particular parameter required in the analysis (say, we are not sure
what the replacement cost is), then evaluating the total cost curve for various values of the uncertain parameter, and
noting the effect of this variation on the optimal solution, often goes a long way toward deciding what policy should
be adopted and whether the particular parameter is important from a solution viewpoint. For example, changing the
value of Cr in Equation 2.1 may produce curves similar to Figure 2.9, which demonstrate, in this instance, that although
Cr is varied, it does not greatly influence the optimal value of tr. In fact, there is an overlap, which indicates a good
solution independent of the true value of Cr (provided this value is within the bounds specified by the two curves). If
changes in Cr drastically altered the solution in terms of replacement interval and minimal total cost, then it
would be clear that a careful study would be required to identify the true value of Cr to be used when solving the
model. (For example, does Cr include only material and labor costs? Or does it include lost production costs? Or costs
associated with having to use a less efficient plant, overtime, or contractors, etc., to make up for losses incurred
resulting from the replacement?) The decision that can be taken (in this case regarding the interval between
replacements) may remain essentially constant within the uncertainty region examined in the sensitivity check. This
does not necessarily mean that the true total costs will have more or less the same numerical value within the overlap
region. From a decision-making point of view, however, this does not matter because it is the interval between
replacements that is under the control of the decision maker. The total costs are a consequence of the decision taken.

Thus, sensitivity checking gives guidance on what information is important from a decision-making viewpoint
and, consequently, what information should be gathered in a data collection scheme. The statement “garbage in =
garbage out,” which is frequently made with reference to data requirements of quantitative techniques, is also
demonstrated to be not necessarily correct. The validity of the “garbage in = garbage out” statement does depend on
the sensitivity of the solution to particular garbage. Note, therefore, that garbage in does not necessarily equal garbage
out, and so our information requirements for the use of quantitative techniques may not be as severe as is often
claimed.
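
The sensitivity check described above can be sketched for a simple case. The example assumes a linear operating cost trend c(t) = a + b·t and a total cost per unit time of the form C(tr) = (∫c(t)dt + Cr)/tr, which for this trend gives C(tr) = a + b·tr/2 + Cr/tr with optimum tr* = sqrt(2Cr/b). The trend, the function names, and all numbers are hypothetical.

```python
import math


def total_cost_rate(t_r, a, b, C_r):
    """C(tr) for the ASSUMED linear trend c(t) = a + b*t."""
    return a + b * t_r / 2 + C_r / t_r


def optimal_interval(b, C_r):
    """Closed-form minimiser of total_cost_rate for the linear trend."""
    return math.sqrt(2 * C_r / b)


a, b = 100.0, 10.0       # hypothetical operating cost trend
planned_C_r = 80.0       # replacement cost used when planning
planned_t = optimal_interval(b, planned_C_r)

# Suppose the true replacement cost lies somewhere in this range.
for true_C_r in (64.0, 80.0, 100.0):
    true_t = optimal_interval(b, true_C_r)
    penalty = (total_cost_rate(planned_t, a, b, true_C_r)
               - total_cost_rate(true_t, a, b, true_C_r))
    print(f"true Cr = {true_C_r:6.1f}: optimal tr = {true_t:.2f}, "
          f"extra cost of using tr = {planned_t:.2f}: {penalty:.2f}")
```

With these illustrative numbers, a 25% error in Cr moves the optimal interval only modestly and adds well under 1% to the cost per unit time, which is the sense in which imperfect input data need not produce a poor decision.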

APPLICATIONS:

Replacing the Air Filter in an Automobile:-

What is the economic replacement time for the air filter in an automobile?

The purchase price of an air filter is $80. The automobile driver travels 2,000 km/month. Gasoline costs
$0.75/L. When the air filter is new, then during the first month of operation, the automobile’s performance is 15 km/L;
thus, the first month’s operating cost is $100.00. As the filter ages, there is a deterioration in the number of kilometers
that can be driven using 1 L of gasoline. The deterioration trend is given in Table 2.4.

Using Equation 2.1, in discrete form, we obtain Table 2.5, from which we see that the optimal replacement
age is 4 months, and the associated cost per month is $131.88. The associated graph of cost per month versus time is
provided in Figure 2.10, which includes a calculation showing the use of the optimizing criterion c(t) = C(tr) when the
trend in operating cost is discretized.

Therefore, replace at the end of month 4 because next period’s operations and maintenance cost, c(t = 5), is
greater than the average cost to date ($131.88).
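
The discrete calculation can be sketched as follows. Table 2.4 is not reproduced in these notes, so the deterioration trend in `km_per_litre` is an assumed stand-in, chosen so that the first month matches the stated 15 km/L; it is not taken from the source. The price, distance, and fuel cost are those given in the problem statement, and the function name is hypothetical.

```python
def average_cost_per_month(monthly_costs, replacement_cost):
    """Return (age, average total cost per month) for replacement at each age."""
    results = []
    cumulative = 0.0
    for age, cost in enumerate(monthly_costs, start=1):
        cumulative += cost
        results.append((age, (cumulative + replacement_cost) / age))
    return results


KM_PER_MONTH = 2000
PRICE_PER_LITRE = 0.75
FILTER_PRICE = 80.0

# ASSUMED deterioration trend (km per litre in months 1 to 6);
# a stand-in for Table 2.4, which is not available here.
km_per_litre = [15, 14, 13, 12, 11, 10]

monthly_costs = [KM_PER_MONTH / kpl * PRICE_PER_LITRE for kpl in km_per_litre]
table = average_cost_per_month(monthly_costs, FILTER_PRICE)
best_age, best_cost = min(table, key=lambda row: row[1])

for (age, avg), cost in zip(table, monthly_costs):
    print(f"month {age}: operating cost {cost:7.2f}, average cost if replaced now {avg:7.2f}")
print(f"replace at the end of month {best_age} (average cost {best_cost:.2f} per month)")
```

Under the assumed trend the minimum falls at month 4, and the operating cost of month 5 exceeds the average cost to date, which is the discrete replacement rule stated earlier.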

Overhauling a Boiler Plant

The replacement problem we have been discussing is similar to a problem associated with a boiler plant.
Through use, the heat transfer surfaces within the boiler become less efficient, and to increase their efficiency, they
can be cleaned. Cleaning thus increases the rate of heat transfer, and less fuel is required to produce a given amount of
steam. However, due to deterioration of other parts of the boiler plant, the trend in operating cost is not constant after
each cleaning operation (equivalent to a replacement), but follows a trend similar to that of Figure 2.11. Thus, k
illustrated in Figure 2.6 is no longer constant, but varies from replacement to replacement. That is, the trend in
operating cost after each replacement depends on the amount of steam produced up to the date of the replacement. A
detailed study of this problem is given by Davidson (1970), who analyzes it using a dynamic programming model.

MODELS
One of the main tools in the scientific approach to management decision making is that of building an
evaluative model, usually mathematical, whereby a variety of alternative decisions can be assessed. Any model is
simply a representation of the system under study. In the application of quantitative techniques to management
problems, the type of model used is frequently a symbolic model in which the components of the system are
represented by symbols, and the relationships of these components are described by mathematical equations.

To illustrate this model-building approach, we will examine a maintenance stores problem that, although
simplified, will illustrate two of the most important aspects of the use of models: the construction of a model of the
problem being studied and its solution.

A Stores Problem
A stores controller wishes to know how many items to order each time the stock level of an item reaches zero.
The system is illustrated in Figure 1.5.
The conflict in this problem is that the more items the controller orders at any time, the more the ordering costs
will decrease because fewer orders will have to be placed, but the stockholding costs will increase. These conflicting
costs are illustrated in Figure 1.6.

Fig 1.5 Stores problem

The stores controller wants to determine the order quantity that minimizes the total cost. This total cost can be
plotted, as shown in Figure 1.6, and used to solve the problem. In this particular case, the total cost is minimized when
the order quantity is at the intersection of the holding cost curve and the ordering cost curve. However, this should not
be generalized; for example, see Figure 1.8. A much more rapid solution to the problem, however, may be
obtained by constructing a mathematical model. The following parameters can be defined:
D - total annual demand
Q - order quantity
Co - ordering cost per order
Ch - stockholding cost per item per year

Optimal Order Quantity


Total cost per year of ordering and holding stock = ordering cost per year + stockholding cost per year

Since
Ordering cost/year = number of orders placed per year × ordering cost per order
= DCo/Q

Stockholding cost/year = average number of items in stock per year (assuming linear decrease of stock) ×
stockholding cost per item per year
= (1/2)QCh

Therefore, the total cost per year, which is a function of the order quantity and is denoted C(Q), is
C(Q) = (DCo/Q) + (QCh/2)     (1.1)
Equation 1.1 is a mathematical model of the problem relating order quantity Q to total cost C(Q).

The stores controller wants the number of items to order to minimize the total cost, that is, to minimize the
right-hand side of Equation 1.1. The answer comes by differentiating the equation with respect to Q, the order
quantity, and equating the derivative to zero as follows:

dC(Q)/dQ = −(DCo/Q²) + (Ch/2) = 0

Therefore,
Q* = √(2DCo/Ch)     (1.2)

Because the values of D, Co, and Ch are known, their substitution into Equation 1.2 gives Q*, the optimal value of Q.
Strictly speaking, we should check that the value of Q* obtained from Equation 1.2 is a minimum and not a
maximum. The interested reader can check that this is the case by taking the second derivative of C(Q) and noting that
the result is positive. In fact, in this particular case, the optimal order quantity equalizes the average holding and
ordering costs.
From Equation 1.2, we can find that by optimizing the order quantity, the total cost per year is minimized, and
its value is
C(Q*) = √(2DCoCh)
For example, let D = 1000 items per year, Co = $5.00, and Ch = $0.25:

Q* = √(2 × 1000 × 5 / 0.25) = 200 items

Thus, each time the stock level reaches zero, the stores controller should order 200 items to minimize the
total cost per year of ordering and holding stock.
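
The worked example can be checked with a few lines of code. The sketch assumes that Equation 1.2 is the standard economic order quantity result Q* = √(2DCo/Ch), which follows from setting the derivative of Equation 1.1 to zero; the function names are hypothetical.

```python
import math


def total_cost(Q, D, Co, Ch):
    """Equation 1.1: annual ordering cost plus annual stockholding cost."""
    return D * Co / Q + Q * Ch / 2


def optimal_order_quantity(D, Co, Ch):
    """Assumed form of Equation 1.2 (standard economic order quantity)."""
    return math.sqrt(2 * D * Co / Ch)


D, Co, Ch = 1000, 5.00, 0.25   # values from the example in the text

Q_star = optimal_order_quantity(D, Co, Ch)
ordering = D * Co / Q_star
holding = Q_star * Ch / 2

print(f"Q* = {Q_star:.0f} items")
print(f"ordering cost/year = {ordering:.2f}, stockholding cost/year = {holding:.2f}")
print(f"minimum total cost/year = {total_cost(Q_star, D, Co, Ch):.2f}")
```

The two cost components come out equal at Q*, which is the equalization of holding and ordering costs noted above.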
Note that various assumptions have been made in the inventory model presented that, in practice, may not
be realistic. For example, no consideration has been given to the possibility of quantity discounts, the possible lead
time between placing an order and its receipt, the fact that demand may not be linear, or the fact that demand may not
be known with certainty. The purpose of the above model is simply to illustrate the construction and solution of a
model for a particular problem. If the reader is interested in the stock control aspects of maintenance stores, see
Nahmias (1997).

Obtaining Solutions from Models

In the stores problem of the previous section, two methods for solving a mathematical model were demonstrated:
an analytical procedure and a numerical procedure.

The calculus solution was an illustration of an analytical technique in which no particular set of values of the
control variable (amount of stock to order) was considered, but we proceeded straight to the solution given by
Equation 1.2.

In the numerical procedure, solutions for various values of the control variables were evaluated to identify the best
results, that is, it is a trial-and-error procedure. The graphical solution of Figure 1.6 is equivalent to inserting different
values of Q into the model (Equation 1.1) and plotting the total cost curve to identify the optimal value of Q.
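
The trial-and-error procedure can be sketched by evaluating Equation 1.1 over a set of candidate order quantities and picking the cheapest. The parameter values are those of the stores example; the candidate grid (50 to 400 in steps of 10) is an arbitrary choice made for illustration.

```python
def total_cost(Q, D=1000, Co=5.00, Ch=0.25):
    """Equation 1.1 evaluated for the stores example."""
    return D * Co / Q + Q * Ch / 2


# Hypothetical grid of order quantities to try.
candidates = range(50, 401, 10)

costs = {Q: total_cost(Q) for Q in candidates}
best_Q = min(costs, key=costs.get)

for Q in (100, 150, 200, 250, 300):
    print(f"Q = {Q}: total cost/year = {costs[Q]:.2f}")
print("best Q on the grid:", best_Q)
```

The grid search finds the true optimum only when it happens to lie on the grid; otherwise a finer grid or the analytical result is needed, which is one reason analytical procedures are preferred when they are available.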

In general, analytical procedures are preferred to numerical ones, but because of problem complexity, in many
cases, they are impracticable or even impossible to use. In many of the maintenance problems examined in this book,
the solution to the mathematical model will be obtained by using numerical procedures. These are primarily graphical
procedures, but iterative procedures and simulation are also used.

Perhaps one of the main advantages of graphical solutions is that they often enable management to clearly see the
effect of implementing a maintenance policy that deviates from the optimum identified through solving the model.
Also, it may be possible to plot the effects of different maintenance policies together, thus illustrating the relative
effects of the policies. To illustrate this point, Chapter 2 includes the analyses of two different replacement
procedures:

1. Replacement of items at fixed intervals of time


2. Replacement of items based on the length of time they are actually in use

Intuitively, one might feel that procedure 2 would be preferable because it is based on usage of the item (thus
preventing an almost new item from being replaced shortly after its installation subsequent to a previous failure, as
would happen with procedure 1).

For these different maintenance policies, which can be adopted for the same equipment, models can be
constructed, as is done in Chapter 2 and, for each policy, the optimal procedure can be determined. However, by
using a graphical solution procedure, the maintenance cost of each policy can be plotted, as illustrated in Figure 1.7,
and the maintenance manager can see exactly the effect of the alternative policies on total cost. It may well be the case
that from a data collection point of view, one policy involves considerably less work than the other, yet they may have
almost the same minimum total cost. This is illustrated in Figure 1.7, in which the minimum total costs are about the
same for procedures 1 and 2.

FIGURE 1.7 Comparing the total maintenance costs of two preventive replacement procedures

Of course, for different costs, breakdown distributions, failure and preventive replacement times, and so on,
the minimum total costs and replacement intervals may differ greatly between different replacement policies. The
point is that a graphical illustration of the solutions often assists the manager to determine the policy to be adopted.
Also, such a method of presenting a solution is often more acceptable than a statement such as “policy x is the best,”
which may be presented along with complicated mathematics.

Further comments about the benefits of curve plotting are given in Section 2.2.4 in relation to the problem of
determining the optimal replacement interval for equipment, the operating cost of which increases with use.

One of the developments in numerical procedures made possible by computers is simulation. An application of
this procedure will be illustrated in a problem in Chapter 5, which relates to determining the optimal number of
machines to be installed in a workshop.

Maintenance Control and Mathematical Models


The primary function of maintenance is to control the condition of assets. Some of the problems associated with this
include the determination of:

• Inspection frequencies
• Overhaul intervals, i.e., part of a preventive maintenance policy
• Whether to do repairs, i.e., having a breakdown maintenance policy or not
• Replacement rules for components
• Replacement rules for capital equipment, perhaps taking account of technological changes
• Whether equipment should be modified
• The size of the maintenance crew
• Composition of machines in a workshop
• Rules for the provision of spares

Appendix 7 provides a list of real-world applications of maintenance decision optimization models in different
industries.

Problems within these areas can be classified as being deterministic or probabilistic. Deterministic ones are
those in which the consequences of a maintenance action are assumed to be nonrandom. For example, after an
overhaul, the future trend in operating costs is known. A probabilistic problem is one in which the outcome of the
maintenance action is random. For example, after equipment repair, the time to next failure is uncertain.

To solve any of the previously mentioned problems, there are often many alternative decisions. For example,
for an item subject to sudden failure, we may have to decide whether to replace it while it is in an operating state, or
only upon its failure; whether to replace similar components in groups when only one has failed; and so on. Thus, the
function of the asset management department is, to a large extent, concerned with determining the effect of various
decisions to control the condition of assets on meeting the objectives of the organization.

As indicated previously, many control actions are open to the maintenance manager. The effect of these
actions should not be looked at solely from their effect on the asset management department because the consequences
of such actions may seriously affect other units of the organization, such as production or operations.

To illustrate the possible interactions of the asset management function with other departments, consider the
effect of the decision to perform repairs only and not to do any preventive maintenance, such as overhauls. This
decision may well reduce the budget for asset management, but it may also cause considerable production or
operation downtime. To take account of interactions, sophisticated techniques are frequently required, and this is where
the use of mathematical models can assist the maintenance manager and reduce the tension that often occurs between
maintenance and operations.

Figure 1.8 illustrates the type of approach taken by using a mathematical model to determine the optimal
frequency of overhauling a piece of the plant by balancing the input (maintenance cost) of the maintenance policy
against its output (reduction in downtime).

The above example is very simple and, in practice, we have to consider many factors in the context of even a
single maintenance decision. For example, if the objective of a maintenance decision is to minimize total costs
(lowest cost optimization), the costs of the component or asset, labor, lost production, and perhaps even customer
dissatisfaction from delayed deliveries are all to be considered. Where equipment or component wear-out is a factor,
the lowest possible cost is usually achieved by replacing machine parts late enough to get good service out of them,
but early enough for an acceptable rate of on-the-job failures (to attain a zero rate, we would probably have to replace
parts every day). In another scenario in which availability is to be maximized, we have to get the right balance
between taking equipment out of service for preventive maintenance and suffering outages due to breakdowns. If
safety is the most important factor, we might optimize for the safest possible solution, but with an acceptable effect on
cost. If profit is to be optimized, we would take into account not only cost but also the effect on revenues through
greater customer satisfaction (better profits) or delayed deliveries (lower profits).

The example shown in Figure 1.8 should suffice to show that the quantitative approach taken in this book is
concerned with determining appropriate maintenance decisions by studying the mathematical and statistical
relationships between the decisions to be made and the consequences of these decisions. The foregoing comments
about the use of models for analyzing maintenance problems are very brief, but they will be elaborated upon in the
subsequent chapters of this book.

DATA REQUIREMENTS FOR MODELING:

Data are essential inputs for building decision models that support evidence-based asset management. It must be
recognized that mathematical models by themselves do not guarantee that the right decisions will be made if the data
used do not have the required quality. A discussion on data requirements for model creation in the context of
maintenance optimization is presented in Tsang et al. (2006).

When data are unavailable or sparse, creating a model that characterizes the risk of failure can still be
achieved through knowledge elicitation by interviewing the asset’s domain experts. The related methodology, as well
as an illustrative example, is provided in Appendix 5.

UNIT III

MAINTENANCE LOGISTICS

Logistics

Logistics is the integrated design, management, and operation of human, physical, financial, and information
resources, during product, system, or service life time. (The Society of Logistic Engineers, SOLE)

Within systems engineering, logistics is a discipline that aims to lower the life cycle cost of a product and to
reduce the demand for logistics support by optimizing the maintenance system, thereby easing product support.
Although originally developed for military purposes, it is also widely used in commercial customer service
organisations.

Classification of Logistics

Logistics may be divided broadly into the following three categories:

• Supply chain logistics: Supply chain logistics deals with the delivery of inputs from suppliers to the
manufacturing plant and the delivery of finished goods to various demand centers. It deals with raw materials and
components on the input side and finished products on the output side.

• Service response logistics: Service response logistics is the process of coordinating non‐material activities
necessary for the fulfillment of the service in an effective way. Service response logistics has a different focus from
supply chain logistics, in that supply chain logistics focuses on physical supply and distribution of products, whilst
service response logistics emphasizes building responsive organizations, which can respond to customer requests. This
difference in emphasis is illustrated in Figure 20.1.

• Product support logistics: Product support logistics deals with the provisioning, procurement, materials
handling, transportation and distribution, and warehousing of the items and the support infrastructure needed for
carrying out these activities over the life of the product. Figure 20.2 shows the main elements of product support
logistics.

LOGISTICS MANAGEMENT:

Logistics management deals with decision making and this is done at three different levels. In the context of
manufacturing logistics, the three levels are as follows:

• The strategic level deals with decisions that have a long‐lasting effect on the firm. This includes decisions
regarding the number, location, and capacities of warehouses and manufacturing plants.

• The tactical level typically includes decisions that are updated anywhere between once every quarter and once
every year. This embraces purchasing decisions, inventory policies, and transportation strategies.

• The operational level refers to day‐to‐day decisions such as scheduling, routing trucks, and measuring
performance.

Key Elements of Maintenance Logistics

Maintenance logistics overlaps with product support logistics when one is dealing with products. Since our
focus is not only on products, but also on other engineered objects such as plants and infrastructures, maintenance
logistics needs to be looked at from the provider perspective – this may be the maintenance department (for in‐house
maintenance) and/or the external service provider (for outsourced maintenance). The key elements of maintenance
logistics for engineered objects are shown in Figure 20.3.

Service Facilities

Carrying out maintenance activities requires several service facilities including workshops to repair failed
items, warehouses and other storage facilities to store materials and spare parts, and so on. Having the tools and
equipment to carry out maintenance is an important issue that needs to be addressed.

Location
Maintenance service facilities may be located in one place, as in the case of plants, or may be distributed over
a wide geographical area to be close to customers in the case of consumer products. In the case of infrastructures,
these facilities have to be distributed due to the nature of the engineered object itself. The facilities may be owned by a
single or several different service providers. In some cases, such as air forces or airlines, these facilities may have a
multi‐echelon structure where different types of maintenance are carried out at different levels.

Tools and Equipment for Maintenance


Different types of engineered objects require different types of tools and equipment. These also may be
centralized, mobile, or distributed depending on the object. Example 20.1 provides examples of tools and equipment
required by products, plants, and infrastructures.

HUMAN RESOURCES

The maintenance of most objects (products, plants, or infrastructures) is labor intensive. Having the right mix
of skills and the right workforce size are key to ensuring effective maintenance. Maintenance personnel may be
specialized in certain areas (for example, mechanic, electrician, welder, etc.) or multi‐skilled to deal with a particular
item (car, aircraft, ship, air conditioner, turbine, rail, etc.).

Key issues in terms of maintenance human resources include having the right mix of skills supported by
adequate training programs in order to provide the required level of service.

INVENTORIES
Maintenance of an object requires various kinds of physical goods and they can be grouped broadly into two
categories: (i) consumables – such as oil and grease in plants, paint in infrastructures, and so on, and (ii) spare parts –
items (from components to objects and anything in between) that may be bought new from external suppliers or
repaired/reconditioned either in‐house or by an external agent.

Inventory management (of materials and spares) is important, as holding inventories implies capital being tied
up. Not having enough inventories of spare parts and materials may affect the functioning/operation of the object, with
serious consequences in terms of availability and cost. In this section we focus on spare parts from the point of view of
the maintenance service provider.

Characterization of Spare Parts
Spare parts and maintenance are a significant part of most industrial world economies, as illustrated by the
statement given below.

Spare parts and services account for 8% of the annual gross domestic product in the United States. Consumers
and businesses spend more than $700 billion each year on spare parts and services for previously purchased assets,
such as automobiles, aircraft, and industrial machinery. On a global basis, the annual spending on such aftermarket
parts and services totals more than $1.5 trillion.

There are many different ways of characterizing the spare parts used in maintenance, and these include the
following.

Repairable versus Non‐Repairable Items


An engineered object can be viewed as a multi‐level system (see Chapter 2). The number of appropriate
levels depends on the object under consideration. Deciding whether an item is repairable is not a straightforward
decision.

We distinguish between two types of spare parts:

1. Repairable parts: Parts that are repaired rather than procured; that is, parts that are technically and
economically repairable. After repair, the part becomes ready for use again.

2. Non‐repairable parts or consumables: Parts which are scrapped after replacement.

Non‐Repairable Spares

The primary question that service providers encounter for spare parts planning is how to place the spare part
inventories throughout their service network. Possible options include delivering parts to the field where they are
required, channeling the parts through a central warehouse – a two‐echelon solution, or a three‐echelon solution with a
central distribution center and regional warehouses close to customers. Once the distribution network is in place, the
next issue is the ordering and inventory policies for the different echelons.

Repairable Spares

Given an item design and a repair network, a level of repair analysis (LORA) determines, for each component
in the item, (i) whether it should be discarded or repaired upon failure and (ii) at which echelon in the repair network
this should be done. The objective of the LORA is to minimize the total (variable and fixed) costs.

A typical structure of the repair network is to have a single‐ or multi‐echelon system. The details of these
types of network and their operation are discussed later in this chapter.

Other Issues

Other issues include the following:

• Criticality: This is based on the consequences caused by the failure of the part. The unavailability of some
parts may shut down a whole unit or plant, resulting in high losses.

• Specificity: Some parts are custom‐made whilst others are generic and common to many objects.

• Lead time: Many spare parts have a long lead time, especially for custom built items or repairable items that
have to queue for service at a repair facility.

Framework for Spare Parts Inventory Management

Figure 20.4 is a system characterization of the real world of spare parts management which shows the key
elements and the interactions among them.

The key issues we will discuss include:

• Inflow and outflow of items from the inventory;

• Forecasting of the demand of spare parts;

• Inventory control to manage the flow.

Forecasting of Demand for Spares

There are two main approaches for forecasting the demand for spare parts. The first is the reliability‐based
approach and the second is the black‐box approach based on historical data on spare parts consumption. In some
cases, spare parts demand exhibits patterns that cannot be predicted well using traditional forecasting methods. We
focus on reliability‐based forecasting.

Demand for non‐repairable items depends on several factors, and the factors influencing the demand for
spares are shown in Figure 20.5.

Replacements of components occur at points (some random and others non‐random) along the time axis. Let N(t)
denote the number of failures and replacements over [0, t); this is the demand for replacement items. The demand is
uncertain and can be characterized in terms of its mean and variance, as shown in Figure 20.6. The mean demand for
an item is given by E[N(t)], often referred to as the mean cumulative function (MCF) for the item.
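
The mean and variance of N(t) can be estimated by simulation when no closed form is available. The sketch below is only an illustration: it assumes a renewal process in which every failed item is replaced by a new one with a Weibull life, and the function name and parameter values are hypothetical rather than taken from the text.

```python
import random
import statistics

def simulate_demand(t, shape, scale, n_runs=5000, seed=1):
    """Estimate the mean and variance of N(t), the replacements in [0, t).

    Assumption: each replacement is a new item whose life is
    Weibull(shape, scale), so replacements form a renewal process.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_runs):
        clock, n = 0.0, 0
        while True:
            # random.weibullvariate takes (scale, shape) in that order
            clock += rng.weibullvariate(scale, shape)
            if clock >= t:
                break
            n += 1
        counts.append(n)
    return statistics.mean(counts), statistics.variance(counts)

# shape = 1 gives exponential lives with mean 2, so N(10) is Poisson
# with mean 10 / 2 = 5 and variance 5; the estimates should be close.
mean_demand, var_demand = simulate_demand(t=10.0, shape=1.0, scale=2.0)
print(round(mean_demand, 2), round(var_demand, 2))
```

With a shape parameter above 1 (wear-out failures) the same routine can be used to see how the variance of demand falls relative to its mean.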

One can divide items into three groups, as shown in Figure 20.7: (A) fast‐moving, (B) medium‐moving, and
(C) slow‐moving items based on their reliability (low, intermediate, and high).

NEW ITEM INVENTORY MANAGEMENT

Inventory management involves the selection of suppliers (a tactical decision) and ordering policies. This
section deals with ordering policies and inventory costs.

Framework for Ordering Policy Decisions

The framework for ordering policies is given in Figure 20.8. We discuss briefly the various elements of the
framework.

Inventory Level

The inventory level of new parts changes dynamically in an uncertain manner. It decreases when an item is
issued for maintenance activities and increases when an order is received from suppliers.

Ordering Policies

There are three key issues related to ordering policies. For the single‐item case, the first two issues are
important and they deal with (i) when to order and (ii) how much to order, with one or both being the decision
variables. For the multi‐item case, the coordination of times to order is issue (iii), and this is commonly referred to as
joint replenishment.

Several different inventory policies dealing with issues (i) and (ii) have been studied, and the two most
commonly used are as follows:

• Fixed ordering time policy: The inventory is reviewed at ordering times jT, j = 1, 2, ..., and the quantity
ordered may change with time.

• Fixed ordering quantity policy: Here, the quantity ordered each time is the same and the time between orders
changes with time.

If the demand for spares is based on the MCF, then the ordering times and quantities for the two policies are
as shown in Figure 20.9.
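
One plausible reading of how the MCF drives the two policies can be sketched as follows. The power-law form of the MCF, the function names, and the numbers are assumptions made for illustration; the source does not specify them. Under the fixed ordering time policy the expected quantity per interval is the increment of the MCF, and under the fixed ordering quantity policy the order times are the instants at which the MCF reaches Q, 2Q, and so on.

```python
def mcf(t, theta, beta):
    """Assumed power-law MCF: E[N(t)] = (t / theta) ** beta."""
    return (t / theta) ** beta

def fixed_time_quantities(T, periods, theta, beta):
    """Expected demand in each interval [jT, (j + 1)T), j = 0, 1, ..."""
    return [mcf((j + 1) * T, theta, beta) - mcf(j * T, theta, beta)
            for j in range(periods)]

def fixed_quantity_times(Q, orders, theta, beta):
    """Times t_k at which the MCF reaches k * Q (solve mcf(t_k) = k * Q)."""
    return [theta * (k * Q) ** (1.0 / beta) for k in range(1, orders + 1)]

# An ageing item (beta > 1): quantities grow under the fixed-time policy,
# and the gaps between orders shrink under the fixed-quantity policy.
quantities = fixed_time_quantities(T=1.0, periods=3, theta=1.0, beta=2.0)
times = fixed_quantity_times(Q=4.0, orders=3, theta=1.0, beta=2.0)
print(quantities)  # [1.0, 3.0, 5.0]
print(times)
```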

Decision Problems

The fixed ordering time [quantity] policy has a single decision variable T[Q] and the optimal selection of this requires
a suitable objective function. One that is commonly used is the asymptotic expected total inventory cost per unit time
(over an infinite time horizon). The total cost consists of the following elements:

• Ordering cost: This consists of an administration cost plus a cost that depends on the quantity ordered, Q, and is
given by a + bQ, where a is the administration cost per order and b is the sale price of a spare item.

• Inventory holding cost: This is given by hy where h is the holding cost per item per unit time and y is the duration
the spare item stays in inventory.

• Shortage cost: This cost may include losses resulting from the downtime due to unavailability of spares.

• Emergency ordering cost: This cost is incurred when the last spare is used before the next regular ordering time
instant.

It is difficult to obtain analytical expressions for these cost elements as failures occur randomly and, as such, the
inventory levels change in an uncertain manner over time.

Integrated Maintenance Inventory Model

In this section we consider a periodic inventory policy for an item maintained using the block policy discussed
in Chapter 4. Note that the expected number of spares needed between two preventive maintenance (PM) actions is
given by M(T), where M(t) is the renewal function associated with F(t), the failure distribution for the item. The
quantity ordered is 1 + M(T) at time instants which correspond to PM actions, as shown in Figure 20.11 for the first
cycle. Since one item is used in the PM replacement, the inventory level at the start is M(T). The inventory reduces by
one each time a spare is used and this occurs in an uncertain manner. However, the mean value of the inventory is
given by I(t) = M(T) - M(t), where t = 0 is the time instant of the PM action. The mean inventory profile of the cycle stock
is shown in the figure.
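
The renewal function M(t) rarely has a closed form, so in practice it is computed numerically. The following is a minimal sketch, assuming a simple Riemann-Stieltjes discretization of the renewal equation M(t) = F(t) + ∫ M(t - x) dF(x); the function name, the step count, and the exponential example are illustrative choices, not part of the source. The mean stock remaining at time t is taken as M(T) minus the expected number of spares used by then.

```python
import math

def renewal_function(cdf, T, steps=400):
    """Approximate M(t) on a grid over [0, T].

    Discretizes M(t) = F(t) + integral_0^t M(t - x) dF(x)
    with a plain Riemann-Stieltjes sum (accuracy is O(T / steps)).
    """
    h = T / steps
    F = [cdf(i * h) for i in range(steps + 1)]
    M = [0.0] * (steps + 1)
    for i in range(1, steps + 1):
        M[i] = F[i] + sum(M[i - j] * (F[j] - F[j - 1])
                          for j in range(1, i + 1))
    return M

def exponential_cdf(t, rate=0.5):
    """Illustrative failure distribution; here M(t) = rate * t exactly."""
    return 1.0 - math.exp(-rate * t)

M = renewal_function(exponential_cdf, T=4.0)
order_quantity = 1.0 + M[-1]             # one item for PM plus expected failures
mean_inventory = [M[-1] - m for m in M]  # expected stock left at each grid time
print(round(M[-1], 3), round(order_quantity, 3))
```

The exponential case serves only as a check, because its renewal function is known; a Weibull or other distribution can be passed in the same way.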

Since the inventory level changes in an uncertain manner, the total demand for spares over a cycle is a random
variable which can be either less than M(T) or greater than M(T). In the former case, the order quantity needed is less
than M(T). However, in the latter case there is a shortfall.

A periodic inventory policy with maximal level S and safety stock level s is shown in Figure 20.12. Note that
even in this case, the demand over a cycle can exceed S, so that the inventory is depleted and additional spares need to
be ordered as emergency orders.

REPAIRABLE ITEMS INVENTORY MANAGEMENT

Single‐Echelon Inventory Model

The single‐echelon inventory system for a repairable item is shown in Figure 20.4. When an item fails, it is
replaced by a working item from the new or repaired item inventory, if available. Otherwise, the system (for example,
an aircraft) waits until a working item becomes available. The failed item is removed and joins the repair queue. At a
later time it is either scrapped or repaired (and then joins the inventory for repaired items).

The key issues in the single‐echelon repairable item problem include:

• The distribution of the arrival of the failed items to the repair facility: This depends on the number of
engineered objects involved (if the item is an aircraft engine then the arrival distribution depends on the number of
aircraft in the fleet), their intensity of usage, the maintenance policy adopted, and so on.

• The capacity of the repair facility: This capacity determines the service rate of repair, which is an important
parameter of the problem.

• The appropriate measures of performance for the system: The common measures include:

◦ Average fill rate: This is the percentage of parts required for repair that are available from on‐the‐shelf
inventory.

◦ Total (system) backorders: The system‐wide backorders are the sum of the expected backorders of
all parts that are used to support the system.

◦ System availability: This is a measure that is both intuitive and directly reflects the customer goal of
generating value through the use of the system.

• The optimal number of spares in the system: This is usually the key decision variable of the problem and is
determined to optimize one of the above measures of system performance.
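
The performance measures above can be evaluated with a standard textbook simplification that is not stated in the source: failures arrive as a Poisson process and repair times are independent, so the number of items in repair is Poisson with mean equal to the failure rate times the mean repair time. Under that assumption, the sketch below computes the fill rate and expected backorders for a stock of s spares and finds the smallest s that meets a target fill rate. All names and numbers are hypothetical.

```python
import math

def poisson_pmf(x, mean):
    """P(X = x) for a Poisson number of items in repair."""
    return math.exp(-mean) * mean ** x / math.factorial(x)

def fill_rate(s, pipeline_mean):
    """Probability that a demand finds a spare on the shelf: P(X <= s - 1)."""
    return sum(poisson_pmf(x, pipeline_mean) for x in range(s))

def expected_backorders(s, pipeline_mean):
    """E[(X - s)+], written as E[X] - s + E[(s - X)+] to avoid a long tail sum."""
    return pipeline_mean - s + sum((s - x) * poisson_pmf(x, pipeline_mean)
                                   for x in range(s))

def min_stock_for_fill_rate(target, pipeline_mean):
    """Smallest number of spares whose fill rate reaches the target (< 1)."""
    s = 0
    while fill_rate(s, pipeline_mean) < target:
        s += 1
    return s

# Example: 0.5 failures per week and a 4-week repair time give a pipeline mean of 2.
spares = min_stock_for_fill_rate(0.90, pipeline_mean=2.0)
print(spares, round(expected_backorders(spares, 2.0), 4))
```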

Multi‐Echelon Inventory Model

Consider a two‐echelon inventory system for a repairable item where the system consists of a repair depot and
N operating sites. Each site requires a set of working items and maintains an inventory of spare items. All failed items
are repaired at the repair depot, which also maintains an inventory of spare items. We consider a one‐for‐one
replenishment policy, which is appropriate when the item has high value and is subject to infrequent failures. When an
item fails at a site, three events occur simultaneously:

1. The failed item is replaced with a spare item from the site’s inventory, if one is available; otherwise, there is
a shortage at the site that will last until a replacement arrives from the repair depot.

2. The failed item is sent to the depot for repair.

3. The depot ships a replacement item if it has available inventory; otherwise, the depot places the
replacement request on backorder and will fill it when stock is available. When the failed item arrives at the repair
depot, it enters the repair process; upon completion of the repair process, the item goes into the depot inventory or fills
a backorder if any exist.

The decision problem is the quantity of spare items to be stocked and their locations.

The METRIC Model

The Multi‐Echelon Technique for Recoverable Item Control (METRIC) was developed in the late 1960s for
the US Air Force by the RAND Corporation. The METRIC model determines, for every item of a system, the optimal
stock level at each of several different bases, which may be different in terms of item demand rates and other
characteristics, and the supporting depot. The objective function is the sum of backorders across all bases.
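
The idea of minimizing the sum of backorders across bases can be illustrated with a much-reduced stand-in for METRIC. The sketch below ignores the depot entirely, assumes a Poisson number of items in resupply at each base and an equal cost per spare, and allocates a fixed number of spares greedily to the base where one more spare removes the most expected backorders. It is not the METRIC algorithm itself; the names and data are hypothetical.

```python
import math

def expected_backorders(s, mean):
    """E[(X - s)+] for X ~ Poisson(mean), the items in resupply at a base."""
    on_shelf = sum((s - x) * math.exp(-mean) * mean ** x / math.factorial(x)
                   for x in range(s))
    return mean - s + on_shelf

def allocate_spares(pipeline_means, total_spares):
    """Greedy marginal allocation of identical spares across bases."""
    stock = [0] * len(pipeline_means)
    for _ in range(total_spares):
        # Backorder reduction from adding one more spare at each base
        gains = [expected_backorders(s, m) - expected_backorders(s + 1, m)
                 for s, m in zip(stock, pipeline_means)]
        best = max(range(len(gains)), key=gains.__getitem__)
        stock[best] += 1
    return stock

# Three bases with different demand rates (mean items in resupply).
allocation = allocate_spares([0.5, 2.0, 4.0], total_spares=6)
print(allocation)  # [0, 2, 4]
```

Because expected backorders are convex in the stock level, this one-at-a-time allocation is optimal for the simplified single-echelon problem; the real model must also account for depot stock and depot delays.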

MAINTENANCE LOGISTICS FOR PRODUCTS

Characteristics of Service Part Logistics for Products

Delivering after‐sales services is more complex than manufacturing products. When delivering after‐sales
services, firms have to deploy parts, people, and equipment at more locations than they do to make the products. An
after‐sales network has to support all the products a company has made in the past as well as those it currently makes.
As a result, the service network often has to cope with 20 times the number of stock‐keeping units that the
manufacturing function deals with. Businesses also have to train service personnel, who are dispersed all over the
world, in a variety of technical skills. Moreover, after‐sales networks operate in an unpredictable and inconsistent
marketplace because of the unpredictable nature of the demand for product repair.

In addition, companies must design a portfolio of service products, since different customers have different
service needs even though they may own the same product. Those needs also change with time. For example, the
failure of a computer in a nuclear power plant will have a more severe impact than when a computer in a library goes
down. Also, a grounded aircraft means more to an air force during a war than it does during the course of a training
exercise.

The management of service part logistics encompasses planning, fulfillment, and execution of service parts
through activities like demand forecasting, parts distribution, warehouse management, repair of parts, and
collaboration processes with all the relevant parties in the after‐sales service supply chain. In the next section we
present a framework that captures the main elements of service part logistics in the automotive and aerospace
industries.

Service Part Logistics in the Automotive and Aerospace Industries

A framework encompassing the main elements of service part logistics in the automotive and aerospace
industries is shown in Figure 20.13. We focus on the network configuration for delivering spare parts.

Customer Service Objectives and Goals

The amount of time it takes to restore a failed item is often seen as a key performance indicator, especially in
the aerospace industry, where any part unavailability translates into huge losses. Companies need to design a portfolio
of service products, as each customer segment demands a different level of service.

In general, both automotive and aircraft companies offer three different levels of service, as indicated in
Tables 20.2 and 20.3.

Supply Chain Network

A typical after‐sales supply chain network consists of four entities, namely the parts supplier or original
equipment manufacturer (OEM), the regional logistics center (RLC), the importers or country warehouses, and the
dealers. Three typical configurations that are commonly used are as follows:

• A centralized configuration where parts from suppliers will be stored in the RLC and delivered directly to
the dealers whenever demand arises

• A decentralized configuration where parts from a supplier will be forwarded to the RLC first. The RLC
usually breaks up the large shipments received from suppliers/OEM and then sends the smaller shipments to various
warehouses in other countries in the region. Some of these warehouses are owned by the company whilst others are
outsourced to third party logistics providers.

MAINTENANCE LOGISTICS FOR PLANTS

The turnaround maintenance (TAM) event affects and is affected by many internal and external stakeholders
in a wider supply chain context. In the petrochemical industry, plants feed each other; that is, the product of one plant
is the raw material of another. Also, a large number of plants in a given area (for example, Jubail in Saudi Arabia) will
compete for a limited number of subcontractors. As explained in the previous section, TAM also requires the ordering
of many spare parts, involves long lead times for items from suppliers, and sometimes the assistance of technology
providers is needed. This supply chain view of TAM is depicted in Figure 20.14. This system view requires integrated
TAM planning and coordination involving all stakeholders to secure maximal utilization of resources to benefit the
entire system.

In particular, the timing of TAM for the various plants should take into consideration the interdependence
between them to minimize the disturbance to the whole system. Timing coordination between plants and the sharing
of experiences can benefit an entire industry if the TAM event is viewed in this wider supply chain context.

This coordination of TAM events is also an important issue in the power‐generation industry. Unlike the
petrochemical industry, where an inventory buffer of final products and raw materials may be built ahead of a TAM
event, electricity cannot be stored. Thus, the timing of TAM is crucial to avoid an interruption to the electricity supply.

Many models have been developed in the literature to generate optimal maintenance schedules, taking into
account the interdependence between plants to minimize the adverse effect of plant shutdown on all stakeholders.

MAINTENANCE LOGISTICS FOR INFRASTRUCTURES

In this section we discuss some aspects of rail track maintenance logistics. Maintenance and renewal activities
for rail track require several items such as rails, switches and crossings (S&C), sleepers, and ballast and also
machinery such as welding machines, rail‐grinding machines, and so on. Here, we focus on the logistics for rails and
also provide some information on S&Cs, sleepers, and ballast.

Rail Logistics

Rails may be delivered directly from the plant to the renewal or maintenance site or they may be stocked for
later use. The welding of rails may be done on site or in welding plants located close to rail rolling mills. The rail
length may be up to 400 m and they are transported by train. This is the case for track renewal. However, for track
maintenance, shorter rails are often needed and are located at many sites along the track, since defects may occur in
any part of the network. In this case, the replacement short‐length rails are best stocked at discrete locations such as
maintenance depots. Hence, the most flexible logistical solution for the delivery of short‐length rails (up to 27 m) is by
road using flatbed trailers. Other logistics information in terms of lead time, number of suppliers, and relationships
with suppliers is indicated in Table 20.4 based on a study in the European Union.

MAINTENANCE PLANNING AND SCHEDULING

Maintenance planning is a key element of maintenance management and needs to be done at three levels:
strategic, tactical, and operational. At each level there are several issues that need to be addressed and effective
decision making requires a proper framework that captures the main issues and decisions at that level. At the tactical
level, the key issues include facility capacity planning (for carrying out maintenance actions), manpower needs, and
equipment and tool requirements. At the operational level, the key issue is scheduling and this depends on whether
maintenance is done on site or the failed item is brought to a workshop. Maintenance control is essential to ensure that
the planned maintenance and related activities are carried out properly. This involves monitoring, proper data
collection, and analysis to resolve any problems and guarantee continuous improvement. This chapter deals with these
topics.

MAINTENANCE PLANNING

Tactical‐Level Framework

At the tactical level, the key issues are (i) maintenance load forecasting and (ii) maintenance capacity
planning. Forecasting predicts the future demand for maintenance work considering age and planned workload, and
capacity planning ensures that adequate capacity is available to meet the planned and unplanned maintenance load.

Operational‐Level Framework

Operational‐level planning deals with the day‐to‐day preparation and execution of maintenance work. Key
issues include scheduling, work order planning, and execution.
Tactical‐Level Maintenance Planning

 Maintenance Load Forecasting

The maintenance load denotes the volume of maintenance work anticipated over time into the future and is made up of
the following two main components:

1. Planned maintenance: This includes all PM (preventive maintenance) work that has been planned and scheduled in
advance.

2. Unplanned maintenance: This includes all CM (corrective maintenance) work due to unforeseen breakdowns and
failures.

The load comprises the manpower, materials (including spares), and facilities (equipment and tools) needed on a
periodic basis (per month, quarter, or year) for the object being maintained. For products and plants, planned
maintenance is determined by the maintenance policies recommended by the OEM (original equipment manufacturer).
For infrastructures, planned maintenance is decided during the design process in the building of the object, taking into
account the anticipated usage and load, and needs to be revised over time based on the history of actual usage and
load. The unplanned maintenance depends on the degradation and failure of the object. For products and plants, the
expected number of CM actions required over a period depends on the age at the start of the period and the reliability
characteristics.

 Qualitative Methods for Forecasting

For a newly designed object there are often very limited data to evaluate its reliability or performance over time.
In the case of products, this translates into uncertainty in the form and parameters of the ROCOF (rate of occurrence
of failure). In this case, qualitative (or judgmental) forecasting methods can be used. The two commonly used methods
are as follows:

1. Panel consensus: This generates a forecast based on the average estimates of a group of experts. The idea is that a
panel of people from a variety of positions is able to develop a more reliable forecast than a narrower group. Panel
forecasts are developed through open meetings with free exchange of ideas from all levels of management and
individuals.

2. The Delphi method: This is a group technique in which a panel of experts is questioned individually about their
perceptions of future events. The experts do not meet as a group in order to reduce the possibility that consensus is
reached because of dominant personality factors. Instead, the forecasts and accompanying arguments are summarized
by an outside party and returned to the experts along with further questions. This continues until a consensus is
reached.

 Quantitative Methods for Forecasting

The MCF (mean cumulative function) of the object gives the expected number of CM actions as a function of the
period under consideration and is given by the integration of the ROCOF over this period. One can compute the
maintenance load (PM and CM) in each period, and Figure 19.3 is a typical plot of such a forecast.
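
The calculation described above can be sketched in a few lines. The linear ROCOF, the fixed PM hours per period, and the hours per CM action used here are illustrative assumptions, not values from the text; the expected number of CM actions in a period is obtained by integrating the ROCOF over that period with the trapezoidal rule.

```python
def rocof(t, a=0.5, b=0.1):
    """Assumed ROCOF: failures per month, rising linearly with age t."""
    return a + b * t

def expected_cm(t_start, t_end, intensity, steps=1000):
    """Trapezoidal integral of the ROCOF, i.e. MCF(t_end) - MCF(t_start)."""
    h = (t_end - t_start) / steps
    total = 0.5 * (intensity(t_start) + intensity(t_end))
    total += sum(intensity(t_start + i * h) for i in range(1, steps))
    return total * h

def maintenance_load(periods, period_length, pm_hours, cm_hours_per_action,
                     intensity):
    """Forecast labor hours per period: planned PM plus expected CM."""
    return [pm_hours + cm_hours_per_action *
            expected_cm(k * period_length, (k + 1) * period_length, intensity)
            for k in range(periods)]

load = maintenance_load(periods=3, period_length=1.0, pm_hours=10.0,
                        cm_hours_per_action=4.0, intensity=rocof)
print([round(x, 2) for x in load])  # [12.2, 12.6, 13.0]
```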

For consumer products, the OEM needs to forecast the maintenance requirements during the warranty period
from sales occurring over time. Time series modeling has been used for forecasting sales over different time periods.
The model is updated as new sales data become available. This is combined with the ROCOF to obtain the unplanned
maintenance load over time.

 Maintenance Capacity Planning

Capacity planning deals with the determination of the maintenance resources needed to meet the maintenance
load on a periodic basis. The resources required can be classified into four categories: (i) human resources, (ii) spare
parts and materials needed, (iii) facilities, equipment, and tools required, and (iv) information (documentation,
manuals, etc.) needed to carry out the maintenance tasks. Here, we focus on human resource capacity planning, and
the spare part issue is discussed in the next chapter.

Due to fluctuations in maintenance load from period to period, human resource capacity planning addresses
the following issues:

• Number of maintenance workers of various trades and skills;

• Correct level of work backlog;

• Overtime capacity;

• Contract maintenance capacity.

The purpose of maintenance capacity planning is to determine how to satisfy a fluctuating maintenance load
in each period. This is done by determining how much of each possible maintenance capacity (regular time, overtime,
subcontracting) should be planned to meet the maintenance load. Figure 19.4 illustrates this point for the case where
the demand is the human resource needed. In periods 1–6, the planned capacity is PC1 and in the subsequent periods it
is PC2 (>PC1). Note that when the demand exceeds the planned capacity, it needs to be met by either using overtime or
outsourcing some of the maintenance tasks.

An important objective of capacity planning is to minimize the total cost of labor and backlog over the
planning horizon. Many approaches have been proposed for determining the optimal capacity and they can be grouped
broadly into (i) deterministic (when uncertainty is insignificant) and (ii) stochastic (when uncertainty is significant).

Deterministic Approaches:

A common deterministic approach is to formulate the maintenance manpower capacity problem as a
mathematical program with:

1. Decision variables: The workforce size, number of workers hired (or fired), number of overtime hours,
number of regular hours, number of hours subcontracted, and number of hours backlogged each period.

2. Objective function: Minimize the total labor cost (regular, overtime, and subcontracted), the total cost of
hiring and firing, and the backlog cost.

3. Constraints: Balance equations for the maintenance load and workforce size between adjacent periods, and
limits on the overtime and subcontracting that can be used.

Many different models have been proposed and we present a mixed integer programming model.
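
As a far simpler stand-in for a full mixed integer program, the trade-off between regular time, overtime, and subcontracting can be explored by enumeration. The sketch below assumes a workforce size that stays fixed over the horizon (no hiring or firing), no backlog, and overtime that is cheaper than subcontracting, so overtime is always used first. All names, rates, and loads are hypothetical.

```python
def plan_cost(load, workers, hours_per_worker, max_overtime, c_reg, c_ot, c_sub):
    """Total cost of meeting each period's load with a fixed workforce.

    Assumes c_ot < c_sub, so excess load goes to overtime first and
    only the remainder is subcontracted.
    """
    regular_cap = workers * hours_per_worker
    cost = 0.0
    for demand in load:
        cost += c_reg * regular_cap              # staff are paid busy or idle
        excess = max(0.0, demand - regular_cap)
        overtime = min(excess, max_overtime * workers)
        cost += c_ot * overtime + c_sub * (excess - overtime)
    return cost

def best_workforce(load, max_workers, hours_per_worker, max_overtime,
                   c_reg, c_ot, c_sub):
    """Enumerate workforce sizes and return the cheapest one."""
    return min(range(max_workers + 1),
               key=lambda w: plan_cost(load, w, hours_per_worker,
                                       max_overtime, c_reg, c_ot, c_sub))

load = [300, 500, 420, 380]  # forecast maintenance hours per period
params = dict(hours_per_worker=160, max_overtime=20,
              c_reg=10, c_ot=15, c_sub=25)
workers = best_workforce(load, max_workers=4, **params)
print(workers, plan_cost(load, workers, **params))  # 3 19500.0
```

A mathematical program becomes necessary once the workforce can change between periods or work can be backlogged, because the periods are then linked and cannot be costed independently.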

Stochastic Approaches:

The stochastic approach is seldom used in practice as it involves complex model formulations and simulations
to carry out the analysis and optimization. An alternative is the deterministic approach with a safety
factor: the mean load is inflated to reduce the chance of demand exceeding the planned capacity because of uncertainty
in the load, and the problem is then treated as deterministic. The risk of demand not being met is reduced as the safety factor
increases, but this is achieved at the expense of the capacity being underutilized when demand is below capacity.
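
The effect of the safety factor can be illustrated numerically. The sketch below assumes, purely for illustration, that the maintenance load in a period is normally distributed; the planned capacity is the mean load multiplied by the safety factor, and the risk is the probability that the load exceeds it. The function name and figures are hypothetical.

```python
from statistics import NormalDist

def shortfall_risk(mean_load, sd_load, safety_factor):
    """P(load > planned capacity) when capacity = safety_factor * mean load.

    Assumption: the period load is Normal(mean_load, sd_load).
    """
    capacity = safety_factor * mean_load
    return 1.0 - NormalDist(mean_load, sd_load).cdf(capacity)

# Mean load 100 hours, standard deviation 20 hours.
for factor in (1.0, 1.1, 1.2, 1.3):
    risk = shortfall_risk(100.0, 20.0, factor)
    print(f"safety factor {factor:.1f}: shortfall risk {risk:.3f}")
```

With no inflation the capacity is exceeded in half of the periods; a factor of 1.2 places capacity one standard deviation above the mean in this example and cuts the risk to about 16%, at the price of idle capacity in most periods.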

OPERATIONAL‐LEVEL MAINTENANCE PLANNING

At the operational level, maintenance planning has the following main objectives:

1. Completion of maintenance work when it is needed, in a safe and efficient manner.

2. Minimization of lost production time due to maintenance.

3. Optimized utilization of maintenance labor and materials through effectively planned and balanced
schedules.

4. Equitable resource allocation based on understood criteria and the varying business needs of the internal
customers supported.

5. Minimization of labor delay and idle time through effective coordination with the departments concerned,
such as operations and stores.

It involves (i) work order planning and scheduling and (ii) maintenance scheduling.

 Work Order Planning and Scheduling

A work order form (paper or electronic) serves as the vehicle for communicating information related to
specific work requested for maintenance. The work order form must be designed to include two types of information:

• Information needed for planning and scheduling: This includes the requesting department, information about
the item to be maintained (inventory number, location), information about the work requested (description, priority,
etc.), information about resources needed (estimated time, types and trades, spare parts, tools, etc.), information about
methods, safety procedures, and technical information (drawings and manuals).

 Information needed for control: This includes actual time taken and spares and materials used and also the
causes and consequences of failures.

Planning

Work order planning is the advance preparation of maintenance work so that it can be executed in an efficient
and effective manner at some future date. The maintenance planner conducts a detailed analysis of each job to
determine and describe the work to be performed, the task sequence and methodology, plus the identification of
required resources – including skills, crew size, man‐hours, spare parts and materials, special tools and equipment.

An effective planner must have the following qualifications:

• Experience and familiarity with the engineered objects that need maintenance. This enables the planner to
estimate maintenance time and other resources and select the best methods.

• Good communication skills, as this job requires coordination with other departments.

• Familiarity with planning tools and techniques and data analysis methods.

The job of a maintenance planner is greatly enhanced by the use of a computerized maintenance management
system (CMMS). Such a system provides timely access to available resources that need to be planned. It also assists in
data collection and analysis and the generation of various reports, and is discussed in the next section.

SCHEDULING

Scheduling is the process by which required resources are allocated to specific jobs at a certain point in time
when the engineered object is available or the job site is accessible. Effective scheduling requires coordination with
production personnel.

Priorities are established in coordination with maintenance customers to ensure that the most urgent jobs are
scheduled first. Most maintenance departments have three or four levels of priorities that are clearly defined, including
time frames for starting the work (e.g., urgent, normal, scheduled).

Maintenance schedules are usually prepared for different time frames. A long‐range schedule may cover a
period between three months and one year (for example, a schedule for rail track maintenance). It is usually based on
open work orders, PM work orders, and anticipated CM. The long‐range schedule is usually broken down into weekly
schedules that are, in turn, broken down into daily schedules. These schedules are continuously updated in light of any
changes to original plans.
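
Breaking a schedule down to the daily level by priority can be sketched as a simple list-building routine. The priority names follow the example levels mentioned above; the work order fields, identifiers, and hours are hypothetical, and a real scheduler would also check trades, spare parts, and access to the object.

```python
from dataclasses import dataclass

# Lower rank means the job is scheduled earlier.
PRIORITY_RANK = {"urgent": 0, "normal": 1, "scheduled": 2}

@dataclass
class WorkOrder:
    order_id: str
    priority: str
    hours: float

def build_daily_schedule(orders, available_hours):
    """Fill the day's labor hours in priority order; defer what does not fit."""
    scheduled, deferred = [], []
    remaining = available_hours
    # sorted() is stable, so orders of equal priority keep their request order
    for wo in sorted(orders, key=lambda o: PRIORITY_RANK[o.priority]):
        if wo.hours <= remaining:
            scheduled.append(wo.order_id)
            remaining -= wo.hours
        else:
            deferred.append(wo.order_id)
    return scheduled, deferred

orders = [
    WorkOrder("WO-1", "normal", 4.0),
    WorkOrder("WO-2", "urgent", 3.0),
    WorkOrder("WO-3", "scheduled", 2.0),
    WorkOrder("WO-4", "urgent", 5.0),
]
scheduled, deferred = build_daily_schedule(orders, available_hours=10.0)
print(scheduled, deferred)  # ['WO-2', 'WO-4', 'WO-3'] ['WO-1']
```

Deferred orders roll over to the next day's list, which is one way the schedules end up being continuously updated.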

Execution

Good planning is a prerequisite for good execution. An effective planning function eliminates unnecessary
waste from the work process, so that all materials, tools, support services, and technical information are ready for
technicians to start the job without delay.

As mentioned earlier, the work order system plays a key role in administering, monitoring, approving, and
collecting data about all maintenance jobs. In particular, data are collected about the actual time taken, the spare parts
used, and the cause of the failure in case of CM actions. The approval process for execution of jobs ensures quality
and identifies training needs. This information is crucial for a maintenance control system and is the cornerstone of
continuous improvement.

Maintenance Scheduling:

If planning specifies how to do maintenance‐related jobs, then scheduling specifies when to do them.
Maintenance scheduling deals with the decisions regarding when specific maintenance tasks are to be carried out
(either at the service facility or on site). It needs to take into account various other issues – an important one being the
interaction between maintenance and production/operations departments.

Maintenance scheduling is object (product/plant/infrastructure) specific and depends on whether the
maintenance needs to be done on site or whether the object is being brought to a service facility. Many maintenance
jobs are of short duration and can be handled by the work order system described in the previous section; other jobs,
such as turnaround maintenance (TAM) for plants or major maintenance of infrastructures, are major projects that
need extensive planning and scheduling, and a variety of techniques is used.

Scheduling Techniques:

There are many techniques that can assist a scheduler in developing effective schedules. Some of these are
graphical in nature and can be very helpful in following up the execution, especially for lengthy jobs. Other techniques
are used to obtain optimal schedules in terms of cost or some other criterion, taking into account the needs of the
operations department, the coordination of the maintenance of similar units, and so on. Two commonly used methods
are outlined below.

 Critical Path Methods

In terms of graphical methods, critical path methods (CPMs) are commonly used for large projects with
complex precedence relationships between maintenance tasks, such as TAM. CPM scheduling is a graphical
technique used for illustrating activity sequences, together with each activity’s expected duration, to portray project
execution steps in precedence order. Several commercial software packages are available for this purpose.

Development of a CPM schedule begins by representing the project graphically by a network built up from
circles (nodes) and arrows (directed arcs) which lead up to or emerge from the circles. Usually, the circles represent
activities. Connecting the circles with arrows represents a sequence of activities in which each one is dependent on the
previous one. In other words, the earlier activity must be completed in order to begin the next activity. Graphing out
the job activities and dependencies to develop the network requires good knowledge of the constituent parts of the
project.

Example 19.1 CPM Method

The following simple CPM example illustrates how the critical path is determined given a certain number of
activities, their precedence relationships, and their durations. Consider the network shown in Figure 19.5, where the
activities are the nodes and the duration of each activity is shown on the arc out of the node. The arc out of a node
points to its successor activity. One can follow the arrows backwards to find what is required for each task and follow
them forwards to see what task is next.

The critical path is found by calculating the earliest start and finish times for each node, beginning from the
start point and moving forward to the end node. This is called the forward pass. The results are indicated in the upper
part of the table above each node. The backward pass calculates the latest start and finish times for each activity. The
results are indicated in the lower part of the table above each node, as shown in Figure 19.5.

The critical path is then identified from the slack time of each activity, which is the difference between its
latest and earliest start times (equivalently, between its latest and earliest finish times). The critical path is the
path along which the earliest and latest times coincide, so there is no slack in these activities: a delay in any of them
leads to a delay in the entire project. Activities that have slack time may be delayed by up to that amount without
causing a delay in the entire project. Such activities are not on the critical path. The critical path for this example
is then N1‐N2‐N5‐N6‐N7.

Mathematical Programming Techniques:

Scheduling of maintenance work for a fleet of objects (buses, airplanes, locomotives, etc.), or a large number
of interdependent plants, leads to complex problems with many constraints arising from the need to coordinate
maintenance timing with the operational requirements of the engineered objects. Finding optimal maintenance
schedules that minimize the overall maintenance cost subject to various constraints may be formulated as a
mathematical programming problem.
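A very small instance of such a formulation is sketched below. It is an illustration under stated assumptions: the three buses, the cost table (cost of taking each bus out of service in each week), and the workshop capacity of one bus per week are all hypothetical. Exhaustive enumeration stands in for a real solver, which would be needed for problems of practical size.

```python
from itertools import product

# Hypothetical cost of maintaining each unit in week 0, 1 or 2.
cost = {
    "bus_A": [4, 6, 9],
    "bus_B": [5, 4, 7],
    "bus_C": [3, 5, 6],
}
WEEKS = 3
CAPACITY = 1  # assumed: the workshop can take one bus per week


def best_schedule(cost, weeks, capacity):
    """Enumerate every assignment of units to weeks and keep the cheapest feasible one."""
    units = list(cost)
    best_plan, best_cost = None, float("inf")
    for plan in product(range(weeks), repeat=len(units)):
        # Constraint: do not exceed workshop capacity in any week.
        if any(plan.count(w) > capacity for w in range(weeks)):
            continue
        total = sum(cost[u][w] for u, w in zip(units, plan))
        if total < best_cost:
            best_plan, best_cost = dict(zip(units, plan)), total
    return best_plan, best_cost


plan, total_cost = best_schedule(cost, WEEKS, CAPACITY)
print(plan, total_cost)
```

For this data the cheapest feasible plan maintains bus_A in week 0, bus_B in week 1 and bus_C in week 2, at a total cost of 14. The decision variables (which unit in which week), the objective (total cost) and the constraint (capacity) are the same ingredients that a mathematical programming model would state formally.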

--------------------------------------------------

EXTRA

The most widely accepted list of logistics activities includes:

 Reliability engineering, maintainability engineering and maintenance (preventive, predictive and corrective)
planning
 Supply (spare part) support (acquiring resources)
 Support and test equipment/equipment support
 Manpower and personnel
 Training and training support
 Technical data/publications
 Computer resources support
 Facilities
 Packaging, handling, storage and transportation
 Design interface
Decisions are documented in a life cycle sustainment plan (LCSP), a Supportability Strategy, or (most
commonly) an Integrated Logistics Support Plan (ILSP). ILS planning activities coincide with development of the
system acquisition strategy, and the program will be tailored accordingly. A properly executed ILS strategy will
ensure that the requirements for each of the elements of ILS are properly planned, resourced, and implemented. These
actions will enable the system to achieve the operational readiness levels required by the war fighter at the time of
fielding and throughout the life cycle.[2][3] ILS can be also used for civilian projects, as highlighted by the ASD/AIA
ILS Guide.[4]
It is considered common practice within some industries - primarily Defence - for ILS practitioners to take a
leave of absence to undertake an ILS Sabbatical; furthering their knowledge of the logistics engineering disciplines.
ILS Sabbaticals are normally taken in developing nations - allowing the practitioner an insight into sustainment
practices in an environment of limited materiel resources.
ILS is a technique introduced by the US Army to ensure that the supportability of an equipment item is considered
during its design and development. The technique was adopted by the UK MoD in 1993 and made compulsory for the
procurement of the majority of MOD equipment.

 Influence on Design. Integrated Logistic Support will provide important means to identify (as early as
possible) reliability issues / problems and can initiate system or part design improvements based on reliability,
maintainability, testability or system availability analysis
 Design of the Support Solution for minimum cost. Ensuring that the Support Solution considers and
integrates the elements considered by ILS. This is discussed fully below.
 Initial Support Package. These tasks include calculation of requirements for spare parts, special tools, and
documentation. Quantities required for a specified initial period are calculated, procured, and delivered to support
delivery, installation in some of the cases, and operation of the equipment.
The ILS management process facilitates specification, design, development, acquisition, test, fielding, and support of
systems.

Maintenance Planning
Maintenance planning begins early in the acquisition process with development of the maintenance concept. It is
conducted to evolve and establish requirements and tasks to be accomplished for achieving, restoring, and maintaining
the operational capability for the life of the system. Maintenance planning also involves Level Of Repair Analysis
(LORA) as a function of the system acquisition process. Maintenance planning will:

 Define the actions and support necessary to ensure that the system attains the specified system readiness
objectives with minimum Life Cycle Cost (LCC).
 Set up specific criteria for repair, including Built-In Test Equipment (BITE) requirements, testability,
reliability, and maintainability; support equipment requirements; automatic test equipment; and manpower skills
and facility requirements.
 State specific maintenance tasks, to be performed on the system.
 Define actions and support required for fielding and marketing the system.
 Address warranty considerations.
 The maintenance concept must ensure prudent use of manpower and resources. When formulating the
maintenance concept, the effect of the proposed work environment on the health and safety of maintenance
personnel must be analyzed.
 Conduct a LORA to optimize the support system, in terms of LCC, readiness objectives, design
for discard, maintenance task distribution, support equipment and ATE, and manpower and personnel
requirements.
 Minimize the use of hazardous materials and the generation of waste

SUPPLY SUPPORT
Supply support encompasses all management actions, procedures, and techniques used to determine requirements to:

 Acquire support items and spare parts.
 Catalog the items.
 Receive the items.
 Store and warehouse the items.
 Transfer the items to where they are needed.
 Issue the items.
 Dispose of secondary items.
 Provide for initial support of the system.
 Acquire, distribute, and replenish inventory

SUPPORT AND TEST EQUIPMENT


Support and test equipment includes all equipment, mobile and fixed, that is required to perform the support functions,
except that equipment which is an integral part of the system. Support equipment categories include:

 Handling and Maintenance Equipment.
 Tools (hand tools as well as power tools).
 Metrology and measurement devices.
 Calibration equipment.
 Test equipment.
 Automatic test equipment.
 Support equipment for on- and off-equipment maintenance.
 Special inspection equipment and depot maintenance plant equipment, which includes all equipment and tools
required to assemble, disassemble, test, maintain, and support the production and/or depot repair of end items or
components.
This also encompasses planning and acquisition of logistic support for this equipment.

MANPOWER AND PERSONNEL


Manpower and personnel involves identification and acquisition of personnel with skills and grades required to
operate and maintain a system over its lifetime. Manpower requirements are developed and personnel assignments are
made to meet support demands throughout the life cycle of the system. Manpower requirements are based on related
ILS elements and other considerations. Human factors engineering (HFE) or behavioral research is frequently applied
to ensure a good man-machine interface. Manpower requirements are predicated on accomplishing the logistics
support mission in the most efficient and economical way. This element includes requirements during the planning
and decision process to optimize numbers, skills, and positions. This area considers:

 Man-machine and environmental interface
 Special skills
 Human factors considerations during the planning and decision process

TRAINING AND TRAINING DEVICES


Training and training devices support encompasses the processes, procedures, techniques, training devices, and
equipment used to train personnel to operate and support a system. This element defines qualitative and quantitative
requirements for the training of operating and support personnel throughout the life cycle of the system. It includes
requirements for:

 Competencies management
 Factory training
 Instructor and key personnel training
 New equipment training team
 Resident training
 Sustainment training
 User training
 HAZMAT disposal and safe procedures training
Embedded training devices, features, and components are designed and built into a specific system to provide training
or assistance in the use of the system. (One example of this is the HELP files of many software programs.) The design,
development, delivery, installation, and logistic support of required embedded training features, mockups, simulators,
and training aids are also included.

TECHNICAL DATA
Technical Data and Technical Publications consist of scientific or technical information necessary to translate system
requirements into discrete engineering and logistic support documentation. Technical data is used in the development
of repair manuals, maintenance manuals, user manuals, and other documents that are used to operate or support the
system. Technical data includes, but may not be limited to:

 Technical manuals
 Technical and supply bulletins
 Transportability guidance technical manuals
 Maintenance expenditure limits and calibration procedures
 Repair parts and tools lists
 Maintenance allocation charts
 Corrective maintenance instructions
 Preventive maintenance and Predictive maintenance instructions
 Drawings/specifications/technical data packages
 Software development documentation
 Provisioning documentation
 Depot maintenance work requirements
 Identification lists
 Component lists
 Product support data
 Flight safety critical parts list for aircraft
 Lifting and tie down pamphlet/references
 Hazardous Material documentation

COMPUTER RESOURCES SUPPORT


Computer Resources Support includes the facilities, hardware, software, documentation, manpower, and
personnel needed to operate and support computer systems and the software within those systems. Computer resources
include both stand-alone and embedded systems. This element is usually planned, developed, implemented, and
monitored by a Computer Resources Working Group (CRWG) or Computer Resources Integrated Product Team (CR-
IPT) that documents the approach and tracks progress via a Computer Resources Life-Cycle Management Plan
(CRLCMP). Developers will need to ensure that planning actions and strategies contained in the ILSP and CRLCMP
are complementary and that computer resources support for the operational software, ATE software, and support
software is available where and when needed.

PACKAGING, HANDLING, STORAGE AND TRANSPORTATION (PHS&T)


This element includes resources and procedures to ensure that all equipment and support items are preserved,
packaged, packed, marked, handled, transported, and stored properly for short- and long-term requirements. It
includes material-handling equipment and packaging, handling and storage requirements, and pre-positioning of
material and parts. It also includes preservation and packaging level requirements and storage requirements (for
example, sensitive, proprietary, and controlled items). This element includes planning and programming the details
associated with movement of the system in its shipping configuration to the ultimate destination via transportation
modes and networks available and authorized for use. It further encompasses establishment of critical engineering
design parameters and constraints (e.g., width, length, height, component and system rating, and weight) that must be
considered during system development. Customs requirements, air shipping requirements, rail shipping requirements,
container considerations, special movement precautions, mobility, and transportation asset impact of the shipping
mode or the contract shipper must be carefully assessed. PHS&T planning must consider:
 System constraints (such as design specifications, item configuration, and safety precautions for hazardous
material)
 Special security requirements
 Geographic and environmental restrictions
 Special handling equipment and procedures
 Impact on spare or repair parts storage requirements
 Emerging PHS&T technologies, methods, or procedures and resource-intensive PHS&T procedures
 Environmental impacts and constraints

FACILITIES

The Facilities logistics element is composed of a variety of planning activities, all of which are directed toward
ensuring that all required permanent or semi-permanent operating and support facilities (for instance, training, field
and depot maintenance, storage, operational, and testing) are available concurrently with system fielding. Planning
must be comprehensive and include the need for new construction as well as modifications to existing facilities. It also
includes studies to define and establish impacts on life cycle cost, funding requirements, facility locations and
improvements, space requirements, environmental impacts, duration or frequency of use, safety and health standards
requirements, and security restrictions. Also included are any utility requirements, for both fixed and mobile facilities,
with emphasis on limiting requirements of scarce or unique resources.

DESIGN INTERFACE
Design interface is the relationship of logistics-related design parameters of the system to its projected or actual
support resource requirements. These design parameters are expressed in operational terms rather than as inherent
values and specifically relate to system requirements and support costs of the system. Programs such as "design for
testability" and "design for discard" must be considered during system design. The basic requirements that need to be
considered as part of design interface include:

 Reliability
 Maintainability
 Standardization
 Interoperability
 Safety
 Security
 Usability
 Environmental and HAZMAT
 Privacy, particularly for computer systems
 Legal

UNIT IV

MAINTENANCE QUALITY

Introduction

The development of a sound quality control system for maintenance is essential for ensuring high-quality repair,
accurate standards, maximum availability, extended equipment life cycle, and efficient equipment production rates. Quality
control as an integrated system has been practiced with more intensity in production and manufacturing operations
than in maintenance. Although the role of maintenance in the long-term profitability of an organization has been
realized, the issues relating to the quality of maintenance output have not been adequately formulated. Possible
reasons include the following:

1. The output of the maintenance department is difficult to define and measure.

2. Customer focus is lacking in maintenance as compared to production.

3. A large portion of maintenance is non-repetitive.

4. Work conditions vary more in maintenance work than in production.

5. Traditionally, maintenance has been regarded as a necessary evil and at best a secondary system driven by
production. This viewpoint has led to the assignment of a low priority to improvement of maintenance activities.

The quality of maintenance output has a direct link to product quality and the ability of a company to meet
delivery schedules. In general terms, equipment that is not well maintained or that is maintained with poor
workmanship fails periodically, experiences speed losses, or loses precision, and hence tends to produce
defects. More often than not, such equipment drives manufacturing processes out of control. A process that is out of
control or has poor capability produces defective products, which lead to lower profitability and greater customer
dissatisfaction.

A clear organization of the quality control function and the specification of its role (responsibilities) in the
maintenance system should be emphasized by the organization’s top management. The responsibilities include
development of testing and inspection procedures, documentation, follow-up, deficiency analysis, and help in
identifying training needs from the analysis of quality reports.

Maintenance managers and engineers need to be aware of the importance of controlling the quality of
maintenance output. Maintenance testing and inspection standards and acceptable quality levels
should be developed for all maintenance work. Documentation of maintenance procedures and inspection reports can
provide tremendous opportunities for maintenance quality improvement. These opportunities can be realized by
continuous improvement of the procedures, and the identification of training needs to enhance craft technical skills.

Many maintenance activities are not repetitive, and large numbers of observations for such activities cannot be
collected for statistical analysis. For such activities, process control techniques provide valuable tools for improving maintenance
processes.

Organizations should strive to tie their maintenance activities to the quality of their products and services.
Also, they should create a focus on their internal customers. This will provide them with direction and goals for
improving their maintenance processes.

Responsibilities of Quality Control (QC):

1. Performing inspections of maintenance actions, procedures, equipment, and facilities.

2. Maintaining and upgrading maintenance documents, procedures, and standards.

3. Ensuring that all units are aware of and proficient in maintenance procedures and standards.

4. Maintaining a high level of expertise by keeping up to date with the publications concerning maintenance
procedures and records.

5. Providing input in the training of maintenance personnel.

6. Performing deficiency analysis and process improvement studies using various statistical process control
tools.

7. Ensuring that all the technical and management procedures are adhered to by crafts when performing actual
maintenance.

8. Reviewing the job time standards to evaluate their adequacy.

9. Reviewing material and spare parts to ensure their availability and quality.

10. Performing maintenance audits to assess the current maintenance situation and prescribe remedies for
deficient areas.

11. Establishing certification and authorization of personnel performing highly specialized critical tasks.

12. Developing procedures for new equipment inspections and testing the equipment prior to acceptance from
vendors.

In summary, quality control in maintenance is responsible for ensuring the quality objectives for resources,
procedures, and standards used in the maintenance process are met. In addition, it performs inspection of maintenance
jobs and tests of equipment prior to acceptance or operation.

It is essential to have the QC personnel as independent as possible, and they must not be an extension of the
workforce. Also, they should not perform production inspections, as such inspections can be assigned to production
inspectors or workshop supervisors. Personnel comprising the quality control unit must be highly qualified technicians
or engineers with extensive training in areas such as productivity improvement, statistical process control, process
improvement, techniques for planning and scheduling, and work measurements.

In large organizations such as airline companies, air forces, army units, and railroad companies, it is necessary
to have a quality control division within the maintenance department. This division will report to the maintenance
manager. In medium-size organizations, a small unit will do the job; however, in small-size organizations, one or two
inspectors attached to the manager’s office or the planning unit can perform the function of quality control.

BASIC CONCEPTS OF FMEA AND FMECA

Failure Mode and Effects Analysis (FMEA) and Failure Modes, Effects and Criticality Analysis (FMECA) are
methodologies designed to identify potential failure modes for a product or process, to assess the risk associated with
those failure modes, to rank the issues in terms of importance and to identify and carry out corrective actions to
address the most serious concerns.

Although the purpose, terminology and other details can vary according to type (e.g. Process FMEA, Design FMEA,
etc.), the basic methodology is similar for all. This section presents a brief general overview of FMEA / FMECA
analysis techniques and requirements.

FMEA / FMECA Overview


In general, FMEA / FMECA requires the identification of the following basic information:

 Item(s)
 Function(s)
 Failure(s)
 Effect(s) of Failure
 Cause(s) of Failure
 Current Control(s)
 Recommended Action(s)
 Plus other relevant details

Most analyses of this type also include some method to assess the risk associated with the issues identified during the
analysis and to prioritize corrective actions. Two common methods include:

 Risk Priority Numbers (RPNs)
 Criticality Analysis (FMEA with Criticality Analysis = FMECA)

Basic Analysis Procedure for FMEA or FMECA
The basic steps for performing an FMEA/FMECA analysis include:

 Assemble the team.
 Establish the ground rules.
 Gather and review relevant information.
 Identify the item(s) or process(es) to be analyzed.
 Identify the function(s), failure(s), effect(s), cause(s) and control(s) for each item or process to be analyzed.
 Evaluate the risk associated with the issues identified by the analysis.
 Prioritize and assign corrective actions.
 Perform corrective actions and re-evaluate risk.
 Distribute, review and update the analysis, as appropriate.
Risk Evaluation Methods
A typical FMEA incorporates some method to evaluate the risk associated with the potential problems identified
through the analysis. The two most common methods, Risk Priority Numbers and Criticality Analysis, are described
next.

Risk Priority Numbers


To use the Risk Priority Number (RPN) method to assess risk, the analysis team must:

 Rate the severity of each effect of failure.
 Rate the likelihood of occurrence for each cause of failure.
 Rate the likelihood of prior detection for each cause of failure (i.e. the likelihood of detecting the problem
before it reaches the end user or customer).
 Calculate the RPN by obtaining the product of the three ratings:
RPN = Severity x Occurrence x Detection
The RPN can then be used to compare issues within the analysis and to prioritize problems for corrective action.
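The RPN calculation can be sketched in Python as below. The failure causes and their ratings are hypothetical. The sketch assumes a 1 to 10 rating scale and assumes that a higher detection rating means the problem is harder to detect before it reaches the customer; neither assumption is stated in the text above.

```python
# Hypothetical ratings on an assumed 1-10 scale.
failure_causes = [
    {"cause": "bearing wear", "severity": 7, "occurrence": 5, "detection": 4},
    {"cause": "seal leak", "severity": 5, "occurrence": 6, "detection": 3},
    {"cause": "motor overheating", "severity": 9, "occurrence": 2, "detection": 6},
]


def rpn(entry):
    """RPN = Severity x Occurrence x Detection."""
    return entry["severity"] * entry["occurrence"] * entry["detection"]


# Highest RPN first: these are the first candidates for corrective action.
ranked = sorted(failure_causes, key=rpn, reverse=True)
for entry in ranked:
    print(entry["cause"], rpn(entry))
```

For these numbers the RPNs are 140, 108 and 90, so bearing wear is addressed first. Because the RPN is a product of ordinal ratings, it is best read as a ranking aid within one analysis rather than as an absolute measure of risk.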

Criticality Analysis
The MIL-STD-1629A document describes two types of criticality analysis: quantitative and qualitative. To use the
quantitative criticality analysis method, the analysis team must:

 Define the reliability/unreliability for each item, at a given operating time.
 Identify the portion of the item's unreliability that can be attributed to each potential failure mode.
 Rate the probability of loss (or severity) that will result from each failure mode that may occur.
 Calculate the criticality for each potential failure mode by obtaining the product of the three factors:
Mode Criticality = Item Unreliability x Mode Ratio of Unreliability x Probability of Loss
 Calculate the criticality for each item by obtaining the sum of the criticalities for each failure mode that has
been identified for the item.
Item Criticality = SUM of Mode Criticalities
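A short Python sketch of the two formulas above follows. The item, its unreliability, the mode ratios and the probabilities of loss are hypothetical numbers chosen for illustration; the mode ratios are assumed to sum to 1 for the item.

```python
# Hypothetical item: a valve with unreliability 0.10 at the operating time of interest.
item_unreliability = 0.10

# For each failure mode: (share of the item's unreliability, probability of loss).
failure_modes = {
    "fails to open": (0.6, 1.0),
    "leaks": (0.4, 0.5),
}


def mode_criticality(unreliability, mode_ratio, prob_loss):
    """Mode Criticality = Item Unreliability x Mode Ratio x Probability of Loss."""
    return unreliability * mode_ratio * prob_loss


mode_crit = {
    name: mode_criticality(item_unreliability, ratio, loss)
    for name, (ratio, loss) in failure_modes.items()
}

# Item Criticality = sum of the mode criticalities.
item_criticality = sum(mode_crit.values())

for name, value in mode_crit.items():
    print(name, round(value, 4))
print("item", round(item_criticality, 4))
```

Here the two mode criticalities come to about 0.06 and 0.02, giving an item criticality of about 0.08.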
To use the qualitative criticality analysis method to evaluate risk and prioritize corrective actions, the analysis team
must:

 Rate the severity of the potential effects of failure.
 Rate the likelihood of occurrence for each potential failure mode.

Applications and Benefits

 The FMEA / FMECA analysis procedure is a tool that has been adapted in many different ways for many
different purposes. It can contribute to improved designs for products and processes, resulting in higher
reliability, better quality, increased safety, enhanced customer satisfaction and reduced costs. The tool can
also be used to establish and optimize maintenance plans for repairable systems and/or contribute to control
plans and other quality assurance procedures. It provides a knowledge base of failure mode and corrective
action information that can be used as a resource in future troubleshooting efforts and as a training tool for
new engineers. In addition, an FMEA or FMECA is often required to comply with safety and quality
requirements, such as ISO 9001, QS 9000, ISO/TS 16949, Six Sigma, FDA Good Manufacturing Practices
(GMPs), Process Safety Management Act (PSM), etc.

 ReliaSoft's Xfmea software facilitates analysis, data management and reporting for failure mode and effects
analysis (FMEA) and failure modes, effects and criticality analysis (FMECA). The software supports all
major standards (AIAG FMEA-3, J1739, ARP5580, MIL-STD-1629A, etc.) and provides extensive
customization capabilities for analysis and reporting, allowing you to configure the software to meet your
organization's specific analysis and reporting procedures for all types of FMEA / FMECA.

What is Failure Mode, Effects & Criticality Analysis (FMECA)


FMECA is a bottom-up (Hardware) or top-down (Functional) approach to risk assessment. It is inductive, or data-
driven, linking elements of a failure chain as follows: Effect of Failure, Failure Mode and Causes/Mechanisms. These
elements closely resemble the modern 5 Why technique in Root Cause Analysis (RCA). The Effect of Failure
duplicates the experience of a user/customer and is then translated into the technical failure description or Failure
Mode. The technical failure description answers the next question “Why”, introducing causes that result in the failure
mode. Each failure mode has a probability assigned and each cause has a failure rate assigned. If data is not available,
probability of occurrence is assigned. The probability depends on the failure data source documents utilized in the
FMECA. Unlike 5 Why, the FMECA is performed prior to any failure actually occurring. FMECA analyzes risk,
which is measured by criticality (the combination of severity and probability), to take action and thus provide an
opportunity to reduce the possibility of failure.

Two quantitative and one qualitative options exist for FMECA Criticality as identified below:
1. Quantitative
o Mode Criticality = Item Unreliability x Mode Ratio of Unreliability x Probability of Loss x Time
(life)
o Item Criticality = Sum of Mode Criticalities
2. Qualitative
o Compare failure modes via a Criticality Matrix, which identifies severity on the horizontal axis and
qualitatively derived occurrence on the vertical axis
o Note: Quality-One suggests a qualitative criticality matrix for the Quality-One Three Path Model for
FMEA Development. Severity is on the vertical axis and occurrence is depicted on the horizontal axis. This
is often used as an alternative for the Risk Priority Number (RPN) in FMEA.
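The qualitative option can be illustrated with a small Python sketch that places failure modes in a criticality matrix, with severity on the horizontal axis and occurrence on the vertical axis. The level labels and the three failure modes are hypothetical placeholders, not taken from any particular standard.

```python
# Hypothetical level labels: "I" is the most severe, "A" the most frequent.
SEVERITY = ["I", "II", "III", "IV"]
OCCURRENCE = ["A", "B", "C", "D", "E"]

# Hypothetical failure modes rated as (severity level, occurrence level).
modes = {"FM1": ("II", "B"), "FM2": ("IV", "A"), "FM3": ("I", "E")}


def build_matrix(modes):
    """Return a dict mapping (occurrence, severity) cells to the modes placed there."""
    matrix = {(occ, sev): [] for occ in OCCURRENCE for sev in SEVERITY}
    for name, (sev, occ) in modes.items():
        matrix[(occ, sev)].append(name)
    return matrix


matrix = build_matrix(modes)

# Print one row per occurrence level, one column per severity level.
print("     " + "".join(f"{sev:<8}" for sev in SEVERITY))
for occ in OCCURRENCE:
    cells = "".join(f"{','.join(matrix[(occ, sev)]) or '-':<8}" for sev in SEVERITY)
    print(f"{occ:<5}{cells}")
```

Failure modes that land near the corner of high severity and high occurrence are compared first when prioritizing corrective actions.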

Why Perform Failure Mode, Effects & Criticality Analysis (FMECA)


The intent of the Failure Mode, Effects & Criticality Analysis methodology is to increase knowledge of risk and
prevent failure. The tangible benefits of FMECA are offered in the following categories:

Design and Development Benefits
 Increased reliability
 Better quality
 Higher safety margins
 Decreased development time and re-design
Operations Benefits
 More effective Control Plans
 Improved Verification and Validation testing requirements
 Optimized preventive and predictive maintenance
 Reliability growth analysis during product development
 Decreased waste and non-value added operations (Lean Operation and Manufacturing)
Cost Benefits
 Recognize failure modes in advance (when they are less costly to address)
 Minimized warranty costs
 Increased sales from customer satisfaction
How to Perform Failure Mode, Effects & Criticality Analysis (FMECA)
The basic assumption when performing FMECA instead of FMEA is the desire to have a more quantitative risk
determination. The FMEA utilizes a more multi-functional team using guidelines to set Severity and Occurrence. The
FMECA is performed by first completing an FMEA process worksheet and then completing the FMECA Criticality
Worksheet.
The general steps for FMECA development are as follows:
 FMEA Portion
o Define the system
o Define ground rules and assumptions to help drive the design
o Construct system Boundary Diagrams and Parameter Diagrams
o Identify failure modes
o Analyze failure effects
o Determine causes of the failure modes
o Feed results back into design process
 FMECA Portion
o Transfer Information from the FMEA to the FMECA
o Classify the failure effects by severity (change to FMECA severity)
o Perform criticality calculations
o Rank failure mode criticality and determine highest risk items
o Take mitigation actions and document the remaining risk with rationale
o Follow-up on corrective action implementation/effectiveness
FMECA can often become time consuming and therefore available resources and team interest can be an issue as the
process continues. Quality-One has developed the FMECA process below to utilize engineering resources effectively
and ensure the FMECA has been developed thoroughly. The Quality-One approach is as follows:
Step 1: Perform the FMEA
The FMEA is a good starting place for the FMECA. FMEA allows for qualitative, and therefore creative, inputs from
a multi-disciplined engineering team. FMEA provides the first inputs into design change and can jump start the risk
mitigation process. The FMEA information is transferred into the FMECA Criticality Worksheet. The transferred
data from the FMEA worksheet will include:
 Item Identification Number
 Item / Function
 Detailed Function and / or Requirements
 Failure Modes and Causes with Mechanisms of Failure
 Mission Phase or Operational Mode (DoD specific), often related to the Effects of Failure

Step 2: Determine Severity Level
Next, assign the Severity Level of each Effect of Failure. There are various severity tables to select from. The
following is used in medical and some aerospace activities. The actual descriptions can be altered to fit any product or
process design. There are generally four severity level classifications as follows:
 Catastrophic: Could result in death, permanent total disability, loss exceeding $1M, or irreversible severe
environmental damage that violates law or regulation
 Major/High Impact: Permanent partial disability, injuries or occupational illness resulting in hospitalization of
3 or more personnel, loss exceeding $200K but less than $1M, or reversible environmental damage causing a
violation of law or regulation
 Minor Impact: Could result in injury or occupational illness resulting in one or more lost work day(s), loss
exceeding $10K but less than $200K, or mitigatable environmental damage without violation of law or regulation
where restoration activities can be accomplished
 Low Impact: Result in minor injury or illness not resulting in a lost work day, loss exceeding $2K but less
than $10K, or minimal environmental damage
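The monetary part of the severity table can be sketched as a simple lookup. This is only an illustration: the function name is hypothetical, it looks at dollar loss alone (injury and environmental criteria still need engineering judgement), and it assumes that a loss exactly on a boundary belongs to the higher class, which the table above does not specify.

```python
from typing import Optional

# Dollar thresholds taken from the severity table above.
# Assumption: a loss exactly on a boundary is placed in the higher class.
LOSS_THRESHOLDS = [
    (1_000_000, "Catastrophic"),
    (200_000, "Major/High Impact"),
    (10_000, "Minor Impact"),
    (2_000, "Low Impact"),
]


def severity_from_loss(loss_usd: float) -> Optional[str]:
    """Return the severity level implied by monetary loss alone."""
    for threshold, level in LOSS_THRESHOLDS:
        if loss_usd >= threshold:
            return level
    return None  # below the lowest tabulated loss


print(severity_from_loss(250_000))
```

In practice the final severity level would be the worst level reached by any of the criteria (injury, loss, environmental damage), not the monetary one alone.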
Step 3: Failure Effect Probability
In some applications of FMECA, a Beta value is assigned to the Failure Effect Probability. The FMECA analyst may
also use engineering judgement to determine the Beta value. The Beta / Effect Probability is placed in the FMECA
Criticality Worksheet where:
 Actual Loss / 1.00
 Probable loss / >0.10 to <1.00
 Possible loss / >0 to ≤0.10
 No Effect / 0
A failure mode ratio is developed by assigning a proportion of the failure mode to each cause. The accumulation of
all cause values equals 1.00.
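As a minimal sketch, the Beta bands listed above and the rule that the failure mode ratios add up to 1.00 can be checked as follows. The function names are hypothetical and the tolerance used for the sum is an assumption.

```python
import math
from typing import Iterable


def effect_category(beta: float) -> str:
    """Map a Beta value to the effect probability band listed above."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    if beta == 1.0:
        return "Actual loss"
    if beta > 0.10:
        return "Probable loss"
    if beta > 0.0:
        return "Possible loss"
    return "No effect"


def ratios_sum_to_one(ratios: Iterable[float]) -> bool:
    """Check that the failure mode ratios accumulate to 1.00."""
    # Assumption: a small absolute tolerance absorbs rounding error.
    return math.isclose(sum(ratios), 1.0, abs_tol=1e-9)


print(effect_category(0.5), ratios_sum_to_one([0.6, 0.3, 0.1]))
```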
Step 4: Probability of Occurrence (Quantitative)
Assign probability values for each Failure Mode, referencing the data source selected. Failure Probability and Failure
Rate data can be found from several sources:
 Handbook 217 is referenced but any source of failure rate data can be used
 RAC databases, Concordia, etc.
If the Failure Mode probability is listed (functional approach) several columns of the FMECA Criticality Worksheet
may be skipped. Criticality (Cr) can be calculated directly. When failure rates for failure modes and contributing
components are desired, detailed failure rates for each component are assigned.
Next, we must assign Component Failure Rate (lambda). Failure Rates for each component are selected from the
failure rate source document. Where there is no failure rate available, the qualitative values from the FMEA are used.
FMEA may also be an alternative method on new or innovative designs.
Operating Time (t) represents the time or cycles the item or component will be expected to live. This is related to the
expected duty cycle requirements.
Step 5: Calculate and Plot Criticality
In FMECA, Criticality is calculated in two ways:
 The Modal Criticality (each failure mode all causes) = Cm
 The Criticality of the Item (all failure modes summarized) = Cr
Formulas of each are not provided in this explanation but the essence of the elements of the calculation is as follows:
 Cm = The product of the following:
o Failure Rate of the Part (lambda)
o Failure Effect Probability (Beta)
o Failure Mode Ratio (alpha)
o Operating Time (units of time or cycles)
 Cr = The summation of all the Cm
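The elements listed above can be put together in a short sketch: Cm is the product of Beta, alpha, lambda, and the operating time, and Cr is the sum of the Cm values. The class and function names, the valve, and all numbers below are hypothetical and serve only to show the arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    alpha: float  # failure mode ratio
    beta: float   # failure effect probability from Step 3


def modal_criticality(mode: FailureMode, failure_rate: float, operating_time: float) -> float:
    """Cm = beta * alpha * lambda * t for one failure mode."""
    return mode.beta * mode.alpha * failure_rate * operating_time


def item_criticality(modes: List[FailureMode], failure_rate: float, operating_time: float) -> float:
    """Cr = sum of Cm over all failure modes of the item."""
    return sum(modal_criticality(m, failure_rate, operating_time) for m in modes)


# Hypothetical valve: lambda = 2e-6 failures per hour, t = 1000 operating hours.
modes = [
    FailureMode("fails open", alpha=0.6, beta=1.0),
    FailureMode("fails closed", alpha=0.4, beta=0.5),
]
# Rank the failure modes from highest to lowest modal criticality.
ranked = sorted(modes, key=lambda m: modal_criticality(m, 2e-6, 1000.0), reverse=True)
for m in ranked:
    print(m.name, modal_criticality(m, 2e-6, 1000.0))
print("Cr =", item_criticality(modes, 2e-6, 1000.0))
```

With these made-up inputs, "fails open" has a Cm of about 1.2e-3 and "fails closed" about 4e-4, so the item criticality Cr is about 1.6e-3.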

Step 6: Design Feedback and Risk Mitigation
Risk mitigation is a discipline required to reduce possible failure. The identified risk in the criticality matrix is the
substitute for failure and must be treated in the same context as a test failure or customer returned component or item.
FMECA requires a change in risk levels / criticality after mitigation. A defect / defective detection strategy,
commensurate to the risk level, may be required. Acceptable risk management strategy includes the following:
 Mitigation actions directed at Highest Severity and Probability combinations
 Any risk where mitigation was unsuccessful is a candidate for Mistake Proofing or Quality Control, protecting
the customer / consumer from the potential failure
o Detection methods are chosen for failure modes first and if possible individual causes which do not
permit shipping or acceptance
 Action logs and “risk registers” with revision history are kept for follow-up and closure of each undesirable
risk
Other examples of FMECA mitigation strategies to consider:
 Design change. Take a new direction on design technology, change components and/or review duty cycles for
derating.
 Selection of a component with a lower lambda (failure rate). This can be expensive unless identified early in
Product Development.
 Physical redundancy of the component. This option places the redundant component in a parallel
configuration. Both must fail simultaneously for the failure mode to occur. If a safety concern exists, this option
may require non-identical components.
 Software redundancy. The addition of a sensing circuit which can change the state of the product. This option
often reduces the severity of the event by protecting components through duty cycle changes and reducing input
stresses.
 Warning system. A placard and / or buzzer / light. This requires action by an operator or analyst to avoid a
failure or the effect of failure.
 Detection and removal of the potential failure through testing or inspection. The inspection effectiveness must
match the level of severity and criticality.
Step 7: Perform Maintainability Analysis
Maintainability Analysis looks at the highest risk items and determines which components will fail earliest. The cost
and parts availability are also considered. This analysis can affect the location of the components or items when in the
design phase. Design consideration must be given for quick access when serviceability is required more frequently.
 Access panels, easy to remove, permit service of the identified components and items. This can limit down
time of important machinery.
 A spare parts list is typically created from the maintainability analysis.

ROOT CAUSE ANALYSIS

Root Cause Analysis (RCA) is defined as a systematic process for identifying the origins of problems and
determining an approach for responding to and solving them. It focuses on preventing problems rather than
simply “putting out fires.” RCA tries to be more scientific about asset failures, going a step beyond troubleshooting.

Overview

When feeling under the weather, it’s perfectly natural to address any pain or discomfort by some sort of first aid
treatment or a superficial remedy. However, if you consult a medical professional, then the approach might be a little
more thorough. You might find yourself being asked a series of specific questions about your condition and might
even go through some laboratory tests to get to the source of your illness.

The same is true for plant and maintenance incidents. While an immediate response is usually required, there is always
value in performing a systematic analysis of possible root causes.

RCA is the process that aims to identify the cause of a particular event. In the plant setting, this event usually refers to
any potential problems that will disrupt standard operations. At a very high level the usual suspects (i.e. usual causes
of problems) can be categorized as:

 technical issues affecting physical parts


 human causes, or when an assigned individual does not perform a task correctly
 system causes, or lapses in processes
The general process of RCA requires you to describe what happened, why and how it happened, and what steps are
needed to prevent the same event from happening in the future. The process can get very complex depending on the
situation. Thankfully, some common methods were developed to aid in identifying the root cause.

Goals of Root Cause Analysis

RCA capitalizes on the analysis of data collected from previous asset failures. It’s important to remember that some
failures can cascade into other failures, creating a greater need for root cause analysis in order to fully understand the
sequence of cause and failure events. Root Cause Analysis typically has 3 goals:

1. Discover the root cause


2. Fully understand how to fix and learn from the problem
3. Apply the solution to this and future problems, establishing repeatable processes and ensuring repeat
successes

Common methods used in Root Cause Analysis

5 Whys

The name of the method pretty much explains the steps: Ask why and ask it again. Asking “Why?” five times usually
gets to the bottom of the problem, but don’t let the name stop you from asking more times. The idea is to drill down to
the details of an event until you are left with the actual root cause.

When executing root cause analysis, one process that is widely used is “The Five Whys”, a method that originated
from Sakichi Toyoda, founder of Toyota Industries, in the 1930s. The idea behind this process is that you should be
able to figure out the root cause of a problem by asking five “why” questions (more or fewer than five as needed). Here
is a real life example:

1. Why didn’t your car start? – Because the battery was dead.
2. Why was your car battery dead? – Because I left the headlights on last night.
3. Why did you leave the headlights on last night? – Because the headlight warning sensor did not beep when
I last exited the car.
4. Why did the headlight warning sensor malfunction? – It suffered a complete failure.
5. Why did it suffer a complete failure? – Because the part has reached the end of its lifespan.

Using The Five Whys method, you can deduce that the root cause of your car failing to start is a worn-out headlight
warning sensor (which should beep at you when you exit the car with your lights on). In the case of a sensor such as
this, you can’t really prevent it from failing; you can only replace it promptly so that your car can be used again right
away. However, there may be other repairs you can avoid making by keeping up with preventive maintenance.

Another example, involving a faulty mixer subjected to 5 Whys, is shown below.

Fault tree analysis

A more visual method to determine root causes is by using a fault tree diagram. A fault tree diagram starts by
having the problem at the topmost block. The immediate causes preceding the problem event are listed, then they
branch out to form the second layer of the diagram. Each immediate cause branches out to its own prior causes. This
process is continued until the most basic events are identified, which then become your potential root causes.
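The branching described above can be sketched as a small tree structure whose leaves are the basic events. The pump example and all names below are hypothetical, and the sketch leaves out the AND/OR logic gates that a full fault tree would carry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    description: str
    causes: List["Event"] = field(default_factory=list)  # prior causes, if any


def basic_events(event: Event) -> List[str]:
    """Walk down the tree and collect the events that have no prior causes."""
    if not event.causes:
        return [event.description]
    found: List[str] = []
    for cause in event.causes:
        found.extend(basic_events(cause))
    return found


# Hypothetical tree: the problem sits in the topmost block.
top = Event("Pump does not start", [
    Event("No electrical power", [Event("Breaker tripped")]),
    Event("Motor seized", [Event("Bearing worn out"), Event("Lubrication missed")]),
])
print(basic_events(top))
```

The events returned are the potential root causes; each one still has to be confirmed against evidence.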

The same mixer can resemble the following fault tree diagram:

Fishbone diagram (aka Ishikawa diagram)

Another visual method to identify root causes is by using a fishbone diagram (also known as an Ishikawa diagram,
named after its creator Kaoru Ishikawa). It starts by specifying the problem on the rightmost part of the diagram. The
factors contributing to the main problem are then listed as categories. Specific causes under each category are then
listed down to identify the source of the problem.

As a general guide, the following categories are used as starting points:

 Environmental
 People
 Equipment/material
 Procedures
Applying these basic categories as a starting point, the mixer problem can be translated into a fishbone diagram.

Implementing Root Cause Analysis

While RCA methods are very common and well-known to the maintenance community, there can be challenges
to making RCA thrive.
The first step to mastering this process is knowing the methods that are available to conduct RCAs. The next steps are
setting the proper mindset and improving the quality of execution to drive the initiative toward success.

Keep in mind the importance of collecting data accurately and involving the correct groups to analyze that data. To
implement RCA effectively, it should be a repeatable process that is collaboratively executed by the group.

Other RCA Methods

While The Five Whys is a popular RCA method, it is definitely not the only one. You may use one or multiple
methods in the same cycle of RCA. Other strategies for RCA include:

 Barrier Analysis
 Change Analysis
 Causal Factor Tree Analysis
 Fishbone Diagram Analysis
 Fault Tree Analysis
 Failure Mode and Effects Analysis
 Pareto Analysis

The Six Phases of RCA

When Root Cause Analysis is performed, there are six phases in one RCA cycle. The components of asset
failure may include environment, people, equipment, materials, and procedure. Before you carry out RCA, you should
decide which problems are immediate candidates for this analysis. Just a few examples of where root cause analysis is
used include major accidents, everyday incidents, human errors, and manufacturing mistakes. Those that result in the
highest costs to resolve, most downtime, or threats to safety will rise to the top of the list.

Phase 1: List and Consider Every Possible Cause

The first thing to do in Root Cause Analysis is to list every potential cause leading up to a problem or event.
Place the incident into the context of everything related to the problem. You should also look at a longer time period
than the days leading up to when the incident occurred to create a history of what might have gone wrong and when.

This phase requires complete neutrality, focusing on facts. When you are investigating potential causes, some facts
may not be available if no one saw what happened or evidence was discarded or destroyed. This is when you should
look to secondary sources. You can also construct possible scenarios for how the problem may have occurred.

Phase 2: Gather Additional Data, Information, and Evidence

Phase 2 involves collecting as much data as you can that relates to the potential cause(s) of the problem. This
data may come from your existing CMMS software, other databases, digital files, or printed documents. Ask questions
to clarify information and drill down into every potential cause. This phase would be where you would implement The
Five Whys Method.

Phase 3: Identify What Contributed to the Problem

Phase 3 is to identify everything that contributed to the problem. Make a list of every change or event. If
possible, gather evidence of these changes and the main problem that occurred. There are four types of evidence that
can be gathered: people, paper, physical, and recording evidence. Just a few examples include interviews,
activity-specific paperwork, broken parts, and video footage.

Phase 4: Analyze Collected Data

In Phase 4, you should analyze the collected data. Categorize changes or events by how much influence you
have over them. Then decide if each event is unrelated, a correlating factor, a contributing factor, or a root cause. An
unrelated event is one that has no impact or effect on the problem whatsoever. A correlating factor is one that is
statistically related to the problem, but may or may not have a direct impact on the problem. A contributing factor is
an event or condition that directly led to the problem, in full or in part. (Root cause was defined at the beginning of this
section.) This should help you arrive at one or more root causes. When the root cause has been identified, more
questions can be asked. Why are you certain that this is the root cause instead of something else?

Phase 5: Create a Plan for Preventing Future Breakdowns

The fifth phase of RCA is to develop a plan for preventing future breakdowns. It’s important to identify
preventive actions you should take that will not only prevent the problem from reoccurring, but also not cause other
problems. You should find and present a solution that is repeatable and applicable to more than one situation.

Be sure to ask, how can the root cause be eliminated or avoided so the issue doesn’t occur again? There are reports
available in a CMMS to help you identify how to prevent the problem. Just a few examples include changes to the
preventive maintenance routine or operator training, new signage or HMI controls, or a change of parts or part
suppliers.

In order to predict the potential for future problems (and hopefully avoid them), you should ask a few questions. What
can be done to prevent the problem from reoccurring? How will the solution be implemented, and who will implement
it? What are the risks involved in this solution?

Phase 6: Implement Plan

Now that you’ve come up with a plan, it’s time to implement that plan. Depending on the type, severity, and
complexity of the problem and the plan to prevent it from happening again, there are a number of areas to take into
consideration. These include, but are not limited to the people in charge of the assets, the condition and status of the
assets themselves, the processes related to the maintenance of those assets, and any people or processes outside of
asset maintenance that have an impact on the identified problem.

After root cause analysis is complete, maintenance teams should go back and review the actual downtime and cost
impact associated with that problem. This will help you determine if this problem and other similar issues were worth
the effort of RCA.

RELIABILITY CENTERED MAINTENANCE (RCM)

Reliability centered maintenance is a logical methodology derived from research in the aviation sector and
uses the failure mode, effect, and criticality analysis (FMECA) tool. RCM is a process used to identify the most
applicable and effective maintenance action(s) to ensure the highest practical standard of operating performance of a
system or a component.

RCM Goals and Objectives

RCM has the following goals:

• To determine the most cost-effective and applicable maintenance tasks to minimize the risk and impact of failure on
systems/equipment function.

• Ensure high safety and reliability performance.

• Maintain system and equipment functionality in the most economical manner.

Specific RCM objectives

• To ensure realization of the inherent safety and reliability levels of the equipment.

• To restore the equipment to these inherent levels when deterioration occurs.

• To obtain the information necessary for design improvement of those items where their inherent reliability proves to
be inadequate.

• To accomplish these goals at a minimum total cost, including maintenance costs, support costs, and economic
consequences of operational failures.

The above goals and specific objectives are clear drivers for the effective maintenance programs that usually result from
the application of RCM methodology.

RCM Principles

RCM has the following key principles that distinguish it from other methodologies for maintenance:

• Preservation of system or equipment function: The focus here is to keep the system performing its function, not to keep
it operating as though it were new. As long as the system is performing its function, there is no need for
excessive maintenance, which may itself cause failure in some cases. This principle has led to a reduction in time-based
preventive maintenance in the airline industry, which reduced cost and improved the reliability of systems. Redundancy of
function through multiple pieces of equipment improves functional reliability but increases life cycle cost in terms of
procurement and operating costs.

• Focus on systems: RCM focuses on systems rather than components, since functions are usually driven by systems.

• RCM is reliability centered: It treats failure statistics as they relate to age. It seeks to know the conditional probability
of failure at specific ages (the probability that failure will occur in each given operating age bracket).
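A minimal sketch of the conditional probability of failure per age bracket is given below. It assumes that the conditional probability in a bracket is the number of failures in that bracket divided by the number of units that survived to the start of it; the function name and the fleet data are hypothetical.

```python
from typing import List


def conditional_failure_probability(population: int, failures_per_bracket: List[int]) -> List[float]:
    """Failures in each age bracket divided by the units still alive at its start."""
    survivors = population
    probabilities: List[float] = []
    for failed in failures_per_bracket:
        if failed > survivors:
            raise ValueError("more failures than surviving units")
        probabilities.append(failed / survivors if survivors else 0.0)
        survivors -= failed
    return probabilities


# Hypothetical fleet of 100 units observed over three operating age brackets.
print(conditional_failure_probability(100, [10, 18, 36]))
```

For this made-up fleet the conditional probability rises from 10/100 to 18/90 to 36/72, that is, from 0.1 to 0.2 to 0.5, which would indicate an age-related failure pattern.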

• Safety and economics are the key criteria: Safety must be ensured first at any cost; followed by costs that result from
the impact on production and operation.

• Design limitations exist: The RCM objective is to maintain the inherent reliability of the equipment design, recognizing
that changes in inherent reliability arise from design rather than maintenance. Maintenance can, at best, only achieve
and maintain the level of reliability that the design provides for a system.

• Feedback is necessary for improvement: RCM recognizes that maintenance feedback can improve on the original
design. In addition, RCM recognizes that a difference often exists between the perceived design life and the intrinsic
or actual design life.

• Failure is any unsatisfactory condition: failure may be either a loss of function (operation ceases) or a loss of
acceptable quality (operation continues).

• Maintenance tasks should be derived based on logic: RCM uses a logic tree to develop and screen maintenance tasks.

RCM Methodology

RCM has a systematic methodology that consists of seven steps. The seven steps are as follows:

1. System selection and information collection;

2. System boundary definition;

3. System description and functional block diagram;

4. Functions and functional failure;

5. Failure modes and effects analysis (FMEA);

6. Logic decision tree analysis (LTA); and

7. Task selection.

System Selection and Information Collection

RCM is best presented and implemented at a system level due to the fact that functions are best captured at the system
level. The component level lacks defining significance of functions and functional failure, while plant-level analysis
makes the whole analysis intractable. The important question faced at this stage is which system should be selected?
The following are criteria that guide the selection:

1. Systems with a high number of corrective maintenance tasks during recent years;

2. Systems with a high number of preventive maintenance tasks and/or costs during recent years;

3. A combination of schemes 1 and 2;

4. Systems with a high cost of maintenance;

5. Systems contributing significantly toward plant outages/shutdowns (full or partial) during recent years;

6. Systems with high concern relating to safety; and

7. Systems with high concern relating to environment.

Past experience has shown that all of these criteria except schemes 6 and 7 yield more or less the same results. An
indicator of a good selection is that the systems chosen for the RCM program result in a significant improvement over the
current situation.

The next task, after selecting a system, is collecting information related to the selected system. A good practice is to
start collecting key information and documents right at the outset of the process. The following are documents that may
be required in a typical RCM study:

• P&ID (piping and instrumentation) diagram;

• Systems schematic and/or block diagram;

• Functional block diagram;

• Equipment design specification and operations manuals (a source of finding design specifications and operating
condition details);

• Equipment history file (failure and maintenance history in specific);

• Other identified sources of information, unique to the plant or organizational structure. Examples include industry
data for similar systems; and

• Current maintenance program used for the system. It is generally recommended not to collect this information before
step 7, in order to avoid biases that may affect the RCM process.

System Boundary Definition

System boundary definition is needed for the following reasons:

• It provides an exact knowledge of what is included and not included in a system in order to make sure that any key
system function or equipment is not neglected (or not overlapped from another system). This is especially important if
two adjacent systems are selected.

• Boundary definition also includes system interfaces (both IN and OUT interfaces) and interactions that establish
inputs and outputs of a system. An accurate definition of IN and OUT interfaces is a precondition for steps 3 and
4 below.

There are no clear rules for specifying system boundaries; however, as a general guideline, a system has one or two main
functions with a few supporting functions that would make up a logical grouping of equipment. However the
boundary is identified, there must be clear documentation as part of a successful process.

System Description and Functional Block Diagram

This step is important and will set the stage for a successful RCM process. The step has the following five elements:

• System description;

• Functional block diagram;

• In/out interfaces;

• System work breakdown structure; and

• Equipment history.

This step generally involves a form that documents baseline characterization of a system which is eventually expected
to be used in stipulating PM tasks.

System Functions and Functional Failure

The fourth step identifies and lists all system functions. As a guide for identifying functions, every out interface
should be captured into a function statement and any internal out interfaces between functional subsystems can be a
source for a function. An important point to note is that these statements are for defining system functions and not the
equipment. With the definition of system functions come the functional failures. In RCM, the focus is on functions
and functional failures. The functional failures are more than just a single statement of loss of function. The loss
conditions may be two or more (e.g., complete paralysis of the plant or major or minor deprivation of functionality).
This distinction is important and will lead to the proper ranking of functions and functional failures.

Figure 11.7 provides a form to document functions and functional failures. The following are examples of
correct and wrong function statements:

• Provide 1500 psi safety relief valves (wrong statement because the statement is about equipment);

• Provide for pressure relief above 1500 psi (correct; the focus is on function);

• Provide a 1500 gallon per minute (gpm) centrifugal pump on the discharge side of header 26 (wrong); and

• Maintain a flow of 1500 gpm at the outlet of header 2

Failure Modes and Effects Analysis (FMEA)

Failure modes and effects analysis (FMEA) is a basic tool used in reliability engineering to assess the impact of
failures. It is a systematic failure analysis technique that is used to identify the failure modes, their causes, and
consequently their fallouts on the system function. FMEA analysis rates each potential failure mode and effect based
on the following three factors:

• Severity: the consequence of the failure when it happens;

• Occurrence: the probability or frequency of the failure occurring; and

• Detection: the probability of the failure being detected before the impact of the effect is realized.

Then these three factors are combined in one number called the risk priority number (RPN) to reflect the priority of
the failure modes identified. The RPN is calculated by multiplying the severity rating,
the occurrence probability rating, and the detection probability rating:

RPN = severity rating × occurrence probability rating × detection probability rating
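The RPN formula can be illustrated with a short sketch. The 1 to 10 rating scale is an assumption (a common convention that the text above does not state), and the failure modes and their ratings are hypothetical.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """RPN = severity rating x occurrence rating x detection rating."""
    for rating in (severity, occurrence, detection):
        # Assumption: each rating is an integer on a 1-10 scale.
        if not 1 <= rating <= 10:
            raise ValueError("ratings are assumed to lie on a 1-10 scale")
    return severity * occurrence * detection


# Hypothetical failure modes: (name, severity, occurrence, detection)
failure_modes = [
    ("seal leak", 7, 4, 3),
    ("bearing wear", 5, 6, 2),
    ("sensor drift", 8, 2, 6),
]
# Highest RPN first, so the top of the list gets attention first.
ranked = sorted(failure_modes, key=lambda fm: risk_priority_number(*fm[1:]), reverse=True)
for name, s, o, d in ranked:
    print(name, risk_priority_number(s, o, d))
```

With these made-up ratings the RPNs are 96 for sensor drift, 84 for seal leak, and 60 for bearing wear.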

Logic or Decision Tree Analysis (LTA)

The purpose of the LTA is to prioritize the resources to be committed to each failure mode. The prioritization is based
on the impact of the failure mode. The RCM process uses a simple and intuitive structure for this purpose. The structure
utilizes two criteria, i.e., safety and cost, that arise from a full plant outage. The LTA has three questions that enable a
user, with minimal effort, to place each failure mode into one of six categories. Each question is answered yes
or no only. Each category (also known as a bin) forms a natural segregation of items of respective importance. The LTA
scheme is shown below in Fig. 11.9.

The six classification categories for the failures are A, B, C, D/A, D/B, or D/C. For the priority scheme, A and
B have higher priority over C when it comes to allocation of scarce resources, and A is given higher priority than B. In
summary, the priority for PM tasks goes in the following order:

1. A or D/A;
2. B or D/B; and
3. C or D/C.
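The binning can be sketched as below. The wording of the three questions is not given in the text, so this sketch assumes the commonly described form: whether the failure is evident to the operators (a hidden failure receives the D prefix), whether it causes a safety problem (A), and whether it causes a plant outage (B, otherwise C). Only the priority order is taken from the list above; the function and variable names are hypothetical.

```python
def lta_category(evident: bool, safety: bool, outage: bool) -> str:
    """Place a failure mode into A, B, C, D/A, D/B or D/C from three yes/no answers."""
    # Assumed questions: evident to operators? safety problem? plant outage?
    if safety:
        base = "A"
    elif outage:
        base = "B"
    else:
        base = "C"
    return base if evident else f"D/{base}"


# Priority order from the list above (1 = highest priority for PM tasks).
PRIORITY = {"A": 1, "D/A": 1, "B": 2, "D/B": 2, "C": 3, "D/C": 3}

category = lta_category(evident=False, safety=False, outage=True)
print(category, PRIORITY[category])
```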

Task Selection

In this step, RCM methodology allocates PM tasks and resources. This is the stage where the maximum benefit from
RCM may be obtained. The task selection process requires that each selected task must be applicable and effective.
Here, “applicable” means that the task should be able to prevent failures, detect failures, or unearth hidden failures,
while “effective” is related to the cost effectiveness of the alternative PM strategies. If no PM task is selected through
the LTA, the only option is to run equipment to failure. This activity requires contribution from the maintenance
personnel, as their experience is invaluable in the correct selection of the PM task. After selecting the tasks, the set of all
run-to-failure (rtf) tasks is subjected to a final sanity check. The purpose of the check is to review critically all
component failures that are treated as run-to-failure cases to see if this task is appropriate. If an rtf task fails any of the
following tests or creates a conflict, the PM or the current task is kept. The following are the checks:

• Marginal effectiveness: It is not clear that the rtf costs are significantly less than the current PM costs.
• High-cost failure: While there is no loss of critical function, the failure mode is likely to cause extensive damage to
the component that should be avoided.
• Secondary damage: Similar to the second item, except that there is a high probability of extensive damage in
neighboring components.
• OEM conflict: The original manufacturer recommends a PM task that is not supported by RCM. It is very sensitive if
warranty conditions are involved.
• Internal conflict: Maintenance or operation feels strongly about the PM task that is not supported by RCM.
• Regulatory conflict: A regulatory body, such as the Environmental Protection Agency (EPA), established the PM.
• Insurance conflict: similar to the above two.
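The sanity check above amounts to a simple rule: a run-to-failure decision stands only if none of the seven checks is raised. A minimal sketch follows; the function names and the dictionary layout of the review findings are hypothetical.

```python
from typing import Dict, List

# The seven checks listed above, in the same order.
RTF_CHECKS = (
    "marginal effectiveness",
    "high-cost failure",
    "secondary damage",
    "OEM conflict",
    "internal conflict",
    "regulatory conflict",
    "insurance conflict",
)


def failed_rtf_checks(findings: Dict[str, bool]) -> List[str]:
    """Return the checks that were raised for a run-to-failure candidate."""
    return [check for check in RTF_CHECKS if findings.get(check, False)]


def rtf_is_acceptable(findings: Dict[str, bool]) -> bool:
    """Run to failure is kept only if no check is raised; otherwise keep the PM or current task."""
    return not failed_rtf_checks(findings)


# Hypothetical review of one component.
findings = {"secondary damage": True, "OEM conflict": True}
print(rtf_is_acceptable(findings), failed_rtf_checks(findings))
```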

RCM IMPLEMENTATION

RCM implementation can be viewed as a process with four stages [1, 2, 4, 7]. Each stage consists of a number of tasks
that must be executed in order to ensure successful implementation. The four stages are as follows:

• Stage 1: Planning and organizing for RCM.
• Stage 2: Analysis and design.
• Stage 3: Scheduling and execution.
• Stage 4: Assessment and improvement.
The following subsections present the process for RCM implementation.

Planning and Organizing for RCM

This stage is important for the success of RCM implementation. Top maintenance management must seek
organizational commitment for the RCM project and ensure the needed resources are provided. In some situations, it
may be better to conduct a pilot RCM project. The following must be addressed carefully:
1. Organization: Formation of the RCM team and selection of a facilitator. The facilitator must be knowledgeable about
the RCM process. The team must include experienced people in the areas that will be impacted by the application of
RCM. The team must use a clear system for reporting progress and challenges.
2. Training: A training program on RCM should be conducted at different levels. Management should be provided with an
awareness program about RCM and its benefits. A well-structured training program on RCM methodology and
implementation should be provided to the team.
3. Avail resources: Estimate what types of resources are needed and ensure their availability.
4. Manage expectations: Establish a baseline for the current performance and the expected benefits from RCM.
5. Schedule: Prepare a schedule for the RCM project on a Gantt chart with the necessary resources. A schedule will
facilitate follow-up and monitoring.
6. Change management program: Develop a change management program to mitigate resistance to the RCM project
and ensure buy-in. The program may include an awareness program, training, and reorganization.

Analysis and Design


This stage deals with the development of the optimized maintenance program using RCM methodology. It includes
the selection of the methodology and its application. The team is expected to follow the RCM steps. For each
maintenance task, design its maintenance procedure. The procedure should specify task requirements in terms of
manpower, spare parts, standard time, and schedule. The outcomes of this stage are maintenance tasks for each
functional failure.

Execution Stage

In this stage, the team integrates RCM tasks into the maintenance schedule. The impact of the new tasks is estimated in
terms of benefits and cost. The change management program is reviewed and enhanced if needed.

Assessment and Feedback Stage


At this stage, the RCM project is implemented either in full or as a pilot case. The impact of the RCM project must be
measured and assessed using relevant, realistic performance measures. The measures may include availability, quality
rate, reliability, and cost. Comparisons with pre-RCM performance and current targets will reveal the impact of RCM and
aid in identifying needed improvements.
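
The comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary layout, and all figures are hypothetical, and cost is assumed to be the only measure where a lower value is better.

```python
def assess_impact(baseline, current, target, lower_is_better=("cost",)):
    """Compare current measures with the pre-RCM baseline and the targets.

    All three arguments are dicts mapping a measure name to its value.
    """
    report = {}
    for measure, before in baseline.items():
        now, goal = current[measure], target[measure]
        if measure in lower_is_better:
            improved, target_met = now < before, now <= goal
        else:
            improved, target_met = now > before, now >= goal
        report[measure] = {
            "change": now - before,
            "improved": improved,
            "target_met": target_met,
        }
    return report


# Made-up figures for illustration only.
baseline = {"availability": 0.82, "cost": 120.0}
current = {"availability": 0.88, "cost": 105.0}
target = {"availability": 0.90, "cost": 110.0}
for measure, result in assess_impact(baseline, current, target).items():
    print(measure, result)
```

Measures that improved but missed their target point to where further improvement is needed.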

UNIT V
TOTAL PRODUCTIVE MAINTENANCE

Nakajima, who is considered by many in the literature as the father of total productive maintenance (TPM),
defines it [5] as “productive maintenance carried out by all employees through small group activities.” He also
adds, “TPM is equipment maintenance performed on a company-wide basis.” The authors define TPM as a management
approach to maintenance that imports total quality management (TQM) philosophy and techniques to maintenance.
TPM focuses on involving all employees in the organization in equipment improvement.

TPM (Total Productive Maintenance) is a holistic approach to equipment maintenance that strives to achieve perfect
production:
• No Breakdowns
• No Small Stops or Slow Running
• No Defects
In addition, it values a safe working environment:
• No Accidents
TPM emphasizes proactive and preventative maintenance to maximize the operational efficiency of equipment. It
blurs the distinction between the roles of production and maintenance by placing a strong emphasis on empowering
operators to help maintain their equipment.
The implementation of a TPM program creates a shared responsibility for equipment that encourages greater
involvement by plant floor workers. In the right environment this can be very effective in improving productivity
(increasing up time, reducing cycle times, and eliminating defects).

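Why Chronic Losses are Neglected

[Content not recoverable from the source; surviving fragment: "... as a team."]

Reducing and Eliminating Chronic Losses

[Content not recoverable from the source; surviving fragment: "... to its original operating conditions"]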

OEE AND THE SIX BIG LOSSES

Introduction to OEE
OEE (Overall Equipment Effectiveness) is a metric that identifies the percentage of planned production time that is
truly productive. It was developed to support TPM initiatives by accurately tracking progress towards achieving
“perfect production”.

• An OEE score of 100% is perfect production.
• An OEE score of 85% is world class for discrete manufacturers.
• An OEE score of 60% is fairly typical for discrete manufacturers.
• An OEE score of 40% is not uncommon for manufacturers without TPM and/or lean programs.

OEE consists of three underlying components, each of which maps to one of the TPM goals set out at the beginning of
this topic, and each of which takes into account a different type of productivity loss.

| Component | TPM Goal | Type of Productivity Loss |
|---|---|---|
| Availability | No Stops | Availability takes into account Availability Loss, which includes all events that stop planned production for an appreciable length of time (typically several minutes or longer). Examples include Unplanned Stops (such as breakdowns and other down events) and Planned Stops (such as changeovers). |
| Performance | No Small Stops or Slow Running | Performance takes into account Performance Loss, which includes all factors that cause production to operate at less than the maximum possible speed when running. Examples include both Slow Cycles and Small Stops. |
| Quality | No Defects | Quality takes into account Quality Loss, which factors out manufactured pieces that do not meet quality standards, including pieces that require rework. Examples include Production Rejects and Reduced Yield on startup. |
| OEE | Perfect Production | OEE takes into account all losses (Availability Loss, Performance Loss, and Quality Loss), resulting in a measure of truly productive manufacturing time. |

As can be seen from the above table, OEE is tightly coupled to the TPM goals of No Breakdowns (measured by
Availability), No Small Stops or Slow Running (measured by Performance), and No Defects (measured by Quality).

It is extremely important to measure OEE in order to expose and quantify productivity losses, and in order to measure
and track improvements resulting from TPM initiatives.

Benefits of Automated OEE Tracking


Manually calculating OEE is a great way to start. It can be done with pencil and paper or with a simple spreadsheet,
and only five pieces of data are needed (Planned Production Time, Stop Time, Ideal Cycle Time, Total Count, and
Good Count). Performing manual OEE calculations helps reinforce the underlying concepts and provides a deeper
understanding of OEE. However, there are also very strong benefits to quickly moving to automated OEE data
collection:

| Item | Benefit |
|---|---|
| Stop Time | The accuracy of manual unplanned stop time tracking is typically in the range of 60 to 80% (based on real-world experience across many companies). With automatic Run/Down detection, this accuracy can approach 100%. |
| Small Stops and Slow Cycles | For most equipment it is impossible to manually track slow cycles and small stops. This means that a great deal of potentially useful information, such as time-based and event-based loss patterns, is not available. |
| Operator Focus | With automated data collection the operator spends more time focused directly on the equipment (versus spending time on paperwork). |
| Real-Time Results | Automated data collection provides results in real-time, enabling improvement techniques such as SIC (Short Interval Control). |
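
The manual calculation mentioned above needs only the five named inputs. The text does not spell out the formulas, so the sketch below assumes the commonly used definitions: Run Time = Planned Production Time minus Stop Time, Availability = Run Time / Planned Production Time, Performance = (Ideal Cycle Time × Total Count) / Run Time, Quality = Good Count / Total Count, and OEE = Availability × Performance × Quality. The function name and the sample figures are illustrative only.

```python
def calculate_oee(planned_production_time, stop_time, ideal_cycle_time,
                  total_count, good_count):
    """Return (availability, performance, quality, oee) as fractions.

    All times must share one unit (for example minutes);
    ideal_cycle_time is the ideal time per piece in that same unit.
    """
    run_time = planned_production_time - stop_time
    if planned_production_time <= 0 or run_time <= 0 or total_count <= 0:
        raise ValueError("times and counts must be positive")
    availability = run_time / planned_production_time
    performance = (ideal_cycle_time * total_count) / run_time
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


# Made-up shift: 480 min planned, 60 min stopped, 0.5 min ideal cycle time,
# 700 pieces produced, 665 of them good.
a, p, q, oee = calculate_oee(480, 60, 0.5, 700, 665)
print(f"Availability={a:.1%} Performance={p:.1%} Quality={q:.1%} OEE={oee:.1%}")
```

Reporting the three components next to the combined score shows which type of loss is pulling OEE down.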

Creating a “Best of the Best” OEE Goal


An interesting question is how to set an effective “stretch” goal for OEE. As it happens, there is an excellent technique
for doing so called “Best of the Best”. Here is how it works:

1. Track OEE (including Availability, Performance, and Quality) for the target equipment for one month. Make
sure to compile the results by shift.
2. Review every shift result, keeping track of the best individual result for Availability, Performance, and
Quality across all shifts (i.e. the highest Availability score across all shifts, the highest Performance score
across all shifts, etc.).
3. Multiply the best individual results together to calculate a “Best of the Best” OEE score.

This newly calculated “Best of the Best” OEE score represents the stretch goal – derived from the best results actually
achieved across the month for Availability, Performance, and Quality.
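
The three steps above can be sketched in a few lines of Python. The function name and the per-shift figures are made up for illustration; each shift result is assumed to be an (Availability, Performance, Quality) tuple of fractions.

```python
def best_of_the_best(shift_results):
    """Multiply the best Availability, Performance, and Quality seen on any shift.

    shift_results: iterable of (availability, performance, quality) fractions.
    """
    results = list(shift_results)
    if not results:
        raise ValueError("at least one shift result is required")
    best_availability = max(r[0] for r in results)
    best_performance = max(r[1] for r in results)
    best_quality = max(r[2] for r in results)
    return best_availability * best_performance * best_quality


# Illustrative results for three shifts (not real data).
shifts = [(0.85, 0.80, 0.98), (0.90, 0.75, 0.97), (0.82, 0.88, 0.99)]
goal = best_of_the_best(shifts)
print(f"Best of the Best OEE goal: {goal:.1%}")
```

Because each factor is taken from the shift where it was highest, the goal is never lower than the best single-shift OEE.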

Understanding the Six Big Losses


OEE loss categories (Availability Loss, Performance Loss, and Quality Loss) can be further broken down into what is
commonly referred to as the Six Big Losses – the most common causes of lost productivity in manufacturing. The Six
Big Losses are extremely important because they are nearly universal in application for discrete manufacturing, and
they provide a great starting framework for thinking about, identifying, and attacking waste (i.e. productivity loss).

| Six Big Losses | OEE Category | Examples | Comments |
|---|---|---|---|
| Unplanned Stops | Availability Loss | Tooling Failure, Unplanned Maintenance, Overheated Bearing, Motor Failure | There is flexibility on where to set the threshold between an Unplanned Stop (Availability Loss) and a Small Stop (Performance Loss). |
| Setup and Adjustments | Availability Loss | Setup/Changeover, Material Shortage, Operator Shortage, Major Adjustment, Warm-Up Time | This loss is often addressed through setup time reduction programs such as SMED (Single-Minute Exchange of Die). |
| Small Stops | Performance Loss | Component Jam, Minor Adjustment, Sensor Blocked, Delivery Blocked, Cleaning/Checking | Typically only includes stops that are less than five minutes and that do not require maintenance personnel. |
| Slow Running | Performance Loss | Incorrect Setting, Equipment Wear, Alignment Problem | Anything that keeps the equipment from running at its theoretical maximum speed. |
| Production Defects | Quality Loss | Scrap, Rework | Rejects during steady-state production. |
| Reduced Yield | Quality Loss | Scrap, Rework | Rejects during warm-up, startup or other early production. |
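
As an illustration of the threshold mentioned in the table, the sketch below classifies an unplanned stop event as either a Small Stop or an Unplanned Stop and looks up its OEE category. The function name, parameter names, and the five-minute default are assumptions drawn from the "typically less than five minutes" comment; a plant may set the threshold differently.

```python
# Mapping of the Six Big Losses to their OEE loss categories (from the table).
SIX_BIG_LOSSES = {
    "Unplanned Stops": "Availability Loss",
    "Setup and Adjustments": "Availability Loss",
    "Small Stops": "Performance Loss",
    "Slow Running": "Performance Loss",
    "Production Defects": "Quality Loss",
    "Reduced Yield": "Quality Loss",
}


def classify_stop(duration_min, needs_maintenance, small_stop_limit_min=5.0):
    """Classify an unplanned stop event and return (loss, OEE category)."""
    if duration_min < small_stop_limit_min and not needs_maintenance:
        loss = "Small Stops"
    else:
        loss = "Unplanned Stops"
    return loss, SIX_BIG_LOSSES[loss]


print(classify_stop(2, needs_maintenance=False))
print(classify_stop(12, needs_maintenance=False))
```

Keeping the threshold as a parameter reflects the flexibility noted in the Comments column.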

TPM Goals and Key Elements

The JIPE in its definition of TPM in 1971 stated that TPM seeks the following five key goals:

• maximize OEE, which includes availability, process efficiency, and product quality;
• take a systematic approach to reliability, maintainability, and life cycle costs (LCC);
• involve operations, materials management, maintenance, engineering, and administration in equipment management;
• involve all levels of management and workers; and
• improve equipment performance through small group activities and team performance.

The key elements of TPM include the following:

• Autonomous maintenance: Equipment operators are the focal point of TPM activities. Although most operators
understand what their equipment does, few understand the underlying mechanisms of how it does it. The term
“autonomous maintenance” is used to describe the activities of the operators, which relate to equipment maintenance,
and to the independent study nature of the other equipment improvement activities. Operators would perform cleaning,
inspection, lubrication, adjustments, and minor component change outs and other light maintenance tasks requiring
some training and instruction, but not comprehensive craftsman skills. The operator gradually learns how to diagnose
equipment problems before they become serious.

• Equipment Management: In TPM, whenever equipment performs at a level less than is required, the performance
loss is recorded and monitored. These losses can be grouped into six categories: breakdowns, setup and adjustments,
idling and minor stoppages, reduced speed, defects, and yield losses. Breakdowns and setups cause downtime and
impact availability; reduced speed impacts the cycle time; and defects and yield losses impact quality. OEE is the key
TPM performance measure and is the product of availability, cycle time efficiency, and quality rate. The operator and maintainer
are trained to identify problems related to OEE and perform root cause analyses in teams to investigate the losses.

• Systematic Planning and Continuous Improvement: Within the maintenance department, the TPM methodology
encourages the development of systematic planning and control of preventive and corrective maintenance, and fully
supports the autonomous activities performed by the operator. In plants where the basic operating and maintaining
environment has been improved to the point of diminishing returns, active maintenance prevention activities are
undertaken, as described earlier in the sections on designing for maintainability. Throughout, there should be a strong
emphasis on improving operator and maintainer skills. Spending on training is customarily on the order of 5–8% of
the labor budget.

Autonomous Maintenance

The benefits of involving the operators in the success of TPM cannot be overemphasized. A pragmatic way of
achieving this is by using a systematic, data-based approach to skill transfer. Skill transfer is the process of moving
tasks requiring lower skills from the exclusive domain of one work group to a shared task zone. Under this policy, an
operator who has been properly trained and certified can perform a mechanic’s task and vice versa. This partnership
between operations and maintenance integrates maintenance and operation/manufacturing and has many benefits that
include the following:

• Operators and mechanics become multi-skilled, which leads to job enrichment and improved flexibility of workers.
• The involvement of operators in routine maintenance builds a sense of responsibility, pride, and ownership.
• Delay times are reduced and productivity is increased.
• Teamwork between operations and maintenance is promoted.

The 8 Pillars of Total Productive Maintenance (TPM)

Traditional total productive maintenance was developed by Seiichi Nakajima of Japan. His work on the
subject led to the TPM process in the late 1960s and early 1970s. Nippon Denso (now Denso), a company that made
parts for Toyota, was one of the first organizations to implement a TPM program. This resulted in an internationally
accepted benchmark for how to implement TPM. Incorporating lean manufacturing techniques, TPM is built on eight
pillars based on the 5-S system. The 5-S system is an organizational method based around five Japanese words and
their meanings: seiri (organize), seiso (cleanliness), seiton (orderliness), seiketsu (standardize) and shitsuke (sustain).

The eight pillars of total productive maintenance focus on proactive and preventive techniques to help improve
equipment reliability. The eight pillars are: autonomous maintenance; focused improvement (kaizen); planned
maintenance; quality maintenance; early equipment management; training and education; safety, health and
environment; and TPM in administration. Let's break down each pillar below.

1. Autonomous maintenance: Autonomous maintenance means ensuring your operators are fully trained on
routine maintenance like cleaning, lubricating and inspecting, as well as placing that responsibility solely in
their hands. This gives machine operators a feeling of ownership of their equipment and increases their
knowledge of the particular piece of equipment. It also helps keep the machinery clean and
lubricated, helps identify issues before they become failures, and frees up maintenance staff for higher-level
tasks.

Implementing autonomous maintenance involves cleaning the machine to a "baseline" standard that the
operator must maintain. This includes training the operator on technical skills for conducting a routine
inspection based on the machine's manual. Once trained, the operator sets his or her own autonomous
inspection schedule. Standardization ensures everyone follows the same procedures and processes.

2. Focused improvement: Focused improvement is based around the Japanese term "kaizen," meaning
"improvement." In manufacturing, kaizen requires improving functions and processes continually. Focused
improvement looks at the process as a whole and brainstorms ideas for how to improve it. Getting small teams
in the mindset of proactively working together to implement regular, incremental improvements to processes
pertaining to equipment operation is key for TPM. Diversifying team members allows for the identification of
recurring problems through cross-functional brainstorming. It also combines input from across the company
so teams can see how processes affect different departments.

In addition, focused improvement increases efficiency by reducing product defects and the number of
processes while enhancing safety by analyzing the risks of each individual action. Finally, focused
improvement ensures improvements are standardized, making them repeatable and sustainable.

3. Planned maintenance: Planned maintenance involves studying metrics like failure rates and historical
downtime and then scheduling maintenance tasks based around these predicted or measured failure rates or
downtime periods. In other words, since there is a specific time to perform maintenance on equipment, you
can schedule maintenance around the time when equipment is idle or producing at low capacity, rarely
interrupting production.

Additionally, planned maintenance allows for inventory buildup for when scheduled maintenance occurs.
Since you'll know when each piece of equipment is scheduled for maintenance activities, having this
inventory buildup ensures any decrease in production due to maintenance is mitigated.

Taking this proactive approach greatly reduces the amount of unplanned downtime by allowing for most
maintenance to be planned for times when machinery is not scheduled for production. It also lets you plan
inventory more thoroughly by giving you the ability to better control parts that are prone to wear and failure.
Other benefits include a gradual decrease in breakdowns, leading to increased uptime, and a reduction in capital
investments in equipment since it is being used to its maximum potential.

4. Quality maintenance: All the maintenance planning and strategizing in the world is all for naught if the
quality of the maintenance being performed is inadequate. The quality maintenance pillar focuses on working
design error detection and prevention into the production process. It does this by using root cause
analysis (specifically the "5 Whys") to identify and eliminate recurring sources of defects. By proactively
detecting the source of errors or defects, processes become more reliable, producing products with the right
specifications the first time.

Possibly the biggest benefit of quality maintenance is that it prevents defective products from moving down the
line, which could lead to a lot of rework. With targeted quality maintenance, quality issues are addressed, and
permanent countermeasures are put in place, minimizing or completely eliminating defects and downtime
related to defective products.

5. Early equipment management: The TPM pillar of early equipment management takes the practical
knowledge and overall understanding of manufacturing equipment acquired through total productive
maintenance and uses it to improve the design of new equipment. Designing equipment with the input of
people who use it most allows suppliers to improve maintainability and the way in which the machine
operates in future designs.

When discussing the design of equipment, it's important to talk about things like the ease of cleaning and
lubrication, accessibility of parts, ergonomically placing controls in a way that is comfortable for the operator,
how changeovers occur and safety features. Taking this approach increases efficiency even more because new
equipment already meets the desired specifications and has fewer startup issues, therefore reaching planned
performance levels quicker.

6. Training and education: Lack of knowledge about equipment can derail a TPM program. Training and
education apply to operators, managers and maintenance personnel. They are intended to ensure everyone is
on the same page with the TPM process and to address any knowledge gaps so TPM goals are achievable.
This is where operators learn skills to proactively maintain equipment and identify emerging problems. The
maintenance team learns how to implement a proactive and preventive maintenance schedule, and managers
become well-versed in TPM principles, employee development and coaching. Using tools like single-point
lessons posted on or near equipment can further help train operators on operating procedures.

7. Safety, health and environment: Maintaining a safe working environment means employees can perform
their tasks in a safe place without health risks. It's important to produce an environment that makes production
more efficient, but it should not be at the risk of an employee's safety and health. To achieve this, any
solutions introduced in the TPM process should always consider safety, health and the environment.

Aside from the obvious benefits, when employees come to work in a safe environment each day, their attitude
tends to be better, since they don't have to worry about this significant aspect. This can noticeably increase
productivity. Safety considerations should be especially prominent during the early equipment
management stage of the TPM process.

8. TPM in administration: A good TPM program is only as good as the sum of its parts. Total productive
maintenance should look beyond the plant floor by addressing and eliminating areas of waste in administrative
functions. This means supporting production by improving things like order processing, procurement and
scheduling. Administrative functions are often the first step in the entire manufacturing process, so it's
important they are streamlined and waste-free. For example, if order-processing procedures become more
streamlined, then material gets to the plant floor quicker and with fewer errors, eliminating potential
downtime while missing parts are tracked down.

How to Implement Total Productive Maintenance (TPM)

Now that you have an understanding of the foundation (5-S system) and pillars on which the TPM process is built,
let's take a look at how to implement a TPM program. This is generally done in five steps: identifying a pilot area,
restoring equipment to prime operating condition, measuring OEE, addressing and reducing major losses, and
implementing planned maintenance.

Step 1: Identify a Pilot Area


Using a pilot area to begin implementation helps gain more acceptance from staff when they see the benefits that come
out of the process. When choosing equipment for a pilot area, consider these three questions:

• What's the easiest to improve? Selecting equipment that is easiest to improve gives you the chance for
immediate and positive results; however, it doesn't test the TPM process as strongly as the other two options.
• Where's the bottleneck? Choosing equipment based on where production is clearly being held up gives you an
immediate increase in total output and provides quick payback. The downside is that employing this
equipment as a pilot means you're using a critical asset as an example and risk the chance of it being offline
longer than you would like.
• What's the most problematic? Fixing equipment that gives operators the most trouble will be well-received,
strengthening support for the TPM program. However, this doesn't give you as much immediate payback as
the previous approach, and it may be challenging to obtain a quick result from figuring out an unsolved
problem, leading to disinterest.

If this is your first time implementing a TPM program, your best choice is typically the first approach – the easiest
equipment to improve. If you have some or extensive experience with total productive maintenance, you may choose
to correct the bottleneck. This is because you can build temporary stock or inventory, making sure downtime can be
tolerated, which minimizes risk.

Include employees across all aspects of your business (operators, maintenance personnel, managers and
administration) in the pilot selection process. It's a good idea to use a visual like a project board where you can post
progress for all to see.

Step 2: Restore Equipment to Prime Operating Condition
The concept of restoring equipment to prime operating condition revolves around the 5-S system and autonomous
maintenance. First, TPM participants should learn to continuously keep equipment in its original condition using the
5-S system: organize, cleanliness, orderliness, standardize and sustain. This might include:

• Photographing the area and current state of the equipment and then posting the photos to your project board.
• Clearing the area by removing unused tools, debris and anything that can be considered waste.
• Organizing the tools and components you use regularly (a shadow board with tool outlines is a popular
option).
• Cleaning the equipment and the surrounding area thoroughly.
• Photographing the improvements of the equipment and surrounding area and then posting to the project board.
• Creating a standardized 5-S work process to maintain the continuity of this process.
• Auditing the process with lessening frequency (first daily, then weekly, etc.) to ensure the 5-S process is being
followed (update the process to keep it current and relevant).

Once you've established a baseline state of the equipment, you can implement the autonomous maintenance program
by training operators on how to clean equipment while inspecting it for wear and abnormalities. Creating an
autonomous maintenance program also means developing a standardized way to clean, inspect and lubricate
equipment correctly. Items to address during the planning period for the autonomous maintenance program include:

• Identifying and documenting inspection points, including parts that endure wear.
• Increasing visibility where possible to help with inspection while the machine is running (replacing opaque
guarding with transparent guarding).
• Identifying and clearly labeling set points with their corresponding settings (most people put labels with
settings directly on the equipment).
• Identifying all lubrication points and scheduling maintenance during changeovers or planned downtime
(consider placing difficult-to-access lubrication points that require stopping the machine on the outside of the
equipment).
• Training operators to make them aware of any emerging or potential issues so they can report them to the line
supervisor.
• Creating an autonomous maintenance checklist for all operator-controlled tasks.
• Auditing the process with lessening frequency to ensure the checklist is being followed.

Step 3: Measure OEE


Step three requires you to track OEE for the target equipment, either manually or using automated software (as long as
it includes reason-code tracking for unplanned stoppage time). For details on how to calculate OEE manually, see
Reliable Plant's article on OEE. Regularly measuring OEE gives you data-driven confirmation of whether your
TPM program is working and lets you track progress over time.

Since the biggest losses in regard to equipment are the result of unplanned downtime, it's important to categorize
every unplanned stoppage event. This gives you a more accurate look at where a stoppage is occurring. Include an
"unknown" or "unallocated" stoppage time category for unknown causes.

It's recommended that you gather data for a minimum of two weeks to get an accurate representation of the unplanned
stoppage time and a clear picture of how small stops and slow cycles impact production. Below is a simplified
example of a top 5 loss chart. Each loss is categorized and is in descending order from the loss that causes the most
downtime to the loss that causes the least.

Top 5 Loss Chart

| Loss Rank | Loss Category | Lost Time (minutes) |
|---|---|---|
| 1 | Equipment Failure: Filler Jam | 400 |
| 2 | Equipment Failure: Bottle Labeler Down | 250 |
| 3 | Setup/Adjustments: Bottle Change | 170 |
| 4 | Setup/Adjustments: Label Change | 165 |
| 5 | Equipment Failure: Bottle Jam | 10 |

Total Lost Time = 995 minutes (about 16.6 hours)
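
A chart like the one above can be produced from a log of categorized stoppage events. The sketch below is a minimal illustration: the function name and the individual events are hypothetical (chosen so that the totals match the chart), and events without a recorded cause are assumed to go into an "Unallocated" category, as recommended earlier.

```python
from collections import defaultdict


def top_losses(stop_events, n=5):
    """Sum lost minutes per category and return the n largest, biggest first.

    stop_events: iterable of (category, minutes); a category of None or ""
    is counted as "Unallocated".
    """
    totals = defaultdict(float)
    for category, minutes in stop_events:
        totals[category or "Unallocated"] += minutes
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical event log whose category totals match the chart above.
events = [
    ("Equipment Failure: Filler Jam", 250),
    ("Setup/Adjustments: Bottle Change", 170),
    ("Equipment Failure: Filler Jam", 150),
    ("Equipment Failure: Bottle Labeler Down", 250),
    ("Setup/Adjustments: Label Change", 165),
    ("Equipment Failure: Bottle Jam", 10),
]
ranked = top_losses(events)
for rank, (category, minutes) in enumerate(ranked, start=1):
    print(rank, category, minutes)
total = sum(minutes for _, minutes in ranked)
print(f"Total Lost Time = {total:.0f} minutes ({total / 60:.1f} hours)")
```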

Step 4: Address/Reduce Major Losses


Once you've got a data-driven snapshot of where your top losses are, it's time to address them. This step uses the
previously discussed pillar of focused improvement or kaizen. To do this, put together a cross-functional team of
operators, maintenance personnel and supervisors that can dissect the OEE data using root cause analysis and identify
the main cause(s) of the losses. Your team's process might look something like this:

• Select a loss based on OEE and stoppage time data. This should be the biggest source of unplanned stoppage
time.
• Look into the symptoms of the problem(s). Collect detailed information on symptoms like observations,
physical evidence and photographic evidence. Using a fishbone diagram to track symptoms and record
information while you're at the equipment is strongly recommended.
• With your team, discuss and identify potential causes of the problem(s), check the possible causes against the
evidence you've gathered, and brainstorm the most effective ways to solve the issue.
• Schedule planned downtime to implement the agreed-upon fixes.
• Once the fix has been implemented, restart production and observe how effective the fix is over time. If it
resolves the issue, make a note to standardize the change and move on to the next cause of stoppage time. If
not, gather more information and hold another brainstorming session.

Step 5: Implement Planned Maintenance


The last step of the TPM implementation process is the integration of proactive maintenance techniques into your
program. This involves working off the third pillar of planned maintenance. Choose which components should receive
proactive maintenance by looking at three factors: wear components, components that fail and stress points.
Identifying stress points is often done by using infrared thermography and vibration analysis.

Next, establish proactive maintenance intervals. These intervals are not set in stone and can be updated as needed. For wear
and predicted failure-based components, establish the current wear level and then a baseline replacement interval.
Once these have been determined, you can create a proactive replacement schedule of all wear- and failure-prone
components. When doing this, use "run time" as opposed to "calendar time." Finally, develop a standardized process
for creating work orders based on the planned maintenance schedule.
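
A run-time-based replacement schedule of this kind can be sketched as follows. The class, field names, component names, and intervals are hypothetical; the point is only that each component is tracked by accumulated run hours rather than calendar days.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    replacement_interval_h: float       # baseline interval, in run hours
    run_hours_since_replacement: float  # accumulated run time, not calendar time


def components_due(components, lookahead_h=0.0):
    """Return names of components due within lookahead_h run hours, most urgent first."""
    due = []
    for c in components:
        remaining = c.replacement_interval_h - c.run_hours_since_replacement
        if remaining <= lookahead_h:
            due.append((remaining, c.name))
    return [name for _, name in sorted(due)]


# Made-up components and intervals for illustration.
parts = [
    Component("filler nozzle seal", 500, 480),
    Component("conveyor belt", 2000, 900),
    Component("labeler blade", 300, 310),
]
print(components_due(parts))
print(components_due(parts, lookahead_h=40))
```

The look-ahead window is one way to gather upcoming replacements into the next planned stop; the baseline intervals themselves stay adjustable as log data accumulates.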

You can optimize maintenance intervals by designing a feedback system. Things like log sheets for each wear- and
failure-prone component where operators can record replacement information and component condition at the time of
replacement will be key. Additionally, conduct monthly planned maintenance audits to verify the maintenance
schedule is being followed and the component logs are being kept up to date. Review the logs' information to see if
adjustments to the maintenance schedule need to be made.

What About The Remaining Four TPM Pillars?


You may have noticed the implementation process omitted four of the eight pillars: quality maintenance, early
equipment management, safety, health and environment, and TPM in administration. So, when should you introduce
these activities? They should be instituted as needed. Let's take a look at some examples.

• Quality maintenance should be introduced to the TPM process when significant issues about quality are
being raised by customers or employees.
• The best time to use early equipment management is when new equipment is in the design phase or is being
installed.
• Safety, health and environment should always be at the forefront of any process or program design. Use it in
tandem with the five-step implementation process.
• TPM in administration should be addressed before you implement the final version of your planned
maintenance schedule. Issues in administration like work order delays, processing problems and part
procurement greatly delay the rest of the production process.
