Unit-1 Maintenance Management
The key functions of maintenance management are:
1. Planning and Scheduling: Maintenance managers plan when and how maintenance
will be carried out to ensure minimal disruption to operations. This includes both
scheduled preventive maintenance and unplanned corrective maintenance.
2. Maintenance Operations: This involves performing preventive maintenance
(regularly scheduled maintenance) and corrective maintenance (repair after a failure
occurs).
3. Resource Management: It involves managing labor, spare parts, tools, and equipment
to carry out maintenance effectively. Maintenance management ensures that the right
resources are available at the right time.
4. Equipment Monitoring: Continuous monitoring of equipment performance is
essential. This includes tracking wear and tear, detecting anomalies, and using
predictive maintenance techniques to anticipate failures before they happen.
5. Documentation and Record-Keeping: Maintaining detailed records of all
maintenance activities is critical for tracking equipment performance, understanding
failure trends, and improving future maintenance plans.
6. Budgeting and Cost Control: Maintenance management involves preparing and
managing the maintenance budget. It ensures that maintenance costs are controlled
without sacrificing equipment reliability.
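The record-keeping function described in item 5 can be sketched in code. This is a minimal illustration, not a prescribed format: the `MaintenanceRecord` fields, the helper names, and the sample history are all hypothetical and chosen only to show how stored records support failure-trend tracking.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaintenanceRecord:
    """One maintenance job (hypothetical field layout)."""
    equipment_id: str
    date: str                    # ISO date string, e.g. "2024-03-15"
    kind: str                    # "preventive" or "corrective"
    failure_mode: Optional[str]  # None for preventive work
    downtime_hours: float
    cost: float


def failure_mode_counts(records):
    """Count corrective jobs per failure mode to expose failure trends."""
    return Counter(
        r.failure_mode
        for r in records
        if r.kind == "corrective" and r.failure_mode
    )


def downtime_by_equipment(records):
    """Sum downtime hours per piece of equipment."""
    totals = defaultdict(float)
    for r in records:
        totals[r.equipment_id] += r.downtime_hours
    return dict(totals)


# Made-up sample history, for illustration only.
history = [
    MaintenanceRecord("PUMP-01", "2024-01-10", "corrective", "wear", 4.0, 300.0),
    MaintenanceRecord("PUMP-01", "2024-02-02", "preventive", None, 1.0, 80.0),
    MaintenanceRecord("PUMP-01", "2024-03-15", "corrective", "wear", 5.0, 350.0),
    MaintenanceRecord("FAN-02", "2024-03-20", "corrective", "fatigue", 8.0, 900.0),
]

print(failure_mode_counts(history).most_common(1))  # most frequent failure mode
print(downtime_by_equipment(history))
```

Even a simple tally like this shows which failure mode recurs most often and which equipment accumulates the most downtime, which is the information needed to adjust future maintenance plans.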
Organization and Administration of Maintenance Systems
Maintenance systems require a structured approach to manage both people and processes
efficiently. This involves setting up the right organization, administration policies, and
frameworks.
An effective maintenance system rests on the following requirements:
1. Clear Objectives: The maintenance system must have well-defined objectives, such as
reducing downtime, optimizing costs, and improving equipment lifespan.
2. Skilled Workforce: Maintenance staff must be properly trained and qualified to handle
maintenance tasks. This includes having technicians, engineers, supervisors, and
planners with the necessary skills.
3. Adequate Resources: The system must have access to essential tools, spare parts, and
equipment for performing maintenance tasks. Resource availability is critical to avoid
delays in repairs.
4. Proper Documentation: Maintenance records, equipment history, and manuals should
be well-maintained. This helps in tracking the performance of equipment, identifying
patterns in failures, and planning future maintenance.
5. Safety and Compliance: The maintenance system must prioritize safety standards,
ensuring compliance with regulations and minimizing the risk of accidents.
The organizational structure for maintenance should match the size and complexity of the
operations. The key structural elements are:
1. Centralized vs. Decentralized Maintenance:
o Centralized Maintenance: Maintenance activities are managed and controlled
by a single department, ensuring uniformity and central control over processes,
policies, and resource allocation.
o Decentralized Maintenance: Maintenance responsibilities are distributed
across different departments or locations. This approach allows for more
flexibility and faster response times in local areas.
2. Hierarchical Structure:
o Maintenance Manager: Oversees the entire maintenance department, setting
policies, managing budgets, and ensuring the overall objectives of the
maintenance system are met.
o Maintenance Supervisor/Planner: Responsible for planning and scheduling
maintenance tasks, coordinating with other departments, and ensuring that
maintenance is carried out efficiently.
o Maintenance Technicians/Engineers: Skilled individuals who perform the
actual maintenance tasks, such as repairs, inspections, and replacements.
o Operators: In some systems (e.g., TPM), machine operators are trained to
perform basic maintenance tasks, like cleaning, lubricating, and minor
adjustments.
3. Functional Areas:
o Preventive Maintenance Department: Focuses on scheduling and executing
regular maintenance to avoid breakdowns.
o Emergency/Corrective Maintenance Department: Handles unplanned
repairs and breakdowns, ensuring quick restoration of equipment.
o Planning and Scheduling: This team is responsible for creating maintenance
schedules, coordinating resources, and ensuring minimal disruption to
operations.
o Inventory Management: Ensures the availability of spare parts and tools
required for maintenance.
Administration of Maintenance Systems
1. Maintenance Policy: The administration must define clear maintenance policies that
outline the goals (e.g., downtime reduction, cost control), procedures, and performance
standards.
2. Work Planning and Scheduling: Effective administration includes developing a work
plan that outlines what needs to be done, when it needs to be done, and how resources
will be allocated. Scheduling maintenance during non-peak times to avoid production
losses is essential.
3. Budget and Cost Control: Administration should monitor and control maintenance
costs, ensuring that the spending is justified by the improvement in equipment
reliability and performance.
4. Performance Monitoring and Reporting: The administration must track the
performance of the maintenance system through KPIs (Key Performance Indicators),
such as mean time between failures (MTBF), mean time to repair (MTTR), and overall
equipment effectiveness (OEE).
5. Training and Development: Ensuring that the maintenance staff is continuously
trained on the latest technologies, safety practices, and equipment handling is a key
administrative function.
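The KPIs named in item 4 can be computed from basic maintenance data. The sketch below uses the standard definitions (MTBF as operating time per failure, MTTR as repair time per repair, OEE as availability × performance × quality). The availability formula MTBF / (MTBF + MTTR) is an added assumption not stated in these notes, and the input figures are hypothetical.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: total operating time / number of failures."""
    if failures <= 0:
        raise ValueError("failures must be a positive integer")
    return operating_hours / failures


def mttr(repair_hours: float, repairs: int) -> float:
    """Mean time to repair: total repair time / number of repairs."""
    if repairs <= 0:
        raise ValueError("repairs must be a positive integer")
    return repair_hours / repairs


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Inherent availability (assumed definition): MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def oee(availability_rate: float, performance_rate: float, quality_rate: float) -> float:
    """Overall equipment effectiveness as the product of its three factors."""
    return availability_rate * performance_rate * quality_rate


# Hypothetical month for one machine: 950 h running, 5 failures, 50 h of repairs.
machine_mtbf = mtbf(950.0, 5)   # 190.0 hours
machine_mttr = mttr(50.0, 5)    # 10.0 hours
machine_availability = availability(machine_mtbf, machine_mttr)  # 0.95
# Performance and quality rates below are assumed values.
machine_oee = oee(machine_availability, 0.90, 0.98)

print(f"MTBF = {machine_mtbf:.1f} h, MTTR = {machine_mttr:.1f} h")
print(f"Availability = {machine_availability:.3f}, OEE = {machine_oee:.3f}")
```

A rising MTBF and a falling MTTR both raise availability, which is why the two are usually reported together.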
Failure Analysis
The main objectives of failure analysis are:
1. Identify Root Causes of Failures: The main goal is to find out why a failure occurred,
whether it's due to material fatigue, design flaws, operational errors, or external factors.
2. Prevent Recurrence: By understanding the cause of the failure, preventive measures
can be put in place to avoid similar incidents in the future.
3. Improve Equipment Reliability: Identifying weak points in a system allows for design
or operational improvements, which enhances the overall reliability of the equipment.
4. Optimize Maintenance Strategies: Failure analysis provides insights that help in fine-
tuning maintenance schedules, such as when to perform preventive or predictive
maintenance.
5. Safety Enhancement: Preventing catastrophic failures ensures the safety of both the
equipment and personnel.
A failure analysis typically proceeds through the following steps:
1. Failure Detection: The first step is recognizing that a failure has occurred. This could
be through performance monitoring, inspections, or operator observations.
2. Data Collection: Relevant data is collected to investigate the failure. This includes
operating conditions, maintenance history, material specifications, and any unusual
occurrences leading to the failure.
3. Failure Examination: The failed component is physically examined. Methods like
visual inspection, microscopic analysis, non-destructive testing (NDT), and material
testing (e.g., tensile tests, hardness tests) are used to gather clues about the failure.
4. Failure Mode Identification: The next step is identifying the type of failure. Common
failure modes include:
o Fatigue: Failure due to repetitive stress cycles, often resulting in cracks.
o Wear: Material loss due to friction or erosion.
o Corrosion: Degradation of material due to chemical reactions with the
environment.
o Overload: Failure when the load exceeds the material's strength.
o Creep: Gradual deformation of materials under high temperatures and sustained
stress.
o Brittle Fracture: Sudden failure without significant deformation, typically in
brittle materials.
5. Root Cause Analysis (RCA): Various techniques are used to determine the root cause
of the failure:
o Fishbone Diagram (Ishikawa): This helps visualize the potential causes of
failure by categorizing them into areas like machinery, methods, materials, and
manpower.
o 5 Whys Technique: A simple yet effective method of asking "Why?" five times
to drill down to the underlying cause of the problem.
o Fault Tree Analysis (FTA): A more detailed, logical method that uses a tree
structure to identify potential causes of system failures.
6. Corrective Action: Once the root cause is identified, corrective measures are taken.
This could involve design modifications, changing operating conditions, revising
maintenance procedures, or implementing new safety measures.
7. Documentation and Reporting: The findings of the failure analysis should be
documented, and a report should be prepared that includes the causes of the failure, the
corrective actions taken, and recommendations for future prevention.
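The tree structure used in Fault Tree Analysis (step 5) can be evaluated numerically once probabilities are assigned to the basic events. The sketch below assumes the basic events are statistically independent, which is a simplification. The event names and probabilities are hypothetical and serve only to show how AND and OR gates combine.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if ALL input events occur (independent events)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if AT LEAST ONE input event occurs (independent events)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities over one operating period.
p_bearing_seizure = 0.02
p_misalignment = 0.10
p_substandard_material = 0.05

# Intermediate event: fatigue crack requires both misalignment AND poor material.
p_fatigue_crack = and_gate([p_misalignment, p_substandard_material])

# Top event: shaft fails if the bearing seizes OR a fatigue crack develops.
p_shaft_failure = or_gate([p_bearing_seizure, p_fatigue_crack])

print(f"P(fatigue crack) = {p_fatigue_crack:.4f}")
print(f"P(shaft failure) = {p_shaft_failure:.4f}")
```

Comparing the branch probabilities shows which basic events dominate the top event, and therefore where corrective action has the most effect.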
Source identification in failure analysis refers to determining the origin of the failure within
a system or process. This step is crucial for identifying not just the direct cause of failure, but
also the contributing factors that led to it. By isolating the sources of failure, maintenance
engineers can target corrective actions more effectively.
The objectives of source identification are:
1. Pinpoint the Exact Cause: The main objective is to locate the exact source of the
failure within a system, which may be related to design flaws, material defects,
operational conditions, or external influences.
2. Determine Contributing Factors: Failures may not occur due to a single issue but
rather a combination of factors. Source identification helps in understanding these
interactions.
3. Improve Future Reliability: Identifying the source of the failure allows for informed
design, operational, or maintenance improvements to prevent future occurrences.
Imagine a rotating shaft in a pump has failed prematurely. After performing failure analysis,
the following findings are identified:
• The primary source of the failure was a fatigue crack that initiated at a sharp corner in
the shaft’s design.
• The secondary source was improper alignment during installation, which increased the
load on the shaft at the stress concentration point.
• Further material testing showed the shaft material was substandard, which accelerated
the fatigue process.
By identifying these sources, the corrective actions would involve redesigning the shaft to
avoid sharp corners, improving installation procedures to ensure proper alignment, and
sourcing higher-grade materials.
Classification of Failures
Failures can be classified based on several criteria, including the nature, cause, and time of
occurrence. Here are the primary categories:
A. By Nature of Failure:
• Catastrophic Failure: A sudden and complete breakdown of the system or component
(e.g., a gearbox shattering due to overloading).
• Gradual Failure: A slow deterioration in performance over time (e.g., corrosion or
wear leading to reduced efficiency).
B. By Cause of Failure:
• Mechanical Failure: Occurs due to issues like overloading, fatigue, or wear.
• Electrical Failure: Caused by issues in electrical systems like short circuits or
insulation breakdown.
• Thermal Failure: Occurs when components overheat or are exposed to extreme
temperature changes.
• Corrosion Failure: Degradation of materials due to chemical reactions with the
environment.
C. By Time of Failure:
• Early Failure (Infant Mortality): Failures that occur shortly after installation or
during the early stages of operation, often due to manufacturing defects or improper
installation.
• Random Failure: Failures that occur randomly during the product’s lifetime and are
not related to wear or aging (e.g., sudden electronic malfunction).
• Wear-Out Failure: Failures that occur after a long period of use, as components reach
the end of their useful life due to wear or material degradation.
Selectivity of Failures
Selectivity refers to how specific components or subsystems within a larger system are more
likely to fail under certain conditions. This can be influenced by factors like:
• Operating Conditions: Certain parts of a system may be subjected to harsher
conditions (e.g., high temperature or load) and are thus more prone to failure.
• Design and Material: Some materials or designs are more vulnerable to specific failure
modes, like fatigue or corrosion.
• Maintenance Practices: Components that are harder to access or maintain regularly
may fail more frequently due to lack of proper maintenance.
Reliability engineering uses statistical methods to analyze and predict the likelihood of failures
over time. The key statistical and reliability concepts used in failure analysis are:
A. Bathtub Curve:
The bathtub curve is a common graphical representation of failure rates over time and is divided
into three phases:
1. Infant Mortality Phase: High failure rate at the beginning due to manufacturing
defects or improper installation.
2. Normal Life Phase: Low, constant failure rate where failures occur randomly (due to
unexpected events).
3. Wear-Out Phase: Increasing failure rate as the equipment ages and wears out.
This curve helps in understanding when most failures occur and aids in planning preventive
maintenance strategies.
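The three phases of the bathtub curve can be reproduced with a simple model. The sketch below assumes the overall failure rate is the sum of a decreasing term (early failures), a constant term (random failures), and an increasing term (wear-out), with the first and last taking the Weibull hazard form (β/η)(t/η)^(β−1). This is one common modeling choice, not the only one, and every parameter value is hypothetical.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull failure rate at time t > 0 for shape beta and scale eta."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t: float) -> float:
    """Illustrative bathtub failure rate (failures per hour); parameters are assumed."""
    early_failures = weibull_hazard(t, beta=0.5, eta=20000.0)  # decreasing with t
    random_failures = 0.0001                                   # constant
    wear_out = weibull_hazard(t, beta=4.0, eta=10000.0)        # increasing with t
    return early_failures + random_failures + wear_out


# Sample the curve in each phase.
for hours in (10, 3000, 12000):
    print(f"t = {hours:>6} h  failure rate = {bathtub_hazard(hours):.6f} per hour")
```

With these assumed parameters the rate is high at 10 hours, lowest around 3000 hours, and high again at 12000 hours, which is the bathtub shape described above.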
D. Reliability (R(t)):
Reliability is the probability that a component or system will perform without failure over a
specified period. For a constant failure rate λ, it depends only on λ and the time t:
R(t) = e^{-\lambda t}
For example, if the failure rate (λ) is 0.001 failures per hour and you want to calculate
the reliability over 500 hours, the reliability is:
R(500) = e^{-0.001 \times 500} = e^{-0.5} \approx 0.607
This means there is approximately a 60.7% chance that the component will not fail within 500
hours of operation.
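The reliability calculation above can be checked with a few lines of code. This is a minimal sketch assuming a constant failure rate. The function name `reliability` is illustrative, and the MTBF = 1/λ relation used in the second call is a standard result for the constant-rate case that these notes do not state.

```python
import math


def reliability(failure_rate: float, hours: float) -> float:
    """R(t) = exp(-lambda * t), valid for a constant failure rate."""
    if failure_rate < 0 or hours < 0:
        raise ValueError("failure_rate and hours must be non-negative")
    return math.exp(-failure_rate * hours)


failure_rate = 0.001  # failures per hour, as in the example above

r_500 = reliability(failure_rate, 500)
print(f"R(500) = {r_500:.4f}")  # 0.6065

# For a constant failure rate, MTBF = 1 / lambda (here 1000 hours).
r_mtbf = reliability(failure_rate, 1.0 / failure_rate)
print(f"R(MTBF) = {r_mtbf:.4f}")  # 0.3679
```

The second result shows that only about 37% of units survive to the MTBF under this model, so MTBF should not be read as a guaranteed life.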
E. Weibull Distribution:
The Weibull distribution is a widely used statistical model in failure analysis, as it can represent
different failure rates over time. It is characterized by two parameters:
1. Shape parameter (β): Determines the failure rate trend.
o If β < 1, it indicates a decreasing failure rate (early failures).
o If β = 1, it indicates a constant failure rate (random failures).
o If β > 1, it indicates an increasing failure rate (wear-out failures).
2. Scale parameter (η): Represents the characteristic life, which is the time at which
63.2% of the population will have failed.
For example, if the scale parameter η = 1000 hours and the shape parameter β = 2, the
failure rate increases steadily with age (wear-out behavior), and 63.2% of the population is
expected to have failed by 1000 hours.
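The link between the shape parameter and the failure rate trend follows from the Weibull failure rate (hazard) function. The short derivation below is a sketch that starts from the standard two-parameter Weibull CDF. The symbols f(t) for the density and h(t) for the failure rate are introduced here and are not used elsewhere in these notes.

```latex
\begin{aligned}
F(t) &= 1 - e^{-(t/\eta)^{\beta}}, \qquad R(t) = 1 - F(t) = e^{-(t/\eta)^{\beta}} \\
f(t) &= \frac{dF}{dt} = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-(t/\eta)^{\beta}} \\
h(t) &= \frac{f(t)}{R(t)} = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
\end{aligned}
```

The exponent β − 1 sets the trend: it is negative for β < 1 (failure rate falls with time), zero for β = 1 (constant rate 1/η), and positive for β > 1 (failure rate rises with time). Setting t = η gives F(η) = 1 − e^{−1} ≈ 0.632 for any β, which is where the 63.2% figure for the characteristic life comes from.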
Numerical Example (Weibull Distribution):
If you have a component with a Weibull distribution where β = 2 and η = 1000 hours, you can
calculate the probability of failure within the first 500 hours using the cumulative
distribution function (CDF) of the Weibull distribution:
F(t) = 1 - e^{-\left(\frac{t}{\eta}\right)^{\beta}}
Substitute t = 500, η = 1000, and β = 2:
F(500) = 1 - e^{-\left(\frac{500}{1000}\right)^{2}} = 1 - e^{-0.25} \approx 0.221
This means there is a 22.1% probability that the component will fail within the first 500 hours
of operation.
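The numerical example can be reproduced in code, and the same CDF can be inverted to ask the opposite question: by what time will a given fraction of components have failed? This is a minimal sketch. The function names are illustrative, and the 10% target in the last call is an arbitrary choice for demonstration.

```python
import math


def weibull_cdf(t: float, beta: float, eta: float) -> float:
    """Probability of failure by time t: F(t) = 1 - exp(-(t/eta)**beta)."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return 1.0 - math.exp(-((t / eta) ** beta))


def time_to_fraction_failed(p: float, beta: float, eta: float) -> float:
    """Invert the CDF: the time by which a fraction p (0 < p < 1) has failed."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)


beta, eta = 2.0, 1000.0  # values from the example above

print(f"F(500)  = {weibull_cdf(500, beta, eta):.3f}")   # 0.221
print(f"F(1000) = {weibull_cdf(1000, beta, eta):.3f}")  # 0.632 at t = eta

# Time by which 10% of the population is expected to have failed.
t_10 = time_to_fraction_failed(0.10, beta, eta)
print(f"10% failed by about {t_10:.0f} hours")
```

The inverse form is useful for setting preventive replacement intervals: choose the acceptable failure fraction and read off the corresponding operating time.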
Failure modes describe the specific way a component fails. In failure analysis, identifying the
mode is essential to understanding the root cause and selecting corrective actions.
A. Fatigue Failure:
Occurs due to cyclic stresses below the material's ultimate tensile strength, leading to crack
formation and eventual fracture. This is common in components subject to repetitive loading,
like crankshafts.
B. Corrosion Failure:
Results from chemical reactions between materials and their environment, degrading the
material over time (e.g., rusting of steel).
C. Wear Failure:
Occurs when material is removed from a surface by friction or erosion. Bearings and
gears are commonly affected by wear.
D. Creep Failure:
Gradual deformation of materials under constant stress and elevated temperatures. This is
typical in high-temperature applications like turbines or boilers.
E. Overload Failure:
Occurs when the stress on a component exceeds its design strength, resulting in immediate
fracture or permanent deformation. Overload failures are often due to unexpected operational
conditions or improper design.