Determine Maintenance Strategy
Unit One
Identify and Analyze Maintenance Needs
Introduction:
Nowadays it is difficult to run a business without the help of ICT. People use IT
components in one way or another, and organizations from small enterprises to large
companies rely on this technology to improve their operations. However, the components
used in these business environments do not always work as intended. Hardware can fail
for different reasons such as age, power problems, improper handling, or malicious
users; software can become corrupted; and data can be lost or damaged. The problem is
that when an ICT component fails, it can affect the normal functioning of the whole
business. It is therefore worthwhile to incorporate a maintenance plan alongside other
business plans so that the business can return to its normal condition after a failure
occurs. Depending on the type of business and on people's preferences, there are
different maintenance strategies. Some may wait until a failure occurs and react
immediately once the fault is identified. Others take preventive action to stop faults
from happening in the first place. The choice of method is up to the organization, but
as an IT professional it is recommended to prevent failure rather than wait for it.
A risk analysis program entails identifying the most likely threats to business
continuity and deciding which areas of a company are most susceptible to these
threats. The consequences of failure, once these threats are realized, are then
considered in a business impact analysis.
Threats to Continuity
Possible threats to business continuity can be external or internal and can be
natural, technical or human related. Even though it can be difficult to determine the
exact nature of potential failures, it is important that risks be assessed and if possible
quantified.
Consideration should include the geography of the business location including the
proximity of rivers, landslide areas, power stations, airports, highways that may carry
hazardous waste, and potential terrorist targets or accident zones. The history of the
local area should be investigated to ascertain the level and regularity of natural
disasters. Accessibility is another factor, with the level of physical security affecting
the likelihood of any attack on premises or infrastructure.
The track record of any utilities used should also be factored in; older power
stations are more susceptible to failure and therefore more likely to be responsible for
downtime.
Varying levels of automation and the amount of technology used will result in varying
susceptibility to these threats; existing backup systems and services should be taken
into account when deciding on the final level of risk for each area.
Communications is an area that often needs specialist analysis due to the nature and
sophistication of the technology, and telecommunications providers can assist with
telecom risk analysis and with evaluating the telecom recovery options.
It's a worthwhile exercise to quantify the various threats in terms of overall impact.
There is an array of methods used, plus the option for professional help, but a simple
analysis could involve a combination of an impact level and probability ratings.
1: Minor impact, with disruption up to 2 hours. This would cover the more usual
threats such as power outages and internal application failures.
2: Disruptive impact, up to 8 hours. Hardware failures and malicious damage would
usually fall into this category.
3: Serious outage, up to 2 days. Cut communications or staff disputes may be
involved at this level.
4: Major outage, over 2 days. Natural disasters such as flooding and fire are the
most likely causes of extended outages.
It's of course necessary to apply the scale uniformly and not to combine cumulative
threats such as flooding leading to a landslide; consider them individually.
A probability value then needs to be applied to each threat going from 1 for low to 10
for high.
To create a weighted risk rating, multiply the impact value by the probability factor.
For example, if vandalism such as the cabling into the premises being cut would produce
an impact rating of 2, and because it is a high-crime area the probability is assessed
at 7, the weighted risk rating is 14.
Using a system such as this, the threats can be ranked, and resources and priorities
applied accordingly.
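As a minimal sketch of the rating scheme described above (impact level 1-4 multiplied by a probability of 1-10), the following Python snippet ranks a set of threats by weighted risk. The threats and the values assigned to them are illustrative assumptions only.

# Weighted risk rating = impact level (1-4) x probability (1-10).
# The threats and values below are illustrative assumptions only.

threats = [
    ("Power outage",            1, 8),   # (name, impact, probability)
    ("Hardware failure",        2, 5),
    ("Cut cabling (vandalism)", 2, 7),
    ("Flooding",                4, 2),
]

# Rank the threats by weighted risk so resources and priorities
# can be applied accordingly.
for name, impact, probability in sorted(
        threats, key=lambda t: t[1] * t[2], reverse=True):
    print(f"{name:<25} impact={impact} probability={probability} "
          f"weighted risk={impact * probability}")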
Activity 1
Discuss the following case study and try to answer the questions that follow.
1. Assume that you are running a barbershop in Arbaminch town, but power fluctuation
is a big problem because the power can go out for a day or even two days.
List all the possible problems that can be caused by a power outage.
Calculate the risk to the business due to power outages.
Try to quantify the amount of money, in birr, lost if the power is gone for two
days.
If the power goes out twice a week, how much money in birr is lost
in one month? In a year?
If there are 100 barbershops, how much money in birr will be lost in a year?
2. Assume that you are providing a service to customers using your laptop
computer. Unfortunately, your computer's hard disk has crashed and you cannot
run your business because of the failure.
What are the possible losses your business may face?
Try to quantify the loss in economic value.
What are the effects of your computer's damage on your customers?
3. How can you prevent internal system threats?
The terms 32-bit and 64-bit refer to the way a computer's processor (also called a
CPU) handles information. The 64-bit version of Windows handles large amounts of
random access memory (RAM) more effectively than a 32-bit system.
So you just bought a fancy new computer, and it’s got a big sticker on it that says “64-
bit!” Have you found yourself wondering why this particular computing buzzword is so
prominently featured on your new hardware, and what exactly it means? Modern
computing has been shifting towards 64-bit for a few years now, and it has saturated
the market to a point where even entry-level computers are equipped with these new,
more powerful processors. Even with the manufacturers pushing the new CPUs, your
computer may not be able to take full advantage of the technology, and getting to that
point may cost you more money in software than it’s worth.
The number of bits in a processor refers to the size of the data types that it handles
and the size of its registers. A 64-bit processor can represent 2^64 distinct values,
including memory addresses, which means it is able to address over four billion
times as much physical memory as a 32-bit processor. The key difference: 32-bit
processors are perfectly capable of handling a limited amount of RAM, and 64-bit
processors are capable of utilizing much more. Of course, in order to achieve this,
your operating system also needs to be designed to take advantage of the greater
access to memory.
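As a rough back-of-the-envelope illustration of the numbers involved (not tied to any particular hardware, and ignoring operating-system limits), the following Python snippet compares the two address spaces.

# Back-of-the-envelope comparison of 32-bit and 64-bit address spaces.
# Purely illustrative arithmetic; actual usable memory also depends on
# the operating system and the hardware.

addresses_32 = 2 ** 32      # 4,294,967,296 byte addresses (~4 GiB)
addresses_64 = 2 ** 64      # 18,446,744,073,709,551,616 byte addresses

print(f"32-bit address space: {addresses_32:,} bytes")
print(f"64-bit address space: {addresses_64:,} bytes")
print(f"Ratio: {addresses_64 // addresses_32:,} times larger")  # 2**32, ~4.3 billion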
As a general rule, if you have under 4 GB of RAM in your computer, you don’t need a
64-bit CPU, but if you have 4 GB or more, you do. While many users may find that a
32-bit processor provides them with enough performance and memory access,
applications that tend to use large amounts of memory may show vast improvements
with the upgraded processor. Image and video editing software, 3D rendering utilities,
and video games will make better use of a 64-bit architecture and operating system,
especially if the machine has 8 or even 16 GB of RAM that can be divided among the
applications that need it.
Through hardware emulation, it’s possible to run 32-bit software and operating
systems on a machine with a 64-bit processor. The opposite isn't true, however: 32-bit
processors cannot run software designed with a 64-bit architecture in mind. This
means if you want to take full advantage of your new processor you also need a new
operating system, otherwise you won’t experience any marked benefits over the 32-bit
version of your hardware.
With an increase in the availability of 64-bit processors and larger capacities of RAM,
Microsoft and Apple both have begun to develop and release upgraded versions of their
operating systems that are designed to take full advantage of the new technology. In
the case of Microsoft Windows, the basic versions of the operating systems put
software limitations on the amount of RAM that can be used by applications, but even
in the Ultimate and Professional editions of the operating system, 4 GB is the
maximum usable memory the 32-bit version can handle. While a 64-bit operating
system can increase the capabilities of a processor drastically, the real jump in power
comes from software designed with this architecture in mind.
Applications with high performance demands already take advantage of the increase in
available memory, with companies releasing 64-bit versions of their programs. This is
especially useful on programs that can store a lot of information for immediate access,
like image editing and software that opens multiple large files at the same time.
Video games are also uniquely equipped to take advantage of 64-bit processing and
the increased memory that comes with it. Being able to handle more computations at
once means more spaceships on screen without lagging and smoother performance
from your graphics card, which doesn’t have to share memory with other processes
anymore.
Most software is backwards compatible, allowing you to run applications that are 32-
bit in a 64-bit environment without any extra work or issues. Virus protection
software and drivers tend to be the exception to this rule, with hardware mostly
requiring the proper version be installed in order to function correctly.
Here are answers to some common questions about the 32-bit and 64-bit
versions of Windows
How can I tell whether my computer is running a 32-bit or a 64-bit version of Windows?
To check, open System by clicking the Start button, right-clicking Computer, and then
clicking Properties. In the System section, you can see what type of operating system
you're currently running under System type, and whether or not you can run a 64-bit
version of Windows under 64-bit capable. (If your computer is already running
a 64-bit version of Windows, you won't see the 64-bit capable listing.)
Can I upgrade from a 32-bit version of Windows to a 64-bit version of Windows?
No. If you are currently running a 32-bit version of Windows, you can only perform an
upgrade to another 32-bit version of Windows. Similarly, if you are running a 64-bit
version of Windows, you can only perform an upgrade to another 64-bit version of
Windows.
If you want to move from a 32-bit version of Windows to a 64-bit version of Windows,
you'll need to back up your files and then perform a custom installation of the 64-bit
version of Windows.
Most programs designed for a computer running a 32-bit version of Windows will work
on a computer running 64-bit versions of Windows. Notable exceptions are many
antivirus programs, and some hardware drivers.
Drivers designed for 32-bit versions of Windows do not work on computers running a
64-bit version of Windows. If you're trying to install a printer or other device that only
has 32-bit drivers available, it won't work correctly on a 64-bit version of Windows.
For information about updating drivers and troubleshooting issues with device drivers
for 64-bit versions of Windows, contact the manufacturer of the device or program.
What are the benefits of using a 64-bit version of Windows?
The benefits are most apparent when you have a large amount of random access
memory (RAM) installed on your computer, typically 4 GB of RAM or more. In such
cases, because a 64-bit operating system can handle large amounts of memory more
efficiently than a 32-bit operating system can, a 64-bit system can be more responsive
when running several programs at the same time and switching between them
frequently.
If I'm running a 64-bit version of Windows, do I need 64-bit drivers for my devices?
Yes. All hardware devices need 64-bit drivers to work on a 64-bit version of Windows.
Drivers designed for 32-bit versions of Windows don't work on computers running 64-
bit versions of Windows.
Risk Mitigation
Imagine that you are an investigative journalist. For each assignment, your job is to
research a topic deeply to uncover the hidden facts, and to report the story in such a
way that it provides context to your readers. How does what you uncovered have the
potential to affect their daily lives? Unlike traditional analytical journalism, which
simply reports a story from the data that is available, investigative journalism
attempts to determine if what has been presented is, in fact, reality. Architecture
evaluation shares that objective. The purpose of the evaluation is not simply to review
and communicate the candidate-architecture specification to the stakeholders. The
objective is to review and evaluate the architecture, assess its ability to meet quality
requirements, detect design errors early in the software-development life cycle (SDLC),
and identify potential risks to the project. In other words, the objective is to determine
if the reality of the specification measures up to its claims.
Why?
Why should an organization review and evaluate software architecture? The bottom
line is that architecture review produces better architectures—resulting in the delivery
of better systems. Too often, systems are released with performance issues, security
risks, and availability problems as a result of inappropriate architectures. The
architectures were defined early in the project life cycle, but the resulting flaws were
discovered much later. They were exposed when the project was affected most
negatively by change, when downstream artifacts were too costly to overhaul.
There are also some positive side effects of evaluation. First, the process necessitates
the unambiguous articulation of the system's quality requirements. If the
requirements are too vague to evaluate an architecture against, they must be
elaborated upon. Poorly specified requirements result in hit-or-miss architectures.
Evaluation also forces you to document the architecture clearly, so that it can be
reviewed. Furthermore, as you participate in regular evaluations of your work, you
learn to anticipate the questions that will be asked and the typical criteria against
which your work will be measured. Over time, this process promotes stronger
architectural skills.
What?
The objectives for a review are based upon stakeholder concerns and focus on specific
aspects of the architecture. Objectives will vary from project to project, according to
each system's specific requirements, but there are a few general categories under
which most tend to fall. Typically, stakeholders want to ensure the quality and
suitability of the architecture, identify areas in which improvement is required, open a
dialogue between decision makers to address areas of risk, and negotiate any
necessary trade-offs.
What are the outputs of an architectural evaluation and review? The primary output is
a comprehensive report that describes the evaluation-and-review findings. This
document need only be as formal as required by the project, but it should serve as a
concise summary of the assessment that can be communicated to the project team, as
well as the stakeholders. The report should include the scope of the review,
evaluation-and-review objectives, architecturally significant requirements list, findings
and recommendations, and an action plan.
What is the scope of an architecture evaluation and review? The scope describes the
boundaries of a specific instance of a review. For example, the architecture of the
entire system can be evaluated, or only part of the system. A review can evaluate the
architecture against all of the system's quality requirements, or only the most critical
ones. Discover the appropriate scope by prioritizing the goals of the evaluation, based
on its defined objectives.
What exactly should be reviewed? Based on the defined objectives and scope, create
a list of the specific criteria against which the architecture will be measured. The list
might include system-wide properties, significant functional requirements to deliver,
and general attributes of quality architectures. The goal is to review and assess how
each item on the list is affected by the architectural decisions that are made.
A true investigative approach, however, takes time to ask, "What criteria have been
excluded, and why?" Are there political agendas at stake that selectively ignore aspects
of the architecture? Have software and other technologies been mandated that
constrain the architecture's ability to meet its objectives? While some of these
scenarios cannot be avoided in the real business world, it is always appropriate for the
architect and the reviewers to acknowledge any limitations of the architecture, even if
they cannot be removed.
Who?
Who participates in a software architecture evaluation and review? The objective of the
selection process is to ensure that people with the right skills and relevance to the
project are assigned to support the effort effectively, without creating a crowd that is
too large to be efficient. Ideally, there should be active representation from three
groups: an evaluation team, project stakeholders, and project practitioners.
The evaluation team conducts the actual evaluation and documents all findings. In
large organizations, an evaluation team often comprises practitioners who rotate
through the team in between other projects. Staffing the evaluation team with
practitioners from the target project should be avoided, if possible, to maintain the
highest degree of objectivity. For very small projects, however, self-assessments and
peer reviews are completely acceptable. It is critical that members of the evaluation
team have respect and credibility as architects, so that their conclusions will carry
weight with the project representatives and stakeholders.
Stakeholders are the people who have specific architectural concerns and a vested
interest in the resulting system. Most of the architectural requirements were specified
by these stakeholders, so their participation in the evaluation is critical.
System architects and component designers are the key project representatives and
are responsible for communicating the architecture and presenting their motivations
for design decisions. Other project representatives to include are project and program
managers, developers, system administrators, and component vendors.
The follow-up step for an investigative approach is to ask, "Who is missing from the
participant list?" What stakeholders or project representatives intentionally were not
included? Occasionally, practitioners and stakeholders are excluded because of past
experiences. Perhaps they were not supportive of a previous evaluation effort—not
dedicating enough time, not taking the evaluation as seriously as they should have, or
exhibiting defensive or contentious behavior. Part of the evaluation process is coaching
the participants. If someone is important to an evaluation for the knowledge that they
have or the requirements that they represent, it is worth the effort to try to influence
their behavior, so that they can contribute to the process.
When?
When should an architecture evaluation and review take place? If only one evaluation
can be performed, it takes place ideally as early in the life cycle as is reasonable and
possible. Generally, you want to conduct the evaluation when the architecture is
specified, but before anything has been implemented. The goal is to identify any areas
of concern as early as possible, while they are still relatively easy and cheap to correct.
That being said, an evaluation and review can be conducted at any stage in the life
cycle. For projects using an iterative development approach, evaluation can take place
within each iteration—whenever architectural decisions have been made. Evaluations
also can be conducted on legacy systems, to assess their ability to support future
business objectives.
Your investigative instincts should be getting sharper by now. How can we take the
"when" question a step further? Beware of stakeholders or project representatives
balking at the timing of an evaluation. The reasons could be completely valid; maybe
they are unavailable, or they truly feel that the timing is inappropriate. Digging a little
deeper might reveal project issues. The architecture team might be struggling. They
might not see the evaluation as their chance to get valuable input and advice.
Stakeholders might not be ready and willing to negotiate any conflicting requirements.
Take the time to uncover the true reasons behind any postponement attempts. You
might find a critical risk hidden behind that reluctance.
How?
How is an architecture evaluation and review performed? Prior to the review, you
should gather inputs that describe the architecture and explain the rationale behind
the architectural decisions that are made. Examples of typically selected inputs are
the architecturally significant requirements, an architectural description or software
architecture document, an architectural-decisions document, and an architectural
proof of concept.
The final step of the evaluation-and-review process is to document the findings, and
communicate them to the project team and stakeholders. When architectural concerns
or deficiencies are exposed, it is critical to provide recommendations for improvement
that are actionable. The whole point of the investigative approach is to uncover issues
that otherwise might have been overlooked. If recommendations are too generic to be
implemented, the evaluation cannot contribute much to the success of the project.
Where?
After the review—where do you go from here? When the evaluation report is complete,
you typically are given an opportunity to respond to the findings and
recommendations. The report then is forwarded to the stakeholders for use in
planning the next steps for the project. Sometimes, an evaluation will identify the need
for trade-offs. For example, if the architecture cannot support a specific performance
requirement, stakeholders must determine if the benefit of strengthening the
architecture to achieve that requirement is worth the cost. Following an evaluation,
the architectural decisions should be updated, requirements refined and prioritized,
and the project adjusted as necessary.
While each evaluation produces different results, the goal is always the same: to
produce a better architecture. For you, the architect: Consider an evaluation of your
work as a way to produce improved specifications by tapping into the experiences of
veteran architects. See each evaluation as a valuable learning opportunity. Your
projects will benefit, your organization will benefit, and so, too, will your career.
Critical-Thinking Questions
A validated architecture does not guarantee the quality of the resulting system.
How can downstream design decisions undermine the architecture's ability to
meet its quality objectives?
Unit Two
Purpose
Review all assumptions that have led to the formulation of strategies and the
applicability of any policies to the particular asset
Assess whether they have any significant impact with respect to well/facility
maintenance and intervention
Identify assumptions to be reviewed and any departures from policy
Formulate assumptions into change proposals for the asset reference plan
3. Determine maintenance strategies
You determine maintenance strategies in line with company objectives and the
requirements of the asset holder, as specified in the asset reference plan, which
includes any changes from the Review maintenance assumptions process step (see
above).
Purpose
RCM techniques often utilize a logic diagram approach for evaluating the potential
effects of failure and selecting the appropriate maintenance strategy. As an example,
Figure 1 shows a portion of one of the decision-making flowcharts presented in the
SAE JA1012 document, A Guide to the Reliability-Centered Maintenance (RCM)
Standard. Similar diagrams are provided in other published RCM guidelines. (Some of
the major RCM publications are listed in the References section of this article.)
In addition to, or instead of, a logic diagram approach, the RCM analyst may wish to
use cost- and availability-based comparisons of potential maintenance strategies when
selecting and assigning maintenance tasks. This article provides an overview of these
comparison techniques along with a couple of demonstration examples.
Run-to-Failure - fix the equipment when it fails but do not perform any
scheduled maintenance.
Scheduled Inspections
Failure Finding Inspections - inspect the equipment on a scheduled
basis to discover hidden failures. If the equipment is found to be failed,
initiate corrective maintenance.
On-Condition Inspections - inspect the equipment on a scheduled or
ongoing basis to discover conditions indicating that a failure is about to
occur. If the equipment is found to be about to fail, initiate preventive
maintenance.
Scheduled Preventive Maintenance
Service - perform lubrication or other servicing actions on a scheduled
basis.
Repair - repair or overhaul the equipment on a scheduled basis.
Replace - replace the equipment on a scheduled basis.
Design Change - Re-design the equipment, select different equipment or make
some other one-time change to improve the reliability/availability of the
equipment.
Given certain information about how the equipment will be operated, the probability of
occurrence for the failure mode and the maintenance characteristics, the analyst can
use simulation to estimate the cost and average availability that can be expected over
the operational life of the equipment when a particular maintenance strategy is
employed. The calculations can then be used to compare available maintenance
strategies so that the analyst can select the most cost-effective strategy that provides
an acceptable level of performance.
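As an illustration of this kind of comparison, the following minimal Monte Carlo sketch (not the RCM++ calculation itself) simulates a single Weibull-distributed failure mode under a run-to-failure strategy and a scheduled-replacement strategy, using the figures from Example 1 below. The 500-mile PM downtime is an assumption inferred from the one work day stated there, and the model is deliberately simplified: one component, "as good as new" restoration, and events that would fall just past the end of the operating life are still counted.

import random

def simulate(pm_interval, life=120_000, beta=2.3, eta=72_000,
             cm_cost=4_650, cm_downtime=3_500,
             pm_cost=2_050, pm_downtime=500, runs=2_000):
    """Estimate expected maintenance cost and average availability over one
    operating life (in miles). pm_interval=None means run-to-failure.
    Downtime is expressed in miles of lost production, as in Example 1 below."""
    total_cost = 0.0
    total_downtime = 0.0
    for _ in range(runs):
        miles = 0.0
        while miles < life:
            time_to_failure = random.weibullvariate(eta, beta)
            if pm_interval is not None and time_to_failure > pm_interval:
                # The scheduled replacement happens before the failure occurs.
                miles += pm_interval
                total_cost += pm_cost
                total_downtime += pm_downtime
            else:
                # The component fails in service; corrective maintenance is performed.
                miles += time_to_failure
                total_cost += cm_cost
                total_downtime += cm_downtime
    avg_cost = total_cost / runs
    avg_downtime = total_downtime / runs
    availability = life / (life + avg_downtime)
    return avg_cost, availability

for label, interval in [("Run-to-Failure", None), ("Preventive Replacement", 60_000)]:
    cost, availability = simulate(interval)
    print(f"{label:<23} expected cost ~ ${cost:,.0f}   availability ~ {availability:.3f}")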
To estimate the cost and average availability that can be expected for a run-to-failure
(corrective maintenance only) maintenance strategy, the analyst must provide information
such as the equipment's failure distribution and the downtime, cost and restoration
factor associated with the corrective maintenance.
Scheduled Repair/Replacement
To calculate the cost and availability that can be expected from a maintenance
strategy that involves preventive repair/replacement of the equipment, the following
information is required (in addition to the inputs described previously):
With this additional information, simulation can be used to estimate the expected
number of corrective maintenance (CM) and preventive maintenance (PM) actions, along
with the uptime. The total operating cost for this maintenance strategy includes the
cost of all CMs plus the cost of all PMs, as shown next. Note that the Cost per Uptime
and Average Availability calculations are the same, regardless of task type.
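The cost equations themselves are not reproduced in this text. In the usual formulation (an assumption here, not necessarily the exact expressions used by the source), with N_CM and N_PM denoting the expected numbers of corrective and preventive actions returned by the simulation, they take the following form:

\[
\text{Total Operating Cost} = N_{CM}\cdot C_{CM} + N_{PM}\cdot C_{PM}
\]
\[
\text{Cost per Uptime} = \frac{\text{Total Operating Cost}}{\text{Uptime}},
\qquad
\text{Average Availability} = \frac{\text{Uptime}}{\text{Total Operating Time}}
\]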
Calculations for Service and Failure Finding tasks are performed in a similar manner
except that the assumptions of the simulation will vary to fit the conditions of the
task. For example, if the failure is undetectable during normal operation and the
equipment is found to be failed during a scheduled Service task, then the simulation
will assume that corrective maintenance will be initiated. Likewise, a Failure Finding
task can initiate corrective action if the equipment is found to be failed but does not
restore the equipment to any degree if it is found to be running.
On-Condition Inspections
For the cases in which the inspection detects that a failure is approaching, the
analysis also requires the downtime, cost and restoration factor associated with the
preventive maintenance that will be initiated.
This total operating cost is then used to calculate cost per uptime and average
availability as described previously.
Example 1: Mechanical Component with Wearout
Consider an RCM analysis for a large truck that is intended to operate for 120,000
miles per year. A critical failure mode has been identified for a mechanical component
and reliability analysis indicates that the failure behavior follows a Weibull
distribution with beta = 2.3 and eta = 72,000 miles. Considering logistical factors,
downtime penalties and the actual repair resources, it takes 7 work days (3,500 miles
of lost “production”) and costs $4,650 each time the component must be replaced
when it fails. The component will be “as good as new” after the maintenance action.
The RCM analysis team is considering whether to incorporate a scheduled preventive
replacement task into the maintenance plan. Because there are no additional logistical
delays/costs for a planned replacement, the PM task will take only 1 work day and
cost $2,050.
Using the RCM++ software, the team can first estimate the optimum preventive
replacement time for the component and then simulate the operation of the equipment
for 120,000 miles to estimate the cost and average availability that can be expected in
a year from the two maintenance strategies that are under consideration. By entering
the cost of corrective maintenance (CM), the cost of preventive maintenance (PM) and
the probability of failure into the following equation, the optimum PM interval is
determined to be 60,330.25 miles.
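The equation referenced above is not reproduced in this text. A commonly used formulation (an assumption here, not necessarily the exact RCM++ equation) is the age-replacement cost-rate model, which divides the expected cost per replacement cycle by the expected uptime per cycle. Minimizing it numerically with this example's figures, as in the sketch below, gives an optimum in the neighborhood of 60,000 miles.

import math

BETA, ETA = 2.3, 72_000        # Weibull shape and scale (miles) from this example
C_CM, C_PM = 4_650, 2_050      # corrective vs. preventive replacement cost ($)

def reliability(t):
    """Weibull reliability function R(t)."""
    return math.exp(-((t / ETA) ** BETA))

def cost_rate(t, step=100):
    """Expected cost per mile when the component is preventively replaced at age t:
    (planned cost if it survives to t + corrective cost if it fails first)
    divided by the expected uptime per cycle, the integral of R(s) from 0 to t."""
    r_t = reliability(t)
    expected_uptime = sum(reliability(s + step / 2) * step
                          for s in range(0, int(t), step))
    return (C_PM * r_t + C_CM * (1.0 - r_t)) / expected_uptime

# Grid search over candidate replacement intervals for the lowest cost rate.
best = min(range(10_000, 120_001, 500), key=cost_rate)
print(f"Optimum preventive replacement interval ~ {best:,} miles "
      f"(cost rate ~ ${cost_rate(best):.4f} per mile)")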
Rounding to 60,000 miles and performing the simulation yields the following results
per vehicle per year:
(Simulation results table comparing Run-to-Failure and Preventive Replacement; the numeric values are not reproduced here.)
Example 2: Electrical Component
Another critical failure mode has been identified for an electrical component of the
truck described in Example 1. Its failure behavior follows a Weibull distribution with
beta = 0.76 and eta = 100,000 miles. The RCM analysis team is considering a planned replacement for
this component at 60,000 miles to coincide with the PM for the mechanical
component. For this failure mode, the CM downtime is 4 work days; the CM cost is
$2,800; the PM cost would be $1,200 and there would be no additional PM downtime
because the equipment is already down for the other maintenance task. The analysis
yields the following results:
(Simulation results table comparing Run-to-Failure and Preventive Replacement; the numeric values are not reproduced here.)
In this case, the analysis indicates that a run-to-failure maintenance strategy will be
more cost-effective and provide better availability. In fact, since the beta parameter of
the failure distribution is less than 1, this indicates that the equipment does not
experience wearout and there is no optimum preventive replacement time. The team
could repeat the analysis for other maintenance intervals and would always determine
that run-to-failure is more cost-effective.
Conclusion
As this topic demonstrates, cost-based comparisons can be very useful to help RCM
analysts to select the most appropriate maintenance strategy for a particular piece of
equipment/failure mode. ReliaSoft's RCM++ software automatically performs the
maintenance task cost calculations described here. This functionality relies on the
same powerful simulation engine available in ReliaSoft's BlockSim software, which can
also be used for maintenance planning and other more complex system reliability,
maintainability and availability analyses.
Understanding PM Schedules and PM Cycle Events
When you manage the equipment maintenance needs, you define the type and
frequency of each maintenance task for each piece of equipment in the organization.
The PM cycle refers to the sequence of events that make up a maintenance task, from
its definition to its completion. Because most PM tasks are commonly performed at
scheduled intervals, parts of the PM cycle repeat, based on those intervals.
You should be familiar with the following terms and concepts that are related to the
PM cycle.
Service Type
You define service types to describe individual preventive maintenance tasks. You can
define as many service types as you need. You can set up service types to apply to a
particular piece of equipment or a class of equipment. Examples of service types
include:
250-hour inspection
Clutch adjustment
Lubricate ventilation fan
10,000-hour engine rebuild
Installing antivirus software
Cleaning computer parts
Installing update patches
Preventive Maintenance (PM)
A PM refers to one or more service types that are scheduled to be performed for a piece
of equipment. You typically specify that a PM be performed at a predefined point in
time. The point in time can be based on days, date, or when a piece of equipment
accumulates a predefined number of statistical units, such as hours, miles, and so on.
You identify how many units have accumulated for each piece of equipment by
periodically entering equipment meter readings.
You create one PM schedule for each piece of equipment for which you want to
perform PMs. The PM schedule defines which service types apply to a piece of
equipment. The PM schedule also defines the service interval for each service type. A
service interval refers to the frequency at which the service types are performed.
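As an illustration only (a sketch of the concepts above, not the data model of any particular maintenance system), a PM schedule tying service types to their service intervals might be represented like this; the equipment identifier and the intervals shown are hypothetical.

from dataclasses import dataclass, field

@dataclass
class ServiceType:
    """A single preventive maintenance task definition."""
    name: str
    interval: int      # how often the task is due
    unit: str          # the statistical unit the interval is measured in

@dataclass
class PMSchedule:
    """One PM schedule per piece of equipment: which service types apply
    to it and the service interval for each."""
    equipment_id: str
    service_types: list = field(default_factory=list)

schedule = PMSchedule(
    equipment_id="TRUCK-014",   # hypothetical equipment identifier
    service_types=[
        ServiceType("250-hour inspection", 250, "hours"),
        ServiceType("Lubricate ventilation fan", 30, "days"),
        ServiceType("10,000-hour engine rebuild", 10_000, "hours"),
    ],
)

for st in schedule.service_types:
    print(f"{schedule.equipment_id}: perform '{st.name}' every {st.interval} {st.unit}")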