0% found this document useful (0 votes)

238 views

Reliability Engineering

Reliability engineering deals with predicting, preventing, and managing engineering uncertainty and risks of failure. The objectives are to prevent failures, identify and correct failures that do occur, and determine ways to cope with failures. Reliability engineering techniques include reliability testing, failure analysis, fault tolerance through redundancy, and preventative maintenance. The goal is to design systems and components that function as intended without failure for a specified period of time.

Uploaded by

mia farrow

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

238 views

Reliability Engineering

Uploaded by

mia farrow

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 26

Reliability engineering

Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment
to function without failure. Reliability describes the ability of a system or component to function under
stated conditions for a specified period of time.[1] Reliability is closely related to availability, which is
typically described as the ability of a component or system to function at a specified moment or interval of
time.

The reliability function is theoretically defined as the probability of success at time t, which is denoted R(t).
In practice, it is calculated using different techniques and its value ranges between 0 and 1, where 0
indicates no probability of success while 1 indicates definite success. This probability is estimated from
detailed (physics of failure) analysis, previous data sets or through reliability testing and reliability
modeling. Availability, testability, maintainability and maintenance are often defined as a part of "reliability
engineering" in reliability programs. Reliability often plays the key role in the cost-effectiveness of systems.

Reliability engineering deals with the prediction, prevention and management of high levels of "lifetime"
engineering uncertainty and risks of failure. Although stochastic parameters define and affect reliability,
reliability is not only achieved by mathematics and statistics.[2][3] "Nearly all teaching and literature on the
subject emphasize these aspects, and ignore the reality that the ranges of uncertainty involved largely
invalidate quantitative methods for prediction and measurement." [4] For example, it is easy to represent
"probability of failure" as a symbol or value in an equation, but it is almost impossible to predict its true
magnitude in practice, which is massively multivariate, so having the equation for reliability does not begin
to equal having an accurate predictive measurement of reliability.

Reliability engineering relates closely to Quality Engineering, safety engineering and system safety, in that
they use common methods for their analysis and may require input from each other. It can be said that a
system must be reliably safe.

Reliability engineering focuses on costs of failure caused by system downtime, cost of spares, repair
equipment, personnel, and cost of warranty claims.[5]

History
The word reliability can be traced back to 1816, and is first attested to the poet Samuel Taylor Coleridge.[6]
Before World War II the term was linked mostly to repeatability; a test (in any type of science) was
considered "reliable" if the same results would be obtained repeatedly. In the 1920s, product improvement
through the use of statistical process control was promoted by Dr. Walter A. Shewhart at Bell Labs,[7]
around the time that Waloddi Weibull was working on statistical models for fatigue. The development of
reliability engineering was here on a parallel path with quality. The modern use of the word reliability was
defined by the U.S. military in the 1940s, characterizing a product that would operate when expected and
for a specified period of time.

In World War II, many reliability issues were due to the inherent unreliability of electronic equipment
available at the time, and to fatigue issues. In 1945, M.A. Miner published the seminal paper titled
"Cumulative Damage in Fatigue" in an ASME journal. A main application for reliability engineering in the
military was for the vacuum tube as used in radar systems and other electronics, for which reliability proved
to be very problematic and costly. The IEEE formed the Reliability Society in 1948. In 1950, the United
States Department of Defense formed a group called the "Advisory Group on the Reliability of Electronic
Equipment" (AGREE) to investigate reliability methods for military equipment.[8] This group
recommended three main ways of working:

Improve component reliability.

Establish quality and reliability requirements for suppliers.
Collect field data and find root causes of failures.

In the 1960s, more emphasis was given to reliability testing on component and system level. The famous
military standard MIL-STD-781 was created at that time. Around this period also the much-used
predecessor to military handbook 217 was published by RCA and was used for the prediction of failure
rates of electronic components. The emphasis on component reliability and empirical research (e.g. Mil Std
217) alone slowly decreased. More pragmatic approaches, as used in the consumer industries, were being
used. In the 1980s, televisions were increasingly made up of solid-state semiconductors. Automobiles
rapidly increased their use of semiconductors with a variety of microcomputers under the hood and in the
dash. Large air conditioning systems developed electronic controllers, as had microwave ovens and a
variety of other appliances. Communications systems began to adopt electronics to replace older
mechanical switching systems. Bellcore issued the first consumer prediction methodology for
telecommunications, and SAE developed a similar document SAE870050 for automotive applications. The
nature of predictions evolved during the decade, and it became apparent that die complexity wasn't the only
factor that determined failure rates for integrated circuits (ICs). Kam Wong published a paper questioning
the bathtub curve[9]—see also reliability-centered maintenance. During this decade, the failure rate of many
components dropped by a factor of 10. Software became important to the reliability of systems. By the
1990s, the pace of IC development was picking up. Wider use of stand-alone microcomputers was
common, and the PC market helped keep IC densities following Moore's law and doubling about every 18
months. Reliability engineering was now changing as it moved towards understanding the physics of
failure. Failure rates for components kept dropping, but system-level issues became more prominent.
Systems thinking became more and more important. For software, the CMM model (Capability Maturity
Model) was developed, which gave a more qualitative approach to reliability. ISO 9000 added reliability
measures as part of the design and development portion of certification. The expansion of the World-Wide
Web created new challenges of security and trust. The older problem of too little reliability information
available had now been replaced by too much information of questionable value. Consumer reliability
problems could now be discussed online in real time using data. New technologies such as micro-
electromechanical systems (MEMS), handheld GPS, and hand-held devices that combined cell phones and
computers all represent challenges to maintain reliability. Product development time continued to shorten
through this decade and what had been done in three years was being done in 18 months. This meant that
reliability tools and tasks had to be more closely tied to the development process itself. In many ways,
reliability became part of everyday life and consumer expectations.

Overview

Objective

The objectives of reliability engineering, in decreasing order of priority, are:[10]

1. To apply engineering knowledge and specialist techniques to prevent or to reduce the

likelihood or frequency of failures.
2. To identify and correct the causes of failures that do occur despite the efforts to prevent them.
3. To determine ways of coping with failures that do occur, if their causes have not been
corrected.
4. To apply methods for estimating the likely reliability of new designs, and for analysing
reliability data.

The reason for the priority emphasis is that it is by far the most effective way of working, in terms of
minimizing costs and generating reliable products. The primary skills that are required, therefore, are the
ability to understand and anticipate the possible causes of failures, and knowledge of how to prevent them.
It is also necessary to have knowledge of the methods that can be used for analysing designs and data.

Scope and techniques

Reliability engineering for "complex systems" requires a different, more elaborate systems approach than
for non-complex systems. Reliability engineering may in that case involve:

System availability and mission readiness analysis and related reliability and maintenance
requirement allocation
Functional system failure analysis and derived requirements specification
Inherent (system) design reliability analysis and derived requirements specification for both
hardware and software design
System diagnostics design
Fault tolerant systems (e.g. by redundancy)
Predictive and preventive maintenance (e.g. reliability-centered maintenance)
Human factors / human interaction / human errors
Manufacturing- and assembly-induced failures (effect on the detected "0-hour quality" and
reliability)
Maintenance-induced failures
Transport-induced failures
Storage-induced failures
Use (load) studies, component stress analysis, and derived requirements specification
Software (systematic) failures
Failure / reliability testing (and derived requirements)
Field failure monitoring and corrective actions
Spare parts stocking (availability control)
Technical documentation, caution and warning analysis
Data and information acquisition/organisation (creation of a general reliability development
hazard log and FRACAS system)
Chaos engineering

Effective reliability engineering requires understanding of the basics of failure mechanisms for which
experience, broad engineering skills and good knowledge from many different special fields of engineering
are required,[11] for example:

Tribology
Stress (mechanics)
Fracture mechanics / fatigue
Thermal engineering
Fluid mechanics / shock-loading engineering
Electrical engineering
Chemical engineering (e.g. corrosion)
Material science

Definitions

Reliability may be defined in the following ways:

The idea that an item is fit for a purpose with respect to time
The capacity of a designed, produced, or maintained item to perform as required over time
The capacity of a population of designed, produced or maintained items to perform as
required over time
The resistance to failure of an item over time
The probability of an item to perform a required function under stated conditions for a
specified period of time
The durability of an object

Basics of a reliability assessment

Many engineering techniques are used in reliability risk assessments, such as reliability block diagrams,
hazard analysis, failure mode and effects analysis (FMEA),[12] fault tree analysis (FTA), Reliability
Centered Maintenance, (probabilistic) load and material stress and wear calculations, (probabilistic) fatigue
and creep analysis, human error analysis, manufacturing defect analysis, reliability testing, etc. It is crucial
that these analyses are done properly and with much attention to detail to be effective. Because of the large
number of reliability techniques, their expense, and the varying degrees of reliability required for different
situations, most projects develop a reliability program plan to specify the reliability tasks (statement of work
(SoW) requirements) that will be performed for that specific system.

Consistent with the creation of safety cases, for example per ARP4761, the goal of reliability assessments is
to provide a robust set of qualitative and quantitative evidence that use of a component or system will not
be associated with unacceptable risk. The basic steps to take[13] are to:

Thoroughly identify relevant unreliability "hazards", e.g. potential conditions, events, human
errors, failure modes, interactions, failure mechanisms and root causes, by specific analysis
or tests.
Assess the associated system risk, by specific analysis or testing.
Propose mitigation, e.g. requirements, design changes, detection logic, maintenance,
training, by which the risks may be lowered and controlled for at an acceptable level.
Determine the best mitigation and get agreement on final, acceptable risk levels, possibly
based on cost/benefit analysis.

Risk here is the combination of probability and severity of the failure incident (scenario) occurring. The
severity can be looked at from a system safety or a system availability point of view. Reliability for safety
can be thought of as a very different focus from reliability for system availability. Availability and safety can
exist in dynamic tension as keeping a system too available can be unsafe. Forcing an engineering system
into a safe state too quickly can force false alarms that impede the availability of the system.

In a de minimis definition, severity of failures includes the cost of spare parts, man-hours, logistics, damage
(secondary failures), and downtime of machines which may cause production loss. A more complete
definition of failure also can mean injury, dismemberment, and death of people within the system (witness
mine accidents, industrial accidents, space shuttle failures) and the same to innocent bystanders (witness the
citizenry of cities like Bhopal, Love Canal, Chernobyl, or Sendai, and other victims of the 2011 Tōhoku
earthquake and tsunami)—in this case, reliability engineering becomes system safety. What is acceptable is
determined by the managing authority or customers or the affected communities. Residual risk is the risk
that is left over after all reliability activities have finished, and includes the unidentified risk—and is
therefore not completely quantifiable.

The complexity of the technical systems such as improvements of design and materials, planned
inspections, fool-proof design, and backup redundancy decreases risk and increases the cost. The risk can
be decreased to ALARA (as low as reasonably achievable) or ALAPA (as low as practically achievable)
levels.

Reliability and availability program plan

Implementing a reliability program is not simply a software purchase; it is not just a checklist of items that
must be completed that will ensure one has reliable products and processes. A reliability program is a
complex learning and knowledge-based system unique to one's products and processes. It is supported by
leadership, built on the skills that one develops within a team, integrated into business processes and
executed by following proven standard work practices.[14]

A reliability program plan is used to document exactly what "best practices" (tasks, methods, tools,
analysis, and tests) are required for a particular (sub)system, as well as clarify customer requirements for
reliability assessment. For large-scale complex systems, the reliability program plan should be a separate
document. Resource determination for manpower and budgets for testing and other tasks is critical for a
successful program. In general, the amount of work required for an effective program for complex systems
is large.

A reliability program plan is essential for achieving high levels of reliability, testability, maintainability, and
the resulting system availability, and is developed early during system development and refined over the
system's life-cycle. It specifies not only what the reliability engineer does, but also the tasks performed by
other stakeholders. An effective reliability program plan must be approved by top program management,
which is responsible for allocation of sufficient resources for its implementation.

A reliability program plan may also be used to evaluate and improve the availability of a system by the
strategy of focusing on increasing testability & maintainability and not on reliability. Improving
maintainability is generally easier than improving reliability. Maintainability estimates (repair rates) are also
generally more accurate. However, because the uncertainties in the reliability estimates are in most cases
very large, they are likely to dominate the availability calculation (prediction uncertainty problem), even
when maintainability levels are very high. When reliability is not under control, more complicated issues
may arise, like manpower (maintainers / customer service capability) shortages, spare part availability,
logistic delays, lack of repair facilities, extensive retro-fit and complex configuration management costs, and
others. The problem of unreliability may be increased also due to the "domino effect" of maintenance-
induced failures after repairs. Focusing only on maintainability is therefore not enough. If failures are
prevented, none of the other issues are of any importance, and therefore reliability is generally regarded as
the most important part of availability. Reliability needs to be evaluated and improved related to both
availability and the total cost of ownership (TCO) due to cost of spare parts, maintenance man-hours,
transport costs, storage cost, part obsolete risks, etc. But, as GM and Toyota have belatedly discovered,
TCO also includes the downstream liability costs when reliability calculations have not sufficiently or
accurately addressed customers' personal bodily risks. Often a trade-off is needed between the two. There
might be a maximum ratio between availability and cost of ownership. Testability of a system should also
be addressed in the plan, as this is the link between reliability and maintainability. The maintenance strategy
can influence the reliability of a system (e.g., by preventive and/or predictive maintenance), although it can
never bring it above the inherent reliability.
The reliability plan should clearly provide a strategy for availability control. Whether only availability or
also cost of ownership is more important depends on the use of the system. For example, a system that is a
critical link in a production system—e.g., a big oil platform—is normally allowed to have a very high cost
of ownership if that cost translates to even a minor increase in availability, as the unavailability of the
platform results in a massive loss of revenue which can easily exceed the high cost of ownership. A proper
reliability plan should always address RAMT analysis in its total context. RAMT stands for reliability,
availability, maintainability/maintenance, and testability in the context of the customer's needs.

Reliability requirements
For any system, one of the first tasks of reliability engineering is to adequately specify the reliability and
maintainability requirements allocated from the overall availability needs and, more importantly, derived
from proper design failure analysis or preliminary prototype test results. Clear requirements (able to
designed to) should constrain the designers from designing particular unreliable items / constructions /
interfaces / systems. Setting only availability, reliability, testability, or maintainability targets (e.g., max.
failure rates) is not appropriate. This is a broad misunderstanding about Reliability Requirements
Engineering. Reliability requirements address the system itself, including test and assessment requirements,
and associated tasks and documentation. Reliability requirements are included in the appropriate system or
subsystem requirements specifications, test plans, and contract statements. Creation of proper lower-level
requirements is critical.[15] Provision of only quantitative minimum targets (e.g., Mean Time Between
Failure (MTBF) values or failure rates) is not sufficient for different reasons. One reason is that a full
validation (related to correctness and verifiability in time) of a quantitative reliability allocation (requirement
spec) on lower levels for complex systems can (often) not be made as a consequence of (1) the fact that the
requirements are probabilistic, (2) the extremely high level of uncertainties involved for showing
compliance with all these probabilistic requirements, and because (3) reliability is a function of time, and
accurate estimates of a (probabilistic) reliability number per item are available only very late in the project,
sometimes even after many years of in-service use. Compare this problem with the continuous
(re-)balancing of, for example, lower-level-system mass requirements in the development of an aircraft,
which is already often a big undertaking. Notice that in this case, masses do only differ in terms of only
some %, are not a function of time, the data is non-probabilistic and available already in CAD models. In
case of reliability, the levels of unreliability (failure rates) may change with factors of decades (multiples of
10) as result of very minor deviations in design, process, or anything else.[16] The information is often not
available without huge uncertainties within the development phase. This makes this allocation problem
almost impossible to do in a useful, practical, valid manner that does not result in massive over- or under-
specification. A pragmatic approach is therefore needed—for example: the use of general levels / classes of
quantitative requirements depending only on severity of failure effects. Also, the validation of results is a far
more subjective task than for any other type of requirement. (Quantitative) reliability parameters—in terms
of MTBF—are by far the most uncertain design parameters in any design.

Furthermore, reliability design requirements should drive a (system or part) design to incorporate features
that prevent failures from occurring, or limit consequences from failure in the first place. Not only would it
aid in some predictions, this effort would keep from distracting the engineering effort into a kind of
accounting work. A design requirement should be precise enough so that a designer can "design to" it and
can also prove—through analysis or testing—that the requirement has been achieved, and, if possible,
within some a stated confidence. Any type of reliability requirement should be detailed and could be
derived from failure analysis (Finite-Element Stress and Fatigue analysis, Reliability Hazard Analysis, FTA,
FMEA, Human Factor Analysis, Functional Hazard Analysis, etc.) or any type of reliability testing. Also,
requirements are needed for verification tests (e.g., required overload stresses) and test time needed. To
derive these requirements in an effective manner, a systems engineering-based risk assessment and
mitigation logic should be used. Robust hazard log systems must be created that contain detailed
information on why and how systems could or have failed. Requirements are to be derived and tracked in
this way. These practical design requirements shall drive the design and not be used only for verification
purposes. These requirements (often design constraints) are in this way derived from failure analysis or
preliminary tests. Understanding of this difference compared to only purely quantitative (logistic)
requirement specification (e.g., Failure Rate / MTBF target) is paramount in the development of successful
(complex) systems.[17]

The maintainability requirements address the costs of repairs as well as repair time. Testability (not to be
confused with test requirements) requirements provide the link between reliability and maintainability and
should address detectability of failure modes (on a particular system level), isolation levels, and the creation
of diagnostics (procedures). As indicated above, reliability engineers should also address requirements for
various reliability tasks and documentation during system development, testing, production, and operation.
These requirements are generally specified in the contract statement of work and depend on how much
leeway the customer wishes to provide to the contractor. Reliability tasks include various analyses,
planning, and failure reporting. Task selection depends on the criticality of the system as well as cost. A
safety-critical system may require a formal failure reporting and review process throughout development,
whereas a non-critical system may rely on final test reports. The most common reliability program tasks are
documented in reliability program standards, such as MIL-STD-785 and IEEE 1332. Failure reporting
analysis and corrective action systems are a common approach for product/process reliability monitoring.

Reliability culture / human errors / human factors

In practice, most failures can be traced back to some type of human error, for example in:

Management decisions (e.g. in budgeting, timing, and required tasks)

Systems Engineering: Use studies (load cases)
Systems Engineering: Requirement analysis / setting
Systems Engineering: Configuration control
Assumptions
Calculations / simulations / FEM analysis
Design
Design drawings
Testing (e.g. incorrect load settings or failure measurement)
Statistical analysis
Manufacturing
Quality control
Maintenance
Maintenance manuals
Training
Classifying and ordering of information
Feedback of field information (e.g. incorrect or too vague)
etc.

However, humans are also very good at detecting such failures, correcting for them, and improvising when
abnormal situations occur. Therefore, policies that completely rule out human actions in design and
production processes to improve reliability may not be effective. Some tasks are better performed by
humans and some are better performed by machines.[18]
Furthermore, human errors in management; the organization of data and information; or the misuse or abuse
of items, may also contribute to unreliability. This is the core reason why high levels of reliability for
complex systems can only be achieved by following a robust systems engineering process with proper
planning and execution of the validation and verification tasks. This also includes careful organization of
data and information sharing and creating a "reliability culture", in the same way that having a "safety
culture" is paramount in the development of safety critical systems.

Reliability prediction and improvement

Reliability prediction combines:

creation of a proper reliability model (see further on this page)

estimation (and justification) of input parameters for this model (e.g. failure rates for a
particular failure mode or event and the mean time to repair the system for a particular
failure)
estimation of output reliability parameters at system or part level (i.e. system availability or
frequency of a particular functional failure) The emphasis on quantification and target setting
(e.g. MTBF) might imply there is a limit to achievable reliability, however, there is no inherent
limit and development of higher reliability does not need to be more costly. In addition, they
argue that prediction of reliability from historic data can be very misleading, with
comparisons only valid for identical designs, products, manufacturing processes, and
maintenance with identical operating loads and usage environments. Even minor changes
in any of these could have major effects on reliability. Furthermore, the most unreliable and
important items (i.e. the most interesting candidates for a reliability investigation) are most
likely to be modified and re-engineered since historical data was gathered, making the
standard (re-active or pro-active) statistical methods and processes used in e.g. medical or
insurance industries less effective. Another surprising – but logical – argument is that to be
able to accurately predict reliability by testing, the exact mechanisms of failure must be
known and therefore – in most cases – could be prevented! Following the incorrect route of
trying to quantify and solve a complex reliability engineering problem in terms of MTBF or
probability using an-incorrect – for example, the re-active – approach is referred to by
Barnard as "Playing the Numbers Game" and is regarded as bad practice.[19]

For existing systems, it is arguable that any attempt by a responsible program to correct the root cause of
discovered failures may render the initial MTBF estimate invalid, as new assumptions (themselves subject
to high error levels) of the effect of this correction must be made. Another practical issue is the general
unavailability of detailed failure data, with those available often featuring inconsistent filtering of failure
(feedback) data, and ignoring statistical errors (which are very high for rare events like reliability related
failures). Very clear guidelines must be present to count and compare failures related to different type of
root-causes (e.g. manufacturing-, maintenance-, transport-, system-induced or inherent design failures).
Comparing different types of causes may lead to incorrect estimations and incorrect business decisions
about the focus of improvement.

To perform a proper quantitative reliability prediction for systems may be difficult and very expensive if
done by testing. At the individual part-level, reliability results can often be obtained with comparatively
high confidence, as testing of many sample parts might be possible using the available testing budget.
However, unfortunately these tests may lack validity at a system-level due to assumptions made at part-level
testing. These authors emphasized the importance of initial part- or system-level testing until failure, and to
learn from such failures to improve the system or part. The general conclusion is drawn that an accurate and
absolute prediction – by either field-data comparison or testing – of reliability is in most cases not possible.
An exception might be failures due to wear-out problems such as fatigue failures. In the introduction of
MIL-STD-785 it is written that reliability prediction should be used with great caution, if not used solely for
comparison in trade-off studies.

Design for reliability

Design for Reliability (DfR) is a process that encompasses tools and procedures to ensure that a product
meets its reliability requirements, under its use environment, for the duration of its lifetime. DfR is
implemented in the design stage of a product to proactively improve product reliability.[20] DfR is often
used as part of an overall Design for Excellence (DfX) strategy.

Statistics-based approach (i.e. MTBF)

Reliability design begins with the development of a (system) model. Reliability and availability models use
block diagrams and Fault Tree Analysis to provide a graphical means of evaluating the relationships
between different parts of the system. These models may incorporate predictions based on failure rates
taken from historical data. While the (input data) predictions are often not accurate in an absolute sense,
they are valuable to assess relative differences in design alternatives. Maintainability parameters, for
example Mean time to repair (MTTR), can also be used as inputs for such models.

The most important fundamental initiating causes and failure mechanisms are to be identified and analyzed
with engineering tools. A diverse set of practical guidance as to performance and reliability should be
provided to designers so that they can generate low-stressed designs and products that protect, or are
protected against, damage and excessive wear. Proper validation of input loads (requirements) may be
needed, in addition to verification for reliability "performance" by testing.

One of the most important design techniques is

redundancy. This means that if one part of the system
fails, there is an alternate success path, such as a backup
system. The reason why this is the ultimate design choice
is related to the fact that high-confidence reliability
evidence for new parts or systems is often not available, or
is extremely expensive to obtain. By combining
redundancy, together with a high level of failure
monitoring, and the avoidance of common cause failures;
even a system with relatively poor single-channel (part)
reliability, can be made highly reliable at a system level
(up to mission critical reliability). No testing of reliability
has to be required for this. In conjunction with
redundancy, the use of dissimilar designs or
manufacturing processes (e.g. via different suppliers of
similar parts) for single independent channels, can provide
less sensitivity to quality issues (e.g. early childhood A fault tree diagram
failures at a single supplier), allowing very-high levels of
reliability to be achieved at all moments of the
development cycle (from early life to long-term). Redundancy can also be applied in systems engineering
by double checking requirements, data, designs, calculations, software, and tests to overcome systematic
failures.
Another effective way to deal with reliability issues is to perform analysis that predicts degradation,
enabling the prevention of unscheduled downtime events / failures. RCM (Reliability Centered
Maintenance) programs can be used for this.

Physics-of-failure-based approach

For electronic assemblies, there has been an increasing shift towards a different approach called physics of
failure. This technique relies on understanding the physical static and dynamic failure mechanisms. It
accounts for variation in load, strength, and stress that lead to failure with a high level of detail, made
possible with the use of modern finite element method (FEM) software programs that can handle complex
geometries and mechanisms such as creep, stress relaxation, fatigue, and probabilistic design (Monte Carlo
Methods/DOE). The material or component can be re-designed to reduce the probability of failure and to
make it more robust against such variations. Another common design technique is component derating: i.e.
selecting components whose specifications significantly exceed the expected stress levels, such as using
heavier gauge electrical wire than might normally be specified for the expected electric current.

Common tools and techniques

Many of the tasks, techniques, and analyses used in Reliability Engineering are specific to particular
industries and applications, but can commonly include:

Physics of failure (PoF)

Built-in self-test (BIT or BIST) (testability analysis)
Failure mode and effects analysis (FMEA)
Reliability hazard analysis
Reliability block-diagram analysis
Dynamic reliability block-diagram analysis[21]
Fault tree analysis
Root cause analysis
Statistical engineering, design of experiments – e.g. on simulations / FEM models or with
testing
Sneak circuit analysis
Accelerated testing
Reliability growth analysis (re-active reliability)
Weibull analysis (for testing or mainly "re-active" reliability)
Thermal analysis by finite element analysis (FEA) and / or measurement
Thermal induced, shock and vibration fatigue analysis by FEA and / or measurement
Electromagnetic analysis
Avoidance of single point of failure (SPOF)
Functional analysis and functional failure analysis (e.g., function FMEA, FHA or FFA)
Predictive and preventive maintenance: reliability centered maintenance (RCM) analysis
Testability analysis
Failure diagnostics analysis (normally also incorporated in FMEA)
Human error analysis
Operational hazard analysis
Preventative/Planned Maintenance Optimization (PMO)
Manual screening
Integrated logistics support

Results from these methods are presented during reviews of part or system design, and logistics. Reliability
is just one requirement among many for a complex part or system. Engineering trade-off studies are used to
determine the optimum balance between reliability requirements and other constraints.

The importance of language

Reliability engineers, whether using quantitative or qualitative methods to describe a failure or hazard, rely
on language to pinpoint the risks and enable issues to be solved. The language used must help create an
orderly description of the function/item/system and its complex surrounding as it relates to the failure of
these functions/items/systems. Systems engineering is very much about finding the correct words to
describe the problem (and related risks), so that they can be readily solved via engineering solutions. Jack
Ring said that a systems engineer's job is to "language the project." (Ring et al. 2000)[22] For part/system
failures, reliability engineers should concentrate more on the "why and how", rather that predicting
"when". Understanding "why" a failure has occurred (e.g. due to over-stressed components or
manufacturing issues) is far more likely to lead to improvement in the designs and processes used[4] than
quantifying "when" a failure is likely to occur (e.g. via determining MTBF). To do this, first the reliability
hazards relating to the part/system need to be classified and ordered (based on some form of qualitative and
quantitative logic if possible) to allow for more efficient assessment and eventual improvement. This is
partly done in pure language and proposition logic, but also based on experience with similar items. This
can for example be seen in descriptions of events in fault tree analysis, FMEA analysis, and hazard
(tracking) logs. In this sense language and proper grammar (part of qualitative analysis) plays an important
role in reliability engineering, just like it does in safety engineering or in-general within systems
engineering.

Correct use of language can also be key to identifying or reducing the risks of human error, which are often
the root cause of many failures. This can include proper instructions in maintenance manuals, operation
manuals, emergency procedures, and others to prevent systematic human errors that may result in system
failures. These should be written by trained or experienced technical authors using so-called simplified
English or Simplified Technical English, where words and structure are specifically chosen and created so
as to reduce ambiguity or risk of confusion (e.g. an "replace the old part" could ambiguously refer to a
swapping a worn-out part with a non-worn-out part, or replacing a part with one using a more recent and
hopefully improved design).

Reliability modeling
Reliability modeling is the process of predicting or understanding the reliability of a component or system
prior to its implementation. Two types of analysis that are often used to model a complete system's
availability behavior including effects from logistics issues like spare part provisioning, transport and
manpower are fault tree analysis and reliability block diagrams. At a component level, the same types of
analyses can be used together with others. The input for the models can come from many sources including
testing; prior operational experience; field data; as well as data handbooks from similar or related industries.
Regardless of source, all model input data must be used with great caution, as predictions are only valid in
cases where the same product was used in the same context. As such, predictions are often only used to
help compare alternatives.

For part level predictions, two separate fields of investigation are common:

The physics of failure approach uses an understanding of physical failure mechanisms

involved, such as mechanical crack propagation or chemical corrosion degradation or
failure;
The parts stress modelling approach is an
empirical method for prediction based on
counting the number and type of components
of the system, and the stress they undergo
during operation.
A reliability block diagram showing a "1oo3" (1 out
of 3) redundant designed subsystem
Reliability theory

Reliability is defined as the probability that a device will perform its intended function during a specified
period of time under stated conditions. Mathematically, this may be expressed as,

where is the failure probability density function and is the length of the period of time (which is
assumed to start from time zero).

There are a few key elements of this definition:

1. Reliability is predicated on "intended function:" Generally, this is taken to mean operation

without failure. However, even if no individual part of the system fails, but the system as a
whole does not do what was intended, then it is still charged against the system reliability.
The system requirements specification is the criterion against which reliability is measured.
2. Reliability applies to a specified period of time. In practical terms, this means that a system
has a specified chance that it will operate without failure before time . Reliability
engineering ensures that components and materials will meet the requirements during the
specified time. Note that units other than time may sometimes be used (e.g. "a mission",
"operation cycles").
3. Reliability is restricted to operation under stated (or explicitly defined) conditions. This
constraint is necessary because it is impossible to design a system for unlimited conditions.
A Mars rover will have different specified conditions than a family car. The operating
environment must be addressed during design and testing. That same rover may be
required to operate in varying conditions requiring additional scrutiny.
4. Two notable references on reliability theory and its mathematical and statistical foundations
are Barlow, R. E. and Proschan, F. (1982) and Samaniego, F. J. (2007).

Quantitative system reliability parameters—theory

Quantitative requirements are specified using reliability parameters. The most common reliability parameter
is the mean time to failure (MTTF), which can also be specified as the failure rate (this is expressed as a
frequency or conditional probability density function (PDF)) or the number of failures during a given
period. These parameters may be useful for higher system levels and systems that are operated frequently
(i.e. vehicles, machinery, and electronic equipment). Reliability increases as the MTTF increases. The
MTTF is usually specified in hours, but can also be used with other units of measurement, such as miles or
cycles. Using MTTF values on lower system levels can be very misleading, especially if they do not
specify the associated Failures Modes and Mechanisms (The F in MTTF).[16]
In other cases, reliability is specified as the probability of mission success. For example, reliability of a
scheduled aircraft flight can be specified as a dimensionless probability or a percentage, as often used in
system safety engineering.

A special case of mission success is the single-shot device or system. These are devices or systems that
remain relatively dormant and only operate once. Examples include automobile airbags, thermal batteries
and missiles. Single-shot reliability is specified as a probability of one-time success or is subsumed into a
related parameter. Single-shot missile reliability may be specified as a requirement for the probability of a
hit. For such systems, the probability of failure on demand (PFD) is the reliability measure – this is actually
an "unavailability" number. The PFD is derived from failure rate (a frequency of occurrence) and mission
time for non-repairable systems.

For repairable systems, it is obtained from failure rate, mean-time-to-repair (MTTR), and test interval. This
measure may not be unique for a given system as this measure depends on the kind of demand. In addition
to system level requirements, reliability requirements may be specified for critical subsystems. In most
cases, reliability parameters are specified with appropriate statistical confidence intervals.

Reliability testing
The purpose of reliability testing is to discover potential problems with the design as early as possible and,
ultimately, provide confidence that the system meets its reliability requirements.

Reliability testing may be performed at several levels and there are different types of testing. Complex
systems may be tested at component, circuit board, unit, assembly, subsystem and system levels.[23] (The
test level nomenclature varies among applications.) For example, performing environmental stress screening
tests at lower levels, such as piece parts or small assemblies, catches problems before they cause failures at
higher levels. Testing proceeds during each level of integration through full-up system testing,
developmental testing, and operational testing, thereby reducing program risk. However, testing does not
mitigate unreliability risk.

With each test both a statistical type 1 and type 2 error could be made and depends on sample size, test
time, assumptions and the needed discrimination ratio. There is risk of incorrectly accepting a bad design
(type 1 error) and the risk of incorrectly rejecting a good design (type 2 error).

It is not always feasible to test all system requirements. Some systems are prohibitively expensive to test;
some failure modes may take years to observe; some complex interactions result in a huge number of
possible test cases; and some tests require the use of limited test ranges or other resources. In such cases,
different approaches to testing can be used, such as (highly) accelerated life testing, design of experiments,
and simulations.

The desired level of statistical confidence also plays a role in reliability testing. Statistical confidence is
increased by increasing either the test time or the number of items tested. Reliability test plans are designed
to achieve the specified reliability at the specified confidence level with the minimum number of test units
and test time. Different test plans result in different levels of risk to the producer and consumer. The desired
reliability, statistical confidence, and risk levels for each side influence the ultimate test plan. The customer
and developer should agree in advance on how reliability requirements will be tested.

A key aspect of reliability testing is to define "failure". Although this may seem obvious, there are many
situations where it is not clear whether a failure is really the fault of the system. Variations in test conditions,
operator differences, weather and unexpected situations create differences between the customer and the
system developer. One strategy to address this issue is to use a scoring conference process. A scoring
conference includes representatives from the customer, the developer, the test organization, the reliability
organization, and sometimes independent observers. The scoring conference process is defined in the
statement of work. Each test case is considered by the group and "scored" as a success or failure. This
scoring is the official result used by the reliability engineer.

As part of the requirements phase, the reliability engineer develops a test strategy with the customer. The
test strategy makes trade-offs between the needs of the reliability organization, which wants as much data
as possible, and constraints such as cost, schedule and available resources. Test plans and procedures are
developed for each reliability test, and results are documented.

Reliability testing is common in the Photonics industry. Examples of reliability tests of lasers are life test
and burn-in. These tests consist of the highly accelerated aging, under controlled conditions, of a group of
lasers. The data collected from these life tests are used to predict laser life expectancy under the intended
operating characteristics.[24]

Reliability test requirements

Reliability test requirements can follow from any analysis for which the first estimate of failure probability,
failure mode or effect needs to be justified. Evidence can be generated with some level of confidence by
testing. With software-based systems, the probability is a mix of software and hardware-based failures.
Testing reliability requirements is problematic for several reasons. A single test is in most cases insufficient
to generate enough statistical data. Multiple tests or long-duration tests are usually very expensive. Some
tests are simply impractical, and environmental conditions can be hard to predict over a systems life-cycle.

Reliability engineering is used to design a realistic and affordable test program that provides empirical
evidence that the system meets its reliability requirements. Statistical confidence levels are used to address
some of these concerns. A certain parameter is expressed along with a corresponding confidence level: for
example, an MTBF of 1000 hours at 90% confidence level. From this specification, the reliability engineer
can, for example, design a test with explicit criteria for the number of hours and number of failures until the
requirement is met or failed. Different sorts of tests are possible.

The combination of required reliability level and required confidence level greatly affects the development
cost and the risk to both the customer and producer. Care is needed to select the best combination of
requirements—e.g. cost-effectiveness. Reliability testing may be performed at various levels, such as
component, subsystem and system. Also, many factors must be addressed during testing and operation,
such as extreme temperature and humidity, shock, vibration, or other environmental factors (like loss of
signal, cooling or power; or other catastrophes such as fire, floods, excessive heat, physical or security
violations or other myriad forms of damage or degradation). For systems that must last many years,
accelerated life tests may be needed.

Accelerated testing

The purpose of accelerated life testing (ALT test) is to induce field failure in the laboratory at a much faster
rate by providing a harsher, but nonetheless representative, environment. In such a test, the product is
expected to fail in the lab just as it would have failed in the field—but in much less time. The main
objective of an accelerated test is either of the following:

To discover failure modes

To predict the normal field life from the high stress lab life

An accelerated testing program can be broken down into the following steps:
Define objective and scope of the test
Collect required information about the product
Identify the stress(es)
Determine level of stress(es)
Conduct the accelerated test and analyze the collected data.

Common ways to determine a life stress relationship are:

Arrhenius model
Eyring model
Inverse power law model
Temperature–humidity model
Temperature non-thermal model

Software reliability
Software reliability is a special aspect of reliability engineering. It focuses on foundations and techniques to
make software more reliable, i.e., resilient to faults. System reliability, by definition, includes all parts of the
system, including hardware, software, supporting infrastructure (including critical external interfaces),
operators and procedures. Traditionally, reliability engineering focuses on critical hardware parts of the
system. Since the widespread use of digital integrated circuit technology, software has become an
increasingly critical part of most electronics and, hence, nearly all present day systems. Therefore, software
reliability has gained prominence within the field of system reliability.

There are significant differences, however, in how software and hardware behave. Most hardware
unreliability is the result of a component or material failure that results in the system not performing its
intended function. Repairing or replacing the hardware component restores the system to its original
operating state. However, software does not fail in the same sense that hardware fails. Instead, software
unreliability is the result of unanticipated results of software operations. Even relatively small software
programs can have astronomically large combinations of inputs and states that are infeasible to exhaustively
test. Restoring software to its original state only works until the same combination of inputs and states
results in the same unintended result. Software reliability engineering must take this into account.

Despite this difference in the source of failure between software and hardware, several software reliability
models based on statistics have been proposed to quantify what we experience with software: the longer
software is run, the higher the probability that it will eventually be used in an untested manner and exhibit a
latent defect that results in a failure (Shooman 1987), (Musa 2005), (Denney 2005).

As with hardware, software reliability depends on good requirements, design and implementation. Software
reliability engineering relies heavily on a disciplined software engineering process to anticipate and design
against unintended consequences. There is more overlap between software quality engineering and
software reliability engineering than between hardware quality and reliability. A good software
development plan is a key aspect of the software reliability program. The software development plan
describes the design and coding standards, peer reviews, unit tests, configuration management, software
metrics and software models to be used during software development.

A common reliability metric is the number of software faults per line of code (FLOC), usually expressed as
faults per thousand lines of code. This metric, along with software execution time, is key to most software
reliability models and estimates. The theory is that the software reliability increases as the number of faults
(or fault density) decreases. Establishing a direct connection between fault density and mean-time-between-
failure is difficult, however, because of the way software faults are distributed in the code, their severity,
and the probability of the combination of inputs necessary to encounter the fault. Nevertheless, fault density
serves as a useful indicator for the reliability engineer. Other software metrics, such as complexity, are also
used. This metric remains controversial, since changes in software development and verification practices
can have dramatic impact on overall defect rates.

Software testing is an important aspect of software reliability. Even the best software development process
results in some software faults that are nearly undetectable until tested. Software is tested at several levels,
starting with individual units, through integration and full-up system testing. All phases of testing, software
faults are discovered, corrected, and re-tested. Reliability estimates are updated based on the fault density
and other metrics. At a system level, mean-time-between-failure data can be collected and used to estimate
reliability. Unlike hardware, performing exactly the same test on exactly the same software configuration
does not provide increased statistical confidence. Instead, software reliability uses different metrics, such as
code coverage.

The Software Engineering Institute's capability maturity model is a common means of assessing the overall
software development process for reliability and quality purposes.

Structural reliability
Structural reliability or the reliability of structures is the application of reliability theory to the behavior of
structures. It is used in both the design and maintenance of different types of structures including concrete
and steel structures.[25][26] In structural reliability studies both loads and resistances are modeled as
probabilistic variables. Using this approach the probability of failure of a structure is calculated.

Comparison to safety engineering

Reliability for safety and reliability for availability are often closely related. Lost availability of an
engineering system can cost money. If a subway system is unavailable the subway operator will lose money
for each hour the system is down. The subway operator will lose more money if safety is compromised.
The definition of reliability is tied to a probability of not encountering a failure. A failure can cause loss of
safety, loss of availability or both. It is undesirable to lose safety or availability in a critical system.

Reliability engineering is concerned with overall minimisation of failures that could lead to financial losses
for the responsible entity, whereas safety engineering focuses on minimising a specific set of failure types
that in general could lead to loss of life, injury or damage to equipment.

Reliability hazards could transform into incidents leading to a loss of revenue for the company or the
customer, for example due to direct and indirect costs associated with: loss of production due to system
unavailability; unexpected high or low demands for spares; repair costs; man-hours; re-designs or
interruptions to normal production.[27]

Safety engineering is often highly specific, relating only to certain tightly regulated industries, applications,
or areas. It primarily focuses on system safety hazards that could lead to severe accidents including: loss of
life; destruction of equipment; or environmental damage. As such, the related system functional reliability
requirements are often extremely high. Although it deals with unwanted failures in the same sense as
reliability engineering, it, however, has less of a focus on direct costs, and is not concerned with post-failure
repair actions. Another difference is the level of impact of failures on society, leading to a tendency for strict
control by governments or regulatory bodies (e.g. nuclear, aerospace, defense, rail and oil industries).[27]

Fault tolerance

Safety can be increased using a 2oo2 cross checked redundant system. Availability can be increased by
using "1oo2" (1 out of 2) redundancy at a part or system level. If both redundant elements disagree the
more permissive element will maximize availability. A 1oo2 system should never be relied on for safety.
Fault-tolerant systems often rely on additional redundancy (e.g. 2oo3 voting logic) where multiple
redundant elements must agree on a potentially unsafe action before it is performed. This increases both
availability and safety at a system level. This is common practice in Aerospace systems that need continued
availability and do not have a fail-safe mode. For example, aircraft may use triple modular redundancy for
flight computers and control surfaces (including occasionally different modes of operation e.g.
electrical/mechanical/hydraulic) as these need to always be operational, due to the fact that there are no
"safe" default positions for control surfaces such as rudders or ailerons when the aircraft is flying.

Basic reliability and mission reliability

The above example of a 2oo3 fault tolerant system increases both mission reliability as well as safety.
However, the "basic" reliability of the system will in this case still be lower than a non-redundant (1oo1) or
2oo2 system. Basic reliability engineering covers all failures, including those that might not result in system
failure, but do result in additional cost due to: maintenance repair actions; logistics; spare parts etc. For
example, replacement or repair of 1 faulty channel in a 2oo3 voting system, (the system is still operating,
although with one failed channel it has actually become a 2oo2 system) is contributing to basic unreliability
but not mission unreliability. As an example, the failure of the tail-light of an aircraft will not prevent the
plane from flying (and so is not considered a mission failure), but it does need to be remedied (with a
related cost, and so does contribute to the basic unreliability levels).

Detectability and common cause failures

When using fault tolerant (redundant) systems or systems that are equipped with protection functions,
detectability of failures and avoidance of common cause failures becomes paramount for safe functioning
and/or mission reliability.

Reliability versus quality (Six Sigma)

Quality often focuses on manufacturing defects during the warranty phase. Reliability looks at the failure
intensity over the whole life of a product or engineering system from commissioning to decommissioning.
Six Sigma has its roots in statistical control in quality of manufacturing. Reliability engineering is a
specialty part of systems engineering. The systems engineering process is a discovery process that is often
unlike a manufacturing process. A manufacturing process is often focused on repetitive activities that
achieve high quality outputs with minimum cost and time.[28]

The everyday usage term "quality of a product" is loosely taken to mean its inherent degree of excellence.
In industry, a more precise definition of quality as "conformance to requirements or specifications at the
start of use" is used. Assuming the final product specification adequately captures the original requirements
and customer/system needs, the quality level can be measured as the fraction of product units shipped that
meet specifications.[29] Manufactured goods quality often focuses on the number of warranty claims during
the warranty period.

Quality is a snapshot at the start of life through the warranty period and is related to the control of lower-
level product specifications. This includes time-zero defects i.e. where manufacturing mistakes escaped
final Quality Control. In theory the quality level might be described by a single fraction of defective
products. Reliability, as a part of systems engineering, acts as more of an ongoing assessment of failure
rates over many years. Theoretically, all items will fail over an infinite period of time.[30] Defects that
appear over time are referred to as reliability fallout. To describe reliability fallout a probability model that
describes the fraction fallout over time is needed. This is known as the life distribution model.[29] Some of
these reliability issues may be due to inherent design issues, which may exist even though the product
conforms to specifications. Even items that are produced perfectly will fail over time due to one or more
failure mechanisms (e.g. due to human error or mechanical, electrical, and chemical factors). These
reliability issues can also be influenced by acceptable levels of variation during initial production.

Quality and reliability are, therefore, related to manufacturing. Reliability is more targeted towards clients
who are focused on failures throughout the whole life of the product such as the military, airlines or
railroads. Items that do not conform to product specification will generally do worse in terms of reliability
(having a lower MTTF), but this does not always have to be the case. The full mathematical quantification
(in statistical models) of this combined relation is in general very difficult or even practically impossible. In
cases where manufacturing variances can be effectively reduced, six sigma tools have been shown to be
useful to find optimal process solutions which can increase quality and reliability. Six Sigma may also help
to design products that are more robust to manufacturing induced failures and infant mortality defects in
engineering systems and manufactured product.

In contrast with Six Sigma, reliability engineering solutions are generally found by focusing on reliability
testing and system design. Solutions are found in different ways, such as by simplifying a system to allow
more of the mechanisms of failure involved to be understood; performing detailed calculations of material
stress levels allowing suitable safety factors to be determined; finding possible abnormal system load
conditions and using this to increase robustness of a design to manufacturing variance related failure
mechanisms. Furthermore, reliability engineering uses system-level solutions, like designing redundant and
fault-tolerant systems for situations with high availability needs (see Reliability engineering vs Safety
engineering above).

Note: A "defect" in six-sigma/quality literature is not the same as a "failure" (Field failure | e.g. fractured
item) in reliability. A six-sigma/quality defect refers generally to non-conformance with a requirement (e.g.
basic functionality or a key dimension). Items can, however, fail over time, even if these requirements are
all fulfilled. Quality is generally not concerned with asking the crucial question "are the requirements
actually correct?", whereas reliability is.

Reliability operational assessment

Once systems or parts are being produced, reliability engineering attempts to monitor, assess, and correct
deficiencies. Monitoring includes electronic and visual surveillance of critical parameters identified during
the fault tree analysis design stage. Data collection is highly dependent on the nature of the system. Most
large organizations have quality control groups that collect failure data on vehicles, equipment and
machinery. Consumer product failures are often tracked by the number of returns. For systems in dormant
storage or on standby, it is necessary to establish a formal surveillance program to inspect and test random
samples. Any changes to the system, such as field upgrades or recall repairs, require additional reliability
testing to ensure the reliability of the modification. Since it is not possible to anticipate all the failure modes
of a given system, especially ones with a human element, failures will occur. The reliability program also
includes a systematic root cause analysis that identifies the causal relationships involved in the failure such
that effective corrective actions may be implemented. When possible, system failures and corrective actions
are reported to the reliability engineering organization.

Some of the most common methods to apply to a reliability operational assessment are failure reporting,
analysis, and corrective action systems (FRACAS). This systematic approach develops a reliability, safety,
and logistics assessment based on failure/incident reporting, management, analysis, and
corrective/preventive actions. Organizations today are adopting this method and utilizing commercial
systems (such as Web-based FRACAS applications) that enable them to create a failure/incident data
repository from which statistics can be derived to view accurate and genuine reliability, safety, and quality
metrics.

It is extremely important for an organization to adopt a common FRACAS system for all end items. Also, it
should allow test results to be captured in a practical way. Failure to adopt one easy-to-use (in terms of ease
of data-entry for field engineers and repair shop engineers) and easy-to-maintain integrated system is likely
to result in a failure of the FRACAS program itself.

Some of the common outputs from a FRACAS system include Field MTBF, MTTR, spares consumption,
reliability growth, failure/incidents distribution by type, location, part no., serial no., and symptom.

The use of past data to predict the reliability of new comparable systems/items can be misleading as
reliability is a function of the context of use and can be affected by small changes in design/manufacturing.

Reliability organizations
Systems of any significant complexity are developed by organizations of people, such as a commercial
company or a government agency. The reliability engineering organization must be consistent with the
company's organizational structure. For small, non-critical systems, reliability engineering may be informal.
As complexity grows, the need arises for a formal reliability function. Because reliability is important to the
customer, the customer may even specify certain aspects of the reliability organization.

There are several common types of reliability organizations. The project manager or chief engineer may
employ one or more reliability engineers directly. In larger organizations, there is usually a product
assurance or specialty engineering organization, which may include reliability, maintainability, quality,
safety, human factors, logistics, etc. In such case, the reliability engineer reports to the product assurance
manager or specialty engineering manager.

In some cases, a company may wish to establish an independent reliability organization. This is desirable to
ensure that the system reliability, which is often expensive and time-consuming, is not unduly slighted due
to budget and schedule pressures. In such cases, the reliability engineer works for the project day-to-day,
but is actually employed and paid by a separate organization within the company.

Because reliability engineering is critical to early system design, it has become common for reliability
engineers, however, the organization is structured, to work as part of an integrated product team.

Education
Some universities offer graduate degrees in reliability engineering. Other reliability professionals typically
have a physics degree from a university or college program. Many engineering programs offer reliability
courses, and some universities have entire reliability engineering programs. A reliability engineer must be
registered as a professional engineer by the state or province by law, but not all reliability professionals are
engineers. Reliability engineers are required in systems where public safety is at risk. There are many
professional conferences and industry training programs available for reliability engineers. Several
professional organizations exist for reliability engineers, including the American Society for Quality
Reliability Division (ASQ-RD),[31] the IEEE Reliability Society, the American Society for Quality
(ASQ),[32] and the Society of Reliability Engineers (SRE).[33]

A group of engineers have provided a list of useful tools for reliability engineering. These include: PTC
Windchill software, RAM Commander software, RelCalc software, Military Handbook 217 (Mil-HDBK-
217), 217Plus and the NAVMAT P-4855-1A manual. Analyzing failures and successes coupled with a
quality standards process also provides systemized information to making informed engineering designs.[34]

Dependability – Measure in systems engineering

Factor of safety – System strength beyond intended load
Failing badly – Fails with a catastrophic result or without warning
Failure mode and effects analysis (FMEA) – Analysis of potential system failures
Fracture mechanics – Study of propagation of cracks in materials
Highly accelerated life test – Stress testing methodology for enhancing product reliability
Highly accelerated stress test
Human reliability – Factor in safety, ergonomics and system resilience
Industrial engineering – Branch of engineering which deals with the optimization of complex
processes or systems
Institute of Industrial and Systems Engineers – Professional society for the support of the
industrial engineering profession
Logistics engineering – Field of engineering for organization of purchase, transport, storage,
distribution, and warehousing
Performance engineering – Encompasses the techniques applied during a systems
development life cycle and Performance indicator – Measurement that evaluates the
success of an organization
Product certification – Performance and quality assurance
Overall equipment effectiveness – Measure of how well a manufacturing operation is utilized
RAMS – Engineering characterization of a product or system
Reliability, availability and serviceability – Quality of robustness of computer hardware
Reliability theory of aging and longevity – Biophysics theory
Risk-based inspection – an Optimal maintenance business process used to examine
equipment such as pressure vessels, heat exchangers and piping in industrial plants
Robustness validation
Security engineering – Process of incorporating security controls into an information system
Software reliability testing
Solid mechanics – Branch of mechanics concerned with solid materials and their behaviors
Spurious trip level – measure of the spurious activations of a safety or alarm system
Strength of materials – Behavior of solid objects subject to stresses and strains
Stress–strength analysis – tool in reliability engineering; analysis of the strength of the
materials and the interference of the stresses placed on the materials, where "materials" is
not necessarily the raw goods or parts, but can be an entire system
Structural fracture mechanics – Field of structural engineering
Temperature cycling – Chemical process
Weibull distribution – Continuous probability distribution

References
1. Institute of Electrical and Electronics Engineers (1990) IEEE Standard Computer Dictionary:
A Compilation of IEEE Standard Computer Glossaries. New York, NY ISBN 1-55937-079-3
2. RCM II, Reliability Centered Maintenance, Second edition 2008, pages 250–260, the role of
Actuarial analysis in Reliability
3. Why You Cannot Predict Electronic Product Reliability (http://www.lambdaconsulting.co.za/2
012ARS_EU_T1S5_Barnard.pdf) (PDF). 2012 ARS, Europe. Warsaw, Poland.
4. O'Connor, Patrick D. T. (2002), Practical Reliability Engineering (Fourth Ed.), John Wiley &
Sons, New York. ISBN 978-0-4708-4462-5.
5. Aven, Terje (1 June 2017). "Improving the foundation and practice of reliability engineering"
(https://doi.org/10.1177%2F1748006X17699478). Proceedings of the Institution of
Mechanical Engineers, Part O: Journal of Risk and Reliability. 231 (3): 295–305.
doi:10.1177/1748006X17699478 (https://doi.org/10.1177%2F1748006X17699478).
ISSN 1748-006X (https://www.worldcat.org/issn/1748-006X).
6. Saleh, J.H. and Marais, Ken, "Highlights from the Early (and pre-) History of Reliability
Engineering", Reliability Engineering and System Safety, Volume 91, Issue 2, February
2006, pages 249–256
7. Juran, Joseph and Gryna, Frank, Quality Control Handbook, Fourth Edition, McGraw-Hill,
New York, 1988, p.24.3
8. Reliability of military electronic equipment;report. Washington: United States Department of
Defense. 4 June 1957. hdl:2027/mdp.39015013918332 (https://hdl.handle.net/2027%2Fmd
p.39015013918332).
9. Wong, Kam, "Unified Field (Failure) Theory-Demise of the Bathtub Curve", Proceedings of
Annual RAMS, 1981, pp 402–408
10. Practical Reliability Engineering, P. O'Conner – 2012
11. "Articles – Where Do Reliability Engineers Come From? – ReliabilityWeb.com: A Culture of
Reliability" (https://reliabilityweb.com/index.php/articles/where_do_reliability_engineers_co
me_from/).
12. Using Failure Modes, Mechanisms, and Effects Analysis in Medical Device Adverse Event
Investigations, S. Cheng, D. Das, and M. Pecht, ICBO: International Conference on
Biomedical Ontology, Buffalo, NY, July 26–30, 2011, pp. 340–345
13. Federal Aviation Administration (19 March 2013). System Safety Handbook (http://www.faa.g
ov/regulations_policies/handbooks_manuals/aviation/risk_management/ss_handbook/).
U.S. Department of Transportation. Retrieved 2 June 2013.
14. Reliability Hotwire – July 2015
15. Reliability Maintainability and Risk Practical Methods for Engineers Including Reliability
Centred Maintenance and Safety – David J. Smith (2011)
16. Practical Reliability Engineering, O'Conner, 2001
17. System Reliability Theory, second edition, Rausand and Hoyland – 2004
18. The Blame Machine, Why Human Error Causes Accidents – Whittingham, 2007
19. Barnard, R.W.A. (2008). "What is wrong with Reliability Engineering?" (http://lambdaconsulti
ng.co.za/rwa%20barnard%20incose%202008.pdf) (PDF). Lambda Consulting. Retrieved
30 October 2014.
20. http://www.dfrsolutions.com/hubfs/DfR_Solutions_Website/Resources-
Archived/Presentations/2016/Design-for-Reliability-Best-Practices.pdf?t=1505335343846
21. Salvatore Distefano, Antonio Puliafito: Dependability Evaluation with Dynamic Reliability
Block Diagrams and Dynamic Fault Trees. IEEE Trans. Dependable Sec. Comput. 6(1): 4–
17 (2009)
22. The Seven Samurais of Systems Engineering (http://www.incose.org/wma/library/docs/seve
n_samurai-martin-paper-v040316a.pdf), James Martin (2008)
23. Ben-Gal I., Herer Y. and Raz T. (2003). "Self-correcting inspection procedure under
inspection errors" (http://www.eng.tau.ac.il/~bengal/SCI_paper.pdf) (PDF). IIE Transactions
on Quality and Reliability, 34(6), pp. 529–540.
24. "Yelo Reliability Testing" (http://www.yelo.co.uk/laser-diode-life-test/laser-diode-life-test.htm
l). Retrieved 6 November 2014.
25. Piryonesi, Sayed Madeh; Tavakolan, Mehdi (9 January 2017). "A mathematical
programming model for solving cost-safety optimization (CSO) problems in the maintenance
of structures". KSCE Journal of Civil Engineering. 21 (6): 2226–2234. doi:10.1007/s12205-
017-0531-z (https://doi.org/10.1007%2Fs12205-017-0531-z). S2CID 113616284 (https://api.
semanticscholar.org/CorpusID:113616284).
26. Okasha, N. M., & Frangopol, D. M. (2009). Lifetime-oriented multi-objective optimization of
structural maintenance considering system reliability, redundancy and life-cycle cost using
GA. Structural Safety, 31(6), 460–474.
27. Reliability and Safety Engineering – Verma, Ajit Kumar, Ajit, Srividya, Karanki, Durga Rao
(2010)
28. INCOSE SE Guidelines (http://g2sebok.incose.org/app/mss/asset.cfm?ID=INCOSE%20G2S
EBOK%202.14&ST=F)
29. "8.1.1.1. Quality versus reliability" (http://www.itl.nist.gov/div898/handbook/apr/section1/apr1
11.htm).
30. "The Second Law of Thermodynamics, Evolution, and Probability" (http://www.talkorigins.or
g/faqs/thermo/probability.html).
31. American Society for Quality Reliability Division (ASQ-RD) (http://asqrd.org)
32. American Society for Quality (ASQ) (http://asq.org/cert/reliability-engineer)
33. Society of Reliability Engineers (SRE) (http://www.sre.org)
34. "Top Tools for a Reliability Engineer's Toolbox: 7 Reliability Engineering Experts Reveal
Their Favorite Tools, Tips and Resources" (http://www.camcode.com/asset-tags/top-tools-in-
a-reliability-engineers-toolbox/#badge). Asset Tag & UID Label Blog. October 2014.
Retrieved 18 January 2016.

N. Diaz, R. Pascual, F. Ruggeri, E. López Droguett (2017). "Modeling age replacement

policy under multiple time scales and stochastic usage profiles". International Journal of
Production Economics. 188: 22–28. doi:10.1016/j.ijpe.2017.03.009 (https://doi.org/10.1016%
2Fj.ijpe.2017.03.009).

Further reading
Barlow, R. E. and Proscan, F. (1981) Statistical Theory of Reliability and Life Testing, To
Begin With Press, Silver Springs, MD.
Blanchard, Benjamin S. (1992), Logistics Engineering and Management (Fourth Ed.),
Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
Breitler, Alan L. and Sloan, C. (2005), Proceedings of the American Institute of Aeronautics
and Astronautics (AIAA) Air Force T&E Days Conference, Nashville, TN, December, 2005:
System Reliability Prediction: towards a General Approach Using a Neural Network.
Ebeling, Charles E., (1997), An Introduction to Reliability and Maintainability Engineering,
McGraw-Hill Companies, Inc., Boston.
Denney, Richard (2005) Succeeding with Use Cases: Working Smart to Deliver Quality.
Addison-Wesley Professional Publishing. ISBN. Discusses the use of software reliability
engineering in use case driven software development.
Gano, Dean L. (2007), "Apollo Root Cause Analysis" (Third Edition), Apollonian
Publications, LLC., Richland, Washington
Holmes, Oliver Wendell Sr. The Deacon's Masterpiece
Horsburgh, Peter (2018), "5 Habits of an Extraordinary Reliability Engineer", Reliability Web
Kapur, K.C., and Lamberson, L.R., (1977), Reliability in Engineering Design, John Wiley &
Sons, New York.
Kececioglu, Dimitri, (1991) "Reliability Engineering Handbook", Prentice-Hall, Englewood
Cliffs, New Jersey
Trevor Kletz (1998) Process Plants: A Handbook for Inherently Safer Design CRC ISBN 1-
56032-619-0
Leemis, Lawrence, (1995) Reliability: Probabilistic Models and Statistical Methods, 1995,
Prentice-Hall. ISBN 0-13-720517-1
Lees, Frank (2005). Loss Prevention in the Process Industries (3rd ed.). Elsevier. ISBN 978-
0-7506-7555-0.
MacDiarmid, Preston; Morris, Seymour; et al., (1995), Reliability Toolkit: Commercial
Practices Edition, Reliability Analysis Center and Rome Laboratory, Rome, New York.
Modarres, Mohammad; Kaminskiy, Mark; Krivtsov, Vasiliy (1999), Reliability Engineering
and Risk Analysis: A Practical Guide, CRC Press, ISBN 0-8247-2000-8.
Musa, John (2005) Software Reliability Engineering: More Reliable Software Faster and
Cheaper, 2nd. Edition, AuthorHouse. ISBN
Neubeck, Ken (2004) "Practical Reliability Analysis", Prentice Hall, New Jersey
Neufelder, Ann Marie, (1993), Ensuring Software Reliability, Marcel Dekker, Inc., New York.
O'Connor, Patrick D. T. (2002), Practical Reliability Engineering (Fourth Ed.), John Wiley &
Sons, New York. ISBN 978-0-4708-4462-5.
Samaniego, Francisco J. (2007) "System Signatures and their Applications in Engineering
Reliability", Springer (International Series in Operations Research and Management
Science), New York.
Shooman, Martin, (1987), Software Engineering: Design, Reliability, and Management,
McGraw-Hill, New York.
Tobias, Trindade, (1995), Applied Reliability, Chapman & Hall/CRC, ISBN 0-442-00469-9
Springer Series in Reliability Engineering (https://www.springer.com/series/6917)
Nelson, Wayne B., (2004), Accelerated Testing—Statistical Models, Test Plans, and Data
Analysis, John Wiley & Sons, New York, ISBN 0-471-69736-2
Bagdonavicius, V., Nikulin, M., (2002), "Accelerated Life Models. Modeling and Statistical
analysis", CHAPMAN&HALL/CRC, Boca Raton, ISBN 1-58488-186-0
Todinov, M. (2016), "Reliability and Risk Models: setting reliability requirements", Wiley, 978-
1-118-87332-8.

US standards, specifications, and handbooks

Aerospace Report Number: TOR-2007(8583)-6889 (http://www.everyspec.com/USAF/TORs/
TOR2007-8583-6889_14232/) Reliability Program Requirements for Space Systems, The
Aerospace Corporation (10 July 2007)
DoD 3235.1-H (3rd Ed) (http://www.everyspec.com/DoD/DoD-PUBLICATIONS/DOD_3235x
1-H_15048/) Test and Evaluation of System Reliability, Availability, and Maintainability (A
Primer), U.S. Department of Defense (March 1982).
NASA GSFC 431-REF-000370 (http://www.everyspec.com/NASA/NASA-GSFC/GSFC-Cod
e-Series/GSFC_431_REF_000370_2297/) Flight Assurance Procedure: Performing a
Failure Mode and Effects Analysis, National Aeronautics and Space Administration Goddard
Space Flight Center (10 August 1996).
IEEE 1332–1998 (https://standards.ieee.org/standard/1332-1998.html) IEEE Standard
Reliability Program for the Development and Production of Electronic Systems and
Equipment, Institute of Electrical and Electronics Engineers (1998).
JPL D-5703 (http://www.everyspec.com/NASA/NASA-JPL/JPL_D-5703_JUL1990_15049/)
Reliability Analysis Handbook, National Aeronautics and Space Administration Jet
Propulsion Laboratory (July 1990).
MIL-STD-785B (http://www.everyspec.com/MIL-STD/MIL-STD-0700-0799/MIL-STD-785B_2
3780/) Reliability Program for Systems and Equipment Development and Production, U.S.
Department of Defense (15 September 1980). (*Obsolete, superseded by ANSI/GEIA-STD-
0009-2008 titled Reliability Program Standard for Systems Design, Development, and
Manufacturing, 13 Nov 2008)
MIL-HDBK-217F (http://www.everyspec.com/MIL-HDBK/MIL-HDBK-0200-0299/MIL-HDBK-2
17F_14591/) Reliability Prediction of Electronic Equipment, U.S. Department of Defense (2
December 1991).
MIL-HDBK-217F (Notice 1) (http://www.everyspec.com/MIL-HDBK/MIL-HDBK-0200-0299/MI
L-HDBK-217F_NOTICE-1_14589/) Reliability Prediction of Electronic Equipment, U.S.
Department of Defense (10 July 1992).
MIL-HDBK-217F (Notice 2) (http://www.everyspec.com/MIL-HDBK/MIL-HDBK-0200-0299/MI
L-HDBK-217F_NOTICE-2_14590/) Reliability Prediction of Electronic Equipment, U.S.
Department of Defense (28 February 1995).
MIL-STD-690D (http://www.everyspec.com/MIL-STD/MIL-STD-0500-0699/MIL-STD-690D_1
5050/) Failure Rate Sampling Plans and Procedures, U.S. Department of Defense (10 June
2005).
MIL-HDBK-338B (http://www.everyspec.com/MIL-HDBK/MIL-HDBK-0300-0499/MIL-HDBK-
338B_15041/) Electronic Reliability Design Handbook, U.S. Department of Defense (1
October 1998).
MIL-HDBK-2173 (http://www.everyspec.com/MIL-HDBK/MIL-HDBK-2000-2999/MIL-HDBK-2
173_15046/) Reliability-Centered Maintenance (RCM) Requirements for Naval Aircraft,
Weapon Systems, and Support Equipment, U.S. Department of Defense (30 January 1998);
(superseded by NAVAIR 00-25-403 (http://www.barringer1.com/mil_files/NAVAIR-00-25-403.
pdf)).
MIL-STD-1543B (http://www.everyspec.com/MIL-STD/MIL-STD-1500-1599/MIL_STD_1543
B_166/) Reliability Program Requirements for Space and Launch Vehicles, U.S. Department
of Defense (25 October 1988).
MIL-STD-1629A (http://www.everyspec.com/MIL-STD/MIL-STD-1600-1699/MIL_STD_1629
A_1556/) Procedures for Performing a Failure Mode Effects and Criticality Analysis, U.S.
Department of Defense (24 November 1980).
MIL-HDBK-781A (http://www.everyspec.com/MIL-HDBK/MIL-HDBK-0700-0799/MIL_HDBK_
781A_1933/) Reliability Test Methods, Plans, and Environments for Engineering
Development, Qualification, and Production, U.S. Department of Defense (1 April 1996).
NSWC-06 (Part A & B) (http://www.everyspec.com/USN/NSWC/NSWC-06_RELIAB_HDBK
_2006_15051/) Handbook of Reliability Prediction Procedures for Mechanical Equipment,
Naval Surface Warfare Center (10 January 2006).
SR-332 (https://telecom-info.njdepot.ericsson.net/site-cgi/ido/docs.cgi?ID=073944231SEAR
CH&KEYWORDS=&TITLE=&DOCUMENT=sr-332&DATE=&CLASS=&COUNT=1000)
Reliability Prediction Procedure for Electronic Equipment, Telcordia Technologies (January
2011).
FD-ARPP-01 (https://telecom-info.njdepot.ericsson.net/site-cgi/ido/docs.cgi?ID=073944231
SEARCH&KEYWORDS=&TITLE=&DOCUMENT=fd-arpp-01&DATE=&CLASS=&COUNT=
1000) Automated Reliability Prediction Procedure, Telcordia Technologies (January 2011).
GR-357 (https://telecom-info.njdepot.ericsson.net/site-cgi/ido/docs.cgi?DOCUMENT=gr-357
&KEYWORDS=&TITLE=&ID=259205889SEARCH) Generic Requirements for Assuring the
Reliability of Components Used in Telecommunications Equipment, Telcordia Technologies
(March 2001).

http://standards.sae.org/ja1000/1_199903/ SAE JA1000/1 Reliability Program Standard Implementation

Guide

UK standards

In the UK, there are more up to date standards maintained under the sponsorship of UK MOD as Defence
Standards. The relevant Standards include:

DEF STAN 00-40 Reliability and Maintainability (R&M)

PART 1: Issue 5: Management Responsibilities and Requirements for Programmes and

Plans
PART 4: (ARMP-4)Issue 2: Guidance for Writing NATO R&M Requirements Documents
PART 6: Issue 1: IN-SERVICE R & M
PART 7 (ARMP-7) Issue 1: NATO R&M Terminology Applicable to ARMP's

DEF STAN 00-42 RELIABILITY AND MAINTAINABILITY ASSURANCE GUIDES

PART 1: Issue 1: ONE-SHOT DEVICES/SYSTEMS

PART 2: Issue 1: SOFTWARE
PART 3: Issue 2: R&M CASE
PART 4: Issue 1: Testability
PART 5: Issue 1: IN-SERVICE RELIABILITY DEMONSTRATIONS

DEF STAN 00-43 RELIABILITY AND MAINTAINABILITY ASSURANCE ACTIVITY

PART 2: Issue 1: IN-SERVICE MAINTAINABILITY DEMONSTRATIONS

DEF STAN 00-44 RELIABILITY AND MAINTAINABILITY DATA COLLECTION AND

CLASSIFICATION

PART 1: Issue 2: MAINTENANCE DATA & DEFECT REPORTING IN THE ROYAL NAVY,
THE ARMY AND THE ROYAL AIR FORCE
PART 2: Issue 1: DATA CLASSIFICATION AND INCIDENT SENTENCING—GENERAL
PART 3: Issue 1: INCIDENT SENTENCING—SEA
PART 4: Issue 1: INCIDENT SENTENCING—LAND

DEF STAN 00-45 Issue 1: RELIABILITY CENTERED MAINTENANCE

DEF STAN 00-49 Issue 1: RELIABILITY AND MAINTAINABILITY MOD GUIDE TO

TERMINOLOGY DEFINITIONS
These can be obtained from DSTAN (https://web.archive.org/web/20060106062624/http://www.dstan.mo
d.uk/). There are also many commercial standards, produced by many organisations including the SAE,
MSG, ARP, and IEE.

French standards
FIDES [1] (http://fides-reliability.org). The FIDES methodology (UTE-C 80-811) is based on
the physics of failures and supported by the analysis of test data, field returns and existing
modelling.
UTE-C 80–810 or RDF2000 [2] (http://www.ute-fr.com/FR/). The RDF2000 methodology is
based on the French telecom experience.

International standards
TC 56 Standards: Dependability (http://tc56.iec.ch/about/standards0_1.htm)

External links
Media related to Reliability engineering at Wikimedia Commons
John P. Rankin Collection, The University of Alabama in Huntsville Archives and Special
Collections (http://libarchstor.uah.edu:8081/repositories/2/resources/66) NASA reliability
engineering research on sneak circuits.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Reliability_engineering&oldid=1163000810"

Electrodynamics I Slides Complete
No ratings yet
Electrodynamics I Slides Complete
531 pages
Reliability Designof Mechanical Systemformechanicalcivil Engineer
100% (1)
Reliability Designof Mechanical Systemformechanicalcivil Engineer
435 pages
Reliability Engineering
100% (1)
Reliability Engineering
31 pages
Reliability Engineering - Wikipedia PDF
100% (1)
Reliability Engineering - Wikipedia PDF
206 pages
Reliability Engineering
No ratings yet
Reliability Engineering
18 pages
Basic-Concepts MTBF
No ratings yet
Basic-Concepts MTBF
15 pages
On Component Reliability and System Reliability Fo
No ratings yet
On Component Reliability and System Reliability Fo
6 pages
The Value of RCM in Business Today
No ratings yet
The Value of RCM in Business Today
8 pages
2@software Reliability
No ratings yet
2@software Reliability
126 pages
Mil HDBK 217f
No ratings yet
Mil HDBK 217f
80 pages
Fault Tolerance Techniques: Unit 3
No ratings yet
Fault Tolerance Techniques: Unit 3
40 pages
Simcenter T3STER Master 2301 Useref
No ratings yet
Simcenter T3STER Master 2301 Useref
138 pages
Technical Specification of ItemSoft
No ratings yet
Technical Specification of ItemSoft
13 pages
Availability Simulation - Isograph
No ratings yet
Availability Simulation - Isograph
1 page
Risk and Reliability Engineering
No ratings yet
Risk and Reliability Engineering
8 pages
Lect8 FaultTolerance
No ratings yet
Lect8 FaultTolerance
37 pages
Fault Tolerance
No ratings yet
Fault Tolerance
49 pages
Slides 08 PDF
No ratings yet
Slides 08 PDF
95 pages
Software Reliability Engineering: A Roadmap
No ratings yet
Software Reliability Engineering: A Roadmap
18 pages
Causes of Obsolescence
No ratings yet
Causes of Obsolescence
13 pages
Availability and Reliability
No ratings yet
Availability and Reliability
3 pages
RCM Alan Pride
No ratings yet
RCM Alan Pride
16 pages
RCM Template3
No ratings yet
RCM Template3
16 pages
Fault Avoidance and Tolerance Technique
No ratings yet
Fault Avoidance and Tolerance Technique
15 pages
Cobots - A New Way of Technology: D. Becerra, J. Paredes, G. Vaca
No ratings yet
Cobots - A New Way of Technology: D. Becerra, J. Paredes, G. Vaca
6 pages
High-Temperature Tensile and Creep Data For Inconel 600
100% (1)
High-Temperature Tensile and Creep Data For Inconel 600
13 pages
Reliability-Centered Maintenance Planning Based On Computer-Aided FMEA
No ratings yet
Reliability-Centered Maintenance Planning Based On Computer-Aided FMEA
7 pages
EPRD Sample
No ratings yet
EPRD Sample
29 pages
Fault Tolerance
No ratings yet
Fault Tolerance
13 pages
Reliability Growth
No ratings yet
Reliability Growth
267 pages
BS en 60300-3-16-2008 - (2020-08-23 - 04-51-36 PM)
No ratings yet
BS en 60300-3-16-2008 - (2020-08-23 - 04-51-36 PM)
26 pages
Reliability and Availablity
No ratings yet
Reliability and Availablity
6 pages
Advance CMOS Cell Design-Book3
No ratings yet
Advance CMOS Cell Design-Book3
44 pages
EPRD Sample
100% (1)
EPRD Sample
83 pages
Reliability Analysis: Oskar Larsson
No ratings yet
Reliability Analysis: Oskar Larsson
54 pages
Mil STD 756b PDF
100% (1)
Mil STD 756b PDF
85 pages
MTBF Calculator (Reliability Calculation System)
100% (1)
MTBF Calculator (Reliability Calculation System)
233 pages
Modulo 08 Airborne Systems
100% (1)
Modulo 08 Airborne Systems
78 pages
Aladon RCS 2017
No ratings yet
Aladon RCS 2017
4 pages
Reliability Software
100% (2)
Reliability Software
19 pages
Maintenance Strategy Selection A Case Study PDF
50% (2)
Maintenance Strategy Selection A Case Study PDF
22 pages
Reliability Mathematics
No ratings yet
Reliability Mathematics
341 pages
Software Reliability
No ratings yet
Software Reliability
28 pages
Fault Tolerant System Design
100% (1)
Fault Tolerant System Design
44 pages
RAMS
No ratings yet
RAMS
10 pages
Iaea Tecdoc 478
No ratings yet
Iaea Tecdoc 478
298 pages
Reliability Engineering Learning Matrix
100% (1)
Reliability Engineering Learning Matrix
1 page
Isograph RASWG Presentation
100% (2)
Isograph RASWG Presentation
41 pages
Fundamentals of Reliability Fundamentals of Reliability Engineering and Applications Part 1 of 3
No ratings yet
Fundamentals of Reliability Fundamentals of Reliability Engineering and Applications Part 1 of 3
36 pages
Block Sim Features
No ratings yet
Block Sim Features
4 pages
A Literature Review and Future Perspectives On Maintenance Optimization (2011) Alternative 1
No ratings yet
A Literature Review and Future Perspectives On Maintenance Optimization (2011) Alternative 1
21 pages
RAM Analysis Course V02
No ratings yet
RAM Analysis Course V02
2 pages
System Maintenance Plan
No ratings yet
System Maintenance Plan
66 pages
BOOK - Per Jacobsson, Göran Grimvall. Risks in Technological Systems - Reliability Engineering - Springer
100% (2)
BOOK - Per Jacobsson, Göran Grimvall. Risks in Technological Systems - Reliability Engineering - Springer
347 pages
LOG211 - Lesson11 - SG - FINAL (v1.3) - 20121212 5000 - 02 W Markup
No ratings yet
LOG211 - Lesson11 - SG - FINAL (v1.3) - 20121212 5000 - 02 W Markup
72 pages
Advances in Reliability, Risk and Safety Analysis With Big Data: Proceedings of The 57th ESReDA Seminar
No ratings yet
Advances in Reliability, Risk and Safety Analysis With Big Data: Proceedings of The 57th ESReDA Seminar
191 pages
II - Fault-Tolerant-techniques
No ratings yet
II - Fault-Tolerant-techniques
104 pages
MIL HDBK 217F Notice2
No ratings yet
MIL HDBK 217F Notice2
150 pages
AMMP: A New Maintenance Management Model Based On ISO 55000: Infrastructure Asset Management January 2016
No ratings yet
AMMP: A New Maintenance Management Model Based On ISO 55000: Infrastructure Asset Management January 2016
11 pages
Sustainable Asset Management: AI & Blockchain Unleashed
From Everand
Sustainable Asset Management: AI & Blockchain Unleashed
Prashant Sinha
No ratings yet
Reliability Engg
No ratings yet
Reliability Engg
20 pages
Canonical Correlation
No ratings yet
Canonical Correlation
7 pages
Autocorrelation
No ratings yet
Autocorrelation
8 pages
Analysis of Variance
100% (1)
Analysis of Variance
18 pages
ANOVA Gauge R&R
100% (1)
ANOVA Gauge R&R
3 pages
Statistical Hypothesis Testing
No ratings yet
Statistical Hypothesis Testing
16 pages
A) Reach B) Find C) Go D) Get
No ratings yet
A) Reach B) Find C) Go D) Get
5 pages
English Grammar Reviewer
100% (1)
English Grammar Reviewer
35 pages
GB Brochure
No ratings yet
GB Brochure
4 pages
BhaktiVriksha Introduction or Forward To BV Modules Book
No ratings yet
BhaktiVriksha Introduction or Forward To BV Modules Book
2 pages
Merged Bank and Manager Contact
No ratings yet
Merged Bank and Manager Contact
81 pages
QP - 2023 - Sup - 85e - C
No ratings yet
QP - 2023 - Sup - 85e - C
12 pages
VIRAJ 2024 Pre-Invite
No ratings yet
VIRAJ 2024 Pre-Invite
11 pages
Teen Self-Esteem Scale
No ratings yet
Teen Self-Esteem Scale
3 pages
Hawak Kamay - Chords and Lyrics
0% (1)
Hawak Kamay - Chords and Lyrics
2 pages
Effectiveness of Work Immersion in The Communication Skills of Freshmen Students
No ratings yet
Effectiveness of Work Immersion in The Communication Skills of Freshmen Students
56 pages
Southeastern and Southern Gaddang
No ratings yet
Southeastern and Southern Gaddang
11 pages
Montecillo Del Mar Vs Gica
No ratings yet
Montecillo Del Mar Vs Gica
1 page
Storyboarding and Storytelling - Lesson 4
No ratings yet
Storyboarding and Storytelling - Lesson 4
8 pages
Raionul Localitatea Instituţia de Învăţământ
0% (1)
Raionul Localitatea Instituţia de Învăţământ
6 pages
Micro-Economics 2nd Year 1st Semester: 1. Microeconomics 2. Macroeconomics
No ratings yet
Micro-Economics 2nd Year 1st Semester: 1. Microeconomics 2. Macroeconomics
2 pages
GR 6 Term 2 NST Tracker
No ratings yet
GR 6 Term 2 NST Tracker
11 pages
Termination of Marriage in Nigerian Family Laws - Nigeria
No ratings yet
Termination of Marriage in Nigerian Family Laws - Nigeria
36 pages
12 C Adminstration
No ratings yet
12 C Adminstration
2 pages
Corrections and Probation: TH TH
No ratings yet
Corrections and Probation: TH TH
6 pages
Unit Three: Four Walls: Replace The Words in Bold With A Synonym From The Box Blends in False Modest Ornate Plain Wood
No ratings yet
Unit Three: Four Walls: Replace The Words in Bold With A Synonym From The Box Blends in False Modest Ornate Plain Wood
37 pages
Notes in Correctional Administration 1 and 2: Justinian Code
No ratings yet
Notes in Correctional Administration 1 and 2: Justinian Code
1 page
The Undermountain LVL 1
No ratings yet
The Undermountain LVL 1
20 pages
Dr. Sanjay Kumar Professor (SCSE) : Program: M.SC Course Code: MSCD2360 Course Name: Disruptive Technology Lecture - 5
No ratings yet
Dr. Sanjay Kumar Professor (SCSE) : Program: M.SC Course Code: MSCD2360 Course Name: Disruptive Technology Lecture - 5
22 pages
Study: Stanley Milgram 1963: Zaina Hassan Social Psychology-Studies of Obedience
No ratings yet
Study: Stanley Milgram 1963: Zaina Hassan Social Psychology-Studies of Obedience
3 pages
Special Product
No ratings yet
Special Product
16 pages
Favourable Social Innovation Ecosystem(s) - An Explorative
No ratings yet
Favourable Social Innovation Ecosystem(s) - An Explorative
26 pages
PDF
No ratings yet
PDF
543 pages
The Essence of Functional Programming (1992)
No ratings yet
The Essence of Functional Programming (1992)
23 pages

Reliability Engineering

Uploaded by

Reliability Engineering

Uploaded by

Reliability engineering

Improve component reliability.

The objectives of reliability engineering, in decreasing order of priority, are:[10]

1. To apply engineering knowledge and specialist techniques to prevent or to reduce the

Scope and techniques

Reliability may be defined in the following ways:

Basics of a reliability assessment

Reliability and availability program plan

Reliability culture / human errors / human factors

Management decisions (e.g. in budgeting, timing, and required tasks)

Reliability prediction and improvement

creation of a proper reliability model (see further on this page)

Design for reliability

Statistics-based approach (i.e. MTBF)

One of the most important design techniques is

Common tools and techniques

Physics of failure (PoF)

The importance of language

The physics of failure approach uses an understanding of physical failure mechanisms

There are a few key elements of this definition:

1. Reliability is predicated on "intended function:" Generally, this is taken to mean operation

Quantitative system reliability parameters—theory

Reliability test requirements

To discover failure modes

Common ways to determine a life stress relationship are:

Comparison to safety engineering

Basic reliability and mission reliability

Detectability and common cause failures

Reliability versus quality (Six Sigma)

Reliability operational assessment

Dependability – Measure in systems engineering

N. Diaz, R. Pascual, F. Ruggeri, E. López Droguett (2017). "Modeling age replacement

US standards, specifications, and handbooks

http://standards.sae.org/ja1000/1_199903/ SAE JA1000/1 Reliability Program Standard Implementation

DEF STAN 00-40 Reliability and Maintainability (R&M)

PART 1: Issue 5: Management Responsibilities and Requirements for Programmes and

DEF STAN 00-42 RELIABILITY AND MAINTAINABILITY ASSURANCE GUIDES

PART 1: Issue 1: ONE-SHOT DEVICES/SYSTEMS

DEF STAN 00-43 RELIABILITY AND MAINTAINABILITY ASSURANCE ACTIVITY

PART 2: Issue 1: IN-SERVICE MAINTAINABILITY DEMONSTRATIONS

DEF STAN 00-44 RELIABILITY AND MAINTAINABILITY DATA COLLECTION AND

DEF STAN 00-45 Issue 1: RELIABILITY CENTERED MAINTENANCE

DEF STAN 00-49 Issue 1: RELIABILITY AND MAINTAINABILITY MOD GUIDE TO

Retrieved from "https://en.wikipedia.org/w/index.php?title=Reliability_engineering&oldid=1163000810"

You might also like