ORIGINAL ARTICLE
On The Use of Machine Learning Methods To Predict Component Reliability From Data-Driven Industrial Case Studies
DOI 10.1007/s00170-017-1039-x
Abstract  The reliability estimation of engineered components is fundamental for many optimization policies in a production process. The main goal of this paper is to study how machine learning models can fit this reliability estimation function in comparison with traditional approaches (e.g., the Weibull distribution). We use a supervised machine learning approach to predict this reliability in 19 industrial components obtained from real industries. Particularly, four diverse machine learning approaches are implemented: artificial neural networks, support vector machines, random forest, and soft computing methods. We evaluate whether there is one approach that outperforms the others when predicting the reliability of all the components, analyze whether machine learning models improve their performance in the presence of censored data, and, finally, study the performance impact when the number of available inputs changes. Our experimental results show the high ability of machine learning to predict component reliability; in particular, random forest generally obtains high accuracy and the best results for all the cases. Experimentation confirms that all the models improve their performance when considering censored data. Finally, we show how machine learning models obtain better prediction results with respect to traditional methods when increasing the size of the time-to-failure datasets.

Keywords  Reliability prediction · Machine learning · Censored data · Weibull distribution

Manuel Chica ([email protected]) · Emanuel F. Alsina ([email protected]) · Krzysztof Trawiński ([email protected]) · Alberto Regattieri ([email protected])

1 Department of Physics, Mathematics and Informatics, University of Modena and Reggio Emilia, 41125 Modena, Italy
2 RØD Brand Consultants, 28001 Madrid, Spain
3 School of Electrical Engineering and Computing, The University of Newcastle, Callaghan NSW 2308, Australia
4 Novelti, 28012 Madrid, Spain
5 Department of Industrial Engineering, University of Bologna, 40136 Bologna, Italy

1 Introduction

The advance of international markets and the consequent increase of global competition have led manufacturers to create more and more customizable products to reach higher customer expectations. Nowadays, manufacturers need to produce high-quality products in less time. In this competitive environment, the manufacturers' interest is focused more than ever before on the machines' reliability [1, 2]. Appropriate maintenance is therefore essential in manufacturing systems to keep all operating equipment in healthy condition, reduce failures, and guarantee the quality of the produced items [3]. Several additional factors have motivated this growing interest in reliability, including the increasing complexity and sophistication of the systems [4] and the insistence on product quality, warranty programs, safety laws, and supply chain sustainability [5]. Some of the latter factors are also influenced by the high cost of failures, their repairs, or their replacement [6].
Many possible optimization policies can exploit reliability, with its corresponding improvements in safety and costs. The study of reliability is important to make key decisions that improve different aspects of the manufacturing processes [6–8]. Examples of these policies are production planning, optimal mix of maintenance policies, fault detection, effective spare part management, and warehouse optimization. Two maintenance techniques exist in the literature: time-based maintenance and condition-based maintenance [8]. For the first type of maintenance, a robust reliability analysis requires an a priori effective failure process investigation. This process is based on the operation and failure times of a generic component, which can be a part, device, piece of equipment, or the whole system. Failure times, also called times-to-failure (TTF), are the times when the considered components stop working adequately.

Generally, this failure process investigation is costly, as the data collection requires a lot of effort. For instance, one of the most difficult tasks in the industry is to obtain sufficient and reliable data, as it is likely that very little accurate TTF data is available [9]. Therefore, practitioners sometimes make use of incomplete failure information to define the reliability of a component. This incomplete failure information derives from the components which are still operating at the end of the tests of the failure investigation. These conditions are known as censored data situations and can influence the analysis of the components' reliability [10].

The application of mathematical models is a common practice to predict the reliability of the components when TTFs are available. In most cases, theoretical probability distributions, fault tree analysis, and Markov models are used for systems reliability modeling [11]. However, these reliability modeling processes often make simplified assumptions to enable an analytical or tractable numerical treatment, and such assumptions are difficult to validate. Also, several complex factors that exhibit non-linear patterns influence the study of reliability [12]. Therefore, the analysis of reliability cannot depend on assumptions of independence and linearity but requires more sophisticated models to capture the complexities of the reliability behavior in a realistic way.

Recently, new machine learning models, able to capture the complexity of reliability [13, 14], have emerged above the non-parametric techniques of failure data regression. Particularly, the use of artificial neural networks (ANNs) [15, 16] and support vector machines (SVMs) [12, 17] has shown outstanding results when predicting reliability from historical data analysis. In a nutshell, machine learning models are data-driven learning methods used to train software to make generalized predictions from historical data. These models have the ability to learn automatically to solve problems of different nature and with different dimensionality, from hundreds of input features to just a few. In recent years, they have been used for systems reliability modeling, evaluation, and prediction [3, 18, 19], even when the number of features of the problem is not high because of the difficulty of manually acquiring big datasets in the industry.

Given the importance and benefits of machine learning methods for reliability prediction, one of the principal contributions of this paper is the comparison of a wide and diverse set of machine learning models to fit the reliability of a set of real industrial cases. These cases were taken from different conditions and real industrial components. We present the machine learning model application to the failure prediction process, from the initial data collection phase to the final performance evaluation of the machine learning models to predict the reliability of the components. The main purpose is to understand whether there is a single data-driven learning model able to outperform the other models under different reliability prediction conditions: from a low number of TTFs (e.g., less than 40) to a high number of them (e.g., more than 100).

Furthermore, this study explores the performance of these models in the presence of censored data. In [20], the authors demonstrated that neglecting censored information results in significant errors when evaluating the reliability performance of the components using the two-parameter Weibull reliability estimation. We aim to demonstrate the crucial role of this censored data also when using machine learning models and to suggest effective methodologies to take this data into consideration during the process. To do this, we divide the experimentation into the application of the models to censored and uncensored data. First, we apply the median rank method (MRM) to the uncensored set (all the components), while the product limit estimator (PLE) is applied just to the set with censored data.

Finally, we explore the influence of the number of available TTFs on the machine learning models' performance. This study is done by defining a set of instance clusters depending on their number of TTFs. In this way, by analyzing the experimentation we can show the best data-driven learning model for every cluster and whether the ranking of the methods changes when increasing the available TTF data. The results and conclusions that arise from the broad experimentation can be a useful contribution for decision makers in the field of maintenance engineering to have a wide view of the capabilities and limitations of machine learning methods when real industrial data is available.

Next, Section 2 presents an overview of the process of estimating the reliability function and the effects of the use of censored data during this process. Also, this section presents a study of the related literature. Sections 3 and 4 describe the machine learning models implemented for the work and the real industrial data used in our analysis, respectively. We discuss the performance results of the models in Section 5, where we examine the use of censored
data and the performance analysis when modifying the amount of available TTF data. Finally, Section 6 discusses the conclusions of the work.

2 Background

2.1 The reliability function estimation

The reliability estimation problem can be classified depending on the prediction it makes: the system's ability to operate, to complete a mission, or to perform under some given circumstances. We will focus on the prediction of the ability of the analyzed components to operate without maintenance and logistic support at certain times [6, 7, 21], also called basic reliability analysis. Therefore, the term reliability is used to find a probability function R(t) for a component to perform without failing at least until a given period of time t under stated operating conditions, assuming that the component is new when t = 0. The reliability function is then a decreasing function of time t, equal to one at time zero and zero at infinite time [6].

The purpose of the non-parametric estimation approach, i.e., an empirical approach, is to directly estimate the reliability function R(t) from a set of failure times. In the same way, this approach can be used in maintenance and logistics support, where the data analyzed correspond to the maintenance, repair, or support task completion times [21]. The data-driven reliability function R̂(t) is estimated by empirical methods when historical failure times have been collected. For instance, one can use different statistical distributions to fit the learnt function R(TTF) as a continuous reliability distribution from the TTF series data. Once obtained, those distributions become available for practitioners to compute the reliability even for failure times which were not present in the range of collected data. The most common distributions used to fit R̂(t) are mainly the Weibull, exponential, and normal [6, 8]. This fitting phase can be considered as a machine learning problem: finding a model to represent the failure time distribution. In fact, some works have already shown how machine learning and soft computing algorithms are suitable for solving such problems, as we will review in Section 2.3.

2.2 Complete and censored data

Given an environment of n units, we call it a complete data situation when the failure times of all n units are available. That is, the failure time of the i-th unit, represented by t_i, is available for all units t_1, t_2, ..., t_n. Figure 1a shows a complete data situation, where during the test, all units fail at a given unit of time. In this situation, all the unit failures are known. However, TTF data is often based on non-trivial knowledge about the past performance of components [20]. The available dataset, usually received from industry, is not represented by a series of complete data, but composed of a series of censored times. Censored times mean that not all units fail during the tests, or that several units fail between two data monitoring times without knowing the exact failure time. The dataset then includes incomplete and/or missing information and, in some cases, the available data is not necessarily a tracking data collection. In all the latter scenarios, a censored time means the precise failure time is unknown, whatever the reason is.

One of the most common reasons for censoring is the analysis of life test data before all units have failed [22]. Figure 1b shows a censored data situation. As can be observed in this graph, unit 4 is still working at the end of the test. This case is called censored on the right, as tested units are still functioning at the end of a test or have been removed from tracking before the failure [23]. Similarly, Fig. 1b shows unit 3, where the time when the unit fails is known, but the effective working time is not. This occurs because the initial working time of the unit is unknown, and the case is called censored on the left. The last censored data scenario is called interval censored data, and it is shown by unit 2 in Fig. 1b. This scenario takes place when the state of the observed unit is checked at a time interval and a failure occurs between two inspections. It is usual to find up to 40% of the failures after the programmed inspections in complex industrial systems [24].

2.3 Existing literature on machine learning for reliability prediction

The machine learning approach has attracted considerable interest lately, mainly due to the flexible capacity and ability of these models to capture the complex non-linear relationships between input and output time series patterns through appropriate learning processes [25]. Machine learning models, especially artificial neural networks (ANNs) [26, 27] and support vector machines (SVMs) [28], have been used to improve the quality of the reliability prediction. One of the first works using machine learning for industrial reliability was that of [29], which explored the accuracy of a machine learning model to study small-sample reliability data. The authors used ANNs to identify the most appropriate probability distribution which fits a set of failure data, and to estimate its parameters. The results showed how ANNs perform well when the number of available TTFs is small.

Similar results emerged from the work of [30], where the efficiency of simple ANNs was tested to find the cumulative failure distribution of mechanical components. The authors of the latter work showed how these ANN models can perform better than other common distributions,
especially under poor data conditions. Amjady and Ehsan [15] developed ANNs for evaluating the reliability of a power system, concretely the reliability evaluation of generating units and the transmission system. Chatterjee and Bandopadhyay [16] applied an ANN to predict the cumulative failure time of a load-haul-dump machine, showing how these models can predict the component reliability better than the auto-regressive integrated moving average (ARIMA) model. The experimental results of [31] also demonstrated that ANNs outperform the ARIMA approach in terms of prediction accuracy when forecasting repairable systems. Another ANN-related study was done by [32], who predicted the reliability and failures of engine systems using ANNs. The authors confirmed that the use of ANNs provided an alternative and good prediction performance with respect to the existing models. The same Xu's dataset [32] was used in other works to test the performance of other types of ANNs [33]. The results of [33] showed the potential of using ANNs in reliability data analysis, especially for mid- to long-term predictions.

Different support vector machine (SVM) models (i.e., support vector regression (SVR) when dealing with a time series fitting problem) have been developed to predict the reliability of the already mentioned dataset [34]. SVM performed at least equal to, or even better than, ANNs and ARIMA. The experimental results of [35] suggested that, within the forecasting field of systems reliability, SVM had a better forecast accuracy than ANNs. Hong and Pai [17] made the first attempt to apply a SVM model to predict engine reliability, comparing it with the Duane model, ARIMA, and general regression neural networks. Finally, das Chagas Moura et al. [12] showed the SVM effectiveness in forecasting TTF and reliability of engineered components on literature case studies against two types of ANN (i.e., radial basis function and multilayer perceptron), an auto-regressive integrated moving average model, and recurrent neural networks. The comparison demonstrated that, in the analyzed cases, SVM outperforms or is comparable to other techniques.

Other machine learning approaches have been tested on the same well-known literature instances. An example is the work of [3], where the authors proposed a two-stage maintenance framework using evolvable ANNs, restricted Boltzmann machines, and deep belief networks. Pai and Lin [36] tested the performance of a soft computing [37] based approach (i.e., a neural fuzzy model) against ANNs and ARIMA models. They showed that the proposed fuzzy-based model provided lower forecast errors than the other machine learning models.

In summary, all the presented studies demonstrated that machine learning models can perform well in the industrial reliability field. However, they often use one well-known case study to validate their models. Their use, therefore, appears to be ad hoc for a specific dataset and industrial configuration, and it is still unclear whether there is a machine learning model able to perform generally well across different TTF datasets. Particularly, Xu's dataset [32] has a large number of TTF data points to train models.

A failure process investigation, indeed, is costly, and sometimes it is hard to have a long available TTF series (e.g., 40), like Xu's dataset does. One key characteristic of real-world situations is that they are diverse, and it is not guaranteed that the number of available TTFs is of that magnitude. In addition, the reviewed studies used the last five TTFs to test the models. But, in real-world situations, it is not always useful to know the reliability of a component at the end of its lifetime.

Therefore, one of the main contributions of our work is studying a wide set of possible and different real-world industrial situations with diverse types and numbers of available data points. Also, we aim at comparing different machine learning approaches to see whether there is a common conclusion for all the studied datasets. Next, Section 3 presents these models, from ANNs to more advanced soft computing and bagging classification approaches.
3 Machine learning models

3.1 Linear regression

Linear regression is a classical statistical technique for modeling the relationship between a scalar dependent variable y_i (output) and x_ij, the value of the j-th independent variable (input) for the i-th sample [38]. The case of one explanatory variable is called simple linear regression. When there is more than one explanatory variable, the process is called multiple linear regression.

Least mean squares (LMS) is the standard fitting approach to adjust the parameters of the linear regression model to existing data, often used as a baseline for innumerable regression problems [38]. In fact, some studies showed that linear regression methods achieved good performance for short-term data stream forecasting in system fault prediction [41].

Linear regression is based on the minimization of the observation errors, obtained from the difference between the observed value of an instance and the value provided by the fitted function [13, 42]. The deviation between the observed value y_i and the estimated value y_i* = α^T x_i is generally regarded as the observation error ε_i = y_i − y_i*, ∀i = 1, ..., N.

3.2 Support vector regression

In a support vector regression (SVR) model, the input examples are mapped into a high-dimensional feature space through a non-linear mapping (kernel) selected a priori [43]. SVR constructs a linear decision surface, i.e., a hyperplane, in this feature space. Precisely, this hyperplane is a linear decision function with maximal margin between the examples of the different categories, in order to separate them as much as possible. The maximal margin is determined by the support vectors, a small subset of the training data representing the decision boundary. When the SVR model has been trained with the corresponding training data, testing samples are mapped into the same feature space and their output values are predicted based on where they fall with respect to the hyperplane.

3.3 Ensemble learning and random forest

Ensemble learning systems are well-recognized machine learning tools capable of obtaining better performance than a single-component model [45]. They are also able to deal with complex and high-dimensional regression and classification problems [45, 46]. They are based on the combination of the outputs of simple machine learning systems, called weak learners in the literature, into a group of learners to get better fitting results (in regression) or better accuracy rates (in classification). Their performance strongly relies on the diversity of the weak learners, as these learners behave correctly on different parts of the problem space.

Random forest (RF) is a state-of-the-art ensemble algorithm for regression and classification [47]. Its learning algorithm constructs a set of diverse decision trees during the training phase and aggregates the results provided by all the component decision trees to compute the final output during the prediction phase. When dealing with a regression problem such as the reliability fitting function, RF returns the mean prediction of all its individual trees. The basis of RF is a bagging algorithm [48]. This algorithm builds a forest from a set of random trees which play the role of the weak learners of the ensemble system.

3.4 Artificial neural networks

ANNs are a family of learning algorithms inspired by the biological nervous systems of animals (in particular, the neural structure of the brain) and are used to estimate or predict functions that are generally unknown and can depend on a large number of inputs [49]. They are presented as systems of interconnected neurons, also called perceptrons, which are able to compute the output values of a complex system from some given inputs. Although neurons in isolation are only able to perform simple computations, they need to operate as a collective to solve difficult, non-linear classification and regression problems.
Within the numerous existing ANN algorithms, we specifically use a classical multilayer perceptron (MLP) algorithm and a conjugate gradient method to train the weights of its structure [50]. MLP is a feed-forward ANN which is a directed graph constructed over multiple layers of nodes, where each layer is fully connected to the next one. A conjugate gradient method is an iterative search method that comes from conventional numerical analysis. This method also uses a scaled conjugate gradient in order to speed up the learning process.

3.5 Soft computing regression method based on fuzzy logic

Fuzzy systems [37], which are based on fuzzy logic [51], became popular in the research community since they have the ability to deal with complex, non-linear problems that are difficult to solve by classical methods [52]. Besides, their capability of knowledge extraction and human-centric representation allows them to become more comprehensible to the users of the models (that is, they are different from classical black-box models) [53, 54].

The lack of automatic extraction of fuzzy systems has attracted the attention of the soft computing community, which has incorporated learning capabilities into these kinds of systems. In consequence, the hybridization of fuzzy systems and evolutionary algorithms has become one of the most popular approaches in this field [55–57]. In general, evolutionary genetic fuzzy systems are fuzzy systems enhanced by a learning procedure coming from evolutionary computation, i.e., considering any evolutionary algorithm.

In particular, we fit the reliability function by means of a fuzzy rule learning method based on genetic programming grammar operators and simulated annealing (EFS-SA) [58]. EFS-SA is based on a fuzzy rule-based system with fuzzy if-then rules. The learning mechanism of this soft computing algorithm is a hybrid method formed by a simulated annealing [59] and genetic programming approach [60] to build the set of fuzzy rules which finally generates the numerical output y_i*. The algorithm learns the rules of the system and the fuzzy partitions from the dataset and uses a tree-based codification for genetic programming. The authors of the algorithm defined specific genetic programming grammar operators and adapted them for the simulated annealing optimization method.
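EFS-SA learns its rule base and fuzzy partitions automatically through the genetic programming grammar operators and simulated annealing mentioned above (in the KEEL implementation). The hand-written toy below is therefore not that algorithm; it only illustrates the kind of model such a method produces, i.e., fuzzy if-then rules whose fired outputs are combined by weighted averaging. All membership functions and rule consequents here are arbitrary values chosen purely for the example.

```python
# Hand-written toy, NOT EFS-SA: a zero-order Takagi-Sugeno style rule base
# with triangular membership functions, defuzzified by weighted averaging.

def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Rules of the form: IF t is <fuzzy set> THEN R(t) = <constant>
# (membership parameters and rule outputs are illustrative values only).
RULES = [
    (lambda t: triangular(t, -1.0, 0.0, 0.5), 1.0),   # t is SHORT  -> high reliability
    (lambda t: triangular(t, 0.0, 0.5, 1.0), 0.5),    # t is MEDIUM -> medium reliability
    (lambda t: triangular(t, 0.5, 1.0, 2.0), 0.1),    # t is LONG   -> low reliability
]

def fuzzy_reliability(t_normalized):
    """Weighted-average defuzzification of the fired rules."""
    weights = [membership(t_normalized) for membership, _ in RULES]
    outputs = [out for _, out in RULES]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no rule fires outside the covered input range
    return sum(w * o for w, o in zip(weights, outputs)) / total

print(fuzzy_reliability(0.25))  # 0.75, between the SHORT and MEDIUM rule outputs
```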
4 Industrial dataset description and experimental setup

This section presents the description of the used dataset and non-parametric methods (Section 4.1) and the experimental setup with all the parameters used for comparing the algorithms (Section 4.2).

4.1 Data description

Different TTF and censored times are typical characteristics of real industrial conditions. For instance, monitoring and good maintenance policies can ensure the availability of a consistent number of TTFs but, on the other hand, there are cases where only small datasets are available [61]. The goal of the datasets used in this paper is to be an extensive representation of real industrial conditions. As already commented, these datasets are diverse, as they contain failure data from 19 different real-world engineered components. Particularly, the data come from the information systems which support the maintenance management of three large companies located in northern Italy, engaged in different industrial sectors. These components belong to mechanical and electrical components of technical assets for electricity production, equipment for filling beverages, and parts of car assembly lines.

Each of the datasets has different available failure and censored times, as shown in Table 1. This table has a column with the number of TTFs for each type of component and another column with the number of censored times.

Table 1 Description of the 19 components' data: number of available TTFs and censored data times (if any)

  Component    TTFs   Censored times
  1  cod A      73    10
  2  cod B      96    10
  3  cod C      20    –
  4  cod D      25    10
  5  cod E      21    6
  6  cod F      37    –
  7  cod H      22    –
  8  cod I      23    4
  9  cod L      47    5
  10 cod M      32    4
  11 cod N      71    7
  12 cod O      34    –
  13 cod P      11    –
  14 cod Q      24    –
  15 cod R      24    –
  16 cod S      15    –
  17 cod T      45    –
  18 cod U     607    18
  19 cod V     282    10

An estimated function R̂(t_i) is used to evaluate and train the machine learning models. To do this, we use non-parametric approximation methods. Depending on the type of data (complete or censored), different approximation methods to estimate R̂(t_i) can be applied. In our study, we use the direct (DM), improved direct (IDM), and median rank (MRM) methods [10] for complete TTF data series. On the other hand, in the presence of censored data, we use the product limit estimator (PLE) method and its variation, known as the Kaplan and Meier (KM) method [62].

Table 2 Non-parametric methods to estimate the reliability function when there is complete and censored data
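As a rough illustration of two of these estimators, the sketch below implements the median rank approximation for complete data and the Kaplan–Meier product limit for right-censored data, assuming the usual textbook formulas (Bernard's approximation F(t_(i)) ≈ (i − 0.3)/(n + 0.4), and distinct observation times for the product limit). The exact definitions used in this work are those reported in Table 2 and in [10, 62].

```python
import numpy as np

def median_rank_reliability(ttf):
    """Median rank (MRM) estimate of R(t) at each ordered failure time of a
    complete (uncensored) TTF sample, using Bernard's approximation."""
    t = np.sort(np.asarray(ttf, dtype=float))
    n = len(t)
    ranks = np.arange(1, n + 1)
    reliability = 1.0 - (ranks - 0.3) / (n + 0.4)
    return t, reliability

def kaplan_meier_reliability(times, event):
    """Product limit (Kaplan-Meier / PLE) estimate of R(t) for right-censored
    data; event[i] is True for an observed failure, False for a censored time.
    Ties between observation times are not handled in this sketch."""
    times = np.asarray(times, dtype=float)
    event = np.asarray(event, dtype=bool)
    order = np.argsort(times)
    times, event = times[order], event[order]
    n = len(times)
    survival, t_out, r_out = 1.0, [], []
    for i, (t, failed) in enumerate(zip(times, event)):
        at_risk = n - i
        if failed:
            survival *= 1.0 - 1.0 / at_risk
            t_out.append(t)
            r_out.append(survival)
    return np.array(t_out), np.array(r_out)

# Hypothetical usage with made-up observation times (not the paper's data):
t, r = median_rank_reliability([120, 340, 95, 410, 250])
tc, rc = kaplan_meier_reliability([120, 340, 95, 410, 250], [1, 1, 0, 1, 1])
```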
Table 3 An example of the cod E dataset, which can be used as a complete or a censored sample
Table 2 shows the mathematical definitions of all these non-parametric methods which estimate the reliability function. As an example, we show in Table 3 the data series of the cod E component, to understand the TTF construction and how this data can be used as complete and as censored data. This engineered component E has 21 TTFs (see Table 1). During its failure process investigation, 29 TTFs were available: six of them were censored times and, in three cases, the failure occurred after 2202 time units (see times i = {11, 12, 15} in Table 3).

4.2 Experimental setup

A k-fold cross validation technique is used to compare the accuracy of the machine learning algorithms described in Section 3. k-fold cross validation is a statistical technique which evaluates how the results of a statistical analysis generalize to k independent subsets of the main dataset [14]. First, the data is randomly split into the k subsets. At each of the k iterations, the test set is composed of a single subset, while the training set is composed of the remaining k − 1 subsets. This procedure is repeated k times. In this paper, we use its classical variant, tenfold cross validation (10cv) [63].

The fitting evaluation measure used is the mean squared error (MSE), which shows the reliability fitting performance of the machine learning method. MSE computes the sum of the squares of the differences, at the i-th time unit, between the value of the approximate failure distribution and the output from the machine learning models (R̂(t_i)); see Eq. 1 for the mathematical definition of the MSE. Obviously, the smaller the MSE value is, the better the machine learning method performs, as MSE measures the deviation between the non-parametric estimation and the values predicted by the machine learning methods.

MSE = \frac{1}{n} \sum_{i=0}^{n} \left( R(t_i) - \hat{R}(t_i) \right)^2    (1)
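A minimal sketch of this evaluation protocol is given below, assuming, as the text suggests, that the model maps the failure time t_i to the non-parametric estimate R̂(t_i). The fold splitting, metric, and model handling are illustrative Python/scikit-learn stand-ins rather than the R and KEEL setup actually used in the experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

def cv_mse(model, t, r_hat, n_splits=10, seed=0):
    """Average test MSE of `model` over k-fold cross validation, where the
    input is the failure time t_i and the target is the non-parametric
    estimate R_hat(t_i)."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)   # single input feature: time
    r_hat = np.asarray(r_hat, dtype=float)
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=seed).split(t):
        model.fit(t[train_idx], r_hat[train_idx])
        pred = model.predict(t[test_idx])
        scores.append(mean_squared_error(r_hat[test_idx], pred))
    return float(np.mean(scores))

# Hypothetical usage with the median-rank targets of one complete TTF series:
# t, r = median_rank_reliability(component_ttf_series)
# print(cv_mse(RandomForestRegressor(random_state=0), t, r))
```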
Fig. 2 Methods to estimate the reliability function R(t) and the data-driven models used in the experimentation of this work
The values of the parameters for the five machine learning methods are presented in Table 4. These parameters were used during the whole experimentation and are set to the default values of the different implementations of the algorithms, so as not to fit the methods to the problem in an unfair comparison. LMS, SVR, and RF were implemented using an R library, while we used the KEEL [64] implementation for the last two methods, MLP and EFS-SA. We ran all the experiments on an Intel i5-2400, 3.1 GHz processor with 4 GB of memory, under the Linux operating system.

Additionally, we have included the traditional two-parameter Weibull reliability estimation [10, 65] to compare the performance of the machine learning algorithms against it. The experimentation was carried out by using two non-parametric estimation functions: the MRM method for uncensored data and the PLE for censored data. The MRM method was applied to all the components, while the PLE method was applied just to the components with censored data¹ (see Table 1 for the components with censored and complete data).

¹ We did not include the rest of the non-parametric estimations (i.e., DM, IDM, and KM) in the final experimentation of the paper, as we did not find differences between their values.

To sum up, Fig. 2 shows a summary of the followed failure process and a list of the machine learning models used during the experimentation.
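For reference, the sketch below shows one standard way to obtain such a two-parameter Weibull baseline: fitting a least-squares line to the linearized relation ln(−ln R) = β ln t − β ln η, using the median-rank estimates as plotting positions. This is an assumption-laden illustration and not necessarily the exact estimation procedure of [10, 65].

```python
import numpy as np

def fit_weibull_two_parameter(t, r_hat):
    """Fit shape (beta) and scale (eta) of R(t) = exp(-(t/eta)**beta) by a
    least-squares line through ln(-ln R) = beta*ln(t) - beta*ln(eta), using
    non-parametric estimates r_hat (with 0 < r_hat < 1) as plotting positions."""
    t = np.asarray(t, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    x = np.log(t)
    y = np.log(-np.log(r_hat))
    beta, intercept = np.polyfit(x, y, 1)
    eta = np.exp(-intercept / beta)
    return beta, eta

def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability function."""
    return np.exp(-(np.asarray(t, dtype=float) / eta) ** beta)

# Hypothetical usage against the median-rank targets of one component:
# beta, eta = fit_weibull_two_parameter(t, r)
# baseline_mse = np.mean((r - weibull_reliability(t, beta, eta)) ** 2)
```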
5 Reliability prediction results

In this section, we present the results obtained by the machine learning models on the dataset of 19 engineered components. We first explore the results with complete data (Section 5.1). Later, we do the same when data is censored (Section 5.2). Finally, we study the influence of the TTF size on the models' performance (Section 5.3).

5.1 Analysis of results with complete data

We analyze the experiments obtained using the machine learning models with respect to the two-parameter Weibull reliability estimation for the MRM method [6] (complete data). Table 5 presents the results obtained for all the industrial components, averaged over the tenfold validation results. These values are the errors obtained by the MSE metric for the algorithms considered (lower error values indicate a better performance). Also, Table 5 shows the average error on the 19 components in the last row (highlighted by a gray filling). For each component, the best-performing result is in bold.

We can observe from these results that the majority of the machine learning models obtain better results than the classical method for reliability prediction based on the two-parameter Weibull reliability estimation (averaged MSE value of 0.0038). The best-performing models are RF, MLP, and EFS-SA, with averaged MSE values of 0.0016, 0.0027, and 0.003, respectively. However, two machine learning models, SVR and the traditional LMS, do not improve the prediction provided by the two-parameter Weibull reliability estimation.

The LMS, which is a simple regression algorithm, obtains the worst reliability prediction results on average. In contrast, RF confirms its high potential when solving this prediction problem by obtaining the lowest error on average (0.0016). Furthermore, RF individually outperforms the rest of the techniques in 11 out of 19 industrial component cases. The ANN variant, i.e., the MLP with a conjugate gradient algorithm, is the second best-performing model and obtains the best results in four cases.

5.2 Analysis of results with censored data

In this subsection, we focus the analysis on comparing the results obtained by the machine learning algorithms and the two-parameter Weibull reliability estimation method using the censored set of data (using the PLE method). Later on, we will also compare the results obtained for the censored data (the PLE method) with the results for the complete data (the MRM method), discussed in the previous Section 5.1.

Table 6 presents the results obtained for censored data, showing the MSE metric values for all the algorithms considered. Again, the best result for a given component dataset is presented in bold font. The last row of the table presents the average values over all the component datasets.

The prediction results for the PLE method show a similar trend with respect to using complete data (the MRM method shown in Section 5.1). Two machine learning approaches, RF and MLP, obtain better results (averaged MSE values of 0.0008 and 0.0012, respectively) than the traditional two-parameter Weibull reliability estimation method (0.0024). Again, the statistical LMS is the worst-performing approach on average (an averaged error value of 0.0054). RF turns out to be the best-performing approach both on average and individually by components. RF outperforms the other approaches in eight out of ten cases. As in the previous section, which dealt with complete data, the ANN variant, the MLP with a conjugate gradient algorithm, is the second best-performing algorithm.

In order to clearly see the differences between using complete or censored data, Table 7 presents the difference
Fig. 3 Evolution of the RF model performance when increasing the number of TTFs, for complete data
Fig. 4 Evolution of the RF model performance when increasing the number of TTFs, for censored data
between the results obtained for the PLE (censored) and MRM (uncensored) methods for each algorithm. The negative values (in bold font) represent the cases where the machine learning models and the two-parameter Weibull reliability estimation perform better for censored data; it means they obtain a lower MSE metric value for the same component when having censored data. In contrast, positive values represent the same behavior but for complete data.

In 36 out of 50 cases, we see that the models behave better when using censored data (i.e., negative values in Table 7). In six cases, there is no difference between censored and complete data. All the machine learning models but EFS-SA obtain better results when using censored data (PLE method) than when using complete data (MRM method). The clearest case is the best-performing machine learning algorithm, RF, which almost always obtains better or equal results when using censored data. Its censored data results are only worse in the case of industrial component E.

Additionally, if we look at the averaged values (last row of Table 7), we find no difference between the accuracy for censored and complete data for the two-parameter Weibull reliability estimation. Four of the five machine learning models, all but EFS-SA, obtain better accuracy when using censored data on the averaged metrics.

5.3 Impact of the dataset size on the models' performance

In this section, we evaluate the impact of the number of TTFs on the performance of the machine learning models. The goal is to find out if models perform better when having industrial components with a higher number of available failure times. First, we plot in Figs. 3 and 4 the performance, measured by the MSE values, of the RF model (the best-performing machine learning algorithm). The horizontal axis of both graphs is the number of TTFs of the datasets, while the vertical axis is the MSE value obtained by the machine learning method for the different components.

We can see that the RF performance improves when the number of TTFs increases.
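A small plotting sketch in the style of Figs. 3 and 4 is shown below; it only lays out the axes described above and expects the per-component TTF counts and MSE values to be supplied from the actual cross-validation results (none are hard-coded here).

```python
import matplotlib.pyplot as plt

def plot_mse_vs_ttf(n_ttf, mse, label="RF"):
    """Scatter the per-component test MSE against the number of available
    TTFs, in the style of Figs. 3 and 4. Both lists must come from the actual
    cross-validation results; nothing is hard-coded in this sketch."""
    plt.scatter(n_ttf, mse)
    plt.xscale("log")                  # component sizes span 11 to 607 TTFs
    plt.xlabel("Number of available TTFs")
    plt.ylabel(f"MSE of the {label} model")
    plt.show()
```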
Table 8 Results of the components’ clusters based on the dataset size (number of TTF) for the MRM method (complete data)
Both graphs of Figs. 3 and 4 show the same performance evolution. The same behavior occurs when using the MRM method (complete data, Fig. 3) and the PLE method (censored data, Fig. 4). Therefore, there are no differences between using censored or complete data with respect to this improvement in performance when the TTF size increases, as expected.

To further study this scenario of dataset size variation, we analyze the behavior of the machine learning models by defining dataset clusters. These clusters are created according to the size of the different available datasets. We divide all the industrial component datasets into four different clusters using the number of TTFs as the splitting criterion. The first cluster comprises components with up to 25 TTFs (TTF ≤ 25), the second one consists of the ones with TTFs between 25 and 50 (25 < TTF ≤ 50), the third one contains the components with TTFs between 50 and 100 (50 < TTF ≤ 100), and the fourth one includes the components with more than 100 TTFs (100 < TTF).

Tables 8 and 9 present the results of the machine learning models and the two-parameter Weibull reliability estimation for the four clusters, considering the MRM and PLE methods, respectively. The values of the tables show the number of TTFs of each component of the clusters (second column) and the difference between the maximum and minimum times of the components (third column), as well as the results of the different algorithms. The numerical values are the MSE (lower values mean a better performance).

Focusing on the first cluster, we can see that three machine learning approaches (RF, EFS-SA, and MLP) obtain lower MSE values than the traditional two-parameter Weibull reliability estimation on average (0.0031, 0.0033, and 0.006, respectively). In the following clusters, RF achieves a larger difference with respect to the other techniques, which means that RF works better when having larger datasets. For instance, RF is the best-performing method for all the components but one of the last three clusters. In other words, the higher the TTF cluster, the better RF performs. The same happens for both complete (MRM method, see Table 8) and censored data (PLE method, Table 9).

Finally, one of the most important findings of this study is that the two-parameter Weibull reliability estimation is the method with the lowest improvement when increasing the number of TTFs. In fact, this reliability estimation is more competitive when dealing with the prediction of the components of the first cluster (small datasets in terms of available TTFs). When increasing the number of TTFs, machine learning models such as RF and MLP present important performance differences with respect to the Weibull reliability estimation. This analysis makes our clustering study useful to better understand the need for different models depending on the life cycles of the real industrial components. Therefore, it is highly recommended to use machine learning methods when the number of TTFs is relatively high; a reasonable threshold value for this decision can be 25 TTFs. The difference between traditional distribution methods and machine learning models is not significant for components with a low number of available TTFs.
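The clustering step described above reduces to a simple binning of the components by their number of TTFs; a minimal sketch is given below, with the bin edges taken from the text and all names and example values being illustrative only.

```python
def ttf_cluster(n_ttf):
    """Assign a component to one of the four dataset-size clusters used in
    Tables 8 and 9, based on its number of available TTFs."""
    if n_ttf <= 25:
        return "TTF <= 25"
    if n_ttf <= 50:
        return "25 < TTF <= 50"
    if n_ttf <= 100:
        return "50 < TTF <= 100"
    return "TTF > 100"

def average_mse_per_cluster(components):
    """`components` maps a component name to (number of TTFs, test MSE);
    returns the mean MSE of each cluster."""
    sums, counts = {}, {}
    for name, (n_ttf, mse) in components.items():
        cluster = ttf_cluster(n_ttf)
        sums[cluster] = sums.get(cluster, 0.0) + mse
        counts[cluster] = counts.get(cluster, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

# Hypothetical usage (the MSE values here are placeholders, not results):
# print(average_mse_per_cluster({"cod C": (20, 0.004), "cod F": (37, 0.003)}))
```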
Table 9 Results of the components’ clusters based on the dataset size (number of TTF) for the PLE method (censored data)
6 Conclusions

Reliability prediction is fundamental in estimating engineering system performance. An effective estimation of R(t) permits the appropriate application of several models of maintenance policies for components and, consequently, has significant benefits for the total cost of ownership of a complex system (e.g., a manufacturing machine). In this paper, we applied five machine learning models to estimate the reliability R(t) of a large dataset of mechanical components. In particular, we used a set of 19 real cases, representative of a wide range of component behaviors, to try to understand the ability of the proposed models to capture the complexities of the reliability behavior by approximating the reliability function R(t) of industrial components over time.

This paper analyzed how machine learning models behave with respect to the two-parameter Weibull reliability estimation, the traditional method used in the industry. In particular, we concluded that the random forest (RF) algorithm is the best model on average and in the majority of the industrial components of the datasets, obtaining high accuracy values for the engineered components without performing any calibration of the parameter values of the machine learning methods. Additionally, the results showed that the prediction models can obtain better results in the presence of censored data than when using complete data.

We also carried out a study of the implications of having a different number of available TTFs. To do this, we compared the behavior of the best-performing algorithm, i.e., RF, when increasing the number of TTFs, and grouped the datasets by size-based clusters. The main conclusion of this experimentation was that machine learning models obtained better results when having more TTF data. Specifically, RF was clearly the best model for all the larger components (the last three clusters). We showed that the differences between the traditional two-parameter Weibull reliability estimation and machine learning models are higher when having more TTF data. Therefore, it is even more advisable to use machine learning models with respect to traditional approaches on datasets with more than 25 TTFs.

Our research study showed the potential of the machine learning approach for industrial reliability analysis. For instance, our results can also be a useful contribution in the field of maintenance engineering, helping to define threshold values which trigger certain maintenance actions. Future works will focus on using more advanced machine learning models, such as those using aggregation mechanisms [5]. Additionally, we could explore whether the used models are able to generalize data about specific industrial components and predict their reliability under different industrial conditions.

References

1. Birolini A (2007) Reliability engineering, vol 5. Springer, Berlin
2. Shi H, Zeng J (2016) Real-time prediction of remaining useful life and preventive opportunistic maintenance strategy for multi-component systems considering stochastic dependence. Comput Ind Eng 93:192–204
3. Luo M, Yan HC, Hu B, Zhou JH, Pang CK (2015) A data-driven two-stage maintenance framework for degradation prediction in semiconductor manufacturing industries. Comput Ind Eng 85:414–422
4. Sun Y, Ma L, Mathew J (2009) Failure analysis of engineering systems with preventive maintenance and failure interactions. Comput Ind Eng 57(2):539–549
5. Rodger JA, George JA (2017) Triple bottom line accounting for optimizing natural gas sustainability: a statistical linear programming fuzzy ILOWA optimized sustainment model approach to reducing supply chain global cybersecurity vulnerability through information and communications technology. J Clean Prod 142:1931–1949
6. Ebeling CE (2010) An introduction to reliability and maintainability engineering. Waveland Press, Long Grove
7. O'Connor P, Kleyner A (2011) Practical reliability engineering. Wiley, Hoboken
8. Ahmad R, Kamaruddin S (2012) An overview of time-based and condition-based maintenance in industrial application. Comput Ind Eng 63(1):135–149
9. Crocker J, Kumar UD (2000) Age-related maintenance versus reliability centred maintenance: a case study on aero-engines. Reliab Eng Syst Saf 67(2):113–118
10. Manzini R, Regattieri A, Pham H, Ferrari E (2009) Maintenance for industrial systems. Springer Science & Business Media, Berlin
11. Nakagawa T (2006) Maintenance theory of reliability. Springer Science & Business Media, Berlin
12. das Chagas Moura M, Zio E, Lins ID, Droguett E (2011) Failure and reliability prediction by support vector machines regression of time series data. Reliab Eng Syst Saf 96(11):1527–1534
13. Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
14. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
15. Amjady N, Ehsan M (1999) Evaluation of power systems reliability by an artificial neural network. IEEE Trans Power Syst 14(1):287–292
16. Chatterjee S, Bandopadhyay S (2012) Reliability estimation using a genetic algorithm-based artificial neural network: an application to a load-haul-dump machine. Expert Syst Appl 39(12):10943–10951
17. Hong WC, Pai PF (2006) Predicting engine reliability by support vector machines. Int J Adv Manuf Technol 28(1–2):154–161
18. Gokulachandran J, Mohandas K (2015) Comparative study of two soft computing techniques for the prediction of remaining useful life of cutting tools. J Intell Manuf 26(2):255–268
19. Wu X, Zhu Z, Fan S, Su X (2016) Failure and reliability prediction of engine systems using iterated nonlinear filters based state-space least square support vector machine method. Optik Int J Light Electron Opt 127(3):1491–1496
20. Regattieri A, Manzini R, Battini D (2010) Estimating reliability characteristics in the presence of censored data: a case study in a light commercial vehicle manufacturing system. Reliab Eng Syst Saf 95(10):1093–1102
21. Kumar UD, Crocker J, Knezevic J, El-Haram M (2012) Reliability, maintenance and logistic support: a life cycle approach. Springer, Berlin
22. Meeker WQ, Escobar LA (1998) Statistical methods for reliability data, vol 314. Wiley-Interscience, Hoboken
23. Nelson WB (2009) Accelerated testing: statistical models, test plans, and data analysis, vol 344. Wiley, Hoboken
24. Jardine AK, Tsang AH (2013) Maintenance, replacement, and reliability: theory and applications. CRC Press, Boca Raton
25. Bontempi G, Taieb SB, Le Borgne YA (2013) Machine learning strategies for time series forecasting. In: Business intelligence. Springer, Berlin, pp 62–77
26. Vanini ZS, Khorasani K, Meskin N (2014) Fault detection and isolation of a dual spool gas turbine engine using dynamic neural networks and multiple model approach. Inf Sci 259:234–251
27. Ogaji SO, Singh R (2003) Advanced engine diagnostics using artificial neural networks. Appl Soft Comput 3(3):259–271
28. Widodo A, Yang BS (2007) Support vector machine in machine condition monitoring and fault diagnosis. Mech Syst Signal Process 21(6):2560–2574
29. Liu MC, Kuo W, Sastri T (1995) An exploratory study of a neural network approach for reliability data analysis. Qual Reliab Eng Int 11(2):107–112
30. Alsina EF, Cabri G, Regattieri A (2016) A neural network approach to find the cumulative failure distribution: modeling and experimental evidence. Qual Reliab Eng Int 32(2):167–579
31. Ho S, Xie M, Goh T (2002) A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Comput Ind Eng 42(2):371–375
32. Xu K, Xie M, Tang LC, Ho S (2003) Application of neural networks in forecasting engine systems reliability. Appl Soft Comput 2(4):255–268
33. Fink O, Zio E, Weidmann U (2014) Predicting component reliability and level of degradation with complex-valued neural networks. Reliab Eng Syst Saf 121:198–206
34. Lins ID, Moura MdC, Zio E, Droguett EL (2012) A particle swarm-optimized support vector machine for reliability prediction. Qual Reliab Eng Int 28(2):141–158
35. Chen KY (2007) Forecasting systems reliability based on support vector regression with genetic algorithms. Reliab Eng Syst Saf 92(4):423–432
36. Pai PF, Lin KP (2006) Application of hybrid learning neural fuzzy systems in reliability prediction. Qual Reliab Eng Int 22(2):199–211
37. Zadeh LA (1994) Fuzzy logic, neural networks, and soft computing. Commun ACM 37(3):77–84
38. Draper NR, Smith H (2014) Applied regression analysis. Wiley, Hoboken
39. Marsland S (2015) Machine learning: an algorithmic perspective. CRC Press, Boca Raton
40. Jang JSR, Sun CT, Mizutani E (1997) Neuro-fuzzy and soft computing: a computational approach to learning and machine intelligence. Prentice Hall, Upper Saddle River
41. Alzghoul A, Löfstrand M, Backe B (2012) Data stream forecasting for system fault prediction. Comput Ind Eng 62(4):972–978
42. Rustagi J (1994) Optimization techniques in statistics. Academic Press, Cambridge
43. Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14(3):199–222
44. Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20:273–297
45. Dietterich TG (2000) Ensemble methods in machine learning. In: Multiple classifier systems. Springer, Berlin, pp 1–15
46. Kuncheva L (2001) Combining classifiers: soft computing solutions. In: Pal S, Pal A (eds) Pattern recognition. From classical to modern approaches. World Scientific, Singapore, pp 427–451
47. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
48. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
49. Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall PTR, NJ
50. Moller F (1990) A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw 6:525–533
51. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
52. Yager RR, Filev DP (1994) Essentials of fuzzy modeling and control. Wiley-Interscience, New York
53. Casillas J, Cordón O, Herrera F, Magdalena L (2003) Interpretability issues in fuzzy modeling. Springer, Berlin
54. Alonso JM, Magdalena L, González-Rodríguez G (2009) Looking for a good fuzzy system interpretability index: an experimental approach. Int J Approx Reason 51:115–134
55. Cordón O, Herrera F, Hoffmann F, Magdalena L (2001) Genetic fuzzy systems. Evolutionary tuning and learning of fuzzy knowledge bases. World Scientific, Singapore
56. Cordón O, Gomide F, Herrera F, Hoffmann F, Magdalena L (2004) Ten years of genetic fuzzy systems: current framework and new trends. Fuzzy Sets Syst 141(1):5–31
57. Herrera F (2008) Genetic fuzzy systems: taxonomy, current research trends and prospects. Evol Intell 1:27–46
58. Sánchez L, Couso I, Corrales J (2001) Combining GP operators with SA search to evolve fuzzy rule based classifiers. Inf Sci 136(1–4):175–191
59. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
60. Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection, vol 1. MIT Press, Cambridge
61. Ferrari E, Pareschi A, Regattieri A, Persona A (2002) TPM: situation and procedure for a soft introduction in Italian factories. TQM Mag 14(6):350–358
62. Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53(282):457–481
63. Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th international joint conference on artificial intelligence, IJCAI'95, vol 2. Morgan Kaufmann Publishers Inc., San Francisco, pp 1137–1143
64. Alcalá-Fdez J, Sanchez L, Garcia S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM et al (2009) KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307–318
65. Wang ZM, Yang JG (2012) Numerical method for Weibull generalized renewal process and its applications in reliability analysis of NC machine tools. Comput Ind Eng 63(4):1128–1134