Creation of A Machine Learning Model For The Predictive Maintenance of An Engine Equipped With A Rotating Shaft
Supervisor:
Prof.ssa Tania Cerquitelli
Company Tutor:
Dott.ssa Paola Dal Zovo
Candidate:
Manfredi Manfrè
March 2020
Summary
One of the most promising applications of Industry 4.0 enabling technologies con-
cerns the creation of systems capable of providing condition-based and predictive
maintenance services. This thesis work deals with the introduction of the objectives
of these services, their difficulties and known problems, and the solutions offered by
the literature. It also describes the design and implementation of a system capable of
detecting vibrations on a rotating shaft of an electric motor. This solution is based
on a data-driven approach, using an accelerometer and combining machine learning
models to determine the operating status of the machine and report any anomaly.
Particular attention is paid to data preprocessing, in order to limit computational costs
and increase execution speed while maintaining high reliability.
Table of contents
Summary
1 Introduction
2 Background
2.1 Industry 4.0
2.1.1 Enabling Technologies
2.1.2 Prospective and advantages
2.2 Maintenance
2.2.1 Corrective Maintenance
2.2.2 Preventive Maintenance
2.2.3 Condition-based Maintenance
2.2.4 Predictive Maintenance
2.3 Maintenance in Industry 4.0 context
3 Related Works
3.1 Problem Definition
3.2 Approach Methodologies
3.2.1 Physical Models
3.2.2 Knowledge-Based Models
3.2.3 Data-Driven Models
3.3 Data and Dataset
3.3.1 Type and Sources of Data
3.4 Proposed Solutions
3.4.1 Architecture
3.4.2 Classification Models
3.5 Challenges and Known Problems
3.6 State of Art
4 Architecture
4.1 Machine
4.1.1 Fault
4.2 Sensor
4.3 Data Collection
4.4 Transmission Protocol
4.5 Used Tools
5 Models Realization
5.1 Data Preprocessing
5.1.1 Fast Fourier Transform (FFT)
5.1.2 Discrete Wavelet Transform (DWT)
5.1.3 Features Extraction
5.1.4 Features Selection
5.1.5 Standardize features
5.2 Classification
5.2.1 Observations
5.3 Anomaly Detection
5.3.1 Observations
6 Future Developments
6.1 Combination of Classification and Anomaly Detection
6.1.1 Observations
7 Conclusions
Bibliography
Chapter 1
Introduction
Maintenance-related costs account for between 15% and 40% of the total cost of producing goods [1].
In particular, the main source of loss lies in non-scheduled maintenance activities, namely
those interventions that aim to solve problems caused by sudden machine breakdowns and the
resulting production stoppages.
Analyses of maintenance costs indicate that the same intervention costs approximately three
times as much when it is performed unexpectedly as when it is scheduled in advance [2].
It is therefore not surprising that with the advent of Industry 4.0, which is
expected to be the fourth industrial revolution, one of the sectors where many in-
vestments are being made and in which the research is very active is precisely that
of maintenance.
The goal is to be able to deduce the state of health of the machines and their critical
components and use this information to plan when to intervene with replacements
or repairs.
However, some problems and difficulties that are encountered when moving to
the actual implementation of these systems in real scenarios are equally well known.
Among them are the following:
• The high percentage of noise present in the data collected by the sensors in a
context such as that of industrial production;
• The need for historical data labeled with the corresponding state of health, from which
the model can learn, and the strong dependence on the human factor that must certify
the correctness of this labeling.
The thesis project was carried out in collaboration with Santer Reply SpA, which
is a provider of end-to-end solutions and consultancy in the IT sector, with special
focus on IoT.
Thesis Structure
The rest of the thesis is structured as follows.
The second chapter provides a description of the context in which the thesis
project is located, thus presenting the concepts of Industry 4.0 and maintenance.
The third chapter reports the related works, highlighting the possible approaches,
the common problems and the solutions proposed by the literature to create sys-
tems for detecting, classifying and predicting machine failures. In this chapter, in
fact, the concepts of data analytics and machine learning will be explained, with a
particular focus on existing classification techniques.
In the fourth chapter, the overall architecture, the individual components of the
proposed solution, the data provided, their structure and the technological tools
used to construct the classification models are analyzed.
The fifth chapter describes the project carried out in detail. It is divided into three parts,
which represent the different phases of the work: the first analyzes the different methodologies
used for preprocessing the raw data, that is, the preliminary treatment useful to better prepare
the received dataset; the second and the third parts deal with the different types of models
developed, with the common purpose of creating a system that supports predictive maintenance.
The sixth chapter describes future developments, where a possible solution to the
problems encountered during the work carried out in the fifth chapter is explained.
Furthermore, a program for the validation of the system on operating machines
through the use of an industrial sensor is presented.
Chapter 2
Background
In order to better describe what constitutes Industry 4.0, both the enabling technologies
and the main benefits are described hereafter.
The internet of Things (IoT) is about extending the power of the internet beyond
computers and smartphones to a whole range of other things, processes, and environ-
ments. Those ”connected” things are used to gather information, send information
back, or both. When something is connected to the internet, that means that it
can send information or receive information, or both. This ability to send and/or
receive information makes things smart. To be smart, a thing doesn’t need to have
super storage or a supercomputer inside of it. Instead, it must only be connected to
super storage or a supercomputer.
In the Internet of Things, all the things that are being connected to the internet can
be put into three categories: things that collect information and then send it, things that
receive information and then act on it, and things that do both. All three of these have
enormous benefits that feed on each other.
IoT provides businesses and people better insight into and control over the 99 percent
of objects and environments that remain beyond the reach of the internet. And by
doing so, IoT allows businesses and people to be more connected to the world around
them and to do more meaningful, higher-level work [8].
Cloud Computing
Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users
over the Internet. It allows companies to achieve necessary services and resources to
enable Industry 4.0 applications, avoiding or minimizing up-front IT infrastructure
costs. Also, it allows enterprises to get their applications up and running faster,
with improved manageability and less maintenance, and enables IT teams to more
rapidly adjust resources to meet fluctuating and unpredictable demand.
Big Data
Big Data refers to data flows whose volume, velocity and variety make them unmanageable
by a traditional relational database system. This data can concern activities, events,
sensor values and machine states.
The true advantage comes out during data analysis, where we look for unknown correlations
between monitored events and variables. The analysis is, therefore, the moment in which
data are translated into information, which supports the decision stages.
The presence and analysis of such big data require appropriate solutions, which typically
rely on cloud services.
Machine Learning
Machine learning is the scientific study of algorithms and statistical models that
computer systems use to perform a specific task without using explicit instructions,
relying on patterns and inference instead. Machine learning algorithms build a
mathematical model based on sample data, known as ”training data”, in order to
make predictions or decisions without being explicitly programmed to perform the
task. The main advantage in using a learning technique is that it is possible to solve
problems that a traditional sequence of instructions could hardly manage, even with
the contribution of domain experts.
In an Industry 4.0 context where there is a high amount of potentially useful data,
machine learning became a crucial tool.
Usually, ML algorithms do not use raw data directly, but values derived from it, called
features, which are produced in the first data-elaboration step. Feature extraction is
required to make the model's work easier: it reduces the number of input variables while
preserving their information and characteristics, and also facilitates interpretability.
• Integration between the processes taking place during all the production chain;
this allows digitization and optimization of activities that spread from internal
logistic to the sale;
• The ability to collect and store data about every machine and every aspect of
production, together with the possibility for operators and managers to access them
at any moment, provides strong support to decision-making. Analysis and artificial
intelligence methods further simplify the task, providing additional information
extracted from the data;
2.2 Maintenance
Maintenance is defined as the combination of all technical, administrative and management
actions, during the life cycle of an entity, intended to maintain it in, or restore it to,
a state in which it can perform the required function [12].
The maintenance concept and process have undergone a strong evolution over the years,
going from simple technical tasks performed individually when machines or instruments
suffered blocking failures, to a complex system strongly integrated with the other
production processes and with an important strategic role.
Among the most significant activities that allow moving in this direction are the
systematic analysis of failures and their causes, careful management and planning of the
warehouse for the immediate availability of spare parts, the use of CMMS (computerized
maintenance management systems) to support the maintenance workflow, and the adoption of
specific maintenance policies.
[Figure: Taxonomy of maintenance policies — Maintenance is divided into Corrective Maintenance and Preventive Maintenance; Preventive Maintenance comprises Cyclic Maintenance and Condition-based Maintenance, which includes Predictive Maintenance.]
• Storage, to keep the history of the sensor values, and possibly integrate it with
a list of events and activities;
The main advantages and benefits coming from the application of these systems
are reported below.
• Through the visualization of the data collected by the sensors in real time, it
is possible to carry out continuous monitoring activities of the production ma-
chinery, even during normal operation and without having periodic inspections
by maintenance workers;
• Since it is not possible to identify and prevent 100% of the blocking faults,
when they occur a CBM system is useful for identifying the component that
caused the stoppage and the type of problem that affected it, simplifying the
task of the maintenance technicians and reducing the time needed to restore
production activities.
Chapter 3
Related Works
The contributions and results of research in recent years in the context of Condition-
Based Maintenance (CBM) systems have been numerous.
As already reported in the previous chapter, this is mainly due to the strong interest of
the industrial sector, which sees condition-based and predictive maintenance as one of the
most profitable applications of Industry 4.0.
Therefore, given the vastness of the topic, this chapter presents the main definitions,
the different problems to be solved and the most relevant methodologies used in the
solutions proposed in the literature.
• Data Acquisition: the process of gathering all the information that is consid-
ered relevant to be able to deduce the state of the machine or its components;
The main distinction within CBM applications is between diagnostics and prog-
nostics.
The purpose of a diagnostic system is to detect and identify a fault when it occurs.
In the ideal case, this therefore means monitoring a system, indicating when some-
thing is not working in an expected way, indicating which component is affected by
the anomaly and specifying the type of anomaly.
On the other hand, prognostics aims to determine whether a failure is close to occurring
or to deduce its probability of occurrence. Obviously, since prognostics is a prior
analysis, it can provide a greater contribution in terms of reducing intervention costs,
but it is a more complex objective to achieve.
Figure 3.1. CBM system that combines diagnostics and prognostics [16]
The most relevant variables include various thermal, mechanical, chemical and electrical
quantities. Representing how they impact the health of machinery is a very complicated
task; therefore, whoever designs this type of solution needs deep domain knowledge and
modeling skills.
Once the model has been created, it is necessary to have sensors available that al-
low obtaining values corresponding to the quantities considered relevant during the
analysis and modeling phase, in order to use them as inputs.
Among the most common approaches for implementing this type of model are
rule-based mechanisms and fuzzy logic [17].
The former offers simplicity of construction and interpretability, but it may not be
sufficient to express complicated conditions and may result in a combinatorial explosion
when the number of rules is very high.
The use of fuzzy logic allows describing the state of the system through more vague
and inaccurate inputs, making the process of model formalization and description
simpler and more intuitive.
Even for expert systems, as for physical methods, the results are strongly determined
by the quality and level of detail of the model, and are highly specific.
• Compared to other approaches, they have the great advantage of not requiring in-depth
knowledge specific to the application domain, thus making the contribution of experts
less decisive for the final performance of the model; the contribution of the experts may
still be useful to speed up the selection of the quantities to be used as input, but it
has a much lesser weight compared with knowledge-based or physics-based methods; moreover,
learning and data mining techniques may be able to detect relationships between the input
parameters and the state of the system that are not known in advance even to the experts
themselves;
The choice of a specific data-driven type model is highly dependent on the ob-
jective to be achieved by the system. In fact, based on the objective, the problem
is modeled differently. The main options are shown below [19]:
Binary Classification
The simplest way is to represent CBM as a binary classification problem, in which every
single input representing the state of the system must be labeled with one of two possible
classes, typically distinguishing correct operation from faulty operation.
(Footnote: the most cited and used tools in the literature are TensorFlow, Scikit-Learn,
Keras, PyTorch, Theano and SciPy.)
Multiclass Classification
The multiclass version is a generalization of binary classification, in which the num-
ber of possible labels to choose from is increased. However, only one label must be
associated with each input.
The diagnostic case extends the previous one in a very intuitive way: first deciding
whether the machine is working correctly or incorrectly and, in the latter case,
identifying which of the possible anomaly states it is in.
In prognostic applications, instead, the model determines in which time interval before
failure the machine currently lies, so the possible labels represent different intervals
of proximity to failure.
Regression
Regression can be used to model prognostic problems, estimating the remaining useful life
of a component as a continuous number (provided by the regression model) of pre-fixed
time units.
In this specific case, the training dataset must only contain data relating to components
that have been subject to failure, in order to allow the inputs to be labeled backward
starting from the instant of failure.
Anomaly Detection
Another possible representation of diagnostic problems is to consider them as an anomaly
detection problem.
This means that the model must be able to establish whether the operation of the machine
corresponds to a normal state or deviates from it, that is, represents an anomaly.
The interpretation of the problem is therefore very similar to binary classification.
However, this methodology differs from classification in that it belongs to the
semi-supervised learning family (unlike the previous cases, which are all supervised):
the model learns only from inputs representing correct operating states and must, after
the training phase, recognize anomalous states that are not known in advance, or whose
characteristics are unknown.
Sensors Data
They are the measurements of all those physical quantities that describe in some
way the state of the machine during its operation; they are obtained through special
sensors that convert the physical value into an electrical value. Examples of these
parameters used are noise, vibrations, pressure, temperature and humidity, where
the relevance of each of them strongly depends on the system being monitored.
To be more specific, it is possible to distinguish sensor data, according to the type
of values [15]:
• Simple Values: a single value, typically numerical, collected at a precise instant
of time, such as temperature, pressure and humidity;
• Signals: namely the trend of a single quantity over a period of time, such as a
sound wave or a vibration signal;
• Multidimensional Values: namely, a multiplicity of values collected at the
same time referring to the same concept, such as a photograph or infrared
thermography.
Static Data
Metadata, as also defined in [19], are the data that describe the static operating
conditions of the machine or plant at each instant of time, such as the type of piece
produced, the code of the materials used, the machine production speed, and the
identification and characteristics of the operator who is using the machine.
The sources of this information can be the PLCs of the machines or the ERP sys-
tems of the production plant, or, if they are not available, the manual declarations
of the operators, which must be digitized and integrated later.
Log Data
They are the history of the events and relevant actions that concern a machine and its
components. In particular, the lists of repair and replacement interventions and the
history of the faults found are useful.
Also, in this case, they can be obtained thanks to ERP or CMMS systems, or by
specific operator declarations.
[Figure 3.2: Typical CBM system architecture — machine, sensors, connectivity, gateway & edge computing, data collection and persistence, data analysis, and maintenance application.]
• Sensors: as already described in 3.3.1, these are the devices that deal with
detecting the physical quantities of interest from the machine;
• Connectivity: devices that interface directly with the sensors to collect the data
gathered by them and then transmit it through some communication technology, which can
be wired or wireless depending on the characteristics of the specific scenario;
• Gateway and edge computing: it is a first point of collection of raw data from
multiple sensors; these data can be filtered or aggregated according to a well-
defined logic, to reduce the traffic of data on the network and to detect and
discard any anomalous or not-significant data as soon as possible;
• Application: namely, where the information derived from the previous com-
ponent is presented to the end-user, possibly also intervening in the decision-
making phase, suggesting corrective actions that the user can then carry out.
Cross-Validation consists of dividing the dataset into a certain number (k, chosen a
priori) of groups: rotating over all the groups, one of them acts as a test set and all
the others as a training set.
Cross-Validation generally works well on many types of dataset, but if necessary other
methods perform the same function (such as fixed partitioning, used for very large
datasets).
Many classification techniques have significant differences between them.
The best known are the following:
• Decision Tree: algorithms with a tree structure where each node represents a
specific test on the data attributes and each branch is a ”road” that is traveled
based on the test result. The final nodes are the labels with which each data item can
be associated. Its strengths are its interpretability, efficiency and good accuracy,
while its main weakness is its sensitivity to missing data;
• Random Forest: these are classifiers that combine the results of multiple decision
trees for greater accuracy. Their weakness, however, is lower scalability with respect
to the size of the training set;
• Bayesian Classification: it is based on calculating the probability that a data item
belongs to a certain class. It is an accurate classifier with fair interpretability, but
model generation is very slow in the case of large datasets. To deal with this problem
it is often necessary to introduce the hypothesis of statistical independence among the
attributes of the dataset (the so-called Naive hypothesis), which however risks
oversimplifying the model and reducing its accuracy;
• K-Nearest Neighbors: algorithm based on the calculation of the distance (the
Euclidean one is often used) between the elements of the dataset. For example,
data is assigned to a certain class if close enough to the other data of the same
class. Parameter K represents the number of neighboring data taken into
account when assigning classes. The K-NN risks becoming computationally
expensive due to the calculation of the distances between the data, especially
in cases where there are many attributes;
• Neural Network [20]: these are very accurate techniques, robust in case of missing
data or outliers, which however have poor interpretability and a slow learning process.
Their functioning resembles the human brain: each node, which represents a neuron,
receives the data, processes it and transmits the data and its analysis to the
subsequent nodes; in this way the nodes of the subsequent levels obtain more and more
detailed information.
The outputs of the various classification algorithms can be evaluated by calcu-
lating some metrics that verify their quality, to understand if the created model is
working well or needs some adjustments:
• The accuracy of the model, which is calculated as the ratio between the number
of correctly classified data and the total number of data present in the test
dataset;
• The recall and precision, calculated for each different class. The first is the
ratio between the data correctly classified in a certain class and the total data
belonging to the same class, while the second is the ratio between the data
correctly classified in a certain class and the number of data assigned to that
class.
Recall and precision must be calculated because accuracy alone is not enough to
describe the model’s output, especially in the case of unbalanced data sets in the
distribution of classes.
• Many of the approaches described in the related works assume that a dataset
is available containing examples of sensor values classified by specific fault
classes. In a real scenario, it is very difficult for it to be present, and if it is
missing it means that defects should be specifically induced on the machine,
and this is often very complicated to implement;
• Finally, there is a strong dependence on the human factor. It is the domain experts
who must carry out the labeling of the dataset that will be used. It is a very delicate
phase, as errors during labeling can cause incorrect training and, consequently, the
system risks being useless if not harmful.
In this last section of the chapter, some examples of real applications of predictive
maintenance are presented, with the aim of clarifying in which areas its use is already
widespread today.
Figure 3.3. Machine condition in relation to time and moments of indicator occurrence.
Through some specific sensors, however, it is possible to detect the small initial
vibrations and allow a targeted intervention well before the problem becomes serious.
These sensors transmit the collected vibration data to the algorithm, whose output is
then interpreted by adequately trained staff.
different conditions: temperature, pressure and vibrations analyzed during the flight
phase have a significant influence on the monitoring of the aircraft condition.
Chapter 4
Architecture
This chapter describes the various parts, both hardware and software, that make up
the system and the interactions between them.
The general structure is based on what appears to be the most used in the literature,
shown in Fig. 3.2, where the storage, analysis and application responsibilities are
condensed into a single node.
The diagram representing the system architecture and the description of the individual
components are shown below.
[Figure: System architecture — the MMA8451 vibration sensor is read by a data-acquisition MQTT client, which publishes the readings on the sensor/acceleration topic of a cloud MQTT broker; a data-collection MQTT client on a PC subscribes to the topic, stores the data and feeds the learning model.]
4.1 Machine
The machine on which the anomaly detection system is applied was designed to measure the
vibrations of a bearing. Through appropriate modifications, two different operating
conditions have been created, making it possible to apply certain algorithms to evaluate
its status.
During the analysis phase of the project, the requirements that the machine should
satisfy were identified, which are the following:
• The possibility of applying sensors on it to detect significant parameters on
its state, without interfering with the normal operation;
• The full-time availability of the machine to be used for exploratory testing and
analysis;
• The ease of start-up and use, so as not to require specific skills to train and
test the CBM system;
The machine was built and made available by Reply, in order to create a model
for studying the vibrations on the bearings.
4.1.1 Fault
The aim of the project is to create a model capable of recognizing fault states when
they occur on the machine.
Therefore, two possible states of the machine have been created by means of the clamps
that secure the bearings. While the central bearing is fixed to the structure and has
the task of stabilizing the rotating shaft, the outermost bearing allows loosening of the
fastening that fixes it to the structure, simulating its malfunction.
4.2 Sensor
In order to determine the operating status of rotating machines, the best way is to
analyze the vibrations, as highlighted in the literature [26].
Vibration analysis of electrical rotating machines relies on the fact that all rotating
machines in good condition have a fairly stable vibration pattern. Under any abnormal
working condition, the vibration pattern changes. Based on the type of defect and its
rate of progression, a predictive maintenance schedule can be proposed. As a general
rule, machines do not break down or fail without some form of warning, which is indicated
by an increased vibration level. The vibrations caused by defects occur at specific
vibration frequencies, and the vibration amplitudes at particular frequencies are
indicative of the severity of the defects. Vibration analysis is the mainstay of
predictive maintenance and one of the most effective techniques for monitoring the
health of machinery.
Accelerometer
Among the characteristics of the sensors used in the consulted articles, it was noted
that the sampling frequency of the acceleration values varies from 20kHz up to
40kHz. Hardware capable of ensuring this level of performance requires very high
costs, outside the budget foreseen for the project.
We opted for the MMA8451, a low-consumption MEMS (Micro Electro-Mechanical Systems)
triaxial accelerometer with an I²C interface, which has been specially programmed to
obtain the maximum sampling rate it can provide.
As shown in Fig.4.3 the accelerometer has been fixed on the bearing support, in
order to analyze its vibrations by direct contact.
The sampling technique partially mirrors the one used for the NASA bearing dataset [27]:
it samples 2 seconds of vibrations at a frequency of 2.2 kHz every 10 seconds.
After each sampling interval, the Raspberry Pi converts and sends the data through the
MQTT protocol.
    {
        "signals": {
            "x": [ ARRAY(FLOAT) ],
            "y": [ ARRAY(FLOAT) ],
            "z": [ ARRAY(FLOAT) ]
        }
    }
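A minimal sketch of how such a payload could be assembled and published from the acquisition side is shown below; it uses the paho-mqtt Python client, and the broker address and topic name are assumptions suggested by the architecture diagram rather than verified configuration values.

    # Minimal sketch: publish one 2-second acquisition as the JSON payload above.
    # Broker host and topic name are assumptions used only for illustration.
    import json
    import paho.mqtt.client as mqtt

    BROKER_HOST = "broker.example.com"    # hypothetical broker address
    TOPIC = "sensor/acceleration"         # topic name suggested by the architecture diagram

    def publish_acquisition(x, y, z):
        # x, y, z: lists of float acceleration samples for one 2 s window
        payload = json.dumps({"signals": {"x": x, "y": y, "z": z}})
        client = mqtt.Client()
        client.connect(BROKER_HOST, 1883)
        client.publish(TOPIC, payload, qos=1)
        client.disconnect()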
• Json [29]: it is a library that allows the encoding and decoding of a JSON type
file in a Python dictionary and vice versa;
• Matplotlib [32] and Seaborn [31]: these are useful libraries for creating graphics.
During this work, extensive use was made of these libraries, as the graphical display of
the results was a fundamental component of the project. It proved fundamental during the
exploration of the dataset to steer the analyses in a certain direction, and also during
the tests of the prediction models to confirm or reject hypotheses and choices made.
Both Matplotlib and Seaborn allow 2D or 3D graphics to be displayed in a few lines of
code;
• Scikit-learn [33] and Keras [34]: these are data analysis and machine learning oriented
libraries, specially designed to be used together with other libraries such as Numpy
(another Python library that supports large vectors and multidimensional matrices and
adds several mathematical functions) or Pandas (many features of these libraries are in
fact designed to receive a DataFrame as input, for example). Scikit-learn contains
various classes and methods to carry out any type of analysis, thanks to the possibility
of implementing all the most important data mining algorithms. As part of this thesis, it
was mainly used for the preprocessing phase and for the construction, training and
testing of the various classifiers designed. Keras, instead, is a high-level neural
network API, written in Python and capable of running on top of TensorFlow, CNTK, or
Theano. It was developed with a focus on enabling fast experimentation. It supports both
convolutional networks and recurrent networks, as well as combinations of the two, and
runs seamlessly on CPU and GPU.
Chapter 5
Models Realization
This chapter describes how the data collected by the sensors are used to detect anomalies
on the machine, and therefore how the functions exposed by the system described in the
previous chapter are actually implemented. All the steps carried out in the project and
all the analyses performed to reach the final goal are explained in detail.
For each step, the methodology used, the reasons for the choices made and the results
obtained, even if partial or intermediate, are explained.
A lot of attention has been devoted to the preprocessing of the raw data, in this
specific case vibrations of rotating mechanical parts, and to the evaluation and
selection of features, because this is considered the most important and highest-impact
part of creating a machine learning model: the higher the quality of the features, the
easier and faster it is for the algorithm to train and the better the results. For this
reason, two parallel paths have been taken, in order to show the advantages and
disadvantages of the two cases examined, which will lead to the choice of a definitive
architecture considering the requirements of the project from which this thesis
originates.
In Fig. 5.1, the yellow blocks represent the part common to all the analyzed algorithms,
that is, the data collection and the creation of a dataset. The blue ones represent the
two parallel paths: on the left the FFT, on the right the DWT; for both, the pipeline
continues with the generation of the features and finally with their selection. Lastly,
the green blocks represent the steps for creating the predictive models; they are used
both after the generation of the features and after their selection, to look for the
minimum amount of data needed to make correct predictions.
[Figure 5.1: Processing pipeline — raw data, dataset creation, FFT or DWT feature generation, feature selection, train/test split, classifier generation, analysis and evaluation of the models.]
For example, for the random forest algorithm, null values have to be handled in the
original raw dataset. Another aspect is that the dataset should be formatted in such a
way that several machine learning and deep learning algorithms can be executed on it,
so that the best of them can be chosen.
In the project's specific scenario, each input element coming from the machine to be
observed through the models consists of a vibration signal over a two-second interval,
composed of approximately 4000 acceleration values per axis. Since features from the
frequency domain make the algorithm more accurate than features from the time domain
when evaluating the condition of a bearing [35], the next step is to convert the raw data
from the time domain to the frequency domain. Two possible ways to perform this domain
transformation are proposed.
Fourier analysis is a field of study used to analyze the periodicity in signals. If a
signal contains components which are periodic in nature, Fourier analysis can be used to
decompose this signal into its periodic components, telling us the frequency of each
periodic component.
Two (or more) different signals (with different frequencies, amplitudes, etc) can be
mixed together to form a new composite signal. The new signal then consists of all
of its component signals.
The reverse is also true: every signal, no matter how complex it looks, can be decomposed
into a sum of simpler signals. These simpler signals are trigonometric functions (sine
and cosine waves). This was discovered in 1822 by Joseph Fourier, and it is what Fourier
analysis is about. The mathematical function which transforms a signal from the time
domain to the frequency domain is called the Fourier Transform, and the function which
does the opposite is called the Inverse Fourier Transform.
The Fast Fourier Transform (FFT) is an efficient algorithm for calculating the Discrete
Fourier Transform (DFT) and is the de facto standard for calculating a Fourier Transform.
It is present in almost every scientific computing library and package, in every
programming language.
Nowadays the Fourier transform is an indispensable mathematical tool used in al-
most every aspect of our daily lives.
In our specific case, the distinction between healthy and faulty data is more marked and
easier to grasp when analyzing the FFT in Fig. 5.3 rather than the raw data in Fig. 5.2.
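As a concrete illustration of this transformation, the following minimal sketch computes the one-sided amplitude spectrum of a single two-second window with NumPy; the 2.2 kHz sampling frequency comes from Chapter 4, while the signal itself is a synthetic placeholder.

    # Minimal sketch: one-sided amplitude spectrum of one 2-second window.
    # The signal is synthetic; in the project it would be one axis of raw data.
    import numpy as np

    FS = 2200                                        # sampling frequency [Hz], from Chapter 4
    t = np.arange(0, 2, 1 / FS)                      # 2 s window
    signal_x = np.sin(2 * np.pi * 160 * t)           # synthetic stand-in for the x axis

    n = len(signal_x)
    spectrum = np.abs(np.fft.rfft(signal_x)) / n     # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)           # corresponding frequency bins [Hz]
    print("dominant frequency:", freqs[spectrum.argmax()], "Hz")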
[Figure 5.2: Raw acceleration [m/s²] over a 2 s window, healthy vs faulty. Figure 5.3: Power spectral density of the x axis, healthy vs faulty, 100–1000 Hz.]
This first manipulation of the raw data brings a considerable advantage from the point of
view of the machine learning algorithms, making their training easier.
However, it is not possible to directly use the coefficients coming from the FFT
calculation, since there would be about 4000 features, making the training of the machine
learning algorithms inefficient and unnecessarily expensive from the computational point
of view.
In Fig.5.4 we can see the difference between a sine-wave and a wavelet. The
main difference is that the sine-wave is not localized in time (it stretches out from
−∞ to +∞) while a wavelet is localized in time. This allows the wavelet transform
to obtain time-information in addition to frequency information.
Since the Wavelet is localized in time, we can multiply our signal with the wavelet
at different locations in time. We start with the beginning of our signal and slowly
move the wavelet towards the end of the signal. This procedure is also known as a
convolution. After we have done this for the original (mother) wavelet, we can scale
it such that it becomes larger and repeat the process [36].
42
5 – Models Realization
X_w(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t)\, \overline{\psi}\!\left(\frac{t-b}{a}\right) dt
Where ψ(t) is the continuous mother wavelet which gets scaled by a factor of a
and translated by a factor of b.
When we are talking about the Discrete Wavelet Transform, the main difference is
that the DWT uses discrete values for the scale and translation factor.
The DWT is only discrete in the scale and translation domain, not in the time-
domain.
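A minimal sketch of how such a multi-level decomposition can be computed in Python is given below; it uses the PyWavelets library with a Daubechies mother wavelet and five levels, which are illustrative choices rather than the exact configuration used in this work.

    # Minimal sketch: multi-level Discrete Wavelet Transform of one signal window.
    # The wavelet family ('db4') and the number of levels are illustrative choices.
    import numpy as np
    import pywt

    rng = np.random.default_rng(0)
    signal_x = rng.normal(size=4400)                 # stand-in for 2 s of one axis at 2.2 kHz

    # wavedec returns [cA_n, cD_n, cD_n-1, ..., cD_1]:
    # one approximation sub-band plus one detail sub-band per level.
    coeffs = pywt.wavedec(signal_x, wavelet="db4", level=5)
    for i, c in enumerate(coeffs):
        print(f"sub-band {i}: {len(c)} coefficients")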
• Variance: it measures how far a set of (random) numbers are spread out from
their average value;
• Median: it is the value separating the higher half from the lower half of a data
sample;
• 25th percentile of the values;
• 75th percentile of the values;
• Root Mean Square: it is the square root of the average of the squared amplitude values;

RMS = \sqrt{\frac{1}{n} \sum_{i=0}^{n-1} x_i^2}
• Mean of Derivative;
From FFT
In this way, for each file containing 2 seconds of raw data on the three axes, we are
able to obtain 11 features per axis, concatenated on the same line, for a total of 33
features.
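The following minimal sketch shows how such per-axis statistics could be computed from the FFT spectrum of one window; since the full list of eleven features is only partially reported above, the set below is an illustrative subset.

    # Minimal sketch: statistical features from the spectrum of each axis.
    # Only an illustrative subset of the eleven features is computed here.
    import numpy as np

    def spectrum_features(values):
        # statistics of a 1-D array (e.g. the FFT amplitudes of one axis)
        return [
            np.var(values),                    # variance
            np.median(values),                 # median
            np.percentile(values, 25),         # 25th percentile
            np.percentile(values, 75),         # 75th percentile
            np.sqrt(np.mean(values ** 2)),     # root mean square
            np.mean(np.diff(values)),          # mean of the derivative
        ]

    def fft_feature_row(x, y, z):
        # concatenate the per-axis features of one 2 s window into a single row
        row = []
        for axis in (x, y, z):
            spectrum = np.abs(np.fft.rfft(np.asarray(axis, dtype=float)))
            row.extend(spectrum_features(spectrum))
        return row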
From DWT
The DWT is used to split a signal into as many frequency sub-bands as possible. If
different types of signals exhibit different frequency characteristics, this difference
in behavior has to show up in at least one of the frequency sub-bands. So, if we generate
features from each of the sub-bands, use the collection of features as input for a
classifier and train it with these features, the classifier should be able to distinguish
between the different types of signals [36].
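A minimal sketch of this per-sub-band feature generation for one axis is shown below; the wavelet family, the decomposition level and the statistics computed per sub-band are illustrative assumptions.

    # Minimal sketch: one feature row from the DWT sub-bands of a single axis.
    # Wavelet family, level and per-band statistics are illustrative assumptions.
    import numpy as np
    import pywt

    def subband_stats(c):
        return [np.var(c), np.median(c), np.sqrt(np.mean(c ** 2))]

    def dwt_feature_row(axis_signal, wavelet="db4", level=5):
        coeffs = pywt.wavedec(np.asarray(axis_signal, dtype=float), wavelet=wavelet, level=level)
        row = []
        for c in coeffs:            # one approximation + `level` detail sub-bands
            row.extend(subband_stats(c))
        return row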
Correlation Matrix
Correlation is a statistical term that in common usage refers to how close two vari-
ables are to having a linear relationship with each other.
Features with high correlation are more linearly dependent and hence have almost
the same effect on the dependent variable. So, when two features have high corre-
lation, we can drop one of the two features [37].
For this reason, the correlation matrices are analyzed for both FFT and DWT, since
especially the latter has a large number of features.
As can be seen from the figures above, in both cases some of the features show a high
correlation.
• Collinear features: are features that are highly correlated with one another. In
machine learning, these lead to decreased generalization performance on the
test set due to high variance and less model interpretability.
• Features with low importance: using the feature importances computed by a model, this
step finds the lowest-importance features, those that do not contribute to a specified
cumulative importance. Based on the plot of cumulative importance and this information,
the gradient boosting machine considers many of the features to be irrelevant for
learning. A minimal sketch of this selection procedure is given after this list.
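The following minimal sketch illustrates both criteria: one feature of every highly correlated pair is dropped, and then only the features needed to reach a cumulative importance threshold are kept. The threshold values, the gradient boosting model and the function names are assumptions, not the exact procedure used in the thesis.

    # Minimal sketch of the two selection criteria described above.
    # Threshold values and the importance model are illustrative assumptions.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    def drop_collinear(df, threshold=0.95):
        # drop one feature of every pair whose absolute correlation exceeds threshold
        corr = df.corr().abs()
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
        return df.drop(columns=to_drop)

    def drop_low_importance(df, y, cumulative=0.99):
        # keep only the features needed to reach the given cumulative importance
        model = GradientBoostingClassifier().fit(df, y)
        order = np.argsort(model.feature_importances_)[::-1]
        kept, total = [], 0.0
        for idx in order:
            kept.append(df.columns[idx])
            total += model.feature_importances_[idx]
            if total >= cumulative:
                break
        return df[kept]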
The new database was created and the correlation matrix was calculated again with the
remaining features.
As can be seen from the figure above, with respect to the correlation matrix in Fig. 5.9,
the features generated by the FFT have been reduced from 34 to 5, namely a reduction of
about 84%.
From the DWT point of view, the features are reduced from 325 to 14, reaching a reduction
of 95.6%.
Z = \frac{X - u}{s}
Where Z represents the values of the features after normalization, X the non-
normalized input features, while u and s are respectively the mean and the standard
deviation of the distribution of a feature.
This standardization is carried out in Python through the class provided by Scikit-learn
called StandardScaler().
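A minimal usage sketch is given below; fitting the scaler on the training set and reusing the learned parameters on the test set is standard practice, and the feature matrices here are synthetic placeholders.

    # Minimal sketch: standardize the feature columns with scikit-learn.
    # X_train and X_test are synthetic placeholders for the feature matrices.
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X_train, X_test = rng.normal(size=(100, 14)), rng.normal(size=(40, 14))

    scaler = StandardScaler()
    X_train_std = scaler.fit_transform(X_train)   # learns the mean u and std s per feature
    X_test_std = scaler.transform(X_test)         # reuses the same u and s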
5.2 Classification
The first goal that we wanted to achieve for this project was to be able to discern
between a state of normal machine operation and one with some anomalies. There-
fore, we want to obtain a service capable of providing diagnostics.
We have chosen to tackle the problem by using machine learning algorithms for
solving multiclass classification.
The models in question are shown below.
• Gradient Boost: ensemble classifier which combines 100 decision trees that
are used sequentially, so that each classifier of the chain is trained on the
residual errors of the previous model;
• K Nearest Neighbors: simple algorithm that stores all available cases and
classifies new cases based on a similarity measure (e.g. distance functions).
A case is classified by a majority vote of its neighbors, with the case being
assigned to the class most common among its K nearest neighbors measured
by a distance function. After several attempts, five neighbors are chosen, since
they give the best result;
• Naive Bayes: models that assign class labels to problem instances, represented as
vectors of feature values, where the class labels are drawn from some finite set. They
assume that the value of a particular feature is independent of the value of any other
feature, given the class variable;
• Neural Network: a feed-forward neural network is used, with N input neurons
corresponding to the N features of the chosen DataFrame, two hidden layers of 16 neurons
each, and an output layer with as many neurons as there are classes in the dataset used.
A Dropout layer is inserted between the layers, and a callback function that stops
training early and saves the best model is added, both to reduce overfitting. A minimal
sketch of this network is given right after this list.
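The sketch below builds the network described in the last item with Keras; the dropout rate, the optimizer and the early-stopping patience are illustrative assumptions, since they are not reported above.

    # Minimal sketch of the described feed-forward network in Keras.
    # Dropout rate, optimizer and patience are illustrative assumptions.
    from tensorflow import keras
    from tensorflow.keras import layers

    def build_classifier(n_features, n_classes):
        model = keras.Sequential([
            keras.Input(shape=(n_features,)),
            layers.Dense(16, activation="relu"),
            layers.Dropout(0.2),                       # assumed dropout rate
            layers.Dense(16, activation="relu"),
            layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    callbacks = [
        keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
        keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
    ]
    # model.fit(X_train_std, y_train, validation_split=0.2, epochs=200, callbacks=callbacks)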
To analyze the scenario, a dataset was therefore created containing two different
classes, one of correct functioning and one of anomalous functioning. 2200 input
signals were collected for each class.
The following metrics are used to evaluate the models:

• Precision: is the ability of a classifier not to label as positive an instance that is
actually negative. For each class it is defined as the ratio of true positives (TP) to
the sum of true positives and false positives (FP);

Precision = \frac{TP}{TP + FP}
• Recall: is the ability of a classifier to find all positive instances. For each class
it is defined as the ratio of true positives to the sum of true positives and false
negatives (FN);
Recall = \frac{TP}{TP + FN}
• F1-Score: is the weighted harmonic mean of precision and recall, such that the best
score is 1.0 and the worst is 0.0;

F1\text{-}Score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}
• Accuracy: is the fraction of predictions the model gets right, defined as the ratio
between the correctly classified instances and the total number of instances;

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
The overall precision and recall values are obtained by calculating the individual
values for each class and averaging them. This procedure is possible because the
classes of the dataset are balanced.
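The sketch below shows how these per-class metrics and their averages can be obtained with scikit-learn for one of the classifiers; the data are synthetic placeholders standing in for the extracted feature DataFrame.

    # Minimal sketch: train one classifier and report the averaged metrics.
    # The data are synthetic placeholders standing in for the feature DataFrame.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4400, 14))            # e.g. DWT features after selection
    y = np.repeat([0, 1], 2200)                # two balanced classes, 2200 signals each

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))   # precision, recall, F1, accuracy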
The results are listed in the following tables:
As can be deduced from the obtained data, the results are very positive for all the
models used: in fact, all performance figures exceed 90% in both cases.
On closer inspection, however, the performance in the case of features extracted from the
DWT increases by about 1% compared to those extracted from the FFT, at the expense of an
increase in the time required to perform the training, ranging from 20% in the case of
Gradient Boost up to a significant 97% in the case of K Nearest Neighbors, with an
average increase over all algorithms of 50%.
In the figures below the data just described are graphically represented in order
to facilitate their interpretation.
[Figure: Precision and F1-score of the Binary Classifier, Gradient Boost, K Nearest Neighbors, Naive Bayes and Neural Network models trained on FFT features.]
[Figure: Precision and F1-score of the same models trained on DWT features.]
Figure 5.17. Comparison training time between FFT and DWT Classifier.
This result meets expectations: the better quality of the features generated with the DWT
allows all the algorithms to increase their overall performance, but the big difference
in the number of features makes training slower in the second case, exceeding 3 s for two
of the algorithms (Fig. 5.17).
In order to increase training speed while trying to maintain the highest possible
performance, the behavior of the algorithms was evaluated after the previously generated
features were subjected to a further preprocessing step, namely the Feature Selection
procedure discussed in Section 5.1.4, which, given the input features, returns a
restricted subset without losing information useful to the algorithms.
Analyzing these new results, several aspects emerge. First of all, the highest precision
is still obtained by the algorithms fed with the features extracted from the DWT compared
to those fed with the FFT.
The average loss of performance compared to the previous DataFrames without Feature
Selection is of the order of 2.8% in the case of the FFT, reaching its maximum in the
Naive Bayes algorithm with a precision loss of 5%, while in the case of the DWT there is
an average loss of 1.9%, with a maximum of 3% for Naive Bayes.
Regarding the training time, there is an average decrease of 35% in the case of the FFT
with Feature Selection compared to the case without it, since the features decrease from
the original 34 to 4, while in the case of the DWT with Feature Selection the average
time decreases by 62% compared to the first case, with a reduction of 311 features. This
means that the execution times of the algorithms with features generated via FFT and DWT,
both processed via Feature Selection, are closer to each other than in the previous case.
The only algorithm that behaves differently is the Neural Network: in both cases the
smaller number of features means that the number of epochs needed to reach an adequate
level of accuracy increases, consequently increasing the training time.
[Figure: Precision and F1-score of the classifiers trained on FFT features after Feature Selection.]
[Figure: Precision and F1-score of the classifiers trained on DWT features after Feature Selection.]
Figure 5.20. Comparison training time between FFT and DWT Classifier after
Feature Selection.
[Figure: Zoom on the training times of K Nearest Neighbors and Naive Bayes after Feature Selection.]
5.2.1 Observations
Thanks to the tests described in this section, it is possible to demonstrate that the
machine learning models are able to distinguish different operating states of the
machine through the values collected by the chosen sensor.
In particular, it has been shown that the features extracted from the coefficients
calculated with the DWT, followed by a selection of the features themselves, generate an
excellent DataFrame for classifying the operating states of a rotating machine, with
greater precision than features calculated through the FFT while maintaining comparable
training times, thus making DWT plus Feature Selection an excellent choice for
classification problems.
• Elliptic Envelope: a method that tries to figure out the key parameters of the general
distribution of our data, by assuming that the entire dataset is an expression of an
underlying multivariate Gaussian distribution.
These are all unsupervised learning algorithms; this means that the model is left to work
on its own to discover information.
Using good-state signals for training, the model aims to identify the signals collected
in the failure state as novelties.
To verify the ability to distinguish fault situations with anomaly detection, the same 4
groups of features generated during classification are used. However, the models have
been trained using only 50% of the feature vectors corresponding to a good operating
state.
For the test phase, the remaining 50% of the good-condition data and the data collected
in fault situations were used, in order to verify both that the models are capable of
recognizing the anomalies and that they are able to recognize the normal operation of
the machine.
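A minimal sketch of this semi-supervised setup is shown below: an Isolation Forest is fitted only on healthy feature vectors and then queried on a mixed test set. The synthetic data and the default hyperparameters are illustrative assumptions.

    # Minimal sketch: novelty detection trained on healthy-state features only.
    # The data are synthetic placeholders; hyperparameters are illustrative.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    healthy = rng.normal(0.0, 1.0, size=(2200, 14))     # feature vectors, good state
    faulty = rng.normal(3.0, 1.5, size=(2200, 14))      # feature vectors, fault state

    train = healthy[:1100]                               # 50% of healthy data for training
    test = np.vstack([healthy[1100:], faulty])           # remaining healthy + all faulty

    model = IsolationForest(random_state=0).fit(train)
    pred = model.predict(test)                           # +1 = inlier (normal), -1 = anomaly
    print("detected anomalies:", int((pred == -1).sum()), "of", len(test))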
The analysis of the results shows that the detection of the anomalous state is possible
using both DataFrames.
The best result is achieved with the Isolation Forest algorithm fed with the features
extracted from the DWT, with an increase in training time of 20% with respect to the FFT
model.
[Figure: Precision and F1-score of the anomaly detection models trained on FFT features.]
[Figure: Precision and F1-score of the anomaly detection models trained on DWT features.]
Figure 5.24. Comparison training time between FFT and DWT Outlier Detection.
Figure 5.25. Zoom in Isolation Forest, One Class SVM and Local Outlier Factor.
In general, what is noted is that the performance has decreased compared to the
use of the classification methods described in the previous section.
The other models, such as the One-Class SVM and the Local Outlier Factor, perform better
using the features extracted from the FFT, while the Elliptic Envelope performs better
on the DWT model; however, covariance estimation does not perform well in
high-dimensional settings, and its excessive training time makes it unusable.
As done in the previous section, the above-mentioned algorithms are analyzed after
applying Feature Selection.
In this case, since it is not a supervised machine learning problem, it was not possible
to apply the method for the elimination of the zero-importance features discussed in
Section 5.1.4. Therefore we do not obtain a reduction in features as massive as in the
previous cases, but we still get good results.
[Figure: Precision and F1-score of the anomaly detection models trained on FFT features after Feature Selection.]
[Figure: Precision and F1-score of the anomaly detection models trained on DWT features after Feature Selection.]
Figure 5.28. Comparison training time between FFT and DWT Outlier Detection
after Feature Selection.
Figure 5.29. Zoom in Isolation Forest, One Class SVM and Local Outlier Factor.
Contrary to what happened for Classification, Feature Selection here decreases the FFT
features by 58%, compared to 88% in the previous case, while the DWT features are reduced
by 27%, compared to 96% in the previous case.
5.3.1 Observations
The tests carried out regarding anomaly detection have shown that it is possible to
recognize with good precision when the machine enters a generic anomalous state, using in
the training phase only examples collected during normal operation.
The choice, in particular, is limited to two different models that can be used according
to the specific use case: if more performance is desired at the expense of a slower
training time, the Isolation Forest algorithm fed with features generated by the DWT
without Feature Selection can be used; if, instead, the aim is to reduce the number of
features to be stored and make training leaner and faster at the expense of lower
performance, the Local Outlier Factor fed with features generated by the FFT and
subsequently selected can be used.
The disadvantage of this approach, however, is to lose the distinction and clas-
sification between the different failures.
A further property deriving from this approach is that of being able to identify
when a certain input does not belong to any of the known classes.
This is achievable by checking the distance values returned by the models. If the
signal to be classified is far from all models, then it is most likely an unknown class.
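One possible way to implement this check is sketched below: each known class gets its own novelty model, and an input whose score is negative for every model is flagged as an unknown class. The per-class Isolation Forests and the thresholding rule are assumptions used for illustration, not the exact mechanism adopted in the thesis.

    # Minimal sketch: flag an input as "unknown class" when every per-class
    # novelty model considers it an outlier. Models and thresholds are assumptions.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    class_data = {                                        # synthetic per-class training sets
        "healthy": rng.normal(0.0, 1.0, size=(500, 14)),
        "fault_A": rng.normal(4.0, 1.0, size=(500, 14)),
    }
    models = {name: IsolationForest(random_state=0).fit(X) for name, X in class_data.items()}

    def classify_or_unknown(sample):
        scores = {name: m.decision_function(np.asarray(sample).reshape(1, -1))[0]
                  for name, m in models.items()}
        best_class, best_score = max(scores.items(), key=lambda kv: kv[1])
        # decision_function < 0 means the sample looks anomalous to that model
        return best_class if best_score >= 0 else "unknown class"

    print(classify_or_unknown(rng.normal(10.0, 1.0, size=14)))   # likely "unknown class"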
Chapter 6
Future Developments
The final objective of the reported research is to use the models for prognostic
applications. To this end, a required future development is to proceed with the
validation of the models on (different) operating machines, using a specific industrial
sensor, in order to obtain raw data of higher quality than that used for the tests. A
first test application is foreseen on machinery used in waste transformation, thanks to
the collaborative project BioEnPro4TO, co-funded by Regione Piemonte. To this end,
sensors with proper IP protection and, in some cases, an embedded wireless communication
module, have been shortlisted. Moreover, from the observations produced in the previous
chapter, it is clear that in the case of Classification the machine learning models are
able to distinguish between the different operating states of the machine with remarkable
results once properly labeled data are available, but this potential cannot be exploited
when dealing with machinery already in production that does not have adequate historical
data. Furthermore, the classification is limited to known faults; this means that if a
new fault occurs, it will be classified into one of the known ones even though the input
pattern differs from each of them.
Regarding Anomaly Detection, the developed models have shown good precision in
recognizing when the machine enters a generic anomalous state, using for the training
phase only data regarding the normal operating state. The disadvantage is
losing the ability to distinguish between different failures.
In order to maintain the good accuracy of the classification models without losing
the advantages introduced by the anomaly detection, a solution for CBM problems
is proposed that combines the two types of models.
• The system is assumed to start being used without a previously collected and labeled
dataset;
• The machine's normal operating state is learned by collecting data from the sensors
and using it to train an anomaly detection model implemented by a Binary Classifier;
• At this point you can already go into the testing phase, using the anomaly
detection model to detect any deviations from the normal case, which will then
be reported as a generic anomaly;
• When an anomaly is found, a system user can perform labeling on the indi-
vidual signals, classifying them with a specific class of fault. In this case, a
new Binary Classifier is created, trained only on the new set of labeled data.
The procedure can be repeated several times for each new fault encountered
over time;
• In parallel to what has been described so far, from the moment in which two or more
classes are known, an Isolation Forest is also used, which must be re-adjusted each time
the user classifies new inputs. This classifier is used, when the other models indicate
that the input does not belong to a new class, to distinguish the specific type of
anomaly. A simplified sketch of this incremental workflow is given after this list.
6.1.1 Observations
The proposed solution allows anomalies occurring on the machine to be detected and
classified.
It eliminates the need to have a dataset already available at the start: the system can
begin by performing only anomaly detection and introduce the classification as faults
actually occur on the machine, using them to distinguish the same faults the following
times.
The main limitation encountered is the time required to train the Isolation Forest
classes in advanced test phases, in which the DataFrames are large in terms of classes
and inputs. In fact, the classifier must be re-trained on the whole dataset every time it
is enlarged or modified. For this reason, it makes sense to use a DataFrame generated
from the DWT with feature selection: in this way it is possible to keep the amount of
stored data low and the training speed of the algorithm good.
It is therefore assumed that, in real scenarios, this is an operation to be carried out
only periodically, in the background or at times when the machines are not in production,
limiting the system to using the anomaly detection models as classifiers in the meantime.
Chapter 7
Conclusions
The thesis project carried out aimed at the design and construction of a system capable
of enabling and supporting condition-based and predictive maintenance activities in an
Industry 4.0 context.
The developed system has proven to be able to classify and detect anomalies,
thus enabling diagnostic functions.
After testing different combinations for extracting and selecting the features, the best
raw-data preprocessing method was assessed according to the machine learning model to be
used, in particular in the cases of Classification and Anomaly Detection.
All the data used comes from an ad hoc physical model and not from a simulation: the
sensor positioned on the machine collected physical quantities during its operation, and
the machine was actually brought into failure states.
The sensor chosen is a low-cost and easily available device, demonstrating that it is
possible to obtain positive results even without sophisticated instrumentation.
The classifiers have indeed shown excellent accuracy in the distinction between
states, but they need to train on a complete dataset of all the faults to be identified.
The parallel use of anomaly detection models allows overcoming this limitation, as
they are able to detect when a new input does not fall into any of the known fault
classes.
The proposed solution is based precisely on this concept of progressive learning,
which starts from the recognition of a generic anomaly by knowing only the state
of normal operation and which allows maintenance workers to increase the set of
known faults as they occur on the machines.
Furthermore, a possible future development has been proposed which integrates a
combination of Classification and Anomaly Detection algorithms, so as to maintain the
good accuracy of the Classification models without losing the advantages introduced by
Anomaly Detection, such as not needing all the data from the beginning for training, but
being able to add it later. This allows predictive maintenance to be applied even to
machinery that does not have a substantial dataset, thus extending the fields of
application.
Bibliography