2023 15th IEEE International Conference on Industry Applications Thu2Track A.

Data-Driven Methodology for Predictive Maintenance of Commercial Vehicle Turbochargers

Tiago Nascimento de Freitas, Engineering and Innovation Management, Federal University of ABC, Santo André, Brazil
Ricardo Gaspar, Engineering and Innovation Management, Federal University of ABC, Santo André, Brazil
Romulo Gonçalves Lins, Engineering and Innovation Management, Federal University of ABC, Santo André, Brazil
Elmer John Hartman Junior, Trucks Development, Scania Latin America Ltda., São Bernardo do Campo, Brazil
2023 15th IEEE International Conference on Industry Applications (INDUSCON) | 979-8-3503-1418-2/23/$31.00 ©2023 IEEE | DOI: 10.1109/INDUSCON58041.2023.10374966

Abstract: Predictive maintenance plays an important role in ensuring the profitability of companies that rely on commercial vehicles as a core durable asset in the transportation industry. However, developing accurate and reliable predictive models for such equipment requires specific data analysis methodologies. This work proposes a data-driven methodology for predictive maintenance of commercial vehicles, comprising three main steps adapted from generic data mining methods: Problem Understanding, Data Understanding, and Data Preparation. A case study using operational data and the maintenance history of turbochargers demonstrates the feasibility of the proposed methodology, which is based on collecting and analyzing real industry data to examine the relevant variables in detail and to prepare the dataset for the construction of a predictive model. In this way, it is possible to identify trends and patterns that allow possible vehicle failures to be predicted before they occur, increasing uptime and reducing maintenance costs.

Keywords: predictive maintenance, turbocharger, data mining, machine learning

I. INTRODUCTION

The transport market is currently going through a major period of change, with the emergence of new technologies such as advances in vehicle connectivity through telematics, which allow real-time data collection and fleet visibility. New business models, such as shared mobility solutions and on-demand services to meet individual travel needs, and the search for sustainable transport solutions, such as electric vehicles, are shaping a disruptive scenario in which companies need to find innovative answers to consolidate or maintain their position in the market. The commercial vehicle is an integral part of the transport chain, where robustness, high reliability, and uptime are essential to ensure the profitability and competitiveness of companies, mainly because it operates in a market that depends on many external variables such as fuel prices, vehicle failures, and economic slowdowns. The need for innovation in volatile economic contexts requires quick answers and increasingly efficient transport solutions, in which new and more complex transport systems will develop [1].

Commercial vehicles have a large number and diversity of components and systems, which have been increasing with the development of new technologies, automation, and the use of intelligent systems, consequently increasing the probability of failures. Periods of vehicle unavailability therefore affect the productive capacity of companies, increasing operating costs and, consequently, interfering with product quality and customer service.

Automotive manufacturers keep a record of the historical maintenance data and operational data for each individual truck. This database can be used to obtain statistical data and information structure to determine the relationships within the data, identify patterns, and obtain prognostic methods that estimate the degradation level of a component and its remaining useful life. A real decision support system should gather various types of information, such as field instrumentation data, historical monitored variables, and preventive and corrective maintenance reports.

The motivation for this research arose from the need to optimize vehicle repair time through a fault prognosis system employing artificial intelligence techniques. However, the scope of this work does not encompass the development of predictive models or the implementation of a fault diagnosis system. Rather, the focus lies on the fundamental challenge of data preparation. Since the development of machine learning models requires uniform, reliable, and structured datasets, this study aims to tackle the complexities posed by the automotive industry's extensive and heterogeneous data storage practices. By proposing a methodology for data preparation in this specific context, this research seeks to lay the groundwork for the successful application of artificial intelligence techniques in the future development of predictive models and fault diagnosis systems.

Numerous factors affect the collection, storage, availability, and values of the data. In particular, there is a wide range of vehicle configurations that vary according to their application, such as sugarcane, haulage, or mining trucks, and their technical specifications, including engine type and power and the quantity and types of sensors, among others. Additionally, vehicle generations can differ significantly, with parts, sensors, and software of different versions installed depending on the year of manufacture. Operational and regional factors, such as transported load, fuel quality, documentation of maintenance services, type and condition of roads, weather conditions, driver skill, and time of use, are also directly related to the values and quality of the collected data. To enable the use of this data in predictive models, the proposed methodology suggests a systematic process for data standardization. This process begins with problem understanding, followed by data understanding, and finally, data preparation.


Authorized licensed use limited to: RPTU Kaiserslautern-Landau. Downloaded on February 08,2025 at 18:36:27 UTC from IEEE Xplore. Restrictions apply.
The paper is structured as follows: Section II presents the main related works and summarizes relevant aspects of the theoretical background of predictive maintenance and data. Section III presents the proposed methodology. The results and analysis are presented in Section IV. Section V presents future work and conclusions.

A. Ethical Aspects

This research uses real data from the automotive industry, respecting throughout its development the company's privacy and data protection policy and the Brazilian General Data Protection Law (Law No. 13,709 of August 2018), which regulates the processing of personal data. Specific details, such as variable numbers, exact numbers of components containing failures, and customer data, will be omitted.

II. BACKGROUND

A. Related Works

Predictive maintenance of commercial vehicles can be considered a new maintenance technique, dating from the beginning of the past decade. The development of machine learning models involves steps such as data selection, data processing, model selection, model training, and validation. In recent years, research has been carried out on each of these steps to enable the identification of failures in heavy vehicles using data-driven models.

State-of-the-art methods in predictive maintenance can be classified as on-board, which have unrestricted and real-time access to the data, and off-board, which have greater computational resources and access to historical data. Three approaches can be used to predict the need for vehicle maintenance: Remaining Useful Life Prediction, Deviation Detection, and Supervised Classification, all of which can use historical and real-time data as input. The research presented in [1] introduces generic machine learning methods that use on-board and off-board data for predictive maintenance of commercial vehicle components, applied to air compressors on trucks. The methods mainly used Random Forest as a supervised classification algorithm, in conjunction with feature selection techniques: Wrapper Feature Selection, which searches over subsets of attributes that can enhance classification accuracy by detecting possible interactions between their variables, and filter-based methods, which seek the most important parameters by analyzing them individually. To deal with unbalanced classes, the authors used the Synthetic Minority Oversampling Technique (SMOTE), demonstrating the viability and use of large databases for predictive maintenance.

A systematic literature review was conducted by [2], who identified 36 key articles on machine learning methods applied to predictive maintenance. Their research indicates that some models, such as Random Forest, followed by models based on Artificial Neural Networks (ANNs), Recurrent Neural Networks (RNNs), and Deep Learning, are more commonly used in predictive maintenance and applied to various equipment such as motors, turbos, batteries, and sensors, among others. Since 2013, there has been an increasing trend in research related to fault diagnosis and prognosis. Furthermore, it was observed that with the advancement of Industry 4.0, predictive maintenance and the use of artificial intelligence to deal with maintenance events become increasingly viable.

A methodology that uses data mining methods to identify the health status and remaining useful life of the Air Processing System (APS), which manages and regulates the flow of compressed air used in the brakes of trucks and buses, in order to optimize vehicle maintenance scheduling is presented in [3]. Specific APS data was used to identify whether the component was faulty or not, with the aim of optimizing flexible maintenance planning, using machine learning to identify the most relevant variables for APS failure prediction [3]. The expertise of maintenance and system specialists from the commercial vehicle industry was utilized for this purpose. The Random Forest model was successful, demonstrating the feasibility of data-driven models for predictive maintenance that combine operational vehicle data with expert knowledge.

When real-world application data is used to build modeling datasets, the problem of class imbalance often arises. Methods that seek to solve or reduce its influence on results by altering the dataset are extensively studied, particularly in the context of predictive maintenance, where real-world data is frequently used. In [4], a methodology is introduced that generates failure data with actual value for prognostics in conditions where limited failure data is available. Unlike [1], which used the SMOTE technique, this methodology proposes the use of a Conditional Generative Adversarial Network (CGAN) integrated with existing failure data, noise, and auxiliary information related to failure modes to generate new failure data. The methodology proved effective in reducing errors and uncertainties associated with limited real-world failure data, improving the predictions produced on the given APS truck dataset, using Random Forest and Gradient Boosting Machine as classification-based prognostic models.

In [5], an extensive investigation was conducted on the effects of using Generative Adversarial Networks (GAN) and their variations, such as Vanilla Generative Adversarial Networks (VGAN) and Wasserstein Generative Adversarial Networks (WGAN), to generate minority-class samples as similar as possible to the original data. A comparison was also made between GAN and traditional oversampling methods such as SMOTE and random oversampling. The main contribution was a new concept that combines the two oversampling techniques, called SMOTE-VGAN. All techniques were tested using two classification-based machine learning models, Random Forest and Stochastic Gradient Descent, applied to two different datasets for predictive maintenance of commercial vehicle components. GAN proved to be better than traditional techniques in learning distributions and contributed to machine learning, with WGAN showing the best performance overall. The new SMOTE-VGAN technique demonstrated a considerable improvement in learning the distribution and can achieve better results, but exploring new combinations is suggested for a better evaluation.

To investigate the temporal relationship between data samples and describe the dynamic behavior of commercial vehicle air compressors, the research in [6] introduced the use of RNNs, specifically LSTM networks, with the aim of detecting air compressor failures within a ninety-day period. The study also compared the efficiency of LSTM with the Random Forest model. The LSTM network demonstrated slightly inferior performance in terms of
prediction accuracy, but model stability was identified as a very important point, strongly correlated with the real maintenance management of vehicles: a model that is consistent in its predictions over time is essential in decision-making for maintenance actions and optimization of vehicle availability.

B. Predictive Maintenance

Vehicle maintenance strategies can be planned using different approaches, of which corrective, preventive, and predictive maintenance are the most commonly used [7].

Corrective maintenance occurs after a failure that renders the vehicle inoperable, usually resulting in downtime. In many cases, corrective maintenance is caused by a lack of strategy and often occurs in testing workshops and dealerships for more complex and uncommon types of failures, where a well-defined diagnosis for problem-solving does not yet exist. This policy is inefficient and results in high maintenance costs and vehicle downtime. Fig. 1 illustrates five trucks operating under a corrective maintenance program that does not require data collection or knowledge about their remaining useful life.

Fig. 1 Corrective Maintenance Program

Another, more cautious approach is preventive maintenance, a common practice in the automotive industry, where actions are planned for the revision and/or replacement of vehicle components. These maintenance activities are usually planned considering driving time, accumulated kilometers, the driver's driving mode, fuel consumption, and type and weight of cargo, among others [7]. Fig. 2 illustrates five trucks operating under preventive maintenance. In this policy, some vehicles will be repaired on time while others will fail before the planned review time.

Fig. 2 Preventive Maintenance Program

Finally, predictive maintenance employs methods for identifying, diagnosing, and prognosing failures, using monitoring and predictive methods of machine conditions to estimate when a failure will occur. This policy adapts the maintenance period and indicates which components should be repaired as needed [7]. Fig. 3 illustrates five trucks operating under predictive maintenance, where the maintenance action occurs just before the vehicle presents failures. This program requires much more dedication and care from the maintainer.

Fig. 3 Predictive Maintenance Program

Predictive maintenance uses tools to determine when maintenance actions will be needed, based on continuous monitoring of the integrity of a machine or process, allowing maintenance to be performed only when necessary [2]. Early fault detection is possible through predictive tools based on historical data, such as machine learning techniques, integrity factors such as visual inspection and wear, statistical inference methods, and engineering approaches. Predictive maintenance is thus a prognostic approach in which the current state of integrity and historical data are used to schedule a maintenance event.

C. Data

The commercial vehicle manufacturer that provides the data has a central storage repository containing data from various sources, so that a single repository within the company receives large amounts of data and makes them available for exploration and decision-making.

1) On-board data: Several vehicle systems and subsystems are monitored by sensors that provide signals to the Electronic Control Units (ECUs), where the signals are processed and converted into physical quantities for identification and monitoring; mathematical calculations may also be performed if needed. The ECUs communicate through a Controller Area Network (CAN), whose data is used for vehicle control and for determining the vehicle's operating condition. The vehicles have a communication module that receives this data in real time and transmits it via the GSM network to the central data storage, and whenever a vehicle visits a workshop for any activity, the data is downloaded and saved in the system. This is a crucial part that makes it possible to use the data for the development of various systems with numerous applications.

2) Off-board data: Most major automotive companies have a vast amount of data from various sources, collected over years of operations. The data consists of project designs, technical reports, maintenance history, failure history, and vehicle operation statistics, among others. These data are managed by various software tools and stored in a central repository. This is an important source of information for developing the predictive model.

III. METHODOLOGY

Fig. 4 shows the workflow for the systematic application of machine learning techniques, which consists of problem definition, data collection, and data preparation. These steps ensure a structured and efficient approach for the subsequent
development of predictive models to identify the need for turbocharger maintenance.

Fig. 4 Workflow for Machine Learning Application

A. Data Description

The data used in this work was collected and stored by a commercial vehicle manufacturer from trucks operating in different markets for a limited period. Vehicles operating in the European and Latin American markets will be selected for data extraction.

The turbochargers of the vehicles do not have sensors that directly monitor their condition and operation; therefore, other combined data sources will be used to verify whether there are patterns relating the input data to turbocharger integrity. To compose the dataset used in this work, three types of data will be explored: technical specification data (TSD), operational data (OD), and maintenance data (MD), detailed as follows:

• TSD describes the configuration of the truck when it is produced, using codes that determine which parts are equipped in the vehicle, such as a code for the engine, a code for the gearbox, and a code for the type of cabin, among others. Missing data occurs in this dataset due to the modular nature of the data: different engines may have different sensors, and different engine generations may have sensors with different precisions. To handle the problem of modularity and missing data, a data imputation technique will be used.

• OD is stored throughout the vehicle's lifespan and is automatically transmitted via the GSM network or manually extracted when vehicles visit a workshop. It is mainly composed of aggregated usage statistics and variables that can be scalars or vectors. Examples include exhaust gas temperature, percentage of time a component was subjected to a pressure range, distance traveled, and fuel consumption, among others.

• MD is collected when vehicles visit dealership workshops. It contains information on which types of services were performed, which parts needed repairs, which parts were replaced, and costs.

B. Problem Understanding

For the structuring and composition of the dataset and the definition of the strategy for failure prediction, interviews must be conducted with experts to identify the main variables related to the failure modes of turbochargers.

1) Data strategy and categorization: There are two main approaches to deal with this problem: the Sliding-Box approach, which formulates the problem as a binary classification task [6], [8], and the Remaining Useful Life (RUL) approach, which formulates the problem as a regression task [9], [10], [11].

This work uses the Sliding-Box binary classification approach. The model is designed to predict whether a component will fail within a certain period, based on the data collected up to a certain point [8]. The Sliding-Box model has several advantages, such as simplicity, no issues with censored data, direct relevance to the actual vehicle application, and, most importantly, a direct decision on whether the component will fail or not. By allowing the model to decide whether to bring the vehicle in for maintenance, it facilitates the task of workshop personnel and fleet operators in managing vehicle maintenance [6], [8]. Fig. 5 illustrates the Sliding-Box approach, where the classifier is designed to predict whether the component will fail within a certain time window ∆t, considering all the data up to a certain point in time t. Each horizontal line represents the timeline of a vehicle, and the events can be readings of operational data, fault codes, and so on.

Fig. 5 Sliding Box approach [8]

2) Data categorization: In the methodology, samples of faulty trucks, belonging to the positive class, were obtained by selecting the operational data (OD) readings within a time interval ∆t, comprising the records collected in the 90 days prior to the repair or replacement of the turbocharger. All other records of these trucks were considered as belonging to the negative class. For the trucks considered healthy, which did not undergo repair or replacement of the turbocharger, all available records were selected and also considered as belonging to the negative class. Fig. 6 presents an illustration of the category extraction process.

Fig. 6 Strategy for category extraction

C. Data Understanding

For the data understanding step, a population of models and applications should first be defined to delimit the study scope. Only trucks equipped with Euro V and equivalent engines were selected, limited to the 2018 manufacturing year.
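The population criteria just stated can be expressed as a simple filter. The following is a minimal, dependency-free sketch and not the authors' implementation: the `Truck` fields, the emission-class codes, and the reading of "limited to the 2018 manufacturing year" as an upper bound are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical codes standing in for "Euro V and equivalent" engines.
EURO_V_EQUIVALENT = {"EURO_V", "EURO_V_EQUIV"}


@dataclass(frozen=True)
class Truck:
    vehicle_id: str
    emission_class: str       # hypothetical field name
    manufacturing_year: int   # hypothetical field name


def select_population(trucks, emission_classes=EURO_V_EQUIVALENT, last_year=2018):
    """Keep trucks whose engine class is in scope and whose manufacturing
    year does not exceed `last_year` (assumed interpretation)."""
    return [
        t for t in trucks
        if t.emission_class in emission_classes
        and t.manufacturing_year <= last_year
    ]


# Toy fleet: B has an out-of-scope engine, D is too recent.
fleet = [
    Truck("A", "EURO_V", 2016),
    Truck("B", "EURO_VI", 2017),
    Truck("C", "EURO_V_EQUIV", 2018),
    Truck("D", "EURO_V", 2019),
]
selected = select_population(fleet)
print([t.vehicle_id for t in selected])
```

Keeping the engine classes and the year limit as parameters makes it easy to rerun the same selection for a different vehicle population.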

A further selection criterion was applied regarding the markets of interest, separating vehicles that circulate in Europe from those that circulate in South America. The European markets considered in the analysis were: England, France, Germany, Sweden, Spain, Denmark, Poland, Netherlands, Norway, and Italy. The South American markets considered were: Brazil, Argentina, Uruguay, Chile, Peru, and Colombia. This careful selection of data allows for a specific focus on regions of interest and the analysis of trucks with similar characteristics and contexts.

1) Data collection: To collect and process the raw data available in the central repository, scripts will be developed in the Structured Query Language (SQL), a language specifically designed for managing relational databases, using the DBeaver software. Data sampling methods will be applied to obtain representative subsets and so ensure accurate estimation of the performance of binary classification models. To ensure the appropriate quality and quantity of original readings, vehicles should be filtered based on the highest number of available readings. Events should then be classified as either a failed vehicle (1) or a non-failed vehicle (0), based on the occurrence of repairs or replacement of parts on dates and mileage corresponding to the readings. Subsequently, the data should be pivoted so that each parameter becomes a column in the data structure. This structure, composed of categorized data ordered over time, with all readings of the parameters of interest in the columns, should be prepared to identify degradation patterns in the turbocharger and predict future failures based on these patterns. This approach will allow for a detailed and efficient analysis of the data.

2) Exploratory analysis and visualization: In the data analysis process, the Python programming language will be used, together with essential libraries such as NumPy, Pandas, and Matplotlib, which provide functionalities for statistical analysis and visualization. One of the important steps will be to verify the quality of the available data. To do this, the amount of null data in the dataset will be analyzed using the Pandas library for data manipulation and analysis. This analysis will allow the identification and treatment of any missing values, ensuring the reliability of the results obtained. Another relevant aspect will be the analysis of data imbalance with respect to class. Using data visualization techniques, such as bar graphs generated with the Matplotlib library, possible imbalances in the distribution of classes in the dataset can be identified. Finally, the correlation between the feature variables in the dataset will be analyzed. High correlation can contribute to overfitting, where the model becomes too adjusted to the training data, resulting in poorer performance when applied to new data. This can occur because the model may become trapped in specific patterns present in the training data that do not apply to other data. Pre-processing techniques such as normalization and standardization place the features on a common scale but, being linear rescalings, leave their pairwise correlation unchanged, so strongly correlated features are identified in this analysis. The NumPy library, which provides functionalities for mathematical calculations and manipulation of arrays and matrices, will be used for this analysis.

D. Data Preparation

A large portion of operational data is collected when the truck visits a workshop and is transmitted via the GSM network. However, the data is not collected at fixed and regular intervals, which poses a major challenge for learning models.

To address this challenge, the data should be sorted and grouped by time and distance, categorized into 5000-kilometer ranges. For some vehicles, some ranges had no readings; in these cases the previous reading from the original data is carried forward to fill in the missing data for each truck, so that the data form a time series.

To deal with the large volume of null data, variables with more than 65% null values will be discarded. For the remaining variables, Mean Imputation (MI) will be used, which calculates the average of the values of each feature in the training dataset and uses it to replace the null data in the training and validation datasets.

To balance the classes, the SMOTE technique must be used, available in the imbalanced-learn library for Python. The technique helps to improve the model's performance in class imbalance problems, as it increases the amount of data in the minority class, making it more representative and, consequently, reducing classification bias towards the majority class.

To put the features present in the dataset on a common scale, favoring efficiency and accuracy, normalization must be adopted. Normalization will be performed using the MinMaxScaler algorithm, available in the scikit-learn library for Python, which transforms each feature to the range between 0 and 1. This ensures that all features have the same scale while preserving the shape of the original data, contributing to a more robust and reliable analysis. Note that this rescaling does not by itself remove the correlation between features.

IV. RESULTS

The results obtained are presented according to the workflow defined in the methodology, as illustrated in Fig. 4.

A. Problem Understanding

After interviews with experts, a range of factors relevant to the primary failure modes of the turbocharger were considered when selecting variables for the study. These factors encompass various operational parameters and conditions that impact the turbocharger's performance.

Due to the sensitivity of the information contained in the features, anonymization measures were applied, following ethical principles, as discussed in Section I-A. Therefore, the selected variables are presented in a generic manner, in the sequence shown in Table I, to enhance the study's comprehensibility.

TABLE I. DESCRIPTION OF VARIABLES
Variable No   Description
1             Engine Load Matrix Bias
2             Engine Load Matrix
3             Ambient Temperature
4             Atmospheric Pressure
5             Boost Pressure
6             Charge Air Temperature
7             Coolant Temperature
8             Lambda Value
9             Oil Pressure
10            Torque Intervention Status
11            Torque Limiting Mode
12            Engine Runtime at Air Filter Clogging
13            Particle Filter Ash Filling
14            Vehicle Distance at Air Filter Clogging
15            Engine Runtime
16            Fuel Total Amount
17            Number of Starts
18            Air Filter Max Flow Restriction
19            Air Intake Flow Restriction
20            Current Filter Clogging
21            Empty Filter Estimation
22            Engine Oil Consumed
23            Engine Oil / Fuel Ratio (ED95)
24            Engine Oil / Fuel Ratio (Diesel)
25            Accelerator Pedal Position
26            Engine Oil Refilled
27            Engine Runtime at Filter Renewal
28            Particle Filter Ash Filling
29            Vehicle Distance at Filter Renewal
30            Ambient Temperature (Min/Max)
31            Atmospheric Pressure (Min/Max)
32            Boost Pressure (Min/Max)
33            Catalyst Thermal Aging (Min/Max)
34            Charge Air Temperature (Min/Max)
35            Coolant Temperature (Min/Max)
36            Engine Speed (Min/Max)
37            Exhaust Temperature Upstream Catalyst (Min/Max)
38            Oil Pressure (Min/Max)

1) Prediction strategy and data categorization: In this work, the Sliding-Box classification model was used due to its simplicity, its easy interpretation compared to regression models, and its strong correlation with the current maintenance application and management of vehicles. This approach uses a binary prediction model that predicts whether the turbocharger will fail within a certain time interval ∆t, called the horizon of events. The labels for failure prediction are 0 and 1, where a label of 1 indicates that the turbocharger will fail, and 0 indicates that it will not fail, within the time interval ∆t. The horizon of events is an essential hyperparameter for this approach. Based on previous studies and the knowledge of experts who work with the development of turbochargers, a time interval of 90 days was determined. This interval allows vehicles to have failure data available in the operational data (OD) during the valid period, and allows sufficient time for any identifiable wear on the turbocharger to develop.

B. Data Understanding

The initial data set is composed of 38 variables, represented in different formats such as histograms, vectors, matrices, and scalars. For the development of the research, 95 vehicles that presented turbocharger failure and 100 vehicles that did not were selected, totaling 7496 rows and 1031 columns.

Fig. 7 shows a large amount of null data in the data set, which can impact the quality and reliability of the analyses. The vertical axis of the graph represents the percentage of missing data for each feature, while the horizontal axis lists the features, such as environmental conditions, mechanical parameters, and operational metrics. To safeguard sensitive information, not all variables on the horizontal axis are revealed; a selection of variables is displayed to aid the reader's comprehension.

Fig. 7 Percentage of null data

Another important aspect is the data imbalance, with only 16% of the data belonging to the positive class, indicating a clear asymmetry in the class distribution of the analyzed dataset, as can be observed in Fig. 8.

Fig. 8 Class Distribution

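The labelling rule and the two dataset checks described above (percentage of null values per feature, share of the positive class) can be illustrated with a minimal sketch. It assumes each readout is a record with a readout date and an optional failure date; the field names, sample values, and helper functions are hypothetical and are not taken from the study's code. Only the 90-day horizon comes from the text, and the sketch uses the Python standard library alone.

```python
from datetime import date, timedelta

HORIZON = timedelta(days=90)  # horizon of events (∆t) reported in the text


def sliding_box_label(readout_date, failure_date, horizon=HORIZON):
    """Return 1 if a failure falls within `horizon` after the readout, else 0."""
    if failure_date is None:  # vehicle with no recorded turbocharger failure
        return 0
    time_to_failure = failure_date - readout_date
    return int(timedelta(0) <= time_to_failure <= horizon)


def null_percentage(rows, features):
    """Percentage of missing (None) values for each feature."""
    return {
        name: 100.0 * sum(row[name] is None for row in rows) / len(rows)
        for name in features
    }


def positive_share(labels):
    """Percentage of samples that belong to the positive class (label 1)."""
    return 100.0 * sum(labels) / len(labels)


# Hypothetical readouts; field names and values are illustrative only.
readouts = [
    {"date": date(2022, 1, 10), "failure": date(2022, 3, 1),
     "boost_pressure": 2.1, "oil_pressure": None},
    {"date": date(2021, 10, 1), "failure": date(2022, 3, 1),
     "boost_pressure": 1.9, "oil_pressure": None},
    {"date": date(2022, 2, 1), "failure": None,
     "boost_pressure": None, "oil_pressure": 3.4},
    {"date": date(2022, 2, 15), "failure": None,
     "boost_pressure": 2.0, "oil_pressure": None},
]
features = ["boost_pressure", "oil_pressure"]

labels = [sliding_box_label(r["date"], r["failure"]) for r in readouts]
print(labels)
print(positive_share(labels))
print(null_percentage(readouts, features))
```

In this sketch a readout taken more than 90 days before the failure is labelled 0, so the same vehicle can contribute samples to both classes. That is one reason the positive class can end up much smaller than the share of failed vehicles would suggest.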
Finally, the correlation between the features was analyzed, as shown in Fig. 9. Both axes of the heatmap list the features being correlated. Light colors, with the upper limit in yellow, indicate high correlation between the features, while dark blue indicates low correlation. Due to the confidentiality of the information, not all variables are displayed on the x and y axes; a few are shown to aid the reader's understanding.

Fig. 9 Correlation Matrix

These results provide a solid foundation for the analyses carried out in this research, allowing for a precise and well-founded understanding and interpretation of the data, even with the anonymization measures applied to protect sensitive information.

C. Data Preparation
After data preparation, the final dataset consists of 7496 rows and 735 columns. The processed data went through a null value removal process, ensuring the quality of the data used in the analysis. In addition, the distribution of classes was balanced through the SMOTE technique, as illustrated in Fig. 10. This approach balanced the number of samples in each class, avoiding bias in the analysis.

Fig. 10 Class distribution after SMOTE.

These preparation steps have resulted in a dataset ready to be used by prediction algorithms for the classification of turbocharger failures, providing a solid foundation for the investigations and conclusions of this work.

D. Discussion
The methodology used demonstrates its benefit, as understanding the problem, understanding the data, and preparing the data are essential steps in many data mining projects in the engineering domain. Through the workflow, it is possible to generate a checklist that allows for better structure, communication, and documentation of these essential tasks for the development of prediction models. However, collecting and defining the data to compose the dataset used in this research was a major challenge, not only because the company's data collection was not originally carried out for predictive maintenance purposes, but also because the data was stored in an unstructured format. This lack of structure made it difficult to construct SQL queries to access and create the necessary tables for data extraction and understanding. It is important to note that all steps require close cooperation of personnel with expertise in engineering and information technology, such as engineers, mechanical technicians, architects, and data scientists.

V. CONCLUSIONS AND FUTURE WORK
Based on the objectives and results presented in this study, we conclude that the proposed methodology, which collects real industry data derived from workshop histories and operational records and uses data mining techniques for comprehensive variable analysis and dataset curation to develop predictive models, has yielded positive experimental outcomes. The proposed methodology has the potential to be extended to other databases and components of commercial vehicles, with significant implications for advancing more accurate and dependable predictive models.

Overall, this study highlights the importance of proper data preparation and selection, as well as the use of advanced data mining techniques, in developing accurate predictive maintenance models. It also underscores the need for collaboration between engineering and information technology specialists throughout the process. By improving predictive maintenance practices, this study has significant implications for reducing downtime and increasing vehicle efficiency and safety. Future research could explore additional data sources and incorporate more advanced modeling techniques to further improve the accuracy and reliability of predictive maintenance models. Additionally, a study focused on developing a methodology for data collection, storage, and structuring by industry, specifically for predictive maintenance, could improve the quality and diversity of data and thereby lead to better prediction outcomes for statistical and machine learning models.
