Data-Driven Methodology For Predictive Maintenance
Abstract— Predictive maintenance plays an important role in ensuring the profitability of companies that rely on commercial vehicles as a core durable asset in the transportation industry. However, developing accurate and reliable predictive models for such equipment requires specific data analysis methodologies. This work proposes a data-driven methodology for predictive maintenance of commercial vehicles, comprising three main steps adapted from generic data mining methods: Problem Understanding, Data Understanding, and Data Preparation. A case study using operational data and maintenance history of turbochargers demonstrates the feasibility of the proposed methodology, which is based on the collection and analysis of real industry data to examine the relevant variables in detail and prepare the dataset for the construction of a predictive model. In this way, it is possible to identify trends and patterns that allow for the prediction of possible failures in vehicles before they occur, increasing their uptime and reducing maintenance costs.

Keywords—predictive maintenance, turbocharger, data mining, machine learning

I. INTRODUCTION
The transport market is currently experiencing one of its main moments of change, with the emergence of new technologies, such as advances in vehicle connectivity through telematics, which allows for real-time data collection and fleet visibility. New business models, such as shared mobility solutions and on-demand services to meet individual travel needs, together with the search for sustainable transport solutions such as electric vehicles, are shaping a disruptive scenario in which companies need to find innovative answers to consolidate or maintain their position in the market. In the transport chain, the commercial vehicle is an integral part, where robustness, high reliability, and uptime are essential to ensure the profitability and competitiveness of companies, mainly because it is inserted in a market that depends on many external variables such as fuel prices, vehicle failures, and economic slowdowns. The need for innovation in volatile economic contexts requires the search for quick answers and increasingly efficient transport solutions, where new and more complex transport systems will develop [1].

Commercial vehicles have a large number and diversity of components and systems, which have been increasing with the development of new technologies, automation, and the use of intelligent systems, consequently increasing the probability of failures occurring. In this sense, it can be said that vehicle unavailability periods affect the productive capacity of companies, increasing operating costs and, consequently, interfering with product quality and customer service.

Automotive manufacturers have a record of the historical maintenance data and operational data for each individual truck. This database can be used to obtain statistical data and information structure to determine the relationship between the data, identify patterns, and obtain prognostic methods that estimate the degradation levels of a component and its remaining useful life. A real decision support system should gather various types of information, such as field instrumentation data, historical monitored variables, and preventive and corrective maintenance reports.

The motivation for this research arose from the need to optimize vehicle repair time through the use of a fault prognosis system employing artificial intelligence techniques. However, the scope of this work does not encompass the development of predictive models or the implementation of a fault diagnosis system. Rather, the focus lies on addressing the fundamental challenge of data preparation. Since the development of machine learning models requires the availability of uniform, reliable, and structured datasets, this study aims to tackle the complexities posed by the automotive industry's extensive and heterogeneous data storage practices. By proposing a methodology for data preparation in this specific context, this research seeks to lay the groundwork for the successful application of artificial intelligence techniques in the future development of predictive models and fault diagnosis systems.

Numerous factors affect the collection, storage, availability, and values of the data. In particular, there is a wide range of vehicle configurations that vary according to their application, such as sugarcane, haulage, or mining trucks, and their technical specifications, including engine type and power, and quantity and types of sensors, among others. Additionally, vehicle generations can present significant differences, with the installation of parts, sensors, and software from different versions depending on the year of manufacture. Operational and regional factors, such as transported load, fuel quality, documentation of maintenance services, type and condition of roads, weather conditions, driver skill, and time of use, are also directly related to the values and quality of the collected data. To enable the use of this data in predictive models, the proposed methodology suggests a systematic process for data standardization. This process begins with problem understanding, followed by data understanding, and finally, data preparation.
Authorized licensed use limited to: RPTU Kaiserslautern-Landau. Downloaded on February 08,2025 at 18:36:27 UTC from IEEE Xplore. Restrictions apply.
The paper is structured as follows: Section II presents the main related works and summarizes relevant aspects of the theoretical background of predictive maintenance and data. Section III presents the proposed methodology. The results and analysis are presented in Section IV. Section V presents future work and conclusions.

A. Ethical Aspects
This research uses real data from the automotive industry and, throughout its development, respects the industry's privacy and data protection policy and the Brazilian General Data Protection Law (Law No. 13,709, of August 2018), which regulates the processing of personal data. Specific details, such as variable numbers, exact numbers of components containing failures, and customer data, will be omitted.

II. BACKGROUND
A. Related Works
Predictive maintenance of commercial vehicles has been considered a new maintenance technique since the beginning of the past decade. The development of machine learning models involves several steps, such as data selection, data processing, model selection, model training, and validation. In recent years, research has been carried out on each of these steps to enable the identification of failures in heavy vehicles using data-driven models.

State-of-the-art methods in predictive maintenance can be classified as on-board, which have unrestricted and real-time access to the data, and off-board, which have greater computational resources and access to historical data. Three approaches can be used to predict the need for vehicle maintenance: Remaining Useful Life Prediction, Deviation Detection, and Supervised Classification, all of which can use historical and real-time data as input. The research presented in [1] introduces generic machine learning methods that use on-board and off-board data for predictive maintenance of commercial vehicle components, applied to air compressors on trucks. The AI methods mainly used Random Forest as a supervised classification algorithm, in conjunction with feature selection techniques: Wrapper Feature Selection, which searches subsets of attributes that can enhance the classification accuracy by detecting possible interactions between their variables, and filter-based methods, which seek the most important parameters by analyzing them individually. To deal with unbalanced classes, the authors used the Synthetic Minority Oversampling Technique (SMOTE), demonstrating the viability and use of large databases for predictive maintenance.

A systematic literature review was conducted by [2], who identified 36 key articles on machine learning methods applied to predictive maintenance. Their research indicates that some models, such as Random Forest, followed by models based on Artificial Neural Networks (ANNs), Recurrent Neural Networks (RNNs), and Deep Learning, are more commonly used in predictive maintenance and applied to various equipment such as motors, turbos, batteries, and sensors, among others. Since 2013, there has been an increasing trend in research related to fault diagnosis and prognosis. Furthermore, it was observed that with the advancement of Industry 4.0, predictive maintenance and the use of artificial intelligence to deal with maintenance events become increasingly viable.

A methodology that uses data mining methods to identify the health status and remaining useful life of the Air Processing System (APS), which is responsible for managing and regulating the flow of compressed air used in the brakes of trucks and buses, is presented in [3] with the goal of optimizing vehicle maintenance schedules. Specific APS data was used to identify whether the component was faulty or not, with the aim of optimizing flexible maintenance planning using machine learning to identify the most relevant variables for APS failure prediction [3]. The expertise of maintenance and system specialists from the commercial vehicle industry was utilized for this purpose. The Random Forest model was successful, demonstrating the feasibility of using data-driven models for predictive maintenance using operational vehicle data in combination with expert knowledge.

When real-world application data is used for modeling datasets, the problem of class imbalance often arises. Methods that seek to solve or reduce its influence on results by altering the dataset are extensively studied, particularly in the context of predictive maintenance, where real-world data is frequently used. In [4], a methodology is introduced that generates failure data with actual value for prognostics in conditions where limited failure data is available. Unlike [1], which used the SMOTE technique, this methodology proposes the use of a Conditional Generative Adversarial Network (CGAN) integrated with existing failure data, noise, and auxiliary information related to failure modes to generate new failure data. The methodology proved effective in reducing errors and uncertainties associated with limited real-world failure data, improving the predictions produced on the given APS truck dataset, using Random Forest and Gradient Boosting Machine as classification-based prognostic models.

In [5], an extensive investigation was conducted on the effects of using Generative Adversarial Networks (GAN) and their variations, such as Vanilla Generative Adversarial Networks (VGAN) and Wasserstein Generative Adversarial Networks (WGAN), to generate data and samples from minority classes as similar as possible to the original data. A comparison was also made between GAN and traditional oversampling methods such as SMOTE and random oversampling. The main contribution was a new concept, called SMOTE-VGAN, that combines the two oversampling techniques. All techniques were tested using two classification-based machine learning models, Random Forest and Stochastic Gradient Descent, applied to two different datasets for predictive maintenance of commercial vehicle components. GAN proved to be better than traditional techniques in learning distributions and contributed to machine learning, with WGAN showing the best performance overall. The new SMOTE-VGAN technique demonstrated a considerable improvement in learning the distribution and can achieve better results, but exploring new combinations is suggested for a better evaluation.

To investigate the temporal relationship between data samples and describe the dynamic behavior of commercial vehicle air compressors, the work in [6] introduced the use of RNNs, specifically LSTM networks, with the aim of detecting air compressor failures within a ninety-day period. The study also compared the efficiency of LSTM with the Random Forest model. The LSTM network demonstrated slightly inferior performance in terms of
prediction accuracy, but it was identified that model stability is a very important point and strongly correlated with real maintenance management of vehicles, and a model that is consistent in its predictions over time is essential in decision-making for maintenance actions and optimization of vehicle availability.

B. Predictive Maintenance
Vehicle maintenance strategies can be planned using different approaches, of which corrective, preventive, and predictive maintenance are the most commonly used [7].

Corrective maintenance occurs after a failure that renders the vehicle inoperable, usually resulting in downtime. In many cases, corrective maintenance is caused by a lack of strategy and often occurs in testing workshops and dealerships for more complex and uncommon types of failures, where a well-defined diagnosis for problem-solving does not yet exist. This policy is inefficient and results in high maintenance costs and vehicle downtime. Fig. 1 illustrates five trucks operating under a corrective maintenance program that does not require data collection or knowledge about their remaining useful life.

Fig. 1 Corrective Maintenance Program

Another, more cautious approach is preventive maintenance, a common practice in the automotive industry, where actions are planned for the revision and/or replacement of vehicle components. These maintenance activities are usually planned considering driving time, accumulated kilometers, driver's driving mode, fuel consumption, and type and weight of cargo, among others [7]. Fig. 2 illustrates five trucks operating under preventive maintenance. In this policy, some vehicles will be repaired on time while others will fail before the planned review time.

Fig. 2 Preventive Maintenance Program

Finally, predictive maintenance employs methods for identifying, diagnosing, and prognosing failures, using monitoring and predictive methods of machine conditions to estimate when a failure will occur. This policy adapts the maintenance period and indicates which components should be repaired as needed [7]. Fig. 3 illustrates five trucks operating under predictive maintenance, where the maintenance action occurs just before the vehicle presents failures. This program requires much more dedication and care from the maintainer.

Fig. 3 Predictive Maintenance Program

Predictive maintenance uses tools to determine when maintenance actions will be needed, based on continuous monitoring of the integrity of a machine or process, allowing maintenance to be performed only when necessary [2]. Early fault detection is possible through the use of predictive tools based on historical data, such as machine learning techniques, integrity factors such as visual inspection and wear, statistical inference methods, and engineering approaches. In this sense, predictive maintenance is a prognostic approach in which the current state of integrity and historical data are used to schedule a maintenance event.

C. Data
The commercial vehicle manufacturer that provided the data has a central storage repository that contains data from various sources, so that a single repository within the company receives large amounts of data and makes them available for exploration and decision-making.

1) On-board data: Several vehicle systems and subsystems are monitored by sensors that provide signals to the Electronic Control Units (ECUs), where the signals are processed and converted into physical quantities for identification and monitoring; mathematical calculations may also be performed if needed. The ECUs communicate through a Controller Area Network (CAN), whose data is used for vehicle control and for determining its operating condition. The vehicles have a communication module that receives this data in real time and transmits it via GSM network to the central data storage, and whenever a vehicle visits a workshop for any activity, the data is downloaded and saved in the system. This is a crucial part that makes it possible to use the data for the development of various systems with numerous applications.

2) Off-board data: Most major automotive companies have a vast amount of data from various sources, collected over years of operations. The data consists of project designs, technical reports, maintenance history, failure history, and vehicle operation statistics, among others. These data are managed by various software tools and stored in a central repository. This is an important source of information for developing the predictive model.

III. METHODOLOGY
Fig. 4 shows the workflow for the systematic application of machine learning techniques, which consists of problem definition, data collection, and data preparation. These steps ensure a structured and efficient approach for the subsequent
development of predictive models for identifying the need for turbocharger maintenance.

Fig. 4 Workflow for Machine Learning Application

A. Data Description
The data used in this work was collected and stored by a commercial vehicle manufacturer from trucks operating in different markets for a limited period. Vehicles operating in the European and Latin American markets will be selected for data extraction.

The turbochargers of the vehicles do not have sensors that directly monitor their condition and operation; therefore, other combined data sources will be used to verify whether there are patterns relating the input data to turbocharger integrity. To compose the dataset used in this work, three types of data will be explored: technical specification data (TSD), operational data (OD), and maintenance data (MD), detailed as follows:
• TSD describes the configuration of the truck when it is produced, using codes that determine which parts are equipped in the vehicle, such as a code for the engine, a code for the gearbox, and a code for the type of cabin, among others. Missing data occurs in this dataset due to the modular nature of the data, where different engines may have different sensors, and different generations of engines may have different sensors with different precisions. To handle the problem of modularity and missing data, a data imputation technique will be used.
• OD is stored throughout the vehicle's lifespan and is automatically transmitted via GSM network or manually extracted when vehicles visit a workshop. It is mainly composed of aggregated usage statistics and variables that can be scalars or vectors. Examples include exhaust gas temperature, percentage of time a component was subjected to a pressure range, distance traveled, and fuel consumption, among others.
• MD is collected when vehicles visit dealership workshops. It contains information on which types of services were performed, which parts needed repairs, which parts were replaced, and costs.

B. Problem Understanding
For the structuring and composition of the dataset and the definition of the strategy for failure prediction, interviews must be conducted with experts to identify the main variables related to the failure modes of turbochargers.

1) Data strategy and categorization: There are two main approaches to deal with this problem: the Sliding-Box approach, which formulates the problem as a binary classification task [6], [8], and the Remaining Useful Life (RUL) approach, which formulates the problem as a regression task [9], [10], [11].

This work uses the Sliding-Box binary classification approach. The model is designed to predict whether a component will fail or not within a certain period based on the data collected up to a certain point [8]. The Sliding-Box model has several advantages, such as simplicity, no issues with censored data, direct relevance to the actual vehicle application, and, most importantly, making a direct decision on whether the component will fail or not. By allowing the model to decide whether to bring the vehicle in for maintenance, it facilitates the task of workshop personnel and fleet operators in managing vehicle maintenance [6], [8]. Fig. 5 illustrates the Sliding-Box approach, where the classifier is designed to predict whether the component will fail within a certain time window ∆t, considering all the data up to a certain point in time t. Each horizontal line represents the timeline of a vehicle, and the events can be readings of operational data, fault codes, and so on.

Fig. 5 Sliding Box approach [8]

2) Data categorization: In the methodology, samples of faulty trucks, belonging to the positive class, were obtained by selecting the operational data (OD) readings within a time interval ∆t, which comprises the records collected in the 90 days prior to the repair or replacement of the turbocharger. All other records of these trucks were considered as belonging to the negative class. For the trucks considered healthy, which did not undergo repair or replacement of the turbocharger, all available records were selected and also considered as belonging to the negative class. Fig. 6 presents an illustration of the category extraction process.

Fig. 6 Strategy for category extraction

C. Data Understanding
For the data understanding step, a population of models and applications should be initially defined to delimit the study scope. Only trucks equipped with Euro V and equivalent engines were selected, limited to the 2018 manufacturing year.
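As an illustration of the Sliding-Box labeling rule described under Data categorization, the following minimal Python sketch marks a reading as positive when it was taken within 90 days before a turbocharger repair and as negative otherwise. The column names (`vehicle_id`, `reading_date`, `repair_date`), the toy values, and the simplification of at most one repair per vehicle are hypothetical and are not taken from the industrial dataset.

```python
import pandas as pd

def label_sliding_box(readings, repairs, window_days=90):
    """Flag readings taken at most `window_days` before a recorded repair.

    Assumes at most one repair row per vehicle (hypothetical simplification).
    """
    # Left join keeps every reading; trucks without a repair get NaT.
    merged = readings.merge(repairs, on="vehicle_id", how="left")
    days_to_repair = (merged["repair_date"] - merged["reading_date"]).dt.days
    # Comparisons with NaN are False, so healthy trucks stay negative.
    merged["positive"] = (days_to_repair >= 0) & (days_to_repair <= window_days)
    return merged.drop(columns="repair_date")

# Toy data: truck A was repaired on 2018-05-20, truck B never was.
readings = pd.DataFrame({
    "vehicle_id": ["A", "A", "A", "A", "B"],
    "reading_date": pd.to_datetime(
        ["2018-01-01", "2018-03-15", "2018-05-01", "2018-06-01", "2018-02-01"]
    ),
})
repairs = pd.DataFrame({
    "vehicle_id": ["A"],
    "repair_date": pd.to_datetime(["2018-05-20"]),
})

labeled = label_sliding_box(readings, repairs)
print(labeled)
```

In this toy example, only the two readings of truck A taken 66 and 19 days before the repair are flagged as positive; the earlier reading, the reading after the repair, and the reading of the healthy truck B remain negative.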
In addition, a further selection criterion was applied regarding the markets of interest, separating vehicles that circulate in Europe from those that circulate in South America. The European markets considered in the analysis were: England, France, Germany, Sweden, Spain, Denmark, Poland, Netherlands, Norway, and Italy. The South American markets considered were: Brazil, Argentina, Uruguay, Chile, Peru, and Colombia. This careful selection of data allows for a specific focus on regions of interest and the analysis of trucks with similar characteristics and contexts.

1) Data collection: To collect and process the raw data available in the central repository, scripts will be developed using the Structured Query Language (SQL), a programming language specifically designed for managing relational databases, and the software DBeaver will be used for this purpose. Data sampling methods will be applied to obtain representative subsets to ensure accurate estimation of the performance of binary classification models. To ensure the appropriate quality and quantity of original readings, vehicles should be filtered based on the highest number of available readings. Events should then be classified as either a failed vehicle (0) or a non-failed vehicle (1) based on the occurrence of repairs or replacement of parts on dates and mileage corresponding to the readings. Subsequently, the data should be pivoted so that each parameter becomes a column in the data structure. This structure, composed of categorized data ordered over time, with all readings of the parameters of interest in the columns, should be prepared to identify degradation patterns in the turbocharger and predict future failures based on these patterns. This approach will allow for a detailed and efficient analysis of the data, aiming to obtain relevant conclusions for the study at hand.

2) Exploratory analysis and visualization: In the data analysis process, the Python programming language will be used, together with essential libraries such as NumPy, Pandas, and Matplotlib, which provide functionalities for statistical analysis and visualization. One of the important steps will be to verify the quality of the available data. To do this, an analysis of the amount of null data in the dataset will be performed, using the Pandas library for data manipulation and analysis. This analysis will allow the identification and treatment of any missing values, ensuring the reliability of the results obtained. Another relevant aspect will be the analysis of data imbalance with respect to the classes. Using data visualization techniques, such as bar graphs, possible imbalances in the distribution of classes in the dataset can be identified. The Matplotlib library will be used to generate these visualizations. Finally, the correlation between the feature variables in the dataset will be analyzed. High correlation can lead to the phenomenon of overfitting, where the model becomes too adjusted to the training data, resulting in poorer performance when applied to new data. This can occur because the model may become trapped in specific patterns present in the training data that do not apply to other data, and data pre-processing techniques such as normalization and standardization can help reduce the correlation between the data. The NumPy library, which provides functionalities for mathematical calculations and manipulation of arrays and matrices, will be used for this analysis.

D. Data Preparation
A large portion of the operational data is collected when the truck visits a workshop or is transmitted via the GSM network. However, the data is not collected at fixed and regular intervals, which poses a major challenge for learning models.

To address this challenge, the data should be sorted and grouped by time and distance, categorized into 5000-kilometer ranges. For some vehicles, some ranges had no readings, so the previous reading from the original data will be used to fill in the missing data for each truck, thus representing the data as a time series.

To deal with the large volume of null data, variables with more than 65% null values will be discarded. For the remaining variables, Mean Imputation (MI) will be used, which calculates the average of the values of each feature in the training dataset and uses it to replace the null data in the training and validation datasets.

To balance the classes, the SMOTE technique must be used, available in the imbalanced-learn library for Python. The technique helps to improve the model's performance in class imbalance problems, as it increases the amount of data in the minority class, making it more representative and, consequently, reducing classification bias towards the majority class.

To deal with the high correlation of the data, the normalization technique must be adopted to ensure that the features present in the dataset are on a common scale, favoring efficiency and accuracy. Normalization will be performed using the MinMaxScaler algorithm, available in the Scikit-learn library for Python, which transforms the features to a range between 0 and 1. Thus, it is ensured that all the features have the same scale, preserving the original data form and contributing to a more robust and reliable analysis.

IV. RESULTS
The results obtained are presented according to the workflow defined in the methodology, as illustrated in Fig. 4.

A. Problem Understanding
After interviews with experts, a range of factors relevant to the primary failure modes of the turbocharger were considered when selecting variables for the study. These factors encompass various operational parameters and conditions that impact the turbocharger's performance.

Due to the sensitivity of the information contained in the features, anonymization measures were applied, following ethical principles, as discussed in Section I-A. Therefore, the selected variables are presented in a generic manner to enhance the study's comprehensibility, as shown in Table I.

TABLE I. DESCRIPTION OF VARIABLES
Variable No    Description
1    Engine Load Matrix Bias
2    Engine Load Matrix
3    Ambient Temperature
4    Atmospheric Pressure
5    Boost Pressure
6    Charge Air Temperature
7    Coolant Temperature
8    Lambda Value
9    Oil Pressure
10    Torque Intervention Status
11    Torque Limiting Mode
12    Engine Runtime at Air Filter Clogging
13    Particle Filter Ash Filling
14    Vehicle Distance at Air Filter Clogging
15    Engine Runtime
16    Fuel Total Amount
17    Number of Starts
18    Air Filter Max Flow Restriction
19    Air Intake Flow Restriction
20    Current Filter Clogging
21    Empty Filter Estimation
22    Engine Oil Consumed
23    Engine Oil / Fuel Ratio (ED95)
35    Coolant Temperature (Min/Max)
36    Engine Speed (Min/Max)
37    Exhaust Temperature Upstream Catalyst (Min/Max)
38    Oil Pressure (Min/Max)

and the knowledge of experts who work with the development of turbochargers, a time interval of 90 days was determined. This interval allows vehicles to have failure data available in the operational database (OD) during the valid period, as well as allowing sufficient time for the development of any identifiable wear on the turbocharger.

B. Data Understanding
The initial dataset is composed of 38 variables, represented in different formats such as histograms, vectors, matrices, and scalars. For the development of the research, 95 vehicles that presented turbocharger failure and 100 vehicles that did not were selected, totaling 7496 rows and 1031 columns.

Fig. 7 shows the presence of a large amount of null data in the dataset, which can impact the quality and reliability of the analyses. The vertical axis of the graph represents the percentage of missing data for each feature, while the horizontal axis lists the features, such as environmental conditions, mechanical parameters, and operational metrics. To safeguard sensitive information, not all variables on the horizontal axis are revealed. Nevertheless, a selection of variables is displayed to aid the reader's comprehension.

Another important aspect is the data imbalance, with only 16% of the data belonging to the positive class, indicating a clear asymmetry in the class distribution of the analyzed dataset, as shown in Fig. 8.
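The null-value and class-balance checks discussed above can be sketched with Pandas as follows. This is only a minimal illustration on a toy table: the column names (`boost_pressure`, `oil_pressure`, `failure`) and the values are hypothetical, and the 65% cut-off is the one stated in the Data Preparation step of the methodology.

```python
import numpy as np
import pandas as pd

def null_percentage(df):
    """Percentage of missing values per column, highest first."""
    return df.isna().mean().mul(100).sort_values(ascending=False)

def class_share(labels):
    """Percentage of samples in each class."""
    return labels.value_counts(normalize=True).mul(100)

# Toy table with hypothetical feature names and values.
toy = pd.DataFrame({
    "boost_pressure": [1.2, np.nan, 1.4, 1.1],
    "oil_pressure": [np.nan, np.nan, np.nan, 3.0],
    "failure": [0, 0, 0, 1],
})

features = toy.drop(columns="failure")
null_pct = null_percentage(features)

# Columns exceeding the 65% null threshold are candidates for removal.
too_sparse = null_pct[null_pct > 65].index.tolist()

shares = class_share(toy["failure"])
print(null_pct)
print(too_sparse)
print(shares)
```

On this toy table, `oil_pressure` has 75% null values and is flagged for removal, while the positive class accounts for 25% of the rows. The same two series could be passed to Matplotlib bar plots to obtain charts in the style of Fig. 7 and Fig. 8.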
Finally, the correlation between the features was analyzed, as shown in Fig. 9. Both axes of the heatmap list the correlated features. Light colors, with the upper limit in yellow, indicate high correlation between features, while dark blue indicates low correlation. Due to the confidentiality of the information, not all variables are displayed on the x and y axes; a few are shown to aid the reader's understanding.

Fig. 9 Correlation Matrix

These results provide a solid foundation for the analyses carried out in this research, allowing a well-founded understanding and interpretation of the data, even with the anonymization measures applied to protect sensitive information.

C. Data Preparation
After data preparation, the final dataset consists of 7496 rows and 735 columns. The processed data went through a null value removal process to ensure the quality of the data used in the analysis. In addition, the class distribution was balanced using the SMOTE technique, as illustrated in Fig. 10. This approach equalized the number of samples in each class, avoiding bias in the analysis.

D. Discussion
The methodology demonstrates its benefit, as understanding the problem, understanding the data, and preparing the data are essential steps in many data mining projects in the engineering domain. Through the workflow, it is possible to generate a checklist that improves the structure, communication, and documentation of these essential tasks for the development of prediction models. However, collecting and defining the data to compose the dataset used in this research was a major challenge, not only because the company's data collection was not originally carried out for predictive maintenance purposes, but also because the data was stored in an unstructured format. This lack of structure made it difficult to construct SQL queries to access and create the necessary tables for data extraction and understanding. All steps require close cooperation of personnel with expertise in engineering and information technology, such as engineers, mechanical technicians, architects, and data scientists.

V. CONCLUSIONS AND FUTURE WORK
Based on the objectives and results presented in this study, we conclude that the proposed methodology has yielded positive experimental outcomes. It collects real industry data derived from workshop histories and operational records and uses data mining techniques for comprehensive variable analysis and dataset curation to develop predictive models. The methodology has the potential to be extended to other databases and components of commercial vehicles, with significant implications for advancing more accurate and dependable predictive models.

Overall, this study highlights the importance of proper data preparation and selection, as well as the use of advanced data mining techniques, in developing accurate predictive maintenance models. It also underscores the need for collaboration between engineering and information technology specialists throughout the process. By improving predictive maintenance practices, this study has significant implications for reducing downtime and increasing vehicle efficiency and safety. Future research could explore additional data sources and incorporate more advanced modeling techniques to further improve the accuracy and reliability of predictive maintenance models. Additionally, a study focused on developing a methodology for data collection, storage, and structuring by industry, specifically for predictive maintenance, could improve the quality and diversity of data and thereby lead to better prediction outcomes for statistical and machine learning models.
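
As an illustration of the data preparation step in Section C (null value removal followed by SMOTE balancing), the sketch below shows the idea on toy data. It is a simplified, standard-library approximation of SMOTE for the two-class case, not the authors' implementation; the paper does not state which implementation was used. The function names, parameters (`k`, `seed`), and data are hypothetical.

```python
import math
import random


def drop_rows_with_nulls(rows, labels):
    """Keep only rows (and their labels) that contain no None values."""
    kept = [(r, y) for r, y in zip(rows, labels)
            if all(v is not None for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]


def smote_like_oversample(rows, labels, minority=1, k=3, seed=0):
    """Add synthetic minority rows until both classes have equal counts.

    Assumes exactly two classes. Each synthetic row lies on the segment
    between a random minority row and one of its k nearest minority
    neighbours (Euclidean distance), which is the core idea of SMOTE.
    """
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_majority = len(rows) - len(minority_rows)
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    new_rows, new_labels = list(rows), list(labels)
    for _ in range(n_majority - len(minority_rows)):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        new_rows.append([b + gap * (o - b) for b, o in zip(base, other)])
        new_labels.append(minority)
    return new_rows, new_labels


# Toy data (hypothetical): two features, one row with a missing value.
X = [[0.0, 0.0], [1.0, 1.0], [0.5, None], [5.0, 5.0], [5.2, 4.9],
     [6.0, 5.5], [5.5, 6.1], [4.8, 5.3], [1.0, 0.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 1]

X_clean, y_clean = drop_rows_with_nulls(X, y)
X_bal, y_bal = smote_like_oversample(X_clean, y_clean)
```

In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred. Oversampling is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data.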
REFERENCES
813
Authorized licensed use limited to: RPTU Kaiserslautern-Landau. Downloaded on February 08,2025 at 18:36:27 UTC from IEEE Xplore. Restrictions apply.
Optimization.," SAE International, vol. 10, no. 3, p. [8] E. Bremer, "Prediction of Component Breakdowns in
306–315, 2017. Commercial Trucks: Using Machine Learning on
[4] Ranasinghe, G. D. and Parlikad, A. K., "Generating Operational and Repair History Data.," KTH,
real-valued failure data for prognostics under the Stockholm, 2020.
conditions of limited data availability.," in IEEE [9] F. Liljefors, "Time dependent modeling of
International Conference on Prognostics and Health turbocharger failure using machine learning.," KTH,
Management (ICPHM), San Francisco, 2019. Stockholm, 2020.
[5] Nataraj, V. and Narayanan, S., "Resolving Class [10] Mashhadi, P. S., Nowaczyk S. and Pashami S.,
Imbalance using Generative Adversarial Networks.," "Stacked Ensemble of Recurrent Neural Networks for
Halmstad, 2020. Predicting Turbocharger Remaining Useful Life.,"
[6] Chen, K., Sepideh P., Yuantao F. and Nowaczyk S., Applied Sciences, vol. 10, no. 1, p. 69, 2020.
"Predicting Air Compressor Failures Using," in EPIA [11] S. Voronov, "Machine Learning Models for
Conference on Artificial Intelligence, 2019. Predictive," Electronic Press, Linköping, 2020.
[7] PASHAMI, S. et al., "Explainable Predictive
Maintenance," arXiv:2306.05120 [cs.AI], p. 51, 8
June 2023.