Processes
Article
Conditional Generative Adversarial Networks with Optimized
Machine Learning for Fault Detection of Triplex Pump in
Industrial Digital Twin
Amged Sayed 1,2, *, Samah Alshathri 3 and Ezz El-Din Hemdan 4,5
1 Department of Electrical Energy Engineering, College of Engineering & Technology, Arab Academy for
Science Technology & Maritime Transport, Smart Village Campus, Giza 12577, Egypt
2 Industrial Electronics and Control Engineering Department, Faculty of Electronic Engineering, Menoufia
University, Menoufia 32952, Egypt
3 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint
Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia; [email protected]
4 Structure and Materials Research Lab, Prince Sultan University, P.O. Box 66833, Riyadh 11586, Saudi Arabia;
[email protected]
5 Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University,
Menoufia 32952, Egypt
* Correspondence: [email protected]
Abstract: In recent years, digital twin (DT) technology has garnered significant interest from both academia and industry. However, the development of effective fault detection and diagnosis models remains challenging due to the lack of comprehensive datasets. To address this issue, we propose the use of Generative Adversarial Networks (GANs) to generate synthetic data that replicate real-world data, capturing essential features indicative of health-related information without directly referencing actual industrial DT systems. This paper introduces an intelligent fault detection and diagnosis framework for industrial triplex pumps, enhancing fault recognition capabilities and offering a robust solution for real-time industrial applications within the DT paradigm. The proposed framework leverages Conditional GANs (CGANs) alongside the Harris Hawk Optimization (HHO) as a metaheuristic method to optimize feature selection from input data to enhance the performance of machine learning (ML) models such as Bagged Ensemble (BE), AdaBoost (AD), Support Vector Machine (SVM), K-Nearest Neighbors (KNNs), Decision Tree (DT), and Naive Bayes (NB). The efficacy of the approach is evaluated using key performance metrics such as accuracy, precision, recall, and F-measure on a triplex pump dataset. Experimental results indicate that hybrid-optimized ML algorithms (denoted by “ML-HHO”) generally outperform or match their classical counterparts across these metrics. BE-HHO achieves the highest accuracy at 95.24%, while other optimized models also demonstrate marginal improvements, highlighting the framework’s effectiveness for real-time fault detection in DT systems. SVM-HHO attains 94.86% accuracy, marginally higher than SVM’s 94.48%. KNN-HHO outperforms KNNs with 94.73% accuracy compared to 93.14%. Both DT-HHO and DT achieve 94.73% accuracy, with DT-HHO exhibiting slightly better precision and recall. NB-HHO and NB show near-equivalent performance, with NB-HHO at 94.73% accuracy versus NB’s 94.6%. Overall, the optimized algorithms demonstrate consistent, albeit marginal, improvements over their classical versions.
Keywords: machine learning; fault diagnosis; digital twins; conditional GANs (CGANs); Harris Hawk Optimizer (HHO); industrial control systems; Internet of Things (IoT)
Citation: Sayed, A.; Alshathri, S.; Hemdan, E.E.-D. Conditional Generative Adversarial Networks with Optimized Machine Learning for Fault Detection of Triplex Pump in Industrial Digital Twin. Processes 2024, 12, 2357. https://doi.org/10.3390/pr12112357
Academic Editors: Yuhe Wang, Shaoke Wan, Naipeng Li and Zijian Qiao
Received: 15 September 2024
Revised: 10 October 2024
Accepted: 25 October 2024
Published: 27 October 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Recently, predictive analytics has been considered a key field within data science through the use of statistical models and machine learning algorithms to forecast future oc-
Therefore, this paper introduces an efficient fault diagnosis framework for industrial digital twin systems. The framework aims to achieve accurate and efficient fault detection and diagnosis by combining Conditional Generative Adversarial Networks (CGANs), which generate synthetic industrial pump data, with machine learning methods whose feature selection is optimized by Harris Hawk Optimization (HHO). The novelty of the work lies in combining the CGAN with the HHO for the feature selection process in machine learning models. This approach improves the performance of models used in fault detection systems and addresses a typical challenge in industrial fault detection, where datasets are often unbalanced, thereby enhancing the generalization and performance of the model. The key contributions of this paper can be summarized as follows:
• An intelligent fault detection and diagnosis framework is developed for industrial
triplex pumps, enhancing fault recognition capabilities and offering a robust solution
for real-time industrial applications within the DT paradigm.
• The proposed framework leverages Conditional GANs (CGANs) alongside the Harris
Hawk Optimization (HHO) metaheuristic method to optimize feature selection from
input data effectively for machine learning (ML) models such as Bagged Ensemble
(BE), AdaBoost (AD), Support Vector Machine (SVM), K-Nearest Neighbors (KNNs),
Decision Tree (DT), and Naive Bayes (NB). The efficacy of the approach is evaluated
using key performance metrics such as accuracy, precision, recall, and F-measure on a
triplex pump dataset.
• From the experimental results, the suggested hybrid-optimized ML algorithms outper-
form or match their classical counterparts across various metrics. BE-HHO achieves
the highest accuracy at 95.24%, slightly surpassing BE’s 95.17%. SVM-HHO attains
94.86% accuracy, marginally higher than SVM’s 94.48%. KNN-HHO outperforms
KNNs with an accuracy of 94.73% compared to 93.14%. Both DT-HHO and DT achieve
94.73% accuracy, with DT-HHO displaying slightly better precision and recall. NB-
HHO and NB show nearly equivalent performance, with NB-HHO at 94.73% accuracy
versus NB’s 94.6%. Although AD-HHO and AD have lower accuracies at 92.57% and
92.06%, respectively, AD achieves higher recall.
• The results indicate that hybrid-optimized machine learning models using the HHO
generally match or outperform classical models in terms of accuracy, precision, and
recall when diagnosing faults in industrial pump systems.
Thus, this research proposes an approach that exploits both feature optimization and
synthetic data production to fit the needs of typical dynamic industrial environments
based on the digital twin.
The structure of the paper is as follows: In Section 2, relevant work on the paper’s
theme is explored, while a brief overview of Conditional GAN (CGAN) and Harris Hawk
Optimization (HHO) as they pertain to the proposed system is provided in Section 3. The
suggested framework is explained and clarified in Section 4, while a high-level proposed
framework for remote fault monitoring and detection in smart industrial IoT systems
is delivered in Section 5. Section 6 shows the experimental findings and comparative
effectiveness of the suggested approaches in comparison with classical ML models, while
the paper’s conclusion and future scope of this innovative topic are explored in Section 8.
2. Previous Studies
To identify and diagnose defective equipment, this study intends to develop and
use a digital twin system for the triplex pump. For several industrial processes, fault
detection and diagnosis have been carried out to boost effectiveness, safety, and continuous
production. In recent decades, numerous Artificial Intelligence techniques for failure
diagnosis have been introduced to increase the reliability and security of sophisticated
equipment. One of the effective machine learning techniques used is Generative Adversarial
Networks (GANs). GANs have shown promising results in various fields, including fault
diagnosis [12–15]. These studies show how GANs can be used to diagnose faults in a range
of sectors, including manufacturing, energy, and transportation. However, there are still
several issues and open research paths in this field, including how to deal with imbalanced
data, how to deal with the absence of labeled data, and how to use other GAN techniques
in fault diagnosis.
One of the well-known and effective methods for defect diagnosis is the model-based approach, in which an accurate model of the complicated apparatus is built from different analytic terms [16–18]. Physically and mathematically informed approaches have been effectively employed to tackle the intricacies of sophisticated industrial machinery, leveraging established model-based methodologies. Nevertheless, the profound complexity of certain equipment often necessitates a thorough comprehension of the underlying physical principles to create an accurate model. Owing to their potential to revolutionize several industries, including gaming, education, healthcare, and manufacturing, industrial digital twins have attracted significant attention recently.
On the other hand, digital twins are virtual representations of actual things, systems,
or settings that can be used for testing, simulation, and monitoring. Numerous industries,
including engineering, architecture, urban planning, and healthcare, use them. The pub-
lication by Grieves et al. [19], which presents the idea of digital twins and explores their
potential advantages and disadvantages, is one of the foundational works on this topic. The
authors contend that digital twins can enhance consumer experience, lower expenses, and
improve product development. Although the present DT schemes and executions are still
in their initial phases and require significant effort, they have been successfully integrated
into various applications such as healthcare systems, various industries including aviation
and farming, smart cities, and climate prediction [20,21].
Designing a competent digital twin system for any physical system requires the exper-
tise of specialized engineers and computer scientists. Their duties comprise constructing
and proposing the necessary product model and creating a comprehensive description of
the virtual system. The authors demonstrate the effectiveness of their approach in reducing
costs and improving transparency. Also, Wang et al. [22] create a digital twin platform for
smart cities that combines data from several sources to offer in-the-moment monitoring
and optimization of municipal infrastructure. However, they also note the difficulties in
managing data, scaling, and cybersecurity that come with building and sustaining digital
twins. A framework for building digital twins of the triplex pump and using hybrid ma-
chine learning for fault diagnosis in the industrial system is given in [23]. Fault diagnosis
becomes faster, more accurate, and more cost-effective, leading to improved operational
efficiency and reduced downtime in various industries, such as manufacturing, energy, and
transportation [24,25].
The current state of the art in industrial IoT applications is characterized by a scarcity of contributions focused on the integration of digital twins and machine learning algorithms. In response to this gap, this study proposes a fault prediction framework comprising four phases: (1) a Data Acquisition Step (DAS), (2) a Data Synthesizing Step (DSS), (3) ML-based Model Training and Testing (MLMT2), and (4) a Failure Diagnosis Step (FDS). The proposed framework seeks to provide a digital twin-assisted AI framework for fault prediction that leverages Generative Adversarial Networks (GANs) in combination with diverse HHO-optimized machine learning techniques, such as Bagged Ensemble (BE), AdaBoost (AD), Support Vector Machine (SVM), K-Nearest Neighbors (KNNs), Decision Tree (DT), and Naive Bayes (NB), to identify and classify faults effectively.
In conclusion, this section presents a critical review of the literature on fault detection
and diagnosis (Table 1) using digital twins and Artificial Intelligence techniques, specifically
Generative Adversarial Networks (GANs) and machine learning algorithms. Various
studies are summarized, highlighting their objectives, methodologies, key findings, and
their pros and cons.
3. Background
This section discusses the topics of Conditional GAN (CGAN) and Harris Hawk
Optimization (HHO) as they pertain to the proposed system.
Figure 2. The system architecture of Conditional GAN method for synthetic data.
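For orientation, the conditional GAN objective in its commonly used form is reproduced below. This is the standard textbook formulation, not necessarily the exact loss used in this work; the symbols are assumptions introduced here: G is the generator, D the discriminator, x a real sample, z a noise vector, and y the conditioning label (for example, healthy or faulty).

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x \mid y)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]
```

Conditioning both networks on y is what lets the generator produce samples for a chosen class rather than for the data distribution as a whole.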
3. Mating Pool: Select the fittest individuals from the current population to form the
mating pool. The size of the mating pool determines the number of offspring produced
in the next generation.
4. Crossover and Mutation: Apply crossover and mutation operators to the members
of the mating pool to generate new offspring. Crossover involves combining two
parent individuals to produce a single offspring, while mutation involves introducing
random changes to an individual.
5. Replacement: Replace the least fit individuals in the current population with the
newly generated offspring. This maintains the diversity of the population and prevents
the algorithm from getting stuck in a local optimum.
6. Repeat: Go back to step 2 and repeat the process until a stopping criterion is met,
such as reaching a maximum number of generations or achieving a desired level
of accuracy.
7. Selection of final features: Once the algorithm converges, select the top-ranked
features from the final population as the optimal set of features for the given dataset.
The key advantage of the HHO is its ability to handle complex, non-linear problems
and its robustness against noise and outliers in the data. Additionally, the HHO can be
easily parallelized, making it suitable for large datasets. However, the algorithm requires
careful parameter tuning for optimal performance.
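The selection, crossover, mutation, and replacement steps listed above can be sketched as a small population-based search over binary feature masks. This is a minimal illustration under stated assumptions, not the authors' implementation: the initialization and fitness-evaluation steps are not shown in the list, so they are filled in here with random masks and a hypothetical toy fitness function, and all names and default values (`evolve_feature_mask`, `pool_size`, `mutation_rate`, and so on) are invented for the example.

```python
import random


def evolve_feature_mask(fitness, n_features, pop_size=20, pool_size=10,
                        generations=30, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask with a select/crossover/replace loop.

    `fitness` maps a tuple of 0/1 flags to a score (higher is better).
    All names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Assumed initialization (not in the listed steps): random binary masks.
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Assumed evaluation: rank individuals from fittest to least fit.
        ranked = sorted(population, key=fitness, reverse=True)
        # Mating pool: the fittest individuals of the current population.
        pool = ranked[:pool_size]
        offspring = []
        for _ in range(pool_size):
            # Crossover: splice two parents at a random cut point.
            parent_a, parent_b = rng.sample(pool, 2)
            cut = rng.randrange(1, n_features)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each flag with a small probability.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            offspring.append(child)
        # Replacement: offspring take the place of the least fit individuals.
        population = ranked[:pop_size - len(offspring)] + offspring
    # Final selection: the best mask in the final population.
    return max(population, key=fitness)


# Toy fitness (hypothetical): features 0, 2 and 5 are informative,
# every other selected feature costs half a point.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    gain = sum(1.0 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    cost = sum(0.5 for i, bit in enumerate(mask) if bit and i not in INFORMATIVE)
    return gain - cost


best_mask = evolve_feature_mask(toy_fitness, n_features=8)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
```

In a fault detection setting the toy fitness would be replaced by something like the validation accuracy of a classifier trained on the selected features, which is where most of the computational cost would arise.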
4. Proposed Framework
In this work, we present a scheme for automatically recognizing faults in industrial DT systems. The proposed framework consists of hybrid-optimized ML models, in which the HHO method is combined with different machine learning methods such as Bagged Ensemble (BE), AdaBoost (AD), Support Vector Machine (SVM), K-Nearest Neighbors (KNNs), Decision Tree (DT), and Naive Bayes (NB), trained on the CGAN-generated dataset. The framework comprises five stages, as follows, to carry out the diagnostic procedure illustrated in Figure 3:
Stage 1: (Gaining and Gathering of Data): in this phase, the data will be gathered from
digital twin sources to handle the subsequent steps in the proposed system.
Stage 2: (Generating Synthetic Data): In this stage, the CGAN is utilized to create synthetic
data that resemble actual networks’ input. By utilizing labeled data, Conditional GANs
(CGANs) can generate synthetic samples that belong to specific, predefined categories.
Stage 3: (Generated Data Validation): In this phase, the principal component analysis
(PCA) is applied to assess the properties of both the created and actual signals. PCA enables
a complex dataset to be transformed into a set of uncorrelated variables, which are referred
to as the principal components. The goal of PCA is to use the numerical structures of the
actual data and assign the features of the created data to the same PCA subspace.
Stage 4: (Training/Testing the proposed model): At this point, the synthetic data generated by the CGAN are used to train the optimized ML models that classify the input data from the digital twin model. The HHO technique is then applied to enhance the performance of the machine learning algorithms. Finally, the data collected in real time from the digital twin of the machinery are used to test the different machine learning algorithms, and performance assessment indices are applied to assess the suggested framework.
Stage 5: (Cloud-based Monitoring System for Fault Classification and Prediction): To
monitor machine data from their industrial systems, the supervisor operators in this step
use a cloud-based industrial monitoring tool of the proposed framework. To optimize
operations, operators can use this system to identify patterns and trends indicative of
impending failures, enabling proactive remediation before issues arise.
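The PCA check in Stage 3 can be illustrated with a small, dependency-free sketch: the principal components are fitted on the real data only, and both real and generated samples are projected into that same subspace before comparing where each class lands. Everything concrete here is an assumption made for illustration: the four-dimensional toy features, the Gaussian clusters standing in for real and CGAN-generated signals, and the helper names. The eigenvectors are obtained by power iteration with deflation, which is one simple way to compute them and not necessarily the method used in this work.

```python
import math
import random


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu):
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    d = len(mu)
    return [[sum(c[i] * c[j] for c in centered) / (len(rows) - 1)
             for j in range(d)] for i in range(d)]


def top_components(cov, k, iters=200, seed=0):
    """Leading k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(cov)
    mat = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d))
                         for i in range(d))
        components.append(v)
        # Deflation: remove the component just found so the next one can emerge.
        mat = [[mat[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components


def project(rows, mu, components):
    return [[sum((x - m) * c for x, m, c in zip(row, mu, comp))
             for comp in components] for row in rows]


def sample_cluster(rng, center, n, spread=0.5):
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]


rng = random.Random(42)
HEALTHY, FAULTY = [0, 0, 0, 0], [5, 5, 0, 0]   # hypothetical feature centers
real_healthy = sample_cluster(rng, HEALTHY, 50)
real_faulty = sample_cluster(rng, FAULTY, 50)
# Stand-in for CGAN output: drawn from the same toy distributions.
gen_healthy = sample_cluster(rng, HEALTHY, 50)
gen_faulty = sample_cluster(rng, FAULTY, 50)

# Fit the subspace on real data only, then reuse it for generated data.
real = real_healthy + real_faulty
mu = mean_vector(real)
components = top_components(covariance(real, mu), k=3)


def subspace_centroid(rows):
    return mean_vector(project(rows, mu, components))


gap_same_class = math.dist(subspace_centroid(real_healthy),
                           subspace_centroid(gen_healthy))
gap_between_classes = math.dist(subspace_centroid(real_healthy),
                                subspace_centroid(real_faulty))
print(f"real vs generated (healthy): {gap_same_class:.3f}")
print(f"healthy vs faulty (real):    {gap_between_classes:.3f}")
```

A small same-class gap relative to the between-class gap is the numerical counterpart of generated and real signals occupying the same region of the PCA subspace.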
Figure 3. Proposed hybrid-optimized fault detection framework.
6. Experimental Study
The generated pump dataset and the experimental setup are described in this section,
followed by an analysis of the findings and a discussion of the suggested framework.
Assessment metrics such as accuracy, precision, recall, and F1-Score can be computed
from the entries of the confusion matrix as follows [29–32]:
Accuracy: To ensure the reliability and accuracy of our system, it is imperative to determine
certain key parameters that influence the quality of the model. Specifically, we must evalu-
ate the symmetry of the datasets and the balance between false positive and false negative
rates, as expressed in Equation (1). A well-balanced dataset with minimal disparity between
Figure 6. The CGAN model for the triplex pump system.
Therefore, the data generated by the CGAN are used to train the different algorithms, and the real data are then used to examine the performance of the proposed algorithm. The advantage of this technique is that there is no need to split the actual data, so the proposed algorithm can be tested on the whole actual dataset, which gives a fuller picture of its capabilities. From Figure 7, the distribution of the generated signals is similar to the distribution of the real signals. Both faulty and healthy signals, whether generated or real, lie in the same region of the PCA subspace, indicating that the properties of the generated signals closely match those of the real signals. Table 3 shows a comparison of the results, which are also depicted in Figure 8. Figure 9 shows the confusion matrices for the different ML methods, and Figure 10 displays the ROC curves for all algorithms in the ML framework.
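The train-on-generated, test-on-real protocol described above can be sketched as follows. This is only an illustration of the data flow: the nearest-centroid classifier is a deliberately simple stand-in for the BE, AD, SVM, KNN, DT, and NB models, and the Gaussian samples are hypothetical placeholders for CGAN output and for real pump measurements.

```python
import math
import random


def sample_class(rng, center, n, spread=1.0):
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]


def fit_centroids(samples, labels):
    """Nearest-centroid 'training': one mean vector per class label."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [sum(col) / len(rows) for col in zip(*rows)]
            for y, rows in grouped.items()}


def predict(centroids, x):
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


rng = random.Random(7)
# Hypothetical class centers in a three-feature space.
CENTERS = {"healthy": [0.0, 0.0, 0.0], "faulty": [4.0, 4.0, 4.0]}

# Stand-in for CGAN output: the synthetic set is used only for training.
train_x, train_y = [], []
for label, center in CENTERS.items():
    train_x += sample_class(rng, center, 200)
    train_y += [label] * 200

# Stand-in for real measurements: kept whole and used only for testing.
test_x, test_y = [], []
for label, center in CENTERS.items():
    test_x += sample_class(rng, center, 100)
    test_y += [label] * 100

model = fit_centroids(train_x, train_y)
correct = sum(predict(model, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"accuracy on the full real set: {accuracy:.3f}")
```

Because no real sample is ever used for training, the reported score reflects how well the synthetic distribution transfers to real data, which is the property the PCA comparison is meant to support.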
Figure 7. Signal feature visualization using the first three PCAs.

For the mentioned ML models, the resulting confusion matrices are indicated in Figures 8 and 9 for the triplex model. Likewise, the ROC curves of the proposed system are illustrated in Figure 10. The Harris Hawk Optimization (HHO) is applied to machine learning algorithms to enhance the performance and boost the accuracy of the system. The HHO is performed in a highly competitive manner in terms of the caliber of its exploration and exploitation. The optimization algorithm is used for feature selection. The results in Figure 11 show that the HHO preserves and enhances the performance of all the models.

Table 3. Comparison results of proposed system for fault detection.
Algorithm   Accuracy   Recall    Precision   F1-Score
BE          95.1746    96.5789   93.6224     95.0777
AD          92.0635    96.8421   87.9331     92.1728
SVM         94.6032    99.6053   90.2265     94.6842
KNNs        93.1429    99.8684   87.6443     93.3579
DT          94.4762    95.6579   93.0858     94.3543
Figure 8. Results for ML models for fault detection without optimization. [Bar chart: Measurements (%) of Accuracy, Precision, Recall, and F1-score for the ML models BE, AD, SVM, KNN, DT, and NB.]

Figure 11 and Table 4 show the performance of various machine learning (ML) algorithms in their optimized and classical forms across four metrics: accuracy, precision, recall, and F1-Score. The optimized versions are indicated with the suffix “-HHO” and consistently outperform or closely match their classical counterparts.

Figure 9. Confusion matrix of the proposed framework. [Panels: SVM, KNNs, BE, NB, DT, AD.]

[Figure 10: ROC curve panels for SVM, KNNs, BE, NB, DT, and AD; vertical axis: True Positive Rate.]

[Figure 11: bar chart of Measurements (%) for the optimized ML models BE-HHO, AD-HHO, SVM-HHO, KNN-HHO, DT-HHO, and NB-HHO.]
From the results above, it is evident that different models have their own strengths, with the best-performing ones being evaluated based on key metrics such as accuracy, precision, recall, and F1-Score. These metrics provide a more general view of model performance, beyond just accuracy. The BE-HHO model leads with the highest accuracy at 95.24%, followed closely by the BE model at 95.17%. Next, the SVM-HHO model has an accuracy of 94.86%, slightly outperforming the classical SVM at 94.48%. KNN-HHO and DT-HHO both have an accuracy of 94.73%, with KNN-HHO marginally outperforming classical KNNs (93.14%) and DT-HHO closely matching classical DT (94.48%). The NB-HHO model also has an accuracy of 94.73%, slightly higher than the classical NB at 94.6%. AD-HHO and AD have the lowest accuracies at 92.57% and 92.06%, respectively, but AD achieves the highest recall among all models. Therefore, the ranking of models based on their performance is as follows: BE-HHO, BE, SVM-HHO, DT-HHO, KNN-HHO, NB-HHO, SVM, DT, NB, KNN, AD-HHO, and AD. Small improvements in recall, accuracy, or precision can have a significant impact on predictive maintenance, fault detection quality, and system reliability in industrial applications. Any improvement, no matter how little, helps to increase operational efficiency, lower risk, and improve decision-making in real-world systems where faulty or inaccurate diagnostics can cause expensive disruptions. So, the total cost of maintenance will be reduced.

Table 5 provides a comparative analysis of different optimization strategies that can be used to improve the accuracy of the Support Vector Machine (SVM) algorithm based on accuracy, precision, recall, and F1-Score. Among the different methods evaluated, the Harris Hawk Optimization (HHO) method emerges as the leader with an accuracy of 94.86%, a precision of 91.76%, and an F1-Score of 94.85% for this task, demonstrating its ability in the detection of faults. Considering the results of all covered methods, the HHO appears to be the most successful optimization approach for enhancing ML-based fault detection.

Table 5. Comparative study between different optimization methods for SVM.
Optimization                                   Accuracy   Precision   Recall    F1
Particle Swarm Optimization                    94.35      90.9646     98.0263   94.3635
Whale Optimization Algorithm                   92.8889    88.57       97.8947   93
Slime Mould Algorithm                          90.6032    85.4988     96.9737   90.8755
Sine Cosine Algorithm                          93.1429    89.2771     97.5000   93.2075
Generalized Normal Distribution Optimization   94.1587    90.6326     98.0263   94.1846
Genetic Algorithm                              93.6508    89.7590     98.0263   93.7107
HHO                                            94.8571    91.7589     98.1579   94.8506
7. Limitation
While the research shows good prospects for using the HHO with CGANs in digital
twins for fault detection, some limitations have to be addressed to improve the robustness
and scalability of the proposed approach:
• Computational complexity and cost: Using the HHO together with CGANs increases the
computational complexity and cost. Both methods involve heavy computational loads:
the optimization performed by the HHO is costly, and CGANs, like other generative
models, tend to require power-intensive calculations because an auxiliary conditional
variable must be fitted while training the generator and discriminator networks. When
the two are combined, the computational burden could increase even more, which may
inhibit real-time applications of digital twins in fast fault detection processes.
• Overfitting and fault bias: As with any machine learning model, the CGAN may
overfit, especially if it is trained on small or unbalanced datasets. When the training
data do not cover enough fault types or operating regimes, the model may overfit and
learn to favor certain types of failures while underweighting failures that are rare
or poorly represented. This risk is more pronounced in large-scale complex industrial
systems where faults happen frequently and the amount of training data is small.
Although the HHO is useful for finding optimal parameter values, it does not solve
the problem of limited data.
• Parameter sensitivity of HHO: Because the HHO, like other metaheuristic algorithms,
is not self-tuning, its initial parameter values tend to affect its success. A small
change in parameters such as the population size or the number of iterations may lead
to very different performance. Such sensitivity can cause results to differ
considerably across fault detection scenarios, particularly when real-time data
processing is involved, as in manufacturing systems.
• Data quality and representation: The effectiveness of Generative Artificial Intelligence
such as the CGAN heavily relies on the quality of the data used for training. Inaccurate,
incomplete, or biased datasets can lead to poor fault detection performance, particu-
larly when digital twins are expected to simulate real-world operational conditions.
Addressing these constraints in further research would strengthen the integration of
the HHO with Generative Artificial Intelligence in digital twins for fault detection,
making it more effective, efficient, and applicable to more industries.
optimized algorithms consistently deliver marginal enhancements over their classical counterparts. The framework employs CGANs for synthetic data generation and the HHO for feature selection, which makes it a powerful and adaptable architecture for continuous monitoring and diagnosis. As a result, operational efficiency is enhanced and downtime is reduced. In addition, the ability of this approach to fit various industrial applications indicates good prospects for enhancing predictive maintenance and improvement strategies in several applications over time.
Future research could explore the integration of other machine learning methods for anomaly detection, such as Autoencoders, Deep Belief Networks, deep reinforcement learning, or Recurrent Neural Networks, and apply them to real-time data streams for continuous monitoring and adaptive fault detection in dynamic industrial DT systems. Expanding the framework beyond pumps opens up avenues for research in a range of industries that rely on dynamic equipment such as turbines, compressors, and other industrial processes. Furthermore, efforts could be directed toward improving the CGAN's performance by fine-tuning hyperparameters such as learning rates, batch sizes, and the architecture of the generator and discriminator.
Author Contributions: Conceptualization and methodology, A.S. and E.E.-D.H.; software, validation
and formal analysis A.S.; investigation, E.E.-D.H.; writing—original draft preparation A.S., writing—
review and editing, S.A., E.E.-D.H. and A.S.; project administration and funding acquisition, S.A. All
authors have read and agreed to the published version of the manuscript.
Funding: This project is funded by Princess Nourah bint Abdulrahman University Researchers
Supporting Project number (PNURSP2024R197), Princess Nourah bint Abdulrahman University,
Riyadh, Saudi Arabia.
Data Availability Statement: The original contributions presented in the study are included in the
article; further inquiries can be directed to the corresponding author.
Acknowledgments: Princess Nourah bint Abdulrahman University Researchers Supporting Project
number (PNURSP2024R197), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding
the present study.
References
1. Bin, B.; Jian, K. Digital twin-based sustainable intelligent manufacturing: A review. Adv. Manuf. 2021, 9, 1–21.
2. Liu, Y.; Zhang, L.; Yuan, Y.; Zhou, L.; Ren, L.; Wang, F.; Liu, R.; Pang, Z.; Jamal Deen, M. A novel cloud-based framework for
elderly healthcare Services using a digital twin. IEEE Access 2019, 7, 49088–49101. [CrossRef]
3. Caputo, F.; Greco, A.; Fera, M.; Macchiaroli, R. Digital twins to enhance the integration of ergonomics in workplace design. Int. J.
Ind. Ergon. 2019, 71, 20–31. [CrossRef]
4. Zayed, S.M.; Attiya, G.; El-Sayed, A.; Sayed, A.; Hemdan, E.E.-D. An Efficient Fault Diagnosis Framework for Digital Twins Using
Optimized Machine Learning Models in Smart Industrial Control Systems. Int. J. Comput. Intell. Syst. 2023, 16, 69. [CrossRef]
5. Hemdan, E.E.-D.; El-Shafai, W.; Sayed, A. Integrating Digital Twins with IoT-Based Blockchain: Concept, Architecture, Challenges,
and Future Scope. Wirel. Pers. Commun. 2023, 131, 2193–2216. [CrossRef] [PubMed]
6. Pedersen, A.N.; Borup, M.; Brink-Kjær, A.; Christiansen, L.E.; Mikkelsen, P.S. Living and prototyping digital twins for urban
water systems: Towards multi-purpose value creation using models and sensors. Water 2021, 13, 592. [CrossRef]
7. Pylianidis, C.; Osinga, S.; Athanasiadis, I.N. Introducing digital twins to agriculture. Comput. Electron. Agric. 2021, 184, 105942.
[CrossRef]
8. Verdouw, C.; Tekinerdogan, B.; Beulens, A.; Wolfert, S. Digital twins in smart farming. Agric. Syst. 2021, 189, 103046. [CrossRef]
9. Neethirajan, S.; Kemp, B. Digital Twins in Livestock Farming. Animals 2021, 4, 1008. [CrossRef]
10. Nativi, S.; Mazzetti, P.; Craglia, M. Digital ecosystems for developing digital twins of the earth: The destination earth case. Remote
Sens. 2021, 13, 2119. [CrossRef]
11. Guo, H.; Nativi, S.; Liang, D.; Craglia, M.; Wang, L.; Schade, S.; Corban, C.; He, G.; Pesaresi, M.; Li, J.; et al. Big Earth Data science:
An information framework for a sustainable planet. Int. J. Digit. Earth 2020, 13, 743–767. [CrossRef]
12. Liu, H.; Zhou, J.; Xu, Y.; Zheng, Y.; Peng, X.; Jiang, W. Unsupervised fault diagnosis of rolling bearings using a deep neural
network based on generative adversarial networks. Neurocomputing 2018, 315, 412–424. [CrossRef]
13. Sabuhi, M.; Zhou, M.; Bezemer, C.-P.; Musilek, P. Applications of generative adversarial networks in anomaly detection: A
systematic literature review. IEEE Access 2021, 9, 161003–161029. [CrossRef]
14. Lian, Y.; Geng, Y.; Tian, T. Anomaly Detection Method for Multivariate Time Series Data of Oil and Gas Stations Based on Digital
Twin and MTAD-GAN. Appl. Sci. 2023, 13, 1891. [CrossRef]
15. Liu, H.; Zhao, H.; Wang, J.; Yuan, S.; Feng, W. LSTM-GAN-AE: A promising approach for fault diagnosis in machine health
monitoring. IEEE Trans. Instrum. Meas. 2021, 71, 1–13. [CrossRef]
16. Li, W.; Li, H.; Gu, S.; Chen, T. Process fault diagnosis with model- and knowledge-based approaches: Advances and opportunities.
Control Eng. Pract. 2020, 105, 104637. [CrossRef]
17. Syed, M.M.; Lemma, T.A.; Vandrangi, S.K.; Ofei, T.N. Recent developments in model-based fault detection and diagnostics of gas
pipelines under transient conditions. J. Nat. Gas Sci. Eng. 2020, 83, 103550. [CrossRef]
18. Costamagna, P.; De Giorgi, A.; Magistri, L.; Moser, G.; Pellaco, L.; Trucco, A. A classification approach for model-based fault
diagnosis in power generation systems based on solid oxide fuel cells. IEEE Trans. Energy Convers. 2015, 31, 676–687. [CrossRef]
19. Grieves, M. Digital Twin: Manufacturing Excellence Through Virtual Factory Replication. White Pap. 2014, 1, 1–7.
20. Rasheed, A.; San, O.; Kvamsdal, T. Digital Twin: Values, Challenges and Enablers from a Modeling Perspective. IEEE Access 2020,
8, 21980–22012. [CrossRef]
21. Jones, D.; Snider, C.; Nassehi, A.; Yon, J.; Hicks, B. Characterising the Digital Twin: A Systematic Literature Review. CIRP J. Manuf.
Sci. Technol. 2020, 29, 36–52. [CrossRef]
22. Wang, H.; Chen, X.; Jia, F.; Cheng, X. Digital twin-supported smart city: Status, challenges and future research directions. Expert
Syst. Appl. 2023, 217, 119531. [CrossRef]
23. Alshathri, S.; Hemdan, E.E.-D.; El-Shafai, W.; Sayed, A. Digital twin-based automated fault diagnosis in industrial IoT applications.
Comput. Mater. Contin. 2023, 75, 183–196. [CrossRef]
24. Rachmawati, S.M.; Putra, M.A.P.; Lee, J.M.; Kim, D.S. Digital twin-enabled 3D printer fault detection for smart additive
manufacturing. Eng. Appl. Artif. Intell. 2023, 124, 106430. [CrossRef]
25. Kuru, K. MetaOmniCity: Towards immersive urban metaverse cyberspaces using smart city digital twins. IEEE Access 2023, 11,
43844–43868. [CrossRef]
26. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
27. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications.
Future Gener. Comput. Syst. 2019, 97, 849–872. [CrossRef]
28. Multi-Class Fault Detection Using Simulated Data. Available online: https://ssd.mathworks.com/supportfiles/SPT/data/PumpSignalGAN.zip
(accessed on 14 August 2024).
29. El-Naby, A.A.; Hemdan, E.E.-D.; El-Sayed, A. An efficient fraud detection framework with credit card imbalanced data in financial
services. Multimed. Tools Appl. 2023, 82, 4139–4160. [CrossRef]
30. Rezk, N.G.; Attia, A.-F.; El-Rashidy, M.A.; El-Sayed, A.; Hemdan, E.E.-D. An Efficient Plant Disease Recognition System
Using Hybrid Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs) for Smart IoT Applications in
Agriculture. Int. J. Comput. Intell. Syst. 2022, 15, 65. [CrossRef]
31. Sharaf, M.; Hemdan, E.E.-D.; El-Sayed, A.; El-Bahnasawy, N.A. An efficient hybrid stock trend prediction system during COVID-19
pandemic based on stacked-LSTM and news sentiment analysis. Multimed. Tools Appl. 2023, 82, 23945–23977. [CrossRef]
32. Abd El Naby, A.; Hemdan, E.E.-D.; El-Sayed, A. Deep learning approach for credit card fraud detection. In Proceedings of the
2021 International Conference on Electronic Engineering (ICEEM), Menouf, Egypt, 3–4 July 2021; IEEE: New York, NY, USA, 2021;
pp. 1–5.