A Review On Machine Learning and Deep Learning Techniques For Predicting
With the rise of big data analytics and the growing number of people accessing the Internet, users are
looking for more efficient and effective computing solutions. Cloud software has made it easier for people
to access the resources they need, and major providers such as Amazon, Microsoft, and Google lease
computational capacity so that users can get the most out of it. The rise of cloud computing has also made
distributed systems more common than ever. These services offer high-speed, reliable, and seemingly
unlimited resources. Their environments are made up of many components, such as memory units,
processors, networking devices, and sensors. Because of the complexity of this architecture, it is becoming
more difficult to maintain availability and reliability, and many users report problems with the cloud
services they use. Public clouds nevertheless give users access to resources without requiring them to worry
about hardware maintenance.
Cloud computing applications are prone to failure because they operate in large-scale environments that span
virtual and physical machines. Data centers and services may experience various types of failures, such as
disk, software, and hardware failures, which can affect their operations. For cloud users and service
providers to anticipate when and how a failure will happen, a model has to be developed that can accurately
forecast job failure. It is important to note, however, that the maintenance of the cloud environment is
carried out by the service providers. A hardware failure greatly affects both users and cloud providers, which
makes accurate forecasting very important. In this review, the efficiency of the prediction techniques
currently used in cloud computing environments is compared with the performance of their alternatives using
a variety of statistical methods. This study will support future research in this field and give cloud
computing users the opportunity to respond efficiently to predicted failures. It could also benefit the cloud
computing services of major companies such as Microsoft and Amazon.
The main objectives of the present review are to understand the characteristics of failed and successful
activities, to identify the key factors that affect the performance of cloud-based applications, to understand
the role of machine learning and deep learning models in failure prediction, and to analyze the performance
of existing models. The review can help future research identify the factors that impact the availability and
reliability of cloud services while reducing the number of failures. The study also provides scope for
developing a new failure prediction model that can analyze various failure characteristics. Future research
can apply statistical, machine learning, and neural network methods to develop an efficient cloud-based
failure prediction system.
The major contributions of this work are described as follows:
• To define the scope for understanding the factors that can improve the performance of cloud
computing services by understanding the behaviour of failure activities.
• To discuss the concepts of machine learning and deep learning and their applicability to
failure prediction systems in real-world situations.
• To summarize the potential benefits of using machine learning and deep learning algorithms for
developing an efficient failure prediction system that classifies failed and successful activities in
cloud environments.
• To summarize and discuss the potential directions of future research within the scope of failure
predictions.
The remaining sections of the article are organized as follows: Section 2 summarizes the failure
characteristics, failure prediction techniques, and existing models. Section 3 presents a comparative
analysis with respect to several attributes, followed by the conclusion and future directions in Section 4.
2. Related Work
This review evaluated several papers on the subject “failures of cloud computing infrastructure”. The
study also noted the importance of developing predictive models that classify successful and failed
activities in a timely manner by applying machine learning and deep learning models to historical data. The
review was carried out by collecting relevant data from articles published in reputable journals between
2017 and 2024. Only this recent period is covered in order to capture the latest advancements in systems
developed for the prediction of faults in cloud environments.
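[Figure: bar chart of the number of articles by year of publication. Axes: Year of Publication (x), Number
of Articles (y). Bar values as extracted, without their year labels: 29, 22, 19, 14, 13, 14, 11, 10.]
[Figure: distribution of the reviewed articles by category. Legible labels: Failure Prediction Systems, 17;
Failure Prediction Techniques, 7.]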
performed better than the previous research, with a precision of up to 11% and a recall rate of 32%.
Compared to the state-of-the-art studies, the combined approach exhibited competitive results, with a 5% F1
score and a 10% precision (Chhetri et al., 2022).
An approach to improve cloud application efficiency and resource utilization has been proposed. The authors
performed a comprehensive evaluation on three workload traces, namely the Google cluster, Trinity, and
Mustang, and identified the most accurate model among several machine learning techniques. Their analysis
revealed that requests and unsuccessful tasks are significantly correlated, and that the proposed model
performs well in terms of accuracy, recall, and F1 score. On the 29-day Google trace, the Random Forest
(RF)-based model takes longer to run, at 247.6 s, while the Decision Tree (DT) takes a relatively short
53.8 s (M. S. Jassas & Mahmoud, 2022). Another proposed evaluation method aims to predict both task failures
and job failures. The authors used the Google Cluster Trace (GCT) dataset to analyse the performance of
different traditional machine learning (TML) and deep learning (DL) models. The XGBoost classifier was
selected as the best candidate for predicting job-level failures, achieving an accuracy of 94.35% and a
score of 0.9310. At the task level, two supervised classification models performed well, one reaching an
accuracy of 89.75% and the other an F-score of 0.9154 (Tengku Asmawi et al., 2022).
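The metrics quoted throughout this review (accuracy, precision, recall or sensitivity, specificity, false
positive rate, and F1 score) all derive from the confusion matrix of a binary failed/successful classifier.
The following is a minimal sketch in plain Python, assuming labels are encoded as 1 for a failed job and 0
for a successful one. The function name and the toy labels are illustrative and are not taken from any of
the cited studies.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary-classification metrics (label 1 = failed job)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed failures
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # successes correctly passed

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity or true positive rate
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),  # false positive rate
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy example: 3 failed and 5 successful jobs (made-up labels).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Because failed jobs are usually a minority class in cluster traces, accuracy alone can look high even when
many failures are missed, which is why the studies above also report recall and F1.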
2.3 Fault Prediction Systems
Machine learning-based models are commonly used in cloud environments to detect faults and recover from
them. They are typically supervised learning methods that learn from data related to specific fault
situations, and various frameworks have been presented for developing and implementing this type of model.
This review addresses the need to monitor abnormal changes in cloud services and to understand how fault
prediction models are implemented in the cloud environment. Different machine learning and deep learning
algorithms that efficiently classify successful and failed activities are also discussed. Further studies on
the applications of deep learning and machine learning in forecasting and fault management have been carried
out (Soualhia et al., 2019; Wang et al., 2022; Yang & Kim, 2022). The method presented in (Kaur &
Vaithiyanathan, 2024) combines hybrid optimization techniques with neural networks.
One proposed framework can be used to improve the maintenance strategy of an organization by analysing
and predicting the likelihood of task failures. It can also be used in real-world environments such as a
manufacturing facility (Aboshosha et al., 2023). Another proposed framework is based on the Hidden Markov
Model and Cloud Theory, and extends this model to predict system failure. In simulations, the model
performed well and exhibited a good trade-off between computational complexity and prediction performance
(Zheng et al., 2016).
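To make the Hidden Markov Model idea concrete, the sketch below runs a normalised forward pass over a
sequence of monitoring observations and reports the probability that the system is in a degraded state
after each one. It is a generic two-state illustration and not the EHMM-CT method of Zheng et al. (2016);
the state names, observation symbols, and all probabilities are assumptions chosen for the example.

```python
def forward_filter(observations, states, start_p, trans_p, emit_p):
    """Return P(state | observations so far) after each observation."""
    beliefs = []
    prev = None
    for obs in observations:
        current = {}
        for s in states:
            if prev is None:
                prior = start_p[s]  # first step: use the initial distribution
            else:
                # Predict step: propagate the previous belief through the transitions.
                prior = sum(prev[r] * trans_p[r][s] for r in states)
            # Update step: weight by how likely this state emits the observation.
            current[s] = prior * emit_p[s][obs]
        total = sum(current.values())
        if total == 0:
            raise ValueError("observation sequence is impossible under this model")
        prev = {s: v / total for s, v in current.items()}
        beliefs.append(prev)
    return beliefs


# Hypothetical model parameters (illustrative values only).
states = ("healthy", "degraded")
start_p = {"healthy": 0.9, "degraded": 0.1}
trans_p = {
    "healthy": {"healthy": 0.9, "degraded": 0.1},
    "degraded": {"healthy": 0.2, "degraded": 0.8},
}
emit_p = {
    "healthy": {"ok": 0.8, "warn": 0.15, "error": 0.05},
    "degraded": {"ok": 0.2, "warn": 0.4, "error": 0.4},
}

beliefs = forward_filter(["ok", "warn", "error", "error"], states, start_p, trans_p, emit_p)
for step, belief in enumerate(beliefs, start=1):
    print(step, round(belief["degraded"], 3))
```

A failure alarm could then be raised whenever the degraded-state probability crosses a chosen threshold;
where to set that threshold is a design choice that trades false alarms against missed failures.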
Several studies have applied anomaly detection methods to detect faults in environments that use
software-defined networking (SDN). One of these uses cloud log data to predict and detect faults. Because it
relies on SVM, however, the method suffers from issues such as labelling and class imbalance (Garg et al.,
2019; M. El-Shamy et al., 2021). Another commonly used technique is Bi-LSTM, but it has disadvantages such
as requiring a large amount of labelled data (Gao et al., 2022; He & Lee, 2021). In (Mohammed et al., 2019),
a method that uses machine learning to improve the accuracy of failure prediction is presented. The authors
developed a model based on a variety of algorithms, namely the Support Vector Machine (SVM), Random Forest,
k-Nearest Neighbours (KNN), and Classification and Regression Trees (CART), and compared their prediction
accuracy. Another failure prediction model is based on CloudSim. It collects performance-related data from
the cloud and uses a neural network to analyse the hardware's status, predicting host failures with an
accuracy of about 89% (Davis et al., 2017).
Several critical factors related to cloud computing were addressed by developing models that improve
the prediction of cloud performance and provide better fault tolerance. The authors used a combination of
machine learning techniques, such as gradient boosting, linear regression, and decision trees, to build their
models (Kalaskar & Thangam, 2023). A novel predictive model can be developed in several steps. First,
various features are extracted from log data through a text mining algorithm, and a model is provided that
can predict the failure of the system's critical devices and identify the ones that need to be replaced. The
second step involves developing a set of models using various algorithms, such as rank-based and association
rules. The last step involves developing a forecasting model that can predict the infrastructure's health.
The time-series models are built using machine learning techniques (Patel, 2020).
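As an illustration of the first step, extracting features from log data, the sketch below counts
failure-related keywords in a batch of log lines and returns a fixed-order feature vector that a downstream
model could consume. The keyword list, the log format, and the function name are hypothetical; Patel (2020)
does not specify them in the text summarised here.

```python
import re
from collections import Counter

# Hypothetical vocabulary of failure-related stems (an assumption for this sketch).
KEYWORDS = ("error", "timeout", "retry", "fail")


def log_features(lines, keywords=KEYWORDS):
    """Count tokens that start with each keyword stem, in keyword order."""
    counts = Counter()
    for line in lines:
        # Lower-case the line and keep alphabetic tokens only (drops timestamps, ids).
        for token in re.findall(r"[a-z]+", line.lower()):
            for kw in keywords:
                if token.startswith(kw):
                    counts[kw] += 1
    return [counts[kw] for kw in keywords]


# Made-up log lines for one time window.
window = [
    "2020-01-01 12:00:01 disk0 ERROR read timeout",
    "2020-01-01 12:00:05 disk0 retrying request",
    "2020-01-01 12:00:09 disk0 request failed, retry limit reached",
]
print(dict(zip(KEYWORDS, log_features(window))))
```

Computing one such vector per time window yields a time series of features, which is the kind of input a
forecasting model for infrastructure health would need.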
The proposed framework can be used to identify and predict various faults in an infrastructure-level cloud.
It performs well at detecting non-fatal faults in the hardware and software of the system. The accuracy of
the two models is comparable: the CNN reaches 96.47% accuracy, while the Long Short-Term Memory (LSTM)
network reaches 96.88% (Soualhia et al., 2019). Another paper presents an innovative method for root cause
analysis and system failure prediction that combines the three aspects of IT observability: logs, traces,
and metrics. The method is designed to capture the temporal aspects of the data by integrating graph neural
networks (GNNs). The reported F1 scores for system failure prediction were 0.98, 0.96, and 0.97, which are
significantly better than the state of the art (Rouf et al., 2024). A further study aims to develop a
framework that can predict the likelihood of task failures in scientific workflows. Predicted failures were
compared with actual failures in Amazon EC2 and Pegasus using Naive Bayes, and the model's accuracy was
confirmed at 94% (Bala & Chana, 2015).
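For readers unfamiliar with Naive Bayes, the following is a minimal sketch of a categorical Naive Bayes
classifier with Laplace smoothing, applied to made-up task records. The two features (priority level and
expected runtime class) and the training rows are assumptions for illustration; they are not the features
or data used by Bala & Chana (2015).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model; alpha is the Laplace smoothing constant."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> number of occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values observed for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            vocab[i].add(value)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "vocab": vocab, "alpha": alpha, "n": len(labels)}


def predict_naive_bayes(model, row):
    """Return the label with the highest posterior log-probability for one row."""
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        score = math.log(count / model["n"])  # log prior
        for i, value in enumerate(row):
            seen = model["value_counts"][label][i][value]
            denom = count + model["alpha"] * len(model["vocab"][i])
            score += math.log((seen + model["alpha"]) / denom)  # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical task records: (priority, expected runtime class).
rows = [("high", "long"), ("high", "short"), ("low", "short"),
        ("low", "short"), ("high", "long"), ("low", "long")]
labels = ["fail", "fail", "ok", "ok", "fail", "ok"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("high", "long")))
print(predict_naive_bayes(model, ("low", "short")))
```

The smoothing term keeps a feature value that was never seen with a given class from forcing that class's
probability to zero, which matters when failure records are scarce.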
3. Comparative Analysis
3.1 Advancements in Failure Analysis
In recent years, advances in analysing the failures of cloud infrastructure have gained significant
attention because of their potential impact on cloud data availability, reliability, and fault tolerance.
With the emergence of cloud computing, researchers have increasingly been able to tackle challenges such as
predictive maintenance and anticipating and addressing failures, for example by detecting anomalies,
predicting outages, and optimizing cloud performance. This progression has laid the groundwork for
examining the reliability of cloud computing. Despite considerable progress, there remains a lack of
comprehensive studies on the application of machine learning-based predictive techniques in cloud
environments. Consequently, this literature review seeks to explore these gaps, examining how existing
problems are being addressed in the current research landscape.
The paper introduces Preface, a novel approach that enhances neural-network-based failure predictors to
effectively handle time series of KPI sets with variable sizes, which is essential for cloud applications that
utilize autoscaling. This is achieved by incorporating a Rectifier layer that transforms the variable KPI sets
into a fixed set of rectified-KPIs, making them compatible with the neural network's input requirements.
Experimental results demonstrate that Preface can successfully predict many harmful failures in both a
commercial application and a widely used academic exemplar, allowing for timely activation of
countermeasures to prevent negative impacts on users of the applications (Y. Li et al., 2020). The study found
that the best overall performing configuration for failure prediction is a CNN-based encoder combined with
the Logkey2vec embedding strategy. This combination demonstrated high accuracy when specific dataset
conditions were met, namely a dataset size greater than 350 or a failure percentage exceeding 7.5%. The
research systematically investigated the impact of various deep learning encoders (LSTM, BiLSTM, CNN,
and transformer) and embedding strategies (BERT and Logkey2vec) on failure prediction accuracy, revealing
that the characteristics of the dataset, such as size and failure percentage, significantly influence the
performance of the models (Hadadi et al., 2024).
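The variable-size input problem that Preface addresses can be illustrated with a small sketch. Under
autoscaling, the number of running instances (and therefore the number of KPI readings per time step)
changes, while a neural network expects a fixed-width input. The reduction below maps any number of
readings to four summary values. It is an assumption made for illustration only and is not the Rectifier
layer of the cited work, whose internals are not described in this review.

```python
from statistics import mean


def rectify_kpis(kpi_values):
    """Map a variable-length list of per-instance KPI readings to 4 fixed features.

    Output order: [minimum, mean, maximum, number of instances].
    """
    if not kpi_values:
        # No instance is running in this time step: emit a neutral placeholder.
        return [0.0, 0.0, 0.0, 0.0]
    return [
        float(min(kpi_values)),
        float(mean(kpi_values)),
        float(max(kpi_values)),
        float(len(kpi_values)),
    ]


# Made-up CPU utilisation readings: 2 replicas at time t, 4 replicas at time t+1.
series = [[0.4, 0.6], [0.3, 0.5, 0.9, 0.7]]
fixed_width = [rectify_kpis(step) for step in series]
print(fixed_width)  # every row has the same length regardless of replica count
```

Summary statistics discard per-instance detail, so a learned reduction may preserve more signal; the sketch
only shows why some fixed-width mapping is needed before the predictor.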
The proposed failure prediction algorithm based on multi-layer Bidirectional Long Short-Term Memory
(Bi-LSTM) demonstrates a significant improvement in predicting task and job failures in cloud data centers,
achieving an accuracy of 93% for task failures and 87% for job failures in trace-driven
experiments. The study highlights the importance of accurately predicting task and job failures to enhance
service reliability and availability in large-scale cloud data centers, thereby reducing resource wastage
associated with recovery from such failures (Gao et al., 2022). Another study proposes a conceptual model
for preparing, constructing, and evaluating both traditional machine learning algorithms and deep learning
algorithms specifically for predicting job and task failures in cloud systems, addressing a critical issue
faced by cloud service providers and users. Experimental results indicate that Extreme Gradient Boosting
outperforms other algorithms in job failure prediction with an accuracy of 94.35%, while Decision Tree and
Random Forest achieve the highest accuracy of 89.75% in task failure prediction, highlighting the importance
of specific features such as disk space request, CPU request, and task priority in determining prediction
outcomes (Tengku Asmawi et al., 2022).
3.2 Analysing the State-of-the-art Systems
Due to the complexity of cloud computing, many service providers are unable to prevent the failures that
commonly occur in their components. Previous studies have mainly focused on understanding the behavior
of failed jobs and identifying their causes, while others have investigated the prediction of failures. The
main objective of failure prediction is to enhance the efficiency of cloud applications by minimizing the
number of failed jobs. In this subsection, the existing literature is compared to evaluate the performance
of existing failure prediction models. The systems developed, datasets used, performance metrics, and
results achieved are shown in Table 1.
Table 1. Summary of the existing literature on failure prediction systems
| Reference | Dataset | Process | Approach | Metrics | Results |
|---|---|---|---|---|---|
| (Islam & Manivannan, 2017) | Google cluster | Analysing failure characterization | Long Short-Term Memory (LSTM) | Accuracy, True Positive Rate, False Positive Rate | 97% accuracy, 85% TPR, 11% FPR |
| (Lin et al., 2018) | Custom data collected from production cloud service system | Predicting failure proneness of a node | LSTM with Random Forest (RF) and a ranking model | Precision, Recall, F1 measure | 92.4% average precision, 63.5% average recall, 75.2% average F1 score |
| (Kalaskar & Thangam, 2023) | Google Trace | Encompassing diverse metrics | XGBoost, C5.0, AdaBoost, Average Neural Network, and Bayesian GLM | Precision, Sensitivity | 100% precision, 80% sensitivity |
| (Patel, 2020) | Log messages and maintenance records | Text mining | Artificial Neural Networks and Support Vector Machine | Sensitivity and Specificity | 69.96% sensitivity and 97.13% specificity |
| (Kaur & Vaithiyanathan, 2024) | Failure-Dataset OpenStack database | Augment the purity metrics | Yellow Saddle Goat Fish Algorithm and Grasshopper Optimization Algorithm | Purity value and STO workload | 0.95 purity value and 0.901, 0.89 STO workloads |
| (M. S. Jassas & Mahmoud, 2022) | Bitbrains | Mitigating the losses | Logistic Regression (LR) and K-Nearest Neighbour (KNN) | Accuracy | 99% for KNN, 95% for LR |
| (Faraz Bashir et al., 2022) | Google cluster | Failure characterization | XGBoost | Precision and Recall | 92% precision and 94.8% recall |
| (Gollapalli et al., 2022) | Google cluster | Multiple assessment criteria | ANN and SVM | Accuracy | 99.8% accuracy |
| (Gao et al., 2022) | Google cluster | Analysing system message logs | Bidirectional LSTM | Accuracy | 93% accuracy |
References
Alahmad, Y., Daradkeh, T., & Agarwal, A. (2021). Proactive Failure-Aware Task Scheduling Framework for Cloud
Computing. IEEE Access, 9, 106152–106168. https://doi.org/10.1109/ACCESS.2021.3101147
Amvrosiadis, G., Woo Park, J., Ganger, G. R., Gibson, G. A., Baseman, E., & DeBardeleben, N. (n.d.). On the
diversity of cluster workloads and its impact on research results.
https://www.usenix.org/conference/atc18/presentation/amvrosiadis
Bala, A., & Chana, I. (2015). Intelligent failure prediction models for scientific workflows. Expert Systems with
Applications, 42(3), 980–989. https://doi.org/10.1016/j.eswa.2014.09.014
Bambharolia, P., Bhavsar, P., & Prasad, V. (2017). Failure Prediction and Detection In Cloud Datacenters.
International Journal of Scientific & Technology Research, 6(09). www.ijstr.org
Bommala, H., Uma Maheswari, V., Aluvalu, R., & Mudrakola, S. (2023). Machine learning job failure analysis
and prediction model for the cloud environment. High-Confidence Computing, 3(4).
https://doi.org/10.1016/j.hcc.2023.100165
Chhetri, T. R., Dehury, C. K., Lind, A., Srirama, S. N., & Fensel, A. (2022). A Combined System Metrics
Approach to Cloud Service Reliability Using Artificial Intelligence. Big Data and Cognitive Computing,
6(1). https://doi.org/10.3390/bdcc6010026
Davis, N. A., Rezgui, A., Soliman, H., Manzanares, S., & Coates, M. (2017). FailureSim: A System for Predicting
Hardware Failures in Cloud Data Centers Using Neural Networks. IEEE International Conference on
Cloud Computing, CLOUD, 2017-June, 544–551. https://doi.org/10.1109/CLOUD.2017.75
Gao, J., Wang, H., & Shen, H. (2022). Task Failure Prediction in Cloud Data Centers Using Deep Learning. IEEE
Transactions on Services Computing, 15(3), 1411–1422. https://doi.org/10.1109/TSC.2020.2993728
Garg, S., Kaur, K., Kumar, N., & Rodrigues, J. J. P. C. (2019). Hybrid deep-learning-based anomaly detection
scheme for suspicious flow detection in SDN: A social multimedia perspective. IEEE Transactions on
Multimedia, 21(3), 566–578. https://doi.org/10.1109/TMM.2019.2893549
Gaur, D. K., & Mahalkari, A. (n.d.). Effective Fault prediction using classifier analysis for cloud environment.
Gollapalli, M., AlMetrik, M. A., AlNajrani, B. S., AlOmari, A. A., AlDawoud, S. H., AlMunsour, Y. Z., Abdulqader,
M. M., & Aloup, K. M. (2022). Task Failure Prediction Using Machine Learning Techniques in the Google
Cluster Trace Cloud Computing Environment. Mathematical Modelling of Engineering Problems, 9(2),
545–553. https://doi.org/10.18280/mmep.090234
Hadadi, F., Dawes, J. H., Shin, D., Bianculli, D., & Briand, L. (2024). Systematic Evaluation of Deep Learning
Models for Log-based Failure Prediction. Empirical Software Engineering, 29(5).
https://doi.org/10.1007/s10664-024-10501-4
He, Z., & Lee, R. B. (2021). CloudShield: Real-time Anomaly Detection in the Cloud.
http://arxiv.org/abs/2108.08977
Islam, T., & Manivannan, D. (2017). Predicting Application Failure in Cloud: A Machine Learning Approach.
Proceedings - 2017 IEEE 1st International Conference on Cognitive Computing, ICCC 2017, 24–31.
https://doi.org/10.1109/IEEE.ICCC.2017.11
Jassas, M., & Mahmoud, Q. H. (n.d.). Failure Analysis and Characterization of Scheduling Jobs in Google
Cluster Trace.
Jassas, M. S., & Mahmoud, Q. H. (2021, April 15). A Failure Prediction Model for Large Scale Cloud
Applications using Deep Learning. 15th Annual IEEE International Systems Conference, SysCon 2021 -
Proceedings. https://doi.org/10.1109/SysCon48628.2021.9447141
Jassas, M. S., & Mahmoud, Q. H. (2022). Analysis of Job Failure and Prediction Model for Cloud Computing
Using Machine Learning. Sensors, 22(5). https://doi.org/10.3390/s22052035
Kalaskar, C., & Thangam, S. (2023). Fault Tolerance of Cloud Infrastructure with Machine Learning.
Cybernetics and Information Technologies, 23(4), 26–50. https://doi.org/10.2478/cait-2023-0034
Karthik, T. S., & Kamala, B. (2021). Cloud based AI approach for predictive maintenance and failure prevention.
Journal of Physics: Conference Series, 2054(1). https://doi.org/10.1088/1742-6596/2054/1/012014
Kaur, R., & Vaithiyanathan, R. (2024). Hybrid YSGOA and neural networks-based software failure prediction in
cloud systems. Scientific Reports, 14(1). https://doi.org/10.1038/s41598-024-67107-5
Kumar, A. (n.d.). AI-Driven Innovations in Modern Cloud Computing. Computer Science and Engineering,
2024(6), 129–134. https://doi.org/10.5923/j.computer.20241406.02
Li, Y., Jiang, Z. M. J., Li, H., Hassan, A. E., He, C., Huang, R., Zeng, Z., Wang, M., & Chen, P. (2020). Predicting
Node Failures in an Ultra-Large-Scale Cloud Computing Platform. ACM Transactions on Software
Engineering and Methodology, 29(2). https://doi.org/10.1145/3385187
Li, Z., Liu, L., & Kong, D. (2019). Virtual Machine Failure Prediction Method Based on AdaBoost-Hidden Markov
Model. Proceedings - 2019 International Conference on Intelligent Transportation, Big Data and Smart
City, ICITBS 2019, 700–703. https://doi.org/10.1109/ICITBS.2019.00173
Liang, J., & Chen, M. (n.d.). AI-Driven Predictive Maintenance for Cloud Infrastructure: Advancements,
Challenges, and Future Directions.
Lin, Q., Hsieh, K., Dang, Y., Zhang, H., Sui, K., Xu, Y., Lou, J. G., Li, C., Wu, Y., Yao, R., Chintalapati, M., & Zhang,
D. (2018). Predicting node failure in cloud service systems. ESEC/FSE 2018 - Proceedings of the 2018
26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the
Foundations of Software Engineering, 480–490. https://doi.org/10.1145/3236024.3236060
M. El-Shamy, A., A. El-Fishawy, N., Attiya, G., & A. A. Mohamed, M. (2021). Anomaly Detection and Bottleneck
Identification of The Distributed Application in Cloud Data Center using Software–Defined Networking.
Egyptian Informatics Journal, 22(4), 417–432. https://doi.org/10.1016/j.eij.2021.01.001
Mesbahi, M. R., Rahmani, A. M., & Hosseinzadeh, M. (2019). Dependability analysis for characterizing Google
cluster reliability. International Journal of Communication Systems, 32(16).
https://doi.org/10.1002/dac.4127
Mohammed, B., Awan, I., Ugail, H., & Muhammad, Y. (n.d.). Failure Prediction using Machine Learning in a
Virtualised HPC System and application.
Ng’ang’a, D. N., Cheruiyot, W., & Njagi, D. (2023). A Machine Learning Framework for Predicting Failures in
Cloud Data Centers - A case of Google Cluster - Azure Clouds and Alibaba Clouds.
https://doi.org/10.21203/rs.3.rs-3326876/v1
Padmakumari, P., & Umamakeswari, A. (2019). Task Failure Prediction using Combine Bagging Ensemble
(CBE) Classification in Cloud Workflow. Wireless Personal Communications, 107(1), 23–40.
https://doi.org/10.1007/s11277-019-06238-9
Patel, S. S. (2020). Forecasting health of complex IT systems using system log data. Journal of Banking and
Financial Technology, 4(1), 27–35. https://doi.org/10.1007/s42786-019-00011-z
Prajapati, V., & Thakkar, V. (2020). A Survey on Failure Prediction Techniques in Cloud Computing.
EasyChair Preprint.
Proceedings of the 9th International Conference On Cloud Computing, Data Science and Engineering:
Confluence 2019 : 10-11 January 2019, Uttar Pradesh, India. (2019a). IEEE.
Rawat, A., Sushil, R., Agarwal, A., & Sikander, A. (2021). A New Approach for VM Failure Prediction using
Stochastic Model in Cloud. IETE Journal of Research, 67(2), 165–172.
https://doi.org/10.1080/03772063.2018.1537814
Rouf, R., Rasolroveicy, M., Litoiu, M., Nagar, S., Mohapatra, P., Gupta, P., & Watts, I. (2024). InstantOps: A Joint
Approach to System Failure Prediction and Root Cause Identification in Microservices Cloud-Native
Applications. ICPE 2024 - Proceedings of the 15th ACM/SPEC International Conference on Performance
Engineering, 119–129. https://doi.org/10.1145/3629526.3645047
Ruan, L., Xu, X., Xiao, L., Yuan, F., Li, Y., & Dai, D. (2019). A comparative study of large-scale cluster workload
traces via multiview analysis. Proceedings - 21st IEEE International Conference on High Performance
Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE
International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019, 397–404.
https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00067
Soualhia, M., Fu, C., & Khomh, F. (2019). Infrastructure fault detection and prediction in edge cloud
environments. Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, SEC 2019, 222–235.
https://doi.org/10.1145/3318216.3363305
Sun, Y., Xu, L., Li, Y., Guo, L., Ma, Z., & Wang, Y. (2018). Utilizing Deep Architecture Networks of VAE in Software
Fault Prediction. 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous
Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking,
Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), 870–877.
https://doi.org/10.1109/BDCloud.2018.00129
Swain, D., Kumar, P., Tushar, P., & Editors, A. (n.d.). Machine Learning and Information Processing:
Proceedings of ICMLIP 2020. Advances in Intelligent Systems and Computing, 1311.
http://www.springer.com/series/11156
Tengku Asmawi, T. N., Ismail, A., & Shen, J. (2022). Cloud failure prediction based on traditional machine
learning and deep learning. Journal of Cloud Computing, 11(1).
https://doi.org/10.1186/s13677-022-00327-0
Uddin Ahmed, K. M., Alvarez, M., & Bollen, M. H. J. (2020). Characterizing failure and repair time of servers in a
hyper-scale data center. IEEE PES Innovative Smart Grid Technologies Conference Europe,
2020-October, 660–664. https://doi.org/10.1109/ISGT-Europe47291.2020.9248891
Wang, B., Hua, Q., Zhang, H., Tan, X., Nan, Y., Chen, R., & Shu, X. (2022). Research on anomaly detection and
real-time reliability evaluation with the log of cloud platform. Alexandria Engineering Journal, 61(9),
7183–7193. https://doi.org/10.1016/j.aej.2021.12.061
Yang, H., & Kim, Y. (2022). Design and Implementation of Machine Learning-Based Fault Prediction System in
Cloud Infrastructure. Electronics (Switzerland), 11(22). https://doi.org/10.3390/electronics11223765
Zheng, W., Wang, Z., Huang, H., Meng, L., & Qiu, X. (2016). EHMM-CT: An online method for failure prediction
in cloud computing systems. KSII Transactions on Internet and Information Systems, 10(9), 4087–4107.
https://doi.org/10.3837/tiis.2016.09.004