AI-Driven Anomaly Detection in Network Monitoring
AI-Driven Anomaly Detection in Network Monitoring
AI-Driven Anomaly Detection in Network Monitoring
USA
ABSTRACT
Effective network monitoring is crucial for maintaining performance and security. Traditionally, tools use threshold-based methods for anomaly detection
but struggle to detect complex patterns in modern dynamic networks. This paper investigates leveraging machine learning to augment monitoring
capabilities. Key network monitoring tools are described along with how they currently handle anomaly detection. Machine learning techniques for
developing predictive models from historical data are then discussed. A framework for integrating trained models as add-ons to existing tools is proposed.
These AI-driven approaches are shown to provide more accurate and automated anomaly detection compared to legacy techniques.
*Corresponding author
Aakash Aluwala, USA.
Received: June 01, 2024; Accepted: June 06, 2024; Published: June 14, 2024
Keywords: Anomaly Detection, Intrusion Detection, Machine include LOF, AutoCloud clustering, TEDA clustering, Bayesian
Learning, Network Monitoring, Network Security models and HTM. Discussed deep learning techniques including
convolutional and LSTM, autoencoder and SNN. It also handles
Introduction challenges of evolving data, high dimensionality, online learning
Reasons for network monitoring include compliance with service and performance.
level agreements, improving performance, and increasing security.
Network administrators use tools for monitoring to identify
problems and diagnose them [1]. Historically, these tools work
with rules and thresholds to analyze traffic and find deviations.
However, modern networks deal with enormous, heterogeneous
traffic originating from numerous sources. They also note that
new threats reemerge continuously. This makes it difficult for
traditional tools to maintain pace and identify such patterns.
to go with anomalies because they are easier to isolate. The current from past occurrences [17]. For example, time series forecasting
literature has shown that iForest is efficiency in detecting network models such as ARIMA can help forecast future metric values.
intrusions and infrastructural difficulties [7]. However, finer and That is why the comparison of actual and predicted values allows
clearly distinguishing between the anomalies at the boundary of for detecting deviations. Features are obtained from flow data by
data points is a demanding process. constructing network graphs, and these features are used as inputs
to the ML classifier to train for normal behaviors. This makes
Autoencoders are deep learning models that learn efficient it easier to identify complex deviations that would otherwise
structures for data encoding and decoding. They are generally not be easily identified by basic value thresholds. It also allows
applied to detect anomalies based on the reconstruction residuals tools to become more self-improving over their lifetimes through
[8]. This approach was pioneered by Preethi D, et al. in network subsequent model training on additional data. This increases their
intrusion detection where the autoencoder was trained on normal efficiency, especially in a rapidly changing environment.
traffic and any instance that yielded high error was flagged. They
obtained 98% accuracy on KDD Cup 1999 data [9]. However, AI-Driven Approaches for Anomaly Detection
these methods depend on large data sets to capture significant Machine learning and AI techniques can be leveraged to develop
interactions in the high-dimensional network metrics. RNNs powerful anomaly detection models for network monitoring
have also been used with some level of success by modelling the systems [18].
temporal sequences in the network metrics. Mou L, et al. used the
time series traffic attributes and applied long short-term memory
(LSTM) RNN for anomaly detection [10]. Similarly, Makineedi
SH, et al. employed RNN to train normal TCP connection patterns
and used the same to detect port scans and SYN floods [11].
However, constrained computational capability remains an issue
for real-time implementation of deep learning techniques.
from historical data. enables easy updates and experimentation with different ML
• Detect previously unknown anomalies without explicit rules techniques [29].
defined for each case.
• Continuously improve over time with exposure to more data. Discussion of Improvements, Challenges and Future Work
• Anticipate emerging issues based on subtle shifts in network While the proposed AI module architecture provides a practical
usage. approach to incorporating machine learning into network
monitoring, there are still opportunities for improvement. One
By developing custom models tailored to each organization’s challenge is acquiring enough high-quality historical data to train
network environment, AI enhances the accuracy and automation of accurate models. Networks are constantly evolving, so ensuring
anomaly detection for proactive network monitoring and defense collection of fully representative normal data over long periods.
[24]. Outdated training data could impair detection ability [30]. Feature
engineering also requires domain expertise to identify the most
Solution and Implementation pertinent indicators for different network entities and anomaly
Proposed Architecture for Adding AI Modules to Monitoring types. Irrelevant features could introduce noise.
Systems
To add AI-driven anomaly detection capabilities to existing network When retraining models, balancing exploration of new techniques
monitoring systems, a modular architecture can be implemented with maintaining consistency is difficult. Frequent changes could
where machine learning models are developed as plug-ins or add- reduce stability of detections [31]. Integration of modules may
ons. The proposed architecture involves developing AI modules affect existing monitoring workflows and interfaces. Testing is
that interface with the monitoring system via APIs or by accessing needed to validate minimal disruption to operations and maintain/
the system database [25]. The modules will have the following improve productivity. Future work involves developing self-
key components: supervised and online learning approaches. Instead of batch
training, models could continuously update based on recent data
and feedback to autonomously track changes [32]. Ensemble
and multi-model techniques combining clustering, isolation, and
reconstruction algorithms may provide more robust detection
over individual models.
An Auto encoder neural network model was developed using To calculate these, all model-generated alerts on the test VMs
Python and Keras for feature extraction and dimensionality were recorded along with known injection times of attacks [43].
reduction. The model took as input a set of 30 statistical features True/false positives were labeled based on occurring within 24
summarizing the traffic and metrics per host over each 5-minute hours of attacks or not. For detection rate, an alert within 24 hours
interval [38]. It was trained for 100 epochs on the first 3 months of of an attack constituted a true positive. false positives flagged
normal data to learn underlying patterns and dependencies between outside known events [44]. Mean time was averaged only over
features. The model achieved a 93% reconstruction accuracy on true positive detections.
a validation set, demonstrating it captured characteristic patterns
in the input space. A Zabbix plug-in was created using their PHP The auto encoder and isolation forest models integrated with
API to retrieve live monitoring data and run that data through the Zabbix achieved average detection rates of 89% and 83%
trained auto encoder. The mean-squared-error between inputs and respectively across attack types, compared to 71% for baseline
outputs was used as an anomaly score. The models were tested threshold monitoring [45]. False positive rates for the models were
on the last 3 months of data, where synthetic attacks including also significantly lower at 2.4% and 3.1%, whereas naive threshold
DDoS, port scans and crashed services were injected weekly to resulted in an unacceptable 12.4% error rate. Mean detection time
simulate anomalies [33]. was reduced from over 30 hours with basic alerts to under 6 hours
on average when using integrated ML models to provide early
Performance was evaluated based on the ability to detect these warnings. This case study demonstrated how academic testing
attacks within a day and achieve low false positive rates. Results protocols can validate AI-driven approaches deliver quantifiable
showed the auto encoder identified 89% of attacks within 24 improvements over traditional techniques for practical network
hours and had a 2.4% overall false positive rate. Compared to monitoring deployments.
default threshold-based alerting in Zabbix on individual metrics,
the model significantly improved timeliness and accuracy of Conclusion
anomaly detections across the testbed [39]. This proved the In conclusion, this research aimed to examine how AI and machine
concept of integrating pre-trained ML models as plugins to gain learning techniques can improve anomaly detection capabilities
the advantages of more adaptive, intelligent monitoring [40]. Such for network monitoring systems. After outlining the limitations of
studies help demonstrate the benefits and practical challenges of traditional rule-based monitoring approaches, various supervised
adopting AI in real network operations. Overall, this case study and unsupervised machine learning algorithms were explored for
illustrates how an academic approach of model customization, developing robust anomaly detection models tailored to network
experimentation and evaluation can be applied to real world tools environments. Specifically, clustering, isolation forest, auto
for enhanced anomaly detection. encoders and RNN models were identified as commonly used
techniques.
Testing Methodology and Sample Results: Accuracy, False
Positives To demonstrate integration of the ML models with existing
To properly evaluate the performance of the AI-augmented monitoring tools, an architecture was proposed to add AI modules
anomaly detection models integrated with Zabbix, a rigorous as plug-ins. A case study featuring an auto encoder model integrated
testing methodology was designed and sample results analyzed. with the Zabbix tool showed improved detection rates of synthetic
The test environment consisted of the 20 VM testbed continuously network attacks compared to basic threshold monitoring. Further
monitored by Zabbix over 6 months. 10% of the last 3 months testing methodology and sample results validated the AI approach
of normal data was held out as the validation set for final model can significantly reduce false positives while enhancing speed and
evaluation. Synthetic attacks simulating common classes of accuracy of anomaly identification.
anomalies were scripted to be periodically injected into VMs
over weeks [41]. These included DDoS floods, port scans, crashing While the potential of AI-driven monitoring was exhibited,
services, and abnormal traffic/resource usage. The effectiveness challenges around acquiring sufficient representative training
metrics used to compare the AI models versus baseline Zabbix data, feature engineering expertise, and balancing model changes
alerting were: were also discussed. Overall, continued research seeking to
• Attack Detection Rate: Percentage of attacks successfully address current limitations through techniques like self-supervised
detected within 24 hours (Figure 5). learning, ensemble modeling and semi-supervised approaches
• False Positive Rate: Alerts flagged in error during normal could help strengthen practical adoption and management of
operations modern, complex networks through more adaptive intelligent
• Mean Time to Detect: Average time taken to detect attacks monitoring [46,47].
References
1. Mariano Hernández D, Hernández Callejo L, Zorita Lamadrid
A, Duque Pérez O, García FS (2021) A review of strategies
for building energy management system: Model predictive
control, demand side management, optimization, and fault
detect & diagnosis. Journal of Building Engineering 33:
101692.
2. Abbasi M, Shahraki A, Taherkordi A (2021) Deep learning for
network traffic monitoring and analysis (NTMA): A survey.
Computer Communications 170: 19-41.
3. Alamri R, Murugesan RK, Man M, Abdulateef AF, Al
Sharafi MA, et al. (2021) A review of machine learning and
Figure 5: Attack Detection Rate [42]. deep learning techniques for anomaly detection in IoT data.
Applied Sciences 11: 5320. techniques for suspicious behavior recognition in intelligent
4. Wang L, Yang J, Xu X, Wan PJ (2021) Mining network traffic surveillance system. International Journal of Information
with the k-means clustering algorithm for stepping-stone Technology 14: 397-410.
intrusion detection. Wireless Communications and Mobile 21. Oyelade Jelili, Itunuoluwa Isewon, Olufunke Oladipupo,
Computing 1-9. Onyeka Emebo, Zacchaeus Omogbadegun, et al. (2019) Data
5. Kim Y, Vasarhelyi MA (2024) Anomaly detection with the clustering: Algorithms and its applications. In 2019 19th
density based spatial clustering of applications with noise International Conference on Computational Science and Its
(DBSCAN) to detect potentially fraudulent wire transfers. Applications (ICCSA) 71-81.
The International Journal of Digital Accounting Research 22. Gonzalez Francisco J, Maciej Balajewicz (2018) Deep
24: 57-91. convolutional recurrent autoencoders for learning
6. Togbe MU, Barry M, Boly A, Chabchoub Y, Chiky R, et al. low-dimensional feature dynamics of fluid systems.
(2020) Anomaly detection for data streams based on isolation arXiv preprint arXiv:1808.01346 https://arxiv.org/
forest using scikit-multiflow. In Computational Science and abs/1808.01346#:~:text=The%20deep%20convolutional%20
Its Applications–ICCSA 2020: 20th International Conference autoencoder%20returns,in%20a%20computationally%20
Cagliari Italy 20: 15-30. efficient%20manner.
7. Laskar MTR, Huang JX, Smetana V, Stewart C, Pouw K, et 23. Garg Astha, Wenyu Zhang, Jules Samaran, Ramasamy
al. (2021) Extending isolation forest for anomaly detection in Savitha, Chuan Sheng Foo (2021) An evaluation of anomaly
big data via K-means. ACM Transactions on Cyber-Physical detection and diagnosis in multivariate time series. IEEE
Systems (TCPS) 5: 1-26. Transactions on Neural Networks and Learning Systems
8. Torabi H, Mirtaheri SL, Greco S (2023) Practical autoencoder 33: 2508-2517.
based anomaly detection by using vector reconstruction error. 24. Bécue Adrien, Isabel Praça, João Gama (2021) Artificial
Cybersecurity 6: 1. intelligence, cyber-threats and Industry 4.0: Challenges and
9. Preethi D, Khare N (2021) Sparse auto encoder driven opportunities. Artificial Intelligence Review 54: 3849-3886.
support vector regression based deep learning model for 25. Maimó Lorenzo Fernández, Ángel Luis Perales Gómez, Félix
predicting network intrusions. Peer-to-Peer Networking and J García Clemente, Manuel Gil Pérez, Gregorio Martínez
Applications 14: 2419-2429. Pérez (2018) A self-adaptive deep learning-based system for
10. Mou L, Zhao P, Xie H, Chen Y (2019) T-LSTM: A long anomaly detection in 5G networks. Ieee Access 6: 7700-7712.
short-term memory neural network enhanced by temporal 26. Zhang Wei, Xiaowei Dong, Huaibao Li, Jin Xu, Dan Wang
information for traffic flow prediction. Ieee Access 7: 98053- (2020) Unsupervised detection of abnormal electricity
98060. consumption behavior based on feature engineering. IEEE
11. Makineedi SH, Chowdhury S, Manivannan V (2022) Artificial Access 8: 55483-55500.
intelligence based real time packet analysing to detect DoS 27. Kuo Rita, Cheng Li Chen, Zhong Xiu Lu, Maiga Chang,
attacks. In International Conference on Image Processing and Hung Yi Chang (2019) Educational reward information
Capsule Networks. Cham: Springer International Publishing communication API (ERIC API): A preliminary study result.
305-320. Revista Produção e Desenvolvimento 5.
12. Saeed MM, Saeed RA, Abdelhaq M, Alsaqour R, Hasan 28. Saurav Sakti, Pankaj Malhotra, Vishnu TV, Narendhar
MK, et al. (2023) Anomaly detection in 6G networks using Gugulothu, Lovekesh Vig, et al. (2018) Online anomaly
machine learning methods. Electronics 12: 3300. detection with concept drift adaptation using recurrent neural
13. Chahal D, Kharb L, Choudhary D (2019) Performance networks. In Proceedings of the ACM india joint international
analytics of network monitoring tools. Int. J Innov Technol conference on data science and management of data 78-87.
Explor Eng IJITEE 8. 29. Rawindaran Nisha, Ambikesh Jayal, Edmond Prakash,
14. Qin T, Li C, Sun S, Liu G (2024) A device information- Chaminda Hewage (2021) Cost benefits of using machine
centered accelerator control network management system. learning features in NIDS for cyber security in UK small
Radiation Detection Technology and Methods 1-17. medium enterprises (SME). Future Internet 13: 186.
15. Calderon G, del Campo G, Saavedra E, Santamaría A (2023) 30. Papamartzivanos Dimitrios, Félix Gómez Mármol, Georgios
Monitoring Framework for the Performance Evaluation Kambourakis (2019) Introducing deep learning self-adaptive
of an IoT Platform with Elasticsearch and Apache Kafka. misuse network intrusion detection systems. IEEE access 7:
Information Systems Frontiers 1-17. 13546-13560.
16. Blázquez García A, Conde A, Mori U, Lozano JA (2021) A 31. Zhou Tianyi, Shengjie Wang, Jeff Bilmes (2020) Robust
review on outlier/anomaly detection in time series data. ACM curriculum learning: from clean label detection to
Computing Surveys (CSUR) 54: 1-33. noisy label self-correction. In International Conference
17. Bharadiya JP (2023) Machine learning and AI in business on Learning Representations https://openreview.net/
intelligence: Trends and opportunities. International Journal forum?id=lmTWnm3coJJ.
of Computer (IJC) 48: 123-134. 32. Zhang Kexin, Qingsong Wen, Chaoli Zhang, Rongyao Cai,
18. Diro Abebe, Naveen Chilamkurti, Van-Doan Nguyen, Will Ming Jin, et al. (2024) Self-supervised learning for time
Heyne (2021) A comprehensive study of anomaly detection series analysis: Taxonomy, progress, and prospects. IEEE
schemes in IoT networks using machine learning algorithms. Transactions on Pattern Analysis and Machine Intelligence
Sensors 21: 8320. https://arxiv.org/abs/2306.10125.
19. Saeed Mamoon M, Rashid A Saeed, Maha Abdelhaq, Raed 33. Choi Kukjin, Jihun Yi, Changhwa Park, Sungroh Yoon (2021)
Alsaqour, Mohammad Kamrul Hasan, et al. (2023) Anomaly Deep learning for anomaly detection in time-series data:
detection in 6G networks using machine learning methods. Review, analysis, and guidelines. IEEE access 9: 120043-
Electronics 12: 3300. 120065.
20. Verma Kamal Kant, Brij Mohan Singh, Amit Dixit (2022) 34. Chanal Poornima M, Mahabaleshwar S Kakkasageri (2020)
A review of supervised and unsupervised machine learning Security and privacy in IoT: a survey. Wireless Personal
Communications 115: 1667-1693. 42. Sharma Kapil, Satish Saini, Shailja Sharma, Hardeep Singh
35. Noor Ayman Ibrahem (2021) Real-Time QoS Monitoring and Kang, Mohamed Bouye, et al. (2022) Big Data Analytics
Anomaly Detection on Microservice-based Applications in Model for Distributed Document Using Hybrid Optimization
Cloud-Edge Infrastructure. PhD DISS Newcastle University with-Means Clustering. Wireless Communications and
https://theses.ncl.ac.uk/jspui/handle/10443/5497. Mobile Computing 2022.
36. Gunawi Haryadi S, Riza O Suminto, Russell Sears, Casey 43. Guerra Jorge Luis, Carlos Catania, Eduardo Veas (2022)
Golliher, Swaminathan Sundararaman, et al. (2018) Fail-slow Datasets are not enough: Challenges in labeling network
at scale: Evidence of hardware performance faults in large traffic. Computers & Security 120: 102810.
production systems. ACM Transactions on Storage (TOS) 44. Alahmadi Bushra A, Louise Axon, Ivan Martinovic (2022)
14: 1-26. 99% False Positives: A Qualitative Study of {SOC} Analysts’
37. (2024) Server Monitoring. https://www.zabbix.com/server_ Perspectives on Security Alarms. In 31st USENIX Security
monitoring. Symposium (USENIX Security 22) 2783-2800.
38. Saha Avirup, Niloy Ganguly, Sandip Chakraborty, Abir De 45. Singh Sachin Kumar, Shreeman Gautam, Cameron Cartier,
(2019) Learning network traffic dynamics using temporal Sameer Patil, Robert Ricci (2024) Where the Wild Things
point process. In IEEE INFOCOM 2019-IEEE Conference Are: {Brute-Force} {SSH} Attacks in The Wild and How
on Computer Communications 1927-1935. to Stop Them. In 21st USENIX Symposium on Networked
39. Trihinas Demetris Y (2018) Low-cost approximate and Systems Design and Implementation (NSDI 24) 1731-1750.
adaptive monitoring techniques. https://gnosis.library.ucy. 46. Kahveci Sinan, Bugra Alkan, Ahmad Mus’ab H, Bilal Ahmad,
ac.cy/handle/7/50205?show=full. Robert Harrison (2022) An end-to-end big data analytics
40. Kolides Adam, Alyna Nawaz, Anshu Rathor, Denzel Beeman, platform for IoT-enabled smart factories: A case study of
Muzammil Hashmi, et al. (2023) Artificial intelligence battery module assembly system for electric vehicles. Journal
foundation and pre-trained models: Fundamentals, of Manufacturing Systems 63: 214-223.
applications, opportunities, and social impacts. Simulation 47. Bhardwaj Aanshi, Veenu Mangat, Renu Vig, Subir Halder,
Modelling Practice and Theory 126: 102754. Mauro Conti (2021) Distributed denial of service attacks in
41. Rosso Martin, Michele Campobasso, Ganduulga Gankhuyag, cloud: State-of-the-art of scientific and commercial solutions.
Luca Allodi (2020) Saibersoc: Synthetic attack injection to Computer Science Review 39: 100332.
benchmark and evaluate the performance of security operation
centers. In Proceedings of the 36th Annual Computer Security
Applications Conference 141-153.