Collaborative Federated Learning-Based Model for Alert Correlation and Attack Scenario Recognition
Abstract
1. Introduction
- Perform an extensive analysis of alert data from various datasets to model sophisticated multi-stage attacks and reconstruct intrusion scenarios.
- Design a security system that uses collaborative federated learning models to detect cross-correlation between alerts generated by different sources in various formats. Zero-day and novel attacks can be predicted by building a high-level abstraction of attacker actions against protected systems.
- Evaluate the proposed system on a benchmark dataset using several performance and accuracy metrics.
2. Related Work
3. Materials and Methods
3.1. Modeling Multi-Stage Attack Scenario
3.2. Alert Correlation Model
3.3. Dataset Description
3.4. Implementation
3.4.1. Preprocessing
3.4.2. Feature Selection
3.4.3. Data Splitting, Training, and Testing
3.4.4. Centralized Learning Models
3.4.5. Federated Learning Model
Algorithm 1: CNN_FL (K clients, model parameters, global model 𝓜g)
Input: Dataset UNSW-NB15; local dataset of each client k; K: number of clients; i: number of rounds
Output: Model parameters; global model 𝓜g
Server:
    Initialize 𝓜g    # initialize the global model parameters
    for each round i do    # repeat until 𝓜g converges
        select the set of K clients    # repeat optimization of global model 𝓜g
        for each client k do
            distribute 𝓜g to client k and call UpdateClient
        aggregate the local models 𝓜k    # update the global model 𝓜g
UpdateClient:
    for each local round do
        Train(𝓜k, k, )    # train local model 𝓜k
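The server and client loops of Algorithm 1 can be illustrated with a minimal federated averaging sketch. This is not the authors' CNN implementation: the local model is a plain linear regressor standing in for the CNN, the names `local_train` and `fed_avg` are hypothetical, the synthetic data is invented for the example, and weighting each client by its sample count is an assumption (the usual FedAvg choice) that the source does not spell out.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """Hypothetical stand-in for Train(): a few gradient steps of
    least-squares regression on one client's local data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fed_avg(client_data, n_rounds=20, dim=3):
    """Server loop: broadcast the global model, train locally on each
    client, then average the local models weighted by sample count."""
    global_w = np.zeros(dim)  # initialize the global model
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    for _ in range(n_rounds):
        # Each client starts from the current global model.
        local_ws = [local_train(global_w, X, y) for X, y in client_data]
        # Aggregate: weighted average of the local models (assumption).
        global_w = np.average(local_ws, axis=0, weights=sizes)
    return global_w

# Synthetic, noiseless data split over four clients (illustration only).
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for n in (200, 150, 100, 50):
    X = rng.normal(size=(n, 3))
    clients.append((X, X @ true_w))

w = fed_avg(clients)
```

Only model parameters cross the client boundary in this loop; the raw records in `clients` are never pooled, which is the property that motivates the federated setting.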
Algorithm 2: Experimental implementation of the proposed CNN_FL model
START
import the TensorFlow Federated library as tff
Declare No_of_Clients, No_of_rounds, lr, labels
for c in range(No_of_Clients):
    Read data(c)
    Preprocessing(c)
    Splitting(data)
Call function createModel()
trainer = tff.learning.build_federated_averaging_process(createModel, optimizer)
build CNN model(trainlist[0])
state = trainer.initialize()
for j in range(No_of_rounds):
    state, metrics = trainer.next(state, train)
    print(Loss = metrics['train']['loss'], Accuracy = metrics['train']['accuracy'])
END
3.5. Experimental Setup
3.6. Evaluation
- Accuracy is the ratio of correct classifications to the total number of entries, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:
  $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
- Precision is the ratio of correctly predicted attack samples to the total number of samples predicted as attacks:
  $$\text{Precision} = \frac{TP}{TP + FP}$$
- Recall is the ratio of correctly classified attack samples to the total number of samples that should have been identified as attacks:
  $$\text{Recall} = \frac{TP}{TP + FN}$$
- The F1-score combines Precision and Recall by taking their harmonic mean:
  $$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
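For a five-class problem such as the APT stages, the four metrics above are typically computed per class from a confusion matrix and then averaged. The sketch below assumes support-weighted averaging, which the source does not state; the function name `weighted_metrics` and the small 3-class matrix are hypothetical. Under that assumption, weighted recall is algebraically equal to accuracy, which is consistent with the identical ACC and REC columns in the centralized results table.

```python
def weighted_metrics(cm):
    """Compute accuracy and support-weighted precision, recall, and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                # missed samples of class c
        fp = sum(cm[r][c] for r in range(n_classes)) - tp   # wrongly assigned to class c
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        support = tp + fn                                   # true samples of class c
        precision += support * p / total
        recall += support * r / total
        f1 += support * f / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical 3-class confusion matrix, for illustration only.
cm = [[50, 5, 5],
      [10, 80, 10],
      [0, 5, 35]]
m = weighted_metrics(cm)
```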
4. Results and Discussion
4.1. Centralized Model Results
4.2. FL Model Results
4.3. Comparison of the Results of Work with Related Works
5. Conclusions and Further Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Bhattacharya, S.; Maddikunta, P.K.; Kaluri, R.; Singh, S.; Gadekallu, T.R.; Alazab, M.; Tariq, U. A Novel PCA-Firefly Based XGBoost Classification Model for Intrusion Detection in Networks Using GPU. Electronics 2020, 9, 219.
2. Preuveneers, D.; Rimmer, V.; Tsingenopoulos, I.; Spooren, J.; Joosen, W.; Ilie-Zudor, E. Chained Anomaly Detection Models for Federated Learning: An Intrusion Detection Case Study. Appl. Sci. 2018, 8, 2663.
3. Bhatti, D.G.; Virparia, P.V. Soft Computing-Based Intrusion Detection System With Reduced False Positive Rate. In Design and Analysis of Security Protocol for Communication; Wiley: Hoboken, NJ, USA, 2020; pp. 109–139.
4. Anwar, S.; Mohamad Zain, J.; Zolkipli, M.F.; Inayat, Z.; Khan, S.; Anthony, B.; Chang, V. From Intrusion Detection to an Intrusion Response System: Fundamentals, Requirements, and Future Directions. Algorithms 2017, 10, 39.
5. Jadidi, Z.; Hagemann, J.; Quevedo, D. Multi-step attack detection in industrial control systems using causal analysis. Comput. Ind. 2022, 142, 103741.
6. Sharma, A.; Gupta, B.B.; Singh, A.K.; Saraswat, V.K. A novel approach for detection of APT malware using multi-dimensional hybrid Bayesian belief network. Int. J. Inf. Secur. 2023, 22, 119–135.
7. Manzoor, E.; Milajerdi, S.M.; Akoglu, L. Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1035–1044.
8. Ansari, M.S.; Bartos, V.; Lee, B. Shallow and Deep Learning Approaches for Network Intrusion Alert Prediction. Procedia Comput. Sci. 2020, 171, 644–653.
9. Zhang, J.; Zhao, Y.; Wang, J.; Chen, B. FedMEC: Improving Efficiency of Differentially Private Federated Learning via Mobile Edge Computing. Mob. Netw. Appl. 2020, 25, 2421–2433.
10. Michie, D.; Spiegelhalter, D.J.; Taylor, C.C. Machine Learning, Neural and Statistical Classification; Ellis Horwood Series in Artificial Intelligence: New York, NY, USA, 1994; Volume 13.
11. Liu, H.; Lang, B. Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey. Appl. Sci. 2019, 9, 4396.
12. Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379.
13. Martin, L. Cyber Kill Chain. 2014. Available online: http://cyber.lockheedmartin.com/ (accessed on 27 July 2023).
14. Rahman, S.A.; Tout, H.; Talhi, C.; Mourad, A. Internet of Things Intrusion Detection: Centralized, On-Device, or Federated Learning? IEEE Netw. 2020, 34, 310–317.
15. Chen, Z.; Lv, N.; Liu, P.; Fang, Y.; Chen, K.; Pan, W. Intrusion Detection for Wireless Edge Networks Based on Federated Learning. IEEE Access 2020, 8, 217463–217472.
16. Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated learning with non-IID data. arXiv 2018, arXiv:1806.00582.
17. Truong, H.T.; Ta, B.P.; Le, Q.A.; Nguyen, D.M.; Le, C.T.; Nguyen, H.X.; Do, H.T.; Nguyen, H.T.; Tran, K.P. Light-weight federated learning-based anomaly detection for time-series data in industrial control systems. Comput. Ind. 2022, 140, 103692.
18. Wilkens, F.; Ortmann, F.; Haas, S.; Vallentin, M.; Fischer, M. Multi-Stage Attack Detection via Kill Chain State Machines. In Proceedings of the 3rd Workshop on Cyber-Security Arms Race, Virtual, 15 November 2021; pp. 13–24.
19. Ghafir, I.; Hammoudeh, M.; Prenosil, V.; Han, L.; Hegarty, R.; Rabie, K.; Aparicio-Navarro, F.J. Detection of advanced persistent threat using machine-learning correlation analysis. Future Gener. Comput. Syst. 2018, 89, 349–359.
20. Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning. IEEE Access 2022, 10, 40281–40306.
21. Khosravi, M.; Ladani, B.T. Alerts Correlation and Causal Analysis for APT Based Cyber Attack Detection. IEEE Access 2020, 8, 162642–162656.
22. Li, Z.; Chen, J.; Zhang, J.; Cheng, X.; Chen, B. Detecting Advanced Persistent Threat in Edge Computing via Federated Learning. In Proceedings of the Security and Privacy in Digital Economy: First International Conference, SPDE 2020, Quzhou, China, 30 October–1 November 2020; Springer: Singapore, 2020; pp. 518–532.
23. Neuschmied, H.; Winter, M.; Stojanović, B.; Hofer-Schmitz, K.; Božić, J.; Kleb, U. APT-Attack Detection Based on Multi-Stage Autoencoders. Appl. Sci. 2022, 12, 6816.
24. Xia, Q.; Dong, S.; Peng, T. An Abnormal Traffic Detection Method for IoT Devices Based on Federated Learning and Depthwise Separable Convolutional Neural Networks. In Proceedings of the 2022 IEEE International Performance, Computing, and Communications Conference (IPCCC), Austin, TX, USA, 11–13 November 2022; pp. 352–359.
25. Thi, H.T.; Son, N.D.H.; Duy, P.T.; Pham, V.-H. Federated Learning-Based Cyber Threat Hunting for APT Attack Detection in SDN-Enabled Networks. In Proceedings of the 2022 21st International Symposium on Communications and Information Technologies (ISCIT), Xi’an, China, 27–30 September 2022; pp. 1–6.
26. Giura, P.; Wang, W. Using large scale distributed computing to unveil advanced persistent threats. Sci. J. 2012, 1, 93–105.
27. Wang, X.; Zheng, K.; Niu, X.; Wu, B.; Wu, C. Detection of command and control in advanced persistent threat based on independent access. In Proceedings of the 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 22–27 May 2016; pp. 1–6.
28. Lajevardi, A.M.; Amini, M. A semantic-based correlation approach for detecting hybrid and low-level APTs. Future Gener. Comput. Syst. 2019, 96, 64–88.
29. Yin, Y.; Jang-Jaccard, J.; Xu, W.; Singh, A.; Zhu, J.; Sabrina, F.; Kwak, J. IGRF-RFE: A hybrid feature selection method for MLP-based network intrusion detection on UNSW-NB15 dataset. J. Big Data 2023, 10, 15.
30. Kasongo, S.M.; Sun, Y. Performance Analysis of Intrusion Detection Systems Using a Feature Selection Method on the UNSW-NB15 Dataset. J. Big Data 2020, 7, 105.
31. Hairab, B.I.; Elsayed, M.S.; Jurcut, A.D.; Azer, M.A. Anomaly Detection Based on CNN and Regularization Techniques Against Zero-Day Attacks in IoT Networks. IEEE Access 2022, 10, 98427–98440.
32. Almaiah, M.A.; Almomani, O.; Alsaaidah, A.; Al-Otaibi, S.; Bani-Hani, N.; Hwaitat, A.K.; Al-Zahrani, A.; Lutfi, A.; Awad, A.B.; Aldhyani, T.H. Performance Investigation of Principal Component Analysis for Intrusion Detection System Using Different Support Vector Machine Kernels. Electronics 2022, 11, 3571.
33. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6.
34. Cox, K.J.; Gerg, C. Managing Security with Snort & IDS Tools: Intrusion Detection with Open Source Tools; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2004.
35. Roesch, M. Snort: Lightweight intrusion detection for networks. In Proceedings of the LISA ‘99: 13th Systems Administration Conference, Seattle, WA, USA, 7–12 November 1999; Volume 99, pp. 229–238.
36. Waleed, A.; Jamali, A.F.; Masood, A. Which open-source IDS? Snort, Suricata or Zeek. Comput. Netw. 2022, 213, 109116.
37. Wang, J.; Xu, M.; Wang, H.; Zhang, J. Classification of Imbalanced Data by Using the SMOTE Algorithm and Locally Linear Embedding. In Proceedings of the 2006 8th International Conference on Signal Processing, Guilin, China, 16–20 November 2006.
38. Strom, B.E.; Applebaum, A.; Miller, D.P.; Nickels, K.C.; Pennington, A.G.; Thomas, C.B. Mitre Att&ck: Design and Philosophy. 2018. Available online: https://www.mitre.org/news-insights/publication/mitre-attck-design-and-philosophy (accessed on 30 October 2023).
39. Alhaj, T.A.; Siraj, M.M.; Zainal, A.; Elshoush, H.T.; Elhaj, F. Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation. PLoS ONE 2016, 11, e0166017.
40. Fauzi, M.A.; Yang, B.; Blobel, B. Comparative Analysis between Individual, Centralized, and Federated Learning for Smartwatch Based Stress Detection. J. Pers. Med. 2022, 12, 1584.
41. Khan, M.; Glavin, F.G.; Nickles, M. Federated Learning as a Privacy Solution—An Overview. Procedia Comput. Sci. 2023, 217, 316–325.
42. Ma, X.; Liao, L.; Li, Z.; Lai, R.X.; Zhang, M. Applying Federated Learning in Software-Defined Networks: A Survey. Symmetry 2022, 14, 195.
43. Al-Hejri, A.M.; Al-Tam, R.M.; Fazea, M.; Sable, A.H.; Lee, S.; Al-antari, M.A. ETECADx: Ensemble Self-Attention Transformer Encoder for Breast Cancer Diagnosis Using Full-Field Digital X-ray Breast Images. Diagnostics 2022, 13, 89.
44. Houssein, E.H.; Mohamed, O.; Abdel Samee, N.; Mahmoud, N.F.; Talaat, R.; Al-Hejri, A.M.; Al-Tam, R.M. Using deep DenseNet with cyclical learning rate to classify leukocytes for leukemia identification. Front. Oncol. 2023, 13, 1230434.
45. Nwakanma, C.I.; Ahakonye, L.A.; Njoku, J.N.; Odirichukwu, J.C.; Okolie, S.A.; Uzondu, C.; Ndubuisi Nweke, C.C.; Kim, D.S. Explainable Artificial Intelligence (XAI) for Intrusion Detection and Mitigation in Intelligent Connected Vehicles: A Review. Appl. Sci. 2023, 13, 1252.
46. Al-Tam, R.M.; Al-Hejri, A.M.; Narangale, S.M.; Samee, N.A.; Mahmoud, N.F.; Al-Masni, M.A.; Al-Antari, M.A. A Hybrid Workflow of Residual Convolutional Transformer Encoder for Breast Cancer Classification Using Digital X-ray Mammograms. Biomedicines 2022, 10, 2971.
Work | Year | Dataset | Approaches | Model | Weakness |
---|---|---|---|---|---|
M. A. Ferrag et al. [20] | 2022 | Edge-IIoTset | Modeling attack traffic and processes using DT, RF, SVM, KNN, and DNN. | Centralized and federated learning | Identified 61 features with high correlations for traffic but did not include alert scenarios or generate meta-alerts. |
M. Khosravi et al. [21] | 2020 | Semi real-world dataset | Modeling attacks process, generating meta-alerts with APT steps and host score for all risk levels. | Finding IKCs using Causal Relation Analysis | No centralized and federated learning to classify APT attacks. |
Z. Li et al. [22] | 2020 | UNSW-NB15 and synthetic datasets | Modeling attacks process, correlating alerts to APT stages and identifying the probability of APT stage change. | Federated | Gain correlation method determines association but cannot predict causation. |
I. Ghafir et al. [19] | 2018 | Simulation dataset | Created a correlation framework to link the alerts to the APT attacks and use ML models to predict network events. | ML | Only considered network events. |
H. Neuschmied et al. [23] | 2022 | Contagio and CICIDS2017 | Detection of abnormal behavior based on network traffic analysis. | Several autoencoders | Lack of generality and only identified network events. |
Q. Xia et al. [24] | 2022 | Aposemat IoT-23 | Detection of abnormal behavior based on network traffic. | FL CNN | Did not study the causal relation of alerts. |
H. T. Thi et al. [25] | 2022 | UNSW-NB15 | Detection of APT attacks based on network traffic in SDN. | FL | Considered network events. |
Y. Yin et al. [29] | 2023 | UNSW-NB15 | Filter methods were employed to assess the impact of less significant features in relation to high-frequency values. | MLP | Only classified network attacks based on feature filtering, without studying the causal relation of alerts. |
Kasongo et al. [30] | 2020 | UNSW-NB15 | Detection of abnormal behavior based on network traffic analysis. | Ensemble models | Did not study the causal relation of alerts or stages. |
B. I. Hairab et al. [31] | 2022 | Bot-IoT dataset | Focus on zero-day attacks that have not been previously reported within the network. | ML and DL methods | Only considered DoS and DDoS scenarios for traffic attacks. |
M. A. Almaiah et al. [32] | 2022 | UNSW-NB15 | Detection of abnormal behavior based on network traffic. | PCA and SVM kernels | Only classified network attacks based on feature filtering, without studying the causal relation of alerts. |
Stages of APT | Type of Alert | No. of Records | Encoding Label |
---|---|---|---|
1st: Reconnaissance | Gathering information | 13,987 | 0 |
2nd: Initial Access | Fuzzer, Analysis | 26,923 | 1 |
3rd: Exploitation | Exploits | 44,525 | 2 |
4th: Persistent | Backdoor, Shellcode | 3840 | 3 |
5th: Lateral Movement | Worms | 174 | 4 |
Alert Type | Records |
---|---|
Exploits | 44,525 |
Fuzzers | 24,246 |
Reconnaissance | 13,987 |
Analysis | 2677 |
Backdoor | 2329 |
Shellcode | 1511 |
Worms | 174 |
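The two tables above define how alert types are grouped into APT stages and encoded as class labels. The following sketch writes that mapping as a lookup and checks that the per-alert counts add up to the per-stage counts. The names `ALERT_COUNTS`, `STAGE_OF_ALERT`, and `stage_counts` are hypothetical; the numbers and groupings are taken from the tables.

```python
# Records per alert type, as listed in the alert-type table.
ALERT_COUNTS = {
    "Exploits": 44525,
    "Fuzzers": 24246,
    "Reconnaissance": 13987,
    "Analysis": 2677,
    "Backdoor": 2329,
    "Shellcode": 1511,
    "Worms": 174,
}

# Encoding label (APT stage) for each alert type, per the stage table.
STAGE_OF_ALERT = {
    "Reconnaissance": 0,  # 1st: Reconnaissance
    "Fuzzers": 1,         # 2nd: Initial Access
    "Analysis": 1,        # 2nd: Initial Access
    "Exploits": 2,        # 3rd: Exploitation
    "Backdoor": 3,        # 4th: Persistent
    "Shellcode": 3,       # 4th: Persistent
    "Worms": 4,           # 5th: Lateral Movement
}

def stage_counts(alert_counts, stage_of_alert, n_stages=5):
    """Sum the alert-type record counts into per-stage record counts."""
    counts = [0] * n_stages
    for alert, n in alert_counts.items():
        counts[stage_of_alert[alert]] += n
    return counts

per_stage = stage_counts(ALERT_COUNTS, STAGE_OF_ALERT)
total = sum(per_stage)
```

The resulting distribution is heavily skewed toward Exploitation, with Lateral Movement making up well under 1% of the records.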
Feature Name | Importance Value | Feature Name | Importance Value |
---|---|---|---|
bytes | 0.129649 | Dloss | 0.023135 |
means | 0.106072 | Sjit | 0.023083 |
ct_srv_dst | 0.072535 | dur | 0.022949 |
state | 0.052549 | Spkts | 0.022914 |
bytes | 0.052046 | synack | 0.022460 |
ct_srv_src | 0.044081 | Dintpkt | 0.021106 |
ct_dst_src_ltm | 0.042994 | Dpkts | 0.020949 |
means | 0.039951 | Sloss | 0.018946 |
Sload | 0.039715 | tcprtt | 0.017903 |
proto | 0.034719 | Djit | 0.013530 |
Dload | 0.034529 | Ackdat | 0.013002 |
service | 0.031950 | | |
Sintpkt | | | |
Parameter | Description |
---|---|
N_CLIENTS = 4 | The total number of clients |
TEST_FRAC = 0.2 | The fraction of the complete dataset that will be taken for the test set |
N_CLASSES = 5 | APT scenario attacks |
LEARNING_RATE = 0.0001 | Learning rate |
BATCH_SIZE = 32 | Batch size |
N_EPOCHS = 50 | The number of epochs (times the dataset will be repeated) |
N_ROUNDS = 20 | Rounds between clients and server to update weights |
Models | ACC | REC | PRE | F1-Score | AUC |
---|---|---|---|---|---|
XGBoost | 0.8809 | 0.8809 | 0.8879 | 0.8823 | 0.925 |
RF | 0.8795 | 0.8795 | 0.8833 | 0.8803 | 0.8803 |
CatBoost | 0.8529 | 0.8529 | 0.8629 | 0.8547 | 0.8547 |
Ensemble model | 0.8815 | 0.8815 | 0.8876 | 0.8827 | 0.9259 |
CNN | 0.8457 | 0.8457 | 0.8452 | 0.8393 | 0.8639 |
Stages of APT | Records | Client 1 | Client 2 | Client 3 | Client 4 |
---|---|---|---|---|---|
1st: Reconnaissance | 13,987 | 3570 | 3496 | 3514 | 3407 |
2nd: Initial Access | 26,923 | 6817 | 6718 | 6688 | 6700 |
3rd: Exploitation | 44,525 | 11,005 | 11,120 | 11,098 | 11,302 |
4th: Persistent | 3840 | 931 | 984 | 1015 | 910 |
5th: Lateral Movement | 174 | 49 | 44 | 47 | 43 |
Total | 89,449 | 22,372 | 22,362 | 22,362 | 22,362 |
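The table above shows four shards of nearly equal size with similar class proportions. The source does not describe how records were assigned to clients, so the sketch below is only one plausible scheme: a shuffled, round-robin IID split. The function name `partition_iid` and the fixed seed are assumptions; `N_CLIENTS` mirrors the value in the parameter table.

```python
import random

N_CLIENTS = 4  # value taken from the parameter table

def partition_iid(n_samples, n_clients, seed=0):
    """Shuffle record indices and deal them round-robin to the clients,
    giving disjoint shards whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[c::n_clients] for c in range(n_clients)]

# 89,449 is the total record count reported for the five APT stages.
shards = partition_iid(89449, N_CLIENTS)
sizes = [len(s) for s in shards]
```

A random split like this keeps each client's class mix close to the global one; a non-IID study would instead skew the stage distribution per client.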
Strategy | ACC | SEN | SPE | F1-Score | AUC |
---|---|---|---|---|---|
The proposed CNN_FL Model | 0.9018 | 0.9018 | 0.9011 | 0.9009 | 0.9322 |
Approach | ACC |
---|---|
M. A. Ferrag [20] | Identified 61 features with high correlations for traffic but did not include scenarios for alerts and generating APT. |
M. Khosravi et al. [21] | Modeled the attack process and generated meta-alerts with APT steps and a host score for all risk levels, but used no AI-based classification. |
P. Giura and W. Wang [26] | 81.80% |
X. Wang et al. [27] | 83.30% |
Lajevardi et al. [28] | 84.21% |
Y. Yin et al. [29] | 84.26% |
M. Khosravi and B. T. Ladani [21] | 87.10% |
The proposed work | 90.01% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alkhpor, H.K.; Alserhani, F.M. Collaborative Federated Learning-Based Model for Alert Correlation and Attack Scenario Recognition. Electronics 2023, 12, 4509. https://doi.org/10.3390/electronics12214509