IEEE2023 Cloud-Based Intrusion Detection Approach Using Machine Learning Techniques

The document presents a cloud-based intrusion detection model that uses random forest classification and feature engineering. The model is evaluated on two datasets and achieves detection accuracy above 98%. The proposed approach performs better than recent related works in terms of accuracy, precision, and recall. The model aims to address limitations of traditional intrusion detection systems for cloud environments.


BIG DATA MINING AND ANALYTICS

ISSN 2096-0654 05/10 pp311 –320


Volume 6, Number 3, September 2023
DOI: 10.26599/BDMA.2022.9020038

Cloud-Based Intrusion Detection Approach Using Machine Learning Techniques
Hanaa Attou, Azidine Guezzaz, Said Benkirane, Mourade Azrour, and Yousef Farhaoui

Abstract: Cloud computing (CC) is a novel technology that has made it easier to access network and computer resources on demand, such as storage and data management services. In addition, it aims to strengthen systems and make them useful. Despite these advantages, cloud providers face many security limitations. In particular, securing resources and services is a real challenge for cloud technologies. For this reason, a set of solutions has been implemented to improve cloud security by monitoring resources, services, and networks and then detecting attacks. An intrusion detection system (IDS) is an enhanced mechanism used to control traffic within networks and detect abnormal activities. This paper presents a cloud-based intrusion detection model based on random forest (RF) and feature engineering. Specifically, the RF classifier is obtained and integrated to enhance the accuracy (ACC) of the proposed detection model. The proposed model has been evaluated and validated on two datasets and gives 98.3% ACC and 99.99% ACC using Bot-IoT and NSL-KDD datasets, respectively. Consequently, the obtained results show good performance in terms of ACC, precision, and recall when compared to recent related works.

Key words: cloud security; anomaly detection; features engineering; random forest

• Hanaa Attou, Azidine Guezzaz, and Said Benkirane are with the Technology Higher School Essaouira, Cadi Ayyad University, Marrakech 44000, Morocco. E-mail: [email protected].
• Mourade Azrour and Yousef Farhaoui are with the STI Laboratory, the IDMS team, Faculty of Sciences and Techniques, Moulay Ismail University of Meknès, Errachidia 25003, Morocco.
* To whom correspondence should be addressed.
Manuscript received: 2022-09-07; revised: 2022-09-27; accepted: 2022-10-12

© The author(s) 2023. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

1 Introduction

Cloud technologies allow practical on-demand access to shared network, storage, and other resources and offer more choices regarding their service models[1]. These models are platform as a service (PaaS), software as a service (SaaS), and infrastructure as a service (IaaS)[2], used in one of the deployment models: private, public, and hybrid cloud[3]. The cloud provides services with high performance due to its characteristics[2], which according to the National Institute of Standards and Technology[4] are network access, resource pooling, quick elasticity, and measured service. Recently, the cloud has suffered from many security problems related to availability, data confidentiality, integrity, and control authorization. In addition, the Internet is used to facilitate access to the services offered by the cloud, which represents a major source of threats that can infect cloud systems and resources[2]. Enhancing cloud security has therefore become a primary challenge for cloud providers[5]. Several approaches, such as firewall tools, data encryption algorithms, authentication protocols, and others, have been developed to better secure cloud environments from various attacks[6]. However, traditional systems are not sufficient to secure cloud services because of different limits[7]. Therefore, a set of intrusion detection approaches have been proposed and applied to detect and prevent undesirable activities in real time[8, 9]. In general, detection methods are divided into the misuse detection method, which uses known attacks to detect intrusions, and the anomaly detection method, which detects unknown attacks. The hybrid
312 Big Data Mining and Analytics, September 2023, 6(3): 311–320

method is obtained by combining the advantages of these two methods[10]. Although many solutions have been proposed to secure cloud environments, recent intrusion detection systems (IDSs) are affected by various significant limitations[8], for example, the huge amounts of data to analyze, real-time detection, data quality, and others that decrease the performance of detection models. Nowadays, academic researchers show that intelligent learning methods[6, 11] such as machine learning (ML), deep learning (DL), and ensemble learning are useful in various areas[12, 13] and are able to improve network security[14–18]. Our main goal in this research work is to propose an anomaly detection approach based on a random forest (RF) binary classifier. Feature engineering is carried out based on a data visualization process that aims to reduce the number of features used and improve the proposed anomaly detection model. The performance of the model is evaluated on the NSL-KDD and BoT-IoT datasets, and the obtained outcomes demonstrate the model's performance. The rest of this paper is organized as follows. In Section 2, we present the state-of-the-art cloud computing (CC) architectures, IDS, ML methods, and recent related works in the domain. The proposed framework is presented in Section 3. In Section 4, we describe the experimental setting. Then, in Section 5, we describe in detail the obtained results. The paper ends with a conclusion and future works.

2 State-of-the-Art Works

In this section, we present state-of-the-art CC architecture, IDS, ML methods, and related works that describe different algorithms used to enhance IDS and cloud security.

The models of cloud services are IaaS, PaaS, and SaaS. They differ according to the technical layers offered[19]. The IaaS model provides temporary virtual machines (VMs) and also allows an increase in the storage space of VMs, networks, and load balancers. They offer the technical layers of IaaS in addition to middleware instances and execution contexts, such as databases and application servers, whereas PaaS models only offer middleware instances. They are provided over the Internet on demand and with a measured service[20]. SaaS models offer software[20], and users can run the desired applications as shown in Fig. 1.

Cloud deployment models are intended for different entities as needed[2]. The public cloud is a model whose resources are intended for public clients, as the name suggests. The private cloud, however, is only for one entity. The hybrid cloud concept combines both private and public clouds. Community cloud is a multi-tenant platform that allows multiple companies to collaborate on the same platform if their needs and concerns are similar[2]. The most important difference between the public cloud and the private cloud is that the private cloud is considered the most secure since it has fewer users than the public cloud[2, 5]. Intrusion is a kind of unauthorized activity that could pose a possible threat to the information's confidentiality, integrity, and availability[8]. Researchers have developed IDSs that aim to detect any type of intrusion. An IDS achieves this objective by monitoring activities at the network or host machine. Depending on these activities, IDSs consist of two basic varieties, Network IDS and Host IDS[8, 21]. We can distinguish between misuse-based detection and anomaly-based detection. The first one is used to detect known attacks and the second one is

Fig. 1 Cloud computing models.


Hanaa Attou et al.: Cloud-Based Intrusion Detection Approach Using Machine Learning Techniques 313

used to detect unknown attacks. Hybrid-based detection combines both methods to detect known and unknown attacks[12]. ML is a technique used to feed the IDS for identifying attacks[22]. Thanks to the ML field, computers may learn without being explicitly programmed[23]. Concretely, it is a branch of modern science that uses statistics to identify patterns in data and then make predictions[24]. We can subdivide ML into three types of learning[25]. A supervised ML technique is used to discover a relationship from a series of samples. The learned association can then be used to predict data that have not been seen before. The best-known supervised learning algorithms are k-nearest neighbors (KNNs), decision trees (DTs), linear regression (LR), neural networks (NNs), and RF[26]. Ferrag et al.[27] used the DT classifier to train the IDS in a layered approach. In unsupervised learning, unlike supervised learning, data are unlabeled and the machine learns without examples[25]. Some important unsupervised learning tasks are clustering, visualization, dimensionality reduction, and association rule learning[23]. Semi-supervised learning lies between supervised and unsupervised learning[23]. Besides, deep learning (DL) is a type of ML method based on learning data representations. It is also the main ML technology that relies on artificial neural network (ANN) algorithms[27]. On the other hand, firewalls are used by cloud providers to identify intrusions, but this technology does not detect insider attacks[7]. Therefore, the challenge is to detect the different types of intrusions in the cloud.

There are different related studies that use DL and ML techniques to reinforce computer security. Kanimozhi and Jacob[28] used a calibration curve to evaluate different classifiers, such as the KNN classifier, Naïve Bayes (NB), Adaboost with DT, the support vector machine (SVM) classifier, and the RF classifier, for detecting botnet attacks on the CSE-CIC-IDS2018 dataset. Zhou et al.[29] suggested a deep neural network (DNN) based IDS. In particular, the system employs three phases: data acquisition (DAQ), data pre-processing, and then DNN classification. With SVM, the system obtains an accuracy (ACC) of 0.963. A software-defined networking IDS using a DL method was put forth by Tang et al.[30] A two-stage DL technique for intrusion detection that focuses on finding malicious attacks on autonomous cars was developed by Zhang et al.[31] They started with a reliable rule-based system and then switched to a DL system in the second stage to detect anomalies. In Ref. [32], Mishra et al. proposed a classification-based ML approach to detect distributed denial of service (DDoS) attacks in CC using three methods (KNN, RF, and NB). The proposed model achieves 99.76% ACC, and they concluded that RF gives the best results. Alshammari and Aldribi[22] applied ML techniques to feed IDS and detect malicious network traffic in cloud computing, using the ISOT-CID dataset to evaluate the performance. Jiang et al.[33] tested the effectiveness of the suggested attack detection system using the NSL-KDD dataset and concluded that long short term memory (LSTM) and recurrent neural networks (RNNs) are the best choices for multichannel IDS. The system's efficiency is reported to be 99.23% and its ACC to be 98.94%. To learn from privacy-preserved encrypted data on the cloud, Khan et al.[34] used supervised and unsupervised ML, specifically ANNs, over the scrambled information. SNORT and an optimized back propagation neural network (BPN) have been proposed as a cooperative and hybrid network intrusion detection framework by Chiba et al.[7] This system attempts to improve the BPN algorithm by merging signature-based detection (SNORT) with anomaly-based detection (BPN). In Ref. [35], DL is used to build a two-class model: although the NSL-KDD database contains 39 types of attacks grouped into 4 classes, this study works only on two classes, normal or anomaly. Kim et al.[36] suggested an architecture for intrusion detection that uses the LSTM as a recurrent neural network and the KDD Cup 99 dataset. As an input vector, they employed 41 features. In Ref. [37], Zhang proposed an automatic technique that develops the discriminative model and fuses multi-view information to improve accuracy (ACC). Six basic features are used by Tang et al.[30] to build an IDS based on DL. According to the attack detection performance, the suggested system obtains an ACC of 96.93%. In Ref. [38], Ahmad et al. proposed a method for cloud-based text document classification and data integrity. Ahmad et al.[38] concluded that RF outperforms the other techniques used, namely NB, SVM, and KNN. Recently, Mubarakali et al.[39] used SVM-based expert systems to detect distributed denial of service (DDoS) attacks. The system performance is reported as 96.23%. The comparison of various current IDS models is shown in Table 1.

3 Proposed Framework

In this section, we describe the various techniques used to elaborate our solution. Feature reduction is

integrated to minimize execution time and to improve prediction. In addition, the RF classifier is trained with the two features selected as a subset from the NSL-KDD dataset to identify intrusions. As illustrated in Fig. 2, our proposed model adopts the main standard components of an IDS. Hence, our model implements the data collection module, the data preprocessing module, and the decision module. Our contribution focuses on the data preprocessing module by enhancing feature engineering tasks and obtaining reliable predictions.

Table 1 Comparison of various current IDS models.

Reference                     Year  Method used        ACC (%)  Dataset
Chiba et al.[7]               2016  BPN                –        –
Alshammari and Aldribi[22]    2021  ANN                92.00    ISOT-CID
                                    KNN                100.00
                                    DT                 100.00
                                    SVM                81.00
                                    NB                 60.00
                                    RF                 100.00
Kanimozhi and Jacob[28]       2019  ANN                99.90    CSE-CIC-IDS2018
                                    RF                 99.80
                                    KNN                99.73
                                    SVM                99.80
                                    Adaboost           99.90
                                    NB                 99.20
Zhou et al.[29]               2018  DNN                96.30    –
Tang et al.[30]               2016  DNN                75.75    NSL-KDD
Zhang et al.[31]              2018  DNN                –        –
Mishra et al.[32]             2021  RF, KNN, NB        99.76    –
Jiang et al.[33]              2018  LSTM               98.94    NSL-KDD
Khan et al.[34]               2019  ANNs               –        –
Potluri and Diedrich[35]      2016  DL                 97.50    NSL-KDD
Kim et al.[36]                2016  LSTM               96.93    KDD CUP'99
Ahmad et al.[38]              2022  RF, NB, SVM, KNN   92.00    –
Mubarakali et al.[39]         2020  SVM                96.23    –

Fig. 2 Proposed model architecture.

The preprocessing module focuses on data normalization. The categorical features are transformed into numeric values with the dummies function, which maps symbolic features to numeric values. Then, the detected inconsistencies are deleted. Our intrusion detection model includes feature selection to identify and combine useful features for accurate detection. The graphic data visualization task is used to select the optimum feature subset that can enhance the prediction of the proposed model. Once the subset is selected, the RF algorithm is applied to obtain a reliable classifier to distinguish between normal or abnormal activities. The RF algorithm groups many DTs whose outputs are merged into one final output[40]. DT is a non-parametric supervised learning algorithm designed for classification and regression[40]. Breiman[41] noted that RF performs well compared with other classifiers such as SVM, neural networks, and discriminant analysis. It avoids the overfitting issue. DT tends to overfit, whereas RF uses a bagging method to deal with these issues[42].
• RF as defined in Refs. [40, 43, 44] employs a classifier combination.
• Base classifiers use an m-tree structure $\{h(X, \theta_n),\ n = 1, 2, \ldots, m\}$.
• $X$ represents the input data, and $\{\theta_n\}$ is an independent, identically distributed random vector.
• Every DT selects data randomly from the available data.
• Build a forest of "n" trees by repeating the steps above "n" times.
According to Refs. [42, 45], the advantages of RF are
• They are less sensitive to outlier data due to their ability to overcome the issue of overfitting in training data. It is simple to establish parameters, which avoids the requirement for tree trimming. Variable importance and ACC are automatically created.
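The bagging and voting idea behind RF can be illustrated with a minimal sketch. This is not the implementation used in this work: it assumes one-split trees (decision stumps) as a simplified stand-in for full DTs, and the function names (fit_stump, fit_forest, predict) and the toy two-feature samples are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, rng):
    """Fit a one-split tree on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None  # (errors, threshold, left_label, right_label)
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not right:  # no real split: every sample falls on the left side
            continue
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # constant feature in this sample: predict the majority
        label = majority(labels)
        return feature, float("inf"), label, label
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # with replacement
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], rng))
    return forest


def predict(forest, row):
    """Majority vote over the individual tree predictions."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return majority(votes)


# Hypothetical two-feature samples: label 0 = normal, label 1 = attack.
rows = [(10, 20), (12, 25), (15, 22), (11, 30),
        (900, 800), (950, 870), (1000, 900), (980, 860)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict(forest, (5, 5)), predict(forest, (2000, 2000)))
```

Each stump sees a different bootstrap sample and a randomly chosen feature, so individual trees may disagree; the final decision is the class that collects the most votes. A full RF would grow deeper trees and draw a random feature subset at every split.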

RF is a classifier that includes a group of tree-structured learners, each of which casts a vote for the most popular class for a given input. Using the training set, each tree is constructed from a random vector that is independent of earlier random vectors of the same distribution, and an upper bound is derived for RF to get the prediction error in terms of two factors: the accuracy of the individual classifiers and the interdependence between them.

4 Experimental Setting

Our research work is carried out and evaluated in an experimental setting using a computer with a Core i5-8250U CPU running at 1.8 GHz and 12 GB of RAM, running Windows 10 Professional 64-bit. Python 3 is used to implement the RF, DT, and SVM models after graphic visualization is used to reduce features. To validate our proposed model, we evaluate the ACC metric and compare it with the ACC of other models. We divided the entire dataset randomly: 70% is used in the training step, and the remaining part is used in the test step. The best parameters for any classifier's performance are determined by the dataset used in model training and testing. In this research work, the NSL-KDD and Bot-IoT datasets are used. In order to address some of the inherent issues with the KDD 1999 dataset, a new edition of the KDD dataset was developed[46]. The NSL-KDD dataset provides the following advantages over the original KDD dataset: It excludes records that are redundant or duplicated. The number of records is appropriate, and selected records are arranged as a percentage of records (80% eKDDTrain+20% ARFF). The forty-one initial features from the KDD'99 dataset are available in NSL-KDD[46]. The NSL-KDD dataset yields the best results. Hence, 41 features are included in our dataset. The NSL-KDD dataset's six fundamental properties are utilized to develop various models, as stated in Ref. [30].
• Duration: Duration of the connection in seconds.
• protocol_type: protocol_type_tcp, protocol_type_udp, and protocol_type_icmp are the three different types of protocols.
• src_bytes: Number of data bytes sent from the source to the destination.
• dst_bytes: Number of data bytes sent from the destination to the source.
• Count: Number of connections made to the same host as the current connection in the previous two seconds.
• srv_count: Number of connections to the same service as the current connection in the previous two seconds.

The protocol_type categorical variable was converted into numeric values using the dummies function. According to the graphic visualization shown in Fig. 3, we conclude that the class variable is not influenced by the protocol type variables.

The class_anomaly variable can be predicted from the variables count, duration, and dst_host_srv_count only at a few points, as shown in Fig. 4. For example, if the duration variable > 1500, then we can detect an anomaly. Figure 5a shows that we can predict the class variable if the src_bytes variable > 0. Also, we can detect that class_anomaly equals 0 if the variable dst_bytes is higher than 50 000, as shown in Fig. 5b.

The variables selected from visualization are src_bytes and dst_bytes, so we reduce the number of features from 41 to two. As the first step, after the graphic visualization, an RF model was developed for class_anomaly with the selected variables src_bytes and dst_bytes to detect intrusions. The Bot-IoT dataset is more developed since it includes IoT devices that work with both simulated and actual data[27, 47]. Shafiq et al.[48] identified the top five variables with improved characteristics using ML approaches including DT, NB, RF, and SVM as well as measures like Pearson moment correlation and area under the curve (AUC). This dataset contains information on numerous distinct forms of IoT traffic flows, including regular traffic, IoT traffic, and botnet

Fig. 3 Graphic visualization for different variables of protocol type (panels a–c).

Fig. 4 Graphic visualization (panels a–c).

Fig. 5 Graphic visualization of the selected variables (panels a and b).

traffic[49].

In order to evaluate our proposed model, we have randomly selected two features:
• state_number: The feature state represented numerically.
• stddev: The standard deviation of records that have been aggregated.

5 Result and Discussion

5.1 Evaluation metrics

We focus on classification models, and precisely on binary classification, since intrusion detection is a problem in which we use labeled data to predict whether an object belongs to the attack class or not. The results given by a binary classification algorithm are 0 or 1. Choosing the right metric is therefore crucial for evaluating and validating ML models. In these types of problems, the metrics generally consist of comparing the actual classes to the classes predicted by the model. This makes it possible to interpret the predicted probabilities for these classes.

The key performance metric for classification is the confusion matrix[50], which is a visualization, in table form, of the predictions of the model in relation to the real labels. The instances of a real class are represented in each row and those of a predicted class are represented in each column of the confusion matrix. From this matrix, we can calculate the different metrics ACC, recall, and precision, which will allow us to evaluate our IDS based on RF. In the following equations, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
• ACC is calculated using Eq. (1). It is the percentage of accurate predictions made relative to all cases.
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \quad (1)$$
• Recall is obtained from Eq. (2). It is used to determine the percentage of correctly categorized positive patterns.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\% \quad (2)$$
• Precision is calculated using Eq. (3). It measures, out of all the patterns predicted as positive, the number of accurately predicted positive patterns.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\% \quad (3)$$

5.2 Obtained results

Initially, the IDS is implemented for the classification task. The ACC value reflects how well the model performs. The RF classification model was first improved by identifying the features that produce the best classification outcomes. After that, we used a subset of the NSL-KDD dataset to train the model.

Hence, we have started the evaluation of our model based on the two variables selected from NSL-KDD, src_bytes and dst_bytes, using graphic visualization. To show that the chosen variables are efficient, we use a correlation matrix, which is a table showing the correlation values for several variables. In Fig. 6 the

matrix depicts the correlation between src_bytes and dst_bytes. We conclude from Fig. 6 that there is no linear relation between the two variables since the coefficient tends toward 0, which shows that the risk of multicollinearity is negligible. The outcomes illustrated in Fig. 7 show that our model performs well in terms of ACC and precision using src_bytes and dst_bytes, but the recall requires an improvement.

Fig. 6 Pearson correlation.

On the other hand, to check the effectiveness of the model, BoT-IoT is employed. We aggregate the data into a new data frame after importing it separately. Then we follow the steps outlined in our model displayed in Fig. 2. Two features are selected from BoT-IoT: state_number and stddev. The results of our tests are shown in Fig. 7, which displays the three metrics used to evaluate the performance and efficiency of our proposed model. We have obtained 98.3% ACC, 96.3% precision, and 46.0% recall using the NSL-KDD dataset. Furthermore, ACC, precision, and recall all reached 100% when the BoT-IoT dataset is used.

Figure 8 shows the ACC obtained by the different models using NSL-KDD and the ACC of our model using NSL-KDD and BoT-IoT. Our proposed IDS performs well if we compare it with the works proposed in Refs. [30, 33, 35, 39]. Using RF with two selected features from NSL-KDD and BoT-IoT, we obtained a higher ACC than the other works mentioned in Fig. 8.

Fig. 8 Performance comparison between the proposed model and other works.

Consequently, reducing the number of explanatory variables reduces data collection time and execution time while maintaining high-quality results, as shown in Fig. 8. As a result of all of this, we have demonstrated that our RF classifier technique can distinguish between normal and aberrant traffic using only two features. Besides, the RF gives good results compared to DNN, LSTM, DL, and SVM.

6 Conclusion

Intrusion detection is a new technology that has improved the security of the cloud. Recently, ML algorithms have been used to develop this technique because they are very helpful to secure and monitor systems. In this paper, we present an approach for detecting intrusions by combining graphic visualization and RF for cloud security. The former is used for feature engineering and the latter is used to predict and detect intrusions. Before the training of the model, we reduced the number of features to two. Based on the obtained results, the RF classifier is a remarkably more accurate method to predict and classify the attack type than DNN, DT, and SVM. We have demonstrated the potential of using a small number of features by contrasting the results with those of other classifiers. However, recall is still not good enough on NSL-KDD, so in future work we will focus on this point by using DL and ensemble learning techniques to improve our model.
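To illustrate how ACC and precision can stay high while recall remains low, the following minimal sketch computes the three metrics of Section 5.1 from confusion-matrix counts using their standard definitions. The function name binary_metrics and the counts are hypothetical; they describe an imbalanced test set invented for illustration and are not the results reported in this paper.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return (ACC, recall, precision) in percent from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0 or tp + fn == 0 or tp + fp == 0:
        raise ValueError("metrics are undefined for these counts")
    acc = 100.0 * (tp + tn) / total        # correct predictions over all cases
    recall = 100.0 * tp / (tp + fn)        # detected attacks over actual attacks
    precision = 100.0 * tp / (tp + fp)     # true attacks over raised alarms
    return acc, recall, precision


# Hypothetical counts (assumption): 100 attacks among 1000 test records.
acc, recall, precision = binary_metrics(tp=46, tn=898, fp=2, fn=54)
print(f"ACC={acc:.1f}%  recall={recall:.1f}%  precision={precision:.1f}%")
# prints: ACC=94.4%  recall=46.0%  precision=95.8%
```

With these assumed counts, more than half of the attacks are missed, yet ACC stays above 94% because normal records dominate the test set. This is why recall is reported alongside ACC when the classes are imbalanced.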
Fig. 7 Performance metrics of our model with NSL-KDD and BoT-IoT datasets.

References
[1] M. Ali, S. U. Khan, and A. V. Vasilakos, Security in cloud computing: Opportunities and challenges, Information Sciences, vol. 35, pp. 357–383, 2015.
[2] A. Singh and K. Chatterjee, Cloud security issues and challenges: A survey, Journal of Network and Computer Applications, vol. 79, pp. 88–115, 2017.
[3] P. S. Gowr and N. Kumar, Cloud computing security: A survey, International Journal of Engineering and Technology, vol. 7, no. 2, pp. 355–357, 2018.
[4] A. Verma and S. Kaushal, Cloud computing security issues and challenges: A survey, in Proc. First International Conference on Advances in Computing and Communications, Kochi, India, 2011, pp. 445–454.
[5] H. Alloussi, F. Laila, and A. Sekkaki, L’état de l’art de la sécurité dans le cloud computing: Problèmes et solutions de la sécurité en cloud computing, presented at Workshop on Innovation and New Trends in Information Systems, Mohamadia, Maroc, 2012.
[6] J. Gu, L. Wang, H. Wang, and S. Wang, A novel approach to intrusion detection using SVM ensemble with feature augmentation, Computers and Security, vol. 86, pp. 53–62, 2019.
[7] Z. Chiba, N. Abghour, K. Moussaid, A. E. Omri, and M. Rida, A cooperative and hybrid network intrusion detection framework in cloud computing based snort and optimized back propagation neural network, Procedia Computer Science, vol. 83, pp. 1200–1206, 2016.
[8] A. Khraisat, I. Gondal, P. Vamplew, and J. Kamruzzaman, Survey of intrusion detection systems: Techniques, datasets and challenges, Cybersecurity, vol. 2, p. 20, 2019.
[9] A. Guezzaz, A. Asimi, Y. Asimi, Z. Tbatou, and Y. Sadqi, A global intrusion detection system using PcapSockS sniffer and multilayer perceptron classifier, International Journal of Network Security, vol. 21, no. 3, pp. 438–450, 2019.
[10] A. Guezzaz, S. Benkirane, M. Azrour, and S. Khurram, A reliable network intrusion detection approach using decision tree with enhanced data quality, Security and Communication Networks, vol. 2021, p. 1230593, 2021.
[11] B. A. Tama and K. H. Rhee, HFSTE: Hybrid feature selections and tree-based classifiers ensemble for intrusion detection system, IEICE Trans. Inf. Syst., vol. E100.D, no. 8, pp. 1729–1737, 2017.
[12] M. Azrour, J. Mabrouki, G. Fattah, A. Guezzaz, and F. Aziz, Machine learning algorithms for efficient water quality prediction, Modeling Earth Systems and Environment, vol. 8, pp. 2793–2801, 2022.
[13] M. Azrour, Y. Farhaoui, M. Ouanan, and A. Guezzaz, SPIT detection in telephony over IP using K-means algorithm, Procedia Computer Science, vol. 148, pp. 542–551, 2019.
[14] M. Azrour, M. Ouanan, Y. Farhaoui, and A. Guezzaz, Security analysis of Ye et al. authentication protocol for internet of things, in Proc. International Conference on Big Data and Smart Digital Environment, Casablanca, Morocco, 2018, pp. 67–74.
[15] M. Azrour, J. Mabrouki, A. Guezzaz, and A. Kanwal, Internet of things security: Challenges and key issues, Security and Communication Networks, vol. 2021, p. 5533843, 2021.
[16] A. Guezzaz, S. Benkirane, and M. Azrour, A novel anomaly network intrusion detection system for internet of things security, in IoT and Smart Devices for Sustainable Environment, M. Azrour, A. Irshad, and R. Chaganti, eds. Cham, Switzerland: Springer, 2022, pp. 129–138.
[17] A. Guezzaz, A. Asimi, M. Azrour, Z. Tbatou, and Y. Asimi, A multilayer perceptron classifier for monitoring network traffic, in Proc. 3rd International Conference on Big Data and Networks Technologies, Leuven, Belgium, 2019, pp. 262–270.
[18] S. Benkirane, Road safety against sybil attacks based on RSU collaboration in VANET environment, in Proc. 5th International Conference on Mobile, Secure, and Programmable Networking, Mohammedia, Morocco, 2019, pp. 163–172.
[19] Q. Zhang, L. Cheng, and R. Boutaba, Cloud computing: State-of-the-art and research challenges, J. Internet Serv. Appl., vol. 1, pp. 7–18, 2010.
[20] M. K. Srinivasan, K. Sarukesi, P. Rodrigues, M. S. Manoj, and P. Revathy, State-of-the-art cloud computing security taxonomies: A classification of security challenges in the present cloud computing environment, in Proc. 2012 International Conference on Advances in Computing, Communications and Informatics, Chennai, India, 2012, pp. 470–476.
[21] A. L. Buczak and E. Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176, 2016.
[22] A. Alshammari and A. Aldribi, Apply machine learning techniques to detect malicious network traffic in cloud computing, Journal of Big Data, vol. 8, p. 90, 2021.
[23] A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, CA, USA: O’Reilly Media, Inc., 2017.
[24] N. Chand, P. Mishra, C. R. Krishna, E. S. Pilli, and M. C. Govil, A comparative analysis of SVM and its stacking with other classification algorithm for intrusion detection, in Proc. 2016 International Conference on Advances in Computing, Communication, & Automation (ICACCA), Dehradun, India, 2016, pp. 1–6.
[25] A. B. Nassif, M. A. Talib, Q. Nasir, H. Albadani, and F. M. Dakalbab, Machine learning for cloud security: A systematic review, IEEE Access, vol. 9, pp. 20717–20735, 2021.
[26] D. Kwon, H. Kim, J. Kim, S. C. Suh, I. Kim, and K. J. Kim, A survey of deep learning-based network anomaly detection, Cluster Comput., vol. 22, pp. 949–961, 2017.
[27] M. A. Ferrag, L. Maglaras, S. Moschoyiannis, and H. Janicke, Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study, Journal of Information Security and Applications, vol. 50, p. 102419, 2020.
[28] V. Kanimozhi and T. P. Jacob, Calibration of various optimized machine learning classifiers in network intrusion detection system on the realistic cyber dataset CSE-CIC-IDS2018 using cloud computing, International Journal of Engineering Applied Sciences and Technology, vol. 4, no. 6, pp. 209–213, 2019.
[29] L. Zhou, X. Ouyang, H. Ying, L. Han, Y. Cheng, and T. Zhang, Cyber-attack classification in smart grid via deep neural network, in Proc. 2nd International Conference on Computer Science and Application Engineering, Hohhot,
Hanaa Attou et al.: Cloud-Based Intrusion Detection Approach Using Machine Learning Techniques 319

China, 2018, pp. 1–5. Computational Intelligence, vol. 36, no. 4, pp. 1580–1592,
[30] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and 2020.
M. Ghogho, Deep learning approach for network intrusion [40] N. M. Abdulkareem and A. M. Abdulazeez, Machine
detection in software defined networking, in Proc. 2016 learning classification based on radom forest algorithm:
International Conference on Wireless Networks and Mobile A review, International Journal of Science and Business,
Communications, Fez, Morocco, 2016, pp. 258–263. vol. 5, no. 2, pp. 128–142, 2021.
[31] L. Zhang, L. Shi, N. Kaja, and D. Ma, A two-stage deep [41] L. Breiman, Random forests, Machine Learning, vol. 45,
learning approach for can intrusion detection, in Proc. 2018 pp. 5–32, 2001.
Ground Vehicle Syst. Eng. Technol. Symp. (GVSETS), Novi, [42] I. Reis, D. Baron, and S. Shahaf, Probabilistic random
MI, USA, 2018, pp. 1–11. forest: A machine learning algorithm for noisy data sets,
[32] A. Mishra, B. B. Gupta, D. Perakovic, F. J. G. Penalvo, The Astronomical Journal, vol. 157, no. 1, p. 16, 2018.
and C. H. Hsu, Classification based machine learning for [43] J. Ali, R. Khan, N. Ahmad, and I. Maqsood, Random forests
detection of DDoS attack in cloud computing, in Proc. 2021 and decision trees, IJCSI International Journal of Computer
IEEE International Conference on Consumer Electronics, Science Issues, vol. 9, no. 5, pp. 272–278, 2012.
Las Vegas, NV, USA, 2021, pp. 1–4. [44] B. O. Yigin, O. Algin, and G. Saygili, Comparison of
[33] F. Jiang, Y. Fu, B. B. Gupta, Y. Liang, S. Rho, F. morphometric parameters in prediction of hydrocephalus
Lou, F. Meng, and Z. Tian, Deep learning based multi- using random forests, Computers in Biology and Medicine,
channel intelligent attack detection for data security, IEEE vol. 116, p. 103547, 2020.
Transactions on Sustainable Computing, vol. 5, no. 2, pp. [45] A. Sarica, A. Cerasa, and A. Quattrone, Random forest
204–212, 2018. algorithm for the classification of neuroimaging data in
[34] A. N. Khan, M. Y. Fan, A. Malik, and R. A. Memon,
alzheimer’s disease: A systematic review, Frontiers in
Learning from privacy preserved encrypted data on cloud
Aging Neuroscience, vol. 9, p. 329, 2017.
through supervised and unsupervised machine learning, in
[46] A. Devarakonda, N. Sharma, P. Saha, and S. Ramya,
Proc. 2019 2nd International Conference on Computing,
Network intrusion detection: A comparative study of four
Mathematics and Engineering Technologies, Sukkur,
classifiers using the NSL-KDD and KDD’99 datasets,
Pakistan, 2019, pp. 1–5.
[35] S. Potluri and C. Diedrich, Accelerated deep neural Journal of Physics: Conference Series, vol. 2161, p. 012043,
networks for enhanced intrusion detection system, in Proc. 2022.
2016 IEEE 21st International Conference on Emerging [47] M. Zeeshan, Q. Riaz, M. A. Bilal, M. K. Shahzad, H.
Technologies and Factory Automation, Berlin, Germany, Jabeen, S. A. Haider, and A. Rahim, Protocol-based deep
2016, pp. 1–8. intrusion detection for DoS and DDoS attacks using UNSW-
[36] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, Long short term NB15 and Bot-IoT data-sets, IEEE Access, vol.10, pp. 2269–
memory recurrent neural network classifier for intrusion 2283, 2021.
detection, in Proc. 2016 International Conference on [48] M. Shafiq, Z. Tian, A. K. Bashir, X. Du, and M. Guizani,
Plateform Technology and Service, Jeju, Republic of Korea, CorrAUC: A malicious Bot-IoT traffic detection method
2016, pp. 1–5. in IoT network using machine-learning techniques, IEEE
[37] J. Zhang, Anomaly detecting and ranking of the cloud Internet Things J., vol. 8, no. 5, pp. 3242–3254, 2021.
computing platform by multi-view learning, Multimedia [49] M. Shafiq, Z. Tian, Y. Sun, X. Du, and M. Guizani,
Tools and Applications, vol. 78, pp. 30923–30942, 2019. Selection of effective machine learning algorithm and Bot-
[38] F. B. Ahmad, A. Nawaz, T. Ali, A. A. Kiani, and G. Mustafa, IoT attacks traffic identification for Internet of things in
Securing cloud data: A machine learning based data smart city, Future Generation Computer Systems, vol. 107,
categorization approach for cloud computing, http://doi.org/ pp. 433–442, 2020.
10.21203/rs.3.rs-1315357/v1, 2022. [50] M. Hossin and M. N. Sulaiman, A review on evaluation
[39] A. Mubarakali, K. Srinivasan, R. Mukhalid, S. C. metrics for data classification evaluations, International
Jaganathan, and N. Marina, Security challenges in Internet Journal of Data Mining & Knowledge Management Process,
of things: Distributed denial of service attack detection doi: 10.5121/ijdkp.2015.5201.
using support vector machine-based expert systems,

Azidine Guezzaz received the PhD degree from Ibn Zohr University, Agadir, Morocco in 2018. He is currently an assistant professor of computer science and mathematics at Cadi Ayyad University. His main research interests are computer security, cryptography, artificial intelligence, intrusion detection, and smart cities. He is also a reviewer for various scientific journals.

Hanaa Attou received the engineer diploma in operational research and decision support from the National Institute of Statistics and Applied Economics, Morocco in 2020. She is currently pursuing the PhD degree in computer security at Cadi Ayyad University, Marrakech. Her main research interests are machine learning, intrusion detection, and cloud environment security.

Said Benkirane received the PhD degree from Chouaib Doukkali University, El Jadida, Morocco in 2013. He is currently an associate professor of computer science and mathematics at Cadi Ayyad University, Marrakech. His research interests include computer security, artificial intelligence, smart cities, and VANET networks. He is also a reviewer for various scientific journals.

Yousef Farhaoui received the PhD degree in computer security from Ibn Zohr University of Science. He is a professor at the Faculty of Sciences and Techniques, Moulay Ismail University of Meknès, and a local publishing and research coordinator of Cambridge International Academics in the United Kingdom. His research interests include learning, e-learning, computer security, big data analytics, and business intelligence. He has three books in computer science. He is a coordinator and member of the organizing committee, a member of the scientific committee of several international congresses, and a member of various international associations. He has authored 4 books and many book chapters with reputed publishers such as Springer and IGI. He serves as a reviewer for IEEE, IET, Springer, Inderscience, and Elsevier journals. He is also a guest editor of many journals with Wiley, Springer, Inderscience, etc. He has been a general chair, session chair, and panelist at several conferences.

Mourade Azrour received the PhD degree from Moulay Ismail University of Meknès, Errachidia, Morocco in 2019, and the MS degree in computer and distributed systems from Ibn Zohr University, Agadir, Morocco in 2014. He currently works as a computer science professor at the Faculty of Sciences and Techniques, Moulay Ismail University of Meknès. His research interests include authentication protocols, computer security, Internet of Things, and smart systems. He is a member of the scientific committee of numerous international conferences. He is also a reviewer for various scientific journals. He has edited the scientific book IoT and Smart Devices for Sustainable Environment and is a guest editor of the journal EAI Endorsed Transactions on Internet of Things.
