Hybrid Metaheuristics With Machine Learning Based
ABSTRACT Botnet detection in a cloud-aided Internet of Things (IoT) environment is a tedious process,
while IoT gadgets are extremely vulnerable to attacks due to poor security practices and limited
computing resources. In the cloud-aided IoT environment, botnets can be identified by monitoring network
traffic and analyzing it for signs of malicious activity. This can be performed by using intrusion detection
systems, machine learning (ML) algorithms, and other security tools that are devised for identifying known
botnet behaviors and signatures. Therefore, this study presents a Hybrid Metaheuristics with Machine
Learning based Botnet Detection (HMMLB-BND) method in the Cloud Aided IoT environment. The
projected HMMLB-BND technique focuses on the detection and classification of Botnet attacks in the
cloud-based IoT environment. In the presented HMMLB-BND technique, the modified firefly optimization
(MFFO) algorithm is used for feature selection. The HMMLB-BND algorithm uses a hybrid convolutional
neural network (CNN)-quasi-recurrent neural network (QRNN) module for botnet detection. For the optimal
hyperparameter tuning process, the chaotic butterfly optimization algorithm (CBOA) is employed. A series
of simulations was run on the N-BaIoT dataset, and the experimental outcomes indicate that the
HMMLB-BND technique outperforms other existing approaches.
INDEX TERMS Deep learning, cloud computing, Internet of Things, cybersecurity, botnet detection.
IoT gadgets found to be rising with the development of IoT [5]. Many IoT gadgets are linked to the internet, which allows their lack of security controls to be abused [6]. Several security threats are aimed at the IoT, which has several vulnerabilities. As the IoT is prone to various attacks, it is important to categorize the attacks and the corresponding vulnerabilities when studying the IoT. Prior research has established that routing, jamming, sinkhole, DoS, wormhole, man-in-the-middle, and worm attacks [7], as well as flooding and viruses, can occur in an IoT system. Specifically, DoS attacks and flooding take place in production IoT platforms [8].

Botnet attacks are becoming increasingly common [9]. Service disruption and resource depletion are some of the damages caused by a botnet. AI is commonly utilized to find such IoT attacks [10]. An intrusion detection system (IDS) is an application that monitors network traffic to detect malicious actions. IDSs are classified by detection approach into two categories, namely anomaly-based and signature-based detection systems [11]. The first detects deviations of network behaviour from established baselines and is suited to detecting both unknown and known malicious events. The second extracts particular patterns from the network (e.g., a sequence of bytes) and then compares these sequences with current signature databases [12]. Compared to traditional ML methods, recent studies note that deep learning (DL) methods detect IoT attacks more effectively. However, only the cloud layer has the resources to run these algorithms [13]. On top of that, these methods are not suitable in some cases, such as remote real-time operation, where the system is expected to make decisions quickly.

The convergence of IoT devices and cloud services poses several complexities because of the huge scale, heterogeneity, and dynamic nature of the ecosystem. The sheer volume of different IoT devices, each with distinct communication protocols and abilities, makes it difficult to devise universal detection methods. Encrypted communication among the devices and cloud services further complicates the examination of network traffic for signs of botnet activity. The limited resources of IoT devices obstruct the design of resource-intensive detection approaches, requiring the design of lightweight yet effective approaches. The distributed nature of botnets and their capability to mimic legitimate device behavior make pinpointing malicious activities and command-and-control nodes challenging. Rapid development of attack approaches, combined with the absence of uniform security standards across IoT devices, exacerbates the difficulty of botnet detection.

To resolve these issues, this study designs a Hybrid Metaheuristics with Machine Learning based Botnet Detection (HMMLB-BND) method in the Cloud Assisted IoT environment. In the presented HMMLB-BND approach, the modified firefly optimization (MFFO) technique is used for feature selection (FS). For botnet detection, the HMMLB-BND technique uses a hybrid convolutional neural network (CNN)-quasi-recurrent neural network (QRNN) model. For the optimal hyperparameter tuning process, the chaotic butterfly optimization algorithm (CBOA) is applied. To demonstrate the enhanced performance of the HMMLB-BND technique, a series of simulations was run on the N-BaIoT dataset. In short, the key contributions are listed as follows.
• Develop a new HMMLB-BND technique comprising MFFO based feature subset selection, CNN-QRNN classification, and CBOA based hyperparameter tuning for botnet detection. To the best of our knowledge, the HMMLB-BND technique has not previously appeared in the literature.
• Present the MFFO algorithm for the feature selection process, which resolves the ineffective exploration ability and the local optima problem.
• Tune the hyperparameters of the CNN-QRNN model using CBOA, which helps to improve the overall predictive performance on unseen data.

II. RELATED WORKS
Vinayakumar et al. [14] presented a botnet detection system based on a two-level DL structure for semantically distinguishing botnet and legitimate activities at the application layer of the domain name system (DNS). At the first level, a Siamese network predicts the similarity of DNS queries against a pre-defined threshold to select the most frequent DNS data across an Ethernet connection. In Shorman et al. [15], an innovative unsupervised evolutionary IoT botnet recognition algorithm was devised. The algorithm detects IoT botnet attacks in IoT devices by using a recent swarm intelligence (SI) method, the grey wolf optimizer (GWO), to optimize the hyperparameters of the one-class support vector machine (OCSVM) and concurrently identify the attributes that best characterize the IoT botnet problem.

The authors of [16] introduced a DL-based botnet attack detection approach that can deal with highly imbalanced network traffic datasets. Specifically, SMOTE generates additional minority samples to achieve class balance, while a DRNN learns hierarchical feature representations from the balanced network traffic data to perform discriminative classification. The authors in [17] devise a botnet detection method utilizing the barnacle's mating optimizer with ML (BND-BMOML) for the IoT platform. This method focuses on the recognition and identification of botnets in IoT platforms. The BND-BMOML algorithm first applies a data standardization method. The BMO approach is then used to select a valuable feature set, and an Elman NN (ENN) is used to detect botnets.

In [18], ML approaches were utilized to support the prevention and detection of bot attacks. In that study, an Ensemble Classifier Algorithm with Stacking Process (ECASP) was devised to select the best features, which are given as input to the ML classifiers for botnet identification. Catillo et al. [19] modelled a new
VOLUME 11, 2023 115669
L. Almuqren et al.: HMMLB-BND in Cloud Assisted Internet of Things Environment
formulated by:

$$\beta(r) = \beta_0 e^{-\gamma r^2} \tag{4}$$

In 2D space, the distance between two fireflies is determined by the Cartesian distance:

$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{5}$$

As discussed, the brighter firefly attracts the less bright one. Therefore, the movement of the ith firefly can be mathematically formulated by:

$$x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha\,(\mathrm{rand} - 0.50) \tag{6}$$

where $\alpha$ signifies the randomization parameter, and rand signifies a uniformly distributed random number within [0, 1]. The FFO technique is a powerful optimization technique. However, it has poor search capability and suffers from the local optima problem. An adapted version of the FFO, namely the MFFO algorithm, is established by introducing the following modifications to the FFO algorithm, so that the search capability is enhanced and the local optima problem is solved. A comprehensive explanation is given in the following.

The poor search capability and the trapping in local optima are addressed by two modifications to the FFO, which yield the MFFO technique: (1) the overall population of fireflies is driven towards the global optimum or best solution; (2) population diversity is enhanced by introducing two mutation operations and three crossover operations. Correspondingly, the entire firefly population can be improved in every iteration under certain assumptions. The comprehensive overview is given below:

Consider that $X_{best}^{itr}$ denotes the best individual, and $X_{worst}^{itr}$ denotes the worst individual in the firefly population at every iteration. From the firefly population, for the ith firefly, three more fireflies are selected randomly as $X_{q1}$, $X_{q2}$, and $X_{q3}$ so that $q_1 \neq q_2 \neq q_3 \neq i$. The two newly generated individuals are given as follows:

$$x_{improve4,j} = \begin{cases} x_{mute1,j} & \text{if } k_5 \le k_4 \\ x_{mute2,j} & \text{if } k_5 > k_4 \end{cases} \tag{12}$$

$$x_{improve5,j} = \psi \times X_{worst} + \zeta\,(X_{best} - X_{worst}) \tag{13}$$

where $k_1$ to $k_5$, $\psi$, and $\zeta$ are random variables in the range from zero to one.

For each firefly, the objective function is evaluated, and the ith firefly is replaced by the firefly having the smaller objective function value. When the ith firefly has an objective function value smaller than that of the best firefly attained, the replacement is not done. The random parameter $\alpha$ controls the random search ability when the neighboring fireflies are not noticeable to the selected firefly. The parameter $\alpha$ controls the movement of every firefly and is selected randomly within [0, 1]. A larger value of $\alpha$ drives the search through the global search space towards an optimum solution, whereas a smaller value of $\alpha$ promotes local search. Thereby, an optimum value of $\alpha$ balances local and global searching. An adaptive control mechanism is devised to improve the search ability (global and local) and accomplish this balance. Moreover, the process runs for multiple epochs, and the value of $\alpha$ for every epoch is obtained as follows:

$$\alpha^{itr+1} = \left(\frac{1}{2k_{max}}\right)^{1/k_{max}} \alpha^{itr} \tag{14}$$

where itr signifies the iteration index, which ranges from 1 to $k_{max}$.

The fitness function of the MFFO algorithm is intended to balance the classification performance (highest) and the number of features selected in every solution (lowest). Eq. (15) gives the fitness function used to evaluate a solution.

$$\mathrm{Fitness} = \alpha\,\gamma_R(D) + \beta\,\frac{|R|}{|C|} \tag{15}$$

where $\alpha$ and $\beta$ are the two parameters weighting the importance of classification quality and subset length, respectively, and $\gamma_R(D)$ characterizes the classification error rate. $|R|$ denotes the cardinality of the selected subset and $|C|$ the total number of features in the dataset, with $\alpha \in [0, 1]$ and $\beta = 1 - \alpha$.
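The firefly building blocks described above can be sketched in a few lines of Python. This is a minimal illustration of Eqs. (4)-(6), (14), and (15) only, not the authors' implementation, and it leaves out the mutation and crossover operators of MFFO. All function names, the defaults beta0 = 1 and gamma = 1, and the weight 0.9 on the error rate are assumptions made for illustration. The two uses of α in the text (randomization parameter in Eq. (6), weight in Eq. (15)) are given separate names in the code.

```python
import math
import random


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Eq. (4): attractiveness seen at distance r (beta0, gamma are assumed)."""
    return beta0 * math.exp(-gamma * r * r)


def distance(xi, xj):
    """Eq. (5), written for any number of dimensions instead of 2D only."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def move_firefly(xi, xj, alpha_rand, beta0=1.0, gamma=1.0, rng=random):
    """Eq. (6): move firefly i toward the brighter firefly j."""
    beta = attractiveness(distance(xi, xj), beta0, gamma)
    return [a + beta * (b - a) + alpha_rand * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


def next_alpha(alpha_rand, k_max):
    """Eq. (14): shrink the randomization parameter once per iteration."""
    return (1.0 / (2.0 * k_max)) ** (1.0 / k_max) * alpha_rand


def fs_fitness(error_rate, n_selected, n_total, w_error=0.9):
    """Eq. (15): weighted sum of error rate and selected-feature ratio.

    w_error plays the role of alpha in Eq. (15); beta = 1 - alpha.
    The value 0.9 is an assumption, not a setting reported in the text.
    """
    w_size = 1.0 - w_error
    return w_error * error_rate + w_size * n_selected / n_total
```

Applying `next_alpha` for `k_max` iterations scales the randomization parameter by 1/(2·k_max) overall, so the search moves gradually from exploration to local refinement. In `fs_fitness`, a lower value is better: a candidate subset is rewarded both for a low error rate and for using few features.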
Thus, to prevent overfitting, we added a dropout layer. Fig. 2 illustrates the architecture of the CNN-QRNN technique.

FIGURE 2. Structure of CNN-QRNN.

Next, max-pooling and 1D convolutional layers are used for extracting spatiotemporal features. The output of the CNN is fed into a Flatten layer, which, as the FC input layer, converts the output of the pooling layers into a single vector that is the input for the following layers. Lastly, the dense layer, also known as the FC layer, with the SoftMax function is utilized to categorize the threat by computing the probability for all the classes.

C. PARAMETER TUNING USING CBOA
The utilization of the CBOA model for hyperparameter tuning helps in attaining improved performance on botnet attack detection. The learning rate is the hyperparameter tuned by the CBOA. The BOA is a swarm-based metaheuristic algorithm based on the information-sharing and foraging behavior of butterflies (BFs) [30]. Due to its performance, BOA has been used in different fields of optimization problems. Each BF produces a fragrance of some intensity as it moves, and the other BFs are attracted towards it based on the magnitude of the fragrance. The fragrance of each BF is given in Eq. (16).

$$pf_i = c I^a \tag{16}$$

where $c$ is the sensory modality and $a$ is the power exponent that signifies the degree of fragrance absorption. $pf_i$ signifies the perceived magnitude of the fragrance, and $I$ represents the fragrance intensity.

Butterflies movement: The movement of BFs depends upon 3 stages as follows.

-Global searching stage. Every BF emits fragrance as it moves, and other BFs move towards it based on the magnitude of the fragrance. This procedure is termed global searching and is determined as:

$$x_i^{t+1} = x_i^t + \left(r^2 \times g^* - x_i^t\right) \times f_i \tag{17}$$

in which $x_i^t$ defines the vector that signifies the ith BF (solution) at iteration $t$, $g^*$ signifies the best solution found overall, $r$ stands for a random number between 0 and 1, and $f_i$ denotes the fragrance of the ith BF. During this stage, the initial value of $g^*$ is the position with the minimal fitness among all solutions: a fitness value is assigned to every solution, the minimal fitness is determined, and $g^*$ is then updated to the corresponding position. Besides, the $r$ value is not used for calculating fitness; it is compared against the switching probability $p$, whose initial value is $p = 0.8$. Comparing $r$ with $p$ decides whether a BF moves towards a better solution with minimal fitness through local or global searching.

-Local searching stage. If the BFs lose the sense of the fragrance of other BFs, they move arbitrarily in the search space. The procedure is termed local searching and is determined as:

$$x_i^{t+1} = x_i^t + \left(r^2 \times x_j^t - x_k^t\right) \times f_i \tag{18}$$

where $x_j^t$ and $x_k^t$ are two vectors that represent two different BFs from the same population.

-Solution estimation. The fragrance intensity of a BF is defined by its objective function. A BF attracts the other BFs based on the magnitude of its fragrance.

The projected CBOA technique combines chaotic maps with the typical BOA. The essential stages of the presented CBOA are given in the following. Chaotic maps are applied to update the BF positions instead of random variables, which enhances the performance of CBOA. Eqs. (17) and (18) are altered by replacing $r^2$ with $C_j$ as follows:

$$x_i^{t+1} = x_i^t + \left(C_j \times g^* - x_i^t\right) \times f_i \tag{19}$$

$$x_i^{t+1} = x_i^t + \left(C_j \times x_j^t - x_k^t\right) \times f_i \tag{20}$$

where $C_j$ denotes the jth chaotic map and $j = 1, 2, \ldots, 10$. Notably, the $C_j$ values are chaotic, created using 10 chaotic maps, and they replace the $r$ value to obtain better outcomes and lower fitness than the original technique, which uses random values.

Fitness selection is a key factor in the CBOA system. Solution encoding is used to evaluate the goodness of the solution candidate. Then, the accuracy of the candidate, measured by the precision $P$ in Eq. (22), is the primary criterion used to devise the fitness function.

$$\mathrm{Fitness} = \max(P) \tag{21}$$

$$P = \frac{TP}{TP + FP} \tag{22}$$

where $TP$ represents the true positive and $FP$ the false positive value.

IV. RESULTS AND DISCUSSION
In this work, the botnet detection results of the HMMLB-BND method are studied on the N-BaIoT [31] dataset. It includes 17001 instances with three class labels as given in Table 1. The proposed model is simulated using Python 3.6.5 on a PC with an i5-8600k, GeForce 1050Ti 4GB, 16GB RAM, 250GB SSD, and 1TB HDD. The parameter settings are given as follows: learning rate: 0.01, dropout: 0.5, batch size: 5, epoch count: 50, and activation: ReLU.

Fig. 3 represents the confusion matrices of the HMMLB-BND method tested under distinct sizes of the TRP and TSP. The results denote that the HMMLB-BND system has identified the botnets proficiently under all TRP and TSP.
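The learning rate in the parameter settings is the hyperparameter that Section III-C tunes with CBOA, and a confusion matrix supplies the TP and FP counts needed by the fitness of Eq. (22). The Python sketch below illustrates one CBOA position update (Eqs. (16), (19), and (20)) and the precision of Eq. (22). It is a sketch under stated assumptions, not the authors' code: the text does not name its 10 chaotic maps, so the logistic map is used here as one assumed example; the values c = 0.01 and a = 0.1, the row/column convention of the confusion matrix, and all function names are likewise hypothetical.

```python
import random


def logistic_map(c, mu=4.0):
    """One step of the logistic map, an assumed stand-in for a chaotic map C_j."""
    return mu * c * (1.0 - c)


def fragrance(intensity, c=0.01, a=0.1):
    """Eq. (16): perceived fragrance pf = c * I**a (c and a are assumed values)."""
    return c * intensity ** a


def cboa_step(pop, best, chaos, frag, p=0.8, rng=random):
    """Move every butterfly once using Eq. (19) or Eq. (20).

    pop   : list of positions (each a list of floats), at least two butterflies
    best  : best position found so far (g*)
    chaos : current chaotic value, which replaces r**2 of the plain BOA
    frag  : fragrance f_i of each butterfly
    """
    new_pop = []
    for x, f in zip(pop, frag):
        chaos = logistic_map(chaos)
        if rng.random() < p:
            # Global search, Eq. (19): step relative to the best solution g*.
            step = [chaos * g - xi for g, xi in zip(best, x)]
        else:
            # Local search, Eq. (20): step built from two random butterflies.
            xj, xk = rng.sample(pop, 2)
            step = [chaos * vj - vk for vj, vk in zip(xj, xk)]
        new_pop.append([xi + s * f for xi, s in zip(x, step)])
    return new_pop, chaos


def precision(cm, cls):
    """Eq. (22) for one class; cm[t][q] counts true class t predicted as q."""
    tp = cm[cls][cls]
    fp = sum(cm[t][cls] for t in range(len(cm)) if t != cls)
    return tp / (tp + fp) if tp + fp else 0.0
```

For learning-rate tuning, each position would be a one-element list holding a candidate learning rate, and the fitness of a candidate would come from `precision` on a validation confusion matrix after training with that rate. With p = 0.8 as in the text, about four out of five updates use the global rule. When using the logistic map with mu = 4, starting values such as 0, 0.25, 0.5, 0.75, or 1 should be avoided because they collapse to a fixed point; a value such as 0.7 keeps the sequence chaotic.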
FIGURE 10. Precision-recall outcome of HMMLB-BND approach.

imply improvements in the HMMLB-BND technique in terms of several measures. The outcomes show that the LSTM and CNN-RNN approaches reach the lowest outcomes, while the DNN-LSTM, LSTM-CNN, and DNN models accomplish closer classification performance.

TABLE 3. Comparative outcome of HMMLB-BND method with existing algorithms.

Next, the BND-BMODL model results in considerable outcomes with $accu_y$, $prec_n$, $reca_l$, and $F_{score}$ of 99.04%, 98.67%, 98.66%, and 98.70% respectively. But the HMMLB-BND technique reaches maximum performance with $accu_y$, $prec_n$, $reca_l$, and $F_{score}$ of 99.43%, 99.13%, 99.12%, and

V. CONCLUSION
In this study, we have established a novel HMMLB-BND method in the Cloud Aided IoT environment. The projected HMMLB-BND technique focuses on the detection and classification of botnet attacks in the cloud-based IoT platform. In the presented HMMLB-BND technique, the MFFO algorithm is applied for FS purposes. To detect and classify botnets properly, the CBOA with the CNN-QRNN model is used. The utilization of the CBOA model helps in attaining improved performance on botnet attack detection. A series of simulations was run on the N-BaIoT dataset to demonstrate the higher performance of the HMMLB-BND technique. The experimental outcomes indicate the advantage of the HMMLB-BND technique over other existing approaches. In the future, ensemble deep-learning classifiers can extend the performance of the HMMLB-BND algorithm. Besides, future work can investigate the computational complexity of the proposed model. In addition, the class imbalance data handling problem will be addressed in the future.

ACKNOWLEDGMENT
The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through large group Research Project under grant number (RGP2/159/44). Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R349), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. We would like to thank SAUDI ARAMCO Cybersecurity Chair for funding this project. This study is supported via funding from Prince Sattam bin Abdulaziz University project number (PSAU/2023/R/1444). This study is partially funded by the Future University in Egypt (FUE).