Proposed Algorithm Base Optimization Scheme For Intrusion Detection Using Feature Selection
Corresponding Author:
Imane Laassar
Department of Computer Science, Faculty of Computer Sciences and Informatics,
Université Ibn Tofail Morocco
Av. de L'Université, Kénitra, Morocco
Email: [email protected]
1. INTRODUCTION
The number of devices connected to the internet is rising rapidly as the internet has become
ingrained in every aspect of modern life. In particular, internet of things (IoT) devices are becoming
commonplace in everyday life. However, the associated security issues are growing, and possible solutions
are being discussed by different researchers [1]. In cloud and IoT security, intrusion detection is used
to identify, verify, and thwart unauthorized entry into a computer network or internetwork. Given the rapid
development of information technology, significant network confidentiality challenges remain. Consequently, it
is imperative to have an intrusion detection system (IDS) for the security of a network [2].
IDSs fall into several categories based on distinct approaches. The two primary divisions are active and
passive IDS. The traditional active IDS is unable to address newly emerging threats. Because network traffic
has an enormous number of components and features, one of the primary challenges in finding intrusions is to
distinguish between regular and anomalous network connections. IDS is frequently used to determine how
and where intrusions occur. Researchers have thoroughly investigated several feature selection
strategies to achieve real-time intrusion detection [3].
A compelling way to improve the accuracy and speed of classification schemes is to reduce
the number of features by selecting only the essential ones. Machine learning techniques
have been widely used to recognize various attack types, and they can assist network administrators in
responding to network attacks by guiding them toward the best course of action. The majority of these
conventional machine learning techniques, however, fall within the shallow learning category and require
extensive feature extraction and feature selection [4]. The classifier, which uses a detection mechanism
to distinguish between intrusion and normal activity, is the fundamental component of an IDS. It can be
difficult to implement a classifier with an accurate detection method, especially in IoT and cloud
computing networks with many devices [5], [6]. Figure 1 presents the structure of IoT and cloud
computing (CC) integration and working criteria. The rest of this paper is structured as follows:
section 2 presents related work, section 3 discusses the proposed algorithm, section 4 covers the
evaluation parameters, section 5 presents the results, and section 6 concludes the paper.
2. RELATED WORK
The two main types of intrusion detection techniques are anomaly-based and signature-based. With
signature-based approaches, known intrusion patterns that have been tested and verified are stored in the
system as predefined signatures. The system then compares observed activity with these patterns, and if a
matching pattern is seen, the activity is labeled as an intrusion. Naturally, these techniques cannot
identify brand-new or zero-day threats. They are, however, particularly good at identifying known threats
and their patterns [8]. Anomaly-based methods construct a model of normal activity, after which any
deviation from it may denote an intrusion. Because there is no fixed pattern to monitor, anomalous
intrusions are well known to be exceedingly challenging to find. An event is typically deemed abnormal if
it occurs considerably more frequently or less frequently than a threshold [9]. Some AI methods employ
tree-based algorithms such as decision trees and random forests, which can build a structure for
successfully detecting intrusions. In a decision tree algorithm, decisions are made step by step in
accordance with the parameters of the problem. However, a single decision tree may not always be sufficient
to model a problem. Therefore, multiple decision trees are employed in random forest algorithms to improve
overall decision-making accuracy. For software-defined networks, Xu et al. [10] presented an anomaly-based
method (IDSML) that enhances detection performance by combining several distinct tree-based methods.
Neural networks are employed in other AI methods to determine whether a specific event resembles known
patterns. Neural networks are made up of a number of interconnected nodes and have the ability to recognize
patterns. According to Revathi and Malathi [11], calculations in a neural network take a long time because
decision-making problems have many parameters. Neural networks have been the primary detection method in
numerous studies.
The artificial bee colony (ABC) algorithm was introduced in 2005 by Karaboga as a swarm intelligence
heuristic that mimics the collective behavior of honeybees. It was initially created to address some
numerical optimization problems. According to Vinayakumar et al. [12], the ABC algorithm was used to
optimize multivariate functions, and it was compared to other methods such as the genetic algorithm (GA) and
particle swarm optimization (PSO). The results showed that ABC outperformed the other methods. On the
other hand, while the ABC algorithm excels at exploring the solution space, it is weak at exploitation and
prone to settling into a local optimum. The GABC algorithm, which enhances exploitation by including
information on the global optimal solution in the solution search equation, was introduced as an upgrade to
the ABC algorithm [13]. According to Mishra et al. [14], a multi-strategy ensemble artificial bee colony
(MEABC) algorithm was suggested. In MEABC, a variety of distinct solution search strategies coexist and
compete to produce offspring throughout the search process. When applied to continuous optimization problems,
the MEABC approach significantly enhances the performance of ABC. According to Karaboga and Ozturk [15],
an ABC algorithm incorporating elite-guided search equations and a depth-first architecture, called
DFSABC_elite, was introduced. The algorithm's exploitation ability is improved by giving superior solutions
higher priority for computational resources.
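The gbest-guided idea behind GABC can be illustrated with a minimal sketch. It assumes the commonly published form of the update, v_ij = x_ij + phi (x_ij - x_kj) + psi (gbest_j - x_ij), with phi drawn from [-1, 1] and psi from [0, C]. The function name `gabc_candidate`, the value C = 1.5, and the toy population are illustrative assumptions and are not taken from this paper.

```python
import random

def gabc_candidate(population, i, gbest, C=1.5, rng=random):
    """Return a candidate for solution i by perturbing one random dimension.

    Assumed update: v_ij = x_ij + phi*(x_ij - x_kj) + psi*(gbest_j - x_ij).
    """
    x = population[i]
    # Pick a neighbour k that differs from i.
    k = rng.choice([idx for idx in range(len(population)) if idx != i])
    # Pick the single dimension j to modify.
    j = rng.randrange(len(x))
    phi = rng.uniform(-1.0, 1.0)   # neighbour-driven exploration term
    psi = rng.uniform(0.0, C)      # pull toward the global best (exploitation)
    v = list(x)
    v[j] = x[j] + phi * (x[j] - population[k][j]) + psi * (gbest[j] - x[j])
    return v

# Toy data, purely illustrative.
population = [[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
gbest = [0.5, 0.5]
candidate = gabc_candidate(population, 0, gbest, rng=random.Random(0))
```

Only one dimension changes per call, which keeps the candidate close to its parent; the psi term is what biases the search toward the best solution found so far.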
3. PROPOSED ALGORITHM
The population-based, iterative ABC method is a powerful approach for tackling numerical
optimization problems. As noted in previous work [17], its search equations are stronger at exploration
than at exploitation. Additionally, the ABC algorithm's convergence performance is not outstanding.
Therefore, in [18], a binary search framework (BSF) and two solution search equations, as given in (1),
were suggested to better balance exploration and exploitation. The BSF improves the algorithm's
exploitation ability by giving better solutions higher priority when allocating computational resources.
The search equations retain the solution with the highest fitness value in each iteration, which speeds up
the algorithm's training [19].
$$V_{e,j} = \frac{1}{2}\left(X_{e,j} + X_{best,j}\right) + \phi_{e,j} \times \left(X_{best,j} - X_{k,j}\right) \tag{2}$$
where $X_e$ is a solution randomly selected from the binary search solutions and $X_k$ is a solution
randomly selected from the current population, with $e \neq k$. $X_{best}$ is the current best solution,
and $\phi_{i,j}$ and $\phi_{e,j}$ are random real values in the range [-1, 1]. To better balance the
exploration and exploitation capacities of ABC, paper [20] addresses the problem that the candidate solution
search equation in [21] disturbs the search solution too strongly, and then presents a binary search
equation; [23] addresses the same problem for the search equation in [22]. Different search equations should
be used for the candidate solutions and the accepted solutions.
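Equation (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the update is applied to every dimension j (the text does not say whether one or all dimensions are modified), and the function name `candidate_eq2` and its argument names are hypothetical.

```python
import random

def candidate_eq2(x_e, x_k, x_best, rng=random):
    """Build candidate V_e from equation (2), dimension by dimension.

    V_ej = 0.5 * (X_ej + X_best_j) + phi_ej * (X_best_j - X_kj),
    with phi_ej drawn uniformly from [-1, 1].
    Assumption: every dimension j is updated.
    """
    v = []
    for xe_j, xk_j, xb_j in zip(x_e, x_k, x_best):
        phi = rng.uniform(-1.0, 1.0)
        v.append(0.5 * (xe_j + xb_j) + phi * (xb_j - xk_j))
    return v

# When X_k equals X_best the random term vanishes and the candidate
# is the midpoint between X_e and X_best.
midpoint = candidate_eq2([0.0, 2.0], [1.0, 1.0], [1.0, 1.0])
```

The first term moves the candidate halfway toward the best solution, while the second term adds a random step whose size depends on how far the randomly chosen solution is from the best one.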
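$$P_i = \frac{c_1 \times \mathit{pbest}_i + c_2 \times \mathit{gbest}}{c_1 + c_2} \tag{3}$$

$$X_i = N\left(\frac{\mathit{gbest} + \mathit{pbest}_i}{2},\ \mathit{gbest} - \mathit{pbest}_i\right) \tag{4}$$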
Int J Adv Appl Sci, Vol. 13, No. 1, March 2024: 24-32
Int J Adv Appl Sci ISSN: 2252-8814 27
Here, N denotes the Gaussian distribution, with $(\mathit{gbest} + \mathit{pbest}_i)/2$ as the mean and the
difference between $\mathit{gbest}$ and $\mathit{pbest}_i$ (taken in magnitude, since a standard deviation is
non-negative) as the standard deviation. The Gaussian distribution in (4) is used to take advantage of the
information around $\mathit{pbest}$ and $\mathit{gbest}$. Following (4), a comparable Gaussian search
equation, (5), is suggested [24].
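$$V_{e,j} = N\left(\frac{X_{best,j} + X_{i,j}}{2},\ X_{best,j} - X_{i,j}\right) \tag{5}$$

$$V_{e,j} = \frac{1}{2}\left(X_{e,j} + X_{best,j}\right) + \phi\left(X_{e,j} + X_{best,j}\right) + \phi_{e,j}\left(X_{best,j} - X_{e,j}\right) \tag{6}$$

$$net_j = \sum_{i=1}^{m} \omega_{i,j}\,\chi_i + \theta_i \tag{7}$$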
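The Gaussian search equation in (5) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the absolute value used for the standard deviation is an assumption (a standard deviation cannot be negative), as are the function name `gaussian_candidate` and the choice to sample every dimension.

```python
import random

def gaussian_candidate(x_i, x_best, rng=random):
    """Sample a candidate around the midpoint of X_i and X_best, as in (5).

    mean  = (X_best_j + X_ij) / 2
    sigma = |X_best_j - X_ij|   (absolute value is an assumption)
    """
    v = []
    for xi_j, xb_j in zip(x_i, x_best):
        mean = 0.5 * (xb_j + xi_j)
        sigma = abs(xb_j - xi_j)
        v.append(rng.gauss(mean, sigma))
    return v

# If X_i already equals X_best, sigma is zero and the sample collapses
# onto the best solution.
collapsed = gaussian_candidate([1.0, 2.0], [1.0, 2.0])
```

The spread of the samples shrinks as a solution approaches the best one, so the search automatically becomes more local as the population converges.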
Finally, the neural network is trained with the backpropagation method, starting from the initial weight
and threshold values produced by the BABCN algorithm. By using gradient descent, the backpropagation method
attempts to reduce the training error. The neural network for network traffic intrusion detection will employ
the weights and thresholds with the minimum training error as its parameters [25]. The working criteria of the
proposed backpropagation neural network are as follows: choose a sample of data for training, then
generate at random the weight values for the connections between the hidden layer neurons and the output
layer neurons ($\omega_{jk}$) and between the input layer neurons and the hidden layer neurons
($\omega_{ij}$). Additionally, create the threshold values $\theta_j$ of the neurons in the hidden layer and
$\theta_k$ of the neurons in the output layer [26].
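$$V_{e,j} = \frac{1}{2}\left(X_{e,j} + X_{best,j}\right) + \phi\left(X_{e,j} + X_{best,j}\right) + \phi_{e,j}\left(X_{best,j} - X_{e,j}\right) \tag{8}$$

$$net_j = \sum_{i=1}^{m} \omega_{i,j}\,\chi_i + \theta_i \tag{9}$$

$$y_j = \vartheta_1\left(net_j\right) \tag{10}$$

$$Z_k = \vartheta_2\left(net_k\right) \tag{12}$$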
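The forward pass described by (9), (10), and (12) can be sketched as below. Several details are assumptions, not statements from the paper: both activation functions are taken to be the logistic sigmoid, each neuron has its own threshold that is added to its weighted sum, and the names `layer` and `forward` are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic activation; an assumed choice for both theta_1 and theta_2."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, thresholds, activation):
    """Compute one layer: net_j = sum_i w[i][j] * x[i] + threshold[j].

    weights[i][j] connects input i to neuron j.
    """
    outputs = []
    for j in range(len(thresholds)):
        net_j = sum(weights[i][j] * inputs[i] for i in range(len(inputs)))
        net_j += thresholds[j]
        outputs.append(activation(net_j))
    return outputs

def forward(x, w_ih, theta_h, w_ho, theta_o):
    """Return hidden outputs y (as in (10)) and network outputs z (as in (12))."""
    y = layer(x, w_ih, theta_h, sigmoid)
    z = layer(y, w_ho, theta_o, sigmoid)
    return y, z

# With all weights and thresholds at zero, every sigmoid outputs 0.5.
y, z = forward([1.0, 0.0],
               [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0],
               [[0.0], [0.0]], [0.0])
```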
According to (8), the neural network's error is estimated. If the error fulfills the criteria, (9) and (10)
are followed; otherwise, (11) is followed with (12).
$$J(w) = \frac{1}{2}\sum_{k=1}^{q}\left(t_k - z_k\right)^2 \tag{13}$$
Equations (13) and (14) are used to modify the threshold and weight values between the hidden layer and the
output layer. The weight and threshold values between the input layer and the hidden layer are changed in
accordance with (15) [27], [28].
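$$\nabla\theta_j = \eta\left[\sum_{k=1}^{q} \omega_{jk}\,\delta_k\right]\vartheta_1'\left(net_j\right) \tag{17}$$

$$\delta_k = -\frac{\partial J}{\partial net_k} = -\frac{\partial J}{\partial z_k}\,\frac{\partial z_k}{\partial net_k} = \left(t_k - z_k\right)\vartheta_2'\left(net_k\right) \tag{18}$$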
Once the results of (16) [29] and (17) [30] are known, the new weight values $\omega_{ij}$ and the new
threshold values $\theta_j$ between the input layer and the hidden layer, as well as the new weight values
$\omega_{jk}$ and the new threshold values $\theta_k$ between the hidden layer and the output layer, can be
obtained.
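The loss in (13), the output error term in (18), and the hidden-layer threshold update in (17) can be sketched together. The sketch assumes sigmoid activations, so that the derivative can be written from the neuron output a as a(1 - a); the paper does not name the activation functions. All function names are hypothetical.

```python
def loss(targets, outputs):
    """Squared-error loss J(w) = 0.5 * sum_k (t_k - z_k)^2, as in (13)."""
    return 0.5 * sum((t - z) ** 2 for t, z in zip(targets, outputs))

def sigmoid_prime_from_output(a):
    """Derivative of the sigmoid expressed through its output a (assumed activation)."""
    return a * (1.0 - a)

def output_deltas(targets, z):
    """delta_k = (t_k - z_k) * theta_2'(net_k), as in (18)."""
    return [(t - zk) * sigmoid_prime_from_output(zk) for t, zk in zip(targets, z)]

def hidden_threshold_updates(w_ho, deltas, y, eta):
    """Update for theta_j = eta * [sum_k w_jk * delta_k] * theta_1'(net_j), as in (17).

    w_ho[j][k] connects hidden neuron j to output neuron k.
    """
    updates = []
    for j in range(len(y)):
        back = sum(w_ho[j][k] * deltas[k] for k in range(len(deltas)))
        updates.append(eta * back * sigmoid_prime_from_output(y[j]))
    return updates

# One output neuron that predicts 0.5 for a target of 1.0.
J = loss([1.0], [0.5])
deltas = output_deltas([1.0], [0.5])
updates = hidden_threshold_updates([[2.0], [0.0]], deltas, [0.5, 0.5], eta=1.0)
```

A hidden neuron whose outgoing weight is zero receives no update, which matches the intuition that it did not contribute to the output error.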
$$fit_i = \begin{cases} \dfrac{1}{1 + f(X_i)}, & f(X_i) \ge 0 \\[2ex] 1 + \left|f(X_i)\right|, & f(X_i) < 0 \end{cases} \tag{19}$$
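The fitness mapping in (19) translates directly into code. The sketch below is illustrative; the function name `fitness` is hypothetical, and `f_value` stands for the objective value f(X_i) of a candidate solution.

```python
def fitness(f_value):
    """Map an objective value f(X_i) to a fitness value, as in (19).

    Smaller objective values give larger fitness values.
    """
    if f_value >= 0:
        return 1.0 / (1.0 + f_value)
    return 1.0 + abs(f_value)

# A zero loss gives the best non-negative case, fitness 1.0.
examples = [fitness(0.0), fitness(1.0), fitness(-2.0)]
```

Because a neural network loss such as (13) is never negative, only the first branch would be exercised when the objective is a training error.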
Return to step (18) using the updated weight and threshold values. Stop the training process if the
error meets the specified criterion. Otherwise, obtain the corresponding output signal from the neural
network by using the present weights and thresholds with the neural network input signals. The objective
function of (18) is set to the loss function of the neural network, (19). Set the maximum cycle number
(MCN). Figure 2 presents the working approach [31].
4. EVALUATION METRICS
The following evaluation parameters are measured in this paper, as defined in (20) to (22).
$$AC = \frac{TP + TN}{TP + TN + FP + FN} \tag{20}$$
Accuracy (AC) is defined by (20) as the proportion of samples that have been correctly identified among all
samples [28].
$$TPR = \frac{TP}{TP + FN} \tag{21}$$
The true positive rate (TPR), which is the percentage of correctly identified anomaly samples over all
anomaly samples, is equal to the detection rate (DR) [32].
$$FPR = \frac{FP}{FP + TN} \tag{22}$$
The ratio of the number of normal samples that were incorrectly labeled as anomaly samples to the total
number of normal samples is known as the false positive rate (FPR) [33].
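The three metrics in (20) to (22) can be computed from confusion-matrix counts as in the short sketch below. The function name `detection_metrics` and the example counts are illustrative assumptions, not results from this paper, and the sketch does not guard against empty classes (zero denominators).

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (accuracy, TPR, FPR) from confusion-matrix counts.

    tp: anomalies correctly flagged      fn: anomalies missed
    tn: normal samples correctly passed  fp: normal samples flagged as anomalies
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # equation (20)
    tpr = tp / (tp + fn)                         # equation (21), detection rate
    fpr = fp / (fp + tn)                         # equation (22)
    return accuracy, tpr, fpr

# Hypothetical counts: 100 anomalies and 100 normal samples.
ac, tpr, fpr = detection_metrics(tp=80, tn=90, fp=10, fn=20)
```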
5. RESULTS
Performance is visualized with a receiver operating characteristic (ROC) curve, which shows how
classification behavior changes as the model's decision threshold is varied. The two parameters of the ROC
curve are the true positive rate and the false positive rate. Table 2 displays the outcomes for a batch size
of 32. In this case, the mean accuracy of the proposed BABCN algorithm classifier declined as the number of
training epochs increased from 10 to 32. Figure 3 presents the batch operation of different algorithms:
Figure 3(a) shows the elapsed time of the different algorithms, and Figure 3(b) shows the epoch time of the
different algorithms.
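How the points of an ROC curve arise from a threshold sweep can be sketched as follows. This is a minimal illustration with made-up labels and scores, not the evaluation code used for the reported results; the function name `roc_points` is hypothetical, and a sample is flagged as an anomaly when its score is at or above the threshold.

```python
def roc_points(labels, scores):
    """Return one (FPR, TPR) pair per distinct score used as a threshold.

    labels: 1 = anomaly, 0 = normal; scores: higher means more anomalous.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Two anomalies and two normal samples with illustrative scores.
points = roc_points([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2])
```

Lowering the threshold can only add flagged samples, so both rates are non-decreasing along the sweep, ending at (1.0, 1.0) when everything is flagged.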
Tables 3 and 4 display the outcomes for batch sizes of 64 and 128. The mean accuracy of the
proposed BABCN algorithm classifier appeared to increase as the number of training epochs grew.
As the number of epochs increased from 15 to 45, the accuracy of BABCN first decreased slightly and then
increased at 45 epochs. Table 4 shows that a larger batch size could result in a shorter run time. Figure 4
presents the accuracy of different approaches.
Figure 3. The batch operation of (a) elapsed time of different algorithms and (b) epoch time of different algorithms
6. CONCLUSION
In this study, we examined various machine learning and deep learning techniques on an IoT network and
compared them with our proposed approach. We analyzed RF, convolutional neural network (CNN), MLP, and
the proposed BABCN algorithm. The best outcome in terms of multiclass classification accuracy and AUC was
achieved by random forests and CNN. In trials with batch sizes of 32 and 64, the accuracy slightly decreased
as epochs were added, whereas in trials with a batch size of 128, the accuracy slightly increased.
Additionally, we found that increasing the batch size sped up the computation. For the proposed BABCN
algorithm, doubling the batch size could speed up computation by 1.3–2.4 times, while for CNN, it could
accelerate computation by 1.8–2.4 times. Our long-term objective is to create models using the proposed
BABCN algorithm. Future deployment of our proposed system aims to deliver detection and classification
services against various cyber-attacks and intrusions within a network of IoT devices (e.g., a network of
advanced RISC machine (ARM), Arduino, or Raspberry Pi nodes).
ACKNOWLEDGMENTS
The author thanks Ibn-Tofail University, Kenitra, Morocco, for its support and her supervisor for his
support and time.
REFERENCES
[1] M. S. Noori, R. K. Z. Sahbudin, A. Sali, and F. Hashim, “Feature drift aware for intrusion detection system using developed
variable length particle swarm optimization in data stream,” IEEE Access, vol. 11, pp. 1–1, 2023, doi:
10.1109/access.2023.3333000.
[2] H. Gupta, S. Sharma, and S. Agrawal, “Artificial intelligence-based anomalies detection scheme for identifying cyber threat on
IoT-based transport network,” IEEE Transactions on Consumer Electronics, pp. 1–1, 2023, doi: 10.1109/tce.2023.3329253.
[3] F. Feng, K. C. Li, J. Shen, Q. Zhou, and X. Yang, “Using cost-sensitive learning and feature selection algorithms to improve the
performance of imbalanced classification,” IEEE Access, vol. 8, pp. 69979–69996, 2020, doi: 10.1109/ACCESS.2020.2987364.
[4] W. A. H. M. Ghanem et al., “Cyber intrusion detection system based on a multiobjective binary bat algorithm for feature selection
and enhanced bat algorithm for parameter optimization in neural networks,” IEEE Access, vol. 10, pp. 76318–76339, 2022, doi:
10.1109/ACCESS.2022.3192472.
[5] Y. Gong, Y. Fang, L. Liu, and J. Li, “Multi-agent intrusion detection system using feature selection approach,” in Proceedings -
2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014, IEEE,
Aug. 2014, pp. 528–531. doi: 10.1109/IIH-MSP.2014.137.
[6] L. Hakim, R. Fatma, and Novriandi, “Influence analysis of feature selection to network intrusion detection system performance
using NSL-KDD dataset,” in Proceedings - 2019 International Conference on Computer Science, Information Technology, and
Electrical Engineering, ICOMITEE 2019, IEEE, Oct. 2019, pp. 217–220. doi: 10.1109/ICOMITEE.2019.8920961.
[7] Y. Su, K. Qi, C. Di, Y. Ma, and S. Li, “Learning automata based feature selection for network traffic intrusion detection,” in
Proceedings - 2018 IEEE 3rd International Conference on Data Science in Cyberspace, DSC 2018, IEEE, Jun. 2018, pp. 622–
627. doi: 10.1109/DSC.2018.00099.
[8] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, “Network intrusion detection for IoT security based on
learning techniques,” IEEE Communications Surveys and Tutorials, vol. 21, no. 3, pp. 2671–2701, 2019, doi:
10.1109/COMST.2019.2896380.
[9] X. Zhang, P. Zhu, J. Tian, and J. Zhang, “An effective semi-supervised model for intrusion detection using feature selection based
LapSVM,” in IEEE CITS 2017 - 2017 International Conference on Computer, Information and Telecommunication Systems,