DCNN: a novel binary and multi-class network intrusion detection model via deep convolutional neural network
EURASIP Journal on Information Security (2024) 2024:36
https://doi.org/10.1186/s13635-024-00184-1
Abstract
Network security has become imperative in the context of our interconnected networks and everyday communications. Recently, many deep learning models have been proposed to tackle the problem of predicting intrusions and malicious activities in interconnected systems. However, many of them focus solely on binary classification, and those that address multi-class classification often do not report the performance of individual classes. Moreover, many of them are trained and tested using outdated datasets, which ultimately degrades overall performance. Therefore, there is a need for an efficient and accurate network intrusion detection system. In this paper, we propose a novel intelligent detection system based on a convolutional neural network, namely DCNN. The proposed model can be used to efficiently analyze and detect attacks and intrusions in intelligent network systems (e.g., suspicious network traffic activities and policy violations). The DCNN model is evaluated on three benchmark datasets and compared with state-of-the-art models. Experimental results show that the proposed model improves resilience to intrusions and malicious activities for binary as well as multi-class classification, expanding its applicability across different intrusion detection scenarios. Furthermore, our DCNN model outperforms similar intrusion detection systems in terms of positive predictive value, true positive rate, F1 measure, and accuracy. The scores obtained for binary and multi-class classification on the CICIoT2023 dataset are 99.50% and 99.25%, respectively. Additionally, on the CICIDS-2017 dataset, DCNN attains a score of 99.96% for both binary and multi-class classification, while on the CICIoMT2024 dataset it attains scores of 99.98% and 99.86% for binary and multi-class classification, respectively.
Keywords Convolutional neural network, Cybersecurity, Deep learning, Deep neural network, Network intrusion
detection
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit http://creativecommons.org/licenses/by/4.0/.
Shebl et al. EURASIP Journal on Information Security (2024) 2024:36 Page 2 of 23
important research field that requires in-depth investigation and has led to the conception of network intrusion detection systems (NIDSs) [8]. Through their ability to detect and predict intrusions, NIDS technology has proven its outstanding reliability in blocking hacking attempts by unauthorized users and sustaining network security [9, 10].
Correspondingly, artificial intelligence (AI) [11], which simply refers to machines imitating intelligent human behavior [12], is increasingly becoming a key feature in the field of network security and intrusion detection [13–16]. Machine learning (ML) and deep learning (DL) algorithms are being utilized for learning and development, enabling the examination of massive datasets and the detection of subtle insights and patterns that surpass human ability, thereby enhancing NIDSs [17, 18]. DL models are a type of ML that takes inspiration from the human brain and its ability to filter information. They use stacked mathematical functions arranged in layers to mimic the function of neurons in biological neural networks; the stacking of layers gives the models their depth, hence the name deep learning [19]. These models can efficiently learn concealed network behavior and features, and extract patterns and non-redundant information from data. Thus, this approach enables efficient detection of potential attacks in large datasets, unlike ML, which works better with smaller datasets [20, 21].
Nevertheless, despite the noticeable advancements in attack detection techniques for IoT environments, there are still many limitations in currently proposed models. On the one hand, some existing research primarily emphasizes binary classification (e.g., see [7, 22, 23]). However, this approach is limited, as different attack types require individual analysis and investigation. On the other hand, other researchers direct their attention towards multi-class classification (e.g., see [5, 24–31]), disregarding binary classification. Furthermore, monitoring the performance of each class is imperative in order to guarantee that the model is functioning effectively across all classes. However, many models do not report the performance of individual classes in multi-class classification (e.g., see [6, 26, 27, 29, 32–37]). This results in an incomplete comprehension of the model’s performance. Moreover, another issue of the currently proposed NIDS models is the use of outdated datasets during model training and testing, which do not accurately reflect modern IoT network traffic (e.g., see [6, 22, 28, 29]). This leads to poor model generalization and inaccurate model performance metrics when deployed in real-world scenarios. Besides, the improper selection and adjustment of techniques and key parameters related to dimensionality reduction and sampling leads to defects in the entire process, such as potential bias, overfitting and the exclusion of important information. As a result, classification algorithms suffer degraded performance, particularly for minority samples.
In this paper, convolutional neural network (CNN) and deep neural network (DNN) models are utilized to propose an NIDS based on deep learning. This method combines the strengths of both deep learning architectures: CNNs excel at identifying important patterns by treating rows or columns as spatial features, while DNNs capture complex relationships between features, especially in high-dimensional data or when feature interactions are key. This results in a more accurate model with improved performance in both multi-class and binary classification tasks.
Furthermore, as our proposed model targets the detection of ongoing malicious intrusions, it is best trained using three distinct yet closely related datasets: CICIoT2023 [33], CICIoMT2024 [34], and CICIDS-2017 [38]. These datasets are chosen due to the imbalance in class distribution, popularity, and the inclusion of a variety of network traffic examples, including both normal and malicious actions. In addition, the performance of our model is compared to that of the DNN as well as related state-of-the-art models. Experimental findings demonstrate that the proposed model exhibits increased F-score, accuracy and detection rate, indicating its robust performance against cyber attacks.
The rest of this paper is organized as follows: We start by summarizing related work, outlining recent DL models used for NIDS. Afterwards, the proposed model is presented thoroughly in Sect. 3. The experimental environment configuration and result assessment as well as the analysis of our experimental findings are presented in Sect. 4. Section 5 discusses the implications and the limitations of the presented model. Finally, the paper is closed with conclusions and an outline for future work.

2 Related work
Motivated by the need to detect intrusions and defend networks against attacks, researchers have put a lot of effort into developing a variety of network intrusion detection systems (NIDS) [5, 6, 24–29, 32, 35, 36]. Table 1 provides a summary of the limitations and strengths of recently published models. In the following, we further elaborate on the strengths and limitations of these models.
Various NIDS approaches offer promising results, but they are only tested and validated via multi-class classification. For instance, recently, authors in [5] proposed a hybrid IDS based on DL by merging CNN and LSTM (long short-term memory network) while optimizing category weights to reduce data imbalance. Similarly, in [24], an IDS based on CNN is proposed to attain the
Table 1 Summary of the limitations and strengths (in table footnotes) of existing IDS models
# Limitations Model
security of network against intrusions, while in [25], an IDS based on Robust Transformer, named RTIDS, is introduced. In addition, a two-stage hybrid model based on DL named LSTM-AE is proposed in [26], while in [30], an innovative approach was introduced. That approach focuses on enhancing attack detection by leveraging a blending model, offering high accuracy and a deeper understanding of attack classifications. In addition, authors in [27] proposed a one-dimensional Pyramid Depth-wise Separable CNN IDS (PyDSC-IDS) based on OneHot Encoding and VGM, while another intrusion detection method is introduced in [29]. In a similar manner, authors in [31] introduced IoT-PRIDS, a novel packet representation-based approach for intrusion detection. This approach is a lightweight non-ML model that can operate in near real-time scenarios, as it only requires parsing packets.
Furthermore, many of the presented models do not provide in-depth information regarding the performance of individual classes in the context of multi-class classification. Examples of models that belong to this category include those introduced in [26, 27, 29]. Likewise, authors in [6] proposed a hierarchical CNN-Attention network named CANET with Equalization Loss v2 in order to balance learning attention of minority classes by increasing their weights. This proposal was found to enhance the rate of detecting minority classes. In addition, a hybrid IDS combining the feature extraction ability of the LSTM and the spatial feature extraction ability of the CNN is proposed in [32], while in [35], a bi-phase NIDS that utilizes a combination of PSO and GA for feature selection is outlined. However, there is a possibility of excluding important information when reducing features. Another novel technique for intrusion detection in IoT named DIS-IoT is proposed in [36]. The proposed model utilizes a stacking ensemble of four DL models integrated into a deep fully connected layer. In the same way, in [37] an innovative model, namely CKAN (Convolutional Kolmogorov-Arnold Network), is proposed. The CKAN model is developed by replacing the MLP (Multi-Layer Perceptron) layers with KAN layers inside the CNN architecture.
Moreover, some approaches are limited to binary classification. For example, in [22], an approach to select optimal features is proposed by combining statistical importance through the difference in standard deviation of median and mean. Moreover, in [23], a two-phase IDS is introduced to classify multi-class data using a Naive Bayes classifier, followed by majority voting. In the second phase, binary data are passed through an unsupervised elliptic envelope for classification. Another approach to intrusion detection based on AutoML and soft voting is proposed in [7]. This approach utilizes an AutoML framework to select supervised classifiers and sampling methods to address data imbalance and minority classes.
Additionally, some developed models adopt sampling techniques to solve the problem of data imbalance. However, although sampling techniques serve as a solution to the problem of data imbalance, the replication of minority class events may increase the risk of overfitting and time consumption. Examples of models that fit into this category are those proposed in
Table 2 Summary of the CICIoT2023, CICIDS-2017 and CICIoMT2024 datasets [33, 34, 38]
Class CICIoT2023 Class CICIDS-2017 Class CICIoMT2024
Samples % Samples % Samples %
Fig. 1 A schematic diagram depicting the process of data preprocessing. First, we load and merge raw data. Then, errors, inconsistencies and duplicates are eliminated from the data in the cleaning phase. Unnecessary features are excluded during the feature selection phase. After that, non-numerical values are transformed into numerical values with a one-hot encoder during the data encoding phase. Next, data is normalized in the feature scaling stage using the standardization scaling technique. Finally, the data is split into training (75%) and testing (25%) sets
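The cleaning, encoding, scaling and splitting stages named in the Fig. 1 caption can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`one_hot`, `standardize`, `train_test_split`), the toy records and the fixed shuffle seed are all hypothetical, and the feature selection stage is omitted.

```python
import random
from statistics import mean, pstdev

def one_hot(values):
    """Data encoding: map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def standardize(column):
    """Feature scaling: subtract the mean and divide by the standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then split into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy records (flow_duration, protocol, label); the values are made up.
records = [(0.5, "tcp", 0), (1.5, "udp", 1), (2.5, "tcp", 0), (3.5, "icmp", 1),
           (4.5, "tcp", 0), (5.5, "udp", 1), (6.5, "tcp", 0), (7.5, "udp", 1),
           (7.5, "udp", 1)]  # the last record duplicates the one before it

records = list(dict.fromkeys(records))            # cleaning: drop exact duplicates
durations = standardize([r[0] for r in records])  # feature scaling
protocols = one_hot([r[1] for r in records])      # data encoding
rows = [[d] + p + [r[2]] for d, p, r in zip(durations, protocols, records)]

train, test = train_test_split(rows)              # 75% training, 25% testing
print(len(train), len(test))
```

The sketch scales before splitting because that is the order given in the caption; fitting the scaler on the training part only is a common alternative when leakage from the test set is a concern.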
Fig. 2 The architecture of DCNN model merges the strengths of CNN and DNN models. CNN is exceptional in spatial and sequential feature
extraction. The fully connected DNN layer acts as a powerful classifier. This integration enables a smooth and efficient workflow by using spatial
features for precise categorization and labeling of input data
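The building blocks this architecture relies on (max pooling over windows of three, the ReLU activation, and the softmax and sigmoid output functions of Eqs. (3) and (4)) can be sketched in plain Python. This is an illustrative sketch rather than the authors' implementation; in particular, the pooling stride is not stated in the text and is assumed here to equal the window size.

```python
import math

def max_pool1d(seq, window=3, stride=3):
    """Maximum of each window along a 1-D sequence (no padding).

    Assumption: stride equals the window size; the source only gives the window.
    """
    return [max(seq[i:i + window]) for i in range(0, len(seq) - window + 1, stride)]

def relu(xs):
    """ReLU activation: negative inputs become zero."""
    return [max(0.0, x) for x in xs]

def softmax(xs):
    """Eq. (3): exp(x_i) / sum_j exp(x_j), shifted by max(xs) for stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Eq. (4): 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

feature_map = [1.0, 5.0, 2.0, 0.0, 3.0, 4.0, 7.0]
pooled = max_pool1d(feature_map)          # windows [1, 5, 2] and [0, 3, 4]
probs = softmax(relu([2.0, -1.0, 0.5]))   # multi-class output: sums to 1
p_attack = sigmoid(0.0)                   # binary output: a single probability
print(pooled, probs, p_attack)
```

Softmax is used when one of K classes must be chosen, since its outputs form a probability distribution; sigmoid yields one probability, which is enough for the two-class case.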
Pooling layers are utilized to compress the size of the generated map, thereby reducing parameters and computations while preserving significant features. This also helps to control overfitting. The max pooling process is applied to each window of size three along the sequence, returning the maximum value. Conversely, up-sampling layers, which are essentially the antithesis of pooling layers, augment the dimensions of the map in preparation for the subsequent convolutional layer. The second and third convolutional layers extract higher-level features from the input sequence data, encompassing patterns and textures.
The DNN of the hybrid model is designed to classify the output of the CNN component into different classes of network traffic. The structural design of the DNN comprises three fully connected layers (FCLs) with 256, 128, and 128 neurons, respectively. Upon the completion of feature extraction for the training samples, the data is flattened and subsequently fed into a sequence of deep FCLs. Each FCL employs the ReLU activation function and is followed by a dropout layer that removes certain nodes from the network during training, thus eliminating their influence on prediction and back-propagation to mitigate the impact of overfitting, ultimately facilitating the classification process.
Subsequently, the output layer estimates the probabilities associated with each possible class as the final output. The output layer is equipped with the softmax function [12] defined by (3) in the context of multi-class classification, whereas in the scenario of binary-class classification, the sigmoid function [41] given by (4) is employed.

\[ \mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}} \quad \text{for } i = 1, 2, \ldots, K \tag{3} \]

\[ \mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}} \tag{4} \]

The CNN and DNN are integrated to create our hybrid model by flattening the output of the CNN part, then feeding it into the DNN part, which generates the final output of the model as shown in Fig. 2.
The model undergoes training through the Adam optimization algorithm with a learning rate set to 0.0001 (see Sect. 5 for more details). The binary crossentropy loss function is employed for binary-class classification tasks, whereas the categorical crossentropy loss function is utilized in the context of multi-class classification scenarios. Loss functions assess the extent to which the model’s predictions diverge from the actual outcomes. A closer alignment of the model’s predictions to the true values results in a minimal loss, while a significant deviation from the original values yields a maximal loss. The training process uses a batch size of 128 and runs for a total of 30 epochs.

3.4 Evaluation metric
The performance assessment of the suggested NIDS model is accomplished by utilizing confusion matrix indicators. The metrics are calculated as follows:

\[ DR = TPR = Sensitivity = Recall = \frac{TP}{TP + FN} \tag{5} \]

\[ PPV = Precision = \frac{TP}{TP + FP} \tag{6} \]

\[ TNR = Specificity = \frac{TN}{TN + FP} \tag{7} \]

\[ F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{8} \]

\[ Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \tag{9} \]

The acronyms TP, FP, FN, and TN represent true positive, false positive, false negative, and true negative. DR and TPR in Eq. (5) represent detection rate and true positive rate, respectively; they assess the performance of a classification model on the positive class and quantify how well the model identifies positive instances correctly. PPV in Eq. (6) represents positive predictive value; it is a metric that focuses on the positive class and quantifies the proportion of actual positive samples among all the samples predicted as positive. TNR in Eq. (7) is an abbreviation for true negative rate; it measures the model’s ability to correctly identify negative instances. Moreover, Eq. (8) shows the F1-score, which is the harmonic mean of the PPV and TPR. The accuracy metric in Eq. (9) represents the proportion of correctly identified samples in our test data. In summary, these metrics are crucial in evaluating the effectiveness and reliability of classification models and are frequently employed when evaluating machine learning models.

4 Results
This section presents an assessment of the proposed model, encompassing a range of performance metrics such as accuracy, precision, recall, and F1-score (see Sect. 3.4). Furthermore, it showcases the visualization of the model’s accuracy, model’s loss, and confusion matrix, effectively illustrating its performance. In addition to this, it furnishes the outcomes of our testing, providing valuable insights into the identification and explication
of diverse attack types and normal behavior in two distinct scenarios: multi-class classification and binary-class classification. The primary objective is to comprehensively investigate the implications and outcomes associated with each approach and their overall impact on the analysis.

4.1 Experiment configuration
Table 3 outlines the details of the experimental design and setup that were employed in order to evaluate the various parameters of the model with utmost accuracy and F-score, facilitating the attainment of reliable and dependable results that can be used for further analysis and interpretation.

Table 3 Experiment configuration
OS | Windows 11 Home
CPU | Intel Core i7 12700H, 14 cores, 24 MB cache
GPU | NVIDIA RTX 3060 (6 GB)
RAM | DDR5 16 GB 4800 MHz
Anaconda | 23.7.4
Python | 3.8.20
TensorFlow | 2.7
TensorFlow-GPU | 2.10

4.2 Experimental analysis
Table 4 provides a comprehensive overview of the performance exhibited by the proposed model using multi-class classification based on the CICIoT2023 dataset. It is worth noting that the proposed model demonstrates a good overall performance, achieving high accuracy rates consistently across various categories, with accuracy ranging from 99.47 to 100%. Moreover, upon closer examination of the results presented in Table 4, it becomes apparent that certain classes within the dataset are exceptionally well-classified, particularly those that possess a substantial number of instances within the dataset. The benign class demonstrates particularly impressive metrics, including a TPR of 95.02% and a TNR of 99.58%, indicating robust capabilities in identifying benign instances.
Furthermore, the model excels in detecting certain attacks with a minuscule misclassification rate, notably DDoS, DoS, and Mirai, which achieve overall metrics of over 99%, alongside a low FPR and FNR. Therefore, the proposed model exhibits the highest values in terms of DR and F-score for these classes. This highlights the effectiveness of the proposed model in accurately detecting and classifying instances pertaining to these specific attack types.
Furthermore, to gain further insights into the performance of the proposed model, it is crucial to examine the results depicted in the confusion matrix, as illustrated in Fig. 3c and d. The confusion matrix provides a visual representation of the performance of the model, showcasing the classification results for each class. Analyzing the confusion matrix enables a more nuanced understanding of the model’s performance and the specific areas where improvements can be made. In order to provide a comprehensive overview of the findings, the performance results have been visually depicted in Fig. 3a and b. This visual representation serves as a succinct summary of the performance.
In contrast, for the case of binary-class classification using the CICIoT2023 dataset, one can observe a discernible improvement in the overall performance of the proposed model with respect to overall evaluation metrics, see Table 5. Both classes achieve good performance in terms of high TPR, accuracy, PPV, F-score, and TNR alongside a low FPR and FNR. The enhancements can be witnessed in a steady manner. The comprehensive
Class | Model | TPR | TNR | FPR | FNR | PPV | F1 measure | Accuracy
BenignTraffic | DNN | 97.46% | 99.71% | 0.29% | 2.54% | 88.86% | 92.96% | 99.65%
BenignTraffic | DCNN | 95.02% | 99.58% | 0.42% | 4.98% | 84.37% | 89.38% | 99.47%
Brute-Force | DNN | 19.83% | 100.00% | 0.00% | 80.17% | 96.36% | 32.89% | 99.98%
Brute-Force | DCNN | 24.94% | 100.00% | 0.00% | 75.06% | 76.34% | 37.59% | 99.98%
DDoS | DNN | 99.99% | 99.84% | 0.16% | 0.01% | 99.94% | 99.96% | 99.95%
DDoS | DCNN | 99.98% | 99.86% | 0.14% | 0.02% | 99.95% | 99.97% | 99.95%
DoS | DNN | 99.78% | 99.99% | 0.01% | 0.22% | 99.97% | 99.88% | 99.96%
DoS | DCNN | 99.82% | 99.99% | 0.01% | 0.18% | 99.95% | 99.88% | 99.96%
Mirai | DNN | 99.98% | 100.00% | 0.00% | 0.02% | 99.98% | 99.98% | 100.00%
Mirai | DCNN | 99.97% | 100.00% | 0.00% | 0.03% | 99.97% | 99.97% | 100.00%
Recon | DNN | 78.70% | 99.89% | 0.11% | 21.30% | 84.50% | 81.50% | 99.73%
Recon | DCNN | 65.34% | 99.90% | 0.10% | 34.66% | 83.38% | 73.26% | 99.64%
Spoofing | DNN | 78.70% | 99.90% | 0.10% | 21.30% | 89.08% | 83.57% | 99.68%
Spoofing | DCNN | 74.67% | 99.81% | 0.19% | 25.33% | 80.64% | 77.54% | 99.55%
Web-based | DNN | 33.75% | 100.00% | 0.00% | 66.25% | 97.91% | 50.19% | 99.97%
Web-based | DCNN | 12.82% | 100.00% | 0.00% | 87.18% | 65.77% | 21.46% | 99.95%
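The per-class columns in these tables (TPR, TNR, FPR, FNR, PPV, F1 measure, accuracy) follow from treating each class as "positive" and all other classes as "negative" in the multi-class confusion matrix. The sketch below shows that one-vs-rest reduction on a made-up three-class matrix; the function names and the counts are hypothetical, and FPR and FNR are computed as the complements of TNR and TPR, which is an assumption consistent with the values in the tables.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                      # class k samples predicted as something else
    fp = sum(row[k] for row in cm) - tp       # other classes predicted as class k
    tn = sum(map(sum, cm)) - tp - fn - fp     # everything not involving class k
    return tp, fp, fn, tn

def rates(tp, fp, fn, tn):
    """Eqs. (5)-(9) plus FPR and FNR, returned as fractions in [0, 1]."""
    def ratio(num, den):
        return num / den if den else 0.0
    tpr, tnr, ppv = ratio(tp, tp + fn), ratio(tn, tn + fp), ratio(tp, tp + fp)
    return {
        "TPR": tpr, "TNR": tnr,
        "FPR": ratio(fp, fp + tn), "FNR": ratio(fn, fn + tp),
        "PPV": ppv, "F1": ratio(2 * ppv * tpr, ppv + tpr),
        "Accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }

# Hypothetical confusion matrix: two large classes and one rare class (index 2).
cm = [[950, 40, 0],
      [10, 990, 0],
      [6, 2, 2]]

rare = rates(*one_vs_rest_counts(cm, 2))
for name, value in rare.items():
    print(f"{name}: {100 * value:.2f}%")
```

In this toy matrix the rare class is detected only 20% of the time, yet its one-vs-rest accuracy is 99.6%, because the true negatives from the large classes dominate the count. This is why per-class accuracy alone says little about minority classes and TPR, PPV and F1 are reported alongside it.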
Fig. 3 CICIoT2023 performance analysis in terms of multi-class classification (8 classes). The analysis includes visualizing accuracy and loss curves,
alongside the confusion matrix of the proposed model, to assess the model’s learning and improvement over time
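Class | Model | TPR | TNR | FPR | FNR | PPV | F1 measure | Accuracy
BenignTraffic | DNN | 99.71% | 91.62% | 8.38% | 0.29% | 99.80% | 99.75% | 99.52%
BenignTraffic | DCNN | 99.68% | 92.16% | 7.84% | 0.32% | 99.81% | 99.75% | 99.50%
Attack | DNN | 91.62% | 99.71% | 0.29% | 8.38% | 88.42% | 89.99% | 99.52%
Attack | DCNN | 92.16% | 99.68% | 0.32% | 7.84% | 87.41% | 89.72% | 99.50%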
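In the binary rows above, the TPR of one class equals the TNR of the other and both classes share the same accuracy. That is a property of the two-class confusion matrix rather than of the model: choosing the other class as "positive" swaps TP with TN and FP with FN. The sketch below illustrates this with made-up counts; `binary_metrics` and the numbers are hypothetical and are not taken from the paper's experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    """TPR, TNR, PPV and accuracy per Eqs. (5), (7), (6) and (9)."""
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "Accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical test set: 1,000 attack flows and 10,000 benign flows.
# View 1: "Attack" is the positive class.
attack = binary_metrics(tp=920, fp=30, fn=80, tn=9970)

# View 2: "Benign" is the positive class. Same predictions, roles swapped:
# TP <-> TN and FP <-> FN.
benign = binary_metrics(tp=9970, fp=80, fn=30, tn=920)

for label, m in (("Attack", attack), ("Benign", benign)):
    print(label, {k: f"{100 * v:.2f}%" for k, v in m.items()})
```

PPV, and therefore F1, does not mirror in the same way, since it depends on how many samples were predicted as each class. This is why the two rows of a binary table can share accuracy while differing in PPV and F1.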
outcomes of the proposed model are visually represented in Fig. 4a, depicting the model accuracy, and Fig. 4b, showcasing the model loss, providing a graphical representation of the model’s performance. Additionally, the confusion matrix of the proposed model can be found in Fig. 4c and d.
Furthermore, the classification process applied to the CICIDS-2017 dataset, which is detailed in Table 6, yielded highly favorable outcomes across multiple classes for the proposed model. The obtained results demonstrate that the suggested methods exhibit good performance in the scenario of multi-class classification, with TPR, TNR and accuracy consistently exceeding the threshold of 99% alongside very low FPR and FNR for the majority of classes. Notably, Heartbleed and Infiltration achieve the highest value of 100% in all metrics alongside an FPR and FNR of 0%, reflecting the model’s robustness in detecting these specific minor classes. However, while these metrics showcase the model’s strengths, a comprehensive examination
Fig. 4 CICIoT2023 performance analysis in terms of binary-class classification (2 classes). The analysis includes visualizing accuracy and loss curves,
alongside the confusion matrix of the proposed model, to assess the model’s learning and improvement over time
reveals some notable limitations. For instance, the model’s performance drops on Web Attack-Brute Force and Web Attack-XSS, where the TPR falls to 97.58% and 97.25%, respectively, the PPV to 85.23% and 83.10%, and the FNR increases to 2.42% and 2.75%, respectively. Moreover, Web Attack-Sql Injection achieves a noticeably low TPR and F-score of 42.86% and 50% alongside a high FNR of 57.14%. This discrepancy can be attributed to the fact that the testing data comprises a relatively small number of instances pertaining to this specific attack type (also see Sect. 5).
Nevertheless, it is important to highlight the robustness of the overall performance of the model. In order to provide a more comprehensive understanding of the model’s performance, the resulting confusion matrix is presented in Fig. 5c and d. Additionally, Fig. 5a and b visually depict the model’s performance in terms of both model accuracy and model loss. These visual representations serve to enhance the clarity and comprehensibility of the obtained results, allowing for a more thorough evaluation of the proposed model’s efficacy.
Furthermore, a similar trend emerges when binary-class classification is applied to the CICIDS-2017 dataset, as showcased in Table 7. Once again, the proposed model exhibits a high level of performance in this context, further validating its effectiveness. The corresponding confusion matrix can be found in Fig. 6c and d, while Fig. 6a and b provide insights into the model’s performance in terms of both model accuracy and model loss.
Moreover, the classification methodology of our model is also applied to the CICIoMT2024 dataset, as detailed in Table 8, and yielded exceptionally favorable results across various categories. The acquired findings indicate that the recommended methodologies demonstrate good performance within the context of multi-class classification, with accuracy levels surpassing
Table 6 Multi-class classification performance analysis with CICIDS-2017
Class | Model | TPR | TNR | FPR | FNR | PPV | F1 measure | Accuracy
BENIGN | DNN | 99.95% | 99.93% | 0.07% | 0.05% | 99.98% | 99.96% | 99.94%
BENIGN | DCNN | 99.97% | 99.97% | 0.03% | 0.03% | 99.99% | 99.98% | 99.97%
Bot | DNN | 96.00% | 100.00% | 0.00% | 4.00% | 98.86% | 97.41% | 100.00%
Bot | DCNN | 99.11% | 100.00% | 0.00% | 0.89% | 94.69% | 96.85% | 100.00%
DDoS | DNN | 99.98% | 100.00% | 0.00% | 0.02% | 99.93% | 99.95% | 100.00%
DDoS | DCNN | 99.99% | 100.00% | 0.00% | 0.01% | 99.99% | 99.99% | 100.00%
DoS GoldenEye | DNN | 99.23% | 100.00% | 0.00% | 0.77% | 99.04% | 99.14% | 99.99%
DoS GoldenEye | DCNN | 99.45% | 100.00% | 0.00% | 0.55% | 99.82% | 99.63% | 100.00%
DoS Hulk | DNN | 99.99% | 99.98% | 0.02% | 0.01% | 99.74% | 99.87% | 99.98%
DoS Hulk | DCNN | 99.98% | 99.99% | 0.01% | 0.02% | 99.90% | 99.94% | 99.99%
DoS Slowhttptest | DNN | 99.00% | 100.00% | 0.00% | 1.00% | 98.72% | 98.86% | 100.00%
DoS Slowhttptest | DCNN | 99.64% | 100.00% | 0.00% | 0.36% | 99.71% | 99.68% | 100.00%
DoS slowloris | DNN | 98.99% | 100.00% | 0.00% | 1.01% | 99.19% | 99.09% | 100.00%
DoS slowloris | DCNN | 99.59% | 100.00% | 0.00% | 0.41% | 99.73% | 99.66% | 100.00%
FTP-Patator | DNN | 99.65% | 100.00% | 0.00% | 0.35% | 99.75% | 99.70% | 100.00%
FTP-Patator | DCNN | 99.85% | 100.00% | 0.00% | 0.15% | 100.00% | 99.93% | 100.00%
Heartbleed | DNN | 100.00% | 100.00% | 0.00% | 0.00% | 100.00% | 100.00% | 100.00%
Heartbleed | DCNN | 100.00% | 100.00% | 0.00% | 0.00% | 100.00% | 100.00% | 100.00%
Infiltration | DNN | 100.00% | 100.00% | 0.00% | 0.00% | 100.00% | 100.00% | 100.00%
Infiltration | DCNN | 100.00% | 100.00% | 0.00% | 0.00% | 100.00% | 100.00% | 100.00%
PortScan | DNN | 99.93% | 100.00% | 0.00% | 0.07% | 99.99% | 99.96% | 100.00%
PortScan | DCNN | 99.98% | 100.00% | 0.00% | 0.02% | 99.98% | 99.98% | 100.00%
SSH-Patator | DNN | 98.87% | 100.00% | 0.00% | 1.13% | 99.22% | 99.04% | 100.00%
SSH-Patator | DCNN | 99.22% | 100.00% | 0.00% | 0.78% | 99.50% | 99.36% | 100.00%
Web Attack - Brute Force | DNN | 99.28% | 99.99% | 0.01% | 0.72% | 85.63% | 91.95% | 99.99%
Web Attack - Brute Force | DCNN | 97.58% | 99.99% | 0.01% | 2.42% | 85.23% | 90.99% | 99.99%
Web Attack - Sql Injection | DNN | 28.57% | 100.00% | 0.00% | 71.43% | 100.00% | 44.44% | 100.00%
Web Attack - Sql Injection | DCNN | 42.86% | 100.00% | 0.00% | 57.14% | 60.00% | 50.00% | 100.00%
Web Attack - XSS | DNN | 95.05% | 100.00% | 0.00% | 4.95% | 85.64% | 90.10% | 99.99%
Web Attack - XSS | DCNN | 97.25% | 99.99% | 0.01% | 2.75% | 83.10% | 89.62% | 99.99%
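Because Eq. (8) defines the F1-score as the harmonic mean of PPV and TPR, the F1 column of Table 6 can be recomputed from the other two columns as a consistency check. The sketch below does this for a few rows; the helper name `f1_from` is hypothetical, and the percentages are copied from Table 6.

```python
def f1_from(tpr, ppv):
    """Eq. (8) with Recall = TPR and Precision = PPV, both given as fractions."""
    return 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) else 0.0

# (class, model, TPR %, PPV %, reported F1 %) as listed in Table 6
rows = [
    ("Web Attack - Sql Injection", "DCNN", 42.86, 60.00, 50.00),
    ("Web Attack - Sql Injection", "DNN", 28.57, 100.00, 44.44),
    ("Web Attack - Brute Force", "DCNN", 97.58, 85.23, 90.99),
    ("Bot", "DCNN", 99.11, 94.69, 96.85),
]

for name, model, tpr, ppv, reported in rows:
    recomputed = 100 * f1_from(tpr / 100, ppv / 100)
    print(f"{name} ({model}): recomputed {recomputed:.2f}%, reported {reported:.2f}%")
```

The harmonic mean is pulled toward the smaller of its two inputs, so a class with perfect PPV but low TPR, such as the DNN row for Web Attack - Sql Injection, still receives a low F1.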
Fig. 5 CICIDS-2017 performance analysis in terms of multi-class classification (15 classes). The analysis includes visualizing accuracy and loss curves,
alongside the confusion matrix of the proposed model, to assess the model’s learning and improvement over time
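Class | Model | TPR | TNR | FPR | FNR | PPV | F1 measure | Accuracy
BENIGN | DNN | 99.93% | 99.95% | 0.05% | 0.07% | 99.79% | 99.86% | 99.95%
BENIGN | DCNN | 99.94% | 99.96% | 0.04% | 0.06% | 99.83% | 99.89% | 99.96%
Attack | DNN | 99.95% | 99.93% | 0.07% | 0.05% | 99.98% | 99.97% | 99.95%
Attack | DCNN | 99.96% | 99.94% | 0.06% | 0.04% | 99.99% | 99.97% | 99.96%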
the threshold of 99.9%, and TPR and TNR predominantly exceeding the threshold of 99%, alongside remarkably low FPR and FNR for the majority of classes. This is particularly evident in the classes MQTT-DDoS-Connect_Flood, MQTT-DDoS-Publish_Flood, MQTT-DoS-Connect_Flood, MQTT-DoS-Publish_Flood, TCP_IP-DDoS-ICMP, TCP_IP-DDoS-SYN, TCP_IP-DDoS-TCP, TCP_IP-DDoS-UDP, TCP_IP-DoS-ICMP, TCP_IP-DoS-SYN, TCP_IP-DoS-TCP, and TCP_IP-DoS-UDP.
Nevertheless, while these metrics showcase the model’s advantageous attributes, a thorough analysis unveils several significant constraints. For instance, there is a slight decrease in performance with the MQTT-DDoS-Publish_Flood class, which attained a TPR of 99.76%, an FNR of 0.24%, and an F-score of 99.88%. Similarly, the MQTT-DoS-Connect_Flood class achieved a PPV of 99.70% and an F-score of 99.85%. Furthermore, ARP_Spoofing attained a TPR of 97.60%, FNR of 2.40%,
Fig. 6 CICIDS-2017 performance analysis in terms of binary-class classification (2 classes). The analysis includes visualizing accuracy and loss curves,
alongside the confusion matrix of the proposed model, to assess the model’s learning and improvement over time
PPV of 93.08%, and F-score of 95.29%, while the Benign class recorded a TPR of 99.44%, FNR of 0.56%, PPV of 99.81%, and an F-score of 99.63%. Additionally, the MQTT-Malformed_Data class obtained a TPR of 91.34%, FNR of 8.66%, PPV of 96.47%, and an F-score of 93.83%. Conversely, one can discern a significant decline in performance with the Recon-OS_Scan class achieving a PPV of 74.08% and an F-score of 84.44%, while the Recon-Ping_Sweep class recorded a TPR of 80.89%, an FNR of 19.11%, PPV of 88.35%, and an F-score of 84.45%. In a similar vein, the Recon-Port_Scan class attained a TPR of 93.94%, an FNR of 6.06%, PPV of 97.81%, and an F-score of 95.84%, whereas the Recon-VulScan class exhibited the poorest performance with a TPR of 37.20%, an FNR of 62.80%, and an F-score of 54.04%. This variation can be attributed to the relatively limited number of instances within the testing dataset that pertain to this specific attack type.
Nonetheless, it is essential to emphasize the resilience of the model’s overall performance. To facilitate a more exhaustive comprehension of the model’s efficacy, the corresponding confusion matrix is illustrated in Fig. 7c and d. Furthermore, Fig. 7a and b represent the model’s performance with respect to both accuracy and loss metrics.
However, in the context of binary-class classification, one can observe a significant enhancement in the overall performance metrics associated with the model’s predictive capabilities, as outlined in Table 9. It is particularly noteworthy that both classes within the binary classification attained remarkably high values for TPR, TNR, PPV, F-score, and accuracy, all exceeding the threshold of 99%, while simultaneously maintaining very low rates for both FPR and FNR. Also, when binary-class classification is applied to the CICIoMT2024 dataset, the proposed model exhibits a high level of performance
Table 8 Multi-class classification performance analysis with CICIoMT2024
Class                    Model  TPR      TNR      FPR    FNR     PPV      F1 measure  Accuracy
ARP_Spoofing             DNN    86.59%   99.98%   0.02%  13.41%  90.68%   88.59%      99.95%
ARP_Spoofing             DCNN   97.60%   99.99%   0.01%  2.40%   93.08%   95.29%      99.98%
Benign                   DNN    99.29%   99.97%   0.03%  0.71%   98.93%   99.11%      99.95%
Benign                   DCNN   99.44%   99.99%   0.01%  0.56%   99.81%   99.63%      99.98%
MQTT-DDoS-Connect_Flood  DNN    99.97%   100.00%  0.00%  0.03%   99.82%   99.89%      99.99%
MQTT-DDoS-Connect_Flood  DCNN   100.00%  100.00%  0.00%  0.00%   99.98%   99.99%      100.00%
MQTT-DDoS-Publish_Flood  DNN    98.95%   100.00%  0.00%  1.05%   99.79%   99.36%      99.99%
MQTT-DDoS-Publish_Flood  DCNN   99.76%   100.00%  0.00%  0.24%   99.99%   99.88%      100.00%
MQTT-DoS-Connect_Flood   DNN    97.64%   100.00%  0.00%  2.36%   99.49%   98.56%      99.99%
MQTT-DoS-Connect_Flood   DCNN   100.00%  100.00%  0.00%  0.00%   99.70%   99.85%      100.00%
MQTT-DoS-Publish_Flood   DNN    99.86%   100.00%  0.00%  0.14%   99.37%   99.61%      100.00%
MQTT-DoS-Publish_Flood   DCNN   99.97%   100.00%  0.00%  0.03%   99.99%   99.98%      100.00%
MQTT-Malformed_Data      DNN    96.24%   99.99%   0.01%  3.76%   87.84%   91.85%      99.99%
MQTT-Malformed_Data      DCNN   91.34%   100.00%  0.00%  8.66%   96.47%   93.83%      99.99%
Recon-OS_Scan            DNN    83.46%   99.97%   0.03%  16.54%  87.12%   85.25%      99.93%
Recon-OS_Scan            DCNN   98.16%   99.92%   0.08%  1.84%   74.08%   84.44%      99.91%
Recon-Ping_Sweep         DNN    84.00%   100.00%  0.00%  16.00%  78.75%   81.29%      100.00%
Recon-Ping_Sweep         DCNN   80.89%   100.00%  0.00%  19.11%  88.35%   84.45%      100.00%
Recon-Port_Scan          DNN    96.95%   99.95%   0.05%  3.05%   96.08%   96.51%      99.92%
Recon-Port_Scan          DCNN   93.94%   99.97%   0.03%  6.06%   97.81%   95.84%      99.90%
Recon-VulScan            DNN    50.60%   99.99%   0.01%  49.40%  67.26%   57.75%      99.97%
Recon-VulScan            DCNN   37.20%   100.00%  0.00%  62.80%  98.72%   54.04%      99.98%
TCP_IP-DDoS-ICMP         DNN    100.00%  100.00%  0.00%  0.00%   100.00%  100.00%     100.00%
TCP_IP-DDoS-ICMP         DCNN   99.98%   100.00%  0.00%  0.02%   100.00%  99.99%      100.00%
TCP_IP-DDoS-SYN          DNN    99.99%   100.00%  0.00%  0.01%   100.00%  99.99%      100.00%
TCP_IP-DDoS-SYN          DCNN   100.00%  100.00%  0.00%  0.00%   99.97%   99.98%      100.00%
TCP_IP-DDoS-TCP          DNN    99.99%   100.00%  0.00%  0.01%   100.00%  99.99%      100.00%
TCP_IP-DDoS-TCP          DCNN   100.00%  100.00%  0.00%  0.00%   99.99%   100.00%     100.00%
TCP_IP-DDoS-UDP          DNN    100.00%  100.00%  0.00%  0.00%   99.99%   100.00%     100.00%
TCP_IP-DDoS-UDP          DCNN   100.00%  100.00%  0.00%  0.00%   100.00%  100.00%     100.00%
TCP_IP-DoS-ICMP          DNN    99.99%   100.00%  0.00%  0.01%   100.00%  100.00%     100.00%
TCP_IP-DoS-ICMP          DCNN   99.98%   100.00%  0.00%  0.02%   99.99%   99.98%      100.00%
TCP_IP-DoS-SYN           DNN    99.96%   100.00%  0.00%  0.04%   100.00%  99.98%      100.00%
TCP_IP-DoS-SYN           DCNN   99.98%   100.00%  0.00%  0.02%   99.99%   99.98%      100.00%
TCP_IP-DoS-TCP           DNN    99.99%   100.00%  0.00%  0.01%   99.95%   99.97%      100.00%
TCP_IP-DoS-TCP           DCNN   99.98%   100.00%  0.00%  0.02%   99.97%   99.98%      100.00%
TCP_IP-DoS-UDP           DNN    99.99%   100.00%  0.00%  0.01%   99.98%   99.99%      100.00%
TCP_IP-DoS-UDP           DCNN   99.99%   100.00%  0.00%  0.01%   99.99%   99.99%      100.00%
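As a consistency check on Table 8, the F1 measure can be recomputed as the harmonic mean of PPV and TPR, and FNR as 100% minus TPR. The sketch below is illustrative only and is not taken from the published implementation; `f1_from_ppv_tpr` and `table8_dcnn` are hypothetical names introduced here, and the values are DCNN rows copied from Table 8. Because the table reports rounded percentages, the recomputed values can be expected to agree only to within rounding.

```python
def f1_from_ppv_tpr(ppv: float, tpr: float) -> float:
    """Harmonic mean of precision (PPV) and recall (TPR), both given in percent."""
    if ppv + tpr == 0:
        return 0.0
    return 2 * ppv * tpr / (ppv + tpr)


# DCNN rows copied from Table 8: class -> (TPR %, PPV %, reported F1 %)
table8_dcnn = {
    "ARP_Spoofing": (97.60, 93.08, 95.29),
    "Recon-OS_Scan": (98.16, 74.08, 84.44),
    "Recon-Ping_Sweep": (80.89, 88.35, 84.45),
    "Recon-VulScan": (37.20, 98.72, 54.04),
}

for name, (tpr, ppv, reported) in table8_dcnn.items():
    f1 = f1_from_ppv_tpr(ppv, tpr)
    fnr = 100.0 - tpr  # false negative rate is the complement of TPR
    print(f"{name:18s} FNR={fnr:6.2f}%  F1={f1:6.2f}%  (reported {reported:.2f}%)")
```

The same relation explains why Recon-VulScan has a low F1 measure despite a PPV of 98.72%: the harmonic mean is pulled toward the much lower TPR of 37.20%.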
Fig. 7 CICIoMT2024 performance analysis in terms of multi-class classification (19 classes). The analysis includes visualizing accuracy and loss
curves, alongside the confusion matrix of the proposed model, to assess the model’s learning and improvement over time
Class   Model  TPR      TNR      FPR    FNR    PPV     F1 measure  Accuracy
Attack  DNN    99.93%   99.69%   0.31%  0.07%  99.99%  99.96%      99.93%
Attack  DCNN   100.00%  99.32%   0.68%  0.00%  99.98%  99.99%      99.98%
Benign  DNN    99.69%   99.93%   0.07%  0.31%  97.60%  98.63%      99.93%
Benign  DCNN   99.32%   100.00%  0.00%  0.68%  99.82%  99.57%      99.98%
in this context, further validating their effectiveness. The corresponding confusion matrix can be found in Fig. 8c and d, while Fig. 8a and b provide insights into the model’s performance in terms of both model accuracy and model loss (Table 10).
As a final assessment of the outcomes derived from the process of evaluation, Table 11 provides a comprehensive analysis of the overall performance of the model proposed by our work as compared to the state-of-the-art models on the CICIoT2023 dataset. Notwithstanding, the obtained results indicate that our model, when compared to the related work on the CICIoT2023 dataset, achieves a level of accuracy, F-score, and PPV that surpasses that of [33], although it falls slightly short in terms of accuracy and TPR. Also, it outperforms that of [31] in terms of accuracy, PPV, and F-score, while it exhibits a minor deficiency with respect to TPR. However, it surpasses that of [37] in terms of all metrics. Nevertheless, our model still exhibits exceptional performance, particularly in terms of PPV and F-score metrics, surpassing those of the state of the art.
Table 12 illustrates the comparison between our work and the state-of-the-art models on the CICIDS-2017 dataset. It is important to note that our proposed model surpasses the related work in all aspects, particularly in terms of F-score in both binary and multi-class classification. While [26] manages to achieve a high level
Fig. 8 CICIoMT2024 performance analysis in terms of binary-class classification (2 classes). The analysis includes visualizing accuracy and loss
curves, alongside the confusion matrix of the proposed model, to assess the model’s learning and improvement over time
Table 10 Performance benchmarking of the proposed model versus other related work with CICIoMT2024 dataset
Ref Model Dataset PPV TPR F1 measure Accuracy
[34] Logistic regression CICIoMT2024 B: 95.20% 95.90% B: 94.00% 96.10% B: 94.60% 95.90% B: 99.50% 99.60%
Adaboost DNN 95.60% 97.10% 94.80% 95.10% 95.20% 96.10% 99.60% 99.60%
Random Forest M: 54.70% 14.40% M: 47.10% 23.80% M: 43.20% 14.10% M: 72.70% 42.20%
64.90% 69.10% 55.30% 57.70% 52.20% 55.10% 72.90% 73.30%
Proposed DNN DCNN CICIoMT2024 B: 99.93% 99.98% B: 99.93% 99.98% B: 99.93% 99.98% B: 99.93% 99.98%
M: 99.84% 99.86% M: 99.84% 99.86% M: 99.84% 99.86% M: 99.84% 99.86%
Table 11 Performance benchmarking of the proposed model versus other related work with CICIoT2023 dataset
Ref Model Dataset PPV TPR F1 measure Accuracy
[30] Blending CICIoT2023 IoTID20 M: 98.51% 100.00% M: 99.63% 100.00% M: 99.07% 100.00% M: 99.51% 100.00%
[31] IoT-PRIDS CICIoT2023 M: 93.84% M: 99.71% M: 95.29% M: 98.74%
[33] Logistic regression CICIoT2023 B: 86.31% 82.54% B: 89.04% 79.70% B: 87.62% 81.05% B: 98.90% 98.17%
Perceptron Adaboost 96.56% 96.53% 94.73% 96.51% 95.62% 96.52% 99.58% 99.68%
Random Forest DNN 94.75% 93.32% 94.03% 99.44%
M: 51.24% 52.39% M: 69.60% 65.91% M: 53.94% 55.51% M: 83.16% 86.63%
46.49% 70.54% 48.77% 91.00% 36.86% 71.92% 35.13% 99.43%
67.94% 90.66% 69.72% 99.11%
[37] CKAN CICIoT2023 NSL_KDD B: 99.81% 98.36% B: 99.40% 98.90% B: 99.60% 98.63% B: 99.22% 98.71%
TONIoT 99.95% 99.91% 99.93% 99.93%
M: 98.84% 99.20% M: 98.84% 99.20% M: 98.84% 99.20% M: 98.84% 99.20%
93.30% 93.30% 93.30% 93.30%
Proposed DNN DCNN CICIoT2023 B: 99.52% 99.50% B: 99.52% 99.50% B: 99.52% 99.50% B: 99.52% 99.50%
M: 99.45% 99.25% M: 99.45% 99.25% M: 99.45% 99.25% M: 99.45% 99.25%
of performance in terms of accuracy, wherein we have achieved scores above 99.94%.
A comparison between our work and the state-of-the-art models on the CICIoMT2024 dataset is outlined in Table 10. It is important to note that our proposed model surpasses the related work in all aspects in both binary and multi-class classification.
In summary, the analysis conducted on both the CICIDS-2017 and CICIoT2023 datasets clearly demonstrates the effectiveness of the proposed model in both multi-class and binary-class classification approaches. The obtained results, supported by the visual representations and statistical evaluations, highlight the remarkable accuracy and robustness of the model, thus signifying its potential for real-world implementation and contribution to enhanced detection and prevention of diverse attack types in the cybersecurity field.
Table 13 presents a time analysis of fitting and inference times for both binary and multi-class classification across the three datasets: CICIoMT2024, CICIoT2023, and CICIDS-2017. The results indicate that fitting times vary between multi-class and binary classifications, with the CICIoMT2024 dataset showing fitting times of 356 s for multi-class and 390 s for binary-class. Significantly, the CICIDS-2017 dataset displays the minimal fitting times, recorded at 147 s for multi-class and 138 s for binary-class, whereas the CICIoT2023 dataset shows the highest fitting times of 562 s and 469 s, respectively. Regarding inference times, the CICIoMT2024 dataset reflects a multi-class inference time of 127.18 s compared to 119.75 s for binary-class. The CICIoT2023 dataset reports inference times of 238.67 s and 180.88 s, respectively. Notably, the CICIDS-2017 dataset demonstrates the lowest inference times of 44.60 s and 41.72 s for multi-class and binary-class, respectively.
5 Discussion
Many different machine learning models have been previously developed to address the issue of network intrusion detection from different perspectives. In this paper, our objective is to take advantage of the power of combining CNNs and DNNs in one model in order to improve the overall NIDS performance and accuracy. When this new hybrid model is trained using three different datasets, it shows a better performance for both binary and multi-class classification.
However, despite these strengths, some limitations emerge, particularly in the detection of certain classes such as Brute-Force and Web-based, which are frequently misclassified as benign, recon, or spoofing, leading to a degradation in performance for these classes. For instance, Brute-Force presents a concerning TPR of only 24.94%, which indicates that the model struggles to identify this attack effectively, despite a high TNR of 100%. Such misclassifications highlight potential issues in the training data or suggest that the model may not be adequately capturing the unique characteristics of these attacks, which may lead to underperformance in identifying Brute-Force attacks. Considering this aspect further in future model development, by comparing the model response across different datasets, could help improve its accuracy.
Similarly, Web-based exhibits a low TPR of just 12.82% and a high FNR of 87.18%, indicating a gap in the detection capabilities. Such misclassifications also
Table 12 Performance benchmarking of the proposed model versus other related work with CICIDS-2017 dataset
Ref Model Dataset PPV TPR F1 measure Accuracy
Table 13 Runtime analysis of the proposed model for training as well as inference stages
Dataset Case study Fitting time (per epoch) Inference time (25%) Inference time (per record)
highlight potential issues in the training data or feature selection that may lead to inefficiency in identifying these specific attacks. In addition, the results for Recon and Spoofing attacks also reveal challenges, with TPR of 65.34% and 74.67%, respectively, suggesting that while the model performs well overall, certain attack types are not being effectively recognized. Nonetheless, it is noteworthy that despite the model making these errors due to the similarities in data patterns, the classification process is successful in the majority of other cases.
Furthermore, when dealing with class imbalance, there are many techniques that can be employed to address the issue, including oversampling the minority class, under-sampling the majority class, and using class weights. However, in this study, we deliberately chose to focus on data standardization to reduce the effect of class imbalance. By standardizing the data, we ensured that all features are on a comparable scale and contribute equally, which helped to mitigate the impact of class imbalance. Additionally, we employed alternative metrics, such as F1-score, precision, and recall to evaluate the model’s performance for each class. These metrics provide a more comprehensive understanding of the model’s ability to detect minority class instances.
When training a neural network, key hyperparameters play a crucial role in shaping how the network learns and performs. In our model, we experimented with different learning rates and optimizers to identify the optimal settings that yield the best accuracy and parameter optimization. Figure 9 illustrates how model accuracy responds to various learning rates and optimizer choices, while Figs. 10 and 11 provide detailed views of these effects as training progresses across epochs.
Fig. 9 Comparison of the model accuracy when two hyper-parameters are varied: (a) learning rate and (b) optimizers. The value of learning
rate of 0.0001 provides the best accuracy for both binary and multi-class classification, while Adam optimizer algorithm offers the best accuracy
compared to other considered algorithms
We experimented with three learning rates: 0.01, 0.001, and 0.0001. As shown in Fig. 9a, a learning rate of 0.0001 achieved the highest accuracy among these values. Figure 10a through d further confirm that 0.0001 optimizes both accuracy and learning loss. This learning rate is also commonly used in similar models across the literature. Values lower than 0.0001 were found to negatively impact accuracy and loss. Thus, we selected 0.0001 as the optimal learning rate for training our model in this study.
Similarly, various optimization algorithms are commonly used to train deep neural networks effectively. In this study, we experimented with three optimizers (Adam, SGD, and Adagrad) to identify the best fit for our model. As shown in Fig. 9b, the Adam optimizer achieved the highest accuracy among the options tested. Figure 11a through d further confirm Adam’s consistent performance improvement with each training epoch.
Finally, while we did not have the opportunity to test the model in a real network environment (see Sect. 6), the runtime measurements presented in Table 13 indicate that the model could operate within a real network without introducing significant latency. This potential will be further validated in future studies.
6 Conclusion and future work
In this paper, we proposed an NIDS based on deep learning, utilizing a hybrid deep convolutional neural network model. This model combines two deep learning paradigms, namely convolutional neural network and deep
Fig. 10 Comparison of the model accuracy and loss (y-axis) with different learning rate values and different number of epochs (x-axis): (a)
epochs versus accuracy for multi-class classification, (b) epochs versus accuracy for binary-class classification, (c) epochs versus loss for multi-class
classification, and (d) epochs versus loss for binary-class classification
neural network. To evaluate the performance of the proposed model, experiments were conducted using three benchmark datasets. In order to thoroughly analyze and evaluate the model performance, two distinct scenarios were taken into consideration, namely multi-class classification as well as binary-class classification.
The findings indicate that the model attained remarkably high F-score, accuracy, positive predictive value, and true positive rate. The scores obtained for binary and multi-class classifications range from 99.25 to 99.98% depending on the dataset used, improving on existing models in the literature. By leveraging the power of deep learning, our model is able to effectively analyze complex patterns in network traffic data and accurately identify potential intrusions. This will contribute to the field of network security and has the potential to greatly improve the overall security of interconnected networks and systems. The model implementation can be found at this link https://github.com/AhmedShebl13/NIDS.
The presented model offers several avenues for future development. First, the detection of minor classes, as elaborated in Sect. 5, could be enhanced to improve the classification accuracy. Second, evaluating the model with additional benchmark datasets would strengthen its robustness. Third, integrating innovative solutions, such as blockchain, could further expand the model’s applications. Fourth, to provide insight into the model’s performance in a real network environment, we have relied on its detection runtime. However, testing the model by deploying it in an actual environment
Fig. 11 Comparison of the model accuracy and loss (y-axis) with different optimizers and different number of epochs (x-axis): (a) epochs
versus accuracy for multi-class classification, (b) epochs versus accuracy for binary-class classification, (c) epochs versus loss for multi-class
classification, and (d) epochs versus loss for binary-class classification
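The comparison in Figs. 9 to 11 covers three learning rates (0.01, 0.001, and 0.0001) and three optimizers (Adam, SGD, and Adagrad). As a rough illustration of how these optimizers differ, the sketch below applies their standard update rules to a one-parameter toy loss. It is not the training code of the proposed model: the loss (w - 3)^2, the function names, the step count, and the Adam constants (beta1 = 0.9, beta2 = 0.999, eps = 1e-8, the commonly used defaults) are assumptions made purely for illustration, and the ranking on this toy problem need not match the ranking reported for the DCNN model.

```python
import math


def grad(w: float) -> float:
    """Gradient of the toy loss f(w) = (w - 3)^2 (an assumed stand-in loss)."""
    return 2.0 * (w - 3.0)


def run(optimizer: str, lr: float, steps: int = 200) -> float:
    """Run `steps` updates from w = 0 and return the final toy loss."""
    w = 0.0
    m = 0.0        # Adam: running mean of gradients
    v = 0.0        # Adam: running mean of squared gradients
    g2_sum = 0.0   # Adagrad: accumulated squared gradients
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        g = grad(w)
        if optimizer == "sgd":
            w -= lr * g
        elif optimizer == "adagrad":
            g2_sum += g * g
            w -= lr * g / (math.sqrt(g2_sum) + eps)
        elif optimizer == "adam":
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            m_hat = m / (1 - beta1 ** t)  # bias correction
            v_hat = v / (1 - beta2 ** t)
            w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        else:
            raise ValueError(f"unknown optimizer: {optimizer}")
    return (w - 3.0) ** 2


# Grid over the optimizers and learning rates named in the text.
results = {
    (opt, lr): run(opt, lr)
    for opt in ("adam", "sgd", "adagrad")
    for lr in (0.01, 0.001, 0.0001)
}

for (opt, lr), loss in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{opt:8s} lr={lr:<7} final toy loss={loss:.6f}")
```

In an actual experiment, `run` would be replaced by a full training and validation cycle of the model, with the learning rate and optimizer passed to the deep learning framework in use.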