Feature Extraction For Machine Learning-Based Intrusion Detection in IoT Networks
Keywords: Feature extraction; Machine learning; Network intrusion detection system; IoT

Abstract: A large number of network security breaches in IoT networks have demonstrated the unreliability of current Network Intrusion Detection Systems (NIDSs). Consequently, network interruptions and loss of sensitive data have occurred, which led to an active research area for improving NIDS technologies. In an analysis of related works, it was observed that most researchers aim to obtain better classification results by using a set of untried combinations of Feature Reduction (FR) and Machine Learning (ML) techniques on NIDS datasets. However, these datasets differ in feature sets, attack types, and network design. Therefore, this paper aims to discover whether these techniques can be generalised across various datasets. Six ML models are utilised: a Deep Feed Forward (DFF), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Decision Tree (DT), Logistic Regression (LR), and Naive Bayes (NB). Three Feature Extraction (FE) algorithms, Principal Component Analysis (PCA), Auto-encoder (AE), and Linear Discriminant Analysis (LDA), are evaluated using three benchmark datasets: UNSW-NB15, ToN-IoT and CSE-CIC-IDS2018. Although PCA and AE algorithms have been widely used, the determination of their optimal number of extracted dimensions has been overlooked. The results indicate that no single FE method or ML model achieves the best scores for all datasets. The optimal number of extracted dimensions has been identified for each dataset, and LDA degrades the performance of the ML models on two datasets. The variance is used to analyse the extracted dimensions of LDA and PCA. Finally, this paper concludes that the choice of datasets significantly alters the performance of the applied techniques. We believe that a universal (benchmark) feature set is needed to facilitate further advancement and progress of research in this field.
1. Introduction

Cyber-security attacks and their associated risks have significantly increased since the rapid growth of the interconnected digital world [1], e.g., the Internet of Things (IoT) and Software-Defined Networks (SDN) [2]. IoT is an ecosystem of interrelated digital devices and objects known as "things" [3]. They are embedded with sensors, computing chips and other technologies to collect and exchange data over the internet. IoT networks aim to increase the productivity of the hosting environment, such as industrial systems and "smart" buildings. IoT devices are growing significantly, with an expected number of 50 billion devices by the end of 2020 [3]. This growth has led to an increase in cyber attacks and the risks associated with them. Consequently, businesses and governments are proactively looking for new ways to protect their personal and organisational data stored on networked devices. Unfortunately, current security measures in IoT networks have proven unreliable against unprecedented attacks [4]. For instance, in 2017, attackers compromised a casino's sensitive database through an IoT fish tank's thermometer. According to the Nozomi Networks report, new and modified IoT botnet attacks increased rapidly in the first half of 2020, with 57% of IoT devices vulnerable to attacks [5]. According to the Symantec Internet Security Threat Report, more than 2.4 million new malware variants were created in 2018 [6]. This led to growing interest in improving the capabilities of Network Intrusion Detection Systems (NIDSs) to detect unprecedented attacks. Therefore, new innovative approaches are required to enhance their attack detection performance.
An NIDS is implemented in a network to analyse traffic flows to detect security threats and protect digital assets [7]. It is designed to provide high cyber-security protection in operational infrastructures and aims to preserve the three principles of information systems security: confidentiality, integrity, and availability [7]. Detecting cyber-attacks and threats has been the primary goal of NIDSs for a long time. There are two main types of NIDSs. Signature-based NIDSs aim to match and compare the signatures of incoming traffic with a database of predetermined signatures of previously known attacks [8]. Although they usually provide a high level of detection accuracy for known attacks, they fail to detect zero-day or modified threats that do not exist in the database. As attackers constantly change their techniques and strategies for conducting attacks to evade current security measures, NIDSs must adapt their detection approaches accordingly. However, the current method of tuning signatures to keep up with changing attack vectors is unreliable. Anomaly-based NIDSs aim to overcome the limitations faced by signature NIDSs by using advanced statistical methods, which have enabled researchers to determine the behavioural patterns of network traffic. Various methods are used for anomaly detection, such as statistical-, knowledge- and Machine Learning (ML)-based techniques [8]. Generally, they can achieve higher accuracy and Detection Rate (DR) levels for zero-day attacks, as they focus on matching attack patterns and behaviours rather than signatures [9]. However, anomaly NIDSs suffer from high False Alarm Rates (FARs), as they can identify any unique benign traffic that deviates from secure behaviour as an anomaly.

Current signature NIDSs have proven unreliable for detecting zero-day attack signatures [10] as they pass through IoT networks. This is due to the lack of known attack signatures in the system's database. To prevent these incidents from recurring, many techniques, including ML, have been developed and applied with some success. ML is an emerging technology with new capabilities to learn and extract harmful patterns from network traffic, which can be beneficial for detecting security threats [11]. Deep Learning (DL) is an emerging branch of ML that has proven very successful in detecting sophisticated data patterns [12]. Its models are inspired by biological neural systems in which a network of interconnected nodes transmits data signals. Each node contains a mathematical activation function that converts input to output. These models consist of hidden layers that can further extract complex patterns in network traffic. These patterns are learnt through network attack vectors, which can be obtained from various features transmitted through network traffic, such as packet count/size, protocols, services and flags. Each attack type has a different identifying pattern, known as a set of events that may compromise the security principles of networks if undetected.

Researchers have developed and applied various ML models, which are often combined with Feature Reduction (FR) algorithms to potentially improve their performance. Using a set of evaluation metrics, promising results for the detection capabilities of ML have been obtained, but these models are not yet reliable for real production IoT networks. The trend in this field is to outperform state-of-the-art results for a specific dataset rather than to gain insights into an ML-based NIDS application [13]. Therefore, the extensive amount of academic research conducted outweighs the number of actual deployments in the real operational world. Although this could be due to the high cost of errors compared with those in other domains [13], it may also be that these techniques are unreliable in a real environment. This is because they are often evaluated on a single dataset consisting of a list of features that might not be feasible to collect or store in a live IoT network feed. Moreover, due to the nature of ML, there is often room for improvement in its hyper-parameters when implemented on a specific dataset. Therefore, this paper aims to measure the generalisability of combinations of Feature Extraction (FE) algorithms and ML models on different NIDS datasets.

In this paper, the effectiveness of three DL models in detecting attack vectors has been measured and compared with three Shallow Learning (SL) models, i.e., Deep Feed Forward (DFF), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Decision Trees (DT), Logistic Regression (LR) and Naive Bayes (NB). Three FE algorithms, namely, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Auto-encoder (AE), have been explored, and their effects on three benchmark datasets, UNSW-NB15, ToN-IoT and CSE-CIC-IDS2018, have been studied. The results on the complete (full) datasets, without any FE algorithm applied, are also calculated for comparison. The extracted outputs of PCA and LDA are analysed by calculating their respective variance scores. The optimal numbers of dimensions when applying the AE and PCA algorithms are found by experimenting with 1, 2, 3, 4, 5, 10, 20, and 30 dimensions. This paper is structured as follows: in Section 2, related works conducted in this field are explained. It is followed by a methodology section where the data processing, FE algorithms, and ML classifiers used, together with their architectures and parameters, are described. In Section 4, the datasets used and their importance in research are discussed, the evaluation metrics used are defined, and the results achieved are listed and explained. In summary, the key contributions of the paper are:

• Experimental evaluation of 18 combinations of FE algorithms and ML classifiers across three NIDS datasets.
• Exploration of the number of feature dimensions and their impact on the classification performance.
• Analysis of feature variance and its correlation to the detection accuracy.

2. Related works

This section provides an overview of related papers and studies in this area. Due to the rapidly evolving nature of networks, new attack scenarios appear daily, and the age of a dataset is critical. As old datasets contain outdated patterns of benign and attack traffic, they are considered obsolete and have limited significance. Therefore, datasets released within the last five years are selected, as they represent up-to-date network traffic. An updated version of CSE-CICIDS2017, known as CSE-CIC-IDS2018, was released publicly by the University of New Brunswick. Although the University of New South Wales released another dataset known as ToN-IoT in late 2019, limited papers that used it were found at the time of writing. Therefore, examining this dataset and its performance against those very well-known and widely used datasets is another contribution of this paper. Researchers have widely used the UNSW-NB15 dataset due to its various features and attack types. Papers in which the UNSW-NB15, ToN-IoT and CSE-CIC-IDS2018 datasets were used are analysed in the following paragraphs.

In [14], the authors implemented a CNN model and evaluated it on the UNSW-NB15 dataset. The CNN uses max-pooling, and a complete list of its hyper-parameters is provided. Experiments were conducted with different numbers of hidden layers and the addition of a Long Short Term Memory (LSTM) layer. The three-layer network performed best on the balanced and unbalanced datasets, achieving accuracies of 85.86% and 91.2%, respectively, with the minority class oversampled to balance the label classes. The authors also compared three activation functions (sigmoid, relu, and tanh), with sigmoid obtaining the best accuracy of 91.2%. Although they claimed to have built a reliable NIDS model, a DR of 96.17% and a FAR of 14% are not ideal. They also did not evaluate their best model on various datasets to determine its stability or performance for different attack types or packet features. Khan et al. explored five algorithms, DT, RF, Gradient Boosting (GB), AdaBoost, and NB, on the UNSW-NB15 dataset with an extra tree classifier for FE. The extracted features could have been heavily influenced by identifying features such as IPs and ports, which are biased towards attacking/victim nodes. The results showed that RF (98.60%) achieved the best score, followed by AdaBoost (97.92%) and DT (97.85%). However, in terms of prediction times, DT performed the best with 0.75s, while RF and AdaBoost took 6.97s and 21.22s, respectively [15].

In [16], the authors investigated various activation functions (relu, sigmoid, tanh, and softsign) and optimisers (adam, sgd, adagrad, nadam, adamax and RMSProp) with different numbers of nodes in the hidden layers. They aimed to find the optimal set of hyper-parameters for potential use in an NIDS. The experiment was conducted using DFF and
LSTM architectures on the UNSW-NB15 dataset. There was no substantial improvement using LSTM rather than DFF, with the relu activation function outperforming the others. Most optimisers performed similarly well, except for SGD, which was less accurate. They claimed that their best setting for the hyper-parameters was using relu, adam, and a number of hidden nodes following the rule 0.75 × input + output. Their best accuracy results were 98.8% for DFF and 98% for LSTM. However, in the paper, neither were the flow identifier features dropped, nor was their best-claimed set of hyper-parameters evaluated on another dataset. In Ref. [17], the authors proposed an AE neural architecture consisting of LSTM and dense layers as an FE tool. The extracted output is then fed into an RF classifier to perform the attack detection. Three datasets, UNSW-NB15, ToN-IoT, and NSL-KDD, were used to evaluate the performance of the proposed methodology. The results indicate that the chosen classifier achieves higher detection performance without using compression methods. However, training time has been significantly reduced by using lower dimensions.

In [18], the authors visually explored the effects of applying PCA and AE on the UNSW-NB15 and NSL-KDD datasets. They also experimented with different dimensions (ranging between 2 and 30) using the classifiers K Nearest Neighbour (KNN), DFF, and DT in binary and multi-class classification scenarios. The study found that AE performed better than PCA for KNN and DFF, but both were similar for DT. An optimal number of dimensions (20) was found for the UNSW-NB15 dataset but not for the NSL-KDD one. In Ref. [19], a CNN and an RNN model were designed to detect attacks in the CSE-CIC-IDS2018 dataset. The authors followed a supervised binary classification, where CNN outperformed RNN in detecting each attack type. The authors omitted some benign packets to balance the attack and benign classes to improve the classification performance. A significant increase in the performance was obtained in the detection of minority samples of attacks. Belouch et al. explored DT, NB, SVM and RF models on the UNSW-NB15 dataset. They used accuracy as the defining metric, where RF achieved 97.49%, followed by a DT score of 95.82%, while SVM and NB led to poor results. They applied no FR techniques; the full set of the dataset's features was utilised. Training and testing times were also recorded, where NB achieved the fastest time [20].

In [21], the CSE-CIC-IDS2018 dataset was utilised to explore seven different DL models, i.e., supervised (DFF, RNN and CNN) and unsupervised (restricted Boltzmann machine, DBN, deep Boltzmann machine and deep AE). The experiments also included a comparison of different learning rates and numbers of hidden nodes. However, no data pre-processing phase, including FR, was mentioned. Moreover, the flow identifiers were not dropped, which would have caused a bias towards attacking/victim nodes or applications. All models performed similarly, with slight variations in the DRs of their attack types. In terms of overall accuracy, CNN had the highest of 97.38% when using 100 hidden nodes with 0.5 as the learning rate. Increasing the number of hidden nodes and the learning rate improved the accuracy, but also increased the training time. In Ref. [22], the authors compared two FE techniques, namely, PCA and LDA, and proposed a linear discriminative PCA by feeding the discriminant information output from the LDA into the PCA. Although the ML model they used in their experiments was not identified, their method was evaluated on the UNSW-NB15 dataset. As their technique did not perform well for detecting fuzzers and exploit attacks, they decided to eliminate them from some of their results, which is not ideal in a realistic network environment. Nevertheless, their results were still poor, with the best one for binary classification having a DR of 92.35%. One of their stated future works is to determine the optimal number of principal components, i.e., the number of dimensions in a PCA.

Most of the works found in the literature still adopted the negative habits addressed in Ref. [23], with researchers aiming to create new FR methods and build new ML models to outperform the state-of-the-art numerical results. This can also be achieved by modifying any hyper-parameters used, which often have room for improvement when applied to a certain dataset. In most papers, experiments were conducted using a single dataset, which questions the conclusion that the proposed techniques could be generalised across datasets. As each dataset contains its own private set of features, there are variations in the information presented. Consequently, these proposed techniques may have different performances, strongly influenced by the chosen dataset. The experimental issues mentioned above create a gap between the extensive academic research conducted on ML-based NIDSs and the actual deployments of ML-based NIDSs in the operational world. However, compared with other applications, the same ML tools have been deployed in commercial scenarios with great success. We believe this is due to the high cost of errors in the NIDS domain, making it critical to design an optimal ML model before deployment. Therefore, as gaining insights into the ML-based NIDS application is crucial, this paper explores the performance of combinations of FE algorithms and ML models on different datasets. This will help determine if the best combination can be generalised across all chosen datasets. Also, although applying PCA and AE algorithms has been common in recent papers, finding the optimal number of dimensions to be used has been overlooked. The extracted dimensions of PCA and LDA are analysed by computing the variance and its correlation with the detection accuracy.

3. Methodology

This paper explores the effects of applying three FE techniques (PCA, LDA and AE) on three DL models (DFF, CNN and RNN) and three SL classifiers (DT, LR and NB). For PCA and AE, several dimensions (1, 2, 3, 4, 5, 10, 20 and 30) are selected to potentially find the optimal number. Three publicly released NIDS datasets that reflect modern network behaviour are utilised to conduct our experiments, with an overall representation provided in Fig. 1. The datasets are processed for efficient FE and ML procedures. Then, the predictions made by the classifiers are collected, and certain evaluation metrics are statistically calculated.

Fig. 1. System architecture.
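As a concrete illustration of the feature-extraction stage outlined above, the sketch below shows one way the two linear FE techniques could be instantiated with scikit-learn. The feature matrix, labels and helper name are assumptions made for illustration, not the authors' released code.

```python
# Sketch of the linear feature-extraction step (PCA and LDA), assuming a
# pre-processed feature matrix X (flows x features) and binary labels y.
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def extract_linear_features(X_train, y_train, X_test, n_dims):
    """Return PCA- and LDA-compressed versions of the input features."""
    # PCA: keep the first n_dims principal components.
    pca = PCA(n_components=n_dims)
    X_train_pca = pca.fit_transform(X_train)
    X_test_pca = pca.transform(X_test)

    # LDA: with a binary label there is at most one discriminant dimension,
    # which is why LDA always uses a single extracted dimension here.
    lda = LinearDiscriminantAnalysis(n_components=1)
    X_train_lda = lda.fit_transform(X_train, y_train)
    X_test_lda = lda.transform(X_test)
    return (X_train_pca, X_test_pca), (X_train_lda, X_test_lda)

# Candidate dimensions evaluated for PCA (and AE) in the experiments.
CANDIDATE_DIMS = [1, 2, 3, 4, 5, 10, 20, 30]
```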
X* = (X − X_min) / (X_max − X_min)    (1)
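Eq. (1) is the standard min-max scaling of each feature into the [0, 1] range. A minimal per-column sketch is given below; the pandas DataFrame input and helper name are illustrative assumptions.

```python
# Min-max normalisation of each numeric feature into [0, 1], as in Eq. (1).
import pandas as pd

def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    scaled = df.copy()
    for col in scaled.columns:
        col_min, col_max = scaled[col].min(), scaled[col].max()
        # Guard against constant columns to avoid division by zero.
        if col_max > col_min:
            scaled[col] = (scaled[col] - col_min) / (col_max - col_min)
        else:
            scaled[col] = 0.0
    return scaled
```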
The number of nodes in the encoder block decreases in the order of 30, 20 and 10, and the decoder block increases in the reverse order. The number of nodes in the middle layer is set to the number of output dimensions required. All the layers use the relu activation function, the adam optimiser and the binary cross-entropy loss function.
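A minimal Keras sketch consistent with this auto-encoder description is given below. The mirrored decoder widths, the sigmoid reconstruction layer and the helper name are assumptions made for illustration (the text states relu for all layers).

```python
# Sketch of the auto-encoder used for feature extraction: a 30-20-10 encoder,
# a bottleneck of `n_dims` units, and a mirrored decoder (assumed layout).
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(n_features: int, n_dims: int) -> keras.Model:
    inputs = keras.Input(shape=(n_features,))
    x = layers.Dense(30, activation="relu")(inputs)
    x = layers.Dense(20, activation="relu")(x)
    x = layers.Dense(10, activation="relu")(x)
    bottleneck = layers.Dense(n_dims, activation="relu", name="bottleneck")(x)
    x = layers.Dense(10, activation="relu")(bottleneck)
    x = layers.Dense(20, activation="relu")(x)
    x = layers.Dense(30, activation="relu")(x)
    # Sigmoid reconstruction is an assumption, chosen because the inputs are
    # min-max scaled to [0, 1] and the loss is binary cross-entropy.
    outputs = layers.Dense(n_features, activation="sigmoid")(x)

    autoencoder = keras.Model(inputs, outputs)
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
    return autoencoder

# After training, the encoder half produces the extracted dimensions:
# encoder = keras.Model(autoencoder.input,
#                       autoencoder.get_layer("bottleneck").output)
```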
3.3. Machine learning

ML is a subset of Artificial Intelligence (AI) that uses certain algorithms to learn and extract complex patterns from data. In the context of an ML-based NIDS, ML models can learn harmful patterns from network traffic, which can be beneficial in the detection of security threats. DL is an emerging ML branch that has proven capable of detecting sophisticated data patterns. Its models are inspired by biological neural systems, in which a network of interconnected nodes transmits data signals. Building an ML model following a supervised classification method involves two processes: training and testing. During the first phase, the model is trained using labelled malicious and benign network packets from the training dataset to extract patterns and fit the corresponding model's parameters. Then, the testing phase evaluates the model's reliability by measuring its performance in classifying unseen attack and benign traffic in the testing set of unlabelled network packets. These predictions are compared with the actual labels in the testing dataset to evaluate the model using the metrics explained in Section 3.4.

The hyper-parameters used in the DL models are listed in Table 1. All three datasets used in the experiments suffer from a class imbalance in terms of the frequency of benign and attack samples, which usually causes the model to predict the dominant class over the others. As the learning phase of an ML model is often biased towards the class with the majority of samples, the minority class may not be well fitted or trained in the final model [24]. Due to the nature of the experiments, in two of the datasets the minority class is the attack one, namely class 1, which makes it critical for the model to be able to detect and classify samples of that class. To deal with the datasets' imbalanced classes, weights are assigned to each class, with the minority having a "heavier" weight than the majority. Therefore, the model emphasises or gives priority to the former class in the training phase [25]. The classes' weights are calculated using Eq. (2):

W_class = TotalSamplesCount / (2 × ClassSamplesCount)    (2)
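A short sketch of how these class weights could be computed and passed to training is shown below; the Keras fit usage and variable names are illustrative assumptions.

```python
# Class weights per Eq. (2): total sample count divided by twice the
# per-class sample count, so the minority class receives the larger weight.
import numpy as np

def compute_class_weights(y_train: np.ndarray) -> dict:
    total = len(y_train)
    classes, counts = np.unique(y_train, return_counts=True)
    return {int(cls): total / (2 * count) for cls, count in zip(classes, counts)}

# Example (assumed) usage with a Keras model:
# weights = compute_class_weights(y_train)
# model.fit(X_train, y_train, epochs=..., class_weight=weights)
```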
• Deep Feed Forward (DFF): A class of Multi-Layer Perceptrons (MLPs) that is usually constructed of three or more hidden layers. In this model, the data is fed forward through the input layer and predictions are obtained at the output. Each layer consists of several nodes with weighted connections mapping the high-level features as input to the desired output. The weights are randomly initialised and then optimised in the learning phase through a process known as backpropagation. The input is a row (flow) of the CSV file fed into the input layer, which consists of as many nodes as there are input dimensions. It is then passed through three hidden layers of 20 dense nodes each, with relu activation functions. The weights and biases are optimised using the adam algorithm with the binary cross-entropy loss function. Finally, due to the number of classes, the output layer is a single sigmoidal unit. A dropout rate of 0.2 is used to remove 20% of the nodes' information to avoid over-fitting the training dataset. Fig. 2 presents the DFF architecture (a minimal code sketch of this configuration is given after Table 1).
• Convolutional Neural Network (CNN): A model originally designed to map images to outputs, which has proven to be effective when applied to other prediction scenarios. Its hidden layers are typically convolutional and pooling ones, and a fully connected CNN includes an additional dense layer. Convolutional layers extract features from the input with kernels, and the pooling layers can enhance these features. The input is converted to a 2-dimensional shape to be compatible with the Conv1D layer. All layers have 20 filters, with kernel sizes of 3 in the input layer and 2 and 1 in the first and second hidden layers, respectively. All activation functions used in the convolutional layers are relu, and an average pooling of size 2 is applied between each set of two convolutional layers. The result is passed to a dropout layer with a rate of 0.2 and then to the final dense sigmoid classifier. Fig. 3 presents the mapping and pooling of the input by the convolutional layers until a prediction is made by the dense output layer. The hidden layers are removed for inputs with fewer than 10 features, and the kernel size is reduced to 1.
• Recurrent Neural Network (RNN): A model that can capture the sequential information present in input data while making predictions through an internal memory that stores a sequence of inputs, and it is successful in language-processing scenarios. Although there are various types of RNNs, LSTM is the most commonly used. Each LSTM node contains three gates: forget, input, and output. The input is converted to a 3-dimensional shape to be compatible with the requirements of the LSTM layer. The number of nodes in the input layer is equal to the number of input dimensions. The input is then passed through a single hidden layer consisting of 10 nodes with relu activation functions. The weights and biases of each feature and layer are optimised using the adam algorithm based on the binary cross-entropy loss function. The output layer is a single sigmoidal unit. A dropout rate of 0.2 is used to remove 20% of the model's information to avoid over-fitting the training dataset. Fig. 4 presents the mapping of an input to its output through LSTM layers.
• Logistic Regression (LR): A linear classification model used for predictive analysis. It uses the logistic function, also known as the sigmoid function, to classify a binary output. It calculates the probability of the output class as a value between 0 and 1. It is easy to implement and requires few computational resources, but may not work well for non-linear scenarios. The lbfgs optimisation algorithm is selected with an l2 regularisation technique to specify the penalisation strategy to avoid over-fitting. The tolerance value of the stopping criterion is set to 1e-4, the regularisation strength to 1, and the maximum number of iterations to 100.
• Decision Trees (DT): A model that follows a tree structure in which each internal node represents a high-level feature, the branches represent the outcomes and the leaves represent the label classes. It uses a supervised learning method mainly for classification and regression purposes, aiming to map features and values to their desired outcome. It is widely used because it is easy to build and understand, but it can create an overcomplex tree that overfits the training data. The DT's Classification and Regression Trees (CART) algorithm is used due to its capability to construct binary trees using the input features [26]. The Gini impurity function is selected to measure the quality of a split.

Table 1
DL hyper-parameters.

Parameter | DFF | CNN (Features ≥ 10) | CNN (Features < 10) | RNN
Layer Type | Dense | Conv1D | Conv1D | LSTM
No. of Hidden Layer(s) | 3 | 2 | N/A | 1
Hidden Layer Neurons/Function | 20/Relu | 20/Relu | N/A | 10/Relu
Output Layer Function | Sigmoid | Sigmoid | Sigmoid | Sigmoid
Pooling Type/Size | N/A | Average/2 | Average/2 | N/A
Optimisation | Adam | Adam | Adam | Adam
Loss | Binary crossentropy | Binary crossentropy | Binary crossentropy | Binary crossentropy
Dropout | 0.2 | 0.2 | 0.2 | 0.2
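A minimal Keras sketch of the DFF configuration in Table 1 follows; the placement of the dropout layer and the omission of a training loop are assumptions made for illustration.

```python
# Sketch of the DFF classifier per Table 1: three hidden Dense layers of 20
# relu units, dropout of 0.2, a single sigmoid output, adam optimiser and
# binary cross-entropy loss.
from tensorflow import keras
from tensorflow.keras import layers

def build_dff(n_features: int) -> keras.Model:
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(20, activation="relu"),
        layers.Dense(20, activation="relu"),
        layers.Dense(20, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```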
3.4. Evaluation metrics

• ACC: Accuracy is the number of correctly classified samples divided by the total number of samples:

ACC = (TP + TN) / (TP + TN + FP + FN)    (3)

• DR: Detection Rate, also known as recall, is the number of correctly classified attack samples divided by the total number of attack samples:

DR = TP / (TP + FN)    (4)

• FAR: False Alarm Rate is the number of benign samples incorrectly classified as attacks divided by the total number of benign samples:

FAR = FP / (FP + TN)    (5)

• F1: The F1 score is the harmonic mean of precision and DR:

Precision = TP / (TP + FP)    (6)

F1 = (2 × Precision × DR) / (Precision + DR)    (7)

• AUC: Area Under the Curve is the area under the Receiver Operating Characteristics (ROC) curve that indicates the trade-off between the DR and FAR.
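A compact sketch of how these metrics could be computed from the test predictions with scikit-learn is given below; the variable names are illustrative.

```python
# Computing ACC, DR (recall), FAR, F1 and AUC, per Eqs. (3)-(7),
# from binary labels y_true, hard predictions y_pred and scores y_score.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             recall_score, roc_auc_score)

def nids_metrics(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "ACC": accuracy_score(y_true, y_pred),   # Eq. (3)
        "DR": recall_score(y_true, y_pred),      # Eq. (4)
        "FAR": fp / (fp + tn),                   # Eq. (5)
        "F1": f1_score(y_true, y_pred),          # Eq. (7)
        "AUC": roc_auc_score(y_true, y_score),   # area under the ROC curve
    }
```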
Most metrics are heavily affected by the imbalance of classes in the datasets. For example, a model can achieve a high accuracy or F1 score by predicting only the majority class, or it can have both a high DR and a high FAR, which makes it not ideal. Therefore, a single metric cannot be used to differentiate between models. The ROC considers both aspects by plotting the FAR and DR on the x- and y-axes, respectively, and then the AUC is calculated. This represents the trade-off between the two and measures the performance of an NIDS in distinguishing between attack and benign flows. As shown in Fig. 5, the ROC curve of an optimal NIDS is aimed toward the top left-hand corner of the graph, with the highest possible AUC value of 1. On the other hand, an imperfect NIDS generates a diagonal line and has the lowest possible AUC value of 0.5.

4. Results and discussion

The following results are obtained from the testing sets using a stratified five-fold method, and the mean results are calculated. In this section, the results for each dataset are initially discussed, and then all of them are considered together. The early comparison of the models and FE algorithms is conducted using AUC as the comparison metric. For each dataset, the effects of applying the FE algorithms using different dimensions for each ML model are presented separately. Also, the best combination of an ML model and FE algorithm is selected to measure its performance in detecting each attack type statistically.
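The stratified five-fold procedure could be organised roughly as sketched below; the scikit-learn-style estimator interface, the helper name and the use of NumPy arrays are assumptions made for illustration.

```python
# Stratified five-fold evaluation: class proportions are preserved in each
# fold, and the reported figures are the means across folds.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(build_model, X, y, n_splits=5):
    """X and y are NumPy arrays; build_model returns a fresh estimator."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    fold_scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = build_model()
        model.fit(X[train_idx], y[train_idx])
        fold_scores.append(model.score(X[test_idx], y[test_idx]))
    return float(np.mean(fold_scores))
```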
4.1. Datasets

Data selection is crucial for determining the reliability of ML models and the credibility of their evaluation phases. Obtaining labelled network data is challenging due to generation, privacy and security issues. Also, production networks do not generate labelled flows, which are mandatory when following a supervised learning methodology. Therefore, researchers have created publicly available benchmark datasets for training and evaluating ML models. They are generated through a virtual network testbed set up in a lab, where normal network traffic is mixed with synthetic attack traffic. The packets are then processed by extracting certain features using particular tools and procedures. An additional label feature is created to indicate whether a flow is malicious or benign. Each sample is defined by a network flow, with a flow considered a unidirectional data log between two end nodes in which all the transmitted packets share specific characteristics such as IP addresses and port numbers. The following three datasets have been used:

• UNSW-NB15: A commonly adopted dataset released in 2015 by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) [27]. The dataset originally contains 49 features extracted by the Argus and Bro-IDS (now called Zeek) tools. Although pre-selected training and testing datasets were created, the full dataset has been utilised. It has 2,218,761 (87.35%) benign flows and 321,283 (12.65%) attack ones, that is, 2,540,044 flows in total. Its flow identifier features are id, srcip, dstip, sport, dport, stime and ltime. The dataset contains non-integer features, such as proto, service and state. The dataset contains nine attack types known as fuzzers, analysis, backdoor, Denial of Service (DoS), exploits, generic, reconnaissance, shellcode and worms.
• ToN-IoT: A recent heterogeneous dataset released in 2019 by ACCS [28]. Its network traffic portion, collected over an IoT ecosystem, has been utilised. It is made up of mainly attack samples, with a ratio of 796,380 (3.56%) benign flows to 21,542,641 (96.44%) attack ones, that is, 22,339,021 flows in total. It contains 44 original features
extracted by the Bro-IDS tool. The flow identifier features are named ts, src_ip, dst_ip, src_port and dst_port. It contains non-integer features, such as proto, service, conn_state, ssl_version, ssl_cipher, ssl_subject, ssl_issuer, dns_query, http_method, http_version, http_resp_mime_types, http_orig_mime_types, http_uri, http_user_agent, weird_addl and weird_name. Its Boolean features include dns_AA, dns_RD, dns_RA, dns_rejected, ssl_resumed, ssl_established and weird_notice. The dataset includes multiple attack settings, such as backdoor, DoS, Distributed DoS (DDoS), injection, Man In The Middle (MITM), password, ransomware, scanning and Cross-Site Scripting (XSS).
• CSE-CIC-IDS2018: A dataset released by a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC) in 2018 [29]. Their developed tool, CICFlowMeter-V3, was used to extract 75 network data features. The full dataset has been used, which has 13,484,708 (83.07%) benign flows and 2,748,235 (16.93%) attack ones, that is, 16,232,943 flows in total. Its flow identifier features are called Dst IP, Flow ID, Src IP, Src Port, Dst Port and Timestamp. Several attack settings were conducted, such as brute-force, bot, DoS, DDoS, infiltration, and web attacks.

4.2. UNSW-NB15

Table 2
UNSW-NB15 classification metrics.

ML | FE | DIM | ACC (%) | F1 | DR (%) | FAR (%) | AUC
DFF | FULL | 40 | 98.33 | 0.85 | 99.97 | 1.75 | 0.9973
DFF | LDA | 1 | 98.34 | 0.85 | 99.88 | 1.74 | 0.9935
DFF | PCA | 20 | 98.18 | 0.84 | 99.87 | 1.90 | 0.9954
DFF | AE | 20 | 97.20 | 0.79 | 99.66 | 2.92 | 0.9949
CNN | FULL | 40 | 98.22 | 0.84 | 99.85 | 1.86 | 0.9938
CNN | LDA | 1 | 98.28 | 0.85 | 99.89 | 1.80 | 0.9937
CNN | PCA | 20 | 97.44 | 0.80 | 99.31 | 2.65 | 0.9935
CNN | AE | 20 | 98.16 | 0.84 | 99.85 | 1.92 | 0.9960
RNN | FULL | 40 | 98.12 | 0.84 | 99.73 | 1.97 | 0.9915
RNN | LDA | 1 | 98.31 | 0.85 | 99.88 | 1.77 | 0.9924
RNN | PCA | 20 | 97.89 | 0.82 | 99.26 | 2.18 | 0.9913
RNN | AE | 20 | 97.88 | 0.83 | 99.88 | 2.11 | 0.9941
LR | FULL | 40 | 98.47 | 0.86 | 99.88 | 1.60 | 0.9914
LR | LDA | 1 | 98.34 | 0.84 | 99.41 | 1.71 | 0.9885
LR | PCA | 10 | 98.13 | 0.84 | 98.87 | 1.91 | 0.9848
LR | AE | 20 | 98.13 | 0.84 | 99.59 | 1.95 | 0.9882
DT | FULL | 40 | 99.27 | 0.92 | 91.58 | 0.34 | 0.9562
DT | LDA | 1 | 97.86 | 0.78 | 77.91 | 1.10 | 0.8841
DT | PCA | 3 | 97.41 | 0.73 | 72.37 | 1.31 | 0.8553
DT | AE | 20 | 98.67 | 0.86 | 85.15 | 0.65 | 0.9226
NB | FULL | 40 | 95.94 | 0.70 | 98.82 | 4.20 | 0.9731
NB | LDA | 1 | 98.34 | 0.85 | 99.39 | 1.71 | 0.9884
NB | PCA | 20 | 97.47 | 0.79 | 99.74 | 2.65 | 0.9854
NB | AE | 30 | 97.02 | 0.75 | 91.87 | 2.72 | 0.9457
Table 3
UNSW-NB15 attacks detection.

Attack Type | Actual | Predicted | DR (%)
Analysis | 2185 | 2182 | 99.87
Backdoor | 1984 | 1966 | 99.10
DoS | 5665 | 5621 | 99.23
Exploits | 27599 | 27532 | 99.76
Fuzzers | 21795 | 21780 | 99.93
Generic | 25378 | 25355 | 99.91
Reconnaissance | 13357 | 13342 | 99.89
Shellcode | 1511 | 1511 | 100
Worms | 171 | 171 | 100

4.3. ToN-IoT

Using the ToN-IoT dataset, the results achieved by each FE algorithm and ML model are significantly different, as displayed in Fig. 7. Overall, DT obtains the best results when it is applied to the full dataset and when AE is used. The DFF model achieves its best results on the complete (full) dataset, performing poorly when using AE but better when using LDA and PCA, where it becomes stable after 4 dimensions. For any dimension lower than 10, CNN performs inefficiently with AE and PCA. Like DFF, RNN performs poorly when using AE but well when using PCA, as it starts to stabilise with 2 dimensions. DT achieves great results when applied to the full dataset and, similarly, with AE at dimensions greater than or equal to 2. However, when using LDA or PCA, it generates defective results. LR and NB do not perform efficiently on this dataset using any of the FE algorithms. LDA improves the performance of RNN and NB, but reduces that of DFF, CNN and DT compared with the full dataset.

The full metrics of the best results obtained by each FE method using all ML models on the ToN-IoT dataset are listed in Table 4. The FAR values are considerably large because there are more attack samples than benign samples in the dataset. DFF performs best when applied to the full dataset, achieving a low FAR, i.e., 1.26%, but also a low DR of 76.67%. AE decreases the performance of DFF even after using the maximum number of dimensions provided. FE algorithms, especially PCA, significantly improve the performance of RNN and NB compared with the full dataset. DT obtains the highest scores when applied to the full dataset and to the AE-extracted dimensions. Its best result is obtained using AE with 20 dimensions, where a DR of 98.28% and a FAR of 3.21% are recorded, but it is ineffective when using PCA and LDA. LR and NB achieve the worst performances of the six ML models. LDA proves unreliable compared to PCA and AE for all learning models except RNN and NB.

Table 4
ToN-IoT classification metrics.

ML | FE | DIM | ACC (%) | F1 | DR (%) | FAR (%) | AUC
DFF | FULL | 37 | 95.45 | 0.84 | 76.67 | 1.26 | 0.9337
DFF | LDA | 1 | 94.27 | 0.97 | 95.06 | 28.32 | 0.8953
DFF | PCA | 5 | 95.97 | 0.98 | 96.91 | 30.94 | 0.9078
DFF | AE | 30 | 96.93 | 0.98 | 98.25 | 40.86 | 0.8010
CNN | FULL | 37 | 95.43 | 0.98 | 96.29 | 29.32 | 0.9100
CNN | LDA | 1 | 97.60 | 0.99 | 99.46 | 55.37 | 0.8155
CNN | PCA | 10 | 96.44 | 0.98 | 97.29 | 27.68 | 0.9232
CNN | AE | 20 | 96.78 | 0.98 | 97.59 | 26.39 | 0.9254
RNN | FULL | 37 | 86.35 | 0.93 | 87.40 | 43.80 | 0.7868
RNN | LDA | 1 | 93.03 | 0.96 | 93.77 | 28.19 | 0.8801
RNN | PCA | 5 | 96.13 | 0.98 | 96.90 | 26.02 | 0.9249
RNN | AE | 4 | 96.02 | 0.98 | 96.93 | 30.18 | 0.9079
LR | FULL | 37 | 75.46 | 0.86 | 75.70 | 31.36 | 0.7217
LR | LDA | 1 | 97.68 | 0.99 | 99.59 | 56.97 | 0.7131
LR | PCA | 5 | 75.44 | 0.86 | 75.68 | 31.49 | 0.7209
LR | AE | 30 | 95.46 | 0.98 | 96.31 | 28.81 | 0.8375
DT | FULL | 37 | 97.29 | 0.99 | 97.29 | 2.66 | 0.9731
DT | LDA | 1 | 86.61 | 0.92 | 87.77 | 46.53 | 0.7062
DT | PCA | 3 | 80.86 | 0.89 | 81.16 | 27.92 | 0.7662
DT | AE | 20 | 98.23 | 0.99 | 98.28 | 3.21 | 0.9753
NB | FULL | 37 | 96.78 | 0.98 | 99.93 | 93.41 | 0.5326
NB | LDA | 1 | 97.77 | 0.99 | 99.64 | 55.48 | 0.7208
NB | PCA | 5 | 97.94 | 0.99 | 99.82 | 55.75 | 0.7203
NB | AE | 20 | 91.47 | 0.95 | 93.24 | 58.98 | 0.6713

Table 5
ToN-IoT attacks detection.

Attack Type | Actual | Predicted | DR (%)
Backdoor | 505385 | 505256 | 99.97
DDoS | 6082893 | 6010012 | 98.80
DoS | 1815909 | 1814699 | 99.93
Injection | 452659 | 442137 | 97.68
MITM | 1043 | 773 | 74.11
Password | 1365958 | 1359372 | 99.52
Ransomware | 32214 | 10781 | 33.47
Scanning | 7140158 | 6974943 | 97.69
XSS | 2108944 | 2084863 | 98.86

DFF, RNN, LR and NB obtain their best results for PCA using 5 dimensions, making it the best number of dimensions, while AE requires a higher number of 20. Table 5 displays the types of attacks in this dataset
and their actual number of samples compared with the number of classified ones. The best-performing combination of FE and ML methods has been used for prediction: DT applied to 20 AE-extracted dimensions, having a 98.28% DR. This table shows that each attack type is almost fully detected except for MITM and ransomware, because there are few samples of each for the models to train on. Scanning and injection attacks have 97.69% and 97.68% DRs, respectively, despite their sufficient samples, indicating that their patterns are more complex.
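The per-attack-type DRs reported in Tables 3 and 5 can be reproduced by grouping the test predictions by attack category, roughly as sketched below; the column names are assumptions for illustration.

```python
# Per-attack-type detection rate: fraction of samples of each attack category
# that the binary classifier flags as attacks (assumed column names).
import pandas as pd

def per_attack_dr(test_df: pd.DataFrame, y_pred) -> pd.DataFrame:
    df = test_df.assign(predicted=y_pred)
    attacks = df[df["label"] == 1]                  # attack flows only
    grouped = attacks.groupby("attack_type")
    out = grouped.agg(Actual=("predicted", "size"),
                      Predicted=("predicted", "sum"))
    out["DR (%)"] = 100 * out["Predicted"] / out["Actual"]
    return out
```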
4.4. CSE-CIC-IDS2018

As illustrated in Fig. 8, the DL models perform equally well in terms of their best AUC scores. DFF achieves its best result when applied to the full dataset, and good detection performance is also achieved when PCA is used. The effects of the AE's and PCA's changing dimensions are very similar for CNN, as it also has difficulty in classification using a lower number of dimensions. RNN performs equally well using all FE algorithms, with AE slightly better than the others. DT performs well with AE and when applied to the full dataset, but performs very poorly with LDA and PCA. Using AE, it requires only 3 dimensions to stabilise and reach its maximum AUC. NB obtains its best results using LDA and PCA, peaking at dimension 20, and LR performs equally using the three FE algorithms. Moreover, AE and PCA have similar impacts on all ML models except DT, for which AE significantly outperforms PCA. LR and NB perform poorly throughout the experiments.

Table 6 displays the best score obtained by each FE algorithm for each ML model applied to the CSE-CIC-IDS2018 dataset. DFF and CNN achieve their best performances when applied to the full dataset, while the FE algorithms improve the classification capability of RNN. LDA performs worse than AE and PCA for all models except NB. However, LR and NB are ineffective in detecting the attacks present in this dataset. The optimal numbers of PCA and AE dimensions are 20 and 10, respectively, as required by most of the ML classifiers. In Table 7, the attack types in the dataset and their actual numbers compared with their correct predictions are presented. The best-performing combination of model and FE algorithm has been used for prediction; that is, the DT classifier is applied to 10 extracted dimensions using AE. This table shows that each attack type is almost fully detected, except Brute Force-Web, Brute Force-XSS, and SQL injection, due to their low sample counts in the dataset, which matches the findings in Ref. [30]. However, infiltration attacks are more difficult to detect despite their majority in the dataset. This could be due to the similarity of their statistical distribution with another class type, which leads to confusion of the detection model. Further analysis, such as t-tests, is required to measure the difference between the distributions of each class.

Table 6
CSE-CIC-IDS2018 classification metrics. Columns: ML | FE | DIM | ACC (%) | F1 | DR (%) | FAR (%) | AUC.

Table 7
CSE-CIC-IDS2018 attacks detection. Columns: Attack Type | Actual | Predicted | DR (%).

4.5. Discussion

According to the evaluation results, it has been observed that a relatively small number of feature dimensions can achieve a classification performance close to the maximum, and the marginal gain from additional dimensions is very small. The outputs of LDA and PCA are analysed using their respective variance to understand and explain this behaviour. The variance is the distribution of the squared deviations of the output from its respective mean. The variance of each dimension extracted from all the datasets using PCA and LDA is discussed. Measuring the variance of the dimensions fed into the ML classifiers is necessary for this field, as it aids in understanding how FE techniques perform on NIDS datasets.
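One way to obtain the per-dimension variances reported in Figs. 9 and 10 is sketched below; it is an illustrative computation rather than the authors' exact script.

```python
# Variance of each extracted dimension for PCA and LDA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def extraction_variances(X, y, n_dims=30):
    pca_out = PCA(n_components=n_dims).fit_transform(X)
    lda_out = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)
    # Variance of each output dimension (population variance, ddof=0).
    return np.var(pca_out, axis=0), np.var(lda_out, axis=0)
```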
Fig. 9 shows the variance of each dimension extracted by PCA for the three datasets. As observed, the first 10 feature dimensions account for the bulk of the variance, with a minor contribution from additional dimensions. This is consistent with and explains the results in Figs. 6–8, where a higher number of features beyond 10 does not provide any further increase in classification accuracy. Fig. 10 displays the variance of the single LDA feature for each of the three considered datasets. The extracted LDA feature of the UNSW-NB15 dataset has a significantly higher variance compared to the other two datasets. This might indicate that one or a very small number of features in the UNSW-NB15 dataset strongly correlate with the labels. This is consistent with the results observed in Figs. 6–8, where the LDA for the UNSW-NB15 dataset achieves a significantly higher classification accuracy than the other two datasets. The classification accuracy of LDA for UNSW-NB15 is close to that achieved with the full dataset, i.e., with the complete set of features.

The results for the datasets have been grouped based on the ML models, as shown in Fig. 11. The best dimensions of PCA and AE are selected for a fair comparison. Clear patterns emerge from the effects of applying the FE algorithms. In Fig. 11(a), DFF is the best when applied to the full dataset, due to the ability of a dense network to assign weights to relevant features, while AE lowers the detection accuracy of the DFF model. Fig. 11(b) shows that applying CNN to the full dataset or using PCA or AE does not significantly alter its performance, but using LDA, the outcome deteriorates. In Fig. 11(c), the necessity of applying an FE algorithm before using RNN is obvious, with the best being AE, followed by PCA and, lastly, LDA. Fig. 11(d) proves the unreliability of using LDA or PCA for a DT model, whereas this model works efficiently when applied to the full dataset or when using AE. In Fig. 11(e), applying a linear FE algorithm, namely LDA or PCA, improves the performance of the NB model. LDA achieves the best results, while NB has the worst results among the six ML models without an FE algorithm. Fig. 11(f) shows that applying LR to the full dataset or using FE methods leads to similar results, although AE improves the model's performance on the ToN-IoT dataset while LDA decreases it on CSE-CIC-IDS2018. Overall, there is a clear pattern in the effects of the FE methods and the classification capabilities of the ML models across the three datasets. Models such as RNN and NB benefit from applying FE algorithms, whereas DFF does not. LDA's general performance is negative for the ToN-IoT and CSE-CIC-IDS2018 datasets when using all ML models except NB. This is explained by the low variance scores achieved by these two datasets compared to the UNSW-NB15 dataset. However, LR and NB do not perform well for detecting attacks in the three datasets, with the best scores attained by a different set of techniques.

The experimental evaluation of 18 different combinations of FE and ML techniques has assisted in finding the optimal combination for each dataset used. On the UNSW-NB15 dataset, the CNN classifier obtains the best score when applied to the AE dimensions. On the ToN-IoT and CSE-CIC-IDS2018 ones, DT outperforms the other models and achieves the best scores using the AE technique. However, no single method works best across the utilised NIDS datasets. This is caused by the vast difference in the feature sets that make up the utilised datasets. Therefore, creating a universal set of features for future NIDS datasets is essential. The universal set needs to be easily generated from live network traffic headers, as these do not require deep packet inspection, which is challenging with encrypted traffic. The features should also not be biased towards providing information on limited protocols or attack types, but rather cover all network traffic and attack scenarios. The features will be required to be small in number to enable a feasible deployment, but contain an adequate number of security events to aid in the successful detection of network attacks. The optimal number of dimensions has been identified for all three datasets, which is 20 dimensions. This is indicated in Fig. 9, where further dimensions gain no additional informational variance. After analysing the DR of each attack type based on the best-performing models, it can be concluded that, in a perfect dataset, the number of attack samples needs to be balanced for efficient binary classification scenarios.

5. Conclusions

In this paper, PCA, autoencoder and LDA have been investigated and evaluated regarding their impact on the classification performance achieved in conjunction with a range of machine learning models. Variance is used to analyse their performance, particularly the correlation between the variance of the extracted dimensions and the detection accuracy.
Three DL models (DFF, CNN and RNN) and three SL algorithms (LR, DT and NB) have been applied to three recent benchmark NIDS datasets, i.e., UNSW-NB15, ToN-IoT and CSE-CIC-IDS2018. In this paper, the optimal combination for each dataset has been identified. The optimal number of extracted feature dimensions has been identified for each dataset through an analysis of the variance and its impact on the classification performance. However, among the 18 tried combinations of FE algorithms and ML classifiers, no single combination performs best across all three NIDS datasets. Therefore, it is important to note that finding a combination of an FE algorithm and ML classifier that performs well across a wide range of datasets and in practical application scenarios is far from trivial and needs further investigation. While research which aims to improve the intrusion detection and attack classification performance for a particular data and feature set by a few percentage points is valuable, we believe a stronger focus should be placed on the generalisability of the proposed algorithms, especially their performance in more practical network scenarios. In particular, we believe it is crucial to work towards defining generic feature sets that are applicable and efficient across a wide range of NIDS datasets and practical network settings. Such a benchmark feature set would allow a broader comparison of different ML classifiers and would significantly benefit the research community. Finally, explaining the internal operations of ML models would attract the benefits of Explainable AI (XAI) to the NIDS domain.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] I. Stellios, P. Kotzanikolaou, M. Psarakis, C. Alcaraz, J. Lopez, A survey of iot-enabled cyberattacks: assessing attack paths to critical infrastructures and services, IEEE Commun. Surv. Tutorials 20 (4) (2018) 3453–3495.
[2] N. Sultana, N. Chilamkurti, W. Peng, R. Alhadad, Survey on sdn based network intrusion detection system using machine learning approaches, Peer-to-Peer Netw. Appl. 12 (2) (2019) 493–501.
[3] M.A. Khan, K. Salah, Iot security: review, blockchain solutions, and open challenges, Future Generat. Comput. Syst. 82 (2018) 395–411.
[4] M. Nawir, A. Amir, N. Yaakob, O.B. Lynn, Internet of things (iot): taxonomy of security attacks, in: 2016 3rd International Conference on Electronic Design (ICED), IEEE, 2016, pp. 321–326.
[5] A. Pinto, Ot/iot security report: rising iot botnets and shifting ransomware escalate enterprise risk, 2020. URL: https://www.nozominetworks.com/blog/what-it-needs-to-know-about-ot-io-security-threats-in-2020/.
[6] Symantec, Internet Security Threat Report, vol. 24, 2019. URL: https://docs.broadcom.com/doc/istr-24-2019-en.
[7] S.F. Yusufovna, Integrating intrusion detection system and data mining, in: 2008 International Symposium on Ubiquitous Multimedia Computing, 2008, pp. 256–259, https://doi.org/10.1109/UMC.2008.59.
[8] P. García-Teodoro, J. Díaz-Verdejo, G. Macia-Fernandez, E. Vazquez, Anomaly-based network intrusion detection: techniques, systems and challenges, Comput. Secur. 28 (1–2) (2009) 18–28, https://doi.org/10.1016/j.cose.2008.08.003.
[9] P.V. Amoli, T. Hamalainen, G. David, M. Zolotukhin, M. Mirzamohammad, Unsupervised network intrusion detection systems for zero-day fast-spreading attacks and botnets, Int. J. Digit. Contents Technol. Appl. (JDCTA) 10 (2) (2016) 1–13.
[10] M.J. Hashemi, G. Cusack, E. Keller, Towards evaluation of nidss in adversarial setting, in: Proceedings of the 3rd ACM CoNEXT Workshop on Big DAta, Machine Learning and Artificial Intelligence for Data Communication Networks, 2019, pp. 14–21.
[11] C. Sinclair, L. Pierce, S. Matzner, An application of machine learning to network intrusion detection, in: Proceedings 15th Annual Computer Security Applications Conference (ACSAC'99), IEEE, 1999, pp. 371–377.
[12] A. Javaid, Q. Niyaz, W. Sun, M. Alam, A deep learning approach for network intrusion detection system, in: Proceedings of the 9th EAI International Conference on Bio-Inspired Information and Communications Technologies (formerly BIONETICS), 2016, pp. 21–26.
[13] R. Sommer, V. Paxson, Outside the closed world: on using machine learning for network intrusion detection, in: 2010 IEEE Symposium on Security and Privacy, IEEE, 2010, pp. 305–316.
[14] M. Azizjon, A. Jumabek, W. Kim, 1d cnn based network intrusion detection with normalization on imbalanced data, in: 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), https://doi.org/10.1109/icaiic48513.2020.9064976.
[15] S. Khan, E. Sivaraman, P.B. Honnavalli, Performance evaluation of advanced machine learning algorithms for network intrusion detection system, in: Proceedings of International Conference on IoT Inclusive Life (ICIIL 2019), NITTTR, Chandigarh, India, 2020, pp. 51–59, https://doi.org/10.1007/978-981-15-3020-3_6.
[16] X.A. Larriva-Novo, M. Vega-Barbas, V.A. Villagra, M. Sanz Rodrigo, Evaluation of cybersecurity data set characteristics for their applicability to neural networks algorithms detecting cybersecurity anomalies, IEEE Access 8 (2020) 9005–9014, https://doi.org/10.1109/access.2019.2963407.
[17] A. Andalib, V.T. Vakili, A novel dimension reduction scheme for intrusion detection systems in IoT environments, 2020, arXiv:2007.05922.
[18] W. Zong, Y.-W. Chow, W. Susilo, Dimensionality reduction and visualization of network intrusion detection data, in: Information Security and Privacy, 2019, pp. 441–455, https://doi.org/10.1007/978-3-030-21548-4_24.
[19] W. Tao, W. Zhang, C. Hu, C. Hu, A network intrusion detection model based on convolutional neural network, in: Security with Intelligent Computing and Big-Data Services, 2019, pp. 771–783, https://doi.org/10.1007/978-3-030-16946-6_63.
[20] M. Belouch, S. El Hadaj, M. Idhammad, Performance evaluation of intrusion detection based on machine learning using Apache Spark, Procedia Comput. Sci. 127 (2018) 1–6, https://doi.org/10.1016/j.procs.2018.01.091.
[21] M.A. Ferrag, L. Maglaras, H. Janicke, R. Smith, Deep learning techniques for cyber security intrusion detection: a detailed analysis, https://doi.org/10.14236/ewic/icscsr19.16.
[22] H. Qiao, J.O. Blech, H. Chen, A machine learning based intrusion detection approach for industrial networks, in: 2020 IEEE International Conference on Industrial Technology (ICIT), https://doi.org/10.1109/icit45562.2020.9067253.
[23] R. Sommer, V. Paxson, Outside the closed world: on using machine learning for network intrusion detection, in: 2010 IEEE Symposium on Security and Privacy, https://doi.org/10.1109/sp.2010.25.
[24] A. Fernandez, B. Krawczyk, S. Garcia, M. Galar, F. Herrera, R.C. Prati, Learning from Imbalanced Data Sets, first ed., Springer, 2018.
[25] X. Guo, Y. Yin, C. Dong, G. Yang, G. Zhou, On the class imbalance problem, in: 2008 Fourth International Conference on Natural Computation, https://doi.org/10.1109/icnc.2008.871.
[26] T.K. Ho, Random decision forests, in: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1, IEEE, 1995, pp. 278–282.
[27] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: 2015 Military Communications and Information Systems Conference (MilCIS), https://doi.org/10.1109/milcis.2015.7348942.
[28] N. Moustafa, ToN-IoT datasets, 2019, https://doi.org/10.21227/fesz-dm97.
[29] I. Sharafaldin, A. Habibi Lashkari, A.A. Ghorbani, Toward generating a new intrusion detection dataset and intrusion traffic characterization, in: Proceedings of the 4th International Conference on Information Systems Security and Privacy, https://doi.org/10.5220/0006639801080116. URL: https://registry.opendata.aws/cse-cic-ids2018/.
[30] X. Li, W. Chen, Q. Zhang, L. Wu, Building auto-encoder intrusion detection system based on random forest feature selection, Comput. Secur. 95 (2020) 101851, https://doi.org/10.1016/j.cose.2020.101851.