
International Journal of Intelligent Networks 4 (2023) 202–210

Contents lists available at ScienceDirect

International Journal of Intelligent Networks


journal homepage: www.keaipublishing.com/en/journals/international-journal-of-intelligent-networks

Three-stage data generation algorithm for multiclass network intrusion detection with highly imbalanced dataset
Kwok Tai Chui a,*, Brij B. Gupta b,c,i,j,k,**, Priyanka Chaurasia d, Varsha Arya e,l, Ammar Almomani f,g, Wadee Alhalabi h

a Department of Electronic Engineering and Computer Science, School of Science and Technology, Hong Kong Metropolitan University, Hong Kong, China
b International Center for AI and Cyber Security Research and Innovations (CCRI), Taichung 413, Taiwan, China
c Department of Computer Science and Information Engineering, Asia University, Taichung 413, Taiwan, China
d School of Computing, Engineering and Intelligent Systems, Ulster University, UK
e Department of Business Administration, Asia University, Taiwan, China
f School of Information Technology, Skyline University College, Sharjah, P.O. Box 1797, United Arab Emirates
g Al-Balqa Applied University, Jordan
h Immersive Virtual Reality Research Group, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
i Symbiosis Centre for Information Technology (SCIT), Symbiosis International University, Pune, India
j Lebanese American University, Beirut, 1102, Lebanon
k Birkbeck, University of London, UK
l Chandigarh University, Chandigarh, India

Keywords: Convolutional neural network; Data generation; Generative adversarial network; Kernel function; Multiclass classification; Network intrusion detection; Support vector machine; Synthetic minority over-sampling technique

Abstract

The Internet plays a crucial role in our daily routines. Ensuring cybersecurity for Internet users provides a safe online environment. Automatic network intrusion detection (NID) using machine learning algorithms has recently received increased attention. The NID model is prone to bias towards the classes with more training samples because the datasets are highly imbalanced across different types of attacks. The challenge in generating additional training data for minority classes is that standalone methods generate insufficient data. This study addresses the challenge by extending the data generation ability with a three-stage data generation algorithm that combines the synthetic minority over-sampling technique, a generative adversarial network (GAN), and a variational autoencoder. A convolutional neural network is employed to extract the representative features from the data, which are fed into a support vector machine with a customised kernel function. An ablation study evaluated the effectiveness of the three-stage data generation, feature extraction, and customised kernel. This was followed by a performance comparison between our study and existing studies. The findings revealed that the proposed NID model achieved an accuracy of 91.9%–96.2% on the four benchmark datasets. In addition, it outperformed existing methods, such as a GAN-based deep neural network, a conditional Wasserstein GAN-based stacked autoencoder, a synthetic minority over-sampling technique-based random forest, and a variational autoencoder-based deep neural network, by 1.51%–28.4%.

1. Introduction

With the ever-growing size of computer networks, automatic network intrusion detection (NID) has become essential to provide instant warnings to users of potentially malicious behaviour, including malware, attacks, and intrusions. A global survey concluded that artificial intelligence was used (44%) for NID, among other applications [1]. Another survey [2] suggested that approximately 953,800 Web attacks were blocked daily in 2018, following a yearly growth rate of 60%. Cybersecurity has become a leading smart city vision that has driven the expansion of the global cybersecurity market size from 218 billion to 345 billion USD between 2021 and 2026 [3].

* Corresponding author.
** Corresponding author. International Center for AI and Cyber Security Research and Innovations (CCRI), Taichung 413, Taiwan, China.
E-mail addresses: [email protected] (K.T. Chui), [email protected] (B.B. Gupta), [email protected] (P. Chaurasia), varsha.arya@insights2techinfo.com (V. Arya), [email protected] (A. Almomani), [email protected] (W. Alhalabi).

https://doi.org/10.1016/j.ijin.2023.08.001
Received 27 April 2023; Received in revised form 3 August 2023; Accepted 3 August 2023
Available online 5 August 2023
2666-6030/© 2023 The Authors. Published by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

To achieve automatic NID, various machine learning (ML) algorithms have been proposed in the literature, where the historical and latest developments can be referred to in the review articles [4]. In addition, they summarised the benchmark datasets for the performance evaluation of the NID models. Due to the nature of various network attacks, data collection becomes biased, and majority and minority classes can be observed in many datasets. For example, NID bias towards the majority classes may become more severe with an increase in the imbalanced ratio, which is reflected in the deviation between the detection accuracies in each class.
To facilitate easier checking of acronyms and symbols, Table 1 summarises the complete list.

Table 1
Summary of the notations and acronyms.
BiLSTM: Bidirectional long short-term memory
CNN: Convolutional neural network
d: Dimensionality of minority class
D: Discriminator
DBN: Deep belief network
DNN: Deep neural network
e: Real number
f: Integer
F: Decoder
g: Real number
GAN: Generative adversarial network
h: Real number
k: Number of neighbours to be examined in k-NN algorithm
K: Kernel
Kcp1: Kernel using closure Property 1
Kcp2: Kernel using closure Property 2
Kcp3: Kernel using closure Property 3
Kcp4: Kernel using closure Property 4
Klinear(x,y): Linear kernel function
Kpoly(x,y): Polynomial kernel function
Krbf(x,y): Radial basis function kernel
Ksigm(x,y): Sigmoid kernel
l: Real number
Ladversarial: Adversarial loss
Lreconstruction: Reconstruction loss
N: Size of the majority class
NID: Network intrusion detection
Q: Generator
RBM: Restricted Boltzmann machine
RL: Reinforcement learning
S: Schur product of matrix
SMOTE: Synthetic minority over-sampling technique
SVM: Support vector machine
V: Non-zero eigenvectors
VAE: Variational autoencoder
xfake: Fake data
xi: Input data
xreal: Real data
yi: Latent state for the fake data
yj: Latent state for the real data
α: Parameter for the encoder
β: Parameter for the decoder
γ: Parameter for the discriminator
λ: Eigenvalues
σ: Real number

1.1. Related works

Only the NID models using data generation algorithms are included in the relevant related works. Various data generation algorithms have been used, including the generative adversarial network (GAN) [5–7], synthetic minority over-sampling technique (SMOTE) [8–10], and variational autoencoder (VAE) [11,12].
In [5], convolutional layers and imbalanced data filters were applied to modify a GAN to generate data for minority classes. The feature extraction was based on a feed-forward neural network, and an NID model was constructed using a deep neural network (DNN). Using the three benchmark datasets, the accuracies were 84.9%, 82.5%, and 99.8%, respectively. Another study [6] improved the conditional Wasserstein GAN to generate additional training data for minority classes. A stacked autoencoder extracted the features and served as the NID model. Performance evaluation and analysis were conducted using two benchmark datasets, with accuracies of 80.8% and 93.3%. In Ref. [7], a traditional GAN and an autoencoder were adopted for NID. An evaluation using a single benchmark dataset yielded an accuracy of 71.6%.
In addition, SMOTE can be used to generate additional effective training data for minority classes. For instance, a SMOTE-based random forest approach was proposed for NID [8]. The model evaluation using a benchmark dataset achieved an accuracy of 92.6%. In Ref. [9], SMOTE was applied to generate additional training data that were fed into a reinforcement learning (RL) algorithm for NID. Results revealed that the model yielded an accuracy of 82.1%. Another SMOTE-based NID model was proposed with a hybrid feature extraction using a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) [10].
Regarding the VAE-based NID models, VAE and DNN were employed in Ref. [11]. The analysis suggested that the proposed method outperformed eight typical classification models. The accuracies achieved were 80.3% and 93.1% using the two benchmark datasets, respectively. In Ref. [12], a log-hyperbolic cosine function was introduced to enhance the conditional VAE. CNN was applied to construct the NID model. Analysis using one benchmark dataset yielded an accuracy of 85.5%.

1.2. Limitations of related works

After reviewing the typical data generation algorithms in the literature (Subsection 1.1), several key limitations were observed that might limit the performance enhancement of NID models.

● Four benchmark datasets were selected for the performance evaluation of the NID models. To verify the model's robustness, considerations across different disciplines and the nature of the datasets should be broadened. An NID model should be designed to achieve accurate detection in various benchmark datasets to reflect the applicability and generalisation of the models.
● The NID models' performance accuracy using three benchmark datasets, NSL-KDD, UNSW-NB15, and KDD Cup 1999, was less than 86%, one reason being insufficient training data in minority classes (also reflected in the imbalanced ratios between the types of network intrusions [28–30]). This indicated that there was room to improve various aspects, including data generation, feature extraction, and classification algorithms.
● Single-split validation was adopted in all related studies. This might lead to insufficient performance evaluation and analysis because the models were more prone to suffer from overfitting, not all samples were used to test the models, and one might pick a group of training datasets that could achieve better performance. In addition, fine-tuning the hyperparameters of the NID model might be challenging.

1.3. Research contributions of our work

The rationale for the proposed design and formulations of the methodology (detailed in Section 2) aims to address the limitations of the related studies. This study's contributions are:

● A three-stage data generation algorithm is proposed to take advantage of GAN, SMOTE, and VAE, which generates additional training data compared to a standalone data generation algorithm.
● A CNN is designed to extract the representative features. A critical investigation is conducted on the customisation towards (relatively) small-scale minority classes. This is advantageous because a CNN, as a deep learning (DL) approach, usually requires sufficient training data to capture representative information.

● To resist overfitting, a support vector machine (SVM) is used in the multiclass classification phase. In addition, it reduces the influence of biased classification in a multiclass classification problem with highly imbalanced datasets. A customised kernel is designed to maximise the performance of the NID model as a tailor-made kernel for NID applications.
● We performed a more in-depth performance evaluation and analysis using five-fold cross-validation. This helps to investigate the issue of model overfitting and benefits the fine-tuning of the hyperparameters.

1.4. Organization of the paper

The rest of this paper is organised as follows. Section 2 provides the details of the design and formulation of the methodology. Section 3 presents a performance analysis and an ablation study of the proposed algorithm. Section 4 compares the performance of our study and related studies. Section 5 draws the conclusions and presents future research directions.

2. Design and formulations of methodology

In this section, the methodology is divided into four parts: (i) summary of the four benchmark datasets for the performance evaluation and analysis of the proposed method; (ii) generation of additional training data using the three-stage data generation algorithm; (iii) feature extraction using CNN; and (iv) NID using the customised kernel-based SVM.

2.1. Benchmark NID datasets

To ensure a valid performance comparison with related works, we selected four highly cited benchmark datasets based on citations in Google Scholar, namely, NSL-KDD (3554) [13], UNSW-NB15 (1334) [14], KDD Cup 1999 (212) [15], and CICIDS2017 (1505) [16], used in Refs. [5–12].

Table 2
Summary of the benchmark datasets: class label, sample size, and imbalanced ratio (in parentheses, relative to the Normal class).
NSL-KDD [13]: Normal 77,053 (N/A); DoS 53,387 (1.44:1); Probe 14,077 (5.47:1); R2L 3880 (19.9:1); U2R 119 (648:1)
UNSW-NB15 [14]: Normal 1,902,765 (N/A); Generic 215,481 (8.83:1); Exploits 44,525 (42.7:1); Fuzzing 22,246 (85.5:1); DoS 16,353 (116:1); Reconnaissance 13,987 (136:1); Analysis 2677 (711:1); Backdoor 2329 (817:1); Shell code 1511 (1259:1); Worms 174 (10,935:1)
KDD Cup 1999 [15]: Normal 873,407 (N/A); DoS 477,079 (1.83:1); Probe 18,026 (4.85:1); R2L 17,188 (50.8:1); U2R 280 (3199:1)
CICIDS2017 [16]: Normal 2,358,036 (N/A); DoS Hulk 231,073 (10.2:1); Port Scan 158,930 (14.8:1); DDoS [26,27] 41,835 (56.4:1); DoS GoldenEye 10,293 (229:1); FTP Patator 7938 (297:1); SSH Patator 5897 (400:1); DoS Slow Loris 5796 (407:1); DoS Slow HTTP Test 5499 (429:1); Botnet 1966 (1199:1); Web attack: Brute 1507 (1565:1); Web attack: XSS 625 (3773:1); Infiltration 36 (65,501:1); Web attack 21 (112,287:1); HeartBleed 11 (214,367:1)
Refs. [5–12]. into the majority classes in the subsequent NID phase. Regarding the
Table 2 summarises the class label, sample size (in descending GAN, the original design aimed to address imaging applications that led
order), and imbalanced ratio (in ascending order and with reference to to a technical foundation to generate data in minority classes that might
the first majority class; i.e., normal data) in each dataset. The key be scarce [18]. This was often relieved by introducing cross-validation.
highlights were: However, this only worked for highly imbalanced datasets. Although the
VAE demonstrated its superiority in some applications, it needed to be
● The number of classes in the benchmark datasets were 5, 10, 5, and revised to include the limitation that the generated data were noisy;
15 for NSL-KDD [13], UNSW-NB15 [14], KDD Cup 1999 [15], and thus, perfect reconstruction became challenging [19].
CICIDS2017, respectively. The challenge of building an accurate NID To resolve the limitations of standalone data generation algorithms
model is increased with the number of classes due to an increase in and take advantage of each algorithm, we proposed a three-stage data
model complexity. generation algorithm, namely SMOTE-GAN-VAE, to extend the ability to
● The ranges of sample sizes in the classes for each dataset were 119- generate more reliable training data.
77053, 174-1902,765, 280-873,407, and 11-2358,036, respectively. The SMOTE was applied in the first stage of the proposed data gen­
The fewer the number of samples, the more challenging it was to eration algorithm. It comprised two major ideas: (i) searching for the k-
build an accurate NID model because sufficient samples had to be nearest neighbours (k-NN) for each sample and (ii) applying interpola­
passed into the ML model to learn the domain knowledge. tion to generate samples. The model complexity was governed by
O(dN2 log10 k) with the dimensionality of the minority class d, size of the
● The ranges of the imbalanced ratio in the classes in each dataset were majority class N, and parameter k in the k-NN. Here is a summary of the
1.44–648:1, 8.83–10935:1, 1.83–3199:1, and 10.2–214,367:1, major steps in SMOTE.
respectively. The challenge of building an unbiased NID model to­
wards the majority classes increased with an imbalanced ratio. The Step 1: Iterate all minority samples for the k-NN;
three-stage data generation algorithm (Subsection B) had to allocate Step 2: Pick a random integer from one to k;
more resources to generate more training data in the minority classes Step 3: Find out the Euclidean distance between minority samples;
than in the majority classes. Step 4: Apply interpolation to generate samples for the minority class

2.2. Three-stage data generation: SMOTE-GAN-VAE After data generation with SMOTE, the updated dataset was passed
to the remaining two stages using the GAN and VAE. Fig. 1 shows the
Standalone data-generation algorithms, SMOTE, GAN, and VAE are high-level architecture of the SMOTE-GAN-VAE. Note that the compu­
limited to generating sufficient and representative training data in tational cost of the algorithm is increased with the aid of a three-stage
highly imbalanced datasets. SMOTE extracts local information from data generation process. However, once the model was trained, the
minority classes and generates training data in low diversity [17]. This data generation did not affect its practical usage. The testing time was
increases the probability that the generated data may be misclassified reasonable for the SVM classifier.
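A minimal sketch of Steps 1–4 above, assuming NumPy and a matrix X_min holding the feature vectors of one minority class; in practice a library implementation such as imbalanced-learn's SMOTE would normally be used, so this is an illustration only:

import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    # Generate n_new synthetic minority samples by k-NN interpolation (SMOTE).
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # Step 3: Euclidean distances between all pairs of minority samples
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # Step 1: the k nearest neighbours of each sample (column 0 is the sample itself)
    neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                    # a minority sample
        j = neighbours[i, rng.integers(k)]     # Step 2: one of its k neighbours
        gap = rng.random()                     # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))  # Step 4
    return np.vstack(synthetic)

# Toy usage: oversample a minority class of 20 samples with 8 features
X_min = np.random.default_rng(1).normal(size=(20, 8))
X_new = smote_oversample(X_min, n_new=100)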


Fig. 1. Architecture of SMOTE-GAN-VAE.

In GAN theory, the generator captures the data distribution from the output of SMOTE, and the discriminator calculates the probability of the data originating from the real data instead of the fake data, thus distinguishing between real and fake data. Both the generator and discriminator are conditional, enabling control of the generated data. To minimise the bias of the generated data, we applied constraint maximisation to diversity, where bias and diversity were inversely related [20]. The outputs (fake data) of the generator and the real data served as inputs for the encoder. Conceptually, the higher-dimensional data were transformed into lower-dimensional data after the encoding process was performed by the encoder. Therefore, the discriminator took advantage of a more effective determination of the actual class label in a lower-dimensional environment. Subsequently, the output of the discriminator was decoded by a decoder to form an accumulated dataset, which joined the generator to update the discriminator. Finally, the SMOTE-GAN-VAE data generation algorithm was completed.
The reconstruction loss Lreconstruction between the encoder and the decoder was given by (1):

Lreconstruction(α, β) = ‖xi − F(Q(xi, α), β)‖   (1)

with parameters for the encoder α and decoder β, input data xi, decoder F, and generator Q.
By combining this with the discriminator, the adversarial loss Ladversarial was defined as (2):

Ladversarial(α, β, γ) = E_{yi∼F(yi)}[log D(Q(xfake))] + E_{yj∼F(yj)}[log(1 − D(Q(xreal)))]   (2)

with a parameter for the discriminator γ, latent state for the fake data yi, latent state for the real data yj, discriminator D, fake data xfake, and real data xreal.
The objective function was thus defined in (3):

min_{α,β} max_{γ} (Lreconstruction + Ladversarial)   (3)

2.3. CNN for feature extraction

In related studies, various DL approaches have been effective in extracting representative features in different research applications. Because a CNN can automatically extract features from high-dimensional data and is flexible enough to be customised, it was selected for feature extraction in the NID model. Although a CNN might not perform well with insufficient samples, the impact was lowered by the SMOTE-GAN-VAE three-stage data generation.
Because CNN is a well-established algorithm, in-depth formulations are not repeated; recommended readings on the background of CNN are given in [21]. The CNN architecture comprised an input layer, multiple hidden layers, and an output layer. Within a hidden layer, the constituents were convolution, maximum pooling layers, and rectified linear units. Due to the increased complexity of feature extraction with a small sample size in minority classes of highly imbalanced NID datasets, it was expected that more hidden layers should be adopted to extract features from minority classes. Consequently, the number of hidden layers in each class of the NID model was not fixed. Therefore, a grid search approach was chosen as a common method to determine the optimal number of hidden layers [22], compared with computationally intensive genetic algorithm-based and particle swarm optimisation-based [23] hyperparameter tuning. The other hyperparameters also followed a grid search approach.

2.4. SVM with customized kernel for NID

SVM is a common ML algorithm for solving multiclass classification problems that takes advantage of (i) resistance to model overfitting: fine-tuning of regularisation and kernel parameters; (ii) kernel mapping, which maps the feature vectors into a higher-dimensional feature space; and (iii) kernel design: flexibility in designing a non-linear kernel. Typical kernel functions include polynomial, radial basis function, sigmoid, and linear kernels. Note that these kernels were designed for general applications, where the best performance was not guaranteed. Furthermore, to enhance the performance of the SVM model, a customised kernel was required for NID applications. In general, there were two directions for designing a customised kernel: (i) applying kernel properties to traditional kernels and (ii) using a new kernel function following Mercer's theorem. The former direction was chosen here because traditional kernels have been extensively reviewed and analysed in the literature, yielding good performance in general classification problems. In addition, they characterise the nature of the data distribution. By contrast, the second direction might require creating multiple constraints that force the kernel to obey Mercer's theorem. Note that the kernel should be positive semi-definite and symmetric, following (4):

KV = λV   (4)

for kernel K, non-zero eigenvectors V, and eigenvalues λ.
The following closure properties merge the traditional polynomial, radial basis function, sigmoid, and linear kernels. Let K1 and K2 be kernels on Y × Y (over Rⁿ × Rⁿ), c ∈ R⁺, and f(•) a real-valued function on Y. Four closure properties are defined in (5)–(9). Assume a finite sequence {x1, …, xL} and any vector φ ∈ R^L. Matrix K is positive semi-definite when

φ′Kφ ≥ 0, ∀φ   (5)

Closure Property 1: Kcp1 = K1 + K2.

φ′(K1 + K2)φ = φ′K1φ + φ′K2φ ≥ 0   (6)

Closure Property 2: Kcp2 = cK1.

φ′(cK1)φ = cφ′K1φ ≥ 0   (7)

Closure Property 3: Kcp3 = K1 • K2.

The kernel is identical to the Schur product S of the matrices K1 and K2. S is the principal submatrix of K1 • K2, which is a set of rows and columns. As a result:
φ′(S)φ = φ′₁(K1 • K2)φ₁ ≥ 0   (8)

for φ ∈ R^L with a corresponding φ₁ ∈ R^(L²).
Closure Property 4: Kcp4 = f(x) • f(y).
Consider a 1D feature mapping:

∅: x → f(x) ∈ R   (9)

The selected four kernel functions are defined in (10)–(13):

Kpoly(x, y) = (l × xᵀy + e)^f   (10)

Krbf(x, y) = e^(−‖x−y‖²/σ²)   (11)

Ksigm(x, y) = tanh(g xᵀy + h)   (12)

Klinear(x, y) = xᵀy   (13)

with real numbers l, e, g, σ, and h, and integer f.
Various combinations were formed and analysed using the four closure properties and the four kernel functions. One-against-one multiclass SVM is chosen instead of one-against-all SVM [24]. Fig. 2 shows the conceptual diagrams of these types of SVM.

Fig. 2. Conceptual diagrams of (a) one-against-one SVM and (b) one-against-all SVM.

Table 3 summarises the challenges of the selected algorithms and our designs (solutions).

3. Performance analysis and ablation study

This section begins with the performance analysis of the proposed NID method. To study the effectiveness of the proposed three-stage data generation algorithm SMOTE-GAN-VAE, the feature extraction using CNN, and the NID using SVM with the customised kernel, ablation studies were conducted as described in three subsections.

3.1. Performance analysis of the proposed method

To provide a more reliable analysis, k-fold cross-validation with k = 5 was chosen. To the best of our knowledge, k = 5 and k = 10 [25] were the two common values adopted in the literature. Because some benchmark datasets had minority classes that originally comprised tens to hundreds of samples, k = 5 was selected.
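A sketch of this five-fold protocol, assuming scikit-learn and placeholder arrays X (features) and y (labels); StratifiedKFold preserves the class proportions of the imbalanced labels in every fold:

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X = np.random.default_rng(0).normal(size=(500, 20))    # placeholder features
y = np.random.default_rng(1).integers(0, 5, size=500)  # placeholder 5-class labels

scores = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    # Data generation and feature extraction would be fitted on the training fold only.
    clf = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(f"5-fold accuracy: {np.mean(scores):.3f} ± {np.std(scores):.3f}")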


Table 3
Challenges of selected algorithms and our designs.

SMOTE
Challenge: The SMOTE extracts local information of minority classes and generates training data of low diversity. This increases the chance that generated data might be misclassified into majority classes in the later NID phase.
Our design: Searching for the k-NN for each sample and applying interpolation to generate samples.

GAN
Challenge: The original design aims at tackling imaging applications, which leads to a technical foundation that generates data in minority classes that might be scarce. It is often relieved by introducing cross-validation. However, it only works well in highly imbalanced datasets.
Our design: To minimise the bias of the generated data, we have adopted the idea of applying constraint maximisation to the diversity, where bias and diversity are inversely related to each other.

VAE
Challenge: It carries the limitation that the generated data is noisy; thus, perfect reconstruction becomes challenging.
Our design: The higher-dimensional data is transformed into lower-dimensional data after the encoding process is managed by an encoder. Therefore, the discriminator takes advantage of a more effective determination of the actual class label in a lower-dimensional environment.

CNN
Challenge: Due to the increased complexity of feature extraction with a small sample size in the minority classes of the highly imbalanced NID datasets, it is expected that more hidden layers should be adopted to extract features in minority classes.
Our design: The number of hidden layers in each class of the NID model is not fixed. Therefore, a grid search approach is chosen as a common way to determine the optimal number of hidden layers.

SVM
Challenge: Typical kernel functions include polynomial, radial basis function, sigmoid, and linear kernels. These kernels are designed for general applications where the best performance is not guaranteed.
Our design: Applying kernel properties to traditional kernels. Traditional kernels are extensively reviewed and analysed in the literature and yielded good performance in general classification problems. In addition, they are able to characterise the nature of the data distribution.

Table 4
Accuracy (%) of the proposed method per class.
NSL-KDD [13]: Normal 92.4; DoS 91.7; Probe 90.5; R2L 90.6; U2R 89.2
UNSW-NB15 [14]: Normal 96.3; Generic 95.9; Exploits 95.3; Fuzzing 95.2; DoS 94.7; Reconnaissance 94.4; Analysis 93.8; Backdoor 93.5; Shell code 93.0; Worms 91.8
KDD Cup 1999 [15]: Normal 94.2; DoS 93.8; Probe 93.5; R2L 92.6; U2R 91.4
CICIDS2017 [16]: Normal 94.6; DoS Hulk 94.0; Port Scan 93.7; DDoS 93.5; DoS GoldenEye 92.8; FTP Patator 92.5; SSH Patator 92.1; DoS Slow Loris 92.0; DoS Slow HTTP Test 91.7; Botnet 91.3; Web attack: Brute 90.9; Web attack: XSS 90.6; Infiltration 88.5; Web attack 88.3; HeartBleed 87.9

Table 5
Specificity and sensitivity of the proposed method.
Dataset | Specificity (%) | Sensitivity (%)
NSL-KDD [13] | 92.4 | 91.4
UNSW-NB15 [14] | 96.3 | 95.6
KDD Cup 1999 [15] | 94.2 | 93.7
CICIDS2017 [16] | 94.6 | 93.7
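The per-class accuracies in Table 4 and the specificity and sensitivity values in Table 5 can be derived from a multiclass confusion matrix via a one-vs-rest reduction. A minimal sketch, assuming scikit-learn and illustrative arrays of true and predicted labels:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array(["Normal", "DoS", "Probe", "DoS", "Normal", "Probe"])
y_pred = np.array(["Normal", "DoS", "DoS", "DoS", "Normal", "Probe"])

labels = np.unique(y_true)
cm = confusion_matrix(y_true, y_pred, labels=labels)

for i, cls in enumerate(labels):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp
    fp = cm[:, i].sum() - tp
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)   # recall for this class
    specificity = tn / (tn + fp)   # true negative rate for this class
    print(f"{cls}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")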
Table 4 summarises the performance of the proposed method. Several key observations are drawn based on Table 4.

● The ranges of accuracies were NSL-KDD: 89.2%–92.4%; UNSW-NB15: 91.8%–96.3%; KDD Cup 1999: 91.4%–94.2%; and CICIDS2017: 87.9%–94.6%. The proposed method generated a significant portion of the training data in minority classes, which reduced the impact of biased classification toward the majority classes.
● The deviations of the min-max accuracy in each dataset were as follows: NSL-KDD, 3.59%; UNSW-NB15, 4.90%; KDD Cup 1999, 3.06%; and CICIDS2017, 7.62%. The deviations became more severe with an increase in the number of classes in the multiclass NID problem and the level of imbalanced ratios.

Table 5 lists the specificities and sensitivities of the proposed method for each benchmark dataset. However, the specificity and sensitivity were not identical when reducing the NID to a binary classification.

3.2. Comparison to other data generation algorithms

To reveal the effectiveness of the proposed three-stage data generation algorithm, SMOTE-GAN-VAE, it was compared with the hybrid algorithms SMOTE-GAN, SMOTE-VAE, and GAN-VAE and with the standalone algorithms SMOTE, GAN, and VAE. Table 6 summarises the accuracies of the seven approaches. In this analysis, changes were made only to the data generation algorithms. The key observations were:

● The proposed three-stage data generation algorithm outperformed the hybrid algorithms, whereas the hybrid algorithms outperformed the standalone algorithms. This was because the architecture took advantage of each data generation algorithm. As a trade-off, the complexity of the algorithm increased with the adoption of more algorithms.
● The ranking of the seven algorithms in descending order of accuracy was: SMOTE-GAN-VAE (proposed), SMOTE-VAE, SMOTE-GAN, GAN-VAE, VAE, SMOTE, and GAN.
● The percentage improvements by the proposed method across the accuracies in each class compared with the hybrid algorithms were 4.39% (GAN-VAE), 3.54% (SMOTE-GAN), and 2.63% (SMOTE-VAE).
● The percentage improvements by the proposed method across the accuracies in each class compared with the standalone algorithms were 7.82% (GAN), 6.76% (SMOTE), and 6.15% (VAE).
● The min-max deviations across the four benchmark datasets in the seven approaches were ranked in ascending order: 4.79% (proposed), 5.71% (SMOTE-VAE), 6.05% (GAN-VAE), 6.13% (SMOTE-GAN), 6.18% (VAE), 6.29% (GAN), and 6.32% (SMOTE).


Table 6
Comparison between three-stage, hybrid, and standalone data generation algorithms. Accuracy (%) per class.
Dataset Class label Proposed (three-stage) SMOTE-GAN SMOTE-VAE GAN-VAE SMOTE GAN VAE

NSL-KDD [13] Normal 92.4 90.3 90.7 89.6 87.9 86.8 88.6
DoS 91.7 89.3 89.8 88.4 86.7 85.5 87.2
Probe 90.5 88.0 88.9 87.1 85.5 84.7 85.8
R2L 90.6 87.7 88.5 86.6 85.0 84.0 85.4
U2R 89.2 86.2 87.1 85.7 83.9 82.8 84.3
UNSW-NB15 [14] Normal 96.3 93.4 94.0 92.3 90.0 89.1 90.6
Generic 95.9 92.5 93.3 91.3 89.0 88.2 89.8
Exploits 95.3 91.7 92.5 90.6 88.2 87.6 89.0
Fuzzing 95.2 91.5 92.4 90.1 87.6 86.8 88.4
DoS 94.7 90.9 91.7 89.5 87.0 86.3 87.7
Reconnaissance 94.4 90.3 91.3 89.0 86.4 85.4 87.1
Analysis 93.8 89.4 90.6 88.6 85.7 84.5 86.5
Backdoor 93.5 89.0 90.2 88.1 85.1 83.7 85.9
Shell code 93.0 88.5 89.6 87.4 84.5 83.2 85.4
Worms 91.8 87.2 87.8 86.0 83.3 82.2 84.1
KDD Cup 1999 [15] Normal 94.2 91.7 92.4 91.3 89.6 88.8 90.2
DoS 93.8 91.4 91.7 90.8 89.1 88.0 89.5
Probe 93.5 90.7 91.3 90.3 88.5 87.3 89.2
R2L 92.6 89.9 90.6 89.6 87.9 86.6 88.4
U2R 91.4 89.0 89.5 88.7 86.8 85.9 87.7
CICIDS2017 [16] Normal 94.6 92.4 93.0 91.9 89.9 88.9 90.4
DoS Hulk 94.0 91.7 92.2 91.2 89.3 88.4 89.7
Port Scan 93.7 91.2 91.4 90.6 88.8 88.0 89.0
DDoS 93.5 90.8 91.3 90.3 88.4 87.6 88.6
DoS GoldenEye 92.8 90.0 90.8 89.7 87.8 87.3 88.0
FTP Patator 92.5 89.5 90.4 89.4 87.6 86.8 87.6
SSH Patator 92.1 89.0 89.7 88.5 87.2 86.5 87.3
DoS Slow Loris 92.0 88.8 89.4 88.1 86.7 86.0 86.8
DoS Slow HTTP Test 91.7 88.3 89.0 87.6 86.1 85.7 86.4
Botnet 91.3 88.0 88.8 87.3 85.8 85.2 85.7
Web attack: Brute 90.9 87.6 88.5 86.7 85.2 84.6 85.5
Web attack: XSS 90.6 87.1 88.1 86.3 84.7 84.2 85.1
Infiltration 88.5 85.5 86.7 84.9 83.2 82.9 83.8
Web attack 88.3 85.0 86.4 84.7 82.8 82.3 83.4
HeartBleed 87.9 84.3 85.8 84.0 82.3 81.9 82.9
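The improvement percentages quoted in Section 3.2 can be approximated from the rows of Table 6, assuming they are mean relative improvements across class labels (the averaging convention is not spelled out in the text). A small check on the NSL-KDD rows against GAN-VAE:

import numpy as np

# NSL-KDD rows of Table 6: [proposed, GAN-VAE] accuracies in %.
proposed = np.array([92.4, 91.7, 90.5, 90.6, 89.2])
gan_vae = np.array([89.6, 88.4, 87.1, 86.6, 85.7])

# Relative improvement of the proposed algorithm over GAN-VAE, averaged over classes.
improvement = np.mean((proposed - gan_vae) / gan_vae) * 100
print(f"mean improvement vs GAN-VAE (NSL-KDD rows only): {improvement:.2f}%")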

3.3. Comparison to other DL-based feature extraction approaches

To evaluate the DL-based algorithm for feature extraction in NID, we compared the CNN with a restricted Boltzmann machine (RBM) and a deep belief network (DBN). Because feature extraction was not this study's main focus, a comparison was made based on the overall accuracy of each benchmark dataset (Table 7).
CNN-based NID yielded higher accuracy than RBM and DBN. The percentage improvements in accuracy across all datasets were 2.90% and 3.53% compared with RBM and DBN, respectively. This could be related to the characteristics of the RBM, which worked well with missing data, whereas those of the DBN were good for 1D data.

Table 7
Comparison between CNN, RBM, and DBN. Overall accuracy (%).
Dataset | CNN | RBM | DBN
NSL-KDD [13] | 91.9 | 88.6 | 87.9
UNSW-NB15 [14] | 96.2 | 93.1 | 92.7
KDD Cup 1999 [15] | 94.0 | 92.0 | 91.4
CICIDS2017 [16] | 94.5 | 92.3 | 91.8

3.4. Comparison to other kernel functions

The analysis was performed between the customised kernel and the standalone kernels Kpoly(x, y), Krbf(x, y), Ksigm(x, y), and Klinear(x, y). Table 8 summarises the comparison of the five kernels in each class label of the benchmark datasets. The key points were:

● The accuracy ranks in descending order were the customised kernel (proposed), Krbf(x,y), Kpoly(x,y), Ksigm(x,y), and Klinear(x,y). Merging the kernels with the closure properties took advantage of each kernel to characterise the data distribution.
● The average percentage improvements across all class labels in the benchmark datasets were 3.03% (Krbf(x,y)), 3.99% (Kpoly(x,y)), 5.29% (Ksigm(x,y)), and 8.10% (Klinear(x,y)).

3.5. Summary of key results

The results revealed that the proposed SMOTE-GAN-VAE improved accuracy by 2.63%–4.39% and 6.15%–7.82% compared with the hybrid and standalone algorithms, respectively. Feature extraction via CNN enhanced the accuracy by 2.90% and 3.53% compared with RBM and DBN, respectively. The SVM with the customised kernel improved the accuracy by 3.03%–8.10% compared with traditional polynomial, radial basis function, sigmoid, and linear kernels.

4. Performance comparison with related works

A performance comparison was conducted with related studies [5–12]. Six aspects are summarised in Table 9: the datasets, data generation algorithms, feature extraction, ML algorithms, cross-validation, and accuracy. The following discussion is made based on a comparison of each aspect.


Table 8
Comparison between customized kernel functions. Accuracy (%) per class.
Dataset Class label Customized Kpoly(x,y) Krbf(x,y) Ksigm(x,y) Klinear(x,y)

NSL-KDD [13] Normal 92.4 89.2 89.7 87.9 84.2


DoS 91.7 88.4 88.8 87.1 83.3
Probe 90.5 86.8 87.3 86.2 82.5
R2L 90.6 86.5 87.0 85.7 81.9
U2R 89.2 85.6 86.2 84.9 81.3
UNSW-NB15 [14] Normal 96.3 93.5 93.9 92.3 89.7
Generic 95.9 92.8 93.3 91.4 89.0
Exploits 95.3 92.4 92.8 90.7 88.4
Fuzzing 95.2 92.0 92.5 90.2 87.7
DoS 94.7 91.3 92.0 89.8 87.5
Reconnaissance 94.4 91.1 91.6 89.3 87.0
Analysis 93.8 90.7 91.1 88.9 86.6
Backdoor 93.5 90.1 90.6 88.4 86.2
Shell code 93.0 89.7 89.9 88.0 85.9
Worms 91.8 88.2 88.5 86.5 84.7
KDD Cup 1999 [15] Normal 94.2 91.6 92.2 90.7 88.5
DoS 93.8 90.8 91.6 89.6 87.3
Probe 93.5 90.4 91.3 89.1 87.0
R2L 92.6 89.6 90.6 88.4 86.5
U2R 91.4 88.5 89.7 87.2 85.6
CICIDS2017 [16] Normal 94.6 91.6 92.2 90.5 88.6
DoS Hulk 94.0 90.7 91.4 89.9 87.8
Port Scan 93.7 90.2 91.1 89.5 87.4
DDoS 93.5 89.8 90.9 89.1 87.0
DoS GoldenEye 92.8 89.2 90.3 88.6 86.4
FTP Patator 92.5 88.8 89.9 88.1 85.8
SSH Patator 92.1 88.2 89.5 87.5 85.5
DoS Slow Loris 92.0 88.0 89.2 87.1 85.0
DoS Slow HTTP Test 91.7 87.6 88.8 86.7 84.8
Botnet 91.3 87.1 88.4 86.3 84.3
Web attack: Brute 90.9 86.6 88.0 85.8 84.0
Web attack: XSS 90.6 86.2 87.5 85.2 83.6
Infiltration 88.5 84.1 85.5 83.4 81.9
Web attack 88.3 83.7 85.1 83.0 81.7
HeartBleed 87.9 83.5 84.8 82.8 81.4
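The closure properties in (6)–(8) allow the standard kernels compared in Table 8 to be merged into a customised kernel and passed to an SVM as a callable. A sketch assuming scikit-learn; the particular combination and weights below are illustrative only, not the exact kernel used in the proposed model:

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def custom_kernel(X, Y):
    # Closure Property 1 (sum) and Property 2 (positive scaling) keep the
    # result a valid positive semi-definite kernel.
    k_rbf = rbf_kernel(X, Y, gamma=0.5)
    k_poly = polynomial_kernel(X, Y, degree=2, coef0=1.0)
    return 0.7 * k_rbf + 0.3 * k_poly

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 10)), rng.integers(0, 3, size=200)

# SVC accepts a callable kernel; multiclass is handled one-against-one internally.
clf = SVC(kernel=custom_kernel).fit(X, y)
print(clf.predict(X[:5]))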

Table 9
Performance comparison between proposed work and related works.
Work Datasets Data generation algorithm Feature extraction Machine learning algorithms Cross-validation Accuracy (%)

[5] NSL-KDD GAN Feed-forward Neural Network Deep neural network Single split 84.9 (5-class)
UNSW-NB15 82.5 (10-class)
CICIDS2017 99.8 (6-class)
[6] NSL-KDD Conditional Wasserstein GAN Stacked autoencoder Single split 80.8 (5-class)
UNSW-NB15 93.3 (10-class)
[7] NSL-KDD GAN Autoencoder Single split 71.6 (5-class)
[8] KDD Cup 1999 SMOTE Random forest Single split 92.6 (5-class)
[9] NSL-KDD SMOTE Q-learning Reinforcement learning Single split 82.1 (5-class)
[10] NSL-KDD SMOTE CNN Bi-LSTM Single split 83.6 (5-class)
UNSW-NB15 77.2 (10-class)
[11] NSL-KDD VAE DNN Single split 80.3 (5-class)
UNSW-NB15 93.1 (10-class)
[12] NSL-KDD VAE with log hyperbolic cosine function CNN Single split 85.5 (5-class)
Proposed NSL-KDD Three-stage SMOTE-GAN-VAE CNN SVM with customized kernel 5-fold 91.9 (5-class)
UNSW-NB15 96.2 (10-class)
KDD Cup 1999 94.0 (5-class)
CICIDS2017 94.5 (15-class)
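The improvement figures quoted in the Accuracy item below follow directly from Table 9 as relative accuracy gains; a small check for the NSL-KDD results (this reproduces, for example, 8.24% for Ref. [5] and roughly 28.4% for Ref. [7]):

# Accuracy values taken from Table 9 (NSL-KDD, 5-class).
proposed = 91.9
related = {"[5]": 84.9, "[6]": 80.8, "[7]": 71.6, "[9]": 82.1,
           "[10]": 83.6, "[11]": 80.3, "[12]": 85.5}

for work, acc in related.items():
    improvement = (proposed - acc) / acc * 100
    print(f"{work}: {improvement:.2f}% improvement")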

● Datasets: Analysing the NID model with benchmark datasets was important to confirm its effectiveness and generalisability. The proposed NID model was evaluated using four benchmark datasets: NSL-KDD, UNSW-NB15, KDD Cup 1999, and CICIDS2017. Four works [7–9,12] used one dataset, three studies [6,10,11] used two datasets, and one study [5] used three datasets.
● Data generation algorithm: A one-stage data generation algorithm was used in related studies: GAN in Refs. [5–7], SMOTE in Refs. [8–10], and VAE in Refs. [11,12]. Furthermore, a three-stage SMOTE-GAN-VAE was proposed to enhance the quality of the additional training data.
● Feature extraction: Both shallow [5–9] and DL approaches [10–12] have been used in the literature. DL approaches might become more effective in discovering hidden knowledge if solid domain knowledge was not fully understood based on shallow learning approaches.
● ML algorithms: Some existing studies [6–8,11,12] have considered feature extraction and ML algorithms for the NID model as an embedded formulation. This might reduce the flexibility in hyperparameter tuning and customisation of the model for the desired applications.
● Cross-validation: As mentioned in Section 1, there were key limitations to single-split validation: it might not obtain the best performance for the NID model, and it was not fully evaluated with all available samples in the benchmark datasets. Our work adopted five-fold cross-validation, which was a common choice. Notably, the extreme scenario of k-fold cross-validation became leave-one-subject-out cross-validation, which required an N-fold with N as the total number of samples in the datasets; this was not feasible in terms of computational power and training time.


● Accuracy: The discussion was based on each dataset, with the percentage improvement by the proposed method being (i) NSL-KDD: 8.24% [5], 13.7% [6], 28.4% [7], 11.9% [9], 9.93% [10], 14.4% [11], and 7.49% [12]; (ii) UNSW-NB15: 16.6% [5], 3.11% [6], 24.6% [10], and 3.33% [11]; (iii) KDD Cup 1999: 1.51% [8]; and (iv) CICIDS2017: not comparable, because the work in Ref. [5] regrouped the 14 types of attacks into five classes. It was expected that with the original class labels (15 classes), the challenge of the NID model would increase, thus decreasing the accuracy. To conclude, a performance comparison with related works on four benchmark datasets suggested an accuracy improvement of 7.49%–28.4% (NSL-KDD), 3.11%–24.6% (UNSW-NB15), and 1.51% (KDD Cup 1999), and was incomparable but worth investigation (CICIDS2017).

5. Conclusion

The issue of cybersecurity is a crucial factor in the development and sustainability of smart cities. With our reliance on the Internet in our daily activities, the risk of network intrusion is a constant threat. However, the current benchmark datasets for detecting network attacks are highly imbalanced, leading to biased results. To address this challenge, we proposed a three-stage data generation algorithm that leverages the synthetic minority over-sampling technique, generative adversarial network, and variational autoencoder to generate high-quality data and reduce the impact of imbalanced ratios in minority classes. While our study is a step in the right direction, there is still room for improvement. We suggest exploring transfer learning from identical and non-identical domains, merging heterogeneous datasets, and enhancing the design with variants of data generation algorithms to provide more robust training data. By continuing to refine and improve our models, we can better ensure the safety and security of our smart cities.

Funding

The work described in this paper was fully supported by a grant from Hong Kong Metropolitan University (RIF/2021/05).

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Statista Research Department, Artificial Intelligence Application Areas in Organizations Worldwide 2018, Mar. 13, 2022.
[2] J. Johnson, Number of Web Attacks Blocked Daily Worldwide 2015-2018, Jan. 25, 2021.
[3] J.A. Sava, Cybersecurity Market Revenues Worldwide 2021-2026, Feb. 14, 2022.
[4] Z. Ling, Z.J. Hao, Intrusion detection using normalized mutual information feature selection and parallel quantum genetic algorithm, Int. J. Semantic Web Inf. Syst. 18 (1) (2022) 1–24.
[5] S. Huang, K. Lei, IGAN-IDS: an imbalanced generative adversarial network towards intrusion detection system in ad-hoc networks, Ad Hoc Netw. 105 (2020) 102177.
[6] G. Zhang, X. Wang, R. Li, Y. Song, J. He, J. Lai, Network intrusion detection based on conditional Wasserstein generative adversarial network and cost-sensitive stacked autoencoder, IEEE Access 8 (2020) 190431–190447.
[7] P.F. de Araujo-Filho, G. Kaddoum, D.R. Campelo, A.G. Santos, D. Macêdo, C. Zanchettin, Intrusion detection for cyber-physical systems using generative adversarial networks in fog environment, IEEE Internet Things J. 8 (8) (2021) 6247–6256.
[8] X. Tan, S. Su, Z. Huang, X. Guo, Z. Zuo, X. Sun, L. Li, Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm, Sensors 19 (1) (2019) 203.
[9] X. Ma, W. Shi, AESMOTE: adversarial reinforcement learning with SMOTE for anomaly detection, IEEE Trans. Netw. Sci. Eng. 8 (2) (2021) 943–956.
[10] K. Jiang, W. Wang, A. Wang, H. Wu, Network intrusion detection combined hybrid sampling with deep hierarchical network, IEEE Access 8 (2020) 32464–32476.
[11] Y. Yang, K. Zheng, B. Wu, Y. Yang, X. Wang, Network intrusion detection based on supervised adversarial variational auto-encoder with regularization, IEEE Access 8 (2020) 42169–42184.
[12] X. Xu, J. Li, Y. Yang, F. Shen, Toward effective intrusion detection using log-cosh conditional variational autoencoder, IEEE Internet Things J. 8 (8) (2021) 6187–6196.
[13] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A detailed analysis of the KDD CUP 99 data set, in: IEEE Symp. Comput. Intell. Secur. Defense App., Ottawa, ON, Canada, 2009, pp. 1–6.
[14] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: Military Comm. Inf. Syst. Conf. (MilCIS), Canberra, ACT, Australia, 2015, pp. 1–6.
[15] S.D. Bay, D. Kibler, M.J. Pazzani, P. Smyth, The UCI KDD archive of large data sets for data mining research and experimentation, ACM SIGKDD Explor. Newsl. 2 (2) (2000) 81–85.
[16] I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, Toward generating a new intrusion detection dataset and intrusion traffic characterization, in: Proceed. 4th Int. Conf. Inf. Syst. Secur. Privacy, Portugal, 2018, pp. 108–116.
[17] J. Sun, J. Lang, H. Fujita, H. Li, Imbalanced enterprise credit evaluation with DTE-SBD: decision tree ensemble based on SMOTE and bagging with differentiated sampling rates, Inf. Sci. 425 (2018) 76–91.
[18] Z. Zhong, J. Li, D.A. Clausi, A. Wong, Generative adversarial networks and conditional random fields for hyperspectral image classification, IEEE Trans. Cybern. 50 (7) (2020) 3318–3329.
[19] Z. Islam, M. Abdel-Aty, Q. Cai, J. Yuan, Crash data augmentation using variational autoencoder, Accid. Anal. Prev. 151 (2021) 105950.
[20] S. Suh, H. Lee, P. Lukowicz, Y.O. Lee, CEGAN: classification enhancement generative adversarial networks for unraveling data imbalance problems, Neural Network. 133 (2021) 69–86.
[21] [a] D. Li, L. Deng, B.B. Gupta, H. Wang, C. Choi, A novel CNN based security guaranteed image watermarking generation scenario for smart city applications, Inf. Sci. 479 (2019) 432–447; [b] A.K. Mandle, S.P. Sahu, G.P. Gupta, CNN-based deep learning technique for the brain tumor identification and classification in MRI images, Int. J. Software Sci. Comput. Intell. 14 (1) (2022) 1–20.
[22] I. Priyadarshini, C. Cotton, A novel LSTM–CNN–grid search-based deep neural network for sentiment analysis, J. Supercomput. 77 (12) 13911–13932.
[23] N.M. Aszemi, P.D.D. Dominic, Hyperparameter optimization in convolutional neural network using genetic algorithms, Int. J. Adv. Comput. Sci. Appl. 10 (6) (2019) 269–278.
[24] C.W. Hsu, C.J. Lin, A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Network. 13 (2) (2002) 415–425.
[25] T. Fushiki, Estimation of prediction error by using K-fold cross-validation, Stat. Comput. 21 (2011) 137–146.
[26] B.B. Gupta, R.C. Joshi, M. Misra, Defending against distributed denial of service attacks: issues and challenges, Inf. Secur. J. A Glob. Perspect. 18 (5) (2009) 224–247.
[27] I. Cvitić, D. Perakovic, et al., Boosting-based DDoS detection in internet of things systems, IEEE Internet Things J. 9 (3) (2021) 2109–2123.
[28] R. Sharma, T.P. Sharma, A.K. Sharma, Detecting and preventing misbehaving intruders in the internet of vehicles, Int. J. Cloud Appl. Comput. 12 (1) (2022) 1–21.
[29] Z. Ling, Z.J. Hao, Intrusion detection using normalized mutual information feature selection and parallel quantum genetic algorithm, Int. J. Semantic Web Inf. Syst. 18 (1) (2022) 1–24.
[30] S. Li, D. Qin, X. Wu, J. Li, B. Li, W. Han, False alert detection based on deep learning and machine learning, Int. J. Semantic Web Inf. Syst. 18 (1) (2022) 1–21.
