Article
A Deep Learning Model for Network Intrusion Detection with
Imbalanced Data
Yanfang Fu 1 , Yishuai Du 1 , Zijian Cao 1 , Qiang Li 1 and Wei Xiang 2,3, *
1 School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China;
[email protected] (Y.F.); [email protected] (Y.D.); [email protected] (Z.C.);
[email protected] (Q.L.)
2 School of Computing, Engineering and Mathematical Sciences, La Trobe University,
Melbourne, VIC 3086, Australia
3 College of Science and Engineering, James Cook University, Cairns, QLD 4878, Australia
* Correspondence: [email protected]
Abstract: With an increase in the number and types of network attacks, traditional firewalls and data encryption methods can no longer meet the needs of current network security. As a result, intrusion detection systems have been proposed to deal with network threats. The current mainstream intrusion detection algorithms are aided by machine learning but suffer from low detection rates and the need for extensive feature engineering. To address the issue of low detection accuracy, this paper proposes a model for traffic anomaly detection named the deep learning model for network intrusion detection (DLNID), which combines an attention mechanism and the bidirectional long short-term memory (Bi-LSTM) network, first extracting sequence features of data traffic through a convolutional neural network (CNN), then reassigning the weights of each channel through the attention mechanism, and finally using Bi-LSTM to learn the sequence features. Public intrusion detection datasets generally suffer from severe class imbalance. To address this issue, this paper employs adaptive synthetic sampling (ADASYN) to expand the minority class samples, eventually forming a relatively balanced dataset, and uses a modified stacked autoencoder for data dimensionality reduction with the objective of enhancing information fusion. DLNID is an end-to-end model, so it does not need to undergo a process of manual feature extraction. After being tested on NSL-KDD, a public benchmark dataset for network intrusion detection, experimental results show that the accuracy and F1 score of this model are better than those of other comparison methods, reaching 90.73% and 89.65%, respectively.
Keywords: intrusion detection; Bi-LSTM; attention mechanism; NSL-KDD
activities. One is signature-based detection, similar to antivirus software that requires com-
parison with previously collected attack features, while the other is anomaly-based detection,
which requires comparison with normal traffic to make a judgment. In the KDD99 dataset,
Stolfo et al. classified network attacks into four categories—namely, the denial-of-service
attack (DoS), user-to-root attack (U2R), remote-to-local attack (R2L), and probe attack [5].
Nowadays, there are many researchers who advocate the combination of intrusion
detection and machine learning (ML) technologies for the detection of network attacks
by creating effective models. The authors in [6] propose the use of naive Bayes for the
identification of anomalous networks and compare it with decision trees (another clas-
sical machine learning algorithm). The authors in [7] combine support vector machine
(SVM) and the genetic algorithm to optimize the selection, parameters, and weights of
SVM features, thus improving the accuracy of network attack identification. The authors
in [8] improve the detection by constructing a multi-level random forest model to detect
network anomalous behavior. The authors in [9] improve the existing k-nearest neighbor (KNN) classifier by combining k-means clustering with the KNN classifier, improving the accuracy of detection. The authors in [10] propose a novel intrusion detection
method that first decomposes the network data into smaller subsets by a C4.5 decision
tree algorithm and then creates multiple SVM models for the subsets, which reduces the
time complexity and improves the detection rate of unknown attacks. However, traditional
machine learning methods usually emphasize feature engineering, which consumes con-
siderable computational resources and usually only learns shallow features, leading to
less satisfactory detection results. Many scholars have turned their attention to the current
trend of deep learning, hoping to import network traffic data directly into the model to
skip the feature selection step. In one study [11], the authors propose a model structure
based on deep belief networks (DBNs) and probabilistic neural networks (PNNs) to reduce
the dimensionality of the data using deep belief networks and then classify the data using a
probabilistic neural network, which is superior to the traditional PNNs. The authors in [12]
propose a convolutional neural network-based detection method by processing traffic data
into image form, saving the process of designing features manually. In another study [13],
the authors use recurrent neural networks (RNNs) for botnet anomaly detection, exploiting the effectiveness of RNNs on temporal features to further improve the accuracy of classification.
Table 1 summarizes the relevant research.
However, there is a problem of uneven distribution in network traffic data, and none
of the above networks exploits the correlation between traffic features. In this paper, a
DLNID model is proposed to solve the above remaining problems, using adaptive synthetic
sampling (ADASYN) for data augmentation of unbalanced samples and a modified stacked
autoencoder for data dimensionality reduction. To train and test the performance of
the DLNID model, we take the NSL-KDD dataset for simulation testing. The following
contributions are presented in this paper:
(1) A DLNID model combining attention mechanism and Bi-LSTM is proposed. This
DLNID model can classify network traffic data accurately;
(2) To address the issue of imbalanced network data, ADASYN is used for data augmentation of the minority class samples, eventually making the distribution of the number of each sample type relatively balanced and allowing the model to learn adequately;
(3) An improved stacked autoencoder is proposed and used for data dimensionality
reduction with the objective of enhancing information fusion.
The rest of this paper is organized as follows: Section 2 details the techniques and innovations used in this
paper and presents a diagram of the model architecture of the DLNID model. Section 3 presents
information about the NSL-KDD dataset used in this paper. Section 4 provides experimental
results and analysis. In Section 5, we summarize our study and propose future research.
2. Technology
2.1. ADASYN
Adaptive synthetic sampling (ADASYN) [15] is an adaptive oversampling algorithm
based on the minority class samples. Compared with other data expansion algorithms,
it is characterized by the fact that it generates more instances in regions of the feature space with lower density and fewer instances in regions with higher density. This property has
the advantage of adaptively shifting decision boundaries to difficult-to-learn samples, so
ADASYN is more suitable than other data augmentation algorithms to handle network
traffic with severe data imbalance. The algorithm is executed in the following steps:
Step 1: Calculate the number of samples to be synthesized, $G$, which can be expressed as

$$G = (n_b - n_s) \times \beta \tag{1}$$

where $n_b$ represents the number of majority class samples, $n_s$ represents the number of minority class samples, and $\beta \in (0, 1)$.

Step 2: For each minority class sample, find its $K$ nearest neighbors by the Euclidean distance and denote by $r_i$ the proportion of majority class samples contained in those neighbors, which can be expressed as

$$r_i = k / K \tag{2}$$

where $K$ represents the number of neighbors, and $k$ represents the number of majority class samples among the current neighbors.

Step 3: Calculate the number of samples that need to be synthesized for each minority class sample according to $G$, and synthesize new samples according to Equation (4):

$$g = G \times r_i \tag{3}$$

$$Z_i = X_i + (X_{zi} - X_i) \times \lambda \tag{4}$$

where $g$ represents the quantity to be synthesized for the current minority class sample, $Z_i$ represents the synthesized new sample, $X_i$ represents the current minority class sample, $X_{zi}$ represents a random minority class sample among the $K$ neighbors of $X_i$, and $\lambda \in (0, 1)$.
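To make these steps concrete, the following is a minimal NumPy/scikit-learn sketch of the procedure; the function name and defaults are ours, and, following He et al. [15], the $r_i$ are normalized so that the per-sample synthesis counts sum to $G$:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def adasyn_sketch(X_min, X_maj, beta=1.0, K=5, seed=0):
    """Minimal ADASYN sketch following Equations (1)-(4)."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])   # minority samples occupy indices [0, len(X_min))
    # Step 1: total number of samples to synthesize, G = (n_b - n_s) * beta
    G = int((len(X_maj) - len(X_min)) * beta)
    # Step 2: for each minority sample, find its K neighbors among all samples
    # and compute r_i, the fraction of majority-class neighbors (Equation (2))
    nbrs = NearestNeighbors(n_neighbors=K + 1).fit(X_all)
    _, idx = nbrs.kneighbors(X_min)                       # idx[:, 0] is the sample itself
    r = (idx[:, 1:] >= len(X_min)).mean(axis=1)
    r = r / r.sum()                                       # normalize so the g_i sum to G
    # Step 3: synthesize g_i = G * r_i samples per minority point via
    # Z_i = X_i + (X_zi - X_i) * lambda, with lambda ~ U(0, 1), as in Equation (4)
    min_nbrs = NearestNeighbors(n_neighbors=min(K + 1, len(X_min))).fit(X_min)
    _, min_idx = min_nbrs.kneighbors(X_min)
    synthetic = []
    for i, g_i in enumerate(np.rint(G * r).astype(int)):
        for _ in range(g_i):
            z = X_min[rng.choice(min_idx[i, 1:])]         # random minority neighbor of X_i
            synthetic.append(X_min[i] + (z - X_min[i]) * rng.uniform())
    return np.asarray(synthetic)
```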
2.2. Autoencoder

An autoencoder [16] is an unsupervised learning network architecture, in which the input and output dimensions are the same, and the number of nodes in the middle layer is generally less than the number of nodes on the left and right sides. Figure 1 illustrates a typical autoencoder consisting of two main components, i.e., the encoder and decoder. It works by using deep learning techniques to find an efficient representation of the input data without losing information. In short, it compresses the original data by using the encoder to obtain a lower-dimensional representation, which is then reconstructed into the original data by the decoder. According to this working principle, we can use the trained encoder as a tool for data dimensionality reduction. Compared with the traditional principal component analysis (PCA) [17] dimensionality reduction method, the autoencoder can achieve nonlinear transformations, which facilitates the learning of deeper projections of the data.

Figure 1. Autoencoder structure.

Although the autoencoder can achieve better data dimensionality reduction compared with other dimensionality reduction methods, we aimed to propose an autoencoder that is able to both perform dimensionality reduction and enhance data robustness to adapt to complex network scenarios. Dropout [18] gives each neuron a probability p of being discarded during network training iterations; due to this mechanism, no neuron becomes overly dependent on other neurons, thus reducing overfitting and improving the generalization ability of the model to some extent. By combining the two ideas, a low-dimensional representation is obtained by applying dropout within the stacked autoencoder during dimensionality reduction. Since each dimension has a probability of being discarded, the information captured by each dimension is more comprehensive than that obtained by a traditional autoencoder after dimensionality reduction, thus facilitating model learning. Based on the above ideas, we propose a stacked autoencoder structure with added dropout, as shown in Figure 2.

Figure 2. Improved stacked autoencoder.
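As an illustration of this design, the following PyTorch sketch interleaves dropout between the stacked encoder layers, as in Figure 2; the layer widths and dropout probability are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class StackedAutoencoder(nn.Module):
    """Stacked autoencoder with dropout added between the encoder layers."""
    def __init__(self, in_dim=122, hid_dim=64, code_dim=32, p=0.3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hid_dim, code_dim), nn.ReLU(), nn.Dropout(p),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, in_dim),
        )

    def forward(self, x):
        code = self.encoder(x)      # low-dimensional representation
        return self.decoder(code)   # reconstruction of the input
```

After training with a reconstruction loss such as nn.MSELoss()(model(x), x), only model.encoder is kept as the dimensionality reducer; calling model.eval() disables dropout, so the encoder is deterministic at inference time.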
The bidirectional LSTM (Bi-LSTM) network [24] improves its LSTM predecessor by adding backward hidden states $\overleftarrow{h}_t$ to the existing forward hidden states $\overrightarrow{h}_t$, allowing it to obtain a forward-looking capability similar to that of the hidden Markov model (HMM). The following shows how the Bi-LSTM network updates itself in one time step:

$$\overrightarrow{h}_t = \tanh\left(W_{\overrightarrow{h}x} x_t + W_{\overrightarrow{h}\overrightarrow{h}} \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}\right) \tag{11}$$

$$\overleftarrow{h}_t = \tanh\left(W_{\overleftarrow{h}x} x_t + W_{\overleftarrow{h}\overleftarrow{h}} \overleftarrow{h}_{t-1} + b_{\overleftarrow{h}}\right) \tag{12}$$

$$h_t = \overrightarrow{h}_t + \overleftarrow{h}_t \tag{13}$$

where $h_t$ represents the hidden state of the current cell, $h_{t-1}$ represents the hidden state of the previous cell, $\overrightarrow{h}_t$ represents the forward hidden state of the current cell, and $\overleftarrow{h}_t$ represents the backward hidden state of the current cell.
For network traffic, Bi-LSTM can effectively utilize the temporal features present in the contextual information to improve the model training, and its structure is shown in Figure 4.

Figure 4. Bi-LSTM structure.
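For reference, a minimal PyTorch sketch of this update (sizes are illustrative): nn.LSTM with bidirectional=True concatenates the forward and backward hidden states, so the two halves are split and summed to match Equation (13):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True, bidirectional=True)
x = torch.randn(8, 20, 64)               # (batch, time steps, features); illustrative sizes
out, _ = lstm(x)                         # out: (8, 20, 256), forward/backward concatenated
h_fwd, h_bwd = out[..., :128], out[..., 128:]
h = h_fwd + h_bwd                        # Equation (13): h_t = forward h_t + backward h_t
```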
2.5. Network Architecture

As shown in Figure 5, the overall architecture of the DLNID model consists of seven parts: the input layer, encoder layer, multiple convolutional layer, attention layer, Bi-LSTM layer, fully connected layer, and output layer. In the first layer, the model accepts the network traffic data from the dataset. In the encoder layer, the model uses the encoder part of the trained improved stacked autoencoder to perform dimensionality reduction on the data. In the multiple convolutional layer, the model uses multiple convolutional operations to extract features from the downscaled data. In the attention layer, the model uses the CBAM to redistribute the weights of each channel, assigning higher weights to more important channels. In the Bi-LSTM layer, the model extracts the feature information of each dimension and learns the relationships between the dimensions. In the fully connected layer and the output layer, the model passes the learned features onto the classifier and outputs the classification results. Algorithm 1 presents the training process of the DLNID model.

Figure 5. Overall structure of the model.
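To make the data flow concrete, the following is a simplified PyTorch sketch of this seven-part pipeline; the channel counts, kernel sizes, and SE-style channel attention are assumptions loosely based on Table 2, not the paper's exact implementation, and the encoder argument would be the trained encoder of the improved stacked autoencoder from Section 2.2:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE/CBAM-style channel attention: squeeze, excite, reweight."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, channels, length)
        w = self.fc(x.mean(dim=2))               # global average pool over length
        return x * w.unsqueeze(2)                # reweight each channel

class DLNIDSketch(nn.Module):
    """Illustrative sketch of the seven-part DLNID pipeline."""
    def __init__(self, encoder, channels=32, classes=2):
        super().__init__()
        self.encoder = encoder                   # trained encoder of the stacked AE
        self.conv = nn.Sequential(               # multiple convolutional layer
            nn.Conv1d(1, channels, kernel_size=5, padding=2),
            nn.BatchNorm1d(channels), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(channels, channels, kernel_size=1),
        )
        self.attn = ChannelAttention(channels)
        self.bilstm = nn.LSTM(channels, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Sequential(
            nn.Dropout(0.3), nn.Linear(128, 32), nn.LeakyReLU(),
            nn.Dropout(0.2), nn.Linear(32, classes),
        )

    def forward(self, x):                        # x: (batch, input features)
        z = self.encoder(x).unsqueeze(1)         # (batch, 1, code_dim)
        f = self.attn(self.conv(z))              # extract features, reweight channels
        out, _ = self.bilstm(f.transpose(1, 2))  # (batch, length, 2 * 64)
        return self.fc(out[:, -1])               # classify from the final time step
```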
3. Datasets
3.1. Data Analysis
The experimental data in this paper adopt the NSL-KDD dataset [5], which is an
improved version of the KDD99 dataset [25] that addresses the data redundancy problem
present in the KDD99 dataset and is one of the benchmark datasets used to evaluate the
performance of IDS. It consists of a training set (KDDTrain+), containing 125,973 traffic
samples, and a test set (KDDTest+), containing 22,544 traffic samples. In order to better reflect the complexity of real network environments, only 19 attack types appear in the training set, and the other 17 attack types exist only in the test set.
The NSL-KDD dataset has a total of 42 dimensions, one of which is the classification label, and the rest are features. For binary classification, the classification labels are divided into two categories, i.e., normal and anomaly. For multiclassification, the classification labels are divided into five categories, i.e., normal, DoS, R2L, U2R, and probe.
3.2.3. Normalization
A large gap between different dimensional feature data within the dataset can bring
about problems such as slow model training and insignificant accuracy improvement;
therefore, in order to tackle this issue, the MinMaxScaler [26] was adopted to map the data
into the range of (0, 1) as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{14}$$

where $x_{\max}$ is the maximum value, and $x_{\min}$ is the minimum value.
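For example, with scikit-learn's MinMaxScaler [26]; fitting on the training features and reusing the learned statistics for the test features is our assumption of standard practice, and the small arrays are illustrative:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[2.0, 40.0], [4.0, 80.0], [8.0, 120.0]])  # illustrative features
X_test = np.array([[3.0, 100.0]])

# Map every feature dimension into (0, 1) per Equation (14).
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns x_min and x_max per column
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics
```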
4. Results
In the following section, we detail the experimental settings and appraise the performance metrics of the model. In addition, we present two sets of ablation experiments to verify the reliability of the data augmentation and improved dimensionality reduction approaches proposed in Section 2. Finally, we compare the model with those proposed in other papers.
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

$$Pre = \frac{TP}{TP + FP} \tag{16}$$

$$Rec = \frac{TP}{TP + FN} \tag{17}$$

$$F1 = \frac{2 \times Pre \times Rec}{Pre + Rec} \tag{18}$$

$$FPR = \frac{FP}{FP + TN} \tag{19}$$
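Equations (15)-(19) translate directly into code; the following is a minimal sketch computing them from raw binary-classification counts:

```python
def metrics(TP, TN, FP, FN):
    acc = (TP + TN) / (TP + TN + FP + FN)   # Equation (15)
    pre = TP / (TP + FP)                    # Equation (16)
    rec = TP / (TP + FN)                    # Equation (17)
    f1 = 2 * pre * rec / (pre + rec)        # Equation (18)
    fpr = FP / (FP + TN)                    # Equation (19)
    return acc, pre, rec, f1, fpr
```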
4.3. Result Analysis

The experiment studied the performance of the proposed network on normal, DoS, R2L, U2R, and probe for binary and multiclassification experiments, respectively. When the network parameters were chosen as shown in Table 2, a high accuracy and F1 score could be achieved on the KDDTest+ test set. Figures 6 and 7 show the experimental results using the confusion matrix. The experimental results show that most samples were classified correctly, appearing on the diagonal, indicating a good classification performance. However, the comparison between the two figures shows that the performance of the proposed model was somewhat degraded on the multiclassification experiments compared with the binary classification experiments. Table 3 provides the false-positive and recall rates corresponding to different attacks under the multiclassification task; the aim was to achieve a lower false-positive rate and a higher recall rate in intrusion detection. From the analysis, it can be concluded that, despite the data augmentation process, the U2R category was more likely to be misclassified because the U2R category in the test set was much larger than in the training set.
Table 2. Model parameters.

Type                            Parameter
Encoder                         -
Conv1d                          5 × 5
BatchNorm1d                     -
Maxpool1d                       3 × 3
Conv1d                          1 × 1
ChannelAttention                -
Bidirectional LSTM              -
Dropout                         0.3
Fully connected (LeakyReLU)     32
Dropout                         0.2
Fully connected                 16
Loss function                   CrossEntropy
Optimizer                       Adam
Learning rate                   0.0005
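As a usage illustration, the loss function, optimizer, and learning rate of Table 2 correspond to the following PyTorch training loop; the epoch count and data loader are illustrative assumptions:

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=100):
    """Training loop using the loss, optimizer, and learning rate of Table 2."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)   # cross-entropy on class logits
            loss.backward()
            optimizer.step()
    return model
```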
Figure 6. Confusion matrix (2 classes).

Figure 7. Confusion matrix (5 classes).
4.3.2. Dimensionality Reduction Comparison

Table 5 shows the experimental results of selecting different dimensionality reduction methods for horizontal comparison under the condition in which the model was the same and the data augmentation method remained unchanged. Compared with PCA, the performance of the model in this paper is greatly improved. Compared with the autoencoder, the improved stacked autoencoder used in this paper also shows some improvement in accuracy and F1 score, with increases of 4.64% and 3.28%, respectively.

Table 5. Dimensionality reduction comparison.

Type                            ACC (%)   Pre (%)   Rec (%)   F1 (%)
PCA                             85.29     83.45     82.14     82.79
Autoencoder                     86.09     85.08     82.12     83.57
Improved stacked autoencoder    90.73     86.38     93.17     89.65

4.3.3. Model Comparison

Figure 8 compares the proposed DLNID model with other reference models in terms of accuracy, and it can be seen that the accuracy of DLNID is higher than that of the other models. Table 6 compares the proposed model with other network models in terms of various performance metrics, from which it can be seen that the proposed DLNID model outperforms its comparison peers in terms of accuracy and F1 score, reaching 90.73% and 89.65% on the KDDTest+ dataset, respectively. Compared with the traditional machine learning
of network intrusion detection. In the future, we plan to apply the DLNID model in a real environment, combined with a network capture module, to implement an online intrusion detection system.
Author Contributions: Methodology, Y.F. and Y.D.; funding acquisition, W.X.; investigation, Y.F., Y.D.,
W.X. and Q.L.; resources, Z.C. and W.X.; validation, Z.C. and Q.L.; writing—original draft preparation,
Y.D.; writing—review and editing, Y.F. All authors have read and agreed to the published version of
the manuscript.
Funding: The work of Y.F., Y.D., Z.C. and Q.L. is supported, in part, by Shaanxi S&T under Grant 2021KW-07 and Shaanxi Education under Fund 19jk0414.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Patel, A.; Qassim, Q.; Wills, C. A survey of intrusion detection and prevention systems. Inf. Manag. Comput. Secur. 2010, 18,
277–290. [CrossRef]
2. Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J. Survey of intrusion detection systems: Techniques, datasets and challenges.
Cybersecurity 2019, 2, 20. [CrossRef]
3. Yuan, L.; Chen, H.; Mai, J.; Chuah, C.N.; Su, Z.; Mohapatra, P. Fireman: A toolkit for firewall modeling and analysis. In
Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P’06), Berkeley/Oakland, CA, USA, 21–24 May 2006; IEEE:
Manhattan, NY, USA, 2006; pp. 15–213.
4. Musa, U.S.; Chhabra, M.; Ali, A.; Kaur, M. Intrusion detection system using machine learning techniques: A review. In
Proceedings of the 2020 International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 10–12
September 2020; IEEE: Manhattan, NY, USA, 2020; pp. 149–155.
5. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009
IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009;
IEEE: Manhattan, NY, USA, 2009; pp. 1–6.
6. Amor, N.B.; Benferhat, S.; Elouedi, Z. Naive Bayes vs. decision trees in intrusion detection systems. In Proceedings of the 2004 ACM Symposium on Applied Computing, Nicosia, Cyprus, 14–17 March 2004; Association for Computing Machinery: New York, NY, USA, 2004; pp. 420–424.
7. Tao, P.; Sun, Z.; Sun, Z. An improved intrusion detection algorithm based on GA and SVM. IEEE Access 2018, 6, 13624–13631.
[CrossRef]
8. Jiadong, R.; Xinqian, L.; Qian, W.; Haitao, H.; Xiaolin, Z. A multi-level intrusion detection method based on KNN outlier detection
and random forests. J. Comput. Res. Dev. 2019, 56, 566.
9. Shapoorifard, H.; Shamsinejad, P. Intrusion detection using a novel hybrid method incorporating an improved KNN. Int. J.
Comput. Appl. 2017, 173, 5–9. [CrossRef]
10. Kim, G.; Lee, S.; Kim, S. A novel hybrid intrusion detection method integrating anomaly detection with misuse detection. Expert
Syst. Appl. 2014, 41, 1690–1700. [CrossRef]
11. Zhao, G.; Zhang, C.; Zheng, L. Intrusion detection using deep belief network and probabilistic neural network. In Proceedings of
the 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on
Embedded and Ubiquitous Computing (EUC), Guangzhou, China, 21–24 July 2017; IEEE: Manhattan, NY, USA, 2017; Volume 1,
pp. 639–642.
12. Wang, W.; Zhu, M.; Zeng, X.; Ye, X.; Sheng, Y. Malware traffic classification using convolutional neural network for representation
learning. In Proceedings of the 2017 International Conference on Information Networking (ICOIN), Da Nang, Vietnam, 11–13
January 2017; IEEE: Manhattan, NY, USA, 2017; pp. 712–717.
13. Torres, P.; Catania, C.; Garcia, S.; Garino, C.G. An analysis of recurrent neural networks for botnet detection behavior. In
Proceedings of the 2016 IEEE Biennial Congress of Argentina (ARGENCON), Buenos Aires, Argentina, 15–17 June 2016; IEEE:
Manhattan, NY, USA, 2016; pp. 1–6.
14. Su, T.; Sun, H.; Zhu, J.; Wang, S.; Li, Y. BAT: Deep learning methods on network intrusion detection using NSL-KDD dataset.
IEEE Access 2020, 8, 29575–29585. [CrossRef]
15. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings
of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence),
Hong Kong, China, 1–8 June 2008.
16. Meng, Q.; Catchpoole, D.; Skillicorn, D.; Kennedy, P.J. Relational autoencoder for feature extraction. In Proceedings of the 2017
International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: Manhattan, NY, USA,
2017; pp. 364–371.
17. Roweis, S. EM algorithms for PCA and SPCA. Adv. Neural Inf. Process. Syst. 1998, 10, 626–632.
18. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks
from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
19. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
20. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
21. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw.
Learn. Syst. 2017, 28, 2222–2232. [CrossRef] [PubMed]
22. Gui, Z.; Sun, Y.; Yang, L.; Peng, D.; Li, F.; Wu, H.; Guo, C.; Guo, W.; Gong, J. LSI-LSTM: An attention-aware LSTM for real-time
driving destination prediction by considering location semantics and location importance of trajectory points. Neurocomputing
2021, 440, 72–88. [CrossRef]
23. Lin, T.; Horne, B.G.; Tino, P.; Giles, C.L. Learning long-term dependencies in NARX recurrent neural networks. IEEE Trans. Neural
Netw. 1996, 7, 1329–1338. [PubMed]
24. Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain.
Fuzziness Knowl.-Based Syst. 1998, 6, 107–116. [CrossRef]
25. Engen, V.; Vincent, J.; Phalp, K. Exploring discrepancies in findings obtained with the KDD Cup 99 data set. Intell. Data Anal.
2011, 15, 251–276. [CrossRef]
26. Bisong, E. Introduction to scikit-learn. In Building Machine Learning and Deep Learning Models on Google Cloud Platform; Apress:
Berkeley, CA, USA, 2019; pp. 215–229.
27. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell.
Res. 2002, 16, 321–357. [CrossRef]
28. Wisanwanichthan, T.; Thammawichai, M. A double-layered hybrid approach for network intrusion detection system using
combined naive bayes and SVM. IEEE Access 2021, 9, 138432–138450. [CrossRef]
29. Ieracitano, C.; Adeel, A.; Morabito, F.C.; Hussain, A. A novel statistical analysis and autoencoder driven intelligent intrusion
detection approach. Neurocomputing 2020, 387, 51–62. [CrossRef]
30. Ding, Y.; Zhai, Y. Intrusion detection system for NSL-KDD dataset using convolutional neural networks. In Proceedings of the 2018
2nd International Conference on Computer Science and Artificial Intelligence, Shenzhen, China, 8–10 December 2018; pp. 81–85.
31. Gao, X.; Shan, C.; Hu, C.; Niu, Z.; Liu, Z. An adaptive ensemble machine learning model for intrusion detection. IEEE Access 2019,
7, 82512–82521. [CrossRef]
32. Tama, B.A.; Comuzzi, M.; Rhee, K. TSE-IDS: A two-stage classifier ensemble for intelligent anomaly-based intrusion detection
system. IEEE Access 2019, 7, 94497–94507. [CrossRef]
33. Kanakarajan, N.K.; Muniasamy, K. Improving the accuracy of intrusion detection using GAR-forest with feature selection. In
Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015,
Durgapur, India, 16–18 November 2015; Springer: New Delhi, India, 2016; pp. 539–547.
34. Jiang, K.; Wang, W.; Wang, A.; Wu, H. Network intrusion detection combined hybrid sampling with deep hierarchical network.
IEEE Access 2020, 8, 32464–32476. [CrossRef]
35. Pervez, M.S.; Farid, D.M. Feature selection and intrusion classification in NSL-KDD Cup 99 dataset employing SVMs. In Proceedings
of the 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), Dhaka,
Bangladesh, 18–20 December 2014; pp. 1–6.