Intrusion Detection System Using Voting-Based Neural Network
Abstract: Several security solutions have been proposed to detect abnormal network behavior. However, successful attacks remain a major concern in the computer society. Numerous security breaches, such as Distributed Denial of Service (DDoS), botnets, spam, phishing, and so on, are reported every day, while the number of attacks is still increasing. In this paper, a novel voting-based deep learning framework, called VNN, is proposed to take advantage of any kind of deep learning structure. Considering several models created from different aspects of the data and various deep learning structures, VNN provides the ability to aggregate the best models in order to produce more accurate and robust results. Therefore, VNN helps security specialists to detect more complicated attacks. Experimental results on KDDCUP'99 and CTU-13, two well-known and widely employed datasets in the computer network area, revealed that the voting procedure was highly effective in increasing the system performance: the false alarms were reduced by up to 75% in comparison with the original deep learning models, including Deep Neural Network (DNN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU).
Key words: deep learning; Voting-based Neural Network (VNN); network security; Pearson correlation coefficient
[Fig. 1 RNN architecture[15]: a chain of repeating units that maps the inputs x_{k-1}, x_k, x_{k+1} to the hidden states h_{k-1}, h_k, h_{k+1}.]

Althubiti et al.[19] compared the performance of RNN with traditional machine learning methods, including naive Bayes, random forest, and Support Vector Machine (SVM), using KDDCUP'99 in both multi-class and binary classifiers, and revealed that RNN clearly outperformed all the traditional methods. A Gated Recurrent Unit Recurrent Neural Network (GRU-RNN) was proposed by Tang et al.[20], achieving 89% performance on KDDCUP'99 using only 6 raw features.

CNN is a special deep learning architecture originally developed for image recognition problems. Yao et al.[21] proposed a CNN-based method to detect time-delayed attacks, and reported that the method was highly accurate on the DARPA'98 dataset. Wu et al.[22] employed CNN to select traffic properties automatically from the raw dataset. They evaluated the method on KDDCUP'99 and argued that it performs better in terms of accuracy and false alarm rate than the conventional standard algorithms.

SAE is a specific type of neural network whose output has exactly the same size as its input. The main goal of SAE is to reconstruct its input at the output. Figure 2 depicts the SAE architecture, where the input is compressed and then decompressed to compute the output.

[Fig. 2 SAE architecture: input layer, hidden encoder layers, a central code layer, hidden decoder layers, and output layer.]

Aminanto and Kim[23] applied SAE as a classifier on the KDDCUP'99 dataset and presented four different IDSes: application layer IDS, transport layer IDS, network layer IDS, and data link layer IDS. Javaid et al.[24] used SAE to learn features from NSLKDD.

Farahnakian and Heikkonen[25] proposed a Deep Auto Encoder (DAE) to extract features from high dimensional data. They achieved more than 97% detection precision when using the 10% KDDCUP'99 dataset as the test case.

BM is a type of stochastic RNN whose units make decisions about being either on or off. BM provides the ability to learn systems and interesting features from datasets having binary labels[26].

A multi-layer Denial of Service (DoS) attack detection technique based on Deep Boltzmann Machine (DBM) was provided by Gao et al.[27] The authors argued that their method gained better precision on KDDCUP'99 compared to SVM and a simple Artificial Neural Network (ANN). Zhang and Chen[28] sped up the training time by combining SVM, BM, and Deep Belief Network (DBN). Alrawashdeh and Purdy[29] achieved 97.9% precision on the 10% KDDCUP'99 dataset as the test case. Recently, Vinayakumar et al.[30–34] provided a comprehensive study of various CNN, LSTM, CNN-LSTM, CNN-GRU, and DNN structures to select the optimal network architecture using the KDDCUP'99 and NSLKDD datasets.

Haghighat et al.[35] also developed a sliding window-based deep learning technique (called SAWANT) which achieved 99.952% accuracy on the CTU-13 dataset. The authors used only 1%–10% of the CTU-13 dataset as training data to conduct their tests.

The aforementioned methods took advantage of deep learning to detect malicious network activities. Although their individual performance was considerable, aggregating different deep learning models makes it possible to utilize the strength of each model and detect attacks far more efficiently.

In this paper we propose the "Voting-based Neural Network (VNN)" as a general voting-based infrastructure to aggregate and take advantage of any kind of deep learning algorithm. In other words, several deep learning-based models can be created by state-of-the-art techniques, each with different performance. Given test data, VNN provides a procedure that performs a weighted voting function over the most suitable models to achieve more accurate results. Due to only selecting and
aggregating the best models for each test sample, VNN reduced the false alarms by up to 75%, which proved our argument. Table 1 summarizes all the relevant acronyms employed throughout the paper.

The paper is structured as follows. In Section 2, an overview of VNN is given. Then, VNN is studied in depth on the two well-known KDDCUP'99 and CTU-13 datasets in Sections 3 and 4, using two different configurations: high and low accuracy, respectively. Finally, in Section 5, the paper is concluded and future research plans are explained.

2 Voting-Based Neural Network

Voting-based Neural Network is a general infrastructure that creates several models using different aspects of the data or various types of deep learning architectures and merges them, aiming at increasing the system performance. As illustrated in the VNN architecture (Fig. 3), several inputs are extracted from the original data and modeled by various kinds of deep learning techniques, such as DNN, CNN, RNN, SAE, and so on. In the prediction phase, a heuristic function, called the "voting engine", processes all the models to select the best candidates in a way that minimizes the errors. The chosen models then perform a voting procedure in order to predict the test data label. Algorithm 1 describes the whole VNN procedure in detail; a minimal code sketch of the same loop follows the algorithm.

[Fig. 3 VNN architecture: the input data passes through feature selection into m groups of deep learning models, whose predictions are combined by the voting engine to produce the result.]

Algorithm 1 VNN whole procedure
input1: train = {F1, F2, ..., Fl} //input train data
input2: test = {F1', F2', ..., Fl'} //input test data,
        where Fi = {ai1, ai2, ..., aik} //k different attributes
input3: Θ = {θ1, θ2, ..., θn} //n different models
output: prediction result
1  //initialization
2  Ψ ← {} //empty set as training data
3  Ω ← {} //empty set as testing data
4  Δ ← {} //empty set as prediction results
5  Ξ ← {} //empty set as voting candidates
6  //selecting n different train and test feature
7  //vectors with randomly chosen attributes
8  for i ← range(1, n)
9      A ← randomly select i attributes
10     Ψi ← selectattributes(train, A)
11     Ωi ← selectattributes(test, A)
12 end
13 for each model θi, train data Ψj, and test data Ωj
14     modelij ← train(θi, Ψj) //train model
15     Δij ← predict(modelij, Ωj) //prediction result
16 end
17 Δ' ← select best voting candidates
18 result ← vote(Δ')
19 return result
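To make Algorithm 1 concrete, the following is a minimal Python sketch of the train-then-vote loop. It is illustrative rather than the authors' implementation: the scikit-learn-style fit/predict/score model interface, the model_factories list, and the use of training accuracy as the ranking score are our assumptions.

```python
# Minimal sketch of Algorithm 1 (illustrative, not the authors' code).
# Models are assumed to expose scikit-learn-style fit/predict/score methods,
# and class labels are assumed to be non-negative integers.
import numpy as np

def vnn_predict(train_X, train_y, test_X, model_factories, n_subsets, umt=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n_attrs = train_X.shape[1]
    scored = []
    for i in range(1, n_subsets + 1):
        # Lines 8-12: pick a random attribute subset A and project train/test on it.
        A = rng.choice(n_attrs, size=min(i, n_attrs), replace=False)
        for make_model in model_factories:   # lines 13-16: train every model type
            model = make_model()
            model.fit(train_X[:, A], train_y)
            score = model.score(train_X[:, A], train_y)  # quality proxy for ranking
            scored.append((score, model.predict(test_X[:, A])))
    # Line 17: keep only models whose normalized score clears the threshold (UMT).
    best = max(s for s, _ in scored)
    votes = np.stack([p for s, p in scored if s / best >= umt])
    # Lines 18-19: majority vote per test sample.
    return np.array([np.bincount(votes[:, j].astype(int)).argmax()
                     for j in range(votes.shape[1])])
```

For instance, model_factories could hold closures that each build a DNN, CNN, LSTM, or GRU classifier; any model exposing this interface can participate in the vote.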
Table 1 Acronyms used through the paper.
Acronym   Expression
VNN       Voting-based Neural Network
DDoS      Distributed Denial of Service

In the next two sections, different case studies on the well-known KDDCUP'99 and CTU-13 datasets are presented to make the voting procedure clearer.

3 Case Study 1: KDDCUP'99

KDDCUP'99[36] is the most widely used dataset for evaluating anomaly-based detection systems[37]. The dataset was built based on the DARPA'98 project[38] and contains about 4.9 million records, including 41 different features.
[Fig. 5 Normalized form of model accuracy for each deep learning architecture, plotted against the number of attributes (1–38); the blue dashed lines show UMT.]
Models whose normalized accuracy fell below UMT were removed. The voting procedure was conducted over the remaining models, and the result is depicted in Fig. 6. The results proved that VNN increased the true responses markedly for both the more and the less accurate deep learning structures. VNN resolved 708 errors out of 1804 (more than 39%) for the binary classification-based GRU architecture, and 63 675 false alarms out of about 85 000
(around 75%) for the five-class classification-based CNN-LSTM models. The detailed numbers of false alarms and their correction rates are given in Table 3.

Table 3 KDDCUP'99 error correction.
             Method     Number of errors   Number of corrections   Correction rate (%)
Binary       DNN        777                29                      3.73
             CNN        872                97                      11.12
             LSTM       1551               551                     35.53
             CNN-LSTM   993                148                     14.90
             GRU        1804               708                     39.25
Five-class   DNN        205 439            25 497                  12.41
             CNN        205 306            7463                    3.64
             LSTM       208 849            81 263                  38.90
             CNN-LSTM   85 068             63 675                  74.85
             GRU        208 513            28 374                  13.61

[Fig. 6 System accuracy: voting-based vs. normal-based using the KDDCUP'99 dataset, for (a) binary and (b) five-class classification.]

We also performed the voting procedure over all the models created by all the deep architectures; the performance results are summarized in Tables 4 and 5.

Table 4 KDDCUP'99 binary classification confusion matrix.
                        Predicted
Actual       Normal     Malicious   Total
Normal       301 031    203         301 234
Malicious    488        188 121     188 609
Total        301 519    188 324     489 843

Table 5 KDDCUP'99 five-class classification confusion matrix.
                        Predicted
Actual     Normal    DoS       R2L      U2R   Probing   Total
Normal     277 269   219       20 608   0     0         298 096
DoS        490       188 107   5        7     0         188 609
R2L        0         62        3060     0     0         3122
U2R        0         1         0        0     0         1
Probing    0         14        1        0     0         15
Total      277 759   188 403   23 674   7     0         489 843

Different measurements of the experiment, including False Positive Rate (FPR), False Negative Rate (FNR), Accuracy, Precision, Recall, and F_Score, are computed in Table 6. These values were obtained by Eqs. (3)-(8):

FPR = FP / (FP + TN)    (3)
FNR = FN / (FN + TP)    (4)
Precision = TP / (TP + FP)    (5)
Recall = TP / (TP + FN)    (6)
Accuracy = (TP + TN) / (all data)    (7)
F_Score = 2 × (Precision × Recall) / (Precision + Recall)    (8)

where FP, FN, TP, and TN denote False Positive, False Negative, True Positive, and True Negative, respectively.
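As a concrete check of Eqs. (3)-(8), the small Python helper below (ours, not part of the paper's tooling) derives all six measurements from raw confusion-matrix counts; the example feeds it the binary counts of Table 4, treating malicious traffic as the positive class.

```python
# Eqs. (3)-(8): detection measurements from confusion-matrix counts.
def ids_metrics(tp, tn, fp, fn):
    fpr = fp / (fp + tn)                        # Eq. (3)
    fnr = fn / (fn + tp)                        # Eq. (4)
    precision = tp / (tp + fp)                  # Eq. (5)
    recall = tp / (tp + fn)                     # Eq. (6)
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # Eq. (7)
    f_score = 2 * precision * recall / (precision + recall)  # Eq. (8)
    return dict(FPR=fpr, FNR=fnr, Accuracy=accuracy,
                Precision=precision, Recall=recall, F_Score=f_score)

# Binary case from Table 4, with "malicious" as the positive class.
print(ids_metrics(tp=188_121, tn=301_031, fp=203, fn=488))
```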
Table 6 Measurement results of the KDDCUP'99 study.
                            FPR      FNR      Accuracy   Precision   Recall   F_Score
Binary classification       0.0011   0.0016   0.9986     0.9993      0.9984   0.9989
Five-class classification   0.0982   0.0021   0.9563     0.9302      0.9979   0.9628

The results proved that VNN efficiently achieved higher accuracy than any of the individual deep learning structures for both the binary and the five-class classifiers. Figure 7 compares VNN with the DNN, CNN, LSTM, CNN-LSTM, and GRU methods.

4 Case Study 2: CTU-13

CTU-13 contains thirteen days of labeled traffic, captured by CTU University, Czech Republic in 2011[42]. It has about twenty million netflow records, including Internet
Relay Chat (IRC), P2P, HTTP, fast flux, spam, click fraud, port scan, and DDoS traffic. The goal of CTU-13 is to provide a large amount of real botnet traffic mixed with normal user activities in the network. Table 7 describes the distribution of labels in the netflow traffic per day.

[Fig. 7 VNN vs. other deep learning architectures on KDDCUP'99: (a) binary classification and (b) five-class classification accuracy for CNN, DNN, GRU, LSTM, CNN-LSTM, and VNN.]

Table 7 CTU-13 label distribution.
Day   Number of flows (million)   Botnet (%)   Normal (%)   Command and control (%)   Background (%)
1     2.82                        1.41         1.07         0.030                     97.47
2     1.81                        1.04         0.50         0.110                     98.33
3     4.71                        0.56         2.48         0.001                     96.94
4     1.21                        0.15         2.25         0.004                     97.58
5     0.13                        0.53         3.6          1.150                     95.70
6     0.56                        0.79         1.34         0.030                     97.83
7     0.11                        0.03         1.47         0.020                     98.47

4.1 Deep learning models

Netflow traffic contains high level information about network activities, including source IP/port numbers, ..., and next hop router. These attributes are too simple to be used in a deep learning method to detect network attacks. As a result, Haghighat et al.[35] developed a sliding window-based technique, called SmArt Window-based Anomaly detection using Netflow Traffic (SAWANT), which aggregates netflow records and extracts several meaningful attributes using a sliding window algorithm. Its main contribution is that, using only a small subset of the netflow records for training (one to ten percent), SAWANT was able to achieve highly accurate models. As illustrated in Fig. 8, new feature vectors were extracted as follows to decide whether traffic is normal or abnormal (a small sketch of this windowing step appears below):

(1) Slide a window of size w through the netflow records.
(2) For each position of the window, calculate these attributes:
• Number of unique values of source IP/port, destination IP/port, duration, source bytes, number of packets, and flow size per incoming and outgoing flows.
• Entropy values of source IP/port, destination IP/port, duration, source bytes, number of packets, and flow size per incoming and outgoing flows.
• Minimum, maximum, average, sum, and variance of duration, source bytes, number of packets, and flow size per incoming, outgoing, and total flows.
• Malicious rate (ρ) as the label of each vector, based on Eq. (9):

ρ = (Number of malicious netflow records) / (Window size)    (9)

The new feature vectors were used to train the ANN model as depicted in Fig. 9, where the output layer expresses the malicious rate.

[Fig. 8 Pre-processing: the netflow records Netflow1, ..., Netflown are converted into feature vectors F1, ..., Fn of the form (fi1, fi2, ..., fim) with malicious rate label ρi.]
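The sketch below illustrates the windowing step and the Eq. (9) label. It is our illustration, not SAWANT's code: field names such as src_ip and malicious are placeholders for whatever the netflow schema provides, and only a few of the attributes listed above (unique counts, entropy, simple statistics) are shown.

```python
# Sketch of SAWANT-style window features and the Eq. (9) label (illustrative).
import math
from collections import Counter

def entropy(values):
    # Shannon entropy of the empirical distribution of a field inside the window.
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def window_features(records, w):
    # records: list of dicts with placeholder keys src_ip, dst_ip, bytes, malicious.
    for start in range(len(records) - w + 1):    # step (1): slide a window of size w
        win = records[start:start + w]
        feats = {                                # step (2): per-window attributes
            "uniq_src_ip": len({r["src_ip"] for r in win}),
            "uniq_dst_ip": len({r["dst_ip"] for r in win}),
            "entropy_src_ip": entropy([r["src_ip"] for r in win]),
            "bytes_sum": sum(r["bytes"] for r in win),
            "bytes_max": max(r["bytes"] for r in win),
        }
        rho = sum(r["malicious"] for r in win) / w   # Eq. (9): malicious rate label
        yield feats, rho
```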
The results on the test dataset were compared with the actual malicious rate values using the "Pearson correlation coefficient" function, described by Eq. (10):

r_{X,Y} = (E[XY] − E[X]E[Y]) / (√(E[X²] − E[X]²) · √(E[Y²] − E[Y]²))
        = (Σᵢ₌₁ⁿ xᵢyᵢ − n x̄ ȳ) / (√(Σᵢ₌₁ⁿ xᵢ² − n x̄²) · √(Σᵢ₌₁ⁿ yᵢ² − n ȳ²))    (10)

where X and Y are two different variable sets.

Definition 2 Let X and Y be two different data series. X and Y are positively correlated (r = 1) if
∀xᵢ ∈ X, yᵢ ∈ Y | yᵢ = αxᵢ + β,
where α > 0 and β are arbitrary numbers.
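Eq. (10) is the ordinary sample Pearson coefficient; a direct NumPy transcription (a sketch of ours) is shown below, together with a pair of series satisfying Definition 2.

```python
# Eq. (10): Pearson correlation coefficient between two prediction series.
import numpy as np

def pearson_r(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    num = np.sum(x * y) - n * x.mean() * y.mean()
    den = (np.sqrt(np.sum(x**2) - n * x.mean()**2)
           * np.sqrt(np.sum(y**2) - n * y.mean()**2))
    return num / den

# Definition 2: y = a*x + b with a > 0 gives perfect positive correlation.
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))   # 1.0
```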
4.2 Voting procedure

As described in the previous section, a ranking mechanism is defined in order to select a subset of the more probable models and achieve more accurate results in the voting procedure. In a classification setting (like Case Study 1, with "malicious" and "benign" classes), the more decisive a model is, the more accurate it is likely to be. However, the main challenge with SAWANT is that its predicted malicious rate is numerical (not categorical). In fact, the SAWANT predictions are almost never exactly equal to the actual values, which makes identifying the more decisive models impossible. Therefore, the majority voting procedure explained in Section 3.1 is not practical here. As a result, we developed a new heuristic procedure to rank and select better models for an arbitrary test case t (a code sketch follows Algorithm 3 below):

• Normalize the accuracy of all the models according to Eq. (2) and remove the less accurate models based on UMT.
• Compute the sum of the Pearson correlation coefficients (r) of each predicted model with all the others.
• Sort the models based on the computed value and remove the last 50% of the models.
• For each two remaining predicted sets Si and Sj:
– Compute α as the Pearson correlation coefficient of Si and Sj, r(Si, Sj).
– Remove t from both Si and Sj, and compute β as the Pearson correlation coefficient of the two sets, r(Si − {t}, Sj − {t}).
– Compare the Pearson correlation coefficients calculated in the above steps.
– Mark Si and Sj as being similar for test case t if α is greater than β.
• Put similar models into a single set.
• Return the largest set as the voting candidate.
• Compute the result based on the majority voting schema over the parties inside the selected set.

Algorithm 3 describes the model selection procedure in detail.

Algorithm 3 SAWANT best model selection procedure
input1: Γ = {γ1, γ2, ..., γn} //predicted malicious rate sets,
        where γi = {pmri1, pmri2, ..., pmrim} //predicted malicious rates of m test cases
input3: λ //unsatisfied model threshold
3  Δ ← {} //initializing the output
4  M ← {} //initializing the set of satisfied models
...
13     δγi ← δγi + r(γi, γj) //r is the correlation coefficient
14 end
15 Δsorted ← sort(Δ)
16 Γ' ← retain the top 50% of Γ based on Δsorted
17 for each γi, γj ∈ Γ'
18     r ← r(γi, γj)
19     r' ← r(γi − {pmri,pivot}, γj − {pmrj,pivot})
20     if r is greater than r'
21         θi,j ← 1
22     else
23         θi,j ← 0
24 end
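A compact sketch of this selection heuristic for a single test case (again ours, not the authors' code; the names preds and pivot are our assumptions): models are ranked by their summed pairwise correlation, the weaker half is dropped, and pairs whose correlation drops when the pivot test case is removed are grouped as similar; the largest group becomes the voting set.

```python
# Sketch of the Algorithm 3 heuristic for one test case `pivot` (illustrative).
import numpy as np

def select_voting_set(preds, pivot):
    # preds: (n_models, n_tests) array of predicted malicious rates.
    n = len(preds)
    corr = np.corrcoef(preds)              # pairwise Pearson r between models
    totals = corr.sum(axis=1)              # summed correlation per model
    kept = np.argsort(totals)[n // 2:]     # keep the top 50% of models
    groups = {i: {i} for i in kept}
    rest = np.delete(np.arange(preds.shape[1]), pivot)   # test cases minus pivot
    for a in kept:
        for b in kept:
            if a < b:
                alpha = np.corrcoef(preds[a], preds[b])[0, 1]
                beta = np.corrcoef(preds[a][rest], preds[b][rest])[0, 1]
                if alpha > beta:           # similar for this test case: merge groups
                    merged = groups[a] | groups[b]
                    for m in merged:
                        groups[m] = merged
    return max(groups.values(), key=len)   # largest set = voting candidates
```

The selected models' predictions for the pivot case can then be combined by the majority voting schema described above.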
4.3 Experimental results

We chose DNN, CNN, LSTM, and GRU as the deep learning structures of SAWANT and performed the voting procedure to evaluate VNN. The SAWANT pre-processed data contain 92 different attributes. We extracted 73 unique subsets, each containing 72 features. Table 8 lists the hyper parameters used to test the CTU-13 dataset.

Table 8 Hyper parameters used to test CTU-13.
Hyper parameter              Value
Number of GRU layers         2
Number of input attributes   72
Number of input subsets      73
Output                       Malicious rate
UMT                          0.8
TU                           0.5

We configured the deep learning structures so as to produce both higher and lower accuracies, in which the performance of DNN, CNN, GRU, and LSTM was 99%, 94%, 76%, and 70%, respectively. UMT was also configured as 0.8 to select better models in the voting procedure. Figure 10 illustrates the accuracy of each model created by the various extracted subsets and deep learning architectures.

[Fig. 10 Model accuracy reported by the system during the training phase, plotted against the number of features (1–73); panels include (b) DNN and (d) GRU in normalized form.]

Figure 11 compares the accuracy of VNN with the utilized deep learning structures (DNN, CNN, LSTM, and GRU). VNN decreased false alarms significantly, especially for the LSTM and DNN methods, where 272 507 out of 668 597 errors (around 40%) and 12 418 out of 17 112 errors (about 72%) were corrected, respectively. Table 9 gives the details of error correction over the CTU-13 dataset. Tables 10 and 11 also summarize the VNN performance over DNN as the best suited model in our case study.
[Fig. 11 System accuracy: voting-based vs. normal-based using the CTU-13 dataset.]

... are another direction for future work. Deeper analysis of the different attack types (e.g., DoS, R2L, U2R, and probing in KDDCUP'99) will give us suitable feedback to create more robust models. The proposed method missed the U2R and probing attacks; however, the number of samples was very small. We plan to address this issue in the future.
Acknowledgment
[11] M. Lotfollahi, M. J. Siavoshani, R. S. Hosseinzade, and M. S. Saberian, Deep packet: A novel approach for encrypted traffic classification using deep learning, Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[12] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapè, MIMETIC: Mobile encrypted traffic classification using multimodal deep learning, Computer Networks, vol. 165, pp. 1186–1191, 2019.
[13] N. Mansouri and M. Fathi, Simple counting rule for optimal data fusion, in Proc. of 2003 IEEE Conference on Control Applications, Istanbul, Turkey, 2003, pp. 1186–1191.
[14] D. Ciuonzo, A. De Maio, and P. S. Rossi, A systematic framework for composite hypothesis testing of independent Bernoulli trials, IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1249–1253, 2015.
[15] A. Khan and F. Zhang, Using recurrent neural networks (RNNs) as planners for bio-inspired robotic motion, in Proc. of 2017 IEEE Conference on Control Technology and Applications, Mauna Lani, HI, USA, 2017, pp. 1025–1030.
[16] J. Kim and H. Kim, Applying recurrent neural network to intrusion detection with hessian free optimization, in Proc. of 2015 International Workshop on Information Security Applications, Jeju Island, Korea, 2015, pp. 357–369.
[17] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, Long short term memory recurrent neural network classifier for intrusion detection, in Proc. of 2016 International Conference on Platform Technology and Service, Jeju South, Korea, 2016, pp. 1–5.
[18] C. Yin, Y. Zhu, J. Fei, and X. He, A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access, vol. 5, pp. 21 954–21 961, 2017.
[19] S. Althubiti, W. Nick, J. Mason, X. Yuan, and A. Esterline, Applying long short-term memory recurrent neural network for intrusion detection, in Proc. of IEEE Southeast Conference 2018, St. Petersburg, FL, USA, 2018, pp. 1–5.
[20] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, Deep recurrent neural network for intrusion detection in SDN-based networks, in Proc. of 2018 4th IEEE Conference on Network Softwarization and Workshops, Montreal, Canada, 2018, pp. 202–206.
[21] Y. Yao, Y. Wei, F. Gao, and G. Yu, Anomaly intrusion detection approach using hybrid MLP/CNN neural network, in Proc. of Sixth International Conference on Intelligent Systems Design and Applications, Jinan, China, 2006, pp. 1095–1102.
[22] K. Wu, Z. Chen, and W. Li, A novel intrusion detection model for a massive network using convolutional neural networks, IEEE Access, vol. 6, pp. 50 850–50 859, 2018.
[23] M. E. Aminanto and K. Kim, Deep learning-based feature selection for intrusion detection system in transport layer, in Proc. of Summer Conference of Korea Information Security Society, Busan, Korea, 2016, pp. 535–538.
[24] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, A deep learning approach for network intrusion detection system, in Proc. of 9th EAI International Conference on Bio-inspired Information and Communications Technologies, Brussels, Belgium, 2016, pp. 21–26.
[25] F. Farahnakian and J. Heikkonen, A deep auto-encoder-based approach for intrusion detection system, in Proc. of 2018 20th International Conference on Advanced Communication Technology, Chuncheon-si Gangwon-do, South Korea, 2018, pp. 178–183.
[26] R. Salakhutdinov and G. Hinton, Deep boltzmann machines, in Proc. of Twelfth International Conference on Artificial Intelligence and Statistics, Clearwater, FL, USA, 2009, pp. 448–455.
[27] N. Gao, L. Gao, Q. Gao, and H. Wang, An intrusion detection model based on deep belief networks, in Proc. of IEEE 2014 Second International Conference on Advanced Cloud and Big Data, Huangshan, China, 2014, pp. 247–252.
[28] X. Zhang and J. Chen, Deep learning-based intelligent intrusion detection, in Proc. of 2017 IEEE 9th International Conference on Communication Software and Networks, Guangzhou, China, 2017, pp. 1133–1137.
[29] K. Alrawashdeh and C. Purdy, Toward an online anomaly intrusion detection system based on deep learning, in Proc. of 2016 15th IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, 2016, pp. 195–200.
[30] R. Vinayakumar, K. P. Soman, and P. Poornachandran, A comparative analysis of deep learning approaches for network intrusion detection systems (N-IDSs): Deep learning for N-IDSs, International Journal of Digital Crime and Forensics, vol. 11, no. 3, pp. 65–89, 2019.
[31] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, Deep learning approach for intelligent intrusion detection system, IEEE Access, vol. 7, pp. 41 525–41 550, 2019.
[32] R. Vinayakumar, K. P. Soman, and P. Poornachandran, Evaluation of recurrent neural network and its variants for intrusion detection system (IDS), International Journal of Information System Modeling and Design, vol. 8, no. 3, pp. 43–63, 2017.
[33] R. Vinayakumar, K. P. Soman, and P. Poornachandran, Evaluating effectiveness of shallow and deep networks to intrusion detection system, in Proc. of 2017 International Conference on Advances in Computing, Communications and Informatics, Manipal, India, 2017, pp. 1282–1289.
[34] R. Vinayakumar, K. P. Soman, and P. Poornachandran, Applying convolutional neural network for network intrusion detection, in Proc. of 2017 International Conference on Advances in Computing, Communications and Informatics, Manipal, India, 2017, pp. 1222–1228.
[35] M. H. Haghighat, Z. Abtahi Foroushani, and J. Li, SAWANT: Smart window-based anomaly detection using netflow traffic, in Proc. of 2019 IEEE 19th International Conference on Communication Technology, Xi'an, China, 2019, pp. 1396–1402.
[36] KDD CUP 1999 dataset, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, 1999.
[37] T. Janarthanan and S. Zargari, Feature selection in UNSW-NB15 and KDDCUP'99 datasets, in Proc. of 2017 IEEE 26th International Symposium on Industrial Electronics, Edinburgh, UK, 2017, pp. 1881–1886.
[38] R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. R. ...
Mohammad Hashem Haghighat received the BS degree in computer engineering from Shiraz Azad University, Shiraz, Iran in 2008, and the MS degree in computer engineering from Sharif University of Technology, Tehran, Iran in 2010. He is currently a PhD candidate at Tsinghua University, Beijing, China. His research interests include network security, intrusion detection systems, deep learning, and information forensics.

Jun Li received the PhD degree from New Jersey Institute of Technology (NJIT) in 1997, and the MEng and BEng degrees in automation from Tsinghua University in 1998 and 1985, respectively. He is currently a professor at the Department of Automation, Tsinghua University, and his research interests include network security and network automation.