TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 09/15 pp484–495
DOI: 10.26599/TST.2020.9010022
Volume 26, Number 4, August 2021

Intrusion Detection System Using Voting-Based Neural Network

Mohammad Hashem Haghighat and Jun Li

Abstract: Several security solutions have been proposed to detect abnormal network behavior. However, successful attacks are still a big concern in the computer society. Lots of security breaches, like Distributed Denial of Service (DDoS), botnets, spam, phishing, and so on, are reported every day, and the number of attacks is still increasing. In this paper, a novel voting-based deep learning framework, called VNN, is proposed to take advantage of any kind of deep learning structure. Considering several models created from different aspects of the data and various deep learning structures, VNN provides the ability to aggregate the best models in order to produce more accurate and robust results. Therefore, VNN helps security specialists to detect more complicated attacks. Experimental results over KDDCUP'99 and CTU-13, two well-known and widely employed datasets in the computer network area, revealed that the voting procedure was highly effective in increasing the system performance: the false alarms were reduced by up to 75% in comparison with the original deep learning models, including Deep Neural Network (DNN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU).

Key words: deep learning; Voting-based Neural Network (VNN); network security; Pearson correlation coefficient

Mohammad Hashem Haghighat and Jun Li are with the Department of Automation, Tsinghua University, Beijing 100084, China. E-mail: [email protected]; [email protected]. To whom correspondence should be addressed.
Manuscript received: 2020-03-24; revised: 2020-07-02; accepted: 2020-07-10
© The author(s) 2021. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

1 Introduction

Computer networks play an important role nowadays. Various internet-based services, like voice over IP, internet banking, Point to Point (P2P) file sharing, online gaming, and so on, are used every day. However, the number of network malicious activities is increasing dramatically[1]. According to McAfee, "ransomware attacks", a type of malware that blocks a user's access to the computer until a specific amount of money is paid, increased by 118% during 2019[2].

Dozens of behavior-based detection techniques have been proposed to protect networks from such attacks. The key challenge of these methods is to lower the false alarms using machine learning algorithms[3–14].

Nowadays, deep learning provides a suitable infrastructure to automatically learn features from raw data. This advantage enables scientists to employ deep learning techniques in different areas, like natural language processing, image and voice recognition, and computer networks.

Generally, various types of deep learning models have been developed, including Deep Neural Network (DNN), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), Boltzmann Machine (BM), and Stacked Auto-Encoder (SAE).

RNNs enable previous outputs to be used as part of the input of the next step, as depicted in Fig. 1. Since RNNs are suitable for time series data, they are widely utilized in network anomaly-based detection techniques in the literature.

Fig. 1 RNN architecture[15]: a chain of repeating units, where unit k receives input xk and the previous hidden state hk−1 and emits hidden state hk.

J. Kim and H. Kim[16] applied RNN to Intrusion Detection System (IDS) and achieved magnificent results on KDDCUP'99. They improved their method by employing Long Short-Term Memory (LSTM) as the learning engine, and the performance test showed the system was suitable for IDSes[17]. Yin et al.[18] and
Althubiti et al.[19] compared the performance of RNN with traditional machine learning methods, including naive Bayes, random forest, and Support Vector Machine (SVM), using KDDCUP'99 in both multi-class and binary classifiers, and revealed that RNN outperformed all the traditional methods. A Gated Recurrent Unit Recurrent Neural Network (GRU-RNN) was proposed by Tang et al.[20] with a performance of 89% on KDDCUP'99 using only 6 raw features.

CNN is a special deep learning architecture first developed for the image recognition problem. However, Yao et al.[21] proposed a CNN-based method to detect time-delayed attacks and reported that the method was highly accurate on the DARPA'98 dataset. Wu et al.[22] employed CNN to select traffic properties automatically from the raw dataset. They evaluated the method on KDDCUP'99 and argued that it performs better in terms of performance and false alarm rate than the conventional standard algorithms.

SAE is a specific type of neural network whose output has exactly the same size as its input. The main goal of SAE is to reconstitute the output from the input. Figure 2 depicts the SAE architecture, where the input is compressed and then decompressed to compute the output.

Fig. 2 SAE architecture: input layer, hidden layers (encoder, code, decoder), and output layer.

Aminanto and Kim[23] applied SAE as a classifier on the KDDCUP'99 dataset and presented four different IDSes: application layer IDS, transport layer IDS, network layer IDS, and data link layer IDS. Javaid et al.[24] used SAE to learn features from NSLKDD.

Farahnakian and Heikkonen[25] proposed Deep Auto-Encoder (DAE) to extract features from high dimensional data. They achieved more than 97% detection precision when using 10% of the KDDCUP'99 dataset as the test case.

BM is a type of stochastic RNN that makes decisions concerning being either on or off. BM provides the ability to simply learn systems and interesting features from datasets having binary labels[26].

A multi-layer Denial of Service (DoS) attack detection technique based on Deep Boltzmann Machine (DBM) was provided by Gao et al.[27] The authors argued that their method gained better precision on KDDCUP'99 compared to SVM and a simple Artificial Neural Network (ANN). Zhang and Chen[28] sped up the training time by combining SVM, BM, and Deep Belief Network (DBN). Alrawashdeh and Purdy[29] achieved 97.9% precision on 10% of the KDDCUP'99 dataset as the test case. Recently, Vinayakumar et al.[30–34] provided a comprehensive study of various CNN, LSTM, CNN-LSTM, CNN-GRU, and DNN structures to select the optimal network architecture using the KDDCUP'99 and NSLKDD datasets.

Haghighat et al.[35] also developed a sliding window-based deep learning technique (called SAWANT) which achieved 99.952% accuracy on the CTU-13 dataset. The authors used only 1%–10% of the CTU-13 dataset as training data to conduct their tests.

The aforementioned methods took advantage of deep learning to detect network malicious activities. Although their performance was considerable, aggregating different deep learning models provides the capability to utilize the strength of each model and detect attacks far more efficiently.

In this paper we propose the "Voting-based Neural Network (VNN)" as a general voting-based infrastructure to aggregate and take advantage of any kind of deep learning algorithm. In other words, several deep learning-based models can be created by state-of-the-art techniques with different performance. Given test data, VNN provides a procedure to perform a weighted voting function on the most suitable models to achieve more accurate results. Due to only selecting and
aggregating the best models for each test sample, VNN incredibly boosts the system accuracy. Experimental results proved our argument, as the false alarms were reduced by up to 75%.

Table 1 summarizes all the relevant acronyms employed throughout the paper.

The paper is structured as follows. In Section 2, an overview of VNN is given. Then, VNN is studied in depth on the two well-known KDDCUP'99 and CTU-13 datasets in Sections 3 and 4, using two different configurations: high and low accuracies, respectively. Finally, in Section 5, the paper is concluded and future research plans are explained.

2 Voting-Based Neural Network

Voting-based Neural Network is a general infrastructure that creates several models using different aspects of the data or various types of deep learning architectures, and merges them, aiming at increasing the system performance.
As illustrated in the VNN architecture (Fig. 3), several inputs are extracted from the original data to be modeled by various kinds of deep learning techniques, like DNN, CNN, RNN, SAE, and so on. As a result, in the prediction phase, a heuristic function, called the "voting engine", processes all the models to select the best candidates in a way that minimizes the errors. The chosen models then perform the voting procedure in order to predict the test data label. Algorithm 1 describes the whole VNN procedure in detail (a runnable sketch follows the algorithm).

Fig. 3 VNN architecture: m inputs are extracted from the input data by feature selection; each input feeds a group of deep learning models, and the voting engine aggregates their outputs into the final result.

Algorithm 1 VNN whole procedure
input1: traindata = {F1, F2, ..., Fl} //input train data
input2: testdata = {F1', F2', ..., Fl'} //input test data
        where Fi = {ai1, ai2, ..., aik} //k different attributes
input3: Θ = {θ1, θ2, ..., θn} //n different models
output: prediction result
1   //initialization
2   Ψ ← {} //empty set as training data
3   Ω ← {} //empty set as testing data
4   Δ ← {} //empty set as prediction results
5   Ξ ← {} //empty set as voting candidates
6   //selecting n different train and test feature
7   //vectors with randomly chosen attributes
8   for i ← range(1, n)
9       A ← randomly select i attributes
10      Ψi ← selectattributes(train, A)
11      Ωi ← selectattributes(test, A)
12  end
13  for each model θi, train data Ψj, and test data Ωj
14      modelij ← train(θi, Ψj) //train model
15      Δij ← predict(modelij, Ωj) //prediction result
16  end
17  Δ' ← select best voting candidates
18  result ← vote(Δ')
19  return result
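The following Python sketch captures the spirit of Algorithm 1 under simplifying assumptions: the estimators are scikit-learn-style objects with fit()/predict(), labels are integer-encoded, and the candidate selection of line 17 (Algorithm 2) is replaced by an unweighted majority vote for brevity. It is an illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.base import clone  # assumes scikit-learn-style estimators


def vnn_predict(base_models, X_train, y_train, X_test, n_subsets, seed=0):
    """Rough sketch of Algorithm 1: train every model on every random
    attribute subset, then majority-vote the predictions.

    base_models : list of untrained estimators with fit()/predict()
    X_train     : (N, k) training attributes; y_train: (N,) integer labels
    X_test      : (M, k) test attributes
    n_subsets   : number of random attribute subsets (n in Algorithm 1)
    """
    rng = np.random.default_rng(seed)
    k = X_train.shape[1]
    predictions = []                          # one vector per (model, subset)
    for i in range(1, n_subsets + 1):
        # Lines 8-11: randomly choose i attributes for subset i
        attrs = rng.choice(k, size=min(i, k), replace=False)
        for model in base_models:
            m = clone(model)                  # fresh copy for this subset
            m.fit(X_train[:, attrs], y_train)
            predictions.append(m.predict(X_test[:, attrs]))
    # Lines 17-18: the paper selects best candidates via Algorithm 2;
    # here, for brevity, all models vote with equal weight.
    preds = np.stack(predictions).astype(int)  # (n_models * n_subsets, M)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
```

In the full framework, the equal-weight vote on the last line would be replaced by the weighted vote over the candidates chosen by the voting engine.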
In the next two sections, different case studies on the well-known KDDCUP'99 and CTU-13 datasets are presented to make the voting procedure clearer.

Table 1 Acronyms used through the paper.
Acronym   Expression
VNN       Voting-based Neural Network
DDoS      Distributed Denial of Service
ANN       Artificial Neural Network
DNN       Deep Neural Network
CNN       Convolutional Neural Network
RNN       Recurrent Neural Network
LSTM      Long Short-Term Memory
GRU       Gated Recurrent Unit
BM        Boltzmann Machine
SAE       Stacked Auto-Encoder
SVM       Support Vector Machine
P2P       Point to Point

3 Case Study 1: KDDCUP'99

KDDCUP'99[36] is the most widely used dataset to evaluate anomaly-based detection systems[37]. The dataset was built based on the DARPA'98 project[38] and contains about 4.9 million records with 41 different features, labeled as normal or as one of four attack types (denial of service, user to root, remote to local, and probing). Later, Tavallaee et al.[39] removed the duplicated records of KDDCUP'99 to create the NSLKDD dataset. Figure 4 shows the evolution of the NSLKDD dataset[40].
Fig. 4 Evolution of the NSLKDD dataset: DARPA raw TCP/IP dump files are turned into KDDCUP'99 by extracting features, and into NSLKDD by removing duplicates and reducing size.

3.1 Voting procedure

The main idea behind VNN is to build a general infrastructure that creates several models using different deep learning approaches or data aspects and then, given a test sample, selects those models that are most likely to find the accurate label.

Definition 1 Let n be the number of models. The uncertainty factor γi of the i-th model is defined according to the following equation:

    γi = 1 − ρi    (1)

where ρi is the probability of the output layer achieved by the i-th model.

The procedure below is defined to select the k best candidate models for the voting procedure (a runnable sketch follows Algorithm 2):
• Considering ζi as the accuracy of the i-th model reported by the system training procedure, normalize all ζ values according to the "normal distribution equation" provided by Eq. (2)[41]:

    f(x; μ, σ) = (1 / (√(2π)·σ)) · e^(−(x−μ)² / (2σ²))    (2)

• Assuming λ as the "Unsatisfied Models Threshold (UMT)", remove all the models whose normalized accuracy is less than λ.
• Consider the "Total Uncertainty (TU)" threshold as ε.
• Sort all the remaining models based on their uncertainty factors in ascending order.
• Select models as long as the total sum of uncertainty factors (γi) remains less than ε.
• Perform the majority voting mechanism on the selected models.
Algorithm 2 describes the procedure in detail.

Algorithm 2 Multi-class output best model selection
input1: Γ = {γ1, γ2, ..., γn} //uncertainty factors of n models
input2: Z = {ζ1, ζ2, ..., ζn} //accuracy of the models
input3: ε //total uncertainty threshold
input4: λ //unsatisfied model threshold
output: Δ, the set of k best models
1   //initializing the voting parameters
2   E ← 0 //total sum of uncertainty factors
3   Δ ← {} //initializing the output
4   M ← {} //initializing the set of satisfied models
5   for each ζi ∈ Z
6       ni ← normalize(ζi)
7       if ni > λ
8           //adding corresponding uncertainty factor to M
9           M ← add(γi)
10      end
11  end
12  Msorted ← sort(M) //sorting the models
13  while true
14      E ← E + pop(Msorted)
15      if E > ε
16          break
17      else
18          add corresponding model to Δ
19      end
20  end
21  return Δ
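A compact Python rendering of this selection, as a sketch rather than the authors' code: accuracies are normalized here with a simple min-max rule standing in for the Eq. (2)-based normalization, and the thresholds correspond to UMT (λ) and TU (ε).

```python
import numpy as np


def select_best_models(uncertainty, accuracy, umt=0.7, tu=0.5):
    """Sketch of Algorithm 2: choose the voting candidates.

    uncertainty : gamma_i = 1 - rho_i per model (Eq. (1))
    accuracy    : zeta_i reported during training
    umt         : unsatisfied-model threshold (lambda)
    tu          : total uncertainty threshold (epsilon)
    """
    gamma = np.asarray(uncertainty, dtype=float)
    acc = np.asarray(accuracy, dtype=float)
    # Min-max normalization as a stand-in for the Eq. (2) normalization
    norm = (acc - acc.min()) / (acc.max() - acc.min() + 1e-12)
    satisfied = np.flatnonzero(norm > umt)           # lines 5-11
    order = satisfied[np.argsort(gamma[satisfied])]  # ascending uncertainty
    chosen, total = [], 0.0
    for i in order:                                  # lines 13-20
        total += gamma[i]
        if total > tu:
            break
        chosen.append(i)
    return chosen  # indices of the k best models

# Example: five hypothetical models with the Table 2 thresholds
print(select_best_models([0.05, 0.30, 0.10, 0.02, 0.40],
                         [0.99, 0.80, 0.97, 0.99, 0.60]))  # -> [3, 0, 2]
```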
3.2 Experimental result

Several test cases were conducted on KDDCUP'99 using different deep learning architectures, including CNN, LSTM, GRU, CNN-LSTM, and DNN models. In order to highlight the efficiency of the voting mechanism, we configured the hyper parameters of these deep learning techniques using two different approaches to see the impact of the voting procedure in different situations:
(1) achieving highly accurate results (performance > 99%);
(2) having lots of false alarms (55% < performance < 80%).
Table 2 describes the models' hyper parameter configuration in detail.

Table 2 Hyper parameters used to test KDDCUP'99.
Hyper parameter              Value
Train size                   90%
Test size                    10%
Dropout                      0.5
Batch input                  On
Activation function          ReLU
Layers number of CNN         4
Layers number of LSTM        2
Layers number of CNN-LSTM    4
Layers number of DNN         2
Layers number of GRU         2
Number of input attributes   37
Number of input subsets      38
Output                       Binary and five-class
UMT                          0.7
TU                           0.5

Generally, 90% of the KDDCUP'99 dataset was chosen to train the models, while the remaining 10% was used for testing. In addition, 38 different training and testing datasets were generated from the input data, each including 37 random KDDCUP'99 attributes. We conducted binary classification as our highly accurate test, while the less accurate tests were performed with a five-class classifier. Figure 5 depicts the accuracy reported by the system during the training phase.

As illustrated in Fig. 5, 0.7 was chosen for UMT, and all the models with normalized accuracy values less than UMT were removed.
Fig. 5 Normalized form of model accuracy vs. number of attributes (the blue dashed lines show UMT): (a) CNN–binary, (b) CNN–five-class, (c) DNN–binary, (d) DNN–five-class, (e) LSTM–binary, (f) LSTM–five-class, (g) CNN-LSTM–binary, (h) CNN-LSTM–five-class, (i) GRU–binary, (j) GRU–five-class.

The voting procedure was conducted over the remaining models, and the result is depicted in Fig. 6. The results proved that VNN increased the true responses magnificently in both the more and the less accurate deep learning structures. VNN resolved 708 errors out of 1804 (more than 39%) for the binary classification-based GRU architecture, and 63 675 false alarms out of about 85 000 (around 75%) for the five-class classification-based CNN-LSTM models. The detailed numbers of false alarms and their correction rates are given in Table 3.
Fig. 6 System accuracy: voting-based vs. normal-based using the KDDCUP'99 dataset. (a) Binary classification; (b) five-class classification.

Table 3 KDDCUP'99 error correction.
Classification   Method     Number of errors   Number of corrections   Correction rate (%)
Binary           DNN        777                29                      3.73
Binary           CNN        872                97                      11.12
Binary           LSTM       1551               551                     35.53
Binary           CNN-LSTM   993                148                     14.90
Binary           GRU        1804               708                     39.25
Five-class       DNN        205 439            25 497                  12.41
Five-class       CNN        205 306            7463                    3.64
Five-class       LSTM       208 849            81 263                  38.90
Five-class       CNN-LSTM   85 068             63 675                  74.85
Five-class       GRU        208 513            28 374                  13.61

We also performed the voting procedure over all the models created by all the deep architectures; the resulting performance is summarized in Tables 4 and 5.

Table 4 KDDCUP'99 binary classification confusion matrix.
                        Predicted
Actual        Normal     Malicious   Total
Normal        301 031    203         301 234
Malicious     488        188 121     188 609
Total         301 519    188 324     489 843

Table 5 KDDCUP'99 five-class classification confusion matrix.
                        Predicted
Actual      Normal    DoS       R2L      U2R   Probing   Total
Normal      277 269   219       20 608   0     0         298 096
DoS         490       188 107   5        7     0         188 609
R2L         0         62        3060     0     0         3122
U2R         0         1         0        0     0         1
Probing     0         14        1        0     0         15
Total       277 759   188 403   23 674   7     0         489 843

Different measurements of the experiment, including False Positive Rate (FPR), False Negative Rate (FNR), Accuracy, Precision, Recall, and F_Score, are computed in Table 6. These values were obtained by Eqs. (3)–(8):

    FPR = FP / (FP + TN)    (3)
    FNR = FN / (FN + TP)    (4)
    Precision = TP / (TP + FP)    (5)
    Recall = TP / (TP + FN)    (6)
    Accuracy = (TP + TN) / (All data)    (7)
    F_Score = 2 · (Precision · Recall) / (Precision + Recall)    (8)

where FP, FN, TP, and TN denote False Positive, False Negative, True Positive, and True Negative, respectively.

Table 6 Measurement result of KDDCUP'99 study.
                            FPR      FNR      Accuracy   Precision   Recall   F_Score
Binary classification       0.0011   0.0016   0.9986     0.9993      0.9984   0.9989
Five-class classification   0.0982   0.0021   0.9563     0.9302      0.9979   0.9628
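For concreteness, the sketch below computes these measures from confusion-matrix counts. The example call uses the binary counts of Table 4 with "normal" taken as the positive class; the paper does not state which class it treats as positive, so the derived rates may differ slightly from the rounded figures in Table 6.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the measures defined in Eqs. (3)-(8)."""
    fpr = fp / (fp + tn)                                     # Eq. (3)
    fnr = fn / (fn + tp)                                     # Eq. (4)
    precision = tp / (tp + fp)                               # Eq. (5)
    recall = tp / (tp + fn)                                  # Eq. (6)
    accuracy = (tp + tn) / (tp + tn + fp + fn)               # Eq. (7)
    f_score = 2 * precision * recall / (precision + recall)  # Eq. (8)
    return {"FPR": fpr, "FNR": fnr, "Accuracy": accuracy,
            "Precision": precision, "Recall": recall, "F_Score": f_score}

# Table 4 counts, assuming "normal" is the positive class:
print(classification_metrics(tp=301_031, tn=188_121, fp=488, fn=203))
```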
The results proved that VNN efficiently achieved higher accuracy than any of the individual deep learning structures, for both the binary and the five-class classifiers. Figure 7 compares VNN with the DNN, CNN, LSTM, CNN-LSTM, and GRU methods.

Fig. 7 VNN vs. other deep learning architectures. (a) Binary classification; (b) five-class classification.

4 Case Study 2: CTU-13

CTU-13 contains thirteen days of labeled traffic, captured by CTU University, Czech Republic in 2011[42]. It has about twenty million netflow records, including Internet Relay Chat (IRC), P2P, HTTP, fast flux, spam, click fraud, port scan, and DDoS traffic. The goal of CTU-13 is to provide large real botnet traffic mixed with normal user activities in the network. Table 7 describes the distribution of labels in the netflow traffic per day.

Table 7 CTU-13 label distribution.
Day   Number of flows (million)   Botnet (%)   Normal (%)   Command and control (%)   Background (%)
1     2.82                        1.41         1.07         0.030                     97.47
2     1.81                        1.04         0.50         0.110                     98.33
3     4.71                        0.56         2.48         0.001                     96.94
4     1.21                        0.15         2.25         0.004                     97.58
5     0.13                        0.53         3.6          1.150                     95.70
6     0.56                        0.79         1.34         0.030                     97.83
7     0.11                        0.03         1.47         0.020                     98.47
8     2.95                        0.17         2.46         2.400                     97.32
9     2.75                        6.50         1.57         0.180                     91.70
10    1.31                        8.11         1.20         0.002                     90.67
11    0.11                        7.60         2.53         0.002                     89.85
12    0.33                        0.65         2.34         0.007                     96.99
13    1.93                        2.01         1.65         0.060                     96.26

4.1 Deep learning models

Netflow traffic contains high-level network activity information, including source IP/port numbers, destination IP/port numbers, protocol, Transmission Control Protocol (TCP) flags, flow duration, flow size, number of packets, input and output Simple Network Management Protocol (SNMP) interfaces, and the next hop router. These attributes are too simple to be used directly in a deep learning method to detect network attacks. As a result, Haghighat et al.[35] developed a sliding window-based technique, called SmArt Window-based Anomaly detection using Netflow Traffic (SAWANT), which aggregates netflow records and extracts several meaningful attributes using a sliding window algorithm. Using a small training subset of the netflow records (one to ten percent), SAWANT was able to achieve highly accurate models, which is its main contribution. As illustrated in Fig. 8, new feature vectors were extracted from the netflow traffic according to the following procedure (a sketch of the extraction follows Fig. 8), and the label of each vector, called the malicious rate, describes how many of the aggregated records were abnormal.

(1) Slide a window of size w through the netflow records.
(2) For each position of the window, calculate these attributes:
• the number of unique values of source IP/port, destination IP/port, duration, source bytes, number of packets, and flow size per incoming and outgoing flows;
• the entropy values of source IP/port, destination IP/port, duration, source bytes, number of packets, and flow size per incoming and outgoing flows;
• the minimum, maximum, average, sum, and variance of duration, source bytes, number of packets, and flow size per incoming, outgoing, and total flows;
• the malicious rate (ρ), used as the label of each vector, based on Eq. (9):

    ρ = (Number of malicious netflow records) / (Window size)    (9)

Fig. 8 SAWANT window-based feature extraction procedure: a window slides over the netflow records Netflow1, ..., Netflown, and each window position i yields a feature vector (fi1, fi2, ..., fim) with malicious rate ρi.
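A condensed Python sketch of this extraction, under stated simplifications: the netflow records are dicts with illustrative field names (src_ip, dst_ip, duration, bytes, packets, label), and the per-direction (incoming/outgoing/total) breakdown is collapsed into a single pass for brevity. It is not the SAWANT implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def window_features(window):
    """Build one SAWANT-style feature vector plus its malicious-rate label.

    window: list of netflow records as dicts with illustrative keys
            'src_ip', 'dst_ip', 'duration', 'bytes', 'packets', 'label'.
    """
    feats = {}
    for field in ("src_ip", "dst_ip", "duration", "bytes", "packets"):
        col = [rec[field] for rec in window]
        feats[f"uniq_{field}"] = len(set(col))   # unique-value count
        feats[f"ent_{field}"] = entropy(col)     # entropy value
    for field in ("duration", "bytes", "packets"):
        col = [rec[field] for rec in window]
        mean = sum(col) / len(col)
        feats[f"min_{field}"] = min(col)
        feats[f"max_{field}"] = max(col)
        feats[f"avg_{field}"] = mean
        feats[f"sum_{field}"] = sum(col)
        feats[f"var_{field}"] = sum((x - mean) ** 2 for x in col) / len(col)
    # Malicious rate, Eq. (9): fraction of malicious records in the window
    rho = sum(rec["label"] == "malicious" for rec in window) / len(window)
    return feats, rho


def sliding_windows(records, w):
    """Step (1): slide a window of size w through the netflow records."""
    for start in range(len(records) - w + 1):
        yield window_features(records[start:start + w])
```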
The new feature vectors were used to train an ANN model, as depicted in Fig. 9, where the output layer expresses the malicious rate.

Fig. 9 SAWANT architecture: netflow traffic is pre-processed into new feature vectors F1, F2, ..., Fn, which feed a network with input, hidden, and output layers.

The results on the test dataset were compared with the actual malicious rate values using the "Pearson correlation coefficient" function, described by Eq. (10):

    r(X,Y) = (E[XY] − E[X]E[Y]) / (√(E[X²] − E[X]²) · √(E[Y²] − E[Y]²))
           = (Σᵢ xᵢyᵢ − n·x̄·ȳ) / (√(Σᵢ xᵢ² − n·x̄²) · √(Σᵢ yᵢ² − n·ȳ²))    (10)

where X and Y are two different variable sets.

Definition 2 Let X and Y be two different data series. X and Y are positively correlated (r = 1) if

    ∀ xᵢ ∈ X, yᵢ ∈ Y : yᵢ = α·xᵢ + β

where α and β are two arbitrary numbers.
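A small self-contained helper implementing Eq. (10), together with a check of Definition 2 on a hypothetical series y = 2x + 5:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r(X, Y), Eq. (10)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * mx * my
    den = (math.sqrt(sum(xi * xi for xi in x) - n * mx * mx)
           * math.sqrt(sum(yi * yi for yi in y) - n * my * my))
    return num / den

# Definition 2: a series that is an affine image of another gives r = 1.
x = [1.0, 2.0, 3.0, 4.0]
print(pearson(x, [2 * xi + 5 for xi in x]))  # -> 1.0 (alpha = 2, beta = 5)
```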
4.2 Voting procedure

As described in the previous section, a ranking mechanism is defined in order to select a subset of more probable models and achieve more accurate results in the voting procedure. In a classification setting (like Case Study 1, with "malicious" and "benign" classes), the more decisive the selected models, the more accurate the vote is likely to be. However, the main challenge of SAWANT is that its predicted malicious rate is numerical (not categorical). In fact, the SAWANT predictions are never exactly equal to the actual values, which makes finding the more decisive models impossible. Therefore, the majority voting procedure explained in Section 3.1 is not practical here. As a result, we developed a new heuristic procedure to rank and select better models for an arbitrary test case t (a sketch follows Algorithm 3):
• Normalize the accuracy of all the models according to Eq. (2) and remove the less accurate models based on UMT.
• Compute the sum of the Pearson correlation coefficients (r) of each predicted model with all the others.
• Sort the models based on the computed value and remove the last 50% of the models.
• For each two remaining predicted sets Si and Sj:
  – Compute α as the Pearson correlation coefficient of Si and Sj, r(Si, Sj).
  – Remove t from both Si and Sj, and compute β as the Pearson correlation coefficient of the two reduced sets, r(Si − {t}, Sj − {t}).
  – Compare the Pearson correlation coefficients calculated in the two steps above.
  – Mark Si and Sj as being similar for test case t if α is greater than β.
• Put similar models into a single set.
• Return the largest set as the voting candidate.
• Compute the result based on the majority voting schema over the parties inside the selected set.
Algorithm 3 describes the model selection procedure in detail.
Algorithm 3 SAWANT best model selection procedure
input1: Γ = {γ1, γ2, ..., γn} //predicted malicious rates set
        where γi = {pmri1, pmri2, ..., pmrim} //predicted malicious rates of m test cases
input2: Z = {ζ1, ζ2, ..., ζn} //accuracy of the models
input3: λ //unsatisfied model threshold
input4: pivot
output: a set of k best γ of the test case pivot
1   //initializing the voting parameters
2   E ← 0 //total sum of uncertainty factors
3   Δ ← {} //initializing the output
4   M ← {} //initializing the set of satisfied models
5   for each ζi ∈ Z
6       ni ← normalize(ζi)
7       if ni > λ
8           //adding corresponding predicted set to M
9           M ← add(γi)
10      end
11  end
12  for each γi, γj ∈ M
13      δγi ← δγi + r(γi, γj) //r is the correlation coefficient
14  end
15  Δsorted ← sort(Δ)
16  Γ' ← retain the top 50% of Γ based on Δsorted
17  for each γi, γj ∈ Γ'
18      r ← r(γi, γj)
19      r' ← r(γi − {pmri,pivot}, γj − {pmrj,pivot})
20      if r is greater than r'
21          θi,j ← 1
22      else
23          θi,j ← 0
24      end
25  end
26  partition Γ' based on Θ
27  return the largest partition as the voting candidate
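The sketch below mirrors this selection in Python, reusing the pearson helper defined after Eq. (10). The accuracy normalization is a min-max stand-in for Eq. (2), and the final partitioning step (line 26) is simplified to merging mutually similar models, so it should be read as an illustration rather than the authors' code.

```python
from itertools import combinations
# uses pearson() from the sketch after Eq. (10)


def sawant_select(pred_sets, accuracies, pivot, umt=0.8):
    """Sketch of Algorithm 3: pick voting candidates for one test case.

    pred_sets  : list of predicted malicious-rate lists, one per model
    accuracies : training accuracy of each model
    pivot      : index of the test case under consideration
    """
    # Drop models below the unsatisfied-model threshold (lines 5-11);
    # min-max normalization stands in for the Eq. (2) normalization.
    amin, amax = min(accuracies), max(accuracies)
    kept = [i for i, a in enumerate(accuracies)
            if (a - amin) / (amax - amin + 1e-12) > umt]
    # Rank by summed correlation with the others, keep top 50% (lines 12-16)
    score = {i: sum(pearson(pred_sets[i], pred_sets[j])
                    for j in kept if j != i) for i in kept}
    kept = sorted(kept, key=score.get, reverse=True)[:max(1, len(kept) // 2)]
    # Mark pairs whose correlation drops once the pivot is removed (17-25)
    groups = {i: {i} for i in kept}
    drop = lambda s: s[:pivot] + s[pivot + 1:]   # remove the pivot test case
    for i, j in combinations(kept, 2):
        alpha = pearson(pred_sets[i], pred_sets[j])
        beta = pearson(drop(pred_sets[i]), drop(pred_sets[j]))
        if alpha > beta:                          # similar for this test case
            groups[i].add(j)
            groups[j].add(i)
    # Return the largest similarity set as the voting candidates (26-27)
    return max(groups.values(), key=len)
```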

ractical 0.4
Table 8 Hyper parameters used to test CTU-13. 0.3
Hyper parameter Value 0.2

Train size 10% 0.1

Test size 90% 0


1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73
Dropout 0.2 Number of features

Batch input On (c) LSTM–normalized form of accuracy


Activation function Relu 1.0

Number of CNN layers 4 0.9

Number of LSTM layers 2 0.8

Number of DNN layers 2 0.7

0.6
Number of GRU layers 2
Accuracy

0.5
Number of input attributes 72
0.4
Number of input subsets 73 0.3
Output Malicious rate 0.2
UMT 0.8 0.1

TU 0.5 0
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73
Number of features

and GRU). VNN decreased false alarms significantly, (d) GRU–normalized form of accuracy
especially for LSTM and DNN methods, where 272 507
Fig. 10 Model accuracy reported by the system during the
out of 668 597 errors (around 40%) and 12 418 out of
training phase.
17 112 errors (about 72%) were corrected, respectively.
Table 9 expresses the detail of error correction over CTU- Tables 10 and 11 also summarized VNN performance
13 dataset. over DNN as the best suited model in our case study.
Fig. 11 System accuracy: voting-based vs. normal-based using the CTU-13 dataset.

Table 9 CTU-13 error correction.
Method   Number of errors   Number of corrections   Correction rate (%)
DNN      17 112             12 418                  72.57
CNN      76 523             8251                    10.78
LSTM     668 597            272 507                 40.74
GRU      630 541            90 902                  14.42

Table 10 CTU-13 confusion matrix.
                        Predicted
Actual        Normal      Malicious   Total
Normal        2 103 058   254         2 103 312
Malicious     767         145 921     146 688
Total         2 103 825   146 175     2 250 000

Table 11 Measurement result of CTU-13 study.
FPR      FNR      Accuracy   Precision   Recall   F_Score
0.0017   0.0004   0.9995     0.9999      0.9996   0.9998

5 Conclusion

This paper presented a novel voting-based deep learning framework, called VNN, to correct the false alarms reported by other deep learning structures and increase the system performance. The key novelty of VNN is the ability to create several models using various kinds of deep learning structures and different aspects of the data, and then to choose the best models to achieve higher accuracy. Experimental results revealed that VNN is highly effective for any kind of deep learning structure with various hyper parameters, correcting up to 75% of the false labels.

Although VNN provides highly accurate prediction, creating several models is a really time-consuming procedure. In fact, 190 different models were created for each of the binary and five-class classification problems over the KDDCUP'99 dataset, and 292 models were generated on CTU-13. In the future, we plan to overcome this issue by developing a heuristic function that ignores generating less effective models in advance. In addition, taking feedback from the candidates and utilizing the results to create more robust deep learning architectures is another direction for future work. Deeper analysis of different attack types (e.g., those provided in KDDCUP'99: DoS, R2L, U2R, and probing) will give us suitable feedback to create more robust models. The proposed method missed the U2R and probing attacks; however, the number of samples was too small. We plan to address this issue in the future.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 61872212) and the National Key Research and Development Program of China (No. 2016YFB1000102).

References

[1] Sophos 2020 threat report, https://www.sophos.com/en-us/medialibrary/pdfs/technical-papers/sophoslabs-uncut-2020-threat-report.pdf, 2020.
[2] McAfee labs threats report, https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-aug-2019.pdf, 2019.
[3] S. Behal, K. Kumar, and M. Sachdeva, D-FACE: An anomaly-based distributed approach for early detection of DDoS attacks and flash events, Journal of Network and Computer Applications, vol. 111, pp. 49–63, 2018.
[4] O. Elejla, B. Belaton, M. Anbar, and A. Alnajjar, Intrusion detection systems of ICMPv6-based DDoS attacks, Neural Computing and Applications, vol. 30, no. 1, pp. 45–56, 2018.
[5] M. H. Haghighat and J. Li, Edmund: Entropy based attack detection and mitigation engine using netflow data, in Proc. of 8th International Conference on Communication and Network Security, Chengdu, China, 2018, pp. 1–6.
[6] M. Idhammad, K. Afdel, and M. Belouch, Semi-supervised machine learning approach for DDoS detection, Applied Intelligence, vol. 48, no. 10, pp. 3193–3208, 2018.
[7] D. S. Terzi, R. Terzi, and S. Sagiroglu, Big data analytics for network anomaly detection from netflow data, in Proc. of 2017 International Conference on Computer Science and Engineering, Antalya, Turkey, 2017, pp. 592–597.
[8] J. M. Vidal, A. L. S. Orozco, and L. J. G. Villalba, Adaptive artificial immune networks for mitigating DoS flooding attacks, Swarm and Evolutionary Computation, vol. 38, pp. 94–108, 2018.
[9] R. Wang, Z. Jia, and L. Ju, An entropy-based distributed DDoS detection mechanism in software-defined networking, in Proc. of 2015 IEEE Trustcom/BigDataSE/ISPA, Helsinki, Finland, 2015, pp. 310–317.
[10] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, Multi-classification approaches for classifying mobile app traffic, Journal of Network and Computer Applications, vol. 103, pp. 131–145, 2018.
[11] M. Lotfollahi, M. J. Siavoshani, R. S. Hosseinzade, and M. S. Saberian, Deep packet: A novel approach for encrypted traffic classification using deep learning, Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[12] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapè, MIMETIC: Mobile encrypted traffic classification using multimodal deep learning, Computer Networks, vol. 165, pp. 1186–1191, 2019.
[13] N. Mansouri and M. Fathi, Simple counting rule for optimal data fusion, in Proc. of 2003 IEEE Conference on Control Applications, Istanbul, Turkey, 2003, pp. 1186–1191.
[14] D. Ciuonzo, A. De Maio, and P. S. Rossi, A systematic framework for composite hypothesis testing of independent Bernoulli trials, IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1249–1253, 2015.
[15] A. Khan and F. Zhang, Using recurrent neural networks (RNNs) as planners for bio-inspired robotic motion, in Proc. of 2017 IEEE Conference on Control Technology and Applications, Mauna Lani, HI, USA, 2017, pp. 1025–1030.
[16] J. Kim and H. Kim, Applying recurrent neural network to intrusion detection with Hessian free optimization, in Proc. of 2015 International Workshop on Information Security Applications, Jeju Island, Korea, 2015, pp. 357–369.
[17] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, Long short term memory recurrent neural network classifier for intrusion detection, in Proc. of 2016 International Conference on Platform Technology and Service, Jeju South, Korea, 2016, pp. 1–5.
[18] C. Yin, Y. Zhu, J. Fei, and X. He, A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access, vol. 5, pp. 21 954–21 961, 2017.
[19] S. Althubiti, W. Nick, J. Mason, X. Yuan, and A. Esterline, Applying long short-term memory recurrent neural network for intrusion detection, in Proc. of IEEE SoutheastCon 2018, St. Petersburg, FL, USA, 2018, pp. 1–5.
[20] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, Deep recurrent neural network for intrusion detection in SDN-based networks, in Proc. of 2018 4th IEEE Conference on Network Softwarization and Workshops, Montreal, Canada, 2018, pp. 202–206.
[21] Y. Yao, Y. Wei, F. Gao, and G. Yu, Anomaly intrusion detection approach using hybrid MLP/CNN neural network, in Proc. of Sixth International Conference on Intelligent Systems Design and Applications, Jinan, China, 2006, pp. 1095–1102.
[22] K. Wu, Z. Chen, and W. Li, A novel intrusion detection model for a massive network using convolutional neural networks, IEEE Access, vol. 6, pp. 50 850–50 859, 2018.
[23] M. E. Aminanto and K. Kim, Deep learning-based feature selection for intrusion detection system in transport layer, in Proc. of Summer Conference of Korea Information Security Society, Busan, Korea, 2016, pp. 535–538.
[24] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, A deep learning approach for network intrusion detection system, in Proc. of 9th EAI International Conference on Bio-inspired Information and Communications Technologies, Brussels, Belgium, 2016, pp. 21–26.
[25] F. Farahnakian and J. Heikkonen, A deep auto-encoder-based approach for intrusion detection system, in Proc. of 2018 20th International Conference on Advanced Communication Technology, Chuncheon-si Gangwon-do, South Korea, 2018, pp. 178–183.
[26] R. Salakhutdinov and G. Hinton, Deep Boltzmann machines, in Proc. of Twelfth International Conference on Artificial Intelligence and Statistics, Clearwater, FL, USA, 2009, pp. 448–455.
[27] N. Gao, L. Gao, Q. Gao, and H. Wang, An intrusion detection model based on deep belief networks, in Proc. of IEEE 2014 Second International Conference on Advanced Cloud and Big Data, Huangshan, China, 2014, pp. 247–252.
[28] X. Zhang and J. Chen, Deep learning-based intelligent intrusion detection, in Proc. of 2017 IEEE 9th International Conference on Communication Software and Networks, Guangzhou, China, 2017, pp. 1133–1137.
[29] K. Alrawashdeh and C. Purdy, Toward an online anomaly intrusion detection system based on deep learning, in Proc. of 2016 15th IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, 2016, pp. 195–200.
[30] R. Vinayakumar, K. P. Soman, and P. Poornachandran, A comparative analysis of deep learning approaches for network intrusion detection systems (N-IDSs): Deep learning for N-IDSs, International Journal of Digital Crime and Forensics, vol. 11, no. 3, pp. 65–89, 2019.
[31] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, Deep learning approach for intelligent intrusion detection system, IEEE Access, vol. 7, pp. 41 525–41 550, 2019.
[32] R. Vinayakumar, K. P. Soman, and P. Poornachandran, Evaluation of recurrent neural network and its variants for intrusion detection system (IDS), International Journal of Information System Modeling and Design, vol. 8, no. 3, pp. 43–63, 2017.
[33] R. Vinayakumar, K. P. Soman, and P. Poornachandran, Evaluating effectiveness of shallow and deep networks to intrusion detection system, in Proc. of 2017 International Conference on Advances in Computing, Communications and Informatics, Manipal, India, 2017, pp. 1282–1289.
[34] R. Vinayakumar, K. P. Soman, and P. Poornachandran, Applying convolutional neural network for network intrusion detection, in Proc. of 2017 International Conference on Advances in Computing, Communications and Informatics, Manipal, India, 2017, pp. 1222–1228.
[35] M. H. Haghighat, Z. Abtahi Foroushani, and J. Li, SAWANT: Smart window-based anomaly detection using netflow traffic, in Proc. of 2019 IEEE 19th International Conference on Communication Technology, Xi'an, China, 2019, pp. 1396–1402.
[36] KDD CUP 1999 dataset, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, 1999.
[37] T. Janarthanan and S. Zargari, Feature selection in UNSW-NB15 and KDDCUP'99 datasets, in Proc. of 2017 IEEE 26th International Symposium on Industrial Electronics, Edinburgh, UK, 2017, pp. 1881–1886.
[38] R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. R. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, et al., Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation, in Proc. of DARPA Information Survivability Conference and Exposition, Hilton Head, SC, USA, 2000, pp. 12–26.
[39] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, A detailed analysis of the KDDCUP'99 dataset, in Proc. of 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, Canada, 2009, pp. 1–6.
[40] A. Özgür and H. Erdem, A review of KDD'99 dataset usage in intrusion detection and machine learning between 2010 and 2015, doi: 10.7287/PEERJ.PREPRINTS.1954.
[41] S. J. Finney and C. DiStefano, Non-normal and categorical data in structural equation modeling, in Structural Equation Modeling: A Second Course, vol. 10, no. 6, pp. 269–314, 2006.
[42] CTU-13 botnet traffic dataset, https://mcfp.weebly.com/, 2011.

Mohammad Hashem Haghighat received the BS degree in computer engineering from Shiraz Azad University, Shiraz, Iran in 2008, and the MS degree in computer engineering from Sharif University of Technology, Tehran, Iran in 2010. He is currently a PhD candidate at Tsinghua University, Beijing, China. His research interests include network security, intrusion detection systems, deep learning, and information forensics.

Jun Li received the PhD degree from New Jersey Institute of Technology (NJIT) in 1997, and the MEng and BEng degrees in automation from Tsinghua University in 1998 and 1985, respectively. He is currently a professor at the Department of Automation, Tsinghua University, and his research interests include network security and network automation.
