
Hindawi

Security and Communication Networks


Volume 2020, Article ID 8850550, 20 pages
https://doi.org/10.1155/2020/8850550

Research Article
HYBRID-CNN: An Efficient Scheme for Abnormal Flow Detection in the SDN-Based Smart Grid

Pengpeng Ding, Jinguo Li, Liangliang Wang, Mi Wen, and Yuyao Guan


College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China

Correspondence should be addressed to Jinguo Li; [email protected]

Received 9 April 2020; Revised 5 July 2020; Accepted 21 July 2020; Published 3 August 2020

Academic Editor: Yin Zhang

Copyright © 2020 Pengpeng Ding et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Software-Defined Network (SDN) can improve the performance of the power communication network and, through its centralized management, better meet the control demands of the Smart Grid. Unfortunately, the SDN controller is vulnerable to many potential network attacks, so the accurate detection of abnormal flow is especially important for the security and reliability of the Smart Grid. Prior works were based on traditional machine learning methods, such as Support Vector Machine and Naive Bayes; they perform only simple, shallow feature learning and achieve low accuracy on large, high-dimensional network flows. More recent works based on Long Short-Term Memory (LSTM) show excellent ability in network flow analysis, but they cannot extract deep features from network flow, which again limits accuracy. To address these problems, we propose a Hybrid Convolutional Neural Network (HYBRID-CNN) method. Specifically, the HYBRID-CNN utilizes a Deep Neural Network (DNN) to effectively memorize global features from one-dimensional (1D) data and utilizes a CNN to generalize local features from two-dimensional (2D) data. The proposed method is evaluated by experiments on the UNSW_NB15 and KDDCup 99 datasets. The experimental results show that the HYBRID-CNN significantly outperforms existing methods in terms of accuracy and False Positive Rate (FPR), demonstrating that it can effectively detect abnormal flow in the SDN-based Smart Grid.

1. Introduction

The Smart Grid is a grid system with automatic control and self-protection adjustment capabilities [1]. It is supported by information and communication technology to achieve reliability, security, and real-time requirements [2, 3]. The emerging network architecture Software-Defined Network (SDN) discards the rigid hardware structure of the traditional network: it separates the control plane from the data plane and directly implements the virtualized configuration of the switch. It is especially suitable for the mobile communication network, wired interconnection network, and sensor network in the Smart Grid [4]. The SDN improves the data transmission capability and network compatibility of the Smart Grid, but it also brings new security issues. With highly centralized network control, the damage caused by network abnormal flow intrusion has increased significantly [5]. As the control center of the whole network, the SDN itself may be the target of various attacks, such as DDoS, fake flow, breakthroughs in switches, and attacks on the control layer. The destruction of the SDN can paralyze all switches under its control, and such disorder can have a devastating effect on the entire network [6]. In the SDN, collaborative abnormal flow detection across multiple domains requires detailed flow data for each relevant domain, such as the contents of a flow table in the last few seconds. Network abnormal flow is characterized by latent and unforeseen attacks. Therefore, the detection of network abnormal flow is challenged by the demand for larger-scale and higher-dimensional flow data [7].

Most recent studies are based on state transition [8] or artificial intelligence methods [9]. Methods based on state transition require manual calculation and have low recognition accuracy.

Methods based on artificial intelligence have more advantages in this respect because of the availability of network big data. However, most of these studies have not carried out in-depth feature learning of network flow. For large-scale network abnormal flow detection, there are mainly two types of methods. The first type relies on sampled data: it uses network flow data to establish a library of attack intrusion behavior patterns, and the collected data, including host system logs or records collected from network nodes, is matched against the established pattern library. If the match succeeds, the behavior is judged to be an intrusion; otherwise, it is normal [10]. This method can effectively identify existing attacks, maintain them, and improve network security at the time. However, with the development of computers and the Internet, more and more new types of attacks have appeared, and the detection accuracy of such expert systems has fallen sharply. They can no longer meet the requirements, and the sampled data itself is not accurate, which may cause the loss of useful information.

The second type of method utilizes machine learning to perform feature extraction and then detection and classification on the constructed features. The massive amount of network data makes machine learning methods more effective than judgment methods based on expert systems [11]. However, traditional machine learning methods are just shallow feature learning classifiers and have certain limitations when processing complex data. The feature engineering that traditional machine learning requires is time consuming and demands specialized knowledge, and the performance of most machine learning algorithms depends on the accuracy of the extracted features. Deep learning reduces the manual design effort of feature extractors for each problem by automatically retrieving advanced features directly from raw data [12]. Previous studies have used deep learning to classify mobile encrypted traffic and achieved excellent results [13, 14]. In [15], the authors investigated several deep learning architectures, including 1D CNN, 2D CNN, LSTM, Stacked Autoencoder (SAE), and Multilayer Perceptron (MLP), for mobile encrypted traffic classification.

To meet the above problems and challenges, we apply the excellent feature learning capabilities of deep learning to the SDN-based Smart Grid to achieve highly accurate network abnormal flow detection. The main contributions of this article can be summarized as follows:

(i) First, we design a framework for improving the security of the Smart Grid by applying an abnormal flow detection algorithm in the SDN-based Smart Grid communication network; it can identify abnormal flow and detect the type of attack.

(ii) Second, we propose a deep learning algorithm, the Hybrid Convolutional Neural Network (HYBRID-CNN), to detect abnormal flow in the SDN-based Smart Grid communication network. The HYBRID-CNN adopts dual-channel data input, which can extract effective features from 1D and 2D flow data, uses the self-attention mechanism to fuse key features, and finally uses a fully connected neural network for detection.

(iii) Third, we compare the proposed method with the single models and verify the performance improvement of the hybrid model. In addition, we present a parameter study to optimize the HYBRID-CNN model.

(iv) Fourth, we perform extensive experimental comparisons on the UNSW_NB15 and KDDCup 99 benchmark datasets. Experimental results show that the HYBRID-CNN significantly outperforms existing approaches in terms of accuracy and False Positive Rate (FPR).

The rest of this article is organized as follows: we discuss related work in Section 2 and introduce the system model and security requirements in Section 3. We then introduce some preliminary knowledge in Section 4. In Section 5, we present our proposed algorithm, and in Section 6 we present the experimental comparative analysis. Finally, we discuss and conclude in Sections 7 and 8.

2. Related Work

This section discusses two related lines of work, namely, traditional machine learning and deep learning. In SDN-based network controllers, using traditional machine learning and deep learning to develop flexible and efficient abnormal flow detection schemes presents some challenges. One main challenge is how to choose an appropriate feature selection method; another is to accurately grasp the correlation between the selected features and the abnormal flow detection task, as well as the redundancy between these features [16].

2.1. Traditional Machine Learning. Most previous studies were based on traditional machine learning methods, such as Support Vector Machine (SVM), Decision Tree, and Naive Bayes. Naive Bayes is an important algorithm in machine learning and data mining and is widely used for classification tasks such as text classification and medical diagnosis. Ashraf et al. [17] applied Naive Bayes to network intrusion detection; the basic idea is to select the most likely category using the Bayesian rule under the assumption of feature independence. But this method performs only simple shallow feature learning and has poor performance on large-scale network flow data. Rai et al. [18] used the decision tree C4.5 to perform intrusion detection experiments on the NSL-KDD dataset; in this work, 16 attributes were selected as detection features. The proposed algorithm can be used for feature-based intrusion detection, but its accuracy is too low, only 79.52%. Reddy et al. [19] proposed a filtering algorithm based on the SVM classifier to perform the classification task on the KDDCup 99 dataset.

This method performed well on the training data but performed poorly on the test dataset and could not effectively detect unknown network abnormal flow.

2.2. Deep Learning. In recent years, deep learning, as a branch of machine learning, has become more and more popular. It has been applied to intrusion detection, and research shows that it has completely surpassed traditional methods in performance [20]. Kwon et al. [15] utilized Deep Neural Network-based deep learning methods for flow-based anomaly detection, and their experimental results evidence that deep learning can be applied to abnormal flow detection in the SDN. Long Short-Term Memory (LSTM) is a special deep learning model of the Recurrent Neural Network (RNN). It can remember the input and predicted output of any period and solves the problems of gradient vanishing and explosion in the RNN; LSTM is widely used in the field of Natural Language Processing [21]. Existing research on abnormal flow detection based on LSTM [22] found that these algorithms bring a significant performance improvement for sequence learning compared with traditional machine learning methods, but there is still room for improvement in detection rate and accuracy. The CNN is a multilayer network structure learning algorithm. It can learn hierarchical features from a large amount of data and has broad application prospects in the field of abnormal flow detection. Wang et al. [23] proposed an end-to-end classification method based on one-dimensional Convolutional Neural Networks. This method integrates feature extraction, feature selection, and classification into a unified end-to-end framework, automatically learning the nonlinear relationship between the original inputs and the expected outputs, and obtained good experimental results. However, the one-dimensional data used in this method is not suitable for local feature extraction, so the detection rate is less than ideal. In [24], the authors present a new technique for network traffic classification based on a combination of RNN and CNN models that can be used for Internet of Things (IoT) traffic, which provides the best detection results. Wang et al. [25] proposed using a CNN combined with LSTM to analyze and detect network flow: the CNN first learns the low-level spatial features of the network flow, and the LSTM then learns high-level temporal features. The Deep Neural Network completes this automatically, and the method achieved good results in terms of accuracy and detection rate.

Based on the above works, traditional machine learning methods that are typically used in abnormal flow detection often fail and cannot detect many known and new security threats, largely because those approaches put little focus on accurate feature selection and classification; they are often inefficient for large-scale network flow. Current deep learning methods like LSTM and CNN often pay more attention to improving the model and ignore the structural features of the original flow. To address these problems, we propose the HYBRID-CNN deep learning method for more accurate feature learning. The method utilizes a two-channel input structure of 1D data and 2D data: a CNN extracts local features and a DNN extracts global features. In addition, a self-attention mechanism is added to select the most important features.

3. System Model View

In this section, we formalize the system model and the system security requirements.

3.1. System Model. The Smart Grid uses two-way communication technology to connect many power components and to ensure mutual communication between them. Implementing the SDN on Smart Grid technology separates network control from the data forwarding equipment that makes up the network infrastructure, thereby enabling logically centralized control and allowing the network to be programmed by a central software unit. The control layer, as the brain of the network, carries the controller software; software-defined routing rules determine where to route flow. Programmable network devices in the data plane route flow according to the rules defined by the controller. On top of this, the abnormal flow detection module is implemented. As shown in Figure 1, the SDN-based Smart Grid mainly includes the following parts [26].

3.1.1. Physical Plane. This layer is responsible for packet switching and routing. It includes the basic components of network communication in the Smart Grid, such as smart meters, Phasor Measurement Units (PMUs), various sensors, and various communication equipment. Different from the traditional network, these basic components cannot make decisions independently because they have no control unit. They are only responsible for collecting the generated key data and forwarding it to the control layer through the programmable SDN switch infrastructure while complying with the rules defined by the controller.

3.1.2. Southbound Interface. The southbound interface defines the communication protocol between the physical layer and the control layer. The OpenFlow protocol developed at Stanford is currently the most common and standard southbound protocol [27]. It can realize secure communication in the SDN by determining the message format from a programmable switch to the controller.

3.1.3. Control Plane. As the central brain, the control layer has one or more SDN controllers whose task is to manage the forwarding behavior of data flow by determining forwarding rules, which are written into the flow tables of the programmable switches in the physical layer through the southbound interface.

3.1.4. Northbound Interface. The northbound interface defines an interface for communication between the control layer and the application layer and enables application programs to program the network. It abstracts the details of the data in the physical layer and allows network administrators, service providers, and researchers to customize the control rules and behaviors of their networks.

[Figure 1 appears here: the physical plane (V2G, smart meter, mobile, RE, sensors, and switches), the control plane (SDN controller with flow data), and the application plane (anomaly detection module and data filter), connected by the southbound and northbound interfaces.]

Figure 1: The system model of the SDN-based Smart Grid; it mainly includes the physical plane, southbound interface, control plane, northbound interface, and application plane. Devices in the physical layer initiate access requests through the Internet, and the flow collection module of the SDN controller captures all request flow statistics table information to extract flow features. HYBRID-CNN is used to detect abnormal traffic and generate abnormal reports. Then, the generated anomaly report is sent to the SDN controller through the security channel. Finally, the SDN controller discards attack packets and updates the flow table according to the received report.

3.1.5. Application Plane. The application layer comprises many Smart Grid applications, including network security programs such as the abnormal flow detection module and the flow data filtering module. All these application-defined policies need to be translated into OpenFlow rules, which are transferred through the northbound interface to the control layer and from there to the programmable switches in the physical layer.

3.2. System Security Requirements

3.2.1. The Immovability and Concentricity of the Network Architecture. The functions of the Smart Grid communication network are fixed at the design phase, and it is almost impossible to reconfigure the network based on its real-time needs. In terms of performance and resilience, this nondynamic structure of today's Smart Grid creates bottlenecks, and at the same time the network is vulnerable to multiple types of attacks. On the other hand, highly centralized network control considerably increases the damage caused by network abnormal flow intrusion [28]. The SDN is the control center of the entire network; it may itself be the target of various attacks, and attacks that damage the SDN, paralyzing its control or causing a switch to misbehave, can have a devastating effect on the entire network. Therefore, it is necessary to design an effective abnormal flow detection algorithm in the SDN controller.

3.2.2. The Hierarchy of Network Flow. Network flow has a distinct hierarchy, as shown in Figure 2, where the bottom row shows a sequence of flow bytes. According to a specific network protocol format, multiple flow bytes are combined into a network packet, and then multiple network packets are combined into a network flow. A network flow is classified as normal or malicious, and a deep learning algorithm is used to learn hierarchical features, which has achieved good results. These observations urge us to use deep learning to learn the hierarchical features of network flow to complete the task of intrusion anomaly detection.

3.3. Working Methodology. Devices in the physical layer initiate access requests through the Internet, and the flow collection module of the SDN controller captures all request flow statistics table information to extract flow features. The abnormal flow detection module includes three stages: data preprocessing, model training, and model validation, as shown in Figure 3. First, the collected flow table data are preprocessed, including data encoding, data normalization, data reshaping, and data split. After data preprocessing, the flow data vectors are feature-extracted, feature-fused, and classified by the HYBRID-CNN algorithm.

In addition to the powerful anomaly flow detection above, the proposed solution performs end-to-end delivery of detection reports through the SDN, as shown in Figure 1. This is achieved by incorporating the anomaly flow detection model into the core of the SDN control plane. The execution process works in the following order: (i) detection stage, (ii) reporting stage, and (iii) update stage. In the first stage, the control plane, encapsulated with the anomaly flow detection model, classifies the incoming flow as abnormal or normal. In the second stage, the report is communicated to the control plane.

[Figure 2 appears here: a network flow composed of packets, each composed of bytes.]

Figure 2: The structure of a network flow. Multiple bytes are combined into a packet, and then multiple packets are combined into a network flow.

[Figure 3 appears here: the pipeline from the training and validation datasets through data preprocessing (data encoding, data normalization, data reshaping) and the HYBRID-CNN model (feature extraction, feature merge, classification) to detection.]

Figure 3: Working methodology of the proposed anomaly detection algorithm; it includes data preprocessing, model training, and model validation.

If the incoming flow is abnormal, the control plane discards the packet and immediately gives up communication with the requesting host. This helps protect the underlying network from malicious content and prevents it from spreading further on the network. During the update stage, the control plane updates the flow table entries of the forwarding devices.

4. Preliminaries

In this section, we briefly describe the general notions used in our proposed algorithm.

4.1. Activation Function. The activation function provides the nonlinear modeling capability of the network. The Rectified Linear Unit (ReLU) is the most widely used activation function [29]; it keeps the gradient from attenuating, thus effectively alleviating the problem of gradient disappearance. The ReLU activation function produces 0 as output when x < 0 and is linear with slope 1 when x > 0:

\hat{y}' = \max(0, x). \quad (1)
4.2. Cross-Entropy Loss. Cross-entropy loss measures the performance of a classification model whose output is a probability value between 0 and 1. It increases as the predicted probability diverges from the actual label. In binary classification, where the number of classes M equals 2, the cross-entropy loss can be calculated as

\mathrm{loss} = -y\log(p) - (1 - y)\log(1 - p). \quad (2)

If M > 2 (i.e., multiclass classification), we calculate a separate loss for each class label per observation and sum the results:

\mathrm{loss}' = -\sum_{c=1}^{M} y_{o,c}\log\left(p_{o,c}\right), \quad (3)

where y_{o,c} is a binary indicator (0 or 1) of whether class label c is the correct classification for observation o, and p_{o,c} is the predicted probability that observation o belongs to class c.
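As a concrete illustration, the following is a minimal NumPy sketch of equations (2) and (3); the function names and the clipping constant used to avoid log(0) are our own additions, not part of the paper's implementation.

import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    # Equation (2): loss = -y*log(p) - (1 - y)*log(1 - p)
    p = np.clip(p, eps, 1.0 - eps)  # clip to avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, p, eps=1e-12):
    # Equation (3): per observation o, loss' = -sum_c y_{o,c} * log(p_{o,c})
    p = np.clip(p, eps, 1.0)
    return -np.sum(y_onehot * np.log(p), axis=-1)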

4.3. Optimizer. We use the Adam optimizer to learn the network weight parameters. Independent adaptive learning rates are designed for different parameters by calculating first-order and second-order moment estimates of the gradient. Empirical results show that Adam has greater advantages over other optimizers in practice [30]. With the moving average of the gradient m_t = \beta_1 m_{t-1} + (1 - \beta_1)g_t, the moving average of the squared gradient v_t = \beta_2 v_{t-1} + (1 - \beta_2)g_t^2, and the bias-corrected estimators for the first moment \hat{m}_t = m_t/(1 - \beta_1^t) and second moment \hat{v}_t = v_t/(1 - \beta_2^t), the update rule for Adam is

\omega_t = \omega_{t-1} - \eta\,\hat{m}_t / \left(\sqrt{\hat{v}_t} + \epsilon\right), \quad (4)

where \omega denotes the model weights, \eta is the step size, and \beta_1, \beta_2, \epsilon are hyperparameters.
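The update in equation (4) can be sketched in a few lines of NumPy; the default values of eta, beta1, beta2, and eps below follow common practice and are an assumption rather than the paper's configuration.

import numpy as np

def adam_step(w, g, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Moving averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    # Bias-corrected first- and second-moment estimators
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Equation (4): parameter update
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v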

5. Proposed HYBRID-CNN Algorithm

In this part, we first introduce the data preprocessing operations. Then, we describe the structure of the HYBRID-CNN algorithm and how it detects abnormal flow.

5.1. Data Preprocessing

5.1.1. Data Encoding. The input flow data contains a variety of features; some of them are nonnumeric and need to be encoded as numeric types to be used as input to the neural network. Here, we use Label encoder encoding to convert discrete features to numeric features [31], for example, [protocol: TCP, service: HTTP, state: FIN, ...] ⟶ [protocol: 4, service: 2, state: 2, ...].
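A minimal scikit-learn sketch of this encoding step is shown below; the example values are taken from the text, although LabelEncoder assigns its own integer codes rather than the exact mapping shown above.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"proto": ["tcp", "udp", "tcp"],
                   "service": ["http", "dns", "ftp"],
                   "state": ["FIN", "CON", "FIN"]})
for col in ["proto", "service", "state"]:
    # Discrete string values -> integer codes usable by the network
    df[col] = LabelEncoder().fit_transform(df[col])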
5.1.2. Data Normalization. Data normalization can speed up convergence, improve the accuracy of the model, and prevent a feature with a particularly large value range from dominating the distance calculation. For features with a very large gap between the minimum and maximum values, such as "dur," "sbytes," and "dbytes," we first apply logarithmic scaling. We then choose the MIN-MAX scaling method [31] and normalize the data according to the following equation:

X_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}, \quad (5)

where X_i denotes each data point, X_{\min} denotes the minimum value over all data points, and X_{\max} denotes the maximum value over all data points of each feature.
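A minimal NumPy sketch of the two scaling steps is given below; the use of log1p for the wide-range features and the small constant in the denominator are our assumptions for numerical safety.

import numpy as np

def normalize_column(x, wide_range=False):
    x = x.astype(float)
    if wide_range:               # e.g., "dur", "sbytes", "dbytes"
        x = np.log1p(x)          # logarithmic scaling first
    x_min, x_max = x.min(), x.max()
    # Equation (5): MIN-MAX scaling to [0, 1]
    return (x - x_min) / (x_max - x_min + 1e-12)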
5.1.3. Data Reshaping. The input of a CNN should be three-dimensional (height, width, channel); for a single sample, the channel is 1. We can therefore reshape a single flow sample of length s = h*w + 1 to obtain a data structure similar to an image, constructing an h × w matrix:

M' = \begin{pmatrix} M_{11} & \cdots & M_{1w} \\ \vdots & \ddots & \vdots \\ M_{h1} & \cdots & M_{hw} \end{pmatrix}. \quad (6)
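For instance, the 42 features of a UNSW_NB15 record tile a 6 × 7 grid, so equation (6) can be realized with a single reshape; this sketch uses random numbers as a stand-in for a real record.

import numpy as np

h, w = 6, 7
sample = np.random.rand(h * w + 1)    # s = h*w + 1: 42 features plus a label
features, label = sample[:-1], sample[-1]
M = features.reshape(h, w, 1)         # (height, width, channel) for the CNN input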
5.1.4. Data Split. Each model we train uses two datasets: a training dataset and a validation dataset. As shown in Figure 4, to separate them, we first apply the shuffle method to the dataset to generate random data and then slice the entire dataset to obtain the training dataset and the validation dataset.

[Figure 4 appears here: [d1, ..., dN] → Shuffle → [d1', ..., dN'] → training dataset [d1', ..., dt'] and validation dataset [dt+1', ..., dN'].]

Figure 4: Data split. The shuffle method is applied to the network flow dataset to generate random data, which is then split into a training dataset and a validation dataset.
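A minimal sketch of this shuffle-and-split step with scikit-learn follows; the 80% training proportion matches one of the ratios studied in Section 6, and the random seed is arbitrary.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 42)          # stand-in for the preprocessed flow features
y = np.random.randint(0, 2, 1000)     # stand-in for the normal/abnormal labels
X_train, X_val, y_train, y_val = train_test_split(
    X, y, train_size=0.8, shuffle=True, random_state=42)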
5.2. HYBRID-CNN. The structure of a CNN is shown in Figure 5. It is an end-to-end deep learning model with powerful feature learning and classification capabilities, widely used in image classification, speech recognition, computer vision, and other fields [32].

The network flow contains both abnormal and normal flow, and HYBRID-CNN training is performed at this stage to detect misused attacks, aiming to further categorize the malicious data into the corresponding classes, i.e., Scan, R2L, DoS, and Probe. The structure of our proposed HYBRID-CNN algorithm is shown in Figure 6. We divide it into three parts: the first part is feature extraction, the second part is feature fusion, and the third part is detection classification.

5.2.1. Feature Extraction. In the feature extraction phase, we use a dual-input form of the flow data, which aims to extract the features of the flow more comprehensively. The role of the input layer is to receive the input data, and its size is consistent with that of the input data, such as a vector x = [x_1, x_2, ..., x_n] or a matrix M.

For the first input (the upper part of the blue box in Figure 6), every user's access flow is essentially 1D data. We utilize two fully connected (DNN) layers to extract the global features of the flow; our motivation is to learn the frequent co-occurrence of features by memorizing the one-dimensional data. Each neuron in a fully connected layer computes

x_i = f\left(\sum_{j=1}^{n} \omega_{i,j} x_j + b_1\right). \quad (7)

After data preprocessing, the input shape is (h*w, 1). In fully connected layer 1, we set a neurons, and the output shape is (h*w, a); in fully connected layer 2, we set b neurons, and the output shape is (h*w, b). This two-dimensional output is then straightened into a one-dimensional feature vector of shape (h*w*b, 1). The activation function used in this process is ReLU, and the output feature O_wide is obtained.

[Figure 5 appears here: input → convolution + activation → pooling → convolution + activation → pooling → fully connected → SoftMax → output.]

Figure 5: The structure of a CNN. It includes the input layer, convolution layers, activation functions, pooling layers, a fully connected layer, and the output layer.

[Figure 6 appears here: 1D flow data passes through hidden layers to give the global feature; the reshaped 2D flow data passes through convolution, pooling, and flatten layers to give the local feature; the merged features enter a self-attention block (MatMul, Scale, optional Mask, SoftMax, MatMul) before classification.]

Figure 6: The structure of the proposed HYBRID-CNN algorithm; it includes feature extraction, feature merge, and classification. The feature extraction aims to extract the features of the flow more comprehensively, the self-attention mechanism aims to fuse key features, and the classification aims to classify accurately.

For the second input (the lower part of the blue box in Figure 6), we reshape the one-dimensional data of the first input into a two-dimensional matrix, since deeper features can be better learned from a two-dimensional matrix input. The CNN uses a sliding convolution kernel to extract local features of the flow data. This part of the network includes a convolution layer, a pooling layer, and a flatten layer.

One limitation of conventional neural networks is poor scalability due to the full connection of neurons; a CNN overcomes this shortcoming by connecting each neuron only to its neighbors instead of to all neurons [33]. Let the input of the l-th layer be x^{l-1}, its output x^l, and the convolution kernel k. The convolution operation is

x^l = f\left(\sum_i x_i^{l-1} \otimes k_i^l + b^l\right), \quad (8)

where f(·) is a nonlinear activation function, ⊗ is the convolution operator, and b^l is a bias term. The pooling layer is usually placed after the convolutional layer. By performing a merge operation on a local area of the feature map, the features gain a certain spatial invariance; the merge operation also reduces the feature size and prevents overfitting. x^{l+1} is obtained by the following pooling:

x^{l+1} = \beta\,\mathrm{down}(x^l) + b, \quad (9)

where down(·) represents the pooling function, β is a multiplicative bias, and b is an additive bias. The reshaped shape of the input data is (h, w). We use k convolution kernels of the same shape to extract the convolution features. After convolution, the data shape is (h - k + 1, k); after pooling, it is ((h - k + 1)/2, k). Then, through the flatten layer, the data shape is ((h - k + 1)/2 * k, 1), and the output feature O_CNN is obtained.

The two extracted features are fused to obtain the feature O_i(k):

O_i(k) = O_{\mathrm{wide}} + O_{\mathrm{CNN}}. \quad (10)
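The shape bookkeeping of equations (8)-(10) can be checked with a short Keras sketch of the local-feature branch; the layer sizes (32 kernels of size 3 on the 6 × 7 input) are read off Figure 7 and are not the authors' released code.

from tensorflow.keras import layers, Input

inp_2d = Input(shape=(6, 7))                                      # reshaped 2D flow data
x = layers.Conv1D(32, kernel_size=3, activation="relu")(inp_2d)   # (h - k + 1, k) = (4, 32)
x = layers.MaxPooling1D(pool_size=2)(x)                           # ((h - k + 1)/2, k) = (2, 32)
o_cnn = layers.Flatten()(x)                                       # local feature O_CNN, length 64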

5.2.2. Feature Merge. In the feature fusion part, we use a self-attention mechanism to fuse key features. The essence of the self-attention mechanism is to attend to a specific part according to the needs of the observation [34].

For self-attention, we obtain three matrices Q (Query), K (Key), and V (Value) from the input O_i(k). The self-attention mechanism obtains different representations, calculates scaled dot-product attention for each representation, and finally concatenates the results. Specifically, the current representations are input into the self-attention layer, and a new representation is calculated. First, we calculate the dot product between Q and K; then, to prevent the result from being too large, it is divided by the scale \sqrt{d_k}, where d_k is the dimension of a query or key vector. The results are then normalized to a probability distribution using a SoftMax operation and multiplied by the matrix V to obtain a weighted-sum representation. This operation can be expressed as

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V. \quad (11)
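Equation (11) can be written directly in NumPy, as in the following minimal sketch; in the real model, Q, K, and V are produced by learned linear projections of O_i(k).

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # QK^T / sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise SoftMax
    return weights @ V                                        # weighted sum of the values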
(i) Basic features: they involve the attributes that
represent protocols connections
5.2.3. Classification. After feature fusion, we use a fully (ii) Flow features: they include the identifier attributes
connected layer for detection and classification; all neurons between hosts (e.g., server-to-client or client-to-
in the previous layer are connected to each neuron in the serve)
current layer. The fully connected layer is located before the (iii) Content features: they encapsulate the attributes of
output layer. After the extracted features are converted into a TCP/IP; also, they contain some attributes of http
one-dimensional feature vector, they are connected to each services
neuron in the current layer to map the high-level features in
a targeted manner: (iv) Time features: they contain the attributes time, for
example, arrival time between packets, start/end
n
x′i � f􏼠 􏽐 ωi,j xi + b1 􏼡. (12) packet time, and round-trip time of TCP protocol
i�1 (v) Additional generated features: this category can be
further divided into two groups: general-purpose
The fully connected layer will target high-level features
features, whereby each of them has its own purpose,
according to the specific tasks of the output layer perform
to protect the service of protocols, and connection
mapping and use the SoftMax and Sigmoid activation
features that are built from the flow of 100 record
function after mapping to get the final classification de-
connections based on the sequential order of the last
tection result (normal, abnormal, or attack types).
time feature
The output layer is a SoftMax function [35]; it normalizes
K real numbers into a K probabilities distribution, after To label this dataset, two attributes were provided:
applying SoftMax, each component will be in the interval attack_cat represents the nine categories of the attack and
(0, 1), and the components will add up to 1, which can be the normal, and label is 0 for normal and otherwise is 1.
interpreted to map the nonnormalized output of a network
to a probability distribution over predicted output classes.
Set z � (z1 , . . . , zK ) ∈ RK ; the standard SoftMax function 6.1.2. Performance Metrics. The performance metrics for
σ: RK ⟶ RK is defined by the formula: abnormal flow detection depend on the confusion matrix
constructed for any proven classification problem [37].
ezj Its size depends on the number of classes contained in the
σ(z)j � K , forj � 1, . . . , K. (13)
􏽐k�1 ezk dataset. Its main purpose is to compare the actual tags
with the predicted tags. The intrusion detection problem
􏽢:
Hence, the predicted class would be y can be defined by a 2 × 2 confusion matrix, which includes
􏽢 � arg max􏽨σ(z)j 􏽩.
y (14) normal and attack categories for evaluation. The
detailed description of the confusion matrix is shown in
Table 2.
6. Experimental Evaluation TP and TN denote the conditions for correct classi-
fication, while FP and FN denote the conditions for the
To evaluate the proposed abnormal flow detection scheme, mistaken classification. TP and TN refer to correctly
we conduct the simulation on a 64-bit computer with Intel classified attack flow and normal flow, respectively, while

Table 1: Features of the UNSW_NB15 dataset.

No.   Feature name         Category
(1)   dur                  Numeric
(2)   proto                Nonnumeric
(3)   service              Nonnumeric
(4)   state                Nonnumeric
(5)   spkts                Numeric
(6)   dpkts                Numeric
(7)   sbytes               Numeric
(8)   dbytes               Numeric
(9)   rate                 Numeric
(10)  sttl                 Numeric
(11)  dttl                 Numeric
(12)  sload                Numeric
(13)  dload                Numeric
(14)  sloss                Numeric
(15)  dloss                Numeric
(16)  sinpkt               Numeric
(17)  dinpkt               Numeric
(18)  sjit                 Numeric
(19)  djit                 Numeric
(20)  swin                 Numeric
(21)  dwin                 Numeric
(22)  stcpb                Numeric
(23)  dtcpb                Numeric
(24)  tcprtt               Numeric
(25)  synack               Numeric
(26)  ackdat               Numeric
(27)  smean                Numeric
(28)  dmean                Numeric
(29)  trans_depth          Numeric
(30)  response_body_len    Numeric
(31)  ct_srv_src           Numeric
(32)  ct_state_ttl         Numeric
(33)  ct_dst_ltm           Numeric
(34)  ct_src_dport_ltm     Numeric
(35)  ct_dst_sport_ltm     Numeric
(36)  ct_dst_src_ltm       Numeric
(37)  is_ftp_login         Numeric
(38)  ct_ftp_cmd           Numeric
(39)  ct_flw_http_mthd     Numeric
(40)  ct_src_ltm           Numeric
(41)  ct_srv_dst           Numeric
(42)  is_sm_ips_ports      Numeric
Table 2: Confusion matrix for the binary classification problem.

Predicted    Actual negative        Actual positive
Negative     TN (true negative)     FP (false positive)
Positive     FN (false negative)    TP (true positive)
TP and TN denote the conditions for correct classification, while FP and FN denote the conditions for mistaken classification: TP and TN refer to correctly classified attack flow and normal flow, respectively, while FP and FN refer to misclassified normal and attack records, respectively. These four quantities are used to generate the following performance evaluation metrics.

The Accuracy (Acc) measures the overall success rate of the model in detecting normal records and abnormal flow:

\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}. \quad (15)

The Detection Rate (DR), also known as the True Positive Rate (TPR), is the ratio of correctly classified malicious flow instances to the total number of malicious flow instances:

\mathrm{DR} = \frac{TP}{TP + FN}. \quad (16)

The False Positive Rate (FPR) is the proportion of normal instances that are misclassified as attack flow among all normal instances:

\mathrm{FPR} = \frac{FP}{FP + TN}. \quad (17)

The Precision (Pre) represents the proportion of actual attack samples among the samples classified as attacks:

\mathrm{Pre} = \frac{TP}{TP + FP}. \quad (18)

The F1 score combines precision and recall into a single evaluation index:

\mathrm{F1score} = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{DR}}{\mathrm{Pre} + \mathrm{DR}}. \quad (19)
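Equations (15)-(19) translate directly into code; the following sketch assumes the four confusion-matrix counts have already been obtained from the validation predictions.

def detection_metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)   # equation (15)
    dr = tp / (tp + fn)                     # equation (16), DR / TPR
    fpr = fp / (fp + tn)                    # equation (17)
    pre = tp / (tp + fp)                    # equation (18)
    f1 = 2 * pre * dr / (pre + dr)          # equation (19)
    return acc, dr, fpr, pre, f1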
6.2. Performance Comparison

6.2.1. Model Comparison. For comparison, we used a single CNN model and a simple DNN model. Our proposed hybrid CNN model includes 2 input layers, 1 convolutional layer, 1 pooling layer, and 4 fully connected layers; the single CNN model includes a convolutional layer, a pooling layer, and a fully connected layer; the simple DNN model contains only 3 fully connected layers.

The configuration of the model structure parameters is shown in Figure 7, where each column is a model. The input data shape of the DNN part of our proposed hybrid CNN model is (42, 1); the shape after Dense_1 is (42, 128), after Dense_2 it is (42, 64), and after Flatten_1 it is (2688). The input shape of the CNN part is (6, 7); through the Conv1D_1 layer the shape becomes (4, 32), and after Pooling_1 it becomes (2, 32). In the Merge layer, the two-channel data are merged into one, so the shape of the data becomes (2752), and it then passes through the Dense_3 layer. The corresponding shapes in the single models are formed by the same layers in turn.

As shown in Table 3, we set the initial weights to random values, set the batch size to 512, and use the Adam optimizer and the binary cross-entropy loss function to compile the models. To evaluate the performance of the models, we use accuracy as the metric function during training and validation.

After a model is compiled, we use the input data to perform model training in batch mode and evaluate the performance index values at the end of each epoch (one epoch means that the entire training dataset has undergone a complete training iteration). The training results are shown in Figure 8.
Figure 8, where the horizontal axis represents the number of

[Figure 7 appears here: per-layer shapes of the three models. Hybrid CNN model: Input1 (42, 1) → Dense_1 (42, 128) → Dense_2 (42, 64) → Flatten_1 (2688) and Input2 (6, 7) → Conv1D (4, 32) → Pooling (2, 32) → Flatten_2 (64), joined by the attention merge layer (2752), then Dense_3 (32) and Dense_4 (1); the single DNN model and the single CNN model use the corresponding single branches.]

Figure 7: Model configuration parameters.

Table 3: Configuration parameters for different models.

Method               Initial weights    Batch_size    Activation
Single DNN model     Random             512           ReLU
Single CNN model     Random             512           ReLU
Our proposed model   Random             512           ReLU

In Figure 8, the horizontal axis represents the number of epochs trained and the vertical axis represents the loss and accuracy scores. We observe that the loss of our proposed hybrid CNN model becomes smaller and smaller as training progresses, and after 100 epochs of training, it obtains higher accuracy scores than the single CNN model and the single DNN model.

6.2.2. Method Comparison. To evaluate the performance of our proposed hybrid CNN model, we performed experiments on the UNSW_NB15 dataset. The comparison methods selected are as follows:

(i) Naive Bayes [17]: Naive Bayes is a supervised learning classifier based on Bayes' theorem. It classifies by combining previously calculated likelihoods and probabilities to compute the posterior probability using Bayes' rule.

(ii) SVM [19]: an SVM is a discriminative classifier formally defined by separating hyperplanes. Kernel-based SVMs classify data effectively for most datasets. Discriminant function: "Linear SVM."

(iii) LSTM [22]: the improved RNN-based model for intrusion detection, using the ReLU activation function, the Adam optimizer, 100 epochs, and two LSTM layers {128, 64}.

(iv) CNN-LSTM [25]: a CNN combined with LSTM to analyze and detect network flow. It utilizes the CNN to first learn the low-level spatial features of the network flow and then uses the LSTM to learn high-level temporal features, using the ReLU activation function, the Adam optimizer, 100 epochs, two LSTM layers {128, 64}, and two CNN layers with pooling.

Table 4 lists the performance comparison between our proposed HYBRID-CNN and some existing methods. It is worth noting that we select subsets for the experiments based on the training dataset ratio, defined as the proportion of training samples; the proportions used are 60%, 70%, and 80%. In each set of experiments, we evaluated the five methods, including our proposed method, on three performance metrics (Acc, DR, FPR). The results in Table 4 compare our proposed HYBRID-CNN with the traditional machine learning methods and deep learning methods: HYBRID-CNN reaches an Accuracy of 0.9564, a DR of 0.9856, and an FPR of 0.0442, which means it detects abnormal flow more accurately than the other methods. This is because the combined input processed by a DNN and a CNN provides better feature learning capabilities.

Figure 9 compares the training and validation accuracy and loss of our proposed HYBRID-CNN method and the other two deep learning methods. All models were trained for 100 epochs, and the performance indicators were evaluated after each epoch. By comparison, we find that the loss of the HYBRID-CNN method converges much faster during training and validation, and the best accuracy is reached faster, which is obviously better than the other methods.

6.2.3. ROC Curves Comparison. We further plot the Receiver Operating Characteristic (ROC) curves of our proposed HYBRID-CNN and the state-of-the-art methods on UNSW_NB15, as shown in Figure 10.

[Figure 8 appears here: loss and accuracy curves over 100 epochs for the proposed hybrid CNN model, the single DNN model, and the single CNN model.]

Figure 8: Comparison of different models. (a) Training loss. (b) Validation loss. (c) Training accuracy. (d) Validation accuracy.

Table 4: Performance comparison of the proposed and state-of-the-art methods.

                                     Proportion = 80%          Proportion = 70%          Proportion = 60%
Reference            Method          Acc     DR      FPR       Acc     DR      FPR       Acc     DR      FPR
Ashraf et al. [17]   Naive Bayes     0.7663  0.8514  0.3841    0.7669  0.8611  0.3999    0.7655  0.8512  0.3883
Reddy et al. [19]    SVM             0.7594  0.6895  0.1170    0.7257  0.7806  0.3714    0.7346  0.7874  0.3591
Xin et al. [22]      LSTM            0.8916  0.9843  0.2724    0.8897  0.9840  0.2775    0.8894  0.9835  0.2778
Wang et al. [25]     CNN-LSTM        0.8995  0.9612  0.2095    0.8965  0.9460  0.1910    0.8955  0.9571  0.2138
Proposed method      HYBRID-CNN      0.9564  0.9856  0.0442    0.9408  0.9382  0.0544    0.9386  0.9493  0.0803

The ROC curve of HYBRID-CNN is the closest one to the upper left corner, indicating better generalization ability than the other methods. All the results reported above demonstrate that HYBRID-CNN outperforms its competitors. We can conclude that HYBRID-CNN effectively handles the abnormal flow detection problem through its ability to compress the original data into more discriminative abstract features and that it is capable of efficient abnormal flow detection.

6.2.4. Computation Comparison. To deepen this investigation, Table 5 reports the number of trainable parameters (in millions) and the running time required for both the proposed HYBRID-CNN and the state-of-the-art methods.

[Figure 9 appears here: loss and accuracy curves over 100 epochs for our proposed method and the LSTM and CNN-LSTM baselines.]

Figure 9: Comparison of different methods. (a) Training loss. (b) Validation loss. (c) Training accuracy. (d) Validation accuracy.

We use a GPU to accelerate the training of all models. It can be noticed that, when training on the UNSW_NB15 dataset, the proposed HYBRID-CNN has fewer trainable parameters and lower training and testing times. This outcome results from the use of the CNN in the proposed method, which enables efficient parallel computation, and from using as small a number of parameters as possible in the structure.

6.3. Parameter Study. There are various configurable hyperparameters in the model, such as the Batch_size α, the number of convolution kernels β, the convolution kernel size c, and the optimizer ε. These hyperparameters can only be configured manually and cannot be optimized automatically through the training process, yet they greatly affect the performance of the model. The Batch_size α is the number of training samples processed by the neural network in one forward-propagation and back-propagation pass, that is, how many samples are used to evaluate the loss in each optimization step. β is the number of different convolution kernels used in the convolution operation; as many feature maps are generated after convolution as there are kernels. c is the size of the convolution kernels; each kernel has three dimensions (length, width, and depth), and in a convolution layer of a CNN, the length and width of the kernels must be configured manually. The optimizer ε is the type of optimizer used to minimize the loss and update the weight parameters. Therefore, we deeply analyzed the influence of these hyperparameters on the performance of our proposed hybrid CNN model.

[Figure 10 appears here: ROC curves with areas of 0.99 for HYBRID-CNN, 0.93 for CNN + LSTM, 0.89 for LSTM, 0.84 for Naive Bayes, and 0.80 for SVM.]

Figure 10: ROC curves of HYBRID-CNN and state-of-the-art methods on the UNSW_NB15 dataset.

Table 5: The comparison of the computational complexity of the proposed and state-of-the-art methods.

Method      Trainable parameters (M)    Training time (s)    Testing time (s)
LSTM        0.1391                      402.58               1.77
CNN-LSTM    0.1404                      526.31               8.99
Proposed    0.0951                      271.26               0.75

In Figure 7, the parameters of our proposed hybrid CNN model are α = 512, β = 4, c = 1 × 3, and ε = Adam. The model training results for these parameters are as follows.

6.3.1. Effect of Batch_size α. As shown in Figure 11, we set α to 128, 256, and 512 for the experiments. When α = 128, the training and validation loss converge fastest within the same period, and the best accuracy at the set number of iterations is 0.9477. A smaller Batch_size speeds up the optimization within the same number of epochs, but it means that more computation time is needed. Increasing the Batch_size appropriately can improve the running speed and stabilize the gradient descent direction: as accuracy increases, the amplitude of the training oscillation decreases.

6.3.2. Effect of the Number of Convolution Kernels β. As shown in Figure 12, we set the number of convolution kernels β to 1, 2, and 4 for the experiments. When the number of convolution kernels is 1, we obtain an accuracy of 0.9403. When the number of convolution kernels increases to 2, the loss convergence rate also increases, and at 4, the loss convergence is significantly accelerated. Generally, when the network is deeper, more convolution kernels are often required to fully extract the key features.

6.3.3. Effect of the Convolution Kernel Size c. As shown in Figure 13, we set the size c of the convolution kernel to 1 × 2, 1 × 3, and 1 × 4 for the experiments. When the size of the convolution kernel is 1 × 2, the training loss and accuracy jitter sharply, which is not conducive to convergence. As the size of the convolution kernel increases, the loss converges a little faster and the fluctuation range becomes smaller, so it is better to choose a 1 × 3 or 1 × 4 convolution kernel.

6.3.4. Effect of the Optimizer ε. As shown in Figure 14, we selected several commonly used optimizers, SGD, RMSprop, Adam, and Adagrad, for experimental comparison.

[Figure 11 appears here: loss and accuracy curves for the proposed HYBRID-CNN model with Batch_size 128, 256, and 512.]

Figure 11: Parameter study of α. (a) Training loss. (b) Validation loss. (c) Training accuracy. (d) Validation accuracy.

When SGD is used as the optimizer, the effect is not ideal: it only achieves an accuracy of 0.9259, with a large oscillation at around epoch 40. When the Adam optimizer is used, the initial loss convergence is similar to that of the other optimizers, but in the medium term Adam's loss converges significantly faster and it finally achieves the best result, with an accuracy of 0.9483.

6.4. Ablation Study. For a thorough analysis, we conduct an ablation study on HYBRID-CNN to analyze the effectiveness of each module. The variants of the ablation study based on UNSW_NB15 are as follows:

(1) w/o attention: we remove the self-attention module from HYBRID-CNN but keep the DNN module and the CNN module
(2) w/o DNN: the DNN module is removed from HYBRID-CNN
(3) w/o CNN: the CNN module is removed from HYBRID-CNN

We further analyzed the detailed performance of HYBRID-CNN in the ablation study, and the results are shown in Table 6. Comparing HYBRID-CNN with model (1), we can conclude that the self-attention module helps detect abnormal flow, because attention can capture key features more comprehensively. The effectiveness of the DNN is demonstrated by comparing HYBRID-CNN with model (2): when we removed the DNN module, accuracy declined because the model could not extract the high-dimensional global features. When the CNN module was removed, the accuracy was greatly reduced, because the model could not extract the local features of the flow; the CNN thus has a great impact on the results.

[Figure 12 appears here: loss and accuracy curves for the proposed HYBRID-CNN model with 1, 2, and 4 convolution kernels.]

Figure 12: Parameter study of β. (a) Training loss. (b) Validation loss. (c) Training accuracy. (d) Validation accuracy.

6.5. Attack Detection. In order to detect the attack type of abnormal flow, the dataset we use to evaluate the model is KDDCup 99 [38]. The entire dataset has approximately 5 million flow records, each of which has 41 features (features 1–9 are the basic attributes of the packet, features 10–22 describe the packet content, features 23–31 are flow features, and features 32–41 are host-based features). As shown in Table 7, the attack flow instances can be further divided into DoS, U2R, R2L, and Probe. Each KDDCup 99 flow sample has 41 features and a label, so we cannot directly reshape it into a two-dimensional matrix; a zero-valued dummy feature is therefore added, which does not affect the result and is used only for data reshaping, as illustrated below.
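Here the zero column brings the 41 KDDCup 99 features up to 42 = 6 × 7 so that the 2D reshaping of Section 5.1.3 applies; the random array is a placeholder for real records.

import numpy as np

X = np.random.rand(100, 41)                     # stand-in for KDDCup 99 feature vectors
X = np.hstack([X, np.zeros((X.shape[0], 1))])   # append the zero-valued dummy feature
X_2d = X.reshape(-1, 6, 7)                      # 2D input for the CNN branch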
features (the 1–9 features are the basic attributes of the obtained results that the proposed model performs better
packet, the 10–22 features are the packet content, and the on the KDDCup 99 dataset than the existing scheme in
23–31 features are flow function and 32–41 are host-based terms of Accuracy, Detection Rate, and F1 score.
features). As shown in Table 7, these attack flow Figure 15(a) shows the Precision evaluation of the pro-
instances can be further divided into DoS, U2R, R2L, posed method corresponding to Normal, PROBE, DoS,
and Probe. For the KDDCup 99 dataset, the flow U2R, and R2L data examples (99.92%, 98.11%, 99.98%,
sample has 41 features and a label. We cannot directly 93.81%, and 93.16%, respectively). Figure 15(b) shows the
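As a concrete illustration of this padding step, the snippet below is a minimal sketch; the 6 × 7 target shape is our assumption, since the exact matrix dimensions are not stated here.

```python
import numpy as np

# X has shape (n_samples, 41): one row per preprocessed KDDCup 99 record.
X = np.random.rand(1000, 41)  # placeholder for the real dataset

# Append a single zero-valued dummy feature so the 41 features become
# 42 values, which can be arranged as a 2D grid.
X_padded = np.hstack([X, np.zeros((X.shape[0], 1))])

# Reshape each 42-dimensional sample into a 6 x 7 single-channel matrix,
# ready for the 2D convolutional branch.
X_2d = X_padded.reshape(-1, 6, 7, 1)
print(X_2d.shape)  # (1000, 6, 7, 1)
```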

[Figure 13: Parameter study of γ. (a) Training loss. (b) Validation loss. (c) Training accuracy. (d) Validation accuracy. Loss and accuracy are plotted against epoch; curves compare the proposed HYBRID-CNN with settings (1 × 2), (1 × 3), and (1 × 4).]

We compared our approach with the latest techniques: Figure 15 illustrates the comparison of the proposed abnormal flow detection algorithm with current state-of-the-art models. The results show that the proposed model outperforms the existing schemes on the KDDCup 99 dataset in terms of Accuracy, Detection Rate, and F1 score. Figure 15(a) shows the Precision of the proposed method on the Normal, PROBE, DoS, U2R, and R2L classes (99.92%, 98.11%, 99.98%, 93.81%, and 93.16%, respectively). Figure 15(b) shows the Detection Rate for successfully detected Normal, PROBE, DoS, U2R, and R2L examples (98.21%, 93.62%, 98.89%, 92.59%, and 87.76%, respectively). Figure 15(c) shows the F1 score on the Normal, PROBE, DoS, U2R, and R2L classes (96.74%, 94.02%, 98.51%, 91.92%, and 89.37%, respectively).

These results show clearly that normal flow, DoS attacks, and PROBE attacks are detected at the highest levels, while the detection of U2R and R2L attacks is slightly lower. In a real network, normal activity dominates the flow, while U2R and R2L are very small classes. Dataset imbalance is a common problem in intrusion detection: the detection model becomes biased towards the majority classes and neglects the minority ones. For U2R and R2L, although the detection rate of the proposed model is lower than for the other classes, it still achieves better overall results than the other methods.

7. Discussion

Evaluation on the UNSW_NB15 dataset shows that our model achieves 95.64% accuracy, a major improvement over other deep learning methods. However, it should be noted that the results for the "R2L" and "U2R" attack classes are lower than those for other classes, because the model needs more data to learn them. Unfortunately, due to the severe imbalance in the training data for these attacks, the obtained results are not stable.

[Figure 14: Parameter study of ϵ. (a) Training loss. (b) Validation loss. (c) Training accuracy. (d) Validation accuracy. Loss and accuracy are plotted against epoch; curves compare the proposed HYBRID-CNN trained with the Adam, SGD, RMSprop, and Adagrad optimizers.]

Table 6: Detailed performance (%) of HYBRID-CNN in the ablation study.

Model              | Acc   | DR    | Pre   | FPR
HYBRID-CNN         | 95.64 | 98.56 | 96.13 | 4.42
(1) w/o attention  | 94.88 | 98.29 | 95.77 | 4.69
(2) w/o DNN        | 93.57 | 93.94 | 93.16 | 5.89
(3) w/o CNN        | 91.85 | 92.47 | 92.43 | 7.36
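For reference, the metrics reported in Table 6 follow the standard confusion-matrix definitions; a minimal helper (our sketch) is:

```python
def flow_metrics(tp, fp, tn, fn):
    """Standard binary detection metrics, returned as percentages."""
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    dr = tp / (tp + fn)                    # detection rate (recall)
    pre = tp / (tp + fp)                   # precision
    fpr = fp / (fp + tn)                   # false positive rate
    return {name: round(value * 100, 2) for name, value in
            {"Acc": acc, "DR": dr, "Pre": pre, "FPR": fpr}.items()}
```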

Hybrid detection methods, which combine several deep learning models, can usually achieve higher detection accuracy. Although deep learning algorithms are complex, the trained detector requires little running time. Our proposed model does spend more time on training, but GPU acceleration can reduce the training time.

Table 7: Attacks in the KDDCup 99 dataset.

Category | Training dataset | Testing dataset
DoS | back, land, neptune, pod, smurf, teardrop | back, land, neptune, pod, smurf, teardrop, mailbomb, processtable, udpstorm, apache2, worm
U2R | buffer-overflow, loadmodule, perl, rootkit | buffer-overflow, loadmodule, perl, rootkit, sqlattack, xterm, ps
R2L | ftp-write, guess-passwd, imap, multihop, phf, spy, warezclient, warezmaster | ftp-write, guess-passwd, imap, multihop, phf, spy, warezmaster, xlock, xsnoop, snmpguess, snmpgetattack, httptunnel, sendmail, named
Probe | ipsweep, nmap, portsweep, satan | ipsweep, nmap, portsweep, satan, mscan, saint
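When preprocessing KDDCup 99 labels into these five classes, a lookup table such as the following is convenient. This is a sketch built directly from Table 7; the assumption that records carry the raw attack name as their label field is ours.

```python
# Map raw KDDCup 99 attack names (per Table 7) to the five classes.
CATEGORY = {"normal": "Normal"}
CATEGORY.update({a: "DoS" for a in [
    "back", "land", "neptune", "pod", "smurf", "teardrop",
    "mailbomb", "processtable", "udpstorm", "apache2", "worm"]})
CATEGORY.update({a: "U2R" for a in [
    "buffer-overflow", "loadmodule", "perl", "rootkit",
    "sqlattack", "xterm", "ps"]})
CATEGORY.update({a: "R2L" for a in [
    "ftp-write", "guess-passwd", "imap", "multihop", "phf", "spy",
    "warezclient", "warezmaster", "xlock", "xsnoop", "snmpguess",
    "snmpgetattack", "httptunnel", "sendmail", "named"]})
CATEGORY.update({a: "Probe" for a in [
    "ipsweep", "nmap", "portsweep", "satan", "mscan", "saint"]})

print(CATEGORY.get("smurf", "Unknown"))  # -> "DoS"
```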

[Figure 15: Experimental evaluation of the proposed method on the KDDCup 99 dataset. (a) Precision evaluation. (b) Detection Rate evaluation. (c) F1 score evaluation. Each panel plots the metric (%) per class type (Normal, Probe, DoS, U2R, R2L) for LSTM [22], CNN-LSTM [24], and our proposed method.]

8. Conclusion

In this paper, we consider the problem of abnormal network flow detection in the Smart Grid integrated with SDN. In pursuit of accurate detection while guaranteeing network performance, we formulate a deep learning detection algorithm based on the HYBRID-CNN. In particular, our HYBRID-CNN model consists of double-channel feature extraction, key feature fusion, and classification. It gains the benefits of global memorization and local generalization brought by the DNN and the CNN, respectively. Besides, to measure the performance of the proposed algorithm, we analyze the hyperparameters of the HYBRID-CNN. Compared with other existing detection

algorithms, the experimental results show that the HYBRID-CNN achieves higher detection accuracy and a lower false alarm rate.

In our future work, one problem to be solved is improving the performance of the model through network structure optimization and automatic hyperparameter tuning. Swarm intelligence optimization algorithms, such as the Particle Swarm Optimization (PSO) algorithm and the Artificial Bee Colony (ABC) algorithm, can be used to tune hyperparameters automatically, which is an efficient way to improve detection accuracy. Another problem to be solved is the unbalanced dataset: the detection accuracy for the minority attack types needs to be improved, and we hope to use data augmentation in future work to reduce the impact of the imbalance.

Abbreviations

ABC: Artificial Bee Colony
CNN: Convolutional Neural Network
DNN: Deep Neural Network
FPR: False Positive Rate
IoT: Internet of Things
LSTM: Long Short-Term Memory
MLP: Multilayer Perceptron
PMU: Power Management Unit
PSO: Particle Swarm Optimization
ReLU: Rectified Linear Unit
RNN: Recurrent Neural Network
ROC: Receiver Operating Characteristic
SAE: Stacked Autoencoder
SDN: Software-Defined Network
SVM: Support Vector Machine.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61702321, 61872230, 61802249, 61802248, and U1936213).

References

[1] M. L. Tuballa and M. L. Abundo, "A review of the development of smart grid technologies," Renewable and Sustainable Energy Reviews, vol. 59, pp. 710–725, 2016.
[2] X. Yao, Y. Zou, Z. Chen, M. Zhao, and Q. Liu, "Topic-based rank search with verifiable social data outsourcing," Journal of Parallel and Distributed Computing, vol. 134, pp. 1–12, 2019.
[3] Y. Zou, X. Yao, Z. Chen, and M. Zhao, "Verifiable keyword-based semantic similarity search on social data outsourcing," IEEE Access, vol. 7, pp. 5616–5625, 2018.
[4] I. Colak, S. Sagiroglu, G. Fulli, M. Yesilbudak, and C.-F. Covrig, "A survey on the critical issues in smart grid technologies," Renewable and Sustainable Energy Reviews, vol. 54, pp. 396–405, 2016.
[5] A. Feghali, R. Kilany, and M. Chamoun, "SDN security problems and solutions analysis," in Proceedings of the 2015 International Conference on Protocol Engineering (ICPE) and International Conference on New Technologies of Distributed Systems (NTDS), Paris, France, October 2015.
[6] R. Chaudhary, G. S. Aujla, S. Garg, N. Kumar, and J. J. P. C. Rodrigues, "SDN-enabled multi-attribute-based secure communication for smart grid in IoT environment," IEEE Transactions on Industrial Informatics, vol. 14, no. 6, pp. 2629–2640, 2018.
[7] M. C. Dacier, H. König, R. Cwalinski, F. Kargl, and S. Dietrich, "Security challenges and opportunities of software-defined networking," IEEE Security & Privacy, vol. 15, no. 2, pp. 96–100, 2017.
[8] R. L. Sahita, "State-transition based network intrusion detection," US20050111460A1, 2016.
[9] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, "A deep learning approach to network intrusion detection," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.
[10] V. Vaidya, "Dynamic signature inspection-based network intrusion detection," US6279113B1, 2001.
[11] N. Sultana, N. Chilamkurti, W. Peng, and R. Alhadad, "Survey on SDN based network intrusion detection system using machine learning approaches," Peer-to-Peer Networking and Applications, vol. 12, no. 2, pp. 493–501, 2019.
[12] D. Kwon, H. Kim, J. Kim, S. C. Suh, I. Kim, and K. J. Kim, "A survey of deep learning-based network anomaly detection," Cluster Computing, vol. 22, no. S1, pp. 949–961, 2017.
[13] M. Lotfollahi, M. Jafari Siavoshani, R. Shirali Hossein Zade, and M. Saberian, "Deep packet: a novel approach for encrypted traffic classification using deep learning," Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[14] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, "MIMETIC: mobile encrypted traffic classification using multimodal deep learning," Computer Networks, vol. 165, Article ID 106944, 2019.
[15] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, "Mobile encrypted traffic classification using deep learning: experimental evaluation, lessons learned, and challenges," IEEE Transactions on Network and Service Management, vol. 16, no. 2, pp. 445–458, 2019.
[16] S. Aljawarneh, M. Aldwairi, and M. B. Yassein, "Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model," Journal of Computational Science, vol. 25, pp. 152–160, 2018.
[17] N. Ashraf, W. Ahmad, and R. Ashraf, "A comparative study of data mining algorithms for high detection rate in intrusion detection system," Annals of Emerging Technologies in Computing (AETiC), vol. 2, no. 1, 2018.
[18] K. Rai, M. S. Devi, and A. Guleria, "Decision tree based algorithm for intrusion detection," International Journal of Advanced Networking and Applications, vol. 7, no. 4, p. 2828, 2016.
[19] R. R. Reddy, Y. Ramadevi, and K. N. Sunitha, "Effective discriminant function for intrusion detection using SVM," in Proceedings of the 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, India, September 2016.

[20] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in Proceedings of the 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM), Fez, Morocco, October 2016.
[21] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: a search space odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232, 2016.
[22] Y. Xin, L. Kong, Z. Liu et al., "Machine learning and deep learning methods for cybersecurity," IEEE Access, vol. 6, pp. 35365–35381, 2018.
[23] W. Wang, M. Zhu, J. Wang, X. Zeng, and Z. Yang, "End-to-end encrypted traffic classification with one-dimensional convolution neural networks," in Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), Beijing, China, July 2017.
[24] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret, "Network traffic classifier with convolutional and recurrent neural networks for internet of things," IEEE Access, vol. 5, pp. 18042–18050, 2017.
[25] W. Wang, Y. Sheng, J. Wang et al., "HAST-IDS: learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection," IEEE Access, vol. 6, pp. 1792–1806, 2017.
[26] S. Demirci and S. Sagiroglu, "Software-defined networking for improving security in smart grid systems," in Proceedings of the 2018 7th International Conference on Renewable Energy Research and Applications (ICRERA), Paris, France, 2018.
[27] N. McKeown, T. Anderson, H. Balakrishnan et al., "OpenFlow," ACM SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, 2008.
[28] C. Gonzalez, S. M. Charfadine, O. Flauzac, and F. Nolot, "SDN-based security framework for the IoT in distributed grid," in Proceedings of the 2016 International Multidisciplinary Conference on Computer and Energy Science (SpliTech), Split, Croatia, July 2016.
[29] A. F. Agarap, "Deep learning using rectified linear units (ReLU)," 2018, https://arxiv.org/abs/1803.08375.
[30] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/pdf/1412.6980.pdf.
[31] E. Bisong, "Introduction to scikit-learn," in Building Machine Learning and Deep Learning Models on Google Cloud Platform, pp. 215–229, Springer, Berlin, Germany, 2019.
[32] S. Sharma, A. Soni, and V. Malviya, "Face recognition based on convolution neural network (CNN) applications in image processing: a survey," in Proceedings of the Recent Advances in Interdisciplinary Trends in Engineering & Applications (RAITEA), 2019.
[33] Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time series," The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, 1995.
[34] A. Vaswani, N. Shazeer, N. Parmar et al., "Attention is all you need," in Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), pp. 5998–6008, Long Beach, CA, USA, 2017.
[35] K. Adem, S. Kiliçarslan, and O. Cömert, "Classification and diagnosis of cervical cancer with stacked autoencoder and softmax classification," Expert Systems with Applications, vol. 115, pp. 557–564, 2019.
[36] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, November 2015.
[37] A. Tharwat, "Classification assessment methods," Applied Computing and Informatics, 2018.
[38] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDDCup 99 data set," in Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, Canada, July 2009.
