Boukria 2019
Abstract—Software Defined Network (SDN) is considered the main component of the next-generation network. Security in this environment faces many challenges and risks: attacking the SDN controller or injecting false flow rules could disrupt the network and block all of its services. To enhance SDN security, we propose an anomaly-based intrusion detection system using a deep learning approach. This solution aims to protect the communication channel between the SDN control layer and the SDN infrastructure layer against false data injection attacks, and to detect any attempted attack on the SDN southbound side. We analyze the flows that circulate in the SDN network and normalize the flow features using the logarithm function followed by the Min/Max scaler technique. For flow classification, we use the ReLU and Softmax functions. We test the proposed system with the CICIDS2017 dataset on an experimental platform combining the Mininet environment and the ONOS controller. The evaluation results demonstrate the effectiveness and efficiency of the proposed security solution.
Index Terms—Intrusion Detection Systems, Security, Software Defined Network, ONOS controller, Mininet, Deep learning.

I. INTRODUCTION
The Internet architecture is an increasingly complex system that has changed over time. The traditional network architecture has significant limitations in both software and hardware, which creates the need for new technologies and architectures for future networks. Software Defined Networking is a new paradigm that represents the main concept of the future network. It offers new properties and features by making the network flexible in its configuration and service deployment, as well as dynamic, programmable, scalable, manageable, and cost-effective. These characteristics make SDN well suited to high-bandwidth, dynamic applications. SDN decouples the control plane of the network from the forwarding plane and manages the whole network through a central entity called the controller [1] [2].
OpenFlow [3] is the protocol used to manage the SDN architecture by connecting the control layer of SDN with the infrastructure layer. The SDN architecture and the OpenFlow protocol attract great attention from researchers in both academia and industry. This new architecture adds several functionalities to the network, which increases both the security challenges and the vulnerability of the network. Many solutions have been proposed to secure the SDN network [4] [5] [6]. Security engineers are still looking for an adequate solution to secure the connectivity of the data and information that pass through the SDN network. Intrusion detection systems play a major role in enhancing network security; they are mainly designed to detect malicious activities, including viruses, worms, DDoS attacks, etc., as early as possible. Many solutions have been proposed to secure the SDN network using machine learning [7], statistics collection [8] and data mining [9] methods. Deep learning approaches [10] [11] are used in most recent security works as an evolution of machine learning methods; they have outperformed existing machine learning techniques when applied to various classification problems [12]. The main contributions of this work are as follows:
• Design an anomaly-based intrusion detection system using a deep learning approach.
• Protect the communication between the SDN controller and the physical layer devices.
• Evaluate the proposed solution using the Mininet environment and the ONOS controller.
Figure 1 shows the threat model of the proposed solution.
Fig. 1. The threat model of the proposed system.
Table I shows the list of acronyms used in this paper.
The rest of the paper is organized as follows. Section II briefly discusses the related work. In Section III, we describe our solution by presenting
Authorized licensed use limited to: UNIVERSITY OF BIRMINGHAM. Downloaded on May 10,2020 at 06:52:44 UTC from IEEE Xplore. Restrictions apply.
the system architecture and functionalities. Section IV describes the system implementation and discusses the experimental results. The last section concludes the paper.

TABLE I
ACRONYM LIST
Acronym    Description
SDN        Software Defined Network
IDS        Intrusion Detection System
ONOS       Open Network Operating System
DL         Deep Learning
DNN        Deep Neural Network
GRU-RNN    Gated Recurrent Unit - Recurrent Neural Network
OVS        Open vSwitch
SAE        Stacked Autoencoder
OF         OpenFlow
DoS        Denial of Service attack
DDoS       Distributed Denial of Service attack
TP         True Positive
FP         False Positive
TN         True Negative
FN         False Negative
AC         Accuracy
P          Precision
R          Recall
F          F-measure

II. RELATED WORK
Few previous works using a deep learning approach to secure the SDN network have been proposed in the literature.
In [13] (2016), a flow-based anomaly detection system built on a deep neural network (DNN) architecture is proposed to secure the SDN network. The deep learning system includes three basic types of layers: an input layer with six dimensions, three hidden layers containing twelve, six and three neurons respectively, and an output layer with two dimensions. To achieve a better detection rate, the same authors proposed another intrusion detection system for the SDN network [14] (2018), based on a deep recurrent neural network. The authors propose a Gated Recurrent Unit - Recurrent Neural Network (GRU-RNN) model to build an IDS containing input and output layers with six and one dimensions respectively, and hidden layers with six, four and two dimensions. The system architecture is composed of a Flow Collector, an Anomaly Detector and an Anomaly Mitigator. For the training and test phases, they use the NSL-KDD dataset. The first solution [13] achieves an accuracy of 75.75% using six basic features, and the second one [14] achieves an accuracy of 89%.
In [12], Niyaz Q. et al. (2016) propose a deep-learning-based multi-vector approach to build a DDoS intrusion detection system. The approach incorporates a stacked autoencoder (SAE) in an SDN environment. The proposed system is implemented as a network application on top of an SDN controller and installs rules only for symmetric flows. It consists of three modules: Traffic Collector and Flow Installer (TCFI), Feature Extractor (FE), and Traffic Classifier (TC). Its evaluation is performed using custom-generated traffic traces. The authors claim to have achieved a binary classification accuracy of 99.82% and an 8-class classification accuracy of 95.65%.
Chuanhuang L. et al. [15] (2017) propose a deep-learning-based DDoS attack detection and defense system for OpenFlow-based SDN networks. They use a recurrent neural network, a convolutional neural network and long short-term memory to produce a model with 5 layers: an input layer, a forward layer, a backward layer, a fully connected hidden layer and an output layer. The system learns patterns from sequences of network traffic and traces network attacks in a historical manner. The ISCX2012 [16] dataset is used to evaluate the model, and this proposal achieves 98% accuracy.
Mohd. Z. et al. [17] (2017) present a system for the detection and mitigation of SMTP flood attacks using deep learning analysis techniques. The proposed FlowIDS is a framework integrated with the Suricata IDS [18]; it is based on decision tree (DT) classification and deep learning (DL) algorithms. They test their solution with a single simulation of an SDN environment network for DT and DL. The simulation results show that the DL algorithm handles network bandwidth better than the DT algorithm.

III. PROPOSED SOLUTION
A. Methodology
To obtain a high-performance intrusion detection system that accurately detects the different types of attacks and intrusions affecting the SDN network, we design an anomaly-based intrusion detection method and propose a modular architecture (Figure 2) containing different modules, each of which plays a role in securing the network.
Fig. 2. Architecture of the proposed system.
Our intrusion detection system monitors malicious traffic that passes through the infrastructure layer. It analyzes the flows that pass through the various devices and classifies them using an intelligent system based on a deep learning approach. When an anomaly is detected, an alarm message is sent to the administrator to report the intrusion.
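The detection loop just described can be sketched in Python, under stated assumptions. Each flow is taken to arrive as an already-extracted feature vector. The `classify` argument stands in for the trained deep learning model, and `alert` stands in for the alarm sent to the administrator. The function name `monitor_flows` and the `BENIGN` label for normal traffic are hypothetical; the paper does not define this interface.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical label for normal traffic (an assumption, not taken from the paper).
NORMAL_LABEL = "BENIGN"


def monitor_flows(
    flows: Iterable[Sequence[float]],
    classify: Callable[[Sequence[float]], str],
    alert: Callable[[int, str], None],
) -> List[str]:
    """Classify each flow's feature vector and raise an alert for every anomaly.

    `classify` is a placeholder for the trained model and `alert` a placeholder
    for the alarm message sent to the administrator.
    """
    labels: List[str] = []
    for index, features in enumerate(flows):
        label = classify(features)
        labels.append(label)
        if label != NORMAL_LABEL:
            alert(index, label)  # report the detected intrusion
    return labels


if __name__ == "__main__":
    # Toy stand-in classifier: flags a flow as "DDoS" when its first feature exceeds 0.5.
    toy = lambda f: "DDoS" if f[0] > 0.5 else NORMAL_LABEL
    print(monitor_flows([[0.1], [0.9]], toy, lambda i, l: print(f"ALERT flow {i}: {l}")))
```

Keeping the classifier and the alarm as injected callables mirrors the modular design: either module can be replaced without touching the loop.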
To find an approach that classifies the different flows crossing the network with the best possible accuracy, we focused on deep learning techniques.
To safely implement our intrusion detection system, the following hypotheses must be taken into consideration:
• The security of our SDN network was not compromised before or during our deployment.
• The security of the controller and the application layer, as well as the communications between them, is already ensured.
As shown in Figure 2, the proposed system contains three modules. These modules work together to ensure correct operation and reliable detection of malicious flows crossing the network.
• Feature extraction module: Its main function is to handle the traffic passing through the network. It captures the different flows passing through the network devices and extracts the necessary features from them. These extracted features help the intelligent system determine the type of traffic flow.
– The first step of this module is to capture the flow from the devices using the port mirroring technique. The captured flow is stored in memory until its capture ends. Note that the flow is not processed immediately, which allows the detection of very specific types of attacks such as DoS and DDoS attacks.
– Once the flow is fully captured, the second step extracts the features needed to detect possible attacks. When this step is completed, the module sends a feature vector representing the flow to the pre-processing module.
• Feature pre-processing module: This module processes the vector received from the feature extractor before sending it to the next module. It is mainly responsible for the standardization of the feature vectors.
– The first step of this module removes the features that distort the detection results and the correct classification of the flow type. Examples are the source and destination IP addresses, which cannot distinguish between the different types of flows because an attack or a normal flow can come from any machine.
– The second step converts non-numerical feature values, which prevent the flow classification module from diagnosing the received flows, into numerical values: "Infinity" and "NaN" are replaced by -1 and 0 respectively.
– The last step normalizes each feature of the received flow. Datasets often contain data with different orders of magnitude, and this difference in scale can lead to lower performance. Standardization is one of the techniques used to overcome this problem. There are many techniques for standardizing data; we combine two of them to achieve the best possible results.
• Flow classification module: This module represents the intelligent part of the intrusion detection system. It is trained to differentiate between several types of attacks and normal flows, and it analyzes each received flow based on the model trained during the learning phase.

B. Normalization
• The first phase of normalization reduces excessively large differences between the values of the same feature. It applies the logarithmic function to the values of a feature when there is a very large difference between the maximum and minimum values of that feature:

$$ X_{normalize} = \log(x) \quad \text{if } X_{\max} - X_{\min} > \text{threshold} \tag{1} $$

where x is the value of the feature being normalized. The threshold is fixed at 10000, the value that gave the best results in our tests.
• The second phase of normalization, which complements the first, uses the Min/Max scaler technique. It finds the maximum and minimum of each feature and then applies the following function:

$$ X_{normalize} = \frac{x - X_{\min}}{X_{\max} - X_{\min}} \tag{2} $$

where X_min is the smallest value observed for the feature, X_max is the highest value observed for the feature, and x is the value of the feature being normalized.
The purpose of this phase is to bring all values into the range [0, 1], which ensures a better update of the weights during the learning phase.

C. Training of the intrusion detection system
To train the intelligent system that classifies the different flows, we propose a deep learning architecture (Fig. 3). This architecture contains three levels of layers. The first level is an input layer composed of 77 input neurons, one for each feature of the flow. The second level comprises three hidden layers containing 128, 64 and 32 neurons respectively. The third level contains 7 output neurons, one for each type of flow.
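The pre-processing steps, the two-phase normalization of Equations (1) and (2), and the 77-128-64-32-7 layout can be tied together in the following minimal NumPy sketch. It is written under stated assumptions, and several parts are choices made for illustration only:
- The column names passed to `drop_columns` are hypothetical.
- The log step assumes strictly positive values in the affected columns, since log(x) is undefined otherwise. The paper does not say how it handles this case.
- The He weight initialization is an assumption.
- Only the forward pass is shown, with ReLU (3) in the hidden layers and Softmax (4) at the output. Training code is omitted.

```python
import numpy as np

LOG_THRESHOLD = 10_000              # threshold reported for Eq. (1)
LAYER_SIZES = [77, 128, 64, 32, 7]  # input, three hidden layers, output


def drop_columns(X, names, to_drop):
    """Remove features such as source/destination IP (column names are hypothetical)."""
    keep = [i for i, name in enumerate(names) if name not in to_drop]
    return X[:, keep], [names[i] for i in keep]


def clean_values(X):
    """Replace infinite values by -1 and NaN by 0 (second pre-processing step)."""
    X = np.array(X, dtype=float)  # float copy; the caller's array is untouched
    X[np.isinf(X)] = -1.0
    X[np.isnan(X)] = 0.0
    return X


def _log_columns(X, log_cols):
    """Apply log(x) from Eq. (1) to the selected columns; assumes those values are > 0."""
    Z = np.array(X, dtype=float)
    if np.any(Z[:, log_cols] <= 0):
        raise ValueError("log(x) is undefined for non-positive values")
    Z[:, log_cols] = np.log(Z[:, log_cols])
    return Z


def fit_normalizer(X):
    """Select columns whose range exceeds the threshold, then record min/max for Eq. (2)."""
    log_cols = (X.max(axis=0) - X.min(axis=0)) > LOG_THRESHOLD
    Z = _log_columns(X, log_cols)
    return log_cols, Z.min(axis=0), Z.max(axis=0)


def normalize(X, log_cols, mins, maxs):
    """Two-phase normalization: log on wide-range columns (Eq. 1), then Min/Max (Eq. 2)."""
    Z = _log_columns(X, log_cols)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # avoid dividing by zero on constant features
    return (Z - mins) / span


def relu(x):
    return np.maximum(0.0, x)  # Eq. (3)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)  # Eq. (4)


def init_params(rng):
    """He-initialized dense layers (the initialization scheme is an assumption)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]


def forward(X, params):
    """ReLU on the three hidden layers, Softmax over the 7 flow types."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)
```

The min/max values are fitted once, on training data, and then reused on new flows. As a result, unseen values can fall slightly outside [0, 1].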
The ReLU activation function is defined as:

$$ f(x) = \max(0, x) \tag{3} $$

i.e.:

$$ f(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases} $$

For the output layer, we apply the Softmax activation function (4). It assigns a probability to each class such that the probabilities sum to 1. The class with the highest probability is the predicted type of the analyzed flow.

$$ \sigma(x_i) = \frac{\exp(x_i)}{\exp(x_1) + \exp(x_2) + \dots + \exp(x_n)} \tag{4} $$

where $1 \le i \le n$ and $n$ is the number of output neurons.
Fig. 4 represents the general functioning of the proposed IDS.

IV. DEPLOYMENT AND TEST PERFORMANCE
A. Deployment environment
We use the Mininet emulator [19] to create a realistic virtual network; its switches support OpenFlow for highly flexible custom routing and SDN. We manage the SDN network with the ONOS [20] controller. ONOS (Open Network Operating System) takes charge of network configuration and control in real time. It also removes the need to manage routing protocols and switching within the network structure.
We use the CICIDS2017 dataset [21], generated by the Canadian Institute for Cybersecurity. Each flow in this dataset is represented by a row of 79 features. We use the Wireshark tool to capture the traffic passing through the network, and the CICFlowMeter tool as a feature extractor, which allows us to extract all the features corresponding to the inputs defined for the proposed intelligent system.
B. Experiments and results
1) Evaluation metrics: In most works based on a deep learning approach, the detection performance of a NIDS is evaluated in terms of accuracy (AC), precision (P), recall (R) and F-measure (F). A NIDS requires high detection accuracy and a low false alarm rate. Calculating these metrics requires the following values:
• True Positive (TP): the number of attacks correctly classified.
• True Negative (TN): the number of normal flows correctly classified.
• False Positive (FP): the number of normal flows incorrectly classified.
• False Negative (FN): the number of attacks incorrectly classified.
Each metric is then computed as follows:
- Accuracy (AC): the number of correct detections over the total traffic trace:

$$ AC = \frac{TP + TN}{TP + TN + FP + FN} \tag{5} $$
- Precision (P): the number of true detections over all detections, true and false. The higher P is, the lower the false alarm rate:

$$ P = \frac{TP}{TP + FP} \tag{6} $$

- Recall (R): the proportion of intrusions correctly predicted among all intrusions present. Good detection requires a high R value:

$$ R = \frac{TP}{TP + FN} \tag{7} $$

- F-measure (F): the harmonic mean of precision (P) and recall (R). We aim for a high F value:

$$ F = \frac{2}{\frac{1}{P} + \frac{1}{R}} \tag{8} $$

2) Evaluation results: To evaluate the performance of the proposed system, we fixed the test set at 20% of the initial dataset. A program then divides the rest of the dataset (80%) into several training sets, so that we can observe how the results evolve with the number of flows used during the training phase. Table II shows the evaluation results:

TABLE II
EVOLUTION OF THE METRICS WITH THE SIZE OF THE TRAINING DATA SET (MILLIONS OF FLOWS).
Size of dataset     Ac(%)   P(%)    R(%)    F(%)    FP(%)
Model 20% (0.5M)    99.2    99.03   99.19   99.11   1.48
Model 40% (1M)      99.47   99.46   99.46   99.45   0.73
Model 60% (1.5M)    99.58   99.58   99.58   99.57   0.41
Model 80% (2M)      99.6    99.6    99.6    99.59   0.84

The results presented in Figure 5 show that the metrics improve as the size of the training dataset increases from 500 thousand to 1.5 million flows. Performance then seems to stabilize as the size increases from 1.5 million to 2 million flows.
Based on the results of this evaluation phase, we chose the model that gave the best results (trained with 2 million flows) for further improvement.
To improve this model, the results must first be examined in more detail. For this purpose, we recalculated all the metrics for each type of flow. Table III presents the results obtained.
The precision of each class is always above 95%. The same holds for recall, except for web attacks (83.02%) and the Bot attack (29.69%). The F-measure of each class is above 90%, except for the Bot attack at 45.43%. The false positive rate is also very low for every flow type except the normal flow type.
This misclassification can be explained either by the small number of these flows in the training dataset, or by the inability of this dataset's features to distinguish the attack from normal flow. To better identify the cause of this misclassification and to improve the results, the weight of each class can be modified in the error calculation during the training phase. The weights give more importance to flow types that have few occurrences in the training dataset, which helps address class imbalance in datasets.
3) Comparison with other solutions: Table IV compares the accuracy of our intrusion detection system with that of other IDS implementations that use a deep learning approach in SDN networks.
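The class-weighting idea discussed above can be sketched as a class-weighted cross-entropy. Under this loss, errors on rare flow types such as Bot cost more than errors on frequent ones. The inverse-frequency formula n_samples / (n_classes × count) is an assumption: the paper does not state how its weights were chosen. The function names are also hypothetical.

```python
import numpy as np


def inverse_frequency_weights(labels, n_classes):
    """Weight each class by n_samples / (n_classes * count); absent classes get weight 0."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = labels.size / (n_classes * counts[present])
    return weights


def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    """Class-weighted cross-entropy, averaged by the sum of the per-sample weights."""
    p_true = probs[np.arange(labels.size), labels]  # predicted probability of the true class
    w = class_weights[labels]                       # each sample inherits its class weight
    return float(np.sum(-w * np.log(p_true + eps)) / np.sum(w))
```

In a deep learning framework, the same effect is usually obtained by passing per-class weights to the loss. One example is the `weight` argument of PyTorch's `CrossEntropyLoss`, which, with the default `reduction='mean'`, also divides by the summed weights.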
TABLE IV
COMPARISON OF THE PROPOSED SYSTEM WITH OTHER SOLUTIONS THAT USE DEEP LEARNING IN SDN NETWORKS.
TABLE III
EVOLUTION OF THE METRICS WITH THE TYPE OF FLOW.
Type of flow    DDoS(%)  DoS(%)  Web Attack(%)  PortScan(%)  BruteForce(%)  Normal(%)  Bot(%)
Precision (P)   99.85    98.64   98.36          99.95        95.67          99.71      96.69
Recall (R)      99.85    98.51   83.02          99.83        97.54          99.8       29.69
F-measure (F)   99.85    98.28   90.04          99.89        96.59          99.75      45.43
FP Rate         0.007    0.14    0.001          0.003        0.02           1.03       0.0007
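Per-class figures like those in Table III can be derived one-vs-rest from a multi-class confusion matrix by applying Equations (5) to (8) to each class in turn. The following NumPy sketch makes three assumptions:
- Rows of the matrix are true classes and columns are predicted classes.
- The FP rate is computed as FP / (FP + TN).
- Results are returned as fractions rather than percentages.

The paper does not state its exact FP-rate definition.

```python
import numpy as np


def per_class_metrics(cm):
    """One-vs-rest metrics per class from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as this class but belonging to another one
    fn = cm.sum(axis=1) - tp  # belonging to this class but predicted as another one
    tn = total - tp - fp - fn
    with np.errstate(divide="ignore", invalid="ignore"):  # undefined ratios become nan/inf
        precision = tp / (tp + fp)                    # Eq. (6)
        recall = tp / (tp + fn)                       # Eq. (7)
        f_measure = 2 / (1 / precision + 1 / recall)  # Eq. (8)
        fp_rate = fp / (fp + tn)
    accuracy = (tp + tn) / total                      # Eq. (5), one class against the rest
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "fp_rate": fp_rate}
```

Multiplying each returned array by 100 gives percentages in the same form as Table III, with one entry per flow type in the matrix's class order.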