The Hybrid Technique For DDoS Detection With Supervised Learning Algorithms
Soodeh Hosseini*, Mehrdad Azizi
* Corresponding author
Computer Networks (2019), doi: https://doi.org/10.1016/j.comnet.2019.04.027
E-mail: [email protected], [email protected]
Abstract
Distributed denial of service (DDoS) remains one of the main threats to online services. Attackers can launch DDoS attacks with simple steps and high efficiency in order to prevent or slow down users' access to services. In this paper, we propose a novel hybrid framework, based on a data stream approach, for detecting DDoS attacks with incremental learning. We use a technique that divides the computational load between the client and proxy sides according to their resources, so that the task is organized with high speed. The client side performs three steps: collecting data from the client system, extracting the features chosen by forward feature selection for each algorithm, and running a divergence test. If the divergence exceeds a threshold, the attack is detected; otherwise the data are passed to the proxy side. On the proxy side we use naïve Bayes, random forest, decision tree, multilayer perceptron (MLP), and k-nearest neighbors (K-NN) to obtain better results. Different attacks have their own specific behavior, and because different features are selected for each algorithm, the framework achieves good performance in detecting attacks and a greater ability to distinguish new attack types. The results show that random forest produces better results than the other algorithms.
1. Introduction
Distributed denial of service (DDoS) has become a common and critical attack against web services in recent years because of its simple operation and high efficiency. A DDoS attack can be powerful enough to disconnect an entire country from the internet; in such cases it is considered a cyber-warfare tactic [1]. The purpose of a DDoS attack is to deny legitimate users access to services by exhausting hardware resources or bandwidth, so that servers lose their availability, which is an essential part of the security of any service. The Computer Incident Advisory Capability (CIAC) reported the first occurrence of a DDoS attack. This type of attack evolved from its predecessor, the denial of service (DoS) attack, which has been known since 1980 [2]. The difference between the two lies in the number of sources: a DoS attack uses a single source, whereas in DDoS there are botnets consisting of several compromised machines, called zombies, that are controlled by a bot master acting as the attacker. Many tools exist for generating DDoS attacks, such as Trinity, Low Orbit Ion Cannon, Tribal Flood Network, mstream, and trinoo. These tools differ in their architecture, the type of flooding attack, the method used for the DDoS attack, and several other aspects [3].
Different definitions have been proposed for machine learning. For example, machine learning addresses the question of how to build computers that improve automatically through experience [4], or machine learning is about predicting the future based on the past. If time is divided into past and future, in the past a model or predictor is learned from training data, and in the future it is applied to test data in order to predict the next steps [5]. Machine learning is growing rapidly in many fields such as science, technology, marketing, education, and healthcare. Several machine learning techniques are helpful in cyber security, recommending the proper decision for analysis and even taking the proper action automatically. Available techniques include artificial neural networks, association rules and fuzzy association rules, Bayesian networks, clustering, decision trees, ensemble learning, evolutionary computation, hidden Markov models, naïve Bayes, and support vector machines [6]. The approach taken in this study is to use several algorithms together and take the best action based on the intended target and the algorithms' results.
The implementation is based on the open source analytics platform KNIME, a modular environment that allows interactive execution and simple visual assembly of workflows. In this tool an algorithm can easily be implemented, data can be read in various formats and manipulated, results can be visualized, and many other tasks are at hand [7].
The fast-growing usage of online services and data generation around the world has led to a significant problem: how to handle big data and protect online services. Data contain a great deal of significant and useful knowledge and information, so the considerable profit that can be obtained from them cannot be ignored [8].
Most data are generated sequentially or as a stream, therefore this type of data should be considered. Multiple techniques are available for handling such data; they are described further in the following sections. With data increasing exponentially throughout the world, the techniques used when facing an online threat such as a DDoS attack must be improved. Accordingly, both big data and stream data topics are considered in this work.
This work makes two main contributions. The first is to divide and balance the processing between the client and the proxy in order to obtain a better result in a specified time despite limited resources; the goal is to avoid overloading either side, especially the client side. The second is to prevent the client from continuing the attack in the first place by detecting the attack as soon as possible, either on the client side or on the proxy side.
In this paper, a new hybrid framework based on a data stream approach is introduced for DDoS attack detection, with a technique that divides the computational load between the client and proxy sides.
Furthermore, the presented work reduces processing time and cost for the intrusion detection system (IDS) by moving some of the processing to the client side. Using a number of algorithms together allows each to compensate for the weaknesses of the others and gives the IDS a better detection rate. Because the algorithms follow different procedures, this framework is able to handle new attack types better than other presented frameworks.
The rest of the paper is organized as follows. In Section 2 the main concepts of the study are briefly explained. Section 3 gives an overview of some related works. The proposed framework is introduced in Section 4, and experimental results follow in Section 5. The last section concludes the paper.
2. Background
In this section, the concepts of DDoS attacks and the learning algorithms used in this work are discussed briefly.
2.1. DDoS attack types
In the following, a description of some of the DDoS attack types is given [1, 9]:
2.1.1. UDP flood
In a UDP flood, the attacker sends a large number of UDP packets to random ports of the victim in the fastest way possible. This type of attack exhausts the resources of hosts and is also able to exhaust networks.
2.1.2. ICMP (Ping) flood
Similar to the UDP flood, the internet control message protocol (ICMP), or ping, flood intends to send packets in the fastest way possible without the need for any reply or confirmation. Attackers of this type send numerous false requests to the victim; hence the network traffic is exhausted.
In another attack type, the connections are saturated by the attackers; the aim of this attack is to consume as much of the victim's resources as possible. A further type aims at slowing down the performance, or even damaging the infrastructure; the attack explained before can be regarded as an example of this type.
2.2. DDoS detection and defense mechanisms
According to references [10, 11], detection and defense techniques are divided into four types, which are described in the following.
2.2.1. Source-based defense mechanism
This type of mechanism is deployed close to the sources of the attack, at the edge router of the source's local network or at the access router of an autonomous system (AS) which is connected to the edge routers. Some examples of this type of mechanism are:
1. Ingress/Egress filtering at the sources' edge routers
2. D-WARD [12]
3. Multi-level tree for online packet statistics (MULTOPS)
4. MANAnet's reverse firewall, etc.
2.2.2. Network-based defense mechanism
This type of mechanism is deployed inside the network, in any router. It can either act as a defender for users by itself, or it can perform the appropriate actions for responding to and defeating the attacker. It should be mentioned that the focus of this paper is on this type of mechanism. The possible actions are to filter or to rate-limit the traffic. This mechanism helps to find the source of an attack through cooperative operation between network adapters. Because the network adapters receive legitimate traffic aggregated with malicious traffic, filtering is not a suitable option; the better option is to set a rate limit on the traffic.
2.2.3. Destination-based defense mechanism
This type of mechanism is deployed at the destination of the attack. Therefore, first of all the attack should be detected, and then a defense strategy must be applied on the server by filtering or setting a limitation on malicious traffic.
Because of the deployment point, the victim's traffic is observed precisely and anomalies can be detected, since attack traffic arriving at the server causes the service to slow down. However, despite all the advantages of this mechanism, under heavy DDoS attack traffic the victim's resources will be exhausted in defeating the attack, so the victim remains vulnerable [13]. Some destination-based DDoS defense mechanisms are the following:
1. IP traceback mechanisms
2. Management information base (MIB)
3. Packet marking and filtering mechanisms, etc.
2.2.4. Hybrid defense mechanism
In this mechanism, the strengths of the other mechanisms are used together. For example, the network-based mechanism is better at rate-limiting traffic, while the victim side can better distinguish between legitimate and malicious traffic.
2.3. Learning methods
In this work, two kinds of data are considered: big data, meaning the extremely large volume of data generated in networks, and stream data, meaning data generated continuously over time. Machine learning on these two types of data is described in the following.
2.3.1 Big data learning
In recent years big data has grown in most fields, such as science, engineering, and many others. The term big data refers to data sets that contain a very large amount of data. Based on a report from the International Data Corporation (IDC), in 2011 about 1.8 ZB (1.8 x 10^21 bytes) of data existed throughout the world, roughly nine times more than five years before, and the amount of data is expected to at least double every two years. Many companies, for example Google, Facebook, and TaoBao, are engaged with big data [14]. There is a great deal of significant knowledge and information in these data; thus extracting it calls for new learning techniques that can handle the challenges of big data [8]. In this work, the system's activities and events are analyzed to detect intrusions. System activities and events can generate big data; it is possible to gather data from multiple systems and process them to extract anomalies and malicious activities. After extracting information from big data, effective models to prevent and detect attack activity can be created [15]. Big data learning involves problems such as [8]:
1. Large scale of data
2. Different types of data
3. High speed of streaming data
2.3.2 Stream data learning
Many applications generate data continuously over time, such as user modeling in social networks, monitoring of community networks, web mining, etc. [16]. On the other hand, the size of the data grows exponentially, so it cannot be loaded into memory; this problem makes a new mechanism necessary. Two strategies are available for handling this challenge:
1. Parallel processing: the algorithm is divided into small, separate parts to reduce computing time.
2. Incremental processing: a one-pass algorithm is implemented that creates a model and updates it each time a single data item is read.
In stream data learning, some problems are as follows:
1. Processing a great volume of streaming events (predict, act, filter, etc.)
2. Scalability and performance when the size and complexity of the data increase
3. Analytics such as real-time data discovery and monitoring, and processing queries as they arrive
Incremental learning is a type of learning based on the idea that, given a new example and the current hypothesis f_i, the hypothesis f_{i+1} can be generated without learning on all of the previous data or hypotheses. In this setting the algorithm must be faster than comparable batch learning algorithms; to achieve this, most incremental algorithms read each data item only once, which increases time efficiency and allows more data to be processed in a short time. In addition to the properties mentioned before, other properties are expected, such as a fixed and low processing time per example, reading the data in their arrival order and only once, constant memory consumption regardless of how many examples are processed, and the ability to make a prediction at any time [16].
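As an illustration of this one-pass style of learning, the following sketch (not part of the original framework; it assumes scikit-learn and uses synthetic mini-batches in place of real stream data) updates a naïve Bayes model with partial_fit as each batch arrives, so the hypothesis f_{i+1} is obtained from f_i and the new data only:

# Minimal sketch of incremental (one-pass) learning, assuming scikit-learn.
# The stream, features, and labels below are synthetic placeholders.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
model = GaussianNB()
classes = np.array([0, 1])  # 0 = normal, 1 = DDoS (hypothetical encoding)

def stream_batches(n_batches=10, batch_size=256, n_features=5):
    # Yield synthetic (X, y) mini-batches standing in for stream data.
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = rng.integers(0, 2, size=batch_size)
        yield X, y

for X_batch, y_batch in stream_batches():
    # Each example is read only once; the model is updated in place
    # without revisiting earlier data, and memory use stays constant.
    model.partial_fit(X_batch, y_batch, classes=classes)

print(model.predict(rng.normal(size=(3, 5))))  # a prediction is available at any time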
3. Related work
So far, some of the basic concepts of the presented work have been explained. This section summarizes the related works. Table 1 lists some of the research done in recent years. Besides the reference, the table has three columns: objective, deployment, and remarks. The objective is the main aim of each work, which is mostly attack detection except for one of the works. Deployment, as described above, includes four types: source side, network based, victim side, and hybrid. The remarks column summarizes the main points of each work.
Table 1. Attack detection studies in the literature

Reference | Objective | Deployment | Remarks
V. Sekar, et al. (2006) [18] | attack detection | source side | designed a triggered, multi-stage framework with high accuracy and scalability
H. Rahmani, et al. (2009) [19] | attack detection | victim side | a statistical approach based on network anomaly and joint entropy identification
J. François, et al. (2012) [21] | attack detection | source side | an early detection technique for DDoS flooding attacks which supports collaborative protection
B. Singh and S. Panda (2015) [9] | attack detection | intermediate network | anomaly- and signature-based detection from a high-speed link in an efficient data streaming fashion
A. Fadlil, et al. (2017) [23] | attack detection | source side | network traffic activity statistically analyzed using the Gaussian naïve Bayes method
D. Kim and K. Y. Lee (2017) [24] | attack detection | source side | proposed some attributes and a method for detecting a variant of DDoS attack on the client side using SVM
M. A. M. Yusof, et al. (2017) [25] | attack detection | intermediate network | proposed a technique called PTA-SVM (Packet Threshold Algorithm with SVM) to detect DDoS attacks
S. Behal, et al. (2018) [13] | attack detection | intermediate network | proposed an ISP-level, flexible and automatic model based on collaboration with the nearest point-of-presence routers to distribute computational and storage complexity
— | — | — | used several algorithms for comparing results; naïve Bayes was found to have better results
4. The proposed framework
The proposed framework is a hybrid machine learning mechanism for detecting and defending against DDoS attacks. The framework is based on two sides, the client side and the proxy side. Because of the limited resources on both sides, the processing is divided between them. There are many papers on detecting DDoS attacks using machine learning algorithms; each of them uses one or multiple algorithms separately to detect the attack and compare the results with each other [23, 24, 28, 29]. Here we try to use multiple algorithms together and benefit from the properties of all of them simultaneously [30]. In addition, a determiner is used in the framework to help improve the results. The determiner contains several rules, which the system administrator can define, to achieve better results from the classifiers under particular situations.
Figure 1 shows the proposed framework. The detection process starts on the client side; after some preliminary steps the data are compared with the attack profile database, and if no attack is detected the data proceed to the proxy side. Finally, if the data are detected as normal, they proceed to the server.
Figure 1. The proposed framework
4.1. Dataset
The first step of any learning task is data gathering. Appropriate data help to obtain better results and to design the framework efficiently. The presented framework's results are based on two distinct datasets.
The first dataset is the NSL-KDD dataset [31]. NSL-KDD is an improved version of KDDCUP'99 that solves several of its problems and provides a new dataset of selected records from the KDD dataset in which those problems no longer exist. This means that, despite its own remaining issues, NSL-KDD is a proper dataset for researchers [32]. NSL-KDD contains not only DDoS attacks but also other attack types. Initially it contains 125,974 training records and 22,544 test records with 43 attributes. A filter is applied to separate DDoS attacks from the other attacks, which decreases the records to 113,270 for training and 17,168 for testing. Moreover, it is assumed that all DDoS attack types in the dataset are the same, and all of them are renamed to DDoS; therefore two classes, "DDoS" and "normal", are obtained.
The second dataset was provided by Alkasassbeh et al. [33]. This dataset contains modern attacks such as SIDDOS and HTTP flood. The authors gathered the dataset in six steps: first the network traffic is generated, then it is collected and tested, after which the data are preprocessed; when preprocessing is done, feature extraction starts, then statistical measurements are calculated, and finally the dataset is ready. The dataset includes 2,160,668 records with 28 attributes. Five classes are defined: UDP-Flood, Smurf, SIDDOS, and HTTP-FLOOD, which are four types of DDoS attack, plus a normal class. As with the NSL-KDD dataset, the records are divided into the two classes "DDoS" and "normal".
Before any operation on the datasets, the data are transformed into numerical form and then normalized.
4.2. Offline mode
Before deploying the framework, some data manipulation is done and the classifiers are trained. Performing these time-consuming steps offline helps achieve better performance on the client and proxy sides. Some of these steps were mentioned in the previous section; after them, forward selection is used to find the best subset of features for training each algorithm separately. Forward selection starts with an empty subset of features and in every iteration adds the feature that most improves the model's performance. This iterative process continues until adding any feature no longer improves the performance.
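As a rough sketch of this procedure (not the KNIME meta node itself; it assumes scikit-learn, a generic classifier clf, and cross-validated accuracy as the performance measure), the greedy loop could be written as follows:

# Hedged sketch of forward feature selection with cross-validated accuracy.
# clf, X (2-D NumPy array), and y are assumed inputs, not the paper's data.
from sklearn.model_selection import cross_val_score

def forward_select(clf, X, y, cv=5):
    # Greedily add the feature that most improves mean CV accuracy.
    remaining = list(range(X.shape[1]))
    selected, best_score = [], 0.0
    while remaining:
        scores = []
        for f in remaining:
            cols = selected + [f]
            acc = cross_val_score(clf, X[:, cols], y, cv=cv, scoring="accuracy").mean()
            scores.append((acc, f))
        acc, f = max(scores)
        if acc <= best_score:  # stop when no feature improves performance
            break
        best_score, selected = acc, selected + [f]
        remaining.remove(f)
    return selected, best_score

Running such a loop once per classifier yields a different feature subset for each algorithm, which is the property the framework relies on.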
4.3. Client side
In this section, the beginning of the framework, on the client side, is described. On the client side, because of its limited resources, only simple processing is performed: gathering data from system activity and system events, preprocessing them for the α-divergence test, and finally comparing them with the presented models. If the result of the test exceeds the threshold, the connection is terminated, or any other appropriate action is taken to stop the attack as soon as possible.
The pseudocode of the client side is as follows:
//input: selected features in offline mode
//output: two types of packet: attack, or normal/suspicious
1. data = real_time_data_collect()
2. values = data_extraction(data, selected-features)
3. flag = divergence_test(values)
4. if (flag == “Attack”)
5. terminate connection
6. else
7. process to proxy side
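As one concrete, hypothetical reading of this pseudocode, the sketch below compares the histogram of an incoming window of feature values against a stored normal profile with a simple KL-style divergence; the threshold, bin layout, and profile format are assumptions rather than the paper's exact α-divergence configuration:

# Illustrative client-side check: divergence of current traffic features
# against a stored "normal" histogram profile. Threshold and bins are assumed.
import numpy as np

THRESHOLD = 0.5  # hypothetical decision threshold

def histogram(values, bins):
    h, _ = np.histogram(values, bins=bins, density=True)
    return h + 1e-9  # avoid zero bins before taking logarithms

def kl_divergence(p, q):
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def client_check(window_values, normal_profile, bins):
    # Return "Attack" if the window diverges too much from the normal profile.
    current = histogram(window_values, bins)
    return "Attack" if kl_divergence(current, normal_profile) > THRESHOLD else "Suspicious"

# Example: a profile built from normal-looking traffic vs. a drifted window.
bins = np.linspace(0, 10, 21)
normal_profile = histogram(np.random.default_rng(1).normal(5, 1, 5000), bins)
window = np.random.default_rng(2).normal(8, 1, 500)  # drifted, so likely "Attack"
print(client_check(window, normal_profile, bins))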
4.4. Proxy side
The proxy side architecture is shown in Figure 2. After the client-side processing, the incoming data are first checked against the attack profile database in order to avoid computational overhead. If the data do not match any profile, they are given to the proxy side in an appropriate form on which the machine learning algorithms can operate. The naïve Bayes, random forest, decision tree, MLP, and K-NN algorithms are run on the data, and then their results are given to an algorithm determiner, which decides based on its configured policy and provides the better result. Because the approach is based on incremental learning, the profile is updated with every single input data item. In addition, there is a profile database that prevents congestion of profiles.
The pseudocode of the proxy side is as follows:
//input: packets from the client side, trained classifiers from the offline side
//output: two types of packet: attack or normal
4. if (flag == “Attack”)
5. terminate connection
AC
6. data = transform(data)
7. i = 0
8. for each cls in classifiers:
9. ans[i] = classifier(data, cls)
10. i=i+1
11. TypeFlag , ProfileFlag = determiner(ans)
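The determiner's policy is configurable by the system administrator; as one hedged illustration, using the simple "at least two classifiers agree" rule discussed in Section 5, the proxy-side loop and vote could look like the sketch below, where the five trained classifier objects and the profile store are assumed inputs:

# Hedged sketch of the proxy-side ensemble and one possible determiner policy.
# `classifiers` stands for the five trained models (naive Bayes, random forest,
# decision tree, MLP, K-NN); the voting rule is an assumption, not the only policy.
from collections import Counter

def determiner(answers, min_votes=2):
    # Label the record "DDoS" if at least `min_votes` classifiers say so.
    counts = Counter(answers)
    return "DDoS" if counts.get("DDoS", 0) >= min_votes else "normal"

def proxy_side(record, classifiers, attack_profiles):
    # Cheap profile lookup first, to avoid running all classifiers.
    if tuple(record) in attack_profiles:
        return "DDoS"
    answers = [clf.predict([record])[0] for clf in classifiers]
    label = determiner(answers)
    if label == "DDoS":
        attack_profiles.add(tuple(record))  # incremental profile update
    return label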
Figure 2. The proxy side architecture
5. Experimental results
In this section the implementation of the experiments, carried out with the KNIME analytics platform, is described. Figure 3 shows a snapshot of the KNIME workflow designed for the implementation of the experiments. The machine learning algorithms are operated in batch mode; the stream data mode will be studied in future work.
Figure 3. A snapshot of the KNIME workflow
Several nodes are used to transform the data into the preferred form. In order to reduce the processing cost on the proxy side and also to improve detection, feature selection is performed; its results are presented later in this section.
The MLP trainer is configured with one hidden layer of 19 nodes for NSL-KDD and one hidden layer of eight nodes for the dataset introduced in reference [33]. For the random forest, the information gain ratio is used as the split criterion, and the number of models is set to 29 and 50 for NSL-KDD and the dataset of reference [33], respectively. The decision tree is trained with the Gini index quality measure and the Minimum Description Length (MDL) pruning method [34]. K-NN is trained with k equal to 11 for NSL-KDD and 7 for the other dataset; for NSL-KDD, closer neighbors have a greater influence on the resulting class than more distant ones.
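These settings are options of the KNIME learner nodes; the sketch below shows roughly comparable (but not identical) scikit-learn configurations, noting that scikit-learn exposes neither the information gain ratio nor MDL pruning, so entropy and Gini are used as stand-ins:

# Approximate scikit-learn counterparts of the KNIME settings (a sketch only).
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

def build_classifiers(dataset="nsl-kdd"):
    hidden = (19,) if dataset == "nsl-kdd" else (8,)   # one hidden layer
    n_trees = 29 if dataset == "nsl-kdd" else 50        # "number of models"
    k = 11 if dataset == "nsl-kdd" else 7
    weights = "distance" if dataset == "nsl-kdd" else "uniform"  # closer neighbors count more
    return {
        "naive_bayes": GaussianNB(),
        "random_forest": RandomForestClassifier(n_estimators=n_trees, criterion="entropy"),
        "decision_tree": DecisionTreeClassifier(criterion="gini"),
        "mlp": MLPClassifier(hidden_layer_sizes=hidden, max_iter=500),
        "knn": KNeighborsClassifier(n_neighbors=k, weights=weights),
    }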
In the first step of the implementation, preprocessing is applied to the datasets. Before the common processing steps, the DDoS attack records are selected for NSL-KDD. Then, for both datasets, the string-type attributes are converted to integer type and the data are normalized with the min-max algorithm. In the next step, the different types of DDoS attack defined in the datasets are renamed to the two main classes, "DDoS" and "Normal".
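A hedged sketch of these preprocessing steps with pandas and scikit-learn is given below; the label column name and the exact label values are placeholders for the actual dataset attributes:

# Sketch of the preprocessing: collapse labels, encode strings, min-max normalize.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    df = df.copy()
    # All DDoS attack types are treated alike, as in the paper.
    df[label_col] = df[label_col].apply(lambda v: "Normal" if v == "normal" else "DDoS")
    # Convert string-type attributes to integer codes.
    for col in df.columns:
        if col != label_col and df[col].dtype == object:
            df[col] = df[col].astype("category").cat.codes
    # Min-max normalization of the feature columns.
    features = [c for c in df.columns if c != label_col]
    df[features] = MinMaxScaler().fit_transform(df[features])
    return df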
In KNIME, feature selection can be performed in a meta node named forward feature selection. The dataset is given to this node as input, and after processing it provides a table with the attribute numbers and the accuracy obtained when each group of them is selected. A snapshot of the feature selection result is shown in Figure 4; the rows are sorted by descending accuracy, and next to each accuracy percentage is the number of features selected for that accuracy. On the right side of Figure 4, the selected features for the selected accuracy are marked with a blue line. The selected row in Figure 4 shows an accuracy of 93.1% with eleven features, five of which are col 1, col 3, col 7, col 21, and col 27.
Figure 4. A snapshot of the feature selection result
Algorithm | Selected features (NSL-KDD) | Accuracy (NSL-KDD, %) | Selected features (dataset [33]) | Accuracy (dataset [33], %)
decision tree | 1,2,4,9,11,12,23,35,36,41 | 98.2 | 19,27 | 98.7
MLP | 1,5,6,7,8,9,10,12,14,16,18,19,20,21,26,31,35,36,41,42 | 96.1 | 1,2,5,6,17,19,22,23,26,27 | 98.4
K-NN | 1,4,7,21,22,29,32,35,38,41,42 | 97.7 | 16,19,20,22,27 | 98.7
Furthermore, the accuracy can be calculated with Eq. (1):

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
Figure 5 shows the results in a simple plot that compares the accuracies on the two datasets.
5.3. Result
As described before, the naïve Bayes, random forest, decision tree, MLP, and K-NN algorithms are implemented. Each model is trained 20 times with 5-fold cross-validation, the results for each dataset are collected, and the mean of the results is taken. The precision, recall, and F-measure of each algorithm, calculated with the following equations, are shown in the table below. For each algorithm the cells are divided in two, one for the DDoS attack class and the other for the normal class, so each criterion is reported separately for each data class.
Precision = TP / (TP + FP)    (2)
Recall = TP / (TP + FN)    (3)
F-measure = 2 x Precision x Recall / (Precision + Recall)    (4)
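One way to reproduce this evaluation protocol is sketched below, assuming scikit-learn, a classifier clf, and NumPy arrays X and y with string labels "DDoS"/"Normal" (these names are assumptions carried over from the preprocessing sketch above):

# Sketch of 20 repetitions of 5-fold cross-validation with per-class metrics.
import numpy as np
from sklearn.base import clone
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import RepeatedStratifiedKFold

def evaluate(clf, X, y, n_repeats=20):
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats, random_state=0)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        model = clone(clf).fit(X[train_idx], y[train_idx])
        p, r, f, _ = precision_recall_fscore_support(
            y[test_idx], model.predict(X[test_idx]), labels=["DDoS", "Normal"]
        )
        scores.append(np.stack([p, r, f]))
    # Mean precision/recall/F-measure per class over all folds and repetitions.
    return np.mean(scores, axis=0)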
Algorithm | Type | Precision (NSL-KDD) | Recall (NSL-KDD) | F-measure (NSL-KDD) | Precision ([33]) | Recall ([33]) | F-measure ([33])
naïve Bayes | DDoS | 93.6 | 87.3 | 92.7 | 99.9 | 84.5 | 91.6
naïve Bayes | Normal | 94.4 | 97.5 | 95 | 98.2 | 1 | 99.1
random forest | DDoS | 99.6 | 99.8 | 99.7 | 99.9 | 87.3 | 93.2
random forest | Normal | 99.9 | 99.8 | 99.8 | 98.5 | 1 | 99.3
decision tree | DDoS | 99.4 | 99.8 | 99.6 | 99.9 | 87.3 | 93.1
decision tree | Normal | 99.9 | 99.6 | 99.7 | 98.5 | 1 | 99.3
MLP | DDoS | 93.4 | 91.8 | 94.9 | 99.9 | 86.2 | 92.5
MLP | Normal | 97.5 | 95.6 | 96.4 | 98.4 | 1 | 99.2
K-NN | DDoS | 99.8 | 99.8 | 99.8 | 99.9 | 87.3 | 93.1
K-NN | Normal | 99.9 | 99.8 | 99.9 | 98.5 | 1 | 99.3
The given implementation is based on a tool in order to verify its accuracy; the results of the KNIME implementation are compared with those reported for the dataset introduced in [33].
So far, the results have been described without considering the determiner on the proxy side. Before describing the determiner policy, the outputs of all the algorithms are put in a table together with the correct output. A simple look at the results on the first dataset shows that there are only 18 records, out of 17,168, on which all of the algorithms are mistaken, and 2,033 records on which the algorithms decide differently and at least one of them is able to detect correctly. It is assumed that if at least two of the algorithms decide correctly, the whole approach decides correctly; otherwise a mistake happens. Both datasets are processed with this policy. The results, with the true positive rate (TPR) and false positive rate (FPR) calculated according to the following equations, are reported in Table 5.
TPR = TP / (TP + FN)    (6)
FPR = FP / (FP + TN)    (7)
Detecting a DDoS attack is considered the positive class and detecting normal data is considered the negative class.
Table 5. The comparison of true positive rate and false positive rate

Measure | NSL-KDD | Introduced dataset in [33]
TPR | 0.995 | 0.873
FPR | 0.005 | 0.00006
Variance is a measure of the spread of numbers around their average value in a data set; equivalently, it is the average of the squared differences from the mean. For each dataset and algorithm, the variance is calculated separately for the normal and DDoS classes. First, the normal and DDoS rows in the algorithms' results are separated; for each class, the standard deviation of all columns is calculated, then squared, and the average is taken. The variance formula is given in Eq. (8), and the results obtained according to it are shown in Table 6.

Variance = sum( (X - u)^2 ) / N    (8)

Where X is an individual data point, u is the mean of the data points, and N is the total number of data points.
The variance is always zero or a positive number. A small variance indicates that the numbers in the set are close to the mean and to each other, while a large variance indicates the opposite. A small variance is better than a large one and improves the generalization of the classifiers.
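As a quick numerical illustration of Eq. (8), the per-class variance can be computed as in the sketch below (the results array is a placeholder, not values from Table 6):

# Sketch: variance of a set of per-run scores for one class and one algorithm.
import numpy as np

results = np.array([99.6, 99.8, 99.7, 99.5, 99.8])  # placeholder scores (%)
variance = np.mean((results - results.mean()) ** 2)  # Eq. (8): sum((X - u)^2) / N
print(variance)  # equivalent to np.var(results)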
To compare the examined algorithms, we perform a statistical test using STAC [35]. STAC is a web platform for comparing methods using statistical tests. If the data follow a specific distribution (usually normal), a parametric test can be used; otherwise, if the data distribution is not specified, a nonparametric test is used. We therefore have a non-parametric test with 5 algorithms and 2 datasets (k = 5 and n = 2); based on the STAC assistant decision tree, the Aligned Ranks test is chosen. Figure 7 shows the path by which the Aligned Ranks test is chosen.
The goal of the statistical test is to compare the accuracy of the algorithms, so we compare the algorithms two by two. First, we normalize the accuracies in Table 4 to the range [0, 1] and give the values to STAC as input. We apply the Aligned Ranks test with the post-hoc parameter set to Holm and the significance level set to 0.05; the result is shown in Table 7. H0 is accepted when the p-value is greater than the significance level.
Table 8 shows the rank of algorithms in STAC. As shown in Table 8, naïve Bayes gets the
minimum score and random forest gets the maximum score.
Algorithm | Ranking
Random Forest | 8.50
Decision Tree | 7.25
K-NN | 6.75
MLP | 2.75
naïve Bayes | 2.25
Another output is obtained to compare the algorithms two by two, which is shown in Table 9.
6. Conclusions
In this paper, we proposed a hybrid framework to detect DDoS attacks. Based on the results, the processing is divided between two sides; each side performs its own work, and therefore the speed of organizing the work is improved by this technique. The KNIME implementation demonstrated its performance when compared to other works, hence the given results are reliable. Random forest produced better results in the analysis of both datasets, but in particular situations any of the other algorithms may work better; this is one of the reasons for using a number of classifiers. As can be seen, each of the algorithms selects a different subset of features, so each one performs according to specific features of an attack; accordingly, a wide range of attacks can be detected based on their behavior. Using a number of classification algorithms instead of one gives, in the long run, a better ability to face unknown attacks than previous methods. Each algorithm has its own weaknesses and strengths, thus a combination of algorithms is used to obtain better detection. Another property of this work is storing attack profiles in a database to prevent over-processing of stream data, with the ability to compare data features to the stored profile database using the α-divergence test.
References
[1] K. N. Mallikarjunan, K. Muthupriya, and S. M. Shalinie, "A survey of distributed denial of service attack," in Intelligent Systems and Control (ISCO), 2016 10th International Conference on, 2016, pp. 1-6: IEEE.
[2] P. J. Criscuolo, "Distributed denial of service: Trin00, tribe flood network, tribe flood network 2000, and stacheldraht CIAC-2319," California Univ Livermore Radiation Lab, 2000.
[3] B. Nagpal, P. Sharma, N. Chauhan, and A. Panesar, "DDoS tools: Classification, analysis
and comparison," in Computing for Sustainable Global Development (INDIACom), 2015
2nd International Conference on, 2015, pp. 342-346: IEEE.
[7] M. R. Berthold, et al., "KNIME: The Konstanz Information Miner," in Studies in Classification, Data Analysis, and Knowledge Organization, Springer, 2007.
[8] J. Qiu, Q. Wu, G. Ding, Y. Xu, and S. Feng, "A survey of machine learning for big data processing," EURASIP Journal on Advances in Signal Processing, vol. 2016, no. 1, p. 67, 2016.
[9] B. Singh and S. Panda, "Defending against DDoS flooding attacks - a data streaming approach," International Journal of Computer & IT, 2015.
[10] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 2046-2069, 2013.
[11] K. M. Prasad, A. R. M. Reddy, and K. V. Rao, "DoS and DDoS attacks: defense, detection and traceback mechanisms - a survey," Global Journal of Computer Science and Technology, 2014.
[12] J. Mirkovic, G. Prier, and P. Reiher, "Attacking DDoS at the source," in Network Protocols, 2002. Proceedings. 10th IEEE International Conference on, 2002, pp. 312-321: IEEE.
[13] S. Behal, K. Kumar, and M. Sachdeva, "D-FACE: An anomaly based distributed approach for early detection of DDoS attacks and flash events," Journal of Network and Computer Applications, 2018.
International Journal of Networks and Communications, vol. 7, no. 1, pp. 24-31, 2017.
[16] V. Lemaire, C. Salperwyck, and A. Bondu, "A survey on supervised classification on data streams," in European Business Intelligence Summer School, 2014, pp. 88-125: Springer.
[17] D. Namiot, "On big data stream processing," International Journal of Open Information Technologies, vol. 3, no. 8, 2015.
[18] V. Sekar, N. G. Duffield, O. Spatscheck, J. E. van der Merwe, and H. Zhang, "LADS: Large-scale automated DDoS detection system," in USENIX Annual Technical Conference, 2006.
[21] J. François, I. Aib, and R. Boutaba, "FireCol: a collaborative protection network for the
detection of flooding DDoS attacks," IEEE/ACM Transactions on Networking (TON),
vol. 20, no. 6, pp. 1828-1841, 2012.
[22] M. Barati, A. Abdullah, N. I. Udzir, R. Mahmod, and N. Mustapha, "Distributed Denial
of Service detection using hybrid machine learning technique," in Biometrics and
Security Technologies (ISBAST), 2014 International Symposium on, 2014, pp. 268-273:
IEEE.
[23] A. Fadlil, I. Riadi, and S. Aji, "A novel DDoS attack detection based on Gaussian naive Bayes," Bulletin of Electrical Engineering and Informatics, vol. 6, no. 2, pp. 140-148, 2017.
[24] D. Kim and K. Y. Lee, "Detection of DDoS attack on the client side using support vector machine," International Journal of Applied Engineering Research, vol. 12, no. 20, pp. 9909-9913, 2017.
[25] M. A. M. Yusof, F. H. M. Ali, and M. Y. Darus, "Detection and Defense Algorithms of
Different Types of DDoS Attacks Using Machine Learning," in International Conference
on Computational Science and Technology, 2017, pp. 370-379: Springer.
[26] B. Zhou, J. Li, J. Wu, S. Guo, Y. Gu, and Z. Li, "Machine-learning-based online distributed denial-of-service attack detection using Spark Streaming," in 2018 IEEE International Conference on Communications (ICC), 2018, pp. 1-6: IEEE.
[27] M. Idhammad, K. Afdel, and M. Belouch, "Semi-supervised machine learning approach for DDoS detection," Applied Intelligence, pp. 1-16, 2018.
[28] K. N. Mallikarjunan, A. Bhuvaneshwaran, K. Sundarakantham, and S. M. Shalinie, "DDAM: Detecting DDoS attacks using machine learning approach," in Computational Intelligence: Theories, Applications and Future Directions - Volume I: Springer.
[30] I. Cano and M. R. Khan, "ASML: Automatic Streaming Machine Learning," ed.
[31] "Nsl-kdd data set for network-based intrusion detection systems." Available on:
http://nsl.cs.unb.ca/KDD/NSL-KDD.html, March 2009.
[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD
PT
CUP 99 data set," in Computational Intelligence for Security and Defense Applications,
2009. CISDA 2009. IEEE Symposium on, 2009, pp. 1-6: IEEE.
[33] M. Alkasassbeh, G. Al-Naymat, A. Hassanat, and M. Almseidin, "Detecting distributed
CE
25
ACCEPTED MANUSCRIPT
Biographies
Soodeh Hosseini received the B.S. degree in computer science (2004) from Shahid Bahonar University of Kerman and the M.Sc. and Ph.D. degrees in computer engineering (software) from Iran University of Science and Technology, in 2007 and 2016, respectively. Her main research interests include machine learning, cyber security, and computer simulation. She has published several papers in international journals and conferences. Currently, she is an assistant professor at the Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran.
Mehrdad Azizi received the B.S. degree in computer science (2016) from Vali-e-Asr University of Rafsanjan and is now pursuing the M.S. degree at Shahid Bahonar University of Kerman. His areas of interest include machine learning, cyber security, deep learning, and computer vision.