Anomaly-Based Intrusion Detection From Network Flow Features Using Variational Autoencoder
ABSTRACT The rapid increase in network traffic has recently raised the importance of flow-based intrusion
detection systems, which process only a small amount of traffic data. Furthermore, anomaly-based methods, which
can identify unknown attacks, are also integrated into these systems. This study focuses
on the detection of anomalous network traffic (or intrusions) from flow-based data using unsupervised deep
learning methods with a semi-supervised learning approach. More specifically, Autoencoder and Variational
Autoencoder methods were employed to identify unknown attacks using flow features. The experiments
used flow-based features extracted from network traffic data that included typical traffic and different types
of attacks. The Receiver Operating Characteristics (ROC) curves and the area under the ROC curve resulting
from these methods were calculated and compared with those of One-Class Support Vector Machine. The ROC
curves were examined in detail to analyze the performance of the methods at various threshold values. The
experimental results show that Variational Autoencoder performs, for the most part, better than Autoencoder
and One-Class Support Vector Machine.
INDEX TERMS Flow anomaly detection, intrusion detection, deep learning, variational autoencoder,
semi-supervised learning.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
108346 VOLUME 8, 2020
S. Zavrak, M. İskefiyeli: Anomaly-Based Intrusion Detection From Network Flow Features Using VAE
The KDDCUP99 dataset [11], which also includes packet-based features, is used to evaluate the methods in the studies [6], [12]. The NSL-KDD dataset [13], which is a revised version of KDDCUP99, is used for evaluating the methods proposed in the studies [3], [4], [6]–[8]. The main drawbacks of these studies can be listed as follows: a) These studies focus on the detection of intrusions in content-based features. b) The dataset used is very outdated and does not reflect the real network traffic [14].
As the flow-based data contains less information on the network traffic compared to payload-based data, it is much harder to detect both known and unknown attacks. In this study, the goal is to detect anomalous network traffic (or intrusions) from flow-based data, which also contain statistical properties of the flows, using deep learning methods. Furthermore, Autoencoder (AE) and Variational Autoencoder (VAE) methods are employed to detect unknown attacks, meaning attacks that are not used in the training phase, by using the flow features extracted from network traffic.
The main contributions of the study are summarized as follows:
• This study concentrates on the detection of network attacks from flow-based features, based on an anomaly-based approach.
• AE and VAE, which are unsupervised deep learning methods, are employed together with OCSVM as anomaly detectors and they are trained in a semi-supervised learning manner. In addition, unlike the previous studies mentioned above, this study is unique as it uses deep learning methods for detecting intrusions based on flow-based data.
• It is shown that the VAE-based anomaly detection system performs much better compared to the others, based on the detailed discussion of ROC and AUC results.

The article is organized as follows. In the next section, the studies carried out on flow-based intrusion detection in the literature are summarized. In the third section, the theoretical information on the techniques and evaluation metrics is provided. The experimental methodology and results are presented in the fourth section. The final remarks are provided in the last section.

II. RELATED STUDIES
Flow-based intrusion detection is on the rise and research in this field is gathering pace. In recent years, numerous methods have been proposed, which used flow data for identifying intrusions. In this section, a review of recent trends and particular state-of-the-art algorithms that detect intrusions from flow-based data is given. These include Artificial Neural Networks (ANN), Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Decision Trees, clustering and statistical techniques. A more comprehensive and detailed analysis of flow-based intrusion detection can be found in studies of [15] and [16].
The objective of ANNs is to model the human brain, by mimicking neurons, which are small interconnected input units. Each neuron in ANN participates in decision making, and the results are combined. Behaviors of users are modelled by ANNs to find a way to detect anomalies. Numerous ANNs used for anomaly-based IDSs were discussed by Beghdad [17]. Sui et al. [18] proposed an anomaly detection system that used a back-propagation neural network classifier and statistical feature vectors. They considered three scenarios of resource depletion, DoS attacks, bandwidth attack and a combination of bandwidth attack and resource depletion using network flow records with DoS attacks. Tran et al. [19] proposed a hybrid detection engine, which used block-based neural network (BBNN) as the detection method. In order to generate a real-time IDS, which was supplied by NetFlow data, it was added in a high-frequency FPGA board. Abuadlla et al. [1] proposed an IDS to detect and classify certain intrusions in flow-based data, which consisted of two phases. In the first phase, significant changes are monitored to identify potential attacks. In the second phase, if an attack is known, multi-layer and radial basis function networks are used to classify the attack. Jadidi et al. [20] proposed a method that was based on Multi-Layer Perceptron (MLP) in order to detect abnormal traffic in flow-based data. The interconnection weights of MLP are optimized by using Cuckoo and particle swarm optimization with a gravitational search algorithm (PSOGSA). Mirsky et al. [21] proposed Kitsune, which was an online anomaly detection system that identified the attacks on a local network by employing a group of ANNs named AEs, to cooperatively distinguish between normal and abnormal traffic patterns with a performance comparable to offline anomaly detectors. Marir et al. [22] presented a novel distributed method for identifying abnormal behavior utilizing a group of multi-layer SVMs together with a deep feature extraction in large-scale networks. In the approach proposed, a non-linear dimensionality reduction was initially performed with a distributed deep belief networks on network traffic data and then the features extracted were provided as inputs to the multi-layer group of SVMs which were constructed through the iterative reduce paradigm based on Spark. Vinayakumar et al. [23] explored a deep neural network (DNN) to create a useful and flexible IDS, named ‘‘scale-hybrid-IDS-AlertNet’’, and to identify unforeseen and unpredictable intrusions via supervised learning approach. They selected the network topologies and optimal network parameters for DNNs by applying various hyperparameter settings with KDDCup99 and the best performed DNN model was also applied on other contents- and flow-based public datasets to carry out benchmarks.
The SVM is a classification method, which transforms an n-dimensional input data into classes by generating vectors in the space. In the research area of intrusion detection, SVM is the method preferred as it provides results in lower false positive rates and higher accuracies [74]. Winter et al. [24] proposed an inductive network IDS, which functioned on network flows and used OCSVMs for analysis.
Instead of benign flows, the IDS proposed was trained with malicious data, as opposed to the previous approaches. Wagner et al. [25] presented an anomaly detection method by processing large volumes of Netflow records based on SVM. The method in which the quantitative and contextual information of Netflow data was considered was carried out by feeding the Netflow records into kernel function and forwarding the calculated results to an OCSVM. Umer et al. [26] proposed an intrusion detection model, which handled the flow data. The two-stage model developed used OCSVM method in order to identify malicious flows efficiently and then forwarded the malicious traffic to the second phase of the detection process. The detailed analysis of malicious flows was conducted in the second phase.
K-Nearest Neighbor (KNN) takes into account the knowledge of adjacent points to perform classification in the example given. Shubair et al. [27] suggested a flow-based IDS exploiting the benefits of KNN method with the combination of fuzzy logic. The study used Least Mean Square method to perform error reduction and KNN to pick out the best matching class and fuzzy logic for selecting the flow class label. Costa et al. [28] proposed an intrusion detection method employing Optimum-path forest clustering (OPFC), which was a KNN graph utilizing probability density function for weighing of nodes. The authors made use of enhanced nature inspired techniques (Gravitational Search, Bat Algorithm, Particle Swarm Optimization, and Harmony search) to decide the value of k in the optimization of OPFC.
Clustering methods are used to detect unique and beneficial patterns in the dataset. The aggregation of the similar examples in different clusters is performed using these patterns. In order to detect network flow anomalies, Lakhina et al. [29] employed the clustering methods by analyzing the feature distributions. Casas et al. [30] presented an anomaly-based IDS, which collected packets from the network and gathered these packets together into flows randomly by employing multiple unsupervised clustering techniques. In this study, a change detection algorithm, which utilized sub-space and density-based clustering that produced subsets of data in each sub-space, was employed to take apart the malicious flows. Hosseinpour et al. [31] proposed a distributed IDS based on unsupervised clustering combined with Artificial Immune System. The method proposed used DBSCAN clustering method to label traffic data as malicious or non-malicious and made available real-time and online data in order to train the immune response detectors located around hosts of the networks. The Ward Clustering approach was recommended by Satoh et al. [32] to identify simple and malicious attacks from the SSH dictionary. The detection process was performed based on ‘‘the existence of a connection protocol’’ and ‘‘the inter-arrival time of an authentication-packet and the next’’ by identifying ‘‘the transition point of each sub-protocol’’ through flow features.
Decision Trees (DTs) generate a tree model by building rules based on the attribute value of each node of the tree. Thaseen and Kumar [33] discussed application of different DT-based classification algorithms for intrusion detection. Zhao et al. [34] proposed a simple and efficient flow-based approach to identify both known and novel botnets by employing a DT approach with Reduced Error Pruning method to build a model for classifying the botnets. Haddadi et al. [35] suggested an alternative approach for identifying the behaviors of botnets based on genetic programming and DTs. The suggested method used two different sets, which were common flow attributes and TCP flags attributes, extracted from the datasets. Stevanovic and Pedersen [36] presented an effective botnet detection approach using flow records of a 39-feature set employing a collection of supervised machine learning methods. According to results, the overall best performance was achieved by Random Forest method.
The statistical methods used in literature can be summarized as follows. Kanda et al. [37] suggested an anomaly detection method, called ADMIRE, based on random projection and Principal Component Analysis (PCA) to identify abnormal network flows. The assessment was carried out using the traffic traces from a transpacific connection. Haghighat and Li [38] proposed a novel entropy-based method, called Edmund, to identify abnormal behavior and mitigate network intrusions by using NetFlow traffic data.

III. BACKGROUND
A. AUTOENCODER
Autoencoder (AE) [39], [40] is a neural network that is trained, with an unsupervised approach, to reconstruct its input vectors as its output vectors [41]. Its architecture basically consists of an encoder and a decoder. A single layer of AE has an encoder and a decoder as in (1) and (2), respectively. In this context, σ is the nonlinear transformation function and b and W are called the bias and the weight of the neural network, respectively [42].

$$h = \sigma(W_{xh} x + b_{xh}) \tag{1}$$
$$z = \sigma(W_{hx} h + b_{hx}) \tag{2}$$
$$r = \lVert x - z \rVert \tag{3}$$

By using an affine mapping followed by a nonlinearity, the encoder transforms the input vector x into a hidden representation h. The decoder applies the same kind of transformation to the hidden representation h to reconstruct the initial input space. The reconstruction error r is obtained as the norm of the difference between the reconstructed vector z and the original input vector x. The unsupervised training procedure of AE minimizes the reconstruction error r. The flow chart of the AE training algorithm is illustrated in Fig. 1.
AE-based anomaly detection is accomplished using the reconstruction error (RE) as the anomaly score. The input data with high RE are assumed to be anomalies. The training of AE is performed by feeding the network input with only
normal examples. The trained AE model reconstructs normal input data with very low RE, whereas it fails to do so for anomalous data it has not encountered before. The flow chart of the AE-based anomaly detection algorithm is illustrated in Figure 2.
FIGURE 1. The flow chart of AE training algorithm.
FIGURE 2. The flow chart of AE-based anomaly detection algorithm.
B. VARIATIONAL AUTOENCODER
Variational Autoencoder (VAE) [43] is a directed probabilistic graphical model whose posterior is approximated by an artificial neural network [42]. In VAE, the latent variable z, from which the generative process begins, is the highest layer of the graphical model. The complicated procedure of data generation, which leads to the data x, is represented by g(z), which is modeled as an artificial neural network. Because the marginal likelihood is intractable, the variational lower bound of the marginal likelihood of the input data is the objective function of VAE. The marginal likelihood is obtained by summing the marginal log-likelihoods of the individual data points as in (4). Equation (5) is obtained if the marginal likelihood of an individual data point is reformulated [42].

$$\log p_\theta(x^{(1)}, \dots, x^{(N)}) = \sum_{i=1}^{N} \log p_\theta(x^{(i)}) \tag{4}$$
$$\log p_\theta(x^{(i)}) \geq \mathcal{L}(\theta, \phi; x^{(i)}) \tag{5}$$
$$= \mathbb{E}_{q_\phi(z|x^{(i)})}\left[-\log q_\phi(z|x) + \log p_\theta(x, z)\right] \tag{6}$$
$$= -D_{KL}\left(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)\right) + \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log p_\theta(x|z)\right] \tag{7}$$

In Equation (7), D_KL is the Kullback–Leibler divergence between the approximate posterior and the prior of the latent variable z. The likelihood of the input data x given the latent variable z is represented as pθ(x|z).
FIGURE 3. VAE architecture [42].
In VAE, the parameters of the approximate posterior qφ(z|x) are obtained with a neural network. The directed probabilistic graphical model pθ(x|z) is the decoder and the approximate posterior qφ(z|x) is the encoder, as illustrated in Fig. 3. In this context, it must be emphasized that the purpose of VAE is to model the distribution parameters instead of the actual value. That is, f(x, φ) in the encoder produces the parameters of the approximate posterior qφ(z|x) and to achieve the actual value of the latent variable z,
sampling from q(z, f(x, φ)) is needed. The common choice is the isotropic normal for the distributions of the latent variable z, which are pθ(z) and qφ(z|x), since the relationship among the variables in the latent variable space is expected to be a lot simpler than in the original data space. The distribution of the likelihood pθ(x|z) varies according to the nature of the data. More specifically, the multivariate Gaussian distribution is applied when the input data is in continuous form. If it is binary, the Bernoulli distribution is used [42]. The flow chart of the VAE training algorithm is illustrated in Fig. 4.
The reparameterization operation is supposed to guarantee that z̃ follows the distribution of qφ(z|x).
The anomaly detection task is performed in a semi-supervised way, meaning that just normal data examples are used to train VAE [42]. The probabilistic decoder gθ and encoder fφ parameterize an isotropic normal distribution in the original input variable space and the latent variable space, respectively. The testing process is carried out by drawing several samples from the probabilistic encoder of the trained VAE model. The probability of the
original data being generated from the distribution is computed
using the mean and variance parameters, which the probabilistic
decoder produces for each sample drawn from the
encoder. The average probability, which is also called the
reconstruction probability (RP), is used as an anomaly score.
The RP is the Monte Carlo estimate of

$$\mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] \tag{9}$$
The stochastic latent variables, which produce the param-
eters of the original input variable distribution are utilized
to compute the RP. This is fundamentally equivalent to
the probability of the data being produced from certain
latent variables taken out of the approximate posterior
distribution.
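The two anomaly scores can be contrasted with a minimal sketch. It assumes a diagonal Gaussian decoder, and the names `toy_encode` and `toy_decode` are hypothetical stand-ins for a trained encoder and decoder (they are not part of the study's implementation); the latent samples are drawn as mean plus standard deviation times standard normal noise.

```python
import math
import random


def reconstruction_error(x, x_hat):
    """AE-style score: Euclidean norm of x - x_hat, as in Eq. (3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))


def gaussian_log_density(x, mean, var):
    """Log density of x under a diagonal Gaussian N(mean, diag(var))."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
        for a, m, v in zip(x, mean, var)
    )


def reconstruction_probability(x, encode, decode, n_samples=10, rng=None):
    """Monte Carlo estimate of E_{q(z|x)}[log p(x|z)], as in Eq. (9)."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    mu_z, var_z = encode(x)
    total = 0.0
    for _ in range(n_samples):
        # Draw z = mu + sigma * eps with eps ~ N(0, 1)
        z = [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu_z, var_z)]
        mu_x, var_x = decode(z)
        total += gaussian_log_density(x, mu_x, var_x)
    return total / n_samples


# Hypothetical stand-ins for a trained model (assumption, for illustration only):
# the "encoder" compresses x to its mean, the "decoder" expands it back.
def toy_encode(x):
    return [sum(x) / len(x)], [0.01]


def toy_decode(z):
    return [z[0]] * 3, [0.05] * 3


normal_x = [0.5, 0.5, 0.5]
odd_x = [0.9, 0.1, 0.5]
rp_normal = reconstruction_probability(normal_x, toy_encode, toy_decode)
rp_odd = reconstruction_probability(odd_x, toy_encode, toy_decode)
```

In this toy setting the example that the stand-in model reconstructs poorly receives the lower RP, so a low RP (or a high RE) marks a candidate anomaly.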
RP computed in VAE differs from RE calculated in AE in several ways [42]. First, while latent variables are deterministic mappings in AE, they are stochastic variables in VAE. Because VAE employs the probabilistic encoder to model the distribution of the latent variables instead of the latent variable itself, the variability of the latent space can be taken into consideration through the sampling process. This makes VAE more expressive than AE, since anomalous data and normal data may have an identical mean value while differing in variability. Secondly, reconstructions are stochastic variables in VAE. RP takes into account not only the difference between the original input and the reconstruction, but also the variability of the reconstruction through the variance parameters. This enables selective sensitivity to reconstruction according to the variance of each variable, which is not available in AE due to its deterministic nature. Thirdly,
probability measures correspond to reconstructions. In AE-based anomaly detection, anomaly scores are generated using REs. In that sense, the calculation of anomaly scores would be challenging if the input variables were heterogeneous, because there is no general objective technique to determine the suitable weights, which vary depending on the data. In addition to this, the determination of a proper and objective threshold for RE is a problematic process. On the other hand, as the probability distribution of every single variable enables it to be computed independently according to its individual variability, the computation of the RP does not need weighting of the RE of the heterogeneous data. Therefore, it can be concluded that the threshold value of the RP can be determined in a significantly more objective and readily comprehensible way than that of the RE.
FIGURE 4. The flow chart of VAE training algorithm.
The training of VAE is executed by using the backpropagation algorithm [42]. The second term of (7) is computed using Monte Carlo gradient techniques in conjunction with a reparameterization approach, which employs a random variable from a standard normal distribution rather than a random variable from the original distribution. The random variable z ∼ qφ(z|x) is reparameterized by a deterministic transformation hφ(ε, x), where ε is drawn from a standard normal distribution.

$$\tilde{z} = h_\phi(\epsilon, x) \quad \text{with} \quad \epsilon \sim \mathcal{N}(0, 1) \tag{8}$$

FIGURE 5. The flow chart of VAE-based anomaly detection algorithm.
C. ONE CLASS SUPPORT VECTOR MACHINE
Support Vector Machine (SVM) was originally suggested by Boser et al. [44]. The primary objective of SVM is to find out the best machine for a given dataset. SVM accomplishes this objective by maximizing the correctness of the machine for a given training dataset. In addition to this, the ability of the machine is also maximized in order to classify the forthcoming testing datasets accurately. The best machine is discovered by utilizing a mathematical optimization method [45].
The SVM method is modified into a One-Class SVM (OCSVM) as explained in [46]. The dataset, which is given as an input to the algorithm in OCSVM, consists of examples belonging to two different classes, namely positive and negative. Noises in the positive samples in the dataset, which are also defined as ‘‘anomalies’’ or ‘‘outliers’’, are employed as negative examples. The formulation of OCSVM can be carried out as follows [46]:

$$f(x) = \begin{cases} +1, & \text{if } x \in S \\ -1, & \text{if } x \in \bar{S} \end{cases} \tag{10}$$

The working logic of the algorithm is as follows. The input data is first transformed into a feature space H by applying a suitable kernel function. Then, the algorithm attempts to find a hyperplane that separates these transformed feature vectors from the origin with maximum margin [47]. Given a dataset (x1, y1), . . . , (xN, yN) ∈ R^n × {±1}, let φ: R^n → H be a kernel map, which maps the examples into H. Afterwards, in order to separate the dataset from the origin, the following quadratic programming problem needs to be solved:

$$\min \left( \frac{1}{2}\lVert\omega\rVert^2 + \frac{1}{\nu N}\sum_{i=1}^{N}\xi_i - \rho \right) \tag{11}$$

subject to

$$y_i(\omega \cdot \phi(x_i)) \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, N \tag{12}$$

The parameter ν ∈ (0, 1) refers to the ratio of ‘‘anomalies’’ or ‘‘outliers’’ in the training dataset. It regulates the trade-off between including most of the data in the region formed by the hyperplane and maximizing the distance from the origin. The slack variables ξi allow some outliers to be located outside the border. The decision function f(x) = sign((ω · φ(x)) − ρ) will be positive for most examples xi in the training dataset. Since the separation of some datasets cannot be performed linearly, kernel functions, such as the radial basis kernel [48] and polynomial kernel, are generally used in SVMs to transform the inputs to a high dimensional space to achieve linear separability.
D. EVALUATION METRICS
The evaluation of the proposed method should be performed using an appropriate metric. For a binary classification, the results can be separated into four groups [49], [50]: 1) True Positive (TP): Positive examples correctly classified; 2) False Negative (FN): A positive example misclassified; 3) False Positive (FP): A negative example misclassified; 4) True Negative (TN): A negative example correctly classified. Furthermore, subsequent metrics can be calculated from the previous ones [50]:
True Positive Rate (TPR): This metric corresponds to the ratio of all ‘‘correctly identified examples’’ to all ‘‘examples that should be identified’’.

$$TPR = \frac{TP}{TP + FN} \tag{13}$$

False Positive Rate (FPR): This metric represents the ratio of the ‘‘number of misclassified negative examples’’ to the ‘‘total number of negative examples’’.

$$FPR = \frac{FP}{FP + TN} \tag{14}$$

Receiver Operating Characteristics (ROC): The ROC curve [51], [52] is utilized as a standard criterion in the
assessment of classifiers in the case of a class imbalance problem encountered in the dataset [53]. The ROC curves attempt to accomplish skew insensitivity by providing the summary of the efficiency of classifiers on a range of TPRs and FPRs. The ROC curves show which percentage of examples will be properly classified for a certain FPR by assessing the trained models at different error values. The ROC curves offer an illustrated approach for deciding the efficiency of a classifier [53].
The area under the ROC curve (AUC) metric is utilized as a de facto standard in order to measure the efficiency of classifiers facing the class imbalance problem. The reason is that AUC is not influenced by prior probabilities and the chosen threshold. In addition, it provides a single number for comparing classifiers. The AUC can be interpreted as an estimate of how frequently a random positive class example ranks higher than a random negative class example when the examples are ordered by their classification probabilities. The AUC values are always bounded between 0 and 1. If the AUC value is less than 0.5, it means that the classifier is unrealistic, since it performs worse than random guessing [54]. A rough classifying system can serve as a guidance to the test accuracy as follows [55]: i) Excellent (0.90–1), ii) Good (0.80–0.90), iii) Fair (0.70–0.80), iv) Poor (0.60–0.70), v) Fail (0.50–0.60).

IV. EXPERIMENTAL RESULTS
Semi-Supervised Learning (SSL) paradigm is employed as the learning strategy. SSL covers several different settings including [56]: a) Semi-supervised classification: Its alternative name is classification with labelled and unlabeled data (or partially labelled data). This emerged as an extension to the supervised classification problem. b) Constrained clustering: This strategy extends unsupervised clustering by using training data consisting of unlabeled examples in addition to a number of ‘‘supervised information’’ on clusters. The SSL strategy of constrained clustering was chosen as unlabeled data were easier to get for the intrusion detection system than labelled data by virtue of the fact that it required less knowledge, time, and effort. Moreover, this strategy is more appropriate for the unsupervised training nature of the detection methods utilized.
FIGURE 6. The semi-supervised learning strategy.
As seen in Fig. 6, the dataset is broken down into two parts as training and testing dataset. By applying the SSL strategy, the labelled dataset, which contains only normal flow features, is used in the training phase to create the normal profile of network traffic, and the unlabeled dataset consisting of both normal and attack flow features is used in the testing phase. It is important to mention that labels in the testing dataset are taken into consideration for calculating the evaluation metrics in this study.

A. DATASET SETUP
In order to evaluate the methods used for intrusion detection, a proper dataset needs to be used. Intrusion detection methods that are referred to in the literature are evaluated through several datasets. Many researchers use KDDCUP99 and NSL-KDD datasets to assess the models trained as mentioned in Section I. However, the KDDCUP99 dataset suffers from the high redundancy problem in the training and test datasets. The NSL-KDD dataset was generated from KDDCUP99 to resolve this issue. But another issue, which is not addressed by NSL-KDD, is that this dataset is not a realistic representation of network traffic [14]. Additionally, although these datasets include some flow-based features, they mostly contain packet-based (content) features. As this study aims to detect the attacks from flow-based features, some of the candidate flow-based datasets are Kyoto 2006+ [57], CTU-13 [58], UNSW-NB15 [59], CIDDS-001 [60] and CICIDS2017 [61].
Kyoto 2006+ is a publicly accessible dataset, which encompasses real network traffic including numerous attacks performed against honeypots such as DoS, malware, backscatter, port scans, exploits and shellcode. But it contains merely a small amount of data and a small range of realistic normal user behavior with both statistical information and flow-based attributes. The CTU-13 was obtained in a campus network by characterizing 13 scenarios covering various botnet attacks and is accessible in the forms of packet, bidirectional flow, and unidirectional flow. The UNSW-NB15 was generated in a small emulated network over 31 hours by acquiring normal and malicious traffic in packet-based format. The data set, which is also offered in the form of a flow dataset with extra features, comprises nine distinct categories of attacks such as DoS, backdoors, fuzzers, worms or exploits, with predetermined separations for training and testing. The CIDDS-001 was acquired from a small emulated network by implementing the user behaviors of normal and malicious users by executing python scripts. It covers four weeks of unidirectional flow-based network traffic as well as network attacks including DoS, SSH brute force and port scan. The CICIDS2017 was produced in an emulated network environment within 5 days and covers network traffic in packet-based format as well as flow-based format including more than 80 extracted attributes with extra metadata information on IP addresses and a wide range of up-to-date attack types including FTP patator, SSH patator, DoS slowloris, DoS Slowhttptest, DoS Hulk, DoS GoldenEye, Heartbleed, Brute force, XSS, SQL Injection, Infiltration, Bot, DDoS and Port Scan.
Among the aforementioned datasets, CICIDS2017 was selected for evaluation purposes due to the fact that: 1) It includes various types of attacks, which had been carried out on networks recently, according to a McAfee report [62], [23], 2) It is up-to-date, 3) It is a labelled dataset consisting of flow-based features expanded by measuring some parameters statistically, 4) It possesses the characteristics of a real-time network traffic [23], 5) It is non-linearly separable [23]. Table 1 presents the number of flow examples in CICIDS2017 categorized by attack type and day. It should be noted that the only drawback of the dataset is that the distribution of the classes is highly imbalanced as specified in Table 1.

B. MODEL SETUP
This section discusses how the AE, VAE and OCSVM methods are configured and how their parameters are selected to achieve the best anomaly-based intrusion detection model.
By virtue of the fact that the methods used in this study are parametrized, the performance of the models needs to be determined by the optimal parameters. The cross-validation task cannot be performed in the optimization of hyperparameters as the training process is not executed using anomalous data [63]. Thus, the tuning of hyperparameters is performed by taking into consideration the recommendations proposed by Patterson and Gibson [64]. After that, the hyperparameters of AE and VAE are mostly configured using rule of thumb. The default values of the library are used in the configuration of the OCSVM model.
In the determination of the best-performed models of both AE and VAE, the dimensions in the layers were specified with trial and error by keeping the number of layers constant. The parameter settings used in the neural network configuration of both AE and VAE are as follows: The learning rates such as [0.1, 0.01, 0.001] and the momentums such as [0.3, 0.5, 0.9] are used in the training trials. The best-performed values are 0.001 for the learning rate and 0.3 for momentum. The parameter of l2 regularization [65] is set to the commonly used value of 0.001 and Xavier [66] is used as weight initialization method, which is the default in the library and is recommended if ReLU is employed in the hidden layers [64]. Both neural networks are trained with the backpropagation algorithm using conjugate gradient optimization algorithm, which is a recommended choice for large datasets and real-valued outputs [64]. Nesterov's updater [67] is selected because it uses the momentum that supports the learning procedure to escape local minima and discover better solutions to the optimization process [64]. The additional parameters used in VAE configuration are as follows. Leaky Rectified Linear Unit [68] is used in hidden layers as activation function. It is preferred over sigmoid or tanh since it resolves the ‘‘dying ReLU’’ problem of the vanilla ReLU and the vanishing gradient problem of sigmoid/tanh [64]. Gaussian reconstruction distribution is used with hyperbolic tangent (tanh) pzx activation function because the type of data being modeled is real-valued [42]. In the additional parameters used in AE configuration, which are commonly used and constitute the basis of the studies such as [41], [69], the sigmoid activation function
is used, except for the output layer, which employs the softmax activation function [65],
with the mean squared error loss function.
Both AE and VAE neural networks were implemented with the deeplearning4j library [70] and
trained for 1000 epochs with a batch size of 8192.
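
For reference, the settings stated in this section can be gathered into a single configuration object. The sketch below is plain Python rather than the deeplearning4j code used in the study. The class name, the field names and the `input_dim` argument (the number of flow features, which is not stated in this section) are illustrative assumptions, and the learning rate, momentum and l2 values are assumed to be shared by both networks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AutoencoderSetup:
    """Hyperparameters quoted in the text; all names here are illustrative."""
    hidden_activation: str                      # "leakyrelu" for VAE, "sigmoid" for AE
    encoder_layers: Tuple[int, ...] = (512, 256)
    latent_dim: int = 64
    decoder_layers: Tuple[int, ...] = (256, 512)
    learning_rate: float = 0.001
    momentum: float = 0.3                       # used by the Nesterov updater
    l2: float = 0.001
    weight_init: str = "xavier"
    epochs: int = 1000
    batch_size: int = 8192

    def layer_dims(self, input_dim: int) -> List[int]:
        """Layer widths from the input to the reconstruction."""
        return [input_dim, *self.encoder_layers, self.latent_dim,
                *self.decoder_layers, input_dim]


VAE_SETUP = AutoencoderSetup(hidden_activation="leakyrelu")
AE_SETUP = AutoencoderSetup(hidden_activation="sigmoid")

# input_dim=78 is a placeholder feature count, not a value taken from the paper.
print(VAE_SETUP.layer_dims(input_dim=78))
```

Keeping the two setups as instances of one frozen class makes it explicit that, under these assumptions, AE and VAE differ only in the hidden-layer activation.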
In OCSVM, the radial basis function was used as the kernel with the default parameters,
where γ (gamma) was set to 0.5, the cost parameter C was set to 1 and the parameter ν (nu)
was set to 0.5. The model was created using the LibSVM library [71].
As an anomaly score, the RP is used in VAE, the RE is used
in AE, and the prediction score is used in OCSVM without
applying any threshold (if the score is less than 0, the example
is an anomaly, else it is normal).
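
The three scores above do not point in the same direction: a large reconstruction error suggests an anomaly, whereas a negative OCSVM score does. The following minimal sketch shows one way to bring them to a common "higher means more anomalous" orientation before ROC analysis. It assumes that the reconstruction error is the mean squared difference between a flow vector and its reconstruction, and that a low reconstruction probability indicates an anomaly. Neither detail is spelled out in this section, and all function names are hypothetical.

```python
from typing import Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between a flow vector and its reconstruction (AE score)."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def ocsvm_is_anomaly(decision_score: float) -> bool:
    """Sign rule quoted in the text: a score below 0 marks an anomaly."""
    return decision_score < 0


def anomaly_score_from_ocsvm(decision_score: float) -> float:
    """Negate the OCSVM score so that larger values mean 'more anomalous'."""
    return -decision_score


def anomaly_score_from_rp(reconstruction_probability: float) -> float:
    """Assumed orientation: a low reconstruction probability is anomalous, so negate it."""
    return -reconstruction_probability
```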
FIGURE 7. The architectural diagram of AE and VAE used in models.

A simple form of deep autoencoder architecture was chosen to perform the experiments. The
architectural diagram of AE and VAE is illustrated in Fig. 7. For both AE and VAE, the
encoder has two hidden layers with 512 and 256 dimensions and the decoder has two hidden
layers with 256 and 512 dimensions, respectively. The bottleneck layer (latent dimension)
of both AE and VAE has 64 dimensions.

TABLE 1. The number of examples in CICIDS2017 dataset categorized by attack type and day.

TABLE 2. The AUC results of the methods for specified attacks.

C. PERFORMANCE EVALUATION
Initially, the features of the dataset were normalized using the feature scaling method
[72], also called unity-based normalization, to bring all values into the range [0,1].
Separate training and test datasets were used to assess the performance of the methods. The
models were created by training the neural networks with only the normal flow data from
"Monday". Because the models are trained in an unsupervised manner, the last column of the
dataset, which corresponds to the attack class, is not used in training. Both the normal
and attack traffic from the other days were used for the testing process. In the testing
process, all classes except "normal" or "benign" are considered as "anomaly".
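
The normalization and labeling steps just described can be sketched in a few lines of plain Python. This is an illustration only. The function names are hypothetical, and mapping a constant feature to 0 is an assumption made here to avoid division by zero.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum over the given rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(row: Sequence[float], mins: Sequence[float],
                  maxs: Sequence[float]) -> List[float]:
    """Scale one row with x' = (x - min) / (max - min); constant features map to 0."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, mins, maxs)]


def to_binary_label(label: str) -> int:
    """0 for normal/benign traffic, 1 for every attack class."""
    return 0 if label.strip().lower() in {"normal", "benign"} else 1


rows = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
mins, maxs = fit_min_max(rows)
scaled = [apply_min_max(r, mins, maxs) for r in rows]
```

Rows that were not part of the fit can fall outside [0,1]. How the study handled such values is not stated, so the sketch leaves them unclipped.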
In the performance evaluation of an attack class, the metrics are calculated using only the
normal flow records and the flow records of that attack class, excluding all other attack
classes. For example, in the evaluation of the "FTP-Patator" attack, 1829371 normal flow
records and 7938 attack flow records are used. A second evaluation strategy combines all
attacks into a single "anomaly" class, as opposed to "normal", in order to determine
whether the methods can discriminate unknown attacks from normal traffic.
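
The two evaluation strategies amount to selecting different subsets of the test rows. A minimal sketch, assuming the class labels are available as a list of strings and that normal traffic carries the label `"BENIGN"` (an assumed label value), follows. The function name is hypothetical.

```python
from typing import Dict, List, Sequence


def per_attack_subsets(labels: Sequence[str],
                       normal_label: str = "BENIGN") -> Dict[str, List[int]]:
    """Map each attack name to the row indices used for its evaluation.

    Each subset holds every normal row plus the rows of one attack class.
    The extra key "All attacks" holds every row (normal vs. any attack).
    """
    normal_idx = [i for i, lab in enumerate(labels) if lab == normal_label]
    attack_names = sorted({lab for lab in labels if lab != normal_label})
    subsets: Dict[str, List[int]] = {}
    for name in attack_names:
        attack_idx = [i for i, lab in enumerate(labels) if lab == name]
        subsets[name] = sorted(normal_idx + attack_idx)
    subsets["All attacks"] = list(range(len(labels)))
    return subsets


example_labels = ["BENIGN", "FTP-Patator", "BENIGN", "DDoS"]
example_subsets = per_attack_subsets(example_labels)
```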
The performance of the methods is evaluated using both ROC curves and the AUC metric. The
AUC-based evaluation is carried out because AUC is the de facto standard in unsupervised
anomaly detection and is easy to interpret [73]. More specifically, it summarizes the whole
ROC curve by aggregating the performance over the entire range of threshold values [73].
The ROC-based evaluation was performed because ROC is insensitive to class distribution and
because it shows that one method can perform better than another at different thresholds
(which correspond to different FP values in the ROC graph).
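
To make the relation between thresholds, the ROC curve and AUC concrete, the sketch below computes ROC points by sweeping a threshold over anomaly scores and integrates them with the trapezoidal rule. It is a generic illustration in plain Python, not the evaluation code of the study. It assumes labels of 1 for anomaly and 0 for normal, and scores oriented so that higher means more anomalous.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering a threshold over the anomaly scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one normal and one anomalous example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume every example tied at this score before emitting a point.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


example_points = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
example_auc = auc(example_points)
```

Each point on the curve corresponds to one threshold, so two methods with similar AUC values can still differ noticeably in the low-FP region that matters for a deployed detector.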
The AUC results of the methods are given in Table 2. In addition, the ROC curves of the
algorithms applied to different kinds of unknown (meaning not trained) attacks are shown in
Fig. 8. The AUC results and the interpretation of the ROC curves are explained as follows.
An academic point system was also used as a guide in the interpretation of AUC results.

FIGURE 8. The ROC curves of (a) normal vs FTP-Patator, (b) normal vs SSH-Patator, (c) normal
vs DoS Slowloris, (d) normal vs DoS Slowhttptest, (e) normal vs DoS Hulk, (f) normal vs DoS
GoldenEye, (g) normal vs Heartbleed, (h) normal vs Web Attack-Brute force, (i) normal vs Web
Attack-XSS, (j) normal vs Web Attack-SQL injection, (k) normal vs Infiltration, (l) normal
vs Bot, (m) normal vs DDoS, (n) normal vs PortScan, (o) normal vs all attacks.

If the AUC values in Table 2 are analyzed in general, it is seen that VAE performs better
than the other two methods, and OCSVM performs poorly, except for some attacks. Moreover,
some attacks have excellent or good AUC values for all three methods, while others show
poor or unrealistic performance, which indicates that the methods fail to distinguish
attacks or abnormal cases from normal ones. It is observed that VAE detects abnormal cases
better than AE does for high-rate attacks such as the various kinds of DoS attacks and
DDoS, and both of them exhibit good performance. For the "SSH Patator", "Web Attack – SQL
Injection", "Bot" and "Portscan" attacks, all three methods perform poorly. Although each
one contains a different number of attack samples and belongs to a different attack
category (brute force, web application attack, bot, and port scan, respectively), none of
the trained models of the three methods is able to distinguish these attacks sufficiently.
Consequently, additional research needs to be carried out to improve the performance of the
models and to differentiate between these attacks by adding further flow-based features or
data, or by employing
more highly developed machine learning methodology. It is interesting to see that all three
methods appear to exhibit good and excellent performance in the cases of the "Infiltration"
and "Heartbleed" attacks, respectively. The reason for this may be that these attack types
occur in small numbers in the dataset, so further research needs to be conducted to confirm
this assumption by increasing the number of samples of these attacks.

By combining all attack types into a single anomaly class to calculate a single AUC result
and ROC curve, it is possible to assess how the methods behave when "All attacks" take
place in the network. According to the AUC results, VAE outperforms the others with fair
performance. The AE also displays fair performance, whereas the performance of OCSVM is
poor. This indicates that VAE and AE, by modeling normal network traffic, are quite
successful in detecting fourteen different types of unknown intrusions in a wide variety of
attack categories. Moreover, it should be noted that although the number of data samples
used in the training process (about 19% of the total dataset) is much smaller than the
number used in the testing process (about 81% of the total dataset), the good performance
of both VAE and AE clearly shows that these methods are highly capable of detecting
intrusions or anomalies from flow-based features. Using many more normal data samples in
training (a commonly used split in supervised learning is 70% for training and 30% for
testing) when building the model of normal network traffic from flow features could
increase the performance even further.

If all ROC curves in Fig. 8 are examined, it is seen that attacks of the same behavioral
nature exhibit similar ROC curves in VAE, even though this does not apply to all attacks.
For example, the ROC curves of "Web Attack-Brute force", "Web Attack-XSS" and "Web
Attack-SQL Injection" support this observation. A similar observation can be made for "FTP
Patator" and "SSH Patator" in the brute force category. Even though the AUC values of the
VAE and AE methods indicate good performance, the ROC curves clearly reveal factors that
make it difficult to use these methods in a real-life scenario without supplementary
methods. Moreover, all ROC curves show that high TP values could not be obtained at low FP
values for almost any of the attack types (except Heartbleed). This observation can be
interpreted to mean that it is not easy to determine a proper threshold value that provides
high detection accuracy or a low false alarm rate in a practical intrusion detection system.

V. CONCLUSION
In this study, the detection capabilities of the AE and VAE deep learning methods, together
with OCSVM, were analyzed by applying a semi-supervised learning strategy. The models were
created using normal flow-based data only and were tested using both normal and anomalous
data. The experimental results were computed in terms of ROC curves and AUC metrics. Based
on the results, the detection rate of VAE is better, for the most part, than that of AE and
OCSVM. However, it is also important to mention that the methods need to be supported by
supervised learning methods due to their high false alarm rates. Furthermore, in order to
increase the detection rate of the methods, flow-based features collected at specified time
intervals can be considered, since the characteristics of some attacks could then be
modeled better.

We thank Google for providing free credit that allowed us to use its infrastructure for
training and testing the algorithms.

REFERENCES
[1] Y. Abuadlla, G. Kvascev, S. Gajin, and Z. Jovanovic, "Flow-based anomaly intrusion detection system using two neural network stages," Comput. Sci. Inf. Syst., vol. 11, no. 2, pp. 601–622, 2014, doi: 10.2298/CSIS130415035A.
[2] M. Yousefi-Azar, V. Varadharajan, L. Hamey, and U. Tupakula, "Autoencoder-based feature learning for cyber security applications," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017, pp. 3854–3861, doi: 10.1109/IJCNN.2017.7966342.
[3] B. Zhang, Y. Yu, and J. Li, "Network intrusion detection based on stacked sparse autoencoder and binary tree ensemble method," in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), May 2018, pp. 1–6, doi: 10.1109/ICCW.2018.8403759.
[4] B. Abolhasanzadeh, "Nonlinear dimensionality reduction for intrusion detection using auto-encoder bottleneck features," in Proc. 7th Conf. Inf. Knowl. Technol. (IKT), May 2015, pp. 1–5, doi: 10.1109/IKT.2015.7288799.
[5] S. N. Mighan and M. Kahani, "Deep learning based latent feature extraction for intrusion detection," in Proc. Iranian Conf. Electr. Eng. (ICEE), May 2018, pp. 1511–1516, doi: 10.1109/ICEE.2018.8472418.
[6] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, "A deep learning approach to network intrusion detection," IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018, doi: 10.1109/TETCI.2017.2772792.
[7] U. Cekmez, Z. Erdem, A. G. Yavuz, O. K. Sahingoz, and A. Buldu, "Network anomaly detection with deep learning," in Proc. 26th Signal Process. Commun. Appl. Conf. (SIU), May 2018, pp. 1–4, doi: 10.1109/SIU.2018.8404817.
[8] R. C. Aygun and A. G. Yavuz, "A stochastic data discrimination based autoencoder approach for network anomaly detection," in Proc. 25th Signal Process. Commun. Appl. Conf. (SIU), May 2017, pp. 1–4, doi: 10.1109/SIU.2017.7960410.
[9] S. Naseer, Y. Saleem, S. Khalid, M. K. Bashir, J. Han, M. M. Iqbal, and K. Han, "Enhanced network anomaly detection based on deep neural networks," IEEE Access, vol. 6, pp. 48231–48246, 2018, doi: 10.1109/ACCESS.2018.2863036.
[10] Z. Chen, C. K. Yeo, B. S. Lee, and C. T. Lau, "Autoencoder-based network anomaly detection," in Proc. Wireless Telecommun. Symp. (WTS), Apr. 2018, pp. 1–5, doi: 10.1109/WTS.2018.8363930.
[11] KDD Cup 1999 Data. Accessed: Mar. 30, 2019. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[12] F. Farahnakian and J. Heikkonen, "A deep auto-encoder based approach for intrusion detection system," in Proc. 20th Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2018, pp. 178–183, doi: 10.23919/ICACT.2018.8323688.
[13] NSL-KDD Dataset. Accessed: Jan. 30, 2019. [Online]. Available: https://www.unb.ca/cic/datasets/nsl.html
[14] J. McHugh, "Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by lincoln laboratory," ACM Trans. Inf. Syst. Secur., vol. 3, no. 4, pp. 262–294, Nov. 2000.
[15] M. F. Umer, M. Sher, and Y. Bi, "Flow-based intrusion detection: Techniques and challenges," Comput. Secur., vol. 70, no. 1, pp. 238–254, 2017, doi: 10.1016/j.cose.2017.05.009.
[16] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller, "An overview of IP flow-based intrusion detection," IEEE Commun. Surveys Tuts., vol. 12, no. 3, pp. 343–356, 3rd Quart., 2010, doi: 10.1109/SURV.2010.032210.00054.
[17] R. Beghdad, "Critical study of neural networks in detecting intrusions," Comput. Secur., vol. 27, nos. 5–6, pp. 168–175, Oct. 2008, doi: 10.1016/j.cose.2008.06.001.
[18] S. Song, L. Ling, and C. N. Manikopoulos, "Flow-based statistical aggregation schemes for network anomaly detection," in Proc. IEEE Int. Conf. Netw., Sens. Control, Dec. 2006, pp. 786–791, doi: 10.1109/icnsc.2006.1673246.
[19] Q. A. Tran, F. Jiang, and J. Hu, "A real-time netFlow-based intrusion detection system with improved BBNN and high-frequency field programmable gate arrays," in Proc. IEEE 11th Int. Conf. Trust, Secur. Privacy Comput. Commun., Jun. 2012, pp. 201–208, doi: 10.1109/TrustCom.2012.51.
[20] Z. Jadidi, V. Muthukkumarasamy, and E. Sithirasenan, "Metaheuristic algorithms based flow anomaly detector," in Proc. 19th Asia–Pacific Conf. Commun. (APCC), Aug. 2013, pp. 717–722, doi: 10.1109/APCC.2013.6766043.
[21] Y. Mirsky, T. Doitshman, Y. Elovici, and A. Shabtai, "Kitsune: An ensemble of autoencoders for online network intrusion detection," in Proc. Netw. Distrib. Syst. Secur. Symp., 2018, pp. 1–4, doi: 10.14722/ndss.2018.23204.
[22] N. Marir, H. Wang, G. Feng, B. Li, and M. Jia, "Distributed abnormal behavior detection approach based on deep belief network and ensemble SVM using spark," IEEE Access, vol. 6, pp. 59657–59671, 2018, doi: 10.1109/ACCESS.2018.2875045.
[23] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, "Deep learning approach for intelligent intrusion detection system," IEEE Access, vol. 7, pp. 41525–41550, 2019, doi: 10.1109/ACCESS.2019.2895334.
[24] P. Winter, E. Hermann, and M. Zeilinger, "Inductive intrusion detection in flow-based network data using one-class support vector machines," in Proc. 4th IFIP Int. Conf. New Technol., Mobility Secur., Feb. 2011, pp. 1–5, doi: 10.1109/NTMS.2011.5720582.
[25] C. Wagner, J. François, R. State, and T. Engel, "Machine learning approach for IP-flow record anomaly detection," Lecture Notes Comput. Sci., vol. 6640, no. 1, pp. 28–39, 2011, doi: 10.1007/978-3-642-20757-0_3.
[26] M. F. Umer, M. Sher, and Y. Bi, "A two-stage flow-based intrusion detection model for next-generation networks," PLoS ONE, vol. 13, no. 1, Jan. 2018, Art. no. e0180945, doi: 10.1371/journal.pone.0180945.
[27] A. Shubair, S. Ramadass, and A. A. Altyeb, "KENFIS: KNN-based evolving neuro-fuzzy inference system for computer worms detection," J. Intell. Fuzzy Syst., vol. 26, no. 4, pp. 1893–1908, 2014, doi: 10.3233/IFS-130868.
[28] K. A. P. Costa, L. A. M. Pereira, R. Y. M. Nakamura, C. R. Pereira, J. P. Papa, and A. X. Falcão, "A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks," Inf. Sci., vol. 294, pp. 95–108, Feb. 2015, doi: 10.1016/j.ins.2014.09.025.
[29] A. Lakhina, M. Crovella, and C. Diot, "Mining anomalies using traffic feature distributions," in Proc. Conf. Appl., Technol., Archit., Protocols Comput. Commun., 2005, p. 217, doi: 10.1145/1080091.1080118.
[30] P. Casas, J. Mazel, and P. Owezarski, "UNADA: Unsupervised network anomaly detection using sub-space outliers ranking," in Lecture Notes Comput. Sci., vol. 6640, no. 1, pp. 40–51, 2011, doi: 10.1007/978-3-642-20757-0_4.
[31] F. Hosseinpour, P. V. Amoli, F. Farahnakian, J. Plosila, T. Hämäläinen, and C. Author, "Artificial immune system based intrusion detection: Innate immunity using an unsupervised learning approach," Int. J. Digit. Content Technol. Appl., vol. 8, no. 5, 2014.
[32] A. Satoh, Y. Nakamura, and T. Ikenaga, "A flow-based detection method for stealthy dictionary attacks against secure shell," J. Inf. Secur. Appl., vol. 21, pp. 31–41, Apr. 2015, doi: 10.1016/j.jisa.2014.08.003.
[33] S. Thaseen and C. A. Kumar, "An analysis of supervised tree based classifiers for intrusion detection system," in Proc. Int. Conf. Pattern Recognit., Informat. Mobile Eng., Feb. 2013, pp. 294–299, doi: 10.1109/ICPRIME.2013.6496489.
[34] D. Zhao, I. Traore, B. Sayed, W. Lu, S. Saad, A. Ghorbani, and D. Garant, "Botnet detection based on traffic behavior analysis and flow intervals," Comput. Secur., vol. 39, pp. 2–16, Nov. 2013, doi: 10.1016/j.cose.2013.04.007.
[35] F. Haddadi, D. Runkel, A. Nur Zincir-Heywood, and M. I. Heywood, "On botnet behaviour analysis using GP and C4.5," in Proc. Genetic Evol. Comput. Conf. (GECCO), 2014, pp. 1253–1260, doi: 10.1145/2598394.2605435.
[36] M. Stevanovic and J. M. Pedersen, "An efficient flow-based botnet detection using supervised machine learning," in Proc. Int. Conf. Comput., Netw. Commun. (ICNC), Feb. 2014, pp. 797–801, doi: 10.1109/ICCNC.2014.6785439.
[37] Y. Kanda, R. Fontugne, K. Fukuda, and T. Sugawara, "ADMIRE: Anomaly detection method using entropy-based PCA with three-step sketches," Comput. Commun., vol. 36, no. 5, pp. 575–588, Mar. 2013, doi: 10.1016/J.COMCOM.2012.12.002.
[38] M. H. Haghighat and J. Li, "Edmund: Entropy based attack detection and mitigation engine using netflow data," in Proc. 8th Int. Conf. Commun. Netw. Secur., 2018, pp. 1–6, doi: 10.1145/3290480.3290484.
[39] G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length and Helmholtz free energy," in Proc. Adv. Neural Inf. Process. Syst., vol. 6, 1994, pp. 3–10, doi: 10.1021/jp906511z.
[40] H. Bourlard and Y. Kamp, "Auto-association by multilayer perceptrons and singular value decomposition," Biol. Cybern., vol. 59, nos. 4–5, pp. 291–294, Sep. 1988, doi: 10.1007/BF00332918.
[41] M. Sakurada and T. Yairi, "Anomaly detection using autoencoders with nonlinear dimensionality reduction," in Proc. MLSDA 2nd Workshop Mach. Learn. Sensory Data Anal., 2014, p. 4, doi: 10.1145/2689746.2689747.
[42] J. An and S. Cho, "Variational autoencoder based anomaly detection using reconstruction probability," SNU Data Mining Center, Seoul, South Korea, Tech. Rep., 2015.
[43] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," 2013, arXiv:1312.6114. [Online]. Available: http://arxiv.org/abs/1312.6114
[44] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. 5th Annu. Workshop Comput. Learn. Theory, 1992, pp. 144–152, doi: 10.1145/130385.130401.
[45] Z.-Q. Zeng, H.-B. Yu, H.-R. Xu, Y.-Q. Xie, and J. Gao, "Fast training support vector machines using parallel sequential minimal optimization," in Proc. 3rd Int. Conf. Intell. Syst. Knowl. Eng., Nov. 2008, pp. 185–208.
[46] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Comput., vol. 13, no. 7, pp. 1443–1471, Jul. 2001.
[47] Y. Wang, J. Wong, and A. Miner, "Anomaly intrusion detection using one class SVM," in Proc. 5th Annu. IEEE SMC Inf. Assurance Workshop, 2004, pp. 358–364, doi: 10.1109/IAW.2004.1437839.
[48] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, p. 27, 2011.
[49] O. Gu, P. Fogla, D. Dagon, W. Lee, and B. Škorić, "Measuring intrusion detection capability: An information-theoretic approach," in Proc. ACM Symp. Inf., Comput. Commun. Secur. (ASIACCS), vol. 2006, 2006, pp. 90–101, doi: 10.1145/1128817.1128834.
[50] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, and C. Wang, "Machine learning and deep learning methods for cybersecurity," IEEE Access, vol. 6, pp. 35365–35381, 2018, doi: 10.1109/ACCESS.2018.2836950.
[51] K. A. Spackman, "Signal detection theory: Valuable tools for evaluating inductive learning," in Proc. 6th Int. Workshop Mach. Learn., 1989, pp. 160–163.
[52] T. Fawcett, "ROC graphs: Notes and practical considerations for researchers," HP Labs, Palo Alto, CA, USA, Tech Rep. HPL-2003-4, 2004, pp. 1–38, doi: 10.1.1.10.9777.
[53] H. He and Y. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications. Hoboken, NJ, USA: Wiley, 2013.
[54] A. Tharwat, "Classification assessment methods," Appl. Comput. Inform., pp. 1–13, Aug. 2018, doi: 10.1016/j.aci.2018.08.003.
[55] G. T. Tape. The Area Under an ROC Curve. Accessed: Feb. 18, 2019. [Online]. Available: http://gim.unmc.edu/dxtests/roc3.htm
[56] X. Zhu and A. B. Goldberg, "Introduction to semi-supervised learning," Synth. Lect. Artif. Intell. Mach. Learn., vol. 3, no. 1, pp. 1–130, 2009, doi: 10.2200/S00196ED1V01Y200906AIM006.
[57] J. Song, H. Takakura, Y. Okabe, M. Eto, D. Inoue, and K. Nakao, "Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation," in Proc. 1st Workshop Building Anal. Datasets Gathering Exper. Returns Secur., vol. 2011, pp. 29–36, doi: 10.1145/1978672.1978676.
[58] S. García, M. Grill, J. Stiborek, and A. Zunino, "An empirical comparison of botnet detection methods," Comput. Secur., vol. 45, pp. 100–123, Sep. 2014, doi: 10.1016/J.COSE.2014.05.011.
[59] N. Moustafa and J. Slay, "UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proc. Mil. Commun. Inf. Syst. Conf., 2015, pp. 1–8, doi: 10.1109/MilCIS.2015.7348942.
[60] M. Ring, S. Wunderlich, D. Grüdl, D. Landes, and A. Hotho, "Flow-based benchmark data sets for intrusion detection," in Proc. 16th Eur. Conf. Cyber Warfare Secur. (ACPI), 2017, pp. 361–369.
[61] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proc. 4th Int. Conf. Inf. Syst. Secur. Privacy, 2018, pp. 108–116, doi: 10.5220/0006639801080116.
[62] C. Beek, D. Dinkar, Y. Gund, G. Lancioni, N. Minihane, F. Moreno, E. Peterson, T. Roccia, C. Schmugar, and R. Simon, "McAfee labs threats report: September 2017," McAfee, Santa Clara, CA, USA, Tech. Rep., 2017. Accessed: Feb. 20, 2019. [Online]. Available: https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-sept-2017.pdf
[63] V. Loi Cao, M. Nicolau, and J. McDermott, "Learning neural representations for network anomaly detection," IEEE Trans. Cybern., vol. 49, no. 8, pp. 3074–3087, Aug. 2019, doi: 10.1109/TCYB.2018.2838668.
[64] J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach. Newton, MA, USA: O'Reilly Media, 2017.
[65] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[66] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Tech. Rep.
[67] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," Tech. Rep., 2013.
[68] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML Workshop Deep Learn. Audio, Speech Lang. Process., 2013, pp. 1–8. Accessed: Jan. 30, 2019. [Online]. Available: https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf
[69] C. Zhou and R. C. Paffenroth, "Anomaly detection with robust deep autoencoders," in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2017, pp. 665–674, doi: 10.1145/3097983.3098052.
[70] (2019). Deeplearning4j. Accessed: Feb. 23, 2019. [Online]. Available: https://deeplearning4j.org/
[71] LIBSVM–A Library for Support Vector Machines. Accessed: Feb. 23, 2019. [Online]. Available: https://www.csie.ntu.edu.tw/~cjlin/libsvm/
[72] S. Aksoy and R. M. Haralick, "Feature normalization and likelihood-based similarity measures for image retrieval," Pattern Recognit. Lett., vol. 22, no. 5, pp. 563–582, 2001.
[73] W. J. Krzanowski and D. J. Hand, ROC Curves for Continuous Data. Boca Raton, FL, USA: CRC Press, 2009.
[74] H.-J. Liao, C.-H. R. Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: A comprehensive review," J. Netw. Comput. Appl., vol. 36, no. 1, pp. 16–24, 2013, doi: 10.1016/j.jnca.2012.09.004.

SULTAN ZAVRAK received the B.Sc. and M.Sc. degrees in computer engineering from Karadeniz
Technical University, Trabzon, Turkey, in 2010 and 2013, respectively. He is currently
pursuing the Ph.D. degree with Sakarya University, Sakarya, Turkey, under the supervision
of Murat İskefiyeli. He is also working as a Research Assistant with the Department of
Computer Engineering, Engineering Faculty, Düzce University. His research interests include
computer networks, network security, and machine learning.

MURAT İSKEFIYELI (Associate Member, IEEE) received the B.Sc., M.Sc., and Ph.D. degrees in
electrical-electronics engineering from Sakarya University, Sakarya, Turkey, in 1998, 2002,
and 2010, respectively. He is currently working as an Assistant Professor with the
Department of Computer Engineering, Faculty of Computer and Information Sciences, Sakarya
University. His research interests include computer networks, network security, and machine
learning.