Anomaly-Based Intrusion Detection From Network Flow Features Using Variational Autoencoder

This study focuses on detecting anomalous network traffic using flow-based data through unsupervised deep learning methods, specifically Autoencoder and Variational Autoencoder. The experimental results demonstrate that the Variational Autoencoder outperforms both Autoencoder and One-Class Support Vector Machine in identifying unknown attacks. The research highlights the effectiveness of flow-based intrusion detection systems in handling the increasing network traffic and cyber threats.

Received May 12, 2020, accepted June 1, 2020, date of publication June 10, 2020, date of current version June 22, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.3001350

Anomaly-Based Intrusion Detection From Network Flow Features Using Variational Autoencoder

SULTAN ZAVRAK 1,2 AND MURAT İSKEFIYELI 3, (Associate Member, IEEE)
1 Department of Computer and Information Engineering, Sakarya University, 54187 Sakarya, Turkey
2 Department of Computer Engineering, Düzce University, 81620 Düzce, Turkey
3 Department of Computer Engineering, Sakarya University, 54187 Sakarya, Turkey

Corresponding author: Sultan Zavrak ([email protected])

The work of Sultan Zavrak was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) through the
2211/C Ph.D. Scholarship Programme for Priority Areas.

ABSTRACT The rapid increase in network traffic has recently raised the importance of flow-based intrusion
detection systems, which process only a small amount of traffic data. Furthermore, anomaly-based methods, which
can identify unknown attacks, are also integrated into these systems. This study focuses on detecting
anomalous network traffic (or intrusions) from flow-based data using unsupervised deep learning methods
with a semi-supervised learning approach. More specifically, Autoencoder and Variational Autoencoder
methods were employed to identify unknown attacks using flow features. The experiments used flow-based
features extracted from network traffic data that include typical traffic and different types of attacks.
The Receiver Operating Characteristic (ROC) curves and the area under the ROC curve (AUC) obtained with
these methods were calculated and compared with those of a One-Class Support Vector Machine. The ROC
curves were examined in detail to analyze the performance of the methods at various threshold values. The
experimental results show that the Variational Autoencoder performs, for the most part, better than the
Autoencoder and the One-Class Support Vector Machine.

INDEX TERMS Flow anomaly detection, intrusion detection, deep learning, variational autoencoder,
semi-supervised learning.

I. INTRODUCTION

With the rapid growth of the Internet, cyber-attacks on networks and computer systems have also increased rapidly. As a precaution against these attacks, Intrusion Detection Systems (IDS) are deployed in networking systems. Although IDSs have gained considerable significance in counteracting increased network attacks, payload-based IDSs lack scalability because of the high speed of networks and the increase in network traffic. Consequently, flow-based detection approaches have recently become a potential candidate for IDS. Flow-based IDSs are preferred over traditional IDSs that are based on deep packet inspection for two reasons: firstly, they process a noticeably smaller amount of data, and secondly, flow data is easily gathered from network forwarding devices that use standard protocols (such as Cisco NetFlow and IETF IPFIX), without installing additional software and gathering data from each computer on the network [1]. Intrusion detection approaches are basically divided into two categories [1]: misuse-based and anomaly-based. Misuse-based techniques compare traffic against the signatures of previously known attacks. This technique, which works well against known attacks, is insufficient for detecting unknown attacks. Conversely, anomaly-based intrusion detection methods can identify unknown or zero-day attacks. Because the normal behavior of the network changes considerably, it is hard to obtain an exact definition of normal behavior. Therefore, the false alarm rates of anomaly-based techniques are higher than those of other techniques.

Recently, deep learning methods have also been applied in network intrusion detection systems, since they have successfully solved many problems in research areas such as text classification, object recognition, and image classification. In this active research area, studies using deep learning approaches mostly focus on dimensionality reduction [2]–[6] and anomaly-based intrusion detection [7]–[10].

The associate editor coordinating the review of this manuscript and approving it for publication was Kim-Kwang Raymond Choo.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
108346 VOLUME 8, 2020
S. Zavrak, M. İskefiyeli: Anomaly-Based Intrusion Detection From Network Flow Features Using VAE

The KDDCUP99 dataset [11], which also includes packet-based features, is used to evaluate the methods in the studies [6], [12]. The NSL-KDD dataset [13], a revised version of KDDCUP99, is used to evaluate the methods proposed in the studies [3], [4], [6]–[8]. The main drawbacks of these studies are as follows: a) they focus on detecting intrusions from content-based features; b) the dataset used is very outdated and does not reflect real network traffic [14].

Because flow-based data contains less information about the network traffic than payload-based data, it is much harder to detect both known and unknown attacks. In this study, the goal is to detect anomalous network traffic (or intrusions) from flow-based data, which also contains statistical properties of the flows, using deep learning methods. Furthermore, Autoencoder (AE) and Variational Autoencoder (VAE) methods are employed to detect unknown attacks, that is, attacks not used in the training phase, by using the flow features extracted from network traffic.

The main contributions of the study are summarized as follows:
• This study concentrates on the detection of network attacks from flow-based features, using an anomaly-based approach.
• AE and VAE, which are unsupervised deep learning methods, are employed together with OCSVM as anomaly detectors, and they are trained in a semi-supervised learning manner. In addition, unlike the previous studies mentioned above, this study is unique in that it uses deep learning methods for detecting intrusions from flow-based data.
• It is shown, based on a detailed discussion of the ROC and AUC results, that the VAE-based anomaly detection system performs much better than the others.

The article is organized as follows. In the next section, the studies carried out on flow-based intrusion detection in the literature are summarized. In the third section, theoretical information on the techniques and evaluation metrics is provided. The experimental methodology and results are presented in the fourth section. The final remarks are provided in the last section.

II. RELATED STUDIES

Flow-based intrusion detection is on the rise, and research in this field is gathering pace. In recent years, numerous methods that use flow data to identify intrusions have been proposed. This section reviews recent trends and particular state-of-the-art algorithms that detect intrusions from flow-based data. These include Artificial Neural Networks (ANN), Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Decision Trees, clustering, and statistical techniques. A more comprehensive and detailed analysis of flow-based intrusion detection can be found in [15] and [16].

The objective of ANNs is to model the human brain by mimicking neurons, which are small interconnected input units. Each neuron in an ANN participates in decision making, and the results are combined. ANNs model the behavior of users in order to detect anomalies. Numerous ANNs used for anomaly-based IDSs were discussed by Beghdad [17]. Sui et al. [18] proposed an anomaly detection system that used a back-propagation neural network classifier and statistical feature vectors. They considered three DoS attack scenarios using network flow records: resource depletion, bandwidth attack, and a combination of bandwidth attack and resource depletion. Tran et al. [19] proposed a hybrid detection engine that used a block-based neural network (BBNN) as the detection method. To obtain a real-time IDS fed by NetFlow data, the engine was integrated into a high-frequency FPGA board. Abuadlla et al. [1] proposed a two-phase IDS to detect and classify certain intrusions in flow-based data. In the first phase, significant changes are monitored to identify potential attacks. In the second phase, if an attack is known, multi-layer and radial basis function networks are used to classify the attack. Jadidi et al. [20] proposed a method based on a Multi-Layer Perceptron (MLP) to detect abnormal traffic in flow-based data. The interconnection weights of the MLP are optimized by using Cuckoo and particle swarm optimization with a gravitational search algorithm (PSOGSA). Mirsky et al. [21] proposed Kitsune, an online anomaly detection system that identifies attacks on a local network by employing a group of ANNs named AEs to cooperatively distinguish between normal and abnormal traffic patterns, with a performance comparable to offline anomaly detectors. Marir et al. [22] presented a novel distributed method for identifying abnormal behavior in large-scale networks, using a group of multi-layer SVMs together with deep feature extraction. In the proposed approach, a non-linear dimensionality reduction was first performed on network traffic data with distributed deep belief networks, and the extracted features were then provided as inputs to the multi-layer group of SVMs, which were constructed through the iterative reduce paradigm based on Spark. Vinayakumar et al. [23] explored a deep neural network (DNN) to create a useful and flexible IDS, named "scale-hybrid-IDS-AlertNet", and to identify unforeseen and unpredictable intrusions via a supervised learning approach. They selected the network topologies and optimal network parameters for the DNNs by applying various hyperparameter settings with KDDCup99, and the best-performing DNN model was also applied to other content- and flow-based public datasets to carry out benchmarks.

The SVM is a classification method that assigns n-dimensional input data to classes by generating vectors in the space. In the research area of intrusion detection, SVM is a preferred method, as it yields lower false positive rates and higher accuracies [74]. Winter et al. [24] proposed an inductive network IDS, which functioned on network flows and used OCSVMs for analysis.
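
The ROC and AUC evaluation that this study relies on to compare anomaly detectors can be illustrated with a minimal sketch. The Python code below is not the authors' implementation. It assumes that each flow already has an anomaly score (for example a reconstruction error, where a higher value means more anomalous) and a ground-truth label, sweeps a threshold over the scores, and integrates the resulting curve with the trapezoidal rule. The function names and the sample scores are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple


def roc_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    scores: anomaly scores, higher means more anomalous (assumed convention).
    labels: 1 for attack flows, 0 for normal flows.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both attack and normal flows must be present")

    # Sort by descending score so that lowering the threshold flags
    # one group of equally scored flows at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all flows tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under ROC points ordered by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores and labels, for illustration only.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
example_labels = [1, 1, 0, 1, 0, 0, 1, 0]
example_auc = auc(roc_curve(example_scores, example_labels))
```

Each point on the curve corresponds to one threshold value, which is why examining the curve shows how a detector behaves at different operating points. An AUC of 0.5 corresponds to a random ranking of flows and an AUC of 1.0 to perfect separation of attack and normal flows.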